sica.base.StabilizedICA¶
- class sica.base.StabilizedICA(n_components, n_runs, resampling=None, algorithm='fastica_par', fun='logcosh', whiten=True, max_iter=2000, plot=False, normalize=True, reorientation=True, pca_solver='auto', chunked=False, chunk_size=None, zero_center=True, n_jobs=1, verbose=0)[source]¶
Implement a stabilized version of the Independent Component Analysis algorithm. It fits the matrix factorization model X = AS, where A is the unmixing matrix (n_mixtures, n_sources), S is the source matrix (n_sources, n_observations) and X is the observed mixed data (n_mixtures, n_observations).
- Parameters:
- n_componentsint
Number of ICA components/sources.
- n_runsint
Number of times we run the FastICA algorithm
- resamplingstr {None , ‘bootstrap’ , ‘fast_bootstrap’}, optional
Method for resampling the data before each run of the ICA solver.
If None, no resampling is applied.
If ‘bootstrap’ the classical bootstrap method is applied to the original data matrix, the resampled matrix is
whitened (using the whitening hyperparameters set for the fit method) and the ICA components are extracted.
If ‘fast_boostrap’ a fast bootstrap algorithm is applied to the original data matrix and the whitening
operation is performed simultaneously with SVD decomposition and then the ICA components are extracted (see References).
Resampling could lead to quite heavy computations (whitening at each iteration), depending on the size of the input data. It should be considered with care. The default is None.
- algorithmstr {‘fastica_par’ , ‘fastica_def’ , ‘picard_fastica’ , ‘picard’ , ‘picard_ext’ , ‘picard_orth’}, optional.
The algorithm applied for solving the ICA problem at each run. Please see the supplementary explanations for more details. The default is ‘fastica_par’, i.e. FastICA from sklearn with parallel implementation.
- funstr {‘cube’ , ‘exp’ , ‘logcosh’ , ‘tanh’} or function, optional.
If
algorithmis in {‘fastica_par’ , ‘fastica_def’}, it represents the functional form of the G function used in the approximation to neg-entropy. Could be either ‘logcosh’, ‘exp’, or ‘cube’.If
algorithmis in {‘picard_fastica’ , ‘picard’ , ‘picard_ext’ , ‘picard_orth’}, it is associated with the choice of a density model for the sources. See supplementary explanations for more details.The default is ‘logcosh’.
- whitenboolean, optional
If True the matrix X is whitened, i.e. centered then projected in the space defined by its first
n_componentsPCA components and reduced to unit variance along each of these axes.If False the input X matrix must be already whitened (the rows must be centered, scaled to unit variance and uncorrelated.)
The default is True.
- max_iterint
Maximum number of iteration for the FastICA algorithm.
- plotboolean, optional
If True plot the stability indexes for each cluster in decreasing order. The default is False.
- normalizeboolean, optional
If True normalize the rows of
S_(i.e. the stabilized ICA components) to unit standard deviation. The default is True.- reorientationboolean,optional
If True re-oriente the rows of
S_towards positive heavy tail. The default is True.- pca_solverstr {‘auto’, ‘full’, ‘arpack’, ‘randomized’ , ‘lobpcg’}, optional
Solver for the different PCA methods. Please note that some solvers may not be compatible with some PCA methods. See _whitening.py for more details. The default is “auto”.
- chunkedboolean, optional
Parameter for the whitening step, see _whitening.py for more details. The default is False.
- chunk_sizeint, optional
Parameter for the whitening step, see _whitening.py for more details. The default is None.
- zero_centerboolean, optional
Parameter for the whitening step, see _whitening.py for more details. The default is True.
- n_jobsint, optional
Number of jobs to run in parallel. -1 means using all processors. See the joblib package documentation for more explanations. The default is 1.
- verbose: int, optional
Control the verbosity: the higher, the more messages. The default is 0.
- Attributes:
- S_: 2D array, shape (n_components , n_observations)
Array of sources/metagenes, each line corresponds to a stabilized ICA component (i.e. the centrotype of a cluster of components).
- stability_indexes_1D array, shape (n_components)
Stability indexes for the stabilized ICA components.
- mean_1D array, shape (n_mixtures)
Notes
Here
n_componentscorresponds to the number of ICA sources,n_mixturescorresponds to the number of linear mixtures (i.e. linear mixtures of ICA sources) that we observe, andn_observationscorresponds to the number of observations collected for these mixtures. Each time, the user needs to carefully determine which dimension in his data set should correspond to the linear mixtures of ICA sources and which dimension should correspond to the observations. The user should keep in mind that, at the end, he will obtainn_componentsvectors of dimensionn_observations, independent form each other (as finite samples of latent independent distributions). The user guide and the definition of the ICA framework should be helpful.For a data set of discretized sound signals registered by 10 microphones at 100 time points, if we want to
retrieve 5 ICA sources we need to set
n_mixtures = 10,n_observations = 100andn_components = 5.For a gene expression data set with 100 samples and 10000 genes, if we want to retrieve 10 ICA sources **in
the gene space** we need to set
n_mixtures = 100,n_observations = 10000andn_components = 10.References
- Fast bootstrap algorithm:
Fisher A, Caffo B, Schwartz B, Zipunnikov V. Fast, Exact Bootstrap Principal Component Analysis for p > 1 million. J Am Stat Assoc. 2016;111(514):846-860. doi: 10.1080/01621459.2015.1062383. Epub 2016 Aug 18. PMID: 27616801; PMCID: PMC5014451.
- ICASSO method :
J. Himberg and A. Hyvarinen, “Icasso: software for investigating the reliability of ICA estimates by clustering and visualization,” 2003 IEEE XIII Workshop on Neural Networks for Signal Processing (IEEE Cat. No.03TH8718), 2003, pp. 259-268, doi: 10.1109/NNSP.2003.1318025 (see https://www.cs.helsinki.fi/u/ahyvarin/papers/Himberg03.pdf).
- Picard algorithm and extensions:
Pierre Ablin, Jean-Francois Cardoso, Alexandre Gramfort, “Faster independent component analysis by preconditioning with Hessian approximations” IEEE Transactions on Signal Processing, 2018 (see https://arxiv.org/abs/1706.08171).
Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort “Faster ICA under orthogonal constraint” ICASSP, 2018 (see https://arxiv.org/abs/1711.10873)
- UMAP:
For more details about the UMAP (Uniform Manifold Approximation and Projection), see https://pypi.org/project/umap-learn/.
Examples
>>> import pandas as pd >>> from sica.base import StabilizedICA >>> df = pd.read_csv("data.csv" , index_col = 0) >>> sICA = StabilizedICA(n_components = 45 , n_runs = 30, plot = True, n_jobs = -1) >>> sICA.fit(df) >>> Sources = pd.DataFrame(sICA.S_ , columns = df.index , index = ['source ' + str(i) for i in range(sICA.S_.shape[0])]) >>> Sources.head()
Methods
fit(X[, y])Fit the ICA model with X (use stabilization).
fit_transform(X[, y])Fit to data, then transform it.
Get metadata routing of this object.
get_params([deep])Get parameters for this estimator.
inverse_transform(X[, copy])Transform the mixing matrix back to the mixed data (applying the sources).
projection([method, ax])Plot the
n_components*n_runsICA components computed during fit() in 2D.set_inverse_transform_request(*[, copy])Request metadata passed to the
inverse_transformmethod.set_output(*[, transform])Set output container.
set_params(**params)Set the parameters of this estimator.
set_transform_request(*[, copy])Request metadata passed to the
transformmethod.transform(X[, copy])Apply dimensionality reduction to X (i.e.
- fit(X, y=None)[source]¶
Fit the ICA model with X (use stabilization).
Compute the ICA components of X
n_runstimes.Cluster all the
n_components*n_runscomponents with agglomerative hierarchical clustering (average linkage) inton_componentsclusters.For each cluster compute its stability index and return its centrotype as the final ICA component.
- Parameters:
- X2D array-like, shape (n_mixtures, n_observations) or (n_components, n_observations) if whiten is False.
Training data
- yIgnored
Ignored.
- Returns:
- selfobject
Returns the instance itself.
- fit_transform(X, y=None, **fit_params)[source]¶
Fit to data, then transform it.
Fits transformer to
Xandywith optional parametersfit_paramsand returns a transformed version ofX.- Parameters:
- Xarray-like of shape (n_samples, n_features)
Input samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_paramsdict
Additional fit parameters.
- Returns:
- X_newndarray array of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()[source]¶
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A
MetadataRequestencapsulating routing information.
- get_params(deep=True)[source]¶
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- inverse_transform(X, copy=True)[source]¶
Transform the mixing matrix back to the mixed data (applying the sources).
- Parameters:
- X: 2D array-like, shape (n_mixtures , n_components)
- copy: bool, optional
If False, data passed to fit are overwritten. The default is True.
- Returns:
- X_new: 2D array, shape (n_mixtures, n_observations)
- projection(method='mds', ax=None)[source]¶
Plot the
n_components*n_runsICA components computed during fit() in 2D. Approximate the original dissimilarities between components by Euclidean distance. Each cluster is represented with a different color.- Parameters:
- methodstring, optional
Name of the dimensionality reduction method (e.g “tsne” , “mds” or “umap”) The default is “umap”.
- axmatplotlib.axes, optional
The default is None.
- Returns:
- None.
Notes
We use the dissimilarity measure
sqrt(1 - |rho_ij|)(rho the Pearson correlation) instead of1 - |rho_ij|to reduce overlapping.Please note that multidimensional scaling (MDS) is more computationally demanding than t-SNE or UMAP. However, it takes into account the global structures of the data set while the others don’t. For t-SNE or UMAP one cannot really interpret the inter-cluster distances.
- set_inverse_transform_request(*, copy='$UNCHANGED$')[source]¶
Request metadata passed to the
inverse_transformmethod.Note that this method is only relevant if
enable_metadata_routing=True(seesklearn.set_config). Please see User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed toinverse_transformif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it toinverse_transform.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.- Parameters:
- copystr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
copyparameter ininverse_transform.
- Returns:
- selfobject
The updated object.
- set_output(*, transform=None)[source]¶
Set output container.
See sphx_glr_auto_examples_miscellaneous_plot_set_output.py for an example on how to use the API.
- Parameters:
- transform{“default”, “pandas”, “polars”}, default=None
Configure output of
transformandfit_transform."default": Default output format of a transformer"pandas": DataFrame output"polars": Polars outputNone: Transform configuration is unchanged
New in version 1.4:
"polars"option was added.
- Returns:
- selfestimator instance
Estimator instance.
- set_params(**params)[source]¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline). The latter have parameters of the form<component>__<parameter>so that it’s possible to update each component of a nested object.- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- set_transform_request(*, copy='$UNCHANGED$')[source]¶
Request metadata passed to the
transformmethod.Note that this method is only relevant if
enable_metadata_routing=True(seesklearn.set_config). Please see User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed totransformif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it totransform.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.- Parameters:
- copystr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
copyparameter intransform.
- Returns:
- selfobject
The updated object.
- transform(X, copy=True)[source]¶
Apply dimensionality reduction to X (i.e. recover the mixing matrix applying the pseudo-inverse of the sources).
- Parameters:
- X2D array-like, shape (n_mixtures, n_observations)
- copy: bool, optional
If False, data passed to fit are overwritten. The default is True.
- Returns:
- A2D array, shape (n_mixtures, n_components)
Unmixing matrix which maps the independent sources to the data (i.e. X.T = AS)