`sica.base`.MSTD

sica.base.MSTD(X, m, M, step, n_runs, fun='logcosh', algorithm='fastica_par', whiten=True, max_iter=2000, n_jobs=-1, ax=None)

Plot “MSTD graphs” to help choose an optimal dimension for ICA decomposition.

Run the stabilized ICA algorithm for several dimensions in [m, M] and compute the stability distribution of the components for each dimension.

Parameters:

X (2D array, shape (n_mixtures, n_observations)): Training data.
m (int): Minimal dimension for ICA decomposition.
M (int > m): Maximal dimension for ICA decomposition.
step (int > 0): Step between two dimensions (e.g. if step = 2 the function will test the dimensions m, m+2, m+4, … , M).
n_runs (int): Number of times the FastICA algorithm is run (see the fit method of class StabilizedICA).
fun (str {‘cube’, ‘exp’, ‘logcosh’, ‘tanh’} or function, optional): The default is ‘logcosh’. See the fit method of StabilizedICA for more details.
algorithm (str {‘fastica_par’, ‘fastica_def’, ‘picard_fastica’, ‘picard’, ‘picard_ext’, ‘picard_orth’}, optional): The algorithm applied for solving the ICA problem at each run. Please see the supplementary explanations for more details. The default is ‘fastica_par’, i.e. FastICA from sklearn with parallel implementation.
whiten (bool, optional): If True, X is whitened only once as an initial step, with an SVD solver and M components. If False, X must already be whitened, with M components. The default is True.
max_iter (int, optional): Parameter for _ICA_decomposition. The default is 2000.
n_jobs (int, optional): Number of jobs to run in parallel for each stabilized ICA step. The default is -1.
ax (array of matplotlib.axes objects, optional): The default is None.
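
As a rough illustration of how m, M and step combine, the sketch below lists the dimensions that would be tested. The helper `candidate_dimensions` is hypothetical and not part of sica; it assumes the grid starts at m and keeps every value that does not exceed M. The run count further assumes that n_runs ICA runs are performed for each tested dimension.

```python
def candidate_dimensions(m, M, step):
    """Hypothetical helper (not part of sica): dimensions m, m+step, ... up to M."""
    if step <= 0:
        raise ValueError("step must be a positive integer")
    if M <= m:
        raise ValueError("M must be greater than m")
    # Assumption: values above M are dropped, so M itself is only
    # included when (M - m) is a multiple of step.
    return list(range(m, M + 1, step))


dims = candidate_dimensions(m=5, M=11, step=2)
n_runs = 20
# Assumption: each tested dimension triggers n_runs ICA runs.
total_runs = len(dims) * n_runs

print(dims)        # [5, 7, 9, 11]
print(total_runs)  # 80
```

Since the cost grows with both the number of tested dimensions and n_runs, a larger step is one way to keep an exploratory call short.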

Returns:

None.

References

Kairov U, Cantini L, Greco A, Molkenov A, Czerwinska U, Barillot E, Zinovyev A. Determining the optimal number of independent components for reproducible transcriptomic data analysis. BMC Genomics. 2017 Sep 11;18(1):712. doi: 10.1186/s12864-017-4112-9. PMID: 28893186; PMCID: PMC5594474. (see https://bmcgenomics.biomedcentral.com/track/pdf/10.1186/s12864-017-4112-9 ).

Examples

>>> import pandas as pd
>>> from sica.base import MSTD
>>> df = pd.read_csv("data.csv", index_col=0)
>>> MSTD(df.values, m=5, M=100, step=2, n_runs=20)
