sica.base.MSTD

sica.base.MSTD(X, m, M, step, n_runs, fun='logcosh', algorithm='fastica_par', whiten=True, max_iter=2000, n_jobs=-1, ax=None)

Plot “MSTD graphs” to help choose an optimal dimension for ICA decomposition.

Run the stabilized ICA algorithm for several dimensions in [m, M] and compute the stability distribution of the components for each dimension.
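As an illustration of which dimensions such a sweep would visit, the following minimal sketch builds the grid m, m+step, …, M. The helper name `candidate_dimensions` is hypothetical and not part of sica; the sketch also assumes that M is only included when it is reachable from m in whole steps, which the documentation here does not specify.

```python
def candidate_dimensions(m, M, step):
    """Hypothetical helper (not part of sica): list m, m+step, ... up to M."""
    if step <= 0:
        raise ValueError("step must be > 0")
    if M <= m:
        raise ValueError("M must be > m")
    # range() excludes its stop value, so use M + 1 to keep M when reachable.
    return list(range(m, M + 1, step))


dims = candidate_dimensions(5, 15, 2)
print(dims)
```

With these values the grid contains six dimensions, so the stabilized ICA procedure would be repeated six times, each with n_runs runs.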

Parameters:
X : 2D array, shape (n_mixtures, n_observations)

Training data

m : int

Minimal dimension for ICA decomposition.

M : int > m

Maximal dimension for ICA decomposition.

step : int > 0

Step between two consecutive dimensions (e.g., if step = 2, the function tests the dimensions m, m+2, m+4, …, M).

n_runs : int

Number of times the FastICA algorithm is run (see the fit method of the StabilizedICA class).

fun : str {‘cube’, ‘exp’, ‘logcosh’, ‘tanh’} or function, optional

The default is ‘logcosh’. See the fit method of StabilizedICA for more details.

algorithm : str {‘fastica_par’, ‘fastica_def’, ‘picard_fastica’, ‘picard’, ‘picard_ext’, ‘picard_orth’}, optional

The algorithm applied to solve the ICA problem at each run. Please see the supplementary explanations for more details. The default is ‘fastica_par’, i.e. FastICA from sklearn with the parallel implementation.

whiten : bool, optional

If True, X is whitened only once as an initial step, with an SVD solver and M components. If False, X must already be whitened, with M components. The default is True.

max_iter : int, optional

Parameter for _ICA_decomposition. The default is 2000.

n_jobs : int

Number of jobs to run in parallel for each stabilized ICA step. The default is -1.

ax : array of matplotlib.axes objects, optional

The default is None.
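To make the whiten option more concrete, here is a minimal sketch of SVD-based whitening that keeps a fixed number of components. It is a generic NumPy illustration, not the library's own implementation: the function name `whiten_svd`, the row-wise centering, and the scaling convention are all assumptions made for this example.

```python
import numpy as np


def whiten_svd(X, n_components):
    """Hypothetical helper (not part of sica): whiten X with a thin SVD.

    X has shape (n_mixtures, n_observations); the result has shape
    (n_components, n_observations) and an identity sample covariance.
    """
    # Assumption: center each mixture (row) across observations.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Thin SVD: the rows of Vt are orthonormal directions over observations.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    n_obs = X.shape[1]
    # Rescale so that X_w @ X_w.T / (n_obs - 1) is the identity matrix.
    return np.sqrt(n_obs - 1) * Vt[:n_components]


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))        # 20 mixtures, 500 observations
X_w = whiten_svd(X, n_components=10)  # keep 10 components
cov = X_w @ X_w.T / (X.shape[1] - 1)
print(np.allclose(cov, np.eye(10)))
```

Under these assumptions, data whitened once with M components could be reused across the tested dimensions, which is consistent with the "whitened only once" wording above.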

Returns:
None.

References

Kairov U, Cantini L, Greco A, Molkenov A, Czerwinska U, Barillot E, Zinovyev A. Determining the optimal number of independent components for reproducible transcriptomic data analysis. BMC Genomics. 2017 Sep 11;18(1):712. doi: 10.1186/s12864-017-4112-9. PMID: 28893186; PMCID: PMC5594474. (see https://bmcgenomics.biomedcentral.com/track/pdf/10.1186/s12864-017-4112-9 ).

Examples

>>> import pandas as pd
>>> from sica.base import MSTD
>>> df = pd.read_csv("data.csv", index_col=0)
>>> MSTD(df.values, m=5, M=100, step=2, n_runs=20)