{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Stability analysis of ICA components for transcriptomic data \n", "\n", "Here we present a short analysis of the stability and reproducibility of ICA components extracted from several gene expression data sets. We are mainly interested in the behavior of the transcriptomic data from [\"Defining the Biological Basis of Radiomic Phenotypes in Lung Cancer\" Grossman et al. 2017](https://elifesciences.org/articles/23421)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "#%load_ext autoreload\n", "#%autoreload 2\n", "\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Load data sets \n", "\n", "### Grossman data set \n", "\n", "These data sets were extracted from [\"Defining the Biological Basis of Radiomic Phenotypes in Lung Cancer\" Grossman et al. 2017](https://elifesciences.org/articles/23421). \n", " \n", "`df1` contains the expression of 21,766 unique genes for 269 patients with non-small cell lung cancer (NSCLC) treated at the H. Lee Moffitt Cancer Center, Tampa, Florida, USA. `df2` contains the expression of the same 21,766 unique genes for 89 patients with NSCLC treated at MAASTRO Clinic, Maastricht, NL. Gene expression values were measured on a custom Rosetta/Merck Affymetrix 2.0 microarray chipset and normalized with the robust multi-array average (RMA) algorithm. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| Dataset | Samples | 3643 | 84263 | 7171 | 2934 | 11052 | 1241 | 6453 | 57541 | 9349 | 11165 | ... | 643669 | 1572 | 8551 | 26784 | 26783 | 26782 | 26779 | 26778 | 26777 | 100132941 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grossman USA | RadioGenomic-017 | 5.205151 | 7.097989 | 9.559617 | 8.396808 | 7.603719 | 7.990605 | 10.044401 | 9.054930 | 7.383169 | 8.177010 | ... | 6.419273 | 3.809826 | 6.507880 | 6.572121 | 5.400848 | 5.951391 | 3.381860 | 9.825584 | 2.905091 | 5.622438 |
| | RadioGenomic-055 | 5.615738 | 6.585052 | 9.777869 | 9.082415 | 8.639498 | 6.781274 | 9.541826 | 8.866110 | 6.422702 | 7.196294 | ... | 5.753828 | 4.186127 | 6.821582 | 7.031406 | 4.852417 | 6.140850 | 2.629760 | 9.005145 | 3.366466 | 5.495330 |
| | RadioGenomic-227 | 5.679276 | 7.747854 | 10.648704 | 9.127985 | 7.369421 | 7.203773 | 8.972255 | 8.328371 | 7.269232 | 7.449183 | ... | 5.666999 | 4.316130 | 6.637855 | 6.248824 | 4.664228 | 5.767970 | 2.911470 | 8.674466 | 3.337194 | 6.308605 |
| | RadioGenomic-222 | 5.317341 | 7.196276 | 10.949771 | 8.098896 | 7.639882 | 7.971876 | 10.159637 | 8.667702 | 8.474250 | 7.271477 | ... | 5.531060 | 3.403776 | 7.059419 | 6.201873 | 4.690005 | 6.256286 | 4.119688 | 9.099659 | 3.181781 | 5.740033 |
| | RadioGenomic-212 | 7.196904 | 9.346492 | 9.673778 | 9.358636 | 8.741693 | 7.616498 | 10.376653 | 8.701461 | 6.601991 | 7.344651 | ... | 5.519642 | 3.796049 | 7.332635 | 6.050121 | 4.898523 | 6.537895 | 3.600895 | 8.792510 | 2.945391 | 5.835411 |

5 rows × 21766 columns

| Dataset | Samples | 1 | 2 | 144571 | 65985 | 13 | 201651 | 51166 | 195827 | 79719 | 22848 | ... | 9183 | 55055 | 11130 | 7789 | 158586 | 440590 | 79699 | 7791 | 23140 | 26009 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GSE33356 | GSM494556 | 5.234802 | 11.922797 | 5.031032 | 9.141089 | 9.934807 | 6.868094 | 5.038721 | 6.845863 | 7.877033 | 9.201101 | ... | 7.149422 | 7.737312 | 8.386004 | 5.994052 | 7.685643 | 4.546110 | 8.870484 | 7.938569 | 7.596789 | 8.049103 |
| | GSM494557 | 5.069035 | 11.146006 | 4.737330 | 8.926558 | 4.419542 | 3.276470 | 7.517926 | 6.945466 | 7.387878 | 9.279903 | ... | 7.307666 | 7.693829 | 9.986055 | 5.808656 | 5.968874 | 4.979624 | 9.177783 | 7.976926 | 7.044507 | 8.192161 |
| | GSM494558 | 5.514972 | 12.378934 | 6.990395 | 7.268946 | 4.253987 | 2.978153 | 6.286073 | 7.201963 | 7.125318 | 10.170862 | ... | 6.886556 | 7.927103 | 8.124558 | 5.526909 | 6.000892 | 4.951939 | 8.613027 | 8.642108 | 7.747791 | 7.947040 |
| | GSM494559 | 6.871695 | 11.853862 | 4.589314 | 8.874169 | 5.911853 | 4.412516 | 6.304975 | 7.731857 | 7.300380 | 8.649045 | ... | 7.939107 | 8.589020 | 9.673297 | 6.190925 | 6.736635 | 4.765782 | 9.643763 | 7.295604 | 7.839339 | 8.377581 |
| | GSM494560 | 6.761781 | 11.200515 | 5.001184 | 9.027746 | 5.356248 | 3.515208 | 5.545497 | 7.552539 | 7.309306 | 9.229319 | ... | 7.833734 | 8.689572 | 10.304345 | 6.090843 | 6.703918 | 5.821131 | 9.187349 | 7.557380 | 7.320086 | 8.827688 |

5 rows × 10077 columns
\n", "