Expression
class openomics.Expression(data, transpose, gene_index=None, usecols=None, gene_level=None, sample_level='sample_index', transform_fn=None, dropna=False, npartitions=None, **kwargs)
Bases: object
This class handles importing any quantitative omics data that is in a table format (e.g. csv, tsv, excel). Pandas loads the DataFrame from file using the user-specified columns and gene column name, then transposes it so that rows are samples and columns are genes/transcripts/peptides. The user also specifies the gene index, which indicates whether genes are identified by Ensembl gene IDs, gene names, or transcript IDs/names. Choose the gene index carefully: the right index makes it easier to annotate the data with functional, sequence, and interaction information. Apart from the gene index column and the sample barcode ID index, the dataframe should contain only numeric values.
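The orientation described above can be illustrated with plain pandas. This is a minimal sketch and not the class's actual implementation. The toy table, the `gene_name` column, and the sample labels are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical toy table: rows are genes, columns are samples.
raw = io.StringIO(
    "gene_name\tsample_A\tsample_B\n"
    "TP53\t1.5\t2.0\n"
    "BRCA1\t0.3\t0.7\n"
)

table = pd.read_csv(raw, sep="\t")

# Use the gene column as the index, then transpose so that
# rows are samples and columns are genes.
expressions = table.set_index("gene_name").T
expressions.index.name = "sample_index"

print(expressions)
```

After the transpose, every remaining cell is numeric, which matches the requirement that only the gene column and the sample index hold non-numeric labels.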
Attributes Summary
Args: level (int): Default None.
Args: level:
Methods Summary
drop_genes(gene_ids) – Drop columns representing genes/RNA/proteins in the self.expressions dataframe.
drop_samples(sample_ids)
get_genes_list([level]) – level defaults to None and is only needed if the gene index is a pd.MultiIndex.
get_samples_list([level])
load_dataframe(data, transpose, usecols, …) – Read table data inputs to create a DataFrame.
load_dataframe_glob(globstring, usecols, …)
name()
preprocess_table(df[, usecols, gene_index, …]) – Preprocess expression table files where columns are samples and rows are genes/transcripts. df is a Dask or Pandas DataFrame; usecols is a regular expression string for the column names to fetch.
set_genes_index(index, old_index)
Attributes Documentation
Methods Documentation
drop_genes(gene_ids)
Drop columns representing genes/RNA/proteins in the self.expressions dataframe.
- Parameters
gene_ids (list of str) – gene identifiers to drop; must be a subset of the dataframe's columns
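Dropping gene columns can be pictured with a plain pandas call. This is a sketch of the described behavior, not the method's source. The `expressions` frame and the gene labels are hypothetical.

```python
import pandas as pd

# Hypothetical samples-by-genes table, as described for self.expressions.
expressions = pd.DataFrame(
    {"G1": [1.0, 2.0], "G2": [3.0, 4.0], "G3": [5.0, 6.0]},
    index=["s1", "s2"],
)

# gene_ids must be a subset of the existing columns; by default pandas
# raises a KeyError for labels that are not present.
gene_ids = ["G1", "G3"]
remaining = expressions.drop(columns=gene_ids)

print(list(remaining.columns))
```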
get_genes_list(level=None)
- Parameters
level (int) – Default None. Only needed if the gene index is a pd.MultiIndex.
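To show why a level is needed with a pd.MultiIndex, the sketch below selects one level of a two-level column index using pandas directly. The level names and labels are invented. Whether the method itself removes duplicates is not stated in this page, so the `unique()` call is an assumption made for the example.

```python
import pandas as pd

# Hypothetical two-level gene index: (gene, transcript).
columns = pd.MultiIndex.from_tuples(
    [("G1", "T1"), ("G1", "T2"), ("G2", "T3")],
    names=["gene", "transcript"],
)
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=["s1", "s2"], columns=columns)

# level=0 picks the gene labels; level=1 would pick the transcript labels.
genes = df.columns.get_level_values(0).unique().tolist()
transcripts = df.columns.get_level_values(1).tolist()

print(genes, transcripts)
```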
load_dataframe(data, transpose, usecols, gene_index, dropna)
Read table data inputs to create a DataFrame.
- Parameters
data – either a file path, a glob file path (e.g. “table-*.tsv”), a pandas.DataFrame, or a dask DataFrame.
transpose (bool) – True if the table is oriented with samples as columns, else False.
usecols (str) – A regex string to select columns. Default None.
gene_index (str) – The name of the column that contains the gene names or IDs.
dropna (bool) – Whether to drop rows with null values.
- Returns
The loaded dataframe.
- Return type
Union[pd.DataFrame, dd.DataFrame]
load_dataframe_glob(globstring, usecols, gene_index, transpose, dropna)
- Parameters
globstring (str) –
usecols (str) –
gene_index (str) –
transpose (bool) –
- Returns
dd.DataFrame
preprocess_table(df, usecols=None, gene_index=None, transposed=True, sort_index=False, dropna=True)
Preprocess expression table files where columns are samples and rows are genes/transcripts.
- Parameters
df (pd.DataFrame) – A Dask or Pandas DataFrame.
usecols (str) – A regular expression string for the column names to fetch.
gene_index (str) – The column name containing the gene/transcript names or IDs.
transposed (bool) – Default True. Whether to transpose the dataframe so columns are genes (features) and rows are samples.
sort_index (bool) –
dropna (bool) –
- Returns
A processed DataFrame (pandas or Dask).
- Return type
Union[pd.DataFrame, dd.DataFrame]
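The parameters above suggest a pipeline of column selection, indexing, null removal, transposition, and sorting. The function below is a rough pandas-only sketch of such a pipeline. The name `preprocess_sketch`, the order of the steps, and the toy data are assumptions. It is not the library's code and it does not cover Dask inputs.

```python
import re

import pandas as pd


def preprocess_sketch(df, usecols=None, gene_index=None,
                      transposed=True, sort_index=False, dropna=True):
    """Hypothetical pandas-only illustration of the documented parameters."""
    if usecols is not None:
        # Keep columns matching the regex, plus the gene column if given.
        pattern = f"(?:{usecols})"
        if gene_index is not None:
            pattern += f"|^{re.escape(gene_index)}$"
        df = df.filter(regex=pattern)
    if gene_index is not None:
        df = df.set_index(gene_index)
    if dropna:
        df = df.dropna(axis=0)  # drop genes (rows) with any null value
    if transposed:
        df = df.T  # rows become samples, columns become genes
    if sort_index:
        df = df.sort_index()
    return df


# Toy input: rows are genes, columns are samples plus one unrelated column.
table = pd.DataFrame({
    "gene_id": ["g2", "g1", "g3"],
    "S_b": [1.0, None, 3.0],
    "S_a": [4.0, 5.0, 6.0],
    "notes": ["x", "y", "z"],
})

result = preprocess_sketch(table, usecols=r"^S_", gene_index="gene_id",
                           sort_index=True)
print(result)
```

In this sketch the null check runs before the transpose, so a gene with a missing value in any sample is removed as a whole. A different order would drop samples instead.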