Expression

class openomics.Expression(data, transpose, gene_index=None, usecols=None, gene_level=None, sample_level='sample_index', transform_fn=None, dropna=False, npartitions=None, **kwargs)

Bases: object

This class handles importing any quantitative omics data stored in a table format (e.g. csv, tsv, excel). Pandas loads the DataFrame from file with the user-specified columns and gene column name, then transposes it so that rows are samples and columns are genes/transcripts/peptides. The user also specifies the index argument, which indicates whether the genes are identified by Ensembl gene ID, gene name, or transcript ID/name. Choose the gene index carefully: the right index makes it easier to annotate the data with functional, sequence, and interaction data. Apart from the genes_col_name column and the sample barcode ID indices, the dataframe should contain only numeric values.
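The orientation described above can be illustrated with plain pandas. This is a minimal sketch, not the class's implementation: the column name `gene_name`, the sample barcodes, and the values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical tab-separated table: one row per gene, one column per sample.
raw = io.StringIO(
    "gene_name\tsample_A\tsample_B\n"
    "TP53\t1.5\t2.0\n"
    "BRCA1\t0.0\t3.25\n"
)

# Read the table and use the gene column as the row index.
genes_by_samples = pd.read_csv(raw, sep="\t", index_col="gene_name")

# Transpose so that rows are samples and columns are genes.
samples_by_genes = genes_by_samples.T

print(samples_by_genes)
```

After the transpose, each sample barcode labels a row and each gene labels a column, which is the layout the class description calls for.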

Attributes Summary

`features`	Parameter `level` (int): default None.
`gene_index`
`samples`	Parameter: `level`.

Methods Summary

`drop_genes`(gene_ids)	Drop columns representing genes/rna/proteins in self.expressions dataframe.
`drop_samples`(sample_ids)	Parameter: `sample_ids`.
`get_genes_list`([level])	Parameter `level`: default None. Only needed if the gene index is a `pd.MultiIndex`.
`get_samples_list`([level])	Parameter: `level`.
`load_dataframe`(data, transpose, usecols, ...)	Reading table data inputs to create a DataFrame.
`load_dataframe_glob`(globstring, usecols, ...)	Parameter: `globstring`.
`name`()
`preprocess_table`(df[, usecols, gene_index, ...])	Preprocesses expression table files in which columns are samples and rows are genes/transcripts.
`set_genes_index`(index, old_index)	Parameter: `index`.

Attributes Documentation

features

Parameters: level (int) – Default None. Only needed if the gene index is a pd.MultiIndex.

gene_index

samples

Parameters: level –

Methods Documentation

drop_genes(gene_ids)

Drop columns representing genes/rna/proteins in self.expressions dataframe.

Parameters: gene_ids (list of str) – A list of strings that is a subset of the column names.
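As a rough illustration of dropping gene columns, the sketch below uses pandas' own `DataFrame.drop` on a hypothetical samples-by-genes table. It is an assumption about the general idea, not the method's actual source.

```python
import pandas as pd

# Hypothetical samples-by-genes table (rows are samples, columns are genes).
expressions = pd.DataFrame(
    {"TP53": [1.5, 2.0], "BRCA1": [0.0, 3.25], "EGFR": [4.0, 5.0]},
    index=["sample_A", "sample_B"],
)

# A subset of the column names, as the parameter description requires.
gene_ids = ["BRCA1", "EGFR"]

# Drop the gene columns; the sample rows are left untouched.
remaining = expressions.drop(columns=gene_ids)

print(remaining)
```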

drop_samples(sample_ids)

Parameters: sample_ids –

get_genes_list(level=None)

Parameters: level (int) – Default None. Only needed if gene index is a pd.MultiIndex
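To show why a level is needed with a `pd.MultiIndex`, here is a small pandas sketch. The level names `gene_id` and `gene_name` and the table itself are hypothetical, and the snippet does not claim to reproduce what `get_genes_list` does internally.

```python
import pandas as pd

# Hypothetical two-level gene index: (Ensembl gene ID, gene name).
columns = pd.MultiIndex.from_tuples(
    [("ENSG00000141510", "TP53"), ("ENSG00000012048", "BRCA1")],
    names=["gene_id", "gene_name"],
)
expressions = pd.DataFrame(
    [[1.5, 0.0], [2.0, 3.25]],
    index=["sample_A", "sample_B"],
    columns=columns,
)

# With a MultiIndex, a level must be chosen to get a flat list of genes.
level = 1
gene_names = expressions.columns.get_level_values(level).tolist()

print(gene_names)
```

With a single-level index there is only one set of labels, which is why the `level` argument can stay at None in that case.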

get_samples_list(level=None)

Parameters: level –

load_dataframe(data, transpose, usecols, gene_index, dropna, **kwargs)

Reading table data inputs to create a DataFrame.

Parameters

data – either a file path, a glob file path (e.g. “table-*.tsv”), a pandas.DataFrame, or a dask DataFrame.
transpose (bool) – True if the table is oriented with samples as columns, else False.
usecols (str) – A regex string to select columns. Default None.
gene_index (str) – The name of the column that contains the gene names or IDs.
dropna (bool) – Whether to drop rows with null values.

Returns

The loaded dataframe.

Return type

Union[pd.DataFrame, dd.DataFrame]
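The `usecols` parameter is described as a regex string for selecting columns. The sketch below shows the general idea with the standard-library `re` module. The column names and pattern are hypothetical, and the use of search (rather than full-match) semantics is an assumption, since the matching rule is not stated here.

```python
import re

# Hypothetical column names: one gene column plus three sample columns.
columns = ["gene_name", "TCGA-01-tumor", "TCGA-02-tumor", "TCGA-03-normal"]

# Hypothetical pattern: keep the gene column and the tumor samples.
usecols = r"gene_name|tumor"

# Assumption: a column is kept if the pattern matches anywhere in its name.
selected = [name for name in columns if re.search(usecols, name)]

print(selected)
```

Note that the pattern has to match the gene column as well as the sample columns if that column is meant to survive the selection.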

load_dataframe_glob(globstring, usecols, gene_index, transpose, dropna, **kwargs)

Parameters

globstring (str) –
usecols (str) –
gene_index (str) –
transpose (bool) –

Returns

dd.DataFrame
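A glob string such as "table-*.tsv" names a set of files by pattern. The sketch below uses only the standard-library `glob` module to show which files such a pattern would match. The file names are hypothetical, and how the method itself expands the pattern is not documented here.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Create three hypothetical files; only two follow the "table-*.tsv" pattern.
    for filename in ("table-1.tsv", "table-2.tsv", "notes.txt"):
        with open(os.path.join(tmpdir, filename), "w") as handle:
            handle.write("gene_name\tsample_A\n")

    globstring = os.path.join(tmpdir, "table-*.tsv")

    # Collect the base names of every file the glob string matches.
    matched = sorted(os.path.basename(path) for path in glob.glob(globstring))

print(matched)
```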

classmethod name()

preprocess_table(df, usecols=None, gene_index=None, transposed=True, sort_index=False, dropna=True)

This function preprocesses expression table files in which columns are samples and rows are genes/transcripts.

Parameters

df (pd.DataFrame) – A Dask or Pandas DataFrame.
usecols (str) – A regular expression string for the column names to fetch.
gene_index (str) – The column name containing the gene/transcript names or IDs.
transposed (bool) – Default True. Whether to transpose the dataframe so columns are genes (features) and rows are samples.
sort_index (bool) – Default False.
dropna (bool) – Default True.

Returns

The processed DataFrame (Pandas or Dask).

Return type

Union[pd.DataFrame, dd.DataFrame]
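The preprocessing steps named in the parameters (regex column selection, gene index, dropping nulls, transposing) can be sketched in plain pandas as follows. The data, the pattern, and the order of the steps are assumptions for illustration, not the method's actual implementation.

```python
import pandas as pd

# Hypothetical genes-by-samples table with one incomplete row (EGFR).
df = pd.DataFrame(
    {
        "gene_name": ["TP53", "BRCA1", "EGFR"],
        "tumor_1": [1.5, 0.0, None],
        "tumor_2": [2.0, 3.25, 4.0],
        "normal_1": [0.5, 0.75, 1.0],
    }
)

gene_index = "gene_name"
usecols = r"gene_name|tumor"  # regular expression over column names

table = df.filter(regex=usecols)     # keep only the matching columns
table = table.set_index(gene_index)  # label rows by gene
table = table.dropna()               # drop genes with missing values
table = table.T                      # rows become samples, columns become genes

print(table)
```

In this sketch the EGFR row is removed because of its missing value, and the `normal_1` column is excluded by the pattern before the transpose.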

set_genes_index(index, old_index)

Parameters

index (str) –
old_index (str) –