Expression

class openomics.Expression(data, transpose, gene_index=None, usecols=None, gene_level=None, sample_level='sample_index', transform_fn=None, dropna=False, npartitions=None, **kwargs)

Bases: object

This class handles importing any quantitative omics data stored in a table format (e.g. CSV, TSV, Excel). Pandas loads the DataFrame from file with the user-specified columns and gene column name, then transposes it so that the rows are samples and the columns are genes, transcripts, or peptides. The user also specifies the gene_index argument, which indicates whether the genes are identified by Ensembl gene IDs, gene names, or transcript IDs/names. Choose the gene index carefully: the right index makes it easier to annotate the data with functional, sequence, and interaction data. Besides the gene column and the sample barcode ID indices, the dataframe should contain only numeric values.
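As a rough illustration of the orientation described above, the following sketch uses plain pandas rather than openomics itself. The column name `gene_name`, the sample barcodes, and the values are all made up for the example.

```python
import io

import pandas as pd

# Hypothetical tab-separated table: one row per gene, one column per sample.
raw = io.StringIO(
    "gene_name\tsample_A\tsample_B\n"
    "TP53\t1.5\t2.0\n"
    "BRCA1\t0.0\t3.5\n"
)

# Load the table, using the gene column as the row index.
table = pd.read_csv(raw, sep="\t", index_col="gene_name")

# Transpose so that rows are samples and columns are genes.
expressions = table.T

print(expressions.index.tolist())    # sample barcodes
print(expressions.columns.tolist())  # gene names
```

After the transpose, each row can be looked up by sample barcode and each column by gene, which is the layout the rest of this class assumes.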

Attributes Summary

features

Args: level (int): Default None.

gene_index

samples

Args: level:

Methods Summary

drop_genes(gene_ids)

Drop columns representing genes/rna/proteins in self.expressions dataframe.

drop_samples(sample_ids)

Parameter: sample_ids.

get_genes_list([level])

Parameter: level. Default None. Only needed if the gene index is a pd.MultiIndex.

get_samples_list([level])

Parameter: level.

load_dataframe(data, transpose, usecols, ...)

Reading table data inputs to create a DataFrame.

load_dataframe_glob(globstring, usecols, ...)

Parameter: globstring.

name()

preprocess_table(df[, usecols, gene_index, ...])

Preprocesses expression table files where columns are samples and rows are genes/transcripts.

set_genes_index(index, old_index)

Parameter: index.

Attributes Documentation

features

Args: level (int): Default None. Only needed if gene index is a pd.MultiIndex

gene_index

samples

Args: level:

Methods Documentation

drop_genes(gene_ids)

Drop columns representing genes/rna/proteins in self.expressions dataframe.

Parameters

gene_ids (list of str) – A list of strings that is a subset of the column names.
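The effect on the underlying table can be pictured with a plain pandas DataFrame. This is a sketch of an equivalent column drop, not the method's actual implementation; the gene names and values are hypothetical.

```python
import pandas as pd

# Hypothetical expression matrix: rows are samples, columns are genes.
expressions = pd.DataFrame(
    {"TP53": [1.5, 2.0], "BRCA1": [0.0, 3.5], "EGFR": [4.2, 0.7]},
    index=["sample_A", "sample_B"],
)

# gene_ids should be a subset of the existing column names.
gene_ids = ["BRCA1", "EGFR"]

# Drop the listed gene columns; the samples (rows) are left untouched.
remaining = expressions.drop(columns=gene_ids)
```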

drop_samples(sample_ids)

Parameters

sample_ids

get_genes_list(level=None)

Parameters

level (int) – Default None. Only needed if gene index is a pd.MultiIndex
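To show what selecting a level of a pd.MultiIndex means, here is a small pandas-only sketch. It assumes the gene index is stored on the columns with two levels; the level names, identifiers, and values are illustrative, and this is not necessarily how the method is implemented.

```python
import pandas as pd

# Hypothetical two-level gene index: (Ensembl gene ID, gene name).
columns = pd.MultiIndex.from_tuples(
    [("ENSG00000141510", "TP53"), ("ENSG00000012048", "BRCA1")],
    names=["gene_id", "gene_name"],
)
expressions = pd.DataFrame(
    [[1.5, 0.0], [2.0, 3.5]],
    index=["sample_A", "sample_B"],
    columns=columns,
)

# level picks which level of the MultiIndex to list.
gene_ids = expressions.columns.get_level_values(0).tolist()
gene_names = expressions.columns.get_level_values(1).tolist()
```

With a flat (single-level) gene index there is only one set of labels to return, which is why level can stay at None in that case.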

get_samples_list(level=None)

Parameters

level

load_dataframe(data, transpose, usecols, gene_index, dropna, **kwargs)

Reading table data inputs to create a DataFrame.

Parameters
  • data – either a file path, a glob file path (e.g. “table-*.tsv”), a pandas.DataFrame, or a dask DataFrame.

  • transpose (bool) – True if the table is oriented with samples as columns, else False.

  • usecols (str) – A regex string to select columns. Default None.

  • gene_index (str) – The column name that contains the gene names or IDs.

  • dropna (bool) – Whether to drop rows with null values.

Returns

The loaded dataframe.

Return type

Union[pd.DataFrame, dd.DataFrame]

load_dataframe_glob(globstring, usecols, gene_index, transpose, dropna, **kwargs)

Parameters
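  • globstring (str) – A glob file path (e.g. “table-*.tsv”).

  • usecols (str) – A regex string to select columns.

  • gene_index (str) – The column name that contains the gene names or IDs.

  • transpose (bool) – True if the table is oriented with samples as columns, else False.

  • dropna (bool) – Whether to drop rows with null values.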

Returns

dd.DataFrame
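The idea of loading several files through one glob pattern can be sketched with the standard library and pandas. Note the assumptions: the method itself returns a Dask DataFrame, whereas this stand-in reads eagerly with pandas, and the file names and contents are invented for the example.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmpdir:
    # Write two small hypothetical tables that share the same columns.
    for name, body in [
        ("table-1.tsv", "gene_name\tsample_A\tsample_B\nTP53\t1.5\t2.0\n"),
        ("table-2.tsv", "gene_name\tsample_A\tsample_B\nBRCA1\t0.0\t3.5\n"),
    ]:
        with open(os.path.join(tmpdir, name), "w") as handle:
            handle.write(body)

    # Expand the glob pattern, then read and stack the matching files.
    paths = sorted(glob.glob(os.path.join(tmpdir, "table-*.tsv")))
    frames = [pd.read_csv(path, sep="\t") for path in paths]
    combined = pd.concat(frames, ignore_index=True)
```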

classmethod name()

preprocess_table(df, usecols=None, gene_index=None, transposed=True, sort_index=False, dropna=True)

Preprocesses expression table files where columns are samples and rows are genes/transcripts.

Parameters
  • df (pd.DataFrame) – A Dask or Pandas DataFrame.

  • usecols (str) – A regular expression string for the column names to fetch.

  • gene_index (str) – The column name containing the gene/transcript names or IDs.

  • transposed (bool) – Default True. Whether to transpose the dataframe so columns are genes (features) and rows are samples.

  • sort_index (bool) – Default False.

  • dropna (bool) – Default True.

Returns

The processed DataFrame (Pandas or Dask).

Return type

Union[pd.DataFrame, dd.DataFrame]
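The parameters above suggest a sequence of steps: select columns by regex, drop rows with nulls, index by gene, and transpose. The function below is a hypothetical pandas-only approximation of that sequence, written from the parameter descriptions alone; `preprocess_sketch` is not part of openomics, and the real method may order or implement the steps differently.

```python
import pandas as pd


def preprocess_sketch(df, usecols=None, gene_index=None, transposed=True, dropna=True):
    """Hypothetical pandas-only approximation of the preprocessing steps."""
    if usecols is not None:
        # Keep columns matching the regex, plus the gene column itself.
        keep = df.filter(regex=usecols).columns.tolist()
        if gene_index is not None and gene_index not in keep:
            keep = [gene_index] + keep
        df = df[keep]
    if dropna:
        # Drop rows that contain any null value.
        df = df.dropna(axis=0)
    if gene_index is not None:
        # Use the gene names or IDs as the row labels.
        df = df.set_index(gene_index)
    if transposed:
        # Rows become samples, columns become genes (features).
        df = df.T
    return df


# Made-up input: rows are genes, columns are samples plus one text column.
raw = pd.DataFrame(
    {
        "gene_name": ["TP53", "BRCA1", "EGFR"],
        "description": ["tumor protein", "repair", "receptor"],
        "sample_A": [1.5, 0.0, None],
        "sample_B": [2.0, 3.5, 0.7],
    }
)
processed = preprocess_sketch(raw, usecols="^sample_", gene_index="gene_name")
```

In this example the regex drops the non-numeric description column, and the EGFR row is removed because it has a missing value for sample_A.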

set_genes_index(index, old_index)

Parameters
  • index (str) –

  • old_index (str) –