MultiOmics

class openomics.MultiOmics(cohort_name, omics_data=None)

Bases: object

A data object which holds multiple -omics data for a single clinical cohort.

Methods Summary

add_clinical_data(clinical, **kwargs)

Add a ClinicalData instance to the MultiOmics instance.

add_omic(omic_data[, init_annotations])

Adds an omic object to the MultiOmics instance such that the samples in the omic match the samples existing in the other omics.

build_samples([agg_by])

Running this function will build a dataframe for all samples across the different omics (either by a union or intersection).

get_omics_list()

load_data(omics[, target, ...])

Prepare the multiomics data in (X, y) format.

match_samples(omics)

Return the index of sample IDs of the intersection of samples from all modalities

print_sample_sizes()

remove_duplicate_genes()

Removes duplicate genes between any omics such that the gene index across all omics has no duplicates.

Methods Documentation

add_clinical_data(clinical, **kwargs)

Add a ClinicalData instance to the MultiOmics instance.

Parameters

clinical (ClinicalData) – The ClinicalData instance to add.

add_omic(omic_data, init_annotations=True)

Adds an omic object to the MultiOmics instance such that the samples in the omic match the samples existing in the other omics.

Parameters
  • omic_data (Expression) – The omic to add, e.g., MessengerRNA, MicroRNA, LncRNA, etc.

  • init_annotations (bool) – If True, initializes the annotation dataframe in the omic object. Default is True.

build_samples(agg_by='union')

Running this function will build a dataframe for all samples across the different omics (either by a union or intersection).

Parameters

agg_by (str) – How to combine samples across omics, either “union” or “intersection”. Default is “union”.
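The difference between the two agg_by modes can be illustrated with plain pandas. The sketch below is not the openomics implementation: the helper combine_sample_ids and the toy dataframes are hypothetical, and it assumes each modality is a DataFrame indexed by sample ID.

```python
from functools import reduce

import pandas as pd


def combine_sample_ids(omics, agg_by="union"):
    """Combine the sample indexes of several modalities.

    Hypothetical helper, not part of openomics.
    `omics` maps a modality name to a DataFrame indexed by sample ID.
    """
    if agg_by not in ("union", "intersection"):
        raise ValueError(f"agg_by must be 'union' or 'intersection', got {agg_by!r}")
    indexes = [df.index for df in omics.values()]
    if agg_by == "union":
        # Every sample seen in at least one modality.
        return reduce(lambda a, b: a.union(b), indexes)
    # Only samples present in every modality.
    return reduce(lambda a, b: a.intersection(b), indexes)


# Toy data: two modalities that share only sample "s2".
mrna = pd.DataFrame({"TP53": [1.0, 2.0, 3.0]}, index=["s1", "s2", "s3"])
mirna = pd.DataFrame({"mir-21": [0.5, 0.7]}, index=["s2", "s4"])
omics = {"MessengerRNA": mrna, "MicroRNA": mirna}

all_samples = combine_sample_ids(omics, agg_by="union")
shared_samples = combine_sample_ids(omics, agg_by="intersection")
```

With a union, samples missing from a modality still appear in the combined sample list; with an intersection, only fully profiled samples remain.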

get_omics_list()

load_data(omics, target=['pathologic_stage'], pathologic_stages=None, histological_subtypes=None, predicted_subtypes=None, tumor_normal=None, samples_barcode=None, remove_duplicates=True)

Prepare the multiomics data in (X, y) format.

Parameters
  • omics (list) – A list of the data modalities to load. Pass “all” to select all modalities.

  • target (list) – The clinical data fields to include in the labels y. Default is [‘pathologic_stage’].

  • pathologic_stages (list) – Only fetch samples having certain stages in their corresponding patient’s clinical data. For instance, [“Stage I”, “Stage II”] will only fetch samples from Stage I and Stage II patients. Default is None, which fetches all pathologic stages.

  • histological_subtypes – A list specifying the histological subtypes to fetch. Default is None, which fetches all histological subtypes.

  • predicted_subtypes – A list specifying the predicted subtypes to fetch. Default is None, which fetches all predicted subtypes.

  • tumor_normal – A list of sample types to fetch, drawn from [“Tumor”, “Normal”]. Default is None, which fetches both tumor and normal samples.

  • samples_barcode – A list of sample barcodes. If not None, only fetch data for samples whose barcodes appear in this list.

  • remove_duplicates (bool) – If True, only selects samples with non-duplicated index.

Returns

Returns (X, y), where X is a dictionary containing the multiomics data with matched samples, and y contains the target labels for those samples.

Return type

Tuple[Dict[str, pd.DataFrame], pd.DataFrame]
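To make the shape of the returned (X, y) pair concrete, here is a minimal pandas sketch. It is not the library's code: the helper select_matched is hypothetical, and it assumes the clinical table is indexed by sample ID, whereas the real class resolves samples to patients internally.

```python
import pandas as pd


def select_matched(omics, clinical, target, pathologic_stages=None):
    """Hypothetical helper (not openomics code) returning (X, y).

    omics: dict of modality name -> DataFrame indexed by sample ID.
    clinical: DataFrame indexed by sample ID holding the label columns.
    """
    if pathologic_stages:
        # Keep only samples whose stage is in the requested list.
        clinical = clinical[clinical["pathologic_stage"].isin(pathologic_stages)]
    # Keep only samples present in the clinical table and in every modality.
    matched = clinical.index
    for df in omics.values():
        matched = matched.intersection(df.index)
    X = {name: df.loc[matched] for name, df in omics.items()}
    y = clinical.loc[matched, target]
    return X, y


clinical = pd.DataFrame(
    {"pathologic_stage": ["Stage I", "Stage II", "Stage III"]},
    index=["s1", "s2", "s3"],
)
omics = {
    "MessengerRNA": pd.DataFrame({"TP53": [1.0, 2.0, 3.0]}, index=["s1", "s2", "s3"]),
    "MicroRNA": pd.DataFrame({"mir-21": [0.5, 0.7]}, index=["s2", "s3"]),
}

X, y = select_matched(
    omics,
    clinical,
    target=["pathologic_stage"],
    pathologic_stages=["Stage I", "Stage II"],
)
```

Here every dataframe in X and the label dataframe y share the same sample index, which is the property the return description above relies on.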

match_samples(omics)

Return the index of sample IDs of the intersection of samples from all modalities

Parameters

omics – An array of modalities

Returns

matched_samples – A pandas Index of the sample IDs shared by all modalities.

Return type

Index

print_sample_sizes()

remove_duplicate_genes()

Removes duplicate genes between any omics such that the gene index across all omics has no duplicates.
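One way to picture the effect is the pandas sketch below. It is an illustration under assumptions, not the openomics implementation: the helper drop_cross_omic_duplicates is hypothetical, genes are assumed to be dataframe columns that are unique within each modality, and a duplicated gene is kept in the first modality that contains it (the source does not say which copy is retained).

```python
import pandas as pd


def drop_cross_omic_duplicates(omics):
    """Hypothetical helper (not openomics code).

    omics: dict of modality name -> DataFrame whose columns are gene names.
    A gene is kept in the first modality where it appears and dropped
    from every later modality.
    """
    seen = set()
    result = {}
    for name, df in omics.items():
        keep = [gene for gene in df.columns if gene not in seen]
        seen.update(keep)
        result[name] = df[keep]
    return result


omics = {
    "MessengerRNA": pd.DataFrame({"TP53": [1.0], "MALAT1": [2.0]}, index=["s1"]),
    "LncRNA": pd.DataFrame({"MALAT1": [3.0], "HOTAIR": [4.0]}, index=["s1"]),
}

deduplicated = drop_cross_omic_duplicates(omics)
all_genes = [gene for df in deduplicated.values() for gene in df.columns]
```

After the call, the concatenated gene list across modalities contains each gene name once.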