MultiOmics

class openomics.MultiOmics(cohort_name, omics_data=None)

Bases: object

A data object which holds multiple -omics data for a single clinical cohort.

Methods Summary

`add_clinical_data`(clinical, **kwargs)	Add a ClinicalData instance to the MultiOmics instance.
`add_omic`(omic_data[, init_annotations])	Adds an omic object to the MultiOmics instance such that the samples in the omic match the samples existing in the other omics.
`build_samples`([agg_by])	Builds a dataframe of all samples across the different omics (by either union or intersection).
`get_omics_list`()
`load_data`(omics[, target, ...])	Prepare the multiomics data as an (X, y) pair of matched-sample data and target labels.
`match_samples`(omics)	Return the index of sample IDs in the intersection of samples from all modalities.
`print_sample_sizes`()
`remove_duplicate_genes`()	Removes duplicate genes between any omics such that the gene index across all omics has no duplicates.

Methods Documentation

add_clinical_data(clinical, **kwargs)

Add a ClinicalData instance to the MultiOmics instance.

Parameters

clinical (openomics.clinical.ClinicalData) –
**kwargs –

add_omic(omic_data, init_annotations=True)

Adds an omic object to the MultiOmics instance such that the samples in the omic match the samples existing in the other omics.

Parameters

omic_data (Expression) – The omic to add, e.g., MessengerRNA, MicroRNA, LncRNA, etc.
init_annotations (bool) – If True, initializes the annotation dataframe in the omic object. Default is True.

build_samples(agg_by='union')

Builds a dataframe of all samples across the different omics, combined by either a union or an intersection.

Parameters: agg_by (str) – Either “union” or “intersection”. Default is “union”.
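
The difference between the two `agg_by` modes can be illustrated with plain pandas indexes. This is a minimal sketch of the set logic only, not the library's implementation: the helper `aggregate_samples` and the sample IDs are hypothetical.

```python
import pandas as pd

# Hypothetical sample IDs for two omics (not taken from any real cohort).
mrna_samples = pd.Index(["S1", "S2", "S3"])
mirna_samples = pd.Index(["S2", "S3", "S4"])


def aggregate_samples(indexes, agg_by="union"):
    """Combine several sample indexes by union or intersection (illustrative helper)."""
    if agg_by not in ("union", "intersection"):
        raise ValueError(f"agg_by must be 'union' or 'intersection', got {agg_by!r}")
    combined = indexes[0]
    for idx in indexes[1:]:
        if agg_by == "union":
            combined = combined.union(idx)
        else:
            combined = combined.intersection(idx)
    return combined


# Union keeps every sample seen in any omic.
union_samples = aggregate_samples([mrna_samples, mirna_samples], agg_by="union")
# Intersection keeps only samples present in every omic.
shared_samples = aggregate_samples([mrna_samples, mirna_samples], agg_by="intersection")
```

A union is useful when missing modalities per sample are acceptable, while an intersection guarantees that every retained sample has data in all omics.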

get_omics_list()

load_data(omics, target=['pathologic_stage'], pathologic_stages=None, histological_subtypes=None, predicted_subtypes=None, tumor_normal=None, samples_barcode=None, remove_duplicates=True)

Prepare the multiomics data as an (X, y) pair of matched-sample data and target labels.

Parameters

omics (list) – A list of the data modalities to load. Pass “all” to select all modalities.
target (list) – The clinical data fields to include in the returned labels y.
pathologic_stages (list) – Only fetch samples having certain stages in their corresponding patient’s clinical data. For instance, [“Stage I”, “Stage II”] will only fetch samples from Stage I and Stage II patients. Default is None, which fetches all pathologic stages.
histological_subtypes – A list specifying the histological subtypes to fetch. Default is None, which fetches all histological subtypes.
predicted_subtypes – A list specifying the predicted subtypes to fetch. Default is None, which fetches all predicted subtypes.
tumor_normal – A list of sample types to fetch, chosen from [“Tumor”, “Normal”]. Default is None, which fetches both tumor and normal samples.
samples_barcode – A list of sample barcodes. If not None, only fetch data for samples matching the barcodes in this list.
remove_duplicates (bool) – If True, only selects samples with non-duplicated index.

Returns

Returns (X, y), where X is a dictionary containing the multiomics data with matched samples, and y contains the target labels for those samples.

Return type

Tuple[Dict[str, pd.DataFrame], pd.DataFrame]
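
The shape of the returned (X, y) pair and the effect of a stage filter can be sketched with plain pandas. The following is an illustration under stated assumptions, not the library's code: the helper `prepare_xy`, the toy tables, and the gene names are hypothetical, and the sketch assumes omics tables with samples as rows.

```python
import pandas as pd

# Hypothetical clinical table indexed by sample ID.
clinical = pd.DataFrame(
    {"pathologic_stage": ["Stage I", "Stage II", "Stage III", "Stage I"]},
    index=["S1", "S2", "S3", "S4"],
)
# Hypothetical omics tables: rows are samples, columns are genes.
omics = {
    "MessengerRNA": pd.DataFrame({"GENE_A": [1.0, 2.0, 3.0]}, index=["S1", "S2", "S3"]),
    "MicroRNA": pd.DataFrame({"MIR_1": [0.1, 0.2, 0.3]}, index=["S2", "S3", "S4"]),
}


def prepare_xy(omics, clinical, target, pathologic_stages=None):
    """Return (X, y) restricted to samples shared by all tables (illustrative helper)."""
    # Keep only samples present in the clinical table and in every modality.
    samples = clinical.index
    for table in omics.values():
        samples = samples.intersection(table.index)
    kept = clinical.loc[samples]
    # Optional stage filter; None or an empty list keeps every stage.
    if pathologic_stages:
        kept = kept[kept["pathologic_stage"].isin(pathologic_stages)]
    y = kept[target]
    # Align every modality to the rows of y.
    X = {name: table.loc[y.index] for name, table in omics.items()}
    return X, y


X, y = prepare_xy(omics, clinical, target=["pathologic_stage"], pathologic_stages=["Stage II"])
```

Here X is a dictionary with one DataFrame per modality and y is a DataFrame of labels, and all of them share the same sample index.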

match_samples(omics)

Return the index of sample IDs in the intersection of samples from all modalities.

Parameters: omics – An array of modalities
Returns: matched_samples, a pandas Index of the matched sample IDs
Return type: Index

print_sample_sizes()

remove_duplicate_genes()

Removes duplicate genes between any omics such that the gene index across all omics has no duplicates.
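
The description does not say which copy of a duplicated gene is kept. As a minimal sketch of one possible interpretation, assuming genes are the columns of an expression table and the first occurrence is retained, duplicate columns can be dropped with pandas as follows. The table and gene names are hypothetical.

```python
import pandas as pd

# Hypothetical expression table in which GENE_A appears twice as a column.
expr = pd.DataFrame(
    [[1.0, 2.0, 9.0], [3.0, 4.0, 8.0]],
    index=["S1", "S2"],
    columns=["GENE_A", "GENE_B", "GENE_A"],
)

# columns.duplicated(keep="first") flags every repeat after the first occurrence;
# negating the mask keeps exactly one column per gene name.
deduplicated = expr.loc[:, ~expr.columns.duplicated(keep="first")]
```

After this step the gene index is unique, so label-based lookups such as `deduplicated["GENE_A"]` return a single column.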