MultiOmics

class openomics.multiomics.MultiOmics(cohort_name, omics_data=None)

Bases: object

A data object which holds multiple -omics data for a single clinical cohort.

Methods Summary

`add_clinical_data`(clinical, **kwargs)	Add a ClinicalData instance to the MultiOmics instance.
`add_omic`(omic_data[, initialize_annotations])	Adds an omic object to the MultiOmics instance such that the samples in the omic match the samples existing in the other omics.
`annotate_samples`(dictionary)	Adds a “predicted_subtype” field to the patients’ clinical data.
`build_samples`([agg_by])	Running this function will build a dataframe for all samples across the different omics (either by a union or intersection).
`get_omics_list`()
`get_sample_attributes`(matched_samples)	Fetch the patients’ clinical data for each of the sample barcodes in matched_samples.
`load_data`(omics[, target, …])	Parameter omics: a list of the data modalities to load. Default “all”.
`match_samples`(omics)	Return the index of bcr_sample_barcodes of the intersection of samples from all modalities
`print_sample_sizes`()
`remove_duplicate_genes`()	Removes duplicate genes between any omics such that the gene index across all omics has no duplicates.

Methods Documentation

add_clinical_data(clinical, **kwargs)

Add a ClinicalData instance to the MultiOmics instance.

Parameters: clinical (openomics.ClinicalData) – The ClinicalData instance to add.

add_omic(omic_data, initialize_annotations=True)

Adds an omic object to the MultiOmics instance such that the samples in the omic match the samples existing in the other omics.

Parameters

omic_data (Expression) – The omic to add, e.g., MessengerRNA, MicroRNA, LncRNA, etc.
initialize_annotations (bool) – Default True. If True, initializes the annotation dataframe in the omic object.

annotate_samples(dictionary)

This function adds a “predicted_subtype” field to the patients’ clinical data. For instance, if patients were classified into subtypes based on their expression profile using k-means, then, to use this function, do:

annotate_samples(dict(zip(<patient index>, <list of corresponding patients’ subtypes>)))

Adding a field to the patients’ clinical data allows openomics to query the patients’ data through the .load_data(predicted_subtypes=[]) parameter.

Parameters: dictionary – A dictionary mapping each patient’s index to a subtype.
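As a minimal sketch of how such a dictionary can be assembled, the example below uses plain pandas with a stand-in clinical table. The table, the patient barcodes, and the subtype labels are hypothetical, and the final column assignment only imitates the effect described above; it is not the openomics implementation.

```python
import pandas as pd

# Hypothetical clinical table indexed by patient barcode (stand-in data).
clinical = pd.DataFrame(
    {"pathologic_stage": ["Stage I", "Stage II", "Stage I"]},
    index=["P1", "P2", "P3"],
)

# Cluster labels from any classifier (e.g. k-means), one per patient,
# in the same order as the clinical index.
labels = ["subtype_A", "subtype_B", "subtype_A"]

# The kind of dictionary annotate_samples takes: patient index -> subtype.
subtype_by_patient = dict(zip(clinical.index, labels))

# Stand-in for the described effect: a new "predicted_subtype" column,
# aligned to the clinical table by patient index.
clinical["predicted_subtype"] = pd.Series(subtype_by_patient)
```

With a real MultiOmics instance, the dictionary built here would be the argument passed to annotate_samples.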

build_samples(agg_by='union')

Running this function will build a dataframe for all samples across the different omics (either by a union or intersection).

Parameters: agg_by (str) – One of “union” or “intersection”. Default “union”.
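The difference between the two agg_by options can be illustrated with plain pandas indexes. This is a sketch under assumptions: the helper combine_sample_indices and the sample barcodes are hypothetical, and the code shows only the set logic, not how openomics builds its samples dataframe.

```python
import pandas as pd

def combine_sample_indices(indices, agg_by="union"):
    """Combine per-omic sample indexes by union or intersection (hypothetical helper)."""
    if agg_by not in ("union", "intersection"):
        raise ValueError('agg_by must be "union" or "intersection"')
    combined = indices[0]
    for idx in indices[1:]:
        if agg_by == "union":
            combined = combined.union(idx)
        else:
            combined = combined.intersection(idx)
    return combined

# Hypothetical sample barcodes for two omics.
mrna_samples = pd.Index(["S1", "S2", "S3"])
mirna_samples = pd.Index(["S2", "S3", "S4"])

# "union" keeps every sample seen in any omic.
all_samples = combine_sample_indices([mrna_samples, mirna_samples], agg_by="union")

# "intersection" keeps only samples present in every omic.
shared_samples = combine_sample_indices([mrna_samples, mirna_samples], agg_by="intersection")
```

A union keeps samples that are missing from some omics, so downstream code has to tolerate missing values; an intersection avoids that at the cost of discarding samples.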

get_omics_list()

get_sample_attributes(matched_samples)

Fetch the patients’ clinical data for each of the sample barcodes in matched_samples.

Parameters: matched_samples – A list of sample barcodes

Returns: samples_index: Index of samples

load_data(omics, target=['pathologic_stage'], pathologic_stages=None, histological_subtypes=None, predicted_subtypes=None, tumor_normal=None, samples_barcode=None)

Parameters

omics (list) – A list of the data modalities to load. Default “all” to select all modalities
target (list) – The clinical data fields to include in the
pathologic_stages (list) – Only fetch samples having certain stages in their corresponding patient’s clinical data. For instance, [“Stage I”, “Stage II”] will only fetch samples from Stage I and Stage II patients. Default is [] which fetches all pathologic stages.
histological_subtypes – A list specifying the histological subtypes to fetch. Default is [] which fetches all histological subtypes.
predicted_subtypes – A list specifying the predicted subtypes to fetch. Default is [] which fetches all predicted subtypes.
tumor_normal – [“Tumor”, “Normal”]. Default is [], which fetches all tumor or normal sample types.
samples_barcode – A list of sample barcodes. If not None, only fetch data with matching samples provided in this list.

Returns

X, a dictionary containing the multi-omics data for which data is available

Return type

(X, y)
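The filtering described by the parameters above can be sketched with plain pandas. Everything in this example is an assumption made for illustration: the helper select_samples, the stand-in tables, and the choice to keep only samples present in every omic. It mirrors the documented (X, y) shape but is not the openomics implementation.

```python
import pandas as pd

# Hypothetical clinical table and omics tables, all indexed by sample barcode.
clinical = pd.DataFrame(
    {"pathologic_stage": ["Stage I", "Stage II", "Stage III", "Stage I"]},
    index=["S1", "S2", "S3", "S4"],
)
omics = {
    "MessengerRNA": pd.DataFrame(
        {"GENE_A": [1.0, 2.0, 3.0, 4.0]}, index=["S1", "S2", "S3", "S4"]
    ),
    "MicroRNA": pd.DataFrame({"MIR_1": [0.1, 0.2, 0.3]}, index=["S1", "S2", "S3"]),
}

def select_samples(omics, clinical, target, pathologic_stages=None):
    """Return (X, y) restricted to the requested stages (hypothetical helper)."""
    keep = clinical.index
    if pathologic_stages:
        # Keep only samples whose stage is in the requested list.
        keep = clinical.index[clinical["pathologic_stage"].isin(pathologic_stages)]
    # Assumption: keep only samples that appear in every omic.
    for table in omics.values():
        keep = keep.intersection(table.index)
    X = {name: table.loc[keep] for name, table in omics.items()}
    y = clinical.loc[keep, target]
    return X, y

X, y = select_samples(
    omics,
    clinical,
    target=["pathologic_stage"],
    pathologic_stages=["Stage I", "Stage II"],
)
```

Here X maps each omic name to its filtered table and y holds the requested clinical fields for the same samples.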

match_samples(omics)

Return the index of bcr_sample_barcodes of the intersection of samples from all modalities

Parameters: omics – An array of modalities
Returns: matched_samples, a pandas Index
Return type: Index

print_sample_sizes()

remove_duplicate_genes()

Removes duplicate genes between any omics such that the gene index across all omics has no duplicates.
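One way to read "no duplicates across omics" is sketched below with plain pandas. The helper drop_genes_seen_earlier and the gene tables are hypothetical, and the rule that the first omic to contain a gene keeps it is an assumption; the source does not say which omic retains a duplicated gene.

```python
import pandas as pd

def drop_genes_seen_earlier(omics):
    """Drop gene columns that already appeared in an earlier omic (hypothetical helper)."""
    seen = set()
    deduplicated = {}
    for name, table in omics.items():
        # Keep only genes not already claimed by a previous omic.
        unique_genes = [gene for gene in table.columns if gene not in seen]
        deduplicated[name] = table[unique_genes]
        seen.update(unique_genes)
    return deduplicated

# Hypothetical tables: samples as rows, genes as columns.
omics = {
    "MessengerRNA": pd.DataFrame({"TP53": [1.0, 2.0], "MALAT1": [3.0, 4.0]}, index=["S1", "S2"]),
    "LncRNA": pd.DataFrame({"MALAT1": [5.0, 6.0], "XIST": [7.0, 8.0]}, index=["S1", "S2"]),
}

deduplicated = drop_genes_seen_earlier(omics)
```

After the call, MALAT1 remains only in the first table, so the combined gene index across both tables has no repeats.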