Database

class openomics.database.base.Database(path, file_resources=None, index_col=None, keys=None, usecols=None, col_rename=None, blocksize=None, verbose=False, **kwargs)

Bases: object

This is a base class used to instantiate an external Database from a set of files, given either as local paths or as URLs. When a Database is created, load_dataframe() is called: it uses file_resources to load (Pandas or Dask) DataFrames, then performs data wrangling to yield a DataFrame stored at self.data. This class also provides an interface for -omics tables, e.g. ExpressionData, to retrieve annotations, expressions, sequences, and disease associations.

Attributes Summary

COLUMNS_RENAME_DICT

Methods Summary

`close`()
`get_annotations`(on, columns[, agg, agg_for, ...])	Returns the Database's DataFrame indexed by the column(s) given in on, after applying a groupby operation that aggregates all other columns by concatenating their unique values.
`get_expressions`(index)
`list_databases`()
`load_dataframe`(file_resources[, blocksize])	Handles data preprocessing given the file_resources input, and returns a DataFrame.
`name`()

Attributes Documentation

COLUMNS_RENAME_DICT = None

Methods Documentation

close()

get_annotations(on, columns, agg='unique', agg_for=None, keys=None)

Returns the Database’s DataFrame indexed by the column(s) given in on, after applying a groupby operation that aggregates all other columns by concatenating their unique values.

Parameters

on (str, list) – The column name(s) of the DataFrame to group by.
columns (list) – A list of column names to aggregate.
agg (str) – Aggregation function applied when an index key has more than one value, e.g. one of [‘first’, ‘last’, ‘sum’, ‘mean’, ‘size’, ‘concat’]. Default ‘unique’, per the signature.
agg_for (Dict[str, Any]) – Overrides agg for specific columns: a dict mapping column names to the aggregation function to use for that column.
keys (pd.Index) – The values on the index column to filter before performing the groupby-agg operations.

Returns

A filtered, grouped, and aggregated DataFrame to be used for annotation.

Return type

DataFrame
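The filter, groupby, and aggregate steps described above can be illustrated with plain pandas. This is a minimal sketch, not the openomics implementation: the function sketch_get_annotations, the helper concat_unique, the "|" separator, and the toy column names are all assumptions made for illustration, and on is restricted to a single column name here.

```python
import pandas as pd


def concat_unique(series):
    # Hypothetical helper: join the distinct non-null values with "|".
    return "|".join(sorted(series.dropna().astype(str).unique()))


def sketch_get_annotations(df, on, columns, agg=concat_unique, agg_for=None, keys=None):
    """Illustrative stand-in for a filter-groupby-aggregate annotation lookup."""
    if keys is not None:
        # Keep only rows whose `on` value is among the requested keys.
        df = df[df[on].isin(keys)]
    # Start with the same aggregation for every requested column ...
    agg_funcs = {col: agg for col in columns}
    # ... then let agg_for override it for individual columns.
    if agg_for:
        agg_funcs.update(agg_for)
    return df.groupby(on).agg(agg_funcs)


# Toy table: one row per (gene, GO term) association.
table = pd.DataFrame({
    "gene_id": ["A", "A", "B", "B", "C"],
    "go_id": ["GO:1", "GO:2", "GO:1", "GO:1", "GO:3"],
    "score": [1.0, 3.0, 2.0, 4.0, 5.0],
})

result = sketch_get_annotations(
    table,
    on="gene_id",
    columns=["go_id", "score"],
    agg_for={"score": "mean"},
    keys=pd.Index(["A", "B"]),
)
print(result)
```

The result has one row per remaining gene_id, with the go_id values collapsed into a single string and score averaged, which is the shape needed to join annotations onto another table keyed by the same identifier.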

get_expressions(index)

Parameters: index –

static list_databases()

abstract load_dataframe(file_resources, blocksize=None)

Handles data preprocessing given the file_resources input, and returns a DataFrame.

Parameters

file_resources (dict) – A dict mapping each filename to its full file path.
blocksize (int) –

Return type

DataFrame
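Because load_dataframe is abstract, each concrete database supplies its own loading logic. The sketch below mimics that pattern with a stand-in base class rather than the real one: SketchDatabase, ToyGeneTable, the genes.tsv file, and its columns are hypothetical, and blocksize is accepted but ignored because the sketch uses pandas only.

```python
import os
import tempfile
from abc import ABC, abstractmethod

import pandas as pd


class SketchDatabase(ABC):
    """Hypothetical stand-in for the base class, not the openomics code."""

    def __init__(self, file_resources, blocksize=None):
        # Loading happens at construction time; the result is kept at self.data.
        self.data = self.load_dataframe(file_resources, blocksize=blocksize)

    @abstractmethod
    def load_dataframe(self, file_resources, blocksize=None):
        """Turn the filename -> path mapping into a DataFrame."""


class ToyGeneTable(SketchDatabase):
    def load_dataframe(self, file_resources, blocksize=None):
        # Read the single tab-separated file named in file_resources.
        df = pd.read_csv(file_resources["genes.tsv"], sep="\t")
        # Light data wrangling: normalize a column name.
        return df.rename(columns={"Gene": "gene_name"})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "genes.tsv")
    with open(path, "w") as handle:
        handle.write("Gene\tchromosome\nTP53\t17\nBRCA1\t17\n")
    # Keys are filenames, values are full paths, as described above.
    db = ToyGeneTable({"genes.tsv": path})

print(db.data)
```

Instantiating SketchDatabase directly raises a TypeError, since the abstract method has no implementation; only subclasses that define load_dataframe can be constructed.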

classmethod name()