Database
- class openomics.database.base.Database(path, file_resources=None, index_col=None, keys=None, usecols=None, col_rename=None, blocksize=None, verbose=False, **kwargs)
Bases: object
This is the base class used to instantiate an external Database from a set of files, given either as local files or as URLs. When a Database is created, load_dataframe() is called: it uses file_resources to load (Pandas or Dask) DataFrames, then performs data wrangling to yield a DataFrame at self.data. This class also provides an interface for -omics tables, e.g. ExpressionData, to pull in annotations, expressions, sequences, and disease associations.
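The construct-then-load pattern described above can be sketched with a small stand-in class. This is a minimal illustration, not openomics code: ToyDatabase, the CSV format, and the in-memory file handles are all assumptions made for the example, and only the idea that load_dataframe() runs at construction time and fills self.data comes from the description.

```python
import io

import pandas as pd


class ToyDatabase:
    """Hypothetical stand-in that mimics the constructor pattern described above.

    Not the openomics implementation; names and the CSV format are assumptions.
    """

    def __init__(self, file_resources, index_col=None, col_rename=None):
        self.file_resources = file_resources
        self.index_col = index_col
        self.col_rename = col_rename or {}
        # Loading happens at construction time; the result is stored in self.data.
        self.data = self.load_dataframe(file_resources)

    def load_dataframe(self, file_resources):
        # Read every resource as CSV and stack the rows into one DataFrame.
        frames = [pd.read_csv(handle) for handle in file_resources.values()]
        df = pd.concat(frames, ignore_index=True)
        # Simple "data wrangling": rename columns, then set the index.
        df = df.rename(columns=self.col_rename)
        if self.index_col is not None:
            df = df.set_index(self.index_col)
        return df


# In-memory text stands in for a local file or a downloaded URL.
resources = {
    "genes.csv": io.StringIO("gene_id,symbol\nG1,TP53\nG2,BRCA1\n"),
}
toy_db = ToyDatabase(resources, index_col="gene_id", col_rename={"symbol": "gene_name"})
print(toy_db.data)
```

A subclass for a real data source would typically override only load_dataframe(), leaving the constructor to handle the shared arguments.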
Attributes Summary
Methods Summary
close()
get_annotations(on, columns[, agg, agg_for, ...]) – Returns the Database's DataFrame indexed by the column(s) given in on, after applying a groupby operation that aggregates all other columns by concatenating all unique values.
get_expressions(index)
load_dataframe(file_resources[, blocksize]) – Handles data preprocessing given the file_resources input, and returns a DataFrame.
name()
Attributes Documentation
Methods Documentation
- get_annotations(on, columns, agg='unique', agg_for=None, keys=None)
Returns the Database’s DataFrame indexed by the column(s) given in on, after applying a groupby operation that aggregates all other columns by concatenating all unique values.
- Parameters
on (str, list) – The column name(s) of the DataFrame to group by.
columns (list) – A list of column names to aggregate.
agg (str) – Function used to aggregate when there is more than one value for an index key, e.g. ‘first’, ‘last’, ‘sum’, ‘mean’, ‘size’, ‘concat’. The method signature gives the default as ‘unique’.
agg_for (Dict[str, Any]) – Per-column overrides of agg: a dict mapping column names to the aggregation function to use for that column.
keys (pd.Index) – The values of the index column to filter on before performing the groupby-aggregate operation.
- Returns
A filtered, grouped, and aggregated DataFrame to be used for annotation.
- Return type
values
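The filter, groupby, and aggregate steps described for get_annotations can be approximated in plain pandas. The sketch below is an assumption-laden illustration rather than the library's implementation: toy_get_annotations, the sample table, and the "|" separator used to concatenate unique values are all hypothetical, and keys is assumed to list the values to keep.

```python
import pandas as pd

# Hypothetical annotation table; column names are made up for illustration.
annotations = pd.DataFrame(
    {
        "gene_name": ["TP53", "TP53", "BRCA1", "EGFR"],
        "disease": ["cancer", "li-fraumeni", "cancer", "cancer"],
        "score": [1.0, 3.0, 2.0, 4.0],
    }
)


def concat_unique(series):
    # Join the unique values in order of first appearance ("|" is an assumption).
    return "|".join(dict.fromkeys(series.astype(str)))


def toy_get_annotations(df, on, columns, agg="unique", agg_for=None, keys=None):
    """Filter rows by keys, group by `on`, and aggregate `columns`."""
    if keys is not None:
        # Assumed meaning of keys: keep only rows whose `on` value is listed.
        df = df[df[on].isin(keys)]
    default_func = concat_unique if agg in ("unique", "concat") else agg
    # One aggregation per requested column, with per-column overrides.
    agg_funcs = {col: default_func for col in columns}
    agg_funcs.update(agg_for or {})
    return df.groupby(on).agg(agg_funcs)


result = toy_get_annotations(
    annotations,
    on="gene_name",
    columns=["disease", "score"],
    agg_for={"score": "mean"},
    keys=pd.Index(["TP53", "BRCA1"]),
)
print(result)
```

Here the disease column falls back to the default aggregation, while score is averaged because agg_for overrides it for that column only.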