Annotatable
class openomics.database.base.Annotatable
Bases: abc.ABC
This abstract class provides an interface for the -omics classes (Expression) to annotate their gene lists with external data downloaded from various databases. The database's data is imported either as attribute information in the genes' annotations or as interactions between the genes.
Attributes Summary
Methods Summary
annotate_attributes(database, on, columns[, …]) – Performs a left outer join between the annotation and the Database's DataFrame on the key column given by on.
annotate_diseases(database, index) – Annotate with disease associations from a DiseaseAssociation database.
annotate_expressions(database, index[, …]) – Annotate with expression data from database.
annotate_interactions(database, index) – Annotate with interactions from an Interactions database.
annotate_sequences(database, index[, agg, omic]) – Annotate a gene list (based on index) with a dictionary of <gene_name: sequence>.
get_rename_dict(from_index, to_index) – Retrieve a lookup dictionary to convert from one index to another, e.g., gene_id to gene_name, obtained from two columns in the DataFrame.
initialize_annotations(index[, gene_list]) – Initialize the annotations from index and an optional gene_list.
set_index(new_index) – Resets the index to new_index (str).
Attributes Documentation
Methods Documentation
annotate_attributes(database, on, columns, agg='concat', fuzzy_match=False)
Performs a left outer join between the annotation and the Database's DataFrame on the key column given by on. The on argument must be a column present in both DataFrames. If the join produces overlapping columns, .fillna() is used to fill NaN values in the old column with non-NaN values from the new column.
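The join-and-fill behaviour described above can be approximated in plain pandas. This is a minimal sketch, not the openomics implementation: the helper left_join_fill is hypothetical, and it assumes both DataFrames hold the key as an ordinary column and that the database side has at most one row per key (duplicate keys are what the agg parameter deals with).

```python
import pandas as pd


def left_join_fill(annotations: pd.DataFrame, db_df: pd.DataFrame,
                   on: str, columns: list) -> pd.DataFrame:
    """Hypothetical helper: left-join `columns` of db_df onto annotations by `on`."""
    # Keep only the join key plus the requested columns from the database side.
    right = db_df[[on] + [c for c in columns if c != on]]
    # Left outer join: every annotation row is kept; overlapping database
    # columns get a "_new" suffix instead of overwriting the existing ones.
    merged = annotations.merge(right, on=on, how="left", suffixes=("", "_new"))
    for col in columns:
        new_col = col + "_new"
        if new_col in merged.columns:
            # Overlap: fill NaNs in the old column with non-NaN values from the new one.
            merged[col] = merged[col].fillna(merged[new_col])
            merged = merged.drop(columns=new_col)
    return merged


annotations = pd.DataFrame({"gene_name": ["A", "B", "C"], "length": [10, None, None]})
db_df = pd.DataFrame({"gene_name": ["B", "C", "D"],
                      "length": [20, 30, 40],
                      "chrom": ["1", "2", "3"]})
print(left_join_fill(annotations, db_df, on="gene_name", columns=["length", "chrom"]))
```

Gene A keeps its existing length, B and C get their missing lengths filled from the database, and the new chrom column is NaN for A because it has no database row.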
Parameters
database (Database) – Database which contains a DataFrame.
on (str) – The column name, present in both the annotations and the Database's DataFrame, on which to perform the join.
columns ([str]) – A list of column names to join to the annotation.
agg (str) – The aggregation function to apply when there is more than one value for each index instance, e.g. one of ['first', 'last', 'sum', 'mean', 'concat']. Default 'concat'.
fuzzy_match (bool) – Whether to join the annotation by applying a fuzzy match on the index with difflib.get_close_matches(). This is very computationally expensive and should be used sparingly. Default False.
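The agg and fuzzy_match options can be illustrated with a pandas groupby and difflib.get_close_matches(). In this sketch the helpers aggregate_by_key and fuzzy_map_keys are hypothetical, and the "|" separator used for 'concat' and the 0.8 similarity cutoff are assumptions rather than the choices openomics actually makes.

```python
import difflib

import pandas as pd


def aggregate_by_key(db_df: pd.DataFrame, on: str, columns: list,
                     agg: str = "concat") -> pd.DataFrame:
    """Hypothetical helper: collapse db_df to one row per value of `on`."""
    grouped = db_df.groupby(on)[columns]
    if agg == "concat":
        # Assumed behaviour: join the distinct non-null values with "|".
        return grouped.agg(lambda s: "|".join(sorted(set(s.dropna().astype(str)))))
    # "first", "last", "sum", and "mean" are built-in pandas groupby aggregations.
    return grouped.agg(agg)


def fuzzy_map_keys(keys: list, candidates: list, cutoff: float = 0.8) -> dict:
    """Hypothetical helper: map each key to its closest candidate, or None."""
    mapping = {}
    for key in keys:
        # get_close_matches compares `key` against every candidate string.
        matches = difflib.get_close_matches(key, candidates, n=1, cutoff=cutoff)
        mapping[key] = matches[0] if matches else None
    return mapping


db_df = pd.DataFrame({"gene_name": ["A", "A", "B"],
                      "go_term": ["x", "y", "z"],
                      "score": [1.0, 3.0, 5.0]})
print(aggregate_by_key(db_df, "gene_name", ["go_term"]))
print(aggregate_by_key(db_df, "gene_name", ["score"], agg="mean"))
print(fuzzy_map_keys(["TP53a", "XYZ"], ["TP53", "BRCA1"]))
```

Because fuzzy_map_keys compares every key against every candidate, its cost grows with the product of the two list sizes, which is consistent with the warning that fuzzy matching is expensive.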
annotate_diseases(database, index)
Parameters
database (DiseaseAssociation)
index (str)
annotate_expressions(database, index, fuzzy_match=False)
Annotate with expression data from database.
Parameters
database
index
fuzzy_match – Default False.
annotate_interactions(database, index)
Parameters
database (Interactions)
index (str)
annotate_sequences(database, index, agg='longest', omic=None, **kwargs)
Annotate a gene list (based on index) with a dictionary of <gene_name: sequence>. If a gene name has multiple sequences, they are aggregated according to agg.
Parameters
database (Database) – The database to fetch sequences from.
index (str) – The gene index column name.
agg (str) – The aggregation method, one of "longest", "shortest", or "all". Default "longest".
omic (str) – The omic type to fetch sequences for. Default None.
**kwargs
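The "longest", "shortest", and "all" aggregation modes can be sketched with the standard library alone. The helper aggregate_sequences and its input format (an iterable of (gene_name, sequence) pairs) are hypothetical; how openomics actually fetches sequences from the database, and what it does with omic and **kwargs, is not shown here.

```python
from typing import Dict, Iterable, List, Tuple, Union


def aggregate_sequences(pairs: Iterable[Tuple[str, str]],
                        agg: str = "longest") -> Dict[str, Union[str, List[str]]]:
    """Hypothetical helper: aggregate (gene_name, sequence) pairs per gene."""
    by_gene: Dict[str, List[str]] = {}
    for gene, seq in pairs:
        # Collect every sequence recorded for the same gene name.
        by_gene.setdefault(gene, []).append(seq)
    if agg == "longest":
        return {gene: max(seqs, key=len) for gene, seqs in by_gene.items()}
    if agg == "shortest":
        return {gene: min(seqs, key=len) for gene, seqs in by_gene.items()}
    if agg == "all":
        # Keep every sequence for each gene as a list.
        return dict(by_gene)
    raise ValueError(f"unknown agg: {agg!r}")


pairs = [("A", "ACGT"), ("A", "AC"), ("B", "GGG")]
print(aggregate_sequences(pairs, agg="longest"))
print(aggregate_sequences(pairs, agg="all"))
```

With "longest" or "shortest" each gene maps to a single string, while "all" maps each gene to a list, so callers must handle both shapes.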
get_rename_dict(from_index, to_index)
Used to retrieve a lookup dictionary to convert from one index to another, e.g., gene_id to gene_name, obtained from two columns in the DataFrame.
Returns
Dict[str, str]: the lookup dictionary.
Parameters
from_index (str) – The column of the DataFrame whose values become the dictionary keys.
to_index – The column of the DataFrame whose values become the dictionary values.
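A lookup dictionary like the one get_rename_dict returns can be built from two columns with pandas. The helper rename_dict is hypothetical, and dropping rows with missing values and keeping the first mapping for a duplicated key are assumptions about how such conflicts might be resolved, not documented openomics behaviour.

```python
import pandas as pd


def rename_dict(df: pd.DataFrame, from_index: str, to_index: str) -> dict:
    """Hypothetical helper: build {from_index value: to_index value} from two columns."""
    # Drop rows where either column is missing, then keep the first mapping per key.
    pairs = df[[from_index, to_index]].dropna().drop_duplicates(subset=from_index)
    return dict(zip(pairs[from_index], pairs[to_index]))


df = pd.DataFrame({"gene_id": ["ENSG1", "ENSG2", "ENSG2", None],
                   "gene_name": ["TP53", "BRCA1", "BRCA1-alt", "X"]})
lookup = rename_dict(df, "gene_id", "gene_name")
print(lookup)
print(pd.Series(["ENSG2", "ENSG1"]).map(lookup))
```

The resulting dictionary can be passed to pandas.Series.map(), for example to convert a column of gene IDs to gene names.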