SequenceDatabase
class openomics.database.sequence.SequenceDatabase(replace_U2T=False, **kwargs)
Bases: openomics.database.base.Database
Provides a series of methods to extract sequence data from SequenceDataset.
Methods Summary
get_aggregator([agg]): Returns a function used to aggregate a list of sequences from a groupby on a given key.
get_sequences(index, omic, agg_sequences, …): Returns a dictionary where keys are ‘index’ and values are sequence(s).
read_fasta(fasta_file, replace_U2T[, …]): Returns a pandas DataFrame containing the FASTA sequence entries.
Methods Documentation
static get_aggregator(agg=None)
Returns a function used to aggregate a list of sequences from a groupby on a given key.
- Parameters
agg – One of (“all”, “shortest”, “longest”), default “all”. If “all”, then for all
abstract get_sequences(index, omic, agg_sequences, **kwargs)
Returns a dictionary where keys are ‘index’ and values are sequence(s).
- Parameters
index (str) – {“gene_id”, “gene_name”, “transcript_id”, “transcript_name”}
omic (str) – {“lncRNA”, “microRNA”, “messengerRNA”}
agg_sequences (str) – {“all”, “shortest”, “longest”}
**kwargs – any additional argument to pass to SequenceDataset.get_sequences()
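The interaction between index and agg_sequences can be sketched in plain Python. This is an illustration only, not the library's implementation: make_aggregator and group_sequences are hypothetical names, the rows are made-up data, and the assumption that “shortest” and “longest” pick a single sequence by length while “all” keeps the whole list is inferred from the option names.

```python
from collections import defaultdict


def make_aggregator(agg="all"):
    """Hypothetical stand-in for get_aggregator (illustration only)."""
    if agg == "all":
        return list  # keep every sequence in the group
    if agg == "shortest":
        return lambda seqs: min(seqs, key=len)
    if agg == "longest":
        return lambda seqs: max(seqs, key=len)
    raise ValueError(f"agg must be 'all', 'shortest' or 'longest', got {agg!r}")


def group_sequences(rows, index, agg="all"):
    """Group rows by the `index` field, then aggregate each group's sequences."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[index]].append(row["sequence"])
    aggregate = make_aggregator(agg)
    return {key: aggregate(seqs) for key, seqs in grouped.items()}


# Made-up example rows: two transcripts of GENE_A, one of GENE_B.
rows = [
    {"gene_name": "GENE_A", "transcript_id": "T1", "sequence": "ACGTACGT"},
    {"gene_name": "GENE_A", "transcript_id": "T2", "sequence": "ACGT"},
    {"gene_name": "GENE_B", "transcript_id": "T3", "sequence": "GGGCCC"},
]

longest = group_sequences(rows, index="gene_name", agg="longest")
shortest = group_sequences(rows, index="gene_name", agg="shortest")
everything = group_sequences(rows, index="gene_name", agg="all")
```

With index set to “transcript_id” each key would map to exactly one sequence here, so the choice of aggregation only matters when several entries share a key.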
abstract read_fasta(fasta_file, replace_U2T, npartitions=None)
Returns a pandas DataFrame containing the FASTA sequence entries, with a column named ‘sequence’.
- Parameters
fasta_file (str) – path to the fasta file, usually as self.file_resources[<file_name>]
replace_U2T (bool) –
npartitions –
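Because read_fasta is abstract, each subclass supplies its own parser. The sketch below shows roughly what such a parser might do, using only the standard library. parse_fasta_records is a hypothetical helper, the “id” field name is invented for illustration, and treating replace_U2T as "convert U to T in each sequence" is an assumption based on the parameter name, since the description above leaves it blank.

```python
import io


def parse_fasta_records(handle, replace_U2T=False):
    """Hypothetical FASTA parser returning one dict per entry (illustration only)."""
    records = []
    header, chunks = None, []

    def flush():
        # Join the wrapped sequence lines of the entry collected so far.
        sequence = "".join(chunks)
        if replace_U2T:
            # Assumption: replace_U2T rewrites RNA 'U' as DNA 'T'.
            sequence = sequence.replace("U", "T")
        records.append({"id": header, "sequence": sequence})

    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                flush()
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        flush()
    return records


# Made-up FASTA content; the second entry is wrapped over two lines.
fasta_text = ">seq1\nACGU\n>seq2\nUUGG\nCC\n"
records = parse_fasta_records(io.StringIO(fasta_text), replace_U2T=True)
```

The list of dicts can be passed to pandas.DataFrame(records) to obtain a table with a ‘sequence’ column, which matches the return shape described above.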
-
static