RNAcentral

class openomics.database.sequence.RNAcentral(path='https://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/', file_resources=None, col_rename={'GO terms': 'go_id', 'ensembl_gene_id': 'gene_id', 'external id': 'transcript_id', 'gene symbol': 'gene_id'}, species_id=None, index_col='RNAcentral id', keys=None, remove_version_num=True, remove_species_suffix=True, **kwargs)

Bases: openomics.database.sequence.SequenceDatabase

Loads the RNAcentral database from https://rnacentral.org/ and provides a series of methods to extract sequence data from it.

Default path: https://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/

Default file_resources:

{
    "rnacentral_rfam_annotations.tsv": "go_annotations/rnacentral_rfam_annotations.tsv.gz",
    "database_mappings/gencode.tsv": "id_mapping/database_mappings/gencode.tsv",
    "gencode.fasta": "sequences/by-database/gencode.fasta",
    …
}
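The values in the default file_resources are relative paths, which suggests they are resolved against path. The following is a minimal sketch of that relationship using only the standard library. It is an assumption about how the two settings fit together, not the library's own resolution logic, and the helper name `resolve_resources` is hypothetical. Only the three entries listed above are included; the elided ones are left out.

```python
from urllib.parse import urljoin

# Default path taken from the class signature.
DEFAULT_PATH = "https://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/"

# Only the three entries spelled out in the documentation.
DEFAULT_FILE_RESOURCES = {
    "rnacentral_rfam_annotations.tsv": "go_annotations/rnacentral_rfam_annotations.tsv.gz",
    "database_mappings/gencode.tsv": "id_mapping/database_mappings/gencode.tsv",
    "gencode.fasta": "sequences/by-database/gencode.fasta",
}


def resolve_resources(path, file_resources):
    """Hypothetical helper: join each relative resource path onto the base path."""
    # urljoin drops the last path segment unless the base ends with a slash.
    base = path if path.endswith("/") else path + "/"
    return {name: urljoin(base, rel) for name, rel in file_resources.items()}


urls = resolve_resources(DEFAULT_PATH, DEFAULT_FILE_RESOURCES)
for name, url in urls.items():
    print(f"{name} -> {url}")
```

Passing a different path (for example a local mirror directory URL) through the same helper would leave the relative layout unchanged, which is presumably why the resource values are kept relative.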

Attributes Summary

COLUMNS_RENAME_DICT

Methods Summary

`add_rfam_annotation`(transcripts_df, ...[, ...])	Returns `Union`[`DataFrame`, `DataFrame`].
`get_sequences`([index, omic, agg])
`load_dataframe`(file_resources[, blocksize])
`load_sequences`(fasta_file[, index, keys, ...])

Attributes Documentation

COLUMNS_RENAME_DICT = {'GO terms': 'go_id', 'ensembl_gene_id': 'gene_id', 'external id': 'transcript_id', 'gene symbol': 'gene_id'}
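To make the effect of this mapping concrete, here is a small sketch that applies it to a list of column names in plain Python. The header used below is hypothetical and chosen only to exercise every key, and `rename_columns` is an illustrative helper, not part of openomics.

```python
# Mapping copied from the attribute documentation.
COLUMNS_RENAME_DICT = {
    "GO terms": "go_id",
    "ensembl_gene_id": "gene_id",
    "external id": "transcript_id",
    "gene symbol": "gene_id",
}


def rename_columns(columns, mapping):
    """Illustrative helper: apply the mapping; unknown names pass through unchanged."""
    return [mapping.get(col, col) for col in columns]


# Hypothetical header, not taken from an actual RNAcentral file.
header = ["RNAcentral id", "external id", "ensembl_gene_id", "gene symbol", "GO terms"]
renamed = rename_columns(header, COLUMNS_RENAME_DICT)
print(renamed)

# Names that occur more than once after renaming.
duplicates = sorted({name for name in renamed if renamed.count(name) > 1})
print(duplicates)
```

Note that both 'ensembl_gene_id' and 'gene symbol' map to 'gene_id', so a table that carries both source columns would end up with two columns of the same name after renaming.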

Methods Documentation

add_rfam_annotation(transcripts_df, file_resources, blocksize=None)

Return type: Union[DataFrame, DataFrame]

get_sequences(index='RNAcentral id', omic=None, agg='all', **kwargs)

Parameters

index –
omic –
agg –
**kwargs –

load_dataframe(file_resources, blocksize=None)

Parameters

file_resources –
blocksize –

load_sequences(fasta_file, index=None, keys=None, blocksize=None)

Parameters

fasta_file –
index –
keys –
blocksize –
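The parameters above are undocumented, so as a rough illustration of the kind of work a FASTA loader does, the sketch below reads FASTA text and keys each sequence by the first token of its header line. This is not the library's implementation and does not model the index, keys, or blocksize arguments; the function name `parse_fasta` and the sample records are hypothetical.

```python
import io


def parse_fasta(handle):
    """Hypothetical helper: map each record ID (first header token) to its sequence."""
    sequences = {}
    current_id = None
    chunks = []
    for raw_line in handle:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        elif current_id is not None:
            # Sequence lines may be wrapped; collect them until the next header.
            chunks.append(line)
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return sequences


# Made-up records for illustration; these are not real RNAcentral entries.
sample = io.StringIO(
    ">seq1 example record one\n"
    "ACGUACGU\n"
    "GGCC\n"
    ">seq2 example record two\n"
    "UUAA\n"
)
records = parse_fasta(sample)
print(records)
```

Keying on the first header token is a common FASTA convention; a real loader may additionally parse the rest of the header into separate fields.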