RNAcentral

class openomics.database.sequence.RNAcentral(path='https://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/', file_resources=None, col_rename={'GO terms': 'go_id', 'ensembl_gene_id': 'gene_id', 'external id': 'transcript_id', 'gene symbol': 'gene_id'}, species_id=None, index_col='RNAcentral id', keys=None, remove_version_num=True, remove_species_suffix=True, **kwargs)

Bases: openomics.database.sequence.SequenceDatabase

Loads the RNAcentral database from https://rnacentral.org/ and provides a series of methods to extract sequence data from it.
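
The constructor's remove_version_num and remove_species_suffix flags are not described on this page. The sketch below shows one plausible reading of them, using only the standard library: dropping a trailing ".<version>" from an external identifier and a trailing "_<taxid>" from an RNAcentral identifier. The helper names, the regular expressions, and the sample identifiers are assumptions for illustration, not the library's own code.

```python
import re

def strip_species_suffix(rnacentral_id):
    """Hypothetical helper: drop a trailing '_<taxid>' such as '_9606'."""
    return re.sub(r"_\d+$", "", rnacentral_id)

def strip_version(identifier):
    """Hypothetical helper: drop a trailing '.<version>' such as '.2'."""
    return re.sub(r"\.\d+$", "", identifier)

# Sample identifiers chosen for illustration only.
print(strip_species_suffix("URS0000000001_9606"))
print(strip_version("ENST00000456328.2"))
```

Identifiers without a suffix or version pass through these helpers unchanged.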

Default path: https://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/

Default file_resources:

{
    "rnacentral_rfam_annotations.tsv": "go_annotations/rnacentral_rfam_annotations.tsv.gz",
    "database_mappings/gencode.tsv": "id_mapping/database_mappings/gencode.tsv",
    "gencode.fasta": "sequences/by-database/gencode.fasta",
    …
}
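
To illustrate how the relative entries in file_resources relate to path, the sketch below joins each listed value onto the default path with the standard library's urljoin. It does not import openomics. That the class resolves relative entries this way is an assumption, and resolve_resources is a hypothetical helper.

```python
from urllib.parse import urljoin

DEFAULT_PATH = "https://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/"

# Only the three entries listed above; the documented default dict is elided after them.
file_resources = {
    "rnacentral_rfam_annotations.tsv": "go_annotations/rnacentral_rfam_annotations.tsv.gz",
    "database_mappings/gencode.tsv": "id_mapping/database_mappings/gencode.tsv",
    "gencode.fasta": "sequences/by-database/gencode.fasta",
}

def resolve_resources(path, resources):
    """Hypothetical helper: join each relative resource onto the base path."""
    return {name: urljoin(path, rel) for name, rel in resources.items()}

resolved = resolve_resources(DEFAULT_PATH, file_resources)
for name, url in resolved.items():
    print(f"{name} -> {url}")
```

The trailing slash on the base path matters here: without it, urljoin would replace the last path segment instead of appending to it.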

Attributes Summary

COLUMNS_RENAME_DICT

Methods Summary

add_rfam_annotation(transcripts_df, ...[, ...])

get_sequences([index, omic, agg])

load_dataframe(file_resources[, blocksize])

load_sequences(fasta_file[, index, keys, ...])

Attributes Documentation

COLUMNS_RENAME_DICT = {'GO terms': 'go_id', 'ensembl_gene_id': 'gene_id', 'external id': 'transcript_id', 'gene symbol': 'gene_id'}
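
As a small illustration of what this mapping does to column names, the sketch below applies it to a made-up header using plain Python. The rename_columns helper and the sample header are hypothetical, and how openomics applies the mapping internally is not shown on this page. The sketch also points out that two source names share the target 'gene_id'.

```python
COLUMNS_RENAME_DICT = {
    "GO terms": "go_id",
    "ensembl_gene_id": "gene_id",
    "external id": "transcript_id",
    "gene symbol": "gene_id",
}

def rename_columns(columns, mapping):
    """Hypothetical helper: rename known columns, leave the rest unchanged."""
    return [mapping.get(col, col) for col in columns]

# Made-up header for illustration only.
header = ["RNAcentral id", "external id", "gene symbol", "GO terms"]
renamed = rename_columns(header, COLUMNS_RENAME_DICT)
print(renamed)

# Find target names that more than one source column maps to.
targets = list(COLUMNS_RENAME_DICT.values())
duplicates = sorted({t for t in targets if targets.count(t) > 1})
print(duplicates)
```

A table that contains both 'ensembl_gene_id' and 'gene symbol' would end up with two columns labelled 'gene_id' after renaming, which is worth checking for before indexing by that name.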

Methods Documentation

add_rfam_annotation(transcripts_df, file_resources, blocksize=None)
Return type
  Union[DataFrame, DataFrame]

get_sequences(index='RNAcentral id', omic=None, agg='all', **kwargs)
Parameters
  • index
  • omic
  • agg
  • **kwargs

load_dataframe(file_resources, blocksize=None)
Parameters
  • file_resources
  • blocksize

load_sequences(fasta_file, index=None, keys=None, blocksize=None)
Parameters
  • fasta_file
  • index
  • keys
  • blocksize