STRING

class openomics.database.interaction.STRING(path='https://stringdb-static.org/download/', file_resources=None, species_id='9606', version='v11.0', source_col_name='protein1', target_col_name='protein2', edge_attr='combined_score', directed=False, relabel_nodes=None, index_col='#string_protein_id', keys=None, alias_types={'Ensembl_UniProt', 'Ensembl_UniProt_AC'}, blocksize=None, **kwargs)

Bases: openomics.database.interaction.Interactions, openomics.database.sequence.SequenceDatabase

Loads the STRING database from https://string-db.org/ .

Default path: "https://stringdb-static.org/download/". Default file_resources:

{
    "{species_id}.protein.info.txt.gz": f"protein.info.{version}/{species_id}.protein.info.{version}.txt.gz",
    "{species_id}.protein.aliases.txt.gz": f"protein.links.{version}/{species_id}.protein.aliases.{version}.txt.gz",
    "{species_id}.protein.links.txt.gz": f"protein.links.{version}/{species_id}.protein.links.{version}.txt.gz",
    "{species_id}.protein.sequences.fa.gz": f"protein.sequences.{version}/{species_id}.protein.sequences.{version}.fa.gz"
}
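To see which files the defaults point at, the templates can be expanded by hand. The following is a minimal sketch, not the library's own code: the helper name default_file_resources is hypothetical, and it assumes that both keys and values are filled in with species_id and version and that each relative path is appended directly to path.

```python
def default_file_resources(species_id="9606", version="v11.0",
                           path="https://stringdb-static.org/download/"):
    """Expand the documented default templates into full URLs.

    Hypothetical helper for illustration; the templates are copied
    verbatim from the defaults listed above.
    """
    templates = {
        "{species_id}.protein.info.txt.gz":
            "protein.info.{version}/{species_id}.protein.info.{version}.txt.gz",
        "{species_id}.protein.aliases.txt.gz":
            "protein.links.{version}/{species_id}.protein.aliases.{version}.txt.gz",
        "{species_id}.protein.links.txt.gz":
            "protein.links.{version}/{species_id}.protein.links.{version}.txt.gz",
        "{species_id}.protein.sequences.fa.gz":
            "protein.sequences.{version}/{species_id}.protein.sequences.{version}.fa.gz",
    }
    # Assumption: the relative path is joined to `path` by plain concatenation.
    return {
        key.format(species_id=species_id):
            path + value.format(species_id=species_id, version=version)
        for key, value in templates.items()
    }


resources = default_file_resources()
print(resources["9606.protein.links.txt.gz"])
# https://stringdb-static.org/download/protein.links.v11.0/9606.protein.links.v11.0.txt.gz
```

Passing a different species_id or version changes every key and URL consistently, which is the reason the defaults are written as templates.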

Edge attributes for protein.actions.txt include ["mode", "action", "is_directional", "a_is_acting", "score"].

Edge attributes for protein.links.txt include ["combined_score"].

Attributes Summary

Methods Summary

get_sequences([index, omic, agg])

Returns a dictionary where keys are 'index' and values are sequence(s).

load_network(file_resources[, ...])

Processes file_resources into a Pandas DataFrame that contains edge list data, then constructs and returns a NetworkX Graph.

Attributes Documentation

COLUMNS_RENAME_DICT = {'#ncbi_taxid': 'species_id', '#string_protein_id': 'string_protein_id', 'preferred_name': 'gene_name', 'protein_external_id': 'protein_id', 'string_protein_id_2': 'homologous_protein_id'}

Methods Documentation

get_sequences(index='protein_id', omic=None, agg=None)

Returns a dictionary where keys are ‘index’ and values are sequence(s).

Parameters
  • index (str) – {“gene_id”, “gene_name”, “transcript_id”, “transcript_name”}

  • omic (str) – {“lncRNA”, “microRNA”, “messengerRNA”}

  • agg (str) – {“all”, “shortest”, “longest”}

  • **kwargs – any additional argument to pass to SequenceDataset.get_sequences()
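The agg options can be illustrated with a small stand-alone sketch. This is an assumption about the intended semantics, not the library's implementation: aggregate_sequences is a hypothetical helper, and it assumes that "all" keeps every sequence per key while "shortest" and "longest" keep a single sequence chosen by length. The sequences are made-up toy data.

```python
from collections import defaultdict


def aggregate_sequences(records, agg="all"):
    """Group (key, sequence) pairs by key and reduce them according to agg.

    Hypothetical helper for illustration only.
    """
    grouped = defaultdict(list)
    for key, sequence in records:
        grouped[key].append(sequence)

    if agg == "all":
        # Keep every sequence seen for the key, in input order.
        return dict(grouped)
    if agg == "shortest":
        return {key: min(seqs, key=len) for key, seqs in grouped.items()}
    if agg == "longest":
        return {key: max(seqs, key=len) for key, seqs in grouped.items()}
    raise ValueError(f"unknown agg: {agg!r}")


# Toy records: two sequences for P1, one for P2.
records = [("P1", "MKT"), ("P1", "MKTAYIAK"), ("P2", "MSD")]
print(aggregate_sequences(records, agg="longest"))
# {'P1': 'MKTAYIAK', 'P2': 'MSD'}
```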

load_network(file_resources, source_col_name='protein1', target_col_name='protein2', edge_attr='combined_score', directed=False, filters=None, blocksize=None)

Processes file_resources into a Pandas DataFrame that contains edge list data, then constructs and returns a NetworkX Graph.

Parameters
  • file_resources – a dict of file name and file path/object

  • source_col_name (str) – column name of the dataframe for the source node of each edge

  • target_col_name (str) – column name of the dataframe for the target node of each edge

  • edge_attr (list) – list of str for column data to include in each edge

  • directed (bool) – True to return a DiGraph(), else Graph()

  • filters – a dict of {column name: column values} to filter the dataframe

  • blocksize

Returns

a NetworkX Graph or DiGraph

Return type

network
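The roles of source_col_name, target_col_name, edge_attr, directed and filters can be sketched without Pandas or NetworkX. The example below is a standard-library illustration of the row-level logic only, not what load_network does internally. The helper build_edges is hypothetical, the protein identifiers are made up, and it assumes a space-separated links table and that each filters value is a collection of allowed values for its column.

```python
import csv
import io

# Made-up rows in the space-separated layout of a protein.links file.
LINKS_TEXT = """protein1 protein2 combined_score
9606.A 9606.B 490
9606.B 9606.A 490
9606.A 9606.C 198
"""


def build_edges(text, source_col_name="protein1", target_col_name="protein2",
                edge_attr="combined_score", directed=False, filters=None):
    """Return {(source, target): {edge_attr: value}} from an edge list table."""
    reader = csv.DictReader(io.StringIO(text), delimiter=" ")
    edges = {}
    for row in reader:
        # Assumption: a row is kept only if every filtered column holds an allowed value.
        if filters and any(row[col] not in allowed for col, allowed in filters.items()):
            continue
        u, v = row[source_col_name], row[target_col_name]
        # Undirected: (u, v) and (v, u) collapse into one sorted key.
        key = (u, v) if directed else tuple(sorted((u, v)))
        edges[key] = {edge_attr: int(row[edge_attr])}
    return edges


undirected = build_edges(LINKS_TEXT)
directed = build_edges(LINKS_TEXT, directed=True)
print(len(undirected), len(directed))
# 2 3
```

With directed=False the two reciprocal rows become a single edge, which mirrors the difference between a Graph() and a DiGraph() built from the same table.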