DeepGOWeb function predictor

DeepGOWeb is a web server for running the DeepGOPlus protein function predictor. It lets users obtain predicted functions in three ways. First, on the Prediction web page, users can enter protein sequences and get predictions that are downloadable in JSON format. Second, a REST API gives programmatic access to our servers. Finally, users can call our predictor from within a SPARQL query through our SPARQL endpoint.

Web page for submitting protein sequences

Users should provide the following data to use the service on the Prediction page:

  • Format: FASTA, or raw sequences separated by newlines
  • Threshold: a value between 0.1 and 1.0 used to filter predictions by the model's confidence score
  • Data: protein sequences in the selected format. At most 10 sequences are allowed per request.

After submitting the request, users are redirected to the results page, where they can view the predictions and download them in JSON format. Users can also save the link to the results page and return to it at any time.
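
The input constraints listed above can also be checked on the client side before submitting. The following is a minimal sketch and not part of DeepGOWeb itself: the function names parse_sequences and validate_request are hypothetical, and the limits (threshold between 0.1 and 1.0, at most 10 sequences) are the ones stated for the Prediction page.

```python
from typing import List

MAX_SEQUENCES = 10                       # limit stated for the Prediction page
MIN_THRESHOLD, MAX_THRESHOLD = 0.1, 1.0  # allowed threshold range


def parse_sequences(text: str, fasta: bool) -> List[str]:
    """Split user input into sequences (FASTA, or one raw sequence per line)."""
    if not fasta:
        return [line.strip() for line in text.splitlines() if line.strip()]
    sequences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A header line starts a new record; flush the previous one.
            if current:
                sequences.append("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


def validate_request(sequences: List[str], threshold: float) -> None:
    """Raise ValueError if the request violates the documented limits."""
    if not sequences:
        raise ValueError("no sequences provided")
    if len(sequences) > MAX_SEQUENCES:
        raise ValueError(f"at most {MAX_SEQUENCES} sequences are allowed per request")
    if not MIN_THRESHOLD <= threshold <= MAX_THRESHOLD:
        raise ValueError("threshold must be between 0.1 and 1.0")
```

Calling validate_request(parse_sequences(text, fasta=True), 0.3) before submitting catches oversized or malformed requests early.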

REST API

The REST API provides programmatic access to the service. The following example uses Python and the Requests library.

import requests

sequence = "MPYKLKKEKEPPKVAKCTAKPSSSGKDGGGENTEEAQPQPQPQPQPQAQSQPPSSNKRPSNSTPPPTQLSKIKYSGGPQIVKKERRQSSSRFNLSKNRELQKLPALKDSPTQEREELFIQKLRQCCVLFDFVSDPLSDLKFKEVKRAGLNEMVEYITHSRDVVTEAIYPEAVTMFSVNLFRTLPPSSNPTGAEFDPEEDEPTLEAAWPHLQLVYEFFLRFLESPDFQPNIAKKYIDQKFVLALLDLFDSEDPRERDFLKTILHRIYGKFLGLRAYIRRQINHIFYRFIYETEHHNGIAELLEILGSIINGFALPLKEEHKMFLIRVLLPLHKVKSLSVYHPQLAYCVVQFLEKESSLTEPVIVGLLKFWPKTHSPKEVMFLNELEEILDVIEPSEFSKVMEPLFRQLAKCVSSPHFQVAERALYYWNNEYIMSLISDNAARVLPIMFPALYRNSKSHWNKTIHGLIYNALKLFMEMNQKLFDDCTQQYKAEKQKGRFRMKEREEMWQKIEELARLNPQYPMFRAPPPLPPVYSMETETPTAEDIQLLKRTVETEAVQMLKDIKKEKVLLRRKSELPQDVYTIKALEAHKRAEEFLTASQEAL"
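threshold = 0.3  # confidence threshold used to filter predictions
r = requests.post('http://deepgoplus.bio2vec.net/deepgo/api/create', data={'data_format': 'enter', 'data': sequence, 'threshold': threshold})
r.raise_for_status()  # stop on HTTP errors before decoding the body
result = r.json()  # predictions as parsed JSON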
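
To submit several sequences in one call, the raw sequences can presumably be joined with newlines, mirroring the raw format accepted on the Prediction page. The sketch below rests on that assumption: whether the API accepts newline-separated sequences under the same data_format value ('enter') is not confirmed here. It uses the Python standard library instead of Requests, and build_payload and submit are hypothetical helper names.

```python
import json
from urllib import parse, request

# URL taken from the Requests example above.
API_URL = "http://deepgoplus.bio2vec.net/deepgo/api/create"


def build_payload(sequences, threshold=0.3):
    """Build the form data for one request.

    Assumption: several raw sequences may be sent as one newline-separated string.
    """
    return {
        "data_format": "enter",
        "data": "\n".join(s.strip() for s in sequences),
        "threshold": threshold,
    }


def submit(sequences, threshold=0.3, timeout=60):
    """POST the form-encoded payload and return the decoded JSON response."""
    body = parse.urlencode(build_payload(sequences, threshold)).encode("utf-8")
    # Passing data= makes urlopen issue a POST request.
    with request.urlopen(API_URL, data=body, timeout=timeout) as response:
        return json.load(response)
```

Keeping build_payload separate from submit makes it possible to inspect the request body without contacting the server.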
              

SPARQL

The SPARQL endpoint lets users call the function prediction model from within a SPARQL query. We provide a custom function called "deepgo" that takes a protein sequence and a prediction threshold as input and returns the predicted functions along with their subontology, label, and prediction score. The output can be downloaded in several formats, such as JSON, XML, CSV, or text.

Example queries:

  • Example 1: Simple example query
    PREFIX dg: <http://deepgoplus.bio2vec.net/functions#>
    PREFIX GO: <http://purl.obolibrary.org/obo/GO_> 
    
    SELECT ?ont ?go ?label ?score
    {
     (?ont ?go ?label ?score)
    		    dg:deepgo("MPYKLKKEKEPPKVAKCTAKPSSSGKDGGGENTEEAQPQPQPQPQPQAQSQPPSSNKRPSNSTPPPTQLSKIKYSGGPQIVKKERRQSSSRFNLSKNRELQKLPALKDSPTQEREELFIQKLRQCCVLFDFVSDPLSDLKFKEVKRAGLNEMVEYITHSRDVVTEAIYPEAVTMFSVNLFRTLPPSSNPTGAEFDPEEDEPTLEAAWPHLQLVYEFFLRFLESPDFQPNIAKKYIDQKFVLALLDLFDSEDPRERDFLKTILHRIYGKFLGLRAYIRRQINHIFYRFIYETEHHNGIAELLEILGSIINGFALPLKEEHKMFLIRVLLPLHKVKSLSVYHPQLAYCVVQFLEKESSLTEPVIVGLLKFWPKTHSPKEVMFLNELEEILDVIEPSEFSKVMEPLFRQLAKCVSSPHFQVAERALYYWNNEYIMSLISDNAARVLPIMFPALYRNSKSHWNKTIHGLIYNALKLFMEMNQKLFDDCTQQYKAEKQKGRFRMKEREEMWQKIEELARLNPQYPMFRAPPPLPPVYSMETETPTAEDIQLLKRTVETEAVQMLKDIKKEKVLLRRKSELPQDVYTIKALEAHKRAEEFLTASQEAL" 0.3) .
    }
        
  • Example 2: Federated query that runs deepgo on two sequences retrieved from the UniProt SPARQL endpoint
    PREFIX dg: <http://deepgoplus.bio2vec.net/functions#>
    PREFIX GO: <http://purl.obolibrary.org/obo/GO_> 
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
    PREFIX up: <http://purl.uniprot.org/core/>
    SELECT ?protein ?organism ?isoform ?sub ?go ?label ?score
    WHERE
    {
    {
    SELECT DISTINCT ?protein ?organism ?isoform ?aa_sequence
      WHERE 
      {
      SERVICE <http://sparql.uniprot.org/sparql> {
        ?protein a up:Protein .
        ?protein up:organism ?organism .
        ?organism rdfs:subClassOf taxon:9606 .
        ?protein up:sequence ?isoform .
        ?isoform rdf:value ?aa_sequence .
      }
      }
    LIMIT 2
    }
    (?sub ?go ?label ?score) dg:deepgo(?aa_sequence 0.3) .
    }
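
Queries like Example 1 can be generated programmatically. The following is a minimal sketch that only builds the query text; the helper name build_deepgo_query is hypothetical, and sending the query is left out because the endpoint URL is not given in this section. The sequence is restricted to ASCII letters so that it cannot break out of the quoted string literal.

```python
# Template mirroring Example 1; doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """\
PREFIX dg: <http://deepgoplus.bio2vec.net/functions#>

SELECT ?ont ?go ?label ?score
WHERE {{
  (?ont ?go ?label ?score) dg:deepgo("{sequence}" {threshold}) .
}}
"""


def build_deepgo_query(sequence, threshold=0.3):
    """Return a SPARQL query that calls dg:deepgo on one protein sequence."""
    # Remove whitespace and line breaks, then normalize to upper case.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned or not (cleaned.isascii() and cleaned.isalpha()):
        raise ValueError("sequence must contain only ASCII letters")
    return QUERY_TEMPLATE.format(sequence=cleaned, threshold=threshold)
```

For example, build_deepgo_query("MPYKLKKE", 0.3) yields a query with the same shape as Example 1.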
        

Command-line tool

Users can also install DeepGOPlus on their system and run the predictor locally.

Installation

Install with pip:

pip install deepgoplus

Download the data

Download the archive http://deepgoplus.bio2vec.net/data/data.tar.gz and extract all of its files into a data folder.

Run:

deepgoplus --data-root path_to_data_folder --in-file input_fasta_filename
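
The local run can also be scripted. The sketch below writes sequences to a FASTA file and assembles the command shown above; it uses only the --data-root and --in-file options from that command. The helper names write_fasta, build_command, and run_deepgoplus are hypothetical, and the 60-character line width is a common FASTA convention rather than a documented requirement of the tool.

```python
import subprocess
from pathlib import Path


def write_fasta(sequences, path, width=60):
    """Write a mapping of {identifier: sequence} to a FASTA file."""
    lines = []
    for name, seq in sequences.items():
        lines.append(f">{name}")
        # Wrap each sequence into fixed-width lines.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    Path(path).write_text("\n".join(lines) + "\n")


def build_command(data_root, in_file):
    """Assemble the deepgoplus command line as an argument list."""
    return ["deepgoplus", "--data-root", str(data_root), "--in-file", str(in_file)]


def run_deepgoplus(data_root, in_file):
    """Run the predictor; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(data_root, in_file), check=True)
```

Passing the command as an argument list rather than a single string avoids shell quoting problems with paths that contain spaces.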