Supplementary Material

MOWServ: a web client for integration of bioinformatic resources

Sergio Ramírez1,#, Antonio Muñoz1,#, Johan Karlsson1, Maximiliano García1, M. Gonzalo Claros2 and Oswaldo Trelles1

1 Departamento Arquitectura de Computadores, Escuela Técnica Superior de Ingeniería Informática, Campus de Teatinos s/n, University of Malaga, 29071 Málaga, Spain
2 Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Campus de Teatinos s/n, University of Malaga, 29071 Málaga, Spain
# These authors contributed equally

Homology Search and Phylogenetic Study

Given a single amino acid query sequence, it is often useful to find similar sequences that share a common evolutionary history. This example walks through a real case with the following steps:

STEP 1: Retrieve the sequence requested by the user from the databases (getAminoAcidSequenceUMA service)
STEP 2: Homology Search (Blast service)
STEP 3: Retrieving sequences from Blast (getBestHitsFromBlast & getAminoAcidSequenceCollection)
STEP 4: Multiple Alignment (Clustalw service)
STEP 5: Phylogenetic Tree (CreateTreeFromClustalw service)

STEP 1: Amino Acid Sequence retrieval.

To start, open the INB resource web. The first step is to obtain the protein sequence we are interested in. Go to the tree frame and select: Services -> Service -> Bioinformatics -> Database -> Retrieving -> Getting sequences -> Getting AminoAcids (several services with similar or identical names can appear, corresponding to services from different authorities). This opens a window with the interface of the selected service. The interface has two main groups of parameters:

INPUT PARAMETERS: The parameters of the service, the last of which is the most important. In this case only the ID of the sequence appears (SMN_HUMAN, for example), but this depends on the service.

OUTPUT NAME: At present, the output parameters are restricted to the output name, so we enter a name for our output object in this box. This name matters for identifying our objects later and for sending them to other services. The default output name is the service name together with the current time and date, but it can be changed.
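The default naming rule for output objects can be sketched as follows. This is only an illustration: the text says the default combines the service name with the current time and date, but the separator and the timestamp layout used here are assumptions, and the function names are hypothetical.

```python
from datetime import datetime


def default_output_name(service_name: str, now: datetime) -> str:
    """Combine the service name with a timestamp.

    The underscore separator and the timestamp layout are assumptions;
    the actual format used by the client is not given in the text.
    """
    return f"{service_name}_{now.strftime('%Y-%m-%d_%H-%M-%S')}"


def output_name(service_name: str, now: datetime, user_choice: str = "") -> str:
    """Use the name typed by the user, or fall back to the default."""
    chosen = user_choice.strip()
    return chosen if chosen else default_output_name(service_name, now)


if __name__ == "__main__":
    stamp = datetime(2006, 1, 15, 10, 30, 0)  # arbitrary example date
    print(output_name("getAminoAcidSequenceUMA", stamp))
    print(output_name("getAminoAcidSequenceUMA", stamp, "SMN_HUMAN_sequence"))
```

Choosing a descriptive name at this point pays off in the later steps, where each object has to be picked from a list of previously created objects.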

Once the submit button is clicked, the progress of the service can be followed in the “User tasks” tab, using reload if the service takes more than a few seconds. When the process has finished, the result can be displayed from this same tab in different formats:

• XML: View the object in XML format as a BioMOBY object (click on the icons below to see how your output objects are shown).
• HTML: View the object in a user-friendly window in HTML format. Different objects have different viewers (click on the icons below to see how your output objects are shown).
• Download: Download the object to your local computer.

If you click on an object name, a list of services accepting this object will appear, making it easier to perform additional analysis on the results.

STEP 2: Homology Search.

Now we have to find the right service for the homology search in the tree. In this case we will look for it using the search box. Type NCBIblastp into the box and click on the “expand all” button to see all the matching entries, which appear highlighted. Click on the chosen service (runNCBIBlastp1) and the window changes to the runNCBIBlastp1 service interface. By default, the only object we have created so far appears as input, so we only need to click submit to run the service, although, as with the previous service, we can change the output object name.
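The search box behaviour can be pictured as a case-insensitive name search over the service tree. The sketch below is not the client's implementation: the tree is a small hand-written excerpt built only from branch paths quoted in this document, and the function name find_services is hypothetical. Because the branch holding runNCBIBlastp1 is not given in the text, the example searches for "clustalw" instead; the same lookup would apply to NCBIblastp.

```python
# Hand-written excerpt of the service tree; leaves are services (empty dicts).
SERVICE_TREE = {
    "Services": {
        "Service": {
            "Bioinformatics": {
                "Alignment": {
                    "Multiple_Sequence_Comparison": {"runClustalwFastUMA": {}},
                },
                "Database": {
                    "Retrieving": {
                        "GettingSequences": {
                            "GettingAminoacids": {"getAminoAcidSequenceCollection": {}},
                            "GettingText": {"getEntryfromSwissProt": {}},
                        },
                    },
                },
                "Distances": {
                    "Phylogenetics": {
                        "Phylogenetics_Tree_Computing": {"runCreateTreeFromClustalw": {}},
                    },
                },
            },
        },
    },
}


def find_services(tree, query, path=()):
    """Return the full path of every node whose name contains the query."""
    needle = query.lower()
    matches = []
    for name, children in tree.items():
        here = path + (name,)
        if needle in name.lower():
            matches.append(" -> ".join(here))
        # Descend into every branch, as "expand all" does in the interface.
        matches.extend(find_services(children, query, here))
    return matches


if __name__ == "__main__":
    for hit in find_services(SERVICE_TREE, "clustalw"):
        print(hit)
```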

When the service has finished, the output object can be visualized in any available format (e.g. the HTML view).

Now we will retrieve the sequences of the best Blast hits, using the Blast E-value as threshold.

STEP 3: Retrieving Sequences.

The services used in this step are getBestHitsFromBlast (Services -> Service -> ObjectHandling -> Parsing) and getAminoAcidSequenceCollection (Services -> Service -> Database -> Retrieving -> GettingSequences -> GettingAminoacids). The former parses the Blast report and produces a collection of amino acid sequence IDs, each together with its corresponding database (an object collection). This service asks for a threshold value and keeps only hits with an E-value lower than this threshold. We select 1 as the threshold, so that sequences distantly related to the query sequence are included, and also limit the hits to a maximum of 5 (note that the screenshot shows different values).
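The selection rule applied to the Blast hits can be sketched as follows. This is a minimal illustration, not the code of getBestHitsFromBlast: the Hit record, the best_hits function, the placeholder IDs and the E-values are all invented for the example, and sorting by E-value before truncating is an assumption, since the text only states the threshold and the maximum number of hits.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One Blast hit: source database, sequence ID and E-value (hypothetical record)."""
    database: str
    seq_id: str
    evalue: float


def best_hits(hits: List[Hit], threshold: float, max_hits: int) -> List[Hit]:
    """Keep hits with an E-value strictly below the threshold, best first."""
    kept = sorted((h for h in hits if h.evalue < threshold), key=lambda h: h.evalue)
    return kept[:max_hits]


if __name__ == "__main__":
    # Placeholder IDs and invented E-values, for illustration only.
    report = [
        Hit("Swiss-Prot", "ID_1", 1e-150),
        Hit("Swiss-Prot", "ID_2", 3e-40),
        Hit("Swiss-Prot", "ID_3", 0.02),
        Hit("Swiss-Prot", "ID_4", 0.7),
        Hit("Swiss-Prot", "ID_5", 2.5),   # above the threshold, dropped
        Hit("Swiss-Prot", "ID_6", 1e-5),
        Hit("Swiss-Prot", "ID_7", 0.9),   # below the threshold, but sixth best
    ]
    for hit in best_hits(report, threshold=1.0, max_hits=5):
        print(hit.database, hit.seq_id, hit.evalue)
```

A threshold as permissive as 1 admits weak matches, which is why the cap on the number of hits matters: it keeps the later alignment small while still including distant relatives.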

Then, the latter service (getAminoAcidSequenceCollection) retrieves the amino acid sequences for the previous IDs/databases (object collection) to produce an amino acid sequence collection. When the service has finished, we can look at the created objects in the HTML viewer.

Now we will run the ClustalW program on the sequences found to be most similar to our query sequence, so that the service builds a multiple alignment of them.

STEP 4: Multiple Alignment.

The service used in this step is runClustalwFastUMA, located on the tree branch: Services -> Service -> Bioinformatics -> Alignment -> Multiple_Sequence_Comparison. This service has several input parameters that we will leave at their default values (they can be changed in order to modify the final alignment). The service also has an input where we select the previous sequence collection. Now we can choose an output name and submit the task. When the service has finished, we can look at the created object in the HTML viewer, which displays the multiple alignment formatted by the Mview program.

The results show several related sequences with common domains. In addition, the last sequence is longer than the others and contains a domain called 'Tudor domain', among others. Domain information about the proteins in the multiple alignment can be looked up in the corresponding Swiss-Prot entry, either by clicking the database link in the previous Blast-Text object or by running the service getEntryfromSwissProt (Services -> Service -> Bioinformatics -> Database -> Retrieving -> GettingSequences -> GettingText) with an input object having Namespace=Swiss-Prot and ID set to the ID or AC of the query sequence. In both cases we obtain the complete entry for the query sequence and can look up the domain information about the protein to compare it with our multiple alignment.

Finally, we will build a phylogenetic tree in Phylip format, which we can view with the HTML viewer.

STEP 5: Phylogenetic Tree

Now we run the runCreateTreeFromClustalw service (Services -> Service -> Bioinformatics -> Distances -> Phylogenetics -> Phylogenetics_Tree_Computing). This service has only one input parameter:

Clustalw_report: This is the primary and only input for the service. We select the previous clustalw_report from the object list in the interface (runClustalwFromBlast-runBlastAaSequence SMN_HUMAN vs SwissProt). Again, we have to enter an output name and submit the task.

When the service finishes, we can view the created object, a phylogenetic tree, with the HTML viewer. Here, the initial query sequence (SMN_HUMAN or Q16637; Survival motor neuron protein) is most closely linked with the other proteins from the same family (O02771, O18870, rat and mouse homologues, O35876, and P97801, and one fish protein Q9W6S8); the Survival of motor neuron-related splicing factors are grouped together (O75940 and Q8BGT7); and other proteins containing the tudor domain (Q91W18 and Q9H7E2) are grouped in another branch together with one very large, more evolutionarily distant protein that also has tudor domains (P25823).

In short, with this set of services we have obtained the SMN_HUMAN homologues from the Swiss-Prot database and their phylogenetic relationships, as well as the relationships with other distant proteins sharing domains with our query sequence. Additionally, we could have executed other intermediate services if they were considered useful for our analysis.
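The chain of services used in this example, where each output object becomes the input of the next service, can be summarised with the sketch below. It only models the order of the steps and the hand-over of object names; it does not call any service. The service names are the ones used in the text, while the plan function and the "_output" naming are hypothetical.

```python
# (step label, service name) pairs in the order used in this example.
WORKFLOW = [
    ("Amino acid sequence retrieval", "getAminoAcidSequenceUMA"),
    ("Homology search", "runNCBIBlastp1"),
    ("Best-hit parsing", "getBestHitsFromBlast"),
    ("Sequence collection retrieval", "getAminoAcidSequenceCollection"),
    ("Multiple alignment", "runClustalwFastUMA"),
    ("Phylogenetic tree", "runCreateTreeFromClustalw"),
]


def plan(initial_input: str):
    """Return (service, input_name, output_name) triples for the whole chain."""
    steps = []
    current = initial_input
    for _label, service in WORKFLOW:
        output = f"{service}_output"  # hypothetical naming, for illustration
        steps.append((service, current, output))
        current = output  # the output object feeds the next service
    return steps


if __name__ == "__main__":
    for service, source, target in plan("SMN_HUMAN"):
        print(f"{service}: {source} -> {target}")
```

Intermediate services could be added by inserting further pairs into WORKFLOW, which mirrors the closing remark that other services may be run between these steps.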