Geno-Cluster: New Drug Discovery

Results

Gene'D'cfer (GDC) Approach

Prokaryotic Gene Identification Using ANN & Peptide Library

We have developed a generic and versatile approach, designated Gene'D'cfer (GDC), for prokaryotic gene identification. Unlike existing methods, this approach employs peptides as markers for protein-coding DNA sequences. GDC determines candidate genes among all possible ORFs in a given DNA sequence using an Artificial Neural Network (ANN) trained on a library of known peptides. Potential ORFs are ranked according to a scoring scheme based on the abundance and distribution pattern of heptapeptides along the ORF. ORFs identified by GDC can be overlaid with other features, using complementary software programs for ribosomal binding sites, promoter sequences, transcription start sites, or codon biases, for further examination. An analysis of 18 completely sequenced prokaryotic genomes has been carried out to demonstrate the capabilities of GDC. In addition, GDC has been applied to various strains of the SARS virus, and 4 new genes were predicted.
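As a rough illustration of ranking ORFs by heptapeptide abundance and distribution, the sketch below scores a translated ORF against a peptide library. It is not the GDC scoring scheme, which this page does not specify: the function names, the two-part score (abundance, coverage) and the binning into four segments are all assumptions made for the example.

```python
def heptapeptide_hits(protein, library, k=7):
    """Start positions of every k-mer of `protein` that occurs in `library`."""
    return [i for i in range(len(protein) - k + 1) if protein[i:i + k] in library]


def orf_score(protein, library, k=7, n_bins=4):
    """Toy score (an assumption, not the GDC scheme).

    abundance: fraction of k-mer windows found in the library.
    coverage:  fraction of equal-width segments of the ORF that
               contain at least one hit (a crude distribution measure).
    """
    n_windows = len(protein) - k + 1
    if n_windows <= 0:
        return 0.0, 0.0
    hits = heptapeptide_hits(protein, library, k)
    abundance = len(hits) / n_windows
    bins_hit = {min(pos * n_bins // n_windows, n_bins - 1) for pos in hits}
    coverage = len(bins_hit) / n_bins
    return abundance, coverage


def rank_orfs(proteins, library):
    """Sort translated ORFs from highest to lowest toy score."""
    return sorted(proteins, key=lambda p: orf_score(p, library), reverse=True)


# Invented sequences and library entries, for illustration only.
library = {"MKTAYIA", "QRQISFV"}
orfs = ["GGGGGGGGGGGGGGGG", "MKTAYIAKQRQISFVK"]
print(rank_orfs(orfs, library)[0])
```

In GDC the ranking is produced by a trained ANN; the hand-written tuple above only shows what "abundance" and "distribution" could mean as inputs.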

Proteome Calkulator

Rapid Comparative Proteomics via Peptide Library Approach

Delineating conserved and variable regions in sequences is of fundamental biological importance. Conserved regions are strong indicators of phylogenetically conserved functional roles, whereas variable regions are generally implicated in auxiliary roles, often related to specific cases. The traditional approach to this objective compares homologous sequences using multiple sequence alignment algorithms. Although sound in theory, this approach is slow and not suited to high-throughput use. The limitation can in principle be overcome with powerful computers and more memory, but the results still need careful scrutiny by the user. In most cases, users simply wish to know, in a first pass, the conserved and variable regions. PROTEOME CALKULATOR meets this need by offering a rapid approach to compare all the proteins (proteome) of a species with the proteomes of other species using a peptide library approach.

SEAPATH

ANN-Based Adhesin Prediction at 97.4% Accuracy

Prediction of surface proteins involved in virulence from the complete sequences of proteomes of pathogens can greatly facilitate the development of anti-infectives towards eradicating infectious diseases. An ANN was used to develop SEAPATH, which predicts the probability of a protein being an adhesin (Pad) based on 105 compositional properties of a sequence. SEAPATH draws upon the base algorithm SPAAN, which had an optimal sensitivity of 89% and specificity of 100% and could identify 97.4% of adhesins from a wide range of bacterial pathogens causing a broad range of diseases in humans and other hosts. In the case of the Severe Acute Respiratory Syndrome (SARS) associated human coronavirus, the spike glycoprotein and the nsps (nsp2, nsp5, nsp6 and nsp7) were identified as having adhesin-like characteristics and offer new leads for rapid experimental testing.

GDC Methodology — 5 Major Steps

Step 1: Generate Peptide Library
Step 2: Artificial Translation into 6 Reading Frames
Step 3: Integer Coded Sequences
Step 4: Training ANN
Step 5: Decipher Genes

The Four Software Tools

Developed by CSIR-IGIB and supported by the Indian Centre for Social Transformation. All four tools hold US patents and are licensed by institutions including IIT and IICB.


GeneD'cfer (GDC)

Prokaryotic Gene Identification Tool

This software tool for predicting genes in Prokaryotes determines gene candidates amongst all possible ORFs of a given DNA sequence by using a peptide library and an Artificial Neural Network (ANN).

Background: Development of GeneD'cfer is based upon the observation that the difference between the total number of theoretically possible peptides of a given length and the number actually observed in nature grows drastically as the peptide length increases. Moreover, most of the peptides selected by nature are found only in coding regions and very rarely in theoretically translated non-coding regions. Prediction of a given ORF as a coding region/gene is based upon the number of heptapeptides present and the distribution of these heptapeptides along the ORF.
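To make the size of this gap concrete: with 20 standard amino acids there are 20^k possible peptides of length k, so 20^7 = 1,280,000,000 possible heptapeptides. The short sketch below compares that theoretical count with the number of distinct peptides observed in a set of sequences. The two toy proteins are invented for illustration and the function names are assumptions, not part of GeneD'cfer.

```python
def theoretical_peptides(k, alphabet_size=20):
    """Number of distinct peptides of length k over the standard amino acids."""
    return alphabet_size ** k


def observed_peptides(proteins, k):
    """Set of distinct k-mers that actually occur in a list of protein strings."""
    return {p[i:i + k] for p in proteins for i in range(len(p) - k + 1)}


# Invented sequences; a real comparison would use whole proteomes.
toy_proteins = ["MKTAYIAKQRQISFVK", "MKTAYLAKQR"]

for k in (2, 4, 7):
    possible = theoretical_peptides(k)
    seen = len(observed_peptides(toy_proteins, k))
    print(f"k={k}: possible={possible}, observed={seen}")
```

The theoretical count grows by a factor of 20 with every extra residue, while the observed count is bounded by the amount of sequence available, which is why longer peptides become increasingly specific markers.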

Method: 5 Major Steps

Generation of a peptide library
Artificial translation of a given genome into six reading frames
Conversion of each translated sequence into an integer coded sequence
Training of ANN
Deciphering genes using trained ANN
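Steps 2 and 3 can be sketched as follows. The six-frame translation uses the standard genetic code. The integer coding is an assumption, since this page does not define the encoding GeneD'cfer uses: here each window position is coded 1 if the heptapeptide starting there is in the library and 0 otherwise.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(s[offset:]) for s in strands for offset in range(3)]


def integer_code(protein, library, k=7):
    """Hypothetical encoding: 1 where a library k-mer starts, else 0."""
    return [1 if protein[i:i + k] in library else 0
            for i in range(len(protein) - k + 1)]


frames = six_frames("ATGAAATAA")
print(frames)
print(integer_code("MKTAYIAK", {"KTAYIAK"}))
```

A vector of this kind is one plausible input for the ANN training in step 4; stop codons appear as `*` in the translated frames and would delimit candidate ORFs.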

Features

Powered by a database of conserved Heptapeptides across organisms
Based on Artificial Neural Networks (ANN) using evolutionary principles
Cross validation of proteomic information to explicate its protein coding sequences
Good for both small as well as large genomes unlike HMM based methods
Parallel algorithms for creating faster library
Statistical interpretation
Interactive Graphical User Interface (GUI)
Customization options
Flexibility to build your own peptide library
Excellent circular genome result visualizer
Follows a combinatorial approach by taking both compositional as well as database similarity into consideration

★ Distinctions

Four new SARS genes were discovered using this software after some customization
GeneD'cfer achieves an average accuracy of more than 90%
It has high sensitivity and specificity
It is a high-quality product of the combined effort of CSIR, IGIB and the Indian Centre for Social Transformation
Its licensed users include esteemed institutes such as IIT and IICB

PLHost^FA

Protein Function Assignment Tool

This software tool is based on invariant peptide motif signatures and assigns putative functions to unknown proteins. It complements BLAST and, unlike BLAST, annotates automatically.

Background: The knowledge of conserved invariant peptides in a protein can be useful in assigning functions to hypothetical proteins, identifying critical amino acids, structural determinants and so on. The software PLHost and the database COPS (Comprehensive Peptide Signature) were developed to perform this task. The database provides information about function, structure and occurrence in biochemical pathways of the proteins containing these signature peptides. This database also facilitates the identification of folding nucleus / structural determinants in proteins and functional assignment to novel proteins.

Concepts & Methods

PLHost is based on a novel peptide-library approach for identifying 'functional signatures'. The approach is independent of alignment methods and does not require any a priori classification of protein functional families, so it is applicable to proteins with weak overall sequence similarity
PLHost provides a novel method for simultaneous comparison of multiple proteomes comprising millions of peptides, and retrieves functional signatures without prior classification of protein functional families
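The idea of assigning a putative function from an invariant peptide, without alignment, can be sketched as a dictionary lookup over the octapeptides of a query protein. This is a simplified illustration, not PLHost itself: the signature table, its function label and the query sequence are invented, and a real signature database such as COPS would be far larger.

```python
def assign_functions(protein, signatures, k=8):
    """Map each putative function to the positions where a signature peptide occurs.

    `signatures` is a hypothetical dict of {octapeptide: function label}.
    """
    found = {}
    for i in range(len(protein) - k + 1):
        peptide = protein[i:i + k]
        if peptide in signatures:
            found.setdefault(signatures[peptide], []).append(i)
    return found


# Invented signature and query, for illustration only.
signatures = {"GSGKSTLL": "putative nucleotide-binding protein"}
query = "MAAGSGKSTLLRK"
print(assign_functions(query, signatures))
```

Because the lookup is an exact match on short peptides, it works even when the query shares little overall similarity with any annotated protein, which is the situation the text describes.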

Features

Annotation and homology of small peptides
Octapeptide library and cross validation
Longest conserved peptide sequences
Annotation of unknown proteins
Peptides involved in active site formation
Homology in invariant peptides
User friendly, with no need for complicated sequence alignment tools
Customizable and interactive Graphical User Interface (GUI)

★ Distinctions

69 potential antibacterial drug targets found using PLHost
112 human proteins annotated
12,076 invariant peptides predicted as functional signatures using PLHost to make COPS database
4 new SARS genes annotated using PLHost
It is a high-quality product of the combined effort of CSIR, IGIB and the Indian Centre for Social Transformation
Its licensed users include esteemed institutes such as IIT and IICB

Proteome Calkulator

Comparative Proteomics Tool

Comparative proteomics plays a vital role in analyzing the protein sequences of various organisms. It helps in understanding the disease process, developing new biomarkers for diagnosis and accelerating drug development.

Background: Delineating conserved and variable regions in sequences is of fundamental biological importance. Conserved regions are strong indicators of phylogenetically conserved functional roles, whereas variable regions are generally implicated in auxiliary roles, often related to specific cases. The traditional approach to this objective compares homologous sequences using multiple sequence alignment algorithms. In practice, researchers simply wish to know, in a first pass, the conserved and variable regions. Proteome Calkulator meets this need by offering a rapid approach to compare all the proteins (proteome) of a given species with the proteomes of other species using a novel peptide library approach.

Method

Proteome Calkulator is a powerful computational tool for studying several proteomes at one go by performing set theory operations such as union, intersection, difference and inverse. These operations help identify the most unique, conserved and clustered regions of proteins across species, which in turn helps in formulating specific drug targets for the pharmaceutical industry
A characteristic feature of the tool is that it carries out multiple analyses on a wide range of bacterial strains. It screens pathogenic organisms, narrowing down to a unique disease condition. Its efficient backstitching operation fishes out specific protein functions and domains
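The set theory operations described above map directly onto set operations over peptide libraries. The sketch below is an illustration under stated assumptions rather than Proteome Calkulator's implementation: the proteomes are invented one-protein examples, and a peptide length of 3 is used only to keep the toy output small.

```python
def peptide_library(proteome, k):
    """Set of distinct k-mers across all protein strings in a proteome."""
    return {p[i:i + k] for p in proteome for i in range(len(p) - k + 1)}


# Invented toy proteomes, for illustration only.
pathogen_a = peptide_library(["MKTAY"], k=3)
pathogen_b = peptide_library(["KTAYI"], k=3)
host = peptide_library(["TAYIA"], k=3)

all_peptides = pathogen_a | pathogen_b    # union
conserved = pathogen_a & pathogen_b       # intersection: shared by both pathogens
pathogen_specific = conserved - host      # difference: conserved but absent from host

print(sorted(all_peptides))
print(sorted(conserved))
print(sorted(pathogen_specific))
```

Peptides that are conserved across pathogens but absent from the host are the kind of region the text points to as a starting point for drug target selection.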

Features

Alphabetically indexed peptide library
Unique set theory operations applied to comparative proteomics
Extensive data mining options
Proteomics comparison in wide spectrum of organisms
Gives a confidence level for each invariant peptide by reporting its total occurrence across proteins and organisms
Search options based on peptide, occurrence, or both in query results
Stitch module and multiple analysis
User friendly and Interactive Graphical User Interface (GUI)
Customizable

★ Distinctions

Less computationally intensive, yet accurate
It is a high-quality product of the combined effort of CSIR, IGIB and the Indian Centre for Social Transformation
Its licensed users include esteemed institutes such as IIT and IICB

SEAPATH

Adhesin Prediction Tool

Prediction of surface proteins involved in virulence from the complete sequences of proteomes of pathogens can greatly facilitate the development of anti-infectives towards eradicating infectious diseases.

Background: Virulent organisms possess adhesin proteins, which bind to the host and resist its defense mechanisms. Identifying these proteins helps in finding potential targets (bacterial and viral surface antigens / adhesins) for new vaccine formulations and in developing therapeutics against pathogens. The conventional methods for identifying adhesin proteins are time consuming and demand large resources. SEAPATH uses decisive parameters to assess whether a protein is an adhesin. The software not only identifies known adhesins but also helps researchers narrow down their search, and thus improves the accuracy of annotating proteins as adhesins.

Method

SEAPATH is based on the base algorithm SPAAN
The underlying architecture of this tool is based on Artificial Neural Networks (ANN)
It takes 105 compositional properties of a sequence into consideration to predict adhesins
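To show what a compositional property of a sequence looks like in practice, the sketch below turns a protein into a fixed-length numeric vector that could be fed to an ANN. It is only an illustration: the 105 properties used by SEAPATH are not listed on this page, so the 21 features here (20 amino acid frequencies plus the fraction of charged residues) and the function name are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CHARGED = "DEKR"  # assumption: acidic and basic residues, histidine excluded


def composition_features(protein):
    """Return 20 amino acid frequencies followed by the charged-residue fraction."""
    counts = Counter(protein)
    n = len(protein) or 1  # avoid division by zero for an empty sequence
    freqs = [counts[a] / n for a in AMINO_ACIDS]
    charged = sum(counts[a] for a in CHARGED) / n
    return freqs + [charged]


features = composition_features("AAKD")  # invented sequence
print(len(features), features[0], features[-1])
```

Features of this kind depend only on the sequence itself, not on similarity to known proteins, which is what makes a non-homology-based predictor possible.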

Features

Non homology based method
Analysis based on 5 parameters
Only software available for Adhesin prediction
Optimal sensitivity of 89% and specificity of 100% on a defined test set
Adhesin prediction accuracy for known adhesins of 97.4% for wide range of bacteria
Tabulates individual contributions of the parameters
Choice of parameters
User friendly and Interactive Graphical User Interface (GUI)
Customizable
Primarily used for drug discovery
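For reference, the sensitivity and specificity figures quoted above follow the usual definitions: sensitivity is the fraction of true adhesins that are predicted as adhesins, and specificity is the fraction of non-adhesins that are predicted as non-adhesins. The counts in the sketch below are invented purely so that the arithmetic reproduces 89% and 100%; this page does not give the composition of the actual test set.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical confusion-matrix counts, not SEAPATH's real test data.
sens, spec = sensitivity_specificity(tp=89, fn=11, tn=100, fp=0)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")
```

A specificity of 100% means no false positives on the test set, while a sensitivity of 89% means some true adhesins were missed at the chosen threshold.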

★ Distinctions

Novel adhesins were identified for different pathogens
In the case of the SARS-associated human coronavirus, the spike glycoprotein and the nsps (nsp2, nsp5, nsp6 and nsp7) were identified after some customization
Only computational counterpart to wet-lab work for this type of analysis
It is a high-quality product of the combined effort of CSIR, IGIB and the Indian Centre for Social Transformation
Its licensed users include esteemed institutes such as IIT and IICB
Reference: http://www.pnas.org/content/105/14/5555.abstract

Why Pharmaceutical Companies Use Geno-Cluster

Especially valuable for companies engaged in re-engineering vaccines, new drug discovery, or finding druggable targets, or with the following needs:

To increase the efficacy of drugs based on population genetics
To decrease the number of Adverse Drug Reactions (ADR)
To reduce the cost and risk of clinical trials by targeting only those populations capable of responding to a drug
To reduce the number of medicines patients must take to find an effective therapy
To revive previously failed drug targets by matching them with the niche populations they serve
To shorten the length of time patients are on medication
To promote a net decrease in the cost of health care by increasing the range of possible drug targets
To discover potential therapies more easily using genome targets
To facilitate the drug approval process, as trials targeted at specific genetic population groups have greater chances of success

Indian Bioinformatics Product Success Story

NMITLI — National Mission on Innovation

NMITLI is the largest public-private-partnership R&D initiative of the Govt. of India. In a short span of time, the programme has several significant achievements to its credit. These include the TB molecule, herbal formulations for psoriasis, a low-cost computer, a weather forecast system and bioinformatics products. GENO-CLUSTER is one of these products. It was developed by the Institute of Genomics and Integrative Biology (IGIB) and the Council of Scientific and Industrial Research (CSIR), and its further development and hosting are now supported by the Indian Centre for Social Transformation (Indian CST), a public charitable trust.

All the applications hold US patents, and have been installed in the leading academic and research institutions all over India and across the world. The software has already proved to be of tremendous use in the discovery of novel genes of the SARS virus and has several papers credited to its findings.

For Universities & Colleges

New training courses should be initiated by universities to promote the advantages of using indigenous bioinformatics tools in bioinformatics departments across the country
These courses would cover the fundamentals of operating systems, parallel programming, hardware design and architectures, as well as how to use these tools
Interdisciplinary research should be encouraged and students across departments should be allowed to take up such courses
These tools will help students generate and develop new ideas and concepts, which could revolutionize bioinformatics research
At the postgraduate level, the tools could be used for carrying out project work and publishing papers

National Interest & Benefits

In the larger national interest, bioinformatics tools developed in India should be taught to students by all our universities, colleges and bioinformatics centers, to strengthen the employability of graduates and to create manpower familiar with such tools
Students across departments should be allowed to take up such courses, giving them exposure to emerging trends in the field and experience of using a supercomputing environment
