+91 9739047849  |  Helpdesk: +91 9742979111  |  Tel: +91 8041473425  |  Bengaluru, Karnataka, India
Novel Drug Discovery Platform · 280+ Genomes Analyzed · 283 Proteins Patented · US Patents Held · 28+ Institutions Worldwide · 97.4% Adhesin Accuracy

CSIR-IGIB · Indian CST · NMITLI


Geno-Cluster

A Novel Platform Software Tool for Facilitating New Drug Discovery

A suite of software programs — GENE'D'CFER, PROTEOME CALKULATOR, PLHOSTFA, and SEAPATH — ported into a LINUX cluster to harness enhanced computational power for prediction of prokaryotic genes, functional assignment of encoded products, and identification of adhesins using Artificial Neural Network based algorithms.

▶ Platform Pipeline

GeneD'cfer

Gene Prediction

Proteome Calkulator

Comparative Proteomics

PLHostFA

Function Assignment

SEAPATH

Adhesin Prediction

PRIMARY SEQUENCE DATABANK

PEPTIDE DATABANK · ANN-Powered · LINUX Cluster

About Geno-Cluster

Geno-Cluster Flowdiagram
Motivation

The availability of complete sequences of more than 280 genomes provides novel opportunities for in-depth understanding of various biological phenomena through in silico comparative genomics. Identification of novel genes, assignment of function to gene products, and their evaluation as potential drug targets are considered to be of prime importance.

We have developed a suite of software programs (GENE'D'CFER, PROTEOME CALKULATOR, PLHOSTFA, and SEAPATH) and ported them to a LINUX cluster. The enhanced computational power aids the prediction of prokaryotic genes, the functional assignment of encoded products, and the identification of adhesins with the help of Artificial Neural Network based algorithms.

Key Capabilities
  • Prediction of prokaryotic genes using ANN-based heptapeptide algorithms
  • Functional assignment of encoded gene products via invariant peptide motifs
  • Identification of adhesins — surface virulence proteins
  • Comparative proteomics across species using peptide library approach
  • Drug target discovery and vaccine development support
  • Applied to 18 completely sequenced prokaryotic genomes

  • 280+ Genome Sequences Analyzed
  • 283 Proteins Identified & Patented
  • 4 New SARS Genes Predicted
  • 90%+ GeneD'cfer Accuracy
  • 97.4% Adhesin Identification Rate
  • 28+ Licensed Institutions

Results

Gene'D'cfer (GDC) Approach

Prokaryotic Gene Identification Using ANN & Peptide Library

We have developed a generic and versatile new approach, designated Gene'D'cfer (GDC), for prokaryotic gene identification. Unlike other existing methods, this approach employs peptides as markers for protein-coding DNA sequences. GDC determines candidate genes among all possible ORFs in a given DNA sequence using an Artificial Neural Network (ANN) trained on a library of known peptides. Potential ORFs are ranked according to a scoring scheme based on the abundance and distribution pattern of heptapeptides along the ORF. ORFs identified by GDC can be overlaid with other features, using complementary software programs for ribosomal binding sites, promoter sequences, transcription start sites, or codon biases, for further examination. An analysis of 18 completely sequenced prokaryotic genomes has been carried out to demonstrate the capabilities of GDC. In addition, GDC has been applied to various strains of the SARS virus, and 4 new genes were predicted.
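The ranking idea can be illustrated with a small Python sketch. It is not the GDC scoring scheme or its ANN: the two measures used here (fraction of heptapeptide windows found in a library, and fraction of equal-width bins along the ORF that contain a hit) and their product are assumptions chosen only to show how "abundance" and "distribution" might be quantified. All function names and the toy library are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

K = 7  # heptapeptides, as described in the text


def heptapeptide_hits(protein: str, library: Set[str]) -> List[int]:
    """Start positions of library heptapeptides in a translated ORF."""
    return [i for i in range(len(protein) - K + 1) if protein[i:i + K] in library]


def orf_score(protein: str, library: Set[str], bins: int = 4) -> Tuple[float, float]:
    """Return (abundance, coverage) for one translated ORF.

    abundance: fraction of heptapeptide windows found in the library.
    coverage:  fraction of equal-width bins along the ORF with at least one hit.
    Both measures are illustrative assumptions, not the published scheme.
    """
    windows = len(protein) - K + 1
    if windows <= 0:
        return 0.0, 0.0
    hits = heptapeptide_hits(protein, library)
    abundance = len(hits) / windows
    occupied = {min(bins - 1, pos * bins // windows) for pos in hits}
    return abundance, len(occupied) / bins


def rank_orfs(orfs: Iterable[str], library: Set[str]) -> List[Tuple[str, float]]:
    """Rank translated ORFs by abundance * coverage, highest first."""
    scored = []
    for orf in orfs:
        abundance, coverage = orf_score(orf, library)
        scored.append((orf, abundance * coverage))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy library and ORF, invented for illustration only.
toy_library = {"MKTAYIA", "KTAYIAK", "QRQISFV"}
toy_orf = "MKTAYIAKQRQISFVKSHFSRQ"
ranking = rank_orfs(["AAAAAAAAAA", toy_orf], toy_library)
```

In the real tool the ranking is produced by a trained ANN; a hand-written product of two ratios is used here only so the sketch stays self-contained.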

Proteome Calkulator

Rapid Comparative Proteomics via Peptide Library Approach

Delineating conserved and variable regions in sequences is of fundamental biological importance. Conserved regions are strong indicators of phylogenetically conserved functional roles, whereas variable regions are generally implicated in auxiliary roles, often related to specific cases. The traditional approach to this objective compares homologous sequences using multiple sequence alignment algorithms. This approach, although sound in theory, is limited in speed and is not suited to high-capacity work. Although this limitation can in principle be overcome using powerful computers with enlarged memory, the results need careful scrutiny by the user. In most cases, users simply wish to know, in a first pass, the conserved and variable regions. PROTEOME CALKULATOR meets this need by offering a rapid approach to compare all the proteins (proteome) of a species with the proteomes of other species using a peptide library approach.

SEAPATH

ANN-Based Adhesin Prediction at 97.4% Accuracy

Prediction of surface proteins involved in virulence from the complete proteome sequences of pathogens can greatly facilitate the development of anti-infectives for eradicating infectious diseases. An ANN was used to develop SEAPATH, which predicts the probability of a protein being an adhesin (Pad) based on 105 compositional properties of a sequence. SEAPATH draws upon the base algorithm SPAAN, which had an optimal sensitivity of 89% and specificity of 100% and could identify 97.4% of adhesins from a wide range of bacterial pathogens causing a broad range of diseases in humans and other hosts. In the case of the human coronavirus associated with Severe Acute Respiratory Syndrome (SARS), the spike glycoprotein and several nsps (nsp2, nsp5, nsp6 and nsp7) were identified as having adhesin-like characteristics and offer new leads for rapid experimental testing.

GDC Methodology — 5 Major Steps

  • Step 1: Generate Peptide Library
  • Step 2: Artificial Translation (6 Reading Frames)
  • Step 3: Integer Coded Sequences
  • Step 4: Training ANN
  • Step 5: Decipher Genes

The Four Software Tools

Developed by CSIR-IGIB, supported by Indian Centre for Social Transformation — all holding US patents and licensed by top institutions including IIT and IICB.


GeneD'cfer (GDC)

Prokaryotic Gene Identification Tool

This software tool for predicting genes in Prokaryotes determines gene candidates amongst all possible ORFs of a given DNA sequence by using a peptide library and an Artificial Neural Network (ANN).

Background: The development of GeneD'cfer is based on the observation that the difference between the total number of theoretically possible peptides of a given length and the number actually observed in nature grows drastically as the peptide length increases. Moreover, most of the peptides selected by nature are found only in coding regions and very rarely in theoretically translated non-coding regions. Prediction of a given ORF as a coding region/gene is based on the number of heptapeptides present and their distribution along the ORF.
Method: 5 Major Steps
  • Generation of a peptide library
  • Artificial translation of a given genome into six reading frames
  • Conversion of each translated sequence into an integer coded sequence
  • Training of ANN
  • Deciphering genes using trained ANN
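Steps 2 and 3 above can be sketched in Python as follows. The sketch assumes the standard codon table and a simple 1 to 20 integer code for residues (0 for stop codons or unknown residues); the actual coding scheme used by GeneD'cfer is not given on this page, so `RESIDUE_CODE` and all function names are hypothetical.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical integer code: residues map to 1..20, anything else to 0.
RESIDUE_CODE = {aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY", start=1)}


def translate(dna: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> List[str]:
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (dna, reverse) for offset in range(3)]


def integer_code(protein: str) -> List[int]:
    """Convert a translated sequence into an integer-coded sequence."""
    return [RESIDUE_CODE.get(aa, 0) for aa in protein]


frames = six_frame_translation("ATGAAATAG")
coded = integer_code(frames[0])
```

The integer-coded frames would then serve as ANN input in steps 4 and 5; the training itself is outside the scope of this sketch.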
Features
  • Powered by a database of conserved Heptapeptides across organisms
  • Based on Artificial Neural Networks (ANN) using evolutionary principles
  • Cross validation of proteomic information to explicate its protein coding sequences
  • Suitable for both small and large genomes, unlike HMM-based methods
  • Parallel algorithms for faster library creation
  • Statistical interpretation
  • Interactive Graphical User Interface (GUI)
  • Customization options
  • Flexibility to build your own peptide library
  • Excellent circular genome result visualizer
  • Follows a combinatorial approach by taking both compositional as well as database similarity into consideration

★ Distinctions

  • Four new SARS genes were discovered using this software after some customization
  • GeneD'cfer achieves an average accuracy of more than 90%
  • It has a high level of sensitivity and specificity
  • It is a high-quality product of the combined effort of CSIR, IGIB and the Indian Centre for Social Transformation
  • Its long list of licensed users includes esteemed institutes such as IIT and IICB

PLHostFA

Protein Function Assignment Tool

This software tool is based on invariant peptide motif signatures and assigns putative functions to unknown proteins. It is a complementary tool to BLAST and is an auto-annotator unlike BLAST.

Background: The knowledge of conserved invariant peptides in a protein can be useful in assigning functions to hypothetical proteins, identifying critical amino acids, structural determinants and so on. The software PLHost and the database COPS (Comprehensive Peptide Signature) were developed to perform this task. The database provides information about function, structure and occurrence in biochemical pathways of the proteins containing these signature peptides. This database also facilitates the identification of folding nucleus / structural determinants in proteins and functional assignment to novel proteins.
Concepts & Methods
  • PLHost is based on a novel peptide-library approach for the identification of 'functional signatures'. This approach is independent of alignment methods and does not require any a priori classification of protein functional families, so it is applicable to proteins with weak overall sequence similarity
  • PLHost provides a novel method for simultaneous comparison of multiple proteomes comprising millions of peptides, and retrieves functional signatures without a priori classification of protein functional families
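As a rough illustration of alignment-free function assignment, the Python sketch below scans a query protein for octapeptides that appear in a signature table and ranks the associated functions by number of hits. The table format, the function labels, and the simple hit-count ranking are assumptions made for this example; they do not describe the COPS database schema or PLHost's actual logic.

```python
from collections import Counter
from typing import Dict, List, Tuple

K = 8  # octapeptides, matching the library size listed under Features


def assign_functions(protein: str, signatures: Dict[str, str]) -> List[Tuple[str, int]]:
    """Rank putative functions by how many signature peptides occur in the protein.

    `signatures` maps an invariant peptide to a function label. No alignment
    is performed: each length-K window is looked up directly.
    """
    votes: Counter = Counter()
    for i in range(len(protein) - K + 1):
        function = signatures.get(protein[i:i + K])
        if function is not None:
            votes[function] += 1
    return votes.most_common()


# Toy signature table and query sequence, invented for illustration only.
toy_signatures = {
    "GSGKSTLL": "toy function A",
    "HELICASE": "toy function B",
}
toy_query = "MAGSGKSTLLRRGSGKSTLLA"
assignments = assign_functions(toy_query, toy_signatures)
```

Because each window is a dictionary lookup, the cost grows linearly with protein length, which is the practical appeal of a peptide-library approach over pairwise alignment.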
Features
  • Annotation and homology of small peptides
  • Octapeptide library and cross validation
  • Longest conserved peptide sequences
  • Annotation of unknown proteins
  • Peptides involved in active site formation
  • Homology in invariant peptides
  • User friendly, with no need for complicated sequence alignment tools
  • Customizable and interactive Graphical User Interface (GUI)

★ Distinctions

  • 69 potential antibacterial drug targets found using PLHost
  • 112 human proteins annotated
  • 12,076 invariant peptides predicted as functional signatures using PLHost to make COPS database
  • 4 new SARS genes annotated using PLHost
  • It is a high-quality product of the combined effort of CSIR, IGIB and the Indian Centre for Social Transformation
  • Its long list of licensed users includes esteemed institutes such as IIT and IICB

Proteome Calkulator

Comparative Proteomics Tool

Comparative proteomics plays a vital role in analyzing the protein sequences of various organisms. It helps in understanding disease processes, developing new biomarkers for diagnosis, and accelerating drug development.

Background: Delineating conserved and variable regions in sequences is of fundamental biological importance. Conserved regions are strong indicators of phylogenetically conserved functional roles, whereas variable regions are generally implicated in auxiliary roles, often related to specific cases. The traditional approach to this objective compares homologous sequences using multiple sequence alignment algorithms. In practice, researchers often simply wish to know, in a first pass, the conserved and variable regions. Proteome Calkulator meets this need by offering a rapid approach to compare all the proteins (proteome) of a given species with the proteomes of other species using a novel peptide library approach.
Method
  • Proteome Calkulator is a computational tool for studying several proteomes at once by performing set theory operations such as union, intersection, difference and inverse. These operations help identify the most unique, conserved and clustered regions of proteins across species, which in turn helps in formulating specific drug targets for the pharmaceutical industry
  • A characteristic feature of the tool is that it carries out multiple analyses on a wide range of bacterial strains. It screens pathogenic organisms, narrowing down to a unique disease condition. Its backstitching operation retrieves specific protein functions and domains
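The set theory operations named above map naturally onto Python's built-in `set` type. The sketch below is a minimal illustration under stated assumptions: peptide libraries are built as sets of fixed-length windows, and "inverse" is interpreted as the complement of one library relative to the union of all libraries, which is a guess at the intended meaning. Function names and toy sequences are hypothetical.

```python
from typing import Dict, Iterable, Set


def peptide_library(proteome: Iterable[str], k: int = 7) -> Set[str]:
    """All distinct length-k peptides occurring in a proteome."""
    return {p[i:i + k] for p in proteome for i in range(len(p) - k + 1)}


def compare(libraries: Dict[str, Set[str]], target: str) -> Dict[str, Set[str]]:
    """Apply set operations to a target library against all others.

    Requires at least one library. 'inverse' is taken here to mean the
    peptides seen in some organism but absent from the target (an assumption).
    """
    others = [lib for name, lib in libraries.items() if name != target]
    everything = set().union(*libraries.values())
    return {
        "union": everything,
        "intersection": set.intersection(*libraries.values()),
        "difference": libraries[target].difference(*others),
        "inverse": everything - libraries[target],
    }


# Toy proteomes with k=3 so the sets stay small enough to read.
toy_libraries = {
    "organism_a": peptide_library(["MKTAY"], k=3),
    "organism_b": peptide_library(["KTAYQ"], k=3),
}
result = compare(toy_libraries, "organism_a")
```

Here `intersection` corresponds to conserved peptides and `difference` to peptides unique to the target organism, the two sets most directly relevant to picking pathogen-specific candidates.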
Features
  • Alphabetically indexed peptide library
  • Unique set theory operations applied to comparative proteomics
  • Extensive data mining options
  • Proteomics comparison in wide spectrum of organisms
  • Gives a high confidence level for each invariant peptide by reporting its total occurrence across proteins and organisms
  • Search options on query results by peptide, by occurrence, or by both
  • Stitch module and multiple analysis
  • User friendly and Interactive Graphical User Interface (GUI)
  • Customizable

★ Distinctions

  • Less computationally demanding, yet accurate
  • It is a high-quality product of the combined effort of CSIR, IGIB and the Indian Centre for Social Transformation
  • Its long list of licensed users includes esteemed institutes such as IIT and IICB

SEAPATH

Adhesin Prediction Tool

Prediction of surface proteins involved in virulence from the complete sequences of proteomes of pathogens can greatly facilitate the development of anti-infectives towards eradicating infectious diseases.

Background: Virulent organisms possess adhesin proteins, which bind to the host and resist its defense mechanisms. These proteins help in identifying potential targets (bacterial and viral surface antigens / adhesins) for new vaccine formulations and for developing therapeutics against pathogens. Conventional methods for identifying adhesin proteins are time consuming and demand large resources. SEAPATH uses decisive parameters to assess whether a protein is an adhesin. The software not only identifies known adhesins but also helps researchers narrow down their search, and thus improves the accuracy of annotating proteins as adhesins.
Method
  • SEAPATH is based on the base algorithm SPAAN
  • The underlying architecture of this tool is based on Artificial Neural Networks (ANN)
  • It takes 105 compositional properties of a sequence into consideration to predict adhesins
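To make "compositional properties" concrete, the Python sketch below computes two simple families of sequence features: amino acid frequencies and the frequencies of a chosen set of dipeptides. The 105 properties actually used by SEAPATH are not listed on this page, so these two families, the function names, and the example dipeptides are illustrative assumptions rather than the tool's real inputs.

```python
from collections import Counter
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_frequencies(seq: str) -> Dict[str, float]:
    """Fraction of each of the 20 standard residues in the sequence."""
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def dipeptide_frequencies(seq: str, dipeptides: Sequence[str]) -> List[float]:
    """Fraction of overlapping residue pairs equal to each listed dipeptide."""
    seq = seq.upper()
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(pairs.values())
    return [(pairs[d] / total if total else 0.0) for d in dipeptides]


def feature_vector(seq: str, dipeptides: Sequence[str]) -> List[float]:
    """Concatenate both feature families into one numeric vector."""
    freqs = amino_acid_frequencies(seq)
    return [freqs[aa] for aa in AMINO_ACIDS] + dipeptide_frequencies(seq, dipeptides)


# Toy sequence and dipeptide selection, invented for illustration only.
toy_dipeptides = ["AA", "KD", "GG"]
toy_features = feature_vector("AAKD", toy_dipeptides)
```

A vector of this kind is what a neural network would take as input to output an adhesin probability; because the features depend only on composition, no homology search is involved.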
Features
  • Non homology based method
  • Analysis based on 5 parameters
  • Only software available for Adhesin prediction
  • Optimal sensitivity of 89% and specificity of 100% on a defined test set
  • Adhesin prediction accuracy for known adhesins of 97.4% for wide range of bacteria
  • Tabulates individual contributions of the parameters
  • Choice of parameters
  • User friendly and Interactive Graphical User Interface (GUI)
  • Customizable
  • Primarily used for drug discovery

★ Distinctions

  • Novel adhesins were identified for different pathogens
  • In case of SARS associated Human corona virus, the spike glycoprotein, and nsps (nsp2, nsp5, nsp6 and nsp7) of SARS virus were identified after some customization
  • The only computational counterpart to wet-lab work for this type of analysis
  • It is a high-quality product of the combined effort of CSIR, IGIB and the Indian Centre for Social Transformation
  • Its long list of licensed users includes esteemed institutes such as IIT and IICB
  • Reference: http://www.pnas.org/content/105/14/5555.abstract

Geno-Cluster Achievements

  • 283 Proteins Identified & Patented
  • 4 New SARS Genes Identified
  • 15 Protein-Coding Regions in SARS-CoV
  • 2,605 Bacterial Proteins Annotated
  • 112 Human Proteins Annotated
  • 69 Antibacterial Drug Targets Found
  • 12,076 Invariant Peptides as Functional Signatures
  • 28+ Institutions Licensed Worldwide

Why Pharmaceutical Companies Use Geno-Cluster

Especially valuable for companies engaged in re-engineering vaccines, new drug discovery, or finding druggable targets, or for the following needs:

GENO-CLUSTER in Drug Discovery Process

Scientists rely on bioinformatics during every step of the drug discovery process in an effort to comprehend biological and disease mechanisms, identify new targets, and select and design novel drugs. But while methods for sequencing, measuring expression, and assessing structure have achieved high-throughput capacity via automation, the means of analyzing the resulting data are lagging behind.

Geno-Cluster Drug Discovery Process Diagram

Genes

Protein Function

Adhesion

Drug Target Discovery

Drug Screening & Toxicology

Drug Discovery & Characterization

Medicinal Chemistry

Validation of Animal Models

Patient Profiling

Preclinical Studies

Clinical Studies

Why It Will Be a Successful Story

The Future of Personalized Medicine

The practice of studying genetic disorders is changing from investigation of single genes in isolation to discovering cellular networks of genes, understanding complex interactions, and identifying their role in disease.

As a result of this, a whole new age of individually tailored medicine will emerge. Bioinformatics will guide and help molecular biologists and clinical researchers to capitalize on the advantages brought by computational biology.

On the horizon: more effective and affordable medicines, new research that leads to treatment and cures, and healthcare decisions based on a person's genes.

Collaborations between small biotech companies and larger drug development organizations, such as pharmaceutical companies, can be mutually beneficial. Under such agreements, the smaller company gains financing to carry on with its R&D programs, while the larger company supplements its new drug pipeline with an innovative product.

Indian Bioinformatics Product Success Story

NMITLI — National Mission on Innovation

NMITLI is the largest public-private-partnership R&D initiative of the Govt. of India. In a short span of time, the programme has several significant achievements to its credit. These include the TB molecule, herbal formulations for psoriasis, a low-cost computer, a weather forecast system, and bioinformatics products. GENO-CLUSTER is one of these products. It was developed by the Institute of Genomics and Integrative Biology (IGIB) and the Council of Scientific and Industrial Research (CSIR), and its further development and hosting are now supported by the Indian Centre for Social Transformation (Indian CST), a public charitable trust.

All the applications hold US patents, and have been installed in the leading academic and research institutions all over India and across the world. The software has already proved to be of tremendous use in the discovery of novel genes of the SARS virus and has several papers credited to its findings.

For Universities & Colleges

  • New training courses should be initiated by Universities to promote the idea of the advantages of using Indigenous Bioinformatics tools in bioinformatics departments across the country
  • These courses would cover the fundamentals of operating systems, parallel programming, hardware design and architectures, as well as how to use these tools
  • Interdisciplinary research should be encouraged and students across departments should be allowed to take up such courses
  • These tools will help students generate and develop new ideas and concepts, which could revolutionize bioinformatics research
  • At the postgraduate level, it could be used for carrying out project work and publishing papers

National Interest & Benefits

  • In the larger national interest, bioinformatics tools developed in India should be taught to students by all our universities, colleges and bioinformatics centers, to strengthen the employability of graduates and to create manpower familiar with such tools
  • Students across departments should be allowed to take up such courses, giving them exposure to emerging trends in the field and experience of using a supercomputing environment

Institutes That Have Purchased Geno-Cluster

  1. Indian Institute of Chemical Biology
  2. Dr. Naidu's Global Academy
  3. Indian Veterinary Research Institute (IVRI)
  4. University of Pune
  5. Madurai Kamaraj University
  6. Rajiv Gandhi Centre for Biotechnology
  7. Fisheries College
  8. Vidya Pratishthan's (Baramati)
  9. Indian Institute of Technology Madras
  10. Amrita Vishwa Vidyapeetham
  11. RIKEN, Genome Science Center, Japan
  12. DOEACC Kolkata
  13. Sri Ramachandra Medical College & Research Institute
  14. Holy Cross College
  15. Union Christian College
  16. Banasthali Vidyapith
  17. TNAU, Coimbatore
  18. Bharathidasan University
  19. National Institute of Technology
  20. IBSD (Imphal)
  21. BARC
  22. CIMAP
  23. West Bengal University of Technology
  24. Lyallpur Khalsa College
  25. National Jalma Institute
  26. University of Kerala
  27. University of Allahabad
  28. NIPER

For More Details & Demo

For more details and a real-time demo of the solutions, visit www.indiancst.in. Academic access to CSIR-IGIB data servers is free. Contact CSIR-IGIB or Indian CST for commercial use.

Visit www.indiancst.in

Free Academic Use

CSIR-IGIB data servers free for academic institutions

Commercial Licensing

Contact CSIR-IGIB or Indian CST for commercial use

University Training

Training on indigenous bioinformatics tools

R&D Collaboration

Joint research with CSIR-IGIB & Indian CST