Bioinformatics, Biostatistics, and Computational Biology
We envision that the BBC will build upon all these recent advances in hardware and personnel to provide specific services required by individual Projects and general services required by multiple projects, and to work with Campuses and LONI resource providers to guide the development of Cyberinfrastructure that is effective for Computational Biologists and Bioinformaticians. The BBC will also greatly increase the level, in both frequency and sophistication, of Computational Training provided to LBRN researchers.
The functionality of the BBC services can be roughly grouped into 10 different categories:
1. HPC computing: Project requires HPC computing resources provided by LONI.
2. Data intensive: Project requires data intensive computing that will be supported by LONI/Petashare.
3. Data visualization: Project benefits from data visualization provided by LONI.
4. Rational Drug Discovery (Docking/protein-protein/NA-NA interactions): Project carries out rational drug discovery computation, and the BBC will support it with computing resources and domain expertise.
5. Microarray data: Project carries out microarray experiments and requires microarray data analysis, or develops tool kits for microarray data analysis.
6. Genomic database search: Project benefits from genome database searches, which the BBC will support with computing resources and consulting.
7. Pathway analysis: Project carries out pathway analysis, with the BBC providing support and consulting on packages.
8. Bioinformatics tool development: Assistance with tool development and delivery for Bioinformatics research.
9. Application of Statistical analysis and standalone tools: Assistance and mentoring with case-based statistical and tool consulting.
10. Data and information integration: Project uses LONI resources to integrate data and perform discovery from related experiments using BBC-supported CyberTools.
In general, Categories 1-3 provide the software and infrastructure services that enhance the High Performance Computing resources and environments to support effective computational and data management, including support for the Visualization requirements of research projects. Categories 4-7 provide support for specific Bioinformatics tools; together with the first set, they will be the primary responsibility of the Bioinformatics sub-core. Categories 8-10 will be jointly supported by the two sub-cores.
Under the current grant, tools specific to life science data sets have been developed. They have been used and integrated with tools in the public domain, and they are scalable and capable of handling large data sets. Some specific tools developed include, but are not limited to:
LABiViz: Data Analysis and Visualization
LABiViz is a suite of tools that helps with the analysis and visualization of biomedical data and with the identification of patterns in it. In particular, it focuses on neural network-augmented, high-performance computing-supported visualization techniques and algorithms. These tools enable life scientists to analyze high-dimensional biomedical data, ranging from gene expression arrays, miRNA predictions, and sequences to any other data in a commonly used tabular format. The techniques incorporated include a modified version of a self-organizing map, a data mining algorithm, visualizations (parallel coordinates, radviz, scatter plot, etc.), statistical tools (min, max, range, averages, etc.), and selection and filtering tools that enable real-time interaction with the data.
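The statistical summaries listed above (min, max, range, averages) and the per-axis scaling that parallel coordinates plots typically rely on can be illustrated with a minimal sketch. This is not LABiViz code: the function names `column_summary` and `scale_to_unit`, the list-of-rows input format, and the sample values are all assumptions made for illustration.

```python
from statistics import mean


def column_summary(rows):
    """Return min, max, range and mean for each column of a numeric table.

    `rows` is a list of equal-length lists of numbers (a hypothetical
    in-memory format; a real tool would read a tabular file instead).
    """
    summaries = []
    for column in zip(*rows):  # transpose rows into columns
        low, high = min(column), max(column)
        summaries.append({
            "min": low,
            "max": high,
            "range": high - low,
            "mean": mean(column),
        })
    return summaries


def scale_to_unit(rows):
    """Min-max scale each column to [0, 1] so all axes share one scale.

    Constant columns (range 0) are mapped to 0.0 to avoid division by zero.
    """
    stats = column_summary(rows)
    return [
        [
            (value - s["min"]) / s["range"] if s["range"] else 0.0
            for value, s in zip(row, stats)
        ]
        for row in rows
    ]


# Made-up expression values: three samples (rows), two genes (columns).
table = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
summary = column_summary(table)
unit = scale_to_unit(table)
```

Scaling every column to the same interval is one common way to keep a single high-valued variable from dominating a multi-axis plot; other normalizations would work equally well here.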
Castle: Cluster Analysis
The tools incorporated in Castle are based on metrics and provide for a visual comparison of unsupervised and supervised clustering results. Users can choose from a set of clustering algorithms incorporated into Castle or, if they prefer, import their own clustered data for analysis. Users can visually identify the overlap and differences among the results of different algorithms and make the best decision with regard to the true cluster structure. The suite includes data mining, visualization, and computational tools.
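The description above does not say which metrics Castle uses. As a hedged illustration of metric-based comparison of two clusterings, the sketch below computes a contingency table and the Rand index, a standard pair-counting agreement measure. The function names and the example labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def contingency(labels_a, labels_b):
    """Count items for each (cluster in A, cluster in B) combination."""
    return Counter(zip(labels_a, labels_b))


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    together, or both put them apart. Cluster names themselves do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total if total else 1.0


# Made-up cluster assignments for six items from two algorithms.
algo_1 = [0, 0, 1, 1, 2, 2]
algo_2 = [1, 1, 0, 0, 0, 2]
overlap = contingency(algo_1, algo_2)
score = rand_index(algo_1, algo_2)
```

The contingency counts show where the two results overlap cluster by cluster, while the Rand index condenses the comparison into a single number between 0 and 1. The pairwise loop is quadratic in the number of items, so it suits small examples only.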
P3Maps: Protein Physico-chemical Property Maps
P3Maps provides an understanding of the relationship between sequence and physico-chemical properties and paves the way for users to identify local sequence property modulations that impact protein function without changing the protein structure. It provides users a means to analyze multiple sequences (DNA or protein) using the existing tools of Clustal W and phylogenetic trees in high-performance computing environments. Its features let users analyze conserved domains using predictive data mining techniques. The tool is based on coherence profiles of over 400 different physico-chemical properties of protein sequences and allows users to focus on a distinct property-space or sequence component for comparison and mining purposes.
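To make the idea of a per-sequence property profile concrete, here is a minimal sketch that averages a single property, the Kyte-Doolittle hydropathy scale, over a sliding window. It is an illustration under stated assumptions, not the coherence-profile method that P3Maps implements; the function name `property_profile`, the window size, and the example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values, one published physico-chemical scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_profile(sequence, scale=KYTE_DOOLITTLE, window=5):
    """Average a per-residue property over a sliding window.

    Returns one value per window position. Residue letters missing from
    `scale` raise KeyError rather than being silently skipped.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [scale[residue] for residue in sequence.upper()]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


# Made-up ten-residue peptide, used only to exercise the function.
profile = property_profile("MKVLAAGIVG", window=5)
```

Swapping in a different `scale` dictionary would produce a profile for another property, which is the sense in which a user could focus on one region of property-space at a time.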
PC4: Protein Core Calculation, Conformation and Classification
PC4 is a graph-based data mining tool that gives users a way to analyze the structure of proteins. Its features enable users to extract and isolate protein structural units of sustained invariance among evolutionarily related proteins by integrating physico-chemical properties over the 3-D structure of a protein. The tool includes data mining algorithms for classification and prediction and connects to known datasets and databases, such as the PDB and SCOP. Various sub-graphs can be generated to demonstrate interaction patterns among residues under different property constraints, and the tool can be focused on structural units of specific interest to the user.
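One simple way to picture a residue interaction graph with a property constraint is a distance-cutoff contact graph restricted to residues that pass a filter. The sketch below is an assumption-laden illustration, not PC4's algorithm: the function name `contact_graph`, the 8.0 angstrom cutoff, the hydrophobic residue set, and the coordinates are all hypothetical.

```python
from itertools import combinations
from math import dist


def contact_graph(residues, cutoff=8.0, keep=None):
    """Build an undirected contact graph as {index: set of neighbour indices}.

    residues: list of (name, (x, y, z)) with one representative coordinate
              per residue (for example the C-alpha atom).
    cutoff:   maximum distance, in angstroms, for two residues to be linked.
    keep:     optional predicate on the residue name; residues that fail it
              are left out, which yields a property-constrained sub-graph.
    """
    selected = [
        i for i, (name, _) in enumerate(residues)
        if keep is None or keep(name)
    ]
    graph = {i: set() for i in selected}
    for i, j in combinations(selected, 2):
        if dist(residues[i][1], residues[j][1]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Made-up coordinates along one axis, chosen so distances are easy to check.
chain = [
    ("LEU", (0.0, 0.0, 0.0)),
    ("ASP", (3.0, 0.0, 0.0)),
    ("VAL", (6.0, 0.0, 0.0)),
    ("ILE", (20.0, 0.0, 0.0)),
]
HYDROPHOBIC = {"LEU", "VAL", "ILE"}  # illustrative subset, not exhaustive

full = contact_graph(chain)
hydrophobic_only = contact_graph(chain, keep=lambda name: name in HYDROPHOBIC)
```

Changing the `keep` predicate gives a different sub-graph over the same structure, which mirrors the idea of viewing interaction patterns under different property constraints.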
AIMS: Associative Image Classification System
AIMS is an automated image search, retrieval and classification engine. It enables the user to perform content-based image retrieval and image classification using weighted isomorphic relationships. A query can be launched by choosing an image already present in the database or by supplying a completely new image from the web. AIMS also lets the user query with raw features instead of weighted rules in order to compare different classification techniques. The tool allows the user to discover areas of similarity and dissimilarity among images and can autonomously classify input images into known categories based on redundancy rules. The current version of AIMS supports querying with both natural and medical images.
INFUSE is a tool that integrates information from heterogeneous sources to facilitate understanding by providing knowledge that is not evident from the individual sources. This tool aims at providing a framework for the development of an information fusion system. The current version of INFUSE can integrate two different gene expression synchronization profiles using a number of statistical methods, such as Principal Component Analysis (PCA), Spectral Coherence, and Correlation measures.
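Of the statistical methods named above, a correlation measure is the simplest to show. The following minimal sketch computes the Pearson correlation between two expression profiles measured at the same time points. It is not INFUSE code; the function name `pearson` and the sample profiles are hypothetical, and Pearson is only one of several correlation measures the description could refer to.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / (spread_x * spread_y)


# Made-up expression levels of one gene in two experiments, four time points.
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [1.0, 3.0, 2.0, 4.0]
similarity = pearson(profile_a, profile_b)
```

A value near 1 indicates that the two profiles rise and fall together, a value near -1 that they move in opposite directions, and a value near 0 that there is no linear relationship.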
All of these features, advanced and intermediate, will be provided through the LBRN Bio-portal, which will provide a single, consistent and simple interface. In Phase-I of the project, in conjunction with the LONI and CyD Groups, we began the development of a Web Portal exclusively for LBRN researchers. In its current form, it has been used for some elementary job submission; encouraged by its uptake, we aim to enhance its capabilities and to align its development with the evolving needs of the LBRN Research community. For example, the enhanced LBRN Bio-portal will provide:
i. Integrated information on accessible high performance computing resources in LONI/Campus HPC/TeraGrid, as well as other nation-wide federated grids
ii. Information on Bioinformatics/Biocomputing tools
iii. A Science Gateway for bioinformatics services (Phylogeny, Docking, Structure prediction tools)
iv. A portal to the BBCore project, which will provide access to the many tools that have been developed in Phase-I
v. A significant Web-based, pervasive Work Space
We will build upon existing best practices and tools used to facilitate the development of Web-based portals (e.g. RENCI and TeraGrid Life-Sciences Gateway).
Last Updated: Tuesday, December 07 2010 @ 03:06 PM CST