Bioinformatics Architecture Project
 

Bioinformatics research requires three fundamental resources: data storage, computational resources and communication bandwidth. Under the leadership of Peter L. Bird, PhD, MCBI is implementing a facility that will provide researchers across the State of Michigan with these resources.


Managing the volume of bioinformatics data requires significant hardware and software infrastructure. We are implementing a multi-processor, redundant data server cluster designed to provide access to terabyte-scale data sources. The data server environment of MCBI is designed to enforce access policies defined by the publishing researcher (or institution).
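As a rough illustration of what a publisher-defined access policy could look like, the following Python sketch checks whether a user may read a published dataset. It is not MCBI's implementation: the `AccessPolicy` and `User` types, their fields, and the `can_read` function are all hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical policy a publisher attaches to a dataset (assumed structure)."""
    publisher: str
    public: bool = False
    allowed_institutions: frozenset = field(default_factory=frozenset)
    allowed_users: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class User:
    """Hypothetical user record: a login name and a home institution."""
    name: str
    institution: str


def can_read(user: User, policy: AccessPolicy) -> bool:
    """Return True if the policy lets this user read the dataset."""
    # The publisher always keeps access; public data is open to everyone.
    if policy.public or user.name == policy.publisher:
        return True
    # Otherwise access must be granted to the user or to their institution.
    return (user.name in policy.allowed_users
            or user.institution in policy.allowed_institutions)


# Example: a dataset shared with one institution only (names are made up).
policy = AccessPolicy(
    publisher="alice",
    allowed_institutions=frozenset({"Example University"}),
)
print(can_read(User("bob", "Example University"), policy))
print(can_read(User("carol", "Other Institute"), policy))
```

In this sketch the policy travels with the dataset rather than with the server, which is one way to let each publishing researcher or institution decide who may see their data.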

   

Data published to MCBI’s server will be mirrored and archived to provide both performance and reliable retrieval. The data architecture is designed to scale in response to the volume of published data.

MCBI will host bioinformatics applications used across the Michigan Life Sciences Corridor (MLSC). Our application server cluster is designed to host these applications and is directly integrated with our data server cluster, allowing high-volume processing of research data. For research applications requiring significant computational facilities, MCBI has a 100+ node Linux-based Beowulf supercomputing cluster. Our computational cluster is connected to our database server by a dedicated Gigabit Ethernet network.

MCBI is working closely with the Center for Advanced Computing (CAC) at the University of Michigan. The CAC has recently acquired the central supercomputer from the San Diego National Laboratory: a 176-node IBM SP-2. At this time, this is the most powerful computing cluster in Michigan. Together, the SP-2 and MCBI’s Beowulf cluster provide the most significant computational facilities available in the State to researchers of the MLSC.


An effective communications infrastructure is required to manage high-volume data across multiple sites. The MCBI computing facility has a Gigabit Ethernet connection through the Merit Network directly to the Internet. The University of Michigan is the prime contractor for Internet2. As this infrastructure becomes fully operational, MCBI will use its advanced bandwidth for our hosted applications and data migration requirements.

   
Current connectivity across Michigan is shown in the figure below.
[Figure: network map]
   
MCBI has established an active research program in bioinformatics systems architecture. We are collaborating with researchers and administrators for the University’s Grid computing initiative (see www.grid.org for an overview of Grid computing). The charter of this University group is to provide a seamless infrastructure for high-performance and ubiquitous computing across the University of Michigan.
   
 

This collaboration will provide MCBI with a framework for data management (and migration), resource management and security. We anticipate that this will allow MCBI to quickly make use of additional bioinformatics databases and supercomputing resources as they become available.

MCBI is working to develop a flexible system architecture for bioinformatics research. We anticipate a rapid growth in data volume, computational requirements and data bandwidth needs of our members. Under MCBI’s architecture, new projects can utilize our existing infrastructure and contribute only those components (disk drives or specific processors) required for their needs. The database and supercomputing clusters, applications and security structures are shared by all MCBI members.

Researchers and biotech companies can use this environment for pilot and proof-of-concept projects, reducing project development costs and eliminating the need to re-create a computational infrastructure for their work.

   

We have two pilot projects that will exercise MCBI’s architecture. The first project provides MCBI members with shared access to microarray gene-expression analysis software from Silicon Genetics. Researchers from any client site across the MLSC will be able to analyze their data, at their local labs, using GeneSpring™. Experimental results can be published to a centralized Oracle-based gene-expression database, hosted by MCBI. In the near future, researchers will also be able to analyze published data on the MCBI computing clusters.

   
The second pilot project is a proteomics database that will be hosted by MCBI as part of the Michigan Proteome Consortium’s (MPC) proteomics enterprise software project. The proteomics database will hold terabytes of data received from mass spectrometers distributed around the University of Michigan. Dr. Philip Andrews’ lab is developing enterprise software to track and analyze the data, under an MPC project. After the proof-of-concept stage, their software will be distributed for use at other sites across the MLSC.

 


For questions or comments regarding this page, please contact: webmaster
Last update: October 3, 2007 9:28 AM
Michigan Center for Biological Information (MCBI) ©2002

 
