July 09, 2012
by James Coffin, Ph.D., vice president and general manager, Dell Healthcare and Life Sciences
The transformation of healthcare from episodic to truly personalized care is being met with both optimism and the realities of a system that takes into account neither a patient’s full health record (both genomic and non-genomic attributes) nor the need to collaborate effectively to coordinate care. With the advent of new high-performance computing (HPC) technologies and genomic tools, we are entering an era in which healthcare professionals can make more informed decisions on clinical care. While high-throughput research platforms, like next-generation sequencing (NGS), allow researchers to investigate genome-wide variations in genetic markers between normal and diseased tissue, they also create a new problem: managing, sharing and analyzing the massive amounts of patient genomic data that healthcare professionals need to improve diagnosis and treatment. A fresh approach is needed to bridge the gap between clinical research and practice, both to build a more complete picture of disease and treatment strategies and to let healthcare professionals share knowledge with other experts, determine the best course of care and improve outcomes.
A collaboration between the Translational Genomics Research Institute (TGen) and the Neuroblastoma and Medulloblastoma Translational Research Consortium (NMTRC) is underway on the world’s first FDA-approved personalized medicine clinical trial for pediatric cancer. TGen will use its genomic technology to help NMTRC identify a greater depth of personalized treatment strategies for children with neuroblastoma who are enrolled in the trial, which brings together scientific and medical partners from all across the country. Crucial to its success is an information technology (IT) platform that supports collaboration among the participating clinical sites to create the knowledge base critical for targeted care.
In conjunction with Dell and its partners, the organizations have built a best-in-class HPC and cloud-based IT infrastructure designed to accelerate genetic analysis and the identification of targeted treatments for patients. As part of the infrastructure, trial-specific portals and a high-speed, grid-based architecture are being implemented to speed the transfer of genomic and relevant clinical data between collaborators in the trial. This will help integrate genomic data into the studies and build a unique medical profile for each patient, allowing clinicians to predict which of the available therapeutic drugs will be most effective. The goals of the collaboration are long-term object storage, quick data transfer between sites and transparency for everyone involved, from patients to bioinformaticians, scientists and trial administrators.
The Big Data Challenge
The initial focus of the project was to tackle the “big data” challenge faced by the organizations performing the NGS experiments. The raw data generated by these instruments is extremely large and, in the case of TGen, is doubling every six months. The data objects are complex files that carry important metadata about the samples and the instruments themselves. They can be up to 3TB in size and require significant processing resources to collect, manage and analyze. With TGen managing up to 200TB of genomic data per patient, it was important to develop an IT strategy that would allow these massive files to be analyzed quickly and affordably. After all, the children enrolled in these studies need a quick turnaround to give them the best chance of fighting their disease.
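To put the six-month doubling rate in perspective, the short Python sketch below projects how a data store grows under steady exponential growth. The starting volume of 100TB is a hypothetical figure chosen only for illustration, not a number reported by TGen, and real growth is unlikely to be perfectly smooth.

```python
def projected_volume_tb(start_tb: float, months: float,
                        doubling_months: float = 6.0) -> float:
    """Return the projected volume in TB after `months`,
    assuming the volume doubles every `doubling_months`."""
    return start_tb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting volume, for illustration only
    for months in (0, 6, 12, 24, 36):
        volume = projected_volume_tb(start_tb, months)
        print(f"after {months:2d} months: {volume:8.1f} TB")
```

Under this assumption the volume quadruples every year, which is why storage cost per terabyte and room to scale matter as much as raw processing speed.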
To overcome these challenges, TGen replaced a legacy Dell PowerEdge C2100 system with a cluster of Dell PowerEdge M710HD blade servers. The blades, which run CentOS Linux, are housed in three Dell M1000e modular blade enclosures. Dell Force10 C300 and S4810 10-Gigabit switches provide connectivity for the cluster’s 800 cores. All told, the cluster’s maximum performance is eight teraflops. Despite the dramatic improvement in processing power, the HPC cluster has a small footprint, with three-fold more cores packed into the same floor space, and reduced energy consumption, as the blades use 25 percent less power per core than the legacy servers.
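As a rough back-of-the-envelope check on these figures, the Python sketch below derives the peak throughput per core and a lower bound on the time needed to move one of the largest (3TB) data objects over a single 10-Gigabit link. It assumes decimal units (1TB = 10^12 bytes), a single fully utilized link and no protocol or storage overhead, so real transfers would take longer. The function names are illustrative only.

```python
def peak_gflops_per_core(peak_teraflops: float, cores: int) -> float:
    """Theoretical peak per core in gigaflops (1 teraflop = 1000 gigaflops)."""
    return peak_teraflops * 1000 / cores


def min_transfer_seconds(size_tb: float, link_gbps: float) -> float:
    """Lower bound on transfer time at full line rate.

    Assumes decimal units: 1 TB = 1e12 bytes, 1 Gb/s = 1e9 bits per second.
    Ignores protocol overhead, disk speed and contention.
    """
    bits = size_tb * 1e12 * 8
    return bits / (link_gbps * 1e9)


if __name__ == "__main__":
    per_core = peak_gflops_per_core(8.0, 800)
    seconds = min_transfer_seconds(3.0, 10.0)
    print(f"peak per core: {per_core:.1f} gigaflops")
    print(f"3TB over 10 Gb/s: at least {seconds / 60:.0f} minutes")
```

Even at full line rate, a single 3TB object occupies a 10-Gigabit link for the better part of an hour, which helps explain the emphasis on keeping data close to the compute resources that analyze it.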
For data storage, TGen is building a multi-tier solution that combines multiple technologies as part of the Dell Fluid Data architecture. The technology implemented in this case keeps the data available for researchers and clinicians to collaborate on care while at the same time making it easy to manage and back up to archive. The storage architecture includes a high-performance file system for high-speed, parallel file access, plus Dell Compellent storage in support of more traditional applications, such as Microsoft SQL Server databases and laboratory file sharing. For back-up and archiving, TGen is leveraging the Dell DX Object Storage Platform. The DX platform is especially important because the cost per terabyte makes it affordable to store large amounts of data, scaling well into the petabytes, while allowing TGen’s researchers to use their advanced algorithms to mine these large data sets.
The next phase of the implementation is addressing the challenge of long-distance communication. As part of this clinical trial, TGen must partner on research projects with many different professionals from organizations around the world. In addition to patients and their families, the trial involves many clinicians, researchers and pathologists. Patient samples are collected and dissected by biologists, geneticists apply the latest genomics technology to the samples, and bioinformaticians mine the data. Add in the supporting biostatisticians, computer scientists and software engineers, and it becomes critical to create a high-throughput environment that everyone can use as targeted treatments are being developed.
TGen and Dell are developing a cloud-based collaboration system to facilitate such interactions. The cloud-based platform provides a virtual library of data that researchers can access, and it allows data to be checked out and analyzed using HPC capabilities. This enables fluid integration between premise-based capabilities and virtual capabilities (the cloud) and provides the framework to move the data seamlessly through the research lifecycle, protect it, and make it available for future use. In addition, the system includes a high-performance, grid-based architecture to move the massive amounts of genomic data around quickly and securely. Data can be ingested at various sites, moved to the cloud and then made available for analysis either in premise-based HPC environments or in any HPC cloud environment. Another important feature of the IT strategy is to localize data near HPC capacity, both in the cloud and on premise, to speed analysis and validation.
The HPC environment and the first phase of the cloud infrastructure are already yielding significant benefits. The project has increased TGen’s gene sequencing and analysis capacity by 1,200 percent and improved collaboration between the physicians, genetic researchers, pharmacists and computer scientists involved in the clinical trial. The cloud infrastructure and portal technology are designed to efficiently manage the volume and complexity of the genomic data while making it secure and accessible to many. For this personalized medicine trial to be successful, doctors and researchers will need the ability to translate their patients’ genomic information into useful knowledge for targeted care, both quickly and affordably. With TGen’s translational knowledge and Dell’s high-performance cloud technology, researchers have cut the analysis of patient-specific genomic data from several days to one day, resulting in a significant improvement in time to targeted treatment. For patients with neuroblastoma, this literally means the difference between life and death.
James Coffin, Ph.D.
Vice President and General Manager, Dell Healthcare and Life Sciences
As vice president and general manager of Dell Healthcare and Life Sciences, James Coffin leads teams in developing the latest innovative information technology solutions and services for healthcare, building the partner ecosystem and driving Dell’s thought leadership in healthcare. Prior to joining Dell, Coffin spent more than 12 years at IBM, where he held a variety of leadership positions. Prior to joining IBM, he was considered a leader in the application of computational chemistry techniques and high-performance computing to real-world chemical and biological problems. Coffin holds a Ph.D. in physical chemistry from the University of Arkansas and a Bachelor of Science degree from Louisiana Tech. He studied at Cambridge University as a Cambridge Fulbright Postdoctoral Fellow and was a member of the scientific staff of the National Center for Supercomputing Applications at the University of Illinois. He lectures worldwide on innovation in the fields of electronic medical records, personalized medicine, high-performance computing and leading-edge in silico techniques to accelerate drug discovery.