Chapter 6. Case studies

This chapter illustrates the following case studies:

•Sidra Medicine

•Amsterdam UMC

•L7 Informatics

•University of Birmingham

•Thomas Jefferson University

•Biotechnology and Biomedicine Center of the Czech Academy of Sciences and Charles University: BIOCEV

•Washington University St. Louis and Vanderbilt University

6.1 Sidra Medicine

Supporting game-changing genomics research to improve the health of a nation.

“Analyzing hundreds of samples in parallel on a regular basis requires a robust HPC system to handle the load properly. From our experience, IBM systems have proven to be reliable in helping us address this technical requirement.”

—Dr. Mohamed-Ramzi Temanni, Manager, BioInformatics Technical Group at Sidra Medical and Research Center

As described in the announcement letter, IBM is collaborating with Sidra Medical and Research Center (Sidra) and providing solutions for its compute and storage infrastructure. Sidra aims to be a research and education institution as well as a world-class hospital focused on the health and well-being of children and women.

For more information about the IBM collaboration to provide the platform that is deployed by Sidra to advance Qatar’s biomedical research capabilities, see this press release.

This is not the first collaboration between Sidra and IBM. One of the first programs that Sidra ran on the IBM technology platform was the Qatar Genome Project (QGP).

This section describes how Sidra Medical and Research Center is advancing Qatar’s biomedical research capabilities with IBM solutions. For more information about this solution, see this YouTube video.

6.1.1 About Sidra

Sidra focuses on three pillars:

•Healthcare

•Education

•Biomedical research

Obesity, diabetes, and cardiovascular disease have been increasing. Sidra has discovered new insights that are improving the treatment of patients.

Sidra had to adjust the way it approached healthcare, research, and education to accommodate the need for personalized healthcare. Genomic medicine is important because treatment can be tailored to the genomic signature of the patient. In the past, treatments were generic; now, a treatment can be customized for each patient.

6.1.2 The Qatar Genome Project focuses on population health and better treatments

QGP is the first initiative in the world to sequence an entire population. Sidra provides the sequencing and bioinformatics analysis for this project. Until this project, only a small subset of the approximately 350,000 samples in total had been sequenced. Now, the sequencer at Sidra can sequence 18,000 samples per year.

The QGP matches genomic signatures with the diseases associated with them so that treatment can be customized.

The goal of the QGP is to help researchers answer complex research questions. The research starts with a hypothesis in which the researcher tries to understand the importance of genomic factors in the prevalence of a disease. A blood sample moves from the biological realm to the digital realm through the sequencing step, and the resulting data is then transferred to an HPC cluster. An analysis pipeline then runs to, for example, identify which mutations cause or are related to the disease.
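To make the final pipeline step more concrete, the following minimal Python sketch compares a sample sequence with a reference and lists single-nucleotide differences. It assumes the two sequences are already aligned base-for-base with no insertions or deletions; real variant-calling pipelines add alignment, quality filtering, and statistical genotyping. The function name `find_snvs` and the toy sequences are hypothetical and are not part of the Sidra pipeline.

```python
def find_snvs(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Toy illustration only: assumes both sequences are already aligned,
    have equal length, and contain no insertions or deletions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Hypothetical toy sequences
print(find_snvs("ACGTACGTAC", "ACGTTCGTAA"))  # [(5, 'A', 'T'), (10, 'C', 'A')]
```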

6.1.3 Personalized medical advances depend on having a unified view

Genomics is not everything. A genome sequence is a snapshot of your genes at a certain point in time. Beyond the genome, there are other types of data, such as RNA-seq data, which captures the transcriptome, and Ribo-seq data, which captures the messenger RNAs that ribosomes are translating. Metabolomic data, which measures metabolites, adds yet another layer. Together, these data tell a story about how the human body is functioning and about the status of the disease, and combining these heterogeneous data types gives better insight into the disease.

When you work with multifactorial diseases, such as obesity, you must account for genomic data and other information, such as phenotypical or environmental data, to better understand the disease. If you do not account for these factors, it is as though you are watching a video without any sound.

For example, if clinical data alone is worth one dollar and genomic data alone is worth one dollar, the two combined are worth a thousand dollars, because together they provide much better insight into the disease than either does separately.

Scientists are trying to correlate genome data and clinical data to find abnormalities related to obesity or blood pressure, which is why clinical data and genome data are studied together on a single platform. The goal is to build a unique, integrated platform that helps researchers analyze their bioinformatics data easily and efficiently.
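The integration idea can be sketched in a few lines of Python: join clinical records and per-patient variant sets on a shared patient identifier so that both can be queried together, for example to list patients with high blood pressure who carry a given variant. The field names, patient IDs, variant label, and the 140 mmHg threshold are hypothetical assumptions, not Sidra's actual schema or criteria.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    patient_id: str
    bmi: float
    systolic_bp: int
    variants: set[str]


def integrate(clinical: dict[str, dict], genomic: dict[str, set[str]]) -> list[PatientRecord]:
    """Inner-join clinical and genomic data on the (hypothetical) patient ID."""
    records = []
    for pid, clin in clinical.items():
        if pid in genomic:  # keep only patients that have both data types
            records.append(PatientRecord(pid, clin["bmi"], clin["systolic_bp"], genomic[pid]))
    return records


# Hypothetical toy data
clinical = {
    "P1": {"bmi": 31.2, "systolic_bp": 150},
    "P2": {"bmi": 22.5, "systolic_bp": 118},
    "P3": {"bmi": 29.0, "systolic_bp": 145},
}
genomic = {"P1": {"variant_A"}, "P2": {"variant_A"}, "P4": {"variant_B"}}

merged = integrate(clinical, genomic)
# Query that needs both data types at once (threshold is illustrative)
hypertensive_carriers = [
    r.patient_id for r in merged if r.systolic_bp >= 140 and "variant_A" in r.variants
]
print(hypertensive_carriers)  # ['P1']
```

Neither data set alone answers this query: the clinical data does not know who carries the variant, and the genomic data does not know who has high blood pressure.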

6.1.4 Converging high-performance computing, big data, and cognitive computing

The Sidra infrastructure is a national resource for other research institutions. The unified infrastructure runs genomic workflows, big data analytics, and machine learning on a single platform.

Genomics workloads with high-performance computing

HPC provides a “one size fits all” platform for many research requirements. An application-driven architecture lets you run genomic workflows on HPC, machine learning algorithms on big data, and image processing on one centralized infrastructure, so multidisciplinary applications share a single platform.

Genomic workloads deal with large amounts of data. An analysis can take a couple of weeks, and if it fails, you must redo the entire workload. Each failure adds days to the time needed to complete the job, which makes deadlines harder to meet.

Three key criteria were used to select the best HPC solution: scalability, support, and flexibility.

Accelerating pipelines with Apache Spark

Apache Spark is well suited to organizing big data and genomic pipelines. Bioinformatics tools, such as PacBio, Partic-Floor, and Galaxy, improve biomedical pipeline development, and big data tools, such as Spark, are integrated with them to help minimize the run time of the pipeline.

6.1.5 Why cognitive computing and IBM

Cognitive computing plays a major role in Sidra’s application development. It is used in two ways:

•A querying mechanism in which a scientist submits a natural language query, which IBM cognitive solutions convert into a technical query to the system.

•IBM cognitive solutions help users by suggesting the best way to submit a job to the HPC cluster so that resources are used as efficiently as possible.

When a user submits an inefficient job, the allocated resources are underused. IBM cognitive solutions help resolve these inefficiencies by suggesting better ways to submit jobs.
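The source does not describe how the IBM cognitive solutions arrive at their suggestions. As a hypothetical illustration of the kind of signal such a recommendation could rest on, the following Python sketch compares what a job requested with what it actually used and proposes a smaller request when most of the reservation sat idle. The `JobUsage` fields, the 50% efficiency threshold, and the 20% headroom are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class JobUsage:
    job_id: str
    requested_cores: int
    requested_mem_gb: float
    avg_cores_used: float
    peak_mem_gb: float


def suggest_request(job: JobUsage, min_efficiency: float = 0.5, headroom: float = 1.2) -> dict:
    """Flag under-used requests and propose a right-sized one.

    Hypothetical policy: if less than `min_efficiency` of a resource was used,
    suggest the observed usage plus `headroom` (1.2 = 20% extra), rounded up.
    """
    cpu_eff = job.avg_cores_used / job.requested_cores
    mem_eff = job.peak_mem_gb / job.requested_mem_gb
    suggestion = {"job_id": job.job_id, "cpu_eff": round(cpu_eff, 2), "mem_eff": round(mem_eff, 2)}
    if cpu_eff < min_efficiency:
        suggestion["cores"] = max(1, math.ceil(job.avg_cores_used * headroom))
    if mem_eff < min_efficiency:
        suggestion["mem_gb"] = math.ceil(job.peak_mem_gb * headroom)
    return suggestion


# Hypothetical job that reserved 32 cores and 256 GB but used far less
print(suggest_request(JobUsage("1001", 32, 256.0, 6.5, 64.0)))
# {'job_id': '1001', 'cpu_eff': 0.2, 'mem_eff': 0.25, 'cores': 8, 'mem_gb': 77}
```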

6.1.6 A collaboration

Sidra scientists decided to work with IBM because IBM has solid experience in engineering systems. There are many complex problems to solve, and Sidra has a good team of scientists who know how to deal with those problems. Sidra therefore looked for a partner that would work hand-in-hand with its scientists to tackle those problems and provide a robust solution.

Sidra collaborated with IBM to build a centralized national resource that addresses diverse categories of applications, including a pathogenome project, machine learning, big data analytics, and image processing. All these applications must run on a centralized infrastructure so that data can move among them as research requires.

The project started from the ground up. The team built the entire infrastructure in collaboration with IBM. The scientists provided input, and their experience combined with IBM knowledge produced a new, robust infrastructure that is now used for many projects at Sidra and with many business partners. The Sidra infrastructure also serves as a national resource for other organizations.

6.1.7 Software-defined infrastructure for all data and workloads

The teams are running diversified projects, such as pathogenome projects, machine learning, image processing, and big data, on a single infrastructure (see Figure 2-1 on page 14). The custom-designed infrastructure is suited to machine learning algorithms and image-processing applications.

The images come mostly from MRI and other scanning machines and are processed by open source and MATLAB applications. Because these applications are integrated, all of the images can be processed on a single infrastructure.

Data can be moved from the HPC environment, which handles genome data, to the big data analytics environment, which handles clinical data.

6.1.8 Faster results with scalability, reliability, and speed

Optimization of the scientific workload is important because pathogenome projects involve large amounts of data and many samples. Scalability of computing, storage, and networking plays a crucial role in these projects.

IBM Spectrum Computing solutions are flexible, scalable, and expandable. The combination of IBM Systems, IBM Storage, and IBM Spectrum Computing solutions is key to running pipelines and bioinformatics tools optimally, and together they provide an end-to-end solution for your research requirements.

Researchers and scientists must run more than a thousand jobs per day. Intelligent resource management systems, such as IBM Spectrum LSF, provide scalability, quality of service (QoS), and the best turnaround time in the infrastructure. For example, in the last two years, researchers have run 700,000 genomics jobs in an IBM Spectrum LSF cluster.

By optimizing different aspects, such as population calling, and by tuning IBM Spectrum LSF parameters, the researchers reduced the run time of one step from 30 days to four days.
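A quick worked check of the figures above, as a Python sketch: the average daily job count implied by 700,000 jobs over two years (ignoring leap days, an assumption), and the reduction and speedup implied by cutting a step from 30 days to four days.

```python
def average_per_day(total: int, days: int) -> float:
    return total / days


def reduction_and_speedup(before: float, after: float) -> tuple[float, float]:
    """Return the fractional run-time reduction and the speedup factor."""
    return 1 - after / before, before / after


# 700,000 jobs over two years; leap days ignored for simplicity
jobs_per_day = average_per_day(700_000, 2 * 365)
print(f"~{jobs_per_day:.0f} jobs per day on average")  # ~959 jobs per day on average

# One pipeline step: 30 days before optimization, 4 days after
reduction, speedup = reduction_and_speedup(30, 4)
print(f"{reduction:.1%} shorter, {speedup:.1f}x faster")  # 86.7% shorter, 7.5x faster
```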

The research team has used this solution for the last three years and has never had any failures or outages. The IBM Spectrum LSF cluster runs at 90 - 100% utilization, and the number of jobs is increasing day by day.

IBM Spectrum LSF RTM and IBM Spectrum LSF Application Center produce reports on performance metrics, job slot utilization, and many other aspects that help management plan the capacity of the HPC cluster. For example, IBM Spectrum LSF RTM helps monitor jobs across the cluster. It is a monitoring dashboard where users can log in, check their own jobs that are running on the cluster, and pull reports that help their research.

IBM Spectrum LSF Application Center lets researchers submit jobs to the IBM Spectrum LSF cluster through a web interface. Users do not need to remember any IBM Spectrum LSF command-line arguments, so any researcher can send a job to the cluster.
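For comparison, the following Python sketch shows the kind of command line that a web submission replaces. It assembles an IBM Spectrum LSF `bsub` argument list from standard options: `-J` (job name), `-q` (queue), `-n` (number of tasks), `-R` (resource requirement string), `-W` (run limit as [hours:]minutes), and `-o`/`-e` (output and error files, where `%J` is replaced by the job ID). The queue name, script name, memory value, and function name are hypothetical, and the unit of `rusage[mem=...]` depends on the cluster's `LSF_UNIT_FOR_LIMITS` setting.

```python
import shlex


def build_bsub(command: list[str], job_name: str, queue: str, tasks: int,
               mem: int, run_limit: str, log_dir: str = ".") -> list[str]:
    """Assemble a bsub argument vector (sketch; all values are illustrative)."""
    return [
        "bsub",
        "-J", job_name,                        # job name
        "-q", queue,                           # target queue (hypothetical name)
        "-n", str(tasks),                      # number of tasks (job slots)
        "-R", f"rusage[mem={mem}]",            # memory reservation; unit set by LSF_UNIT_FOR_LIMITS
        "-W", run_limit,                       # run limit as [hours:]minutes
        "-o", f"{log_dir}/{job_name}.%J.out",  # %J is replaced by the job ID
        "-e", f"{log_dir}/{job_name}.%J.err",
        *command,
    ]


# Hypothetical alignment step for one sample
argv = build_bsub(["./align_sample.sh", "sample01"], job_name="align_sample01",
                  queue="genomics", tasks=8, mem=32000, run_limit="12:00")
print(shlex.join(argv))
# On an LSF submission host, the job could then be submitted with
# subprocess.run(argv, check=True)
```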

6.1.9 Adding big data and cognitive computing to high-performance computing

IBM Spectrum Computing products integrate HPC and big data workloads on a single platform. To support the QGP, IBM Spectrum LSF provides cluster integration with current technologies, such as Docker and OpenStack, and with other big data tools.

IBM has already integrated Spark and Docker containers with IBM Spectrum LSF, and is integrating IBM Spectrum Conductor with Spark containers for IBM Spectrum LSF to optimize compute resources and applications.

IBM Spectrum Scale helps customize the genomics solution design, HPC capabilities, and cognitive computing, and integrates them in a single infrastructure. IBM Spectrum Scale is useful for data-intensive applications.

The sample solution holds about 3 petabytes (PB) of data. IBM Spectrum Scale is highly scalable, and compared to other file system options, it has better features and capabilities, is more highly integrated, and is more stable.

Additionally, IBM Spectrum LSF RTM helps reduce the resource requirements for applications, and IBM provides test fixes quickly.

In the proof of concept for the genomics solution, 3,000 samples were sequenced and successfully analyzed, requiring about 1.5 PB of storage.
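The proof-of-concept figures support a simple back-of-envelope capacity estimate, sketched here in Python with decimal units (1 PB = 1,000,000 GB). The second line is a hypothetical projection that assumes every sample at the 18,000-samples-per-year sequencing rate has the same storage footprint as in the proof of concept; actual footprints vary with coverage, file formats, and retention policy.

```python
GB_PER_PB = 1_000_000  # decimal units


def storage_per_sample_gb(total_pb: float, samples: int) -> float:
    return total_pb * GB_PER_PB / samples


def projected_pb(samples: int, per_sample_gb: float) -> float:
    return samples * per_sample_gb / GB_PER_PB


# Proof-of-concept figures: about 1.5 PB for 3,000 samples
per_sample = storage_per_sample_gb(1.5, 3_000)
print(f"~{per_sample:.0f} GB per sample")  # ~500 GB per sample

# Hypothetical projection: 18,000 samples per year at the same footprint
print(f"~{projected_pb(18_000, per_sample):.0f} PB per year")  # ~9 PB per year
```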

6.1.10 Future

The field of precision medicine keeps evolving thanks to the revolution in biotechnology, and specifically in sequencer technology, which generates more data at lower cost and with faster turnaround. Scientists expect an HPC system to be an innovative technology platform that keeps pace with this growing data generation so that analysis is rapid and timely.

The Qatar biomedical informatics division is considered a national resource. It helps scientists and researchers from other institutes with their bioinformatics and research computing needs, and all of these collaborations rely on the same HPC system to keep up with their data analysis needs.

The scientists hope that the current analysis that they are doing will identify variants that cause major diseases, which will help them develop personalized and more effective treatments.

6.2 Amsterdam UMC

Enabling ground-breaking research with scalable, cost-effective storage for big data.

“Thanks to our work with IBM and E-Storage, we’ve created a secure, scalable storage platform to support stakeholders across the organization.”

—Patrick Dekkers, Storage Specialist, Amsterdam UMC

6.2.1 Customer background

In 2019, VU Medical Center (VUmc) and the Academic Medical Center (AMC) joined forces as Amsterdam UMC (https://www.vumc.com/). The two Amsterdam academic hospitals work together toward the same goals: delivering high-quality patient care, conducting ground-breaking scientific research, and providing excellent academic education.

6.2.2 Business challenge

As its unstructured data (including administrative documents, research materials, and medical images) exploded, the customer wanted the security, cost-efficiency, and scalability of a centralized storage platform.

6.2.3 Transformation

Amsterdam UMC uses a centralized, scalable, and flexible storage platform for big data, based on IBM Spectrum Scale, to optimize data performance and costs through capacity planning, storage utilization, and data placement. IBM Spectrum Scale automatically moves data between storage systems without disrupting users or applications.

This gives clinicians, researchers, and other users maximum availability for any type of data through self-provisioning with multiple service levels (gold, silver, and bronze), depending on how often the data needs to be accessed or used. Today, their archive, based on IBM tape, spans more than 100 years’ worth of data, is protected from disasters, and is GDPR-compliant.
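The gold, silver, and bronze service levels place data according to how often it is accessed. The following Python sketch shows that placement idea in its simplest form by assigning each file to a tier based on the number of days since it was last accessed. The thresholds, file names, and dates are hypothetical, and mapping bronze to the tape archive is an assumption; in the actual solution, IBM Spectrum Scale performs placement and migration with its own policy engine, which this sketch does not reproduce.

```python
from datetime import date

# Hypothetical thresholds: maximum days since last access for each tier
TIERS = [(30, "gold"), (365, "silver")]  # anything older falls through to bronze


def choose_tier(last_access: date, today: date) -> str:
    """Pick a service level from access recency (illustrative policy only)."""
    age_days = (today - last_access).days
    for max_age, tier in TIERS:
        if age_days <= max_age:
            return tier
    return "bronze"  # assumed here to be the tape-based archive


today = date(2020, 6, 1)
files = {  # hypothetical files and last-access dates
    "mri_scan_0142.dcm": date(2020, 5, 28),
    "study_2018_results.csv": date(2019, 11, 2),
    "admin_report_2009.pdf": date(2010, 1, 15),
}
for name, accessed in files.items():
    print(f"{name}: {choose_tier(accessed, today)}")
# mri_scan_0142.dcm: gold
# study_2018_results.csv: silver
# admin_report_2009.pdf: bronze
```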

6.2.4 Business benefits

The following business benefits are described from the case scenario:

•99% faster data migrations enable IT to focus on value-added development

•7% increase in backup frequency due to reduced complexity and increased efficiency

•Streamlines governance by migrating the organization to a centralized pool of storage

6.2.5 Solution components

The following list shows the solution components:

•IBM Spectrum Archive Enterprise Edition

•IBM Spectrum Scale

•IBM TS4500 Tape Library

You can read the full story at the following website:

https://www.ibm.com/case-studies/vu-medical-center-research-spectrum-storage

You can also watch the video at the following website:

https://www.youtube.com/watch?v=ISFVscG20xU

6.3 L7 Informatics

Building a high-performance Genomic Cloud to support ground-breaking research.

“We were able to cut the run time of one standard genome analysis pipeline down from 24 hours to just over an hour—a time saving of 96 percent.”

—Chris Mueller, Founder, L7 Informatics

6.3.1 Customer background

L7 Informatics (https://www.l7informatics.com/) provides software and services that enable synchronized solutions for science and health. L7’s novel Enterprise Science Platform (ESP) is a scientific process and data management (SPDM) solution that enables life science and healthcare companies to connect people, processes, and systems to accelerate discoveries and drive precision healthcare.

6.3.2 Business challenge

To advance our understanding of the human genome, scientists must process vast amounts of data. However, many research centers struggle to manage the immense volume of data that they generate well enough to put it to its best use.

6.3.3 Transformation

L7 teamed up with IBM to build an HPC environment on the cloud, leveraging IBM Spectrum technology for flexible, highly scalable data storage and user-friendly workload management.

6.3.4 Business benefits

The following business benefits are described from the case scenario:

•96% reduction in the run time of a standard genome analysis pipeline

•1/3 the price of using commodity solutions to perform the same work at scale

•2 weeks from conceptual design to a fully functional IBM HPC environment on the cloud
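A worked check of the 96% figure, as a Python sketch: the quote gives the optimized run time only as "just over an hour", so this example assumes exactly one hour. The resulting value of about 95.8% rounds to the quoted 96 percent.

```python
def percent_reduction(before_hours: float, after_hours: float) -> float:
    """Percentage of the original run time that was saved."""
    return 100 * (1 - after_hours / before_hours)


# 24 hours before; roughly one hour after (assumed exactly 1.0 here)
print(f"{percent_reduction(24, 1):.1f}% saved")  # 95.8% saved
```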

6.3.5 Solution components

The following list shows the solution components:

•IBM Cloud

•IBM Spectrum LSF

•IBM Spectrum Scale

You can read the full story at the following website:

https://www.ibm.com/case-studies/l7-informatics-systems-spectrum-hpc

You can also watch the video at the following website:

https://www.youtube.com/watch?time_continue=46&v=lD8CiPQYRTI

6.4 University of Birmingham

Driving innovative research forward by taking control of data.

“Breakthroughs are happening all the time at the university. Underpinning all of this pioneering innovation, IBM Spectrum Storage solutions make sure that the data is there, whenever our researchers need it.”

—Simon Thompson, Research Computing Infrastructure Architect, University of Birmingham

6.4.1 Customer background

Established by Queen Victoria in 1900, the University of Birmingham (https://www.birmingham.ac.uk/index.aspx) is one of the largest universities in the UK, serving approximately 34,000 undergraduate and graduate students.

The university’s Computer Centre is the centerpiece of the Birmingham Environment for Academic Research (BEAR), a collection of IT resources available without cost to the University of Birmingham community and qualified external researchers.

6.4.2 Business challenge

To maintain its reputation as a premier research institution, the University of Birmingham must ensure that data is always available to a growing number of users running increasingly complex simulations.

6.4.3 Transformation

The university deployed IBM Spectrum Scale and IBM Spectrum Protect, increasing transparency around data’s location and who accesses it, and increasing its mobility within a diverse IT environment.

6.4.4 Business benefits

The following features represent a few of the business benefits gained from the implemented solution:

•Supports compliance with data protection regulations at low cost and without disruption

•Estimated savings of up to 2 FTEs due to enhanced operational efficiency

•5,000 researchers supported by infrastructure that helps them find solutions to key issues faster

6.4.5 Solution components

The following list shows the solution components:

•IBM Spectrum Scale Data Management Edition

•IBM Spectrum Protect

•IBM Power Systems AC922

•IBM PowerAI Enterprise

You can read the full story at the following website:

https://www.ibm.com/case-studies/university-of-birmingham-systems-software-spectrum-scale

6.5 Thomas Jefferson University

Deepening the understanding of disease enables radically new approaches to diagnosis and treatment.

“When you let data lead the way, you can entertain bolder journeys that are not limited by what is already known in the literature. High-performance computing is the catalyst that makes such scientific explorations possible.”

— Isidore Rigoutsos, PhD, Founding Director of the Computational Medicine Center, Thomas Jefferson University

6.5.1 Customer background

Jefferson (Philadelphia University + Thomas Jefferson University: https://www.jefferson.edu/) is a distinctive, comprehensive national university setting a new standard for 21st-century professional education. It has 7,800 students, more than 4,000 faculty members, and offers approximately 160 undergraduate and graduate programs on multiple campuses. Its unique Nexus Learning model focuses on collaborative, inter-professional, and trans-disciplinary approaches to learning supported by design and systems thinking, innovation, entrepreneurship, empathy, and the modes of thought central to the liberal arts and scientific inquiry.

6.5.2 Business challenge

What causes some people to develop diseases and not others? The attempt to find an answer is driving ground-breaking research, and leading pioneers to challenge traditional approaches to treatment.

6.5.3 Transformation

The Computational Medicine Center at Jefferson is breaking new ground in the understanding of disease by analyzing huge amounts of biological data with the help of high-performance computing.

6.5.4 Business benefits

The following are a few of the business benefits from the implemented solution:

•Push the boundaries of knowledge, anticipating new breakthroughs in healthcare

•Support the development of diagnostics and therapies that boost positive outcomes

•Remove barriers to scientific exploration through data-driven research

6.5.5 Solution components

The following list shows the solution components:

•IBM Spectrum Scale

•IBM Spectrum Protect

•IBM Storwize® V5030

•IBM TS3310 Tape Library

You can read the full story at the following website:

https://www.ibm.com/case-studies/jefferson

You can also read the story in the following e-book:

https://www.ibm.com/downloads/cas/ZBXXNGP2

You can also watch the videos at the following website:

https://bit.ly/2wBh4TN

6.6 Biotechnology and Biomedicine Center of the Czech Academy of Sciences and Charles University: BIOCEV

Building research infrastructure with performance, efficiency, and reliability in its DNA.

“As scientists introduce the latest generation of appliances and lab equipment, the data generated by their research activities surges, and the IBM platform ensures we can cope with even the highest demand.”

—Michal Sedláček, IT Architect, BIOCEV

6.6.1 Customer background

Biotechnology and Biomedicine Centre of the Academy of Sciences and Charles University in Vestec (BIOCEV: https://www.biocev.eu/en) was founded as a joint initiative of the Academy of Sciences of the Czech Republic and two faculties at Charles University in Prague. The project’s goal is to establish a European center of excellence for biomedicine and biotechnology, with the following aims: detailed study of cellular mechanisms at the molecular level, research and development of novel therapeutic strategies, early diagnostics, biologically active agents including chemotherapeutics, protein engineering, and other technologies.

6.6.2 Business challenge

Scientific research does not end after experiment results are achieved. Instead, the outcomes must be evaluated and verified, making data storage an essential component of successful innovation. To achieve its goal of becoming a European center of excellence for biomedicine and biotechnology, BIOCEV had to help scientists store huge amounts of research data.

6.6.3 Transformation

With a software-defined storage solution from IBM, BIOCEV gained the high-performance, efficient, and reliable platform needed to support scientific breakthroughs, offering scientists fast, reliable storage and access to data. The automated storage management contributes to a low total cost of ownership (TCO).

6.6.4 Business benefits

The following are a few of the business benefits from the implemented solution:

•Facilitates research by offering scientists fast, reliable storage and access to data

•Enables high efficiency through automated storage management

•Elevates the organization’s reputation by enabling non-stop services

6.6.5 Solution components

The following list shows the solution components:

•IBM Spectrum Scale

•IBM Spectrum Protect

•IBM Storwize V7000 Gen2

•IBM TS3500 Tape Library

You can read the full story at the following website:

https://www.ibm.com/case-studies/BIOCEV

6.7 Washington University St. Louis and Vanderbilt University

Advancing medical imaging research with deep learning.

“IBM did an excellent job to form a very skilled team to support us in a very timely manner.”

—Dr. Yong Wang, PhD, Assistant Professor of Gynecology and Obstetrics, Radiology, and Biomedical Engineering, Washington University St. Louis

6.7.1 Customer background

Washington University School of Medicine (https://medicine.wustl.edu/research/) in St. Louis is committed to advancing human health throughout the world, and has an outstanding history of biomedical research in an environment that cultivates the best minds in science and medicine. To advance medical imaging with AI, they are collaborating with the Vanderbilt Institute of Imaging Science (https://vuiis.vumc.org/), which is a trans-institutional initiative within Vanderbilt University serving physicians, scientists, students, and corporate affiliates.

6.7.2 Business challenge

Recent gains in computing power have made it possible to capture more detailed medical images, but making sense of the growing volumes of data is a challenge. Applying AI, and deep learning in particular, has the potential to overcome these obstacles and fill in the gaps in incomplete MRI brain scans.

6.7.3 Transformation

With deep learning supported by IBM software-defined infrastructure and high-performance computing solutions, researchers are analyzing brain scans to identify brain tumors faster and more accurately, helping physicians improve patient care. The enhancement of MRI scan analysis enables physicians to effectively use large amounts of generated data, while keeping scan times short.

6.7.4 Business benefits

The following features represent a few of the business benefits gained from the implemented solution:

•20x faster training of deep learning models than traditional PC environments

•Increased speed and accuracy of diagnosis, enhancing treatments and patient outcomes

•Lower barriers to deep learning for physicians, addressing the big data skills gap

6.7.5 Solution components

The following list shows the solution components:

•IBM Spectrum Conductor Deep Learning Impact

•IBM POWER8

For additional information about the solution implemented, and feedback about it, see the following website:

https://ibm.co/2PsVgFJ

You can read the full story at the following website:

https://ibm.co/2EMbz8e

You can also watch the informative videos at the following websites:

https://bit.ly/2RAaqco

https://bit.ly/2COO1zX
