Information Management in Health Research

Larry H. Bernstein, MD, FCAP, Curator



Researchers find Potential Security Hole in Genomic Data-sharing Network


Jennie Dusheck, Stanford University

Sharing genomic information among researchers is critical to the advance of biomedical research. Yet genomic data contains identifiable information and, in the wrong hands, poses a risk to individual privacy. If someone had access to your genome sequence — either directly from your saliva or other tissues, or from a popular genomic information service — they could check to see if you appear in a database of people with certain medical conditions, such as heart disease, lung cancer or autism.

Work by researchers at the Stanford University School of Medicine makes that genomic data more secure. Suyash Shringarpure, Ph.D., a postdoctoral scholar in genetics, and Carlos Bustamante, Ph.D., a professor of genetics, have demonstrated a technique for hacking a network of global genomic databases and how to prevent it. They are working with investigators from the Global Alliance for Genomics and Health on implementing preventive measures.

The work, published October 29, 2015, in The American Journal of Human Genetics, also bears importantly on the larger question of how to analyze mixtures of genomes, such as those from different people at a crime scene.

A network of genomic data sets on servers, or beacons, organized by the National Institutes of Health-funded Global Alliance for Genomics and Health, allows researchers to look for a particular genetic variant in a multitude of genomic databases. The networking of genomic databases is part of a larger movement among researchers to share data. Identifying a gene of interest in a beacon tells researchers where to apply for more complete access to the data. A central assumption, though, is that the identities of those who donate their genomic data are sufficiently concealed.

“The beacon system is an elegant solution that allows investigators to ‘ping’ collections of genomes,” said Bustamante. Investigators on the outside of a data set can ping and ask which data set has a particular mutation. “This allows people studying the same rare disease to find one another to collaborate.”
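The yes/no "ping" described above can be illustrated with a small sketch. This is a toy model, not the Beacon Project's actual interface: the `Beacon` class, the `ping` helper, the variant encoding and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed encoding of a variant: (chromosome, position, alternate allele).
Variant = tuple[str, int, str]


@dataclass
class Beacon:
    """Toy beacon: answers only yes/no about whether a variant is present."""
    name: str
    genomes: list[set[Variant]] = field(default_factory=list)

    def has_variant(self, variant: Variant) -> bool:
        # The answer hides how many genomes carry the variant and which ones.
        return any(variant in genome for genome in self.genomes)


def ping(beacons: list[Beacon], variant: Variant) -> list[str]:
    """Return the names of the beacons that report the variant."""
    return [b.name for b in beacons if b.has_variant(variant)]


# Hypothetical data sets, each tied to one condition.
heart = Beacon("heart-disease", [{("1", 12345, "T")}, {("2", 500, "G")}])
autism = Beacon("autism", [{("2", 500, "G")}])

print(ping([heart, autism], ("2", 500, "G")))
```

A researcher who gets a "yes" learns only where to apply for fuller access, which is the intended use of the network.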

Beacons’ vulnerability

But many genomic data sets are specific to a condition or disease. A nefarious user who can find the match for an individual’s genome in a heart disease beacon, for example, can infer that the individual — or a relative of that person — likely has heart disease. By “pinging” enough beacons in the network of beacons, the hacker could construct a limited profile of the individual. “Working with the Global Alliance for Genomics and Health, we’ve been able to demonstrate that vulnerability and, more importantly, how to put policy changes in place to minimize the risk,” said Bustamante.

To protect donors’ identities, the organizers of the network, which is called the Beacon Project, have taken steps, such as encouraging beacon operators to “de-identify” individual genomes, so that names or other identifying information are not connected to the genome.

Despite such efforts, Shringarpure and Bustamante calculated that someone in possession of an individual’s genome could locate that individual within the beacon network. For example, in a beacon containing the genomes of 1,000 individuals, the Stanford pair’s approach could identify that individual or their relatives with just 5,000 queries.

Genomic information isn’t completely covered by the federal law that protects health information, and the consequences for a person whose information is disclosed can be significant. For example, although the national Genetic Information Nondiscrimination Act prevents health insurers from denying someone coverage or raising someone’s premiums because they have a particular genetic variant, the act does not apply to other forms of insurance, such as long-term care, disability or life insurance.

Approaches for better security

The Beacon Project has the potential to be enormously valuable to future genetic research. So, plugging this security hole is as important to Shringarpure and Bustamante as to the Global Alliance for Genomics and Health. In their paper, the Stanford researchers suggest various approaches for making the information more secure, including banning anonymous researchers from querying the beacons; merging data sets to make it harder to identify the exact source of the data; requiring that users be approved; and limiting access in a beacon to a smaller region of the genome.

Peter Goodhand, executive director of the Global Alliance for Genomics and Health, said, “We welcome the paper and look forward to ongoing interactions with the authors and others to ensure beacons provide maximum value while respecting privacy.”

Goodhand also said that the organization’s mitigation efforts, which adhere to the best practices outlined in its privacy and security policy, include aggregating data among multiple beacons to increase database size and obscure the database of origin; creating an information-budgeting system to track the rate at which information is revealed and to restrict access when the information disclosed exceeds a certain threshold; and introducing multiple tiers of secured access, including requiring users to be authorized for data access and to agree not to attempt specific risky scenarios.
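The information-budgeting idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Global Alliance's design: the `BudgetedBeacon` class, the rule of charging minus log2 of the variant's frequency for each "yes" answer, the zero charge for "no" answers, and the 12-bit threshold are all hypothetical choices.

```python
import math


class BudgetedBeacon:
    """Toy beacon that tracks how much each user has learned and cuts off access."""

    def __init__(self, allele_counts, n_genomes, budget_bits):
        self.allele_counts = allele_counts  # variant -> number of genomes carrying it
        self.n_genomes = n_genomes
        self.budget_bits = budget_bits      # assumed per-user disclosure threshold
        self.spent = {}                     # user -> bits charged so far

    def cost(self, variant):
        """Assumed charging rule: rarer variants reveal more, so they cost more."""
        count = self.allele_counts.get(variant, 0)
        if count == 0:
            return 0.0  # simplification: a "no" answer is not charged
        return -math.log2(count / self.n_genomes)

    def query(self, user, variant):
        charge = self.cost(variant)
        used = self.spent.get(user, 0.0)
        if used + charge > self.budget_bits:
            raise PermissionError(f"information budget exhausted for {user}")
        self.spent[user] = used + charge
        return self.allele_counts.get(variant, 0) > 0


beacon = BudgetedBeacon({"v1": 1, "v2": 500, "v3": 2}, n_genomes=1000, budget_bits=12.0)
print(beacon.query("alice", "v1"))   # rare variant, expensive
print(beacon.query("alice", "v2"))   # common variant, cheap
try:
    beacon.query("alice", "v3")      # a second rare variant exceeds the budget
except PermissionError as err:
    print(err)
```

The point of charging by rarity is that many cheap queries about common variants stay possible, while a run of queries about rare variants, the kind most useful for singling out one person, hits the threshold quickly.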

Shringarpure and Bustamante are also interested in applying the technique described in their study to the area of DNA mixture interpretation, in which investigators seek to identify one DNA sequence in a mixture of many similar ones. The DNA mixing problem is relevant to forensics, studies of the microbiome and ecological studies. For example, Bustamante said, if a weapon used in a crime had DNA from several people on it, DNA mixture interpretation can help investigators pick out the DNA of a particular person, such as the suspect or the victim, revealing whether they touched the weapon. In fact, investigators could potentially use the same type of analysis used on the beacon network to look for individuals who may have touched a railing in a subway station or other public space.

This research was partially supported by the National Institutes of Health (grant U01HG007436). Stanford’s Department of Genetics also supported the work. Bustamante is on the scientific advisory boards for Ancestry.com, Personalis, Liberty Biosecurity and Etalon DX. He is also a founder and chair of the advisory board for IdentifyGenomics. None of these entities played a role in the design, interpretation or presentation of the study. Stanford University’s Office of Technology Licensing has evaluated the work presented in the paper for potential intellectual property and commercial rights.


Computational Models to Sort out the Genetic Chaos of Cancer Cells


University of Luxembourg

Scientists have developed a method for analyzing the genome of cancer cells more precisely than ever before. The team led by Prof. Antonio del Sol, head of the research group Computational Biology of the Luxembourg Centre for Systems Biomedicine of the University of Luxembourg, is employing bioinformatics: Using novel computing processes, the researchers have created models of the genome of cancer cells based on known changes to the genome. These models are useful for determining the structure of DNA in tumors.

“If we know this structure, we can study how cancer develops and spreads,” says del Sol. “This gives us clues about possible starting points for developing new anticancer drugs and better individual therapy for cancer patients.”

The LCSB researchers recently published their results in the scientific journal Nucleic Acids Research.

“The cause of cancers are changes in the DNA,” says Sarah Killcoyne, who is doing her PhD at the University of Luxembourg and whose doctoral thesis is a core component of the research project. “Mutations arise, the chromosomes can break or reassemble themselves in the wrong order, or parts of the DNA can be lost,” Killcoyne describes the cellular catastrophe: “In the worst case, the genome becomes completely chaotic.” The cells affected become incapable of performing their function in the body and — perhaps even worse — multiply perpetually. The result is cancer.

If we are to develop new anticancer drugs and provide personalized therapy, it is important to know the structure of DNA in cancer cells. Oncologists and scientists have isolated chromosomes from tumors and analyzed them under the microscope for decades. They found that irregularities in the chromosome structure sometimes indicated the type of cancer and the corresponding therapy.

“Sequencing technologies have made the identification of many mutations more accurate, significantly improving our understanding of cancer,” Sarah Killcoyne says. “But it has been far more difficult to use these technologies for understanding the chaotic structural changes in the genome of cancer cells.”

This is because sequencing machines only deliver data about very short DNA fragments. In order to reconstruct the genome, scientists accordingly need a reference sequence — a kind of template against which to piece together the puzzle of the sequenced genome.

Killcoyne continues: “The reference sequence gives us clues to where the fragments overlap and in what order they belong together.” Since the gene sequence in cancer cells is in complete disarray, logically, there is no single reference sequence. “We developed multiple references instead,” says Sarah Killcoyne. “We applied statistical methods for our new bioinformatics approach, to generate models, or references, of chaotic genomes and to determine if they actually show us the structural changes in a tumor genome.”
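A toy sketch can show why a rearranged genome calls for more than one reference. It is not the statistical method from the paper: the exact-substring matching, the `mapped_fraction` and `best_reference` helpers, and the short made-up sequences are all assumptions for illustration.

```python
def mapped_fraction(reads, reference):
    """Fraction of short reads that occur verbatim somewhere in the reference."""
    if not reads:
        return 0.0
    return sum(read in reference for read in reads) / len(reads)


def best_reference(reads, candidates):
    """Pick the candidate reference model that explains the most reads."""
    return max(candidates, key=lambda name: mapped_fraction(reads, candidates[name]))


# Made-up sequences: the "tumor" genome swaps the two halves of the normal one.
normal = "ACGTACGGTCATTGCA"
rearranged = normal[8:] + normal[:8]

# Simulated sequencing: overlapping 6-letter fragments of the tumor genome.
reads = [rearranged[i:i + 6] for i in range(0, len(rearranged) - 5, 2)]

candidates = {"normal": normal, "rearranged": rearranged}
for name, sequence in candidates.items():
    print(name, mapped_fraction(reads, sequence))
print(best_reference(reads, candidates))
```

Reads that span the breakpoint fail to match the normal reference but fit the rearranged model, so comparing how well each candidate explains the reads hints at which structural change is present.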

These methods are of double importance to group leader del Sol, as he states: “Firstly, Sarah Killcoyne’s work is important for cancer research. After all, such models can be used to investigate the causes of genetic and molecular processes in cancer research and to develop new therapeutic approaches. Secondly, we are interested in bioinformatics model development for reapplying it to other diseases that have complex genetic causes, such as neurodegenerative diseases like Parkinson’s. Here, too, we want to better understand the relationships between genetic mutations and the resulting metabolic processes. After all, new approaches for diagnosing and treating neurodegenerative diseases are an important aim at the Luxembourg Centre for Systems Biomedicine.”

Citation: Sarah Killcoyne et al. Identification of large-scale genomic variation in cancer genomes using reference models, Nucleic Acids Research (2015). DOI: 10.1093/nar/gkv828



Mathematical ‘Ginkgo trees’ reveal mutations in single cells that characterize diseases

Seemingly similar cells often have significantly different genomes. This is often true of cancer cells, for example, which may differ one from another even within a small tumor sample, as genetic mutations within the cells spread in staccato-like bursts. Detailed knowledge of these mutations, called copy number variations, in individual cells can point to specific treatment regimens.

The problem is that current techniques for acquiring this knowledge are difficult and produce unreliable results. Today, scientists at Cold Spring Harbor Laboratory (CSHL) publish a new interactive analysis program called Ginkgo that reduces the uncertainty of single-cell analysis and provides a simple way to visualize patterns in copy number mutations across populations of cells.

The open-source software, which is freely available online, will improve scientists’ ability to study this important type of genetic anomaly and could help clinicians better target medications based on cells’ specific mutation profiles. The software is described online today in Nature Methods.

Mutations come in many forms. For example, in the most common type of mutation, variations may exist among individual people—or cells—at a single position in a DNA sequence. Another common mutation is a copy number variation (CNV), in which large chunks of DNA are either deleted from or added to the genome. When there are too many or too few copies of a given gene or genes, due to CNVs, disease can occur. Such mutations have been linked not only with cancer but a host of other illnesses, including autism and schizophrenia.

Researchers can learn a lot by analyzing CNVs in bulk samples (from a tumor biopsy, for example), but they can learn more by investigating CNVs in individual cells. “You may think that every cell in a tumor would be the same, but that’s actually not the case,” says CSHL Associate Professor Michael Schatz.

“We’re realizing that there can be a lot of changes inside even a single tumor,” says Schatz. “If you’re going to treat cancer, you need to diagnose exactly what subclass of cancer you have.” Simultaneously employing different drugs to target different cancer subclasses could prevent relapse, scientists have proposed.

One powerful single-cell analytic technique for exploring CNV is whole genome sequencing. The challenge is that, before sequencing can be done, the cell’s DNA has to be amplified many times over. This process is rife with errors, with some arbitrary chunks of DNA being amplified more than others. In addition, because many labs use their own software to examine CNVs, there is little consistency in how researchers analyze their results.

To address these two challenges, Schatz and his colleagues created Ginkgo. The interactive, web-based program automatically processes sequence data, maps the sequences to a reference genome, and creates CNV profiles for every cell that can then be viewed with a user-friendly graphical interface. In addition, Ginkgo constructs phylogenetic trees based on the profiles, allowing cells with similar copy number mutations to be grouped together.

Importantly, Ginkgo, which Schatz and his colleagues validated by reproducing the findings of five major single-cell studies, also analyzes patterns in the sequence reads in order to recognize, and greatly reduce, amplification errors.
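The step from mapped reads to a per-cell copy-number profile, and the grouping of similar cells, can be sketched in a few lines. This is a deliberately simplified illustration and not the program's actual algorithm: the median normalization, the assumed baseline ploidy of 2, the `copy_number_profile` and `profile_distance` helpers, and the read counts are all hypothetical.

```python
from statistics import median


def copy_number_profile(bin_counts, ploidy=2):
    """Scale per-bin read counts so the median bin sits at the assumed ploidy."""
    mid = median(bin_counts)
    if mid == 0:
        raise ValueError("median bin count is zero; cannot normalize")
    return [round(ploidy * count / mid) for count in bin_counts]


def profile_distance(a, b):
    """Sum of per-bin copy-number differences between two cells."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Made-up read counts per genomic bin for three cells.
cells = {
    "cell_1": [100, 98, 102, 151, 149, 100],  # gain in bins 4 and 5
    "cell_2": [100, 101, 99, 150, 152, 98],   # same gain
    "cell_3": [100, 52, 49, 100, 101, 99],    # loss in bins 2 and 3
}

profiles = {name: copy_number_profile(counts) for name, counts in cells.items()}
for name, profile in profiles.items():
    print(name, profile)
print(profile_distance(profiles["cell_1"], profiles["cell_2"]))
print(profile_distance(profiles["cell_1"], profiles["cell_3"]))
```

Pairwise distances of this kind are the usual input to tree-building or clustering methods, which is how cells with similar copy-number changes end up on the same branch.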

Schatz and his team named their software after the ginkgo tree, which has many well-documented therapeutic benefits. “We like to think our Ginkgo ‘trees’ will provide benefits as well,” says Schatz, referring to the graphical way that CNV changes are represented by analysts. Right now, CNV is not a commonly used diagnostic measurement in the clinic. “We’re looking into the best way of collecting samples, analyzing them, and informing clinicians about the results,” says Schatz. He adds that CSHL has collaborations with many hospitals, notably Memorial Sloan Kettering Cancer Center and the North Shore-LIJ Health System, to bring single-cell analysis to the clinic.

For Schatz, Ginkgo represents a culmination of CSHL’s efforts over the past decade, spearheaded by CSHL Professor Michael Wigler, to pioneer techniques for studying single cells. “Cold Spring Harbor has established itself as the world leader in single-cell analysis,” says Schatz. “We’ve invented many of the technologies and techniques important to the field and now we’ve taken all this knowledge and bundled it up so that researchers around the world can take advantage of our expertise.”

More information: Interactive analysis and assessment of single-cell copy-number variations, Nature Methods, DOI: 10.1038/nmeth.3578


Interactive analysis and assessment of single-cell copy-number variations

Tyler Garvin, Robert Aboukhalil, Jude Kendall, Timour Baslan, Gurinder S Atwal, James Hicks, Michael Wigler & Michael C Schatz

Nature Methods 12, 1058–1060 (2015)    http://dx.doi.org/10.1038/nmeth.3578

We present Ginkgo (http://qb.cshl.edu/ginkgo), a user-friendly, open-source web platform for the analysis of single-cell copy-number variations (CNVs). Ginkgo automatically constructs copy-number profiles of cells from mapped reads and constructs phylogenetic trees of related cells. We validated Ginkgo by reproducing the results of five major studies. After comparing three commonly used single-cell amplification techniques, we concluded that degenerate oligonucleotide-primed PCR is the most consistent for CNV analysis.

Figure 2: Assessment of data quality for different single-cell whole genome amplification methods using Ginkgo.

(a) LOWESS fit of GC content with respect to log-normalized bin counts for all samples in each of the nine data sets analyzed: three for MDA (top left, green), three for MALBAC (center left, orange) and three for DOP-PCR (bottom left, b…



Breaking Through the Barriers to Lab Innovation


Author: Helen Gillespie, Informatics Editor, Technology Networks


Innovation is a hot topic today and just about every type of laboratory is scrambling to figure out what it means for them. Lab Managers are expected to design profitable new products that enable the research organization to stay competitive in today’s marketplace. This means change. Process change. Systems change. Informatics technologies change. As a result, systemic change is occurring at all levels of the organization, driving the implementation of integrated lab solutions that unlock disparate, disconnected lab data silos and harmonize the IT infrastructure. Getting greater control of lab data is part of this and one of the most critical components of future success and corporate sustainability. As a result, some of the greatest change is taking place in Informatics in laboratories around the world.

Two of the most significant barriers to innovation are outdated informatics tools and inefficient workflows. Moving from paper-based manual methodologies to digital solutions can breathe new life into researcher productivity while enabling forward-looking companies to better compete and excel in today’s rapidly changing business environment.

This article examines the drivers behind the move for greater innovation, challenges, current trends in laboratory informatics, and the tools and techniques that can be used to break through barriers to lab innovation. Several leading informatics vendors provide their views.

Selected Vendors

Laboratories worldwide seeking a single, integrated informatics platform can now standardize on one comprehensive laboratory information management system (LIMS). Thermo Fisher’s integrated informatics solution now comprises method execution, data visualization and laboratory management, and seamlessly integrates with all popular enterprise-level software packages.

“Thermo Scientific SampleManager is a fully integrated laboratory platform encompassing laboratory information management (LIMS), scientific data management (SDMS) and lab execution (LES).”
Trish Meek, Director Strategy, Informatics, Thermo Fisher Scientific

BIOVIA Unified Lab Management allows for streamlined and more efficient lab workflows and a fully integrated, automated, easy-to-deploy process. Based on the BIOVIA Foundation, it works as an integration hub for BIOVIA applications as well as all major third-party systems and instruments, allowing for seamless data transfer.

“BIOVIA Unified Lab Management is part of our unique end-to-end Product Lifecycle support for science-based organizations to improve innovation, quality, compliance, and efficiency.”
Dr. Daniela Jansen, Senior Solution Marketing Manager

Waters® NuGenesis® Lab Management System uniquely combines data, workflow and sample management capabilities to support the entire product lifecycle from discovery through manufacturing. This user-centric platform encompasses NuGenesis SDMS, a compliance-ready data repository; NuGenesis ELN, a flexible analytical electronic laboratory notebook; and NuGenesis Sample Management.

“The NuGenesis LMS readily adapts to existing informatics environments, smoothly linking data from the lab to the business operations of a company, so science-driven organizations can see more, know more and do more.”
Garrett Mullen, Senior Product Marketing Manager, Laboratory Management Informatics, Waters

The Impact of Corporate Wide Initiatives

There are a number of sweeping changes occurring throughout the corporate world that are turning the spotlight on research laboratories, examining everything from workflows to documentation. These changes are driven by corporate initiatives to increase profits, reduce costs, develop new products and drive operational efficiencies throughout the enterprise. These are not new goals, but the methodologies for achieving these goals have changed significantly thanks to the rapid changes in technology. Now, there is a greater focus on how technology can drive innovation throughout the enterprise.

In fact, almost every leading multinational organization nowadays touts innovation as an underlying theme for how they conduct business and develop the next generation products. To be truly innovative however, businesses of all types must embrace innovation at every level of the enterprise – not just in the products under development, but also how those products are being developed.

“Organizations nowadays cannot afford to not look into innovation,” emphasizes Dr. Daniela Jansen, Senior Solution Marketing Manager at BIOVIA. “Now, they are questioning how product quality is being supported by innovation throughout the end-to-end product lifecycle. The time is past when researchers looked to a single functionality to make a difference. Now, all software needs to drive innovation, to drive costs down and to drive efficiency.”

Garrett Mullen, Senior Product Marketing Manager at Waters Corporation, offers another perspective. “We drive innovation by addressing the challenges. Sometimes it is specific to the market, such as petrochemical or pharmaceutical, sometimes it is specific to the task, such as sample registration for the QA/QC department. All markets are suffering from similar challenges, whether it is products coming off patent or waning market share. So there is a big focus on what they can do about it, from controlling costs to simplifying processes.”

Operational excellence plays a significant role in corporate initiatives for innovation, and this is where the initiatives drill down into the research laboratories. According to Trish Meek, Director of Product Strategy for the Informatics business at Thermo Fisher Scientific, “Executives are looking more closely at the lab as part of a more holistic view of operational efficiencies across the entire organization. There’s a larger expectation than ever before that there is hidden value in the lab, and that can be found in optimizing efficiencies and more fully integrating processes across the lab and throughout the rest of the manufacturing or production process. Executive metrics now include the lab as they analyze data from all aspects of their operations in order to improve their processes, improve the quality of their products and drive profitability. Executives are now mining and reviewing data to determine how to make operations better from a holistic perspective, and that is causing the spotlight to be on the lab more than it ever was.”

A key aspect of operational excellence is that it goes hand in hand with product quality. Not only is there a need to expedite innovation to deliver new products, those new products need to be high quality and to comply with changing environmental regulations and consumer expectations. As a result, research organizations are reviewing their Informatics infrastructure and streamlining laboratory operations.

Further, the technology that supports lab Informatics has been evolving rapidly, delivering new functionality that is changing the way research can be performed. This points to the heart of the matter: current technology is enabling new workflows (such as digital collaboration) while delivering greater access to research and also enabling better examination of the research (such as through the ‘Cloud’). This paradigm shift is happening at many levels, from how research is performed to how the data is shared, with technology at the center of the shift.

Barriers to Innovation: The Migration from Paper to Digital

Legacy paper-based activities in the lab are perhaps one of the greatest barriers to innovation. Data captured in paper lab notebooks is typically difficult to find, read or share. Written observations are often transcribed incorrectly. Tests and experiments are repeated because prior data is lost or inaccessible. Even though many lab activities are conducted electronically, certain steps are often still conducted on paper. Such repetitious manual activities are one of the greatest impediments to productivity. These workflow gaps are slowly being replaced with seamless digital activities.

One of the most interesting aspects of the drive for innovation is the ability to take advantage of the technology tools now available, which deliver a significant new range of functionality to users. Electronic Lab Notebooks (ELNs), for instance, can now be connected in the Cloud so that scientists anywhere can collaborate and share research data. This is important because not only is the transition to ELNs happening on a local level, it is part of a larger global movement toward distributed research as a result of changes in how research organizations are now managing their operations. Large multinationals with research centers distributed around the globe are enabling their scientists to collaborate easily and efficiently with ELNs as part of their effort to streamline operations.

“It is still surprising to see paper in the lab,” states Meek. “It’s in many cases a cultural issue – a comfort level – which makes it hard to move away from paper, and it’s a system everyone knows. Despite its flaws, paper is infinitely flexible, but in general it is terribly inefficient with regards to big data and computational power. Now, the need to look at all the data, and have all the data available is far more important, meaning that the move away from paper or manual data management is now more important than ever.”

“It continues to be about paper in many labs,” Jansen confirms. “But you need to look at the entire chain of cause and effect and the role that paper plays. Now, it’s about what drives the entire organization, not localized practices. This means that there’s a focus on reducing the time spent on documentation and removing barriers. There’s a focus on getting quality designed into the process, getting greater efficiency, and connecting the disparate silos of data that impede innovation. One way to do this is to use an open science-aware framework like the BIOVIA Foundation to integrate processes and applications from different providers. And virtual experiments that enable scientists to identify potential new products earlier in the process can significantly save time and money.”

Cost savings are one of the key reasons organizations make the transition from paper to digital practices. “We’ve found that processes went from hours to minutes when you eliminate the numerous manual review processes and transcriptions and replace them with electronic processes,” explains Mullen. “For example, in the past one central analytical lab at a company might have performed all LC [liquid chromatography] testing. Users submitted samples via email and the samples were boxed, tests requested, samples were received and registered at the central lab, etc. Very labor intensive. A digital solution changes all that. Now the new NuGenesis web interface enables the user to register the sample, enter the samples, specify the tests digitally, and thus reduce transcription errors and expedite the process. An automatic acknowledgement that the samples are approved is sent and the testing processes start. This eliminates the manual tasks associated with checking that everything is accurate. The time and cost savings are enormous.”

Other factors are influencing the migration from paper to digital lab processes, including the recession and the heightened merger and acquisition activity. Many organizations have downsized, are running leaner, and employ fewer researchers. Yet the productivity demands remain as high as when there was more staff. Thus, there’s an increased need to ensure that researcher activities are more efficient. Manual workflows are out of sync in the digital environment.

Adopting Next Generation Technology

While there are numerous paper-based workflows in research labs worldwide, the vast majority of these labs have adopted some level of technology, including informatics software solutions. What began with instrument-specific software solutions, such as the Thermo Scientific Chromeleon™ chromatography data system (CDS), has expanded to numerous application-specific and task-specific systems as computers have become an integral part of the lab work environment. Laboratory Information Management Systems (LIMS) have been commercially available since the early 1980s. The increase in demand for fast turnaround and greater volumes of sample testing and analysis drove the growth in these solutions. NuGenesis® introduced the first Scientific Data Management System (SDMS) in the 1990s to help capture, catalog and archive lab data better. ELNs were one of the last lab systems to become a ubiquitous tool, mainly because of the challenge of managing unstructured data versus structured data, but technology has overcome this issue too.

The increase in computing power accelerated the Informatics vendors’ ability to deliver faster, better, more comprehensive software tools. In parallel, the adoption of sophisticated technology by consumers created expectations for similar capabilities in the workplace, driving the demand for hardware such as tablets and other handheld devices as access tools for ELNs, LIMS and other lab software.

Yet while these different lab data and sample management systems have provided significant benefits to the lab, they started as separate systems and thus created separate data repositories that require an interface or middleware to enable data to be shared. But that challenge too is fast disappearing as new technology and new pathways to innovation arise.

“One of the things that Thermo Fisher Scientific is focused on is delivering integrated informatics,” states Meek. “Traditionally, LIMS delivered specific functionality for R&D or manufacturing labs, but didn’t cover the entire laboratory process. Our customers today want an integrated solution that covers the complete lab workflow. So, we built an Integrated Informatics platform to combine many of these together so that they’re no longer separate silos with different data in different systems. Now, lab data management, method execution and scientific data management are done within the SampleManager™ solution, making it much more than just a LIMS. All of the functionality for scientific method and data management is now part of the same solution. SampleManager has continued to evolve to offer greater functionality for our customers, so that now it has become the enabler for our customers to better manage their lab, and save their companies time and valuable financial resources formerly necessary to purchase, implement and support multiple software systems. Our goal is to continue to build upon the SampleManager platform so we can offer the greatest degree of functionality to our customers.”

“What is happening is that LIMS are now being supplemented with ELN and LES toolsets. Everyone is moving towards a center space, where LIMS become ELNs, etc.,” explains Mullen. Waters recently introduced the NuGenesis® Lab Management System (LMS) as an alternative to LIMS. Based on the NuGenesis SDMS, the LMS offers significantly more functionality that can be switched on as components are needed for various workflow and sample management tasks.

Mullen continues, “The NuGenesis LMS can create the testing protocol procedure to ensure that the tests are done correctly. It can specify the values and results, the upper and lower limits, etc., then pull the test values back into the worksheet. Results are instantly flagged as in or out of specification. If reagents are expired or an instrument needs calibration, these are flagged automatically. The result is much faster transaction times than traditional paper-based processes.”
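The flagging Mullen describes can be pictured with a minimal sketch. This is not NuGenesis code: the `SpecLimit` class, the function names, and the pH range below are all hypothetical, chosen only to illustrate checking a value against upper and lower limits and checking a reagent's expiry date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SpecLimit:
    """Hypothetical acceptance range for one test (not a vendor data model)."""
    name: str
    lower: float
    upper: float


def check_result(limit: SpecLimit, value: float) -> str:
    """Return 'in spec' when lower <= value <= upper, otherwise 'out of spec'."""
    return "in spec" if limit.lower <= value <= limit.upper else "out of spec"


def reagent_expired(expiry: date, today: date) -> bool:
    """Treat a reagent as expired once today's date is past its expiry date."""
    return today > expiry


# Illustrative limits and measured values; the numbers are assumptions.
ph_limit = SpecLimit(name="pH", lower=6.8, upper=7.4)
flags = {value: check_result(ph_limit, value) for value in (7.0, 7.6)}
```

In a paper-based process the same comparison is made by eye against a printed specification sheet; encoding the limits once lets every result be flagged the moment it is entered.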

“For BIOVIA, when we talk about the benefits of our solutions, we’re talking about workflow efficiencies, cost savings, compliance and brand reputation,” states Jansen. “As a vendor, we support organizations by driving innovation, by strengthening the R&D pipeline while ensuring quality in their processes and outcomes. Now that BIOVIA is part of Dassault Systèmes,” Jansen continues, “we’re engaging in much larger conversations because we can now support the entire lab-to-plant process, expanding our solutions to the 3D Experience platform. From ELNs to LIMS to virtual molecular modeling with our Discovery or Materials Studio™ solution, BIOVIA offers an integrated, unified experience that is transforming how our customers are improving product quality, collaborating across sites, reducing cycle times and reducing costs. The bottom line is the ability to rapidly, easily and accurately transfer and utilize knowledge.”

Each of these vendors offers a different path to a similar end, with solutions that deliver greater access to not just legacy data but also the astounding volumes of data being created in labs worldwide. The ability to turn that data into knowledge that is accessible, accurate and reusable is necessary to fuel the new product demands both inside and outside the enterprise. Next generation technology is being developed and implemented with increasing rapidity to address these market requirements.


Corporate demand for innovation at every level of the enterprise is helping to drive laboratory innovation, from the tools adopted to perform research to the processes used to manage that research and all the associated data, samples, reagents, tests and more.

Operational excellence has risen to the top of corporate agendas, driven in part by the availability of technology that can support a global approach to better manage the entire product lifecycle, from initial research to final product. Now, informatics solutions exist that can support every stage of the process whether the organization engages in pharmaceutical research and needs to identify promising candidates early in the process, or whether the organization develops consumer product goods that have a short product lifecycle and thus require a constant stream of new products to maintain market share.

Information integration is playing a major role in breaking through the barriers to lab innovation. As a result, there is a significant transformation underway in the informatics tools to integrate the solutions so that data is no longer inaccessible in single-purpose systems. For some time there have been LIMS with ELN capabilities, CDS with LIMS functions, ELNs with sample management attributes, and more. Now, the need to exchange and move data quickly and easily from one user to another has driven the availability of integrated collaborative environments that can share laboratory data across teams, locations and organizations.

At the core of these changes is the need to more rapidly address the larger business challenges in the lab through more efficient, more market-oriented new product development. And that’s the bottom line: informatics technology can be used as an enabling tool to solve both business challenges and lab challenges. Informatics vendors all approach the market requirements differently, depending on their own corporate culture, but all strive to enable their customers to innovate.


Bioinformatics beyond Genome Crunching

Flow Cytometry, Workflow Development, and Other Information Stores Can Become Treasure Troves If You Use the Right IT Tools and Services

    Shown here is the FlowJo platform’s visualization of surface activation marker expression (CD38) on live lymphocyte CD8+ T cells. Colors represent all combinations of subsets positive and negative for interferon gamma (IFNγ), perforin (Perf), and phosphorylated ERK (pERK).










    Advances in bioinformatics are no longer limited to just crunching through genomic and exosomic data. Bioinformatics, a discipline at the interface between biotechnology and information technology, also has lessons for flow cytometry and experimental design, as well as database searches, for both internal and external content.

    One company offering variations on traditional genome crunching is DNAnexus. With the advent of the $1,000 genome, researchers find themselves drowning in data. To analyze the terabytes of information, they must contract with an organization to provide the computing power, or they must perform the necessary server installation and maintenance work in house.

    DNAnexus offers a platform that takes the raw sequence directly from the sequencing machine, builds the genome, and analyzes the data, and it is able to do all of this work in the cloud. The company works with Amazon Web Services to provide a completely scalable system of nucleic acid sequence processing.

    “No longer is it necessary to purchase new computers and put them in the basement,” explains George Asimenos, Ph.D., director of strategic projects, DNAnexus.  “Not only is the data stored in the cloud, but it is also processed in the cloud.”

    The service provided by DNAnexus allows users to run their own software. Most users choose open source programs created by academic institutions.

    DNAnexus does not write the software to process and analyze the data. Instead, the company provides a service to its customers. It enables customers to analyze and process data in the cloud rather than buying, maintaining, and protecting their own servers.

    “Additionally, collaboration is simplified,” states Dr. Asimenos. “One person can generate the data, and others can perform related tasks—mapping sequence reads to the reference genome, writing software to analyze the data, and interpreting results. All this is facilitated by hosting the process, data, and tools on the web.”

    “When a customer needs to run a job, DNAnexus creates a virtual computer to run the analysis, then dissolves the virtual computer once the analysis is complete,” clarifies Dr. Asimenos. “This scalability allows projects to be run expeditiously regardless of size. The pure elasticity of the system allows computers to ‘magically appear’ in your basement and then ‘disappear’ when they are no longer being used. DNAnexus takes care of IT infrastructure management, security, and clinical compliance so you can focus on what matters: your science.”

    Merging IT and Flow Cytometry

    Life scientists are being overwhelmed by the huge amounts of data they generate for specialized projects. They not only look for solutions within their own organizations but also increasingly enlist the help of service companies to help them with Big Data overload. [iStock/IconicBestiary]

    Technical advances in flow cytometry allow the labeling of individual cells with up to 50 different markers; 12,000 cells can be counted per second. This flood of information overwhelms traditional methods for data processing in flow cytometry.

    “FlowJo software offers a solution to this problem,” asserts Michael D. Stadnisky, Ph.D., CEO, FlowJo. “With an open architecture, our software serves as a platform that lets researchers run whatever program or algorithm they wish. Scientists can focus on the biological questions without having to become computer programmers.”

    FlowJo presents an intuitive and simple user interface to facilitate the visualization of complex datasets.

    FlowJo is offering plug-ins, which are still in development (beta testing). Some of them are free, and others are for sale. They include software components for automatic data analysis, the discovery of trends and identification of outliers, and the centralization of data for all researchers to access. Applications for FlowJo range from traditional immunology to environmental studies, such as assessments of aquatic stream health based on analyses of single-cell organisms.

    “Ultimately, FlowJo wants to offer real-time analysis of data,” discloses Dr. Stadnisky. “Presently, we have the capacity to process a 1,536-well plate in 15 minutes.”

    FlowJo’s platform has benefitted users such as the University of California, San Francisco. Here, researchers in the midst of a Phase I clinical trial were facing 632 clinical samples with 12 acquisition runs and 12 different time points. By employing FlowJo, the researchers realized a 10-fold reduction in the time spent analyzing all data.

    Clients have also integrated other data types. For example, they have integrated polymerase chain reaction (PCR), sequencing, and patient information with data from FlowJo, which facilitates this type of cross-functional teamwork. The data output from FlowJo, the company maintains, is easily accessible by other scientists. The platform is available as a standalone system that can be installed on a company’s computers or hosted in the cloud.

    Optimizing Experiments

    One dilemma facing large pharmaceutical companies is the need to optimize conditions with a very limited supply of a precious reagent. Determining the best experimental design is crucial to avoid wasting valuable resources.

    Roche has used a commercially available electronic tool to build a workflow support tool. “This application allows scientists to set up their experiments more efficiently,” declares Roman Affentranger, Ph.D., head of small molecular discovery workflows, Roche. “The tool assists scientists in documenting and carrying out their work in the most effective manner.”

    “Frequently, a quick formulation of a peptide is necessary to hand over to a toxicologist for animal testing,” continues Dr. Affentranger. “The formulation of the peptide needs to be optimized for the pH, the type of buffer, and the surfactants, for example. The tool we developed evaluates the design of the scientist’s experiment to use the minimum amount of the precious resource, the peptide in question.

    “Testing these various conditions rapidly turns into a combinatorial problem with hundreds of tubes required, using more and more of the small sample. Our system assists scientists in documenting and carrying out work, taking the place of finding a colleague to evaluate your experimental design.”
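To see how quickly the combinatorial problem Dr. Affentranger describes grows, consider a rough sketch of a full-factorial design. This is not the Roche tool: the factor levels, the tube volume, and the function names are assumptions made up for illustration.

```python
from itertools import product

# Hypothetical factor levels for a peptide formulation screen.
ph_levels = [5.0, 5.5, 6.0, 6.5, 7.0]
buffers = ["acetate", "citrate", "phosphate", "histidine"]
surfactants = ["none", "polysorbate 20", "polysorbate 80"]
concentrations_mg_per_ml = [0.5, 1.0, 2.0]


def full_factorial(*factors):
    """Enumerate every combination of the given factor levels."""
    return list(product(*factors))


def peptide_needed_mg(conditions, volume_ml):
    """Total peptide consumed if each condition is prepared once at volume_ml."""
    return sum(conc * volume_ml for (_ph, _buffer, _surfactant, conc) in conditions)


conditions = full_factorial(
    ph_levels, buffers, surfactants, concentrations_mg_per_ml
)
n_tubes = len(conditions)  # 5 * 4 * 3 * 3 = 180 tubes
total_mg = peptide_needed_mg(conditions, volume_ml=0.1)
```

Four modest factors already require 180 tubes, and each added factor or replicate multiplies that count, which is why a tool that prunes the design before any peptide is consumed can be valuable.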

    “The data is entered electronically rather than printed out as hardcopy and glued into a notebook,” points out Dr. Affentranger. “Consequently, the information is readily accessible within the lab, across labs, and across the global environment we all work in today.”

    Indexing Internal Content

    Another issue facing large, multinational pharmaceutical companies is finding material that they previously acquired. This could be as simple as a completed experiment, an expert in a content area, or an archive-bound business strategy analysis.

    To address this issue, a company could index its internal content, much the way Google indexes the Internet. At a large company, however, such a task would be onerous.

    Enter Sinequa, a France-based company that provides an indexing service. The company can convert more than 300 file formats, such as PDFs, Word documents, emails, email attachments, and PowerPoint presentations, into a format that its computers can “read.”

    According to Sinequa, a large enterprise, such as a pharmaceutical company, may need to cope with 200 to 500 million highly technical documents and billions of data points. This predicament is akin to the situation on the web in 1995. It was necessary to know the precise address of a website to access it. This unnecessary complication was eliminated by Google, which indexed everything on the web. Analogously, Sinequa offers the ability to index the information inside a company so that searches can yield information without requiring inputs that specify the information’s exact location.
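The general idea of indexing, finding documents by content rather than by location, can be illustrated with a toy inverted index. This sketch says nothing about how Sinequa's product actually works; the document paths, texts, and function names are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical internal content; paths and text are invented.
documents = {
    "reports/2014/assay_17.txt": "Peptide formulation stability assay at pH 6",
    "mail/archive/msg_203.txt": "Re: stability data for the peptide batch",
    "slides/strategy.txt": "Market strategy analysis for oncology",
}
index = build_index(documents)
hits = search(index, "peptide stability")
```

The searcher supplies only words, and the index returns the two matching paths; nobody needs to remember which folder or mailbox held the material.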

    With this kind of search ability, a company can turn its information trove into a treasure trove. Put another way, information can be made to flow, keeping applications turning like turbines, generating the “data power” needed to reposition drugs, reduce time to market, and identify internal and external experts and thought leaders.

    “Sinequa offers a kind of Google algorithm customized for each customer,” details Xavier Pornain, vice president of sales and alliances at Sinequa. “At least 20,000 people use the technology generated by Sinequa. Modern companies create lots of data; we make it searchable.”

    The data searched is not limited to internal documents. Sinequa can also add in external databases or indexing sites such as PubMed, Medline, and Scopus. The search engine is also flexible in deployment: one version can run inside a company firewall and another in the cloud.

    Emulating Intelligence Approaches

    A different search approach, one that leverages the experience of the intelligence community, is taken by the Content Analyst Company. With this approach, a company can comb through internal and external content stores to find relevant information that has value not only as output, but as input. That is, the information can cycle through the search engine, turning its machine learning gears.

    “By adapting to the voice of the user, our software package, Cerebrant, has been very successful in the intelligence and legal communities,” says Phillip Clary, vice president, Content Analyst. “For typical indexing services, such as Google and PubMed, people do huge searches using a long list of key words. A simpler scenario is to write a few sentences, enter the text, and get all the related relevant items returned. Cerebrant can take the place of an expert to sift through all the results to find the relevant ones.”

    Typical searches often yield confounding results. For example, if a user were to ask Google to generate results for the word “bank,” the top results would be financial institutions. Then there would be results for a musical band/person named Bank. Eventually, long past the first page of results, there would be information about the kind of bank that borders a stream or river course. Such results would frustrate a scientific user interested in tissue banks or cell line repositories.

    “In the past, companies have approached the problem of obtaining germane results by attempting to create databases with curation and controlled vocabulary,” notes Clary. “This is how Google works. All those misspelled words have to be entered into the code.

    “Cerebrant functions by learning how the information relates to itself. This was a powerful tool for the intelligence community, because the program can look at all kinds of information (emails, texts, metadata) and make connections within the unstructured data, even when users attempt to veil their meanings by using code words.”

    Search requests composed on Cerebrant can consist of a single sentence or a paragraph describing what sort of information the user wishes to find. This is much more efficient than determining the 30 to 40 keywords you need to use to locate all the information on a complex topic. Then there is still the task of removing the irrelevant finds.
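Querying with a sentence instead of a keyword list can be approximated, very crudely, by comparing word-count vectors. The sketch below uses plain bag-of-words cosine similarity, which is a stand-in chosen for illustration and not Cerebrant's method; the sample documents and names are hypothetical, and the example reuses the "bank" ambiguity discussed above.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, documents):
    """Return document names ordered from most to least similar to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), name) for name, text in documents.items()]
    return [name for _score, name in sorted(scored, reverse=True)]


# Invented documents covering three senses of "bank".
documents = {
    "finance": "the bank raised interest rates on savings accounts",
    "river": "erosion along the river bank changed the stream course",
    "tissue": "the tissue bank stores frozen cell line samples for research",
}
query = "where can we find a bank that stores frozen tissue and cell line samples"
ranking = rank(query, documents)
```

Because the query carries context words such as "tissue" and "cell line", the tissue-bank document ranks first even though all three documents contain "bank". Systems that learn relationships between terms go well beyond this word-overlap comparison.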

    Cerebrant is a cloud-based application. Generally, it takes only about a day to a week to get it up and running. Because it is scalable, Cerebrant can be used by an individual consultant or a multinational conglomerate.

    Given the enormous amount of time, energy, and money invested by the intelligence community, it is refreshing to see a novel application of the wisdom gained from all this work, just as we saw innovative uses of the technology that was developed by the space program.
