Posts Tagged ‘Biomolecular structure’

The Vibrant Philly Biotech Scene: Focus on Computer-Aided Drug Design and Gfree Bio, LLC

Curator and Interviewer: Stephen J. Williams, Ph.D.











This post is the second in a series of posts highlighting interviews with Philadelphia area biotech startup CEOs and showing how a vibrant biotech startup scene is evolving in the city as well as the Delaware Valley area. Philadelphia has been home to some of the nation’s oldest biotechs, including Cephalon and Centocor, hundreds of spinouts from a multitude of universities, the first cloned animal (a frog), the first transgenic mouse, and Nobel laureates in the field of molecular biology and genetics. Although some recent disheartening news about the fall in rankings of Philadelphia as a biotech hub and recent remarks by CEOs of former area companies have dominated the news, biotech incubators like the University City Science Center and Bucks County Biotechnology Center as well as a reinvigorated investment community (like PCCI and MABA) are bringing Philadelphia back. And although much work is needed to bring the Philadelphia area back to its former glory days (including political will at the state level), there are many bright spots, such as the innovative young companies outlined in these posts.

In today’s post, I had the opportunity to talk with molecular modeler Charles H. Reynolds, Ph.D., founder and CEO of Gfree Bio LLC, a computational structure-based design and modeling company based in the Pennsylvania Biotech Center of Bucks County. Gfree is actually one of a few molecular modeling companies at the Bucks County Biotech Center (I highlighted another company, RAbD Biotech, which uses structural computational methods to design antibody therapeutics).

Below is the interview with Dr. Reynolds of Gfree Bio LLC and Leaders in Pharmaceutical Business Intelligence (LPBI):

LPBI: Could you briefly explain, for non-molecular modelers, your business and the advantages you offer over other molecular modeling programs (either academic programs or other biotech companies)? As big pharma outsources more are you finding that your company is filling a needed niche market?

GfreeBio: Gfree develops and deploys innovative computational solutions to accelerate drug discovery. We can offer academic labs a proven partner for developing SBIR/STTR proposals that include a computational or structure-based design component. This can be very helpful in developing a successful proposal. We also provide the same modeling and structure-based design input for small biotechs that do not have these capabilities internally. Working with Gfree is much more cost-effective than trying to develop these capabilities internally. We have helped several small biotechs in the Philadelphia region assess their modeling needs and apply computational tools to advance their discovery programs. (see publication and collaboration list here).

LPBI: Could you offer more information on the nature of your 2014 STTR award?

GfreeBio: Gfree has been involved in three successful SBIR/STTR awards in 2014.   I am the PI for an STTR with Professor Burgess of Texas A&M that is focused on new computational and synthetic approaches to designing inhibitors for protein-protein interactions. Gfree is also collaborating with the Wistar Institute and Phelix Therapeutics on two other Phase II proposals in the areas of oncology and infectious disease.

LPBI: Why did you choose the Bucks County Pennsylvania Biotechnology Center?

GfreeBio: I chose to locate my company at the Biotech Center because it is a regional hub for small biotech companies and it provides a range of shared resources that are very useful to the company. Many of my most valuable collaborations have resulted from contacts at the center.

LPBI: The Blumberg Institute and Natural Products Discovery Institute have acquired a massive phytochemical library. How does this resource benefit the present and future plans for GfreeBio?

GfreeBio: To date Gfree Bio has not been an active collaborator with the Natural Products Institute, but I have a good relationship with the Director and that could change at any time.

LPBI: Did the state of Pennsylvania and local industry groups support GfreeBio’s move into the Doylestown incubator? Has the partnership with Ben Franklin Partners and the Center provided you with investment and partnership opportunities?

GfreeBio: Gfree Bio has not been actively seeking outside investors, at least to date. We have been focused on growing the company through collaborations and consulting relationships. However, we have benefitted from being part of the Keystone Innovation Zone, a state program that provides incentives for small technology-based businesses in Pennsylvania.

LPBI: You will be speaking at a conference in the UK on reinventing the drug discovery process through tighter collaborations between biotech, academia, and non-profit organizations. How do you feel the Philadelphia area can increase this type of collaboration, not only to advance the goals and missions of nonprofits and invigorate the Pennsylvania biotech industry, but also to add much-needed funding to the local academic organizations?

GfreeBio: I think this type of collaboration across sectors appears to be one of the most important emerging models for drug discovery.   The Philadelphia region has been in many ways hard hit by the shift of drug discovery from large vertically integrated pharmaceutical companies to smaller biotechs, since this area was at the very center of “Big Pharma.” But I think the region is bouncing back as it shifts more to being a center for biotech. The three ingredients for success in the new pharma model are great universities, a sizeable talent pool, and access to capital. The last item may be the biggest challenge locally. The KIZ program (Keystone Innovation Zone) is a good start, but the region and state could do more to help promote innovation and company creation. Some other states are being much more aggressive.

LPBI: In addition, the Pennsylvania Biotechnology Center in Bucks County appears to have this ecosystem: nonprofit organizations, biotechs, and academic researchers. Does this diversity of researchers/companies under one roof foster the type of collaboration needed, as will be discussed at the UK conference? Do you feel collaborations which are in close physical proximity are more effective and productive than a “virtual-style” (online) collaboration model? Could you comment on some of the collaborations GfreeBio is doing with other area biotechs and academics?

GfreeBio: I do think the “ecosystem” at the Pennsylvania Biotechnology Center is important in fostering new innovative companies. It promotes collaborations that might not happen otherwise, and I think close proximity is always a big plus. As I mentioned before, many of the current efforts of Gfree have come from contacts at the center.   This includes SBIR/STTR collaborations and contract work for local small biotech companies.

LPBI: Thomson Reuters just reported that China’s IQ (Innovation Quotient) has risen dramatically, with the greatest number of patents for pharmaceuticals and compounds from natural products. Have you or your colleagues noticed more competition or business from Chinese pharmaceutical companies?

GfreeBio: The rise of Asia, particularly China, has been one of the most significant recent trends in the pharmaceutical industry. Initially, this was almost exclusively in the CRO space, but now China is aggressively building a fully integrated domestic pharmaceutical industry.

LPBI: How can the Philadelphia ecosystem work closer together to support greater innovation?

GfreeBio: A lot has happened in recent years to promote innovation and company creation in the region. There could always be more opportunities for networking and collaboration within the Philadelphia community. Of course the biggest obstacle in this business is often financing. Philadelphia needs more public and private sources for investment in startups.

LPBI: Thank you Dr. Reynolds.

Please look for future posts in this series on the Philly Biotech Scene on this site

Also, if you would like your Philadelphia biotech startup to be highlighted in this series please contact me: or @StephenJWillia2.
Our site is read by ~ 570,000 readers, among them thousand international readers daily and followed by thousands of Twitter followers.


Other posts on this site in this VIBRANT PHILLY BIOTECH SCENE SERIES OR referring to PHILADELPHIA BIOTECH include:

RAbD Biotech Presents at 1st Pitch Life Sciences-Philadelphia

The Vibrant Philly Biotech Scene: Focus on Vaccines and Philimmune, LLC

What VCs Think about Your Pitch? Panel Summary of 1st Pitch Life Science Philly

1st Pitch Life Science- Philadelphia- What VCs Really Think of your Pitch

LytPhage Presents at 1st Pitch Life Sciences-Philadelphia

Hastke Inc. Presents at 1st Pitch Life Sciences-Philadelphia

PCCI’s 7th Annual Roundtable “Crowdfunding for Life Sciences: A Bridge Over Troubled Waters?” May 12 2014 Embassy Suites Hotel, Chesterbrook PA 6:00-9:30 PM

Pfizer Cambridge Collaborative Innovation Events: ‘The Role of Innovation Districts in Metropolitan Areas to Drive the Global an | Basecamp Business

Mapping the Universe of Pharmaceutical Business Intelligence: The Model developed by LPBI and the Model of Best Practices LLC



Reporter: Aviva Lev-Ari, PhD, RN

A New Approach Uses Compression to Speed Up Genome Analysis

Public-Domain Computing Resources

Structural Bioinformatics

The BetaWrap program detects the right-handed parallel beta-helix super-secondary structural motif in primary amino acid sequences by using beta-strand interactions learned from non-beta-helix structures.
Wrap-and-pack detects beta-trefoils in protein sequences by using both pairwise beta-strand interactions and 3-D energetic packing information
The BetaWrapPro program predicts right-handed beta-helices and beta-trefoils by using both sequence profiles and pairwise beta-strand interactions, and returns coordinates for the structure.
The MSARi program identifies conserved RNA secondary structure in non-coding RNA genes and mRNAs by searching multiple sequence alignments of a large set of candidate catalogs for correlated arrangements of reverse-complementary regions.
The Paircoil2 program predicts coiled-coil domains in protein sequences by using pairwise residue correlations obtained from a coiled-coil database. The original Paircoil program is still available for use.
The MultiCoil program predicts the location of coiled-coil regions in amino acid sequences and classifies the predictions as dimeric or trimeric. An updated version, Multicoil2, will soon be available.
The LearnCoil Histidine Kinase program uses an iterative learning algorithm to detect possible coiled-coil domains in histidine kinase receptors.
The LearnCoil-VMF program uses an iterative learning algorithm to detect coiled-coil-like regions in viral membrane-fusion proteins.
The Trilogy program discovers novel sequence-structure patterns in proteins by exhaustively searching through three-residue motifs using both sequence and structure information.
The ChainTweak program efficiently samples from the neighborhood of a given base configuration by iteratively modifying a conformation using a dihedral angle representation.
The TreePack program uses a tree-decomposition based algorithm to solve the side-chain packing problem more efficiently. This algorithm is more efficient than SCWRL 3.0 while maintaining the same level of accuracy.
PartiFold: Ensemble prediction of transmembrane protein structures. Using statistical mechanics principles, partiFold computes residue contact probabilities and samples super-secondary structures from sequence only.
tFolder: Prediction of beta sheet folding pathways. Predict a coarse grained representation of the folding pathway of beta sheet proteins in a couple of minutes.
RNAmutants: Algorithms for exploring the RNA mutational landscape. Predict the effect of mutations on structures and reciprocally the influence of structures on mutations. A tool for molecular evolution studies and RNA design.
AmyloidMutants is a statistical mechanics approach for de novo prediction and analysis of wild-type and mutant amyloid structures. Based on the premise of protein mutational landscapes, AmyloidMutants energetically quantifies the effects of sequence mutation on fibril conformation and stability.


GLASS aligns large orthologous genomic regions using an iterative global alignment system. Rosetta identifies genes based on conservation of exonic features in sequences aligned by GLASS.
RNAiCut – Automated Detection of Significant Genes from Functional Genomic Screens.
MinoTar – Predict microRNA Targets in Coding Sequence.

Systems Biology

The Struct2Net program predicts protein-protein interactions (PPI) by integrating structure-based information with other functional annotations, e.g. GO, co-expression and co-localization etc. The structure-based protein interaction prediction is conducted using a protein threading server RAPTOR plus logistic regression.
IsoRank is an algorithm for global alignment of multiple protein-protein interaction (PPI) networks. The intuition is that a protein in one PPI network is a good match for a protein in another network if the former’s neighbors are good matches for the latter’s neighbors.


t-sample is an online algorithm for time-series experiments that allows an experimenter to determine which biological samples should be hybridized to arrays to recover expression profiles within a given error bound.
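The IsoRank intuition described above (two proteins match well if their neighbors match well) can be written as a small fixed-point iteration. What follows is a minimal sketch under stated assumptions, not the released IsoRank code: the function name `isorank_scores`, the damping weight `alpha=0.6`, and the uniform `prior` standing in for sequence-similarity scores are all illustrative choices, and the step that turns scores into a final alignment is left out.

```python
from itertools import product


def isorank_scores(net1, net2, prior=None, alpha=0.6, iterations=100):
    """Score every cross-network node pair by how well its neighbors match.

    net1, net2: dict mapping node -> set of neighbors (undirected networks).
    prior: optional dict of pair -> weight summing to 1 (hypothetical stand-in
           for sequence similarity); uniform when omitted.
    """
    pairs = list(product(net1, net2))
    if prior is None:
        prior = {p: 1.0 / len(pairs) for p in pairs}
    scores = dict(prior)
    for _ in range(iterations):
        updated = {}
        for i, j in pairs:
            # Each neighboring pair (u, v) shares its score equally among
            # all the pairs it is adjacent to.
            support = sum(
                scores[(u, v)] / (len(net1[u]) * len(net2[v]))
                for u in net1[i]
                for v in net2[j]
            )
            updated[(i, j)] = alpha * support + (1 - alpha) * prior[(i, j)]
        total = sum(updated.values())
        scores = {p: s / total for p, s in updated.items()}
    return scores
```

For two three-node chains a-b-c and x-y-z, the highest-scoring pair is the two middle nodes, since each is the only node whose neighbors on both sides have counterparts in the other network.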

Compressive genomics

Nature Biotechnology 30, 627–630 (2012) doi:10.1038/nbt.2241

Published online 10 July 2012
Algorithms that compute directly on compressed genomic data allow analyses to keep pace with data generation.



In the past two decades, genomic sequencing capabilities have increased exponentially [1-3], outstripping advances in computing power [4-8]. Extracting new insights from the data sets currently being generated will require not only faster computers, but also smarter algorithms. However, most genomes currently sequenced are highly similar to ones already collected [9]; thus, the amount of new sequence information is growing much more slowly.
Here we show that this redundancy can be exploited by compressing sequence data in such a way as to allow direct computation on the compressed data using methods we term ‘compressive’ algorithms. This approach reduces the task of computing on many similar genomes to only slightly more than that of operating on just one. Moreover, its relative advantage over existing algorithms will grow with the accumulation of genomic data. We demonstrate this approach by implementing compressive versions of both the Basic Local Alignment Search Tool (BLAST) [10] and the BLAST-Like Alignment Tool (BLAT) [11], and we emphasize how compressive genomics will enable biologists to keep pace with current data.
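To make the idea of computing directly on compressed data concrete, here is a toy sketch. It is not the authors' implementation: it uses exact string matching and fixed-size blocks where the real tools use similarity-based alignment, and the names `compress` and `compressive_search`, the block size, and the k-mer length are all assumptions chosen for illustration. Each distinct block is stored once with links to every place it occurs; a coarse step scans only the distinct blocks, and a fine step follows the links back to the original genomes to confirm hits.

```python
from collections import defaultdict


def compress(genomes, block=8):
    """Map each distinct block to every (genome id, offset) where it occurs."""
    links = defaultdict(list)
    for gid, seq in genomes.items():
        for off in range(0, len(seq), block):
            links[seq[off:off + block]].append((gid, off))
    return dict(links)


def compressive_search(query, links, genomes, k=4):
    """Find exact occurrences of query.

    Assumes k <= block size and len(query) >= 2*k - 1, so that every
    occurrence has at least one k-mer lying wholly inside a single block.
    """
    kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    # Coarse step: each distinct block is examined once, no matter how
    # many genomes share it.
    candidates = [b for b in links if any(km in b for km in kmers)]
    hits = set()
    # Fine step: follow the links to the originals and verify near each site.
    for b in candidates:
        for gid, off in links[b]:
            seq = genomes[gid]
            lo = max(0, off - len(query) + 1)
            hi = off + len(b) + len(query) - 1
            pos = seq.find(query, lo, hi)
            while pos != -1:
                hits.add((gid, pos))
                pos = seq.find(query, pos + 1, hi)
    return sorted(hits)
```

With three 24-base genomes that differ at a single position, `compress` stores four distinct blocks instead of nine, and the coarse step scans only those four before the fine step checks the linked sites.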


Compressive algorithms for genomics have the great advantage of becoming proportionately faster with the size of the available data. Although the compression schemes for BLAST and BLAT that we presented yield an increase in computational speed and, more importantly, in scaling, they are only a first step. Many enhancements of our proof-of-concept implementations are possible; for example, hierarchical compression structures, which respect the phylogeny underlying a set of sequences, may yield additional long-term performance gains. Moreover, analyses of such compressive structures will lead to insights as well. As sequencing technologies continue to improve, the compressive genomic paradigm will become critical to fully realizing the potential of large-scale genomics. Software is available at
  1. Lander, E.S. et al. Nature 409, 860–921 (2001).
  2. Venter, J.C. et al. Science 291, 1304–1351 (2001).
  3. Kircher, M. & Kelso, J. Bioessays 32, 524–536 (2010).
  4. Kahn, S.D. Science 331, 728–729 (2011).
  5. Gross, M. Curr. Biol. 21, R204–R206 (2011).
  6. Huttenhower, C. & Hofmann, O. PLoS Comput. Biol. 6, e1000779 (2010).
  7. Schatz, M., Langmead, B. & Salzberg, S. Nat. Biotechnol. 28, 691–693 (2010).
  8. 1000 Genomes Project data available on Amazon Cloud. NIH press release, 29 March 2012.
  9. Stratton, M. Nat. Biotechnol. 26, 65–66 (2008).
  10. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. J. Mol. Biol. 215, 403–410 (1990).
  11. Kent, W.J. Genome Res. 12, 656–664 (2002).
  12. Grumbach, S. & Tahi, F. J. Inf. Process. Manag. 30, 875–886 (1994).
  13. Chen, X., Li, M., Ma, B. & Tromp, J. Bioinformatics 18, 1696–1698 (2002).
  14. Christley, S., Lu, Y., Li, C. & Xie, X. Bioinformatics 25, 274–275 (2009).
  15. Brandon, M.C., Wallace, D.C. & Baldi, P. Bioinformatics 25, 1731–1738 (2009).
  16. Mäkinen, V., Navarro, G., Sirén, J. & Välimäki, N. in Research in Computational Molecular Biology, vol. 5541 of Lecture Notes in Computer Science (Batzoglou, S., ed.) 121–137 (Springer Berlin/Heidelberg, 2009).
  17. Kozanitis, C., Saunders, C., Kruglyak, S., Bafna, V. & Varghese, G. in Research in Computational Molecular Biology, vol. 6044 of Lecture Notes in Computer Science (Berger, B., ed.) 310–324 (Springer Berlin/Heidelberg, 2010).
  18. Hsi-Yang Fritz, M., Leinonen, R., Cochrane, G. & Birney, E. Genome Res. 21, 734–740 (2011).
  19. Mäkinen, V., Navarro, G., Sirén, J. & Välimäki, N. J. Comput. Biol. 17, 281–308 (2010).
  20. Deorowicz, S. & Grabowski, S. Bioinformatics 27, 2979–2986 (2011).
  21. Li, H., Ruan, J. & Durbin, R. Genome Res. 18, 1851–1858 (2008).
  22. Li, H. & Durbin, R. Bioinformatics 25, 1754–1760 (2009).
  23. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. Genome Biol. 10, R25 (2009).
  24. Carter, D.M. Saccharomyces genome resequencing project. Wellcome Trust Sanger Institute (2005).
  25. Tweedie, S. et al. Nucleic Acids Res. 37, D555–D559 (2009).

Primary authors

Po-Ru Loh and Michael Baym contributed equally to this work.


  1. Po-Ru Loh, Michael Baym and Bonnie Berger are in the Department of Mathematics and Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
  2. Michael Baym is also in the Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA.

Competing financial interests

The authors declare no competing financial interests.


September 2012

Compressing a dataset with specialized algorithms is typically done in the context of data storage, where compression tools can shrink data to save space on a hard drive. But a group of researchers at MIT has developed tools that compute directly on compressed genomic datasets by exploiting the fact that most sequenced genomes are very similar to previously sequenced genomes.


Led by MIT professor Bonnie Berger, the group has recently released tools called CaBlast and CaBlat, compressive versions of the widely used Blast and Blat alignment tools, respectively.

In a Nature Biotechnology paper published in July, Berger and her colleagues describe how the algorithms deliver alignment and analysis results up to four times faster than Blast and Blat when searching for a particular sequence in 36 yeast genomes.

“What we demonstrate is that the more highly similar genomes there are in a database, the greater the relative speed of CaBlast and CaBlat compared to the original non-compressive versions,” Berger says. “As we increase the number of genomes, the amount of work required for compressive algorithms scales only linearly in the amount of non-redundant data. The idea is that we’ve already done most of the work on the first genome.”
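The scaling claim in this quote can be illustrated with a small experiment on synthetic data. This is only a sketch under loud assumptions: the "genomes" are random strings with a handful of point substitutions, non-redundant data is approximated by counting distinct fixed-size blocks, and the helper names `mutate` and `nonredundant_growth` are made up for this example.

```python
import random


def mutate(seq, n_changes, rng):
    """Return a copy of seq with n_changes point substitutions."""
    bases = list(seq)
    for pos in rng.sample(range(len(bases)), n_changes):
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


def nonredundant_growth(genomes, block=20):
    """Track (total blocks, distinct blocks) as each genome is added."""
    seen = set()
    total = 0
    growth = []
    for seq in genomes:
        blocks = [seq[i:i + block] for i in range(0, len(seq), block)]
        total += len(blocks)
        seen.update(blocks)
        growth.append((total, len(seen)))
    return growth
```

Adding nine lightly mutated copies of a 2,000-base sequence multiplies the total block count by ten, while the distinct-block count, which is what a compressive coarse search would scan, rises by at most five blocks per added genome.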

These two algorithms are still in the beta phase, and the MIT team has several refinements planned for future release to optimize performance. To that end, Berger has made the code for both algorithms available with the hope that developers will help them build “industrial-strength” software that can be used by the research community.

“To achieve optimal performance in real-use cases, we expect the code will need to be tuned for the engineering trade-offs specific to the application at hand,” she says. “The algorithm used to find and compress similar sequences in the database may need to be tweaked to take this issue into account, and the coarse- and fine-search steps should be aware of these constraints as well.”

While computing resources are becoming increasingly powerful, Berger contends that better algorithms and the use of compression technology will play a crucial role in helping researchers to keep up with the production of next-generation sequencing data.

Matthew Dublin is a senior writer at Genome Technology.
