Foreword

Enoch S. Huang, Newton, Massachusetts, April 2012

Twelve years ago, I joined the pharmaceutical industry as a computational scientist working in early-stage drug discovery. Back then, I felt stymied by the absence of a clear legal or IT framework for obtaining official support for using Free/Libre Open Source Software (FLOSS) within my company, much less for its distribution outside our walls. I came to realize that the underlying reason was that I do not work for a technology company, wherein the establishment of such policies would be a core part of the business. Today, the situation is radically different: the corporate mindset towards these technologies has become far more accommodating, even to the point of actively recommending their adoption in many instances. Paradoxically, the reason it is so straightforward today to secure IT and legal support for using and releasing FLOSS is precisely that I do not work for a technology company! Let me explain.

In recent years I have perceived a sea change within our company, if not the industry. I recall hearing one senior R&D leader state something to the effect of ‘ultimately we compete on the speed and success of our Phase III compounds’ as he was making the case that all other efforts can be considered pre-competitive to some degree. This viewpoint has been reflected in a major revision of the corporate procedure associated with publishing our scientific results in external, peer-reviewed journals, especially for materials based on work that does not relate to an existing or potential product. Given that my employer is not in the software business, the process I experience today feels remarkably streamlined. Likewise, in previous years I would have been expected to file patents on computational algorithms and tools prior to external publication in order to secure IP and maintain our freedom to operate (FTO). The prevailing strategy today, at least for our informatics tools, is defensive publication.

The benefits of publication to a pharmaceutical company in terms of building scientific credibility and ensuring FTO are clear enough, but what about releasing internally developed source code for free? A decade ago my proposal to release as open source the Protein Family Alignment Annotation Tool (PFAAT) [1] was met by reactions ranging from bemusement to deep reluctance. We debated the risk of exposing proprietary technology that might enable our competitors, at a time when ‘competitive’ activities were much more broadly defined. Moreover, due to our lack of experience with managing FLOSS projects, it was difficult to assure management that individuals not in our direct employ would willingly and freely contribute bug fixes and functional enhancements to our code. Fortunately, in the case of PFAAT, our faith was rewarded, and today the project is being managed by an academic lab. It continues to be developed and remains available to our researchers long after its internal funding has lapsed. In many key respects our involvement with PFAAT foreshadowed our wider participation in joint precompetitive activities in the informatics space [2], now with aspirations on a grander scale.

It has been fantastic to witness the gradual reformation of IT policies and practices leading to the corporate acceptance and support of systems built on FLOSS in a production environment. I imagine that the major factors include technology maturation, the emergence of providers in the marketplace for support and maintenance, and downward pressure on IT budgets in our sector. For a proper treatment of this subject I recommend Chapter 22 by Thornber. From an R&D standpoint, the business case seems very clear, particularly in the bioinformatics arena. The torrent of data streaming from large, government-funded genome sequencing centers has driven the development of excellent FLOSS platforms from these institutions, such as the Genome Analysis Toolkit [3] and Burrows-Wheeler Aligner [4]. Other examples of FLOSS being customized and used within my department today include Cytoscape [5], Integrative Genomics Viewer [6], Apache Lucene, and Bioconductor [7]. It makes sense for large R&D organizations like ours, having already invested in bioinformatics expertise, to leverage such high-quality, actively developed code bases and make contributions in some cases.

Looking back over the last dozen years, it is apparent that we have reaped tremendous benefit in having embraced FLOSS systems in R&D. Our global high performance computing system is based on Linux and is supported in a production environment. The acceptance of the so-called LAMP (Linux/Apache/MySQL/PHP) stack by the corporate IT group sustained our highly successful grassroots efforts to create a company-wide wiki platform. We have continued to produce, validate, and publish new algorithms and make our source code available for academic use, for example for causal reasoning on biological networks [8]. It has been a real privilege being involved in these efforts among others, and with great optimism I look forward to the next decade of collaborative innovation.

References

[1] Caffrey, D.R., Dana, P.H., Mathur, V., et al. PFAAT version 2.0: a tool for editing, annotating, and analyzing multiple sequence alignments. BMC Bioinformatics. 2007; 8:381.

[2] Barnes, M.R., Harland, L., Foord, S.M., et al. Lowering industry firewalls: pre-competitive informatics initiatives in drug discovery. Nature Reviews Drug Discovery. 2009; 8(9):701–708.

[3] McKenna, A., Hanna, M., Banks, E., et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010; 20(9):1297–1303.

[4] Li, H., Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010; 26(5):589–595.

[5] Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L., Ideker, T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011; 27(3):431–432.

[6] Robinson, J.T., Thorvaldsdottir, H., Winckler, W., et al. Integrative genomics viewer. Nature Biotechnology. 2011; 29(1):24–26.

[7] Gentleman, R.C., Carey, V.J., Bates, D.M., et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biology. 2004; 5(10):R80.

[8] Chindelevitch, L., Ziemek, D., Enayetallah, A., et al. Causal reasoning on biological networks: interpreting transcriptional changes. Bioinformatics. 2012; 28(8):1114–1121.
