Editorial

The successes and challenges of open-source biopharmaceutical innovation

Abstract

Introduction: Increasingly, open-source–based alliances seek to provide broad access to data, research-based tools, preclinical samples and downstream compounds. The challenge is how to create value from open-source biopharmaceutical innovation. This value creation may occur via transparency and usage of data across the biopharmaceutical value chain as stakeholders move dynamically between open source and open innovation.

Areas covered: In this article, several examples are used to trace the evolution of biopharmaceutical open-source initiatives. The article specifically discusses the technological challenges associated with the integration and standardization of big data; the human capacity development challenges associated with skill development around big data usage; and the data-material access challenge associated with data and material access and usage rights, particularly as the boundary between open source and open innovation becomes more fluid.

Expert opinion: It is the author’s opinion that the assessment of when and how value creation will occur, through open-source biopharmaceutical innovation, is paramount. The key is to determine the metrics of value creation and the necessary technological, educational and legal frameworks to support the downstream outcomes of now big data-based open-source initiatives. The continued focus on the early-stage value creation is not advisable. Instead, it would be more advisable to adopt an approach where stakeholders transform open-source initiatives into open-source discovery, crowdsourcing and open product development partnerships on the same platform.

1. Introduction

While the Human Genome Project catalyzed the open-source movement in genomics- and proteomics-based research, Allarakhia et al. (2010) observe that open-source–based alliances increasingly seek to provide broad access to research-based tools (including microarrays, assays and software), preclinical samples (including biological models and tissue samples) and downstream compounds Citation[1]. Interestingly, the emergence of open-source alliances appears to have shifted from the public sector to the private sector – in some cases, with private sector stakeholders such as AstraZeneca, Eli Lilly, Merck, Pfizer and GlaxoSmithKline (GSK) unilaterally encouraging open-source discovery. The challenge, however, lies in the creation of value from these deposits and the large repositories of data. I suggest that beyond the actual deposits themselves, open-source initiatives should foster linkages between experts in key disease arenas; encourage collaborations through such linkages or more overtly through the associated information and communication technologies (ICT) infrastructure or license-based alliances arising from discovery-based activities; enable human capacity development through educational and other training programs that take advantage of the available knowledge-based assets; and not only integrate big data effectively but also provide increased access to biological/chemical materials and models for downstream value creation.

2. The state of affairs

In the area of open-source discovery, there are several initiatives underway, including open-source genomics, proteomics, systems and biomarker research, where the primary mandate is the sharing of data, tools and equipment to understand biological processes. Biological discovery is complemented increasingly by open-source compound discovery based on research on targets and target families. Open-source compound discovery typically includes the open donation of chemical structure data and, beyond that, of the compounds themselves for analysis. Moving further downstream, open-source preclinical programs typically involve the sharing of biological samples for biological analysis and chemical testing. Emerging now are programs for open-source clinical trial management. While in some cases stakeholders have proposed open-source clinical trials where public sector agencies lead trials, including providing the necessary capital (with the goals of transparency, openness of data, and effective pricing), open-source clinical trial programs have traditionally focused on clinical trial data capture and management. Under discussion, however, is the need for open clinical trial data scrutiny. Below I discuss several examples to trace the evolution of biopharmaceutical open-source initiatives.

2.1 Pre-competitive open-source drug discovery

The public sector has long participated in the development of the open-source model. Open-source databases such as DrugBank, the Potential Drug Target Database, Therapeutic Target Database and SuperTarget provide target and drug profiles Citation[2-5]. These databases feature drug targets, including protein and active site structures, association with related diseases, biological functions and associated signaling pathways Citation[3]. SuperTarget additionally links target proteins to drugs, specifically through 7300 relations to 1500 drugs Citation[4]. Similarly, DrugBank, which features both small- and large-molecule drugs, provides comprehensive information on target diseases, proteins, genes and organisms on which these drugs act Citation[5]. ChemProt, a disease chemical biology database, is based on a collection of multiple chemical-protein annotation resources, as well as disease-associated protein–protein interactions Citation[6]. Moving further downstream, compound-specific databases include PubChem, ChEMBL and ChemSpider Citation[2]. Hinting at the need for data integration and standardization is Open Babel, an open, collaborative project permitting users to search, convert, analyze or store data from molecular modeling, chemistry, solid-state materials, biochemistry or related areas Citation[7]. While these endeavors illustrate the endorsement of the open-source model by the public sector, of interest in this article are those initiatives that jointly comprise and/or involve the public and private sectors. Such initiatives provide a glance at the dual incentives driving participation or interaction – namely enjoyment of the common value of assets (jointly shared and accessed by stakeholders) versus the private value of assets (signaling the need to carefully assess the transition from openly shared assets to privately exploitable assets for downstream, product-based value creation targeting patients).

The Structural Genomics Consortium (SGC) is engaged in pre-competitive research to facilitate the discovery of new medicines. As part of its mission, the SGC is generating reagents and knowledge related to human proteins and proteins from human parasites. The SGC believes that its output will have maximal benefit if released into the public domain without restriction on use, and thus has adopted an open-access policy Citation[8]. Specifically, the SGC and its scientists are committed to making their research outputs (materials such as probes, and knowledge such as protocols) available without restriction on use. This involves the rapid placement of results in the public domain without seeking patent protection on any research outputs. Private sector participants support the consortium by providing funding to accelerate precompetitive drug research in the areas of protein sciences and epigenetics, in addition to supplying the consortium with subsets of compound libraries for screening Citation[9]. Such stakeholders also provide medicinal chemistry expertise to support projects, bridging industry-specific medicinal chemistry expertise with the biological expertise of academia. To promote human capacity development, the SGC has established a visiting scientist program in which external scientists, graduate students and postdoctoral fellows are invited to work within the SGC on proteins of mutual interest Citation[8].

2.2 Open-source data standardization

With the vast amounts of data being generated, the need for data aggregation and standardization has arisen. The Pistoia Alliance was first conceived by informatics experts at AstraZeneca, GSK, Novartis and Pfizer interested in finding solutions to address the need to aggregate, access and share precompetitive data. The alliance encompasses life science companies, academic groups, informatics vendors and publishers. Members aim to lower barriers to innovation by improving the interoperability of R&D business processes through precompetitive collaboration Citation[10]. Open standards are being generated for common business terms, relationships and processes. A closer look at this alliance, however, reveals the need to encourage effective collaboration to generate tangible and measurable value for organizations. Here we see the first instance of moving beyond the provision of information to proofs of concept and then to the creation of solutions for use in industry. Where the division of roles lies (that is, the role of such initiatives in creating upstream versus downstream value) is worth exploring.

2.3 Open-source data integration

In parallel, the Open PHACTS consortium is building an Open Pharmacological Space. The aim is to build a freely available platform, integrating pharmacological data from a variety of information resources and providing tools and services to analyze and validate integrated data in an effort to support pharmacological research. The project aims to create an online platform with a set of integrated publicly available pharmacological data. The software and data will be available for download and local installation, under an open-source and open-access model – with a distinction between access and usage rights Citation[11]. The open data distribution is a fully downloadable resource and aggregates all of the data from databases whose underlying licenses meet the open knowledge definition (OKD) Citation[11]. This distribution allows the community access to a resource for experimentation and innovation. The Explorer data distribution, which is accessible via the Internet, allows the user to map and interact with data from both the OKD-compliant databases and those for which there is no right for redistribution Citation[11]. The extra data present in the Explorer distribution come from databases whose underlying licenses allow access, interaction and analysis of the information, but do not allow downloading of that data Citation[11].
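The distinction between access and usage rights described above can be sketched as a small rule: a source database whose license meets the OKD appears in the downloadable open data distribution, while a database that permits only access, interaction and analysis appears in the Explorer distribution alone. The sketch below is purely illustrative. The class, field and function names are hypothetical and are not taken from the Open PHACTS software, and the assumption that OKD-compliant sources also permit online access follows from the description of the Explorer distribution above.

```python
from dataclasses import dataclass
from enum import Enum


class Distribution(Enum):
    OPEN_DATA = "open data"   # fully downloadable resource
    EXPLORER = "explorer"     # online access, interaction and analysis only


@dataclass(frozen=True)
class SourceDatabase:
    """Hypothetical record describing the license terms of one source."""
    name: str
    okd_compliant: bool          # license meets the open knowledge definition
    allows_online_access: bool   # license allows access, interaction, analysis


def distributions_for(db: SourceDatabase) -> set:
    """Return the distributions in which a source database may appear."""
    if db.okd_compliant:
        # OKD-compliant data can be redistributed and also explored online.
        return {Distribution.OPEN_DATA, Distribution.EXPLORER}
    if db.allows_online_access:
        # No right of redistribution: online exploration only.
        return {Distribution.EXPLORER}
    return set()


def can_download(db: SourceDatabase) -> bool:
    """Download is possible only through the open data distribution."""
    return Distribution.OPEN_DATA in distributions_for(db)


if __name__ == "__main__":
    open_db = SourceDatabase("hypothetical_open_db", True, True)
    restricted_db = SourceDatabase("hypothetical_restricted_db", False, True)
    for db in (open_db, restricted_db):
        print(db.name, "download allowed:", can_download(db))
```

Keeping the license terms as explicit fields, rather than hard-coding which databases belong where, means that a change in a source license changes its distribution without further edits.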

2.4 Managing the boundary between open-source and open innovation discovery

Through the Eli Lilly Phenotypic Drug Discovery (PD2) Initiative, Lilly works with research universities, institutes and biotechnology companies to uncover compounds that may become future medicines targeting cancer, neurological disorders and metabolic diseases. Using the PD2 website, external investigators can access Lilly’s phenotypic assays. As part of the PD2 business model, promising compounds can be further advanced through optimization. The goal of PD2 is not to promote random, high-volume compound submission, but rather to stimulate the joint testing of compounds that represent novel chemical diversity and molecular hypotheses that are strategically considered in light of the biology associated with each assay Citation[12,13]. With the implementation of the TargetD2 initiative, Lilly has expanded access to target-based assays in addition to relevant computational methods. The TargetD2 initiative – or Target Drug Discovery – focuses on evaluating a disease hypothesis through the process of discovery and clinical testing of a molecule designed to interact with a specific genomic target believed to be involved in disease pathogenesis Citation[13]. Supporting this joint knowledge transfer is the open disclosure of Lilly’s evaluation process. For compounds of interest, Lilly invites investigators/institutions to disclose the structure of the selected compound(s) under confidentiality. Thereafter, Lilly can declare its interest in further studies. Lilly and the investigator/institution can choose to negotiate a collaborative agreement if mutual interest exists. Alternatively, the investigator/institution may choose not to disclose the structure(s), and may opt to publish or retain the provided data for future disclosures Citation[13]. While this initiative more closely resembles the open innovation format for collaboration, it is worthwhile to explore this model further as open-source initiatives seek to clarify the boundary between collective knowledge and appropriated knowledge.

2.5 Moving downstream with open-source development

The newly formed Arch2POCM is an example of the extension of this model further downstream in the biopharma value chain. Arch2POCM is a new public–private partnership (PPP) with the goal of accelerating the pharmaceutical industry’s development of new and effective medicines by conducting discovery through Phase II clinical trials on novel targets. Arch2POCM will operate without filing any patents and will share all of its data and drug-like compounds, including with those stakeholders that will test them across a variety of indications, thereby providing knowledge about the roles a selected target plays in human biology and disease (Arch2POCM, 2014) Citation[14]. Even though an Arch2POCM molecule would not be patented, stakeholders cannot pursue clinical trials with potential molecules without having the IND database. By controlling the database, Arch2POCM will de facto have exclusivity on the right to determine trials for a molecule, although the molecule and the resulting science are deposited into the public domain Citation[15]. In this sense, the PPP exploits the existing laws of data exclusivity (modifying the traditional barrier between precompetitive and competitive data) and instead provides a de-risked molecule to private stakeholders wishing to pursue further trials, with market exclusivity as the downstream incentive Citation[15].

Finally, in May 2013, GSK made a commitment to provide access to de-identified patient-level clinical trial data. Investigators are able to request access to de-identified patient-level data from a subset of GSK-sponsored clinical trials Citation[16]. Once investigators have received approval for a research proposal from the independent review panel appointed by GSK and have signed a data-sharing agreement, the data are made available to them. To promote further transparency, investigators will be required to make a summary of their analysis plan publicly available, to post summary results and to seek publication of the results after study completion. Investigators will retain the rights to any new intellectual property derived from their research, but GSK will require a no-cost, nonexclusive license to use any resulting invention. GSK suggests that the next step must be to transition to a system whereby a third party without connection to the generation of data acts as the custodian of access to the database (or databases), with the expectation that data will be available from multiple companies, public-sector organizations and funders. This would allow for a variety of cross-sponsor meta-analyses Citation[16].

Across all of these examples it becomes apparent that the open-source model in itself is not sufficient. The challenge is how to create value from open-source biopharmaceutical innovation. This value creation may occur via human capacity development through educational and other training programs that take advantage of the available knowledge-based assets, as we see in the case of the SGC; collaborations that foster data analysis, proof-of-concept development and technology development, all areas of exploration at the Pistoia Alliance; data integration, as is occurring in the Open PHACTS initiative; the bridging of medicinal chemistry development and biological expertise as stakeholders move seamlessly between open source and open innovation in the Lilly platforms; the usage of other private incentivization opportunities, including market exclusivity, to generate freely available biological/chemical materials through the Arch2POCM initiative; and the opening up of clinical trial data and collaborative cross-sponsor meta-analyses, with the GSK initiative a starting point for this movement downstream (Table 1). The next section aggregates and discusses these challenges as being technologically, educationally and legally oriented – offering suggestions for maximizing the opportunities for big data value creation.

Table 1. The state of affairs for current open-source biopharmaceutical initiatives.

3. The challenges of open-source pharmaceutical innovation

The challenge of open-source biopharmaceutical innovation is to employ effective technological, educational and legal platforms to progress beyond the simple aggregation of data for large-scale access, to big data value creation (Table 2). Big data is defined for our purposes as collections of data so large and from such a vast number of sources and sectors that sophisticated technological and intellectual capital assets are required to create downstream value Citation[1]. This value creation may involve data validation, data usage in further discovery and/or data usage including its embodiment in downstream product development.

Table 2. Addressing the challenges of open-source biopharmaceutical innovation.

3.1 The technological challenge

In the era of big data, the effective discovery, securing (including storage), access, sharing, analysis and visualization of data will be required by a variety of stakeholders who may be geographically separated. Specifically, biological, chemical, tissue and reagent databases, as well as trial results and adverse reaction event databases, will need to be integrated or linked together – including those sponsored by academia, nonprofit institutions, other private organizations or regulatory agencies. Linkages to literature and patent databases will add value for stakeholders seeking to understand the current technological and legal state of affairs in a disease/drug arena. Data may be mined, categorized and made accessible as a function of disease area, target family or drug family. Hence, we will see the need for effective data ontology and classification strategies. Alongside deposit requirements for data into public and privately sponsored databases, standardized data representation will be necessary to ensure effective access and analysis of data deposits. Worth noting is the establishment of requisite levels of access to data sets – ranging from full and open access and download opportunities, to access, interaction and analysis without the associated download opportunities.

3.2 The human capital development challenge

The previous discussion suggests moving away from the situation where each disciplinary silo generates large amounts of data in isolation. Adding to this discussion is the need to encourage collaborations using the ‘big data’ generated from these initiatives. A variety of stakeholders may be offered the opportunity to participate in dialogue with respect to new knowledge innovation including the generation and/or sharing of knowledge-based assets, and the validation of such knowledge-based assets. Sage Bionetworks, for example, has established a catalogue of freely available datasets, contributed by a variety of stakeholders, for use in integrative genomics analysis and building predictive computational disease models. The associated data are available through Synapse, an innovation space created to bring scientific data, tools and disease models together into a ‘commons’ permitting collaborative research. The Synapse platform consists of a web portal, web services, data analysis tools, and analysis communities that any scientist can create or join Citation[17]. Arising then out of these collaborations should be human capacity engagement through educational and other training opportunities. I have discussed the SGC visitor training program as one example and we see the recently established WIPO Re:Search initiative offering educational and training opportunities for researchers within the private sector context Citation[2].

3.3 The data-material access challenge

It is simply not enough to deposit knowledge (whether data, biological materials, chemical structures, tools and/or IP) into the open commons without understanding access and downstream usage rights. This may range from the employment of ICT and repositories to the crafting of complex rules governing participation, accessibility and usability in downstream product development. Reichman and Uhlir (2004) discuss that the use of the information in the commons often depends on the possibility of linking databases and downstream user communities Citation[18]. The need then arises for collaboration spaces where users can discuss, query, rate and/or validate the data, as well as seek partners for discovery and development activities. Participants can share disembodied pure knowledge including, but not limited to, ideas, articles, papers and other literary work, data, software, applications, notes, results of experiments and patents. Equivalently, embodied knowledge in the form of reagents, molecular and chemical libraries and biological models can be donated by participants Citation[19]. Of consequence, then, is the management of the resources. Disembodied pure knowledge can be managed through sophisticated and interconnected databases, while embodied knowledge may be physically housed in material and model repositories. Supporting this infrastructure will be patient privacy assurances for materials sourced from patients, researcher access rights, compound nomination usage and/or licensing provisions. Ultimately, the value of any open-source initiative resides in an understanding that a transition point exists between public and private value creation, thereby necessitating a discussion and clarification of materials access and exploitation rights.

3.4 The challenge of assessing the common versus private value of knowledge assets

Ultimately, researchers will need to assess the common value generated through participation in open-source initiatives and the private value from unilateral R&D and/or through the appropriation of knowledge assets for downstream development. These conflicting valuation considerations are likely to be particularly salient in those open-source initiatives with representation from the public and private sectors.

I contend that the common benefits from openly contributing versus the private benefits associated with appropriating knowledge will determine when a participant will choose to signal his/her departure from an open-source alliance or even opt to pursue unilateral activities. Allarakhia et al. (2010) suggest that the synthesis of knowledge in an open-source alliance will create a common value CV related to the knowledge units jointly generated and contributed to the alliance. Private benefits are those that a firm can obtain unilaterally. The knowledge that is held unilaterally will have a private value PV for its owner. By pooling a partner’s knowledge with internal firm knowledge, a firm can gain a competitive advantage in downstream activities. From a knowledge perspective, Allarakhia et al. (2010) suggest that the common value of knowledge will derive from the characteristics associated with the knowledge and the value from collectively holding all of the knowledge in the public domain. Equivalently, the private value will derive from the characteristics of knowledge – namely the level of substitutability, complementarity, and applicability and the ability to pursue downstream activities unilaterally and/or through privatization of knowledge assets without incurring the costs associated with patent blockades.

Here the challenge resides in the ability to effectively determine the difference between the common value CV and private value PV. Criteria affecting this determination will include the characteristics of knowledge as warranting more open access; the phase of development as determining the incentivization needed to encourage collaborative versus unilateral R&D particularly when large capital assets are required for value creation; the state of knowledge as driving the need for joint analysis and validation to elucidate the association of knowledge to drug development; and the cost savings from joint and open collaboration versus the costs incurred from a loss in unilateral private value capture.
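One way to make this comparison concrete is a simple threshold rule on the difference between the two values. The formalization below is an illustrative assumption only; it is not taken from Allarakhia et al. (2010), and the symbol \Delta is introduced here solely for illustration.

```latex
\[
\Delta = CV - PV, \qquad
\begin{cases}
\Delta \ge 0 & \text{the participant continues to contribute openly,} \\
\Delta < 0 & \text{the participant may signal departure or pursue unilateral R\&D.}
\end{cases}
\]
```

Under this reading, each of the criteria listed above (the characteristics of knowledge, the phase of development, the state of knowledge, and the cost savings from collaboration) acts by raising or lowering CV or PV, and so shifts the point at which \Delta changes sign.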

4. Expert opinion

It is the assessment of when and how value creation will occur through open-source biopharmaceutical innovation that is paramount – with a particular focus on common value creation opportunities. The technological solutions that will enable the seamless integration of data, the training programs that will ensure skill development around the domains of open-source biopharmaceutical innovation, and the establishment of access/usage rights will all determine the point of value creation – be it through collaborative discovery or collaborative product development. The metrics of value creation should progress beyond data deposits and access levels, to the number of partnerships enabled through adjoining collaboration and networking tools, the materials sourced through linked repositories, the number of formal projects initiated through calls for cooperation, the number of leads moving formally into downstream development, and, ultimately, with increased transparency and openness of clinical data, the time and success parameters across clinical phases and regulatory approval. The failure of past open-source endeavors likely resides in the limited perspective of big data creation, rather than collaborative value creation. The biopharmaceutical industry should encourage traditional and newly involved stakeholders to communicate and corroborate each other’s results to create a coherent view of biological processes as well as points of intervention. The industry has successfully created the open-source frameworks to generate data. Continued focus on this early stage of value creation, particularly if created by the various silos of the industry, is not advisable. Rather, the time has come to adopt the open innovation lens where stakeholders transform open-source initiatives into open-source discovery, crowdsourcing and open product development partnerships on the same platform.

Declaration of interest

The author has no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

Bibliography
