The sciences are on the verge of a revolution: new technologies are constantly emerging that are fundamentally changing the ways in which we carry out experiments and process the results. Broader trends like the Internet of Things (IoT) and big data have placed unprecedented amounts of data in the hands of researchers. While this helps provide greater context to research, simply improving access to data will not solve the scientific and engineering problems that industry, academia and even governments hope to tackle today. There must be a paradigm shift in the way future science is carried out. This future will demand different types of questions: rather than asking, “what happened?” or “what will happen?” when designing an experiment, researchers will need to ask, “how do I make it happen?” This will require a widespread embrace of advanced artificial intelligence (AI), specifically “prescriptive” analytics.

Reza Sadeghi
Managing Director, Dassault Systèmes BIOVIA

As the challenges researchers face grow more sophisticated, so too must their approaches to solving them. Whether it is a biopharmaceutical firm hoping to understand the multiple mechanisms that lead to cancer, a specialty chemicals firm exploring the molecular interactions that make up self-healing polymers, or an aerospace company attempting to develop lighter, stronger composite aircraft, research methodologies must evolve. The multifaceted nature of these questions is stretching the limits of what physical experimentation can accomplish alone. 

Consider epigenetics, which explores phenotypic outcomes arising from variations in gene expression rather than changes to the genetic code itself. Introducing more or fewer copies of a protein directly impacts a complex signaling system of thousands of proteins and ligands within a single cell, which in turn affects nearby cells, tissues, organs and, eventually, the entire organism. The combinations of potential experiments a scientist could consider are astronomical, raising an important question: “what do I test next, and how do I execute that test?”

This question creates a problem for research, especially within industry. The pace of discovery is accelerating, placing strains on laboratory and knowledge resources. Physical experiments are costly and time consuming, and researchers feel pressure to prune the discovery process from a branching maze into a straight line. Widespread adoption of technologies supporting broader trends such as IoT and big data has helped, and augmenting these traditional approaches with new tools for manipulating data can uncover hidden trends and interactions. Most of these tools have historically had diagnostic or predictive applications, seeking to explain “what happened in my experiment?” or “what will happen if I do this experiment?” They have begun to grow more automated, assessing data as it arrives to help guide future experimentation. An example is the automated dashboard, which can run statistics on new data, predict values for new inputs and visualize the results. These tools have helped streamline the research process, but they still fall short of answering the question, “what do I test next?” To get there, they must evolve.

Prescriptive analytics

At their core, traditional analytics are limited in scope to answering the specific question their creator sought to answer. This hampers their true potential, as those creators are in turn limited by their own minds: they don’t know what they don’t know. This presents a problem: how do scientists know what questions to ask if they don’t know to ask them? Advanced AI provides a promising solution in the form of “prescriptive” analytics. This is the next logical step in AI-based prediction: it asks, “what should I test to make this happen?” This approach allows machines to go where scientists cannot: by assessing far more potential experimental outcomes, the scope of consideration for what a scientist should test grows substantially. Effectively, prescriptive analytics screens future experiments against existing experimental outcomes, suggesting only the most promising approaches for scientists to verify at the lab bench. Achieving this cognitive level of computing ultimately requires a shift toward treating scientific knowledge as a digital continuity, in which discovery and research form an evolving, adapting continuum.

The goal of prescriptive analytics, then, is to evolve the decision-making capacity of a given researcher. However, this evolution requires consistent, timely inputs to ensure that the decision logic of the application remains both relevant and actionable. Tools such as data lakes and IoT can support this approach: a data lake aggregates data from many sources, while IoT supplies that data in real time. These tools simplify the timely collection and aggregation of data for researchers to use. On its own, however, this leads back to the previous problem of simply having more data; the challenge is determining what to use, and how to use it, to generate actionable insights.

Again, AI provides different approaches that can help support the development and maintenance of prescriptive models. Semi-supervised learning methods, such as active learning, can interact with data sources to label new data as it comes in. For example, consider a formulation development project for a thermoplastic that requires high tensile strength but low modulus: an active learning approach would look at existing formulations that meet the project’s criteria to determine the likelihood that a new formulation could meet them. It can then iteratively suggest new formulations to test, adjusting its methodology in subsequent generations as new data becomes available. This approach uses new data to refine existing models, ensuring that each model’s “decision logic” stays up to date, and enables the shift from the diagnostic and predictive questions, “what happened?” and “what will happen?”, to prescriptive foresight: “how can we make it happen?” Existing tools can automate this process; incorporating such technology into an active learning environment helps streamline the management of these models throughout their lifecycle.
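To make the suggest-test-refine loop concrete, here is a minimal active-learning sketch in Python. The formulations, property thresholds and the simulated “lab test” are all invented for illustration; a real system would replace the toy surrogate with a trained model and replace the simulated test with actual bench results.

```python
# Minimal active-learning loop for formulation screening.
# All formulations, thresholds and the simulated "lab test" are
# hypothetical, invented for illustration.

def meets_criteria(tensile, modulus):
    """Project goal: high tensile strength, low modulus (made-up thresholds)."""
    return tensile >= 50.0 and modulus <= 2.0

def predict(known, candidate):
    """Toy surrogate model: score a candidate by its distance to the
    nearest known formulation that met the criteria."""
    hits = [f for f, ok in known if ok]
    if not hits:
        return 0.0
    dist = min(sum((a - b) ** 2 for a, b in zip(f, candidate)) ** 0.5
               for f in hits)
    return 1.0 / (1.0 + dist)  # closer to a past success -> higher score

def lab_test(candidate):
    """Stand-in for a physical experiment (an invented property map)."""
    filler, plasticizer = candidate
    tensile = 40.0 + 30.0 * filler - 10.0 * plasticizer
    modulus = 4.0 - 3.0 * filler - 1.0 * plasticizer
    return meets_criteria(tensile, modulus)

# Seed data: (formulation, met criteria?) pairs; features are
# (filler fraction, plasticizer fraction).
known = [((0.2, 0.1), False), ((0.6, 0.5), True)]
pool = [(0.3, 0.2), (0.55, 0.45), (0.7, 0.6), (0.1, 0.05)]

for _ in range(3):  # three suggest-test-refine rounds
    best = max(pool, key=lambda c: predict(known, c))
    result = lab_test(best)       # "verify at the lab bench"
    known.append((best, result))  # new data refines the decision logic
    pool.remove(best)
```

Each round, the surrogate re-ranks the remaining candidate pool using every result gathered so far, which is the “adjusting its methodology in subsequent generations” behavior described above.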

Prescriptive analytics supports decision-making in other ways as well, especially within search functions. In a way, it becomes a Netflix for scientists, suggesting related items for researchers to consider. For example, if a scientist searches for a particular experiment within the organization’s electronic lab notebook system, the system can suggest related experiments carried out by colleagues. The same applies to literature searches, suggesting related journal articles and thus expanding the pool of information available to the researcher. All of these applications combine to evolve the researcher’s decision-making and expand the scope of consideration, accelerating time to discovery.
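As a sketch of the search-suggestion idea, the following ranks past notebook entries by simple word-overlap (cosine) similarity to a query. The entry titles are invented, and a production system would use far richer text representations than raw word counts.

```python
# Toy "related experiments" recommender: rank past notebook entries by
# word-overlap (cosine) similarity to a query. Entry titles are invented.
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

notebook = [
    "tensile strength of carbon fiber epoxy composite",
    "self healing polymer crosslink density study",
    "epoxy composite cure temperature optimization",
]
query = "epoxy composite tensile testing"

qvec = Counter(query.split())
ranked = sorted(notebook,
                key=lambda doc: cosine(Counter(doc.split()), qvec),
                reverse=True)
# ranked[0] is the most closely related past experiment
```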

Knowledge-driven decisions

This concept can be extended beyond raw data alone: prescriptive models could use the results of existing models as inputs to generate new “models of models.” This would allow researchers to take on more complex questions. In the biopharmaceutical industry, a prescriptive model could suggest small molecule drug candidates that simultaneously optimize multiple parameters, such as maximizing target affinity and synthesizability while minimizing hERG-related toxicity. In polyolefin catalyst design, such models could propose new catalyst structures that minimize synthesis cost and reaction temperature while maximizing product specificity and catalyst lifetime. In the end, however, these approaches will not take the place of physical experimentation; instead, they will augment the existing work that researchers do, guiding their decisions to ensure work at the bench has the highest likelihood of success.
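One simple way to frame such a multi-objective screen is a Pareto filter: keep only candidates that no other candidate beats on every objective at once. The compound names and scores below are invented for illustration.

```python
# Hypothetical multi-objective screen: keep only Pareto-optimal drug
# candidates (maximize affinity and synthesizability, minimize hERG
# toxicity). Compound names and scores are invented.

candidates = {
    "cmpd_A": {"affinity": 8.2, "synth": 0.7, "herg": 0.2},
    "cmpd_B": {"affinity": 7.9, "synth": 0.9, "herg": 0.1},
    "cmpd_C": {"affinity": 6.5, "synth": 0.6, "herg": 0.3},  # dominated by A
    "cmpd_D": {"affinity": 8.5, "synth": 0.5, "herg": 0.4},
}

def dominates(x, y):
    """x dominates y if it is at least as good on every objective
    and strictly better on at least one."""
    at_least_as_good = (x["affinity"] >= y["affinity"]
                        and x["synth"] >= y["synth"]
                        and x["herg"] <= y["herg"])
    strictly_better = (x["affinity"] > y["affinity"]
                       or x["synth"] > y["synth"]
                       or x["herg"] < y["herg"])
    return at_least_as_good and strictly_better

# Pareto front: candidates that no other candidate dominates.
front = [name for name, scores in candidates.items()
         if not any(dominates(other, scores)
                    for other_name, other in candidates.items()
                    if other_name != name)]
```

Everything on the front is a defensible trade-off; only dominated candidates are pruned before any bench work begins.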

This would create a feedback loop with the lab, where researchers would validate the results of models with physical experimentation, which would generate data to inform future models. This feedback loop would thus accelerate research, gradually pruning off R&D dead ends and helping each researcher answer the question, “what do I test next?”

The pace of discovery is accelerating, and the next generation of AI will provide enhanced experiences that allow scientists to truly think outside the box. By expanding their scope of consideration, researchers can begin to ask the questions they did not know they needed to ask. Prescriptive analytics brings together the benefits of existing technologies and helps to unlock their true potential. Automating these processes also amplifies the benefits these approaches offer an organization, allowing each researcher to become a sort of citizen data scientist, continuously guiding their projects with increasingly sophisticated analytical techniques. The future of science is one of knowledge-driven decisions, where data from both virtual and physical experimentation becomes the fuel that accelerates discovery.