Enhance ideation in SBDD with a centralized 3D protein structure repository

The value of three-dimensional (3D) protein structures for structure-based drug discovery (SBDD) lies in the information they contain and how that information can be applied to specific discovery projects. Depending on a project's goals, the same structure of a target protein can therefore have a different impact (see our article on Maximizing the value of 3D protein structure).

To realize the full value of each 3D protein structure, it is crucial to manage the structures and their associated metadata properly. This means having the appropriate tools and systems for accessing, navigating, sharing, analyzing, and reusing the available structures.

However, the ever-growing volume and complexity of structural data make it challenging to handle and exploit them effectively for drug discovery (discover the most common challenges in managing large structural datasets in our whitepaper). When 3D protein structures are left unmanaged, information that is valuable to the project goes missing. This lack of insight hinders innovation and imposes additional costs on SBDD teams.

At the recent Discngine Labs event, we hosted a roundtable with drug discovery experts from the pharmaceutical industry, who shared their experience and knowledge in structural data management. In this blog post, you will find the key insights and best practices for ensuring that the vast number of available 3D protein structures is used as initially intended: to unlock new opportunities for drug discovery and development.

Access the full recording of the event at the end of this page.

3D protein structure data management: The basics

The roundtable was chaired by Simone Fulle (Head of Protein Engineering in the Computational Drug Design area at Novo Nordisk), who led the discussion among the panelists:

Daria Goldmann (Artificial Intelligence Expert, Integrated Drug Discovery unit at Sanofi)
David Thompson (Senior Application Scientist at CCG)
Simone Culurgioni (Head of Crystallography at Exscientia)

To introduce the discussion, we conducted a poll among the attendees of the event (mainly from industry), asking:

What do you use to store your 3D protein structures?

Results of the poll conducted during the Discngine Labs event, which asked a simple question: “What do you use to store your 3D protein structures?” The majority of participants from the pharma industry reported that they still use internal folders/SharePoint to store their structural data.

Almost half of the participants reported simply storing data in folders. This result was particularly striking for the panelists, who agreed that the best practice for structural data management in pharmaceutical companies is to use internally developed databases. Currently, the most common storage workflow is that structural biologists deposit their own 3D structures in the database. Other project members (for example, from the molecular modeling or medicinal chemistry teams) can then access the PDB files from this repository.

The issue is that internally developed database systems are usually not very interactive. To actually work with 3D protein structures (e.g., to visualize or analyze them), you need to export them into other programs. The panelists see this as an area where commercial software solutions can help, by providing more intuitive interfaces that improve the user experience and by integrating analytic tools.

Another aspect to consider when sharing structural data across the project team is that, ideally, everything should be available. However, seamless access to all 3D protein structures is not enough on its own for optimal SBDD. Given the multidisciplinary backgrounds of drug discovery team members, it is fundamental that they not only have open access to the structures but also receive guidance on how to interpret and use them correctly.

For a protein structure to make a real contribution to the drug discovery effort, it must be understood and contextualized with metadata in the ongoing project.

“The structures cannot be taken for granted like numbers or color codes. Everything needs to be there, but it needs to be explained.”

AI-predicted and in-silico models

Traditionally, structural data management mainly concerned experimentally produced structures, but now we have a new challenge posed by AI-predicted models.

“Analyzing massive structural data coming from AI is new in our field.”

According to the panelists, AI-predicted models are a precious tool when no other structural knowledge is available. One example is solving new experimental structures: when no previous experimental structure of the drug target is available, using AI-predicted models as a starting point for molecular replacement is incredibly valuable. However, such models are still not as helpful for later stages of drug discovery research, for instance, in hit-to-lead and lead optimization.

This is because they do not represent all states of a protein well (apo/holo, active/inactive conformation), so they should be treated carefully. Integrating all these structures into a database without considering how they might be used therefore adds little value, except for specific use cases.

**Watch the recording of our previous Discngine Labs event with drug discovery experts from industry and academia on the topic:** *Protein structure prediction: What next after AlphaFold?*

Similarly, internally produced in silico models are beneficial for drug discovery but should not be deposited in the database without any annotation. A good practice that should apply to any structure deposited in a database (including experimental ones, such as cryo-EM, NMR, and X-ray structures) is to associate it with information on how it was produced. This includes, for example, the conditions of the in vitro or in vivo experiments. For in silico models, you should always include information about the computational methods and parameters used.

“We should be recording the conditions of our experiments, whether in the lab or in silico, because I want to be able to reproduce what you did, and I want to be able to show someone what you did.”
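As an illustration of this practice, the sketch below shows one possible way to attach provenance to every deposited structure and to refuse deposition when it is missing. It is a minimal example under our own assumptions: the `StructureRecord` fields, the `Origin` categories, and the `deposit` helper are all hypothetical names, and the in-memory dictionary merely stands in for a real database. It does not describe how any panelist's system or any specific product works.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Origin(Enum):
    """Hypothetical categories for how a structure was obtained."""
    EXPERIMENTAL = "experimental"  # e.g. X-ray, cryo-EM, NMR
    IN_SILICO = "in_silico"        # e.g. internally built or AI-predicted model


@dataclass
class StructureRecord:
    """Hypothetical deposition record: a structure file plus its provenance."""
    structure_id: str
    file_path: str
    origin: Origin
    method: str  # experimental technique or computational method
    conditions: dict = field(default_factory=dict)  # experimental conditions
    parameters: dict = field(default_factory=dict)  # computational settings
    usage_notes: str = ""  # interpretation guidance for other team members

    def missing_provenance(self) -> List[str]:
        """Return the names of required provenance fields that are empty."""
        missing = []
        if not self.method:
            missing.append("method")
        if self.origin is Origin.EXPERIMENTAL and not self.conditions:
            missing.append("conditions")
        if self.origin is Origin.IN_SILICO and not self.parameters:
            missing.append("parameters")
        return missing


def deposit(repository: Dict[str, StructureRecord], record: StructureRecord) -> None:
    """Store a record in the repository, rejecting incomplete provenance."""
    missing = record.missing_provenance()
    if missing:
        raise ValueError(f"{record.structure_id}: missing {', '.join(missing)}")
    repository[record.structure_id] = record


if __name__ == "__main__":
    repo: Dict[str, StructureRecord] = {}
    # Illustrative values only; identifiers and conditions are made up.
    deposit(repo, StructureRecord(
        structure_id="TGT-001",
        file_path="tgt001.pdb",
        origin=Origin.EXPERIMENTAL,
        method="X-ray",
        conditions={"ph": 7.5, "temperature_k": 100},
        usage_notes="Holo form; ligand bound in the active site.",
    ))
    print(sorted(repo))
```

The design choice here is to validate at deposition time rather than afterwards, so that a structure can never enter the repository without the information a colleague would need to reproduce or interpret it.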

Future perspectives

The final part of the roundtable was focused on the future of structural data management for efficient SBDD.

All panelists believe the key lies in the flexibility of the available tools and the integration between them. We currently have different systems that give access to diverse information, from target identification to molecular design. Tighter integration of these systems would be crucial for extracting and collecting all the relevant knowledge and supporting the drug discovery effort.

Integrating emerging machine-learning technologies, such as ChatGPT, into existing platforms could make it easier to extract structural knowledge from data. More generally, AI tools will be increasingly included in drug discovery workflows as new technologies emerge. In big pharma, there is a strong push to integrate novel AI technologies, but this may take some time because the traditional systems are still in place. The process can be accelerated through outsourcing and external collaboration. Meanwhile, the market is moving very fast, and we are already witnessing the appearance of new AI-driven companies in which the new technology leads the drug discovery effort.

Although AI will undoubtedly impact and even drive drug discovery research, it is important to remember that the human factor remains essential. While the new technologies can speed up the Design-Make-Test-Analyze (DMTA) cycle, the critical thinking of scientists is still the most significant contribution to the process.

“You can use the computers in the algorithmic approaches as an accelerant to explore the space, but having the human to provide that context and to nudge the direction is absolutely critical.”

Summary

Our panelists provided valuable insights into the essential criteria that should be considered for efficient 3D protein structure data management, such as:

Establish a centralized repository that is seamlessly accessible to the entire SBDD team
Go beyond mere deposition of 3D protein structures (experimental or in-silico models); the important information related to the structure must also be stored and shared
Be selective about including AI-predicted models in the repository, keeping only those that bring value
Recognize the necessity for a higher level of integration among various data and tools to accelerate drug discovery efforts
Incorporate AI systems to enhance the transformation from structural data to knowledge

With these insights in mind, you can ensure that each 3D target protein structure is leveraged to its full potential, contributing crucial information to rational design cycles and accelerating the discovery of new drugs.

Watch event recording

How is Discngine helping SBDD scientists in 3D protein structure data management?

3decision is a protein structure repository that centralizes public and proprietary structural data (X-ray, Cryo-EM, NMR, and computational models) in a single database. It also integrates advanced analytic tools to allow you to maximize the value of your 3D protein structures for drug discovery.

In addition to the roundtable, at the Discngine Labs event we showed an example of how you can use 3decision to extract new knowledge from the vast quantity of structural data and generate design ideas for repurposing existing therapeutics.

In her live demo, 3decision Scientific Business Developer Lorena Zara reproduced the results of a paper by Salentin et al., showing how you can use the 3decision Protein-Ligand Interaction Search to repurpose an antimalarial drug for a cancer application and generate new design ideas. You can find the video of her use case below.

How to best exploit 3D protein structures for ideation in SBDD