Leveraging AI and Future-Ready Data to Accelerate R&D

by Mike Savel Posted on June 06, 2025

A Q&A with a Panel of Experts from the Dow Chemical Company, hosted by Progress

R&D is all about competitive edge. Generative AI has emerged as an R&D catalyst, promising to unlock new use cases and opportunities. We recently assembled an expert panel of seasoned data scientists and information research leaders from the Dow Chemical Company to discuss how they built a semantic data hub with the Progress Data Platform to capture years of R&D knowledge, facilitate its discovery across the organization and make it future-ready for years to come.

This post is a summary of the information presented by the panel, which went much deeper into the topics covered. The material from the Q&A session has been edited for length and clarity. If you would like to watch the interview in its entirety, you can view it here.

The Panel

From Dow Chemical: Dow is one of the world’s leading materials science companies, with sales of $43 billion in 2024.

  • Simon Cook, Ph.D., Senior Solution Manager/Scientist
  • Alix Schmidt, Senior Data Scientist, R&D Model Deployment Strategy 
  • John Talbert, Fellow and Systems Architect

From Progress: Drew Wanczowski, Moderator, Senior Principal Solution Engineer.

How does information research support R&D at Dow, and how do you collaborate across business units and R&D groups to navigate complex data and security requirements?

Alix: We act as translators, helping scientists capture, store, transform and protect knowledge and data. This is a complex area that requires agility because things in research change all the time. We need to design our data systems so that when you find something new or you're working with something new, we can incorporate that data into our ecosystem very quickly.

John: Keep in mind the scale of Dow and our R&D function: thousands of people going to the lab, doing experiments, collecting information, documenting information, accessing external information, all kinds of things to innovate and serve our customers. So it's a pretty large-scale initiative. At the same time, we're working toward digital transformation to really get to that next level: what does R&D look like in the future, and how do we compete with newer digital-native companies?

Describe R&D data and how it is different from typical manufacturing data.

Alix: I've had about 12 years of experience within the R&D function and five years within manufacturing. When I think of manufacturing data, I think of it as the original Big Data before web analytics. It’s very consistent, the schema stays the same, so you have the same columns, the same tags and other kinds of information within your database.

In contrast, the role of R&D is always to be innovating and doing something new. There are always new pieces of data you're collecting, new instruments you might be integrating and variables you might change that you had never changed before. So now you've got a different metadata situation. I think the big difference is the agility, being able to manage those data flows.

What are some of the challenges you see around data silos and data discovery when you have all these varying shapes?

John: Probably one of the big challenges is consistency across silos, as the nomenclature may not be consistent from one silo to another. Master data is also key if you're attempting to bring data together from multiple systems. If you are not capturing your identifiers in a consistent way, it becomes very hard to make sense of that data over time. It starts with teaching the people who generate data to understand the importance of capturing metadata as much as possible at the source.

Can you elaborate on your foundational data management approach, particularly around the capture, standardization and discovery?

Simon: If we look at foundational data management, it's not a static platform. Data systems are constantly evolving, so any kind of foundational data management needs to move with those changes and with some of the more recent external standards. We have to change the mindset from “I have to pick a standard and stick with it” to “whatever standard we use, we can change and evolve over time as we need to.” You have to transform data into a form that you can actually use. A lot of our work on that foundational platform involves transforming data into a vendor-agnostic format using the best external standards available, and then making sure we can transform them again as they change over time.

Can you elaborate a little bit on your approach and the role a semantic data hub plays in tying all this data together?

John: The approach is basically to bring data together and connect it so that people can more easily access it. As we aggregate data, we have to understand the context of those data: how should we unify them, and does that unification depend on the use case? If you take data from different systems and just put it in one place, you haven't done anything; you haven't made it any easier to use, you've just moved it to a central location. If you want to get the most from those data, master data management is really key. You need to ensure those pieces are in place as part of your data strategy so that you have the identifiers that are needed when you bring data together from multiple locations.
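The identifier point above can be illustrated with a minimal sketch: before aggregating records from multiple systems, map each system's local identifier to a canonical master ID. All system names, IDs and fields here are invented for illustration; this is not Dow's or MarkLogic's actual implementation.

```python
# Hypothetical master index: (source system, local ID) -> canonical ID.
# In practice this mapping is the product of master data management.
MASTER_INDEX = {
    ("eln", "EXP-0042"): "material:1001",
    ("lims", "S-77-A"): "material:1001",   # same material, different system
    ("lims", "S-91-B"): "material:1002",
}

def unify(records):
    """Group records from different source systems under a canonical ID."""
    unified = {}
    for rec in records:
        key = (rec["system"], rec["local_id"])
        canonical = MASTER_INDEX.get(key)
        if canonical is None:
            continue  # unmapped identifiers need curation, not silent merging
        unified.setdefault(canonical, []).append(rec)
    return unified

records = [
    {"system": "eln", "local_id": "EXP-0042", "viscosity_cP": 12.4},
    {"system": "lims", "local_id": "S-77-A", "purity_pct": 99.1},
]
merged = unify(records)
# Both records land under the single canonical ID "material:1001"
```

Without the master index, the two records above would sit side by side in the same store yet never be recognized as describing the same material, which is exactly the "you just moved it to a central location" failure mode described above.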

How do you make data future-ready rather than future-proof?

Simon: I don't think there's such a thing as future-proof, because we don't know what the future's going to bring, apart from the fact that our data is going to get more complex. We really need to organize our data the best way we can with our current tools. But whenever we do that, we need to make sure our data is organized so that we can transform it for whatever is coming. In the future, as you enhance existing information with more data, make sure the structures and concepts you build are extensible. Future-ready is a better term because it means you can transform your data rather than going down a blind alley where you're bought into a standard that isn't translatable to modern data.

How is Dow leveraging GenAI to advance research and take advantage of this new technology?

Alix: We have leveraged both traditional AI machine learning and now generative AI. With GenAI, we're still trying to give researchers an easy way to access Dow's extensive internal know-how as well as external information like patents and papers. We think that GenAI along with traditional semantic searching of data can surface some of the context that's not yet structured. We're using GenAI to make that information discoverable and usable to support all the other information that our researchers use to make their decisions.

For example, if you wanted to understand an opportunity to make a new type of plastic, you could look through all of the patent literature about plastic and find out what hasn't been done yet. But that is prohibitively time-consuming. With GenAI, there are now ways to process that information to suggest new pathways. The key here is acceleration and the ability to use that massive trove of data within Dow.

Can you take us through some of your requirements for success in setting the foundation up for GenAI?

Alix: At the very beginning, it's about how we architect for scaling versus the need to scale immediately. For a GenAI solution to work effectively, we need to focus on quality, interoperable data as the fuel. This entails marrying aggregation with context: standardizing where you can, or at least dynamically understanding where things are semantically equivalent, and bringing in ontologies and taxonomies that help describe those relationships.
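The "semantically equivalent" idea above can be sketched with a toy synonym map, a stand-in for a real ontology or taxonomy, that renames instrument-specific property names to one canonical term. All field names and values are invented examples, not Dow's vocabulary.

```python
# Hypothetical canonical vocabulary: variant spellings from different
# instruments map to one agreed term. A real system would back this
# with a curated ontology/taxonomy rather than a hard-coded dict.
CANONICAL_TERMS = {
    "mw": "molecular_weight",
    "mol_wt": "molecular_weight",
    "molecular weight": "molecular_weight",
    "tg": "glass_transition_temp",
    "glass transition": "glass_transition_temp",
}

def standardize(measurement):
    """Rename instrument-specific keys to their canonical terms."""
    out = {}
    for key, value in measurement.items():
        canonical = CANONICAL_TERMS.get(key.strip().lower(), key)
        out[canonical] = value
    return out

raw = {"Mol_Wt": 52000, "Tg": 105.2}   # as one instrument exports it
clean = standardize(raw)
# -> {"molecular_weight": 52000, "glass_transition_temp": 105.2}
```

Keys with no known mapping pass through unchanged, which preserves new data while flagging (by its raw name) what still needs to be added to the vocabulary.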

Can you elaborate on your security requirements and why it is so important to have those in place?

Simon: The number of years of data we have is incredible. If we left that wide open to the external world, we wouldn't be in business very long. This is why the protection of our data from outside attacks is essential—and that means that we need to put in place many different layers of protection.

Internally, we're also looking at how do we use the least privilege principle in general—do you need access to all the data, or do you really need access to a subset of the data? Additionally, we're working with external partners and are very careful to make sure that information between partners is not accidentally leaked across boundaries.

How would you say Progress has supported you and empowered you to build these solutions?

John: We initially brought MarkLogic in to help with our problem of searching literature. Our prior solution was architected for document management, so it did a very good job with workflow and sign-offs, but search was not its strength.

Now we're utilizing MarkLogic to help manage literature and enable our researchers to access and find the information they need. We’re expanding that use case to build new systems such as a data hub, where we can aggregate structured and unstructured data from various systems. The data hub is essential to our group's effort to fill in missing metadata, both to make data easy to access and to normalize the security around data access. As we start to enrich those data, they become a really good source for feeding GenAI models and other types of technologies.

The entire space of how you produce enterprise GenAI solutions is a field that's only a couple of years old, but a few architectures have become very popular, such as RAG, and specifically Vector RAG. The capabilities provided by Progress around knowledge graphs, graph RAG and other types of semantic search are equally interesting in terms of providing a user with the response they're truly interested in. We're not just putting all of our eggs in one basket with Vector RAG—we're exploring different options and how we can combine the different tools in our toolkit to surface the right information through a hybrid approach.
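The hybrid approach described above can be sketched in miniature: blend a vector-similarity score with a lexical/graph-style score and rank documents by the combination. The scoring functions, weights and document data below are illustrative assumptions, not the actual MarkLogic or Dow implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def term_score(query_terms, doc_terms):
    """Fraction of query terms the document mentions (toy lexical score)."""
    return len(query_terms & doc_terms) / len(query_terms)

def hybrid_rank(query_vec, query_terms, docs, alpha=0.6):
    """Blend vector and lexical scores; alpha weights the vector side."""
    scored = []
    for doc in docs:
        score = (alpha * cosine(query_vec, doc["vec"])
                 + (1 - alpha) * term_score(query_terms, doc["terms"]))
        scored.append((score, doc["id"]))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

docs = [
    {"id": "patent-1", "vec": [0.9, 0.1], "terms": {"polymer", "catalyst"}},
    {"id": "paper-2", "vec": [0.2, 0.8], "terms": {"polymer"}},
]
ranking = hybrid_rank([1.0, 0.0], {"polymer", "catalyst"}, docs)
```

The point of the blend is robustness: a document that a pure vector search would miss can still rank well on exact terminology, and vice versa; production systems typically swap the toy lexical score for a full-text or knowledge-graph query and tune (or learn) the weighting.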

Mike Savel
Mike is a Senior Content Specialist at Progress. With a couple of decades of technology writing experience under his belt, Mike enjoys the wonders of a constantly changing technology landscape and putting together the words to adequately describe and market it. 