Large Language Models VS Knowledge Graphs in Creating Golden Records for Material Master Data
One of the key tasks in successfully managing supply chain businesses is the accurate and unambiguous identification of the materials (products, equipment) passing through the supply chain. This article focuses on supply chains serviced by Enterprise Resource Planning (ERP) systems.
A well-known approach to the challenge of material identification in ERP systems is to create and maintain material master data comprising golden records, which serve as digital twins of materials.
Implementing a centralized management scenario is a recognized and widely adopted approach to creating and maintaining material master data in ERP systems. This scenario involves developing and implementing a Master Data Management system, such as SAP MDG with the MDG-M domain.
However, golden records for material master data can be prepared before the go-live of the business processes developed within the Master Data Management system.
In our experience of implementing master data management systems with SAP MDG, these golden records are typically created in a separate workflow that runs in parallel with the workflows for developing and implementing business processes.
Developing corporate material master data is a labor-intensive process. It involves identifying and engaging the business owners of data subdomains, such as the specialists responsible for procuring or operating cable products, pipeline fittings, and so on. In the oil and gas industry, the number of top-level subdomains can reach dozens, and the total number of material classifier nodes is typically several thousand.
With the emergence and explosive growth in the number of users, implementations, and applications of Large Language Model (LLM) technologies such as ChatGPT/OpenAI, LLaMA/Meta, Bard/Google, and BingAI/Microsoft, the question arises whether these technologies are suitable for creating golden records for material master data. The objective is to improve data quality and reduce the workload of this workflow in Material Master Data Management projects.
An obvious way to assess the applicability of LLMs to creating golden records for material master data would be to process a representative sample of source data records, compare the results with a control sample, and analyze the statistics. However, this approach would miss the intricate details of the golden record creation process, and those details are crucial.
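For contrast, the sample-based evaluation mentioned above can be sketched in a few lines. This is a minimal sketch, assuming the generated and control records are aligned one-to-one and stored as attribute-to-value dictionaries; the function name `attribute_accuracy` and the sample records are hypothetical illustrations, not results from this study.

```python
from typing import Dict, List

# A record maps attribute names to normalized string values.
Record = Dict[str, str]


def attribute_accuracy(generated: List[Record], control: List[Record]) -> Dict[str, float]:
    """Per-attribute share of generated records that exactly match the control sample.

    The lists are assumed to be aligned: generated[i] is the golden record
    produced for the same source description as control[i].
    """
    if len(generated) != len(control):
        raise ValueError("generated and control samples must have the same length")
    if not control:
        return {}
    attributes = sorted({name for record in control for name in record})
    scores: Dict[str, float] = {}
    for name in attributes:
        hits = sum(1 for gen, ref in zip(generated, control) if gen.get(name) == ref.get(name))
        scores[name] = hits / len(control)
    return scores


if __name__ == "__main__":
    # Illustrative records only.
    control = [{"angle_deg": "90", "steel": "1.4307"}, {"angle_deg": "90", "steel": "1.4307"}]
    generated = [{"angle_deg": "90", "steel": "1.4307"}, {"angle_deg": "90", "steel": "1.444"}]
    print(attribute_accuracy(generated, control))  # {'angle_deg': 1.0, 'steel': 0.5}
```

A score such as 0.5 shows that a value was wrong, but not whether the system should have corrected a typo or flagged a grade that does not exist, which is why individual cases are examined here instead.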
Therefore, in this article, we thoroughly examine the pros and cons of two typical scenarios for generating golden records for material master data: an LLM (BingAI/Microsoft) and knowledge graphs (OWL/RDF ontologies) as implemented in the IBA HOTD system.
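To make the knowledge graph side concrete, here is a minimal sketch of how a permissible combination could be stored as RDF-style (subject, predicate, object) triples and queried. It uses plain Python sets rather than an RDF store, and it is not the IBA HOTD implementation: the `ex:` identifiers, predicate names, and the functions `match` and `explain` are hypothetical. Only the attribute values come from this article.

```python
from typing import Dict, List, Set, Tuple

# An RDF-style statement: (subject, predicate, object).
Triple = Tuple[str, str, str]

# Hypothetical knowledge-graph fragment. The "ex:" identifiers are invented;
# the values are the permissible elbow combination quoted in this article.
GRAPH: Set[Triple] = {
    ("ex:elbow1", "rdf:type", "ex:Elbow"),
    ("ex:elbow1", "ex:definedBy", "BS EN 10253-4:2008"),
    ("ex:elbow1", "ex:bendRadius", "3d"),
    ("ex:elbow1", "ex:angleDeg", "90"),
    ("ex:elbow1", "ex:outsideDiameterMm", "21"),
    ("ex:elbow1", "ex:wallThicknessMm", "2"),
    ("ex:elbow1", "ex:steelNumber", "1.4307"),
}


def match(graph: Set[Triple], rdf_type: str, constraints: Dict[str, str]) -> Set[str]:
    """Subjects of the given type that carry every (predicate, value) pair.

    This mimics a basic graph pattern of a SPARQL query over an RDF store.
    """
    found = {s for s, p, o in graph if p == "rdf:type" and o == rdf_type}
    for predicate, value in constraints.items():
        found &= {s for s, p, o in graph if p == predicate and o == value}
    return found


def explain(graph: Set[Triple], subject: str) -> List[Tuple[str, str]]:
    """All statements about a subject, i.e. the evidence behind an answer."""
    return sorted((p, o) for s, p, o in graph if s == subject)


if __name__ == "__main__":
    print(match(GRAPH, "ex:Elbow", {"ex:steelNumber": "1.4307"}))  # {'ex:elbow1'}
    print(match(GRAPH, "ex:Elbow", {"ex:steelNumber": "1.444"}))   # set(): not in the graph
```

The point of the design is traceability: every answer can be traced back to explicit statements with `explain`, which is the transparency this article contrasts with an LLM.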
*The definitions of the advantages and disadvantages of each approach were prepared using BingAI.
Let’s consider a typical practical example: forming a golden record for material master data from incomplete or partial descriptions in a requisition list.
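Forming a golden record from a partial description largely amounts to filling in the missing attributes from the combinations a standard allows. The following is a minimal sketch, assuming the permissible combinations have already been extracted from the standard into a table. The table rows and the names `candidates` and `augment` are hypothetical; only the first row reuses values quoted in this article.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]

# Hypothetical table of permissible combinations extracted from a standard.
# Only the first row reuses values quoted in this article; the second row is
# a placeholder and is NOT taken from BS EN 10253-4:2008.
PERMISSIBLE: List[Record] = [
    {"type": "elbow", "bend_radius": "3d", "angle_deg": "90", "od_mm": "21", "wall_mm": "2", "steel": "1.4307"},
    {"type": "elbow", "bend_radius": "3d", "angle_deg": "90", "od_mm": "21", "wall_mm": "1.6", "steel": "1.4307"},
]


def candidates(partial: Record, table: List[Record]) -> List[Record]:
    """Permissible combinations consistent with every attribute given in `partial`."""
    return [row for row in table if all(row.get(k) == v for k, v in partial.items())]


def augment(partial: Record, table: List[Record]) -> Optional[Record]:
    """Complete a partial description if exactly one combination fits.

    Returns None when nothing fits (a possible error in the source) or when
    several combinations fit (the description is ambiguous and needs review).
    """
    found = candidates(partial, table)
    return dict(found[0]) if len(found) == 1 else None


if __name__ == "__main__":
    print(augment({"type": "elbow", "wall_mm": "2"}, PERMISSIBLE))     # unique match, completed
    print(augment({"type": "elbow", "angle_deg": "90"}, PERMISSIBLE))  # None: ambiguous
```

Returning `None` instead of guessing is deliberate: an ambiguous or contradictory requisition is routed to a data owner rather than silently completed.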
In the first example, all attribute values of an elbow (bend radius 3d, angle 90 degrees, outside diameter 21 mm, wall thickness 2 mm, steel grade/number 1.4307) exist and are described as permissible in the text of the standard BS EN 10253-4:2008.
In the second example (elbow, bend radius 3d, angle 90 degrees, outside diameter 21 mm, wall thickness 16 mm, steel grade/number 1.444), errors are introduced intentionally: the decimal separator is omitted from the wall thickness value (it should be 1.6 mm), and the steel grade/number 1.444 does not exist because it is not specified in the text of the standard.
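Both errors in the second example can, in principle, be detected mechanically once the permissible values are known. The following is a minimal sketch, assuming a per-attribute set of allowed values. The `ALLOWED` contents are limited to values this article names as correct and are not the full tables of the standard. The heuristic that flags a dropped decimal separator is a hypothetical illustration.

```python
from typing import Dict, List, Set

# Hypothetical permissible values per attribute, limited to values this
# article names as correct; a real system would load them from the tables
# of BS EN 10253-4:2008.
ALLOWED: Dict[str, Set[str]] = {
    "wall_mm": {"1.6", "2"},
    "steel": {"1.4307"},
}


def _digits(value: str) -> str:
    """Strip decimal separators so '16', '1.6' and '1,6' compare equal."""
    return value.replace(".", "").replace(",", "")


def validate(record: Dict[str, str], allowed: Dict[str, Set[str]]) -> List[str]:
    """Report non-permissible values; attributes without a known range are skipped."""
    issues: List[str] = []
    for attr, value in record.items():
        permitted = allowed.get(attr)
        if permitted is None or value in permitted:
            continue
        # Permissible values that differ from the input only by a decimal separator.
        hints = sorted(a for a in permitted if _digits(a) == _digits(value))
        if hints:
            issues.append(f"{attr}={value}: not permissible; decimal separator error? Did you mean {', '.join(hints)}?")
        else:
            issues.append(f"{attr}={value}: not found in the standard")
    return issues


if __name__ == "__main__":
    second_example = {"od_mm": "21", "wall_mm": "16", "steel": "1.444"}
    for issue in validate(second_example, ALLOWED):
        print(issue)
```

For the second example this reports the wall thickness with a suggested 1.6 and the steel number as unknown. This is the kind of notification the conclusions ask for when a combination is not in the standard.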
Retrieval of the Product Name and Description Using a Combination of Values of Attributes Provided in the Standard
Analysis of the Product Description Based on Identified Characteristics (Attributes)
Retrieval of the Product Name and Description Using a Combination of Characteristics Not Provided in the Standard
1 In this case, the HOTD search algorithm, which uses a knowledge graph, generated several values that do not fully match the original values in the search string but are acceptable according to the texts of the standards.
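This footnote describes results that differ from the search string but are still permissible. One simple way to produce such near matches is to rank permissible combinations by how many attributes they share with the search string and report which values differ. The sketch below is hypothetical and is not the HOTD search algorithm. Apart from the first row, whose values are quoted in this article, the table rows are placeholders.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical permissible combinations. Row 1 reuses values quoted in this
# article; rows 2 and 3 are placeholders, NOT taken from BS EN 10253-4:2008.
PERMISSIBLE: List[Record] = [
    {"bend_radius": "3d", "angle_deg": "90", "od_mm": "21", "wall_mm": "2", "steel": "1.4307"},
    {"bend_radius": "3d", "angle_deg": "90", "od_mm": "21", "wall_mm": "1.6", "steel": "1.4307"},
    {"bend_radius": "3d", "angle_deg": "45", "od_mm": "21", "wall_mm": "2", "steel": "1.4307"},
]


def rank(query: Record, table: List[Record], top: int = 3) -> List[Tuple[int, Record, Record]]:
    """Rank permissible combinations by how many query attributes they match.

    Each result is (match count, combination, permissible values that differ
    from the query). list.sort() is stable, so ties keep the table order.
    """
    results: List[Tuple[int, Record, Record]] = []
    for row in table:
        matched = sum(1 for k, v in query.items() if row.get(k) == v)
        differs = {k: row.get(k, "") for k, v in query.items() if row.get(k) != v}
        results.append((matched, row, differs))
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:top]


if __name__ == "__main__":
    # The erroneous search string from the second example.
    query = {"bend_radius": "3d", "angle_deg": "90", "od_mm": "21", "wall_mm": "16", "steel": "1.444"}
    for score, row, differs in rank(query, PERMISSIBLE):
        print(score, differs)
```

Showing the differing values next to each candidate lets a data owner see exactly which parts of the request were replaced and why the result is still permissible.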
Analysis of Product Description Using Identified Characteristics (Attributes, Properties)
While progress in LLMs looks promising for creating golden records for material master data from potentially incomplete and inaccurate source descriptions, we cannot yet rely on LLMs as a mature technology capable of significantly reducing labor costs compared with alternative approaches, particularly those based on knowledge graphs.
To build golden records for material master data with trustworthy results, LLM technology must address the following challenges:
- The system needs to augment incomplete material descriptions based on the characteristics specified in the text of standards.
- As the texts of standards often contain tables of various forms and sizes describing permissible combinations of characteristic values, the system should be able to interpret these tables correctly.
- The system should not generate combinations of mandatory attributes and values that do not exist in the standards or, at the very least, should flag such instances.
- When the system is retrained, results that were correct before retraining should not degrade, so a control mechanism is imperative to manage this behavior. However, an LLM is “a black box” that lacks transparency, unlike a knowledge graph.
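The control mechanism in the last challenge could start as a simple regression gate. It compares the outputs produced before and after retraining against a fixed set of golden records already approved by data owners. The sketch below assumes such a reference set exists. The item IDs and the functions `compare_runs` and `release_allowed` are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def compare_runs(
    reference: Dict[str, Record],
    before: Dict[str, Record],
    after: Dict[str, Record],
) -> Tuple[List[str], List[str]]:
    """Return (regressed, improved) item IDs relative to approved reference records."""
    regressed: List[str] = []
    improved: List[str] = []
    for item_id, expected in reference.items():
        was_ok = before.get(item_id) == expected
        is_ok = after.get(item_id) == expected
        if was_ok and not is_ok:
            regressed.append(item_id)
        elif is_ok and not was_ok:
            improved.append(item_id)
    return sorted(regressed), sorted(improved)


def release_allowed(
    reference: Dict[str, Record],
    before: Dict[str, Record],
    after: Dict[str, Record],
) -> bool:
    """Block a retrained model if any previously correct record degraded."""
    regressed, _ = compare_runs(reference, before, after)
    return not regressed


if __name__ == "__main__":
    reference = {"M1": {"steel": "1.4307"}, "M2": {"wall_mm": "1.6"}}
    before = {"M1": {"steel": "1.4307"}, "M2": {"wall_mm": "16"}}
    after = {"M1": {"steel": "1.444"}, "M2": {"wall_mm": "1.6"}}
    print(compare_runs(reference, before, after))     # (['M1'], ['M2'])
    print(release_allowed(reference, before, after))  # False
```

Such a gate only observes outputs, so it also works for a black-box LLM. It detects that a record degraded but cannot explain why, and that explanation is exactly the transparency gap this section describes.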
Contact us if you want to prepare your material master data or learn more about IBA Group’s experience in implementing and operating material master data management systems.