Good morning. Good afternoon. Good evening. Good whatever time it is, whenever you’re watching or listening to the podcast.
Hey. This is Malcolm Hawker. I’m the host of the CDO Matters podcast that you’re listening to. Thank you for checking us out.
I’m also the CDO of Profisee.
We make amazing MDM software. If you ever have MDM related problems, hit me up on LinkedIn. Send me a DM. We’ll have a chat about MDM, my favorite topic.
Alright. Speaking of favorite topics, what we’re gonna talk about today is up there for me. So we’re talking today with Andreas Blumauer. Andreas, hello. How are you?
Hello, Malcolm. I’m doing well. How are you?
I’m excellent. So today, we’re gonna talk about knowledge.
But we’re not gonna start our conversation talking about knowledge or knowledge management. This is a topic near and dear to my heart because I sincerely believe that we as data leaders need to pivot more towards knowledge management. But we’re not gonna start with that. We’re gonna start with a little bit of a history lesson.
We’re gonna talk about the semantic web, and we’re gonna talk about some early efforts to bring structure to metadata in our world and on the web writ large. Andreas is the founder and CEO of a company called the Semantic Web Company. He’s now a senior vice president of growth at Graphwise. Graphwise is a recent merger of the Semantic Web Company and another company called Ontotext.
Ontotext was, and maybe still is, a graph database company.
If you want to learn more about graph, I actually talked to an extremely smart fellow Gartner alumnus named Sumit Pal on episode sixty-five of this podcast. So if you wanna go into some of the really technical details of what graph is, if you hear us talking today about graph and graph databases, graph triples, graph nodes, and you hear some things where you’re like, what’s that? Episode sixty-five. Sumit is a super, super smart guy and goes into a lot of the details.
The episode is called Demystifying Knowledge Graphs. So if some of the things we’re talking about today don’t make too much sense, hey, tune in to episode sixty-five. I also talked to somebody else from Ontotext, Doug Kimball, the former chief marketing officer, in episode twenty-six, where we talked about graph from a business perspective: why should business people care about graphs?
Why should this even be relevant? So we’ve got a lot of content related to graph on the CDO Matters podcast. If some of the things you’re hearing today don’t make that much sense, check out some of those earlier episodes.
But, Andreas, let’s dive into it.
Tell me, what is the semantic web? What is the whole concept behind the semantic web, and why, as a data person in twenty twenty-five, is that something I should probably know?
Yeah. The semantic web has a really long story attached to it. It started out somewhere around nineteen ninety-eight, I think it was, when Tim Berners-Lee published a little note on the W3C server called the Resource Description Framework, RDF.
And the whole idea was always: okay.
Obviously, the web became a very busy, crowded place with a lot of different, let’s say, formats and interfaces and whatsoever.
And the whole idea was, okay, let’s build a second web, the semantic web, parallel to it, or in the meantime also somehow embedded in it, which allows machines to access it in a really standardized way.
So the data can be produced in different formats to be consumed by human beings, but at the same time also by machines in a standardized format.
And with this format, you shouldn’t just think of how it is serialized. It’s really about how the metadata, and the description of the metadata, should be standardized. So we are all very focused on the metadata in the first order. But then when you think of this massively scaled-out system called the World Wide Web, you cannot trust that the authors of all this metadata will have a consistent view on it.
Of course not. You also see that in enterprises, at a smaller scale: every department starts to create maybe some kind of quite specific metadata, not really interoperable. So the keyword is interoperability at scale, and that was the whole idea of the semantic web, that all the publishers of content across different domains should at least agree on the way they describe their metadata. And on top of that, on the knowledge models, the ontologies and taxonomies, describing the meaning of the metadata.
What was very important is that a bit later, around twenty fifteen or so, the agreement on particular standards was slowing down to some degree. And at the same time, Web two point zero already came up, which was actually siloing all the data. So you cannot access data anymore, except via APIs.
In the majority of cases, platforms have really captured the data and strictly control what they wanna give away. So it’s no longer a standard that is as powerful in the world as it was supposed to be. At the very early stage of the web, I think it was a great idea, if it would have converged, but it didn’t. Because what’s also missing is a way to let the user somehow control his or her own data. It was part of the idea that you can set up a profile and say, okay, here’s what I’m interested in, and then you can tell the one data provider, okay,
I let you look into my profile from this perspective, and I let the other one look into my profile from that perspective. And by that, you would still have control over your data, but you don’t. I remember Tim Berners-Lee at one point even said he’s quite disappointed with how the web has developed and into what it has developed. And that he should have, let’s say, enforced the semantic web even earlier, more than the web of documents as we know it. Right?
But it’s too late now, he said. What in parallel was quite nice to see is that enterprises have started to adopt semantic web standards. So actually, it shouldn’t be called the semantic web anymore. It could really be semantic enterprise standards, because we see a lot of adoption of RDF technologies, SPARQL, all of that, now really all around in enterprises which want to implement an enterprise knowledge graph. So it’s pretty much the same technology under the hood, and the enterprise knowledge graph has adopted the semantic web standards.
Okay. Let’s recap. Is it correct to say that in the early days of the Internet, the World Wide Web, there was an effort to have some idea of open standards related to metadata exchange, and that what the Semantic Web represented was just kind of open standards for exchange of data across any resource on the Internet? Okay.
Alright. But what evolved instead was this closed environment where, I mean, you could look at it from the perspective of APIs potentially, where I publish an API and you have to adhere to my API standards. And, well, the web obviously isn’t entirely peer to peer. But if I wanna go get data from another company, for example, or if I wanna exchange data with you or with some large data provider, it’s largely a point-to-point interaction through an API, where all the definitions are part of this kind of one-to-one interaction.
Did I just kind of summarize things correctly? So what evolved instead of the semantic web? What is the closed environment that exists on the Internet today that we just kind of don’t see under the covers?
I mean, what’s quite prominent is schema dot org, which was enforced by Google to better understand the meaning of content and to feed their knowledge graph. Wikidata was one of the, you know, driving pieces behind the Google knowledge graph. Actually, it was Freebase originally, then Wikidata came in. That’s all based on the semantic web standards plus schema dot org. And all this additional information was helping Google to build a more competitive search index.
That’s still not well understood, but that was one of the main reasons others couldn’t do it at the same level of accuracy.
So that was, obviously, already in the early days the fusion of knowledge graphs and machine learning, which Google brought up to the level where they are now. And interestingly, Google also just recently announced that their internal enterprise side, like the big data query engine and so on, will be more and more complemented by knowledge graphs. So the same design pattern now starts to get embraced by the enterprise data people inside Google. They obviously have learned a lot on the web about how to do that and are trying to bring that now into the enterprise data environment. Like many others, SAP has just announced, you know, that the SAP knowledge graph is orchestrating their agentic AI framework. So knowledge graphs are now really prime time. And since you mentioned Sumit, my colleague, just before: Gartner has now also finally put knowledge graphs next to the plateau of productivity.
So it’s really no longer a second-class citizen in the AI strategy. Typically, it’s at the forefront and helps enterprises make their data AI-ready. And that’s the key element.
Okay. I was using APIs as kind of a metaphor for data exchange, trying to understand, okay, how is data being exchanged at scale today through the Internet?
But what you described was the use of knowledge graphs, with, I assume, RDF being the primary standard.
Are there others besides RDF in the knowledge graph world?
Well, absolutely. There are two well-known kinds of approaches. RDF is an open standard, as you said, and builds the basis for the knowledge graph community. And then there’s a property graph community, and they use LPG as the, let’s say, underpinning methodology.
Both absolutely have their advantages. It’s not one superseding the other, but we can come to that later. It depends on the use case, and on the data which is used in the particular use case, whether you choose the one or the other. And it’s also a bit of a strategic question enterprises start to ask themselves. Depending on that, you either go for RDF or for property graphs.
So, I mean, I’m trying to wrap my head around the idea of there being this kind of common standard for metadata exchange between companies. I mean, in my world today, this is obviously relevant, right? If I’m trying to share data, say I’m a health care provider trying to share data from one health care company to another health care company. Or if I’m part of a manufacturing supply chain, I’ve got information related to my products, or supplies and materials, that are being passed from one company to another. And I’m trying to wrap my head around historically how we’ve been doing that, right, versus the world that you’re kind of describing. And maybe, am I focusing too much on the world of structured data versus unstructured data?
Help me understand the difference between a simple file exchange, for example, where, you know, I send you a file, or maybe we exchange some data over an API.
What is the difference between what I’ve described, which is kind of historically how companies would potentially be sharing data with each other, and a world enabled by these knowledge graphs that are powered by some of these standards like RDF?
I mean, what’s important to state is that the way metadata is described in semantic web standards and knowledge graphs is fundamentally different from other technologies. The meaning travels together with the data.
Okay.
So it’s self-describing. You would never find separate documentation about what this data means; in our world, it’s just self-describing.
And you also use ontologies to make sure that the data is valid. You can run tests and see: is your data consistent, is it complete, and so on and so forth. And all that is done in a way that is self-describing and understandable for the machine itself, and on top of that you can build interfaces for the people and for the data engineers if you want.
And that’s the beauty of this standard: in a domain-specific way, you can express what’s needed in your domain so that your data makes sense. Those things are called taxonomies and ontologies, and they are used on top of the metadata. And you can do it with any kind of data. It can be a document.
It can be a semi-structured document, a completely unstructured document. It can be SQL data. It can be anything you can imagine. You just use that as the baseline and put your metadata on it, which then again is described, using ontologies and taxonomies, in a standardized way.
And you send this as a package and bring it over to your peer. And this way of transporting data adds something on top. It’s really more about knowledge exchange than data exchange only. So it brings data to the next level.
And this is so important in the days of AI, when AI would love to get to the real information. Like in RAG architectures, you really wanna know, okay, where does the most relevant data unit actually reside? You need the metadata, and the description of the knowledge model on top of the metadata, even more than ever before. So now it really starts to pay off to build this knowledge graph like an umbrella on top of all the data silos you have in your company.
And this exchange of data across silos and enterprises is not the predominant use case. But of course, also integrating external data, like when you are subscribed to a data provider and you wanna link it back to your enterprise data in a meaningful way, all of that can be better automated when you have this knowledge graph infrastructure in place.
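To make the idea of the meaning traveling with the data a bit more concrete, here is a minimal sketch using Python’s rdflib; the namespaces, the customer record, and the definitions are illustrative assumptions, not anything quoted from the conversation. The point is simply that the record and the description of what its fields mean sit in the same graph that gets handed over.

```python
# Minimal sketch of self-describing data with rdflib. All names and values
# below are hypothetical, for illustration only.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.com/data/")
ONT = Namespace("http://example.com/ontology/")

g = Graph()
g.bind("ex", EX)
g.bind("ont", ONT)

# The data record itself...
record = EX["customer-42"]
g.add((record, RDF.type, ONT.Customer))
g.add((record, ONT.hasName, Literal("Joe Smith")))
g.add((record, ONT.isActive, Literal(True)))

# ...and the description of what that metadata means, in the same graph.
g.add((ONT.Customer, RDF.type, RDFS.Class))
g.add((ONT.Customer, RDFS.label, Literal("Customer", lang="en")))
g.add((ONT.isActive, RDFS.comment,
       Literal("True if the customer transacted in the last twelve months.", lang="en")))

# Whoever receives this file gets the data and its meaning in one package.
print(g.serialize(format="turtle"))
```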
So wonderful answer. Thank you.
I’m still on my first cup of coffee, waking up here this morning. And that was really, really helpful, because I’m thinking here, I was like, hey, well, you know, I send files.
I can send a record. I can send a file. But there’s something inherently different between there being some checksums on a file exchange, right, and exchanging context and meaning. Right?
What you just described, which I love, is something that is very self-describing. Right? I can send somebody a file of customer records, and I know it’s customer records, but that’s pretty much it. Right?
I may know that maybe there’s a bit on, you know, the active flag for customers, so that I know maybe that they’re active. I mean, there are things that I can do in a relational database to kind of add a little bit of context, but what you could be describing through a knowledge graph is so much richer because of the nature of how knowledge graphs work vis-à-vis these triples. Correct? Is that the self-describing that you’re talking about, these triples that are kind of inherent to the graph that tell me things about the data that I would never know from a relational data store?
Yeah. I mean, on a very atomic level, it’s triples. But for me, that’s very abstract. I would rather rephrase it to some degree and say, okay, you have a data record, and you put some metadata on top. Let’s say, I have it in front of me: coffee, water, and mouse.
Yeah.
Good. What does it mean? Yeah. What is it? What does it mean to the other agents, to the other application?
What’s water? I mean, if the data comes from, let’s say, an industrial plant, then it means, okay, maybe water consumption.
If it’s then sent over to a government agency, they probably look at water in a different way. So, first of all, we have to have a bit of an idea:
what is meant by that kind of data? And this can be described in a knowledge model. And this knowledge model can make use of standards like SKOS and OWL and all that. And, yes, essentially, there are triples inside, but this is too small a view to understand what the semantic web really does.
It’s about the knowledge models, the domain models, which also make the difference compared to the property graph community. They cannot handle that very well. And I think that’s the point we should look at. So we need domain-aware data.
A data set which, so to speak, understands itself a little bit. Yeah. What am I all about as a data set? Ah, I’m describing the water consumption of an industrial plant.
So then just expose that. Make it clear to the agents out there, maybe even via MCP protocols. And this can be automated with knowledge graphs now. And, you know, there are a lot of vocabularies out there which make this content production in a way more interoperable.
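As a rough illustration of domain-aware data, the sketch below links the ambiguous field from the example above to a concept in a shared SKOS vocabulary, so another agent can tell that the water consumption of a plant is meant rather than some other sense of water. The vocabulary URI and concept labels are hypothetical.

```python
# Sketch: disambiguating "water" by pointing the data at a shared SKOS concept.
# The vocabulary and plant namespaces are made up for this example.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import DCTERMS, SKOS

VOC = Namespace("http://example.com/vocabulary/")
PLANT = Namespace("http://example.com/plant/")

g = Graph()
g.bind("skos", SKOS)

# The shared vocabulary defines the concept once, with labels and hierarchy.
water_consumption = VOC["water-consumption"]
g.add((water_consumption, RDF.type, SKOS.Concept))
g.add((water_consumption, SKOS.prefLabel, Literal("Water consumption", lang="en")))
g.add((water_consumption, SKOS.altLabel, Literal("Water usage", lang="en")))
g.add((water_consumption, SKOS.broader, VOC["resource-consumption"]))

# The data point then references that concept instead of the bare word "water".
reading = PLANT["reading-2025-03-01"]
g.add((reading, DCTERMS.subject, water_consumption))
g.add((reading, PLANT.valueInCubicMeters, Literal(118.4)))

print(g.serialize(format="turtle"))
```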
For instance, Medical Subject Headings, MeSH, is used widely across the medical and life science industry.
A lot of companies, you know, use MeSH.
Another example would be EuroVoc. Yeah. That’s widely used in the European Union.
It’s the European vocabulary to annotate, for instance, protocols of political fora. Right? So you really know what they are talking about. So it’s not as trivial as you would believe to understand whether the data is relevant for me or not.
That cannot be decided by artificial intelligence alone; that’s not possible, because those models don’t have the bigger view. The context is missing in most cases. So in other words, what we are really doing is providing a context engine in this architecture, which allows you to shrink the space relatively quickly down to the most relevant content nuggets. Right?
It’s really small chunks of content which are most relevant. And vector embeddings, come on, that is not a nice technology to do that. It’s random.
It’s really random. So vectors are not good for relevancy ranking at the level it should be.
So your concern with vectors is what, that they’re too simplistic? I mean, if you’re just kinda chunking out blocks of text and then drawing some conclusion based off the block of text, are you saying that vectors are limited because they’re missing the broader picture? Is that kind of the story you’re trying to tell?
Yeah. I mean, just look at a simple workflow. Somebody asks a question in a chatbot, and that’s the whole context you get. Right?
And then you use a vector space model to find relevant information, and that doesn’t work. So you really need to understand: where does this person come from? What’s the context this person currently sits in?
And here we’re talking about domains, knowledge domains, typically. So, working in a particular field of a particular company, or along the lines of a defined process, to stay compliant with a certain regulation, whatsoever.
There’s a lot of context typically available to understand the user intent better, and this needs to be very accurate. Yeah? If you’re working, let’s say, in the financial sector and someone asks a specific question, we need to know exactly what the regulations are which are, you know, important in such a case. And that shouldn’t be randomly chunked up by a vector model. I mean, if our brains start to work like a vector RAG, then we are done, finally. Then this is really the complete idiocracy.
So I get it. From the perspective that you just gave, this kind of very self-describing, contextually rich source of data, I can easily see why knowledge graphs are so powerful when it comes to LLMs, because they absolutely need that context in order to limit hallucinations and to improve the accuracy of results. I mean, I could go on and on.
You know, I think our mutual friend Juan Sequeda has done some interesting research here about, you know, using graphs to kind of ground and improve the behavior of LLMs. So I can certainly understand that. What about governance? Can knowledge graphs play a role from the perspective of governance?
Could I build, like, a governance graph that helps describe rules that I would expect to see from a governance or data quality perspective? I’m really kind of wondering out loud here without a full cup of coffee in me yet, but I can certainly see how you could use a knowledge graph to make data far more AI-ready.
That I get. I’m trying to understand where more of a legacy governance model would fit into this world, particularly from a data validation, data accuracy perspective. Have you thought at all about this?
Yeah. Sure. So another important feature of knowledge graphs is that they’re very well prepared to do data integration across multimodal data sources.
And so RDF is kind of a super language, if you want. You can map almost everything to RDF.
And by that, you can, for instance, with the help of SHACL, which is part of the standards stack, set up a set of constraints to make sure the data follows your set of rules across the silos, not just for a particular silo anymore. And probably by that, you finally detect data inconsistencies you otherwise would never ever have surfaced. Right? So step one, you transform, and that, by the way, can also be virtualized, but typically you materialize it: you transform data from the different silos and map it into the knowledge graph, where you then execute your SHACL rules, and then you find the missing pieces or inconsistencies, etcetera.
And that allows you really to bring data quality to the next level. Right? And, by the way, it’s not just the LLM then necessarily producing the answer. You can also use GraphRAG the other way around, if you want.
So that you make sure that for the most critical queries the users have, it’s definitely fact-based. You go back to the knowledge base, to the knowledge graph, and enforce the system to give an answer out of the knowledge graph, which obviously is mirroring the proven data. And the LLM you can use to translate the natural language query into a SPARQL query. Right?
SPARQL is the query language in the semantic web stack, which executes sophisticated queries on RDF graphs that you could never ever execute with SQL databases, because too many joins at the same time wouldn’t work in the SQL world. So it’s a very powerful technology stack. And, short answer to your question, yes. Data governance with this technology in place probably comes to the next level.
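For the constraint part, here is a minimal sketch of SHACL validation using the pyshacl library: one shape requires every customer to carry exactly one name, and the report surfaces the record that breaks the rule. The namespaces and sample data are assumptions made for the example.

```python
# Sketch of SHACL validation with pyshacl; data and shapes are illustrative.
from pyshacl import validate
from rdflib import Graph

data_ttl = """
@prefix ex:  <http://example.com/data/> .
@prefix ont: <http://example.com/ontology/> .
ex:customer-1 a ont:Customer ; ont:hasName "Joe Smith" .
ex:customer-2 a ont:Customer .   # no name: should be flagged
"""

shapes_ttl = """
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ont: <http://example.com/ontology/> .
ont:CustomerShape a sh:NodeShape ;
    sh:targetClass ont:Customer ;
    sh:property [ sh:path ont:hasName ; sh:minCount 1 ; sh:maxCount 1 ] .
"""

data_graph = Graph().parse(data=data_ttl, format="turtle")
shapes_graph = Graph().parse(data=shapes_ttl, format="turtle")

conforms, _report_graph, report_text = validate(data_graph, shacl_graph=shapes_graph)
print(conforms)      # False, because customer-2 has no name
print(report_text)   # human-readable list of violations
```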
Well, I could see how what you just described would be very powerful to identify outliers and exceptions, right, or to find relationships or conditions in the data that you wouldn’t expect to be there from a quality perspective.
To me, this starts to get into things like human in the loop and data stewardship, and how you do that in real time is a little perplexing to me.
But we can put that issue aside. I was trying to figure out in my head, while I was listening to you talk, you know, if we go forward a year or two years, how do we bring everything that you just described closer to the world of traditional data management? Right?
And in my world of traditional data management, there’s MDM. Right? Master data management, which is where we have kind of managed definitions of things, and we manage quality standards of things, the things being shared data: customer, product, asset.
We’ve got, you know, traditional data integration, right, and we’ve got traditional data quality.
And my head is really kind of spinning because I’m trying to figure out how we bring these things together, or do we necessarily even need to? I think we probably do, because the world of rows and columns isn’t going away. Right? We’re gonna continue to have traditional analytics.
We’re gonna have dashboards. We’re gonna have the things we’ve always been doing. But the world that you’re describing is a very different world, and I’m trying to figure out in my head how we bring these things together. Am I being crazy on a Friday, or are these things that you guys are thinking about?
Yeah. I think it really starts from the top of the organization, if companies no longer want to drive the business just based on analytics, but maybe also go more in the direction of knowledge discovery.
Leveraging the existing data across silos, you know, one keyword is collaboration.
Self-service reporting is another one, so that more people will make use of data, democratizing the usage of data in general.
I think there’s no other way than moving forward with a knowledge graph, because that’s the interface needed between natural language and the data. And, I mean, the very specific data models in relational databases sometimes have nothing to do with the actual business logic.
Right. Yes.
That’s the key element here: the knowledge graph really is kind of a translator between those two worlds. And by that, you can imagine that far more people can make use of the data, ask questions about the data, and it’s going to be translated into, let’s say, the internal query language needed. And the knowledge graph using RDF just makes it even broader, so you can ask questions across different data silos.
For instance, a customer of ours working in the pharmaceutical space has integrated more than twenty quite heterogeneous data sources, harmonizing all the metadata that was already in place.
Nothing was changed there. We added kind of a virtual semantic layer on top of that. And now the LLM accesses the knowledge graph to give answers which can integrate, or make use of, all the different data sources in one go. So they now have the infrastructure to let different stakeholders ask their questions and make use of different data sources that they probably wouldn’t otherwise even have found.
For many people, that data is simply not discovered. Even in large enterprises, they don’t know that this data exists or that this document exists. So it’s really about this strategic decision: do I want to bring the usage of my data and content to the next level and have simple interfaces for my employees, for my stakeholders, so that they can ask the questions they want and get the answers from the real sources, yeah, without hallucination.
That’s also important in that case.
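Here is a small sketch of what querying across formerly separate sources looks like once they sit in one RDF graph. In a GraphRAG setup the SPARQL query would typically be generated by an LLM from the user’s natural-language question; in this sketch it is written by hand, and all data and names are hypothetical.

```python
# Sketch: one SPARQL query joining facts that originally lived in different silos.
from rdflib import Graph

g = Graph()
# Imagine each snippet came from a different source (trial registry, sales system),
# already mapped to RDF and loaded under one graph.
g.parse(data="""
@prefix ex: <http://example.com/> .
ex:drug-a ex:studiedIn ex:trial-7 .
ex:trial-7 ex:hasStatus "completed" .
""", format="turtle")
g.parse(data="""
@prefix ex: <http://example.com/> .
ex:drug-a ex:soldIn ex:germany .
""", format="turtle")

query = """
PREFIX ex: <http://example.com/>
SELECT ?drug ?trial ?market WHERE {
  ?drug  ex:studiedIn ?trial .
  ?trial ex:hasStatus "completed" .
  ?drug  ex:soldIn    ?market .
}
"""
for row in g.query(query):
    print(row.drug, row.trial, row.market)
```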
So let’s get a little more specific.
When it comes to managing and governing data, I’m a big believer in kind of starting small, focusing on specific use cases, and growing over time. I could easily see a world where people can be very attracted to, okay, these things called knowledge graphs. This is awesome. I need these.
And I could see a world where it’s like, hey, let’s go. Let’s just go graph everything.
Right? Like, that could be a very, very time-consuming enterprise, I would imagine.
What would a best practice look like for a company that is just starting with this? Is it use case bound? Is it, am I going to help marketing execute this one campaign? What does it look like to start simple, getting to know knowledge graphs and getting to integrate them into your data and analytics practice?
Yeah.
Before I give an answer to that question, I really have to do some myth busting here, so to speak. Creating a knowledge graph is not more sophisticated or more time consuming than data and content management in general.
It’s just Okay.
It’s just a different exercise, and a lot of stuff is automated in the meantime. It’s not like somebody sits down and develops ontologies the whole day and so on. This is no longer the case. That was probably the early days of the semantic web.
Everybody was kind of opening up a strange tool, creating strange ontologies, and only two people could read that. That’s no longer the case. It has more or less become a commodity. So we at Graphwise, you know, we also pay a lot of attention to this build-your-graph part of the loop.
So the build-your-graph part is more or less done automatically in the meantime, still with the human in the loop to supervise the process. And then the graph obviously enhances the GraphRAG application.
And then you have some metrics in this loop, and then you can even automate the next iteration and determine which parts of my knowledge graph are still kind of underrepresented, etcetera. So you get more and more of this closed loop called recursively self-improving AI, with the graph in the middle, so to speak, or as an elementary part of this loop. But coming back to your question: of course, you cannot start and say, oh, I will do the fully fledged knowledge graph as a next step. No.
You would start, obviously, with one, I would say, business critical application. That’s a good start. It shouldn’t be a nice to have problem you wanna solve, of course, to learn in the first POC. Why not?
But if you wanna start with the first real-world, productive use case, pick out one use case where you know that when you can improve it,
the business impact will be pretty high. And then there’s a methodology in place. It’s no longer like we randomly start somewhere. It’s more or less always the same procedure: build the domain model, ingest the datasets, transform them, link them, test them, make them available to the GraphRAG system, for instance, or it could be more of an analytics problem, or it could also be a search problem.
So analytics, search, and Gen AI, those are the three typical use cases on top of the knowledge graph. And then, of course, it’s going to be tested by the end users, who are typically domain experts. And then you iterate maybe a couple of times, and then you have a well-working system in place. And if that works for this one particular problem, then, of course, you can apply the same methodology to more and more datasets and more and more departments.
And then eventually, everything grows together in this knowledge graph infrastructure.
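A very rough sketch of that procedure follows. Every file name and step is a hypothetical placeholder for the domain model, the ingested silos, and the validation step; it is not a description of any particular product.

```python
# Sketch of the build-test-iterate methodology; all file names are hypothetical.
from pyshacl import validate
from rdflib import Graph

# 1. Domain model (taxonomy / ontology) for one business-critical use case.
domain_model = Graph().parse("domain-model.ttl")

# 2. Ingest and link the source datasets (assumed to be already mapped to RDF).
kg = Graph()
for source in ["crm.ttl", "erp.ttl", "documents.ttl"]:
    kg.parse(source)
kg += domain_model

# 3. Test the graph against the domain constraints before exposing it.
conforms, _, report = validate(kg, shacl_graph=Graph().parse("shapes.ttl"))
if not conforms:
    print(report)   # hand the findings back to the domain experts and iterate

# 4. Otherwise expose the graph to the GraphRAG, search, or analytics application,
#    then repeat the same loop for the next datasets and departments.
```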
We call it the five star journey.
Customers start with zero stars, typically, maybe with one star, and several of our customers are in the meantime on level five. So you get there, depending on how strategically important it is for your management, maybe within a couple of months. If it’s always only a bit of a side project, yeah, let’s see what the knowledge graph can do for us, then, of course, it takes longer. But now I think we see more and more enterprises loudly and clearly saying, okay, we’ve finally understood, we’ve tried out everything, vector RAG didn’t work, let’s go ahead with graph now. And, well, it looks quite nice at the moment.
Well, something that, you know, I recommend strongly to my clients, and I recommended this when I was at Gartner, is, you know, pick a use case, pick a problem that you need to try to solve.
What I love to use all the time is, like, cross-sell or up-sell, just to understand where there could be additional sales opportunities that you don’t already necessarily know about. And, you know, in my world of MDM, that’s creating some kind of a three hundred sixty degree view, because you have a single ID that can be used across the entire organization to link various IDs to the one master ID. I may have, you know, Joe Smith, Joseph Smith, JM Smith, JJ Smith, but they’re all the same person. I’ve got one ID that links all those IDs together, and now I can go build a report that shows all the transactions related to those various IDs. But what you just described goes well beyond that.
Right? I wouldn’t necessarily just be looking at the transactions associated with this person. I could be looking at other things. I could be looking at that person’s household. I could be looking at other people related to him. So I think we may need to kind of rethink this idea: a three hundred and sixty degree view is not necessarily just about linking IDs together. It certainly is that.
But what you described could go an awful lot further to help describe a company’s relationship with a person, or another person, or even with a given material, or anything else for that matter. So I would love to figure out how to bring these two worlds together, because I still think we’re very, very siloed, which is really ironic.
I agree. Right? Given that MDM exists to break silos, and you’ve been using the phrase breaking silos from the beginning of the conversation. So how do we more deeply weave these worlds together? Because I’m not seeing that sufficiently today.
I mean, I can come up with a probably too naive or too simple example. A couple of years ago, a large bank came to us to show us their system for how they manage their assets. And there was, of course, a lot of metadata on top of these data records. Then we looked at the metadata and said, okay, let’s put metadata on top of the metadata, on top of this existing one.
Like new and additional classifications, and also attributes and relations between these classifications, to probably even do inferencing and reasoning about it later on. But what that allowed the bank to do instantly, and it wasn’t a lot of work, we ran a POC with them, was to ask, in addition to all the queries and all the reports they’ve done regularly, additional ones. And it wasn’t a lot of work in reality, because what we did essentially was what the humans, the experts, had in their brains when they looked at the data. They did it anyway, always, kind of, but not digitally.
Yeah. They looked at it and said, okay, if this and this and that is, like, the category attached to this asset, that means this and that to me. But why not put that on top as a digital asset, yeah, which essentially is what a knowledge graph and a domain knowledge model are, and automate on that to some degree?
And somehow those guys got stuck in a world where this probably was not possible, maybe in the nineties. I don’t know. My career started in the late nineties.
I don’t know where this comes from, this rather rigid structure, but the knowledge graph is highly, you know, dynamic. If you need additional metadata, just put it there. It doesn’t take long, yeah, to add those categories. So if the environment changes and you have a new regulation, just put the metadata there.
Done. And then you can start querying over all kinds of assets you have, be it a bank or whoever. It doesn’t matter.
So it’s not a technical issue. In the meantime, I believe more and more it’s an "oh, we’ve always done it like that" mentality. It’s an "it works" mentality.
It’s a "good enough" mentality and all that. And, you know, breaking up the silos sounds so dangerous to so many people. They wanna stay in their controlled environment, in the box, and the box is beautiful.
And, you know, you don’t have to do anything else than what you’ve done in the past. And this graph is dangerous to many people out there. And that’s the reason it’s still a concept a lot of people try to push away.
But now, you know, the environment has... Well, that’s a marketing problem.
That’s a marketing problem. Right? If the capabilities are there and the value is there, what you just described was a marketing problem, and those are, sorry, marketers, the easiest problems to solve.
I would answer: if the value is there, the technology is there. Okay. Okay.
Last question, because I do know we’re running low on time. We’ve talked a lot about knowledge graphs and the insights they provide for AI.
What about AI for knowledge graphs? I’m very intrigued by using AI to help me find relationships that I didn’t even know existed.
Right? So today, human beings are still largely driving the models that underlie these graphs. Is that correct? It’s mostly humans that are kind of driving that process?
Yes. That’s still the case. But as I said before, we at Graphwise have already changed that and have built an LLM-driven model which does exactly what you just described. So it adds its own elements to the knowledge model, on its own.
Still, we let the subject matter expert supervise that process if needed, but it generates very good quality. So, I mean, I really have to say that the profession of the ontologist and taxonomist is changing tremendously these days. Like all the other jobs out there, they obviously get AI support, and the right usage of AI is key. I’m not saying they’re obsolete, not at all.
They remain, but their role will change. It’s more about becoming a knowledge steward, because they also have to find out what needs to be added to the knowledge graph to make the biggest business impact. For that, they need to understand the business and what’s really needed, more than ever before. They cannot build the models, like, separated from the rest of the organization.
It really needs to become an integrated function, because the actual creation, this nitty-gritty creation of a new concept, a new synonym, or whatsoever, is no longer where you spend the majority of your time. That is done by the LLM.
Of course, you still need to guide the LLM, and, you know, the scoping, for instance, is very important. What do we want to change in our business to be more successful? That’s what the knowledge steward has to find out.
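To illustrate that knowledge-steward workflow, here is a small hypothetical sketch: an LLM-backed step proposes candidate concepts, and a human approves or rejects each one before it enters the taxonomy. The suggest_new_concepts function is a stand-in for whatever LLM call is used; it is not a Graphwise or vendor API.

```python
# Sketch of LLM-proposed concepts with a human in the loop; everything here is
# hypothetical and only illustrates the division of labor described above.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import SKOS

VOC = Namespace("http://example.com/vocabulary/")

def suggest_new_concepts(corpus_sample: str) -> list[str]:
    """Stand-in for an LLM-backed step that proposes candidate concepts from new content."""
    return ["Water stewardship", "Greywater reuse"]   # canned output for the sketch

taxonomy = Graph()
for label in suggest_new_concepts("sample of newly ingested documents"):
    answer = input(f"Add concept '{label}' to the taxonomy? [y/n] ")   # human in the loop
    if answer.strip().lower() == "y":
        concept = VOC[label.lower().replace(" ", "-")]
        taxonomy.add((concept, RDF.type, SKOS.Concept))
        taxonomy.add((concept, SKOS.prefLabel, Literal(label, lang="en")))

print(taxonomy.serialize(format="turtle"))
```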
Well, the same is true in more kind of traditional, you know, BI worlds.
I know this isn’t gonna be a very popular statement, but I would argue that data modeling is very quickly becoming, shall we say, more of a legacy capability. Now, data modelers I know that are extremely good, they would argue, well, I can see things in the future and I can foresee things that the AI can’t.
That may be true. That doesn’t sound very data-driven to me, but that’s fine. Separate conversation over a stein of beer maybe one day about the relevancy of human modeling versus AI-driven modeling. But one thing I know for sure, based on everything you just said: in my lifetime, I’ve had the greatest success providing insights when I was able to tell business people things they didn’t know.
Right? Like, the things that drove the, oh my god, I had no idea. Right?
I had no idea that we were selling this product to this business, or I had no idea we had this risk in our supply chain. Right. Everything you just described, if you can actually have AI find patterns in data at scales that humans simply cannot, like, I need some of that. So that’s been my experience.
And that seems like what you’re kinda suggesting here.
So that’s very exciting.
Right. Yeah. I agree. It’s an assistant which can come up with suggestions, but you always need to approve them. You need to find out if they’re really true, especially in the domains we are working in. But it accelerates the way to get to the point where such engines can really become a trusted source.
And that’s where we are getting now. These trusted sources are needed. I think, still, for many businesses, we are not there yet. So they have this situation where they say AI already works very well in, let’s say, simple tasks.
But when it comes to very specific, complicated tasks, we still cannot use it. That’s where the knowledge graph comes in, because what’s missing is the background knowledge, the domain knowledge. And to bring this domain knowledge even faster into the system, you can also use LLMs, but you still need the human in the loop. But when you close those loops into one, then it finally comes to the next level.
You get to the next level.
But, yeah, there will be a bit of a, let’s say, shake-out regarding the roles in a company. I’m still very, very positive and optimistic. We cannot just now cancel a lot of junior jobs, because what’s coming next if we don’t have any juniors? Exactly. Yeah.
In ten years, fifteen years, the organization is done. So we still have to educate our people so that they understand the whole thing.
And yeah. So it’s going to be interesting how different organizations will, yeah, deal with that or become too greedy. I think those who become too greedy now will already have a big problem in ten years.
Yeah. You mentioned trusted source. That’s the world that I live in. Our challenge historically around the trusted source, I think, has been this deterministic way of looking at it.
Either it is or it isn’t. And what I love about what I heard in our conversation today was that these graphs could provide the context needed in order to say, okay, for this context, this could certainly be trusted; in this context, maybe not.
Right? And it’s all about the context and how the data is both produced and, you know, actually consumed as well. So, Andreas, thank you so much for spending an hour with the community today. Really, really appreciate it.
If you’re still listening, please take a moment to, you know, give the video a thumbs up on YouTube and maybe even subscribe to the channel. That would be awesome. We do this every two weeks, talking to some of the brightest minds in data and analytics across the globe.
Andreas, thank you again.
Thank you, Malcolm. It was a pleasure.
Wonderful. Alright.
Bye bye.
Stay tuned, folks. Check out another episode of the CDO Matters podcast in another two weeks. I hope to see you sometime very soon. Bye for now. Cheers.
Cheers.