
The CDO Matters Podcast Episode 71

Data Virtualization Demystified with Alberto Pan


Episode Overview:

Have you ever wondered exactly what Data Virtualization is?  What about a virtual data warehouse?  If you’re a CDO eager to learn why Data Virtualization should be a part of a strategy to modernize your data ecosystem, then this episode of CDO Matters with Alberto Pan, the CTO of Denodo, is for you!  


Good morning, good afternoon, good evening, good whatever time it is, wherever you are on this amazing planet of ours. I am Malcolm Hawker. I am the host of the CDO Matters podcast and also the CDO of Profisee.

If you don’t know about Profisee, check us out. We make some amazing MDM software, but that’s not why we’re here today. We’re here today to talk to Mister Alberto Pan. He’s the CTO of Denodo, and we’re gonna talk all about what I’m just calling, loosely, modern data architectures. We’re gonna focus on the data estate, building a data foundation, building a modern data ecosystem.

So we’re gonna get into a bunch of different topics today. Maybe we’ll talk a little bit about fabric. Maybe we’ll talk a little bit about lake houses and swamps and warehouses and where things are going there.

Can’t have a data related podcast these days without maybe talking a little bit about AI. Maybe. Just a little bit.

Hopefully that’s grabbing your interest. Alberto, so nice to see you. Glad to have you join today.

Thank you, Malcolm. Nice to see you. Thanks for having me.

You’re very welcome. Are you in Spain right now?

Yes. Yes. Because, actually, Denodo is a US company, but the origins of the company are in Spain.

Actually, in the northwest of Spain, in Galicia. I don’t know if you are familiar with the geography of Spain.

And, actually, most of the research and development team is still based here. So I’m also based here in Spain. Yeah.

I’m kind of familiar with your part of the world. I like to do photography, and I watch a lot of landscape photography things on YouTube. And there’s some amazing coastline where you are. I’ve heard the coastline looks just brilliant, with rocks and cliffs.

Yeah.

It looks brilliant.

Yeah. Actually, it’s called the Coast of Death because... yeah. You know?

I shouldn’t be laughing about the Coast of Death. Okay. That doesn’t sound like a place where you wanna go on vacation.

Yeah. Yeah. No. It’s a fantastic place, actually, for going on vacation.

But it was not so great, you know, if you were on a ship in the middle of a storm. Okay.

Because the coast is really, really rocky. So it can be quite dangerous for ships, but it’s amazing for people and a very, very beautiful place.

Okay. So it’s the Coast of Death if you’re in a boat with poor navigation or poor propulsion.

But for vacations, it’s wonderful. Okay. That’s good. It’s good to know that. Let’s talk a little bit first about data virtualization. Let’s assume that I’m a newer CDO or a newer data leader. Maybe I’ve been promoted out of a business position.

Maybe I’m somebody that is now responsible for a data and analytics practice or defining some governance policies, but I’ve maybe come out of a business position.

And I’m looking at all the technologies, and I’m overwhelmed, man, because there’s a lot of stuff out there and there’s a lot of buzzwords.

Help that new CDO understand what data virtualization is. How is that different than, like, data integration? Why is data virtualization an important part of a modern data ecosystem?

Well, yes. Well, data virtualization basically allows you to create a logical data layer on top of multiple distributed data sources.

So if you have data distributed in multiple data sources and you want to combine that data and expose it in the language of the business, you know, maybe for creating reports, creating data products, or for exposing the data in the shape a certain application needs, data virtualization basically allows you to do that. It allows you to create these virtual views that will appear to your consuming applications as virtual tables, as if they were accessing a database.

But, actually, behind the scenes, the engine is going in real time to the data sources to get the data that is needed to answer each question, each of the queries, and returning it to the consuming applications in the appropriate form.

So it’s a way of doing virtual data integration, let’s say, because you don’t need to move the data to a central system before starting to query.
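To make that concrete, here is a minimal Python sketch of the idea, with two made-up sources (an orders table and a customer table); this is not Denodo’s actual API, just an illustration of query-time federation. The "virtual view" is a function evaluated when you query it, so nothing is copied in advance.

import pandas as pd

# Hypothetical "sources" -- in a real deployment these would be live
# connections to, say, a warehouse table and a SaaS API.
def fetch_orders() -> pd.DataFrame:
    return pd.DataFrame({"customer_id": [1, 2], "amount": [120.0, 75.5]})

def fetch_customers() -> pd.DataFrame:
    return pd.DataFrame({"customer_id": [1, 2], "name": ["Acme", "Globex"]})

def customer_revenue_view() -> pd.DataFrame:
    """A 'virtual view': nothing is persisted; each call goes back to
    the sources, joins them, and exposes business-friendly columns."""
    orders = fetch_orders()          # fetched at query time
    customers = fetch_customers()    # fetched at query time
    joined = orders.merge(customers, on="customer_id")
    return joined.rename(columns={"amount": "revenue"})[["name", "revenue"]]

print(customer_revenue_view())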

Okay. So how is that different than a semantic layer? Or are they different?

Actually, conceptually speaking, it’s very, very similar to a semantic layer. Right?

I could say that it’s a particular type of semantic layer. For instance, in this case, typically, the metaphor that we use to expose the data is relational.

So, basically, you are accessing virtual tables.

And those virtual tables, yeah, as you say, are combining data from several data sources and are exposing that using different naming conventions, typically the language of the business. So in that respect, it’s very similar to a semantic layer. But other semantic layers, for instance, tend to use a multidimensional model to expose the data. In this case, it’s typically relational.

But, yeah, in a nutshell, the idea is very similar: hide the complexity of the data landscape, and also allow the data landscape to evolve without affecting in any way the data consumers, who always get this consistent view of the data in the language of the business.

Okay. That’s helpful. So let me summarize, if I may, and keep me honest. It is a logical layer that allows you to, in essence, virtually break data silos.

By having a single place to go to get data for a common object, maybe a common object like a customer. Right? And maybe it’s customer table one, table two, table three, table twenty. We’ve got twenty tables of customer data, but you want one place to go get it. How does a virtualization layer handle the fact that it’s "customer" over here and it’s "account" over there and it’s "person" over here? Is there a management layer that allows you to kind of virtually map those together? How do those things get linked?

Yes. Data virtualization software like Denodo has this type of, let’s say, data integration and mapping functionality.

Okay.

We can also, of course, leverage specialized systems like MDM systems that, in many cases, have already created, let’s say, well-curated, high-data-quality views of key business entities. And then we can also combine that with all the additional data sources, for instance with transactional data, or data coming from SaaS applications, or data coming from external APIs, to, let’s say, further enrich the information about those key data entities to fit the needs of specific business use cases. So, yeah, in summary: yes, on one hand, data virtualization includes data integration capabilities to perform this type of matchings, mappings, and so on. But we can also leverage specialized systems. And in that case, you can simply put data virtualization on top of that, and it will provide a unified view over this MDM system and all the other systems that you may have in your organization.

Okay. But I assume this is all read. Like, this is all reading databases.

Does it support write?

It also supports writes when the data source also supports writes.

Right? So, for instance, if your data source is a database, you will also be able to write to the database. But, for instance, in many cases it happens that maybe your data source is an API or a software-as-a-service application, and maybe those operations are not allowed. And, also, when you are creating unified views, you know, it’s the same as writing to a view in a database. Some views cannot be updated; it depends on the definition of the view. But when the data source can be updated and your definition of the view also allows that, then, yes, data virtualization will allow you to write back to the data sources.
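As a toy illustration of that read/write asymmetry, here is a hypothetical Python sketch: a view passes writes through only when its underlying source supports them, and rejects them otherwise. The classes and flags are invented for illustration.

class VirtualView:
    """Illustrative only: a view passes writes through to its source
    when the source is writable (e.g., a database), and rejects them
    when it is not (e.g., a read-only API)."""
    def __init__(self, source, writable: bool):
        self.source = source
        self.writable = writable

    def read(self):
        return list(self.source)  # reads are always supported

    def write(self, row):
        if not self.writable:
            raise PermissionError("Source does not support write-back")
        self.source.append(row)   # pass the write through to the source

db_view = VirtualView(source=[{"id": 1}], writable=True)
api_view = VirtualView(source=[{"id": 2}], writable=False)
db_view.write({"id": 3})    # ok: passed through to the database
# api_view.write({"id": 4})  # would raise PermissionError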

Okay. So what are you persisting in this logical layer? Are you persisting just links back to the source? In essence, and I know this is gonna sound like a drastic oversimplification.

But in essence, are you managing kind of a lookup table, as it were, that points back down to all those various sources, or are you actually persisting chunks of the data itself?

I suspect the answer is no on the latter. Correct?

Yeah. By default, all the access to the data is done in real time. But, actually, we include, you know, many functionalities for caching, many types of caching. Right?

Maybe, for instance, you want to materialize or cache the data of this particular data source, but not the data from this other data source. Or maybe you want to cache only the data that is accessed more frequently, or maybe you want to have pre-aggregated caches, like materialized views, so, for instance, when you are computing aggregates, you don’t need to start from scratch for every query. So, let’s say, by default, access is in real time.

We have a really sophisticated query optimizer that rewrites the user queries in such a way that we maximize query pushdown to the data sources and minimize network traffic, so you don’t need to, you know, bring a lot of data through the network.

But we also complement that with these advanced caching functionalities that are transparent to the user. Actually, we are even able to use AI to automatically select the best caching strategies for each use case. So, yeah, behind the scenes, there is also some partial copying of the data. But, from the point of view of the user, this is pretty much transparent.
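A toy Python illustration of the pre-aggregated cache idea, with made-up names and a made-up freshness window: the first call computes the aggregate from the sources, and later calls within the window are served from the cache, transparently to the caller. Real products choose these strategies per source and per workload.

import time

# Hypothetical expensive source query
def revenue_by_region_from_source() -> dict:
    time.sleep(0.1)  # stand-in for a slow federated scan
    return {"EMEA": 1200.0, "AMER": 3400.0}

_cache: dict = {}
TTL_SECONDS = 300  # assumed freshness window

def revenue_by_region() -> dict:
    """Materialized-view-style cache: serve the pre-aggregated result
    if it is fresh, otherwise recompute from the sources."""
    entry = _cache.get("revenue_by_region")
    if entry and time.time() - entry[0] < TTL_SECONDS:
        return entry[1]                       # cache hit
    result = revenue_by_region_from_source()  # cache miss: go to sources
    _cache["revenue_by_region"] = (time.time(), result)
    return result

revenue_by_region()  # slow: computed from the sources
revenue_by_region()  # fast: served from the cache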

Got it. Okay. So where would this virtualization layer kind of physically sit within the architecture, or maybe logically sit? It does not necessarily physically sit. Is this something between the source applications and a data warehouse, or is it generally something kinda sitting beside the data warehouse? Where do these things fit together?

In the most typical pattern, data virtualization works on top of the analytics data sources. So, for instance, it works on top of maybe one or several data warehouses, one or several data lakes, ODSs, data marts, and so on. So it’s like the logical layer between the analytical data sources and the analytical applications. For instance, I would say that the most typical consumer of Denodo views may be Power BI, right, or this type of BI tool.

It’s only a particular example. Right? Also data science tools, also custom applications. Right? But the most typical pattern is where Denodo, or data virtualization, provides this unifying analytic view.

Gartner, for instance, calls this the logical data warehouse. Right? Yep. So you have multiple analytic systems that are specialized for different analytic needs.

Because, as you know, in modern organizations, in big organizations today, the data needs are very diverse.

In most cases, you are not able to address all your data needs with a single system in an optimal way. So, in a natural way, it happens that in bigger organizations, at the end of the day, you have different repositories. Maybe some of them are on premises. Some of them are in different cloud providers.

Maybe one of them is more specialized in data science use cases. Another one more in business intelligence use cases.

Data virtualization provides, like, the unified semantic layer, or the unified business data layer, on top of that, where you can abstract the consumers from where the data is really coming from, and, also, you have a single entry point to apply consistent security and governance policies.
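Here is a minimal Python sketch of that single-entry-point idea, with hypothetical names and roles: whichever backend answers the query, the same masking policy is applied before results are returned.

# Illustrative sketch: every query flows through one access layer,
# so a policy (here, masking an email column for non-privileged
# roles) is applied consistently no matter which backend answered.
MASKED = "***"

def apply_policies(rows: list, role: str) -> list:
    if role != "steward":                     # assumed privileged role
        return [{**r, "email": MASKED} for r in rows]
    return rows

def query(view_fn, role: str) -> list:
    """Single entry point: fetch from any source, then enforce policy."""
    return apply_policies(view_fn(), role)

def customers_from_warehouse():
    return [{"name": "Acme", "email": "ops@acme.example"}]

print(query(customers_from_warehouse, role="analyst"))  # email masked
print(query(customers_from_warehouse, role="steward"))  # email visible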

Got it. Okay. So I’m having some flashbacks to my time at Gartner, when we first started talking about the fabric, and there were a number of analysts that were saying, well, how is this different than the logical data warehouse? We’ll talk about that, because I’d like to answer that question and get into it a little bit, but I do wanna tie off on some of the common use cases of where this is useful. I think you mentioned a few.

I have to imagine that your customers tend to be fairly large organizations with potentially multiple databases, multiple warehouses, lots of mergers and acquisitions where new data sources are getting added all of the time, and it’s more effective or efficient to virtually connect them than to physically connect them. Does that sound correct?

Absolutely correct. Also, in this type of organization, you know, the typical architecture of having to centralize all the data is very hard. Right? For many reasons.

Sometimes it may be for technical reasons, because moving the data from some systems to a central repository may be hard. But it may also be for legal reasons; maybe some data cannot be moved outside of a certain geographical region. Or for organizational reasons. Right?

Because, you know, different business units in many cases want to retain ownership of the data. Right? And also, as I mentioned before, because the analytic needs in these organizations are very diverse. So it’s very, very hard to have one single system that is the most optimal in performance, functionality, and cost for all your needs.

That’s that’s very, very hard in big organizations.

So that’s where, you know, this abstraction layer basically allows you to, let’s say, use the best system for each use case while still having this consistent data access layer and this consistent security and governance layer for delivering the data. So that’s really another sweet spot.

Got it.

Let’s talk a little bit about the idea of centralization versus decentralization. At a lot of the conferences I’m going to these days, there seems to be this rebellion against what people call, you know, centralized patterns of anything.

Is it correct to assume that an enterprise-class virtualization layer could support domain-driven design, could support the idea of this domain having this set of rules and that domain having that set of rules? Is that a pattern you see?

Yes. Yes. As you know, in the last few years, we have heard a lot about this type of decentralized management pattern. Probably the most radical way of doing this would be the data mesh.

But we can also have, like, mixed patterns. Right? Because probably not every organization is ready to go with the data mesh, you know, on day one. But, certainly, you know, a big problem with centralization is bottlenecks.

Right?

Not only centralization from the technical standpoint of needing to use a single system for everything. Also, having a single team in charge of everything, that creates a lot of bottlenecks. Right?

And you also know what happens with bottlenecks when the business, in many cases, cannot wait. So if you are spending weeks delivering the data that is needed for a certain business need, then, you know, I think it’s very likely that the business will find a way to get the data anyway, going around your system. And at that point, you don’t even have the benefits of a centralized system, because you don’t have unified governance anymore, or unified security anymore. You are creating shadow IT. Right? And that’s why I like these patterns, like the data fabric, that provide you flexibility for decentralization. That does not mean that you need to go full decentralization, because, you know, it depends on the organization, and it’s also clear and obvious that in many cases trying to consolidate certain systems is a good idea.

But you should not be forced to always consolidate everything in the same place. That’s very, very rigid and very hard to sustain over time, I think, for a big organization.

Yeah. I fully and completely agree. I mean, the right answer almost always seems to be a hybrid approach, with a certain degree of centralization and a certain degree of federation or decentralization.

Because the fact remains, and I’m getting on one of my soapboxes a little bit here, the fact remains that there are centralized uses of data that occur at higher levels. The C-level wants to see things a certain way, and they’ve got their definitions, and functional levels have their definitions, and they can both be correct. So rather than anything that is either fully centralized or fully decentralized, generally the case that I’m making is for both.

One thing to keep in mind: I did have a podcast a few months ago where I went into the differences between MDM and a semantic layer, so we won’t cover those here. I was trying to find the episode number; I’ll look it up here in a minute.

I’d invite you to go check out that other podcast that I did, CDO Matters. I will share the episode number here in a little bit. There I go into detail around the difference between an MDM system and a semantic layer, or what we’re talking about here as a data virtualization layer, which largely operate the same way. The key difference that I would say is that MDM operates at an individual record level.

Mhmm.

Right? You know, it will master and manage and link and merge at a record level, where the rules can get very, very specific into how you actually link an account to a customer, or link a party to an account to a customer record. The rules are very granular. There’s data stewardship.

There’s a workflow component to it. Anyway, check out that previous episode, MDM versus semantic layers. If you’ve got some questions, if you were listening to Alberto talking and you’re saying, ah, wait a minute, that sounds a little bit like MDM.

There’s certainly some overlap. But, guys, that’s not uncommon in the world of data management. There’s overlap between MDM, virtualization layers, integration layers, data quality tools, data governance tools.

If overlap makes you uncomfortable, well, then this world will probably cause you a little bit of discomfort, because there’s certainly a degree of overlap in all of these tools. Let’s double back, Alberto, to the conversation about the logical data warehouse and the fabric. Again, I’m a newer CDO. I may not have a ton of technical background. I’m hearing a lot of buzz about a fabric, and this idea of a logical data warehouse makes sense. Is that what you described? I get that.

Help me better understand if there’s a difference between that and a fabric, or are they synonymous?

Yes or no?

They are not synonymous, but I typically say that the data fabric is the logical data warehouse on steroids. Okay.

Meaning that the data fabric shares with the logical data warehouse this idea of: okay, you will have multiple distributed sources specialized in different tasks, and you want to provide a consistent, unified view over all of that, in the language of the business, that is stable. That’s common with the logical data warehouse. And then on top of that, the data fabric, well, there are a couple of things that it does, but probably the main one is AI and metadata, being metadata-driven. Right? This idea of: okay, I will collect all types of active metadata.

Like, for instance, who is using what data products, when and how.

Also, I don’t know, bottlenecks, or what data sources are typically used together. And I will use all of that to add intelligence to the architecture. Right?

And this may mean, for instance, that the system is able to automatically suggest where you should move a particular workload to optimize your cost. Or maybe, when you are creating data products, past activity in the system is used to give you automatic recommendations about how to combine data, or things like that. Right? So I would say that’s one of the key differences.

Another key difference, I think, is also the idea of a catalog or marketplace. So I would say that, in general, the fabric is much more, let’s say, data-product oriented. Mhmm. While the logical data warehouse maybe was more focused specifically on data integration. And on top of data integration, to actually create data products, you need a lot of metadata.

Right? A lot of metadata about how this should be used, what the required levels of data quality are. You also need an infrastructure to discover data products. Right?

So that’s one thing that the fabric puts on top of the logical data warehouse. And the other key one is this automation and use of AI based on the active metadata.

So, yes, it’s very, very similar. The content is very similar, but I would say it incorporates more advanced concepts. Right?

Yeah. For sure. I love that definition, the data fabric as a logical data warehouse on steroids, but I’d even go a little bit farther.

I mean, at least in Microsoft’s version of the fabric, you know, you can spin up Spark jobs. You can spin up data pipelines. There are a lot of things you can do in there that I would never expect a virtualization layer to handle. So maybe one way to think about it, and this is just Gartner’s view.

But in their original view of the data fabric, a critical enabling capability was data virtualization. Right? The ability to get data from anywhere across a very diverse ecosystem.

One of the layers in the fabric was, and is, I would argue, just this logical data warehouse. Then on top of that, at least what Microsoft is doing is layering a whole bunch of other capabilities, including data governance, including metadata management like you were talking about, including even MDM. Profisee just released a native Fabric workload where you can actually instantiate and create an MDM workflow from within Fabric. So, great way to say it. It’s a lot of things on steroids, but, you know,

I wouldn’t be surprised, yeah, if people have some confusion around what the fabric is, because it touches on a lot of different pieces.

Yeah. Absolutely. And it’s also true that sometimes, you know, different vendors use the terms slightly differently. Yes.

Probably the concept of fabric in Microsoft, I would say, is not exactly the original concept from Gartner. It’s slightly different. But, yeah, I totally agree that one key idea in Fabric is this idea of a complete platform. Right?

And that means a lot of things. That, for instance, means, obviously, some of the things that you mentioned. Right?

Like, for instance, not only logical access to the data, but also the ability to quickly spin up a new system that you can use on the fly to process data for a certain specific need, and so on. So, yeah, it has this idea of being a more comprehensive platform, but many of the key concepts are still common. Yeah.

Yeah. And you’re absolutely right. So the way that Microsoft is describing a fabric today is very different than Gartner’s original version of a fabric, which goes way farther. Gartner’s original version of the fabric did get into the use of metadata to automate data governance and data management, even, when you start to leverage AI and GenAI, to get into things like automating the building of pipelines, automating the mapping of elements between source and target, all of that stuff.

That’s cool. But there is an enabling component there, this logical data warehouse, that is critical and has to be there. So, yeah, boy, if I was new to this world, there are so many technologies out there and so many buzzwords, just cutting through that stuff and trying to figure it all out would be a months-long enterprise.

Let’s talk a little bit about data products. You mentioned those previously.

Now, you’re not a data marketplace per se, but I can see how this logical data warehouse would play a critical role in helping you support a data product strategy. Tell us more. What do you think?

Yeah. Well, at the end of the day, basically, a data product is a way to expose data in a way that can be easily used and also easily reused, right, to create other data products. And also guaranteeing, like, a common management of all these data products that maybe are being created by several independent teams. So if you think about it, that’s exactly what data virtualization does in many cases. Right? With data virtualization, you can create, let’s say, business-friendly views of data that is in any data source. Also, by the way it works, it’s very natural to reuse data products, because, actually, in data virtualization you typically have a layered architecture of components, where one component is defined using other components.

So that provides a very, very natural way of doing reuse. It also supports very naturally the things that you need to ensure interoperability between data products. Right? Another thing that you typically want is: okay, if I am creating a data product that provides customer data, and you are creating another data product that also provides customer data enriched with other information, you want those two data products to use consistent semantics. And, also, you want them to be able to interoperate.

In data virtualization, you have, for instance, the concept of top-down modeling, which is basically what we would call today contract-based development. So, basically, when you are creating a data product, you can enforce contracts for the data products.

So, basically, it allows you to have multiple teams that are creating data products using their own systems. They don’t need to use the same system for everything. But at the same time, you are able to guarantee consistent semantics, to provide a way of reusing components, and also a way to enforce consistent security and governance rules. So it’s one of our most typical use cases today.
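A minimal Python sketch of a contract check, with a hypothetical schema: the contract declares the columns and types a data product must expose, and publishing fails if the product’s output does not satisfy it. This is an illustration of the idea, not any product’s actual contract mechanism.

# Illustrative contract check: the "contract" declares the columns and
# types a data product must expose; publishing fails if the product's
# output doesn't satisfy it. Names and schema are hypothetical.
CUSTOMER_CONTRACT = {"customer_id": int, "name": str, "revenue": float}

def validate_contract(rows: list, contract: dict) -> None:
    for row in rows:
        missing = contract.keys() - row.keys()
        if missing:
            raise ValueError(f"Contract violation: missing {missing}")
        for col, typ in contract.items():
            if not isinstance(row[col], typ):
                raise TypeError(f"Contract violation: {col} is not {typ.__name__}")

product = [{"customer_id": 1, "name": "Acme", "revenue": 120.0}]
validate_contract(product, CUSTOMER_CONTRACT)  # passes silently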

So that last bit, when you started to talk about contracts and enforcement of data governance and data security rules, that last bit is, I think, really important, because today, what I see with a lot of data catalogs is that the catalogs may be documenting all the metadata. They may be documenting, you know, this field is this wide, this is an integer, this is a varchar. They may be documenting the components of what could be in a data contract, but they may not necessarily be enforcing them at run time, if that makes sense. What you described is that if I’m running a query, or if I just wanna execute an API, hit some API, at the point of execution you can enforce those rules. That is a really important component here, because so many data management systems out there are not actually enforcing governance policies. They may just be defining them.

That’s interesting. That’s interesting.

Yeah. I think one problem with traditional catalogs is exactly that. Right? Especially in big organizations, it’s very easy for them to get a little disconnected from reality. So what you see in the catalog is not always what you see in reality.

So I think one key idea of the data fabric, as originally defined by Gartner, was also removing that mismatch. Right? Let’s say that the place where you define the policies should also be the place where the policies are enforced. So that means, at the end of the day, that you need this common layer, this common access layer.

So the policies are enforced there. Right? I think this is a really, really powerful idea that actually has a drastic impact. You know, because sometimes when we talk about the data fabric or the logical data warehouse or logical architectures, we focus maybe too much on the data integration part, but I think this part is probably the one that provides, you know, the greatest benefits.

Right? Because the policies that you define there, by definition, will be applied. Right?

Also, if you’re in a distributed environment, if you have multiple teams creating data products, you can also use the idea of federated governance. Meaning that, okay, yes, you’re creating data products with the data in your system, and you even have the freedom to decide who can access your data products.

But at the same time, I am able to enforce policies that will apply to all data products. For instance, because of regulations or whatever. For instance, here in Europe, you know, with GDPR policies. Right?

So this idea of: okay, you have the freedom to have the data in your systems. You design your products with a lot of autonomy. But at the same time, the policies that I have established are not only on paper or in a static catalog. I can guarantee that they will be applied.

I think that’s really powerful.
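To illustrate federated governance in miniature, here is a hypothetical Python sketch: each owning team decides who may read its data product, while a global policy, here a stand-in for a GDPR-style redaction rule, is applied to every product regardless of owner. All names are invented for illustration.

def global_policy(row: dict) -> dict:
    # Applied to every data product, no matter which team owns it.
    return {k: v for k, v in row.items() if k not in {"ssn", "birthdate"}}

class DataProduct:
    def __init__(self, rows: list, allowed_roles: set):
        self.rows = rows
        self.allowed_roles = allowed_roles  # domain-level autonomy

    def read(self, role: str) -> list:
        if role not in self.allowed_roles:  # local grant, owned by the domain
            raise PermissionError(f"{role} not granted by the owning team")
        return [global_policy(r) for r in self.rows]  # global rule always applies

sales = DataProduct([{"name": "Ana", "ssn": "123-45-6789", "region": "EMEA"}],
                    allowed_roles={"sales_analyst"})
print(sales.read("sales_analyst"))  # ssn redacted by the global policy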

I totally agree, and I would strongly advise, you know, all the CDOs out there: if you’re looking at data governance software, including a data catalog, ask the question of how it is used to enforce the rules.

And I think what you will find is that the answer will typically be, well, those rules are enforced within the operational systems that are managing that data. Right? The rules are enforced in the CRM, or the rules are enforced in Power BI about who can log in to Power BI. But that may not be enough.

Right? So what you’re hearing Alberto say here is that you can enforce at run time who can see this, who can do this. You can set up these contracts which, you know, create a structured relationship between producers and consumers of data. I love it.

Alright, Alberto. In our last few minutes, let’s talk a little bit, of course, about the thing everybody’s talking about. Let’s talk about AI.

But I don’t think we need to talk necessarily about how a logical data warehouse would enable the use of AI, because I think, to me, that’s slightly self-evident. Right? You’re creating a single place where you can run a, you know, machine learning process, some sort of Spark process, against that data to go run an algorithm, do anything. I’m a little more interested to understand where you see things headed in the use of AI within products like Denodo and others that you compete with. What do you see over the next few months and years? Where do you see AI kind of helping to augment or automate some of the management and processes that these logical data warehouses support?

Well, for instance, in development. I think we will see, in the development stage, for instance, when creating integration pipelines or when defining virtual views or data products or whatever, I think the new models that are starting to appear that are not pure large language models, I am thinking, for instance, of o1. Or now R1, DeepSeek’s R1, which, you know, is the model of the week. Right? Yes.

Of the week. Exactly. Yes.

Exactly. And I think these models are starting to get really good at, you know, all the flavors of code generation, and that includes SQL generation, or the languages that you use to define pipelines. Right? Python generation and so on.

So, I think, obviously, this is already happening. All data management products have already incorporated some type of AI assistant that helps you, you know, at certain stages of the development process. But I think, like, in twenty twenty five, we will see a big jump. Right?

It’s like the assistants so far have been restricted to relatively small things, or, I don’t know, some types of restricted automatic suggestions. I think in twenty twenty five, and maybe also the first half of twenty twenty six, because, as you know, it takes time to get this into the products.

We will see, you know, a clear jump. I think these models are getting really good at this type of particular task. Maybe, you know, the older generation, the GPTs and so on, were also good, were also useful, but still not as good as I think this new generation is going to be. And, obviously, also in all that is related to data democratization, that will be important. For instance, in data management, every user interface now also includes a natural language interface, and the quality keeps improving.

And, also, it’s getting easier and easier to fine-tune models and even to have your own local models. Again, DeepSeek may be another example. And that means that you will be able to customize them more, and that means they will be more effective for your particular use cases in your organization, and more and more reliable.

So, again, that’s something that we are already seeing today, but I think it will accelerate. Right? It will accelerate this year and next year. And I think the change will be quite evident.

Yeah. I couldn’t agree more. I just spent the weekend in Austin, Texas at Data Day Texas, listening to a bunch of presentations over the last couple of days. With the complexity of these RAG patterns, and the complexity of the patterns being used to fine-tune and iterate, and fast-iterate on the tuning process, my mind was just, like, you know, graphs, RAG, DAGs. Hey.

All of these patterns. And, you know, what I saw was the evolution of a really, really powerful, I don’t know how to describe it, maybe like middleware. I don’t even know if that’s the right word. But there is software that is being used to create these prompts that are incredibly powerful, to do things like what you just described. Right? Like, you know, write me a Python job that will go get my customer data from all of these sources and, you know, present it back in this format.

Really, really powerful stuff. So, well, on that note: AI, it’s out there. It’s coming. Yeah.

DeepSeek, that’s the flavor of the week. But we’re recording this, you know, in late January, and it publishes in February. Who knows what the next flavor will be. Yeah.

But I wanna thank you, Alberto. This has been wonderful. I learned a lot about data virtualization and logical data warehouses today. By the way, folks, if you’re still with us, episode fifty four.

I had a great conversation with Sanjeev Mohan, a super, super smart guy, around semantic layers, understanding what the differences are, and talking a little bit more about them versus MDM. So check that out. Thank you so much, Alberto.

Thank you, Malcolm. It’s been a pleasure.

Alright. With that, I will bid everybody adieu. Thank you for tuning in. Thank you for watching on YouTube.

Thank you for downloading through Spotify or Google or wherever you get your podcasts. If you haven’t already, please consider subscribing to the podcast. That would be an absolute thrill. I will see everybody again in another episode of CDO Matters sometime very soon.

Thanks all. Bye for now.

ABOUT THE SHOW

How can today’s Chief Data Officers help their organizations become more data-driven? Join former Gartner analyst Malcolm Hawker as he interviews thought leaders on all things data management – ranging from data fabrics to blockchain and more — and learns why they matter to today’s CDOs. If you want to dig deep into the CDO Matters that are top-of-mind for today’s modern data leaders, this show is for you.

Malcolm Hawker

Malcolm Hawker is an experienced thought leader in data management and governance and has consulted on thousands of software implementations in his years as a Gartner analyst, architect at Dun & Bradstreet and more. Now as an evangelist for helping companies become truly data-driven, he’s here to help CDOs understand how data can be a competitive advantage.
