
The CDO Matters Podcast Episode 78

Data-Driven Healthcare with Jane Urban


Episode Overview:

In this episode of the CDO Matters Podcast, Malcolm Hawker welcomes Jane Urban, the CDAO of Improzo, for a deep dive into the transformative role of data and AI in healthcare.
 
From predicting patient outcomes to improving care delivery, Jane shares practical insights drawn from her extensive experience. Whether you’re in healthcare, pharma, or life sciences, this conversation highlights how data can drive real-world impact.


Good morning, good afternoon, good evening, good whatever time it is, wherever you are in this amazing blue floating orb of ours.

I am so thrilled that you have chosen to spend the next forty, fifty, sixty minutes with us. We’ll see how long it goes. Probably more like forty or forty-five minutes.

I am Malcolm Hawker. Well, the proverbial us. I’m here with Jane as well.

It’s both of us. If you’re just listening to the podcast, there’s more than one of us in the studio today. I’m Malcolm. I’m the CDO of Profisee, and I am also the host of the CDO Matters podcast.

Today, we are going to talk about data in health care. We’re gonna talk about some cool applications of AI and data in the health care space. We’ll talk about things like potentially predicting patient outcomes and improving patient outcomes. So if you’re in the health care space, or you’re related to the health care space, maybe you’re in pharma, maybe you’re in biopharma, maybe you’re even in ag and doing some fun stuff in the ag space, I think it’s all gonna be relevant today.

I’m thrilled to be joined by Jane Urban, who’s gonna share her infinite wisdom in this area. Jane, thanks for joining us tonight.

 

Thanks for having me, Malcolm. It’s exciting. I can always talk about this, so this is wonderful.

 

You and me both, much to the chagrin of my wife. Anytime we’re in public, I could talk about this stuff at length. Jane, why don’t you tell us a little bit about your background and how you come at your wisdom in the health care space?

 

Well, I would say we’re all students of this space, but my background is actually in biomedical engineering and economics. That’s my root degree, and I would say it was a precursor to what is data science today in a lot of ways.

 

And I’ve been working in the health care industry as a practitioner, alternating between those two disciplines, for almost twenty years now. And I’m definitely someone who cares deeply about the notion of using data to drive better decision making in pharma, especially the notion of finding and identifying unmet medical needs using data. So I’ve worked from the very early days on a DOS mainframe, writing SAS code against Medicare and Medicaid data way back, all the way up to now, using AI and other more modern techniques to make the best of data.

 

I’ve also spent time in the operations and kind of analytics spaces as well, so kind of adjuncts to data. And I think, you know, when you can bring all that together, it’s pretty exciting stuff.

 

Well, so I’m excited because we had our prep call. It was actually just yesterday. We had our prep call to figure out what we were gonna talk about. It was kind of a getting-to-know-you, because I didn’t know you. We were introduced indirectly through our friend in common, Angelie Bansal, who’s a wonderful and truly talented human being. So thankful to Angelie for making that introduction.

 

But as we were chatting, we just started talking about health care and data and AI and all this stuff. And I looked at my watch, and forty minutes had gone by. I was like, damn. We should have pressed play.

 

So my hope is that we... or not play. Press record, rather. Yes.

 

Button. Uh-huh.

 

It would have been just so bad, actually.

 

But whatever button we can find.

 

I’m showing my age because I’m like, these are tape recorder metaphors. Right?

 

That’s what I’m thinking of.

 

If you hit play instead of record, you’re just playing something, and you wouldn’t have gotten any of it.

 

Yeah. That’s exactly right. But you know you’re old when you start making a lot of these references where younger people would be like, press what? What do you mean?

 

Okay.

 

I had to explain to graduate students what a fax machine was at a lecture I was giving a few weeks ago, and they were looking at me all sideways, like, a fax machine? I was like, it’s as if the phone was a printer.

 

Yeah.

 

I was trying to explain it in context, and they were like, okay, why would that exist?

 

Why wouldn’t you just email it? And I’m like, yeah, I get that.

 

Yeah. Yeah.

 

Or the foils. Were you old enough to be in college where they would use an overhead projector, where the professors would write on clear plastic sheets and it would get projected? They called them foils, anyway. Yep. Yeah. People are like, what are these old farts talking about?

 

That’s me. I know, that is me. This is all real here.

 

The overhead projector, though. I thought that was pretty cool tech for its day, actually, when you think about smart boards and all that now.

 

Better than the chalkboards.

 

Right? I’m old enough to remember that as well. I was often given the unique distinction of being the teacher’s pet who got to clean the chalkboards. At the time, I was like, oh, I’m special, I get to clean the chalkboards, when, lo and behold, it was a very dirty job.

 

Probably lung cancer in your future as a result.

 

It’s all good.

 

Well, how would we predict that? That’s a great segue.

 

Only if the data captured chalkboard duty as one of the, like, variables.

 

Although who knows? It could be buried in there somewhere. Some of these data sets are quite deep.

 

As part of my academic record, there’d be chalkboard duty and there would be film and AV duty. I was also the keener that was running the film projector, where you had to rewind the film after it completed. Anyway, we’re rabbit holing. People are like, man, we came for the data, and we got eighties and nineties memorabilia.

 

So when I asked you yesterday, hey, what are you passionate about? What are some of the things that make you wanna get up and go to work in the morning? You started by sharing insight around understanding and predicting outcomes, which I thought was really cool. So I will bounce it back to you and say, why don’t you share what you shared yesterday around some of the things you’ve been working on and looking at around using AI to be more predictive when it comes to health outcomes?

 

Yeah. I think what’s so exciting about where we are, especially in this moment, is that we’ve had, in healthcare, a lot of data. I always say this is a great place to come if you like to play with data, because we have no shortage of information, and a lot of it’s very longitudinal, too, so you can look backwards.

 

At points in my career, I’ve had access to ten-plus years of data to look backwards at patient populations.

 

So there is this really lovely depth and breadth of data available in healthcare, because of all the capture of, basically, medical claims, which is really the main source for a lot of this information. So what you can do is combine, for a given individual, you don’t know their name, but you have an identifier number, all of the things that have happened to them, in some cases for a whole decade, especially if they don’t change their insurance too much. You can really have a beautiful, complete picture of their medical history. And a lot of the modeling that’s done, this is probably a little bit older AI, not the most up-to-date generative stuff, but more classic statistical modeling, allows you to look at a cohort of people who are undiagnosed but have a lot of similar characteristics to folks who are definitively diagnosed, and then use that model to say these are the people with the highest percent chance that they may also have this disease.
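What Jane describes here is essentially a lookalike, or propensity, model built on claims features. A minimal sketch in Python, assuming a de-identified table of numeric claims features with a confirmed-diagnosis flag (the file name, columns, and model choice are all hypothetical, and in practice this is a positive-unlabeled problem rather than clean supervised classification):

```python
# Hypothetical sketch: score undiagnosed patients by similarity to a
# definitively diagnosed cohort, using de-identified claims features.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

claims = pd.read_csv("claims_features.csv")  # one row per anonymized patient ID
features = [c for c in claims.columns if c not in ("patient_id", "confirmed_dx")]

X_train, X_test, y_train, y_test = train_test_split(
    claims[features], claims["confirmed_dx"],
    stratify=claims["confirmed_dx"], random_state=0,
)
model = GradientBoostingClassifier().fit(X_train, y_train)
print("holdout AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Score everyone without a confirmed diagnosis and surface the top lookalikes.
undiagnosed = claims[claims["confirmed_dx"] == 0].copy()
undiagnosed["dx_score"] = model.predict_proba(undiagnosed[features])[:, 1]
print(undiagnosed.nlargest(20, "dx_score")[["patient_id", "dx_score"]])
```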

 

And this works well for things that may take a long time to diagnose, sometimes autoimmune disorders, things that involve a number of seemingly unrelated symptoms where maybe that constellation comes together and you see a pattern that would allow you to identify someone as having this more rare or congenital type of disease. And there’s also an element of, if it’s rare enough, people haven’t seen it before. So you may be going to a clinician who just has no idea and has never encountered this particular disease state, whereas if you were lucky enough to walk into a place, say, in Boston, like where I live, you might have run into someone who’s like, oh, I’ve seen this particular problem.

 

I know what this is. So a lot of the work that I’ve done is to close that gap between your top-notch academic medical person who’s seen lots of different interesting things and maybe someone who’s a little bit less experienced. We can solve for some of that difference in knowledge and then ultimately, hopefully, activate the physician to start looking for that patient. You might not have their name, but some characteristics of them can help the physician say, oh, I kinda think I know who you’re talking about.

 

Let me reach out to them and see if we can get them, you know, like, a diagnosis and ideally also a treatment for that diagnosis.

 

I get a little excited.

 

So where are those types of practices happening? I have to assume they’re happening in any of the larger providers.

 

Right? Yeah.

 

Well, it depends, obviously, on what we’re talking about: which disease state, what kinds of data you need to identify the disease. In some cases, it could just be different diagnoses over time. Maybe you’ve had some kind of unknown swelling, or you had a hernia, or you had a bone break, and a kind of weird bone to break young, or something like that. Those sorts of unusual things, put them all together, and it’s some sort of an autoimmune problem.

 

There’s also an element of lab results and biopsies and tests, and that’s obviously more specific to certain disease states where there is a definitive, this is present, therefore you have this infection or this cancer. So there’s some of that. And then there’s even imaging, right? That’s a whole other, newer field: being able to dig into all the pictures of different scans and things that have been done. And that’s where I think there’s gonna be some really interesting juxtaposition of human skill at interpreting a visual kind of data point relative to technology.

 

And that could be another place where I think we’ll see some changes over time, as computers get better and better at identifying patterns in visual fields. But in general, all of those things can be relevant depending on your disease state. I think what we talked about yesterday is, you know, I’m not a medical professional, I don’t have that background, but in a lot of ways the partnership that the data folks can have with folks who have a medical background, who maybe are formerly prescribers or formerly practitioners in a space, as pharmacists, as nurses, as whatever kind of health care professional, those are folks that can also help guide the analytics to say, this is most commonly the way patients show up with this disease.

 

They may have this low lab value and this high lab value, and they’ve been to the emergency room in the last three months, and they’re also on a steroid, and so on. You have this whole list of things that might happen. And so then you can start looking for that in the data as well and saying, okay, does that seem to be a real pattern in the real world? And can we then start looking for more people who look just like that? And how many more of them?

 

Where are they? And I think when you’re on to something, you start seeing patterns where maybe it’s seventy five percent or eighty percent of what you thought the patient should look like. It’s not perfect, but it’s getting close, and that allows you to say, okay, we’re on the right track. Our model is learning and figuring this out kind of well.
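As a rough sketch of that clinician-guided pattern check: encode the expert’s criteria as rules and measure what share of them each candidate patient satisfies. The flags, column names, and thresholds below are invented for illustration:

```python
import pandas as pd

# Criteria a medical expert might list for a suspected disease state
# (all column names and cutoffs are hypothetical).
criteria = {
    "low_lab_a":    lambda df: df["lab_a"] < 3.5,
    "high_lab_b":   lambda df: df["lab_b"] > 140,
    "er_visit_90d": lambda df: df["days_since_er"] <= 90,
    "on_steroid":   lambda df: df["steroid_rx"] == 1,
}

patients = pd.read_csv("candidate_cohort.csv")
checks = pd.DataFrame({name: rule(patients) for name, rule in criteria.items()})
patients["match_rate"] = checks.mean(axis=1)  # fraction of criteria satisfied

# A cohort matching 75-80% of the expected profile suggests the hypothesized
# pattern is real, even if it isn't perfect.
print(patients["match_rate"].describe())
print((patients["match_rate"] >= 0.75).sum(), "patients match 75%+ of criteria")
```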

 

So that’s a big part of the art and science, right, is knowing enough about the medical treatment practices and how people would look at a patient to say this looks right or this is improbable.

 

You know, there’s just no way that this would happen in the real world.

 

So I find this fascinating, because we pay a lot of, what’s the right way to say this, arguably lip service in the data and analytics space to domain expertise. We loosely say domain expertise. Yeah.

 

Because if you’re gonna build a report for somebody in procurement, you should probably know how you procure stuff. If you’re gonna build a report for somebody in sales, you should probably know how they sell stuff or how they market. And yes, all these things are important, and we do need them, by the way. I’m not trying to downplay that. But in the world you just described, holy cow.

 

Talk about domain expertise, because the line between data scientist and medical scientist would start to get really blurry really fast.

 

Right? I can even see, like, what a killer, maybe a wrong choice of words there.

 

What a fantastic combination to have. Hopefully I feel that.

 

An MD and a data science background. Because from a diagnostic perspective, the things that you’re talking about, the number of variables that could be at play, the things that you should probably be considering, even beyond the things you talked about: imagery, symptoms being presented, nutrition, exercise. All of these things, these models could get crazy complex.

 

And I would wonder, if the average data scientist sat down with one of these doctors, even the lexicon, the words, could be really confounding. Or do you see certain types of data scientists, have you seen people, that start to veer towards that world and go really deep?

 

Yeah. I was gonna say, I think some of the data scientists that I’ve been fortunate enough to work with and have on my teams have had that kind of depth. And with some of them, I think there’s an interesting crossover point. So pharma, in general, tends to be very siloed between the commercial folks, who make sure, once a product has been approved by the FDA, the messaging about it is out there, there’s those TV ads, all those things you’ve probably encountered, in the US anyway, because we have that direct-to-consumer channel, and that’s one big chunk of the pharma company. But the other half is this research and development chunk. And this is where, I feel like... chunk is a technical term, by the way.

 

Yes.

 

But what I’ve seen is that with the data scientists who work in those two realms, I was actually fortunate to have some of them go from my team over to the R&D version of that work, and vice versa. So we were starting to see that cross pollination and the value it can create for both teams, because there’s actually different kinds of data these teams are looking at and using. And by bringing some cross pollination of talent, there’s a deep expertise there, for sure, that’s pretty common amongst both: understanding the disease state and knowing typical treatment patterns, what other drugs you might use to try to treat this problem before you get to more complicated, maybe injectable, products. There’s always this notion of, can you treat this with something over the counter first, or do you need something prescribed? And so on and so forth as you get more serious in your disease progression.

 

So I think that is part of it: really getting to know what the disease and the treatment look like, and then where maybe it goes sideways. I think that’s the other interesting part, because so many times, right, you have something wrong, you go to whatever, primary care, urgent care, some first point of care, and sometimes they get it totally right and spot on. Like, oh, you need an antibiotic, and you get your script and you take your meds and you feel much better and you just move on with your life. But in these cases, especially if it’s something more autoimmune or chronic, you might do all that stuff and then it doesn’t seem like it got any better.

 

And so you’re sitting in a state of maybe just confusion and still feeling sick, and then you have to do your next intervention. So you’re looking at that long-term piece in this data as well, and that’s where you can start to see people who have unresolved treatment and are still looking for something better. And that can be your indicator something is wrong. But on the other side, you can make it too complicated, and then it’s hard to know what was really driving any of it.

 

I know when we were first trying to build some of these algorithms, we found there were thirty things that seemed to have some statistical significance across the population, and that’s just a little overwhelming for anyone to make decisions with or think about. So that’s when you go back around to your clinicians and people and say, well, what are the two or three things that seem to really matter? And so then it’s more about, maybe a recent emergency room visit is kind of important. That’s what you usually see: this patient had just gone to the emergency room, had a diagnosis or some sort of additional testing done, and then they show up in your office.

 

Or maybe they’ve been on multiple rounds of steroids and they’re not getting results, and so that means something about this is not just fixed with steroids or antibiotics. So it’s those kinds of contextual things that you can then layer between the many possibilities you could get from just the data by itself. Your data scientist would be like, here are the thirty things. And then your medical scientist could say, okay, but of the thirty, four of them are really what’s exciting and interesting about this population, and the other twenty-six kinda don’t matter that much.
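One way to operationalize that narrowing, sketched under the same hypothetical claims table as earlier, is to rank the model’s feature importances and then keep only the handful that clinicians confirm matter:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

claims = pd.read_csv("claims_features.csv")  # hypothetical, as before
features = [c for c in claims.columns if c not in ("patient_id", "confirmed_dx")]
model = GradientBoostingClassifier().fit(claims[features], claims["confirmed_dx"])

# Rank every candidate signal the model found useful.
importance = pd.Series(model.feature_importances_, index=features)
ranked = importance.sort_values(ascending=False)

print(ranked.head(30))      # the overwhelming "thirty things"
shortlist = ranked.head(4)  # the handful a clinician says really matter
print(shortlist)
```

The ranking is mechanical; the cut from thirty to four is the clinician review step Jane describes, not something the code decides on its own.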

 

And so then you can start to hone in on a story. And I think that’s the final step in all of this, right: how do you translate this into something that other people can take action from and do something with? Because if you say, if you see any of these thirty things, doctor, you should tell us and we’ll help you find that patient, that’s just not gonna work in a practical sense.

 

So I think the final art twist of the whole thing is how do you tell a story with that data so that it’s useful, and then you can have that conversation with whoever it is, whether it’s your medical team or your sales team and then your physician who’s trying to figure out what to do.

 

I think that’s the interesting last mile of it all that makes it especially exciting when you can get it right. I think that’s what we were talking about. When you can get to a point where someone is actually getting relief, getting a diagnosis, getting treatment because of some data geeks who’ve been running a lot of models in Python or whatever, that’s pretty cool. That’s a neat way to make a difference with your skills. If you didn’t go to medical school, can you still help somebody? Turns out, yeah. And that’s pretty exciting.

 

Well, there’s that, which is the most important metric and a fabulous metric, because we talk a lot on this podcast and through all of my content about the importance of tracking and quantifying business benefits. Longevity, living, would certainly be one of the best metrics to track. Not to mention the fact that there is a lot of money in them thar hills for pharma companies that figure out some of this stuff, whether it’s treatment plans or even just an effective diagnosis. There can be a reasonable return on that.

 

So it sounds like incredibly rewarding work. To the story, though, to my point: for stories to be relevant, we need to be using a shared language. And if doctors are speaking doctor and we’re speaking data, to me there’s something interesting there. At the very least, we as data people need to figure out how to use at least some of that language and speak in a way that’s gonna be compelling to the MDs.

 

No, you’re right. And I do think there’s also an element of listening and learning from clinicians. There’s this qualitative, market research angle to this too, right, where you’re gonna listen to a bunch of different physicians, whether through more formalized advisory-board-type things where you’re listening and learning, or even just from former clinicians coming and talking to you about what they think might be going on. But either way, that notion of what a doctor’s experience is and how they think about this particular disease state and patient population can go a long way in taking a lot of steps out of your data science exercise, because you already know roughly what you should be looking for and what not to look for.

 

What you’d expect to see is kind of not that exciting. You don’t wanna run an entire analysis to say older people are more likely to be sick or something. It’s like, well, I could have told you that without doing an analytical exercise. But I think sometimes the hang-up you have as a data scientist is that you’re just running different kinds of models and correlations and not then stepping back and saying, what does that mean for useful information or decision making? So that’s the fun of it, actually.

 

It’s the art of trying to combine the true brute force analytical science piece with the practical reality of what it’s like to be a patient. And actually, my early work was in primary market research. I spent a fair amount of time doing that early in my career. I think that’s helped a lot in how I tackle the data science piece, because you have to combine the quant and the qual together to really get the story straight.

 

There’s gonna be, of course, patterns that you find in the data that you can link up to stories, but there could also be surprises in the data, where you say, I thought everybody stepped from point A to point B to point C, and that’s the progression we’re hearing when we talk to clinicians or hear about this particular disease state. But then in the data, people are bouncing all over the place. They might go from A to C, back to B, back to A, back to C, and so it’s not as clear and linear as the story you get from a clinician.

 

And so that’s the other piece that’s interesting about this too: the confusion in the real world is real. And I think sometimes the medical care is much jumpier and more bouncy and less clear for patients than we’d like it to be when we talk to your ideal key opinion leader about their experience, because they may be in a place that’s just inordinately well oiled around diagnosing and treating and dealing with this particular disease. But if you’re in a non-academic place, or a place that hasn’t seen this ever before, it probably isn’t gonna be as smooth sailing to navigate the healthcare system and get to the care you need.

 

So that’s part of it too: it’s heterogeneous, right? There’s not one experience across the whole population for anything, pretty much. There’s no consistency.

 

So you have to help with that. How do you create some level playing field for analysis, and then ultimately for suggestions and advice for clinicians?

 

So what I’m hearing you say is that people are complex organic beings, and health care providers and health care companies and large pharma companies are also complex, arguably slightly organic beings.

 

And somebody in the data world needs to be able to navigate between both of those. And I’m also hearing you say, and I’m paraphrasing now, that this work is inherently iterative. It’s inherently experimental to a certain degree. You’re nodding. Okay. That’s good.

 

Yes, I’m nodding. Well, I think it has to be. Right? Because you’re not gonna know until you pressure test certain hypotheses and see if they were borne out in the real world or not.

 

Right.

 

And that they’re validated by someone who’s got a true scientific background in this area. That’s the other piece too: consistently looping back with your teams that know their stuff, or with clinicians, to say, does this resonate with you?

 

So, metaphorically speaking, you’re throwing a lot of pieces of spaghetti at the fridge. Yep.

 

Seeing what sticks. And that’s just the natural process of trying to solve these problems within extremely complex organizations and with extremely complex human beings, and that’s okay. That’s something for CDOs on more of the, let’s call it traditional data and analytics side of the house, the dashboard people, the reports folks, or the data management people like me, who tend to see everything through these very deterministic lenses. Right? It is or it isn’t. It’s a zero or a one. It’s garbage in or garbage out. All these very deterministic ways of looking at it.

 

And what I’m hearing you talk about is that data science is this very experimental thing where you’re not gonna get a hundred percent accuracy from the models on day one. That this is something that needs to be experimented on, and it’s inherently an R&D process that’s gonna take some time.

 

So I think that’s an important message for anybody who may be going into this world, or maybe inheriting a data science team, or maybe even inheriting an AI function. Yeah.

 

Where it’s not the same type of paradigm: it’s not this deterministic world, it’s an inherently probabilistic world. How do you respond to that?

 

I think that’s spot on. I mean, part of the challenge data scientists often have is that it’s much more of a science. It’s hypotheses you’re either proving or disproving, and then you go back and make a new hypothesis and iterate that way. And even if you come up with something that’s hanging together and working pretty well, the market dynamics could change.

 

I remember so many of the models started to totally diverge from where people were predicting them to go during COVID, because behaviors were just so disrupted, and the whole health care ecosystem with them. There were some interesting pops and dips in behaviors. I remember one of the lead data scientists telling me about it. And, because this is the other thing with data scientists, they’re a creative bunch.

 

They tend to be more in a neurodiverse sort of space in terms of how they think. And so you have to give them a little bit of room for some creative stuff. If you don’t give them some crayons to play with, then they will make crayons, and then it’s even harder to curtail everything. So I always gave a little bit of space for, just go take some data and play with it, which for them is the fun, exciting part of their day.

 

And I remember one of the things that came back was glasses. An incredible number of people got glasses in the first six months of COVID, myself included. I didn’t really wear glasses much until COVID, just because you’re suddenly spending all this time in front of a screen and it’s hurting your eyes.

 

The other one was people suddenly looking for more help with their hearing, getting ear exams because they realized they couldn’t hear as well as they thought. Things like that. There were some weird little increases in propensity to look for something, out of the norm.

 

The other thing that was happening, which I think was a double edged sword for pharma, was people who were currently on a product were way less likely to switch products. So whatever patterns you were seeing of someone trying a new drug, or switching between different drugs, or discontinuing, whatever the current flow of humans doing their thing, suddenly everyone kinda stopped changing whatever they were doing. So if you were in a product space that was declining, you suddenly held steady, because nobody wanted to leave their current drug or get a different drug while they were away from the health care system. They just wanted to keep getting a renewal of their script.

 

And anything new that was trying to be launched during that time frame had almost no pickup, because people would have to go to the doctor, they’d have to get a test done or something. So if you were launching, you were in trouble. If you were expecting a decline, you may actually have had kind of a win, because nobody was changing any pattern of what they were doing when it came to their recurring chronic prescription. So there were some of those kinds of things where all the forecasts and models that predicted behavior suddenly broke. People getting, let’s say, imaging done was way, way down, because all that outpatient, optional stuff that doesn’t need to be done from an emergency perspective got pushed off. Mammograms got put off; there’s some kind of ridiculous number of mammograms that were pushed off during COVID.

 

So all that kind of stuff sits stagnant. Then what’s the ripple effect of that on the healthcare system two, three, four years down the line? And I think that’s where it gets hard to quantify.

 

And obviously COVID had its direct impact of people getting COVID and dying from COVID, but all the other ripples, people not getting certain key medical things done, putting things off, waiting, all of that is even harder to measure.

 

But, yeah, data science requires this constant kind of looking at what’s happening in the real world and how it might ripple or change or increase or decrease things that were happening at a certain cadence before that. And I think for the data scientists, COVID was exciting because it was something interesting and different that just never happened before in any of the history of data in pharma ever. I mean, this is probably true in a lot of other industries as well. Things that just didn’t make sense and didn’t kind of fit the normal modeling that everyone had done.

 

That’s how we ran out of toilet paper and things like that. So I think that’s part of what’s interesting with data science. It’s a much more creative approach. And there’s nothing wrong with measuring and counting things and putting them on dashboards.

 

It’s so important. It has to happen. Yeah.

 

Of course.

 

But I think it’s just a very different breed of mental space to be in. So to your point, you may still have to clean up that data, though, right? Because there could be some times when it’s not brought in or it’s missing or whatever.

 

So there’s still work to be done to, sort of... Yes.

 

The completeness and timeliness and accuracy, you know?

 

The wrangling, as it were.

 

Unfortunately, it still applies to the data we’re talking about as well. And that’s part of the challenge. I don’t know if you ever get away, in either data science or just analytics, from the prepping and cleaning and getting things ready. And honestly, maybe it’s controversial, but I think you almost have to do a bit of it yourself. You can’t automate every single piece of that, because you miss out. I know. Right?

 

You miss out on some of the learning about what could be problematic about your data, or what is typically complete and what’s missing. Because if you didn’t know that, and you try to use a variable that’s only present in, you know, five percent of the data, you might be basing your analysis on a really tiny subset of your data and not even realize it. So those really simple counting things, how many records are complete, all that kind of stuff, matter. Especially when you’re trying to build these complex models, you may have to drop some things just because they’re not good enough. They don’t have enough information to be accurate, so even if the model might otherwise work, you can’t use them.
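A minimal sketch of that hands-on profiling, with a hypothetical file and an arbitrary example cutoff: count completeness per column before modeling, so a variable present in only five percent of records gets caught early:

```python
import pandas as pd

df = pd.read_csv("raw_extract.csv")  # hypothetical raw pull

completeness = df.notna().mean().sort_values()  # share of non-missing values per column
print(completeness)

# Drop anything too sparse to trust; a variable present in only 5% of rows
# would silently base the analysis on a tiny subset of patients.
keep = completeness[completeness >= 0.60].index  # 60% cutoff is an arbitrary example
df = df[keep]
print(f"{len(df.dropna())} of {len(df)} records are complete across the kept columns")
```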

 

You’re touching on something that I think is really, really... I don’t know, profound may sound like an overinflated word in this case, but you’re touching on something that’s really important here, which I will loosely call the idea of data being AI ready. Yeah. And at a lot of the conferences I’ve been to over the last couple of years, people have loosely thrown out, make it AI ready.

 

And what I’ve seen is a lot of people rushing to overlay legacy ways of managing data onto AI. Meaning legacy like BI, analytics, insights, classic dashboard stuff. The way that I look at it, the systems and processes we have for BI are not what we need for AI. Here’s an example.

 

Gartner defines AI-ready data as data that accurately represents the source, right, for a given use case.

 

But in the BI world, we’ll happily change stuff all the time. Right? We’ll transform data all the time, because we’re trying to make it conform to some standard that’s needed by the dashboard in order to run correctly, or so that our data pipeline doesn’t fail. Right?

 

So it’s not necessarily accurately reflecting source. It’s accurately reflecting some standard that we’re imposing on our analytical paradigm. So these are two very, very different things. Right?

 

And where do you impose that transformation in the whole schema? So, let’s say you’re talking about annual sales. The sales could be at the daily level, but you’re rolling them up to twenty twenty four or something. Where did you do that?

 

Are you doing it at the very end? So it’s like, we have all this raw data with every single sales amount, and we rolled it all up at the very end. Or did you already roll it up, and the only thing you’re pulling in is this single number for twenty twenty four? When you do those two different things, obviously, the one number is gonna look very accurate and complete and timely.

 

But could there be transformations that were inaccurately done, or where you can’t see what was done to get it to that point? And I think that’s some of the anxiety: the AI can just roll stuff up, and if someone typoed the year in a bunch of data, you won’t know that. And now you have this number that doesn’t match anywhere else and you can’t figure out why. So I think the same problems that plague us in the non-AI data space will still fight us as we try to be AI enabled, and probably faster and with more confusion around them because of that. My favorite metaphor for AI right now is when steam-powered factories were trying to transfer to electric power. Initially, right, factories in that era had just one big steam engine in the middle of the building, and it powered all the factory steps.

 

So say there’s ten steps. All ten machines have to run at the same time, because there’s one engine and it’s all connected into one big loop. When they first did electricity, it was just like, okay, I guess we’ll put this electric engine in the middle of the factory and run all ten things on the electric engine. And everyone was kinda like, I don’t exactly see the benefit of this, but okay, because we’re electrified now.

 

So it’s like we’re using AI now because we put the AI where we used to put regular statistics or something. It’s only when you started to think about what’s unique about electricity: oh, you could have circuits, you could have part two and part eight running while the other parts are turned off, and it would still work. And it took a while for that kind of mindset shift to happen in the factory space, even when that was a focus and priority. I think the same thing’s gonna have to happen with AI.

 

We’ve kinda done that. Right? We just dropped the AI on top of wherever our old analytic toolkit was, and we’re sort of unimpressed. In some cases, it’s like, and it gave the wrong answer.

 

So, like, what have we even been doing with this thing? It’s not good. Versus rethinking the whole entire process from the ground up with AI as the starting point, and that’s gonna take us some time to figure out how to even do. So when you talk about AI ready data, it’s almost like, is AI the thing you need for that data in the first place?

 

Maybe in some cases, yes. But in other cases, AI and data don’t really need to play together in that way. It could be something completely different.
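To make the earlier rollup point concrete: if you aggregate at the very end, a typoed year is still visible at the raw grain, whereas a pre-rolled single number hides it. A toy illustration with invented data:

```python
import pandas as pd

daily = pd.DataFrame({
    "date":   ["2024-01-02", "2024-01-03", "2042-01-04"],  # note the 2042 typo
    "amount": [100.0, 150.0, 125.0],
})
daily["year"] = pd.to_datetime(daily["date"]).dt.year

by_year = daily.groupby("year")["amount"].sum()
print(by_year)  # the stray 2042 bucket exposes the typo at the raw grain

# The "2024" rollup comes out as 250.0, though the true total is 375.0,
# because the mistyped row fell into a phantom 2042 bucket. If all you were
# handed was the single pre-aggregated number, nothing downstream could tell
# you why it doesn't match any other report.
print(by_year.get(2024, 0.0))
```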

 

Yeah. I think you’re exactly right.

 

Love the metaphor of just dropping this thing into the old factory and expecting everything to just work to spec, and it’s just not impressive.

 

You’re like, I don’t get it.

 

Is that the point of it?

 

It’s not. And what’s happening is that, if you view the explosion, or the launch at the very least, of ChatGPT as the clock-starts-here moment for the consumerization of AI at scale at most data and analytics functions, I would argue that, for better or worse, we’re still trying to figure all of this stuff out. I don’t think anybody really has any clue, particularly when it comes to governance, or even with the things we were just talking about. Like, what does AI ready data look like? Should I be eliminating outliers, or is the greatest value in the outlier?

 

Right. Right. Could the AI start to help you dive into your data in a new way that you wouldn’t have been able to do in a more traditional statistical way? Could you show me the top five, the bottom five, whatever? All those kinds of questions that were maybe a little harder to do in the traditional programming world, that AI can do quite simply. I think the other thing is, is AI supposed to be doing analytics?

 

I think we’ve sort of conflated things, because this kind of word-based interface is so lovely, right? You can just ask a question and it’ll give you an answer. But my hunch is we’ll find over time that the stats stuff was pretty good. If you’re trying to figure out the likely predictor of something, maybe statistics is still where it’s at, and the AI is more of a qualitative interface to understand those statistical models rather than doing them itself. Because that’s actually one of the really interesting things: up until really the last couple of models, ask AI to do two plus two and it struggles.

 

Like, it’s not good at math because it’s not an element of collective human knowledge in the same way. And so it’s kinda using the wrong tool. Right? You’re trying to use a screwdriver as a hammer or something, and it just doesn’t quite make sense.

 

And so that, to me, is the other question: have we figured out whether we even need AI ready, quote, unquote, data, or whether we actually just need to use AI for something else besides statistics? Maybe statistics is good enough for statistics, and you don’t need AI for that.

 

You’re running all of this data through legacy pipelines and legacy wrangling processes. Right.

 

I’ll tell you something I’m kinda wrestling with at a philosophical level, and it’s actually far more practical than just philosophical, which is the idea that eighty to ninety percent of all data in most organizations is unstructured. Right? And that’s a broad swath. Right?

 

That includes, you know, XML, HTML, semi-structured, structured text, video, completely unstructured. Who knows? So fine, it’s just a sound bite, eighty to ninety percent.

 

But there’s a buttload of it. Yeah. There’s a buttload of it out there, and it could be very valuable to AI. Right?

 

Particularly Gen AI because that’s how it is optimized. Gen AI solutions are optimized by and built on unstructured data, mostly text.

 

So I’m struggling with an idea, Jane, and I’d love to ask your opinion on this. There is a population of people in my world that believe the only way we are going to govern that unstructured data is if we find ways to structure it. If we put it through this sausage grinder, right, and find ways to basically chunk it into parts so that we can run our legacy data governance processes at scale, so we are confident we can feed it back to AI. Right?

 

Here’s the irony. You can... Yeah. That doesn’t quite hang together. Go forward.

 

Right? Which is, okay, all this unstructured data, I need to structure it so I can govern it, so I can put some sort of certified stamp on it, so I’m confident that AI could consume it.

 

Why don’t you just leave it as it is?

 

But here’s the irony, not the irony. Here’s the paradox.

 

To leave it as it is, the only way you’re gonna process that data at scale is to use AI to do it.

 

Right. Well, I think that’s part of it too: it cuts both ways. Right?

 

You have this unstructured data. AI is better at looking at and trying to make sense of unstructured things and finding patterns in them. So it probably is a good tool for structuring unstructured data. So you can read it the other way, right, where you say, let’s use AI to take unstructured things and structure them in a way where I can then use the formalized governance that’s familiar to me, that I’ve used for a long time.

 

But the starting point was still an AI solution. So now you’re struggling with whether or not that’s accurate, or if it’s hallucinating, or if it’s creating different kinds of risk that you then are feeding into your governance. So you’ve inherently put it at a higher risk state than before you even started the whole exercise. So I think that’s part of the tension we’re having with where AI can play in governance and any sort of data framework.
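As a hedged sketch of that "use AI to structure the unstructured" path, using the OpenAI Python client; the model name, prompt, schema, and validation rule are assumptions for illustration, not a vetted clinical workflow:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

note = "Pt c/o joint pain x 6 mo, 2 ER visits, currently on prednisone."

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": "Return JSON with keys symptom, duration_months, er_visits, "
                   f"current_meds extracted from this clinical note: {note}",
    }],
)
record = json.loads(resp.choices[0].message.content)
print(record)

# The structured record came from a model that can hallucinate, so it enters
# governance at a higher risk state: validate before it reaches any pipeline.
if not isinstance(record.get("er_visits"), int):
    print("extraction needs human review before entering governed storage")
```

The order of operations is the point: the governance-friendly structure only exists downstream of a probabilistic extraction step.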

 

Right? And I think part of it is also, what are you trying to do with this information? Because at the end of it all, if it’s truly just to better understand it, and you’re curious about it, or you have some hypothesis you want to bring to life, maybe the governance looks different for those kinds of things. And there’s still an element of leaving something unstructured and looking at it in an unstructured format with AI where it only goes so far in terms of whether you can trust it as the final answer.

 

Maybe it’s more of an input than an output. And especially in the healthcare space, no matter how good AI gets at diagnosing things, and I think that’s where we’re eventually gonna go, we can teach AI essentially all of medical knowledge to date and be like, alright, tell us what the heck’s wrong with this person. Here’s their data.

 

Like, help us out. I mean, that’s, I think, where a lot of the technology is trying to go. There is no way that we would just take the AI answer by itself and go, okay, cool. That’s what we’ll do.

 

We’ll, you know, chop off this person’s arm or we’ll give them this drug or we’ll do whatever. Like, of course, you’re gonna have someone weigh in somewhere in that process to say, I’ve seen this particular combination of things before, and I don’t agree with amputating this person’s arm. I think you can treat it this other way, and everyone’s gonna want that kind of second opinion of a human no matter what answer the AI gives. So there’s kind of this tension around whether AI is really meant to do some of these things.

 

I think we maybe have over projected its value proposition in some places, and then ironically probably under projected it for making sense of certain kinds of qualitative information. I actually think having AI as a guide or a counselor is fascinating. I know there’s a lot going on with using AI for just, give me a pep talk, things like that. It’s actually pretty good at that sort of stuff, because it has the written wisdom of all the humans. And so it’s pretty good at knowing, at least at some level, what people like to hear.

 

So there’s some places where I think we’ve maybe underestimated where AI can play, and places where we’ve overestimated it. And, of course, it does have this hype bubble around it right now. But my hunch is that some of the best places will still really be automation: thinking about intelligently taking stuff we do that’s really gross and repetitive and trying to make it less of a burden for people. That’s kinda where a lot of the agentic stuff is coming from, things that are arguably less unstructured.

 

They’re more structured, but you’re using AI to help. That’s where I think a lot of people want to go: calendar management, or making sense of a dataset that has missing values, could be something AI is pretty good at. But actually interpreting and then doing the math on that maybe isn’t the strength of the AI. So it’s a journey we’re all having to go on together in a lot of ways.

 

And since everyone’s excited about it, you’ve got a lot happening all at the same time. So finding the signal through the noise in that effort, I think, is one of the biggest challenges that, honestly, all people using data anywhere are struggling with.

 

Well, bingo. I totally agree.

 

And to extrapolate that out, pulling back a little bit, I would argue the signal in this case is the hardest part of all the things we were just talking about. We talked about mindset shift. We talked about maybe even technology shift. We talked about the right tool for the right job.

 

Right? Is this AI? Is that the right thing? But when it comes to governance and when it comes to kind of managing this data, when it comes to AI readiness, the hard part, to me, the really hard part is quality.

 

What could loosely be defined as quality and the classical attributes of quality. Right? Is it accurate? Is it timely?

 

Right? Those things in particular. I guess we could timestamp a few things, but accuracy, ultimately, is gonna be the hardest thing for us to get our hands around, because how do you assess the accuracy of ten pages of text?

 

Right. I mean, that’s sort of the whole point of why there’s value in, summarize this for me, as a question to the AI agent or the AI tool. And I think timeliness is fascinating, because as I’ve seen efforts to use AI in day-to-day operations, many of these models have a snapshot and a timestamp to what they know. They know the world up until, say, July of twenty twenty four or whatever.

 

Mhmm.

 

And then if you ask a question that involves knowing something that happened after July of twenty twenty four, like, if you said, what is the full year sales for this company?

 

It’s not gonna be able to do that for you, because it doesn’t have any information after July. So it’s gonna say, I’m sorry, I don’t have that, but I can give you twenty twenty three sales or something. And then you’re gonna be like, so we just don’t have it.

 

It doesn’t exist. And no, it’s just that this particular tool doesn’t do that for you, because it’s got a view of the world that ended in July. So those kinds of things, I think for some folks first using these tools, again, they’re asking questions that the AI isn’t built to solve. And I think that’s part of the reason why Perplexity has taken off as an AI tool: it has that real-time hybrid where it can actually look out on the Internet for you and also tell you things about historical stuff.

 

So depending on what you’re asking for, it can do both. And that might be one of the things people are not realizing: these tools have subtlety to them, and you can do different things with different ones. It’s kind of like using Word to do something you probably should do in Excel, or vice versa. Right?

 

It’s just really clunky to write an essay in Excel. It would be pretty hard to put a whole bunch of data just tabbed into Word. There’s a reason why you have two different tools for those two different things. And I’d like to see that be the way AI evolves to maturity.

 

I think at the end of it, it’s a suite of different AI types of tools. It’s not just generative AI. Mhmm. Some of them are better at this and some of them are better at that.

 

And certainly in health care, the idea of teaching an AI to know medicine, to speak the language of doctors, or to make sense of the medical records and all the acronyms and things like that. Those are all areas where people are starting to create more specific tools, rather than just a generalist, like something you see in GPT or Claude or whatever. So I think that’s part of it too: figuring out how do we specialize the AI. Again, the same idea with the factory.

 

Like, there’s different machinery that does different things, but it’s all using electricity as the underlying principle for how it works.

 

Well, we talked about a lot of different things in pharma and traditional health care, but when I think about health care writ large, it’s going to be ground zero for figuring out a lot of this stuff. We were just talking about the quality of unstructured text. Well, that’s doctors’ notes transcribed via natural voice into, you know, an iPhone.

 

Right? It’s trying to decipher handwritten doctors’ notes in a way that’s actually accurate and actionable. Right? So health care is gonna be ground zero for figuring out a lot of these things.

 

Right. And, you know, what is appropriate for classic BI and for analytics and statistics and what’s more appropriate for GenAI.

 

Man, if you like solving hard problems, if you like data, if you like exciting stuff, if you like actually making a meaningful difference in people’s lives, this sounds like a great career option. Because, and now I’m opining, a little bit of editorial here, as much as AI may help to augment and potentially even automate some of the diagnosis happening here, what you and I just talked about is a whole new world.

 

Right. Right?

 

A whole new world.

 

I think so.

 

A whole new world where data and medicine are coming together, and we’re finding new things that we never even imagined were findable. We’re finding correlations and causations that we never even imagined were possible. Right.

 

Well, that is... yeah. We’re gonna collect more information on top of what we already have. We keep exponentially collecting more data and more information than ever before.

 

And we talked about this a little bit yesterday too, the wearables and Yeah.

 

The way that all this data can be much more timely, the timeliness of when you can find out what’s going on with different parts of your physiology, from your sleep to your heart rate to whatever. It used to be that maybe once a year you would go into the doctor and find out how everything’s doing. Now you could theoretically know that your heart is malfunctioning on a minute-by-minute basis. So that’s another whole realm that, right now, we haven’t really tapped into.

 

We’re not making the most use of it, but the clinical pathways are there, and that idea of med tech is definitely coming. As it goes away from being about Apple Watches and Oura rings and these more consumer goods to being true medical-diagnostic-level technology, that’s gonna be another frontier for how clinicians can hopefully manage a lot of this remotely, right? You don’t necessarily have to go into a hospital to know how someone’s feeling. And especially for people who maybe wanna age at home, it’s a lot of work to get out of your house and go to the medical facility.

 

And there’s also the risk of getting infections and all kinds of other problems. So I think that’s the other piece of it too: we’re gonna collect even more information now about how people’s health is working or not, and that can be another variable that is just more unstructured data needing some interpretation and some combining with all of the qualitative pieces of how people are doing. Because that, to me, is a big part of the gap we have too: there’s, oh, I felt something off yesterday. Well, whatever was happening yesterday isn’t happening now.

 

So now what do we do? Right? How do we make sense of what to do with that information?

 

So it’s probably a whole other hour just about that. Oh, yeah.

 

Well, yeah. I mean, we didn’t even talk about kind of, like, wellness and preventative stuff.

 

Right. That’s what we talked about yesterday. How do we get ahead of some of this stuff so we’re not actually treating it with anything? And to me, the ultimate win would be that we’re getting to a place where people don’t even need to be thinking about it. Because it’s funny to work in a space where you hopefully never use the products you’re working on and trying to make sense of.

 

So that’s the even more ideal place: you’ve prevented and gotten ahead of some of these things that are creeping up on you, and you never have the emergency room visit, because you’ve headed it off with some other, easier way to solve the problem. So it’s a journey for sure, but I actually am very excited about the notion of data driven health. I think that’s something we haven’t done enough of in the past, and now we can. And I think people who are open to it, who are willing to take some of the risk of having themselves tracked and all that, could probably benefit and do better with that information.

 

We didn’t even get into any of the ethics stuff and the big brother stuff and the like. We didn’t even talk about...

 

No, we did. I think we did.

 

About that.

 

And that’s a whole other hour easily.

 

Yeah. That’s a whole other thing. I think the biggest thing with that, though, is that we have a strong obligation.

 

There’s a lot of things we could do. Right.

 

In the health care space that we choose, with regulation, governance, law, policy, however you wanna talk about it, not to do, because we’ve decided as a collective group that that’s not a good use of our resources. I think in the rare disease space, there’s this interesting twist on the ethics, because if someone is genuinely unaware that there is both a disease that they have and a treatment for that disease, it feels like the ethics kind of flip over and say, how do we get that information to that person?

 

Or terminal.

 

Yeah. Or terminal.

 

I mean, these would be the kinds of things where you’d just rather have people get information about this rather than not.

 

Right.

 

So that’s, I think, part of the challenge in general with this, and this is the tension you see probably more strongly with the GDPR policies in Europe, where, I think, some patients kind of want their data to be available and shared, so that if science finds a breakthrough to fix their problem, they’d be the first to know. And right now, that’s very hard to do in some countries. So definitely a whole other topic.

 

We could talk for hours. We should probably allow our audience to noodle on the things we’re talking about. You know, I felt this way after I talked to you yesterday, and I feel it even more today. I’m excited about this space. I’m excited about the future here. I’m excited about all the opportunities.

 

I’m really excited about just loosely the idea of wellness that we’ve just kind of touched on at the end.

 

But when you talk about data for dashboards, making your manufacturing plant a little more effective or your sales conversions a little more effective, that’s meaningful stuff, and I’ve built a career around it. But when you start talking about our species, right, when you start talking about my wife or my child, like, wow.

 

This seems like an exciting place that the data people should really be considering if they’re not already, because there’s a lot going on here, and there’s a lot to learn.

 

And, you know, I’m glad that we had the conversation. I’m glad that there are smart people like you in this space.

 

To our listeners, thank you for listening. If you’ve made it this far, please take a moment to subscribe to the podcast, give us a thumbs up on the socials, and all the rest of it. Jane, thank you so much.

 

Thank you so much. This was so fun. And I agree. I’m obviously a little biased, but I do think this is a wonderful data space to be in, and I encourage people to do it, to come in.

 

Bias is totally okay in this regard.

 

We’re promoting data and analytics. We’re promoting health care. So with that, please check us out on another episode of the CDO Matters podcast sometime soon. By the way, the third Friday of every month, I do a live version of this on LinkedIn.

 

So if you’re listening to the podcast and you want a little bit more of the fun repartee back and forth, often I have some guests, but I do a little bit more of a free flow. The third Friday of every month, you can ask me anything as well, on LinkedIn. You can come and bring your questions, maybe based on some of the conversations I’ve had on the podcast, or based on some of the stuff you see me post on LinkedIn. Who knows?

 

Come see me third Friday of every month on LinkedIn. Thank you again, Jane. We will see you on another episode of the CDO Matters podcast sometime very soon. Thank you for stopping by today.

See you again soon. Bye for now.

ABOUT THE SHOW

How can today’s Chief Data Officers help their organizations become more data-driven? Join former Gartner analyst Malcolm Hawker as he interviews thought leaders on all things data management, ranging from data fabrics to blockchain and more, and learns why they matter to today’s CDOs. If you want to dig deep into the CDO Matters that are top-of-mind for today’s modern data leaders, this show is for you.

Malcolm Hawker

Malcolm Hawker is an experienced thought leader in data management and governance and has consulted on thousands of software implementations in his years as a Gartner analyst, architect at Dun & Bradstreet and more. Now as an evangelist for helping companies become truly data-driven, he’s here to help CDOs understand how data can be a competitive advantage.
