Episode Overview:
Join the CDO of Profisee, Malcolm Hawker, as he explores how the increasing prevalence of dark data is contributing to a growing sustainability challenge in the world of data and analytics.
Malcolm explores the root of the problem, the impact it’s having on the environment, and the practical steps that data leaders can take today to make their data management practices more sustainable.
Episode Links & Resources:
Good morning. Good afternoon. Good evening. Good whatever time it is, wherever you are in the world.
I’m Malcolm Hawker. I am the host of the CDO Matters podcast.
Thank you so much for joining me today.
Whatever time it is, wherever you are, it’s Friday as I record this. This is the second week of July that I’m recording this. So because it’s Friday, I’m naturally in a good mood. That’s good. I’m also in another really good mood because as of Monday, I will be in Boston, Massachusetts attending the CDOIQ conference. I think they call it a symposium. It’s the CDOIQ symposium.
I think that’s a conscious thing because it is really kind of more of an unconference, or at least that’s how it started. It is a user-driven community where the community of CDOs actually decides what will be talked about and what won’t be talked about.
So that’s really kind of cool. I’ll be at the symposium. So I’m in a particularly good mood. It’s Friday. I’m off to Boston.
All sorts of good stuff.
The topic for today’s discussion, you get yours truly and yours truly alone today. The topic of today is sustainability in data. So if you have a passion for the environment, if you have a passion for doing whatever it takes to make sure we pass a better version of this planet to future generations, and you have a passion for data, and maybe you run a data and analytics function. Maybe you’re a CDO. Maybe you are a data steward. Maybe you’re a data governor. Maybe you’re not even in the world of data, and you stumbled on this podcast or you stumbled on this YouTube recording because of the title.
Whatever the case, I’m thrilled that you’re here, and we’re gonna take a look at what can we as data leaders do to help improve the sustainability of our practices, of our governance of data, of what we do day in, day out. That is today’s topic.
We’ll dive into that in just a little bit, but let’s cover a few logistics first.
As you heard me say, I will be in Boston next week at an amazing conference. I think the CDOIQ conference, and I mean this, is the best conference for senior data leaders. Now, that’s generally, I would say, depending on, of course, the size of company that you work for. You know, it could be a director at some companies, if you work for, like, an uber-large company. It could be the CDO. But a senior leader, where you are leading a function where generally you’re going to have a team. Maybe you don’t. Maybe you’re a CDO of one, like Joyce Myers in a previous episode of the CDO Matters podcast, where she gave a great talk about how she’s overcoming so many of the hurdles that she faces as a team of one trying to build a function.
But I digress. If you’re a senior data leader at your organization, no matter how many people you have working for you, even if it’s just you, the CDOIQ conference is a great conference to go to. You will spend time interfacing with other CDOs. You’re going to hear content delivered by other CDOs.
It’s a fantastic event. If you have not been, you really need to plan to go next year. The reason why I’m kind of ranting about it is not only because I’m going to be there next week, where I will actually be speaking no fewer than three times.
I really stumbled on a pot of gold this year. I don’t know how this kind of came to pass but for somebody who actually really enjoys speaking and for somebody who enjoys getting out in front of people, I hit the jackpot this year.
We, Profisee, are sponsoring the event, as we usually do, and that typically will get you a... well, in this case, it does get us a speaking slot. That’s not always the case, by the way. A lot of the speaking I do is because I’m invited by the conference organizers.
A great example is Dataversity. Often, Tony Shaw and his team will invite me out to speak, even when we’re not sponsoring, even though we do sponsor often. So we did get a speaking slot because we are sponsoring through Profisee, but two additional slots came up because I have friends and previous coworkers and others in the data space who wanted me to co-present with them. So I will be co-presenting with Moody’s Analytics and my friend Steve Kleinman on Tuesday afternoon, talking about MDM.
And on Thursday afternoon, I will be co-presenting with my friend Angelie Bansal from Cervello, a Kearney company. Both Steve and Angelie are friends. I’ve worked with them. I worked with Steve for many years in a previous life. I’ve only recently, over the last two years, come to know Angelie, but we’ve really hit it off. We think the same way about a lot of issues.
The presentation I’m giving with Angelie is all about data culture and data leadership. So in both situations, I’m stoked. I’m gonna have three conversations and three presentations at this just amazing event. So I’m, like, giddy.
By the time you watch this, it’ll be the third week of August. So CDOIQ will have come and gone.
However, another reason why I’m talking about this is because the great people at CDOIQ actually gave us a hundred free virtual passes to the event. Thank you, Dr. Richard Wang and team at the CDOIQ Symposium.
And what did we do with those? We gave them to you.
So maybe you’re on the email for the CDO Matters community. Maybe you get my monthly newsletter. It’s called the CDO Matters Roundup.
And if you’re on that email distro, in the last... well, again, we’re in a bit of a time machine here.
But a month ago, you would’ve got an email from me saying, hey, do you want a free virtual pass for this amazing conference? And they were gobbled up like that. They went like hotcakes, and our community just gobbled them up.
And I’m so excited that you did. So if you’re listening to this or watching this podcast and you were actually able to attend the virtual event, hey, if you’re watching through YouTube, just drop a note below. Make a comment below in YouTube.
I’d love to hear about it. I’d love to hear about your experience. Or you know what? Send me an email.
I know it’s a month after the fact at this point, but send me an email. Let me know how it was.
And let me know if you think that you’d want to attend in person in the future. I think that would be of interest to know as well. But in either case, I’m just, like, so excited that we were able to give away a hundred of these passes, and they went to you, members of the CDO Matters community. So I’m just so stoked about that, because I suspect there are many of you who don’t get a chance to go to these types of conferences that often.
Right? Maybe you actually live in a far-off place. I was talking to somebody yesterday from Singapore who wanted to go to the conference, and this, of course, is a virtual version of the conference. I was talking to people in Europe.
I know for a fact some folks from Europe are gonna attend virtually, including my friend Eddie Short, who signed up. So I’m glad Eddie’s getting to go. I’m glad people from Singapore are getting to go. I’m glad that people who wouldn’t otherwise get a chance to go to an event of this caliber, and this is a high-caliber event, get to attend.
And by the way, the virtual event is no different than the in person event except for obvious differences.
But I’m saying the content the content is identical.
Right? The content is identical. The great thing about CDOIQ is that they live stream everything.
And they live stream it through, actually, a really useful app called... I call it Hova, but people call it Whova. I think it’s technically Whova.
But I’m a fan of eighties rap, and so I say Hova.
And I’ll leave you to figure out why I say Hova, because I’m, I’m afraid, a fan of eighties rap music. But anyway, I’m over the moon on CDOIQ because I get to speak. Oh, also, the hits just keep coming.
Also, because I get to spend a day on Monday with my friends and partners from Microsoft, where they are presenting a workshop, an all-day-long workshop on data governance, on Monday, that is loosely affiliated with the conference, because this workshop is actually happening on the sixteenth floor of the Hyatt, which is exactly where the CDOIQ conference is happening. So again, I’m over the moon. It’s gonna be a great week next week. I’m so excited. We got to get a hundred members of our community out to this event. I’m giddy, as you could probably tell.
So any other housekeeping items that would be relevant for August? I don’t think so.
I hope you’ve avoid... avoided?
I think I need to slow down.
I hope that you have enjoyed... why would I say avoided? I hope you’ve enjoyed some of the content we’ve been producing on the podcast.
By now, you would have got a sneak peek of the presentation that Angelie and I gave, because we published that as an episode a couple of weeks ago.
As well, last week, you would have been able to hear Dr. Joe Perez speak about how to become a better public speaker. And my goodness, when it comes to public speaking, there are few better than Dr. Joe Perez.
Scott Taylor, Joe Perez, that would be a great showdown.
I don’t even know how I could judge that, because they’re both unbelievable.
And I hope you got some value hearing from Joe about how you could become a better public speaker as well. Some of the things that he says about, it’s not even necessarily the content. It’s just the delivery.
And that can be learned, folks. That stuff can be learned. So I really hope you’ve been enjoying the content.
I hope that you’re subscribed, that you’ve liked and subscribed and all of that stuff.
When you run into me, and sometimes you do, you’ll run into me at conferences, quite obviously, or other events, or maybe you come to one of the Profisee events that we put on around the country, one of our roadshows, one of our Data Hero roadshows. Oh, speaking of, by the way.
You will watch this on August twenty-second. About a month later, not quite a full month, three weeks later, will be the second annual Profisee Data Hero Summit.
You want to sign up for that. It is a day-long virtual event where the keynote speaker will be yours truly, and I will open that event sharing the same presentation that I gave at CDOIQ.
So at CDOIQ, I’m talking all about data leadership. I’m talking about adopting a data hero mindset. What I’m talking about is a mindset that is going to help us, once and for all, break through this barrier, get over this hump that we have between the potential of data and the value that we’re delivering through data. There’s a gap there. We all see a massive potential, but many of us continue to struggle to deliver prolonged and meaningful value.
I believe that gap is a function of the way we think about our jobs and our data and our customers. And that’s what I’m gonna speak about at CDOIQ next week, and that’s what I’m going to deliver as a keynote speech for the Data Hero Summit the second week of September. So please, people, sign up. It’s free.
You can hear me speak. You can hear all sorts of experts speak. I should have probably brought up our speakers list. But I know for a fact we’ve got a number of experts speaking, as well as Profisee customers who are doing MDM.
Okay? Who are doing data governance, who are overcoming challenges day in and day out. So many opportunities through this community to access unbelievable content, access speakers, access people who know what they’re talking about.
I’m just so thrilled to be able to bring you this content, because I think this is the way that we as a community of thought leaders need to be operating these days. Sharing what we know, lifting the tide so all boats rise. You’re one of those boats. I’m thrilled that you’re here. Let’s move on to today’s topic. And I hope you didn’t mind me using a metaphor that basically called you a boat.
Man, it’s Friday, and like I said, I’m a little giddy.
And, well, of course, because it’s the weekend.
Let’s talk about sustainability in data, shall we?
I can’t believe it was two years ago. But two years ago, I wrote an article for Forbes titled Stop Hoarding Data, Save the Planet. And then I said it’s a CDO call to action: stop hoarding data, save the planet.
What I’m gonna share with you today is really an evolution of something that I started talking about two years ago. Now, do I talk about this topic often? Well, no. Because I’m not an expert in sustainability.
Okay? I’m an expert in data. You’re listening to me because I know my stuff when it comes to data. I am not an expert on sustainability.
If you are, and you would like to collaborate, maybe you should reach out. But I’m not. So it’s not a topic that I have talked an awful lot about, because it’s not a major comfort zone for me, but I still think it’s kind of important.
It’s not just kind of important, it’s really important. Right? Sustainability is important.
Making sure that we leave the world at least as good as it is, and maybe even better, for future generations. Of course, these things are important.
I’m a technologist. I’m an innovator. I am all about progress. I want to build.
I want to make things. I want to help you build and make things. And I believe in the promise of technology. I believe in the promise of AI.
I believe in the promise of development and innovation and investment in technology. I believe in all these things. I believe in blockchain. I even believe in crypto, which a lot of us don’t anymore, but I believe in all these things.
So does that make me a contradiction?
Somebody who believes in technology but also believes that we can do a better job by the planet? No. Of course not. We’re capable. We can walk and chew gum. So I published this article a couple of years ago in Forbes, and I didn’t really hear much about it at all.
At the time, and even to this day, I was shocked. When I was doing my research two years ago, I was shocked by how few people in the world of data and analytics were talking about this.
Because even at that time, even at that time, I had convinced myself, and I still think this is true, by the way, I had convinced myself that it was only a matter of time before CDOs and CIOs are given sustainability targets. I still believe that to be true.
I think macro forces over the last couple of years have kind of changed the game a little bit. Of course, there’s AI, but there are things like, you know, mass inflation. There are things like, oh, two wars in Europe. Well, one’s not technically Europe, but you get my point. Two wars.
Things kind of change the equation a little bit. You know, the energy shortage coming out of the beginning of the Ukraine crisis.
So I think people may have taken their eye off the ball a little bit when it came to sustainability, but still, I was just blown away by how few people were talking about it.
So I wrote the article and, you know, it went out in Forbes, and I got a little bit of feedback. I got a little bit of interaction from it, but, honestly, not very much. Not very much. And that’s one of the reasons why I haven’t talked much about it. But I’ll say over the last year, things have changed just a little bit.
There seems to be a little bit more traction growing here. And honestly, folks, it’s largely coming out of Europe, with a bit of a spillover effect into the US and into other markets, with a major focus in Europe on ESG: environmental, social, and governance.
So more people are talking about it.
And along comes next week, or last week, which by the time you watch this would have been probably six weeks ago now. I got an invite to join a panel, a discussion panel, talking on live radio on an NPR station, the NPR station KQED, in the San Francisco Bay Area. And, you know, I always enjoy opportunities to talk.
I always welcome opportunities to share what I know and to share my passion for data. And this came up, like, hey, go on NPR, and, you know, it’s live.
Over the radio waves, you know, in and around the San Francisco Bay Area, and I was like, heck yes. Sign me up.
I only ended up talking for about fifteen or twenty minutes over an hour-long conversation. They had a few experts on, including a gentleman from Time magazine, a Time correspondent named Andrew Chow, who was really, really smart, sharing a bunch of data related to how much energy is actually being consumed in data centers.
And this was the genesis of my conversation in my Forbes article, and this is data from two years ago. I suspect the data isn’t much better now. I have not updated it. I need to. But the data that I found a couple of years ago suggested, okay, this is a suggestion, from reputable sources, but there’s an acknowledgment that a lot more work is required here. And this is across two key metrics. One of them is that the data center industry produces more greenhouse gases than the global airline and shipping industries combined.
Okay? So let’s assume that this is an order of magnitude off. Right? Maybe let’s just say data centers produce half or a quarter.
In either regard, that’s a lot of energy.
If you’ve ever flown into San Jose Airport from the north, looked out into Santa Clara, and maybe you’re in the Bay Area. You know exactly what I mean. You look down and you see football field after football field after football field of data centers.
Fly into Dulles Airport, IAD, again, from the north. You’re flying over bucolic Maryland, and you’re looking down, and it’s, oh, there’s the Catoctin Mountains, and it’s beautiful, and it’s green, and it’s rolling hills. And, oh, well, you start to see some people. Frederick, Maryland, you look down. Oh, and then you cross the Potomac, and, you know, that’s lovely as well. And then you look down.
Ashburn, Virginia.
Holy cow. It is just miles and miles and miles and miles of data center after data center after data center, with air conditioners as far as the eye can see.
So I didn’t need to be told that the data center industry was consuming a lot of energy, having lived in Northern Virginia for ten years, which I did when I was working for this little Internet startup called AOL.
Interesting side fact here, folks. Internet history. The reason why all those data centers are in Ashburn, Virginia, the reason why AWS East is in Virginia, the reason why all those are there, is because AOL basically hardwired that part of Northern Virginia directly to the eastern terminus, or the US eastern terminus, of the Internet. It was called MAE-East, M-A-E East. I forget what MAE stands for, Metropolitan Access something. Anyway, the DOD, DARPANET, when they were building the Internet, basically, the node for the East Coast of the United States and for the Internet, MAE-East, was in the basement of the Cable & Wireless building in Tysons Corner, Virginia, which is not actually a real city, by the way. I think it’s... what is it? It’s McLean?
Yeah. I think it’s McLean.
Either way. Falls Church? McLean? Either way, Tysons Corner. If you’re from DC, or if you’ve lived in Northern Virginia, you know exactly what I mean. The basement of the Cable & Wireless building was the node for the Internet on the East Coast, and what AOL did is they basically ran these fiber cables, these giant thick fiber pipes, down what is now the toll road, aka the Greenway, when they expanded that road to give congresspeople toll-free access to Dulles Airport.
Man, I am a treasure trove of useless information today. To make a long story short, basically, they extended the eastern terminus point of the Internet to AOL in Ashburn, Virginia. Well, at the time, it was called, and it still is called, Dulles, Virginia.
But, anyway, that’s why all the data centers are there, and there’s, like, an endless array of them, and they’re consuming a ton of energy. Again, by some estimates, they’re producing far more greenhouse gases than the airlines and the shipping industry. And they are most certainly consuming a ton of electricity. There are no ifs, ands, or buts about that.
So the other data point that was kind of an anchor of this article that I wrote for Forbes is that a lot of the data that we have as corporations, and the estimates here vary wildly, because we’re talking about a lot of private information, it doesn’t need to be publicly recorded, it doesn’t need to be publicly disclosed.
So the band here is incredibly wide, but that band says that anywhere from fifty to ninety percent of the data that companies are storing is dark, AKA not used, AKA just collecting digital dust.
Right? But it’s still sitting on a disk somewhere. Right? It’s still sitting on a disk somewhere.
And if you are deployed in the cloud, honestly, it doesn’t really matter. It’s your cloud, somebody’s cloud, it’s just somebody else’s computer, but let’s just say the cloud. The cloud is those data centers by the way.
You have this data that is taking up disk space in these data centers and is collecting digital dust.
So put two and two together. And again, even if we’re off by a factor, like, of ten, right, we’ve got a problem, because we’re storing stuff we’re not using, we’re consuming scarce resources, and we’re emitting greenhouse gas. This seems kind of wasteful.
When I thought about that, I hearkened back to my days running data and analytics functions.
I hearkened back to my days running a product management function when we’re reviewing product requirements, when we get to the reporting page.
I have memories of sitting in conference rooms as a product leader and a product manager, talking when we’re building applications.
Okay? Right? We’re building software. We’re storing data. I have vivid memories where I’m sitting across the table from somebody who is in data and analytics.
I’ve been on both sides of the table, by the way, whether it was building stuff or saving data. I’ve been on both sides of the equation, where we had conversations where I was asked, oh, okay, well, what data do you need to store? What do you need?
Like, what do you need your reports to be? And the answer was, I really don’t know. Save it all.
And have you been in a similar situation?
If you’re in data and analytics, you’re a data leader or maybe you’re a product leader, I guarantee you’ve been in a similar situation.
And I’d be willing to bet you said the exact same thing that I said, which was, oh, man, I may need that one day.
Yeah. Go ahead. Just keep it. Is it expensive to keep it? No. Not really.
Prices just keep going down and down and down. We’ve got a minimum commitment to spend on, you know, Google, Amazon, Microsoft. This is gonna help with our minimum spend. There’s not a ton of overhead here.
Sure. Go ahead. Well, are you sure? What does it cost? Well, I really can’t tell you what it costs.
Oh, okay. Well, yeah. Go ahead. Just save it all. I mean, literally, this was the conversation that I’ve had multiple times.
That was the level of rigor that we applied to a decision to just store data forever and ever.
Even in the odd and rare cases where you’re doing some sort of audit, where you’re figuring out if you should be archiving data. Right? I had that conversation over archival as well.
And, honestly, the decision there was, okay, I don’t really see a lot of cost here. I don’t see a lot of overhead here. And as a product person, it doesn’t even hit my budget. It hits somebody else’s budget.
It’s minimal. We can’t even track it, and nobody’s tracking it. So, hey. Why not?
Because do you wanna be that data leader who gets asked by a product leader, by a business leader, hey, can you run me that report for the, you know, the number of filbert flanges that are misaligning to our grapple grommets every month?
And you answer, oh, we don’t have the data.
Have you ever been in that position? I have.
I’ve been in that position too, and that position sucks.
Right? Like, you don’t wanna be the data leader who’s like, I don’t have it. We don’t have the data. Can’t do it. I mean, I can start gathering the data now, but I can’t get you a report for a while, and even then, it’s probably gonna be not that meaningful, because we’ll only have a month’s worth of data, and I can’t go back and, you know, I don’t have a time machine.
You know, a lot of positive intention here, or maybe not positive intention. I don’t think it’s willful ignorance. It’s just, okay, it’s not gonna cost us too much, so let’s go ahead and save it. That’s data hoarding, folks.
That’s data hoarding. And by the way, if you’d like to check out the article that I published in Forbes, just Google “Hawker data hoarding.” I come up number one every time.
Which is, like, shocker. I can’t believe there aren’t other Hawkers out there. Hey, Robert, if you’re watching, my friend. I forget what episode number Robert was. Oh, let me look.
Let me look. Episode number forty seven.
My long lost cousin, Robert Hawker, who wrote a great book called Practical Data Quality.
Maybe he will write an article one day about data hoarding, but for now, I own that search term, Hawker data hoarding.
It’s funny. I’m enjoying myself today.
Where the heck was I?
Anyway, I’ve been that data leader. I’ve been that product leader. I don’t think there’s a lot of kind of malicious intent here, but we need to rethink our practices here. We need to rethink our practices because, again, I think it’s only a matter of time before CDOs and CIOs are given sustainability targets.
For a while, I think some of them will be able to buy their way out vis a vis carbon credits, which is what most large companies are doing these days who say we’re carbon neutral. Well, you’re just buying your way out.
Separate conversation to talk about the efficacy as it were of that approach.
Maybe you are planting some trees. That’s good too. You know? But, I mean, the carbon credit thing, to me, I find that is just, I don’t know, it’s a head scratcher to me.
But my point here, folks, is that I think we can and should be doing better. I think we can and should be doing better. How we get these initiatives prioritized, maybe that’s a separate conversation, but what I argued in that Forbes article is that we should probably be focused on a few key things.
One of them was just better data governance.
Right? We need to have let’s just start with the basics. We need to know what’s out there.
Like, starting with that, discovery. Okay? Data discovery, data profiling. What is out there?
You know, there’s a lot of us who don’t even know. Right? There’s a lot of us who don’t even know.
And if that’s the case, well, I think, you know, the first thing that we need to do is pull our head out of the sand and figure out, like, okay, what exactly is out there? Now, this isn’t me advocating to go say, hey, let’s go spend zillions of dollars on a data catalog, but that certainly would help.
That certainly would help. Right?
I think that is an important aspect of this. First of all, let’s go discover what we’ve got out there.
Second of all, usage. My goodness.
How many of you can actually report on usage?
Like, so step one, let’s figure out what our universe of data is. Okay? Step two is let’s figure out what’s being used and how it’s being used. Right? Is it being used analytically? Is it being used operationally?
Is it not being used at all? Right?
Three, metadata.
Right? I guess I should have said that in number one, like, discovering all of your data, figuring out all the metadata for it. Right? Figure out what you’ve got out there.
And for a lot of this data, it’s gonna be a challenge. Right? For a lot of this, we’re talking about unstructured data. Maybe we’re talking about video.
Maybe we’re talking about other formats where doing this is more difficult and arguably wouldn’t be cheap. I get it.
So I’m not talking about doing all of this overnight. We can start with some baby steps, but, you know, just figuring out just metadata. Right? What’s out there. Right?
Discovering it, labeling it, classifying it, inventorying it, basic stuff. Right? And then usage. I already mentioned this, but then usage. What’s getting used? What’s not getting used?
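As a rough illustration of the discovery and usage steps described above, here is a minimal Python sketch. It assumes you already have some inventory of dataset names and some log of who touched what and how. The names `catalog`, `access_log`, and `usage_report`, and the sample datasets, are all made up for the example and do not refer to any real tool.

```python
from collections import Counter

# Hypothetical inventory of everything discovery turned up (names are made up).
catalog = ["customer_master", "sales_2023", "legacy_web_logs"]

# Hypothetical access log: (dataset name, kind of use).
access_log = [
    ("customer_master", "operational"),
    ("customer_master", "analytical"),
    ("sales_2023", "analytical"),
]

def usage_report(catalog, access_log):
    """Count each kind of use per cataloged dataset; an empty count means no use."""
    report = {name: Counter() for name in catalog}
    for name, kind in access_log:
        if name in report:  # ignore log entries for data that was never inventoried
            report[name][kind] += 1
    return report

report = usage_report(catalog, access_log)

# Datasets with no recorded use at all are the candidates for "dark" data.
dark = [name for name, uses in report.items() if not uses]
print(dark)
```

In practice the hard part is getting a trustworthy access log in the first place, especially for unstructured data; the sketch only shows the bookkeeping once you have one.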
Let’s revisit our... archival. That’s hard to say quickly when you’re excited.
Our archival processes. There, I nailed it. Let’s revisit. Do you have policies for archival? What are they?
Are they enforced?
Do you have policies that actually say, hey, after two years of no use whatsoever, we will archive this data?
Like, you know, do you have a reasonable relationship with people in legal, or maybe in your CISO organization, who can give you guidance on what you can and can’t do from an archival perspective?
Right? Like basic stuff. If you don’t have archival policies, you need them.
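To make the two-year rule mentioned above concrete, here is a minimal Python sketch of what such a policy check might look like. Everything in it is an assumption for illustration: the `Dataset` record, the `legal_hold` flag (standing in for guidance from legal or the CISO team), the 730-day threshold, and the sample data.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical inventory record; the field names are assumptions for this sketch.
@dataclass
class Dataset:
    name: str
    last_accessed: date
    legal_hold: bool = False  # must be retained for legal or regulatory reasons

def archive_candidates(datasets, today, max_idle_days=730):
    """Return names of datasets idle longer than max_idle_days and not on hold."""
    cutoff = today - timedelta(days=max_idle_days)
    return [d.name for d in datasets
            if d.last_accessed < cutoff and not d.legal_hold]

inventory = [
    Dataset("clickstream_2019", date(2020, 1, 15)),
    Dataset("tax_filings_2018", date(2019, 4, 1), legal_hold=True),
    Dataset("orders_current", date(2024, 7, 1)),
]

print(archive_candidates(inventory, today=date(2024, 7, 12)))
# ['clickstream_2019']
```

The point of the `legal_hold` flag is that idle time alone should not decide the outcome: data under a retention requirement stays put no matter how long it has gone unused.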
You know, to me, I think usage is a key driver here.
But I do understand that there are certainly going to be some aspects of data that you need to keep from a legal or, you know, regulatory perspective, and that’s fine too. That’s okay. But that’s a part of this. You need to understand, and this does get back to data classification.
Right? Data discovery.
You know, you need to understand what’s out there, and is it more sensitive data? Is it financial data? Right? Is it data that falls under some regulatory or compliance umbrella? All part of discovering what you’ve got, classifying what you’ve got.
And then revisiting your policies related to data retention and archival. How long do you need to keep this stuff?
So those are some basics from a governance perspective. I think we can all do a little bit better.
But here’s where the hard stuff starts.
And I say it’s hard because I know it’s kinda hard, but it’s certainly not impossible. The next level that we need to be thinking about here is ROI.
Right? We need to understand how data is being used, and we also need to understand, you know, how value is being delivered to the organization.
This is a big part of it because you could have some dashboards that are being populated with data every year, every day, every month, every week, and never used.
Right? So again, getting back to usage, getting back to having some idea, is there real value here? Just because I’m spitting out a bunch of dashboards every week, does that mean there’s actually any real value here?
So this does get back to usage. I would argue, folks. I would argue, and I’ve said this many times on this podcast and I will continue to say it. I do see this being a role of some idea of a data product manager.
Right? Because what we’re talking about here is sunsetting data, in essence. Right? If we were to look at this through the lens of a product life cycle, we’re talking about sunsetting things. Right? We’re talking about decommissioning things.
And we need to be talking about these things more. And if we’re not using them, we don’t see any potential value from them, we need to decommission them.
There are some of you out there who are probably now saying, well, because of AI, we need to store everything forever.
I don’t know that to be true. I really don’t know that to be true. Because, again, if you’ve got a data product manager, right, and if you have somebody in your organization in the data and analytics function whose job it is to intimately understand your customer’s requirements, that includes any requirements from a data science perspective.
Right? That includes being closely aligned to the producers of models, LLMs, any other things that you may be building out of a data science function, and the end users of those models, the output of a data science function. If you’ve got somebody whose job it is to understand what they need and what they don’t need, I would argue you can do a reasonably good job at understanding and culling through what is a must store versus a nice to store versus a don’t need to store.
Right? Now, again, you could probably say, well, we don’t know what we don’t know. We don’t know what we may need in the future.
Well, that’s true with anything.
That’s literally true with anything. Right? I don’t see that being a very sustainable argument, and sustainable is a great word here, because the same is true with anything. Then we would hoard everything for all time, always, under every condition possible because, you know, there may be one day. Right? But that’s the hoarder mentality that we need to break away from, because chances are pretty good Pareto is going to apply here, and eighty percent of this data, you don’t need.
Chances are pretty good. But again, if you’ve got somebody whose job it is to understand those customer requirements, understand how data is being used, understand how value is being delivered to the organization, you’ll be in a far better place to say, You know what? I really don’t need this.
This does get back to governance, basic blocking and tackling governance. If you have a governance function, where, as a part of that governance function, you know what? Maybe next year, let’s say twenty twenty five, maybe it’s not a massive investment, but maybe it’s a small investment where you say I’m gonna create a sustainability subcommittee.
Right? Or I’m at least going to have and initiate some sustainability program as a part of my data and analytics function and have it be managed through the governance committee, because a lot of these policies will need some form of business sign off.
Right? If you get the business aligned, if you make a cohesive argument for archival of data today that is not being archived, if you are able to show the business benefit of doing that, because there will be savings here, because you don’t need to necessarily keep this stuff churning away on discs. And lo and behold, maybe you also even show the societal benefit because that’s ultimately, folks, what we’re talking about here.
If you find a way to estimate, okay, there’s a lot of calculators out there. Here’s where my knowledge starts to cap. Again, I’m not a sustainability expert, but I’d be willing to bet that you could find some calculator or find some framework or some methodology that said, if I do these things, from the reduction of energy, or if I eliminate a hundred terabytes of storage in an average data center, what does that do on an annual basis? I have to think there is at least some high level guidance out there that could give you an understanding of what sort of societal benefit your actions will have.
And you’re doing this as a formalized program, and you’re doing this with the blessing of a governance committee, and with the blessing of your business stakeholders, well, there you go.
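The hundred terabyte question above comes down to simple arithmetic once you have the factors. Here is a rough sketch of the shape of that calculation. The three factors in the example call (watts per terabyte, PUE, meaning the data center's power usage effectiveness, and grid carbon intensity) are placeholders, not measured or published values; you would need to replace them with figures from your own provider or a sustainability framework you trust.

```python
def annual_storage_footprint(terabytes, watts_per_tb, pue, kg_co2e_per_kwh):
    """Estimate yearly energy (kWh) and emissions (kg CO2e) for always-on storage.

    terabytes:        amount of stored data being considered
    watts_per_tb:     average power drawn per stored terabyte (assumption)
    pue:              data center power usage effectiveness, >= 1.0 (assumption)
    kg_co2e_per_kwh:  carbon intensity of the electricity used (assumption)
    """
    hours_per_year = 24 * 365
    kwh = terabytes * watts_per_tb * pue * hours_per_year / 1000
    return kwh, kwh * kg_co2e_per_kwh


# PLACEHOLDER factors for illustration only; substitute real numbers.
kwh, kg_co2e = annual_storage_footprint(
    terabytes=100, watts_per_tb=1.0, pue=1.5, kg_co2e_per_kwh=0.4
)
print(f"Estimated {kwh:,.0f} kWh and {kg_co2e:,.0f} kg CO2e per year")
```

Even with rough inputs, a number like this gives a governance committee something tangible to sign off on, as long as the assumptions are reported next to it.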
Right? And what board of directors out there, particularly a European board of directors, but I believe increasingly a North American board of directors, what board of directors wouldn’t like to be able to report,
here’s how we reduced our carbon footprint this year out of the data and analytics function? And I know some of you in Europe are already saying, but we’re already doing that, Malcolm. And good on you. Hip hip hooray.
That’s fantastic. I can tell you it’s not happening here. Maybe a handful of companies here or there, but it’s not happening here in the US and Canada. And I think it should.
Right? So let’s think about formalizing a program here. Right? Doesn’t have to be a major investment.
Maybe it’s a small investment. Maybe it is ten percent of a data product owner’s time, a data product manager’s time, to start looking at this and looking at ways that we could be driving goodness for our organizations and driving goodness for the planet at the same time. And if we can find a win win there, why wouldn’t we? Why wouldn’t we?
So I think if you put all these things together, I think there are some baby steps that we, as data leaders or data practitioners, can start doing.
Step one starts with asking some questions here. Do we really need to keep saving this data forever and ever?
Do we know what it costs to save this data? Can we make informed decisions on whether we store something or whether we don’t store something?
All these things, again, I love the idea of a program. I love the idea of a data product manager who’s responsible for this stuff.
I think it’s necessarily a requirement to understand some of these cost benefit issues because sometimes, you know what? Maybe the ROI isn’t there. And this doesn’t necessarily always need to be a hard dollars and cents ROI because when you start talking about the environment we do get into some emotional issues where I think you could make a case that simply doing better for the planet should be enough of a motivation.
But if we’ve got data that’s sitting around that has never been accessed in ten years, like, seriously, I’d be willing to bet you’ve got terabyte upon terabyte of data that’s just sitting somewhere where it’s never been accessed, never been read once.
I bet. Can we get rid of that? Why wouldn’t we?
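For data that lives on a file system, the never accessed in ten years question can be checked directly. The following is a minimal sketch that walks a directory tree and reports files whose last access time is older than a threshold. The ten-year default mirrors the example above. One caveat that matters: access times are only as good as the file system's settings, and volumes mounted with options like `noatime` will not record reads, so treat the output as a list of candidates to review rather than proof.

```python
import os
import time


def find_stale_files(root, min_idle_years=10, now=None):
    """Return (path, size_in_bytes) for files not accessed in min_idle_years."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_years * 365 * 24 * 3600
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                info = os.stat(path)
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
            if info.st_atime < cutoff:
                stale.append((path, info.st_size))
    return stale


def total_terabytes(stale):
    """Sum the sizes of the stale files, in decimal terabytes."""
    return sum(size for _path, size in stale) / 1e12
```

Calling `find_stale_files("/some/archive/path")` (a hypothetical path) and passing the result to `total_terabytes` gives a first estimate of how much hoarded data is sitting there. Object stores and data warehouses expose access history differently, so the same idea would need their own audit logs.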
Why wouldn’t we? Now, for a lot of companies, the answer to that question is because I could better deploy those resources on some initiative to drive top line.
Again, I can hear this already because I’ve been the one at the table having to justify stuff like this. Not sustainability things, but more run the business type, non glamorous, keep the lights running initiatives.
And you could argue this kind of aligns to those types of initiatives, right, where it’s housekeeping stuff, where we have to make a decision between shiny new thing, right, and keeping the lights on or addressing some tech debt or data debt. And you always know what happens. You know what happens.
We choose the shiny thing and the tech debt piles up.
You could argue this hoarded data is a form of tech and/or data debt. We just say loosely tech debt because that’s what it is.
We shoved it away for a rainy day thinking the rainy day would come and it hasn’t come. And now we got debt. And what do we do? So I understand the pressures that CDOs may be under or CIOs may be under from the perspective of, well, I’m gonna need money to address the tech debt, and that’s a hard nut to crack.
I get it.
I get it. But that’s where having some idea of what the benefits are gonna be comes in.
Does your board of directors have an ESG initiative? Can you find a way to tie to that? Are there dollars associated to that? Maybe there are. Maybe there are buckets of money out there that you can tap to start addressing some of this stuff.
So I think I’ll start to wind up a little bit there. I think there’s certainly more that we can be doing. There are kind of overarching cultural issues here, but I think these are within IT.
I’m not entirely sure. I mean, I did mention earlier my story about being the guy at the table having the discussion about do we keep it, do we not keep it. But as a product person, that was an easy decision because my IT person was not pushing back and saying, hey. There’s a cost here.
Right? There was no real hard pushback.
Right?
So are there cultural issues that may be at play here? On the business side, maybe.
Maybe. But I think it’s more on the IT side. And I think the real cultural issue here is this kind of rainy day fund idea, that maybe I’ll need it one day. A fear of being asked to produce a report that I cannot produce because I don’t have the data. And that’s the bigger picture stuff that we need to start picking away at. And I think we do that by taking that programmatic approach to really starting to tackle some of these issues.
Right? I’ve really been talking about this through the kind of lens of some sort of, like, tech debt type initiative, but you could interweave what I was just talking about into new initiatives going forward.
Right? Again, if you build those relationships with your business partners, with the software engineering organization, with the CTO organization, where you actually have a data product manager sitting at the table when they’re defining requirements for new software initiatives or new tools or new whatever it is, new website, doesn’t matter. Where you actually have somebody from your team at that table, where you’re deeply ingrained and where you can say, hey, yeah. Let me do the analysis for you and what it’s gonna take to fulfill your requirements.
Right? Let me help put together, you know, I’ll help you actually write the requirements. I’ll help you refine the requirements. I will help you understand what the cost will be to do what you want to do and the benefits of doing what you want to do. I’ll do all those things. Oh, and by the way, there’s this new metric we’re also tracking here related to sustainability.
Wow.
Right? Kind of interweave that governance and data product management role into everything else the business does day in and day out. And that’s really, kind of writ large, what we’re trying to get to, folks. We wanna be part of the business. We wanna be a part of the fabric, pun unintended, of the business. So whether you’re doing it retrospectively, trying to clean up after the fact, and I mentioned some things we could do.
Discovery of data, just better governance. Right? Understanding usage. That’s kind of part of a cleanup thing, but that would also be part of a go forward.
So either or, whether you’re addressing the old debt, whether you’re implementing a program to stop the debt from happening again in the future, nothing but goodness.
So that is my suggestion for CDOs or anybody else in IT, by the way. I think you could easily make the case here to have the same thing from, like, a software development perspective. Right? Maybe you’re building and managing and sustaining software. You could be asking the exact same thing. Right?
But data, I think, is a little bit unique because of the predilections that are already kind of there, or tendencies that are already there, and the costs, and the fact that we already know there’s data to tell us we’re not doing a good job here. We’re just not. Right? Having all this data sitting around collecting dust, no bueno.
So I think we can do a better job. I know we can do a better job. I know we got a lot of plates that we’re spinning. We’re spinning a lot of plates. I get it.
But I also know that a lot of boards of directors care about this stuff.
And I also know that increasingly as time passes, I suspect regulators will care about this stuff as they already do in Europe, as they already do.
So you can have this happen to you or you can happen to it.
That’s kind of the way I look at it. If I was a CDO, I would find a way in twenty twenty five to allocate some resources to what I’m talking about and to come up with a success story here, because heaven knows we need all the success stories we can get. And if we’re doing good by the planet, cha ching.
Alright, folks. That is it for me today for another episode of the CDO Matters podcast.
I’m excited. Maybe I’m gonna go pack my bags now for Boston. It’s Friday. I don’t leave till Sunday. Probably not.
But thanks for listening. Thank you for subscribing. If you’re not already a subscriber to the podcast, I would be tickled pink if you joined the CDO Matters community.
And hey, sign up for our newsletter, the CDO Matters Roundup.
Maybe next time around, you could get a free virtual pass to the best conference for data leaders, retail value eight hundred dollars simply by signing up.
Anyway, thanks for listening in, folks. I will see you on another episode of the CDO Matters podcast sometime very, very soon. Bye for now.