Brainstorm AI Singapore 2024: How To Deliver AI-Ready Data

Satyen SANGANI, Co-founder and CEO, Alation; Geraldine WONG, Group Chief Data Officer, GXS Bank. Moderator: Nicholas GORDON, Editor, Asia, FORTUNE; Co-chair, Fortune Brainstorm AI Singapore
Transcript
00:00So, Satyen, I'd like to start with you.
00:03We're all aware of the garbage in, garbage out problem.
00:07Bad data leads to bad outcomes.
00:09But how does this play out in the world of AI?
00:13So generative AI poses a different structure for analyzing data than the traditional pipeline.
00:22Historically, when we've analyzed data, we basically take all of this unstructured data
00:29or we build an application and software applications and we try to extract information by structuring
00:35a data model.
00:36So we try to make structure from the unstructured thing.
00:40In the generative context, it's a little bit different because you're effectively short-cutting
00:44that entire process.
00:46You're taking all of this unstructured information and you're feeding it into a model.
00:50You're vectorizing, as it were, and as a consequence of all of that, you're basically then getting
00:56an interim, it's almost a black box, and you're getting an output.
01:00And that output, there's no way to check the information that's going into it.
01:04Whatever you feed it, it absorbs.
01:06And so this idea of generating these models becomes a lot more tricky because now there's
01:10no way to really intrinsically check how that data is processed, what's going on in the
01:15middle of the model.
01:16If you give it garbage, you're going to get garbage.
01:19Even if you give it good data, you might get garbage.
01:22So Geraldine, we're talking about data.
01:26You're with GXS Bank.
01:27Could you tell us a bit more about what the bank is trying to do and why data is so critical
01:32to that objective?
01:33Yeah.
01:34So GXS Bank is one of the four full digital banking licenses that was given four years
01:40ago by MAS.
01:42It is made out of the Grab Singtel Digital Consortium.
01:44I think one of the key CVP that we have in the bank is to leverage data to do things
01:51differently in terms of pricing, credit risk scoring, as well as enabling financial inclusion,
01:57financial mobility to customers who today might not be served by local banks.
02:03And when we started the journey four years back, I think Satyen has been with us on this
02:07journey as well.
02:08And the importance about getting the quality data, the right types of metadata inside your
02:14data lake is so crucial in building those models that we talked about.
02:21The kinds of data sources are so important when we started off as a new bank.
02:25Imagine having to do credit risk scoring or risk-based pricing from scratch without any
02:29forms of history of the data of your customers.
02:34And this was the experience that we had.
02:36If I had to summarize the journey thus far, I think the first year was really about getting
02:43governance in place, policies, frameworks, the tools like Satyen is using or has created
02:50in place as well, processes in place.
02:53And then you realize that in year two, those processes might not work, and you go back
02:56and revise some of these processes.
02:58The collection of data is so important.
03:00You need to be able to understand the customer journey and embed data collection at the touch points of
03:05the customers.
03:06Because if you do it after the fact, then you're missing out on all this information
03:10that you could be collecting.
03:13And then building up those models, refining it, and including those feedback loops that
03:17your customers are giving you signals from before you actually put it into production
03:22as well.
03:23So I would summarize this as a four-year journey.
03:26So Satyen, you have a client or customer who comes to you with a data problem they want
03:31to solve.
03:32How do you start the conversation about data quality, about data intelligence at Alation?
03:40The interesting thing about data intelligence is that often when customers come to us, they
03:44say they want to organize their data.
03:47And often these are very complicated companies with lots of legacy data and systems that
03:54have existed over, in some cases, decades, and in some cases, multiple decades.
03:59And there's this view that you can sort of organize all of that data in some period of
04:06time.
04:07That if we just put some work into it, and we buy the right software, and we buy the
04:10right tools, all of a sudden, all of this data will get organized.
04:15Now interestingly, it doesn't work that way at all.
04:17Because most of the time, recreating all of the information that's produced this information,
04:23navigating all of this code, navigating all of this software is almost impossible to do.
04:28Most of these companies have been built over thousands and tens of thousands of projects
04:32with hundreds of thousands of databases, each of which have tens of thousands of data sets
04:38themselves.
04:39And so to try to do all that archaeology is really very difficult.
04:42And then to try to make all of that data go to higher quality is also difficult.
04:48And it also, frankly, depends on the use case.
04:50And so often, what we recommend is for customers to work from the end backwards.
04:55You first have to start and think, what am I trying to do?
04:57And what are the relevant bits of data?
05:00And often that's not very apparent.
05:01Because some bits of data might answer a question before you even knew that that data existed.
05:06And so it's an iterative process for sure.
05:09And it's not one that's always a straight line.
05:11But it is the case that you have to work from the end backwards.
05:14Otherwise, you end up in a situation where you can get quickly lost, and you can lose
05:18a lot of money.
05:19And how often is it that that legacy data just ends up being useless to whatever problem
05:22you're trying to solve?
05:24Quite often.
05:25Yeah.
05:26Quite often.
05:27Either because it was incorrect, or because you couldn't find it appropriately, or because
05:31the system that contained it wasn't actually the authoritative source.
05:35So Geraldine, you mentioned in your answer about you having to get alternate data sources,
05:40which requires working with ecosystem partners, getting their data, getting them to share
05:45that data responsibly.
05:47And I'm reminded of financial inclusion.
05:50The reason why it's hard to serve these customers is because there isn't enough information
05:54out there to build the right customer profile, to build the right risk profile.
05:58So how are you and GXS Bank trying to solve this problem by getting the data that you
06:03need to serve these customers?
06:06So in Singapore, I think most of us here have records in the Credit Bureau of Singapore.
06:13However, in Indonesia, in our regional countries, this might not be so.
06:17And this is when we rely on alternative data sources, such as from Grab and Singtel.
06:23When we first started this journey, we had to work closely with Singtel and Grab to use
06:28them as proxies.
06:29So if you think about Grab, there is your financial behavior in some of their financial wallets,
06:34your buy now, pay later.
06:35On the Singtel side, you would think about bill repayment behavior, for example.
06:39Those are simple proxies that we could use.
06:41But they are strong signals for us to build a starting model, as well as for feature engineering.
06:47And from there, we actually, well, I'll term it as a cold start problem in doing risk-based
06:54pricing as well as credit risk scoring.
06:56And with that, I think what we have done is to be able to offer small little loans to
07:02people who have no bureau or thin bureau, that's what we call in Singapore, who otherwise
07:06today might not be served by the traditional banks.
07:10In doing so, what you're doing is to be able to get records of them for their regular repayment
07:15behavior coming back.
07:16And once this has been determined to be regular and reliable, we are then able to increase
07:23the number or amount of loans to them as well.
07:25I think this helps in creating those financial inclusion and mobility for some of these customers
07:30who might not be able to get a loan otherwise.
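[Editor's sketch] The progression described above, small initial loans followed by larger limits once repayment proves regular and reliable, can be illustrated as a simple rule. This is only a sketch under stated assumptions, not GXS Bank's actual method: the `Borrower` class, the `record_repayment` function, the three-payment streak, the 1.5x step and the cap are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Borrower:
    """A thin-file customer with a small starting credit limit."""
    limit: float
    on_time_streak: int = 0


def record_repayment(borrower: Borrower, on_time: bool,
                     streak_needed: int = 3, step: float = 1.5,
                     cap: float = 5000.0) -> float:
    """Update the borrower after one repayment and return the new limit.

    All thresholds are hypothetical, chosen only for illustration.
    """
    if not on_time:
        # A late or missed payment resets the streak; the limit is unchanged.
        borrower.on_time_streak = 0
        return borrower.limit
    borrower.on_time_streak += 1
    if borrower.on_time_streak >= streak_needed:
        # Regular, reliable repayment earns a larger limit, up to the cap.
        borrower.limit = min(borrower.limit * step, cap)
        borrower.on_time_streak = 0
    return borrower.limit


customer = Borrower(limit=200.0)
for paid_on_time in (True, True, True):
    new_limit = record_repayment(customer, paid_on_time)
print(new_limit)  # 300.0
```

Each repayment also becomes a record that did not exist before, which is the point made in the answer: the loan itself generates the history that a thin-bureau customer lacks.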
07:33So we are open to questions from the floor.
07:35So if you have any questions, think about them as I ask Satyen this question.
07:40So you talked before about too much data, too much legacy data getting a handle on that.
07:45But the opposite problem is not having enough data.
07:49And there has been a lot of discussion about ways to solve that.
07:52And one of the solutions proposed is synthetic data, artificially created data to expand
07:58your training set.
08:00What are your thoughts on synthetic data?
08:03Is it a workable solution to this problem, or does it do more harm than good?
08:09It certainly can be quite valuable, given the right context.
08:12So essentially what you're doing with synthetic data is you're taking real data, real documents
08:17inside of a company that might instead have confidential information or trade secrets.
08:22And you're essentially tokenizing it and changing the actual content, but leaving the
08:29structure of that content in place.
08:30And so if that metastructure is altered completely, then you might lose the learnings.
08:35And so it really depends on how the transformations occur, what the use cases and the outcomes
08:41might end up being.
08:42But it certainly has a place, and it certainly has a use.
08:45And in a world where we're looking for more and more data, and these models are so voracious
08:49and their appetite is so big, it is a helpful thing to have in a world where we need to
08:55get more information to make the models better.
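[Editor's sketch] The idea of changing the actual content while leaving its structure in place can be shown with a toy example. This is a minimal sketch, not Alation's or any vendor's method: `shape` and `synthesize` are hypothetical helpers, and production synthetic data is typically generated with statistical or generative models rather than character substitution.

```python
import random
import string


def shape(text: str) -> str:
    """Reduce text to its structure: 9 for digits, A/a for cased letters."""
    out = []
    for ch in text:
        if ch.isdigit():
            out.append("9")
        elif ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        else:
            out.append(ch)  # punctuation and spacing carry the layout
    return "".join(out)


def synthesize(text: str, rng: random.Random) -> str:
    """Replace every digit and cased letter with a random one of the same kind.

    Toy only: uncased characters are passed through unchanged.
    """
    out = []
    for ch in text:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


rng = random.Random(42)  # seeded so the run is repeatable
real = "Acct 1234-5678, Jane Tan"
fake = synthesize(real, rng)
print(shape(real))                 # Aaaa 9999-9999, Aaaa Aaa
print(shape(fake) == shape(real))  # True
```

The sketch also shows the trade-off raised in the answer: the format survives, but anything a model could have learned from the real values, such as which names or account ranges are common, is lost.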
08:58Any questions from the audience?
09:00Well, in that case, Geraldine, you were one of GXS Bank's first employees.
09:07You had to build this data culture from scratch.
09:10Could you tell us a bit more about that experience?
09:12What it was like getting everyone on the same page when it comes to dealing with data, dealing
09:16with data responsibly, protecting data privacy?
09:19What was that story like?
09:22I think it was fortunate that we were one of the first few employees.
09:26It signified a lot to what data was in a company as one of the first employees to be employed.
09:33As new people came on board, I think what I experienced was their understanding of what
09:38data team does for them, what analytics does for them, what data science is for them.
09:44It's all varied levels of maturity.
09:48One of the things that I tried doing was, and this was back during COVID time, so we
09:52were building a digital bank remotely on Zoom when teams of two people could meet with each
09:59other.
10:00It was a lot of educating, meeting with them to understand what their needs were, their
10:04expectations were, but also creating an operating model to say, hey, what does the data team
10:10actually do?
10:11What can it benefit you?
10:12What are some of the use cases that we have seen in the past being adopted within your
10:16business unit?
10:17How does it help you?
10:18How are you going to adopt that?
10:20Also being able to assess their level of maturity was very important for us.
10:24Secondly, I think tools like Alation, for example, the other tools that we use for data
10:29visualization, trying to democratize a lot of the data to the hands of the users so users
10:36in the bank are able to search for data definitions, what is in the bank itself through Alation,
10:42for example, and they have trust in the data that they see.
10:46This is really important because reliability on the data and the quality of the data is
10:50a key component to using the data, and putting it, expanding it into a dashboard and visualization
11:01is what helps the users create those use cases, create the problem statements that they use
11:06data to solve for as well.
11:09And then a similar question to you, I mean, if you were to give advice to a business leader
11:13about how to build a strong data culture in their organization, what would be the sorts
11:18of advice you would give them?
11:20So culture is effectively just a set of habits and behaviors that exist, right, within any
11:25organization.
11:26And so those habits and behaviors exist on some level by default.
11:30And the real thing to incentivize, I mean, if we're trying to build a habit, if we want
11:34to stop smoking, if we want to eat better quality food, you really want to build in
11:40immediate rewards.
11:42And what people do in the data path often that makes that journey go awry is there's
11:47too much organizing, too much money being spent, too many organizational meetings about
11:54my definition is better than your definition, and you get into this work and you get lost.
11:59And so what we find the best companies do and the best customers do is they tie their
12:04outcomes to value very quickly.
12:07And so the work that they do is often very clearly measurable and very clearly outcome
12:14oriented, and the stories are very quickly socialized.
12:18And so what you find is that that momentum often ends up being super useful, because
12:23if you can get one or two wins, then you can get a couple more.
12:27The other thing, though, that I find super valuable is that you tie to the most important
12:30problems of the business.
12:32The thing about data teams is they can work on anything.
12:35It's ultimately enablement service.
12:36And so how do you know which problems to tie to?
12:40It's the ones that the executive cares about the most.
12:42And often we find that data teams forget about this.
12:44They just work on the problems that are most interesting.
12:46But the interesting problems aren't the ones that are going to necessarily make you or
12:49save you money.
12:51And what about you, Geraldine?
12:52What lessons would you kind of offer to leaders going through this journey themselves?
12:56I would say you really need the buy-in of the top management, and even getting everyone
13:02together in getting a business sponsor is so important.
13:06Because like what Satyen was saying, it's tying the metrics of success to a business
13:10sponsor and showing them those quick wins as well is so important.
13:15So Satyen, I'm going to use a bit of a rude term on stage here.
13:21On social media, there's a term now being thrown around, AI slop, low-quality data,
13:28low-quality products generated by Gen AI programs.
13:32And there's a fear that it's just lowering the overall quality of stuff on the internet.
13:39How do you kind of prevent that feedback loop from happening?
13:42How do you prevent low-quality generated AI products from polluting the data set and making
13:49your own AI output worse, and then having this negative cycle towards just having really
13:56bad quality stuff produced by your AI models?
13:59Yeah.
14:00And of course, the slop begets more slop.
14:04And we're, of course, in a United States election year, and you're going to see misinformation,
14:09which is sort of even a worse form of the slop.
14:15So there's lots of answers to that question.
14:17But one of them is, interestingly, of course, there's AIs to detect the bad AIs, which can
14:23only, of course, go so far.
14:25There's manual intervention.
14:28There's model training.
14:31But then there's vigilance, right?
14:32We all have to be sort of aware of the information that we're consuming and understand what is
14:38right or wrong.
14:39And that's a form of literacy that I think doesn't exist as critically as it should in
14:43the digital world.
14:44I mean, we certainly all get phished.
14:47We get SMSs that effectively are trying to defraud us into sending gift cards and money.
14:52But we have to realize that some of this information is a light form of phishing.
14:56They're trying to convince us of messages that may not be true or bad information.
15:00And so it's a multi-pronged thing.
15:02I mean, truth is a hard thing to come by, and it's work.
15:08So Geraldine, I remember I was talking to a VC once, and he made the point that the
15:15world of fintech, the opportunities in fintech are not actually in centers like Hong Kong,
15:22even in Singapore, where the regulation is very strong, has a well-developed financial
15:25system.
15:26But a lot of the opportunity is in emerging markets, where the systems are less developed.
15:32There's less infrastructure.
15:36As people are thinking about financial inclusion, fintech, in these less developed markets,
15:42how do they need to think about data?
15:44Is the data going to be harder to come by?
15:46Are concerns about, again, data privacy, working with partners more important?
15:51As you move to wilder markets than Singapore, what do people have to keep in mind?
15:58I think I'll draw lessons from what we have in the bank as well.
16:02I think the first thing that I mentioned earlier was about data sharing agreements in place
16:06with our shareholders.
16:08I think this can be a lateral learning lesson.
16:11Having those proper data sharing agreements, things with consent, who collects consent
16:17as part of this whole customer journey is really important as well.
16:20I recall days and nights poring over that document.
16:23But operationalizing it is equally tough, because each organization has its own mechanism,
16:30own channel for which data is being shared.
16:33The other thing is, how do you also do audits on each other's data storage?
16:39How do they protect the data?
16:41And there are agreements in place such that it states down how that data would be stored
16:46properly and used in an appropriate manner for certain users, for certain purposes only.
16:52I think that's from a data protection, data sharing point of view.
16:55The second thing is about how do you use the data?
16:58We spoke about alternative data sources.
17:01I think a lot of learnings could be used in, for example, credit risk profiling, in acquisition
17:06of customers as well.
17:07So you lower your cost of acquisition through your partners.
17:10But how do you use those alternative data sources to create a flywheel as well and help
17:14both sides of the data sharing partners to do customer acquisition as well as credit
17:20risk profiling?
17:21Where data sources such as Credit Bureau Singapore might not be available, alternative
17:26data sources would be where you leverage and create the risk profile as well.
17:31So very quick question to you, Geraldine.
17:33GXS Bank launched August 2022.
17:36ChatGPT launches three months later, which changes the whole conversation about AI and data.
17:43I mean, are you expecting any other surprises when it comes to data and AI?
17:49I do see that right now there's a lot of gen AI being done in a very task-level manner
17:56that should be expanded into a more enterprise workforce impact type of flow.
18:03The workflow, this is where I see transformation in the workflow entire processes.
18:08And this is dependent on where each organization defines end-to-end, because the end-to-end
18:12workflow for me might be very different to, say, Satyen's organization.
18:16And I think it's about us applying individual organization's mindset to see where those
18:20gaps are to implement the gen AI for workflows, workflow impact on gen AI.
18:26And I mean, we were talking earlier, I mean, this is still all very new.
18:29And I think a lot of companies, especially non-tech companies, are still grappling with
18:33this new tech.
18:34I mean, how long...
18:35And the tech companies too.
18:37Yeah.
18:38How long will it take, you think, for this literacy to really break into outside of the
18:43tech sphere to these other companies?
18:47So I think the thing that's unpredictable with this technology is that you don't know
18:50how quickly the models themselves are improving.
18:53And then there are other techniques in the general world of AI, like deep reinforcement
18:57learning that could sort of really improve the quality and the capability of what these
19:02models can do.
19:03So that is, I think, somewhat unpredictable, but could be quite revolutionary.
19:07But even given the technology, let's just say we stopped developing these new models.
19:13I would argue that it's going to be certainly years and possibly decades before we can get
19:18to the level of adoption that really has most of these institutions absorbing this technology
19:24in the way that they need to.
19:25I mean, we're so early in terms of understanding what use cases are appropriate, how these
19:31models can be used, how to tune them appropriately, what the right prompts are.
19:35I mean, there's so many things that we have yet to learn and how to integrate the right
19:40data into it.
19:41So there's a lot to do.
19:42And I think there's just tons of excitement in front of us.
19:46Well, we will definitely experience more unpredictability in the world of AI in the years to come.
19:50Thank you so much, Satyen.
19:51Thank you so much, Geraldine, for talking about AI-ready data.
19:55Please give a round of applause.
