Moderator:
Geoff Hollingworth, CMO | Rakuten Symphony
Speakers:
Gaurav Jain, Head of Data/AI Platform | Rakuten Symphony
Piyush Khurana, Product Owner | Rakuten Symphony
[This transcript has been edited for clarity and readability.]
Geoff Hollingworth: Hi and welcome to the latest recording of Zero-Touch Telecom Live, our companion broadcast to our newsletter on LinkedIn, Zero-Touch Telecom. We've spent a significant amount of time in this newsletter and the live sessions on the importance of data and how it feeds into good AI. Good AI means good data. We've discussed the efficiency models that we've put into production with various customers. Today, we're going to shift slightly and look at modeling around data sets for growth rather than efficiency. I'd like to welcome Gaurav and Piyush from our AI and data team to give us insights from people who are really doing this work. Welcome to the show, both of you.
Let me start with you, Gaurav. One of the things you covered in the article was the parallel journeys of bringing data in from a top-down point of view and also finding opportunities from a bottom-up product-centric approach. Can you explain the difference between the two and your experience?
Gaurav Jain: So, Geoff, what I believe and what the industry is currently moving towards is bringing data and AI together. It's not possible to do AI without the right data. We've created an AI and data platform where data is the critical part, capturing the right data in the right shape and size to feed into the AI system. AI systems require filtered, well-planned, well-managed data, and that's what we are doing in our AI platform. With that, you'll get the right insights and actionable outputs from any AIOps system.
Geoff Hollingworth: Piyush, I'm going to bridge to you now and ask you the difficult questions based on what Gaurav just said. What are the top three challenges in doing what Gaurav mentioned? He made it sound so easy, but in reality, what's hard about bringing data into a platform like that?
Piyush Khurana: I would say, Geoff, it's really about the amount of data that goes into the system. One challenge is having petabytes of data or even more and not knowing which is the relevant data set to start with. This brings the associated challenge of cost as your data lake grows. We initially considered both the top-down and bottom-up approaches. With the top-down approach, you build a full-fledged platform and put all sorts of data sets into the data lake, enabling users to perform AI actions. The bottom-up approach is more particular about the output you want to receive, integrating only the data sets required for specific use cases. Identifying the right data sets and exposing them to the target audience in the right manner are key challenges.
Geoff Hollingworth: OK, let's dig into some of your answers. In the article, you mentioned three rich data sets: deep packet inspection (DPI), call detail records (CDR), and another one that you'll remind me of, Gaurav. What led you to these data sets, and what did you see happen when you injected them?
Gaurav Jain: When we started ingesting the data, we got petabytes of data. For our users, it was difficult to find the right data set. So, we created a catalog and querying layer on top of it. This helps users identify relevant, clean data. The catalog provides capabilities to figure out the best-suited data for their needs, and the querying layer ensures the data has all the necessary insights. For example, DPI data gives granular details about the ecosystem, and CDR data, which is BSS data, provides insights into customer behavior and payment patterns. Combining these data sets helps us better understand where to invest and identify potential use cases.
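The catalog-and-query layer Gaurav describes can be sketched in miniature. The entries and tags below are hypothetical, not the actual Rakuten Symphony catalog; the point is simply that metadata lets users discover relevant data sets by keyword rather than trawling petabytes directly.

```python
# Hypothetical catalog entries: data set name -> metadata
catalog = {
    "dpi_sessions": {"domain": "network", "tags": {"dpi", "usage", "granular"}},
    "cdr_daily":    {"domain": "bss", "tags": {"cdr", "billing", "customer"}},
    "payments":     {"domain": "bss", "tags": {"payment", "customer"}},
}

def search(catalog, *keywords):
    """Return data sets whose name, domain, or tags match every keyword."""
    hits = []
    for name, meta in catalog.items():
        haystack = {name, meta["domain"], *meta["tags"]}
        if all(any(kw in item for item in haystack) for kw in keywords):
            hits.append(name)
    return sorted(hits)

print(search(catalog, "customer"))  # ['cdr_daily', 'payments']
print(search(catalog, "dpi"))       # ['dpi_sessions']
```

A real catalog would also surface data quality, freshness, and lineage, so users can judge whether a data set is clean enough before querying it.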
Geoff Hollingworth: Piyush, do you have anything to add?
Piyush Khurana: Gaurav covered most of it. These data sets, DPI, CDR, and BSS, are crucial for any telecom organization. Even for basic segmentation problems, these are the essential data sets. It's about figuring out what the customer is doing on the network and how to improve their experience using network insights.
Geoff Hollingworth: Walk me through how someone who needs a business outcome, like targeting people likely to subscribe to Rakuten TV, would use your platform.
Piyush Khurana: For any user, the first step is discovering the data on the platform. We have a data catalog module where you can search for relevant data sets. For example, you might search for video streaming volumes. Once you find the data set, you might need to perform aggregates. You don't need minute-level details but rather daily usage patterns. You identify relevant users and sort them, targeting the top users for your campaign. Then you monitor the campaign response and improve your algorithm based on conversion rates.
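The workflow Piyush walks through, aggregating daily usage per user and targeting the heaviest users, can be sketched as below. The records and user IDs are invented for illustration; on the real platform this aggregation would run in the querying layer against catalog data.

```python
from collections import defaultdict

# Hypothetical records: (user_id, day, video_streaming_mb)
records = [
    ("u1", "2024-01-01", 500), ("u1", "2024-01-02", 700),
    ("u2", "2024-01-01", 120), ("u2", "2024-01-02", 80),
    ("u3", "2024-01-01", 900), ("u3", "2024-01-02", 950),
]

def top_streaming_users(records, top_n):
    """Aggregate volume per user and return the heaviest streamers first."""
    totals = defaultdict(int)
    for user_id, _day, mb in records:
        totals[user_id] += mb
    return sorted(totals, key=totals.get, reverse=True)[:top_n]

campaign_targets = top_streaming_users(records, top_n=2)
print(campaign_targets)  # ['u3', 'u1'] - heaviest video users first
```

The closing step Piyush mentions, monitoring conversion rates and feeding them back into the targeting algorithm, would sit on top of a list like `campaign_targets`.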
Geoff Hollingworth: Gaurav, given the scale of operations with millions of customers and activities, how do you manage the data to avoid being overwhelmed by volume, cost, and performance?
Gaurav Jain: We manage data by creating focused data marts, dynamic data sets for specific customer needs. For example, we can use a GIS system to visualize where Rakuten TV customers are concentrated. We also have a data governance system to manage data cost-effectively, monitoring data usage and time-to-live. This helps us keep the right data for the right use cases while being cost-efficient.
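The time-to-live mechanism Gaurav mentions can be sketched as a minimal data mart that drops records past their retention window. This is an illustrative toy, assuming a simple per-record timestamp; a production governance system would track usage and cost as well.

```python
import time

class DataMart:
    """Minimal sketch: a focused data set with a time-to-live policy."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.records = []  # list of (inserted_at, payload)

    def ingest(self, payload, now=None):
        self.records.append((now if now is not None else time.time(), payload))

    def expire(self, now=None):
        """Drop records older than the TTL; return how many were removed."""
        now = now if now is not None else time.time()
        before = len(self.records)
        self.records = [(t, p) for t, p in self.records if now - t < self.ttl]
        return before - len(self.records)

mart = DataMart(ttl_seconds=86400)     # keep one day of data
mart.ingest({"cell": "A"}, now=0)
mart.ingest({"cell": "B"}, now=90000)  # ingested roughly a day later
removed = mart.expire(now=100000)
print(removed, len(mart.records))      # 1 1 - the day-old record is gone
```

Keeping only the data a use case actually needs, for only as long as it is needed, is what keeps storage cost proportional to value.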
Geoff Hollingworth: Piyush, with this amount of data, how do you ensure privacy and security?
Piyush Khurana: We follow a data mesh approach, marking sensitive information and restricting access. Users go through a governance mechanism for data access, requiring two-level approval from data owners and governance officers. This ensures sensitive data sets are protected.
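The two-level approval flow Piyush describes, data owner plus governance officer, can be sketched as a small state machine. The role names and request fields here are assumptions for illustration, not the platform's actual API.

```python
class AccessRequest:
    """Sketch of two-level approval for a sensitive data set."""

    REQUIRED_ROLES = {"data_owner", "governance_officer"}

    def __init__(self, user, dataset):
        self.user = user
        self.dataset = dataset
        self.approvals = set()

    def approve(self, role):
        if role not in self.REQUIRED_ROLES:
            raise ValueError(f"unknown approver role: {role}")
        self.approvals.add(role)

    @property
    def granted(self):
        # Access is granted only once both approval levels are in place
        return self.REQUIRED_ROLES <= self.approvals

req = AccessRequest("analyst_42", "cdr_sensitive")
req.approve("data_owner")
print(req.granted)   # False - governance sign-off still pending
req.approve("governance_officer")
print(req.granted)   # True
```

Distributing the first approval to each data set's owner is what lets this scale: the platform mediates requests, but the domain teams decide who sees their data.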
Geoff Hollingworth: So, you've distributed access control among data set owners. Your platform acts like a matchmaking function to manage access at scale. Is that correct?
Gaurav Jain: Yes, and we are also GDPR compliant, adhering to global data privacy policies. We're adding features to help customers discover new data sets using AI, providing metadata information for better insights.
Geoff Hollingworth: Gaurav, with your experience in traditional telecom companies versus Rakuten, what are the different challenges?
Gaurav Jain: Traditional telecom companies have varied data formats from different vendors, requiring significant data cleaning and transformation. In Rakuten, being cloud-native, we have a more standardized approach, reducing effort and cost.
Geoff Hollingworth: For everyone listening, good AI requires good data, and 90% of the work is ensuring data is managed appropriately. Piyush, any final thoughts?
Piyush Khurana: It's about a mix of approaches depending on the organization's data maturity. Be selective about data sets and avoid duplication by connecting to pre-existing data sources. This enriches the data lake and provides the necessary data for future use cases.
Geoff Hollingworth: Gaurav, your final thoughts?
Gaurav Jain: Finding the right data and creating actionable insights is key. Ensure the value from data lakes and AI systems is worth the investment.
Geoff Hollingworth: Thank you, Gaurav and Piyush. For everyone listening, we discuss making AI and automation happen at scale and the actual issues involved. Please subscribe, comment, and let us know what you'd like us to cover next. Thank you.