AI models in production: sleeping cells

May 23, 2024

You're at a theater and experiencing an unusually long wait to be seated. Despite there being the usual number of ushers, you notice some are just walking around and not actually helping anyone. This hidden inefficiency mirrors a problem faced by mobile operators: sleeping cells.

"Sleeping cells" appear to function normally but cannot effectively handle user traffic due to issues like congestion, software errors or memory problems. These cells go undetected until customer complaints or routine checks reveal abnormal traffic patterns.

Every day, mobile operators grapple with this issue across hundreds of thousands of cells. Manual processes can't keep up with the scale and complexity of finding the problem cells, often leading to long detection and resolution times, and bad customer experiences.

The operator only discovers the issue when customer service teams report complaints that a subscriber’s device shows a strong signal, but the service is not working or is poor. Or a routine check reveals low traffic patterns that could potentially indicate the culprit: a sleeping cell. Then again, maybe there is just low traffic?

Manual processes can’t cope in modern times

Operators typically rely on manual thresholds to detect dozens of potential sleeping cell cases every day. Most cases are not actually sleeping cells. Rather, they are the result of false positives triggered by various symptoms that mimic a sleeping cell but are caused by other factors, such as:

  • Event-driven traffic anomalies. Traffic suddenly drops because an event like a festival or public holiday has ended.
  • Seasonal variations. People traveling during the summer, for example, can cause anomalies in cell usage.
  • Weather-related issues. Heavy cell usage can follow a power outage in the area, with traffic dropping suddenly once power is restored.
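
To see why a fixed threshold produces so many false positives, consider a minimal sketch. The floor value, the cell names and the traffic figures are all hypothetical, chosen only to illustrate the point; they are not operator data.

```python
# Hypothetical static rule: flag any cell whose hourly traffic is below a fixed floor.
STATIC_FLOOR_MB = 500.0  # assumed threshold, for illustration only


def flag_by_static_threshold(hourly_traffic_mb: dict[str, float],
                             floor_mb: float = STATIC_FLOOR_MB) -> list[str]:
    """Return the IDs of cells whose traffic is below the fixed floor."""
    return sorted(cell for cell, mb in hourly_traffic_mb.items() if mb < floor_mb)


# Made-up readings for one hour.
readings = {
    "cell_festival_ground": 120.0,  # festival ended: low but healthy
    "cell_office_holiday": 90.0,    # public holiday: low but healthy
    "cell_sleeping": 15.0,          # genuinely not carrying traffic
    "cell_downtown": 2400.0,        # normal
}

suspects = flag_by_static_threshold(readings)
print(suspects)
```

The rule flags three cells, but only one of them is actually asleep. The threshold has no way to tell an emptied festival ground from a faulty cell.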

The easiest approach to addressing the problem is to simply reboot every suspected cell, taking it out of commission for as long as six minutes—and crushing the customer experience in the area in the process. In our theater example, this would be akin to summoning all of the ushers to a meeting and creating further delays when only two were creating a problem.

Operators aiming to be more surgical in their response may visit each suspected cell site to test its performance manually. Of course, this is not always realistic given the cost and time involved. It is common for sleeping cells to go undetected for days, impacting every subscriber who comes in contact with them.

Finding a needle in the haystack with AI

Automation and AI are emerging as critical tools to help operators more quickly detect traffic anomalies amongst hundreds of thousands of cells. Much like finding a needle in a haystack, AI can help operators proactively and efficiently discover symptoms based on subscriber usage, base station resource scheduling, network usage data, cell site data, geolocation and more.

The AI model applies machine learning and classification to large time-series data sets. It profiles the behavior of cell sites and subscribers, tracks how that behavior trends and compares it against historical data in order to predict and detect the problem. Only then can an accurate determination be made about whether a sleeping cell is to blame.

Let’s dive deeper into the methods that power the analysis:

Machine learning models:

  • Pattern recognition. Models are trained on historical data to recognize patterns associated with normal and abnormal traffic behavior.
  • Feature extraction. Anomalies are identified based on traffic volume, user activity and error rates extracted from the data.
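
As a rough illustration of these two ideas, the sketch below trains a nearest-centroid classifier on a handful of labeled feature vectors. This is a deliberately simple stand-in, not the model described in this article. The three features, the training values and the function names are assumptions made for the example.

```python
from statistics import mean

# One feature vector per cell-hour (all hypothetical):
# (traffic vs. baseline, attached users vs. baseline, error rate)
Features = tuple[float, float, float]


def centroid(samples: list[Features]) -> Features:
    """Average each feature column across the samples."""
    return tuple(mean(column) for column in zip(*samples))


def train(labeled: dict[str, list[Features]]) -> dict[str, Features]:
    """Learn one centroid per label from historical, labeled data."""
    return {label: centroid(samples) for label, samples in labeled.items()}


def classify(model: dict[str, Features], x: Features) -> str:
    """Assign x to the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a: Features, b: Features) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], x))


# Made-up history: sleeping cells keep users attached but carry almost no traffic.
history = {
    "normal": [(1.0, 1.0, 0.01), (0.9, 0.95, 0.02), (1.1, 1.05, 0.01)],
    "sleeping": [(0.1, 0.9, 0.30), (0.05, 1.0, 0.40), (0.15, 0.85, 0.35)],
}
model = train(history)

print(classify(model, (0.12, 0.9, 0.28)))
print(classify(model, (0.95, 1.0, 0.02)))
```

A production model would use far more features and a stronger learner, but the shape is the same: extract features, learn from labeled history, then classify new observations.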

Time-series analysis:

  • Historical comparison. Traffic data is analyzed over various time periods (e.g., hourly, daily or weekly) to detect changes from expected patterns. In this case, a consistent drop in traffic at a specific time versus historical data could indicate a potential issue.
  • Seasonality and other trends. Models may account for expected seasonal variations or longer-term trends to reduce the likelihood of a false positive due to temporary changes resulting from weather or an event.
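
The historical comparison can be sketched as follows, assuming each hour is compared only against the same hour on the same weekday in previous weeks. That is one simple way to absorb weekly seasonality. The robust z-score (median and median absolute deviation), the -3.5 limit and the traffic figures are our illustrative choices, not details of the production model.

```python
from statistics import median


def robust_z(current: float, history: list[float]) -> float:
    """Score `current` against same-slot history using median and MAD."""
    med = median(history)
    mad = median(abs(v - med) for v in history)
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    if scale == 0:
        scale = 1.0  # constant history: fall back to raw difference
    return (current - med) / scale


def consistent_drop(recent: list[float],
                    history_per_slot: list[list[float]],
                    z_limit: float = -3.5) -> bool:
    """True only if every recent slot is far below its own baseline."""
    return all(robust_z(cur, hist) < z_limit
               for cur, hist in zip(recent, history_per_slot))


# Made-up traffic (MB) for three consecutive hours over the past four weeks.
history_per_slot = [
    [980.0, 1020.0, 1000.0, 1010.0],
    [1180.0, 1220.0, 1200.0, 1210.0],
    [880.0, 920.0, 900.0, 910.0],
]

print(consistent_drop([40.0, 35.0, 50.0], history_per_slot))     # sustained collapse
print(consistent_drop([990.0, 1190.0, 40.0], history_per_slot))  # single-hour dip
```

Requiring the drop to persist across several slots is what separates a sustained collapse from a momentary dip.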

Subscriber profiling:

  • User behavior analysis. Individual users or groups of users are profiled to understand which of them contribute most to the traffic and what their usual patterns look like. The model then detects when these patterns are disrupted.
  • Application usage. Application types in use are analyzed. Users primarily using chat apps might show different traffic patterns compared to those streaming videos. This helps accurately determine if traffic drops are a result of a sleeping cell or typical application usage.
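
Here is a minimal sketch of why the application mix matters. It estimates the traffic a cell should carry from the profiles of its attached subscribers and compares that with what was observed. The profile names and per-profile usage figures are invented for illustration.

```python
# Assumed typical hourly usage per subscriber profile, in MB (illustrative only).
TYPICAL_MB_PER_HOUR = {"chat": 5.0, "browsing": 40.0, "video": 600.0}


def expected_traffic_mb(attached_profiles: dict[str, int]) -> float:
    """Estimate hourly traffic from the number of attached users per profile."""
    return sum(TYPICAL_MB_PER_HOUR[profile] * count
               for profile, count in attached_profiles.items())


def delivery_ratio(observed_mb: float, attached_profiles: dict[str, int]) -> float:
    """Observed traffic as a fraction of what the subscriber mix would predict."""
    expected = expected_traffic_mb(attached_profiles)
    return observed_mb / expected if expected else 1.0


# Two cells with the same observed traffic but very different subscribers.
chat_heavy = {"chat": 200, "browsing": 10}
video_heavy = {"video": 50, "chat": 20}

print(round(delivery_ratio(1300.0, chat_heavy), 3))
print(round(delivery_ratio(1300.0, video_heavy), 3))
```

The same 1,300 MB is close to expectation for the chat-heavy cell and a small fraction of it for the video-heavy one, so only the second looks suspicious.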

Over time, the models are fine-tuned to differentiate between actual sleeping cells and cells experiencing normal fluctuation.

AI model implementation challenges

Especially in the early days, the biggest challenge operators face when deploying an AI model to identify sleeping cells is accuracy.

For example, as we discussed earlier, it is hard to tell a genuine fault apart from a seasonal change in the data. AI models learn only when previous outcomes are validated and fed back into the system. This tuning takes time but gradually increases accuracy.
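
One simple form this feedback loop could take is re-picking the alert threshold from validated outcomes. In the sketch below, each past alert carries the model's score and whether engineers confirmed a sleeping cell. The data, the candidate thresholds and the choice of F1 as the metric are all assumptions made for illustration.

```python
Outcome = tuple[float, bool]  # (model score, confirmed as a sleeping cell?)


def f1_at(threshold: float, outcomes: list[Outcome]) -> float:
    """F1 score if every cell scoring at or above `threshold` were flagged."""
    tp = sum(1 for score, confirmed in outcomes if score >= threshold and confirmed)
    fp = sum(1 for score, confirmed in outcomes if score >= threshold and not confirmed)
    fn = sum(1 for score, confirmed in outcomes if score < threshold and confirmed)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def retune_threshold(outcomes: list[Outcome], candidates: list[float]) -> float:
    """Pick the candidate threshold with the best F1 on validated outcomes."""
    return max(candidates, key=lambda t: f1_at(t, outcomes))


# Made-up validated outcomes fed back by operations teams.
validated = [
    (0.95, True), (0.90, True), (0.80, False), (0.75, True),
    (0.60, False), (0.55, False), (0.40, False),
]

best = retune_threshold(validated, [0.5, 0.7, 0.9])
print(best)
```

Each new batch of validated outcomes can shift the chosen threshold, which is one reason accuracy improves gradually rather than all at once.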

The operators we work with have a head start from this perspective given the years of training already informing our models. But still, AI models need to be further tuned based on the specific characteristics of traffic, events and anomalies that are unique to each operator’s network and region.

Real AI-powered sleeping cell model results from Japan

Rakuten’s AI model for identifying sleeping cells has been live in Japan for years. Following implementation, mean time to detect and resolve sleeping cell issues improved by 80% with demonstrable customer experience improvements. This is evidenced by traffic usage increases for each affected cell.

The model learns and improves continuously over time, currently delivering as high as 80-90% accuracy. It is very efficient, with near real-time triggering.

There are also follow-on effects related to the improvements. Even though sleeping cells may initially affect network performance, the overall traffic at a cluster level remains stable due to the fast detection and resolution time.

To date, Rakuten Mobile has deployed more than 70 AI models and we look forward to continuing to share details about how they have been implemented and the results we are achieving.
