White Paper: Achieving Uninterrupted Service - NextGen Network Backup and Restore Solution
August 29, 2025
Abstract
In the highly competitive and interconnected telecom landscape, network availability and data integrity are paramount. Any disruption, whether due to hardware failure, software corruption, human error, cyberattack, or natural disasters, can lead to significant service outages, data loss, revenue loss, reputational damage, and regulatory penalties.
This white paper outlines a comprehensive centralized network Backup and Restore solution, specifically designed for fully cloud-native telco networks. It addresses the unique challenges of multi-vendor, multi-domain environments, geo-redundant architectures, and the need for rapid backup and recovery, ensuring data security, business continuity and customer trust. This solution is built upon a uniform, centralized, and carrier-grade framework, addressing the current lack of common tooling and the disparate backup approaches within telecom organizations.
Introduction
The telecommunications industry is rapidly evolving, driven by new technologies such as 5G, AI, and cloud-native architectures. This creates complex networks with many physical and virtual components, applications, and crucial data spread across various domains (Radio Access Network (RAN), Core Network, IP Transport, Operations Support Systems (OSS), Business Support Systems (BSS) and Security).
Current backup methods often handle parts of the network in isolation. This leads to slow and fragmented recovery during major issues. Our high-level design shows a clear need for a single, centralized, and consistent Backup and Restore solution. Today, different platforms (like Virtualized Infrastructure Manager (VIM), Rakuten Cloud, RAN, Core network, IP transport network, OSS, BSS and Security) use varied, often manual, backup approaches. Also, application and cluster backups are not yet robust enough for modern telecom needs.
To build a truly resilient network, we need a unified approach that covers all critical parts, from network functions and data to operational and business systems. This white paper describes such a solution. It focuses on providing an automated, scalable, secure and reliable telecom carrier-grade backup and restore operations framework.
The Critical Need for Network Backup and Restore in Telecom
For fully cloud-native telecom operators that expect to face frequent disasters, this need is intensified by the following:
Enhanced Disaster Recovery Capabilities: The need for robust, automated, and geographically spread disaster recovery. This means quick site failovers, data copied across regions and orchestrated recovery of whole network sections to minimize downtime during large outages.
Frequent and Verified Recovery Drills: A proactive approach to disaster management necessitates more frequent and thorough recovery drills. The solution must facilitate these drills by providing automated testing environments and clear verification processes to ensure that recovery plans are viable and meet Recovery Time Objective (RTO)/Recovery Point Objective (RPO) targets under various disaster scenarios.
Resilience Against Data Corruption/Loss: In a heightened disaster environment, data corruption or loss (e.g., from ransomware attacks, widespread hardware failures, or cascading software bugs) becomes a more significant threat. The solution's ability to restore to clean, verified states quickly is paramount.
Regulatory and Compliance Scrutiny: Operators facing increased disaster risks will likely face stricter regulatory scrutiny regarding their business continuity and disaster recovery plans. A comprehensive unified solution provides the necessary auditing and reporting capabilities to demonstrate compliance.
Maintaining Public Trust During Crises: The ability to rapidly restore services is crucial for maintaining public trust and demonstrating operational resilience, especially when such events become more frequent or severe.
A robust centralized backup and restore solution mitigates these risks by providing:
Rapid Recovery: Minimizing Mean Time To Recovery (MTTR) for critical services by automation.
Data Integrity: Ensuring the consistency and accuracy of configuration and operational data.
Business Continuity: Maintaining essential services even in the face of major disruptions by restoring from centralized backups.
Compliance: Meeting regulatory requirements for data retention and availability.
Security: Securing backups across all network domains with data encryption, access control, and retention policies.
Challenges in Telco Backup and Restore
Telecommunications networks face unique backup and restore challenges, made worse by current, fragmented methods. For fully cloud-native telcos, these challenges are even more pronounced:
Diverse Environments: Networks use equipment and software from many vendors, each with their own interfaces and data formats. This makes unified backup complex.
Lack of Observability and Alerts: Without centralized dashboards and proper alerting, it is difficult to monitor backup job status, identify failures, or understand the overall health of the backup environment, leading to delayed response and increased risk.
Disparate Approaches: Different business units employing varied, often manual, backup methods (Cron jobs, single SFTP server) hinder consistency and reliability. Fragmented backup makes coordinated, end-to-end recovery extremely complex, error-prone, and time-consuming, significantly increasing Mean Time To Recovery (MTTR) during critical outages.
Security Concern: Backup data is highly sensitive and requires robust data encryption, encrypted storage, retention policy and access controls.
High Transaction Volume: Constant changes in subscriber data, service configurations, and network states demand frequent backups.
Scalability and Data Synchronization: The sheer volume of data and number of components requiring backup is continuously growing. Ensuring consistency of data across multiple, geographically dispersed systems and data centers is a significant challenge.
Customer-defined retention policy: Without a centralized system allowing users to define and enforce data retention policies for different network elements, organizations face challenges with compliance, inefficient storage utilization, and potential data sprawl.
Real-time Data: Many network functions deal with real-time traffic and state information that is difficult to back up without service interruption.
Complex Interdependencies: Interdependencies between network functions, services, and operational systems make coordinated recovery challenging.
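The retention-policy challenge above can be made concrete with a minimal sketch. It assumes a hypothetical per-domain retention table and a simple list of backup records; the retention periods, field names, and identifiers are illustrative assumptions, not values from the solution described in this paper.

```python
from datetime import datetime, timedelta

# Hypothetical retention periods per network domain, in days (illustrative only).
RETENTION_DAYS = {"RAN": 30, "Core": 90, "BSS": 365}


def expired_backups(backups, now, retention_days=RETENTION_DAYS, default_days=30):
    """Return the ids of backups that are older than their domain's retention period."""
    expired = []
    for backup in backups:
        keep_for = timedelta(days=retention_days.get(backup["domain"], default_days))
        if now - backup["created"] > keep_for:
            expired.append(backup["id"])
    return expired


now = datetime(2025, 8, 29)
backups = [
    {"id": "ran-001", "domain": "RAN", "created": datetime(2025, 7, 1)},    # 59 days old
    {"id": "core-001", "domain": "Core", "created": datetime(2025, 7, 1)},  # 59 days old
    {"id": "bss-001", "domain": "BSS", "created": datetime(2024, 1, 1)},    # over a year old
]
print(expired_backups(backups, now))  # ['ran-001', 'bss-001']
```

Keeping the policy in one table, rather than in per-team scripts, is what lets a central system enforce it uniformly and report on it.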
A critical challenge for cloud-native telcos is the limited availability of comprehensive, ready-made market solutions for central backup and restore. The solutions that do exist often lack the deep customization needed for the diverse deployments and multi-vendor application support inherent in a telco environment.
This is precisely why Rakuten Mobile, which provides zero-touch, fully automated, IT-driven infrastructures designed, deployed, and operationalized to serve over 9 million subscribers across Japan, decided to develop this solution in-house from Rakuten Symphony.
This approach ensures that the solution can effectively handle the unique requirements of all kinds of cloud-native applications and cover network elements in every domain:
RAN
Core Network
IP Transport Network
OSS
BSS
Security
Cloud Platform
Our solution aims to simplify this by allowing users to create templates, manage activities, monitor through dashboards, and view detailed operations. The platform will ensure data integrity and availability by periodically backing up network elements or executing immediate backups based on user-defined templates, with robust mechanisms for restoration.
Figure 1: End to End Architecture Blocks of Rakuten Mobile Network
The Centralized Backup and Restore Solution
The solution is designed as a uniform centralized backup and restore solution, built to be telecom carrier-grade, scalable, and to operate on a common architectural framework. It orchestrates backup and restore operations across all network domains, ensuring data integrity and availability across data centers, as shown in Figure 2. The system will periodically back up network elements or execute immediate backups based on user-created templates, providing robust mechanisms for restoration.
Figure 2: Overall Architecture for Backup and Restore
It comprises the following key components:
1. Centralized Orchestration and Management Platform:
This platform serves as the core of the solution, providing a unified interface for all backup and restore activities.
Figure 3: Sample Dashboard
Unified Dashboard: Provides a single viewpoint for monitoring backup status, managing activities, and initiating restore operations across the entire network. This aligns with the "monitoring through dashboards" aspect of the user journey.
Policy Engine: Defines backup schedules (e.g., daily, weekly, as per "Backup Frequency" in templates), retention policies, data encryption standards, and recovery point objectives (RPOs) and recovery time objectives (RTOs) for different network elements and services. This is directly supported by the concept of "templates", which are "predefined configurations that users can create to standardize and automate backup and restore operations."
Workflow Automation: Automates the entire backup and restore process, reducing manual intervention and human error, a crucial improvement over current manual execution and Cron job approaches.
Integration Layer: Provides APIs and connectors for seamless integration with multi-vendor network elements, telco applications, virtualization platforms, container orchestration, and cloud environments. This is supported by the explicit dependencies on Inventory Manager and various system components like Keycloak, Vault/IPA/ISE/NRE, Observability, Notification Hub, OSS Platform Services, Jenkins CI/CD.
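To illustrate how a template could carry the policy settings listed above, the following is a minimal sketch. The class name `BackupTemplate`, its field names, and the validation rule are assumptions made for illustration; the paper does not specify the actual template schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Backup frequencies named in the text; the mapping to intervals is an assumption.
FREQUENCIES = {"daily": timedelta(days=1), "weekly": timedelta(weeks=1)}


@dataclass(frozen=True)
class BackupTemplate:
    """Hypothetical template: one reusable policy for a group of network elements."""
    name: str
    domain: str
    frequency: str       # "daily" or "weekly"
    retention_days: int
    rpo_hours: int       # maximum tolerated data loss
    rto_minutes: int     # maximum tolerated restore time
    encrypt: bool = True

    def __post_init__(self):
        if self.frequency not in FREQUENCIES:
            raise ValueError(f"unknown frequency: {self.frequency}")
        # A scheduled backup cannot meet an RPO shorter than its own interval.
        if timedelta(hours=self.rpo_hours) < FREQUENCIES[self.frequency]:
            raise ValueError("RPO is shorter than the backup interval")

    def next_run(self, last_run: datetime) -> datetime:
        """Return when the next scheduled backup is due."""
        return last_run + FREQUENCIES[self.frequency]


template = BackupTemplate(
    name="core-daily", domain="Core", frequency="daily",
    retention_days=90, rpo_hours=24, rto_minutes=60,
)
print(template.next_run(datetime(2025, 8, 29, 2, 0)))  # 2025-08-30 02:00:00
```

Validating the RPO against the schedule when the template is created catches a policy that could never be met before any backup job runs.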
2. Data Collection Agents/Connectors:
These components facilitate the extraction of data from various network elements and systems.
Network Element Adapters: Specialized agents or connectors for each vendor's equipment (e.g., RAN controllers, Core network elements, transport devices) to extract configuration, state, and application data. The assumption of "MOP from all domains to be promptly shared" is critical for the effective functioning of these adapters.
Virtualization/Container Integrations: Connectors to hypervisors and Kubernetes clusters to back up virtual machines, container persistent volumes, and cluster configurations. The explicit dependency on Rakuten Cloud Backup highlights the chosen solution for container-native backup.
Database Agents: Agents for various databases used by OSS/BSS and network functions.
File System Agents: For backing up critical files and directories on servers.
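One way to keep vendor-specific extraction logic behind a common contract is an adapter interface, sketched below. The names `NetworkElementAdapter`, `InMemoryAdapter`, and `collect` are hypothetical, and the in-memory adapter only stands in for a real vendor connector whose details the paper does not describe.

```python
from abc import ABC, abstractmethod
import hashlib


class NetworkElementAdapter(ABC):
    """Hypothetical common contract that every vendor-specific adapter implements."""

    @abstractmethod
    def export_config(self, element_id: str) -> bytes:
        """Return the raw configuration of one network element."""


class InMemoryAdapter(NetworkElementAdapter):
    """Stand-in for a real vendor connector; serves configs from a dict."""

    def __init__(self, configs):
        self._configs = configs

    def export_config(self, element_id: str) -> bytes:
        return self._configs[element_id].encode("utf-8")


def collect(adapter: NetworkElementAdapter, element_id: str) -> dict:
    """Pull one element's config and attach a checksum for later verification."""
    payload = adapter.export_config(element_id)
    return {
        "element_id": element_id,
        "size": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "payload": payload,
    }


adapter = InMemoryAdapter({"gnb-7": "cell-id: 42\n"})
record = collect(adapter, "gnb-7")
print(record["element_id"], record["size"], record["sha256"][:12])
```

Because `collect` depends only on the abstract interface, adding a new vendor means adding one adapter class rather than changing the collection workflow.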
3. Secure Storage Repository:
This component ensures the secure and efficient storage of backup data. The dependency on MinIO indicates a chosen solution for object storage.
Multi-Tiered Storage: Supports various storage types (e.g., NAS, SAN, object storage, cloud storage) for optimizing cost and performance based on data criticality and retention policies. MinIO can serve as a flexible object storage solution.
Data Deduplication and Compression: Data deduplication identifies and eliminates redundant copies of data at the block level. Only one unique copy is stored, and subsequent identical blocks are replaced with pointers to the existing one. This is particularly effective in telco environments where many network elements might share common configurations, operating system files, or application binaries.
After deduplication, the remaining unique data blocks are further compressed to reduce their size. Compression algorithms reduce the overall data volume, leading to smaller backup files.
Encryption at Rest and in Transit: Ensures the security of backup data.
Immutability: Supports immutable backups to protect against ransomware attacks and accidental deletion.
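The deduplication-then-compression sequence described above can be sketched in a few lines. This is a simplified illustration that assumes fixed-size 4 KiB blocks, SHA-256 as the block fingerprint, and zlib for compression; the class name `DedupStore` is hypothetical, and production systems often use content-defined chunking and other algorithms.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


class DedupStore:
    """Stores each unique block once, compressed, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blocks = {}  # digest -> compressed block

    def put(self, data: bytes) -> list:
        """Store data and return its recipe: the ordered list of block digests."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:          # deduplicate first...
                self.blocks[digest] = zlib.compress(block)  # ...then compress
            recipe.append(digest)                  # pointer to the stored block
        return recipe

    def get(self, recipe) -> bytes:
        """Rebuild the original data from a recipe."""
        return b"".join(zlib.decompress(self.blocks[d]) for d in recipe)


# Two element configs that share a common baseline and differ only at the end.
common = b"interface eth0\n mtu 9000\n" * 500
store = DedupStore()
recipe_1 = store.put(common + b"hostname site-1\n")
recipe_2 = store.put(common + b"hostname site-2\n")

assert store.get(recipe_1) == common + b"hostname site-1\n"
print(len(recipe_1) + len(recipe_2), "block references,", len(store.blocks), "blocks stored")
```

The two recipes reference the shared baseline blocks only once in storage, which is the effect the text describes for network elements with common configurations.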
The Central Backup and Restore solution will use industry-standard APIs where possible. Each microservice will publish its API in Swagger format so that it can be onboarded into the API gateway.
Figure 4: The API Call flow
4. Restore Engine:
This engine orchestrates the recovery process, ensuring rapid and accurate restoration.
Intelligent Restoration: Understands network dependencies and orchestrates the restore sequence to ensure proper service re-establishment.
Granular Recovery: Ability to restore specific configurations, individual VMs/containers, or entire network domains.
Validation and Verification: Post-restore checks to ensure the integrity and functionality of restored devices and services.
Rollback Capability: Ability to revert to previous states if a restore operation encounters issues.
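Dependency-aware sequencing and rollback can be sketched with a topological sort. The dependency map below is a hypothetical example, and `restore_order` and `run_restore` are illustrative names rather than parts of the actual Restore Engine. The sketch uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each component lists what must be restored first.
DEPENDS_ON = {
    "Cloud Platform": set(),
    "IP Transport": {"Cloud Platform"},
    "Core": {"IP Transport"},
    "RAN": {"Core"},
    "OSS": {"Core"},
}


def restore_order(depends_on):
    """Return a restore sequence in which every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def run_restore(order, restore_step, rollback_step):
    """Restore components in order; on failure, roll back completed ones in reverse."""
    completed = []
    for name in order:
        try:
            restore_step(name)
        except Exception:
            for finished in reversed(completed):
                rollback_step(finished)
            return False, completed
        completed.append(name)
    return True, completed


order = restore_order(DEPENDS_ON)
print(order)  # Core always appears before RAN and OSS
ok, done = run_restore(order, restore_step=lambda name: None, rollback_step=lambda name: None)
print(ok, len(done))
```

`TopologicalSorter` raises `CycleError` if the dependency map contains a cycle, which surfaces an invalid restore plan before any restore step runs.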
5. Reporting and Auditing:
This provides transparency and accountability for all backup and restore operations. The "detailed views of backup and restore operations" aspect of the user journey is supported here.
Comprehensive Logs: Detailed records of all backup and restore activities for auditing and compliance.
Performance Metrics: Tracking of backup success rates, RPO/RTO adherence, and storage utilization.
Alerting and Notifications: Proactive alerts on backup failures, storage issues, or potential recovery risks, supported by the Notification Hub dependency.
Figure 5: Sample of performance metrics
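The performance metrics listed above can be derived from job records, as in the following minimal sketch. The job record fields (`element`, `status`, `finished`), the element names, and the 24-hour RPO are assumptions for illustration, not the solution's actual data model.

```python
from datetime import datetime, timedelta


def backup_metrics(jobs, rpo, now):
    """Compute the backup success rate and the elements whose last good backup breaches the RPO."""
    succeeded = [job for job in jobs if job["status"] == "success"]
    success_rate = len(succeeded) / len(jobs) if jobs else 0.0

    # Most recent successful backup per element.
    latest = {}
    for job in succeeded:
        element = job["element"]
        if element not in latest or job["finished"] > latest[element]:
            latest[element] = job["finished"]

    # An element breaches the RPO if it has no good backup or the last one is too old.
    elements = {job["element"] for job in jobs}
    breaches = sorted(e for e in elements if e not in latest or now - latest[e] > rpo)
    return {"success_rate": success_rate, "rpo_breaches": breaches}


now = datetime(2025, 8, 29, 12, 0)
jobs = [
    {"element": "amf-1", "status": "success", "finished": datetime(2025, 8, 29, 2, 0)},
    {"element": "gnb-7", "status": "failed", "finished": datetime(2025, 8, 29, 2, 0)},
    {"element": "gnb-7", "status": "success", "finished": datetime(2025, 8, 27, 2, 0)},
    {"element": "upf-3", "status": "success", "finished": datetime(2025, 8, 29, 1, 0)},
]
print(backup_metrics(jobs, rpo=timedelta(hours=24), now=now))
# {'success_rate': 0.75, 'rpo_breaches': ['gnb-7']}
```

A breach list like this is the kind of signal that could feed the alerting path, since it identifies elements at risk rather than only failed jobs.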
Key Features
The Centralized Backup and Restore solution offers a robust set of features to meet the stringent demands of telecommunications networks:
Automated Discovery: Automatically discovers network elements, virtual machines, and containers, leveraging the inventory manager integration for comprehensive coverage.
Policy-Driven Backups: Define RPO/RTO targets based on service criticality using user-defined templates, ensuring business objectives are met.
Configurable Backup Frequency: Users can define how often backups should occur (e.g., daily, weekly) through the template system.
Incremental and Differential Backups: Optimizes backup windows and storage by only backing up changed data.
Application-Consistent Backups: Ensures data integrity for active applications and databases by coordinating with application-specific agents.
Network-Aware Restoration: Intelligent sequencing of restore operations, considering network dependencies (e.g., restoring core before RAN).
Disaster Recovery Orchestration: Automation of DR workflows, including site failover and failback.
Advanced Security Features:
Security by Design: Role-based access control (RBAC) via Keycloak, and encryption (at rest and in transit) to protect against threats.
Immutable Backups: Ensures that backup data can’t be altered, deleted, or encrypted by ransomware or malicious actors, providing an unchangeable last line of defense.
Scalability and Performance: Designed to handle the massive scale and high velocity of data in modern telco networks, leveraging technologies like MinIO and Rakuten Cloud Backup.
User-Centric Journey: Provides a clear user experience for creating templates, managing activities, monitoring through dashboards, and accessing detailed operational views.
Air-Gapped Solutions: Supports creating isolated, offline copies of critical backup data, providing ultimate protection against network-borne threats and widespread outages.
AI/ML-driven automation: The integration of AI/ML-driven automation into backup and restore processes will further empower operators. AI/ML can predict potential failures, optimize backup schedules, intelligently prioritize recovery steps, and even automate complex restoration sequences.
Figure 6: RBAC Scheme
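For the Incremental and Differential Backups feature listed under Key Features, the difference between the two modes comes down to the reference point used to detect changes. The sketch below assumes file modification times as the change indicator and uses plain integers as timestamps; the function names and file names are hypothetical.

```python
def changed_since(files, reference_time):
    """Return the paths modified after the given reference time."""
    return sorted(path for path, mtime in files.items() if mtime > reference_time)


def select_files(files, mode, last_full, last_backup):
    """Choose what to back up.

    full:         everything
    differential: everything changed since the last FULL backup
    incremental:  everything changed since the last backup of ANY kind
    """
    if mode == "full":
        return sorted(files)
    if mode == "differential":
        return changed_since(files, last_full)
    if mode == "incremental":
        return changed_since(files, last_backup)
    raise ValueError(f"unknown backup mode: {mode}")


# path -> last modification time (illustrative integer timestamps)
files = {"amf.yaml": 100, "smf.yaml": 250, "upf.yaml": 320}
last_full, last_backup = 200, 300

print(select_files(files, "differential", last_full, last_backup))  # ['smf.yaml', 'upf.yaml']
print(select_files(files, "incremental", last_full, last_backup))   # ['upf.yaml']
```

Incremental backups are smaller but a restore needs the full backup plus every increment since; a differential restore needs only the full backup and the latest differential.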
Conclusion
In the dynamic and hyper-connected telecommunications industry, a unified backup and restore solution is no longer a luxury but a fundamental necessity. Our proposed solution, designed as a uniform centralized backup and restore system, directly addresses the current challenges of fragmented approaches and immature backup processes. By providing a unified, automated, and intelligent platform for protecting all critical network assets, operators can significantly enhance their resilience, minimize downtime, ensure data integrity, and maintain customer trust.
Looking ahead, the integration of AI/ML-driven automation into backup and restore processes will further reduce OPEX and minimize service downtime. This comprehensive approach safeguards against the multifaceted threats of today's digital landscape, empowering telcos to deliver uninterrupted, high-quality services and secure their future growth through a truly telecom carrier-grade, scalable, and common architectural framework.
Authors:
Qiu Ping, Senior Manager, Architecture & Business Solutioning Department, Rakuten Mobile Inc.
Yadav Brijesh, General Manager, Architecture & Business Solutioning Department, Rakuten Mobile Inc.
Johri Prafull, Senior Director, CICD Factory Department, Rakuten Symphony Inc.
Colaiuta Sara, Senior Director, CICD Factory Department, Rakuten Symphony Inc.
Bah Mamadu, Vice President, CICD Factory Department, Rakuten Symphony Inc.