Understanding application-aware Kubernetes storage for stateful applications 

Brooke Frischemeier
Head of Product Management, Unified Cloud
Rakuten Symphony
June 22, 2023

With the world moving rapidly towards 'softwarization,' Kubernetes has gained significant prominence throughout the last decade as the leading container orchestration platform.

Naturally, understanding its storage capabilities becomes essential for effectively managing and scaling applications in a containerized environment. As organizations increasingly rely on containerization and orchestration with Kubernetes, there is an urgent need for storage solutions that align with the requirements of stateful applications. This article explores why application-aware Kubernetes storage matters and how to use it effectively to support the unique needs of modern stateful applications.

Stateless applications

The genesis of the Kubernetes revolution comes down to stateless, web-scale workloads. In the stateless model, one typically deploys multiple replicas of an application that run in parallel to increase availability. Kubernetes then scales the number of replicas up and down, based on demand.  

With stateless applications, replicas do not require a unique identity, as operations and data are not persistent. The application itself is stateless because it does not have to perform operations based on a previous operation or data “state.” Every time an operation is performed, it proceeds as it did the very first time, from the beginning. Think of it as a simple web search, where the application does not need to understand or retain any information about any prior transaction. One day I am searching for vitamins and the next for a good book; each set of inputs and its resulting output is unrelated to the others.

Stateful applications

On the other hand, a “stateful” application, such as one that processes bank transactions, does care about preexisting conditions, data and states: each transaction in the ledger takes into account what was there previously. For such an application in Kubernetes, there needs to be a “persistent” relationship between the data, the application and the underlying containers and pods as it scales, migrates, stops, starts, heals, or just goes on in its daily routine. This persistent relationship needs to represent the any-point-in-time status of the overall solution.
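
The contrast between the two models can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the search function and the Ledger class are hypothetical names invented here, not part of any real application or of Kubernetes.

```python
def search(query: str) -> str:
    """Stateless: the result depends only on the current input."""
    return f"results for {query!r}"


class Ledger:
    """Stateful: each transaction depends on the balance left by earlier ones."""

    def __init__(self, opening_balance: int = 0) -> None:
        self.balance = opening_balance

    def apply(self, amount: int) -> int:
        """Apply a deposit (positive) or withdrawal (negative); return the new balance."""
        new_balance = self.balance + amount
        if new_balance < 0:
            raise ValueError("insufficient funds")
        self.balance = new_balance
        return self.balance


# Any replica can answer a search, because no history is needed.
print(search("vitamins"))
print(search("a good book"))

# The ledger's answer depends on everything that came before it.
ledger = Ledger(opening_balance=100)
ledger.apply(-30)  # balance is now 70
ledger.apply(-50)  # balance is now 20
print(ledger.balance)
```

If the process holding the Ledger object is restarted, the balance is gone unless it was written somewhere durable, which is exactly the gap that persistent storage has to fill.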

Stateful applications are generally used when there is a need to maintain and manage persistent data. There are several reasons why organizations choose to utilize stateful architectures. For example, they can be valuable in enabling use cases that require data persistence and data integrity or involve orchestrating complex workflows.

The list of stateful applications is ever-growing. We see it not just in the cloud, but also at the edge, especially as we roll out new mobile services enabling the Internet of Things (IoT), Industry 4.0, self-driving vehicles, smart cities, video analytics, customized content delivery, security, Database-as-a-Service (DBaaS) and self-healing networks.

Stateful Kubernetes services

Prior to Kubernetes, there was a simple and direct mapping between data and an application, running on bare metal or in a virtual machine. But with Kubernetes innovation came complexity. Kubernetes applications are broken up into many microservices and mapped to different containers, each with a different relationship to the data, as the application grows, scales, migrates and heals. Not only does the data have state, but the condition of its Kubernetes containers and microservices also has a given state that can change the very moment they are deployed. Therefore, simply rolling back to day one is not a good option.

To complicate matters further, most clouds run off of their legacy storage, where Kubernetes applications and their data are mapped to one another via a “simple” Container Storage Interface (CSI). On one end there is the data, for example, a storage array with a CSI that only sees a generic connection to some downstream node. The array controller does not see or even comprehend the application or the Kubernetes container constructs. Moreover, the application only sees a generic persistent volume and has no notion of what kinds of media make up the storage, where they are located or how they all fit together in the system. It is all very generic as neither side of the CSI has the visibility to understand the complexity of the other side.  
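
To make the application's side of that generic view concrete, here is a minimal sketch in Python that builds a PersistentVolumeClaim manifest as a plain dictionary. The field names follow the Kubernetes core v1 API; the claim name ledger-data, the size and the storage class name standard are placeholders assumed for illustration.

```python
import json


def make_pvc(name: str, size: str, storage_class: str) -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # How the volume may be mounted (here: read-write by a single node).
            "accessModes": ["ReadWriteOnce"],
            # A named class of storage; the name is a hypothetical placeholder.
            "storageClassName": storage_class,
            # How much capacity the application asks for.
            "resources": {"requests": {"storage": size}},
        },
    }


pvc = make_pvc("ledger-data", "10Gi", "standard")
print(json.dumps(pvc, indent=2))
```

Everything the application states here is a size, an access mode and a class name. Nothing in the claim describes media types, placement or how the backing storage fits together, which is the visibility gap described above.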

This not only severely limits efficiency and performance, but it also impacts how one can configure the combined solution for data protection, recovery, quality of service, workload-to-storage affinities and lifecycle automation, each of which affects user experience and platform efficiency. This disparity is exacerbated further when one attaches to a volume that spans multiple media types. At the very least, vanilla Kubernetes and a generic CSI hamstring the service designer and the user.

Stateful Kubernetes storage is required to properly manage the lifecycle of persistent data for stateful applications running on Kubernetes clusters. Before deploying, you should consider:

  • Data persistence: Persistence is mandatory, as it provides application durability. Without stateful, persistent storage, if a container crashes or is terminated for any reason, the data will be lost.
  • Data consistency: For stateful applications, data must be reliably synchronized across multiple instances of the application, preventing data corruption.
  • Fast and reliable recovery: In the event of a failure, a stateful storage solution provides data recovery via incremental backups and snapshots. An incremental backup captures only the data that has changed since the last backup, so it requires the least amount of overall storage resources.
  • Reliable scaling: With Kubernetes, stateful storage allows applications to scale horizontally by distributing data across multiple pods, servers and even data centers, providing optimal load balancing, throughput, availability, and reliability.
  • Reliable persistent replica identifiers: Kubernetes provides this with StatefulSets, which guarantee that when an application transitions to a new pod, e.g., after failure, it retains the previous environment and relationships. This includes PersistentVolumeClaims (PVCs), which ensure that the original volume is reattached to the replacement pod.
  • Usable storage abstraction: The solution must be customizable in a way that makes it easy for developers and administrators to provision and maintain resources. Kubernetes provides this abstraction layer for managing storage resources. Stateful storage solutions integrate with Kubernetes through PersistentVolumes (PVs) and PVCs.
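
As an illustration of the replica-identifier and storage-abstraction points above, the following Python sketch builds a StatefulSet manifest with a volumeClaimTemplates entry and derives the stable pod and PVC names Kubernetes would assign. The field names follow the Kubernetes apps/v1 API; the workload name ledger-db, the image, the mount path and the claim name data are hypothetical, and the naming helper assumes the default behavior where ordinals start at 0.

```python
def make_statefulset(name: str, replicas: int, image: str,
                     claim_name: str, size: str) -> dict:
    """Build a StatefulSet manifest whose pods each get their own PVC."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # headless Service that gives pods stable DNS names
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "volumeMounts": [
                            {"name": claim_name, "mountPath": "/var/lib/data"}
                        ],
                    }]
                },
            },
            # One PVC is created per replica from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": claim_name},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


def stable_identities(sts: dict) -> list:
    """Return (pod name, PVC name) pairs, assuming ordinals start at 0."""
    name = sts["metadata"]["name"]
    claim = sts["spec"]["volumeClaimTemplates"][0]["metadata"]["name"]
    replicas = sts["spec"]["replicas"]
    return [(f"{name}-{i}", f"{claim}-{name}-{i}") for i in range(replicas)]


sts = make_statefulset("ledger-db", 3, "example/ledger-db:1.0", "data", "10Gi")
for pod, pvc_name in stable_identities(sts):
    print(pod, "->", pvc_name)
```

Because the pod and claim names are derived from the ordinal rather than generated at random, a replacement for ledger-db-1 comes back as ledger-db-1 and binds to the same data-ledger-db-1 claim.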

Why we need more than just plain old Kubernetes storage

As mentioned before, since vanilla Kubernetes and a generic CSI limit the service designer, it typically takes the skillset of a seasoned application developer to manage them.  

To sum things up to this point, while cloud-native Kubernetes allows one to scale and enhance performance by leaps and bounds, when it comes to data protection, the relationship between storage and application becomes more complex.

  • In cloud-native, the application is no longer a homogeneous running blob as before.
  • All of its roles are exposed individually.
  • There are multiple containers per role, with their own data needs and connectivity.
  • The containers scale independently and are constantly changing.
  • There is storage, network, and compute connectivity to understand.
  • There is application config data, Kubernetes config data, secrets, metadata and ConfigMaps, where all of it has to relate back to storage in its own way, and they are constantly changing.
  • Organizations need to work in a multi-tenant, multi-organizational environment that requires multi-tenant resource pooling, chargeback and role-based access control (RBAC).

None of this is addressed by Kubernetes or legacy storage vendors. While Kubernetes is the way of the future, the same old operations model – relying on the command line interface (CLI), hunting, tagging and hard coding – is a boat anchor.

A better way: Symcloud Storage

Our Symcloud Storage understands, auto-learns and auto-adapts to all application and data permutations and performs any-point-in-time backups, snapshots, cloning and disaster recovery with application and Kubernetes state awareness.  

Industry vendors claim application awareness. Still, they require manually intensive tagging and marking over the lifetime of the application, as well as Kubernetes expertise. With cloud-native storage, as in the case of Symcloud Storage, we auto-ingest the application from its Helm chart, YAML file or operator, then auto-discover it, auto-monitor it and adapt to its changes over its entire lifecycle. It is fully automated and far easier to use, with no Kubernetes expertise required.

Furthermore, Symcloud Storage provides programmable pre- and post-processing policies that auto-adjust to target environments and can even renumber IP addresses when cloning so there are no network clashes. Additionally, it provides automated storage placement based on easy-to-configure policies and IOPS-based storage QoS, and it can even be set to auto-reconnect to an alternate in the event of a node outage.

The solution includes industry-leading, software-defined storage that supports a comprehensive set of application-aware services, including snapshots, clone, backup, encryption, and business continuity. All data services are application-aware, tracking not only data storage, but the metadata and the ever-changing Kubernetes application config, protecting a wide range of datasets for “application-consistent” disaster recovery of complex network- and storage-intensive stateful applications.  

Lastly, this all comes with easy-to-use, multi-cloud portability, multi-tenant resource pooling, chargeback and RBAC that can integrate with your existing Lightweight Directory Access Protocol (LDAP) solution.

It sounds like a lot of additional features, complexity and learning, but Symcloud Storage comes with the industry’s easiest-to-use graphical user interface (GUI) driven by automated storage policies. If you decide to use our CLI, operations that take the competition multiple commands to perform are taken care of by a single line of code.  

Explore specific use cases of Symcloud™ and its application in the telecom industry.
