Kubernetes operators enable application vendors to extend core Kubernetes so that it can manage the deployment and lifecycle of their applications. By defining a Custom Resource Definition (CRD) and custom controllers, a vendor can build a software extension tailored to the specific orchestration requirements of a particular application, and thus manage it more effectively. Although this approach can be highly advantageous for application vendors and developers, it may pose significant challenges for cluster administrators and end-users. While there are compelling reasons to create operators for applications, ensuring consistency in the definitions and establishing robust higher-level building blocks will make operator frameworks more appealing to all parties involved. We will look at some interesting solutions to this problem toward the end of this blog.
Kubernetes was originally recognized as a highly opinionated stack: application developers had to adhere to specific deployment patterns in order to onboard their applications to the platform. In other words, there is a generally right way of working with the stack, a set of prescribed constructs and interaction patterns, to get the most out of it. These patterns included essential constructs such as Pod, Persistent Volume Claim (PVC), ConfigMap, Secret, Deployment, StatefulSet, DaemonSet and Service, each serving a unique purpose. Pods provided compute resources, PVCs facilitated storage, ConfigMaps/Secrets were used for configuration, Deployments and StatefulSets managed scalability and availability, and Services/Endpoints provided network access.
Although Kubernetes supports many other resource types, the aforementioned constructs are application-agnostic and serve as foundational building blocks for virtually any type of application deployment.
Kubernetes constructs are designed to model the resources required for application deployment, such as storage, compute and network. However, certain aspects of applications cannot be adequately mapped to these constructs. While progress has been made in areas such as data protection through specialized custom resources (VolumeSnapshot, VolumeSnapshotClass and VolumeSnapshotContent), there remains a significant amount of work to be done. Custom resources and operators are being developed to address these native shortcomings in a way that makes interacting with the stack easier and more predictable.
In order to successfully automate data protection lifecycle management, application awareness is a must for data-heavy applications such as SQL, NoSQL, DataBus and big data applications; there is a pressing need for application-aware data management.
For example, consider the complexities involved: taking an application-consistent snapshot of a database requires quiescing writes first, and backing up a distributed data store requires coordinating the operation across replicas.
Operators are meant to solve problems like these, and some of the mature operators do deliver this functionality in a predictable, repeatable format.
Operators are a superior way to extend Kubernetes to complex application deployments. However, there are some glaring, if minor, issues that need to be addressed to reduce the overhead operators impose.
Let’s consider each briefly.
Consider a simple scenario such as listing all the applications running on a Kubernetes cluster that has multiple helm releases and custom operators. Running "helm list" provides only a partial overview: it shows helm releases and some operators, but not all running applications. Creating a GUI for such a Kubernetes instance presents challenges in user navigation. One possible solution is to organize helm and operator applications into different namespaces, though deciding how to divide namespaces in a Kubernetes cluster is itself a complex issue, with several patterns in use (per-team, per-application, per-environment and so on).
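To make the inventory gap concrete, here is a small Python sketch. The three inventories are hard-coded stand-ins; on a real cluster they would come from helm's release secrets, from listing each operator's custom resources via the API server, and from scanning raw workload objects:

```python
# Sketch: why "helm list" gives only a partial application inventory.
helm_releases = {"ingress-nginx", "cert-manager"}        # helm-managed
operator_apps = {"mongodb/prod-db", "etcd/main"}         # operator-managed CRs
raw_workloads = {"legacy-cronjob"}                       # plain manifests

all_apps = helm_releases | operator_apps | raw_workloads
missing_from_helm = all_apps - helm_releases

print(f"{len(all_apps)} apps total, helm list sees {len(helm_releases)}")
# -> 5 apps total, helm list sees 2
print(f"invisible to helm: {sorted(missing_from_helm)}")
```

Any GUI or automation that relies on a single source of truth misses the other categories entirely, which is the navigation problem described above.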
On a side note, some operators dictate how namespaces are created. Overall, it's safe to say that there needs to be some consistency in how application listing works. The Kubernetes Application CRD is one possible solution.
With the operator pattern, application vendors define CRDs and controllers, and vendors are free to define the schema in the CRD. Of course, they need to adhere to the basic template of apiVersion, kind, metadata, spec and status, but what goes in the spec and status sections is, in most cases, completely up to the vendor. And here is where the trouble starts.
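As a reminder, the outer skeleton is the only structure Kubernetes itself enforces on a custom resource; everything under spec and status is vendor territory. A minimal sketch, with a hypothetical API group and kind:

```yaml
apiVersion: acme.example.com/v1   # hypothetical API group/version
kind: AcmeApp                     # hypothetical kind
metadata:
  name: example
spec: {}     # schema here is entirely vendor-defined
status: {}   # likewise vendor-defined, written by the controller
```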
To illustrate this point, let us consider a simple scale-up/scale-down operation. Every Kubernetes admin knows how to scale a StatefulSet: either 'kubectl scale' or editing 'spec.replicas' in the YAML spec using 'kubectl edit'. This is consistent across any StatefulSet. The actions taken inside the pods when scaling out or in are entirely up to the container entrypoint.
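The point is that this knob lives in the same well-known place for every StatefulSet. A minimal fragment (the name "web" is hypothetical):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  replicas: 5   # scale up or down by changing this one field, or
                # equivalently: kubectl scale statefulset web --replicas=5
```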
However, in some cases, changes may be needed on the already-running pods before a new pod can be created as part of a scale-out operation. Changes to the entrypoint alone can't handle this, and a custom controller that works at the application level needs to intervene and orchestrate the scale-up or scale-down operation.
There is typically a command-line (CLI) client, or a document describing which fields to change in the YAML spec, to trigger the scale-up. Every operator might use different specs and command-line parameters to handle application lifecycle operations.
In summary, every operator defines its own schema and practices for a standard operation such as scale-out/scale-in. In an environment where thousands of applications are deployed, orchestration becomes extremely difficult without consistency in definitions. Consider, for example, how each of these operators defines its own spec:
i. Etcd operator
ii. MongoDB Operator
iii. Cassandra Operator
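To see the divergence, compare how two hypothetical operators might expose the same replica count. The groups, kinds and field names below are invented for illustration, but real operators differ in exactly this way:

```yaml
# Hypothetical Operator A puts the replica count in spec.size:
apiVersion: dba.example.com/v1
kind: ExampleDB
metadata:
  name: prod-db
spec:
  size: 3
---
# Hypothetical Operator B nests the same knob under a topology block:
apiVersion: cluster.example.com/v1alpha1
kind: ExampleCluster
metadata:
  name: prod-cluster
spec:
  topology:
    members: 3
```

An orchestrator that wants to scale both applications has to special-case each schema; with a StatefulSet, one code path would have sufficed.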
Lack of consistent lifecycle management patterns
Many operators ship a custom CLI that is different from the standard application management CLIs. This CLI typically manages the CRs or YAML definitions rather than the application itself. Just take a look at one of the Postgres operators:
i. Postgres backup — the documentation for logical backups shows a declarative syntax with a pg_dumpall trigger behind it.
ii. Etcd backups — to back up an etcd cluster, we need yet another operator, with its own CR.
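A sketch of what this inconsistency looks like in practice. The kinds and fields below are invented for illustration, in the style such operators use; one toggles backups on the database resource itself, the other models each backup as a separate custom resource owned by a second operator:

```yaml
# Hypothetical style A: backup is a flag on the database resource.
apiVersion: db.example.com/v1
kind: ExamplePostgres
metadata:
  name: prod-pg
spec:
  enableLogicalBackup: true      # controller schedules pg_dumpall runs
---
# Hypothetical style B: each backup is its own custom resource.
apiVersion: backup.example.com/v1beta1
kind: ExampleEtcdBackup
metadata:
  name: nightly
spec:
  etcdEndpoints: ["https://etcd-client:2379"]
  storageType: S3
```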
There is a lack of consistency in debugging and troubleshooting Kubernetes operators, and a similar issue arises with metrics, monitoring and alerting. For example, computing average CPU utilization for an application requires grouping resources by namespace or, alternatively, by labels and selectors. This becomes difficult to manage if there is no control over label addition and removal: anyone with access to a pod can add or remove labels, so it's important to consider the issues that can arise from this.
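A small Python sketch of the failure mode. The pod metrics are hard-coded stand-ins; in practice they would come from metrics-server or a Prometheus query, but the grouping logic is the same:

```python
# Label-based CPU aggregation, and how an uncontrolled label edit skews it.
pods = [
    {"name": "db-0",  "labels": {"app": "mydb"},  "cpu_millicores": 400},
    {"name": "db-1",  "labels": {"app": "mydb"},  "cpu_millicores": 600},
    {"name": "job-x", "labels": {"app": "batch"}, "cpu_millicores": 900},
]

def avg_cpu(pods, selector):
    """Average CPU over pods whose labels match every selector pair."""
    matched = [p for p in pods
               if all(p["labels"].get(k) == v for k, v in selector.items())]
    return sum(p["cpu_millicores"] for p in matched) / len(matched)

print(avg_cpu(pods, {"app": "mydb"}))   # 500.0

# Anyone with edit access to job-x can relabel it, and the "application"
# average shifts even though the database itself is unchanged:
pods[2]["labels"]["app"] = "mydb"
print(avg_cpu(pods, {"app": "mydb"}))
```

Because labels are the only grouping key, the monitoring view is only as trustworthy as the label hygiene across the cluster.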
Alternatives such as the super operator framework from Rakuten Symphony allow vendors to model any application and its lifecycle management operations using a consistent schema and lifecycle hooks, eliminating the consistency issues. There is also the Kubernetes Application CRD. Other open source initiatives, such as project Nephio and efforts to standardize Network Function/Network Service definitions, also offer solutions.
Why use our approach? Rakuten Symphony's native application orchestration framework is simple yet powerful: it enables developers and application administrators alike to compose, deploy and manage complex application stacks, workloads and data pipelines. One type of application supported by this framework is the Symcloud™ application. These applications are driven by the configuration defined within the backing Application Bundle.
Our Application Bundles are a robust, inclusive collection of all the artifacts required to deploy and manage an application, giving you the widest range of automation options. A bundle contains one or more application container images, referenced from a manifest file that describes the components the application is composed of, the necessary dependencies between services, resource requirements, affinity/anti-affinity rules and the custom actions required for application management. As a result, an Application Bundle can be viewed as the starting point for creating an application and, as such, a means of abstracting the underlying infrastructure from the user.
For more information, refer to the documentation.
While the build/distribute/run/manage methodology (Docker/OCI for application images, Kubernetes and CRI-O for the application runtime, and controllers/CRDs for application lifecycle management) benefits vendors, cluster administrators and end-users face a different challenge. With hundreds of operators, each having its own CLI and custom YAML formats, administrators must understand the intricacies of each operator, in addition to the core Kubernetes resources, to effectively triage and troubleshoot applications.
At Rakuten Symphony, we have been working on this issue for the last seven years. The goal is to build a pattern, called Application Bundles, that lets application developers onboard any application while keeping it simple enough for end-users, administrators and automation tooling to maintain consistent operational practices. We have reference bundles for most mainstream applications, ranging across SQL, NoSQL, big data, DataBus and telco workloads.
What I’ve described here is mostly based on lessons I’ve learned through my significant work orchestrating data-intensive applications at Rakuten Symphony.
Rakuten Symphony’s Symcloud™ extends Kubernetes’ agility, efficiency and portability to all stateful applications, including complicated big data, database, AI/ML and custom applications, on any infrastructure, including on-premises, hybrid and multicloud ecosystems.