Service mesh – a control plane for your application
Applications built on monolithic, 3-tier, or n-tier architectures often fail to meet market demands in terms of scaling and performance. This is generally attributed to the inflexible nature of these architectures, where the code base becomes unmanageable for various reasons – the addition of new features, difficulty in identifying dependencies, side effects that crop up when scaling, and so on. In these environments, adopting new technologies and making changes take a long time. The bottom line is that they are less agile and feel dated. Microservice architecture is believed to be the rescuer, where business logic is handled by separate services. It overcomes the issues faced by monoliths (where business and system logic are bundled together) by splitting the application into multiple small components, each of which handles a specific task and exposes standard APIs. This lets you focus on the hotspots in your application and easily enable horizontal scaling too, if required. Having said that, managing a microservice architecture is not as simple as it looks.
A glimpse of monoliths, microservices, and anything in between
Before delving deeper into managing microservices, let us take a look at what monoliths and microservices are capable of, and take stock of their pros and cons.
- A monolithic application keeps its entire business logic in one single code base.
- It is deployed as a single entity or service.
Pros: Low operational complexity. Works well during the initial phase of application development, when a few components are sufficient.
Cons: Scaling the capacity (horizontal scaling) of the application is a challenge, as it involves handling multiple instances of a large code base. Growing the development team is another challenge, because it is hard for new members to understand the complexities of the existing code.
An enhanced version of the monolith is the n-tier application, where both vertical and horizontal scaling are possible. However, there are bottlenecks at the database (DB) and load balancer (LB) levels.
- Microservices are a natural evolution from n-tier applications.
- The components are segmented in such a way that there is no need to touch all of them while making changes to a specific application.
- Modern operational techniques bring down the complexities involved in managing multiple microservices, progressive updates, zero-downtime updates, and so on.
Pros: Each microservice can scale individually based on its demand. Development teams can work in parallel on their areas of competence and roll out services independently. This is highly favorable for horizontal scalability and better resource utilization.
Cons: Complex operational requirements; managing the entire system demands strong visibility.
Managing microservices in a modern application
Today, most modern applications are microservice-based, and they might depend on other SaaS and PaaS systems too. Key components of this architecture include:
- Technology agnostic frontend components (web, mobile or other clients)
- Authentication APIs
- Different service-level APIs
A microservice-based application is the way to build modern applications, thanks to its flexibility in scaling and other resource utilization benefits. But when it comes to operational requirements, things get complex, because there are many moving parts. Operators must take care of all the moving components and their releases and upgrades, while at the same time ensuring the health of those components. These factors directly increase complexity while scaling, as the dependencies also multiply. The major complexities arise in:
- Managing heterogeneous environments
- Continuous integration and incremental rollouts
These are also considered standard operational requirements for rolling out an application in a microservice environment. While Docker and similar container technologies help overcome heterogeneous environments, platforms like Kubernetes provide the continuous integration instruments needed to simplify the complexities.
To get a good understanding of the system and make proactive decisions, the Site Reliability Engineer (SRE) needs to monitor and measure the factors given below in a production environment:
- Latency of all services
- Request Per Second (RPS) from different services
- Data volume
- Failure vs success rate
- Backend traffic distribution
- Communication security
- Zero-downtime rollout
- Intelligent load balancing
- Service discovery
- Retry/Timeout implications
- A/B testing for different services
- Visibility into service latency
- Distributed tracing
- Circuit breaker
- Retry storm
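Some of these concerns, such as the circuit breaker, can be expressed in a few lines of logic. Below is a minimal Python sketch (the class name and thresholds are illustrative, not taken from any particular mesh) of the circuit-breaker pattern a mesh applies on a service's behalf: after repeated consecutive failures the circuit "opens" and calls fail fast until a cool-down period expires, protecting an unhealthy backend from being hammered further.

```python
import time

class CircuitBreaker:
    """Minimal circuit-breaker sketch: after max_failures consecutive
    failures the circuit opens and calls fail fast until reset_timeout
    seconds have passed, after which one trial call is allowed."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

In a real mesh this logic lives in the sidecar, configured by the operator rather than coded by the service developer.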
Some of the items listed are handled during application development itself. For example, enabling SSL to ensure secure communication to a service is done at the development stage. Here, the control lies with the developer, and non-adherence to the standards specified by the security team becomes a weak point in the system. Giving the operations team full control over security would be a cleaner approach, as security is an operational responsibility rather than the developer's.
Similarly, it is possible to bring the control of all the items listed above under the operations team by abstracting them via tools. That is exactly what a service mesh does.
A service mesh tries to tap in and solve most of these SRE problems. It provides full visibility into the production systems, based on which an SRE can instrument the system or make proactive decisions – scaling up or down, or taking other key actions – to meet SLAs or other objectives specific to your application. All this is possible without changing the application code or business logic.
In this type of environment, service developers need not worry about ensuring the security of the ingress and egress requests, as it’s already taken care of by the service mesh. Similarly, cluster-aware load balancing, service discovery, etc. are also taken care of. Taking off all these complexities or platform awareness requirements from the service developers’ hands makes them more productive and helps them in concentrating on business logic. This is what a service mesh does – offering a bunch of proxies which can be used by services to abstract the network requirements. The proxies or the components of the service mesh are described below.
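To make the abstraction concrete, here is a toy Python model of a proxy sitting between a service and the network. The `Sidecar` class and its fields are invented for illustration: the service hands every request to the sidecar, which transparently records latency and success/failure counts before forwarding, without the service code knowing anything about it.

```python
import time

class Sidecar:
    """Toy model of a sidecar proxy: the service calls the sidecar
    instead of the network, and the sidecar transparently adds
    observability (latency recording and success/failure counters)
    around the real call. `upstream` stands in for the network hop."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.latencies = []   # per-request latency samples, in seconds
        self.successes = 0
        self.failures = 0

    def request(self, payload):
        start = time.monotonic()
        try:
            response = self.upstream(payload)
            self.successes += 1
            return response
        except Exception:
            self.failures += 1
            raise
        finally:
            # Recorded whether the call succeeded or failed.
            self.latencies.append(time.monotonic() - start)
```

In production these metrics would be scraped by the control plane rather than kept in memory, but the division of labor is the same: the service supplies the payload, the sidecar supplies everything else.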
Control and Data plane
A service mesh has three main components – the control plane, the data plane, and the sidecar proxies. This separation is based on their responsibilities.
A sidecar, as the name implies, is a proxy that behaves like the sidecar on a motorcycle. These proxies, or sidecars, are deployed at the infrastructure layer and enable services to route their requests through them instead of reaching the network layer directly. The sidecars carry out all the actions required for the ingress and egress traffic of a given application, following the rules provided by the service mesh's control plane. They are mainly responsible for service discovery, service health checks, request routing, authentication and authorization of requests, load balancing, and observability.
Now, you can think of the data plane as the worker who does the actual magic on the ground. The sidecar or proxy running alongside a service is the data plane of the service mesh. The control plane manages the data plane and gives it the required instructions based on operational requirements. The control plane also supplies the management tools needed to collect and visualize metrics and to make configuration changes dynamically, if needed. A control plane can offer a full view of what's happening in the system.
The control-plane components run separately to manage all the sidecars. So, in a cluster, there will be one control plane and n data planes to match the number of service instances. In other words, every replica of a service will have an accompanying sidecar.
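The one-control-plane, n-sidecars relationship can be sketched in a few lines of Python. The class and field names below are invented for illustration (they are not Istio's or Linkerd's API): the control plane holds the desired policy and pushes any change to every registered sidecar, which is how operators alter behavior without touching service code.

```python
class SidecarProxy:
    """Stand-in for a data-plane proxy; it only holds the policy
    it was last given by the control plane."""

    def __init__(self):
        self.policy = {}

class ControlPlane:
    """Sketch of the control plane: single source of truth for policy,
    pushed out to every registered sidecar in the cluster."""

    def __init__(self):
        self.sidecars = []
        # Illustrative defaults; a real mesh has far richer config.
        self.policy = {"retries": 3, "timeout_s": 1.0}

    def register(self, sidecar):
        self.sidecars.append(sidecar)
        sidecar.policy = dict(self.policy)  # hand out current policy

    def update_policy(self, **changes):
        self.policy.update(changes)
        for sidecar in self.sidecars:       # push to every data plane
            sidecar.policy = dict(self.policy)
```

One `update_policy` call changes how every replica behaves, which is exactly the operational leverage the control plane provides.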
A service mesh gives you a high-level view of your application stack. Some service meshes support environments outside Kubernetes, but it is more favorable to use Kubernetes, as it provides all the instrumentation needed to manage operational pipelines.
Now, if you zoom into your application further, you can see where the sidecars and the control plane run. As mentioned, every instance of a service has a proxy or sidecar associated with it to manage its ingress and egress traffic, and a control plane gives instructions to those sidecars.
Istio and Linkerd are the two major service meshes available in the market. Istio democratized the concept of the service mesh and showcased its importance in microservice environments; it is backed by Google, Lyft and IBM. Linkerd, on the other hand, is a simpler alternative to Istio and a Cloud Native Computing Foundation (CNCF) project. It has started to gain traction.
Control plane for your application
Using a service mesh to manage the microservices in an application is a widely known practice. Though service meshes are not generally used outside Kubernetes, treating the service mesh as a control plane at the application level certainly takes the following burdens off the developer:
- Identifying the service dependencies
- Handling request retries (retry storm scenarios)
- Dealing with a request timeout
- Enabling HTTPS/TLS transparently to the microservice
- Handling rate limiting for a service
- Performing A/B testing
- Metrics collection
- Dynamic load-balancing rules based on the system metrics
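The retry and timeout items above are commonly implemented as exponential backoff with full jitter, which spreads retries out in time so that many synchronized clients do not pile up into a retry storm. The following is a minimal Python illustration of that policy (function name and defaults are invented, not any mesh's actual implementation):

```python
import random
import time

def call_with_retries(func, max_attempts=4, base_delay=0.1, timeout=2.0):
    """Retry a failing call with exponential backoff and full jitter,
    capped by an overall deadline. Full jitter randomizes each delay
    so concurrent callers do not retry in lockstep (a retry storm)."""
    deadline = time.monotonic() + timeout
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: uniform delay in [0, base * 2^attempt].
            delay = random.uniform(0, base_delay * (2 ** attempt))
            if time.monotonic() + delay > deadline:
                raise  # would blow the overall deadline: give up now
            time.sleep(delay)
```

When a mesh owns this policy, the operator tunes attempts, delays, and deadlines per route without the service developer writing any retry code at all.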
Long story short, a service mesh can be used to manage and run the application in a production environment and to take proactive decisions based on system behavior. It even helps in tracing and debugging microservice APIs. The service mesh has many features to manage the production system and offer better visibility into its activities. From this perspective, the service mesh becomes the control plane for an application.
These kinds of services are not new; they existed even before the concept of the service mesh became popular. But they were all tightly coupled and built specifically for particular microservice environments. Now, with a service mesh, the common parts can be factored out and reused in any microservice environment without much friction.