Scheduling in Kubernetes is the process of binding pending pods to nodes, and is performed by a component of Kubernetes called kube-scheduler. The scheduler’s decisions, whether or where a pod can or can not be scheduled, are guided by its configurable policy.
As Kubernetes clusters are very dynamic and their state changes over time, there may be a desire to move already running pods to other nodes for various reasons:
- Some nodes are under or over utilized.
- The original scheduling decision does not hold true any more, as taints or labels are added to or removed from nodes, pod/node affinity requirements are not satisfied any more.
- Some nodes failed and their pods moved to other nodes.
- New nodes are added to clusters.
Descheduler
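What is Descheduler? Descheduler finds running pods that can be moved and evicts them. It does not schedule the replacements itself but relies on the default scheduler for that. Rebalancing pods this way can improve resource utilization and, when emptied nodes are later removed by an autoscaler, reduce your spending.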
Why do we need Descheduler? Kubernetes users have relied primarily on the Kubernetes Cluster Autoscaler (CAS), or on a cloud provider autoscaling solution such as Amazon EC2 Auto Scaling groups, to dynamically adjust the compute capacity of their clusters. These tools add and remove nodes, but they do not rebalance pods that are already running. Descheduler fills that gap: it evicts pods whose placement no longer fits the cluster's current state so that the scheduler can place them again.
The Descheduler can be run as a Job, CronJob, or Deployment inside of a k8s cluster. It has the advantage of being able to be run multiple times without needing user intervention. The descheduler pod is run as a critical pod in the kube-system namespace to avoid being evicted by itself or by the kubelet.
The following diagram provides a visualization of most of the strategies:
Descheduler is made up of a set of plugins, each of which implements a different strategy for deciding which pods to evict:
Name | Description |
RemoveDuplicates | This strategy plugin makes sure that there is only one pod associated with a ReplicaSet (RS), ReplicationController (RC), StatefulSet, or Job running on the same node. If there are more, those duplicate pods are evicted for better spreading of pods in a cluster. |
LowNodeUtilization | This strategy finds nodes that are under utilized and evicts pods, if possible, from other nodes in the hope that recreation of evicted pods will be scheduled on these underutilized nodes. |
HighNodeUtilization | This strategy finds nodes that are under utilized and evicts pods from the nodes in the hope that these pods will be scheduled compactly into fewer nodes. |
RemovePodsHavingTooManyRestarts | This strategy makes sure that pods having too many restarts are removed from nodes. For example, a pod with an EBS/PD volume that can't be attached to the instance should be re-scheduled to another node. |
PodLifeTime | This strategy evicts pods that are older than maxPodLifeTimeSeconds. |
RemoveFailedPods | This strategy evicts pods that are in failed status phase. |
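To make the utilization strategies more concrete, here is a minimal Python sketch of how a node might be classified before LowNodeUtilization decides where to evict from. It assumes the rule that a node counts as underutilized only when every tracked resource is below a lower threshold, and as overutilized when any resource is above a target threshold. The `NodeUsage` class, the `classify` function, and the threshold values are hypothetical and for illustration only. They are not the Descheduler's actual code.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Requested resources as a percentage of the node's allocatable capacity."""
    name: str
    cpu: float
    memory: float
    pods: float


# Hypothetical threshold values, chosen for illustration only.
THRESHOLDS = {"cpu": 20, "memory": 20, "pods": 20}
TARGET_THRESHOLDS = {"cpu": 50, "memory": 50, "pods": 50}


def classify(node: NodeUsage) -> str:
    usage = {"cpu": node.cpu, "memory": node.memory, "pods": node.pods}
    # Underutilized: every resource is below its lower threshold.
    if all(usage[r] < THRESHOLDS[r] for r in THRESHOLDS):
        return "underutilized"
    # Overutilized: at least one resource is above its target threshold.
    if any(usage[r] > TARGET_THRESHOLDS[r] for r in TARGET_THRESHOLDS):
        return "overutilized"
    # Everything in between is left alone.
    return "appropriately utilized"


if __name__ == "__main__":
    for node in (
        NodeUsage("node-a", cpu=10, memory=15, pods=5),
        NodeUsage("node-b", cpu=70, memory=30, pods=30),
        NodeUsage("node-c", cpu=30, memory=30, pods=30),
    ):
        print(node.name, classify(node))
```

In this sketch, pods would be evicted from the overutilized nodes in the hope that the scheduler places them on the underutilized ones. Nodes in the middle band are neither a source nor a target.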
EKS alternative
Karpenter is an open-source cluster autoscaler that automatically provisions new nodes in response to unschedulable pods. Karpenter evaluates the aggregate resource requirements of the pending pods and chooses the optimal instance type to run them. It will automatically scale-in or terminate instances that don’t have any non-daemonset pods to reduce waste. It also supports a consolidation feature which will actively move pods around and either delete or replace nodes with cheaper versions to reduce cluster cost.
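The idea of summing up pending pod requests and picking an instance type can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Karpenter's algorithm: it assumes all pending pods go onto a single new node and that "optimal" means "cheapest that fits". The `PodRequest` and `InstanceType` classes, the `cheapest_fit` function, and the catalog entries with their prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PodRequest:
    cpu_millicores: int
    memory_mib: int


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu_millicores: int
    memory_mib: int
    hourly_price: float


def cheapest_fit(
    pods: Iterable[PodRequest], catalog: Iterable[InstanceType]
) -> Optional[InstanceType]:
    pods = list(pods)
    # Aggregate the resource requests of all pending pods.
    total_cpu = sum(p.cpu_millicores for p in pods)
    total_mem = sum(p.memory_mib for p in pods)
    # Keep only instance types large enough to hold the aggregate request.
    candidates = [
        t for t in catalog
        if t.cpu_millicores >= total_cpu and t.memory_mib >= total_mem
    ]
    # Pick the cheapest candidate, or None if nothing fits.
    return min(candidates, key=lambda t: t.hourly_price, default=None)


# Hypothetical catalog; names and prices are made up for the example.
CATALOG = [
    InstanceType("small", 2000, 2048, 0.05),
    InstanceType("medium", 2000, 4096, 0.10),
    InstanceType("large", 4000, 8192, 0.20),
]

if __name__ == "__main__":
    pending = [PodRequest(500, 1024), PodRequest(1500, 2048)]
    choice = cheapest_fit(pending, CATALOG)
    print(choice.name if choice else "no instance type fits")
```

A real provisioner would also spread pods across several nodes and account for scheduling constraints and system overhead, which this sketch ignores.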
Karpenter is not as tightly coupled to Kubernetes versions (as CAS is) and doesn’t require you to jump between AWS and Kubernetes APIs. It was designed to overcome some of the challenges presented by Cluster Autoscaler by providing simplified ways to:
- Provision nodes based on workload requirements.
- Create diverse node configurations by instance type, using flexible workload provisioner options. Instead of managing many specific custom node groups, Karpenter lets you manage diverse workload capacity with a single, flexible provisioner.
- Achieve improved pod scheduling at scale by quickly launching nodes and scheduling pods.
AKS alternative
Karpenter is a pretty good solution for cloud native infrastructure, but on its own it targets EKS clusters. The AKS Karpenter Provider enables node autoprovisioning using Karpenter on your AKS cluster.
Karpenter provider for AKS is an official open-source node provisioning project built for Kubernetes by the Azure team. It improves the efficiency and cost of running workloads on Kubernetes clusters by:
- Watching for pods that the Kubernetes scheduler has marked as unschedulable.
- Evaluating scheduling constraints (resource requests, nodeselectors, affinities, tolerations, and topology spread constraints) requested by the pods.
- Provisioning nodes that meet the requirements of the pods.
- Removing the nodes when the nodes are no longer needed.
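The constraint evaluation step in the list above can be illustrated with a small Python sketch. It covers only CPU requests, node selectors, and tolerations. Affinities and topology spread constraints are left out, and every taint is treated as blocking, which is stricter than Kubernetes itself. All class and function names here (`Taint`, `Toleration`, `Pod`, `NodeShape`, `fits`) are hypothetical and do not come from Karpenter's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect matches any taint effect


@dataclass
class Pod:
    name: str
    cpu_millicores: int
    node_selector: Dict[str, str] = field(default_factory=dict)
    tolerations: List[Toleration] = field(default_factory=list)


@dataclass
class NodeShape:
    name: str
    cpu_millicores: int
    labels: Dict[str, str] = field(default_factory=dict)
    taints: List[Taint] = field(default_factory=list)


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.key != taint.key:
        return False
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def fits(pod: Pod, node: NodeShape) -> bool:
    # Resource request must fit within the node's capacity.
    if pod.cpu_millicores > node.cpu_millicores:
        return False
    # Every nodeSelector entry must match a node label exactly.
    if any(node.labels.get(k) != v for k, v in pod.node_selector.items()):
        return False
    # Simplification: every taint on the node must be tolerated by the pod.
    return all(
        any(tolerates(tol, taint) for tol in pod.tolerations)
        for taint in node.taints
    )
```

A provisioner following the steps above would run a check like `fits` against each candidate node shape and launch one that passes for the unschedulable pod.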