The Elastic Stack is a collection of open-source tools for data ingestion, storage, enrichment, visualization, and analysis, and it is widely used with containerized applications. At its core is Elasticsearch, a distributed open-source search and analytics engine that ingests application data, indexes it, and stores it for analysis.
Elasticsearch is often described as a write-heavy database because it ingests massive amounts of data while indexing many different data types. Kubernetes makes it simple to configure, manage, and scale Elasticsearch clusters to handle such volumes. It also abstracts cluster administration by letting you deploy Elasticsearch resources through Infrastructure-as-Code configurations.
Because containers are ephemeral, Kubernetes does not by itself retain the data an application writes, so persistent volumes are used to keep that data safe for later use. OpenEBS helps here by providing local persistent volumes (LocalPV), which store data on the physical disks attached to cluster nodes.
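As a rough illustration of what requesting LocalPV storage can look like, the sketch below builds a StorageClass and a PersistentVolumeClaim as plain Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts. It assumes the OpenEBS hostpath flavor of LocalPV; the names `es-local-hostpath` and `es-data`, the base path, and the 10Gi size are placeholders chosen for this example, not values from this guide.

```python
import json


def local_pv_storage_class(name="es-local-hostpath", base_path="/var/openebs/local"):
    """Build a StorageClass for OpenEBS LocalPV (hostpath flavor).

    `name` and `base_path` are illustrative assumptions; adjust to your cluster.
    """
    # OpenEBS reads its settings from a YAML snippet stored in an annotation.
    config = (
        "- name: StorageType\n"
        '  value: "hostpath"\n'
        "- name: BasePath\n"
        f'  value: "{base_path}"\n'
    )
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {
            "name": name,
            "annotations": {
                "openebs.io/cas-type": "local",
                "cas.openebs.io/config": config,
            },
        },
        "provisioner": "openebs.io/local",
        # Delay binding until a Pod is scheduled, so the volume is created
        # on the node where the Pod actually runs.
        "volumeBindingMode": "WaitForFirstConsumer",
        "reclaimPolicy": "Delete",
    }


def data_volume_claim(storage_class, size="10Gi", name="es-data"):
    """Build a PersistentVolumeClaim that requests storage from `storage_class`."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    sc = local_pv_storage_class()
    pvc = data_volume_claim(sc["metadata"]["name"])
    # Wrap both objects in a List so they can be applied in one step.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [sc, pvc]}, indent=2))
```

In an actual Elasticsearch deployment the claim would more likely appear as a `volumeClaimTemplates` entry of a StatefulSet, so that each data Pod gets its own volume; the standalone claim here only keeps the example short.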
Many users, including the Cloud Native Computing Foundation, ByteDance (TikTok), and Zeta Associates (Lockheed Martin), have shared their experiences of adopting OpenEBS for local storage management for Elasticsearch on Kubernetes in the OpenEBS community’s Adopters list.
In this guide, we’ll look at how OpenEBS LocalPV can provide data storage for Elasticsearch clusters. The guide also covers the following topics:
- The primary functions of Elastic Stack operators in a Kubernetes cluster
- Using Elasticsearch operators with Fluentd and Kibana to build the EFK stack
- Monitoring Elasticsearch cluster metrics with Prometheus and Grafana
Getting Started with Elasticsearch Analytics
Elasticsearch makes it efficient to store and search enormous amounts of textual, graphical, or numerical data. Kubernetes simplifies managing the connections between Elasticsearch nodes, which makes it easier to install Elasticsearch on-premises or in hosted cloud environments. Note that Elasticsearch nodes are not the same as Kubernetes cluster nodes: a Kubernetes node is a physical or virtual machine that runs the workloads scheduled by the orchestrator, whereas an Elasticsearch node is a single running instance of Elasticsearch.
Elasticsearch Cluster Topology
From the perspective of Kubernetes, an Elasticsearch node can be thought of as a Pod. Three types of Elasticsearch Pods are created when an Elasticsearch cluster is deployed:
- Master – manages the Elasticsearch cluster.
- Client – directs incoming traffic to the appropriate nodes.
- Data – in charge of storing and retrieving cluster data.
The diagram below shows the configuration of a typical seven-Pod Elasticsearch cluster with three master, two client, and two data nodes:
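The same topology can be written down as a small worked example. The sketch below counts the Pods per role and computes the majority of master nodes, using the usual majority rule (more than half) for electing a master. The quorum calculation is an addition for illustration and is not taken from the diagram, and the function names are hypothetical.

```python
from collections import Counter

# The seven-Pod layout described above: three masters, two clients, two data nodes.
TOPOLOGY = ["master"] * 3 + ["client"] * 2 + ["data"] * 2


def master_quorum(master_count):
    """Smallest number of masters that forms a strict majority."""
    return master_count // 2 + 1


def summarize(roles):
    """Count Pods per role and derive how many master failures are tolerable."""
    counts = Counter(roles)
    quorum = master_quorum(counts["master"])
    return {
        "pods": len(roles),
        "by_role": dict(counts),
        "master_quorum": quorum,
        "tolerable_master_failures": counts["master"] - quorum,
    }


if __name__ == "__main__":
    print(summarize(TOPOLOGY))
```

With three masters the majority is two, so the cluster can lose one master and still elect a leader. That is one reason an odd number of master nodes is a common choice.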
Deploying Elasticsearch involves creating manifest files for each of the cluster’s Pods. OpenEBS adds a visibility layer for LocalPV storage by connecting to the cluster and enabling cluster monitoring, logging, and topology checks. In addition, the following tools are used to enable cluster-wide analytics:
- Fluentd – an open-source data collection agent that works with Elasticsearch to gather, transform, and ship log data to the Elasticsearch backend. Fluentd runs on the cluster nodes, where it collects and converts Pod log data before sending it to the Elasticsearch data Pods for indexing and storage. It is usually deployed as a DaemonSet so that one instance runs on each Kubernetes worker node.
- Kibana – a visualization tool for cluster data, used to monitor and manage the cluster once it is up and running. The address of the Elasticsearch client service is passed to the Kibana Pods as an environment variable so that Kibana knows where to connect.
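To make the wiring between these tools more concrete, here is a minimal sketch that builds a Kibana container spec and a Fluentd DaemonSet as Python dictionaries. Several things are assumptions for this example: the Service name `elasticsearch-client`, the `ELASTICSEARCH_HOSTS` variable (as read by the Kibana 7.x Docker image), and the `FLUENT_ELASTICSEARCH_HOST` and `FLUENT_ELASTICSEARCH_PORT` variables (as read by the fluentd-kubernetes-daemonset images). Other images or versions may expect different names, and the image references are left to the caller.

```python
import json

ES_CLIENT_SERVICE = "elasticsearch-client"  # hypothetical Service name
ES_HTTP_PORT = 9200  # Elasticsearch's default HTTP port


def kibana_container(image):
    """Container spec that points Kibana at the Elasticsearch client Service."""
    return {
        "name": "kibana",
        "image": image,
        "ports": [{"containerPort": 5601}],  # Kibana's default port
        "env": [
            {
                # Assumed variable name; depends on the Kibana image version.
                "name": "ELASTICSEARCH_HOSTS",
                "value": f"http://{ES_CLIENT_SERVICE}:{ES_HTTP_PORT}",
            }
        ],
    }


def fluentd_daemonset(image):
    """DaemonSet so that one Fluentd Pod runs on every schedulable node."""
    labels = {"app": "fluentd"}
    container = {
        "name": "fluentd",
        "image": image,
        "env": [
            # Assumed variable names; depend on the Fluentd image used.
            {"name": "FLUENT_ELASTICSEARCH_HOST", "value": ES_CLIENT_SERVICE},
            {"name": "FLUENT_ELASTICSEARCH_PORT", "value": str(ES_HTTP_PORT)},
        ],
        # Node logs are read from the host's /var/log directory.
        "volumeMounts": [{"name": "varlog", "mountPath": "/var/log"}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "fluentd", "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    "volumes": [{"name": "varlog", "hostPath": {"path": "/var/log"}}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image references; substitute the images you actually deploy.
    print(json.dumps(fluentd_daemonset("example/fluentd:tag"), indent=2))
    print(json.dumps(kibana_container("example/kibana:tag"), indent=2))
```

A DaemonSet is used for Fluentd because logs are produced on every node, while Kibana only needs the address of the client Service, so a single environment variable is enough to connect it.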
Solution Guide
The following solution guide details the steps and key considerations for setting up Elasticsearch clusters on Kubernetes using OpenEBS persistent volumes. By following it, you can provision persistent storage for the EFK stack on a Kubernetes cluster where OpenEBS is deployed. The guide also includes instructions for using Prometheus and Grafana to run metric checks and monitor the performance of the Elasticsearch cluster.
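As a small illustration of the kind of metric check such monitoring automates, the sketch below evaluates a cluster health document in the shape returned by Elasticsearch's `_cluster/health` API. It is not how Prometheus or Grafana collect metrics. The function name, the sample values, and the expected node counts (taken from the seven-Pod example topology) are assumptions for this example.

```python
def evaluate_health(health, expected_nodes=7, expected_data_nodes=2):
    """Return a list of human-readable problems found in a health document."""
    problems = []
    status = health.get("status")
    if status != "green":
        problems.append(f"cluster status is {status}")
    if health.get("number_of_nodes", 0) < expected_nodes:
        problems.append(
            f"only {health.get('number_of_nodes', 0)} of {expected_nodes} nodes joined"
        )
    if health.get("number_of_data_nodes", 0) < expected_data_nodes:
        problems.append(
            f"only {health.get('number_of_data_nodes', 0)} of "
            f"{expected_data_nodes} data nodes joined"
        )
    if health.get("unassigned_shards", 0) > 0:
        problems.append(f"{health['unassigned_shards']} shards are unassigned")
    return problems


# Hand-written sample values, not output captured from a real cluster.
SAMPLE_HEALTH = {
    "status": "yellow",
    "number_of_nodes": 6,
    "number_of_data_nodes": 2,
    "unassigned_shards": 3,
}

if __name__ == "__main__":
    for problem in evaluate_health(SAMPLE_HEALTH):
        print("WARNING:", problem)
```

In a real setup, an exporter would typically expose these values as metrics, and the thresholds would live in Prometheus alerting rules or Grafana dashboards rather than in application code.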
Please let me know how you use Elasticsearch in production, and whether you have a particularly interesting use case to share.