Introduction to Elasticsearch (5): Hot-Cold Data Architecture in Production, Hands-On

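Elasticsearch is used very heavily in real-world projects, and large projects can hardly do without it. As we know, the number of shards of an index is fixed when the index is created; once set, it cannot be modified in place. If the data volume grows later, we usually rebuild the index, which is no small amount of work.

In some business scenarios, however, the data volume is large but most of the data is rarely accessed, so rebuilding the index is not really necessary. We only need to move the data that is no longer used onto cold-storage machines and keep the hot data on high-performance servers for the business to use. This is the hot-cold data architecture introduced in this article.

Here we walk through the whole process, again on the two servers installed in "Introduction to Elasticsearch (2): Elasticsearch Cluster Installation". Let's first review that cluster: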

No.	Node	IP
1	node-1	192.168.31.20
2	node-2	192.168.31.30

Now we need a plan: node-1 becomes the hot node, where all active data is stored, and node-2 becomes the cold node, where all cold data is stored. The plan is as follows:

No.	Node	IP	Role
1	node-1	192.168.31.20	hot node
2	node-2	192.168.31.30	cold node

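Setting up hot and cold nodes involves three things:

1. Mark each node as hot or cold in the Elasticsearch server-side configuration file.

2. Pin the index to the hot nodes when creating it.

3. Use the API to migrate the index data to the cold nodes.

Now let's get hands-on!

I. Modify the server-side configuration

1) On the Elasticsearch node at 192.168.31.20, add the following configuration: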

node.attr.rack: r1
node.attr.box_type: hot

The complete elasticsearch.yml on 192.168.31.20 is now:

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Cluster name; all nodes in the same cluster must use the same cluster name
cluster.name: es-cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
# Node name; each node must have a different name
node.name: node-1

# CORS-related settings
http.cors.enabled: true
http.cors.allow-origin: "*"

#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /mnt/elasticsearch-7.7.0/data
#
# Path to log files:
#
path.logs: /mnt/elasticsearch-7.7.0/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 192.168.31.20
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# Hosts of all nodes
discovery.seed_hosts: ["192.168.31.20", "192.168.31.30"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["node-1","node-2"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

# This node is eligible to be elected master
node.master: true
# This node is a data node
node.data: true
node.attr.rack: r1
node.attr.box_type: hot

2) On the Elasticsearch node at 192.168.31.30, add the following configuration:

node.attr.rack: r9
node.attr.box_type: cool

The complete elasticsearch.yml on 192.168.31.30 is now:

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Cluster name; all nodes in the same cluster must use the same cluster name
cluster.name: es-cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
# Node name; each node must have a different name
node.name: node-2

# CORS-related settings
http.cors.enabled: true
http.cors.allow-origin: "*"

#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /mnt/elasticsearch-7.7.0/data
#
# Path to log files:
#
path.logs: /mnt/elasticsearch-7.7.0/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 192.168.31.30
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# Hosts of all nodes
discovery.seed_hosts: ["192.168.31.20", "192.168.31.30"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["node-1","node-2"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

# This node is eligible to be elected master
node.master: true
# This node is a data node
node.data: true
node.attr.rack: r9
node.attr.box_type: cool

Next, restart both servers. Once they are back up, the cluster and both nodes can be checked through the head plugin.
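If the head plugin is not at hand, the node attributes can also be checked from a script. The following is a minimal sketch, assuming the cluster listens on the default HTTP port 9200 (the configuration above leaves `http.port` commented out) and using the `_cat/nodeattrs` API with `format=json`. The helper names `box_types` and `fetch_nodeattrs` are made up for this example, and the sample rows only imitate the shape of the response so that the snippet runs without a cluster.

```python
import json
from urllib.request import urlopen


def box_types(nodeattrs):
    """Map node name -> box_type from `_cat/nodeattrs?format=json` rows."""
    return {
        row["node"]: row["value"]
        for row in nodeattrs
        if row["attr"] == "box_type"
    }


def fetch_nodeattrs(base_url="http://192.168.31.20:9200"):
    """Fetch the custom node attributes (needs a reachable cluster)."""
    with urlopen(f"{base_url}/_cat/nodeattrs?format=json") as resp:
        return json.load(resp)


# Hand-written rows shaped like the API response, so the sketch runs offline.
sample = [
    {"node": "node-1", "host": "192.168.31.20", "ip": "192.168.31.20",
     "attr": "rack", "value": "r1"},
    {"node": "node-1", "host": "192.168.31.20", "ip": "192.168.31.20",
     "attr": "box_type", "value": "hot"},
    {"node": "node-2", "host": "192.168.31.30", "ip": "192.168.31.30",
     "attr": "rack", "value": "r9"},
    {"node": "node-2", "host": "192.168.31.30", "ip": "192.168.31.30",
     "attr": "box_type", "value": "cool"},
]

print(box_types(sample))
```

Against the live cluster, `box_types(fetch_nodeattrs())` should report one hot node and one cool node if the restart picked up the new settings.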

II. Create an index

We won't write API client code here; we send the request directly through the head plugin.

PUT /test

{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "10s",
    "index.routing.allocation.require.box_type": "hot"
  },
  "mappings": {
    "properties": {
      "username": {
        "type": "keyword"
      },
      "password": {
        "type": "text"
      }
    }
  }
}

In the settings above we define 3 shards and 1 replica, and the allocation routing requires box_type hot.
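When the same index definition has to be created for more than one index, it can help to generate the request body instead of pasting it. This is only a sketch: `build_index_body` is a hypothetical helper, and its defaults simply mirror the values used in the request above.

```python
import json


def build_index_body(box_type, shards=3, replicas=1):
    """Build the create-index body used above, pinned to `box_type` nodes."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
            "refresh_interval": "10s",
            # Only nodes whose node.attr.box_type matches may hold the shards.
            "index.routing.allocation.require.box_type": box_type,
        },
        "mappings": {
            "properties": {
                "username": {"type": "keyword"},
                "password": {"type": "text"},
            }
        },
    }


body = build_index_body("hot")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the head plugin as the body of `PUT /test`.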

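Now let's look at the shards.

See that? The cluster is no longer healthy (its status is yellow because the replica shards are unassigned). In this setup that is expected, and here is why:

1: The whole cluster has only two servers. We set one replica and pinned the index to hot, but one node is hot and the other is cool. With a single hot node there is nowhere to place the extra replica. If the cluster had more machines, with at least two hot nodes, this problem would not appear.

2: The settings require the test index to be stored on hot nodes. node-2 has the cool role, so no replica is created on it.

In short, the problem above is expected.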
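The reasoning above can be reduced to a small calculation. The sketch below is a simplified model, not Elasticsearch's actual allocator: it only encodes the two rules used in the explanation, namely that a shard copy may live only on a node with the required box_type, and that two copies of the same shard are never placed on the same node. The function name `assignable_copies` is made up for this example.

```python
def assignable_copies(nodes, required_box_type, replicas):
    """Return (assigned, unassigned) copies per shard.

    Each shard has 1 primary plus `replicas` replica copies, and at most
    one copy of a given shard fits on each eligible node.
    """
    eligible = sum(1 for bt in nodes.values() if bt == required_box_type)
    wanted = 1 + replicas
    assigned = min(wanted, eligible)
    return assigned, wanted - assigned


# The two-node demo cluster: one hot node, one cool node.
nodes = {"node-1": "hot", "node-2": "cool"}
print(assignable_copies(nodes, "hot", 1))  # (1, 1): the replica stays unassigned

# A hypothetical cluster with a second hot node has room for the replica.
bigger = {"node-1": "hot", "node-2": "cool", "node-3": "hot"}
print(assignable_copies(bigger, "hot", 1))  # (2, 0)
```

With 3 shards and one unassigned copy per shard, the demo cluster ends up with 3 unassigned replica shards.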

III. Write data to the index

POST /test/_doc

{
  "username": "zhangsan",
  "password": "123456"
}

Written through the head plugin, the request looks like this:

We can then see that the data has been written.

IV. Move the index data to the cool servers

PUT /test/_settings
{
  "index.routing.allocation.require.box_type": "cool"
}
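The same settings update can be sent from a script. Below is a minimal sketch using only the Python standard library; it assumes the default HTTP port 9200 and no authentication, and `relocate_request` and `send` are hypothetical helper names. The request object is built separately from sending it, so it can be inspected without touching the cluster.

```python
import json
from urllib.request import Request, urlopen


def relocate_request(index, box_type, base_url="http://192.168.31.20:9200"):
    """Build the PUT <index>/_settings request that re-pins an index."""
    body = {"index.routing.allocation.require.box_type": box_type}
    return Request(
        f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(req):
    """Send the request to a live cluster and return the response text."""
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


req = relocate_request("test", "cool")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Calling `send(req)` applies the change; once the setting is accepted, Elasticsearch relocates the shards on its own.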

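Doing this through the head plugin:

Now let's look at the shards again.

The shards have moved as expected. Keep in mind that this demo has only two machines; with several nodes per role, you would see all of the data, replicas included, end up on the cool servers.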

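To summarize:

1. The hot-cold data architecture is fairly common in data-analytics projects.

2. When data is no longer in use, it has to be migrated through the API. The call can be made with curl or with the Java API; in production we usually trigger it from crontab or a scheduler.

3. In production we generally do not use a single index like test. We use indices such as test-2022-07-* instead, so that the data matching a given condition can be allocated to the cool servers for storage while the high-performance servers are kept for online traffic.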
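For the scheduled migration described in points 2 and 3, the job first has to decide which indices count as cold. The sketch below shows one possible selection rule and makes two assumptions that are not part of this article: index names end in a full date (for example `test-2022-07-01`), and anything older than 30 days is cold. `indices_to_cool` is a hypothetical helper name.

```python
import re
from datetime import date, timedelta

# Assumed naming scheme: <prefix>-YYYY-MM-DD
DATE_SUFFIX = re.compile(r"-(\d{4})-(\d{2})-(\d{2})$")


def indices_to_cool(names, today, max_age_days=30):
    """Return the index names whose date suffix is older than the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    old = []
    for name in names:
        match = DATE_SUFFIX.search(name)
        if not match:
            continue  # no date suffix: leave the index alone
        try:
            day = date(*map(int, match.groups()))
        except ValueError:
            continue  # suffix looks like a date but is not a valid one
        if day < cutoff:
            old.append(name)
    return sorted(old)


names = ["test-2022-07-01", "test-2022-07-20", "test-2022-08-15", "test"]
print(indices_to_cool(names, today=date(2022, 8, 20)))
```

A crontab entry or scheduler would run this once a day and, for each returned name, send the `PUT <index>/_settings` request from section IV with box_type set to cool.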
