Common Elasticsearch commands

1, Index

curl -X GET 'localhost:9200/_cat/indices?v'                          # list all indices

curl -X GET "localhost:9200/_cat/indices/INDEX_PATTERN-*?v&s=index"  # list indices matching a name pattern

curl -X GET "localhost:9200/_cat/indices?v&s=docs.count:desc"        # list indices sorted by document count (descending)

curl -XDELETE "localhost:9200/INDEX_NAME"                            # delete an index
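Since deletion accepts name patterns, old daily indices can be cleaned up in a loop. A hedged sketch (the 7-day retention, the loop, and the assumption that index names end in -YYYY.MM.DD are mine, not from the original):

```shell
#!/bin/bash
# Hypothetical cleanup: delete daily indices older than 7 days.
# Assumes index names end in -YYYY.MM.DD, e.g. office_dns_log-2019.06.01.
cutoff=$(date -d "7 days ago" +%Y%m%d)
curl -s "localhost:9200/_cat/indices/office_dns_log-*?h=index" |
while read -r idx; do
    day=$(echo "${idx##*-}" | tr -d .)       # 2019.06.01 -> 20190601
    case $day in (''|*[!0-9]*) continue ;; esac   # skip non-date suffixes
    if [ "$day" -lt "$cutoff" ]; then
        echo "deleting $idx"
        curl -XDELETE "localhost:9200/$idx"
    fi
done
```

Dry-run it first by commenting out the `curl -XDELETE` line and checking which names get printed.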

curl -X GET "localhost:9200/INDEX_NAME/_count"                       # get the document count of an index

curl -XGET "127.0.0.1:9200/_all/_settings?pretty=true"               # view the settings of all indices (lists every index, very long...)

curl -XGET "127.0.0.1:9200/office_dns_log-*/_settings?pretty=true"   # view the settings of certain indices

curl -XGET "127.0.0.1:9200/office_dns_log-*/_settings/index.number_*?pretty=true" # view the shard and replica counts of certain indices

# change the number of replicas of an index
curl -XPUT "localhost:9200/INDEX_NAME/_settings?pretty" -H 'Content-Type: application/json' -d' { "number_of_replicas": 2 }'

# view the mapping of an index
curl -XGET "127.0.0.1:9200/INDEX_NAME/_mapping?include_type_name=true&pretty=true"

Main reference:
Get index settings API

2, Node

(This section mainly draws on an external article.)

curl -X GET "localhost:9200/_cat/nodes?v"    # view per-node memory usage and load
ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.18.192.101           37          80   4    0.41    0.49     0.52 dim       -      it-elk-node3
172.18.192.102           69          90   4    1.19    1.53     1.56 dim       *      it-elk-node4
172.18.192.100           36         100   1    2.91    2.60     2.86 dim       -      it-elk-node2
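The tabular output above is easy to post-process. A small awk sketch that flags nodes whose heap.percent exceeds 60 (the threshold is an arbitrary example; the curl output is simulated with a here-document):

```shell
# Flag nodes with heap usage above 60% in _cat/nodes output.
# NR > 1 skips the header row; $2 is heap.percent, $NF is the node name.
cat <<'EOF' | awk 'NR > 1 && $2 > 60 { print $NF, "heap at", $2 "%" }'
ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.18.192.101           37          80   4    0.41    0.49     0.52 dim       -      it-elk-node3
172.18.192.102           69          90   4    1.19    1.53     1.56 dim       *      it-elk-node4
EOF
# prints: it-elk-node4 heap at 69%
```

Against a live cluster, replace the here-document with `curl -s "localhost:9200/_cat/nodes?v"`.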

curl -X GET "localhost:9200/_nodes/stats"
curl -X GET "localhost:9200/_nodes/nodeId1,nodeId2/stats"

# return just indices
curl -X GET "localhost:9200/_nodes/stats/indices"
# return just os and process
curl -X GET "localhost:9200/_nodes/stats/os,process"
# return just process for node with IP address 10.0.0.1
curl -X GET "localhost:9200/_nodes/10.0.0.1/stats/process"

# return just process
curl -X GET "localhost:9200/_nodes/process"
# same as above
curl -X GET "localhost:9200/_nodes/_all/process"
# return just jvm and process of only nodeId1 and nodeId2
curl -X GET "localhost:9200/_nodes/nodeId1,nodeId2/jvm,process"
# same as above
curl -X GET "localhost:9200/_nodes/nodeId1,nodeId2/info/jvm,process"
# return all the information of only nodeId1 and nodeId2
curl -X GET "localhost:9200/_nodes/nodeId1,nodeId2/_all"


# Fielddata summarised by node
curl -X GET "localhost:9200/_nodes/stats/indices/fielddata?fields=field1,field2"
# Fielddata summarised by node and index
curl -X GET "localhost:9200/_nodes/stats/indices/fielddata?level=indices&fields=field1,field2"
# Fielddata summarised by node, index, and shard
curl -X GET "localhost:9200/_nodes/stats/indices/fielddata?level=shards&fields=field1,field2"
# You can use wildcards for field names
curl -X GET "localhost:9200/_nodes/stats/indices/fielddata?fields=field*"

3, Segment

# list the segments of all indices (note: with many indices the list can be very long)
curl -u elastic:HMEaQXtLiJaD4zn1ZxzM -X GET "127.0.0.1:9200/_cat/segments?v"

# list the segments of a specific index
curl -u elastic:HMEaQXtLiJaD4zn1ZxzM -X GET "127.0.0.1:9200/_cat/segments/INDEX_PATTERN-*?v"

4, Templates

A template defines per-index settings, field types, and so on. It applies only to indices created after the template, not to existing ones.

# list all templates
curl -X GET "127.0.0.1:9200/_cat/templates?v&s=name"

# view a specific template
curl -X GET "127.0.0.1:9200/_template/template_1?pretty=true"

# create a template for certain indices (applies only to indices created afterwards)
curl -XPUT 127.0.0.1:9200/_template/template_1 -H 'Content-Type: application/json' -d'{
  "index_patterns": ["office_dns*"],
  "settings" : {
    "index.refresh_interval": "30s",
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.translog.durability": "request",
    "index.translog.sync_interval": "30s"
  },
  "order" : 1
}'

# Notes:
index.refresh_interval: the index refresh interval, default 1s, i.e. how long after data is written before it becomes searchable in ES.
number_of_shards: the number of primary shards (important); a common suggestion is to match the node count. If an index grows beyond ~30G, queries become very slow, so be sure to spread the index out via the shard count.
number_of_replicas: the number of replicas.
index.translog.durability: how translog data (index/update/delete operations, etc.) is persisted to disk; "request" is the default.
index.translog.sync_interval: the translog commit interval, default 5s.

Reference: https://blog.csdn.net/u014646662/article/details/99293604

5, License

# view the license
curl -XGET 'http://127.0.0.1:9200/_license'

# delete the license
curl -X DELETE "localhost:9200/_license"

# import a license (local license file: aaa.json); if authentication is enabled, add credentials here, e.g. -u elastic:password
curl -XPUT 'http://127.0.0.1:9200/_xpack/license' -H "Content-Type: application/json" -d @aaa.json

6, Shard management

curl -XGET "127.0.0.1:9200/_cluster/settings?pretty"    # view the cluster-wide shard limit
{
  "persistent" : {
    "cluster" : {
      "max_shards_per_node" : "30000"    # each node may hold up to 30000 shards (default: 1000)
    },
    "xpack" : {
      "monitoring" : {
        "collection" : {
          "enabled" : "true"
        }
      }
    }
  }
}
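If the cluster is bumping into this limit, it can be raised with a persistent cluster setting. A sketch (the value 30000 mirrors the output above, it is not a recommendation):

```shell
# Raise the per-node shard limit cluster-wide ("persistent" survives restarts).
curl -XPUT "127.0.0.1:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{ "persistent": { "cluster.max_shards_per_node": "30000" } }'
```

Keep in mind that a very high shard count per node is usually a symptom of too many small indices rather than something to paper over with this setting.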


curl -XGET "127.0.0.1:9200/_cluster/health?pretty"    # view current shard usage
{
  "cluster_name" : "it-elk",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 7233,
  "active_shards" : 7248,
  "relocating_shards" : 0,
  "initializing_shards" : 12,
  "unassigned_shards" : 5323,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 2085,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 3167844,
  "active_shards_percent_as_number" : 57.601525868234916        # percentage of active shards; 100 means everything is healthy
}

# count unassigned shards
curl -XGET "127.0.0.1:9200/_cat/shards?h=index,shard,prirep,state,unassigned.*&pretty"|grep UNASSIGNED | wc -l

# list unassigned shards together with the reason they are unassigned
curl -XGET "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED
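To see which indices the unassigned shards belong to, the same output can be grouped with awk. A sketch with simulated _cat/shards lines (the index names are illustrative, not from a real cluster):

```shell
# Count UNASSIGNED shards per index: $1 is the index name, $4 the state;
# sort -rn puts the worst-affected index first.
cat <<'EOF' | awk '$4 == "UNASSIGNED" { n[$1]++ } END { for (i in n) print n[i], i }' | sort -rn
mail-w3svc1-2019.06.01      0 p STARTED
mail-w3svc1-2019.06.01      0 r UNASSIGNED
office_dns_log-2019.06.01   1 p UNASSIGNED
office_dns_log-2019.06.01   2 r UNASSIGNED
EOF
```

Against a live cluster, feed it from `curl -s "localhost:9200/_cat/shards?h=index,shard,prirep,state"` instead of the here-document.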

7, Template management

# list all templates
curl -X GET "127.0.0.1:9200/_cat/templates?v&s=name"

# view a specific template
curl -X GET "127.0.0.1:9200/_template/TEMPLATE_NAME?pretty=true"

# create a template named mail so that future mail-w3svc1-* indices get the settings below
curl -XPUT 127.0.0.1:9200/_template/mail -H 'Content-Type: application/json' -d'{
  "index_patterns": ["mail-w3svc1-*"],
  "settings" : {
    "index.refresh_interval": "30s",
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.translog.durability": "request",
    "index.translog.sync_interval": "30s"
  },
  "mappings": {
    "properties": {
      "rt": { "type": "integer" },
      "status": { "type": "integer" },
      "width": { "type": "float" },
      "uri": { "type": "text", "fielddata": true },
      "username": { "type": "text", "fielddata": true },
      "server_ip": { "type": "text", "fielddata": true },
      "client_ip": { "type": "text", "fielddata": true }
    }
  },
  "order" : 1
}'

# Notes:
1, index.refresh_interval: the index refresh interval, default 1s, i.e. how long after data is written before it becomes searchable in ES.
Refreshing is expensive, so when bulk-importing data for the first time it can be set longer (e.g. 30s) and lowered to 5s or 10s afterwards.
2, number_of_shards: the number of primary shards (important), default 1.
If an index grows beyond ~30G, queries become very slow; spread the index out via the shard count.
3, number_of_replicas: the number of replicas, default 1.
Note that an index with 5 shards and 1 replica ends up with 5*(1+1) shards in total, which can drag down cluster performance.
Suggestion: number of nodes <= number of shards * (number of replicas + 1)
4, index.translog.durability: how translog data (index/update/delete operations, etc.) is persisted to disk.
5, index.translog.sync_interval: the translog commit interval, default 5s.
6, mappings.properties: declares fields such as rt/status as integer types, and enables aggregations from external tools on text fields such as username/server_ip (fielddata: true).
For common field types, see https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html
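The shard arithmetic in note 3 can be checked quickly in the shell:

```shell
# Total shards for an index = number_of_shards * (number_of_replicas + 1)
shards=5; replicas=1
echo $(( shards * (replicas + 1) ))   # prints 10
```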
# view the mail template defined above
curl -X GET "127.0.0.1:9200/_template/mail?pretty=true"

8, Document

# write a document (the index is created automatically if it does not exist)
curl -XPOST "127.0.0.1:9200/INDEX_NAME/_doc/1" -H "Content-Type: application/json" -d'{"name": "zhu kun"}'
 
# retrieve a document
curl -XGET "127.0.0.1:9200/INDEX_NAME/_doc/1?pretty=true"
 
# search the index
curl -XGET "127.0.0.1:9200/INDEX_NAME/_search?q=name:zhu&pretty=true"

9, Caveats

By default a text field can only be sorted and searched within Kibana. To aggregate on it (sort, count, summarise, etc.) from external tools such as Grafana, set fielddata to true, otherwise you may hit the following error (see the official documentation):

Fielddata is disabled on text fields by default. Set `fielddata=true` on [`your_field_name`] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.

References:
https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/#reason-2-too-many-shards-not-enough-nodes


docker pull offline

Running docker pull in an offline environment, or one with terrible connectivity, can drive you mad. This post describes how to work around it, based on CentOS 7.

1, Use a proxy

This is the first thing most people try. The Docker documentation says that after export HTTPS_PROXY / export HTTP_PROXY, docker pull will go through the proxy, but that had no effect for me, for reasons unknown. Here is a workaround:

$ vim /usr/lib/systemd/system/docker.service  # add the following 2 lines under the [Service] section
......
[Service]
Environment="HTTPS_PROXY=http://10.10.74.101:8888"
Environment="HTTP_PROXY=http://10.10.74.101:8888"
......
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker.service

After that, docker pull works fine.

Note: this method needs a fast proxy_server. If your proxy_server is mediocre, then even with a working configuration the glacial pull speed will make you question life, so consider the method below instead.
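Instead of editing the unit file shipped by the package (which a package upgrade may overwrite), the same two Environment lines can live in a systemd drop-in. A sketch using the example proxy address from above:

```shell
# Put the proxy settings in a drop-in so they survive docker package upgrades.
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="HTTPS_PROXY=http://10.10.74.101:8888"
Environment="HTTP_PROXY=http://10.10.74.101:8888"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker.service
```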

2, Use docker save/load to export and import images

First, on a machine with good connectivity, run docker pull and save the pulled image to a file:

$ docker pull docker.elastic.co/elasticsearch/elasticsearch:6.6.2
$ docker save -o es.img docker.elastic.co/elasticsearch/elasticsearch:6.6.2

Then copy es.img to the machine with no public Internet access / bad connectivity:

$ docker load -i es.img
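The image file can also be compressed for the transfer; docker load accepts compressed input directly (same image tag as above):

```shell
# Save and compress in one pipeline on the connected machine...
docker save docker.elastic.co/elasticsearch/elasticsearch:6.6.2 | gzip > es.img.gz
# ...then load the compressed file as-is on the offline machine.
docker load -i es.img.gz
```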

Fixing a hanging df -h

Use strace to find out which mount point it hangs on:

$ strace df
......
stat("/sys/fs/cgroup/devices", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/fs/cgroup/pids", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/fs/cgroup/hugetlb", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/fs/cgroup/blkio", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/fs/cgroup/freezer", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/fs/cgroup/cpuset", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/fs/cgroup/cpu,cpuacct", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/fs/cgroup/perf_event", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/kernel/config", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
stat("/proc/sys/fs/binfmt_misc",    # hangs here

To fix the hang on /proc/sys/fs/binfmt_misc, pick any one of the following:

1. Run systemctl restart proc-sys-fs-binfmt_misc.automount;

2. Upgrade to the latest systemd-219-57 release;

3. Follow the Red Hat knowledge-base steps to mask proc-sys-fs-binfmt_misc.automount so that only a static mount is performed;
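As a quick workaround while deciding on a permanent fix, df can simply skip the offending mount point: /proc/sys/fs/binfmt_misc is an autofs-managed mount, so excluding that filesystem type avoids the hanging stat() entirely.

```shell
# Exclude autofs mounts (which is what /proc/sys/fs/binfmt_misc is),
# so df never touches the hanging mount point.
df -h -x autofs
```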


The difference between rate and irate in Prometheus

rate()

rate(v range-vector) calculates the per-second average rate of increase of the time series in the range vector.

Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.

Also, the calculation extrapolates to the ends of the time range, allowing for missed scrapes or imperfect alignment of scrape cycles with the range's time period.

The following example expression returns the per-second rate of HTTP requests as measured over the last 5 minutes, per time series in the range vector:

rate(http_requests_total{job="api-server"}[5m])
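A worked example with made-up numbers: if a counter rises from 100 to 400 over a 5-minute window, the average per-second rate is 300 increments over 300 seconds:

```shell
# (400 - 100) increments over 5 minutes (300 s) -> 1 request/second
echo $(( (400 - 100) / 300 ))   # prints 1
```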

rate should only be used with counters. It is best suited for alerting, and for graphing of slow-moving counters.


Monitoring nginx with Collectd + Prometheus + Grafana

If Nginx was built with --with-http_stub_status_module, it can be monitored as described in this post. Don't worry: Nginx installed from rpm/apt packages ships with this module.

The usual recommendation is to install Collectd and collectd_exporter on the nginx machine, export the data to Prometheus (normally on a separate server), and have Grafana read the data from Prometheus.

1, Configure nginx

Installing Nginx is out of scope here; we only need to confirm that http_stub_status_module was compiled in (look for --with-http_stub_status_module in the configure arguments):

$ sudo nginx -V
nginx version: nginx/1.14.0
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)
built with OpenSSL 1.0.2k-fips  26 Jan 2017
TLS SNI support enabled
configure arguments: --user=nginx --group=nginx --prefix=/usr/local/nginx --conf-path=...

Configure Nginx to enable the module:

location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
}

After that, status information is available at http://ip/nginx_status.

$ curl http://127.0.0.1/nginx_status
Active connections: 29
server accepts handled requests
 17750380 17750380 6225361
Reading: 0 Writing: 1 Waiting: 28
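The stub_status format is stable and easy to scrape by hand. A sketch that pulls out the active-connection count (the input is simulated with the output shown above instead of a live curl):

```shell
# Extract the number after "Active connections:"; awk field $3 is the count.
cat <<'EOF' | awk '/^Active connections/ { print $3 }'
Active connections: 29
server accepts handled requests
 17750380 17750380 6225361
Reading: 0 Writing: 1 Waiting: 28
EOF
# prints: 29
```

In practice, pipe `curl -s http://127.0.0.1/nginx_status` into the same awk one-liner.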

