Elasticsearch Study Notes

Miscellaneous

Runtime environment

Elasticsearch requires Java 8 or later.

Startup

cd elasticsearch-5.0.2/bin
./elasticsearch 

Or specify the cluster and node names:
./elasticsearch -Ecluster.name=my_cluster_name -Enode.name=my_node_name

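Settings

PUT /_settings
{ "number_of_replicas": 0 }
Sets the number of replica shards to 0 for all existing indices.

PUT /_cluster/settings
{ "transient": { "search.default_search_timeout": "-1" } }
Sets the cluster-wide default search timeout; -1 means no timeout. This is a cluster-level setting, so it goes through /_cluster/settings rather than /_settings.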

Generating random test data

json-generator

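Cluster status

green: everything is fine (all cluster functionality is working).
yellow: all data is available, but some replica shards may not have been allocated yet (all cluster functionality still works).
red: some data is unavailable for some reason. Even when the cluster is red it remains partially functional (for example, it keeps serving search requests from the shards that are available), but you should fix it as soon as possible because search results will be missing some data.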
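
The three states can be turned into a small decision helper. This is a minimal sketch that only encodes the descriptions above; the function name `assess` and the keys of the returned dict are hypothetical, not part of any Elasticsearch client.

```python
from enum import Enum


class ClusterStatus(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def assess(status: str) -> dict:
    """Summarise what a cluster status string implies (hypothetical helper)."""
    s = ClusterStatus(status.strip().lower())
    return {
        # red is the only state in which some data cannot be used
        "all_data_available": s is not ClusterStatus.RED,
        # only green guarantees that every replica shard is in place
        "all_replicas_allocated": s is ClusterStatus.GREEN,
        # the notes recommend fixing a red cluster as soon as possible
        "fix_as_soon_as_possible": s is ClusterStatus.RED,
    }


if __name__ == "__main__":
    for name in ("green", "yellow", "red"):
        print(name, assess(name))
```

An unknown status string raises `ValueError`, which seems safer here than silently treating it as healthy.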

API

Monitor shard sizes: _cat/shards
Check cluster health (on Windows, curl arguments must be wrapped in double quotes):

GET /_cat/health?v
curl -XGET 'localhost:9200/_cat/health?v&pretty'

List nodes

GET /_cat/nodes?v
curl -XGET 'localhost:9200/_cat/nodes?v&pretty'

List indices

GET /_cat/indices?v
curl -XGET 'localhost:9200/_cat/indices?v&pretty'
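
With `?v`, the `_cat` endpoints return a whitespace-aligned text table whose first line is the header. The sketch below shows one way to turn such text into dicts. The sample text is made up for illustration (real output has more columns), and `parse_cat_table` is a hypothetical helper.

```python
def parse_cat_table(text: str) -> list:
    """Split a `_cat` style table (header line first) into one dict per row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Assumption: no cell is empty and no cell contains spaces; otherwise
    # the columns would shift and this naive split would mislabel values.
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


# Made-up sample in the shape of `GET /_cat/indices?v` output
SAMPLE = """\
health status index    pri rep docs.count
yellow open   customer 5   1   0
yellow open   bank     5   1   1000
"""

if __name__ == "__main__":
    for row in parse_cat_table(SAMPLE):
        print(row["index"], row["health"], row["docs.count"])
```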

Create an index (named customer)

PUT /customer?pretty
curl -XPUT 'localhost:9200/customer?pretty'

Delete an index

DELETE /customer?pretty
curl -XDELETE 'localhost:9200/customer?pretty'

Create a document

PUT /customer/external/1?pretty
{
"name": "John Doe"
}
curl -XPUT 'localhost:9200/customer/external/1?pretty' -d'
{
"name": "John Doe"
}'

Update a document

POST /customer/external/1/_update?pretty
{
  "doc": { "name": "Jane Doe" }
}
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d'
{
  "doc": { "name": "Jane Doe" }
}'

Update with a script
POST /customer/external/1/_update?pretty
{
  "script" : "ctx._source.age += 5"
}
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d'
{
  "script" : "ctx._source.age += 5"
}'
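
The two update styles differ in what they send: a partial document that is merged into the stored one, or a script that modifies `ctx._source`. The following is a local simulation of both on plain dicts, not a call to Elasticsearch. The function names are hypothetical, and the script version assumes the stored document already has a numeric `age` field.

```python
import copy


def apply_doc_update(source: dict, doc: dict) -> dict:
    """Return a copy of `source` with the partial document `doc` merged in."""
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # nested objects are merged field by field
            merged[key] = apply_doc_update(merged[key], value)
        else:
            # scalars and arrays replace the old value
            merged[key] = copy.deepcopy(value)
    return merged


def apply_age_script(source: dict, years: int = 5) -> dict:
    """Local stand-in for the script `ctx._source.age += 5`."""
    updated = copy.deepcopy(source)
    # Raises KeyError when `age` is missing; assumption: the real script
    # likewise needs the field to exist before it can be incremented.
    updated["age"] += years
    return updated


if __name__ == "__main__":
    stored = {"name": "John Doe", "age": 20}
    print(apply_doc_update(stored, {"name": "Jane Doe"}))
    print(apply_age_script(stored))
```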

Delete a document by ID

DELETE /customer/external/2?pretty
curl -XDELETE 'localhost:9200/customer/external/2?pretty'

Delete documents by query (5.x)

POST /customer/external/_delete_by_query
{
    "query" : {
        "match" : {
            "name": "test"
        }
    }
}

Bulk operations

POST /customer/external/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }


POST /customer/external/_bulk?pretty
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}


curl -XPOST 'localhost:9200/bank/account/_bulk?pretty&refresh' --data-binary "@accounts.json"
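
The bulk body is newline-delimited JSON: one action line, followed by a payload line for actions that carry one (`index`, `update`) and nothing for `delete`, with a trailing newline after the last line. A minimal sketch of building such a body follows; `build_bulk_body` and its tuple format are hypothetical choices made for this example.

```python
import json


def build_bulk_body(operations) -> str:
    """Build an NDJSON bulk body.

    `operations` is an iterable of (action, metadata, payload) tuples;
    payload is None for actions without a body line, such as delete.
    """
    lines = []
    for action, metadata, payload in operations:
        lines.append(json.dumps({action: metadata}))
        if payload is not None:
            lines.append(json.dumps(payload))
    # the final line must also end with a newline
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body([
        ("update", {"_id": "1"}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
        ("delete", {"_id": "2"}, None),
    ])
    print(body, end="")
```

Because `json.dumps` never emits raw newlines by default, each action or payload stays on a single line, which the bulk format requires.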

Queries

URI search

1. Retrieve a document by ID (here, 1):

GET /customer/external/1?pretty
curl -XGET 'localhost:9200/customer/external/1?pretty'

2. Use _search:

GET /bank/_search?q=*&sort=account_number:asc
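
When the URI is assembled in code, the `q` and `sort` values should be percent-encoded. A small sketch using only the standard library; `search_uri` is a hypothetical helper and the base address `http://localhost:9200` is taken from the curl examples in these notes.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def search_uri(index, q="*", sort=None, base="http://localhost:9200"):
    """Build a URI search address such as /bank/_search?q=*&sort=..."""
    params = {"q": q}
    if sort:
        params["sort"] = sort
    # urlencode percent-encodes characters such as '*' and ':'
    return f"{base}/{index}/_search?{urlencode(params)}"


if __name__ == "__main__":
    url = search_uri("bank", sort="account_number:asc")
    print(url)
    # decoding the query string recovers the original parameter values
    print(parse_qs(urlsplit(url).query))
```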

Request body search

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "_source": ["account_number", "balance"],
  "from": 10,
  "size": 10
}

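query: the query to run
sort: the sort order
from: zero-based offset of the first hit to return; defaults to 0
size: number of hits to return; defaults to 10
_source: the fields to return; defaults to all fields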
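
Since `from` is a zero-based offset, page-style pagination is simple arithmetic. A minimal sketch, assuming 1-based page numbers; `page_params` is a hypothetical helper.

```python
def page_params(page: int, size: int = 10) -> dict:
    """Translate a 1-based page number into `from`/`size` for a search body."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


if __name__ == "__main__":
    body = {"query": {"match_all": {}}}
    body.update(page_params(2))  # second page of 10 hits
    print(body)
```

With the default size, page 2 yields `from` 10 and `size` 10, the same values used in the request body search example above.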

Query clauses

Match all documents: "match_all": {}
Match on a specific field: { "match": { "account_number": 20 } }
Multiple conditions:

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

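In the example above, the bool must clause matches only documents whose address contains both "mill" and "lane". If should were used instead of must, a document satisfying either query would match.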

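match

"match": { "account_number": 20 } returns documents whose account_number is 20.
"match": { "address": "mill lane" } returns documents whose address contains "mill" or "lane".
Note: the separator is not necessarily a space; some special characters may also be treated as term separators.
"match_phrase": { "address": "mill lane" } returns documents whose address contains the phrase "mill lane".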

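The difference between match and match_phrase can be imitated locally on plain strings. This is a toy approximation, not how Elasticsearch scores or analyzes text: the regex tokenizer below merely stands in for an analyzer, and all three function names are hypothetical.

```python
import re


def tokenize(text: str) -> list:
    """Lowercase and split on anything that is not a word character."""
    return re.findall(r"\w+", text.lower())


def match(field_value: str, query: str) -> bool:
    """True when the field contains at least one of the query terms."""
    doc_terms = set(tokenize(field_value))
    return any(term in doc_terms for term in tokenize(query))


def match_phrase(field_value: str, query: str) -> bool:
    """True when the query terms appear consecutively and in order."""
    doc = tokenize(field_value)
    phrase = tokenize(query)
    n = len(phrase)
    return n > 0 and any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))


if __name__ == "__main__":
    for address in ("198 Mill Lane", "990 Mill Road", "12 Lane Street"):
        print(address, match(address, "mill lane"), match_phrase(address, "mill lane"))
```

Because the tokenizer splits on punctuation as well as spaces, a value like "mill-lane" yields the two terms "mill" and "lane", which mirrors the note about separators.
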
bool

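bool must: a document matches only when all of the queries match.
bool should: a document matches when at least one of the queries matches.
bool must_not: a document matches only when none of the queries match.
range: not a bool clause itself but a query that can be placed inside one; it restricts a field (balance, for example) to a range of values.
bool filter {...}: conditions placed in filter do not contribute to the score, which speeds up the search.
Combining multiple clauses (the clauses are ANDed together):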

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
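
The matching rules of the bool clauses can be written down as a small local evaluator over Python predicates. This is a sketch of the matching logic only (no scoring), and `bool_matches` is a hypothetical function. It assumes the usual default that should clauses become optional once a must or filter clause is present.

```python
def bool_matches(doc, must=(), should=(), must_not=(), filters=()):
    """Decide whether `doc` matches; each clause is a list of predicates."""
    required = list(must) + list(filters)
    # must / filter: every predicate has to hold
    if not all(pred(doc) for pred in required):
        return False
    # must_not: no predicate may hold
    if any(pred(doc) for pred in must_not):
        return False
    # should: at least one has to hold, but only when nothing else is required
    if should and not required:
        return any(pred(doc) for pred in should)
    return True


if __name__ == "__main__":
    is_40 = lambda d: d["age"] == 40
    in_idaho = lambda d: d["state"] == "ID"
    for doc in ({"age": 40, "state": "CA"}, {"age": 40, "state": "ID"}):
        print(doc, bool_matches(doc, must=[is_40], must_not=[in_idaho]))
```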

Filtering

Using filter inside bool

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
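
When the bounds come from code rather than being typed by hand, the same request body can be generated. A minimal sketch; `range_filter_query` is a hypothetical helper that reproduces the body shown above for any field and bounds.

```python
import json


def range_filter_query(field: str, gte, lte) -> dict:
    """Build a match_all query restricted by a non-scoring range filter."""
    if gte > lte:
        raise ValueError("gte must not be greater than lte")
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                # filter clauses restrict the hits without affecting the score
                "filter": {"range": {field: {"gte": gte, "lte": lte}}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(range_filter_query("balance", 20000, 30000), indent=2))
```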

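Search response fields

took - time in milliseconds that Elasticsearch took to execute the search
timed_out - whether the search timed out
_shards - how many shards were searched, and how many of them succeeded and failed
hits - the search results
hits.total - the total number of documents that matched
sort - the sort key of each result; absent when results are sorted by relevance score
_score and max_score - the document relevance scores; these can be ignored for now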
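
These fields can be pulled out of a decoded response with a few dictionary lookups. The sample response below is hand-written for illustration, not real output, and `summarize_response` is a hypothetical helper. It also tolerates `hits.total` being an object with a `value` key, which is how newer Elasticsearch versions report it.

```python
def summarize_response(resp: dict) -> dict:
    """Extract the fields described above from a decoded search response."""
    hits = resp["hits"]
    total = hits["total"]
    if isinstance(total, dict):
        # newer versions report {"value": ..., "relation": ...}
        total = total["value"]
    return {
        "took_ms": resp["took"],
        "timed_out": resp["timed_out"],
        "failed_shards": resp["_shards"]["failed"],
        "total": total,
        "sources": [hit["_source"] for hit in hits["hits"]],
    }


# Hand-written sample in the shape of a 5.x search response
SAMPLE_RESPONSE = {
    "took": 3,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {"_id": "1", "_score": 1.0, "_source": {"name": "John Doe"}},
            {"_id": "2", "_score": 1.0, "_source": {"name": "Jane Doe"}},
        ],
    },
}

if __name__ == "__main__":
    print(summarize_response(SAMPLE_RESPONSE))
```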

Limits

Maximum number of documents

Each Elasticsearch shard is a Lucene index. A single Lucene index can hold at most 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes with the _cat/shards API.
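
The limit follows directly from Java's 32-bit signed integer range. The arithmetic, plus the minimum shard count it implies for a given number of documents, can be checked as follows. `min_primary_shards` is a hypothetical helper and gives a hard lower bound only, not a sizing recommendation.

```python
INTEGER_MAX_VALUE = 2**31 - 1                 # Java's Integer.MAX_VALUE
MAX_DOCS_PER_SHARD = INTEGER_MAX_VALUE - 128  # per-Lucene-index document limit


def min_primary_shards(doc_count: int) -> int:
    """Smallest number of shards that could hold `doc_count` documents."""
    if doc_count < 0:
        raise ValueError("doc_count must not be negative")
    # ceiling division without floating point; at least one shard
    return max(1, -(-doc_count // MAX_DOCS_PER_SHARD))


if __name__ == "__main__":
    print(MAX_DOCS_PER_SHARD)
    print(min_primary_shards(5_000_000_000))
```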
