Note: users below is an index, which is analogous to a table.
Create a document
Using POST
# create document: the _id is generated automatically
POST users/_doc
{
  "user": "Jack",
  "post_date": "2020-10-11T21:18:55",
  "message": "trying out kibana"
}
{ "_index" : "users", "_type" : "_doc", "_id" : "z7jQF3UBdW2iN5NVuq-w", // id是系统自动帮我们生成的 "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 0, "_primary_term" : 1 }
Using PUT
# create document with a specified id; if the id already exists, an error is returned
# op_type is the operation type: create here (the other value is index)
# PUT users/_doc/1?op_type=create is equivalent to PUT users/_create/1
PUT users/_doc/1?op_type=create
{
  "user": "Mike ~ ~",
  "post_date": "2020-10-11T21:22:24",
  "message": "trying out Elasticsearch"
}
{ "_index" : "users", "_type" : "_doc", "_id" : "1", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "_seq_no" : 1, "_primary_term" : 1 }
Executing this PUT request again returns an error:
{ "error": { "root_cause": [ { "type": "version_conflict_engine_exception", "reason": "[1]: version conflict, document already exists (current version [1])", "index_uuid": "P4KITo6zS_mkci8tDNl7Tg", "shard": "0", "index": "users" } ], "type": "version_conflict_engine_exception", // 由于版本冲突 "reason": "[1]: version conflict, document already exists (current version [1])", "index_uuid": "P4KITo6zS_mkci8tDNl7Tg", "shard": "0", "index": "users" }, "status": 409 }
Fetching with GET
GET users/_doc/1
{ "_index" : "users", // 代表 文档 所处的 索引 "_type" : "_doc", // 它的类型 是一个 文档 "_id" : "1", "_version" : 1, // version代表 它已经进行了1次改动了。 "_seq_no" : 1, "_primary_term" : 1, "found" : true, // source 表示 文档所有的原始信息 "_source" : { "user" : "Mike ~ ~", "post_date" : "2020-10-11T21:22:24", "message" : "trying out Elasticsearch" } }
Next we add data with the index operation, reusing the existing id 1.
Get the current value of the document with id 1:
{
"_index" : "users",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 1,
"_primary_term" : 1,
"found" : true,
"_source" : {
"user" : "Mike ~ ~",
"post_date" : "2020-10-11T21:22:24",
"message" : "trying out Elasticsearch"
}
}
Execute the index operation:
PUT users/_doc/1
{
"msg": "I am using index !!!"
}
{
"_index" : "users",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1
}
Query the current value of id 1:
{
"_index" : "users",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"_seq_no" : 3,
"_primary_term" : 1,
"found" : true,
"_source" : {
"msg" : "I am using index !!!"
}
}
The version number has been incremented by 1. Note that the index operation replaced the whole document: only the new msg field remains in _source.
Delete a document:
DELETE users/_doc/2
{
"_index" : "users",
"_type" : "_doc",
"_id" : "2",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 5,
"_primary_term" : 1
}
Add fields to the document with id 1 (a partial update):
POST users/_update/1
{
"doc": {
"post_date": "2020-10-11T21:45:15",
"message": "trying out Elasticsearch",
"status": "update success"
}
}
Fetching the document again shows the merged fields:
{
"_index" : "users",
"_type" : "_doc",
"_id" : "1",
"_version" : 3,
"_seq_no" : 4,
"_primary_term" : 1,
"found" : true,
"_source" : {
"msg" : "I am using index !!!",
"post_date" : "2020-10-11T21:45:15",
"message" : "trying out Elasticsearch",
"status" : "update success"
}
}
The _bulk API executes multiple operations (index, create, delete, update) in a single request:
POST _bulk
{"index": {"_index": "test", "_id": "1"}}
{"filed1": "value1"}
{"delete": {"_index": "test", "_id": "2"}}
{"create": {"_index": "test2", "_id": "3"}}
{"field1": "value3"}
{"update": {"_id": "1", "_index": "test"}}
{"doc": {"field2": "value2"}}
{
"took" : 752,
"errors" : false,
"items" : [
{
"index" : {
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
{
"delete" : {
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_version" : 1,
"result" : "not_found",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1,
"status" : 404
}
},
{
"create" : {
"_index" : "test2",
"_type" : "_doc",
"_id" : "3",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
{
"update" : {
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1,
"status" : 200
}
}
]
}
The _mget API fetches multiple documents in a single request:
GET /_mget
{
"docs": [
{
"_index": "test",
"_id": "1"
},
{
"_index": "test",
"_id": "2"
}
]
}
{
"docs" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"_seq_no" : 2,
"_primary_term" : 1,
"found" : true,
"_source" : {
"filed1" : "value1",
"field2" : "value2"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"found" : false
}
]
}
The _msearch API runs multiple searches in one request; each search is a header line followed by a body line:
GET users/_msearch
{"index" : "test"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"index" : "test", "search_type" : "dfs_query_then_fetch"}
{"query" : {"match_all" : {}}}
{}
{"query" : {"match_all" : {}}}
{"query" : {"match_all" : {}}}
{"search_type" : "dfs_query_then_fetch"}
{"query" : {"match_all" : {}}}
GET twitter/_search
{
"query": {
"multi_match": {
"query": "朝阳",
"fields": [
"user",
"address^3",
"message"
],
"type": "best_fields"
}
}
}
If you don't know which field contains the keyword, you can use the multi-field search above; the multi_match type here is best_fields.
That is, it searches all three fields, and the final _score is taken from the highest-scoring field (the ^3 suffix boosts matches on the address field by a factor of 3).
Recent versions of Elasticsearch also support SQL queries:
GET /_sql
{
"query": "select * from yx_device_event_info where appName = 'Telegram'"
}
A SQL statement can also be translated into the equivalent DSL query:
GET /_sql/translate
{
"query": "select * from yx_device_event_info where appName = 'Telegram'"
}
The Profile API is a debugging tool: it adds detailed execution information about each component of a search request. It gives insight into every step of how the request is executed and can help determine why certain requests are slow.
GET twitter/_search
{
"profile": "true",
"query": {
"match": {
"city": "北京"
}
}
}
The aggregations framework provides aggregated data on top of a search query. It is based on simple building blocks called aggregations, which can be composed to build complex summaries of the data.
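As a minimal sketch of such a building block (assuming the twitter index from the earlier examples has a city field with a keyword sub-field), a terms aggregation that counts documents per city could look like this:

GET twitter/_search
{
  "size": 0,    // return no search hits, only the aggregation results
  "aggs": {
    "city_counts": {
      "terms": {
        "field": "city.keyword"    // bucket documents by the exact city value
      }
    }
  }
}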
An analogy between books and search engines
Analysis - text analysis is the process of converting full text into a series of words (terms / tokens), also known as tokenization
Analysis is implemented by an Analyzer
Besides converting terms when data is written, the same analyzer must be used to analyze the query string when matching a Query statement.
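For instance, you can ask an index how the analyzer mapped on a particular field would tokenize a query string; a small sketch against the users index from earlier:

GET users/_analyze
{
  "field": "message",    // uses whatever analyzer is configured for this field
  "text": "Trying out Elasticsearch"
}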
A tokenizer (analyzer) is the component dedicated to tokenization. An Analyzer is made up of three parts:
Character Filters -> Tokenizer -> Token Filters
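A minimal sketch of wiring those three parts into a custom analyzer (the index name my_index and the particular filters chosen here are only illustrative):

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ],     // 1. character filters
          "tokenizer": "standard",             // 2. tokenizer
          "filter": [ "lowercase", "stop" ]    // 3. token filters
        }
      }
    }
  }
}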
Standard Analyzer (the default):
- Splits text on word boundaries
- Lowercases tokens
- Stop-word filtering is available but disabled by default
# feed in a sentence and tokenize it
GET _analyze
{
  "analyzer": "standard",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
{ "tokens" : [ { "token" : "2", "start_offset" : 0, "end_offset" : 1, "type" : "<NUM>", "position" : 0 }, { "token" : "running", "start_offset" : 2, "end_offset" : 9, "type" : "<ALPHANUM>", "position" : 1 }, { "token" : "quick", "start_offset" : 10, "end_offset" : 15, "type" : "<ALPHANUM>", "position" : 2 }, { "token" : "brown", "start_offset" : 16, "end_offset" : 21, "type" : "<ALPHANUM>", "position" : 3 }, { "token" : "foxes", "start_offset" : 22, "end_offset" : 27, "type" : "<ALPHANUM>", "position" : 4 }, { "token" : "leap", "start_offset" : 28, "end_offset" : 32, "type" : "<ALPHANUM>", "position" : 5 }, { "token" : "over", "start_offset" : 33, "end_offset" : 37, "type" : "<ALPHANUM>", "position" : 6 }, { "token" : "lazy", "start_offset" : 38, "end_offset" : 42, "type" : "<ALPHANUM>", "position" : 7 }, { "token" : "dogs", "start_offset" : 43, "end_offset" : 47, "type" : "<ALPHANUM>", "position" : 8 }, { "token" : "in", "start_offset" : 48, "end_offset" : 50, "type" : "<ALPHANUM>", "position" : 9 }, { "token" : "the", "start_offset" : 51, "end_offset" : 54, "type" : "<ALPHANUM>", "position" : 10 }, { "token" : "summer", "start_offset" : 55, "end_offset" : 61, "type" : "<ALPHANUM>", "position" : 11 }, { "token" : "evening", "start_offset" : 62, "end_offset" : 69, "type" : "<ALPHANUM>", "position" : 12 } ] }
Simple Analyzer:
- Splits on non-letter characters; anything that is not a letter is discarded
- Lowercases tokens
GET _analyze { "analyzer": "simple", "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening." }
{ "tokens" : [ { "token" : "running", "start_offset" : 2, "end_offset" : 9, "type" : "word", "position" : 0 }, { "token" : "quick", "start_offset" : 10, "end_offset" : 15, "type" : "word", "position" : 1 }, { "token" : "brown", "start_offset" : 16, "end_offset" : 21, "type" : "word", "position" : 2 }, { "token" : "foxes", "start_offset" : 22, "end_offset" : 27, "type" : "word", "position" : 3 }, { "token" : "leap", "start_offset" : 28, "end_offset" : 32, "type" : "word", "position" : 4 }, { "token" : "over", "start_offset" : 33, "end_offset" : 37, "type" : "word", "position" : 5 }, { "token" : "lazy", "start_offset" : 38, "end_offset" : 42, "type" : "word", "position" : 6 }, { "token" : "dogs", "start_offset" : 43, "end_offset" : 47, "type" : "word", "position" : 7 }, { "token" : "in", "start_offset" : 48, "end_offset" : 50, "type" : "word", "position" : 8 }, { "token" : "the", "start_offset" : 51, "end_offset" : 54, "type" : "word", "position" : 9 }, { "token" : "summer", "start_offset" : 55, "end_offset" : 61, "type" : "word", "position" : 10 }, { "token" : "evening", "start_offset" : 62, "end_offset" : 69, "type" : "word", "position" : 11 } ] }
Stop Analyzer: compared with the Simple Analyzer, it adds a stop token filter
- Removes stop words such as the, a, is
# stop-word filtering
GET _analyze
{
  "analyzer": "stop",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
{ "tokens" : [ { "token" : "running", "start_offset" : 2, "end_offset" : 9, "type" : "word", "position" : 0 }, { "token" : "quick", "start_offset" : 10, "end_offset" : 15, "type" : "word", "position" : 1 }, { "token" : "brown", "start_offset" : 16, "end_offset" : 21, "type" : "word", "position" : 2 }, { "token" : "foxes", "start_offset" : 22, "end_offset" : 27, "type" : "word", "position" : 3 }, { "token" : "leap", "start_offset" : 28, "end_offset" : 32, "type" : "word", "position" : 4 }, { "token" : "over", "start_offset" : 33, "end_offset" : 37, "type" : "word", "position" : 5 }, { "token" : "lazy", "start_offset" : 38, "end_offset" : 42, "type" : "word", "position" : 6 }, { "token" : "dogs", "start_offset" : 43, "end_offset" : 47, "type" : "word", "position" : 7 }, { "token" : "summer", "start_offset" : 55, "end_offset" : 61, "type" : "word", "position" : 10 }, { "token" : "evening", "start_offset" : 62, "end_offset" : 69, "type" : "word", "position" : 11 } ] }
Whitespace Analyzer: splits on whitespace only
# split on whitespace
GET _analyze
{
  "analyzer": "whitespace",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
{ "tokens" : [ { "token" : "2", "start_offset" : 0, "end_offset" : 1, "type" : "word", "position" : 0 }, { "token" : "running", "start_offset" : 2, "end_offset" : 9, "type" : "word", "position" : 1 }, { "token" : "Quick", "start_offset" : 10, "end_offset" : 15, "type" : "word", "position" : 2 }, { "token" : "brown-foxes", "start_offset" : 16, "end_offset" : 27, "type" : "word", "position" : 3 }, { "token" : "leap", "start_offset" : 28, "end_offset" : 32, "type" : "word", "position" : 4 }, { "token" : "over", "start_offset" : 33, "end_offset" : 37, "type" : "word", "position" : 5 }, { "token" : "lazy", "start_offset" : 38, "end_offset" : 42, "type" : "word", "position" : 6 }, { "token" : "dogs", "start_offset" : 43, "end_offset" : 47, "type" : "word", "position" : 7 }, { "token" : "in", "start_offset" : 48, "end_offset" : 50, "type" : "word", "position" : 8 }, { "token" : "the", "start_offset" : 51, "end_offset" : 54, "type" : "word", "position" : 9 }, { "token" : "summer", "start_offset" : 55, "end_offset" : 61, "type" : "word", "position" : 10 }, { "token" : "evening.", "start_offset" : 62, "end_offset" : 70, "type" : "word", "position" : 11 } ] }
Keyword Analyzer: no tokenization; the entire input is emitted as a single term
# no tokenization; output the input as-is
GET _analyze
{
  "analyzer": "keyword",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
{ "tokens" : [ { "token" : "2 running Quick brown-foxes leap over lazy dogs in the summer evening.", "start_offset" : 0, "end_offset" : 70, "type" : "word", "position" : 0 } ] }
Pattern Analyzer:
- Tokenizes with a regular expression
- The default pattern is \W+, splitting on any non-word character
# regular-expression tokenization; the default pattern splits on non-word characters
GET _analyze
{
  "analyzer": "pattern",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
{ "tokens" : [ { "token" : "2", "start_offset" : 0, "end_offset" : 1, "type" : "word", "position" : 0 }, { "token" : "running", "start_offset" : 2, "end_offset" : 9, "type" : "word", "position" : 1 }, { "token" : "quick", "start_offset" : 10, "end_offset" : 15, "type" : "word", "position" : 2 }, { "token" : "brown", "start_offset" : 16, "end_offset" : 21, "type" : "word", "position" : 3 }, { "token" : "foxes", "start_offset" : 22, "end_offset" : 27, "type" : "word", "position" : 4 }, { "token" : "leap", "start_offset" : 28, "end_offset" : 32, "type" : "word", "position" : 5 }, { "token" : "over", "start_offset" : 33, "end_offset" : 37, "type" : "word", "position" : 6 }, { "token" : "lazy", "start_offset" : 38, "end_offset" : 42, "type" : "word", "position" : 7 }, { "token" : "dogs", "start_offset" : 43, "end_offset" : 47, "type" : "word", "position" : 8 }, { "token" : "in", "start_offset" : 48, "end_offset" : 50, "type" : "word", "position" : 9 }, { "token" : "the", "start_offset" : 51, "end_offset" : 54, "type" : "word", "position" : 10 }, { "token" : "summer", "start_offset" : 55, "end_offset" : 61, "type" : "word", "position" : 11 }, { "token" : "evening", "start_offset" : 62, "end_offset" : 69, "type" : "word", "position" : 12 } ] }
Language Analyzers: support tokenization for different languages
# language-specific tokenization
GET _analyze
{
  "analyzer": "english",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
{ "tokens" : [ { "token" : "2", "start_offset" : 0, "end_offset" : 1, "type" : "<NUM>", "position" : 0 }, { "token" : "run", "start_offset" : 2, "end_offset" : 9, "type" : "<ALPHANUM>", "position" : 1 }, { "token" : "quick", "start_offset" : 10, "end_offset" : 15, "type" : "<ALPHANUM>", "position" : 2 }, { "token" : "brown", "start_offset" : 16, "end_offset" : 21, "type" : "<ALPHANUM>", "position" : 3 }, { "token" : "fox", "start_offset" : 22, "end_offset" : 27, "type" : "<ALPHANUM>", "position" : 4 }, { "token" : "leap", "start_offset" : 28, "end_offset" : 32, "type" : "<ALPHANUM>", "position" : 5 }, { "token" : "over", "start_offset" : 33, "end_offset" : 37, "type" : "<ALPHANUM>", "position" : 6 }, { "token" : "lazi", "start_offset" : 38, "end_offset" : 42, "type" : "<ALPHANUM>", "position" : 7 }, { "token" : "dog", "start_offset" : 43, "end_offset" : 47, "type" : "<ALPHANUM>", "position" : 8 }, { "token" : "summer", "start_offset" : 55, "end_offset" : 61, "type" : "<ALPHANUM>", "position" : 11 }, { "token" : "even", "start_offset" : 62, "end_offset" : 69, "type" : "<ALPHANUM>", "position" : 12 } ] }
Chinese tokenization
Enter each of the two ES containers in turn and install the plugin:
docker exec -it es72_02 sh
cd bin
# install the ES plugin
./elasticsearch-plugin install analysis-icu
After installing the plugin, restart:
docker-compose stop
docker-compose start
# Chinese tokenization
POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "他说的确实在理"
}
{ "tokens" : [ { "token" : "他", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 }, { "token" : "说的", "start_offset" : 1, "end_offset" : 3, "type" : "<IDEOGRAPHIC>", "position" : 1 }, { "token" : "确实", "start_offset" : 3, "end_offset" : 5, "type" : "<IDEOGRAPHIC>", "position" : 2 }, { "token" : "在", "start_offset" : 5, "end_offset" : 6, "type" : "<IDEOGRAPHIC>", "position" : 3 }, { "token" : "理", "start_offset" : 6, "end_offset" : 7, "type" : "<IDEOGRAPHIC>", "position" : 4 } ] }
match
A match query first tokenizes the search input. For example, "白雪公主和苹果" ("Snow White and apples") is split into 白雪, 公主, and 苹果, and every document whose field contains any of these terms is returned.
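A minimal sketch against the users index created earlier (the message field follows those documents):

GET users/_search
{
  "query": {
    "match": {
      "message": "trying kibana"    // matches documents containing "trying" or "kibana"
    }
  }
}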
wildcard
A wildcard query searches with wildcards: ? stands for exactly one character, and * stands for zero or more characters.
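A small sketch, again on the users index (user.keyword assumes the keyword sub-field that default dynamic mapping creates):

GET users/_search
{
  "query": {
    "wildcard": {
      "user.keyword": "Mi?e*"    // ? = exactly one character, * = zero or more
    }
  }
}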
match_phrase
A match_phrase query tokenizes the input like match does, but it only matches documents where all the terms appear contiguously and in the same order.
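A sketch on the same users index:

GET users/_search
{
  "query": {
    "match_phrase": {
      "message": "trying out"    // the two terms must be adjacent and in order
    }
  }
}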
term
A structured (exact-value) field query: it matches a single value, and the input is not analyzed by a tokenizer.
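A sketch on the users index (again assuming the dynamically mapped keyword sub-field):

GET users/_search
{
  "query": {
    "term": {
      "user.keyword": "Jack"    // exact value; the input is not analyzed
    }
  }
}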
Reference: https://elasticstack.blog.csdn.net/article/details/102728604