ElasticSearch 7 & Kibana unexpected behavior - python

I am trying to store data into an Elasticsearch index. The data in one of the columns looks like this:
C ID
1234
5678
NA
123D D5614 A7890
Since the values are mixed (numeric and non-numeric), I selected a text field for this column, with the following properties:
"mappings": {
    "properties": {
        "C ID": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
    }
}
Even after this I always get the error:
failed to parse field [C ID] of type [long] in document with id 4
Please help me out with this. I never declared the field as long, so I don't know why I am getting this error.
Update
My code base
from elasticsearch import Elasticsearch

# ESConnector is a class responsible for the Kerberos login; it calls
# Elasticsearch under the hood.
es = ESConnector()
if not es.indices.exists(INDEX):
    settings = {"settings": {"index": {"number_of_shards": 1, "number_of_replicas": 1}}}
    es.indices.create(INDEX, body=settings)
mbody = {
    "mappings": {
        "properties": {
            "C ID": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
        }
    }
}
es.indices.put_mapping(INDEX, body=mbody)

You can create the index with the mapping in a single call:
if not es.indices.exists(INDEX):
    body = {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": 1
            }
        },
        "mappings": {
            "properties": {
                "C ID": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword"
                        }
                    }
                }
            }
        }
    }
    es.indices.create(INDEX, body=body)
It should work this way. Note that if the index already exists, dynamic mapping has most likely already mapped C ID as long from the first numeric document, and put_mapping cannot change the type of an existing field; you would have to delete and recreate the index (or reindex) for the new mapping to take effect.
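As a sketch of the one-call approach in Python (the helper name is illustrative, not from the original post; the es calls are shown as comments since they need a live cluster):

```python
def build_create_body(field="C ID"):
    # Combined settings + mappings body for index creation. Creating the
    # index with this mapping up front prevents dynamic mapping from
    # guessing "long" when the first document has a numeric value.
    return {
        "settings": {"index": {"number_of_shards": 1, "number_of_replicas": 1}},
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword"}},
                }
            }
        },
    }

# Against a live cluster (not run here):
# if not es.indices.exists(INDEX):
#     es.indices.create(INDEX, body=build_create_body())
```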

Related

ElasticSearch - Compile Error on Adding a Field?

Using Python, I'm trying to go row-by-row through an Elasticsearch index with 12 billion documents and add a field to each document. The field is named direction and will contain "e" for some values of the field src and "i" for others. For this particular _id, the field should contain an "e".
from elasticsearch import Elasticsearch

es = Elasticsearch(["https://myESserver:9200"],
                   http_auth=('myUsername', 'myPassword'))

query_to_add_direction_field = {
    "script": {
        "inline": "direction=\"e\"",
        "lang": "painless"
    },
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": [{"match": {"_id": "YKReAoQBk7dLIXMBhYBF"}}]
                }
            }
        }
    }
}
results = es.update_by_query(index="myIndex-*", body=query_to_add_direction_field)
I'm getting this error:
elasticsearch.BadRequestError: BadRequestError(400, 'script_exception', 'compile error')
I'm new to Elasticsearch. How can I correct my query so that it does not throw an error?
UPDATE:
I updated the code like this:
query_find_id = {
    "size": "1",
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "_id": "YKReAoQBk7dLIXMBhYBF"
                }
            }
        }
    }
}
query_to_add_direction_field = {
    "script": {
        "source": "ctx._source['egress'] = true",
        "lang": "painless"
    },
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "_id": "YKReAoQBk7dLIXMBhYBF"
                }
            }
        }
    }
}
results = es.search(index="traffic-*", body=query_find_id)
results = es.update_by_query(index="traffic-*", body=query_to_add_direction_field)
results_after_update = es.search(index="traffic-*", body=query_find_id)
The code now runs without errors... I think I may have fixed it.
I say I *think* I may have fixed it because if I run the same code again, I get a version_conflict_engine_exception error on the update_by_query call... but I think that just means the big 12B-document index is still catching up with the change I made. Does that sound plausible?
Please try the following query:
{
    "script": {
        "source": "ctx._source.direction = 'e'",
        "lang": "painless"
    },
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "match": {
                                "_id": "YKReAoQBk7dLIXMBhYBF"
                            }
                        }
                    ]
                }
            }
        }
    }
}
Regarding version_conflict_engine_exception: it happens because the version of the document is not the one the update_by_query operation expects, for example because another process updated that document at the same time.
You can add conflicts=proceed to the request (/_update_by_query?conflicts=proceed) to work around the issue.
Read more about conflicts here:
https://www.elastic.co/guide/en/elasticsearch/reference/8.5/docs-update-by-query.html#docs-update-by-query-api-desc
If you think it is a transient conflict, you can use retry_on_conflict to retry after a conflict:
retry_on_conflict
(Optional, integer) Specify how many times the operation should be retried when a conflict occurs. Default: 0.
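In the Python client, conflicts=proceed maps to a keyword argument on update_by_query. A minimal sketch, building the same script-plus-filter body as a plain dict (the parameterized script is my own variation to avoid string interpolation; the es call is commented out since it needs a live cluster):

```python
def build_update_body(doc_id, value="e"):
    # Painless script that sets the field from params, plus the same
    # _id term filter used in the answer above.
    return {
        "script": {
            "source": "ctx._source.direction = params.dir",
            "lang": "painless",
            "params": {"dir": value},
        },
        "query": {"bool": {"filter": {"term": {"_id": doc_id}}}},
    }

# Against a live cluster (not run here):
# es.update_by_query(index="traffic-*",
#                    body=build_update_body("YKReAoQBk7dLIXMBhYBF"),
#                    conflicts="proceed")
```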

Elasticsearch multi field query request in Python

I'm a beginner with Elasticsearch and Python. I have an index created in Elasticsearch with some data, and I want to run a query against that data from Python. This is my mapping, created in Kibana's Dev Tools:
PUT /main-news-test-data
{
    "mappings": {
        "properties": {
            "content": {
                "type": "text"
            },
            "title": {
                "type": "text"
            },
            "lead": {
                "type": "text"
            },
            "agency": {
                "type": "keyword"
            },
            "date_created": {
                "type": "date"
            },
            "url": {
                "type": "keyword"
            },
            "image": {
                "type": "keyword"
            },
            "category": {
                "type": "keyword"
            },
            "id": {
                "type": "keyword"
            }
        }
    }
}
and here is my Python code. Given a keyword and a category number, it should check the title, lead, and content fields for the keyword, match the entered category number against the document's category, and return/print every document that matches:
from elasticsearch import Elasticsearch
import json, requests

es = Elasticsearch(HOST="http://localhost", PORT=9200)
es = Elasticsearch()

def QueryMaker(keyword, category):
    response = es.search(index="main-news-test-data",
                         body={"from": 0, "size": 5,
                               "query": {"multi_match": {
                                   "content": keyword, "category": category,
                                   "title": keyword, "lead": keyword}}})
    return response

if __name__ == '__main__':
    keyword = input('Enter Keyword: ')
    category = input('Enter Category: ')
    #startDate = input('Enter StartDate: ')
    #endDate = input('Enter EndDate: ')
    data = QueryMaker(keyword, category)
    print(data)
but I receive this error when I submit the input:
elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', '[multi_match] query does not support [content]')
What am I doing wrong?
Edit: the keyword only has to be contained in title, lead, and content; it doesn't have to equal those fields exactly.
Your multi_match query syntax is wrong here; I think you need something like the query below. See more: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html
{
    "from": 0,
    "size": 5,
    "query": {
        "bool": {
            "should": [
                {
                    "multi_match": {
                        "query": keyword,
                        "fields": ["content", "title", "lead"]
                    }
                },
                {
                    "multi_match": {
                        "query": category,
                        "fields": ["category"]
                    }
                }
            ]
        }
    }
}
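Plugged back into the question's QueryMaker, this could look like the sketch below (the query body is the bool/should query from this answer, wrapped in a hypothetical helper; the es.search call is commented out since it needs a live cluster):

```python
def build_query(keyword, category, size=5):
    # bool/should: a document matches if either multi_match clause matches,
    # and documents matching both score higher.
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "should": [
                    {"multi_match": {"query": keyword,
                                     "fields": ["content", "title", "lead"]}},
                    {"multi_match": {"query": category,
                                     "fields": ["category"]}},
                ]
            }
        },
    }

# def QueryMaker(keyword, category):
#     return es.search(index="main-news-test-data",
#                      body=build_query(keyword, category))
```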

Elasticsearch not returning result for single word query

I have a basic Elasticsearch index that consists of a variety of help articles. Users can search for them in my Python/Django app.
The index has the following mappings:
{
    "mappings": {
        "properties": {
            "body": {
                "type": "text"
            },
            "category": {
                "type": "nested",
                "properties": {
                    "category_id": {
                        "type": "long"
                    },
                    "category_title": {
                        "fields": {
                            "keyword": {
                                "ignore_above": 256,
                                "type": "keyword"
                            }
                        },
                        "type": "text"
                    }
                }
            },
            "title": {
                "type": "keyword"
            },
            "date_updated": {
                "type": "date"
            },
            "position": {
                "type": "integer"
            }
        }
    }
}
I basically want the user to be able to search for a query and get any results that match the article title or category.
Say I have an article called "I Can't Remember My Password" in the "Your Account" category.
If I search for the article title exactly, I see the result. If I search for the category title exactly, I also see the result.
But if I search for just "password", I get nothing. What do I need to change in my setup/query to make it so that this query (or similarly non-exact queries) also returns the result?
My query looks like:
{
    "query": {
        "bool": {
            "should": [
                {
                    "multi_match": {
                        "fields": ["title"],
                        "query": "password"
                    }
                },
                {
                    "nested": {
                        "path": "category",
                        "query": {
                            "multi_match": {
                                "fields": ["category.category_title"],
                                "query": "password"
                            }
                        }
                    }
                }
            ]
        }
    }
}
I have read other questions and experimented with various settings, but no luck so far. I am not doing anything particularly special at index time in terms of preparing the fields, so I don't know if that's something to look at; I'm just using the elasticsearch-dsl defaults.
The solution was to reindex the title field as text rather than keyword. The latter only allows exact matching.
Credit to LeBigCat for pointing that out in the comments. They haven't posted it as an answer so I'm doing it on their behalf to improve visibility.
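A sketch of the corrected fragment of the mapping, expressed as a Python dict (keeping a keyword sub-field so exact matching and sorting still work is my own assumption, not part of the original answer):

```python
# "title" as full-text so analyzed, non-exact queries like "password" match,
# with an optional keyword sub-field ("title.keyword") for exact matches.
corrected_title_mapping = {
    "title": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }
}
```

Changing an existing field's type requires reindexing, as the answer notes.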

Getting linked documents in single lookup query in Elastic Search

To provide some context :
I want to write a bulk update query (possibly affecting 0.5-1M docs). The update would be in the aspects field (shown below), whose values are mostly duplicated.
My thinking was that if I normalised it into another entity (aspect_label), the number of docs updated would be reduced drastically (say 500-1000 max).
Query: I want to find out if there is a way to get linked documents via id in Elasticsearch.
Eg. if I have documents in index my_db according to the mapping below.
Just to point out: processed_reviews is a child of aspect_label.
{
    "my_db": {
        "mappings": {
            "processed_reviews": {
                "_all": {
                    "enabled": false
                },
                "_parent": {
                    "type": "aspect_label"
                },
                "_routing": {
                    "required": true
                },
                "properties": {
                    "data": {
                        "properties": {
                            "insights": {
                                "type": "nested",
                                "properties": {
                                    "aspects": {
                                        "type": "nested",
                                        "properties": {
                                            "aspect_label_id": {
                                                "type": "keyword"
                                            },
                                            "aspect_term_frequency": {
                                                "type": "long"
                                            }
                                        }
                                    }
                                }
                            },
                            "preprocessed_text": {
                                "type": "text"
                            },
                            "preprocessed_title": {
                                "type": "text"
                            }
                        }
                    }
                }
            }
        }
    }
}
And another entity aspect_label :
{
    "my_db": {
        "mappings": {
            "aspect_label": {
                "_all": {
                    "enabled": false
                },
                "properties": {
                    "aspect": {
                        "type": "keyword"
                    },
                    "aspect_label_new": {
                        "type": "keyword"
                    },
                    "aspect_label_old": {
                        "type": "text"
                    }
                }
            }
        }
    }
}
Now, I want to write a search query on the processed_reviews type such that aspect_label_id is replaced with the value of aspect_label_new (or with the entire aspect_label doc matching that id).
{
    "_index": "my_db",
    "_type": "processed_reviews",
    "_id": "191b3bff-4915-4404-a05a-10e6bd2b19d4",
    "_score": 1,
    "_routing": "5",
    "_parent": "5",
    "_source": {
        "data": {
            "preprocessed_text": "Good product I really like so comfortable and so light wait and looks good",
            "preprocessed_title": "Good choice",
            "insights": [
                {
                    "aspects": [
                        {
                            "aspect_label": "color",
                            "aspect_term_frequency": 1
                        }
                    ]
                }
            ]
        }
    }
}
Also, if there is a better way to approach this problem, or something wrong with my approach, or if this is simply not possible, please let me know as well.

TTL in elasticsearch

I am using the Elasticsearch Python client to create an index and store data on an AWS Elasticsearch instance.
def create_index():
    """
    Create the mapping of the data.
    """
    mappings = '''
    {
        "tweet": {
            "_ttl": {
                "enabled": true,
                "default": "2m"
            },
            "properties": {
                "text": {
                    "type": "string"
                },
                "location": {
                    "type": "geo_point"
                }
            }
        }
    }
    '''
    # Ignore if index already exists
    es.indices.create(index='tweetmap', ignore=400, body=mappings)
As defined above, I now expect the records to be deleted automatically after 2 minutes; however, they persist.
What could be the reason?
There was an error in the way I had defined the mappings; I corrected it by wrapping the type under a top-level "mappings" key, as shown below:
mappings = '''
{
    "mappings": {
        "tweet": {
            "_ttl": {
                "enabled": true,
                "default": "1d"
            },
            "properties": {
                "text": {
                    "type": "string"
                },
                "location": {
                    "type": "geo_point"
                }
            }
        }
    }
}
'''
