Translating an Elasticsearch request from Kibana into elasticsearch-dsl (Python)

I recently migrated from AWS Elasticsearch Service (running Elasticsearch 1.5.2) to Elastic Cloud (currently Elasticsearch 5.1.2). I'm glad I did it, but the change brings a newer version of Elasticsearch and newer APIs, and I'm struggling to get my head around the new way of making requests. Previously I could more or less copy/paste from Kibana's "Elasticsearch Request Body", adjust a few things, run elasticsearch.Elasticsearch.search(), and get what I expected.
Here's my Elasticsearch Request Body from Kibana (for brevity, I removed some of the extraneous stuff Kibana usually inserts):
{
  "size": 500,
  "sort": [
    {
      "Time.ISO8601": {
        "order": "desc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "Message\\ ID: 2003",
            "analyze_wildcard": true
          }
        },
        {
          "range": {
            "Time.ISO8601": {
              "gte": 1484355455678,
              "lte": 1484359055678,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {}
}
Now I want to use elasticsearch-dsl to do it, since that seems to be the recommended method (instead of using elasticsearch-py). How would I translate the above into elasticsearch-dsl?
Here's what I have so far:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

client = Elasticsearch(
    hosts=['HASH.REGION.aws.found.io/elasticsearch'],
    use_ssl=True,
    port=443,
    http_auth=('USER', 'PASS')
)

s = Search(using=client, index="emp*")
s = s.query("query_string", query="Message\ ID:2003", analyze_wildcards=True)
s = s.query("range", **{"Time.ISO8601": {"gte": 1484355455678, "lte": 1484359055678, "format": "epoch_millis"}})
s = s.sort("Time.ISO8601")
response = s.execute()

for hit in response:
    print '%s %s' % (hit['Time']['ISO8601'], hit['Message ID'])
My code as written above is not giving me what I expect: I'm getting results that don't match "Message\ ID:2003", and results that fall outside the requested range of Time.ISO8601 as well.
I'm totally new to elasticsearch-dsl and to ES 5.1.2's way of doing things, so I know I've got lots to learn. What am I doing wrong? Thanks in advance for the help!

I don't have Elasticsearch running right now, but the query looks like what you wanted (you can always see the query that will be produced by looking at s.to_dict()), with the exception of the escaped \ character. It was escaped in the original query, yet in Python the result might come out differently because of Python's own string escaping.
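For example, a quick sanity check is to dump the generated body and compare it with the Kibana request. This is only a sketch built from the code in the question; the raw string is a suggestion so the backslash survives Python's own escaping, and analyze_wildcard is the parameter name from the Kibana body:

import json

s = Search(using=client, index="emp*")
# raw string so the backslash reaches Elasticsearch intact
s = s.query("query_string", query=r"Message\ ID: 2003", analyze_wildcard=True)

# dump the body that would be sent and compare it with the Kibana request
print(json.dumps(s.to_dict(), indent=2))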
I would strongly advise not having spaces in your field names, and also using a more structured query than query_string:
s = Search(using=client, index="emp*")
s = s.filter("term", message_id=2003)
s = s.query("range", Time__ISO8601={"gte": 1484355455678, "lte": 1484359055678, "format": "epoch_millis"})
s = s.sort("Time.ISO8601")
Note that I also changed query() to filter() for a slight speedup and used __ instead of . in the field name keyword argument. elasticsearch-dsl will automatically expand the double underscore to a dot.
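If you also want the descending sort and the size from the original Kibana body and to run the search, a small follow-up sketch building on the code above might look like this:

s = s.sort("-Time.ISO8601")  # a leading "-" sorts descending, matching "order": "desc"
s = s[:500]                  # size 500, as in the Kibana request

response = s.execute()
for hit in response:
    print(hit.meta.id, hit.to_dict())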
Hope this helps...

Related

Using Solr in Docker with Python returns wrong results

I have a Flask app that runs in a Docker container, and I wanted to use Solr with it for indexing and searching, so I built a container for Solr using the official Solr image and wired it to my app with docker-compose.
In the app I have multiple types of objects that I want to index, for example type1 and type2, and each type has specific fields. So in Solr I have documents with different fields: doc1 could have field1 and field2, doc2 could have field3, field4 and field5, and each document has a field called type that specifies its type.
I have two types of search. The first searches for documents of a specific type; here is an example URL for it, used with the requests Python package:
response = requests.get("http://solr:8983/solr/myCollection/select?q=*val*&defType=edismax&fq=type:type1&qf=field1^2&qf=field2^1")
The other is an overall search across documents of all types; here is its example URL:
response = requests.get("http://solr:8983/solr/myCollection/select?q=*val*&defType=edismax&fq=type:type1||type2&qf=field1^1&qf=field2^1&qf=field3^1&qf=field4^1&qf=field1^1")
I have two problems with my work:
I don't get the results that I expected when I run some queries.
Some fields have values with special characters like (z=x+y*f), and when I try to escape these special characters with '\' it doesn't work.
So, do the queries I wrote have something wrong with them, and is there any article or tutorial that could help me? I searched a lot in the documentation and on the internet but I couldn't find a way to solve my problems.
Note: I didn't change the schema file, I left it as the default.
I solved the problems by using tokenizers and filters at both index and query time.
You can apply them through the Schema API that Solr provides.
Here is an example payload (a Python dict, hence True rather than true) that replaces a field type and adds tokenizers and filters to it:
{
    "replace-field-type": {
        "name": "field_name",
        "class": "solr.TextField",
        "multiValued": True,
        "indexAnalyzer": {
            "tokenizer": {
                "class": "solr.LowerCaseTokenizerFactory"
            },
            "filters": [
                {"class": "solr.LowerCaseFilterFactory"}
            ]
        },
        "queryAnalyzer": {
            "tokenizer": {
                "class": "solr.WhitespaceTokenizerFactory",
                "rule": "java"
            },
            "filters": [
                {"class": "solr.LowerCaseFilterFactory"}
            ]
        }
    }
}
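For context, a minimal sketch of how such a payload can be sent to Solr's Schema API with requests. The solr hostname and myCollection collection name are taken from the question's URLs, and the field type name is a placeholder:

import requests

payload = {
    "replace-field-type": {
        "name": "field_name",  # placeholder field type name
        "class": "solr.TextField",
        "multiValued": True,
        "indexAnalyzer": {
            "tokenizer": {"class": "solr.LowerCaseTokenizerFactory"},
            "filters": [{"class": "solr.LowerCaseFilterFactory"}],
        },
        "queryAnalyzer": {
            "tokenizer": {"class": "solr.WhitespaceTokenizerFactory", "rule": "java"},
            "filters": [{"class": "solr.LowerCaseFilterFactory"}],
        },
    }
}

# POST the command to the collection's Schema API endpoint
response = requests.post("http://solr:8983/solr/myCollection/schema", json=payload)
print(response.json())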

How to create BigQuery Data Transfer Service using Python

I tried creating a Data Transfer Service transfer using bigquery_datatransfer. I installed the following Python library:
pip install --upgrade google-cloud-bigquery-datatransfer
and used the method
create_transfer_config(parent, transfer_config)
I defined the transfer_config values for the data_source_id amazon_s3:
transfer_config = {
    "destination_dataset_id": "My Dataset",
    "display_name": "test_bqdts",
    "data_source_id": "amazon_s3",
    "params": {
        "destination_table_name_template": "destination_table_name",
        "data_path": <data_path>,
        "access_key_id": args.access_key_id,
        "secret_access_key": args.secret_access_key,
        "file_format": <>
    },
    "schedule": "every 10 minutes"
}
But while running the script I'm getting the following error:
ValueError: Protocol message Struct has no "destination_table_name_template" field.
The fields given inside params are not recognized, and I couldn't find documentation on which fields should be defined inside the "params" struct.
What fields should be defined inside the "params" of transfer_config so that the Data Transfer job is created successfully?
As you can see in the documentation, you should try building the config by passing your dict through the google.protobuf.json_format.ParseDict() function:
transfer_config = google.protobuf.json_format.ParseDict(
    {
        "destination_dataset_id": dataset_id,
        "display_name": "Your Scheduled Query Name",
        "data_source_id": "scheduled_query",
        "params": {
            "query": query_string,
            "destination_table_name_template": "your_table_{run_date}",
            "write_disposition": "WRITE_TRUNCATE",
            "partitioning_field": "",
        },
        "schedule": "every 24 hours",
    },
    bigquery_datatransfer_v1.types.TransferConfig(),
)
Please let me know if it helps you
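Adapting that idea to the amazon_s3 source from the question, a rough end-to-end sketch might look like the following. It assumes the pre-2.0 google-cloud-bigquery-datatransfer client, where project_path() and positional arguments are available; the project ID, bucket path, credentials and file format are placeholders:

from google.cloud import bigquery_datatransfer_v1
from google.protobuf import json_format

client = bigquery_datatransfer_v1.DataTransferServiceClient()
parent = client.project_path("your-project-id")  # placeholder project ID

transfer_config = json_format.ParseDict(
    {
        "destination_dataset_id": "your_dataset",
        "display_name": "test_bqdts",
        "data_source_id": "amazon_s3",
        "params": {
            "destination_table_name_template": "destination_table_name",
            "data_path": "s3://your-bucket/path/*.csv",   # placeholder S3 path
            "access_key_id": "YOUR_ACCESS_KEY_ID",        # placeholder credentials
            "secret_access_key": "YOUR_SECRET_ACCESS_KEY",
            "file_format": "CSV",
        },
        "schedule": "every 24 hours",
    },
    bigquery_datatransfer_v1.types.TransferConfig(),
)

response = client.create_transfer_config(parent, transfer_config)
print("Created transfer config: {}".format(response.name))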

Update Elasticsearch data by using elasticsearch-dsl

How can I update Elasticsearch data using the elasticsearch-dsl package? Is that possible?
I found the Elasticsearch update API, but it seems a bit difficult. What I am looking for is something like:
searchObj = Search(using=logserver, index=INDEX)
searchObj=searchObj.query("term",attribute=value).update(attribute=new_value)
response = searchObj.execute()
@kingArther's answer is not correct.
elasticsearch-dsl supports updates very well!
By mapping an index to a class (DocType), it allows you to save and update documents easily without writing any JSON REST requests.
You can find examples and the API here.
It's probably late, but I am leaving this reply for those who are still having the same issue:
elasticsearch-dsl offers an update function that can be called on classes extending the Document class of elasticsearch-dsl. The code looks like this:
data = yourIndexClass.get(id=documentIdInIndex)
data.update(key=NewValue)
That is it. Simple. Find details Here
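To make that concrete, here is a minimal self-contained sketch. It assumes elasticsearch-dsl 6+, where Document replaced DocType; the LogEntry class, myindex index, host and field name are all placeholders:

from elasticsearch_dsl import Document, Text, connections

# connect once; the host is an assumption
connections.create_connection(hosts=["localhost:9200"])

class LogEntry(Document):
    attribute = Text()

    class Index:
        name = "myindex"

# fetch the document by its ID, then send a partial update
doc = LogEntry.get(id="some-document-id")
doc.update(attribute="new_value")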
If I'm not wrong, elasticsearch_dsl doesn't have an option for update/bulk update.
So, if you like, you can use the elasticsearch-py package for the same thing.
Example:
from elasticsearch import Elasticsearch

INDEX = 'myindex'
LOG_HOST = 'myhost:myport'
logserver = Elasticsearch(LOG_HOST)

# painless script that sets the new value on every matching document
script = "ctx._source.attribute = params.new_value"

update_body = {
    "query": {
        "bool": {
            "filter": {
                "term": {"attribute": "value"}
            }
        }
    },
    "script": {
        "source": script,
        "lang": "painless",
        "params": {"new_value": "new_value"}
    }
}

update_response = logserver.update_by_query(index=INDEX, body=update_body)
For more information, see this official documentation

How to escape hyphen character in Python elasticsearch

I'm using the basic elasticsearch library in python 3.
I have a query:
query = {
    "query": {
        "bool": {
            "must": [{"term": {"hostname": '"hal-pc"'}}]
        }
    }
}
That I call with:
page = es.search(index = index_name, body=query, search_type='scan', scroll='2m')
However, I'm not getting any results. I can query other fields, so I know my query works, but when I search a field whose value contains a hyphen, I can't find anything. How can I escape this character? I know that with normal ES queries you can configure how ES treats certain characters, but I don't know how to do that in Python.
If the hostname field is analyzed in the mapping, Elasticsearch does not store the field value as is. Instead, it stores "hal-pc" as two separate terms, "hal" and "pc", so the doc might not be found when searching for "hal-pc" with a term filter.
You can search for "hal-pc" using a match query to get the expected result, or by making the hostname field not-analyzed and using the term query as is.
{
  "query": {
    "match": {
      "hostname": "hal-pc"
    }
  }
}
But this might also return docs where hostname is just "hal" or just "pc".
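In Python that could look like the sketch below. The index name is a placeholder, and the hostname.raw sub-field in the second option is an assumption that only holds if the mapping defines a not-analyzed sub-field, as the Logstash template commonly does:

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Option 1: match query on the analyzed field
# (may also match docs whose hostname is just "hal" or just "pc")
match_query = {"query": {"match": {"hostname": "hal-pc"}}}

# Option 2: term query against a not-analyzed sub-field, if the mapping has one
term_query = {"query": {"term": {"hostname.raw": "hal-pc"}}}

page = es.search(index="index_name", body=match_query)
print(page["hits"]["total"])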

ElasticSearch-Haystack: Spanish Tokenizer "Fails"

I'm using:
Haystack - 2.1.0
ElasticSearch - 0.90.3
pyelasticsearch - 0.6
I've configured a custom backend to change the default Elasticsearch settings and use a Spanish analyzer.
I'm using these settings for Elasticsearch:
"settings" : {
"index": {
"uuid": "IPwcMthwRpSJzpjtarc9eQ",
"analysis": {
"analyzer": {
"default": {
"filter": ["standard", "lowercase", "asciifolding", ],
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"number_of_shards": "10",
}
},
"analyzer": {
"spanish": {
"tokenizer": "standard",
"filter": [
"lowercase",
"spanish_stop",
"spanish_keywords",
"spanish_stemmer"
]
}
}
I read these settings in some answer here. When I apply them to Elasticsearch and reindex my models, I get behaviour that I'm not sure I understand.
I have some objects with names like "Ciencias" and others like "Ciéncies". When I search for "ciencias" I receive objects with names like "Ciencias" and "Ciéncies", and the same happens when I search for "ciencies" or "ciéncies".
I want Elasticsearch to ignore accents, which is why I'm using asciifolding, and I'm using the Spanish tokenizer because most of the text is in Spanish. I don't understand why different words like "cienciAs" and "cienciEs" return the same results.
Why is this happening? Is it because a default ngram analyzer is splitting the words?
Why do I get objects with names like "ciénciEs" when I search for "cienciAs"?
Probably because the stemmer is doing its job. If you want to find out what happens during tokenising or stemming, install the inquisitor plugin and go to the Analyzers tab (see here).
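You can also ask Elasticsearch directly with the analyze API; here is a small sketch with requests, where the host, index name and analyzer name are assumptions based on the settings above:

import requests

# see which tokens an analyzer actually produces for a given input
resp = requests.get(
    "http://localhost:9200/myindex/_analyze",
    params={"analyzer": "spanish", "text": "ciencias"},
)
# a Spanish stemmer will likely reduce "ciencias" and "ciencies" to the same stem,
# which would explain why both match
print(resp.json())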
Finally, I removed the Spanish analyzer and everything began to work as expected.
Now I'm using only the asciifolding and lowercase filters; accents and ñ's are indexed well, and I no longer have the issue with "ciencias" and "ciencies".
