How to search exact index (not index pattern) in elasticsearch eland - updated? - python

I am querying Elasticsearch data using Eland. The documentation is simple and I was able to implement it, but when searching, Eland takes the index string through es_index_pattern, which is basically a wildcard (this is also stated in the documentation).
from elasticsearch import Elasticsearch
import eland as ed

# Placeholder host and port; the real values differ
es = Elasticsearch(hosts=[{"host": "myhost", "port": 0000}])
# The _search API expects the bool clause under a top-level "query" key
search_body = {
    "query": {
        "bool": {
            "filter": [
                {"exists": {"field": "customer_name"}},
                {"match_phrase": {"city": "chicago"}},
            ]
        }
    }
}
# Success: I get results when I search the index through the elasticsearch API. I tried this repeatedly and it works every time
results = es.search(index="my_index", body=search_body)
# Failure: I get a ReadTimeoutError instead of results when I connect to the 'my_index' index on the same Elasticsearch host using Eland
df = ed.DataFrame(es_client=es, es_index_pattern="my_index")
I have to hand-type the error message because I cannot copy it out of the environment I am using. Also, my real host and port are different.
...
  File ".../elasticsearch/transport.py", line 458, in perform_request
    raise e
  File "......elasticsearch/transport.py", line 419, in perform_request
  File "..... /elasticsearch/connection/http_urllib3.py", line 275, in perform_request
    raise ConnectionTimeout("TIMEOUT", str(e), e)
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='myhost', port=0000): Read timed out. (read timeout=10))
I think the search through the elasticsearch client returns results because it targets the exact index name and therefore does not time out.
Eland, however, uses es_index_pattern, which treats my_index as a wildcard (i.e. *my_index*), and that must be why I run into the ReadTimeoutError.
I looked through the source code to see whether there was a way to make Eland match the index name exactly rather than as a pattern, but I see no such option in either the documentation or the source code.
How do I search for exact index string in Eland?
Sources:
https://www.elastic.co/guide/en/elasticsearch/client/eland/current/overview.html
https://github.com/elastic/eland/blob/main/eland/ndframe.py
https://github.com/elastic/eland/blob/main/eland/dataframe.py

Also posted this on GitHub, but I'll replicate it here:
Searching an exact index only requires passing the exact index name; no wildcards are used:
import eland as ed
from elasticsearch import Elasticsearch
client = Elasticsearch(...)
client.index(index="test", document={"should": "seethis"})
client.index(index="test1", document={"should": "notseethis"})
client.index(index="1test", document={"should": "notseethis"})
client.indices.refresh(index="*test*")
df = ed.DataFrame(client, es_index_pattern="test")
print(df.to_pandas())
As expected, the output of the above is:
                      should
SNTTnH4BRC8cqQQMds-V  seethis
The word "pattern" in the option name doesn't mean we add wildcards; it's just the pattern we send to Elasticsearch in the search and index APIs.
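To make the difference concrete, here is a small pure-Python analogy using the three index names from the example above. It uses the standard library's fnmatch as a stand-in for Elasticsearch's index-name wildcard resolution (an assumption for illustration only, not Eland's or Elasticsearch's code; fnmatch also treats ? and [] as special, which this example does not use): a plain name such as test matches only itself, and only an explicit * widens the match.

```python
from fnmatch import fnmatchcase

# The three indices created in the example above
indices = ["test", "test1", "1test"]


def resolve(expression, names):
    """Analogy only: return the names an index expression would target.

    A plain name is an exact match; only an explicit '*' acts as a wildcard.
    """
    return [name for name in names if fnmatchcase(name, expression)]


print(resolve("test", indices))    # ['test']
print(resolve("*test*", indices))  # ['test', 'test1', '1test']
```

Under this reading, es_index_pattern="my_index" targets only my_index, just as es.search(index="my_index") does.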

Related

Unable to extract all feature data for a project using pyral's rest api python

I'm trying to read all feature data for a project in Rally using the pyral module. The total result set count is 1358, and the script throws the error below after reading about 700 records.
from pyral import Rally

rally1 = Rally(entity='PortfolioItem_Feature', server='rally1.rallydev.com', fetch=True, user='xxx', password='', workspace='xxx/yyy', project='ABC')
features = rally1.get('Feature', fetch="True", pagesize=2000, start=0, limit=5000)
for feature in features:
    # a = pd.DataFrame(feature)
    print(feature.details())
OUTPUT:
PortfolioItem/Feature result set, totalResultSetSize: 1358, startIndex: 1 pageSize: 2000 current Index: 0
I'm getting the following error:
  File "C:\Users\nis\AppData\Roaming\Python\Python38\site-packages\pyral\entity.py", line 397, in details
    (attr_name, cln, value.oid, value.UserName, value.DisplayName)
  File "C:\Users\nis\AppData\Roaming\Python\Python38\site-packages\pyral\entity.py", line 139, in __getattr__
    raise UnreferenceableOIDError("%s OID %s" % (rallyEntityTypeName, self.oid))
pyral.entity.UnreferenceableOIDError: User OID 44012386960
So I have the following questions:
How can I fix this error, i.e. skip reading that particular field of the feature's attributes or replace it with an empty result?
How can I convert (a) the <class 'pyral.rallyresp.RallyRESTResponse'> object and (b) the PortfolioItem/Feature result set to a DataFrame without hitting the error above?
I am using the code below, and this script also reads about 700 records and then throws the same error. I tried error handling, but it still stops reading at the point where it encounters the error. Any help will be greatly appreciated.
import pandas as pd

rows = []
for feature in features:
    a = {
        'OID': feature.oid,
        'FeatureID': feature.FormattedID,
        'Creation_Date': feature.CreationDate,
        'Project_Name': feature.Project.Name,
        'AcceptedLeafStoryPlanEstimateTotal': feature.AcceptedLeafStoryPlanEstimateTotal,
        'Feature_payload': feature._ref,
        'Created by': feature.CreatedBy.DisplayName,
    }
    # print(a.keys())
    # print(a.values())
    # DataFrame.append was removed in pandas 2.0; collect the rows and build the frame once
    rows.append(a)
r = pd.DataFrame(rows)
print(r)
It looks to me like the object with ID 44012386960 is broken, so you'll have to skip it.
for feature in features:
    if feature.oid == 44012386960:
        continue
    print(feature.details())
I'm assuming the problem is a bad feature OID, because the error also happens in your second loop, which uses feature.oid. However, it's also possible that feature.CreatedBy is the problem, since your error message points to a bad user OID. In that case, I suggest printing feature.CreatedBy.oid in your loop to find the bad one and changing my if to: if feature.CreatedBy.oid == 44012386960: continue
I resolved the issue using Python's conditional expressions (the "ternary operator"), as shown below:
feature_name = '' if feature.Feature is None else feature.Feature.Name
The error occurred whenever the Rally REST API was unable to resolve the OID referenced by a particular feature.
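Generalizing that fix, here is a minimal sketch of a helper that follows an attribute chain such as CreatedBy.DisplayName and returns a default when a link is None or cannot be resolved, so one bad reference blanks a single cell instead of stopping the loop. The names safe_get and feature_row are hypothetical, and safe_get catches Exception broadly for the sake of a self-contained example; in real code you would narrow it to pyral.entity.UnreferenceableOIDError, the exception from the traceback above.

```python
def safe_get(obj, path, default=""):
    """Follow a dotted attribute path such as "CreatedBy.DisplayName".

    Returns `default` if any link in the chain is None or raises while being
    resolved. Hypothetical helper: narrow the except clause to
    pyral.entity.UnreferenceableOIDError in real code.
    """
    try:
        for name in path.split("."):
            if obj is None:
                return default
            obj = getattr(obj, name)
    except Exception:
        return default
    return default if obj is None else obj


def feature_row(feature):
    """Build one row per feature; unresolvable references become empty strings."""
    return {
        "OID": safe_get(feature, "oid"),
        "FeatureID": safe_get(feature, "FormattedID"),
        "Project_Name": safe_get(feature, "Project.Name"),
        "Created by": safe_get(feature, "CreatedBy.DisplayName"),
    }
```

With the features result set from the question, rows = [feature_row(f) for f in features] followed by pd.DataFrame(rows) would build the whole frame in one step.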

pyravendb query parameters parsing error

I've noticed a weird parsing problem with RavenDB's Python client.
When I use this query:
query_result = list(session.query().where_equals("url", url).select("Id", "htmlCode", "url"))
where url = "http://www.mywebsite.net/",
the relevant part of the error stack is the following:
  File "/usr/local/lib/python3.5/dist-packages/pyravendb/store/session_query.py", line 71, in __iter__
    return self._execute_query().__iter__()
  File "/usr/local/lib/python3.5/dist-packages/pyravendb/store/session_query.py", line 307, in _execute_query
    includes=self.includes)
  File "/usr/local/lib/python3.5/dist-packages/pyravendb/d_commands/database_commands.py", line 286, in query
    raise exceptions.ErrorResponseException(response["Error"][:100])
pyravendb.custom_exceptions.exceptions.ErrorResponseException: Lucene.Net.QueryParsers.ParseException: Could not parse: 'url:http://www.mywebsite.net/' --->
BUT if I simply add a space ' ' to the url parameter in the query, it works without any parsing error (though it doesn't return a result, since the value is no longer the same).
I would like to contribute to pyravendb on GitHub, but I'm not sure where it parses the parameters; it probably calls Lucene for that.
Any idea why a simple space can prevent proper parsing ?
The query you send to Lucene is url:http://www.mywebsite.net/
The Lucene key will be url, and the value is supposed to be http://www.mywebsite.net/
Because http://www.mywebsite.net/ itself contains a :, which is the special character Lucene uses to split the key from the value, the Lucene parser gets "confused" and raises a parsing error.
To fix your problem, you need to escape the : in your url parameter before passing it to the query, so your url parameter should look like this:
http\://www.mywebsite.net/
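Escaping by hand works for a single value; for arbitrary input, a small helper can backslash-escape every character the classic Lucene query parser treats as special. This sketch is modeled on the character set escaped by Lucene 3.x's QueryParser.escape, assuming RavenDB's Lucene.Net follows that 3.x syntax; newer Lucene versions also treat / as special. The name escape_lucene is hypothetical and not part of pyravendb.

```python
# Characters escaped by the classic (3.x) Lucene QueryParser.escape;
# newer Lucene versions additionally treat "/" as special.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&')


def escape_lucene(value):
    """Backslash-escape Lucene query-syntax characters in a term value."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in value)


url = "http://www.mywebsite.net/"
print(escape_lucene(url))  # http\://www.mywebsite.net/
```

The escaped value would then go into where_equals("url", escape_lucene(url)). Since the fix is slated for a later pyravendb release, check whether your version already escapes values before doing this, or you may end up double-escaping.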
As for why a simple space prevents the parsing error: in Lucene, a space separates terms, so the parser treats what follows as another term to look for. (You can see the query we build when you use the where_in method.)
This issue will be fixed in the next version of pyravendb (current version is 1.3.1.1)

Django-Haystack(elasticsearch) Autocomplete giving results for substring in search term

I have a search index with Elasticsearch as the backend:
from haystack import indexes

class MySearchIndex(indexes.SearchIndex, indexes.Indexable):
    ...
    name = indexes.CharField(model_attr='name')
    name_auto = indexes.NgramField(model_attr='name')
    ...
Suppose I have the following values in Elasticsearch:
Cable
Magnet
Network
Internet
Switch
When I search for netw, it returns Magnet and Internet as well as Network. Based on some other test cases, I think Haystack is also matching substrings, such as net in netw in the example above.
Here is the code:
sqs = sqs.filter(category='cat_name').using(using)
queried = sqs.autocomplete(name_auto=q)
Also tried with:
queried = sqs.autocomplete(name_auto__contains=q)
How can I fix this so that it returns only results that contain the exact search term?
Using django-haystack==2.4.1 Django==1.9.1 elasticsearch==1.9.0
Customize your Elasticsearch backend settings with django-hesab.
The default settings of django-hesab return exact search results.
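A simplified model shows why netw matched Magnet and Internet. It assumes NgramField is indexed with n-grams of length 3 to 15 (which I believe matches Haystack's default Elasticsearch n-gram settings), that the query is split into n-grams the same way, and that sharing any single gram counts as a hit. Under those assumptions, the gram net from netw is enough to match both words. For comparison, edge n-grams (what Haystack's EdgeNgramField indexes) only produce prefixes of each word.

```python
def ngrams(text, min_gram=3, max_gram=15):
    """All substrings of length min_gram..max_gram (lowercased)."""
    text = text.lower()
    return {text[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(text) - n + 1)}


def edge_ngrams(text, min_gram=2, max_gram=15):
    """Prefixes of length min_gram..max_gram (lowercased)."""
    text = text.lower()
    return {text[:n] for n in range(min_gram, min(max_gram, len(text)) + 1)}


docs = ["Cable", "Magnet", "Network", "Internet", "Switch"]
query = "netw"

# Query analyzed with the same n-gram analyzer: any shared gram is a hit
ngram_hits = [d for d in docs if ngrams(d) & ngrams(query)]
# Edge n-grams at index time, whole query term at search time: prefixes only
edge_hits = [d for d in docs if query.lower() in edge_ngrams(d)]

print(ngram_hits)  # ['Magnet', 'Network', 'Internet']
print(edge_hits)   # ['Network']
```

This only models the matching logic; the real analyzer settings live in the backend configuration the answer suggests customizing.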

Why is the reported number of hits from elasticsearch different depending on the query method?

I have an Elasticsearch index with 60k elements. I know this from the head plugin, and Sense reports the same count (shown in the lower-right corner of its output).
I then wanted to query the same index from Python in two different ways: via a direct requests call and via the elasticsearch module:
import elasticsearch
import json
import requests
# the requests version
data = {"query": {"match_all": {}}}
r = requests.get('http://elk.example.com:9200/nessus_current/_search', data=json.dumps(data))
print(len(r.json()['hits']['hits']))
# the elasticsearch module version
es = elasticsearch.Elasticsearch(hosts='elk.example.com')
res = es.search(index="nessus_current", body={"query": {"match_all": {}}})
print(len(res['hits']['hits']))
In both cases the result is 10 - far from the expected 60k. The results of the query make sense (the content is what I expect), it is just that there are only a few of them.
I took one of these 10 hits and queried Sense for its _id to close the loop. As expected, it is found.
So it looks like the 10 hits are a subset of the whole index. Why aren't all elements reported by the Python calls?
10 is the default number of results returned by Elasticsearch. If you want more, specify a size, for example "size": 100. But be careful: returning all the documents via size is not recommended, as it can bring down your cluster. To retrieve all the results, use scan and scroll.
Also, I think you should read res['hits']['total'], not len(res['hits']['hits']), to get the total number of hits.
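Here is a minimal sketch of both points, written against the 7.x-style elasticsearch-py calls used in the question (search with body, scroll, clear_scroll). From Elasticsearch 7.0 onward, hits.total is an object such as {"value": ..., "relation": "eq"} rather than a plain integer, and by default it is only counted exactly up to 10,000 unless track_total_hits is set, so total_hits handles both shapes. The function names are hypothetical.

```python
def total_hits(response):
    """Total hit count from a search response.

    Before Elasticsearch 7.0, hits.total is an int; from 7.0 on it is an
    object like {"value": ..., "relation": "eq" or "gte"}.
    """
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def iter_all_hits(es, index, query, page_size=1000, keep_alive="2m"):
    """Yield every hit for `query` using the scroll API, one page at a time."""
    resp = es.search(index=index, body={"query": query},
                     scroll=keep_alive, size=page_size)
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Free the server-side scroll context once we are done (or on error)
        es.clear_scroll(scroll_id=scroll_id)
```

For example, sum(1 for _ in iter_all_hits(es, "nessus_current", {"match_all": {}})) would walk every document in the index. The client also ships elasticsearch.helpers.scan, which wraps this same loop for you.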

Push a raw value to Firebase via REST API

I am trying to use the requests library in Python to push data (a raw value) to a firebase location.
Say I have urladd (the URL of the location, including the authentication token). At that location, I want to push a string, say International. Based on the answer here, I tried:
import requests
import simplejson as sjson

data = {'.value': 'International'}
p = requests.post(urladd, data=sjson.dumps(data))
I get <Response [400]>. p.text gives me:
u'{\n "error" : "Invalid data; couldn\'t parse JSON object, array, or value. Perhaps you\'re using invalid characters in your key names."\n}\n'
It appears that the key .value is invalid, but that is what the answer linked above suggests. Any idea why this isn't working, or how I can do this from Python? There are no connection or authentication problems, because the following works; however, it pushes an object instead of a raw value.
data = {'name': 'International'}
p = requests.post(urladd, data = sjson.dumps(data))
Thanks for your help.
The answer you've linked is a special case for when you want to assign a priority to a value. In general, '.value' is an invalid name and will throw an error.
If you want to write just "International", you should write the stringified-JSON version of that data. I don't have a python example in front of me, but the curl command would be:
curl -X POST -d "\"International\"" https://...
Andrew's answer above works. In case someone else wants to know how to do this using the requests library in Python, I thought this would be helpful.
import requests
import simplejson as sjson
data = sjson.dumps("International")
p = requests.post(urladd, data = data)
For some reason I had thought that the data had to be in a dictionary format before it is converted to stringified JSON version. That is not the case, and a simple string can be used as an input to sjson.dumps().
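To see the difference between the two request bodies, here is a short check using the standard library's json module in place of simplejson (its dumps behaves the same for these inputs). A bare string serializes to a quoted JSON string, which is the raw value Firebase expects, while a dict serializes to a JSON object.

```python
import json

raw_body = json.dumps("International")               # a JSON string value
object_body = json.dumps({"name": "International"})  # a JSON object

print(raw_body)     # "International"
print(object_body)  # {"name": "International"}

# Either body would be sent with requests.post(urladd, data=raw_body);
# requests can also serialize for you: requests.post(urladd, json="International").
```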
