How to use ResultSet in PyES

I'm using PyES to use ElasticSearch in Python.
Typically, I build my queries in the following format:
# Create connection to server.
conn = ES('127.0.0.1:9200')
# Create a filter to select documents with 'stuff' in the title.
myFilter = TermFilter("title", "stuff")
# Create query.
q = FilteredQuery(MatchAllQuery(), myFilter).search()
# Execute the query.
results = conn.search(query=q, indices=['my-index'])
print type(results)
# > <class 'pyes.es.ResultSet'>
This works perfectly. My problem begins when the query returns a large list of documents.
Converting the results to a list of dictionaries is computationally demanding, so I'm trying to get the query results returned as dictionaries directly. I came across this documentation:
http://pyes.readthedocs.org/en/latest/faq.html#id3
http://pyes.readthedocs.org/en/latest/references/pyes.es.html#pyes.es.ResultSet
https://github.com/aparo/pyes/blob/master/pyes/es.py (line 1304)
But I can't figure out what exactly I'm supposed to do.
Based on the previous links, I've tried this:
from pyes import *
from pyes.query import *
from pyes.es import ResultSet
from pyes.connection import connect
# Create connection to server.
c = connect(servers=['127.0.0.1:9200'])
# Create a filter to select documents with 'stuff' in the title.
myFilter = TermFilter("title", "stuff")
# Create query / Search object.
q = FilteredQuery(MatchAllQuery(), myFilter).search()
# (How to) create the model ?
mymodel = lambda x, y: y
# Execute the query.
# class pyes.es.ResultSet(connection, search, indices=None, doc_types=None,
# query_params=None, auto_fix_keys=False, auto_clean_highlight=False, model=None)
resSet = ResultSet(connection=c, search=q, indices=['my-index'], model=mymodel)
# > resSet = ResultSet(connection=c, search=q, indices=['my-index'], model=mymodel)
# > TypeError: __init__() got an unexpected keyword argument 'search'
Has anyone been able to get a dict from the ResultSet?
Any good suggestion for efficiently converting the ResultSet to a (list of) dictionaries would be appreciated too.

I tried many ways to cast a ResultSet directly into a dict, but none of them worked. The best approach I have found so far is to append the ResultSet items to another list or dict. The ResultSet yields every single item as a dict.
Here is how I use it:
# Create a response dictionary.
response = {"status_code": 200, "message": "Successful", "content": []}
# Set the result set as the content of the response.
response["content"] = [result for result in resultset]
# Return a JSON string.
return json.dumps(response)

It's not that complicated: just iterate over the result set. For example, with a for loop:
for item in results:
    print(item)
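If building one huge list is the expensive part, the same iteration can be done in batches. The following is only a sketch: FakeResultSet is a hypothetical stand-in for any iterable of dict-like hits (it is not part of PyES), and iter_batches is a helper name invented for this example.

```python
from itertools import islice


class FakeResultSet(object):
    """Hypothetical stand-in for an iterable result set; not part of PyES."""

    def __init__(self, hits):
        self._hits = hits

    def __iter__(self):
        # Yield one dict-like hit at a time, as iterating over results would.
        return iter(self._hits)


def iter_batches(results, batch_size):
    """Yield lists of plain dicts, with at most batch_size hits per list."""
    iterator = iter(results)
    while True:
        batch = [dict(hit) for hit in islice(iterator, batch_size)]
        if not batch:
            return
        yield batch


results = FakeResultSet([{"title": "stuff %d" % i} for i in range(5)])
batches = list(iter_batches(results, batch_size=2))
```

Each batch can be serialized or processed and then discarded, so only batch_size dictionaries need to be held at once.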

Related

pymongo: Error Creating embedded array in an OrderedDict

While importing SQL data into MongoDB, I merged a few tables into an embedded array, but when I run the code I get errors reported as 'key errors'.
Below is my code.
import pyodbc, json, collections, pymongo, datetime
arrayCol =[]
mongoConStr = 'localhost:27017'
sqlConStr = 'DRIVER={MSSQL-NC1311};SERVER=tcp:172.16.1.75,1433;DATABASE=devdb;UID=qauser;PWD=devuser'
mongoConnect = pymongo.MongoClient(mongoConStr)
sqlConnect = pyodbc.connect(sqlConStr)
dbo = mongoConnect.eaedw.ctArrayData
sqlCur = sqlConnect.cursor()
sqlCur.execute('''SELECT M.fldUserId ,TRU.intRuleGroupId ,TGM.strGroupName FROM TBL_USER_MASTER M
JOIN TBL_RULEGROUP_USER TRU ON M.fldUserId = TRU.intUserId
JOIN tbl_Group_Master TGM ON TRU.intRuleGroupId = TGM.intGroupId
''')
tuples = sqlCur.fetchall()
for tuple in tuples:
    doc = collections.OrderedDict()
    doc['fldUserId'] = tuple.fldUserId
    doc['groups.gid'].append(tuple.intRuleGroupId)
    doc['groups.gname'].append(tuple.strGroupName)
    arrayCol.append(doc)
mongoImp = dbo.insert_many(arrayCol)
sqlCur.close()
mongoConnect.close()
sqlConnect.close()
Here, I am trying to create an embedded array named groups, which will hold gid and groupname as a sub-document in the array.
I get an error when using append; the code runs successfully without the embedded array.
Is there a mistake in the array definition?

You can't append to a list that doesn't exist. When you call append on them, doc['groups.gid'] and doc['groups.gname'] have no value, so the lookup itself fails with a KeyError. Even once you fix that problem, PyMongo prohibits you from inserting a document with keys like "groups.gid" that include dots. I think you intend to do this:
for tuple in tuples:
    doc = collections.OrderedDict()
    doc['fldUserId'] = tuple.fldUserId
    doc['groups'] = collections.OrderedDict([
        ('gid', tuple.intRuleGroupId),
        ('gname', tuple.strGroupName)
    ])
    arrayCol.append(doc)
I'm only guessing, based on your question, at the schema that you really want to create.
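If the intended schema is instead one document per user with a groups array holding every group that user belongs to (an assumption, since the question does not say so explicitly), the rows can be grouped before inserting. This sketch uses a hypothetical Row namedtuple and sample values in place of the pyodbc result.

```python
import collections

# Hypothetical rows standing in for the pyodbc result of the SQL join.
Row = collections.namedtuple("Row", ["fldUserId", "intRuleGroupId", "strGroupName"])
rows = [
    Row(1, 10, "admins"),
    Row(1, 20, "editors"),
    Row(2, 10, "admins"),
]

# One document per user, keyed by user id while the groups accumulate.
docs_by_user = collections.OrderedDict()
for row in rows:
    doc = docs_by_user.get(row.fldUserId)
    if doc is None:
        # Create the empty list first, so that append has something to append to.
        doc = collections.OrderedDict([("fldUserId", row.fldUserId), ("groups", [])])
        docs_by_user[row.fldUserId] = doc
    doc["groups"].append(
        collections.OrderedDict([("gid", row.intRuleGroupId), ("gname", row.strGroupName)])
    )

array_col = list(docs_by_user.values())
```

The resulting array_col list has the same role as arrayCol in the question and could be passed to insert_many in the same way.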

saving search results as text instead of list

I am using Django 1.8 and am currently working on a blog application. When I search for tweets (just my name for posts), I want to save the search results obtained from the database as text instead of as a list. My view function is below:
def search(request):
    query = request.GET.get('q','')
    if query:
        qset = (
            Q(text__icontains=query)
            #Q(hashes__icontains=query)
            #Q(artist__icontains=query)
        )
        results = Tweet.objects.filter(qset).distinct()
    else:
        results = []
    number_of_results = len(results)
    search_item = query
    returned_items = []
    for res in results:
        text = res.text
        returned_items.append(text)
    returns = returned_items[:]
    search = Search(search_item=search_item,returns=returns)
    search.save()
    context = {'query':query,'results':results,'number_of_results':number_of_results,'title':'Search results for '+request.GET.get('q','')}
    return render_to_response("tweets/search.html",context,context_instance=RequestContext(request))
Also, a snapshot of my search table in the database is shown below:
Please help me out, friends.

You should join the returned list into a single comma-separated string:
returns = ', '.join(returned_items)

This piece of code is setting returns to a list:
returns = returned_items[:]
If you want to access the first string, set it to returned_items[0]. If you want to join all strings in the list, use join():
returns = "".join(returned_items)
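The separator passed to join() decides how the saved text reads. The following sketch uses made-up sample strings to compare a few choices; which one suits the Search model's returns field is an assumption left to the reader.

```python
returned_items = ["first tweet", "second tweet", "third tweet"]

# No separator: the strings run together.
run_together = "".join(returned_items)

# Newline separator: one result per line, easy to split apart again later.
one_per_line = "\n".join(returned_items)

# join() only accepts strings, so convert other values first.
mixed = ", ".join(str(item) for item in ["tweet", 42, None])
```

A separator that cannot occur inside the items themselves makes it possible to recover the original list with split().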

NDB Model Querying of Key Ids using an array filter

I'm trying to query an NDB model using a list of provided key id strings. The model has string ids that are assigned at creation - for example:
objectKey = MyModel(
    id="123456ABC",
    name="An Object"
).put()
Now I can't figure out how to query the NDB key ids with a list filter. Normally you can do the MyModel.property.IN() to query properties:
names = ['An Object', 'Something else', 'etc']
# This query works
query = MyModel.query(MyModel.name.IN(names))
When I try to filter by a list of keys, I can't get it to work:
# This simple get works
object = MyModel.get_by_id("123456ABC")
ids = ["123456ABC", "CBA654321", "etc"]
# These queries DON'T work
query = MyModel.query(MyModel.id.IN(ids))
query = MyModel.query(MyModel.key.id.IN(ids))
query = MyModel.query(MyModel.key.id().IN(ids))
query = MyModel.query(MyModel._properties['id'].IN(ids))
query = MyModel.query(getattr(MyModel, 'id').IN(ids))
...
I always get AttributeError: type object 'MyModel' has no attribute 'id' errors.
I need to be able to filter by a list of IDs, rather than iterate through each ID in the list (which is sometimes long). How do I do it?

The following should work:
keys = [ndb.Key(MyModel, anid) for anid in ids]
objs = ndb.get_multi(keys)

You can also use urlsafe keys if you have problems using the ids. In that case the list must contain urlsafe key strings rather than plain ids:
objs = ndb.get_multi([ndb.Key(urlsafe=k) for k in ids])

Filtered multiple returned docs in mongoDB

I am trying to run a query that returns a single document each time. The problem is that some documents appear multiple times in the database, so a query returns several results instead of one. I therefore tried the find_one method, which returns the first match. However, after changing from find to find_one I am facing a new problem. My code is the following:
lines = [line.rstrip() for line in open('ids.txt')]
list_names = []
names= open('name.txt', 'w')
for x in range(0,3000):
    id = int(lines[x])
    print x ,' ',lines[x]
    for cursor in collection.find_one({"_id.uid": id}):
        name = cursor['screenname']
        print name
        list_names.append(name)
        names.write("%s\n" % name)
names.close()
I have a list of ids and I want to return the corresponding names from MongoDB. However, I am getting:
name = cursor['screenname']
TypeError: string indices must be integers
What am I doing wrong here?

The find_one method does not return a cursor. It returns the document itself.
session = self.sessions.find_one({'_id': session_id})
print session # must print your document
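The TypeError in the question follows from this: looping over a document (a dict) yields its keys, and indexing a string key with 'screenname' fails. The sketch below reproduces that with a hypothetical document shaped like the ones in the question; no database is involved.

```python
# Hypothetical document, shaped like what find_one() might return.
document = {"_id": {"uid": 42}, "screenname": "alice"}

# Looping over a dict yields its keys, which are strings here.
keys_seen = [key for key in document]

error_type = None
try:
    for key in document:
        key["screenname"]  # same mistake as cursor['screenname'] in the loop
except TypeError as exc:
    error_type = type(exc)

# The document is indexed directly; no loop is needed.
name = document["screenname"]
```

Note that find_one returns None when nothing matches, so checking the result before indexing it avoids a different TypeError.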

Retrieve all items from DynamoDB using query?

I am trying to retrieve all items in a dynamodb table using a query. Below is my code:
import boto.dynamodb2
from boto.dynamodb2.table import Table
from time import sleep
c = boto.dynamodb2.connect_to_region(aws_access_key_id="XXX",aws_secret_access_key="XXX",region_name="us-west-2")
tab = Table("rip.irc",connection=c)
x = tab.query()
for i in x:
    print i
    sleep(1)
However, I receive the following error:
ValidationException: ValidationException: 400 Bad Request
{'message': 'Conditions can be of length 1 or 2 only', '__type': 'com.amazon.coral.validate#ValidationException'}
The code I have is pretty straightforward and out of the boto dynamodb2 docs, so I am not sure why I am getting the above error. Any insights would be appreciated (new to this and a bit lost). Thanks
EDIT: I have both a hash key and a range key. I am able to query by specific hash keys. For example,
x = tab.query(hash__eq="2014-01-20 05:06:29")
How can I retrieve all items though?

Ahh ok, figured it out. If anyone needs:
You can't use the query method on a table without specifying a specific hash key. The method to use instead is scan. So if I replace:
x = tab.query()
with
x = tab.scan()
I get all the items in my table.

I'm using Groovy, but this should give you a hint. The error:
{'message': 'Conditions can be of length 1 or 2 only'}
is telling you that your key condition can have length 1 (hashKey only) or length 2 (hashKey + rangeKey). Anything in the query beyond the keys will provoke this error.
The reason for this error is that you are trying to run a search query but are expressing it as a key condition. You have to add a separate filter expression to perform your query.
My code:
String keyQuery = " hashKey = :hashKey and rangeKey between :start and :end "
queryRequest.setKeyConditionExpression(keyQuery)// define key query
String filterExpression = " yourParam = :yourParam "
queryRequest.setFilterExpression(filterExpression)// define filter expression
queryRequest.setExpressionAttributeValues(expressionAttributeValues)
queryRequest.setSelect('ALL_ATTRIBUTES')
QueryResult queryResult = client.query(queryRequest)

.scan() does not automatically return all items in a table, because results are paginated: a single response is limited to 1 MB of data.
Here is a boto3 implementation that keeps scanning until every page has been read (despite its name, it loops rather than recursing):
import boto3
dynamo = boto3.resource('dynamodb')
def scanRecursive(tableName, **kwargs):
    """
    NOTE: Anytime you are filtering by a specific equivalency attribute such as id, name
    or date equal to ... etc., you should consider using a query not scan
    kwargs are any parameters you want to pass to the scan operation
    """
    dbTable = dynamo.Table(tableName)
    response = dbTable.scan(**kwargs)
    if kwargs.get('Select') == "COUNT":
        return response.get('Count')
    data = response.get('Items')
    # Keep scanning from the last evaluated key until no pages remain.
    while 'LastEvaluatedKey' in response:
        response = dbTable.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        data.extend(response['Items'])
    return data
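The same pagination loop can also be written as a generator, so that items are handed out page by page instead of being collected into one list. This is a sketch only: FakeTable is a hypothetical stand-in that mimics the Items and LastEvaluatedKey fields of a scan response, so the example runs without AWS access, and iter_scan is a name invented here.

```python
class FakeTable(object):
    """Hypothetical stand-in that mimics paginated scan() responses."""

    def __init__(self, items, page_size):
        self._items = items
        self._page_size = page_size

    def scan(self, ExclusiveStartKey=None, **kwargs):
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["index"]
        end = start + self._page_size
        response = {"Items": self._items[start:end]}
        if end < len(self._items):
            # More pages remain, so report where the next scan should resume.
            response["LastEvaluatedKey"] = {"index": end}
        return response


def iter_scan(table, **kwargs):
    """Yield items page by page until no LastEvaluatedKey is returned."""
    response = table.scan(**kwargs)
    while True:
        for item in response["Items"]:
            yield item
        if "LastEvaluatedKey" not in response:
            return
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"], **kwargs)


table = FakeTable([{"id": i} for i in range(7)], page_size=3)
all_items = list(iter_scan(table))
```

With a real boto3 Table object in place of FakeTable, the caller could stop iterating early and avoid reading the remaining pages.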

I ran into this error when I was misusing KeyConditionExpression instead of FilterExpression when querying a dynamodb table.
KeyConditionExpression should only be used with partition key or sort key values.
FilterExpression should be used when you want to filter your results even more.
Note, however, that a FilterExpression does not reduce the reads consumed: the query first reads the items matching the KeyConditionExpression and only then removes the ones that fail the FilterExpression.
Source: Working with Queries

This is how I do a query if someone still needs a solution:
def method_name(a, b)
  results = self.query(
    key_condition_expression: '#T = :t',
    filter_expression: 'contains(#S, :s)',
    expression_attribute_names: {
      '#T' => 'your_table_field_name',
      '#S' => 'your_table_field_name'
    },
    expression_attribute_values: {
      ':t' => a,
      ':s' => b
    }
  )
  results
end
