Retrieve all items from DynamoDB using query? - python

I am trying to retrieve all items in a dynamodb table using a query. Below is my code:
import boto.dynamodb2
from boto.dynamodb2.table import Table
from time import sleep
c = boto.dynamodb2.connect_to_region(aws_access_key_id="XXX",aws_secret_access_key="XXX",region_name="us-west-2")
tab = Table("rip.irc",connection=c)
x = tab.query()
for i in x:
    print(i)
    sleep(1)
However, I receive the following error:
ValidationException: ValidationException: 400 Bad Request
{'message': 'Conditions can be of length 1 or 2 only', '__type': 'com.amazon.coral.validate#ValidationException'}
The code I have is pretty straightforward and out of the boto dynamodb2 docs, so I am not sure why I am getting the above error. Any insights would be appreciated (new to this and a bit lost). Thanks
EDIT: I have both a hash key and a range key. I am able to query by specific hash keys. For example,
x = tab.query(hash__eq="2014-01-20 05:06:29")
How can I retrieve all items though?

Ahh ok, figured it out. If anyone needs:
You can't use the query method on a table without specifying a specific hash key. The method to use instead is scan. So if I replace:
x = tab.query()
with
x = tab.scan()
I get all the items in my table.

I'm using Groovy, but this should give you a hint. The error
{'message': 'Conditions can be of length 1 or 2 only'}
is telling you that your key condition can have length 1 (hash key only) or length 2 (hash key + range key). Anything in the query beyond the keys will provoke this error.
The reason for this error is that you are trying to run a search query using a key condition. You have to add a separate filter expression for the non-key conditions.
My code:
String keyQuery = " hashKey = :hashKey and rangeKey between :start and :end "
queryRequest.setKeyConditionExpression(keyQuery)// define key query
String filterExpression = " yourParam = :yourParam "
queryRequest.setFilterExpression(filterExpression)// define filter expression
queryRequest.setExpressionAttributeValues(expressionAttributeValues)
queryRequest.setSelect('ALL_ATTRIBUTES')
QueryResult queryResult = client.query(queryRequest)

.scan() does not automatically return all the items in a table, because the results are paginated. Each response is limited to a maximum of 1 MB (DynamoDB max response limit).
Here is an implementation of the boto3 scan that follows the pagination (it loops, despite the function name):
import boto3
dynamo = boto3.resource('dynamodb')
def scanRecursive(tableName, **kwargs):
    """
    NOTE: Anytime you are filtering by a specific equivalency attribute such as id, name
    or date equal to ... etc., you should consider using a query not scan
    kwargs are any parameters you want to pass to the scan operation
    """
    dbTable = dynamo.Table(tableName)
    response = dbTable.scan(**kwargs)
    if kwargs.get('Select') == "COUNT":
        return response.get('Count')
    data = response.get('Items')
    # Keep scanning from the last evaluated key until no pages remain.
    while 'LastEvaluatedKey' in response:
        response = dbTable.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        data.extend(response['Items'])
    return data

I ran into this error when I was misusing KeyConditionExpression instead of FilterExpression when querying a dynamodb table.
KeyConditionExpression should only be used with partition key or sort key values.
FilterExpression should be used when you want to filter your results even further.
Note, however, that a query with a FilterExpression consumes the same reads as it would without one, because the query is performed based on the KeyConditionExpression. Items are then removed from the results based on your FilterExpression.
Source: Working with Queries
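To make the order of operations described above concrete, here is a toy in-memory model in plain Python. It is only a sketch, not the DynamoDB API: `ITEMS`, `toy_query` and the `status` attribute are made-up names, and the `Count` / `ScannedCount` keys merely mirror the fields of the same name in a real Query response.

```python
# Toy in-memory model of a DynamoDB Query; NOT the real API.
ITEMS = [
    {"hashKey": "2014-01-20", "rangeKey": 1, "status": "ok"},
    {"hashKey": "2014-01-20", "rangeKey": 2, "status": "error"},
    {"hashKey": "2014-01-20", "rangeKey": 3, "status": "ok"},
    {"hashKey": "2014-01-21", "rangeKey": 1, "status": "ok"},
]


def toy_query(items, hash_value, range_between=None, filter_fn=None):
    """Apply the key condition first, then the filter."""
    # Step 1: key condition (hash key equality, optional range key bounds).
    matched = [it for it in items if it["hashKey"] == hash_value]
    if range_between is not None:
        low, high = range_between
        matched = [it for it in matched if low <= it["rangeKey"] <= high]
    scanned_count = len(matched)  # items read; this is what the reads are based on

    # Step 2: the filter only trims what is returned, not what was read.
    if filter_fn is not None:
        matched = [it for it in matched if filter_fn(it)]
    return {"Items": matched, "Count": len(matched), "ScannedCount": scanned_count}


result = toy_query(ITEMS, "2014-01-20", filter_fn=lambda it: it["status"] == "ok")
print(result["Count"], result["ScannedCount"])
```

In this model the filter reduces `Count` but leaves `ScannedCount` unchanged, which is the point made above about reads.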

This is how I do a query if someone still needs a solution:
def method_name(a, b)
  results = self.query(
    key_condition_expression: '#T = :t',
    filter_expression: 'contains(#S, :s)',
    expression_attribute_names: {
      '#T' => 'your_table_field_name',
      '#S' => 'your_table_field_name'
    },
    expression_attribute_values: {
      ':t' => a,
      ':s' => b
    }
  )
  results
end

Related

Cypher query problem when trying to find max of a returned column under certain relation id

I am facing a very strange problem. I am calling the same function, get_objects(), 4 times and taking the max of the returned column. Item 10172, which should be returned as the maximum, is present in the result list, but instead I get another item, 9998, which is not the maximum. For two other calls to the same function with different parameters, I get correct results.
I have run and tested the statement in the Neo4j browser and it shows the same problem: it behaves as if that node doesn't exist. But when I search individually for node 10172, which should be returned as the maximum, it does exist in the database. Why is it not returned as the maximum in the final result?
I also extracted the CSV file from Neo4j to double-check the relation and the presence of that specific node. It exists. Where am I going wrong?
My data is stored in a graph database as 4 types of nodes, connected by 4 different relations with the relation id attribute (1, 2, 3, 4). In the Cypher query I am trying to get the maximum paper id for relation 1. The problem seems to exist with the calls for relation 1 and relation 4, but I rechecked the database and these nodes are present under these particular relations.
Here is what I have tried so far.
def get_objects(x):
    par = str(x)
    query = ''' MATCH (p)-[r]->(a) WHERE r.id = $par RETURN a.id '''
    resultNodes = session.run(query, par = par)
    df = DataFrame(resultNodes)
    return df[0]
def find_max_1():
    authors,terms,venues,papers=0,0,0,0
    authors=get_objects(1).max()
    terms=get_objects(2).max()
    venues=get_objects(3).max()
    papers=get_objects(4).max()
    return authors,terms,venues,papers
def main():
    m = find_max_1()
if __name__ == "__main__":
    main()
The output is:
[9998, 14669, 10190, 9999]
Expected output:
[10172, 14669, 10190, 15648]
Any kind of help would be appreciated!
Thanks in advance.
The problem was that the returned result was of string type, so max() was computing the maximum between strings instead of ints.
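A small illustration of that effect, using made-up ID values rather than the actual data from the question:

```python
# Hypothetical IDs, stored as strings the way the query result returned them.
ids_as_strings = ["9998", "10172", "2047"]

# Strings compare character by character, so "9998" beats "10172" because "9" > "1".
string_max = max(ids_as_strings)

# Converting to int first gives the numeric maximum.
numeric_max = max(int(i) for i in ids_as_strings)

print(string_max, numeric_max)
```

With a pandas column, the analogous step would presumably be to convert it with `astype(int)` before calling `.max()`.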

how to use keycondition in Query method from DynamoDb Table using boto3

I am trying to query a DynamoDB table to display its items.
I am using table.query with some parameters such as ProjectionExpression.
Since the scan method reads the whole table, which is time-consuming, I am trying to use a query instead, but I am having trouble with the key condition, which is mandatory.
Can you help me with this code?
I am using this URL: http://localhost:0000/rsi/hotels
My Repository.py:
def list_items(self):
    ProjectionExpression = "Id,#Name,Description"
    ean = {"#Name": "Name"}
    # Using this URL: http://localhost:0000/rsi/hotels
    # how can I use keycondition for this in below table
    esk = Key('Id').gt(0)  # stuck here, don't know what exact condition to give
    limit = settings.AWS_SCAN_LIMIT
    items = []
    tableResponse = table.query(
        ProjectionExpression=ProjectionExpression,
        ExpressionAttributeNames=ean,
        KeyConditionExpression=esk,
        Limit=limit)
    items.extend(tableResponse['Items'])
    while 'LastEvaluatedKey' in tableResponse:
        tableResponse = table.query(
            ProjectionExpression=ProjectionExpression,
            ExpressionAttributeNames=ean,
            ExclusiveStartKey=tableResponse['LastEvaluatedKey'],
            KeyConditionExpression=esk,
            Limit=limit
        )
        items.extend(tableResponse['Items'])
    return items
ERROR: ERROR - An error occurred (ValidationException) when calling the Query operation: Query key condition not supported
My table consists of:
id    name    description
1     hai     des
09    bye     des2
123   there   des3
etc.
I cannot find any other way to use a parallel scan or a query effectively, so I am asking here.
Please help me with this!

Paginating a DynamoDB query in boto3

How can I loop through all results in a DynamoDB query, if they span more than one page? This answer implies that pagination is built into the query function (at least in v2), but when I try this in v3, my items seem limited:
import boto3
from boto3.dynamodb.conditions import Key, Attr
dynamodb = boto3.resource('dynamodb')
fooTable = dynamodb.Table('Foo')
response = fooTable.query(
    KeyConditionExpression=Key('list_id').eq('123')
)
count = 0
for i in response['Items']:
    count += 1
print(count)  # Prints a subset of my total items
ExclusiveStartKey is the name of the parameter you are looking for.
Use the value that was returned for LastEvaluatedKey in the previous operation.
The data type for ExclusiveStartKey must be String, Number or Binary. No set data types are allowed.
http://boto3.readthedocs.io/en/latest/reference/services/dynamodb.html#DynamoDB.Client.query
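A minimal sketch of that loop, assuming only that the table object has a `query` method that returns `Items` and, when more pages remain, `LastEvaluatedKey`. `query_all` and `FakeTable` are hypothetical names; `FakeTable` is a stand-in so the example runs without AWS, and its `{"pos": ...}` key is invented for the demo.

```python
def query_all(table, **query_kwargs):
    """Collect the items from every page of table.query(...)."""
    items = []
    while True:
        response = table.query(**query_kwargs)
        items.extend(response["Items"])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return items
        # Resume the next request where the previous page stopped.
        query_kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical stand-in for a table; serves two items per page."""

    def __init__(self, rows, page_size=2):
        self.rows = rows
        self.page_size = page_size

    def query(self, ExclusiveStartKey=None, **_ignored):
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["pos"]
        end = start + self.page_size
        response = {"Items": self.rows[start:end]}
        if end < len(self.rows):
            response["LastEvaluatedKey"] = {"pos": end}
        return response


all_items = query_all(FakeTable([{"n": i} for i in range(5)]),
                      KeyConditionExpression="stub")
print(len(all_items))
```

With the table from the question, the call would look like `query_all(fooTable, KeyConditionExpression=Key('list_id').eq('123'))`.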

Returning the entire dataset using Google App Engine indexed search

Is there any way to fetch the entire dataset in an App Engine search index? The search below takes an integer limit through QueryOptions, and that limit always needs to be present.
I'm unable to determine whether there is some special flag that can bypass this limit and return the entire result set. If the query is made without QueryOptions, the result set is somehow limited to 20.
_INDEX = search.Index(name=constants.SEARCH_INDEX)
_INDEX.search(query=search.Query(
    query,
    options=search.QueryOptions(
        limit=limit,
        sort_options=search.SortOptions(...))))
Any ideas?
You could customise the delete-all example, if you do indeed want every document in the index rather than every result of a query: https://cloud.google.com/appengine/docs/python/search/#Python_Deleting_documents_from_an_index
from google.appengine.api import search
def delete_all_in_index(index_name):
    """Delete all the docs in the given index."""
    doc_index = search.Index(name=index_name)
    # looping because get_range by default returns up to 100 documents at a time
    while True:
        # Get a list of documents populating only the doc_id field and extract the ids.
        document_ids = [document.doc_id
                        for document in doc_index.get_range(ids_only=True)]
        if not document_ids:
            break
        # Delete the documents for the given ids from the Index.
        doc_index.delete(document_ids)
So you might end up with something like:
while True:
    document_ids = [document.doc_id
                    for document in doc_index.get_range(ids_only=True)]
    if not document_ids:
        break
    # Get the document, then do something with it
    for id in document_ids:
        document = doc_index.get(id)
    # NOTE: nothing is deleted here, so as written get_range returns the same
    # batch every time; pass start_id (the last ID seen) to advance.
You'd probably want to get the document itself in the list comprehension rather than getting the ID and then getting the document from that ID, but you get the idea.
Firstly, if you peek into the constructor of QueryOptions, that answers your question why it returns 20 results:
def __init__(self, limit=20, number_found_accuracy=None, cursor=None,
             offset=None, sort_options=None, returned_fields=None,
             ids_only=False, snippeted_fields=None,
             returned_expressions=None):
I think the API does this to avoid fetching results unnecessarily. If you need to fetch more results upon user action, you should use an offset instead of always fetching all results. See this.
from google.appengine.api import search
...
# get the first set of results
page_size = 10
results = index.search(search.Query(query_string='some stuff',
    options=search.QueryOptions(limit=page_size)))
# calculate pages
pages = results.number_found / page_size
# user chooses page (ith, zero-based) and hence an offset into results
next_page = ith * page_size
# get the search results for that page
results = index.search(search.Query(query_string='some stuff',
    options=search.QueryOptions(limit=page_size, offset=next_page)))
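The page arithmetic in that snippet can be sketched on its own. This is plain Python with hypothetical helper names (`page_count`, `page_offset`) and does not call the Search API. It rounds the page count up, on the assumption that a partially filled last page should still count as a page.

```python
import math


def page_count(number_found, page_size):
    """Number of pages needed to show number_found results."""
    return math.ceil(number_found / page_size)


def page_offset(page_index, page_size):
    """Offset of the first result on a zero-based page."""
    return page_index * page_size


print(page_count(95, 10))   # pages needed for 95 results
print(page_offset(3, 10))   # offset for the fourth page
```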

How to use ResultSet in PyES

I'm using PyES to use ElasticSearch in Python.
Typically, I build my queries in the following format:
# Create connection to server.
conn = ES('127.0.0.1:9200')
# Create a filter to select documents with 'stuff' in the title.
myFilter = TermFilter("title", "stuff")
# Create query.
q = FilteredQuery(MatchAllQuery(), myFilter).search()
# Execute the query.
results = conn.search(query=q, indices=['my-index'])
print type(results)
# > <class 'pyes.es.ResultSet'>
And this works perfectly. My problem begins when the query returns a large list of documents.
Converting the results to a list of dictionaries is computationally demanding, so I'm trying to have the query results returned as dictionaries already. I came across this documentation:
http://pyes.readthedocs.org/en/latest/faq.html#id3
http://pyes.readthedocs.org/en/latest/references/pyes.es.html#pyes.es.ResultSet
https://github.com/aparo/pyes/blob/master/pyes/es.py (line 1304)
But I can't figure out what exactly I'm supposed to do.
Based on the previous links, I've tried this:
from pyes import *
from pyes.query import *
from pyes.es import ResultSet
from pyes.connection import connect
# Create connection to server.
c = connect(servers=['127.0.0.1:9200'])
# Create a filter to select documents with 'stuff' in the title.
myFilter = TermFilter("title", "stuff")
# Create query / Search object.
q = FilteredQuery(MatchAllQuery(), myFilter).search()
# (How to) create the model ?
mymodel = lambda x, y: y
# Execute the query.
# class pyes.es.ResultSet(connection, search, indices=None, doc_types=None,
# query_params=None, auto_fix_keys=False, auto_clean_highlight=False, model=None)
resSet = ResultSet(connection=c, search=q, indices=['my-index'], model=mymodel)
# > resSet = ResultSet(connection=c, search=q, indices=['my-index'], model=mymodel)
# > TypeError: __init__() got an unexpected keyword argument 'search'
Has anyone been able to get a dict from the ResultSet?
Any good suggestion for efficiently converting the ResultSet to a (list of) dictionaries would be appreciated too.
I tried many ways to cast a ResultSet directly into a dict, without success. The best approach I have found so far is to append the ResultSet items to another list or dict. ResultSet yields every single item as a dict.
Here is how I use it:
# create a response dictionary
response = {"status_code": 200, "message": "Successful", "content": []}
# set the result set as the content of the response
response["content"] = [result for result in resultset]
# return a JSON object
return json.dumps(response)
It's not that complicated: just iterate over the result set. For example, with a for loop:
for item in results:
    print item
