How to use KeyConditionExpression in the Query method on a DynamoDB table using boto3 (Python)

I am trying to query a DynamoDB table to display its items.
I am using table.query with options such as ProjectionExpression.
Since scan reads the whole table, which is time-consuming, I am trying to use query instead, but I am running into an issue with the key condition, which is a mandatory parameter for query.
Can you help me with this code?
The URL I am calling is: http://localhost:0000/rsi/hotels
My Repository.py ->
from boto3.dynamodb.conditions import Key

def list_items(self):
    ProjectionExpression = "Id,#Name,Description"
    ean = {"#Name": "Name"}
    # Using this URL: http://localhost:0000/rsi/hotels
    # how can I use a key condition for this in the table below?
    esk = Key('Id').gt(0)  # stuck here, don't know what exact condition to give
    limit = settings.AWS_SCAN_LIMIT
    items = []
    tableResponse = table.query(
        ProjectionExpression=ProjectionExpression,
        ExpressionAttributeNames=ean,
        KeyConditionExpression=esk,
        Limit=limit)
    items.extend(tableResponse['Items'])
    # Keep paging while DynamoDB reports more results.
    while 'LastEvaluatedKey' in tableResponse:
        tableResponse = table.query(
            ProjectionExpression=ProjectionExpression,
            ExpressionAttributeNames=ean,
            ExclusiveStartKey=tableResponse['LastEvaluatedKey'],
            KeyConditionExpression=esk,
            Limit=limit
        )
        items.extend(tableResponse['Items'])
    return items
ERROR: An error occurred (ValidationException) when calling the Query operation: Query key condition not supported
My table actually contains:

    Id    Name    Description
    1     hai     des
    09    bye     des2
    123   there   des3
    etc.
So I have not been able to figure out how to use parallel scan and query effectively, which is why I am asking here.
Please help me with this!
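For reference, a minimal sketch of the two usual options here, assuming Id is the table's partition key and reusing the table resource and projection from above (the key value '123' is just an example): query needs an exact partition key value, while listing everything falls back to a paginated scan.

from boto3.dynamodb.conditions import Key

# Option 1: query - only valid with an exact partition key value.
response = table.query(
    KeyConditionExpression=Key('Id').eq('123'),
    ProjectionExpression="Id,#Name,Description",
    ExpressionAttributeNames={"#Name": "Name"},
)

# Option 2: list every item - scan with pagination, since query cannot run without a key.
items = []
response = table.scan(
    ProjectionExpression="Id,#Name,Description",
    ExpressionAttributeNames={"#Name": "Name"},
)
items.extend(response['Items'])
while 'LastEvaluatedKey' in response:
    response = table.scan(
        ProjectionExpression="Id,#Name,Description",
        ExpressionAttributeNames={"#Name": "Name"},
        ExclusiveStartKey=response['LastEvaluatedKey'],
    )
    items.extend(response['Items'])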

Related

%s variable in Query Execution Python 3.8 (pymssql)

I have a Python script with a basic GUI that logs into a DB and executes a query.
The script also asks for one parameter called "collection name", which is taken from the tkinter .get function and substituted for a %s inside the query text. The result is that each time I can execute the query with a different collection name. This works and it is fine.
Now, I want to pass a larger set of collection names into my .get function so I can cursor.execute a query with multiple collection names and get more complex data. But I am having issues with inputting multiple collection names into my app.
Below is a piece of my Query1, which contains the %s variable that is filled from the tkinter input.
From #Session1
Join vGSMRxLevRxQual On(#Session1.SessionId = vGSMRxLevRxQual.SessionId)
Where vGSMRxLevRxQual.RxLevSub<0 and vGSMRxLevRxQual.RxLevSub>-190
and #Session1.CollectionName in (%s)
Group by
#Session1.Operator
Order by #Session1.Operator ASC
IF OBJECT_ID('tempdb..#SelectedSession1') IS NOT NULL DROP TABLE #SelectedSession1
IF OBJECT_ID('tempdb..#Session1') IS NOT NULL DROP TABLE #Session1
Here is where I try to execute the query:
if Query == "GSMUERxLevelSub":
    result = cursor.execute(GSMUERxLevelSub, (CollectionName,))
    output = cursor.fetchmany
    df = DataFrame(cursor.fetchall())
    filename = "2021_H1 WEEK CDF GRAPHS().xlsx"
    df1 = DataFrame.transpose(df, copy=False)
Lastly, here is where I get the value for the Collection name:
CollectionName = f_CollectionName.get()
Your issue is due to a list/collection being an invalid parameter.
You'll need to transform collectionName:
collection_name: list[str] = ['collection1', 'collection2']
new_collection_name = ','.join(f'"{c}"' for c in collection_name)
cursor.execute(sql, (new_collection_name,))
Not sure if this approach will be susceptible to SQL injection if that's a concern.
Edit:
Forgot that the DBAPI would put another set of quotes around the parameters. You can do something like:
CollectionName = ["foo", "bar"]
sql = f"""
From #Session1
Join vGSMRxLevRxQual On(#Session1.SessionId = vGSMRxLevRxQual.SessionId)
Where vGSMRxLevRxQual.RxLevSub<0 and vGSMRxLevRxQual.RxLevSub>-190
and #Session1.CollectionName in ({",".join(["%s"] * len(CollectionName))})
"""
sql += """
Group by
#Session1.Operator
Order by #Session1.Operator ASC
"""
cursor.execute(sql, tuple(CollectionName))
EDIT: Update to F-string
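For example, a small sketch of how the placeholders expand, assuming an open pymssql cursor and simplifying the query down to its IN clause:

CollectionName = ["foo", "bar"]                            # values collected from the GUI entry
placeholders = ",".join(["%s"] * len(CollectionName))      # -> "%s,%s"
sql = f"SELECT * FROM #Session1 WHERE CollectionName IN ({placeholders})"
cursor.execute(sql, tuple(CollectionName))                 # each name is bound as its own parameter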

Read optimisation in Cassandra using Python

I have a table with the following model:
CREATE TABLE IF NOT EXISTS {} (
user_id bigint ,
pseudo text,
importance float,
is_friend_following bigint,
is_friend boolean,
is_following boolean,
PRIMARY KEY ((user_id), is_friend_following)
);
I also have a table containing my seeds. Those (20) users are the starting point of my graph. So I select their ID and search in the table above to get their Followers and friends, and from there I build my graph (networkX).
def build_seed_graph(cls, name):
    obj = cls()
    obj.name = name
    query = "SELECT twitter_id FROM {0};"
    seeds = obj.session.execute(query.format(obj.seed_data_table))
    obj.graph.add_nodes_from(obj.seeds)
    for seed in seeds:
        query = "SELECT friend_follower_id, is_friend, is_follower FROM {0} WHERE user_id={1}"
        statement = SimpleStatement(query.format(obj.network_table, seed), fetch_size=1000)
        friend_ids = []
        follower_ids = []
        for row in obj.session.execute(statement):
            if row.friend_follower_id in obj.seeds:
                if row.is_friend:
                    friend_ids.append(row.friend_follower_id)
                if row.is_follower:
                    follower_ids.append(row.friend_follower_id)
        if friend_ids:
            for friend_id in friend_ids:
                obj.graph.add_edge(seed, friend_id)
        if follower_ids:
            for follower_id in follower_ids:
                obj.graph.add_edge(follower_id, seed)
    return obj
The problem is that the time it takes to build the graph is too long, and I would like to optimize it.
I have approximately 5 million rows in my table 'network_table'.
I'm wondering whether it would be faster, instead of doing a query with a WHERE clause per seed, to just do a single query on the whole table. Will it fit in memory? Is that a good idea? Is there a better way?
I suspect the real issue may not be the queries but rather the processing time.

"I'm wondering whether it would be faster, instead of doing a query with a WHERE clause per seed, to just do a single query on the whole table. Will it fit in memory? Is that a good idea? Is there a better way?"
There should not be any problem with doing a single query on the whole table if you enable paging (https://datastax.github.io/python-driver/query_paging.html - using fetch_size). Cassandra will return up to fetch_size rows per page and will fetch additional pages as you read them from the result set.
Please note that if you have many rows in the table that are not seed related, a full scan may be slower, since you will receive rows that do not include a "seed".
Disclaimer - I am part of the team building ScyllaDB - a Cassandra-compatible database.
ScyllaDB has recently published a blog post on how to efficiently do a full scan in parallel (http://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/), which applies to Cassandra as well - if a full scan is relevant and you can build the graph in parallel, then this may help you.
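A minimal sketch of that paging approach with the Python driver; the contact point, keyspace, table name, and the per-row handler are placeholders, not taken from the question:

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['127.0.0.1']).connect('my_keyspace')  # assumed contact point and keyspace

# One full-table query; the driver transparently fetches the next page
# of fetch_size rows as the result set is iterated.
statement = SimpleStatement("SELECT * FROM network_table", fetch_size=1000)
for row in session.execute(statement):
    handle_row(row)  # hypothetical per-row processing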
It seems like you can get rid of the last 2 if statements, since you're going through data that you already have looped through once:
def build_seed_graph(cls, name):
    obj = cls()
    obj.name = name
    query = "SELECT twitter_id FROM {0};"
    seeds = obj.session.execute(query.format(obj.seed_data_table))
    obj.graph.add_nodes_from(obj.seeds)
    for seed in seeds:
        query = "SELECT friend_follower_id, is_friend, is_follower FROM {0} WHERE user_id={1}"
        statement = SimpleStatement(query.format(obj.network_table, seed), fetch_size=1000)
        for row in obj.session.execute(statement):
            if row.friend_follower_id in obj.seeds:
                if row.is_friend:
                    obj.graph.add_edge(seed, row.friend_follower_id)
                elif row.is_follower:
                    obj.graph.add_edge(row.friend_follower_id, seed)
    return obj
This also gets rid of many append operations on lists that you're not using, and should speed up this function.

Returning the entire dataset using Google App Engine indexed search

Is there any way to fetch the entire dataset in an App Engine search index? The search below takes an integer limit through QueryOptions, and that limit always needs to be present.
I'm unable to determine whether there is some special flag that can bypass this limit and return the entire result set. If the query is made without QueryOptions, the result set is somehow limited to 20.
_INDEX = search.Index(name=constants.SEARCH_INDEX)
_INDEX.search(query=search.Query(
    query,
    options=search.QueryOptions(
        limit=limit,
        sort_options=search.SortOptions(...))))
Any ideas?
You could customise the delete-all example, if indeed you want every document in the index rather than every result in a query: https://cloud.google.com/appengine/docs/python/search/#Python_Deleting_documents_from_an_index
from google.appengine.api import search

def delete_all_in_index(index_name):
    """Delete all the docs in the given index."""
    doc_index = search.Index(name=index_name)

    # looping because get_range by default returns up to 100 documents at a time
    while True:
        # Get a list of documents populating only the doc_id field and extract the ids.
        document_ids = [document.doc_id
                        for document in doc_index.get_range(ids_only=True)]
        if not document_ids:
            break
        # Delete the documents for the given ids from the Index.
        doc_index.delete(document_ids)
So you might end up with something like:
while True:
    document_ids = [document.doc_id
                    for document in doc_index.get_range(ids_only=True)]
    if not document_ids:
        break
    # Then do something with each document.
    # (Note: get_range would need a start_id to advance past the first batch.)
    for id in document_ids:
        document = doc_index.get(id)
You'd probably want to get the document itself in the list comprehension rather than getting the ID and then fetching the document from that ID, but you get the idea.
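A rough sketch of that variant, paging through the full documents with get_range; the index name is a placeholder:

from google.appengine.api import search

doc_index = search.Index(name='my_index')  # assumed index name

documents = []
start_id = None
while True:
    # get_range returns up to 100 documents per call; advance using the last doc_id.
    batch = list(doc_index.get_range(start_id=start_id,
                                     include_start_object=(start_id is None)))
    if not batch:
        break
    documents.extend(batch)
    start_id = batch[-1].doc_id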
Firstly, if you peek into the constructor of QueryOptions, that answers your question why it returns 20 results:
def __init__(self, limit=20, number_found_accuracy=None, cursor=None,
             offset=None, sort_options=None, returned_fields=None,
             ids_only=False, snippeted_fields=None,
             returned_expressions=None):
The reason I think why the API is doing this is to avoid unnecessary fetching of results. You should use an offset if you need to fetch more results upon user action instead of always fetching all results. See this.
from google.appengine.api import search
...
# get the first set of results
page_size = 10
results = index.search(search.Query(query_string='some stuff',
                                    options=search.QueryOptions(limit=page_size)))

# calculate pages
pages = results.found_count / page_size

# user chooses page and hence an offset into results
next_page = ith * page_size

# get the search results for that page
results = index.search(search.Query(query_string='some stuff',
                                    options=search.QueryOptions(limit=page_size, offset=next_page)))

Retrieve all items from DynamoDB using query?

I am trying to retrieve all items in a dynamodb table using a query. Below is my code:
import boto.dynamodb2
from boto.dynamodb2.table import Table
from time import sleep

c = boto.dynamodb2.connect_to_region(aws_access_key_id="XXX", aws_secret_access_key="XXX", region_name="us-west-2")
tab = Table("rip.irc", connection=c)
x = tab.query()
for i in x:
    print i
    sleep(1)
However, I receive the following error:
ValidationException: ValidationException: 400 Bad Request
{'message': 'Conditions can be of length 1 or 2 only', '__type': 'com.amazon.coral.validate#ValidationException'}
The code I have is pretty straightforward and out of the boto dynamodb2 docs, so I am not sure why I am getting the above error. Any insights would be appreciated (new to this and a bit lost). Thanks
EDIT: I have both a hash key and a range key. I am able to query by specific hash keys. For example:
x = tab.query(hash__eq="2014-01-20 05:06:29")
How can I retrieve all items though?
Ahh ok, figured it out. If anyone needs:
You can't use the query method on a table without specifying a specific hash key. The method to use instead is scan. So if I replace:
x = tab.query()
with
x = tab.scan()
I get all the items in my table.
I'm on Groovy, but this should give you a hint. The error:
{'message': 'Conditions can be of length 1 or 2 only'}
is telling you that your key condition can be of length 1 -> hash key only, or length 2 -> hash key + range key. Anything in a query condition beyond the keys will provoke this error.
The reason for this error is that you are trying to run a search query, but using only a key condition. You have to add a separate filter condition to perform your query.
My code
String keyQuery = " hashKey = :hashKey and rangeKey between :start and :end "
queryRequest.setKeyConditionExpression(keyQuery)  // define key query
String filterExpression = " yourParam = :yourParam "
queryRequest.setFilterExpression(filterExpression)  // define filter expression
queryRequest.setExpressionAttributeValues(expressionAttributeValues)
queryRequest.setSelect('ALL_ATTRIBUTES')
QueryResult queryResult = client.query(queryRequest)
.scan() does not automatically return all elements of a table, due to pagination. There is a 1 MB maximum response size (see the DynamoDB response limits).
Here is an implementation of the boto3 scan that pages through all results:
import boto3

dynamo = boto3.resource('dynamodb')

def scanRecursive(tableName, **kwargs):
    """
    NOTE: Anytime you are filtering by a specific equivalency attribute such as id, name
    or date equal to ... etc., you should consider using a query, not a scan.

    kwargs are any parameters you want to pass to the scan operation.
    """
    dbTable = dynamo.Table(tableName)
    response = dbTable.scan(**kwargs)
    if kwargs.get('Select') == "COUNT":
        return response.get('Count')
    data = response.get('Items')
    # Keep scanning until there is no LastEvaluatedKey left.
    while 'LastEvaluatedKey' in response:
        response = dbTable.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        data.extend(response['Items'])
    return data
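Calling it then looks like this (the table name is a placeholder):

all_items = scanRecursive('my_table')
count_only = scanRecursive('my_table', Select='COUNT')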
I ran into this error when I was misusing KeyConditionExpression instead of FilterExpression when querying a dynamodb table.
KeyConditionExpression should only be used with partition key or sort key values.
FilterExpression should be used when you want to filter your results even more.
However, do note that using FilterExpression consumes the same reads as it would without one, because the query is performed based on the KeyConditionExpression first; items are then removed from the results based on your FilterExpression.
Source
Working with Queries
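A small boto3 sketch of that distinction; the table name, key names, and attribute names below are placeholders, not taken from the question:

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource('dynamodb').Table('my_table')  # assumed table name

response = table.query(
    # Key conditions may only reference the partition key (and optionally the sort key).
    KeyConditionExpression=Key('pk').eq('user#123') & Key('sk').begins_with('2021-'),
    # Non-key attributes go into FilterExpression; the filter is applied after the read,
    # so it trims the returned items but not the consumed read capacity.
    FilterExpression=Attr('status').eq('active'),
)
items = response['Items']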
This is how I do a query if someone still needs a solution:
def method_name(a, b)
  results = self.query(
    key_condition_expression: '#T = :t',
    filter_expression: 'contains(#S, :s)',
    expression_attribute_names: {
      '#T' => 'your_table_field_name',
      '#S' => 'your_table_field_name'
    },
    expression_attribute_values: {
      ':t' => a,
      ':s' => b
    }
  )
  results
end

Python/Plone: Getting all unique keywords (Subject)

Is there a way of getting all the unique values of a keyword index, i.e. Subject, in Plone by querying the catalog?
I have been using this as a guide but not yet successful.
This is what I have so far
def search_content_by_keywords(self):
    """
    Attempting to search the catalog
    """
    catalog = self.context.portal_catalog
    query = {}
    query['Subject'] = 'Someval'
    results = catalog.searchResults(query)
    return results
Instead of passing a specific keyword, I want to fetch all of the keywords.
catalog = self.context.portal_catalog
my_keys = catalog.uniqueValuesFor('Subject')
reference: http://docs.plone.org/develop/plone/searching_and_indexing/query.html#unique-values
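Dropped into a view method like the one in the question, that might look like this sketch (the method name is illustrative):

def list_all_keywords(self):
    """Return every unique Subject keyword stored in the catalog index."""
    catalog = self.context.portal_catalog
    # uniqueValuesFor returns a tuple of every value indexed for the given index name.
    return catalog.uniqueValuesFor('Subject')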
