Get results from DynamoDB based on a list of elements - Python

I am new to DynamoDB and want to compare the values of a Python list with the attribute values of a DynamoDB table.
I am able to compare a single value by using a query with an index key:
response = dynamotable.query(
    IndexName='Classicmovies',
    KeyConditionExpression=Key('DDT').eq('BBB-rrr-jjj-mq'))
but I want to compare an entire list, which should go in .eq, as follows:
movies = ['ddd-dddss-gdgdg', 'kkdf-dfdfd-www', 'dfw-gddf-gssg']
I have searched a lot and am not able to figure out the right way to do it.

Hard to say what you are trying to do. A query will only retrieve records belonging to a single item collection. Maybe what you need is a scan, but please avoid heavy use of scans unless it is for maintenance purposes.

Related

Number of items in a table in dynamodb using boto3

I am trying to get the number of items in a table from DynamoDB.
Code:
def urlfn():
    if request.method == 'GET':
        print("GET REq processing")
        return render_template('index.html', count=table.item_count)
But I am not getting the real count. I found that there is a 6-hour delay in getting the real count. Is there any way to get the real count of items in a table?
Assuming in your code above that table is a service resource already defined, you can count the items with a scan using Select='COUNT' (note that len(table.scan()) does not work: scan() returns a response dict, not a list of items, and each call returns at most 1 MB of data, so you have to follow LastEvaluatedKey to cover the whole table).
This will give you an up-to-date count of the items in your table. BUT it reads every single item in your table; for significantly large tables this can take a long time. AND it consumes read capacity on your table to do so. So, for most practical purposes, it really isn't a very good way to count items.
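A sketch of such a count, assuming table is a boto3 Table resource. Select='COUNT' avoids transferring the items themselves, but each scan call still reads at most 1 MB, so LastEvaluatedKey has to be followed:

```python
def count_items(table):
    """Count all items with a paginated COUNT scan.

    Still reads every item (and consumes read capacity), but does not
    transfer the item data itself to the client.
    """
    total = 0
    kwargs = {'Select': 'COUNT'}
    while True:
        page = table.scan(**kwargs)
        total += page['Count']
        if 'LastEvaluatedKey' not in page:
            return total
        # resume the scan where the previous page stopped
        kwargs['ExclusiveStartKey'] = page['LastEvaluatedKey']
```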
Depending on your use case, there are a few other options:
add a meta item that is updated every time a new document is added to the table. This is just a document with whatever hash key / sort key combination you want, with an attribute "value" that you add 1 to every time you add a new item to the database.
forget about using DynamoDB. Sorry if that sounds harsh, but DynamoDB is a NoSQL database, and attempting to use it in the same manner as a traditional relational database system is folly. The number of 'rows' is not something Dynamo is designed to report, because that is outside its use-case scope. There are no rows in Dynamo; there are documents, those documents are partitioned, and you access small chunks of them at a time, which means the back-end architecture does not lend itself to knowing what the entire system holds at any given time (hence the 6-hour delay).
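The meta-item option is best implemented as an atomic counter, so concurrent writers don't lose increments. A sketch, assuming a hypothetical table keyed by a single 'pk' attribute (adjust Key to your schema); 'value' is a DynamoDB reserved word, hence the attribute-name placeholder:

```python
def increment_item_count(table, counter_pk='item-count'):
    """Atomically add 1 to a metadata counter item; return the new value.

    ADD creates the attribute (starting from 0) if the item is new,
    so no separate initialization step is needed.
    """
    response = table.update_item(
        Key={'pk': counter_pk},                       # assumed key schema
        UpdateExpression='ADD #v :one',
        ExpressionAttributeNames={'#v': 'value'},     # 'value' is reserved
        ExpressionAttributeValues={':one': 1},
        ReturnValues='UPDATED_NEW',
    )
    return response['Attributes']['value']
```

Call this alongside every put_item; reading the counter item back is then a single cheap GetItem instead of a scan.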

Is there a way to return multiple values to python after a MySql query?

I am new to Python and, of course, MySQL. I recently created a Python function that generates a list of values that I want to insert into a table (2 columns) in MySQL based on their specification.
Is it possible to create a procedure that can take a list of values that I'm sending through Python and check whether these values are already in one of my two columns:
if they are already in the second one, don't return them,
if they are in the first one, return all that are contained there,
if they are in neither of them, return them with some kind of flag so I can handle them through Python and insert them into the correct table.
EXTRA EXPLANATION
Let me try to explain what I want to achieve so maybe you can give me a push and help me out. First, I get a list of CPE items like this ("cpe:/a:apache:iotdb:0.9.0") in Python, and my goal is to save them into a database where the CPEs related to IoT are differentiated from the generic ones and saved in different tables or columns. My goal is for this distinction to be made by user input for each and every item, but only once per item; so after parsing all the items in Python, I want to first check in the database whether they exist in one of the tables or columns.
So for each and every list item that I pass, I want to query MySQL and:
if it already exists in the non-IoT column, don't return anything,
if it already exists in the IoT column, return the item,
if it exists in neither, also return the item so I can get user input in Python to verify whether it is an IoT item or not, and insert it into the database after that.
I think you could use a library called pandas.
I don't know if it is the best solution, but it could work.
Export what you have in SQL into pandas, or just query the SQL database using pandas.
Check out this library; it's really helpful for exploring data sets:
https://pandas.pydata.org/
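Whichever way the two columns are fetched (a stored procedure, a plain SELECT, or pandas.read_sql), the three-way decision itself is simple set logic. A sketch of just that part, assuming the IoT and non-IoT columns have already been loaded into Python sets:

```python
def classify_cpes(incoming, iot_known, non_iot_known):
    """Split incoming CPE strings into the three cases from the question.

    Returns (already_iot, unknown): items already in the non-IoT column
    are silently dropped; unknown items need user input before insertion.
    """
    already_iot, unknown = [], []
    for cpe in incoming:
        if cpe in non_iot_known:        # in the second column: don't return
            continue
        if cpe in iot_known:            # in the first column: return it
            already_iot.append(cpe)
        else:                           # in neither: flag for user input
            unknown.append(cpe)
    return already_iot, unknown
```

Sets make each membership test O(1), so even long CPE lists classify quickly once the columns are in memory.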

Retrieving Data Faster Using Django from a Large Database

I have a large database table with more than a million records, and Django is taking a really long time to retrieve the data. When I had fewer records, the data was retrieved quickly.
I am using the get() method to retrieve the data from the database. I did try the filter() method, but it gave me the entire table rather than filtering on the given condition.
Currently I retrieve the data using the code shown below:
context['variables'] = variable.objects.get(id=self.kwargs['pk'])
I know why it is slow: it's trying to go through all the records and get the records whose id matches. But I was wondering if there is a way I could restrict the search to the last 100 records, or if there is something I am not doing correctly with the filter() function. Any help would be appreciated.

Dynamodb - query if a list contains

I'm fairly new to NoSQL. I am using Python/Boto, but this is a fairly general question. I am currently trying to switch a project from MongoDB to DynamoDB and am seeking some advice on DynamoDB and its capacity to query whether a list contains a certain string. I have been searching for the past day or so, but I'm starting to worry that it doesn't have this facility, other than using scan, which is terribly slow considering the db will be queried thousands of times on updates. Similar unanswered question here.
I understand primary keys can only be N, S or B, and not something like String Set (SS), which would have been useful.
The data is fairly simple and would look something like this. I'm looking for the most efficient way to query the db, based on the tags attribute, for entries that include 'string1' OR 'string2'. Again, I don't want to use scan, but I am willing to consider normalization of the data structure if there is a best practice in DynamoDB.
{
    id: <some number used as a primary key>,
    tags: ['string1', 'string2', ...],
    data: {some JSON object}
}
From what I've read, even using global secondary indexes, this doesn't seem possible, which is strange, since that would make DynamoDB useful only for the simplest queries. I'm hoping I'm missing something.
In MongoDB you have multikey indexes, but not in DynamoDB.
I think you'd need to solve it like you would in a relational database: create a many-to-many relation table with the tag as your hash key and the entry id as your sort key, and find some way to keep the relation table in sync with your entry table.

Quicker way of updating subdocuments

My JSON documents (called "i") have subdocuments (called "elements").
I am looping through these subdocuments and updating them one at a time. However, to do so (once the value I need is computed), Mongo has to scan through all the documents in the database, then through all the subdocuments, and then find the subdocument it needs to update.
I am having major time issues, as I have ~3000 documents and this is taking about 4 minutes.
I would like to know if there is a quicker way to do this, without Mongo having to scan all the documents, but by doing it within the loop.
Here is the code:
for i in db.stuff.find():
    for element in i['counts']:
        computed_value = element[a] + element[b]
        db.stuff.update({'id': i['id'], 'counts.timestamp': element['timestamp']},
                        {'$set': {'counts.$.total': computed_value}})
I am identifying the overall document by "id" and then the subdocument by its timestamp (which is unique to each subdocument). I need to find a quicker way than this. Thank you for your help.
What indexes do you have on your collection? This could probably be sped up by creating an index on your embedded documents. You can do this using dot notation; there's a good explanation and example here.
In your case, since the update filters on counts.timestamp, you'd do something like
db.stuff.createIndex({ "counts.timestamp": 1 });
This will make your searches through the embedded documents run much faster.
Your update is based on id (and I assume it is different from Mongo's default _id).
Put an index on your id field.
Do you want to set the new field for all documents within the collection, or only for documents matching some criteria? If only for matching documents, use a query operator (with an index if possible).
Don't fetch the full document; fetch only the fields that are being used.
What is your average document size? Use explain and mongostat to understand what the actual bottleneck is.
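Beyond indexing, the one-round-trip-per-subdocument pattern in the question's loop can itself be collapsed: build all the update operations for a document first, then send them in a single bulk_write call. A sketch that separates the pure computation (testable on its own) from the pymongo call; the field names a and b are carried over from the question's code and are assumptions:

```python
def build_count_updates(doc, a='a', b='b'):
    """Build one (filter, update) pair per subdocument in doc['counts']."""
    ops = []
    for element in doc['counts']:
        computed_value = element[a] + element[b]
        ops.append((
            {'id': doc['id'], 'counts.timestamp': element['timestamp']},
            {'$set': {'counts.$.total': computed_value}},
        ))
    return ops

# With pymongo, all pairs for a document go to the server in one round trip:
# from pymongo import UpdateOne
# for i in db.stuff.find():
#     requests = [UpdateOne(f, u) for f, u in build_count_updates(i)]
#     if requests:
#         db.stuff.bulk_write(requests, ordered=False)
```

Combined with an index on counts.timestamp (and id), this removes both the per-update network latency and the collection scans.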
