Structuring Firebase Database

Structuring Firebase Database - python

I'm following this tutorial to structure Firebase data. Near the end, it says the following:
With this kind of structure, you should keep in mind to update the data at 2 locations under the user and group too. Also, I would like to notify you that everywhere on the Internet, the object keys are written like "user1","group1","group2" etc. where as in practical scenarios it is better to use firebase generated keys which look like '-JglJnGDXcqLq6m844pZ'. We should use these as it will facilitate ordering and sorting.
So based on that, I'm assuming that the final result should be the following:
I'm using this python wrapper to post the data.
How can I achieve this?

When you write data to a Firebase array (for example in Javascript) using a line like this
var newPostKey = firebase.database().ref().child('users').push().key;
var updates = {item1: value1, item2: value2};
return firebase.database().ref().update(updates);
Like is described here, you will get a generated key for data "pushed". In the example above newPostKey will contain this generated key
UPDATE
To answer the updated question with with the Python wrapper:
Look for the section "Saving Data" in the page you linked to.
The code would look something like this;
data = {"Title": "The Animal Book"}
book = db.child("AllBooks").push(data)
data = {"Title": "Animals"}
category = db.child("Categories").push(data)
data = {category['name']: true }
db.child("AllBooks").child(book['name']).child("categories").push(data)

Related

What kind of data structure is this? Python

Studying Python, I am following an excellent Corey Schafer tutorial on Flask, he does this (I have extracted and summarized it for obvious reasons):
from folder_app import app # I did it to follow the structure and that the code is equal to the original
s = Serializer(app.config['SECRET_KEY'], 1800) # key, seconds
token = s.dumps({'user_id': 1}).decode('utf-8')
s = Serializer(app.config['SECRET_KEY'])
user_id = s.loads(token)['user_id'] # This is where I have the doubt
print(user_id)
print(type(s.loads(token)))
The code works, the problem I have is that although as you can see (s.loads (token)) is a dict, I expected to see something like this s.loads ({token ['user_id']}), or s.loads (token ['user_id']) or something like that. That is, it is a dict but it does not seem so. And my doubt goes in the sense if this comes from a greater concept of those they call "pythonic" (which I have not seen so far), or is something that only happens particularly as in this case. Incidentally, https://itsdangerous.palletsprojects.com/en/1.1.x/jws/ this appears: loads (self, s, salt = None, return_header = False) the arguments are in parentheses. I hope it is clear what my doubt is :)

I know this is not answer per say but just to add to my comment. This is an example of how the loads function works on dictionaries with the json module. https://docs.python.org/3/library/json.html#json.loads. What it does is take a json string and return the dictionary type object in Python. Your Serializer is doing something similar. It takes the token string and represents it as an object like dict
The s.dumps I am assuming is similar to json.dumps which gives you the json string representation of python dictionary.
import json
my_dict = json.loads('{"user_id": "Mane", "name": "Joe"}')
my_dict['user_id']
So you could just do json.loads('{"user_id": "Mane", "name": "Joe"}')['user_id'] which is just chaining the operations.

Work with nested objects using couchdb-python

Disclaimer: Both Python and CouchDB are new for me. So far my "programming" has mostly consisted of Bash scripts.
I'm trying to create a small script that updates objects in a CouchDB database. The objects however aren't created by my script but by an App called Tap Forms that uses CouchDB for sync. Basically I'm trying to automatically update the content of the app. That also means I can't really influence the structure or names of the objects in CouchDB.
The Database is mostly filled with objects of this structure:
{
"_id": "rec-3b17...",
"_rev": "21-cdf6...",
"values": {
"fld-c3d4...": 4,
"fld-1def...": 1000000000000,
"fld-bb44...": 760000000000,
"fld-a44f...": "admin,name",
"fld-5fc0...": "SSD",
"fld-642c...": true,
},
"deviceName": "MacBook Air",
"dateModified": "2019-02-08T14:47:06.051Z",
"dateCreated": "2019-02-08T11:33:00.018Z",
"type": "frm-7ff3...",
"dbID": "db-1435...",
"form": "frm-7ff3..."
}
I shortened the numbers a bit and removed some entries to increase readability.
Now the actual values I'm trying to update are within the "values" : {...} array (or object, or list, guess I don't have much experience with JSON either).
As I know some of these values, I managed to create view that finds the _id of an object on the server. I then use the python-couchdb module as described in documentation:
for item in db.view('CustomViews/test2', key="GENERIC"):
doc = db[item.id]
This gives me the object. However I want to update one of the values within the values array, lets say fld-c3d4.... But how? Using doc['values'] = 'new_value' updates the whole array. I tried other (seemingly logical) ways along the lines of doc['values['fld-c3d4']'] = 'new_value' but couldn't wrap my head around it. I couldn't find an example in any documentation.

So here's a example how to update the fld-c3d4.
You have your document that represent a dictionary with nested dictionary.
If you want to get the values, you will do something like this:
values = doc['values']
Now the variable values points to the values in your document.
From there, you can access a sub value:
values['fld-c3d4'] = 'new value'
If you want to directly update the value from the doc, you just have to chain those operations:
doc['values']['fld-c3d4'] = 'new value'

Scraping data from a http & javaScript site

I currently want to scrape some data from an amazon page and I'm kind of stuck.
For example, lets take this page.
https://www.amazon.com/NIKE-Hyperfre3sh-Athletic-Sneakers-Shoes/dp/B01KWIUHAM/ref=sr_1_1_sspa?ie=UTF8&qid=1546731934&sr=8-1-spons&keywords=nike+shoes&psc=1
I wanted to scrape every variant of shoe size and color. That data can be found opening the source code and searching for 'variationValues'.
There we can see sort of a dictionary containing all the sizes and colors and, below that, in 'asinToDimentionIndexMap', every product code with numbers indicating the variant from the variationValues 'dictionary'.
For example, in asinToDimentionIndexMap we can see
"B01KWIUH5M":[0,0]
Which means that the product code B01KWIUH5M is associated with the size '8M US' (position 0 in variationValues size_name section) and the color 'Teal' (same idea as before)
I want to scrape both the variationValues and the asinToDimentionIndexMap, so i can associate the IndexMap numbers to the variationValues one.
Another person in the site (thanks for the help btw) suggested doing it this way.
script = response.xpath('//script/text()').extract_frist()
import re
# capture everything between {}
data = re.findall(script, '(\{.+?\}_')
import json
d = json.loads(data[0])
d['products'][0]
I can sort of understand the first part. We get everything that's a 'script' as a string and then get everything between {}. The issue is what happens after that. My knowledge of json is not that great and reading some stuff about it didn't help that much.
Is it there a way to get, from that data, 2 dictionaries or lists with the variationValues and asinToDimentionIndexMap? (maybe using some regular expressions in the middle to get some data out of a big string). Or explain a little bit what happens with the json part.
Thanks for the help!
EDIT: Added photo of variationValues and asinToDimensionIndexMap

I think you are close Manuel!
The following code will turn your scraped source into easy-to-select boxes:
import json
d = json.loads(data[0])
JSON is a universal format for storing object information. In other words, it's designed to interpret string data into object data, regardless of the platform you are working with.
https://www.w3schools.com/js/js_json_intro.asp
I'm assuming where you may be finding things a challenge is if there are any errors when accessing a particular "box" inside you json object.
Your code format looks correct, but your access within "each box" may look different.
Eg. If your 'asinToDimentionIndexMap' object is nested within a smaller box in the larger 'products' object, then you might access it like this (after running the code above):
d['products'][0]['asinToDimentionIndexMap']
I've hacked and slash a little bit so you can better understand the structure of your particular json file. Take a look at the link below. On the right-hand side, you will see "which boxes are within one another" - which is precisely what you need to know for accessing what you need.
JSON Object Viewer
For example, the following would yield "companyCompliancePolicies_feature_div":
import json
d = json.loads(data[0])
d['updateDivLists']['full'][0]['divToUpdate']
The person helping you before outlined a general case for you, but you'll need to go in an look at structure this way to truly find what you're looking for.

variationValues = re.findall(r'variationValues\" : ({.*?})', ' '.join(script))[0]
asinVariationValues = re.findall(r'asinVariationValues\" : ({.*?}})', ' '.join(script))[0]
dimensionValuesData = re.findall(r'dimensionValuesData\" : (\[.*\])', ' '.join(script))[0]
asinToDimensionIndexMap = re.findall(r'asinToDimensionIndexMap\" : ({.*})', ' '.join(script))[0]
dimensionValuesDisplayData = re.findall(r'dimensionValuesDisplayData\" : ({.*})', ' '.join(script))[0]
Now you can easily convert them to json as use them combine as you wish.

Is there a Python API for submitting batch get requests to AWS DynamoDB?

The package boto3 - Amazon's official AWS API wrapper for python - has great support for uploading items to DynamoDB in bulk. It looks like this:
db = boto3.resource("dynamodb", region_name = "my_region").Table("my_table")
with db.batch_writer() as batch:
for item in my_items:
batch.put_item(Item = item)
Here my_items is a list of Python dictionaries each of which must have the table's primary key(s). The situation isn't perfect - for instance, there is no safety mechanism to prevent you from exceeding your throughput limits - but it's still pretty good.
However, there does not appear to be any counterpart for reading from the database. The closest I can find is DynamoDB.Client.batch_get_item(), but here the API is extremely complicated. Here's what requesting two items looks like:
db_client = boto3.client("dynamodb", "my_region")
db_client.batch_get_item(
RequestItems = {
"my_table": {
"Keys": [
{"my_primary_key": {"S": "my_key1"}},
{"my_primary_key": {"S": "my_key2"}}
]
}
}
)
This might be tolerable, but the response has the same problem: all values are dictionaries whose keys are data types ("S" for string, "N" for number, "M" for mapping, etc.) and it is more than a little annoying to have to parse everything. So my questions are:
Is there any native boto3 support for batch reading from DynamoDb, similar to the batch_writer function above?
Failing that,
Does boto3 provide any built-in way to automatically deserialize the responses to the DynamoDB.Client.batch_get_item() function?
I'll also add that the function boto3.resource("dynamodb").Table().get_item() has what I would consider to be the "correct" API, in that no type-parsing is necessary for inputs or outputs. So it seems that this is some sort of oversight by the developers, and I suppose I'm looking for a workaround.

So thankfully there is something that you might find useful - much like the json module which has json.dumps and json.loads, boto3 has a types module that includes a serializer and deserializer. See TypeSerializer/TypeDeserializer. If you look at the source code, the serialization/deserialization is recursive and should be perfect for your use case.
Note: Its recommended that you use Binary/Decimal instead of just using a regular old python float/int for round trip conversions.
serializer = TypeSerializer()
serializer.serialize('awesome') # returns {'S' : 'awesome' }
deser = TypeDeserializer()
deser.deserialize({'S' : 'awesome'}) # returns u'awesome'
Hopefully this helps!

There's the service resource level batch_get_item. Maybe you could do something like that :
def batch_query_wrapper(table, key, values):
results = []
response = dynamo.batch_get_item(RequestItems={table: {'Keys': [{key: val} for val in values]}})
results.extend(response['Responses'][table])
while response['UnprocessedKeys']:
# Implement some kind of exponential back off here
response = dynamo.batch_get_item(RequestItems={table: {'Keys': [{key: val} for val in values]}})
results.extend(response['Response'][table])
return results
It will return your result as python objects.

I find this to be an effective way to convert a Boto 3 DynamoDB item to a Python dict.
https://github.com/Alonreznik/dynamodb-json

Somewhat related answer, the docs advertise the query method; their e.g.:
from boto3.dynamodb.conditions import Key, Attr
response = table.query(
KeyConditionExpression=Key('username').eq('johndoe')
)
items = response['Items']
print(items)

Pymongo.find() only return answer

I am working on a code which will fetch data from the database using pymongo. After that I'll show it in a GUI using Tkinter.
I am using
.find()
to find specific documents. However I don't want anything else then 'name' to show up. So I used {"name":1}, now it returns:
{u'name':u'**returned_name**'}
How do I remove the u'name': so it will only return returned_name?
Thanks in advance,
Max
P.s. I have searched a lot around the web but couldn't find anything which would give me some argument to help me.

What you see returned by find() call is a cursor. Just iterate over the cursor and get the value by the name key for every document found:
result = db.col.find({"some": "condition"}, {"name": 1})
print([document["name"] for document in result])
As a result, you'll get a list of names.
Or, if you want and expect a single document to be matched, use find_one():
document = db.col.find_one({"some": "condition"}, {"name": 1})
print(document["name"])

Mongo will return the data with keys, though you can as workaround use something like this
var result = []
db.Resellers_accounts.find({"name":1, "_id":0}).forEach(function(u) { result.push(u.name) })
This example is for NodeJS driver, similar can be done for Python
Edit (Python Code) -
res = db.Resellers_accounts.find({},{"name":1, "_id":0})
result = []
for each in res:
result.append(res['name'])
Edit 2 -
No pymongo doesn't support returning only values, everything is key-value paired in MongoDB.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.