Firebase Realtime Database - cannot read document as json/dictionary - python

In my Realtime Database I have a path /stats which contains a set of documents.
I want to use the Python SDK to get the /stats document as a dict. My code looks like this:
path = "/stats"
ref = db.reference(path, firebase_app)
document = ref.get()
print(document)
And the output is
[None, {'name': 'Full Time Statistics', 'thumbnail': 'https://***', 'url': 'https://***'}]
which is a list, not a dictionary. How can I change this and read the document at this path as a dictionary, something like this:
{"1": {'name': 'Full Time Statistics', 'thumbnail': 'https://***', 'url': 'https://***'}}
On the other hand, I can get other documents with similar structures as a dictionary with no issue. Why does this happen, and how can I solve it?

Two things are happening here:
Since you are retrieving /stats, you get all the nodes under it. Firebase Realtime Database keys are always strings, so you'd normally get a dictionary (with the keys in the dictionary being the keys in the JSON).
Since your keys are numeric values, Firebase "thinks" you are trying to store an array/list and it tries to coerce the data into an array for you. That's why you get a None entry in the list: that's Firebase filling in the zeroth element for you.
There's unfortunately no way to disable this array coercion. I typically get around it by prefixing the keys with a fixed string, so that Firebase bypasses its array logic. So:
stats: {
  stat1: { ... },
  stat2: { ... }
}
Also see:
Best Practices: Arrays in Firebase
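If renaming the keys is not an option, the coerced list can also be turned back into a dictionary on the client. The following is a minimal sketch, not part of the Firebase SDK: list_to_dict is a hypothetical helper, and it assumes that each list index is the original numeric key and that None entries are the gaps Firebase filled in.

```python
def list_to_dict(value):
    """Turn a list produced by array coercion back into a dict keyed by index.

    Hypothetical helper: assumes each list index is the original numeric key
    and that None entries are placeholders for missing keys.
    """
    if isinstance(value, list):
        return {str(i): item for i, item in enumerate(value) if item is not None}
    return value  # already a dict (or a scalar), so leave it untouched


# Sample shaped like the output in the question (URLs are placeholders).
document = [None, {"name": "Full Time Statistics", "thumbnail": "https://***", "url": "https://***"}]
stats = list_to_dict(document)
print(stats)
```

Because the helper passes dictionaries through unchanged, it can be applied to every path regardless of whether Firebase coerced the result.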

Related

Python: Parsing JSON data from API get - referring to the dictionary key?

I'm pretty new to Python so I'm only just starting to work with APIs. I have retrieved the data I need from an API and it returns in the following format:
{u'id': u'000123', u'payload': u"{'account_title': u’sam b’, 'phone_num': ‘1234567890’, 'security_pin': u'000000', 'remote_word': u’secret123’, 'email_address': ‘email#gmail.com’, 'password': u’password123’}”}
So when I print my variable, it returns the above...
If I just want part of what it returns, how do I go about writing this? I was able to return id without issue, and if I specified 'payload' it returned everything after that. It seems like account_title, phone_num, security_pin, remote_word, email_address and password are all nested inside of 'payload'
What would be the best way to have a variable that, when printed, returns just the email_address, for example?
Thanks!
Welcome to Python! Sounds like you're getting right into it. It would be best to begin reading fundamentals, specifically about the Dictionary Data Structure
The Dictionary, or dict is what you are referencing in your question. It's a key-value store that is generally[1] un-ordered. The dict is a great way to represent JSON data.
Now you are asking how to extract information from a dictionary. Well, you seem to have it working out thus far! Let's use your example:
d = {u'id': u'000123', u'payload': u"{'account_title': u’sam b’, 'phone_num': ‘1234567890’, 'security_pin': u'000000', 'remote_word': u’secret123’, 'email_address': ‘email#gmail.com’, 'password': u’password123’}"}
Now if we write d['id'], we'll get the id (which is 000123)
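If we write d['payload'], we'll get the value stored under payload. The cool part about dicts is that they can be nested like this, as many times as you need. Note, though, that in your output the payload value is wrapped in quotes, so it is a string that looks like a dict; it has to be parsed into a real dict before a nested lookup will work.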
d['payload']
"{'account_title': u’sam b’, 'phone_num': ‘1234567890’, 'security_pin': u'000000', 'remote_word': u’secret123’, 'email_address': ‘email#gmail.com’, 'password': u’password123’}"
Then as per your question, if you wanted to get email, it's the same syntax and you're just nesting the accessor. Like so:
d['payload']['email_address']
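One caveat with the data as printed: indexing a string with ['email_address'] raises a TypeError, and the quotes around the payload value suggest it is a string holding a dict literal. A minimal sketch of one way to handle that is to parse it with ast.literal_eval from the standard library. This assumes the payload is a valid Python literal with plain ASCII quotes (the curly quotes in the paste above would have to be normalized first); the sample data is a cleaned-up, shortened stand-in for the API response.

```python
import ast

# Cleaned-up stand-in for the API response: 'payload' is a string, not a dict.
d = {
    "id": "000123",
    "payload": "{'account_title': 'sam b', 'email_address': 'email#gmail.com'}",
}

payload = d["payload"]
if isinstance(payload, str):
    # literal_eval only evaluates Python literals, so it is safer than eval().
    payload = ast.literal_eval(payload)

email = payload["email_address"]
print(email)
```

If the API can be asked for proper JSON instead, json.loads would be the more usual choice; literal_eval is used here only because the payload shown uses Python-style quoting.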
Hope that helps!
[1] For the longest time, dicts were un-ordered in Python. From CPython 3.6, dicts preserve insertion order as an implementation detail, and since Python 3.7 that behavior is guaranteed by the language. This answer provides great detail on that. Prior to that, using collections.OrderedDict was the only way to get a dict ordered by insertion order.

How do I call networkx.add_node(..) with optional properties?

I'm looping through a dictionary of objects constructed from JSON, and I'm creating vertices from them using networkx. The problem I'm experiencing is that some of the JSON objects have missing properties, and if I do this:
self.graph.add_node(valueToCheck,
                    id=self.vertexDict[valueToCheck],
                    namespace=component["namespace"],
                    tenant=component["tenant"],
                    type=component.get("type")+"Component",
                    artifactFileName=component.get("artifactFileName"),
                    className=component.get("className"),
                    userConfig=component.get("userConfig"),
                    sourceType=component.get("sourceType"),
                    sinkType=component.get("sinkType"))
then I can't export my graph using nx.write_graphml(..) because some of the vertex properties have the value None (which is the expected output of component.get(..) when the property is missing).
How do I use networkx to construct vertices when some of my properties might be missing in the JSON objects?
Here's what my JSON looks like:
[{'type': 'function',
'namespace': 'campaigns',
'name': 'campaign-record-transformer',
'tenant': 'osp',
'artifactFileName': 'osp-functions-1.1-SNAPSHOT-jar-with-dependencies.jar',
'className': 'com.overstock.dataeng.pulsar.functions.CampaignRecordTransformer',
'inputs': ['persistent://osp/campaigns/campaign-manager'],
'logTopic': 'persistent://osp/logging/pulsar-log-topic',
'output': 'persistent://osp/campaigns/campaign-records'},
{'type': 'function',
'namespace': 'campaignsTest',
'name': 'campaign-metadata-transformer',
'tenant': 'osp',
'artifactFileName': 'osp-functions-1.1-SNAPSHOT-jar-with-dependencies.jar',
'className': 'com.overstock.dataeng.pulsar.functions.CampaignMetadataTransformer',
'logTopic': 'persistent://osp/logging/pulsar-log-topic',
'output': 'persistent://osp/campaigns/campaign-metadata-output'}]
Notice that the inputs property is missing from the second object. In the actual data, there are at least 8 optional properties that can be missing in different combinations, and there are hundreds of objects like this.
I do not have the reputation for a comment, so despite this not being a full answer, I am posting it as such
Have you tried simply excluding the properties that are missing from your add_node step?
That is, instead of providing a key value pair where the value is None, don't provide a key/value pair at all if the key is missing.
You can probably achieve this quite easily by loading your json using python and then just unpacking your component:
components = json.load(...)
for component in components:
    self.graph.add_node(value, **component)
See https://docs.python.org/3/tutorial/controlflow.html#unpacking-argument-lists
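A variation on the same idea, for when only some properties should become node attributes: build the keyword arguments first and drop whatever is missing. This is a sketch rather than a tested networkx recipe. node_attributes and OPTIONAL_KEYS are hypothetical names, and flattening list values such as inputs into a string is an assumption made here because, as far as I know, GraphML attribute values need to be scalars.

```python
# Hypothetical list of optional properties; extend it to match the real data.
OPTIONAL_KEYS = ("artifactFileName", "className", "userConfig",
                 "sourceType", "sinkType", "inputs")


def node_attributes(component, optional_keys=OPTIONAL_KEYS):
    """Return add_node keyword arguments, skipping properties that are absent."""
    attrs = {
        "namespace": component["namespace"],
        "tenant": component["tenant"],
    }
    for key in optional_keys:
        value = component.get(key)
        if value is None:
            continue  # leave the attribute out instead of storing None
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)  # assumption: flatten lists
        attrs[key] = value
    return attrs


# Sample component shaped like the second object in the question.
component = {
    "type": "function",
    "namespace": "campaignsTest",
    "name": "campaign-metadata-transformer",
    "tenant": "osp",
    "className": "com.overstock.dataeng.pulsar.functions.CampaignMetadataTransformer",
}
attrs = node_attributes(component)
print(attrs)
```

The result can then be unpacked into the call from the question, for example self.graph.add_node(valueToCheck, **node_attributes(component)).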

Work with nested objects using couchdb-python

Disclaimer: Both Python and CouchDB are new for me. So far my "programming" has mostly consisted of Bash scripts.
I'm trying to create a small script that updates objects in a CouchDB database. The objects however aren't created by my script but by an App called Tap Forms that uses CouchDB for sync. Basically I'm trying to automatically update the content of the app. That also means I can't really influence the structure or names of the objects in CouchDB.
The Database is mostly filled with objects of this structure:
{
  "_id": "rec-3b17...",
  "_rev": "21-cdf6...",
  "values": {
    "fld-c3d4...": 4,
    "fld-1def...": 1000000000000,
    "fld-bb44...": 760000000000,
    "fld-a44f...": "admin,name",
    "fld-5fc0...": "SSD",
    "fld-642c...": true
  },
  "deviceName": "MacBook Air",
  "dateModified": "2019-02-08T14:47:06.051Z",
  "dateCreated": "2019-02-08T11:33:00.018Z",
  "type": "frm-7ff3...",
  "dbID": "db-1435...",
  "form": "frm-7ff3..."
}
I shortened the numbers a bit and removed some entries to increase readability.
Now the actual values I'm trying to update are within the "values" : {...} array (or object, or list, guess I don't have much experience with JSON either).
As I know some of these values, I managed to create a view that finds the _id of an object on the server. I then use the couchdb-python module as described in its documentation:
for item in db.view('CustomViews/test2', key="GENERIC"):
    doc = db[item.id]
This gives me the object. However, I want to update one of the values within the values array, let's say fld-c3d4.... But how? Using doc['values'] = 'new_value' replaces the whole array. I tried other (seemingly logical) ways along the lines of doc['values['fld-c3d4']'] = 'new_value' but couldn't wrap my head around it. I couldn't find an example in any documentation.
Here's an example of how to update fld-c3d4.
Your document is represented as a dictionary that contains a nested dictionary.
If you want to get the values, you will do something like this:
values = doc['values']
Now the variable values points to the values in your document.
From there, you can access a sub value:
values['fld-c3d4'] = 'new value'
If you want to directly update the value from the doc, you just have to chain those operations:
doc['values']['fld-c3d4'] = 'new value'
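Putting it together, here is a sketch of the full update that uses a plain dict as a stand-in for the document, so it runs without a server. With couchdb-python the change is only local until the document is written back; the commented-out db.save(doc) line shows where that would go, assuming db is the Database object from the question. The field names and values are made-up placeholders, not the real (truncated) ones.

```python
# Plain dict standing in for the document returned by db[item.id].
doc = {
    "_id": "rec-example",
    "values": {"fld-example-a": 4, "fld-example-b": "SSD"},
}

# Update a single nested field; the other entries in 'values' are untouched.
doc["values"]["fld-example-a"] = "new value"

# With a real couchdb-python Database object, write the change back:
# db.save(doc)

print(doc["values"])
```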

How to make Chatfuel read JSON file stored in Zapier?

In my Chatfuel block I collect a {{user input}} and POST a JSON to a Zapier webhook. So far so good. After that, my local Python script reads this JSON from Zapier storage successfully
import json
import urllib.request
url = 'https://store.zapier.com/api/records?secret=password'
response = urllib.request.urlopen(url).read().decode('utf-8')
data = json.loads(response)
and analyzes it, generating another JSON as output:
json0 = {
    "messages": [
        {"text": analysis_output}
    ]
}
Then Python3 posts this JSON in a GET webhook in Zapier:
import requests
r = requests.post('https://hooks.zapier.com/hooks/catch/2843360/8sx1xl/', json=json0)
r.status_code
Zapier Webhook successfully gets the JSON and sends it to Storage.
Key-Value pairs are set and then Chatfuel tries to read from storage:
GET https://store.zapier.com/api/records?secret=password2
But the JSON structure obtained is wrong, which I verified with this code:
url = 'https://store.zapier.com/api/records?secret=password2'
response = urllib.request.urlopen(url).read().decode('utf-8')
data = json.loads(response)
data
that returns:
{'messages': "text: Didn't know I could order several items"}
when the right one for Chatfuel to work should be:
{'messages': [{'text': "Didn't know I could order several items"}]}
That is, there are two main problems:
1) The " [ { " nesting is missing from the JSON
2) The JSON appends new information to the existing one instead of generating a brand new JSON, which causes the JSON to have 5 different parts.
I am looking for possible solutions for this issue.
David here, from the Zapier Platform team.
First off, you don't need quotes around your keys, we take care of that for you. Currently, your json will look like:
{ "'messages'": { "'text'": "<DATA FROM STEP 1>" } }
So the first change is to take out those.
Next, if you want to store an array, use the Push Value Onto List action instead. It takes a top-level key and stores your values in a key in that object called list. Given the following setup:
The resulting structure in JSON is
{ "demo": {"list": [ "5" ]} }
It seems like you want to store an extra level down; an array of json objects:
[ { "text": "this is text" } ]
That's not supported out of the box, as all list items are stored as strings. You can store json strings though, and parse them back into an object when you need to access them like an object!
Does that answer your question?
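The last suggestion, storing JSON strings in the list and parsing them back, could look roughly like this on the Python side. It is only a sketch of the encode/decode round trip with the standard json module; the stored dict is a hand-written stand-in that follows the demo/list shape shown above, not an actual Zapier response.

```python
import json

# Encode each message object as a JSON string before pushing it onto the list.
message = {"text": "this is text"}
encoded = json.dumps(message)

# Stand-in for the stored structure: list items come back as strings.
stored = {"demo": {"list": [encoded]}}

# Decode the strings back into objects to rebuild the Chatfuel-style payload.
messages = [json.loads(item) for item in stored["demo"]["list"]]
chatfuel_payload = {"messages": messages}
print(json.dumps(chatfuel_payload))
```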

What is the best way to search millions of JSON files?

I've very recently picked up programming in Python and am working on creating a database.
I've already worked out extracting all these files from their source so they are all in a directory on my computer.
All of these files are structured the same way and what I want to do is search these multidimensional dictionaries and locate the value for a specific set of keys.
These json files are all structured similarly,
{
  "userid": 34535367,
  "result": {
    "list": [
      {
        "name": 264,
        "age": 64,
        "id": 456345345
      },
      {
        "name": 263,
        "age": 42,
        "id": 364563463456
      }
    ]
  }
}
In my case, I would like to search for the "name" key and return the relevant data (quality, id and the original userid) for the thousands of names just like it from my millions of JSON files.
Basically I'm very new at this and the little programming knowledge I have is in Python. I'm happy to start learning whatever I need to, but I'm not sure which direction to go.
If your goal is to create a database, then you should look at how databases work, since they solve the same problem you are trying to solve right now :)
NoSQL databases (like MongoDB) also work with JSON documents and most likely implement a whole set of tools to search and filter documents.
Now to answer your question, there is no quick way to do so unless you do some preprocessing, meaning that you store different information about the data (called metadata).
This is a huge subject and I don't have enough expertise to give you all the answers, but I can give you a simple tip: Use indexes.
An index is a sorted key/value map where, for every value, we store the documents that contain that value (or the file + position of the JSON document). For example, an index for the name property would look like this:
{
    263: ('jsonfile10.json', '0'),
    264: ('jsonfile10.json', '30'),
    # The json document can be found on the jsonfile10.json file on line 30
}
By keeping an index for the most queried values, you can turn a linear-time search into a logarithmic-time search, not to mention that inserting a new document is much faster. In your case, you seem to need an index only on the name field.
Creating/updating the index is done when you insert, update or remove a document. Using a balanced binary tree can accelerate the updates on the index.
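A minimal in-memory sketch of such an index follows. It uses a plain dict (a hash map rather than the sorted map described above, so lookups take constant time on average), and build_name_index and the sample documents are hypothetical; in practice the documents would be loaded from the files with json.load.

```python
from collections import defaultdict


def build_name_index(documents):
    """Map each name to the (userid, entry) pairs where it occurs."""
    index = defaultdict(list)
    for doc in documents:
        for entry in doc["result"]["list"]:
            index[entry["name"]].append((doc["userid"], entry))
    return index


# Sample documents shaped like the JSON in the question (second one is made up).
documents = [
    {"userid": 34535367, "result": {"list": [
        {"name": 264, "age": 64, "id": 456345345},
        {"name": 263, "age": 42, "id": 364563463456},
    ]}},
    {"userid": 111, "result": {"list": [
        {"name": 264, "age": 30, "id": 1},
    ]}},
]

index = build_name_index(documents)
for userid, entry in index[264]:
    print(userid, entry["id"], entry["age"])
```

For millions of files the index would not fit comfortably in a script like this, so it would need to be persisted, which is where a real database becomes the easier option.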
As a suggestion, why don't you just process all the incoming files and insert the data into a database? You will have a toolset to query that database. SQLite for example will do (as well as any other more sophisticated database):
http://www.sqlite.org/
http://docs.python.org/2/library/sqlite3.html
Another simple solution might be to build a file mapping name_id to /file/path. Then you can do a binary search by the name id, which takes logarithmic time. But I'd still advise using a proper database, as maintaining the index will be more cumbersome than doing some inserts/deletes.
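As a rough sketch of the database route with the standard sqlite3 module: the table name, column names and sample document below are assumptions chosen for illustration, and an in-memory database is used so the example runs on its own (a file path would be used for real data).

```python
import sqlite3

# Sample document shaped like the JSON in the question.
doc = {"userid": 34535367, "result": {"list": [
    {"name": 264, "age": 64, "id": 456345345},
    {"name": 263, "age": 42, "id": 364563463456},
]}}

conn = sqlite3.connect(":memory:")  # use a file path to persist the data
conn.execute("CREATE TABLE entries (userid INTEGER, name INTEGER, age INTEGER, id INTEGER)")
conn.execute("CREATE INDEX idx_entries_name ON entries (name)")

# One row per element of result.list, tagged with the owning userid.
rows = [(doc["userid"], e["name"], e["age"], e["id"]) for e in doc["result"]["list"]]
conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", rows)
conn.commit()

# The index on name lets SQLite avoid scanning every row for this lookup.
matches = conn.execute(
    "SELECT userid, id, age FROM entries WHERE name = ?", (264,)
).fetchall()
print(matches)
```

Loading the real files would mean repeating the insert step once per JSON file, after which every lookup by name is a single query.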
