Extracting data from nested JSON using python


I am hoping someone can help me solve this problem I am having with a nested JSON response. I have been trying to crack this for a few weeks now with no success.
Using a site's API, I am trying to create a dictionary that holds three pieces of information for each user, extracted from the JSON responses. The first JSON response holds the user's uid and crmid that I require.
The API returns a large JSON response, with an object for each account. An extract of this for a single account can be seen below:
{
    'uid': 10,
    'key': '[N#839374',
    'customerUid': 11,
    'selfPaid': True,
    'billCycleAllocationMethodUid': 1,
    'stmtPaidForAccount': False,
    'accountInvoiceDeliveryMethodUid': 1,
    'payerAccountUid': 0,
    'countryCode': None,
    'currencyCode': 'GBP',
    'languageCode': 'en',
    'customFields': {
        'field': [
            {
                'name': 'CRMID',
                'value': '11001'
            }
        ]
    },
    'consentDetails': [],
    'href': '/accounts/10'
}
I have made a loop which extracts the UID for each account:
import requests

get_accounts = requests.get('https://website.com/api/v1/accounts?access_token=' + auth_key)
all_account_info = get_accounts.json()
account_info = all_account_info['resource']
account_information = {}
for account in account_info:
    account_uid = account['uid']
I am now trying to extract the CRMID value, in this case '11001': {'name': 'CRMID', 'value': '11001'}.
I have been struggling all week to make this work. I have two problems:
1. I would like to extract the UID (which I have done) and the CRMID from the deeply nested 'customFields' dictionary in the JSON response. I manage to get as far as ['key'][0], but I am not sure how to access the next dictionary that is nested in the list.
2. I would like to store this information in a dictionary in the format below:
{'accounts': [{'uid': 10, 'crmid': 11001, 'amount': ['bill': 4027]}{'uid': 11, 'crmid': 11002, 'amount': ['bill': 1054]}]}
(The 'bill' information is going to come from a separate JSON response.)
My problem is that with every loop I design, the dictionary seems to hold only one account (the last account it loops over). I can't figure out a way to append to the dictionary instead of overwriting it whilst using a loop. If anyone has a useful link on how to do this, it would be much appreciated.
My end goal is to have a single dictionary which holds the three pieces of information for each account (uid, crmid, bill). I'm then going to export this into a CSV document.
Any help, guidance, useful links etc would be much appreciated.

Regarding question 1, it may be helpful to print each level as you go down, then work out how to access the object you get back at that level. If it is a list (a JSON array), you use index notation like [0]; if it is a dictionary, you use key notation like ['key'].
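Applied to the account extract in the question, the chain is dict, then dict, then list, then dict. A minimal sketch using a trimmed copy of that record (the values come from the question; the name-based lookup at the end is an addition, assuming CRMID may not always be the first custom field):

```python
# Trimmed copy of the account record from the question.
account = {
    'uid': 10,
    'currencyCode': 'GBP',
    'customFields': {
        'field': [
            {'name': 'CRMID', 'value': '11001'},
        ]
    },
    'href': '/accounts/10',
}

custom_fields = account['customFields']  # dict -> key notation
field_list = custom_fields['field']      # value is a list of dicts
first_field = field_list[0]              # list -> index notation
crmid = first_field['value']             # dict -> key notation again
print(crmid)  # 11001

# More robust: search the list by name instead of relying on position 0.
crmid = next(
    (f['value'] for f in account['customFields']['field'] if f['name'] == 'CRMID'),
    None,  # returned if no CRMID field exists
)
```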
Regarding question 2, your dictionary needs unique keys. You are probably looping over and replacing the whole thing each time.
The final structure you suggest is a bit off; imagine it as:
accounts = {
    '10': {
        'uid': '10',
        'crmid': 11001,
        'amount': {
            'bill': 4027
        }
    },
    '11': {
        'uid': '11',
        'crmid': 11011,
        'amount': {
            'bill': 4028
        }
    }
}
etc.
So you can access accounts['10']['crmid'] or accounts['10']['amount']['bill'] for example.
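Putting both answers together, here is a sketch of a loop that creates the dictionary once, adds one key per account, and then writes the CSV mentioned in the question. The sample records, bills_by_uid and get_custom_field are hypothetical stand-ins, since the second (bill) response isn't shown:

```python
import csv
import io

# Hypothetical stand-in for all_account_info['resource'] (two accounts).
account_info = [
    {'uid': 10, 'customFields': {'field': [{'name': 'CRMID', 'value': '11001'}]}},
    {'uid': 11, 'customFields': {'field': [{'name': 'CRMID', 'value': '11002'}]}},
]
# Hypothetical bill amounts from the separate response, keyed by uid.
bills_by_uid = {10: 4027, 11: 1054}


def get_custom_field(account, name):
    """Return the value of the named custom field, or None if it is absent."""
    for field in account.get('customFields', {}).get('field', []):
        if field.get('name') == name:
            return field.get('value')
    return None


accounts = {}  # created once, before the loop, so it accumulates entries
for account in account_info:
    uid = account['uid']
    crmid = get_custom_field(account, 'CRMID')
    # One new key per account, so earlier accounts are never overwritten.
    accounts[str(uid)] = {
        'uid': uid,
        'crmid': int(crmid) if crmid is not None else None,
        'amount': {'bill': bills_by_uid.get(uid)},
    }

# Flatten to CSV rows; swap io.StringIO() for open('accounts.csv', 'w', newline='').
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['uid', 'crmid', 'bill'])
writer.writeheader()
for record in accounts.values():
    writer.writerow({'uid': record['uid'],
                     'crmid': record['crmid'],
                     'bill': record['amount']['bill']})

print(accounts['10']['crmid'])  # 11001
print(buffer.getvalue())
```

The key point for question 2 is that `accounts = {}` sits outside the loop and each iteration assigns to a different key.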

Related

Can I loop an array inside a dictionary definition in python?

I'm trying to push data to Firebase and I was able to loop through an array and push the information on each loop.
But I need to add some pictures to this (so it would be like looping over an array inside a dictionary definition). I have all the links in an array.
This is my code so far.
def image_upload():
    for i in range(len(excel.name)):
        doc_ref = db.collection('plans').document('doc-name').collection('collection-name').document()
        doc_id = doc_ref.id
        data = {
            'bedroomLocation': excel.bedroomLocation[i],
            'bedrooms': excel.bedrooms[i],
            'brand': excel.brand[i],
            'catalog': excel.catalog[i],
            'category': excel.category[i],
            'code': excel.code[i],
            'depth': excel.depth[i],
            'description': excel.description[i],
            'fullBaths': excel.fullBaths[i],
            'garage': excel.garage[i],
            'garageBays': excel.garageBays[i],
            'garageLocation': excel.garageLocation[i],
            'garageType': excel.garageType[i],
            'date': firestore.SERVER_TIMESTAMP,
            'id': doc_id,
            'halfBaths': excel.halfBaths[i],
            'laundryLocation': excel.laundryLocation[i],
            'name': excel.name[i],
            'onCreated': excel.onCreated[i],
            'productType': excel.productType[i],
            'region': excel.region[i],
            'sqf': excel.sqf[i],
            'state': excel.state[i],
            'stories': excel.stories[i],
            'tags': [excel.tags[i]],
            'width': excel.width[i],
        }
        doc_ref.set(data)
That works fine, but I don't really know how to loop through the array of links.
This is what I tried, placed below the block I copied above:
for j in range(len(excel.gallery)):
    if len(excel.gallery[j]) != 0:
        for k in range(len(excel.gallery[j])):
            data['gallery'] = firestore.ArrayUnion(
                [{'name': excel.gallery[j][k][0], 'type': excel.gallery[j][k][1],
                  'url': excel.gallery[j][k][2]}])
            print(data)
            doc_ref.set(data)
excel.gallery has the same length as excel.name, though each position j holds a different number of links.
If I declare the gallery inside the data definition, use ArrayUnion, and predefine more than one entry, it works fine, but I need to use that array to push the information to Firebase.

excel.gallery is actually a matrix, not a dictionary. This is one of the example outputs for it: [[['Images/CFK_0004-Craftmark_Oakmont_Elevation_1.jpeg', 'Elevation', 'url example from firebase'], .... and it goes on for each file. I'm testing with 8 images and 2 plans, so my matrix is 2x4 in this case. But a position may contain no files at all if none match. What I'm looking for is to add all the URLs for that plan to the data before it is pushed (or after; the order doesn't matter).
This works:
'gallery': firestore.ArrayUnion(
    [{'name': 'Example Name', 'type': 'Elevation',
      'url': 'Example url'},
     {'name': 'Example Name2', 'type': 'First Floor',
      'url': 'Example url2'}])
But I need to populate that field by looping through excel.gallery.

I'm a little confused by when you say, "I have all the links in an array". Could we see that array and what kind of output you are looking for?
Also, assuming that excel.gallery is a dictionary, you could clean up your code substantially using the items() method.
for j, k in excel.gallery.items():
    if k:
        data['gallery'] = firestore.ArrayUnion([{'name': k[0], 'type': k[1], 'url': k[2]}])
        print(data)
        doc_ref.set(data)
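In both snippets above, data['gallery'] is reassigned for every image, so only the last one survives. A sketch of the alternative, treating excel.gallery as the matrix described in the question: build the complete list of gallery dicts for plan i first, then assign it once before doc_ref.set(data). The sample gallery, names and build_gallery are hypothetical, and the Firestore calls appear only in comments because they need a live project. A plain list should be enough for a freshly created document; firestore.ArrayUnion(gallery_items) is the option when appending to an array that already exists.

```python
# Hypothetical sample mirroring excel.gallery: one inner list per plan,
# each entry being [name, type, url]. Plan B has no matching files.
gallery = [
    [
        ['Images/plan_a_elevation.jpeg', 'Elevation', 'url-a1'],
        ['Images/plan_a_first_floor.jpeg', 'First Floor', 'url-a2'],
    ],
    [],
]
names = ['Plan A', 'Plan B']  # stands in for excel.name


def build_gallery(entries):
    """Convert [[name, type, url], ...] into a list of dicts for one plan."""
    return [{'name': name, 'type': kind, 'url': url} for name, kind, url in entries]


payloads = []
for i in range(len(names)):
    data = {'name': names[i]}            # plus the other fields from the question
    gallery_items = build_gallery(gallery[i])
    if gallery_items:                    # skip the field when a plan has no files
        data['gallery'] = gallery_items  # or firestore.ArrayUnion(gallery_items)
    payloads.append(data)                # in the real code: doc_ref.set(data)

print(payloads[0]['gallery'][1]['type'])  # First Floor
print('gallery' in payloads[1])           # False
```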

Filter within Mongo embedded document by range

I have a mongo collection that looks like this:
{
    '_id': '...',
    'friends': {
        'id1': {'name': 'john', 'dateAdded': ISODate(...)},
        'id2': {'name': 'joe', 'dateAdded': ISODate(...)},
        ...
    }
}
I would like to filter the collection by the friends' dateAdded attribute without changing the collection model.
Is there an operator that makes filtering inside an embedded document dictionary possible?
Where the query will look like this:
self.collection.find({
    'friends.$operator.dateAdded': {
        '$gte': datetime.datetime(2000, 1, 1),
        '$lte': datetime.datetime(2001, 9, 1)
    }
})

There is no operator to perform what you are asking. In (very) simplistic terms, think of MongoDB as key/value store; you need to know the key in order to determine the value.
If you have the option to, refactor your schema so that friends is an array and lose the ids. In general it is poor design to have arbitrary-named keys such as id[n].
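A sketch of that refactor, assuming pymongo and the field names from the question. friends_to_array and friend_id are hypothetical names; keeping the old key as a field value is optional if you no longer need the ids at all. Once friends is an array, $elemMatch makes both date bounds apply to the same array element (a plain dotted range on an array can be satisfied by two different elements):

```python
import datetime

# Hypothetical document in the original shape (friends keyed by id).
old_doc = {
    '_id': 'abc',
    'friends': {
        'id1': {'name': 'john', 'dateAdded': datetime.datetime(2000, 6, 1)},
        'id2': {'name': 'joe', 'dateAdded': datetime.datetime(2005, 3, 1)},
    },
}


def friends_to_array(doc):
    """Return a copy of doc with friends as a list; the old key becomes 'friend_id'."""
    new_doc = dict(doc)
    new_doc['friends'] = [
        {'friend_id': key, **value} for key, value in doc['friends'].items()
    ]
    return new_doc


new_doc = friends_to_array(old_doc)

# Query for the refactored schema: documents with at least one friend
# whose dateAdded falls inside the range.
query = {
    'friends': {
        '$elemMatch': {
            'dateAdded': {
                '$gte': datetime.datetime(2000, 1, 1),
                '$lte': datetime.datetime(2001, 9, 1),
            }
        }
    }
}
# With pymongo: self.collection.find(query)
```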

Python: Nested For Loop Overwriting Entire Dictionary When Looping Through List

Goal:
I'm trying to iterate on a copy of a nested dictionary (based on a simple JSON schema) to build individual JSON payloads for web server requests, each representing a team and its members.
Each payload is sourced from a dictionary outside of the loop containing the team as a key and the ids of its users as values.
Issue:
I'm able to successfully copy the source dictionary and create the team dictionary including its first member, but on the second iteration of the list, when adding additional members, the first member is overwritten instead of the second being added to the dictionary payload.
This is my first time working with nested dictionaries so any tips would be highly appreciated.
from copy import deepcopy

# source dictionary
teams_dict = {'Boston': ['1234', '5678'],
              'Atlanta': ['9876', '4321']}
# schema to be modified
payload_schema = {"data":
                  {"id": None, "type": "teams", "attributes":
                   {"name": None}, "relationships":
                   {"members": {"data": [{"id": None, "type": "users"}]}}}}
# loop
for team, members in teams_dict.items():
    team_load = deepcopy(payload_schema)
    team_load['data']['attributes']['name'] = team
    #print(f"Now creating team {team}")
    for member in members:
        team_load['data']['relationships']['members']['data'][0]['id'] = member
        team_load['data']['relationships']['members']['data'][0]['type'] = 'users'
        print(team_load)
        #print(f"Added user id {member} to payload")
I end up with a payload only containing the 2nd member since the first is overwritten:
print(team_load)
{'data': {'id': None, 'type': 'teams', 'attributes': {'name': 'Atlanta'}, 'relationships': {'members': {'data': [{'id': '4321', 'type': 'users'}]}}}}
Ideally it would look like this:
print(team_load)
{'data': {'id': None, 'type': 'teams', 'attributes': {'name':'Atlanta'}, 'relationships': {'members': {'data': [{'id': '9876','type': 'users'},{'id': '4321','type': 'users'}]}}}}

The problem is that you're always writing to index 0 with this:
team_load['data']['relationships']['members']['data'][0]['id']=member
team_load['data']['relationships']['members']['data'][0]['type']='users'
this is a list:
team_load['data']['relationships']['members']['data']
so you need to append to it each time.
Since you're dealing with nested objects, I'd make the member info another object and remove it from the payload schema:
payload_schema = {"data":
{"id":None,"type":"teams","attributes":
{"name":None},"relationships":
{"members":{"data":[]}}}}
member_schema = {"id":None,"type":"users"}
Then in the inner loop:
for member in members:
    member_load = deepcopy(member_schema)
    member_load['id'] = member
    team_load['data']['relationships']['members']['data'].append(member_load)
print(team_load)
You don't need to set the type to "users" since it's already set in the schema, but you could set it to a different value if you wanted to.
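Assembled into one runnable piece, the answer's approach looks roughly like this. Collecting each finished payload in a payloads dict is an addition (so that Boston's payload isn't lost once the outer loop moves on to Atlanta); in the real script you might send each request inside the loop instead:

```python
from copy import deepcopy

teams_dict = {'Boston': ['1234', '5678'],
              'Atlanta': ['9876', '4321']}

# Team schema with an empty members list; members are appended per user.
payload_schema = {"data": {"id": None, "type": "teams",
                           "attributes": {"name": None},
                           "relationships": {"members": {"data": []}}}}
member_schema = {"id": None, "type": "users"}

payloads = {}
for team, members in teams_dict.items():
    team_load = deepcopy(payload_schema)  # fresh copy per team
    team_load['data']['attributes']['name'] = team
    member_list = team_load['data']['relationships']['members']['data']
    for member in members:
        member_load = deepcopy(member_schema)  # fresh dict per member
        member_load['id'] = member
        member_list.append(member_load)        # append instead of writing to [0]
    payloads[team] = team_load

print(payloads['Atlanta'])
# {'data': {'id': None, 'type': 'teams', 'attributes': {'name': 'Atlanta'}, 'relationships': {'members': {'data': [{'id': '9876', 'type': 'users'}, {'id': '4321', 'type': 'users'}]}}}}
```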
Hope this helps!

Python Object ID

I would like to create an object ID in Python. Let me explain:
I know that MySQL, SQLite, MongoDB, etc. exist, but I would like to at least create an object ID for storing data in JSON.
Before, I was putting the JSON info inside a list, and the ID was the index of this JSON in the list, for example:
data = [{"name": userName}]
data[0]["id"] = len(data) - 1
Then I realized that was wrong and obviously doesn't look like an ObjectID. Then I thought the ID could be the date and time together, but I thought that was wrong too. So I would like to know the best way to make something like an ObjectID that represents this JSON inside the list. This list will get longer; it is for users or clients (it's just a personal project). What would an example of a method for creating the ID look like?
Thanks so much, I hope I explained it well.

If you just want to create a locally unique value, you could use a really simple autoincrement approach.
data = [
{"name": "Bill"},
{"name": "Javier"},
{"name": "Jane"},
{"name": "Xi"},
{"name": "Nosferatu"},
]
current_id = 1
for record in data:
    record["id"] = current_id
    current_id += 1
    print(record)
# {'name': 'Bill', 'id': 1}
# {'name': 'Javier', 'id': 2}
# {'name': 'Jane', 'id': 3}
# {'name': 'Xi', 'id': 4}
# {'name': 'Nosferatu', 'id': 5}
To add a new value if you're not initializing like this, you can find the highest id so far with max(d.get("id", 0) for d in data) and add 1.
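Wrapped in a small helper, that lookup might look like this (next_id is a hypothetical name; the default=0 argument to max, available in Python 3.4+, covers an empty list):

```python
def next_id(records):
    """Return one more than the highest existing "id" (1 for an empty list)."""
    return max((r.get("id", 0) for r in records), default=0) + 1


data = [{"name": "Bill", "id": 1}, {"name": "Javier", "id": 2}]
data.append({"name": "Jane", "id": next_id(data)})
print(data[-1])  # {'name': 'Jane', 'id': 3}
```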
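This may cause problems depending on your use case: for example, if the record with the highest id is deleted, that id can be handed out again, and two processes adding records at the same time can pick the same id. If you don't want to worry about that, you could also throw UUIDs at it; they're heavier but easy to generate with reasonable confidence of uniqueness.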
from uuid import uuid4
data = [{"name": "Conan the Librarian"}]
data[0]["id"] = str(uuid4())
print(data)
# 'id' will be different each time; example:
# [{'name': 'Conan the Librarian', 'id': '85bb4db9-c450-46e3-a027-cb573a09f3e3'}]
Without knowing your actual use case, though, it's impossible to say whether one, either, or both of these approaches would be useful or appropriate.

Python 2.7: Removing a dict in a list if a key is missing or empty

The list in question looks something like this, just a list of Blogs with any Posts if applicable:
blogs = {
    {
        'id': 1,
        'title': 'Foodies',
        'posts': {
            { 'id': 28, 'title': 'Sourdough Bread starter', 'blog_id': 1},
            { 'id': 64, 'title': 'How to make brioche in under 4 hours', 'blog_id': 1}
        }
    },{
        'id': 2,
        'title': 'Southern Meals',
        'posts': {}
    },{
        'id': 3,
        'title': 'Vegomamma'
    },{
        'id': 4,
        'title': 'Culinare'
    }
}
I only want the Blogs with Posts, so I'm trying to reduce the list so that I would only get the first dict returned.
Here's what I tried, which threw the error:
"'dict' object has no attribute 'posts'"
Which I understand, but I am trying to remove dicts without that attribute.
for b in blogs:
    if "posts" not in b or b.posts.count() == 0:
        blogs.remove(b)
Why does this fail? It seemed like a pretty simple solution, and I've used it before.
This app is built in Python and Angular, so I could do the filtering in Angular, but I'd rather take care of it in Python.
EDIT: Added the exact error message.

First of all, your data structure is not a valid Python object: it is a set containing dictionaries, which are not hashable (this raises a TypeError). Based on your question body, it seems that it's meant to be a list. Secondly, you don't need to check for the existence of posts within the dictionary; you can just use dict.get() within a list comprehension. The get method returns None if the key is missing:
In [20]: [b for b in blogs if b.get('posts')]
Out[20]:
[{'posts': [{'id': 28, 'title': 'Sourdough Bread starter', 'blog_id': 1},
{'id': 64, 'title': 'How to make brioche in under 4 hours', 'blog_id': 1}],
'id': 1,
'title': 'Foodies'}]
Also, note that since posts is supposed to be an iterable, the check fails when it's empty (an empty container evaluates as False). That's why I just used if b.get('posts').
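Besides the attribute error, the original loop has a second problem: calling blogs.remove(b) while iterating over blogs shifts the remaining items left, so the element right after each removed one is skipped. A small demonstration on a list version of the data (the broken copy exists only to show the bug):

```python
blogs = [
    {'id': 1, 'title': 'Foodies', 'posts': [
        {'id': 28, 'title': 'Sourdough Bread starter', 'blog_id': 1},
        {'id': 64, 'title': 'How to make brioche in under 4 hours', 'blog_id': 1},
    ]},
    {'id': 2, 'title': 'Southern Meals', 'posts': []},
    {'id': 3, 'title': 'Vegomamma'},
    {'id': 4, 'title': 'Culinare'},
]

# Buggy pattern: mutating a list while looping over it.
broken = list(blogs)
for b in broken:
    if not b.get('posts'):
        broken.remove(b)
print([b['id'] for b in broken])  # [1, 3]  (Vegomamma was skipped, not removed)

# Fix: build a new list instead (assign to blogs[:] to filter in place).
blogs_with_posts = [b for b in blogs if b.get('posts')]
print([b['id'] for b in blogs_with_posts])  # [1]
```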
