Minimizing and aggregating a multi-level dictionary

Given a multi-level dictionary (the number of levels is not known beforehand), I want to modify it so that it has no more than 3 levels.
For example, below is a 5-level dictionary as input:
{ K11:
    { K21:
        { K31:
            { K41: VAL41,
              K42: VAL42
            },
          K32: VAL32,
          K33:
            { K43:
                { K51: V51 }
            }
        }
    }
}
The desired output is the following 3-level dict:
{ K11:
    { K21:
        { K31.K41: VAL41,
          K31.K42: VAL42,
          K32: VAL32,
          K33.K43.K51: V51
        }
    }
}
Basically, starting from level 4, I want to join the keys together and assign the result at level 3 (let's assume these key combinations are always unique).
Any idea how to implement this? I'm trying to write a recursive function that digs down to the last level and then somehow rebuilds the dictionary on the way back up, but so far without success.
I'd appreciate it if you could share your thoughts, thanks!

Try using json_normalize from pandas, like this:
from pandas import json_normalize
d = {'K11':
        {'K21':
            {'K31':
                {'K41': 'VAL41',
                 'K42': 'VAL42'
                },
             'K32': 'VAL32',
             'K33':
                {'K43':
                    {'K51': 'V51'}
                }
            }
        }
    }
# Flatten everything below the third level into dotted keys
print(json_normalize(d['K11']['K21'], max_level=2).to_dict('records'))
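For reference, the recursive approach the question describes can also be sketched in plain Python, without pandas. The helper names `flatten` and `limit_depth` and the `sep` and `max_depth` parameters below are illustrative choices, not part of any library; the sketch assumes every nested value that should be collapsed is a `dict`.

```python
def flatten(d, parent_key="", sep="."):
    """Collapse a nested dict into one level, joining keys with `sep`."""
    flat = {}
    for key, value in d.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict) and value:
            # Non-empty dict: keep digging and carry the joined key along.
            flat.update(flatten(value, full_key, sep))
        else:
            # Leaf value (or empty dict): store it under the joined key.
            flat[full_key] = value
    return flat


def limit_depth(d, max_depth=3, sep="."):
    """Return a copy of `d` with at most `max_depth` levels of nesting."""
    if max_depth <= 1:
        # Last allowed level: everything below is merged into it.
        return flatten(d, sep=sep)
    return {
        key: limit_depth(value, max_depth - 1, sep) if isinstance(value, dict) else value
        for key, value in d.items()
    }


d = {'K11': {'K21': {'K31': {'K41': 'VAL41', 'K42': 'VAL42'},
                     'K32': 'VAL32',
                     'K33': {'K43': {'K51': 'V51'}}}}}
result = limit_depth(d, max_depth=3)
print(result)
```

Unlike the `max_level` argument above, this keeps the outer `K11` and `K21` levels in place and collapses any depth below level 3, however deep the input goes.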

Related

How do I update fields for multiple elements in an array with different values in MongoDB?

I have data of the form:
{
    '_id': asdf123b51234
    'field2': 0
    'array': [
        0: {
            'unique_array_elem_id': id
            'nested_field': {
                'new_field_i_want_to_add': value
            }
        }
        ...
    ]
}
I have been trying to update like this:
for doc in update_dict:
    collection.find_one_and_update(
        {'_id': doc['_id']},
        {'$set': {
            'array.$[elem].nested_field.new_field_i_want_to_add': doc['new_field_value']
        }},
        array_filters=[{'elem.unique_array_elem_id': doc['unique_array_elem_id']}]
    )
But it is painfully slow. Updating all of my data will take several days running continuously. Is there a way to update this nested field for all array elements for a given document at once?
Thanks a lot
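One way to cut down the number of round trips, sketched here under assumptions and not tested against a live database, is to group the pending changes by `_id` so that each document gets a single update with one array filter per element. The helper `build_update_specs` and the placeholder names `e0`, `e1`, ... are hypothetical; the input is assumed to be an iterable of dicts with the `_id`, `unique_array_elem_id` and `new_field_value` keys used above.

```python
from collections import defaultdict


def build_update_specs(update_dict):
    """Group per-element changes into one (filter, update, array_filters) spec per document."""
    by_doc = defaultdict(list)
    for doc in update_dict:
        by_doc[doc['_id']].append(doc)

    specs = []
    for doc_id, changes in by_doc.items():
        set_fields = {}
        array_filters = []
        for i, change in enumerate(changes):
            name = f'e{i}'  # hypothetical placeholder, one per array element
            path = f'array.$[{name}].nested_field.new_field_i_want_to_add'
            set_fields[path] = change['new_field_value']
            array_filters.append({f'{name}.unique_array_elem_id': change['unique_array_elem_id']})
        specs.append(({'_id': doc_id}, {'$set': set_fields}, array_filters))
    return specs


# Example input with made-up values.
pending = [
    {'_id': 'doc1', 'unique_array_elem_id': 'a', 'new_field_value': 1},
    {'_id': 'doc1', 'unique_array_elem_id': 'b', 'new_field_value': 2},
    {'_id': 'doc2', 'unique_array_elem_id': 'c', 'new_field_value': 3},
]
specs = build_update_specs(pending)

# With pymongo, each spec could then be wrapped and sent in batches, e.g.:
#   from pymongo import UpdateOne
#   ops = [UpdateOne(f, u, array_filters=af) for f, u, af in specs]
#   collection.bulk_write(ops, ordered=False)
```

Whether this is actually faster depends on the data; documents with very many changed elements may need their filters split across several updates.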

Using pandas.json_normalize to "unfold" a dictionary of a list of dictionaries

I am new to Python (and coding in general) so I'll do my best to explain the challenge I'm trying to work through.
I'm working with a large dataset which was exported as a CSV from a database. However, one column in this CSV export contains a nested list of dictionaries (as best as I can tell). I've looked around extensively online for a solution, including on Stack Overflow, but haven't found a complete one. I think I understand conceptually what I'm trying to accomplish, but I'm not clear on the best method or data preparation process to use.
Here is an example of the data (pared down to just the two columns I'm interested in):
{
  "app_ID": {
    "0": "1abe23574",
    "1": "4gbn21096"
  },
  "locations": {
    "0": "[ {"loc_id" : "abc1", "lat" : "12.3456", "long" : "101.9876"},
           {"loc_id" : "abc2", "lat" : "45.7890", "long" : "102.6543"} ]",
    "1": "[ ]"
  }
}
Basically, each app_ID can have multiple locations tied to it, or none at all, as seen above. Following some guides I found online, I have tried using pandas' json_normalize() function to "unfold" the list of dictionaries so that each one gets its own row in a pandas DataFrame.
I'd like to end up with something like this:
loc_id  lat      long      app_ID
abc1    12.3456  101.9876  1abe23574
abc2    45.7890  102.6543  1abe23574
etc...
I am learning about how to use the different functions of json_normalize, like "record_path" and "meta", but haven't been able to get it to work yet.
I have tried loading the json file into a Jupyter Notebook using:
import json
import pandas as pd
with open('location_json.json', 'r') as f:
    data = json.loads(f.read())
df = pd.json_normalize(data, record_path=['locations'])
but it only creates a dataframe that is 1 row and multiple columns long, where I'd like to have multiple rows generated from the inner-most dictionary that tie back to the app_ID and loc_ID fields.
Attempt at a solution:
I was able to get close to the dataframe format I wanted using:
with open('location_json.json', 'r') as f:
    data = json.loads(f.read())
df = pd.json_normalize(data['locations']['0'])
but that would then require some kind of iteration through the list in order to create a dataframe, and then I'd lose the connection to the app_ID fields. (As best as I can understand how the json_normalize function works).
Am I on the right track trying to use json_normalize, or should I start over again and try a different route? Any advice or guidance would be greatly appreciated.

I'm not sure recommending the convtools library is a good idea since you are a beginner: the library is almost like another Python on top of Python. It lets you define data conversions dynamically (generating Python code under the hood).
Anyway, here is the code, assuming I understood the input data correctly:
import json
from convtools import conversion as c
data = {
    "app_ID": {"0": "1abe23574", "1": "4gbn21096"},
    "locations": {
        "0": """[ {"loc_id" : "abc1", "lat" : "12.3456", "long" : "101.9876" },
        {"loc_id" : "abc2", "lat" : "45.7890", "long" : "102.6543"} ]""",
        "1": "[ ]",
    },
}
# define it once and use multiple times
converter = (
    c.join(
        # converts "app_ID" data to iterable of dicts
        (
            c.item("app_ID")
            .call_method("items")
            .iter({"id": c.item(0), "app_id": c.item(1)})
        ),
        # converts "locations" data to iterable of dicts,
        # where each id like "0" is zipped to each location.
        # the result is iterable of dicts like {"id": "0", "loc": {"loc_id": ... }}
        (
            c.item("locations")
            .call_method("items")
            .iter(
                c.zip(id=c.repeat(c.item(0)), loc=c.item(1).pipe(json.loads))
            )
            .flatten()
        ),
        # join on "id"
        c.LEFT.item("id") == c.RIGHT.item("id"),
        how="full",
    )
    # process results, where 0 index is LEFT item, 1 index is the RIGHT one
    .iter(
        {
            "loc_id": c.item(1, "loc", "loc_id", default=None),
            "lat": c.item(1, "loc", "lat", default=None),
            "long": c.item(1, "loc", "long", default=None),
            "app_id": c.item(0, "app_id"),
        }
    )
    .as_type(list)
    .gen_converter()
)
result = converter(data)
assert result == [
    {'loc_id': 'abc1', 'lat': '12.3456', 'long': '101.9876', 'app_id': '1abe23574'},
    {'loc_id': 'abc2', 'lat': '45.7890', 'long': '102.6543', 'app_id': '1abe23574'},
    {'loc_id': None, 'lat': None, 'long': None, 'app_id': '4gbn21096'}
]
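For comparison, the same kind of rows can be produced with only the standard library, which may be easier to follow for a beginner. This is a minimal sketch assuming the structure shown in the question (an `app_ID` mapping and a `locations` mapping whose values are JSON strings); the function name `explode_locations` is made up for illustration. Unlike the full join above, it skips app IDs that have no locations.

```python
import json


def explode_locations(data):
    """Return one row (dict) per location, tagged with its app_ID."""
    rows = []
    for key, app_id in data["app_ID"].items():
        # Each value under "locations" is a JSON string holding a list of dicts.
        locations = json.loads(data["locations"].get(key, "[]"))
        for loc in locations:
            rows.append({**loc, "app_ID": app_id})
    return rows


data = {
    "app_ID": {"0": "1abe23574", "1": "4gbn21096"},
    "locations": {
        "0": '[{"loc_id": "abc1", "lat": "12.3456", "long": "101.9876"},'
             ' {"loc_id": "abc2", "lat": "45.7890", "long": "102.6543"}]',
        "1": "[ ]",
    },
}
rows = explode_locations(data)
print(rows)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)` if a DataFrame is needed.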

Deeply nested json - a list within a dictionary to Pandas DataFrame

I'm trying to parse nested json results.
data = {
    "results": [
        {
            "components": [
                {
                    "times": {
                        "periods": [
                            {
                                "fromDayOfWeek": 0,
                                "fromHour": 12,
                                "fromMinute": 0,
                                "toDayOfWeek": 4,
                                "toHour": 21,
                                "toMinute": 0,
                                "id": 156589,
                                "periodId": 20855
                            }
                        ],
                    }
                }
            ],
        }
    ],
}
I can get to and create dataframes for "results" and "components" lists, but cannot get to "periods" due to the "times" dict. So far I have this:
df = pd.json_normalize(data, record_path = ['results','components'])
I need a separate "periods" dataframe with the included column names and values. I would appreciate your help with this. Thank you!

I results
II components
III times
IIII periods
json_normalize should be the right tool:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html
There are 4 levels of nesting. There can be x components in results and y times in components, but isn't that kind of nesting over-engineered?
The simplest way of getting the data is:
print(data['a']['b']['c']['d'])  # and so on
In your case (results and components are lists, so index into them first):
print(data['results'][0]['components'][0]['times']['periods'])
You can access a specific nested level with this piece of code:
def get_property_from_periods(prop):
    # Collect one property from every period of every component of every result
    property_list = []
    for result in data['results']:
        for component in result['components']:
            for period in component['times']['periods']:
                property_list.append(period[prop])
    return property_list
This gives you access to one property inside periods (fromDayOfWeek, fromHour, fromMinute, ...).
After converting the JSON values, turn them into a pandas DataFrame:
print(pd.DataFrame(data, columns=["columnA", "columnB"]))
If stuck:
How to Create a table with data from JSON output in Python
Python - How to convert JSON File to Dataframe
pandas documentation:
pandas.DataFrame.from_dict
pandas.json_normalize

Mongodb aggregate query with condition

I have to run an aggregation on MongoDB from Python and am unable to do so.
Below is the structure of the MongoDB document:
{'Category': 'Male',
 'details': [{'name': 'Sachin', 'height': 6},
             {'name': 'Rohit', 'height': 5.6},
             {'name': 'Virat', 'height': 5}
            ]
}
I want to return the height where name is Sachin using aggregate. Basically, my idea is to apply the condition with $match and extract the value in the same aggregation. This can easily be done in 3 steps with if statements, but I'm looking to do it in 1 aggregate call.
Please note: the 'details' array has no fixed length.
Let me know if any more explanation is needed.

You can use $filter to achieve this:
db.collection.aggregate([
  {
    $project: {
      details: {
        $filter: {
          input: "$details",
          cond: {
            $eq: [
              "$$this.name",
              "Sachin"
            ]
          }
        }
      }
    }
  }
])
Working Mongo playground
If you use find instead, you need to be aware of the positional operator:
db.collection.find({
  "details.name": "Sachin"
},
{
  "details.$": 1
})
Working Mongo playground
If you need the result as an object, you can simply use $arrayElemAt with $ifNull.
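A rough sketch of that last suggestion, written as a Python pipeline that could be passed to pymongo's `collection.aggregate(...)`. It has not been run against a live database here, and the output field names `found` and `height` are illustrative choices.

```python
# Stage 1: keep only the array elements named "Sachin", take the first one,
# and fall back to an empty object when nothing matches.
# Stage 2: expose just the height of that element.
pipeline = [
    {
        "$project": {
            "found": {
                "$ifNull": [
                    {
                        "$arrayElemAt": [
                            {
                                "$filter": {
                                    "input": "$details",
                                    "cond": {"$eq": ["$$this.name", "Sachin"]},
                                }
                            },
                            0,
                        ]
                    },
                    {},
                ]
            }
        }
    },
    {"$project": {"_id": 0, "height": "$found.height"}},
]

# Hypothetical usage, assuming `collection` is a pymongo Collection:
#   for doc in collection.aggregate(pipeline):
#       print(doc)
```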

Pandas length miss match, handling json with different length

I have multiple JSON documents that I am trying to parse with pandas and load into a table, but because the JSON outputs differ I am running into a "Length mismatch" error.
I have two JSON documents.
Json 1
{
  "extract": {
    "details": {
      "name": "John Smith",
      "region": null,
      "add": "56 Street",
      "state": "ZL",
      "exam": {
        "lastexam": null
      }
    }
  }
}
Json 2
{
  "extract": {
    "details": {
      "name": "Will Smith",
      "region": "Jonsberg",
      "add": "3rd Street",
      "state": "TO",
      "exam": {
        "lastexam": {
          "examnumber": "6789",
          "subject_name": "Chemistry",
          "exam_time": "2020-03-06T20:21:22"
        }
      }
    }
  }
}
What I am looking for is to parse these with pandas DataFrames and populate a table like the following:
Name,region,add,state,exam_number,subject_name,exam_time
John Smith,null,56 Street,ZL,null,null,null
Will Smith,Jonsberg,3rd Street,TO,6789,Chemistry,2020-03-06 20:21:22
I am able to extract the available columns, but how do I build a DataFrame that includes all the columns and fills in null where a column does not exist in the JSON?
How do I achieve this using pandas?

See if pandas' json_normalize works for you:
import pandas as pd
# json1 and json2 are the two parsed JSON documents (dicts)
pd.concat((pd.json_normalize(json1), pd.json_normalize(json2)))
