Let's say I have a data structure that looks like the following (this is greatly simplified; my actual data has a significant amount of day-specific data for each job on each date):
data = {
    '2019-01-01': {
        'job-1-id': {'name': 'Job 1', 'address': '123 main st.'},
        'job-2-id': {'name': 'Job 2', 'address': '824 1st Ave.'},
    },
    '2019-01-02': {
        'job-1-id': {'name': 'Job 1', 'address': '123 main st.'},
        'job-3-id': {'name': 'Job 3', 'address': '485 Pleasant Rd.'}
    }
}
What I would like to do is flatten this, pushing the date and job id to an array of objects. E.g.:
data_flat = [
    {'id': 'job-1-id', 'date': '2019-01-01', 'name': 'Job 1', 'address': '123 main st.'},
    {'id': 'job-2-id', 'date': '2019-01-01', 'name': 'Job 2', 'address': '824 1st Ave.'},
    {'id': 'job-1-id', 'date': '2019-01-02', 'name': 'Job 1', 'address': '123 main st.'},
    {'id': 'job-3-id', 'date': '2019-01-02', 'name': 'Job 3', 'address': '485 Pleasant Rd.'},
]
Obviously I can loop over it and build a new array:
data_flat = []
for date, jobs in data.items():
    for job_id, job in jobs.items():
        data_flat.append({'id': job_id, 'date': date, etc...})
But is there a more pythonic/efficient way to do this using list comprehension with nested data like this? About all I can think of is using list comprehension for the inner loop, and then instead of appending, using extend to build the list. Thoughts?
A possible list comprehension solution is as follows: we unpack each job dictionary and add the id and date key-value pairs to it while iterating over the two for loops:
[{**job, 'id': job_id, 'date': date} for date, jobs in data.items() for job_id, job in jobs.items()]
Which, written as a traditional for loop, looks like:
data_flat = []
for date, jobs in data.items():
    for job_id, job in jobs.items():
        data_flat.append({**job, 'id': job_id, 'date': date})
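For the sample data above, a quick sanity check (just a sketch, reusing the data and data_flat literals from the question) confirms the comprehension builds the same records; dict equality ignores key order, so the assertion passes:
result = [{**job, 'id': job_id, 'date': date}
          for date, jobs in data.items()
          for job_id, job in jobs.items()]
assert result == data_flat  # same list of dicts, key order aside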
A flat list can also be built easily with reduce. All you need is the initializer, the third argument of the reduce function:
from functools import reduce

reduce(
    lambda _list, date: _list.extend(
        {'date': date, 'id': _id, **detail}
        for _id, detail in data[date].items()
    ) or _list,
    data,
    [])
The above code works in both Python 2 and Python 3; just note that in Python 3 reduce is no longer a builtin, so it has to be imported with from functools import reduce, as shown above.
I have this snapshot of my dataset
test = {'data': [{'name': 'john',
                  'insights': {'data': [{'account_id': '123',
                                         'test_id': '456',
                                         'date_start': '2022-12-31',
                                         'date_stop': '2023-01-29',
                                         'impressions': '4070',
                                         'spend': '36.14'}],
                               'paging': {'cursors': {'before': 'MAZDZD', 'after': 'MAZDZD'}}},
                  'status': 'ACTIVE',
                  'id': '789'},
                 {'name': 'jack', 'status': 'PAUSED', 'id': '420'}]
        }
I want to create a pandas dataframe where the columns are the name, date_start, date_stop, impressions, and spend.
When I tried json_normalize(), it raised an error because some of the keys are missing whenever 'status': 'PAUSED'. Is there a way to drop the entries with missing keys from the list, or another way of using json_normalize()? I tried errors='ignore' but it doesn't work either.
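One possible workaround (a sketch, and it assumes you only want rows for the entries that actually contain insights data, i.e. it drops the PAUSED entry entirely) is to filter the list before calling json_normalize:
import pandas as pd

# keep only the entries that actually have the nested insights data
records = [item for item in test['data'] if 'insights' in item]

df = pd.json_normalize(
    records,
    record_path=['insights', 'data'],  # flatten insights -> data
    meta=['name'],                     # carry the outer name along
)
df = df[['name', 'date_start', 'date_stop', 'impressions', 'spend']]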
I need to compare the values of the items in two different dictionaries.
Let's say that dictionary RawData has items that represent phone numbers and number names.
RawData, for example, has items like: {'name': 'Customer Service', 'number': '123987546'}, {'name': 'Switchboard', 'number': '48621364'}
Now I've got dictionary FilteredData, which already contains some items from RawData: {'name': 'IT-support', 'number': '32136994'}, {'name': 'Company Customer Service', 'number': '123987546'}
As you can see, Customer Service and Company Customer Service both have the same values, but different keys. In my project, there might be hundreds of similar duplicates, and we only want unique numbers to end up in FilteredData.
FilteredData is what we will be using later in the code, and RawData will be discarded.
Their names (keys) can be close duplicates, but not their numbers (values).
There are two ways to do this.
A. Remove the duplicate items in RawData, before appending them into FilteredData.
B. Append them into FilteredData, and go through the numbers (values) there, removing the duplicates. Can I use a set here to do that? It would work on a list, obviously.
I'm not looking for the most time-efficient solution. I'd like the simplest and easiest-to-learn one, if and when someone takes over my job someday. In my project it's mandatory that the next person working on the code gets a quick grip of it.
I've already looked at sets, and tried to tackle the problem by nesting two for loops, but something tells me there has to be an easier way.
Of course I might have missed the obvious solution here.
Thanks in advance!
I hope I understand your problem here:
data = [{'name': 'Customer Service', 'number': '123987546'}, {'name': 'Switchboard', 'number': '48621364'}]
newdata = [{'name': 'IT-support', 'number': '32136994'}, {'name': 'Company Customer Service', 'number': '123987546'}]

def main():
    numbers = set()
    for entry in data:
        numbers.add(entry['number'])

    for entry in newdata:
        if entry['number'] not in numbers:
            data.append(entry)

    print(data)

main()
Output:
[{'name': 'Customer Service', 'number': '123987546'},
{'name': 'Switchboard', 'number': '48621364'},
{'name': 'IT-support', 'number': '32136994'}]
What you can do is take dict.values(), create a set of those to remove duplicates, and then go through the old dictionary, find the first key with each value and add it to a new one. Keep the set around: when you get the next entry, try adding its number to the set and check whether the length of the set grew. If it did, the number is unique and you can add the entry to the dict.
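A rough sketch of that set-length idea, using the same two lists of dicts as the other answers (FilteredData entries are processed first so they win when a number appears in both):
seen_numbers = set()
unique_entries = []

for entry in FilteredData + RawData:
    before = len(seen_numbers)
    seen_numbers.add(entry['number'])
    if len(seen_numbers) > before:  # the set grew, so this number is new
        unique_entries.append(entry)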
If you're willing to change how FilteredData is structured, you can just use a dict and use the number as your key:
RawData = [
    {'name': 'Customer Service', 'number': '123987546'},
    {'name': 'Switchboard', 'number': '48621364'}
]

# Change how FilteredData is structured
FilteredDataMap = {
    '32136994': {'name': 'IT-support', 'number': '32136994'},
    '123987546': {'name': 'Company Customer Service', 'number': '123987546'}
}

for item in RawData:
    number = item.get('number')
    if number not in FilteredDataMap:
        FilteredDataMap[number] = item

# If you need the list of items
FilteredData = list(FilteredDataMap.values())
You can just pull the actual list from the Map using .values()
I take it the numbers are unique. Then another solution would be to take advantage of the uniqueness of dictionary keys. This means converting each list of dictionaries to a dictionary of number:name pairs. Then you simply need to update RawData with FilteredData.
RawData = [
    {'name': 'Customer Service', 'number': '123987546'},
    {'name': 'Switchboard', 'number': '48621364'}
]

FilteredData = [
    {'name': 'IT-support', 'number': '32136994'},
    {'name': 'Company Customer Service', 'number': '123987546'}
]

def convert_list(input_list):
    return {item['number']: item['name'] for item in input_list}

def unconvert_dict(input_dict):
    return [{'name': val, 'number': key} for key, val in input_dict.items()]

NewRawData = convert_list(RawData)
NewFilteredData = convert_list(FilteredData)

NewRawData.update(NewFilteredData)  # update() modifies NewRawData in place and returns None
DesiredResultConverted = NewRawData
DesiredResult = unconvert_dict(DesiredResultConverted)
In this example, the variables will have the following values:
NewRawData = {'123987546':'Customer Service', '48621364': 'Switchboard'}
NewFilteredData = {'32136994': 'IT-support', '123987546': 'Company Customer Service'}
When you update NewRawData with NewFilteredData, Company Customer Service will overwrite Customer Service as the value associated with the key 123987546. So,
DesiredResultConverted = {'123987546':'Company Customer Service', '48621364': 'Switchboard', '32136994': 'IT-support'}
Then, if you still prefer the original format, you can "unconvert" back.
I've got a JSON-format list with some dictionaries within it; it looks like the following:
[{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
The number of entries within the list can be up to 100. I plan to present the 'name' for each entry, one result at a time, for those that have London as the town. The rest are of no use to me. I'm a beginner at Python so I would appreciate a suggestion on how to go about this efficiently. I initially thought it would be best to remove all entries that don't have London and then go through them one by one.
I also wondered whether it might be quicker not to filter but to cycle through the entire JSON and select the names of the entries whose town is London.
You can use filter:
data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
london_dicts = filter(lambda d: d['venue']['town'] == 'London', data)
for d in london_dicts:
print(d)
This is as efficient as it can get because:
- the loop is written in C (in the case of CPython)
- filter returns an iterator (in Python 3), which means the results are produced one by one as required rather than all loaded into memory
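If you only need the names (an extension of the same idea, not part of the original answer), a generator expression keeps the same lazy behaviour:
london_names = (d['name'] for d in data if d['venue']['town'] == 'London')
for name in london_names:
    print(name)  # prints Alfred, then Mary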
One way is to use list comprehension:
>>> data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
>>> [d for d in data if d['venue']['town'] == 'London']
[{'id': 17,
'name': 'Alfred',
'venue': {'id': 456, 'town': 'London'},
'month': 'February'},
{'id': 17,
'name': 'Mary',
'venue': {'id': 56, 'town': 'London'},
'month': 'December'}]
I have a dataframe df
   id    price        date  zipcode
 u734  8923944  2017-01-05  AERIU87
 uh72  9084582  2017-07-28  BJDHEU3
 u029   299433  2017-09-31  038ZJKE
I want to create a dictionary with the following structure
{'id': xxx, 'data': {'price': xxx, 'date': xxx, 'zipcode': xxx}}
What I have done so far
ids = df['id']
prices = df['price']
dates = df['date']
zips = df['zipcode']
d = {'id':idx, 'data':{'price':p, 'date':d, 'zipcode':z} for idx,p,d,z in zip(ids,prices,dates,zips)}
>>> SyntaxError: invalid syntax
but I get the error above.
What would be the correct way to do this, using either
list comprehension
OR
pandas .to_dict()
bonus points: what is the complexity of the algorithm, and is there a more efficient way to do this?
I'd suggest the list comprehension.
v = df.pop('id')

data = [
    {'id': i, 'data': j}
    for i, j in zip(v, df.to_dict(orient='records'))
]
Or a compact version,
data = [dict(id=i, data=j) for i, j in zip(df.pop('id'), df.to_dict(orient='records'))]
Note that if you're popping id inside the expression, it has to be the first argument to zip: Python evaluates arguments left to right, so the column is removed before to_dict is called.
print(data)
[{'data': {'date': '2017-09-31',
           'price': 299433,
           'zipcode': '038ZJKE'},
  'id': 'u029'},
 {'data': {'date': '2017-01-05',
           'price': 8923944,
           'zipcode': 'AERIU87'},
  'id': 'u734'},
 {'data': {'date': '2017-07-28',
           'price': 9084582,
           'zipcode': 'BJDHEU3'},
  'id': 'uh72'}]
I am very new to python programming and have yet to buy a textbook on the matter (I am buying one from the store or Amazon today). In the meantime, can you help me with the following problem I have encountered?
I have a list of dictionary objects like this:
stock = [
    {'date': '2012', 'amount': '1.45', 'type': 'one'},
    {'date': '2012', 'amount': '1.4', 'type': 'two'},
    {'date': '2011', 'amount': '1.35', 'type': 'three'},
    {'date': '2012', 'amount': '1.35', 'type': 'four'}
]
I would like to sort the list by the date column and then by the amount column so that the sorted list looks like this:
stock = [
    {'date': '2011', 'amount': '1.35', 'type': 'three'},
    {'date': '2012', 'amount': '1.35', 'type': 'four'},
    {'date': '2012', 'amount': '1.4', 'type': 'two'},
    {'date': '2012', 'amount': '1.45', 'type': 'one'}
]
I now think I need to use sorted(), but as a beginner I am having difficulties understanding the concepts I see.
I tried this:
from operator import itemgetter
all_amounts = itemgetter("amount")
stock.sort(key = all_amounts)
but this resulted in a list that was sorted alphanumerically rather than numerically.
Can someone please tell me how to achieve this seemingly simple sort? Thank-you!
Your sorting condition is too complicated for an operator.itemgetter. You will have to use a lambda function:
stock.sort(key=lambda x: (int(x['date']), float(x['amount'])))
or
all_amounts = lambda x: (int(x['date']), float(x['amount']))
stock.sort(key=all_amounts)
Start by converting your data into a proper format:
stock = [
    {'date': int(x['date']), 'amount': float(x['amount']), 'type': x['type']}
    for x in stock
]
Now stock.sort(key=all_amounts) will sort correctly.
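As a follow-up sketch (not part of the original answer): once date and amount are real numbers, operator.itemgetter can handle the two-level sort on its own:
from operator import itemgetter

stock.sort(key=itemgetter('date', 'amount'))  # sort by date, then by amount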
As you appear to be new to programming, here's a word of general advice if I may:
Proper data structure is 90 percent of success. Do not try to work around broken data by writing more code. Create a structure adequate to your task and write as little code as possible.
You can also use the fact that Python's sort is stable:
stock.sort(key=lambda x: float(x["amount"]))
stock.sort(key=lambda x: int(x["date"]))
Since the items with the same key keep their relative positions when sorting (they're never swapped), you can build up a complicated sort by sorting multiple times.