Numeric sort of list of dictionary objects - python

I am very new to python programming and have yet to buy a textbook on the matter (I am buying one from the store or Amazon today). In the meantime, can you help me with the following problem I have encountered?
I have a list of dictionary objects like this:
stock = [
{ 'date': '2012', 'amount': '1.45', 'type': 'one'},
{ 'date': '2012', 'amount': '1.4', 'type': 'two'},
{ 'date': '2011', 'amount': '1.35', 'type': 'three'},
{ 'date': '2012', 'amount': '1.35', 'type': 'four'}
]
I would like to sort the list first by the date column and then by the amount column, so that the sorted list looks like this:
stock = [
{ 'date': '2011', 'amount': '1.35', 'type': 'three'},
{ 'date': '2012', 'amount': '1.35', 'type': 'four'},
{ 'date': '2012', 'amount': '1.4', 'type': 'two'},
{ 'date': '2012', 'amount': '1.45', 'type': 'one'}
]
I now think I need to use sorted(), but as a beginner I am having difficulty understanding the concepts I see.
I tried this:
from operator import itemgetter
all_amounts = itemgetter("amount")
stock.sort(key = all_amounts)
but this resulted in a list that was sorted alphanumerically rather than numerically.
Can someone please tell me how to achieve this seemingly simple sort? Thank-you!

Your sorting condition is too complicated for an operator.itemgetter. You will have to use a lambda function:
stock.sort(key=lambda x: (int(x['date']), float(x['amount'])))
or
all_amounts = lambda x: (int(x['date']), float(x['amount']))
stock.sort(key=all_amounts)
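If you would rather keep the original list untouched, sorted() accepts the same key function and returns a new list instead of sorting in place. A minimal sketch using the stock data from the question (stock_sorted is just an illustrative name):
all_amounts = lambda x: (int(x['date']), float(x['amount']))
stock_sorted = sorted(stock, key=all_amounts)  # new list; stock itself is unchanged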

Start by converting your data into a proper format:
stock = [
{ 'date': int(x['date']), 'amount': float(x['amount']), 'type': x['type']}
for x in stock
]
Now stock.sort(key=all_amounts) will sort the list correctly (note that list.sort sorts in place and returns None).
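Once the values are numeric, the original operator.itemgetter approach also covers the two-key sort, because itemgetter accepts several keys and returns a tuple. A sketch, assuming the conversion above has already been applied:
from operator import itemgetter
stock.sort(key=itemgetter('date', 'amount'))  # sort by date, then by amount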
As you appear to be new to programming, here's a word of general advice, if I may:
A proper data structure is 90 percent of success. Do not try to work around broken data by writing more code. Create a structure adequate to your task and write as little code as possible.

You can also use the fact that python's sort is stable:
stock.sort(key=lambda x: float(x["amount"]))
stock.sort(key=lambda x: int(x["date"]))
Since items with the same key keep their relative positions when sorting (they are never swapped), you can build up a complicated sort by sorting multiple times, from the least significant key to the most significant one.
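After the two passes above, the list ends up in exactly the order the question asks for; a quick check (output shown as comments):
for row in stock:
    print(row['date'], row['amount'], row['type'])
# 2011 1.35 three
# 2012 1.35 four
# 2012 1.4 two
# 2012 1.45 one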

Related

Using json_normalize() for missing keys Python Pandas DataFrame

I have this snapshot of my dataset
test={'data': [{'name': 'john',
'insights': {'data': [{'account_id': '123',
'test_id': '456',
'date_start': '2022-12-31',
'date_stop': '2023-01-29',
'impressions': '4070',
'spend': '36.14'}],
'paging': {'cursors': {'before': 'MAZDZD', 'after': 'MAZDZD'}}},
'status': 'ACTIVE',
'id': '789'},
{'name': 'jack', 'status': 'PAUSED', 'id': '420'}]
}
I want to create a pandas dataframe where the columns are the name, date_start, date_stop, impressions, and spend.
When I tried json_normalize(), it raises an error because some of the keys are missing (the entries where 'status' is 'PAUSED'). Is there a way to drop the entries with missing keys from the list, or another way of using json_normalize()? I tried errors='ignore' but it doesn't work either.
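One possible approach (not from the original thread, just a sketch assuming a pandas version where pd.json_normalize is available at the top level): drop the entries that lack the nested insights key before normalizing, and carry the top-level name along via meta:
import pandas as pd

# keep only entries that actually contain insights data
records = [d for d in test['data'] if 'insights' in d]

df = pd.json_normalize(
    records,
    record_path=['insights', 'data'],  # explode the nested insights.data list
    meta=['name'],                     # keep the top-level name on each row
)
df = df[['name', 'date_start', 'date_stop', 'impressions', 'spend']]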

List comprehension and flattening deep data structure

Let's say I have a data structure that looks like the following (this is greatly simplified; my actual data has a significant amount of day-specific data for each job on each date):
data = {
'2019-01-01': {
'job-1-id': {'name': 'Job 1', 'address': '123 main st.'},
'job-2-id': {'name': 'Job 2', 'address': '824 1st Ave.'},
},
'2019-01-02': {
'job-1-id': {'name': 'Job 1', 'address': '123 main st.'},
'job-3-id': {'name': 'Job 3', 'address': '485 Pleasant Rd.'}
}
}
What I would like to do is flatten this, pushing the date and job id to an array of objects. E.g.:
data_flat = [
{'id': 'job-1-id', 'date': '2019-01-01', 'name': 'Job 1', 'address': '123 main st.'},
{'id': 'job-2-id', 'date': '2019-01-01', 'name': 'Job 2', 'address': '824 1st Ave.'},
{'id': 'job-1-id', 'date': '2019-01-02', 'name': 'Job 1', 'address': '123 main st.'},
{'id': 'job-3-id', 'date': '2019-01-02', 'name': 'Job 3', 'address': '485 Pleasant Rd.'},
]
Obviously I can loop over it and build a new array:
data_flat = []
for date, jobs in data.items():
    for job_id, job in jobs.items():
        data_flat.append({'id': job_id, 'date': date, etc...})
But is there a more pythonic/efficient way to do this using list comprehension with nested data like this? About all I can think of is using list comprehension for the inner loop, and then instead of appending, using extend to build the list. Thoughts?
A possible list comprehension solution might be as follows, where we unpack the job dictionary and add the id and date key-value pairs to it while iterating over the two for loops:
[{**job, 'id': job_id, 'date': date} for date, jobs in data.items() for job_id, job in jobs.items()]
Which, written as a traditional for loop, looks like:
data_flat = []
for date, jobs in data.items():
    for job_id, job in jobs.items():
        data_flat.append({**job, 'id': job_id, 'date': date})
A flat list can also be built easily with reduce.
All you need is the initializer, the third argument of the reduce function:
reduce(
    lambda _list, date: _list.extend(
        {'date': date, 'id': _id, **detail} for _id, detail in data[date].items()
    ) or _list,
    data,
    [])
The code above works in both Python 2 and Python 3, but you need to import reduce first with from functools import reduce; see the functools documentation for details.

Filtering through a list with embedded dictionaries

I've got a JSON-format list with dictionaries embedded within each entry; it looks like the following:
[{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
The number of entries within the list can be up to 100. I plan to present the 'name' for each entry, one result at a time, for those that have London as the town. The rest are of no use to me. I'm a beginner at Python, so I would appreciate a suggestion on how to go about this efficiently. I initially thought it would be best to remove all entries that don't have London and then go through them one by one.
I also wondered if it might be quicker to not filter but to cycle through the entire json and select the names of entries that have the town as London.
You can use filter:
data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
london_dicts = filter(lambda d: d['venue']['town'] == 'London', data)
for d in london_dicts:
    print(d)
This is as efficient as it can get because:
The loop is written in C (in the case of CPython)
filter returns an iterator (in Python 3), which means the results are produced lazily, one at a time, as required
One way is to use list comprehension:
>>> data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
>>> [d for d in data if d['venue']['town'] == 'London']
[{'id': 17,
'name': 'Alfred',
'venue': {'id': 456, 'town': 'London'},
'month': 'February'},
{'id': 17,
'name': 'Mary',
'venue': {'id': 56, 'town': 'London'},
'month': 'December'}]
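Since the goal is to present just the name for each London entry, the same predicate can pull out that single field; a small sketch building on the answers above:
london_names = [d['name'] for d in data if d['venue']['town'] == 'London']
print(london_names)  # ['Alfred', 'Mary']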

How to create a dict of dicts from pandas dataframe?

I have a dataframe df
id price date zipcode
u734 8923944 2017-01-05 AERIU87
uh72 9084582 2017-07-28 BJDHEU3
u029 299433 2017-09-31 038ZJKE
I want to create a dictionary with the following structure
{'id': xxx, 'data': {'price': xxx, 'date': xxx, 'zipcode': xxx}}
What I have done so far
ids = df['id']
prices = df['price']
dates = df['date']
zips = df['zipcode']
d = {'id':idx, 'data':{'price':p, 'date':d, 'zipcode':z} for idx,p,d,z in zip(ids,prices,dates,zips)}
>>> SyntaxError: invalid syntax
but I get the error above.
What would be the correct way to do this, using either
list comprehension
OR
pandas .to_dict()
bonus points: what is the complexity of the algorithm, and is there a more efficient way to do this?
I'd suggest the list comprehension.
v = df.pop('id')
data = [
    {'id': i, 'data': j}
    for i, j in zip(v, df.to_dict(orient='records'))
]
Or a compact version,
data = [dict(id=i, data=j) for i, j in zip(df.pop('id'), df.to_dict(orient='records'))]
Note that, if you're popping id inside the expression, it has to be the first argument to zip, so that the id column is removed from df before to_dict is called.
print(data)
[{'data': {'date': '2017-09-31',
'price': 299433,
'zipcode': '038ZJKE'},
'id': 'u029'},
{'data': {'date': '2017-01-05',
'price': 8923944,
'zipcode': 'AERIU87'},
'id': 'u734'},
{'data': {'date': '2017-07-28',
'price': 9084582,
'zipcode': 'BJDHEU3'},
'id': 'uh72'}]
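If you'd rather not mutate df with pop, a non-destructive variant gives the same result (a sketch, assuming a pandas version that supports DataFrame.drop(columns=...)):
data = [
    {'id': i, 'data': j}
    for i, j in zip(df['id'], df.drop(columns='id').to_dict(orient='records'))
]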

Python jsonpath Filter Expression

Background:
I have the following example data structure in JSON:
{'sensor' : [
{'assertions_enabled': 'ucr+',
'deassertions_enabled': 'ucr+',
'entity_id': '7.0',
'lower_critical': 'na',
'lower_non_critical': 'na',
'lower_non_recoverable': 'na',
'reading_type': 'analog',
'sensor_id': 'SR5680 TEMP (0x5d)',
'sensor_reading': {'confidence_interval': '0.500',
'units': 'degrees C',
'value': '42'},
'sensor_type': 'Temperature',
'status': 'ok',
'upper_critical': '59.000',
'upper_non_critical': 'na',
'upper_non_recoverable': 'na'}
]}
The sensor list will actually contain many of these dicts containing sensor info.
Problem:
I'm trying to query the list using jsonpath to return me a subset of sensor dicts that have sensor_type=='Temperature' but I'm getting 'False' returned (no match). Here's my jsonpath expression:
results = jsonpath.jsonpath(ipmi_node, "$.sensor[?(@.['sensor_type']=='Temperature')]")
When I remove the filter expression and just use "$.sensor.*" I get a list of all sensors, so I'm sure the problem is in the filter expression.
I've scanned multiple sites/posts for examples and I can't seem to find anything specific to Python (Javascript and PHP seem to be more prominent). Could anyone offer some guidance please?
The following expression does what you need (notice how the attribute is specified):
jsonpath.jsonpath(ipmi_node, "$.sensor[?(@.sensor_type=='Temperature')]")
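For completeness, a small usage sketch reusing the call from the question (jsonpath.jsonpath returns False when nothing matches, otherwise a list of matching dicts):
results = jsonpath.jsonpath(ipmi_node, "$.sensor[?(@.sensor_type=='Temperature')]")
if results:
    for sensor in results:
        print(sensor['sensor_id'], sensor['sensor_reading']['value'])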
I am using jsonpath-ng, which seems to be actively maintained (as of 23.11.20), and I provide a solution based on Pedro's jsonpath expression:
data = {
'sensor' : [
{'sensor_type': 'Temperature', 'id': '1'},
{'sensor_type': 'Humidity' , 'id': '2'},
{'sensor_type': 'Temperature', 'id': '3'},
{'sensor_type': 'Density' , 'id': '4'}
]}
from jsonpath_ng.ext import parser
for match in parser.parse("$.sensor[?(@.sensor_type=='Temperature')]").find(data):
    print(match.value)
Output:
{'sensor_type': 'Temperature', 'id': '1'}
{'sensor_type': 'Temperature', 'id': '3'}
NOTE: besides the basic documentation provided on the project's homepage, I found additional information in the project's tests.
