a= {101: {'Sender': 'Phillip', 'Receiver': 'Ramya', 'Start date': '14-03-2020', 'Delivery date': '25-03-2020', 'Sender location': 'Area 1', 'Receiver location': 'Area 6', 'Delivery status': 'Delivered', 'Shipping cost': 198}, 102: {'Sender': 'Romesh', 'Receiver': 'Phillip', 'Start date': '18-06-2020', 'Delivery date': '09-07-2020', 'Sender location': 'Area 2', 'Receiver location': 'Area 4', 'Delivery status': 'Delivered', 'Shipping cost': 275}, 103: {'Sender': 'Omega lll', 'Receiver': 'Ramya', 'Start date': '01-12-2020', 'Delivery date': 'Null', 'Sender location': 'Area 5', 'Receiver location': 'Area 1', 'Delivery status': 'In-Transit', 'Shipping cost': 200}, 104: {'Sender': 'Phillip', 'Receiver': 'John', 'Start date': '23-06-2020', 'Delivery date': '25-06-2020', 'Sender location': 'Area 1', 'Receiver location': 'Area 4', 'Delivery status': 'Delivered', 'Shipping cost': 314}, 105: {'Sender': 'Ramya', 'Receiver': 'Romesh', 'Start date': '29-08-2020', 'Delivery date': '10-09-2020', 'Sender location': 'Area 5', 'Receiver location': 'Area 3', 'Delivery status': 'Delivered', 'Shipping cost': 275}, 106: {'Sender': 'John', 'Receiver': 'Omega lll', 'Start date': '28-06-2020', 'Delivery date': 'Null', 'Sender location': 'Area 3', 'Receiver location': 'Area 1', 'Delivery status': 'In-Transit', 'Shipping cost': 270}}
I tried this using the filter function, but I am stuck on how to compare the dates:
Start_date= list(filter(lambda value: value['Start date' in 'Delivery Date'] ,a.values()))
print(Start_date)
Can someone help me, please?
You can use datetime.strptime to convert each date string to a datetime object:
from datetime import datetime
for v in a.values():
    # Skip shipments that have not been delivered yet
    if v["Delivery date"] == "Null":
        continue
    start = datetime.strptime(v["Start date"], "%d-%m-%Y")
    delivery = datetime.strptime(v["Delivery date"], "%d-%m-%Y")
    # Keep only deliveries that took 7 days or fewer
    if (delivery - start).days <= 7:
        print(v)
Prints:
{'Sender': 'Phillip', 'Receiver': 'John', 'Start date': '23-06-2020', 'Delivery date': '25-06-2020', 'Sender location': 'Area 1', 'Receiver location': 'Area 4', 'Delivery status': 'Delivered', 'Shipping cost': 314}
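Since the question tried filter, here is a minimal sketch of the same check written as a predicate that filter can use. The helper name delivered_within, the constant DATE_FORMAT, and the shortened shipments dict are made up for illustration; the 7-day threshold is the one used in the loop above.

```python
from datetime import datetime

DATE_FORMAT = "%d-%m-%Y"  # day-month-year, as in the question's data


def delivered_within(record, max_days=7):
    """Return True if the shipment was delivered within max_days of its start."""
    if record["Delivery date"] == "Null":  # still in transit
        return False
    start = datetime.strptime(record["Start date"], DATE_FORMAT)
    delivery = datetime.strptime(record["Delivery date"], DATE_FORMAT)
    return (delivery - start).days <= max_days


# Shortened, hypothetical subset of the dictionary from the question
shipments = {
    101: {"Start date": "14-03-2020", "Delivery date": "25-03-2020"},
    103: {"Start date": "01-12-2020", "Delivery date": "Null"},
    104: {"Start date": "23-06-2020", "Delivery date": "25-06-2020"},
}

# filter keeps the records for which the predicate returns True
fast = list(filter(delivered_within, shipments.values()))

# A comprehension works too, and also keeps the shipment IDs
fast_ids = [key for key, rec in shipments.items() if delivered_within(rec)]
```

Keeping the comparison in a named function avoids cramming the date parsing into a lambda.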
I have an issue with the pandas groupby() function. I have a DataFrame built from this data:
data = [{'Shop': 'Venga', 'Item Name': 'Oranges', 'Measure':'Supply Cost', 'Value': '10'},
{'Shop': 'Venga', 'Item Name': 'Oranges', 'Measure':'Product Cost', 'Value': '20'},
{'Shop': 'Venga', 'Item Name': 'Apples', 'Measure':'Supply Cost', 'Value': '5'},
{'Shop': 'Venga', 'Item Name': 'Apples', 'Measure':'Product Cost', 'Value': '60'},
{'Shop': 'Mesto', 'Item Name': 'Oranges', 'Measure':'Supply Cost', 'Value': '15'},
{'Shop': 'Mesto', 'Item Name': 'Oranges', 'Measure':'Product Cost', 'Value': '10'},
{'Shop': 'Mesto', 'Item Name': 'Apples', 'Measure':'Supply Cost', 'Value': '80'},
{'Shop': 'Mesto', 'Item Name': 'Apples', 'Measure':'Product Cost', 'Value': '5'},
]
I want to move my categories of Measure to columns and make it look like this:
I have tried to run data.groupby(['Measure'], axis = 1).sum() but it doesn't work at all for me.
Use .groupby and then .unstack the correct level.
In this case, level=2 is the 'Measure' column, from the .groupby object.
.reset_index to remove the multi-level index.
import pandas as pd
df = pd.DataFrame(data)  # build the DataFrame from the list of dicts in the question
dfg = df.groupby(['Shop', 'Item Name', 'Measure'])['Value'].sum().unstack(level=2).reset_index()
dfg.columns.name = None
# display(dfg)
    Shop Item Name Product Cost Supply Cost
0  Mesto    Apples            5          80
1  Mesto   Oranges           10          15
2  Venga    Apples           60           5
3  Venga   Oranges           20          10
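One thing to watch: the Value column in the question holds strings, so .sum() would likely concatenate them if a group ever had more than one row. A possible variant, assuming the values are meant to be numbers, converts them first and uses pivot_table for the reshaping. The variable name wide and the shortened data list are only for illustration.

```python
import pandas as pd

# Shortened, hypothetical subset of the data from the question
data = [
    {'Shop': 'Venga', 'Item Name': 'Oranges', 'Measure': 'Supply Cost', 'Value': '10'},
    {'Shop': 'Venga', 'Item Name': 'Oranges', 'Measure': 'Product Cost', 'Value': '20'},
    {'Shop': 'Mesto', 'Item Name': 'Apples', 'Measure': 'Supply Cost', 'Value': '80'},
    {'Shop': 'Mesto', 'Item Name': 'Apples', 'Measure': 'Product Cost', 'Value': '5'},
]

df = pd.DataFrame(data)
# Assumption: the string values are really numbers, so convert before summing
df['Value'] = pd.to_numeric(df['Value'])

# One row per (Shop, Item Name), one column per Measure category
wide = df.pivot_table(
    index=['Shop', 'Item Name'],
    columns='Measure',
    values='Value',
    aggfunc='sum',
).reset_index()
wide.columns.name = None  # drop the leftover 'Measure' label on the column axis
```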
Let's say I have the following list in Python. It is ordered first by Equip, then by Date:
my_list = [
{'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01'},
{'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02'},
{'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'},
{'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04'},
{'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05'},
{'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03'},
{'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04'},
{'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'}
]
What I want to do is collapse the list by each set where a given piece of Equipment's job does not change, and grab the first and last date the equipment was there. E.g., this simple example should change to:
list_by_job = [
{'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03'},
{'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05'},
{'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'},
{'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'}
]
A couple of things to note:
1. A-2 on Job 1 is only there for a single day, thus its First and Last Date should be the same.
2. A piece of equipment could be on a job, leave that job, and come back. In this case, I'd need to see an entry for each time it was on the job, not just one single summary.
3. As stated before, the list is already sorted first by Equip, then by Date, so that ordering can be assumed. (If there is a better way to sort to accomplish this, I am all ears.)
For point 2, the list
my_list = [
{'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01'},
{'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02'},
{'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'}
]
should yield
list_by_job = [
{'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-01'},
{'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-02', 'Last': '2018-01-02'},
{'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'}
]
Currently I am doing so in a simple loop/non-pythonic way:
list_by_job = []
last_entry = None
for entry in my_list:
    if last_entry is None or last_entry['Equip'] != entry['Equip'] or last_entry['Job'] != entry['Job']:
        list_by_job.append({'Equip': entry['Equip'], 'Job': entry['Job'], 'First': entry['Date'], 'Last': entry['Date']})
    else:
        list_by_job[-1]['Last'] = entry['Date']
    last_entry = entry
Is there a more pythonic way to do this using Python's list comprehension, etc?
You can use itertools.groupby:
import itertools
def _key(d):
    return (d['Equip'], d['Job'])
my_list = [{'Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1'}, {'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1'}, {'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1'}, {'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2'}, {'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2'}, {'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1'}, {'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3'}, {'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3'}]
new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
final_result = [{"Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date']} for [c, d], b in new_data]
Output:
[{'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01'},
{'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04'},
{'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03'},
{'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04'}]
Edit:
Using data as suggested in your comment:
my_list = [{'Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1'}, {'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2'}, {'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1'}, {'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2'}, {'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2'}, {'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1'}, {'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3'}, {'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3'}]
Output:
[{'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01'},
{'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02'},
{'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03'},
{'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04'},
{'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03'},
{'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04'}]
I suggest using pandas for this.
itertools.groupby is cool but IMO a bit harder to comprehend.
>>> import pandas as pd
>>>
>>> my_list = [
...     {'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01'},
...     {'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02'},
...     {'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'},
...     {'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04'},
...     {'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05'},
...     {'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03'},
...     {'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04'},
...     {'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'}
... ]
>>>
>>> df = pd.DataFrame(my_list)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> groups = df.groupby(['Equip', 'Job']).agg({'Date': ['min', 'max']}).reset_index()
>>> groups.columns = ['Equip', 'Job', 'First', 'Last']
>>> groups
  Equip    Job      First       Last
0   A-1  Job 1 2018-01-01 2018-01-03
1   A-1  Job 2 2018-01-04 2018-01-05
2   A-2  Job 1 2018-01-03 2018-01-03
3   A-2  Job 3 2018-01-04 2018-01-05
>>>
>>> groups.to_dict(orient='records')
[{'Equip': 'A-1',
'First': Timestamp('2018-01-01 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00')},
{'Equip': 'A-1',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 2',
'Last': Timestamp('2018-01-05 00:00:00')},
{'Equip': 'A-2',
'First': Timestamp('2018-01-03 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00')},
{'Equip': 'A-2',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 3',
'Last': Timestamp('2018-01-05 00:00:00')}]
I suggest keeping the dates as time stamps.
You can use pandas here, a library for working with tabular data:
import pandas as pd
df = pd.DataFrame(my_list)
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns={'min': 'First', 'max': 'Last'})
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')
For the given sample input, this gives:
>>> df2.to_dict('records')
[{'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03'},
{'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05'},
{'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'},
{'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'}]
This works on the raw strings because dates in '%Y-%m-%d' format sort the same way as text and as dates. If the date format is different, first convert the column with pd.to_datetime(..), like:
import pandas as pd
df = pd.DataFrame(my_list)
df['Date'] = pd.to_datetime(df['Date'])
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns={'min': 'First', 'max': 'Last'})
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')
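Note that grouping on ['Equip', 'Job'] alone puts every row for the same equipment and job into one group, so it would merge separate stints when a piece of equipment leaves a job and later comes back (the second note in the question). One possible way around that in pandas, sketched here under the assumption that the rows are already sorted by Equip and then Date, is to number the consecutive runs first and group by that number. The names keys, run_id and runs are only for illustration.

```python
import pandas as pd

# The "leave and come back" example from the question
my_list = [
    {'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01'},
    {'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02'},
    {'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'},
]

df = pd.DataFrame(my_list)
keys = df[['Equip', 'Job']]

# A new run starts wherever Equip or Job differs from the previous row;
# the cumulative sum of those "changed" flags gives each run its own number.
run_id = (keys != keys.shift()).any(axis=1).cumsum()

runs = (
    df.groupby(run_id)
      .agg(Equip=('Equip', 'first'), Job=('Job', 'first'),
           First=('Date', 'first'), Last=('Date', 'last'))
      .reset_index(drop=True)
)
result = runs.to_dict('records')
```

Because the input is sorted by date within each run, taking the first and last Date of a run is the same as taking its minimum and maximum.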
I am trying to rename every key named 'number' to 'numbr' in the data list, but I can't get it working.
Edit: So each key number should be renamed to numbr. Values stay as they are.
What am I doing wrong?
Thank you for your help!
data = [{'address': {
'city': 'city A',
'company_name': 'company A'},
'amount': 998,
'items': [{'description': 'desc A1','number': 'number A1'}],
'number': 'number of A',
'service_date': {
'type': 'DEFAULT',
'date': '2015-11-18'},
'vat_option': 123},
{'address': {
'city': 'city B',
'company_name': 'company B'},
'amount': 222,
'items': [{'description': 'desc B1','number': 'number B1'},
{'description': 'desc B2','number': 'number B2'}],
'number': 'number of B',
'service_date': {
'type': 'DEFAULT',
'date': '2015-11-18'},
'vat_option': 456}
]
def replace(l, X, Y):
    for i, v in enumerate(l):
        if v == X:
            l.pop(i)
            l.insert(i, Y)

replace(data, 'number', 'numbr')
print(data)
The following is a recursive replace implementation that replaces p1 by p2 in any string it encounters in the s object, recursing through lists, sets, tuples, dicts (both keys and values):
def nested_replace(s, p1, p2):
    # if isinstance(s, basestring):  # Python 2
    if isinstance(s, str):  # Python 3
        return s.replace(p1, p2)
    if isinstance(s, (list, tuple, set)):
        # Rebuild the container with the same type
        return type(s)(nested_replace(x, p1, p2) for x in s)
    if isinstance(s, dict):
        return {nested_replace(k, p1, p2): nested_replace(v, p1, p2) for k, v in s.items()}
    # Numbers and other non-string values are returned unchanged
    return s
>>> from pprint import pprint
>>> pprint(nested_replace(data, 'number', 'numbr'))
[{'address': {'city': 'city A', 'company_name': 'company A'},
'amount': 998,
'items': [{'description': 'desc A1', 'numbr': 'numbr A1'}],
'numbr': 'numbr of A',
'service_date': {'date': '2015-11-18', 'type': 'DEFAULT'},
'vat_option': 123},
{'address': {'city': 'city B', 'company_name': 'company B'},
'amount': 222,
'items': [{'description': 'desc B1', 'numbr': 'numbr B1'},
{'description': 'desc B2', 'numbr': 'numbr B2'}],
'numbr': 'numbr of B',
'service_date': {'date': '2015-11-18', 'type': 'DEFAULT'},
'vat_option': 456}]
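As the output shows, this also changes the values ('number A1' becomes 'numbr A1'). The edit in the question asks for the keys only, so here is a minimal sketch of a key-only variant, assuming the data consists of nested dicts and lists as in the question. The function name rename_key and the shortened data list are made up for illustration.

```python
def rename_key(obj, old, new):
    """Return a copy of obj with every dict key equal to old renamed to new."""
    if isinstance(obj, dict):
        # Rename matching keys, then recurse into the values
        return {(new if k == old else k): rename_key(v, old, new)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [rename_key(item, old, new) for item in obj]
    # Strings, numbers and other values are left untouched
    return obj


# Shortened, hypothetical subset of the data from the question
data = [
    {'amount': 998,
     'items': [{'description': 'desc A1', 'number': 'number A1'}],
     'number': 'number of A'},
]

renamed = rename_key(data, 'number', 'numbr')
```

This matches whole keys only, so a key such as 'number_of_items' would not be touched.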
Using eval is generally an anti-pattern (only run it on data you trust), but I think it is the simplest solution here:
data1 = eval(repr(data).replace('number', 'numbr'))
If you are trying to replace both keys and values, this will work.
from json import dumps, loads
data = [{'address': {
'city': 'city A',
'company_name': 'company A'},
'amount': 998,
'items': [{'description': 'desc A1','number': 'number A1'}],
'number': 'number of A',
'service_date': {
'type': 'DEFAULT',
'date': '2015-11-18'},
'vat_option': 123},
{'address': {
'city': 'city B',
'company_name': 'company B'},
'amount': 222,
'items': [{'description': 'desc B1','number': 'number B1'},
{'description': 'desc B2','number': 'number B2'}],
'number': 'number of B',
'service_date': {
'type': 'DEFAULT',
'date': '2015-11-18'},
'vat_option': 456}
]
data_string = dumps(data)
data = loads(data_string.replace('number', 'numbr'))
I have a pretty basic (but not quite working) function to dedupe a list of dictionaries from key values by adding the key value to a list for keeping track.
def dedupe(rs):
    delist = []
    for r in rs:
        if r['key'] not in delist:
            delist.append(r['key'])
        else:
            rs.remove(r)
    return rs
Which gets used in the script just below on two lists of dictionaries:
from pprint import pprint
records = [
{'key': 'Item 1',
'name': 'Item 1',
'positions': [['00:00:00', '00:05:54'],
['00:05:55', '00:07:54'],
['00:16:47', '00:20:04']]},
{'key': 'Item 1',
'name': 'Item 1',
'positions': [['00:05:55', '00:07:54'],
['00:00:00', '00:05:54'],
['00:16:47', '00:20:04']]},
{'key': 'Item 1',
'name': 'Item 1',
'positions': [['00:16:47', '00:20:04'],
['00:00:00', '00:05:54'],
['00:05:55', '00:07:54']]},
{'key': 'Item 2',
'name': 'Item 2',
'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
{'key': 'Item 2',
'name': 'Item 2',
'positions': [['00:11:24', '00:16:46'], ['00:07:55', '00:11:23']]},
{'key': 'Item 3', 'name': 'Item 3', 'positions': [['00:20:05', '00:25:56']]}
]
records2 = [
{'key': 'Item 1',
'name': 'Item 1',
'positions': [['00:00:00', '00:05:54'],
['00:05:55', '00:07:54'],
['00:16:47', '00:20:04']]},
{'key': 'Item 1',
'name': 'Item 1',
'positions': [['00:05:55', '00:07:54'],
['00:00:00', '00:05:54'],
['00:16:47', '00:20:04']]},
{'key': 'Item 2',
'name': 'Item 2',
'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
{'key': 'Item 1',
'name': 'Item 1',
'positions': [['00:16:47', '00:20:04'],
['00:00:00', '00:05:54'],
['00:05:55', '00:07:54']]},
{'key': 'Item 2',
'name': 'Item 2',
'positions': [['00:11:24', '00:16:46'], ['00:07:55', '00:11:23']]},
{'key': 'Item 3', 'name': 'Item 3', 'positions': [['00:20:05', '00:25:56']]}
]
def dedupe(rs):
    delist = []
    for r in rs:
        if r['key'] not in delist:
            delist.append(r['key'])
        else:
            rs.remove(r)
    return rs

if __name__ == '__main__':
    res = dedupe(records)
    res2 = dedupe(records2)
    pprint(res)
    pprint(res2)
For either records or records2, I would expect to get:
[
{'key': 'Item 1',
'name': 'Item 1',
'positions': [['00:00:00', '00:05:54'],
['00:05:55', '00:07:54'],
['00:16:47', '00:20:04']]},
{'key': 'Item 2',
'name': 'Item 2',
'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
{'key': 'Item 3',
'name': 'Item 3',
'positions': [['00:20:05', '00:25:56']]}
]
But instead I get (for each of the two inputs):
[
{'key': 'Item 1',
'name': 'Item 1',
'positions': [['00:00:00', '00:05:54'],
['00:05:55', '00:07:54'],
['00:16:47', '00:20:04']]},
{'key': 'Item 1',
'name': 'Item 1',
'positions': [['00:16:47', '00:20:04'],
['00:00:00', '00:05:54'],
['00:05:55', '00:07:54']]},
{'key': 'Item 2',
'name': 'Item 2',
'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
{'key': 'Item 3', 'name': 'Item 3', 'positions': [['00:20:05', '00:25:56']]}
]
[
{'key': 'Item 1',
'name': 'Item 1',
'positions': [['00:00:00', '00:05:54'],
['00:05:55', '00:07:54'],
['00:16:47', '00:20:04']]},
{'key': 'Item 2',
'name': 'Item 2',
'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
{'key': 'Item 2',
'name': 'Item 2',
'positions': [['00:11:24', '00:16:46'], ['00:07:55', '00:11:23']]},
{'key': 'Item 3', 'name': 'Item 3', 'positions': [['00:20:05', '00:25:56']]}
]
I keep staring at and tweaking this, but it's not clear to me why it fails to delete the third instance when the duplicates are in sequence (records), or why it works for the item with three instances but fails for the item with two when the three instances are broken up (records2).
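I wouldn't remove elements from a list while iterating over it. Each removal shifts the remaining elements one position to the left, so the loop skips the element that follows the removed one.
Instead, build a new list: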
def dedupe(rs):
    delist = []   # keys seen so far
    new_rs = []   # first record for each key
    for r in rs:
        if r['key'] not in delist:
            delist.append(r['key'])
            new_rs.append(r)
    return new_rs
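To make the skipping visible, and as a possible small variant of the fix, here is a sketch that tracks the seen keys in a set (membership tests on a set stay fast even for long lists). The key parameter, the shortened sample list and its 'n' field are assumptions added for illustration.

```python
def dedupe(records, key='key'):
    """Return a new list keeping only the first record for each key value."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


# Hypothetical short input; 'n' just marks the original position
sample = [
    {'key': 'Item 1', 'n': 1},
    {'key': 'Item 1', 'n': 2},
    {'key': 'Item 2', 'n': 3},
    {'key': 'Item 1', 'n': 4},
    {'key': 'Item 3', 'n': 5},
]
result = dedupe(sample)

# Why removing inside the loop goes wrong: after each removal the list
# shrinks, so the loop's internal index jumps over the next element.
letters = ['a', 'a', 'a', 'b']
for letter in letters:
    if letter == 'a':
        letters.remove(letter)
# One 'a' survives: letters is now ['a', 'b']
```

Unlike the original version, this leaves the input list unmodified.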