I'm trying to transform this list:
list = [
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
]
into one entry per 'product_name', with the quantities and prices summed, like this:
list_final = [
{'product_name': '4x6', 'quantity': 4, 'price': 1.16},
{'product_name': '4x4', 'quantity': 4, 'price': 1.16},
]
I can't figure out how to find the repeated occurrences of 'product_name' without nesting loops.
What I tried:
for item in list:
    if item.product_name in data.keys():
        data[item.product_name]['qty'] += 1
        data[item.product_name]['price'] *= 2
    else:
        data.update({item.product_name: [{'qty': item['quantity'], 'price': item['price']}]})
But I can't find a way to get the list I want.
How can I do this properly?
Here's a solution with OrderedDict that handles multiple products.
from collections import OrderedDict
o = OrderedDict()
for x in data:
    p = x['product_name']
    if p not in o:
        o[p] = dict(x)  # copy, so the input rows are not mutated
    else:
        o[p].update({k : o[p][k] + x[k] for k in x.keys() - {'product_name'}})
list_final = list(o.values())
A product is added to the inventory if it doesn't exist yet; otherwise its values are summed with the existing entry. This works on Python 3.
print(list_final)
[{'price': 1.16, 'product_name': '4x6', 'quantity': 4}]
For python2.x, change this
o[p].update({k : o[p][k] + x[k] for k in x.keys() - {'product_name'}})
To
o[p].update({k : o[p][k] + x[k] for k in set(x.keys()) - {'product_name'}})
Probably not the most readable, but here's an implementation without an explicit for loop:
from functools import reduce  # reduce is not a builtin in Python 3
def transform(array):
    def inner(cumulator, row):
        product_name = row['product_name']
        bucket = cumulator.get(product_name, {'quantity': 0, 'price': 0})
        cumulator[product_name] = {
            'quantity': bucket['quantity'] + row['quantity'],
            'price': bucket['price'] + row['price'],
        }
        return cumulator
    return reduce(inner, array, {})
Then call it:
transform(list)
# {'4x6': {'price': 1.16, 'quantity': 4}}
this might help:
l = [
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x6', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
{'product_name': '4x4', 'quantity': 1, 'price': 0.29},
]
from collections import defaultdict
count = defaultdict(lambda: {'quantity':0, 'price':0.0})
for d in l:
    count[d['product_name']]['quantity'] += 1
    count[d['product_name']]['price'] = d['price']
for prod_name, prod_info in count.items():
    print("product_name:", prod_name, "quantity: {quantity} price: {price}".format(**prod_info))
Output for your input:
product_name: 4x6 quantity: 4 price: 0.29
product_name: 4x4 quantity: 4 price: 0.29
Note: This also works with python2
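Note that the loop above counts rows and keeps the last unit price, so the price shown is 0.29 rather than the 1.16 total the question asks for. Here is a minimal sketch that sums both fields instead and returns the list-of-dicts shape from the question. It assumes every row has 'quantity' and 'price' keys; the name merge_products and the rounding to two decimals are my own choices for illustration:

```python
from collections import defaultdict

def merge_products(rows):
    """Sum 'quantity' and 'price' per 'product_name' (hypothetical helper)."""
    totals = defaultdict(lambda: {'quantity': 0, 'price': 0.0})
    for row in rows:
        bucket = totals[row['product_name']]
        bucket['quantity'] += row['quantity']
        bucket['price'] += row['price']
    # Round to cents; float sums may otherwise carry tiny representation errors.
    return [
        {'product_name': name, 'quantity': t['quantity'], 'price': round(t['price'], 2)}
        for name, t in totals.items()
    ]

rows = ([{'product_name': '4x6', 'quantity': 1, 'price': 0.29}] * 4
        + [{'product_name': '4x4', 'quantity': 1, 'price': 0.29}] * 4)
list_final = merge_products(rows)
print(list_final)
```

On Python 3.7+ the entries come out in first-seen order, because plain dicts (and defaultdict) keep insertion order.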
I have a DataFrame with nested JSON in one column.
df.depth
0 {'buy': [{'quantity': 51, 'price': 2275.85, 'o...
1 {'buy': [{'quantity': 1, 'price': 2275.85, 'or...
2 {'buy': [{'quantity': 1, 'price': 2275.85, 'or..
Each row holds five levels of depth for both buy and sell:
df.depth[0]
{'buy': [{...}, {...}, {...}, {...}, {...}], 'sell': [{...}, {...}, {...}, {...}, {...}]}
The actual JSON structure is as follows:
{'buy': [{'quantity': 51, 'price': 2275.85, 'orders': 2}, {'quantity': 38, 'price': 2275.8, 'orders': 2}, {'quantity': 108, 'price': 2275.75, 'orders': 3}, {'quantity': 120, 'price': 2275.7, 'orders': 2}, {'quantity': 6, 'price': 2275.6, 'orders': 1}], 'sell': [{'quantity': 353, 'price': 2276.95, 'orders': 1}, {'quantity': 29, 'price': 2277.0, 'orders': 2}, {'quantity': 54, 'price': 2277.1, 'orders': 2}, {'quantity': 200, 'price': 2277.2, 'orders': 1}, {'quantity': 4, 'price': 2277.25, 'orders': 1}]}
I want to explode this into something like the following.
Required output:
   depth.buy.quantity1  depth.buy.price1  ...  depth.sell.quantity1  depth.sell.price1  ...
0  51                   2275.85           ...  353                   2276               ...
1  1                    2275.85           ...  352                   2276               ...
How can I do this?
Edit:
To help, I have added a demo DataFrame:
a={'buy': [{'quantity': 51, 'price': 2275.85, 'orders': 2}, {'quantity': 38, 'price': 2275.8, 'orders': 2}, {'quantity': 108, 'price': 2275.75, 'orders': 3}, {'quantity': 120, 'price': 2275.7, 'orders': 2}, {'quantity': 6, 'price': 2275.6, 'orders': 1}], 'sell': [{'quantity': 353, 'price': 2276.95, 'orders': 1}, {'quantity': 29, 'price': 2277.0, 'orders': 2}, {'quantity': 54, 'price': 2277.1, 'orders': 2}, {'quantity': 200, 'price': 2277.2, 'orders': 1}, {'quantity': 4, 'price': 2277.25, 'orders': 1}]}
import pandas as pd
c = dict()
c['depth'] = a
df = pd.DataFrame([c, c])
You could try concat:
df = pd.concat([pd.concat([pd.DataFrame(x, index=[0]) for x in i], axis=1) for i in pd.json_normalize(df['depth'])['buy'].tolist()], ignore_index=True)
print(df)
Output:
quantity price orders quantity price orders ... quantity price orders quantity price orders
0 51 2275.85 2 38 2275.8 2 ... 120 2275.7 2 6 2275.6 1
1 51 2275.85 2 38 2275.8 2 ... 120 2275.7 2 6 2275.6 1
[2 rows x 15 columns]
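The concat approach above only expands the 'buy' side and leaves duplicate column names. A rough sketch of an alternative is to flatten each depth dict into uniquely named columns in plain Python first. The helper flatten_depth and the depth.<side>.<field><level> naming are assumptions modelled on the required output in the question:

```python
def flatten_depth(depth):
    """Flatten {'buy': [...], 'sell': [...]} into one flat dict (hypothetical helper)."""
    flat = {}
    for side in ('buy', 'sell'):
        # Levels are numbered from 1 to match the column names in the question.
        for level, entry in enumerate(depth.get(side, []), start=1):
            for field, value in entry.items():
                flat['depth.{}.{}{}'.format(side, field, level)] = value
    return flat

# Shortened sample with two levels per side instead of five.
sample = {
    'buy': [{'quantity': 51, 'price': 2275.85, 'orders': 2},
            {'quantity': 38, 'price': 2275.8, 'orders': 2}],
    'sell': [{'quantity': 353, 'price': 2276.95, 'orders': 1},
             {'quantity': 29, 'price': 2277.0, 'orders': 2}],
}
flat_row = flatten_depth(sample)
print(flat_row['depth.buy.quantity1'], flat_row['depth.sell.price1'])
```

Passing the flattened rows to pandas, for example pd.DataFrame([flatten_depth(d) for d in df['depth']]), should then give one row per original row with one column per side, field and level.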
I have a list of dictionaries, which looks like this:
_input = [{'cumulated_quantity': 30, 'price': 7000, 'quantity': 30},
{'cumulated_quantity': 80, 'price': 7002, 'quantity': 50},
{'cumulated_quantity': 130, 'price': 7010, 'quantity': 50},
{'cumulated_quantity': 330, 'price': 7050, 'quantity': 200},
{'cumulated_quantity': 400, 'price': 7065, 'quantity': 70}]
I would like to group the dictionaries into bins of quantity 100, where the price of each bin is the quantity-weighted average of the prices that fall into it. The result should look like this:
result = [{'cumulated_quantity': 100, 'price': 7003, 'quantity': 100},
{'cumulated_quantity': 200, 'price': 7038, 'quantity': 100},
{'cumulated_quantity': 300, 'price': 7050, 'quantity': 100},
{'cumulated_quantity': 400, 'price': 7060.5, 'quantity': 100}]
The weighted averages in the result are calculated as follows:
7003 = (30*7000+50*7002+20*7010)/100
7038 = (30*7010+70*7050)/100
7050 = 100*7050/100
7060.5 = (30*7050+70*7065)/100
I managed to get this result using pandas DataFrames, but that is far too slow (about 0.5 seconds). Is there a faster way to do this in Python?
Without pandas, a hand-written loop is nearly instantaneous:
result = []
cumulative_quantity = 0
bucket = {'price': 0.0, 'quantity': 0}
for dct in _input:
    dct_quantity = dct['quantity']  # enables non-destructive decrementing
    while dct_quantity > 0:
        if bucket['quantity'] == 100:
            # The current bucket is full: store it and start a new one.
            bucket['cumulative_quantity'] = cumulative_quantity
            result.append(bucket)
            bucket = {'price': 0.0, 'quantity': 0}
        added_quantity = min([dct_quantity, 100 - bucket['quantity']])
        bucket['price'] = (bucket['price'] * bucket['quantity'] + dct['price'] * added_quantity) / (bucket['quantity'] + added_quantity)
        dct_quantity -= added_quantity
        bucket['quantity'] += added_quantity
        cumulative_quantity += added_quantity
if bucket['quantity'] != 0:
    bucket['cumulative_quantity'] = cumulative_quantity
    result.append(bucket)
Gives
>>> result
[{'cumulative_quantity': 100, 'price': 7003.0, 'quantity': 100},
{'cumulative_quantity': 200, 'price': 7038.0, 'quantity': 100},
{'cumulative_quantity': 300, 'price': 7050.0, 'quantity': 100},
{'cumulative_quantity': 400, 'price': 7060.5, 'quantity': 100}]
This can be done in linear time, O(p), where p is the total number of pieces the records are split into. Equivalently, it is O(n * k), where n is the number of dicts and k is the average number of pieces each dict is split into (k = 1.6 in your example: 8 pieces from 5 dicts).
BIN_SIZE = 100
cum_quantity = 0
value = 0.
bin_quantity = 0
bin_value = 0
results = []
for record in _input:
    price, quantity = record['price'], record['quantity']
    while quantity:
        prior_quantity = bin_quantity
        bin_quantity = min(BIN_SIZE, bin_quantity + quantity)
        quantity_delta = bin_quantity - prior_quantity
        bin_value += quantity_delta * price
        quantity -= quantity_delta
        if bin_quantity == BIN_SIZE:
            avg_price = bin_value / float(BIN_SIZE)
            cum_quantity += BIN_SIZE
            bin_quantity = bin_value = 0  # Reset bin values.
            results.append({'cumulated_quantity': cum_quantity,
                            'price': avg_price,
                            'quantity': BIN_SIZE})
# Add stub for anything left in remaining bin (optional).
if bin_quantity:
    results.append({'cumulated_quantity': cum_quantity + bin_quantity,
                    'price': bin_value / float(bin_quantity),
                    'quantity': bin_quantity})
>>> results
[{'cumulated_quantity': 100, 'price': 7003.0, 'quantity': 100},
{'cumulated_quantity': 200, 'price': 7038.0, 'quantity': 100},
{'cumulated_quantity': 300, 'price': 7050.0, 'quantity': 100},
{'cumulated_quantity': 400, 'price': 7060.5, 'quantity': 100}]
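A quick sanity check that may be useful with either approach: re-binning should preserve both the total quantity and the total value (quantity times price). The sketch below checks that for the example data; the helper name totals and the 1e-6 tolerance are my own assumptions:

```python
_input = [{'cumulated_quantity': 30, 'price': 7000, 'quantity': 30},
          {'cumulated_quantity': 80, 'price': 7002, 'quantity': 50},
          {'cumulated_quantity': 130, 'price': 7010, 'quantity': 50},
          {'cumulated_quantity': 330, 'price': 7050, 'quantity': 200},
          {'cumulated_quantity': 400, 'price': 7065, 'quantity': 70}]
results = [{'cumulated_quantity': 100, 'price': 7003.0, 'quantity': 100},
           {'cumulated_quantity': 200, 'price': 7038.0, 'quantity': 100},
           {'cumulated_quantity': 300, 'price': 7050.0, 'quantity': 100},
           {'cumulated_quantity': 400, 'price': 7060.5, 'quantity': 100}]

def totals(rows):
    """Return (total quantity, total quantity-weighted value) for a list of rows."""
    quantity = sum(r['quantity'] for r in rows)
    value = sum(r['quantity'] * r['price'] for r in rows)
    return quantity, value

in_quantity, in_value = totals(_input)
out_quantity, out_value = totals(results)
# Compare values with a small tolerance because the binned prices are floats.
print(in_quantity == out_quantity, abs(in_value - out_value) < 1e-6)
```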
I have a python list like this:
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
I am trying to write code that merges the dictionaries with the same name and adds up their quantities. The final list should be:
user = [
{'name': 'ozzy', 'quantity': 8},
{'name': 'frank', 'quantity': 6},
{'name': 'james', 'quantity': 7}
]
I have tried a few things but I am struggling to get the right code. The code below only partly adds the values (my actual list is much longer; I have included just a small portion for reference).
newList = []
quan = 0
for i in range(0,len(user)):
    originator = user[i]['name']
    for j in range(i+1,len(user)):
        if originator == user[j]['name']:
            quan = user[i]['quantity'] + user[j]['quantity']
            newList.append({'name': originator, 'Quantity': quan})
Can you please help me get the correct code?
Just count the items in a collections.Counter, and expand back to list of dicts if needed:
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
import collections
d = collections.Counter()
for u in user:
    d[u['name']] += u['quantity']
print(dict(d))
newlist = [{'name' : k, 'quantity' : v} for k,v in d.items()]
print(newlist)
This prints the Counter as a dict first, which may already be sufficient:
{'frank': 6, 'ozzy': 8, 'james': 7}
and the reformatted output using list of dicts:
[{'name': 'frank', 'quantity': 6}, {'name': 'ozzy', 'quantity': 8}, {'name': 'james', 'quantity': 7}]
The solution is also straightforward with a standard dictionary. No need for Counter or OrderedDict here:
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
dic = {}
for item in user:
    # Look the fields up by key instead of relying on the order of item.values().
    n, q = item['name'], item['quantity']
    dic[n] = dic.get(n,0) + q
print(dic)
user = [{'name':n, 'quantity':q} for n,q in dic.items()]
print(user)
Result:
{'ozzy': 8, 'frank': 6, 'james': 7}
[{'name': 'ozzy', 'quantity': 8}, {'name': 'frank', 'quantity': 6}, {'name': 'james', 'quantity': 7}]
I would suggest changing the way the output dictionary looks so that it is actually useable. Consider something like this
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
data = {}
for i in user:
    if i['name'] in data:
        data[i['name']] += i['quantity']
    else:
        data.update({i['name']: i['quantity']})
print(data)
{'frank': 6, 'james': 7, 'ozzy': 8}
If you need to maintain the original relative order:
from collections import OrderedDict
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
d = OrderedDict()
for item in user:
    d[item['name']] = d.get(item['name'], 0) + item['quantity']
newlist = [{'name' : k, 'quantity' : v} for k, v in d.items()]
print(newlist)
Output:
[{'name': 'ozzy', 'quantity': 8}, {'name': 'frank', 'quantity': 6}, {'name': 'james', 'quantity': 7}]
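On Python 3.7 and later, a plain dict also preserves insertion order (it is part of the language specification from 3.7 on), so the same loop should keep the original relative order without OrderedDict. A sketch under that assumption:

```python
user = [
    {'name': 'ozzy', 'quantity': 5},
    {'name': 'frank', 'quantity': 4},
    {'name': 'ozzy', 'quantity': 3},
    {'name': 'frank', 'quantity': 2},
    {'name': 'james', 'quantity': 7},
]

totals = {}
for item in user:
    # A name is inserted the first time it is seen, which fixes its position.
    totals[item['name']] = totals.get(item['name'], 0) + item['quantity']

newlist = [{'name': k, 'quantity': v} for k, v in totals.items()]
print(newlist)
```

On older interpreters, where dict order is not guaranteed, OrderedDict remains the safer choice.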
user = [
{'name': 'ozzy', 'quantity': 8},
{'name': 'frank', 'quantity': 6},
{'name': 'james', 'quantity': 7}
]
reference_dict = {}
for item in user:
    reference_dict[item['name']] = reference_dict.get(item['name'],0) + item['quantity']
# Creating new list from reference dict
updated_user = [{'name' : k , 'quantity' : v} for k,v in reference_dict.items()]
print(updated_user)
I have an rdd similar to following:
[('C3', [{'Item': 'Shirt', 'Color ': 'Black', 'Size': '32','Price':'2500'}, {'Item': 'Sweater', 'Color ': 'Red', 'Size': '35', 'Price': '1000'}, {'Item': 'Jeans', 'Color ': 'Yellow', 'Size': '30', 'Price': '1500'}]), ('C1', [{'Item': 'Shirt', 'Color ': 'Green', 'Size': '25', 'Price': '2000'}, {'Item': 'Saree', 'Color ': 'Green', 'Size': '25', 'Price': '1500'}, {'Item': 'Saree', 'Color ': 'Green', 'Size': '25', 'Price': '1500'}, {'Item': 'Jeans', 'Color ': 'Yellow', 'Size': '30', 'Price': '1500'}])]
The RDD above can be created with:
sc.parallelize([('C3', [{'Item': 'Shirt', 'Color ': 'Black', 'Size': '32','Price':'2500'}, {'Item': 'Sweater', 'Color ': 'Red', 'Size': '35', 'Price': '1000'}, {'Item': 'Jeans', 'Color ': 'Yellow', 'Size': '30', 'Price': '1500'}]), ('C1', [{'Item': 'Shirt', 'Color ': 'Green', 'Size': '25', 'Price': '2000'}, {'Item': 'Saree', 'Color ': 'Green', 'Size': '25', 'Price': '1500'}, {'Item': 'Saree', 'Color ': 'Green', 'Size': '25', 'Price': '1500'}, {'Item': 'Jeans', 'Color ': 'Yellow', 'Size': '30', 'Price': '1500'}])])
I need to create a dataframe/rdd similar to following (I am adding count to all attributes)
{'C1': {'Color ': {'Green': 3, 'Yellow': 1},
'Item': {'Jeans': 1, 'Saree': 2, 'Shirt': 1},
'Price': {'1500': 3, '2000': 1},
'Size': {'25': 3, '30': 1}},
'C3': {'Color ': {'Black': 1, 'Red': 1, 'Yellow': 1},
'Item': {'Jeans': 1, 'Shirt': 1, 'Sweater': 1},
'Price': {'1000': 1, '1500': 1, '2500': 1},
'Size': {'30': 1, '32': 1, '35': 1}}}
Corresponding dataframe/rdd will be:
+-------+---------------------------------------------------------------------
|custo |attr
|C1 |Map(Color -> Map(Green -> 3, yellow -> 1), Item -> Map(Jeans -> 1, Saree -> 2, Shirt ->1), Price -> |
+-------+-------------------------------------------------------------------------------------------------------+
Use a udf to collect the counts.
from pyspark.sql import functions as f
from pyspark.sql import types as t
def count(c_dict):
    # Build {attribute: {value: number of occurrences}} for one customer's items.
    res = {}
    for item in c_dict:
        for key in item:
            if key in res:
                if item[key] in res[key]:
                    res[key][item[key]] += 1
                else:
                    res[key][item[key]] = 1
            else:
                res[key] = {}
                res[key][item[key]] = 1
    return res
schema = t.MapType(t.StringType(), t.MapType(t.StringType(), t.IntegerType()))
count_udf = f.udf(count, schema)
df2 = df.withColumn('col2', count_udf(df.col2))
df2.collect()
Result:
[Row(col1='C3', col2={'Size': {'35': 1, '30': 1, '32': 1}, 'Price': {'2500': 1, '1500': 1, '1000': 1}, 'Item': {'Sweater': 1, 'Jeans': 1, 'Shirt': 1}, 'Color ': {'Red': 1, 'Black': 1, 'Yellow': 1}}),
Row(col1='C1', col2={'Size': {'25': 3, '30': 1}, 'Price': {'1500': 3, '2000': 1}, 'Item': {'Jeans': 1, 'Saree': 2, 'Shirt': 1}, 'Color ': {'Green': 3, 'Yellow': 1}})]
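The nested if/else bookkeeping inside the udf can be shortened with collections.defaultdict and collections.Counter. This is a sketch of the counting logic only, runnable without Spark. The name count_attributes is hypothetical, and converting back to plain dicts is my assumption about what is safest to return from a udf declared with a MapType schema:

```python
from collections import Counter, defaultdict

def count_attributes(items):
    """Count how often each value occurs for each key across a list of dicts."""
    counts = defaultdict(Counter)
    for item in items:
        for key, value in item.items():
            counts[key][value] += 1
    # Convert to plain nested dicts before handing the result back.
    return {key: dict(counter) for key, counter in counts.items()}

# Items of customer 'C1' from the question (note the trailing space in 'Color ').
c1 = [
    {'Item': 'Shirt', 'Color ': 'Green', 'Size': '25', 'Price': '2000'},
    {'Item': 'Saree', 'Color ': 'Green', 'Size': '25', 'Price': '1500'},
    {'Item': 'Saree', 'Color ': 'Green', 'Size': '25', 'Price': '1500'},
    {'Item': 'Jeans', 'Color ': 'Yellow', 'Size': '30', 'Price': '1500'},
]
c1_counts = count_attributes(c1)
print(c1_counts)
```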
I have an unknown number of lists of product results as dictionary entries that all have the same keys. I'd like to generate a new list of products that appear in all of the old lists.
'what products are available in all cities?'
given:
list1 = [{'id': 1, 'name': 'bat', 'price': 20.00}, {'id': 2, 'name': 'ball', 'price': 12.00}, {'id': 3, 'name': 'brick', 'price': 19.00}]
list2 = [{'id': 1, 'name': 'bat', 'price': 18.00}, {'id': 3, 'name': 'brick', 'price': 11.00}, {'id': 2, 'name': 'ball', 'price': 17.00}]
list3 = [{'id': 1, 'name': 'bat', 'price': 16.00}, {'id': 4, 'name': 'boat', 'price': 10.00}, {'id': 3, 'name': 'brick', 'price': 15.00}]
list4 = [{'id': 1, 'name': 'bat', 'price': 14.00}, {'id': 2, 'name': 'ball', 'price': 9.00}, {'id': 3, 'name': 'brick', 'price': 13.00}]
list...
I want a list of dicts in which the 'id' exists in all of the old lists:
result_list = [{'id': 1, 'name': 'bat'}, {'id': 3, 'name': 'brick'}]
The values that aren't constant for a given 'id' can be discarded, but the values that are the same for a given 'id' must be in the results list.
If I know how many lists I've got, I can do:
results_list = []
for dict in list1:
    if any(dict['id'] == d['id'] for d in list2):
        if any(dict['id'] == d['id'] for d in list3):
            if any(dict['id'] == d['id'] for d in list4):
                results_list.append(dict)
How can I do this if I don't know how many lists I've got?
Put the ids into sets and then take the intersection of the sets.
list1 = [{'id': 1, 'name': 'steve'}, {'id': 2, 'name': 'john'}, {'id': 3, 'name': 'mary'}]
list2 = [{'id': 1, 'name': 'jake'}, {'id': 3, 'name': 'tara'}, {'id': 2, 'name': 'bill'}]
list3 = [{'id': 1, 'name': 'peter'}, {'id': 4, 'name': 'rick'}, {'id': 3, 'name': 'marci'}]
list4 = [{'id': 1, 'name': 'susan'}, {'id': 2, 'name': 'evan'}, {'id': 3, 'name': 'tom'}]
lists = [list1, list2, list3, list4]
sets = [set(x['id'] for x in lst) for lst in lists]
intersection = set.intersection(*sets)
print(intersection)
Result:
{1, 3}
Note that we call set.intersection on the class rather than on an empty instance, as in set().intersection(...). The latter intersects its arguments with the empty set set(), and the intersection of anything with the empty set is empty.
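The difference is easy to see side by side. A small illustration using the id sets from the example above:

```python
sets = [{1, 2, 3}, {1, 3, 2}, {1, 4, 3}, {1, 2, 3}]

# Called on the class: the first set in the list is the starting point.
on_class = set.intersection(*sets)

# Called on an empty instance: everything is intersected with set().
on_empty_instance = set().intersection(*sets)

print(on_class, on_empty_instance)
```

One caveat: with an empty list of sets, set.intersection(*sets) raises a TypeError because it has no set to start from, so an empty input may need a guard.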
If you want to turn this back into a list of dicts, you can do:
result = [{'id': i, 'name': None} for i in intersection]
print(result)
Result:
[{'id': 1, 'name': None}, {'id': 3, 'name': None}]
Now, if you also want to hold onto those attributes which are the same for all instances of a given id, you'll want to do something like this:
list1 = [{'id': 1, 'name': 'bat', 'price': 20.00}, {'id': 2, 'name': 'ball', 'price': 12.00}, {'id': 3, 'name': 'brick', 'price': 19.00}]
list2 = [{'id': 1, 'name': 'bat', 'price': 18.00}, {'id': 3, 'name': 'brick', 'price': 11.00}, {'id': 2, 'name': 'ball', 'price': 17.00}]
list3 = [{'id': 1, 'name': 'bat', 'price': 16.00}, {'id': 4, 'name': 'boat', 'price': 10.00}, {'id': 3, 'name': 'brick', 'price': 15.00}]
list4 = [{'id': 1, 'name': 'bat', 'price': 14.00}, {'id': 2, 'name': 'ball', 'price': 9.00}, {'id': 3, 'name': 'brick', 'price': 13.00}]
lists = [list1, list2, list3, list4]
sets = [set(x['id'] for x in lst) for lst in lists]
intersection = set.intersection(*sets)
all_keys = set(lists[0][0].keys())
result = []
for ident in intersection:
    res = [dic for lst in lists
           for dic in lst
           if dic['id'] == ident]
    replicated_keys = []
    for key in all_keys:
        if len(set(dic[key] for dic in res)) == 1:
            replicated_keys.append(key)
    result.append({key: res[0][key] for key in replicated_keys})
print(result)
Result:
[{'id': 1, 'name': 'bat'}, {'id': 3, 'name': 'brick'}]
What we do here is:
1. Look at each id in intersection and grab each dict corresponding to that id.
2. Find which keys have the same value in all of those dicts (one of which is guaranteed to be id).
3. Put those key-value pairs into result.
This code assumes that:
Each dict in list1, list2, ... will have the same keys. If this assumption is false, let me know - it shouldn't be difficult to relax.
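One possible way to relax that assumption is to consider only the keys that are present in every dict for a given id. The following is a sketch, not a drop-in replacement: the name common_products is hypothetical, and it assumes the ids can be sorted so that the output order is stable.

```python
def common_products(lists):
    """Keep ids present in every list, with only the key/value pairs shared by all copies."""
    if not lists:
        return []
    ids = set.intersection(*(set(d['id'] for d in lst) for lst in lists))
    result = []
    for ident in sorted(ids):
        matches = [d for lst in lists for d in lst if d['id'] == ident]
        # Only keys present in every matching dict can be constant across them.
        shared_keys = set.intersection(*(set(d) for d in matches))
        first = matches[0]
        # Walk the first match's keys so the output key order is predictable.
        result.append({k: first[k] for k in first
                       if k in shared_keys and all(d[k] == first[k] for d in matches)})
    return result

# The dicts deliberately have different key sets here.
list1 = [{'id': 1, 'name': 'bat', 'price': 20.00},
         {'id': 3, 'name': 'brick', 'price': 19.00, 'color': 'red'}]
list2 = [{'id': 1, 'name': 'bat', 'price': 18.00},
         {'id': 2, 'name': 'ball'},
         {'id': 3, 'name': 'brick'}]
common = common_products([list1, list2])
print(common)
```

Comparing values with == instead of putting them into a set also means the values themselves do not need to be hashable.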