I need help converting the date-keyed values in a dict of dicts to percentages.
raw_data = [{'name':'AB', 'date':datetime.date(2012, 10, 2), 'price': 23.80}, {'name':'AB', 'date':datetime.date(2012, 10, 3), 'price': 23.72}]
I have reshaped the list above into the format below using collections:
import collections
res = collections.defaultdict(dict)
for row in raw_data:
    row_col = res[row['name']]
    row_col[row['date']] = row['price']
{'AB': {datetime.date(2012, 10, 2): 23.80,
        datetime.date(2012, 10, 3): 23.72,
        datetime.date(2012, 10, 4): 25.90,
        datetime.date(2012, 10, 5): 29.95}}
Now I need to turn the data above into the format below.
Calculation formula:
The last price in the table (23.80, the earliest date) is the divisor for all the values above it.
Date                         Price   Percentage
datetime.date(2012, 10, 5)   29.95   26%
datetime.date(2012, 10, 4)   25.90   9%
datetime.date(2012, 10, 3)   23.72   0%
datetime.date(2012, 10, 2)   23.80   0%
The calculation goes like this:
(23.72/23.80-1) * 100 = 0%
(25.90/23.80-1) * 100 = 9%
(29.95/23.80-1) * 100 = 26%
Any help is really appreciated.
You can grab a list of all the values in your dictionary with something like value_list = res.values(). This will be iterable, and you can grab your price values with a for loop and list slicing. value_list[0] will then contain your lowest price that you're dividing everything by. Then depending on what you plan on doing with the data, you can use a for loop to calculate all the percentages or wrap it in a function and run it as needed.
Referenced: Python: Index a Dictionary?
import datetime
import collections
raw_data = [
    {'name':'AB', 'date':datetime.date(2012, 10, 2), 'price': 23.80},
    {'name':'AB', 'date':datetime.date(2012, 10, 3), 'price': 23.72},
    {'name':'AB', 'date':datetime.date(2012, 10, 4), 'price': 25.90},
    {'name':'AB', 'date':datetime.date(2012, 10, 5), 'price': 29.95}
]
# all unique names in raw_data
names = set(row["name"] for row in raw_data)
# lowest prices, keyed by name (only compare rows that belong to that name)
lowestPrices = {
    name: min(row["price"] for row in raw_data if row["name"] == name)
    for name in names
}
# add each row's percentage change relative to its name's lowest price
for row in raw_data:
    name = row["name"]
    lowestPrice = lowestPrices[name]
    price = row["price"]
    percentage = ((price/lowestPrice)-1)*100
    row["percentage"] = percentage
print(raw_data)
Output (newlines added by me):
[
{'date': datetime.date(2012, 10, 5), 'price': 29.95, 'percentage': 26.264755480607093, 'name': 'AB'},
{'date': datetime.date(2012, 10, 4), 'price': 25.9, 'percentage': 9.190556492411472, 'name': 'AB'},
{'date': datetime.date(2012, 10, 2), 'price': 23.8, 'percentage': 0.337268128161905, 'name': 'AB'},
{'date': datetime.date(2012, 10, 3), 'price': 23.72, 'percentage': 0, 'name': 'AB'}
]
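Note that the formula in the question divides by the first (earliest) price, 23.80, rather than by the lowest one. A minimal sketch of that variant follows, assuming the price on each name's earliest date is the intended baseline; the helper name `percent_change_from_first` is hypothetical and not part of either post.

```python
import collections
import datetime

raw_data = [
    {'name': 'AB', 'date': datetime.date(2012, 10, 2), 'price': 23.80},
    {'name': 'AB', 'date': datetime.date(2012, 10, 3), 'price': 23.72},
    {'name': 'AB', 'date': datetime.date(2012, 10, 4), 'price': 25.90},
    {'name': 'AB', 'date': datetime.date(2012, 10, 5), 'price': 29.95},
]

def percent_change_from_first(rows):
    """Map name -> {date: percent change versus that name's earliest price}."""
    # Group prices by name, then by date (same shape as `res` in the question).
    prices = collections.defaultdict(dict)
    for row in rows:
        prices[row['name']][row['date']] = row['price']

    result = {}
    for name, by_date in prices.items():
        # Assumption: the baseline is the price on the earliest date.
        base = by_date[min(by_date)]
        # Newest date first, matching the table layout in the question.
        result[name] = {
            date: (price / base - 1) * 100
            for date, price in sorted(by_date.items(), reverse=True)
        }
    return result

changes = percent_change_from_first(raw_data)
for date, pct in changes['AB'].items():
    print(date, round(pct))
```

Rounding to whole percentages is left to the display step so the stored values keep full precision.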
Related
I have this data :
[
{'name': 'INV/2021/0913', 'invoice_date': datetime.date(2021, 3, 12), 'qty_total': 5.0},
{'name': 'INV/2021/0965', 'invoice_date': datetime.date(2021, 3, 14), 'qty_total': 6.0},
{'name': 'INV/2021/0966', 'invoice_date': datetime.date(2021, 3, 14), 'qty_total': 7.0},
{'name': 'INV/2021/0967', 'invoice_date': datetime.date(2021, 3, 14), 'qty_total': 3.0},
{'name': 'INV/2021/0992', 'invoice_date': datetime.date(2021, 3, 15), 'qty_total': 4.0}
]
As it can be seen the middle 3 dicts have same date.
I want to combine the dictionaries that have the same invoice_date and sum up their qty_total.
Set the name attribute to "" for the combined dictionaries.
The result should look like this:
[
{'name': 'INV/2021/0913', 'invoice_date': datetime.date(2021, 3, 12), 'qty_total': 5.0},
{'name': '', 'invoice_date': datetime.date(2021, 3, 14), 'qty_total': 16.0},
{'name': 'INV/2021/0992', 'invoice_date': datetime.date(2021, 3, 15), 'qty_total': 4.0}
]
Use itertools.groupby:
from datetime import datetime
from itertools import groupby
l = [
{'name': 'INV/2021/0913', 'invoice_date': datetime(2021, 3, 12).date(), 'qty_total': 5.0},
{'name': 'INV/2021/0965', 'invoice_date': datetime(2021, 3, 14).date(), 'qty_total': 6.0},
{'name': 'INV/2021/0966', 'invoice_date': datetime(2021, 3, 14).date(), 'qty_total': 7.0},
{'name': 'INV/2021/0967', 'invoice_date': datetime(2021, 3, 14).date(), 'qty_total': 3.0},
{'name': 'INV/2021/0992', 'invoice_date': datetime(2021, 3, 15).date(), 'qty_total': 4.0}
]
res = []
for k, v in groupby(sorted(l, key=lambda x: x["invoice_date"]), key=lambda x: x["invoice_date"]):
    val = list(v)
    res.append(
        # blank the name when several invoices share a date, as the question asks
        {"name": "" if len(val) > 1 else val[0]["name"], "invoice_date": k, "qty_total": sum(vals["qty_total"] for vals in val)}
    )
print(res)
Output
[{'name': 'INV/2021/0913',
  'invoice_date': datetime.date(2021, 3, 12),
  'qty_total': 5.0},
 {'name': '', 'invoice_date': datetime.date(2021, 3, 14), 'qty_total': 16.0},
 {'name': 'INV/2021/0992',
  'invoice_date': datetime.date(2021, 3, 15),
  'qty_total': 4.0}]
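If sorting is not wanted, the same merge can be sketched as a single pass over the list with a plain dict keyed by date. This is only an illustration, assuming the result should follow the order in which each date first appears; the function name `merge_by_date` is hypothetical.

```python
import datetime

invoices = [
    {'name': 'INV/2021/0913', 'invoice_date': datetime.date(2021, 3, 12), 'qty_total': 5.0},
    {'name': 'INV/2021/0965', 'invoice_date': datetime.date(2021, 3, 14), 'qty_total': 6.0},
    {'name': 'INV/2021/0966', 'invoice_date': datetime.date(2021, 3, 14), 'qty_total': 7.0},
    {'name': 'INV/2021/0967', 'invoice_date': datetime.date(2021, 3, 14), 'qty_total': 3.0},
    {'name': 'INV/2021/0992', 'invoice_date': datetime.date(2021, 3, 15), 'qty_total': 4.0},
]

def merge_by_date(rows):
    """Combine rows sharing an invoice_date; merged rows get an empty name."""
    merged = {}  # dicts keep insertion order, so dates stay in first-seen order
    for row in rows:
        date = row['invoice_date']
        if date in merged:
            # A second row for this date: blank the name and add the quantity.
            merged[date]['name'] = ''
            merged[date]['qty_total'] += row['qty_total']
        else:
            merged[date] = dict(row)  # copy so the input rows are not mutated
    return list(merged.values())

combined = merge_by_date(invoices)
print(combined)
```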
I have a Pandas Dataframe where the column 'items' is a dictionary and shows per transaction which products have been bought:
data = {'price':[40, 15, 10, 2],
'items': ["{'product': 'Product1', 'quantity': 4, 'product': 'Product2', 'quantity': 1}", "{'product': 'Product2', 'quantity': 1, 'product': 'Product3', 'quantity': 1,'product': 'Product1', 'quantity': 1}", "{'product': 'Product1', 'quantity': 4}", "{'product': 'Product3', 'quantity': 1, 'product': 'Product1', 'quantity': 1}"]
}
df = pd.DataFrame(data, columns=['price', 'items'])
I want to find out which products have been bought most often. In this case the result should look like:
Product1: 4
Product2: 2
How can I count the most frequent values of the key 'product' within the column 'items'?
Perhaps you could use a namedtuple (from the built-in collections package).
First, define a named tuple called Record and create a list of these:
from collections import namedtuple
import pandas as pd
Record = namedtuple('Record', 'price product quantity')
records = [
    Record(40, 'Product1', 4), Record(40, 'Product2', 1),
    Record(15, 'Product2', 1), Record(15, 'Product3', 1), Record(15, 'Product1', 1),
    Record(10, 'Product1', 4),
    Record(2, 'Product3', 1), Record(2, 'Product1', 1),
]
Second, create the data frame, and use groupby to compute the total quantity of each product:
# create data frame
df = pd.DataFrame(records)
# compute summary statistic
df = df.groupby('product')['quantity'].sum()
print(df)
product
Product1 10
Product2 2
Product3 2
Name: quantity, dtype: int64
I did not match your expected results. Sorry if I misunderstood your data and/or question.
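The expected result in the question (Product1: 4, Product2: 2) looks like a count of transactions per product rather than a sum of quantities. A minimal sketch under that assumption, reusing the Record list from this answer and counting rows with collections.Counter instead of summing:

```python
from collections import Counter, namedtuple

Record = namedtuple('Record', 'price product quantity')
records = [
    Record(40, 'Product1', 4), Record(40, 'Product2', 1),
    Record(15, 'Product2', 1), Record(15, 'Product3', 1), Record(15, 'Product1', 1),
    Record(10, 'Product1', 4),
    Record(2, 'Product3', 1), Record(2, 'Product1', 1),
]

# Count how many rows mention each product, ignoring the quantity field.
# Assumption: a product appears at most once per transaction.
purchase_counts = Counter(record.product for record in records)

# most_common() lists products from most to least frequent.
for product, count in purchase_counts.most_common():
    print(f"{product}: {count}")
```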
I'm having some trouble accessing a value that is inside an array that contains a dictionary and another array.
It looks like this:
[{'name': 'Alex',
'number_of_toys': [{'classification': 3, 'count': 383},
{'classification': 1, 'count': 29},
{'classification': 0, 'count': 61}],
'total_toys': 473},
{'name': 'John',
'number_of_toys': [{'classification': 3, 'count': 8461},
{'classification': 0, 'count': 3825},
{'classification': 1, 'count': 1319}],
'total_toys': 13605}]
I want to access the 'count' number for each 'classification'. For example, for 'name' Alex, if 'classification' is 3, then the code returns the 'count' of 383, and so on for the other classifications and names.
Thanks for your help!
Not sure what your question asks, but if it's just a mapping exercise this will get you on the right track.
def get_toys(personDict):
    # (classification, count) pairs for one person
    person_toys = personDict.get('number_of_toys')
    return [(toys.get('classification'), toys.get('count')) for toys in person_toys]

def get_person_toys(database):
    # (name, toy pairs) for every person in the list
    return [(personDict.get('name'), get_toys(personDict)) for personDict in database]
Calling get_person_toys on the list from the question gives:
[('Alex', [(3, 383), (1, 29), (0, 61)]), ('John', [(3, 8461), (0, 3825), (1, 1319)])]
This isn't as elegant as the previous answer because it doesn't iterate over the values, but if you want to select specific elements, this is one way to do that:
data = [{'name': 'Alex',
'number_of_toys': [{'classification': 3, 'count': 383},
{'classification': 1, 'count': 29},
{'classification': 0, 'count': 61}],
'total_toys': 473},
{'name': 'John',
'number_of_toys': [{'classification': 3, 'count': 8461},
{'classification': 0, 'count': 3825},
{'classification': 1, 'count': 1319}],
'total_toys': 13605}]
import pandas as pd
df = pd.DataFrame(data)
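# select by row label and column name; integer positions on a row Series are deprecated
print(df.loc[0, 'name'])
print(df.loc[0, 'number_of_toys'][0]['classification'])
print(df.loc[0, 'number_of_toys'][0]['count'])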
which gives:
Alex
3
383
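For direct lookups by name and classification, which is what the question describes, one option is to reshape the list into a nested dict first. This is a sketch only; the helper name `build_count_index` is hypothetical, and it assumes names are unique in the list.

```python
data = [
    {'name': 'Alex',
     'number_of_toys': [{'classification': 3, 'count': 383},
                        {'classification': 1, 'count': 29},
                        {'classification': 0, 'count': 61}],
     'total_toys': 473},
    {'name': 'John',
     'number_of_toys': [{'classification': 3, 'count': 8461},
                        {'classification': 0, 'count': 3825},
                        {'classification': 1, 'count': 1319}],
     'total_toys': 13605},
]

def build_count_index(people):
    """Return {name: {classification: count}} for direct lookups."""
    return {
        person['name']: {
            toy['classification']: toy['count']
            for toy in person['number_of_toys']
        }
        for person in people
    }

counts = build_count_index(data)
print(counts['Alex'][3])
print(counts['John'][0])
```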
I would like to create a nested dict in Python 3. I have the following list (from an SQL query):
[('madonna', 'Portland', 'Oregon', '0.70', '+5551234', 'music', datetime.date(2016, 9, 8), datetime.date(2016, 9, 1)), ('jackson', 'Laredo', 'Texas', '2.03', '+555345', 'none', datetime.date(2016, 5, 23), datetime.date(2016, 5, 16)), ('bohlen', 'P', 'P', '2.27', '+555987', 'PhD Student', datetime.date(2016, 9, 7))]
I would like to have the following output:
{madonna:{city:Portland, State:Oregon, Index: 0.70, Phone:+5551234, art:music, exp-date:2016, 9, 8, arrival-date:datetime.date(2016, 5, 23)},jackson:{city: Laredo, State:Texas........etc...}}
Can somebody show me some easy-to-understand code?
I tried:
from collections import defaultdict
usercheck = defaultdict(list)
for accname, div_ort, standort, raum, telefon, position, exp, dep in cur.fetchall():
    usercheck(accname).append[..]
but this doesn't work, and I can't get any further on my own.
You can use a dict comprehension to dynamically create a dictionary based on the elements of a list:
import datetime

sql_list = [
    ('madonna', 'Portland', 'Oregon', '0.70', '+5551234', 'music', datetime.date(2016, 9, 8), datetime.date(2016, 9, 1)),
    ('jackson', 'Laredo', 'Texas', '2.03', '+555345', 'none', datetime.date(2016, 5, 23), datetime.date(2016, 5, 16)),
    ('bohlen', 'P', 'P', '2.27', '+555987', 'PhD Student', datetime.date(2016, 9, 7))
]
# outer key: the account name; inner dict: the named fields of that row
sql_dict = {
    element[0]: {
        'city': element[1],
        'state': element[2],
        'index': element[3],
        'phone': element[4],
        'art': element[5],
    } for element in sql_list
}
Keep in mind that every item in the dictionary needs to have a key and a value, and in your example you have a few values with no key.
If you have a list of the columns, you can use the zip function:
from collections import defaultdict
import datetime
# list of columns returned from your database query
columns = ["city", "state", "index", "phone", "art", "exp-date", "arrival-date"]
usercheck = defaultdict(list)
for row in cur.fetchall():
    # pair each column name with the matching value after the account name
    usercheck[row[0]] = defaultdict(list, zip(columns, row[1:]))
print(usercheck)
On Python 3 this will output a dictionary like (line breaks added):
defaultdict(<class 'list'>, {
    'madonna': defaultdict(<class 'list'>, {'city': 'Portland', 'state': 'Oregon', 'index': '0.70', 'phone': '+5551234', 'art': 'music', 'exp-date': datetime.date(2016, 9, 8), 'arrival-date': datetime.date(2016, 9, 1)}),
    'jackson': defaultdict(<class 'list'>, {'city': 'Laredo', 'state': 'Texas', 'index': '2.03', 'phone': '+555345', 'art': 'none', 'exp-date': datetime.date(2016, 5, 23), 'arrival-date': datetime.date(2016, 5, 16)}),
    'bohlen': defaultdict(<class 'list'>, {'city': 'P', 'state': 'P', 'index': '2.27', 'phone': '+555987', 'art': 'PhD Student', 'exp-date': datetime.date(2016, 9, 7), 'arrival-date': None})
})
When using defaultdict, the argument is a factory that supplies the default value for missing keys, so defaultdict(dict) gives every new key an empty dict.
from collections import defaultdict
usercheck = defaultdict(dict)
for accname, div_ort, standort, raum, telefon, position, exp, dep in cur.fetchall():
    usercheck[accname]['city'] = div_ort
    usercheck[accname]['state'] = standort
    ...
The keys in the dictionary are referenced using [key], not (key).
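Putting these answers together, here is a minimal sketch that builds plain nested dicts from the sample rows in the question. The column names are taken from the earlier answer; using itertools.zip_longest to fill a missing trailing value with None is an assumption about how short rows should be handled, and it only works as intended when no row has more values than there are columns.

```python
import datetime
from itertools import zip_longest

rows = [
    ('madonna', 'Portland', 'Oregon', '0.70', '+5551234', 'music',
     datetime.date(2016, 9, 8), datetime.date(2016, 9, 1)),
    ('jackson', 'Laredo', 'Texas', '2.03', '+555345', 'none',
     datetime.date(2016, 5, 23), datetime.date(2016, 5, 16)),
    ('bohlen', 'P', 'P', '2.27', '+555987', 'PhD Student',
     datetime.date(2016, 9, 7)),
]
columns = ["city", "state", "index", "phone", "art", "exp-date", "arrival-date"]

# Outer key: first field of each row. Inner dict: column name -> value.
# zip_longest pads short rows, so a missing arrival-date becomes None.
users = {
    row[0]: dict(zip_longest(columns, row[1:]))
    for row in rows
}

print(users['madonna']['city'])
print(users['bohlen']['arrival-date'])
```

With plain zip instead of zip_longest, the short row would simply lack the 'arrival-date' key.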
I'm trying to write a function, in an elegant way, that will group a list of dictionaries and aggregate (sum) the values of like-keys.
Example:
import datetime

my_dataset = [
    {
        'date': datetime.date(2013, 1, 1),
        'id': 99,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 1),
        'id': 98,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 2),
        'id': 99,
        'value1': 10,
        'value2': 10
    }
]
group_and_sum_dataset(my_dataset, 'date', ['value1', 'value2'])
"""
Should return:
[
    {
        'date': datetime.date(2013, 1, 1),
        'value1': 20,
        'value2': 20
    },
    {
        'date': datetime.date(2013, 1, 2),
        'value1': 10,
        'value2': 10
    }
]
"""
I've tried doing this using itertools for the groupby and summing each like-key value pair, but am missing something here. Here's what my function currently looks like:
import itertools
import operator

def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    keyfunc = operator.itemgetter(group_by_key)
    dataset.sort(key=keyfunc)
    new_dataset = []
    for key, index in itertools.groupby(dataset, keyfunc):
        d = {group_by_key: key}
        d.update({k:sum([item[k] for item in index]) for k in sum_value_keys})
        new_dataset.append(d)
    return new_dataset
You can use collections.Counter and collections.defaultdict.
Using a dict this can be done in O(N), while sorting requires O(NlogN) time.
from collections import defaultdict, Counter

def solve(dataset, group_by_key, sum_value_keys):
    # one Counter per group key; Counter.update adds values for matching keys
    dic = defaultdict(Counter)
    for item in dataset:
        key = item[group_by_key]
        vals = {k:item[k] for k in sum_value_keys}
        dic[key].update(vals)
    return dic
...
>>> d = solve(my_dataset, 'date', ['value1', 'value2'])
>>> d
defaultdict(<class 'collections.Counter'>,
{
datetime.date(2013, 1, 2): Counter({'value2': 10, 'value1': 10}),
datetime.date(2013, 1, 1): Counter({'value2': 20, 'value1': 20})
})
The advantage of Counter is that it'll automatically sum the values of matching keys:
Example:
>>> c = Counter(**{'value1': 10, 'value2': 5})
>>> c.update({'value1': 7, 'value2': 3})
>>> c
Counter({'value1': 17, 'value2': 8})
Thanks, I forgot about Counter. I still wanted to maintain the output format and sorting of my returned dataset, so here's what my final function looks like:
def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    container = defaultdict(Counter)
    for item in dataset:
        key = item[group_by_key]
        values = {k:item[k] for k in sum_value_keys}
        container[key].update(values)
    # list() is needed on Python 3, where items() returns a view, not a list
    new_dataset = [
        dict([(group_by_key, item[0])] + list(item[1].items()))
        for item in container.items()
    ]
    new_dataset.sort(key=lambda item: item[group_by_key])
    return new_dataset
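As for what the original itertools.groupby attempt was missing: each group that groupby yields is a one-shot iterator, so the sum for the first key consumes it and every later key sums an empty sequence. A sketch of that fix, keeping the signature from the question and copying each group into a list before summing:

```python
import datetime
import itertools
import operator

def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    keyfunc = operator.itemgetter(group_by_key)
    new_dataset = []
    # sorted() leaves the caller's list untouched, unlike dataset.sort().
    for key, group in itertools.groupby(sorted(dataset, key=keyfunc), keyfunc):
        items = list(group)  # the group iterator can only be read once
        d = {group_by_key: key}
        d.update({k: sum(item[k] for item in items) for k in sum_value_keys})
        new_dataset.append(d)
    return new_dataset

my_dataset = [
    {'date': datetime.date(2013, 1, 1), 'id': 99, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 1), 'id': 98, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 2), 'id': 99, 'value1': 10, 'value2': 10},
]

result = group_and_sum_dataset(my_dataset, 'date', ['value1', 'value2'])
print(result)
```

This keeps the O(N log N) sort mentioned in the answer above; the Counter-based version avoids it.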
Here's an approach using more_itertools where you simply focus on how to construct output.
Given
import datetime
import collections as ct
import more_itertools as mit
dataset = [
{"date": datetime.date(2013, 1, 1), "id": 99, "value1": 10, "value2": 10},
{"date": datetime.date(2013, 1, 1), "id": 98, "value1": 10, "value2": 10},
{"date": datetime.date(2013, 1, 2), "id": 99, "value1": 10, "value2": 10}
]
Code
# Step 1: Build helper functions
kfunc = lambda d: d["date"]
vfunc = lambda d: {k:v for k, v in d.items() if k.startswith("val")}
rfunc = lambda lst: sum((ct.Counter(d) for d in lst), ct.Counter())
# Step 2: Build a dict
reduced = mit.map_reduce(dataset, keyfunc=kfunc, valuefunc=vfunc, reducefunc=rfunc)
reduced
Output
defaultdict(None,
{datetime.date(2013, 1, 1): Counter({'value1': 20, 'value2': 20}),
datetime.date(2013, 1, 2): Counter({'value1': 10, 'value2': 10})})
The items are grouped by date and pertinent values are reduced as Counters.
Details
Steps
1. Build helper functions to customize construction of keys, values and reduced values in the final defaultdict. Here we want to:
   - group by date (kfunc)
   - build dicts keeping the "value*" parameters (vfunc)
   - aggregate the dicts (rfunc) by converting to collections.Counters and summing them. See an equivalent rfunc below+.
2. Pass the helper functions to more_itertools.map_reduce.
Simple Groupby
... say in that example you wanted to group by id and date?
No problem.
>>> kfunc2 = lambda d: (d["date"], d["id"])
>>> mit.map_reduce(dataset, keyfunc=kfunc2, valuefunc=vfunc, reducefunc=rfunc)
defaultdict(None,
{(datetime.date(2013, 1, 1),
99): Counter({'value1': 10, 'value2': 10}),
(datetime.date(2013, 1, 1),
98): Counter({'value1': 10, 'value2': 10}),
(datetime.date(2013, 1, 2),
99): Counter({'value1': 10, 'value2': 10})})
Customized Output
While the resulting data structure clearly and concisely presents the outcome, the OP's expected output can be rebuilt as a simple list of dicts:
>>> [{**dict(date=k), **v} for k, v in reduced.items()]
[{'date': datetime.date(2013, 1, 1), 'value1': 20, 'value2': 20},
{'date': datetime.date(2013, 1, 2), 'value1': 10, 'value2': 10}]
For more on map_reduce, see the docs. Install via > pip install more_itertools.
+An equivalent reducing function:
import typing

def rfunc(lst: typing.List[dict]) -> ct.Counter:
    """Return reduced mappings from map-reduce values."""
    c = ct.Counter()
    for d in lst:
        c += ct.Counter(d)
    return c