I have a Pandas DataFrame where the column 'items' holds a dictionary (stored as a string) and shows, per transaction, which products have been bought:
import pandas as pd
data = {'price': [40, 15, 10, 2],
        'items': ["{'product': 'Product1', 'quantity': 4, 'product': 'Product2', 'quantity': 1}",
                  "{'product': 'Product2', 'quantity': 1, 'product': 'Product3', 'quantity': 1,'product': 'Product1', 'quantity': 1}",
                  "{'product': 'Product1', 'quantity': 4}",
                  "{'product': 'Product3', 'quantity': 1, 'product': 'Product1', 'quantity': 1}"]}
df = pd.DataFrame(data, columns=['price', 'items'])
I want to find out which products have been bought most. In this case the result should look like:
Product1: 4
Product2: 2
How can I count the most frequent values of the key 'product' within the column 'items'?
Perhaps you could use a namedtuple (from the built-in collections package).
First, define a named tuple called Record and create a list of them:
from collections import namedtuple
import pandas as pd
Record = namedtuple('Record', 'price product quantity')
records = [
Record(40, 'Product1', 4), Record(40, 'Product2', 1),
Record(15, 'Product2', 1), Record(15, 'Product3', 1), Record(15, 'Product1', 1),
Record(10, 'Product1', 4),
Record( 2, 'Product3', 1), Record(2, 'Product1', 1),]
Second, create the DataFrame and use groupby to compute the total quantity of each product:
# create data frame
df = pd.DataFrame(records)
# compute summary statistic
df = df.groupby('product')['quantity'].sum()
print(df)
product
Product1 10
Product2 2
Product3 2
Name: quantity, dtype: int64
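This does not match your expected results: it sums the quantities, whereas your expected output seems to count the number of transactions that contain each product. Sorry if I misunderstood your data or question.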
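A minimal sketch that counts transactions instead of summing quantities, assuming the 'items' strings look exactly like the ones in the question. Because each string repeats the 'product' key, parsing it as a dict would keep only the last product, so this pulls every product name out with a regular expression (the pattern is an assumption tied to this exact quoting) and tallies them with collections.Counter.

```python
import re
from collections import Counter

import pandas as pd

data = {'price': [40, 15, 10, 2],
        'items': ["{'product': 'Product1', 'quantity': 4, 'product': 'Product2', 'quantity': 1}",
                  "{'product': 'Product2', 'quantity': 1, 'product': 'Product3', 'quantity': 1,'product': 'Product1', 'quantity': 1}",
                  "{'product': 'Product1', 'quantity': 4}",
                  "{'product': 'Product3', 'quantity': 1, 'product': 'Product1', 'quantity': 1}"]}
df = pd.DataFrame(data, columns=['price', 'items'])

# Extract every product name from each string (assumes the 'product': '...' quoting above).
products = df['items'].str.findall(r"'product': '([^']+)'")

# Count how often each product appears across all transactions.
counts = Counter(p for row in products for p in row)
for name, n in counts.most_common(2):
    print(f"{name}: {n}")
```

Counter.most_common keeps first-seen order for ties, so Product2 is listed before Product3 here; drop the argument to see all products.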
I would really appreciate any help with the following. From a list of dicts, I am looking to combine all duplicates under one name, with a second dict value holding the total of another value. I have put together the code below as an example:
l = [{'id': 1, 'name': 'apple', 'price': '100', 'year': '2000', 'currency': 'eur'},
{'id': 2, 'name': 'apple', 'price': '150', 'year': '2071', 'currency': 'eur'},
{'id': 3, 'name': 'apple', 'price': '1220', 'year': '2076', 'currency': 'eur'},
{'id': 4, 'name': 'cucumber', 'price': '90000000', 'year': '2080', 'currency': 'eur'},
{'id': 5, 'name': 'pear', 'price': '1000', 'year': '2000', 'currency': 'eur'},
{'id': 6, 'name': 'apple', 'price': '150', 'year': '2022', 'currency': 'eur'},
{'id': 9, 'name': 'apple', 'price': '100', 'year': '2000', 'currency': 'eur'},
{'id': 10, 'name': 'grape', 'price': '150', 'year': '2022', 'currency': 'eur'},
]
new_list = []
for d in l:
    if d['name'] not in new_list:
        new_list.append(d['name'])
print(new_list)
price_list = []
for price in l:
    if price['price'] not in price_list:
        price_list.append(price['price'])
print(price_list)
The output I am hoping to achieve is:
[{'name': 'apple'}, {'price': <The total price for all apples>}]
Use a dictionary whose keys are the names and whose values are the lists of prices. Then calculate the sum of each list.
d = {}
for item in l:
    d.setdefault(item['name'], []).append(int(item['price']))
for name, prices in d.items():
    d[name] = sum(prices)
print(d)
Actually, I thought this was the same as yesterday's question, where you wanted the average. If you just want the total, you don't need the lists. Use a defaultdict containing integers, and just add the price to it.
from collections import defaultdict
d = defaultdict(int)
for item in l:
    d[item['name']] += int(item['price'])
print(d)
This method only requires one loop:
prices = {}
for item in l:
    prices.update({item['name']: prices.get(item['name'], 0) + int(item['price'])})
print(prices)
Just for fun I decided to also implement the functionality with the item and price dictionaries separated as asked in the question, which gave the following horrendous code:
prices = []
for item in l:
    # get indices of prices of corresponding items
    price_idx = [n+1 for n, x in enumerate(prices) if item['name'] == x.get('name') and n % 2 == 0]
    if not price_idx:
        prices.append({'name': item['name']})
        prices.append({'price': int(item['price'])})
    else:
        prices[price_idx[0]].update({'price': prices[price_idx[0]]['price'] + int(item['price'])})
print(prices)
It also requires the following function to retrieve prices:
def get_price(name):
    for n, x in enumerate(prices):
        if n % 2 == 0 and x['name'] == name:
            return prices[n+1]['price']
This honestly defeats the whole point of having a data structure, but if it answers your question, there you go.
This could be another option:
result = {}
for item in l:
    if item['name'] not in result:
        result[item['name']] = {'name': item['name'], 'price': 0}
    result[item['name']]['price'] += int(item['price'])
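Since the desired output in the question is a list of name/price dicts, a minimal sketch that accumulates the same per-name records and then takes list(result.values()) could look like this. The sample rows are trimmed to the id, name and price fields, which is an assumption for brevity; the other fields are simply ignored.

```python
# Sample rows from the question, trimmed to the fields used here.
l = [
    {'id': 1, 'name': 'apple', 'price': '100'},
    {'id': 2, 'name': 'apple', 'price': '150'},
    {'id': 3, 'name': 'apple', 'price': '1220'},
    {'id': 4, 'name': 'cucumber', 'price': '90000000'},
    {'id': 5, 'name': 'pear', 'price': '1000'},
    {'id': 6, 'name': 'apple', 'price': '150'},
    {'id': 9, 'name': 'apple', 'price': '100'},
    {'id': 10, 'name': 'grape', 'price': '150'},
]

# One {'name', 'price'} record per distinct name.
result = {}
for item in l:
    record = result.setdefault(item['name'], {'name': item['name'], 'price': 0})
    # Prices are stored as strings, so convert before adding.
    record['price'] += int(item['price'])

# The dict's values are the per-name records, in first-seen order.
totals = list(result.values())
print(totals)
```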
I need to divide each row by a specific row for each column in my data frame. In this case, I need to divide every row by Revenue for each time period, to get the percentage of Revenue that each account represents. I would also like to figure out how to make this dynamic for any number of columns.
My current Data frame:
data = {'202112 YTD': {'Gross Margin': 200000,
'Other (Income) & Expense': -100000,
'Revenue': 5000000,
'SG&A Expense': 150000,
'Segment EBITDA': 200000},
'202212 YTD': {'Gross Margin': 2850000,
'Other (Income) & Expense': -338000,
'Revenue': 6000000,
'SG&A Expense': 15000,
'Segment EBITDA': 200000}}
df = pd.DataFrame.from_dict(data)
df
Desired Output:
outdata = {'202112 YTD': {'Gross Margin': 0.040,
'Other (Income) & Expense': -0.020,
'Revenue': 1,
'SG&A Expense': 0.030,
'Segment EBITDA': 0.040},
'202212 YTD': {'Gross Margin': 0.475,
'Other (Income) & Expense': -0.056,
'Revenue': 1,
'SG&A Expense': 0.003,
'Segment EBITDA': 0.033}}
outdf = pd.DataFrame.from_dict(outdata)
outdf
Help would be appreciated. My original attempt was to structure the solution like this example:
import copy
import pandas as pd
original_table = [
{'name': 'Alice', 'age': 25, 'gender': 'Female'},
{'name': 'Bob', 'age': 32, 'gender': 'Male'},
{'name': 'Charlie', 'age': 40, 'gender': 'Male'},
{'name': 'Daisy', 'age': 22, 'gender': 'Female'},
{'name': 'Eve', 'age': 18, 'gender': 'Female'},
]
# Duplicate the table using copy.deepcopy()
duplicate_table = copy.deepcopy(original_table)
# Choose a specific column to divide the rows by
column_name = 'age'
divisor_value = original_table[3][column_name]
# Iterate over the rows in the duplicate table and divide the chosen column by the divisor value
for i, row in enumerate(duplicate_table):
    if column_name in row:
        duplicate_table[i][column_name] = row[column_name] / divisor_value
    else:
        print(f"column: {column_name} not found in table")
# Convert the duplicate table to a DataFrame
duplicate_df = pd.DataFrame(duplicate_table)
# Display the duplicate DataFrame (in a notebook)
duplicate_df
Simply use:
outdf = df.div(df.loc['Revenue']).round(3)
Output:
202112 YTD 202212 YTD
Gross Margin 0.04 0.475
Other (Income) & Expense -0.02 -0.056
Revenue 1.00 1.000
SG&A Expense 0.03 0.002
Segment EBITDA 0.04 0.033
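Because df.loc['Revenue'] is a Series indexed by the column labels, div() divides each column by that column's own Revenue, so the same call already works for any number of period columns. A small check of that, assuming a hypothetical third column '202312 YTD' created by doubling the 2022 figures (not real data):

```python
import pandas as pd

data = {'202112 YTD': {'Gross Margin': 200000, 'Other (Income) & Expense': -100000,
                       'Revenue': 5000000, 'SG&A Expense': 150000, 'Segment EBITDA': 200000},
        '202212 YTD': {'Gross Margin': 2850000, 'Other (Income) & Expense': -338000,
                       'Revenue': 6000000, 'SG&A Expense': 15000, 'Segment EBITDA': 200000}}
df = pd.DataFrame.from_dict(data)

# Hypothetical extra period, only to show that no column names are hard-coded.
df['202312 YTD'] = df['202212 YTD'] * 2

# Divide every column by its Revenue value; the Series aligns on column labels.
outdf = df.div(df.loc['Revenue']).round(3)
print(outdf)
```

Since doubling a column scales numerator and denominator alike, the new column's ratios equal the 2022 ones.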
I have the following DataFrame:
a = [{'order': '789', 'name': 'A', 'date': 20220501, 'sum': 15.1}, {'order': '456', 'name': 'A', 'date': 20220501, 'sum': 19}, {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 14.1}, {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 22.9}, {'order': '700', 'name': 'B', 'date': 20220502, 'sum': 30.1}, {'order': '710', 'name': 'B', 'date': 20220502, 'sum': 10.5}]
df = pd.DataFrame(a)
print(df)
Grouping by the columns name and date, I need to count the distinct values in column order (as a new column order_count) and sum the values in column sum.
I need to get the following result:
In your case, do:
out = df.groupby(['name','date'],as_index=False).agg({'sum':'sum','order':'nunique'})
Output:
name date sum order
0 A 20220501 34.1 2
1 B 20220502 77.6 3
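If the distinct-order count should land in a column called order_count, as the question asks, named aggregation can set the output names directly. A sketch, assuming a pandas version that supports named aggregation (0.25 or later):

```python
import pandas as pd

a = [{'order': '789', 'name': 'A', 'date': 20220501, 'sum': 15.1},
     {'order': '456', 'name': 'A', 'date': 20220501, 'sum': 19},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 14.1},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 22.9},
     {'order': '700', 'name': 'B', 'date': 20220502, 'sum': 30.1},
     {'order': '710', 'name': 'B', 'date': 20220502, 'sum': 10.5}]
df = pd.DataFrame(a)

# Each keyword is an output column; each value is (input column, aggregation).
out = df.groupby(['name', 'date'], as_index=False).agg(
    sum=('sum', 'sum'),
    order_count=('order', 'nunique'),
)
print(out)
```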
import pandas as pd
grouped = df.groupby(by=['name', 'date'])
# Sum 'sum' per group, then join the number of distinct orders per group as 'order_count'
out = grouped['sum'].sum().reset_index().join(grouped['order'].nunique().reset_index(drop=True).rename('order_count'))
I want to convert the pandas DataFrame below
data = pd.DataFrame([[1,2], [5,6]], columns=['10+', '20+'], index=['A', 'B'])
data.index.name = 'City'
data.columns.name= 'Age Group'
print(data)
Age Group 10+ 20+
City
A 1 2
B 5 6
into a list of dictionaries, like:
[
{'Age Group': '10+', 'City': 'A', 'count': 1},
{'Age Group': '20+', 'City': 'A', 'count': 2},
{'Age Group': '10+', 'City': 'B', 'count': 5},
{'Age Group': '20+', 'City': 'B', 'count': 6}
]
I am able to get the expected result above using the following loops:
result = []
cols_name = data.columns.name
index_names = data.index.name
for index in data.index:
    for col in data.columns:
        result.append({cols_name: col, index_names: index, 'count': data.loc[index, col]})
Is there a better way of doing this? My original data will have a large number of records, so using for loops will take more time.
I think you can use stack with reset_index to reshape, and then to_dict:
print (data.stack().reset_index(name='count'))
City Age Group count
0 A 10+ 1
1 A 20+ 2
2 B 10+ 5
3 B 20+ 6
print (data.stack().reset_index(name='count').to_dict(orient='records'))
[
{'Age Group': '10+', 'City': 'A', 'count': 1},
{'Age Group': '20+', 'City': 'A', 'count': 2},
{'Age Group': '10+', 'City': 'B', 'count': 5},
{'Age Group': '20+', 'City': 'B', 'count': 6}
]
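An alternative sketch uses melt instead of stack. It produces the same four records, but ordered by age group rather than by city. The explicit var_name is a choice made here so the column name does not depend on how columns.name is carried through reset_index.

```python
import pandas as pd

data = pd.DataFrame([[1, 2], [5, 6]], columns=['10+', '20+'], index=['A', 'B'])
data.index.name = 'City'
data.columns.name = 'Age Group'

# Move City back into a column, then unpivot the age-group columns into rows.
long_df = data.reset_index().melt(id_vars='City', var_name='Age Group', value_name='count')
records = long_df.to_dict(orient='records')
print(records)
```

Both stack and melt are vectorised, so either should scale far better than the nested loops.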
I need help converting the date values in a dict of dicts into percentages.
import datetime
raw_data = [{'name':'AB', 'date':datetime.date(2012, 10, 2), 'price': 23.80}, {'name':'AB', 'date':datetime.date(2012, 10, 3), 'price': 23.72}]
I have reformatted the list above into the format below using collections:
import collections
res = collections.defaultdict(dict)
for row in raw_data:
    row_col = res[row['name']]
    row_col[row['date']] = row['price']
{'AB': {datetime.date(2012, 10, 2): 23.80,
        datetime.date(2012, 10, 3): 23.72,
        datetime.date(2012, 10, 4): 25.90,
        datetime.date(2012, 10, 5): 29.95}}
Now I need to transform the data above into the format below.
Calculation formula: the last price (the earliest date, 23.80) is the divisor for all the values above it.
Date Price Percentage
datetime.date(2012, 10, 5) 29.95 26%
datetime.date(2012, 10, 4) 25.90 9%
datetime.date(2012, 10, 3) 23.72 0%
datetime.date(2012, 10, 2) 23.80 0
The calculation goes like this:
(23.72/23.80-1) * 100 = 0%
(25.90/23.80-1) * 100 = 9%
(29.95/23.80-1) * 100 = 26%
Any help is really appreciated.
You can grab the prices for one name with something like value_list = list(res['AB'].values()). This gives you a list that you can loop over or slice. Because the dates were inserted in chronological order, value_list[0] will then contain the earliest price (23.80), which is the one you're dividing everything by. Then, depending on what you plan to do with the data, you can use a for loop to calculate all the percentages, or wrap it in a function and run it as needed.
Referenced: Python: Index a Dictionary?
import datetime
import collections
raw_data = [
{'name':'AB', 'date':datetime.date(2012, 10, 2), 'price': 23.80},
{'name':'AB', 'date':datetime.date(2012, 10, 3), 'price': 23.72},
{'name':'AB', 'date':datetime.date(2012, 10, 4), 'price': 25.90},
{'name':'AB', 'date':datetime.date(2012, 10, 5), 'price': 29.95}
]
# all unique names in raw_data
names = set(row["name"] for row in raw_data)
# lowest prices, keyed by name
lowestPrices = {name: min(row["price"] for row in raw_data if row["name"] == name) for name in names}
for row in raw_data:
    name = row["name"]
    lowestPrice = lowestPrices[name]
    price = row["price"]
    percentage = ((price/lowestPrice)-1)*100
    row["percentage"] = percentage
print(raw_data)
Output (newlines added by me):
[
{'date': datetime.date(2012, 10, 5), 'price': 29.95, 'percentage': 26.264755480607093, 'name': 'AB'},
{'date': datetime.date(2012, 10, 4), 'price': 25.9, 'percentage': 9.190556492411472, 'name': 'AB'},
{'date': datetime.date(2012, 10, 2), 'price': 23.8, 'percentage': 0.337268128161905, 'name': 'AB'},
{'date': datetime.date(2012, 10, 3), 'price': 23.72, 'percentage': 0, 'name': 'AB'}
]
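Note that the answer above divides by the lowest price (23.72), while the question's formula divides by the earliest price (23.80). A minimal sketch of the question's formula, assuming "the last price" means the price on the earliest date for each name; the helper name percentages is hypothetical:

```python
import collections
import datetime

raw_data = [
    {'name': 'AB', 'date': datetime.date(2012, 10, 2), 'price': 23.80},
    {'name': 'AB', 'date': datetime.date(2012, 10, 3), 'price': 23.72},
    {'name': 'AB', 'date': datetime.date(2012, 10, 4), 'price': 25.90},
    {'name': 'AB', 'date': datetime.date(2012, 10, 5), 'price': 29.95},
]

# Same grouping as in the question: name -> {date: price}.
res = collections.defaultdict(dict)
for row in raw_data:
    res[row['name']][row['date']] = row['price']

def percentages(prices_by_date):
    """Return (date, price, % change vs. the earliest date's price), newest first."""
    dates = sorted(prices_by_date)
    base = prices_by_date[dates[0]]  # earliest price is the divisor
    return [(d, prices_by_date[d], (prices_by_date[d] / base - 1) * 100)
            for d in reversed(dates)]

for name, prices in res.items():
    for d, price, pct in percentages(prices):
        # Round to whole percent, as in the question's table.
        print(name, d, price, f"{round(pct)}%")
```

Rounding to whole percent reproduces the question's 26%, 9%, 0%, 0% column.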