Count Most Frequent Values for Key in Dictionary

I have a pandas DataFrame in which the column 'items' holds a dictionary (stored as a string) showing which products were bought in each transaction:
import pandas as pd
data = {'price':[40, 15, 10, 2],
'items': ["{'product': 'Product1', 'quantity': 4, 'product': 'Product2', 'quantity': 1}", "{'product': 'Product2', 'quantity': 1, 'product': 'Product3', 'quantity': 1,'product': 'Product1', 'quantity': 1}", "{'product': 'Product1', 'quantity': 4}", "{'product': 'Product3', 'quantity': 1, 'product': 'Product1', 'quantity': 1}"]
}
df = pd.DataFrame(data, columns=['price', 'items'])
I want to find out which products have been bought most. In this case the result should look like:
Product1: 4
Product2: 2
How can I count the most frequent values of the key 'product' within the column 'items'?

Perhaps you could use a namedtuple (from the built-in collections package).
First, define a named tuple called Record and create a list of these:
from collections import namedtuple
import pandas as pd
Record = namedtuple('Record', 'price product quantity')
records = [
Record(40, 'Product1', 4), Record(40, 'Product2', 1),
Record(15, 'Product2', 1), Record(15, 'Product3', 1), Record(15, 'Product1', 1),
Record(10, 'Product1', 4),
Record(2, 'Product3', 1), Record(2, 'Product1', 1),
]
Second, create the data frame and use groupby to compute the total quantity of each product:
# create data frame
df = pd.DataFrame(records)
# compute summary statistic
df = df.groupby('product')['quantity'].sum()
print(df)
product
Product1 10
Product2 2
Product3 2
Name: quantity, dtype: int64
This does not match your expected results, because it sums the quantities (Product1: 10) instead of counting transactions. Sorry if I misunderstood your data or question.
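If "bought most" means the number of transactions a product appears in (which is what the expected Product1: 4, Product2: 2 suggests), a minimal sketch could pull the product names out of the strings and count them with collections.Counter. It assumes every entry uses the exact `'product': 'Name'` quoting shown in the question. Note that each string repeats the 'product' key, so evaluating it as a real dict (for example with ast.literal_eval) would keep only the last product of each transaction.

```python
from collections import Counter

import pandas as pd

data = {'price': [40, 15, 10, 2],
        'items': ["{'product': 'Product1', 'quantity': 4, 'product': 'Product2', 'quantity': 1}",
                  "{'product': 'Product2', 'quantity': 1, 'product': 'Product3', 'quantity': 1,'product': 'Product1', 'quantity': 1}",
                  "{'product': 'Product1', 'quantity': 4}",
                  "{'product': 'Product3', 'quantity': 1, 'product': 'Product1', 'quantity': 1}"]}
df = pd.DataFrame(data, columns=['price', 'items'])

# Extract every product name per row with a regex (assumes the quoting shown above),
# since parsing the duplicate-key strings as dicts would lose all but the last product.
products = df['items'].str.findall(r"'product': '([^']+)'")

# Count each product once per transaction; dict.fromkeys drops repeats but keeps order.
counts = Counter(p for names in products for p in dict.fromkeys(names))
for product, n in counts.most_common():
    print(f"{product}: {n}")
```

For the sample data this prints Product1: 4, Product2: 2 and Product3: 2. Product3 ties with Product2, which the expected output in the question leaves out.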

Related

Dictionary list/dict comparison

I would really appreciate any help with the following. From a list of dicts, I want to combine all entries that share the same name into one, with a second dict value holding the total of another field (the price). I have written the code below as an example:
l = [{'id': 1, 'name': 'apple', 'price': '100', 'year': '2000', 'currency': 'eur'},
{'id': 2, 'name': 'apple', 'price': '150', 'year': '2071', 'currency': 'eur'},
{'id': 3, 'name': 'apple', 'price': '1220', 'year': '2076', 'currency': 'eur'},
{'id': 4, 'name': 'cucumber', 'price': '90000000', 'year': '2080', 'currency': 'eur'},
{'id': 5, 'name': 'pear', 'price': '1000', 'year': '2000', 'currency': 'eur'},
{'id': 6, 'name': 'apple', 'price': '150', 'year': '2022', 'currency': 'eur'},
{'id': 9, 'name': 'apple', 'price': '100', 'year': '2000', 'currency': 'eur'},
{'id': 10, 'name': 'grape', 'price': '150', 'year': '2022', 'currency': 'eur'},
]
new_list = []
for d in l:
    if d['name'] not in new_list:
        new_list.append(d['name'])
print(new_list)
price_list = []
for price in l:
    if price['price'] not in price_list:
        price_list.append(price['price'])
print(price_list)
The output I am hoping to achieve is:
[{'name': 'apple'}, {'price': <The total price for all apples>}]

Use a dictionary whose keys are the names and whose values are lists of prices. Then calculate the total of each list.
d = {}
for item in l:
    d.setdefault(item['name'], []).append(int(item['price']))
for name, prices in d.items():
    d[name] = sum(prices)
print(d)
Actually, I thought this was the same as yesterday's question, where you wanted the average. If you just want the total, you don't need the lists. Use a defaultdict containing integers, and just add the price to it.
from collections import defaultdict
d = defaultdict(int)
for item in l:
    d[item['name']] += int(item['price'])
print(d)

This method only requires one loop:
prices = {}
for item in l:
    prices.update({item['name']: prices.get(item['name'], 0) + int(item['price'])})
print(prices)
Just for fun I decided to also implement the functionality with the item and price dictionaries separated as asked in the question, which gave the following horrendous code:
prices = []
for item in l:
    # get indices of prices of corresponding items
    price_idx = [n+1 for n, x in enumerate(prices) if item['name'] == x.get('name') and n % 2 == 0]
    if not price_idx:
        prices.append({'name': item['name']})
        prices.append({'price': int(item['price'])})
    else:
        prices[price_idx[0]].update({'price': prices[price_idx[0]]['price'] + int(item['price'])})
print(prices)
And requires the following function to retrieve prices:
def get_price(name):
    for n, x in enumerate(prices):
        if n % 2 == 0 and x['name'] == name:
            return prices[n+1]['price']
Which honestly completely defeats the point of having a data structure. But if it answers your question, there you go.

Here is another option:
result = {}
for item in l:
    if item['name'] not in result:
        result[item['name']] = {'name': item['name'], 'price': 0}
    result[item['name']]['price'] += int(item['price'])
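If you want one record per name in a list rather than a dict of totals, a small sketch is to accumulate the totals and then reshape them. The {'name': ..., 'price': ...} record shape is an interpretation of the question's desired output, not something it spells out exactly.

```python
from collections import defaultdict

l = [{'id': 1, 'name': 'apple', 'price': '100', 'year': '2000', 'currency': 'eur'},
     {'id': 2, 'name': 'apple', 'price': '150', 'year': '2071', 'currency': 'eur'},
     {'id': 3, 'name': 'apple', 'price': '1220', 'year': '2076', 'currency': 'eur'},
     {'id': 4, 'name': 'cucumber', 'price': '90000000', 'year': '2080', 'currency': 'eur'},
     {'id': 5, 'name': 'pear', 'price': '1000', 'year': '2000', 'currency': 'eur'},
     {'id': 6, 'name': 'apple', 'price': '150', 'year': '2022', 'currency': 'eur'},
     {'id': 9, 'name': 'apple', 'price': '100', 'year': '2000', 'currency': 'eur'},
     {'id': 10, 'name': 'grape', 'price': '150', 'year': '2022', 'currency': 'eur'},
     ]

# Prices are stored as strings, so convert before adding them up per name.
totals = defaultdict(int)
for item in l:
    totals[item['name']] += int(item['price'])

# Reshape into a list with one {'name': ..., 'price': ...} record per name.
records = [{'name': name, 'price': total} for name, total in totals.items()]
print(records)
```

With the result dict from the previous answer, list(result.values()) gives the same list.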

Python divide rows by specific row for each column

I need to divide each row by a specific row, separately for each column of my data frame. In this case, I need to divide every row by Revenue for each time period, to get each account as a percentage of Revenue. I would also like the solution to work for any number of columns.
My current DataFrame:
import pandas as pd
data = {'202112 YTD': {'Gross Margin': 200000,
'Other (Income) & Expense': -100000,
'Revenue': 5000000,
'SG&A Expense': 150000,
'Segment EBITDA': 200000},
'202212 YTD': {'Gross Margin': 2850000,
'Other (Income) & Expense': -338000,
'Revenue': 6000000,
'SG&A Expense': 15000,
'Segment EBITDA': 200000}}
df = pd.DataFrame.from_dict(data)
df
Desired Output:
outdata = {'202112 YTD': {'Gross Margin': 0.040,
'Other (Income) & Expense': -0.020,
'Revenue': 1,
'SG&A Expense': 0.030,
'Segment EBITDA': 0.040},
'202212 YTD': {'Gross Margin': 0.475,
'Other (Income) & Expense': -0.056,
'Revenue': 1,
'SG&A Expense': 0.003,
'Segment EBITDA': 0.033}}
outdf = pd.DataFrame.from_dict(outdata)
outdf
Help would be appreciated. My original attempt was to structure the solution like this example:
import copy
import pandas as pd
original_table = [
{'name': 'Alice', 'age': 25, 'gender': 'Female'},
{'name': 'Bob', 'age': 32, 'gender': 'Male'},
{'name': 'Charlie', 'age': 40, 'gender': 'Male'},
{'name': 'Daisy', 'age': 22, 'gender': 'Female'},
{'name': 'Eve', 'age': 18, 'gender': 'Female'},
]
# Duplicate the table using copy.deepcopy()
duplicate_table = copy.deepcopy(original_table)
# Choose a specific column to divide the rows by
column_name = 'age'
divisor_value = original_table[3][column_name]
# Iterate over the rows in the duplicate table and divide the chosen column by the divisor value
for i, row in enumerate(duplicate_table):
    if column_name in row:
        duplicate_table[i][column_name] = row[column_name] / divisor_value
    else:
        print(f"column: {column_name} not found in table")
# Convert the duplicate table to a DataFrame
duplicate_df = pd.DataFrame(duplicate_table)
# Display the duplicate DataFrame (in a notebook)
duplicate_df

Simply use:
outdf = df.div(df.loc['Revenue']).round(3)
Output:
                          202112 YTD  202212 YTD
Gross Margin                    0.04       0.475
Other (Income) & Expense       -0.02      -0.056
Revenue                         1.00       1.000
SG&A Expense                    0.03       0.002
Segment EBITDA                  0.04       0.033
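Because df.div with a Series aligns the Series index against the DataFrame's columns, each column is divided by its own Revenue value, so the same one-liner works for any number of period columns. A self-contained sketch, with scaling to percentages via mul(100) as an optional extra step:

```python
import pandas as pd

data = {'202112 YTD': {'Gross Margin': 200000,
                       'Other (Income) & Expense': -100000,
                       'Revenue': 5000000,
                       'SG&A Expense': 150000,
                       'Segment EBITDA': 200000},
        '202212 YTD': {'Gross Margin': 2850000,
                       'Other (Income) & Expense': -338000,
                       'Revenue': 6000000,
                       'SG&A Expense': 15000,
                       'Segment EBITDA': 200000}}
df = pd.DataFrame.from_dict(data)

# df.loc['Revenue'] is a Series indexed by the column labels, so div matches
# each column with its own Revenue figure.
ratios = df.div(df.loc['Revenue'])

# Optional: express as percentages, so Revenue becomes 100.
percent = ratios.mul(100).round(1)
print(percent)
```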

How to distinct (count), group by and sum data in DataFrame in Python?

I have the following DataFrame:
a = [{'order': '789', 'name': 'A', 'date': 20220501, 'sum': 15.1}, {'order': '456', 'name': 'A', 'date': 20220501, 'sum': 19}, {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 14.1}, {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 22.9}, {'order': '700', 'name': 'B', 'date': 20220502, 'sum': 30.1}, {'order': '710', 'name': 'B', 'date': 20220502, 'sum': 10.5}]
df = pd.DataFrame(a)
print(df)
I need to count the distinct values in the column order and put the counts in a new column order_count, grouping by the columns name and date, and also sum the values in the column sum.
I need to get the following result:

In your case, do:
out = df.groupby(['name','date'],as_index=False).agg({'sum':'sum','order':'nunique'})
Output:
name date sum order
0 A 20220501 34.1 2
1 B 20220502 77.6 3
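The agg call above keeps the name order for the distinct count. If you want that column to be called order_count as in the question, named aggregation (available since pandas 0.25) lets you name each output column directly:

```python
import pandas as pd

a = [{'order': '789', 'name': 'A', 'date': 20220501, 'sum': 15.1},
     {'order': '456', 'name': 'A', 'date': 20220501, 'sum': 19},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 14.1},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 22.9},
     {'order': '700', 'name': 'B', 'date': 20220502, 'sum': 30.1},
     {'order': '710', 'name': 'B', 'date': 20220502, 'sum': 10.5}]
df = pd.DataFrame(a)

# Named aggregation: output_column=(input_column, aggregation).
out = df.groupby(['name', 'date'], as_index=False).agg(
    sum=('sum', 'sum'),
    order_count=('order', 'nunique'),
)
print(out)
```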

import pandas as pd
g = df.groupby(by=['name', 'date'])
# Total of 'sum' per group, joined with the number of distinct orders per group
out = g['sum'].sum().to_frame().join(g['order'].nunique().rename('order_count')).reset_index()

pandas dataframe convert values in array of objects

I want to convert the pandas DataFrame below
import pandas as pd
data = pd.DataFrame([[1,2], [5,6]], columns=['10+', '20+'], index=['A', 'B'])
data.index.name = 'City'
data.columns.name = 'Age Group'
print(data)
Age Group 10+ 20+
City
A 1 2
B 5 6
into a list of dictionaries, like this:
[
{'Age Group': '10+', 'City': 'A', 'count': 1},
{'Age Group': '20+', 'City': 'A', 'count': 2},
{'Age Group': '10+', 'City': 'B', 'count': 5},
{'Age Group': '20+', 'City': 'B', 'count': 6}
]
I can get the expected result above with the following loops:
result = []
cols_name = data.columns.name
index_names = data.index.name
for index in data.index:
    for col in data.columns:
        result.append({cols_name: col, index_names: index, 'count': data.loc[index, col]})
Is there a better way of doing this? My real data will have a large number of records, so the for loops will be slow.

I think you can use stack with reset_index to reshape, and then to_dict:
print (data.stack().reset_index(name='count'))
City Age Group count
0 A 10+ 1
1 A 20+ 2
2 B 10+ 5
3 B 20+ 6
print (data.stack().reset_index(name='count').to_dict(orient='records'))
[
{'Age Group': '10+', 'City': 'A', 'count': 1},
{'Age Group': '20+', 'City': 'A', 'count': 2},
{'Age Group': '10+', 'City': 'B', 'count': 5},
{'Age Group': '20+', 'City': 'B', 'count': 6}
]
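As an alternative to stack, DataFrame.melt produces the same long format. This sketch sorts afterwards because melt emits the rows column by column (all 10+ rows first, then all 20+ rows):

```python
import pandas as pd

data = pd.DataFrame([[1, 2], [5, 6]], columns=['10+', '20+'], index=['A', 'B'])
data.index.name = 'City'
data.columns.name = 'Age Group'

# Move City into a column, then melt every remaining column into (Age Group, count) pairs.
long = data.reset_index().melt(id_vars='City', var_name='Age Group', value_name='count')
# Restore the City-major row order used in the question.
long = long.sort_values(['City', 'Age Group'], ignore_index=True)
records = long.to_dict(orient='records')
print(records)
```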

Calculating percentage based on python dictionaries

I need help converting the date values in a dict of dicts into percentages.
import datetime
raw_data = [{'name':'AB', 'date':datetime.date(2012, 10, 2), 'price': 23.80}, {'name':'AB', 'date':datetime.date(2012, 10, 3), 'price': 23.72}]
I have reshaped the list above into the format below using collections:
import collections
res = collections.defaultdict(dict)
for row in raw_data:
    row_col = res[row['name']]
    row_col[row['date']] = row['price']
{'AB': {datetime.date(2012, 10, 2): 23.80,
datetime.date(2012, 10, 3): 23.72,
datetime.date(2012, 10, 4): 25.90,
datetime.date(2012, 10, 5): 29.95}}
Now I need to turn the data above into the format below.
Calculation formula:
the last price in the table is the divisor for all the values above it.
Date                        Price   Percentage
datetime.date(2012, 10, 5)  29.95   26%
datetime.date(2012, 10, 4)  25.90   9%
datetime.date(2012, 10, 3)  23.72   0%
datetime.date(2012, 10, 2)  23.80   0%
The calculation goes like this:
(23.72/23.80-1) * 100 = 0%
(25.90/23.80-1) * 100 = 9%
(29.95/23.80-1) * 100 = 26%
Any help is really appreciated.

You can get a list of the prices for one name with something like value_list = list(res['AB'].values()). In Python 3, dict.values() returns a view that cannot be indexed, so wrap it in list() first. Because dicts preserve insertion order (Python 3.7+), value_list[0] will then hold the first price, 23.80, which is the value you are dividing everything by. Then, depending on what you plan to do with the data, you can use a for loop to calculate all the percentages, or wrap it in a function and run it as needed.
Referenced: Python: Index a Dictionary?

import datetime
import collections
raw_data = [
{'name':'AB', 'date':datetime.date(2012, 10, 2), 'price': 23.80},
{'name':'AB', 'date':datetime.date(2012, 10, 3), 'price': 23.72},
{'name':'AB', 'date':datetime.date(2012, 10, 4), 'price': 25.90},
{'name':'AB', 'date':datetime.date(2012, 10, 5), 'price': 29.95}
]
#all unique names in raw_data
names = set(row["name"] for row in raw_data)
#lowest prices, keyed by name
lowestPrices = {name: min(row["price"] for row in raw_data if row["name"] == name) for name in names}
for row in raw_data:
    name = row["name"]
    lowestPrice = lowestPrices[name]
    price = row["price"]
    percentage = ((price/lowestPrice)-1)*100
    row["percentage"] = percentage
print(raw_data)
Output (newlines added by me):
[
{'date': datetime.date(2012, 10, 5), 'price': 29.95, 'percentage': 26.264755480607093, 'name': 'AB'},
{'date': datetime.date(2012, 10, 4), 'price': 25.9, 'percentage': 9.190556492411472, 'name': 'AB'},
{'date': datetime.date(2012, 10, 2), 'price': 23.8, 'percentage': 0.337268128161905, 'name': 'AB'},
{'date': datetime.date(2012, 10, 3), 'price': 23.72, 'percentage': 0, 'name': 'AB'}
]
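The answer above divides by the lowest price (23.72), whereas the formula in the question divides by 23.80, the price on the earliest date. A minimal sketch of that variant, assuming the earliest date's price is always the base and that percentages are rounded to whole numbers as in the question's table; the helper name percentage_table is hypothetical:

```python
import datetime
from collections import defaultdict

raw_data = [
    {'name': 'AB', 'date': datetime.date(2012, 10, 2), 'price': 23.80},
    {'name': 'AB', 'date': datetime.date(2012, 10, 3), 'price': 23.72},
    {'name': 'AB', 'date': datetime.date(2012, 10, 4), 'price': 25.90},
    {'name': 'AB', 'date': datetime.date(2012, 10, 5), 'price': 29.95},
]

# Same shape as the question's `res`: {name: {date: price}}
res = defaultdict(dict)
for row in raw_data:
    res[row['name']][row['date']] = row['price']


def percentage_table(prices):
    """Hypothetical helper: (date, price, percent) rows, newest date first.

    The base is the price on the earliest date; percent is
    (price / base - 1) * 100 rounded to a whole number.
    """
    base = prices[min(prices)]
    return [(date, prices[date], round((prices[date] / base - 1) * 100))
            for date in sorted(prices, reverse=True)]


for name, prices in res.items():
    print(name)
    for date, price, pct in percentage_table(prices):
        print(f"{date}  {price:6.2f}  {pct}%")
```

Using round() on the float returns an int, so the small negative change for 23.72 prints as 0% rather than -0%.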
