This is my example table, imported from a CSV:
Name hours work
User1 2 Kiwi
User1 5 Melon
...
User1 3 Kiwi
And this is my desired output:
Total Kiwi:
User1 5
I guess it is possible with a join or a groupby, but I can't translate it into real code.
I tried something like this:
ou = pd.DataFrame([[ou["work"].sum()["kiwi"]]])
You need:
df = df.groupby(['Name','work'])['hours'].sum().unstack()
print (df)
work Kiwi Melon
Name
User1 5 5
Or:
df = df.pivot_table(index='Name', columns='work', values='hours', aggfunc='sum')
print (df)
work Kiwi Melon
Name
User1 5 5
And then:
print (df[['Kiwi']])
work Kiwi
Name
User1 5
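For completeness, a minimal runnable sketch of the first solution, with the sample data reconstructed from the question (the three rows shown, standing in for the full CSV):

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({'Name': ['User1', 'User1', 'User1'],
                   'hours': [2, 5, 3],
                   'work': ['Kiwi', 'Melon', 'Kiwi']})

# Sum hours per (Name, work) pair, then pivot work into columns
totals = df.groupby(['Name', 'work'])['hours'].sum().unstack()
print(totals['Kiwi'])  # Kiwi hours per Name
```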
Related
Agg by sales
Original data:
Sales    Product  Qty
James    apple    10
Johnson  apple    1
Jessie   banana   2
Judy     melon    5
James    melon    5
Jessie   apple    8
To:
Sales    Apple  Melon  Banana  Total
James    10     5      0       15
Judy     0      5      0       5
Jessie   8      0      2       10
Johnson  1      0      0       1
I'd like to calculate the amount for each product, grouped by salesperson, with pandas. How can I do this?
With df as your dataframe name, try:
import numpy as np

temp_df = df.pivot_table(index='Sales', columns='Product', aggfunc='sum')
cols = [ind[1] for ind in np.array(temp_df.columns)]
data = np.array(temp_df)
final_df = pd.DataFrame({'Sales': temp_df.index})
for i, col in enumerate(cols):
    final_df = pd.concat((final_df, pd.DataFrame({col: data[:, i]})), axis=1)
final_df = final_df.fillna(0)
final_df['total'] = final_df.iloc[:, 1:].sum(axis=1)
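An arguably simpler route is to let pivot_table fill the missing combinations and then add the total column directly. A sketch, with the sample data reconstructed from the question (columns keep the lowercase product names from the data):

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({'Sales': ['James', 'Johnson', 'Jessie', 'Judy', 'James', 'Jessie'],
                   'Product': ['apple', 'apple', 'banana', 'melon', 'melon', 'apple'],
                   'Qty': [10, 1, 2, 5, 5, 8]})

# fill_value=0 replaces the NaNs for missing (Sales, Product) pairs
out = df.pivot_table(index='Sales', columns='Product', values='Qty',
                     aggfunc='sum', fill_value=0)
out['Total'] = out.sum(axis=1)
out = out.reset_index()
print(out)
```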
I have a pandas dataframe like this:
Name Product Amount
0 Bob Apple 1
1 Bob Banana 2
2 Jessica Orange 3
3 Jessica Banana 4
4 Jessica Tomato 3
5 Mary Banana 2
6 John Apple 3
7 John Grape 1
import pandas as pd
data = [('Bob','Apple',1), ('Bob','Banana',2), ('Jessica','Orange',3),
('Jessica','Banana',4),('Jessica','Tomato',3), ('Mary','Banana',2),
('John','Apple',3),('John','Grape',1)]
df = pd.DataFrame(data,columns=['Name','Product','Amount'])
What I have done so far:
l = []
count = 0
for i in range(0, 8):
    row = df.iloc[i]
    if row.Product not in l:
        l.append(row.Product)
Now, l contains all the unique values in the Product column, but I need the total amount as well.
How can I find out for each product how many items were sold (for example, 4 units of Apple were sold)?
You're looking for the .groupby() function:
print( df.groupby('Product')['Amount'].sum() )
Prints:
Product
Apple 4
Banana 8
Grape 1
Orange 3
Tomato 3
Name: Amount, dtype: int64
out = df.groupby('Product')['Amount'].sum()
print('{} units of Apple were sold.'.format(out.loc['Apple']))
Prints:
4 units of Apple were sold.
Hi I am kind of new to python, but I have a dataframe like this:
ID NAME NAME1 VALUE
1 Sarah orange 5
1 Roger apple 3
2 Amy pineapple 2
2 Kia pear 8
I want it like this:
ID NAME NAME1 VALUE NAME NAME1 VALUE
1 Sarah orange 5 Roger apple 3
2 Amy pineapple 2 Kia pear 8
I am using pandas but not sure how I can achieve this and write to a csv. Any help would highly appreciated! Thanks!
Use set_index with cumcount to create a MultiIndex, reshape with unstack, sort the MultiIndex by its second level with sort_index, and finally flatten the columns with a list comprehension and call reset_index:
df = df.set_index(['ID',df.groupby('ID').cumcount()]).unstack().sort_index(axis=1, level=1)
#python 3.6+
df.columns = [f'{a}_{b}' for a, b in df.columns]
#python below 3.6
#df.columns = ['{}_{}'.format(a,b) for a, b in df.columns]
df = df.reset_index()
print (df)
ID NAME_0 NAME1_0 VALUE_0 NAME_1 NAME1_1 VALUE_1
0 1 Sarah orange 5 Roger apple 3
1 2 Amy pineapple 2 Kia pear 8
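A self-contained version of the above, including the write-to-CSV step the question asks about (the filename 'out.csv' is just a placeholder):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'ID': [1, 1, 2, 2],
                   'NAME': ['Sarah', 'Roger', 'Amy', 'Kia'],
                   'NAME1': ['orange', 'apple', 'pineapple', 'pear'],
                   'VALUE': [5, 3, 2, 8]})

# cumcount numbers the rows within each ID, giving the second index level
df = (df.set_index(['ID', df.groupby('ID').cumcount()])
        .unstack()
        .sort_index(axis=1, level=1))
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
df.to_csv('out.csv', index=False)  # placeholder output path
```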
The title wasn't too clear but here's an example. Suppose I have:
person apple orange type
Alice 11 23 apple
Bob 14 20 orange
and I want to get this column
person new_col
Alice 11
Bob 20
so we get the column 'apple' for row 'Alice' and 'orange' for row 'Bob'.
I'm thinking iterrows, but that would be slow. Are there faster ways to do this?
Use DataFrame.lookup:
df['new_col'] = df.lookup(df.index, df['type'])
print (df)
person apple orange type new_col
0 Alice 11 23 apple 11
1 Bob 14 20 orange 20
If you want only a 2-column DataFrame, use assign or the DataFrame constructor:
df1 = df[['person']].assign(new_col=df.lookup(df.index, df['type']))
print (df1)
person new_col
0 Alice 11
1 Bob 20
df1 = pd.DataFrame({
'person':df['person'].values,
'new_col':df.lookup(df.index, df['type'])},
columns=['person','new_col'])
print (df1)
person new_col
0 Alice 11
1 Bob 20
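Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. On recent versions, an equivalent (the factorize-based replacement suggested in the pandas deprecation notes) looks like this, with the sample data from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'person': ['Alice', 'Bob'],
                   'apple': [11, 14],
                   'orange': [23, 20],
                   'type': ['apple', 'orange']})

# factorize maps each 'type' value to a column position;
# reindex orders the columns to match, then numpy picks one cell per row
idx, cols = pd.factorize(df['type'])
df['new_col'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
print(df[['person', 'new_col']])
```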
I have the following DataFrame called df:
KEY_ID READY STEADY GO
001 Yes Maybe 123
002 No Maybe 123
003 Yes Sometimes 234
004 Yes Later 234
005 No Sometimes 345
I use df.count() to see how many values are filled in, which is 5 for every column:
KEY_ID 5
READY 5
STEADY 5
GO 5
But I would also like to see how often each value in column STEADY occurs. I do this with abc = df['STEADY'].value_counts(), which gives me:
Sometimes 2
Maybe 2
Later 1
With a for loop I can extract the counts from abc, which I just created with value_counts():
for i in abc:
    print(i)
However, I tried several methods, including
for i, j in enumerate(abc):
    print(i)
    print(j)
to get the names Sometimes, Maybe and Later as well, as I don't want to type them manually. How do I extract these names from the value_counts() result?
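For reference, value_counts returns a Series, so the labels live in its index and the counts in its values. A sketch with hypothetical data mirroring the STEADY column:

```python
import pandas as pd

s = pd.Series(['Maybe', 'Maybe', 'Sometimes', 'Later', 'Sometimes'])
abc = s.value_counts()

# iterate over (label, count) pairs
for name, count in abc.items():
    print(name, count)

names = abc.index.tolist()  # the labels, without typing them manually
```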
Are you looking for the groupby() function?
import pandas as pd
lst = [['Apple', 1], ['Orange', 1], ['Apple', 2], ['Orange', 1], ['Apple', 3], ['Orange', 1]]
df = pd.DataFrame(lst)
df.columns = ['fruit', 'amount']
df.groupby('fruit').sum()
import pandas as pd
rowdata = [['Apple', 1], ['Orange', 1], ['Apple', 2], ['Orange', 1], ['Apple', 3],['Orange', 1]]
df = pd.DataFrame(rowdata)
df.groupby(0).sum()
This will give a dataframe like the one below,
        1
0
Apple   6
Orange  3
But df.sum() alone will give this,
0    AppleOrangeAppleOrangeAppleOrange
1                                    9
I hope the first one is what you are expecting.
IIUC:
In [339]: df
Out[339]:
name val
0 Apple 1
1 Orange 1
2 Apple 2
3 Orange 1
4 Apple 3
5 Orange 1
In [340]: df.groupby('name', as_index=False)['val'].sum()
Out[340]:
name val
0 Apple 6
1 Orange 3
In [341]: df.groupby('name', as_index=False)['val'].sum()['name']
Out[341]:
0 Apple
1 Orange
Name: name, dtype: object
In [342]: df.groupby('name', as_index=False)['val'].sum()['name'].tolist()
Out[342]: ['Apple', 'Orange']
It seems you want to filter first by boolean indexing with isin:
print (df)
A B
0 Peach 3
1 Pear 6
2 Apple 1
3 Orange 1
4 Apple 2
5 Orange 1
6 Apple 3
7 Orange 1
df1 = df[df['A'].isin(['Apple','Orange'])]
print (df1)
A B
2 Apple 1
3 Orange 1
4 Apple 2
5 Orange 1
6 Apple 3
7 Orange 1
Then groupby and aggregate with sum:
df2 = df1.groupby('A', as_index=False)['B'].sum()
print (df2)
A B
0 Apple 6
1 Orange 3
Another solution is to groupby and aggregate first, and then select only the values in the list:
df1 = df.groupby('A')['B'].sum()
df2 = df1.loc[['Apple','Orange']].reset_index()
print (df2)
A B
0 Apple 6
1 Orange 3
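Both steps can also be chained in one expression. A sketch, with the sample data above reconstructed:

```python
import pandas as pd

df = pd.DataFrame({'A': ['Peach', 'Pear', 'Apple', 'Orange',
                         'Apple', 'Orange', 'Apple', 'Orange'],
                   'B': [3, 6, 1, 1, 2, 1, 3, 1]})

# filter to the wanted fruits, then aggregate in one chain
df2 = (df[df['A'].isin(['Apple', 'Orange'])]
         .groupby('A', as_index=False)['B'].sum())
print(df2)
```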