Hi, I am kind of new to Python, but I have a DataFrame like this:
ID NAME NAME1 VALUE
1 Sarah orange 5
1 Roger apple 3
2 Amy pineapple 2
2 Kia pear 8
I want it like this:
ID NAME NAME1 VALUE NAME NAME1 VALUE
1 Sarah orange 5 Roger apple 3
2 Amy pineapple 2 Kia pear 8
I am using pandas, but I'm not sure how to achieve this and then write the result to a CSV. Any help would be highly appreciated! Thanks!
Use set_index with cumcount to build a MultiIndex, reshape with unstack, sort the MultiIndex columns by their second level with sort_index, and finally flatten the column names with a list comprehension and reset_index:
df = df.set_index(['ID',df.groupby('ID').cumcount()]).unstack().sort_index(axis=1, level=1)
#python 3.6+
df.columns = [f'{a}_{b}' for a, b in df.columns]
#python below 3.6
#df.columns = ['{}_{}'.format(a,b) for a, b in df.columns]
df = df.reset_index()
print (df)
ID NAME_0 NAME1_0 VALUE_0 NAME_1 NAME1_1 VALUE_1
0 1 Sarah orange 5 Roger apple 3
1 2 Amy pineapple 2 Kia pear 8
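The question also asks about writing the result to a CSV, which the answer does not show. Here is a minimal self-contained sketch of the same steps, with the sample data rebuilt by hand. The file name `output.csv` in the comment is only a hypothetical placeholder.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ID': [1, 1, 2, 2],
    'NAME': ['Sarah', 'Roger', 'Amy', 'Kia'],
    'NAME1': ['orange', 'apple', 'pineapple', 'pear'],
    'VALUE': [5, 3, 2, 8],
})

# Number the rows inside each ID (0, 1, ...) and use that as a second index level,
# then move that level into the columns and order the columns by it
wide = (df.set_index(['ID', df.groupby('ID').cumcount()])
          .unstack()
          .sort_index(axis=1, level=1))

# Flatten ('NAME', 0) -> 'NAME_0' and turn ID back into a regular column
wide.columns = [f'{a}_{b}' for a, b in wide.columns]
wide = wide.reset_index()

# With no path, to_csv returns the CSV text; pass a path such as 'output.csv' to write a file
csv_text = wide.to_csv(index=False)
print(csv_text)
```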
Agg by sales
Original data:
Sales Product Qty
James apple 10
Johnson apple 1
Jessie banana 2
Judy melon 5
James melon 5
Jessie apple 8
To:
Sales Apple Melon Banana Total
James 10 5 0 15
Judy 0 5 0 5
Jessie 8 0 2 10
Johnson 1 0 0 1
I'd like to calculate the quantity of each product, grouped by salesperson. How can I do this with pandas?
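With df as your DataFrame, try:
import numpy as np
import pandas as pd

# Sum Qty per (Sales, Product); the columns become a MultiIndex such as ('Qty', 'apple')
temp_df = df.pivot_table(index='Sales', columns='Product', aggfunc='sum')
# Keep only the product part of each column tuple
cols = [ind[1] for ind in np.array(temp_df.columns)]
data = np.array(temp_df)
final_df = pd.DataFrame({'Sales': temp_df.index})
for i, col in enumerate(cols):
    final_df = pd.concat((final_df, pd.DataFrame({col: data[:, i]})), axis=1)
# Sales/product combinations with no rows come out as NaN; treat them as 0
final_df = final_df.fillna(0)
final_df['total'] = final_df.iloc[:, 1:].sum(axis=1)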
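A shorter variant swaps in the `fill_value` parameter of `pivot_table` in place of the manual loop and fillna. This is a sketch, not the answer's own method. It keeps the lowercase product names from the data, adds a `Total` column, and rebuilds the sample data from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Sales': ['James', 'Johnson', 'Jessie', 'Judy', 'James', 'Jessie'],
    'Product': ['apple', 'apple', 'banana', 'melon', 'melon', 'apple'],
    'Qty': [10, 1, 2, 5, 5, 8],
})

# One row per salesperson, one column per product; missing combinations become 0
wide = df.pivot_table(index='Sales', columns='Product', values='Qty',
                      aggfunc='sum', fill_value=0)
# Row-wise total across all product columns
wide['Total'] = wide.sum(axis=1)
print(wide)
```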
The title wasn't too clear but here's an example. Suppose I have:
person apple orange type
Alice 11 23 apple
Bob 14 20 orange
and I want to get this column
person new_col
Alice 11
Bob 20
So we take column 'apple' for row 'Alice' and column 'orange' for row 'Bob'.
I'm thinking of using iterrows, but that would be slow. Are there faster ways to do this?
Use DataFrame.lookup (deprecated in pandas 1.2 and removed in pandas 2.0, so these snippets need an older pandas):
df['new_col'] = df.lookup(df.index, df['type'])
print (df)
person apple orange type new_col
0 Alice 11 23 apple 11
1 Bob 14 20 orange 20
If you want only a two-column DataFrame, use assign or the DataFrame constructor:
df1 = df[['person']].assign(new_col=df.lookup(df.index, df['type']))
print (df1)
person new_col
0 Alice 11
1 Bob 20
df1 = pd.DataFrame({
'person':df['person'].values,
'new_col':df.lookup(df.index, df['type'])},
columns=['person','new_col'])
print (df1)
person new_col
0 Alice 11
1 Bob 20
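Because `DataFrame.lookup` is gone in pandas 2.x, here is a sketch of a replacement along the lines of what the pandas documentation suggests. It uses `factorize` on the wanted column labels, then indexes the underlying NumPy array by row position and column code. The sample data is rebuilt from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'person': ['Alice', 'Bob'],
    'apple': [11, 14],
    'orange': [23, 20],
    'type': ['apple', 'orange'],
})

# codes[i] is the position of df['type'][i] within the unique labels in cols
codes, cols = pd.factorize(df['type'])
# Select just those columns, in factorized order, as a 2-D array
values = df.reindex(columns=cols).to_numpy()
# Pick one element per row: row i, column codes[i]
df['new_col'] = values[np.arange(len(df)), codes]

df1 = df[['person', 'new_col']]
print(df1)
```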
I have a question about pandas; if someone could help me, I would be very grateful.
I have a DataFrame:
df1 = pd.DataFrame( {'Name': ['A', 'B','A','A']})
df1
I want to group it by Name and count the rows:
x=df1.groupby("Name").size()
x
I also have another DataFrame:
df2 = pd.DataFrame( {'Name2': ['Jon','Maria','Maria','Mike','Mike','Mike']})
df2
For this one, I do groupby as well.
y= df2.groupby("Name2").size()
Then I want to build a matrix whose rows come from x and whose columns come from y, where each cell is the product of the two counts.
I want the matrix to look like this:
Jon Maria Mike
A 3 6 9
B 1 2 3
If you could tell me how to do that, I would greatly appreciate it.
You could take the dot product of x as a single column and y as a single row (an outer product):
x.to_frame().dot(y.to_frame().T)
Name2 Jon Maria Mike
Name
A 3 6 9
B 1 2 3
If you want to remove the axis names, use rename_axis:
x.to_frame().dot(y.to_frame().T)\
    .rename_axis(index=None, columns=None)
Jon Maria Mike
A 3 6 9
B 1 2 3
Alternatively, clear the names in place:
v = x.to_frame().dot(y.to_frame().T)
v.index.name = v.columns.name = None
v
Jon Maria Mike
A 3 6 9
B 1 2 3
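Or, working with the underlying NumPy arrays (indexing a Series with [:, None] no longer works in recent pandas):
In [35]: (pd.DataFrame(y.to_numpy()[:, None].dot(x.to_numpy()[:, None].T).T, columns=y.index, index=x.index)
    .rename_axis(index=None, columns=None))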
Out[35]:
Jon Maria Mike
A 3 6 9
B 1 2 3
Or we can use np.multiply.outer:
pd.DataFrame(np.multiply.outer(x.values,y.values),columns=y.index,index=x.index)
Out[344]:
Name2 Jon Maria Mike
Name
A 3 6 9
B 1 2 3
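For comparison, here is a pure-pandas sketch that avoids `dot` and NumPy. It builds each column by scaling `x` by one count from `y`. This is a swapped-in alternative, not taken from the answers above.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'B', 'A', 'A']})
df2 = pd.DataFrame({'Name2': ['Jon', 'Maria', 'Maria', 'Mike', 'Mike', 'Mike']})

x = df1.groupby('Name').size()   # A -> 3, B -> 1
y = df2.groupby('Name2').size()  # Jon -> 1, Maria -> 2, Mike -> 3

# The column for each name in y holds x multiplied by that name's count
matrix = pd.DataFrame({name: x * count for name, count in y.items()})
print(matrix)
```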
This is my example table, imported from a CSV:
Name hours work
User1 2 Kiwi
User1 5 Melon
...
User1 3 Kiwi
And this is my desired output:
Total Kiwi:
User1 5
I guess it is possible with a right join or a groupby, but I can't turn it into working code.
I tried something like this:
ou = pd.DataFrame([[ou["work"].sum()["kiwi"]]])
You need:
df = df.groupby(['Name','work'])['hours'].sum().unstack()
print (df)
work Kiwi Melon
Name
User1 5 5
Or:
df = df.pivot_table(index='Name', columns='work', values='hours', aggfunc='sum')
print (df)
work Kiwi Melon
Name
User1 5 5
And then:
print (df[['Kiwi']])
work Kiwi
Name
User1 5
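If only the Kiwi total is needed, you can filter before grouping so the full table is never built. This sketch uses only the three rows shown in the question. The rows elided with ... are not included.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['User1', 'User1', 'User1'],
    'hours': [2, 5, 3],
    'work': ['Kiwi', 'Melon', 'Kiwi'],
})

# Keep only Kiwi rows, then sum hours per user
kiwi_hours = df.loc[df['work'] == 'Kiwi'].groupby('Name')['hours'].sum()
print(kiwi_hours)
```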
I have a pandas DataFrame in the following format:
ID Name
0 1 Jim
1 1 Jimmy
2 2 Mark
3 2 Marko
4 3 Sergi
5 3 Sergi
I want to reshape the dataframe in the following format
ID Name_1 Name_2
0 1 Jim Jimmy
1 2 Mark Marko
2 3 Sergi Sergi
I want this so that I can compare the two names. I am unable to use pd.pivot or pd.pivot_table for this requirement.
It should be fairly simple; can you suggest how to do this?
You can use cumcount with pivot, and finally add_prefix to build the column names:
df['groups'] = df.groupby('ID').cumcount() + 1
df = df.pivot(index='ID', columns='groups', values='Name').add_prefix('Name_')
print (df)
groups Name_1 Name_2
ID
1 Jim Jimmy
2 Mark Marko
3 Sergi Sergi
Another solution, starting again from the original df (the code above overwrites it), uses groupby and unstack, and finally add_prefix for the column names:
df1 = df.groupby('ID')["Name"] \
.apply(lambda x: pd.Series(x.values)) \
.unstack(1) \
.rename(columns=lambda x: x+1) \
.add_prefix('Name_')
print (df1)
Name_1 Name_2
ID
1 Jim Jimmy
2 Mark Marko
3 Sergi Sergi
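Since the goal is to compare the two names, here is a self-contained sketch of the cumcount/pivot approach that finishes with an equality check. The `same` column name is a hypothetical choice, and the data is rebuilt from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 3, 3],
    'Name': ['Jim', 'Jimmy', 'Mark', 'Marko', 'Sergi', 'Sergi'],
})

# 1 for the first name within each ID, 2 for the second
df['groups'] = df.groupby('ID').cumcount() + 1
wide = (df.pivot(index='ID', columns='groups', values='Name')
          .add_prefix('Name_')
          .rename_axis(columns=None)   # drop the leftover 'groups' axis name
          .reset_index())

# True where both spellings of the name are identical
wide['same'] = wide['Name_1'] == wide['Name_2']
print(wide)
```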