Python Help Pandas row and Column

Hi, I am kind of new to Python, and I have a dataframe like this:
ID NAME NAME1 VALUE
1 Sarah orange 5
1 Roger apple 3
2 Amy pineapple 2
2 Kia pear 8
I want it like this:
ID NAME NAME1 VALUE NAME NAME1 VALUE
1 Sarah orange 5 Roger apple 3
2 Amy pineapple 2 Kia pear 8
I am using pandas but am not sure how to achieve this and write the result to a CSV. Any help would be highly appreciated! Thanks!

Use set_index with cumcount to build a MultiIndex, reshape with unstack, sort the MultiIndex columns by their second level with sort_index, and finally flatten the column names with a list comprehension before calling reset_index:
df = df.set_index(['ID',df.groupby('ID').cumcount()]).unstack().sort_index(axis=1, level=1)
#python 3.6+
df.columns = [f'{a}_{b}' for a, b in df.columns]
#python below 3.6
#df.columns = ['{}_{}'.format(a,b) for a, b in df.columns]
df = df.reset_index()
print (df)
ID NAME_0 NAME1_0 VALUE_0 NAME_1 NAME1_1 VALUE_1
0 1 Sarah orange 5 Roger apple 3
1 2 Amy pineapple 2 Kia pear 8
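
The answer above does not cover the CSV part of the question. The following is a minimal, self-contained sketch that rebuilds the sample data from the question, applies the same set_index / cumcount / unstack steps, and serialises the result with to_csv. The variable names (pos, wide, csv_text) are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "NAME": ["Sarah", "Roger", "Amy", "Kia"],
    "NAME1": ["orange", "apple", "pineapple", "pear"],
    "VALUE": [5, 3, 2, 8],
})

# Position of each row inside its ID group: 0, 1, 0, 1
pos = df.groupby("ID").cumcount()

# Move that position into the columns, then group the columns by position
wide = df.set_index(["ID", pos]).unstack().sort_index(axis=1, level=1)

# Flatten the two-level column labels, e.g. ('NAME', 0) -> 'NAME_0'
wide.columns = [f"{name}_{i}" for name, i in wide.columns]
wide = wide.reset_index()

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = wide.to_csv(index=False)
print(csv_text)
```

Passing a file path as the first argument to to_csv writes the same text to disk instead of returning it.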

Related

How to use pandas to agg data with different condition for different columns?

Agg by sales
Original Data:
Sales    Product  Qty
James    apple    10
Johnson  apple    1
Jessie   banana   2
Judy     melon    5
James    melon    5
Jessie   apple    8
To:
Sales    Apple  Melon  Banana  Total
James    10     5      0       15
Judy     0      5      0       5
Jessie   8      0      2       10
Johnson  1      0      0       1
I'd like to calculate the amount for each product, grouped by salesperson. How can I do this with pandas?

With df as your dataframe, try:
import numpy as np
import pandas as pd

# Sum Qty for every (Sales, Product) pair; the columns become a MultiIndex such as ('Qty', 'apple')
temp_df = df.pivot_table(index='Sales', columns='Product', aggfunc='sum')
cols = [ind[1] for ind in np.array(temp_df.columns)]
data = np.array(temp_df)
final_df = pd.DataFrame({'Sales':temp_df.index})
# Append one column per product
for i, col in enumerate(cols):
    final_df = pd.concat((final_df, pd.DataFrame({col:data[:, i]})), axis=1)
final_df = final_df.fillna(0)
final_df['total'] = final_df.iloc[:, 1:].sum(axis=1)
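
A shorter variant may be worth considering. This sketch is not the answerer's method: it assumes the Qty column is the only value to aggregate and lets pivot_table fill missing combinations with 0 directly. The sample frame is rebuilt from the question, and the name wide is an illustrative assumption.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Sales": ["James", "Johnson", "Jessie", "Judy", "James", "Jessie"],
    "Product": ["apple", "apple", "banana", "melon", "melon", "apple"],
    "Qty": [10, 1, 2, 5, 5, 8],
})

# One row per salesperson, one column per product, 0 where no sale exists
wide = df.pivot_table(index="Sales", columns="Product", values="Qty",
                      aggfunc="sum", fill_value=0)

# Row-wise total across all product columns
wide["Total"] = wide.sum(axis=1)

# Turn the Sales index back into an ordinary column
wide = wide.reset_index()
print(wide)
```

Naming values="Qty" keeps the columns single-level, so no manual flattening of a MultiIndex is needed.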

Extract value of a particular column name in pandas as listed in another column

The title wasn't too clear but here's an example. Suppose I have:
person apple orange type
Alice 11 23 apple
Bob 14 20 orange
and I want to get this column
person new_col
Alice 11
Bob 20
so we get the column 'apple' for row 'Alice' and 'orange' for row 'Bob'.
I'm thinking iterrows, but that would be slow. Are there faster ways to do this?

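Use DataFrame.lookup (note that it was deprecated in pandas 1.2 and removed in pandas 2.0, so this requires an older pandas version):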
df['new_col'] = df.lookup(df.index, df['type'])
print (df)
person apple orange type new_col
0 Alice 11 23 apple 11
1 Bob 14 20 orange 20
If you want a DataFrame with only two columns, use assign or the DataFrame constructor:
df1 = df[['person']].assign(new_col=df.lookup(df.index, df['type']))
print (df1)
person new_col
0 Alice 11
1 Bob 20
df1 = pd.DataFrame({
    'person':df['person'].values,
    'new_col':df.lookup(df.index, df['type'])},
    columns=['person','new_col'])
print (df1)
person new_col
0 Alice 11
1 Bob 20
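
On pandas versions where DataFrame.lookup is unavailable, the same per-row selection can be sketched with NumPy integer indexing. This is a swapped-in technique, not the answer's own code; the names candidates, col_pos and row_pos are illustrative assumptions, and the sketch assumes every value in type names one of the candidate columns.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "person": ["Alice", "Bob"],
    "apple": [11, 14],
    "orange": [23, 20],
    "type": ["apple", "orange"],
})

# Columns that the 'type' column may point at (assumed known up front)
candidates = df[["apple", "orange"]]

# Integer position of the requested column for each row
col_pos = candidates.columns.get_indexer(df["type"])
row_pos = np.arange(len(df))

# Pick one cell per row: (row 0, 'apple'), (row 1, 'orange')
df["new_col"] = candidates.to_numpy()[row_pos, col_pos]
print(df[["person", "new_col"]])
```

get_indexer returns -1 for labels it cannot find, so it may be worth checking for negative positions before indexing on real data.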

Pandas, how to make matrix

I have a question about pandas and if someone could help me, I would be grateful for that very much.
I have a dataframe
df1 = pd.DataFrame( {'Name': ['A', 'B','A','A']})
df1
I want to do groupby for this.
x=df1.groupby("Name").size()
x
I also have another dataframe
df2 = pd.DataFrame( {'Name2': ['Jon','Maria','Maria','Mike','Mike','Mike']})
df2
For this one, I do groupby as well.
y= df2.groupby("Name2").size()
Then I want to build a matrix whose rows come from x and whose columns come from y, where each cell is the product of the two counts.
I want the matrix to look like this:
Jon Maria Mike
A 3 6 9
B 1 2 3
If you could tell me how to do that, I would greatly appreciate it.

You could perform a dot product:
x.to_frame().dot(y.to_frame().T)
Name2 Jon Maria Mike
Name
A 3 6 9
B 1 2 3
If you want to remove the axis labels, use rename_axis:
x.to_frame().dot(y.to_frame().T)\
    .rename_axis(None).rename_axis(None, axis=1)
Jon Maria Mike
A 3 6 9
B 1 2 3
Alternatively, assign in-place:
v = x.to_frame().dot(y.to_frame().T)
v.index.name = v.columns.name = None
v
Jon Maria Mike
A 3 6 9
B 1 2 3

In [35]: (pd.DataFrame(y.to_numpy()[:,None].dot(x.to_numpy()[:,None].T).T, columns=y.index, index=x.index)
              .rename_axis(None)
              .rename_axis(None, axis=1))
Out[35]:
Jon Maria Mike
A 3 6 9
B 1 2 3

Or we can use np.multiply.outer:
pd.DataFrame(np.multiply.outer(x.values,y.values),columns=y.index,index=x.index)
Out[344]:
Name2 Jon Maria Mike
Name
A 3 6 9
B 1 2 3
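
For reference, here is a self-contained sketch of the outer-product approach that starts from the two frames in the question (with the missing quote in df2 fixed). The variable name matrix is an illustrative assumption; the axis names are cleared up front so the result matches the unlabeled layout the question asks for.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Name": ["A", "B", "A", "A"]})
df2 = pd.DataFrame({"Name2": ["Jon", "Maria", "Maria", "Mike", "Mike", "Mike"]})

# Group sizes: x is A=3, B=1 and y is Jon=1, Maria=2, Mike=3
x = df1.groupby("Name").size()
y = df2.groupby("Name2").size()

# Cell (i, j) is x[i] * y[j]; rename(None) drops the 'Name'/'Name2' axis labels
matrix = pd.DataFrame(
    np.multiply.outer(x.to_numpy(), y.to_numpy()),
    index=x.index.rename(None),
    columns=y.index.rename(None),
)
print(matrix)
```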

Pandas group and sum values

This is my example table, imported from a CSV:
Name hours work
User1 2 Kiwi
User1 5 Melon
...
User1 3 Kiwi
And this is my desired output:
Total Kiwi:
User1 5
I guess it is possible with a right join or a groupby, but I can't turn it into real code.
I tried something like this:
ou = pd.DataFrame([[ou["work"].sum()["kiwi"]]])

You need:
df = df.groupby(['Name','work'])['hours'].sum().unstack()
print (df)
work Kiwi Melon
Name
User1 5 5
Or:
df = df.pivot_table(index='Name', columns='work', values='hours', aggfunc='sum')
print (df)
work Kiwi Melon
Name
User1 5 5
And then:
print (df[['Kiwi']])
work Kiwi
Name
User1 5
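
If only the Kiwi total is needed, filtering before grouping avoids building the full table. This is a minimal sketch, not part of the original answer; it uses only the three rows visible in the question (the elided rows are not reconstructed), and the name kiwi_total is an illustrative assumption.

```python
import pandas as pd

# Only the rows shown in the question; the "..." rows are left out
df = pd.DataFrame({
    "Name": ["User1", "User1", "User1"],
    "hours": [2, 5, 3],
    "work": ["Kiwi", "Melon", "Kiwi"],
})

# Keep the Kiwi rows, then sum the hours per user
kiwi_total = df.loc[df["work"] == "Kiwi"].groupby("Name")["hours"].sum()
print(kiwi_total)
```

The comparison is case-sensitive, so "kiwi" as written in the attempted code would not match the "Kiwi" values in the data.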

Reshaping Pandas dataframe grouping variables

I have a pandas dataFrame in the following format
ID Name
0 1 Jim
1 1 Jimmy
2 2 Mark
3 2 Marko
4 3 Sergi
4 3 Sergi
I want to reshape the dataframe in the following format
ID Name_1 Name_2
0 1 Jim Jimmy
1 2 Mark Marko
2 3 Sergi Sergi
I want this so that I can compare the two names. I have been unable to use pd.pivot or pd.pivot_table for this requirement.
Should be fairly simple. Please, can you suggest how to do this?

You can use cumcount with pivot, then add_prefix for the column names:
df['groups'] = df.groupby('ID').cumcount() + 1
df = df.pivot(index='ID', columns='groups', values='Name').add_prefix('Name_')
print (df)
groups Name_1 Name_2
ID
1 Jim Jimmy
2 Mark Marko
3 Sergi Sergi
Another solution uses groupby and unstack, then add_prefix for the column names:
df1 = df.groupby('ID')["Name"] \
        .apply(lambda x: pd.Series(x.values)) \
        .unstack(1) \
        .rename(columns=lambda x: x+1) \
        .add_prefix('Name_')
print (df1)
Name_1 Name_2
ID
1 Jim Jimmy
2 Mark Marko
3 Sergi Sergi
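
To tie this back to the stated goal of comparing the two names, here is a self-contained sketch of the cumcount plus pivot approach that also adds a comparison column. The sample frame is rebuilt from the question; the names wide and same are illustrative assumptions.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "Name": ["Jim", "Jimmy", "Mark", "Marko", "Sergi", "Sergi"],
})

wide = (
    df.assign(groups=df.groupby("ID").cumcount() + 1)   # 1, 2 within each ID
      .pivot(index="ID", columns="groups", values="Name")
      .add_prefix("Name_")                               # 1 -> Name_1, 2 -> Name_2
      .rename_axis(columns=None)                         # drop the 'groups' label
      .reset_index()
)

# True where both names for an ID are identical
wide["same"] = wide["Name_1"] == wide["Name_2"]
print(wide)
```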
