How to change pandas table data arrangement? [duplicate] - python

This question already has answers here:
How do I melt a pandas dataframe?
(3 answers)
Closed 7 days ago.
I would like to change the arrangement of this table:
import pandas as pd
original_dict = {
"group A" : [10,9,11],
"group B" :[23,42,56]
}
original_df = pd.DataFrame(original_dict)
original_df
Here is the desired output:
Value  Group Type
10     group A
9      group A
11     group A
23     group B
42     group B
56     group B
Thank you!

You can use the pandas melt function:
https://pandas.pydata.org/docs/reference/api/pandas.melt.html
df = pd.melt(original_df)
df.columns=['Group Type', 'Value']
df
Group Type Value
group A 10
group A 9
group A 11
group B 23
group B 42
group B 56
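As a small variation on the answer above, pd.melt can name the output columns directly through its var_name and value_name parameters, which avoids the separate rename step. This is a minimal sketch that assumes the same original_df as in the question; the name long_df and the final column reordering are only illustrative choices to match the requested layout.

```python
import pandas as pd

original_df = pd.DataFrame({
    "group A": [10, 9, 11],
    "group B": [23, 42, 56],
})

# var_name labels the column that holds the old column names;
# value_name labels the column that holds the cell values.
long_df = pd.melt(original_df, var_name="Group Type", value_name="Value")

# Put "Value" first to match the layout asked for in the question.
long_df = long_df[["Value", "Group Type"]]
print(long_df)
```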

Related

Add 2 column from 2 dataframe in pandas [duplicate]

This question already has answers here:
Pandas merge two dataframes summing values [duplicate]
(2 answers)
how to merge two dataframes and sum the values of columns
(2 answers)
Closed 4 years ago.
I am new to pandas. Could you please help me with the case below?
I have two DataFrames:
df1 = pd.DataFrame({'A': ['name', 'color', 'city', 'animal'], 'number': ['1', '32', '22', '13']})
df2 = pd.DataFrame({'A': ['name', 'color', 'city', 'animal'], 'number': ['12', '2', '42', '15']})
df1
A number
0 name 1
1 color 32
2 city 22
3 animal 13
df2
A number
0 name 12
1 color 2
2 city 42
3 animal 15
I need to get the sum of the number column, e.g.
A number
0 name 13
1 color 34
2 city 64
3 animal 28
but if I do new = df1 + df2 I get
new
A number
0 namename 13
1 colorcolor 34
2 citycity 64
3 animalanimal 28
I even tried merge with on="A", but that did not work.
Can anyone enlighten me, please?
Thank you
Here are two different ways: one with add, and one with concat and groupby. In either case, you need to make sure that your number columns are numeric first (your example dataframes have strings):
# set `number` to numeric (could be float, I chose int here)
df1['number'] = df1['number'].astype(int)
df2['number'] = df2['number'].astype(int)
# method 1, set the index to `A` in each and add the two frames together:
df1.set_index('A').add(df2.set_index('A')).reset_index()
# method 2, concatenate the two frames, groupby A, and get the sum:
pd.concat((df1,df2)).groupby('A',as_index=False).sum()
Output (shown for method 2; groupby sorts the keys, while method 1 keeps the original row order):
A number
0 animal 28
1 city 64
2 color 34
3 name 13
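To make the dtype point concrete, here is a self-contained sketch using the frames from the question. It first shows that + on string columns concatenates the text, then repeats method 1 after converting to integers. The names as_strings, left, right, and summed are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['name', 'color', 'city', 'animal'],
                    'number': ['1', '32', '22', '13']})
df2 = pd.DataFrame({'A': ['name', 'color', 'city', 'animal'],
                    'number': ['12', '2', '42', '15']})

# With string columns, + concatenates the text instead of adding numbers.
as_strings = df1['number'] + df2['number']
print(as_strings.tolist())

# After converting to integers, index-aligned addition gives the sums.
left = df1.assign(number=df1['number'].astype(int)).set_index('A')
right = df2.assign(number=df2['number'].astype(int)).set_index('A')
summed = left.add(right).reset_index()
print(summed)
```

If the two frames might not share every label in A, add also accepts fill_value=0 so that a label missing from one side is treated as zero rather than producing NaN.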
Merging isn't a bad idea; you just need to convert the number columns to numeric, merge on the key column, and then sum the numeric columns selected via select_dtypes:
df1['number'] = pd.to_numeric(df1['number'])
df2['number'] = pd.to_numeric(df2['number'])
df = df1.merge(df2, on='A')
df['number'] = df.select_dtypes(include='number').sum(axis=1) # include='number' selects the numeric columns
df = df[['A', 'number']]
print(df)
A number
0 name 13
1 color 34
2 city 64
3 animal 28
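A slightly more explicit version of the merge approach names the two merged columns through the suffixes parameter and adds them directly. This is a sketch under the assumption that each value of A appears once per frame; the suffixes '_1' and '_2' and the names merged and result are arbitrary choices for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['name', 'color', 'city', 'animal'],
                    'number': ['1', '32', '22', '13']})
df2 = pd.DataFrame({'A': ['name', 'color', 'city', 'animal'],
                    'number': ['12', '2', '42', '15']})

# Convert the string columns to numbers before doing any arithmetic.
df1['number'] = pd.to_numeric(df1['number'])
df2['number'] = pd.to_numeric(df2['number'])

# The overlapping 'number' columns become 'number_1' and 'number_2'.
merged = df1.merge(df2, on='A', suffixes=('_1', '_2'))
merged['number'] = merged['number_1'] + merged['number_2']
result = merged[['A', 'number']]
print(result)
```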

Pandas DataFrame GroupBy sum/count to new DataFrame [duplicate]

This question already has answers here:
Specifying column order following groupby aggregation
(2 answers)
Closed 5 years ago.
My DataFrame is
State|City|Year|Budget|Income
S1|C1|2000|1000|1
S1|C2|2000|1200|2
S2|C3|2000|5500|3
I need to get a new DataFrame with columns:
State, Year, Count, Sum_Budget, Sum_Income:
That is,
State|Year|Count|Sum_Budget|Sum_Income
S1|2000|2|2200|3
S2|2000|1|5500|3
In C# the code would be:
dataframe
    .GroupBy(x => new { x.State, x.Year })
    .Select(x => new {
        x.Key.State,
        x.Key.Year,
        Count = x.Count(),
        Sum_Budget = x.Sum(y => y.Budget),
        Sum_Income = x.Sum(y => y.Income)
    })
    .ToArray();
How do I do so with Pandas?
Use agg:
d = {'Income':'Sum_Income','Budget':'Sum_Budget','City':'Count'}
agg_d = {'Budget':'sum', 'Income':'sum', 'City':'size'}
df = df.groupby(['State', 'Year'], as_index=False).agg(agg_d).rename(columns=d)
print (df)
State Year Sum_Income Sum_Budget Count
0 S1 2000 3 2200 2
1 S2 2000 3 5500 1
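Since the question asks for a specific column order, one option is named aggregation, where each keyword argument to agg defines an output column and the columns appear in the order written. The sketch below rebuilds the sample data from the question; it uses 'count' (non-null values per group) rather than 'size', which gives the same result here because City has no missing values. The name summary is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['S1', 'S1', 'S2'],
    'City': ['C1', 'C2', 'C3'],
    'Year': [2000, 2000, 2000],
    'Budget': [1000, 1200, 5500],
    'Income': [1, 2, 3],
})

# Each keyword is an output column: (input column, aggregation function).
summary = (
    df.groupby(['State', 'Year'])
      .agg(
          Count=('City', 'count'),
          Sum_Budget=('Budget', 'sum'),
          Sum_Income=('Income', 'sum'),
      )
      .reset_index()
)
print(summary)
```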

Python: combining two columns [duplicate]

This question already has answers here:
Combine two columns of text in pandas dataframe
(21 answers)
Closed 5 years ago.
I have two columns, one has the year, and another has the month data, and I am trying to make one column from them (containing year and month).
Example:
click_year
-----------
2016
click_month
-----------
11
I want to have
YearMonth
-----------
201611
I tried
date['YearMonth'] = pd.concat((date.click_year, date.click_month))
but it gave me "cannot reindex from a duplicate axis" error.
Bill's answer on the linked post (Combine two columns of text in pandas dataframe) might be what you are looking for:
import pandas as pd
df = pd.DataFrame({'click_year': ['2014', '2015'], 'click_month': ['10', '11']})
>>> df
click_month click_year
0 10 2014
1 11 2015
df['YearMonth'] = df[['click_year','click_month']].apply(lambda x : '{}{}'.format(x['click_year'], x['click_month']), axis=1)
>>> df
click_month click_year YearMonth
0 10 2014 201410
1 11 2015 201511
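For larger frames, a vectorized alternative to apply is to convert both columns to strings and concatenate them. The sketch below assumes the columns hold integers, as the question's example suggests; the second row and the zero-padding of the month with str.zfill(2) are additions for illustration, not something the question asked for.

```python
import pandas as pd

# Hypothetical sample data; the second row shows a single-digit month.
date = pd.DataFrame({'click_year': [2016, 2017], 'click_month': [11, 3]})

# Convert to strings, pad the month to two digits, then concatenate.
date['YearMonth'] = (
    date['click_year'].astype(str)
    + date['click_month'].astype(str).str.zfill(2)
)
print(date)
```

The original pd.concat attempt fails because concat stacks the two Series end to end, which produces duplicate index labels, whereas + combines them row by row.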

Summing up values for rows per columns starting with 'Col' [duplicate]

This question already has answers here:
Pandas: sum DataFrame rows for given columns
(8 answers)
Closed 7 years ago.
I have a DataFrame like this:
df =
Col1 Col2 T3 T5
------------------
28 34 11 22
45 589 33 66
For each row I want to sum up the total values of columns whose names start with Col.
Is there some more elegant and quick way than the one shown below?
df['total'] = 0
for index, row in df.iterrows():
    total_for_row = 0
    for column_name, column in df.transpose().iterrows():
        if 'Col' in column_name:
            total_for_row = total_for_row + row[column_name]
    # write back to the frame; assigning to `row` only changes a copy
    df.loc[index, 'total'] = total_for_row
Try selecting the matching columns with a boolean mask and summing across each row:
idx = df.columns.str.startswith('Col')
df['total'] = df.iloc[:,idx].sum(axis=1)
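Another way to express the same idea is DataFrame.filter with a regular expression anchored at the start of the column name. The sketch below rebuilds the sample frame from the question and is one possible variant, not the answerer's method.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': [28, 45],
    'Col2': [34, 589],
    'T3': [11, 33],
    'T5': [22, 66],
})

# Keep only columns whose names start with "Col", then sum across each row.
df['total'] = df.filter(regex='^Col').sum(axis=1)
print(df)
```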

Selecting max within partition for pandas dataframe [duplicate]

This question already has answers here:
Python pandas - filter rows after groupby
(4 answers)
Closed 8 years ago.
I have a pandas dataframe. My goal is to select only those rows where column C has the largest value within group B. For example, when B is "one" the maximum value of C is 311, so I would like the row where C = 311 and B = "one."
import pandas as pd
import numpy as np
df2 = pd.DataFrame({
    'A' : pd.Categorical(["test1","test2","test3","test4"]),
    'B' : pd.Categorical(["one","one","two","two"]),
    'C' : np.array([311,42,31,41]),
    'D' : np.array([9,8,7,6])
})
df2.groupby('C').max()
Output should be:
test1 one 311 9
test4 two 41 6
You can use idxmax(), which returns the indices of the max values:
maxes = df2.groupby('B')['C'].idxmax()
df2.loc[maxes]
Output:
A B C D
0 test1 one 311 9
3 test4 two 41 6
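idxmax returns one row per group, so ties for the maximum are dropped. If every tied row should be kept, a possible alternative is to broadcast the group maximum with transform and filter on it. The sketch below uses the frame from the question; passing observed=True is an assumption made here because B is categorical, and the names group_max and top_rows are illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'A': pd.Categorical(["test1", "test2", "test3", "test4"]),
    'B': pd.Categorical(["one", "one", "two", "two"]),
    'C': np.array([311, 42, 31, 41]),
    'D': np.array([9, 8, 7, 6]),
})

# For every row, compute the maximum of C within that row's B group.
group_max = df2.groupby('B', observed=True)['C'].transform('max')

# Keep the rows whose C equals their group's maximum (ties are all kept).
top_rows = df2[df2['C'] == group_max]
print(top_rows)
```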
