I want to add a new column to my dataframe where the new column holds an incremental number starting from 1:
type value
a 25
b 23
c 33
d 31
I expect my dataframe would be:
type value id
a 25 1
b 23 2
c 33 3
d 31 4
Besides the id column, I also want to add a new column, status_id, where ids 1 to 2 are labelled foo and ids 3 to 4 are labelled bar. I expect the full dataframe to look like:
type value id status_id
a 25 1 foo
b 23 2 foo
c 33 3 bar
d 31 4 bar
How can I do this with pandas? Thanks in advance
Something like?
import numpy as np

df['id'] = np.arange(1, len(df) + 1)
df['status_id'] = df['id'].sub(1).floordiv(2).map({0: 'foo', 1: 'bar'})
type value id status_id
0 a 25 1 foo
1 b 23 2 foo
2 c 33 3 bar
3 d 31 4 bar
We can try with pd.cut:
df['status_id'] = pd.cut(df.id, [0, 2, 4], labels=['foo', 'bar'])
df
type value id status_id
0 a 25 1 foo
1 b 23 2 foo
2 c 33 3 bar
3 d 31 4 bar
For the first question,
df.insert(0, 'id', range(1, 1 + len(df)))
For the second question, are you looking at only 4 rows? If so, you can insert the labels manually. If it's two foos and two bars repeating for x rows, you can use modulo 4 to assign them correctly.
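A minimal sketch of the repeating two-foos/two-bars idea (the sample data is reconstructed from the question; for exactly 4 rows the modulo is a no-op, but it keeps the pattern correct for longer frames):

```python
import pandas as pd

df = pd.DataFrame({'type': list('abcd'), 'value': [25, 23, 33, 31]})

# Insert the 1-based id column at the front, as in the answer above.
df.insert(0, 'id', range(1, 1 + len(df)))

# Label rows in alternating blocks of two: ids 1-2 -> 'foo',
# ids 3-4 -> 'bar', then the pattern repeats (foo, foo, bar, bar, ...).
df['status_id'] = ((df['id'] - 1) // 2 % 2).map({0: 'foo', 1: 'bar'})
```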
Related
I have a pandas dataframe:
A B C D
1 1 0 32
1 4
2 0 43
1 12
3 0 58
1 34
2 1 0 37
1 5
[..]
where A, B and C are index columns. What I want to compute is for every group of rows with unique values for A and B: D WHERE C=1 / D WHERE C=0.
The result should look like this:
A B NEW
1 1 4/32
2 12/43
3 58/34
2 1 37/5
[..]
Can you help me?
Use Series.unstack first, so it is possible to divide column 1 by column 0:
new = df['D'].unstack()
new = new[1].div(new[0]).to_frame('NEW')
print (new)
NEW
A B
1 1 0.125000
2 0.279070
3 0.586207
2 1 0.135135
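A self-contained sketch of the same pipeline, reconstructing a small frame with A, B and C as index levels (the shape is assumed from the question):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 1, 1, 2, 2],
    'B': [1, 1, 2, 2, 3, 3, 1, 1],
    'C': [0, 1, 0, 1, 0, 1, 0, 1],
    'D': [32, 4, 43, 12, 58, 34, 37, 5],
}).set_index(['A', 'B', 'C'])

# unstack moves the innermost index level (C) into columns 0 and 1,
# then dividing the C=1 column by the C=0 column gives the ratio per (A, B).
wide = df['D'].unstack()
new = wide[1].div(wide[0]).to_frame('NEW')
```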
I have two dataframes and I am able to merge them, but I want to merge them in a specific format (column-wise). Below are the further details.
>df1
id A B C
0 1 20 0 1
1 2 23 1 2
>df2
id A B C
0 1 10 1 1
1 2 20 1 1
Below is my code and output
df = pd.merge(df1,df2,on='id',suffixes=('_Pre', '_Post'))
The output of this is :
id A_Pre B_Pre C_Pre A_Post B_Post C_Post
0 1 20 0 1 10 1 1
1 2 23 1 2 20 1 1
But the EXPECTED output should be as below. Can someone help or guide me with this?
id A_Pre A_Post B_Pre B_Post C_Pre C_Post
0 1 20 10 0 1 1 1
1 2 23 20 1 1 2 1
When subsequent manipulation is acceptable, you can do something like:
import numpy as np

df[np.array([[x + "_Pre", x + "_Post"] for x in df1.columns.drop("id")]).flatten()]
If you just want to modify the order of your columns you can use reindex:
df = df.reindex(columns=['A_Pre','A_Post','B_Pre','B_Post','C_Pre','C_Post'])
You can order the columns in the new dataset using sorted, then add the "id" column back in a second statement:
order_col = sorted(df.columns[1:], key=lambda x:x[:3])
df_final = pd.concat([df['id'],df[order_col]], axis=1)
My problem is the displacement of column names in my data frame after setting column B as the index.
What I had:
A B C
11 6260063207400 1999-02-15 1
22 6260063207400 1999-02-18 2
29 6260063207400 1999-02-20 2
61 6260063207400 1999-02-27 2
What I have:
B
A
1999-02-15 1
1999-02-18 2
1999-02-20 2
1999-02-27 2
1999-02-28 2
What I would like:
A B
1999-02-15 1
1999-02-18 2
1999-02-20 2
1999-02-27 2
1999-02-28 2
Let us do
df = df.reset_index()
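A minimal sketch of what reset_index does here, with values assumed from the question (the date column becomes the index, which prints its name on its own line; reset_index moves it back to a regular column):

```python
import pandas as pd

df = pd.DataFrame({'A': ['1999-02-15', '1999-02-18'], 'B': [1, 2]})

# 'A' becomes the index; its name is printed on a separate line,
# which causes the "displaced" header in the question.
indexed = df.set_index('A')

# reset_index turns the index back into an ordinary column.
restored = indexed.reset_index()
```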
In Python, I have a pandas data frame df.
ID Ref Dist
A 0 10
A 0 10
A 1 20
A 1 20
A 2 30
A 2 30
A 3 5
A 3 5
B 0 8
B 0 8
B 1 40
B 1 40
B 2 7
B 2 7
I want to group by ID and Ref, and take the first row of the Dist column in each group.
ID Ref Dist
A 0 10
A 1 20
A 2 30
A 3 5
B 0 8
B 1 40
B 2 7
And I want to sum up the Dist column in each ID group.
ID Sum
A 65
B 55
I tried this to do the first step, but this gives me just an index of the row and Dist, so I cannot move on to the second step.
df.groupby(['ID', 'Ref'])['Dist'].head(1)
It'd be wonderful if somebody could help me with this.
Thank you!
I believe this is what you're looking for.
For the first step, use first, since you want the first row of each group. Once you've done that, use reset_index() so you can group by ID afterwards and sum up Dist.
df.groupby(['ID','Ref'])['Dist'].first()\
.reset_index().groupby(['ID'])['Dist'].sum()
ID
A 65
B 55
Just drop_duplicates before the groupby. The default behavior is to keep the first duplicate row, which is what you want.
df.drop_duplicates(['ID', 'Ref']).groupby('ID').Dist.sum()
#A 65
#B 55
#Name: Dist, dtype: int64
For the following data frame:
index A B C
0 3 word 7
1 4 type 3
2 8 manic 4
3 9 tour 6
I want to add a value to a subset of column A. This is my code:
df.A= df.loc[0:2, 'A' ] + 30
But this is the result:
index A B C
0 33 word 7
1 34 type 3
2 38 manic 4
3 tour 6
It makes the value of the fourth row of column A null. Any suggestions?
You can use +=:
df.loc[0:2,'A']+=30
df
Out[11]:
index A B C
0 0 33 word 7
1 1 34 type 3
2 2 38 manic 4
3 3 9 tour 6
What you assign to column A is a Series that is one row shorter than required. Instead, you should assign a slice to a slice, like I have done below.
Try
df.loc[0:2, 'A'] = df.loc[0:2, 'A'] + 30
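The failure comes from index alignment: assigning a 3-row Series to a 4-row column fills the missing row with NaN. A minimal sketch, with values assumed from the question, contrasting the two assignments:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3, 4, 8, 9],
                   'B': ['word', 'type', 'manic', 'tour'],
                   'C': [7, 3, 4, 6]})

# Assigning a shorter Series to the whole column aligns on the index:
# rows 0-2 get the new values, row 3 is missing from the slice -> NaN.
broken = df.copy()
broken['A'] = broken.loc[0:2, 'A'] + 30

# Assigning slice to slice leaves the remaining rows untouched.
fixed = df.copy()
fixed.loc[0:2, 'A'] = fixed.loc[0:2, 'A'] + 30
```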