For the following data frame:
index A B C
0 3 word 7
1 4 type 3
2 8 manic 4
3 9 tour 6
I want to add a value to a subset of column A. This is my code:
df.A= df.loc[0:2, 'A' ] + 30
But this is the result:
index A B C
0 33 word 7
1 34 type 3
2 38 manic 4
3 NaN tour 6
This makes the value of the fourth row of column A null. Any suggestions?
You can use +=:
df.loc[0:2,'A']+=30
df
Out[11]:
index A B C
0 33 word 7
1 34 type 3
2 38 manic 4
3 9 tour 6
What you assign to column A is a Series that is one row shorter than the column; pandas aligns on the index, so the unmatched row becomes NaN. Instead, assign a slice to a slice, as below.
Try
df.loc[0:2,'A']= df.loc[0:2, 'A' ] + 30
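The index-alignment behaviour behind both answers can be seen in a minimal sketch rebuilding the question's frame:

```python
import pandas as pd

# Rebuild the question's frame
df = pd.DataFrame({"A": [3, 4, 8, 9],
                   "B": ["word", "type", "manic", "tour"],
                   "C": [7, 3, 4, 6]})

# Assigning a 3-row Series to the whole 4-row column: pandas aligns
# on the index, so row 3 has no match and becomes NaN.
df["A"] = df.loc[0:2, "A"] + 30      # row 3 -> NaN

# Slice-to-slice assignment only touches rows 0-2:
df2 = pd.DataFrame({"A": [3, 4, 8, 9]})
df2.loc[0:2, "A"] = df2.loc[0:2, "A"] + 30
print(df2["A"].tolist())             # [33, 34, 38, 9]
```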
Related
In the past hour I was searching here and couldn't find a very simple thing I need to do: duplicate a single row at index x and insert it at index x+1.
df
a b
0 3 8
1 2 4
2 9 0
3 5 1
Copy the row at index 2 and insert it as-is as the next row:
a b
0 3 8
1 2 4
2 9 0
3 9 0 # new row
4 5 1
What I tried is concat (with my own column names), which makes a mess:
line = pd.DataFrame({"date": date, "event": None}, index=[index+1])
return pd.concat([df.iloc[:index], line, df.iloc[index:]]).reset_index(drop=True)
How can I simply duplicate a full row at a given index?
You can use repeat(). Fill in the dictionary with the index as the key and the number of extra rows you would like to add as the value. This also works for multiple rows at once.
d = {2:1}
df.loc[df.index.repeat(df.index.map(d).fillna(0).astype(int)+1)].reset_index()
Output:
index a b
0 0 3 8
1 1 2 4
2 2 9 0
3 2 9 0
4 3 5 1
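A self-contained version of the repeat() approach, on the question's data:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 2, 9, 5], "b": [8, 4, 0, 1]})

# {row label: number of extra copies to add}
d = {2: 1}
repeats = df.index.map(d).fillna(0).astype(int) + 1
out = df.loc[df.index.repeat(repeats)].reset_index(drop=True)
print(out["a"].tolist())   # [3, 2, 9, 9, 5]
```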
Got it:
df.loc[index + 0.5] = df.loc[index].values
df = df.sort_index().reset_index(drop=True)
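The fractional-index trick, as a runnable sketch on the question's data: a label of 2.5 sorts between rows 2 and 3, and the final reset_index renumbers everything.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 2, 9, 5], "b": [8, 4, 0, 1]})

index = 2
# A label of index + 0.5 sorts between index and index + 1
df.loc[index + 0.5] = df.loc[index].values
df = df.sort_index().reset_index(drop=True)
print(df["a"].tolist())   # rows: 3, 2, 9, 9, 5
```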
I have a pandas dataframe:
A B C D
1 1 0 32
1 4
2 0 43
1 12
3 0 58
1 34
2 1 0 37
1 5
[..]
where A, B and C are index columns. For every group of rows with a unique combination of A and B, I want to compute D WHERE C=1 / D WHERE C=0.
The result should look like this:
A B NEW
1 1 4/32
2 12/43
3 58/34
2 1 37/5
[..]
Can you help me?
Use Series.unstack first, so that you can divide column 1 by column 0:
new = df['D'].unstack()
new = new[1].div(new[0]).to_frame('NEW')
print (new)
NEW
A B
1 1 0.125000
2 0.279070
3 0.586207
2 2 0.135135
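A self-contained sketch of the same idea, rebuilding the question's data with A, B and C as index levels:

```python
import pandas as pd

# Rebuild the question's frame with A, B, C as index levels
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 1, 1, 2, 2],
    "B": [1, 1, 2, 2, 3, 3, 1, 1],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
    "D": [32, 4, 43, 12, 58, 34, 37, 5],
}).set_index(["A", "B", "C"])

new = df["D"].unstack()          # level C becomes columns 0 and 1
new = new[1].div(new[0]).to_frame("NEW")
```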
I just started learning pandas and I was trying to figure out the easiest possible solution for the problem mentioned below.
Suppose, I've a dataframe like this ->
A B
6 7
8 9
5 6
7 8
Here, I'm selecting the minimum value cell from column 'A' as the starting point and updating the sequence in the new column 'C'. After sequencing, the dataframe must look like this ->
A B C
5 6 0
6 7 1
7 8 2
8 9 3
Is there any easy way to pick a cell from column 'A', match it with the matching cell in column 'B', and update the sequence respectively in column 'C'?
Some extra conditions ->
If 5 is present in column 'B' then I need to add another row like this -
A B C
0 5 0
5 6 1
......
Try sort_values:
df.sort_values('A').assign(C=np.arange(len(df)))
Output:
A B C
2 5 6 0
0 6 7 1
3 7 8 2
1 8 9 3
I'm not sure what you mean with the extra conditions though.
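For reference, a self-contained run of the sort_values approach on the question's data; reset_index(drop=True) is added here to get the clean 0-3 index shown in the question's expected output:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [6, 8, 5, 7], "B": [7, 9, 6, 8]})

# Sort by A, then number the rows in their new order
out = (df.sort_values("A")
         .assign(C=np.arange(len(df)))
         .reset_index(drop=True))
print(out["A"].tolist())   # [5, 6, 7, 8]
print(out["C"].tolist())   # [0, 1, 2, 3]
```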
I have an array called 'values' which features 2 columns of mean reaction time data from 10 individuals. The first column refers to data collected for a single individual in condition A, the second for that same individual in condition B:
array([[451.75 , 488.55555556],
[552.44444444, 590.40740741],
[629.875 , 637.62962963],
[454.66666667, 421.88888889],
[637.16666667, 539.94444444],
[538.83333333, 516.33333333],
[463.83333333, 448.83333333],
[429.2962963 , 497.16666667],
[524.66666667, 458.83333333]])
I would like to plot these data using seaborn, to display the mean values and connected single values for each individual across the two conditions. What is the simplest way to convert the array 'values' into a 3 column DataFrame, whereby one column features all the values, another features a label distinguishing that value as condition A or condition B, and a final column which provides a number for each individual (i.e., 1-10)? For example, as follows:
Value Condition Individual
451.75 A 1
488.56 B 1
552.44 A 2
...etc
You can do that using pd.melt:
pd.DataFrame(values, columns=['A','B']).reset_index().melt(id_vars = 'index')\
.rename(columns={'index':'Individual'})
Individual variable value
0 0 A 451.750000
1 1 A 552.444444
2 2 A 629.875000
3 3 A 454.666667
4 4 A 637.166667
5 5 A 538.833333
6 6 A 463.833333
7 7 A 429.296296
8 8 A 524.666667
9 0 B 488.555556
10 1 B 590.407407
11 2 B 637.629630
12 3 B 421.888889
13 4 B 539.944444
14 5 B 516.333333
15 6 B 448.833333
16 7 B 497.166667
17 8 B 458.833333
This should work:
import pandas as pd
import numpy as np
np_array = np.array([[451.75 , 488.55555556],
[552.44444444, 590.40740741],
[629.875 , 637.62962963],
[454.66666667, 421.88888889],
[637.16666667, 539.94444444],
[538.83333333, 516.33333333],
[463.83333333, 448.83333333],
[429.2962963 , 497.16666667],
[524.66666667, 458.83333333]])
pd_df = pd.DataFrame(np_array, columns=["A", "B"])
num_individuals = len(pd_df.index)
pd_df = pd_df.melt()
pd_df["INDIVIDUAL"] = [i % num_individuals + 1 for i in pd_df.index]
pd_df
variable value INDIVIDUAL
0 A 451.750000 1
1 A 552.444444 2
2 A 629.875000 3
3 A 454.666667 4
4 A 637.166667 5
5 A 538.833333 6
6 A 463.833333 7
7 A 429.296296 8
8 A 524.666667 9
9 B 488.555556 1
10 B 590.407407 2
11 B 637.629630 3
12 B 421.888889 4
13 B 539.944444 5
14 B 516.333333 6
15 B 448.833333 7
16 B 497.166667 8
17 B 458.833333 9
I have a pandas dataframe that I groupby, and then perform an aggregate calculation to get the mean for:
grouped = df.groupby(['year_month', 'company'])
means = grouped.agg({'size':['mean']})
Which gives me a dataframe back, but I can't seem to filter it to the specific company and year_month that I want:
means[(means['year_month']=='201412')]
gives me a KeyError
The issue is that you are grouping based on 'year_month' and 'company'. Hence, in the means DataFrame, year_month and company are part of the index (a MultiIndex). You cannot access them the way you access other columns.
One method is to get the values of the index level 'year_month'. Example -
means.loc[means.index.get_level_values('year_month') == '201412']
Demo -
In [38]: df
Out[38]:
A B C
0 1 2 10
1 3 4 11
2 5 6 12
3 1 7 13
4 2 8 14
5 1 9 15
In [39]: means = df.groupby(['A','B']).mean()
In [40]: means
Out[40]:
C
A B
1 2 10
7 13
9 15
2 8 14
3 4 11
5 6 12
In [41]: means.loc[means.index.get_level_values('A') == 1]
Out[41]:
C
A B
1 2 10
7 13
9 15
As already pointed out, you will end up with a two-level index. You could try to unstack the aggregated dataframe:
means = df.groupby(['year_month', 'company']).agg({'size':['mean']}).unstack(level=1)
This should give you a single 'year_month' index, 'company' as columns and your aggregate size as values. You can then slice by the index:
means.loc['201412']
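A runnable sketch of this unstack approach, using hypothetical data (the column names come from the question; the values are invented):

```python
import pandas as pd

# Hypothetical data: column names from the question, values invented
df = pd.DataFrame({
    "year_month": ["201411", "201411", "201412", "201412"],
    "company": ["X", "Y", "X", "Y"],
    "size": [10, 20, 30, 40],
})

means = (df.groupby(["year_month", "company"])
           .agg({"size": ["mean"]})
           .unstack(level=1))

# Rows are now indexed by year_month alone; companies are columns.
print(means.loc["201412"].tolist())   # [30.0, 40.0]
```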