For the following data frame:
index A B C
0 3 word 7
1 4 type 3
2 8 manic 4
3 9 tour 6
I want to add a value to a subset of column A. This is my code:
df.A= df.loc[0:2, 'A' ] + 30
But this is the result:
index A B C
0 33 word 7
1 34 type 3
2 38 manic 4
3 NaN tour 6
This makes the value of the fourth row of column A null. Any suggestions?
You can use +=:
df.loc[0:2,'A']+=30
df
Out[11]:
index A B C
0 33 word 7
1 34 type 3
2 38 manic 4
3 9 tour 6
What you assign to column A is a Series that is one row shorter than the column; pandas aligns on the index, so the unmatched row becomes NaN. Instead, assign a slice to a slice, as below.
Try
df.loc[0:2,'A']= df.loc[0:2, 'A' ] + 30
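The index-alignment behaviour behind both answers can be seen in a minimal sketch rebuilding the question's frame:

```python
import pandas as pd

# Rebuild the question's frame
df = pd.DataFrame({"A": [3, 4, 8, 9],
                   "B": ["word", "type", "manic", "tour"],
                   "C": [7, 3, 4, 6]})

# Assigning a 3-row Series to the whole 4-row column: pandas aligns
# on the index, so row 3 has no match and becomes NaN.
df["A"] = df.loc[0:2, "A"] + 30      # row 3 -> NaN

# Slice-to-slice assignment only touches rows 0-2:
df2 = pd.DataFrame({"A": [3, 4, 8, 9]})
df2.loc[0:2, "A"] = df2.loc[0:2, "A"] + 30
print(df2["A"].tolist())             # [33, 34, 38, 9]
```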
Related
In the past hour I was searching here and couldn't find a very simple thing I need to do: duplicate a single row at index x and insert it at index x+1.
df
a b
0 3 8
1 2 4
2 9 0
3 5 1
Copy the row at index 2 and insert it as-is as the next row:
a b
0 3 8
1 2 4
2 9 0
3 9 0 # new row
4 5 1
What I tried is concat (with my own column names), which makes a mess:
line = pd.DataFrame({"date": date, "event": None}, index=[index+1])
return pd.concat([df.iloc[:index], line, df.iloc[index:]]).reset_index(drop=True)
How can I simply duplicate a full row at a given index?
You can use repeat(). Fill in the dictionary with the index as the key and the number of extra rows you would like to add as the value. This also works for multiple rows at once.
d = {2:1}
df.loc[df.index.repeat(df.index.map(d).fillna(0).astype(int)+1)].reset_index()
Output:
index a b
0 0 3 8
1 1 2 4
2 2 9 0
3 2 9 0
4 3 5 1
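A self-contained version of the repeat() approach, on the question's data:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 2, 9, 5], "b": [8, 4, 0, 1]})

# {row label: number of extra copies to add}
d = {2: 1}
repeats = df.index.map(d).fillna(0).astype(int) + 1
out = df.loc[df.index.repeat(repeats)].reset_index(drop=True)
print(out["a"].tolist())   # [3, 2, 9, 9, 5]
```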
Got it:
df.loc[index + 0.5] = df.loc[index].values
df = df.sort_index().reset_index(drop=True)
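The fractional-index trick, as a runnable sketch on the question's data: a label of 2.5 sorts between rows 2 and 3, and the final reset_index renumbers everything.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 2, 9, 5], "b": [8, 4, 0, 1]})

index = 2
# A label of index + 0.5 sorts between index and index + 1
df.loc[index + 0.5] = df.loc[index].values
df = df.sort_index().reset_index(drop=True)
print(df["a"].tolist())   # rows: 3, 2, 9, 9, 5
```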
I have a pandas dataframe:
A B C D
1 1 0 32
1 4
2 0 43
1 12
3 0 58
1 34
2 1 0 37
1 5
[..]
where A, B and C are index columns. For every group of rows with a unique combination of A and B, I want to compute D WHERE C=1 / D WHERE C=0.
The result should look like this:
A B NEW
1 1 4/32
2 12/43
3 58/34
2 1 37/5
[..]
Can you help me?
Use Series.unstack first, so that you can divide column 1 by column 0:
new = df['D'].unstack()
new = new[1].div(new[0]).to_frame('NEW')
print (new)
NEW
A B
1 1 0.125000
2 0.279070
3 0.586207
2 2 0.135135
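A self-contained sketch of the same idea, rebuilding the question's data with A, B and C as index levels:

```python
import pandas as pd

# Rebuild the question's frame with A, B, C as index levels
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 1, 1, 2, 2],
    "B": [1, 1, 2, 2, 3, 3, 1, 1],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
    "D": [32, 4, 43, 12, 58, 34, 37, 5],
}).set_index(["A", "B", "C"])

new = df["D"].unstack()          # level C becomes columns 0 and 1
new = new[1].div(new[0]).to_frame("NEW")
```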
I just started learning pandas and I was trying to figure out the easiest possible solution for the problem mentioned below.
Suppose, I've a dataframe like this ->
A B
6 7
8 9
5 6
7 8
Here, I'm selecting the minimum value cell from column 'A' as the starting point and updating the sequence in the new column 'C'. After sequencing, the dataframe must look like this ->
A B C
5 6 0
6 7 1
7 8 2
8 9 3
Is there any easy way to pick a cell from column 'A', match it with the matching cell in column 'B', and update the sequence respectively in column 'C'?
Some extra conditions ->
If 5 is present in column 'B' then I need to add another row like this -
A B C
0 5 0
5 6 1
......
Try sort_values:
df.sort_values('A').assign(C=np.arange(len(df)))
Output:
A B C
2 5 6 0
0 6 7 1
3 7 8 2
1 8 9 3
I'm not sure what you mean with the extra conditions though.
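For reference, a self-contained run of the sort_values approach on the question's data; reset_index(drop=True) is added here to get the clean 0-3 index shown in the question's expected output:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [6, 8, 5, 7], "B": [7, 9, 6, 8]})

# Sort by A, then number the rows in their new order
out = (df.sort_values("A")
         .assign(C=np.arange(len(df)))
         .reset_index(drop=True))
print(out["A"].tolist())   # [5, 6, 7, 8]
print(out["C"].tolist())   # [0, 1, 2, 3]
```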
I have an array called 'values' which features 2 columns of mean reaction time data from 10 individuals. The first column refers to data collected for a single individual in condition A, the second for that same individual in condition B:
array([[451.75 , 488.55555556],
[552.44444444, 590.40740741],
[629.875 , 637.62962963],
[454.66666667, 421.88888889],
[637.16666667, 539.94444444],
[538.83333333, 516.33333333],
[463.83333333, 448.83333333],
[429.2962963 , 497.16666667],
[524.66666667, 458.83333333]])
I would like to plot these data using seaborn, to display the mean values and connected single values for each individual across the two conditions. What is the simplest way to convert the array 'values' into a 3 column DataFrame, whereby one column features all the values, another features a label distinguishing that value as condition A or condition B, and a final column which provides a number for each individual (i.e., 1-10)? For example, as follows:
Value Condition Individual
451.75 A 1
488.56 B 1
552.44 A 2
...etc
You can do that using pd.melt:
pd.DataFrame(values, columns=['A','B']).reset_index().melt(id_vars = 'index')\
.rename(columns={'index':'Individual'})
Individual variable value
0 0 A 451.750000
1 1 A 552.444444
2 2 A 629.875000
3 3 A 454.666667
4 4 A 637.166667
5 5 A 538.833333
6 6 A 463.833333
7 7 A 429.296296
8 8 A 524.666667
9 0 B 488.555556
10 1 B 590.407407
11 2 B 637.629630
12 3 B 421.888889
13 4 B 539.944444
14 5 B 516.333333
15 6 B 448.833333
16 7 B 497.166667
17 8 B 458.833333
This should work:
import pandas as pd
import numpy as np
np_array = np.array([[451.75 , 488.55555556],
[552.44444444, 590.40740741],
[629.875 , 637.62962963],
[454.66666667, 421.88888889],
[637.16666667, 539.94444444],
[538.83333333, 516.33333333],
[463.83333333, 448.83333333],
[429.2962963 , 497.16666667],
[524.66666667, 458.83333333]])
pd_df = pd.DataFrame(np_array, columns=["A", "B"])
num_individuals = len(pd_df.index)
pd_df = pd_df.melt()
pd_df["INDIVIDUAL"] = [i % num_individuals + 1 for i in pd_df.index]
pd_df
variable value INDIVIDUAL
0 A 451.750000 1
1 A 552.444444 2
2 A 629.875000 3
3 A 454.666667 4
4 A 637.166667 5
5 A 538.833333 6
6 A 463.833333 7
7 A 429.296296 8
8 A 524.666667 9
9 B 488.555556 1
10 B 590.407407 2
11 B 637.629630 3
12 B 421.888889 4
13 B 539.944444 5
14 B 516.333333 6
15 B 448.833333 7
16 B 497.166667 8
17 B 458.833333 9
I have a pandas dataframe that I groupby, and then perform an aggregate calculation to get the mean for:
grouped = df.groupby(['year_month', 'company'])
means = grouped.agg({'size':['mean']})
Which gives me a dataframe back, but I can't seem to filter it to the specific company and year_month that I want:
means[(means['year_month']=='201412')]
gives me a KeyError
The issue is that you are grouping based on 'year_month' and 'company'. Hence, in the means DataFrame, year_month and company are part of the index (a MultiIndex). You cannot access them the way you access other columns.
One method is to get the values of the index level 'year_month'. Example -
means.loc[means.index.get_level_values('year_month') == '201412']
Demo -
In [38]: df
Out[38]:
A B C
0 1 2 10
1 3 4 11
2 5 6 12
3 1 7 13
4 2 8 14
5 1 9 15
In [39]: means = df.groupby(['A','B']).mean()
In [40]: means
Out[40]:
C
A B
1 2 10
7 13
9 15
2 8 14
3 4 11
5 6 12
In [41]: means.loc[means.index.get_level_values('A') == 1]
Out[41]:
C
A B
1 2 10
7 13
9 15
As already pointed out, you will end up with a two-level index. You could try to unstack the aggregated dataframe:
means = df.groupby(['year_month', 'company']).agg({'size':['mean']}).unstack(level=1)
This should give you a single 'year_month' index, 'company' as columns and your aggregate size as values. You can then slice by the index:
means.loc['201412']
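A runnable sketch of this unstack approach, using hypothetical data (the column names come from the question; the values are invented):

```python
import pandas as pd

# Hypothetical data: column names from the question, values invented
df = pd.DataFrame({
    "year_month": ["201411", "201411", "201412", "201412"],
    "company": ["X", "Y", "X", "Y"],
    "size": [10, 20, 30, 40],
})

means = (df.groupby(["year_month", "company"])
           .agg({"size": ["mean"]})
           .unstack(level=1))

# Rows are now indexed by year_month alone; companies are columns.
print(means.loc["201412"].tolist())   # [30.0, 40.0]
```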