I have a DataFrame like this
   id  subid   a
1   1      1   2
2   1      1  10
3   1      1  20
4   1      2  30
5   1      2  35
6   1      2  36
7   1      2  40
8   2      2  20
9   2      2  29
10  2      2  30
I want to calculate the mean of the variable "a" for each id and save each result in a list. For example, I want the mean of "a" where id=2, and then I want to append that result to a list.
This is what I have so far:
for i in range(2):
    results=[]
    if df.iloc[:,3]==i:
        value=np.mean(df)
        results.append(value)
I think what you are trying to do is:
df.groupby('id')['a'].mean()
It will return the mean for both id 1 and id 2, but if you want only the mean for id 2 then you can do this:
df.groupby('id')['a'].mean()[2]
By doing this you're only taking the mean of the "a" values whose id is 2.
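If you want all the per-id means collected into a list, as the question asks, here is a minimal sketch building on the groupby above (the DataFrame construction is reproduced from the question's table):

import pandas as pd

df = pd.DataFrame({'id':    [1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
                   'subid': [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
                   'a':     [2, 10, 20, 30, 35, 36, 40, 20, 29, 30]})

# Series of per-id means, indexed by id
means = df.groupby('id')['a'].mean()

# convert to a plain Python list
results = means.tolist()
print(results)  # [24.714285714285715, 26.333333333333332]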
Problems here:
results=[] should be outside the loop; otherwise results is reset to [] every time the loop runs.
Also note that iloc[:,2] is the column you're looking for; the DataFrame has only three columns (id, subid, a), so df.iloc[:,3] is out of range.
value = df['a'].mean()
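Putting those fixes together, a corrected version of the original loop might look like this (just a sketch; the groupby approach above is still the idiomatic way):

results = []                      # initialize once, outside the loop
for i in df['id'].unique():       # iterate over the ids actually present
    value = df.loc[df['id'] == i, 'a'].mean()
    results.append(value)
print(results)  # [24.714285714285715, 26.333333333333332]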
Related
Let's say I have the following data:
df=pd.DataFrame({'Days':[1,2,3,4,1,2,3,4],
'Flag':["First","First","First","First","Second","Second","Second","Second"],
'Payments':[1,2,3,4,9,3,1,6]})
I want to create a cumulative sum for Payments, but it has to reset when Flag turns from "First" to "Second". Any help?
The output that I'm looking for is the following:
   Days    Flag  Payments  cumsum
0     1   First         1       1
1     2   First         2       3
2     3   First         3       6
3     4   First         4      10
4     1  Second         9       9
5     2  Second         3      12
6     3  Second         1      13
7     4  Second         6      19
Try this. Note that the example below deliberately shuffles the Flag column to show that the cumulative sum resets on every change of Flag, not just once:
df=pd.DataFrame({'Days':[1,2,3,4,1,2,3,4],
'Flag':["First","Second","First","Second","First","Second","Second","First"],
'Payments':[1,2,3,4,9,3,1,6]})
# make groups using consecutive Flags
groups = df.Flag.shift().ne(df.Flag).cumsum()
# groupby the groups and cumulatively sum payments
df['cumsum'] = df.groupby(groups).Payments.cumsum()
df
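With that shuffled Flag column, the cumulative sum resets at every change:

   Days    Flag  Payments  cumsum
0     1   First         1       1
1     2  Second         2       2
2     3   First         3       3
3     4  Second         4       4
4     1   First         9       9
5     2  Second         3       3
6     3  Second         1       4
7     4   First         6       6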
You can use df['Flag'].ne(df['Flag'].shift()).cumsum() to generate a grouper that will group by changes in the Flag column. Then, group by that, and cumsum:
df['cumsum'] = df['Payments'].groupby(df['Flag'].ne(df['Flag'].shift()).cumsum()).cumsum()
Output:
>>> df
   Days    Flag  Payments  cumsum
0     1   First         1       1
1     2   First         2       3
2     3   First         3       6
3     4   First         4      10
4     1  Second         9       9
5     2  Second         3      12
6     3  Second         1      13
7     4  Second         6      19
What is wrong with
df['Cumulative Payments'] = df.groupby('Flag')['Payments'].cumsum()
   Days    Flag  Payments  Cumulative Payments
0     1   First         1                    1
1     2   First         2                    3
2     3   First         3                    6
3     4   First         4                   10
4     1  Second         9                    9
5     2  Second         3                   12
6     3  Second         1                   13
7     4  Second         6                   19
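For the data in the question this gives the same result, but only because each Flag value forms one contiguous block. A caveat worth noting (a small sketch with made-up data; df2 is just for illustration): if a Flag value reappears later, groupby('Flag') keeps accumulating from the earlier block, while the shift/ne/cumsum grouper resets on every change:

df2 = pd.DataFrame({'Flag': ['First', 'First', 'Second', 'First'],
                    'Payments': [1, 2, 3, 4]})

print(df2.groupby('Flag')['Payments'].cumsum().tolist())
# [1, 3, 3, 7]  <- the last 'First' row continues from the earlier block

groups = df2['Flag'].ne(df2['Flag'].shift()).cumsum()
print(df2.groupby(groups)['Payments'].cumsum().tolist())
# [1, 3, 3, 4]  <- resets at every change of Flag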
I have the following DataFrame dt:
   a
0  1
1  2
2  3
3  4
4  5
How do I create a new column where each row is a function of previous rows?
For instance, say the formula is:
B_row(t) = A_row(t-1) + A_row(t-2) + 3
Such that:
   a   b
0  1   /
1  2   /
2  3   6
3  4   8
4  5  10
Also, I hear a lot that we shouldn't loop through rows in pandas; however, it seems to me that I would have to go at it by looping through each row in a sort of recursive way, as I would in regular Python.
You could use shift to reference the previous rows without looping:
dt['b'] = dt['a'].shift(1) + dt['a'].shift(2) + 3
Output:
   a     b
0  1   NaN
1  2   NaN
2  3   6.0
3  4   8.0
4  5  10.0
The first two rows are NaN because they don't have two previous rows, matching the / in your expected output.
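If, unlike here, b depended on its own previous values (a truly recursive formula), shift would no longer help, because the whole column is computed at once; an explicit loop is then the usual fallback. A minimal sketch, assuming a hypothetical recurrence b(t) = 0.5*b(t-1) + a(t):

# hypothetical recurrence: b(t) = 0.5 * b(t-1) + a(t), with b(0) = a(0)
b = [float(dt['a'].iloc[0])]
for t in range(1, len(dt)):
    b.append(0.5 * b[-1] + float(dt['a'].iloc[t]))
dt['b_recursive'] = b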
I am currently trying to add 1 to an entire column if the value (int) is greater than 0. The code that I am currently using is like so:
for coldcloudy in final.coldcloudy:
    final.loc[final['coldcloudy'] > 0, coldcloudy] += 1
However, I keep getting a KeyError: 0 with it. Essentially, I want the code to go row by row in a particular column and add 1 if the integer is greater than zero. The values that had 1 added will go into another column. Can someone please help?
You don't need a for loop:
import numpy as np
import pandas as pd

final = pd.DataFrame({'coldcloudy': np.random.choice([0, 1], 20)})
final.loc[final.coldcloudy > 0, 'coldcloudy'] += 1
print(final)
Output (yours will differ, since the column is generated randomly):
    coldcloudy
0            2
1            2
2            0
3            0
4            2
5            2
6            0
7            2
8            0
9            0
10           2
11           2
12           0
13           2
14           2
15           0
16           2
17           0
18           2
19           2
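Since the question mentions putting the incremented values into another column, one way is np.where (a sketch; the new column name bumped is made up):

# keep the original column untouched and write the incremented values to a new one
final['bumped'] = np.where(final['coldcloudy'] > 0,
                           final['coldcloudy'] + 1,
                           final['coldcloudy'])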
I have a pandas dataframe like this:
X  a  b  c
1  1  0  2
5  4  7  3
6  7  8  9
I want to print a column called 'Count' which holds, for each row, the number of values greater than the value in the first column ('X' in my case). The output should look like:
X  a  b  c  Count
1  1  0  2      2
5  4  7  3      1
6  7  8  9      3
I would like to refrain from using a lambda function or a for loop or any other kind of looping technique, since my dataframe has a large number of rows. I tried something like this but I couldn't get what I wanted:
df['count'] = df[df.iloc[:,1:] > df.iloc[:,0]].count(axis=1)
I also tried
numpy.where()
but didn't have any luck with that either, so any help will be appreciated. I also have NaN values in my dataframe and would like to ignore them when I count.
Thanks for your help in advance!
You can use ge (>=) with sum:
df.iloc[:,1:].ge(df.iloc[:,0],axis = 0).sum(axis = 1)
Out[784]:
0    2
1    1
2    3
dtype: int64
Then assign it back:
df['Count'] = df.iloc[:,1:].ge(df.iloc[:,0], axis=0).sum(axis=1)
df
Out[786]:
   X  a  b  c  Count
0  1  1  0  2      2
1  5  4  7  3      1
2  6  7  8  9      3
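This also takes care of the NaN concern from the question: comparisons involving NaN evaluate to False, so NaN cells simply don't contribute to the count. A quick check (df2 is a made-up example):

import numpy as np
import pandas as pd

df2 = pd.DataFrame({'X': [1], 'a': [np.nan], 'b': [2], 'c': [0]})
print(df2.iloc[:, 1:].ge(df2.iloc[:, 0], axis=0).sum(axis=1))
# 0    1
# dtype: int64  <- only b counts; the NaN in a compares as False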
In case anyone needs such a solution: you can combine the outputs of .le and .ge in one line. Thanks to @Wen for the answer to my question!
df['count'] = (df.iloc[:,2:5].le(df.iloc[:,0], axis=0).sum(axis=1) + df.iloc[:,2:5].ge(df.iloc[:,1], axis=0).sum(axis=1))
I have a data frame as below:
id_1  id_2  value
   1     0      1
   1     1      2
   1     2      3
   2     0      4
   2     1      1
   3     0      5
   3     1      1
   4     0      5
   4     1      1
   4     2      6
   4     3      7
  11     0      8
  11     1     14
  13     0     10
  13     1      9
I would like to take a random sample of size n, without replacement, from this table based on id_1. Each sampled row needs to be unique with respect to the id_1 column, i.e. each id_1 value can occur at most once.
End result something like:
id_1  id_2  value
   1     1      2
   2     0      4
   4     3      7
  13     0     10
I have tried doing a group by and using the indices to take out a row with random.sample, but it doesn't go all the way.
Can someone give me a pointer on how to make this work? Code for DF below!
As always, thanks for time and input!
/swepab
df = pd.DataFrame({'id_1' : [1,1,1,2,2,3,3,4,4,4,4,11,11,13,13],
'id_2' : [0,1,2,0,1,0,1,0,1,2,3,0,1,0,1],
'value_col' : [1,2,3,4,1,5,1,5,1,6,7,8,14,10,9]})
You can do this with vectorized functions (no loops):
import numpy as np

n = 4  # desired sample size
uniqued = df.id_1.reindex(np.random.permutation(df.index)).drop_duplicates()
df.loc[np.random.choice(uniqued.index, n, replace=False)]
uniqued is built by randomly shuffling the rows and keeping one row per unique id_1; a random sample of size n (without replacement) is then drawn from its index.
This samples one random row per id:
for id in sorted(set(df["id_1"])):
    print(df[df["id_1"] == id].sample(1))
PS: the above solution translated into a Python list comprehension, returning a list of indices:
idx = [df[df["id_1"] == val].sample(1).index[0] for val in sorted(set(df["id_1"]))]
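On recent pandas versions (1.1 or later, if I remember right), GroupBy.sample makes the per-id draw a one-liner; sampling the result again then gives n distinct ids (a sketch):

# one random row per id_1, then n of those rows at random, without replacement
one_per_id = df.groupby('id_1').sample(n=1)
result = one_per_id.sample(n=4)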