Apply a function to a specific row range in a column - python

my input:

index  frame  user1  user2
    0      0      0      0
    1      1      0      0
    2      2      0      0
    3      3      0      0
    4      4      0      0
    5      5      0      0
I also have two objects, start_frame and end_frame, which are pandas Series. start_frame looks like this:

index  frame
    3      3

and end_frame looks like this:

index  frame
    4      5
My problem is to apply a function to a specific column (user1) over a specific range of rows, where the row boundaries come from start_frame and end_frame.
I expect output like this:

   frame  user1  user2
0      0      0      0
1      1      0      0
2      2      0      0
3      3      1      0
4      4      1      0
5      5      1      0
I tried this, but it turns the whole column into ones (or some other output, but not what I want):

def my_func(x):
    x = x + 1
    return x

df['user1'] = df['user1'].between(df['frame']==3, df['frame']==5, inclusive=False).apply(lambda x: my_func(x))
I tried another approach:

df['user1'] = df.apply(lambda row: 1 if row['frame'] in (3, 5) else 0, axis=1)

But it returns 1 only in rows 3 and 5; how do I turn (3, 5) into a range here?
So I have two questions. First, and most important: how do I apply my_func exactly to the rows I need? Second: how do I use my objects start_frame and end_frame instead of inserting the values manually?
Thank you
Updated:

arr_rang = range(3, 6)
df['user1'] = df.apply(lambda row: 1 if row['frame'] in arr_rang else 0, axis=1)

Now it returns 1 in frames 3, 4 and 5, which is what I need. But I still don't understand how to use my objects start_frame and end_frame.

Let's concatenate start_frame and end_frame, since they share columns, then check values using isin(), and finally change the values using boolean masking with the loc accessor:

s = pd.concat([start_frame, end_frame])  # DataFrame.append was removed in pandas 2.0
mask = df['index'].isin(s['index']) | df['frame'].isin(s['frame'])
df.loc[mask, 'user1'] = df.loc[mask, 'user1'] + 1
# you can also use np.where() in place of the loc accessor
output of df:

   index  frame  user1  user2
0      0      0      0      0
1      1      1      0      0
2      2      2      0      0
3      3      3      1      0
4      4      4      1      0
5      5      5      1      0
Update:
use:
mask=df['frame'].between(3,5)
df.loc[mask,'user1']=df.loc[mask,'user1']+1
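Putting the update together with the start_frame / end_frame objects from the question: pull the scalar boundaries out of them and feed those to between(). A minimal sketch, assuming start_frame and end_frame each hold a single boundary value in a 'frame' column (their exact shape in the question is ambiguous):

```python
import pandas as pd

df = pd.DataFrame({'frame': [0, 1, 2, 3, 4, 5],
                   'user1': 0, 'user2': 0})

# Assumed shape of the two boundary objects from the question.
start_frame = pd.DataFrame({'frame': [3]}, index=[3])
end_frame = pd.DataFrame({'frame': [5]}, index=[4])

# Pull the scalar boundaries out and build one boolean mask with between():
lo = start_frame['frame'].iloc[0]
hi = end_frame['frame'].iloc[-1]
mask = df['frame'].between(lo, hi)   # inclusive on both ends by default

df.loc[mask, 'user1'] += 1
```

This way nothing is hard-coded: changing the values inside start_frame and end_frame moves the affected row range automatically.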

Did you try:

def putHello(row):
    row["hello"] = "world"
    return row

data.iloc[5:7].apply(putHello, axis=1)

Note that apply returns a new object here rather than modifying data in place. See the pandas documentation for iloc and apply.

Related

Why cannot change the value for a specific column in a slice of pandas data frame using .loc?

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.random_sample((10)), columns=list('A'))
df1['MATCHED'] = 0
df1

          A  MATCHED
0  0.424651        0
1  0.855567        0
2  0.983395        0
3  0.921866        0
4  0.001827        0
5  0.341491        0
6  0.055578        0
7  0.970564        0
8  0.078751        0
9  0.348055        0
Then I filter df1:

df1_slice = df1[df1['A'] <= 0.4]
df1_slice

          A  MATCHED
4  0.001827        0
5  0.341491        0
6  0.055578        0
8  0.078751        0
9  0.348055        0
Now, I want to change the column MATCHED values for those rows in df1_slice :
df1.loc[df1_slice.index]['MATCHED'] = 1

I'd expect the MATCHED column in df1 to change from 0 to 1, but it doesn't:
df1.loc[df1_slice.index]

          A  MATCHED
4  0.001827        0
5  0.341491        0
6  0.055578        0
8  0.078751        0
9  0.348055        0
Why don't they change, and how can I change this script so that they are set to 1 in df1?
I find that using .loc throughout is the best way to (re-)assign to slices. In your example you use .loc only to slice on the index, not on the columns. This should do it (haven't tested):
df1.loc[df1_slice.index, 'MATCHED'] = 1
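A minimal sketch of the difference, with made-up values for A so the result is deterministic: the chained form assigns into a temporary copy, while a single .loc call with both the row and the column selection writes into df1 itself:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [0.42, 0.86, 0.05, 0.35]})
df1['MATCHED'] = 0
df1_slice = df1[df1['A'] <= 0.4]

# Chained indexing, df1.loc[df1_slice.index]['MATCHED'] = 1, assigns into a
# temporary copy, so df1 itself stays unchanged (pandas typically raises
# SettingWithCopyWarning). One .loc call with rows AND column hits df1:
df1.loc[df1_slice.index, 'MATCHED'] = 1
```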

How to multiply every column in one dataframe with all columns in other dataframe

I have two dataframes, X_dummy and X_var, where X_dummy contains dummies and looks like this:

dummy1  dummy2
     1       0
     0       1
     1       0
The X_var dataframe contains variables and looks like this:

var1  var2
   4     2
  10     5
   1     1
Now I want to create a dataframe containing the cellwise product of every column from X_dummy with the complete X_var dataframe. Hence, my resulting dataframe, X_result, should look like this:

var1dummy1  var2dummy1  var1dummy2  var2dummy2
         4           2           0           0
         0           0          10           5
         1           1           0           0
Does anyone know how to do this without using multiple for loops?
Something like numpy broadcasting:

new = pd.DataFrame(np.concatenate(df2.T.values * df1.T.values[:, None]).T)
new

Out[161]:
   0  1   2  3
0  4  2   0  0
1  0  0  10  5
2  1  1   0  0

# to label the columns:
# new.columns = pd.MultiIndex.from_product([df1.columns, df2.columns]).map('_'.join)
Try:
pd.concat([(df1[i]*df2[j]).rename(f'{i}{j}') for i in df1 for j in df2], axis=1)
Output:

   dummy1var1  dummy1var2  dummy2var1  dummy2var2
0           4           2           0           0
1           0           0          10           5
2           1           1           0           0
You can definitely do it with one loop:
dummies = X_dummy.astype(bool)
pd.concat([X_var.loc[dummies[c]] for c in dummies], axis=1).fillna(0).astype(int)
#   var1  var2  var1  var2
# 0    4     2     0     0
# 1    0     0    10     5
# 2    1     1     0     0
Note that because one of your dataframes contains dummies, you do not need multiplication at all.
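For completeness, a self-contained sketch of the broadcasting approach above with the column naming applied. The names X_dummy and X_var follow the question; df1 and df2 in the answer correspond to them:

```python
import numpy as np
import pandas as pd

X_dummy = pd.DataFrame({'dummy1': [1, 0, 1], 'dummy2': [0, 1, 0]})
X_var = pd.DataFrame({'var1': [4, 10, 1], 'var2': [2, 5, 1]})

# Broadcast (n_dummies, 1, n_rows) against (n_vars, n_rows): every dummy
# column scales the whole variable block at once, giving shape (2, 2, 3).
blocks = X_var.T.values * X_dummy.T.values[:, None]

# Stack the per-dummy blocks side by side and name the columns in the
# same dummy-major order the broadcast produced.
X_result = pd.DataFrame(np.concatenate(blocks).T)
X_result.columns = pd.MultiIndex.from_product(
    [X_dummy.columns, X_var.columns]).map(''.join)
```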

Returning the Position of a Pandas Dataframe Entry when first Value in the Row equals 1

As described in the title, I want to get the position index of a dataframe entry based on a condition. It should look something like this:
import pandas as pd
a = [[1,0,0,1],[0,1,0,1],[0,0,0,1]]
df = pd.DataFrame(a)
df

Out[61]:
   0  1  2  3
0  1  0  0  1
1  0  1  0  1
2  0  0  0  1
And I want to create a new column that returns the position of the first 1 in the corresponding row. So the end result should look like this:
Out[62]:
   0  1  2  3  New
0  1  0  0  1    0
1  0  1  0  1    1
2  0  0  0  1    3
This is my first question on Stack Overflow, so sorry if I made any formal mistakes while asking it.
Any help is appreciated.
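A sketch of one common approach for this: compare against 1 and use idxmax(axis=1), which returns the label of the first True in each row. (Caveat: a row containing no 1 at all would also report the first column, so this assumes every row has at least one 1, as in the example.)

```python
import pandas as pd

a = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 0, 1]]
df = pd.DataFrame(a)

# eq(1) turns the frame into booleans; idxmax(axis=1) returns the label
# of the first True per row, i.e. the first column holding a 1.
df['New'] = df.eq(1).idxmax(axis=1)
```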

Pandas dataframe issue: `reset_index` does not remove hierarchical index

I am trying to flatten a Pandas Dataframe MultiIndex so that there is only a single level index. The usual solution based on any number of SE posts is to use the df.reset_index command, but that is just not fixing the problem.
I started out with an Xarray DataArray and converted it to a dataframe. The original dataframe looked like this.
results
        results
attr    simdata  a_ss_yr  attr1  attr2  attr3
run year
0   0         0        0      0      0      0
    1         1        6      2      0      4
    2         2        4      2      2      0
    3         3        1      0      0      1
    4         4        2      0      2      0
To flatten the index I used
df.reset_index(drop=True)
This only accomplished this:
  run  year  results
             simdata  a_ss_yr  attr1  attr2  attr3
0    0     0       0        0      0      0      0
1    0     1       1        6      2      0      4
2    0     2       2        4      2      2      0
3    0     3       3        1      0      0      1
4    0     4       4        2      0      2      0
I tried doing the df.reset_index() option more than once, but this is still not flattening the index, and I want to get this to only a single level index.
More specifically I need the "run" and "year" variables to go to the level 0 set of column names, and I need to remove the "result" heading entirely.
I have been reading the Pandas documentation, but it seems like doing this kind of surgery on the index is not really described. Does anyone have a sense of how to do this?
First use droplevel to remove the top level of the column MultiIndex, and then reset_index:
df.columns = df.columns.droplevel(0)
df = df.reset_index()
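A minimal sketch of the two steps on a toy frame with the same shape of problem: two row index levels ('run', 'year') and a 'results' top column level (the column names here are abbreviated from the question):

```python
import pandas as pd

# Columns: ('results', 'a_ss_yr'), ('results', 'attr1'); rows: (run, year).
cols = pd.MultiIndex.from_product([['results'], ['a_ss_yr', 'attr1']])
idx = pd.MultiIndex.from_tuples([(0, 0), (0, 1)], names=['run', 'year'])
df = pd.DataFrame([[0, 0], [1, 6]], index=idx, columns=cols)

df.columns = df.columns.droplevel(0)   # remove the 'results' column level
df = df.reset_index()                  # move 'run'/'year' into plain columns
```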

Add columns to pandas dataframe containing max of each row, AND corresponding column name

My system
Windows 7, 64 bit
python 3.5.1
The challenge
I've got a pandas dataframe, and I would like to know the maximum value for each row and append that info as a new column. I would also like to add another column to the existing dataframe containing the name of the column where the max value is located.
A similar question has been asked and answered for R in this post.
Reproducible example
In[1]:
import pandas as pd

# Make pandas dataframe
df = pd.DataFrame({'a': [1,0,0,1,3], 'b': [0,0,1,0,1], 'c': [0,0,0,0,0]})
# Calculate max
my_series = df.max(numeric_only=True, axis=1)
my_series.name = "maxval"
# Include maxval in df
df = df.join(my_series)
df
Out[1]:
   a  b  c  maxval
0  1  0  0       1
1  0  0  0       0
2  0  1  0       1
3  1  0  0       1
4  3  1  0       3
So far so good. Now for the part where another column is added containing the name of the column with the max value:
In[2]:
?
?
?
# This is what I'd like to accomplish:
Out[2]:
   a  b  c  maxval maxcol
0  1  0  0       1      a
1  0  0  0       0  a,b,c
2  0  1  0       1      b
3  1  0  0       1      a
4  3  1  0       3      a
Notice that I'd like to return all column names if multiple columns contain the same maximum value. Also please notice that the column maxval is not included in maxcol since that would not make much sense. Thanks in advance if anyone out there finds this interesting.
You can compare the df against maxval using eq with axis=0, then use apply with a lambda to produce a boolean mask to mask the columns and join them (the original answer used df.ix, which has since been removed from pandas; .loc performs the same label-based slice):

In [183]:
df['maxcol'] = df.loc[:, :'c'].eq(df['maxval'], axis=0).apply(lambda x: ','.join(df.columns[:3][x == x.max()]), axis=1)
df
Out[183]:
   a  b  c  maxval maxcol
0  1  0  0       1      a
1  0  0  0       0  a,b,c
2  0  1  0       1      b
3  1  0  0       1      a
4  3  1  0       3      a
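A sketch of the same idea without .ix, using an explicit list of the value columns so maxval is never compared against itself:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0, 1, 3], 'b': [0, 0, 1, 0, 1],
                   'c': [0, 0, 0, 0, 0]})
df['maxval'] = df.max(axis=1)

value_cols = ['a', 'b', 'c']
# True wherever a column equals its row's max; joining the matching
# labels with commas lists every column on ties.
is_max = df[value_cols].eq(df['maxval'], axis=0)
df['maxcol'] = is_max.apply(lambda row: ','.join(row.index[row]), axis=1)
```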
