Column 'signal' is populated with 0 or 1, and I want column 'reversal' to tell me when there is a change in this column (i.e., from 0 to 1 or from 1 to 0).
Issue: the code below gives me this information correctly for all rows except the first one. The reason is that it compares the first value of 'signal' with the value before it; since there is no previous value (of course: it is the first one!), it reports a change, just as it does when the value flips from 0 to 1 or 1 to 0.
How can I fix this? Basically, I would like the code to disregard that first spurious discrepancy.
import pandas as pd
import numpy as np
d = {'signal': [0, 0, 0, 1, 1, 0]}
df_zinc = pd.DataFrame(data=d)
df_zinc['reversal'] = np.where(df_zinc['signal'] != df_zinc['signal'].shift(), 1, 0)
print(df_zinc)
OUTPUT
   signal  reversal
0       0         1
1       0         0
2       0         0
3       1         1
4       1         0
5       0         1
If you are looking for the changes, I'd suggest using diff instead:
df_zinc['signal'].diff().fillna(0)!=0
If you prefer it as an int instead of a boolean:
bool_s = df_zinc['signal'].diff().fillna(0)!=0
int_s = bool_s.astype(int)
Testing:
df_zinc['reversal'] = (df_zinc['signal'].diff().fillna(0)!=0).astype(int)
Output
   signal  reversal
0       0         0
1       0         0
2       0         0
3       1         1
4       1         0
5       0         1
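As a side note, newer pandas (0.24+) also lets you avoid the spurious first-row NaN from shift() by seeding it with fill_value; a minimal sketch of that variant, reusing the same df_zinc as above:

```python
import pandas as pd
import numpy as np

df_zinc = pd.DataFrame({'signal': [0, 0, 0, 1, 1, 0]})

# Seed the shifted series with the first value, so row 0 compares equal to itself
prev = df_zinc['signal'].shift(fill_value=df_zinc['signal'].iloc[0])
df_zinc['reversal'] = np.where(df_zinc['signal'] != prev, 1, 0)

print(df_zinc['reversal'].tolist())  # [0, 0, 0, 1, 0, 1]
```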
My input:

index  frame  user1  user2
    0      0      0      0
    1      1      0      0
    2      2      0      0
    3      3      0      0
    4      4      0      0
    5      5      0      0
I also have two objects, start_frame and end_frame. These are pandas objects that look like this for start_frame:

index  frame
    3      3

and for end_frame:

index  frame
    4      5
My problem is applying a function to a specific column (user1) and to specific rows, whose numbers I get from start_frame and end_frame.
I expect output like this:
   frame  user1  user2
0      0      0      0
1      1      0      0
2      2      0      0
3      3      1      0
4      4      1      0
5      5      1      0
I tried this, but it either turns the whole column into ones or gives some other output, not what I want:

def my_func(x):
    x = x + 1
    return x

df['user1'] = df['user1'].between(df['frame'] == 3, df['frame'] == 5, inclusive=False).apply(lambda x: my_func(x))
I tried another approach:

df['user1'] = df.apply(lambda row: 1 if row['frame'] in (3, 5) else 0, axis=1)

But it returns 1 only in rows 3 and 5; how do I turn (3, 5) into a range here?
So I have two questions: first and most important, how do I apply my_func exactly to the rows I need; and second, how do I use my objects end_frame and start_frame instead of inserting the values manually?
Thank you
Updated:

arr_rang = range(3, 6)
df['user1'] = df.apply(lambda row: 1 if row['frame'] in arr_rang else 0, axis=1)

Now it returns 1 for frames 3, 4 and 5, which is what I need. But I still don't understand how to use my objects end_frame and start_frame.
Let's append start_frame and end_frame, since they have common columns, then check values using isin(), and finally change values using boolean masking and the loc accessor:

s = pd.concat([start_frame, end_frame])  # DataFrame.append was removed in pandas 2.0
mask = (df['index'].isin(s['index'])) | (df['frame'].isin(s['frame']))
df.loc[mask, 'user1'] = df.loc[mask, 'user1'] + 1
# you can also use np.where() in place of the loc accessor
Output of df:

   index  frame  user1  user2
0      0      0      0      0
1      1      1      0      0
2      2      2      0      0
3      3      3      1      0
4      4      4      1      0
5      5      5      1      0
Update:
use:
mask=df['frame'].between(3,5)
df.loc[mask,'user1']=df.loc[mask,'user1']+1
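To answer the remaining part of the question (using start_frame and end_frame instead of the hard-coded 3 and 5), you can pull scalar bounds out of those objects first; a sketch assuming each holds a single row with a 'frame' column:

```python
import pandas as pd

df = pd.DataFrame({'frame': range(6), 'user1': 0, 'user2': 0})
start_frame = pd.DataFrame({'frame': [3]}, index=[3])
end_frame = pd.DataFrame({'frame': [5]}, index=[4])

# Extract scalar bounds from the one-row objects
start = start_frame['frame'].iloc[0]
end = end_frame['frame'].iloc[0]

mask = df['frame'].between(start, end)  # inclusive on both ends by default
df.loc[mask, 'user1'] = 1

print(df['user1'].tolist())  # [0, 0, 0, 1, 1, 1]
```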
Did you try this?

def putHello(row):
    row["hello"] = "world"
    return row

data.iloc[5:7].apply(putHello, axis=1)
This applies putHello to rows 5 and 6 only.
Relevant pandas documentation: iloc, apply.
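One caveat: apply on an iloc slice returns a modified copy rather than changing data in place, so the result needs to be assigned back. A small self-contained sketch (the data frame here is made up for illustration):

```python
import pandas as pd

data = pd.DataFrame({'hello': ['x'] * 10})

def putHello(row):
    row['hello'] = 'world'
    return row

# apply returns a new frame; write it back into the original rows
data.iloc[5:7] = data.iloc[5:7].apply(putHello, axis=1)

print(data['hello'].tolist())  # 'world' in rows 5 and 6, 'x' elsewhere
```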
I have a number of pandas dataframes that each have a column 'speaker', and one of two labels. Typically, this is 0-1, however in some cases it is 1-2, 1-3, or 0-2. I am trying to find a way to iterate through all of my dataframes and standardize them so that they share the same labels (0-1).
The one consistent feature between them is that the first label to appear (i.e., in the first row of the dataframe) should always be mapped to 0, whereas the second should always be mapped to 1.
Here is an example of one of the dataframes I would need to change - being mindful that others will have different labels:
import pandas as pd
data = [1,2,1,2,1,2,1,2,1,2]
df = pd.DataFrame(data, columns = ['speaker'])
I would like to be able to change it so that it appears as [0,1,0,1,0,1,0,1,0,1].
Thus far, I have tried inserting the following code within a bigger for loop that iterates through each dataframe. However, it is not working at all:

for label in df['speaker']:
    if label == df['speaker'][0]:
        label = '0'
    else:
        label = '1'

Hopefully, what the above makes clear is that I am attempting to create a rule akin to: "find all instances in 'speaker' that match the label in the first index position and change them to '0'; change all other instances to '1'."
Method 1:
We can use iat + np.where here for conditional creation of your column:
# import numpy as np
first_val = df['speaker'].iat[0] # same as df['speaker'].iloc[0]
df['speaker'] = np.where(df['speaker'].eq(first_val), 0, 1)
   speaker
0        0
1        1
2        0
3        1
4        0
5        1
6        0
7        1
8        0
9        1
Method 2:
We can also make use of booleans, since we can cast them to integers:
first_val = df['speaker'].iat[0]
df['speaker'] = df['speaker'].ne(first_val).astype(int)
   speaker
0        0
1        1
2        0
3        1
4        0
5        1
6        0
7        1
8        0
9        1
Only if your values are actually 1 and 2 can we use floor division:
df['speaker'] = df['speaker'] // 2
# same as: df['speaker'] = df['speaker'].floordiv(2)
   speaker
0        0
1        1
2        0
3        1
4        0
5        1
6        0
7        1
8        0
9        1
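Since the first-value trick never hard-codes the labels, the same two lines also normalize a frame labelled 1-3 (or 0-2, etc.); a quick sketch:

```python
import pandas as pd

df = pd.DataFrame({'speaker': [3, 1, 3, 1, 3]})

first_val = df['speaker'].iat[0]  # 3 here, so 3 maps to 0 and 1 maps to 1
df['speaker'] = df['speaker'].ne(first_val).astype(int)

print(df['speaker'].tolist())  # [0, 1, 0, 1, 0]
```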
You can use iloc to get the value in the first row of the column, and then a mask to set the values:
zero_map = df["speaker"].iloc[0]
mask_zero = df["speaker"] == zero_map
df.loc[mask_zero, "speaker"] = 0
df.loc[~mask_zero, "speaker"] = 1
print(df)
   speaker
0        0
1        1
2        0
3        1
4        0
5        1
6        0
7        1
8        0
9        1
As described above, I want to get the position index of the DataFrame entry based on the condition. It should look something like this:
import pandas as pd
a = [[1,0,0,1],[0,1,0,1],[0,0,0,1]]
df = pd.DataFrame(a)
df
Out[61]:
   0  1  2  3
0  1  0  0  1
1  0  1  0  1
2  0  0  0  1
And I want to create a new column that returns the position of the first 1 in the corresponding row. So the end result should look like this:
Out[62]:
   0  1  2  3  New
0  1  0  0  1    0
1  0  1  0  1    1
2  0  0  0  1    3
This is my first question on Stack Overflow, so sorry if I made some formal mistakes while asking it.
Any help appreciated
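A minimal sketch of one common way to get that position, using numpy's argmax on the row values (note it assumes every row contains at least one 1; argmax would also report 0 for an all-zero row):

```python
import pandas as pd

a = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 0, 1]]
df = pd.DataFrame(a)

# argmax returns the position of the first maximum per row,
# i.e. the first 1 when the row contains any 1
df['New'] = df.to_numpy().argmax(axis=1)

print(df['New'].tolist())  # [0, 1, 3]
```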
My system
Windows 7, 64 bit
python 3.5.1
The challenge
I've got a pandas dataframe, and I would like to know the maximum value for each row and append that info as a new column. I would also like to add another column to the existing dataframe containing the name of the column where that max value can be found.
A similar question has been asked and answered for R in this post.
Reproducible example
In[1]:
import pandas as pd

# Make pandas dataframe
df = pd.DataFrame({'a':[1,0,0,1,3], 'b':[0,0,1,0,1], 'c':[0,0,0,0,0]})
# Calculate max
my_series = df.max(numeric_only=True, axis = 1)
my_series.name = "maxval"
# Include maxval in df
df = df.join(my_series)
df
Out[1]:
   a  b  c  maxval
0  1  0  0       1
1  0  0  0       0
2  0  1  0       1
3  1  0  0       1
4  3  1  0       3
So far so good. Now for the part about adding another column containing the name of the column:
In[2]:
?
?
?
# This is what I'd like to accomplish:
Out[2]:
   a  b  c  maxval maxcol
0  1  0  0       1      a
1  0  0  0       0  a,b,c
2  0  1  0       1      b
3  1  0  0       1      a
4  3  1  0       3      a
Notice that I'd like to return all column names if multiple columns contain the same maximum value. Also, please notice that the column maxval is not included in maxcol, since that would not make much sense. Thanks in advance if anyone out there finds this interesting.
You can compare the df against maxval using eq with axis=0, then use apply with a lambda to build a boolean mask per row, select the matching columns, and join them:
In [183]:
df['maxcol'] = df.iloc[:, :3].eq(df['maxval'], axis=0).apply(lambda x: ','.join(df.columns[:3][x]), axis=1)  # df.ix has been removed from modern pandas
df
Out[183]:
   a  b  c  maxval maxcol
0  1  0  0       1      a
1  0  0  0       0  a,b,c
2  0  1  0       1      b
3  1  0  0       1      a
4  3  1  0       3      a
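Since ix has since been removed from pandas, a more current sketch of the same idea uses a boolean mask and a dot product with the column names to join the matches:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0, 1, 3], 'b': [0, 0, 1, 0, 1], 'c': [0, 0, 0, 0, 0]})
df['maxval'] = df[['a', 'b', 'c']].max(axis=1)

# True where a cell equals its row max; dotting with 'name,' strings
# concatenates the matching names, then the trailing comma is stripped
is_max = df[['a', 'b', 'c']].eq(df['maxval'], axis=0)
df['maxcol'] = is_max.dot(pd.Index(['a', 'b', 'c']) + ',').str.rstrip(',')

print(df['maxcol'].tolist())  # ['a', 'a,b,c', 'b', 'a', 'a']
```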
I have a file with 13 columns and I am looking to perform some grouping tasks. The input looks like so:
A B C D E F G H I J K L M
0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 1 1
Excluding column A, the grouping is to be done as follows, producing five new columns; the columns J, K, L, M are merged into a single one as a special case:

B,C > new column; D,E > new column; F,G > new column; H,I > new column; J,K,L,M > new column
B  C  Result
1  0       1
0  1       1
1  1       1
0  0       0
If either of the two columns (or both) has a "1" in it, I want to count it as 1. Right now I have written this little snippet, but I am not sure how to proceed:
from collections import Counter

with open("datagroup.txt") as inFile:
    print(Counter([" ".join(line.split()[::2]) for line in inFile]))
* Edit *
A  B&C  D&E  F&G  H&I  J,K,L,M
1    1    0    0    1        1
1    1    0    0    0        1
0    1    0    0    1        0
1    0    0    0    0        1
0    1    0    1    1        1
1    0    0    0    0        1
Basically, what I want to do is exclude the first column and then compare every two columns after that, up to column I: if either column has a "1" present, I want to report that as "1", and even if both columns have "1" I would still report it as "1". For the last four columns, namely J, K, L, M: if I see a "1" in any of the four, it should be reported as "1".
First, you're obviously going to have to iterate over the rows in some way to do something for each row.
Second, I have no idea what you're trying to do with the [::2], since that will just give you every other column, or what the Counter is for in the first place, or why you're counting strings made up of a bunch of concatenated columns.
But I think what you want is this:
with open("datagroup.txt") as inFile:
    for row in inFile:
        columns = row.split()
        outcolumns = []
        outcolumns.append(columns[0])  # A
        # pair up B..I two at a time, then treat the last four columns as one group
        for group in list(zip(columns[1:-4:2], columns[2:-4:2])) + [columns[-4:]]:
            outcolumns.append('1' if '1' in group else '0')
        print(' '.join(outcolumns))
You can make this a lot more concise with a bit of itertools and comprehensions, but I wanted to keep this verbose and simple so you'd understand it.
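For reference, a sketch of the more concise version hinted at above, using zip and a comprehension; the sample rows here are inlined via io.StringIO so the snippet is self-contained (substitute your open("datagroup.txt") file object):

```python
import io

sample = ("0 1 1 0 0 0 0 0 0 0 0 0 0\n"
          "0 0 0 0 0 0 0 0 0 0 0 1 0\n"
          "1 0 0 0 0 0 0 1 1 0 0 0 0\n")

out_lines = []
for row in io.StringIO(sample):
    cols = row.split()
    # Pair B..I two at a time; the last four columns form one merged group
    groups = list(zip(cols[1:-4:2], cols[2:-4:2])) + [cols[-4:]]
    out_lines.append(' '.join([cols[0]] + ['1' if '1' in g else '0' for g in groups]))

print(out_lines)  # ['0 1 0 0 0 0', '0 0 0 0 0 1', '1 0 0 0 1 0']
```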