I want to fill a column with True and NaN values.
import numpy as np
import pandas as pd
my_list = [1,2,3,4,5]
df = pd.DataFrame({'col1' : [0,1,2,3,4,5,6,7,8,9,10]})
df['col2'] = np.where(df['col1'].isin(my_list), True, np.NaN)
print (df)
It prints:
col1 col2
0 0 NaN
1 1 1.0
2 2 1.0
3 3 1.0
4 4 1.0
5 5 1.0
6 6 NaN
7 7 NaN
8 8 NaN
9 9 NaN
10 10 NaN
But it is important to me that the column holds the bool value True, not the float 1.0. This column interacts with other columns, which are bool, so it must be bool too.
I know I can change it with the replace function, but my DataFrame is very large and I don't want to waste time. Is there a simpler way to do it?
This code will solve your problem. np.where gives you 1.0 instead of True because a NumPy array has a single dtype: NaN is a float, so the whole result array is upcast to float and True becomes 1.0.
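A quick illustration of that upcasting (a minimal sketch, separate from the answer code below):

import numpy as np

# NaN is a float, so the whole result array becomes float64 and True turns into 1.0
print(np.where([True, False, True], True, np.nan))  # [ 1. nan  1.]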
Code
import numpy as np
import pandas as pd
my_list = [1,2,3,4,5]
df = pd.DataFrame({'col1' : [0,1,2,3,4,5,6,7,8,9,10]})
df['col2'] = df['col1'].apply(lambda x: True if x in my_list else np.NaN)
print (df)
Results
col1 col2
0 0 NaN
1 1 True
2 2 True
3 3 True
4 4 True
5 5 True
6 6 NaN
7 7 NaN
8 8 NaN
9 9 NaN
10 10 NaN
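Note that mixing Python True with float NaN through apply produces an object-dtype column rather than a bool one, which may matter when the column has to interact with other bool columns. A quick check (a sketch of the same code):

import numpy as np
import pandas as pd

my_list = [1, 2, 3, 4, 5]
df = pd.DataFrame({'col1': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
df['col2'] = df['col1'].apply(lambda x: True if x in my_list else np.nan)

# Mixed True/NaN values force an object dtype, not bool
print(df['col2'].dtype)  # object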
Use the nullable Boolean data type:
df['col2'] = pd.Series(np.where(df['col1'].isin(my_list), True, np.NaN), dtype='boolean')
print (df)
col1 col2
0 0 <NA>
1 1 True
2 2 True
3 3 True
4 4 True
5 5 True
6 6 <NA>
7 7 <NA>
8 8 <NA>
9 9 <NA>
10 10 <NA>
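As a side note, the nullable 'boolean' dtype propagates <NA> through logical operations (Kleene logic), which is usually what you want when the column interacts with other bool columns. A minimal sketch; col3 is a made-up column just for illustration:

import numpy as np
import pandas as pd

my_list = [1, 2, 3, 4, 5]
df = pd.DataFrame({'col1': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
df['col2'] = pd.Series(np.where(df['col1'].isin(my_list), True, np.nan), dtype='boolean')

# A hypothetical second boolean column to combine with col2
df['col3'] = (df['col1'] % 2 == 0).astype('boolean')

# Kleene logic: True & <NA> stays <NA>, False & <NA> becomes False
print(df['col2'] & df['col3'])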
You can also call this:
df.col2 = df.col2.apply(lambda x: True if x==1.0 else x)
I have a dataframe as follows. This dataframe is actually the result of an outer join of two tables.
IndentID IndentNo role_name role_id user_id ishod isdesignatedhod Flag
100 1234 xyz 3 17 1 nan right_only
nan nan nan -1 -1 None None right_only
nan nan nan 1 15 None None right_only
nan nan nan 100 100 None 1 right_only
Objective: I want a resultant dataframe based on column conditions. The conditions are given below
if ishod == 1 the resultant df will be:
IndentID IndentNo role_name role_id user_id
100 1234 xyz 3 17
if ishod!=1 and isdesignatedhod==1 the resultant df will be:
IndentID IndentNo role_name role_id user_id
100 1234 xyz 100 100
I am really clueless about how to proceed on this. Any hint would be appreciated!
To select rows based on a value in a certain column, you can use the following notation:
df[ df["column_name"] == value_to_keep ]
Here is an example of this in action:
import pandas as pd
d = {'col1': [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1],
'col2': [3,4,5,3,4,5,3,4,5,3,4,5,3,4,5],
'col3': [6,7,8,9,6,7,8,9,6,7,8,9,6,7,8]}
# create a dataframe
df = pd.DataFrame(d)
This is what df looks like:
In [17]: df
Out[17]:
col1 col2 col3
0 1 3 6
1 2 4 7
2 1 5 8
3 2 3 9
4 1 4 6
5 2 5 7
6 1 3 8
7 2 4 9
8 1 5 6
9 2 3 7
10 1 4 8
11 2 5 9
12 1 3 6
13 2 4 7
14 1 5 8
Now to select all rows for which the value is '2' in the first column:
df_1 = df[df["col1"] == 2]
In [19]: df_1
Out [19]:
col1 col2 col3
1 2 4 7
3 2 3 9
5 2 5 7
7 2 4 9
9 2 3 7
11 2 5 9
13 2 4 7
You can also combine multiple conditions this way:
df_2 = df[(df["col2"] >= 4) & (df["col3"] != 7)]
In [22]: df_2
Out [22]:
col1 col2 col3
2 1 5 8
4 1 4 6
7 2 4 9
8 1 5 6
10 1 4 8
11 2 5 9
14 1 5 8
Hope this example helps!
Andre gives the right answer. You also have to keep in mind the dtype of the columns ishod and isdesignatedhod. They are "object" dtype, in this specific case strings.
So you have to use quotes when comparing these object columns with numbers.
df[df["ishod"] == "1"]
This should do approximately what you want
import io
import pandas as pd

nan = float("nan")

def func(row):
    if row["ishod"] == "1":
        return pd.Series([100, 1234, "xyz", 3, 17, nan, nan, nan], index=row.index)
    elif row["isdesignatedhod"] == "1":
        return pd.Series([100, 1234, "xyz", 100, 100, nan, nan, nan], index=row.index)
    else:
        return row
pd.read_csv(io.StringIO(
"""IndentID IndentNo role_name role_id user_id ishod isdesignatedhod Flag
100 1234 xyz 3 17 1 nan right_only
nan nan nan -1 -1 None None right_only
nan nan nan 1 15 None None right_only
nan nan nan 100 100 None 1 right_only
"""), sep=" +", engine='python')\
.apply(func,axis=1)
Output:
IndentID IndentNo role_name role_id user_id ishod isdesignatedhod Flag
0 100.0 1234.0 xyz 3 17 NaN NaN NaN
1 NaN NaN NaN -1 -1 None None right_only
2 NaN NaN NaN 1 15 None None right_only
3 100.0 1234.0 xyz 100 100 NaN NaN NaN
I have this dataframe:
a b c d
4 7 5 12
3 8 2 8
1 9 3 5
9 2 6 4
I want column 'd' to become the difference between the n-th value of column 'a' and the (n+1)-th value of column 'a'.
I tried this but it doesn't run:
for i in data.index-1:
    data.iloc[i]['d'] = data.iloc[i]['a'] - data.iloc[i+1]['a']
Can anyone help me?
Basically what you want is diff.
df = pd.DataFrame.from_dict({"a":[4,3,1,9]})
df["d"] = df["a"].diff(periods=-1)
print(df)
Output
a d
0 4 1.0
1 3 2.0
2 1 -8.0
3 9 NaN
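Equivalently, diff(periods=-1) is just the column minus itself shifted up by one row, so you could also write it with shift (a small sketch):

import pandas as pd

df = pd.DataFrame({"a": [4, 3, 1, 9]})

# Same result as df["a"].diff(periods=-1)
df["d"] = df["a"] - df["a"].shift(-1)
print(df)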
Let's try a simple way:
import numpy as np
import pandas as pd

df = pd.DataFrame.from_dict({'a': [2, 4, 8, 15]})
diff = []
for i in range(len(df) - 1):
    diff.append(df['a'][i+1] - df['a'][i])
diff.append(np.nan)
df['d'] = diff
print(df)
a d
0 2 2.0
1 4 4.0
2 8 7.0
3 15 NaN
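Note that this loop computes df['a'][i+1] - df['a'][i], which is the opposite sign from the diff(periods=-1) answer above. If you want the n-th value minus the (n+1)-th value instead, swap the operands inside the loop (a sketch under that assumption):

import numpy as np
import pandas as pd

df = pd.DataFrame.from_dict({'a': [2, 4, 8, 15]})

diff = []
for i in range(len(df) - 1):
    # n-th value minus (n+1)-th value
    diff.append(df['a'][i] - df['a'][i + 1])
diff.append(np.nan)

df['d'] = diff
print(df)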
I have a pandas dataframe df of the form:
index,result1,result2,result3
0 s u s
1 u s u
2 s
3 s s u
I would like to add another column that contains the number of times 's' occurs in that row, for example:
index,result1,result2,result3,count
0 s u s 2
1 u s u 1
2 s 1
3 s s u 2
I have tried the following code:
cols = ['result1','result2','result3']
df[cols].count(axis=1)
but this returns
0,3
1,3
2,1
3,3
so this counts the number of non-null elements. I then tried:
df[df[cols]=='s'].count(axis=1)
but this returned the following error: "Could not compare ['s'] with block values"
Any help would be greatly appreciated
For me, casting to string with astype works; it is the numeric and NaN columns that raise your error:
print (df)
index result1 result2 result3 result4
0 0 s u 7 NaN
1 1 u s 7 NaN
2 2 s NaN 8 NaN
3 3 s s 7 NaN
4 4 NaN NaN 2 NaN
print (df.dtypes)
index int64
result1 object
result2 object
result3 int64
result4 float64
dtype: object
cols = ['result1','result2','result3','result4']
df['count'] = df[df[cols].astype(str) == 's'].count(axis=1)
print (df)
index result1 result2 result3 result4 count
0 0 s u 7 NaN 1
1 1 u s 7 NaN 1
2 2 s NaN 8 NaN 1
3 3 s s 7 NaN 2
4 4 NaN NaN 2 NaN 0
Or sum only the True values of the boolean mask:
print (df[cols].astype(str) == 's')
result1 result2 result3 result4
0 True False False False
1 False True False False
2 True False False False
3 True True False False
4 False False False False
cols = ['result1','result2','result3','result4']
df['count'] = (df[cols].astype(str) =='s').sum(axis=1)
print (df)
index result1 result2 result3 result4 count
0 0 s u 7 NaN 1
1 1 u s 7 NaN 1
2 2 s NaN 8 NaN 1
3 3 s s 7 NaN 2
4 4 NaN NaN 2 NaN 0
Another nice solution, from Nickil Maveli, is to use numpy:
df['count'] = (df[cols].values=='s').sum(axis=1)
I have a MWE that can be reproduced with the following code:
import pandas as pd
a = pd.DataFrame([[1,2],[3,4]], columns=['A', 'B'])
b = pd.DataFrame([[True,False],[False,True]], columns=['A', 'B'])
Which creates the following dataframes:
In [8]: a
Out[8]:
A B
0 1 2
1 3 4
In [9]: b
Out[9]:
A B
0 True False
1 False True
My question is: how can I change the values in dataframe a based on the boolean values in dataframe b?
Say, for example, I wanted to put NaN values in dataframe a wherever there is a False in dataframe b?
If you need to replace False with NaN:
print (a[b])
A B
0 1.0 NaN
1 NaN 4.0
or:
print (a.where(b))
A B
0 1.0 NaN
1 NaN 4.0
And if you need to replace True with NaN:
print (a[~b])
A B
0 NaN 2.0
1 3.0 NaN
or:
print (a.mask(b))
A B
0 NaN 2.0
1 3.0 NaN
You can also use where or mask with some scalar value:
print (a.where(b, 7))
A B
0 1 7
1 7 4
print (a.mask(b, 7))
A B
0 7 2
1 3 7
print (a.where(b, 'TEST'))
A B
0 1 TEST
1 TEST 4
I want to add l to column 'A', but it creates a new column and adds l to that one instead. Why is this happening? And how can I do what I want?
import pandas as pd
l=[1,2,3]
df = pd.DataFrame(columns =['A'])
df = df.append(l, ignore_index=True)
df = df.append(l, ignore_index=True)
print(df)
A 0
0 NaN 1.0
1 NaN 2.0
2 NaN 3.0
3 NaN 1.0
4 NaN 2.0
5 NaN 3.0
Edited
Is this what you want to do:
In[6]:df=df.A.append(pd.Series(l)).reset_index().drop('index',1).rename(columns={0:'A'})
In[7]:df
Out[7]:
A
0 1
1 2
2 3
Then you can append any list of a different length.
Suppose:
a=[9,8,7,6,5]
In[11]:df=df.A.append(pd.Series(a)).reset_index().drop('index',1).rename(columns={0:'A'})
In[12]:df
Out[12]:
A
0 1
1 2
2 3
3 9
4 8
5 7
6 6
7 5
Previously
Are you looking for this:
df=pd.DataFrame(l,columns=['A'])
df
Out[5]:
A
0 1
1 2
2 3
You can just pass a dictionary to the DataFrame constructor, if I understand your question correctly.
l = [1,2,3]
df = pd.DataFrame({'A': l})
df
A
0 1
1 2
2 3
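As a side note, DataFrame.append and Series.append were removed in pandas 2.0, so on newer versions a similar extension of column 'A' can be done with pd.concat (a sketch):

import pandas as pd

l = [1, 2, 3]
a = [9, 8, 7, 6, 5]

df = pd.DataFrame({'A': l})

# Append another list of any length by concatenating a second one-column frame
df = pd.concat([df, pd.DataFrame({'A': a})], ignore_index=True)
print(df)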