confusing results of pandas boolean operator [duplicate] - python

I have a pandas Series object containing boolean values. How can I get a series containing the logical NOT of each value?
For example, consider a series containing:
True
True
True
False
The series I'd like to get would contain:
False
False
False
True
This seems like it should be reasonably simple, but apparently I've misplaced my mojo =(

To invert a boolean Series, use ~s:
In [7]: s = pd.Series([True, True, False, True])
In [8]: ~s
Out[8]:
0 False
1 False
2 True
3 False
dtype: bool
Using Python2.7, NumPy 1.8.0, Pandas 0.13.1:
In [119]: s = pd.Series([True, True, False, True]*10000)
In [10]: %timeit np.invert(s)
10000 loops, best of 3: 91.8 µs per loop
In [11]: %timeit ~s
10000 loops, best of 3: 73.5 µs per loop
In [12]: %timeit (-s)
10000 loops, best of 3: 73.5 µs per loop
As of Pandas 0.13.0, Series are no longer subclasses of numpy.ndarray; they are now subclasses of pd.NDFrame. This might have something to do with why np.invert(s) is no longer as fast as ~s or -s.
Caveat: timeit results may vary depending on many factors including hardware, compiler, OS, Python, NumPy and Pandas versions.

@unutbu's answer is spot on; I just wanted to add a warning that your mask needs to be dtype bool, not 'object', i.e. your mask can't ever have had any NaNs. See here - even if your mask is NaN-free now, it will remain 'object' type.
Inverting an 'object' series won't throw an error; instead you'll get a garbage mask of ints that won't work as you expect.
In[1]: df = pd.DataFrame({'A':[True, False, np.nan], 'B':[True, False, True]})
In[2]: df.dropna(inplace=True)
In[3]: df['A']
Out[3]:
0 True
1 False
Name: A, dtype: object
In[4]: ~df['A']
Out[4]:
0 -2
1 -1
Name: A, dtype: object
After speaking with colleagues about this one I have an explanation: It looks like pandas is reverting to the bitwise operator:
In [1]: ~True
Out[1]: -2
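As @geher says, you can convert it to bool with astype before you invert it with ~. The cast has to happen before the inversion: casting the already-inverted ints (-2 and -1) makes every value True, as the second example shows.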
~df['A'].astype(bool)
0 False
1 True
Name: A, dtype: bool
(~df['A']).astype(bool)
0 True
1 True
Name: A, dtype: bool
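The ordering rule above can be wrapped in a small helper. This is only a sketch: invert_mask is a hypothetical name (not a pandas function), and it assumes you would rather fail loudly than let a leftover NaN be silently cast to True.

```python
import numpy as np
import pandas as pd


def invert_mask(mask: pd.Series) -> pd.Series:
    """Return the logical NOT of a mask (hypothetical helper).

    Casts to bool *before* applying ~, so an object-dtype mask does not
    fall back to the integer bitwise complement.
    """
    if mask.isna().any():
        # astype(bool) would turn NaN into True, which is rarely intended.
        raise ValueError("mask contains missing values; drop or fill them first")
    return ~mask.astype(bool)


df = pd.DataFrame({"A": [True, False, np.nan], "B": [True, False, True]})
df = df.dropna()                 # column A keeps dtype object
inverted = invert_mask(df["A"])
print(inverted)
```

The explicit NaN check is a design choice for this sketch; if treating missing values as False suits your data better, fill them before calling the helper.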

I just gave it a shot:
In [9]: s = Series([True, True, True, False])
In [10]: s
Out[10]:
0 True
1 True
2 True
3 False
In [11]: -s
Out[11]:
0 False
1 False
2 False
3 True

You can also use numpy.invert:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: s = pd.Series([True, True, False, True])
In [4]: np.invert(s)
Out[4]:
0 False
1 False
2 True
3 False
EDIT: The difference in performance appears on Ubuntu 12.04, Python 2.7, NumPy 1.7.0 - doesn't seem to exist using NumPy 1.6.2 though:
In [5]: %timeit (-s)
10000 loops, best of 3: 26.8 us per loop
In [6]: %timeit np.invert(s)
100000 loops, best of 3: 7.85 us per loop
In [7]: %timeit ~s
10000 loops, best of 3: 27.3 us per loop

In support of the excellent answers here, and for future convenience: there may be a case where you want to flip the truth values in a column and have the other values remain the same (NaN values, for instance).
In[1]: series = pd.Series([True, np.nan, False, np.nan])
In[2]: series = series[series.notna()] #remove nan values
In[3]: series # without nan
Out[3]:
0 True
2 False
dtype: object
# Out[4] expected to be inverse of Out[3], pandas applies bitwise complement
# operator instead as in `lambda x : (-1*x)-1`
In[4]: ~series
Out[4]:
0 -2
2 -1
dtype: object
As a simple non-vectorized solution you can: 1. check types, 2. invert the bools.
In[1]: series = pd.Series([True, np.nan, False, np.nan])
In[2]: series.apply(lambda x: not x if isinstance(x, bool) else x)
Out[2]:
0 False
1 NaN
2 True
3 NaN
dtype: object
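A vectorized alternative is pandas' nullable "boolean" extension dtype, which keeps missing values as <NA> through the inversion. This is a sketch that assumes pandas 1.0 or later (where that dtype exists); the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

series = pd.Series([True, np.nan, False, np.nan])   # dtype: object

# Convert to the nullable boolean dtype; NaN becomes <NA>.
nullable = series.astype("boolean")

# ~ now performs a logical NOT and leaves missing entries missing.
flipped = ~nullable
print(flipped)
```

If later code expects plain NaN in an object column, flipped.astype(object) converts back, with pd.NA in place of NaN.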

NumPy is slower because it casts the input to boolean values (so None and 0 become False and everything else becomes True).
import pandas as pd
import numpy as np
s = pd.Series([True, None, False, True])
np.logical_not(s)
gives you
0 False
1 True
2 True
3 False
dtype: object
whereas ~s would crash. In most cases tilde would be a safer choice than NumPy.
Pandas 0.25, NumPy 1.17

Related

Pandas: find matching rows in two dataframes (without using `merge`)

Let's suppose I have these two dataframes with the same number of columns, but possibly different number of rows:
tmp = np.arange(0,12).reshape((4,3))
df = pd.DataFrame(data=tmp)
tmp2 = {'a':[3,100,101], 'b':[4,4,100], 'c':[5,100,3]}
df2 = pd.DataFrame(data=tmp2)
print(df)
0 1 2
0 0 1 2
1 3 4 5
2 6 7 8
3 9 10 11
print(df2)
a b c
0 3 4 5
1 100 4 100
2 101 100 3
I want to verify if the rows of df2 are matching any rows of df, that is I want to obtain a series (or an array) of boolean values that gives this result:
0 True
1 False
2 False
dtype: bool
I think something like the isin method should work, but I got this result, which results in a dataframe and is wrong:
print(df2.isin(df))
a b c
0 False False False
1 False False False
2 False False False
As a constraint, I wish to not use the merge method, since what I am doing is in fact a check on the data before applying merge itself.
Thank you for your help!
You can use numpy.isin, which will compare all elements in your arrays and return True or False for each element for each array.
Then using all() on each array, will get your desired output as the function returns True if all elements are true:
>>> pd.Series([m.all() for m in np.isin(df2.values,df.values)])
0 True
1 False
2 False
dtype: bool
Breakdown of what is happening:
# np.isin
>>> np.isin(df2.values,df.values)
Out[139]:
array([[ True, True, True],
[False, True, False],
[False, False, True]])
# all()
>>> [m.all() for m in np.isin(df2.values,df.values)]
Out[140]: [True, False, False]
# pd.Series()
>>> pd.Series([m.all() for m in np.isin(df2.values,df.values)])
Out[141]:
0 True
1 False
2 False
dtype: bool
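Note that np.isin tests each element against every value in df, so a row of df2 can come out True even when its values are scattered across different rows of df. A sketch of a strict row-by-row comparison using NumPy broadcasting follows; it assumes both frames have the same number of columns in the same order, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 12).reshape((4, 3)))
df2 = pd.DataFrame({"a": [3, 100, 101], "b": [4, 4, 100], "c": [5, 100, 3]})

left = df2.to_numpy()[:, None, :]    # shape (3, 1, 3)
right = df.to_numpy()[None, :, :]    # shape (1, 4, 3)

# Compare every row of df2 with every row of df: a row matches only if
# all of its columns are equal to the same row of df.
row_match = (left == right).all(axis=2).any(axis=1)
result = pd.Series(row_match, index=df2.index)
print(result)

# A row built from values of three different rows of df:
mixed = np.array([[0, 4, 8]])
elementwise = np.isin(mixed, df.to_numpy()).all(axis=1)
strict = (mixed[:, None, :] == right).all(axis=2).any(axis=1)
print(elementwise, strict)
```

The broadcasted comparison builds an intermediate array of size len(df2) x len(df) x columns, so it suits small to medium frames.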
Use np.in1d:
>>> df2.apply(lambda x: all(np.in1d(x, df)), axis=1)
0 True
1 False
2 False
dtype: bool
Another way, use frozenset:
>>> df2.apply(frozenset, axis=1).isin(df.apply(frozenset, axis=1))
0 True
1 False
2 False
dtype: bool
You can use a MultiIndex (expensive IMO):
pd.MultiIndex.from_frame(df2).isin(pd.MultiIndex.from_frame(df))
Out[32]: array([ True, False, False])
Another option is to create a dictionary, and run isin:
df2.isin({key : array.array for key, (_, array) in zip(df2, df.items())}).all(1)
Out[45]:
0 True
1 False
2 False
dtype: bool
There may be more efficient solutions, but you could append the two dataframes and call duplicated (DataFrame.append was removed in pandas 2.0; pd.concat([df, df2]) is the replacement there), e.g.:
df.append(df2).duplicated().iloc[df.shape[0]:]
This assumes that all rows in each DataFrame are distinct. Here are some benchmarks:
tmp1 = np.arange(0,12).reshape((4,3))
df1 = pd.DataFrame(data=tmp1, columns=["a", "b", "c"])
tmp2 = {'a':[3,100,101], 'b':[4,4,100], 'c':[5,100,3]}
df2 = pd.DataFrame(data=tmp2)
df1 = pd.concat([df1] * 10_000).reset_index()
df2 = pd.concat([df2] * 10_000).reset_index()
%timeit df1.append(df2).duplicated().iloc[df1.shape[0]:]
# 100 loops, best of 5: 4.16 ms per loop
%timeit pd.Series([m.all() for m in np.isin(df2.values,df1.values)])
# 10 loops, best of 5: 74.9 ms per loop
%timeit df2.apply(frozenset, axis=1).isin(df1.apply(frozenset, axis=1))
# 1 loop, best of 5: 443 ms per loop
Try:
df[~df.apply(tuple,1).isin(df2.apply(tuple,1))]

condition check in pandas dataframe - python [duplicate]

float('nan') represents NaN (not a number). But how do I check for it?
Use math.isnan:
>>> import math
>>> x = float('nan')
>>> math.isnan(x)
True
The usual way to test for a NaN is to see if it's equal to itself:
def isNaN(num):
    return num != num
numpy.isnan(number) tells you if it's NaN or not.
Here are three ways where you can test a variable is "NaN" or not.
import pandas as pd
import numpy as np
import math
# For single variable all three libraries return single boolean
x1 = float("nan")
print(f"It's pd.isna: {pd.isna(x1)}")
print(f"It's np.isnan: {np.isnan(x1)}")
print(f"It's math.isnan: {math.isnan(x1)}")
Output
It's pd.isna: True
It's np.isnan: True
It's math.isnan: True
It seems that checking if it's equal to itself (x != x) is the fastest.
import pandas as pd
import numpy as np
import math
x = float('nan')
%timeit x != x
44.8 ns ± 0.152 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit math.isnan(x)
94.2 ns ± 0.955 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit pd.isna(x)
281 ns ± 5.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit np.isnan(x)
1.38 µs ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
here is an answer working with:
NaN implementations respecting IEEE 754 standard
ie: python's NaN: float('nan'), numpy.nan...
any other objects: string or whatever (does not raise exceptions if encountered)
A NaN implemented following the standard, is the only value for which the inequality comparison with itself should return True:
def is_nan(x):
    return (x != x)
And some examples:
import numpy as np
values = [float('nan'), np.nan, 55, "string", lambda x : x]
for value in values:
    print(f"{repr(value):<8} : {is_nan(value)}")
Output:
nan : True
nan : True
55 : False
'string' : False
<function <lambda> at 0x000000000927BF28> : False
I actually just ran into this, but for me it was checking for nan, -inf, or inf. I just used
if float('-inf') < float(num) < float('inf'):
This is true for numbers, false for nan and both inf, and will raise an exception for things like strings or other types (which is probably a good thing). Also this does not require importing any libraries like math or numpy (numpy is so damn big it doubles the size of any compiled application).
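On Python 3.2 and later the standard library also provides math.isfinite, which should give the same answer as the chained comparison above. A minimal sketch, where is_finite_number is a hypothetical helper name:

```python
import math


def is_finite_number(num):
    # float() raises ValueError or TypeError for non-numeric input, just
    # like the chained comparison; isfinite is False for nan, inf and -inf.
    return math.isfinite(float(num))


print(is_finite_number(3.5))
print(is_finite_number(float("nan")))
print(is_finite_number(float("-inf")))
```

This does import math, but that module ships with Python, so it adds no third-party dependency.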
math.isnan()
or compare the number to itself. NaN is always != NaN, otherwise (e.g. if it is a number) the comparison should succeed.
Well, I came to this post because I've had some issues with the function:
math.isnan()
There is a problem when you run this code:
a = "hello"
math.isnan(a)
It raises an exception.
My solution is to add another check:
def is_nan(x):
    return isinstance(x, float) and math.isnan(x)
Another method if you're stuck on <2.6, you don't have numpy, and you don't have IEEE 754 support:
def isNaN(x):
    return str(x) == str(1e400*0)
With python < 2.6 I ended up with
def isNaN(x):
    return str(float(x)).lower() == 'nan'
This works for me with python 2.5.1 on a Solaris 5.9 box and with python 2.6.5 on Ubuntu 10
I am receiving the data from a web-service that sends NaN as a string 'Nan'. But there could be other sorts of string in my data as well, so a simple float(value) could throw an exception. I used the following variant of the accepted answer:
def isnan(value):
    try:
        import math
        return math.isnan(float(value))
    except:
        return False
Requirement:
isnan('hello') == False
isnan('NaN') == True
isnan(100) == False
isnan(float('nan')) == True
Comparison pd.isna, math.isnan and np.isnan and their flexibility dealing with different type of objects.
The table below shows if the type of object can be checked with the given method:
+------------+-----+---------+------+--------+------+
| Method | NaN | numeric | None | string | list |
+------------+-----+---------+------+--------+------+
| pd.isna | yes | yes | yes | yes | yes |
| math.isnan | yes | yes | no | no | no |
| np.isnan | yes | yes | no | no | yes | <-- # will error on mixed type list
+------------+-----+---------+------+--------+------+
pd.isna
The most flexible method to check for different types of missing values.
None of the answers cover the flexibility of pd.isna. While math.isnan and np.isnan return True for NaN values, they cannot check other types of objects such as None or strings: both raise an error, so checking a list with mixed types is cumbersome. pd.isna, by contrast, is flexible and returns the correct boolean for different kinds of types:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: missing_values = [3, None, np.nan, pd.NA, pd.NaT, '10']
In [4]: pd.isna(missing_values)
Out[4]: array([False, True, True, True, True, False])
All the methods to tell if the variable is NaN or None:
None type
In [1]: import math
In [2]: a = None
In [3]: not a
Out[3]: True
In [4]: len(a or ()) == 0
Out[4]: True
In [5]: a == None
Out[5]: True
In [6]: a is None
Out[6]: True
In [7]: a != a
Out[7]: False
In [9]: math.isnan(a)
Traceback (most recent call last):
File "<ipython-input-9-6d4d8c26d370>", line 1, in <module>
math.isnan(a)
TypeError: a float is required
In [10]: len(a) == 0
Traceback (most recent call last):
File "<ipython-input-10-65b72372873e>", line 1, in <module>
len(a) == 0
TypeError: object of type 'NoneType' has no len()
NaN type
In [11]: b = float('nan')
In [12]: b
Out[12]: nan
In [13]: not b
Out[13]: False
In [14]: b != b
Out[14]: True
In [15]: math.isnan(b)
Out[15]: True
In Python 3.6, calling math.isnan(x) or np.isnan(x) on a string value x raises an error.
So I can't check whether a given value is NaN if I don't know beforehand that it's a number.
The following seems to solve this issue:
if str(x) == 'nan' and not isinstance(x, str):
    print('NaN')
else:
    print('non NaN')
How to remove NaN (float) item(s) from a list of mixed data types
If you have mixed types in an iterable, here is a solution that does not use numpy:
from math import isnan
Z = ['a','b', float('NaN'), 'd', float('1.1024')]
[x for x in Z if not (
type(x) == float # let's drop all float values…
and isnan(x) # … but only if they are nan
)]
['a', 'b', 'd', 1.1024]
Short-circuit evaluation means that isnan will not be called on values that are not of type 'float', as False and (…) quickly evaluates to False without having to evaluate the right-hand side.
For nan of type float
>>> import pandas as pd
>>> value = float('nan')
>>> type(value)
<class 'float'>
>>> pd.isnull(value)
True
>>>
>>> value = 'nan'
>>> type(value)
<class 'str'>
>>> pd.isnull(value)
False
For strings in pandas, use pd.isnull:
if not pd.isnull(atext):
    for word in nltk.word_tokenize(atext):
        ...
Here is the full function, used for feature extraction with NLTK:
def act_features(atext):
    features = {}
    if not pd.isnull(atext):
        for word in nltk.word_tokenize(atext):
            if word not in default_stopwords:
                features['cont({})'.format(word.lower())] = True
    return features


check if two numeric values have same sign in numpy (+/-)

Currently I am using numpy.logical_or with numpy.logical_and to check whether the elements of two arrays have the same sign. I was wondering if there is already a ufunc or a more efficient method that achieves this. My current solution is here:
a = np.array([1,-2,5,7,-11,9])
b = np.array([3,-8,4,81,5,16])
out = np.logical_or(
np.logical_and((a < 0),(b < 0)),
np.logical_and((a > 0),(b > 0))
)
edit//
output
out
Out[51]: array([ True, True, True, True, False, True], dtype=bool)
One approach takes the elementwise product and checks for > 0, since same signs (both positive or both negative) give a positive product; the first term additionally treats a pair of zeros as matching -
((a== b) & (a==0)) | (a*b>0)
Another with explicit sign check -
np.sign(a) == np.sign(b)
Runtime test -
In [155]: a = np.random.randint(-10,10,(1000000))
In [156]: b = np.random.randint(-10,10,(1000000))
In [157]: np.allclose(np.sign(a) == np.sign(b), ((a== b) & (a==0)) | (a*b>0))
Out[157]: True
In [158]: %timeit np.sign(a) == np.sign(b)
100 loops, best of 3: 3.06 ms per loop
In [159]: %timeit ((a== b) & (a==0)) | (a*b>0)
100 loops, best of 3: 3.54 ms per loop
# @salehinejad's soln
In [160]: %timeit np.where((np.sign(a)+np.sign(b))==0)
100 loops, best of 3: 8.71 ms per loop
In vanilla Python you could do something like:
abs(a + b) == abs(a) + abs(b)
That will return true if the signs are equal.
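As a small illustration of the identity above on scalars, here is a sketch using a hypothetical helper named same_sign. Note one consequence of the formula: a zero is treated as matching either sign.

```python
def same_sign(a, b):
    # |a + b| equals |a| + |b| exactly when a and b do not have opposite
    # signs; a zero on either side therefore counts as a match.
    return abs(a + b) == abs(a) + abs(b)


pairs = [(1, 3), (-2, -8), (-11, 5), (0, 5), (0, -5)]
results = [same_sign(a, b) for a, b in pairs]
print(results)
```

With floating-point inputs the two sides can differ by rounding error, so this sketch is best kept to integers.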
Addition is cheaper than multiplication. For equal signs:
np.where((np.sign(a)+np.sign(b))!=0)
For non-equal signs:
np.where((np.sign(a)+np.sign(b))==0)
This approach returns the indices, not just True/False. (A zero paired with a nonzero value also lands in the first group.)
Output of the equal-signs version for the given a and b in the question:
[0 1 2 3 5]
May try np.sum() for more than two variables.
import numpy as np
a = np.random.randn(5)
b = np.random.randn(5)
print(a)
print(b)
# Method 1
print(np.logical_not(np.sign(a*b)-1))
# Method 2 ***probably best
print(np.equal(np.sign(a), np.sign(b)))
# Method 3
print(np.where((a*b<0),np.zeros(5,dtype=bool),np.ones(5,dtype=bool)))
# Method 4 (np.char is the public alias of np.core.defchararray)
print(np.char.startswith(np.array(-a*b).astype('str'),'-'))
>>>
[-0.77184408 -0.55291345 -0.45774947 0.67080435 -0.286555 ]
[ 0.37220055 0.29489477 -1.05773195 1.03833121 1.01538001]
[False False True True False]
[False False True True False]
[False False True True False]
[False False True True False]
Method 1
a*b produces array of values, negative when signs are different
np.sign() converts array to -1 and 1
subtracting 1 converts array to -2 and 0
np.logical_not() converts -2 to False; and 0 to True
Method 2
np.sign() converts to -1, 1
np.equal() compares two arrays and gives truth value if equal element wise
Method 3
np.where(condition[, x, y]): return elements, either from x or y, depending on condition.
np.zeros(5,dtype=bool),np.ones(5,dtype=bool) are arrays of False and True respectively
Method 4
multiply -a*b
convert resultant array to dtype string
check which elements start with a -
reference:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.sign.html
https://docs.scipy.org/doc/numpy/reference/routines.logic.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.core.defchararray.startswith.html#numpy.core.defchararray.startswith
You can rely on the bitwise XOR operator (integers only):
x ^ y
The XOR is applied to the sign bit as well, so when the signs are the same the result is non-negative (>= 0), and when they differ it is less than 0.
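Applied to the arrays from the question, the XOR idea might look like the sketch below. It assumes signed integer arrays (XOR is not defined for floats), and zero behaves like a positive number because its sign bit is clear.

```python
import numpy as np

a = np.array([1, -2, 5, 7, -11, 9])
b = np.array([3, -8, 4, 81, 5, 16])

# The sign bit of a ^ b is set only when exactly one operand is negative,
# so a non-negative result means the signs agree.
same = (a ^ b) >= 0
print(same)
```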
I wonder why no one went for the actual mathematical operation, which is True when the signs differ (use a * b > 0 for equal signs):
a * b < 0
In case you need the formula without numpy arrays

