what is wrong with this pandas expression

I'm new to Pandas, the answer may be obvious.
I have 3 series of the same length: a, b, c
a[b > c] = 0
works, but:
a[math.fabs(b) > c] = 0
doesn't work, and
a[(b > c or b < -c)] = 0
doesn't work either.
How can I implement that logic?

The difference is that the first expression is vectorized and the other two are not.
In the first expression, the > comparison between two Series returns a boolean Series, which can then be used as a mask.
In the second expression, math.fabs works on a single number, not on an array or Series of numbers; use a vectorized equivalent such as b.abs() or numpy.fabs(b) instead.
In the third expression, the or operator is not vectorized; use | instead, with each comparison in its own parentheses: a[(b > c) | (b < -c)] = 0.
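A minimal sketch of both fixes, assuming pandas is installed. The sample values and the names a1 and a2 are made up for illustration; they are not from the question.

```python
import pandas as pd

# Made-up sample data: three Series of the same length.
a = pd.Series([1.0, 2.0, 3.0, 4.0])
b = pd.Series([-5.0, 0.5, 2.0, -0.1])
c = pd.Series([1.0, 1.0, 1.0, 1.0])

# Fix for the second expression: a vectorized absolute value.
a1 = a.copy()
a1[b.abs() > c] = 0

# Fix for the third expression: | instead of or, with parentheses
# because | binds more tightly than the comparisons.
a2 = a.copy()
a2[(b > c) | (b < -c)] = 0

print(a1.tolist())
print(a2.tolist())
```

Both masks select the same rows here, so a1 and a2 end up identical.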

Related

Why is the "1" after sum necessary to avoid a syntax error

Why does this work:
def hamming_distance(dna_1, dna_2):
    hamming_distance = sum(1 for a, b in zip(dna_1, dna_2) if a != b)
    return hamming_distance
As opposed to this:
def hamming_distance(dna_1, dna_2):
    hamming_distance = sum(for a, b in zip(dna_1, dna_2) if a != b)
    return hamming_distance
I get this error:
Input In [90]
hamming_distance = sum(for a, b in zip(dna_1, dna_2) if a != b)
^
SyntaxError: invalid syntax
I expected the function to work without the 1 after the ()

The working expression can be unrolled into something like this:
hamming_distance = 0
for a, b in zip(dna_1, dna_2):
    if a != b:
        hamming_distance += 1
Without a number after +=, what should Python add? It doesn't know, and neither do we.
If this "unrolled" syntax or your code's relationship to it is new to you, probably start by reading up on list comprehensions, which generalize into generator expressions (which is what you have).

You wrote a generator expression. Generator expressions must produce a value (some expression to the left of the first for). Without it, you're saying "please sum all the lack-of-values not-produced by this generator expression".
Ask yourself:
What does a genexpr that produces nothing even mean?
What is sum summing when it's being passed a series of absolute nothing?
You could write a shorter genexpr with the same effect with:
hamming_distance = sum(a != b for a, b in zip(dna_1, dna_2))
This works because bools have integer values of 1 (for True) and 0 (for False). It would be slower than sum(1 for a, b in zip(dna_1, dna_2) if a != b), though: that version produces fewer values for sum to work on and, at least on some versions of Python, lets sum run faster, since sum has a fast path for small exact int values that bool breaks.
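As a rough sketch, the two working forms discussed above can be put side by side. The name hamming_distance_bools and the sample strings are made up for illustration.

```python
def hamming_distance(dna_1, dna_2):
    # Yield a 1 for every mismatching pair, then add the 1s up.
    return sum(1 for a, b in zip(dna_1, dna_2) if a != b)


def hamming_distance_bools(dna_1, dna_2):
    # Yield True/False for every pair; True counts as 1, False as 0.
    return sum(a != b for a, b in zip(dna_1, dna_2))


first = hamming_distance("GATTACA", "GACTATA")
second = hamming_distance_bools("GATTACA", "GACTATA")
print(first, second)
```

Both functions count the positions at which the two strings differ; only the values fed to sum are different.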

Chaining *= += operators

I have the following code:
aaa = np.random.rand(20, 1)
aaa *= 200
aaa -= 100
I wonder whether it is possible to chain the *= and -= operators on the same line, so that the loop over the array is done only once. I would expect a slight gain in performance from that (for big arrays, of course).

You cannot chain assignments in Python the way you can in C.
That is because in C an assignment is an expression: it has a value that can be assigned to a variable, or used in another expression. C got this idea from Algol, and those who come from the Pascal tradition tend to regard it as a misfeature. Because...
It is a trap for unwary novices who code if (a = b + c) when they mean if (a == b + c). Both are valid, but generally the second one is what you meant, because the first assigns the value of b + c to a and then tests the truth value of a.
Because assignments are not expressions in Python but statements, you will get a syntax error for if (a = b + c). It's just as invalid as if (return).
If you want to achieve what the C idiom does you can use an assignment expression (new in 3.8). You can explicitly code if (a := b + c) if what you really want to do is assign the value of b + c to a and then test the truth value of a (though technically I believe it actually tests the truth value of b + c; which comes to the same thing).
[And to the style martinets, yes, I do know that parens are redundant in a Python if statement.]
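A small sketch of the assignment expression mentioned above, assuming Python 3.8 or later. The values of b and c and the name result are made up for illustration.

```python
b, c = 2, -2

# := assigns b + c to a, and the if then tests that same value.
if (a := b + c):
    result = "truthy"
else:
    result = "falsy"

print(a, result)
```

Writing if (a = b + c): in place of the := form is a SyntaxError, as described above.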

Doing them in one line would simply be
aaa = (aaa * 200) - 100
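Note that this builds a new array and rebinds aaa to it, rather than modifying the existing array in place. I doubt you'll see any performance difference between this version and what you wrote.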
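A minimal sketch comparing the two spellings, assuming NumPy is installed. The names rng, original, in_place, alias and one_line are made up for illustration, and a seeded generator stands in for np.random.rand so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.random((20, 1))

# Two in-place statements, as in the question: the same array
# object is modified twice.
in_place = original.copy()
alias = in_place
in_place *= 200
in_place -= 100

# One expression: a new array is created and the input is left alone.
one_line = (original * 200) - 100

print(np.allclose(in_place, one_line))
print(alias is in_place, one_line is original)
```

Both versions give the same values; the difference is only whether the result lives in the original buffer or in a freshly allocated one.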

Python: Using Equality Operator Inside of Numpy Array Assignment

I saw this code in some examples online and am trying to understand and modify it:
c = a[b == 1]
Why does this work? It appears b == 1 returns true for each element of b that satisfies the equality. I don't understand how something like a[True] ends up evaluating to something like "For all values in a for which the same indexed value in b is equal to 1, copy them to c"
a,b, and c are all NumPy arrays of the same length containing some data.
I've searched around quite a bit but don't even know what to call this sort of thing.
If I want to add a second condition, for example:
c = a[b == 1 and d == 1]
I get
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I know this happens because that combination of equality operations is ambiguous for reasons explained here, but I am unsure of how to add a.any() or a.all() into that expression in just one line.
EDIT:
For question 2, c = a[(b == 1) & (d == 1)] works. Any input on my first question about how/why this works?

Why wouldn't your example in point (1) work? This is Boolean indexing. If the arrays were different shapes then it may be a different matter, but:
c = a[b == 1]
Is indistinguishable from:
c = a[a == 1]
When you don't know the actual arrays. Nothing specific to a is going on here; a == 1 is just setting up a boolean mask, that you then re-apply to a in a[mask_here]. Doesn't matter what generated the mask.

You just need to put the conditions separately in brackets. Try using this
c = a[(b == 1) & (d == 1)]
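A short sketch of what the mask looks like at each step, assuming NumPy is installed. The array values and the names mask, both and raised are made up for illustration.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
b = np.array([1, 0, 1, 1, 0])
d = np.array([1, 1, 0, 1, 0])

# b == 1 is not a single True/False: it is a boolean array with one
# entry per element of b.
mask = b == 1
c = a[mask]  # keeps the elements of a where mask is True

# Combine two masks element by element with &, not with and.
both = a[(b == 1) & (d == 1)]

# Using and asks for the truth value of a whole array, which NumPy
# refuses to guess.
try:
    a[(b == 1) and (d == 1)]
    raised = False
except ValueError:
    raised = True

print(mask.tolist(), c.tolist(), both.tolist(), raised)
```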

Is “An expression that has one of two values, depending on a condition.” an accurate definition of conditional expression?

In Think Python, 2nd Edition, the author defines conditional expression as "An expression that has one of two values, depending on a condition." But after I had reflected about it, I have thought that the accuracy of the definition may be questionable. Here's a function which is written using a conditional expression:
def get_sign(n):
    """Returns 1 if n is a positive number, -1 if n is a negative number,
    or 0 if n is a zero
    """
    return 1 if n > 0 else -1 if n < 0 else 0
Here the conditional expression is 1 if n > 0 else -1 if n < 0 else 0. And there are two observations about that:
the expression has one of three possible values, namely 1, -1, or 0.
the value depends on two conditions, namely n > 0, and n < 0.
So, is the author's definition accurate, why and why not? Is "An expression whose value depends on one or more conditions, and that has one of several values (at least two)." a more accurate definition of conditional expression, why and why not?

You still have two outcomes. That one of those two outcomes is itself dependent on another conditional expression doesn't change this.
I've added parentheses here to illustrate my point:
1 if n > 0 else (-1 if n < 0 else 0)
So the outcome of that expression is one of these two options:
1
-1 if n < 0 else 0
That second expression is itself another conditional expression. The first value is also just an expression, which has a value once you've evaluated it; the only difference is that it produces a simple literal value. All of this makes no difference to the top-level conditional expression, it still only deals with two outcomes.
Note that only one of the expressions is actually evaluated. This matters if one of those expressions has side effects (alters state outside of the expression) or is 'expensive' in terms of memory or processing time. For example:
import time

def sleep10secs():
    time.sleep(10)
    return 'slow'

print('instant' if True else sleep10secs())
will print instant instantly; the sleep10secs() function is never called.
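The grouping described above can be checked with a quick sketch. The function names get_sign_chained and get_sign_grouped are made up for illustration.

```python
def get_sign_chained(n):
    # The expression as written in the question.
    return 1 if n > 0 else -1 if n < 0 else 0


def get_sign_grouped(n):
    # The same expression with the nested conditional made explicit:
    # two outcomes, the second of which is itself a conditional.
    return 1 if n > 0 else (-1 if n < 0 else 0)


results = [(get_sign_chained(n), get_sign_grouped(n)) for n in (-3, 0, 5)]
print(results)
```

The two spellings agree for every input, because a chained conditional expression groups from the right.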

Python 3 order of testing undetermined

string = 'a'
p = 0
while (p < len(string)) & (string[p] != 'c'):
    p += 1
print('the end but the process already died ')
while (p <1) & (string[p]!='c') :
IndexError: string index out of range
I want to test a condition up to the end of a string (in this example the string has length 1).
Why are both parts of the and executed if the first condition is already false?
Once p < len(string) is false, the second part does not even need to be evaluated.
If it is evaluated anyway, a lot of performance can be lost.

You're not using the proper boolean and. Use it and you won't see this problem. What you're using (&) is the bitwise AND operator, which always evaluates both sides.

Bitwise AND, "a & b", should be thought of as
def _bitwise_and(A, B):
    # Conceptual model only: A and B stand for Python expressions
    # which result in lists of 1's and 0's
    a = A.evaluate()
    b = B.evaluate()
    return [1 if abit == 1 and bbit == 1 else 0 for abit, bbit in zip(a, b)]
so, graphically,
    a:   ... 0 1 1 0
    b:   ... 1 0 1 0
         -----------
    a&b: ... 0 0 1 0   <- each bit is 1 if-and-only-if the
                          corresponding input bits are both 1
and the result is a list of bits, packed into an integer.
Logical AND, "a and b", should instead be thought of as
def _and(A, B):
    # Conceptual model only: A and B stand for Python expressions
    # which result in values having truthiness
    a = A.evaluate()
    if is_truthy(a):
        b = B.evaluate()
        return b
    else:
        return a
Notice: if the result of A is falsy, B never gets evaluated - so if expression B has an error when evaluated, bitwise AND will result in an error while logical AND will not.
This is the basis for the common Python idiom,
while (offset in data) and test(data[offset]):
    do_something_to(data[offset])
    offset += 1  # move on to the next offset
... because data[offset] is only evaluated if offset is a usable (non-error-producing) value.
By using '&' instead of 'and', you guarantee an error by evaluating data[last_offset+1] at the end of your loop.
Of course, this could have been avoided with another common idiom:
for ch in string:
    if ch == 'c':
        do_something_to(ch)
which avoids IndexError problems altogether.

You need to use the boolean operators and and or rather than the bitwise operators & and |.
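A runnable sketch of the difference, using only the standard library. The helper names find_first and record are made up for illustration.

```python
def find_first(string, target='c'):
    # and short-circuits: string[p] is only evaluated while p is a
    # valid index, so no IndexError is raised at the end of the string.
    p = 0
    while p < len(string) and string[p] != target:
        p += 1
    return p


calls = []


def record(name, value):
    # Remember which operand was evaluated, then hand back its value.
    calls.append(name)
    return value


_ = record('left', False) and record('right', True)
and_calls = list(calls)

calls.clear()
_ = record('left', False) & record('right', True)
amp_calls = list(calls)

print(find_first('a'), find_first('abc'))
print(and_calls, amp_calls)
```

With and, the right-hand operand is skipped once the left one is falsy; with &, both operands are always evaluated before the operator is applied.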
