Python variable allocation and the `id` function [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Python “is” operator behaves unexpectedly with integers
Why (0-6) is -6 = False?
So, while playing a bit with id (Python 2.6.5), I noticed the following (shell session):
>>> a = 1
>>> id(a)
140524904
>>> b = 1
>>> id(b)
140524904
Of course, as soon as I modify one of the variables, it gets assigned to a new memory address, e.g.:
>>> b += 1
>>> id(b)
140524892
Is it normal behavior to initially bind two variables with identical values to the same memory location, or is it just an optimization in, e.g., CPython?
P.S. I spent a little time browsing the parser code but couldn't find where and how variables are allocated.

As mentioned by glglgl, this is an implementation detail of CPython. If you look at Objects/longobject.c in the source code for CPython (e.g. version 3.3.0), you'll find the answer to what's happening:
#if NSMALLNEGINTS + NSMALLPOSINTS > 0
/* Small integers are preallocated in this array so that they
can be shared.
The integers that are preallocated are those in the range
-NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
This explains why, after a = 1; b = 1, a is b will be True, even after a += 2; b += 2; a -= 2; b -= 2. Whenever a computed number has a value that fits in this array, the resulting object is simply taken from the array instead of being newly created, saving a bit of memory.
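To see the cache from Python itself, here is a minimal sketch for CPython. It builds the integers with int() on a string so the values are created at run time rather than folded into one constant. Whether 257 yields two distinct objects is an implementation detail, so the identity checks are guarded by platform.python_implementation().

```python
import platform

# int() on a string builds each value at run time, so the compiler cannot
# merge them into one shared constant.
small_a, small_b = int("256"), int("256")
big_a, big_b = int("257"), int("257")

# Equality never depends on caching.
print(small_a == small_b, big_a == big_b)  # True True

if platform.python_implementation() == "CPython":
    print(small_a is small_b)  # True: 256 comes from the small_ints array
    print(big_a is big_b)      # False on standard builds: two separate objects
```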
You can figure out the bounds of this small_ints array using a function like this:
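def binary_search(predicate, lo, hi):
    # Assumes predicate(lo) is True and predicate(hi) is False; returns the
    # largest value in [lo, hi) for which predicate holds
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if predicate(mid):
            lo = mid
        else:
            hi = mid
    return lo
def is_small_int(n):
    # Compute n twice at run time; the two results are the same object
    # only if CPython handed both out from the small_ints array
    p = n + 1
    q = n + 1
    return (p - 1) is (q - 1)
def min_neg_small_int():
    p, q = -1, -1
    if p is not q:
        return 0
    # Double until the two values stop being shared
    while p is q:
        p += p
        q += q
    # Search on the magnitude so that lo < hi holds inside binary_search
    return -binary_search(lambda m: is_small_int(-m), -p // 2, -p)
def max_pos_small_int():
    p, q = 1, 1
    if p is not q:
        return 0
    while p is q:
        p += p
        q += q
    return binary_search(is_small_int, p // 2, p)
def small_int_bounds():
    return (min_neg_small_int(), max_pos_small_int())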
For my build (Python 2.7, 64-bit Windows build), small_int_bounds() == (-5, 256). This means that numbers between -5 and 256 (inclusive) are shared through the small_ints array in Objects/longobject.c.
Edit: I see elssar noted that there is a similar answer about the interning of some literals. This fact is also mentioned in the documentation for PyInt_FromLong, as noted in that answer.

In Python, all variables are references to objects, even numbers.
Numbers are immutable objects, so CPython doesn't need to create a new object for the same value.
This does not mean that CPython will always reuse the same objects.
In your first example, the variables a and b point to the same object.
When you do b += 1, you "create" a new object, 2, and bind b to it.
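A short sketch of the rebinding described above: since ints are immutable, b += 1 cannot change the object a refers to; it can only bind b to a different one.

```python
a = 1
b = a          # b is bound to the same int object as a
print(a is b)  # True

b += 1         # ints are immutable, so this binds b to a new object, 2
print(a, b)    # 1 2
print(a is b)  # False: a still refers to the original 1
```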

Here the term "variables" needs to be made precise: on one hand there are objects, and on the other there are names that are bound to objects.
If you do a = b = 1, both a and b are bound to the same object representing 1.
If you do a = 1; b = 1, I think it is a CPython detail that it is the same object. In general, an implementation could choose to have two objects both representing 1 and use one for each name. But as that would waste memory, it is generally not done.
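The difference between a = b = ... and two separate assignments is easiest to see with a mutable object, where the language does guarantee a new object for each [] display. A minimal sketch:

```python
a = b = []   # one list object, bound to two names
c = []
d = []       # each [] creates a brand-new list

a.append(1)
print(b)       # [1]: a and b are the same object
print(c is d)  # False: equal contents, distinct objects
print(c == d)  # True
```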

a and b both refer to the same object in memory (1), with the ID 140524904. Once you do b += 1 you have 2, which is located elsewhere.

Related

Dividing two numbers and printing the result and adding one to the result if there is a remainder, without using if-statements or imports or function? [duplicate]

This question already has answers here:
Is there a ceiling equivalent of // operator in Python?
Closed 10 days ago.
I'm trying to divide a number into groups in plain Python without imports or if-statements, where a remainder forms one extra group: 200 / 99 should give 3 and 7 / 3 should give 3, but 8 / 4 should still give just 2, and 4 / 2 should give 2, etc. I cannot import anything, so it needs to be plain Python.
I tried storing the user's input numbers in variables, dividing them, and then adding one. I also tried // and adding 1, but I cannot get it to work.
How about this:
a, b = 200, 99
result, remainder = a // b + int(bool(a % b)), a % b
This computes the result by performing an integer divide a // b and computing the remainder a % b. Converting an integer value to a boolean is False for 0 and True for any other value, and converting that back to an integer gives you the value you want to add to the result. The remainder is computed again to assign it, if you need it.
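A quick check of this expression against the examples from the question. The helper name groups is only for illustration (inline the expression if functions are off limits), and it assumes positive integers:

```python
def groups(a, b):
    # a // b counts the full groups; bool(a % b) is True (i.e. 1) when a
    # remainder is left over, and that remainder forms one extra group
    return a // b + int(bool(a % b))

for a, b in [(200, 99), (7, 3), (8, 4), (4, 2)]:
    print(a, b, groups(a, b))
# 200 99 3
# 7 3 3
# 8 4 2
# 4 2 2
```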
As user @markransom commented, the conversion with int() isn't even necessary, because bool is already a subclass of int:
>>> isinstance(True, int)
True
So, this works (although it may be considered a bit less readable):
a, b = 200, 99
result, remainder = a // b + bool(a % b), a % b
If you're using Python 3.8 or later and really want it to be short, this also works:
result = a // b + bool(remainder := a % b)
This uses the walrus operator to assign the remainder when it is first computed, avoiding having to compute it twice as well.
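For comparison, a common idiom for ceiling division that is not part of this answer is -(-a // b): floor division rounds toward negative infinity, so negating before and after rounds up instead. A sketch checking that it agrees with the bool-based formula for non-negative dividends and positive divisors:

```python
a, b = 200, 99
# -a // b floors toward negative infinity; negating the result rounds up.
print(-(-a // b))  # 3

# Agrees with the bool-based formula for n >= 0 and d >= 1.
assert all(-(-n // d) == n // d + bool(n % d)
           for n in range(200) for d in range(1, 50))
```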
Python boolean operators short-circuit and return the last value evaluated. You can use that to convert a non-zero remainder to 1 and add it to the quotient:
def group_me(dividend, divisor):
    quotient, remainder = divmod(dividend, divisor)
    return quotient + (remainder and 1 or 0)
print(group_me(200, 99))
print(group_me(7, 3))
print(group_me(8, 4))
Output
3
3
2
If remainder is non-zero, remainder and 1 returns 1, and 1 or 0 short-circuits to 1. Otherwise, remainder and 1 short-circuits and returns 0, so the expression becomes 0 or 0, which returns its last value, 0.
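A few lines showing the operand-returning behavior of and/or that this trick relies on:

```python
# `x and y` returns x if x is falsy, otherwise y.
# `x or y` returns x if x is truthy, otherwise y.
print(4 and 1)       # 1
print(0 and 1)       # 0
print(4 and 1 or 0)  # 1: (4 and 1) is 1, and 1 or 0 is 1
print(0 and 1 or 0)  # 0: (0 and 1) is 0, and 0 or 0 is the last value, 0

quotient, remainder = divmod(7, 3)         # (2, 1)
print(quotient + (remainder and 1 or 0))  # 3
```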
You could do an if statement mathematically without using if itself:
n, d = 200, 99
x, y = n/d, n//d + 1
result = int((x%1 == 0) * x + (x%1 != 0) * y)
Essentially it computes condition * answer1 + (1 - condition) * answer2, so the result switches to whichever answer applies depending on whether the condition is 1 or 0.
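The version above goes through float division (n/d), which can lose precision once the integers exceed what a double represents exactly. A sketch of the same condition * answer1 + (1 - condition) * answer2 selection using only integer operations:

```python
n, d = 200, 99
# bool is a subclass of int, so the comparison acts as 0 or 1.
divides = n % d == 0
result = divides * (n // d) + (1 - divides) * (n // d + 1)
print(result)  # 3

# Stays exact for integers far beyond float precision:
big_n, big_d = 10**30 + 1, 1
big_divides = big_n % big_d == 0
print(big_divides * (big_n // big_d) + (1 - big_divides) * (big_n // big_d + 1) == big_n)  # True
```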

Julia code not finishing while Python code does

I am very new to Julia and was trying to rewrite some of my previous Python code as practice. I was using Project Euler problem 25 for this.
In Python I have
def fibonacci(N):
    """Returns the Nth Fibonacci Number"""
    F = [0, 1]
    i = 0
    while i <= N-2:
        F_new = F[i] + F[i+1]
        F.append(F_new)
        i += 1
    return F[N]
N = 0
x = 1000
while len(str(fibonacci(N))) <= x:
    if len(str(fibonacci(N))) == x:
        print(N)
        break
    N = N + 1
This runs and gives me the correct answer in about 6.5 seconds. When I try to do the same in Julia:
function fib(N)
    F = [0, 1]
    global i = 1
    while i <= N-2
        F_new = F[i] + F[i+1]
        append!(F, F_new)
        global i += 1
    end
    return F[N]
end
N = 1
x = 1000
while length(string(fib(N))) <= x
    if length(string(fib(N))) == x
        print(N-1)
        break
    end
    global N += 1
end
The code seems to run "forever": it only finishes and produces the correct answer when x <= 20. When x > 20, the program never ends.
I'm not sure what could go wrong if it works for all values below 21. Could somebody explain where the error happens and why?
Python integers are unbounded by default and grow as needed. Julia, on the other hand, defaults to a signed 64-bit integer on a 64-bit system (see the docs). This overflows when calculating values longer than about 19 digits, which is why the problem starts around x = 20. To get the same behavior in Julia, use the BigInt type for any values or arguments that can exceed this size.
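To illustrate in Python what happens to the Julia loop, the sketch below emulates Int64 overflow. It assumes Julia's default wraparound (modular) semantics for Int64 arithmetic, which is what fib produces when F holds Int64 values. Because a wrapped value never has more than 20 characters (19 digits plus a minus sign), length(string(fib(N))) <= x can never become false for x > 20, so the loop never exits:

```python
INT64_MAX = 2**63 - 1

def wrap_int64(v):
    """Map v into the signed 64-bit range, as Int64 arithmetic wraps on overflow."""
    return (v + 2**63) % 2**64 - 2**63

# First index n (with fib(0) == 0) whose exact value no longer fits in Int64.
a, b, n = 0, 1, 0
while a <= INT64_MAX:
    a, b = b, a + b
    n += 1
print(n)  # 93

# With wrapped arithmetic the string length is capped at 20 characters
# (19 digits plus a minus sign), so it never reaches an x of 21 or more.
a, b = 0, 1
longest = 0
for _ in range(1000):
    longest = max(longest, len(str(a)))
    a, b = b, wrap_int64(a + b)
print(longest <= 20)  # True
```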
The main problem with your code is what @duckboycool has described. My second piece of advice is to always put your code in functions in Julia; the Julia performance tips page is a good place to start.
Note that you can make the function by @Bill 2x faster by removing the unnecessary if, like this:
function test(x = 1000)
    N = 0
    while ndigits(fib(N)) < x
        N += 1
    end
    return N
end
But if you really want a 16000X faster Julia version, then you can do this:
function euler25()
    limit = big(10)^999
    a, b = big(1), big(1)
    N = 2
    while b <= limit
        a, b = b, a + b
        N += 1
    end
    return N
end
@btime euler25()  # returns 4782
377.700 μs (9573 allocations: 1.15 MiB)
This runs in 377 μs, because we avoid calculating fib(N) at every step from the beginning. And instead of comparing with the length of a string of the output at each iteration, we just compare with 10^999.
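The same idea carries over to Python, which already uses arbitrary-precision ints. A sketch that mirrors euler25 above, with the digit count turned into a parameter (my addition, assuming digits >= 2):

```python
def euler25(digits=1000):
    """Index of the first Fibonacci number with `digits` digits (assumes digits >= 2)."""
    limit = 10 ** (digits - 1)  # smallest number with `digits` digits
    a, b = 1, 1  # F(1), F(2)
    n = 2        # index of b
    while b < limit:
        a, b = b, a + b
        n += 1
    return n

print(euler25())  # 4782
```

Like the Julia version, this walks the sequence once and compares against a power of ten instead of recomputing fib(N) from scratch and converting it to a string on every iteration.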
In addition to the earlier answer, note that you should avoid globals if looking at performance, so this is much faster than your global i and x code:
function fib(N)
    F = [big"0", big"1"]
    for i in 1:N-2
        F_new = F[i] + F[i+1]
        push!(F, F_new)
    end
    return F[N]
end
function test(x = 1000)
    N = 1
    while length(string(fib(N))) <= x
        if length(string(fib(N))) == x
            print(N-1)
            break
        end
        N += 1
    end
end
test()
@AboAmmar shows probably the best "normal" way of writing this. But if you want something even more optimized, you can use in-place BigInt operations. I'm not sure whether I would recommend this, but it's nice to be aware of it.
using Base.GMP.MPZ: add!, set!
function euler25_(limit=big(10)^999)
    a, b = big(1), big(1)
    N = 2
    c = big(0)
    while b <= limit
        add!(c, a, b)
        set!(a, b)
        set!(b, c)
        N += 1
    end
    return N
end
This uses the special BigInt functions in the GMP.MPZ library, and writes values in-place, avoiding most of the allocations, and running 2.5x faster on my laptop.

Sum of Two Integers without using "+" operator in python

I need some help understanding Python solutions to LeetCode 371, "Sum of Two Integers". I found that https://discuss.leetcode.com/topic/49900/python-solution/2 is the most-voted Python solution, but I am having trouble understanding it.
How should I understand the use of "% MASK", and why "MASK = 0x100000000"?
How should I understand "~((a % MIN_INT) ^ MAX_INT)"?
When the sum goes beyond MAX_INT, the function yields a negative value (for example, getSum(2147483647, 2) = -2147483647). Isn't that incorrect?
class Solution(object):
    def getSum(self, a, b):
        """
        :type a: int
        :type b: int
        :rtype: int
        """
        MAX_INT = 0x7FFFFFFF
        MIN_INT = 0x80000000
        MASK = 0x100000000
        while b:
            a, b = (a ^ b) % MASK, ((a & b) << 1) % MASK
        return a if a <= MAX_INT else ~((a % MIN_INT) ^ MAX_INT)
Let's disregard the MASK, MAX_INT and MIN_INT for a second.
Why does this black magic bitwise stuff work?
The reason why the calculation works is because (a ^ b) is "summing" the bits of a and b. Recall that bitwise xor is 1 when the bits differ, and 0 when the bits are the same. For example (where D is decimal and B is binary), 20D == 10100B, and 9D == 1001B:
10100
1001
-----
11101
and 11101B == 29D.
But, if you have a case with a carry, it doesn't work so well. For example, consider adding (bitwise xor) 20D and 20D.
10100
10100
-----
00000
Oops. 20 + 20 certainly doesn't equal 0. Enter the (a & b) << 1 term. This term represents the "carry" for each position. On the next iteration of the while loop, we add in the carry from the previous loop. So, if we go with the example we had before, we get:
# First iteration (a is 20, b is 20)
10100 ^ 10100 == 00000 # makes a 0
(10100 & 10100) << 1 == 101000 # makes b 40
# Second iteration:
000000 ^ 101000 == 101000 # Makes a 40
(000000 & 101000) << 1 == 0000000 # Makes b 0
Now b is 0, we are done, so return a. This algorithm works in general, not just for the specific cases I've outlined. Proof of correctness is left to the reader as an exercise ;)
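A minimal Python sketch of the loop without any masking, restricted to non-negative inputs (for negative Python ints the carry can run forever, which is what the masks are for):

```python
def add_bits(a, b):
    """Add two non-negative ints using only xor, and, and shift (no +)."""
    while b:
        carry = (a & b) << 1  # positions that produce a carry, moved one bit left
        a = a ^ b             # per-bit sum, ignoring carries
        b = carry             # add the carries in on the next iteration
    return a

print(add_bits(20, 9))   # 29
print(add_bits(20, 20))  # 40
```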
What do the masks do?
All the masks do is make the arithmetic behave like a 32-bit signed int: the docstring states that a, b, and the return value are of type int, and here int is taken to mean a 32-bit integer. The maximum 32-bit signed int is 2147483647, so if you add 2 to it, as in your example, the value overflows and you get a negative number. You have to force this behavior in Python, because Python's int is arbitrary-precision and doesn't respect the 32-bit boundary of the fixed-width int types in languages like Java and C++. Consider the following:
def get_sum(a, b):
    while b:
        a, b = (a ^ b), (a & b) << 1
    return a
This is the version of getSum without the masks.
print(get_sum(2147483647, 2))
outputs
2147483649
while
print(Solution().getSum(2147483647, 2))
outputs
-2147483647
due to the overflow.
The moral of the story is the implementation is correct if you define the int type to only represent 32 bits.
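This answer doesn't spell out the last line, ~((a % MIN_INT) ^ MAX_INT). It reinterprets the unsigned 32-bit result as a signed value: for 2**31 <= a < 2**32 it equals a - 2**32. A sketch that isolates that step (the helper name to_signed32 is hypothetical):

```python
MAX_INT = 0x7FFFFFFF   # largest 32-bit signed value
MIN_INT = 0x80000000   # 2**31
MASK = 0x100000000     # 2**32; "% MASK" keeps only the low 32 bits

def to_signed32(a):
    """Reinterpret an unsigned 32-bit value (0 <= a < 2**32) as signed."""
    if a <= MAX_INT:
        return a
    # For a >= 2**31: a % MIN_INT == a - 2**31, xor with MAX_INT gives
    # MAX_INT - (a - 2**31), and ~x == -x - 1, so the result is a - 2**32
    return ~((a % MIN_INT) ^ MAX_INT)

for a in (0, 1, MAX_INT, MIN_INT, MASK - 1):
    assert to_signed32(a) == (a - MASK if a > MAX_INT else a)

print(to_signed32((2147483647 + 2) % MASK))  # -2147483647
```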
Here is a solution that works for every combination of operand signs: (-, -), (-, +), (+, -), and (+, +).
Python's int is not limited to 32 bits (it can grow arbitrarily large), so to prevent the carry from propagating forever in an infinite loop, we use a 32-bit mask (0xffffffff) to limit the loop to 32 bits.
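a, b = -1, -1
mask = 0xffffffff
# Stop once no carry bits remain within the low 32 bits
while b & mask:
    carry = a & b
    a = a ^ b
    b = carry << 1
# If the carry ran past bit 31, keep only the low 32 bits of a
print((a & mask) if b > 0 else a)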
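The same loop wrapped in a function (the name get_sum is only for illustration) and spot-checked against Python's + for each sign combination. These checks cover a handful of small inputs only, not a proof for all 32-bit values:

```python
def get_sum(a, b):
    mask = 0xffffffff
    while b & mask:
        carry = a & b
        a = a ^ b
        b = carry << 1
    # A positive b here means the carry ran past bit 31
    return (a & mask) if b > 0 else a

cases = [(-1, -1), (-5, -3), (-1, 1), (-3, 5), (5, -3), (1, -3), (2, 3)]
for a, b in cases:
    print(a, b, get_sum(a, b), a + b)
```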
For me, Matt's solution got stuck in an infinite loop with the input Solution().getSum(-1, 1).
So here is another (much slower) approach based on math:
import math
def getSum(a: int, b: int) -> int:
    return int(math.log2(2**a * 2**b))

Swapping array with the XOR operator doesn’t work in Python

I was trying to code Quicksort in Python (see the full code at the end of the question) and in the partition function I am supposed to swap two elements of an array (call it x). I am using the following code for swapping based on the xor operator:
x[i]^=x[j]
x[j]^=x[i]
x[i]^=x[j]
I know that it should work because of the nilpotence of the xor operator (i.e. x^x=0) and I have done it like a million times in Java and in C without any problem. My question is: why doesn’t it work in Python? It seems that it is not working when x[i] == x[j] (maybe i = j?).
x = [2,4,3,5,2,5,46,2,5,6,2,5]
print(x)
def part(a,b):
    global x
    i = a
    for j in range(a,b):
        if x[j]<=x[b]:
            x[i]^=x[j]#t = x[i]
            x[j]^=x[i]#x[i] = x[j]
            x[i]^=x[j]#x[j]=t
            i = i+1
    r = x[i]
    x[i]=x[b]
    x[b]=r
    return i
def quick(y,z):
    if z-y<=0:
        return
    p = part(y,z)
    quick(y,p-1)
    quick(p+1,z)
quick(0,len(x)-1)
print(x)
As to why it doesn't work, it really shouldn't matter [1], because you shouldn't be using code like that in the first place, especially when Python gives you a perfectly good 'atomic swap' capability:
x[i], x[j] = x[j], x[i]
It's always been my opinion that all programs should be initially optimised for readability first and only have performance or storage improvements imposed if there's a clear need and a clear benefit (neither of which I've ever seen for the XOR trick outside some incredibly small data environments like some embedded systems).
Even in languages that don't provide that nifty feature, it's more readable and probably faster to use a temporary variable:
tmp = x[i]
x[i] = x[j]
x[j] = tmp
[1] However, if you really want to know why it's not working, it's because that trick is okay for swapping two distinct variables, but not so well when you use it with the same variable, which is what you'll be doing when you try to swap x[i] with x[j] when i is equal to j.
It's functionally equivalent to the following, with print statements added so you can see where the whole thing falls apart:
>>> a = 42
>>> a ^= a ; print(a)
0
>>> a ^= a ; print(a)
0
>>> a ^= a ; print(a)
0
Contrast that with two distinct variables, which works okay:
>>> a = 314159; b = 271828; print(a,b)
314159 271828
>>> a ^= b; print(a,b)
61179 271828
>>> b ^= a; print(a,b)
61179 314159
>>> a ^= b; print(a,b)
271828 314159
The problem is that the trick works by transferring information between the two variables gradually (similar to the fox/goose/beans puzzle). When it's the same variable, the first step doesn't so much transfer information as it does destroy it.
Both Python's 'atomic swap' and use of a temporary variable will avoid this problem completely.
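Applied to the question's code, here is a minimal sketch of the quicksort with the xor swap replaced by tuple assignment. It also passes the list as a parameter instead of using the global x (an assumption about how you'd like to structure it). The last lines reproduce the failure: running the xor sequence on a single slot, which is what happens when i == j, zeroes it.

```python
def partition(x, lo, hi):
    """Lomuto partition of x[lo..hi] around the pivot x[hi]; returns the pivot's index."""
    i = lo
    for j in range(lo, hi):
        if x[j] <= x[hi]:
            x[i], x[j] = x[j], x[i]  # tuple swap is safe even when i == j
            i += 1
    x[i], x[hi] = x[hi], x[i]
    return i

def quicksort(x, lo=0, hi=None):
    if hi is None:
        hi = len(x) - 1
    if hi - lo <= 0:
        return
    p = partition(x, lo, hi)
    quicksort(x, lo, p - 1)
    quicksort(x, p + 1, hi)

data = [2, 4, 3, 5, 2, 5, 46, 2, 5, 6, 2, 5]
quicksort(data)
print(data)  # [2, 2, 2, 2, 3, 4, 5, 5, 5, 5, 6, 46]

# The xor trick applied to one slot (i == j) destroys the value:
y = [42]
y[0] ^= y[0]
y[0] ^= y[0]
y[0] ^= y[0]
print(y)  # [0]
```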
Reviewing this, note that you can also express xor with the following expression:
a ^ b == (a | b) - (a & b)
So if you replace b with a, you get (a | a) - (a & a) == a - a, which is zero.
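A quick check of that identity, including negative numbers (Python ints behave like infinite two's complement for bitwise operations), and of the a == b case:

```python
# a ^ b == (a | b) - (a & b): bits set in either operand, minus bits set in both.
for a in range(-64, 65):
    for b in range(-64, 65):
        assert a ^ b == (a | b) - (a & b)

a = 42
print((a | a) - (a & a))  # 0, the same as a ^ a: the value is wiped out
```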

What does *= mean in python? [duplicate]

This question already has answers here:
What is this operator *= -1
Closed 9 years ago.
For example in this code:
def product(list):
    p = 1
    for i in list:
        p *= i
    return p
I found this code, but I need to be able to explain each and every part of it.
It's shorthand for
p = p * i
It's analogous to the more frequently encountered p += i
Taken from the first result in google:
Multiply AND assignment operator, It multiplies right operand with the left operand and assign the result to left operand
*= is the same as saying p = p * i.
This link contains a list of all the operators in their various, wonderful combinations.
Example
A pseudo-code explanation of your code is as follows:
assume list := {4, 6, 5, 3, 5, 1}
p := 1.
for(each number in list)
p = the current value of p * the current number.
// So: 1 * 4, 4 * 6, 24 * 5, 120 * 3...
return p.
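The same walkthrough as runnable Python, printing the running product after each step. The parameter is renamed from list to numbers here so it doesn't shadow the built-in list:

```python
def product(numbers):
    p = 1
    for i in numbers:
        p *= i       # same as p = p * i for ints
        print(i, p)  # current number and running product: 4 4, 6 24, 5 120, ...
    return p

print(product([4, 6, 5, 3, 5, 1]))  # 1800
```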
Usually p *= i is the same as p = p * i.
Sometimes it can be different, and I think the explanations already posted aren't clear enough for that, so:
It can be different when p is a mutable object. In that case the in-place *= may modify the original object instead of creating a new one. Compare what happens to q in each of these:
>>> p = q = [2]
>>> p *= 5
>>> p
[2, 2, 2, 2, 2]
>>> q
[2, 2, 2, 2, 2]
>>> p = q = [2]
>>> p = p * 5
>>> p
[2, 2, 2, 2, 2]
>>> q
[2]
It can also be different when p is a complex expression with side effects, because the in-place version evaluates its sub-expressions only once. For example:
aList[randint(0, 5)] *= 3
is not the same as:
aList[randint(0, 5)] = aList[randint(0, 5)] * 3
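A sketch that makes the single evaluation visible, using a hypothetical index() helper that counts its calls (it always returns 0 so both versions touch the same element):

```python
calls = 0

def index():
    """Hypothetical helper: counts its calls and always returns index 0."""
    global calls
    calls += 1
    return 0

values = [2]
values[index()] *= 3
augmented_calls = calls
print(values, augmented_calls)  # [6] 1

calls = 0
values = [2]
values[index()] = values[index()] * 3
expanded_calls = calls
print(values, expanded_calls)   # [6] 2
```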
It's not exactly the same as p = p * i:
>>> class A(int):
...     def __imul__(self, item):
...         print('__imul__ is running!')
...     def __mul__(self, item):
...         print('__mul__ is running!')
...
>>> mynumber = A(10)
>>> mynumber *= 5
__imul__ is running!
>>> mynumber = A(10)
>>> mynumber * 5
__mul__ is running!
However, the result is usually the same, so you can generally treat them as equivalent.
The idea behind the operator a *= b is to mean the same as a = a * b. In most cases (as in yours) it will do exactly this, so it multiplies a variable with a value and stores the result in the variable again.
The notation using *= might be faster (depending on the classes involved), and it is, in any case, the clearer version, so it should be favored. Its main advantage shows if the variable a is itself already a complex expression like myGiantValueField[f(42)].getValue()[3]:
myGiantValueField[f(42)].getValue()[3] = myGiantValueField[f(42)].getValue()[3] * 3
is certainly less readable and, because the code is duplicated, more prone to errors when it is later changed than
myGiantValueField[f(42)].getValue()[3] *= 3
In general, though, a *= b calls the method __imul__() of the object a, passing b as the argument, and rebinds a to the result; it is roughly a = a.__imul__(b) (which isn't as intuitive). If a's type doesn't define __imul__, Python falls back to a = a * b.
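A sketch of both halves of that rule, with two hypothetical classes: one that defines only __mul__ (so *= falls back to *), and one whose __imul__ mutates the object and returns it:

```python
class OnlyMul:
    """Defines __mul__ but not __imul__."""
    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        return OnlyMul(self.value * other)


class InPlace:
    """Defines __imul__, which mutates self and returns it."""
    def __init__(self, value):
        self.value = value

    def __imul__(self, other):
        self.value *= other
        return self  # the returned object is what the name gets rebound to


x = OnlyMul(10)
x_before = x
x *= 5                             # no __imul__, so Python falls back to x = x * 5
print(x.value, x is x_before)      # 50 False

y = InPlace(10)
y_before = y
y *= 5                             # y = y.__imul__(5)
print(y.value, y is y_before)      # 50 True
```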
Why could there be a difference between a = a * b and a *= b? Three reasons come to mind at once, but there might be more.
A class could implement only the *= operator (so a * b could be undefined although a *= b exists) for performance reasons. Multiplying a very large value (e.g. a giant matrix) by a number is sometimes better done in place, to avoid allocating memory for a result that would immediately be assigned back to the original variable. Internally, a = a * b is something like tmp = a * b followed by a = tmp.
A class might find direct multiplication unintuitive, so it could leave the * operator unimplemented. An example might be a class representing the volume of a computer speaker. Doubling the volume might make sense (volumeKnob *= 2), whereas computing the doubled value without assigning it does not (x = volumeKnob * 2? ← makes no sense, as it does nothing).
If the type of the result of the multiplication differs from the type of a, I would not expect or recommend implementing the *= operator, as it would be misleading. An example might be vectors a and b whose product is a matrix (or a number). The i in __imul__ stands for in-place, which suggests that the object keeps its type and can be updated again and again, i.e. more than once. If a's type changed, this would be problematic.
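A sketch of the second reason, using a hypothetical VolumeKnob class that defines only __imul__; the plain * operator then raises TypeError:

```python
class VolumeKnob:
    """Hypothetical class that supports only in-place multiplication."""
    def __init__(self, level):
        self.level = level

    def __imul__(self, factor):
        self.level *= factor
        return self


knob = VolumeKnob(3)
knob *= 2
print(knob.level)  # 6

raised = False
try:
    knob * 2  # no __mul__ is defined, so plain * is unsupported
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```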
