Python sum() has a different result after importing numpy - python

I came across this problem by Jake VanderPlas and I am not sure if my understanding of why the result differs after importing the numpy module is entirely correct.
>>> print(sum(range(5), -1))
9
>>> from numpy import *
>>> print(sum(range(5), -1))
10
It seems like in the first scenario the sum function calculates the sum over the iterable and then subtracts the second args value from the sum.
In the second scenario, after importing numpy, the behavior of the function seems to have modified as the second arg is used to specify the axis along which the sum should be performed.
Exercise number (24)
Source - http://www.labri.fr/perso/nrougier/teaching/numpy.100/index.html

"the behavior of the function seems to have modified as the second arg is used to specify the axis along which the sum should be performed."
You have basically answered your own question!
It is not technically correct to say that the behavior of the function has been modified. from numpy import * results in "shadowing" the builtin sum function with the numpy sum function, so when you use the name sum, Python finds the numpy version instead of the builtin version (see @godaygo's answer for more details). These are different functions, with different arguments. It is generally a bad idea to use from somelib import *, for exactly this reason. Instead, use import numpy as np, and then use np.sum when you want the numpy function, and plain sum when you want the Python builtin function.
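To make that recommendation concrete, here is a minimal sketch (assuming NumPy is installed; the variable names are only for illustration) that keeps both functions reachable and shows how each one interprets the second argument:

```python
import numpy as np  # explicit namespace: nothing in builtins gets shadowed

# Builtin sum(iterable, start): the second argument is the starting value.
builtin_result = sum(range(5), -1)    # -1 + 0 + 1 + 2 + 3 + 4

# numpy.sum(a, axis): the second positional argument is the axis.
numpy_result = np.sum(range(5), -1)   # sums along the last (and only) axis

print(builtin_result)  # 9
print(numpy_result)    # 10
```

Writing the second call as np.sum(range(5), axis=-1) would make the intent even harder to misread.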

Just to add my five pedantic cents to @Warren Weckesser's answer: from numpy import * does not overwrite the builtin sum function, it only shadows __builtins__.sum, because a from ... import * statement binds all public names defined in the imported module (the names listed in its __all__ or, if that is not defined, all names not beginning with an underscore) in your current global namespace. And according to Python's name resolution rule (unofficially, the LEGB rule), the global namespace is searched before the builtins namespace. So as soon as Python finds the desired name, in your case sum, it returns the bound object and does not look any further.
EDIT:
To show you what is going on:
In[1]: print(sum, ' from ', sum.__module__) # here you see the standard `sum` function
Out[1]: <built-in function sum> from builtins
In[2]: from numpy import * # from here it is shadowed
print(sum, ' from ', sum.__module__)
Out[2]: <function sum at 0x00000229B30E2730> from numpy.core.fromnumeric
In[3]: del sum # here you restore things back
print(sum, ' from ', sum.__module__)
Out[3]: <built-in function sum> from builtins
First note: del does not delete objects; that is the garbage collector's job. It only removes the name binding from the current namespace, so the next lookup of sum falls through to the builtins again.
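The same shadow-and-restore cycle can be reproduced without NumPy. This is only a sketch, run as top-level module code; the module-level sum defined here is a hypothetical stand-in for whatever a star import would bind:

```python
import builtins

def sum(iterable, start=0):
    """Hypothetical stand-in for an imported function; it shadows the builtin."""
    return "shadowed"

shadowed_result = sum([1, 2, 3])               # the global name is found first
builtin_still_alive = builtins.sum([1, 2, 3])  # the builtin itself is untouched

del sum                                        # removes only the global name
restored_result = sum([1, 2, 3])               # lookup falls through to builtins

print(shadowed_result, builtin_still_alive, restored_result)
```

The builtin object was never modified or deleted at any point; only the order in which namespaces are searched decided which function the name sum referred to.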
Second note: the signature of built-in sum function is sum(iterable[, start]):
Sums start and the items of an iterable from left to right and returns the total. start defaults to 0. The iterable's items are normally numbers, and the start value is not allowed to be a string.
In your case, print(sum(range(5), -1)), the built-in sum starts the summation with -1. So technically, your phrase "the sum over the iterable and then subtracts the second args value from the sum" isn't correct. For numbers it really does not matter whether you start with the value or add it afterwards. But for lists it does (a silly example only to show the idea):
In[1]: sum([[1], [2], [3]], [4])
Out[1]: [4, 1, 2, 3] # not [1, 2, 3, 4]
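One way to picture the "starts with start" behavior is a left fold. The helper below, sum_like, is a hypothetical pure-Python model written for this answer, not the actual implementation of the builtin:

```python
from functools import reduce
import operator

def sum_like(iterable, start=0):
    """Hypothetical model of the builtin: fold left, beginning at start."""
    return reduce(operator.add, iterable, start)

numbers = sum_like(range(5), -1)        # ((((-1 + 0) + 1) + 2) + 3) + 4
lists = sum_like([[1], [2], [3]], [4])  # [4] + [1] + [2] + [3]

print(numbers)  # 9
print(lists)    # [4, 1, 2, 3]
```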
Hope this will clarify your thoughts :)

Related

Python function that acts on provided array

Some NumPy functions (e.g. argmax or cumsum) can take an array as an optional out parameter and store the result in that array. Please excuse my less than perfect grasp of the terminology here (which is what prevents me from googling for an answer), but it seems that these functions somehow act on variables that are beyond their scope.
How would I transform this simple function so that it can take an out parameter as the functions mentioned?
import numpy as np
def add_two(a):
    return a + 2
a = np.arange(5)
a = add_two(a)
From my understanding, a rewritten version of add_two() would allow for the last line above to be replaced with
add_two(a, out=a)
In my opinion, the best and most explicit option is to do as you're currently doing. Python passes object references by value, so rebinding a parameter inside a function has no effect on the caller; a function can only change the caller's data by modifying a mutable object in place.
One way would be to do:
import numpy as np
def add_two(a, out):
    out[:] = a + 2
a = np.arange(5)
add_two(a, out=a)
a
Output:
array([2, 3, 4, 5, 6])
NB. Unlike your current solution, this requires that the object passed as the out parameter already exists and is an array.
The naive solution would be to fill in the buffer of the output array with the result of your computation:
def add_two(a, out=None):
    result = a + 2
    if out is None:
        out = result
    else:
        out[:] = result
    return out
The problem (if you could call it that), is that you are still generating the intermediate array, and effectively bypassing the benefits of pre-allocating the result in the first place. A more nuanced approach would be to use the out parameters of the functions in your numpy pipeline:
def add_two(a, out=None):
    return np.add(a, 2, out=out)
Unfortunately, as with general vectorization, this can only be done on a case-by-case basis depending on what the desired set of operations is.
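As an illustration of the case-by-case point, here is a sketch of a two-step pipeline that threads the same buffer through every ufunc call. The function add_two_then_triple is a hypothetical example, not something from the question:

```python
import numpy as np

def add_two_then_triple(a, out=None):
    """Hypothetical pipeline computing (a + 2) * 3 without temporaries."""
    out = np.add(a, 2, out=out)    # allocates a new array only when out is None
    np.multiply(out, 3, out=out)   # second step reuses the same buffer
    return out

a = np.arange(5)
fresh = add_two_then_triple(a)            # new array, a is left untouched
returned = add_two_then_triple(a, out=a)  # result written into a itself

print(fresh)          # [ 6  9 12 15 18]
print(returned is a)  # True
```

Each operation in the chain has to accept an out argument for this to work, which is why it cannot be applied mechanically to arbitrary expressions.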
As an aside, this has nothing to do with scope. Python objects are specifically available to all namespaces (though their names might not be). If a mutable argument is modified in a function, the changes will always be visible outside the function. See for example "Least Astonishment" and the Mutable Default Argument.
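A small sketch with plain lists shows the difference between mutating an argument and rebinding the parameter name (the function names mutate and rebind are made up for this example):

```python
def mutate(items):
    items.append(99)        # changes the object the caller also holds

def rebind(items):
    items = items + [99]    # builds a new list and rebinds the local name only

data = [1, 2, 3]

rebind(data)
after_rebind = list(data)   # snapshot: the caller's list is unchanged

mutate(data)
after_mutate = list(data)   # snapshot: the caller sees the appended item

print(after_rebind)  # [1, 2, 3]
print(after_mutate)  # [1, 2, 3, 99]
```

An out parameter works for the same reason mutate does: the function writes into the object it was handed instead of creating a new one.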

What is the simplest python equivalent to R `:` operator to create a sequence of numbers outside indexing

Is there a simple Python equivalent to R's : operator to create a vector of numbers? I only found range().
Example:
vector_example <- 1:4
vector_example
Output:
[1] 1 2 3 4
You mention range(). That's the standard answer for Python's equivalent. It returns a sequence. If you want the equivalent in a Python list, just create a list from the sequence returned by range():
range_list = list(range(1,5))
Result:
[1, 2, 3, 4]
I don't know R, but from your example, it appears that its : operator's second argument is inclusive, that is, that number is included in the resulting sequence. This is not true of Python's range() function. The second parameter passed to it is not included in the resulting sequence. So where you use 4 in your example, you want to use 5 with Python to get the same result.
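If the off-by-one adjustment gets tedious, it can be hidden in a small helper. The function r_colon below is a hypothetical name, and the sketch assumes integer endpoints only; it also counts downwards when the first value is larger, which is how R's 4:1 behaves:

```python
def r_colon(first, last):
    """Hypothetical helper: inclusive integer sequence, like R's first:last."""
    step = 1 if last >= first else -1
    return list(range(first, last + step, step))

print(r_colon(1, 4))  # [1, 2, 3, 4]
print(r_colon(4, 1))  # [4, 3, 2, 1]
```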
I remember being frustrated by the lack of : to create sequences of consecutive numbers when I first switched from R to Python. In general, there is no direct equivalent to the : operator. Python sequences are more like R's seq() function.
While the base function range is alright, I personally prefer numpy.arange, as it is more flexible.
import numpy as np
# Create a simple array from 1 through 4
np.arange(1, 5)
# This is what I mean by "more flexible"
np.arange(1, 5).tolist()
Remember that Python lists and arrays are 0-indexed. As far as I know, all intervals are right-open too, so np.arange(a, b) will exclude b.
PS: There are other functions, such as numpy.linspace which may suit your needs.
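For comparison, a quick sketch of the two functions side by side (assuming NumPy is installed): arange takes a half-open interval, while linspace takes the number of points and includes both endpoints by default.

```python
import numpy as np

half_open = np.arange(1, 5)          # stop value 5 is excluded
closed = np.linspace(1, 4, num=4)    # both endpoints included, float result

print(half_open.tolist())  # [1, 2, 3, 4]
print(closed.tolist())     # [1.0, 2.0, 3.0, 4.0]
```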

How to perform a range on a Theano's TensorVariable?

How to perform a range on a Theano's TensorVariable?
Example:
import theano.tensor as T
from theano import function
constant = T.dscalar('constant')
n_iters = T.dscalar('n_iters')
start = T.dscalar('start')
result = start
for iter in range(n_iters):
    result = start + constant
f = function([start, constant, n_iters], result)
print('f(0,2,5): {0}'.format(f(1,2)))
returns the error:
Traceback (most recent call last):
  File "test_theano.py", line 9, in <module>
    for iter in range(n_iters):
TypeError: range() integer end argument expected, got TensorVariable.
What is the correct way to use a range on a Theano's TensorVariable?
It's not clear what this code is attempting to do because even if the loop worked, it wouldn't compute anything useful: result will always equal the sum of start and constant irrespective of the number of iterations.
I'll assume that the intention was to compute result like this:
for iter in range(n_iters):
    result = result + constant
The problem is that you're mixing symbolic, delayed execution, Theano operations with non-symbolic, immediately executed, Python operations.
range is a Python function that expects a Python integer parameter but you're providing a Theano symbolic value (n_iters, which has a double type instead of an integer type but let's assume it's actually an iscalar instead of a dscalar). As far as Python is concerned all Theano symbolic tensors are just objects: instances of a class type somewhere within the Theano library; they are most assuredly not integers. Even if you squint and try to pretend a Theano iscalar looks like a Python integer, it still doesn't work because Python operations execute immediately which means n_iters needs to have a value immediately. Theano on the other hand doesn't have a value for any iscalar until one is provided by calling a compiled Theano function (or via eval).
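The mismatch can be imitated in plain Python. In the sketch below, Symbol is a toy placeholder class and build_loop a made-up helper; neither is part of Theano, they only mimic the idea of a value that does not exist yet:

```python
class Symbol:
    """Toy placeholder with a name but no value yet (not a Theano class)."""
    def __init__(self, name):
        self.name = name

n_iters = Symbol("n_iters")

# Immediate execution: range() needs an integer right now and cannot wait.
try:
    range(n_iters)
    caught_type_error = False
except TypeError:
    caught_type_error = True

def build_loop(start, constant):
    """Deferred execution: describe the loop now, run it once n is known."""
    def run(n):
        result = start
        for _ in range(n):
            result = result + constant
        return result
    return run

deferred = build_loop(0, 2)   # nothing is computed at this point
print(caught_type_error)      # True
print(deferred(5))            # 10
```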
To create a symbolic range, you can use theano.tensor.arange which operates just like NumPy's arange, but symbolically.
Example:
import theano.tensor as T
from theano import function
my_range_max = T.iscalar('my_range_max')
my_range = T.arange(my_range_max)
f = function([my_range_max], my_range)
print('f(10): {0}'.format(f(10)))
outputs:
f(10): [0 1 2 3 4 5 6 7 8 9]
By making n_iters a symbolic variable you are implicitly saying "I don't know how many iterations there need to be in this loop until a value is provided for n_iters later". That being the case, you must use a symbolic loop instead of a Python for loop. In the latter case you must tell Python how many times to iterate now, you can't delay that decision until a value is provided for n_iters later. To solve this you need to switch to a symbolic loop, which is provided by Theano's scan operator.
Here's the code changed to use scan (as well as the other assumed changes).
import theano
import theano.tensor as T
from theano import function
constant = T.dscalar('constant')
n_iters = T.iscalar('n_iters')
start = T.dscalar('start')
results, _ = theano.scan(lambda result, constant: result + constant,
                         outputs_info=[start], non_sequences=[constant], n_steps=n_iters)
f = function([start, constant, n_iters], results[-1])
print('f(0,2,5): {0}'.format(f(0, 2, 5)))
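To see what the scan call computes, here is a rough plain-Python analogy. scan_like is a hypothetical function written for this explanation; it runs eagerly and covers only the simple case used above (one recurrent output plus non-sequence arguments), so it is not a substitute for theano.scan:

```python
def scan_like(step, initial, n_steps, *non_sequences):
    """Hypothetical eager analogy: collect the output of every step."""
    results = []
    state = initial
    for _ in range(n_steps):
        state = step(state, *non_sequences)  # previous output feeds the next step
        results.append(state)
    return results

results = scan_like(lambda result, constant: result + constant, 0.0, 5, 2.0)

print(results)      # [2.0, 4.0, 6.0, 8.0, 10.0]
print(results[-1])  # 10.0
```

Taking results[-1] corresponds to the results[-1] in the Theano version: the value after the final iteration.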

Zeroth-order Bessel function Python

Apologies for the simplicity of this question.
I would like to implement an equation in Python. In this equation, K_0 is the zeroth-order modified Bessel function.
What is the best way of implementing K_0 in Python?
No need to implement it; it's included. See the docs for the scipy.special module, in particular the optimized common ones here:
>>> import scipy.special
>>> print(scipy.special.k0.__doc__)
k0(x[, out])
y=k0(x) returns the modified Bessel function of the second kind (sometimes called the third kind) of
order 0 at x.
>>> scipy.special.k0(1)
0.42102443824070823
or more generally:
>>> print(scipy.special.kn.__doc__)
kn(x1, x2[, out])
y=kn(n,x) returns the modified Bessel function of the second kind (sometimes called the third kind) for
integer order n at x.
>>> scipy.special.kn(0, 1)
0.42102443824070834
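Both functions also accept arrays, so an equation can be evaluated over many points at once. A short sketch, assuming NumPy and SciPy are installed (the sample points in x are arbitrary):

```python
import numpy as np
from scipy import special

x = np.array([0.5, 1.0, 2.0])   # arbitrary sample points, all positive

k0_values = special.k0(x)       # dedicated order-0 routine
kn_values = special.kn(0, x)    # general integer-order routine with n = 0

print(k0_values)
print(np.allclose(k0_values, kn_values))
```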

numpy array access

I need to create a numpy array of N elements, but I want to access the
array with an offset Noff, i.e. the first element should be at Noff and
not at 0. In C this is simple to do with some simple pointer arithmetic, i.e.
I malloc the array and then define a pointer and shift it appropriately.
Furthermore, I do not want to allocate N+Noff elements, but only N elements.
Now for numpy there are many methods that come to my mind:
(1) define a wrapper function to access the array
(2) overwrite the [] operator
(3) etc
But what is the fastest method to realize this?
Thanks a lot!
Mark
I would be very cautious about over-riding the [] operator through the __getitem__() method. Although it will be fine with your own code, I can easily imagine that when the array gets passed to an arbitrary library function, you could get problems.
For example, if the function explicitly tried to get all values in the array as A[0:-1], it would map to A[offset:offset-1], which will be an empty array for any positive or negative value of offset. This may be a little contrived, but it illustrates the general problem.
Therefore, I would suggest that you create a wrapper function for your own use (as a member function may be most convenient), but don't muck around with __getitem__().
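A minimal sketch of such a wrapper, assuming NumPy is installed. The class name OffsetArray and its get/set methods are hypothetical; the point is that the underlying array stays an ordinary ndarray that can be handed to library code unchanged:

```python
import numpy as np

class OffsetArray:
    """Hypothetical wrapper: logical index i maps to storage index i - offset."""

    def __init__(self, n, offset, dtype=float):
        self.data = np.zeros(n, dtype=dtype)  # only n elements are allocated
        self.offset = offset

    def _storage_index(self, i):
        # Reject indices outside [offset, offset + n) instead of wrapping around.
        if not self.offset <= i < self.offset + len(self.data):
            raise IndexError("index %d outside the offset range" % i)
        return i - self.offset

    def get(self, i):
        return self.data[self._storage_index(i)]

    def set(self, i, value):
        self.data[self._storage_index(i)] = value

arr = OffsetArray(10, offset=2)
arr.set(2, 1.5)       # first logical element
arr.set(11, 7.0)      # last logical element
print(arr.get(2), arr.get(11))
print(arr.data[0], arr.data[-1])
```

The explicit bounds check is a design choice: without it, a logical index just below the offset would silently wrap around to the end of the array.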
Use A[n-offset]. This maps the index range offset to offset+len(A) onto 0 to len(A).
You've already given (1) and (2), both of which are more or less sensible methods. To test the speed of this kind of thing, try the %timeit magic function in IPython. Example usage:
import numpy as np
A = np.array(range(10))
Noff = 2
wrapper_access = lambda i: A[i - Noff]
print(wrapper_access(2))   # 0
print(wrapper_access(11))  # 9
print(wrapper_access(1))   # 9 = A[-1]
%timeit wrapper_access(5)
On my machine, timeit reports: 10000000 loops, best of 3: 193 ns per loop
