Question about copies on assignment in numpy - python

Consider the following piece of code:
import numpy as np
a = np.zeros(10)
b = a
b = b + 1
If I print a and b, I get
>>> a
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
and
>>> b
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
Why is this? According to this answer, the third line above binds the variable a to the new name b, so that both refer to the same data. So why doesn't b = b + 1 modify a also?

the interpreter sees the code this way.
import numpy as np
a = np.zeros(10)
b1 = a
b2 = b1 + 1 # make a new array b1 + 1 and save it in b2
print(b2)
the + operator in numpy (or out of numpy) allocates new memory on the heap, this new memory is then assigned to the name b. (this is also a convention outside of numpy)
to prevent this, you should use the np.add function directly and pass the out parameter. (most numpy functions have an out parameter for this purpose)
import numpy as np
a = np.zeros(10)
b = a
np.add(b,1,out=b) # no new memory is allocated here.
print(a)
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
alternatively this will do almost the same end result but is less efficient.
import numpy as np
a = np.zeros(10)
b = a
b[:] = b + 1
print(a)
which would create a new array that will contain b + 1 then copy its elements into the preexisting array of b.
using the out parameter is useful when working with large data, where a simple "reserve a temporary array for results" may cause a memory error, especially when working with gpus using cupy (which has the same numpy interface) where memory is very restricted.

Related

Vectorize list returning python function into numpy nd-array [duplicate]

numpy.vectorize takes a function f:a->b and turns it into g:a[]->b[].
This works fine when a and b are scalars, but I can't think of a reason why it wouldn't work with b as an ndarray or list, i.e. f:a->b[] and g:a[]->b[][]
For example:
import numpy as np
def f(x):
return x * np.array([1,1,1,1,1], dtype=np.float32)
g = np.vectorize(f, otypes=[np.ndarray])
a = np.arange(4)
print(g(a))
This yields:
array([[ 0. 0. 0. 0. 0.],
[ 1. 1. 1. 1. 1.],
[ 2. 2. 2. 2. 2.],
[ 3. 3. 3. 3. 3.]], dtype=object)
Ok, so that gives the right values, but the wrong dtype. And even worse:
g(a).shape
yields:
(4,)
So this array is pretty much useless. I know I can convert it doing:
np.array(map(list, a), dtype=np.float32)
to give me what I want:
array([[ 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1.],
[ 2., 2., 2., 2., 2.],
[ 3., 3., 3., 3., 3.]], dtype=float32)
but that is neither efficient nor pythonic. Can any of you guys find a cleaner way to do this?
np.vectorize is just a convenience function. It doesn't actually make code run any faster. If it isn't convenient to use np.vectorize, simply write your own function that works as you wish.
The purpose of np.vectorize is to transform functions which are not numpy-aware (e.g. take floats as input and return floats as output) into functions that can operate on (and return) numpy arrays.
Your function f is already numpy-aware -- it uses a numpy array in its definition and returns a numpy array. So np.vectorize is not a good fit for your use case.
The solution therefore is just to roll your own function f that works the way you desire.
A new parameter signature in 1.12.0 does exactly what you what.
def f(x):
return x * np.array([1,1,1,1,1], dtype=np.float32)
g = np.vectorize(f, signature='()->(n)')
Then g(np.arange(4)).shape will give (4L, 5L).
Here the signature of f is specified. The (n) is the shape of the return value, and the () is the shape of the parameter which is scalar. And the parameters can be arrays too. For more complex signatures, see Generalized Universal Function API.
import numpy as np
def f(x):
return x * np.array([1,1,1,1,1], dtype=np.float32)
g = np.vectorize(f, otypes=[np.ndarray])
a = np.arange(4)
b = g(a)
b = np.array(b.tolist())
print(b)#b.shape = (4,5)
c = np.ones((2,3,4))
d = g(c)
d = np.array(d.tolist())
print(d)#d.shape = (2,3,4,5)
This should fix the problem and it will work regardless of what size your input is. "map" only works for one dimentional inputs. Using ".tolist()" and creating a new ndarray solves the problem more completely and nicely(I believe). Hope this helps.
You want to vectorize the function
import numpy as np
def f(x):
return x * np.array([1,1,1,1,1], dtype=np.float32)
Assuming that you want to get single np.float32 arrays as result, you have to specify this as otype. In your question you specified however otypes=[np.ndarray] which means you want every element to be an np.ndarray. Thus, you correctly get a result of dtype=object.
The correct call would be
np.vectorize(f, signature='()->(n)', otypes=[np.float32])
For such a simple function it is however better to leverage numpy's ufunctions; np.vectorize just loops over it. So in your case just rewrite your function as
def f(x):
return np.multiply.outer(x, np.array([1,1,1,1,1], dtype=np.float32))
This is faster and produces less obscure errors (note however, that the results dtype will depend on x if you pass a complex or quad precision number, so will be the result).
I've written a function, it seems fits to your need.
def amap(func, *args):
'''array version of build-in map
amap(function, sequence[, sequence, ...]) -> array
Examples
--------
>>> amap(lambda x: x**2, 1)
array(1)
>>> amap(lambda x: x**2, [1, 2])
array([1, 4])
>>> amap(lambda x,y: y**2 + x**2, 1, [1, 2])
array([2, 5])
>>> amap(lambda x: (x, x), 1)
array([1, 1])
>>> amap(lambda x,y: [x**2, y**2], [1,2], [3,4])
array([[1, 9], [4, 16]])
'''
args = np.broadcast(None, *args)
res = np.array([func(*arg[1:]) for arg in args])
shape = args.shape + res.shape[1:]
return res.reshape(shape)
Let try
def f(x):
return x * np.array([1,1,1,1,1], dtype=np.float32)
amap(f, np.arange(4))
Outputs
array([[ 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1.],
[ 2., 2., 2., 2., 2.],
[ 3., 3., 3., 3., 3.]], dtype=float32)
You may also wrap it with lambda or partial for convenience
g = lambda x:amap(f, x)
g(np.arange(4))
Note the docstring of vectorize says
The vectorize function is provided primarily for convenience, not for
performance. The implementation is essentially a for loop.
Thus we would expect the amap here have similar performance as vectorize. I didn't check it, Any performance test are welcome.
If the performance is really important, you should consider something else, e.g. direct array calculation with reshape and broadcast to avoid loop in pure python (both vectorize and amap are the later case).
The best way to solve this would be to use a 2-D NumPy array (in this case a column array) as an input to the original function, which will then generate a 2-D output with the results I believe you were expecting.
Here is what it might look like in code:
import numpy as np
def f(x):
return x*np.array([1, 1, 1, 1, 1], dtype=np.float32)
a = np.arange(4).reshape((4, 1))
b = f(a)
# b is a 2-D array with shape (4, 5)
print(b)
This is a much simpler and less error prone way to complete the operation. Rather than trying to transform the function with numpy.vectorize, this method relies on NumPy's natural ability to broadcast arrays. The trick is to make sure that at least one dimension has an equal length between the arrays.

Passing 1D arrays as elements of a 3D array in Tensorflow

In Tensorflow: I have a set of arrays, x0,x1,x2 and x3 that are generated in my program and each array has N elements. I also have a zeros-initialized 3D tensor of dimensions (N,2,2).
I want to make each of the array as the element of my 3D tensor (with the N elements in the third direction), M[:,0,0] =x0, M[:,0,1] =x1, M[:,1,0] =x2 and M[:,1,1] =x3, and I want to do that with for loops.
In Matlab which I am more familiar with it is possible to do that by simply:
M(1,1,:)=x0
M(2,2,:)=x3
M(1,2,:)=x1
M(2,1,:)=x2
Is there a way that I can pass arrays in a for loop as the third dimension of my tensor in tensor flow, for example:
for i in range(2):
M[:,i,i]=x
where x is an array?
I don't think it would work that way. You could do:
import tensorflow as tf
import numpy as np
x0,x1,x2,x3 = [ np.random.randint(0,10,3) for _ in range(4)]
tf_X = tf.stack( [ tf.stack([x0,x1]),
tf.stack([x2,x3]) ])
EDIT: It might change with Tensorflow2 though. Otherwise, you could use pytorch:
import torch
import numpy as np
X = torch.ones([3,4])
v = np.arange(4)
X[0,:] = torch.from_numpy(v)
Result:
In [20]: X
Out[20]:
tensor([[0., 1., 2., 3.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])

override operations in dask for getitem

Is there a way to override certain operations.
import dask
import numpy as np
a = np.zeros((10,10))
a = dask.delayed(lambda x : x*2)(a)
I would like a[0] to return a number (instead of having to call a[0].compute()).
Is this possible?
The context is that I would like to have a series of images (3D array), and run operations like:
imgs2 = imgs - 1
imgs3 = imgs*mask
and then have an operation like imgs3[0] explicitly run imgs3[0].compute().
However, I see many drawbacks with this method now and I would like to remove this post. For one is this is quite limiting. Indexing things like imgs3[:,:, 10] (all columns) may also have to end up with a computed result.
This seems to work fine for me
In [1]: import dask
...: import numpy as np
...: a = np.zeros((10,10))
...: a = dask.delayed(lambda x : x*2)(a)
...:
In [2]: a[0]
Out[2]: Delayed('getitem-4eccd4e43153cac99d8e6d280cc1ad9c')
In [3]: a[0].compute()
Out[3]: array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

How can I translate a MATLAB cell in Python 3?

Just to give you some context:
I have to translate some MATLAB code into Python 3 one, but here I've been confronted to a little problem.
Matlab:
for i in 1:num_nodes
for j in 1:num_nodes
K{i,j} = zeros(3,3);
Which I translated into:
k_topology = [[]]
for i in range(x):
for i in range(x):
k_topology[[i][j]].extend(np.zeros(3,3))
Also, further in the Matlab code there's a third loop:
for k in 1:3
K{i,j}(k,k) = -1
Which also kind of... Upsets me?
The fact is I don't really see how I can translate this kind of variable into Python. Also, I guess that my Python code's kind of "broken" - and I'm not really asking to any of you to improve it - , so I'm just asking which is the best way to translate Matlab's cell into Python?
I finally found something apparently simple to translate this, using list comprehension - according to kazemakase's answer. The actual Python code is now looking like this:
k_topology = [[np.zeros((3,3)) for j in range(self.get_nb_nodes_from_network())]\
for i in range(self.get_nb_nodes_from_network())]
And looks like something like this in Output:
[[array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]]),
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]]),
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])], ..., [array(...)]]
(There's really too many values to paste it here, but I think you got it.)
The first question you need to ask is "what is a Matlab cell and what could be a suitable corresponding Python type?"
If I remember correctly from my bad old Matlab days, a cell is sort of a container that holds content of mixed types. It is something like a dynamically typed array or matrix. It is multidimensionally indexed.
Python is dynamically typed, so any Python contianer can basically fulfill this function. Lists in Python are indexed, so nested lists could work - but they are somewhat weird to set up and access:
K = [[None] * num_nodes for _ in range(num_nodes)]
K[i][j] # need two indices to access elements of a nested list.
For the particular scenario a dictionary better mirrors Matlab syntax. Although a ditionary takes only one index, we can exploit the fact that tuples can be declared without brackets and that dictionaries can take tuples as index:
K = {}
for i in range(num_nodes):
for j in range(num_nodes):
K[i, j] = np.zeros((3, 3))
for k in 1:3
K[i, j][k, k] = -1
While the dictionary is syntactically more concise, element access is potentially less performant than in nested lists. Nested look different than Matlab code. The choice depends on performance or similarity to the original code. But if performance is an issue there are many more things to consider, anyway. In summary: There is no one best way to do it.
Since the OP expclicitly asked not to improve the code, I explicitly ask him/her to ignore this part of the answer.
A better way to build diagonal matrices is to use np.ones instead of looping over diagonal elements.
K = {}
for i in range(num_nodes):
for j in range(num_nodes):
K[i, j] = -np.ones((3, 3))
Also, nested lists can be constructed without (much) prior initialization, if that is the preferred approach:
K = []
for i in range(num_nodes):
K.append([])
for j in range(num_nodes):
K[-1].append(-np.ones((3, 3)))
Now, for the peace of my soul, let me take apart provide feedback on the OP's code:
k_topology = [[]]
for i in range(x):
for i in range(x):
k_topology[[i][j]].extend(np.zeros(3,3))
This has nothing to do with the original Matlab code (different variable names)
Both loops use i. j is never defined.
[[i][j]] builds a list with one element i and tries to take the jth element. If j is ever something other than 0 this will cause an error.
list.extend a appends all elements of the argument individually to the list - in this case individual rows. list.append would be correct to use as the whole 3x3 matrix should be appended as one element in K.
np.zeros(3, 3) should be np.zeros((3, 3)) (assuming np is an alias for numpy) because the function takes the shape is the first argument, not multiple arguments.
Using the Octave/scipy save/loadmat that I demonstrated in the linked post:
In an Octave session
>> num_nodes=3
num_nodes = 3
>> num_nodes=3;
>> K=cell(num_nodes, num_nodes);
>> for i = 1:num_nodes
for j = 1:num_nodes
K{i,j} = zeros(2,2);
end
end
>> K
K =
{
[1,1] =
0 0
0 0
[2,1] =
0 0
0 0
etc
Access one cell:
>> K{1,2}
ans =
0 0
0 0
Access one element of one cell:
>> K{1,2}(1,1)
ans = 0
>> save -7 kfile.mat K
In Python
In [31]: from scipy import io
In [32]: data = io.loadmat('kfile.mat')
In [34]: data
Out[34]:
{'K': array([[array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]])],
[array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]])],
[array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]])]], dtype=object),
'__globals__': [],
'__header__': b'MATLAB 5.0 MAT-file, written by Octave 4.0.0, 2017-02-15 19:05:44 UTC',
'__version__': '1.0'}
In [35]: data['K'].shape
Out[35]: (3, 3)
In [36]: data['K'][0,0].shape
Out[36]: (2, 2)
In [37]: data['K'][0,0][0,0]
Out[37]: 0.0
loadmat treats a cell as a 2d object dtype array; while regular matrices are 2d numeric arrays. Object arrays are, in many ways like a nested Python list.

Create a numpy array according to another array along with indices array

I have a numpy array(eg., a = np.array([ 8., 2.])), and another array which stores the indices I would like to get from the former array. (eg., b = np.array([ 0., 1., 1., 0., 0.]).
What I would like to do is to create another array from these 2 arrays, in this case, it should be: array([ 8., 2., 2., 8., 8.])
of course, I can always use a for loop to achieve this goal:
for i in range(5):
c[i] = a[b[i]]
I wonder if there is a more elegant method to create this array. Something like c = a[b[0:5]] (well, this apparently doesn't work)
Only integer arrays can be used for indexing, and you've created b as a float64 array. You can get what you're looking for if you explicitly convert to integer:
bi = np.array(b, dtype=int)
c = a[bi[0:5]]

Categories

Resources