I am attempting to utilize numpy to the best of its capabilities, but I am obviously missing some important link in the documentation due to my 'Noob-ness'.
What I want to do is create an array with a certain number of rows and columns and populate it with a sub-array. The sub-array is incremented by a pair of values as one traverses along a row; for subsequent rows, another pair of values is used to populate the columns. The best I have come up with is to use list comprehensions to generate the desired output. At this stage the array I create doesn't have the desired shape, but I can deal with that in an awkward fashion, so all is not lost.
Here is what I have so far:
>>> import numpy as np
>>> np.set_printoptions(precision=4,threshold=20,edgeitems=3,linewidth=80) # default print options
>>> sub = np.array([[1,2],[3,4],[5,6]],dtype='float64') # a sub array of floats
>>> sub
array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])
>>> e = np.empty((3,4),dtype='object') # an empty array of the desired shape
>>> e
array([[None, None, None, None],
       [None, None, None, None],
       [None, None, None, None]], dtype=object)
>>> dX = 1; dY = np.sqrt(3.)/2.0 # values to add to sub array per cell in e
>>> rows,cols = e.shape # rows and columns from 'e' shape
>>> out = [sub + [dX*i,dY*(i%2)] for i in range(0,cols)] # create the first row
>>> for j in range(1,rows): # create the other rows
... out += [out[k] + [0,-dY*2*j] for k in range(cols)]
...
>>> arr = np.array(out)
>>> arr.shape # expect to see ((3,4),3,2)...I think
(12, 3, 2)
>>> arr[0:4] # I will let you try this to see the format
The last line just shows the format of the first 4 elements of the output array. What I was hoping to do was populate the empty array, e, in a fashion more elegant than my list-comprehension method, AND/OR to learn how to reshape the array properly. Again, unless I am missing links in the documentation, I would have expected a 3x4 array containing 3x2 sub-arrays, which is not what it is showing me. I would appreciate any help or links to appropriate documentation, since I have spent hours trawling this site and am obviously missing some appropriate numpy terminology.
The first out is a list of 4 (3,2) arrays.
np.array(out) at this stage produces a (4,3,2) array. np.array creates the highest-dimensional array that the data allows; in this case it concatenates those 4 arrays along a new dimension.
After the rows loop, out is a list of 12 arrays; out += ... on a list extends it, appending the new arrays to the end.
So by the same logic, arr = np.array(out) will produce a (12,3,2) array. That could be reshaped: arr = arr.reshape(3,4,3,2).
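A quick sketch of that concatenate-then-reshape behaviour, using zero-filled stand-ins for the 12 (3,2) sub-arrays (the values are just placeholders):
import numpy as np

out = [np.zeros((3, 2)) for _ in range(12)]   # stand-in for the 12 sub-arrays
arr = np.array(out)             # concatenates along a new leading axis
print(arr.shape)                # (12, 3, 2)
arr = arr.reshape(3, 4, 3, 2)   # group the 12 sub-arrays into 3 rows of 4
print(arr.shape)                # (3, 4, 3, 2)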
Subarrays from arr could be copied to e, e.g.:
e[0,0] = arr[0,0]
Which raises the question: why do you want an array like e? What advantage does it have over arr? arr represents 'the best of numpy's capabilities'; e (an object-dtype array) tries to extend them into poorly developed areas.
Your out list can be vectorized with something along these lines:
ii = np.arange(cols)                    # column indices 0..cols-1
ixy = np.array([dX*ii, dY*(ii%2)])      # per-column (x, y) offsets, shape (2, cols)
arr1 = sub[None,:,:] + ixy.T[:,None,:]  # broadcast (1,3,2) + (cols,1,2) -> (cols,3,2)
arr1 is a (4,3,2) array, and could be copied to the e[0,:] elements.
This could be cleaned up and extended to the other rows.
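For instance, here is one possible fully vectorized sketch (my own, assuming, as in your loop, that row j adds [0, -2*dY*j] to the first-row entries):
import numpy as np

sub = np.array([[1,2],[3,4],[5,6]], dtype='float64')
dX = 1; dY = np.sqrt(3.)/2.0
rows, cols = 3, 4
ii = np.arange(cols)                      # column indices, shape (cols,)
jj = np.arange(rows)                      # row indices, shape (rows,)
offx = dX * ii                            # x offset per column, (cols,)
offy = dY * (ii % 2) - 2*dY*jj[:, None]   # y offset per (row, column), (rows, cols)
offsets = np.stack(np.broadcast_arrays(offx, offy), axis=-1)  # (rows, cols, 2)
arr = sub + offsets[:, :, None, :]        # broadcasts to (rows, cols, 3, 2)
print(arr.shape)                          # (3, 4, 3, 2)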
A clean way of iterating over all the elements of e, and assigning the corresponding subarray of arr uses np.ndindex (from the index_tricks module):
for i in np.ndindex(3,4):
    e[i] = arr[i]
While it is a Python-level iteration, it does not involve copying data; it just copies pointers. I'm a little surprised about this, but e[i,j] points to the same data block as arr[i,j]. This is evident from the .__array_interface__ values, and by modifying entries, e.g.
e[1,1][0,0] = 30
changes the value of arr[1,1,0,0].
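If you want to check the sharing without modifying values, np.shares_memory gives a direct answer. A minimal sketch (small illustrative shapes):
import numpy as np

arr = np.arange(24.).reshape(2, 2, 3, 2)
e = np.empty((2, 2), dtype=object)
for i in np.ndindex(2, 2):
    e[i] = arr[i]
print(np.shares_memory(e[0, 0], arr[0, 0]))  # True: same underlying buffer
e[1, 1][0, 0] = 30
print(arr[1, 1, 0, 0])                       # 30.0, as described above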
I want to iterate over a numpy array and do some calculations on the values. However, things are not as expected. To show what I mean, I simply wrote this code to read values from a numpy array and move them to another list.
import numpy as np

a = np.array([1,2,1]).reshape(-1, 1)
u = []
for i in np.nditer(a):
    print(i)
    u.append(i)
print(u)
According to the tutorial, nditer points to elements, and as print(i) shows, i is the value. However, when I append that i to a list, the list doesn't store the value. The expected output is u = [1, 2, 1] but the output of the code is
1
2
1
[array(1), array(2), array(1)]
What does array(1) mean exactly and how can I fix that?
P.S.: I know that with .tolist() I can convert a numpy array to a standard Python list. However, in that code, I want to iterate over numpy elements.
As already explained in your previous question, numpy.nditer yields numpy arrays. What is shown by print is only the representation of the object, not the content or type of the object (e.g., 1 and '1' have the same representation, not the same type).
import numpy as np
a = np.array([1,2,1]).reshape(-1, 1)
type(next(np.nditer(a)))
# numpy.ndarray
You just have a zero-dimensional array:
np.array(1).shape
# ()
There is no need to use numpy.nditer here. If you really want to iterate over the rows of your single-column array (and not use tolist), use:
u = []
for i in a[:,0]:
    u.append(i)
u
# [1, 2, 1]
numpy.nditer actually yields a numpy array for each element. If you want the actual value of the item, you can use the built-in item() method:
import numpy as np

a = np.array([1,2,1]).reshape(-1, 1)
u = []
for i in np.nditer(a):
    u.append(i.item())
print(u)
# [1, 2, 1]
A pure python equivalent of what's happening with your append is:
In [75]: alist = []
    ...: x = [0]
    ...: for i in range(3):
    ...:     x[0] = i
    ...:     print(x)
    ...:     alist.append(x)
[0]
[1]
[2]
In [76]: alist
Out[76]: [[2], [2], [2]]
In [77]: x
Out[77]: [2]
x is modified in each loop, but only a reference is saved. The result is that all elements of the list are the same object, and display its last value.
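For completeness, one way to avoid that aliasing in this pure-Python analogue is to store a copy rather than the reference (the same idea as using .item() or .copy() on the numpy side):
alist = []
x = [0]
for i in range(3):
    x[0] = i
    alist.append(x[:])   # x[:] stores a shallow copy, not a reference to x
print(alist)             # [[0], [1], [2]]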
I have a function that returns many output arrays of varying size.
arr1,arr2,arr3,arr4,arr5, ... = func(data)
I want to run this function many times over a time series of data, and combine each output variable into one array that covers the whole time series.
To elaborate: If the output arr1 has dimensions (x,y) when the function is called, I want to run the function 't' times and end up with an array that has dimensions (x,y,t). A list of 't' arrays with size (x,y) would also be acceptable, but not preferred.
Again, the output arrays do not all have the same dimensions, or even the same number of dimensions. Arr2 might have size (x2,y2); arr3 might be only a vector of length x3. I do not know the sizes of these arrays beforehand.
My current solution is something like this:
arr1 = []
arr2 = []
arr3 = []
...
for t in range(t_max):
    arr1_t, arr2_t, arr3_t, ... = func(data[t])
    arr1.append(arr1_t)
    arr2.append(arr2_t)
    arr3.append(arr3_t)
    ...
and so on. However, this looks inelegant when repeated for all 27 output arrays.
Is there a better way to do this?
You can just make arr1, arr2, etc. a list of lists (of vectors or matrices or whatever). Then use a loop to iterate the results obtained from func and add them to the individual lists.
arrN = [[] for _ in range(N)] # N being number of results from func
for t in range(t_max):
    results = func(data[t])
    for i, res in enumerate(results):
        arrN[i].append(res)
The elements in the different sub-lists do not have to have the same dimensions.
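As a possible follow-up (my addition, assuming all elements of a given sub-list share a shape), each sub-list can then be stacked into a single array with the time axis last, matching the preferred (x, y, t) layout:
import numpy as np

arr1 = np.stack(arrN[0], axis=-1)   # (x, y, t_max) if every result at position 0 is (x, y)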
Not sure if it counts as "elegant", but you can build a list of the result tuples, then use zip to group them into tuples by return position instead of by call number, then optionally map to convert those tuples to the final data type. For example, with numpy arrays:
from future_builtins import map, zip # Only on Python 2, to minimize temporaries
import numpy as np
def func(x):
    'Dumb function to return tuple of powers of x from 1 to 27'
    return tuple(x ** i for i in range(1, 28))
# Example inputs for func
data = [np.array([[x]*10]*10, dtype=np.uint8) for x in range(10)]
# Output is generator of results for each call to func
outputs = map(func, data)
# Pass each complete result of func as a positional argument to zip via star
# unpacking to regroup, so the first return from each func call is the first
# group, then the second return the second group, etc.
positional_groups = zip(*outputs)
# Convert regrouped data (`tuple`s of 2D results) to numpy 3D result type, unpack to names
arr1,arr2,arr3,arr4,arr5, ...,arr27 = map(np.array, positional_groups)
If the elements returned from func at a given position might have inconsistent dimensions (e.g. one call might return 10x10 as the first return, and another 5x5), you'd avoid the final map step (since the arrays wouldn't have consistent dimensions) and just replace the second-to-last step with:
arr1,arr2,arr3,arr4,arr5, ...,arr27 = zip(*outputs)
making arr# a tuple of 2D arrays, or if they need to be mutable:
arr1,arr2,arr3,arr4,arr5, ...,arr27 = map(list, zip(*outputs))
to make them lists of 2D arrays.
This answer gives a solution using structured arrays. It has the following requirement: given a function f that returns N arrays, where the sizes of the returned arrays may differ from each other, len(array_i) must nevertheless be the same across all results of f. E.g.
arrs_a = f("a")
arrs_b = f("b")
for sub_arr_a, sub_arr_b in zip(arrs_a, arrs_b):
    assert len(sub_arr_a) == len(sub_arr_b)
If the above is true, then you can use structured arrays. A structured array is like a normal array, just with a complex data type. For instance, I could specify a data type that is made up of one array of ints of shape 5, and a second array of floats of shape (2, 2). E.g.
# define what a record looks like
dtype = [
    # tuples of (field_name, data_type)
    ("a", "5i4"),      # array of five 4-byte ints
    ("b", "(2,2)f8"),  # 2x2 array of 8-byte floats
]
Using dtype you can create a structured array, and set all the results on the structured array in one go.
import numpy as np
def func(n):
    "mock implementation of func"
    return (
        np.ones(5) * n,
        np.ones((2,2)) * n,
    )
# define what a record looks like
dtype = [
    # tuples of (field_name, data_type)
    ("a", "5i4"),      # array of five 4-byte ints
    ("b", "(2,2)f8"),  # 2x2 array of 8-byte floats
]
size = 5
# create array
arr = np.empty(size, dtype=dtype)
# fill in values
for i in range(size):
    # func must return a tuple,
    # or you must convert the returned value to a tuple
    arr[i] = func(i)
# alternate way of instantiating arr
arr = np.fromiter((func(i) for i in range(size)), dtype=dtype, count=size)
# How to use structured arrays
# access individual record
print(arr[1]) # prints ([1, 1, 1, 1, 1], [[1, 1], [1, 1]])
# access specific value -- get second record -> get b field -> get value at 0,0
assert arr[2]['b'][0,0] == 2
# access all values of a specific field
print(arr['a']) # prints all the a arrays
I have piece of code that slices a 2D NumPy array and returns the resulting (sub-)array. In some cases, the slicing only indexes one element, in which case the result is a one-element array:
>>> sub_array = orig_array[indices_h, indices_w]
>>> sub_array.shape
(1,)
How can I force this array to be two-dimensional in a general way? I.e.:
>>> sub_array.shape
(1,1)
I know that sub_array.reshape(1,1) works, but I would like to be able to apply it to sub_array generally without worrying about the number of elements in it. To put it in another way, I would like to compose a (light-weight) operation that converts a shape-(1,) array to a shape-(1,1) array, a shape-(2,2) array to a shape-(2,2) array etc. I can make a function:
def twodimensionalise(input_array):
    if input_array.shape == (1,):
        return input_array.reshape(1,1)
    else:
        return input_array
Is this the best I am going to get or does NumPy have something more 'native'?
Addition:
As pointed out in https://stackoverflow.com/a/31698471/865169, I was doing the indexing wrong. I really wanted to do:
sub_array = orig_array[indices_h][:, indices_w]
This does not work when there is only one entry in indices_h, but combining it with np.atleast_2d suggested in another answer, I arrive at:
sub_array = np.atleast_2d(orig_array[indices_h])[:, indices_w]
It sounds like you might be looking for atleast_2d. This function returns a view of a 1D array as a 2D array:
>>> arr1 = np.array([1.7]) # shape (1,)
>>> np.atleast_2d(arr1)
array([[ 1.7]])
>>> _.shape
(1, 1)
Arrays that are already 2D (or have more dimensions) are unchanged:
>>> arr2 = np.arange(4).reshape(2,2) # shape (2, 2)
>>> np.atleast_2d(arr2)
array([[0, 1],
[2, 3]])
>>> _.shape
(2, 2)
When defining a numpy array, you can use the keyword argument ndmin to specify that you want at least two dimensions.
e.g.
arr = np.array(item_list, ndmin=2)
arr.shape
# (1, 100) if item_list is a flat list of 100 elements; ndmin prepends the new axes
In the example in the question, just do
sub_array = np.array(orig_array[indices_h, indices_w], ndmin=2)
sub_array.shape
# (1,1)
This can be extended to higher dimensions too, unlike np.atleast_2d().
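For example (ndmin prepends the new axes):
import numpy as np

print(np.array([1, 2, 3], ndmin=4).shape)   # (1, 1, 1, 3)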
Are you sure you are indexing in the way you want to? In the case where indices_h and indices_w are broadcastable integer indexing arrays, the result will have the broadcasted shape of indices_h and indices_w. So if you want to make sure that the result is 2D, make the indices arrays 2D.
Otherwise, if you want all combinations of indices_h[i] and indices_w[j] (for all i, j), do e.g. a sequential indexing:
sub_array = orig_array[indices_h][:, indices_w]
Have a look at the documentation for details about advanced indexing.
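A minimal sketch of the first suggestion (my own example values): making both index arrays 2D guarantees a 2D result, even when only one element is selected:
import numpy as np

orig_array = np.arange(16).reshape(4, 4)
indices_h = np.array([[1]])    # shape (1, 1)
indices_w = np.array([[2]])    # shape (1, 1)
sub_array = orig_array[indices_h, indices_w]
print(sub_array.shape)         # (1, 1): the broadcast shape of the index arrays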
Say I have a 3 dimensional numpy array:
np.random.seed(1145)
A = np.random.random((5,5,5))
and I have two lists of indices corresponding to the 2nd and 3rd dimensions:
second = [1,2]
third = [3,4]
and I want to select the elements in the numpy array corresponding to
A[:][second][third]
so the shape of the sliced array would be (5,2,2) and
A[:][second][third].flatten()
would be equivalent to to:
In [226]:
for i in range(5):
    for j in second:
        for k in third:
            print A[i][j][k]
0.556091074129
0.622016249651
0.622530505868
0.914954716368
0.729005532319
0.253214472335
0.892869371179
0.98279375528
0.814240066639
0.986060321906
0.829987410941
0.776715489939
0.404772469431
0.204696635072
0.190891168574
0.869554447412
0.364076117846
0.04760811817
0.440210532601
0.981601369658
Is there a way to slice a numpy array in this way? So far when I try A[:][second][third] I get IndexError: index 3 is out of bounds for axis 0 with size 2 because the [:] for the first dimension seems to be ignored.
Numpy supports multidimensional indexing, so instead of A[1][2][3], you can--and should--use A[1,2,3].
You might then think you could do A[:, second, third], but the numpy indices are broadcast, and broadcasting second and third (two one-dimensional sequences) ends up being the numpy equivalent of zip, so the result has shape (5, 2).
What you really want is to index with, in effect, the outer product of second and third. You can do this with broadcasting by making one of them, say second into a two-dimensional array with shape (2,1). Then the shape that results from broadcasting second and third together is (2,2).
For example:
In [8]: import numpy as np
In [9]: a = np.arange(125).reshape(5,5,5)
In [10]: second = [1,2]
In [11]: third = [3,4]
In [12]: s = a[:, np.array(second).reshape(-1,1), third]
In [13]: s.shape
Out[13]: (5, 2, 2)
Note that, in this specific example, the values in second and third are sequential. If that is typical, you can simply use slices:
In [14]: s2 = a[:, 1:3, 3:5]
In [15]: s2.shape
Out[15]: (5, 2, 2)
In [16]: np.all(s == s2)
Out[16]: True
There are a couple of very important differences between those two methods.
The first method would also work with indices that are not equivalent to slices. For example, it would work if second = [0, 2, 3]. (Sometimes you'll see this style of indexing referred to as "fancy indexing".)
In the first method (using broadcasting and "fancy indexing"), the data is a copy of the original array. In the second method (using only slices), the array s2 is a view into the same block of memory used by a. An in-place change in one will change them both.
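A quick self-contained check of that view/copy distinction (illustrative values only):
import numpy as np

a = np.arange(125).reshape(5, 5, 5)
s2 = a[:, 1:3, 3:5]                                 # slicing: a view into a
s2[0, 0, 0] = -1
print(a[0, 1, 3])                                   # -1: the change shows up in a
s = a[:, np.array([1, 2]).reshape(-1, 1), [3, 4]]   # fancy indexing: a copy
s[0, 0, 0] = -2
print(a[0, 1, 3])                                   # still -1: a is untouched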
One way would be to use np.ix_:
>>> out = A[np.ix_(range(A.shape[0]),second, third)]
>>> out.shape
(5, 2, 2)
>>> manual = [A[i,j,k] for i in range(5) for j in second for k in third]
>>> (out.ravel() == manual).all()
True
Downside is that you have to specify the missing coordinate ranges explicitly, but you could wrap that into a function.
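For instance, a hypothetical wrapper (the name and signature are mine) that fills in the full range for any axis you don't restrict:
import numpy as np

def ix_select(arr, axis_indices):
    """Index arr with np.ix_, using the full range for unspecified axes."""
    idx = [axis_indices.get(ax, range(n)) for ax, n in enumerate(arr.shape)]
    return arr[np.ix_(*idx)]

A = np.random.random((5, 5, 5))
out = ix_select(A, {1: [1, 2], 2: [3, 4]})
print(out.shape)   # (5, 2, 2)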
I think there are three problems with your approach:
Both second and third should be slices
Since the 'to' index is exclusive, they should go from 1 to 3 and from 3 to 5
Instead of A[:][second][third], you should use A[:,second,third]
Try this:
>>> np.random.seed(1145)
>>> A = np.random.random((5,5,5))
>>> second = slice(1,3)
>>> third = slice(3,5)
>>> A[:,second,third].shape
(5, 2, 2)
>>> A[:,second,third].flatten()
array([ 0.43285482, 0.80820122, 0.64878266, 0.62689481, 0.01298507,
0.42112921, 0.23104051, 0.34601169, 0.24838564, 0.66162209,
0.96115751, 0.07338851, 0.33109539, 0.55168356, 0.33925748,
0.2353348 , 0.91254398, 0.44692211, 0.60975602, 0.64610556])
I'm dealing with arrays in Python, and this has raised a lot of questions...
1) I produce a list of lists by reading 4 columns from N files, storing 4 elements N times in a list. I then convert this list into a numpy array:
s = np.array(s)
and I ask for the shape of this array. The answer is correct:
print s.shape
#(N,4)
I then produce the mean of this Nx4 array:
s_m = sum(s)/len(s)
print s_m.shape
#(4,)
which I guess means that this array is a 1D array. Is this correct?
2) If I subtract the mean vector s_m from the rows of the array s, I can proceed in two ways:
residuals_s = s - s_m
or:
residuals_s = []
for i in range(len(s)):
    residuals_s.append([])
    tmp = s[i] - s_m
    residuals_s.append(tmp)
if I now ask for the shape of residuals_s in the two cases I obtain two different answers. In the first case I obtain:
(N,4)
in the second:
(N,1,4)
can someone explain why there is an additional dimension?
You can get the mean using the numpy method (producing the same (4,) shape):
s_m = s.mean(axis=0)
s - s_m works because s_m is 'broadcasted' to the dimensions of s.
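A minimal illustration of that broadcasting (illustrative values):
import numpy as np

s = np.arange(12.).reshape(3, 4)   # stand-in for the (N, 4) data
s_m = s.mean(axis=0)               # shape (4,)
res = s - s_m                      # s_m is stretched to (3, 4) during the subtraction
print(res.shape)                   # (3, 4)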
If I run your second residuals_s I get a list containing empty lists and arrays:
[[],
array([ 1.02649662, 0.43613824, 0.66276758, 2.0082684 ]),
[],
array([ 1.13000227, -0.94129685, 0.63411801, -0.383982 ]),
...
]
That does not convert to a (N,1,4) array, but rather a (M,) array with dtype=object. Did you copy and paste correctly?
A corrected iteration is:
for i in range(len(s)):
    residuals_s.append(s[i]-s_m)
produces a simpler list of arrays:
[array([ 1.02649662, 0.43613824, 0.66276758, 2.0082684 ]),
array([ 1.13000227, -0.94129685, 0.63411801, -0.383982 ]),
...]
which converts to a (N,4) array.
Iteration like this usually is not needed. But if it is, appending to lists like this is one way to go. Another is to preallocate an array and assign rows:
residuals_s = np.zeros_like(s)
for i in range(s.shape[0]):
    residuals_s[i,:] = s[i]-s_m
I get your (N,1,4) with:
In [39]: residuals_s=[]
In [40]: for i in range(len(s)):
    ....:     residuals_s.append([])
    ....:     tmp = s[i] - s_m
    ....:     residuals_s[-1].append(tmp)
In [41]: residuals_s
Out[41]:
[[array([ 1.02649662, 0.43613824, 0.66276758, 2.0082684 ])],
[array([ 1.13000227, -0.94129685, 0.63411801, -0.383982 ])],
...]
In [43]: np.array(residuals_s).shape
Out[43]: (10, 1, 4)
Here the s[i]-s_m array is appended to an empty list, which has been appended to the main list. So it's an array within a list within a list. It's this intermediate list that produces the middle 1 dimension.
You are using a NumPy ndarray without using NumPy's functions: sum() is a Python builtin function; you should use numpy.sum() instead.
I suggest you change your code as:
import numpy as np
np.random.seed(0)
s = np.random.randn(10, 4)
s_m = np.mean(s, axis=0, keepdims=True)
residuals_s = s - s_m
print s.shape, s_m.shape, residuals_s.shape
Using the mean() function with the axis and keepdims arguments will give you the correct result.