Python, numpy.array slicing, altering array values with slices - python

I have a task in numerical integration in which we approximate an integral with a quadrature formula. The task requires me to avoid loops and use a vectorized variant, which I assume means slicing.
I have an np.array object with n values and I have to alter each value of this array using a specific formula. The problem is that the value of the array at position i is used in the formula that updates that same position. With a for loop it would be easy:
x = np.array([...])
for i in range(0, n):
    x[i] = f(x[i] + a) * b
(a and b are some other variables)
How do I do this with slices? I have to do this for all elements of the array, so it would be something like:
x[:] = f(x[???] + a) * b
How do I get the right position of my array into the formula? A slicing instruction like x[:] just runs through my whole object. Is there a way to somehow keep track of the index I am currently at?
I tried to search but found nothing. The other problem is that I do not even know how to phrase the search query properly...

You may be confusing two issues:
modifying all elements of an array
calculating values for all elements of an array
In
x = np.array([...])
for i in range(0, n):
    x[i] = f(x[i] + a) * b
you change elements of x one by one, and also pass them one by one to f.
x[:] = ... lets you change all elements of x at once, but the source (the right-hand side of the assignment) has to generate all those values. Usually you don't need to assign in place at all. Instead just use x = .... It's just as fast and memory efficient.
Using x[:] on the RHS does nothing for you. If x is a list this makes a copy; if x is an array it just returns a view, an array with the same values.
The key question is, what does your f(...) function accept? If it uses operations like +, * and functions like np.sin, you can give it an array, and it will return an array.
But if it only works with scalars (that includes using functions like math.sin), then you have to feed it scalars, i.e. x[i].
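To make the distinction concrete, here is a minimal sketch. The functions f_array and f_scalar and the constants a and b are placeholders invented for illustration; they are not taken from the question.

```python
import math
import numpy as np

def f_array(x):
    # Built only from NumPy operations, so it accepts scalars and arrays alike
    return np.sin(x) ** 2 + 1.0

def f_scalar(x):
    # math.sin only accepts a single number
    return math.sin(x) ** 2 + 1.0

a, b = 0.5, 2.0                  # placeholder constants
x = np.linspace(0.0, 1.0, 5)

# Vectorized: one expression replaces the whole loop
y_vec = f_array(x + a) * b

# Scalar-only function: elements have to be fed one at a time
y_loop = np.array([f_scalar(xi + a) * b for xi in x])
```

Both versions should give the same values; only the first one avoids the Python-level loop.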
Let's try to unpack that comment (which might be better as an edit to the original question)
I have an interval which has to be cut into pieces.
x = np.linspace(start,end,pieceAmount)
function f
quadrature formula
b (weights or factors)
c (function values)
b1*f(x[i]+c1)+...+bn*f(x[i]+cn)
For example
In [1]: x = np.arange(5)
In [2]: b = np.arange(3)
In [6]: c = np.arange(4,7)*.1
We can do the x[i]+c for all x and c with broadcasting
In [7]: xc = x + c[:,None]
In [8]: xc
Out[8]:
array([[ 0.4,  1.4,  2.4,  3.4,  4.4],
       [ 0.5,  1.5,  2.5,  3.5,  4.5],
       [ 0.6,  1.6,  2.6,  3.6,  4.6]])
If f is a function like np.sin that takes any array, we can pass xc to that, getting back a like sized array.
Again with broadcasting we can do the b[n]*f(x[i]+c[n]) calculation
In [9]: b[:,None]* np.sin(xc)
Out[9]:
array([[ 0.        ,  0.        ,  0.        , -0.        , -0.        ],
       [ 0.47942554,  0.99749499,  0.59847214, -0.35078323, -0.97753012],
       [ 1.12928495,  1.99914721,  1.03100274, -0.88504089, -1.98738201]])
and then we can sum, getting back an array shaped just like x:
In [10]: np.sum(_, axis=0)
Out[10]: array([ 1.60871049, 2.99664219, 1.62947489, -1.23582411, -2.96491212])
That's the dot or matrix product:
In [11]: b.dot(np.sin(xc))
Out[11]: array([ 1.60871049, 2.99664219, 1.62947489, -1.23582411, -2.96491212])
And as I noted earlier we can complete the action with
x = b.dot(f(x+c[:,None]))
The key to a simple expression like this is f taking an array.
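As a rough check, the broadcasting version can be wrapped in a small helper and compared against an explicit double loop. The helper name quadrature_sum is made up for this sketch, and it assumes f accepts arrays (np.sin is used here, as in the session above).

```python
import numpy as np

def quadrature_sum(f, x, b, c):
    """Return sum_n b[n] * f(x[i] + c[n]) for every i, without a Python loop."""
    xc = x + c[:, None]   # shape (len(c), len(x)) via broadcasting
    return b.dot(f(xc))   # weighted sum over the c axis, shape (len(x),)

x = np.arange(5)
b = np.arange(3)
c = np.arange(4, 7) * 0.1

vectorized = quadrature_sum(np.sin, x, b, c)

# Reference: the same calculation with explicit loops
looped = np.array([
    sum(b[n] * np.sin(x[i] + c[n]) for n in range(len(b)))
    for i in range(len(x))
])
```

The two results should agree to floating-point precision, and both have the same shape as x.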

Related

Using `numpy.vectorize` to create multidimensional array results in ValueError: setting an array element with a sequence

This problem only seems to arise when my dummy function returns an array and thus, a multidimensional array is being created.
I reduced the issue to the following example:
def dummy(x):
    y = np.array([np.sin(x), np.cos(x)])
    return y
x = np.array([0, np.pi/2, np.pi])
The code I want to optimize looks like this:
y = []
for x_i in x:
    y_i = dummy(x_i)
    y.append(y_i)
y = np.array(y)
So I thought, I could use vectorize to get rid of the slow loop:
y = np.vectorize(dummy)(x)
But this results in
ValueError: setting an array element with a sequence.
Where even is the sequence, which the error is talking about?!
Your function returns an array when given a scalar:
In [233]: def dummy(x):
     ...:     y = np.array([np.sin(x), np.cos(x)])
     ...:     return y
     ...:
In [234]: dummy(1)
Out[234]: array([0.84147098, 0.54030231])
In [235]: f = np.vectorize(dummy)
In [236]: f([0,1,2])
...
ValueError: setting an array element with a sequence.
vectorize constructs an empty result array, and tries to put the result of each calculation in it. But a cell of the target array cannot accept an array.
If we specify an otypes parameter, it does work:
In [237]: f = np.vectorize(dummy, otypes=[object])
In [238]: f([0,1,2])
Out[238]:
array([array([0., 1.]), array([0.84147098, 0.54030231]),
       array([ 0.90929743, -0.41614684])], dtype=object)
That is, each dummy array is put in an element of a shape (3,) result array.
Since the component arrays all have the same shape, we can stack them:
In [239]: np.stack(_)
Out[239]:
array([[ 0.        ,  1.        ],
       [ 0.84147098,  0.54030231],
       [ 0.90929743, -0.41614684]])
But as noted, vectorize does not promise a speedup. I suspect we could also use the newer signature parameter, but that's even slower.
vectorize makes some sense if your function takes several scalar arguments, and you'd like to take advantage of numpy broadcasting when feeding sets of values. But as replacement for a simple iteration over a 1d array, it isn't an improvement.
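For comparison, here is a sketch of the signature variant next to a version that skips vectorize entirely. It reuses dummy and x from the question; the variable names y_loop, y_sig and y_direct are only for illustration, and no timing claims are made here.

```python
import numpy as np

def dummy(x):
    return np.array([np.sin(x), np.cos(x)])

x = np.array([0, np.pi / 2, np.pi])

# Reference: the original loop, shape (3, 2)
y_loop = np.array([dummy(x_i) for x_i in x])

# Option 1: the signature tells vectorize that each call returns a 1-D array
y_sig = np.vectorize(dummy, signature="()->(n)")(x)

# Option 2: no vectorize at all, since np.sin and np.cos already accept arrays
y_direct = np.stack([np.sin(x), np.cos(x)], axis=-1)
```

All three should produce the same (3, 2) array; option 2 is the only one that avoids a Python-level call per element.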
I don't really understand the error either, but with Python 3.6.3 you can just write:
y = dummy(x).T
Because np.sin and np.cos accept arrays, the function is vectorized automatically. The transpose gives the same (3, 2) layout as the loop; without it the result has shape (2, 3).
Also, the official documentation states the following:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
I hope this was at least a little help.

Python equivalent to MATLAB's dynamic array initialization [duplicate]

I want to create an empty array and append items to it, one at a time.
xs = []
for item in data:
    xs.append(item)
Can I use this list-style notation with NumPy arrays?
That is the wrong mental model for using NumPy efficiently. NumPy arrays are stored in contiguous blocks of memory. To append rows or columns to an existing array, the entire array needs to be copied to a new block of memory, creating gaps for the new elements to be stored. This is very inefficient if done repeatedly.
Instead of appending rows, allocate a suitably sized array, and then assign to it row-by-row:
>>> import numpy as np
>>> a = np.zeros(shape=(3, 2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
>>> a[0] = [1, 2]
>>> a[1] = [3, 4]
>>> a[2] = [5, 6]
>>> a
array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])
A NumPy array is a very different data structure from a list and is designed to be used in different ways. Your use of hstack is potentially very inefficient... every time you call it, all the data in the existing array is copied into a new one. (The append function will have the same issue.) If you want to build up your matrix one column at a time, you might be best off to keep it in a list until it is finished, and only then convert it into an array.
e.g.
mylist = []
for item in data:
    mylist.append(item)
mat = numpy.array(mylist)
item can be a list, an array or any iterable, as long as each item has the same number of elements.
In this particular case (data is some iterable holding the matrix columns) you can simply use
mat = numpy.array(data)
(Also note that using list as a variable name is probably not good practice since it masks the built-in type by that name, which can lead to bugs.)
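The following sketch puts the approaches side by side so their results can be compared. The rows data is a placeholder, and the variable names are invented for illustration.

```python
import numpy as np

rows = [[1, 2], [3, 4], [5, 6]]   # placeholder data

# 1. Preallocate, then assign row by row
preallocated = np.zeros((len(rows), 2))
for i, row in enumerate(rows):
    preallocated[i] = row

# 2. Collect in a list, convert once at the end
collected = []
for row in rows:
    collected.append(row)
from_list = np.array(collected, dtype=float)

# 3. Repeated np.append: same result, but the whole array is copied each time
appended = np.empty((0, 2))
for row in rows:
    appended = np.append(appended, [row], axis=0)
```

All three end up with the same (3, 2) array; they differ only in how much copying happens along the way.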
EDIT:
If for some reason you really do want to create an empty array, you can just use numpy.array([]), but this is rarely useful!
To create an empty multidimensional array in NumPy (e.g. a 2D array m*n to store your matrix) when you don't know in advance how many rows m you will append, and you don't care about the computational cost Stephen Simmons mentioned (namely rebuilding the array at each append), you can set the dimension you want to append along to 0: X = np.empty(shape=[0, n]).
This way you can use for example (here m = 10, which we assume we didn't know when creating the empty matrix, and n = 2):
import numpy as np
n = 2
X = np.empty(shape=[0, n])
for i in range(5):
    for j in range(2):
        X = np.append(X, [[i, j]], axis=0)
print(X)
which will give you:
[[ 0. 0.]
[ 0. 1.]
[ 1. 0.]
[ 1. 1.]
[ 2. 0.]
[ 2. 1.]
[ 3. 0.]
[ 3. 1.]
[ 4. 0.]
[ 4. 1.]]
I looked into this a lot because I needed to use a numpy.array as a set in one of my school projects and I needed it to be initialized empty... I didn't find any relevant answer here on Stack Overflow, so I started doodling something.
# Initialize your variable as an empty list first
In [32]: x=[]
# and now cast it as a numpy ndarray
In [33]: x=np.array(x)
The result will be:
In [34]: x
Out[34]: array([], dtype=float64)
Therefore you can directly initialize an np array as follows:
In [36]: x= np.array([], dtype=np.float64)
I hope this helps.
For creating an empty NumPy array without defining its shape you can do the following:
arr = np.array([])
The first one is preferred because you know you will be using this as a NumPy array. NumPy converts this to np.ndarray type afterward, without extra [] 'dimension'.
To add a new element to the array, you can do:
arr = np.append(arr, 'new element')
Note that under the hood there is no such thing as an array without a defined shape. As @hpaulj mentioned, this also creates a rank-one array.
You can use the append function. For rows:
>>> import numpy as np
>>> a = np.array([[10, 20, 30]])  # must be 2-D to append along an axis
>>> a = np.append(a, [[1, 2, 3]], axis=0)
>>> a
array([[10, 20, 30],
       [ 1,  2,  3]])
For columns:
>>> np.append(a, [[15], [15]], axis=1)
array([[10, 20, 30, 15],
       [ 1,  2,  3, 15]])
EDIT
Of course, as mentioned in other answers, unless you're doing some processing (ex. inversion) on the matrix/array EVERY time you append something to it, I would just create a list, append to it then convert it to an array.
Here is a workaround to make NumPy arrays behave more like lists:
np_arr = np.array([])
np_arr = np.append(np_arr, 2)
np_arr = np.append(np_arr, 24)
print(np_arr)
OUTPUT: [ 2. 24.]
If you absolutely don't know the final size of the array, you can increment the size of the array like this:
my_arr = numpy.zeros((0, 5))
for i in range(3):
    my_arr = numpy.concatenate((my_arr, numpy.ones((1, 5))))
print(my_arr)
[[ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]]
Notice the 0 in the first line.
numpy.append is another option. It calls numpy.concatenate.
You can apply it to build any kind of array, like zeros:
a = range(5)
a = [i*0 for i in a]
print(a)
[0, 0, 0, 0, 0]
Depending on what you are using this for, you may need to specify the data type (see 'dtype').
For example, to create a 2D array of 8-bit values (suitable for use as a monochrome image):
myarray = numpy.empty(shape=(H,W),dtype='u1')
For an RGB image, include the number of color channels in the shape: shape=(H,W,3)
You may also want to consider zero-initializing with numpy.zeros instead of using numpy.empty. See the note here.
Another simple way to create an empty array whose elements can themselves hold arrays is:
import numpy as np
np.empty((2,3), dtype=object)
I think you want to handle most of the work with lists and then use the result as a matrix. Maybe this is a way:
ur_list = []
for col in columns:
    ur_list.append(list(col))
mat = np.matrix(ur_list)
I think you can create empty numpy array like:
>>> import numpy as np
>>> empty_array= np.zeros(0)
>>> empty_array
array([], dtype=float64)
>>> empty_array.shape
(0,)
This format is useful when you want to append numpy array in the loop.
Perhaps what you are looking for is something like this:
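x = np.array([])
In this way you can create an array without any elements. It is similar to:
x = []
Note that np.array(0) would not do this: it creates a zero-dimensional array holding the single value 0. With the empty array you will be able to append new elements afterwards using np.append, which returns a new array each time.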
The simplest way
Input:
import numpy as np
n_cols = 3  # placeholder: must equal new_data.shape[1]
data = np.zeros((0, n_cols), dtype=float)  # (rows, cols)
data.shape
Output:
(0, 3)
Input:
for i in range(n_files):
    data = np.append(data, new_data, axis=0)

Sorting a NumPy array and permuting another one along with it

I have two NumPy arrays. The first, A, is one-dimensional; the second, B, is two-dimensional in the application I have in mind, but really could have any dimension. Every single index of B covers the same range as the single index of A.
Now, I'd like to sort A (in descending order) but would like to permute every dimension of B along with it. Mathematically speaking, if P is the permutation matrix that sorts A, I would like to transform B according to np.dot(P, np.dot(B, P.T)). E.g. consider this example where sorting coincidentally corresponds to reversing the order:
In [1]: import numpy as np
In [2]: A = np.array([1,2,3])
In [3]: B = np.random.rand(3,3); B
Out[3]:
array([[ 0.67402953,  0.45017072,  0.24324747],
       [ 0.40559793,  0.79007712,  0.94247771],
       [ 0.47477422,  0.27599007,  0.13941255]])
In [4]: # desired output:
In [5]: A[::-1]
Out[5]: array([3, 2, 1])
In [6]: B[::-1,::-1]
Out[6]:
array([[ 0.13941255,  0.27599007,  0.47477422],
       [ 0.94247771,  0.79007712,  0.40559793],
       [ 0.24324747,  0.45017072,  0.67402953]])
The application I have in mind is to obtain eigenvalues and eigenvectors of a nonsymmetric matrix using np.linalg.eig (in contrast to eigh, eig does not guarantee any ordering of the eigenvalues), sort them by absolute value, and truncate the space. It would be beneficial to permute the components of the matrix holding the eigenvectors along with the eigenvalues and perform the truncation by slicing it.
You can use np.argsort to get sorted indices of A. Then you can use these indices to rearrange B.
It is not entirely clear how you want to rearrange B...
p = np.argsort(A)
B[:, p][p, :] # rearrange rows and column of B
B.transpose(p) # rearrange dimensions of B
If you want to order eigenvectors according to eigenvalues, you should only rearrange the columns of the eigenvectors:
(Also, it may make sense to use the absolute value, in case you get complex eigenvalues)
e, v = np.linalg.eig(x)
p = np.argsort(np.abs(e))[::-1] # descending order
v = v[:, p]
You can use numpy.argsort to get the index mapping. For example:
test=np.array([2,1,3])
test_array=np.array([[2,3,4],[1,2,3]])
rearranged_array=test_array[:,test.argsort()]
Here, test.argsort() yields [1,0,2].
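To tie this back to the permutation-matrix formulation in the question, here is a small sketch that sorts A in descending order, permutes the rows and columns of B with the same index array, and cross-checks the result against P B P^T. The sample values of A and B are made up for illustration.

```python
import numpy as np

A = np.array([2.0, 3.0, 1.0])
B = np.arange(9.0).reshape(3, 3)

p = np.argsort(A)[::-1]     # indices that sort A in descending order
A_sorted = A[p]
B_perm = B[np.ix_(p, p)]    # permute rows and columns together

# Cross-check against the permutation-matrix form P @ B @ P.T
P = np.eye(len(A))[p]       # row i of P selects element p[i]
B_check = P @ B @ P.T
```

np.ix_(p, p) builds the open mesh needed to index rows and columns at once; B[p][:, p] gives the same result in two steps.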

numpy.nanmean of subsets of elements

I want to take subsets of elements and quickly apply nanmean to the associated columns, without looping.
For specificity, consider the reduction array r=[0,2,3], and the data array
a = np.array([
    [2, 3, 4],
    [3, np.nan, 5],
    [16, 66, 666],
    [2, 2, 5],
    [np.nan, 3, 4],
    [np.nan, 4, 5],
    [np.nan, 5, 4],
    [3, 6, 4.5],
])
then I want to get back
b = np.array([
    [2.5, 3, 4.5],
    [16, 66, 666],
    [2.5, 4, 4.5],
])
The top answer to this question solves the problem (for a single column) by using reduceat. Unfortunately for me, since nanmean is not a ufunc that trick does not work.
I don't think there's a one-liner to do this, because there are no nan-aware ufuncs in numpy.
But you can do something based on reduceat, after (temporarily) replacing all the nans in a:
For example, here's a quick function that accomplishes what you want:
def nanmean_reduceat(x, indices):
    mask = np.isnan(x)
    # use try-finally to make sure x is reset
    # to its original state even if an error is raised.
    try:
        x[mask] = 0
        return np.add.reduceat(x, indices) / np.add.reduceat(~mask, indices)
    finally:
        x[mask] = np.nan
then you can call
>>> nanmean_reduceat(a, [0, 2, 3])
array([[   2.5,    3. ,    4.5],
       [  16. ,   66. ,  666. ],
       [   2.5,    4. ,    4.5]])
Hope that helps!
Edit: for brevity, I removed the empty except block and moved the return statement inside the try block. Because of the way finally statements work, the resetting of x is still executed!
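If modifying x in place, even temporarily, is a concern, a variant that works on a copy is possible at the cost of extra memory. This is only a sketch of that alternative: the name nanmean_reduceat_copy is invented here, and the mask is cast to int explicitly so the counts do not depend on how booleans are summed.

```python
import numpy as np

def nanmean_reduceat_copy(x, indices):
    """Segment-wise nanmean along axis 0 that leaves x untouched."""
    mask = np.isnan(x)
    filled = np.where(mask, 0.0, x)   # NaNs contribute 0 to the sums
    sums = np.add.reduceat(filled, indices)
    # number of non-NaN entries in each segment
    counts = np.add.reduceat((~mask).astype(int), indices)
    return sums / counts

a = np.array([
    [2, 3, 4],
    [3, np.nan, 5],
    [16, 66, 666],
    [2, 2, 5],
    [np.nan, 3, 4],
    [np.nan, 4, 5],
    [np.nan, 5, 4],
    [3, 6, 4.5],
])
result = nanmean_reduceat_copy(a, [0, 2, 3])
```

A segment that contains only NaNs in some column would divide 0 by 0 and yield nan there (with a runtime warning), which matches what a mean over no valid values should report.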

Getting index of numpy.ndarray

I have a one-dimensional array of the type numpy.ndarray and I want to know the index of its max entry. After finding the max, I used
peakIndex = numpy.where(myArray==max)
to find the peak's index. But instead of the index, my script spits out
peakIndex = (array([1293]),)
I want my code to spit out just the integer 1293. How can I clean up the output?
Rather than using numpy.where, you can use numpy.argmax.
peakIndex = numpy.argmax(myArray)
numpy.argmax returns a single number, the flattened index of the first occurrence of the maximum value. If myArray is multidimensional you might want to convert the flattened index to an index tuple:
peakIndexTuple = numpy.unravel_index(numpy.argmax(myArray), myArray.shape)
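A short sketch of both cases, using made-up sample arrays (my_array and grid are placeholders):

```python
import numpy as np

# 1-D case: argmax gives the index of the first maximum
my_array = np.array([0.2, 1.6, 0.9, 1.6])
peak_index = int(np.argmax(my_array))

# 2-D case: argmax indexes the flattened array, unravel_index converts it back
grid = np.array([[0.0, 1.1, 1.5],
                 [1.1, 1.7, 0.7]])
flat_index = np.argmax(grid)
row, col = np.unravel_index(flat_index, grid.shape)
```

Wrapping the result in int() is optional; it just turns the NumPy integer into a plain Python int.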
To find the max value of an array, you can use the array.max() method. This will probably be more efficient than the for loop described in another answer, which, in addition to not being Pythonic, isn't actually written in Python. (If you wanted to take items out of the array one by one to compare, you could use ndenumerate, but you'd be sacrificing some of the performance benefits of arrays.)
The reason that numpy.where() yields results as tuples is that more than one position could be equal to the max, and it's that edge case that would make something simple (like taking array[0]) prone to bugs. Per Is there a Numpy function to return the first index of something in an array?,
"The result is a tuple with first all the row indices, then all the column indices".
Your example uses a 1-D array, so you'd get the results you want directly from the array provided. It's a tuple with one element (one array of indices), and although you can iterate over ind_1d[0] directly, I converted it to a list solely for readability.
>>> peakIndex_1d
array([ 1. ,  1.1,  1.6,  1. ,  1.6,  0.8])
>>> ind_1d = numpy.where(peakIndex_1d == peakIndex_1d.max())
>>> ind_1d
(array([2, 4]),)
>>> ind_1d[0].tolist()
[2, 4]
For a 2-D array with 3 values equal to the max, you could use:
>>> peakIndex
array([[ 0. ,  1.1,  1.5],
       [ 1.1,  1.5,  0.7],
       [ 0.2,  1.2,  1.5]])
>>> indices = numpy.where(peakIndex == peakIndex.max())
>>> ind2d = list(zip(indices[0].tolist(), indices[1].tolist()))
>>> ind2d
[(0, 2), (1, 1), (2, 2)]
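As an alternative to zipping the index arrays, numpy.argwhere returns one row of coordinates per match. This is a sketch using the same 2-D values; the variable names are chosen for illustration.

```python
import numpy as np

peak = np.array([[0.0, 1.1, 1.5],
                 [1.1, 1.5, 0.7],
                 [0.2, 1.2, 1.5]])

# One [row, col] pair per position that equals the maximum
positions = np.argwhere(peak == peak.max()).tolist()
```

Here positions is a list of [row, col] lists, one for each of the three maxima.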
