numpy padding matrix of different row size - python

I have a numpy array whose rows have different sizes
a = np.array([[1,2,3,4,5],[1,2,3],[1]])
and I would like to turn it into a dense matrix (fixed n x m size, no variable-length rows). So far I have tried something like this
size = (len(a),5)
result = np.zeros(size)
result[[0],[len(a[0])]]=a[0]
But I receive an error telling me
shape mismatch: value array of shape (5,) could not be broadcast to
indexing result of shape (1,)
I also tried padding with np.pad, but according to the numpy.pad documentation it seems I need to specify the existing size of the rows in pad_width. That size is variable, and trying -1, 0, and the largest row size all produced errors.
I know I can do it by padding the lists row by row, as shown here, but I need to do this with a much bigger array of data.
If someone can help me with the answer to this question, I would be glad to know of it.

There's really no way to pad a jagged array such that it would lose its jaggedness without iterating over the rows of the array. You'll even have to iterate over the array twice: once to find the maximum length you need to pad to, and once more to do the actual padding.
The code proposal you've linked to will get the job done, but it's not very efficient, because it adds zeroes in a python for-loop that iterates over the elements of the rows, whereas that appending could have been precalculated, thereby pushing more of that code to C.
The code below precomputes an array of the required minimal dimensions, filled with zeroes and then simply adds the row from the jagged array M in place, which is far more efficient.
import random
import numpy as np
M = [[random.random() for n in range(random.randint(0,m))] for m in range(10000)] # play-data
def pad_to_dense(M):
    """Appends the minimal required amount of zeroes at the end of each
    array in the jagged array `M`, such that `M` loses its jaggedness."""
    maxlen = max(len(r) for r in M)
    Z = np.zeros((len(M), maxlen))
    for enu, row in enumerate(M):
        Z[enu, :len(row)] += row
    return Z
To give you some idea for speed:
from timeit import timeit
n = [10, 100, 1000, 10000]
s = [timeit(stmt='Z = pad_to_dense(M)', setup='from __main__ import pad_to_dense; import numpy as np; from random import random, randint; M = [[random() for n in range(randint(0,m))] for m in range({})]'.format(ni), number=1) for ni in n]
print('\n'.join(map(str,s)))
# 7.838103920221329e-05
# 0.0005027339793741703
# 0.01208890089765191
# 0.8269036808051169
If you want to prepend zeroes to the arrays, rather than append, that's a simple enough change to the code, which I'll leave to you.
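For reference, one way that change could look. This is a minimal sketch rather than the answerer's own code: `pad_to_dense_left` is a hypothetical name, and it assumes every row is a flat sequence of numbers.

```python
import numpy as np

def pad_to_dense_left(M):
    """Prepend zeroes to each row of the jagged list `M` so that all
    rows end up as long as the longest row."""
    maxlen = max((len(r) for r in M), default=0)
    Z = np.zeros((len(M), maxlen))
    for i, row in enumerate(M):
        if len(row):  # nothing to copy for an empty row
            # Write the row into the right-hand end of the output row.
            Z[i, maxlen - len(row):] = row
    return Z

padded = pad_to_dense_left([[1, 2, 3, 4, 5], [1, 2, 3], [1]])
print(padded)
```

The only difference from appending is the slice: the row is written to the last `len(row)` columns instead of the first.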

You can do something like this with numpy.pad
import numpy as np
# dtype=object is required for ragged input on recent NumPy versions
a = np.array([[1,2,3,4,5],[1,2,3],[1]], dtype=object)
l = np.array([len(a[i]) for i in range(len(a))])
width = l.max()
b=[]
for i in range(len(a)):
    if len(a[i]) != width:
        # pad on the right with zeroes up to the longest row
        x = np.pad(a[i], (0,width-len(a[i])), 'constant',constant_values = 0)
    else:
        x = a[i]
    b.append(x)
b = np.array(b)
print(b)
Above piece of code outputs something like this.
b = [[1, 2, 3, 4, 5],
[1, 2, 3, 0, 0],
[1, 0, 0, 0, 0]]
You can read back your input version of data by doing something as follows
a = []
for i in range(len(b)):
    # keep only the first l[i] entries, i.e. the unpadded part of row i
    a.append(b[i][0:l[i]])
a = np.array(a, dtype=object)
print(a)
where you get the following output
a = array([array([1, 2, 3, 4, 5]), array([1, 2, 3]), array([1])], dtype=object)
Hopefully this helps someone who struggled like me to solve the issue.
Thank you.
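A boolean mask is another way to do the same padding and round trip without calling np.pad once per row. This is a sketch of a different technique from the one above, not a rewrite of it: `pad_with_mask` is a hypothetical helper, and it assumes at least one row and numeric entries.

```python
import numpy as np

def pad_with_mask(rows, fill=0):
    """Pad a jagged list of rows on the right using a boolean mask."""
    lengths = np.array([len(r) for r in rows])
    # mask[i, j] is True where row i has a real value in column j.
    mask = np.arange(lengths.max()) < lengths[:, None]
    out = np.full(mask.shape, fill, dtype=float)
    # Boolean assignment fills the True positions in row-major order,
    # which matches the order of the concatenated rows.
    out[mask] = np.concatenate(rows)
    return out, lengths

b, lengths = pad_with_mask([[1, 2, 3, 4, 5], [1, 2, 3], [1]])
# Recover the original jagged rows from the padded matrix.
restored = [b[i, :n].tolist() for i, n in enumerate(lengths)]
print(b)
print(restored)
```

Keeping `lengths` around is what makes the padding reversible, just as `l` does in the loop version.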

Related

Is there a difference in the way we access elements of a list comprehension and the elements of a numpy array

I am working on a genetic algorithm code. I am fairly new to python.
My code snippet is as follows:
import numpy as np
pop_size = 10 # Population size
noi = 2 # Number of Iterations
M = 2 # Number of Phases in the Data
alpha = [np.random.randint(0, 64, size = pop_size)]* M
phi = [np.random.randint(0, 64, size = pop_size)]* M
reduced_tensor = [np.zeros((pop_size,3,3))]* M
for n_i in range(noi):
    alpha_en = [(2*np.pi*alpha/63.00) for alpha in alpha]
    phi_en = [(phi/63.00) for phi in phi]
    for i in range(M):
        for j in range(pop_size):
            reduced_tensor[i][j] = [[1, 0, 0],
                                    [0, phi_en[i][j], 0],
                                    [0, 0, 0]]
Here I have a list of numpy arrays. The variable 'alpha' is a list containing two numpy arrays. How do I use list comprehension in this case? I want to create a similar list 'alpha_en' which operates on every element of alpha. How do I do that? I know my current code is wrong, it was just trial and error.
What does 'for alpha in alpha' mean (line 11)? This line doesn't give any error, but also doesn't give the desired output. It changes the dimension and value of alpha.
The variable 'reduced_tensor' is a list of an array of 3x3 matrix, i.e., four dimensions in total. How do I differentiate between the indexing of a list comprehension and a numpy array? I want to perform various operations on a list of matrices, in this case, assign the values of phi_en to one of the elements of the matrix reduced_tensor (as shown in the code). How should I do it efficiently? I think my current code is wrong, if not just confusing.
There's some questionable programming in these 2 lines
alpha = [np.random.randint(0, 64, size = pop_size)]* M
...
alpha_en = [(2*np.pi*alpha/63.00) for alpha in alpha]
The first makes an array, and then makes a list with M pointers to the same thing. Note: these are not M copies of the random array. If I were to change one element of alpha, I'd change them all. I don't see the point of this type of construction.
The [... for alpha in alpha] works because the 2 uses of alpha are different. At least in newer Pythons the i in [i*3 for i in range(3)] does not 'leak out' of the comprehension. That said, I would not approve of that variable naming. At the very least it is confusing to readers.
The arrays in alpha_en are separate. Values are derived from the array in alpha, but they are new.
for a in alphas:
    a *= 2
would modify each array in alphas; however, due to how alphas is constructed, this ends up multiplying the same array many times.
reduced_tensor = [np.zeros((pop_size,3,3))]* M
has the same problem; it's a list of M references to the same 3d array.
reduced_tensor[i][j]
references the i-th reference in that list, and the j-th 'row' of that array. I like to use
reduced_tensor[i][j,:,:]
to make it clearer to me and my reader the expected dimensions of the result.
The iteration over M does nothing for you; it just repeats the same assignment M times.
At the root of your problems is that use of list replication.
In [30]: x=[np.arange(3)]*3
In [31]: x
Out[31]: [array([0, 1, 2]), array([0, 1, 2]), array([0, 1, 2])]
In [32]: [id(i) for i in x]
Out[32]: [3036895536, 3036895536, 3036895536]
In [33]: x[0] *= 10
In [34]: x
Out[34]: [array([ 0, 10, 20]), array([ 0, 10, 20]), array([ 0, 10, 20])]
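Two ways around the replication problem, as a hedged sketch. The names and sizes follow the question; holding everything in single arrays with a leading axis of size M is an assumption about what the asker wants, not something stated in the original.

```python
import numpy as np

M, pop_size = 2, 10

# A list comprehension calls the constructor M times, giving M separate arrays.
alpha = [np.random.randint(0, 64, size=pop_size) for _ in range(M)]
print(alpha[0] is alpha[1])

# Alternatively, keep everything in single arrays with a leading axis of size M.
phi = np.random.randint(0, 64, size=(M, pop_size))
phi_en = phi / 63.0                      # elementwise, shape (M, pop_size)

reduced_tensor = np.zeros((M, pop_size, 3, 3))
reduced_tensor[:, :, 0, 0] = 1           # top-left entry of every 3x3 matrix
reduced_tensor[:, :, 1, 1] = phi_en      # middle entry, one value per (i, j)
```

With the 4-d array, the double loop over i and j disappears and each assignment is a single NumPy operation.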

How to create an array of dimension n+1 given a function returning an array of dimension n

With numpy and python3 I have the following problem:
I have a function which returns a 2 dimensional array of integers of fixed size (2x3 in this case). What is the most idiomatic way to run this function n times and stack these together to a 3 dimensional 2x3xn array? What about performance? Something which only does the minimum number of allocations would be nice.
You are probably looking for np.dstack:
>>> import numpy as np
>>> arrs = [np.random.rand(2, 3) for x in range(5)]
>>> np.dstack(arrs).shape
(2, 3, 5)
If you know the final shape you can do something like the following:
>>> out = np.empty((2, 3, 5))
>>> out[..., 0] = np.random.rand(2, 3)
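Filling in the rest of that pattern could look like the sketch below. `make_block` is a hypothetical stand-in for the function in the question; only the preallocate-then-assign idea comes from the answer.

```python
import numpy as np

def make_block(i):
    """Hypothetical stand-in for the function returning a 2x3 integer array."""
    return np.full((2, 3), i)

n = 5
out = np.empty((2, 3, n), dtype=int)   # one allocation for the final result
for i in range(n):
    out[..., i] = make_block(i)        # copy each 2x3 block into slice i

# Same result built from a list; this keeps all n blocks alive until stacking.
stacked = np.stack([make_block(i) for i in range(n)], axis=-1)
print(out.shape, stacked.shape)
```

The preallocated version holds only one temporary 2x3 block at a time, which matters when n is large.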

My numpy array always ends in zero?

I think I missed something somewhere. I filled a numpy array using two for loops (x and y) and a function based on the x,y position. The only problem is that the values in the array always end in zero, regardless of the size of the array.
thetamap = numpy.zeros(36, dtype=float)
thetamap.shape = (6, 6)
for y in range(0,5):
    for x in range(0,5):
        thetamap[x][y] = x+y
print(thetamap)
range(0, 5) produces 0, 1, 2, 3, 4. The endpoint is always omitted. You want simply range(6).
Better yet, use the awesome power of NumPy to make the array in one line:
thetamap = np.arange(6) + np.arange(6)[:,None]
This makes a row vector and a column vector, then adds them together using NumPy broadcasting to make a matrix.
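To see that the broadcast version matches the corrected loop, here is a small check. It is a sketch that uses the 6x6 size from the question.

```python
import numpy as np

row = np.arange(6)            # shape (6,)
col = np.arange(6)[:, None]   # shape (6, 1)
thetamap = row + col          # broadcast to shape (6, 6)

# The loop from the question with range(6), for comparison.
looped = np.zeros((6, 6))
for y in range(6):
    for x in range(6):
        looped[x][y] = x + y

print(np.array_equal(thetamap, looped))
```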

Index the middle of a numpy array?

To index the middle points of a numpy array, you can do this:
x = np.arange(10)
middle = x[len(x)//4:len(x)*3//4]
Is there a shorthand for indexing the middle of the array? e.g., the n or 2n elements closest to len(x)/2? Is there a nice n-dimensional version of this?
As cge said, the simplest way is to turn it into a lambda function, like so:
x = np.arange(10)
middle = lambda x: x[len(x)//4:len(x)*3//4]
or the n-dimensional way is:
middle = lambda x: x[tuple(slice(int(np.floor(d/4.)), int(np.ceil(3*d/4.))) for d in x.shape)]
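On Python 3 with current NumPy, slice bounds have to be integers and a multi-axis index has to be a tuple. A named-function version of the n-dimensional idea, as a sketch using integer arithmetic only (`middle` here is a hypothetical helper):

```python
import numpy as np

def middle(x):
    """Return the central half of `x` along every axis."""
    # d // 4 rounds the start down; -(-3 * d // 4) rounds the stop up.
    index = tuple(slice(d // 4, -(-3 * d // 4)) for d in x.shape)
    return x[index]

one_d = middle(np.arange(10))
two_d = middle(np.arange(100).reshape(10, 10))
print(one_d)
print(two_d.shape)
```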
Late, but for everyone else running into this issue:
A much smoother way is to use numpy's take or put.
To address the middle of an array, you can use put to index an n-dimensional array with a single index. The same goes for getting values from an array with take.
Assuming your array has an odd number of elements, the middle of the array will be at half of its size. Using integer division (// instead of /) avoids any problems here.
import numpy as np
arr = np.array([[0, 1, 2],
                [3, 4, 5],
                [6, 7, 8]])
# put a value to the center
np.put(arr, arr.size // 2, 999)
print(arr)
# take a value from the center
center = np.take(arr, arr.size // 2)
print(center)
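The flat center index can also be written as one index per axis. A small sketch on a 3x5 array (chosen here only for illustration) shows that the two agree when every axis has odd length. With an even-length axis there is no single center element and the two can differ.

```python
import numpy as np

arr = np.arange(15).reshape(3, 5)

# Flat index of the center, as used with np.take / np.put.
flat_center = arr.size // 2
# The same position expressed as one index per axis.
nd_center = tuple(s // 2 for s in arr.shape)

print(np.take(arr, flat_center), arr[nd_center])
print(np.unravel_index(flat_center, arr.shape))
```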

Convert a 1D array to a 2D array in numpy

I want to convert a 1-dimensional array into a 2-dimensional array by specifying the number of columns in the 2D array. Something that would work like this:
> import numpy as np
> A = np.array([1,2,3,4,5,6])
> B = vec2matrix(A,ncol=2)
> B
array([[1, 2],
       [3, 4],
       [5, 6]])
Does numpy have a function that works like my made-up function "vec2matrix"? (I understand that you can index a 1D array like a 2D array, but that isn't an option in the code I have - I need to make this conversion.)
You want to reshape the array.
B = np.reshape(A, (-1, 2))
where -1 infers the size of the new dimension from the size of the input array.
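A short sketch of what -1 does, including the failure case when the array size is not divisible by the fixed dimension:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6])
B = np.reshape(A, (-1, 2))     # -1 is resolved to 6 / 2 = 3 rows
print(B.shape)

# Seven elements cannot fill rows of two, so reshape raises ValueError.
try:
    np.reshape(np.arange(7), (-1, 2))
    divisible = True
except ValueError:
    divisible = False
print(divisible)
```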
You have two options:
If you no longer want the original shape, the easiest is just to assign a new shape to the array
a.shape = (a.size//ncols, ncols)
You can replace a.size//ncols with -1 to compute the proper shape automatically. Make sure that a.shape[0]*a.shape[1]=a.size, else you'll run into problems.
You can get a new array with the np.reshape function, which works mostly like the version presented above
new = np.reshape(a, (-1, ncols))
When it's possible, new will be just a view of the initial array a, meaning that the data are shared. In some cases, though, the new array will be a copy instead. Note that np.reshape also accepts an optional keyword order that lets you switch from row-major C order to column-major Fortran order. np.reshape is the function version of the a.reshape method.
If you can't respect the requirement a.shape[0]*a.shape[1]=a.size, you're stuck with having to create a new array. You can use the np.resize function and mix it with np.reshape, such as
>>> a =np.arange(9)
>>> np.resize(a, 10).reshape(5,2)
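The view-versus-copy behaviour, the order keyword, and np.resize can be checked directly. This is a sketch; the transposed array is just one convenient example of an input that forces a copy.

```python
import numpy as np

a = np.arange(6)

view = np.reshape(a, (-1, 2))                 # C order: fill row by row
fortran = np.reshape(a, (-1, 2), order="F")   # Fortran order: fill column by column
print(np.shares_memory(a, view))              # contiguous input, so data is shared

# A transposed array is not contiguous, so flattening it has to copy.
t = np.arange(6).reshape(2, 3).T
flat = np.reshape(t, -1)
print(np.shares_memory(t, flat))

# np.resize repeats the input to reach the requested size.
padded = np.resize(np.arange(9), 10).reshape(5, 2)
print(padded)
```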
Try something like:
B = np.reshape(A,(-1,ncols))
You'll need to make sure that you can divide the number of elements in your array by ncols though. You can also play with the order in which the numbers are pulled into B using the order keyword.
If your sole purpose is to convert a 1d array X to a 2d array just do:
X = np.reshape(X,(1, X.size))
Convert a 1-dimensional array into a 2-dimensional array by adding a new axis.
a=np.array([10,20,30,40,50,60])
b=a[:,np.newaxis]  # b is now two-dimensional, with shape (6, 1)
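Where the new axis goes decides whether the result is a column or a row. A brief sketch of both, plus the equivalent reshape call:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50, 60])
column = a[:, np.newaxis]   # shape (6, 1): one element per row
row = a[np.newaxis, :]      # shape (1, 6): a single row
same = a.reshape(-1, 1)     # equivalent to the column form
print(column.shape, row.shape, same.shape)
```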
There is a simple way as well, we can use the reshape function in a different way:
A_reshape = A.reshape(No_of_rows, No_of_columns)
To go the other way, from 2D back to 1D, you can use flatten() from the numpy package.
import numpy as np
a = np.array([[1, 2],
              [3, 4],
              [5, 6]])
a_flat = a.flatten()
print(f"original array: {a} \nflattened array = {a_flat}")
Output:
original array: [[1 2]
 [3 4]
 [5 6]]
flattened array = [1 2 3 4 5 6]
some_array.shape = (1,)+some_array.shape
or get a new one
another_array = numpy.reshape(some_array, (1,)+some_array.shape)
This increases the number of dimensions by one, which is equivalent to adding one more pair of outermost brackets.
Change 1D array into 2D array without using Numpy.
l = [i for i in range(1,21)]
part = 3
new = []
start, end = 0, part
while end <= len(l):
    temp = []
    for i in range(start, end):
        temp.append(l[i])
    new.append(temp)
    start += part
    end += part
print("new values: ", new)
# for uneven cases: collect the leftover elements into one final, shorter row
temp = []
while start < len(l):
    temp.append(l[start])
    start += 1
if temp:
    new.append(temp)
print("new values for uneven cases: ", new)
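The same chunking can be written with list slicing, which handles the uneven tail without a second loop. A minimal sketch using the same `l` and `part` values:

```python
l = list(range(1, 21))
part = 3
# Each slice takes up to `part` items; the last one is shorter if needed.
new = [l[i:i + part] for i in range(0, len(l), part)]
print(new)
```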
import numpy as np
array = np.arange(8)
print("Original array : \n", array)
array = np.arange(8).reshape(2, 4)
print("New array : \n", array)
