Create a Numpy array from parts of other arrays in Python - python

EDIT
So I just understand now that Python and Numpy just don't do polymorphism very well. Given the same data, it has to be put into the right form before most functions can use it. So expecting Python to be able to 'one-line' something is beyond its capabilities.
Just for comparison, MATLAB doesn't do polymorphism very well either but it's much less noticeable as by default it keeps all numeric data as a 2D array so the majority of functions will work with the majority of data - making it soooo much easier
EDIT
I'm pretty new to Python and struggling to create new arrays out of existing arrays:
In MATLAB the following works to create a column vector from other column vectors:
a = [b(5:6); c(N:M); d(1:P); e(Q)]
with a lot of computational flexibility (Q could be a vector for example).
In Python, I can't find a nice command to add multiple 1D NumPy arrays together and it seems to have lots of issues with single values as it changes them from NumPy arrays to some other format, WHY?!
Can anyone give me a single line of code to carry out the above? It'd be great to see - so far all I've got is lines and lines of checking for the indexing variables (N, M, P, Q) and soooo many np.array(..)'s everywhere to try and keep things the same data type.
I've tried np.append but that doesn't work for multiple vectors (I could nest them but that seems very ugly, esp if I need to add many arrays) and np.concatenate complains that something is 0-dimensional, I don't understand that at all.

concatenate has no problems with a bunch of 1d array:
In [52]: np.concatenate([np.array([1,2,3]), np.ones((2,)), np.array([1])])
Out[52]: array([1., 2., 3., 1., 1., 1.])
If one argument is scalar:
In [53]: np.concatenate([np.array([1,2,3]), np.ones((2,)), 1])
Traceback (most recent call last):
File "<ipython-input-53-51b00c09f677>", line 1, in <module>
np.concatenate([np.array([1,2,3]), np.ones((2,)), 1])
File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 2 has 0 dimension(s)
It make a array from the last
In [57]: np.array(1)
Out[57]: array(1)
but that's a 0d array. In MATLAB that would be 2d - everything is 2d, there's not true scalars. Remember, numpy works in python, which has scalars and lists. MATLAB is matrices all the way down ...
Also numpy arrays can be 0d or 1d. There's no artificial 2d lower bound. It's a general array language, not just matrices. In MATLAB even 3d is a tweak on the original 2d.
hstack adds a tweak to make sure all arguments are at least 1d:
In [54]: np.hstack([np.array([1,2,3]), np.ones((2,)), 1])
Out[54]: array([1., 2., 3., 1., 1., 1.])
Even in MATLAB/Octave mismatched dimensions give problems:
>> a = [3:5; [1,2,3]]
a =
3 4 5
1 2 3
>> a = [3:5; [1,2,3]; 4]
error: vertical dimensions mismatch (2x3 vs 1x1)
>> a = [3:5; [1,2,3,4]]
error: vertical dimensions mismatch (1x3 vs 1x4)
>> a = [3:5, [1,2,3,4]]
a =
3 4 5 1 2 3 4
>> a = [3:5, [1,2,3,4],5]
a =
3 4 5 1 2 3 4 5

you would want to look on
np.concatenate
which take sequence of arrays and axis to concatnate them on

Related

Non-consecutive slicing of a multidimensional array in Python

I am trying to perform non-consectuitive slicing of a multidimensional array like this (Matlab peudo code)
A = B(:,:,[1,3],[2,4,6]) %A and B are two 4D matrices
But when I try to write this code in Python:
A = B[:,:,np.array([0,2]),np.array([1,3,5])] #A and B are two 4D arrays
it gives an error: IndexError: shape mismatch: indexing arrays could not be broadcast...
It should be noted that slicing for one dimension each time works fine!
In numpy, if you use more than one fancy index (i.e. array) to index different dimension of the same array at the same time, they must broadcast. This is designed such that indexing can be more powerful. For your situation, the simplest way to solve the problem is indexing twice:
B[:, :, [0,2]] [..., [1,3,5]]
where ... stands for as many : as possible.
Indexing twice this way would generate some extra data moving time. If you want to index only once, make sure they broadcast (i.e. put fancy indices on different dimension):
B[:, :, np.array([0,2])[:,None], [1,3,5]]
which will result in a X by Y by 2 by 3 array. On the other hand, you can also do
B[:, :, [0,2], np.array([1,3,5])[:,None]]
which will result in a X by Y by 3 by 2 array. The [1,3,5] axis is transposed before the [0,2] axis.
Yon don't have to use np.array([0,2]) if you don't need to do fancy operation with it. Simply [0,2] is fine.
np.array([0,2])[:,None] is equivalent to [[0],[2]], where the point of [:,None] is to create an extra dimension such that the shape becomes (2,1). Shape (2,) and (3,) cannot broadcast, while shape (2,1) and (3,) can, which becomes (2,3).

How to assign a 1D numpy array of length x to an element of length y of a 2D Numpy Array?

I'm looking for a way to assign a 1D numpy-array consisting of x elements to a 2D numpy Array of shape (y,z).
Example:
A=np.array([[0],[0],[0]])
A[2]=np.array([0,2])
Which should result in
A=[[0],[0],[0,2]]
This works perfectly fine using a python list, but has been causing me huge trouble when trying to do it in numpy, usually resulting in the error message:
could not broadcast input array from shape (z) into shape (x)
This seems to occur as a result of the fact that numpy copies everything instead of modifying the array in place. I have only recently begun using numpy and would really be grateful if someone could help find a way to do this efficiently.
Actually the issue is that Numpy refuses to perform implicit copies or reshapes. For instance:
>>> A=np.array([[0],[0],[0]])
>>> A[2]=np.array([0,2])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not broadcast input array from shape (2) into shape (1)
Here A[2] is a subarray of A, of shape 1. 2 cells can't fit in 1, so we get shape error. The reverse situation is possible and known as broadcasting:
>>> A[0:2]=5
>>> A
array([[5],
[5],
[0]])
Here a single scalar has been broadcast to update the entire subarray. We can resize A to be able to fit the shape 2 entry:
>>> A.shape
(3, 1)
>>> A.resize((3,2))
>>> A.shape
(3, 2)
>>> A[2]=np.array([0,2])
>>> A
array([[5, 5],
[0, 0],
[0, 2]])
We can see that the resizing actually reorganized our cells. It still starts with 5 5 0 but the cells are no longer along a single column. This is because numpy doesn't copy unless asked to, either; all of our multicell slices in fact refer into the same original array. We can make a second matrix and copy the original into a single column there:
>>> B=np.zeros((A.shape[0]+1,A.shape[1]))
>>> B[:,0]=A.transpose()
>>> B
array([[ 5., 0.],
[ 5., 0.],
[ 0., 0.]])
The transpose is because the slice of B is 1-dimensional shape (3 long) rather than a 2-dimensional shape like A (which is 1 wide and 3 high). Numpy considers the 1-dimensional array to be a horisontal shape, so a 3 wide and 1 high matrix will fit. You could think of it like copying a range of cells in a spreadsheet.
Notably, the numbers thus placed in B are copies of what was in A. This is because we did a modification of B. Views can be used to manipulate sections of a matrix (including seeing it in another shape, like transpose() does), for instance:
>>> C=B[::-1,1]
>>> C
array([ 0., 0., 0.])
>>> C[:]=[1,2,3]
>>> B
array([[ 5., 3.],
[ 5., 2.],
[ 0., 1.]])

Large point-matrix array multiplication in numpy

Given two large numpy arrays, one for a list of 3D points, and another for a list of transformation matrices. Assuming there is a 1 to 1 correspondence between the two lists, i'm looking for the best way to calculate the result array of each point transformed by it's corresponding matrix.
My solution to do this was to use slicing (see "test4" in the example code below) which worked fine with small arrays, but fails with large arrays because of how memory-wasteful my method is :)
import numpy as np
COUNT = 100
matrix = np.random.random_sample((3,3,)) # A single matrix
matrices = np.random.random_sample((COUNT,3,3,)) # Many matrices
point = np.random.random_sample((3,)) # A single point
points = np.random.random_sample((COUNT,3,)) # Many points
# Test 1, result of a single point multiplied by a single matrix
# This is as easy as it gets
test1 = np.dot(point,matrix)
print 'done'
# Test 2, result of a single point multiplied by many matrices
# This works well and returns a transformed point for each matrix
test2 = np.dot(point,matrices)
print 'done'
# Test 3, result of many points multiplied by a single matrix
# This works also just fine
test3 = np.dot(points,matrix)
print 'done'
# Test 4, this is the case i'm trying to solve. Assuming there's a 1-1
# correspondence between the point and matrix arrays, the result i want
# is an array of points, where each point has been transformed by it's
# corresponding matrix
test4 = np.zeros((COUNT,3))
for i in xrange(COUNT):
test4[i] = np.dot(points[i],matrices[i])
print 'done'
With a small array, this works fine. With large arrays, (COUNT=1000000) Test #4 works but gets rather slow.
Is there a way to make Test #4 faster? Presuming without using a loop?
You can use numpy.einsum. Here's an example with 5 matrices and 5 points:
In [49]: matrices.shape
Out[49]: (5, 3, 3)
In [50]: points.shape
Out[50]: (5, 3)
In [51]: p = np.einsum('ijk,ik->ij', matrices, points)
In [52]: p[0]
Out[52]: array([ 1.16532051, 0.95155227, 1.5130032 ])
In [53]: matrices[0].dot(points[0])
Out[53]: array([ 1.16532051, 0.95155227, 1.5130032 ])
In [54]: p[1]
Out[54]: array([ 0.79929572, 0.32048587, 0.81462493])
In [55]: matrices[1].dot(points[1])
Out[55]: array([ 0.79929572, 0.32048587, 0.81462493])
The above is doing matrix[i] * points[i] (i.e. multiplying on the right), but I just reread the question and noticed that your code uses points[i] * matrix[i]. You can do that by switching the indices and arguments of einsum:
In [76]: lp = np.einsum('ij,ijk->ik', points, matrices)
In [77]: lp[0]
Out[77]: array([ 1.39510822, 1.12011057, 1.05704609])
In [78]: points[0].dot(matrices[0])
Out[78]: array([ 1.39510822, 1.12011057, 1.05704609])
In [79]: lp[1]
Out[79]: array([ 0.49750324, 0.70664634, 0.7142573 ])
In [80]: points[1].dot(matrices[1])
Out[80]: array([ 0.49750324, 0.70664634, 0.7142573 ])
It doesn't make much sense to have multiple transform matrices. You can combine transform matrices as in this question:
If I want to apply matrix A, then B, then C, I will multiply the matrices in reverse order np.dot(C,np.dot(B,A))
So you can save some memory space by precomputing that matrix. Then applying a bunch of vectors to one transform matrix should be easily handled (within reason).
I don't know why you would need one million transformation on one million vectors, but I would suggest buying a larger RAM.
Edit:
There isn't a way to reduce the operations, no. Unless your transformation matrices hold a specific property such as sparsity, diagonality, etc. you're going to have to run all multiplications and summations. However, the way you process these operations can be optimized across cores and/or using vector operations on GPUs.
Also, python is notably slow. You can try splitting numpy across your cores using NumExpr. Or maybe use a BLAS framework on C++ (notably quick ;))

python numpy ValueError: operands could not be broadcast together with shapes

In numpy, I have two "arrays", X is (m,n) and y is a vector (n,1)
using
X*y
I am getting the error
ValueError: operands could not be broadcast together with shapes (97,2) (2,1)
When (97,2)x(2,1) is clearly a legal matrix operation and should give me a (97,1) vector
EDIT:
I have corrected this using X.dot(y) but the original question still remains.
dot is matrix multiplication, but * does something else.
We have two arrays:
X, shape (97,2)
y, shape (2,1)
With Numpy arrays, the operation
X * y
is done element-wise, but one or both of the values can be expanded in one or more dimensions to make them compatible. This operation is called broadcasting. Dimensions, where size is 1 or which are missing, can be used in broadcasting.
In the example above the dimensions are incompatible, because:
97 2
2 1
Here there are conflicting numbers in the first dimension (97 and 2). That is what the ValueError above is complaining about. The second dimension would be ok, as number 1 does not conflict with anything.
For more information on broadcasting rules: http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
(Please note that if X and y are of type numpy.matrix, then asterisk can be used as matrix multiplication. My recommendation is to keep away from numpy.matrix, it tends to complicate more than simplifying things.)
Your arrays should be fine with numpy.dot; if you get an error on numpy.dot, you must have some other bug. If the shapes are wrong for numpy.dot, you get a different exception:
ValueError: matrices are not aligned
If you still get this error, please post a minimal example of the problem. An example multiplication with arrays shaped like yours succeeds:
In [1]: import numpy
In [2]: numpy.dot(numpy.ones([97, 2]), numpy.ones([2, 1])).shape
Out[2]: (97, 1)
Per numpy docs:
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when:
they are equal, or
one of them is 1
In other words, if you are trying to multiply two matrices (in the linear algebra sense) then you want X.dot(y) but if you are trying to broadcast scalars from matrix y onto X then you need to perform X * y.T.
Example:
>>> import numpy as np
>>>
>>> X = np.arange(8).reshape(4, 2)
>>> y = np.arange(2).reshape(1, 2) # create a 1x2 matrix
>>> X * y
array([[0,1],
[0,3],
[0,5],
[0,7]])
You are looking for np.matmul(X, y). In Python 3.5+ you can use X # y.
It's possible that the error didn't occur in the dot product, but after.
For example try this
a = np.random.randn(12,1)
b = np.random.randn(1,5)
c = np.random.randn(5,12)
d = np.dot(a,b) * c
np.dot(a,b) will be fine; however np.dot(a, b) * c is clearly wrong (12x1 X 1x5 = 12x5 which cannot element-wise multiply 5x12) but numpy will give you
ValueError: operands could not be broadcast together with shapes (12,1) (1,5)
The error is misleading; however there is an issue on that line.
Use np.mat(x) * np.mat(y), that'll work.
We might confuse ourselves that a * b is a dot product.
But in fact, it is broadcast.
Dot Product :
a.dot(b)
Broadcast:
The term broadcasting refers to how numpy treats arrays with different
dimensions during arithmetic operations which lead to certain
constraints, the smaller array is broadcast across the larger array so
that they have compatible shapes.
(m,n) +-/* (1,n) → (m,n) : the operation will be applied to m rows
Convert the arrays to matrices, and then perform the multiplication.
X = np.matrix(X)
y = np.matrix(y)
X*y
we should consider two points about broadcasting.
first: what is possible.
second: how much of the possible things is done by numpy.
I know it might look a bit confusing, but I will make it clear by some example.
lets start from the zero level.
suppose we have two matrices. first matrix has three dimensions (named A) and the second has five (named B).
numpy tries to match last/trailing dimensions. so numpy does not care about the first two dimensions of B.
then numpy compares those trailing dimensions with each other. and if and only if they be equal or one of them be 1, numpy says "O.K. you two match". and if it these conditions don't satisfy, numpy would "sorry...its not my job!".
But I know that you may say comparison was better to be done in way that can handle when they are devisable(4 and 2 / 9 and 3). you might say it could be replicated/broadcasted by a whole number(2/3 in out example). and i am agree with you. and this is the reason I started my discussion with a distinction between what is possible and what is the capability of numpy.
This is because X and y are not the same types. for example X is a numpy matrix and y is a numpy array!
Error: operands could not be broadcast together with shapes (2,3) (2,3,3)
This kind of error occur when the two array does not have the same shape.
to correct this you need reshape one array to match the other.
see example below
a1 = array([1, 2, 3]), shape = (2,3)
a3 =array([[[1., 2., 3.],
[2., 3., 2.],
[2., 4., 5.]],
[[1., 0., 3.],
[2., 3., 7.],
[2., 4., 6.]]])
with shape = (2,3,3)
IF i try to run np.multiply(a2,a3) it will return the error below
Error: operands could not be broadcast together with shapes (2,3) (2,3,3)
to solve this check out the broadcating rules
which state hat Two dimensions are compatible when:
#1.they are equal, or
#2.one of them is 1`
Therefore lets reshape a2.
reshaped = a2.reshape(2,3,1)
Now try to run np.multiply(reshaped,a3)
the multiplication will run SUCCESSFUL!!
ValueError: operands could not be broadcast together with shapes (x ,y) (a ,b)
where x ,y are variables
Basically this error occurred when value of y (no. of columns) doesn't equal to the number of elements in another multidimensional array.
Now let's go through by ex=>
coding apart
import numpy as np
arr1= np.arange(12).reshape(3,
output of arr1
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
arr2= np.arange(4).reshape(1,4)
or (both are same 1 rows and 4 columns)
arr2= np.arange(4)
ouput of arr2=>
array([0, 1, 2, 3])
no of elements in arr2 is equal no of no. of the columns in arr1 it will be excute.
for x,y in np.nditer([a,b]):
print(x,y)
output =>
0 0
1 1
2 2
3 3
4 0
5 1
6 2
7 3
8 0
9 1
10 2
11 3

Create a dynamic 2D numpy array on the fly

I am having a hard time creating a numpy 2D array on the fly.
So basically I have a for loop something like this.
for ele in huge_list_of_lists:
instance = np.array(ele)
creates a 1D numpy array of this list and now I want to append it to a numpy array so basically converting list of lists to array of arrays?
I have checked the manual.. and np.append() methods that doesn't work as for np.append() to work, it needs two arguments to append it together.
Any clues?
Create the 2D array up front, and fill the rows while looping:
my_array = numpy.empty((len(huge_list_of_lists), row_length))
for i, x in enumerate(huge_list_of_lists):
my_array[i] = create_row(x)
where create_row() returns a list or 1D NumPy array of length row_length.
Depending on what create_row() does, there might be even better approaches that avoid the Python loop altogether.
Just pass the list of lists to numpy.array, keep in mind that numpy arrays are ndarrays, so the concept to a list of lists doesn't translate to arrays of arrays it translates to a 2d array.
>>> import numpy as np
>>> a = [[1., 2., 3.], [4., 5., 6.]]
>>> b = np.array(a)
>>> b
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
>>> b.shape
(2, 3)
Also ndarrays have nd-indexing so [1][1] becomes [1, 1] in numpy:
>>> a[1][1]
5.0
>>> b[1, 1]
5.0
Did I misunderstand your question?
You defiantly don't want to use numpy.append for something like this. Keep in mind that numpy.append has O(n) run time so if you call it n times, once for each row of your array, you end up with a O(n^2) algorithm. If you need to create the array before you know what all the content is going to be, but you know the final size, it's best to create an array using numpy.zeros(shape, dtype) and fill it in later. Similar to Sven's answer.
import numpy as np
ss = np.ndarray(shape=(3,3), dtype=int);
array([[ 0, 139911262763080, 139911320845424],
[ 10771584, 10771584, 139911271110728],
[139911320994680, 139911206874808, 80]]) #random
numpy.ndarray function achieves this. numpy.ndarray

Categories

Resources