Numpy `ValueError: operands could not be broadcast together with shape ...` - python

I'm using Python 2.7 and am attempting a forecast on some random data ranging from 1.00000000 to 3.0000000008. There are approximately 196 items in my array, and I get the error
ValueError: operands could not be broadcast together with shape (2) (50)
I do not seem to be able to resolve this issue on my own. Any help or links to relevant documentation would be greatly appreciated.
Here is the code I am using that generates this error
nsample = 50
sig = 0.25
x1 = np.linspace(0,20, nsample)
X = np.c_[x1, np.sin(x1), (x1-5)**2, np.ones(nsample)]
beta = masterAverageList
y_true = ((X, beta))
y = y_true + sig * np.random.normal(size=nsample)

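In your last line, y_true is the tuple (X, beta), not an array. The (2) in the error message is presumably that two-element tuple, and the (50) is the noise term of length nsample. If X and beta do not have the same shape as that second term on the right-hand side, you will get this type of error. To add an array to a tuple of arrays, they all must be the same shape.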
I would recommend looking at the numpy broadcasting rules.
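As a minimal sketch, assuming the intent was the matrix product of X and beta (the question does not say so) and using a hypothetical four-element beta in place of masterAverageList, the shapes line up like this:

```python
import numpy as np

nsample = 50
sig = 0.25
x1 = np.linspace(0, 20, nsample)

# Design matrix with four columns: shape (50, 4)
X = np.c_[x1, np.sin(x1), (x1 - 5) ** 2, np.ones(nsample)]

# Hypothetical coefficients, one per column of X (stand-in for masterAverageList)
beta = np.array([0.5, 0.5, -0.02, 5.0])

# Assumption: a dot product was intended rather than the tuple (X, beta)
y_true = np.dot(X, beta)  # shape (50,)

rng = np.random.default_rng(0)
y = y_true + sig * rng.normal(size=nsample)  # (50,) + (50,) broadcasts fine
```

If beta really has about 196 items, as the question suggests, it cannot be multiplied with a four-column X at all, so its length would have to be sorted out first.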

Related

Numpy Value Error: operands could not be broadcast together with shapes (1,99) (100,)

I am attempting to define y equal to (x × 0.15) + N, where N is a 100-element 1D array of random values chosen from a Gaussian distribution with mean 0.0 and standard deviation 0.5, but I keep getting the error: ValueError: operands could not be broadcast together with shapes (1,99) (100,). Any tips on how I can revise my code to make it work would be much appreciated, thanks.
N = np.random.normal (0, 0.5,size=(100))
y = (np.dot([x],0.15)) + N
It would be helpful if you included the x array in your example above. NumPy is complaining that np.dot([x], 0.15) has shape (1, 99), which means that [x] has that shape too. You could flatten it to shape (99,), but then you still have the problem that x has 99 elements and N has 100, and NumPy does not know how to add two arrays of different lengths.
Assuming this is somehow wrong, and that x should have had 100 elements, you could just do
x = np.arange(100) # Example x-array, use your own instead
N = np.random.normal(0, 0.5,size=x.shape)
y = x*0.15 + N
since all that np.dot([x], 0.15) does in this case is multiply each element of x by 0.15, which is much more simply written as above.
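A small sketch of both points, using made-up arrays since the question does not show x: a (1, 99) array cannot be added to a (100,) array, and np.dot with a scalar gives the same numbers as plain multiplication, only with an extra leading axis.

```python
import numpy as np

# Shapes taken from the error message; the values are placeholders
try:
    np.zeros((1, 99)) + np.zeros(100)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True  # 99 and 100 cannot be broadcast together

# Hypothetical x with 100 elements
x = np.arange(100.0)
via_dot = np.dot([x], 0.15)  # [x] becomes a (1, 100) array, so this is (1, 100)
via_mul = x * 0.15           # stays 1-D with shape (100,)

same_values = np.allclose(via_dot[0], via_mul)
```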

Python - y should be a 1d array, got an array of shape instead

Let's consider the following data:
import numpy as np
from sklearn.linear_model import LogisticRegression
x=np.linspace(0,2*np.pi,80)
x = x.reshape(-1,1)
y = np.sin(x)+np.random.normal(0,0.4,80)
y[y<1/2] = 0
y[y>1/2] = 1
clf=LogisticRegression(solver="saga", max_iter = 1000)
I want to fit a logistic regression where y is the dependent variable and x is the independent variable. But when I call:
clf.fit(x,y)
I see the error
'y should be a 1d array, got an array of shape (80, 80) instead'.
I tried to reshape data by using
y=y.reshape(-1,1)
But I end up with array of length 6400! (How come?)
Could you please give me a hand with performing this regression ?
Change the order of your operations:
First generate x and y as 1-D arrays:
x = np.linspace(0, 2*np.pi, 8)
y = np.sin(x) + np.random.normal(0, 0.4, 8)
Then (after y was generated) reshape x:
x = x.reshape(-1, 1)
Edit following a comment as of 2022-02-20
The source of the problem in the original code is as follows:
x = np.linspace(0,2*np.pi,80) generates a 1-D array of shape (80,).
x = x.reshape(-1,1) reshapes it into a 2-D array of shape (80, 1), with one column and as many rows as needed.
y = np.sin(x) + np.random.normal(0,0.4,80) adds a column array of shape (80, 1) and a 1-D array of shape (80,), which is treated here as a single-row array.
The effect of broadcasting is that y becomes a 2-D array of shape (80, 80).
The attempt to reshape y then gives a single-column array with 6400 rows.
The proper solution is for both x and y to be 1-D arrays initially, and my code does just this.
Afterwards x can be reshaped into a column as needed, while y should stay 1-D for clf.fit.
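The shapes described above can be checked directly. This is only a sketch with NumPy alone; the 0.5 threshold mirrors the question, and the seeded generator is an assumption added to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 80)   # shape (80,)
noise = rng.normal(0, 0.4, 80)      # shape (80,)

# Original order: reshape first, then add -> (80, 1) + (80,) broadcasts to (80, 80)
bad = np.sin(x.reshape(-1, 1)) + noise
bad_column = bad.reshape(-1, 1)     # 6400 rows, as seen in the question

# Suggested order: build y while x is still 1-D
good = np.sin(x) + noise            # shape (80,)
labels = (good > 0.5).astype(int)   # 0/1 targets, still 1-D

# Only now turn x into a single-column feature matrix
X = x.reshape(-1, 1)                # shape (80, 1)
```

With these shapes, X and labels are in the form that clf.fit(X, labels) expects: a 2-D feature matrix and a 1-D target.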
I encountered this error and tried to solve it with reshape, but that didn't work:
ValueError: y should be a 1d array, got an array of shape () instead.
It was actually caused by misplaced [] brackets around np.argmax. Below are the wrong code and the correct one; notice the position of the [] around np.argmax in the two snippets.
Wrong Code
ax[i,j].set_title("Predicted Watch : "+str(le.inverse_transform([pred_digits[prop_class[count]]])) +"\n"+"Actual Watch : "+str(le.inverse_transform(np.argmax([y_test[prop_class[count]]])).reshape(-1,1)))
Correct Code
ax[i,j].set_title("Predicted Watch :"+str(le.inverse_transform([pred_digits[prop_class[count]]]))+"\n"+"Actual Watch : "+str(le.inverse_transform([np.argmax(y_test[prop_class[count]])])))

ValueError: could not broadcast input array from shape (25,1) into shape (25)

When I try to run this simple snippet of code
a= 2
G = np.random.rand(25,1)
H = np.zeros((25,a))
for i in range(a):
    H[:,i] = .5 * G
I receive the
ValueError: could not broadcast input array from shape (25,1) into shape (25).
I wonder if anyone can point at a solution to this problem?
I know it happens quite a bit in image processing, but I don't know how to get around this one.
Cheers.
To fix this you use the first column of G:
for i in range(a):
    H[:,i] = .5 * G[:, 0]
NumPy broadcasting matches the dimensions of two arrays by starting with the last dimension and moving toward the first. In this case the target H[:, i] has a single dimension of length 25, while .5 * G has shape (25, 1). The second dimension of G (length 1) is matched against that 25, which leaves the first dimension of G (length 25) with nothing to match, so the assignment fails. Indexing G[:, 0] gives a 1-D array of shape (25,), which fits. You can read more in the NumPy broadcasting documentation.
Note: you don't really need that for loop. H is just the column G / 2 repeated twice. You can accomplish that in various ways (e.g. np.tile, np.hstack, etc.)
H = np.tile(G / 2, 2)
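For comparison, here is a short sketch showing that the loop, np.tile, and a whole-array assignment give the same H. The seeded generator and the names H_loop, H_tile and H_bcast are only for illustration.

```python
import numpy as np

a = 2
rng = np.random.default_rng(0)
G = rng.random((25, 1))

# Loop version: assign a 1-D slice of G into each 1-D column of H
H_loop = np.zeros((25, a))
for i in range(a):
    H_loop[:, i] = 0.5 * G[:, 0]

# np.tile repeats the (25, 1) column a times along the last axis
H_tile = np.tile(G / 2, a)

# Assigning to the whole array lets (25, 1) broadcast to (25, 2)
H_bcast = np.zeros((25, a))
H_bcast[:] = 0.5 * G
```

The last form works because the target is 2-D there, so the length-1 axis of G can stretch across both columns.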

Fast iteration over vectors in a multidimensional numpy array

I'm writing some python + numpy + cython code, and am trying to find the most elegant and efficient way of doing the following kind of iteration over an array:
Let's say I have a function f(x, y) that takes a vector x of shape (3,) and a vector y of shape (10,) and returns a vector of shape (10,). Now I have two arrays X and Y of shape sx + (3,) and sy + (10,), where the sx and sy are two shapes that can be broadcast together (i.e. either sx == sy, or when an axis differs, one of the two has length 1, in which case it will be repeated). I want to produce an array Z that has the shape zs + (10,), where zs is the shape of the broadcasting of sx with sy. Each 10 dimensional vector in Z is equal to f(x, y) of the vectors x and y at the corresponding locations in X and Y.
I looked into np.nditer and while it plays nice with cython (see bottom of linked page), it doesn't seem to allow iterating over vectors from a multidimensional array, instead of elements. I also looked at index grids, but the problem there is that cython indexing is only fast when the number of indexes is equal to the dimensionality of the array, and are stored as cython integers instead of python tuples.
Any help is greatly appreciated!
You are describing what NumPy calls a Generalized Universal FUNCtion, or gufunc. As its name suggests, it is an extension of ufuncs. You probably want to start by reading these two pages:
Writing your own ufunc
Building a ufunc from scratch
The second example uses Cython and has some material on gufuncs. To fully go down the gufunc road, you will need to read the corresponding section in the numpy C API documentation:
Generalized Universal Function API
I do not know of any example of gufuncs being coded in Cython, although it shouldn't be too hard to do following the examples above. If you want to look at gufuncs coded in C, you can take a look at the source code for np.linalg here, although that can be a daunting experience. A while back I bored my local Python User Group to death giving a talk on extending numpy with C, which was mostly about writing gufuncs in C, the slides of that talk and a sample Python module providing a new gufunc can be found here.
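Before writing a compiled gufunc, the intended semantics can be prototyped with np.vectorize and its signature argument. This is only a sketch: the body of f is a hypothetical stand-in, and np.vectorize is a convenience that still loops in Python, so it shows the broadcasting behaviour rather than the speed of a real gufunc.

```python
import numpy as np

def f(x, y):
    # Hypothetical stand-in: x has shape (3,), y has shape (10,), result has shape (10,)
    return y * 10 + x.mean()

# Core dimensions: x is (n), y is (m), output is (m); leading axes are broadcast
F = np.vectorize(f, signature='(n),(m)->(m)')

X = np.ones((2, 1, 3))    # sx = (2, 1)
Y = np.ones((1, 3, 10))   # sy = (1, 3)
Z = F(X, Y)               # loop shape (2, 3), core shape (10,)
```

Here the leading shapes (2, 1) and (1, 3) broadcast to (2, 3), and f is called once per position with the corresponding vectors.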
If you want to stick with nditer, here's a way using your example dimensions. It's pure Python here, but shouldn't be hard to implement with Cython (though it still has the tuple iterator). I'm borrowing ideas from ndindex as described in shallow iteration with nditer.
The idea is to find the common broadcasting shape, sz, and construct a multi_index iterator over it.
I'm using as_strided to expand X and Y to usable views, and passing the appropriate vectors (actually (1,n) arrays) to the f(x,y) function.
import numpy as np
from numpy.lib.stride_tricks import as_strided
def f(x,y):
    # sample that takes (10,) and (3,) arrays, and returns (10,) array
    # (they arrive here as (1,10) and (1,3) arrays)
    assert x.shape==(1,10), x.shape
    assert y.shape==(1,3), y.shape
    z = x*10 + y.mean()
    return z
def brdcast(X, X1):
    # broadcast X to shape of X1 (keep last dim of X)
    # modeled on np.broadcast_arrays
    shape = X1.shape + (X.shape[-1],)
    strides = X1.strides + (X.strides[-1],)
    X1 = as_strided(X, shape=shape, strides=strides)
    return X1
def F(X, Y):
    # broadcast the leading dimensions only
    X1, Y1 = np.broadcast_arrays(X[...,0], Y[...,0])
    Z = np.zeros(X1.shape + (10,))
    it = np.nditer(X1, flags=['multi_index'])
    X1 = brdcast(X, X1)
    Y1 = brdcast(Y, Y1)
    while not it.finished:
        # trailing None keeps the vector as a (1,n) array
        I = it.multi_index + (None,)
        Z[I] = f(X1[I], Y1[I])
        it.iternext()
    return Z
sx = (2,3) # works with (2,1)
sy = (1,3)
# X, Y = np.ones(sx+(10,)), np.ones(sy+(3,))
X = np.repeat(np.arange(np.prod(sx)).reshape(sx)[...,None], 10, axis=-1)
Y = np.repeat(np.arange(np.prod(sy)).reshape(sy)[...,None], 3, axis=-1)
Z = F(X,Y)
print(Z.shape)
print(Z[...,0])

Python- list with arrays shape issues

I'm writing code to run a perceptron algorithm and I can't create a random matrix the way I need it:
from random import choice
from numpy import array, dot, random
unit_step = lambda x: -1 if x < 0 else 1
import numpy as np
m=3 #this will be the number of rows
allys=[]
for j in range(m):
    aa=np.random.rand(1,3)
    tt=np.random.rand(3)
    yy=dot(aa,tt)
    ally = [aa, yy]
    allys.append(ally)
print "allys", allys
w = random.rand(3)
errors = []
eta = 0.2
n = 10
x=[1,3]
for i in xrange(n):
    print i
    x, expected = choice(allys)
    print "x", x
And I get the problem here:
    result = dot(x,w)
    error = expected - unit_step(result)
    errors.append(error)
    w += eta * error * x
    print x, expected, w, error, result, errors
The log says
w += eta * error * x
ValueError: non-broadcastable output operand with shape (3,) doesn't
match the broadcast shape (1,3)
The idea is to get result looping randomly over the "table" allys.
How can I solve this? What is shape(3,)?
Thanks!
The error message actually tells you what is wrong. The result of your multiplication is a (1,3) array (2D array, one row, three columns), whereas you try to add it into a 3-element vector.
Both arrays have three elements in a row, so if you do this:
w = w + eta * error * x
there will be no error on that line, but the resulting vector will actually be a (1,3) array. That is unwanted, because then your dot does not work.
There are several ways to fix the problem, but possibly the easiest to read is to reshape x for the calculation to be a 3-element vector (1D array):
w += eta * error * x.reshape(3,)
Possibly a cleaner solution would be to define w as a (1,3) 2D array, as well, and then transpose w for the dot. This is really a matter of taste.
For NumPy arrays, the shape attribute returns the array's dimensions. In your case, w.shape is (3,), which means that w is a one-dimensional array with 3 elements. In turn, x.shape is (1,3), which means that x is a two-dimensional array with one row and 3 columns. You are getting the error because the in-place += has to write its (1,3) result back into w, which has shape (3,). I am not sure what you are trying to do, so it's hard to suggest a solution, but you might want to try reshaping one of the arrays, for example x = x.reshape((3,)) to adapt the shape of x to w.
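The difference between the in-place and out-of-place updates can be seen in a short sketch. The values of w, x, eta and error are placeholders; only the shapes come from the question.

```python
import numpy as np

w = np.zeros(3)        # shape (3,)
x = np.ones((1, 3))    # shape (1, 3), like the rows stored in allys
eta, error = 0.2, 1.0

# In-place add: the (1, 3) result cannot be written back into the (3,) array
try:
    w += eta * error * x
    raised = False
except ValueError:
    raised = True

# Out-of-place add succeeds, but the result silently becomes 2-D
w_2d = w + eta * error * x           # shape (1, 3)

# Reshaping x to 1-D keeps w one-dimensional
w += eta * error * x.reshape(3)      # shape (3,)
```

Generating the rows with np.random.rand(3) instead of np.random.rand(1,3) would be another way to avoid the mismatch, assuming nothing else in the code relies on the extra axis.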
