Minimum difference of Numpy arrays - python

I have two 3-dimensional Numpy arrays of the same size. Their entries are similar, but not quite the same. I would like to shift one array in all three space dimensions, so that the difference between both arrays is minimal.
I tried to write a function with the arguments
- a list of the amounts by which I would like to shift the array,
- array 1,
- array 2.
However, I do not know how to minimize this function. I tried scipy.optimize.minimize, but it failed:
import numpy as np
from scipy.optimize import minimize
def array_diff(shift, array1, array2):
    roll = np.roll(np.roll(np.roll(array2, shift[0], axis=0), shift[1], axis=1), shift[2], axis=2)
    diff = np.abs(np.subtract(array1, roll))
    diffs = np.sum(diff)
    return diffs
def opt_diff(func, array1, array2):
    opt = minimize(func, x0=np.zeros(3), args=(array1, array2))
    return opt
min_diff = opt_diff(array_diff, array1, array2)
This gives an error message regarding roll = np.roll(...). It says "slice indices must be integers or have an index method". I guess that I am not using the minimize function correctly, but I have no idea how to fix it.
My goal is to minimize the function array_diff and get the minimum sum of all entries of the difference array. As a result, I would like to have the three parameters shift[0], shift[1] and shift[2] for the shift in the y-, x-, and z-direction.
Thank you for all your help.

This gives an error message regarding roll = np.roll(...) It says
"slice indices must be integers or have an index method".
np.roll requires an integer for the shift parameter. np.zeros creates an array of floats. Specify an integer type for x0:
x0=np.zeros(3,dtype=np.int32)
x0=np.zeros(3)
x0
Out[3]: array([ 0., 0., 0.])
x0[0]
Out[4]: 0.0
x0=np.zeros(3,dtype=np.int32)
x0[0]
Out[6]: 0
scipy.optimize.minimize will still try to adjust x0 by fractional amounts, so it may be simpler to add a cast inside array_diff:
def array_diff(shift, array1, array2):
    shift = shift.astype(np.int32)
    ...
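Once the shift is truncated to integers, the objective becomes piecewise constant, so a gradient-based minimize call is likely to stop at its starting point. A minimal sketch of an alternative, assuming the true shift lies inside a small window, is to evaluate every integer shift exhaustively. The names best_integer_shift and max_shift are hypothetical and not part of the answer above, and the test data is made up for illustration.

```python
import itertools

import numpy as np


def array_diff(shift, array1, array2):
    """Sum of absolute differences after rolling array2 by integer shifts."""
    shift = tuple(int(s) for s in shift)
    rolled = np.roll(array2, shift, axis=(0, 1, 2))
    return np.abs(array1 - rolled).sum()


def best_integer_shift(array1, array2, max_shift=2):
    """Try every shift in [-max_shift, max_shift]^3 and keep the best one."""
    candidates = itertools.product(range(-max_shift, max_shift + 1), repeat=3)
    return min(candidates, key=lambda s: array_diff(s, array1, array2))


# Made-up data: array2 is array1 rolled by a known amount.
rng = np.random.default_rng(0)
array1 = rng.random((6, 7, 8))
array2 = np.roll(array1, (-1, 2, 0), axis=(0, 1, 2))

best = best_integer_shift(array1, array2, max_shift=2)
print(best, array_diff(best, array1, array2))
```

The cost grows as (2 * max_shift + 1) ** 3 evaluations, so this only suits small search windows.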

Related

Numpy using multidimensional array to index a 1D array

I do not understand the following code, i.e. the last part of it.
max = np.max(rel_coords, axis=0)
min = np.min(rel_coords, axis=0)
bins = [np.arange(low, high) for low, high in zip(min, max)]
new_coord = np.array(np.meshgrid(*bins)).T
coord_norms = norm(new_coord, axis=-1).round().astype(int)
bin_count = np.bincount(coord_norms.flatten())
new_count = bin_count[coord_norms]
Can someone explain how I can index a 1-D array (bin_count) using a 2-D array (coord_norms)?
I do understand numpy broadcasting and advanced indexing, but would like to understand what's going on behind the scenes in this case. Does bin_count first get broadcast to the same shape as coord_norms?
How does Python assign the values in new_count?

How to test if two sparse arrays are (almost) equal?

I want to check if two sparse arrays are (almost) equal. Whereas for numpy arrays you can do:
import numpy as np
a = np.ones(200)
np.testing.assert_array_almost_equal(a, a)
This does not work for sparse arrays, which I can understand (it either raises AttributeError: ravel not found for smaller matrices, or errors related to the size of the array). Is there a scipy equivalent to test sparse matrices? I could convert my sparse matrices to dense matrices and use the numpy testing function, but sometimes this is not possible due to memory/size constraints. E.g.:
from scipy import sparse
b = sparse.rand(80000,8000,density=0.01)
type(b) # <class 'scipy.sparse.coo.coo_matrix'>
c = b.toarray() # ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
Is it possible to test these larger scipy arrays for equality, or should I test smaller samples?
Assuming that we are not concerned with non-zeros in one array that are within the tolerance value of a zero in the other, we can simply get the row and column indices and the corresponding values, look for exact matches between the indices, and use allclose() to match the values.
Hence, the implementation would be -
import numpy as np
from scipy.sparse import find
def allclose(A, B, atol = 1e-8):
    # If you want to check matrix shapes as well
    if np.array_equal(A.shape, B.shape)==0:
        return False
    r1,c1,v1 = find(A)
    r2,c2,v2 = find(B)
    index_match = np.array_equal(r1,r2) & np.array_equal(c1,c2)
    if index_match==0:
        return False
    else:
        return np.allclose(v1,v2, atol=atol)
Here's another with nonzero and data methods to replace find function -
def allclose_v2(A, B, atol = 1e-8):
    # If you want to check matrix shapes as well
    if np.array_equal(A.shape, B.shape)==0:
        return False
    r1,c1 = A.nonzero()
    r2,c2 = B.nonzero()
    lidx1 = np.ravel_multi_index((r1,c1), A.shape)
    lidx2 = np.ravel_multi_index((r2,c2), B.shape)
    sidx1 = lidx1.argsort()
    sidx2 = lidx2.argsort()
    index_match = np.array_equal(lidx1[sidx1], lidx2[sidx2])
    if index_match==0:
        return False
    else:
        v1 = A.data
        v2 = B.data
        V1 = v1[sidx1]
        V2 = v2[sidx2]
        return np.allclose(V1,V2, atol=atol)
We can short-circuit at a few places to speed it up further. On performance, I am focusing more on cases where only the values differ.
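If tiny non-zeros should also count as matching implicit zeros, one possible alternative (my assumption, not taken from the answer above) is to compare the sparse difference against an absolute tolerance. The name sparse_allclose is hypothetical, the small matrices are made up for illustration, and only atol is applied, without the relative tolerance that np.allclose also uses.

```python
import numpy as np
from scipy import sparse


def sparse_allclose(A, B, atol=1e-8):
    """Absolute-tolerance comparison that never builds a dense matrix."""
    if A.shape != B.shape:
        return False
    diff = abs(A - B)  # the difference stays sparse
    return bool(diff.max() <= atol)


# Made-up example matrices.
A = sparse.csr_matrix(np.array([[0.0, 1.0, 0.0], [2.0, 0.0, 3.0]]))
B = A.copy()
B.data = B.data + 1e-10  # perturb the stored values slightly
C = A.copy()
C.data[0] += 1.0         # change one stored value noticeably

print(sparse_allclose(A, B), sparse_allclose(A, C))
```

The subtraction allocates a new sparse matrix, so this trades some memory for a simpler check.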

Python Numpy error : setting an array element with a sequence

I'm quite new to Python and Numpy, so I apologize if I'm missing something obvious here.
I have a function that solves a system of 2 differential equations :
import numpy as np
import numpy.linalg as la
def solve_ode(x0, a0, beta, t):
    At = np.array([[0.23*t, (-10**5)*t], [0, -beta*t]], dtype=np.float32)
    # get eigenvalues and eigenvectors
    evals, V = la.eig(At)
    Vi = la.inv(V)
    # get e^At coeff
    eAt = V @ np.exp(evals) @ Vi
    xt = eAt*x0
    return xt
However, running it with this code :
import matplotlib.pyplot as plt
# initial values
x0 = 10**6
a0 = 2.5
beta = 0.05
t = np.linspace(0, 3600, 360)
plt.semilogy(t, solve_ode(x0, a0, beta, t))
... throws this error :
ValueError: setting an array element with a sequence.
At this line :
At = np.array([[0.23*t, (-10**5)*t], [0, -beta*t]], dtype=np.float32)
Note that t and beta are supposed to be floats. I think Python might not be able to infer this but I don't know how I could do this...
Thx in advance for your help.
You are supplying t as a numpy array of shape (360,) from linspace and not simply a float. The resulting At numpy array you are trying to create is then ill-formed, as all columns must be the same length. In Python there is an important difference between lists and numpy arrays. For example, you could do what you have here as a list of lists, e.g.
At = [[0.23*t, (-10**5)*t], [0, -beta*t]]
with dimensions [[360 x 360] x [1 x 360]].
Alternatively, if all elements of At are the length of t the array would work,
At = np.array([[0.23*t, (-10**5)*t], [t, -beta*t]], dtype=np.float32)
with shape [2, 2, 360].
When you give a list, a list of lists, or in this case a list of lists of lists, all of them should have the same length so that numpy can automatically infer the dimensions (shape) of the resulting matrix.
In your example, it's all correctly put, except the part where you put 0 as a column, I guess. Not sure what to call it though, because your expected output is a cube, I suppose.
You can fix it by giving the correct number of zeros, as below:
At = np.array([[0.23*t, (-10**5)*t], [np.zeros(len(t)), -beta*t]], dtype=np.float32)
But check the .shape of the resulting array, and make sure it's what you want.
As others note the problem is the 0 in the inner list. It doesn't match the 360 length arrays generated by the other expressions. np.array can make an object dtype array from that (2x2), but can't make a float one.
At = np.array([[0.23*t, (-10**5)*t], [0*t, -beta*t]])
produces a (2,2,360) array. But I suspect the rest of that function is built around the assumption that At is (2,2) - a 2d square array with eig, inv etc.
What is the return xt supposed to be?
Does this work?
S = np.array([solve_ode(x0, a0, beta, i) for i in t])
giving a 1d array with the same number of values as in t?
I'm not suggesting this is the fastest way of solving the problem, but it's the simplest, especially if you are only generating 360 values.
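The shape issue described in the answers above can be checked directly. The sketch below is only an illustration under my own assumptions: it reuses t and beta from the question, shows that the ragged list is rejected, and then moves the time axis to the front so that np.linalg.eig sees a stack of 360 separate 2x2 matrices. Whether that stacked layout suits the rest of solve_ode is not something the original post settles.

```python
import numpy as np

t = np.linspace(0, 3600, 360)
beta = 0.05

# The scalar 0 next to length-360 arrays cannot form a float array.
try:
    np.array([[0.23 * t, -1e5 * t], [0, -beta * t]], dtype=np.float32)
    ragged_ok = True
except ValueError:
    ragged_ok = False

# With matching lengths the result is a (2, 2, 360) array.
At = np.array(
    [[0.23 * t, -1e5 * t], [np.zeros_like(t), -beta * t]], dtype=np.float32
)

# Put the time axis first: one 2x2 matrix per time step.
At_per_time = np.moveaxis(At, -1, 0)
evals, V = np.linalg.eig(At_per_time)

print(ragged_ok, At.shape, At_per_time.shape, evals.shape, V.shape)
```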

Scipy interpolate returns a 'dimensionless' array

I understand that interp1d expects an array of values to interpolate, but the behavior when passing it a float is strange enough to ask what is going on and what exactly is being returned
import numpy as np
from scipy.interpolate import interp1d
x = np.array([1,2,3,4])
y = np.array([5,7,9,15])
f = interp1d(x,y, kind='cubic')
a = f(2.5)
print(repr(a))
print("type is {}".format(type(a)))
print("shape is {}".format(a.shape))
print("ndim is {}".format(a.ndim))
print(a)
Output:
array(7.749999999999992)
type is <class 'numpy.ndarray'>
shape is ()
ndim is 0
7.749999999999992
EDIT: To clarify, I would not expect numpy to even have a dimensionless, shapeless array much less a scipy function return one.
print("Numpy version is {}".format(np.__version__))
print("Scipy version is {}".format(scipy.__version__))
Numpy version is 1.10.4
Scipy version is 0.17.0
The interp1d returns a value that matches the input in shape - after wrapping in np.array() if needed:
In [324]: f([1,2,3])
Out[324]: array([ 5., 7., 9.])
In [325]: f([2.5])
Out[325]: array([ 7.75])
In [326]: f(2.5)
Out[326]: array(7.75)
In [327]: f(np.array(2.5))
Out[327]: array(7.75)
Many numpy operations do return scalars instead of 0d arrays.
In [330]: np.arange(3).sum()
Out[330]: 3
though actually it returns a numpy object
In [341]: type(np.arange(3).sum())
Out[341]: numpy.int32
which does have a shape () and ndim 0.
Whereas interp1d returns an array.
In [344]: type(f(2.5))
Out[344]: numpy.ndarray
You can extract the value with [()] indexing
In [345]: f(2.5)[()]
Out[345]: 7.75
In [346]: type(f(2.5)[()])
Out[346]: numpy.float64
This may just be an oversight in the scipy code. How often do people want to interpolate at just one point? Isn't interpolating over a regular grid of points more common?
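The 0d array behaviour can be reproduced without SciPy. In this small sketch, np.array(7.75) is my stand-in for the value returned by f(2.5); it shows a few standard ways to get a scalar or a 1d array out of it.

```python
import numpy as np

a = np.array(7.75)  # stand-in for the 0d array returned by f(2.5)

as_numpy_scalar = a[()]         # numpy.float64
as_python_float = a.item()      # built-in float
as_float = float(a)             # also a built-in float
at_least_1d = np.atleast_1d(a)  # shape (1,) if an array with an axis is wanted

print(a.shape, a.ndim, type(as_numpy_scalar), type(as_python_float), at_least_1d.shape)
```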
==================
The documentation for f.__call__ is quite explicit about returning an array.
Evaluate the interpolant
Parameters
----------
x : array_like
    Points to evaluate the interpolant at.
Returns
-------
y : array_like
    Interpolated values. Shape is determined by replacing
    the interpolation axis in the original array with the shape of x.
===============
The other side to the question is why numpy even has a 0d array. The linked answer probably is sufficient. But often the question is asked by people who are used to MATLAB. In MATLAB nearly everything is 2d. There aren't any (true) scalars. Now MATLAB has structures and cells, and matrices with more than 2 dimensions. But I recall a time (in the 1990s) when it didn't have those. Everything, literally, was a 2d matrix.
The np.matrix approximates that MATLAB case, fixing its arrays at 2d. But it does have a _collapse method that can return a 'scalar'.

python numpy ValueError: operands could not be broadcast together with shapes

In numpy, I have two "arrays", X is (m,n) and y is a vector (n,1)
using
X*y
I am getting the error
ValueError: operands could not be broadcast together with shapes (97,2) (2,1)
When (97,2)x(2,1) is clearly a legal matrix operation and should give me a (97,1) vector
EDIT:
I have corrected this using X.dot(y) but the original question still remains.
dot is matrix multiplication, but * does something else.
We have two arrays:
X, shape (97,2)
y, shape (2,1)
With Numpy arrays, the operation
X * y
is done element-wise, but one or both of the values can be expanded in one or more dimensions to make them compatible. This operation is called broadcasting. Dimensions, where size is 1 or which are missing, can be used in broadcasting.
In the example above the dimensions are incompatible, because:
97 2
2 1
Here there are conflicting numbers in the first dimension (97 and 2). That is what the ValueError above is complaining about. The second dimension would be ok, as number 1 does not conflict with anything.
For more information on broadcasting rules: http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
(Please note that if X and y are of type numpy.matrix, then asterisk can be used as matrix multiplication. My recommendation is to keep away from numpy.matrix, it tends to complicate more than simplifying things.)
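As a quick check of the explanation above, the following sketch uses arrays of ones with the shapes from the question (the values are my own placeholder). It shows that the element-wise product fails while the matrix product and the transposed broadcast both succeed.

```python
import numpy as np

X = np.ones((97, 2))
y = np.ones((2, 1))

# Element-wise: trailing dims 2 vs 1 are fine, but 97 vs 2 conflict.
try:
    X * y
    elementwise_ok = True
except ValueError:
    elementwise_ok = False

product = X.dot(y)  # matrix product, shape (97, 1)
scaled = X * y.T    # broadcasting (97, 2) with (1, 2), shape (97, 2)

print(elementwise_ok, product.shape, scaled.shape)
```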
Your arrays should be fine with numpy.dot; if you get an error on numpy.dot, you must have some other bug. If the shapes are wrong for numpy.dot, you get a different exception:
ValueError: matrices are not aligned
If you still get this error, please post a minimal example of the problem. An example multiplication with arrays shaped like yours succeeds:
In [1]: import numpy
In [2]: numpy.dot(numpy.ones([97, 2]), numpy.ones([2, 1])).shape
Out[2]: (97, 1)
Per numpy docs:
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when:
they are equal, or
one of them is 1
In other words, if you are trying to multiply two matrices (in the linear algebra sense) then you want X.dot(y) but if you are trying to broadcast scalars from matrix y onto X then you need to perform X * y.T.
Example:
>>> import numpy as np
>>>
>>> X = np.arange(8).reshape(4, 2)
>>> y = np.arange(2).reshape(1, 2) # create a 1x2 matrix
>>> X * y
array([[0, 1],
       [0, 3],
       [0, 5],
       [0, 7]])
You are looking for np.matmul(X, y). In Python 3.5+ you can use X @ y.
It's possible that the error didn't occur in the dot product, but after.
For example try this
a = np.random.randn(12,1)
b = np.random.randn(1,5)
c = np.random.randn(5,12)
d = np.dot(a,b) * c
np.dot(a,b) will be fine; however np.dot(a, b) * c is clearly wrong (12x1 X 1x5 = 12x5 which cannot element-wise multiply 5x12) but numpy will give you
ValueError: operands could not be broadcast together with shapes (12,1) (1,5)
The error is misleading; however there is an issue on that line.
Use np.mat(x) * np.mat(y), that'll work.
We might confuse ourselves that a * b is a dot product.
But in fact, it is broadcast.
Dot Product :
a.dot(b)
Broadcast:
The term broadcasting refers to how numpy treats arrays with different
dimensions during arithmetic operations which lead to certain
constraints, the smaller array is broadcast across the larger array so
that they have compatible shapes.
(m,n) +-/* (1,n) → (m,n) : the operation will be applied to m rows
Convert the arrays to matrices, and then perform the multiplication.
X = np.matrix(X)
y = np.matrix(y)
X*y
We should consider two points about broadcasting.
First: what is possible.
Second: how much of what is possible is actually done by numpy.
I know it might look a bit confusing, but I will make it clear with an example.
Let's start from the beginning.
Suppose we have two arrays. The first (named A) has three dimensions and the second (named B) has five.
numpy tries to match the last/trailing dimensions, so numpy does not care about the first two dimensions of B.
Then numpy compares those trailing dimensions with each other. If and only if they are equal or one of them is 1, numpy says "O.K., you two match". If these conditions are not satisfied, numpy says "sorry... it's not my job!".
But I know you may say the comparison would be better done in a way that also handles sizes that are divisible (4 and 2, or 9 and 3). You might say the smaller one could be replicated/broadcast a whole number of times (2 or 3 in our example), and I agree with you. This is the reason I started my discussion with a distinction between what is possible and what numpy is capable of.
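Both points can be checked directly. The sketch below uses shapes I picked for illustration (they are assumptions, not from the answer). np.broadcast_shapes, available in NumPy 1.20 and later, reports the result of matching trailing dimensions, and np.tile does by hand the whole-number replication that broadcasting declines to do.

```python
import numpy as np

# A has three dimensions, B has five; only the trailing three are compared.
shape_A = (4, 1, 3)
shape_B = (2, 5, 4, 6, 3)
result_shape = np.broadcast_shapes(shape_A, shape_B)

# Divisible sizes (4 and 2) are still rejected by broadcasting.
a = np.arange(4)
b = np.arange(2)
try:
    a * b
    divisible_broadcasts = True
except ValueError:
    divisible_broadcasts = False

# Explicit replication makes the shapes equal.
tiled = a * np.tile(b, 2)

print(result_shape, divisible_broadcasts, tiled)
```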
This is because X and y are not the same type. For example, X is a numpy matrix and y is a numpy array!
Error: operands could not be broadcast together with shapes (2,3) (2,3,3)
This kind of error occurs when the two arrays do not have compatible shapes.
To correct this, you need to reshape one array to match the other.
See the example below:
a2 = array([1, 2, 3]), shape = (2,3)
a3 = array([[[1., 2., 3.],
             [2., 3., 2.],
             [2., 4., 5.]],
            [[1., 0., 3.],
             [2., 3., 7.],
             [2., 4., 6.]]])
with shape = (2,3,3)
If I try to run np.multiply(a2,a3), it will return the error below:
Error: operands could not be broadcast together with shapes (2,3) (2,3,3)
To solve this, check the broadcasting rules,
which state that two dimensions are compatible when:
#1. they are equal, or
#2. one of them is 1
Therefore, let's reshape a2.
reshaped = a2.reshape(2,3,1)
Now try to run np.multiply(reshaped,a3);
the multiplication will run SUCCESSFULLY!!
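A runnable version of this reshape fix might look like the sketch below. The values of a2 and a3 here are hypothetical placeholders generated with np.arange, not the arrays from the answer; only the shapes (2,3) and (2,3,3) are taken from it.

```python
import numpy as np

# Hypothetical values; only the shapes matter for the broadcasting rules.
a2 = np.arange(6, dtype=float).reshape(2, 3)
a3 = np.arange(18, dtype=float).reshape(2, 3, 3)

# (2, 3) against (2, 3, 3): trailing 3 vs 3 matches, then 2 vs 3 conflicts.
try:
    np.multiply(a2, a3)
    direct_ok = True
except ValueError:
    direct_ok = False

# A trailing axis of length 1 aligns a2[i, j] with a3[i, j, :].
result = np.multiply(a2.reshape(2, 3, 1), a3)
same = np.multiply(a2[:, :, np.newaxis], a3)  # equivalent spelling

print(direct_ok, result.shape)
```

Reshaping to (2, 1, 3) would also broadcast, but it pairs a2[i, k] with a3[i, :, k], so the choice of axis depends on what the product is meant to represent.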
ValueError: operands could not be broadcast together with shapes (x ,y) (a ,b)
where x, y are variables.
Basically, this error occurs when the value of y (the number of columns) does not equal the number of elements in the other array.
Now let's go through an example.
Code part:
import numpy as np
arr1 = np.arange(12).reshape(3, 4)
Output of arr1:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
arr2 = np.arange(4).reshape(1,4)
or (both broadcast as 1 row and 4 columns)
arr2 = np.arange(4)
Output of arr2:
array([0, 1, 2, 3])
The number of elements in arr2 equals the number of columns in arr1, so the following will execute:
for x,y in np.nditer([arr1,arr2]):
    print(x,y)
Output:
0 0
1 1
2 2
3 3
4 0
5 1
6 2
7 3
8 0
9 1
10 2
11 3
