I am trying to manipulate a numpy array that contains data stored in another array. So far, when I change a value in my array, the values in both arrays get changed:
import numpy as np
from astropy.io import fits
image = fits.getdata("randomImage.fits")
fft = np.fft.fft2(image)
fftMod = np.copy(fft)
fftMod = fftMod*2
if fftMod.all() == fft.all():
    print("shit same same same ")
# --> shit same same same
Why is that?
You misunderstood the usage of the .all() method.
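It returns True if all elements of an array are nonzero. Either this holds for both of your arrays or it holds for neither.
Since one array is double the other, and doubling never turns a nonzero element into zero (or vice versa), they definitely give the same result from the .all() method (both True or both False). The np.copy call did give you an independent array, so fft itself was never changed.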
Edit, as requested in the comments:
To compare the contents of both arrays, first do an element-wise comparison and then check that all elements are True with .all():
(fftMod == fft).all()
Or, better for floats, compare within a tolerance:
np.allclose(fftMod, fft)
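The difference between the three checks can be seen on a small synthetic array. This sketch assumes a random 8 by 8 image in place of randomImage.fits (no FITS file is needed); it also confirms with np.shares_memory that np.copy gave the doubled array its own memory.

```python
import numpy as np

# Hypothetical stand-in for fits.getdata("randomImage.fits")
rng = np.random.default_rng(0)
image = rng.random((8, 8))

fft = np.fft.fft2(image)
fft_mod = np.copy(fft) * 2          # a new array; fft is untouched

# np.copy allocated separate memory, so changing fft_mod cannot change fft
independent = not np.shares_memory(fft, fft_mod)

# .all() reduces each array to ONE bool ("are all elements nonzero?"),
# and doubling never turns a nonzero element into zero or vice versa
same_all = fft_mod.all() == fft.all()

# Element-wise comparisons are what actually compare the contents
exactly_equal = (fft_mod == fft).all()
approx_equal = np.allclose(fft_mod, fft)

print(independent, same_all, exactly_equal, approx_equal)
```

The .all() comparison is True no matter how different the arrays are, while both content comparisons correctly report that the arrays differ.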
I'm learning Python right now and I'm stuck on a line of code I found on the internet. I cannot understand what this line of code actually does.
Suppose I have this array:
import numpy as np
x = np.array([[1,5],[8,1],[10,0.5]])
y = x[np.sqrt(x[:,0]**2+x[:,1]**2) < 1]
print(y)
The result is an empty array. What does the expression defining y actually do? I've never encountered this kind of code before. The square brackets seem to act like an if statement. If I write this line instead:
import numpy as np
x = np.array([[1,5],[8,1],[10,0.5]])
y = x[0 < 1]
print(y)
It will return exactly what x is (because zero IS less than one).
If this is a way to write an if statement, it seems absurd to me, because I'm comparing an array with an integer.
Thank you for your answer!
In NumPy:
np.array([1,1,2,3,4]) < 2
is (very roughly) equivalent to something like:
[x<2 for x in [1,1,2,3,4]]
for plain Python lists. In both cases, the result is (as a NumPy array in the first case):
[True, True, False, False, False]
The same holds for many other operations, such as addition and multiplication. Broadcasting, which here stretches the scalar 2 across every element of the array, is a major selling point of NumPy.
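A short sketch of the equivalence described above. Note that the comparison has to be applied to a NumPy array: in Python 3, comparing a plain list with an integer raises a TypeError.

```python
import numpy as np

values = [1, 1, 2, 3, 4]

# NumPy applies the comparison to every element at once
as_array = np.array(values) < 2

# The plain-list equivalent needs an explicit comprehension
as_list = [v < 2 for v in values]

# Comparing a plain list directly with an int is an error in Python 3
try:
    values < 2
    list_comparison_failed = False
except TypeError:
    list_comparison_failed = True

print(as_array, as_list, list_comparison_failed)
```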
Another thing you can do in NumPy is boolean indexing: you provide an array of bools, each interpreted as 'keep this value, yes or no?'. So:
arr = np.array([1,1,2,3,4])
res = arr[arr<2]
# evaluates to:
# array([1, 1])
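Applied to the line from the question, the mask holds one bool per row of x, marking whether that row (read as a 2-D point) lies inside the unit circle. None of them does, which is why y is empty. The threshold of 6 in the last step is an arbitrary value chosen only to show a non-empty result.

```python
import numpy as np

x = np.array([[1, 5], [8, 1], [10, 0.5]])

# Length of each row treated as a 2-D point: sqrt(a**2 + b**2)
norms = np.sqrt(x[:, 0]**2 + x[:, 1]**2)   # about [5.10, 8.06, 10.01]

mask = norms < 1      # one bool per row: [False, False, False]
y = x[mask]           # keeps the rows where mask is True: none, shape (0, 2)

# An arbitrary larger radius keeps only the first row
y6 = x[norms < 6]     # [[1., 5.]]

print(norms, mask, y.shape, y6)
```

np.linalg.norm(x, axis=1) computes the same row lengths more compactly.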
NumPy behaves differently when you index an array with a boolean rather than an int.
From the docs:
This advanced indexing occurs when obj is an array object of Boolean type, such as may be returned from comparison operators. A single boolean index array is practically identical to x[obj.nonzero()] where, as described above, obj.nonzero() returns a tuple (of length obj.ndim) of integer index arrays showing the True elements of obj. However, it is faster when obj.shape == x.shape.
If obj.ndim == x.ndim, x[obj] returns a 1-dimensional array filled with the elements of x corresponding to the True values of obj. The search order will be row-major, C-style. If obj has True values at entries that are outside of the bounds of x, then an index error will be raised. If obj is smaller than x it is identical to filling it with False.
When you index an array with booleans, you are telling NumPy to select the data corresponding to True, so array[True] is not the same as array[1]. In the first case, NumPy interprets True as a zero-dimensional boolean array, which, given how masks work, selects all of the data.
Therefore:
x[True]
will return all of the data (with an extra leading axis of length 1), while
x[False]
will return an empty array.
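The difference between an integer index and a scalar boolean index can be checked directly. In the question, x[0 < 1] is simply x[True], because 0 < 1 evaluates to the Python bool True before NumPy ever sees it. Note the extra leading axis that a scalar boolean index adds.

```python
import numpy as np

x = np.array([[1, 5], [8, 1], [10, 0.5]])

row = x[1]                    # integer index: the second row, shape (2,)
everything = x[True]          # scalar True: all data, new leading axis of length 1
nothing = x[False]            # scalar False: no data, leading axis of length 0
same_as_question = x[0 < 1]   # 0 < 1 is just True

print(row.shape, everything.shape, nothing.shape)   # (2,) (1, 3, 2) (0, 3, 2)
```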
Let's say I have a function (called numpyarrayfunction) that outputs an array every time I run it. I would like to run the function multiple times and store the resulting arrays. Obviously, the method I am currently using,
import numpy as np
numpyarray = np.zeros((5))
for i in range(5):
    numpyarray[i] = numpyarrayfunction()
raises an error, since I am trying to store a whole array in a single element of an array.
Eventually, what I would like to do is to take the average of the numbers that are in the arrays, and then take the average of these averages. But for the moment, it would be useful to just know how to store the arrays!
Thank you for your help!
As comments and other answers have already laid out, a good way to do this is to store the arrays being returned by numpyarrayfunction in a normal Python list.
If you want everything to be in a single numpy array (for, say, memory efficiency or computation speed), and the arrays returned by numpyarrayfunction are of a fixed length n, you could make numpyarray multidimensional:
numpyarray = np.empty((5, n))
for i in range(5):
    numpyarray[i, :] = numpyarrayfunction()
Then you could use np.average(numpyarray, axis=1) to average over the second axis, which gives you back a one-dimensional array with the average of each array you got from numpyarrayfunction. np.average(numpyarray) is the average over all the elements, or np.average(np.average(numpyarray, axis=1)) if you really want the average of the averages (since every row has the same length n, these last two give the same value).
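A runnable version of this approach, assuming a hypothetical numpyarrayfunction that returns a random array of fixed length n (substitute your own function):

```python
import numpy as np

n = 4          # assumed fixed length of every returned array
runs = 5
rng = np.random.default_rng(42)

def numpyarrayfunction():
    """Hypothetical stand-in: returns a 1-D array of length n."""
    return rng.random(n)

numpyarray = np.empty((runs, n))
for i in range(runs):
    numpyarray[i, :] = numpyarrayfunction()   # one row per call

row_averages = np.average(numpyarray, axis=1)   # shape (runs,)
average_of_averages = np.average(row_averages)
overall_average = np.average(numpyarray)

# With equal-length rows, the two overall figures agree
print(row_averages, average_of_averages, overall_average)
```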
For more, see the NumPy documentation on array indexing.
I initially misread what was going on inside the for loop. The reason you're getting an error is that each element of an array created with np.zeros holds a single number (a float by default), and numpyarrayfunction returns something that isn't a single number (judging by the name, probably another NumPy array). If that function already returns a full NumPy array, you can do something more like this:
arrays = []
for i in range(5):
    arrays.append(numpyarrayfunction(args))
Then, you can take the average like so:
avgarray = np.zeros((len(arrays[0])))
for array in arrays:
    avgarray += array
avgarray = avgarray/len(arrays)
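If all the collected arrays have the same length, np.stack can turn the list into one 2-D array, after which the accumulate-and-divide loop reduces to a single mean call. The arrays below are hypothetical placeholders for the results of numpyarrayfunction.

```python
import numpy as np

# Hypothetical results of three calls to numpyarrayfunction
arrays = [np.array([1.0, 2.0, 3.0]),
          np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0])]

stacked = np.stack(arrays)          # shape (3, 3): one row per array

avgarray = stacked.mean(axis=0)     # element-wise average, like the loop
per_array = stacked.mean(axis=1)    # average of each individual array
mean_of_means = per_array.mean()

print(avgarray, per_array, mean_of_means)   # [4. 5. 6.] [2. 5. 8.] 5.0
```

mean(axis=0) reproduces the loop above, while mean(axis=1) followed by mean() gives the "average of averages" the question asks for.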
I have a function in Python that returns NumPy arrays:
import numpy as np
matrix = np.array([[0.,0.,0.,0.,0.,0.,1.,1.,1.,0.],
                   [0.,0.,0.,1.,1.,0.,0.,1.,0.,0.]])
def some_function():
    rows1, cols1 = np.nonzero(matrix)
    cols2 = np.array([6,7,8,3,4,7])
    rows2 = np.array([0,0,0,1,1,1])
    print(np.array_equal(rows1, rows2)) # prints True
    print(np.array_equal(cols1, cols2)) # prints True
    return (rows1, cols1) # or (rows2, cols2)
It should normally extract the indices of nonzero entries of a matrix (rows1, cols1). However, I can also extract the indices manually (rows2, cols2). The problem is that the program returns different results depending on whether the function returns (rows1, cols1) or (rows2, cols2), although the arrays should be equal.
I should probably add that this code is used in the context of pyipopt, which calls the C++ software package IPOPT. The problem occurs inside that package.
Can it be that the arrays are not "completely" equal? I would say that they somehow must be because I am not modifying anything but returning one instead of the other.
Any idea on how to debug this problem?
You could check where the arrays are not equal:
print(np.where(rows1 != rows2))
But what you are doing is unclear. First, there is no nonzeros function in NumPy, only nonzero, which returns a tuple of coordinate arrays. Are you only using the one corresponding to the rows?
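Beyond the values, it may be worth checking the arrays' other properties. np.array_equal compares only shape and values, so two arrays it reports as equal can still differ in dtype, strides or contiguity, which can matter when the arrays are handed to compiled code that reads their raw memory. Whether that is the cause with pyipopt here is an assumption. A minimal sketch of such a check, where describe is a hypothetical helper:

```python
import numpy as np

matrix = np.array([[0., 0., 0., 0., 0., 0., 1., 1., 1., 0.],
                   [0., 0., 0., 1., 1., 0., 0., 1., 0., 0.]])

def describe(arr):
    """Hypothetical helper: array properties that np.array_equal ignores."""
    return {
        "dtype": arr.dtype,
        "strides": arr.strides,
        "c_contiguous": bool(arr.flags["C_CONTIGUOUS"]),
        "owns_data": bool(arr.flags["OWNDATA"]),
    }

rows1, cols1 = np.nonzero(matrix)
rows2 = np.array([0, 0, 0, 1, 1, 1])
cols2 = np.array([6, 7, 8, 3, 4, 7])

for name, found, manual in [("rows", rows1, rows2), ("cols", cols1, cols2)]:
    print(name, "values equal:", np.array_equal(found, manual))
    print("  from nonzero:", describe(found))
    print("  built by hand:", describe(manual))

# A fresh C-contiguous copy with an explicit dtype removes any layout or
# dtype difference before the arrays are passed to compiled code
rows1_fixed = np.ascontiguousarray(rows1, dtype=np.intp)
cols1_fixed = np.ascontiguousarray(cols1, dtype=np.intp)
```

If the printed properties differ between the two versions, returning the fixed copies is a cheap experiment to see whether IPOPT's behavior changes.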
I have an array returned by gaussian_kde.resample. I don't know whether it is a NumPy array, i.e. whether I can use NumPy functions on it.
My data x consists of 3000 values, all in the range 0 < x <= 0.5, and I used:
import scipy.stats
kde = scipy.stats.gaussian_kde(x) # a bandwidth can also be set here, e.g. gaussian_kde(x, bw_method=...)
sample = kde.resample(100000) # returns 100,000 values that follow the probability distribution of "x"
This gave me a sample of data that follows the probability distribution of "x". The problem is that, no matter which bandwidth I choose, I get a few negative values in my "sample". I only want values within the range 0 < sample <= 0.5.
I tried to do:
sample = np.array(sample) # to convert this to a numpy array
keep = 0<sample<=0.5
sample = sample[keep] # using the binary conditions
But this does not work! How can I remove the negative values in my array?
First, you can check what type it is with Python's built-in type() function:
x = kde.resample(10000)
type(x)
# numpy.ndarray
Second, a chained comparison like 0 < sample <= 0.5 does not work on arrays, so I would spell out both conditions explicitly and combine them with &:
print(x)
# array([[ 1.42935658, 4.79293343, 4.2725778 , ..., 2.35775067, 1.69647609]])
x.size
# 10000
y = x[(x>1.5) & (x<4)]
As you can see, this applies both conditions and keeps only the values that are > 1.5 and < 4:
print(y)
# array([ 2.95451084, 2.62400183, 2.79426449, ..., 2.35775067, 1.69647609])
y.size
# 5676
I know I'm answering about 3 years late, but this may be useful for future reference.
The catch is that while kde.resample(100000) does return a NumPy array, that array is two-dimensional: its shape is (1, 100000), one row per dimension of the data. That extra dimension can get in the way when you try to index subsets of the sample. To get a flat array of the samples, do this instead:
sample = kde.resample(100000)[0]
The array variable sample should then have all 100000 samples, and indexing this array should work as expected.
As for why SciPy does it this way: gaussian_kde also handles multivariate data, and resample is documented to return an array of shape (d, size), where d is the number of dimensions of the data.
First of all, the return value of kde.resample is a numpy array, so you do not need to reconvert it.
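The problem lies in this line:
keep = 0 < sample <= 0.5
It does not do what you would think: Python evaluates the chained comparison as (0 < sample) and (sample <= 0.5), and "and" needs a single truth value, so NumPy raises a ValueError. Try: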
keep = (0 < sample) * (sample <= 0.5)
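A self-contained sketch that takes the single row of the (1, N) result and combines the two conditions element-wise. SciPy is not needed to show the indexing, so a normal-distribution stand-in with the same (1, N) shape replaces kde.resample(100000); its mean and spread are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for kde.resample(100000): same (1, N) shape, made-up distribution
sample = rng.normal(loc=0.25, scale=0.15, size=(1, 100000))

flat = sample[0]                     # the single row, as a 1-D array
keep = (0 < flat) & (flat <= 0.5)    # element-wise AND of two boolean arrays
trimmed = flat[keep]

# The chained form cannot work on arrays: Python needs one truth value
try:
    0 < flat <= 0.5
    chained_failed = False
except ValueError:
    chained_failed = True

print(sample.shape, flat.shape, trimmed.size, chained_failed)
```

For boolean arrays, & and the * used above give the same mask; & states the intent ("and") more directly.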
I have a 2-D NumPy array, for instance:
import numpy as np
a1 = np.zeros( (500,2) )
a1[:,0]=np.arange(0,500)
a1[:,1]=np.arange(0.5,1000,2)
# could be also read from txt
Then I want to select the indices of the rows that match a criterion, such as all values of a1[:,1] within the range (l1, l2):
l1=20.0; l2=900.0; #as example
I'd like to do this in a condensed expression. However, neither
np.where(a1[:,1]>l1 and a1[:,1]<l2)
(it gives a ValueError and suggests using np.all, which isn't clear to me in this case) nor
np.intersect1d(np.where(a1[:,1]>l1),np.where(a1[:,1]<l2))
works (it gives unhashable type: 'numpy.ndarray').
I then want to use these indices to select from another array of shape (500, n).
Is there a reasonable way to select indices like this, or is it necessary to use a mask in this case?
This should work
np.where((a1[:,1]>l1) & (a1[:,1]<l2))
or
np.where(np.logical_and(a1[:,1]>l1, a1[:,1]<l2))
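np.where called with only a condition returns a tuple holding one index array per dimension, so taking element [0] gives a plain 1-D array of row indices. This sketch applies that to the question's data and limits, both for the combined condition and for the np.intersect1d idea; the (500, n) array other and n = 3 are hypothetical.

```python
import numpy as np

a1 = np.zeros((500, 2))
a1[:, 0] = np.arange(0, 500)
a1[:, 1] = np.arange(0.5, 1000, 2)
l1, l2 = 20.0, 900.0

# Combined condition; [0] unpacks the one-element tuple from np.where
idx = np.where((a1[:, 1] > l1) & (a1[:, 1] < l2))[0]

# The intersect1d idea also gives the same indices once each tuple is unpacked
idx_alt = np.intersect1d(np.where(a1[:, 1] > l1)[0],
                         np.where(a1[:, 1] < l2)[0])

# Hypothetical second array with 500 rows and n columns
n = 3
other = np.arange(500 * n).reshape(500, n)
selected = other[idx]           # rows of `other` matching the criterion

print(idx[0], idx[-1], selected.shape)   # 10 449 (440, 3)
```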
Does this do what you want?
import numpy as np
a1 = np.zeros( (500,2) )
a1[:,0]=np.arange(0,500)
a1[:,1]=np.arange(0.5,1000,2)
l1, l2 = 20.0, 900.0 # limits from the question
c=(a1[:,1]>l1)*(a1[:,1]<l2) # boolean array: True where the row meets the criterion, False otherwise
print(a1[c]) # prints all the rows of a1 that meet the criterion
Afterwards, you can select the rows you need from your other array (assuming it has shape (500, n)) by doing:
print(newarray[c,:])
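The boolean mask c and an index array describe the same selection, and np.flatnonzero converts one into the other. A sketch, with newarray as a hypothetical (500, n) array and n = 3 chosen arbitrarily:

```python
import numpy as np

a1 = np.zeros((500, 2))
a1[:, 0] = np.arange(0, 500)
a1[:, 1] = np.arange(0.5, 1000, 2)
l1, l2 = 20.0, 900.0

c = (a1[:, 1] > l1) & (a1[:, 1] < l2)   # boolean mask, one entry per row
idx = np.flatnonzero(c)                  # positions where c is True

n = 3
newarray = np.arange(500 * n).reshape(500, n)   # hypothetical (500, n) array

by_mask = newarray[c, :]
by_index = newarray[idx, :]
print(by_mask.shape, np.array_equal(by_mask, by_index))   # (440, 3) True
```

The mask is convenient when the selection is reused on arrays of the same length; the index array is more compact when only a few rows are selected.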