I am looking to replace a number with NaN in numpy and am looking for a function like numpy.nan_to_num, except in reverse.
The number is likely to change as different arrays are processed, because each can have a uniquely defined NoDataValue. I have seen people using dictionaries, but the arrays are large and filled with both positive and negative floats. I suspect that it is not efficient to load all of these values into anything just to create keys.
I tried the following, but numpy complains that I should use any() or all(). I realize that I need to iterate element-wise, but I hope that a built-in function can achieve this.
def replaceNoData(scanBlock, NDV):
    for n, i in enumerate(scanBlock):
        if i == NDV:
            scanBlock[n] = numpy.nan
NDV is GDAL's no-data value and scanBlock is a numpy array.
Is a masked array the way to go perhaps?
A[A==NDV]=numpy.nan
A == NDV produces a boolean array with the same shape as A, which can be used as an index into A; the assignment then sets every matching element to NaN.
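A minimal sketch of the boolean-mask assignment, next to the masked-array alternative raised in the question. The array values and the -9999.0 no-data value are made up for illustration. The mask assignment only works on a float array, because NaN cannot be stored in an integer dtype.

```python
import numpy as np

NDV = -9999.0  # hypothetical GDAL no-data value
A = np.array([[1.5, -9999.0, 3.0],
              [-9999.0, 2.0, -0.5]])

# Boolean-mask assignment: every element equal to NDV becomes NaN (in place)
B = A.copy()
B[B == NDV] = np.nan

# Masked-array alternative: keeps the original values but hides the NDV cells
M = np.ma.masked_equal(A, NDV)

print(B)
print(M.mean())  # masked cells are ignored by reductions such as mean()
```

The masked array leaves the data untouched and works for integer arrays too, while the NaN approach interoperates with functions such as np.nanmean.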
You can also use np.where to replace a number with NaN.
arr = np.where(arr==NDV, np.nan, arr)
For example, this replaces every 1 with NaN:
arr = np.array([[1, 1, 2], [2, 0, 1]])
arr = np.where(arr==1, np.nan, arr)
This creates a new array (unlike A[A==NDV]=np.nan, which modifies A in place), and in some cases that can be useful. For example, if the array initially has an int dtype, it has to be converted into a float array anyway (NaN is a floating-point value and cannot be stored in an integer array), and np.where handles that conversion for you.
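A short sketch contrasting the two approaches on the integer example above: np.where returns a new float64 array, while the in-place mask needs an explicit float conversion first.

```python
import numpy as np

arr = np.array([[1, 1, 2], [2, 0, 1]])  # integer dtype

# np.where builds a new array; the NaN operand makes the result float64
via_where = np.where(arr == 1, np.nan, arr)

# In-place masking needs a float copy first, since NaN does not fit in an int array
via_mask = arr.astype(float)
via_mask[via_mask == 1] = np.nan

print(via_where)
```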
I am trying to randomly select a set of integers in numpy and am encountering a strange error. If I define a numpy array with two sets of different sizes, np.random.choice chooses between them without issue:
Set1 = np.array([[1, 3, 5], [2, 4]], dtype=object)  # NumPy >= 1.24 requires dtype=object for ragged input
In: np.random.choice(Set1)
Out: [2, 4]
However, once the sets in the numpy array are the same size, I get a ValueError:
Set2 = np.array([[1, 3, 5], [2, 4, 6]])
In: np.random.choice(Set2)
ValueError: a must be 1-dimensional
Could be user error, but I've checked several times and the only difference is the size of the sets. I realize I can do something like:
Chosen = np.random.choice(N, k)
Selection = Set[Chosen]
Here N is the number of sets and k is the number of samples. I'm just wondering if there is a better way, and specifically what I am doing wrong to raise a ValueError when the sets are the same size.
Printout of Set1 and Set2 for reference:
In: Set1
Out: array([list([1, 3, 5]), list([2, 4])], dtype=object)
In: type(Set1)
Out: numpy.ndarray
In: Set2
Out:
array([[1, 3, 5],
[2, 4, 6]])
In: type(Set2)
Out: numpy.ndarray
Your issue is caused by a misunderstanding of how numpy arrays work. The first example cannot "really" be turned into a 2D array, because numpy does not support ragged arrays. You end up with a 1D array of object references that point to two Python lists. The second example is a proper 2x3 numerical array. The legacy np.random.choice only samples from 1D input, which is why it accepts the first array and raises for the second. I can think of two types of solutions here.
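A quick check of the shapes shows the difference. This sketch recreates both arrays from the question (dtype=object is needed for the ragged one on current NumPy) and confirms that only the 2D array is rejected.

```python
import numpy as np

set1 = np.array([[1, 3, 5], [2, 4]], dtype=object)  # ragged -> 1-D array of list objects
set2 = np.array([[1, 3, 5], [2, 4, 6]])             # equal lengths -> 2-D int array

print(set1.shape, set1.ndim)
print(set2.shape, set2.ndim)

pick1 = np.random.choice(set1)  # fine: returns one of the two list objects

raised = False
try:
    np.random.choice(set2)      # legacy choice only accepts 1-D input
except ValueError as err:
    raised = True
    print(err)
```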
The obvious approach (which would work in both cases, by the way) would be to choose the index instead of the sublist. Since you are sampling with replacement, you can just generate the index and use it directly:
Set[np.random.randint(N, size=k)]
This is the same as
Set[np.random.choice(N, k)]
If you want to choose without replacement, your best bet is to use np.random.choice with replace=False. This is similar to, but less efficient than, shuffling. In either case, you can write a one-liner for the index:
Set[np.random.choice(N, k, replace=False)]
Or:
index = np.arange(Set.shape[0])
np.random.shuffle(index)
Set[index[:k]]
The shuffling operation works only in place, so you have to write it out the long way. It's also less efficient for large arrays, since you have to create the entire range up front, no matter how small k is.
The nice thing about np.random.shuffle, though, is that you can apply it to Set directly, whether it is a one- or many-dimensional array. Shuffling always happens along the first axis, so you can just take the top k elements afterwards (note that this reorders Set itself):
np.random.shuffle(Set)
Set[:k]
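A small sketch of both without-replacement variants. The 4x3 Set and the value of k are made up for illustration. Because np.random.shuffle works in place, this version shuffles a copy so the original Set is left intact.

```python
import numpy as np

Set = np.array([[1, 3, 5], [2, 4, 6], [7, 8, 9], [0, 0, 1]])
N, k = Set.shape[0], 2

# Index-based: pick k distinct row numbers, then fancy-index the rows
rows_a = Set[np.random.choice(N, k, replace=False)]

# Shuffle-based: np.random.shuffle mutates its argument, so shuffle a copy
shuffled = Set.copy()
np.random.shuffle(shuffled)  # permutes along the first axis only
rows_b = shuffled[:k]
```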
The other solution is to turn the second example into an array of list objects like the first one. I do not recommend this solution unless the only reason you are using numpy is the choice function. In fact I wouldn't recommend it at all, since you can, and probably should, use Python's standard random module at this point. Disclaimers aside, you can coerce the datatype of the second array to object. It will remove any benefits of using numpy, and it can't be done directly: simply setting dtype=object will still create a 2D array, just one that stores references to Python int objects instead of primitives. You have to do something like this:
Set = np.zeros(N, dtype=object)
Set[:] = [[1, 2, 3], [2, 4]]
You will now get an object essentially equivalent to the one in the first example, and can therefore apply np.random.choice directly.
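The lists above are ragged. With equal-length rows like the second example, a slice assignment such as Set[:] = rows typically fails, because numpy interprets the right-hand side as a (2, 3) block that cannot be broadcast into shape (2,). One workaround, sketched here, is to store each list individually:

```python
import numpy as np

rows = [[1, 3, 5], [2, 4, 6]]  # equal-length rows, as in the second example

# Assign each list object to its own slot so numpy does not build a 2-D block
Set = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    Set[i] = row

pick = np.random.choice(Set)  # Set is 1-D now, so choice returns one of the lists
```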
Note
I show the legacy np.random methods here because of personal inertia if nothing else. The correct way, as suggested in the NumPy documentation, is to use the new Generator API. This is especially true for the choice method, which is much more efficient in the new implementation. The usage is not any more difficult:
Set[np.random.default_rng().choice(N, k, replace=False)]
There are additional advantages, like the fact that you can now choose directly, even from a multidimensional array:
np.random.default_rng().choice(Set2, k, replace=False)
The same goes for shuffle, which, like choice, now allows you to select the axis you want to rearrange:
np.random.default_rng().shuffle(Set)
Set[:k]
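A sketch of the Generator versions, using a made-up 3x3 Set2. Generator.choice samples along its axis argument (0 by default), and Generator.shuffle takes an axis as well and, like the legacy version, works in place. The seed is only there for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
Set2 = np.array([[1, 3, 5], [2, 4, 6], [7, 8, 9]])
k = 2

# Sample k distinct rows directly from the 2-D array (axis=0 by default)
picked_rows = rng.choice(Set2, k, replace=False)

# The axis argument lets you sample columns instead
picked_cols = rng.choice(Set2, k, replace=False, axis=1)

# Generator.shuffle also works in place, so shuffle a copy to keep Set2 intact
shuffled = Set2.copy()
rng.shuffle(shuffled, axis=0)
```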
I find this behaviour utter nonsense. It happens only with numpy arrays; plain Python lists will just raise an error.
Let's create two arrays:
randomNumMatrix = np.random.randint(0, 20, (3, 3, 3), dtype=int)  # np.int was removed in NumPy 1.24
randRow = np.array([0, 1, 2], dtype=int)
If we pass an array as the index into another array, an array equal to the original is returned:
randomNumMatrix[randRow]
The code above returns an equivalent of randomNumMatrix. I find this unintuitive. I would expect it either not to work or at least to return the equivalent of
randomNumMatrix[randRow[0]][randRow[1]][randRow[2]]
Additional observations:
A)
The code below does not work; indexing with it raises IndexError: index 3 is out of bounds for axis 0 with size 3
randRow = np.array([0, 1, 3], dtype=int)
randomNumMatrix[randRow]
B)
To my surprise, the code below works:
randRow = np.array([0, 1, 2, 2, 0, 1, 2], dtype=int)
randomNumMatrix[randRow]
Can somebody please explain what the advantages of this feature are?
In my opinion it only creates a lot of confusion.
As for
randomNumMatrix[randRow[0]][randRow[1]][randRow[2]]
that is valid Python, but it is plain chained indexing: it picks out a single element, the same as randomNumMatrix[0, 1, 2].
In numpy there is a difference between
arr[(x,y,z)] # equivalent to arr[x,y,z]
and
arr[np.array([x,y,z])] # equivalent to arr[np.array([x,y,z]),:,:]
The tuple provides a scalar index for each dimension. The array (or list) provides multiple indices for one dimension.
You may need to study the numpy docs on indexing, especially advanced indexing.
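A small sketch of the difference, using a random 3x3x3 array like the one in the question (generated here with the Generator API rather than randint). The array [0, 1, 2] selects every row along axis 0, which is why the result equals the original; repeated indices simply repeat rows.

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.integers(0, 20, size=(3, 3, 3))

idx = np.array([0, 1, 2])

scalar = m[tuple(idx)]   # tuple: one index per axis, same as m[0, 1, 2]
rows = m[idx]            # array: indexes axis 0 only, same as m[idx, :, :]
repeated = m[np.array([0, 1, 2, 2, 0, 1, 2])]  # rows may repeat, in any order

print(scalar == m[0][1][2])  # True
print(rows.shape)            # (3, 3, 3)
print(repeated.shape)        # (7, 3, 3)

try:
    m[np.array([0, 1, 3])]   # 3 is not a valid index along axis 0
except IndexError as err:
    print(err)
```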
Does numpy allow creating an array from a function, without using a standard Python list comprehension?
With list comprehension I could have:
array = np.array([f(i) for i in range(100)])
with f a given function.
But if the constructed array is really big, using Python's list would be slow and would eat a lot of memory.
If such a way doesn't exist, I suppose I could first create an array of my wanted size
array = np.arange(100)
And then map a function over it.
array = f(array)
According to results from another post, it seems that it would be a reasonable solution.
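A sketch of three options, with a hypothetical function f. Applying f to an index array is the fastest when f is built from numpy operations. np.fromfunction calls f once with an array of indices, so it also requires a vectorized f. np.fromiter still calls f once per element in Python, but it fills the array from a generator without building an intermediate list.

```python
import numpy as np

def f(i):
    # hypothetical function; works on scalars and on arrays
    return i * i + 1

n = 100

# 1) Vectorized: apply f to a whole index array at once (no Python loop)
a = f(np.arange(n))

# 2) np.fromfunction passes the index array to f in a single call
b = np.fromfunction(f, (n,), dtype=int)

# 3) np.fromiter consumes a generator, avoiding a temporary Python list
c = np.fromiter((f(i) for i in range(n)), dtype=np.int64, count=n)
```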
Let's say I want to use the add function with a simple int value. It would look like this:
array = np.array([i for i in range(5)])
array + 5
But what if I want the value (here 5) to vary according to the index of the array element? For example, the operation:
array + [i for i in range(5)]
What object can I use to define special rules for a variable value within a vectorized operation?
You can add two arrays together element-wise (see the question "Simple adding two arrays using numpy in python?").
This assumes your "variable by index" is just another array.
For your specific example, a jury-rigged solution would be to use numpy.arange() as in:
In [4]: array + np.arange(5)
Out[4]: array([0, 2, 4, 6, 8])
In general, you can find a numpy ufunc that does the job of your custom function, or you can compose several of them in a Python function that returns an ndarray, something like:
def custom_func(n):
    # Placeholder body: compose numpy ufuncs here and return an ndarray of length n
    idx = np.arange(n)
    return idx ** 2
You can then simply add the returned result to your already defined array:
array + custom_func(len(array))
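For multidimensional arrays, the same idea extends with np.indices, which returns one array per axis holding each element's index along that axis. This sketch uses a made-up rule (10 * row + column) as the position-dependent operand.

```python
import numpy as np

grid = np.zeros((2, 3), dtype=int)

# i holds row indices and j holds column indices, both with grid's shape
i, j = np.indices(grid.shape)

# Operand that varies with position: 10 * row + column
result = grid + 10 * i + j
print(result)
# [[ 0  1  2]
#  [10 11 12]]
```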
I need to create an array of strings (color values), one for each value in another array. Positive values should get one color and negative values another.
I've tried this code snippet:
values = np.array([1, 2, -3, 4, 5])
color_values = np.array(['rgb(74,159,234)'] * len(values))
color_values[values < 0] = 'rgb(120,183,239)'
print(color_values)
But the problem is that the new string values are truncated to the length of the existing values in the array, so the result is:
['rgb(74,159,234)', 'rgb(74,159,234)', 'rgb(120,183,239', 'rgb(74,159,234)', 'rgb(74,159,234)']
The third value is changed, but without the last parenthesis. I can rewrite the code to get the result I need, but now I'm curious why this happens.
I'm using Python 3.6, numpy 1.14.2
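According to another answer, numpy string arrays have a fixed length: the dtype is sized to the longest string present when the array is created ('<U15' here), and longer strings assigned later are truncated. The suggestion there is to specify the data type when declaring the array.
You could set the data type to 16 characters (or more) when declaring your array. In Python 3, use the 'U' (str) type rather than 'S', which would store bytes:
color_values = np.array(['rgb(74,159,234)'] * len(values), dtype='U16')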
The rest of the lines should not need modification.
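Another option, sketched here, is to let np.where choose the values. It computes a string dtype wide enough for both candidates, so no manual length is needed.

```python
import numpy as np

values = np.array([1, 2, -3, 4, 5])
pos, neg = 'rgb(74,159,234)', 'rgb(120,183,239)'

# The original array is sized to its initial 15-character strings
print(np.array([pos] * len(values)).dtype)  # <U15

# np.where promotes to a string dtype that fits both candidates
color_values = np.where(values < 0, neg, pos)
print(color_values[2])  # rgb(120,183,239)
```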
I'm following this tutorial on how to use numpy to manipulate images. When I load the sample image using scipy, I get a 2D array of RGB tuples, with a dtype value appended on the end.
array([[7, 8, 5],
[3, 5, 7]], dtype=uint8)
I wrote a function and vectorized it
def myfunc(a, b):
    return a + 2

vfunc = np.vectorize(myfunc)
but when I apply it to my array, the result doesn't have the dtype
array([[9, 10, 7],
[5, 7, 9]])
My guess is that because "dtype + 2" isn't defined, it's just losing that element of the array.
How can I write a function that will not strip the dtype when I vectorize it and apply it to a numpy array?
np.vectorize takes an otypes parameter, which you can use to specify the dtype of the return value. Without it, vectorize does a trial calculation on the first element of your array and uses the dtype of that result for the whole output.
Look at the 3rd example in its docs.
Usually users encounter this when the first value produces an integer value (e.g. 0) and they expect the whole thing to be float.
So try:
vfunc = np.vectorize(myfunc, otypes=[np.uint8])
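A minimal sketch with a hypothetical one-argument version of myfunc, using the array values from the question. With otypes the result dtype is fixed to uint8; without it, the dtype comes from the trial call and depends on the NumPy version's promotion rules.

```python
import numpy as np

img = np.array([[7, 8, 5], [3, 5, 7]], dtype=np.uint8)

def add_two(a):
    # hypothetical one-argument variant of myfunc
    return a + 2

# Output dtype inferred from a trial call on the first element
guessed = np.vectorize(add_two)(img)

# Output dtype fixed explicitly
kept = np.vectorize(add_two, otypes=[np.uint8])(img)
print(kept.dtype)  # uint8
```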
dtype=uint8 is not an element of the array. It is just printed to let you know that the array is of type np.uint8.
The default types np.float64 and np.int_ do not get a printout like that, which is what you are seeing in the second case. You can tell float and int arrays apart because float arrays always show decimal points in the numbers.
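This happens because you are adding 2 to each element of your array. Since 2 is a Python integer, the output array gets promoted to np.int_ and you do not get an explicit dtype printout. (That is the NumPy 1.x behaviour; under the promotion rules introduced in NumPy 2.0, a Python int no longer upcasts a np.uint8 value, so the result stays np.uint8.)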
You can try the following experiment: redefine myfunc to add a np.uint8 instead of an integer to the array elements and try to print the result:
def myfunc(a, b):
    return a + np.uint8(2)
Finally, keep in mind that np.vectorize is a convenience, not a speed-up: the function itself is still a Python function called once per element, and therefore slow. It is generally better to express whatever operations you want with numpy functions directly.
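For this particular example, plain array arithmetic does the job without any Python-level function. Adding a np.uint8 scalar keeps the uint8 dtype. Keep in mind that uint8 arithmetic wraps around modulo 256.

```python
import numpy as np

img = np.array([[7, 8, 5], [3, 5, 7]], dtype=np.uint8)

# Whole-array addition runs in compiled code and keeps the uint8 dtype
out = img + np.uint8(2)
print(out.dtype)  # uint8

# uint8 results wrap around: 255 + 2 becomes 1
wrapped = np.array([255], dtype=np.uint8) + np.uint8(2)
```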