I need to create an array of strings (actually color values), one for each value in another array. The logic is that positive values should get one color and negative values another.
I've tried this code snippet:
import numpy as np
values = np.array([1, 2, -3, 4, 5])
color_values = np.array(['rgb(74,159,234)'] * len(values))
color_values[values < 0] = 'rgb(120,183,239)'
print(color_values)
But the problem is that the new string values are truncated to the length of the original values in the array, so the result is:
['rgb(74,159,234)', 'rgb(74,159,234)', 'rgb(120,183,239', 'rgb(74,159,234)', 'rgb(74,159,234)']
The third value is changed, but without the last parenthesis. I can rewrite the code to achieve the result I need, but now I'm curious about why this happens.
I'm using Python 3.6 and numpy 1.14.2.
According to this answer, str numpy arrays have a fixed length. They suggest specifying the data type when declaring the array.
You could try adding the data type when declaring your array; set it to 16 characters (or more).
color_values = np.array(['rgb(74,159,234)'] * len(values), dtype='S16')
The rest of the lines should not need modification.
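Note that with 'S16' on Python 3 the elements become bytes and print as b'rgb(...)'. If you want plain str elements, a unicode dtype works the same way; a minimal sketch:
import numpy as np

values = np.array([1, 2, -3, 4, 5])
# '<U16' reserves 16 unicode characters per element, enough for 'rgb(120,183,239)'
color_values = np.array(['rgb(74,159,234)'] * len(values), dtype='<U16')
color_values[values < 0] = 'rgb(120,183,239)'
print(color_values)   # the third entry keeps its closing parenthesis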
There are several different types of NaN possible in most floating point representations (e.g. quiet NaNs, signalling NaNs, etc.). I assume this is also true in numpy. I have a specific bit representation of a NaN, defined in C and imported into python. I wish to test whether an array contains entirely this particular floating point bit pattern. Is there any way to do that?
Note that I want to test whether the array contains this particular NaN, not whether it has NaNs in general.
Numpy allows you direct access to the bytes in your array. For a simple case you can view NaNs directly as integers:
quiet_nan1 = np.uint64(0b0111111111111000000000000000000000000000000000000000000000000000)
x = np.arange(10, dtype=np.float64)
x.view(np.uint64)[5] = quiet_nan1
x.view(np.uint64)
Now you can just compare the elements for the bit-pattern of your exact NaN. This version will preserve shape since the elements are the same size.
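For instance, continuing the snippet above, the test for that exact bit pattern could be sketched as:
mask = x.view(np.uint64) == quiet_nan1   # True where the bits match exactly
mask[5]      # True: this is the element that was set above
mask.all()   # False: the array does not consist entirely of this NaN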
A more general solution, which would let you work with types like float128 that don't have a corresponding integer analog on most systems, is to use bytes:
quiet_nan1l = np.frombuffer((0b01111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000).to_bytes(16, 'big'), dtype=np.uint8)
x = np.arange(3 * 4 * 5, dtype=np.float128).reshape(3, 4, 5)
x.view(np.uint8).reshape(*x.shape, 16)[2, 2, 3, :] = quiet_nan1l
x.view(np.uint8).reshape(*x.shape, 16)
The final reshape is not strictly necessary, but it is very convenient, since it isolates the original array elements along the last dimension.
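With that layout, the test for the exact pattern could look like this sketch, continuing the float128 example above:
byte_view = x.view(np.uint8).reshape(*x.shape, 16)
mask = (byte_view == quiet_nan1l).all(axis=-1)   # shape (3, 4, 5) boolean mask
mask[2, 2, 3]   # True: this element holds the exact bit pattern
mask.all()      # False: the array is not made up entirely of it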
In both cases, modifying the view modifies the original array. That's the point of a view.
And of course it goes without saying (which is why I'm saying it) that this applies to any other bit pattern you may want to assign or test for, not just NaNs.
I want quick access to np.array elements, for example indices 0-6 plus 10 to the end. So far I have tried:
a[0:6,10:]
or
np.concatenate(a[0:6],a[10:])
Both are giving me errors, with the second one giving me: "TypeError: only integer scalar arrays can be converted to a scalar index"
Edit: concatenate is still giving me problems, so I am going to post my full code here:
Fold_5 = len(predictorX)/5
trainX = np.concatenate(predictorX[:3*int(Fold_5)],predictorX[4*int(Fold_5)])
predictorX is an array with values like
[[0.1,0.4,0.6,0.2],[..]....]
In:
a[0:6,10:]
0:6 selects rows, 10: selects columns. If a isn't 2d or large enough that will result in an error.
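For instance, assuming a is 1-D (as the rest of the question suggests), the first expression fails outright:
import numpy as np

a = np.arange(15)   # a 1-D example array
a[0:6, 10:]         # IndexError: too many indices for array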
In
np.concatenate(a[0:6],a[10:])
the problem is the number of arguments; concatenate takes a sequence of arrays as its first argument. A second argument, if given, is understood to be the axis, which should be an integer (hence your error).
np.concatenate([a[0:6],a[10:]])
should work.
Another option is to index with a list
a[[0, 1, 2, 3, 4, 5, 10, 11, ...]]
np.r_ is a handy little tool for constructing such a list:
In [73]: np.r_[0:6, 10:15]
Out[73]: array([ 0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14])
It in effect does np.concatenate([np.arange(0,6),np.arange(10,15)]).
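Applied to the original problem, assuming a 1-D a with at least 15 elements:
a[np.r_[0:6, 10:len(a)]]   # indices 0-5 plus 10 through the end of a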
It doesn't matter whether you index first and then concatenate, or concatenate the indices first and then index. Efficiency is about the same. np.delete chooses among several methods, including these, depending on the size and type of the 'delete' region.
In the trainX expression, adding [] to the concatenate call should work. However, predictorX[4*Fold_5] could be a problem. Are you missing a : (as in the 10: example)? If you want just one value, then you need to convert it to 1d, e.g. predictorX[[4*Fold_5]].
Fold_5 = len(predictorX)//5 # integer division in py3
trainX = np.concatenate([predictorX[:3*Fold_5], predictorX[4*Fold_5:]])
Here are two more short ways of getting the desired subarray:
np.delete(a, np.s_[6:10])
and
np.r_[a[:6], a[10:]]
np.concatenate takes a sequence of arrays. Try
np.concatenate([a[0:6],a[10:]])
or
np.concatenate((a[0:6],a[10:]))
I have image data which I am iterating through in order to find the pixels which have useful data in them. I then need to find the coordinates of these pixels subject to a conditional statement and put them into an array or DataFrame. The code I have so far is:
pix_coor = np.empty((0, 2))
for (x, y), value in np.ndenumerate(data_int):
    if value >= sigma3:
        pix_coor.append([x, y])
where data_int is just an image array (129, 129). All the pixels that have a value larger than sigma3 are useful and the other ones I don't need.
Creating an empty array works fine, but appending to it doesn't seem to work. I need to end up with an array which has two columns of x and y values for the useful pixels. Any ideas?
You could simply use np.argwhere for a vectorized solution -
pix_coor = np.argwhere(data_int >= sigma3)
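As a quick illustration with made-up values (the names mirror the question; the numbers are arbitrary):
import numpy as np

data_int = np.array([[5, 0],
                     [7, 2]])
sigma3 = 3
pix_coor = np.argwhere(data_int >= sigma3)
print(pix_coor)
# [[0 0]
#  [1 0]]   one (row, column) pair per pixel that meets the condition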
In numpy, appending is not an in-place operation: arrays have no .append method at all (which is why the loop above fails), and np.append copies the entire array into newly allocated memory (big enough to hold it along with the new values) and returns the new array. Therefore it should be used as such:
new_arr = np.append(arr, values)
Obviously, this is not an efficient way to add elements one by one.
You should probably use a regular Python list for this.
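A minimal sketch of that list-based approach, mirroring the loop from the question:
coords = []
for (x, y), value in np.ndenumerate(data_int):
    if value >= sigma3:
        coords.append([x, y])      # list.append is cheap and in-place
pix_coor = np.array(coords)        # build the (N, 2) array once, at the end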
Alternatively, pre-allocate the numpy array with room for all values and then resize it:
pix_coor = np.empty((data_int.size, 2), int)
c = 0
for (x, y), value in np.ndenumerate(data_int):
    if value >= sigma3:
        pix_coor[c] = (x, y)
        c += 1
pix_coor = np.resize(pix_coor, (c, 2))   # shrink to the rows actually filled
Note that I used np.empty((data_int.size, 2), int), since your coordinates are integral, while numpy defaults to floats.
I want to select one row of an array by the median value in one of the columns.
My method does not work the way I expect it to work, and it could be related to the representation/precision of the value returned by the numpy.median() function.
Here is a minimal working example and a workaround that I found:
import numpy as np
# Create an array with random numbers
some_array = np.random.rand(100)
# Try to select
selection = (some_array == np.median(some_array))
print(len(some_array[selection]), len(some_array[~selection]))  # Gives: 0, 100 -> selection fails
# Work-around
abs_dist_from_median = np.abs(some_array-np.median(some_array))
selection = (abs_dist_from_median == np.min(abs_dist_from_median))
print(len(some_array[selection]), len(some_array[~selection]))  # Gives: 1, 99 -> selection succeeded
It seems that the np.median() function returns a different representation of the number, thereby leading to a mismatch in the selection.
I find this behaviour strange, since by definition the median value of an array should be contained in the array. Any help/clarification would be appreciated!
First, when the number of values is even, as in [1, 2, 3, 4], the median is (2+3)/2 = 2.5, not 2 or 3, so it need not be an element of the array at all. If you change 100 to 101, it works properly. So your second approach is more appropriate for your purpose.
However, the best solution seems to use argsort as
some_array[some_array.argsort()[len(some_array) // 2]]
Also, do not use == when you compare two float values; use np.isclose instead.
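For instance, with an odd-length array (so the median is an actual element), a tolerant comparison could look like:
some_array = np.random.rand(101)
selection = np.isclose(some_array, np.median(some_array))
print(len(some_array[selection]), len(some_array[~selection]))  # typically 1, 100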
I am looking to replace a number with NaN in numpy and am looking for a function like numpy.nan_to_num, except in reverse.
The number is likely to change as different arrays are processed because each can have a uniquely defined NoDataValue. I have seen people using dictionaries, but the arrays are large and filled with both positive and negative floats. I suspect that it is not efficient to try to load all of these into anything to create keys.
I tried using the following but numpy requires that I use any() or all(). I realize that I need to iterate element wise, but hope that a built-in function can achieve this.
def replaceNoData(scanBlock, NDV):
    for n, i in enumerate(array):
        if i == NDV:
            scanBlock[n] = numpy.nan
NDV is GDAL's no data value and array is a numpy array.
Is a masked array the way to go perhaps?
A[A==NDV]=numpy.nan
A==NDV will produce a boolean array that can be used as an index for A
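Applied to the function from the question, this boolean-indexing approach could look like the following sketch (it assumes scanBlock is a float array, since an integer array cannot hold NaN):
import numpy

def replaceNoData(scanBlock, NDV):
    # set every element equal to the no-data value to NaN, in place
    scanBlock[scanBlock == NDV] = numpy.nan
    return scanBlock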
You can also use np.where to replace a number with NaN.
arr = np.where(arr==NDV, np.nan, arr)
For example, the following replaces every 1 with NaN:
arr = np.array([[1, 1, 2], [2, 0, 1]])
arr = np.where(arr==1, np.nan, arr)
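Printed, the result would look roughly like this (everything is now float, since NaN forces a float dtype):
print(arr)
# [[nan nan  2.]
#  [ 2.  0. nan]]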
This creates a new copy (unlike A[A==NDV]=np.nan), but in some cases that could be useful. For example, if the array was initially an int dtype, it will have to be converted into a float array anyway (because replacing values with NaN won't work otherwise), and np.where can handle that.