Is there any way to take a column of a numpy array and, whenever the absolute value is greater than a number, set the value to that signed number?
i.e.
for val in col:
    if abs(val) > max:
        val = (signed) max
I know this can be done by looping and such, but I was wondering if there was a cleaner/builtin way to do this.
I see there is something like
arr[arr > 255] = x
Which is kind of what I want, but I want to do this by column instead of over the whole array. As a bonus, maybe there is a way to work on absolute values instead of having to do two separate operations for positive and negative.
The other answer is good but it doesn't get you all the way there. Frankly, this is somewhat of a RTFM situation. But you'd be forgiven for not grokking the Numpy indexing docs on your first try, because they are dense and the data model will be alien if you are coming from a more traditional programming environment.
You will have to use np.clip on the columns you want to clip, like so:
x[:,2] = np.clip(x[:,2], 0, 255)
This applies np.clip to the column at index 2 (the third column) of the array, "slicing" down all rows, then assigns the result back to that same column. The : is slice syntax meaning "give me all elements along this axis".
More generally, you can use the boolean subsetting index that you discovered in the same fashion, by slicing across rows and selecting the desired columns:
x[x[:,2] > 255, 2] = -1
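For the signed case in the question (cap a column at plus or minus some limit), a symmetric clip handles both signs in one call. This is a minimal sketch: the array x and the threshold name limit are made up for illustration.

```python
import numpy as np

# Made-up data for illustration; `limit` is a hypothetical threshold name.
x = np.array([[1, 2, -300],
              [4, 5,  100],
              [7, 8,  400]])
limit = 255

# Clip only the column at index 2 to the range [-limit, limit];
# the other columns are left untouched.
x[:, 2] = np.clip(x[:, 2], -limit, limit)
print(x[:, 2])
```

Values below -limit become -limit and values above limit become limit, so no separate handling of positive and negative values is needed.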
Try calling clip on your numpy array:
import numpy as np
values = np.array([-3,-2,-1,0,1,2,3])
values.clip(-2,2)
Out[292]:
array([-2, -2, -1, 0, 1, 2, 2])
It may be a little late, but I think this is a good option:
import numpy as np
values = np.array([-3,-2,-1,0,1,2,3])
values = np.clip(values,-2,2)
Code:
import numpy as np
ray = [1,22,33,42,51], [61,71,812,92,103], [113,121,132,143,151], [16,172,183,19,201]
ray = np.asarray(ray)
type(ray)
ray[np.ix_([-2:],[3:4])]
I'd like to use index slicing and get a subarray consisting of the last two rows and the 3rd/4th columns. My current code produces an error:
I'd also like to sum each column. What am I doing wrong? I cannot post a picture because I need at least 10 reputation points.
So you want to make a slice of an array. The most straightforward way to do it is... slicing:
slice = ray[-2:,3:]
or if you want it explicitly
slice = ray[-2:,3:5]
See it explained in Understanding slicing
But if you do want to use np.ix_ for some reason, you need
slice = ray[np.ix_([-2,-1],[3,4])]
You can't use : here, because the [] don't make a slice; they construct lists, so you must explicitly list every row index and every column index you want in the result. If there are many consecutive indices, you can use range:
slice = ray[np.ix_(range(-2, 0),range(3, 5))]
And to sum each column:
slice.sum(0)
0 means you want to reduce the 0th dimension (rows) by summation and keep other dimensions (columns in this case).
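Putting the pieces together, here is a small sketch using the array from the question. The names sub, sub_ix and col_sums are chosen for this example only (sub is used instead of slice to avoid shadowing the Python builtin).

```python
import numpy as np

ray = np.asarray([[1, 22, 33, 42, 51],
                  [61, 71, 812, 92, 103],
                  [113, 121, 132, 143, 151],
                  [16, 172, 183, 19, 201]])

# Plain slicing: last two rows, columns at index 3 and 4.
sub = ray[-2:, 3:5]

# The same selection via np.ix_, which needs explicit index lists.
sub_ix = ray[np.ix_([-2, -1], [3, 4])]

# Reduce over axis 0 (rows) to get one sum per column.
col_sums = sub.sum(axis=0)
print(sub)
print(col_sums)
```

One difference worth knowing: plain slicing returns a view into ray, while indexing with np.ix_ returns a copy.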
I'm trying to write code with numpy that outputs the maximum value between indexes. I think argmax could be useful, but I do not know how to use slices without a for loop in Python. If there is a pandas function for this, that would work too. I want the computation to be as fast as possible.
list_ = np.array([9887.89, 9902.99, 9902.99, 9910.23, 9920.79, 9911.34, 9920.01, 9927.51, 9932.3, 9932.33, 9928.87, 9929.22, 9929.22, 9935.24, 9935.24, 9935.26, 9935.26, 9935.68, 9935.68, 9940.5])
indexes = np.array([0, 5, 10, 19])
Expected result:
Max number between index(0 - 5): 9920.79 at index 5
Max number between index(5 - 10): 9932.33 at index 10
Max number between index(10 - 19): 9940.5 at index 19
You can use reduceat directly on your array, without the need to slice/split it:
np.maximum.reduceat(list_,indexes[:-1])
output:
array([9920.79, 9932.33, 9940.5 ])
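To see what reduceat is doing here: with increasing start indices, each segment runs from one start index up to (but not including) the next, and the last segment runs to the end of the array. The sketch below compares it against explicit slices; the names starts, seg_max, bounds and explicit are made up for this example, and the slice-based version is there only as a cross-check.

```python
import numpy as np

list_ = np.array([9887.89, 9902.99, 9902.99, 9910.23, 9920.79,
                  9911.34, 9920.01, 9927.51, 9932.3, 9932.33,
                  9928.87, 9929.22, 9929.22, 9935.24, 9935.24,
                  9935.26, 9935.26, 9935.68, 9935.68, 9940.5])
indexes = np.array([0, 5, 10, 19])

# Segment start positions: 0, 5, 10.
starts = indexes[:-1]
seg_max = np.maximum.reduceat(list_, starts)

# Cross-check with explicit slices [0:5], [5:10], [10:].
bounds = list(starts) + [len(list_)]
explicit = [list_[a:b].max() for a, b in zip(bounds[:-1], bounds[1:])]

print(seg_max)
print(explicit)
```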
Assuming that the first (zero) index and the last index are specified in the indexes array:
import numpy as np
list_ = np.array([9887.89, 9902.99, 9902.99, 9910.23, 9920.79, 9911.34, 9920.01, 9927.51, 9932.3, 9932.33, 9928.87, 9929.22, 9929.22, 9935.24, 9935.24, 9935.26, 9935.26, 9935.68, 9935.68, 9940.5])
indexes = np.array([0, 5, 10, 19])
chunks = np.split(list_, indexes[1:-1])
print([c.max() for c in chunks])
max_ind = [c.argmax() for c in chunks]
print(max_ind + indexes[:-1])
With an arbitrary specification of indices, the chunks will not necessarily all have the same size. So the vectorization benefits of numpy are going to be lost one way or another (you can't have a numpy array whose elements each have a different size in memory and still keep all the benefits of vectorization).
At least one for loop is going to be necessary, I think. However, you can use split to make the splitting itself a numpy-optimized operation.
I am trying to isolate the last column of a numpy array. However, the function needs to work for arrays of different sizes. When I put it like this:
array[:,array_length]
#array_length is a variable set to the length of one row of the array
which seems like it would work, it returns an error telling me that I can't slice with a variable, but only with an integer.
Is there a way to do this with numpy that I'm not seeing?
To access the last column of a numpy array, you can use -1
last_col = array[:, -1]
Or you can also do
array_length = len(array[0]) - 1
last_col = array[:, array_length]
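A quick check that both forms select the same column, on a made-up 3x4 array (the names arr, last_a and last_b are for illustration only). Note that the column count itself, arr.shape[1], is one past the last valid index, which is why 1 is subtracted.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)   # made-up 3x4 array

# Negative indices count from the end, so -1 is the last column.
last_a = arr[:, -1]

# Explicit form: number of columns minus one.
last_b = arr[:, arr.shape[1] - 1]

print(last_a)
print(last_b)
```

The -1 form works for any number of columns without computing the length first.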
I have a large numpy array data that I wish to filter by one column [:,8] <= radius and get the sum of a different column [:,7]
So far I have the following which returns an "invalid slice" error.
>>> data.shape
(4700, 9)
>>> np.sum(data[np.where(data[:,8] <= 50):,7])
IndexError: invalid slice
I'm pretty new to python so really can't seem to figure out what I'm doing wrong here. Any thoughts or explanations would be appreciated.
There's no need for the np.where call.
data = np.random.normal(size=(20, 2))
np.sum(data[data[:,0] < 0, 1])
In this example, I want the rows where data[:,0] < 0 is True, and I want column 1. So just slice with those and take the sum.
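Applied to the shape from the question, the same idea might look like the sketch below. The data here is random stand-in data, and the names radius, mask, total, rows and total_where are assumptions for illustration. The np.where variant is included only to show that it works once its row indices are used as an index rather than as a slice bound.

```python
import numpy as np

# Random stand-in for the (4700, 9) array from the question.
rng = np.random.default_rng(0)
data = rng.uniform(0, 100, size=(4700, 9))
radius = 50

# Boolean mask over rows: True where column 8 is within the radius.
mask = data[:, 8] <= radius

# Select those rows in column 7 and sum them.
total = data[mask, 7].sum()

# Equivalent with np.where: take its row indices and index with them.
rows = np.where(mask)[0]
total_where = data[rows, 7].sum()

print(total, total_where)
```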
I have an array of size: (50, 50). Within this array there is a slice of size (20,10).
Only this slice contains data, the remainder is all set to nan.
How do I cut this slice out of my large array?
You can get this using fancy indexing to collect the items that are not NaN:
a = a[ np.logical_not( np.isnan(a) ) ].reshape(20,10)
or, alternatively, as suggested by Joe Kington:
a = a[ ~np.isnan(a) ].reshape(20,10)
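A small sketch of why the reshape matters: a boolean mask returns the selected elements as a flat 1-D array in row-major order, so the (20, 10) shape has to be restored afterwards. The position of the block inside the (50, 50) array is made up here, and the approach assumes the block itself contains no NaNs.

```python
import numpy as np

# Build a (50, 50) array of NaN with a (20, 10) block of data.
# The block position (rows 5:25, columns 30:40) is hypothetical.
a = np.full((50, 50), np.nan)
a[5:25, 30:40] = np.arange(200.0).reshape(20, 10)

# Boolean indexing flattens the result to 1-D ...
flat = a[~np.isnan(a)]

# ... so reshape it back to the known block shape.
block = flat.reshape(20, 10)

print(flat.shape, block.shape)
```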
Do you know where the NaNs are? If so, something like this should work:
newarray = np.copy(oldarray[xstart:xend,ystart:yend])
where xstart and xend are the beginning and end of the slice you want in the x dimension and similarly for y. You can then delete the old array to free up memory if you don't need it anymore.
If you don't know where the NaNs are, this should do the trick:
# in this example, the starting array is A, numpy is imported as np
boolA = np.isnan(A)  # boolean array marking where the NaNs are
# all the indices which are not NaN; list() makes the zip indexable in Python 3
nonnanidxs = list(zip(*np.where(~boolA)))
# slice out the NaNs
corner1 = nonnanidxs[0]   # first non-NaN index (top-left corner)
corner2 = nonnanidxs[-1]  # last non-NaN index (bottom-right corner)
xdist = corner2[0] - corner1[0] + 1
ydist = corner2[1] - corner1[1] + 1
B = np.copy(A[corner1[0]:corner1[0]+xdist, corner1[1]:corner1[1]+ydist])
# B is now the array you want
Note that this would be pretty slow for large arrays because np.where looks through the whole thing. There's an open issue in the numpy bug tracker for a method that finds the first index equal to some value and then stops. There might be a more elegant way to do this; this is just the first thing that came to my head.
EDIT: ignore, sgpc's answer is much better.