Numpy: Imposing a row-dependent maximum on an array - python

Suppose I have the following array:
a = [[1, 4, 2, 3],
     [3, 1, 5, 4],
     [4, 3, 1, 2]]
What I'd like to do is impose a maximum value on the array, but have that maximum vary by row. For instance, if I wanted to limit the 1st and 3rd rows to a maximum value of 3, and the 2nd row to a maximum of 4, I could create something like:
[[1, 3, 2, 3],
 [3, 1, 4, 4],
 [3, 3, 1, 2]]
Is there any better way than just looping over each row individually and setting it with 'nonzero'?

With numpy.clip (using the method version here):
a.clip(max=np.array([3, 4, 3])[:, None])  # np.clip(a, ...)
# array([[1, 3, 2, 3],
#        [3, 1, 4, 4],
#        [3, 3, 1, 2]])
Generalized:
def clip_2d_rows(a, maxs):
    maxs = np.asanyarray(maxs)
    if maxs.ndim == 1:
        maxs = maxs[:, np.newaxis]
    return np.clip(a, a_min=None, a_max=maxs)
You might be safer using the module-level function (np.clip) rather than the ndarray method (np.ndarray.clip). The former takes the upper limit as the a_max parameter, while the latter takes it as max, shadowing the builtin max, which is never a great idea.
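For illustration, both spellings below give the same row-limited result; only the keyword names differ:
import numpy as np

a = np.array([[1, 4, 2, 3], [3, 1, 5, 4], [4, 3, 1, 2]])
maxs = np.array([3, 4, 3])[:, None]  # column vector so each row gets its own limit

np.clip(a, a_min=None, a_max=maxs)   # module-level function: a_max keyword
a.clip(max=maxs)                     # ndarray method: max keyword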

With masking -
In [50]: row_lims = np.array([3,4,3])

In [51]: np.where(a > row_lims[:,None], row_lims[:,None], a)
Out[51]:
array([[1, 3, 2, 3],
       [3, 1, 4, 4],
       [3, 3, 1, 2]])
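Equivalently, an upper clip is just an elementwise minimum, so np.minimum with the same broadcasting should give an identical result:
np.minimum(a, row_lims[:,None])
# array([[1, 3, 2, 3],
#        [3, 1, 4, 4],
#        [3, 3, 1, 2]])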

With
>>> a
array([[1, 4, 2, 3],
       [3, 1, 5, 4],
       [4, 3, 1, 2]])
Say you have
>>> maxs = np.array([[3],[4],[3]])
>>> maxs
array([[3],
       [4],
       [3]])
What about doing
>>> a.clip(max=maxs)
array([[1, 3, 2, 3],
       [3, 1, 4, 4],
       [3, 3, 1, 2]])

Related

Permute a single row or column of a matrix

I have a large matrix in which I want to permute (or shift) one row.
For example:
np.array([[1, 2, 3, 4],
          [1, 2, 3, 4],
          [1, 2, 3, 4],
          [1, 2, 3, 4]])
And the desired output, shifting the second row by 1 in this example, is:
np.array([[1, 2, 3, 4],
          [2, 3, 4, 1],
          [1, 2, 3, 4],
          [1, 2, 3, 4]])
This can be done naively by extracting the row of interest, permuting it, and sticking it back into the matrix. I want a better solution that is in-place and efficient.
How can I shift a desired row or column by n places?
How can I permute it (change the order as desired)?
Can this be done efficiently for more than one row? For example, shift the i-th row i places forward:
np.array([[1, 2, 3, 4],
          [2, 3, 4, 1],
          [3, 4, 1, 2],
          [4, 1, 2, 3]])
You can do it by slicing out the rows and rolling them:
import numpy as np

a = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [1, 2, 3, 4],
              [1, 2, 3, 4]])

shift = 2
rows = [1, 3]
a[rows] = np.roll(a[rows], shift, axis=1)

Now a is:
array([[1, 2, 3, 4],
       [3, 4, 1, 2],
       [1, 2, 3, 4],
       [3, 4, 1, 2]])
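For the generalized case in the question (shift the i-th row i places), one possible sketch uses fancy indexing with a per-row column offset instead of np.roll; note this builds a new array rather than shifting in place:
import numpy as np

a = np.tile([1, 2, 3, 4], (4, 1))
shifts = np.arange(a.shape[0])  # shift row i by i places
rows, cols = np.ogrid[:a.shape[0], :a.shape[1]]
shifted = a[rows, (cols + shifts[:, None]) % a.shape[1]]
# array([[1, 2, 3, 4],
#        [2, 3, 4, 1],
#        [3, 4, 1, 2],
#        [4, 1, 2, 3]])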

Repeat a NumPy array in multiple dimensions at once?

np.repeat(np.repeat([[1, 2, 3]], 3, axis=0), 3, axis=1)
works as expected and produces
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
       [1, 1, 1, 2, 2, 2, 3, 3, 3],
       [1, 1, 1, 2, 2, 2, 3, 3, 3]])
However,
np.repeat([[1, 2, 3]], [3, 3])
and
np.repeat([[1, 2, 3]], [3, 3], axis=0)
produce errors.
Is it possible to repeat an array in multiple dimensions at once?
First off, I think the original method you propose is totally fine. It's readable, it makes sense, and it's not very slow.
You could use the repeat method instead of the function, which reads a bit more nicely:
>>> x.repeat(3, 1).repeat(3, 0)
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
       [1, 1, 1, 2, 2, 2, 3, 3, 3],
       [1, 1, 1, 2, 2, 2, 3, 3, 3]])
With numpy's broadcasting rules, there are likely dozens of ways to create the repeated data and throw it around into the shape you want, too. One approach could be to use np.broadcast_to() to repeat the data in D+1 dimensions, where D is the dimension you need, and then collapse it down to D.
For example:
>>> x = np.array([[1, 2, 3]])
>>> np.broadcast_to(x.T, (3, 3, 3)).reshape((3, 9))
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
       [1, 1, 1, 2, 2, 2, 3, 3, 3],
       [1, 1, 1, 2, 2, 2, 3, 3, 3]])
And without reshaping (so that you don't need to know the final length):
>>> np.hstack(np.broadcast_to(x, (3, 3, 3)).T)
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
       [1, 1, 1, 2, 2, 2, 3, 3, 3],
       [1, 1, 1, 2, 2, 2, 3, 3, 3]])
And there's likely a dozen other ways to do this. But I still think your original version is more idiomatic, as throwing it into extra dimensions to collapse it down is weird.
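One more for the pile, for reference: np.kron against a block of ones computes the same blockwise repetition, though it does multiplication work that repeat avoids:
>>> np.kron(x, np.ones((3, 3), dtype=int))
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
       [1, 1, 1, 2, 2, 2, 3, 3, 3],
       [1, 1, 1, 2, 2, 2, 3, 3, 3]])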
It isn't possible in a single call; see the docs for np.repeat. And since you are using an array with shape (1,3), you have to use:
np.repeat(X, [2], axis=0)
because np.repeat(X, [2, 2], axis=0) needs an input of shape (2,3), e.g.:
X = np.array([[1, 2, 3], [5, 6, 7]])
np.repeat(X, [2, 5], axis=0)
the output looks like:
[[1 2 3]
 [1 2 3]
 [5 6 7]
 [5 6 7]
 [5 6 7]
 [5 6 7]]
This means [2, 5] stands for: repeat the first row 2x and the second row 5x. The repeats list needs one entry per row (the column count doesn't matter) because axis=0 means you want to repeat the rows. Therefore you would first have to generate an array with three rows, and only then repeat along the other axis.
If you want to repeat your array:
np.repeat(X2, [5], axis=0)
produces:
[[1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]]
because X2 has only one row to repeat. The first call of np.repeat produces a 2D array with the rows repeated; calling np.repeat a second time on that output then duplicates the columns, which gives the same result you mentioned in your post above.
In my opinion your use of np.repeat is the easiest and best way to achieve your output.
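That said, if you need this in several places, a small wrapper (an illustrative sketch, not a numpy builtin) can apply np.repeat once per axis:
import numpy as np

def repeat_multi(a, reps):
    # apply np.repeat along each axis in turn
    a = np.asarray(a)
    for axis, rep in enumerate(reps):
        a = np.repeat(a, rep, axis=axis)
    return a

repeat_multi([[1, 2, 3]], (3, 3))
# array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
#        [1, 1, 1, 2, 2, 2, 3, 3, 3],
#        [1, 1, 1, 2, 2, 2, 3, 3, 3]])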
Edit: Hopefully the answer is clearer now.

numpy delete list element from list of lists

I have an array of numpy arrays:
a = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
I need to find and remove a particular list from a:
rem = [1,2,3,5]
numpy.delete(a,rem) does not return the correct results. I need to be able to return:
[[1, 2, 3, 4], [2, 5, 4, 3], [5, 2, 3, 1]]
Is this possible with numpy?
A list comprehension can achieve this.
rem = [1,2,3,5]
a = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
a = [x for x in a if x != rem]
outputs
[[1, 2, 3, 4], [2, 5, 4, 3], [5, 2, 3, 1]]
Numpy arrays do not support removal of arbitrary elements in place. Similar to strings in Python, you need to generate a new array to delete one or more sub-arrays.
Given:
>>> a
array([[1, 2, 3, 4],
       [1, 2, 3, 5],
       [2, 5, 4, 3],
       [5, 2, 3, 1]])
>>> rem
array([1, 2, 3, 5])
You can keep each non-matching sub-array and create a new array from those:
>>> a = np.array([sa for sa in a if not np.all(sa==rem)])
>>> a
array([[1, 2, 3, 4],
       [2, 5, 4, 3],
       [5, 2, 3, 1]])
To use np.delete, you would use an index and not a match, so:
>>> a
array([[1, 2, 3, 4],
       [1, 2, 3, 5],
       [2, 5, 4, 3],
       [5, 2, 3, 1]])
>>> np.delete(a, 1, 0)  # delete element 1, axis 0
array([[1, 2, 3, 4],
       [2, 5, 4, 3],
       [5, 2, 3, 1]])
But you can't loop over the array and delete elements...
You can pass multiple indices to np.delete, however; you just need to find the matching sub-arrays:
>>> a
array([[1, 2, 3, 5],
       [1, 2, 3, 5],
       [2, 5, 4, 3],
       [5, 2, 3, 1]])
>>> np.delete(a, [i for i, sa in enumerate(a) if np.all(sa==rem)], 0)
array([[2, 5, 4, 3],
       [5, 2, 3, 1]])
And given that same a, you can have an all-numpy solution by using np.where:
>>> np.delete(a, np.where((a == rem).all(axis=1)), 0)
array([[2, 5, 4, 3],
       [5, 2, 3, 1]])
Did you try list remove?
In [84]: a = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
In [85]: a
Out[85]: [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
In [86]: rem = [1,2,3,5]
In [87]: a.remove(rem)
In [88]: a
Out[88]: [[1, 2, 3, 4], [2, 5, 4, 3], [5, 2, 3, 1]]
remove matches on value.
np.delete works with an index, not a value. It also returns a copy; it does not act in place. And the result is an array, not a nested list (np.delete converts the input to an array before operating on it).
In [92]: a = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
In [93]: a1=np.delete(a,1, axis=0)
In [94]: a1
Out[94]:
array([[1, 2, 3, 4],
       [2, 5, 4, 3],
       [5, 2, 3, 1]])
This is more like list pop:
In [96]: a = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
In [97]: a.pop(1)
Out[97]: [1, 2, 3, 5]
In [98]: a
Out[98]: [[1, 2, 3, 4], [2, 5, 4, 3], [5, 2, 3, 1]]
To delete by value you first need to find the index of the desired row. With integer arrays that's not too hard. With floats it is trickier.
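For floats, a hedged sketch: swap the exact == for np.isclose so rows that match within tolerance are still found:
import numpy as np

A = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
rem = np.array([0.4, 0.5, 0.6])
mask = np.isclose(A, rem).all(axis=1)  # tolerant row match
A_new = A[~mask]                       # array([[0.1, 0.2, 0.3]])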
=========
But you don't need to use delete to do this in numpy; boolean indexing works:
In [119]: a = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 5, 4, 3], [5, 2, 3, 1]]
In [120]: A = np.array(a) # got to work with array, not list
In [121]: rem=np.array([1,2,3,5])
A simple comparison; rem is broadcast to match the rows:
In [122]: A==rem
Out[122]:
array([[ True,  True,  True, False],
       [ True,  True,  True,  True],
       [False, False, False, False],
       [False,  True,  True, False]], dtype=bool)
Find the row where all elements match; this is the one we want to remove:
In [123]: (A==rem).all(axis=1)
Out[123]: array([False, True, False, False], dtype=bool)
Just negate it, and use it to index A:
In [124]: A[~(A==rem).all(axis=1),:]
Out[124]:
array([[1, 2, 3, 4],
       [2, 5, 4, 3],
       [5, 2, 3, 1]])
(the original A is not changed).
np.where can be used to convert the boolean mask (or its inverse) to indices. Sometimes that's handy, but usually it isn't required.
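For instance, np.flatnonzero (shorthand for np.where(...)[0] on a 1-d mask) feeds straight into np.delete and gives the same rows:
idx = np.flatnonzero((A == rem).all(axis=1))  # indices of rows equal to rem
np.delete(A, idx, axis=0)                     # same result as the boolean indexing above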

Remove numpy concat from algo

I have a function called gen_data which makes a single pass through a list and constructs a 3D array. I then iterate across a list of lists, applying gen_data, and concatenate the results together.
fst = lambda x: x[0]
snd = lambda x: x[1]

def gen_data(data, p=0, batch_size=BATCH_SIZE, n_session=N_SESSION,
             return_target=True):
    x = np.zeros((batch_size, SEQ_LENGTH, vocab_size))
    y = np.zeros(batch_size)
    for n in range(batch_size):
        ptr = n
        for i in range(SEQ_LENGTH):
            x[n, i, char_to_ix[data[p+ptr+i]]] = 1.
        if return_target:
            y[n] = char_to_ix[data[p+ptr+SEQ_LENGTH]]
    return x, np.array(y, dtype='int32')

def batch_data(data):
    nest = [gen_data(datum) for datum in data]
    x = np.concatenate(map(fst, nest))
    y = np.concatenate(map(snd, nest))
    return (x, y)
What is the best way to combine these functions so I do not need to make multiple passes back through the data to concatenate the results?
To clarify, the goal is to remove the need to zip/concat/splat/list-comp in general; to be able to initialize the x tensor to the correct dimensions and then iterate across each datum/SEQ_LENGTH/batch_size in a single pass.
Without testing things, here are a few quick fixes:
def gen_data(data, p=0, batch_size=BATCH_SIZE, n_session=N_SESSION,
             return_target=True):
    x = np.zeros((batch_size, SEQ_LENGTH, vocab_size))
    y = np.zeros(batch_size, dtype=int)  # initialize to the desired type
    for n in range(batch_size):
        ptr = n
        for i in range(SEQ_LENGTH):
            x[n, i, char_to_ix[data[p+ptr+i]]] = 1.
        if return_target:
            y[n] = char_to_ix[data[p+ptr+SEQ_LENGTH]]
    # y is already an int array; np.array(y, dtype='int32') isn't needed
    return x, y
nest = [gen_data(datum) for datum in data] produces, I think,
[(x0,y0), (x1,y1), ...] where each x is 3d with shape (n,m,k), and each y is 1d with shape (n,).
x = np.concatenate([n[0] for n in nest]) (I like this format over mapping) looks OK to me. Compared to all the list comprehension operations, concatenate is relatively cheap. Look at the guts of np.vstack, etc. to see how those use comprehensions along with concatenate.
A small example:
In [515]: def gen():
   .....:     return np.arange(8).reshape(2,4), np.arange(1,3)
   .....:

In [516]: gen()
Out[516]:
(array([[0, 1, 2, 3],
        [4, 5, 6, 7]]), array([1, 2]))
In [517]: nest=[gen() for _ in range(3)]
In [518]: nest
Out[518]:
[(array([[0, 1, 2, 3],
         [4, 5, 6, 7]]), array([1, 2])),
 (array([[0, 1, 2, 3],
         [4, 5, 6, 7]]), array([1, 2])),
 (array([[0, 1, 2, 3],
         [4, 5, 6, 7]]), array([1, 2]))]
In [519]: np.concatenate([x[0] for x in nest])
Out[519]:
array([[0, 1, 2, 3],
       [4, 5, 6, 7],
       [0, 1, 2, 3],
       [4, 5, 6, 7],
       [0, 1, 2, 3],
       [4, 5, 6, 7]])

In [520]: np.concatenate([x[1] for x in nest])
Out[520]: array([1, 2, 1, 2, 1, 2])
zip(*...) effectively does a 'transpose' on a nested list, so the arrays could also be constructed with:
In [532]: nest1 = list(zip(*nest))  # list(...) needed on Python 3
In [533]: np.concatenate(nest1[0])
Out[533]:
array([[0, 1, 2, 3],
       [4, 5, 6, 7],
       [0, 1, 2, 3],
       [4, 5, 6, 7],
       [0, 1, 2, 3],
       [4, 5, 6, 7]])
In [534]: np.concatenate(nest1[1])
Out[534]: array([1, 2, 1, 2, 1, 2])
Still requires concatenates.
Since nest is a list of tuples, it could serve as input to a structured array:
In [524]: arr = np.array(nest, dtype=[('x','(2,4)int'),('y','(2,)int')])
In [525]: arr['x']
Out[525]:
array([[[0, 1, 2, 3],
        [4, 5, 6, 7]],
       [[0, 1, 2, 3],
        [4, 5, 6, 7]],
       [[0, 1, 2, 3],
        [4, 5, 6, 7]]])
In [526]: arr['y']
Out[526]:
array([[1, 2],
       [1, 2],
       [1, 2]])
Another possibility is to initialize x and y and iterate. But you are already doing this in gen_data. The only new thing is that I'd be assigning larger blocks.
x = ...
y = ...
for i in range(...):
    x[i, ...], y[i] = gen(data[i])
I like the comprehension solutions better, but I won't speculate on speeds.
In terms of speed I think it's the low-level iteration in gen_data that is the time consumer. Concatenating larger blocks is relatively fast.
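To make the block-assignment idea concrete, a minimal sketch (the n_chunks name is illustrative; gen_data, data, and the size globals are assumed to be as in the question):
import numpy as np

n_chunks = len(data)  # assumed: one gen_data call per datum
x = np.zeros((n_chunks * BATCH_SIZE, SEQ_LENGTH, vocab_size))
y = np.zeros(n_chunks * BATCH_SIZE, dtype='int32')
for i, datum in enumerate(data):
    sl = slice(i * BATCH_SIZE, (i + 1) * BATCH_SIZE)
    x[sl], y[sl] = gen_data(datum)  # fill each block in place; no concatenate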
Another idea: since you are iterating over the rows of arrays within gen_data, how about passing views into that function, and iterating over those?
def gen_data(data, x=None, y=None):
    # accept arrays or make our own
    if x is None:
        x = np.zeros((3,4), int)
    if y is None:
        y = np.zeros(3, int)
    for n in range(3):
        x[n, ...] = np.arange(4) + n
        y[n] = n
    return x, y
With no inputs, it generates arrays as before:
In [543]: gen_data(None)
Out[543]:
(array([[0, 1, 2, 3],
        [1, 2, 3, 4],
        [2, 3, 4, 5]]),
 array([0, 1, 2]))
Or initialize a pair, and iterate over views:
In [544]: x, y = np.zeros((9,4), int), np.zeros(9, int)

In [546]: for i in range(0, 9, 3):
   .....:     gen_data(None, x[i:i+3, ...], y[i:i+3])

In [547]: x
Out[547]:
array([[0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 3, 4, 5],
       [0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 3, 4, 5],
       [0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 3, 4, 5]])

In [548]: y
Out[548]: array([0, 1, 2, 0, 1, 2, 0, 1, 2])

Identify vectors with same value in one column with numpy in python

I have a large 2d array of vectors. I want to split this array into several arrays according to one of the vectors' elements or dimensions, so that each small array contains the rows whose values in that column are consecutively identical. For example, considering the third dimension or column:
orig = np.array([[1, 2, 3],
                 [3, 4, 3],
                 [5, 6, 4],
                 [7, 8, 4],
                 [9, 0, 4],
                 [8, 7, 3],
                 [6, 5, 3]])
I want to turn this into three arrays consisting of rows 1-2, 3-5, and 6-7:
>>> a
array([[1, 2, 3],
       [3, 4, 3]])
>>> b
array([[5, 6, 4],
       [7, 8, 4],
       [9, 0, 4]])
>>> c
array([[8, 7, 3],
       [6, 5, 3]])
I'm new to python and numpy. Any help would be greatly appreciated.
Regards
Mat
Edit: I reformatted the arrays to clarify the problem
Using np.split:
>>> a, b, c = np.split(orig, np.where(orig[:-1, 2] != orig[1:, 2])[0]+1)
>>> a
array([[1, 2, 3],
       [3, 4, 3]])
>>> b
array([[5, 6, 4],
       [7, 8, 4],
       [9, 0, 4]])
>>> c
array([[8, 7, 3],
       [6, 5, 3]])
Nothing fancy here, but this good old-fashioned loop should do the trick:
import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 3],
              [1, 2, 3]])

groups = []
rows = a[0]
# here I assume that the grouping is based on the last column; change
# the index accordingly if that is not the case
prev = a[0][-1]
for row in a[1:]:
    if row[-1] == prev:
        rows = np.vstack((rows, row))
    else:
        groups.append(rows)
        rows = row
        prev = row[-1]
groups.append(rows)
print(groups)
## [array([[1, 2, 3],
##         [1, 2, 3]]),
##  array([[1, 2, 4],
##         [1, 2, 4],
##         [1, 2, 4]]),
##  array([[1, 2, 3],
##         [1, 2, 3]])]
If a looks like this:
array([[1, 1, 2, 3],
       [2, 1, 2, 3],
       [3, 1, 2, 4],
       [4, 1, 2, 4],
       [5, 1, 2, 4],
       [6, 1, 2, 3],
       [7, 1, 2, 3]])
then this
col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
indices = np.concatenate(([0], indices, [len(a)]))
res = [a[start:end] for start, end in zip(indices[:-1], indices[1:])]
print(res)
results in:
[array([[1, 1, 2, 3],
        [2, 1, 2, 3]]),
 array([[3, 1, 2, 4],
        [4, 1, 2, 4],
        [5, 1, 2, 4]]),
 array([[6, 1, 2, 3],
        [7, 1, 2, 3]])]
Update: np.split() is much nicer. No need to add first and last index:
col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
res = np.split(a, indices)
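Applied to the orig array from the question, this yields the three groups directly:
>>> col = orig[:, -1]
>>> indices = np.where(col[:-1] != col[1:])[0] + 1
>>> a, b, c = np.split(orig, indices)
>>> b
array([[5, 6, 4],
       [7, 8, 4],
       [9, 0, 4]])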
