My goal is to replace a for-loop with a vectorized NumPy solution. For each element of a list of lists, the loop draws a random sample from another list, restricted to the items that do not appear in that element. The loop's input is a list of lists, and I do not know how to vectorize over that with NumPy. A reproducible example:
import random
list_of_all_items = [1, 2, 3, 4, 12, 21, 23, 42, 93]
seen_formats = [[1, 2, 3, 4], [2,23, 21, 3], [12, 42, 93, 1]]
not_seen_formats = []
for seen in seen_formats:
    not_seen_formats.append(random.sample([format_ for format_ in list_of_all_items if format_ not in seen],
                                          len(seen) * 1))
What I tried so far is:
import numpy as np
np.where(np.in1d(np.random.choice(list_of_all_items, 2, replace = False), np.asarray(seen_formats)))
>> (array([0, 1], dtype=int64),)
This sadly makes no sense. What I would like to have returned is an array which should contain random samples for the given list of lists, like:
>> array([[12, 21], # those numbers should be random numbers
[ 1, 4],
[ 2, 3]])
import numpy as np
np.random.seed(42)
list_of_all_items = np.array([1, 2, 3, 4, 12, 21, 23, 42, 93])
seen_formats = np.array([[1, 2, 3, 4], [2,23, 21, 3], [12, 42, 93, 1]])
print(list_of_all_items, '\n')
print(seen_formats, '\n')
def select(a, b):
    return np.random.choice(a=np.setdiff1d(b, a), size=a.size, replace=False)
selection = np.apply_along_axis(func1d=select, axis=1, arr=seen_formats, b=list_of_all_items)
print(selection)
# Alternatively:
# select_vect = np.vectorize(select, excluded=['b'], signature='(m),(n)->(m)')
# selection2 = select_vect(seen_formats, list_of_all_items)
# print(selection2)
Output:
[ 1 2 3 4 12 21 23 42 93]
[[ 1 2 3 4]
[ 2 23 21 3]
[12 42 93 1]]
[[21 93 23 12]
[42 4 12 1]
[ 3 2 21 23]]
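The answer above still calls `select` once per row in Python. If every row has the same length and the item list is small, the sampling can probably be done without a per-row call. The following is only a sketch under those assumptions: the name `sample_unseen` and the use of `np.random.default_rng` are illustrative choices, not part of the original answer. It gives every candidate item a random sort key, pushes already-seen items to the end with an infinite key, and keeps the first columns of the row-wise argsort. It also assumes each row has at least as many unseen items as it has seen ones; otherwise seen items would leak into the result.

```python
import numpy as np

def sample_unseen(all_items, seen, rng):
    """Hypothetical helper: draw seen.shape[1] unseen items per row, without replacement."""
    # unseen[i, j] is True when all_items[j] does not occur in seen[i]
    unseen = ~(seen[:, :, None] == all_items[None, None, :]).any(axis=1)
    # One random sort key per (row, item); seen items get +inf so they sort last
    keys = rng.random(unseen.shape)
    keys[~unseen] = np.inf
    # Sorting random keys yields a random permutation of the unseen items
    order = np.argsort(keys, axis=1)[:, :seen.shape[1]]
    return all_items[order]

list_of_all_items = np.array([1, 2, 3, 4, 12, 21, 23, 42, 93])
seen_formats = np.array([[1, 2, 3, 4], [2, 23, 21, 3], [12, 42, 93, 1]])
rng = np.random.default_rng(42)
selection = sample_unseen(list_of_all_items, seen_formats, rng)
print(selection)
```

The intermediate boolean array has shape (rows, row length, number of items), so this trades memory for speed and is mainly attractive when the item list is short.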
*Question edited/updated to add an example
Hi all! I have an np.array a. Based on its reference values, I want to update array b, which is my matrix. The first column of a holds a code and the second column holds my reference value. The matrix is populated with codes, and I must replace them. See the example below.
import numpy as np
a = np.asarray([[0, 11], [1, 22], [2, 33]])
b = np.asarray([[0, 14, 12, 2], [1, 1, 7, 0], [0, 0,3,5], [1, 2, 2, 6]])
In other words: I want to replace the 0, 1, 2 values in "b" by 11, 22, 33, respectively.
What is the best way to do that, considering that my real a array has about 50 codes and my real b matrices have a shape of (850, 850)?
Thanks in advance!
If I understand the question correctly, this example should show what you're asking for?
Assuming a is the matrix as you've listed above, and b is the list you want to write to
import numpy as np
a = np.asarray([[0, 10], [2, 30], [1, 40]])
b = np.zeros(3)
b[a[:, 0]] = a[:, 1]
where a[:, 0] holds the indices to be changed and a[:, 1] holds the values to populate them with.
If the codes are not too large as integers, you just have to build the correct lookup table:
lut = np.arange(b.max()+1)
k,v = a.T
lut[k] = v
For :
>>> b
[[ 0 14 12 2]
[ 1 1 7 0]
[ 0 0 3 5]
[ 1 2 2 6]]
>>> lut[b]
[[11 14 12 33]
[22 22 7 11]
[11 11 3 5]
[22 33 33 6]]
Undefined codes are mapped to themselves (code = value).
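For completeness, the lookup-table idea can be written as a self-contained sketch using the arrays from the question. The function name `replace_codes` is hypothetical, and the sketch assumes all codes are non-negative integers. It also sizes the table from both `b` and the codes in `a`, so a code that happens not to occur in `b` does not cause an index error.

```python
import numpy as np

def replace_codes(b, a):
    """Hypothetical helper: map codes in b through the (code, value) rows of a."""
    codes, values = a.T
    # The table must cover every value in b and every code in a
    size = max(b.max(), codes.max()) + 1
    lut = np.arange(size)   # identity mapping: unknown codes stay unchanged
    lut[codes] = values     # overwrite the entries that have a reference value
    return lut[b]           # integer indexing applies the table to all of b at once

a = np.asarray([[0, 11], [1, 22], [2, 33]])
b = np.asarray([[0, 14, 12, 2], [1, 1, 7, 0], [0, 0, 3, 5], [1, 2, 2, 6]])
result = replace_codes(b, a)
print(result)
```

The cost is one pass over `b` regardless of how many codes there are, which should suit roughly 50 codes and an (850, 850) matrix.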
I need to change my array from the following
Array = np.array([x1,y1,z1,x2,y2,z2......])
to
Array = [[x1,x2,x3......]
[y1,y2,y3,.....]
[z1,z2,z3,.....]]
Is this possible? If so, how?
Thanks
You just need to reshape that 1D array to 2D, then transpose it.
import numpy as np
a = np.array([10, 11, 12, 20, 21, 22, 30, 31, 32, 40, 41, 42, 50, 51, 52])
a = a.reshape(-1, 3).T
print(a)
output
[[10 20 30 40 50]
[11 21 31 41 51]
[12 22 32 42 52]]
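As a cross-check, here is a small sketch comparing the reshape-and-transpose approach with two equivalent formulations. The variable names are illustrative only; all three should produce the same 3 x (N/3) layout, provided the length of the input is a multiple of 3.

```python
import numpy as np

a = np.array([10, 11, 12, 20, 21, 22, 30, 31, 32, 40, 41, 42, 50, 51, 52])

by_transpose = a.reshape(-1, 3).T                    # reshape to (N/3, 3), then transpose
by_fortran = a.reshape(3, -1, order='F')             # fill the result column by column
by_slicing = np.stack([a[0::3], a[1::3], a[2::3]])   # every third element per row

print(by_fortran)
```

The transpose version returns a view of the original data, while `np.stack` always copies, which may matter for large arrays.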
I think reshape could help you.
See the docs: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.reshape.html
Is this what you were looking for?
import numpy as np
foo = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
ret = [[], [], []]
for idx, number in enumerate(foo):
    ret[idx % 3].append(number)
print(ret)  # out: [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
I am looking for a more pythonic way of randomly shifting the rows of a numpy array. The idea is that I have an array of data, and I want to left-shift each row of the array by a random amount. My solution works, but I feel it is a bit un-pythonic:
import numpy as np
def shift_rows(data, max_shift):
    """Left-shifts each row in `data` by a random amount up to `max_shift`."""
    return np.array([np.roll(row, -np.random.randint(0, max_shift)) for row in data])
And to test:
data = np.array([np.arange(0, 5) for _ in range(10)]) # toy data to illustrate
shifted = shift_rows(data, max_shift=5)
shifted
# array([[1, 2, 3, 4, 0],
#        [1, 2, 3, 4, 0],
#        [0, 1, 2, 3, 4],
#        ...
#        [4, 0, 1, 2, 3]])
This is really more of a thought experiment. Can anybody come up with a more efficient or more pythonic way of doing this? I suppose list comprehensions are pythonic, but if I need to do this over a huge array is this efficient?
Edit: I marked the excellent reply by Divakar as the answer, but I would still love to hear it if anybody has any other ideas.
Generate all the column indices for all rows in one go and then simply use integer-indexing for a vectorized solution, like so -
# Store shape of input array
m,n = data.shape
# Get random column start indices for each row in one go
col_start = np.random.randint(0, max_shift, data.shape[0])
# Get the rolled indices for every row again in a vectorized manner.
# We are extending col_start to 2D and then adding a range array to get
# all column indices for every row by leveraging NumPy's broadcasting.
# Because of the additions, we might go off-limits. So, to simulate the
# rolled over version, mod it.
idx = np.mod(col_start[:,None] + np.arange(n), n)
# Finally, with integer indexing, get the values off the data array
shifted_out = data[np.arange(m)[:,None], idx]
Step-by-step run -
1] Inputs :
In [548]: data
Out[548]:
array([[44, 23, 38, 32, 30],
[69, 15, 32, 41, 63],
[69, 41, 75, 50, 87],
[23, 28, 38, 79, 91]])
In [549]: max_shift = 5
2] Proposed solution :
2A] Get column starts :
In [550]: m,n = data.shape
In [551]: col_start = np.random.randint(0, max_shift, data.shape[0])
In [552]: col_start
Out[552]: array([1, 2, 3, 3])
2B] Get all indices :
In [553]: idx = np.mod(col_start[:,None] + np.arange(n), n)
In [554]: col_start[:,None]
Out[554]:
array([[1],
[2],
[3],
[3]])
In [555]: col_start[:,None] + np.arange(n)
Out[555]:
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[3, 4, 5, 6, 7]])
In [556]: np.mod(col_start[:,None] + np.arange(n), n)
Out[556]:
array([[1, 2, 3, 4, 0],
[2, 3, 4, 0, 1],
[3, 4, 0, 1, 2],
[3, 4, 0, 1, 2]])
2C] Finally index into data :
In [557]: data[np.arange(m)[:,None], idx]
Out[557]:
array([[23, 38, 32, 30, 44],
[32, 41, 63, 69, 15],
[50, 87, 69, 41, 75],
[79, 91, 23, 28, 38]])
Verification -
1] Original approach :
In [536]: data = np.random.randint(11,99,(4,5))
...: max_shift = 5
...: col_start = -np.random.randint(0, max_shift, data.shape[0])
...: for i,row in enumerate(data):
...:     print(np.array([np.roll(row, col_start[i])]))
...:
[[83 93 17 53 61]]
[[55 88 84 94 89]]
[[59 63 29 72 85]]
[[57 95 13 21 14]]
2] Proposed approach re-using col_start, so that we could do a value verification :
In [537]: m,n = data.shape
In [538]: idx = np.mod(-col_start[:,None] + np.arange(n), n)
In [539]: data[np.arange(m)[:,None], idx]
Out[539]:
array([[83, 93, 17, 53, 61],
[55, 88, 84, 94, 89],
[59, 63, 29, 72, 85],
[57, 95, 13, 21, 14]])
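The same check can be packaged as a short script. This is a sketch rather than the answerer's exact code: the function name `shift_rows_vectorized`, the explicit `shifts` argument, and the use of `np.random.default_rng` are assumptions made here so that the loop version and the indexed version share the same shifts.

```python
import numpy as np

def shift_rows_vectorized(data, shifts):
    """Hypothetical helper: left-shift row i of data by shifts[i] positions."""
    m, n = data.shape
    # Column index to read for every output position, wrapped around with modulo
    idx = (shifts[:, None] + np.arange(n)) % n
    # Pair each row index with its own set of column indices
    return data[np.arange(m)[:, None], idx]

rng = np.random.default_rng(0)
data = rng.integers(11, 99, size=(4, 5))
shifts = rng.integers(0, 5, size=data.shape[0])

# Reference result from the original per-row np.roll approach
expected = np.array([np.roll(row, -s) for row, s in zip(data, shifts)])
result = shift_rows_vectorized(data, shifts)
print(np.array_equal(result, expected))
```

Passing the shifts in explicitly keeps the function deterministic, which makes it easier to compare against the loop.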
I frequently want to pixel bin/pixel bucket a numpy array, meaning, replace groups of N consecutive pixels with a single pixel which is the sum of the N replaced pixels. For example, start with the values:
x = np.array([1, 3, 7, 3, 2, 9])
with a bucket size of 2, this transforms into:
bucket(x, bucket_size=2)
= [1+3, 7+3, 2+9]
= [4, 10, 11]
As far as I know, there's no numpy function that specifically does this (please correct me if I'm wrong!), so I frequently roll my own. For 1d numpy arrays, this isn't bad:
import numpy as np
def bucket(x, bucket_size):
    return x.reshape(x.size // bucket_size, bucket_size).sum(axis=1)
bucket_me = np.array([3, 4, 5, 5, 1, 3, 2, 3])
print(bucket(bucket_me, bucket_size=2)) #[ 7 10 4 5]
...however, I get confused easily for the multidimensional case, and I end up rolling my own buggy, half-assed solution to this "easy" problem over and over again. I'd love it if we could establish a nice N-dimensional reference implementation.
Preferably the function call would allow different bin sizes along different axes (perhaps something like bucket(x, bucket_size=(2, 2, 3)))
Preferably the solution would be reasonably efficient (reshape and sum are fairly quick in numpy)
Bonus points for handling edge effects when the array doesn't divide nicely into an integer number of buckets.
Bonus points for allowing the user to choose the initial bin edge offset.
As suggested by Divakar, here's my desired behavior in a sample 2-D case:
x = np.array([[1, 2, 3, 4],
[2, 3, 7, 9],
[8, 9, 1, 0],
[0, 0, 3, 4]])
bucket(x, bucket_size=(2, 2))
= [[1 + 2 + 2 + 3, 3 + 4 + 7 + 9],
[8 + 9 + 0 + 0, 1 + 0 + 3 + 4]]
= [[8, 23],
[17, 8]]
...hopefully I did my arithmetic correctly ;)
I think you can do most of the fiddly work with skimage's view_as_blocks. This function is implemented using as_strided so it is very efficient (it just changes the stride information to reshape the array). Because it's written in Python/NumPy, you can always copy the code if you don't have skimage installed.
After applying that function, you just need to sum the N trailing axes of the reshaped array (where N is the length of the bucket_size tuple). Here's a new bucket() function:
from skimage.util import view_as_blocks
def bucket(x, bucket_size):
    blocks = view_as_blocks(x, bucket_size)
    tup = tuple(range(-len(bucket_size), 0))
    return blocks.sum(axis=tup)
Then for example:
>>> x = np.array([1, 3, 7, 3, 2, 9])
>>> bucket(x, bucket_size=(2,))
array([ 4, 10, 11])
>>> x = np.array([[1, 2, 3, 4],
[2, 3, 7, 9],
[8, 9, 1, 0],
[0, 0, 3, 4]])
>>> bucket(x, bucket_size=(2, 2))
array([[ 8, 23],
[17, 8]])
>>> y = np.arange(6*6*6).reshape(6,6,6)
>>> bucket(y, bucket_size=(2, 2, 3))
array([[[ 264, 300],
[ 408, 444],
[ 552, 588]],
[[1128, 1164],
[1272, 1308],
[1416, 1452]],
[[1992, 2028],
[2136, 2172],
[2280, 2316]]])
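If installing skimage is not an option, roughly the same result can be had with plain reshape and sum, extending the 1-D approach from the question. The sketch below is an assumption-laden alternative, not the code behind `view_as_blocks`: the name `bucket_reshape` is hypothetical, and it handles edge effects simply by dropping trailing elements that do not fill a complete bucket.

```python
import numpy as np

def bucket_reshape(x, bucket_size):
    """Hypothetical helper: sum non-overlapping blocks of shape bucket_size."""
    x = np.asarray(x)
    if len(bucket_size) != x.ndim:
        raise ValueError("bucket_size needs one entry per axis of x")
    # Number of complete buckets along each axis
    counts = [s // b for s, b in zip(x.shape, bucket_size)]
    # Drop trailing elements that do not fill a complete bucket
    trimmed = x[tuple(slice(0, c * b) for c, b in zip(counts, bucket_size))]
    # Split every axis into (bucket index, position inside bucket)
    split_shape = [n for pair in zip(counts, bucket_size) for n in pair]
    blocks = trimmed.reshape(split_shape)
    # Sum over the "position inside bucket" axes: 1, 3, 5, ...
    return blocks.sum(axis=tuple(range(1, 2 * x.ndim, 2)))

x2 = np.array([[1, 2, 3, 4],
               [2, 3, 7, 9],
               [8, 9, 1, 0],
               [0, 0, 3, 4]])
print(bucket_reshape(x2, (2, 2)))
print(bucket_reshape(np.arange(7), (2,)))  # the last element is dropped
```

Trimming is only one way to treat the edges; padding with zeros before the reshape would keep the partial buckets instead.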
Natively, with as_strided:
import numpy as np
from numpy.lib.stride_tricks import as_strided
x = np.array([[1, 2, 3, 4],
              [2, 3, 7, 9],
              [8, 9, 1, 0],
              [0, 0, 3, 4]])
def bucket(x, bucket_size):
    x = np.ascontiguousarray(x)
    oldshape = np.array(x.shape)
    # Leading axes index the buckets, trailing axes index positions inside a bucket
    newshape = np.concatenate((oldshape // bucket_size, bucket_size))
    oldstrides = np.array(x.strides)
    newstrides = np.concatenate((oldstrides * bucket_size, oldstrides))
    axis = tuple(range(x.ndim, 2 * x.ndim))
    return as_strided(x, newshape, newstrides).sum(axis)
If a bucket size does not divide evenly into the corresponding dimension of x, the remaining elements are lost.
Verification:
In [9]: bucket(x,(2,2))
Out[9]:
array([[ 8, 23],
[17, 8]])
To specify different bin sizes along each axis for ndarray cases, you can iteratively apply np.add.reduceat along each axis, like so -
def bucket(x, bin_size):
    ndims = x.ndim
    out = x.copy()
    for i in range(ndims):
        idx = np.append(0,np.cumsum(bin_size[i][:-1]))
        out = np.add.reduceat(out,idx,axis=i)
    return out
Sample run -
In [126]: x
Out[126]:
array([[165, 107, 133, 82, 199],
[ 35, 138, 91, 100, 207],
[ 75, 99, 40, 240, 208],
[166, 171, 78, 7, 141]])
In [127]: bucket(x, bin_size = [[2, 2],[3, 2]])
Out[127]:
array([[669, 588],
[629, 596]])
# [2, 2] are the bin sizes along axis=0
# [3, 2] are the bin sizes along axis=1
# array([[165, 107, 133, | 82, 199],
# [ 35, 138, 91, | 100, 207],
# -------------------------------------
# [ 75, 99, 40, | 240, 208],
# [166, 171, 78, | 7, 141]])
In [128]: x[:2,:3].sum()
Out[128]: 669
In [129]: x[:2,3:].sum()
Out[129]: 588
In [130]: x[2:,:3].sum()
Out[130]: 629
In [131]: x[2:,3:].sum()
Out[131]: 596
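To see what `np.add.reduceat` does in isolation, here is a 1-D sketch using the array from the question. The helper `bin_starts` is hypothetical; it converts a list of bin sizes into the start indices that `reduceat` expects, which is the same computation as the `np.append(0, np.cumsum(...))` line above.

```python
import numpy as np

x = np.array([1, 3, 7, 3, 2, 9])

def bin_starts(bin_sizes):
    """Hypothetical helper: start index of each bin, given the bin sizes."""
    return np.concatenate(([0], np.cumsum(bin_sizes)[:-1]))

# Each output element is the sum from one start index up to the next one
equal_bins = np.add.reduceat(x, bin_starts([2, 2, 2]))   # [1+3, 7+3, 2+9]
uneven_bins = np.add.reduceat(x, bin_starts([2, 3, 1]))  # [1+3, 7+3+2, 9]
print(equal_bins)   # [ 4 10 11]
print(uneven_bins)  # [ 4 12  9]
```

Because the last bin always runs to the end of the axis, the final bin size is implied by the array length rather than enforced.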
Here is my code. What I want it to return is an array of matrices
[[1,1],[1,1]], [[2,4],[8,16]], [[3,9],[27,81]]
I know I can probably do it using for loop and looping through my vector k, but I was wondering if there is a simple way that I am missing. Thanks!
import numpy as np
k = np.arange(1, 4, 1)
print(k)
def exam(p):
    return np.array([[p, p**2], [p**3, p**4]])
print(exam(k))
The output:
[1 2 3]
[[[ 1 2 3]
[ 1 4 9]]
[[ 1 8 27]
[ 1 16 81]]]
The key is to play with the shapes and broadcasting.
b = np.arange(1,4) # the base
e = np.arange(1,5) # the exponent
b[:,np.newaxis] ** e
=>
array([[ 1, 1, 1, 1],
[ 2, 4, 8, 16],
[ 3, 9, 27, 81]])
(b[:,None] ** e).reshape(-1,2,2)
=>
array([[[ 1, 1],
[ 1, 1]],
[[ 2, 4],
[ 8, 16]],
[[ 3, 9],
[27, 81]]])
If you must have the output as a list of matrices, do:
m = (b[:,None] ** e).reshape(-1,2,2)
[ np.asmatrix(a) for a in m ]
=>
[matrix([[1, 1],
[1, 1]]),
matrix([[ 2, 4],
[ 8, 16]]),
matrix([[ 3, 9],
[27, 81]])]
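Two further variations on the same broadcasting idea, offered as a sketch rather than as part of the answer above. The first broadcasts the base against a 2x2 array of exponents directly, so no reshape is needed; the second keeps the asker's `exam` function unchanged and only reorders the axes of its result. The names `exponents`, `powers`, and `moved` are illustrative.

```python
import numpy as np

k = np.arange(1, 4)
exponents = np.array([[1, 2], [3, 4]])

# k[:, None, None] has shape (3, 1, 1); broadcasting against (2, 2) gives (3, 2, 2)
powers = k[:, None, None] ** exponents

def exam(p):
    return np.array([[p, p**2], [p**3, p**4]])

# exam(k) has shape (2, 2, 3); moving the last axis to the front gives (3, 2, 2)
moved = np.moveaxis(exam(k), -1, 0)

print(powers)
```

The explicit exponent array generalizes to any matrix shape, whereas the reshape route relies on the exponents being laid out in row-major order.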