Related
I have two 1D NumPy arrays x = [x[0], x[1], ..., x[n-1]] and y = [y[0], y[1], ..., y[n-1]]. The array x is known, and I need to determine the values for array y. For every index in np.arange(n), the value of y[index] depends on x[index] and on x[index + 1: ]. My code is this:
import numpy as np
n = 5
q = 0.5
x = np.array([1, 2, 0, 1, 0])
y = np.empty(n, dtype=int)
for index in np.arange(n):
if (x[index] != 0) and (np.any(x[index + 1:] == 0)):
y[index] = np.random.choice([0,1], 1, p=(1-q, q))
else:
y[index] = 0
print(y)
The problem with the for loop is that the size of n in my experiment can become very large. Is there any vectorized way to do this?
Randomly generate the array y with the full shape.
Generate a bool array indicating where to set zeros.
Use np.where to set zeros.
Try this,
import numpy as np
n = 5
q = 0.5
x = np.array([1, 2, 0, 1, 0])
y = np.random.choice([0, 1], n, p=(1-q, q))
condition = (x != 0) & (x[::-1].cumprod() == 0)[::-1] # equivalent to the posted one
y = np.where(condition, y, 0)
I have two sorted, numpy arrays similar to these ones:
x = np.array([1, 2, 8, 11, 15])
y = np.array([1, 8, 15, 17, 20, 21])
Elements never repeat in the same array. I want to figure out a way of pythonicaly figuring out a list of indexes that contain the locations in the arrays at which the same element exists.
For instance, 1 exists in x and y at index 0. Element 2 in x doesn't exist in y, so I don't care about that item. However, 8 does exist in both arrays - in index 2 in x but index 1 in y. Similarly, 15 exists in both, in index 4 in x, but index 2 in y. So the outcome of my function would be a list that in this case returns [[0, 0], [2, 1], [4, 2]].
So far what I'm doing is:
def get_indexes(x, y):
indexes = []
for i in range(len(x)):
# Find index where item x[i] is in y:
j = np.where(x[i] == y)[0]
# If it exists, save it:
if len(j) != 0:
indexes.append([i, j[0]])
return indexes
But the problem is that arrays x and y are very large (millions of items), so it takes quite a while. Is there a better pythonic way of doing this?
Without Python loops
Code
def get_indexes_darrylg(x, y):
' darrylg answer '
# Use intersect to find common elements between two arrays
overlap = np.intersect1d(x, y)
# Indexes of common elements in each array
loc1 = np.searchsorted(x, overlap)
loc2 = np.searchsorted(y, overlap)
# Result is the zip two 1d numpy arrays into 2d array
return np.dstack((loc1, loc2))[0]
Usage
x = np.array([1, 2, 8, 11, 15])
y = np.array([1, 8, 15, 17, 20, 21])
result = get_indexes_darrylg(x, y)
# result[0]: array([[0, 0],
[2, 1],
[4, 2]], dtype=int64)
Timing Posted Solutions
Results show that darrlg code has the fastest run time.
Code Adjustment
Each posted solution as a function.
Slight mod so that each solution outputs an numpy array.
Curve named after poster
Code
import numpy as np
import perfplot
def create_arr(n):
' Creates pair of 1d numpy arrays with half the elements equal '
max_val = 100000 # One more than largest value in output arrays
arr1 = np.random.randint(0, max_val, (n,))
arr2 = arr1.copy()
# Change half the elements in arr2
all_indexes = np.arange(0, n, dtype=int)
indexes = np.random.choice(all_indexes, size = n//2, replace = False) # locations to make changes
np.put(arr2, indexes, np.random.randint(0, max_val, (n//2, ))) # assign new random values at change locations
arr1 = np.sort(arr1)
arr2 = np.sort(arr2)
return (arr1, arr2)
def get_indexes_lllrnr101(x,y):
' lllrnr101 answer '
ans = []
i=0
j=0
while (i<len(x) and j<len(y)):
if x[i] == y[j]:
ans.append([i,j])
i += 1
j += 1
elif (x[i]<y[j]):
i += 1
else:
j += 1
return np.array(ans)
def get_indexes_joostblack(x, y):
'joostblack'
indexes = []
for idx,val in enumerate(x):
idy = np.searchsorted(y,val)
try:
if y[idy]==val:
indexes.append([idx,idy])
except IndexError:
continue # ignore index errors
return np.array(indexes)
def get_indexes_mustafa(x, y):
indices_in_x = np.flatnonzero(np.isin(x, y)) # array([0, 2, 4])
indices_in_y = np.flatnonzero(np.isin(y, x[indices_in_x])) # array([0, 1, 2]
return np.array(list(zip(indices_in_x, indices_in_y)))
def get_indexes_darrylg(x, y):
' darrylg answer '
# Use intersect to find common elements between two arrays
overlap = np.intersect1d(x, y)
# Indexes of common elements in each array
loc1 = np.searchsorted(x, overlap)
loc2 = np.searchsorted(y, overlap)
# Result is the zip two 1d numpy arrays into 2d array
return np.dstack((loc1, loc2))[0]
def get_indexes_akopcz(x, y):
' akopcz answer '
return np.array([
[i, j]
for i, nr in enumerate(x)
for j in np.where(nr == y)[0]
])
perfplot.show(
setup = create_arr, # tuple of two 1D random arrays
kernels=[
lambda a: get_indexes_lllrnr101(*a),
lambda a: get_indexes_joostblack(*a),
lambda a: get_indexes_mustafa(*a),
lambda a: get_indexes_darrylg(*a),
lambda a: get_indexes_akopcz(*a),
],
labels=["lllrnr101", "joostblack", "mustafa", "darrylg", "akopcz"],
n_range=[2 ** k for k in range(5, 21)],
xlabel="Array Length",
# More optional arguments with their default values:
# logx="auto", # set to True or False to force scaling
# logy="auto",
equality_check=None, #np.allclose, # set to None to disable "correctness" assertion
# show_progress=True,
# target_time_per_measurement=1.0,
# time_unit="s", # set to one of ("auto", "s", "ms", "us", or "ns") to force plot units
# relative_to=1, # plot the timings relative to one of the measurements
# flops=lambda n: 3*n, # FLOPS plots
)
What you are doing is O(nlogn) which is decent enough.
If you want, you can do it in O(n) by iterating on both arrays with two pointers and since they are sorted, increase the pointer for the array with smaller object.
See below:
x = [1, 2, 8, 11, 15]
y = [1, 8, 15, 17, 20, 21]
def get_indexes(x,y):
ans = []
i=0
j=0
while (i<len(x) and j<len(y)):
if x[i] == y[j]:
ans.append([i,j])
i += 1
j += 1
elif (x[i]<y[j]):
i += 1
else:
j += 1
return ans
print(get_indexes(x,y))
which gives me:
[[0, 0], [2, 1], [4, 2]]
Although, this function will search for all the occurances of x[i] in the y array, if duplicates are not allowed in y it will find x[i] exactly once.
def get_indexes(x, y):
return [
[i, j]
for i, nr in enumerate(x)
for j in np.where(nr == y)[0]
]
You can use numpy.searchsorted:
def get_indexes(x, y):
indexes = []
for idx,val in enumerate(x):
idy = np.searchsorted(y,val)
if y[idy]==val:
indexes.append([idx,idy])
return indexes
One solution is to first look from x's side to see what values are included in y by getting their indices through np.isin and np.flatnonzero, and then use the same procedure from the other side; but instead of giving x entirely, we give only the (already found) intersected elements to gain time:
indices_in_x = np.flatnonzero(np.isin(x, y)) # array([0, 2, 4])
indices_in_y = np.flatnonzero(np.isin(y, x[indices_in_x])) # array([0, 1, 2])
Now you can zip them to get the result:
result = list(zip(indices_in_x, indices_in_y)) # [(0, 0), (2, 1), (4, 2)]
In R, when I execute the code below:
> X=matrix(1,2,3)
> c=c(1,2,3)
> X*c
R gives out the following output:
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 2 1 3
But when I do the below on Python:
>>> import numpy as np
>>> X=np.array([[1,1,1],[1,1,1]])
>>> c=np.array([1,2,3])
>>> X*c
the Python code above gives the following output:
array([[1, 2, 3],
[1, 2, 3]])
Is there any way that I can make the Python to come up with the identical output as R? I think I somehow have to tell Python that I want the numpy to multiply each element of the matrix X by each element of the vector c along the column, instead of along the row, but I am not sure how to go about this.
In [18]: np.reshape([1,2,3]*2,(2,3),order='F')
Out[18]:
array([[1, 3, 2],
[2, 1, 3]])
This starts with a list multiply, which is replication:
In [19]: [1,2,3]*2
Out[19]: [1, 2, 3, 1, 2, 3]
The rest uses numpy to reshape it into a (2,3) array, but with consecutive values going down, 'F' order.
Not knowning R, and in particular the c(1,2,3) expression, I can't say that's what's going on in R.
===
You talk about rows with columns, but I don't see how that works in your example. That said, we can easily perform outer like products
===
This reproduces your R_Product (at least in a few test cases):
In [138]: def foo(X,c):
...: X1 = X.ravel()
...: Y = np.resize(c,X1.shape)*X1
...: return Y.reshape(X.shape, order='F')
...:
In [139]: foo(np.ones((2,3)),np.arange(1,4))
Out[139]:
array([[1., 3., 2.],
[2., 1., 3.]])
In [140]: foo(np.arange(6).reshape(2,3),np.arange(1,4))
Out[140]:
array([[ 0, 6, 8],
[ 2, 3, 15]])
I'm using the resize function to replicate c to match the total number of elements of X. And order F to stack them in the desired column order. The default for numpy is order C.
In numpy replicating an array to match another is not common, at least not in this sense. Replicating by row or column, as in broadcasting is common. And of course reshaping.
I am the OP.
I was looking for a quick and easy solution, but I guess there is no straightforward functionality in Python that allows us to do this. So, I had to make a function that multiplies a matrix with a vector in the same manner that R does:
def R_product(X,c):
"""
Computes the regular R product
(not same as the matrix product) between
a 2D Numpy Array X, and a numpy vector c.
Args:
X: 2D Numpy Array
c: A Numpy vector
Returns: the output of X*c in R.
(This is different than X/*/c in R)
"""
X_nrow = X.shape[0]
X_ncol = X.shape[1]
X_dummy = np.zeros(shape=((X_nrow * X_ncol),1))
nrow = X_dummy.shape[0]
nc = nrow // len(c)
Y = np.zeros(shape=(nrow,1))
for j in range(X_ncol):
for u in range(X_nrow):
X_element = X[u,j]
if u == X_nrow - 1:
idx = X_nrow * (j+1) - 1
else:
idx = X_nrow * j + (u+1) - 1
X_dummy[idx,0] = X_element
for i in range(nc):
for j in range(len(c)):
Y[(i*len(c)+j):(i*len(c)+j+1),:] = (X_dummy[(i*len(c)+j):(i*len(c)+j+1),:]) * c[j]
for z in range(nrow-nc*len(c)):
Y[(nc*len(c)+z):(nc*len(c)+z+1),:] = (X_dummy[(nc*len(c)+z):(nc*len(c)+z+1),:]) * c[z]
return Y.reshape(X_ncol, X_nrow).transpose() # the answer I am looking for
Should work.
I try to convert code from Matlab to python
I have code in Matlab:
[value, iA, iB] = intersect(netA{i},netB{j});
I am looking for code in python that find the values common to both A and B, as well as the index vectors ia and ib (for each common element, its first index in A and its first index in B).
I try to use different solution, but I received vectors with different length. tried to use numpy.in1d/intersect1d , that returns bad not the same value.
Thing I try to do :
def FindoverlapIndx(self,a, b):
bool_a = np.in1d(a, b)
ind_a = np.arange(len(a))
ind_a = ind_a[bool_a]
ind_b = np.array([np.argwhere(b == a[x]) for x in ind_a]).flatten()
return ind_a, ind_b
IS=np.arange(IDs[i].shape[0])[np.in1d(IDs[i], R_IDs[j])]
IR = np.arange(R_IDs[j].shape[0])[np.in1d(R_IDs[j],IDs[i])]
I received indexes with different lengths. But both must be of the same length as in Matlab's intersect.
MATLAB's intersect(a, b) returns:
common values of a and b, sorted
the first position of each of them in a
the first position of each of them in b
NumPy's intersect1d does only the first part. So I read its source and modified it to return indices as well.
import numpy as np
def intersect_mtlb(a, b):
a1, ia = np.unique(a, return_index=True)
b1, ib = np.unique(b, return_index=True)
aux = np.concatenate((a1, b1))
aux.sort()
c = aux[:-1][aux[1:] == aux[:-1]]
return c, ia[np.isin(a1, c)], ib[np.isin(b1, c)]
a = np.array([7, 1, 7, 7, 4]);
b = np.array([7, 0, 4, 4, 0]);
c, ia, ib = intersect_mtlb(a, b)
print(c, ia, ib)
This prints [4 7] [4 0] [2 0] which is consistent with the output on MATLAB documentation page, as I used the same example as they did. Of course, indices are 0-based in Python unlike MATLAB.
Explanation: the function takes unique elements from each array, puts them together, and concatenates: the result is [0 1 4 4 7 7]. Each number appears at most twice here; when it's repeated, that means it was in both arrays. This is what aux[1:] == aux[:-1] selects for.
The array ia contains the first index of each element of a1 in the original array a. Filtering it by isin(a1, c) leaves only the indices that were in c. Same is done for ib.
EDIT:
Since version 1.15.0, intersect1d does the second and third part if you pass return_indices=True:
x = np.array([1, 1, 2, 3, 4])
y = np.array([2, 1, 4, 6])
xy, x_ind, y_ind = np.intersect1d(x, y, return_indices=True)
Where you get xy = array([1, 2, 4]), x_ind = array([0, 2, 4]) and y_ind = array([1, 0, 2])
How do I use the numpy accumulator and add functions to add arrays column wise to make a basic accumulator?
import numpy as np
a = np.array([1,1,1])
b = np.array([2,2,2])
c = np.array([3,3,3])
two_dim = np.array([a,b,c])
y = np.array([0,0,0])
for x in two_dim:
y = np.add.accumulate(x,axis=0,out=y)
return y
actual output: [1,2,3]
desired output: [6,6,6]
numpy glossary says the sum along axis argument axis=1 sums over rows: "we can sum each row of an array, in which case we operate along columns, or axis 1".
"A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1)"
With axis=1 I would expect output [3,6,9], but this also returns [1,2,3].
Of Course! neither x nor y are two-dimensional.
What am I doing wrong?
I can manually use np.add()
aa = np.array([1,1,1])
bb = np.array([2,2,2])
cc = np.array([3,3,3])
yy = np.array([0,0,0])
l = np.add(aa,yy)
m = np.add(bb,l)
n = np.add(cc,m)
print n
and now I get the correct output, [6,6,6]
I think
two_dim.sum(axis=0)
# [6 6 6]
will give you what you want.
I don't think accumulate is what you're looking for as it provides a running operation, so, using add it would look like:
np.add.accumulate(two_dim)
[[1 1 1]
[3 3 3] # = 1+2
[6 6 6]] # = 1+2+3
reduce is more like what you've descibed:
np.add.reduce(two_dim)
[6 6 6]