In Python I need to combine two 2-dimensional NumPy arrays so that the resulting rows are all combinations of the rows from the input arrays, concatenated together. I need the fastest solution, so it can be used on arrays that are very big.
For example:
I got:
import numpy as np
array1 = np.array([[1,2],[3,4]])
array2 = np.array([[5,6],[7,8]])
I want the code to return:
[[1,2,5,6]
[1,2,7,8]
[3,4,5,6]
[3,4,7,8]]
Solution using numpy's repeat, tile and hstack
The snippet
result = np.hstack([
np.repeat(array1, array2.shape[0], axis=0),
np.tile(array2, (array1.shape[0], 1))
])
Step by step explanation
We start with the two arrays, array1 and array2:
import numpy as np
array1 = np.array([[1,2],[3,4]])
array2 = np.array([[5,6],[7,8]])
First, we duplicate the content of array1 using repeat:
a = np.repeat(array1, array2.shape[0], axis=0)
The content of a is:
array([[1, 2],
[1, 2],
[3, 4],
[3, 4]])
Then we repeat the second array, array2, using tile. In particular, (array1.shape[0], 1) replicates array2 array1.shape[0] times along the first axis and just once along the second.
b = np.tile(array2, (array1.shape[0],1))
The result is:
array([[5, 6],
[7, 8],
[5, 6],
[7, 8]])
Now we can just proceed to stack the two results, using hstack:
result = np.hstack([a,b])
Achieving the desired output:
array([[1, 2, 5, 6],
[1, 2, 7, 8],
[3, 4, 5, 6],
[3, 4, 7, 8]])
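Putting the steps together, a small helper could look like this (a minimal sketch; the name cartesian_rows is hypothetical, not from the question):
import numpy as np

def cartesian_rows(a, b):
    # Row-wise Cartesian product of two 2D arrays.
    a = np.asarray(a)
    b = np.asarray(b)
    left = np.repeat(a, b.shape[0], axis=0)   # each row of a repeated len(b) times
    right = np.tile(b, (a.shape[0], 1))       # b stacked len(a) times
    return np.hstack([left, right])

# cartesian_rows([[1, 2], [3, 4]], [[5, 6], [7, 8]]) gives the output shown above.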
For this small example, itertools.product is actually faster; I don't know how it scales:
import itertools
alist = list(itertools.product(array1.tolist(), array2.tolist()))
np.array(alist).reshape(-1, 4)
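To get a feel for how the two approaches scale, a rough timing sketch (the sizes and benchmark setup are my own assumptions, not from the answer) could look like this:
import itertools
import timeit
import numpy as np

rng = np.random.default_rng(0)
array1 = rng.integers(0, 10, size=(500, 2))
array2 = rng.integers(0, 10, size=(500, 2))

def with_numpy():
    # repeat/tile/hstack recipe from above
    return np.hstack([
        np.repeat(array1, array2.shape[0], axis=0),
        np.tile(array2, (array1.shape[0], 1)),
    ])

def with_itertools():
    # itertools.product alternative from above
    alist = list(itertools.product(array1.tolist(), array2.tolist()))
    return np.array(alist).reshape(-1, 4)

print(timeit.timeit(with_numpy, number=10))
print(timeit.timeit(with_itertools, number=10))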
Related
Let the 2-dimensional array be as below:
In [1]: a = [[1, 2], [3, 4], [5, 6], [1, 2], [7, 8]]
a = np.array(a)
a, type(a)
Out [1]: (array([[1, 2],
[3, 4],
[5, 6],
[1, 2],
[7, 8]]),
numpy.ndarray)
I have tried to do this procedure:
In [2]: a = a[a != [1, 2]]
a = np.reshape(a, (int(a.size/2), 2))  # I have to do this since the first line of In [2] changes the dimensions to 1, i.e. [3, 4, 5, 6, 7, 8] (the initial array is 2-dimensional)
a
Out[2]: array([[3, 4],
[5, 6],
[7, 8]])
My question is, is there any function in NumPy that can directly do that?
Updated Question
Here's the semi-full source code that I've been working on:
from sklearn import datasets
data = datasets.load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['Target'] = pd.DataFrame(data.target)
bucket = df[df['Target'] == 0]
bucket = bucket.iloc[:,[0,1]].values
lp, rp = leftestRightest(bucket)
bucket = np.array([x for x in bucket if list(x) != lp])
bucket = np.array([x for x in bucket if list(x) != rp])
Notes:
leftestRightest(arg) is a function that returns 2 one-dimensional NumPy arrays of size 2 (which are lp and rp). For instance, lp = [1, 3], rp = [2, 4], and the parameter is a 2-dimensional NumPy array.
There should be a more delicate approach, but here is what I have come up with:
np.array([x for x in a if list(x) != [1,2]])
Output
[[3, 4], [5, 6], [7, 8]]
Note that I wouldn't recommend working with list comprehensions on large arrays, since it would be highly time-consuming.
Your approach is correct, but the mask needs to be single-dimensional:
a[(a != [1, 2]).all(-1)]
Output:
array([[3, 4],
[5, 6],
[7, 8]])
Alternatively, you can collect the elements and infer the dimension with -1:
a[a != [1, 2]].reshape(-1, 2)
The boolean condition creates a 2D array of True/False. You have to apply an AND operation across the columns to make sure the match is not just a partial match. Consider a row [5, 2] in your array above: the script you wrote would keep the 5 and drop the 2 in the resulting 1D array. It can be done as follows:
a[np.all(a != [1, 2],axis=1)]
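One subtlety worth noting (my own observation, not part of the answers above): (a != [1, 2]).all(axis=1) also drops rows that match only partially, e.g. [1, 9], because not all of its elements differ from [1, 2]. If you only want to drop exact matches, a small sketch like this keeps such rows:
import numpy as np

a = np.array([[1, 2], [3, 4], [1, 9], [1, 2], [7, 8]])

# Keep every row that is not exactly equal to [1, 2].
exact_match = (a == [1, 2]).all(axis=1)
print(a[~exact_match])
# [[3 4]
#  [1 9]
#  [7 8]]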
Is there any efficient way to compute the product of every row in a matrix using numpy?
I mean, for example, if
A = [[1, 2], [3, 4]]
then, just like np.sum(A, axis=1), I would want something like:
np.mul(A, axis=1) = [2, 12]
np.prod is what you're looking for.
a = np.array([[1, 2], [3, 4]])
print(np.prod(a, axis=1)) # Prints array([2, 12])
Use numpy.prod, exactly as you describe, i.e.
import numpy as np
A = [[1, 2], [3, 4]]
np.prod(A, axis=1) # Gives [ 2 12]
The multiply function is a universal function (ufunc), so you could do:
import numpy as np
A = np.array([[1, 2], [3, 4]])
result = np.multiply.reduce(A, axis=1)
print(result)
Output
[ 2 12]
Read the NumPy documentation on ufunc.reduce for details.
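As a side note (my own example, not from the answer), reduce works the same way for any ufunc; for instance np.add.reduce along the same axis reproduces np.sum:
import numpy as np

A = np.array([[1, 2], [3, 4]])
print(np.multiply.reduce(A, axis=1))  # [ 2 12]
print(np.add.reduce(A, axis=1))       # [3 7], same as np.sum(A, axis=1)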
I know it is possible to use meshgrid to get all combinations between two arrays using numpy.
But in my case I have one array with two columns and n rows, and another 1D array, and I would like to get all the unique combinations between them.
For example:
a = [[1,1],
[2,2],
[3,3]]
b = [5,6]
# The expected result would be:
final_array = [[1,1,5],
[1,1,6],
[2,2,5],
[2,2,6],
[3,3,5],
[3,3,6]]
Which method is the fastest way to get this result using only numpy?
Proposed solution
OK, I got the result, but I would like to know whether this is a reliable and fast solution for this task; if someone could give me any advice, I would appreciate it.
a_t = np.tile(a, len(b)).reshape(-1,2)
b_t = np.tile(b, len(a)).reshape(1,-1)
final_array = np.hstack((a_t,b_t.T))
array([[1, 1, 5],
[1, 1, 6],
[2, 2, 5],
[2, 2, 6],
[3, 3, 5],
[3, 3, 6]])
Kind of ugly, but here's one way:
a = np.asarray(a)
xx = np.repeat(a, len(b), axis=0)       # each row of a repeated len(b) times
yy = np.tile(b, a.shape[0])[:, None]    # b cycled len(a) times, as a column
np.concatenate((xx, yy), axis=1)
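For reference, the repeat/tile recipe from the first question covers this case directly as well; a minimal sketch (assuming a and b are converted to arrays and b is attached as a single column):
import numpy as np

a = np.array([[1, 1], [2, 2], [3, 3]])
b = np.array([5, 6])

final_array = np.hstack([
    np.repeat(a, len(b), axis=0),        # each row of a repeated len(b) times
    np.tile(b, len(a)).reshape(-1, 1),   # b cycled len(a) times, as a column
])
print(final_array)
# [[1 1 5]
#  [1 1 6]
#  [2 2 5]
#  [2 2 6]
#  [3 3 5]
#  [3 3 6]]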
I am trying to build a matrix like:
M = [[1, 1, ..., 1],
     [2, 2, ..., 2],
     ...
     [40000, 40000, ..., 40000]]
This is what I tried:
data = np.mat((40000,8))
print(data.shape)
for i in range(data.shape[0]):
data[i,:] = i
print(data[:5])
The above code prints:
(1, 2)
[[0 0]]
I know how to fill a matrix with constant values, but I couldn't find a similar question for this case.
Use a simple array and don't forget that Python starts indexing at 0:
data = np.zeros((40000,8))
for i in range(data.shape[0]):
data[i,:] = i+1
Here's a way using numpy:
rows = 10
cols = 3
l = np.arange(1,rows)
np.tile(l,cols).reshape(cols,rows-1).T
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4],
[5, 5, 5],
[6, 6, 6],
[7, 7, 7],
[8, 8, 8],
[9, 9, 9]])
Matthieu Brucher's answer will do perfectly for your case. If you are looking at sizes much higher than 4000 and if time is an issue, you might want to get rid of the for loop and create a list of lists with a list comprehension before turning it into a NumPy array:
a = [[i]*8 for i in range(1,4001)]
m = np.asarray(a)
In my case, this solution was ~7 times faster.
To use NumPy broadcasting instead of iteration, you can do:
import numpy as np
M = np.ones((40000, 8), dtype=int).T * np.arange(1, 40001)
M = M.T
print(M)
This should be faster than the iteration-based approaches above, if that's what you are looking for.
Very simple:
data = np.arange(1, 40001).repeat(8).reshape(-1,8)
Though this is pure NumPy as well, it is considerably slower than @yatu's solution.
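If the matrix is only ever read and never written to, a broadcast view avoids allocating all 40000 x 8 values; a minimal sketch (note the result is read-only):
import numpy as np

# Read-only view: row i is filled with i + 1, without materialising the full array.
M = np.broadcast_to(np.arange(1, 40001)[:, None], (40000, 8))
print(M[:3])
# [[1 1 1 1 1 1 1 1]
#  [2 2 2 2 2 2 2 2]
#  [3 3 3 3 3 3 3 3]]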
I have two arrays of the same size. In general, the dtype of these arrays is object (dtype='O'). What is the best way to access elements with the same indices from both arrays?
Possibility 1:
remove_indices = [i for i in range(len(array1)) if value in array1[i]]
array1 = np.delete(array1, remove_indices, 0)
array2 = np.delete(array2, remove_indices, 0)
Possibility 2:
array3 = np.array([[array1[i], array2[i]] for i in range(len(array1))
if value not in array1[i]])
array1 = array3[:,0]
array2 = array3[:,1]
Note that Possibility 2 is faster. Is there any other solution with similar execution time (or faster)? How could I make Possibility 2 more readable?
I'm not sure I fully understand your examples, but sticking to "What is the best way to access elements with the same indices from both arrays", that makes me think of zip. But since you are using NumPy, why not use transpose?
Like:
>>> import numpy
>>> array1 = numpy.array([0, 1, 2, 3, 4])
>>> array2 = numpy.array([5, 6, 7, 8, 9])
>>> numpy.array([array1, array2])
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> numpy.array([array1, array2]).T
array([[0, 5],
[1, 6],
[2, 7],
[3, 8],
[4, 9]])
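If the goal is the filtering from the question (drop the positions where value occurs in array1's element), one way to keep both arrays aligned is to build a single boolean mask and apply it to both; a sketch with made-up example data (the real arrays have dtype=object, but the pattern is the same):
import numpy as np

array1 = np.array([[1, 2], [3, 4], [1, 5]])
array2 = np.array([10, 20, 30])
value = 1

# One mask, applied to both arrays, keeps them aligned by index.
keep = np.array([value not in row for row in array1])
array1, array2 = array1[keep], array2[keep]
print(array1)  # [[3 4]]
print(array2)  # [20]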