multiply numpy row with all elements in list - python

How can I combine every row of a NumPy array with every tuple in a list, one by one: the first row with each tuple, then the second row with each tuple, and so on?
This is what I am doing:
utl = np.array([[ 3, 12. ],
[ 3. , 17. ]])
all_ltp = ([(0, 134.30000305175778), (1, 133.80000305175778)])
a=np.array(list(itertools.product(utl, all_ltp)))
a = np.reshape(a, (-1,4))
print(a)
output is -
[[ 3. 12. 0. 134.30000305]
[ 3. 12. 1. 133.80000305]
[ 3. 17. 0. 134.30000305]
[ 3. 17. 1. 133.80000305]]
This works, but if I add more columns to the array:
utl = np.array([[ 3, 12. , 99 ],
[ 3. , 17. , 99 ]])
all_ltp = ([(0, 134.30000305175778), (1, 133.80000305175778)])
a=np.array(list(itertools.product(utl, all_ltp)))
a = np.reshape(a, (-1,2))
print(a)
output is -
[[array([ 3., 12., 99.]) (0, 134.30000305175778)]
[array([ 3., 12., 99.]) (1, 133.80000305175778)]
[array([ 3., 17., 99.]) (0, 134.30000305175778)]
[array([ 3., 17., 99.]) (1, 133.80000305175778)]]
It also runs, but it does not combine the elements into flat rows.
The output must be:
[[ 3. 12. 99 0. 134.30000305]
[ 3. 12. 99 1. 133.80000305]
[ 3. 17. 99 0. 134.30000305]
[ 3. 17. 99 1. 133.80000305]]

First convert all_ltp to a Numpy array:
b = np.array(all_ltp)
Then generate 2 intermediate arrays, by repeating utl and tiling b:
wrk1 = np.repeat(utl, repeats=b.shape[0], axis=0)
wrk2 = np.tile(b, reps=(utl.shape[0], 1))
(print both of them to see the result).
And to get the final result, horizontally stack both these tables:
result = np.hstack((wrk1, wrk2))
The result, for your source data, is:
[[ 3. 12. 99. 0. 134.30000305]
[ 3. 12. 99. 1. 133.80000305]
[ 3. 17. 99. 0. 134.30000305]
[ 3. 17. 99. 1. 133.80000305]]
Or, to have more concise code, run:
result = np.hstack((np.repeat(utl, repeats=b.shape[0], axis=0),
np.tile(b, reps=(utl.shape[0], 1))))
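The repeat/tile/hstack steps above can be wrapped in a small helper. This is only a sketch: the function name `cartesian_rows` is made up for illustration, and it assumes both inputs can be converted to 2-D numeric arrays.

```python
import numpy as np

def cartesian_rows(left, right):
    """Pair every row of `left` with every row of `right`, side by side."""
    left = np.atleast_2d(left)
    right = np.atleast_2d(right)
    # Each row of `left` is repeated once per row of `right` ...
    rep = np.repeat(left, repeats=right.shape[0], axis=0)
    # ... while the whole of `right` is tiled once per row of `left`.
    til = np.tile(right, reps=(left.shape[0], 1))
    return np.hstack((rep, til))

utl = np.array([[3, 12., 99], [3., 17., 99]])
all_ltp = [(0, 134.30000305175778), (1, 133.80000305175778)]

result = cartesian_rows(utl, all_ltp)
print(result.shape)  # (4, 5)
```

Because nothing here depends on the number of columns, the same call works for the original two-column `utl` as well.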

Related

Numpy - Product of Cartesian Product Along Last Axis of Jagged Array

Given a nested list with unequal number of elements, I would like to find the fastest way to calculate the product of the cartesian product along the last axis. In other words, first calculate the cartesian product between all sublists, then find the multiplicative product along all combinations. Then finally, I want to insert those values into a matrix of the same size/dimensionality as the original input. As an added piece of complexity, I want to pad axes of shape (1, ) with an extra 0. For example:
example1 = [[1, 2], [3, 4], [5], [6], [7]]
should result in
[[[[[ 630. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]
[[[ 840. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]]
[[[[1260. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]
[[[1680. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]]]
which has a shape (2, 2, 2, 2, 2), although it would be (2, 2, 1, 1, 1) without padding.
My initial function is:
def convert_nest_to_product_tensor(nest):
    # find indices to collect elements from
    combinations = list(itertools.product(*[range(len(l)) for l in nest]))
    # collect elements and then calculate product for every Cartesian product
    # (np.prod is used here; the np.product alias was removed in NumPy 2.0)
    products = np.array(
        [np.prod([nest[i][idx] for i, idx in enumerate(comb)]) for comb in combinations]
    )
    # pad tensor for axes of shape 1
    tensor_shape = [len(l) for l in nest]
    tensor_shape = tuple([axis_shape+1 if axis_shape==1 else axis_shape for axis_shape in tensor_shape])
    tensor = np.zeros(tensor_shape)
    # insert values
    for i, idx in enumerate(combinations):
        tensor[idx] = products[i]
    return tensor
However, it takes a while, specifically the part where I find the product of the Cartesian products. I tried replacing that component using np.meshgrid + np.stack:
products = np.stack(np.meshgrid(*nest), axis=-1).reshape(-1, len(nest))
products = np.prod(products, axis=-1)
While this gives the correct values much faster, they are not in the correct output order:
[[[[[ 630. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]
[[[1260. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]]
[[[[ 840. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]
[[[1680. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]]]
Any feedback on how to make this work (quickly) is much appreciated!
A simple way of getting the cartesian tuples and product:
In [10]: alist = list(itertools.product(*example1))
In [11]: alist
Out[11]: [(1, 3, 5, 6, 7), (1, 4, 5, 6, 7), (2, 3, 5, 6, 7), (2, 4, 5, 6, 7)]
In [12]: [np.prod(x) for x in alist]
Out[12]: [630, 840, 1260, 1680]
Or use math.prod for a no-numpy solution.
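One likely cause of the ordering problem in the question is that np.meshgrid defaults to indexing="xy", which swaps the first two axes. The sketch below passes indexing="ij" so that axis k of the grid follows nest[k], and then copies the products into a zero-padded tensor. The function name `product_tensor` is hypothetical, and the code assumes every sublist is non-empty.

```python
import numpy as np

def product_tensor(nest):
    # indexing="ij" keeps axis k of every grid aligned with nest[k]
    grids = np.meshgrid(*nest, indexing="ij")
    # multiply the grids element-wise: one product per Cartesian combination
    products = np.prod(np.stack(grids, axis=-1), axis=-1)
    # pad every axis of length 1 to length 2, leave longer axes alone
    padded_shape = tuple(max(n, 2) for n in products.shape)
    tensor = np.zeros(padded_shape)
    # write the products into the leading corner of the padded tensor
    tensor[tuple(slice(0, n) for n in products.shape)] = products
    return tensor

example1 = [[1, 2], [3, 4], [5], [6], [7]]
t = product_tensor(example1)
print(t.shape)  # (2, 2, 2, 2, 2)
```

With this ordering, t[0, 1, 0, 0, 0] holds 1*4*5*6*7 and t[1, 0, 0, 0, 0] holds 2*3*5*6*7, matching the layout requested in the question.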

How to do an outer product of 3 vectors to create a 3d matrix in numpy? (and same for nd)

If i want to do an outer product of 2 vectors to create a 2d matrix, each element a product of the two respective elements in the original vectors:
b = np.arange(5).reshape((1, 5))
a = np.arange(5).reshape((5, 1))
a * b
array([[ 0, 0, 0, 0, 0],
[ 0, 1, 2, 3, 4],
[ 0, 2, 4, 6, 8],
[ 0, 3, 6, 9, 12],
[ 0, 4, 8, 12, 16]])
I want the same for 3 (or for n) vectors.
An equivalent non numpy answer:
a = np.arange(5)
b = np.arange(5)
c = np.arange(5)
res = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
for ia in range(len(a)):
    for ib in range(len(b)):
        for ic in range(len(c)):
            res[ia, ib, ic] = a[ia] * b[ib] * c[ic]
print(res)
out:
[[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 1. 2. 3. 4.]
[ 0. 2. 4. 6. 8.]
[ 0. 3. 6. 9. 12.]
[ 0. 4. 8. 12. 16.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 2. 4. 6. 8.]
[ 0. 4. 8. 12. 16.]
[ 0. 6. 12. 18. 24.]
[ 0. 8. 16. 24. 32.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 3. 6. 9. 12.]
[ 0. 6. 12. 18. 24.]
[ 0. 9. 18. 27. 36.]
[ 0. 12. 24. 36. 48.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 4. 8. 12. 16.]
[ 0. 8. 16. 24. 32.]
[ 0. 12. 24. 36. 48.]
[ 0. 16. 32. 48. 64.]]]
How to do this with numpy [no for loops]?
Also, how to do this for a general function, not necessarily *?
NumPy provides you with np.outer() for computing the outer product.
This is a less powerful version of more versatile approaches:
ufunc.outer()
np.tensordot()
np.einsum()
np.einsum() is the only one capable of handling more than two input arrays:
import numpy as np
def prod(items, start=1):
    for item in items:
        start = start * item
    return start
a = np.arange(5)
b = np.arange(5)
c = np.arange(5)
# reference result computed with explicit loops
r0 = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
for ia in range(len(a)):
    for ib in range(len(b)):
        for ic in range(len(c)):
            r0[ia, ib, ic] = a[ia] * b[ib] * c[ic]
r1 = prod([a[:, None, None], b[None, :, None], c[None, None, :]])
# same as: r1 = a[:, None, None] * b[None, :, None] * c[None, None, :]
# same as: r1 = a.reshape(-1, 1, 1) * b.reshape(1, -1, 1) * c.reshape(1, 1, -1)
print(np.all(r0 == r1))
# True
r2 = np.einsum('i,j,k->ijk', a, b, c)
print(np.all(r0 == r2))
# True
# as per @hpaulj's suggestion
r3 = prod(np.ix_(a, b, c))
print(np.all(r0 == r3))
# True
Of course, the broadcasting approach (the same one you used with the array.reshape() version of your code, just with a slightly different syntax for providing the correct shape) can be automated by explicitly building the slicing (or, equivalently, the array.reshape() parameters).
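As a rough illustration of building the reshape parameters automatically, the sketch below gives each input vector its own axis and lets broadcasting do the rest. The helper name `outer_n` is invented for this example, and it assumes all inputs are 1-D.

```python
import numpy as np

def outer_n(*vectors):
    """Outer product of any number of 1-D vectors via broadcasting."""
    n = len(vectors)
    result = 1
    for axis, v in enumerate(vectors):
        # shape is all ones except for -1 on this vector's own axis,
        # e.g. (-1, 1, 1), (1, -1, 1), (1, 1, -1) for three vectors
        shape = [1] * n
        shape[axis] = -1
        result = result * np.asarray(v).reshape(shape)
    return result

a = np.arange(2)
b = np.arange(3)
c = np.arange(4)
r = outer_n(a, b, c)
print(r.shape)  # (2, 3, 4)
```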
In [166]: a = np.arange(2)
...: b = np.arange(3)
...: c = np.arange(4)
As shown in comments and answer:
In [167]: R = np.einsum('i,j,k',a,b,c)
We can also use np.ix_ to construct arrays that broadcast against each other. This is often used to construct block indexing arrays, but works here as well:
In [168]: A,B,C = np.ix_(a,b,c)
In [169]: A,B,C
Out[169]:
(array([[[0]],
[[1]]]),
array([[[0],
[1],
[2]]]),
array([[[0, 1, 2, 3]]]))
In [170]: R1 = A*B*C
testing:
In [171]: np.allclose(R,R1)
Out[171]: True
That broadcasted product can be done in one line with:
In [172]: np.prod(np.array(np.ix_(a,b,c),object)).shape
Out[172]: (2, 3, 4)
Without that explicit object dtype cast, I get a FutureWarning about creating a ragged array.
np.meshgrid(a,b,c, sparse=True, indexing='ij') is an alternative to ix_.
While these ix_ etc expressions are nice, you should become thoroughly comfortable using:
a[:, None, None] * b[None, :, None] * c[None, None, :]
This kind of dimension expansion gives you the most power and flexibility.
The simplest approach (from this answer) is to use:
functools.reduce(np.multiply.outer, (a, b, c))
This works for any number of dimensions, and unlike np.prod(np.ix_(...)) it does not result in numpy deprecation warnings about introducing jagged arrays.
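For the "general function" part of the question, one possible sketch is to swap np.multiply for another binary ufunc, since every binary ufunc has an .outer method. For a plain Python function, np.frompyfunc can wrap it as a ufunc, at the cost of speed and an object-dtype result. The function `square_plus` below is a made-up example, not something from the question.

```python
import functools
import numpy as np

a = np.arange(2)
b = np.arange(3)
c = np.arange(4)

# Any binary ufunc works in place of np.multiply:
# sums[i, j, k] == a[i] + b[j] + c[k]
sums = functools.reduce(np.add.outer, (a, b, c))

# A hypothetical two-argument Python function, wrapped as a ufunc.
def square_plus(x, y):
    return x * x + y

g = np.frompyfunc(square_plus, 2, 1)
# table[i, j] == square_plus(a[i], b[j]); frompyfunc returns object dtype
table = g.outer(a, b).astype(float)
print(sums.shape, table.shape)  # (2, 3, 4) (2, 3)
```

Note that chaining a binary function through reduce computes f(f(a, b), c), which only matches a true three-argument function when the operation folds naturally this way, as addition and multiplication do.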

How to multiply individual elements of numpy array of row ith with element of another numpy array of row ith?

In my inventory example, I want to multiply a NumPy array of shape [280, 2] (the cost of each of 280 items in USD and Euro) with a NumPy array of shape [280, 3] (the stock of each item in 3 storehouses, one per column).
I could calculate this with for loops, but I am trying to learn broadcasting and reshaping techniques, so I would appreciate a pointer in the right direction (or to the right methods).
Edit: Example
Array A
[[1.50 1.80]
[3 8 ]]
Array B
[[5 10 20]
[10 20 30]]
Result I require is
[[7.5 9 15 18 30 36]
[30 80 60 160 90 240]]
Thanks
The description was a bit fuzzy, as was the example:
In [264]: A=np.array([[1.5,1.8],[3,8]]); B=np.array([[5,10,20],[10,20,30]])
In [265]: A.shape
Out[265]: (2, 2)
In [266]: B.shape
Out[266]: (2, 3)
Looks like you are trying to do a version of outer product, which can be done with broadcasting.
Let's try one combination:
In [267]: A[:,:,None]*B[:,None,:]
Out[267]:
array([[[ 7.5, 15. , 30. ],
[ 9. , 18. , 36. ]],
[[ 30. , 60. , 90. ],
[ 80. , 160. , 240. ]]])
The right numbers are there, but not the right order. Let's try again:
In [268]: A[:,None,:]*B[:,:,None]
Out[268]:
array([[[ 7.5, 9. ],
[ 15. , 18. ],
[ 30. , 36. ]],
[[ 30. , 80. ],
[ 60. , 160. ],
[ 90. , 240. ]]])
That's better - now just reshape:
In [269]: _.reshape(2,6)
Out[269]:
array([[ 7.5, 9. , 15. , 18. , 30. , 36. ],
[ 30. , 80. , 60. , 160. , 90. , 240. ]])
_268 is a partial transpose of _267, .transpose(0,2,1).
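The same row-wise outer product can also be written with np.einsum, which makes the axis order explicit. This is an alternative sketch, not part of the answer above: the subscripts 'ij,ik->ikj' keep the row index i and put B's column index before A's, so the final reshape yields the requested column order.

```python
import numpy as np

A = np.array([[1.5, 1.8], [3, 8]])
B = np.array([[5, 10, 20], [10, 20, 30]])

# out3[i, k, j] == A[i, j] * B[i, k]
out3 = np.einsum('ij,ik->ikj', A, B)
# flatten the last two axes so each input row becomes one output row
out = out3.reshape(A.shape[0], -1)
print(out.shape)  # (2, 6)
```

For the 280-row inventory arrays from the question, the same expression would give a result of shape (280, 6).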

Insert zero rows and columns at the same time at specific indices instead of at the end

I have a 2D array (a confusion matrix), for example (3,3). The number in the array refers to the index into a set of labels.
I know that this array should actually be (5,5) instead of (3,3), for the 5 row and column labels. I can find the labels that have been "hit":
import numpy as np
x = np.array([[3, 0, 3],
[0, 2, 0],
[2, 3, 3]])
labels = ["a", "b", "c", "d", "e"]
missing_idxs = np.setdiff1d(np.arange(len(labels)), x) # array([1, 4])
I know that the row and column for the missed index is all zero, so the output I want is this:
y = np.array([[3, 0, 0, 3, 0],
[0, 0, 0, 0, 0], # <- Inserted row at index 1 all zeros
[0, 0, 2, 0, 0],
[2, 0, 3, 3, 0],
[0, 0, 0, 0, 0]]) # <- Inserted row at index 4 all zeros
# ^ ^
# | |
# Inserted columns at index 1 and 4 all zeros
I can do that with multiple calls to np.insert in a loop over all missing indices:
def insert_rows_columns_at_slow(arr, indices):
    result = arr.copy()
    for idx in indices:
        result = np.insert(result, idx, np.zeros(result.shape[1]), 0)
        result = np.insert(result, idx, np.zeros(result.shape[0]), 1)
    return result
However, my real array is much bigger, and there may be many more missing indices. Since np.insert re-allocates every time, this is not very efficient.
How can I achieve the same result, but in a more efficient, vectorized way? Bonus points if it works in more than 2 dimensions.
Just another option:
Instead of using the missing indices, use the non-missing indices:
non_missing_idxs = np.intersect1d(np.arange(len(labels)), x) # array([0, 2, 3])
y = np.zeros((5,5))
y[non_missing_idxs[:,None], non_missing_idxs] = x
output:
array([[3., 0., 0., 3., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 2., 0., 0.],
[2., 0., 3., 3., 0.],
[0., 0., 0., 0., 0.]])
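A self-contained sketch of this "scatter into zeros" idea, using np.ix_ to build the row/column index grid. The function name `expand_confusion` is hypothetical, and the code assumes, as in the question, that the labels present are exactly the distinct values in x and that x has one row and one column per present label.

```python
import numpy as np

def expand_confusion(x, n_labels):
    # labels that actually occur, in sorted order
    present = np.unique(x)
    # full-size result, zero everywhere by default
    y = np.zeros((n_labels, n_labels), dtype=x.dtype)
    # np.ix_ builds an open mesh, so x lands on the present rows and columns
    y[np.ix_(present, present)] = x
    return y

x = np.array([[3, 0, 3],
              [0, 2, 0],
              [2, 3, 3]])
y = expand_confusion(x, 5)
print(y)
```

Passing dtype=x.dtype keeps integer counts as integers instead of converting them to floats.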
You can do this by pre-allocating the full resulting array and filling the rows and columns with the old array, even in multiple dimensions, and the dimensions don't have to match size:
def insert_at(arr, output_size, indices):
    """
    Insert zeros at specific indices over whole dimensions, e.g. rows and/or columns and/or channels.
    You need to specify indices for each dimension, or leave a dimension untouched by specifying
    `...` for it. The following assertion should hold:
    `assert len(output_size) == len(indices) == len(arr.shape)`
    :param arr: The array to insert zeros into
    :param output_size: The size of the array after insertion is completed
    :param indices: The indices where zeros should be inserted, per dimension. For each dimension, you can
        specify: - an int
                 - a tuple of ints
                 - a generator yielding ints (such as `range`)
                 - Ellipsis (=...)
    :return: An array of shape `output_size` with the content of arr and zeros inserted at the given indices.
    """
    # assert len(output_size) == len(indices) == len(arr.shape)
    result = np.zeros(output_size)
    existing_indices = [np.setdiff1d(np.arange(axis_size), axis_indices,assume_unique=True)
                        for axis_size, axis_indices in zip(output_size, indices)]
    result[np.ix_(*existing_indices)] = arr
    return result
For your use-case, you can use it like this:
def fill_by_label(arr, labels):
    # If this is your only use-case, you can make it more efficient
    # By not computing the missing indices first, just to compute
    # The existing indices again
    missing_idxs = np.setdiff1d(np.arange(len(labels)), arr)
    return insert_at(arr, output_size=(len(labels), len(labels)),
                     indices=(missing_idxs, missing_idxs))
x = np.array([[3, 0, 3],
[0, 2, 0],
[2, 3, 3]])
labels = ["a", "b", "c", "d", "e"]
missing_idxs = np.setdiff1d(np.arange(len(labels)), x)
print(fill_by_label(x, labels))
>> [[3. 0. 0. 3. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 2. 0. 0.]
[2. 0. 3. 3. 0.]
[0. 0. 0. 0. 0.]]
But this is very flexible. You can use it for zero padding:
def zero_pad(arr):
    out_size = np.array(arr.shape) + 2
    indices = (0, out_size[0] - 1), (0, out_size[1] - 1)
    return insert_at(arr, output_size=out_size,
                     indices=indices)
print(zero_pad(x))
>> [[0. 0. 0. 0. 0.]
[0. 3. 0. 3. 0.]
[0. 0. 2. 0. 0.]
[0. 2. 3. 3. 0.]
[0. 0. 0. 0. 0.]]
It also works with non-quadratic inputs and outputs:
x = np.ones((3, 4))
print(insert_at(x, (4, 5), (2, 3)))
>>[[1. 1. 1. 0. 1.]
[1. 1. 1. 0. 1.]
[0. 0. 0. 0. 0.]
[1. 1. 1. 0. 1.]]
With different number of insertions per dimension:
x = np.ones((3, 4))
print(insert_at(x, (4, 6), (1, (2, 4))))
>> [[1. 1. 0. 1. 0. 1.]
[0. 0. 0. 0. 0. 0.]
[1. 1. 0. 1. 0. 1.]
[1. 1. 0. 1. 0. 1.]]
You can use range (or other generators) instead of enumerating every index:
x = np.ones((3, 4))
print(insert_at(x, (4, 6), (1, range(2, 4))))
>>[[1. 1. 0. 0. 1. 1.]
[0. 0. 0. 0. 0. 0.]
[1. 1. 0. 0. 1. 1.]
[1. 1. 0. 0. 1. 1.]]
It works with arbitrary dimensions (as long as you specify indices for every dimension) [1]:
x = np.ones((2, 2, 2))
print(insert_at(x, (3, 3, 3), (0, 0, 0)))
>>>[[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
[[0. 0. 0.]
[0. 1. 1.]
[0. 1. 1.]]
[[0. 0. 0.]
[0. 1. 1.]
[0. 1. 1.]]]
You can use Ellipsis (=...) to indicate that you don't want to change a dimension [1, 2]:
x = np.ones((2, 2))
print(insert_at(x, (2, 4), (..., (0, 1))))
>>[[0. 0. 1. 1.]
[0. 0. 1. 1.]]
1: You could automatically detect this based on arr.shape and output_size, and fill it with ... as needed, but I'll leave that up to you if you need it. If you wanted to, you could probably get rid of the output_size parameter instead, but then it gets trickier with passing in generators.
2: This is somewhat different to the normal numpy ... semantics, as you need to specify ... for every dimension that you want to keep, i.e. the following does NOT work:
x = np.ones((2, 2, 2))
print(insert_at(x, (2, 2, 3), (..., 0)))
For timing, I ran the insertion of 10 rows and columns into a 90x90 array 100000 times, this is the result:
x = np.random.random(size=(90, 90))
indices = np.arange(10) * 10
def measure_time_fast():
    insert_at(x, (100, 100), (indices, indices))
def measure_time_slow():
    insert_rows_columns_at_slow(x, indices)
if __name__ == '__main__':
    import timeit
    for speed in ("fast", "slow"):
        times = timeit.repeat(f"measure_time_{speed}()", setup=f"from __main__ import measure_time_{speed}", repeat=10, number=10000)
        print(f"Min: {np.min(times) / 10000}, Max: {np.max(times) / 10000}, Mean: {np.mean(times) / 10000} seconds per call")
For the fast version:
Min: 7.336409069976071e-05, Max: 7.7440657400075e-05, Mean:
7.520040466995852e-05 seconds per call
That is about 75 microseconds.
For your slow version:
Min: 0.00028272533010022016, Max: 0.0002923079213000165, Mean:
0.00028581595062998535 seconds per call
That is about 300 microseconds.
The difference grows with the size of the arrays. For example, inserting 100 rows and columns into a 900x900 array gives these results (run only 1000 times):
Fast version:
Min: 0.00022916630539984907, Max: 0.0022916630539984908, Mean:
0.0022916630539984908 seconds per call
Slow version:
Min: 0.013766934227399906, Max: 0.13766934227399907, Mean:
0.13766934227399907 seconds per call

Create 2-dimensional range

I have a column vector of start values X, and a column vector of end values Z, and I want to create a matrix that creates linspaces between X and Z of size n. Is there a way to generate that directly without iterating?
Say n=10, and Z in this simple example is just a vector of 20. Then, the following code
X = np.arange(0,5,1)
Y = np.empty((5, 10))
for idx in range(0, len(X)):
Y[idx] = np.linspace(X[idx], 20, 10)
generates what I want, but it requires iteration. Is there any more performant solution, or one directly built in without all that do-it-yourself logic?
Here's the expected output for my test case:
Y
array([[ 0. , 2.22222222, 4.44444444, 6.66666667,
8.88888889, 11.11111111, 13.33333333, 15.55555556,
17.77777778, 20. ],
[ 1. , 3.11111111, 5.22222222, 7.33333333,
9.44444444, 11.55555556, 13.66666667, 15.77777778,
17.88888889, 20. ],
[ 2. , 4. , 6. , 8. ,
10. , 12. , 14. , 16. ,
18. , 20. ],
[ 3. , 4.88888889, 6.77777778, 8.66666667,
10.55555556, 12.44444444, 14.33333333, 16.22222222,
18.11111111, 20. ],
[ 4. , 5.77777778, 7.55555556, 9.33333333,
11.11111111, 12.88888889, 14.66666667, 16.44444444,
18.22222222, 20. ]])
That's what np.meshgrid is for. Edit: Nevermind, that's not what you wanted.
Here's what you want:
>>> X = np.arange(0, 5, 1)[:, None]
>>> Y = np.linspace(0, 1, 10)[None, :]
>>> X+Y*(20-X)
array([[ 0. , 2.22222222, 4.44444444, 6.66666667,
8.88888889, 11.11111111, 13.33333333, 15.55555556,
17.77777778, 20. ],
[ 1. , 3.11111111, 5.22222222, 7.33333333,
9.44444444, 11.55555556, 13.66666667, 15.77777778,
17.88888889, 20. ],
[ 2. , 4. , 6. , 8. ,
10. , 12. , 14. , 16. ,
18. , 20. ],
[ 3. , 4.88888889, 6.77777778, 8.66666667,
10.55555556, 12.44444444, 14.33333333, 16.22222222,
18.11111111, 20. ],
[ 4. , 5.77777778, 7.55555556, 9.33333333,
11.11111111, 12.88888889, 14.66666667, 16.44444444,
18.22222222, 20. ]])
List comprehensions are at least faster, and sometimes easier to understand, than explicit loops (also, on Python 2 prefer xrange to range; on Python 3, range is already lazy):
matrix = np.array([np.linspace(x, 20, 10) for x in X])
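On newer NumPy versions (1.16 and later, as far as I know), np.linspace itself accepts array-valued start and stop, which removes the need for either the loop or the manual broadcasting. A minimal sketch, assuming such a version is installed:

```python
import numpy as np

X = np.arange(0, 5, 1)

# start is an array, stop is a scalar; axis=1 puts the 10 samples
# along the second axis, giving one row per start value
Y = np.linspace(X, 20, 10, axis=1)
print(Y.shape)  # (5, 10)

# stop can be an array as well, e.g. a different end value per row
Z = np.array([20, 21, 22, 23, 24])
Y2 = np.linspace(X, Z, 10, axis=1)
```

With the default axis=0 the result would have shape (10, 5) instead, so the axis argument is what gives the row-per-start layout from the question.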
