Smarter way to build this list of all possible numbers - python

I must build a list of 3x2 value combinations with all possible values between 0.0 and 1.0 by the step size given (for now it’s 1/3).
The output should be [ [[v1, v2], [v3, v4], [v5, v6]], ... ] where every v is a value between 0.0 and 1.0, e.g.:
[ [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
[[0.0, 0.0], [0.0, 0.0], [0.0, 0.33]],
[[0.0, 0.0], [0.0, 0.0], [0.0, 0.66]],
[[0.0, 0.0], [0.0, 0.0], [0.0, 1.0]],
[[0.0, 0.0], [0.0, 0.0], [0.33, 0.0]],
[[0.0, 0.0], [0.0, 0.0], [0.33, 0.33]],
...,
[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]] ]
So far I have:
import numpy

step = 1.0/3.0
lexica = []
for num1 in numpy.arange(0.0, 1.0, step):
    for num2 in numpy.arange(0.0, 1.0, step):
        for num3 in numpy.arange(0.0, 1.0, step):
            for num4 in numpy.arange(0.0, 1.0, step):
                for num5 in numpy.arange(0.0, 1.0, step):
                    for num6 in numpy.arange(0.0, 1.0, step):
                        lexica.append([[num1, num2], [num3, num4], [num5, num6]])
This doesn't include 1.0 as the highest value, and knowing Python there has got to be a better way of writing this.

You can use numpy.mgrid and manipulate it to give you the output you want:
import numpy as np

np.mgrid[0:1:step, 0:1:step, 0:1:step, 0:1:step, 0:1:step, 0:1:step].T.reshape(-1, 3, 2)
EDIT:
A bit more extensible method that fixes the endpoints:
def myMesh(nSteps, shape=(3, 2)):
    c = np.prod(shape)
    x = np.linspace(0, 1, nSteps + 1)
    return np.array(np.meshgrid(*(x,) * c)).T.reshape((-1,) + shape)

myMesh(3)
myMesh(3)
array([[[ 0.        ,  0.        ],
        [ 0.        ,  0.        ],
        [ 0.        ,  0.        ]],

       [[ 0.        ,  0.33333333],
        [ 0.        ,  0.        ],
        [ 0.        ,  0.        ]],

       [[ 0.        ,  0.66666667],
        [ 0.        ,  0.        ],
        [ 0.        ,  0.        ]],

       ...,

       [[ 1.        ,  0.33333333],
        [ 1.        ,  1.        ],
        [ 1.        ,  1.        ]],

       [[ 1.        ,  0.66666667],
        [ 1.        ,  1.        ],
        [ 1.        ,  1.        ]],

       [[ 1.        ,  1.        ],
        [ 1.        ,  1.        ],
        [ 1.        ,  1.        ]]])

This is what you could do without numpy:
from itertools import product
ret = []
for a, b, c, d, e, f in product(range(4), repeat=6):
    ret.append([[a/3, b/3], [c/3, d/3], [e/3, f/3]])
or even as a list comprehension:
ret = [[[a/3, b/3], [c/3, d/3], [e/3, f/3]]
       for a, b, c, d, e, f in product(range(4), repeat=6)]
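A more general sketch of the same idea (my own variant, not from the answer) that doesn't hard-code the six loop variables; it assumes Python 3 division and a parametrized shape:
from itertools import product

def build_grid(steps=3, shape=(3, 2)):
    rows, cols = shape
    vals = [i / steps for i in range(steps + 1)]   # 0, 1/3, 2/3, 1
    return [[list(t[r * cols:(r + 1) * cols]) for r in range(rows)]
            for t in product(vals, repeat=rows * cols)]

lexica = build_grid()   # 4**6 == 4096 combinations, each of shape 3x2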

You can use itertools.combinations_with_replacement in order to accomplish that task:
>>> import numpy
>>> from itertools import combinations_with_replacement as cwr
>>> cwr(cwr(numpy.linspace(0, 1, 4), 2), 3)
cwr(numpy.linspace(0, 1, 4), 2) creates all possible combinations of length 2 from the elements of numpy.linspace(0, 1, 4) (which are 0, 1/3, 2/3, 1). The outer call cwr(..., 3) then creates all possible length 3 tuples from the previous 2-tuples, resulting in your 3x2 elements.

Resizing a 3D array and filling with zeros

I have a NumPy array made of ragged nested sequences such as the following:
import numpy as np

arr = np.array((
    np.random.random((2, 2, 2)),
    np.random.random((4, 4, 4)),
    np.random.random((2, 2, 2))
))
I want to resize each of the nested arrays to the shape (4, 4, 4) by filling it with zeros.
I initially looked at this post numpy - resize array filling with 0 which works for 2D NumPy arrays but, I have struggled to modify it for a 3D NumPy array.
So far I have tried iterating over the individual nested arrays; however, even with some fairly basic code such as
for i, a in enumerate(arr[0]):
    arr[0][i] = np.hstack([a, np.zeros([a.shape[0], 2])])
it still raises an error:
ValueError: could not broadcast input array from shape (2,4) into shape (2,2)
I could create separate variables for every nested array except this feels very slow and inefficient and I'd need even messier code to extend this to all 3 dimensions.
An example of a test:
arr = [[[0.1, 0.4],
        [0.3, 0.7]],
       [[0.5, 0.2],
        [0.8, 0.1]]]
If I wanted it to have the shape (2, 3, 4), the output would be the following:
[[[0.1, 0.4, 0.0, 0.0],
  [0.3, 0.7, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.0]],
 [[0.5, 0.2, 0.0, 0.0],
  [0.8, 0.1, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.0]]]
UPDATE:
Don't even need to use pad then:
def pad_3d(arr: np.ndarray, out_shape: tuple[int, int, int]) -> np.ndarray:
    x, y, z = arr.shape
    output = np.zeros(out_shape, dtype=arr.dtype)
    output[:x, :y, :z] = arr
    return output

test_arr = np.array(
    [[[0.1, 0.4],
      [0.3, 0.7]],
     [[0.5, 0.2],
      [0.8, 0.1]]]
)
desired_shape = (2, 3, 4)
expected_output = np.array(
    [[[0.1, 0.4, 0.0, 0.0],
      [0.3, 0.7, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]],
     [[0.5, 0.2, 0.0, 0.0],
      [0.8, 0.1, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]]]
)
assert np.all(expected_output == pad_3d(test_arr, desired_shape))  # True
Original answer:
It's not entirely clear how you want to fill the resulting arrays with zeros around your data. Only on one side along each axis? Or do you want to essentially "center" your original data amidst the zeros?
Either way, I see no way around creating new arrays. The pad function does what you want, I think. Here is a simplified example for one array, where I "pad around" the data:
import numpy as np
a = np.arange(2*2*2).reshape((2, 2, 2))
x = np.pad(a, 1)  # one layer of zeros on every side, giving shape (4, 4, 4)
If you want to pad on one side with zeros:
x = np.pad(a, (0, 2))
Assuming your arrays are always cubic, i.e. of the shape (n, n, n), you can generalize like this:
def pad_with_zeros(arr, target_size):
    return np.pad(arr, (0, target_size - arr.shape[0]))
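For example, reusing a from above (assuming the cubic-shape caveat holds):
padded = pad_with_zeros(a, 4)
print(padded.shape)   # (4, 4, 4)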
IIUC, here is one way to do it:
Assuming your arr is actually a list or a tuple:
arr = (
    np.random.random((2, 2, 2)),
    np.random.random((4, 4, 4)),
    np.random.random((2, 2, 2)),
)
# new shape: max length in each dimension:
shape = np.c_[[x.shape for x in arr]].max(0)
>>> shape
array([4, 4, 4])
# pad all arrays
new = [np.pad(x, np.c_[[0]*len(shape), shape - x.shape]) for x in arr]
>>> new[0].shape
(4, 4, 4)
>>> new[0]
array([[[0.5488135 , 0.71518937, 0.        , 0.        ],
        [0.60276338, 0.54488318, 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ]],

       [[0.4236548 , 0.64589411, 0.        , 0.        ],
        [0.43758721, 0.891773  , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ]],

       [[0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ]],

       [[0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ]]])
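As a side note, the pad_3d function from the update above can be generalized to any number of dimensions with a tuple of slices; this is a sketch of mine, not part of either answer:
def pad_nd(arr, out_shape):
    # place arr in the "corner" of a zero array of the target shape
    output = np.zeros(out_shape, dtype=arr.dtype)
    output[tuple(slice(0, s) for s in arr.shape)] = arr
    return output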

Efficiently copying values from one ndarray to another on unequal sized arrays

I have two arrays of different sizes, but I am trying to overwrite some values within the first array with values from the second array on the matching "keys". My actual problem may have many, many rows, and I have already determined that this is currently bottlenecking my program.
edit: I failed to recognize that there may be duplicate values in a1, which should stay duplicated. I added one such example to the np.array examples.
example:
import numpy as np
# first two columns are 'keys', overwrite the 3rd column in a1 with the 3rd column from a2
# some values may be missing from a2. Those should keep the value in a1
a1 = np.array([[ 0.0,  2.0, 10.0   ],
               [ 0.0,  2.0, 10.0   ],
               [ 0.0,  3.0, 10.0   ],
               [ 1.0,  3.0, 10.0   ],
               [ 1.0, 13.0, 10.0   ],
               [ 2.0,  2.0, 10.0   ],
               [ 2.0,  5.0, 10.0   ]])
a2 = np.array([[ 0.0,  2.0,  0.0   ],
               [ 0.0,  3.0,  0.713 ],
               [ 1.0,  3.0,  0.713 ],
               [ 1.0, 13.0,  1.0   ],
               [ 2.0,  2.0,  0.0   ]])
# wanted result:
np.array([[ 0.0,  2.0,  0.0   ],
          [ 0.0,  2.0,  0.0   ],
          [ 0.0,  3.0,  0.713 ],
          [ 1.0,  3.0,  0.713 ],
          [ 1.0, 13.0,  1.0   ],
          [ 2.0,  2.0,  0.0   ],
          [ 2.0,  5.0, 10.0   ]])
When I do this brute force, I simply take each row in a2 and loop through each row in a1 to replace values on matches, but is there a way to do this that runs more efficiently? Some way to vectorize the operation on at least one of the loops? My actual case involves many rows in both arrays and this takes a very long time.
Would you consider other packages like Pandas?
import pandas as pd
d2 = pd.DataFrame(a2).set_index([0,1])
d1 = pd.DataFrame(a1).set_index([0,1])
d1.update(d2)
d1.reset_index().values
Output:
array([[ 0.   ,  2.   ,  0.   ],
       [ 0.   ,  2.   ,  0.   ],
       [ 0.   ,  3.   ,  0.713],
       [ 1.   ,  3.   ,  0.713],
       [ 1.   , 13.   ,  1.   ],
       [ 2.   ,  2.   ,  0.   ],
       [ 2.   ,  5.   , 10.   ]])
Concatenate a2 and a1 and keep only the rows that are unique in the first two columns:
a_all = np.r_[a2, a1]
a_all = a_all[np.unique(a_all[:, :2], axis=0, return_index=True)[1]]
If column three is getting updated and you want to use pandas:
import numpy as np
import pandas as pd
a1 = np.array([[ 0.0,  2.0, 10.0   ],
               [ 0.0,  2.0, 10.0   ],
               [ 0.0,  3.0, 10.0   ],
               [ 1.0,  3.0, 10.0   ],
               [ 1.0, 13.0, 10.0   ],
               [ 2.0,  2.0, 10.0   ],
               [ 2.0,  5.0, 10.0   ]])
a2 = np.array([[ 0.0,  2.0,  0.0   ],
               [ 0.0,  3.0,  0.713 ],
               [ 1.0,  3.0,  0.713 ],
               [ 1.0, 13.0,  1.0   ],
               [ 2.0,  2.0,  0.0   ]])
d1 = pd.DataFrame(a1)
d2 = pd.DataFrame(a2)
d3 = d2.set_index([0,1])[[2]].combine_first(d1.set_index([0,1])[[2]]).reset_index().to_numpy()
d3
Output:
array([[ 0.   ,  2.   ,  0.   ],
       [ 0.   ,  2.   ,  0.   ],
       [ 0.   ,  3.   ,  0.713],
       [ 1.   ,  3.   ,  0.713],
       [ 1.   , 13.   ,  1.   ],
       [ 2.   ,  2.   ,  0.   ],
       [ 2.   ,  5.   , 10.   ]])
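If you prefer to stay in NumPy, here is a sketch of mine (not from the answers above) that vectorizes the lookup with sorted keys; it assumes the key pairs in a2 are unique and that the key values compare exactly as floats:
import numpy as np

def update_by_key(a1, a2):
    # encode each (col0, col1) key pair as one complex number; NumPy sorts and
    # searches complex values lexicographically (real part, then imaginary part)
    k1 = a1[:, 0] + 1j * a1[:, 1]
    k2 = a2[:, 0] + 1j * a2[:, 1]
    order = np.argsort(k2)
    pos = np.searchsorted(k2[order], k1)
    pos = np.clip(pos, 0, len(k2) - 1)        # out-of-range positions mean "no match"
    match = k2[order][pos] == k1
    out = a1.copy()
    out[match, 2] = a2[order][pos[match], 2]  # overwrite only the matched rows
    return out
update_by_key(a1, a2) reproduces the wanted result above, and duplicates in a1 stay duplicated.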

Python - Iterate through a list and append n values every nth iteration to a vector [duplicate]

This question already has answers here:
How do I split a list into equally-sized chunks?
(66 answers)
Closed 3 years ago.
I am dealing with a deep reinforcement learning problem and the state I need to feed to my agent is contained in a vector of binary numbers.
The list looks like this:
[7.0, 1.0, 1.0, 0.0, 1.0, 5.0, 0.0, 1.0, 0.0, 1.0,
7.0, 1.0, 1.0, 0.0, 1.0, 6.0, 1.0, 0.0, 1.0, 0.0]
However, each complete state for my problem consists of 5 consecutive values, i.e. a complete state is obtained every 5th element. Examples of complete states from the sample data are:
[[7. 1. 1. 0. 1.]]
[[5. 0. 1. 0. 1.]]
[[7. 1. 1. 0. 1.]]
[[6. 1. 0. 1. 0.]]
I have tried creating a parser function, similar to a sliding window, which should capture 5 values every 5th iteration.
import numpy as np

def getState(data, timestep, window):
    parser_start = timestep - window + 1
    block = data[parser_start:timestep + 5] if parser_start >= 0 else data[0:timestep + 5]  # pad with t0
    res = []
    for i in range(window - 1):
        res.append(block[i])
    return np.array([res])
which I then use in a for loop of this type:
window_size = 5
for t in range(10):
    next_state = getState(data, t + 4, window_size + 1)
    print(next_state)
However, when running the loop the result I get is:
[[7. 1. 1. 0. 1.]]
[[1. 1. 0. 1. 5.]]
[[1. 0. 1. 5. 0.]]
[[0. 1. 5. 0. 1.]]
[[1. 5. 0. 1. 0.]]
[[5. 0. 1. 0. 1.]]
[[0. 1. 0. 1. 7.]]
[[1. 0. 1. 7. 1.]]
[[0. 1. 7. 1. 1.]]
[[1. 7. 1. 1. 0.]]
It seems to slide the window by 1 rather than by 5. I have been trying for weeks now, but I can't find where the problem is.
The window size needs to be the step of the range:
[data[i:i+5] for i in range(0,len(data),5)]
[[7.0, 1.0, 1.0, 0.0, 1.0],
[5.0, 0.0, 1.0, 0.0, 1.0],
[7.0, 1.0, 1.0, 0.0, 1.0],
[6.0, 1.0, 0.0, 1.0, 0.0]]
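If NumPy is acceptable and len(data) is a multiple of 5, reshaping gives the same chunks in one step (a small sketch):
import numpy as np
states = np.asarray(data).reshape(-1, 5)   # one row per complete 5-value state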

Iterate over ratios that sum to unity [duplicate]

I have an unknown number n of variables that can range from 0 to 1 with some known step s, with the condition that they sum up to 1. I want to create a matrix of all combinations. For example, if n=3 and s=0.33333 then the grid will be (The order is not important):
0.00, 0.00, 1.00
0.00, 0.33, 0.67
0.00, 0.67, 0.33
0.00, 1.00, 0.00
0.33, 0.00, 0.67
0.33, 0.33, 0.33
0.33, 0.67, 0.00
0.67, 0.00, 0.33
0.67, 0.33, 0.00
1.00, 0.00, 0.00
How can I do that for an arbitrary n?
Here is a direct method using itertools.combinations:
>>> import itertools as it
>>> import numpy as np
>>>
>>> # k is 1/s
>>> n, k = 3, 3
>>>
>>> combs = np.array((*it.combinations(range(n+k-1), n-1),), int)
>>> (np.diff(np.c_[np.full((len(combs),), -1), combs, np.full((len(combs),), n+k-1)]) - 1) / k
array([[0.        , 0.        , 1.        ],
       [0.        , 0.33333333, 0.66666667],
       [0.        , 0.66666667, 0.33333333],
       [0.        , 1.        , 0.        ],
       [0.33333333, 0.        , 0.66666667],
       [0.33333333, 0.33333333, 0.33333333],
       [0.33333333, 0.66666667, 0.        ],
       [0.66666667, 0.        , 0.33333333],
       [0.66666667, 0.33333333, 0.        ],
       [1.        , 0.        , 0.        ]])
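For reference, the one-liner is the classic stars-and-bars construction; a decomposed sketch of the same computation (variable names are mine):
import itertools as it
import numpy as np

n, k = 3, 3                                   # n bins, k = 1/s steps
bars = np.array(list(it.combinations(range(n + k - 1), n - 1)), int)
# add a virtual bar before slot 0 and after the last slot; the gaps between
# consecutive bars (minus 1) are the step counts that fall into each bin
padded = np.c_[np.full(len(bars), -1), bars, np.full(len(bars), n + k - 1)]
ratios = (np.diff(padded) - 1) / k            # same rows as the array above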
If speed is a concern, itertools.combinations can be replaced by a numpy implementation.
EDIT
Here is a better solution. It basically partitions the number of steps among the given number of variables to generate all the valid combinations:
def partitions(n, k):
    if n < 0:
        return -partitions(-n, k)
    if k <= 0:
        raise ValueError('Number of partitions must be positive')
    if k == 1:
        return np.array([[n]])
    ranges = np.array([np.arange(i + 1) for i in range(n + 1)])
    parts = ranges[-1].reshape((-1, 1))
    s = ranges[-1]
    for _ in range(1, k - 1):
        d = n - s
        new_col = np.concatenate(ranges[d])
        parts = np.repeat(parts, d + 1, axis=0)
        s = np.repeat(s, d + 1) + new_col
        parts = np.append(parts, new_col.reshape((-1, 1)), axis=1)
    return np.append(parts, (n - s).reshape((-1, 1)), axis=1)

def make_grid_part(n, step):
    num_steps = round(1.0 / step)
    return partitions(num_steps, n) / float(num_steps)
print(make_grid_part(3, 0.33333))
Output:
array([[ 0.        ,  0.        ,  1.        ],
       [ 0.        ,  0.33333333,  0.66666667],
       [ 0.        ,  0.66666667,  0.33333333],
       [ 0.        ,  1.        ,  0.        ],
       [ 0.33333333,  0.        ,  0.66666667],
       [ 0.33333333,  0.33333333,  0.33333333],
       [ 0.33333333,  0.66666667,  0.        ],
       [ 0.66666667,  0.        ,  0.33333333],
       [ 0.66666667,  0.33333333,  0.        ],
       [ 1.        ,  0.        ,  0.        ]])
For comparison:
%timeit make_grid_part(5, .1)
>>> 338 µs ± 2.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit make_grid_simple(5, .1)
>>> 26.4 ms ± 806 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
make_grid_simple actually runs out of memory if you push it just a bit further.
Here is one simple way:
def make_grid_simple(n, step):
    num_steps = round(1.0 / step)
    vs = np.meshgrid(*([np.linspace(0, 1, num_steps + 1)] * n))
    all_combs = np.stack([v.flatten() for v in vs], axis=1)
    return all_combs[np.isclose(all_combs.sum(axis=1), 1)]
print(make_grid_simple(3, 0.33333))
Output:
[[ 0. 0. 1. ]
[ 0.33333333 0. 0.66666667]
[ 0.66666667 0. 0.33333333]
[ 1. 0. 0. ]
[ 0. 0.33333333 0.66666667]
[ 0.33333333 0.33333333 0.33333333]
[ 0.66666667 0.33333333 0. ]
[ 0. 0.66666667 0.33333333]
[ 0.33333333 0.66666667 0. ]
[ 0. 1. 0. ]]
However, this is not the most efficient way to do it, since it simply makes all the possible combinations and then picks the ones that add up to 1, instead of generating only the right ones in the first place. For small step sizes, it may incur too high a memory cost.
Assuming that they always add up to 1, as you said:
import itertools
def make_grid(n):
    # setup all possible values in one position
    p = [(float(1)/n)*i for i in range(n+1)]
    # combine values, filter by sum()==1
    return [x for x in itertools.product(p, repeat=n) if sum(x) == 1]
print(make_grid(n=3))
#[(0.0, 0.0, 1.0),
# (0.0, 0.3333333333333333, 0.6666666666666666),
# (0.0, 0.6666666666666666, 0.3333333333333333),
# (0.0, 1.0, 0.0),
# (0.3333333333333333, 0.0, 0.6666666666666666),
# (0.3333333333333333, 0.3333333333333333, 0.3333333333333333),
# (0.3333333333333333, 0.6666666666666666, 0.0),
# (0.6666666666666666, 0.0, 0.3333333333333333),
# (0.6666666666666666, 0.3333333333333333, 0.0),
# (1.0, 0.0, 0.0)]
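The sum(x) == 1 filter relies on exact floating-point equality; with other values of n it can silently drop rows. A variant of mine using a tolerance instead:
import itertools, math

def make_grid_tol(n):
    p = [i / n for i in range(n + 1)]          # assumes Python 3 division
    return [x for x in itertools.product(p, repeat=n)
            if math.isclose(sum(x), 1.0)]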
We can think of this as a problem of dividing some fixed number of things (1/s in this case, represented by the sum_left parameter) among a given number of bins (n in this case). The most efficient way I can think of doing this is using recursion:
In [31]: arr = []

In [32]: def fun(n, sum_left, arr_till_now):
    ...:     if n==1:
    ...:         n_arr = list(arr_till_now)
    ...:         n_arr.append(sum_left)
    ...:         arr.append(n_arr)
    ...:     else:
    ...:         for i in range(sum_left+1):
    ...:             n_arr = list(arr_till_now)
    ...:             n_arr.append(i)
    ...:             fun(n-1, sum_left-i, n_arr)
This would give an output like:
In [36]: fun(n, n, [])
In [37]: arr
Out[37]:
[[0, 0, 3],
[0, 1, 2],
[0, 2, 1],
[0, 3, 0],
[1, 0, 2],
[1, 1, 1],
[1, 2, 0],
[2, 0, 1],
[2, 1, 0],
[3, 0, 0]]
And now I can convert it to a numpy array to do an elementwise multiplication:
In [39]: s = 0.33333333
In [40]: arr_np = np.array(arr)
In [41]: arr_np * s
Out[41]:
array([[ 0. , 0. , 0.99999999],
[ 0. , 0.33333333, 0.66666666],
[ 0. , 0.66666666, 0.33333333],
[ 0. , 0.99999999, 0. ],
[ 0.33333333, 0. , 0.66666666],
[ 0.33333333, 0.33333333, 0.33333333],
[ 0.33333333, 0.66666666, 0. ],
[ 0.66666666, 0. , 0.33333333],
[ 0.66666666, 0.33333333, 0. ],
[ 0.99999999, 0. , 0. ]])
This method will also work for an arbitrary sum (total):
import numpy as np
import itertools as it
import scipy.special
n = 3
s = 1/3.
total = 1.00
interval = int(total/s)
n_combs = scipy.special.comb(n+interval-1, interval, exact=True)
counts = np.zeros((n_combs, n), dtype=int)
def count_elements(elements, n):
    count = np.zeros(n, dtype=int)
    for elem in elements:
        count[elem] += 1
    return count

for i, comb in enumerate(it.combinations_with_replacement(range(n), interval)):
    counts[i] = count_elements(comb, n)
ratios = counts*s
print(ratios)
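A quick sanity check I would add on the result: every row of counts should use exactly interval steps, so every row of ratios should sum to total (up to floating-point error):
assert (counts.sum(axis=1) == interval).all()
assert np.allclose(ratios.sum(axis=1), total)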

Python array using numpy

I am confused about doing vectorization using numpy.
In particular, I have a matrix of this form:
of type <type 'list'>
[[0.0, 0.0, 0.0, 0.0], [0.02, 0.04, 0.0325, 0.04], [1, 2, 3, 4]]
How do I make it look like the following using numpy?
[[ 0.0 0.0 0.0 0.0 ]
[ 0.02 0.04 0.0325 0.04 ]
[ 1 2 3 4 ]]
Yes, I know I can do it using:
np.array([[0.0, 0.0, 0.0, 0.0], [0.02, 0.04, 0.0325, 0.04], [1, 2, 3, 4]])
But I have a very long matrix, and I can't just type out each rows like that. How can I handle the case when I have a very long matrix?
This is not a matrix of type list; it is a list that contains lists. You may think of it as a matrix, but to Python it is just a list.
alist = [[0.0, 0.0, 0.0, 0.0], [0.02, 0.04, 0.0325, 0.04], [1, 2, 3, 4]]
arr = np.array(alist)
works just the same as
arr = np.array([[0.0, 0.0, 0.0, 0.0], [0.02, 0.04, 0.0325, 0.04], [1, 2, 3, 4]])
This creates a 2D array with shape (3, 4) and dtype float:
In [212]: arr = np.array([[0.0, 0.0, 0.0, 0.0], [0.02, 0.04, 0.0325, 0.04], [1, 2, 3, 4]])
In [213]: arr
Out[213]:
array([[ 0.    ,  0.    ,  0.    ,  0.    ],
       [ 0.02  ,  0.04  ,  0.0325,  0.04  ],
       [ 1.    ,  2.    ,  3.    ,  4.    ]])
In [214]: print(arr)
[[ 0.      0.      0.      0.    ]
 [ 0.02    0.04    0.0325  0.04  ]
 [ 1.      2.      3.      4.    ]]
Assuming you start with one long flat list l, why not split it into sublists of the right size (n):
splitted = [l[i:i + n] for i in range(0, len(l), n)]
and make the matrix from that:
np.array(splitted)
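Equivalently, if the flat list l is numeric and its length is a multiple of n, reshape does the chunking in one step (a small sketch):
arr = np.array(l).reshape(-1, n)   # -1 lets NumPy infer the number of rows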
If you're saying you have a list of lists stored in a Python object A, all you need to do is call np.array(A), which will return a numpy array built from the elements of A. Otherwise, you need to specify what form your data is in right now to clarify how you want to load it.
