I have the following input data
class_p = [0.0234375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1748046875, 0.0439453125, 0.0, 0.35302734375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3828125]
league_p = [0.4765625, 0.0, 0.00634765625, 0.4658203125, 0.0, 0.0, 0.046875, 0.0, 0.0, 0.0029296875, 0.0, 0.0, 0.0, 0.0, 0.0]
a2_p = [0.1171875, 0.0, 0.0, 0.1171875, 0.0, 0.0078125, 0.30322265625, 0.31103515625, 0.0, 0.0, 0.0, 0.1435546875, 0.0, 0.0, 0.0]
p1_p = [0.0, 0.03125, 0.375, 0.09375, 0.0234375, 0.0, 0.46875, 0.0078125, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
p2_p = [0.3984375, 0.0, 0.0, 0.3828125, 0.08935546875, 0.08935546875, 0.023345947265625, 0.007720947265625, 0.0, 0.0, 0.0087890625, 0.00018310546875, 0.0, 0.0, 0.0]
class_v = [55, 75, 55, 75, 500, 10000, 55, 55, 55, 75, 75, 55, 55, 500, 55, 55, 75, 75, 55, 55, 55]
league_v = [0, 0, 0, 0, 0, 0, 0, 0, 40, 40, 40, 40, 1500, 1500, 3000]
a2_v= [0, 0, 0, 0, 0, 0, 0, 0, 40, 40, 40, 40, 1500, 1500, 3000]
p1_v = [0, 0, 0, 0, 0, 0, 0, 40, 40, 40, 40, 40, 1500, 1500, 3000]
p2_v = [0, 0, 0, 0, 0, 0, 0, 0, 40, 40, 40, 40, 1500, 1500, 3000]
With that data, I am generating the odds of each combination occurring.
As an example, to generate the chance of a given combination
class_p[0]
league_p[6]
a2_p[11]
p1_p[7]
p2_p[3]
I would multiply their values together:
0.0234375 × 0.046875 × 0.1435546875 × 0.0078125 × 0.3828125
That would give me 4.716785042546689510345458984375 × 10^-7.
Since the given combination used class_p[0], league_p[6], a2_p[11], p1_p[7], and p2_p[3], I would take the corresponding entries from the value arrays.
I would sum
class_v[0] + league_v[6] + a2_v[11] + p1_v[7] + p2_v[3]
That would give me 55+0+40+40+0 = 135
To finalize the process I would do
(0.0234375*0.046875*0.1435546875*0.0078125*0.3828125)*(55+0+40+40+0) = 0.00006367659807
The full final calc is
(0.0234375 × 0.046875 × 0.1435546875 × 0.0078125 × 0.3828125) × (55 + 0 + 40 + 40 + 0)
(combination_chance) * (combination_value)
I need to do this process for all possible combinations of combination_chance.
This should give me a column of values (1×N). If I sum the values of that column, I reach the overall EV, by summing the EV of the individual combinations.
Calculating combination_chance works just fine. My issue is how to line up a given combination with its corresponding value sum (combination_value). At the moment I have additional identifiers attached to the *_p arrays, and I do a string comparison on them to determine which combination value to use. This is very slow for billions of comparisons, so I am exploring a better approach.
I am using Python 3.8 and NumPy 1.24.
Edit
The question has been adjusted to include much more detail
Broadcasting
Ok, so it seems that this is a simple broadcasting problem.
You want a 5D array of probabilities, times a 5D array of values. And, of course, you want it without any for loop.
In NumPy, the classical way to have NumPy run the nested loops for you (which is indeed much faster than writing them yourself; the first rule of NumPy is "avoid iterating over elements at all costs; no for loops") is to use broadcasting.
Let's start with a 2D example (as was your first intention, and that was a good idea; the problem was that it was ambiguous, but restricting your question to 2D was not bad).
You have
class_p = np.array([0.0234375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1748046875, 0.0439453125, 0.0, 0.35302734375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3828125])
league_p = np.array([0.4765625, 0.0, 0.00634765625, 0.4658203125, 0.0, 0.0, 0.046875, 0.0, 0.0, 0.0029296875, 0.0, 0.0, 0.0, 0.0, 0.0])
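The value lists need the same conversion (the broadcasting below indexes them with [:,None], which only works on NumPy arrays), e.g.:
class_v = np.array([55, 75, 55, 75, 500, 10000, 55, 55, 55, 75, 75, 55, 55, 500, 55, 55, 75, 75, 55, 55, 55])
league_v = np.array([0, 0, 0, 0, 0, 0, 0, 0, 40, 40, 40, 40, 1500, 1500, 3000])
and likewise for the a2, p1, and p2 arrays.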
One way (not the only one, but probably the easiest to adapt to any similar question) is to use broadcasting.
If you indeed convert class_p into a column, that is, a 21×1 2D array, and league_p into a row, that is, a 1×15 2D array, then, when you multiply them, the result is a 21×15 2D array containing all combinations.
Because
np.array([[1],[2],[3]]) * np.array([[4,5]])
is
[[ 4,  5],
 [ 8, 10],
 [12, 15]]
That's how broadcasting works.
There are several ways to turn a 1D array into a row or a column of a 2D array. For example, you could use .reshape, like class_p.reshape(-1,1) and league_p.reshape(1,-1). But the fastest is to add a new axis, like class_p[:,None] and league_p[None,:]. Note that the second way doesn't really create a new array; it is just a different view of the same array. That is why it is faster.
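A quick shape check (a sketch) illustrates the two spellings:
print(class_p.reshape(-1, 1).shape)  # (21, 1) -- a column
print(class_p[:, None].shape)        # (21, 1) -- same shape, via a new axis
print(league_p[None, :].shape)       # (1, 15) -- a row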
So, our 2D probability map is
class_p[:,None]*league_p[None,:]
Likewise, to get all 21×15 combinations of sums of values, you can rely on the same broadcasting to perform the addition:
class_v[:,None]+league_v[None,:]
Broadcasting solution
So solution, in 2D, using broadcasting, is
class_p[:,None]*league_p[None,:] * (class_v[:,None] + league_v[None,:])
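As a sanity check (a sketch, assuming the value lists were also converted to NumPy arrays, as noted above):
ev2d = class_p[:, None] * league_p[None, :] * (class_v[:, None] + league_v[None, :])
print(ev2d.shape)   # (21, 15): one cell per (class, league) combination
print(ev2d[0, 6])   # class_p[0] * league_p[6] * (class_v[0] + league_v[6])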
In 5D, with all your variables, it is still manageable (but don't add too many dimensions! The result would quickly become huge, and I suspect that what you are really interested in at the end is just the sum of it all). This time it is not in one line (not that it couldn't be done that way, but that would be a long line...):
pr = class_p[:,None,None,None,None]*league_p[None,:,None,None,None]*a2_p[None,None,:,None,None]*p1_p[None,None,None,:,None]*p2_p[None,None,None,None,:]
vl = class_v[:,None,None,None,None]+league_v[None,:,None,None,None]+a2_v[None,None,:,None,None]+p1_v[None,None,None,:,None]+p2_v[None,None,None,None,:]
pr*vl
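You can check this against the worked example from the question (a sketch):
print(pr.shape)            # (21, 15, 15, 15, 15)
print(pr[0, 6, 11, 7, 3])  # ~4.7168e-07, the combination chance from the question
print(vl[0, 6, 11, 7, 3])  # 135, the combination value from the question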
add.outer and multiply.outer
As you can see, in 5D it is a little tedious. But I wanted to show you the principle of broadcasting before introducing another way (not really shorter, but a bit less tedious). That way was already given by Reinderien, but since it came before you clarified the question, it did not give the right result; the principle is the same, though.
In 2D
np.multiply.outer(class_p, league_p) * np.add.outer(class_v, league_v)
Unfortunately, those functions take only two arguments, so in 5D you have to chain them:
pr = np.multiply.outer(class_p, np.multiply.outer(league_p, np.multiply.outer(a2_p, np.multiply.outer(p1_p, p2_p))))
vl = np.add.outer(class_v, np.add.outer(league_v, np.add.outer(a2_v, np.add.outer(p1_v, p2_v))))
pr * vl
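If the nesting bothers you, the chaining can also be written as a fold with functools.reduce (a minimal sketch; the result is the same):
from functools import reduce
pr = reduce(np.multiply.outer, [class_p, league_p, a2_p, p1_p, p2_p])
vl = reduce(np.add.outer, [class_v, league_v, a2_v, p1_v, p2_v])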
Expected value
Note that if the aim of all this is to compute the expected "value" (whatever that value is), that is, Σ p(i,j,k,l,m)×v(i,j,k,l,m) over all possible outcomes, then doing it this way is probably not a good idea.
For your example it is manageable. You are computing "only" about 1 million possible outcomes, that is, 1 million probabilities (4 multiplications each) and 1 million associated values (4 additions each), then performing 1 million multiplications between those two sets, and finally summing the result, which is one more million additions. Altogether that is only about 10 million elementary arithmetic operations; not much for a modern computer, and the response still feels instantaneous. But it is O(Nᵏ) in both CPU and memory, N being the typical length of an array and k the number of variables.
But if you intend to add more dimensions (more variables, each with its own set of probabilities and values), this becomes needlessly explosive in both CPU time and memory (those 5D arrays of probabilities and values have to be stored); likewise if you intend to perform this computation more than once. The expected value can be computed much faster, using only O(Nk) operations.
I spare you the full derivation, but it is just a matter of expanding Σᵢⱼₖₗₘ pᵢpⱼpₖpₗpₘ(vᵢ+vⱼ+vₖ+vₗ+vₘ) into five terms, each of which factorizes; for example Σᵢⱼₖₗₘ pᵢpⱼpₖpₗpₘvᵢ = (Σᵢpᵢvᵢ)(Σⱼpⱼ)(Σₖpₖ)(Σₗpₗ)(Σₘpₘ). So you can compute it faster like this:
P1 = class_p.sum()
PV1 = (class_p*class_v).sum()
P2 = league_p.sum()
PV2 = (league_p*league_v).sum()
P3 = a2_p.sum()
PV3 = (a2_p*a2_v).sum()
P4 = p1_p.sum()
PV4 = (p1_p*p1_v).sum()
P5 = p2_p.sum()
PV5 = (p2_p*p2_v).sum()
expectedValue = P1*P2*P3*P4*PV5 + P1*P2*P3*PV4*P5 + P1*P2*PV3*P4*P5 + P1*PV2*P3*P4*P5 + PV1*P2*P3*P4*P5
sameAs = (pr*vl).sum()
It looks more complicated because there are more lines, but each line works along one dimension only. So it replaces on the order of n₁n₂n₃n₄n₅ operations with on the order of n₁+n₂+n₃+n₄+n₅ operations, where n₁,...,n₅ are the sizes of the arrays for each of the 5 variables.
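The same factorization can be written once for any number of (probability, value) pairs (a sketch; the expectedValue line above is the five-variable special case):
def fast_ev(pairs):
    # pairs: list of (p, v) 1D NumPy arrays, one pair per variable
    P = [p.sum() for p, v in pairs]
    PV = [(p * v).sum() for p, v in pairs]
    # EV = sum over i of PV_i * product of P_j for all j != i
    return sum(PV[i] * np.prod([P[j] for j in range(len(P)) if j != i])
               for i in range(len(P)))

fast_ev([(class_p, class_v), (league_p, league_v), (a2_p, a2_v), (p1_p, p1_v), (p2_p, p2_v)])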
So, again, if your objective is to compute the expected value, then computing the full 5D arrays (as your question asks) is a really costly way to do it.
This doesn't make any attempt to cache intermediate results, etc.
import numpy as np
class_percentages = (0.0, 0.0, 0.0, 0.3, 0.50)
league_percentages = (0.1, 0.0, 0.2, 0.1, 0.05)
class_values = (50, 50, 50, 75, 100)
league_values = (0, 10, 10, 25, 75)
combined = np.add.outer(class_percentages, league_percentages)*np.add.outer(class_values, league_values)
print(combined)
Output:
[[ 5. 0. 12. 7.5 6.25]
[ 5. 0. 12. 7.5 6.25]
[ 5. 0. 12. 7.5 6.25]
[30. 25.5 42.5 40. 52.5 ]
[60. 55. 77. 75. 96.25]]
I have an empty numpy array, a list of indices, and list of values associated with the indices. The issue is that there may be duplicates in the indices. In all these "collision" cases, I'd like the smallest value to be picked. Just wondering what is the best way to go about it.
E.g.:
array = [0,0,0,0,0,0,0]
indices = [0, 0, 2, 3, 2, 4]
values = [1.0, 3.0, 3.5, 1.5, 2.5, 8.0]
Result:
out = [1.0, 0, 2.5, 1.5, 8.0, 0.0, 0.0]
You can always implement something manually like:
import numpy as np

def index_reduce(arr, indices, out, reducer=min):
    touched = np.zeros_like(out, dtype=np.bool_)
    for i, x in enumerate(indices):
        if not touched[x]:
            out[x] = arr[i]
            touched[x] = True
        else:
            out[x] = reducer(out[x], arr[i])
    return out
which essentially loops through the indices and assigns the values of arr to out if the position has not been touched yet (keeping track of this with the touched array), otherwise reducing the current and new value with the specified reducer.
NOTE: The reducer function needs to be such that the final result can only depend on the current and previous value.
The usage of this would be:
indices = [0, 0, 2, 3, 2, 4]
values = [1.0, 3.0, 3.5, 1.5, 2.5, 8.0]
array = np.zeros(7)
index_reduce(values, indices, array)
# array([1. , 0. , 2.5, 1.5, 8. , 0. , 0. ])
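A vectorized NumPy alternative for the min case (a sketch: np.minimum.at performs an unbuffered in-place minimum, so duplicate indices are handled correctly; it assumes all values are finite):
out = np.full(7, np.inf)             # start from +inf so every value wins
np.minimum.at(out, indices, values)  # per-index minimum, duplicates included
out[np.isinf(out)] = 0.0             # reset untouched slots to the fill value
# array([1. , 0. , 2.5, 1.5, 8. , 0. , 0. ])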
If performance is a concern, you can also accelerate the above code with Numba with a simple decoration, provided that the values and indices inputs are also NumPy arrays:
import numba as nb
index_reduce_nb = nb.njit(index_reduce)
indices = np.array([0, 0, 2, 3, 2, 4])
values = np.array([1.0, 3.0, 3.5, 1.5, 2.5, 8.0])
array = np.zeros(7)
index_reduce_nb(values, indices, array)
# array([1. , 0. , 2.5, 1.5, 8. , 0. , 0. ])
Benchmarks
The above solutions can be compared to a Torch-based solution (reworked from @Shai's answer):
import torch
def index_reduce_torch(arr, indices, out, reduce_="amin"):
    arr = torch.from_numpy(arr)
    indices = torch.from_numpy(indices)
    out = torch.from_numpy(out)
    return out.index_reduce_(dim=0, index=indices, source=arr, reduce=reduce_, include_self=False).numpy()
or, with additional skipping of Torch gradients:
index_reduce_torch_ng = torch.no_grad()(index_reduce_torch)
index_reduce_torch_ng.__name__ = "index_reduce_torch_ng"
and a Pandas-based solution (reworked from @bpfrd's answer):
import pandas as pd
def index_reduce_pd(arr, indices, out, reducer=min):
    df = pd.DataFrame(data=zip(indices, arr))
    df1 = df.groupby(0, as_index=False).agg(reducer)
    out[df1[0]] = df1[1]
    return out
using the following code:
funcs = index_reduce, index_reduce_nb, index_reduce_pd, index_reduce_torch, index_reduce_torch_ng
timings = {}
for i in range(4, 18):
    n = 2 ** i
    print(f"n = {n}, i = {i}")
    extrema = 0, 2 * n
    indices = np.random.randint(*extrema, n)
    values = np.random.random(n)
    out = np.zeros(extrema[1] + 1)
    timings[n] = []
    base = funcs[0](values, indices, out)
    for func in funcs:
        res = func(values, indices, out)
        is_good = np.allclose(base, res)
        timed = %timeit -r 16 -n 16 -q -o func(values, indices, out)
        timing = timed.best * 1e6
        timings[n].append(timing if is_good else None)
        print(f"{func.__name__:>24}  {is_good}  {timing:10.3f} µs")
to produce, with the additional lines:
import matplotlib.pyplot as plt
df = pd.DataFrame(data=timings, index=[func.__name__ for func in funcs]).transpose()
df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', figsize=(6, 4))
df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', ylim=[0, 500], figsize=(6, 4))
fig = plt.gcf()
fig.patch.set_facecolor('white')
these plots:
(the second is a zoomed-in version of the first).
These indicate that the Numba-accelerated solution could be the fastest, closely followed by the Torch-based solution, while the Pandas approach could be the slowest, even slower than the explicit solution without acceleration.
You are looking for index_reduce_, which was introduced in PyTorch 1.12.
import torch
array = torch.zeros(7)
indices = torch.tensor([0, 0, 2, 3, 2, 4])
values = torch.tensor([1.0, 3.0, 3.5, 1.5, 2.5, 8.0])
out = array.index_reduce_(dim=0, index=indices, source=values, reduce='amin', include_self=False)
You'll get your desired output:
tensor([1.0000, 0.0000, 2.5000, 1.5000, 8.0000, 0.0000, 0.0000])
Note that this method is in "beta" and its API may change in future PyTorch versions.
You can use pandas groupby + agg as follows:
import pandas as pd

indices = [0, 0, 2, 3, 2, 4]
values = [1.0, 3.0, 3.5, 1.5, 2.5, 8.0]
array = [0, 0, 0, 0, 0, 0, 0]
df = pd.DataFrame(zip(indices, values), columns=['indices', 'values'])
df1 = df.groupby('indices', as_index=False).agg(values=('values', min))
for i, j in zip(df1['indices'].tolist(), df1['values'].tolist()):
    array[i] = j
Output:
[1.0, 0, 2.5, 1.5, 8.0, 0, 0]
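The final loop can also be replaced by a vectorized assignment (a sketch, assuming array is a NumPy array rather than a list):
import numpy as np
array = np.zeros(7)
array[df1['indices'].to_numpy()] = df1['values'].to_numpy()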
I have created several circles with different origins using Python, and I am trying to implement a function that will divide each circle into n equal parts along the circumference. I am trying to populate an array that contains the starting [x, y] coordinate of each part on the circumference.
My code is as follows:
def fnCalculateArcCoordinates(self, intButtonCount, radius, center):
    lstButtonCoord = []
    #for degrees in range(0, 360, intAngle):
    for arc in range(1, intButtonCount + 1):
        degrees = arc * 360 / intButtonCount
        xDegreesCoord = int(center[0] + radius * math.cos(math.radians(degrees)))
        yDegreesCoord = int(center[1] + radius * math.sin(math.radians(degrees)))
        lstButtonCoord.append([xDegreesCoord, yDegreesCoord])
    return lstButtonCoord
When I run the code for 3 parts, an example of the set of coordinates returned is:
[[157, 214], [157, 85], [270, 149]]
This means the segments are of different sizes. Could someone please help me identify where my error is?
The exact results of such trigonometric calculations are rarely exact integers; by flooring them to int, you lose some precision. The approximate squared (Pythagorean) distances suggest that your math is otherwise correct:
(270-157)**2 + (149-85)**2
# 16865
(270-157)**2 + (214-149)**2
# 16994
(157-157)**2 + (214-85)**2
# 16641
Furthermore, you can use the built-in complex number type and the cmath module. In particular cmath.rect converts polar coordinates (a radius and an angle) into rectangular coordinates:
import cmath

def calc(count, radius, center):
    x, y = center
    for i in range(count):
        r = cmath.rect(radius, (2 * cmath.pi) * (i / count))
        yield [round(x + r.real, 2), round(y + r.imag, 2)]
list(calc(4, 2, [0, 0]))
# [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [-0.0, -2.0]]
list(calc(6, 1, [0, 0]))
# [[1.0, 0.0], [0.5, 0.87], [-0.5, 0.87], [-1.0, 0.0], [-0.5, -0.87], [0.5, -0.87]]
You may want to change the rounding as you see fit.
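Applied to your original function, the minimal fix is simply to drop the int() conversions and keep the floats (a sketch; round only for display if needed):
import math

def fnCalculateArcCoordinates(self, intButtonCount, radius, center):
    lstButtonCoord = []
    for arc in range(1, intButtonCount + 1):
        degrees = arc * 360 / intButtonCount
        # keep full float precision; no int() flooring
        xDegreesCoord = center[0] + radius * math.cos(math.radians(degrees))
        yDegreesCoord = center[1] + radius * math.sin(math.radians(degrees))
        lstButtonCoord.append([xDegreesCoord, yDegreesCoord])
    return lstButtonCoord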
I want to generate a list of floats of size M, where each item in the list is greater than the items that come after it, i.e. in descending order, and the list must sum to 1. Also, for the same magnitude M, can I generate more than one list that obeys these constraints?
I'm thinking of an equation of the following form:
Xᵢ₊₁ = compute([Xᵢ, Xᵢ₋₁, ..., X₀], M, Random)
But I am not able to figure out the form of this function. Thank you in advance.
Okay, so let's pick 10 random numbers from 0 to 10 and sort them (pass reverse=True to sorted if you want them in descending order, as the question asks). Then compute the sum and rebuild a new list with each element divided by that sum:
import random
# create a non-normalized ascending list of numbers
lst = sorted(random.uniform(0,10) for _ in range(10))
# compute the sum
temp_sum = sum(lst)
# now divide each member by the sum to normalize the list
lst = [i/temp_sum for i in lst]
print(lst,sum(lst))
one output could be:
[0.0340212528820301, 0.05665995400192079, 0.07733861892990018,
0.07752841352220373, 0.08556431469182045, 0.11628857362899164,
0.11706017358757258, 0.12523809404875455, 0.14272942597136748,
0.16757117873543856] 1.0
The sum may not be exactly 1 because of floating-point inaccuracy, but it will be very close.
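The same idea in NumPy, sorted in descending order as the question asks (a sketch):
import numpy as np

lst = np.sort(np.random.uniform(0, 10, size=10))[::-1]  # descending
lst = lst / lst.sum()                                   # normalize to sum ~1
print(lst, lst.sum())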
If you want something that is mathematically predictable...
def makeDescendingUnitArray(length: int):
    if (not isinstance(length, int)) or (length < 1):
        raise ValueError("Array Length must be an int with a value of at least 1")
    if length == 1:
        return [1]
    else:
        constant = 1
        output = list()
        for x in range(length - 2):
            constant /= 2
            output.append(constant)
        return output + [2 * constant / 3, constant / 3]

for arrayLength in range(1, 10):
    array = makeDescendingUnitArray(arrayLength)
    print(array)
Produces the following arrays...
[1]
[0.6666666666666666, 0.3333333333333333]
[0.5, 0.3333333333333333, 0.16666666666666666]
[0.5, 0.25, 0.16666666666666666, 0.08333333333333333]
[0.5, 0.25, 0.125, 0.08333333333333333, 0.041666666666666664]
[0.5, 0.25, 0.125, 0.0625, 0.041666666666666664, 0.020833333333333332]
[0.5, 0.25, 0.125, 0.0625, 0.03125, 0.020833333333333332, 0.010416666666666666]
[0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.010416666666666666, 0.005208333333333333]
[0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.005208333333333333, 0.0026041666666666665]
If you want a mathematically predictable one-liner, then there's this...
(loop to show you what it looks like)
for length in range(1, 10):
    array = [2 * x / (length * (length + 1)) for x in range(length, 0, -1)]
    print("Sum:", sum(array), "Array:", array)
This produces the following output. Note that this is just as susceptible to floating-point rounding errors as all of the other algorithms; there are better and worse algorithms, but at some point they will all have some error.
Sum: 1.0 Array: [1.0]
Sum: 1.0 Array: [0.6666666666666666, 0.3333333333333333]
Sum: 0.9999999999999999 Array: [0.5, 0.3333333333333333, 0.16666666666666666]
Sum: 0.9999999999999999 Array: [0.4, 0.3, 0.2, 0.1]
Sum: 1.0 Array: [0.3333333333333333, 0.26666666666666666, 0.2, 0.13333333333333333, 0.06666666666666667]
Sum: 0.9999999999999998 Array: [0.2857142857142857, 0.23809523809523808, 0.19047619047619047, 0.14285714285714285, 0.09523809523809523, 0.047619047619047616]
Sum: 1.0 Array: [0.25, 0.21428571428571427, 0.17857142857142858, 0.14285714285714285, 0.10714285714285714, 0.07142857142857142, 0.03571428571428571]
Sum: 1.0 Array: [0.2222222222222222, 0.19444444444444445, 0.16666666666666666, 0.1388888888888889, 0.1111111111111111, 0.08333333333333333, 0.05555555555555555, 0.027777777777777776]
Sum: 0.9999999999999999 Array: [0.2, 0.17777777777777778, 0.15555555555555556, 0.13333333333333333, 0.1111111111111111, 0.08888888888888889, 0.06666666666666667, 0.044444444444444446, 0.022222222222222223]
I have a Python script that adjusts the coordinates of triangles towards the centre of gravity of each triangle. This works just fine. However, to generate a workable output (I need to write a text file which can be imported by other software, Abaqus), I want to write a coordinate list to a text file, but I can't get this to work properly.
I think I first need to create a list or tuple from the NumPy array. However, this doesn't work correctly: there is still an array per coordinate in this list. How can I fix this?
The script I currently have is shown below.
newcoords = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
newelems = [[0, 1, 2], [3, 4, 5]]

import numpy as np

#define triangles
triangles = np.array([[newcoords[e] for e in newelem] for newelem in newelems])
#find centroid of each triangle
CM = np.mean(triangles, axis=1)
#find vector from each point in triangle pointing towards centroid
point_to_CM_vectors = CM[:, np.newaxis] - triangles
#calculate similar triangles 1% smaller
new_triangle = triangles + 0.01 * point_to_CM_vectors

newcoord = []
newcoord.append(list(zip(*new_triangle)))
print('newcoord =', newcoord)

#generate output
fout = open('_PartInput3.inp', 'w')
print('*Node-new_triangle', file=fout)
for i, x in enumerate(newcoord):
    print(i + 1, ',', x[0], ',', x[1], file=fout)
fout.close()
The coordinate list in the output file '_PartInput3.inp' should look the following:
*Node-new_triangle
1, 0.00333333, 0.00333333
2, 0.99333333, 0.00333333
3, 0.00333333, 0.99333333
4, 0.00333333, 0.00666667
5, 0.99333333, 0.99666667
6, 0.00333333, 0.99666667
Thanks in advance for any help!
#generate output
fout = open('_PartInput3.inp', 'w')
fout.write('*Node-new_triangle\n')
s = new_triangle.shape
for i, x in enumerate(new_triangle.reshape(s[0] * s[1], 2)):
    fout.write("{}, {}, {}\n".format(i + 1, x[0], x[1]))
fout.close()
or better
#generate output
with open('_PartInput3.inp', 'w') as fout:
    fout.write('*Node-new_triangle\n')
    s = new_triangle.shape
    for i, x in enumerate(new_triangle.reshape(s[0] * s[1], 2)):
        fout.write("{}, {}, {}\n".format(i + 1, x[0], x[1]))
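Since new_triangle is already a NumPy array, np.savetxt can also do the formatting in one call (a sketch; the %.8f column format is an assumption chosen to match the desired output):
flat = new_triangle.reshape(-1, 2)
rows = np.column_stack([np.arange(1, len(flat) + 1), flat])
np.savetxt('_PartInput3.inp', rows, fmt='%d, %.8f, %.8f',
           header='*Node-new_triangle', comments='')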