Draw 2 numbers from urn with replacement - Python

My urn contains the numbers 1.3 and 0.9, which I would like to draw 35 times per simulation, with replacement. I then perform a final calculation and append its result to a list.
In total I would like to perform 10000 simulations.
My code looks like this:
#Draw either 1.3 or 0.9
returns = [1.3,0.9]
#No. of simulations
simulations = 10000
#10000 for loops
for i in range(simulations):
    lst = []
    #each iteration should include 35 random draws with replacement
    for i in range(35):
        lst.append(random.choices(returns,1))
    lst = np.array(lst)
    #Do final calculation and append solution to list
    ret = []
    ret.append((prod(lst)^(1/35))-1)
The error I receive is TypeError: 'int' object is not iterable. I understand that it fails because an integer is being converted to a list object, but I don't know how to solve this.
Full stack trace:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-15-5d61655781f6> in <module>
9 #each iteration should include 35 random draws with replacement
10 for i in range(35):
---> 11 lst.append(random.choices(returns,1))
12
13 lst = np.array(lst)
~/opt/anaconda3/lib/python3.7/random.py in choices(self, population, weights, cum_weights, k)
355 total = len(population)
356 return [population[_int(random() * total)] for i in range(k)]
--> 357 cum_weights = list(_itertools.accumulate(weights))
358 elif weights is not None:
359 raise TypeError('Cannot specify both weights and cumulative weights')
TypeError: 'int' object is not iterable

If you want to convert lst to a numpy array, you can use numpy.random.choice instead. This also removes the need for the for loops.
import numpy as np
#Draw either 1.3 or 0.9
urn = [1.3,0.9]
#No. of simulations
simulations = 10000
#No. of draws
draws = 35
# simulate the draws from the urn
X=np.random.choice(urn,(draws,simulations))
# print the first 10 rows as a check
print(X[:10])
# print shape as a check
print(X.shape)
output:
[[1.3 1.3 1.3 ... 0.9 1.3 1.3]
[0.9 1.3 0.9 ... 0.9 0.9 0.9]
[0.9 1.3 0.9 ... 1.3 1.3 0.9]
...
[1.3 0.9 0.9 ... 1.3 0.9 0.9]
[1.3 1.3 1.3 ... 0.9 0.9 1.3]
[1.3 1.3 0.9 ... 0.9 1.3 1.3]]
(35, 10000)
I changed the name of returns to urn, because returns is easily confused with Python's return keyword.
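The answer above stops at the draws. As a minimal sketch of the remaining step, assuming the intended calculation is the one in the question (the 35th root of the product of the draws, minus 1), each column of X can be reduced in one vectorised call. The name ret is taken from the question; reading each column as one simulation is an assumption based on the shape (draws, simulations).

```python
import numpy as np

urn = [1.3, 0.9]
simulations = 10000
draws = 35

# one column per simulation, one row per draw (with replacement)
X = np.random.choice(urn, (draws, simulations))

# geometric mean of each column minus 1; note ** for powers, since ^ is bitwise XOR
ret = np.prod(X, axis=0) ** (1 / draws) - 1

print(ret.shape)
print(ret[:5])
```

Every entry of ret lies between -0.1 (all draws 0.9) and 0.3 (all draws 1.3).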

When you call:
random.choices(returns,1)
Python treats the 1 as the weights argument, because weights is the second positional parameter of random.choices. If the 1 is meant to be k, the number of elements to return, it must be passed by keyword:
random.choices(returns,k=1)
Since k defaults to 1, it does not need to be specified at all in this case.
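Putting this together, here is a hedged sketch of the original loop with the keyword argument in place. It assumes the final calculation is meant to be a power (written ** in Python, since ^ is bitwise XOR and raises a TypeError on floats), uses np.prod for the undefined prod, and creates ret once before the loop so that results accumulate.

```python
import random
import numpy as np

returns = [1.3, 0.9]
simulations = 10000
draws = 35

ret = []  # created once, outside the loop, so every result is kept
for _ in range(simulations):
    # k=draws returns all 35 draws (with replacement) in one call
    sample = random.choices(returns, k=draws)
    ret.append(np.prod(sample) ** (1 / draws) - 1)

print(len(ret))
```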

Related

How do I fill an array according to distributions for each element in Python?

Suppose I have 3 boxes and 3 animals and I want to create an array of boxes containing 1 animal each according to their respective distributions:
animals = ["Cat", "Dog", "Bunny"]
boxes = []
Where the probabilities are given by
        "Cat"   "Dog"   "Bunny"
Box 1   0.3     0.4     0.3
Box 2   0.2     0.3     0.5
Box 3   0.5     0.3     0.2
How would I fill the array of boxes such that the first element is equal to "Cat" at probability 0.3, "Dog" at probability 0.4 and "Bunny" at probability 0.3, the second element is equal to "Cat" at probability 0.2, "Dog" at probability 0.3 etc.
Also, suppose the first element/box is "Cat". To look at the second and third box, we cannot have a probability >0 of changing the first box again since it's already filled with a cat. We can also not have a probability >0 of the second box containing a cat again, since it's already in box 1.
Would it be reasonable to solve this by rescaling the remaining rows/columns so that they add up to 1 while keeping their proportions the same? For instance, if box 1 is a cat, then we would get
        "Cat"   "Dog"   "Bunny"
Box 1   1       0       0
Box 2   0       0.4     0.6
Box 3   0       0.6     0.4
You can use random.choices. It automatically weights the selection:
import random

boxes = []
animals = ["Cat", "Dog", "Bunny"]
box1 = [0.3, 0.4, 0.3]
box2 = [0.2, 0.3, 0.5]
# box3 = [0.5, 0.3, 0.2] is commented out because it can be ignored
# Choose the first item to go in box1
# (random.choices returns a list, so take its single element)
boxes.append(random.choices(animals, k = 1, weights = box1)[0])
chosen_ind = animals.index(boxes[0])
# Remove the chosen item from animals and box2
animals.pop(chosen_ind)
box2.pop(chosen_ind)
# Choose the second item
boxes.append(random.choices(animals, k = 1, weights = box2)[0])
chosen_ind = animals.index(boxes[1])
# Remove the chosen item from animals, append the only remaining item
animals.pop(chosen_ind)
boxes.append(animals[0])
I am well aware that this is neither a particularly clean nor a scalable way to solve the problem, but it gets the job done for this case.
Edit: Here is a new version with a numpy array:
import random
import numpy as np
boxes = []
# n animals to choose from
animals = ['cat', 'dog', 'bunny', ...] # as many items as needed
# n x n matrix of probabilities (the entries below are placeholders)
prob = np.array([
    [prob(box1, cat), prob(box1, dog), ...],
    [prob(box2, cat), prob(box2, dog), ...],
    ...
])
for box_ind, box in enumerate(prob):
    # random.choices returns a list, so take its single element
    boxes.append(random.choices(animals, k = 1, weights = box)[0])
    col_ind = animals.index(boxes[box_ind])
    # This line sets the probability of a chosen item to 0 for future iterations
    prob[:, col_ind] = 0
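For a concrete run, the sketch below fills in the 3 x 3 matrix from the question. The function name fill_boxes and the use of numpy's Generator.choice (instead of random.choices) are illustrative choices, not part of the answer above. It assumes every row still has at least one non-zero weight among the unused animals, so that the rescaling from the question is well defined.

```python
import numpy as np

def fill_boxes(animals, prob, rng=None):
    """Assign one distinct animal per box, rescaling the remaining weights."""
    rng = np.random.default_rng() if rng is None else rng
    weights = np.array(prob, dtype=float)  # copy, so the caller's matrix is untouched
    boxes = []
    for box_ind in range(len(weights)):
        row = weights[box_ind]
        p = row / row.sum()                # rescale the remaining weights to sum to 1
        col_ind = rng.choice(len(animals), p=p)
        boxes.append(animals[col_ind])
        weights[:, col_ind] = 0.0          # this animal is no longer available
    return boxes

animals = ["Cat", "Dog", "Bunny"]
prob = [[0.3, 0.4, 0.3],
        [0.2, 0.3, 0.5],
        [0.5, 0.3, 0.2]]
boxes = fill_boxes(animals, prob)
print(boxes)
```

Each animal appears exactly once in the result; the order depends on the random draws.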

How can I fix the problem that numpy.arange is not working properly?

I am trying to create two lists with numpy.arange from two input parameters, and then I want to store them in arrays that are initialized with np.zeros as 3x3 matrices. The problem is that the assignment only works for 0.1 and I don't see what I am doing wrong. My code:
import numpy as np
from time import sleep
def Stabilizer(px,pz):
    ss = 0.05
    #initialize an array for data acquisition: 1st row is for countrate, 2nd and 3rd row for x and z values in V
    values_x = np.zeros((3,3), dtype=float)
    values_z = np.zeros((3,3), dtype=float)
    sleep(5)
    values_x[2] = pz
    values_z[1] = px
    x_range = np.arange(px-ss, px+ss,ss)
    z_range = np.arange(pz-ss, pz+ss,ss)
    print(x_range)
    print(z_range)
    values_x[1] = x_range
    values_z[2] = z_range
    for i,x_value in enumerate(x_range):
        #change_pos(x_channel, x_value)
        sleep(1)
        start = 1000
        stop = 1+i
        countrate = stop - start
        values_x[0,i] = countrate
        print(x_value)
    print(values_x)
Stabilizer(0.1,0.2)
which creates this output on console:
Traceback (most recent call last):
File "C:/Users/x/PycharmProjects/NV_centre/test.py", line 46, in <module>
Stabilizer(0.1,0.2)
File "C:/Users/x/PycharmProjects/NV_centre/test.py", line 35, in Stabilizer
values_z[2] = z_range
ValueError: could not broadcast input array from shape (2) into shape (3)
[0.05 0.1 0.15]
[0.15 0.2 ]
In theory the function np.arange(px-ss, px+ss,ss) creates a list with the output [0.05 0.1]. When I use np.arange(px-ss, px+2*ss,ss) in theory the output would be [0.05 0.1 0.15] but it is [0.05 0.1 0.15 0.2 ]. And for z_range = np.arange(pz-ss, pz+2*ss,ss) the output is [0.15 0.2 0.25] which is correct. I don't understand why the difference occurs since both list are created in the same way.
The results of numpy.arange are not consistent when using a non-integer step (you have used 0.05): the number of elements is computed as ceil((stop - start)/step), so floating-point rounding error can add or drop the last element. Using numpy.linspace instead would give more consistent results.
ref: https://numpy.org/doc/stable/reference/generated/numpy.arange.html
np.arange() does not work well with floating-point numbers because floating-point operations have rounding errors.
It's better to use np.linspace() in such cases, so change the following lines to:
x_range = np.linspace(px-ss, px+ss,3)
z_range = np.linspace(pz-ss, pz+ss,3)
This will work fine.
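A small sketch to compare the two functions for the values in the question (px = 0.1, pz = 0.2, ss = 0.05). The variable names are illustrative. np.linspace takes the number of points rather than a step, and it includes the endpoint by default, so 3 points give centre - ss, centre and centre + ss.

```python
import numpy as np

ss = 0.05
for centre in (0.1, 0.2):
    # length depends on rounding of (stop - start) / step
    with_arange = np.arange(centre - ss, centre + ss, ss)
    # length is fixed by the third argument
    with_linspace = np.linspace(centre - ss, centre + ss, 3)
    print(centre, "arange:", len(with_arange), with_arange)
    print(centre, "linspace:", len(with_linspace), with_linspace)
```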
This is the best solution I can think of:
x_range = [round(i, 2) for i in np.arange(px-ss, px+2*ss,ss) if i<px+2*ss]

Spiking value inside a loop

I am writing code to pick the values from an array that best add up to a given target value, as below:
import numpy as np
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]
thickness = np.array([0.1,0.2,0.4,0.8,1.6,3.2,6.4,12.8,25.6,51.2])
b=np.array([])
a=100
c = 48.4
while c>=0 and a>0.1:
    a = find_nearest(thickness,c)
    if a > c:
        g = np.where(thickness==a)
        f = g[0]-1
        a = thickness[f]
    else:
        a = a
    c = c - a
    print(c)
    if c == 0.1:
        break
    b=np.append(b,a)
    itemindex = np.where(thickness==a)
    itemindex = itemindex[0]
    upper_limit = len(thickness)+1
    hj = np.arange(itemindex,upper_limit)
    thickness = np.delete(thickness,hj, None)
    print(thickness)
slots_sum = np.sum(b)
print("It will be used the following slots: ",b, "representing a total of {:.2f} mm".format(slots_sum))
However, for a reason that I could not figure out, when the code tries to find the combination of values that reaches 48.4, it skips the value 0.4 in the array and selects 0.2 and 0.1 instead, which results in a sum of 48.3 instead of the correct 48.4. I have been banging my head against this for some days and would appreciate any help.
[22.8]
[ 0.1 0.2 0.4 0.8 1.6 3.2 6.4 12.8]
[10.]
[0.1 0.2 0.4 0.8 1.6 3.2 6.4]
[3.6]
[0.1 0.2 0.4 0.8 1.6 3.2]
[0.4]
[0.1 0.2 0.4 0.8 1.6]
[0.2]
[0.1]
[0.1]
[]
It will be used the following slots: [25.6 12.8 6.4 3.2 0.2 0.1] representing a total of 48.30 mm.
Multiply your inputs by 10 to give integer values and the answer is what you expect.
You will need to compensate for the inexact nature of floating point values if you want to compare the sums of two different lists of floating point values.
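A minimal sketch of the integer approach, assuming all thicknesses are multiples of 0.1 mm so that a scale factor of 10 makes them exact integers. It replaces the find_nearest loop with a plain largest-first selection, which is a simplification rather than the original algorithm; the names SCALE, slots and chosen are illustrative.

```python
thickness_mm = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8, 25.6, 51.2]
target_mm = 48.4
SCALE = 10  # work in tenths of a millimetre so every value is an exact integer

slots = sorted((round(t * SCALE) for t in thickness_mm), reverse=True)
remaining = round(target_mm * SCALE)

chosen = []
for slot in slots:
    # integer comparison, so no rounding error can make 4 look larger than 4
    if slot <= remaining:
        chosen.append(slot)
        remaining -= slot

chosen_mm = [s / SCALE for s in chosen]
print("Slots used:", chosen_mm, "total {:.1f} mm".format(sum(chosen) / SCALE))
```

With integers, the last step compares 4 with 4 exactly, so the 0.4 slot is selected instead of being skipped.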

Capped / Constrained Weights

I have a dataframe of weights, in which I want to constrain the maximum weight for any one element to 30%. However in doing this, the sum of the weights becomes less than 1, so the weights of all other elements should be uniformly increased, and then repetitively capped at 30% until the sum of all weights is 1.
For example:
If my data is in a pandas data frame, how can I do this efficiently?
Note: in reality I have about 20 elements which I want to cap at 10%, so there is much more processing involved. I also intend to run this step thousands of times.
@jpp
The following is a rough approach, modified from your answer to iteratively solve and re-cap. It doesn't produce a perfect answer though, and having a while loop makes it inefficient. Any ideas how this could be improved?
import pandas as pd
import numpy as np
cap = 0.1
df = pd.DataFrame({'Elements': list('ABCDEFGHIJKLMNO'),
                   'Values': [17,11,7,5,4,4,3,2,1.5,1,1,1,0.8,0.6,0.5]})
df['Uncon'] = df['Values']/df['Values'].sum()
df['Con'] = np.minimum(cap, df['Uncon'])
while df['Con'].sum() < 1 or len(df['Con'][df['Con']>cap]) >=1:
    df['Con'] = np.minimum(cap, df['Con'])
    nonmax = df['Con'].ne(cap)
    # parentheses allow the expression to continue on the next line
    adj = ((1 - df['Con'].sum()) * df['Con'].loc[nonmax] /
           df['Uncon'].loc[nonmax].sum())
    df['Con'] = df['Con'].mask(nonmax, df['Con'] + adj)
print(df)
print(df['Con'].sum())
Here's one vectorised solution. The idea is to calculate an adjustment and distribute it proportionately among the non-capped values.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Elements': list('ABCDE'),
                   'Uncon': [0.53, 0.34, 0.06, 0.03, 0.03]})
df['Con'] = np.minimum(0.30, df['Uncon'])
nonmax = df['Con'].ne(0.30)
adj = (1 - df['Con'].sum()) * df['Uncon'].loc[nonmax] / df['Uncon'].loc[nonmax].sum()
df['Con'] = df['Con'].mask(nonmax, df['Uncon'] + adj)
print(df)
  Elements  Uncon  Con
0        A   0.53  0.3
1        B   0.34  0.3
2        C   0.06  0.2
3        D   0.03  0.1
4        E   0.03  0.1
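The single pass above can push a previously uncapped weight over the cap, which is the case the question describes. Below is a hedged sketch of a repeated version on a plain numpy array. The function name cap_weights and the max_iter parameter are illustrative, and the sketch assumes strictly positive values and that "uniformly increased" means proportional scaling of the uncapped weights, as in the answer above.

```python
import numpy as np

def cap_weights(values, cap, max_iter=100):
    """Normalise values to weights summing to 1, with no weight above cap."""
    w = np.asarray(values, dtype=float)
    w = w / w.sum()
    if cap * len(w) < 1:
        raise ValueError("cap is too small for the weights to sum to 1")
    capped = np.zeros(len(w), dtype=bool)
    for _ in range(max_iter):
        over = w > cap + 1e-12          # small tolerance for rounding error
        if not over.any():
            break
        capped |= over
        w[capped] = cap
        free = ~capped
        if not free.any():
            break
        # share what is left among the uncapped weights, keeping their proportions
        remaining = 1.0 - cap * capped.sum()
        w[free] *= remaining / w[free].sum()
    return w

values = [17, 11, 7, 5, 4, 4, 3, 2, 1.5, 1, 1, 1, 0.8, 0.6, 0.5]
weights = cap_weights(values, cap=0.1)
print(weights)
print(weights.sum())
```

The loop runs once per round of newly capped elements rather than once per element, and each pass is vectorised. A result can be written back to a DataFrame column with, for example, df['Con'] = cap_weights(df['Values'], cap).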

Find minimum of two values in Python

How to program this expression in Python:
min{cos(2xπ), 1/2}
?
I have tried:
x = np.array([1,2,3,4,5,3,2,5,7])
solution = np.min(np.cos(2*x*np.pi), 1/2)
But it does not work, and I get the following error:
TypeError: 'float' object cannot be interpreted as an integer.
I have tried your code with np.minimum like this :
import numpy as np
x = np.array([1,2,3,4,5,3,2,5,7])
solution = np.minimum(np.cos(2*x*np.pi), 1/2)
print(solution)
which gives something like this :
[ 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
np.min fails here because its second positional argument is axis, so 1/2 is interpreted as an axis number. np.minimum instead compares the array with 1/2 element by element and returns an array.
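With integer x every cosine equals 1, so the output above is all 0.5. A short sketch with non-integer inputs (chosen here only for illustration) shows the element-wise behaviour more clearly, next to np.min, which reduces the whole array to a single value.

```python
import numpy as np

x = np.array([0.0, 0.25, 0.5])      # illustrative inputs, not from the question
values = np.cos(2 * x * np.pi)      # approximately [1, 0, -1]

capped = np.minimum(values, 1 / 2)  # element-wise: min(values[i], 0.5) for each i
smallest = np.min(values)           # reduction: the single smallest element

print(capped)
print(smallest)
```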
