I am writing code to identify which values from the options in an array best add up to a given value, as below:
import numpy as np

def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

thickness = np.array([0.1,0.2,0.4,0.8,1.6,3.2,6.4,12.8,25.6,51.2])
b=np.array([])
a=100
c = 48.4
while c>=0 and a>0.1:
    a = find_nearest(thickness,c)
    if a > c:
        g = np.where(thickness==a)
        f = g[0]-1
        a = thickness[f]
    else:
        a = a
    c = c - a
    print(c)
    if c == 0.1:
        break
    b=np.append(b,a)
    itemindex = np.where(thickness==a)
    itemindex = itemindex[0]
    upper_limit = len(thickness)+1
    hj = np.arange(itemindex,upper_limit)
    thickness = np.delete(thickness,hj, None)
    print(thickness)
slots_sum = np.sum(b)
print("It will be used the following slots: ",b, "representing a total of {:.2f} mm".format(slots_sum))
However, for some reason that I could not figure out, when the code tries to find the proper combination of values to reach 48.4, it skips the value 0.4 in the array and selects 0.2 and 0.1 instead, which results in a sum of 48.3 rather than the correct 48.4. I have been banging my head against this for some days, and I would appreciate any help.
[22.8]
[ 0.1 0.2 0.4 0.8 1.6 3.2 6.4 12.8]
[10.]
[0.1 0.2 0.4 0.8 1.6 3.2 6.4]
[3.6]
[0.1 0.2 0.4 0.8 1.6 3.2]
[0.4]
[0.1 0.2 0.4 0.8 1.6]
[0.2]
[0.1]
[0.1]
[]
It will be used the following slots: [25.6 12.8 6.4 3.2 0.2 0.1] representing a total of 48.30 mm.
Multiply your inputs by 10 to give integer values and the answer is what you expect.
You will need to compensate for the inexact nature of floating point values if you want to compare the sums of two different lists of floating point values.
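As a minimal sketch of the integer approach suggested above: the helper `pick_slots` and its `scale` parameter are hypothetical names, and the largest-first greedy loop is a simplified stand-in for the nearest-value search in the question, not the original code. Working in integer tenths of a millimetre makes every comparison and subtraction exact.

```python
def pick_slots(thickness_mm, target_mm, scale=10):
    """Greedy largest-first selection, done in integer units (hypothetical helper)."""
    # Convert to integer tenths of a millimetre so arithmetic is exact.
    sizes = sorted((round(t * scale) for t in thickness_mm), reverse=True)
    remaining = round(target_mm * scale)
    chosen = []
    for size in sizes:
        # Take each slot at most once, if it still fits.
        if size <= remaining:
            chosen.append(size)
            remaining -= size
    # Convert back to millimetres only for reporting.
    return [s / scale for s in chosen], remaining / scale


thickness = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8, 25.6, 51.2]
slots, leftover = pick_slots(thickness, 48.4)
print(slots, leftover)
```

With the values from the question this selects 25.6, 12.8, 6.4, 3.2 and 0.4, leaving nothing over. If the sums must stay as floats instead, comparing with a tolerance (for example `math.isclose`) rather than `==` is the usual alternative.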
My urn contains the numbers 1.3 and 0.9, which I would like to draw 35 times per simulation, with replacement. I then want to perform a final calculation and append its result to a list.
In total I would like to perform 10000 simulations.
My code looks like this:
#Draw either 1.3 or 0.9
returns = [1.3,0.9]
#No. of simulations
simulations = 10000
#10000 for loops
for i in range(simulations):
    lst = []
    #each iteration should include 35 random draws with replacement
    for i in range(35):
        lst.append(random.choices(returns,1))
    lst = np.array(lst)
    #Do final calculation and append solution to list
    ret = []
    ret.append((prod(lst)^(1/35))-1)
The error I receive is TypeError: 'int' object is not iterable. I think it fails because I am trying to convert an integer to a list object, but I don't know how to solve this.
Full stack trace:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-15-5d61655781f6> in <module>
9 #each iteration should include 35 random draws with replacement
10 for i in range(35):
---> 11 lst.append(random.choices(returns,1))
12
13 lst = np.array(lst)
~/opt/anaconda3/lib/python3.7/random.py in choices(self, population, weights, cum_weights, k)
355 total = len(population)
356 return [population[_int(random() * total)] for i in range(k)]
--> 357 cum_weights = list(_itertools.accumulate(weights))
358 elif weights is not None:
359 raise TypeError('Cannot specify both weights and cumulative weights')
TypeError: 'int' object is not iterable
If you want to convert lst to a numpy array, you can use numpy.random.choice instead. This also removes the need for the for loop.
import numpy as np
#Draw either 1.3 or 0.9
urn = [1.3,0.9]
#No. of simulations
simulations = 10000
#No. of draws
draws = 35
# simulate the draws from the urn
X=np.random.choice(urn,(draws,simulations))
# print the first 10 rows as a check
print(X[:10])
# print shape as a check
print(X.shape)
output:
[[1.3 1.3 1.3 ... 0.9 1.3 1.3]
[0.9 1.3 0.9 ... 0.9 0.9 0.9]
[0.9 1.3 0.9 ... 1.3 1.3 0.9]
...
[1.3 0.9 0.9 ... 1.3 0.9 0.9]
[1.3 1.3 1.3 ... 0.9 0.9 1.3]
[1.3 1.3 0.9 ... 0.9 1.3 1.3]]
(35, 10000)
I changed the name of returns to urn, because returns is a bit confusing as a variable name in Python.
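To finish the calculation from the question on top of the vectorised draw, a sketch could look like the following. It assumes the intended formula is the 35th root of the product minus 1 (in Python `^` is bitwise XOR, so `**` is used), and it uses `np.random.default_rng` with an arbitrary seed purely to make the run repeatable.

```python
import numpy as np

urn = [1.3, 0.9]
simulations = 10000
draws = 35

rng = np.random.default_rng(0)  # seed 0 is an arbitrary choice
# One column per simulation, one row per draw (with replacement).
X = rng.choice(urn, size=(draws, simulations))

# Product of the 35 draws per simulation, then the 35th root, minus 1.
ret = X.prod(axis=0) ** (1 / draws) - 1

print(ret.shape)
print(ret.mean())
```

Here `ret` holds one result per simulation, so no Python-level loop or list appending is needed.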
When you call:
random.choices(returns,1)
Python treats the 1 as the weights argument, because weights is the second positional parameter of random.choices. If the 1 is meant to be k, the number of elements to return, it must be passed by keyword:
random.choices(returns,k=1)
However, k defaults to 1, so it does not need to be specified here.
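As a small illustration of the keyword form, the inner loop from the question can be replaced by a single call with `k=35`. This is a sketch rather than the original poster's code: it assumes Python 3.8+ for `math.prod` and that `**` was intended instead of `^`.

```python
import math
import random

returns = [1.3, 0.9]
draws = 35

# k must be passed by keyword; the result is a list of 35 values.
lst = random.choices(returns, k=draws)

# Geometric-mean style calculation from the question (assumed intent).
result = math.prod(lst) ** (1 / draws) - 1
print(len(lst), result)
```

Note that `random.choices` always returns a list, even for `k=1`, so appending its result inside a loop would build a list of one-element lists.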
Working with 'scipy.optimize.minimize', I'm seeing strange behaviour from the minimize procedure. Below is test code to show my results:
import numpy as np
import pandas as pd
from scipy.optimize import minimize
def SES(good, a, h):
    print('good is : {}'.format(good))
    print('a is : {}'.format(a))
    print('h is : {}'.format(h))
    return 0
good = [1,2,3,4,5,6]
a = minimize(SES, x0 = good, args=(0.1, 1), method = 'L-BFGS-B', bounds = [[0.1, 0.3]]*len(good))
I expect the SES function to print the values [1,2,3,4,5,6] for the 'good' parameter, but I receive the following output:
good is : [0.3 0.3 0.3 0.3 0.3 0.3]
a is : 0.1
h is : 1
If I remove bounds parameter then I receive output as I expect:
a = minimize(SES, x0 = good, args=(0.1, 1), method = 'L-BFGS-B')
good is : [1. 2. 3. 4. 5. 6.]
a is : 0.1
h is : 1
Could you explain what I'm doing wrong...
It seems I have found the problem. The initial good lies outside the bounds, so the starting point is moved onto the bounds (0.3 for every element), which explains this result.
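The printed values are consistent with the starting point being clipped to the bounds before the first function evaluation. The sketch below only reproduces that effect with `np.clip`; that it mirrors what the optimizer does internally is an assumption based on the output above, not something taken from the SciPy source.

```python
import numpy as np

good = [1, 2, 3, 4, 5, 6]
lower, upper = 0.1, 0.3  # the bounds used in the question

# Every element of good exceeds the upper bound, so each becomes 0.3.
clipped = np.clip(good, lower, upper)
print(clipped)

# A starting point that already lies inside the bounds is left unchanged.
feasible_x0 = np.full(len(good), 0.2)
print(np.clip(feasible_x0, lower, upper))
```

Choosing an `x0` inside the bounds, or widening the bounds to contain `x0`, would keep the first evaluation at the values that were passed in.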
I have a dataframe of weights, in which I want to constrain the maximum weight for any one element to 30%. However, in doing this, the sum of the weights becomes less than 1, so the weights of all other elements should be increased proportionally, and then repeatedly capped at 30% until the sum of all weights is 1.
For example:
If my data is in a pandas data frame, how can I do this efficiently?
Note: in reality I have about 20 elements which I want to cap at 10%, so there is much more processing involved. I also intend to run this step thousands of times.
#jpp
The following is a rough approach, modified from your answer to iteratively solve and re-cap. It doesn't produce a perfect answer, though, and having a while loop makes it inefficient. Any ideas how this could be improved?
import pandas as pd
import numpy as np
cap = 0.1
df = pd.DataFrame({'Elements': list('ABCDEFGHIJKLMNO'),
                   'Values': [17,11,7,5,4,4,3,2,1.5,1,1,1,0.8,0.6,0.5]})
df['Uncon'] = df['Values']/df['Values'].sum()
df['Con'] = np.minimum(cap, df['Uncon'])
while df['Con'].sum() < 1 or len(df['Con'][df['Con']>cap]) >=1:
    df['Con'] = np.minimum(cap, df['Con'])
    nonmax = df['Con'].ne(cap)
    # parentheses allow the expression to continue on the next line
    adj = ((1 - df['Con'].sum()) * df['Con'].loc[nonmax] /
           df['Uncon'].loc[nonmax].sum())
    df['Con'] = df['Con'].mask(nonmax, df['Con'] + adj)
print(df)
print(df['Con'].sum())
Here's one vectorised solution. The idea is to calculate an adjustment and distribute it proportionately among the non-capped values.
import numpy as np
import pandas as pd

df = pd.DataFrame({'Elements': list('ABCDE'),
                   'Uncon': [0.53, 0.34, 0.06, 0.03, 0.03]})
df['Con'] = np.minimum(0.30, df['Uncon'])
nonmax = df['Con'].ne(0.30)
adj = (1 - df['Con'].sum()) * df['Uncon'].loc[nonmax] / df['Uncon'].loc[nonmax].sum()
df['Con'] = df['Con'].mask(nonmax, df['Uncon'] + adj)
print(df)
Elements Uncon Con
0 A 0.53 0.3
1 B 0.34 0.3
2 C 0.06 0.2
3 D 0.03 0.1
4 E 0.03 0.1
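For the follow-up case where one pass is not enough (many elements and a low cap), the same idea can be repeated until no weight exceeds the cap. The following is a rough sketch under stated assumptions, not the answerer's method: `cap_weights` is a hypothetical helper, it works on a plain NumPy array rather than the dataframe column, and it rescales the uncapped weights by a common factor, which distributes the excess in proportion to their size.

```python
import numpy as np

def cap_weights(weights, cap):
    """Cap weights at `cap` and rescale the rest so the total is 1 (hypothetical helper)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    if cap * w.size < 1:
        raise ValueError("cap is too small for the weights to sum to 1")
    capped = np.zeros(w.size, dtype=bool)
    # Each pass caps at least one new element, so at most w.size passes are needed.
    for _ in range(w.size + 1):
        over = (w > cap) & ~capped
        if not over.any():
            break
        capped |= over
        w[capped] = cap
        free = ~capped
        if free.any():
            # Scale the uncapped weights so everything sums to 1 again.
            w[free] *= (1 - cap * capped.sum()) / w[free].sum()
    return w


small = cap_weights([0.53, 0.34, 0.06, 0.03, 0.03], cap=0.30)
large = cap_weights([17, 11, 7, 5, 4, 4, 3, 2, 1.5, 1, 1, 1, 0.8, 0.6, 0.5], cap=0.10)
print(small)
print(large, large.sum())
```

On the five-element example this reproduces the Con column shown above. The loop runs over elements that newly exceed the cap, not over rows, so the number of passes is bounded by the number of elements. The result could be assigned back with something like `df['Con'] = cap_weights(df['Values'], cap)`.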
How to program this expression in Python:
min{cos(2xπ), 1/2}
?
I have tried:
x = np.array([1,2,3,4,5,3,2,5,7])
solution = np.min(np.cos(2*x*np.pi), 1/2)
But it does not work, and I get the following error:
TypeError: 'float' object cannot be interpreted as an integer.
I have tried your code with np.minimum like this :
import numpy as np
x = np.array([1,2,3,4,5,3,2,5,7])
solution = np.minimum(np.cos(2*x*np.pi), 1/2)
print(solution)
which gives something like this :
[ 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
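np.minimum compares its two inputs element by element and returns an array. np.min, by contrast, reduces a single array, and its second positional argument is the axis, which is why passing 1/2 there raises the TypeError. See the NumPy documentation for np.minimum for details.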
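Because cos(2xπ) is 1 for every integer x, the example above always returns 0.5. A short sketch with non-integer inputs (values chosen here only for illustration) shows the difference between the element-wise `np.minimum` and the reducing `np.min` more clearly:

```python
import numpy as np

x = np.array([0.0, 0.25, 0.5])
y = np.cos(2 * x * np.pi)       # approximately [1, 0, -1]

capped = np.minimum(y, 1 / 2)   # element-wise: min{cos(2xπ), 1/2} for each x
smallest = np.min(y)            # reduction: one value for the whole array

print(capped)
print(smallest)
```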
I have a set of data that looks like the following:
index 902.4 909.4 915.3
n 0.6 0.3 1.4
n.1 0.4 0.3 1.3
n.2 0.3 0.2 1.1
n.3 0.2 0.2 1.3
n.4 0.4 0.3 1.4
DCIS 0.3 1.6
DCIS.1 0.3 1.2
DCIS.2 1.1
DCIS.3 0.2 1.2
DCIS.4 0.2 1.3
DCIS.5 0.2 0.1 1.5
br_1 0.5 0.4 1.4
br_1.1 0.2 1.3
br_1.2 0.5 0.2 1.4
br_1.3 0.5 0.2 1.6
br_1.4 1.4
with the regular Python integer indexing as column[0]. Below is the code that I've written with lots of help from members of Stack Overflow:
nh = pd.ExcelFile(file)
df = pd.read_excel(nh)
df = df.set_index('Samples').transpose()
df = df.reset_index()
df_n = df.loc[df['index'].str.startswith('n')]
df_DCIS = df.loc[df['index'].str.startswith('DCIS')]
df_br1234 = df.loc[df['index'].str.startswith('br')]
#plt.tight_layout()
for i in range(1, df.shape[1]):
    plt.figure()
    df_n.iloc[:, i].hist(histtype='step', color='k', label='N')
    df_DCIS.iloc[:, i].hist(histtype='step', color='r', label='DCIS')
    df_br1234.iloc[:, i].hist(histtype='step', color='orange', label='IDC')
    plt.legend(loc='center left', bbox_to_anchor=(1, 0.5), fancybox=True, shadow=True)
    plt.title("Histograms for " + df.columns[i], loc='center')
    plt.show()
This creates multiple figures with cut-off legend (this was not cut off when the figure was made by pycharm). However, the plt.title gives an error message saying TypeError: must be str, not float. I do understand that the columns of the different dataframes are floating numbers, and when I type print(df.columns), it says dtype is object. Is there a way to convert the float object to str? I tried using
plt.title("Histograms for " + df.columns[i].astype('str'))
but it said float object has no attribute astype.
You can use this:
plt.title("Histograms for " + str(df.columns[i]))
If you don't want the plots to be attached together, I'd suggest avoiding subplots() entirely. Instead, separate each plot with plt.show():
cols = ["902.4", "909.4", "915.3"]
data = [{"df":df_n, "color":"k", "label":"N"},
        {"df":df_DCIS, "color":"r", "label":"DCIS"},
        {"df":df_br1234, "color":"orange", "label":"IDC"}]
for col in cols:
    for dataset in data:
        dataset["df"][col].hist(histtype='step',
                                color=dataset["color"],
                                label=dataset["label"])
        plt.title(f"{dataset['label']} for {col}")
        plt.savefig(f"{dataset['label']}_for_{col}_plot.png")
        plt.show()
Try
plt.title("Histograms for {0:.2f}".format(df.columns[i]))
The characters inside the curly brackets are from the Format Specification Mini-Language. This example formats a float with 2 decimal places. If you follow the link you'll see lots of other options.
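The string-building options from the answers above can be compared side by side without any plotting. In this sketch `col` is a stand-in for a float column label such as `df.columns[i]`; the f-string variant is an equivalent spelling added here for illustration.

```python
col = 902.4  # stand-in for a float column label like df.columns[i]

# "Histograms for " + col would raise TypeError, because col is a float.
title_str = "Histograms for " + str(col)
title_fmt = "Histograms for {0:.2f}".format(col)
title_f = f"Histograms for {col:.1f}"

print(title_str)
print(title_fmt)
print(title_f)
```

`str()` keeps the label as it is, while the format specifications fix the number of decimal places, which can be useful when the labels have varying precision.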