How to distribute values in Python? [closed]

I am trying to calculate the likelihood of float values on CSGO skins.
A float is a value between 0 and 1, divided into five sections:
Factory New (0 to 0.07) 3%, Minimal Wear (0.07 to 0.14) 24%, Field-Tested (0.14 to 0.38) 33%, Well-Worn (0.38 to 0.45) 24% and Battle-Scarred (0.45 to 1.0) 16%.
As you can see, the distribution among the float values is not even but weighted. Within each section, however, the values are spread evenly, for example:
https://blog.csgofloat.com/content/images/2020/07/image-6.png
It gets tricky when you introduce float caps, meaning the float is no longer between 0 and 1 but, for example, between 0.14 and 0.65.
The value is calculated as follows:
A section is selected according to their weights.
A float in the range of that section is randomly generated.
The final float is calculated according to this formula:
final_float = float * (max_float - min_float) + min_float
where float is the randomly generated value and max_float and min_float are the upper and lower caps (in this case 0.65 and 0.14). For example, a raw float of 0.5 would give final_float = 0.5 * (0.65 - 0.14) + 0.14 = 0.395.
I now want to calculate the distribution of skins with a cap among the five sections.
How would I do this?
Thank you in advance.

It's simple using the numpy library:
import numpy as np
# input data
n_types = 5
types_weights = np.array([0.03, 0.24, 0.33, 0.24, 0.16])
types_intervals = np.array([0.0, 0.07, 0.14, 0.38, 0.45, 1.0])
# simulate distribution, by generating `n_samples` random floats
n_samples = 1000000
type_samples = np.random.choice(range(n_types), p=types_weights, size=n_samples, replace=True)
float_ranges_begin = types_intervals[type_samples]
float_ranges_end = types_intervals[type_samples + 1]
float_samples = float_ranges_begin + np.random.rand(n_samples) * (float_ranges_end - float_ranges_begin)
# plot results
import matplotlib.pyplot as plt
plt.figure(figsize=(10,8))
plt.hist(float_samples, bins=100, density=True, rwidth=0.8)
# to label the ticks with the section names instead (`types` is defined in the EDIT below):
# plt.xticks(types_intervals, types + ['1.0'], rotation='vertical', fontsize=16)
plt.xlabel('Float', fontsize=16)
plt.ylabel('Probability density', fontsize=16);
EDIT
If you want to find the exact distribution, that is easy as well, though your "scalable" requirement is not fully clear to me:
n_types = 5
types = ['Factory New', 'Minimal Wear', 'Field-Tested', 'Well-Worn', 'Battle-Scarred']
types_weights = np.array([0.03, 0.24, 0.33, 0.24, 0.16])
# the small negative offset makes searchsorted put x = 0.0 into the first section
types_intervals = np.array([-0.0001, 0.07, 0.14, 0.38, 0.45, 1.0])
# these correspond to the top values in my plot, approximately [0.4 3.4 1.37 3.4 0.3]
types_probability_density = types_weights / (types_intervals[1:] - types_intervals[:-1])
def float_probability_density(x):
    types = np.searchsorted(types_intervals, x) - 1
    return types_probability_density[types]
sample_floats = np.linspace(0.0, 1.0, 100)
plt.figure(figsize=(16,8))
plt.bar(sample_floats, float_probability_density(sample_floats), width=0.005)
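The simulation above does not yet apply the cap, so to get the exact capped distribution asked about in the question, here is a minimal sketch (my addition; capped_section_distribution is a made-up name, and the caps 0.14 and 0.65 are the ones from the question). Each source section [a, b] maps through final_float = f * (max_float - min_float) + min_float to a sub-interval of [min_float, max_float], and its weight is split among the target sections in proportion to the overlap:
import numpy as np

types = ['Factory New', 'Minimal Wear', 'Field-Tested', 'Well-Worn', 'Battle-Scarred']
types_weights = np.array([0.03, 0.24, 0.33, 0.24, 0.16])
types_intervals = np.array([0.0, 0.07, 0.14, 0.38, 0.45, 1.0])

def capped_section_distribution(min_float, max_float):
    # map each source section through the cap formula
    lo = types_intervals[:-1] * (max_float - min_float) + min_float
    hi = types_intervals[1:] * (max_float - min_float) + min_float
    result = np.zeros(len(types_weights))
    for w, a, b in zip(types_weights, lo, hi):
        # split this section's weight among the target sections,
        # proportionally to how much of [a, b] falls into each
        overlap = np.clip(np.minimum(b, types_intervals[1:]) - np.maximum(a, types_intervals[:-1]), 0, None)
        result += w * overlap / (b - a)
    return result

print(dict(zip(types, capped_section_distribution(0.14, 0.65).round(4))))
The probabilities sum to 1 by construction, since every source section's weight is fully redistributed.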

Related

numpy log2 zero masking only works outside of function

I'm trying to implement a fast entropy calculation for a float list of probabilities.
Instead of looping through a list and checking for zeros each time, I'm attempting to mask the zeros using numpy's built-in masking functionality. It works absolutely fine unless I try to put it into a function, at which point it breaks. Any suggestions?
import numpy as np

# Works fine!!
distribution = np.array([0.20, 0.3, 0.25, 0.25, 0])
log_dist = np.log2(distribution, out=np.zeros_like(distribution), where=(distribution!=0))
entropy = -np.sum(distribution * log_dist)
print(entropy)
# Breaks!
def calculate_entropy(distribution):
    log_dist = np.log2(distribution, out=np.zeros_like(distribution), where=(distribution!=0))
    entropy = -np.sum(distribution * log_dist)
    return entropy
calculate_entropy([0.20, 0.3, 0.25, 0.25, 0])
output:
nan
Error message:
/var/folders/bt/vk3t9rnn2jz5d1wgj2rc3v200000gn/T/ipykernel_61321/2272953976.py:3: RuntimeWarning: divide by zero encountered in log2
log_dist = np.log2(distribution, out=np.zeros_like(distribution), where=(distribution!=0))
/var/folders/bt/vk3t9rnn2jz5d1wgj2rc3v200000gn/T/ipykernel_61321/2272953976.py:4: RuntimeWarning: invalid value encountered in multiply
entropy = -np.sum(distribution * log_dist)
I was expecting the function to work exactly the same, what am I missing?
Ugh, I'm an idiot. I forgot to convert the list into a numpy array. With a plain list, distribution != 0 evaluates to the single scalar True rather than an element-wise mask, so where no longer skips the zero entry: log2(0) produces -inf and 0 * -inf gives the nan. Fix:
def calculate_entropy(distribution):
    distribution = np.array(distribution)
    log_dist = np.log2(distribution, out=np.zeros_like(distribution), where=(distribution!=0))
    entropy = -np.sum(distribution * log_dist)
    return entropy
calculate_entropy([0.20, 0.3, 0.25, 0.25, 0])
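If SciPy is available, an alternative sketch (my suggestion, not part of the original answer) avoids the where mask entirely: scipy.special.xlogy computes x*log(x) and is defined to return 0 where x == 0.
import numpy as np
from scipy.special import xlogy

def calculate_entropy(distribution):
    # xlogy(p, p) is p * log(p), defined as 0 where p == 0;
    # dividing by log(2) converts from nats to bits
    p = np.asarray(distribution, dtype=float)
    return -np.sum(xlogy(p, p)) / np.log(2)

calculate_entropy([0.20, 0.3, 0.25, 0.25, 0])  # ~1.985, same as above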

Python: how to compare data from 2 lists in a loop to pick the correct range to plot

I am trying to write code that gets rid of speed data above the water level. I have 9 bins (each 25 cm) and speed is measured for each of them, but I need to compare the measured water level with the bin heights to make sure the data above the water level is not used.
So far I have made a list of the bins:
# The sample dataframe looks like this:
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.5, 0.2, 0.3, 0.33], [1.3, 0.25, 0.31, 0.35], [1.4, 0.21, 0.32, 0.36]], columns=['pressure', 'bin1', 'bin2', 'bin3'])
df2 = pd.DataFrame([1.25, 1.35, 1.55], columns=['bin heights'])
# to make things easier I defined separate lists
y1 = df['pressure'][:]  # shows water level
s1 = df['bin1'][:]  # shows speed for bin 1
s2 = df['bin2'][:]  # shows speed for bin 2
s3 = df['bin3'][:]  # shows speed for bin 3
# cleaning up data above water level; gives me the right index
diff1 = np.subtract(y1, df2['bin heights'][0])
p1 = diff1[(diff1 <= 0.05) & (0 < diff1)].index
diff2 = np.subtract(y1, df2['bin heights'][1])
p2 = diff2[(diff2 <= 0.05) & (0 < diff2)].index
diff3 = np.subtract(y1, df2['bin heights'][2])
p3 = diff3[(diff3 <= 0.05) & (0 < diff3)].index
I created the data frame below and it seems to work:
index = p1.append([p2, p3])
values = [df['bin1'][p1], df['bin2'][p2], df['bin3'][p3]]
df0 = pd.DataFrame(values)
df0 = df0.sort_index()
df02 = df0.T
Now there is only one value in each row and the rest are NaN;
how do I plot row by row and get that value without having to specify the column?
Found it (I defined a new column holding the row-wise maximum, i.e. the single non-NaN value of each row):
# my real columns are named like 'speed...'; with the sample frame above, use filter(like='bin') instead
df02['speed'] = df02.filter(like='speed').max(1)
print(df02)
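As an alternative sketch of the same idea (my suggestion, not the poster's code): broadcast the bin heights against the water level and mask every speed cell whose bin lies above the surface.
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.5, 0.2, 0.3, 0.33], [1.3, 0.25, 0.31, 0.35], [1.4, 0.21, 0.32, 0.36]], columns=['pressure', 'bin1', 'bin2', 'bin3'])
heights = np.array([1.25, 1.35, 1.55])

speeds = df[['bin1', 'bin2', 'bin3']]
# rows are measurements, columns are bins; keep a speed only if its bin is below the water level
mask = heights[None, :] < df['pressure'].to_numpy()[:, None]
print(speeds.where(mask))  # above-water cells become NaN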

Expanding an array with equal increments towards zero

I have various dataframes, each has a different depth range.
For a more complex computation (this is a fragment of a question posted here: Curve fitting for each column in Pandas + extrapolate values),
I need to write a function that expands the depth column/array towards zero (the surface) in equal increments dz (in this case 0.5).
Here the missing depths are 0.15, 0.65 and 1.15:
import numpy as np
depth = np.array([1.65, 2.15, 2.65, 3.15, 3.65, 4.15, 4.65, 5.15, 5.65, 6.15, 6.65, 7.15, 7.65, 8.15, 8.65])
Any ideas how to write a function so that it does this each time for a different depth range (i.e. depending on the varying minimum value)?
A very simple solution I did is:
depth_min = np.min(depth)
step = 0.5
missing_vals = np.arange(depth_min - step, 0, -step)[::-1]
depth_tot = np.concatenate((missing_vals, depth), axis=0)
I'm sure there exist better ways
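Wrapped into the requested function, here is a small sketch of the same approach (expand_to_surface is a made-up name), so it works for any depth array and any step:
import numpy as np

def expand_to_surface(depth, step=0.5):
    # prepend the missing depths between the surface (0) and the
    # shallowest value, keeping the same increment
    missing_vals = np.arange(np.min(depth) - step, 0, -step)[::-1]
    return np.concatenate((missing_vals, depth))

depth = np.array([1.65, 2.15, 2.65])
expand_to_surface(depth)  # array([0.15, 0.65, 1.15, 1.65, 2.15, 2.65])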

How to create a logarithmic spaced array in Python?

I want to create an array that starts at 10^(-2) and goes to 10^5 with logarithmic spacing, like this:
levels = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, ..., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., 200., 300., 400., 500., 600., 700., 800., 900., 1000., 2000., 3000., 4000., 5000., 6000., 7000., 8000., 9000., 10000., 20000., 30000., ..., 100000.]
Is there a way to create it without writing down every single number?
I noticed there is a function in NumPy called logspace, but it doesn't work for me, because when I write:
levels = np.logspace(-2., 5., num=63)
it returns an array whose points are evenly spaced in the exponent, not the 1, 2, ..., 9 steps within each decade that I want.
You can use an outer product to get the desired output. You just have to append 100000 at the end, as answer = np.append(answer, 100000), as pointed out by @Matt Messersmith.
Explanation
Create a range of values on a logarithmic scale from 0.01 to 10000
[1.e-02 1.e-01 1.e+00 1.e+01 1.e+02 1.e+03 1.e+04]
and then create a multiplier array
[1 2 3 4 5 6 7 8 9]
Finally, take the outer product to generate your desired range of values.
a1 = np.logspace(-2, 4, 7)
# a1 = 10.**(np.arange(-2, 5))  # alternative suggested by @DSM in the comments
a2 = np.arange(1, 10, 1)
answer = np.outer(a1, a2).flatten()
answer = np.append(answer, 100000)  # append the final endpoint, as noted above
Output (before the final append)
[1.e-02 2.e-02 3.e-02 4.e-02 5.e-02 6.e-02 7.e-02 8.e-02 9.e-02 1.e-01 2.e-01 3.e-01 4.e-01 5.e-01 6.e-01 7.e-01 8.e-01 9.e-01 1.e+00 2.e+00 3.e+00 4.e+00 5.e+00 6.e+00 7.e+00 8.e+00 9.e+00 1.e+01 2.e+01 3.e+01 4.e+01 5.e+01 6.e+01 7.e+01 8.e+01 9.e+01 1.e+02 2.e+02 3.e+02 4.e+02 5.e+02 6.e+02 7.e+02 8.e+02 9.e+02 1.e+03 2.e+03 3.e+03 4.e+03 5.e+03 6.e+03 7.e+03 8.e+03 9.e+03 1.e+04 2.e+04 3.e+04 4.e+04 5.e+04 6.e+04 7.e+04 8.e+04 9.e+04]

Interpolating a data point on a curve with a negative slope

Hi, I have two numpy arrays (in this case representing depth and percentage depth dose data) as follows:
depth = np.array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ,
1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.2,
2.4, 2.6, 2.8, 3. , 3.5, 4. , 4.5, 5. , 5.5])
pdd = np.array([ 80.40649399, 80.35692155, 81.94323956, 83.78981286,
85.58681373, 87.47056637, 89.39149833, 91.33721651,
93.35729334, 95.25343909, 97.06283306, 98.53761309,
99.56624117, 100. , 99.62820672, 98.47564754,
96.33163961, 93.12182427, 89.0940637 , 83.82699219,
77.75436857, 63.15528566, 46.62287768, 29.9665386 ,
16.11104226, 6.92774817, 0.69401413, 0.58247614,
0.55768992, 0.53290371, 0.5205106 ])
which, when plotted, give the following curve (figure omitted):
I need to find the depth at which the pdd falls to a given value (initially 50%). I have tried slicing the arrays at the point where the pdd reaches 100%, as I'm only interested in the points after this.
Unfortunately, np.interp only appears to work where both the x and y values are increasing.
Could anyone suggest where I should go next?
If I understand you correctly, you want to interpolate the function depth = f(pdd) at pdd = 50.0. For the purposes of the interpolation, it might help for you to think of pdd as corresponding to your "x" values, and depth as corresponding to your "y" values.
You can use np.argsort to sort your "x" and "y" by ascending order of "x" (i.e. ascending pdd), then use np.interp as usual:
# `idx` is an array of integer indices that sorts `pdd` in ascending order
idx = np.argsort(pdd)
depth_itp = np.interp([50.0], pdd[idx], depth[idx])
plt.plot(depth, pdd)
plt.plot(depth_itp, 50, 'xr', ms=20, mew=2)
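For this data, depth_itp comes out at approximately 2.359, consistent with the linear result quoted in the answers below.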
This isn't really a programming solution, but it's how you can find the depth. I'm taking the liberty of renaming your variables, so x(i) = depth(i) and y(i) = pdd(i).
In a given interval [x(i),x(i+1)], your linear interpolant is
p_1(X) = y(i) + (X - x(i))*(y(i+1) - y(i))/(x(i+1) - x(i))
You want to find X such that p_1(X) = 50. First find i such that y(i) > 50 and y(i+1) < 50 (on the falling part of the curve); the above equation can then be rearranged to give
X = x(i) + (50 - y(i))*(x(i+1) - x(i))/(y(i+1) - y(i))
For your data (with MATLAB; sorry, no python code) I make it approximately 2.359. This can then be verified with np.interp(X, depth, pdd)
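Since that answer apologizes for the lack of Python, here is a minimal sketch of the same bracketing-and-inversion in Python (my translation, reusing the depth and pdd arrays from the question; depth_at is a made-up name):
import numpy as np

def depth_at(target, depth, pdd):
    # search the falling branch for the interval with y(i) >= target >= y(i+1),
    # then invert the linear interpolant
    peak = np.argmax(pdd)
    for i in range(peak, len(pdd) - 1):
        if pdd[i] >= target >= pdd[i + 1]:
            return depth[i] + (target - pdd[i]) * (depth[i + 1] - depth[i]) / (pdd[i + 1] - pdd[i])
    raise ValueError("target not bracketed on the falling branch")

depth_at(50.0, depth, pdd)  # approximately 2.359, as above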
There are several methods for carrying out interpolation. In your case, you are basically looking for the depth at 50%, which is not available in your data. The simplest interpolation is the linear case. I'm using the Numerical Recipes library in C++ to acquire the interpolated values via several techniques:
Linear Interpolation: see page 117
interpolated value depth(50%): 2.35915
Polynomial Interpolation: see page 117
interpolated value depth(50%): 2.36017
Cubic Spline Interpolation: see page 120
interpolated value depth(50%): 2.19401
Rational Function Interpolation: see page 124
interpolated value depth(50%): 2.35986
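For completeness, a rough Python counterpart of the linear and spline cases (a sketch, not the Numerical Recipes code; it reuses depth and pdd from the question and restricts them to the falling branch so the x values are strictly increasing):
import numpy as np
from scipy.interpolate import CubicSpline

# keep only the monotonically falling branch and reverse it,
# so pdd is strictly increasing as the interpolators require
peak = np.argmax(pdd)
x = pdd[peak:][::-1]
y = depth[peak:][::-1]

print(np.interp(50.0, x, y))    # linear: ~2.359, matching the value above
print(CubicSpline(x, y)(50.0))  # a cubic spline through the same points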
