How to create a logarithmically spaced array in Python?

I want to create an array that starts at 10^(-2) and goes up to 10^5 with logarithmic spacing, like this:
levels = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, ..., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., 200., 300., 400., 500., 600., 700., 800., 900., 1000., 2000., 3000., 4000., 5000., 6000., 7000., 8000., 9000., 10000., 20000., 30000., ..., 100000.]
Is there a way to create it without writing down every single number?
I noticed there is a function in NumPy called logspace, but it doesn't work for me, because when I write:
levels = np.logspace(-2., 5., num=63)
it returns 63 points evenly spaced in the exponent (a constant ratio between consecutive points), not the 1, 2, ..., 9 pattern within each decade shown above.

You can use an outer product to get the desired output. You just have to append 100000 at the end, as answer = np.append(answer, 100000), as pointed out by @Matt Messersmith.
Explanation
Create a range of values on a logarithmic scale from 0.01 to 10000
[1.e-02 1.e-01 1.e+00 1.e+01 1.e+02 1.e+03 1.e+04]
and then create a multiplier array
[1 2 3 4 5 6 7 8 9]
Finally, take the outer product to generate your desired range of values.
a1 = np.logspace(-2, 4, 7)  # decade starting points: 0.01, 0.1, ..., 10000
# a1 = 10.**(np.arange(-2, 5))  # alternative suggested by @DSM in the comments
a2 = np.arange(1, 10)  # multipliers 1 through 9
answer = np.outer(a1, a2).flatten()
Output
[1.e-02 2.e-02 3.e-02 4.e-02 5.e-02 6.e-02 7.e-02 8.e-02 9.e-02 1.e-01 2.e-01 3.e-01 4.e-01 5.e-01 6.e-01 7.e-01 8.e-01 9.e-01 1.e+00 2.e+00 3.e+00 4.e+00 5.e+00 6.e+00 7.e+00 8.e+00 9.e+00 1.e+01 2.e+01 3.e+01 4.e+01 5.e+01 6.e+01 7.e+01 8.e+01 9.e+01 1.e+02 2.e+02 3.e+02 4.e+02 5.e+02 6.e+02 7.e+02 8.e+02 9.e+02 1.e+03 2.e+03 3.e+03 4.e+03 5.e+03 6.e+03 7.e+03 8.e+03 9.e+03 1.e+04 2.e+04 3.e+04 4.e+04 5.e+04 6.e+04 7.e+04 8.e+04 9.e+04]
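For completeness, here is a minimal runnable version that also appends the final endpoint 100000, which the outer product stops short of:
import numpy as np

a1 = np.logspace(-2, 4, 7)  # 0.01, 0.1, ..., 10000
a2 = np.arange(1, 10)       # multipliers 1 through 9
levels = np.append(np.outer(a1, a2).flatten(), 100000.0)
print(levels[0], levels[-1])  # 0.01 100000.0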

Related

Python: how to compare data from 2 lists in a loop to pick the correct range to plot

I am trying to write code that gets rid of speed data above the water level. So far I have 9 bins (each 25 cm), and speed is measured for each of them, but I need to compare the measured water level I have with the bin height to make sure I am not using data from above the water level.
So far I have made a list of the bins:
# The sample dataframe looks like this:
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.5, 0.2, 0.3, 0.33], [1.3, 0.25, 0.31, 0.35], [1.4, 0.21, 0.32, 0.36]], columns=['pressure', 'bin1', 'bin2', 'bin3'])
df2 = pd.DataFrame([1.25, 1.35, 1.55], columns=['bin heights'])
# To make things easier I defined separate lists
y1 = df['pressure'][:]  # shows water level
s1 = df['bin1'][:]  # shows speed for bin 1
s2 = df['bin2'][:]  # shows speed for bin 2
s3 = df['bin3'][:]  # shows speed for bin 3
# Cleaning up data above water level; gives me the right index
diff1 = np.subtract(y1, df2['bin heights'][0])
p1 = diff1[(diff1 <= 0.05) & (0 < diff1)].index
diff2 = np.subtract(y1, df2['bin heights'][1])
p2 = diff2[(diff2 <= 0.05) & (0 < diff2)].index
diff3 = np.subtract(y1, df2['bin heights'][2])
p3 = diff3[(diff3 <= 0.05) & (0 < diff3)].index
I created the data frame below and it seems to work:
index = p1.append([p2, p3])
values = [df['bin1'][p1], df['bin2'][p2], df['bin3'][p3]]
df0 = pd.DataFrame(values)
df0 = df0.sort_index()
df02 = df0.T
Now there is only one value in each row and the rest are NaN.
How do I plot row by row and get that value without having to specify the column?
Found it (I defined a new column holding the single non-NaN value from each row):
df02['speed'] = df02.filter(like='speed').max(axis=1)  # row-wise max skips NaN
print(df02)
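A minimal sketch of the plotting step (assuming df02 has 'speed'-prefixed columns as above): collapse each row to its single non-NaN value and plot it against the row index.
import matplotlib.pyplot as plt

speed = df02.filter(like='speed').max(axis=1)  # one non-NaN value per row
plt.plot(speed.index, speed.values, 'o-')
plt.xlabel('index')
plt.ylabel('speed')
plt.show()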

How to distribute values in Python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
So I am trying to calculate the likelihood of floats on CSGO skins.
A float is a value between 0 and 1, divided into five sections:
Factory New (0 to 0.07) 3%, Minimal Wear (0.07 to 0.14) 24%, Field-Tested (0.14 to 0.38) 33%, Well-Worn (0.38 to 0.45) 24% and Battle-Scarred (0.45 to 1.0) 16%.
As you can see, the distribution among the float values is not even but weighted. Within each section, however, the values are spread evenly, for example:
https://blog.csgofloat.com/content/images/2020/07/image-6.png
It then gets tricky when you introduce float caps, meaning the float is no longer between 0 and 1 but, for example, between 0.14 and 0.65.
The value is calculated as follows:
A section is selected according to their weights.
A float in the range of that section is randomly generated.
The final float is calculated according to this formula:
final_float = float * (max_float - min_float) + min_float
float being the randomly generated value, and max_float and min_float the upper and lower caps (in this case 0.65 and 0.14).
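For example, a raw float of 0.5 under this cap gives final_float = 0.5 * (0.65 - 0.14) + 0.14 = 0.395, which lands in the Well-Worn section.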
I now want to calculate the distribution of skins with a cap among the five sections.
How would I do this?
Thank you in advance.
It's simple using the numpy library:
import numpy as np
# input data
n_types = 5
types_weights = np.array([0.03, 0.24, 0.33, 0.24, 0.16])
types_intervals = np.array([0.0, 0.07, 0.14, 0.38, 0.45, 1.0])
# simulate distribution, by generating `n_samples` random floats
n_samples = 1000000
type_samples = np.random.choice(range(n_types), p=types_weights, size=n_samples, replace=True)
float_ranges_begin = types_intervals[type_samples]
float_ranges_end = types_intervals[type_samples + 1]
float_samples = float_ranges_begin + np.random.rand(n_samples) * (float_ranges_end - float_ranges_begin)
# plot results
import matplotlib.pyplot as plt
plt.figure(figsize=(10,8))
plt.hist(float_samples, bins=100, density=True, rwidth=0.8)
# to see types names instead
# plt.xticks(types_intervals, types + ['1.0'], rotation='vertical', fontsize=16)
plt.xlabel('Float', fontsize=16)
plt.ylabel('Probability density', fontsize=16);
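As a quick sanity check on the simulation (same session as above), the empirical share of each type should approach types_weights:
counts = np.bincount(type_samples, minlength=n_types)
print(counts / n_samples)  # roughly [0.03, 0.24, 0.33, 0.24, 0.16]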
EDIT
If you want to find the exact distribution, that's easy as well, though your "scalable" requirement is not fully clear to me:
n_types = 5
types = ['Factory New', 'Minimal Wear', 'Field-Tested', 'Well-Worn', 'Battle-Scarred']
types_weights = np.array([0.03, 0.24, 0.33, 0.24, 0.16])
types_intervals = np.array([-0.0001, 0.07, 0.14, 0.38, 0.45, 1.0])  # -0.0001 so searchsorted maps x = 0.0 into the first section
# correspond to the top values on my plot, approximately [0.4 3.4 1.37 3.4 0.3]
types_probability_density = types_weights / (types_intervals[1:] - types_intervals[:-1])
def float_probability_density(x):
    types = np.searchsorted(types_intervals, x) - 1
    return types_probability_density[types]
sample_floats = np.linspace(0.0, 1.0, 100)
plt.figure(figsize=(16,8))
plt.bar(sample_floats, float_probability_density(sample_floats), width=0.005)
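To answer the capped part of the question exactly rather than by simulation, here is a sketch (assuming the example cap of 0.14 to 0.65 from the question, and the types, n_types, and types_weights defined above): each source section maps through the affine formula to a uniform interval, and that interval's overlap with the five sections gives the capped weights.
min_float, max_float = 0.14, 0.65  # example cap from the question
intervals = np.array([0.0, 0.07, 0.14, 0.38, 0.45, 1.0])  # exact section edges

capped_weights = np.zeros(n_types)
for i, w in enumerate(types_weights):
    # A float generated in section i is, after the transform, uniform on [lo, hi]
    lo = intervals[i] * (max_float - min_float) + min_float
    hi = intervals[i + 1] * (max_float - min_float) + min_float
    for j in range(n_types):
        # Fraction of [lo, hi] that overlaps section j
        overlap = max(0.0, min(hi, intervals[j + 1]) - max(lo, intervals[j]))
        capped_weights[j] += w * overlap / (hi - lo)

print(dict(zip(types, np.round(capped_weights, 4))))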

Is there a function similar to np.random.choice that has a higher probability of choosing the lower values in the probability distribution?

I have an array of objects with a corresponding probability for each, say
sample = [a, b, c, d, e, f]
probability = [0.1, 0.15, 0.6, 0.05, 0.03, 0.07]
For most functions in my class this is perfect to use with np.random.choice as I want to select the values with the highest percentage chance.
On one of the functions though, I need it to be biased to the values with a lower probability (i.e. more likely to pick e and d in the sample than c).
Is there a function that can do this, or do I need to convert the probability to some inverse probability, like
inverse_probability = [(1-x) for x in probability]
inverse_probability = [x/sum(inverse_probability) for x in inverse_probability]
and then use this in the np.random.choice function?
Thanks in advance!
This is simple but should solve your problem:
sample = ['a', 'b', 'c', 'd', 'e', 'f']
probability = [0.1, 0.15, 0.6, 0.05, 0.03, 0.07]
np.random.choice(a=sample, p=probability)
Solution 1:
inverse_probability = [(1-x) for x in probability]
inverse_probability = [x/sum(inverse_probability) for x in inverse_probability]
np.random.choice(a=sample, p=inverse_probability)
Solution 2:
inverse_probability = [1/x for x in probability]
inverse_probability = [x/sum(inverse_probability) for x in inverse_probability]
np.random.choice(a=sample, p=inverse_probability)
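As a quick usage check (an added sketch, not part of the original answer): draw many samples with Solution 2's weights and inspect the empirical frequencies; 'c' (originally p=0.6) should now be chosen least often.
import numpy as np

draws = np.random.choice(a=sample, p=inverse_probability, size=100_000)
values, counts = np.unique(draws, return_counts=True)
print(dict(zip(values, counts / len(draws))))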
I believe you can use a Poisson distribution:
from numpy.random import poisson
index = poisson(1.0)  # lam=1 makes small indices the most likely
chosen = sample[min(len(sample) - 1, index)]  # clamp to the last element
See Wikipedia for more details on this distribution.
Note: This is valid only if you don't have any requirements for how the prioritization is done.

Interpolating a data point on a curve with a negative slope

Hi, I have two numpy arrays (in this case representing depth and percentage depth dose data) as follows:
depth = np.array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ,
1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.2,
2.4, 2.6, 2.8, 3. , 3.5, 4. , 4.5, 5. , 5.5])
pdd = np.array([ 80.40649399, 80.35692155, 81.94323956, 83.78981286,
85.58681373, 87.47056637, 89.39149833, 91.33721651,
93.35729334, 95.25343909, 97.06283306, 98.53761309,
99.56624117, 100. , 99.62820672, 98.47564754,
96.33163961, 93.12182427, 89.0940637 , 83.82699219,
77.75436857, 63.15528566, 46.62287768, 29.9665386 ,
16.11104226, 6.92774817, 0.69401413, 0.58247614,
0.55768992, 0.53290371, 0.5205106 ])
which, when plotted, give a curve that rises to 100% and then falls off.
I need to find the depth at which the pdd falls to a given value (initially 50%). I have tried slicing the arrays at the point where the pdd reaches 100%, as I'm only interested in the points after this.
Unfortunately, np.interp only appears to work where both the x and y values are increasing.
Could anyone suggest where I should go next?
If I understand you correctly, you want to interpolate the function depth = f(pdd) at pdd = 50.0. For the purposes of the interpolation, it might help for you to think of pdd as corresponding to your "x" values, and depth as corresponding to your "y" values.
You can use np.argsort to sort your "x" and "y" by ascending order of "x" (i.e. ascending pdd), then use np.interp as usual:
# `idx` is an array of integer indices that sorts `pdd` in ascending order
idx = np.argsort(pdd)
depth_itp = np.interp([50.0], pdd[idx], depth[idx])
plt.plot(depth, pdd)
plt.plot(depth_itp, 50, 'xr', ms=20, mew=2)
This isn't really a programming solution, but it's how you can find the depth. I'm taking the liberty of renaming your variables, so x(i) = depth(i) and y(i) = pdd(i).
In a given interval [x(i),x(i+1)], your linear interpolant is
p_1(X) = y(i) + (X - x(i))*(y(i+1) - y(i))/(x(i+1) - x(i))
You want to find X such that p_1(X) = 50. First find i such that y(i) > 50 > y(i+1) (on the falling part of the curve); then the above equation can be rearranged to give
X = x(i) + (50 - y(i))*((x(i+1) - x(i))/(y(i+1) - y(i)))
For your data (with MATLAB; sorry, no python code) I make it approximately 2.359. This can then be verified with np.interp(X, depth, pdd)
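A minimal Python version of the same manual approach (a sketch, assuming the depth and pdd arrays from the question), restricted to the descending branch after the 100% peak:
import numpy as np

target = 50.0
peak = np.argmax(pdd)              # index of the 100% maximum
x, y = depth[peak:], pdd[peak:]    # keep only the falling branch
# Find the interval that brackets the target: y[i] >= 50 >= y[i+1]
i = np.where((y[:-1] >= target) & (y[1:] <= target))[0][0]
X = x[i] + (target - y[i]) * (x[i + 1] - x[i]) / (y[i + 1] - y[i])
print(X)  # approximately 2.359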
There are several methods to carry out interpolation. In your case you are basically looking for the depth at 50%, which is not available in your data. The simplest interpolation is the linear case. I'm using the Numerical Recipes library in C++ to acquire the interpolated value via several techniques:
Linear Interpolation: see page 117
interpolated value depth(50%): 2.35915
Polynomial Interpolation: see page 117
interpolated value depth(50%): 2.36017
Cubic Spline Interpolation: see page 120
interpolated value depth(50%): 2.19401
Rational Function Interpolation: see page 124
interpolated value depth(50%): 2.35986

making binned boxplot in matplotlib with numpy and scipy in Python

I have a 2-d array containing pairs of values, and I'd like to make a boxplot of the y-values by different bins of the x-values. I.e. if the array is:
my_array = array([[1, 40.5], [4.5, 60], ...])
then I'd like to bin my_array[:, 0] and then, for each of the bins, produce a boxplot of the corresponding my_array[:, 1] values that fall into each bin. So in the end I want the plot to contain as many box plots as there are bins.
I tried the following:
min_x = min(my_array[:, 0])
max_x = max(my_array[:, 1])
num_bins = 3
bins = linspace(min_x, max_x, num_bins)
elts_to_bins = digitize(my_array[:, 0], bins)
However, this gives me values in elts_to_bins that range from 1 to 3. I thought I should get 0-based indices for the bins, and I only wanted 3 bins. I'm assuming this is due to some trickiness in how bins are represented in linspace vs. digitize.
What is the easiest way to achieve this? I want num_bins-many equally spaced bins, with the first bin containing the lower half of the data and the upper bin containing the upper half... i.e., I want each data point to fall into some bin, so that I can make a boxplot.
Thanks.
You're getting the 3rd bin for the maximum value in the array (I'm assuming you have a typo there, and max_x should be "max(my_array[:,0])" instead of "max(my_array[:,1])"). You can avoid this by adding 1 (or any positive number) to the last bin.
Also, if I'm understanding you correctly, you want to bin one variable by another, so my example below shows that. If you're using recarrays (which are much slower) there are also several functions in matplotlib.mlab (e.g. mlab.rec_groupby, etc) that do this sort of thing.
Anyway, in the end, you might have something like this (to bin x by the values in y, assuming x and y are the same length)
def bin_by(x, y, nbins=30):
    """
    Bin x by y.
    Returns the binned "x" values and the left edges of the bins
    """
    bins = np.linspace(y.min(), y.max(), nbins + 1)
    # To avoid an extra bin for the max value
    bins[-1] += 1
    indices = np.digitize(y, bins)
    output = []
    for i in range(1, len(bins)):
        output.append(x[indices == i])
    # Just return the left edges of the bins
    bins = bins[:-1]
    return output, bins
As a quick example:
In [3]: x = np.random.random((100, 2))
In [4]: binned_values, bins = bin_by(x[:,0], x[:,1], 2)
In [5]: binned_values
Out[5]:
[array([ 0.59649575, 0.07082605, 0.7191498 , 0.4026375 , 0.06611863,
0.01473529, 0.45487203, 0.39942696, 0.02342408, 0.04669615,
0.58294003, 0.59510434, 0.76255006, 0.76685052, 0.26108928,
0.7640156 , 0.01771553, 0.38212975, 0.74417014, 0.38217517,
0.73909022, 0.21068663, 0.9103707 , 0.83556636, 0.34277006,
0.38007865, 0.18697416, 0.64370535, 0.68292336, 0.26142583,
0.50457354, 0.63071319, 0.87525221, 0.86509534, 0.96382375,
0.57556343, 0.55860405, 0.36392931, 0.93638048, 0.66889756,
0.46140831, 0.01675165, 0.15401495, 0.10813141, 0.03876953,
0.65967335, 0.86803192, 0.94835281, 0.44950182]),
array([ 0.9249993 , 0.02682873, 0.89439141, 0.26415792, 0.42771144,
0.12292614, 0.44790357, 0.64692616, 0.14871052, 0.55611472,
0.72340179, 0.55335053, 0.07967047, 0.95725514, 0.49737279,
0.99213794, 0.7604765 , 0.56719713, 0.77828727, 0.77046566,
0.15060196, 0.39199123, 0.78904624, 0.59974575, 0.6965413 ,
0.52664095, 0.28629324, 0.21838664, 0.47305751, 0.3544522 ,
0.57704906, 0.1023201 , 0.76861237, 0.88862359, 0.29310836,
0.22079126, 0.84966201, 0.9376939 , 0.95449215, 0.10856864,
0.86655289, 0.57835533, 0.32831162, 0.1673871 , 0.55742108,
0.02436965, 0.45261232, 0.31552715, 0.56666458, 0.24757898,
0.8674747 ])]
Hope that helps a bit!
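To get from the binned groups to the boxplots the question asks for, the list of arrays can be passed straight to matplotlib (a sketch, assuming my_array as defined in the question):
import matplotlib.pyplot as plt

binned_values, bin_edges = bin_by(my_array[:, 1], my_array[:, 0], nbins=3)
plt.boxplot(binned_values)  # one box per bin of my_array[:, 0]
plt.xlabel('bin')
plt.ylabel('my_array[:, 1]')
plt.show()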
NumPy has a dedicated function for creating histograms the way you need:
histogram(a, bins=10, range=None, density=None, weights=None)
which you can use like:
(hist_data, bin_edges) = histogram(my_array[:,0], weights=my_array[:,1])
The key point here is to use the weights argument: each value a[i] will contribute weights[i] to the histogram. Example:
a = [0, 1]
weights = [10, 2]
describes 10 points at x = 0 and 2 points at x = 1.
You can set the number of bins, or the bin limits, with the bins argument (see the official documentation for more details).
The histogram can then be plotted with something like:
bar(bin_edges[:-1], hist_data)
If you only need to do a histogram plot, the similar hist() function can directly plot the histogram:
hist(my_array[:,0], weights=my_array[:,1])
