Plot excess points on x axis in python

I am looking to plot data captured at 240 Hz (x axis) against data captured at 60 Hz (y axis). The x-axis data has 4 times as many points as the y-axis data, and I would like 4 points on the x axis to be plotted for each single point on the y axis, so that the resulting graph looks like a step.
My lists: Y axis: [0.0, 0.001, 0.003, 0.2, 0.4, 0.5, 0.7, 0.88, 0.9, 1.0]
X Axis: np.arange(1, 40) # numpy
Any ideas how to combine the 4 excess points into one in the graph?

You can use numpy.repeat to duplicate each data point in your series as many times as you want. For your specific example:
from matplotlib import pyplot as plt
import numpy as np

fig, ax = plt.subplots()
X = np.arange(1, 41)  # 40 samples on the x axis (240 Hz)
Y = np.array([0.0, 0.001, 0.003, 0.2, 0.4, 0.5, 0.7, 0.88, 0.9, 1.0])  # 10 samples (60 Hz)
Y2 = np.repeat(Y, 4)  # repeat each y value 4 times to match the x resolution
print(Y2)
ax.plot(X, Y2)
plt.show()
Gives the following output for Y2:
[0. 0. 0. 0. 0.001 0.001 0.001 0.001 0.003 0.003 0.003 0.003
0.2 0.2 0.2 0.2 0.4 0.4 0.4 0.4 0.5 0.5 0.5 0.5
0.7 0.7 0.7 0.7 0.88 0.88 0.88 0.88 0.9 0.9 0.9 0.9
1. 1. 1. 1. ]
And a step-shaped figure (image omitted).
You can also do the opposite with
X2 = X[::4]  # keep every 4th x sample instead of repeating y
ax.plot(X2, Y)
in which case you get a plain line through the ten points rather than a step (image omitted).
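If you specifically want the step appearance, matplotlib's drawstyle parameter (not used in the original answer, but a standard option of Axes.plot) can draw the hold-and-jump shape directly from the downsampled data:
ax.plot(X2, Y, drawstyle='steps-post')  # hold each y value until the next x sample
plt.show()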

Related

How do I solve this problem using linear least squares with scipy?

I had the following problem on a test and my code didn't give me the exact results I needed, but I can't really find what went wrong.
We discovered a new comet whose elliptical orbit can be represented in a Cartesian $(x, y)$ coordinate system by the equation
$$ay^2 + bxy + cx + dy + e = x^2$$
Use a SciPy routine to solve the linear least squares problem to determine the orbital parameters a, b, c, d, e, given the following observations of the comet's position:
Observation:  1     2     3     4     5     6     7     8     9     10
x:            1.02  0.95  0.87  0.77  0.67  0.56  0.44  0.30  0.16  0.01
y:            0.39  0.32  0.27  0.22  0.18  0.15  0.13  0.12  0.13  0.15
I wrote this code, but when I plot the equation using matplotlib.contour the curve doesn't match the data points.
import numpy as np
from scipy import linalg

def fit(x, y):
    # Least-squares fit of a*y^2 + b*x*y + c*x + d*y + e = x^2
    n = np.shape(x)[0]
    A = np.array([y**2, x * y, x, y, np.ones(n)]).T
    b = x**2
    return linalg.lstsq(A, b)[0]
obx = np.array([1.02, 0.95, 0.87, 0.77, 0.67, 0.56, 0.44, 0.3, 0.16, 0.01])
oby = np.array([0.39, 0.32, 0.27, 0.22, 0.18, 0.15, 0.13, 0.12, 0.13, 0.15])
fit(obx, oby)
Does somebody know what I am doing wrong here? Should I maybe use curve_fit instead of lstsq, or is my mistake in the plotting code?
Some follow-up clarification, the code I wrote gave this output for the constants a to e.
array([-2.63562548, 0.14364618, 0.55144696, 3.22294034, -0.43289427])
I plotted the result with this code
from matplotlib import pyplot as plt

obx = np.array([1.02, 0.95, 0.87, 0.77, 0.67, 0.56, 0.44, 0.3, 0.16, 0.01])
oby = np.array([0.39, 0.32, 0.27, 0.22, 0.18, 0.15, 0.13, 0.12, 0.13, 0.15])

def data_plot(x, y, a, b, c, d, e):
    def f(x, y):
        return a * y**2 + b * x * y + c * x + d * y + e
    plt.close()
    size = 100
    xrang = np.linspace(0, 0.5, size)
    yrang = np.linspace(0, 90, size)
    X, Y = np.meshgrid(xrang, yrang)
    F = f(X, Y)
    G = X**2
    plt.contour((F - G), [0])
    plt.scatter(x, y)
    plt.xlim([-0.5, 1.5])
    plt.ylim([0, 0.5])
    plt.xlabel('x-coordinate')
    plt.ylabel('y-coordinate')
    plt.show()
    return None

data_plot(obx, oby, -2.63562548, 0.14364618, 0.55144696, 3.22294034, -0.43289427)
which gives this obviously wrong result (plot omitted).
Does somebody know what I am doing wrong here? Should I maybe use curve_fit instead of lstsq, or is my mistake in the plotting code?
I think it's a mistake in your plotting code. I plotted this in a different manner, and it agrees with the initial points.
from sympy import plot_implicit, symbols, Eq
from sympy.plotting.plot import List2DSeries
import numpy as np

obx = np.array([1.02, 0.95, 0.87, 0.77, 0.67, 0.56, 0.44, 0.3, 0.16, 0.01])
oby = np.array([0.39, 0.32, 0.27, 0.22, 0.18, 0.15, 0.13, 0.12, 0.13, 0.15])
x, y = symbols('x y')
a, b, c, d, e = -2.63562548, 0.14364618, 0.55144696, 3.22294034, -0.43289427
p1 = plot_implicit(Eq(a*y**2 + b*x*y + c*x + d*y + e, x**2), (x, -2, 2), (y, -2, 2), line_color='red')
p1.append(List2DSeries(obx, oby))
p1.show()
(Blue is initial points, red is the least-squares fit.)
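For what it's worth, the original matplotlib approach can also be made to agree once plt.contour is given the coordinate grids and a y range that actually covers the data; here is a sketch of the presumed fix (the corrected ranges are my assumption about where the bug lies), reusing obx, oby and the fitted constants from above:
import numpy as np
from matplotlib import pyplot as plt

a, b, c, d, e = -2.63562548, 0.14364618, 0.55144696, 3.22294034, -0.43289427
size = 100
X, Y = np.meshgrid(np.linspace(-0.5, 1.5, size), np.linspace(0, 0.5, size))
F = a * Y**2 + b * X * Y + c * X + d * Y + e
plt.contour(X, Y, F - X**2, [0])  # pass X and Y so the zero contour is drawn in data coordinates
plt.scatter(obx, oby)
plt.show()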

How to update a matrix of probabilities

I am trying to find/figure out a function that can update probabilities.
Suppose there are three players and each of them get a fruit out of a basket: ["apple", "orange", "banana"]
I store the probabilities of each player having each fruit in a matrix (like this table):
          apple   orange  banana
Player 1  0.3333  0.3333  0.3333
Player 2  0.3333  0.3333  0.3333
Player 3  0.3333  0.3333  0.3333
The table can be interpreted as the belief of someone (S) who doesn't know who has what. Each row and column sums to 1.0 because each player has exactly one of the fruits and each fruit is held by exactly one player.
I want to update these probabilities based on some knowledge that S gains. Example information:
Player 1 did X. We know that Player 1 does X with 80% probability if he has an apple. With 50% if he has an orange. With 10% if he has a banana.
This can be written more concisely as [0.8, 0.5, 0.1] and let us call it reach_probability.
A fairly easy to comprehend example is:
probabilities = [
[0.5, 0.5, 0.0],
[0.0, 0.5, 0.5],
[0.5, 0.0, 0.5],
]
# Player 1's reach probabilities
reach_probability = [1.0, 0.0, 1.0]
new_probabilities = [
[1.0, 0.0, 0.0],
[0.0, 1.0, 0.0],
[0.0, 0.0, 1.0],
]
The above example can be fairly easily thought through.
Another example:
probabilities = [
[0.25, 0.25, 0.50],
[0.25, 0.50, 0.25],
[0.50, 0.25, 0.25],
]
# Player 1's reach probabilities
reach_probability = [1.0, 0.5, 0.5]
new_probabilities = [
[0.4, 0.2, 0.4],
[0.2, 0.5, 0.3],
[0.4, 0.3, 0.3],
]
In my use case, running a simulation is not an option: my probability matrix is big. I am not sure whether the only way to calculate this is an iterative algorithm, or if there is a better way.
I looked at Bayesian methods and I am not sure how to apply them in this case. Updating row by row and then spreading the difference proportionally over the previous probabilities seems promising, but I haven't managed to make it work correctly. Maybe it isn't even possible that way.
Initial condition: p(apple) = p(orange) = p(banana) = 1/3.
Player 1 did X. We know that Player 1 does X with 80% probability if he has an apple. With 50% if he has an orange. With 10% if he has a banana.
p(X | apple) = 0.8
p(X | orange) = 0.5
p(X | banana) = 0.1
Since apple, orange, and banana are all equally likely at 1/3, we have p(X) = (0.8 + 0.5 + 0.1) / 3 ≈ 0.4667.
Recall Bayes' theorem: p(a | b) = p(b | a) * p(a) / p(b).
So p(apple | X) = p(X | apple) * p(apple) / p(X) = 0.8 * (1/3) / 0.4667 ≈ 57.14%,
similarly p(orange | X) = 0.5 * (1/3) / 0.4667 ≈ 35.71%,
and p(banana | X) = 0.1 * (1/3) / 0.4667 ≈ 7.14%.
Taking your example:
probabilities = [
[0.25, 0.25, 0.50],
[0.25, 0.50, 0.25],
[0.50, 0.25, 0.25],
]
# Player 1's reach probabilities
reach_probability = [1.0, 0.5, 0.5]
new_probabilities = [
[0.4, 0.2, 0.4],
[0.2, 0.5, 0.3],
[0.4, 0.3, 0.3],
]
p(x) = 0.25 * 1.0 + 0.25 * 0.5 + 0.5 * 0.5 = 0.625
p(a|x) = p(x|a) * p(a) / p(x) = 1.0 * 0.25 / 0.625 = 0.4
p(b|x) = p(x|b) * p(b) / p(x) = 0.5 * 0.25 / 0.625 = 0.2
p(c|x) = p(x|c) * p(c) / p(x) = 0.5 * 0.50 / 0.625 = 0.4
As desired. The other entries of each column can just be scaled to get a column sum of 1.0.
E.g. in column 1 we multiply the other entries by (1-0.4)/(1-0.25). This takes 0.25 -> 0.2 and 0.50 -> 0.40. Similarly for the other columns.
new_probabilities = [
[0.4, 0.200, 0.4],
[0.2, 0.533, 0.3],
[0.4, 0.266, 0.3],
]
If then player 2 does y with the same conditional probabilities we get:
p(y) = 0.2 * 1.0 + 0.533 * 0.5 + 0.3 * 0.5 = 0.6165
p(a|y) = p(y|a) * p(a) / p(y) = 1.0 * 0.2 / 0.6165 = 0.3244
p(b|y) = p(y|b) * p(b) / p(y) = 0.5 * 0.533 / 0.6165 = 0.4323
p(c|y) = p(y|c) * p(c) / p(y) = 0.5 * 0.3 / 0.6165 ≈ 0.2433
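A compact numpy sketch of this scheme (Bayes-update the acting player's row, then rescale the remaining entries of each column); update_beliefs is a name I made up for illustration:
import numpy as np

def update_beliefs(P, player, reach_probability):
    # Bayes-update the acting player's row, then rescale the other rows'
    # entries in each column so every column still sums to 1.
    P = np.asarray(P, dtype=float).copy()
    posterior = P[player] * np.asarray(reach_probability)  # p(X | fruit) * p(fruit)
    posterior /= posterior.sum()                           # divide by p(X)
    others = [i for i in range(P.shape[0]) if i != player]
    for j in range(P.shape[1]):
        rest = P[others, j].sum()
        if rest > 0:
            P[others, j] *= (1 - posterior[j]) / rest
    P[player] = posterior
    return P

probabilities = [[0.25, 0.25, 0.50],
                 [0.25, 0.50, 0.25],
                 [0.50, 0.25, 0.25]]
print(update_beliefs(probabilities, 0, [1.0, 0.5, 0.5]))
# rows come out approximately [0.4, 0.2, 0.4], [0.2, 0.533, 0.3], [0.4, 0.267, 0.3]
Note that after the column rescaling, the non-acting rows need no longer sum exactly to 1 (0.2 + 0.533 + 0.3 = 1.033), which is a limitation of this scheme.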
Check this document:
S. Ganzfried and T. Sandholm, "Endgame Solving in Large Imperfect-Information Games," in International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2015, pp. 37-45.
Here is how I would approach this - have not worked through whether this has problems too but it seems alright in your examples.
Assume each update is of the form "X,Y has probability p'". Mark element X,Y dirty with delta p - p', where p was the old probability. Now redistribute the delta proportionally to all unmarked elements in the row, then the column, marking each of them dirty with its own delta, and mark the first element clean. Continue until no dirty entry remains.
0.5 0.5 0.0
0.0 0.5 0.5
0.5 0.0 0.5
Belief: 2,1 has probability zero.
0.5 0.0* 0.0 update 2,1 and mark dirty
0.0 0.5 0.5 delta is 0.5
0.5 0.0 0.5
1.0* 0.0' 0.0 distribute 0.5 to row & col
0.0 1.0* 0.5 update as dirty, both deltas -0.5
0.5 0.0 0.5
1.0' 0.0' 0.0 distribute -0.5 to rows & cols
0.0 1.0' 0.0* update as dirty, both deltas 0.5
0.0* 0.0 0.5
1.0' 0.0' 0.0 distribute 0.5 to row & col
0.0 1.0' 0.0' update as dirty, delta is -0.5
0.0' 0.0 1.0*
1.0' 0.0' 0.0 distribute on row/col
0.0 1.0' 0.0' no new dirty elements, complete
0.0' 0.0 1.0'
In your first example:
1/3 1/3 1/3
1/3 1/3 1/3
1/3 1/3 1/3
Belief: 3,1 has probability 0
1/3 1/3 0* update 3,1 to zero, mark dirty
1/3 1/3 1/3 delta is 1/3
1/3 1/3 1/3
1/2* 1/2* 0' distribute 1/3 proportionally across row then col
1/3 1/3 1/2* delta is -1/6
1/3 1/3 1/2*
1/2' 1/2' 0' distribute -1/6 proportionally across row then col
1/4* 1/4* 1/2' delta is 1/12
1/4* 1/4* 1/2'
1/2' 1/2' 0' distribute proportionally to unmarked entries
1/4' 1/4' 1/2' no new dirty entries, terminate
1/4' 1/4' 1/2'
You can mark entries dirty by inserting them with associated deltas into a queue and a hashset. Entries in both the queue and hash set are dirty. Entries in the hashset only are clean. Process the queue until you run out of entries.
I do not show an example where distribution is uneven, but the key is to distribute proportionally. Entries with 0 can never become non-zero except by a new belief.
Unfortunately there’s no known nice solution.
The way that I would apply Bayesian reasoning is to store a likelihood
matrix instead of a probability matrix. (Actually I’d store
log-likelihoods to prevent underflow, but that’s an implementation
detail.) We can start with the matrix
   Apple  Orange  Banana
1  1      1       1
2  1      1       1
3  1      1       1
representing no knowledge. You could use the all-1/3 matrix instead, but
I’ve used 1 to emphasize that normalization is not required. To apply an
update like Player 1 doing X with conditional probabilities [0.8, 0.5,
0.1], we just multiply the row element-wise:
   Apple  Orange  Banana
1  0.8    0.5     0.1
2  1      1       1
3  1      1       1
If Player 1 does Y independently with the same conditional
probabilities, then we get
   Apple  Orange  Banana
1  0.64   0.25    0.01
2  1      1       1
3  1      1       1
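In code, these updates are just element-wise multiplications on the acting player's row (a minimal numpy sketch):
import numpy as np
L = np.ones((3, 3))        # the all-ones matrix: no knowledge
L[0] *= [0.8, 0.5, 0.1]    # Player 1 did X
L[0] *= [0.8, 0.5, 0.1]    # Player 1 independently did Y with the same conditionals
print(L[0])                # [0.64 0.25 0.01]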
Now, the rub is that these likelihoods don’t have a nice relationship to
probabilities of specific outcomes. All we know is that the probability
of a specific matching is proportional to the product of its matrix
entries. As a simple example, with a matrix like
   Apple  Orange  Banana
1  1      0       0
2  0      1       0
3  0      1       1
the entry for Player 3 having Orange is 1, yet this assignment has
probability 0 because both possibilities for completing the matching
have probability 0.
What we need is the
permanent,
which sums the likelihood of every matching, and the minor for each
matrix entry, which sums the likelihood of every matching that makes the
corresponding assignment. Unfortunately we don’t know a good exact
algorithm for computing the permanent, and experts are skeptical that
one exists (the problem is NP-hard, and actually #P-complete). The
known approximation employs sampling via Markov chains.
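For small matrices you can do this by brute force, computing the permanent straight from its definition; a sketch (the function names are mine, and the cost grows factorially, so this is only for illustration):
from itertools import permutations
import numpy as np

def permanent(M):
    # Sum over every matching of the product of its entries.
    n = M.shape[0]
    return sum(np.prod([M[i, p[i]] for i in range(n)])
               for p in permutations(range(n)))

def assignment_probability(M, player, fruit):
    # Likelihood mass of the matchings that make this assignment (the entry
    # times the permanent of its minor), normalized by the total permanent.
    minor = np.delete(np.delete(M, player, axis=0), fruit, axis=1)
    return M[player, fruit] * permanent(minor) / permanent(M)

M = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
print(assignment_probability(M, 2, 1))  # Player 3 & Orange: 0.0, as argued above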

how to create a histogram in python using my own values and frequencies without bins

How can I create a histogram in Python using my own frequencies? When I use bins it gives me different values:
I have 1000 numbers between 0 and 1 and relative frequencies for them: for example, there are 63 numbers in [0, 0.05), but in the histogram I want the frequency shown to be 0.063; all frequencies are divided by 1000.
Frequencies =[0.063,0.047,0.049,0.051,0.049,0.045,0.033,0.055,0.047,0.052,0.048,0.067,0.033,0.056,0.055,0.041,0.048,0.05,0.072,0.039]
Intervals = [0 , 0.05 , 0.1 , 0.15 , 0.2 , 0.25 , 0.3 , 0.35 , 0.4 , 0.45 , 0.5 , 0.55 , 0.6 , 0.65 , 0.7, 0.75 ,0.8 , 0.85, 0.9 , 0.95, 1]
I need to plot a histogram where, for example, the bar over the interval [0, 0.05) has height 0.063.
I tried this:
plt.hist(Frequencies, bins=Intervals)
but the result is wrong.
You already have the frequencies, so you don't need plt.hist but simply plt.bar:
x = range(len(Frequencies))
plt.bar(x, Frequencies)
tick_labels = [f'[{Intervals[i]},{Intervals[i+1]}[' for i in range(len(Intervals)-1)]
plt.xticks(x, tick_labels, rotation='vertical')
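If you would rather have the bars laid out on the real 0-1 axis instead of at categorical tick positions, a variant of the same idea (my sketch, using the Frequencies and Intervals from the question) passes the interval edges and widths to plt.bar directly:
import numpy as np
import matplotlib.pyplot as plt

plt.bar(Intervals[:-1], Frequencies, width=np.diff(Intervals),
        align='edge', edgecolor='black')  # one bar per interval, anchored at its left edge
plt.show()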

Iterating over numpy arange changes the values

I am using a numpy arange.
[In] test = np.arange(0.01, 0.2, 0.02)
[In] test
[Out] array([0.01, 0.03, 0.05, 0.07, 0.09, 0.11, 0.13, 0.15, 0.17, 0.19])
But then, if I iterate over this array, it iterates over slightly smaller values.
[In] for t in test:
....     print(t)
[Out]
0.01
0.03
0.049999999999999996
0.06999999999999999
0.08999999999999998
0.10999999999999997
0.12999999999999998
0.15
0.16999999999999998
0.18999999999999997
Why is this happening?
To avoid this problem, I have been rounding the values, but is this the best way to solve this problem?
for t in test:
    print(round(t, 2))
I think the nature of floating point numbers, mentioned in the comments, is the issue.
If you'd rather not leave it that way, I suggest multiplying your numbers by 100 and working with integers:
test = np.arange(1, 20, 2)
print(test)
for t in test:
    print(t / 100)
This gives me the following output:
[ 1 3 5 7 9 11 13 15 17 19]
0.01
0.03
0.05
0.07
0.09
0.11
0.13
0.15
0.17
0.19
Alternatively you can also try the following:
test = np.arange(1, 20, 2) / 100
Did you try:
test = np.arange(0.01, 0.2, 0.02, dtype=np.float32)
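If you keep the float64 array from np.arange, a vectorized form of the rounding you are already doing (my suggestion, not from either answer) rounds once instead of inside the loop:
test = np.round(np.arange(0.01, 0.2, 0.02), 2)
for t in test:
    print(t)  # prints 0.01, 0.03, 0.05, ... without the trailing 9s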

Differences between dataframe spearman correlation using pandas and scipy

I have a fairly big matrix (4780, 5460) and computed the Spearman correlation between rows using both "pandas.DataFrame.corr" and "scipy.stats.spearmanr". The two functions return very different correlation coefficients, and now I am not sure which is "correct", or if my dataset is more suitable to a different implementation.
Some context: the vectors (rows) I want to test for correlation do not necessarily share all the same points; there are NaNs in some columns and not in others.
df.T.corr(method='spearman')
(r, p) = spearmanr(df.T)
df2 = pd.DataFrame(index=df.index, columns=df.columns, data=r)
In[47]: df['320840_93602.563']
Out[47]:
320840_93602.563 1.000000
3254_642.148.peg.3256 0.565812
13752_42938.1206 0.877192
319002_93602.870 0.225530
328_642.148.peg.330 0.658269
...
12566_42938.19 0.818395
321125_93602.2882 0.535577
319185_93602.1135 0.678397
29724_39.3584 0.770453
321030_93602.1962 0.738722
Name: 320840_93602.563, dtype: float64
In[32]: df2['320840_93602.563']
Out[32]:
320840_93602.563 1.000000
3254_642.148.peg.3256 0.444675
13752_42938.1206 0.286933
319002_93602.870 0.225530
328_642.148.peg.330 0.606619
...
12566_42938.19 0.212265
321125_93602.2882 0.587409
319185_93602.1135 0.696172
29724_39.3584 0.097753
321030_93602.1962 0.163417
Name: 320840_93602.563, dtype: float64
scipy.stats.spearmanr is not designed to handle nan, and its behavior with nan values is undefined. [Update: scipy.stats.spearmanr now has the argument nan_policy.]
For data without nans, the functions appear to agree:
In [92]: np.random.seed(123)
In [93]: df = pd.DataFrame(np.random.randn(5, 5))
In [94]: df.T.corr(method='spearman')
Out[94]:
0 1 2 3 4
0 1.0 -0.8 0.8 0.7 0.1
1 -0.8 1.0 -0.7 -0.7 -0.1
2 0.8 -0.7 1.0 0.8 -0.1
3 0.7 -0.7 0.8 1.0 0.5
4 0.1 -0.1 -0.1 0.5 1.0
In [95]: rho, p = spearmanr(df.values.T)
In [96]: rho
Out[96]:
array([[ 1. , -0.8, 0.8, 0.7, 0.1],
[-0.8, 1. , -0.7, -0.7, -0.1],
[ 0.8, -0.7, 1. , 0.8, -0.1],
[ 0.7, -0.7, 0.8, 1. , 0.5],
[ 0.1, -0.1, -0.1, 0.5, 1. ]])
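As a footnote to the nan_policy update above, a minimal sketch of its use on a pair of series with a missing value:
import numpy as np
from scipy.stats import spearmanr

a = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
b = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
rho, p = spearmanr(a, b, nan_policy='omit')  # drop positions where either value is NaN
Keep in mind that pandas' corr handles missing values pairwise for each pair of columns, so the two libraries can still treat scattered NaNs differently.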
