I have some data that I've plotted on a log-log plot, and now I want to fit a straight line through these points. I have tried various methods and can't get what I'm after. Example code:
import numpy as np
import matplotlib.pyplot as plt
import random
x= np.linspace(1,100,10)
y = np.log10(x)+np.log10(np.random.uniform(0,10))
coefficients = np.polyfit(np.log10(x),np.log10(y),1)
polynomial=np.poly1d(coefficients)
y_fit = polynomial(y)
plt.plot(x,y,'o')
plt.plot(x,y_fit,'-')
plt.yscale('log')
plt.xscale('log')
This gives me an ideal 'straight' line in log-log space, offset by a random number, to which I then fit a first-degree polynomial. The output is:
Ignoring the offset, which I can deal with, this is not quite what I need: it has basically drawn a straight line between each pair of points and joined them up, whereas I need a line of best fit through the middle of them all so I can measure its gradient.
What is the best way to achieve this?
One problem is
y_fit = polynomial(y)
You must plug in the x values, not y, to get y_fit.
Also, you fit log10(y) against log10(x), so to evaluate the fitted polynomial you must plug in log10(x), and the result will be the base-10 log of the y values.
Here's a modified version of your script, followed by the plot it generates.
import numpy as np
import matplotlib.pyplot as plt
import random
x = np.linspace(1,100,10)
y = np.log10(x) + np.log10(np.random.uniform(0,10))
coefficients = np.polyfit(np.log10(x), np.log10(y), 1)
polynomial = np.poly1d(coefficients)
log10_y_fit = polynomial(np.log10(x)) # <-- Changed
plt.plot(x, y, 'o-')
plt.plot(x, 10**log10_y_fit, '*-') # <-- Changed
plt.yscale('log')
plt.xscale('log')
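For reference, the whole procedure (fit in log space, evaluate in log space, transform back) can be sketched on synthetic data. This is only an illustration: the power-law parameters, the noise level, and the helper name fit_loglog are made up for the example and are not from the question.

```python
import numpy as np

def fit_loglog(x, y):
    """Fit log10(y) = slope*log10(x) + intercept and return (slope, intercept)."""
    slope, intercept = np.polyfit(np.log10(x), np.log10(y), 1)
    return slope, intercept

# Hypothetical noisy power law y = 3 * x**1.5 (values chosen for illustration)
rng = np.random.default_rng(0)
x = np.logspace(0, 2, 20)
y = 3.0 * x**1.5 * 10**rng.normal(0.0, 0.05, size=x.size)

slope, intercept = fit_loglog(x, y)

# Evaluate the fit in log space, then map back to data units for plotting
y_fit = 10**(slope * np.log10(x) + intercept)
```

Plotting x against y_fit with plt.plot(x, y_fit, '-') on log-scaled axes then gives a single straight line whose gradient is slope.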
I'm trying to analyse reproducibility of one experiment. I replaced 0 values with 0.1 and I plotted data from both experiments with log-log axes. So far, so good.
Next, I got rows where values in both columns are > 0 and I calculated a linear regression on the log10 of those values. I got the slope and the intercept of the linear fit and then I tried to plot it.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
table = pd.read_csv("data.csv")
data = table.replace(0, 0.1)
plt.plot(data["run1"], data["run2"], color="#03012d", marker=".", ls="None", markersize=3, label="")
plt.xscale('log')
plt.yscale('log')
plt.axis('square')
plt.xlabel("1st experiment")
plt.ylabel("2nd experiment")
from scipy.stats import linregress
df = table.loc[(table['run1'] >0) & (table['run2'] >0)]
stats = linregress(np.log10(df["run1"]),np.log10(df["run2"]))
m = stats.slope
b = stats.intercept
r = stats.rvalue
x = np.logspace(-1, 5, base=10)
y = (m*x+b)
plt.plot(x, y, c='orange', label="fit")
plt.legend()
But this is what I get and it's definitely not linear:
I don't know what I am doing wrong.
You are mixing up two spaces here. The regression was computed on log10 of the data, so m and b describe a straight line in log space: log10(y) = m*log10(x) + b. np.logspace(-1, 5, base=10) returns logarithmically spaced values of x itself, not their logs, so m*x + b is not the fitted line. Because the axes of the plot are already logarithmic, evaluate the fit on log10(x) and transform the result back with 10**:
x = np.logspace(-1, 5, base=10)
y = 10**(m*np.log10(x) + b)
plt.plot(x, y, c='orange', label="fit")
This gives a straight line on the log-log axes that follows the data.
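A small numerical check may help here. A line fitted to log10 values, log10(y) = m*log10(x) + b, is the power law y = 10**b * x**m in data units, and that is the curve to draw on log-log axes. The values of m and b below are hypothetical stand-ins for the linregress results:

```python
import numpy as np

# Hypothetical slope and intercept, standing in for stats.slope / stats.intercept
m, b = 0.9, 0.3

x = np.logspace(-1, 5, 50)            # x positions in data units
y_line = 10**(m * np.log10(x) + b)    # fit evaluated in log space, mapped back

# The same curve written as a power law
y_power = 10**b * x**m
print(np.allclose(y_line, y_power))
```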
When I visually inspect a scatterplot of the data, I see no utility in taking logs. A straight line through the raw data looks like it is probably the best you can do here, see the attached images.
I am trying to construct a function which gives me interpolated values of a piecewise linear function. I tried linear spline interpolation (which should be able to do exactly this?), but without any luck. The problem is most visible on a log-scale plot. Below is the code of a small example I prepared:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import os
from scipy import interpolate
#Original Data
pwl_data = np.array([[0,1e3, 1e5, 1e8], [-90,-90, -90, -130]])
# spline interpolation
pwl_spline = interpolate.splrep(pwl_data[0], pwl_data[1])
spline_x = np.linspace (0,1e8, 10000)
legend = []
plt.plot(pwl_data[0],pwl_data[1])
plt.plot(spline_x,interpolate.splev(spline_x,pwl_spline ),'*')
legend.append("Data")
legend.append("Interpolated Data")
plt.xscale('log')
plt.legend(legend)
plt.grid(True)
plt.grid(True, which='minor', linestyle='--')
plt.show()
What am I doing wrong?
The spline fitting has to be performed on the linearized data, i.e. using log(x) instead of x:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
#Original Data
pwl_data = np.array([[1, 1e3, 1e5, 1e8], [-90, -90, -90, -130]])
x = pwl_data[0]
y = pwl_data[1]
log_x = np.log(x)
# spline interpolation
pwl_spline = interpolate.splrep(log_x, y)
spline_log_x = np.linspace(0, 18, 30)
spline_y = interpolate.splev(spline_log_x, pwl_spline )
plt.plot(log_x, y, '-o')
plt.plot(spline_log_x, spline_y, '-*')
plt.xlabel('log(x)')
plt.show()
Note: I removed the zero from the data, since log(0) is undefined. Also, spline fitting may not be the best choice if you want a piecewise linear function; have a look at this question, for example: https://datascience.stackexchange.com/q/8457/53362
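If the goal is a strictly piecewise linear function on a log x-axis, one option (my assumption of what is wanted here, not part of the answer above) is to skip splines and use np.interp on log10(x). The helper name pwl_logx is made up for this sketch:

```python
import numpy as np

# Breakpoints from the example above, with the zero replaced by 1
x_pts = np.array([1.0, 1e3, 1e5, 1e8])
y_pts = np.array([-90.0, -90.0, -90.0, -130.0])

def pwl_logx(x):
    """Interpolate linearly in log10(x) between the breakpoints."""
    return np.interp(np.log10(x), np.log10(x_pts), y_pts)

x_new = np.logspace(0, 8, 200)
y_new = pwl_logx(x_new)
```

Plotted with plt.xscale('log'), y_new stays flat at -90 up to 1e5 and then falls in a straight line to -130 at 1e8, without the overshoot a cubic spline can introduce.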
For plotting with matplotlib, consider matplotlib's step function, which draws a piecewise-constant (staircase) interpolation of the data.
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.step.html
You can invoke it simply via
plt.step(x,y) given your inputs x and y.
In plotly, the argument line_shape='hv' for the Scatter plot achieves a similar result; see https://plotly.com/python/line-charts/
I have this strange problem with np.arange. I want to plot a simple equation which basically looks like y = Ax^{-1/3}(1-Bx^{4/3})^{1/2}
I can get a nearly usable plot from Wolfram Mathematica with this equation, but I am struggling to generate the same plot in Python!
import numpy as np
import matplotlib.pyplot as plt
import math
# evenly sampled time at 200ms intervals
x = np.arange(0., 10**33, 10**8.)
plt.plot(x**(-1/3)*(1.102*10**20)*(1-(x**(4/3)*2.424*10**(-45)))**(1/2))
plt.xlim(math.pow(10,31), 3*math.pow(10,33))
plt.ylim(5*math.pow(10,8), 2.5*math.pow(10,9))
plt.xlabel("M(g)", fontsize =13)
plt.ylabel("R(cm)", fontsize=13)
plt.show()
My variable x should run from 0 to 3e33, and I want to see the plot on both linear and log-log axes, but I am having memory issues with the x range, and if I set a smaller range I basically get no plot at all. I am sure I am doing something wrong here; I just do not see it. Your help is appreciated.
There are several problems in the code:
x has too many points. Reduce the number of points, to e.g. 1000 points.
x should not start at 0, since 0**(-1/3) is undefined (you cannot divide by 0). Thus a sensible definition of x may be
x = np.linspace(1e30, 1e33, 1001)
The x values do not actually appear in the plot, since you only plot y, plt.plot(y) instead of y vs. x: plt.plot(x,y)
In total,
from __future__ import division # if using python 2
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(1e30, 1e33, 1001)
y = x**(-1/3)*(1.102*10**20)*(1-(x**(4/3)*2.424*10**(-45)))**(1/2)
plt.plot(x,y)
plt.xlabel("M(g)", fontsize =13)
plt.ylabel("R(cm)", fontsize=13)
plt.show()
will produce the plot.
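Since the question also asks for a log-log view, here is a hedged sketch of how the sampling could be done for that case. It assumes A = 1.102e20 and B = 2.424e-45 as in the code above; the names radius and x_max are my own. The square root is only real while B*x**(4/3) <= 1, i.e. up to x_max = B**(-3/4), roughly 2.9e33, so the grid stops just below that.

```python
import numpy as np

A = 1.102e20
B = 2.424e-45

def radius(m):
    """y = A * m**(-1/3) * sqrt(1 - B * m**(4/3))"""
    return A * m**(-1/3) * np.sqrt(1 - B * m**(4/3))

# Largest x for which the square root is real
x_max = B**(-3/4)

# Logarithmically spaced samples: 500 points instead of ~1e25
x = np.geomspace(1e30, 0.999 * x_max, 500)
y = radius(x)
```

plt.loglog(x, y) then draws the curve on logarithmic axes; np.geomspace spaces the points evenly in log(x), so each decade gets the same number of samples.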
I have two lists: the first with dates (datetime objects) and the second with some values for these dates.
When I create a simple plot:
plt.plot_date(x=dates, y=dur, fmt='r-')
I get a very ugly image like this.
How can I smooth this line? I thought about extrapolation, but have not found a simple function for it. SciPy has some very complicated tools for this, but I don't understand what I must add to my data for extrapolation.
You can make it smooth using np.polyfit (older SciPy versions exposed the same function as sp.polyfit).
Code:
import numpy as np
import matplotlib.pyplot as plt
# sample data
x = np.arange(199)
r = np.random.rand(100)
y = np.convolve(r, r)
# plot sample data
plt.plot(x, y, color='grey')
# smooth the sample data using a degree-50 polynomial
p = np.polyfit(x, y, deg=50)
y_ = np.polyval(p, x)
# plot smoothened data
plt.plot(x, y_, color='r', linewidth=2)
plt.show()
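A polynomial of degree 50 can be numerically ill-conditioned, so as an alternative (a different technique from the polyfit approach above, shown only as a sketch) a simple moving average also smooths the curve. The function name moving_average and the window size are arbitrary choices:

```python
import numpy as np

def moving_average(y, window=15):
    """Smooth y with a centred moving average of the given window length."""
    kernel = np.ones(window) / window
    # mode='same' keeps the output the same length as the input;
    # values near the edges are damped because the window runs off the data
    return np.convolve(y, kernel, mode='same')

rng = np.random.default_rng(1)
r = rng.random(100)
y = np.convolve(r, r)          # same kind of sample data as above, 199 points
y_smooth = moving_average(y)
```

Because it only touches the y values, it works unchanged when x is a list of datetime objects.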
I've been looking high and low for a solution to this simple problem but I can't find it anywhere! There are loads of posts detailing semilog / loglog plotting of data in 2D, e.g. plt.xscale('log'); however, I'm interested in using log scales on a 3D plot (mplot3d).
I don't have the exact code to hand and so can't post it here, however the simple example below should be enough to explain the situation. I'm currently using Matplotlib 0.99.1 but should shortly be updating to 1.0.0 - I know I'll have to update my code for the mplot3d implementation.
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FixedLocator, FormatStrFormatter
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = Axes3D(fig)
X = np.arange(-5, 5, 0.025)
Y = np.arange(-5, 5, 0.025)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet, extend3d=True)
ax.set_zlim3d(-1.01, 1.01)
ax.w_zaxis.set_major_locator(LinearLocator(10))
ax.w_zaxis.set_major_formatter(FormatStrFormatter('%.03f'))
fig.colorbar(surf)
plt.show()
The above code plots fine in 3D; however, the three scales (X, Y, Z) are all linear. My 'Y' data spans several orders of magnitude (like 9!), so it would be very useful to plot it on a log scale. I can work around this by taking the log of 'Y', recreating the numpy array and plotting log(Y) on a linear scale, but in true Python style I'm looking for a smarter solution that plots the data on a log scale.
Is it possible to produce a 3D surface plot of my XYZ data using log scales, ideally I'd like X & Z on linear scales and Y on a log scale?
Any help would be greatly appreciated. Please forgive any obvious mistakes in the above example; as mentioned, I don't have my exact code to hand and so have altered a matplotlib gallery example from memory.
Thanks
Since I ran into the same question and Alejandro's answer did not produce the desired results, here is what I have found out so far.
Log scaling for axes in 3D is an ongoing issue in matplotlib. Currently you can only relabel the axes with:
ax.yaxis.set_scale('log')
However, this does not scale the axis logarithmically; it only labels it logarithmically.
ax.set_yscale('log') will cause an exception in 3D.
See GitHub issue 209.
Therefore you still have to recreate the numpy array.
I came up with a nice and easy solution taking inspiration from Issue 209. You define a small formatter function in which you set your own notation.
import matplotlib.ticker as mticker
# My axis should display 10⁻¹ but you can switch to e-notation 1.00e+01
def log_tick_formatter(val, pos=None):
    return f"$10^{{{int(val)}}}$"  # remove int() if you don't use MaxNLocator
    # return f"{10**val:.2e}"      # e-Notation
ax.zaxis.set_major_formatter(mticker.FuncFormatter(log_tick_formatter))
ax.zaxis.set_major_locator(mticker.MaxNLocator(integer=True))
set_major_locator restricts the ticks to integer exponents (10⁻¹, 10⁻², ...) so that values such as 10^-1.5 do not appear.
Important: remove the int() cast in the return statement if you don't use set_major_locator and want to display ticks such as 10^-1.5; otherwise they will still be printed as 10⁻¹.
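To make the effect of the int() cast concrete, the two variants can be compared directly on a non-integer tick value. The function names below are made up for the comparison:

```python
def tick_int(val, pos=None):
    # int() truncates the exponent toward zero: -1.5 becomes -1
    return f"$10^{{{int(val)}}}$"

def tick_raw(val, pos=None):
    # Keeps the exponent exactly as given
    return f"$10^{{{val}}}$"

print(tick_int(-1.5))   # $10^{-1}$
print(tick_raw(-1.5))   # $10^{-1.5}$
```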
Example:
Try it yourself!
from mpl_toolkits.mplot3d import axes3d
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
fig = plt.figure(figsize=(11,8))
ax1 = fig.add_subplot(121,projection="3d")
# Grab some test data.
X, Y, Z = axes3d.get_test_data(0.05)
# Now Z has a range from 10⁻³ until 10³, so 6 magnitudes
Z = (np.full((120, 120), 10)) ** (Z / 20)
ax1.plot_wireframe(X, Y, Z, rstride=10, cstride=10)
ax1.set(title="Linear z-axis (small values not visible)")
def log_tick_formatter(val, pos=None):
    return f"$10^{{{int(val)}}}$"
ax2 = fig.add_subplot(122,projection="3d")
# You still have to take log10(Z) but that's just one operation
ax2.plot_wireframe(X, Y, np.log10(Z), rstride=10, cstride=10)
ax2.zaxis.set_major_formatter(mticker.FuncFormatter(log_tick_formatter))
ax2.zaxis.set_major_locator(mticker.MaxNLocator(integer=True))
ax2.set(title="Logarithmic z-axis (much better)")
plt.savefig("LinearLog.png", bbox_inches='tight')
plt.show()
On macOS, I ran ax.zaxis._set_scale('log') (note the leading underscore).
There is no solution because of issue 209. However, you can try doing this:
ax.plot_surface(X, np.log10(Y), Z, cmap='jet', linewidth=0.5)
If Y contains a 0, a warning will appear, but it still works. Because of this warning, color maps don't work, so try to avoid 0 and negative numbers. For example:
Y[Y != 0] = np.log10(Y[Y != 0])
ax.plot_surface(X, Y, Z, cmap='jet', linewidth=0.5)
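One way to avoid the warning altogether is to take the logarithm only where Y is positive and mark everything else as missing. This is a sketch; the helper name safe_log10 is hypothetical, and using NaN for the masked entries is my assumption about what is acceptable for the plot:

```python
import numpy as np

def safe_log10(values):
    """Return log10 of the positive entries and NaN everywhere else."""
    values = np.asarray(values, dtype=float)
    out = np.full_like(values, np.nan)
    # 'where' restricts the computation to positive entries, so no warning is raised
    np.log10(values, out=out, where=values > 0)
    return out

Y = np.array([[0.0, 1.0], [100.0, -5.0]])
log_Y = safe_log10(Y)
```

Unlike the in-place assignment above, this leaves Y itself unchanged.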
I wanted a symlog plot and, since I fill the data array by hand, I just wrote a custom function to calculate the log, to avoid negative bars in bar3d when the data is < 1:
import math
def manual_log(data):
    if data < 10:  # linear scaling below 10 (output below 1)
        return data / 10
    else:  # log scale from 10 upward (output 1 and above)
        return math.log10(data)
Since I have no negative values, I did not implement handling of them in this function, but it should not be hard to add.
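For whole arrays, the same idea can be written with NumPy instead of calling the function element by element. This is a sketch under the same assumption of non-negative data; the name manual_log_array is made up:

```python
import numpy as np

def manual_log_array(data):
    """Linear (data/10) below 10, log10 from 10 upward; both branches give 1 at 10."""
    data = np.asarray(data, dtype=float)
    # Clamp before log10 so values below 10 never reach the logarithm
    log_part = np.log10(np.maximum(data, 10.0))
    return np.where(data < 10, data / 10, log_part)

print(manual_log_array([0, 5, 10, 100, 1000]))
```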