I have a basic graph example and I am trying to make all the points lie on some sort of curved line. I have an idea of how to go about this, but I am not sure how to implement it or whether it is even possible. Below is a picture of the graph, which I made with the following code:
import matplotlib.pyplot as plt
import numpy as np
# original data
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 7, 3, 4, 5, 1, 6, 9, 4, 6]
# quadratic regression
for i in range(int((len(x) + len(y)) / 2)):
    sub_x = x[i:i+3]
    sub_y = y[i:i+3]
    model = np.poly1d(np.polyfit(sub_x, sub_y, 2))
    polyline = np.linspace(min(sub_x), max(sub_x), 200)
    plt.plot(polyline, model(polyline), color="#6D34D6", linestyle='dashed')
# plot lines
plt.scatter(x, y, color='#FF3FAF')
plt.plot(x, y, color='#FF3FAF', linestyle='solid')
plt.show()
Here is the graph that is produced:
My question is: how do I make all the dotted lines connect seamlessly? I had an idea about averaging each pair of line segments that share the same points, but I don't know how to go about doing so. Another idea was to build some sort of Bézier curve that connects all the points, but that sounds unnecessarily complicated.
Something like the green line should be the output (sorry for the poorly drawn line):
You can use scipy.interpolate.interp1d with quadratic interpolation to expand the data to, say, 300 points, and then plot a smooth curve.
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d
# original data
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 7, 3, 4, 5, 1, 6, 9, 4, 6]
# quadratic regression
for i in range(int((len(x) + len(y)) / 2)):
    sub_x = x[i:i+3]
    sub_y = y[i:i+3]
    model = np.poly1d(np.polyfit(sub_x, sub_y, 2))
    polyline = np.linspace(min(sub_x), max(sub_x), 200)
    plt.plot(polyline, model(polyline), color="#6D34D6", linestyle='dashed')
#Interpolate
x_new = np.linspace(min(x), max(x), 300) #<----
f = interp1d(x, y, kind='quadratic') #<----
# plot lines
plt.scatter(x, y, color='#FF3FAF')
plt.plot(x_new, f(x_new), color='#FF3FAF', linestyle='solid') #<----
plt.show()
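As a variation on the answer above, the same idea can be sketched with scipy.interpolate.make_interp_spline, which is swapped in here for interp1d. The cubic degree (k=3) and the 300-point grid are assumptions chosen for illustration; the final check only confirms that the curve passes through every original point.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# original data from the question
x = np.arange(1, 11, dtype=float)
y = np.array([2, 7, 3, 4, 5, 1, 6, 9, 4, 6], dtype=float)

# interpolating B-spline; k=3 (cubic) is an assumption, not part of the answer above
spline = make_interp_spline(x, y, k=3)

# dense grid for a smooth-looking curve
x_new = np.linspace(x.min(), x.max(), 300)
y_new = spline(x_new)

# an interpolating spline reproduces the data at the original x values
passes_through_points = np.allclose(spline(x), y)
```

Plotting x_new against y_new with plt.plot should then give a single curve through all the points, in place of the separate dashed segments.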
Does anybody have a suggestion on what's the best way to present overlapping lines on a plot? I have a lot of them, and I had the idea of having full lines of different colors where they don't overlap, and having dashed lines where they do overlap so that all colors are visible and overlapping colors are seen.
But still, how do I do that?
I have the same issue on a plot with a high degree of discretization.
Here is the starting situation:
import matplotlib.pyplot as plt
grid=[x for x in range(10)]
graphs=[
[1,1,1,4,4,4,3,5,6,0],
[1,1,1,5,5,5,3,5,6,0],
[1,1,1,0,0,3,3,2,4,0],
[1,2,4,4,3,2,3,2,4,0],
[1,2,3,3,4,4,3,2,6,0],
[1,1,3,3,0,3,3,5,4,3],
]
for gg,graph in enumerate(graphs):
    plt.plot(grid,graph,label='g'+str(gg))
plt.legend(loc=3,bbox_to_anchor=(1,0))
plt.show()
No one can tell exactly where the green and blue lines run.
And here is my "solution":
import matplotlib.pyplot as plt
grid=[x for x in range(10)]
graphs=[
[1,1,1,4,4,4,3,5,6,0],
[1,1,1,5,5,5,3,5,6,0],
[1,1,1,0,0,3,3,2,4,0],
[1,2,4,4,3,2,3,2,4,0],
[1,2,3,3,4,4,3,2,6,0],
[1,1,3,3,0,3,3,5,4,3],
]
for gg,graph in enumerate(graphs):
    lw=10-8*gg/len(graphs)
    ls=['-','--','-.',':'][gg%4]
    plt.plot(grid,graph,label='g'+str(gg), linestyle=ls, linewidth=lw)
plt.legend(loc=3,bbox_to_anchor=(1,0))
plt.show()
I am grateful for suggestions on improvement!
Just decrease the opacity of the lines so that they are see-through. You can achieve that using the alpha parameter. Example:
plt.plot(x, y, alpha=0.7)
Here alpha ranges from 0 to 1, with 0 being invisible and 1 fully opaque.
Imagine your pandas DataFrame is called respone_times; then you can use alpha to set a different opacity for your graphs. Check the picture before and after using alpha.
plt.figure(figsize=(15, 7))
plt.plot(respone_times,alpha=0.5)
plt.title('a sample title')
plt.grid(True)
plt.show()
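Both answers above rely on alpha. A self-contained sketch using the six series from the question is shown here; the values alpha=0.5 and linewidth=4 are illustrative assumptions, not recommendations from the answers.

```python
import matplotlib.pyplot as plt

grid = list(range(10))
graphs = [
    [1, 1, 1, 4, 4, 4, 3, 5, 6, 0],
    [1, 1, 1, 5, 5, 5, 3, 5, 6, 0],
    [1, 1, 1, 0, 0, 3, 3, 2, 4, 0],
    [1, 2, 4, 4, 3, 2, 3, 2, 4, 0],
    [1, 2, 3, 3, 4, 4, 3, 2, 6, 0],
    [1, 1, 3, 3, 0, 3, 3, 5, 4, 3],
]

fig, ax = plt.subplots()
lines = []
for gg, graph in enumerate(graphs):
    # semi-transparent, fairly thick lines so overlapping colors blend visibly
    (line,) = ax.plot(grid, graph, alpha=0.5, linewidth=4, label='g' + str(gg))
    lines.append(line)

ax.legend(loc='upper left', bbox_to_anchor=(1.01, 1.01))
fig.tight_layout()
```

Calling plt.show() afterwards displays the figure. Where several lines coincide, the blended color is darker than any single line.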
Depending on your data and use case, it might be OK to add a bit of random jitter to artificially separate the lines.
from numpy.random import default_rng
import pandas as pd
rng = default_rng()
def jitter_df(df: pd.DataFrame, std_ratio: float) -> pd.DataFrame:
    """
    Add jitter to a DataFrame.

    Adds normal distributed jitter with mean 0 to each of the
    DataFrame's columns. The jitter's std is the column's std times
    `std_ratio`.

    Returns the jittered DataFrame.
    """
    std = df.std().values * std_ratio
    jitter = pd.DataFrame(
        std * rng.standard_normal(df.shape),
        index=df.index,
        columns=df.columns,
    )
    return df + jitter
Here's a plot of the original data from Markus Dutschke's example:
And here's the jittered version, with std_ratio set to 0.1:
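The answer does not show how the jittered data was built, so the following is only a guess at the usage: the six series are placed in a DataFrame with one column per line and passed through the same jitter logic. The column names and the fixed seed are assumptions made for reproducibility.

```python
import pandas as pd
from numpy.random import default_rng

rng = default_rng(0)  # fixed seed is an assumption, only to make the sketch repeatable


def jitter_df(df: pd.DataFrame, std_ratio: float) -> pd.DataFrame:
    """Same logic as the function above: per-column normal jitter with mean 0."""
    std = df.std().values * std_ratio
    jitter = pd.DataFrame(
        std * rng.standard_normal(df.shape),
        index=df.index,
        columns=df.columns,
    )
    return df + jitter


graphs = [
    [1, 1, 1, 4, 4, 4, 3, 5, 6, 0],
    [1, 1, 1, 5, 5, 5, 3, 5, 6, 0],
    [1, 1, 1, 0, 0, 3, 3, 2, 4, 0],
    [1, 2, 4, 4, 3, 2, 3, 2, 4, 0],
    [1, 2, 3, 3, 4, 4, 3, 2, 6, 0],
    [1, 1, 3, 3, 0, 3, 3, 5, 4, 3],
]

# one column per line, one row per grid position (hypothetical column names)
df = pd.DataFrame({'g' + str(gg): graph for gg, graph in enumerate(graphs)})
jittered = jitter_df(df, std_ratio=0.1)
```

jittered.plot() would then draw the separated lines. Note that the jitter changes the plotted values, so this only suits plots where small distortions are acceptable.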
Replacing solid lines with dots or dashes works too:
g = sns.FacetGrid(data, col='config', row='outputs', sharex=False)
g.map_dataframe(sns.lineplot, x='lag',y='correlation',hue='card', linestyle='dotted')
Instead of random jitter, the lines can be offset just a little bit, creating a layered appearance:
import matplotlib.pyplot as plt
from matplotlib.transforms import offset_copy
grid = list(range(10))
graphs = [[1, 1, 1, 4, 4, 4, 3, 5, 6, 0],
[1, 1, 1, 5, 5, 5, 3, 5, 6, 0],
[1, 1, 1, 0, 0, 3, 3, 2, 4, 0],
[1, 2, 4, 4, 3, 2, 3, 2, 4, 0],
[1, 2, 3, 3, 4, 4, 3, 2, 6, 0],
[1, 1, 3, 3, 0, 3, 3, 5, 4, 3]]
fig, ax = plt.subplots()
lw = 1
for gg, graph in enumerate(graphs):
    trans_offset = offset_copy(ax.transData, fig=fig, x=lw * gg, y=lw * gg, units='dots')
    ax.plot(grid, graph, lw=lw, transform=trans_offset, label='g' + str(gg))
ax.legend(loc='upper left', bbox_to_anchor=(1.01, 1.01))
# manually set the axes limits, because the transform doesn't set them automatically
ax.set_xlim(grid[0] - .5, grid[-1] + .5)
ax.set_ylim(min([min(g) for g in graphs]) - .5, max([max(g) for g in graphs]) + .5)
plt.tight_layout()
plt.show()
I'm plotting a graph of efficiency (y) against solution concentration (x). I have this set up to display x between 0 and 100, but I want to add another data point as a control, without any solution at all. I'm having issues because this doesn't really fit anywhere on the concentration axis, but I'd like to add it either before 0 or after 100, potentially with a break in the axis to separate them. So my x-axis would look like ['control', 0, 20, 40, 60, 80, 100].
MWE:
x_array = ['control', 0, 20, 40, 50, 100]
y_array = [1, 2, 3, 4, 5, 6]
plt.plot(x_array, y_array)
Trying this, I get an error of:
ValueError: could not convert string to float: 'control'
Any ideas how I could make something like this work? I've looked at xticks, but that would plot the x-axis as strings and lose the continuity of the axis, which would mess up the plot because the data points are not equally spaced.
You can add a single point to your graph as a separate call to plot, then adjust the x-axis labels.
import matplotlib.pyplot as plt
x_array = [0, 20, 40, 50, 100]
y_array = [2, 3, 4, 5, 6]
x_con = -20
y_con = 1
x_ticks = [-20, 0, 20, 40, 60, 80, 100]
x_labels = ['control', 0, 20, 40, 60, 80, 100]
fig, ax = plt.subplots(1,1)
ax.plot(x_array, y_array)
ax.plot(x_con, y_con, 'ro') # add a single red dot
# set tick positions, adjust label text
ax.xaxis.set_ticks(x_ticks)
ax.xaxis.set_ticklabels(x_labels)
ax.set_xlim(x_con-10, max(x_array)+3)
ax.set_ylim(0,7)
plt.show()
I'm trying to make a scatterplot of two arrays/lists, one of which is the x coordinate and the other the y. I'm not having any trouble with that. However, I need to color-code these points based on their values at a specific point in time, based on data which I have in a 2d array. Also, this 2d array of data has a very large spread, so I'd like to color the points logarithmically (I'm not sure if this means just change the color bar labels or if there's a more fundamental difference.)
Here is my code so far:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure(1)
time = #I'd like to specify time here.
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
multi_array = [[1, 1, 10, 100, 1000], [10000, 1000, 100, 10, 1], [300, 400, 5000, 12, 47]]
for counter in np.arange(0, 5):
    t = multi_array[time, counter] #I tried this, and it did not work.
    s = plt.scatter(x[counter], y[counter], c = t, marker = 's')
plt.show()
I followed the advice I saw elsewhere to color by a third variable, which was to set the color equal to that variable, but then when I tried that with my data set, I just got all the points as one color, and then when I try it with this mockup it gives me the following error:
TypeError: list indices must be integers, not tuple
Could someone please help me color my points the way I need to?
If I understand the question (which I'm not at all sure of), here is the answer:
import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline  # uncomment when running in a Jupyter notebook
fig = plt.figure(1)
time = 2 #I'd like to specify time here.
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
multi_array = np.asarray([[1, 1, 10, 100, 1000], [10000, 1000, 100, 10, 1], [300, 400, 5000, 12, 47]])
log_array=np.log10(multi_array)
s = plt.scatter(x, y, c=log_array[time], marker = 's',s=100)
cb = plt.colorbar(s)
cb.set_label('log of ...')
plt.show()
After some tinkering, and using information learned from user4421975's answer and the link in the comments, I've puzzled it out. In short, I used plt.scatter's norm parameter to make the color mapping logarithmic.
import numpy as np
import matplotlib.colors
import matplotlib.pyplot as plt
fig = plt.figure(1)
time = 2
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
multi_array = np.asarray([[1, 1, 10, 100, 1000], [10000, 1000, 100, 10, 1], [300, 400, 5000, 12, 47]])
for counter in np.arange(0, 5):
    s = plt.scatter(x[counter], y[counter], c = multi_array[time, counter], cmap = 'winter', norm = matplotlib.colors.LogNorm(vmin=multi_array[time].min(), vmax=multi_array[time].max()), marker = 's')
cb = plt.colorbar(s)
cb.set_label('Log of Data')
plt.show()
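The loop above draws one scatter per point, and the colorbar is attached to the last of them. A possible simplification, offered only as a sketch, passes the whole row to a single scatter call with the same LogNorm; the marker size s=100 and the label text are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

time = 2
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
multi_array = np.asarray([[1, 1, 10, 100, 1000],
                          [10000, 1000, 100, 10, 1],
                          [300, 400, 5000, 12, 47]])

# values for the chosen time step; LogNorm requires them to be positive
values = multi_array[time]
norm = LogNorm(vmin=values.min(), vmax=values.max())

fig, ax = plt.subplots()
# one call colors all points through the shared logarithmic normalization
points = ax.scatter(x, y, c=values, cmap='winter', norm=norm, marker='s', s=100)
cb = fig.colorbar(points, ax=ax)
cb.set_label('Data (log scale)')
```

With this approach the colorbar ticks show the original data values on a logarithmic scale, rather than their logarithms as in the np.log10 answer. Call plt.show() to display the figure.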
I have an algorithm that can be controlled by two parameters so now I want to plot the runtime of the algorithm depending on these parameters.
My Code:
from matplotlib import pyplot
import pylab
from mpl_toolkits.mplot3d import Axes3D
fig = pylab.figure()
ax = Axes3D(fig)
sequence_containing_x_vals = [5,5,5,5,10,10,10,10,15,15,15,15,20,20,20,20]
sequence_containing_y_vals = [1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4]
sequence_containing_z_vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals, sequence_containing_z_vals)
pyplot.show()
This will plot all the points in the space but I want them connected and have something like this:
(The coloring would be nice but not necessary)
To plot the surface you need to use plot_surface, and have the data as a regular 2D array (that reflects the 2D geometry of the x-y plane). Usually meshgrid is used for this, but since your data already has the x and y values repeated appropriately, you just need to reshape them. I did this with numpy reshape.
from matplotlib import pyplot, cm
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
fig = pyplot.figure()
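# Axes3D(fig) no longer attaches itself to the figure in recent Matplotlib versions
ax = fig.add_subplot(projection='3d')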
sequence_containing_x_vals = np.array([5,5,5,5,10,10,10,10,15,15,15,15,20,20,20,20])
X = sequence_containing_x_vals.reshape((4,4))
sequence_containing_y_vals = np.array([1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4])
Y = sequence_containing_y_vals.reshape((4,4))
sequence_containing_z_vals = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
Z = sequence_containing_z_vals.reshape((4,4))
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.hot)
pyplot.show()
Note that X, Y = np.meshgrid([5, 10, 15, 20], [1, 2, 3, 4], indexing='ij') will give the same X and Y as above, but more easily.
Of course, the surface shown here is just a plane, since your data is consistent with z = 0.8x + y - 4, but this method will work with generic surfaces, as can be seen in the many matplotlib surface examples.
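A quick numerical check of the reshaped arrays, written as a sketch with hypothetical variable names. It compares them with np.meshgrid using indexing='ij' (the default 'xy' indexing returns the transposed layout) and tests the data against the plane z = 0.8x + y - 4.

```python
import numpy as np

x_vals = np.array([5, 5, 5, 5, 10, 10, 10, 10, 15, 15, 15, 15, 20, 20, 20, 20])
y_vals = np.array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
z_vals = np.arange(1, 17)

# reshape the flat sequences into 4x4 grids, as in the answer
X = x_vals.reshape((4, 4))
Y = y_vals.reshape((4, 4))
Z = z_vals.reshape((4, 4))

# 'ij' indexing makes the first argument vary along rows, matching the reshape
X_mesh, Y_mesh = np.meshgrid([5, 10, 15, 20], [1, 2, 3, 4], indexing='ij')
same_grid = np.array_equal(X, X_mesh) and np.array_equal(Y, Y_mesh)

# every data point lies on the plane z = 0.8x + y - 4
is_plane = np.allclose(Z, 0.8 * X + Y - 4)
```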