Matplotlib - how to prevent graph from backtracking [duplicate]

This question already has answers here:
Matplotlib plotting in wrong order
(6 answers)
Closed 3 years ago.
I am new to Matplotlib. How do I prevent my graph from backtracking? Note the hook in the upper right of my graph. The x-axis values are strings; the y-axis values are floats.
Here is my data:
['192.168.0.1', 2.568]
['96.120.96.153', 14.139]
['96.110.232.169', 10.505]
['162.151.49.133', 11.446]
['68.86.90.225', 24.335]
['68.86.84.226', 23.631]
['68.86.83.94', 29.011]
['173.167.58.162', 35.688]
['209.58.57.17', 162.768]
['64.86.79.2', 187.42]
['64.86.21.104', 162.461]
['63.243.205.1', 166.525]
['120.29.217.66', 156.898]
['209.58.86.143', 156.785]
['120.29.217.66', 181.599]
And the corresponding code:
import matplotlib.pyplot as plt
# x axis values
a = []
# corresponding y axis values
b = []
for k in range(15):
    a.append(dataArray[k][0])
    b.append(dataArray[k][1])
# plotting the points
plt.plot(a, b)
# naming the x axis
plt.xlabel('Hop Addresses')
# naming the y axis
plt.ylabel('average time (ms)')
# giving a title to my graph
plt.title('Time vs. Hops')
# function to show the plot
plt.show()

The issue you're having is that plt.plot() draws a line from each data point to the next. When the x values are strings, each distinct string gets a single position on the axis, so a repeated address sends the line back to where that address first appeared.
The data shows that each hop's time is an independent measurement, while the order of the IP addresses matters. plt.plot() tends to suggest a trend. Since the data is a series of independent events rather than a trend, a different type of graph might be more appropriate. I suggest a bar graph in this case. And since the x-axis labels are bunched up, a horizontal bar chart would be even better.
You'll notice in the graph below that 120.29.217.66 appears twice, once for each hop. To account for this, plot against the index in the list rather than the IP address, and then replace the y-axis labels.
import matplotlib.pyplot as plt
data = [
['192.168.0.1', 2.568],
['96.120.96.153', 14.139],
['96.110.232.169', 10.505],
['162.151.49.133', 11.446],
['68.86.90.225', 24.335],
['68.86.84.226', 23.631],
['68.86.83.94', 29.011],
['173.167.58.162', 35.688],
['209.58.57.17', 162.768],
['64.86.79.2', 187.42],
['64.86.21.104', 162.461],
['63.243.205.1', 166.525],
['120.29.217.66', 156.898],
['209.58.86.143', 156.785],
['120.29.217.66', 181.599],
]
idxs = range(len(data))
ips = [i[0] for i in data]
times = [i[1] for i in data]
plt.barh(idxs, times) # plot times vs the index of the array
plt.ylabel('Hop Addresses')
plt.xlabel('average time (ms)')
plt.title('Time vs. Hops')
plt.yticks(idxs, ips) # Replace tick labels with IP Addresses
plt.tight_layout()
plt.show()
Since we tend to read from top to bottom, you can always flip the y-axis.
plt.gca().invert_yaxis()

A line connects points in sequence. For the line graph to make sense, the data must be in the correct order. We can also just use the index as the x value. For instance, for your data as below
data = [['192.168.0.1', 2.568],
['96.120.96.153', 14.139],
['96.110.232.169', 10.505],
['162.151.49.133', 11.446],
['68.86.90.225', 24.335],
['68.86.84.226', 23.631],
['68.86.83.94', 29.011],
['173.167.58.162', 35.688],
['209.58.57.17', 162.768],
['64.86.79.2', 187.42],
['64.86.21.104', 162.461],
['63.243.205.1', 166.525],
['120.29.217.66', 156.898],
['209.58.86.143', 156.785],
['120.29.217.66', 181.599]]
we can get only the delay in the same order provided by traceroute. Then the graph will be correct.
from matplotlib import pyplot as plt
plt.plot([i[1] for i in data])
plt.show()
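To see where the hook comes from, here is a minimal sketch that mimics how a string (categorical) axis assigns positions: each distinct label gets the position of its first appearance. The helper categorical_positions is a hypothetical name used only for illustration; it is not a Matplotlib function.

```python
# Hop addresses in traceroute order (taken from the question).
ips = ['192.168.0.1', '96.120.96.153', '96.110.232.169', '162.151.49.133',
       '68.86.90.225', '68.86.84.226', '68.86.83.94', '173.167.58.162',
       '209.58.57.17', '64.86.79.2', '64.86.21.104', '63.243.205.1',
       '120.29.217.66', '209.58.86.143', '120.29.217.66']


def categorical_positions(labels):
    """Give each distinct label the position of its first appearance.

    Hypothetical helper that imitates a categorical axis; it is not part
    of Matplotlib.
    """
    seen = {}
    return [seen.setdefault(label, len(seen)) for label in labels]


string_x = categorical_positions(ips)  # what plotting against strings amounts to
index_x = list(range(len(ips)))        # what plotting against the hop index uses

print(string_x[-3:])  # [12, 13, 12]: the repeated address jumps back
print(index_x[-3:])   # [12, 13, 14]: the hop index keeps increasing
```

Plotting the times against index_x (or against nothing at all, as in the answer above) keeps the line moving left to right; the addresses can then be restored as tick labels.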

Related

Python Scatter plot with matrix input. Having trouble getting number of columns showing on x axis, then a dot for each value in each column

I'm making a bar chart and a scatter plot. The bar chart takes a vector as an input. I plotted the values on the x-axis, and the number of times they repeat on the y-axis. I did this by converting the vector to a list and using .count(). That worked great and was relatively straightforward.
As for the scatter plot, the input is going to be a matrix of any x and y dimensions. The idea is to have the column numbers of the matrix show up on the x-axis, going 1, 2, 3, 4, etc., depending on how many columns the inserted matrix has. The rows of each column will consist of many different numbers, and I would like all of them to be displayed as dots or stars above the relevant column index. For example, if column #3 consists of the values 6, 2, 8, 5, 9, 5 going down, I would like a dot for each of them going up the y-axis, directly above the number 3 on the x-axis. I have tried different approaches: sometimes the dots show up but in the wrong places, other times the x-axis is completely off, even though I used .len(0,:), which prints out the correct number of columns but doesn't chart it.
My latest attempt which now doesn't even show the dots or stars:
import numpy as np # Import NumPy
import matplotlib.pyplot as plt # Import the matplotlib.pyplot module
vector = np.array([[-3,7,12,4,0o2,7,-3],[7,7,12,4,0o2,4,12],[12,-3,4,10,12,4,-3],[10,12,4,0o3,7,10,12]])
x = len(vector[0,:])
print(x)#vector[0,:]
y = vector[:,0]
plt.plot(x, y, "r.") # Scatter plot with blue stars
plt.title("Scatter plot") # Set the title of the graph
plt.xlabel("Column #") # Set the x-axis label
plt.ylabel("Occurences of values for each column") # Set the y-axis label
plt.xlim([1,len(vector[0,:])]) # Set the limits of the x-axis
plt.ylim([-5,15]) # Set the limits of the y-axis
plt.show(vector)
The matrix shown at the top is just one I made up for the purpose of testing, the idea is that it should work for any given matrix which is imported.
I tried the above pasted code which is the closest I have gotten as it actually prints the amount of columns it has, but it doesn't show them on the plot. I haven't gotten to a point where it actually plots the points above the columns on y axis yet, only in completely wrong positions in a previous version.

import numpy as np # Import NumPy
import matplotlib.pyplot as plt # Import the matplotlib.pyplot module
vector = np.array([[-3,7,12,4,0o2,7,-3],
                   [7,7,12,4,0o2,4,12],
                   [12,-3,4,10,12,4,-3],
                   [10,12,4,0o3,7,10,12]])
rows, columns = vector.shape
plt.title("Scatter plot") # Set the title of the graph
plt.xlabel("Column #") # Set the x-axis label
plt.ylabel("Occurences of values for each column") # Set the y-axis label
plt.xlim([1,columns]) # Set the limits of the x-axis
plt.ylim([-5,15]) # Set the limits of the y-axis
for i in range(1, columns+1):
    y = vector[:,i-1]       # all values in column i
    x = [i] * rows          # the same x position for every value
    plt.plot(x, y, "r.")
plt.show()
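The per-column loop above can also be written without a loop. The following is a minimal sketch, assuming the input is a 2-D NumPy array; column_scatter_coords is a hypothetical helper name, and the octal literals from the question are written as plain integers.

```python
import numpy as np


def column_scatter_coords(matrix):
    """Return flat x and y arrays; x holds the 1-based column number of each entry."""
    matrix = np.asarray(matrix)
    rows, columns = matrix.shape
    # ravel() walks the matrix row by row, so the column numbers repeat once per row.
    x = np.tile(np.arange(1, columns + 1), rows)
    y = matrix.ravel()
    return x, y


vector = np.array([[-3, 7, 12, 4, 2, 7, -3],
                   [7, 7, 12, 4, 2, 4, 12],
                   [12, -3, 4, 10, 12, 4, -3],
                   [10, 12, 4, 3, 7, 10, 12]])
x, y = column_scatter_coords(vector)
```

The resulting x and y can be passed to a single plt.plot(x, y, "r.") call in place of the loop.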

Set log xticks in matplotlib for a linear plot

Consider
xdata=np.random.normal(5e5,2e5,int(1e4))
plt.hist(np.log10(xdata), bins=100)
plt.show()
plt.semilogy(xdata)
plt.show()
Is there any way to display the xticks of the first plot (plt.hist) like the yticks of the second plot? For good reasons I want to histogram np.log10(xdata), but I'd like the minor ticks to be displayed as usual on a log scale (even though the exponent is linear...).
In other words, I want the x-axis of the first plot to look like the y-axis of the second plot, without changing the spacing between major ticks (e.g., adding log marks between 5.5 and 6.0, without altering these values).

Proper histogram plot with logarithmic x-axis:
Explanation:
- Cut off negative values
  - the randomly generated example data likely still contains some negative values (activate the commented code lines at the beginning to see the effect)
  - the logarithmic function isn't defined for values <= 0
  - while the 2nd plot just deals with y-axis log scaling (negative values are just out of range), the 1st plot doesn't work with negative values in the bins' range
  - real-world data probably won't be <= 0; otherwise keep that in mind
- Bins should be aligned to the log scale as well
  - otherwise the distribution of the bin widths looks off (switch the # between the plt.hist( statements in the 1st plot section to see the effect)
- xdata (not np.log10(xdata)) is to be plotted in the histogram
  - the 'workaround' of plotting np.log10(xdata) probably was the root cause of the misunderstanding in the comments
Code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42) # just to have repeatable results for the answer
xdata=np.random.normal(5e5,2e5,int(1e4))
# MIN_xdata, MAX_xdata = np.min(xdata), np.max(xdata)
# print(f"{MIN_xdata}, {MAX_xdata}") # note the negative values
# cut off potential negative values (log function isn't defined for <= 0 )
xdata = np.ma.masked_less_equal(xdata, 0)
MIN_xdata, MAX_xdata = np.min(xdata), np.max(xdata)
# print(f"{MIN_xdata}, {MAX_xdata}")
# align the bins to fit a log scale
bins = 100
bins_log_aligned = np.logspace(np.log10(MIN_xdata), np.log10(MAX_xdata), bins)
# 1st plot
plt.hist(xdata, bins = bins_log_aligned) # note: xdata (not np.log10(xdata) )
# plt.hist(xdata, bins = 100)
plt.xscale('log')
plt.show()
# 2nd plot
plt.semilogy(xdata)
plt.show()
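One detail worth noting: a sequence passed as bins is interpreted as bin edges, so np.logspace(..., bins) with bins = 100 yields 99 bars. The sketch below is one possible way to get exactly n_bins bars, assuming a plain boolean filter is acceptable instead of a masked array; log_bin_edges is a hypothetical helper name, and the random generator differs from the seed used in the answer.

```python
import numpy as np


def log_bin_edges(data, n_bins):
    """Return the positive samples and n_bins + 1 log-spaced edges covering them."""
    data = np.asarray(data, dtype=float)
    positive = data[data > 0]  # the logarithm is undefined for values <= 0
    edges = np.geomspace(positive.min(), positive.max(), n_bins + 1)
    return positive, edges


rng = np.random.default_rng(42)
xdata = rng.normal(5e5, 2e5, 10_000)
positive, edges = log_bin_edges(xdata, n_bins=100)
```

These values would then be used as plt.hist(positive, bins=edges) followed by plt.xscale('log').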

Just kept for now for clarification purpose. Will be deleted when the question is revised.
Disclaimer:
As Lucas M. Uriarte already mentioned, that isn't an expected way of changing axis ticks.
x axis ticks and labels don't represent the plotted data
You should at least always provide that information along with such a plot.
The plot
From seeing the result I kinda understand where that special plot idea is coming from - still there should be a preferred way (e.g. conversion of the data in advance) to do such a plot instead of 'faking' the axis.
Explanation how that special axis transfer plot is done:
- the original x-axis is hidden
- a twiny axis is added
  - note that its y-axis is hidden by default, so that doesn't need handling
- the twiny x-axis is set to log and the 2nd plot y-axis limits are transferred
  - subplots are used to directly transfer the 2nd plot y-axis limits
  - use variables if you need to stick with your two plots
- the twiny x-axis is moved from top (twiny default position) to bottom (where the original x-axis was)
Code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42) # just to have repeatable results for the answer
xdata=np.random.normal(5e5,2e5,int(1e4))
fig, axs = plt.subplots(2, figsize=(7,10), facecolor=(1, 1, 1))
# 1st plot
axs[0].hist(np.log10(xdata), bins=100) # plot the data on the normal x axis
axs[0].axes.xaxis.set_visible(False) # hide the normal x axis
# 2nd plot
axs[1].semilogy(xdata)
# 1st plot - twin axis
axs0_y_twin = axs[0].twiny() # set a twiny axis, note twiny y axis is hidden by default
axs0_y_twin.set(xscale="log")
# transfer the limits from the 2nd plot y axis to the twin axis
axs0_y_twin.set_xlim(axs[1].get_ylim()[0],
                     axs[1].get_ylim()[1])
# move the twin x axis from top to bottom
axs0_y_twin.tick_params(axis="x", which="both", bottom=True, top=False,
                        labelbottom=True, labeltop=False)
# Disclaimer
disclaimer_text = "Disclaimer: x axis ticks and labels don't represent the plotted data"
axs[0].text(0.5,-0.09, disclaimer_text, size=12, ha="center", color="red",
            transform=axs[0].transAxes)
plt.tight_layout()
plt.subplots_adjust(hspace=0.2)
plt.show()

Problem with scaling two different y-axis on matplotlib

I want to plot a dataset on one x-axis and two y-axes (eV and nm). The two y-axes are linked by the equation nm = 1239.8/eV.
As you can see from my picture output, the values are not in the correct positions. For instance, at eV = 0.5 I need to have nm = 2479.6, at eV = 2.9, nm = 423, etc…
How can I fix this?
My data.txt:
number eV nm
1 2.573 481.9
2 2.925 423.9
3 3.174 390.7
4 3.242 382.4
5 3.387 366.1
The code I am using:
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as tck
# data handling
file = "data.txt"
df = pd.read_csv(file, delimiter=" ") # generate a DataFrame with data
no = df[df.columns[0]]
eV = df[df.columns[1]].round(2) # first y-axis
nm = df[df.columns[2]].round(1) # second y-axis
# generate a subplot 1x1
fig, ax1 = plt.subplots(1,1)
# first Axes object, main plot (lollipop plot)
ax1.stem(no, eV, markerfmt=' ', basefmt=" ", linefmt='blue', label="Gas")
ax1.set_ylim(0.5,4)
ax1.yaxis.set_minor_locator(tck.MultipleLocator(0.5))
ax1.set_xlabel('Aggregation', labelpad=12)
ax1.set_ylabel('Transition energy [eV]', labelpad=12)
# adding second y-axis
ax2 = ax1.twinx()
ax2.set_ylim(2680,350) # set the corresponding ymax and ymin,
# but the values are not correct anyway
ax2.set_yticklabels(nm)
ax2.set_ylabel('Wavelength [nm]', labelpad=12)
# save
plt.tight_layout(pad=1.5)
plt.show()
The resulting plot is the following. I would just like to obtain the second axis by dividing 1239.8 by the values of the first one, and I don't know what else to look for!

You can use ax.secondary_yaxis, as described in this example. See the below code for an implementation for your problem. I have only included the part of the code relevant for the second y axis.
# adding second y-axis
def eV_to_nm(eV):
    return 1239.8 / eV
def nm_to_eV(nm):
    return 1239.8 / nm
ax2 = ax1.secondary_yaxis('right', functions=(eV_to_nm, nm_to_eV))
ax2.set_yticks(nm)
ax2.set_ylabel('Wavelength [nm]', labelpad=12)
Note that I am also using set_yticks instead of set_yticklabels. Furthermore, if you remove set_yticks, matplotlib will automatically determine y tick positions assuming a linear distribution of y ticks. However, because nm is inversely proportional to eV, this will lead to a (most likely) undesirable distribution of y ticks. You can manually change these using a different set of values in set_yticks.
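For completeness, here is a self-contained sketch of the secondary_yaxis approach, assuming Matplotlib 3.1 or newer. The data values are copied from the question's data.txt instead of being read from a file, and the tick positions (wavelengths at round energies) are an illustrative choice, not something prescribed by the answer.

```python
import numpy as np
import matplotlib.pyplot as plt

HC_EV_NM = 1239.8  # conversion constant used in the question (eV * nm)


def eV_to_nm(eV):
    return HC_EV_NM / np.asarray(eV, dtype=float)


def nm_to_eV(nm):
    return HC_EV_NM / np.asarray(nm, dtype=float)


# Values copied from the question's data.txt.
no = np.array([1, 2, 3, 4, 5])
eV = np.array([2.573, 2.925, 3.174, 3.242, 3.387])

fig, ax1 = plt.subplots()
ax1.stem(no, eV, markerfmt=' ', basefmt=' ', linefmt='b-')
ax1.set_ylim(0.5, 4)
ax1.set_xlabel('Aggregation')
ax1.set_ylabel('Transition energy [eV]')

# The secondary axis derives its scale from ax1 through the two functions.
ax2 = ax1.secondary_yaxis('right', functions=(eV_to_nm, nm_to_eV))
ax2.set_yticks(eV_to_nm([0.5, 1, 2, 3, 4]))  # wavelengths at round energies
ax2.set_ylabel('Wavelength [nm]')
```

Calling plt.show() afterwards displays the figure. Because the right-hand scale is computed from the left-hand one, the two axes stay consistent even if the limits of ax1 change later.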

I figured out how to solve this problem (source of the hint here).
So, for anyone who needs one dataset with one x-axis but two y-axes (one mathematically related to the other), here is a working solution. Basically, the idea is to give the second axis the same tick positions as the main y-axis, but relabel them according to the mathematical relationship between the two (in this case, nm = 1239.8/eV). The following code has been tested and works.
Of course, this method also works if you have two x-axes and one shared y-axis, etc.
Important note: you must define a y-range (or an x-range if you want the opposite result), otherwise you might get some scaling problems.
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as tck
from matplotlib.text import Text
# data
file = "data.txt"
df = pd.read_csv(file, delimiter=" ") # generate a DataFrame with data
no = df[df.columns[0]]
eV = df[df.columns[1]].round(2) # first y-axis
nm = df[df.columns[2]].round(1) # second y-axis
# generate a subplot 1x1
fig, ax1 = plt.subplots(1,1)
# first Axes object, main plot (lollipop plot)
ax1.stem(no, eV, markerfmt=' ', basefmt=" ", linefmt='blue', label="Gas")
ax1.set_ylim(0.5,4)
ax1.yaxis.set_minor_locator(tck.MultipleLocator(0.5))
ax1.set_xlabel('Aggregation', labelpad=12)
ax1.set_ylabel('Transition energy [eV]', labelpad=12)
# function that correlates the two y-axes
def eV_to_nm(eV):
    return 1239.8 / eV
# adding a second y-axis
ax2 = ax1.twinx() # share x axis
ax2.set_ylim(ax1.get_ylim()) # set the same range over y
ax2.set_yticks(ax1.get_yticks()) # put the same ticks as ax1
ax2.set_ylabel('Wavelength [nm]', labelpad=12)
# change the labels of the second axis by applying the mathematical
# function that relates the two axes to each tick of the first
# axis, and then convert the result to text
# This way ax2 has the same tick positions as ax1, with rescaled labels
ax2.set_yticklabels([Text(0, yval, f'{eV_to_nm(yval):.1f}')
                     for yval in ax1.get_yticks()])
# show the plot
plt.tight_layout(pad=1.5)
plt.show()
data.txt is the same as above:
number eV nm
1 2.573 481.9
2 2.925 423.9
3 3.174 390.7
4 3.242 382.4
5 3.387 366.1

matplotlib: manually change yaxis values to differ from the actual value (NOT: change ticks!) [duplicate]

I am trying to plot data and a function with matplotlib 2.0 under Python 2.7.
The x values of the function evolve with time: x first decreases to a certain value, then increases again.
If the function is plotted against time, it looks like this plot of data against time.
I need the same x-axis evolution when plotting against the real x values. Unfortunately, because the x values are the same for both parts (before and after the turning point), the two parts are drawn on top of each other. This gives me the wrong data plot:
In this example, it means I need the x-axis to start at 2.4, decrease to 1.0, and then increase again to 2.4. I swear I found before that this is possible, but unfortunately I can't find any trace of it again.

A matplotlib axis is by default linearly increasing. More importantly, there must be an injective mapping of the number line to the axis units. So changing the data range is not really an option (at least when the aim is to keep things simple).
It would hence be good to keep the original numbers and only change the ticks and ticklabels on the axis. E.g. you could use a FuncFormatter to map the original numbers to
np.abs(x-tp)+tp
where tp would be the turning point.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
x = np.linspace(-10,20,151)
y = np.exp(-(x-5)**2/19.)
plt.plot(x,y)
tp = 5
fmt = lambda x,pos:"{:g}".format(np.abs(x-tp)+tp)
plt.gca().xaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(fmt))
plt.show()
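The example above starts from x values that are already monotonic. For data like the question's, where x goes down and then up again, a monotonic plotting coordinate is needed first. The following is a minimal sketch under the assumption that there is a single turning point and that it is the minimum of x; unfold is a hypothetical helper name.

```python
import numpy as np


def unfold(x):
    """Mirror the decreasing branch of x around its minimum (the turning point).

    Returns the monotonic plotting coordinate u and the turning point tp,
    so that np.abs(u - tp) + tp reproduces the original x values.
    """
    x = np.asarray(x, dtype=float)
    i_min = int(np.argmin(x))
    tp = x[i_min]
    u = x.copy()
    u[:i_min] = 2 * tp - x[:i_min]  # reflect the part before the turning point
    return u, tp


x = np.array([2.4, 1.7, 1.0, 1.7, 2.4])  # decreases, then increases again
u, tp = unfold(x)
```

The data would then be plotted against u, and the FuncFormatter with np.abs(x-tp)+tp turns the tick labels back into the original x values.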

One option would be to use two axes and plot your two time spans separately, one on each axes.
For instance, if you have the following data:
import numpy as np
import matplotlib.pyplot as plt
myX = np.linspace(1,2.4,100)
myY1 = -1*myX
myY2 = -0.5*myX-0.5
plt.plot(myX,myY1, c='b')
plt.plot(myX,myY2, c='g')
you can instead create two subplots with a shared y-axis and no space between the two axes, plot each time span independently, and finally adjust the limits of one of your x-axes to reverse the order of the points:
fig, (ax1,ax2) = plt.subplots(1,2, gridspec_kw={'wspace':0}, sharey=True)
ax1.plot(myX,myY1, c='b')
ax2.plot(myX,myY2, c='g')
ax1.set_xlim((2.4,1))
ax2.set_xlim((1,2.4))

Specify color points depending on conditions [duplicate]

This question already has answers here:
How to change outliers to some other colors in a scatter plot
(2 answers)
Closed 5 years ago.
I have two numpy arrays, x and y, with 7000 elements each. I want to make a scatter plot of them giving each point a different color depending on these conditions:
-BLACK if x[i]<10.
-RED if x[i]>=10 and y[i]<=-0.5
-BLUE if x[i]>=10 and y[i]>-0.5
I tried creating a list of the same length as the data with the color I want to assign to each point and then plot the data with a loop, but it takes me a long time to run it. Here's my code:
import numpy as np
import matplotlib.pyplot as plt
#color list with same length as the data
col=[]
for i in range(0,len(x)):
    if x[i]<10:
        col.append('k')
    elif x[i]>=10 and y[i]<=-0.5:
        col.append('r')
    else:
        col.append('b')
#scatter plot
for i in range(len(x)):
    plt.scatter(x[i],y[i],c=col[i],s=5, linewidth=0)
#add horizontal line and invert y-axis
plt.gca().invert_yaxis()
plt.axhline(y=-0.5,linewidth=2,c='k')
Before that, I tried creating the same color list in the same way, but plotting the data without the loop:
#scatter plot
plt.scatter(x,y,c=col,s=5, linewidth=0)
Even though this plots the data much, much faster than the for loop, some of the scattered points appear with the wrong color. Why does plotting the data without a loop lead to incorrect colors for some points?
I also tried defining three sets of data, one for each color, and adding them to the plot separately. But this is not what I am looking for.
Is there a way to specify in the scatter plots arguments the list of colors I want to use for each point in order not to use the for loop?
PS: This is the plot I get when I don't use the for loop (wrong one):
And this one when I use the for loop (correct):

This can be done using numpy.where. Since I do not have your exact x and y values, I will have to use some fake data:
import numpy as np
import matplotlib.pyplot as plt
#generate some fake data
x = np.random.random(10000)*10
y = np.random.random(10000)*10
col = np.where(x<1,'k',np.where(y<5,'b','r'))
plt.scatter(x, y, c=col, s=5, linewidth=0)
plt.show()
This produces the plot below:
The line col = np.where(x<1,'k',np.where(y<5,'b','r')) is the important one. It produces an array of the same size as x and y, filled with 'k', 'b' or 'r' depending on the conditions. If x is less than 1, the element is 'k'; otherwise, if y is less than 5, it is 'b'; and if neither condition is met, it is 'r'. This way, you do not have to use a loop to plot your graph.
For your specific data you will have to change the values in the conditions of np.where.
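As a sketch of that last step, the conditions from the question (black if x < 10, red if x >= 10 and y <= -0.5, blue otherwise) could be written as follows; point_colors is a hypothetical helper name, and the three sample points are made up to exercise each rule.

```python
import numpy as np


def point_colors(x, y):
    """Return one color code per point, following the question's three rules."""
    x = np.asarray(x)
    y = np.asarray(y)
    return np.where(x < 10, 'k', np.where(y <= -0.5, 'r', 'b'))


col = point_colors([5.0, 12.0, 12.0], [0.0, -1.0, 0.0])
print(col.tolist())  # ['k', 'r', 'b']
```

The result can be passed directly as c=point_colors(x, y) in a single plt.scatter call.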
