plot inconsistent with the table values


I am confused about why the plot shows values that differ from the values in the table:
st.write("Corr matrix:")
st.write(corr_matrix)
The values in the table are the correct ones. I am passing the same matrix, corr_matrix, to Plotly Express below, but the plot shows values that differ from the table (i.e., incorrect ones):
if show_correlation_matrix:
    st.write("Correlation Matrix:")
    corr_matrix = df[prod_names].corr()
    fig = px.imshow(corr_matrix, color_continuous_scale='coolwarm')
    # if len(prod_names) <= 10:
    #     font_size = 20 / len(prod_names) + 5
    #     sns.heatmap(corr_matrix, annot=True, annot_kws={"size": font_size})
    # else:
    #     sns.heatmap(corr_matrix, annot=False)
    st.plotly_chart(fig)
I also tried plotting it with pyplot to see whether it behaves the same way, and it does: Plotly and pyplot show the same values, which are incorrect because they differ from the table values (the correct ones):
if show_correlation_matrix:
    st.write("Corr matrix:")
    corr_matrix = df[prod_names].corr()
    st.plotly_chart(fig)
    fig, ax = plt.subplots(figsize=(7, 7))
    sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', ax=ax)
    st.pyplot(fig)
Another problem: in the Plotly Express plot the values are not displayed. They appear only when I hover over a cell, but I want them shown regardless of hovering. This commented-out part
# if len(prod_names) <= 10:
#     font_size = 20 / len(prod_names) + 5
#     sns.heatmap(corr_matrix, annot=True, annot_kws={"size": font_size})
# else:
#     sns.heatmap(corr_matrix, annot=False)
doesn't seem to apply to it, although it applies to the covariance matrix (the covariance matrix I'm plotting doesn't have the problems I have with the correlation matrix).
I tried plotting the correlation matrix using Plotly and pyplot, but the values are incorrect.
In fact, all the plotted values are >= 0.7, whereas some of the correct values (shown properly in the table) are below 0.7, such as 0.3 and 0.28.
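One way to check whether a plot really disagrees with the table is to write each cell's number onto the heatmap directly from the same matrix, and to pin the color range to [-1, 1] so colors are comparable across runs. The sketch below is only an illustration: the data is made up, and `np.corrcoef` stands in for `df[prod_names].corr()` (both compute Pearson correlations). For the Plotly Express version, recent releases of `px.imshow` accept `text_auto=True` to print cell values without hovering; that variant is not shown here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for df[prod_names]: 200 rows, 4 products
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))
names = ["a", "b", "c", "d"]  # made-up column names

# Pearson correlation between columns, as DataFrame.corr() computes by default
corr_matrix = np.corrcoef(data, rowvar=False)

fig, ax = plt.subplots(figsize=(5, 5))
# Fix the color limits so the color mapping does not depend on the data range
im = ax.imshow(corr_matrix, cmap="coolwarm", vmin=-1, vmax=1)
fig.colorbar(im, ax=ax)
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)

# Write every value into its cell, read from the same matrix that is plotted
annotations = []
for i in range(corr_matrix.shape[0]):
    for j in range(corr_matrix.shape[1]):
        label = f"{corr_matrix[i, j]:.2f}"
        annotations.append(ax.text(j, i, label, ha="center", va="center"))
```

If the annotated numbers match the table but the colors look wrong, the color scale is the likely culprit; if the numbers themselves differ, the matrix being plotted is probably not the one shown in the table.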

Related

Python: scatter plot with non-linear x axis

I have data with lots of x values around zero and only a few as you go up to around 950.
I want to create a plot with a non-linear x axis so that the relationship can be seen in a 'straight line' form, as in this example.
I have tried using plt.xscale('log') but it does not achieve what I want.
I have not been able to use the log scale function with a scatter plot as it then only shows 3 values rather than the thousands that exist.
I have tried to work around it using
plt.plot(retper, aep_NW[y], marker='o', linewidth=0)
to replicate the scatter function which plots but does not show what I want.
plt.figure(1)
plt.scatter(rp,aep,label="SSI sum")
plt.show()
Image 3:
plt.figure(3)
plt.scatter(rp, aep)
plt.xscale('log')
plt.show()
Image 4:
plt.figure(4)
plt.plot(rp, aep, marker='o', linewidth=0)
plt.xscale('log')
plt.show()
ADDITION:
Hi thank you for the response.
I think you are right that my x axis is truncated but I'm not sure why or how...
I'm not really sure what to post code wise as the data is all large and coming from a server so can't really give you the data to see it with.
Basically aep_NW is a one dimensional array with 951 elements, values from 0-~140, with most values being small and only a few larger values. The data represents a storm severity index for 951 years.
Then I want the x axis to be the return period for these values, so I made an rp array of the same size, filled with values starting at 951 and decreasing by a half each time.
I then sort the aep_NW values from lowest to highest, with the highest value associated with the largest return period (951), the second highest aep_NW value associated with the second largest return period (475.5), etc.
So when I plot it I need the x axis scale to be similar to the example you showed above or the first image I attached originally.
rp = [0]*numseas.shape[0]
i = numseas.shape[0] - 1
rp[i] = numseas.shape[0]
i = i - 1
while i != 0:
    rp[i] = rp[i+1]/2
    i = i - 1
y = np.argsort(aep_NW)
fig, ax = plt.subplots()
ax.scatter(rp,aep_NW[y],label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period")
ax.set_ylabel("SSI score")
plt.title("AEP for NW Europe: total loss per entire extended winter season")
plt.show()

It looks like in your "Image 3" the x axis is truncated, so that you don't see the data you are interested in. It appears this is due to there being 0's in your 'rp' array. I updated the examples to show the error you are seeing, one way to exclude the zeros, and one way to clip them and show them on a different scale.
import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline  # uncomment when running in a Jupyter notebook
n = 100
numseas = np.logspace(-5, 3, n)
aep_NW = np.linspace(0, 140, n)
rp = [0]*numseas.shape[0]
i = numseas.shape[0] - 1
rp[i] = numseas.shape[0]
i = i - 1
while i != 0:
    rp[i] = rp[i+1] / 2
    i = i - 1
y = np.argsort(aep_NW)
fig, axes = plt.subplots(1, 3, figsize=(14, 5))
ax = axes[0]
ax.scatter(rp, aep_NW[y], label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period")
ax.set_ylabel("SSI score")
ax = axes[1]
rp = np.array(rp)  # rp is already in ascending order; only aep_NW is reordered by y
mask = rp > 0
ax.scatter(rp[mask], aep_NW[y][mask], label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period (0 values excluded)")
ax = axes[2]
log2_clipped_rp = np.log2(rp.clip(2**-100, None))
ax.scatter(log2_clipped_rp, aep_NW[y], label="SSI sum")
xticks = list(range(-110, 11, 20))
xticklabels = [f'$2^{{{i}}}$' for i in xticks]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)
ax.set_xlabel("log$_2$ Return period (values clipped to 2$^{-100}$)")
plt.show()
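The zero in rp comes from the while loop stopping before index 0, so the first element keeps its initial value. A minimal sketch of building the same halving sequence without a loop is shown below; the function name `halving_return_periods` is made up for illustration, and it assumes the sequence described in the question (the last value equals the array length and each earlier value is half of the next one).

```python
import numpy as np

def halving_return_periods(n):
    """Return n values ending at n, each one half of the value after it.

    Hypothetical helper: fills every index, so no zero is left at position 0.
    """
    exponents = np.arange(n - 1, -1, -1)  # n-1, ..., 1, 0
    return n / 2.0 ** exponents

rp = halving_return_periods(10)
# rp[-1] is 10.0, rp[-2] is 5.0, and every entry is strictly positive,
# so ax.set_xscale('log') has no zero to drop.
```

For arrays much longer than about a thousand elements the smallest values would underflow to zero in double precision, so masking with `rp > 0` as in the answer remains a sensible safeguard.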

Problem with scaling two different y-axis on matplotlib

I want to plot a dataset on one x-axis and two y-axes (eV and nm). The two y-axis are linked together with the equation: nm = 1239.8/eV.
As you can see from my picture output, the values are not in the correct positions. For instance, at eV = 0.5 I need to have nm = 2479.6, at eV = 2.9, nm = 423, etc…
How can I fix this?
My data.txt:
number eV nm
1 2.573 481.9
2 2.925 423.9
3 3.174 390.7
4 3.242 382.4
5 3.387 366.1
The code I am using:
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as tck
# data handling
file = "data.txt"
df = pd.read_csv(file, delimiter=" ") # generate a DataFrame with data
no = df[df.columns[0]]
eV = df[df.columns[1]].round(2) # first y-axis
nm = df[df.columns[2]].round(1) # second y-axis
# generate a subplot 1x1
fig, ax1 = plt.subplots(1,1)
# first Axes object, main plot (lollipop plot)
ax1.stem(no, eV, markerfmt=' ', basefmt=" ", linefmt='blue', label="Gas")
ax1.set_ylim(0.5,4)
ax1.yaxis.set_minor_locator(tck.MultipleLocator(0.5))
ax1.set_xlabel('Aggregation', labelpad=12)
ax1.set_ylabel('Transition energy [eV]', labelpad=12)
# adding second y-axis
ax2 = ax1.twinx()
ax2.set_ylim(2680,350) # set the corresponding ymax and ymin,
# but the values are not correct anyway
ax2.set_yticklabels(nm)
ax2.set_ylabel('Wavelength [nm]', labelpad=12)
# save
plt.tight_layout(pad=1.5)
plt.show()
The resulting plot is the following. I would just like to obtain a second axis whose values are 1239.8 divided by the values of the first one, and I don't know what else to look for!

You can use ax.secondary_yaxis, as described in this example. See the below code for an implementation for your problem. I have only included the part of the code relevant for the second y axis.
# adding second y-axis
def eV_to_nm(eV):
    return 1239.8 / eV
def nm_to_eV(nm):
    return 1239.8 / nm
ax2 = ax1.secondary_yaxis('right', functions=(eV_to_nm, nm_to_eV))
ax2.set_yticks(nm)
ax2.set_ylabel('Wavelength [nm]', labelpad=12)
Note that I am also using set_yticks instead of set_yticklabels. Furthermore, if you remove set_yticks, matplotlib will automatically determine y tick positions assuming a linear distribution of y ticks. However, because nm is inversely proportional to eV, this will lead to a (most likely) undesirable distribution of y ticks. You can manually change these using a different set of values in set_yticks.
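For readers who want to try the secondary-axis approach without data.txt, here is a self-contained sketch. It hard-codes the five rows from the question, and the tick positions in `nm_ticks` are an arbitrary choice made for illustration. The conversion functions go through `np.asarray` so they also accept the arrays matplotlib passes in.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

HC = 1239.8  # conversion constant from the question: nm = 1239.8 / eV

def eV_to_nm(eV):
    return HC / np.asarray(eV, dtype=float)

def nm_to_eV(nm):
    return HC / np.asarray(nm, dtype=float)

# The rows of data.txt from the question, hard-coded
number = np.array([1, 2, 3, 4, 5])
eV = np.array([2.573, 2.925, 3.174, 3.242, 3.387])

fig, ax1 = plt.subplots()
ax1.stem(number, eV, markerfmt=" ", basefmt=" ")
ax1.set_ylim(0.5, 4)  # keeps zero out of range, where 1239.8 / eV is undefined
ax1.set_ylabel("Transition energy [eV]")

# The secondary axis is tied to ax1 through the forward and inverse functions
ax2 = ax1.secondary_yaxis("right", functions=(eV_to_nm, nm_to_eV))
nm_ticks = [2479.6, 1000, 600, 400, 310]  # illustrative tick positions in nm
ax2.set_yticks(nm_ticks)
ax2.set_ylabel("Wavelength [nm]")

fig.canvas.draw()  # force a render so the ticks are laid out
```

Because the secondary axis follows the primary one through the two functions, changing `ax1.set_ylim` later moves the nm ticks with it.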

I figured out how to solve this problem (source of the hint here).
So, for anyone who needs to have one dataset with one x-axis but two y-axes (one mathematically related to the other), a working solution is reported. Basically, the problem is to have the same ticks as the main y-axis, but change them proportionally, according to their mathematical relationship (that is, in this case, nm = 1239.8/eV). The following code has been tested and it is working.
This method of course works if you have two x-axes and 1 shared y-axis, etc.
Important note: you must define a y-range (or an x-range if you want the opposite result), otherwise you might get some scaling problems.
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as tck
from matplotlib.text import Text
# data
file = "data.txt"
df = pd.read_csv(file, delimiter=" ") # generate a DataFrame with data
no = df[df.columns[0]]
eV = df[df.columns[1]].round(2) # first y-axis
nm = df[df.columns[2]].round(1) # second y-axis
# generate a subplot 1x1
fig, ax1 = plt.subplots(1,1)
# first Axes object, main plot (lollipop plot)
ax1.stem(no, eV, markerfmt=' ', basefmt=" ", linefmt='blue', label="Gas")
ax1.set_ylim(0.5,4)
ax1.yaxis.set_minor_locator(tck.MultipleLocator(0.5))
ax1.set_xlabel('Aggregation', labelpad=12)
ax1.set_ylabel('Transition energy [eV]', labelpad=12)
# function that correlates the two y-axes
def eV_to_nm(eV):
    return 1239.8 / eV
# adding a second y-axis
ax2 = ax1.twinx() # share x axis
ax2.set_ylim(ax1.get_ylim()) # set the same range over y
ax2.set_yticks(ax1.get_yticks()) # put the same ticks as ax1
ax2.set_ylabel('Wavelength [nm]', labelpad=12)
# change the labels of the second axis by apply the mathematical
# function that relates the two axis to each tick of the first
# axis, and then convert it to text
# This way you have the same axis as y1 but with the same ticks scaled
ax2.set_yticklabels([Text(0, yval, f'{eV_to_nm(yval):.1f}')
                     for yval in ax1.get_yticks()])
# show the plot
plt.tight_layout(pad=1.5)
plt.show()
data.txt is the same as above:
number eV nm
1 2.573 481.9
2 2.925 423.9
3 3.174 390.7
4 3.242 382.4
5 3.387 366.1

Seaborn heatmap with variating cell sizes

I have a heatmap with ticks which have non equal deltas between themselves:
For example, in the attached image, the deltas are between 0.015 and 0.13. The current scale doesn't show the real scenario, since all cell sizes are equal.
Is there a way to place the ticks in their realistic positions, such that cell sizes would also change accordingly?
Alternatively, is there another method to generate this figure such that it would provide a realistic representation of the tick values?

As mentioned in the comments, a Seaborn heatmap uses categorical labels. However, the underlying structure is a pcolormesh, which can have different sizes for each cell.
It was also mentioned in the comments that updating the private attributes of the pcolormesh isn't recommended. Instead, the heatmap can be created directly by calling pcolormesh.
Note that if there are N cells, there will be N+1 boundaries. The example code below supposes you have x-positions for the centers of the cells. It then calculates boundaries in the middle between successive cells. The first and the last distance is repeated.
The ticks and tick labels for x and y axis can be set from the given x-values. The example code supposes the original values indicate the centers of the cells.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set()
N = 10
xs = np.random.uniform(0.015, 0.13, N).cumsum().round(3) # some random x values
values = np.random.rand(N, N) # a random matrix
# set bounds in the middle of successive cells, add extra bounds at start and end
bounds = (xs[:-1] + xs[1:]) / 2
bounds = np.concatenate([[2 * bounds[0] - bounds[1]], bounds, [2 * bounds[-1] - bounds[-2]]])
fig, ax = plt.subplots()
ax.pcolormesh(bounds, bounds, values)
ax.set_xticks(xs)
ax.set_xticklabels(xs, rotation=90)
ax.set_yticks(xs)
ax.set_yticklabels(xs, rotation=0)
plt.tight_layout()
plt.show()
PS: In case the ticks are meant to be the boundaries, the code can be simplified. One extra boundary is needed, for example a zero at the start.
bounds = np.concatenate([[0], xs])
ax.tick_params(bottom=True, left=True)
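The centers-to-boundaries step can be isolated in a small helper. The sketch below is a variant of the answer's calculation rather than a copy of it: the name `centers_to_edges` is made up, and the two outer edges are mirrored around the first and last centers so that those ticks sit exactly in the middle of their cells.

```python
import numpy as np

def centers_to_edges(centers):
    """Turn N increasing cell centers into N + 1 cell edges.

    Inner edges lie halfway between neighbouring centers. Each outer edge
    mirrors the nearest inner edge around the outermost center.
    """
    centers = np.asarray(centers, dtype=float)
    mid = (centers[:-1] + centers[1:]) / 2
    first = 2 * centers[0] - mid[0]
    last = 2 * centers[-1] - mid[-1]
    return np.concatenate([[first], mid, [last]])

xs = np.array([0.02, 0.05, 0.15, 0.20])  # illustrative unevenly spaced centers
edges = centers_to_edges(xs)
# edges can be passed as both the x and y coordinates of ax.pcolormesh
```

With unevenly spaced centers, only the outermost ticks are guaranteed to be centered in their cells; inner ticks sit wherever the midpoints between neighbours place them.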

Can't get rid of leading zeros on y axis

I am trying to plot graphs in Matplotlib and embed them into pyqt5 GUI. Everything is working fine, except for the fact that my y axis has loads of leading zeros which I cannot seem to get rid of.
I have tried googling how to format the axis, but nothing seems to work! I can't set the ticks directly because there's no way of determining what they will be, as I am going to be working with varying sized data sets.
num_bins = 50
# create an axis
ax = self.figure.add_subplot(111)
# discards the old graph
ax.clear()
##draws the bars and legend
colours = ['blue','red']
ax.hist(self.histoSets, num_bins, density=True, histtype='bar', color=colours, label=colours)
ax.legend(prop={'size': 10})
##set x ticks
min,max = self.getMinMax()
scaleMax = math.ceil((max/10000))*10000
scaleMin = math.floor((min/10000))*10000
scaleRange = scaleMax - scaleMin
ax.xaxis.set_ticks(np.arange(scaleMin, scaleMax+1, scaleRange/4))
# refresh canvas
self.draw()

All those numbers on your y-axis are tiny, i.e. on the order of 1e-5. This is because the integral of the density is defined to be 1 and your x-axis spans such a large range.
I can mostly reproduce your plot with:
import matplotlib.pyplot as plt
import numpy as np
y = np.random.normal([190000, 220000], 20000, (5000, 2))
a, b, c = plt.hist(y, 40, density=True)
giving me:
The tuple returned from hist contains useful information: the first element (a above) holds the densities, and the second element (b above) holds the bin edges it picked. You can see that this all sums to one by doing:
sum(a[0] * np.diff(b))
and getting 1 back.
As ImportanceOfBeingErnest says, you can use tight_layout() to resize the plot if it doesn't fit into the area.
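The same check can be made with np.histogram alone, which also shows one way to avoid the tiny tick values: weighting every sample by 1/N makes each bar the fraction of samples in its bin instead of a density. This is a sketch on made-up data resembling the example above; the `weights` argument is also accepted by `ax.hist`.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(200000, 20000, 5000)  # made-up data on a scale like the question's

# density=True: bar heights are per unit of x, so they are tiny on a wide x range
density, edges = np.histogram(y, bins=40, density=True)
widths = np.diff(edges)
area = np.sum(density * widths)  # total area under the bars

# Alternative: each sample weighs 1/N, so each bar is a fraction between 0 and 1
weights = np.full(y.shape, 1 / y.size)
fraction, _ = np.histogram(y, bins=edges, weights=weights)
```

Here `fraction` equals `density * widths` bin by bin, so both views describe the same distribution; only the y-axis units differ.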

How to change axis range displayed in a histogram

I want to plot a histogram of my df, which has about 60 thousand values. After I used plt.hist(x, bins = 30) it gave me something like
The problem is that there are values bigger than 20, but their frequencies may be smaller than 10. How can I adjust the displayed axis to show more bins, since I want to look at the whole distribution?

The problem with histograms that skew so much towards one value is you're going to essentially flatten out any outlying values. A solution might be just to present the data with two charts.
Can you create another histogram containing only the values greater than 20?
(pseudo-code, since I don't know your data structure from your post)
plt.hist(x[x.column > 20], bins = 30)

Finally, it could look like this example:
import matplotlib.pyplot as plt
import numpy as np
values1 = np.random.rand(1000,1)*100
values2 = np.random.rand(100000,1)*5
values3 = np.random.rand(10000,1)*20
values = np.vstack((values1,values2,values3))
fig = plt.figure(figsize=(12,5))
ax1 = fig.add_subplot(121)
ax1.hist(values,bins=30)
ax1.set_yscale('log')
ax1.set_title('with log scale')
ax2 = fig.add_subplot(122)
ax2.hist(values,bins=30)
ax2.set_title('no log scale')
fig.savefig('test.jpg')

You could use plt.xscale('log')
PyPlot Logarithmic and other nonlinear axis
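If a log-scaled x axis is used for a histogram, equal-width bins look increasingly narrow towards the right. A common companion, sketched below on made-up data similar to the example above, is to space the bin edges logarithmically as well. The choice of 30 bins is arbitrary, and non-positive values would have to be dropped first because they cannot be placed on a log axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up skewed data: most values are small, a few reach up to 100
rng = np.random.default_rng(0)
values = np.concatenate([
    rng.random(100000) * 5,
    rng.random(10000) * 20,
    rng.random(1000) * 100,
])

positive = values[values > 0]  # a log axis cannot show zero or negative values

# 31 edges give 30 bins whose widths grow by a constant factor
bins = np.logspace(np.log10(positive.min()), np.log10(positive.max()), 31)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(positive, bins=bins)
ax.set_xscale("log")
ax.set_xlabel("value")
ax.set_ylabel("count")
```

On the log axis these bins appear equally wide, which makes the sparse large values easier to see next to the dense small ones.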
