Python (twinx plot) Wacky graph - python

Edit: The graph is fixed now, but I am having trouble plotting the legend. It only shows a legend entry for one of the plots, as seen in the picture below.
I am trying to plot a double-axis graph with twinx, but I am facing some difficulties, as seen in the picture below.
Any input is welcome! If you require any additional information, I am happy to provide it.
Compare this with the original graph, before plotting z on the secondary axis.
I am unsure why my graph looks like that: before I plotted my secondary y axis (the pink line), the closing value graph could be seen perfectly, but now it seems cut.
It may be due to my data, as provided below.
Link to testing1.csv: https://filebin.net/ou93iqiinss02l0g
Code I have currently:
# read csv into variable
sg_df_merged = pd.read_csv("testing1.csv", parse_dates=[0], index_col=0)
# define figure
fig = plt.figure()
fig, ax5 = plt.subplots()
ax6 = ax5.twinx()
x = sg_df_merged.index
y = sg_df_merged["Adj Close"]
z = sg_df_merged["Singapore"]
curve1 = ax5.plot(x, y, label="Singapore", color = "c")
curve2 = ax6.plot(x, z, label = "Face Mask Compliance", color = "m")
curves = [curve1, curve2]
# labels for my axis
ax5.set_xlabel("Year")
ax5.set_ylabel("Adjusted Closing Value ($)")
ax6.set_ylabel("% compliance to wearing face mask")
ax5.grid #not sure what this line does actually
# set x-axis values to 45 degree angle
for label in ax5.xaxis.get_ticklabels():
    label.set_rotation(45)
ax5.grid(True, color = "k", linestyle = "-", linewidth = 0.3)
plt.gca().legend(loc='center left', bbox_to_anchor=(1.1, 0.5), title = "Country Index")
plt.show();
Initially, I thought it was due to my excel having entire blank lines but I have since removed the rows which can be found here
Also, I have tried to interpolate but somehow it doesn't work. Any suggestions on this is very much welcomed

Only rows where every value was NaN were dropped. There are still a lot of rows containing NaN in one of the columns.
For matplotlib to draw a connecting line between two data points, the points must be consecutive.
The plot API does not connect data across NaN values.
This can be dealt with by converting each pandas.Series to a DataFrame and using .dropna.
Note that x has been dropped, because it would not match the length of y or z, which are shorter after .dropna.
y is now a separate dataframe, where .dropna is used.
z is also a separate dataframe, where .dropna is used.
The x-axis values for each plot are the respective indices.
import pandas as pd
import matplotlib.pyplot as plt
# read csv into variable
sg_df_merged = pd.read_csv("test.csv", parse_dates=[0], index_col=0)
# define figure
fig, ax5 = plt.subplots(figsize=(8, 6))
ax6 = ax5.twinx()
# select specific columns to plot and drop additional NaN
y = pd.DataFrame(sg_df_merged["Adj Close"]).dropna()
z = pd.DataFrame(sg_df_merged["Singapore"]).dropna()
# add plots with markers
curve1 = ax5.plot(y.index, 'Adj Close', data=y, label="Singapore", color = "c", marker='o')
curve2 = ax6.plot(z.index, 'Singapore', data=z, label = "Face Mask Compliance", color = "m", marker='o')
# labels for my axis
ax5.set_xlabel("Year")
ax5.set_ylabel("Adjusted Closing Value ($)")
ax6.set_ylabel("% compliance to wearing face mask")
# rotate xticks
ax5.xaxis.set_tick_params(rotation=45)
# add a grid to ax5
ax5.grid(True, color = "k", linestyle = "-", linewidth = 0.3)
# create a legend for both axes
curves = curve1 + curve2
labels = [l.get_label() for l in curves]
ax5.legend(curves, labels, loc='center left', bbox_to_anchor=(1.1, 0.5), title = "Country Index")
plt.show()
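For anyone without the CSV, here is a minimal self-contained sketch of the same idea. The data is synthetic (my assumption, not the asker's file), and it uses Series.dropna directly rather than converting to a DataFrame. It also shows another way to build the combined legend, by collecting handles and labels from both axes with get_legend_handles_labels().

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV: two columns with NaN on different dates
idx = pd.date_range("2020-01-01", periods=10, freq="D")
df = pd.DataFrame(
    {
        "Adj Close": [3.0, np.nan, 3.2, np.nan, 3.1, 3.3, np.nan, 3.4, 3.5, np.nan],
        "Singapore": [np.nan, 80.0, np.nan, 82.0, np.nan, np.nan, 85.0, np.nan, np.nan, 88.0],
    },
    index=idx,
)

# Drop NaN per column so each line runs through consecutive points
y = df["Adj Close"].dropna()
z = df["Singapore"].dropna()

fig, ax_left = plt.subplots()
ax_right = ax_left.twinx()
ax_left.plot(y.index, y.values, color="c", marker="o", label="Adj Close")
ax_right.plot(z.index, z.values, color="m", marker="o", label="Face Mask Compliance")

# Collect handles and labels from both axes and build a single legend
handles_l, labels_l = ax_left.get_legend_handles_labels()
handles_r, labels_r = ax_right.get_legend_handles_labels()
legend = ax_left.legend(handles_l + handles_r, labels_l + labels_r, loc="upper left")

legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```

Calling legend() on only one of the two axes, as in the question, only picks up the artists that belong to that axes, which is why a single entry showed up.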

Related

Change to 4 columns from 2

I have a function that plots a boxplot and a histogram side by side in two columns. I would like to change my code to make it 4 columns to shorten the output. I have played with the code and I am missing something. I can change it to 4 columns, but then the right 2 are blank and everything is in the two left columns.
I have tried changing the line to
ax_box, ax_hist, ax_box2, ax_hist2 = axs[i*ncols], axs[i*ncols+1], axs[i*ncols+2], axs[i*ncols+3]
instead of
ax_box, ax_hist = axs[i*ncols], axs[i*ncols+1]
among other iterations of changing the indexes on the columns.
I am new to python and I know I am missing something that will be obvious to more experienced people.
my code is:
`def hist_box_all1(data, bins):
    ncols = 2  # Number of columns for subplots
    nrows = len(data.columns)  # Number of rows for subplots
    height_ratios = [0.75, 0.75] * (nrows // 2) + [0.75] * (nrows % 2)
    fig, axs = plt.subplots(nrows=nrows, ncols=ncols, figsize=(15,4*nrows), gridspec_kw={'height_ratios': height_ratios})
    axs = axs.ravel()  # Flatten the array of axes
    for i, feature in enumerate(data.columns):
        ax_box, ax_hist = axs[i*ncols], axs[i*ncols+1]
        sns.set(font_scale=1)  # Set the size of the label
        x = data[feature]
        n = data[feature].mean()  # Get the mean for the legend
        m=data[feature].median()
        sns.boxplot(
            x=x,
            ax=ax_box,
            showmeans=True,
            meanprops={
                "marker": "o",
                "markerfacecolor": "white",
                "markeredgecolor": "black",
                "markersize": "7",
            },
            color="teal",
        )
        sns.histplot(
            x=x,
            bins=bins,
            kde=True,
            stat="density",
            ax=ax_hist,
            color="darkorchid",
            edgecolor="black",
        )
        ax_hist.axvline(
            data[feature].mean(), color="teal", label="mean=%f" % n
        )  # Draw the mean line
        ax_hist.axvline(
            data[feature].median(), color="red", label="median=%f" % m
        )  # Draw the median line
        ax_box.set(yticks=[])  # Format the y axis label
        #sns.despine(ax=ax_hist)  # Remove the axis lines on the hist plot
        #sns.despine(ax=ax_box, left=True)  # Remove the axis lines on the box plot
        ax_hist.legend(loc="upper right")  # Place the legend in the upper right corner
        plt.suptitle(feature)
    plt.tight_layout()`
Here is a screen shot of the output
Here is a screen shot of the data
I find this way of using matplotlib.pyplot and add_subplot() more convenient to add multiple subplots in a plot.
import matplotlib.pyplot as plt
fig = plt.figure(figsize=[22, 6])
ax = fig.add_subplot(1, 2, 1) # (1,2) means 1 row 2 columns, '1' at the final index indicates the order of this subplot i.e., first subplot on left side
ax.hist(somedata_for_left_side)
ax.set_title('Title 01')
ax = fig.add_subplot(1, 2, 2) # '2' at the final index indicates the order of this subplot i.e., second subplot on the right side
ax.hist(somedata_for_right_side)
ax.set_title('Title 02')
Example output plot (different titles and plot type):
You can try to adapt this to your code. Let me know if this helps.
It's a bit hard to see what your error is, so can you add some screenshots for easier understanding?
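For the 4-column layout itself, one possible sketch is below. It assumes the blank right-hand columns come from keeping nrows = len(data.columns) and indexing with i*ncols: with two box/hist pairs per row you only need half as many rows, and feature i goes to row i // 2. The function name hist_box_grid and the demo data are made up for illustration, and I use plain matplotlib (vertical boxplots, no KDE) instead of seaborn to keep it self-contained.

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def hist_box_grid(data, bins=10, pairs_per_row=2):
    """Draw one boxplot/histogram pair per column, several pairs per row."""
    ncols = 2 * pairs_per_row
    nrows = math.ceil(len(data.columns) / pairs_per_row)
    fig, axs = plt.subplots(nrows=nrows, ncols=ncols,
                            figsize=(15, 3 * nrows), squeeze=False)
    for i, feature in enumerate(data.columns):
        # Row and position of this feature's pair inside the row
        row, pair = divmod(i, pairs_per_row)
        ax_box, ax_hist = axs[row, 2 * pair], axs[row, 2 * pair + 1]
        x = data[feature].dropna()
        ax_box.boxplot(x, showmeans=True)
        ax_box.set(xticks=[], title=feature)
        ax_hist.hist(x, bins=bins, density=True,
                     color="darkorchid", edgecolor="black")
        ax_hist.axvline(x.mean(), color="teal", label=f"mean={x.mean():.2f}")
        ax_hist.axvline(x.median(), color="red", label=f"median={x.median():.2f}")
        ax_hist.legend(loc="upper right")
    # Hide the axes left over when the last row is not full
    for ax in axs.ravel()[2 * len(data.columns):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axs

# Made-up demo data: three numeric columns
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
fig, axs = hist_box_grid(demo, bins=8)
print(axs.shape)
```

With three columns this gives a 2 x 4 grid where the last two axes are hidden. The seaborn calls from the question should slot in where ax_box.boxplot and ax_hist.hist are used here.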

How to plot two case1.hdf5 and case2.hdf5 files in matplotlib. Seeking help to correct the script

I have the script below, which only plots the case1.hdf5 file.
I want to plot another file, case2.hdf5, in the same script so that I get two overlapping plots.
Additionally, I want to:
Use Times New Roman fonts for labels and titles.
Insert legends for both plots.
Multiply the Y-axis data by a constant number.
This script draws the bottom three lines in the same colour, but I want all three in different solid colours for case1.hdf5, and in the same colours but dashed for case2.hdf5.
My script is here
import h5py
import matplotlib.pyplot as plt
import warnings
import matplotlib
warnings.filterwarnings("ignore") # Ignore all warnings
ticklabels=[r'$\Gamma$','F','Q','Z',r'$\Gamma$']
params = {
'mathtext.default': 'regular',
'axes.linewidth': 1.2,
'axes.edgecolor': 'Black',
}
plt.rcParams.update(params)
fig, ax = plt.subplots()
f = h5py.File('band.hdf5', 'r')
#print ('datasets are:')
print(list(f.keys()))
dist=f[u'distance']
freq=f[u'frequency']
kpt=f[u'path']
# Iterate over each segment
for i in range(len(dist)):
    # Iteration over each band
    for nbnd in range(len(freq[i][0])):
        x=[]
        y=[]
        for j in range(len(dist[i])):
            x.append(dist[i][j])
            y.append(freq[i][j][nbnd])
        # First 3 bands are red
        if (nbnd<3):
            color='red'
        else:
            color='black'
        ax.plot(x, y, c=color, lw=2.0, alpha=0.8)
# Labels and axis limit and ticks
ax.set_ylabel(r'Frequency (THz)', fontsize=12)
ax.set_xlabel(r'Wave Vector (q)', fontsize=12)
ax.set_xlim([dist[0][0],dist[len(dist)-1][-1]])
xticks=[dist[i][0] for i in range(len(dist))]
xticks.append(dist[len(dist)-1][-1])
ax.set_xticks(xticks)
ax.set_xticklabels(ticklabels)
# Plot grid
ax.grid(which='major', axis='x', c='green', lw=2.5, linestyle='--', alpha=0.8)
# Save to pdf
plt.savefig('plots.pdf', bbox_inches='tight')
As you can see, the script contains
# First 3 bands are red
if (nbnd<3):
    color='red'
and instead of red I want these three bands in different solid colours, with case2.hdf5 drawn in dashed lines of the same colours.
1. Colours
It sounds like in the first instance you want to map different colours to the first three bands of your data.
One way you might do this is to set up a colourmap and then apply it to those first three bands. Here I have just picked the default matplotlib colormap, but there are loads to choose from, so if the default doesn't work for you I would suggest checking out the post about choosing a colormap. In most use cases you should try to stick to a perceptually uniform map.
2. Legend
This should just be a matter of calling ax.legend(). Be wary, though, when positioning the legend outside the bounds of the plot, as you need to do some extra fiddling when saving to pdf, as detailed here.
However, you first need to add some labels to your plot, which in your case you would do inside your ax.plot() calls. I'm not sure what you are plotting, so I can't tell you what labels would be sensible, but you may want something like: ax.plot(..., label=f'band {nbnd}' if nbnd < 3 else None).
Notice the inline if. You are likely going to have a whole bunch of black bands that you don't want to label individually, so you likely want to label only the first few and let the rest have label=None, which keeps the legend from getting bloated.
3. Scale Y
If you change the way you iterate through your data, you should be able to treat the h5 dataset as something that behaves much like a numpy array. What I mean by that is you really only need two loops to index the data you want: freq[i, :, nbnd] should be the 1-d array that you want to set as y. You can then multiply that 1-d array by some scale value.
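To illustrate the slicing, here is a small sketch using a plain numpy array as a stand-in for the h5 dataset. The shape (2 segments, 5 q-points, 4 bands) and the values are made up. It checks that the slice gives the same numbers as the innermost loop from the question, then applies a scale.

```python
import numpy as np

# Hypothetical stand-in for f['frequency']: 2 segments, 5 q-points, 4 bands
freq = np.arange(2 * 5 * 4, dtype=float).reshape(2, 5, 4)
scale = 10.0  # arbitrary constant

i, nbnd = 1, 2
# Loop-based extraction, as in the question
y_loop = [freq[i][j][nbnd] for j in range(freq.shape[1])]
# Slice-based extraction: every q-point of band nbnd in segment i
y_slice = freq[i, :, nbnd]

print(np.array_equal(y_loop, y_slice))
print(y_slice * scale)
```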
4.
import h5py
import matplotlib.pyplot as plt
import warnings
import matplotlib
warnings.filterwarnings("ignore") # Ignore all warnings
cmap = plt.get_cmap('jet', 4)  # matplotlib.cm.get_cmap was removed in Matplotlib 3.9; pyplot.get_cmap still works
ticklabels=['A','B','C','D','E']
params = {
'mathtext.default': 'regular',
'axes.linewidth': 1.2,
'axes.edgecolor': 'Black',
'font.family' : 'serif'
}
#apply a scale to the y axis. I'm just picking an arbitrary number here
scale = 10
offset = 0 #set this to a non-zero value if you want to have your lines offset in a waterfall style effect
plt.rcParams.update(params)
fig, ax = plt.subplots()
f = h5py.File('band.hdf5', 'r')
#print ('datasets are:')
print(list(f.keys()))
dist=f[u'distance']
freq=f[u'frequency']
kpt=f[u'path']
lbl = {0:'AB', 1:'BC', 2:'CD', 3:'fourth'}
for i, section in enumerate(dist):
    for nbnd, _ in enumerate(freq[i][0]):
        x = section # to_list() you may need to convert sample to list.
        y = (freq[i, :, nbnd] + offset*nbnd) * scale
        if (nbnd<3):
            color=f'C{nbnd}'
        else:
            color='black'
        ax.plot(x, y, c=color, lw=2.0, alpha=0.8, label = lbl[nbnd] if nbnd < 3 and i == 0 else None)
ax.legend()
# Labels and axis limit and ticks
ax.set_ylabel(r'Frequency (THz)', fontsize=12)
ax.set_xlabel(r'Wave Vector (q)', fontsize=12)
ax.set_xlim([dist[0][0],dist[len(dist)-1][-1]])
xticks=[dist[i][0] for i in range(len(dist))]
xticks.append(dist[len(dist)-1][-1])
ax.set_xticks(xticks)
ax.set_xticklabels(ticklabels)
# Plot grid
ax.grid(which='major', axis='x', c='green', lw=2.5, linestyle='--', alpha=0.8)
# Save to pdf
plt.savefig('plots.pdf', bbox_inches='tight')
This script gives me the following image with the data you supplied
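The script above still reads a single file. For overlaying case1.hdf5 and case2.hdf5, one way to go about it is to move the plotting loop into a function and call it once per data set with a different linestyle. The sketch below uses synthetic arrays instead of the real files, and the helper name plot_bands and its parameters are my own invention. In practice you would pass in the distance and frequency data sets read with h5py.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_bands(ax, dist, freq, linestyle, case_label, scale=1.0, n_coloured=3):
    """Plot every band of one data set; the first n_coloured bands get cycle colours."""
    for i in range(len(dist)):                 # segments
        for nbnd in range(freq.shape[2]):      # bands
            coloured = nbnd < n_coloured
            ax.plot(
                dist[i],
                freq[i, :, nbnd] * scale,
                color=f"C{nbnd}" if coloured else "black",
                linestyle=linestyle,
                lw=2.0,
                # Label each coloured band once (first segment only)
                label=f"{case_label} band {nbnd}" if coloured and i == 0 else None,
            )

# Synthetic stand-ins for the arrays that would come from the two files
dist = np.array([np.linspace(0, 1, 20), np.linspace(1, 2, 20)])  # 2 segments
bands = np.arange(1, 5)                                          # 4 bands
freq1 = np.sin(dist)[:, :, None] * bands                         # shape (2, 20, 4)
freq2 = 0.9 * freq1

fig, ax = plt.subplots()
plot_bands(ax, dist, freq1, "-", "case1", scale=10)   # solid lines
plot_bands(ax, dist, freq2, "--", "case2", scale=10)  # dashed, same colours
ax.legend()
```

Because both calls use the same 'C0', 'C1', 'C2' colours, band n has the same colour in both cases and only the linestyle tells the files apart.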

Connecting non-adjacent data points in Seaborn pointplot

I want to plot categorical plots with the Seaborn pointplot, but data points that are not adjacent are not connected with a line in the plot. I would like to interpolate between non adjacent points, and connect them in the same way as adjacent points are connected, how can I do this?
An example: In the left and middle images, the blue and green points should be connected with a curve, respectively, but now they are separated into small parts. How can I plot the left and middle images just like the right one?
fig, axs = plt.subplots(ncols=3, figsize=(10,5))
exp_methods = ['fMRI left', 'fMRI right', 'MEG']
for i in range(3):
    experiment = exp_methods[i]
    dataf = df[df['data']==experiment]
    sns.pointplot(x='number_of_subjects', y='accuracy', hue='training_size', data=dataf,
                  capsize=0.2, size=6, aspect=0.75, ci=95, legend=False, ax=axs[i])
I don't think there is an option to interpolate where there are missing data points, and hence the line stops instead. This question on the same topic from 2016 remains unanswered.
Instead, you could use plt.errorbar as suggested in the comments, or add the lines afterwards using plt.plot while still using seaborn to plot the means and error bars:
import seaborn as sns
tips = sns.load_dataset('tips')
# Create a gap in the data and plot it
tips.loc[(tips['size'] == 4) & (tips['sex'] == 'Male'), 'size'] = 5
sns.pointplot(x='size', y='total_bill', hue='sex', data=tips, dodge=True)
# Fill gap with manual line plot
# (written for older seaborn releases; newer ones may handle join and the point artists differently)
ax = sns.pointplot(x='size', y='total_bill', hue='sex', data=tips, dodge=True, join=False)
# Loop over the collections of points in the axes and the grouped data frame
for points, (gender_name, gender_slice) in zip(ax.collections, tips.groupby('sex')):
    # Retrieve the x axis positions for the points
    x_coords = [coord[0] for coord in points.get_offsets()]
    # Manually calculate the mean y-values to use with the line
    means = gender_slice.groupby('size')['total_bill'].mean()
    ax.plot(x_coords, means, lw=2)
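If you would rather not depend on seaborn internals, the errorbar route mentioned above could look roughly like this. The data frame is made up, and I use the standard deviation for the bars rather than the 95% confidence interval seaborn draws, so treat it as a sketch of the idea rather than a drop-in replacement.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: group "b" has no observations at x == 3
df = pd.DataFrame(
    {
        "x": [1, 1, 2, 2, 3, 3, 4, 4, 1, 1, 2, 2, 4, 4],
        "group": ["a"] * 8 + ["b"] * 6,
        "y": [1.0, 1.2, 2.0, 2.2, 3.0, 3.2, 4.0, 4.2, 2.0, 2.4, 3.0, 3.4, 5.0, 5.4],
    }
)

fig, ax = plt.subplots()
for name, part in df.groupby("group"):
    # One row per x value that actually occurs in this group
    stats = part.groupby("x")["y"].agg(["mean", "std"])
    # errorbar joins successive rows, so the missing x value is simply bridged
    ax.errorbar(stats.index, stats["mean"], yerr=stats["std"],
                marker="o", capsize=4, label=name)
ax.legend(title="group")

means_b = df[df["group"] == "b"].groupby("x")["y"].mean()
```

Since the aggregated frame only has rows for x values that exist in each group, the line for "b" goes straight from x = 2 to x = 4.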

Matplotlib title spanning two (or any number of) subplot columns

Because of the nature of what I am plotting, I want subplots akin to nested tables.
I'm not sure how to ask the question clearly, so I've added some pictures instead, which I hope illustrate the problem.
What I have:
What I want:
Current (shortened) code looks something like this:
fig, axes = plt.subplots(nrows=5, ncols=4)
fig.suptitle(title, fontsize='x-large')
data0.plot(x=data0.x, y=data0.y, ax=axes[0,0],kind='scatter')
data1.plot(x=data1.x, y=data1.y, ax=axes[0,1],kind='scatter')
axes[0,0].set_title('title 0')
axes[0,1].set_title('title 1')
I can't figure out how to set a title for axes[0,0] and [0,1] together. I can't find anything in the documentation either. I am not fond of fussing around with tables in latex to achieve this. Any pointers?
Setting the figure title using fig.suptitle() and the axes (subplot) titles using ax.set_title() is rather straightforward. For setting an intermediate, column-spanning title there is indeed no built-in option.
One way to solve this issue is to use plt.figtext() at the appropriate positions. One needs to leave some additional space for that title, e.g. by using fig.subplots_adjust, and find appropriate positions for the figtext.
In the example below, we use the bounding boxes of the axes the title should span to find a centred horizontal position. The vertical position is a best guess.
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
y = np.random.rand(10,8)
colors=["b", "g", "r", "violet"]
fig, axes = plt.subplots(nrows=2, ncols=4, sharex=True, sharey=True, figsize=(8,5))
#set a figure title on top
fig.suptitle("Very long figure title over the whole figure extent", fontsize='x-large')
# adjust the subplots, i.e. leave more space at the top to accommodate the additional titles
fig.subplots_adjust(top=0.78)
ext = []
#loop over the columns (j) and rows(i) to populate subplots
for j in range(4):
    for i in range(2):
        axes[i,j].scatter(x, y[:,4*i+j], c=colors[j], s=25)
    # each axes in the top row gets its own axes title
    axes[0,j].set_title('title {}'.format(j+1))
    # save the axes bounding boxes for later use
    ext.append([axes[0,j].get_window_extent().x0, axes[0,j].get_window_extent().width ])
# this is optional
# from the axes bounding boxes calculate the optimal position of the column spanning title
inv = fig.transFigure.inverted()
width_left = ext[0][0]+(ext[1][0]+ext[1][1]-ext[0][0])/2.
left_center = inv.transform( (width_left, 1) )
width_right = ext[2][0]+(ext[3][0]+ext[3][1]-ext[2][0])/2.
right_center = inv.transform( (width_right, 1) )
# set column spanning title
# the first two arguments to figtext are x and y coordinates in the figure system (0 to 1)
plt.figtext(left_center[0],0.88,"Left column spanning title", va="center", ha="center", size=15)
plt.figtext(right_center[0],0.88,"Right column spanning title", va="center", ha="center", size=15)
axes[0,0].set_ylim([0,1])
axes[0,0].set_xlim([0,10])
plt.show()
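A slightly shorter variant of the same idea, as a sketch: ax.get_position() already returns the axes bounding box in figure coordinates, so the centre can be computed without the inverse transform. The helper name span_title is made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=4, figsize=(8, 5))
fig.subplots_adjust(top=0.78)  # leave room above the top row

def span_title(fig, left_ax, right_ax, text, y=0.88):
    """Centre text over the horizontal span from left_ax to right_ax."""
    # get_position() gives the axes box in figure coordinates (0 to 1)
    x_center = (left_ax.get_position().x0 + right_ax.get_position().x1) / 2
    return fig.text(x_center, y, text, ha="center", va="center", size=15)

left = span_title(fig, axes[0, 0], axes[0, 1], "Left column spanning title")
right = span_title(fig, axes[0, 2], axes[0, 3], "Right column spanning title")
```

The vertical position is still a guess, and the positions need recomputing if the layout changes afterwards (for example after tight_layout).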
New in matplotlib 3.4.0
You can use subfigures if you have matplotlib version >= 3.4.0 (as mentioned in a comment by @ra0).
Once the subfigures are created, you can treat them exactly as you would a normal figure and create subplots and add suptitles.
Documentation and examples on subfigures.
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
y = np.random.rand(10, 8)
colors = ["b", "g", "r", "violet"]
fig = plt.figure(figsize=(8, 5), constrained_layout=True)
subfigs = fig.subfigures(1, 2)
titles = ["Left spanning title", "Right spanning title"]
for i, subfig in enumerate(subfigs):
    axes = subfig.subplots(2, 2)
    for j, row in enumerate(axes):
        for k, ax in enumerate(row):
            ax.scatter(x, y[:, i*4 + j*2 + k], color=colors[i*2 + k], s=25)
            ax.set_xlim([0, 10])
            ax.set_ylim([0, 1])
            if j == 0:
                ax.set_title(f"fig{i}, row{j}, col{k}")
    subfig.suptitle(titles[i])
fig.suptitle("Very long figure title over the whole figure extent", fontsize='x-large')
plt.show()

Minor ticks in pandas plot

So I am trying to get minor tick grid lines to get displayed but they don't seem to appear on the plot. An example code is
data_temp = pd.read_csv(dir_readfile, dtype=float, delimiter='\t',
                        names = names, usecols=[0,1,2,3,4])
result = data_temp.groupby(['A', 'D']).agg({'B':'mean', 'E':'mean'})
result2 = result.unstack()
x = np.arange(450, 700, 50, dtype = int)
plt.grid(True, which='both')
plt.minorticks_on()
result2.B.plot(lw=2,colormap='jet',marker='.',markersize=4,
               title='A v/s B', legend = True, grid = 'on' ,
               xlim = [450, 700], ylim = [-70, -0], xticks = x)
What I get is
The major grid lines are displayed but the minor ones are not. I looked into the pandas documentation but only see the grid option. I was hoping to get minor grid lines at every 10th location on the X axis, that is 460, 470, etc., and at every location on the Y axis (the actual scale of Y is a bit smaller).
Before plt.show(), add plt.minorticks_on().
If you want minor ticks on only one axis, switch them off on the other. For example, to hide the minor ticks on the x-axis:
ax = plt.gca()
ax.tick_params(axis='x',which='minor',bottom=False)
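To get the minor grid lines from the question (every 10 units on x), something along these lines may work. It is a sketch with made-up data in place of result2.B, and it assumes the problem is that plt.grid and plt.minorticks_on were called before pandas created its own Axes. Here the Axes returned by .plot() is configured afterwards instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import MultipleLocator

# Hypothetical data standing in for result2.B
x = np.arange(450, 701, 50)
df = pd.DataFrame(
    {"B1": np.linspace(-60, -10, len(x)), "B2": np.linspace(-50, -5, len(x))},
    index=x,
)

# DataFrame.plot returns the Axes it drew on
ax = df.plot(lw=2, marker=".", xlim=[450, 700], ylim=[-70, 0], xticks=x)

ax.minorticks_on()
ax.xaxis.set_minor_locator(MultipleLocator(10))  # minor ticks at 460, 470, ...
ax.grid(True, which="major", linewidth=0.8)
ax.grid(True, which="minor", linestyle=":", linewidth=0.5)

minor_x = ax.xaxis.get_minorticklocs()
```

The y-axis keeps the automatic minor locator from minorticks_on(). A MultipleLocator can be set on ax.yaxis in the same way if you want a fixed spacing there too.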
