I am trying to plot my data so that the predicted values are superimposed on the actual values. It works, but the bars that represent the y values become ridiculously small and uninterpretable, and the x-axis labels only show at the bottom of the last graph.
A bit of background: each class ID gets its own subplot, with its own actual and predicted values.
g = sns.catplot(data=plt_df,
y='Outcome',
x='DT',
kind='bar',
ci=None,
hue='Outcome_Type',
row='CLASS_ID',
palette=sns.color_palette(['red', 'blue']),
height = 10,
aspect = 3.5)
g.fig.subplots_adjust(hspace=1)
fig, ax = plt.subplots(figsize=(20, 9))
g.fig.suptitle("Distribution Plot Comparing Actual and Predicted Visits given calibrated Betas - " + describe_plot)
g.set_xlabels('Drive Time (Mins)')
g.set_ylabels('Visits Percentage')
plt.xticks(rotation= 90)
plt.show()
This is the graph I obtained from the code shown below (this is a snippet of a much larger script)
dataset = pd.read_csv('mon-ac-on-uni-on.csv')
print(dataset.columns)
X_test_mon = dataset[['Day', 'Month', 'Hour', 'AirConditioning', 'Temp','Humidity', 'Calender','Minute']]
y_test_mon = dataset.loc[:, 'AVG(totalRealPower)'].values
print(X_test_mon.columns)
y_pred_mon=regr.predict(X_test_mon)
plt.plot(y_test_mon, color = 'red', label = 'Real data')
plt.plot(y_pred_mon, color = 'blue', label = 'Predicted data')
plt.title('Random Forest Prediction- MONDAY- AC-ON-Uni-ON')
plt.legend()
plt.xlabel('Time')
plt.ylabel('Watt')
plt.show()
As you can see, it has the row count on the x-axis and power in watts on the y-axis.
Now I want to have only the time (Hour) ticks (8 to 17) on the x-axis and power in kW (i.e. divided by 1000) on the y-axis.
To achieve that I tried the following:
plt.xticks(X_test_mon['Hour'])
plt.yticks(np.round(y_test_mon/1000))
but what I got is shown below: just a black square on both axes.
I also tried
plt.xticks(range(8,17))
but no change. I am lost here. Please help!
As far as I can see, y_test_mon and y_pred_mon are plotted against the index of the respective dataset. From the line where X_test_mon is defined, I suspect that the smallest time step between data points in the plot is 1 hour.
Right now the plot is drawn for the whole monitoring time span. Try the following:
dates = X_test_mon.groupby(['Day','Month']).groups.keys()
for day, month in dates:
    fig, ax = plt.subplots()
    # y_test_mon and y_pred_mon are NumPy arrays, so the row mask is built from X_test_mon
    mask = ((X_test_mon['Day'] == day) & (X_test_mon['Month'] == month)).values
    hours = X_test_mon.loc[mask, 'Hour']
    ax.plot(hours, y_test_mon[mask], color='red', label='Real data')
    ax.plot(hours, y_pred_mon[mask], color='blue', label='Predicted data')
    ax.legend()
    plt.xlabel('Time')
    plt.ylabel('kW')
    # values were selected from the provided image, should fit the actual plotted data range
    major_ticks = np.arange(20000, 120000, 20000)
    # for plt.yticks(actual, replacement) you have to provide the actual tick (data) values and then the
    # "replacement" values
    plt.yticks(major_ticks, major_ticks/1000)
plt.show()
This should generate multiple figures (one for each day) that contain hourly data and
y-axis scaling in kW.
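As a small illustration of the tick relabelling used above, here is a sketch in plain Python that builds the tick positions and their kW labels. The helper name kw_tick_labels is made up for this example, and the tick range is the one guessed from the image in the answer. It also shows that range(8, 17) stops at 16, so covering the hours 8 to 17 needs range(8, 18).

```python
def kw_tick_labels(ticks_w):
    """Return one kW label string per tick position given in watts."""
    return [f"{w / 1000:g}" for w in ticks_w]


# Tick positions stay in the units of the plotted data (watts)
major_ticks = list(range(20000, 120000, 20000))
labels = kw_tick_labels(major_ticks)
print(major_ticks)
print(labels)

# range() excludes its stop value, so 8..17 inclusive needs a stop of 18
hour_ticks = list(range(8, 18))
print(hour_ticks)
```

The two lists can then be passed as plt.yticks(major_ticks, labels), and hour_ticks as plt.xticks(hour_ticks), once the x values are the hours themselves rather than the row index.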
I have plotted a graph in Python with a subplot of residuals, and I am trying to find a way to add a histogram of the residuals at the end of the residual plot. I would also like to add a grey band on the residual plot showing 1 standard deviation.
Also, is there a way to remove the top and right-hand borders of the plot?
Here is a copy of the code and the graph I currently have.
fig1 = pyplot.figure(figsize =(9.6,7.2))
plt.frame1 =fig1.add_axes((0.2,0.4,.75,.6))
pyplot.errorbar(xval, yval*1000, yerr=yerr*1000, xerr=xerr, marker='x', linestyle='None')
# Axis labels
pyplot.xlabel('Height (m)', fontsize = 12)
pyplot.ylabel('dM/dt (g $s^{-1}$)', fontsize = 12)
# Generate best fit line using model function and best fit parameters, and add to plot
fit_line=model_funct(xval, [a_soln, b_soln])
pyplot.plot(xval, fit_line*1000)
# Set suitable axis limits: you will probably need to change these...
#pyplot.xlim(-1, 61)
#pyplot.ylim(65, 105)
# pyplot.show()
plt.frame2 = fig1.add_axes((0.2,0.2,.75,.2)) #start frame1 at 0.2, 0.4
plt.xlabel("Height of Water (m)", fontsize = 12)
plt.ylabel("Normalised\nResiduals", fontsize = 12) #\n is used to start a new line
plt.plot(h,normalised_residuals,"x", color = "green")
plt.axhline(0, linewidth=1, linestyle="--", color="black")
plt.savefig("Final Graph.png", dpi = 500)
The naming in your code is a bit unusual and it is hard to run it myself, so I am only posting snippets. Sometimes you use pyplot and sometimes plt, which should be the same module. You should also keep a reference to each axes, like this: ax = fig1.add_axes((0.2,0.4,.75,.6)). Then, when you plot, call the method on the axes directly, i.e. use ax.errorbar().
To hide the borders of the axis in the top plot use:
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
Adding an error band in the bottom plot is pretty easy to do. Just calculate the mean and standard deviation using np.mean() and np.std(). Afterwards, call
plt.fill_between(h, y1=np.mean(normalised_residuals) - np.std(normalised_residuals),
y2=np.mean(normalised_residuals) + np.std(normalised_residuals),
color='gray', alpha=.5)
and change the color and alpha however you want it to be.
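To make the band limits concrete, here is a minimal sketch using only the standard library. The function name one_sigma_band and the residual values are invented for the example; statistics.pstdev is used because it divides by N, as np.std does with its default ddof=0.

```python
import statistics


def one_sigma_band(residuals):
    """Return (lower, upper) = mean -/+ one population standard deviation."""
    mean = statistics.fmean(residuals)
    # pstdev divides by N, which matches np.std with its default ddof=0
    sigma = statistics.pstdev(residuals)
    return mean - sigma, mean + sigma


# Made-up residuals, chosen so the numbers are easy to check by hand
example_residuals = [-3, -1, -1, -1, 0, 0, 2, 4]
lower, upper = one_sigma_band(example_residuals)
print(lower, upper)
```

The two numbers correspond to y1 and y2 in the fill_between call above. For this made-up sample the mean is 0 and the standard deviation is 2, so the band runs from -2 to 2.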
For the histogram projection, just add another axes as you have done twice before (let's assume it is called ax) and call
ax.hist(normalised_residuals, bins=8, orientation="horizontal")
Here, bins probably has to be set to a small value, since you don't have that many data points.
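One practical detail is where the extra axes should go. The residual axes in the question span from 0.2 to 0.95 of the figure width, which leaves little room on the right. The sketch below is one possible layout, not the only one: it splits an add_axes rectangle into a main part and a side part. The helper name split_axes_rect and the 20% share for the histogram are assumptions made for illustration.

```python
def split_axes_rect(rect, hist_fraction=0.2):
    """Split an add_axes rectangle (left, bottom, width, height) into two
    side-by-side rectangles: the main plot and a histogram to its right."""
    left, bottom, width, height = rect
    hist_width = width * hist_fraction
    main_width = width - hist_width
    main_rect = (left, bottom, main_width, height)
    hist_rect = (left + main_width, bottom, hist_width, height)
    return main_rect, hist_rect


# Rectangle used for the residual axes in the question
main_rect, hist_rect = split_axes_rect((0.2, 0.2, 0.75, 0.2))
print(main_rect)
print(hist_rect)
```

Each tuple can be passed to fig1.add_axes(). The top axes would need the same reduced width to stay aligned with the residual plot, and passing sharey with the residual axes when creating the histogram axes keeps the two y scales identical.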
I'm working with biological data, and am using the heatmap from Seaborn to plot Pearson R values so I can visually compare expression of each of 22 cell types with every other cell type (making a 22x22 heatmap).
I've had two separate problems. Here is what my plots looked like the other day:
You can see that in one plot the y-tick labels are spaced, but the last label is clipped off. Then in the other plot, all of the y-tick labels are present, but are spaced too tightly.
Here's what my plots look like today. Foolishly, I am not sure what I changed when trying to fix the issue in the first image (aside from changing vmin, vmax, and the color scheme), but now the y-tick labels are all on top of each other.
Here is my code to generate the heatmaps:
def cancerHeatmap(patients, cancer):
    x_labels = ['Naive B','Memory B','Plasma','CD8 T','CD4 T Naive','CD4 T MR',
                'CD4 T MA','FH T','Tregs','GD T',
                'NK resting','NK activated','Monos','Macros M0','Macros M1','Macros M2',
                'Dendritic resting','Dendritic activated','Mast resting','Mast activated','Eosinophils','Neutrophils']
    pearsonrs = getPearsons(patients, cancer)
    axs = sns.heatmap(pearsonrs, cmap = 'RdYlBu', vmin=-0.6, vmax=0.6)
    axs.set_yticklabels(x_labels, rotation = 0, fontsize = 10)
    axs.set_xticklabels(x_labels, rotation = 90)
    axs.xaxis.set_ticks_position('top')
    axs.set_xlabel(cancer)
    return axs

sns.set_style("whitegrid")
sns.despine()
plt.figure(figsize=(12,70))
plt.suptitle('Subtitle')
for i, cancer in enumerate(cancer_types):
    print cancer
    plt.subplot((len(cancer_types)/2)+1,2, i+1)
    ax = cancerHeatmap(patients, cancer)
plt.tight_layout(rect=[0, 0, 1, 0.97])
plt.savefig('outfile.pdf', dpi=200)
The important details of this are:
"getPearsons" is the function that generates "pearsonrs", the numpy array of all of the values that go directly into the heatmap
"cancer_types" is a list of the different cancer types that I'll be generating heatmaps for
I have no idea why this is happening, especially because I can generate one plot at a time with this code, and the axes are perfect:
x_labels = ['Naive B','Memory B','Plasma','CD8 T','CD4 T Naive','CD4 T MR',
            'CD4 T MA','FH T','Tregs','GD T',
            'NK resting','NK activated','Monos','Macros M0','Macros M1','Macros M2',
            'Dendritic resting','Dendritic activated','Mast resting','Mast activated','Eosinophils','Neutrophils']
axs = sns.heatmap(np.array(pearsonrs), cmap = 'rainbow', vmin=-0.6, vmax=0.6)
axs.invert_yaxis()
axs.set_yticklabels(x_labels, rotation = 0, fontsize = 10)
axs.set_xticklabels(x_labels, rotation = 90, fontsize = 10)
axs.set_xlabel('Cancer')
axs.xaxis.set_ticks_position('top')
Any help is immensely appreciated.
Edit:
I ran identical scripts on my rMBP and my coworker's rMBP, and mine produced the figures with the stacked axis labels (image 2), and my coworker's computer produced the figures with odd spacing (image 1).
Edit 2:
It turns out that the Conda installation was slightly different on each computer. Updating both computers to the most recent Conda (4.5.9) does not change the heatmaps that either computer produces.
I'm using matplotlib to look at how wins are distributed based on betting odds for the MLB. The issue is that because betting odds are either >= 100 or <= -100, there's a big gap in the middle of my histogram.
Is there any way to exclude certain bins (specifically anything between -100 and 100) so that the bars of the chart flow more smoothly?
Link to current histogram
Here's the code I have right now:
num_bins = 20
fig, ax = plt.subplots()
n, bins, patches = ax.hist(winner_odds_df['WinnerOdds'], num_bins,
range=range_of_winner_odds)
ax.set_xlabel('Betting Odds')
ax.set_ylabel('Win Frequency')
ax.set_title('Histogram of Favorite Win Frequency Based on Betting Odds (2018)')
fig.tight_layout()
plt.show()
You could break your chart's x-axis as explained here, by plotting on two different axes that are made to visually look like one plot. The essential part, rewritten to apply to the x-axis instead of the y-axis, is:
f, (axl, axr) = plt.subplots(1, 2, sharey=True)
# plot the same data on both axes
axl.hist(winner_odds_df['WinnerOdds'], num_bins)
axr.hist(winner_odds_df['WinnerOdds'], num_bins)
# limit each axis to one side of the gap between -100 and 100
axl.set_xlim(-500, -100) # negative odds
axr.set_xlim(100, 500) # positive odds
# hide the spines between axl and axr
axl.spines['right'].set_visible(False)
axr.spines['left'].set_visible(False)
axr.yaxis.tick_right()
# How much space to leave between plots
plt.subplots_adjust(wspace=0.15)
See the linked document for how to polish this by adding diagonal break lines. (Figure omitted: the basic version produced by the code above.)
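A related refinement, sketched here under assumptions: since hist also accepts an explicit sequence of bin edges, each side can be given its own edges so that no bin straddles the gap between -100 and 100. The helper linspace_edges is written for this example, the -500 and 500 limits are the ones used in the set_xlim calls, and 8 bins per side is an arbitrary choice.

```python
def linspace_edges(start, stop, n_bins):
    """Return n_bins + 1 equally spaced bin edges from start to stop."""
    step = (stop - start) / n_bins
    return [start + i * step for i in range(n_bins + 1)]


# Limits taken from the set_xlim calls; 8 bins per side is an arbitrary choice
left_edges = linspace_edges(-500, -100, 8)
right_edges = linspace_edges(100, 500, 8)
print(left_edges)
print(right_edges)
```

These lists would replace num_bins in the two calls, for example axl.hist(winner_odds_df['WinnerOdds'], bins=left_edges) and axr.hist(winner_odds_df['WinnerOdds'], bins=right_edges). Values outside the given edges are simply not counted on that axis.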
I'm trying to plot a simple histogram with multiple data in parallel.
My data are a set of 2D ndarrays, all of them with the same dimension (in this example 256 x 256).
I have this method to plot the data set:
def plot_data_histograms(data, bins, color, label, file_path):
    """
    Plot multiple data histograms in parallel
    :param data : a set of data to be plotted
    :param bins : the number of bins to be used
    :param color : the color of each data in the set
    :param label : the label of each color in the set
    :param file_path : the path where the output will be saved
    """
    plt.figure()
    plt.hist(data, bins, normed=1, color=color, label=label, alpha=0.75)
    plt.legend(loc='upper right')
    plt.savefig(file_path + '.png')
    plt.close()
And I'm passing my data as follows:
data = [sobel.flatten(), prewitt.flatten(), roberts.flatten(), scharr.flatten()]
labels = ['Sobel', 'Prewitt', 'Roberts Cross', 'Scharr']
colors = ['green', 'blue', 'yellow', 'red']
plot_data_histograms(data, 5, colors, labels, '../Visualizations/StatisticalMeasures/RMSEHistograms')
And I got this histogram:
I know this may be a stupid question, but I don't understand why my yticks range from 0 to 4.5. I know it is due to the normed parameter, but even after reading this:
If True, the first element of the return tuple will be the counts
normalized to form a probability density, i.e., n/(len(x)*dbin). In a
probability density, the integral of the histogram should be 1; you
can verify that with a trapezoidal integration of the probability
density function.
I didn't really get how it works.
Also, since I set the number of bins to five and the histogram has exactly 5 xticks (excluding the borders), I don't understand why some bars sit in the middle of a tick, like the yellow one over the 0.6 tick. Since the number of bins matches the number of xticks, I thought each set of four bars would be contained inside one interval, as happens with the first four bars, which lie completely inside the [0.0, 0.2] interval.
Thank you in advance.
This is confusing because you're squeezing four histograms into one plot. To do this, matplotlib narrows the bars and puts a gap between the groups. In a standard histogram, the total area of all bars is 1 if normed; otherwise the bar heights sum to N, the number of data points. Here's a simple example:
a = np.random.rand(10)
bins = np.array([0, 0.5, 1.0]) # just two bins
plt.hist(a, bins, normed=True)
First, note that each bar covers the entire range of its bin: the first bar ranges from 0 to 0.5, and its height is determined by the number of points in that range.
Next, you can see that the total area of the two bars is 1 because normed = True: the width of each bar is 0.5 and the heights are 1.2 and 0.8, so 0.5 × 1.2 + 0.5 × 0.8 = 1.
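The normalisation from the quoted documentation, n/(len(x)*dbin), can be checked by hand. The sketch below assumes that 6 of the 10 points fell into the first bin and 4 into the second; that split is an inference, chosen because it reproduces the heights 1.2 and 0.8 quoted above. The function name density_heights is made up for the example.

```python
def density_heights(counts, bin_edges):
    """Normalise counts to a probability density: n / (N * bin_width)."""
    total = sum(counts)
    heights = []
    for i, n in enumerate(counts):
        width = bin_edges[i + 1] - bin_edges[i]
        heights.append(n / (total * width))
    return heights


bin_edges = [0.0, 0.5, 1.0]
counts = [6, 4]  # assumed split of the 10 points, chosen to match the quoted heights
heights = density_heights(counts, bin_edges)
# Area = sum of height * width over all bins; it should come out as 1
area = sum(h * (bin_edges[i + 1] - bin_edges[i]) for i, h in enumerate(heights))
print(heights)
print(area)
```

Because each height is divided by the bin width, bars can be taller than 1 whenever the bins are narrower than 1, which is why a normed y-axis can run up to values such as 4.5. In recent Matplotlib releases the normed argument has been replaced by density=True.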
Let's plot the same thing again with another distribution so you can see the effect:
b = np.random.rand(10)
plt.hist([a, b], bins, normed=True)
Recall that the blue bars represent exactly the same data as in the first plot, but they're less than half the width now because they must make room for the green bars. You can see that two bars plus some whitespace now cover the range of each bin. So when working out the bin range and the bar area, we must pretend that each bar is as wide as all the bars in its bin plus the whitespace gap.
Finally, notice that the xticks are not tied to the bin edges. If you wish, you can place them on the edges manually with:
plt.xticks(bins)
If you hadn't manually created bins first, you can grab them from the return value of plt.hist:
counts, bins, bars = plt.hist(...)
plt.xticks(bins)
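To see why the default ticks and the bin edges usually disagree, it helps to compute the edges by hand. When bins is an integer and no range is given, the edges are equally spaced between the smallest and largest data value. The sketch below reproduces that arithmetic in plain Python; the function name default_bin_edges and the sample values are invented for the illustration.

```python
def default_bin_edges(values, n_bins):
    """Equal-width edges spanning min(values)..max(values), which is how an
    integer `bins` argument is interpreted when no range is given."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


# Made-up data whose minimum and maximum are not round numbers
values = [0.05, 0.12, 0.33, 0.48, 0.61, 0.77, 0.95]
edges = default_bin_edges(values, 5)
print([round(e, 2) for e in edges])
```

With five bins the edges land at roughly 0.05, 0.23, 0.41, 0.59, 0.77 and 0.95, while the automatic ticks sit at round values such as 0.2, 0.4 and 0.6. A group of bars can therefore straddle a tick even though the number of bins and the number of ticks match.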