This is the graph I obtained from the code shown below (a snippet of a much larger script):
dataset = pd.read_csv('mon-ac-on-uni-on.csv')
print(dataset.columns)
X_test_mon = dataset[['Day', 'Month', 'Hour', 'AirConditioning', 'Temp','Humidity', 'Calender','Minute']]
y_test_mon = dataset.loc[:, 'AVG(totalRealPower)'].values
print(X_test_mon.columns)
y_pred_mon=regr.predict(X_test_mon)
plt.plot(y_test_mon, color = 'red', label = 'Real data')
plt.plot(y_pred_mon, color = 'blue', label = 'Predicted data')
plt.title('Random Forest Prediction- MONDAY- AC-ON-Uni-ON')
plt.legend()
plt.xlabel('Time')
plt.ylabel('Watt')
plt.show()
As you can see, it has the row count on the x-axis and power in watts on the y-axis.
Now I want only the time (Hour) ticks (8 to 17) on the x-axis and the power in kW (i.e. divided by 1000) on the y-axis.
To achieve that, I tried the following:
plt.xticks(X_test_mon['Hour'])
plt.yticks(np.round(y_test_mon/1000))
but what I got is shown below: just a black square on both axes.
I also tried
plt.xticks(range(8,17))
but there was no change. I am lost here. Please help!
As far as I can see, the results from y_test_mon and y_pred_mon are plotted against the index of the respective dataset. From the line where X_test_mon is defined, I suspect that the smallest time step between data points in the plot is 1 hour.
Right now the plot is drawn for the whole monitoring time span. Try the following:
dates = X_test_mon.groupby(['Day', 'Month']).groups.keys()
for day, month in dates:
    fig, ax = plt.subplots()
    # y_test_mon and y_pred_mon are NumPy arrays, so build the row mask from X_test_mon
    mask = ((X_test_mon['Day'] == day) & (X_test_mon['Month'] == month)).values
    hours = X_test_mon.loc[mask, 'Hour']
    ax.plot(hours, y_test_mon[mask], color='red', label='Real data')
    ax.plot(hours, y_pred_mon[mask], color='blue', label='Predicted data')
    ax.set_xlabel('Time')
    ax.set_ylabel('kW')
    # values were selected from the provided image, should fit the actual plotted data range
    major_ticks = np.arange(20000, 120000, 20000)
    # plt.yticks(ticks, labels) takes the actual tick (data) values first and then the
    # "replacement" labels
    plt.yticks(major_ticks, major_ticks / 1000)
    ax.legend()
    plt.show()
This should generate multiple figures (one for each day) that contain hourly data and a y-axis labeled in kW.
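As an alternative to hand-picking tick values, here is a minimal sketch that keeps the data in watts and only changes how the tick labels are rendered. The data is made up for illustration (the names hours and power_w are assumptions, not from the question), and the x values are real hour values rather than row numbers, which is what allows ticks at 8 to 17 to land in the right place.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

# Hypothetical stand-in data: one reading every 15 minutes from 08:00 to 17:00.
hours = np.arange(8, 17.01, 0.25)         # fractional hour, e.g. Hour + Minute / 60
power_w = 60000 + 20000 * np.sin(hours)   # power in watts

fig, ax = plt.subplots()
# Plot against the hour values instead of the implicit row index.
ax.plot(hours, power_w, color="red", label="Real data")

# One tick per full hour, 8 through 17 inclusive.
ax.set_xticks(range(8, 18))

# Leave the data in watts; divide by 1000 only when formatting the tick labels.
ax.yaxis.set_major_formatter(FuncFormatter(lambda value, pos: f"{value / 1000:g}"))

ax.set_xlabel("Hour")
ax.set_ylabel("kW")
ax.legend()
```

With real data, the fractional hour could be computed from the Hour and Minute columns for a single day. Plotting several days on one axis this way would overlay them, so filtering per day first is probably still needed.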
Related
I am trying to plot my predicted values superimposed on the actual values. It does the job, but the bars that represent the y values become ridiculously small and uninterpretable, and the x-axis labels only show at the bottom of the last graph.
A bit of background: the class IDs are essentially subplots of different graphs with different actual and predicted values.
g = sns.catplot(data=plt_df,
                y='Outcome',
                x='DT',
                kind='bar',
                ci=None,
                hue='Outcome_Type',
                row='CLASS_ID',
                palette=sns.color_palette(['red', 'blue']),
                height=10,
                aspect=3.5)
g.fig.subplots_adjust(hspace=1)
fig, ax = plt.subplots(figsize=(20, 9))
g.fig.suptitle("Distribution Plot Comparing Actual and Predicted Visits given caliberated Betas - " + describe_plot)
g.set_xlabels('Drive Time (Mins')
g.set_ylabels('Visits Percentage')
plt.xticks(rotation= 90)
plt.show()
I am working on this plot and would like to increase the number of ticks on the x-axis, but I'm stuck. I can't find a good example that uses the Pandas plot method to do this.
I only got 8 ticks on the X-axis, I would like to double it, at least. How do I get this done?
With ax.xaxis.set_major_locator(MonthLocator()) I get more ticks, but then the text is overlapping, and I can't get it to rotate.
ax.set_xticklabels(ax.get_xticks(), rotation = 50) did nothing.
# Place in DataFrame
df_avg = pd.DataFrame(pd.read_sql_query(query_avg, con))
df_total = pd.DataFrame(pd.read_sql_query(query_total, con))
con.close()
# Plot data from the DB
ax = df_avg.plot(x='dag', y='day_avg', figsize=(25, 5))
ax2 = df_avg.plot(x='dag', y='avg_temp', secondary_y=True, ax=ax)
# Set Labels
ax.set_xlabel('Time', size=12)
ax.set_ylabel('Avg amount earn (€)', size=12)
ax2.set_ylabel('Avg temp (°C)', size=12)
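A minimal sketch of one way to get monthly ticks with rotated labels. It uses made-up data in place of the database query and plots with Matplotlib directly, so the x-axis uses plain Matplotlib date units. Whether this matches the original setup (in particular the secondary y-axis) is an assumption. Note that ax.set_xticklabels(ax.get_xticks(), ...) replaces the date labels with raw tick numbers, whereas tick_params only changes the rotation.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_avg: two years of daily values.
df_avg = pd.DataFrame({"dag": pd.date_range("2020-01-01", "2021-12-31", freq="D")})
df_avg["day_avg"] = np.linspace(100, 200, len(df_avg))

fig, ax = plt.subplots(figsize=(25, 5))
ax.plot(df_avg["dag"], df_avg["day_avg"])

# One major tick per month, labeled as year-month.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))

# Rotate the existing labels instead of replacing them.
ax.tick_params(axis="x", labelrotation=50)

fig.canvas.draw()  # force tick creation so the labels can be inspected
```

If the plot has to be created through df.plot, passing x_compat=True may be needed so that Matplotlib's date locators apply to the axis.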
I'm trying to have two y-axes with the same x-axis.
This is what I have tried. But the suicide rates are not showing up on the graph.
I'm new to this, so I was wondering if anyone could spot why it's not showing.
The picture is supposed to look like this with suicide rates in red and trust in blue with country as the x-axis
def suicidevstrustcountryplot(dat):
    # Does income index change trust for female led countries?
    # dat.plot(x ='Country', y='Income', kind = 'line')
    # plt.show()
    # create figure and axis objects with subplots()
    fig,ax = plt.subplots()
    # make a plot
    ax.plot(dat.Country, dat.Trust, color="red", marker="o")
    # set x-axis label
    ax.set_xlabel("Country",fontsize=14)
    for label in ax.get_xticklabels():
        label.set_rotation(90)
        label.set_ha('right')
    # set y-axis label
    ax.set_ylabel("Trust",color="red",fontsize=14)
    # twin object for two different y-axis on the sample plot
    ax2=ax.twinx()
    # make a plot with different y-axis using second axis object
    ax2.plot(dat.Country, dat.Trust,color="blue",marker="o")
    ax2.set_ylabel("Suicide rate",color="blue",fontsize=14)
    plt.show()
    # save the plot as a file
    fig.savefig('two_different_y_axis_for_single_python_plot_with_twinx.jpg',
                format='jpeg',
                dpi=100,
                bbox_inches='tight')
suicidevstrustcountryplot(Femaletrust)
suicidevstrustcountryplot.suicidevstrustcountryplot.sort_values(ascending=False)[:10].plot(kind='scatter' ,title='Country')
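One likely reason the suicide rates do not show up: in the function above, both ax.plot and ax2.plot draw dat.Trust, so the second axis repeats the trust values under a "Suicide rate" label. A minimal sketch of the twin-axis pattern with two different columns follows. The data is made up, and the column name SuicideRate is an assumption, since the real column name is not given in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; "SuicideRate" is an assumed column name.
dat = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Trust": [0.61, 0.48, 0.72, 0.55],
    "SuicideRate": [11.2, 9.8, 14.1, 7.5],
})

fig, ax = plt.subplots()
# First y-axis: suicide rate in red.
ax.plot(dat["Country"], dat["SuicideRate"], color="red", marker="o")
ax.set_xlabel("Country", fontsize=14)
ax.set_ylabel("Suicide rate", color="red", fontsize=14)
ax.tick_params(axis="x", labelrotation=90)

# Second y-axis sharing the same x-axis: trust in blue, from a different column.
ax2 = ax.twinx()
ax2.plot(dat["Country"], dat["Trust"], color="blue", marker="o")
ax2.set_ylabel("Trust", color="blue", fontsize=14)
```

If the figure should also be written to disk, calling fig.savefig before plt.show() is the safer order, since some backends discard the figure once the window is closed.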
I have the following code to print out columns from a pandas dataframe as two histograms:
from matplotlib.ticker import StrMethodFormatter
df = pd.read_csv('fairview_Procedure_combined.csv')
ax = df.hist(column=['precision', 'recall'], bins=25, grid=False, figsize=(12,8), color='#86bf91', zorder=2, rwidth=0.9)
ax = ax[0]
for x in ax:
    # Despine
    x.spines['right'].set_visible(False)
    x.spines['top'].set_visible(False)
    x.spines['left'].set_visible(False)
    # Switch off ticks (booleans; newer Matplotlib no longer treats "on"/"off" strings as flags)
    x.tick_params(axis="both", which="both", bottom=False, top=False, labelbottom=True, left=False, right=False, labelleft=True)
    # Draw horizontal axis lines
    vals = x.get_yticks()
    for tick in vals:
        x.axhline(y=tick, linestyle='dashed', alpha=0.4, color='#eeeeee', zorder=1)
    # Remove title
    x.set_title("")
    # Set x-axis label
    x.set_xlabel("test", labelpad=20, weight='bold', size=12)
    # Set y-axis label
    x.set_ylabel("count", labelpad=20, weight='bold', size=12)
    # Format y-axis label
    x.yaxis.set_major_formatter(StrMethodFormatter('{x:,g}'))
which gives the attached output:
However, I would like to have different labels on the x-axis (in particular, those listed in my column list, that is, precision and recall).
Also, I have a grouping column (semantic_type) that I would like to use to generate a set of paired graphs, but when I pass the by keyword to the hist method to group the histograms by semantic_type, I get the error "color kwarg must have one color per data set. 18 data sets and 1 colors were provided".
I figured it out using subplots... piece of cake.
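The poster did not share the final code, so the following is only a sketch of what a subplots-based version might look like. It uses made-up data in place of the CSV, draws one histogram per column on its own axes, and labels each x-axis with the column name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: two score columns with values in [0, 1).
rng = np.random.default_rng(0)
df = pd.DataFrame({"precision": rng.random(200), "recall": rng.random(200)})

columns = ["precision", "recall"]
fig, axes = plt.subplots(1, len(columns), figsize=(12, 8))
for ax, column in zip(axes, columns):
    # One histogram per column, each on its own axes.
    ax.hist(df[column], bins=25, color="#86bf91", rwidth=0.9, zorder=2)
    # Use the column name as the x-axis label.
    ax.set_xlabel(column, labelpad=20, weight="bold", size=12)
    ax.set_ylabel("count", labelpad=20, weight="bold", size=12)
    for side in ("right", "top", "left"):
        ax.spines[side].set_visible(False)
```

For the grouped case, the same loop could presumably be run once per group from df.groupby on the grouping column, creating one figure per group, which avoids passing a single color to a call that draws many data sets.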
In Pandas, I have a DataFrame of observations (baby bottle feeding volumes) that are indexed by a datetime and grouped by date:
...
bottles = bottles.set_index('datetime')
bottles = bottles.groupby(bottles.index.date)
I want to use matplotlib to plot the cumulative values as they increase each day--that is, show the volume of feedings as it increases each day and resets at midnight:
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_minor_locator(mdates.HourLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
bottles['volume'].cumsum().plot(kind='bar', figsize=[16,8])
ax.xaxis.grid(True, which="major")
ax.xaxis.grid(False, which="minor")
ax.yaxis.grid(True)
plt.gcf().autofmt_xdate()
plt.show()
Which produces:
I'd like to only label dates on the x-axis once per day, and I'd also like to only draw a vertical grid line on date boundaries (every 24 hours). Any recommendations for how to fix the above code?
Since you didn't provide any data, I generated some dummy data. In essence, you retrieve the ticks and tick labels on the x-axis and hide all of them except the one at the start of each day.
Note: this works for hourly data, so resample your dataframe to hours if necessary.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# generate dummy data and df
dates = pd.date_range('2017-01-01', '2017-01-10', freq='h')
df = pd.DataFrame(np.random.randint(0, 10, size=len(dates)), index=dates)
# cumsum with daily reset (pd.Grouper replaces the removed pd.TimeGrouper)
ax = df.groupby(pd.Grouper(freq='D')).cumsum().plot(kind='bar', width=1, align='edge', figsize=[16,8])
ax.xaxis.grid(True, which="major")
#ax.set_axisbelow(True)
# set x-labels to certain date format ('%m/%d/%y' is the portable spelling of '%D')
ticklabels = [i.strftime('%m/%d/%y') for i in df.index]
ax.set_xticklabels(ticklabels)
# only show labels once per day (at the start of the day)
xticks = ax.xaxis.get_major_ticks()
n = 24  # every 24 hours
for index, label in enumerate(ax.get_xaxis().get_ticklabels()):
    if index % n != 0:
        label.set_visible(False)  # hide labels
        xticks[index].set_visible(False)  # hide ticks (and their grid lines) where labels are hidden
ax.legend_.remove()
plt.show()
Result: