I have a pandas DataFrame with a DateTime index.
I can plot a timeseries from it, and by default it looks fine.
But when I try to plot a bar chart from the same DataFrame, the x-axis labels are ruined (massive overlapping). Also, the spacing of the data is weird (big gaps between sets of bars).
I tried autofmt_xdate(), but that didn't help.
This is the simple code fragment I used to generate the charts:
entire_df['predict'] = regr.predict(entire_df[X_cols])
entire_df['error'] = entire_df['predict']-entire_df['px_usd_mmbtu']
#entire_df['error'].plot(kind='hist')
fig=plt.figure()
entire_df[['px_usd_mmbtu', 'predict']].plot()
fig2 = plt.figure()
entire_df['error'].plot(kind='bar')
#fig2.autofmt_xdate() #doesn't help
print(type(entire_df.index))  # confirm the index type
Try this:
entire_df['predict'] = regr.predict(entire_df[X_cols])
entire_df['error'] = entire_df['predict'] - entire_df['px_usd_mmbtu']
#entire_df['error'].plot(kind='hist')
entire_df[['px_usd_mmbtu', 'predict']].plot()
# DataFrame.plot() opens its own figure, so create the big figure for the bar chart explicitly
fig2, ax2 = plt.subplots(figsize=(15, 15))
entire_df['error'].plot(kind='bar', ax=ax2)
ax2.tick_params(axis='x', labelrotation=90)  # or change from 90 to 45
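If a larger figure alone still leaves the labels crowded, another option is to keep the bar chart but label only every Nth bar. A pandas bar chart places bar i at integer position i and labels it with the index value, so you can pick a subset of those positions and format the dates yourself. This is a minimal sketch on made-up daily data; the series `s`, the figure size and the target of about 10 labels are assumptions for illustration, not taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for entire_df['error']: 100 daily values
idx = pd.date_range("2021-01-01", periods=100, freq="D")
s = pd.Series(np.random.default_rng(0).normal(size=len(idx)), index=idx, name="error")

fig, ax = plt.subplots(figsize=(15, 5))
s.plot(kind="bar", ax=ax, width=0.9)  # pandas puts bar i at x = i

# Keep roughly 10 labels: one every `step` bars
step = max(1, len(s) // 10)
positions = list(range(0, len(s), step))
ax.set_xticks(positions)
ax.set_xticklabels(s.index[::step].strftime("%Y-%m-%d"), rotation=45, ha="right")
fig.tight_layout()

fig.canvas.draw()  # make sure the tick label text is populated
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

If you would rather keep a real date axis, drawing the bars with ax.bar(s.index, s.values) lets Matplotlib's date locators choose the ticks, at the cost of the categorical spacing that pandas uses.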
I am trying to plot time-series data by hour, as shown in the image, but I keep getting this error: Locator attempting to generate 91897 ticks ([15191.0, ..., 19020.0]), which exceeds Locator.MAXTICKS (1000). I have tried all the solutions for similar problems on Stack Overflow but still could not get around it. Please help.
Link to image: https://drive.google.com/file/d/1b1PNCqVp7W65ciVPEWELiV2cTiXgBu2V/view?usp=sharing
Link to CSV: https://drive.google.com/file/d/113kYjsqbyL5wx1j204yK6Wmop4wLsqMQ/view?usp=sharing
Code I tried:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
df = pd.read_csv('price.csv', delimiter=',')
plt.rcParams['font.size'] = 18
fig, (ax) = plt.subplots(ncols=1, nrows=1, figsize=(20,15))
# Plot the data
ax.plot(df.VOLUME, label = 'volume')
# Set the title
ax.set_title("AUDUSD Hourly Variation\n", fontsize=23)
# Set the Y-Axis label
ax.set_ylabel("\nVolume")
# Set the X-Axis label
ax.set_xlabel('\nHour')
ax.legend() # Plot the legend for axes
# apply locator and formatter to the ticks on the X axis
ax.xaxis.set_major_locator(md.HourLocator(interval = 1)) # one major tick per hour
ax.xaxis.set_major_formatter(md.DateFormatter('%H')) # label each tick with the hour (00-23)
# Set the limits (range) of the X-Axis
ax.set_xlim([pd.to_datetime('2011.08.05', format = '%Y.%m.%d'),
pd.to_datetime('2022.01.28', format = '%Y.%m.%d')])
plt.tight_layout()
plt.show()
Thanks for your assistance.
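According to the error message, the tick values 15191 to 19020 are Matplotlib date numbers, which count days. So you are attempting to plot a time range of about 3,830 days (19020 - 15191 = 3829), i.e. roughly 91,900 hourly ticks, while a Locator may generate at most 1000 ticks. An interval of 4 or 5 hours therefore still produces far too many ticks (about 18,000 at interval = 5); the hourly interval would have to be roughly 92 or more. For a range of more than ten years, a coarser locator is more practical:
ax.xaxis.set_major_locator(md.MonthLocator(interval = 6))
ax.xaxis.set_major_formatter(md.DateFormatter('%Y-%m'))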
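The tick count in the error message can be checked by hand. Matplotlib stores dates as floating-point days since its epoch (1970-01-01 by default since Matplotlib 3.3), so 15191.0 and 19020.0 are days, not hours. The sketch below converts the two xlim dates with matplotlib.dates.date2num and estimates how many ticks an HourLocator with a given interval would need. The helper name n_hour_ticks is made up for illustration, and its count is an approximation of what the locator actually generates (both ends included).

```python
import matplotlib.dates as md
import matplotlib.ticker as mticker
import pandas as pd

start = pd.to_datetime('2011-08-05')
end = pd.to_datetime('2022-01-28')

# Date numbers are in days; their difference does not depend on the epoch
d0, d1 = md.date2num(start), md.date2num(end)
span_hours = (d1 - d0) * 24

def n_hour_ticks(interval):
    """Hypothetical helper: approximate tick count for HourLocator(interval=...)."""
    return int(span_hours // interval) + 1

limit = mticker.Locator.MAXTICKS  # the 1000-tick limit from the error message
smallest_ok = next(i for i in range(1, 1000) if n_hour_ticks(i) <= limit)
print(d1 - d0, n_hour_ticks(1), n_hour_ticks(5), smallest_ok)
```

With interval = 1 the estimate reproduces the 91897 ticks from the error message, which is a good sign that the xlim range, not the data, is what drives the tick count.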
Perhaps '.' doesn't work well as a date separator (because it is also used as the decimal separator in numbers). Try:
ax.set_xlim([pd.to_datetime('2011-08-05', format = '%Y-%m-%d'),
pd.to_datetime('2022-01-28', format = '%Y-%m-%d')])
If that makes the error go away, change the dates in your input data accordingly.
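A quick way to test the separator hypothesis in isolation is to parse both spellings and compare them. With an explicit format string, pandas should return the same Timestamp either way; if the two compare equal, the '.' separator alone is unlikely to be what triggers the Locator error. A minimal sketch:

```python
import pandas as pd

# Parse the same date written with '.' and with '-' separators
dotted = pd.to_datetime('2011.08.05', format='%Y.%m.%d')
dashed = pd.to_datetime('2011-08-05', format='%Y-%m-%d')
print(dotted, dashed, dotted == dashed)
```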
I am trying to plot a figure with 14 candlestick subplots of cryptocurrency data (https://www.kaggle.com/c/g-research-crypto-forecasting).
However, the figure is not displayed properly.
Here is my code:
from plotly import subplots
import plotly.graph_objects as go
fig = subplots.make_subplots(rows=7, cols=2)
column = lambda ix: 1 if ix % 2 == 0 else 2
for ix, coin_name in enumerate(asset_details["Asset_Name"]):
    coin_df = crypto_df[crypto_df["Asset_ID"] == asset_names_dict[coin_name]].set_index("timestamp")
    coin_df_mini = coin_df.iloc[-100:]
    candlestick = go.Candlestick(x=coin_df_mini.index, open=coin_df_mini['Open'], high=coin_df_mini['High'], low=coin_df_mini['Low'], close=coin_df_mini['Close'])
    fig = fig.add_trace(candlestick, row=((ix//2) + 1), col=column(ix))
fig.update_layout(xaxis_rangeslider_visible=False)
fig.update_layout(title_text="Candlestick Charts", height=2800)
fig.show()
And here is the problem (screenshot: rangeslider_problem).
No matter whether I plot the figure with or without the range slider, it's always out of shape.
You need to hide the range slider on every x-axis created by make_subplots; xaxis_rangeslider_visible=False only affects the first one. My answer was to do it for all 14 subplots manually. I don't have time to deal with this right now, but it can also be done in a loop.
fig.update_layout(xaxis1=dict(rangeslider=dict(visible=False)),
xaxis2=dict(rangeslider=dict(visible=False)),
xaxis3=dict(rangeslider=dict(visible=False)),
xaxis4=dict(rangeslider=dict(visible=False)),
xaxis5=dict(rangeslider=dict(visible=False)),
xaxis6=dict(rangeslider=dict(visible=False)),
xaxis7=dict(rangeslider=dict(visible=False)),
xaxis8=dict(rangeslider=dict(visible=False)),
xaxis9=dict(rangeslider=dict(visible=False)),
xaxis10=dict(rangeslider=dict(visible=False)),
xaxis11=dict(rangeslider=dict(visible=False)),
xaxis12=dict(rangeslider=dict(visible=False)),
xaxis13=dict(rangeslider=dict(visible=False)),
xaxis14=dict(rangeslider=dict(visible=False)),
)
I have a situation with my data. I like the behaviour of .plot() on a DataFrame, but sometimes it doesn't work because the frequency of the time index is not an integer.
Reproducing the plot in Matplotlib works, but it is ugly.
The part that bothers me the most is the setting of the x-axis: the tick frequency and the limits. Is there an easy way to reproduce this behaviour in Matplotlib?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Create Data
f = lambda x: np.sin(0.1*x) + 0.1*np.random.randn(1,x.shape[0])
x = np.arange(0,217,0.001)
y = f(x)
# Create DataFrame
data = pd.DataFrame(y.transpose(), columns=['dp'], index=None)
data['t'] = pd.date_range('2021-01-01 14:32:09', periods=len(data['dp']),freq='ms')
data.set_index('t', inplace=True)
# Pandas plot()
data.plot()
# Matplotlib plot (ugly x-axis)
plt.plot(data.index,data['dp'])
EDIT: Basically, what I want to achieve is similar spacing of the x-tick labels and the same tight x-axis margins. Legends and axis titles I can do myself.
[Image: Pandas output]
[Image: Matplotlib output]
Thanks
You can use some matplotlib date utilities:
Figure.autofmt_xdate() to unrotate and center the date labels
Axis.set_major_locator() to change the interval to 1 min
Axis.set_major_formatter() to reformat as %H:%M
import matplotlib.dates as mdates
fig, ax = plt.subplots()
ax.plot(data.index, data['dp'])
fig.autofmt_xdate(rotation=0, ha='center')
ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
# uncomment to remove the first `xtick`
# ax.set_xticks(ax.get_xticks()[1:])
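The "tight margin" part of the question can be handled with Axes.margins(x=0), which removes the default padding so the x-limits match the first and last timestamps, similar to the pandas output. The sketch below combines that with the locator and formatter from the answer. It rebuilds the question's data with a coarser 10 ms step (21,700 points instead of 217,000, same 217 s span) so it runs quickly, and it uses the non-interactive Agg backend; both are choices made for the example, not requirements.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same construction as in the question, but with a 10 ms step
x = np.arange(0, 217, 0.01)
y = np.sin(0.1 * x) + 0.1 * np.random.randn(x.shape[0])
data = pd.DataFrame({'dp': y},
                    index=pd.date_range('2021-01-01 14:32:09', periods=len(x), freq='10ms'))

fig, ax = plt.subplots()
ax.plot(data.index, data['dp'], label='dp')
ax.margins(x=0)  # no padding: x-limits = first and last timestamp
fig.autofmt_xdate(rotation=0, ha='center')
ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
ax.legend()

xmin, xmax = ax.get_xlim()  # in Matplotlib date numbers (days)
```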
I'm working on an assignment from school, and have run into a snag when it comes to my stacked area chart.
The data is fairly simple: 4 columns that look similar to this:
Series id | Year | Period | Value
LNS140000 | 1948 | M01 | 3.4
I'm trying to create a stacked area chart using Year as my x and Value as my y and breaking it up over Period.
#Stacked area chart still using unemployment data
x = d.Year
y = d.Value
plt.stackplot(x, y, labels = d['Period'])
plt.legend(d['Period'], loc = 'upper left')
plt.show()
However, when I do it like this, it only picks up M01, although there are M01 to M12. Any thoughts on how I can make this work?
You need to preprocess your data a little before passing it to the stackplot function. I worked from an example I found online to build one that could suit your case.
Since I've only seen one row of your data, I added some made-up values to the dataset.
import pandas as pd
import matplotlib.pyplot as plt
dd=[[1948,'M01',3.4],[1948,'M02',2.5],[1948,'M03',1.6],
[1949,'M01',4.3],[1949,'M02',6.7],[1949,'M03',7.8]]
d=pd.DataFrame(dd,columns=['Year','Period','Value'])
d.sort_values(by='Year', inplace=True)  # ensure the entire dataset is ordered by year first
years = d.Year.unique()
periods = d.Period.unique()
# Now group the values per period, in year order
pds = []
for p in periods:
    pds.append(d[d.Period == p]['Value'].values)
plt.stackplot(years,pds,labels=periods)
plt.legend(loc='upper left')
plt.show()
Is that what you want?
So I was able to use Seaborn to help out. First, I made a pivot table:
df = d.pivot(index = 'Year',
columns = 'Period',
values = 'Value')
df
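Then I set up Seaborn and plotted the pivot table:
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-v0_8')  # this style was named 'seaborn' before Matplotlib 3.6
sns.set_style("white")
sns.set_theme(style = "ticks")
df.plot.area(figsize = (20,9))
plt.title("Unemployment by Year and Month\n", fontsize = 22, loc = 'left')
plt.ylabel("Values", fontsize = 22)
plt.xlabel("Year", fontsize = 22)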
It seems to me that the problem you are having relates to the layout of the data. Look at how the values are arranged in the linked Matplotlib example. I would try to group the data by period, or pivot it into the right format, and then graph it again.
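Both the loop in the first answer and the pivot in the self-answer turn the long table (one row per year and month) into a wide one with one column per period, which is the shape stackplot expects: one sequence of y-values per stacked layer. A minimal sketch with plain Matplotlib, reusing the made-up sample values from the first answer (not real unemployment data):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

dd = [[1948, 'M01', 3.4], [1948, 'M02', 2.5], [1948, 'M03', 1.6],
      [1949, 'M01', 4.3], [1949, 'M02', 6.7], [1949, 'M03', 7.8]]
d = pd.DataFrame(dd, columns=['Year', 'Period', 'Value'])

# Long -> wide: rows are years, columns are periods (M01, M02, ...)
wide = d.pivot(index='Year', columns='Period', values='Value').sort_index()

fig, ax = plt.subplots()
# stackplot takes x plus one row of y-values per layer, so transpose the wide table
layers = ax.stackplot(wide.index, wide.T.to_numpy(), labels=wide.columns)
ax.legend(loc='upper left')
ax.set_xlabel('Year')
ax.set_ylabel('Value')
```

With the full dataset, the same pivot yields twelve columns (M01 to M12) and therefore twelve stacked layers.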
I have a bar graph with multiple data series, and I want to format the x-axis values with two decimals (%.2f). I already tried using set_major_formatter for the first graph, but it resets the values to 0, while the values should look like those in the second graph.
How can I fix this?
My code looks like this:
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as mtick
# select the measurement location
MATH = "import/data/place"
SAVE = "save/location"
fig, axes = plt.subplots(figsize=(12,15),nrows=2, ncols=1) # size of the plots and the placing
fig.subplots_adjust(hspace=0.5) # set space between plots
DATA = pd.read_csv(MATH,delimiter=',',usecols = [2,3,4,5,6,7,8,9,10,11,12],names = ['set_t','set_rh',
'type','math','ref','LUFFT','VPL','VPR','VVL','VVR','PRO'], parse_dates=True)
# select the data
temp = DATA.loc[(DATA['type']=='T')&(DATA['math']=='dif')] # dif temperature data
rh = DATA.loc[((DATA['type']=='RH')&(DATA['math']=='dif'))] # dif relative humidity data
# plot temperature
fg = temp.plot.bar(x='set_t',y = ['LUFFT','VPL','VPR','VVL','VVR','PRO'],
color = ['b','firebrick','orange','forestgreen','darkturquoise','indigo'],
ax=axes[0])
fg.grid(True)
fg.set_ylabel('$ΔT$(°C)',fontsize = 12)
fg.set_xlabel('ref $T$ (°C)',fontsize = 12)
fg.set_title('Difference in T from reference at constant relative humidity 50%',fontsize = 15)
fg.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.2f'))
fg.xaxis.set_major_formatter(mtick.FormatStrFormatter('%.2f'))
# plot relative humidity
df = rh.plot.bar(x='set_t',y = ['LUFFT','VPL','VPR','VVL','VVR','PRO'],
color = ['b','firebrick','orange','forestgreen','darkturquoise','indigo'],
ax=axes[1])
df.grid(True)
df.set_ylabel('$ΔU$(%)',fontsize = 12)
df.set_xlabel('ref $T$ (°C)',fontsize = 12)
df.set_title('Difference in U from reference at constant relative humidity 50%',fontsize = 15)
plt.tight_layout()
plt.savefig(SAVE + "_example.jpg")
plt.show()
A sample of my data:
07:40:00,07:50:00,39.85716354999982,51.00504745588235,T,dif,,0.14283645000018197,-0.07502069285698099,-0.15716354999978677,0.0020201234696060055,-0.07111703837193772,-0.0620802166664447,
07:40:00,07:50:00,39.85716354999982,51.00504745588235,RH,dif,,-0.40504745588239643,3.994952544117652,2.994952544117652,4.994952544117652,,6.994952544117652,
08:40:00,08:50:00,34.861160704969016,51.1297401832298,T,dif,,0.22883929503095857,0.2509082605481865,-0.2575243413326831,0.24864321659958222,0.14092262836431502,-0.04441070496899613,
08:40:00,08:50:00,34.861160704969016,51.1297401832298,RH,dif,,-0.32974018322978793,3.8702598167702007,2.8702598167702007,4.870259816770201,,6.870259816770201,
This is because, with a grouped bar plot made by Pandas like this one, the x-axis loses its actual 'range': each group of bars is drawn at an integer position (0, 1, 2, ...), and the value a tick formatter receives is that position, not your set_t value. You can see with fg.get_xlim() that the values have lost 'touch' with the original data and are simply increasing integers. You can explore/debug the 'values' and 'positions' Matplotlib uses by providing a FuncFormatter with a function like this:
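def check_pos(val, pos):
    print(val, pos)  # val is the tick position, pos its index
    return '%.2f' % val
fg.xaxis.set_major_formatter(mtick.FuncFormatter(check_pos))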
This basically shows that no formatter is going to work for your case.
Luckily, the tick labels are set correctly (as text), so you can parse them to float and format them as you wish.
Remove your formatter altogether, and set the xticklabels with:
fg.set_xticklabels(['%.2f' % float(x.get_text()) for x in fg.get_xticklabels()])
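To see this behaviour on a small scale, the sketch below builds a tiny DataFrame from the two 'T' rows of the sample data (only the set_t and LUFFT columns), makes the same kind of pandas bar plot, and then applies the tick-label fix. It draws the canvas before reading the labels because, depending on the Matplotlib version, tick label text may only be filled in at draw time; the Agg backend is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# set_t and LUFFT values copied from the two 'T' rows of the sample data
temp = pd.DataFrame({'set_t': [39.85716354999982, 34.861160704969016],
                     'LUFFT': [0.14283645000018197, 0.22883929503095857]})

fig, ax = plt.subplots()
fg = temp.plot.bar(x='set_t', y=['LUFFT'], ax=ax)
fig.canvas.draw()

# The bars sit at integer positions; the real values exist only as label text
positions = list(fg.get_xticks())
print(fg.get_xlim(), positions)

# Parse the label text and reformat it with two decimals
fg.set_xticklabels(['%.2f' % float(x.get_text()) for x in fg.get_xticklabels()])
fig.canvas.draw()
labels = [t.get_text() for t in fg.get_xticklabels()]
print(labels)
```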
Note that Matplotlib itself is perfectly capable of preserving the correct tick values in a bar plot, but you would have to do the grouping etc. yourself, which isn't very convenient either.
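A minimal sketch of that do-it-yourself route, assuming you are happy to place each group of bars at its real set_t value: every series is drawn with Axes.bar at set_t plus a small per-series offset, so the x-axis keeps real temperatures and FormatStrFormatter('%.2f') works as originally intended. The bar width of 0.5 °C and the offsets are arbitrary choices for this example; the values are the 'T' rows of the sample data, rounded to four decimals.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np
import pandas as pd

cols = ['LUFFT', 'VPL', 'VPR', 'VVL', 'VVR', 'PRO']
colors = ['b', 'firebrick', 'orange', 'forestgreen', 'darkturquoise', 'indigo']
# 'T' rows of the sample data, rounded to four decimals
temp = pd.DataFrame([[39.8572, 0.1428, -0.0750, -0.1572, 0.0020, -0.0711, -0.0621],
                     [34.8612, 0.2288, 0.2509, -0.2575, 0.2486, 0.1409, -0.0444]],
                    columns=['set_t'] + cols)

width = 0.5  # bar width in °C (arbitrary for this example)
x = temp['set_t'].to_numpy()
fig, ax = plt.subplots(figsize=(12, 6))
for i, (col, color) in enumerate(zip(cols, colors)):
    # centre the group of len(cols) bars on the real set_t value
    offset = (i - (len(cols) - 1) / 2) * width
    ax.bar(x + offset, temp[col], width=width, color=color, label=col)

ax.set_xticks(np.sort(x))  # one tick per group, at the real temperature
ax.xaxis.set_major_formatter(mtick.FormatStrFormatter('%.2f'))
ax.set_xlabel('ref $T$ (°C)')
ax.set_ylabel(r'$\Delta T$ (°C)')
ax.grid(True)
ax.legend()

fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
xmin, xmax = ax.get_xlim()
print(labels, xmin, xmax)
```

Because the bars live at real x coordinates, groups that are close in temperature would overlap; the width would then need to shrink accordingly.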