I need to reduce or manually set the number of ticks on the x-axis of a Matplotlib line plot. This question has been asked many times here. I've gone through as many of those answers as I could find, and through the Matplotlib docs, but I haven't found a solution I can get working, so I'm hoping for some help.
I have a Python dictionary with two key:value pairs: a list of datetime.datetime values and a list of floats. There are hundreds of values in each list; here's a snippet of the first elements for reference:
ws_kline_dict_01 = {'time': [datetime.datetime(2023, 2, 15, 10, 35, 8)], 'close': [22183.07]}
I've converted that dictionary to a Pandas dataframe so I can see it more easily in Jupyter and also stripped out the year, month and day from 'time' using:
df_kline_dict_01 = pd.DataFrame(ws_kline_dict_01)
df_kline_dict_01['time'] = df_kline_dict_01['time'].dt.strftime('%H:%M:%S')
When I plot this via Matplotlib using 'time' as the x-axis - it prints every value as a tick which is way too cluttered (see 'Plot: Post-Panda format' below).
If I leave the datetime.datetime in its original form - Matplotlib seems to auto-select how many values it displays and it displays "Day Hour:Minutes" instead of "Hour:Minutes:Seconds" - which isn't working for me (see 'Plot: Pre-Panda format' below).
I've tried plt.locator_params(axis='x', nbins=n), but this gives me a warning:
"UserWarning: 'set_params()' not defined for locator of type <class 'matplotlib.category.StrCategoryLocator'>".
For reference - this is the code I'm using to produce the plot:
plt.plot(df_kline_dict_01['time'], df_kline_dict_01['close'], color = 'green', label = 'close')
plt.xticks(rotation=45, ha='right')
plt.show()
How do I (at least) reduce or (ideally) explicitly set the number of values/ticks shown on the x-axis?
Seems like this should be a pretty simple formatting task - but so far it's beating me and I'd appreciate some help getting this sorted.
Plot: Pre-Panda format
Plot: Post-Panda format
Here is a possible solution using the .xaxis.set_major_locator() method (ticker here is matplotlib.ticker, e.g. from matplotlib import ticker). You can adjust the max_xticks variable to suit your use case.
...
df_kline_dict_01['time'] = df_kline_dict_01['time'].dt.strftime('%H:%M:%S')
fig, ax = plt.subplots()
ax.plot(df_kline_dict_01['time'], df_kline_dict_01['close'], color='green', label='close')
max_xticks = 6
ax.xaxis.set_major_locator(ticker.MaxNLocator(max_xticks))
plt.xticks(rotation=45, ha='right')
plt.show()
Note: I assigned max_xticks = 6 to make the code easier to follow; otherwise you could pass the value directly, as in .MaxNLocator(6), on the next line.
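As a variation on the answer above (not part of it), here is a minimal sketch that keeps 'time' as datetime values instead of converting it to strings, and uses a DateFormatter to get the "Hour:Minutes:Seconds" labels the question asks for. The data is synthetic and only stands in for ws_kline_dict_01; the Agg backend line is there so the sketch runs without a display and can be dropped in Jupyter. Note that MaxNLocator picks "nice" fractions of a day, so the ticks may not fall on round seconds.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import ticker

# Synthetic stand-in for ws_kline_dict_01: one sample per second (assumed data).
start = datetime.datetime(2023, 2, 15, 10, 35, 8)
times = [start + datetime.timedelta(seconds=i) for i in range(300)]
closes = [22183.07 + 0.5 * i for i in range(300)]
df = pd.DataFrame({"time": times, "close": closes})

fig, ax = plt.subplots()
ax.plot(df["time"], df["close"], color="green", label="close")

# Limit the number of ticks and format them as HH:MM:SS.
max_xticks = 6
ax.xaxis.set_major_locator(ticker.MaxNLocator(max_xticks))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
plt.xticks(rotation=45, ha="right")

fig.canvas.draw()  # forces the tick labels to be generated
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```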
Pass explicit tick locations, for example plt.xticks(np.arange(min, max, step), rotation=45, ha='right').
Fill in min, max and step as you wish.
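A small sketch of the np.arange suggestion, assuming the x values are strings as in the question. On a categorical (string) axis the i-th label sits at position i, so the tick locations are simply integer positions. The sample labels, prices and the step of 20 are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

# Assumed sample data: 120 unique time strings and matching prices.
labels = [f"10:{35 + i // 60:02d}:{i % 60:02d}" for i in range(120)]
closes = np.linspace(22183.07, 22190.0, len(labels))

fig, ax = plt.subplots()
ax.plot(labels, closes, color="green", label="close")

# Keep only every 20th position as a tick; the category formatter
# still supplies the matching string for each kept position.
step = 20
plt.xticks(np.arange(0, len(labels), step), rotation=45, ha="right")

fig.canvas.draw()  # forces the tick labels to be generated
shown = [t.get_text() for t in ax.get_xticklabels()]
```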
Related
Python beginner here :/!
The csv files can be found here (https://www.waterdatafortexas.org/groundwater/well/8739308)
I'm trying to subset my data and plot it by year or every 6 months, but I just can't make it work. This is my code so far:
data=pd.read_csv('Water well.csv')
data["datetime"]=pd.to_datetime(data["datetime"])
data["datetime"]
fig, ax = plt.subplots()
ax.plot(data["datetime"], data["water_level(ft below land surface)"])
ax.set_xticklabels(data["datetime"], rotation= 90)
and this is my data and the output. As you can see, it only plots 2021 by time
This is my data of water levels from 2016 to 2021 and the output of the code
When you run your script, you get the following warning:
UserWarning: FixedFormatter should only be used together with FixedLocator
ax.set_xticklabels(data["datetime"], rotation= 90)
Your example demonstrates why they included this warning.
Comment out your line
#ax.set_xticklabels(data["datetime"], rotation= 90)
and you have the following (correct) output:
With that line in place, your code takes the nine automatically generated x-axis ticks, removes the correct labels, and labels them instead with the first nine entries of the dataframe. Obviously, these labels are wrong, and this is the reason for the warning: either let matplotlib do the automatic labeling, or set both tick positions and labels using FixedLocator and FixedFormatter to ensure that they match.
For more information on Tick locators and formatters consult the matplotlib documentation.
P.S.: You also have to invert the y-axis because the data are in ft below land surface.
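To illustrate the points above, here is a minimal sketch that leaves the labeling to matplotlib's date locators and formatters rather than overwriting the labels, puts a tick every six months, and inverts the y-axis. The data is synthetic and only stands in for 'Water well.csv'; the choice of January and July as tick months is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily data standing in for 'Water well.csv' (assumption).
col = "water_level(ft below land surface)"
dates = pd.date_range("2016-01-01", "2021-12-31", freq="D")
levels = 50 + 5 * np.sin(np.arange(len(dates)) / 180.0)
data = pd.DataFrame({"datetime": dates, col: levels})

fig, ax = plt.subplots()
ax.plot(data["datetime"], data[col])

# One tick every six months (1 January and 1 July), labelled year-month.
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.tick_params(axis="x", labelrotation=90)

# Values are depths below the surface, so larger numbers belong lower down.
ax.invert_yaxis()

fig.canvas.draw()  # forces the tick labels to be generated
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```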
The problem is that you have too much data; you have to simplify it.
As a first step, you can try something like this:
data["datetime"]=pd.to_datetime(data["datetime"])
date = data["datetime"][0::1000][0:10]
level = data["water_level(ft below land surface)"][0::1000][0:10]
fig, ax = plt.subplots()
ax.plot(date, level)
# Rotate the automatically generated labels instead of overwriting them
ax.tick_params(axis="x", labelrotation=90)
date = data["datetime"][0::1000][0:10]
This line means: take index 0, then 1000, then 2000, and so on.
That gives you a new array, and from this new array you then take only the first 10 entries.
It's a dirty solution.
The best solution, in my opinion, is to create a new dataset with the average water level for each day or each week, and then plot that result.
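The averaging idea can be sketched with pandas resampling. This is only an illustration on made-up daily data; the column name is taken from the question, and "W" (weekly bins ending on Sunday) can be swapped for "D" to get daily means from finer-grained data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: 28 daily readings starting on a Monday (assumption).
col = "water_level(ft below land surface)"
dates = pd.date_range("2016-01-04", periods=28, freq="D")
data = pd.DataFrame({"datetime": dates, col: np.arange(28, dtype=float)})

# Resampling needs a datetime index; each weekly bin is reduced to its mean.
weekly = data.set_index("datetime")[col].resample("W").mean()

fig, ax = plt.subplots()
ax.plot(weekly.index, weekly.values)
ax.invert_yaxis()  # depths below the surface
```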
I am trying to plot this DataFrame which records various amounts of money over a yearly series:
from matplotlib.dates import date2num
jp = pd.DataFrame([1000,2000,2500,3000,3250,3750,4500], index=['2011','2012','2013','2014','2015','2016','2017'])
jp.index = pd.to_datetime(jp.index, format='%Y')
jp.columns = ['Money']
I would simply like to make a bar graph out of this using PyPlot (i.e. pyplot.bar).
I tried:
plt.figure(figsize=(15,5))
xvals = date2num(jp.index.date)
yvals = jp['Money']
plt.bar(xvals, yvals, color='black')
ax = plt.gca()
ax.xaxis_date()
plt.show()
But the chart turns out like this:
Only by increasing the width substantially will I start seeing the bars. I have a feeling that this graph is attributing the data to the first date of the year (2011-01-01 for example), hence the massive space between each 'bar' and the thinness of the bars.
How can I plot this properly, knowing that this is a yearly series? Ideally the x-axis would contain only the years. Something tells me that I do not need to use date2num(), since this seems like a very common, ordinary plotting exercise.
My guess as to where I'm stuck is that I'm not handling the year correctly. As of now I have the years as a DatetimeIndex, but maybe there are other steps I need to take.
This has puzzled me for 2 days. All the solutions I found online seem to use DataFrame.plot, but I would rather learn how to use PyPlot properly. I also intend to add two more sets of bars, and it seems like the most common way to do that is through plt.bar().
Thanks everyone.
You can either do
jp.plot.bar()
which gives:
or plot against the actual years:
plt.bar(jp.index.year, jp.Money)
which gives:
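If you would rather keep the datetime index on the x-axis, a third option (not from the answer above) is to give the bars a width in days. On a date axis one data unit is one day, so the default bar width of 0.8 is less than a day wide, which is why the bars in the question look so thin. A minimal sketch, where the width of 300 days is an arbitrary choice:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

years = ["2011", "2012", "2013", "2014", "2015", "2016", "2017"]
jp = pd.DataFrame({"Money": [1000, 2000, 2500, 3000, 3250, 3750, 4500]},
                  index=pd.to_datetime(years, format="%Y"))

fig, ax = plt.subplots(figsize=(15, 5))

# width is in days here; 300 fills most of each year (arbitrary choice).
bars = ax.bar(jp.index, jp["Money"], width=300, color="black")

# One tick per year, showing only the year.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
fig.canvas.draw()
```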
I'm trying to graph contaminants measured in a sample over time, and some sample dates are closer together. How do I plot this line with the current datetime values, but make each xtick equidistant?
This is what I've got so far, currently the ticks are bunched together when the samples were taken closer together.
date = df_TCE.SAMPLEDATE.unique()
date_IA14 = df_TCE.SAMPLEDATE[df_TCE.SYS_LOC_CODE == 'IA-14']
IA14 = df_TCE.AL_RESULT_VALUE[df_TCE.SYS_LOC_CODE == 'IA-14']
plt.plot(date_IA14, IA14)
plt.title('TCE Time Series')
plt.xlabel('Date')
plt.ylabel('Contaminant Level')
ax = plt.subplot()
ax.set_xticks(date_IA14)
ax.set_yticks([1, 2, 3, 4, 5, 6, 7])
ax.set_facecolor('seashell')
plt.show()
This is the output with the ticks bunched:
There are a few things you can try.
First, ensure that your dataframe series called SAMPLEDATE are datetime objects by running pandas.to_datetime(df_TCE.SAMPLEDATE). Resolve any parsing errors that arise so that you're truly dealing with a datetime x-axis rather than strings.
Then, check out fig.autofmt_xdate() instead of ax.set_xticks(date_IA14). Once our x axis is filled with proper datetime objects, matplotlib is smart enough to get us to reasonable xtick spacing.
If you dislike the defaults, check out matplotlib.dates.DayLocator() or the HourLocator() or the MonthLocator(), whatever meets your regular interval needs. You can apply it to your axes object like this:
ax.xaxis.set_major_locator(matplotlib.dates.DayLocator())
https://matplotlib.org/3.1.1/api/dates_api.html#matplotlib.dates.DayLocator
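Putting the steps above together, a minimal sketch might look like this. The dataframe contents are invented for illustration (only the column names come from the question), and MonthLocator is just one of the locators mentioned; pick whichever interval suits your sampling.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Invented sample data with unevenly spaced sample dates (assumption).
df_TCE = pd.DataFrame({
    "SAMPLEDATE": ["2019-01-05", "2019-01-12", "2019-03-20",
                   "2019-07-01", "2019-07-08"],
    "SYS_LOC_CODE": ["IA-14"] * 5,
    "AL_RESULT_VALUE": [1.2, 3.4, 2.1, 5.0, 4.3],
})

# Step 1: make sure the dates are real datetimes, not strings.
df_TCE["SAMPLEDATE"] = pd.to_datetime(df_TCE["SAMPLEDATE"])
mask = df_TCE["SYS_LOC_CODE"] == "IA-14"

fig, ax = plt.subplots()
ax.plot(df_TCE.loc[mask, "SAMPLEDATE"], df_TCE.loc[mask, "AL_RESULT_VALUE"])

# Step 2: regular ticks on the first of each month, independent of sample dates.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
fig.autofmt_xdate()  # rotates and aligns the date labels

fig.canvas.draw()
tick_dates = mdates.num2date(ax.get_xticks())
```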
I'm obviously making a very basic mistake in adding a rolling mean plot to my figure.
The basic plot of close prices works fine, but as soon as I add the rolling mean to the plot, the x-axis dates get screwed up and I can't see what it's trying to do.
Here's the code:
import pandas as pd
import matplotlib.pyplot as plot
df = pd.read_csv('historical_price_data.csv')
df['Date'] = pd.to_datetime(df.Date, infer_datetime_format=True)
df.sort_index(inplace=True)
ax = df[['Date', 'Close']].plot(figsize=(14, 7), x='Date', color='black')
rolling_mean = df.Close.rolling(window=7).mean()
plot.plot(rolling_mean, color='blue', label='Rolling Mean')
plot.show()
With this sample data set I am getting this figure:
Given the simplicity of this code, I'm obviously making a very basic mistake; I just can't see what it is.
EDIT: Interesting. Although @AndreyPortnoy's suggestion to set the index to Date results in the odd error that Date is not in the index, when I use the built-ins per his suggestion, the figure is no longer a complete mess, but for some reason the x-axis is reversed, and the ticks are no longer dates, but apparently ints (?) even though df.dtypes shows Date is datetime64[ns].
@Sandipan Dey: Here's what the dataset looks like. Per the code above I'm using pd.to_datetime() to convert to datetime64, and have tried df[::-1] to fix the problem where it is reversed when the 2nd plot (mov_avg) is added to the figure (but not reversed when the figure only has the 1 plot).
The fact that your dates for the moving averages start at 1970 suggests that an integer range index is used. It was generated by default when you read in the csv file. Try inserting
df.set_index('Date', inplace=True)
before
df.sort_index(inplace=True)
Then you can do
ax = df['Close'].plot(figsize=(14, 7), color='black')
rolling_mean = df.Close.rolling(window=7).mean()
plot.plot(rolling_mean, color='blue', label='Rolling Mean')
Note that I'm not passing x explicitly, letting pandas and matplotlib infer it.
You can simplify your code by using the builtin plotting facilities like so:
df['mov_avg'] = df['Close'].rolling(window=7).mean()
df[['Close', 'mov_avg']].plot(figsize=(14, 7))
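For reference, here is a self-contained sketch of the approach above on synthetic data. The dataframe stands in for historical_price_data.csv and is deliberately stored newest-first, which is an assumption about the file; the colors mirror the ones in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for historical_price_data.csv, newest row first (assumption).
dates = pd.date_range("2020-01-01", periods=30, freq="D")[::-1]
df = pd.DataFrame({"Date": dates, "Close": np.arange(30, 0, -1, dtype=float)})

# Make Date the index first, then sort, so both series share a datetime x-axis.
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date").sort_index()

# The rolling mean inherits the datetime index from df.
df["mov_avg"] = df["Close"].rolling(window=7).mean()
ax = df[["Close", "mov_avg"]].plot(figsize=(14, 7), color=["black", "blue"])
```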
I am plotting some columns of a csv using Pandas/Matplotlib. The index column is the time in seconds (which contains very large numbers).
For example:
401287629.8
401287630.8
401287631.7
401287632.8
401287633.8
401287634.8
I need this to be printed as my xticklabel when I plot. But it is changing the number format as shown below:
plt.figure()
ax = dfPlot.plot()
legend = ax.legend(loc='center left', bbox_to_anchor=(1,0.5))
labels = ax.get_xticklabels()
for label in labels:
    label.set_rotation(45)
    label.set_fontsize(10)
I couldn't find a way for the xticklabel to print the exact value rather than shortened version of it.
This is essentially the same problem as "How to remove relative shift in matplotlib axis".
The solution is to tell the formatter not to use an offset:
ax.get_xaxis().get_major_formatter().set_useOffset(False)
Also related:
useOffset=False in config file?
https://github.com/matplotlib/matplotlib/issues/2400
https://github.com/matplotlib/matplotlib/pull/2401
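A minimal sketch of the offset fix, using the sample values from the question and plain matplotlib instead of dfPlot.plot(). The y values are invented. With numbers this large, my understanding is that the default formatter can still switch to scientific notation once the offset is disabled, so the sketch uses ax.ticklabel_format with style='plain' as well; treat that extra setting as an assumption to verify against your matplotlib version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt

# x values from the question; y values are invented for illustration.
x = [401287629.8, 401287630.8, 401287631.7,
     401287632.8, 401287633.8, 401287634.8]
y = [1, 3, 2, 5, 4, 6]

fig, ax = plt.subplots()
ax.plot(x, y)

# Disable the additive offset and the scientific-notation multiplier.
ax.ticklabel_format(axis="x", useOffset=False, style="plain")

for label in ax.get_xticklabels():
    label.set_rotation(45)
    label.set_fontsize(10)

fig.canvas.draw()  # forces the tick labels to be generated
offset_text = ax.xaxis.get_offset_text().get_text()
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```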
If it's not rude of me to point out, you're asking for a great deal of precision from a single chart. Your sample data shows a six-second difference between two times that are both more than twelve and a half years long.
You have to cut your cloth to your measure on this one. If you want to keep the years, you can't keep the seconds. If you want to keep the seconds, you can't have the years.