Hello, I have a list of about 150 dates stored as strings. I would like to set an interval so that only 10 ticks appear along the x-axis, but I am not sure how to do this without converting the strings to another type.
'1980-06',
'1980-09',
'1980-12',
'1981-03',
'1981-06',
'1981-09',
'1981-12',
...
You can pass the list of ticks you want to show. To build that list, divide the length of your date list by ten to get the step between ticks, then use Python's slicing to pick every step-th value. See below:
import math
dates = ['1980-06', '1980-09', '1980-12', '1981-12' ...] # Your date list
date_len = len(dates)
step = int(math.ceil(date_len/10))
ticks = dates[::step] # ticks to show on graph
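A self-contained sketch of the slicing approach above, with the selected labels passed on to the x-axis. The generated quarterly dates and the y-values are made-up placeholders, and `pick_ticks` is a hypothetical helper name, not something from the answer.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when plotting interactively
import matplotlib.pyplot as plt


def pick_ticks(labels, max_ticks=10):
    """Return every step-th label so that at most max_ticks labels remain."""
    step = max(1, math.ceil(len(labels) / max_ticks))
    return labels[::step]


# Placeholder data (assumption): 150 quarterly date strings starting at 1980-06.
dates = [f"{1980 + (5 + 3 * i) // 12}-{(5 + 3 * i) % 12 + 1:02d}" for i in range(150)]
values = list(range(len(dates)))

ticks = pick_ticks(dates)

fig, ax = plt.subplots()
ax.plot(dates, values)  # the strings are plotted as categories
ax.set_xticks(ticks)    # show only the selected labels
fig.autofmt_xdate()     # rotate the labels so they do not overlap
```

The dates stay strings throughout; only the set of labels drawn on the axis changes.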
Related
I have a dataframe with shape (2000, 2). There are two columns: value and date. I want to plot it with date on x axis. But since there are 2000 days there, I want to keep only 10 ticks on x axis. I tried this:
plt.plot(data["Date"], data["Value"])
plt.locator_params(axis='x', nbins=10)
plt.show()
But plot looks like this:
How to fix it?
From your plot, I'm going to assume the problem is that your "Date" column contains strings rather than datetimes (or pandas Timestamps), so matplotlib treats the values as categories. If the column were datetime-like, matplotlib would automatically select a reasonably suitable tick spacing.
You would need to convert those strings back to datetimes, for example with dateutil.parser:
from dateutil import parser
data['Date_dt'] = data['Date'].apply(parser.parse)
or via strptime (the format string in args may need to change depending on your date format):
from datetime import datetime
data['Date_dt'] = data['Date'].apply(datetime.strptime, args=['%Y-%m-%d %H:%M:%S'])
If, for some obscure reason, you really want EXACTLY 10 ticks, you could do something along the lines of:
plt.xticks(pd.date_range(data['Date_dt'].min(), data['Date_dt'].max(), periods=10))
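Putting the conversion and the fixed tick count together might look like the sketch below. It uses `pd.to_datetime` as an alternative to the two conversions shown above, and the 2000-row frame is made-up placeholder data shaped like the one in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when plotting interactively
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder frame (assumption): 2000 daily rows with the dates stored as strings.
data = pd.DataFrame({
    "Date": pd.date_range("2015-01-01", periods=2000, freq="D").strftime("%Y-%m-%d"),
    "Value": range(2000),
})

# Convert the strings to real timestamps so matplotlib stops treating them as categories.
data["Date_dt"] = pd.to_datetime(data["Date"], format="%Y-%m-%d")

fig, ax = plt.subplots()
ax.plot(data["Date_dt"], data["Value"])

# Optional: force exactly 10 evenly spaced ticks between the first and last date.
tick_positions = pd.date_range(data["Date_dt"].min(), data["Date_dt"].max(), periods=10)
ax.set_xticks(tick_positions)
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

Leaving out the last three lines keeps matplotlib's automatic date ticks, which is usually the more readable choice.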
I would like to construct a scatter plot using datetime objects on both axes. Dates (formatted as YYYY-MM-DD) will be placed on one axis; the second axis will display a 24-hour scale (i.e. from 0 to 24) and contain the timestamps of events (formatted as HH:MM in 24-hour format), such as a user logging into the server, that occurred on a given date. There could be several events on a particular date, for example a user logging in 2 or 3 times.
My questions: how do I use such datetime objects when creating a plot with matplotlib? Do I need to convert them before feeding them into matplotlib?
As in https://stackoverflow.com/a/1574146/12540580 :
You must first convert your timestamps to Python datetime objects (use datetime.strptime). Then use date2num to convert the dates to matplotlib format.
Plot the dates and values using plot_date:
dates = matplotlib.dates.date2num(list_of_datetimes)
matplotlib.pyplot.plot_date(dates, values)
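As an alternative to date2num and plot_date, matplotlib also accepts date objects directly, so the scatter plot from the question could be sketched as follows. The login events are made-up placeholders, and expressing the time of day as fractional hours is one possible choice, not the only one.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when plotting interactively
import matplotlib.pyplot as plt

# Placeholder login events (assumption); several events may share a date.
raw_events = ["2020-01-01 08:30", "2020-01-01 17:45", "2020-01-02 09:15", "2020-01-03 23:05"]
events = [datetime.strptime(s, "%Y-%m-%d %H:%M") for s in raw_events]

days = [e.date() for e in events]                 # x: calendar date
hours = [e.hour + e.minute / 60 for e in events]  # y: time of day as fractional hours

fig, ax = plt.subplots()
ax.scatter(days, hours)
ax.set_ylim(0, 24)
ax.set_yticks(range(0, 25, 4))
ax.set_ylabel("Hour of day")
fig.autofmt_xdate()  # rotate the date labels
```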
I want to plot timelines, my dates are formatted as day/month/year.
When building the index, I take care of that:
# format Date
test['DATA'] = pd.to_datetime(test['DATA'], format='%d/%m/%Y')
test.set_index('DATA', inplace=True)
and with a double check I see that months and days are correctly interpreted:
# the month number reflects the month, not the day: correctly imported!
test['Year'] = test.index.year
test['Month'] = test.index.month
test['Weekday Name'] = test.index.day_name()  # weekday_name was removed in pandas 1.0
However, when I plot, I see datapoints get connected erratically (although their distribution seems to be correct, since I expect a seasonality):
# Start and end of the date range to extract
start, end = '2018-01', '2018-04'
# Plot the daily time series
fig, ax = plt.subplots()
ax.plot(test.loc['2018', 'TMIN °C'],
        marker='.', linestyle='-', linewidth=0.5, label='Daily')
I suspect it may have to do with misinterpreted dates, or with dates not being in the right sequence, but I could not find a way to verify where the error is.
Could you help me check how to import my time series correctly?
Oh, it was super simple. I assumed the datetime index was sorted automatically; instead, one must sort it explicitly:
test.loc['2018-01':'2018-03'].sort_index().index #sorted
test.loc['2018-01':'2018-03'].index #not sorted
This question may be deleted or marked as a duplicate; I leave that to the moderators:
Pandas - Sorting a dataframe by using datetimeindex
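One way to verify whether a datetime index is in chronological order before plotting is the `is_monotonic_increasing` property. A minimal sketch, using a made-up three-row frame in place of the real data:

```python
import pandas as pd

# Placeholder frame (assumption): day/month/year strings in shuffled order.
test = pd.DataFrame({
    "DATA": ["03/01/2018", "01/01/2018", "02/01/2018"],
    "TMIN °C": [1.5, -0.5, 0.0],
})
test["DATA"] = pd.to_datetime(test["DATA"], format="%d/%m/%Y")
test = test.set_index("DATA")

print(test.index.is_monotonic_increasing)  # False: rows are not in chronological order

test = test.sort_index()
print(test.index.is_monotonic_increasing)  # True after sorting
```

A line plot connects points in row order, so an unsorted index produces the erratic connections described in the question.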
I am trying to plot a drone's altitude vs time (time on the x-axis and altitude on the y-axis). I converted my list of timestamps into a Matplotlib-readable format using dates = matplotlib.dates.date2num(timestamps). The altitudes list and the converted timestamps list both have exactly 16587 entries, so there is no mismatch there. The graph came out absolutely horrendous, and I would like to know how to make it readable with so much data. My full code is:
timestamps = []
for stamp in times:  # convert list of timestamp Strings to Python timestamp objects
    stamp = date + " " + stamp
    stamp = stamp.replace('.', ':')  # We want the milliseconds to be behind a colon so it can be easily formatted to DateTime
    stamp = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S:%f')
    timestamps.append(stamp)
dates = matplotlib.dates.date2num(timestamps)
for alt in altitudes:
    alt = round(float(alt), 2)
plt.plot_date(dates, altitudes)
plt.show()
The graph is indeed unreadable, although it is not clear what result you expect.
When plotting a huge number of points, it usually helps to also set the alpha parameter, which adds some transparency and lets you "see through" clouds of overlapping points.
You can then specify your xticks and yticks (possibly with the rotation parameter) to show fewer of them, and add plt.grid(True).
These are just basic suggestions. Try to be more specific about what "make this readable" means.
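The suggestions above could be combined roughly as in the sketch below. The timestamps and altitudes are made-up placeholders. One assumption worth checking in the real data: the loop in the question rebinds `alt` without changing the list, so if the altitudes were read as strings they stay strings and matplotlib plots them as categories. The list comprehension here converts them for real.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when plotting interactively
import matplotlib.pyplot as plt

# Placeholder data (assumption): one sample every 100 ms, altitudes stored as strings.
start = datetime(2020, 1, 1, 12, 0, 0)
timestamps = [start + timedelta(milliseconds=100 * i) for i in range(16587)]
altitudes = [str(10 + (i % 500) / 50) for i in range(16587)]

# Build a new list of floats; string y-values would be treated as categories.
altitudes = [round(float(alt), 2) for alt in altitudes]

fig, ax = plt.subplots(figsize=(10, 4))
# Small, semi-transparent markers keep dense regions legible.
ax.plot(timestamps, altitudes, linestyle="none", marker=".", markersize=2, alpha=0.3)
ax.tick_params(axis="x", rotation=45)  # rotate the time labels
ax.grid(True)
ax.set_xlabel("Time")
ax.set_ylabel("Altitude")
```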
I have stock data with OHLC attributes, and I want to plot an RSI indicator calculated from the close values. Because the stock data is sorted by date, the dates must be converted to numbers using date2num. But when the RSI values calculated from the close attribute are plotted against those dates, the line overlaps itself.
I thought the RSI result might not have the same length as the dates, but testing len(rsi) == len(df['date']) shows that the lengths are equal. I then replaced the date x-axis with a list of numbers made by range(0, len(df['date'])), and the plot looks as I expected.
#get data
df = df.tail(1000)
#convert date
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].apply(mdates.date2num)
#make indicator with TA-Lib
rsi = ta.RSI(df['close'], timeperiod=14)
#plot rsi indicator with TA-Lib
ax1.plot(df['date'], rsi)
ax2.plot(range(0, len(df['date'])), rsi)
#show chart
plt.show()
I expect the plot with dates on the x-axis to look the same as the plot with the list of numbers on the x-axis.
Image that shows the difference
It seems that matplotlib, when it chooses the x-ticks automatically, picks "round" values: a tick every 200 for your integers, and a tick every two months for your dates.
You seem to expect the dates to follow the same tick steps as the integers, but that would make the graph show arbitrary dates in the middle of a month, which isn't a good default behavior.
If that's the behavior you want, try something of this sort:
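rng = range(len(df['date']))
ax2.plot(rng, rsi)  # Same as in your example
ax2.set_xlim((rng[0], rng[-1]))  # Make sure no ticks outside of range
ticks = [int(t) for t in ax2.get_xticks() if rng[0] <= t <= rng[-1]]  # tick positions are floats; iloc needs in-range ints
ax2.set_xticks(ticks)
ax2.set_xticklabels(df['date'].iloc[ticks])  # Show respective dates in the locations of the integers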
This behavior can of course be reversed if you wish to show numbers instead of dates, using the same ticks as the dates, but I'll leave that to you.
After several attempts, I found the core of the problem. No data is recorded on weekends, so there are gaps in the dates. A matplotlib date x-axis still reserves space for the weekends even though there is no data for those days, so the line plot ends up overlapping.
I haven't found a solution yet, so for the time being I use the list of numbers.
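One possible way to keep the evenly spaced integer positions while still showing dates on the axis is to map each position back to its date with a tick formatter. The sketch below assumes business-day placeholder data and a stand-in series instead of the real RSI values; `format_date` is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when plotting interactively
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter, MaxNLocator

# Placeholder data (assumption): business days only, so weekends are missing.
dates = pd.bdate_range("2019-01-01", periods=100)
rsi = pd.Series(range(100), dtype=float)  # stand-in for the real RSI values


def format_date(x, pos=None):
    """Map an integer x position back to its date label; blank outside the data."""
    i = int(round(x))
    if 0 <= i < len(dates):
        return dates[i].strftime("%Y-%m-%d")
    return ""


fig, ax = plt.subplots()
ax.plot(range(len(dates)), rsi)  # evenly spaced points, no weekend gaps
ax.xaxis.set_major_locator(MaxNLocator(nbins=10, integer=True))  # ticks on whole positions only
ax.xaxis.set_major_formatter(FuncFormatter(format_date))
fig.autofmt_xdate()  # rotate the date labels
```

The trade-off is that the x-axis is no longer a true time scale: equal distances mean equal numbers of trading days, not equal calendar time.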