Once again I'm stuck in Python: I can't find a nice way of representing my data.
I've got a bunch of discharges that I want to plot. However, when I do this, the x-axis is too crowded with dates. How can I change the representation so that it shows just years (or just a few months)? My code is below; the figure of my current result is this:
[Figure: bar graph of the current result]
Thanks in advance!
time = pd.Series(pd.period_range('1/1/1970',
                                 freq='M', periods=12*12))
y = np.zeros(int(len(discharge)/3))
for i in range(int(len(discharge)/3)):
    y[i] = discharge.sum(axis=1)[i*3]
qdi_bar = pd.DataFrame()
qdi_bar['sum_discharge'] = y
qdi_bar.index = time
qdi_bar.plot.bar()
Edit 1: You can use this code with matplotlib:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# discharge is the DataFrame from the question
time = pd.Series(pd.period_range('1/1/1970',
                                 freq='M', periods=12*12))
# Change to timestamps instead of period ranges (the latter cause a TypeError when used with pyplot)
time = time.apply(lambda x: x.to_timestamp())
y = np.zeros(int(len(discharge)/3))
for i in range(int(len(discharge)/3)):
    y[i] = discharge.iloc[i*3].sum()
qdi_bar = pd.DataFrame()
qdi_bar['sum_discharge'] = y
qdi_bar.index = time
#Plot with matplotlib.pyplot instead of pandas
plt.bar(qdi_bar.index, qdi_bar["sum_discharge"], width = 15)
plt.show()
This works fine for me: I get a bar chart with only the years as ticks.
You can still apply the suggestion from the original answer and customize the result further with matplotlib.dates.
Original post: I don't know of any way to do this with pandas.DataFrame.plot(), but if you are willing to use matplotlib instead, then you can customize your x-axis as you like with matplotlib.dates.
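A minimal sketch of that matplotlib.dates customization, assuming random placeholder values in place of the summed discharge (the `months` and `values` names and the Agg backend line are my own additions, not from the question). YearLocator places one major tick on each 1 January, and DateFormatter('%Y') prints only the year:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data: 12 years of monthly values standing in for the summed discharge.
months = [datetime(1970 + m // 12, m % 12 + 1, 1) for m in range(12 * 12)]
values = np.random.default_rng(0).random(len(months))

fig, ax = plt.subplots()
ax.bar(months, values, width=15)  # on a date axis the width is measured in days

# One major tick per year, labelled with the four-digit year only.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))

fig.canvas.draw()
year_labels = [label.get_text() for label in ax.get_xticklabels()]
```

For ticks every few months instead, mdates.MonthLocator(interval=...) with a format such as '%b %Y' should work the same way.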
If I run the following, it appears to work as expected, but the y-axis is limited to the earliest and latest times in the data. I want it to show midnight to midnight. I thought I could do that with the code that's commented out. But when I uncomment it, I get the correct y-axis, yet nothing plots. Where am I going wrong?
from datetime import datetime
import matplotlib.pyplot as plt
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-01-04 11:55:09']
x = []
y = []
for i in range(0, len(data)):
    t = datetime.strptime(data[i], '%Y-%m-%d %H:%M:%S')
    x.append(t.strftime('%Y-%m-%d'))  # X-axis = date
    y.append(t.strftime('%H:%M:%S'))  # Y-axis = time
plt.plot(x, y, '.')
# begin = datetime.strptime('00:00:00', '%H:%M:%S').strftime('%H:%M:%S')
# end = datetime.strptime('23:59:59', '%H:%M:%S').strftime('%H:%M:%S')
# plt.ylim(begin, end)
plt.show()
Edit: I also noticed that the x-axis isn't right either. The data skips Jan 2, but I want that on the axis so the data is to scale.
This is a dramatically simplified version of code dealing with over a year's worth of data with over 2,500 entries.
If Pandas is available to you, consider this approach:
import pandas as pd
data = pd.to_datetime(data, yearfirst=True)
plt.plot(data.date, data.time)
_=plt.ylim(["00:00:00", "23:59:59"])
Update per comments
X-axis date formatting can be adjusted using the Locator and Formatter classes of the matplotlib.dates module. A Locator finds the tick positions, and a Formatter specifies how you want the labels to appear.
Sometimes Matplotlib/Pandas just gets it right; other times you need to spell out exactly what you want using these extra classes. In this case, I'm not sure why those numbers are showing up, but this code will remove them.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
ax.plot(data.date, data.time)
ax.set_ylim(["00:00:00", "23:59:59"])
days = mdates.DayLocator()
d_fmt = mdates.DateFormatter('%m-%d')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(d_fmt)
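If Pandas is not an option, here is a rough alternative sketch using only datetime and matplotlib.dates. The question's version plots strings, which Matplotlib treats as categories, so limits like '00:00:00' cannot line up with the data. One way around that (my assumption, not the only approach) is to keep real datetimes on both axes and move every time of day onto the same arbitrary dummy date; `DUMMY` is a name I made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44',
        '2018-01-03 15:30:27', '2018-01-04 11:55:09']
stamps = [datetime.strptime(s, '%Y-%m-%d %H:%M:%S') for s in data]

DUMMY = datetime(2000, 1, 1)  # assumption: any fixed date works, only the time part is shown
x = [t.replace(hour=0, minute=0, second=0) for t in stamps]     # calendar day at midnight
y = [datetime.combine(DUMMY.date(), t.time()) for t in stamps]  # time of day on the dummy date

fig, ax = plt.subplots()
ax.plot(x, y, '.')

# Midnight to midnight on the y-axis, labelled as clock time.
ax.set_ylim(DUMMY, DUMMY.replace(hour=23, minute=59, second=59))
ax.yaxis.set_major_locator(mdates.HourLocator(interval=3))
ax.yaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))

# One tick per day on the x-axis, so Jan 2 appears even though it has no data.
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))

fig.canvas.draw()
x_labels = [label.get_text() for label in ax.get_xticklabels()]
```

With a year of data, DayLocator would be far too dense; a MonthLocator on the x-axis is probably the better fit there.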
I have a bar graph with multiple data series and I want to format the x-axis values with two decimal places (%.2f). I already tried using set_major_formatter on the first graph, but it resets the values to 0, while the values should look like those in the second graph.
How can I fix this?
My code looks like this:
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as mtick
# select the measurement location
MATH = "import/data/place"
SAVE = "save/location"
fig, axes = plt.subplots(figsize=(12,15),nrows=2, ncols=1) # size of the plots and the placing
fig.subplots_adjust(hspace=0.5) # set space between plots
DATA = pd.read_csv(MATH,delimiter=',',usecols = [2,3,4,5,6,7,8,9,10,11,12],names = ['set_t','set_rh',
'type','math','ref','LUFFT','VPL','VPR','VVL','VVR','PRO'], parse_dates=True)
# select the data
temp = DATA.loc[(DATA['type']=='T')&(DATA['math']=='dif')] # dif temperature data
rh = DATA.loc[((DATA['type']=='RH')&(DATA['math']=='dif'))] # dif relative humidity data
# plot temperature
fg = temp.plot.bar(x='set_t',y = ['LUFFT','VPL','VPR','VVL','VVR','PRO'],
color = ['b','firebrick','orange','forestgreen','darkturquoise','indigo'],
ax=axes[0])
fg.grid(True)
fg.set_ylabel('$ΔT$(°C)',fontsize = 12)
fg.set_xlabel('ref $T$ (°C)',fontsize = 12)
fg.set_title('Difference in T from reference at constant relative humidity 50%',fontsize = 15)
fg.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.2f'))
fg.xaxis.set_major_formatter(mtick.FormatStrFormatter('%.2f'))
# plot relative humidity
df = rh.plot.bar(x='set_t',y = ['LUFFT','VPL','VPR','VVL','VVR','PRO'],
color = ['b','firebrick','orange','forestgreen','darkturquoise','indigo'],
ax=axes[1])
df.grid(True)
df.set_ylabel('$ΔU$(%)',fontsize = 12)
df.set_xlabel('ref $T$ (°C)',fontsize = 12)
df.set_title('Difference in U from reference at constant relative humidity 50%',fontsize = 15)
plt.tight_layout()
plt.savefig(SAVE + "_example.jpg")
plt.show()
A sample of my data:
07:40:00,07:50:00,39.85716354999982,51.00504745588235,T,dif,,0.14283645000018197,-0.07502069285698099,-0.15716354999978677,0.0020201234696060055,-0.07111703837193772,-0.0620802166664447,
07:40:00,07:50:00,39.85716354999982,51.00504745588235,RH,dif,,-0.40504745588239643,3.994952544117652,2.994952544117652,4.994952544117652,,6.994952544117652,
08:40:00,08:50:00,34.861160704969016,51.1297401832298,T,dif,,0.22883929503095857,0.2509082605481865,-0.2575243413326831,0.24864321659958222,0.14092262836431502,-0.04441070496899613,
08:40:00,08:50:00,34.861160704969016,51.1297401832298,RH,dif,,-0.32974018322978793,3.8702598167702007,2.8702598167702007,4.870259816770201,,6.870259816770201,
This happens because, with a grouped bar plot like this made by Pandas, the x-axis loses its actual 'range', and the value associated with each tick becomes the tick position itself. That's a bit cryptic, but you can see with fg.get_xlim() that the values have lost 'touch' with the original data and are simply increasing integers. You can explore/debug the 'values' and 'positions' Matplotlib uses if you provide a FuncFormatter with a function like this:
def check_pos(val, pos):
    print(val, pos)
    return '%.2f' % val
This basically shows that no formatter is going to work for your case.
Luckily the ticklabels are set correctly (as text), so you could parse these to float, and format them as you wish.
Remove your formatter altogether, and set the xticklabels with:
fg.set_xticklabels(['%.2f' % float(x.get_text()) for x in fg.get_xticklabels()])
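Here is a small self-contained sketch of that fix, using a made-up two-row frame shaped like the sample data (only two of the sensor columns, with rounded values I typed in for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up frame resembling the question's data (assumed values, two sensors only).
temp = pd.DataFrame({
    'set_t': [39.85716354999982, 34.861160704969016],
    'LUFFT': [0.1428, 0.2288],
    'VPL': [-0.0750, 0.2509],
})

fig, ax = plt.subplots()
temp.plot.bar(x='set_t', y=['LUFFT', 'VPL'], ax=ax)

# The tick positions are just 0, 1, ...; the real values live in the label text.
positions = list(ax.get_xticks())
new_labels = ['%.2f' % float(label.get_text()) for label in ax.get_xticklabels()]
ax.set_xticklabels(new_labels)
```

Printing `positions` shows why a numeric formatter only ever sees 0 and 1 here, while `new_labels` holds the rounded temperatures.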
Note that Matplotlib itself is perfectly capable of preserving the correct tick values in combination with a bar plot, but you would have to do the 'grouping' etc. yourself, so that's not very convenient either.
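For completeness, a rough sketch of what doing the grouping yourself could look like. The values, the three sensor names kept, and the `group_width` choice are assumptions for illustration. Because the bars sit at the real x values, a plain FormatStrFormatter works on the x-axis:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np

# Made-up values: two reference temperatures and three of the sensors.
set_t = np.array([34.861160704969016, 39.85716354999982])
series = {
    'LUFFT': [0.2288, 0.1428],
    'VPL': [0.2509, -0.0750],
    'VPR': [-0.2575, -0.1572],
}

fig, ax = plt.subplots()
group_width = 1.0                       # total width of one group, in x-axis units (assumed)
bar_width = group_width / len(series)
for i, (name, heights) in enumerate(series.items()):
    # Shift each series sideways so the whole group is centred on the true x value.
    offset = (i - (len(series) - 1) / 2) * bar_width
    ax.bar(set_t + offset, heights, width=bar_width, label=name)

ax.set_xticks(set_t)
ax.xaxis.set_major_formatter(mtick.FormatStrFormatter('%.2f'))
ax.legend()

fig.canvas.draw()
labels = [label.get_text() for label in ax.get_xticklabels()]
```

The trade-off is that the bars are now spaced according to the actual temperatures, so unevenly spaced reference values give unevenly spaced groups.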
I am plotting a large dataset from a database using matplotlib, and I use mpld3 to pass the figure to the browser. On the x-axis there are dates. The issue is that while plotting without mpld3 works perfectly, when I use it the dates don't appear correctly.
Here is my code:
date1 = '2015-04-22 20:28:50'
date2 = '2015-04-23 19:42:09'
db = Base('monitor').open()
result_set = db.select(['MeanVoltage','time'],"time>=start and time<=stop", start=date1, stop=date2)
V = [float(record.MeanVoltage) for record in result_set if record != 0]
Date = [str(record.time) for record in result_set]
dates = [datetime.datetime.strptime(record, '%Y-%m-%d %H:%M:%S') for record in Date]
dates = matplotlib.dates.date2num(dates)
fig, ax = plt.subplots()
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y %H:%M:%S' ))
plt.gcf().autofmt_xdate()
ax.plot(dates,V)
#mpld3.fig_to_html(fig)
#mpld3.show(fig)
plt.show()
That shows the plot perfectly.
Now, if I comment out this line only:
plt.show()
and uncomment these two:
mpld3.fig_to_html(fig)
mpld3.show(fig)
the figure appears in the browser like this:
As you can see, the only issue is how the dates appear in the x-axis.
Is there any way to overcome it?
Before creating the HTML figure, add the following line to specify that it is a date axis:
ax.xaxis_date()
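A quick sketch of what that call changes on the Matplotlib side. mpld3 is left out here, and I am assuming this axis state is what the HTML export picks up. After plotting date2num floats the axis still uses the ordinary numeric locator; calling xaxis_date() switches it to date handling:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# The two timestamps from the question, converted to Matplotlib's float date numbers.
dates = mdates.date2num([
    datetime.datetime(2015, 4, 22, 20, 28, 50),
    datetime.datetime(2015, 4, 23, 19, 42, 9),
])

fig, ax = plt.subplots()
ax.plot(dates, [1.0, 2.0])  # plain floats: nothing marks the axis as dates yet
locator_before = type(ax.xaxis.get_major_locator()).__name__

ax.xaxis_date()             # declare that the x values are dates
locator_after = type(ax.xaxis.get_major_locator()).__name__
```

Comparing `locator_before` and `locator_after` shows the switch from the numeric locator to a date-aware one.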
The answer above is correct.
If you are exclusively passing through dates, for example
df["Date"][0] = "2018-11-23"
then you can also pass them in Matplotlib's native datetime format, as below, without converting them to ordinal values with date2num.
import datetime as dt
df["Date"] = [dt.datetime.strptime(d, '%Y-%m-%d') for d in df["Date"]]
ax.plot(df["Date"].tolist(), some_y_value_list)
All I want is quite straightforward: I just want the locator ticks to start at a specified timestamp:
pseudocode: locator.set_start_ticking_at( datetime_dummy )
I have had no luck finding anything so far.
Here is the portion of the code for this question:
axes[0].set_xlim(datetime_dummy) # datetime_dummy = '2015-12-25 05:34:00'
import matplotlib.dates as matdates
seclocator = matdates.SecondLocator(interval=20)
minlocator = matdates.MinuteLocator(interval=1)
hourlocator = matdates.HourLocator(interval=12)
seclocator.MAXTICKS = 40000
minlocator.MAXTICKS = 40000
hourlocator.MAXTICKS = 40000
majorFmt = matdates.DateFormatter('%Y-%m-%d, %H:%M:%S')
minorFmt = matdates.DateFormatter('%H:%M:%S')
axes[0].xaxis.set_major_locator(minlocator)
axes[0].xaxis.set_major_formatter(majorFmt)
plt.setp(axes[0].xaxis.get_majorticklabels(), rotation=90 )
axes[0].xaxis.set_minor_locator(seclocator)
axes[0].xaxis.set_minor_formatter(minorFmt)
plt.setp(axes[0].xaxis.get_minorticklabels(), rotation=90 )
# other codes
# save fig as a picture
The x-axis ticks from the above code look like this:
How do I tell the minor locator to align with the major locator?
How do I tell the locators which timestamp to start ticking at?
What I have tried:
set_xlim doesn't do the trick
seclocator.tick_values(datetime_dummy, datetime_dummy1) doesn't do anything
Instead of using the interval keyword parameter, use bysecond and byminute to specify exactly which seconds and minutes you wish to mark. The bysecond and byminute parameters are used to construct a dateutil rrule. The rrule generates datetimes which match certain specified patterns (or, one might say, "rules").
For example, bysecond=[20, 40] limits the datetimes to those whose seconds equal 20 or 40. Thus, below, the minor tick marks only appear for datetimes whose seconds equal 20 or 40.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as matdates
N = 100
fig, ax = plt.subplots()
x = np.arange(N).astype('<i8').view('M8[s]').tolist()
y = (np.random.random(N)-0.5).cumsum()
ax.plot(x, y)
seclocator = matdates.SecondLocator(bysecond=[20, 40])
minlocator = matdates.MinuteLocator(byminute=range(60)) # range(60) is the default
seclocator.MAXTICKS = 40000
minlocator.MAXTICKS = 40000
majorFmt = matdates.DateFormatter('%Y-%m-%d, %H:%M:%S')
minorFmt = matdates.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_locator(minlocator)
ax.xaxis.set_major_formatter(majorFmt)
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90)
ax.xaxis.set_minor_locator(seclocator)
ax.xaxis.set_minor_formatter(minorFmt)
plt.setp(ax.xaxis.get_minorticklabels(), rotation=90)
plt.subplots_adjust(bottom=0.5)
plt.show()
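To check which timestamps a locator would produce without drawing anything, you can call tick_values yourself. This is only a sketch: the two-minute window is my own choice, starting at the question's datetime_dummy. It also shows why calling tick_values has no visible effect on a plot, since it merely returns positions:

```python
import datetime

import matplotlib.dates as matdates

# Illustrative window; the start is the question's datetime_dummy.
start = datetime.datetime(2015, 12, 25, 5, 34, 0)
stop = start + datetime.timedelta(minutes=2)

seclocator = matdates.SecondLocator(bysecond=[20, 40])

# tick_values() only computes positions (as float date numbers); it changes nothing on a plot.
ticks = matdates.num2date(seclocator.tick_values(start, stop))
tick_seconds = sorted({t.second for t in ticks})
```

Every tick falls on second 20 or 40 regardless of where the window starts, which is what keeps the minor ticks aligned with the whole-minute major ticks.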
@unutbu: Many thanks: I've been looking everywhere for the answer to a related problem!
@eliu: I've adapted unutbu's excellent answer to demonstrate how you can define lists (to create different 'dateutil' rules) which give you complete control over which x-ticks are displayed. Try un-commenting each example below in turn and play around with the values to see the effect. Hope this helps.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
idx = pd.date_range('2017-01-01 05:03', '2017-01-01 18:03', freq = 'min')
df = pd.Series(np.random.randn(len(idx)), index = idx)
fig, ax = plt.subplots()
# Choose which major hour ticks are displayed by creating a 'dateutil' rule e.g.:
# Only use the hours in an explicit list (active by default so the script runs as posted):
hourlocator = mdates.HourLocator(byhour=[6,12,8])
# Use the hours in a range defined by: Start, Stop, Step:
# hourlocator = mdates.HourLocator(byhour=range(8,15,2))
# Use every 3rd hour:
# hourlocator = mdates.HourLocator(interval = 3)
# Set the format of the major x-ticks:
majorFmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_locator(hourlocator)
ax.xaxis.set_major_formatter(majorFmt)
# ... and ditto to set minor locators and minor formatters for minor x-ticks, if needed
ax.plot(df.index, df.values, color = 'black', linewidth = 0.4)
fig.autofmt_xdate() # optional: makes 30 deg tilt on tick labels
plt.show()