I have plotted a time series of carbon fluxes over 16 years at a particular site. I would like the x-axis to show years (1992-2007) instead of year numbers (1-16). When I set the x-axis to a minimum of 1992 and a maximum of 2007, the graph doesn't appear on the plot, but when I don't set the min/max years, it does. I'm not sure what I am doing wrong. I plotted another time series over one year and was able to label the x-axis with the months using MonthLocator, but I am having no luck with YearLocator. Here is the code that I have written:
fig=pyplot.figure()
ax=fig.gca()
ax.plot_date(days,nee,'r-',label='model daily nee')
ax.plot_date(days,nee_obs,'b-',label='obs daily nee')
# locate the ticks
ax.xaxis.set_major_locator(YearLocator())
# format the ticks
ax.xaxis.set_major_formatter(DateFormatter('%Y'))
# set years 1992-2007
datemin = datetime.date(1992, 1, 1)
datemax = datetime.date(2007, 12, 31)
ax.set_xlim(datemin, datemax)
labels=ax.get_xticklabels()
setp(labels,'rotation',45,fontsize=10)
legend(loc="upper right", bbox_to_anchor=[0.98, 0.98],
ncol=1, shadow=True)
pyplot.ylabel('NEE($gC m^{-2} day^{-1}$)')
pyplot.title('Net Ecosystem Exchange')
pyplot.savefig('nee_obs_model_HF_daily.pdf')
# rotates and right aligns the x labels, and moves the bottom of the
# axes up to make room for them
#fig.autofmt_xdate()
pyplot.show()
pyplot.close()
I think Andrey Sobolev is right. When I run your script with minor adjustments :-) on some data that I have, with the date field stored as a date, the years show up with no problems. It's virtually your code, with the exception of:
fh = open(thisFileName)
# a numpy record array with fields: date, nee, nee_obs
# from a csv, thisFileName with format:
# Date,nee,nee_obs
# 2012-02-28,137.20,137.72
# note: csv2rec was removed in Matplotlib 3.1; on newer versions load the file another way
r = matplotlib.mlab.csv2rec(fh)
fh.close()
r.sort()
days = r.date
nee = r.nee
nee_obs = r.nee_obs
...
...
and then I get:
Much of this solution was borrowed from here. Let me know if I misinterpreted what you need.
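For illustration, here is a minimal sketch of the point made above. It assumes that the original `days` array held plain day counts (1, 2, 3, ...) rather than dates, in which case date limits for 1992 to 2007 lie far away from the plotted x values and the curve falls outside the visible range. The NEE values are synthetic placeholders, not real data, and `ax.plot` is used in place of `plot_date` because it accepts dates directly.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np

start = datetime.date(1992, 1, 1)
end = datetime.date(2007, 12, 31)

# Assumption: the data are daily, one value per day from start to end.
n_days = (end - start).days + 1
dates = [start + datetime.timedelta(days=i) for i in range(n_days)]

# Placeholder signal standing in for the modelled NEE.
nee = np.sin(np.arange(n_days) * 2 * np.pi / 365.25)

fig, ax = plt.subplots()
ax.plot(dates, nee, 'r-', label='model daily nee')   # x values are real dates
ax.xaxis.set_major_locator(mdates.YearLocator())     # one major tick per year
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
ax.set_xlim(start, end)                              # limits now match the data's units

fig.canvas.draw()
tick_years = [t.get_text() for t in ax.get_xticklabels()]
```

Because the x values and the limits are both dates, the curve stays inside the visible range and the year labels appear.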
I am new to Python and visualization and am trying to get my plot to only display the xticks in the middle of each year (see my plot below). I have tried a couple of things with date_range but now my plot is displaying two xticks for each year, one for the beginning of the year and one for the middle of the year. How can I get rid of the xticks that are at the beginning of each year and only keep the ones at the middle of each year?
Here's my code and plot:
texasdrought['ValidStart']=pd.to_datetime(texasdrought['ValidStart'])
droughtMask = texasdrought[texasdrought['ValidStart'].dt.year.between(2005,2015)]
# Set the figure size
plt.figure(figsize = (30,16))
# Create a mask with the dates
dates = droughtMask["ValidStart"]
# Categorize droughts
droughtcat = {
'D4 - Exceptional Drought': droughtMask["D4"],
'D3 - Extreme Drought': droughtMask["D3"],
'D2 - Severe Drought': droughtMask["D2"],
'D1 - Moderate Drought': droughtMask["D1"],
'D0 - Abnormally Dry': droughtMask["D0"]
}
fig, ax = plt.subplots()
ax.stackplot(dates, droughtcat.values(), labels=droughtcat.keys(),colors=['#660000','#FF0000','#FF6600','#FFCC99','#FFFF00'])
# Format x-axis tick labels as two-digit years
yearsFmt = mdates.DateFormatter("'%y")
ax.xaxis.set_major_formatter(yearsFmt)
# Format y-axis to percentages (disabled)
#ax.yaxis.set_major_formatter(mtick.PercentFormatter())
# Add legend location
ax.legend(loc='upper left')
# Add title to the stackplot
ax.set_title('Drought in Texas (2005-2015)')
ticks = pd.date_range('2005-01-01', '2015-12-31', freq='6M')
plt.xticks(ticks)
# Add axis labels
ax.set_xlabel('Year')
ax.set_ylabel('Drought Intensity')
# Save figure as DroughtPlot.jpg
fig.savefig('DroughtPlot.jpg')
plt.show()
Thank you.
Use this instead:
ticks = pd.date_range('2005-01-01', '2015-12-31', freq='A-JUN')
This generates one date per year, each falling on the last day of June (the middle month), instead of one date every six months:
DatetimeIndex(['2005-06-30', '2006-06-30', '2007-06-30', '2008-06-30',
'2009-06-30', '2010-06-30', '2011-06-30', '2012-06-30',
'2013-06-30', '2014-06-30', '2015-06-30'],
dtype='datetime64[ns]', freq='A-JUN')
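Newer pandas releases deprecate the 'A-JUN' alias in favour of 'YE-JUN'. As a sketch that does not depend on either spelling, the same mid-year ticks can be built by stepping one calendar year at a time from the first 30 June. The start date and the count of 11 are taken from the 2005-2015 range in the question.

```python
import pandas as pd

# One tick per year: start on 30 June 2005 and step a whole year each time.
ticks = pd.date_range('2005-06-30', periods=11, freq=pd.DateOffset(years=1))

# The result can be passed to plt.xticks(ticks) exactly as before.
```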
I'm working on an assignment from school, and have run into a snag when it comes to my stacked area chart.
The data is fairly simple: 4 columns that look similar to this:
Series id   Year   Period   Value
LNS140000   1948   M01      3.4
I'm trying to create a stacked area chart using Year as my x and Value as my y and breaking it up over Period.
#Stacked area chart still using unemployment data
x = d.Year
y = d.Value
plt.stackplot(x, y, labels = d['Period'])
plt.legend(d['Period'], loc = 'upper left')
plt.show()
However, when I do it like this it only picks up M01 and there are M01-M12. Any thoughts on how I can make this work?
You need to preprocess your data a little before passing them to the stackplot function. I took a look at this link to work on an example that could be suitable for your case.
Since I've only seen one row of your data, I added some made-up values to the dataset.
import pandas as pd
import matplotlib.pyplot as plt
dd=[[1948,'M01',3.4],[1948,'M02',2.5],[1948,'M03',1.6],
[1949,'M01',4.3],[1949,'M02',6.7],[1949,'M03',7.8]]
d=pd.DataFrame(dd,columns=['Year','Period','Value'])
years=d.Year.unique()
periods=d.Period.unique()
#Now group them per period, but in year sequence
d.sort_values(by='Year',inplace=True) # to ensure entire dataset is ordered
pds=[]
for p in periods:
    pds.append(d[d.Period==p]['Value'].values)
plt.stackplot(years,pds,labels=periods)
plt.legend(loc='upper left')
plt.show()
Is that what you want?
So I was able to use Seaborn to help out. First I did a pivot table
df = d.pivot(index = 'Year',
columns = 'Period',
values = 'Value')
df
Then I set up seaborn
plt.style.use('seaborn')
sns.set_style("white")
sns.set_theme(style = "ticks")
df.plot.area(figsize = (20,9))
plt.title("Unemployment by Year and Month\n", fontsize = 22, loc = 'left')
plt.ylabel("Values", fontsize = 22)
plt.xlabel("Year", fontsize = 22)
It seems to me that the problem you are having relates to the format of the data. Look at how the values are laid out in this matplotlib example. I would try grouping the data by period, or pivoting it into the correct shape, and then plotting again.
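A minimal sketch of the pivot suggestion, using plain matplotlib rather than seaborn. The values are the made-up ones from the earlier answer, and the variable name `wide` is only illustrative. After pivoting, each column holds one period, so every period can be passed to stackplot as its own layer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Long-format data: one row per (Year, Period) pair.
d = pd.DataFrame(
    [[1948, 'M01', 3.4], [1948, 'M02', 2.5], [1948, 'M03', 1.6],
     [1949, 'M01', 4.3], [1949, 'M02', 6.7], [1949, 'M03', 7.8]],
    columns=['Year', 'Period', 'Value'])

# Wide format: rows are years, columns are periods.
wide = d.pivot(index='Year', columns='Period', values='Value')

fig, ax = plt.subplots()
# stackplot expects one row per layer, hence the transpose.
layers = ax.stackplot(wide.index, wide.T.values, labels=wide.columns)
ax.legend(loc='upper left')
```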
I have a CSV file with time data as follows:
Time,Download,Upload
17:00,7.51,0.9
17:15,6.95,0.6
17:31,5.2,0.46
I import the csv into a pandas dataframe: df = pd.read_csv('speeds.csv', parse_dates=['Time'])
And then plot the graph like so:
fig, ax = plt.subplots(figsize=(20, 7))
df.plot(ax=ax)
majorFmt = mdates.DateFormatter('%H:%M:')
minorFmt = mdates.DateFormatter('%H:%M:')
hour_locator = mdates.HourLocator()
min_locator = mdates.MinuteLocator(byminute=[15, 30, 45])
ax.xaxis.set_major_locator(hour_locator)
ax.xaxis.set_major_formatter(majorFmt)
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90, fontsize=10)
ax.xaxis.set_minor_locator(min_locator)
ax.xaxis.set_minor_formatter(minorFmt)
plt.setp(ax.xaxis.get_minorticklabels(), rotation=90, fontsize=8)
However the final graph starts from 00:00 like so, although the CSV file starts at 17:00:
Why doesn't the graph start at 17:00 as well?
Another problem (while I'm here) is that the major labels don't line up with the major tick marks; they are shifted slightly to the left. How would I fix that?
First question - graph doesn't start at 17:00:
Your csv only gives times (no dates) and it rolls over midnight. Pandas implicitly adds the current date to all times, so times after midnight, which belong to the next day, get the same date as times before midnight. Therefore you'll have to adjust the date part:
from pandas import DateOffset
days = 0
df['Datetime']=df['Time']
for i in df.index:
    # a time earlier than the previous row means we passed midnight
    if i > 0 and df.at[i,'Time'] < df.at[i-1,'Time']:
        days += 1
    df.at[i,'Datetime'] = df.at[i,'Time'] + DateOffset(days=days)
and then use the Datetime column on your x axis.
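The same adjustment can also be written without an explicit loop. This is only a sketch on four invented sample times, assuming the rows are in chronological order and that consecutive rows are never more than 24 hours apart. The names `went_backwards` and `day_offset` are illustrative.

```python
import pandas as pd

# Sample times that roll over midnight; with format='%H:%M' pandas puts them all on 1900-01-01.
df = pd.DataFrame({
    'Time': pd.to_datetime(['23:30', '23:45', '00:00', '00:15'], format='%H:%M')
})

# A negative step between consecutive rows marks a midnight rollover.
went_backwards = df['Time'].diff() < pd.Timedelta(0)

# The running count of rollovers is the number of days to add to each row.
day_offset = went_backwards.cumsum()

df['Datetime'] = df['Time'] + pd.to_timedelta(day_offset, unit='D')
```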
Second question - shifted major markers:
Set the horizontal alignment:
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90, fontsize=10, horizontalalignment='center')
This is the first time I am working with matplotlib, and the task might be really trivial, but it is turning out to be hard for me.
I have the following data: ticket numbers, the dates when they were resolved, and the dates when they were supposed to be resolved.
What I want to do is draw a plot with tickets on the x axis and dates on the y axis. Then for every ticket I need 2 bars: the first with height equal to the date it was resolved, and the other with height equal to the date it was supposed to be resolved.
What I have right now:
a list of all tickets
tickets = []
a list with all dates (both resolved and expected)
all_dates = []
lists with sets of (ticket, datetime) for expected time and resolved time:
tickets_estimation = []
tickets_real = []
the code I am at right now:
plt.xticks(arange(len(tickets)), tickets, rotation=90)
plt.yticks(arange(len(all_dates)), all_dates)
plt.show()
which shows me the following plot:
So how can I do the rest? Please note that I need to map the ticket numbers on the X axis to the dates on the Y axis.
OK, here is a simplified version of where I'm stuck:
I cannot figure out how to draw even a single bar so that its X value is a ticket and its Y value is the date of its resolution.
For example:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from numpy import arange
date = ['3 Jan 2013', '4 Jan 2013', '5 Jan 2013']
tickets = ['ENV-666', 'ENV-999', 'ENV-1000']
# Convert to matplotlib's internal date format.
y = mdates.datestr2num(date)
x = arange(len(tickets))
fig, ax = plt.subplots()
ax.plot(x,y)
ax.yaxis_date()
# Optional. Just rotates x-ticklabels in this case.
fig.autofmt_xdate()
plt.show()
This works fine; it shows a plot with a line. But if I change
ax.plot(x,y) to ax.bar(x,y) I get an error:
ValueError: ordinal must be >= 1
Bar is your friend: https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.bar.html
Then just use your imagination to plot it as you like. For example, a black line for the due date, red boxes for overdue and green boxes for on time.
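import numpy as np
import matplotlib.pyplot as plt
supp = np.array([3,5,7])  # due dates (when tickets were supposed to be resolved)
res = np.array([1,7,7])   # actual resolution dates
H = (res-supp).astype(float)  # float, so that 0.01 is not truncated to 0
H[H==0] = 0.01  # thin bar for tickets resolved exactly on the due date
ind = np.arange(len(supp))
plt.bar(ind[H>0], H[H>0], bottom=supp[H>0], color='r')  # resolved late
plt.bar(ind[H<0], H[H<0], bottom=supp[H<0], color='g')  # resolved early
plt.bar(ind, 0.1, bottom=supp, color='k')  # due-date marker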
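The same `bottom` idea carries over to real dates. The ValueError in the question most likely arises because bars start at 0 by default, which older Matplotlib versions cannot convert to a date. The sketch below is written under that assumption: it starts every bar at a baseline date instead. The baseline of 1 Jan 2013 is an arbitrary choice for illustration, and the ticket data are the three sample values from the question.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np

tickets = ['ENV-666', 'ENV-999', 'ENV-1000']
resolved = [datetime.date(2013, 1, 3),
            datetime.date(2013, 1, 4),
            datetime.date(2013, 1, 5)]

y = mdates.date2num(resolved)                       # dates as day numbers
base = mdates.date2num(datetime.date(2013, 1, 1))   # assumed baseline date
x = np.arange(len(tickets))

fig, ax = plt.subplots()
# Each bar runs from the baseline up to the resolution date,
# so no bar edge ever sits at the invalid date value 0.
bars = ax.bar(x, y - base, bottom=base)
ax.set_xticks(x)
ax.set_xticklabels(tickets, rotation=90)
ax.yaxis_date()                                     # show y values as dates

heights = [b.get_height() for b in bars]
```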
I have the following persistent problem:
The following code should draw a straight line:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
d = pd.date_range(start="1/1/2012", end="2/1/2012", freq="B")
v = np.linspace(1,10,len(d))
plt.plot_date(d,v,"-")
But all I get is a jagged line because "plot_date" somehow pads the dates in "d" with the weekends.
Is there a way to force matplotlib to take my dates (only business days) as they are, without filling in the weekend dates?
>>>d
DatetimeIndex(['2012-01-02', '2012-01-03', '2012-01-04', '2012-01-05',
'2012-01-06', '2012-01-09', '2012-01-10', '2012-01-11',
'2012-01-12', '2012-01-13', '2012-01-16', '2012-01-17',
'2012-01-18', '2012-01-19', '2012-01-20', '2012-01-23',
'2012-01-24', '2012-01-25', '2012-01-26', '2012-01-27',
'2012-01-30', '2012-01-31', '2012-02-01'],
dtype='datetime64[ns]', freq='B')
plot_date does a trick: it converts the dates to numbers of days since a fixed epoch (0001-01-01 in older Matplotlib releases, 1970-01-01 by default since 3.3), plots against those numbers, and then converts the tick positions back to dates in order to draw nice tick labels. So with plot_date every calendar day counts as 1, business day or not.
You can plot your data against a uniform range of numbers, but if you want dates as tick labels you need to build them yourself.
d = pd.date_range(start="1/1/2012", end="2/1/2012", freq="B")
v = np.linspace(1,10,len(d))
plt.plot(range(d.size), v)
xticks = plt.xticks()[0]
# step n business days from the first date; recent pandas no longer allows Timestamp + int
xticklabels = [(d[0] + int(x) * d.freq).strftime('%Y-%m-%d') for x in xticks]
plt.xticks(xticks, xticklabels)
plt.autoscale(True, axis='x', tight=True)
But be aware that the labels can be misleading. The segment between 2012-01-02 and 2012-01-09 represents five days, not seven.
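An alternative sketch uses a tick formatter, so the labels stay correct when the plot is zoomed or panned. The helper name `index_to_date` is hypothetical. It looks up the date at each integer position and returns an empty label for positions outside the data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import numpy as np
import pandas as pd

d = pd.date_range(start="1/1/2012", end="2/1/2012", freq="B")
v = np.linspace(1, 10, len(d))

def index_to_date(x, pos=None):
    """Map a position on the x axis to the business day stored at that index."""
    i = int(round(x))
    if 0 <= i < len(d):
        return d[i].strftime('%Y-%m-%d')
    return ''  # no label outside the data range

fig, ax = plt.subplots()
ax.plot(np.arange(len(d)), v)   # evenly spaced x positions, so no weekend gaps
ax.xaxis.set_major_formatter(FuncFormatter(index_to_date))
fig.autofmt_xdate()             # rotate the date labels
```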