Make line chart with multiple series and error bars - python

I'm hoping to create a line graph which shows the changes to flowering and fruiting times (phenophases) from year to year. For each phenophase I'd like to plot the average Day of Year and, if possible, show the min and max for each year as an error bar. I've filtered down all the data I need in a few data frames, grouped it all in a sensible way, but I can't figure out how to get it all to plot. Here's a screen grab of where I'm at: Imgur
All the examples I've found adding error bars have been based on formulas or other equal amounts over/under, but in my case the max/min will be different so I'm not sure how to integrate that. Possible just create a list of each column's data and feed that to plot? I'm playing with that now but not getting far.
Also, if anyone has general suggestions as to better ways to present this data I'm all ears. I've looked into Gantt plots but didn't get far with them, as this seems a bit more straight-forward just using matplotlib. I'm happy to put some demo data or the rest of my notebook up if anyone thinks that would help.
Edit: Here's some sample data and the code from my notebook: Gist
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline
pd.set_option('display.max_columns', 40)
tick_spacing = 1
dfClean = df[['Site_Cluster', 'Species', 'Phenophase_Name',
'Phenophase_Status', 'Observation_Year', 'Day_of_Year']]
dfClean = dfClean[dfClean.Phenophase_Status == 1]
PhenoNames = ['Open flowers', 'Ripe fruits']
dfLakes = dfClean[(dfClean.Phenophase_Name.isin(PhenoNames))
& (dfClean.Site_Cluster == 'Lakes')
& (dfClean.Species == 'lapponica')]
dfLakesGrouped = dfLakes.groupby(['Observation_Year', 'Phenophase_Name'])
dfLakesReady = dfLakesGrouped.Day_of_Year.agg([np.min, np.mean, np.max]).round(0)
dfLakesReady = dfLakesReady.unstack()
print(dfLakesReady['mean'].plot())

Here's another answer:
from pandas import DataFrame, date_range, Timedelta
import numpy as np
from matplotlib import pyplot as plt
rng = date_range(start='2015-01-01', periods=5, freq='24H')
df = DataFrame({'y':np.random.normal(size=len(rng))}, index=rng)
y1 = df['y']
y2 = (y1*3)
sd1 = (y1*2)
sd2 = (y1*2)
fig,(ax1,ax2) = plt.subplots(2,1,sharex=True)
_ = y1.plot(yerr=sd1, ax=ax1)
_ = y2.plot(yerr=sd2, ax=ax2)
Output:

Related

How do I change the order of the x axis in Python?

import pandas as pd
from pandas_datareader import wb
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.formula.api as smf
G_pop = wb.download(indicator='SP.POP.TOTL', country="DEU", end=2020, start=2010)
G_pop = G_pop.reset_index(1)
G_pop.columns = ["year", "Population in Germany"]
pd.set_option("display.max.columns", 100000)
pd.set_option("display.max.rows", 300000)
pd.set_option("display.width", 1000000)
x = G_pop["year"]
y = G_pop["Population in Germany"]
plt.xticks(rotation=45, ha='left')
plt.plot(x,y)
plt.show()
I'm new to programming and am trying to modulate graphs with the World Bank database. It works quite well apart from my X-axis. Does anyone know how I can convert them? Since the left is 2000 and the right is 2020 ascending. It is currently the case that the left on the x axis is 2020 and it is descending. I've been struggling with this problem for two days and can't get any further. invert_xaxis() and invert_yaxis(). I've tried both and it only gives me error messages. I would be very thankful for any help.
My code and the Graph wit the wrong x axis:
Picture of the wrong Graph:
Add this line:
G_pop.columns = ["year", "Population in Germany"]
G_pop['year'] = pd.to_numeric(G_pop['year']) # Convert text to numeric for automatic sorting
pd.set_option("display.max.columns", 100000)
Output:

Howto force Pandas and native matplotlib to share axis

I folks,
Consider the following example
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
fig, (ax1,ax2) = plt.subplots(2,1)
dates = pd.date_range("2018-01-01","2019-01-01",freq = "1d")
x = pd.DataFrame(index = dates, data = np.linspace(0,1,len(dates)) )
x.plot(ax=ax1)
y = np.random.random([len(dates),100]) * x.values
ax2.pcolormesh(range(len(x)), np.linspace(-1,1,100), y.T)
plt.show()
At this point, I would like the both axis (ax1,ax2) to share the x-axis, i.e. displaying proper pandas dates on the second axis. sharex=True does not seem to work. How can I achieve that? I tried different possibilities which did not work out.
Edit: Since the pandas date formatting is superior to the native matplotlib formatting, please provide me with a solution where pandas date formatting is used (for instance, zooming with an interactive environment works much better with pandas date formatting). Thanks You!
One way to do it would be to do all the plotting with matplotlib, this way there are no problems with the different time formats being used:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
fig, (ax1,ax2) = plt.subplots(2,1, sharex='col')
dates = pd.date_range("2018-01-01","2019-01-01",freq = "1d")
x = pd.DataFrame(index = dates, data = np.linspace(0,1,len(dates)) )
#x.plot(ax=ax1)
ax1.plot(x.index, x.values)
y = np.random.random([len(dates),100]) * x.values
ax2.pcolormesh(x.index, np.linspace(-1,1,100), y.T)
fig.tight_layout()
plt.show()
This gives the following plot:
What seems to work fine is to first plot the same line into the axes that should host the image, then plot the image, then remove the line again. What this does is that it tells pandas to apply its locators and formatters to that axes; they will stay after removing the line.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
fig, (ax1,ax2) = plt.subplots(2,1, sharex=True)
dates = pd.date_range("2018-01-01","2019-01-01",freq = "1d")
x = pd.DataFrame(index = dates, data = np.linspace(0,1,len(dates)) )
x.plot(ax=ax1)
y = np.random.random([len(dates),100]) * x.values
x.plot(ax=ax2, legend=False)
ax2.pcolormesh(dates, np.linspace(-1,1,100), y.T)
ax2.lines[0].remove()
plt.show()
Note that there may be caveats of this solution when zooming or panning. Consider it more like a hack and use it as long as it works, but don't blame anyone once it doesn't.

How to format x-axis time-series tick marks with missing dates

How can I format the x-axis so that the spacing between periods is "to scale". As in, the distance between 10yr and 30yr should be much larger than the distance between 1yr and 2yr.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import Quandl as ql
yield_ = ql.get("USTREASURY/YIELD")
today = yield_.iloc[-1,:]
month_ago = yield_.iloc[-1000,:]
df = pd.concat([today, month_ago], axis=1)
df.columns = ['today', 'month_ago']
df.plot(style={'today': 'ro-', 'month_ago': 'bx--'},title='Treasury Yield Curve, %');
plt.show()
I want my chart to look like this...
I think doing this while staying purely within Pandas might be tricky. You first need to create a new matplotlib figure and axe. The following might not work exactly but will give you a good idea.
df['years']=[1/12.,0.25,0.5,1,2,3,5,7,10,20,30]
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
df.plot(x='years',y='today',ax=ax,kind='scatter')
df.plot(x='years',y='month_ago',ax=ax,kind='scatter')
plt.show()
If you want your axe labels to look like your chart you'll also need to set the lower and upper limit of your axis so they look good and then do something like:
ax.set_xticklabels(list(df.index))

Is it possible to plot a statistical time series with dates in Seaborn

I'm trying to plot a statistical time series using Seaborn but I can't seem to figure it out. I've tried using both the lmplot and tsplot methods but am obviously missing something key.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as pylab
p = pd.DataFrame({
"date": pd.date_range('1/1/2015', periods = 12),
"values":range(1,13)
})
# Regular Matplotlib (via pandas) works
p.plot(x = "date", style = 'o--')
# Can't get lmplot to work
sns.lmplot(x = "date", y = "values", data = p)
# Can't get tsplot to work either
sns.tsplot(time = "date", value = "values", data = p)
Sorry I can't add this as a comment as I'm not rated high enough.
I've been battling through timeseries recently, and the following SO post is pretty much exactly the same as yours, with the same question about confidence intervals:
Plotting time-series data with seaborn

Add date tickers to a matplotlib/python chart

I have a question that sounds simple but it's driving me mad for some days. I have a historical time series closed in two lists: the first list is containing prices, let's say P = [1, 1.5, 1.3 ...] while the second list is containing the related dates, let's say D = [01/01/2010, 02/01/2010...]. What I would like to do is to plot SOME of these dates (when I say "some" is because the "best" result I got so far is to show all of them as tickers, so creating a black cloud of unreadable data in the x-axis) that, when you zoom in, are shown more in details. This picture is now having the progressive automated range made by Matplotlib:
Instead of 0, 200, 400 etc. I would like to have the dates values that are related to the data-point plotted. Moreover, when I zoom-in I get the following:
As well as I get the detail between 0 and 200 (20, 40 etc.) I would like to get the dates attached to the list.
I'm sure this is a simple problem to solve but I'm new to Matplotlib as well as to Python and any hint would be appreciated. Thanks in advance
Matplotlib has sophisticated support for plotting dates. I'd recommend the use of AutoDateFormatter and AutoDateLocator. They are even locale-specific, so they choose month-names according to your locale.
import matplotlib.pyplot as plt
from matplotlib.dates import AutoDateFormatter, AutoDateLocator
xtick_locator = AutoDateLocator()
xtick_formatter = AutoDateFormatter(xtick_locator)
ax = plt.axes()
ax.xaxis.set_major_locator(xtick_locator)
ax.xaxis.set_major_formatter(xtick_formatter)
EDIT
For use with multiple subplots, use multiple locator/formatter pairs:
import datetime
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import AutoDateFormatter, AutoDateLocator, date2num
x = [datetime.datetime.now() + datetime.timedelta(days=30*i) for i in range(20)]
y = np.random.random((20))
xtick_locator = AutoDateLocator()
xtick_formatter = AutoDateFormatter(xtick_locator)
for i in range(4):
ax = plt.subplot(2,2,i+1)
ax.xaxis.set_major_locator(xtick_locator)
ax.xaxis.set_major_formatter(xtick_formatter)
ax.plot(date2num(x),y)
plt.show()
You can do timeseries plot with pandas
For detail refer this : http://pandas.pydata.org/pandas-docs/dev/timeseries.html and
http://pandas.pydata.org/pandas-docs/dev/generated/pandas.Series.plot.html
import pandas as pd
DateStrList = ['01/01/2010','02/01/2010']
P = [2,3]
D = pd.Series([pd.to_datetime(date) for date in DateStrList])
series =pd.Series(P, index=D)
pd.Series.plot(series)
import matplotlib.pyplot as plt
import pandas
pandas.TimeSeries(P, index=D).plot()
plt.show()

Categories

Resources