seaborn plots empty white plane - python

When I load the values below from a CSV with pandas and plot them with seaborn:
value date
0.296776 2016-07-01
0.273482 2016-08-01
0.207982 2016-09-01
0.176148 2016-10-01
0.124666 2016-11-01
0.072311 2016-12-01
0.042762 2017-01-01
0.043232 2017-02-01
0.083472 2017-03-01
sns.tsplot(time="date", value="value", data=df)
I only get an empty white plane - what is wrong?

.tsplot is meant to plot time series with a representation of uncertainty, so if the DataFrame you pass via data has no field that identifies the sampling unit, it is not going to work.
To get around this without modifying your .csv dataset, skip the data argument and pass the columns directly:
>>> sns.tsplot(df['value'],time=df['date'])
<matplotlib.axes._subplots.AxesSubplot object at 0x07DA7A30>
>>> sns.plt.show()

Related

How do I construct ticks and labels when plotting large temporal series with matplotlib where the dates are a str column?

I have a pandas data series to display that contains basically a temporal series where each observation is contained in df['date'] (x-axis) and df['value'] (y-axis):
df['date']
0 2004-01-02
1 2004-01-05
2 2004-01-06
3 2004-01-07
4 2004-01-08
...
4527 2021-12-27
4528 2021-12-28
4529 2021-12-29
4530 2021-12-30
4531 2021-12-31
Name: session, Length: 4532, dtype: object
Notice how the Series contains str types:
print(type(df['date'].values[0]))
<class 'str'>
df['value'] holds plain integers.
If I plot the series using matplotlib with df['date'] on the x-axis, I get a chart so dense that the xtick labels cannot be read:
ax.plot(df['date'], df['value']);
If I want to display xticks at every month change (so 2004/01, 2004/02, ... 2021/11, 2021/12) and labels only when the year changes (2004, 2005, ... 2021), what would be the best way, via either numpy or pandas, to get the arrays that .set_xticks requires?
Just to further elaborate on tdy's comment (all credit on the answer to tdy), this is the code excerpt to implement to_datetime and use the locators:
import matplotlib.dates as mdates
import pandas as pd
ax.plot(pd.to_datetime(df['date']), df['value'])
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
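The question also asks for a tick at every month change. One way to sketch that, assuming unlabeled minor ticks are acceptable for the months, is to add a MonthLocator as the minor locator. The data below is synthetic (business days stored as strings) and only stands in for the question's series.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: dates stored as str.
dates = pd.date_range("2004-01-02", "2021-12-31", freq="B")
df = pd.DataFrame({"date": dates.strftime("%Y-%m-%d"),
                   "value": np.arange(len(dates))})

fig, ax = plt.subplots(figsize=(12, 4))
# Convert the strings so matplotlib uses a date axis instead of categories.
ax.plot(pd.to_datetime(df["date"]), df["value"])

# Major ticks: one per year, labeled with the year.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
# Minor ticks: one per month, left unlabeled by the default minor formatter.
ax.xaxis.set_minor_locator(mdates.MonthLocator())
```

With locators in place there is no need to build the tick arrays for .set_xticks by hand.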

Python matplotlib: data labels for multiple line graphs

In an existing thread (Annotate Time Series plot in Matplotlib), a single line graph is annotated. I am after annotation of multiple line graphs that share the same x-axis. I have two data frames that look as follows:
df:
Value
Week
2020-04-05 0.330967
2020-04-12 1.307075
2020-04-19 2.406805
2020-04-26 2.562565
2020-05-03 2.868995
2020-05-10 5.174968
2020-05-17 5.734933
2020-05-24 6.903961
2020-05-31 7.205925
2020-06-07 9.960470
2020-06-14 11.106135
2020-06-21 12.356842
2020-06-28 13.247175
2020-07-05 13.600287
2020-07-12 15.098775
2020-07-19 16.754835
2020-07-26 18.596575
2020-08-02 20.118878
2020-08-09 21.168825
2020-08-16 21.201978
2020-08-23 21.784821
2020-08-30 22.329772
2020-09-06 23.981835
2020-09-13 23.981835
2020-09-20 23.981835
df2:
Value
Date
2020-09-27 29.003255
2020-10-04 29.642155
2020-10-11 30.872583
2020-10-18 32.492713
2020-10-25 33.436226
2020-11-01 35.187827
2020-11-08 35.589155
2020-11-15 37.185094
2020-11-22 37.575597
2020-11-29 39.273018
2020-12-06 40.047140
2020-12-13 41.621320
2020-12-20 42.563794
2020-12-27 43.750932
2021-01-03 44.823089
2021-01-10 45.797449
2021-01-17 47.109407
2021-01-24 48.045107
2021-01-31 49.472744
2021-02-07 50.355325
2021-02-14 51.717578
2021-02-21 52.602765
2021-02-28 53.886987
2021-03-07 54.888933
2021-03-14 56.108036
2021-03-21 57.226216
2021-03-28 58.345462
I plot these two data frames as line graphs and want to show the data labels on the graph. For this purpose, I followed this article (https://queirozf.com/entries/add-labels-and-text-to-matplotlib-plots-annotation-examples) to put labels on the line graph. Because I have two different data frames, I tried a slightly different method to get the values of xs and ys. Here is my code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
ys = np.array([df.index,df2.index])
xs = np.array([df.Value,df2.Value])
fig, ax = plt.subplots(figsize=(12,6))
ax.plot(df.index,df['Value'],'-',color='c')
ax.plot(df2.index,df2['Value'],'-',color='g')
for x,y in zip(xs,ys):
    label = "{:.2f}".format(y)
    plt.annotate(label, (x,y), textcoords="offset points", ha='center')
plt.show()
When I ran the above code, it gave me the following error:
TypeError: unsupported format string passed to DatetimeIndex.__format__
Could anyone point out where I am making the mistake?
The problem can be solved by keeping things clearer. Specifically, you build arrays that combine data from the two data frames, then sometimes use those and sometimes use the original data frames, and things get confused. In your loop, zip(xs,ys) yields one whole column and one whole index per iteration, so y is a DatetimeIndex, which "{:.2f}" cannot format; that is what the TypeError is telling you.
Instead, I'd suggest keeping the data frames separate throughout, since you clearly treat them as distinct (you plot them in different colors), and looping over the data frames so you don't duplicate code. So something like this:
import pandas as pd
import matplotlib.pyplot as plt
# uninteresting, my reading in the data, but do what you have here
df0 = pd.read_csv("data5001.csv", sep=r"\s+")
df1 = pd.read_csv("data5002.csv", sep=r"\s+")
fig, ax = plt.subplots(figsize=(16,8)) # basically what you have
ax.plot(df0['Date'], df0['Value'],'-',color='c')
ax.plot(df1['Date'], df1['Value'],'-',color='g')
plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
for df in (df0, df1): # loop through the dataframes
    for index, v in df.iterrows(): # loop through the rows in each frame
        label = "{:.2f}".format(v['Value']) # I assume you want the value and not the date, but, whatever, it should be clear now
        plt.annotate(label, (v['Date'], v['Value']), ha='center')
plt.show()
I won't address the over-crowding problems since that's an entirely separate question.

Iteratively plot data through datetime in pandas dataframe

I have a dataframe here that contains a value daily since 2000 (ignore the index).
Extent Date
6453 13.479 2001-01-01
6454 13.385 2001-01-02
6455 13.418 2001-01-03
6456 13.510 2001-01-04
6457 13.566 2001-01-05
I would like to make a plot where the x-axis is the day of the year, and the y-axis is the value. The plot would contain 20 different lines, with each line corresponding to the year of the data. Is there an intuitive way to do this using pandas, or is it easier to do with matplotlib?
Here is a quick paint sketch to illustrate.
One quick way is to plot the x-axis as month-day strings, with one column per year:
df['Date'] = pd.to_datetime(df['Date'])
(df.set_index([df.Date.dt.strftime('%m-%d'),
               df.Date.dt.year])
   .Extent.unstack()
   .plot()
)
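Since the question asks for the day of the year on the x-axis, a variant that uses the numeric day of year instead of month-day strings may also be worth trying. This is only a sketch: the data is synthetic, and the column names doy and year are introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import numpy as np
import pandas as pd

# Synthetic stand-in: one value per day over three years.
dates = pd.date_range("2001-01-01", "2003-12-31", freq="D")
df = pd.DataFrame({"Extent": np.linspace(10.0, 14.0, len(dates)),
                   "Date": dates})

# Reshape to one row per day of year and one column per year.
wide = (df.assign(doy=df["Date"].dt.dayofyear,
                  year=df["Date"].dt.year)
          .pivot(index="doy", columns="year", values="Extent"))

# DataFrame.plot draws one line per column, i.e. one line per year.
ax = wide.plot()
ax.set_xlabel("day of year")
ax.set_ylabel("Extent")
```

Note that in leap years every date after February 29 gets a day-of-year one higher than in other years, so the lines are offset by one day from March onward.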

How can I plot different length pandas series with matplotlib?

I've got two pandas series, one with a 7 day rolling mean for the entire year and another with monthly averages. I'm trying to plot them both on the same matplotlib figure, with the averages as a bar graph and the 7 day rolling mean as a line graph. Ideally, the line would be drawn on top of the bar graph.
The issue I'm having is that, with my current code, the bar graph is showing up without the line graph, but when I try plotting the line graph first I get a ValueError: ordinal must be >= 1.
Here's what the series look like:
These are first 15 values of the 7 day rolling mean series, it has a date and a value for the entire year:
date
2016-01-01 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 NaN
2016-01-06 NaN
2016-01-07 NaN
2016-01-08 0.088473
2016-01-09 0.099122
2016-01-10 0.086265
2016-01-11 0.084836
2016-01-12 0.076741
2016-01-13 0.070670
2016-01-14 0.079731
2016-01-15 0.079187
2016-01-16 0.076395
This is the entire monthly average series:
dt_month
2016-01-01 0.498323
2016-02-01 0.497795
2016-03-01 0.726562
2016-04-01 1.000000
2016-05-01 0.986411
2016-06-01 0.899849
2016-07-01 0.219171
2016-08-01 0.511247
2016-09-01 0.371673
2016-10-01 0.000000
2016-11-01 0.972478
2016-12-01 0.326921
Here's the code I'm using to try and plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
series_two.plot(ax=ax)
plt.show()
Here's the graph that generates:
Any help is hugely appreciated! Also, advice on formatting this question and creating code to make two series for a minimum working example would be awesome.
Thanks!!
The problem is that pandas bar plots are categorical (Bars are at subsequent integer positions). Since in your case the two series have a different number of elements, plotting the line graph in categorical coordinates is not really an option. What remains is to plot the bar graph in numerical coordinates as well. This is not possible with pandas, but is the default behaviour with matplotlib.
Below I shift the monthly dates by 15 days to the middle of the month to have nicely centered bars.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
import pandas as pd
t1 = pd.date_range("2018-01-01", "2018-12-31", freq="D")
s1 = pd.Series(np.cumsum(np.random.randn(len(t1)))+14, index=t1)
s1[:6] = np.nan
t2 = pd.date_range("2018-01-01", "2018-12-31", freq="MS")
s2 = pd.Series(np.random.rand(len(t2))*15+5, index=t2)
# shift monthly data to middle of month
s2.index += pd.Timedelta('15 days')
fig, ax = plt.subplots()
ax.bar(s2.index, s2.values, width=14, alpha=0.3)
ax.plot(s1.index, s1.values)
plt.show()
The problem might be the two series' indices are of very different scales. You can use ax.twiny to plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
ax_tw = ax.twiny()
series_two.plot(ax=ax_tw)
plt.show()

stop connecting points in pandas time series plot

I have some data in a pandas series and when I type
mydata.head()
I get:
BPM
timestamp
2015-04-07 02:24:00 96.0
2015-04-07 02:24:00 96.0
2015-04-07 02:24:00 95.0
2015-04-07 02:24:00 95.0
2015-04-07 02:24:00 95.0
Also, when using
mydata.info()
I get:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 33596 entries, 2015-04-07 02:24:00 to 2015-07-15 14:23:50
Data columns (total 1 columns):
BPM 33596 non-null float64
dtypes: float64(1)
memory usage: 524.9 KB
When I go to plot using
import matplotlib.pyplot as pyplot
fig, ax = pyplot.subplots()
ax.plot(mydata)
I just get a complete mess, it's like it's joining lots of points together that should not be joined together.
How can I sort this out to display as a proper time series plot?
Just use
mydata.plot(style="o")
and you should get a Pandas plot without lines connecting the points.
Just tell matplotlib to plot markers instead of lines. For example,
import matplotlib.pyplot as pyplot
fig, ax = pyplot.subplots()
ax.plot(mydata, '+')
If you prefer another marker, you can change it (see this link).
You can also plot directly from pandas:
mydata.plot(style='+')
If you really want the lines, you need to sort your data before plotting it.
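A minimal sketch of that last step, assuming the mess comes from timestamps that are out of chronological order. The three-row frame below is made up for illustration and only mirrors the shape of mydata.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for mydata, with the timestamps deliberately out of order.
idx = pd.to_datetime(["2015-04-07 02:24:20",
                      "2015-04-07 02:24:00",
                      "2015-04-07 02:24:10"])
mydata = pd.DataFrame({"BPM": [95.0, 96.0, 95.0]}, index=idx)
mydata.index.name = "timestamp"

# Sort by the DatetimeIndex so consecutive points are also consecutive in time.
sorted_data = mydata.sort_index()

fig, ax = plt.subplots()
ax.plot(sorted_data.index, sorted_data["BPM"])  # the line now runs left to right
```

If several rows share the same timestamp, as in the head() output above, sorting still leaves vertical segments at those instants, so markers may remain the clearer choice.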
