Matplotlib scatter plot automatically duplicate datetime xticks


My dataframe holds, for each day, the mean heart rate during the day and during the night, which I calculated beforehand. The problem is that when I draw a scatter plot, the dates on the x axis are duplicated.
This is my df:
date FC_mean_day FC_mean_night
0 2022-12-28 79.43 74.11
1 2022-12-29 74.25 75.00
2 2022-12-30 75.75 74.40
3 2022-12-31 70.91 72.90
4 2023-01-01 68.43 73.00
date datetime64[ns]
FC_mean_day float64
FC_mean_night float64
dtype: object
And this is what I have done to generate the scatter plot:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
fig, ax = plt.subplots()
plt.scatter(df5.date, df5.FC_mean_night, color='blue', label='Notte(22-06)', alpha=1)  # "Night (22-06)"
plt.scatter(df5.date, df5.FC_mean_day, color='yellow', label='Giorno(06-22)', alpha=1)  # "Day (06-22)"
my_format = mdates.DateFormatter('%d/%m/%y')
plt.rcParams["figure.figsize"] = (3, 4)
plt.title("Media della Frequenza Cardiaca")  # "Mean heart rate"
plt.xlabel('Data')  # "Date"
plt.ylabel('Valore')  # "Value"
plt.xticks(rotation=90)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
ax.yaxis.grid(color='grey', linestyle='dashed')
ax.set_axisbelow(True)
ax.xaxis.set_major_formatter(my_format)
plt.show()
And this is what it generates:
[Figure: scatter with duplicates]
I noticed that the problem doesn't happen when the data includes a following day that has a daytime value but no night value, as in this case:
date FC_mean_day FC_mean_night
0 2022-12-28 79.43 74.11
1 2022-12-29 74.25 75.00
2 2022-12-30 75.75 74.40
3 2022-12-31 70.91 72.90
4 2023-01-01 68.43 73.00
5 2023-01-02 75.00 NaN
[Figure: scatter without duplicates]
What am I missing?

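You already use a DateFormatter, so you just need to set a locator with Axis.set_major_locator. Without an explicit locator, Matplotlib's automatic date locator may place several ticks within the same day, and a day-only format then renders them with identical labels.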
ax.xaxis.set_major_formatter(my_format)
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1)) # <- add this line
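For reference, here is a minimal self-contained sketch of the fix, rebuilt from the sample data in the question. The non-interactive "Agg" backend and the final `tick_labels` check are assumptions added only so the snippet runs anywhere and so the result can be inspected; they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Sample data copied from the question
df5 = pd.DataFrame({
    "date": pd.to_datetime(
        ["2022-12-28", "2022-12-29", "2022-12-30", "2022-12-31", "2023-01-01"]
    ),
    "FC_mean_day": [79.43, 74.25, 75.75, 70.91, 68.43],
    "FC_mean_night": [74.11, 75.00, 74.40, 72.90, 73.00],
})

fig, ax = plt.subplots()
ax.scatter(df5["date"], df5["FC_mean_night"], color="blue", label="Notte(22-06)")
ax.scatter(df5["date"], df5["FC_mean_day"], color="yellow", label="Giorno(06-22)")

my_format = mdates.DateFormatter("%d/%m/%y")
ax.xaxis.set_major_formatter(my_format)
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))  # one tick per calendar day
ax.tick_params(axis="x", labelrotation=90)
ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

# Format each tick position the same way the axis does, to inspect the labels
fig.canvas.draw()
tick_labels = [my_format(t) for t in ax.get_xticks()]
```

With the day locator in place, each entry in `tick_labels` corresponds to a distinct calendar day, so no label should repeat.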

Related

Pandas: Plotting / annotating from DataFrame

There is this boring dataframe with stock data I have:
date close MA100 buy sell
2022-02-14 324.95 320.12 0 0
2022-02-13 324.87 320.11 1 0
2022-02-12 327.20 321.50 0 0
2022-02-11 319.61 320.71 0 1
Then I am plotting the prices
import pandas as pd
import matplotlib.pyplot as plt
df = ...
df['close'].plot()
df['MA100'].plot()
plt.show()
So far so good...
Then I'd like to show a marker on the chart if there was buy (green) or sell (red) on that day.
It's just to highlight if there was a transaction on that day. The exact intraday price at which the trade happened is not important.
So the x/y-coordinates could be the date and the close if there is a 1 in column buy (sell).
I am not sure how to implement this.
Would I need a loop to iterate over all rows where buy = 1 (sell = 1) and then somehow add these matches to the plot (probably with annotate?)
I'd really appreciate it if someone could point me in the right direction!

You can query the data frame for sell/buy and scatter plot:
fig, ax = plt.subplots()
df.plot(x='date', y=['close', 'MA100'], ax=ax)
df.query("buy==1").plot.scatter(x='date', y='close', c='g', ax=ax)
df.query("sell==1").plot.scatter(x='date', y='close', c='r', ax=ax)
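A variant of the same idea, sketched with plain Matplotlib calls instead of the pandas plotting wrappers. It assumes the `date` column has already been parsed with `pd.to_datetime`; the sample frame is rebuilt from the question, and the "Agg" backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the question, sorted so the lines run left to right
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-02-14", "2022-02-13", "2022-02-12", "2022-02-11"]),
    "close": [324.95, 324.87, 327.20, 319.61],
    "MA100": [320.12, 320.11, 321.50, 320.71],
    "buy": [0, 1, 0, 0],
    "sell": [0, 0, 0, 1],
}).sort_values("date")

# Boolean masks select the transaction days; no explicit row loop is needed
buys = df[df["buy"] == 1]
sells = df[df["sell"] == 1]

fig, ax = plt.subplots()
ax.plot(df["date"], df["close"], label="close")
ax.plot(df["date"], df["MA100"], label="MA100")
ax.scatter(buys["date"], buys["close"], c="g", zorder=3, label="buy")
ax.scatter(sells["date"], sells["close"], c="r", zorder=3, label="sell")
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

The markers sit at (date, close) for each flagged row, which matches the placement the question asks for.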

matplotlib.units.ConversionError: Failed to convert value(s) to axis units: NaT

I am trying to plot the following time-series data and draw some vertical lines based on the first date and last date. If there is any NA I am getting an error. I would like to run the code without any error but keeping NA values in the Date column.
data
Date Result
2017-01-06 0.0
2017-01-07 1.0
2017-01-08 0.0
2017-01-09 0.0
2017-01-10 0.0
NA 0.0
code
first = '2017-01-06'
last = 'NA'
fig, ax = plt.subplots()
ax.bar(data['Date'], data['Result'], linestyle='-',color='midnightblue' ,lw=6, width=0.3)
ax.axvline(pd.to_datetime(first), color='red', zorder=1, linestyle='--', lw=8)
ax.axvline(pd.to_datetime(last), color='red', zorder=1, linestyle='--', lw=8)
figure = fig.savefig(result, bbox_inches='tight')
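One possible way around the error, sketched under the assumption that it is acceptable to skip whatever cannot be placed on a date axis: keep the NA rows in `data`, plot only the rows that have a date, and draw a vertical line only when the marker date is not NaT. The names `plottable` and `drawn` are hypothetical helpers, and the sample frame is rebuilt from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question; the missing date becomes NaT
data = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2017-01-06", "2017-01-07", "2017-01-08", "2017-01-09", "2017-01-10", None]
    ),
    "Result": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
})

first = pd.to_datetime("2017-01-06", errors="coerce")
last = pd.to_datetime("NA", errors="coerce")  # not a valid date, so this is NaT

# Plot only rows with a real date; `data` itself still keeps its NaT row
plottable = data.dropna(subset=["Date"])

fig, ax = plt.subplots()
ax.bar(plottable["Date"], plottable["Result"], color="midnightblue", width=0.3)

drawn = 0  # hypothetical counter, just to show how many lines were drawn
for marker in (first, last):
    if pd.notna(marker):  # NaT cannot be converted to axis units, so skip it
        ax.axvline(marker, color="red", linestyle="--", zorder=1)
        drawn += 1
```

Here only the `first` line is drawn, because `last` is NaT.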

How to plot evenly spaced values on the x axis while plotting using matplotlib

I have a Series with more than 100,000 rows that I want to plot. I have a problem with the x-axis of my figure: since it is made up of many dates, the labels are unreadable if all of them are shown.
How can I show only one out of every x labels on the x-axis?
Here is an example of code that produces a plot with an ugly x-axis:
sr = pd.Series(np.array(range(15)))
sr.index = [ '2018-06-' + str(x).zfill(2) for x in range(1,16)]
Out :
2018-06-01 0
2018-06-02 1
2018-06-03 2
2018-06-04 3
2018-06-05 4
2018-06-06 5
2018-06-07 6
2018-06-08 7
2018-06-09 8
2018-06-10 9
2018-06-11 10
2018-06-12 11
2018-06-13 12
2018-06-14 13
2018-06-15 14
fig = plt.plot(sr)
plt.xlabel('Date')
plt.ylabel('Sales')

Using xticks you can achieve the desired effect:
In your example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
sr = pd.Series(np.array(range(15)))
sr.index = ['2018-06-' + str(x).zfill(2) for x in range(1, 16)]
fig = plt.plot(sr)
plt.xlabel('Date')
plt.xticks(sr.index[::4])  # Show one in every four dates
plt.ylabel('Sales')
Alternatively, if you want to set the number of ticks instead, you can use locator_params:
sr.plot(xticks=sr.reset_index().index)
plt.locator_params(axis='x', nbins=5) #Show five dates
plt.ylabel('Sales')
plt.xlabel('Date')
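If the index holds real datetimes rather than strings, a date locator can do the thinning. The sketch below assumes the string index is first converted with `pd.to_datetime`, which the original answer does not do; `tick_gaps` is a hypothetical name used only to inspect the spacing.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(15))
# Assumption: parse the date strings so Matplotlib treats the axis as dates
sr.index = pd.to_datetime(["2018-06-" + str(x).zfill(2) for x in range(1, 16)])

fig, ax = plt.subplots()
ax.plot(sr.index, sr.values)
ax.xaxis.set_major_locator(mdates.DayLocator(interval=4))  # a tick every 4th day
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.set_xlabel("Date")
ax.set_ylabel("Sales")
fig.autofmt_xdate()

# Spacing between consecutive ticks, in days
tick_gaps = np.diff(ax.get_xticks())
```

For a much longer series, leaving the locator at Matplotlib's default for date axes may already give a readable number of ticks; the fixed interval above is just the direct counterpart of "one in every four dates".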

How can I plot different length pandas series with matplotlib?

I've got two pandas series, one with a 7 day rolling mean for the entire year and another with monthly averages. I'm trying to plot them both on the same matplotlib figure, with the averages as a bar graph and the 7 day rolling mean as a line graph. Ideally, the line would be drawn on top of the bar graph.
The issue I'm having is that, with my current code, the bar graph is showing up without the line graph, but when I try plotting the line graph first I get a ValueError: ordinal must be >= 1.
Here's what the series look like:
These are the first 15 values of the 7 day rolling mean series; it has a date and a value for the entire year:
date
2016-01-01 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 NaN
2016-01-06 NaN
2016-01-07 NaN
2016-01-08 0.088473
2016-01-09 0.099122
2016-01-10 0.086265
2016-01-11 0.084836
2016-01-12 0.076741
2016-01-13 0.070670
2016-01-14 0.079731
2016-01-15 0.079187
2016-01-16 0.076395
This is the entire monthly average series:
dt_month
2016-01-01 0.498323
2016-02-01 0.497795
2016-03-01 0.726562
2016-04-01 1.000000
2016-05-01 0.986411
2016-06-01 0.899849
2016-07-01 0.219171
2016-08-01 0.511247
2016-09-01 0.371673
2016-10-01 0.000000
2016-11-01 0.972478
2016-12-01 0.326921
Here's the code I'm using to try and plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
series_two.plot(ax=ax)
plt.show()
Here's the graph that generates:
Any help is hugely appreciated! Also, advice on formatting this question and creating code to make two series for a minimum working example would be awesome.
Thanks!!

The problem is that pandas bar plots are categorical (the bars sit at consecutive integer positions). Since in your case the two series have a different number of elements, plotting the line graph in categorical coordinates is not really an option. What remains is to plot the bar graph in numerical coordinates as well. This is not possible with pandas, but it is the default behaviour with matplotlib.
Below I shift the monthly dates by 15 days to the middle of the month to have nicely centered bars.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
import pandas as pd
t1 = pd.date_range("2018-01-01", "2018-12-31", freq="D")
s1 = pd.Series(np.cumsum(np.random.randn(len(t1)))+14, index=t1)
s1[:6] = np.nan
t2 = pd.date_range("2018-01-01", "2018-12-31", freq="MS")
s2 = pd.Series(np.random.rand(len(t2))*15+5, index=t2)
# shift monthly data to middle of month
s2.index += pd.Timedelta('15 days')
fig, ax = plt.subplots()
ax.bar(s2.index, s2.values, width=14, alpha=0.3)
ax.plot(s1.index, s1.values)
plt.show()

The problem might be that the two series' indices are on very different scales. You can use ax.twiny to plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
ax_tw = ax.twiny()
series_two.plot(ax=ax_tw)
plt.show()

Plotting multiple dates on year in scatterplot Python

I have data with a Date column that I would like to split by year automatically in a scatterplot. It looks like this:
Date Level Price
2008-01-01 56 11
2008-01-03 10 12
2008-01-05 52 13
2008-02-01 66 14
2008-05-01 20 10
..
2009-01-01 12 11
2009-02-01 70 11
2009-02-05 56 12
..
2018-01-01 56 10
2018-01-11 10 17
..
The only way I know to tackle this is to select the rows manually with iloc, eyeballing the dates in the dataframe like this:
fig = plt.figure(figsize=(15,10))
ax1 = fig.add_subplot(111)
ax1.scatter(df.iloc[____, 1], df.iloc[_____, 2], s=10, c='r', marker="o", label='2008')
ax1.scatter(df.iloc[____, 1],df.iloc[____, 2], s=10, c='mediumblue', marker="o", label='2009')
.
.
. (for each year I want)
plt.ylabel('Level', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(loc='upper left', prop={'size': 12});
plt.show()
But this takes a lot of time.
I'd like to loop over the years in Date automatically and plot Level (Y) against Price (X), with one color and one legend label per year.
What would be a good strategy to do this?
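One strategy that might fit, sketched under the assumption that Date is (or can be converted to) a datetime column: group the frame by `df["Date"].dt.year` and call scatter once per group, letting Matplotlib's color cycle assign a color to each year. Only the rows visible in the question are used as sample data; `legend_labels` is a hypothetical name for inspecting the result.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Only the rows shown in the question; the elided rows are not reconstructed
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2008-01-01", "2008-01-03", "2008-01-05", "2008-02-01", "2008-05-01",
        "2009-01-01", "2009-02-01", "2009-02-05",
        "2018-01-01", "2018-01-11",
    ]),
    "Level": [56, 10, 52, 66, 20, 12, 70, 56, 56, 10],
    "Price": [11, 12, 13, 14, 10, 11, 11, 12, 10, 17],
})

fig, ax = plt.subplots(figsize=(15, 10))

# One scatter call per year; each call takes the next color in the default cycle
for year, group in df.groupby(df["Date"].dt.year):
    ax.scatter(group["Price"], group["Level"], s=10, marker="o", label=str(year))

ax.set_xlabel("Price", fontsize=14)
ax.set_ylabel("Level", fontsize=14)
ax.legend(loc="upper left", prop={"size": 12})

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

The default color cycle has ten entries, so with more than ten years the colors repeat; passing an explicit color per year (for example from a colormap) would avoid that.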
