Python pandas scatterplot of year against month-day - python

So I have a dataframe with a date column.
date
2021-06-17
2020-06-20
I want to draw a scatterplot with the year on the x-axis and the month-day on the y-axis.
I would like the y-axis ticks to show the actual month-day values rather than the day number within the full date. I'm not sure whether this is possible, but any help is much appreciated.

Some Sample Data:
import pandas as pd
from matplotlib import pyplot as plt, dates as mdates
# Some Sample Data
df = pd.DataFrame({
    'date': pd.date_range(
        start='2000-01-01', end='2020-12-31', freq='D'
    )
}).sample(n=100, random_state=5).sort_values('date').reset_index(drop=True)
Then one option would be to normalize the dates to the same year. Any year works as long as it is a leap year, so that February 29 (leap day) remains a valid date.
The normalized column becomes the new y-axis.
# Create New Column with all dates normalized to same year
# Any year works as long as it's a leap year in case of a Feb-29
df['month-day'] = pd.to_datetime('2000-' + df['date'].dt.strftime('%m-%d'))
# Plot DataFrame
ax = df.plot(kind='scatter', x='date', y='month-day')
# Set Date Format On Axes
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y')) # Year Only
ax.yaxis.set_major_formatter(mdates.DateFormatter('%m-%d')) # No Year
plt.tight_layout()
plt.show()
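To see why the placeholder year has to be a leap year, here is a small sketch. The sample dates and the names `leap` and `non_leap` are made up for illustration. It parses the same month-day strings against 2000 and against 2001; with `errors='coerce'`, the February 29 entry cannot be parsed in 2001 and becomes `NaT`.

```python
import pandas as pd

# Hypothetical sample dates, including a leap day
dates = pd.Series(pd.to_datetime(['2021-06-17', '2020-02-29', '2020-06-20']))
month_day = dates.dt.strftime('%m-%d')

# 2000 is a leap year, so every month-day string parses
leap = pd.to_datetime('2000-' + month_day, format='%Y-%m-%d')

# 2001 is not a leap year; errors='coerce' turns the invalid 02-29 into NaT
non_leap = pd.to_datetime('2001-' + month_day, format='%Y-%m-%d', errors='coerce')

print(leap.notna().all())     # True
print(non_leap.isna().sum())  # 1
```

Without `errors='coerce'`, the second call would raise instead of returning `NaT`, which is the failure the leap-year choice avoids.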

Related

Is it possible to add another x axis to a plotly chart?

I have a plotly chart that looks like this:
Is there a way to add a second x-axis that shows only the years? That is, I want two x-axes: a 'sub-axis' with the months (Sep, Nov, Jan, ...) and another one with the years (2021, 2022, 2023).
This can be handled by passing a list of two lists as the x values, which gives a multi-category x-axis. The catch is that the axis then works in categories rather than dates: if the original data is daily and you label it by month only, a year of data no longer has 365 distinct x positions but only 12. The closest way to meet the request is to use the year as the outer category and the month name plus day as the inner category.
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# Sample stock data shipped with plotly
df = px.data.stocks()
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
# Outer category: year; inner category: month name and day
multi_index = [df['year'].values, df['date'].dt.strftime('%b-%d').values]
fig = go.Figure()
fig.add_scatter(x=multi_index, y=df['GOOG'])
fig.show()
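The point about 365 versus 12 positions can be checked without plotting. The following is a minimal sketch on made-up daily dates for 2021 (the names `month_only` and `month_day` are illustrative); it only counts the distinct category labels each format string produces.

```python
import pandas as pd

# One year of daily dates (2021 is not a leap year, so 365 days)
dates = pd.date_range('2021-01-01', '2021-12-31', freq='D')

month_only = dates.strftime('%b')    # month name only, e.g. 'Jan'
month_day = dates.strftime('%b-%d')  # month name and day, e.g. 'Jan-01'

print(month_only.nunique())  # 12 distinct categories
print(month_day.nunique())   # 365 distinct categories
```

With month-only labels, every day of a month would share one category, which is why the answer keeps the day in the inner label.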

How can I only plot hours and minutes in seaborn? [duplicate]

I have the following data structure:
import pandas as pd
df = pd.DataFrame({"Date":["2015-02-02 14:19:00","2015-02-02 14:22:00","2015-02-17 14:57:00","2015-02-17 14:58:59"],"Occurrence":[1,0,1,1]})
df["Date"] = pd.to_datetime(df["Date"])
I want to plot the following:
import seaborn as sns
sns.set_theme(style="darkgrid")
sns.lineplot(x="Date", y="Occurrence", data=df)
And I get this:
I only want the hours and minutes to be shown on the x axis (the date of the day is unnecessary). How can I do that?
You can use matplotlib's DateFormatter. Updated code below. Note that the Date column you posted has dates on both the 2nd and the 17th; I changed them so that everything falls on the 2nd, otherwise there would be too many ticks. Hope this helps.
import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates

df = pd.DataFrame({"Date":["2015-02-02 10:19:00","2015-02-02 12:22:00","2015-02-02 14:57:00","2015-02-02 16:58:59"],"Occurrence":[1,0,1,1]})
df["Date"] = pd.to_datetime(df["Date"])
sns.set_theme(style="darkgrid")
ax = sns.lineplot(x="Date", y="Occurrence", data=df)
# place a major tick every 2 hours
ax.xaxis.set_major_locator(mdates.HourLocator(interval=2))
# show only hours and minutes on the tick labels
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
You could instead convert the 'Date' column so that it holds only the time information. I'm not sure whether you want the data ordered by date, but this should leave only time information for the x-axis:
df['Date'].dt.time
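As a rough illustration of what that conversion yields, the sketch below applies it to the question's data and, for comparison, also formats the column as "HH:MM" strings with `dt.strftime`. The string variant is an addition for illustration, not part of the answer above, and the sketch only inspects the values without plotting them.

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-02-02 14:19:00", "2015-02-02 14:22:00",
                            "2015-02-17 14:57:00", "2015-02-17 14:58:59"]),
    "Occurrence": [1, 0, 1, 1],
})

times = df["Date"].dt.time                # datetime.time objects, date part dropped
labels = df["Date"].dt.strftime("%H:%M")  # plain strings, seconds dropped

print(type(times.iloc[0]))  # <class 'datetime.time'>
print(labels.tolist())      # ['14:19', '14:22', '14:57', '14:58']
```

Either way the day is discarded, so rows from the 2nd and the 17th end up side by side on a time-of-day scale.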

Group years by decade in seaborn barplot

If I have a DataFrame with a column 'Year' and another column 'Average temperature' and I want to represent them in a barplot to see if the global average temperature has risen over the last decades, how do you convert years to decades?
For example, between 1980 and 1989 I need it to be represented in x axis as 1980. For 1990 and 1999 as 1990, and so on.
Note that:
x axis = Year
y axis = Average temperature
Many thanks
You can do this by:
1. Converting each year to the starting year of its decade
2. Taking the average temperature over the years of that decade
Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Sample data. Replace it with your data.
df = pd.DataFrame([[2011,20],[2012,10],[2013,10],[2014,10],[2015,10],[2016,10],[2017,10],[2018,10],[2019,10],[2020,10],[2021,10],[2022,15]], columns=['year','temp'])
# Round each year down to the first year of its decade (e.g. 2017 -> 2010)
df['year'] = df['year'] - df['year'] % 10
# Average temperature per decade
df_decade = df.groupby(['year']).mean().reset_index()
ax = sns.barplot(x="year", y="temp", data=df_decade)
plt.show()
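An equivalent way to get the decade, sketched here on the same sample data, is integer floor division. This variant keeps the original `year` column intact and groups by a separate Series (`decade` is a name chosen for illustration); the plotting step is left out.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022],
    'temp': [20, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 15],
})

# Floor-divide by 10 and scale back up: 2017 -> 201 -> 2010
decade = (df['year'] // 10) * 10

# Mean temperature per decade; the grouping Series keeps the name 'year'
df_decade = df.groupby(decade)['temp'].mean().reset_index()
print(df_decade)
```

For this sample, the 2010 group averages nine years and the 2020 group averages three, so the two bars are not based on the same number of observations.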

How can I draw the histogram of date values group by month in each year in Python?

I wrote this code to draw a histogram of the date values by month. It shows the number of dates for each month across the whole dataset, but I want the histogram to be per month within each year. That is, I should have January through December for year 1, then January through December for year 2, and so on.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context("talk")
df = pd.read_csv("data.csv", names=['lender','loan','country','sector','amount','date'],header=None)
df['date'] = pd.to_datetime(df['date'])
df.groupby(df.date.dt.month).count().plot(kind="bar")
According to the groupby docstring, the by parameter is:
list of column names. Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups
So your code simply becomes:
df = pd.read_csv(...)
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
# group by year first so the bars run January..December within each year
df.groupby(by=['year', 'month']).count().plot(kind="bar")
But I would write this as:
ax = (
    pd.read_csv(...)
      .assign(date=lambda df: pd.to_datetime(df['date']))
      .assign(year=lambda df: df['date'].dt.year)
      .assign(month=lambda df: df['date'].dt.month)
      .groupby(by=['year', 'month'])
      .count()
      .plot(kind="bar")
)
Now you have a matplotlib Axes object that you can use to modify the tick labels (see, e.g., "matplotlib x-axis ticks dates formatting and locations").
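As a variation on the year/month grouping, here is a minimal sketch that groups on a monthly period instead of two separate columns. It uses a handful of made-up dates in place of data.csv and stops before plotting; `counts` and `labels` are illustrative names.

```python
import pandas as pd

# Hypothetical dates standing in for the 'date' column of data.csv
df = pd.DataFrame({'date': pd.to_datetime([
    '2019-01-05', '2019-01-20', '2019-03-02',
    '2020-01-11', '2020-12-25', '2020-12-31',
])})

# One group per (year, month) pair, in chronological order
counts = df.groupby(df['date'].dt.to_period('M')).size()

# Readable tick labels such as '2019-01'
labels = counts.index.strftime('%Y-%m')

print(labels.tolist())  # ['2019-01', '2019-03', '2020-01', '2020-12']
print(counts.tolist())  # [2, 1, 1, 2]
```

Months with no rows simply do not appear in `counts`, so a bar chart built from it would skip them rather than show an empty slot.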

Plotting a stacked bar chart with matplotlib using dates as index where there are gaps and missing data

I have a data frame with several columns of continuous annual data from 1971 to 2012, followed by "predicted" values for 2020, 2025, 2030 and 2035. The index of the data frame holds the years as integers. I tried converting it to datetime objects with the datetime module, but the bars are still not spaced along the x-axis according to the actual time gaps. Here's the code I've been experimenting with:
import datetime
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Set title
ttl = "India's fuel mix (1971-2012)"
# Set color transparency (0: transparent; 1: solid)
a = 0.7
# Convert the index integer dates into actual date objects
# (new_fmt is the data frame described above)
new_fmt.index = [datetime.datetime(year=date, month=1, day=1) for date in new_fmt.index]
# .loc replaces the .ix indexer, which was removed in pandas 1.0
new_fmt.loc[:,['Coal', 'Oil', 'Gas', 'Biofuels', 'Nuclear', 'Hydro','Wind']].plot(ax=ax,kind='bar', stacked=True, title = ttl)
ax.grid(False)
xlab = 'Date (Fiscal Year)'
ylab = 'Electricity Generation (GWh)'
ax.set_title(ax.get_title(), fontsize=20, alpha=a)
ax.set_xlabel(xlab, fontsize=16, alpha=a)
ax.set_ylabel(ylab, fontsize=16, alpha=a)
# Tell matplotlib to interpret the x-axis values as dates
ax.xaxis_date()
# Make space for and rotate the x-axis tick labels
fig.autofmt_xdate()
I tried to figure it out:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import datetime
# create data frame with random data (3 rows, 2 columns)
df = pd.DataFrame(np.random.randn(3,2))
# time index with missing years
t = [datetime.date(year=1971, month=12, day=31), datetime.date(year=1972, month=12, day=31), datetime.date(year=1980, month=12, day=31)]
# convert to a DatetimeIndex so the labels match the date_range below
df.index = pd.to_datetime(t)
# time index with all the years (year-end dates);
# on pandas 2.2+ use freq="YE", since "A" is deprecated there
tnew = pd.date_range(datetime.date(year=1971, month=1, day=1),datetime.date(year=1981, month=1, day=1),freq="A")
# reindex data frame (missing years will be filled with NaN)
df2 = df.reindex(tnew)
# replace NaN with 0
df2_zeros = df2.fillna(0)
# or interpolate
df2_interp = df2.interpolate()
# and plot
df2_interp.columns = ["coal","wind"]
df2_interp.plot(kind='bar', stacked=True)
plt.show()
Hope this helps.
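Since the question's index holds the years as plain integers, the same reindexing idea can be sketched without any date conversion. The values and the column names below are made up for illustration; the sketch stops before plotting.

```python
import pandas as pd

# Hypothetical values for three years, with a gap between 1972 and 1980
df_years = pd.DataFrame(
    {'coal': [1.0, 2.0, 10.0], 'wind': [0.5, 0.5, 2.5]},
    index=[1971, 1972, 1980],
)

# Every year from 1971 to 1980 inclusive
all_years = list(range(1971, 1981))

filled = df_years.reindex(all_years)  # missing years become NaN rows
zeros = filled.fillna(0)              # option 1: show gaps as empty bars
interp = filled.interpolate()         # option 2: linear fill between known years

print(len(filled))                # 10
print(zeros.loc[1975, 'coal'])    # 0.0
print(interp.loc[1976, 'coal'])   # 6.0
```

Filling with zeros keeps the gaps visible as empty slots in a bar chart, while interpolation invents intermediate values, so which one is appropriate depends on what the chart is meant to show.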
