If I have a DataFrame with a column 'Year' and another column 'Average temperature', and I want to represent them in a bar plot to see whether the global average temperature has risen over the last decades, how do I convert years to decades?
For example, the years 1980 to 1989 should all be represented on the x axis as 1980, the years 1990 to 1999 as 1990, and so on.
Note that:
x axis = Year
y axis = Average temperature
Many thanks
You can do this by:
Converting each year to the starting year of its decade
Then taking the average temperature over the years of that decade
Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
#Sample data. Replace it with your data.
df = pd.DataFrame([[2011,20],[2012,10],[2013,10],[2014,10],[2015,10],[2016,10],[2017,10],[2018,10],[2019,10],[2020,10],[2021,10],[2022,15]], columns=['year','temp'])
# Round each year down to the first year of its decade (e.g. 2017 -> 2010)
df['year'] = df['year'] - df['year'] % 10
# Average the temperature over all years that fall in the same decade
df_decade = df.groupby('year', as_index=False)['temp'].mean()
# One bar per decade
ax = sns.barplot(x="year", y="temp", data=df_decade)
plt.show()
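The same idea can be wrapped in a small helper that uses the column names from the question and writes the decade to a new column instead of overwriting 'Year'. This is a sketch under assumptions: the helper name decade_means, the new 'Decade' column and the sample temperatures are hypothetical. Floor division (year // 10 * 10) gives the same result as year - year % 10 for positive years.

```python
import pandas as pd

def decade_means(df, year_col="Year", value_col="Average temperature"):
    """Label each row with its decade's first year and average value_col per decade."""
    # Floor division drops the last digit: 1987 // 10 * 10 == 1980
    decades = df[year_col] // 10 * 10
    return (df.assign(Decade=decades)              # original Year column is untouched
              .groupby("Decade", as_index=False)[value_col]
              .mean())

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    "Year": [1980, 1985, 1989, 1990, 1999, 2001],
    "Average temperature": [14.0, 14.2, 14.4, 14.5, 14.7, 14.9],
})
result = decade_means(df)
print(result)
```

The result can then be plotted with sns.barplot(x="Decade", y="Average temperature", data=result).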
I have a plotly chart that looks like this:
Is there a way to make a second x axis that only has the years? What I mean is that I want two x axes: a 'sub-axis' with the months (Sep, Nov, Jan, ...) and another one with the years (2021, 2022, 2023).
You can handle this by making the x-axis a multicategory axis, i.e. passing a list of two arrays as x. However, if the original data is daily, labeling it by month alone would collapse it into one category per month: one year of data has 365 points, but with month-only labels there would be only 12 categories. The closest way to meet the request is therefore to use the year as the outer level and the month name plus day as the inner level.
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
df = px.data.stocks()  # sample data bundled with Plotly
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
# Two-level x values: outer level = year, inner level = 'Mon-DD'
multi_index = [df['year'].values, df['date'].dt.strftime('%b-%d').values]
fig = go.Figure()
fig.add_scatter(x=multi_index, y=df['GOOG'])
fig.show()
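Because the x values are categories rather than dates, the rows should be in chronological order: by default, Plotly lists categories in the order they first appear in the trace. A minimal sketch of building the two-level x values (and matching y values) from unsorted data, using pandas only; the helper name multicategory_xy and the sample values are hypothetical.

```python
import pandas as pd

def multicategory_xy(df, date_col, value_col):
    """Return (x, y) for a Plotly multicategory axis: x = [years, 'Mon-DD' labels].

    Rows are sorted by date first so that the categories (and the y values
    that go with them) appear in chronological order."""
    dates = pd.to_datetime(df[date_col])
    ordered = df.assign(**{date_col: dates}).sort_values(date_col)
    x = [ordered[date_col].dt.year.tolist(),
         ordered[date_col].dt.strftime("%b-%d").tolist()]
    return x, ordered[value_col].tolist()

# Hypothetical, deliberately unsorted sample data
sample = pd.DataFrame({
    "date": ["2022-01-02", "2021-09-05", "2021-11-14"],
    "value": [3, 1, 2],
})
x, y = multicategory_xy(sample, "date", "value")
print(x)  # [[2021, 2021, 2022], ['Sep-05', 'Nov-14', 'Jan-02']]
print(y)  # [1, 2, 3]
```

With Plotly installed, the pair is used exactly like multi_index above: fig.add_scatter(x=x, y=y).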
I have the following data structure:
import pandas as pd
df = pd.DataFrame({"Date":["2015-02-02 14:19:00","2015-02-02 14:22:00","2015-02-17 14:57:00","2015-02-17 14:58:59"],"Occurrence":[1,0,1,1]})
df["Date"] = pd.to_datetime(df["Date"])
I want to plot the following:
import seaborn as sns
sns.set_theme(style="darkgrid")
sns.lineplot(x="Date", y="Occurrence", data=df)
And I get this:
I only want the hours and minutes to be shown on the x axis (the date of the day is unnecessary). How can I do that?
You can use matplotlib's DateFormatter. Updated code below. I noticed that the Date column you posted has dates on the 2nd and the 17th; I changed those to put everything on the 2nd, because otherwise there would be too many tick entries. Hope this helps.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.DataFrame({"Date":["2015-02-02 10:19:00","2015-02-02 12:22:00","2015-02-02 14:57:00","2015-02-02 16:58:59"],"Occurrence":[1,0,1,1]})
df["Date"] = pd.to_datetime(df["Date"])
sns.set_theme(style="darkgrid")
ax = sns.lineplot(x="Date", y="Occurrence", data=df)
# Put a major tick every 2 hours
ax.xaxis.set_major_locator(mdates.HourLocator(interval=2))
# Show only hours and minutes in the tick labels
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
plt.show()
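You would want to convert your 'Date' column so that it only contains the time of day. I'm not sure whether you want the data ordered by date or not, but this keeps only the time information for the x-axis:
df['Date'].dt.time
Note that this produces plain datetime.time objects rather than datetimes, so matplotlib's date locators and formatters will not apply to them; depending on your versions, you may need to convert them before plotting.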
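One workaround (not taken from the answers above) is to keep real datetimes but move every row onto the same day, so that times from the 2nd and the 17th share one axis and DateFormatter('%H:%M') still works. A minimal sketch using the question's data and plain matplotlib; the fixed day 2000-01-01 and the TimeOfDay column name are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

df = pd.DataFrame({"Date": ["2015-02-02 14:19:00", "2015-02-02 14:22:00",
                            "2015-02-17 14:57:00", "2015-02-17 14:58:59"],
                   "Occurrence": [1, 0, 1, 1]})
df["Date"] = pd.to_datetime(df["Date"])

# Time since midnight of each row's own date (a timedelta)...
time_of_day = df["Date"] - df["Date"].dt.normalize()
# ...added to one arbitrary fixed day, so every row shares the same date
df["TimeOfDay"] = pd.Timestamp("2000-01-01") + time_of_day

fig, ax = plt.subplots()
ax.plot(df["TimeOfDay"], df["Occurrence"], marker="o")
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))  # hours:minutes only
fig.autofmt_xdate()
```

Call plt.show() afterwards in an interactive session to display the figure.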
So I have a dataframe with a date column.
date
2021-06-17
2020-06-20
What I want is a scatter plot with the year on the x-axis and the month-day on the y-axis. Here is what I have so far:
What I would like is for the y-axis ticks to be the actual month-day values and not the day number for the month-day-year. Not sure if this is possible, but any help is much appreciated.
Some Sample Data:
import pandas as pd
from matplotlib import pyplot as plt, dates as mdates
# Some Sample Data
df = pd.DataFrame({
'date': pd.date_range(
start='2000-01-01', end='2020-12-31', freq='D'
)
}).sample(n=100, random_state=5).sort_values('date').reset_index(drop=True)
Then one option would be to normalize the dates to the same year. Any year works as long as it's a leap year to handle the possibility of a February 29th (leap day).
This becomes the new y-axis.
# Create New Column with all dates normalized to same year
# Any year works as long as it's a leap year in case of a Feb-29
df['month-day'] = pd.to_datetime('2000-' + df['date'].dt.strftime('%m-%d'))
# Plot DataFrame
ax = df.plot(kind='scatter', x='date', y='month-day')
# Set Date Format On Axes
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y')) # Year Only
ax.yaxis.set_major_formatter(mdates.DateFormatter('%m-%d')) # No Year
plt.tight_layout()
plt.show()
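The leap-year requirement can be checked directly: moving 29 February onto a leap year works, while a non-leap year has no such day and parsing fails. A minimal sketch; the helper name normalize_to_year and the sample dates are hypothetical.

```python
import pandas as pd

def normalize_to_year(dates: pd.Series, year: int = 2000) -> pd.Series:
    """Keep each date's month and day but move it onto `year`.
    `year` should be a leap year so that Feb 29 stays valid."""
    return pd.to_datetime(str(year) + "-" + dates.dt.strftime("%m-%d"))

dates = pd.Series(pd.to_datetime(["2016-02-29", "2021-06-17", "2020-06-20"]))
month_day = normalize_to_year(dates)
print(month_day.dt.strftime("%Y-%m-%d").tolist())
# ['2000-02-29', '2000-06-17', '2000-06-20']

# With a non-leap year, the Feb 29 row cannot be represented
non_leap_failed = False
try:
    normalize_to_year(dates, year=2001)
except ValueError:
    non_leap_failed = True
print(non_leap_failed)  # True
```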
I am trying to create a stacked bar chart showing total marriages by month for each year between 2008 and 2015.
import pandas as pd
import numpy as np
import io
import requests
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import seaborn as sns  # needed for sns.color_palette below
url = "https://data.code4sa.org/api/views/r4bb-fvka/rows.csv"
file=requests.get(url).content
c=pd.read_csv(io.StringIO(file.decode('utf-8')))
Here I am adding the total number of marriages for each year, then grouping by both MarriageYear and MarriageMonth to get the total number of marriages for each month:
c['Total'] = c['MarriageYear']
months = c.groupby(['MarriageYear','MarriageMonth'])['Total'].count()
I think the index should be both MarriageYear and MarriageMonth, since I want the total number of marriages for each month in every year?
months.set_index(['MarriageYear','MarriageMonth'])\
.reindex(months.set_index('MarriageMonth').sum().sort_values().index, axis=1)\
.T.plot(kind='bar', stacked=True,
colormap=ListedColormap(sns.color_palette("GnBu", 10)),
figsize=(24,28))
If you post a potential solution or something I should look at again, please explain why/where I went wrong and how I should approach this.
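Try this. months is already a Series indexed by (MarriageYear, MarriageMonth), and a Series has no set_index method, so the chained call fails. Instead, count the rows in each (year, month) group with size(), move the months into columns with unstack(), and plot the resulting year-by-month table as stacked bars: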
c.groupby(['MarriageYear', 'MarriageMonth']).size() \
.unstack().plot.bar(stacked=True, colormap='GnBu', figsize=(12, 14))
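To see what the one-liner produces, here is a sketch on a tiny hypothetical stand-in for the Code4SA data (one row per marriage). unstack(fill_value=0) is used so that months with no marriages show as 0 rather than NaN; that argument is an addition to the answer above.

```python
import pandas as pd

# Hypothetical stand-in for the marriages data: one row per marriage
c = pd.DataFrame({
    "MarriageYear":  [2008, 2008, 2008, 2009, 2009, 2010],
    "MarriageMonth": [1, 1, 2, 1, 3, 2],
})

# Rows = years, columns = months, cells = number of marriages
counts = (c.groupby(["MarriageYear", "MarriageMonth"]).size()
           .unstack(fill_value=0))
print(counts)

# One stacked bar per year, one coloured segment per month
ax = counts.plot.bar(stacked=True, colormap="GnBu")
```

Each column of counts (a month) becomes one bar series, and stacked=True piles the months on top of each other within each year's bar.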
I wrote this code to draw a histogram of the date values per month. It shows the number of dates for each month across the whole dataset, but I want the histogram to be for each month of each year. For example, I should have January through December for year 1, then January through December for year 2, and so on.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context("talk")
df = pd.read_csv("data.csv", names=['lender','loan','country','sector','amount','date'],header=None)
df['date'] = pd.to_datetime(df['date'])
df.groupby(df['date'].dt.month).count().plot(kind="bar")
According to the groupby docstring, the by parameter is:
list of column names. Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups
So your code simply becomes:
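import pandas as pd
df = pd.read_csv(...)
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
# Group by year first so each year shows January through December in order;
# size() gives one count per group (count() would give one count per column)
df.groupby(by=['year', 'month']).size().plot(kind="bar")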
But I would write this as:
ax = (
    pd.read_csv(...)
    .assign(date=lambda df: pd.to_datetime(df['date']))
    .assign(year=lambda df: df['date'].dt.year)
    .assign(month=lambda df: df['date'].dt.month)
    .groupby(by=['year', 'month'])
    .size()
    .plot(kind="bar")
)
Now you have a matplotlib Axes object that you can use to modify the tick labels (see, e.g., the question "matplotlib x-axis ticks dates formatting and locations").
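If a year has months with no dates at all, those months simply disappear from the grouped counts. A hedged sketch that reindexes the (year, month) counts against every month of every year so each year shows all twelve bars; the sample dates are hypothetical and stand in for the asker's data.csv.

```python
import pandas as pd

# Hypothetical sample: a few dates spread over two years
df = pd.DataFrame({"date": pd.to_datetime(
    ["2014-01-15", "2014-01-20", "2014-03-02", "2015-02-11", "2015-12-31"])})

# Number of rows per (year, month), ordered year first
counts = df.groupby([df["date"].dt.year.rename("year"),
                     df["date"].dt.month.rename("month")]).size()

# Make sure every year shows all 12 months, filling empty months with 0
years = sorted(int(y) for y in df["date"].dt.year.unique())
full_index = pd.MultiIndex.from_product([years, range(1, 13)],
                                        names=["year", "month"])
counts = counts.reindex(full_index, fill_value=0)

ax = counts.plot(kind="bar", figsize=(12, 4))  # 24 bars: Jan-Dec 2014, Jan-Dec 2015
```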