I'm trying to make a pandas stacked bar plot whose x-axis shows months on the major ticks (or the year on Jan 1), ideally with unlabeled minor ticks marking the weeks.
I have a dataset with a datetime index, grouped by week, and I plot that dataset. If I don't change any settings, the dates do show up, but the labels are vertical and don't fit. So I set a date formatter to fix that, but then the axis jumped to 1970, as if it were reading an index number instead of a date. If I replace the pandas plotting with a regular matplotlib bar chart, ConciseDateFormatter works as expected. But I want pandas' stacked option, because building a stacked bar chart by hand is a pain. I don't understand why I can't control the pandas axes the way I can control a regular plot.
One thing I notice is that the index shows up with dtype object. If I convert it with to_datetime(), a 00:00 time gets added, which I don't want on the axis or in my data.
My data is a simple set of weekly random data:
date A B C D
3/20/2022 1.540765154 0.504616419 1.543679189 2.952934623
3/27/2022 1.781135128 4.594966635 4.799026389 3.499803401
4/3/2022 0.254059207 0.69835265 0.323039575 1.628138491
4/10/2022 3.112760301 0.287056897 4.372938373 0.130817579
4/17/2022 0.497273044 0.913246096 1.296612207 1.250610278
4/24/2022 1.370087689 3.124985109 4.322253295 4.49571603
5/1/2022 3.952629538 3.976896924 1.679311114 1.265443147
5/8/2022 3.470328161 1.266161308 3.990502436 1.364929959
5/15/2022 2.296588269 4.639761391 0.04685036 1.438471692
5/22/2022 3.443458637 2.66592719 0.968656871 2.349325343
5/29/2022 1.820278464 4.794211675 2.435710815 2.156110694
6/5/2022 4.328825266 0.049132356 1.842839099 3.665701299
6/12/2022 0.184631564 0.412976815 4.787477069 4.80052839
6/19/2022 4.846734385 3.471474741 1.808871854 2.440013553
6/26/2022 1.612870444 0.70191857 3.55713114 1.438699834
7/3/2022 2.896859156 4.025996887 0.209608767 4.174881655
Code:
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
maxval = 200
values = ['A','B','C','D']
cum = [v + '_CUM' for v in values]
df = pd.read_csv('test_data.csv', index_col='date', parse_dates=True)
#df.index = pd.to_datetime(df.index.date).strftime("%b %d")
df = df.join(df.cumsum(), rsuffix="_CUM")
df = df.join(df[cum]/maxval * 100, rsuffix="_LIFE")
fig, axs = plt.subplots(nrows=2, ncols=1, sharex=False, squeeze=False,
                        facecolor='white')
axs = axs.flatten()
ax = axs[0]
df[values].plot.bar(ax=ax, grid=True, stacked=True, legend=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(ax.xaxis.get_major_locator()))
# ax.xaxis.set_tick_params(rotation = 0)
plt.show(block=False)
Related
I have a dataset containing various fields for users, such as dates and like counts. I am trying to plot a histogram that shows like count with respect to date. How should I do that?
The dataset:
Assuming you want to plot number of public likes by date, you could do something like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('analysis.csv')
# convert text column to date time and keep only the date part
df['created_at'] = pd.to_datetime(df['created_at'])
df['created_at'] = df['created_at'].dt.date
# group by date taking the sum of public_metrics.like_count
df1 = df.groupby(['created_at'])['public_metrics.like_count'].sum().reset_index()
df1 = df1.set_index('created_at')
# plot and show
df1.plot()
plt.show()
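A self-contained variant of the same idea, using resample instead of groupby (a swapped-in technique, not this answer's exact code). The three rows are made up and only reuse the column names from the question; days without tweets appear as zero-height bars, which keeps the date spacing even, closer to a histogram.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows standing in for analysis.csv (same column names)
df = pd.DataFrame({
    "created_at": ["2018-01-02T10:15:00.000Z", "2018-01-02T18:40:00.000Z",
                   "2018-01-05T09:00:00.000Z"],
    "public_metrics.like_count": [3, 4, 10],
})
df["created_at"] = pd.to_datetime(df["created_at"])

# Daily totals; resample also emits days with no tweets (their sum is 0)
daily = df.set_index("created_at")["public_metrics.like_count"].resample("D").sum()

ax = daily.plot.bar(color="tab:blue")
# Date-only labels instead of full timestamps
ax.set_xticklabels(daily.index.strftime("%Y-%m-%d"), rotation=45, ha="right")
ax.set_ylabel("Likes")
```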
And this is the output you will get
To add to the first answer: you could plot the like counts for a single month as a bar chart, which may be closer to the histogram you have in mind. For example, for January:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Read and clean data
df = pd.read_csv('tweets_data.txt')
df['created_at'] = df['created_at'].str.replace(".000Z", "", regex=False)
df.created_at
# Create a new dataframe with only two columns: data and number of likes
histogram_data = pd.concat([df[['created_at']],df[['public_metrics.like_count']]],axis=1)
January_values = histogram_data[histogram_data['created_at'].astype(str).str.contains('2018-01')]
January_values
January_values.shape
dictionary = {}
for date, n_likes in January_values.itertuples(index=False):
    dictionary[date] = n_likes
print(dictionary)
# Create figure and plot space
fig, ax = plt.subplots(figsize=(12, 12))
# Add x-axis and y-axis
ax.bar(dictionary.keys(),
dictionary.values(),
color='purple')
# Set title and labels for axes
ax.set_xlabel('Date', fontsize = 20)
ax.set_ylabel('Counts', fontsize = 20)
ax.set_title('Tweets likes counts in January 2018', fontsize = 15, weight = "bold")
# Ensure a major tick for each week using (interval=1)
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))
ax.tick_params(axis='x', which='major', labelsize=15, width=2)
plt.setp( ax.xaxis.get_majorticklabels(), rotation=-45, ha="left", weight="bold")
plt.show()
The output is:
Of course, if you use all of your data (more than 3000 dates), the bars will become very thin.
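A more compact version of the January example, sketched under the assumption that created_at holds ISO timestamps like "2018-01-03T08:00:00.000Z" (the rows below are invented). It parses real datetimes instead of matching strings and sums likes per calendar day, whereas the dictionary above keeps one entry per distinct timestamp, so a repeated timestamp overwrites the earlier one.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line for an interactive window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for tweets_data.txt with the same two columns
df = pd.DataFrame({
    "created_at": ["2018-01-03T08:00:00.000Z", "2018-01-03T20:00:00.000Z",
                   "2018-01-17T12:00:00.000Z", "2018-02-01T09:00:00.000Z"],
    "public_metrics.like_count": [5, 2, 8, 4],
})
# Parse to real datetimes (UTC) instead of stripping ".000Z" and matching strings
df["created_at"] = pd.to_datetime(df["created_at"], utc=True)

# Keep January 2018, then total the likes per calendar day
created = df["created_at"]
jan = df[(created.dt.year == 2018) & (created.dt.month == 1)]
likes_per_day = jan.groupby(jan["created_at"].dt.date)["public_metrics.like_count"].sum()

fig, ax = plt.subplots(figsize=(12, 6))
ax.bar(likes_per_day.index, likes_per_day.values, color="purple")
# One major tick per week, as in the answer above
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %b"))
```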
I have plotted a heatmap, shown below. The x-axis shows the time of day and the y-axis shows the date. I want a tick label at every hour instead of the seemingly random x labels it currently displays.
I tried the following code, but the resulting heatmap draws all the x labels on top of each other:
t = pd.date_range(start='00:00:00', end='23:59:59', freq='60T').time
df = pd.DataFrame(index=t)
df.reset_index(inplace=True)
df['index'] = df['index'].astype('str')
sns_hm = sns.heatmap(data=mat, cbar=True, lw=0,cmap=colormap,xticklabels=df['index'])
The following code assumes mat is a DataFrame with one row per day and one column per timestamp, where the same timestamps repeat for every day.
After drawing the heatmap, the left and right limits of the x-axis are retrieved. Assuming these span 0 to 24 hours, the range can be divided into 25 evenly spaced positions, one for each hour from 00:00 to 24:00.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from pandas.tseries.offsets import DateOffset
from matplotlib.colors import ListedColormap, to_hex
# first, create some test data
df = pd.DataFrame()
df["date"] = pd.date_range('20220304', periods=19000, freq=DateOffset(seconds=54))
df["val"] = (((np.random.rand(len(df)) ** 100).cumsum() / 2).astype(int) % 2) * 100
df['day'] = df['date'].dt.strftime('%d-%m-%Y')
df['time'] = df['date'].dt.strftime('%H:%M:%S')
mat = df.pivot(index='day', columns='time', values='val')
colors = list(plt.cm.Greens(np.linspace(0.2, 0.9, 10)))
ax = sns.heatmap(mat, cmap=colors, cbar_kws={'ticks': range(0, 101, 10)})
xmin, xmax = ax.get_xlim()
tick_pos = np.linspace(xmin, xmax, 25)
tick_labels = [f'{h:02d}:00:00' for h in range(len(tick_pos))]
ax.set_xticks(tick_pos)
ax.set_xticklabels(tick_labels, rotation=90)
ax.set(xlabel='', ylabel='')
plt.tight_layout()
plt.show()
The left plot shows the default tick labels, the right plot the customized labels.
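Spreading 25 ticks evenly assumes the columns cover exactly one full day; if recording starts or stops mid-day, the labels drift. An alternative sketch places one tick at the first column of each hour, assuming the columns are sorted 'HH:MM:SS' strings as built above and that seaborn centres heatmap cell i at x = i + 0.5. The helper name hour_tick_positions is hypothetical.

```python
import numpy as np
import pandas as pd


def hour_tick_positions(columns):
    """Return (positions, labels) marking the first column of each hour.

    Assumes sorted 'HH:MM:SS' column labels and heatmap cells centred at i + 0.5.
    """
    hours = np.asarray(pd.Index(columns).str.slice(0, 2))
    # True wherever the hour differs from the previous column's hour
    first = np.flatnonzero(np.r_[True, hours[1:] != hours[:-1]])
    positions = first + 0.5
    labels = [f"{h}:00" for h in hours[first]]
    return positions, labels


# Usage with the heatmap above (ax and mat as defined there):
# pos, labels = hour_tick_positions(mat.columns)
# ax.set_xticks(pos)
# ax.set_xticklabels(labels, rotation=90)
```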
I have the following:
fig, ax = plt.subplots(figsize=(40, 10))
sns.lineplot(x="Date", y="KFQ imports", data=df_dry, color="BLACK", ax=ax)
sns.lineplot(x="Date", y="QRR imports", data=df_dry, color="RED",ax=ax)
ax.set(xlabel="Date", ylabel="Value", )
x_dates = df_dry['Date'].dt.strftime('%b-%Y')
ax.set_xticklabels(labels=x_dates, rotation=45)
Result
When I use a bar chart (sns.barplot), all the dates are shown. Am I missing something for the line chart?
The idea is to set the xticks to exactly the dates in your dataframe, using set_xticks(df.Date.values). It is then useful to add a date formatter, which lets you format the dates however you want.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates
import seaborn as sns
df = pd.DataFrame({"Date" : ["2018-01-22", "2018-04-04", "2018-12-06"],
"val" : [1,2,3]})
df.Date = pd.to_datetime(df.Date)
ax = sns.lineplot(data=df, x="Date", y="val", marker="o")
ax.set(xticks=df.Date.values)
ax.xaxis.set_major_formatter(dates.DateFormatter("%d-%b-%Y"))
plt.show()
The same can be achieved without seaborn:
ax = df.set_index("Date").plot(x_compat=True, marker="o")
ax.set(xticks=df.Date.values)
ax.xaxis.set_major_formatter(dates.DateFormatter("%d-%b-%Y"))
plt.show()
One way to circumvent the date tick sparsification from Seaborn is to convert the date column to string values.
df['dates_str'] = df.dates.astype(str)
Plotting using the new column as the x-axis will show all the dates.
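The string-conversion idea can be sketched without seaborn (plain matplotlib is swapped in here for brevity): with string x values matplotlib switches to a categorical axis, which places one evenly spaced tick per row. The column names dates and val and the three sample dates are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; 'dates' mirrors the column name used above
df = pd.DataFrame({"dates": pd.to_datetime(["2018-01-22", "2018-04-04", "2018-12-06"]),
                   "val": [1, 2, 3]})
# Midnight-only datetimes convert to plain 'YYYY-MM-DD' strings
df["dates_str"] = df["dates"].astype(str)

fig, ax = plt.subplots()
# String x values give a categorical axis: one tick per row, evenly spaced
ax.plot(df["dates_str"], df["val"], marker="o")
ax.tick_params(axis="x", rotation=45)
```

The trade-off is that spacing no longer reflects elapsed time: January to April and April to December get the same gap.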
I have a pandas dataframe with a date column (RankingDate).
This date field is initially a string loaded from a CSV, in the format "2006-11-03".
After running df["RankingDate"]=pd.to_datetime(df["RankingDate"]), the data type becomes '<M8[ns]'
I then plot multiple lines over time using seaborn:
f, ax = plt.subplots(figsize=(16, 8))
sns.tsplot(data, time='RankingDate', unit='Dummy', condition='Player', value='Points', ax=ax)
However, this gives me a chart where the date axis is labelled in nanoseconds (values on the order of 1e18) instead of a readable date format like "2006-11-03".
How can I get seaborn to display a date instead of nanoseconds?
Example code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
RankingDate = ['2015-03-02','2015-03-03','2015-03-04','2015-03-05','2015-03-06']
Player = ['Player1','Player2','Player2','Player1','Player1']
Points = np.random.randn(5)
df = pd.DataFrame({'RankingDate': RankingDate , 'Player': Player, 'Points': Points})
df["RankingDate"]=pd.to_datetime(df["RankingDate"])
df["Dummy"]=0
f, ax = plt.subplots(figsize=(16, 8))
sns.tsplot(df, time='RankingDate', unit='Dummy', condition='Player', value='Points', ax=ax)
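sns.tsplot was deprecated in seaborn 0.9 and has since been removed, so the example above no longer runs on current versions. A sketch of one way to get a readable date axis, swapping in plain matplotlib (one line per player) for tsplot and reusing the example's data; DayLocator and the '%Y-%m-%d' format are choices, not requirements.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line for an interactive window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

RankingDate = ['2015-03-02', '2015-03-03', '2015-03-04', '2015-03-05', '2015-03-06']
Player = ['Player1', 'Player2', 'Player2', 'Player1', 'Player1']
Points = np.random.randn(5)
df = pd.DataFrame({'RankingDate': pd.to_datetime(RankingDate), 'Player': Player, 'Points': Points})

f, ax = plt.subplots(figsize=(16, 8))
# One line per player; datetime64 x values give matplotlib a date axis directly
for player, grp in df.groupby('Player'):
    ax.plot(grp['RankingDate'], grp['Points'], marker='o', label=player)

# One tick per day, printed as e.g. 2015-03-02 rather than a raw number
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax.legend(title='Player')
```

On current seaborn, sns.lineplot(data=df, x='RankingDate', y='Points', hue='Player', ax=ax) is the closest replacement for this call and also accepts a datetime column.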
I have a dataframe with a DatetimeIndex like so:
In [3]: index = pd.date_range('September 1 2014', 'September 1 2015', freq='M')
...: index
Out[3]:
DatetimeIndex(['2014-09-30', '2014-10-31', '2014-11-30', '2014-12-31',
'2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30',
'2015-05-31', '2015-06-30', '2015-07-31', '2015-08-31'],
dtype='datetime64[ns]', freq='M')
Plotting without changing the x tick labels or explicit date formatting yields an x-axis from 0-12.
My figure contains 13 subplots in one column. I'm trying to set the x-axis on the last plot using AutoDateLocator() at the end of the code after all the subplots are plotted:
fig.axes[-1].xaxis.set_major_locator(mdates.AutoDateLocator())
Which returns the following error:
ValueError: ordinal must be >= 1
I tried converting the dataframe index with date2num as suggested here, but it yielded the same result:
df.index = mdates.date2num(df.index.to_pydatetime())
I tried consulting the documentation but I couldn't dig up any other way to make matplotlib recognize the x-axis as dates.
Here is the complete code:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
index = pd.date_range('September 1 2014', 'September 1 2015', freq='M')
data = np.random.random([12, 13])
df = DataFrame(data=data, index=index)
# Draw figure
fig = plt.figure(figsize=(19,20), dpi=72)
fig.suptitle('Chart', fontsize=40, color='#333333')
# Draw charts
for i in range(0, len(df)):
    ax = plt.subplot(len(df),1, i+1)
    # Set ticks and labels
    ax.tick_params(direction='in', length=45, pad=60,colors='0.75')
    ax.yaxis.set_tick_params(size=0)
    ax.set_yticklabels([])
    ax.set_ylim([0, 2])
    plt.ylabel(df.columns[i], rotation=0, labelpad=60, fontsize=20, color='#404040')
    # Remove spines
    sns.despine(ax=ax, bottom=True)
    # Draw baseline, data, and threshold lines
    # Threshold
    ax.axhline(1.6, color='#a0db8e', linestyle='--', label='Threshold')
    # Draw baseline
    ax.axhline(1, color='0.55',label='Enterprise')
    # Plot data
    ax.plot(df.iloc[:, i], color='#14509f',label='Data')
# Subplot spacing
fig.subplots_adjust(hspace=1)
# Hide tick labels and first/last tick lines
plt.setp([a.get_xticklabels() for a in fig.axes[:-1]], visible=False)
plt.setp([a.get_xticklines()[-2:] for a in fig.axes],visible=False)
# Date in x-axis
fig.axes[-1].xaxis.set_major_locator(mdates.AutoDateLocator())
fig.axes[-1].xaxis.set_major_formatter(mdates.DateFormatter('%Y.%m.%d'))
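A likely cause, stated with some hedging: ax.plot(df.iloc[:, i]) receives only y values, and depending on the matplotlib version a single Series may be plotted against 0..11 rather than its DatetimeIndex. AutoDateLocator then reads those small numbers as dates at the very start of matplotlib's date range, which would explain both the 0 to 12 axis and "ValueError: ordinal must be >= 1". The sketch below passes the index explicitly. It also builds one subplot per column (13) instead of one per row (len(df) is 12), which seems to match the 13 subplots described, and it swaps the manual plt.subplot calls for plt.subplots(sharex=True). pd.offsets.MonthEnd() stands in for freq='M', whose alias is deprecated in recent pandas.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line for an interactive window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape as the question: 12 month-end dates, 13 random series
index = pd.date_range('2014-09-30', periods=12, freq=pd.offsets.MonthEnd())
df = pd.DataFrame(np.random.random([12, 13]), index=index)

# One subplot per column; sharex gives every panel the same date axis
fig, axes = plt.subplots(nrows=df.shape[1], ncols=1, sharex=True, figsize=(19, 20))
for ax, col in zip(axes, df.columns):
    # Pass the DatetimeIndex explicitly so the x data are dates, not 0..11
    ax.plot(df.index, df[col], color='#14509f')
    ax.axhline(1, color='0.55')
    ax.set_ylim(0, 2)

axes[-1].xaxis.set_major_locator(mdates.AutoDateLocator())
axes[-1].xaxis.set_major_formatter(mdates.DateFormatter('%Y.%m.%d'))
fig.canvas.draw()  # render once so the locator and formatter actually run
```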