Dataframe Bar plot with Seaborn - python

I'm trying to create a bar plot from a DataFrame with Datetime Index.
This is an example working code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set()
index = pd.date_range('2012-01-01', periods=48, freq='M')
data = np.random.randint(100, size = (len(index),1))
df = pd.DataFrame(index=index, data=data, columns=['numbers'])
fig, ax = plt.subplots()
ax.bar(df.index, df['numbers'])
The result is:
As you can see the white bars cannot be distinguished well with respect of the background (why?).
I tried using instead:
df['numbers'].plot(kind='bar')
import matplotlib.ticker as ticker
ticklabels = df.index.strftime('%Y-%m')
ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))
with this result:
But in this way I lose the automatic xticks labels (and grid) 6-months spacing.
Any idea?

You can just change the style:
import matplotlib.pyplot as plt
index = pd.date_range('2012-01-01', periods=48, freq='M')
data = np.random.randint(100, size = (len(index),1))
df = pd.DataFrame(index=index, data=data, columns=['numbers'])
plt.figure(figsize=(12, 5))
plt.style.use('default')
plt.bar(df.index,df['numbers'],color="red")

You do not actually use seaborn. Replace ax.bar(df.index, df['numbers'])
with
sns.barplot(df.index, df['numbers'], ax=ax)

Related

Bar plot for multidimensional columns using pandas

I want to plot my dataframe (df) as a bar plot based on the time columns, where each bar represents the value counts() for each letter that appears in the column.
Expected output
.
date,00:00:00,01:00:00,02:00:00,03:00:00,04:00:00
2002-02-01,Y,Y,U,N,N
2002-02-02,U,N,N,N,N
2002-02-03,N,N,N,N,N
2002-02-04,N,N,N,N,N
2002-02-05,N,N,N,N,N
When I select individual time columns, I can do as below
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
df = pd.read_csv('df.csv')
df = df['04:00:00'].value_counts()
df.plot(kind='bar')
plt.show()
How can I plot all the columns on the same bar plot as shown on the expected output.
One possible solution is:
pd.DataFrame({t: df[t].value_counts() for t in df.columns if t != "date"}).T.plot.bar()
Here is an approach via seaborn's catplot:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from io import StringIO
df_str = '''date,00:00:00,01:00:00,02:00:00,03:00:00,04:00:00
2002-02-01,Y,Y,U,N,N
2002-02-02,U,N,N,N,N
2002-02-03,N,N,N,N,N
2002-02-04,N,N,N,N,N
2002-02-05,N,N,N,N,N'''
df = pd.read_csv(StringIO(df_str))
df_long = df.set_index('date').melt(var_name='hour', value_name='kind')
g = sns.catplot(kind='count', data=df_long, x='kind', palette='mako',
col='hour', col_wrap=5, height=3, aspect=0.5)
for ax in g.axes.flat:
ax.set_xlabel(ax.get_title()) # use the title as xlabel
ax.grid(True, axis='y')
ax.set_title('')
if len(ax.get_ylabel()) == 0:
sns.despine(ax=ax, left=True) # remove left axis for interior subplots
ax.tick_params(axis='y', size=0)
plt.tight_layout()
plt.show()

Seaborn xaxis with large timeline

I have around 4475 rows of csv data like below:
,Time,Values,Size
0,1900-01-01 23:11:30.368,2,
1,1900-01-01 23:11:30.372,2,
2,1900-01-01 23:11:30.372,2,
3,1900-01-01 23:11:30.372,2,
4,1900-01-01 23:11:30.376,2,
5,1900-01-01 23:11:30.380,,
6,1900-01-01 23:11:30.380,,
7,1900-01-01 23:11:30.380,,
8,1900-01-01 23:11:30.380,,321
9,1900-01-01 23:11:30.380,,111
.
.
4474,1900-01-01 23:11:32.588,,
When I try to create simple seaborn lineplot with below code. It creates line chart but its continuous chart while my data i.e. 'Values' has many empty/nan values which should show as gap on chart. How can I do that?
[from datetime import datetime
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("Data.csv")
sns.set(rc={'figure.figsize':(13,4)})
ax =sns.lineplot(x="Time", y="Values", data=df)
ax.set(xlabel='Time', ylabel='Values')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()]
As reported in this answer:
I've looked at the source code and it looks like lineplot drops nans from the DataFrame before plotting. So unfortunately it's not possible to do it properly.
So, the easiest way to do it is to use matplotlib in place of seaborn.
In the code below I generate a dataframe like your with 20% of missing values in 'Values' column and I use matplotlib to draw a plot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'Time': pd.date_range(start = '1900-01-01 23:11:30', end = '1900-01-01 23:11:30.1', freq = 'L')})
df['Values'] = np.random.randint(low = 2, high = 10, size = len(df))
df['Values'] = df['Values'].mask(np.random.random(df['Values'].shape) < 0.2)
fig, ax = plt.subplots(figsize = (13, 4))
ax.plot(df['Time'], df['Values'])
ax.set(xlabel = 'Time', ylabel = 'Values')
plt.xticks(rotation = 90)
plt.tight_layout()
plt.show()

matplotlib.pyplot x values not displayed as expected

I want the x values to be "yyyy-mm" from 2016-01 to 2020-01 (interval per half year). However, tried many ways the x values still in a mess.
My code is as follows:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
df = pd.read_csv('fcc-forum-pageviews.csv',index_col='date')
df = df[(df['value'] >= df['value'].quantile(0.025)) & (df['value'] <= df['value'].quantile(0.975))]
fig = plt.figure()
fig.set_figwidth(15)
fig.set_figheight(9)
ax = fig.add_subplot(1,1,1)
ax.set_title('Daily freeCodeCamp Forum Page Views 5/2016-12/2019')
ax.set_xlabel('Date')
ax.set_ylabel('Page Views')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax.plot(df['value'],color='r')
The following is the result:
However, what I expected should be:
The dataset can be downloaded from here.Any help is highly appreciated.
After #r-beginners help, I have updated the code as follows:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
df = pd.read_csv('fcc-forum-pageviews.csv',index_col='date')
df = df[(df['value'] >= df['value'].quantile(0.025)) & (df['value'] <= df['value'].quantile(0.975))]
df.index = pd.to_datetime(df.index)
fig = plt.figure()
fig.set_figwidth(15)
fig.set_figheight(9)
ax = fig.add_subplot(1,1,1)
ax.set_title('Daily freeCodeCamp Forum Page Views 5/2016-12/2019')
ax.set_xlabel('Date')
ax.set_ylabel('Page Views')
ax.set_xticks(range(len(df.index)))
ax.set_xticklabels(range(len(df.index)))
months = mdates.MonthLocator(interval=6)
months_fmt = mdates.DateFormatter('%Y-%m')
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(months_fmt)
ax.plot(df['value'], color='r')
now it can display the x values in 6-month interval. However, instead of starting from 2016-07 (the date from the dataframe), it starts from 1970-02. Any hints? Thanks.
Convert the date data to pd.to_datetime()
You can use MonthLocator to control it. The interval is set to 6 months.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
df = pd.read_csv('fcc-forum-pageviews.csv',index_col='date')
df = df[(df['value'] >= df['value'].quantile(0.025)) & (df['value'] <= df['value'].quantile(0.975))]
df.index = pd.to_datetime(df.index)
fig = plt.figure()
fig.set_figwidth(15)
fig.set_figheight(9)
ax = fig.add_subplot(1,1,1)
ax.set_title('Daily freeCodeCamp Forum Page Views 5/2016-12/2019')
ax.set_xlabel('Date')
ax.set_ylabel('Page Views')
# ax.set_xticks(range(len(df.index)))
# ax.set_xticklabels(range(len(df.index)))
# months = mdates.MonthLocator(interval=6)
# months_fmt = mdates.DateFormatter('%Y-%m')
# ax.xaxis.set_major_locator(months)
# ax.xaxis.set_major_formatter(months_fmt)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax.plot(df['value'], color='r')
Please read on Xticks
From the docs
>>> locs, labels = xticks() # Get the current locations and labels.
>>> xticks(np.arange(0, 1, step=0.2)) # Set label locations.
>>> xticks(np.arange(3), ['Tom', 'Dick', 'Sue']) # Set text labels.
>>> xticks([0, 1, 2], ['January', 'February', 'March'],
... rotation=20) # Set text labels and properties.
>>> xticks([]) # Disable xticks.

Seaborn - Display Last Value / Label

I would like create an plot with to display the last value on line. But i can not create the plot with the last value on chart. Do you have an idea for to resolve my problem, thanks you !
Input :
DataFrame
Plot
Output :
Cross = Last Value In columns
Output Final
# import eikon as ek
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import os
import seaborn as sns; sns.set()
import pylab
from scipy import *
from pylab import *
fichier = "P:/GESTION_RPSE/GES - Gestion Epargne Salariale/Dvp Python/Florian/Absolute
Performance/PLOT.csv"
df = pd.read_csv(fichier)
df = df.drop(columns=['Unnamed: 0'])
# sns.set()
plt.figure(figsize=(16, 10))
df = df.melt('Date', var_name='Company', value_name='Value')
#palette = sns.color_palette("husl",12)
ax = sns.lineplot(x="Date", y="Value", hue='Company', data=df).set_title("LaLaLa")
plt.show()
Do you just want to put an 'X' at the end of your lines?
If so, you could pass markerevery=[-1] to the call to lineplot(). However there are a few caveats:
You have to use style= instead of hue= otherwise, there are no markers drawn
Filled markers work better than unfilled markers (like "x"). You can just use markers=True to use the default markers, or pass a list markers=['s','d','o',etc...]
code:
fmri = sns.load_dataset("fmri")
fig, ax = plt.subplots()
ax = sns.lineplot(x="timepoint", y="signal",
style="event", data=fmri, ci=None, markers=True, markevery=[-1], markersize=10)

Matplotlib DateFormatter for axis label not working

I'm trying to adjust the formatting of the date tick labels of the x-axis so that it only shows the Year and Month values. From what I've found online, I have to use mdates.DateFormatter, but it's not taking effect at all with my current code as is. Anyone see where the issue is? (the dates are the index of the pandas Dataframe)
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
fig = plt.figure(figsize = (10,6))
ax = fig.add_subplot(111)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
basicDF['some_column'].plot(ax=ax, kind='bar', rot=75)
ax.xaxis_date()
Reproducible scenario code:
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
rng = pd.date_range('1/1/2014', periods=20, freq='m')
blah = pd.DataFrame(data = np.random.randn(len(rng)), index=rng)
fig = plt.figure(figsize = (10,6))
ax = fig.add_subplot(111)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
blah.plot(ax=ax, kind='bar')
ax.xaxis_date()
Still can't get just the year and month to show up.
If I set the format after .plot , get an error like this:
ValueError: DateFormatter found a value of x=0, which is an illegal date. This usually occurs because you have not informed the axis that it is plotting dates, e.g., with ax.xaxis_date().
It's the same for if I put it before ax.xaxis_date() or after.
pandas just doesn't work well with custom date-time formats.
You need to just use raw matplotlib in cases like this.
import numpy
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas
N = 20
numpy.random.seed(N)
dates = pandas.date_range('1/1/2014', periods=N, freq='m')
df = pandas.DataFrame(
data=numpy.random.randn(N),
index=dates,
columns=['A']
)
fig, ax = plt.subplots(figsize=(10, 6))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax.bar(df.index, df['A'], width=25, align='center')
And that gives me:
Solution with pandas only
You can create nicely formatted ticks by using the DatetimeIndex and taking advantage of the datetime properties of the timestamps. Tick locators and formatters from matplotlib.dates are not necessary for a case like this unless you would want dynamic ticks when using the interactive interface of matplotlib for zooming in and out (more relevant for time ranges longer than in this example).
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
# Create sample time series with month start frequency, plot it with a pandas bar chart
rng = np.random.default_rng(seed=1) # random number generator
dti = pd.date_range('1/1/2014', periods=20, freq='m')
df = pd.DataFrame(data=rng.normal(size=dti.size), index=dti)
ax = df.plot.bar(figsize=(10,4), legend=None)
# Set major ticks and tick labels
ax.set_xticks(range(df.index.size))
ax.set_xticklabels([ts.strftime('%b\n%Y') if ts.year != df.index[idx-1].year
else ts.strftime('%b') for idx, ts in enumerate(df.index)])
ax.figure.autofmt_xdate(rotation=0, ha='center');
The accepted answer claims that "pandas won't work well with custom date-time formats", but you can make use of pandas' to_datetime() function to use your existing datetime Series in the dataframe:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import pandas as pd
rng = pd.date_range('1/1/2014', periods=20, freq='m')
blah = pd.DataFrame(data = np.random.randn(len(rng)), index=pd.to_datetime(rng))
fig, ax = plt.subplots()
ax.xaxis.set_major_formatter(DateFormatter('%m-%Y'))
ax.bar(blah.index, blah[0], width=25, align='center')
Will result in:
You can see the different available formats here.
I stepped into the same problem and I used an workaround to transform the index from date time format into the desired string format:
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
rng = pd.date_range('1/1/2014', periods=20, freq='m')
blah = pd.DataFrame(data = np.random.randn(len(rng)), index=rng)
fig = plt.figure(figsize = (10,6))
ax = fig.add_subplot(111)
# transform index to strings
blah_test = blah.copy()
str_index = []
for s_year,s_month in zip(blah.index.year.values,blah.index.month.values):
# build string accorind to format "%Y-%m"
string_day = '{}-{:02d}'.format(s_year,s_month)
str_index.append(string_day)
blah_test.index = str_index
blah_test.plot(ax=ax, kind='bar', rot=45)
plt.show()
which results in the following figure:

Categories

Resources