Seaborn - Timeseries line breaks when hue is added - python

I am having trouble plotting a time series with seaborn. When I add hue to the plot, the line breaks. I can't figure out why the line breaks partway through, or how to stop the date labels from overlapping.
The code to replicate the issue is below:
import pandas as pd
import seaborn as sns
test_data = {'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-06', '2021-01-08', '2021-01-09'],
             'Price': [20, 10, 30, 40, 25, 23, 56],
             'Kind': ['Pre', 'Pre', 'Pre', 'Pre', 'Current', 'Post', 'Post']}
test_df = pd.DataFrame(test_data)
test_df['Date'] = pd.to_datetime(test_df['Date'])
sns.lineplot(data=test_df, x="Date", y="Price", hue='Kind')
How can I fix the line break and the overlapping dates?

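With hue, seaborn draws a separate line for each value of "Kind", so nothing connects the last "Pre" point to the first "Post" point, and "Current" has only one observation, which cannot form a line on its own. Try adding the style and markers arguments so that the isolated point with "Kind" == "Current" is still drawn: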
import seaborn as sns
import matplotlib.pyplot as plt
fig = plt.figure()
sns.lineplot(data=test_df, x="Date", y="Price", style='Kind', markers=['o', 'o', 'o'], hue='Kind')
fig.autofmt_xdate()
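Every point now gets a marker, so the single "Current" observation is visible, and fig.autofmt_xdate() rotates the date labels so they no longer overlap.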

Related

Seaborn Boxplot with jittered outliers

I want a boxplot with jittered outliers: only the outliers should be jittered, not the non-outliers.
Searching the web, you often find a workaround that combines sns.boxplot() and sns.swarmplot().
The problem with that figure is that the outliers are drawn twice. I don't need the red ones; I only need the jittered (green) ones.
The non-outliers are drawn as well, and I don't need those either.
I also have a feature request open upstream about this. As far as my research goes, there is no built-in seaborn solution for it.
This is an MWE reproducing the boxplot shown.
#!/usr/bin/env python3
import random
import pandas
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme()
random.seed(0)
df = pandas.DataFrame({
    'Vals': random.choices(range(200), k=200)})
df_outliers = pandas.DataFrame({
    'Vals': random.choices(range(400, 700), k=20)})
df = pandas.concat([df, df_outliers], axis=0)
flierprops = {
    'marker': 'o',
    'markeredgecolor': 'red',
    'markerfacecolor': 'none'
}
# Usual boxplot
ax = sns.boxplot(y='Vals', data=df, flierprops=flierprops)
# Add jitter with the swarmplot function
ax = sns.swarmplot(y='Vals', data=df, linewidth=.75, color='none', edgecolor='green')
plt.show()
Here is an approach to have jittered outliers. The jitter is similar to sns.stripplot(), not to sns.swarmplot() which uses a rather elaborate spreading algorithm. Basically, all the "line" objects of the subplot are checked whether they have a marker. The x-positions of the "lines" with a marker are moved a bit to create jitter. You might want to vary the amount of jitter, e.g. when you are working with hue.
import random
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme()
random.seed(0)
df = pd.DataFrame({
    'Vals': random.choices(range(200), k=200)})
df_outliers = pd.DataFrame({
    'Vals': random.choices(range(400, 700), k=20)})
df = pd.concat([df, df_outliers], axis=0)
flierprops = {
    'marker': 'o',
    'markeredgecolor': 'red',
    'markerfacecolor': 'none'
}
# Usual boxplot
ax = sns.boxplot(y='Vals', data=df, flierprops=flierprops)
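for l in ax.lines:
    # In this plot, only the outlier (flier) lines are drawn with a marker
    if l.get_marker() not in ('', 'None', None):
        # Shift each outlier horizontally by a random amount to create jitter
        xs = l.get_xdata() + np.random.uniform(-0.2, 0.2, len(l.get_xdata()))
        l.set_xdata(xs)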
plt.tight_layout()
plt.show()
An alternative approach could be to filter out the outliers, and then call sns.swarmplot() or sns.stripplot() only with those points. As seaborn doesn't return the values calculated to position the whiskers, you might need to calculate those again via scipy, taking into account seaborn's filtering on x and on hue.
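The filtering approach can be sketched as follows, assuming the default whisker rule of 1.5 times the interquartile range and no x or hue grouping. The helper outlier_mask is a hypothetical name introduced here, and pandas quantiles are used in place of scipy. The boxplot is drawn without its own outlier markers, and sns.stripplot() then draws only the filtered points.

```python
import random
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def outlier_mask(values, whis=1.5):
    """Return a boolean mask of values lying more than whis * IQR outside the quartiles."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - whis * iqr) | (values > q3 + whis * iqr)


random.seed(0)
df = pd.DataFrame({'Vals': random.choices(range(200), k=200)
                           + random.choices(range(400, 700), k=20)})

# Keep only the rows that fall outside the whisker range
outliers = df[outlier_mask(df['Vals'])]

# Boxplot without its own outlier markers
ax = sns.boxplot(y='Vals', data=df, showfliers=False)
# Draw only the outliers, with stripplot-style jitter
sns.stripplot(y='Vals', data=outliers, jitter=0.2, color='green', ax=ax)
```

Call plt.show() to display the result. With x or hue groups, the mask would have to be computed per group, for example with groupby, before plotting.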

matplotlib to plotly plot conversion

I want to create an interactive plot in Google Colab. That seems like a complex task with matplotlib, so I would like some help converting this piece of matplotlib code to Plotly.
close = df['A']
fig = plt.figure(figsize = (15,5))
plt.plot(close, color='r', lw=2.)
plt.plot(close, '^', markersize=10, color='m', label = 'signal X', markevery = df_x)
plt.plot(close, 'v', markersize=10, color='k', label = 'signal Y', markevery = df_y)
plt.title('Turtle Agent: total gains %f, total investment %f%%'%(df_A, df_B))
plt.legend()
plt.show()
This uses sample data from the Plotly OHLC examples: https://plotly.com/python/ohlc-charts/
First, create a line trace.
Then add scatter traces based on filters of the data frame, with the required formatting. This is done here as a list comprehension, but it could also be written as inline code.
import pandas as pd
import numpy as np
import plotly.express as px
df = pd.read_csv(
"https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv"
)
df["Date"] = pd.to_datetime(df["Date"])
# make data set more useful for demonstrating this plot
df.loc[df.sample((len(df)//8)*7).index, "direction"] = np.nan
px.line(df, x="Date", y="AAPL.Close").update_traces(line_color="red").add_traces(
    [
        px.scatter(
            df.loc[df["direction"].eq(filter)], x="Date", y="AAPL.Close"
        )
        .update_traces(marker=fmt)
        .data[0]
        for filter, fmt in zip(
            ["Increasing", "Decreasing"],
            [
                {"color": "black", "symbol": "triangle-up", "size": 10},
                {"color": "blue", "symbol": "triangle-down", "size": 10},
            ],
        )
    ]
)

Pandas scatter plot not coloring by column value

I have a simple pandas DataFrame as shown below. I want to create a scatter plot of value on the y-axis, date on the x-axis, and color the points by category. However, coloring the points isn't working.
import pandas as pd
# Create dataframe
df = pd.DataFrame({
    'date': ['2016-01-01', '2016-02-01', '2016-03-01', '2016-01-01', '2016-02-01', '2016-03-01'],
    'category': ['Wholesale', 'Wholesale', 'Wholesale', 'Retail', 'Retail', 'Retail'],
    'value': [50, 60, 65, 55, 62, 70]
})
df['date'] = pd.to_datetime(df['date'])
# Try to plot
df.plot.scatter(x='date', y='value', c='category')
ValueError: 'c' argument must be a mpl color, a sequence of mpl colors or a sequence of numbers, not ['Wholesale' 'Wholesale' 'Wholesale' 'Retail' 'Retail' 'Retail'].
Why am I getting this error? The pandas scatter plot documentation says the argument c can be "A column name or position whose values will be used to color the marker points according to a colormap."
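As the error message says, c needs colors or numbers, not arbitrary strings, so map each category to a color:
df.plot.scatter(x='date', y='value', c=df['category'].map({'Wholesale':'red','Retail':'blue'}))
Alternatively, seaborn can color by a categorical column directly through hue: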
import seaborn as sns
sns.scatterplot(data=df, x='date', y='value', hue='category')
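Or you can loop through df.groupby:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# Draw one scatter per category so each gets its own color and legend entry
for cat, d in df.groupby('category'):
    ax.scatter(x=d['date'], y=d['value'], label=cat)
ax.legend()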
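To stay with df.plot.scatter and a colormap, as the quoted documentation describes, the category can first be encoded as numbers, since a column of plain strings is not accepted. This is a minimal sketch; the column name category_code and the viridis colormap are choices made here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2016-01-01', '2016-02-01', '2016-03-01',
             '2016-01-01', '2016-02-01', '2016-03-01'],
    'category': ['Wholesale', 'Wholesale', 'Wholesale', 'Retail', 'Retail', 'Retail'],
    'value': [50, 60, 65, 55, 62, 70],
})
df['date'] = pd.to_datetime(df['date'])

# Encode each category as an integer code (categories are sorted alphabetically)
df['category_code'] = df['category'].astype('category').cat.codes

# A numeric column name is accepted for c and is mapped through the colormap
ax = df.plot.scatter(x='date', y='value', c='category_code', colormap='viridis')
```

The colorbar then shows the integer codes rather than the category names, which is one reason the seaborn and groupby versions are often more convenient for categorical data.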

How to control white space between bars in seaborn bar plots?

I have the following simple example dataframe:
import pandas as pd
data = [['Alex',25],['Bob',34],['Sofia',26],["Claire",35]]
df = pd.DataFrame(data,columns=['Name','Age'])
df["sex"]=["male","male","female","female"]
I use the following code to draw the bar plot:
import matplotlib.pyplot as plt
import seaborn as sns
age_plot=sns.barplot(data=df,x="Name",y="Age", hue="sex",dodge=False)
age_plot.get_legend().remove()
plt.setp(age_plot.get_xticklabels(), rotation=90)
plt.ylim(0,40)
age_plot.tick_params(labelsize=14)
age_plot.set_ylabel("Age",fontsize=15)
age_plot.set_xlabel("",fontsize=1)
plt.tight_layout()
This produces the following bar plot:
My question: how can I control the whitespace between bars? I want some extra white space between the male (blue) and female (orange) bars.
Output should look like this (poorly edited in MS PPT):
I have found several topics on this for matplotlib (e.g. https://python-graph-gallery.com/5-control-width-and-space-in-barplots/) but not for seaborn. I'd prefer to use seaborn because of the easy functionality to color by hue.
Thanks.
A possibility is to insert an empty bar in the middle:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'Name': ['Alex', 'Bob', 'Sofia', 'Claire'], 'Age': [15, 18, 16, 22], 'Gender': ['M', 'M', 'F', 'F']})
df = pd.concat([df[df.Gender == 'M'], pd.DataFrame({'Name': [''], 'Age': [0], 'Gender': ['M']}), df[df.Gender == 'F']])
age_plot = sns.barplot(data=df, x="Name", y="Age", hue="Gender", dodge=False)
age_plot.get_legend().remove()
plt.setp(age_plot.get_xticklabels(), rotation=90)
plt.ylim(0, 40)
age_plot.tick_params(labelsize=14)
age_plot.tick_params(length=0, axis='x')
age_plot.set_ylabel("Age", fontsize=15)
age_plot.set_xlabel("", fontsize=1)
plt.tight_layout()
plt.show()

Multicolumn plot Seaborn and Matplotlib

I have this table of data:
Date, City, Sales, PSale1, PSale2
Where PSale1, and PSale2 are predicted prices for sales.
The data set looks like this:
Date,City,Sales,PSale1,PSale2
01/01,NYC,200,300,178
01/01,SF,300,140,100
02/01,NYC,400,410,33
02/01,SF,42,24,60
I want to plot this data in seaborn, with the date on the x-axis and PSale1 and PSale2 as bars for each city on the y-axis, but I am not sure how to work with data in this shape.
As far as I know, seaborn doesn't allow plotting two variables on one axis. How should I approach this?
First convert the dataframe to a long format, then plot using seaborn.FacetGrid:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {'Date': ['01/01', '01/01', '02/01', '02/01'],
        'City': ['NYC', 'SF', 'NYC', 'SF'],
        'Sales': [200, 300, 400, 42],
        'PSale1': [300, 140, 410, 24],
        'PSale2': [178, 100, 33, 60]}
df = pd.DataFrame(data)
# convert to long
dfl = df.set_index(['Date', 'City', 'Sales']).stack().reset_index().rename(columns={'level_3': 'pred', 0: 'value'})
# plot
g = sns.FacetGrid(dfl, row='City')
g.map(sns.barplot, 'Date', 'value', 'pred').add_legend()
plt.show()
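The wide-to-long conversion can also be written with DataFrame.melt, and the faceted bar plot with sns.catplot. This is a sketch of an equivalent route, not the answer's original code; it reuses the column names pred and value from above.

```python
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'Date': ['01/01', '01/01', '02/01', '02/01'],
                   'City': ['NYC', 'SF', 'NYC', 'SF'],
                   'Sales': [200, 300, 400, 42],
                   'PSale1': [300, 140, 410, 24],
                   'PSale2': [178, 100, 33, 60]})

# Wide to long: one row per (Date, City, prediction column)
dfl = df.melt(id_vars=['Date', 'City', 'Sales'],
              value_vars=['PSale1', 'PSale2'],
              var_name='pred', value_name='value')

# One row of axes per city, dates on the x-axis, one bar per prediction
g = sns.catplot(data=dfl, kind='bar', x='Date', y='value', hue='pred', row='City')
```

catplot builds the FacetGrid and the legend itself, so no separate map or add_legend call is needed.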
All in one figure
# shape the dataframe
dfc = df.drop(columns=['Sales']).set_index(['Date', 'City']).stack().unstack(level=1)
dfc.columns.name = None
# plot
dfc.plot.bar(stacked=True)
plt.xlabel('Date: Predicted')
plt.show()
