Connecting pairs of dots on a jittered scatterplot with lines (Python)

I have two groups of points, but they also overlap, so I need to add jitter if I plot them with a scatterplot. I also want to connect matching points from each group (they all have a pair).
There are many questions that suggest:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = [['abc', 'pre', 10], ['abc', 'post', 5], ['bce', 'pre', 10], ['bce', 'post', 5], ['cef', 'pre', 8], ['cef', 'post', 5]]
df = pd.DataFrame(data, columns=['ID', 'time', 'value'])
grouped = df.groupby('ID')
for name, group in grouped:
    sns.scatterplot(x='time', y='value', data=group, color='#3C74BC')
    sns.lineplot(x='time', y='value', data=group, color='#3C74BC')
plt.show()
It works OK, but it has no jitter. If I add jitter via sns.stripplot(), the lines no longer connect the dots; they start and end at arbitrary places.

The approach below makes the following changes:
Convert the time to numeric (0 for 'pre' and 1 for 'post') via (df['time'] != 'pre').astype(float).
Add random jitter to these values: + np.random.uniform(-0.1, 0.1, len(df)). Depending on how many values you have, you might change 0.1 to a larger value.
Use sns.lineplot with a marker, so a separate scatterplot isn't needed.
Use hue='ID' to draw everything in one go.
Because hue ignores color=, use palette= with as many colors as there are distinct hue values.
Suppress the legend, as all hue values have the same color.
Assign the tick labels 'pre' and 'post' to positions 0 and 1.
Set xlim so that the tick labels are at equal distances from the respective borders.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
data = [['abc', 'pre', 10], ['abc', 'post', 5], ['bce', 'pre', 10], ['bce', 'post', 5], ['cef', 'pre', 8], ['cef', 'post', 5]]
df = pd.DataFrame(data, columns=['ID', 'time', 'value'])
df['time'] = (df['time'] != 'pre').astype(float) + np.random.uniform(-0.1, 0.1, len(df))
ax = sns.lineplot(x='time', y='value', data=df, hue='ID', marker='o',
                  palette=['#3C74BC'] * len(df['ID'].unique()), legend=False)
ax.set_xticks([0, 1], ['pre', 'post'])
ax.set_xlim(-0.2, 1.2)
plt.show()
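The same idea can also be sketched with plain matplotlib, which makes the pairing explicit: one line per ID, with the jittered x positions stored in the dataframe so that markers and line ends always coincide. This is a minimal sketch, not the answer's method; the column name x and the fixed seed are assumptions added here for reproducibility.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = [['abc', 'pre', 10], ['abc', 'post', 5], ['bce', 'pre', 10],
        ['bce', 'post', 5], ['cef', 'pre', 8], ['cef', 'post', 5]]
df = pd.DataFrame(data, columns=['ID', 'time', 'value'])

# Fixed seed (an assumption) so the jitter is the same on every run
rng = np.random.default_rng(0)
# 'x' is a hypothetical helper column: 0 for 'pre', 1 for 'post', plus jitter
df['x'] = df['time'].map({'pre': 0, 'post': 1}) + rng.uniform(-0.1, 0.1, len(df))

fig, ax = plt.subplots()
for _, pair in df.groupby('ID'):
    pair = pair.sort_values('x')
    # One call draws both the markers and the connecting line for this ID
    ax.plot(pair['x'], pair['value'], marker='o', color='#3C74BC')

ax.set_xticks([0, 1])
ax.set_xticklabels(['pre', 'post'])
ax.set_xlim(-0.2, 1.2)
```

Call plt.show() or fig.savefig(...) afterwards to view the result.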

Related

Plot subplots inside subplots matplotlib

Context: I'd like to plot multiple subplots (separated by legend) based on patterns in the column names of a dataframe, all inside one subplot. However, I haven't been able to split each subplot into another set of subplots.
This is what I have:
import matplotlib.pyplot as plt
col_patterns = ['pattern1','pattern2']
# define subplot grid
fig, axs = plt.subplots(nrows=len(col_patterns), ncols=1, figsize=(30, 80))
plt.subplots_adjust()
fig.suptitle("Title", fontsize=18, y=0.95)
for col_pat, ax in zip(col_patterns, axs.ravel()):
    col_pat_columns = [col for col in df.columns if col_pat in col]
    df[col_pat_columns].plot(x='Week', ax=ax)
    # chart formatting
    ax.set_title(col_pat.upper())
    ax.set_xlabel("")
Which results in something like this:
How could I turn each of those subplots into another 6 subplots, all laid out horizontally? (i.e. each legend entry would be its own subplot)
Thank you!
In your example, you're defining a 2x1 subplot and only looping through two axes objects that get created. In each of the two loops, when you call df[col_pat_columns].plot(x='Week',ax=ax), since col_pat_columns is a list and you're passing it to df, you're just plotting multiple columns from your dataframe. That's why it's multiple series on a single plot.
@fdireito is correct: you just need to set the ncols argument of plt.subplots() to the number you need, and adjust your loops to match.
If you want to stay in matplotlib, then here's a basic example. I had to take some guesses as to how your dataframe was structured and so on.
# imports
import matplotlib.pyplot as plt
import pandas as pd
# create some fake data
x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'a': [1, 1, 1, 1, 1],  # horizontal line
    'b': [3, 6, 9, 6, 3],  # pyramid
    'c': [4, 8, 12, 16, 20],  # steep line
    'd': [1, 10, 3, 13, 5]  # zig-zag
})
# a list of lists, where each inner list is a set of
# columns we want in the same row of subplots
col_patterns = [['a', 'b', 'c'], ['b', 'c', 'd']]
The following is a simplified example of what your code ends up doing.
fig, axes = plt.subplots(len(col_patterns), 1)
for pat, ax in zip(col_patterns, axes):
    ax.plot(x, df[pat])
2x1 subplot (what you have right now)
I use enumerate() with col_patterns to iterate through the subplot rows, and then use enumerate() with each column name in a given pattern to iterate through the subplot columns.
# the following will size your subplots according to
# - number of different column patterns you want matched (rows)
# - largest number of columns in a given column pattern (columns)
subplot_rows = len(col_patterns)
subplot_cols = max([len(x) for x in col_patterns])
fig, axes = plt.subplots(subplot_rows, subplot_cols)
for nrow, pat in enumerate(col_patterns):
    for ncol, col in enumerate(pat):
        axes[nrow][ncol].plot(x, df[col])
Correctly sized subplot
Here's all the code, with a couple additions I omitted from the code above for simplicity's sake.
import matplotlib.pyplot as plt
import pandas as pd
x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'a': [1, 1, 1, 1, 1],  # horizontal line
    'b': [3, 6, 9, 6, 3],  # pyramid
    'c': [4, 8, 12, 16, 20],  # steep line
    'd': [1, 10, 3, 13, 5]  # zig-zag
})
col_patterns = [['a', 'b', 'c'], ['b', 'c', 'd']]
# what you have now
fig, axes = plt.subplots(len(col_patterns), 1, figsize=(12, 8))
for pat, ax in zip(col_patterns, axes):
    ax.plot(x, df[pat])
    ax.legend(pat, loc='upper left')
# what I think you want
subplot_rows = len(col_patterns)
subplot_cols = max([len(x) for x in col_patterns])
fig, axes = plt.subplots(subplot_rows, subplot_cols, figsize=(16, 8), sharex=True, sharey=True, tight_layout=True)
for nrow, pat in enumerate(col_patterns):
    for ncol, col in enumerate(pat):
        axes[nrow][ncol].plot(x, df[col], label=col)
        axes[nrow][ncol].legend(loc='upper left')
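Because the grid is sized by the longest pattern, shorter patterns leave empty axes behind, and indexing axes[nrow][ncol] fails when there is only one row or one column. A minimal sketch of one way to handle both cases, assuming a hypothetical col_patterns with rows of unequal length: squeeze=False keeps axes two-dimensional, and unused cells are hidden.

```python
import pandas as pd
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'a': [1, 1, 1, 1, 1],
    'b': [3, 6, 9, 6, 3],
    'c': [4, 8, 12, 16, 20],
    'd': [1, 10, 3, 13, 5],
})
# Hypothetical patterns of unequal length, to show the empty-cell case
col_patterns = [['a', 'b', 'c'], ['d']]

n_rows = len(col_patterns)
n_cols = max(len(pat) for pat in col_patterns)
# squeeze=False guarantees a 2D array of Axes, even for a single row or column
fig, axes = plt.subplots(n_rows, n_cols, squeeze=False, sharex=True, sharey=True)

for nrow, pat in enumerate(col_patterns):
    for ncol in range(n_cols):
        ax = axes[nrow, ncol]
        if ncol < len(pat):
            ax.plot(x, df[pat[ncol]], label=pat[ncol])
            ax.legend(loc='upper left')
        else:
            # No column for this cell: hide the unused Axes
            ax.set_visible(False)
```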
Another option you can consider is ditching matplotlib and using Seaborn relplots. There are several examples on that page that should help. If you have your dataframe set up correctly (long or "tidy" format), then to achieve the same as above, your one-liner would look something like this:
# import seaborn as sns
sns.relplot(data=df, kind='line', x=x_vals, y=y_vals, row=col_pattern, col=num_weeks_rolling)
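The long ("tidy") layout mentioned above can be produced with DataFrame.melt. The sketch below is only an illustration: the column names (pattern1_a and so on), the series and pattern helper columns, and the assumption that the pattern is the part of the column name before the underscore are all hypothetical, since the structure of the original dataframe is not known.

```python
import pandas as pd

# Hypothetical wide frame: a 'Week' column plus columns named <pattern>_<suffix>
wide = pd.DataFrame({
    'Week': [1, 2, 3],
    'pattern1_a': [1, 2, 3],
    'pattern1_b': [3, 2, 1],
    'pattern2_a': [5, 5, 5],
})

# One row per (Week, series) pair instead of one column per series
long_df = wide.melt(id_vars='Week', var_name='series', value_name='value')
# Derive the pattern from the column name (assumes the <pattern>_<suffix> naming)
long_df['pattern'] = long_df['series'].str.split('_').str[0]
```

A frame in this shape has one column each for the x values, the y values, and the grouping variables, which is what relplot's x=, y=, row= and col= arguments refer to by name.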

How to make a horizontal stacked histplot based on counts?

I have a df which represents three states (S1, S2, S3) at 3 timepoints (1hr, 2hr and 3hr). I would like to show a stacked bar plot of the states, but the stacks are discontinuous, or at least not cumulative. How can I fix this in Seaborn? It is important that time is on the y-axis and the state counts on the x-axis.
Below is some code.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
data = [[3, 2, 18], [4, 13, 6], [1, 2, 20]]
df = pd.DataFrame(data, columns=['S1', 'S2', 'S3'])
df = df.reset_index().rename(columns={'index': 'Time'})
melt = pd.melt(df, id_vars='Time')
plt.figure()
sns.histplot(data=melt, x='value', y='Time', bins=3, hue='variable', multiple="stack")
EDIT:
This is somewhat what I am looking for, I hope this gives you an idea. Please ignore the difference in the scales between boxes...
If I understand correctly, I think you want to use value as a weight:
sns.histplot(
    data=melt, y='Time', hue='variable', weights='value',
    multiple='stack', shrink=0.8, discrete=True,
)
This is pretty tough in seaborn, as its bar plots don't natively support stacking. You can use either the built-in plot from pandas, or try Plotly Express.
data = [[3, 2, 18],[4, 13, 6], [1, 2, 20]]
df = pd.DataFrame(data, columns = ['S1', 'S2', 'S3'])
df = df.reset_index().rename(columns = {'index':'Time'})
# so your y starts at 1
df.Time+=1
melt = pd.melt(df, id_vars = 'Time')
# so y isn't treated as continuous
melt.Time = melt.Time.astype('str')
Pandas can do it, but getting the labels in there is a bit of a pain. Check around to figure out how to do it.
df.set_index('Time').plot(kind='barh', stacked=True)
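For the labels, one possible route is matplotlib's Axes.bar_label, applied to each bar container that the pandas plot creates. This is a sketch under the assumption that matplotlib 3.4 or newer is installed (bar_label does not exist before that); it is not part of the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = [[3, 2, 18], [4, 13, 6], [1, 2, 20]]
df = pd.DataFrame(data, columns=['S1', 'S2', 'S3'])
# Use 1-based time points as the index, which pandas puts on the y-axis for barh
df.index = pd.Index(range(1, len(df) + 1), name='Time')

ax = df.plot(kind='barh', stacked=True)

labels = []
for container in ax.containers:
    # One container per state; label each segment with its count, centered
    labels.extend(ax.bar_label(container, label_type='center'))
ax.set_xlabel('count')
```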
Plotly makes it easier:
import plotly.express as px
px.bar(melt, x='value', y='Time', color='variable', orientation='h', text='value')

Pandas Plot floating bar chart

I am trying to create a bar chart where the upper and lower bound of each bar could be above or below zero. Hence the boxes should "float" depending on the data. I'm also trying to use pandas.plot function as it makes my life way easier in the real application.
The solution I've devised is a horrible kludge and only partially works. Basically I'm running two different bar charts that overlap, with one of the bars being white to "hide" the main bar if necessary. I'm using a mask to mark which bars should be which color. As you can see, this works OK in the "London" and "Paris" example below, but in the "Tokyo" it isn't working because the green bar is "in front" of the white bar.
I could manually fix this a few ways that I can think of, but it would make an already kludgy solution even worse. I'm sure there's a better way that I'm just not smart enough to think of!
Here's the plot, and full code below.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df_data = {'Category': ['London', 'Paris', 'New York', 'Tokyo'],
           'Upper': [10, 5, 0, -5],
           'Lower': [5, -5, -10, -10]}
df = pd.DataFrame(data = df_data)
#Color corrector
u_mask = df['Upper'] < 0
d_mask = df['Lower'] < 0
n = len(df)
uca = ['darkgreen' for i in range(n)]
uca = np.array(uca)
uc = uca.copy()
uc[u_mask] = 'white'
dca = ['white' for i in range(n)]
dca = np.array(dca, dtype=uca.dtype)
dc = dca.copy()
dc[d_mask] = 'darkgreen'
(df.plot(kind='bar', y='Upper', x='Category',
         color=uc, legend=False))
ax = plt.gca()
(df.plot(kind='bar', y='Lower', x='Category',
         color=dc, legend=False, ax=ax))
plt.axhline(0, color='black')
x_axis = ax.xaxis
x_axis.label.set_visible(False)
plt.subplots_adjust(left=0.1,right=0.90,bottom=0.2,top=0.90)
plt.show()
To create the plot via pandas, you could create an extra column with the height, and then use df.plot(..., y='Height', bottom=df['Lower']):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df_data = {'Category': ['London', 'Paris', 'New York', 'Tokyo'],
           'Upper': [10, 5, 0, -5],
           'Lower': [5, -5, -10, -10]}
df = pd.DataFrame(data=df_data)
df['Height'] = df['Upper'] - df['Lower']
ax = df.plot(kind='bar', y='Height', x='Category', bottom=df['Lower'],
             color='darkgreen', legend=False)
ax.axhline(0, color='black')
plt.tight_layout()
plt.show()
PS: Note that the pandas bar plot forces the lower ylim to be "sticky". This is the desired behavior when all values are positive and the bars stand firmly on y=0. However, it is distracting when both positive and negative values are involved.
To remove the stickiness:
ax.use_sticky_edges = False # df.plot() makes the lower ylim sticky
ax.autoscale(enable=True, axis='y')
plt.bar has a bottom parameter. You just need to calculate the heights. Here is a very simple example:
import matplotlib.pyplot as plt
upper = [10, 5, 0, -5]
lower = [5, -5, -10, -10]
height = [u - l for u, l in zip(upper, lower)]
plt.bar(range(len(lower)), height, bottom=lower)
plt.show()
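A slightly fuller sketch of the same bottom-based approach, using numpy arrays so the heights are computed in one subtraction and the city names from the question serve as bar positions. The variable names here are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = ['London', 'Paris', 'New York', 'Tokyo']
upper = np.array([10, 5, 0, -5])
lower = np.array([5, -5, -10, -10])

fig, ax = plt.subplots()
# Each bar starts at its lower bound and extends up by (upper - lower)
bars = ax.bar(categories, upper - lower, bottom=lower, color='darkgreen')
ax.axhline(0, color='black')
```

Each bar then spans exactly from its lower to its upper bound, whether the bounds are above zero, below zero, or on either side of it.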

Show all lines in matplotlib line plot

How do I bring the other line to the front or show both the graphs together?
plot_yield_df.plot(figsize=(20,20))
If the plotted data overlaps, one way to see both lines is to increase the linewidth of one of them and make it semi-transparent, as shown:
import numpy as np
import matplotlib.pyplot as plt
plt.plot(np.arange(5), [5, 8, 6, 9, 4], label='Original', linewidth=5, alpha=0.5)
plt.plot(np.arange(5), [5, 8, 6, 9, 4], label='Predicted')
plt.legend()
Using subplots is another good option.
Problem
The lines are plotted in the order their columns appear in the dataframe. So for example
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
a = np.random.rand(400)*0.9
b = np.random.rand(400)+1
a = np.c_[a,-a].flatten()
b = np.c_[b,-b].flatten()
df = pd.DataFrame({"A" : a, "B" : b})
df.plot()
plt.show()
Here the values of "B" hide those from "A".
Solution 1: Reverse column order
A solution is to reverse their order
df[df.columns[::-1]].plot()
That has also changed the order in the legend and the color coding.
Solution 2: Reverse z-order
So if that is not desired, you can instead play with the zorder.
ax = df.plot()
lines = ax.get_lines()
for line, j in zip(lines, list(range(len(lines)))[::-1]):
    line.set_zorder(j)
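If only one particular line needs to come to the front, a variation on the z-order idea is to pick it by its label and raise it above all the others, leaving colors and legend order untouched. A minimal sketch, assuming the lines carry the column names as labels (which is how pandas labels them); the name front and the fixed seed are additions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
a = rng.random(400) * 0.9
b = rng.random(400) + 1
df = pd.DataFrame({"A": np.c_[a, -a].flatten(), "B": np.c_[b, -b].flatten()})

ax = df.plot()
front = "A"  # hypothetical choice: the column to draw on top

lines = ax.get_lines()
top = max(line.get_zorder() for line in lines)
for line in lines:
    if line.get_label() == front:
        # Raise only this line above every other line
        line.set_zorder(top + 1)
```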

Stack the labels of an axis with matplotlib

I'm creating a bar graph and showing multiple values on the x-axis. By default they are shown in series with a "," separating them, as shown below. Instead of a comma, how could I show the values stacked on top of each other, as drawn on the image below? This would save space on the x-axis to allow for bigger graphs when I want to show multiple values.
import pandas as pd
import matplotlib.pyplot as plt
dfex = pd.DataFrame({'City': ['LA', 'SF', 'Dallas'],
                     'Lakes': [3, 9, 6],
                     'Rivers': [1, 0, 0],
                     'State': ['CA', 'CA', 'TX'],
                     'Waterfalls': [2, 4, 5]})
myplot = dfex.plot(x=['City', 'State'], kind='bar', stacked=True)
You can simply hack the x-axis tick labels to achieve what you want.
ticks = myplot.xaxis.get_ticklabels()
new_ticks = ['\n'.join(t.get_text()[1:-1].split(', ')) for t in ticks]
myplot.xaxis.set_ticklabels(new_ticks)
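An alternative that avoids parsing the tick text is to build the stacked labels straight from the dataframe columns and set them on the axis. This is a sketch of that variation, not the answer's method; it plots against City only and lists the value columns explicitly so that State is used purely for labeling.

```python
import pandas as pd
import matplotlib.pyplot as plt

dfex = pd.DataFrame({'City': ['LA', 'SF', 'Dallas'],
                     'Lakes': [3, 9, 6],
                     'Rivers': [1, 0, 0],
                     'State': ['CA', 'CA', 'TX'],
                     'Waterfalls': [2, 4, 5]})

# One label per bar: city on the first line, state on the second
stacked_labels = [f"{city}\n{state}"
                  for city, state in zip(dfex['City'], dfex['State'])]

ax = dfex.plot(x='City', y=['Lakes', 'Rivers', 'Waterfalls'],
               kind='bar', stacked=True)
# Bar plots place one tick per row, so the labels line up with the bars
ax.set_xticklabels(stacked_labels, rotation=0)
```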
