Plot multiple Y columns + 'hue' scatterplot in Python

Dataframe
df
Sample Type y1 y2 y3 y4
S1 H 1000 135 220 171
S2 H 2900 1560 890 194
S3 P 678 350 127 255
S4 P 179 510 154 275
I want to plot y1, y2, y3 and y4 against Sample as a scatterplot, with hue as Type.
Is there any way to do it in Seaborn?

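Since you want just one plot, you can use sns.scatterplot on the melted (long-format) data; hue then shows Type and the marker style distinguishes y1 to y4:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# or: df = pd.read_csv('yourfile.csv')
df = pd.DataFrame({'Sample': ['S1', 'S2', 'S3', 'S4'], 'Type': ['H', 'H', 'P', 'P'],
                   'y1': [1000, 2900, 678, 179], 'y2': [135, 1560, 350, 510],
                   'y3': [220, 890, 127, 154], 'y4': [171, 194, 255, 275]})
# reshape: one row per (Type, Sample, y column)
df1 = df.melt(['Type', 'Sample'])
# plotting
sns.scatterplot(data=df1, x="Sample", y="value", hue="Type", style="variable")
plt.show()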
If you want one scatter plot per y column, you can use sns.relplot with col="variable":
#some preprocessing
df1 = df.melt(['Type','Sample'])
#plotting
sns.relplot(data=df1, x="Sample", y="value", hue="Type", col="variable", height=2, aspect=1.5)
plt.show()
If you want a 2x2 grid, add col_wrap=2:
df1 = df.melt(['Type','Sample'])
#plotting
sns.relplot(data=df1, x="Sample", y="value", hue="Type", col="variable",col_wrap=2, height=2, aspect=1.5)
plt.show()
If you want the plots stacked in a single column (a 4x1 grid), use col_wrap=1:
df1 = df.melt(['Type','Sample'])
#plotting
sns.relplot(data=df1, x="Sample", y="value", hue="Type", col="variable",col_wrap=1, height=2, aspect=1.5)
plt.show()
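All of these variants rely on the same reshaping step. As a small sketch (the dataframe is typed in by hand from the question), this is what df.melt(['Type', 'Sample']) produces: the four y columns are stacked into a 'variable' column holding the old column name and a 'value' column holding the number, so hue=, style= and col= can refer to 'Type' or 'variable'.

```python
import pandas as pd

# the question's data, typed in by hand
df = pd.DataFrame({
    "Sample": ["S1", "S2", "S3", "S4"],
    "Type": ["H", "H", "P", "P"],
    "y1": [1000, 2900, 678, 179],
    "y2": [135, 1560, 350, 510],
    "y3": [220, 890, 127, 154],
    "y4": [171, 194, 255, 275],
})

# id_vars stay as ordinary columns; y1..y4 are stacked into two new
# columns: 'variable' (the old column name) and 'value' (the number)
df1 = df.melt(id_vars=["Type", "Sample"])
print(df1.head(8))
```

The result has 16 rows (4 samples times 4 y columns): first all y1 rows, then all y2 rows, and so on.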

Related

How to put dataframe labels on a density plot in matplotlib

#dataframe
a=
timestamp count
2021-08-16 20
2021-08-17 60
2021-08-18 35
2021-08-19 1
2021-08-20 0
2021-08-21 1
2021-08-22 50
2021-08-23 36
2021-08-24 68
2021-08-25 125
2021-08-26 54
I applied this code:
a.plot(kind="density")
but this is not what I want.
I want count on the y-axis and timestamp on the x-axis, using a density plot,
just like I can do with plt.bar(a['timestamp'], a['count']).
Or is this not possible with a density plot?
The following code creates a density histogram. The total area sums to 1, supposing each timestamp counts as 1 unit of width. To get the timestamps on the x-axis, they are set as the index. To make the total area sum to 1, all count values are divided by their total sum.
A KDE (kernel density estimate) is calculated from the same data, using each count as the weight of its timestamp's position.
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
from scipy.stats import gaussian_kde
from io import StringIO
a_str = '''timestamp count
2021-08-16 20
2021-08-17 60
2021-08-18 35
2021-08-19 1
2021-08-20 0
2021-08-21 1
2021-08-22 50
2021-08-23 36
2021-08-24 68
2021-08-25 125
2021-08-26 54'''
a = pd.read_csv(StringIO(a_str), sep=r'\s+')  # delim_whitespace=True is deprecated in recent pandas
ax = (a.set_index('timestamp') / a['count'].sum()).plot.bar(width=0.9, rot=0, figsize=(12, 5))
kde = gaussian_kde(np.arange(len(a)), bw_method=0.2, weights=a['count'])
xs = np.linspace(-1, len(a), 200)
ax.plot(xs, kde(xs), lw=2, color='crimson', label='kde')
ax.set_xlim(xs[0], xs[-1])
ax.legend(labels=['kde', 'density histogram'])
ax.set_xlabel('')
ax.set_ylabel('density')
plt.tight_layout()
plt.show()
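A quick numerical check of the claim that both the bars and the KDE have a total area of 1, under the same assumption that each timestamp is one unit wide. This sketch uses the counts from the question and scipy's gaussian_kde.integrate_box_1d; it only computes areas and does no plotting.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# counts from the question, one bar per timestamp, each bar 1 unit wide
counts = pd.Series([20, 60, 35, 1, 0, 1, 50, 36, 68, 125, 54])
positions = np.arange(len(counts))

# bar heights of the density histogram: each count divided by the total
heights = counts / counts.sum()
bar_area = float((heights * 1.0).sum())  # height times bar width 1

# weighted KDE as in the answer; scipy normalizes the weights internally
kde = gaussian_kde(positions, bw_method=0.2, weights=counts)
kde_area = kde.integrate_box_1d(-np.inf, np.inf)

# area inside the plotted range xs = [-1, len(counts)]
kde_area_plotted = kde.integrate_box_1d(-1, len(counts))
print(bar_area, kde_area, kde_area_plotted)
```

The KDE integrates to 1 over the whole real line but to slightly less over the plotted range, because part of the Gaussian tails around the first and last dates falls outside it.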
If you just want to plot the KDE curve, you can leave out the histogram. Optionally, you can fill the area under the curve.
fig, ax = plt.subplots(figsize=(12, 5))
kde = gaussian_kde(np.arange(len(a)), bw_method=0.2, weights=a['count'])
xs = np.linspace(-1, len(a), 200)
# plot the kde curve
ax.plot(xs, kde(xs), lw=2, color='crimson', label='kernel density estimation')
# optionally fill the area below the curve
ax.fill_between(xs, kde(xs), color='crimson', alpha=0.2)
ax.set_xticks(np.arange(len(a)))
ax.set_xticklabels(a['timestamp'])
ax.set_xlim(xs[0], xs[-1])
ax.set_ylim(ymin=0)
ax.legend()
ax.set_xlabel('')
ax.set_ylabel('density')
plt.tight_layout()
plt.show()
To plot multiple similar curves, for example using more count columns, you can use a loop. A list of colors that go well together could be obtained from the Set2 colormap:
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
from scipy.stats import gaussian_kde
a = pd.DataFrame({'timestamp': ['2021-08-16', '2021-08-17', '2021-08-18', '2021-08-19', '2021-08-20', '2021-08-21',
'2021-08-22', '2021-08-23', '2021-08-24', '2021-08-25', '2021-08-26']})
for i in range(1, 5):
    a[f'count{i}'] = (np.random.uniform(0, 12, len(a)) ** 2).astype(int)
xs = np.linspace(-1, len(a), 200)
fig, ax = plt.subplots(figsize=(12, 4))
for column, color in zip(a.columns[1:], plt.cm.Set2.colors):
    kde = gaussian_kde(np.arange(len(a)), bw_method=0.2, weights=a[column])
    ax.plot(xs, kde(xs), lw=2, color=color, label=f"kde of '{column}'")
    ax.fill_between(xs, kde(xs), color=color, alpha=0.2)
ax.set_xticks(np.arange(len(a)))
ax.set_xticklabels(a['timestamp'])
ax.set_xlim(xs[0], xs[-1])
ax.set_ylim(ymin=0)
ax.legend()
ax.set_xlabel('Date')
ax.set_ylabel('Density of Counts')
plt.tight_layout()
plt.show()

Set color-palette in Seaborn Grouped Barplot depending on values

I have a dataframe with positive and negative values for three kinds of variables.
labels variable value
0 -10e5 nat -38
1 2e5 nat 50
2 10e5 nat 16
3 -10e5 agr -24
4 2e5 agr 35
5 10e5 agr 26
6 -10e5 art -11
7 2e5 art 43
8 10e5 art 20
When the values are negative, I want the barplot to follow the color sequence:
n_palette = ["#ff0000","#ff0000","#00ff00"]
When they are positive, I want it to reverse the palette:
p_palette = ["#00ff00","#00ff00","#ff0000"]
I've tried this:
palette = ["#ff0000","#ff0000","#00ff00",
"#00ff00","#00ff00","#ff00",
"#00ff00","#00ff00","#ff00"]
ax = sns.barplot(x=melted['labels'], y=melted['value'], hue = melted['variable'],
linewidth=1,
palette=palette)
But I get the following output:
What I'd like is for the first two bars of each group to become green and the last one red when the values are positive.
You seem to want to do the coloring depending on a criterion on two columns. It seems suitable to add a new column which uniquely labels that criterion.
Further, seaborn allows the palette to be a dictionary telling exactly which hue label gets which color. Adding barplot(..., order=[...]) would define a fixed order.
Here is some example code:
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
from io import StringIO
data_str = ''' labels variable value
0 -10e5 nat -38
1 2e5 nat 50
2 10e5 nat 16
3 -10e5 agr -24
4 2e5 agr 35
5 10e5 agr 26
6 -10e5 art -11
7 2e5 art 43
8 10e5 art 20
'''
melted = pd.read_csv(StringIO(data_str), sep=r'\s+', dtype={'labels': str})
melted['legend'] = np.where(melted['value'] < 0, '-', '+')
melted['legend'] = melted['variable'] + melted['legend']
palette = {'nat-': "#ff0000", 'agr-': "#ff0000", 'art-': "#00ff00",
'nat+': "#00ff00", 'agr+': "#00ff00", 'art+': "#ff0000"}
ax = sns.barplot(x=melted['labels'], y=melted['value'], hue=melted['legend'],
linewidth=1, palette=palette)
ax.axhline(0, color='black')
plt.show()
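Without the plotting, the dictionary palette is just a lookup from the combined 'legend' label to a color. This sketch rebuilds the question's data directly (instead of parsing the string) and adds a 'color' column, purely for inspection, showing which color each bar should receive.

```python
import numpy as np
import pandas as pd

# the question's data, typed in directly
melted = pd.DataFrame({
    "labels": ["-10e5", "2e5", "10e5"] * 3,
    "variable": ["nat"] * 3 + ["agr"] * 3 + ["art"] * 3,
    "value": [-38, 50, 16, -24, 35, 26, -11, 43, 20],
})

# combined hue label: variable name plus the sign of the value
melted["legend"] = np.where(melted["value"] < 0, "-", "+")
melted["legend"] = melted["variable"] + melted["legend"]

palette = {"nat-": "#ff0000", "agr-": "#ff0000", "art-": "#00ff00",
           "nat+": "#00ff00", "agr+": "#00ff00", "art+": "#ff0000"}

# inspection only: the color each bar gets from the dictionary palette
melted["color"] = melted["legend"].map(palette)
print(melted)
```

Negative nat and agr values map to red and negative art to green, and the positive ones are reversed, which is the behaviour asked for in the question.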
PS: To remove the legend: ax.legend_.remove(). Or to have a legend with multiple columns: ax.legend(ncol=3).
A different approach, directly with the original dataframe, is to create two bar plots: one for the negative values and one for the positive. For this to work well, it is necessary that the 'labels' column (the x=) is explicitly made categorical. Also adding pd.Categorical(..., categories=['nat', 'agr', 'art']) for the 'variable' column could fix an order.
This will generate a legend with the labels twice with different colors. Depending on what you want, you can remove it or create a more custom legend.
An idea is to add the labels under the positive and on top of the negative bars:
sns.set()
melted = pd.read_csv(StringIO(data_str), sep=r'\s+', dtype={'labels': str})
palette_pos = {'nat': "#00ff00", 'agr': "#00ff00", 'art': "#ff0000"}
palette_neg = {'nat': "#ff0000", 'agr': "#ff0000", 'art': "#00ff00"}
melted['labels'] = pd.Categorical(melted['labels'])
ax = sns.barplot(data=melted[melted['value'] < 0], x='labels', y='value', hue='variable',
linewidth=1, palette=palette_neg)
sns.barplot(data=melted[melted['value'] >= 0], x='labels', y='value', hue='variable',
linewidth=1, palette=palette_pos, ax=ax)
ax.legend_.remove()
ax.axhline(0, color='black')
ax.set_xlabel('')
ax.set_ylabel('')
for bar_container in ax.containers:
    label = bar_container.get_label()
    for p in bar_container:
        x = p.get_x() + p.get_width() / 2
        h = p.get_height()
        if not np.isnan(h):
            ax.text(x, 0, label + '\n\n' if h < 0 else '\n\n' + label, ha='center', va='center')
plt.show()
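Why the categorical 'labels' column matters for the two-barplot approach can be checked with pandas alone: a filtered string column only knows the labels it contains, while a categorical column keeps all its categories after filtering, and seaborn takes the x order of a categorical column from those categories. Note that without an explicit categories= list pandas sorts string categories lexically, so '10e5' comes before '2e5'; the explicit order used below is an assumption about the order you want.

```python
import pandas as pd

melted = pd.DataFrame({
    "labels": ["-10e5", "2e5", "10e5"] * 3,
    "variable": ["nat"] * 3 + ["agr"] * 3 + ["art"] * 3,
    "value": [-38, 50, 16, -24, 35, 26, -11, 43, 20],
})

# with plain strings, the negative subset only knows one x label
neg_labels_plain = melted.loc[melted["value"] < 0, "labels"].unique().tolist()

# without explicit categories, pandas sorts them as strings
default_order = list(pd.Categorical(melted["labels"]).categories)

# explicit categories keep the data order, and every filtered subset
# still carries all three, so both barplot calls use the same x slots
melted["labels"] = pd.Categorical(melted["labels"], categories=["-10e5", "2e5", "10e5"])
neg = melted.loc[melted["value"] < 0, "labels"]
pos = melted.loc[melted["value"] >= 0, "labels"]
print(neg_labels_plain, default_order, list(neg.cat.categories))
```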
Still another option involves sns.catplot() which could be clearer when a lot of data is involved:
sns.set()
melted = pd.read_csv(StringIO(data_str), sep=r'\s+', dtype={'labels': str})
melted['legend'] = np.where(melted['value'] < 0, '-', '+')
melted['legend'] = melted['variable'] + melted['legend']
palette = {'nat-': "#ff0000", 'agr-': "#ff0000", 'art-': "#00ff00",
'nat+': "#00ff00", 'agr+': "#00ff00", 'art+': "#ff0000"}
g = sns.catplot(kind='bar', data=melted, col='labels', y='value', x='legend',
linewidth=1, palette=palette, sharex=False, sharey=True)
for ax in g.axes.flat:
    ax.axhline(0, color='black')
    ax.set_xlabel('')
    ax.set_ylabel('')
plt.show()

How to draw cumulative density plot from pandas?

I have a dataframe:
count_single count_multi column_names
0 11345 7209 e
1 11125 6607 w
2 10421 5105 j
3 9840 4478 r
4 9561 5492 f
5 8317 3937 i
6 7808 3795 l
7 7240 4219 u
8 6915 3854 s
9 6639 2750 n
10 6340 2465 b
11 5627 2834 y
12 4783 2384 c
13 4401 1698 p
14 3305 1753 g
15 3283 1300 o
16 2767 1697 t
17 2453 1276 h
18 2125 1140 a
19 2090 929 q
20 1330 518 d
I want to visualize count_single and count_multi, with column_names as the common axis for both. I am looking for something like this:
What I've tried:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context('paper')
f, ax = plt.subplots(figsize = (6,15))
sns.set_color_codes('pastel')
sns.barplot(x = 'count_single', y = 'column_names', data = df,
label = 'Type_1', color = 'orange', edgecolor = 'w')
sns.set_color_codes('muted')
sns.barplot(x = 'count_multi', y = 'column_names', data = df,
label = 'Type_2', color = 'green', edgecolor = 'w')
ax.legend(ncol = 2, loc = 'lower right')
sns.despine(left = True, bottom = True)
plt.show()
It gives me a plot like this:
How can I visualize these two columns like the expected image?
I really appreciate any help you can provide.
import numpy as np
import matplotlib.pyplot as plt
# df is the dataframe from the question
# instantiate a figure with two rows and one column
fig, axes = plt.subplots(nrows=2, figsize=(10, 5))
# plot the bar plot in the first row
df.set_index('column_names').plot.bar(ax=axes[0], color=['rosybrown', 'tomato'])
# scale each column by dividing it by its sum, then take the cumulative sum to get the cumulative distribution; plot it on the second ax
df.set_index('column_names').apply(lambda x: x/x.sum()).cumsum().plot(ax=axes[1], color=['rosybrown', 'tomato'])
# change the ticks in the first plot
axes[0].set_yticks(np.linspace(0, 12000, 7))  # 7 evenly spaced ticks: 0, 2000, ..., 12000
# adjust the axis labels for the second plot
axes[1].set_xticks(range(len(df)))
axes[1].set_xticklabels(df['column_names'], rotation=90)
plt.tight_layout()
plt.show()
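The cumulative step in the answer can be checked on a few rows of the question's data. Dividing the DataFrame by its column sums is an equivalent, vectorized way of writing the apply(lambda x: x/x.sum()) call; in both cases the curve accumulates in the row order of the dataframe (here sorted by count_single) and ends at 1 on the last row.

```python
import pandas as pd

# first four rows of the question's data
df = pd.DataFrame({
    "count_single": [11345, 11125, 10421, 9840],
    "count_multi": [7209, 6607, 5105, 4478],
    "column_names": ["e", "w", "j", "r"],
})
counts = df.set_index("column_names")

# the answer's version: scale each column by its sum, then accumulate
cdf_apply = counts.apply(lambda x: x / x.sum()).cumsum()

# vectorized equivalent: dividing by a Series aligns on column names
cdf = (counts / counts.sum()).cumsum()
print(cdf)
```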

Plotting multiple lines in the same graph for every different entry in a column

My dataset looks like this:
Town week price sales
A 1 1.1 101
A 2 1.2 303
A 3 1.3 234
B 1 1.2 987
B 2 1.5 213
B 3 3.9 423
C 1 2.4 129
C 2 1.3 238
C 3 1.3 132
Now I need to make a single figure with 3 lines (each representing a different town), where I plot the sales and price per week. I know how to do it when I take the mean over the towns, but I can't figure out how to do it per town.
data = pd.read_excel("data.xlsx")
dfEuroAvg = data[data['Product'] == "Euro"].groupby('Week').mean()
t = np.arange(1, 50, 1)
y3 = dfEuroAvg['Sales']
y4 = dfEuroAvg['Price']
fig, ax2 = plt.subplots()
color = 'tab:green'
ax2.set_xlabel('Week')
ax2.set_ylabel('Sales', color = color)
ax2.plot(t, y3, color = color)
ax2.tick_params(axis = 'y', labelcolor = color)
ax3 = ax2.twinx()
color = 'tab:orange'
ax3.set_ylabel('Price', color=color)
ax3.plot(t, y4, color=color)
ax3.tick_params(axis='y', labelcolor=color)
ax2.set_title("product = Euro, Sales vs. Price")
EDIT: On the X-axis are the weeks and on the Y-axis are the price and sales.
This is one way of doing it: use groupby to form groups based on Town, then plot the price and sales using a secondary y-axis:
fig, ax = plt.subplots(figsize=(8, 6))
df_group = data.groupby('Town')[['week', 'price', 'sales']]
colors = ['r', 'g', 'b']
for i, key in enumerate(df_group.groups.keys()):
    # solid line: price on the left axis
    df_group.get_group(key).plot(x='week', y='price', color=colors[i], ax=ax, label=key)
    # dashed line: sales on the secondary (right) axis
    df_group.get_group(key).plot(x='week', y='sales', color=colors[i], linestyle='--', secondary_y=True, ax=ax)
handles, labels = ax.get_legend_handles_labels()
legends = ax.legend()
legends.remove()
ax.legend(handles, labels)
ax.set_ylabel('Price')
ax.right_ax.set_ylabel('Sales')  # pandas stores the secondary axes as ax.right_ax
You will have to fetch the data for each town separately by filtering the dataframe.
import matplotlib.pyplot as plt
# df = your dataframe with all the data
towns = ['A', 'B', 'C']
for town in towns:
    town_df = df[df['Town'] == town]
    plt.plot(town_df['week'], town_df['price'], label=town)
plt.legend()
plt.xlabel('Week')
plt.ylabel('Price')
plt.title('Price Graph')
plt.show()
I have done this for the price graph; you can create a sales graph the same way by plotting town_df['sales'] instead.
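To get price and sales for every town in a single figure, the filtering loop can be combined with a second y-axis from ax.twinx(). This is a sketch in plain Matplotlib; the colors, the dashed style for sales and the legend labels are illustrative choices, and the data is the sample from the question.

```python
import io
import pandas as pd
import matplotlib.pyplot as plt

u = """Town week price sales
A 1 1.1 101
A 2 1.2 303
A 3 1.3 234
B 1 1.2 987
B 2 1.5 213
B 3 3.9 423
C 1 2.4 129
C 2 1.3 238
C 3 1.3 132"""
df = pd.read_csv(io.StringIO(u), sep=r"\s+")

fig, ax = plt.subplots(figsize=(8, 6))
ax_right = ax.twinx()  # second y-axis sharing the week axis

for (town, town_df), color in zip(df.groupby("Town"), ["r", "g", "b"]):
    # solid line: price on the left axis
    ax.plot(town_df["week"], town_df["price"], color=color, label=f"{town} price")
    # dashed line: sales on the right axis
    ax_right.plot(town_df["week"], town_df["sales"], color=color,
                  linestyle="--", label=f"{town} sales")

ax.set_xlabel("Week")
ax.set_ylabel("Price")
ax_right.set_ylabel("Sales")

# each axes only knows its own lines, so merge both sets into one legend
h1, l1 = ax.get_legend_handles_labels()
h2, l2 = ax_right.get_legend_handles_labels()
ax.legend(h1 + h2, l1 + l2)
```

Call plt.show() afterwards to display the figure. Collecting handles from both axes is needed because legend() on one axes only sees the lines drawn on that axes.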
You may plot the pivoted data directly with pandas.
ax = df.pivot(index="week", columns="Town", values="price").plot()
ax2 = df.pivot(index="week", columns="Town", values="sales").plot(secondary_y=True, ax=ax)
Complete example:
import io
import pandas as pd
import matplotlib.pyplot as plt
u = """Town week price sales
A 1 1.1 101
A 2 1.2 303
A 3 1.3 234
B 1 1.2 987
B 2 1.5 213
B 3 3.9 423
C 1 2.4 129
C 2 1.3 238
C 3 1.3 132"""
df = pd.read_csv(io.StringIO(u), sep=r"\s+")
ax = df.pivot(index="week", columns="Town", values="price").plot(linestyle="--", legend=False)
ax.set_prop_cycle(None)
ax2 = df.pivot(index="week", columns="Town", values="sales").plot(secondary_y=True, ax=ax, legend=False)
ax.set_ylabel('Price')
ax2.set_ylabel('Sales')
ax2.legend()
plt.show()
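The pivot is what makes this work: it turns the long table (one row per town and week) into a wide one with one column per town, and pandas draws one line per column. A minimal sketch of just that step, on the question's data:

```python
import io
import pandas as pd

u = """Town week price sales
A 1 1.1 101
A 2 1.2 303
A 3 1.3 234
B 1 1.2 987
B 2 1.5 213
B 3 3.9 423
C 1 2.4 129
C 2 1.3 238
C 3 1.3 132"""
df = pd.read_csv(io.StringIO(u), sep=r"\s+")

# one row per week, one column per town
price_wide = df.pivot(index="week", columns="Town", values="price")
sales_wide = df.pivot(index="week", columns="Town", values="sales")
print(price_wide)
```

Recent pandas versions require index=, columns= and values= as keyword arguments to pivot, which is why they are spelled out here.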

Plotting graph with categorical axes

I have the following dataframe and want to plot both the max and min data on the same graph, using Month_Day as the x-axis but only labelling 'Jan', 'Feb', 'Mar', etc.:
Month_Day max min
0 Jan-01 243 86
1 Jan-02 230 90
2 Jan-03 233 104
3 Jan-04 220 73
4 Jan-05 224 71
But once I include the dates, I get an error. My code:
dates = pd.date_range('1/1/2015','31/12/2015', freq='D')
plt.plot(tmax, '-r', tmin, '-b')
#plt.plot(dates, tmax, '-r', dates, tmin, '-b') <- this is the line i plot dates as axis
plt.fill_between(range(len(tmin)), tmin, tmax, facecolor='gray', alpha=0.25)
plt.grid(True)
gives the error:
error: ordinal must be >= 1
You could use xaxis.set_major_formatter().
Here's a simple example of this:
import datetime
import random
import matplotlib.pyplot as plt
import matplotlib.dates as mdate
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(days=i) for i in range(180)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
p1 = plt.subplot(211)
p1.xaxis.set_major_formatter(mdate.DateFormatter('%b', None))
# plot
plt.plot(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
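Applied to the question's code, a likely cause of "ordinal must be >= 1" is that fill_between still receives range(len(tmin)) as x while the lines use dates: in Matplotlib versions before 3.3, dates were stored as days counted from 0001-01-01, so an x value of 0 does not correspond to a valid date. The sketch below (plot_min_max is a hypothetical helper; the values are the first rows of the question) uses the same dates for every call, one tick per month via MonthLocator, and DateFormatter('%b') for the month names.

```python
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def plot_min_max(tmax, tmin, year=2015):
    """Plot daily max/min with a shaded band (hypothetical helper)."""
    start = datetime.date(year, 1, 1)
    # one date per value, so x has the same length as tmax and tmin
    dates = [start + datetime.timedelta(days=i) for i in range(len(tmax))]

    fig, ax = plt.subplots()
    ax.plot(dates, tmax, "-r", dates, tmin, "-b")
    # use the same dates here instead of range(len(tmin))
    ax.fill_between(dates, tmin, tmax, facecolor="gray", alpha=0.25)
    # one tick per month, labelled with the abbreviated month name
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
    ax.grid(True)
    return fig, ax


fig, ax = plot_min_max([243, 230, 233, 220, 224], [86, 90, 104, 73, 71])
```

Call plt.show() to display the figure. The '%b' month names follow the current locale.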
