seaborn factorplot: set series order of display in legend - python

In some special cases, Seaborn orders the legend differently from the plotting order:
import pandas as pd
import seaborn as sns
data = {'group': [-2, -1, 0] * 5,
        'x': list(range(5)) * 3,  # list() is needed in Python 3
        'y': range(15)}
df = pd.DataFrame(data)
sns.factorplot(kind='point', x='x', y='y', hue='group', data=df)
While the plotting sequence is [-2, -1, 0], the legend lists the groups in the order [-1, -2, 0].
My current workaround is to disable the legend in factorplot and then add the legend afterwards using matplotlib. Is there a better way?

I think what you're looking for is hue_order = [-2, -1, 0]
df = pd.DataFrame({'group': ['-2', '-1', '0'] * 5, 'x': list(range(5)) * 3, 'y': range(15)})
sns.factorplot(kind='point', x='x', y='y', hue_order=['-2', '-1', '0'], hue='group', data=df)
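In seaborn 0.9, factorplot was renamed to catplot and the old name was deprecated, so on a current installation the same hue_order fix would look roughly like the sketch below. It assumes a reasonably recent seaborn; converting group to strings is my own choice so the levels are unambiguously categorical, and g.legend is the FacetGrid's figure legend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'group': [-2, -1, 0] * 5,
    'x': list(range(5)) * 3,
    'y': range(15),
})
# Treat the groups as categories so hue_order controls both plot and legend order
df['group'] = df['group'].astype(str)

g = sns.catplot(kind='point', x='x', y='y', hue='group',
                hue_order=['-2', '-1', '0'], data=df)
# Read back the legend entries in the order they are displayed
legend_labels = [t.get_text() for t in g.legend.get_texts()]
```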

I just stumbled across this oldish post. The only answer doesn't seem to work for me, but I found a more satisfying way to change the legend order.
Although in your examples the legends come out correctly for me, you can change the order via the add_legend() method:
df = pd.DataFrame({'group': [-2, -1, 0] * 5, 'x': list(range(5)) * 3, 'y': range(15)})
# factorplot returns a FacetGrid (called ax here); suppress its default legend
ax = sns.factorplot(kind='point', x='x', y='y', hue='group', data=df, legend=False)
ax.add_legend(label_order=['0', '-1', '-2'])
And for automated numerical sorting:
# _legend_data is a private FacetGrid attribute and may change between versions
ax.add_legend(label_order=sorted(ax._legend_data.keys(), key=int))
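The matplotlib-only workaround mentioned in the question (draw without a legend, then build one yourself) can be sketched as follows. It assumes the plotted artists carry labels, as in the plain matplotlib example here; reorder_legend is a hypothetical helper name, not a seaborn or matplotlib function.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def reorder_legend(ax, order, **legend_kwargs):
    """Rebuild ax's legend so its entries follow `order` (a list of label strings)."""
    handles, labels = ax.get_legend_handles_labels()
    by_label = dict(zip(labels, handles))  # label -> artist
    ordered = [label for label in order if label in by_label]
    return ax.legend([by_label[label] for label in ordered], ordered, **legend_kwargs)


fig, ax = plt.subplots()
for group in [-2, -1, 0]:
    ax.plot([0, 1], [group, group + 1], marker='o', label=str(group))
leg = reorder_legend(ax, ['0', '-1', '-2'], title='group')
```

Labels missing from `order` are simply left out, so the same helper can also hide entries.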

Related

How can I apply clipping to mark_text() in altair?

I have my plot clipped so it only shows a certain range on the y-axis. I added text to it using this code:
text2 = plot2.mark_text(align='left', dx=5, dy=-8, size=15).encode(text=alt.Text('Accuracy', format=',.2f'))
But the added annotation appears outside the plot, so I need to get rid of it.
In the plot itself, I'm using something like clip=True in mark_line().
You need to set clip=True for the text mark explicitly:
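import altair as alt
import pandas as pd
df = pd.DataFrame({'x': [1, 3], 'y': [1, 4], 'text': ['a', 'b']})
chart = alt.Chart(df).mark_line(clip=True).encode(
    x=alt.X('x', scale=alt.Scale(domain=[0, 2])),
    y='y'
)
# Without clip on the text mark: label 'b' (x=3) is drawn outside the plot area
chart + chart.mark_text().encode(text='text')
# With clip=True on the text mark too: 'b' is hidden
chart + chart.mark_text(clip=True).encode(text='text')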

How to make a horizontal stacked histplot based on counts?

I have a df which represents three states (S1, S2, S3) at 3 timepoints (1hr, 2hr and 3hr). I would like to show a stacked bar plot of the states, but the stacks are discontinuous, or at least not cumulative. How can I fix this in Seaborn? It is important that time is on the y-axis and the state counts are on the x-axis.
Below is some code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = [[3, 2, 18],[4, 13, 6], [1, 2, 20]]
df = pd.DataFrame(data, columns = ['S1', 'S2', 'S3'])
df = df.reset_index().rename(columns = {'index':'Time'})
melt = pd.melt(df, id_vars = 'Time')
plt.figure()
sns.histplot(data = melt,x = 'value', y = 'Time', bins = 3, hue = 'variable', multiple="stack")
EDIT:
This is roughly what I am looking for; I hope it gives you an idea. Please ignore the difference in scale between the boxes...
If I understand correctly, I think you want to use value as a weight:
sns.histplot(
data=melt, y='Time', hue='variable', weights='value',
multiple='stack', shrink=0.8, discrete=True,
)
This is pretty tough in seaborn, as it doesn't natively support stacked bars. You can use either the built-in plot from pandas or try plotly express.
data = [[3, 2, 18],[4, 13, 6], [1, 2, 20]]
df = pd.DataFrame(data, columns = ['S1', 'S2', 'S3'])
df = df.reset_index().rename(columns = {'index':'Time'})
# so your y starts at 1
df.Time+=1
melt = pd.melt(df, id_vars = 'Time')
# so y isn't treated as continuous
melt.Time = melt.Time.astype('str')
Pandas can do it, but getting the labels in there is a bit of a pain. Check around to figure out how to do it.
df.set_index('Time').plot(kind='barh', stacked=True)
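For the pandas route, Matplotlib 3.4 and later provide Axes.bar_label, which takes most of the pain out of labelling the stacks. This sketch uses the question's data and assumes matplotlib >= 3.4; label_type='center' puts each count in the middle of its own segment rather than at the cumulative end.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

data = [[3, 2, 18], [4, 13, 6], [1, 2, 20]]
df = pd.DataFrame(data, columns=['S1', 'S2', 'S3'])
df.index = df.index + 1  # time points 1, 2, 3 on the y-axis
df.index.name = 'Time'

ax = df.plot(kind='barh', stacked=True)
# pandas creates one BarContainer per column (S1, S2, S3); label each segment
labels = [ax.bar_label(container, label_type='center') for container in ax.containers]
ax.set_xlabel('count')
```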
Plotly makes it easier:
import plotly.express as px
px.bar(melt, x='value', y='Time', color='variable', orientation='h', text='value')

Fixed heatmap table with customised colours

I've been racking my brain over this problem. I want to make something like this in Plotly:
This is very common in Excel plots, so I want to see whether it is possible to make it in Plotly for Python.
The idea is to customise the plot, I mean, show exactly what the image above shows; I need it as a background for another plot that I made. So I need to know if it's possible to make something like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors
Index= ['1', '2', '3', '4', '5']
Cols = ['A', 'B', 'C', 'D','E']
data= [[ 0, 0,1, 1,2],[ 0, 1,2, 2,3],[ 1, 2,2, 3,4],[1, 2,3, 4,4],[ 2, 3,4, 4,4]]
df = pd.DataFrame(data, index=Index, columns=Cols)
cmap = colors.ListedColormap(['darkgreen','lightgreen','yellow','orange','red'])
bounds=[0, 1, 2, 3, 4,5]
norm = colors.BoundaryNorm(bounds, cmap.N)
heatmap = plt.pcolor(np.array(data), cmap=cmap, norm=norm)
plt.colorbar(heatmap, ticks=[0, 1, 2, 3,4,5])
plt.show()
And that code gives us this plot:
Sorry to bother you, but by this point I'm completely hopeless haha; I've searched a lot and found nothing.
Thanks so much for reading, any help is appreciated.
I recreated the MPL heatmap you presented in Plotly, although a few issues remain. I used an example from the official reference as a basis.
import plotly.express as px
data= [[ 0, 0, 1, 1, 2],[ 0, 1, 2, 2, 3],[ 1, 2, 2, 3, 4],[1, 2, 3, 4, 4],[2, 3, 4, 4, 4]]
fig = px.imshow(data, color_continuous_scale=["darkgreen","lightgreen","yellow","orange","red"])
fig.update_yaxes(autorange=True)  # px.imshow reverses the y-axis; this puts row 0 at the bottom, as plt.pcolor does
fig.update_layout(
xaxis = dict(
tickmode = 'linear',
tick0 = 0,
dtick = 1
),
autosize=False,
width=500
)
# fig.layout['coloraxis']['colorbar']['x'] = 1.0
fig.update_layout(coloraxis_colorbar=dict(
tickvals=[0,1,2,3,4],
ticktext=[0,1,2,3,4],
x=1.0
))
fig.show()
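One remaining difference: with a plain list of colours, px.imshow builds a continuous scale, so the colour bar is a gradient and any non-integer value would get a blended colour, unlike the five solid bands of the matplotlib BoundaryNorm version. A common workaround is a stepped colorscale in which each colour is repeated at both edges of its band. The sketch below assumes plotly.graph_objects.Heatmap; discrete_colorscale and make_figure are hypothetical helper names, and Plotly is imported inside make_figure so the colorscale helper works on its own.

```python
def discrete_colorscale(colors):
    """Build a stepped Plotly colorscale: each colour fills an equal band of [0, 1].

    Plotly interpolates between consecutive stops, so repeating every colour at
    both edges of its band removes the gradient and gives solid blocks.
    """
    n = len(colors)
    scale = []
    for i, color in enumerate(colors):
        scale.append([i / n, color])
        scale.append([(i + 1) / n, color])
    return scale


def make_figure(data, colors):
    import plotly.graph_objects as go  # imported here: only needed for plotting

    n = len(colors)
    heatmap = go.Heatmap(
        z=data,
        colorscale=discrete_colorscale(colors),
        zmin=-0.5,       # shift the range so integer level k sits in the middle of band k
        zmax=n - 0.5,
        colorbar=dict(tickvals=list(range(n)), ticktext=[str(k) for k in range(n)]),
    )
    return go.Figure(data=[heatmap])


data = [[0, 0, 1, 1, 2], [0, 1, 2, 2, 3], [1, 2, 2, 3, 4], [1, 2, 3, 4, 4], [2, 3, 4, 4, 4]]
colors = ["darkgreen", "lightgreen", "yellow", "orange", "red"]
scale = discrete_colorscale(colors)
# fig = make_figure(data, colors); fig.show()
```

Unlike px.imshow, go.Heatmap draws the first row of z at the bottom by default, which matches plt.pcolor.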
I would recommend using Seaborn colour palettes:
https://seaborn.pydata.org/tutorial/color_palettes.html
And playing around with cmap, vmax, vmin and center, which let you shift the colour tones based on the scale of the data. (It may take a while to get what you want :D)
import seaborn as sns
g = sns.heatmap(data, vmax=6, vmin=0, cmap='Spectral', center=3, yticklabels=True)
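If you want the exact five-colour look from the question rather than a continuous palette, sns.heatmap forwards extra keyword arguments to matplotlib's pcolormesh, so it can take the same ListedColormap and BoundaryNorm. A sketch assuming a reasonably recent seaborn (older versions may also pass vmin/vmax alongside norm, which newer matplotlib rejects).

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import colors

data = [[0, 0, 1, 1, 2], [0, 1, 2, 2, 3], [1, 2, 2, 3, 4], [1, 2, 3, 4, 4], [2, 3, 4, 4, 4]]
cmap = colors.ListedColormap(['darkgreen', 'lightgreen', 'yellow', 'orange', 'red'])
norm = colors.BoundaryNorm([0, 1, 2, 3, 4, 5], cmap.N)  # one solid colour per integer level

fig, ax = plt.subplots()
sns.heatmap(data, cmap=cmap, norm=norm, linewidths=0.5, ax=ax)
ax.invert_yaxis()  # seaborn puts row 0 at the top; flip it to the bottom like plt.pcolor
mesh = ax.collections[0]  # the QuadMesh drawn by pcolormesh
```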

Y-axis values cuts off using seaborn scatter plot

I have an issue plotting a big CSV file whose Y-axis values range from 1 up to 20+ million. There are two problems I am facing right now.
The Y-axis does not show all the values it is supposed to. With the original data it shows values up to 6 million instead of all the data up to 20 million. With the sample (smaller) data below, it shows only the first Y-axis value and no others.
In the legend, since I am using hue and style = name, "name" appears both as the legend title and as an item inside it.
Questions:
Could anyone give me a sample or help me figure out how to show all the Y-axis values? How can I fix it so all the Y-values show up?
How can I get rid of "name" in the legend without losing the shapes and colors of the scatter points?
(Please let me know if any sources exist or if this question was answered in another post, rather than just marking it as a duplicate. Please also let me know if there are any grammar/spelling issues I need to fix. Thank you!)
Below you can find the function I am using to plot the graph and the sample data.
def test_graph(file_name):
    # on_bad_lines='skip' replaces error_bad_lines=False, which pandas 2.0 removed
    data_file = pd.read_csv(file_name, header=None, on_bad_lines='skip', delimiter="|", index_col=False, dtype='unicode')
    data_file.rename(columns={0: 'name',
                              1: 'date',
                              2: 'name3',
                              3: 'name4',
                              4: 'name5',
                              5: 'ID',
                              6: 'counter'}, inplace=True)
    data_file.date = pd.to_datetime(data_file['date'], unit='s')
    norm = plt.Normalize(1, 4)
    cmap = plt.cm.tab10
    df = pd.DataFrame(data_file)
    # Below creates and returns a dictionary of category-point combinations,
    # by cycling over the marker points specified.
    points = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
    mult = len(df['name']) // len(points) + (len(df['name']) % len(points) > 0)
    markers = {key: value for (key, value)
               in zip(df['name'], points * mult)}
    sc = sns.scatterplot(data=df, x=df['date'], y=df['counter'], hue=df['name'], style=df['name'], markers=markers, s=50)
    ax.set_autoscaley_on(True)
    ax.set_title("TEST", size=12, zorder=0)
    plt.legend(title="Names", loc='center left', shadow=True, edgecolor='grey', handletextpad=0.1, bbox_to_anchor=(1, 0.5))
    ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(100))
    plt.xlabel("Dates", fontsize=12, labelpad=7)
    plt.ylabel("Counter", fontsize=12)
    plt.grid(axis='y', color='0.95')
    fig.autofmt_xdate(rotation=30)
# fig and ax are module-level globals that test_graph uses
fig = plt.figure(figsize=(20, 15), dpi=100)
ax = fig.add_subplot(1, 1, 1)
test_graph(file_name)
plt.savefig(graph_results + "/Test.png", dpi=100)
# Prevent the bottom labels from being cut off by manually enlarging the bottom margin
plt.gcf().subplots_adjust(bottom=0.15)
plt.show()
Sample data
namet1|1582334815|ai1|ai1||150|101
namet1|1582392415|ai2|ai2||142|105
namet2|1582882105|pc1|pc1||1|106
namet2|1582594106|pc1|pc1||1|123
namet2|1580592505|pc1|pc1||1|141
namet2|1580909305|pc1|pc1||1|144
namet3|1581974872|ai3|ai3||140|169
namet1|1581211616|ai4|ai4||134|173
namet2|1582550907|pc1|pc1||1|179
namet2|1582608505|pc1|pc1||1|185
namet4|1581355640|ai5|ai5|bcu|180|298466
namet4|1582651641|pc2|pc2||233|298670
namet5|1582406860|ai6|ai6|bcu|179|298977
namet5|1580563661|pc2|pc2||233|299406
namet6|1581283626|qe1|q0/1|Link to btse1/3|51|299990
namet7|1581643672|ai5|ai5|bcu|180|300046
namet4|1581758842|ai6|ai6|bcu|179|300061
namet6|1581298027|qe2|q0/2|Link to btse|52|300064
namet1|1582680415|pc2|pc2||233|300461
namet6|1581744427|pc3|p90|Link to btsi3a4|55|6215663
namet6|1581730026|pc3|p90|Link to btsi3a4|55|6573348
namet6|1582190826|qe2|q0/2|Link to btse|52|6706378
namet6|1582190826|qe1|q0/1|Link to btse1/3|51|6788568
namet1|1581974815|pc2|pc2||233|6895836
namet4|1581974841|pc2|pc2||233|7874504
namet6|1582176427|qe1|q0/1|Link to btse1/3|51|9497687
namet6|1582176427|qe2|q0/2|Link to btse|52|9529133
namet7|1581974872|pc2|pc2||233|9573450
namet6|1582162027|pc3|p90|Link to btsi3a4|55|9819491
namet6|1582190826|pc3|p90|Link to btsi3a4|55|13494946
namet6|1582176427|pc3|p90|Link to btsi3a4|55|19026820
Results I am getting (images not included): big data, small data, and the updated graph.
First of all, some improvements to your post: you are missing the import statements:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker
import seaborn as sns
The line
df = pd.DataFrame(data_file)
is not necessary, since data_file already is a DataFrame. The lines
points = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
mult = len(df['name']) // len(points) + (len(df['name']) % len(points) > 0)
markers = {key:value for (key, value)
in zip(df['name'], points * mult)}
do not cycle through points as you might expect: each row pairs its name with the marker at that row's position, and later rows overwrite earlier ones, so different names can end up sharing a marker. Consider using itertools.cycle instead. Also, setting y-ticks like
ax.yaxis.set_major_locator(ticker.MultipleLocator(100))
at every 100 is far too dense if your data spans values from 0 to 20 million; consider replacing 100 with, say, 1000000.
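A mapping with one marker per unique name, wrapping around when the names outnumber the markers, can be built directly as a dict; sns.scatterplot accepts a dict for markers, keyed by the style levels. A minimal sketch; the names list is a stand-in for df['name'].

```python
import itertools

points = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
names = ['namet1', 'namet2', 'namet1', 'namet3']  # stand-in for df['name']

# Unique names in order of first appearance, each paired with the next marker;
# itertools.cycle restarts at 'o' if there are more names than markers.
unique_names = list(dict.fromkeys(names))
markers = dict(zip(unique_names, itertools.cycle(points)))
```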
I was able to reproduce your first problem. Using df.dtypes I found that the column counter was stored as type object. Adding the line
df['counter']=df['counter'].astype(int)
resolved your first problem for me. I couldn't reproduce your second issue, though. Here is what the resulting plot looks like for me:
Have you tried updating all your packages to the latest version?
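Why astype(int) fixes the first problem: with dtype='unicode', read_csv keeps every column as strings, so counter arrives as text, and the plot most likely treats it as categorical (one position per distinct string) instead of placing it on a numeric scale. A minimal sketch with three rows from the sample data; dtype=str stands in for dtype='unicode', and pd.to_numeric converts like astype(int) but also accepts errors='coerce' if some rows are malformed.

```python
import io

import pandas as pd

# Three rows copied from the sample data in the question
sample = io.StringIO(
    "namet1|1582334815|ai1|ai1||150|101\n"
    "namet4|1581355640|ai5|ai5|bcu|180|298466\n"
    "namet6|1582176427|pc3|p90|Link to btsi3a4|55|19026820\n"
)
df = pd.read_csv(sample, header=None, delimiter="|", dtype=str)
df = df.rename(columns={0: 'name', 1: 'date', 6: 'counter'})

before = df['counter'].dtype  # strings, not numbers
df['counter'] = pd.to_numeric(df['counter'])
after = df['counter'].dtype   # integer
```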
EDIT: as a follow-up to your comment, you can also adjust the number of x-ticks in your plot by replacing 1 in
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
by a higher number, say 10. Incorporating all my suggestions and deleting the seemingly unnecessary function definition, my version of your code looks as follows:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker
import seaborn as sns
import itertools
fig = plt.figure()
ax = fig.add_subplot()
df = pd.read_csv(
    'data.csv',
    header=None,
    on_bad_lines='skip',  # pandas >= 1.3; error_bad_lines=False was removed in pandas 2.0
    delimiter="|",
    index_col=False,
    dtype='unicode')
df.rename(columns={0: 'name',
1: 'date',
2: 'name3',
3: 'name4',
4: 'name5',
5: 'ID',
6: 'counter'}, inplace=True)
df.date = pd.to_datetime(df['date'], unit='s')
df['counter'] = df['counter'].astype(int)
points = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
markers = itertools.cycle(points)
markers = list(itertools.islice(markers, len(df['name'].unique())))
sc = sns.scatterplot(
data = df,
x = 'date',
y = 'counter',
hue = 'name',
style = 'name',
markers = markers,
s = 50)
ax.set_title("TEST", size = 12, zorder=0)
ax.legend(
title = "Names",
loc = 'center left',
shadow = True,
edgecolor = 'grey',
handletextpad = 0.1,
bbox_to_anchor = (1, 0.5))
ax.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1000000))
ax.minorticks_off()
ax.set_xlabel("Dates", fontsize = 12, labelpad = 7)
ax.set_ylabel("Counter", fontsize = 12)
ax.grid(axis='y', color='0.95')
fig.autofmt_xdate(rotation = 30)
plt.gcf().subplots_adjust(bottom=0.15)
plt.show()
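The answer could not reproduce the second issue. On setups where seaborn does add the variable name as an extra, marker-less legend row, one option is to rebuild the legend from get_legend_handles_labels() without that row. A sketch: drop_legend_entries is a hypothetical helper, and the plot is a plain matplotlib stand-in for the seaborn output; in the code above it would replace the ax.legend(...) call, with the same keyword arguments.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def drop_legend_entries(ax, unwanted, **legend_kwargs):
    """Rebuild the legend of `ax` without the entries whose label is in `unwanted`."""
    handles, labels = ax.get_legend_handles_labels()
    kept = [(h, lab) for h, lab in zip(handles, labels) if lab not in unwanted]
    return ax.legend([h for h, _ in kept], [lab for _, lab in kept], **legend_kwargs)


fig, ax = plt.subplots()
# Stand-in for the seaborn plot: an empty "name" row followed by the real entries
ax.plot([], [], linestyle='none', label='name')
ax.plot([0, 1], [0, 1], 'o', label='namet1')
ax.plot([0, 1], [1, 0], 'v', label='namet2')
leg = drop_legend_entries(ax, {'name'}, title='Names', loc='center left', bbox_to_anchor=(1, 0.5))
```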

Python pandas summary table plot

I really can't get to grips with how to plot a summary table of a pandas DataFrame. I'm fairly sure this is not a case for a pivot table; maybe it calls for a transposed way of displaying the data. The best I could find was: Plot table and display Pandas Dataframe
My code attempts are just not getting there:
import pandas as pd
import matplotlib.pyplot as plt
dc = pd.DataFrame({'A' : [1, 2, 3, 4],'B' : [4, 3, 2, 1],'C' : [4, 3, 2, 1]})
data = dc['A'],dc['B'],dc['C']
ax = plt.subplot(111, frame_on=False)
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
cols=["A", "B", "C"]
row_labels=[0]
the_table = plt.table(cellText=data,colWidths = [0.5]*len(cols),rowLabels=row_labels, colLabels=cols,cellLoc = 'center', rowLoc = 'center')
plt.show()
All I would like to do is produce a table plot with A, B, C in the first column and the total and mean next to them (see below). Any help or guidance would be great... feeling really stupid... (excuse the code example, it doesn't include the total and mean yet...)
Total Mean
A x x
B x x
C x x
import pandas as pd
import matplotlib.pyplot as plt
dc = pd.DataFrame({'A' : [1, 2, 3, 4],'B' : [4, 3, 2, 1],'C' : [3, 4, 2, 2]})
plt.plot(dc)
plt.legend(dc.columns)
dcsummary = pd.DataFrame([dc.mean(), dc.sum()],index=['Mean','Total'])
plt.table(cellText=dcsummary.values,colWidths = [0.25]*len(dc.columns),
rowLabels=dcsummary.index,
colLabels=dcsummary.columns,
cellLoc = 'center', rowLoc = 'center',
loc='top')
fig = plt.gcf()
plt.show()
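The answer above puts Mean and Total in rows and A, B, C in columns, which is the transpose of the layout asked for. Transposing the aggregated frame gives A, B, C as rows with Total and Mean as columns. A sketch using the answer's data; drawing the table on its own axes with the frame turned off is my choice.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

dc = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 3, 2, 1], 'C': [3, 4, 2, 2]})

# Rows A, B, C; columns Total and Mean (the layout asked for in the question)
summary = dc.agg(['sum', 'mean']).T
summary.columns = ['Total', 'Mean']

fig, ax = plt.subplots()
ax.axis('off')  # show only the table, no axes frame or ticks
table = ax.table(
    cellText=[[f"{v:g}" for v in row] for row in summary.values],
    rowLabels=list(summary.index),
    colLabels=list(summary.columns),
    cellLoc='center',
    loc='center',
)
```

With colLabels set, row 0 of the table holds the header and the data starts at row 1; row labels sit in column -1.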
Does the DataFrame.describe() function help you?
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html
Sorry, not enough points for comments.
