Create a carpetplot with discrete values - python

I'd like to create a carpetplot with discrete values. For example I have these values:
import pandas as pd
import pylab as plt
df_data = pd.DataFrame(
[[1, 2, 1], [1, 1, 3], [2, 2, 5], [3, 2, 1]], index=['n1', 'n2', 'n3', 'n4'], columns=['var1', 'var2', 'var3'])
I have also a dictionary to match these discrete values to a color:
matcher_dict = {
1: (236, 99, 92), 2: (75, 129, 196), 3: (244, 153, 97), 5: (135, 104, 180)}
I'd now like to create a carpet plot, and I thought imshow could be a way to work this out, as the documentation for imshow says
cmap : Colormap, optional, default: None
If None, default to rc image.cmap value. cmap is ignored when X has RGB(A) information
So I create a new Dataframe with the colors as entries:
df_color = pd.DataFrame(index=df_data.index, columns=df_data.columns)
for col_index, col in enumerate(df_data.iteritems()):
    for row_index, value in enumerate(col[1]):
        df_color.ix[row_index].ix[col_index] = matcher_dict[
            df_data.ix[row_index].ix[col_index]]
Now I expect this to work:
fig,ax = plt.subplots()
im = ax.imshow(
df_color.values, interpolation='nearest', aspect='auto')
But all I get is a TypeError: Image data can not convert to float.
The result I expected (which I was able to create with terribly inefficient code) should look like this: [image: expected result of the carpet plot]
EDIT: If I use df_data.values directly (instead of df_color.values), it creates a plot using the default colormap. Is it possible to create a discrete colormap? (I didn't completely understand the colormap concepts from reading matplotlib's documentation.)

I found a solution to my problem. As assumed, a discrete colormap does the trick. How to create one is described in a SciPy cookbook; search for discrete_cmap.
So my working code would be this:
import pandas as pd
import pylab as plt
df_data = pd.DataFrame(
[[1, 2, 1], [1, 1, 3], [2, 2, 5], [3, 2, 1]], index=['n1', 'n2', 'n3', 'n4'], columns=['var1', 'var2', 'var3'])
cpool = ['#EC635C', '#4B81C4', '#F49961', '#B45955',
'#8768B4']
cmap3 = plt.matplotlib.colors.ListedColormap(cpool[0:5], 'indexed')
fig, ax = plt.subplots()
im = ax.imshow(
df_data.values, cmap=cmap3, interpolation='nearest', aspect='auto')
plt.colorbar(mappable=im)
plt.show()
The axis labels still need some fiddling, but it basically works.
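A possible refinement, sketched under my own assumptions rather than taken from the cookbook: the colormap can be built directly from matcher_dict, and a BoundaryNorm can pin each discrete value to exactly one colour, so no filler colour is needed for the unused value 4. The bin edges (midpoints between neighbouring values) are my choice. The last lines show the RGB route the question was attempting: imshow accepts an (M, N, 3) numeric array, whereas df_color.values is an object array of tuples, which is most likely what triggers the TypeError.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm, ListedColormap

df_data = pd.DataFrame(
    [[1, 2, 1], [1, 1, 3], [2, 2, 5], [3, 2, 1]],
    index=['n1', 'n2', 'n3', 'n4'], columns=['var1', 'var2', 'var3'])
matcher_dict = {
    1: (236, 99, 92), 2: (75, 129, 196), 3: (244, 153, 97), 5: (135, 104, 180)}

# One colour per discrete value, scaled from 0-255 to matplotlib's 0-1 range.
values = sorted(matcher_dict)
colors = [tuple(channel / 255 for channel in matcher_dict[v]) for v in values]
cmap = ListedColormap(colors)

# Bin edges halfway between neighbouring values (an assumption of this sketch),
# so every value falls into exactly one bin.
bounds = ([values[0] - 0.5]
          + [(a + b) / 2 for a, b in zip(values, values[1:])]
          + [values[-1] + 0.5])
norm = BoundaryNorm(bounds, cmap.N)

fig, ax = plt.subplots()
im = ax.imshow(df_data.values, cmap=cmap, norm=norm,
               interpolation='nearest', aspect='auto')
# Label the axes with the DataFrame's column and index names.
ax.set_xticks(range(len(df_data.columns)))
ax.set_xticklabels(df_data.columns)
ax.set_yticks(range(len(df_data.index)))
ax.set_yticklabels(df_data.index)
fig.colorbar(im, ticks=values)

# Alternative: a numeric (rows, cols, 3) RGB array can be passed to imshow
# directly, without any colormap.
rgb = np.array([[matcher_dict[v] for v in row] for row in df_data.values],
               dtype=np.uint8)
```

Call plt.show() to display the figure; ax.imshow(rgb, interpolation='nearest', aspect='auto') should give the same picture without a colorbar.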

Related

Plot subplots inside subplots matplotlib

Context: I'd like to plot multiple subplots (separated by legend entry) based on patterns from the columns of a dataframe inside a subplot. However, I'm not able to separate each subplot into another set of subplots.
This is what I have:
import matplotlib.pyplot as plt
col_patterns = ['pattern1','pattern2']
# define subplot grid
fig, axs = plt.subplots(nrows=len(col_patterns), ncols=1, figsize=(30, 80))
plt.subplots_adjust()
fig.suptitle("Title", fontsize=18, y=0.95)
for col_pat, ax in zip(col_patterns, axs.ravel()):
    col_pat_columns = [col for col in df.columns if col_pat in col]
    df[col_pat_columns].plot(x='Week', ax=ax)
    # chart formatting
    ax.set_title(col_pat.upper())
    ax.set_xlabel("")
Which results in something like this:
How could I make each of those subplots turn into another 6 subplots, all laid out horizontally (i.e. each legend entry would be its own subplot)?
Thank you!
In your example, you're defining a 2x1 subplot and only looping through two axes objects that get created. In each of the two loops, when you call df[col_pat_columns].plot(x='Week',ax=ax), since col_pat_columns is a list and you're passing it to df, you're just plotting multiple columns from your dataframe. That's why it's multiple series on a single plot.
@fdireito is correct: you just need to set the ncols argument of plt.subplots() to the number you need, but you'd also need to adjust your loops to accommodate it.
If you want to stay in matplotlib, then here's a basic example. I had to take some guesses as to how your dataframe was structured and so on.
# import matplotlib and pandas
import matplotlib.pyplot as plt
import pandas as pd
# create some fake data
x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'a': [1, 1, 1, 1, 1],      # horizontal line
    'b': [3, 6, 9, 6, 3],      # pyramid
    'c': [4, 8, 12, 16, 20],   # steep line
    'd': [1, 10, 3, 13, 5]     # zig-zag
})
# a list of lists, where each inner list is a set of
# columns we want in the same row of subplots
col_patterns = [['a', 'b', 'c'], ['b', 'c', 'd']]
The following is a simplified example of what your code ends up doing.
fig, axes = plt.subplots(len(col_patterns), 1)
for pat, ax in zip(col_patterns, axes):
    ax.plot(x, df[pat])
[image: 2x1 subplot (what you have right now)]
I use enumerate() with col_patterns to iterate through the subplot rows, and then use enumerate() with each column name in a given pattern to iterate through the subplot columns.
# the following will size your subplots according to
# - number of different column patterns you want matched (rows)
# - largest number of columns in a given column pattern (columns)
subplot_rows = len(col_patterns)
subplot_cols = max([len(x) for x in col_patterns])
fig, axes = plt.subplots(subplot_rows, subplot_cols)
for nrow, pat in enumerate(col_patterns):
    for ncol, col in enumerate(pat):
        axes[nrow][ncol].plot(x, df[col])
[image: correctly sized subplot]
Here's all the code, with a couple additions I omitted from the code above for simplicity's sake.
import matplotlib.pyplot as plt
import pandas as pd
x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'a': [1, 1, 1, 1, 1],      # horizontal line
    'b': [3, 6, 9, 6, 3],      # pyramid
    'c': [4, 8, 12, 16, 20],   # steep line
    'd': [1, 10, 3, 13, 5]     # zig-zag
})
col_patterns = [['a', 'b', 'c'], ['b', 'c', 'd']]
# what you have now
fig, axes = plt.subplots(len(col_patterns), 1, figsize=(12, 8))
for pat, ax in zip(col_patterns, axes):
    ax.plot(x, df[pat])
    ax.legend(pat, loc='upper left')
# what I think you want
subplot_rows = len(col_patterns)
subplot_cols = max([len(x) for x in col_patterns])
fig, axes = plt.subplots(subplot_rows, subplot_cols, figsize=(16, 8), sharex=True, sharey=True, tight_layout=True)
for nrow, pat in enumerate(col_patterns):
    for ncol, col in enumerate(pat):
        axes[nrow][ncol].plot(x, df[col], label=col)
        axes[nrow][ncol].legend(loc='upper left')
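One case the grid approach has to handle is patterns with different numbers of columns, which leaves empty axes at the end of the shorter rows. A minimal sketch of one way to deal with that, assuming the same fake data but with a deliberately uneven col_patterns: squeeze=False keeps axes two-dimensional even for a single row or column, and the unused axes are simply hidden.

```python
import matplotlib.pyplot as plt
import pandas as pd

x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'a': [1, 1, 1, 1, 1],
    'b': [3, 6, 9, 6, 3],
    'c': [4, 8, 12, 16, 20],
    'd': [1, 10, 3, 13, 5],
})
# Uneven on purpose: the second row has only two series.
col_patterns = [['a', 'b', 'c'], ['b', 'd']]

n_rows = len(col_patterns)
n_cols = max(len(pat) for pat in col_patterns)
# squeeze=False guarantees a 2D array of axes, whatever the grid shape.
fig, axes = plt.subplots(n_rows, n_cols, squeeze=False,
                         sharex=True, sharey=True, figsize=(16, 8))

for nrow, pat in enumerate(col_patterns):
    for ncol in range(n_cols):
        ax = axes[nrow, ncol]
        if ncol < len(pat):
            ax.plot(x, df[pat[ncol]], label=pat[ncol])
            ax.legend(loc='upper left')
        else:
            # No series left for this slot, so hide the empty axes.
            ax.set_visible(False)
```

Call plt.show() afterwards to display the grid.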
Another option you can consider is ditching matplotlib and using Seaborn relplots. There are several examples on that page that should help. If you have your dataframe set up correctly (long or "tidy" format), then to achieve the same as above, your one-liner would look something like this:
import seaborn as sns
# x_vals, y_vals, col_pattern and num_weeks_rolling stand in for your own column names
sns.relplot(data=df, kind='line', x=x_vals, y=y_vals, row=col_pattern, col=num_weeks_rolling)
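Getting from a wide dataframe to the long ("tidy") format could look roughly like the sketch below. The column names (pattern1_a and so on) and the underscore naming scheme are purely hypothetical, since the real dataframe from the question isn't shown.

```python
import pandas as pd

# Hypothetical wide-format data: one column per series, named <pattern>_<series>.
df = pd.DataFrame({
    'Week': [1, 2, 3],
    'pattern1_a': [1, 2, 3],
    'pattern1_b': [3, 2, 1],
    'pattern2_a': [2, 2, 2],
    'pattern2_b': [1, 3, 1],
})

# One row per (Week, original column) pair.
tidy = df.melt(id_vars='Week', var_name='column', value_name='value')
# Split the original column name into the pattern and the series within it.
tidy[['pattern', 'series']] = tidy['column'].str.split('_', n=1, expand=True)
```

With the frame in this shape, a call along the lines of sns.relplot(data=tidy, kind='line', x='Week', y='value', row='pattern', col='series') should produce one small plot per series, with one row of plots per pattern.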

Create multiple boxplots from statistics in one graph

I am having trouble finding a solution to plot multiple boxplots created from statistics into one graph.
From another application, I get a Dataframe that contains the different metrics needed to draw boxplots (median, quantile 1, ...). While I am able to plot a single boxplot from these statistics with the following code:
data = pd.read_excel("data.xlsx")
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(6, 6), sharey=True)
row = data.iloc[:, 0]
stats = [{
    "label": i,  # not required
    "mean": row["sharpeRatio"],  # not required
    "med": row["sharpeRatio_med"],
    "q1": row["sharpeRatio_q1"],
    "q3": row["sharpeRatio_q3"],
    # "cilo": 5.3  # not required
    # "cihi": 5.7  # not required
    "whislo": row["sharpeRatio_min"],  # required
    "whishi": row["sharpeRatio_max"],  # required
    "fliers": []  # required if showfliers=True
}]
axes.bxp(stats)
plt.show()
I am struggling to create a graph containing boxplots from all the rows in the dataframe. Do you have an idea how to achieve this?
You can pass a list of dictionaries to the bxp method. The easiest way to get such a list from your existing code is to put the dictionary construction inside a function and call it for each row of the dataframe.
Note that data.iloc[:, 0] would be the first column, not the first row.
import matplotlib.pyplot as plt
import pandas as pd
def stats(row):
    return {"med": row["sharpeRatio_med"],
            "q1": row["sharpeRatio_q1"],
            "q3": row["sharpeRatio_q3"],
            "whislo": row["sharpeRatio_min"],
            "whishi": row["sharpeRatio_max"]}
data = pd.DataFrame({"sharpeRatio_med": [3, 4, 2],
                     "sharpeRatio_q1": [2, 3, 1],
                     "sharpeRatio_q3": [4, 5, 3],
                     "sharpeRatio_min": [1, 1, 0],
                     "sharpeRatio_max": [5, 6, 4]})
fig, axes = plt.subplots()
axes.bxp([stats(data.iloc[i, :]) for i in range(len(data))],
         showfliers=False)
plt.show()
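As a variation on the same idea, the list of dictionaries can also be built without a per-row function, by renaming the columns to the keys bxp expects and exporting the rows as records. This is only a sketch: the index labels ('fund A' and so on) are made up here so that each box gets a tick label.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same fake statistics as above; the index labels are hypothetical.
data = pd.DataFrame({"sharpeRatio_med": [3, 4, 2],
                     "sharpeRatio_q1": [2, 3, 1],
                     "sharpeRatio_q3": [4, 5, 3],
                     "sharpeRatio_min": [1, 1, 0],
                     "sharpeRatio_max": [5, 6, 4]},
                    index=["fund A", "fund B", "fund C"])

# Map the dataframe's column names onto the keys that Axes.bxp reads.
key_map = {"sharpeRatio_med": "med",
           "sharpeRatio_q1": "q1",
           "sharpeRatio_q3": "q3",
           "sharpeRatio_min": "whislo",
           "sharpeRatio_max": "whishi"}
bxp_stats = data[list(key_map)].rename(columns=key_map).to_dict("records")

# Add the optional per-box entries: a tick label and an empty flier list.
for label, entry in zip(data.index, bxp_stats):
    entry["label"] = str(label)
    entry["fliers"] = []

fig, ax = plt.subplots()
result = ax.bxp(bxp_stats, showfliers=False)
```

Call plt.show() to display the three boxes side by side.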

Fixed heatmap table with customised colours

I've been breaking my head with this problem. I want to make in plotly something like this:
This is very common in excel plots, so I want to see if it is possible to make this in Plotly for python.
The idea is to customise the plot so that it shows exactly what the image above shows. I need this to use as a background in another plot that I made. So, I need to know if it's possible to make something like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors
Index= ['1', '2', '3', '4', '5']
Cols = ['A', 'B', 'C', 'D','E']
data= [[ 0, 0,1, 1,2],[ 0, 1,2, 2,3],[ 1, 2,2, 3,4],[1, 2,3, 4,4],[ 2, 3,4, 4,4]]
df = pd.DataFrame(data, index=Index, columns=Cols)
cmap = colors.ListedColormap(['darkgreen','lightgreen','yellow','orange','red'])
bounds=[0, 1, 2, 3, 4,5]
norm = colors.BoundaryNorm(bounds, cmap.N)
heatmap = plt.pcolor(np.array(data), cmap=cmap, norm=norm)
plt.colorbar(heatmap, ticks=[0, 1, 2, 3,4,5])
plt.show()
And that code gives us this plot:
Sorry to bother but by this point I'm completely hopeless haha, I've searched a lot and found nothing.
Thanks so much for reading, any help is appreciated.
I recreated the Matplotlib heatmap you presented in Plotly, although some issues remain. I used an example from the official reference as a basis.
import plotly.express as px
data= [[ 0, 0, 1, 1, 2],[ 0, 1, 2, 2, 3],[ 1, 2, 2, 3, 4],[1, 2, 3, 4, 4],[2, 3, 4, 4, 4]]
fig = px.imshow(data, color_continuous_scale=["darkgreen","lightgreen","yellow","orange","red"])
fig.update_yaxes(autorange=True)
fig.update_layout(
    xaxis=dict(
        tickmode='linear',
        tick0=0,
        dtick=1
    ),
    autosize=False,
    width=500
)
# fig.layout['coloraxis']['colorbar']['x'] = 1.0
fig.update_layout(coloraxis_colorbar=dict(
    tickvals=[0, 1, 2, 3, 4],
    ticktext=[0, 1, 2, 3, 4],
    x=1.0
))
fig.show()
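One of the remaining issues is that a plain list of colours gives a continuous gradient rather than five flat bands. A common workaround is a colorscale in which every colour occupies its own interval, with the same colour at both ends of the interval. The helper below is a hypothetical sketch in pure Python (discrete_colorscale is my own name, not a Plotly function); it only builds the list of [fraction, colour] pairs, which is the shape Plotly's colorscale attribute accepts.

```python
def discrete_colorscale(colors):
    """Return [fraction, color] pairs where each color fills an equal band."""
    n = len(colors)
    scale = []
    for i, color in enumerate(colors):
        # Same colour at the start and end of the band means no gradient inside it.
        scale.append([i / n, color])
        scale.append([(i + 1) / n, color])
    return scale


colors = ["darkgreen", "lightgreen", "yellow", "orange", "red"]
scale = discrete_colorscale(colors)
```

Passing this as px.imshow(data, color_continuous_scale=scale, zmin=-0.5, zmax=4.5) should then put each integer from 0 to 4 in the middle of its own band; the zmin and zmax values are my assumption for this particular data range.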
I would recommend using Seaborn colour palettes:
https://seaborn.pydata.org/tutorial/color_palettes.html
And playing around with cmap, vmax, vmin and center, which allow you to change the tone of the colours based on the scale of the data. (It may take a while to get what you want :D)
import seaborn as sns
g = sns.heatmap(data, vmax=6, vmin=0, cmap='Spectral', center=3, yticklabels=True)

Altair mark_line plots noisier than matplotlib?

I am learning altair to add interactivity to my plots. I am trying to recreate a plot I do in matplotlib, however altair is adding noise to my curves.
This is my dataset, df1, linked here from GitHub: https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv
This is the code:
fig, ax = plt.subplots(figsize=(8, 6))
for key, grp in df1.groupby(['Name']):
    y = grp.logabsID
    x = grp.VG
    ax.plot(x, y, label=key)
plt.legend(loc='best')
plt.show()
# doing it directly from link
df1 = 'https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv'
import altair as alt
alt.Chart(df1).mark_line(size=1).encode(
    x='VG:Q',
    y='logabsID:Q',
    color='Name:N'
)
Here is the image of the plots I am generating: [image: matplotlib vs altair plot]
How do I remove the noise from altair?
Altair sorts the x axis before drawing lines, so if you have multiple lines in one group it will often lead to "noise", as you call it. This is not noise, but rather an accurate representation of all the points in your dataset shown in the default sort order. Here is a simple example:
import numpy as np
import pandas as pd
import altair as alt
df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
    'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'group': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
})
alt.Chart(df).mark_line().encode(
    x='x:Q',
    y='y:Q'
)
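To see where the zig-zag comes from, the effect of sorting on x can be imitated in pandas. This is only an emulation of the ordering described above, not Altair's actual internals, and the stable sort is my assumption for how ties are broken.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
    'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'group': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
})

# Sorting all points by x interleaves the two groups, so a single line
# drawn through them jumps between low and high y values.
y_single_line = df.sort_values('x', kind='stable')['y'].tolist()

# Sorting within each group instead keeps every line monotonic.
y_per_group = {
    group: part.sort_values('x', kind='stable')['y'].tolist()
    for group, part in df.groupby('group')
}

print(y_single_line)  # [1, 10, 2, 9, 3, 8, 4, 7, 5, 6]
print(y_per_group)    # {0: [1, 2, 3, 4, 5], 1: [10, 9, 8, 7, 6]}
```

The alternating y values in the first list are the "noise"; drawing each group separately removes it.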
The best way to fix this is to set the detail encoding to a column that distinguishes between the different lines that you would like to be drawn individually:
alt.Chart(df).mark_line().encode(
    x='x:Q',
    y='y:Q',
    detail='group:N'
)
If it is not the grouping that is important, but rather the order of the points, you can specify that by instead providing an order channel:
alt.Chart(df.reset_index()).mark_line().encode(
    x='x:Q',
    y='y:Q',
    order='index:Q'
)
Notice that the two lines are connected on the right end. This is effectively what matplotlib does by default: it maintains the index order even if there is repeated data. Using the order channel for your data produces the result you're looking for:
df1 = pd.read_csv('https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv')
alt.Chart(df1.reset_index()).mark_line(size=1).encode(
    x='VG:Q',
    y='logabsID:Q',
    color='Name:N',
    order='index:Q'
)
The multiple lines in each group are drawn in order connected at the ends, just as they are in matplotlib.

Python pandas summary table plot

Really can't get to grips with how to plot a summary table of a pandas df. I'm sure this is not a case for a pivot table, or maybe a transposed method of displaying the data. Best I could find was : Plot table and display Pandas Dataframe
My code attempts are just not getting there:
dc = pd.DataFrame({'A' : [1, 2, 3, 4],'B' : [4, 3, 2, 1],'C' : [4, 3, 2, 1]})
data = dc['A'],dc['B'],dc['C']
ax = plt.subplot(111, frame_on=False)
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
cols=["A", "B", "C"]
row_labels=[0]
the_table = plt.table(cellText=data,colWidths = [0.5]*len(cols),rowLabels=row_labels, colLabels=cols,cellLoc = 'center', rowLoc = 'center')
plt.show()
All I would like to do is produce a table plot, with A, B, C in the first column and the total and mean in the columns next to them (see below). Any help or guidance would be great... feeling really stupid... (Excuse the code example; it doesn't have the total and mean included yet.)
     Total  Mean
A    x      x
B    x      x
C    x      x
import pandas as pd
import matplotlib.pyplot as plt
dc = pd.DataFrame({'A' : [1, 2, 3, 4],'B' : [4, 3, 2, 1],'C' : [3, 4, 2, 2]})
plt.plot(dc)
plt.legend(dc.columns)
dcsummary = pd.DataFrame([dc.mean(), dc.sum()],index=['Mean','Total'])
plt.table(cellText=dcsummary.values, colWidths=[0.25]*len(dc.columns),
          rowLabels=dcsummary.index,
          colLabels=dcsummary.columns,
          cellLoc='center', rowLoc='center',
          loc='top')
fig = plt.gcf()
plt.show()
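The table above has Mean and Total as rows and A, B, C as columns. If the layout from the question is wanted (A, B, C down the side, Total and Mean across the top), transposing the summary should do it. A minimal sketch, using the question's data and leaving out the line plot:

```python
import matplotlib.pyplot as plt
import pandas as pd

dc = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 3, 2, 1], 'C': [4, 3, 2, 1]})

# Aggregate per column, then transpose so A, B, C become the rows.
summary = dc.agg(['sum', 'mean']).T
summary.columns = ['Total', 'Mean']

fig, ax = plt.subplots()
ax.axis('off')  # show only the table, without axes
table = ax.table(cellText=summary.values,
                 rowLabels=summary.index,
                 colLabels=summary.columns,
                 cellLoc='center', rowLoc='center',
                 loc='center')
```

Call plt.show() to display it. The totals appear as floats (10.0) because sum and mean share one array; summary['Total'] can be formatted separately if integers are preferred.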
Does the DataFrame.describe() function help you?
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html
Sorry, not enough points for comments.
