Altair mark_line plots noisier than matplotlib?

I am learning Altair to add interactivity to my plots. I am trying to recreate a plot I make in matplotlib, but Altair appears to add noise to my curves.
This is my dataset, df1, available on GitHub: https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv
This is the code:
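import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_csv('https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv')
fig, ax = plt.subplots(figsize=(8, 6))
# one curve per Name, drawn in the row order of the file
for key, grp in df1.groupby('Name'):
    y = grp.logabsID
    x = grp.VG
    ax.plot(x, y, label=key)
plt.legend(loc='best')
plt.show()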
#doing it directly from link
df1='https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv'
import altair as alt
alt.Chart(df1).mark_line(size=1).encode(
x='VG:Q',
y='logabsID:Q',
color='Name:N'
)
Here is the image of the plots I am generating:
[image: matplotlib vs. Altair plot]
How do I remove the noise from altair?

Altair sorts the points by their x value before drawing a line, so if one group contains several lines, the result often looks like "noise", as you call it. It is not noise, but an accurate representation of all the points in your dataset, connected in the default sort order. Here is a simple example:
import numpy as np
import pandas as pd
import altair as alt
df = pd.DataFrame({
'x': [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'group': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
})
alt.Chart(df).mark_line().encode(
x='x:Q',
y='y:Q'
)
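To see where the zigzag comes from without drawing anything, the default ordering can be imitated in pandas. This is only a sketch: it assumes a stable sort on x is a fair stand-in for the drawing order, and the names drawn and zigzag are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
    'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

# Imitate an x-sorted drawing order (a stable sort keeps row order for ties).
drawn = df.sort_values('x', kind='stable')

# The y values now alternate between the rising pass and the falling pass.
zigzag = drawn['y'].tolist()
print(zigzag)
```

Connecting the points in this order makes the line jump between the two passes at every x value, which is the "noise" seen in the chart.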
The best way to fix this is to set the detail encoding to a column that distinguishes between the different lines that you would like to be drawn individually:
alt.Chart(df).mark_line().encode(
x='x:Q',
y='y:Q',
detail='group:N'
)
If it is not the grouping that is important, but rather the order of the points, you can specify that by instead providing an order channel:
alt.Chart(df.reset_index()).mark_line().encode(
x='x:Q',
y='y:Q',
order='index:Q'
)
Notice that the two lines are connected on the right end. This is effectively what matplotlib does by default: it maintains the index order even if there is repeated data. Using the order channel for your data produces the result you're looking for:
df1 = pd.read_csv('https://raw.githubusercontent.com/leoUninova/Transistor-altair-plots/master/df1.csv')
alt.Chart(df1.reset_index()).mark_line(size=1).encode(
x='VG:Q',
y='logabsID:Q',
color='Name:N',
order='index:Q'
)
The multiple lines in each group are drawn in order connected at the ends, just as they are in matplotlib.
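If you would rather use the detail encoding but the data has no column that separates the individual passes, one could be derived. The sketch below is an assumption, not something taken from your dataset: it supposes each pass is a monotonic run of x, and the column name sweep is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
    'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

# Direction of travel along x: +1 rising, -1 falling, 0 where x repeats.
direction = np.sign(df['x'].diff())

# Let repeated x values and the first row inherit a neighbour's direction.
direction = direction.replace(0, np.nan).ffill().bfill()

# A new sweep starts wherever the direction flips.
df['sweep'] = (direction != direction.shift()).cumsum() - 1
print(df['sweep'].tolist())
```

The derived column could then be passed as detail='sweep:N'. With several curves in one file, as in df1, the same steps would have to be applied per Name, for example inside a groupby.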

Related

How to make Altair display NaN points with a quantitative color scale?

In Altair, is there a way to plot NaN/None colors in quantitative encodings? Possibly even assigning a specific color such as in Matplotlib's set_bad?
For example, the third data point is missing using quantitative encoding for color 'c:Q'
df = pd.DataFrame(dict(x=[0, 1, 2, 3], c=[0, 1, None, 3]))
alt.Chart(df).mark_circle().encode(x='x:Q', y='x:Q', color='c:Q')
but it shows up (as null) when using ordinal encoding 'c:O':
df = pd.DataFrame(dict(x=[0, 1, 2, 3], c=[0, 1, None, 3]))
alt.Chart(df).mark_circle().encode(x='x:Q', y='x:Q', color='c:O')

Null data is filtered by default for some chart types and scales, but we can include them with invalid=None (the "invalid" param for marks in the docs). Then we can use a condition that assigns points the color grey if they are not valid numerical data:
import pandas as pd
import altair as alt
df = pd.DataFrame(dict(x=[0, 1, 2, 3], c=[0, 1, None, 3]))
alt.Chart(df).mark_circle(size=200, invalid=None).encode(
x='x:Q',
y='x:Q',
color=alt.condition('isValid(datum.c)', 'c:Q', alt.value('gray'))
)
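Before plotting, it can help to check in pandas which rows hold missing values, since those are the rows the isValid test above is meant to catch. A minimal sketch using the same dataframe; the names missing and n_missing are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(x=[0, 1, 2, 3], c=[0, 1, None, 3]))

# pandas stores the None as NaN, so the column becomes float64.
missing = df['c'].isna()
n_missing = int(missing.sum())

# Rows that would be drawn in gray by the conditional color encoding.
print(df[missing])
```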
If you want the legend to include the NaN, I think you need a layered chart:
points = alt.Chart(df).mark_circle(size=200).encode(
x='x:Q',
y='x:Q',
color='c:Q'
)
points + points.encode(
color=alt.Color('c:O', scale=alt.Scale(range=['grey']), title=None)
).transform_filter(
'!isValid(datum.c)'
)
It would be convenient if you could avoid layering and simply type out something like this instead, but that is not allowed currently:
alt.condition(
'isValid(datum.c)',
'c:Q',
alt.Color('c:O', scale=alt.Scale(range=['grey']), title=None)
)
Reference: the related question "Dealing with missing values / nulls in Altair choropleth map".

Create multiple boxplots from statistics in one graph

I am having trouble finding a solution to plot multiple boxplots created from statistics into one graph.
From another application, I get a Dataframe that contains the different metrics needed to draw boxplots (median, quantile 1, ...). While I am able to plot a single boxplot from these statistics with the following code:
data = pd.read_excel("data.xlsx")
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(6, 6), sharey=True)
row = data.iloc[:, 0]
stats = [{
"label": i, # not required
"mean": row["sharpeRatio"], # not required
"med": row["sharpeRatio_med"],
"q1": row["sharpeRatio_q1"],
"q3": row["sharpeRatio_q3"],
# "cilo": 5.3 # not required
# "cihi": 5.7 # not required
"whislo": row["sharpeRatio_min"], # required
"whishi": row["sharpeRatio_max"], # required
"fliers": [] # required if showfliers=True
}]
axes.bxp(stats)
plt.show()
I am struggling to create a graph containing boxplots from all the rows in the dataframe. Do you have an idea how to achieve this?

You can pass a list of dictionaries to the bxp method. The easiest way to get such a list from your existing code is to put the dictionary construction inside a function and call it for each row of the dataframe.
Note that data.iloc[:, 0] would be the first column, not the first row.
import matplotlib.pyplot as plt
import pandas as pd
def stats(row):
    return {"med": row["sharpeRatio_med"],
            "q1": row["sharpeRatio_q1"],
            "q3": row["sharpeRatio_q3"],
            "whislo": row["sharpeRatio_min"],
            "whishi": row["sharpeRatio_max"]}
data = pd.DataFrame({"sharpeRatio_med": [3, 4, 2],
                     "sharpeRatio_q1": [2, 3, 1],
                     "sharpeRatio_q3": [4, 5, 3],
                     "sharpeRatio_min": [1, 1, 0],
                     "sharpeRatio_max": [5, 6, 4]})
fig, axes = plt.subplots()
axes.bxp([stats(data.iloc[i, :]) for i in range(len(data))],
         showfliers=False)
plt.show()
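As a possible alternative to the explicit loop, the list of dictionaries can be built by renaming the columns and converting the rows to records. This is a sketch that assumes the same column names as above; key_map and stats_list are names introduced here.

```python
import pandas as pd

data = pd.DataFrame({"sharpeRatio_med": [3, 4, 2],
                     "sharpeRatio_q1": [2, 3, 1],
                     "sharpeRatio_q3": [4, 5, 3],
                     "sharpeRatio_min": [1, 1, 0],
                     "sharpeRatio_max": [5, 6, 4]})

# Map the dataframe columns to the keys that bxp expects.
key_map = {"sharpeRatio_med": "med",
           "sharpeRatio_q1": "q1",
           "sharpeRatio_q3": "q3",
           "sharpeRatio_min": "whislo",
           "sharpeRatio_max": "whishi"}

# One dictionary per row, in row order.
stats_list = data[list(key_map)].rename(columns=key_map).to_dict("records")
print(stats_list[0])
```

The result can be passed to bxp in the same way as before: axes.bxp(stats_list, showfliers=False).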

Fixed heatmap table with customised colours

I've been racking my brain over this problem. I want to make something like this in Plotly:
This is very common in Excel plots, so I want to see whether it is possible to make it in Plotly for Python.
The idea is to customise the plot so that it shows exactly what the image above shows, because I need to use it as a background for another plot that I made. So I need to know whether it's possible to make something like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors
Index= ['1', '2', '3', '4', '5']
Cols = ['A', 'B', 'C', 'D','E']
data= [[ 0, 0,1, 1,2],[ 0, 1,2, 2,3],[ 1, 2,2, 3,4],[1, 2,3, 4,4],[ 2, 3,4, 4,4]]
df = pd.DataFrame(data, index=Index, columns=Cols)
cmap = colors.ListedColormap(['darkgreen','lightgreen','yellow','orange','red'])
bounds=[0, 1, 2, 3, 4,5]
norm = colors.BoundaryNorm(bounds, cmap.N)
heatmap = plt.pcolor(np.array(data), cmap=cmap, norm=norm)
plt.colorbar(heatmap, ticks=[0, 1, 2, 3,4,5])
plt.show()
That code gives us this plot:
Sorry to bother but by this point I'm completely hopeless haha, I've searched a lot and found nothing.
Thanks so much for reading, any help is appreciated.

I recreated the matplotlib heatmap you presented in Plotly; some issues remain. I used an example from the official reference as a basis.
import plotly.express as px
data= [[ 0, 0, 1, 1, 2],[ 0, 1, 2, 2, 3],[ 1, 2, 2, 3, 4],[1, 2, 3, 4, 4],[2, 3, 4, 4, 4]]
fig = px.imshow(data, color_continuous_scale=["darkgreen","lightgreen","yellow","orange","red"])
fig.update_yaxes(autorange=True)
fig.update_layout(
    xaxis=dict(
        tickmode='linear',
        tick0=0,
        dtick=1
    ),
    autosize=False,
    width=500
)
# fig.layout['coloraxis']['colorbar']['x'] = 1.0
fig.update_layout(coloraxis_colorbar=dict(
    tickvals=[0, 1, 2, 3, 4],
    ticktext=[0, 1, 2, 3, 4],
    x=1.0
))
fig.show()
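One of the remaining issues is that a plain list of colors is interpolated, so the cells are not filled with exactly the five colors. A possible way around it is a stepped color scale in which every color occupies its own band. The helper below is a sketch; discrete_colorscale is a name made up here, not a Plotly function.

```python
def discrete_colorscale(colors):
    """Build a stepped scale: each color covers an equal band of [0, 1]."""
    n = len(colors)
    scale = []
    for i, color in enumerate(colors):
        # Repeating the color at both ends of its band prevents blending.
        scale.append([i / n, color])
        scale.append([(i + 1) / n, color])
    return scale

scale = discrete_colorscale(
    ["darkgreen", "lightgreen", "yellow", "orange", "red"])
print(scale[:4])

# Possible usage (assumes plotly is installed and data is the matrix above):
# fig = px.imshow(data, color_continuous_scale=scale, zmin=-0.5, zmax=4.5)
```

With zmin=-0.5 and zmax=4.5, each integer from 0 to 4 falls in the middle of its own band, which plays roughly the role of the BoundaryNorm in the matplotlib version.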

I would recommend using Seaborn color palettes:
https://seaborn.pydata.org/tutorial/color_palettes.html
Then experiment with the cmap, vmax, vmin and center arguments, which let you change the tone of the colors based on the scale of the data. (It may take a while to get what you want :D)
import seaborn as sns
g = sns.heatmap(data, vmax=6, vmin=0, cmap='Spectral', center=3, yticklabels=True)

Show all lines in matplotlib line plot

How do I bring the other line to the front or show both the graphs together?
plot_yield_df.plot(figsize=(20,20))

If the plotted data overlap, one way to see both lines is to increase the line width of one and make it partly transparent, as shown:
import numpy as np
import matplotlib.pyplot as plt
plt.plot(np.arange(5), [5, 8, 6, 9, 4], label='Original', linewidth=5, alpha=0.5)
plt.plot(np.arange(5), [5, 8, 6, 9, 4], label='Predicted')
plt.legend()
plt.show()
Using subplots is another good option.

Problem
The lines are plotted in the order their columns appear in the dataframe. So for example
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
a = np.random.rand(400)*0.9
b = np.random.rand(400)+1
a = np.c_[a,-a].flatten()
b = np.c_[b,-b].flatten()
df = pd.DataFrame({"A" : a, "B" : b})
df.plot()
plt.show()
Here the values of "B" hide those from "A".
Solution 1: Reverse column order
A solution is to reverse their order
df[df.columns[::-1]].plot()
That has also changed the order in the legend and the color coding.
Solution 2: Reverse z-order
So if that is not desired, you can instead play with the zorder.
ax = df.plot()
lines = ax.get_lines()
for line, j in zip(lines, list(range(len(lines)))[::-1]):
    line.set_zorder(j)

Using MaxNLocator for pandas bar plot results in wrong labels

I have a pandas dataframe and I want to create a plot of it:
import pandas as pd
from matplotlib.ticker import MultipleLocator, FormatStrFormatter, MaxNLocator
df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))
df.plot(kind='barh')
Nice, everything works as expected:
Now I want to hide some of the ticks on the y axis. Looking at the docs, I thought I could achieve it with:
MaxNLocator: Finds up to a max number of intervals with ticks at nice
locations. MultipleLocator: Ticks and range are a multiple of base;
either integer or float.
But neither of them plots what I was expecting to see (the values on the y axis do not show the correct numbers):
ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MultipleLocator(2))
ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MaxNLocator(3))
What am I doing wrong?

Problem
The problem occurs because pandas bar plots are categorical. Each bar is positioned at a successive integer value starting at 0. Only the labels are adjusted to show the actual dataframe index. So here you have a FixedLocator with values 0, 1, 2, 3, ... and a FixedFormatter with values -5, -4, -3, .... Changing the locator alone does not change the formatter, hence you get the numbers -5, -4, -3, ... but at different locations (one tick is not shown, hence the plot starts at -4 here).
A. Pandas solution
In addition to setting the locator you would need to set a formatter, which returns the correct values as function of the location. In the case of a dataframe index with successive integers as used here, this can be done by adding the minimum index to the location using a FuncFormatter. For other cases, the function for the FuncFormatter may become more complicated.
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import (MultipleLocator, MaxNLocator,
                               FuncFormatter, ScalarFormatter)
df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))
ax = df.plot(kind='barh')
ax.yaxis.set_major_locator(MultipleLocator(2))
sf = ScalarFormatter()
sf.create_dummy_axis()
sf.set_locs((df.index.max(), df.index.min()))
ax.yaxis.set_major_formatter(FuncFormatter(lambda x,p: sf(x+df.index[0])))
plt.show()
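For an index that is not made of successive integers, adding an offset no longer works. One possible approach, sketched here under the assumption that bar i sits at position i, is to look the label up by position. The helper make_index_formatter is a hypothetical name, not part of matplotlib.

```python
def make_index_formatter(index_labels):
    """Return a function (x, pos) -> label suitable for FuncFormatter."""
    labels = list(index_labels)

    def fmt(x, pos=None):
        i = int(round(x))
        # Only label ticks that sit exactly on an existing bar position.
        if abs(x - i) > 1e-9 or not 0 <= i < len(labels):
            return ""
        return str(labels[i])

    return fmt

# Same index as in the example above: positions 0..10 map to -5..5.
fmt = make_index_formatter(range(-5, 6))
print(fmt(0), fmt(2), fmt(10))
```

It would be used as ax.yaxis.set_major_formatter(FuncFormatter(fmt)), together with the locator of your choice; ticks that fall outside the bars get an empty label.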
B. Matplotlib solution
Using matplotlib, the solution is potentially easier. Since matplotlib bar plots are numeric in nature, they position the bars at the locations given to the first argument. Here, setting a locator alone is sufficient.
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import MultipleLocator, MaxNLocator
df = pd.DataFrame([1, 3, 3, 5, 10, 20, 11, 7, 2, 3, 1], range(-5, 6))
fig, ax = plt.subplots()
ax.barh(df.index, df.values[:,0])
ax.yaxis.set_major_locator(MultipleLocator(2))
plt.show()
