For this question, I was provided the following information.
Data in code form:
order_data = {'Alice': {5: 'chocolate'},
'Bob': {9: 'vanilla'},
'Clair': {7: 'strawberry'},
'Drake': {10: 'chocolate' },
'Emma': {82: 'vanilla'},
'Alice': {70: 'strawberry'},
'Emma': {42: 'chocolate'},
'Ginger': {64: 'strawberry'} }
I was asked to make a bar graph detailing this data. The bar graph and the Altair code used to make it are provided below.
import altair
data = altair.Data(customer=['Alice', 'Bob', 'Claire', 'Drake', 'Emma','Alice', 'Emma', 'Ginger'],
cakes=[5,9,7,10,82,70,42,64],
flavor=['chocolate', 'vanilla', 'strawberry','chocolate','vanilla','strawberry','chocolate','strawberry'])
chart = altair.Chart(data)
mark = chart.mark_bar()
enc = mark.encode(x='customer:N',y='cakes',color='flavor:N')
enc.display()
Graph:
My question is: What is the best way to go about constructing this graph using matplotlib?
I know this isn't an unusual graph per se, but it is unusual in the sense that I have not found any reproductions of this kind of graph. Thank you!
This has already been answered, but you can also plot it with pandas, using DataFrame.plot.
import pandas as pd
df = pd.DataFrame({'customer': ['Alice', 'Bob', 'Claire', 'Drake', 'Emma', 'Alice', 'Emma', 'Ginger'],
                   'cakes': [5, 9, 7, 10, 82, 70, 42, 64],
                   'flavor': ['chocolate', 'vanilla', 'strawberry', 'chocolate', 'vanilla', 'strawberry', 'chocolate', 'strawberry']})
# one row per customer, one column per flavor; missing combinations become 0
df = df.pivot(index='customer',columns='flavor', values='cakes').fillna(0)
df.plot(kind='bar', stacked=True)
Here is a reproduction of the Altair graph with Matplotlib. Note that I had to modify the order_data dictionary because a dict cannot hold the same key more than once (a later entry silently overwrites the earlier one), so I grouped the orders by customer. Also note that some optional styling statements are included to mimic the style of Altair.
The trick is to use the bottom keyword argument of the ax.bar function. The following image is obtained from the code below.
import matplotlib.pyplot as plt
# data
order_data = {
"Alice": {"chocolate": 5, "strawberry": 70},
"Bob": {"vanilla": 9},
"Clair": {"strawberry": 7},
"Drake": {"chocolate": 10},
"Emma": {"chocolate": 42, "vanilla": 82},
"Ginger": {"strawberry": 64},
}
# init figure
fig, ax = plt.subplots(1, figsize=(2.5, 4))
colors = {"chocolate": "C0", "strawberry": "C1", "vanilla": "C2"}
# show a bar for each person
for person_id, (name, orders) in enumerate(order_data.items()):
    bottom = 0  # running height of this person's stack
    for flavor, quantity in orders.items():
        ax.bar(person_id, quantity, bottom=bottom, color=colors[flavor])
        bottom += quantity
# add legend (proxy rectangles, so labels do not depend on the order the bars were drawn)
handles = [plt.Rectangle((0, 0), 1, 1, color=c) for c in colors.values()]
ax.legend(handles, list(colors), bbox_to_anchor=(2.0, 1.0))
# remove top/right axes for style match
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(axis="y", zorder=-1)
# ticks
ax.set_xticks(range(len(order_data)))
ax.set_xticklabels([name for name in order_data], rotation="vertical")
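The regrouping of order_data described in this answer does not have to be done by hand. A minimal sketch, assuming the orders are first written down as (customer, quantity, flavor) tuples; the `records` list and the `group_orders` helper are illustrative names that are not part of the original answer, and the spelling 'Clair' follows the question's dictionary:

```python
from collections import defaultdict

# Flat records transcribed from the question: (customer, quantity, flavor).
records = [
    ("Alice", 5, "chocolate"),
    ("Bob", 9, "vanilla"),
    ("Clair", 7, "strawberry"),
    ("Drake", 10, "chocolate"),
    ("Emma", 82, "vanilla"),
    ("Alice", 70, "strawberry"),
    ("Emma", 42, "chocolate"),
    ("Ginger", 64, "strawberry"),
]


def group_orders(rows):
    """Group flat rows into {customer: {flavor: quantity}} (hypothetical helper)."""
    grouped = defaultdict(dict)
    for customer, quantity, flavor in rows:
        # Add up repeated (customer, flavor) pairs instead of overwriting them.
        grouped[customer][flavor] = grouped[customer].get(flavor, 0) + quantity
    return {name: dict(flavors) for name, flavors in grouped.items()}


order_data = group_orders(records)
```

The result has the same nested shape as the hand-written order_data above, so it can be fed to the plotting loop unchanged.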
I have this code to make a pie chart, but I want to change the color of each class to the colors listed in the colors variable. The documentation about formatting plots in JSON is really hard to find, so I can't figure it out. Does anyone know how to add colors to the plot? The code can be found below.
def plot_donut(df):
    colors = ['#ca0020','#f4a582','#D9DDDC','#92c5de','#0571b0']
    trace1 = {
        "hole": 0.8,
        "type": "pie",
        "labels": ['-2','-1','0','1','2'],
        "values": df['Time Spent (seconds)'],
        "showlegend": False
    }
    fig = go.Figure(data=data, layout=layout)
    fig.show()
plot_donut(df)
Further to my earlier comment, please see the code below for specifying named colours for a Plotly donut (pie) graph.
Like you, I much prefer to use the low-level Plotly API, rather than relying on the convenience wrappers. The code below shows how this is done at a low level.
Example code:
import plotly.io as pio
values = [2, 3, 5, 7, 11]
colours = ['#440154', '#3e4989', '#26828e', '#35b779', '#fde725']
trace1 = {'values': values,
'marker': {'colors': colours}, # <--- This is the key.
'type': 'pie',
'hole': 0.8,
'showlegend': False}
pio.show({'data': [trace1]})
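To apply the same idea to the plot_donut function from the question, each label can be paired with its colour when the trace is built. A minimal sketch under that assumption; `make_donut_trace` is a hypothetical helper rather than a Plotly function, and the sample values are made up for illustration:

```python
def make_donut_trace(labels, values, colours, hole=0.8):
    """Build a Plotly pie-trace dict with one colour per label (hypothetical helper)."""
    if not (len(labels) == len(values) == len(colours)):
        raise ValueError("labels, values and colours must have the same length")
    return {
        "type": "pie",
        "hole": hole,
        "labels": list(labels),
        "values": list(values),
        "marker": {"colors": list(colours)},  # the Plotly key is spelled 'colors'
        "showlegend": False,
    }


labels = ["-2", "-1", "0", "1", "2"]
colours = ["#ca0020", "#f4a582", "#D9DDDC", "#92c5de", "#0571b0"]
values = [10, 20, 40, 20, 10]  # made-up numbers, not taken from the question

trace = make_donut_trace(labels, values, colours)
figure = {"data": [trace]}
# To display the figure: import plotly.io as pio; pio.show(figure)
```

The length check is only a guard for the sketch; it makes a mismatch between labels and colours fail early instead of producing a partly default-coloured chart.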
Output:
In Plotly, in order to create scatter plots, I usually do the following:
fig = px.scatter(df, x=x, y=y)
fig.update_xaxes(range=[2, 10])
fig.update_yaxes(range=[2, 10])
I want the y-axis to intersect the x-axis at x=6. So, instead of the left side of the y-axis representing negative numbers, I want it to cover [2, 6]; after the intersection, the right side of the graph covers [6, 10].
Likewise, below the x-axis the y-axis goes from [2, 6], and above the x-axis it goes from [6, 10].
How can I do this in Plotly?
Following on from my comment, as far as I am aware, what you're after is not currently available.
However, here is an example of a work-around which uses a shapes dictionary to add horizontal and vertical lines - acting as intersecting axes - placed at your required x/y intersection of 6.
Sample dataset:
import numpy as np
x = (np.random.randn(100)*2)+6
y1 = (np.random.randn(100)*2)+6
y2 = (np.random.randn(100)*2)+6
Example plotting code:
import plotly.io as pio
layout = {'title': 'Intersection of X/Y Axes Demonstration'}
shapes = []
traces = []
traces.append({'x': x, 'y': y1, 'mode': 'markers'})
traces.append({'x': x, 'y': y2, 'mode': 'markers'})
shapes.append({'type': 'line',
'x0': 2, 'x1': 10,
'y0': 6, 'y1': 6})
shapes.append({'type': 'line',
'x0': 6, 'x1': 6,
'y0': 2, 'y1': 10})
layout['shapes'] = shapes
layout['xaxis'] = {'range': [2, 10]}
layout['yaxis'] = {'range': [2, 10]}
pio.show({'data': traces, 'layout': layout})
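The two line shapes can also be generated for any intersection point. A minimal sketch, assuming the same layout-dict approach as above; `axis_cross_shapes` is a hypothetical helper name, not part of Plotly:

```python
def axis_cross_shapes(x_cross, y_cross, x_range, y_range):
    """Return two Plotly line-shape dicts that meet at (x_cross, y_cross)."""
    horizontal = {"type": "line",
                  "x0": x_range[0], "x1": x_range[1],
                  "y0": y_cross, "y1": y_cross}
    vertical = {"type": "line",
                "x0": x_cross, "x1": x_cross,
                "y0": y_range[0], "y1": y_range[1]}
    return [horizontal, vertical]


# Same ranges and intersection as in the example above.
cross_layout = {
    "xaxis": {"range": [2, 10]},
    "yaxis": {"range": [2, 10]},
    "shapes": axis_cross_shapes(6, 6, x_range=[2, 10], y_range=[2, 10]),
}
```

Passing `cross_layout` as the 'layout' entry of the figure dict should give the same pair of intersecting lines as the hand-written shapes.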
Output:
Comments (TL;DR):
The example code shown here uses the low-level Plotly API (plotly.io), rather than a convenience wrapper such as graph_objects or express. The reason is that I (personally) feel it's helpful to users to show what is occurring 'under the hood', rather than masking the underlying code logic with a convenience wrapper.
This way, when the user needs to modify a finer detail of the graph, they will have a better understanding of the lists and dicts which Plotly constructs for the underlying rendering library (plotly.js).
I think fig.add_hline() and fig.add_vline() are the functions you need.
Example code
import plotly.express as px
import pandas as pd
df = pd.DataFrame({'x':[6,7,3], 'y':[4,5,6]})
fig = px.scatter(df, x='x', y='y')
fig.update_xaxes(range=[2, 10])
fig.update_yaxes(range=[2, 10])
fig.add_hline(y=4)
fig.add_vline(x=6)
fig.show()
Output
I have a dataset that looks like this:
x y z
0 Jan 28446000 110489.0
1 Feb 43267700 227900.0
When I plot a line chart like this:
px.line(data,x = 'x', y = ['y','z'], line_shape = 'spline', title="My Chart")
The y-axis scale runs from 0 to 90M. The first line on the chart, for y, is fine. However, the second line always appears to sit at 0M. What can I do to improve my chart so that the changes in both columns over the x values are clearly visible?
Is there any way I can normalize the data? Or perhaps I could change the scaling of the chart.
We often work with data on different scales, where rescaling the data would mask a characteristic we wish to display. One way to handle this is to add a secondary y-axis. An example is shown below.
Key points:
Create a layout dictionary object
Add a yaxis2 key to the dict, with the following: 'side': 'right', 'overlaying': 'y1'
This tells Plotly to create a secondary y-axis on the right side of the graph, and to overlay the primary y-axis.
Assign the appropriate trace to the newly created secondary y-axis as: 'yaxis': 'y2'
The other trace does not need to be assigned, as 'y1' is the default y-axis.
Comments (TL;DR):
The example code shown here uses the lower-level Plotly API, rather than a convenience wrapper such as graph_objects or express. The reason is that I (personally) feel it's helpful to users to show what is occurring 'under the hood', rather than masking the underlying code logic with a convenience wrapper.
This way, when the user needs to modify a finer detail of the graph, they will have a better understanding of the lists and dicts which Plotly constructs for the underlying rendering library (plotly.js).
The Docs:
Here is a link to the Plotly docs referencing multiple axes.
Example Code:
import pandas as pd
from plotly.offline import iplot
df = pd.DataFrame({'x': ['Jan', 'Feb'],
'y': [28446000, 43267700],
'z': [110489.0, 227900.0]})
layout = {'title': 'Secondary Y-Axis Demonstration',
'legend': {'orientation': 'h'}}
traces = []
traces.append({'x': df['x'], 'y': df['y'], 'name': 'Y Values'})
traces.append({'x': df['x'], 'y': df['z'], 'name': 'Z Values', 'yaxis': 'y2'})
# Add config common to all traces.
for t in traces:
    t.update({'line': {'shape': 'spline'}})
layout['yaxis1'] = {'title': 'Y Values', 'range': [0, 50000000]}
layout['yaxis2'] = {'title': 'Z Values', 'side': 'right', 'overlaying': 'y1', 'range': [0, 400000]}
iplot({'data': traces, 'layout': layout})
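The question also asks about normalizing the data. As an alternative to a secondary axis, one possible sketch rescales each column so that it starts at 100, which puts both lines on a common axis at the cost of hiding the absolute magnitudes. The `rebase` helper is hypothetical, not a pandas or Plotly function:

```python
def rebase(values, base=100.0):
    """Scale a series so that its first value equals `base` (hypothetical helper)."""
    first = values[0]
    if first == 0:
        raise ValueError("cannot rebase a series that starts at zero")
    return [base * v / first for v in values]


# The two columns from the question.
y = [28446000, 43267700]
z = [110489.0, 227900.0]

y_rebased = rebase(y)  # both series now start at 100
z_rebased = rebase(z)
```

Plotting `y_rebased` and `z_rebased` against the months shows relative growth only, so this suits comparisons of trend rather than of size.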
Graph:
How can we make a faceted grid in subplots in Plotly? For example, I want to plot total_bill versus tip five times in subplots. I tried to do the following:
import plotly.plotly as py
import plotly.figure_factory as ff
from plotly import tools
subfigs = tools.make_subplots(rows= 5, cols=1)
import pandas as pd
tips = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/tips.csv')
for i in enumerate(tips.columns.tolist()):
    fig = ff.create_facet_grid(
        tips,
        x='total_bill',
        y='tip',
        color_name='sex',
        show_boxes=False,
        marker={'size': 10, 'opacity': 1.0},
        colormap={'Male': 'rgb(165, 242, 242)', 'Female': 'rgb(253, 174, 216)'}
    )
    subfigs.append_trace(fig, i+1, 1)
pyo.iplot(fig)
This does not work, because the faceted grid created by the figure factory is not considered a trace. Is there a way of doing this? This answer did not help because, as it seemed to me, cufflinks does not accept faceted grids.
There are a number of things going on in here.
The function enumerate in Python yields (index, item) tuples, so i in your loop is a tuple rather than an integer. If you want to iterate over just the indexes, you can use
for i in range(tips.columns.size):
otherwise you can unpack by doing
for i, col in enumerate(tips.columns):
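The difference between the two loop forms can be checked without Plotly. A small sketch, using a shortened, illustrative column list rather than the real tips columns:

```python
columns = ["total_bill", "tip", "sex"]  # illustrative subset of column names

# Iterating over enumerate() without unpacking yields (index, value) tuples...
pairs = [item for item in enumerate(columns)]

# ...so arithmetic such as `item + 1` raises a TypeError.
try:
    pairs[0] + 1
    failed = False
except TypeError:
    failed = True

# Unpacking gives a plain integer index that can be used as a subplot row.
rows = [i + 1 for i, col in enumerate(columns)]
```

This is why `i+1` in the question's loop cannot be used as a row number until the tuple is unpacked.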
The methods from figure factory return Figures, which contain Traces in their data list. You can access one of the traces produced by the create_facet_grid method by its index:
subfigs.append_trace(fig['data'][index_of_the_trace], n1, n2)
The idea of faceted grids is to split the dataset by one of its categorical columns. Here's an example of how you can assign the partitions of the dataset, split by a selected column, to different subplots:
tips = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/tips.csv')
current_column = 'sex'
subfigs = tools.make_subplots(
rows = tips[current_column].nunique(),
cols = 1
)
fig = ff.create_facet_grid(
tips,
x = 'total_bill',
y = 'tip',
facet_row = current_column,
color_name = 'sex',
show_boxes = False,
marker = {'size': 10, 'opacity': 1.0},
colormap = {
'Male': 'rgb(165, 242, 242)',
'Female': 'rgb(253, 174, 216)'
}
)
for i in range(tips[current_column].nunique()):
    subfigs.append_trace(fig['data'][i], i+1, 1)
# show the figure that holds the subplots, not the facet grid itself
py.iplot(subfigs)
hope it helps.
Suppose I have the following two dataframes:
df1 = pd.DataFrame(np.random.randn(100, 3),columns=['A','B','C']).cumsum()
df2 = pd.DataFrame(np.random.randn(100, 3),columns=['A','B','C']).cumsum()
My question is: how can I plot them in one graph such that:
The three series of df1 and df2 are still drawn in the same blue, orange and green as above.
The three series of df1 are in solid lines
The three series of df2 are in dashed lines
Currently the closest thing I can get is the following:
ax = df1.plot(style=['b','y','g'])
df2.plot(ax=ax, style=['b','y','g'], linestyle='--')
Is there any way to get the color codes used by default by DataFrame.plot()? Or is there any other better approach to achieve what I want? Ideally I don't want to specify any color codes with the style parameter but always use the default colors.
Without touching the colors themselves or transferring them from one plot to the other, you can simply reset the color cycle between your plot commands:
ax = df1.plot()
ax.set_prop_cycle(None)
df2.plot(ax=ax, linestyle="--")
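The effect of resetting the cycle can be verified with plain Matplotlib. A minimal sketch, assuming simple made-up line data instead of the two DataFrames and a non-interactive backend so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for this sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# First group: three solid lines take the first three colors of the cycle.
solid = [ax.plot([0, 1, 2], [k, k + 1, k + 2])[0] for k in range(3)]

# Restart the color cycle, then draw the second group dashed.
ax.set_prop_cycle(None)
dashed = [ax.plot([0, 1, 2], [k + 2, k + 1, k], linestyle="--")[0] for k in range(3)]

solid_colors = [line.get_color() for line in solid]
dashed_colors = [line.get_color() for line in dashed]
```

Comparing `solid_colors` with `dashed_colors` shows that the dashed lines reuse the colors of the solid ones in the same order.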
You could use get_color from the lines:
df1 = pd.DataFrame(np.random.randn(100, 3),columns=['A','B','C']).cumsum()
df2 = pd.DataFrame(np.random.randn(100, 3),columns=['A','B','C']).cumsum()
ax = df1.plot()
l = ax.get_lines()
df2.plot(ax=ax, linestyle='--', color=[i.get_color() for i in l])
Output:
You can get the default color parameters that are currently being used from matplotlib.
import matplotlib.pyplot as plt
colors = list(plt.rcParams.get('axes.prop_cycle'))
[{'color': '#1f77b4'},
{'color': '#ff7f0e'},
{'color': '#2ca02c'},
{'color': '#d62728'},
{'color': '#9467bd'},
{'color': '#8c564b'},
{'color': '#e377c2'},
{'color': '#7f7f7f'},
{'color': '#bcbd22'},
{'color': '#17becf'}]
so just pass style=['#1f77b4', '#ff7f0e', '#2ca02c'] and the colors should work.
If you want to set another color cycler, say the older version, then:
plt.rcParams['axes.prop_cycle'] = ("cycler('color', 'bgrcmyk')")
list(plt.rcParams['axes.prop_cycle'])
#[{'color': 'b'},
# {'color': 'g'},
# {'color': 'r'},
# {'color': 'c'},
# {'color': 'm'},
# {'color': 'y'},
# {'color': 'k'}]