I recently saw this treemap chart from https://www.kaggle.com/philippsp/exploratory-analysis-instacart (two levels of hierarchy, colored, squarified treemap).
It is made in R with treemap() from the treemap package (not ggplot2):
treemap(tmp,index=c("department","aisle"),vSize="n",title="",
palette="Set3",border.col="#FFFFFF")
How can I make this plot in Python?
I searched a bit, but didn't find any multi-level treemap example.
https://gist.github.com/gVallverdu/0b446d0061a785c808dbe79262a37eea
https://python-graph-gallery.com/200-basic-treemap-with-python/
You can use plotly. Here you can find several examples.
https://plotly.com/python/treemaps/
This is a very simple example with a multi-level structure.
import plotly.express as px
import pandas as pd
data = {}  # a plain dict is enough here
data['level_1'] = ['A', 'A', 'A', 'B', 'B', 'B']
data['level_2'] = ['X', 'X', 'Y', 'Z', 'Z', 'X']
data['level_3'] = ['1', '2', '2', '1', '1', '2']
data = pd.DataFrame.from_dict(data)
fig = px.treemap(data, path=['level_1', 'level_2', 'level_3'])
fig.show()
The matplotlib-extra package provides a treemap function that supports multi-level treemaps. For a G20 dataset, it can produce a similar treemap:
import matplotlib.pyplot as plt
import mpl_extra.treemap as tr
fig, ax = plt.subplots(figsize=(7,7), dpi=100, subplot_kw=dict(aspect=1.156))
trc = tr.treemap(ax, df, area='gdp_mil_usd', fill='hdi', labels='country',
                 levels=['region', 'country'],
                 textprops={'c':'w', 'wrap':True,
                            'place':'top left', 'max_fontsize':20},
                 rectprops={'ec':'w'},
                 subgroup_rectprops={'region':{'ec':'grey', 'lw':2, 'fill':False,
                                               'zorder':5}},
                 subgroup_textprops={'region':{'c':'k', 'alpha':0.5, 'fontstyle':'italic'}},
                 )
ax.axis('off')
cb = fig.colorbar(trc.mappable, ax=ax, shrink=0.5)
cb.ax.set_title('hdi')
cb.outline.set_edgecolor('w')
plt.show()
The obtained treemap is as follows:
For more information, see the project, which has some examples. The source code has an API docstring.
I have a dataframe like this
df = pd.DataFrame({'name':['a', 'b', 'c', 'd', 'e'], 'value':[54.2, 53.239, 43.352, 36.442, -12.487]})
df
I'd like to plot a simple stacked bar chart like the one below with plotly.express
How can I do that?
I've seen on documentation several examples but none of them solved my problem
Thank you
It's a little wordy, but you can set a single value for the x axis, in this case zero. Then you just need to tweak your dimensions, labels, and ranges.
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'name':['a', 'b', 'c', 'd', 'e'], 'value':[54.2, 53.239, 43.352, 36.442, -12.487]})
df['x'] = 0
fig = px.bar(df, x='x', y='value',color='name', width=500, height=1000)
fig.update_xaxes(showticklabels=False, title=None)
fig.update_yaxes(range=[-50,200])
fig.update_traces(width=.3)
fig.show()
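The hard-coded y range above can also be derived from the data. The helper below is a hypothetical sketch (the name `stacked_range` and the 5% padding are my own choices); it assumes the default relative stacking of px.bar, where positive segments pile up above zero and negative ones below.

```python
import pandas as pd


def stacked_range(values, pad=0.05):
    """Return (low, high) y-limits for a single stacked bar."""
    values = pd.Series(values, dtype=float)
    low = values[values < 0].sum()   # negative segments stack downwards
    high = values[values > 0].sum()  # positive segments stack upwards
    margin = pad * (high - low)      # breathing room on both ends
    return low - margin, high + margin


df = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e'],
                   'value': [54.2, 53.239, 43.352, 36.442, -12.487]})
low, high = stacked_range(df['value'])
```

The result can then be passed on with `fig.update_yaxes(range=[low, high])`.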
The bar chart's only ever going to have one column? That seems like an odd use-case for a bar chart, but...
What I would do is create one trace per "name", filtering df as trace_df=df[df['name']==name], and then make a Bar for each of those, something like this:
import plotly.graph_objects as go
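# One filtered frame per name, keyed by that name
trace_dfs = {name: df[df['name'] == name] for name in df['name'].unique()}
bars = [
    go.Bar(
        name=name,
        # every trace shares the same x category, so the bars stack
        x=['constant'] * len(trace_frame),
        y=trace_frame['value'],
    )
    for name, trace_frame in trace_dfs.items()
]
fig = go.Figure(data=bars)
# barmode is a layout property, not a Figure argument
fig.update_layout(barmode='stack')
fig.show()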
Granted, that's not plotly_express, it's core plotly, which allows a lot more control. If you want multiple stacked bars for different values, you'll need separate labels and separate values for x and y, not the two-column DF you described. There are several more examples here and a full description of the available bar chart options here.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_excel("path to the file")
fig, ax = plt.subplots()
fig.set_size_inches(7,3)
df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age'])
df.plot.scatter(x='Age',
y='Pos',
c='DarkBlue', xticks=([15,20,25,30,35,40]))
plt.show()
I got the plot, but I'm not able to label these points.
Provided you'd like to label each point, you can loop over each coordinate plotted, assigning it a label using plt.text() at the plotted point's position, like so:
from matplotlib import pyplot as plt
y_points = [i for i in range(0, 20)]
x_points = [(i*3) for i in y_points]
offset = 5
plt.figure()
plt.grid(True)
plt.scatter(x_points, y_points)
for i in range(0, len(x_points)):
    plt.text(x_points[i] - offset, y_points[i], f'{x_points[i]}')
plt.show()
The above example gives the following:
The offset is just to make the labels more readable so that they're not right on top of the scattered points.
Obviously we don't have access to your spreadsheet, but the same basic concept would apply.
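Applied to a DataFrame shaped like the one in the question, the same idea could look like the sketch below. The rows are invented placeholders for the spreadsheet, and I use ax.annotate with an offset in points instead of plt.text with a data-unit offset, so the labels keep the same distance from the markers whatever the axis scale.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows standing in for the spreadsheet in the question
df = pd.DataFrame({
    "Player": ["Ann", "Ben", "Cal", "Dee"],
    "Pos": ["GK", "DF", "MF", "FW"],
    "Age": [31, 24, 27, 19],
})

fig, ax = plt.subplots(figsize=(7, 3))
ax.scatter(df["Age"], df["Pos"], c="darkblue")
ax.set_xticks([15, 20, 25, 30, 35, 40])

# One label per point, shifted 5 points right and up from the marker
for player, age, pos in zip(df["Player"], df["Age"], df["Pos"]):
    ax.annotate(player, xy=(age, pos), xytext=(5, 5),
                textcoords="offset points")
# call plt.show() to display the figure
```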
EDIT
For non numerical values, you can simply define the string as the coordinate. This can be done like so:
from matplotlib import pyplot as plt
y_strings = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']
x_values = [i for i, string in enumerate(y_strings)]
# Plot coordinates:
plt.scatter(x_values, y_strings)
for i, string in enumerate(y_strings):
    plt.text(x_values[i], string, f'{x_values[i]}:{string}')
plt.grid(True)
plt.show()
Which will provide the following output:
I have following simple example dataframe:
import pandas as pd
data = [['Alex',25],['Bob',34],['Sofia',26],["Claire",35]]
df = pd.DataFrame(data,columns=['Name','Age'])
df["sex"]=["male","male","female","female"]
I use following code to plot barplots:
import matplotlib.pyplot as plt
import seaborn as sns
age_plot=sns.barplot(data=df,x="Name",y="Age", hue="sex",dodge=False)
age_plot.get_legend().remove()
plt.setp(age_plot.get_xticklabels(), rotation=90)
plt.ylim(0,40)
age_plot.tick_params(labelsize=14)
age_plot.set_ylabel("Age",fontsize=15)
age_plot.set_xlabel("",fontsize=1)
plt.tight_layout()
Produces following bar plot:
My question: how can I control the whitespace between bars? I want some extra white space between the male (blue) and female (orange) bars.
Output should look like this (poorly edited in MS PPT):
I have found several topics on this for matplotlib (e.g. https://python-graph-gallery.com/5-control-width-and-space-in-barplots/) but not for seaborn. I'd prefer to use seaborn because of the easy functionality to color by hue.
Thanks.
A possibility is to insert an empty bar in the middle:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'Name': ['Alex', 'Bob', 'Sofia', 'Claire'], 'Age': [15, 18, 16, 22], 'Gender': ['M', 'M', 'F', 'F']})
df = pd.concat([df[df.Gender == 'M'], pd.DataFrame({'Name': [''], 'Age': [0], 'Gender': ['M']}), df[df.Gender == 'F']])
age_plot = sns.barplot(data=df, x="Name", y="Age", hue="Gender", dodge=False)
age_plot.get_legend().remove()
plt.setp(age_plot.get_xticklabels(), rotation=90)
plt.ylim(0, 40)
age_plot.tick_params(labelsize=14)
age_plot.tick_params(length=0, axis='x')
age_plot.set_ylabel("Age", fontsize=15)
age_plot.set_xlabel("", fontsize=1)
plt.tight_layout()
plt.show()
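If dropping seaborn for this one plot is acceptable, another option is to compute the bar positions yourself and shift every group by a fixed gap. This is a plain matplotlib sketch, not a seaborn feature; the `gap` value and the colour mapping are assumptions of mine, and it expects the rows to be sorted by group already.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Bob", "Sofia", "Claire"],
    "Age": [25, 34, 26, 35],
    "sex": ["male", "male", "female", "female"],
})

gap = 0.5  # extra space (in x units) inserted before each new group
group_index = pd.factorize(df["sex"])[0]  # 0 for the first group, 1 for the next
positions = np.arange(len(df)) + gap * group_index
palette = {"male": "tab:blue", "female": "tab:orange"}

fig, ax = plt.subplots()
ax.bar(positions, df["Age"], color=df["sex"].map(palette).tolist())
ax.set_xticks(positions)
ax.set_xticklabels(df["Name"].tolist(), rotation=90)
ax.set_ylim(0, 40)
ax.set_ylabel("Age", fontsize=15)
fig.tight_layout()
# call plt.show() to display the figure
```

Unlike the empty-bar trick, this leaves no blank tick on the x axis, and the gap can be any width.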
The below code helps in obtaining subplots with unique colored boxes. But all subplots share a common set of x and y axis. I was looking forward to having independent axis for each sub-plot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
df = pd.DataFrame(np.random.rand(140, 4), columns=['A', 'B', 'C', 'D'])
df['models'] = pd.Series(np.repeat(['model1','model2', 'model3', 'model4', 'model5', 'model6', 'model7'], 20))
bp_dict = df.boxplot(
by="models",layout=(2,2),figsize=(6,4),
return_type='both',
patch_artist = True,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k', ]
for row_key, (ax,row) in bp_dict.items():
    ax.set_xlabel('')
    for i,box in enumerate(row['boxes']):
        box.set_facecolor(colors[i])
plt.show()
Here is an output of the above code:
I am trying to have separate x and y axis for each subplot...
You need to create the figure and subplots beforehand and pass the axes in as an argument to df.boxplot(). This also means you can remove the argument layout=(2,2):
fig, axes = plt.subplots(2,2,sharex=False,sharey=False)
Then use:
bp_dict = df.boxplot(
by="models", ax=axes, figsize=(6,4),
return_type='both',
patch_artist = True,
)
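A variant of the same idea is to draw one column per pre-created Axes in a loop. This is a minimal sketch with seeded random data, scaled differently per column so that the independent y-axes are easy to see; box colouring is left out.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Differently scaled columns so the independent y-axes are visible
df = pd.DataFrame(rng.random((140, 4)) * [1, 2, 4, 8], columns=list("ABCD"))
df["models"] = np.repeat([f"model{i}" for i in range(1, 8)], 20)

# Axes created without sharing, then one boxplot group per Axes
fig, axes = plt.subplots(2, 2, sharex=False, sharey=False, figsize=(6, 8))
for ax, column in zip(axes.flat, ["A", "B", "C", "D"]):
    df.boxplot(column=column, by="models", ax=ax, rot=45)
    ax.set_xlabel("")
fig.tight_layout()
# call plt.show() to display the figure
```

Each subplot keeps its own tick labels and autoscaled limits because the Axes were never shared in the first place.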
You may set the ticklabels visible again, e.g. via
plt.setp(ax.get_xticklabels(), visible=True)
This does not make the axes independent, though; they are still bound to each other. But it seems like you are asking about the visibility rather than the shared behaviour here.
If you really need to un-share the axes after creating the boxplot array, you can, but you have to do everything 'by hand'. After searching Stack Overflow and the matplotlib documentation for a while, I came up with the following solution to un-share the y-axes of the Axes instances; the x-axes can be handled analogously:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
from matplotlib.ticker import AutoLocator, AutoMinorLocator
##using differently scaled data for the different random series:
df = pd.DataFrame(
np.asarray([
np.random.rand(140),
2*np.random.rand(140),
4*np.random.rand(140),
8*np.random.rand(140),
]).T,
columns=['A', 'B', 'C', 'D']
)
df['models'] = pd.Series(np.repeat([
'model1','model2', 'model3', 'model4', 'model5', 'model6', 'model7'
], 20))
##creating the boxplot array:
bp_dict = df.boxplot(
by="models",layout = (2,2),figsize=(6,8),
return_type='both',
patch_artist = True,
rot = 45,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k', ]
##adjusting the Axes instances to your needs
for row_key, (ax,row) in bp_dict.items():
    ax.set_xlabel('')
    ##removing shared axes:
    ##(newer Matplotlib versions may return a read-only view here,
    ## in which case remove() is not available)
    grouper = ax.get_shared_y_axes()
    shared_ys = [a for a in grouper]
    for ax_list in shared_ys:
        for ax2 in ax_list:
            grouper.remove(ax2)
    ##setting limits:
    ax.axis('auto')
    ax.relim() #<-- maybe not necessary
    ##adjusting tick positions:
    ax.yaxis.set_major_locator(AutoLocator())
    ax.yaxis.set_minor_locator(AutoMinorLocator())
    ##making tick labels visible:
    plt.setp(ax.get_yticklabels(), visible=True)
    for i,box in enumerate(row['boxes']):
        box.set_facecolor(colors[i])
plt.show()
The resulting plot looks like this:
Explanation:
You first need to tell each Axes instance that it shouldn't share its yaxis with any other Axes instance. This post pointed me in the right direction: Axes.get_shared_y_axes() returns a Grouper object that holds references to all other Axes instances with which the current Axes should share its yaxis. Looping through those instances and calling Grouper.remove does the actual un-sharing.
Once the yaxis is un-shared, the y limits and the y ticks need to be adjusted. The former can be achieved with ax.axis('auto') and ax.relim() (not sure if the second command is necessary). The ticks can be adjusted by using ax.yaxis.set_major_locator() and ax.yaxis.set_minor_locator() with the appropriate Locators. Finally, the tick labels can be made visible using plt.setp(ax.get_yticklabels(), visible=True) (see here).
Considering all this, @DavidG's answer is in my opinion the better approach.
I am using Pandas and Matplotlib to create some plots. I want line plots with error bars on them. The code I am using currently looks like this
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
df_yerr = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
fig, ax = plt.subplots()
df.plot(yerr=df_yerr, ax=ax, fmt="o-", capsize=5)
ax.set_xscale("log")
plt.show()
With this code, I get 6 lines on a single plot (which is what I want). However, the error bars completely overlap, making the plot difficult to read.
Is there a way I could slightly shift the position of each point on the x-axis so that the error bars no longer overlap?
Here is a screenshot:
One way to achieve what you want is to plot the error bars 'by hand', but it is neither straightforward nor much better looking than your original. Basically, you let pandas produce the line plot, then iterate through the data frame columns and draw a pyplot errorbar plot for each of them, with the index shifted slightly sideways (with the logarithmic scale on your x axis, this means shifting by a factor). In the error bar plots, the marker size is set to zero:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
colors = ['red','blue','green','yellow','purple','black']
df = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
df_yerr = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
fig, ax = plt.subplots()
df.plot(ax=ax, marker="o",color=colors)
index = df.index
rows = len(index)
columns = len(df.columns)
factor = 0.95
for column,color in zip(range(columns),colors):
    y = df.values[:,column]
    yerr = df_yerr.values[:,column]
    ax.errorbar(
        df.index*factor, y, yerr=yerr, markersize=0, capsize=5,color=color,
        zorder = 10,
    )
    factor *= 1.02
ax.set_xscale("log")
plt.show()
As I said, the result is not pretty:
UPDATE
In my opinion a bar plot would be much more informative:
fig2,ax2 = plt.subplots()
df.plot(kind='bar',yerr=df_yerr, ax=ax2)
plt.show()
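Going back to the multiplicative shift on the log-scaled axis: the factors can be generated so that they are evenly spaced on the log axis and centred on the true x value. The sketch below is one way to do it; `log_offsets` and its `spread` default are hypothetical names and values of mine, and unlike the code above it shifts the markers and lines together with the error bars.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def log_offsets(n_series, spread=1.15):
    """Multiplicative x-offsets, evenly spaced on a log axis, centred on 1."""
    return np.geomspace(1 / spread, spread, n_series)


rng = np.random.default_rng(0)
index = [10, 100, 1000, 10000]
columns = list("ABCDEF")
df = pd.DataFrame(rng.random((4, 6)), index=index, columns=columns)
df_yerr = pd.DataFrame(rng.random((4, 6)) * 0.2, index=index, columns=columns)

fig, ax = plt.subplots()
for column, factor in zip(df.columns, log_offsets(len(df.columns))):
    # Each series is drawn at x * factor, so its error bars sit side by side
    ax.errorbar(df.index.to_numpy() * factor, df[column], yerr=df_yerr[column],
                fmt="o-", capsize=5, label=column)
ax.set_xscale("log")
ax.legend()
# call plt.show() to display the figure
```

A larger `spread` separates the series more, at the cost of moving the points further from their true x positions.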
You can make the overlap easier to read by adding transparency with alpha, for example:
df.plot(yerr=df_yerr, ax=ax, fmt="o-", capsize=5,alpha=0.5)
You can also check this link for reference