I am trying to prevent the labels in the Northeast US map below from overlapping. I have tried to turn labels on and off for certain states in the region, but there definitely is a better way of doing it. Below is my code and output.
import pandas as pd
import geopandas as gpd
import matplotlib.patheffects as pe
csv = pd.read_csv(r'C:\Downloads\Data.csv')
sf = r'C:\Downloads\s_11au16\s_11au16.shp'
US = gpd.read_file(sf)
#Merge them
data = gpd.GeoDataFrame(csv.merge(US))
#set projection
data = data.to_crs(epsg=6923)
#set up basemap
ax = data.plot(figsize=(12, 8), column="soil_data", cmap="Greens", edgecolor='black', linewidth=.5, vmin=0, vmax=70,
               missing_kwds={"color": "white", "edgecolor": "k", "label": "none"})
ax.set_title("Example", fontsize=18, fontweight='bold')
ax.set_axis_off()
#annotate data
label = data.dropna(subset='soil_data')
label.apply(lambda x: ax.annotate(text=int(x['soil_data']), xy=x.geometry.centroid.coords[0], color="black",
                                  ha='center', fontsize=14,
                                  path_effects=[pe.withStroke(linewidth=3, foreground="white")]), axis=1)
Obviously I cannot test it without your data, but if you're willing to try again with adjustText, you could try replacing your label.apply(...) with something like this:
from adjustText import adjust_text
texts = []
for i, row in label.iterrows():
    texts.append(ax.annotate(text=int(row['soil_data']), xy=row.geometry.centroid.coords[0], color="black",
                             ha='center', fontsize=14,
                             path_effects=[pe.withStroke(linewidth=3, foreground="white")]))
adjust_text(texts)
I don't know how adjust_text deals with annotations, so if this doesn't work, you could try converting it to plt.text.
(The matplotlib class Annotation inherits from the Text class)
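The inheritance claim is easy to check. Below is a minimal sketch with made-up label positions; it only confirms that an annotation is a Text object, not that adjust_text repositions annotations the way you want.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.text import Annotation, Text

fig, ax = plt.subplots()

# Hypothetical labels: one created with annotate, one with text
ann = ax.annotate("42", xy=(0.5, 0.5), ha="center")
txt = ax.text(0.2, 0.2, "17", ha="center")

# Both objects are Text instances, so they share get_text(), set_position(), etc.
texts = [ann, txt]
all_text = all(isinstance(t, Text) for t in texts)
print(all_text, issubclass(Annotation, Text))
```

If the annotations misbehave, swapping ax.annotate(text=..., xy=...) for ax.text(x, y, ...) keeps the same styling keywords (color, ha, fontsize, path_effects).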
I want to add a legend for the blue vertical dashed lines and black vertical dashed lines with label long entry points and short entry points respectively. The other two lines (benchmark and manual strategy portfolio) came from the dataframe.
How do I add a legend for the two vertical line styles?
Here is my existing code and the corresponding graph. The dataframe is a two column dataframe of values that share date indices (the x) and have y values. The blue_x_coords and black_x_coords are the date indices for the vertical lines, as you would expect. Thanks in advance!
ax = df.plot(title=title, fontsize=12, color=["tab:purple", "tab:red"])
ax.set_xlabel(xlabel)
ax.set_ylabel(ylabel)
for xc in blue_x_coords:
    plt.axvline(x=xc, color="blue", linestyle="dashed", label="Long Entry points")
for xc in black_x_coords:
    plt.axvline(x=xc, color="black", linestyle="dashed", label="Short Entry points")
plt.savefig("./images/" + filename)
plt.clf()
You can do this by specifying the legend yourself instead of relying on pandas to do it for you.
Each call to ax.axvline adds another entry to the legend, so the only trick needed is to deduplicate the legend entries that share the same label. From there, simply call ax.legend with the corresponding handles and labels.
from matplotlib.pyplot import subplots, show
from pandas import DataFrame, date_range, to_datetime
from numpy.random import default_rng
from matplotlib.dates import DateFormatter
rng = default_rng(0)
df = DataFrame({
'Benchmark': rng.normal(0, .1, size=200),
'Manual Strategy Portfolio': rng.uniform(-.1, .1, size=200).cumsum(),
}, index=date_range('2007-12', freq='7D', periods=200))
ax = df.plot(color=['tab:purple', 'tab:red'])
blue_x_coords = to_datetime(['2008-07', '2009-11', '2010-10-12'])
black_x_coords = to_datetime(['2008-02-15', '2009-01-15', '2011-09-23'])
for xc in blue_x_coords:
    blue_vline = ax.axvline(x=xc, color="blue", linestyle="dashed", label="Long Entry points")
for xc in black_x_coords:
    black_vline = ax.axvline(x=xc, color="black", linestyle="dashed", label="Short Entry points")
# De-duplicate all legend entries based on their label
legend_entries = {label: artist for artist, label in zip(*ax.get_legend_handles_labels())}
# Restructure data to pass into ax.legend
labels, handles = zip(*legend_entries.items())
ax.legend(labels=labels, handles=handles, loc='center left', bbox_to_anchor=(1.02, .5))
You can just call plt.legend() before plt.show(), but you need to use vlines() instead of axvline(). Note that vlines() requires ymin and ymax.
ax=df.plot(color=["green","red"])
ax.set_title("Test")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.vlines(range(0,100,25),label="Long Entry points",linestyle="--",ymin=0,ymax=100,color="blue")
# you can pass blue_x_coords instead of range
ax.vlines(range(0,100,15),label="Short Entry points",linestyle="--",ymin=0,ymax=100,color="black")
# you can pass black_x_coords instead of range
plt.legend()
plt.show()
If you are using axvline, you can follow this approach instead:
Create a second legend for the vertical lines and attach it to the existing plot with Axes.add_artist().
plt.legend() on its own would work, because the axvline() calls carry a label, but there is a catch: the lines are added in a loop, so the label is added once per line.
For that reason the label has been removed from plt.axvline; otherwise the legend would contain that many duplicate entries.
When adding the new legend you also need to pass loc, or else it will be placed at the default location.
The new legend is added as a separate legend and is not merged into the existing one (I don't know a method to add the entries to the same legend; if someone knows, please show).
ax=df.plot(color=["green","red"])
ax.set_title("Test")
ax.set_xlabel("X")
ax.set_ylabel("Y")
for xc in range(0,100,25):
    line1=plt.axvline(x=xc, color="blue", linestyle="dashed")
for xc in range(0,100,15):
    line2=plt.axvline(x=xc, color="black", linestyle="dashed")
new_legend=plt.legend([line1,line2],["Long Entry points","Short Entry points"],loc="lower right")
ax.add_artist(new_legend)
plt.legend()
plt.show()
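On the open point about merging everything into one legend: one option is to collect the handles pandas already registered and append one representative line per group before calling ax.legend once. This is only a sketch; the DataFrame and its column names are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data; the real df comes from the question
df = pd.DataFrame({"Benchmark": range(100), "Portfolio": range(100, 0, -1)})

ax = df.plot(color=["green", "red"])
for xc in range(0, 100, 25):
    line1 = ax.axvline(x=xc, color="blue", linestyle="dashed")
for xc in range(0, 100, 15):
    line2 = ax.axvline(x=xc, color="black", linestyle="dashed")

# Handles and labels that pandas registered for the two data columns
# (the unlabeled vertical lines are skipped automatically)
handles, labels = ax.get_legend_handles_labels()

# Append one representative vertical line per group
handles = list(handles) + [line1, line2]
labels = list(labels) + ["Long Entry points", "Short Entry points"]

legend = ax.legend(handles, labels, loc="lower right")
legend_texts = [t.get_text() for t in legend.get_texts()]
```

Because ax.legend is called only once, there is no second legend to re-attach with add_artist.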
Answer: The easiest way seems to be to replace the for loops with two vlines calls:
# bottom and top are not defined in the original snippet; assumed here to be the current y-limits
bottom, top = ax.get_ylim()
ax.vlines(x=blue_x_coords, colors="blue", ymin=bottom, ymax=top, linestyles="--", label="Long Entry Points")
ax.vlines(x=black_x_coords, colors="black", ymin=bottom, ymax=top, linestyles="--", label="Short Entry Points")
ax.legend()
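If you would rather not work out ymin and ymax in data units, a possible variant is to give the y extent in axes coordinates through ax.get_xaxis_transform(), so each line runs from the bottom (0) to the top (1) of the axes. A minimal sketch with made-up data and x positions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 50, 100], [5, 20, 10], color="tab:purple", label="Benchmark")

# Hypothetical x positions standing in for blue_x_coords / black_x_coords
blue_x = [10, 40, 70]
black_x = [25, 55, 85]

# Blended transform: x in data coordinates, y in axes coordinates (0 = bottom, 1 = top)
trans = ax.get_xaxis_transform()
ax.vlines(blue_x, 0, 1, transform=trans, colors="blue", linestyles="--",
          label="Long Entry Points")
ax.vlines(black_x, 0, 1, transform=trans, colors="black", linestyles="--",
          label="Short Entry Points")

legend = ax.legend()
legend_texts = [t.get_text() for t in legend.get_texts()]
```

Each vlines call creates a single collection, so each label appears in the legend exactly once.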
Edit: The graph is fixed now, but I am having trouble plotting the legend. It only shows the legend for one of the plots, as seen in the picture below.
I am trying to plot a double-axis graph with twinx, but I am facing some difficulties, as seen in the picture below.
Any input is welcome! If you require any additional information, I am happy to provide it.
This is in comparison to the original graph, before z was plotted on the secondary axis.
I am unsure why my graph looks like that: before I plotted my secondary y-axis (the pink line), the closing value graph could be seen perfectly, but now it seems cut off.
It may be due to my data as provided below.
Link to testing1.csv: https://filebin.net/ou93iqiinss02l0g
Code I have currently:
# read csv into variable
sg_df_merged = pd.read_csv("testing1.csv", parse_dates=[0], index_col=0)
# define figure
fig = plt.figure()
fig, ax5 = plt.subplots()
ax6 = ax5.twinx()
x = sg_df_merged.index
y = sg_df_merged["Adj Close"]
z = sg_df_merged["Singapore"]
curve1 = ax5.plot(x, y, label="Singapore", color = "c")
curve2 = ax6.plot(x, z, label = "Face Mask Compliance", color = "m")
curves = [curve1, curve2]
# labels for my axis
ax5.set_xlabel("Year")
ax5.set_ylabel("Adjusted Closing Value ($)")
ax6.set_ylabel("% compliance to wearing face mask")
ax5.grid #not sure what this line does actually
# set x-axis values to 45 degree angle
for label in ax5.xaxis.get_ticklabels():
    label.set_rotation(45)
ax5.grid(True, color = "k", linestyle = "-", linewidth = 0.3)
plt.gca().legend(loc='center left', bbox_to_anchor=(1.1, 0.5), title = "Country Index")
plt.show();
Initially, I thought it was due to my Excel file having entirely blank lines, but I have since removed those rows; the result can be found here
Also, I have tried to interpolate, but somehow it doesn't work. Any suggestions on this are very welcome.
Only rows that were all NaN were dropped. There are still a lot of rows with NaN.
For matplotlib to draw a connecting line between two data points, the points must be consecutive.
The plot API does not connect data across NaN values.
This can be dealt with by converting each pandas.Series to a DataFrame and using .dropna.
Note that x has been dropped, because it would not match the index length of y or z, which are shorter after .dropna.
y is now a separate dataframe, on which .dropna is used.
z is also a separate dataframe, on which .dropna is used.
The x-axis values for each plot are the respective indices.
# read csv into variable
sg_df_merged = pd.read_csv("test.csv", parse_dates=[0], index_col=0)
# define figure
fig, ax5 = plt.subplots(figsize=(8, 6))
ax6 = ax5.twinx()
# select specific columns to plot and drop additional NaN
y = pd.DataFrame(sg_df_merged["Adj Close"]).dropna()
z = pd.DataFrame(sg_df_merged["Singapore"]).dropna()
# add plots with markers
curve1 = ax5.plot(y.index, 'Adj Close', data=y, label="Singapore", color = "c", marker='o')
curve2 = ax6.plot(z.index, 'Singapore', data=z, label = "Face Mask Compliance", color = "m", marker='o')
# labels for my axis
ax5.set_xlabel("Year")
ax5.set_ylabel("Adjusted Closing Value ($)")
ax6.set_ylabel("% compliance to wearing face mask")
# rotate xticks
ax5.xaxis.set_tick_params(rotation=45)
# add a grid to ax5
ax5.grid(True, color = "k", linestyle = "-", linewidth = 0.3)
# create a legend for both axes
curves = curve1 + curve2
labels = [l.get_label() for l in curves]
ax5.legend(curves, labels, loc='center left', bbox_to_anchor=(1.1, 0.5), title = "Country Index")
plt.show()
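To see the NaN effect in isolation, here is a small sketch with made-up data in which every other row is NaN. The series and axes names are placeholders, not taken from the CSV above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: every other row is NaN, as can happen after merging two tables
idx = pd.date_range("2020-01-01", periods=10, freq="D")
s = pd.Series([1, np.nan, 2, np.nan, 3, np.nan, 4, np.nan, 5, np.nan], index=idx)

fig, (ax_raw, ax_clean) = plt.subplots(1, 2, figsize=(8, 3))

# No two valid points are adjacent, so no line segment is drawn at all
ax_raw.plot(s.index, s.values, color="c")
ax_raw.set_title("with NaN")

# After dropna the remaining points are adjacent and get connected
clean = s.dropna()
(line,) = ax_clean.plot(clean.index, clean.values, color="c")
ax_clean.set_title("after dropna")

n_points = len(line.get_xdata())
```

Adding marker='o' to the first plot call would at least make the isolated points visible, which is a quick way to diagnose this kind of gap.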
I believe I am 90% of the way to solving this issue. As can be seen in the attached image, I am trying to create a nested donut chart based on two groups (rem and deep); the values are plotted to show the proportion (relative to 100%) that a user is achieving. I want to display this in the "Apple Health ring" style. For this to be effective, I want to highlight the two types (rem and deep) using two different colors. The image shows the DF, the code applied to generate the existing view, and the output. To summarize, I want to:
Assign a set color to "rem", and a different one for "deep"
Remove the axis labels and tick marks
Ideally (although I can probably do this), better format the labels (in some way).
Output to image file with black background
import re
# create donut plots
my_dpi=150
plt.figure(figsize=(1500/my_dpi, 900/my_dpi), dpi=my_dpi)
startingRadius = 0.7 + (0.3* (len(Ian_MitchellRD)-1))
for index, row in Ian_MitchellRD.iterrows():
    scenario = row["index"]
    percentage = row["Ian Mitchell"]
    textLabel = scenario + ': ' + percentage + '%'
    print(startingRadius)
    percentage = int(re.search(r'\d+', percentage).group())
    remainingPie = 100 - percentage
    donut_sizes = [remainingPie, percentage]
    #colors = ['#FDFEFE','#AED6F1','#5cdb6f','#AED6F1']
    plt.title('Proportion of Recommended Sleep Type being Achieved')
    plt.text(0.05, startingRadius - 0.20, textLabel, horizontalalignment='left', verticalalignment='center', color='black')
    plt.pie(donut_sizes, radius=startingRadius, startangle=90, colors=colors, frame=True,
            wedgeprops={"edgecolor": "white", 'linewidth': 2}, )
    startingRadius-=0.3
# equal ensures pie chart is drawn as a circle (equal aspect ratio)
plt.axis('equal')
# create circle and place onto pie chart
circle = plt.Circle(xy=(0, 0), radius=0.35, facecolor='white')
plt.gca().add_artist(circle)
plt.show()
See the image of what the code currently generates:
UPDATE:
I amended the code following the recommendation to look into the suggested example. The code is now:
import matplotlib.lines as mlines
fig, ax = plt.subplots()
ax.axis('equal')
width = 0.25
fig.patch.set_facecolor('black')
fig.set_size_inches(10,10)
data_1 = Ian_MitchellRD.iloc[0]['Ian Mitchell']
data_2 = Ian_MitchellRD.iloc[1]['Ian Mitchell']
remainingPie_1 = 100 - data_1
remainingPie_2 = 100 - data_2
donut_sizes_1 = [remainingPie_1, data_1]
donut_sizes_2 = [remainingPie_2, data_2]
pie, _ = ax.pie(donut_sizes_1, radius=1, colors=['black','lightgreen'],startangle=90)
plt.setp( pie, width=width, edgecolor='black')
pie2, _ = ax.pie(donut_sizes_2, radius=1-width,startangle=90, colors=['black','pink'])
plt.setp( pie2, width=width, edgecolor='black')
plt.title("Ian Mitchell - Average % of REM + Deep Sleep vs Recommended", fontfamily='Consolas', size=16, color='white')
#setting up the legend
greenbar= mlines.Line2D([], [], color='lightgreen', marker='s', linestyle='None',
markersize=10, label='REM Sleep')
pinkbar = mlines.Line2D([], [], color='pink', marker='s', linestyle='None',
markersize=10, label='Deep Sleep')
plt.legend(handles=[greenbar, pinkbar],prop={'size': 12}, loc='lower right')
plt.show()
This generates the following:
I would really appreciate guidance on adding labels directly on the relevant sections of the chart, i.e. the green ring should have a label of around 94% and the pink ring one of around 55%.
Thanks,
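One possible way to get the labels onto the rings is to read each colored wedge's angles and place an ax.text at its mid-angle. The sketch below assumes the two percentages quoted above (94 and 55) rather than the real DataFrame, and writes the PNG to an in-memory buffer instead of a file.

```python
import io
import math
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical values, taken from the approximate percentages in the question
rings = [("REM Sleep", 94, "lightgreen"), ("Deep Sleep", 55, "pink")]
width = 0.25

fig, ax = plt.subplots(figsize=(6, 6))
fig.patch.set_facecolor("black")
ax.axis("equal")

label_texts = []
for i, (name, pct, color) in enumerate(rings):
    radius = 1 - i * width
    wedges, _ = ax.pie([100 - pct, pct], radius=radius, startangle=90,
                       colors=["black", color],
                       wedgeprops={"width": width, "edgecolor": "black"})
    # Mid-angle of the colored wedge, at the radial centre of the ring
    colored = wedges[1]
    mid = math.radians((colored.theta1 + colored.theta2) / 2)
    r = radius - width / 2
    txt = ax.text(r * math.cos(mid), r * math.sin(mid), f"{pct}%",
                  ha="center", va="center", color="black", fontsize=10)
    label_texts.append(txt.get_text())

# Save with the black figure background preserved
buffer = io.BytesIO()
fig.savefig(buffer, format="png", facecolor=fig.get_facecolor())
png_bytes = buffer.getvalue()
```

Passing a filename instead of the buffer to fig.savefig writes the image to disk. The axis frame and ticks in the first version come from frame=True; leaving that argument out, as here, draws the pie without them.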
I'm using Seaborn to generate many types of graphs, but will use just a simple example here for illustration purposes based on an included dataset:
import seaborn
tips = seaborn.load_dataset("tips")
axes = seaborn.scatterplot(x="day", y="tip", size="sex", hue="time", data=tips)
In this result, the single legend box contains two titles "time" and "sex", each with sub-elements.
How could I easily separate the legend into two boxes, each with a single title? I.e. one for legend box indicating color codes (that could be placed at the left), and one legend box indicating size codes (that would be placed at the right).
The following code works well because there is the same number of time categories as sex categories. If that is not the case, you would have to calculate a priori how many legend lines are required by each "label".
import matplotlib.pyplot as plt
import seaborn
fig = plt.figure()
tips = seaborn.load_dataset("tips")
axes = seaborn.scatterplot(x="day", y="tip", size="sex", hue="time", data=tips)
h,l = axes.get_legend_handles_labels()
half = len(h) // 2  # assumes both legend sections have the same number of entries
l1 = axes.legend(h[:half], l[:half], loc='upper left')
l2 = axes.legend(h[half:], l[half:], loc='upper right')
axes.add_artist(l1) # we need this because the 2nd call to legend() erases the first
If you want to use matplotlib instead of seaborn:
import matplotlib.pyplot as plt
import seaborn
tips = seaborn.load_dataset("tips")
tips["time_int"] = tips["time"].cat.codes
tips["sex_int"] = (tips["sex"].cat.codes*5+5)**2
sc = plt.scatter(x="day", y="tip", s="sex_int", c="time_int", data = tips, cmap="bwr")
leg1 = plt.legend(sc.legend_elements("colors")[0], tips["time"].cat.categories,
title="Time", loc="upper right")
leg2 = plt.legend(sc.legend_elements("sizes")[0], tips["sex"].cat.categories,
title="Sex", loc="upper left")
plt.gca().add_artist(leg1)
plt.show()
I took Diziet's answer and expanded on it. He provided the syntax I needed but, as he pointed out, it was missing a way to calculate how many legend lines each section requires when splitting the legend. I have added this and written a complete script:
# Modules #
import seaborn
from matplotlib import pyplot
# Plot #
tips = seaborn.load_dataset("tips")
axes = seaborn.scatterplot(x="day", y="tip", size="sex", hue="time", data=tips)
# Legend split and place outside #
num_of_colors = len(tips['time'].unique()) + 1
handles, labels = axes.get_legend_handles_labels()
color_hl = handles[:num_of_colors], labels[:num_of_colors]
sizes_hl = handles[num_of_colors:], labels[num_of_colors:]
# Call legend twice #
color_leg = axes.legend(*color_hl,
                        bbox_to_anchor = (1.05, 1),
                        loc = 'upper left',
                        borderaxespad = 0.)
sizes_leg = axes.legend(*sizes_hl,
                        bbox_to_anchor = (1.05, 0),
                        loc = 'lower left',
                        borderaxespad = 0.)
# We need this because the 2nd call to legend() erases the first #
axes.add_artist(color_leg)
# Adjust #
pyplot.subplots_adjust(right=0.75)
# Display #
pyplot.ion()
pyplot.show()
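The split point can also be found from the labels themselves instead of by counting categories. The helper below is hypothetical (it is not part of seaborn or matplotlib) and assumes the legend labels contain the variable names as title entries, which is what the + 1 in the script above implies; this may differ between seaborn versions.

```python
def split_legend_entries(handles, labels, titles):
    """Group legend entries into sections, starting a new section at each title label."""
    sections = []
    for handle, label in zip(handles, labels):
        # Open a new section at every title (and for the very first entry)
        if label in titles or not sections:
            sections.append(([], []))
        sections[-1][0].append(handle)
        sections[-1][1].append(label)
    return sections


# Hypothetical stand-ins: strings instead of real matplotlib artist handles
labels = ["time", "Lunch", "Dinner", "sex", "Male", "Female"]
handles = [f"handle_{label}" for label in labels]

color_hl, sizes_hl = split_legend_entries(handles, labels, titles={"time", "sex"})
print(color_hl[1])
print(sizes_hl[1])
```

Each section is a (handles, labels) pair, so it can be unpacked as axes.legend(*color_hl, ...) exactly like in the script above. It would misfire if a category value had the same name as one of the titles.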
My x-axis ticklabels (the ones below graph) are stealing valuable space from the overall figure. I have tried to reduce its size by changing the text rotation, but that doesn't help much since the text labels are quite long.
Is there a better approach for reducing the space taken up by the xticklabel area? For instance, could I display this text inside the bars? Thanks for your support.
My code for graph settings is:
import matplotlib.pyplot as plt
import matplotlib
matplotlib.rcParams['font.sans-serif'] = "Century Gothic"
matplotlib.rcParams['font.family'] = "Century Gothic"
ax = df1.plot.bar(x = '', y = ['Events Today', 'Avg. Events Last 30 Days'], rot = 25, width=0.8 , linewidth=1, color=['midnightblue','darkorange'])
for item in ([ax.xaxis.label, ax.yaxis.label] +
             ax.get_xticklabels() + ax.get_yticklabels()):
    item.set_fontsize(15)
ax.legend(fontsize = 'x-large', loc='best')
plt.tight_layout()
ax.yaxis.grid(True, which='major', linestyle='-', linewidth=0.15)
ax.set_facecolor('#f2f2f2')
plt.show()
When I end up with unaesthetically long xticklabels, the first and most important thing I do is try to shorten them. It might seem evident, but it's worth pointing out that using an abbreviation or a different description is often the simplest and most effective solution.
If you are stuck with long names and a certain fontsize, I recommend making a horizontal barplot instead. I generally prefer horizontal plots with longer labels, since it is easier to read text that is not rotated (which might also make it possible to reduce the fontsize one step further). Adding newlines can help as well.
Here is an example of a graph with unwieldy labels:
import pandas as pd
import seaborn as sns # to get example data easily
iris = sns.load_dataset('iris')
means = iris.groupby('species').mean()
# groupby sorts the species alphabetically, so the labels must follow that order
my_long_labels = ['looooooong_setosa', 'looooooooong_versicolor', 'looooooooong_virginica']
# Note the simpler approach of setting fontsize compared to your question
ax = means.plot(kind='bar', y=['sepal_length', 'sepal_width'], fontsize=15, rot=25)
ax.set_xlabel('')
ax.set_xticklabels(my_long_labels)
I would change this to a horizontal barplot:
ax = means.plot(kind='barh', y=['sepal_length', 'sepal_width'], fontsize=15)
ax.set_ylabel('')
ax.set_yticklabels(my_long_labels)
You could introduce newlines in the labels to further improve readability:
ax = means.plot(kind='barh', y=['sepal_length', 'sepal_width'], fontsize=15, rot=0)
ax.set_ylabel('')
ax.set_yticklabels([label.replace('_', '\n') for label in my_long_labels])
This also works with vertical bars:
ax = means.plot(kind='bar', y=['sepal_length', 'sepal_width'], fontsize=15, rot=0)
ax.set_xlabel('')
ax.set_xticklabels([label.replace('_', '\n') for label in my_long_labels])
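When the labels contain spaces rather than a convenient separator such as the underscore, the standard library's textwrap module can insert the newlines. A small sketch with made-up labels and an arbitrary width of 15 characters:

```python
import textwrap

# Hypothetical long tick labels
long_labels = ["Average events over the last 30 days", "Events recorded today"]

# Break each label at word boundaries so no line exceeds 15 characters
wrapped = [textwrap.fill(label, width=15) for label in long_labels]

for label in wrapped:
    print(label)
    print("---")
```

The wrapped strings can be passed straight to ax.set_xticklabels(wrapped) or ax.set_yticklabels(wrapped).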
Finally, you could also place the text inside the bars, but this can be difficult to read.
ax = means.plot(kind='barh', y=['sepal_length', 'sepal_width'], fontsize=15)
ax.set_ylabel('')
ax.set_yticklabels(my_long_labels, x=0.03, ha='left', va='bottom')
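A different route to the same result, assuming matplotlib 3.4 or newer, is Axes.bar_label, which centres a text inside each bar. This sketch uses plain matplotlib with made-up bar values instead of the pandas plot above, and a single series so that there is one label per bar.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical names and values for illustration
names = ["looooooong_setosa", "looooooooong_versicolor", "looooooooong_virginica"]
values = [5.0, 5.9, 6.6]

fig, ax = plt.subplots()
bars = ax.barh(range(len(names)), values, color="midnightblue")

# Hide the regular tick labels, since the names go inside the bars
ax.set_yticks([])

# Centre each name inside its bar
texts = ax.bar_label(bars, labels=names, label_type="center", color="white")
inside_labels = [t.get_text() for t in texts]
```

With grouped bars (two series per category) you would have to decide which bar of each group carries the name, which is one reason this layout gets hard to read.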