Minor ticks in pandas plot - python

So I am trying to get minor tick grid lines to get displayed but they don't seem to appear on the plot. An example code is
data_temp = pd.read_csv(dir_readfile, dtype=float, delimiter='\t',
names = names, usecols=[0,1,2,3,4])
result = data_temp.groupby(['A', 'D']).agg({'B':'mean', 'E':'mean'})
result2 = result.unstack()
x = np.arange(450, 700, 50, dtype = int)
plt.grid(True, which='both')
plt.minorticks_on()
result2.B.plot(lw=2,colormap='jet',marker='.',markersize=4,
title='A v/s B', legend = True, grid = 'on' ,
xlim = [450, 700], ylim = [-70, -0], xticks = x)
What I get is
The major grid lines are displayed but the minor ones are not. I looked into the pandas documentation but only see the grid option. I was hoping to get the minor tick grid lines at every 10th location on the X axis, that is 460, 470, etc., and at every location on the Y axis (the actual scale of Y is a bit smaller).

Before plt.show(), add plt.minorticks_on().
If you want minor ticks on a selected axis only, turn them off for the other one. For example, this hides the minor ticks on the x axis:
ax = plt.gca()
ax.tick_params(axis='x', which='minor', bottom=False)
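For the spacing asked about in the question (minor grid lines every 10 units on x), one option is to set a minor locator on the Axes that pandas returns and then enable the minor grid on that same Axes. The following is only a minimal sketch with made-up data: the DataFrame `df`, its column names and the line styles are assumptions, not the asker's data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
import pandas as pd

# Hypothetical stand-in data indexed by the x values
x = np.arange(450, 701, 10)
df = pd.DataFrame(
    {"a": np.linspace(-60, -10, x.size), "b": np.linspace(-50, -5, x.size)},
    index=x,
)

# pandas returns the matplotlib Axes it drew on
ax = df.plot(lw=2, marker=".", xlim=(450, 700), ylim=(-70, 0),
             xticks=np.arange(450, 701, 50))

# Minor ticks every 10 units on x, automatic subdivision on y
ax.xaxis.set_minor_locator(ticker.MultipleLocator(10))
ax.yaxis.set_minor_locator(ticker.AutoMinorLocator())

# Draw both grids on the same Axes
ax.grid(True, which="major", linewidth=0.8)
ax.grid(True, which="minor", linewidth=0.3, linestyle=":")

plt.savefig("minor_grid.png")
```

Working on the returned `ax` rather than calling `plt.grid` before the plot exists avoids any doubt about which Axes the grid settings end up on.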


Python (twinx plot) Wacky graph

Edit: The graph is fixed now but I am having troubles plotting the legend. It only shows legend for 1 of the plots. As seen in the picture below
I am trying to plot a double-axis graph with twinx, but I am facing some difficulties, as seen in the picture below (compared to the original before plotting the z series).
Any input is welcome! If you require any additional information, I am happy to provide it.
I am unsure why my graph looks like that: before plotting my secondary y-axis (the pink line), the closing value graph could be seen perfectly, but now it seems cut.
It may be due to my data as provided below.
Link to testing1.csv: https://filebin.net/ou93iqiinss02l0g
Code I have currently:
# read csv into variable
sg_df_merged = pd.read_csv("testing1.csv", parse_dates=[0], index_col=0)
# define figure
fig = plt.figure()
fig, ax5 = plt.subplots()
ax6 = ax5.twinx()
x = sg_df_merged.index
y = sg_df_merged["Adj Close"]
z = sg_df_merged["Singapore"]
curve1 = ax5.plot(x, y, label="Singapore", color = "c")
curve2 = ax6.plot(x, z, label = "Face Mask Compliance", color = "m")
curves = [curve1, curve2]
# labels for my axis
ax5.set_xlabel("Year")
ax5.set_ylabel("Adjusted Closing Value ($)")
ax6.set_ylabel("% compliance to wearing face mask")
ax5.grid #not sure what this line does actually
# set x-axis values to 45 degree angle
for label in ax5.xaxis.get_ticklabels():
    label.set_rotation(45)
ax5.grid(True, color = "k", linestyle = "-", linewidth = 0.3)
plt.gca().legend(loc='center left', bbox_to_anchor=(1.1, 0.5), title = "Country Index")
plt.show();
Initially, I thought it was due to my excel having entire blank lines but I have since removed the rows which can be found here
Also, I have tried to interpolate but somehow it doesn't work. Any suggestions on this is very much welcomed
Only rows that were all NaN were dropped. There are still a lot of rows containing NaN.
In order for matplotlib to draw connecting lines between two data points, the points must be consecutive.
The plot API isn't connecting the data between the NaN values
This can be dealt with by converting the pandas.Series to a DataFrame, and using .dropna.
See that x has been dropped, because it will not match the index length of y or z. They are shorter after .dropna.
y is now a separate dataframe, where .dropna is used.
z is also a separate dataframe, where .dropna is used.
The x-axis for the plot are the respective indices.
# read csv into variable
sg_df_merged = pd.read_csv("testing1.csv", parse_dates=[0], index_col=0)
# define figure
fig, ax5 = plt.subplots(figsize=(8, 6))
ax6 = ax5.twinx()
# select specific columns to plot and drop additional NaN
y = pd.DataFrame(sg_df_merged["Adj Close"]).dropna()
z = pd.DataFrame(sg_df_merged["Singapore"]).dropna()
# add plots with markers
curve1 = ax5.plot(y.index, 'Adj Close', data=y, label="Singapore", color = "c", marker='o')
curve2 = ax6.plot(z.index, 'Singapore', data=z, label = "Face Mask Compliance", color = "m", marker='o')
# labels for my axis
ax5.set_xlabel("Year")
ax5.set_ylabel("Adjusted Closing Value ($)")
ax6.set_ylabel("% compliance to wearing face mask")
# rotate xticks
ax5.xaxis.set_tick_params(rotation=45)
# add a grid to ax5
ax5.grid(True, color = "k", linestyle = "-", linewidth = 0.3)
# create a legend for both axes
curves = curve1 + curve2
labels = [l.get_label() for l in curves]
ax5.legend(curves, labels, loc='center left', bbox_to_anchor=(1.1, 0.5), title = "Country Index")
plt.show()
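To see the NaN behaviour in isolation, here is a small sketch with a made-up Series (the dates and values are assumptions, not the data from testing1.csv). It also shows that calling .dropna() directly on the Series should work just as well as wrapping it in a DataFrame first.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical series with gaps, standing in for one CSV column
idx = pd.date_range("2020-01-01", periods=6, freq="D")
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0], index=idx)

fig, ax = plt.subplots()

# Plotted as-is: no two valid points are adjacent, so no segment is drawn
(gappy,) = ax.plot(s.index, s.values, color="c", label="with NaN")

# After dropna the remaining points are consecutive and get connected
clean = s.dropna()
(joined,) = ax.plot(clean.index, clean.values, color="m", marker="o",
                    label="NaN dropped")

ax.legend()
plt.savefig("nan_gaps.png")
```

Each plotted series uses its own index for x, which is why the shared `x` variable from the question is no longer needed.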

Matplotlib - How to draw a line from the top of any bar to the y - axis?

I created a cumulative histogram. Now I want to draw a line from top of any bin to the y-axis in that histogram and show the value of it like this:
Can you show me how to do this?
Below is my code to draw that histogram:
plt.rcParams['ytick.right'] = plt.rcParams['ytick.labelright'] = True
plt.rcParams['ytick.left'] = plt.rcParams['ytick.labelleft'] = False
plt.figure(figsize=[8, 6])
plt.hist(df['days'], bins=range(0, 50, 1), color="dodgerblue", edgecolor='black'
,cumulative=-1, density=True
,histtype='barstacked')
plt.xlabel('Number of Days')
plt.ylabel('Density')
Thank you so much!
Oneliner:
plt.axhline(y, color='k', linestyle='dashed', linewidth=1)
Use this to add a horizontal line to your histogram.
Place your mean or value of y in place of y in the above code snippet.
Simply drawing a horizontal line raises two problems:
The line will be drawn on top of the bars, from the left to the right. To have it behind the bars, use zorder=0.
The line will still be visible at the far left, as there are no bars there. Changing the x-axis to a "tight" layout with plt.autoscale(enable=True, axis='x', tight=True) solves that.
To add a new tick at the specific y-position, you can take the list of existing ticks, create a list including the new tick and set those as the new ticks.
To change the color of the newly added tick, you first find its index in the list, and then change the color of the tick with that index.
One problem with this approach, is that the new tick might overlap with an existing tick. This could be solved by looping through the list and if an existing tick is nearer than some epsilon to the new tick, remove the existing tick. This is not yet implemented in the code example.
Alternatively, the tick value could be displayed to the left of the axis, on top of the horizontal line. Of course, that would lead to a problem in case there wouldn't be enough place for the text.
You might want to round the value of the special tick to the nearest hundredth, so that the other ticks are not also displayed with extra digits.
I created an example with simulated data:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({"days": np.random.normal(25, 10, 10000)})
plt.rcParams['ytick.right'] = plt.rcParams['ytick.labelright'] = True
plt.rcParams['ytick.left'] = plt.rcParams['ytick.labelleft'] = False
plt.figure(figsize=[8, 6])
bin_heights, _, _ = plt.hist(df['days'], bins=range(0, 50, 1), color="dodgerblue", edgecolor='black',
cumulative=-1, density=True,
histtype='barstacked')
plt.autoscale(enable=True, axis='both', tight=True) # use axis='x' to only set the x axis tight
special_y = bin_heights[15]
# draw a horizontal line, use zorder=0 so it is drawn behind the bars
plt.axhline(special_y, 0, 1, color='red', linestyle='dashed', linewidth=1, zorder=0)
plt.yticks(list(plt.yticks()[0]) + [special_y]) # add a tick in y for special_y
# find the index of special_y in the new ticks (ticks are sorted automatically)
index_special_y = list(plt.yticks()[0]).index(special_y)
plt.gca().get_yticklabels()[index_special_y].set_color('red') # change the color of the special tick
plt.xlabel('Number of Days')
plt.ylabel('Density')
plt.show()
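The overlap handling described above (dropping an existing tick that lies within some epsilon of the new one) is not part of the example. A possible sketch follows; the helper name `add_special_tick`, the tick values and the epsilon are all made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def add_special_tick(ticks, special, eps):
    """Return sorted ticks including `special`, without ticks closer than `eps` to it."""
    kept = [t for t in ticks if abs(t - special) >= eps]
    return sorted(kept + [special])


fig, ax = plt.subplots()
ax.set_ylim(0, 1)

existing = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]  # hypothetical current ticks
special_y = 0.41                            # hypothetical bar height to highlight

# 0.4 is only 0.01 away from 0.41, so it is removed before the new tick is added
new_ticks = add_special_tick(existing, special_y, eps=0.03)
ax.set_yticks(new_ticks)

# Same styling as in the answer: dashed line behind the bars, red tick label
ax.axhline(special_y, color="red", linestyle="dashed", linewidth=1, zorder=0)
ax.get_yticklabels()[new_ticks.index(special_y)].set_color("red")

plt.savefig("special_tick.png")
```

In the histogram example, `existing` would be `list(plt.yticks()[0])` and `special_y` the chosen bin height; a sensible epsilon depends on the font size and the axis range.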

Seaborn: add counts to countplot? [duplicate]

I have a Pandas DataFrame with a column called "AXLES", which can take an integer value between 3-12. I am trying to use Seaborn's countplot() option to achieve the following plot:
left y axis shows the frequencies of these values occurring in the data. The axis extends are [0%-100%], tick marks at every 10%.
right y axis shows the actual counts, values correspond to tick marks determined by the left y axis (marked at every 10%.)
x axis shows the categories for the bar plots [3, 4, 5, 6, 7, 8, 9, 10, 11, 12].
Annotation on top of the bars show the actual percentage of that category.
The following code gives me the plot below, with actual counts, but I could not find a way to convert them into frequencies. I can get the frequencies using df.AXLES.value_counts()/len(df.index) but I am not sure about how to plug this information into Seaborn's countplot().
I also found a workaround for the annotations, but I am not sure if that is the best implementation.
Any help would be appreciated!
Thanks
plt.figure(figsize=(12,8))
ax = sns.countplot(x="AXLES", data=dfWIM, order=[3,4,5,6,7,8,9,10,11,12])
plt.title('Distribution of Truck Configurations')
plt.xlabel('Number of Axles')
plt.ylabel('Frequency [%]')
for p in ax.patches:
    ax.annotate('%{:.1f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
EDIT:
I got closer to what I need with the following code, using Pandas' bar plot, ditching Seaborn. Feels like I'm using so many workarounds, and there has to be an easier way to do it. The issues with this approach:
There is no order keyword in Pandas' bar plot function as Seaborn's countplot() has, so I cannot plot all categories from 3-12 as I did in the countplot(). I need to have them shown even if there is no data in that category.
The secondary y-axis messes up the bars and the annotation for some reason (see the white gridlines drawn over the text and bars).
plt.figure(figsize=(12,8))
plt.title('Distribution of Truck Configurations')
plt.xlabel('Number of Axles')
plt.ylabel('Frequency [%]')
ax = (dfWIM.AXLES.value_counts()/len(df)*100).sort_index().plot(kind="bar", rot=0)
ax.set_yticks(np.arange(0, 110, 10))
ax2 = ax.twinx()
ax2.set_yticks(np.arange(0, 110, 10)*len(df)/100)
for p in ax.patches:
    ax.annotate('{:.2f}%'.format(p.get_height()), (p.get_x()+0.15, p.get_height()+1))
You can do this by making a twinx axes for the frequencies. You can switch the two y axes around so the frequencies stay on the left and the counts on the right, without having to recalculate the counts axis (here we use tick_left() and tick_right() to move the ticks and set_label_position to move the axis labels).
You can then set the ticks using the matplotlib.ticker module, specifically ticker.MultipleLocator and ticker.LinearLocator.
As for your annotations, you can get the x and y locations for all 4 corners of the bar with patch.get_bbox().get_points(). This, along with setting the horizontal and vertical alignment correctly, means you don't need to add any arbitrary offsets to the annotation location.
Finally, you need to turn the grid off for the twinned axis, to prevent grid lines showing up on top of the bars (ax2.grid(None))
Here is a working script:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import matplotlib.ticker as ticker
# Some random data
dfWIM = pd.DataFrame({'AXLES': np.random.normal(8, 2, 5000).astype(int)})
ncount = len(dfWIM)
plt.figure(figsize=(12,8))
ax = sns.countplot(x="AXLES", data=dfWIM, order=[3,4,5,6,7,8,9,10,11,12])
plt.title('Distribution of Truck Configurations')
plt.xlabel('Number of Axles')
# Make twin axis
ax2=ax.twinx()
# Switch so count axis is on right, frequency on left
ax2.yaxis.tick_left()
ax.yaxis.tick_right()
# Also switch the labels over
ax.yaxis.set_label_position('right')
ax2.yaxis.set_label_position('left')
ax2.set_ylabel('Frequency [%]')
for p in ax.patches:
    x=p.get_bbox().get_points()[:,0]
    y=p.get_bbox().get_points()[1,1]
    ax.annotate('{:.1f}%'.format(100.*y/ncount), (x.mean(), y),
                ha='center', va='bottom') # set the alignment of the text
# Use a LinearLocator to ensure the correct number of ticks
ax.yaxis.set_major_locator(ticker.LinearLocator(11))
# Fix the frequency range to 0-100
ax2.set_ylim(0,100)
ax.set_ylim(0,ncount)
# And use a MultipleLocator to ensure a tick spacing of 10
ax2.yaxis.set_major_locator(ticker.MultipleLocator(10))
# Need to turn the grid on ax2 off, otherwise the gridlines end up on top of the bars
ax2.grid(None)
plt.savefig('snscounter.pdf')
I got it to work using core matplotlib's bar plot. I didn't have your data, obviously, but adapting it to yours should be straightforward.
Approach
I used matplotlib's twin axis and plotted the data as bars on the second Axes object. The rest is just some fiddling around to get the ticks right and make the annotations.
Hope this helps.
Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns
tot = np.random.rand() * 100  # scalar total, so it can be used as an axis limit
data = np.random.rand( 1, 12 )
data = data / data.sum( axis=1 ) * tot  # scale the row so that it sums to tot
df = pd.DataFrame( data )
palette = sns.husl_palette(9, s=0.7 )
### Left Axis
# Plot nothing here, automatically scales to second axis.
fig, ax1 = plt.subplots()
ax1.set_ylim( [0,100] )
# Remove grid lines.
ax1.grid( False )
# Set ticks and add percentage sign.
ax1.yaxis.set_ticks( np.arange(0,101,10) )
fmt = '%.0f%%'
yticks = matplotlib.ticker.FormatStrFormatter( fmt )
ax1.yaxis.set_major_formatter( yticks )
### Right Axis
# Plot data as bars.
x = np.arange(0,9,1)
ax2 = ax1.twinx()
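# Centre the bars on the tick positions (align='center' is the default in matplotlib >= 2.0).
rects = ax2.bar( x, np.asarray(df.loc[0,3:]), width=0.8, align='center' )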
# Set ticks on x-axis and remove grid lines.
ax2.set_xlim( [-0.5,8.5] )
ax2.xaxis.set_ticks( x )
ax2.xaxis.grid( False )
# Set ticks on y-axis in 10% steps.
ax2.set_ylim( [0,tot] )
ax2.yaxis.set_ticks( np.linspace( 0, tot, 11 ) )
# Add labels and change colors.
for i,r in enumerate(rects):
    h = r.get_height()
    r.set_color( palette[ i % len(palette) ] )
    ax2.text( r.get_x() + r.get_width()/2.0, \
              h + 0.01*tot, \
              r'%d%%'%int(100*h/tot), ha = 'center' )
I think you can first set the y major ticks manually and then modify each label
dfWIM = pd.DataFrame({'AXLES': np.random.randint(3, 10, 1000)})
total = len(dfWIM)*1.
plt.figure(figsize=(12,8))
ax = sns.countplot(x="AXLES", data=dfWIM, order=[3,4,5,6,7,8,9,10,11,12])
plt.title('Distribution of Truck Configurations')
plt.xlabel('Number of Axles')
plt.ylabel('Frequency [%]')
for p in ax.patches:
    ax.annotate('{:.1f}%'.format(100*p.get_height()/total), (p.get_x()+0.1, p.get_height()+5))
#put 11 ticks (therefore 10 steps), from 0 to the total number of rows in the dataframe
ax.yaxis.set_ticks(np.linspace(0, total, 11))
#adjust the ticklabel to the desired format, without changing the position of the ticks.
_ = ax.set_yticklabels(list(map('{:.1f}%'.format, 100*ax.yaxis.get_majorticklocs()/total)))
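For the pandas-only route from the question's edit, the missing `order` keyword can probably be worked around by reindexing the value counts onto the full list of categories before plotting. This is a sketch with random stand-in data; the category range 3 to 12 comes from the question, everything else is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: only the values 3..9 occur, so 10, 11 and 12 are empty categories
rng = np.random.default_rng(0)
dfWIM = pd.DataFrame({"AXLES": rng.integers(3, 10, 1000)})

order = list(range(3, 13))

# Relative frequencies in percent, with absent categories filled in as 0
freq = (
    dfWIM["AXLES"]
    .value_counts(normalize=True)
    .reindex(order, fill_value=0.0)
    * 100
)

ax = freq.plot(kind="bar", rot=0, figsize=(12, 8))
ax.set_yticks(np.arange(0, 110, 10))
ax.set_xlabel("Number of Axles")
ax.set_ylabel("Frequency [%]")

plt.savefig("axles_freq.png")
```

Because reindex also fixes the order, the separate sort_index() call is no longer needed.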

matplotlib, formating space between bars/x labels [duplicate]

How do I increase the space between each bar in a matplotlib bar chart? The bars keep cramming themselves into the centre (this is what it currently looks like).
import ast  # needed for ast.literal_eval below
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def ww(self):  # wrongwords text file
    with open("wrongWords.txt") as file:
        array1 = []
        array2 = []
        for element in file:
            array1.append(element)
        x = array1[0]
        s = x.replace(')(', '),(')  # put a comma between adjacent tuples so the line can be parsed
        print(s)
        my_list = ast.literal_eval(s)
        print(my_list)
        my_dict = {}
        for item in my_list:
            my_dict[item[2]] = my_dict.get(item[2], 0) + 1
        plt.bar(range(len(my_dict)), my_dict.values(), align='center')
        plt.xticks(range(len(my_dict)), my_dict.keys())
        plt.show()
Try replacing
plt.bar(range(len(my_dict)), my_dict.values(), align='center')
with
plt.figure(figsize=(20, 3)) # width:20, height:3
plt.bar(range(len(my_dict)), my_dict.values(), align='edge', width=0.3)
The option align='edge' aligns the left edge of each bar with its x position, which removes the white space on the left of the bar chart, and width=0.3 makes the bars narrower than the default.
The labels along the x-axis should be rotated 90 degrees to keep them readable:
plt.xticks(range(len(my_dict)), my_dict.keys(), rotation='vertical')
There are two ways to increase the space between the bars.
For reference, here is the signature of plt.bar:
plt.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)
Decrease the width of the bar
plt.bar has a width parameter that controls the width of each bar. If you decrease the width, the space between the bars automatically increases. You are currently using the default width of 0.8.
width = 0.5
Scale the x-axis so the bars are placed further apart from each other
If you want to keep the width constant you will have to space out where the bars are placed on x-axis. You can use any scaling parameter. For example
x = (range(len(my_dict)))
new_x = [2*i for i in x]
# you might have to increase the size of the figure
plt.figure(figsize=(20, 3)) # width: 20, height: 3
plt.bar(new_x, my_dict.values(), align='center', width=0.8)
This answer changes the space between the bars and also rotates the labels on the x-axis. It also lets you change the figure size.
fig, ax = plt.subplots(figsize=(20,20))
# The first parameter would be the x value,
# by editing the delta between the x-values
# you change the space between bars
plt.bar([i*2 for i in range(100)], y_values)
# The first parameter is the same as above,
# but the second parameter are the actual
# texts you wanna display
plt.xticks([i*2 for i in range(100)], labels)
for tick in ax.get_xticklabels():
    tick.set_rotation(90)
Set your x-axis limits to run from a slightly negative value to a value slightly larger than the number of bars in your plot, and change the width of the bars in the bar plot command.
For example, I did this for a bar plot with just two bars:
ax1.axes.set_xlim(-0.5,1.5)
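The two knobs discussed in these answers (bar positions and bar width) can be combined in one place. The sketch below uses a made-up dictionary in place of `my_dict`; the `step` and `width` values are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical counts standing in for my_dict
counts = {"alpha": 3, "beta": 5, "gamma": 2}

step = 2.0   # distance between neighbouring bar centres
width = 0.8  # bar width; the visible gap between bars is step - width

positions = [i * step for i in range(len(counts))]

fig, ax = plt.subplots(figsize=(8, 3))
bars = ax.bar(positions, list(counts.values()), width=width, align="center")

# Put one rotated label under each bar
ax.set_xticks(positions)
ax.set_xticklabels(list(counts.keys()), rotation=90)

# Leave half a step of margin on both sides
ax.set_xlim(positions[0] - step / 2, positions[-1] + step / 2)

gap = step - width
fig.tight_layout()
plt.savefig("spaced_bars.png")
```

Increasing `step` or decreasing `width` both widen the gap; which one looks better depends on how many bars have to fit into the figure.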
