Pandas x-ticks plot not working - python

Trying to plot a graph and set x ticks but to no avail. Any help would be appreciated thanks!
postcodes = ['6000', '6003', '6005', '6006', '6007', '6008', '6009', '6010', '6011', '6012']
ax4 = data9310.plot(title='Population 2011/2016')
ax4.set_xticklabels(postcodes, rotation=0)
Result:
The label 6003 ends up in the 6000 spot, and so on. Each data point should have one of the postcodes on the x axis.

set_xticklabels assigns your labels, in order, to the major x-ticks that matplotlib created automatically, and those ticks do not necessarily sit at your data points. You can specify the x-tick positions explicitly:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#create random test data
data = pd.DataFrame(np.random.randint(1, 100, (10, 3)), columns = list("ABC"))
#your plot
postcodes = ['6000', '6003', '6005', '6006', '6007', '6008', '6009', '6010', '6011', '6012']
ax4 = data.plot(title='Population 2011/2016')
#specify that every n-th x-tick is shown, in case that with n = 1 the axis is too crowded
n = 2
#set x-ticks
ax4.set_xticks(np.arange(0, len(postcodes), n))
#label them according to your list
ax4.set_xticklabels(postcodes[::n], rotation=0)
plt.show()
Output:
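To see why the original call misplaced the labels, it can help to print the automatically chosen tick positions before overriding them. The following is only a sketch: the random data and the column names "2011" and "2016" are assumptions standing in for data9310, and the Agg backend line is there only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

postcodes = ['6000', '6003', '6005', '6006', '6007', '6008', '6009', '6010', '6011', '6012']

# Hypothetical stand-in for data9310: ten rows, one per postcode
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(1, 100, size=(10, 2)), columns=["2011", "2016"])

ax = data.plot(title="Population 2011/2016")

# Ticks chosen automatically; they need not coincide with row positions 0..9
print("default tick positions:", ax.get_xticks())

# Pin one tick to every row position, then label the ticks in the same order
positions = np.arange(len(postcodes))
ax.set_xticks(positions)
ax.set_xticklabels(postcodes, rotation=0)

# Draw once so the tick label texts are populated, then read them back
ax.figure.canvas.draw()
tick_labels = [label.get_text() for label in ax.get_xticklabels()]
print("labels now on the axis:", tick_labels)
```

With one tick per row, every postcode lands under its own data point; the n-th-tick variant is only needed when the axis gets crowded.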

Related

Creating subplots with seaborn, filtered y values

I have this plot:
Which I am plotting using:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import pydove as dv
#Read in proper dataframe
intensity = pd.read_csv('filtered_storm_vmax_real.csv', sep = ',', header= 0 )
print(intensity.head(5))
#Define variables for linear regression - intensity
x_col ='season'
y_col ='vmax'
x = intensity[x_col]
y = intensity[y_col]
x_array = np.array(x).reshape(-1,1)
y_array = np.array(y).reshape(-1,1)
sns.set_theme(style="darkgrid", palette="ch:start=.2,rot=-.3")
sns.color_palette("ch:start=.2,rot=-.3", as_cmap=True)
fig, ax = plt.subplots()
res = dv.regplot(x,y, ax=ax,scatter=True, scatter_kws={'alpha':0.25})
plt.setp(ax.collections[1], alpha=0.5)
ax.set_xlabel('Season')
ax.set_ylabel('Intensity of Storm (max wind in knots)')
ax.set_title('Intensity of Storms Over Time')
And I want to create subplots with linear regressions (like the main plot) but only show the following y-axis value intervals:
0-40
40-80
80-120
120-160
I tried to use Seaborn's FacetGrid like so:
col_ordered = ['1','2','3','4']
g = sns.FacetGrid(data=intensity, col='subplot', col_order= col_ordered, sharex='row' ,height=1.7, aspect=4)
But this didn't produce the proper y-value intervals. The plots were also laid out side by side instead of stacked on top of one another, which is what I am aiming for.

Remove for loops when plotting matplotlib subplots

I have a large subplot-based figure to produce in Python using matplotlib. In total, the figure has more than 500 individual plots, each with thousands of data points. It can be plotted using a for-loop-based approach modelled on the minimal example below:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
# define main plot names and subplot names
mains = ['A','B','C','D']
subs = list(range(9))
# generate mimic data in pd dataframe
col = [letter+str(number) for letter in mains for number in subs]
col.insert(0,'Time')
df = pd.DataFrame(columns=col)
for title in df.columns:
    df[title] = [i for i in range(100)]
# although alphabet and mains are the same in this minimal example this may not always be true
alphabet = ['A', 'B', 'C', 'D']
column_names = [column for column in df.columns if column != 'Time']
# define figure size and main gridshape
fig = plt.figure(figsize=(15, 15))
outer = gridspec.GridSpec(2, 2, wspace=0.2, hspace=0.2)
for i, letter in enumerate(alphabet):
    # define inner grid size and shape
    inner = gridspec.GridSpecFromSubplotSpec(3, 3,
                    subplot_spec=outer[i], wspace=0.1, hspace=0.1)
    # select only columns with correct letter
    plot_array = [col for col in column_names if col.startswith(letter)]
    # set title for each letter plot
    ax = plt.Subplot(fig, outer[i])
    ax.set_title(f'Letter {letter}')
    ax.axis('off')
    fig.add_subplot(ax)
    # create each subplot
    for j, col in enumerate(plot_array):
        ax = plt.Subplot(fig, inner[j])
        X = df['Time']
        Y = df[col]
        # plot waveform
        ax.plot(X, Y)
        # hide all axis ticks
        ax.axis('off')
        # set y_axis limits so all plots share same y_axis
        ax.set_ylim(df[column_names].min().min(),df[column_names].max().max())
        fig.add_subplot(ax)
However, this is slow, requiring minutes to plot the figure. Is there a more efficient (potentially for-loop-free) method to achieve the same result?
The issue with the loop is not the plotting but the setting of the axis limits with df[column_names].min().min() and df[column_names].max().max().
Testing with 6 main plots, 64 subplots and 375,000 data points, the plotting section of the example takes approximately 360 seconds to complete when the axis limits are set by searching df for the min and max values on every iteration. Moving the search for the min and max outside the loops, e.g.
# set y_lims
y_upper = df[column_names].max().max()
y_lower = df[column_names].min().min()
and changing
ax.set_ylim(df[column_names].min().min(),df[column_names].max().max())
to
ax.set_ylim(y_lower,y_upper)
reduces the plotting time to approximately 24 seconds.
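The effect of hoisting the min/max search can be checked in isolation, without any plotting. This is a rough sketch on a made-up frame (the sizes and the random data are assumptions, not the 375,000-point dataset mentioned above); actual timings will vary by machine.

```python
import time
import numpy as np
import pandas as pd

# Synthetic frame: a Time column plus 36 data columns (sizes are arbitrary)
rng = np.random.default_rng(0)
column_names = [f"{letter}{number}" for letter in "ABCD" for number in range(9)]
df = pd.DataFrame(rng.normal(size=(5000, len(column_names))), columns=column_names)
df.insert(0, "Time", np.arange(len(df)))

# Variant 1: recompute the limits once per subplot, as in the question
start = time.perf_counter()
for _ in column_names:
    limits_in_loop = (df[column_names].min().min(), df[column_names].max().max())
elapsed_in_loop = time.perf_counter() - start

# Variant 2: compute the limits once, before the loops
start = time.perf_counter()
y_lower = df[column_names].min().min()
y_upper = df[column_names].max().max()
elapsed_hoisted = time.perf_counter() - start

print(f"inside loop: {elapsed_in_loop:.4f} s, hoisted: {elapsed_hoisted:.4f} s")
```

Both variants yield the same limits; the in-loop cost grows with the number of subplots, whereas the hoisted search scans the frame only once.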

How to generate labelled barplots using seaborn?

I am a bit new to Python. And I am playing with a dummy dataset to get some Python data manipulation practice. Below is the code for generating the dummy data:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

d = {
'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0] ,
'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1] ,
'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1] ,
'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1] ,
'DistancefromBranch': [7,9,14,20,21,12,22,25,9,9,9,12,13,14,16,25,27,4,14,14,20,19,15,23,2] ,
'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0] }
CarWash = pd.DataFrame(data = d)
categoricals = ['SeniorCitizen','CollegeDegree','Married','FulltimeJob','ReversedPayment']
numerical = ['DistancefromBranch']
CarWash[categoricals] = CarWash[categoricals].astype('category')
I am basically struggling with a couple of things:
#1. A stacked barplot with absolute values (like the excel example below)
#2. A stacked barplot with percentage values (like the excel example below)
Below are my target visualizations for # 1 and # 2 using countplot().
#1
#2
For # 1, instead of a stacked barplot, with countplot() I am able to make a clustered barplot, like below, and also the annotation snippet feels more like a workaround rather than being Python elegant.
# Looping through each categorical column and viewing target variable distribution (ReversedPayment) by value
figure, axes = plt.subplots(2,2,figsize = (10,10))
for i,ax in zip(categoricals[:-1],axes.flatten()):
    sns.countplot(x= i, hue = 'ReversedPayment', data = CarWash, ax = ax)
    for p in ax.patches:
        height = np.nan_to_num(p.get_height()) # gets the height of each patch/bar
        adjust = np.nan_to_num(p.get_width())/2 # a calculation for adjusting the data label later
        label_xy = (np.nan_to_num(p.get_x()) + adjust,np.nan_to_num(p.get_height()) + adjust) #x,y coordinates where we want to put our data label
        ax.annotate(height,label_xy) # final annotation
For # 2, I tried creating a new data frame housing % values but that felt tedious and error-prone.
I feel an option like stacked = True, proportion = True, axis = 1, annotate = True could have been so useful for countplot() to have.
Are there any other libraries that would be straightforward and less code-intensive? Any comments or suggestions are welcome.
In this case, I think plotly.express may be more intuitive for you.
import plotly.express as px
df_temp = CarWash.groupby(['SeniorCitizen', 'ReversedPayment'])['DistancefromBranch'].count().reset_index().rename({'DistancefromBranch':'count'}, axis=1)
fig = px.bar(df_temp, x="SeniorCitizen", y="count", color="ReversedPayment", title="SeniorCitizen", text='count')
fig.update_traces(textposition='inside')
fig.show()
Basically, if you want more flexibility to adjust your charts, it is hard to avoid writing a lot of code.
I also tried using matplotlib and pandas to create a stacked bar chart for percentages. If you are interested, you can try it:
sns.set()
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=[12,8], dpi=100)
# Convert the axes matrix to a 1-d array
axes = ax.flatten()
for i, col in enumerate(['SeniorCitizen', 'CollegeDegree', 'Married', 'FulltimeJob']):
    # Count the ReversedPayment values within each category of col
    df_temp = (CarWash.groupby(col)['ReversedPayment']
               .value_counts()
               .unstack(1).fillna(0)
               .rename({0:'No', 1:'Yes'})
               .rename({0:'No', 1:'Yes'}, axis=1))
    # Divide each column by its total (share within each ReversedPayment value)
    df_temp = df_temp / df_temp.sum(axis=0)
    df_temp.plot.bar(stacked=True, ax=axes[i])
    axes[i].set_title(col, y=1.03, fontsize=16)
    rects = axes[i].patches
    # The patches are created column by column, so flatten the values in the same order
    labels = df_temp.values.flatten(order='F')
    for rect, label in zip(rects, labels):
        if label == 0: continue
        axes[i].text(rect.get_x() + rect.get_width() / 2, rect.get_y() + rect.get_height() / 3, '{:.2%}'.format(label),
                     ha='center', va='bottom', color='white', fontsize=12)
    axes[i].legend(title='Reversed\nPayment', bbox_to_anchor=(1.05, 1), loc='upper left', title_fontsize = 10, fontsize=10)
    axes[i].tick_params(rotation=0)
plt.tight_layout()
plt.show()
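The code above divides by column totals, so each bar does not necessarily add up to 100%. If the goal is for every bar to sum to 100%, one alternative is pd.crosstab with normalize='index', with the segment labels drawn by Axes.bar_label. This is a sketch, not the answerer's method: it assumes matplotlib 3.4 or newer (where bar_label is available) and uses only two columns of the question's dummy data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Two columns taken from the question's dummy data
CarWash = pd.DataFrame({
    'SeniorCitizen':   [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0],
    'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0],
})

# normalize='index' divides each row by its own total, so every bar sums to 1
shares = pd.crosstab(CarWash['SeniorCitizen'], CarWash['ReversedPayment'],
                     normalize='index')

ax = shares.plot.bar(stacked=True, rot=0, title='SeniorCitizen')

# One container per ReversedPayment value; label each non-empty segment with its share
for container in ax.containers:
    labels = [f'{v:.0%}' if v > 0 else '' for v in container.datavalues]
    ax.bar_label(container, labels=labels, label_type='center', color='white')

ax.legend(title='Reversed\nPayment', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
```

Dropping normalize='index' gives the absolute counts instead, which covers the first target chart with the same plotting code.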

Change titles in a for loop for plt.plot and create 6x16 subplots

secondHold = np.zeros((96,30))
channel = ['channel' for x in range(96)]
for i in range (96):
    BlankBinsx = bins[blankposition,0:30,i]
    StimBinsx = bins[NonBlankPositions,0:30,i]
    meanx = BlankBinsx.mean(axis=0);
    stimmeanx = StimBinsx.mean(axis=0);
    for j in range(30):
        hold[i][j] = meanx[j];
        secondHold[i][j] = stimmeanx[j];
    plt.subplots(1, 1, sharex='all', sharey='all')
    plt.plot(hold[i], label='stimulus')
    plt.plot(secondHold[i], label='Blank Stimulus')
    plt.title('Channel x')
    plt.xlabel('time (ms)')
    plt.ylabel('Avg Spike Rate')
    plt.legend()
    plt.show()
I am creating 96 different graphs through a for loop, and I want it to also label the graphs (i.e., the first graph would be 'Channel 1', graph two 'Channel 2', and so on). I tried ax.set_title but couldn't figure out how to make it work with the string and numbers.
Also I'd like the graphs to print as a 6x16 subplots instead of 96 graphs in a column.
You are creating a new figure on every iteration of your for loop, which is why you get 96 figures. I don't have your data, so I can't provide a final figure, but the following should work for you. The idea is:
Define a figure and an array of axes containing 6x16 subplots.
Use enumerate on axes.flatten() to iterate through the subplots ax row-wise, and use i as the index to access the data.
Use the format specifier %d to label the subplots iteratively.
Put plt.show() outside the for loop.
hold = np.zeros((96,30))
secondHold = np.zeros((96,30))
channel = ['channel' for x in range(96)]
fig, axes = plt.subplots(nrows=6, ncols=16, sharex='all', sharey='all')
for i, ax in enumerate(axes.flatten()):
    BlankBinsx = bins[blankposition,0:30,i]
    StimBinsx = bins[NonBlankPositions,0:30,i]
    meanx = BlankBinsx.mean(axis=0);
    stimmeanx = StimBinsx.mean(axis=0);
    for j in range(30):
        hold[i][j] = meanx[j];
        secondHold[i][j] = stimmeanx[j];
    ax.plot(hold[i], label='stimulus')
    ax.plot(secondHold[i], label='Blank Stimulus')
    # i starts at 0, so add 1 to get 'Channel 1' for the first subplot
    ax.set_title('Channel %d' % (i + 1))
    ax.set_xlabel('time (ms)')
    ax.set_ylabel('Avg Spike Rate')
    ax.legend()
plt.show()
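Since the original bins array is not available, here is a self-contained sketch of the same layout on random data. The arrays blank_mean and stim_mean and the figure size are assumptions; axis labels are placed only on the outer subplots and a single figure-level legend is used to reduce clutter, which is a design choice rather than a requirement.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-ins for the per-channel averages (96 channels x 30 bins)
rng = np.random.default_rng(0)
blank_mean = rng.random((96, 30))
stim_mean = rng.random((96, 30))

fig, axes = plt.subplots(nrows=6, ncols=16, sharex='all', sharey='all',
                         figsize=(32, 12))

# axes.flatten() walks the grid row by row, so i runs from 0 to 95
for i, ax in enumerate(axes.flatten()):
    ax.plot(blank_mean[i], label='blank')
    ax.plot(stim_mean[i], label='stimulus')
    ax.set_title(f'Channel {i + 1}')  # 1-based channel number

# Label only the outer subplots; the shared axes make the inner labels redundant
for ax in axes[-1, :]:
    ax.set_xlabel('time (ms)')
for ax in axes[:, 0]:
    ax.set_ylabel('Avg Spike Rate')

# One legend for the whole figure instead of 96 identical ones
handles, labels = axes[0, 0].get_legend_handles_labels()
fig.legend(handles, labels, loc='upper right')
```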

Matplotlib bar chart show x-ticks only at non-zero bars

I have to make a (stacked) bar plot that has ~3000 positions on the x axis. However, many of these positions do not contain bars but are still labeled on the x-axis, making reading the plot difficult. Is there any way to only show x-ticks for existing (stacked) bars? The spaces between the bars based on the x-tick values are necessary. How would one tackle this in matplotlib? Is there a more fitting plot than a stacked bar chart? I'm constructing the plots from a pandas cross-table (pd.crosstab()).
link to image of plot:
https://i.stack.imgur.com/qk99z.png
as an example of what my dataframe would look like (thanks gepcel):
import pandas as pd
import numpy as np
N = 3200
df = pd.DataFrame(np.random.randint(1, 5, size=(N, 3)))
df.loc[np.random.choice(df.index, size=3190, replace=False), :] = 0
df_select = df[df.sum(axis=1)>0]
Without an example it is hard to be specific, but basically you should select the positions whose total (i.e., stacked) value is greater than zero, and then set the xticks and xticklabels manually.
Let's say you have a dataframe like the following:
import pandas as pd
import numpy as np
N = 3200
df = pd.DataFrame(np.random.randint(1, 5, size=(N, 3)))
df.loc[np.random.choice(df.index, size=3190, replace=False), :] = 0
Then the selected data should be something like this:
df_select = df[df.sum(axis=1)>0]
And then you can plot a stacked bar plot like:
import matplotlib.pyplot as plt
# set width=20 so that the bars are not too thin to see
plt.bar(df_select.index, df_select[0], width=20, label='0')
plt.bar(df_select.index, df_select[1], width=20, label='1',
bottom=df_select[0])
plt.bar(df_select.index, df_select[2], width=20, label='2',
bottom=df_select[0]+df_select[1])
# Only show the selected ticks, it'll be a little tricky if
# you want ticklabels to be different than ticks
# And still hard to avoid ticklabels overlapping
plt.xticks(df_select.index)
plt.legend()
plt.show()
The result should be something like this:
UPDATE:
It's easy to put text on top of the bars:
for n, row in df_select.iterrows():
    plt.text(n, row.sum()+0.2, n, ha='center', rotation=90, va='bottom')
This computes the top of each bar (row.sum()), adds a small offset (here +0.2) and places the text there; rotation=90 controls the rotation. The full code is:
df_select = df[df.sum(axis=1)>0]
plt.bar(df_select.index, df_select[0], width=20, label='0')
plt.bar(df_select.index, df_select[1], width=20, label='1',
bottom=df_select[0])
plt.bar(df_select.index, df_select[2], width=20, label='2',
bottom=df_select[0]+df_select[1])
# Here is the part to put text:
for n, row in df_select.iterrows():
    plt.text(n, row.sum()+0.2, n, ha='center', rotation=90, va='bottom')
plt.xticks(df_select.index)
plt.legend()
plt.show()
And a result:
Here's a twist on gepcel's answer that adapts to a dataframe with a varying number of columns:
# in this case I'm creating the dataframe with 3 columns
# but the code is meant to adapt to dataframes with varying column numbers
df = pd.DataFrame(np.random.randint(1, 5, size=(3200, 3)))
df.loc[np.random.choice(df.index, size=3190, replace=False), :] = 0
df_select = df[df.sum(axis=1)>1]
fig, ax = plt.subplots()
ax.bar(df_select.index, df_select.iloc[:,0], label = df_select.columns[0])
if df_select.shape[1] > 1:
    for i in range(1, df_select.shape[1]):
        bottom = df_select.iloc[:,np.arange(0,i,1)].sum(axis=1)
        ax.bar(df_select.index, df_select.iloc[:,i], bottom=bottom,
               label=df_select.columns[i])
ax.set_xticks(df_select.index)
plt.legend(loc='best', bbox_to_anchor=(1, 0.5))
plt.xticks(rotation=90, fontsize=8)
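The per-column bottom values can also be derived in one step from a cumulative sum, which removes the special case for the first column. This is a sketch under the same assumptions as the answers above (random data with 3 columns and 10 non-zero rows); the names tops and bottoms are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 5, size=(3200, 3)))
df.loc[rng.choice(df.index, size=3190, replace=False), :] = 0
df_select = df[df.sum(axis=1) > 0]

# Running total across the columns gives the top of every segment;
# subtracting the segment's own height gives the height at which it starts
tops = df_select.cumsum(axis=1)
bottoms = tops - df_select

fig, ax = plt.subplots()
for column in df_select.columns:
    ax.bar(df_select.index, df_select[column], width=20,
           bottom=bottoms[column], label=str(column))

# Ticks only where a bar exists; the x positions keep their original spacing
ax.set_xticks(df_select.index)
ax.tick_params(axis='x', labelrotation=90, labelsize=8)
ax.legend()
```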
