I am a bit new to Python. And I am playing with a dummy dataset to get some Python data manipulation practice. Below is the code for generating the dummy data:
d = {
'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0] ,
'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1] ,
'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1] ,
'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1] ,
'DistancefromBranch': [7,9,14,20,21,12,22,25,9,9,9,12,13,14,16,25,27,4,14,14,20,19,15,23,2] ,
'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0] }
CarWash = pd.DataFrame(data = d)
categoricals = ['SeniorCitizen','CollegeDegree','Married','FulltimeJob','ReversedPayment']
numerical = ['DistancefromBranch']
CarWash[categoricals] = CarWash[categoricals].astype('category')
I am basically struggling with a couple of things:
#1. A stacked barplot with absolute values (like the excel example below)
#2. A stacked barplot with percentage values (like the excel example below)
Below are my target visualizations for # 1 and # 2 using countplot().
#1
#2
For # 1, instead of a stacked barplot, with countplot() I am able to make a clustered barplot, like below, and also the annotation snippet feels more like a workaround rather than being Python elegant.
# Looping through each categorical column and viewing target variable distribution (ReversedPayment) by value
figure, axes = plt.subplots(2,2,figsize = (10,10))
for i,ax in zip(categoricals[:-1],axes.flatten()):
sns.countplot(x= i, hue = 'ReversedPayment', data = CarWash, ax = ax)
for p in ax.patches:
height = np.nan_to_num(p.get_height()) # gets the height of each patch/bar
adjust = np.nan_to_num(p.get_width())/2 # a calculation for adusting the data label later
label_xy = (np.nan_to_num(p.get_x()) + adjust,np.nan_to_num(p.get_height()) + adjust) #x,y coordinates where we want to put our data label
ax.annotate(height,label_xy) # final annotation
For # 2, I tried creating a new data frame housing % values but that felt tedious and error-prone.
I feel an option like stacked = True, proportion = True, axis = 1, annotate = True could have been so useful for countplot() to have.
Are there any other libraries that would be straight-froward and less code-intensive? Any comments or suggestions are welcome.
In this case, I think plotly.express may be more intuitive for you.
import plotly.express as px
df_temp = CarWash.groupby(['SeniorCitizen', 'ReversedPayment'])['DistancefromBranch'].count().reset_index().rename({'DistancefromBranch':'count'}, axis=1)
fig = px.bar(df_temp, x="SeniorCitizen", y="count", color="ReversedPayment", title="SeniorCitizen", text='count')
fig.update_traces(textposition='inside')
fig.show()
Basically, if you want to have more flexibility to adjust your charts, it is hard to avoid writing lots of codes.
I also try using matplotlib and pandas to create a stacked bar chart for percentages. If you are interested in it, you can try it.
sns.set()
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=[12,8], dpi=100)
# Conver the axes matrix to a 1-d array
axes = ax.flatten()
for i, col in enumerate(['SeniorCitizen', 'CollegeDegree', 'Married', 'FulltimeJob']):
# Calculate the number of plots
df_temp = (CarWash.groupby(col)['ReversedPayment']
.value_counts()
.unstack(1).fillna(0)
.rename({0:f'No', 1:f'Yes'})
.rename({0:'No', 1:'Yes'}, axis=1))
df_temp = df_temp / df_temp.sum(axis=0)
df_temp.plot.bar(stacked=True, ax=axes[i])
axes[i].set_title(col, y=1.03, fontsize=16)
rects = axes[i].patches
labels = df_temp.values.flatten()
for rect, label in zip(rects, labels):
if label == 0: continue
axes[i].text(rect.get_x() + rect.get_width() / 2, rect.get_y() + rect.get_height() / 3, '{:.2%}'.format(label),
ha='center', va='bottom', color='white', fontsize=12)
axes[i].legend(title='Reversed\nPayment', bbox_to_anchor=(1.05, 1), loc='upper left', title_fontsize = 10, fontsize=10)
axes[i].tick_params(rotation=0)
plt.tight_layout()
plt.show()
I am making an application where the users inputs data and the results are displayed on a chart. I want the plot to update each time new data is input, but updating the plot is not happening.
The plot is produce using the matlib plot library and the interface is created using the tkinter package. The way it is done is uses a function to create the plot as I want to call this several times. What I think could be the problem is the way the function returns the variable, but I can't put my finger on what or why this is wrong.
I have looked on here for solutions and none of the ones I have read seem to work. I have tried to update my axes, used patlibplot.figure instead of matlibplot.plot , adding canvas.daw() to the end and even tried to get a new way of producing the plot.
This is the function that creates the plot:
import matplotlib.pyplot as plt
def PlotConsumption(fuelEntry, plotConfig):
'''
Produce a line plot of the consumption
:params object fuelEntry: all of the fuel entry information
:params object plotConfig: the config object for the plot
'''
# create a data frame for plotting
df = pd.DataFrame(fuelEntry)
df = df.sort_values(["sort"]) # ensure the order is correct
df["dateTime"] = pd.to_datetime(df["date"], format="%d/%m/%Y") # create a date-time version of the date string
if plotConfig.xName == "date": # change the indexing for dates
ax = df.plot(x = "dateTime", y = plotConfig.yName, marker = '.', rot = 90, title = plotConfig.title)
else:
ax = df.plot(x = plotConfig.xName, y = plotConfig.yName, marker = '.', title = plotConfig.title)
# calculate the moving average if applicable
if plotConfig.moveCalc == True and plotConfig.move > 1 and len(df) > plotConfig.move:
newName = plotConfig.yName + "_ma"
df[newName] = df[plotConfig.yName].rolling(window = plotConfig.move).mean()
if plotConfig.xName == "date":
df.plot(x = "dateTime", y = newName, marker = "*", ax = ax)
else:
df.plot(x = plotConfig.xName, y = newName, marker = "*", ax = ax)
# tidy up the plot
ax.set(xlabel = plotConfig.xLabel, ylabel = plotConfig.yLabel)
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90)
L=plt.legend()
L.get_texts()[0].set_text(plotConfig.yLegend)
if len(L.get_texts()) == 2: # only process if there is a moving average plot
L.get_texts()[1].set_text(plotConfig.moveLegend)
return plt # also tried with returning ax
And this is where it is used:
self.plot = displayCharts.PlotConsumption(self.data.carEntry["fuelEntry"], self.plotConfig)
#fig = plt.figure(1) # also tried creating fig this way
fig = self.plot.figure(1)
self.plotCanvas = FigureCanvasTkAgg(fig, master = self.master)
self.plotWidget = self.plotCanvas.get_tk_widget()
self.plotWidget.grid(row = 0, column = 1, rowspan = 3, sticky = tk.N)
fig.canvas.draw()
I expect the plot to update automatically but instead it does nothing. I know the data is read and processed as if you close the application, start it up again and then load the same data file the plot produced is correct.
I revisited some of the answers I initially looked at and picked my way through them and this one is the best I found and it actually does what I want!
There were a few sublte differences that I needed to change:
Create the figure and canvas to start with
Use these variables to create the plot
I copied the linked solution to see how it works and managed to get it to work. Now the next thing is to get it to plot with my data! But now that I have got it updating it should be easier to finish off.
secondHold = np.zeros((96,30))
channel = ['channel' for x in range(96)]
for i in range (96):
BlankBinsx = bins[blankposition,0:30,i]
StimBinsx = bins[NonBlankPositions,0:30,i]
meanx = BlankBinsx.mean(axis=0);
stimmeanx = StimBinsx.mean(axis=0);
for j in range(30):
hold[i][j] = meanx[j];
secondHold[i][j] = stimmeanx[j];
plt.subplots(1, 1, sharex='all', sharey='all')
plt.plot(hold[i], label='stimulus')
plt.plot(secondHold[i], label='Blank Stimulus')
plt.title('Channel x')
plt.xlabel('time (ms)')
plt.ylabel('Avg Spike Rate')
plt.legend()
plt.show()
I am creating 96 different graphs through a for-loop and I want it to also label the graphs (i.e., the first graph would be 'Channel 1', graph two 'Channel 2' and so on. I tried ax.set_title but couldn't figure it out how to make it work with the string and numbers.
Also I'd like the graphs to print as a 6x16 subplots instead of 96 graphs in a column.
You are creating a new figure each time in your for loop that's why you get 96 figures. I don't have your data so I can't provide a final figure but the following should work for you. The idea here is:
Define a figure and an array of axes containing 6x16 subplots.
Use enumerate on axes.flatten to iterate through the subfigures ax row wise and use i as the index to access the data.
Use the field specifier %d to label the subplots iteratively.
Put plt.show() outside the for loop
secondHold = np.zeros((96,30))
channel = ['channel' for x in range(96)]
fig, axes = plt.subplots(nrows=6, ncols=16, sharex='all', sharey='all')
for i, ax in enumerate(axes.flatten()):
BlankBinsx = bins[blankposition,0:30,i]
StimBinsx = bins[NonBlankPositions,0:30,i]
meanx = BlankBinsx.mean(axis=0);
stimmeanx = StimBinsx.mean(axis=0);
for j in range(30):
hold[i][j] = meanx[j];
secondHold[i][j] = stimmeanx[j];
ax.plot(hold[i], label='stimulus')
ax.plot(secondHold[i], label='Blank Stimulus')
ax.set_title('Channel %d' %i)
ax.set_xlabel('time (ms)')
ax.set_ylabel('Avg Spike Rate')
ax.legend()
plt.show()
The code below achieves what I want to do, but does so in a very roundabout way. I have looked around for a succinct way to produce a single legend for a figure that includes multiple subplots that takes into account their labels, to no avail. plt.figlegend() requires you to pass in labels and lines, and plt.legend() requires only handles (slightly better).
My example below illustrates what I want. I have 9 vectors, each with one of 3 categories. I want to plot each vector on a separate sub plot, label it, and plot a legend which indicates (using colour) what the label means; this is the automatic behaviour on a single plot.
Do you know of a better way of achieving the plot below?
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
nr_lines = 9
nr_cats = 3
np.random.seed(1337)
# Data
X = np.random.randn(nr_lines, 100)
labels = ['Category {}'.format(ii) for ii in range(nr_cats)]
y = np.random.choice(labels, nr_lines)
# Ideally wouldn't have to manually pick colours
clrs = matplotlib.rcParams['axes.prop_cycle'].by_key()['color']
clrs = [clrs[ii] for ii in range(nr_cats)]
lab_clr = {k: v for k, v in zip(labels, clrs)}
fig, ax = plt.subplots(3, 3)
ax = ax.flatten()
for ii in range(nr_lines):
ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
lines = [a.lines[0] for a in ax]
l_labels = [l.get_label() for l in lines]
# the hack - get a single occurance of each label
idx_list = [l_labels.index(lab) for lab in labels]
lines_ = [lines[idx] for idx in idx_list]
#l_labels_ = [l_labels[idx] for idx in idx_list]
plt.legend(handles=lines_, bbox_to_anchor=[2, 2.5])
plt.tight_layout()
plt.savefig('/home/james/Downloads/stack_figlegend_example.png',
bbox_inches='tight')
You could use a dictionary to collect them using the label as a key. For example:
handles = {}
for ii in range(nr_lines):
l1, = ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
if y[ii] not in handles:
handles[y[ii]] = l1
plt.legend(handles=handles.values(), bbox_to_anchor=[2, 2.5])
You only add a handle to the dictionary if the category isn't already present.
I am trying to plot two different variables (linked by a relation of causality), delai_jour and date_sondage on a single FacetGrid. I can do it with this code:
g = sns.FacetGrid(df_verif_sum, col="prefecture", col_wrap=2, aspect=2, sharex=True,)
g = g.map(plt.plot, "date_sondage", "delai_jour", color="m", linewidth=2)
g = g.map(plt.bar, "date_sondage", "impossible")
which gives me this:
FacetGrid
(There are 33 of them in total).
I'm interested in comparing the patterns across the various prefecture, but due to the difference in magnitude I cannot see the changes in the line chart.
For this specific work, the best way to do it is to create a secondary y axis, but I can't seem to make anything work: it doesn't look like it's possible with FacetGrid, and I didn't understand the code not was able to replicate the examples i've seen with pure matplotlib.
How should I go about it?
I got this to work by iterating through the axes and plotting a secondary axis as in a typical Seaborn graph.
Using the OP example:
g = sns.FacetGrid(df_verif_sum, col="prefecture", col_wrap=2, aspect=2, sharex=True)
g = g.map(plt.plot, "date_sondage", "delai_jour", color="m", linewidth=2)
for ax, (_, subdata) in zip(g.axes, df_verif_sum.groupby('prefecture')):
ax2=ax.twinx()
subdata.plot(x='data_sondage',y='impossible', ax=ax2,legend=False,color='r')
If you do any formatting to the x-axis, you may have to do it to both ax and ax2.
Here's an example where you apply a custom mapping function to the dataframe of interest. Within the function, you can call plt.gca() to get the current axis at the facet being currently plotted in FacetGrid. Once you have the axis, twinx() can be called just like you would in plain old matplotlib plotting.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
def facetgrid_two_axes(*args, **kwargs):
data = kwargs.pop('data')
dual_axis = kwargs.pop('dual_axis')
alpha = kwargs.pop('alpha', 0.2)
kwargs.pop('color')
ax = plt.gca()
if dual_axis:
ax2 = ax.twinx()
ax2.set_ylabel('Second Axis!')
ax.plot(data['x'],data['y1'], **kwargs, color='red',alpha=alpha)
if dual_axis:
ax2.plot(df['x'],df['y2'], **kwargs, color='blue',alpha=alpha)
df = pd.DataFrame()
df['x'] = np.arange(1,5,1)
df['y1'] = 1 / df['x']
df['y2'] = df['x'] * 100
df['facet'] = 'foo'
df2 = df.copy()
df2['facet'] = 'bar'
df3 = pd.concat([df,df2])
win_plot = sns.FacetGrid(df3, col='facet', size=6)
(win_plot.map_dataframe(facetgrid_two_axes, dual_axis=True)
.set_axis_labels("X", "First Y-axis"))
plt.show()
This isn't the prettiest plot as you might want to adjust the presence of the second y-axis' label, the spacing between plots, etc. but the code suffices to show how to plot two series of differing magnitudes within FacetGrids.