Creating a single tidy seaborn plot in a 'for' loop


I'm trying to generate a plot in seaborn using a for loop to plot the contents of each dataframe column on its own row.
The number of columns that need plotting can vary between 1 and 30. However, the loop creates multiple individual plots, each with their own x-axis, which are not aligned and with a lot of wasted space between the plots. I'd like to have all the plots together with a shared x-axis without any vertical spacing between each plot that I can then save as a single image.
The code I have been using so far is below.
comp_relflux = measurements.filter(like='rel_flux_C', axis=1)  # Extracts relevant columns from larger dataframe
comp_relflux = comp_relflux.reindex(comp_relflux.mean().sort_values().index, axis=1)  # Sorts into order based on column mean.
plt.rcParams["figure.figsize"] = [12.00, 1.00]
for column in comp_relflux.columns:
    plt.figure()
    sns.scatterplot(x=bjd % 1, y=comp_relflux[column], color='b', marker='.')
This is a screenshot of the resultant plots.
I have also tried using FacetGrid, but this just seems to plot the last column's data.
p = sns.FacetGrid(comp_relflux, height=2, aspect=6, despine=False)
p.map(sns.scatterplot, x=(bjd)%1, y=comp_relflux[column])

To have a single set of x-axis labels instead of one per row, you can use sharex. Also, by calling plt.subplots() with as many rows as you have columns, you get just one figure with all the subplots in it. As there is no data available, I used random numbers below to demonstrate. There are 4 columns of data in my df, but I have kept as much of your code and naming convention as possible. Hope this is what you are looking for...
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

comp_relflux = pd.DataFrame(np.random.rand(100, 4))  # Random data - 4 columns
bjd = np.linspace(0, 1, 100)  # Series of 100 points - 0 to 1
rows = len(comp_relflux.columns)  # Number of columns = number of subplot rows
fig, ax = plt.subplots(rows, 1, sharex=True, figsize=(12, 6))  # sharex is set here; the figure size moves here from your rcParams
for i, column in enumerate(comp_relflux.columns):
    sns.scatterplot(x=bjd % 1, y=comp_relflux[column], color='b', marker='.', ax=ax[i])
1 output plot with 4 subplots
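The question also asked for no vertical gap between the rows and for a column count that can be anywhere from 1 to 30. A minimal sketch of one way to get that, assuming stand-in random data and hypothetical column names (rel_flux_C1 and so on): the gap is removed with hspace=0, the figure height scales with the number of rows, and squeeze=False keeps the axes in an array even when there is only one column. It draws with plain Matplotlib so it runs without seaborn; sns.scatterplot(x=..., y=..., ax=ax) can be used in place of ax.scatter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Stand-in data (assumption): 5 columns named like the question's rel_flux_C columns
comp_relflux = pd.DataFrame(rng.random((100, 5)),
                            columns=[f"rel_flux_C{k}" for k in range(1, 6)])
bjd = np.linspace(0, 3, 100)  # stand-in for the question's bjd values

n_rows = len(comp_relflux.columns)
# squeeze=False always returns a 2-D array of axes, so a single column also works
fig, axes = plt.subplots(n_rows, 1, sharex=True, squeeze=False,
                         figsize=(12, 1.0 * n_rows),
                         gridspec_kw={"hspace": 0})  # no vertical gap between rows
for ax, column in zip(axes[:, 0], comp_relflux.columns):
    ax.scatter(bjd % 1, comp_relflux[column], color="b", marker=".")
    ax.set_ylabel(column, rotation=0, ha="right", va="center")
axes[-1, 0].set_xlabel("bjd % 1")  # only the bottom row carries the x label
```

Calling fig.savefig with a file name afterwards writes the whole stack as a single image.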

Related

I have a large data set where the rows are a series of coordinates and need to plot specific rows

I have a very large dataset of coordinates that I need to plot, and I want to select specific rows in code instead of editing the raw Excel file.
The data is organized like this:
frames xsnout ysnout xMLA yMLA
0 532.732971 503.774200 617.231018 492.803711
1 532.472351 504.891632 617.638550 493.078583
2 532.453552 505.676300 615.956116 493.2839
3 532.356079 505.914642 616.226318 494.179047
4 532.360718 506.818054 615.836548 495.555298
The column "frames" is the specific video frame for each of these coordinates (xsnout,ysnout) (xMLA,yMLA). Below is my code which is able to plot all frames and all data points without specifying the row
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#import data
df = pd.read_excel("E:\\Clark\\Flow Tank\\Respirometry\\Cropped_videos\\F1\\MG\\F1_MG_4Hz_simplified.xlsx")
#different body points
ax1 = df.plot(kind='scatter', x='xsnout', y='ysnout', color='r', label='snout')
ax2 = df.plot(kind='scatter', x='xMLA', y='yMLA', color='g', ax=ax1)
How would I specify just a single row instead of plotting the whole dataset? And is there any way to connect the coordinates of a single row with a line?
Thank you and any help would be greatly appreciated

How would I specify just a single row instead of plotting the whole dataset?
To do this you can slice your dataframe. There are many ways of doing this, and the right one depends on exactly what you're trying to do. For instance, you can use df.iloc[] to select rows by integer position (iloc stands for integer location). Note the square brackets! If you want to select rows (or columns) by their index labels instead, use .loc[]. For example, with the data you provided:
Slicing the dataframe with iloc:
ax1 = df.iloc[2:5, :].plot(kind='scatter', x='xsnout', y='ysnout', color='r', label='snout')
ax2 = df.iloc[2:5, :].plot(kind='scatter', x='xMLA', y='yMLA', color='g', ax=ax1)
Gives you this:
If you specify something like this, you get only a single row:
df.iloc[1:2, :]
And is there any way to connect the coordinates of a single row with a line?
What exactly do you mean by this? You want to connect the points (xsnout, ysnout) with (xMLA, yMLA)? If that's so, then you can do it with this:
plt.plot([df['xsnout'], df['xMLA']], [df['ysnout'], df['yMLA']])
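For a single frame, the two ideas can be combined. The sketch below rebuilds the five sample rows from the question, picks one frame by its value in the frames column (frame 2 is an arbitrary choice for illustration), and joins that frame's snout and MLA points with one line segment. It is only one possible way to do it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# The five sample rows from the question
df = pd.DataFrame({
    "frames": [0, 1, 2, 3, 4],
    "xsnout": [532.732971, 532.472351, 532.453552, 532.356079, 532.360718],
    "ysnout": [503.774200, 504.891632, 505.676300, 505.914642, 506.818054],
    "xMLA": [617.231018, 617.638550, 615.956116, 616.226318, 615.836548],
    "yMLA": [492.803711, 493.078583, 493.2839, 494.179047, 495.555298],
})

frame = 2  # hypothetical choice of video frame
row = df.loc[df["frames"] == frame].iloc[0]  # first row whose frames value matches

fig, ax = plt.subplots()
ax.scatter(row["xsnout"], row["ysnout"], color="r", label="snout")
ax.scatter(row["xMLA"], row["yMLA"], color="g", label="MLA")
# One line segment joining the two body points of this frame
(line,) = ax.plot([row["xsnout"], row["xMLA"]],
                  [row["ysnout"], row["yMLA"]], color="k")
ax.legend()
```

Selecting with df["frames"] == frame keeps working if the dataframe has been filtered or re-sorted, whereas iloc depends on the current row order.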

plotting with subplots in a loop python [duplicate]

Case:
I receive a dataframe with (say 50) columns.
I extract the necessary columns from that dataframe using a condition.
So we have a list of selected columns of our dataframe now. (Say this variable is sel_cols)
I need a bar chart for each of these columns value_counts().
And I need to arrange all these bar charts in 3 columns, and varying number of rows based on number of columns selected in sel_cols.
So, if say 8 columns were selected, I want the figure to have 3 columns and 3 rows, with last subplot empty or just 8 subplots in 3x3 matrix if that is possible.
I could generate each chart separately using following code:
for col in sel_cols:
    df[col].value_counts().plot(kind='bar')
    plt.show()
plt.show() is inside the loop so that each chart is shown, not just the last one.
I also tried appending these charts to a list this way:
charts = []
for col in sel_cols:
    charts.append(df[col].value_counts().plot(kind='bar'))
I could convert this list into a NumPy array and reshape() it, but then its length would have to divide evenly into that shape. So 8 chart objects cannot be reshaped into a 3x3 array.
Then I tried creating the subplots first in this way:
row = len(sel_cols)//3
fig, axes = plt.subplots(nrows=row,ncols=3)
This way I would get the subplots, but I get two problems:
I end up with extra subplots in the 3 columns which will go unplotted (8 columns example).
I do not know how to plot under each subplots through a loop.
I tried this:
for row in axes:
    for chart, col in zip(row, sel_cols):
        chart = data[col].value_counts().plot(kind='bar')
But this only plots the last subplot with the last column. All other subplots stay blank.
How to do this with minimal lines of code, possibly without any need for human verification of the final subplots placements?
You may use this sample dataframe:
pd.DataFrame({'A':['Y','N','N','Y','Y','N','N','Y','N'],
'B':['E','E','E','E','F','F','F','F','E'],
'C':[1,1,0,0,1,1,0,0,1],
'D':['P','Q','R','S','P','Q','R','P','Q'],
'E':['E','E','E','E','F','F','G','G','G'],
'F':[1,1,0,0,1,1,0,0,1],
'G':['N','N','N','N','Y','N','N','Y','N'],
'H':['G','G','G','E','F','F','G','F','E'],
'I':[1,1,0,0,1,1,0,0,1],
'J':['Y','N','N','Y','Y','N','N','Y','N'],
'K':['E','E','E','E','F','F','F','F','E'],
'L':[1,1,0,0,1,1,0,0,1],
})
Selected columns are: sel_cols = ['A','B','D','E','G','H','J','K']
Total 8 columns.
Expected output is bar charts for value_counts() of each of these columns arranged in subplots in a figure with 3 columns. Rows to be decided based on number of columns selected, here 8 so 3 rows.

Given OP's sample data:
df = pd.DataFrame({'A':['Y','N','N','Y','Y','N','N','Y','N'],'B':['E','E','E','E','F','F','F','F','E'],'C':[1,1,0,0,1,1,0,0,1],'D':['P','Q','R','S','P','Q','R','P','Q'],'E':['E','E','E','E','F','F','G','G','G'],'F':[1,1,0,0,1,1,0,0,1],'G':['N','N','N','N','Y','N','N','Y','N'],'H':['G','G','G','E','F','F','G','F','E'],'I':[1,1,0,0,1,1,0,0,1],'J':['Y','N','N','Y','Y','N','N','Y','N'],'K':['E','E','E','E','F','F','F','F','E'],'L':[1,1,0,0,1,1,0,0,1]})
sel_cols = list('ABDEGHJK')
data = df[sel_cols].apply(pd.Series.value_counts)  # top-level pd.value_counts is deprecated in recent pandas
We can plot the columns of data in several ways (in order of simplicity):
DataFrame.plot with subplots param
seaborn.catplot
Loop through plt.subplots
1. DataFrame.plot with subplots param
Set subplots=True with the desired layout dimensions. Unused subplots will be auto-disabled:
data.plot.bar(subplots=True, layout=(3, 3), figsize=(8, 6),
              sharex=False, sharey=True, legend=False)
plt.tight_layout()
2. seaborn.catplot
melt the data into long-form (i.e., 1 variable per column, 1 observation per row) and pass it to seaborn.catplot:
import seaborn as sns
melted = data.melt(var_name='var', value_name='count', ignore_index=False).reset_index()
sns.catplot(data=melted, kind='bar', x='index', y='count',
            col='var', col_wrap=3, sharex=False)
3. Loop through plt.subplots
zip the columns and axes to iterate in pairs. Use the ax param to place each column onto its corresponding subplot.
If the grid size is larger than the number of columns (e.g., 3*3 > 8), disable the leftover axes with set_axis_off:
fig, axes = plt.subplots(3, 3, figsize=(8, 8), constrained_layout=True, sharey=True)
# plot each col onto one ax
for col, ax in zip(data.columns, axes.flat):
    data[col].plot.bar(ax=ax, rot=0)
    ax.set_title(col)
# disable leftover axes
for ax in axes.flat[data.columns.size:]:
    ax.set_axis_off()
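The grid above is hard-coded as 3x3. Since the question wants the number of rows to follow from however many columns are selected, here is a small sketch that derives it with a ceiling division. It uses only the eight selected columns of the sample dataframe; the names ncols and nrows are introduced here for illustration.

```python
import math
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# The eight selected columns from the question's sample dataframe
df = pd.DataFrame({
    'A': ['Y', 'N', 'N', 'Y', 'Y', 'N', 'N', 'Y', 'N'],
    'B': ['E', 'E', 'E', 'E', 'F', 'F', 'F', 'F', 'E'],
    'D': ['P', 'Q', 'R', 'S', 'P', 'Q', 'R', 'P', 'Q'],
    'E': ['E', 'E', 'E', 'E', 'F', 'F', 'G', 'G', 'G'],
    'G': ['N', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y', 'N'],
    'H': ['G', 'G', 'G', 'E', 'F', 'F', 'G', 'F', 'E'],
    'J': ['Y', 'N', 'N', 'Y', 'Y', 'N', 'N', 'Y', 'N'],
    'K': ['E', 'E', 'E', 'E', 'F', 'F', 'F', 'F', 'E'],
})
sel_cols = list('ABDEGHJK')

ncols = 3
nrows = math.ceil(len(sel_cols) / ncols)  # round up so every column gets a subplot

# squeeze=False keeps axes 2-D even when only one row is needed
fig, axes = plt.subplots(nrows, ncols, figsize=(8, 2.5 * nrows),
                         squeeze=False, constrained_layout=True)
for col, ax in zip(sel_cols, axes.flat):
    df[col].value_counts().plot.bar(ax=ax, rot=0)
    ax.set_title(col)
# hide whatever subplots are left over in the last row
for ax in axes.flat[len(sel_cols):]:
    ax.set_axis_off()
```

With 8 selected columns this gives 3 rows and one hidden subplot; with 6 it would give 2 full rows and nothing to hide.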

As an alternative to the answer by tdy, I tried to do it without seaborn, using Matplotlib and a for loop.
This may suit those who want specific control over each subplot's formatting and other parameters:
fig = plt.figure(1, figsize=(16, 12))
for i, col in enumerate(sel_cols, 1):
    fig.add_subplot(3, 3, i)  # 3 rows x 3 columns, filled row by row
    df[col].value_counts().plot(kind='bar', ax=plt.gca())
    plt.title(col)
plt.tight_layout()
plt.show()
fig.add_subplot creates a subplot and makes it the current axes, while plt.gca() returns the current axes.

Set x axis locator at hour intervals on matplotlib subplot

I am trying to create a figure with four subplots using the Matplotlib object-oriented approach. I am having trouble setting the x-axis to hourly markers on each plot. With my present code the hourly marks are retained only on the last of the four subplots.
I have a list which contains four dataframes that were read in from CSV. I used pd.to_datetime to create an index. No problem.
I can loop through the four dataframes and plot my y variable (TS_comp) against time. This works fine and I get date/time on each x-axis. But what I want is to have just hour markers on each x-axis. When I add code in the loop to set the major locator, the x-axis labels are wiped on the first three subplots. The two lines of code from the loop below are:
ax.xaxis.set_major_locator(hours)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
I do not understand why this is happening as each time it goes through the loop it should be addressing a different axis object. Note x-axis time ranges are different so not a simple matter of sharing the x-axis across the subplots.
fig, ax = plt.subplots(nrows=2, ncols=2)
i = 0
hours = mdates.HourLocator(interval=1)
for ax in fig.get_axes():
    ax.plot(dfs[i].TS_comp, 'k-', markersize=0.5)
    ax.xaxis.set_major_locator(hours)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
    i = i + 1
I expected to get hourly markers on each of the subplots, but ended up with hourly markers on just the last plot.
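A likely cause is that the same HourLocator object (hours) is handed to all four axes. Matplotlib's ticker documentation notes that a locator instance should not be used with more than one axis, so each call to set_major_locator re-attaches the object to the newest axis and only the last subplot ends up with a working locator. A sketch of the usual workaround, creating a fresh locator inside the loop. The dfs list here is stand-in data (an assumption) with a datetime index and a TS_comp column, each frame covering a different time window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data (assumption): four frames, each six hours long, on different days
rng = np.random.default_rng(0)
dfs = []
for k in range(4):
    index = pd.date_range(f"2020-01-0{k + 1} 0{k}:00", periods=360, freq="min")
    dfs.append(pd.DataFrame({"TS_comp": rng.normal(size=360)}, index=index))

fig, axes = plt.subplots(nrows=2, ncols=2)
for ax, frame in zip(axes.flat, dfs):
    ax.plot(frame.index, frame["TS_comp"], "k-")
    # A fresh locator and formatter for every axes, never a shared instance
    ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H"))
```

Zipping the axes with the dataframes also removes the need for the manual counter i.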

Why doesn't Subplot using Pandas show x-axis

When I plot single plots with pandas DataFrames I have an x-axis.
However, when I make subplots and try to share the x-axis the way I would with NumPy arrays (without pandas), there are no number labels.
I only want the numbers and label to appear on the last plot, as they share the same x-axis.
The data loaded and the plot produced can be found here:
https://drive.google.com/open?id=1hTmTSkIcYl-usv_CCxLl8U6bAoO6tMRh
This is for combining and plotting the data logged from two different logging devices which represent the same time period.
import pandas as pd
import matplotlib.pyplot as plt
df1=pd.read_csv('data1.csv', sep=',',header=0)
df1.columns.values
cols1 = list(df1.columns.values)
df2=pd.read_csv('data2.dat', sep='\t',header=18)
df2.columns.values
cols2 = list(df2.columns.values)
start =10000
stop = 30000
fig, axes = plt.subplots(nrows=5, ncols=1, sharex=True, figsize=(10, 10))
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[1], ax=axes[0])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[2], ax=axes[0])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[3], ax=axes[2])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[4], ax=axes[2])
df2.iloc[start:stop].plot(x=cols2[0], y=cols2[3], ax=axes[3])
axes[3].set_xlabel("Time [s]")
plt.show()
I expect there to be numbers and a label on the x-axis but instead, it only gives the pandas label "#timestamp"
UPDATE: I have found something that hints at the problem. I think the problem is due to the two files not having identical time spacing, the first column of each file is time, they are roughly 1 sample per second but not exactly. If I remove the x=cols[x] parts it then shows numbers on the x-axis but then there is a shift in time between the two plots as they are not plotting against time but rather against the index in the dataframe.
I am currently trying to interpolate the data so that they have the same x-axis but I would not have expected that to be necessary.
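For the interpolation step mentioned in the update, one simple option is np.interp, which linearly resamples one logger's signal at the other logger's timestamps. The sketch below uses tiny made-up frames (the column names time, a and b are assumptions, not the real files) and only shows how to put both signals on a common time base. Whether that also brings back the missing tick labels is not something it demonstrates.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): two loggers sampling about once per second,
# with slightly different timestamps
df1 = pd.DataFrame({"time": [0.0, 1.0, 2.0, 3.0, 4.0],
                    "a": [10.0, 11.0, 12.0, 13.0, 14.0]})
df2 = pd.DataFrame({"time": [0.5, 1.5, 2.5, 3.5],
                    "b": [5.0, 7.0, 9.0, 11.0]})

# Linearly interpolate df2's signal at df1's timestamps.
# np.interp needs increasing x values and holds the end values
# constant outside df2's time range.
df1["b_on_df1_time"] = np.interp(df1["time"], df2["time"], df2["b"])
```

After this, both a and b_on_df1_time can be plotted against df1's time column.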

Populating Seaborn subplots using an array

The att_sales table has 3 fields: item, qty and yr_mon. The buckets(i) function returns a list of 20 SKUs (a list of item values). A group-by is used to find the monthly sales for each of these SKUs, and then a violin plot is drawn. The exercise works fine up to this point.
I am trying to visualize the monthly sales for about 200 SKUs across 10 subplots. To do this I intended an iterator to run from 1 through 10 and populate each of the subplots. The code below populates only the last of the 10 subplots and leaves the rest empty. How do I go about achieving this?
fig, (axis1) = plt.subplots(5,2,figsize=(15,30))
plt.xticks(rotation=45)
s=att_sales[['item','qty','yr_mon']]
s=s[s.item.isin(buckets(i))]
s=s.groupby(['item','yr_mon'], as_index=False).qty.sum()
sns.violinplot(x="item", y="qty", data=s)
Edit1: On implementing @Ted's solution I got the error "min() arg is an empty sequence" when the for loop ran from 0 to n. Changing the for loop to run between 1 and n provides most of the solution, but not quite all of it.
I need to know how to increase the size of the overall plot and of the individual subplots, and also change the orientation of the xticks to 45 degrees.

Here is a simplified example that I think you can tweak to make it work for you. I am using the tips dataset in seaborn and plotting 4 different violin plots based on what day it is. I have also created a buckets function that returns a single element list of one day.
When the figure is created with fig, axes = plt.subplots(2,2,figsize=(10,10)), it returns both a matplotlib figure object which is stored into fig and a 2 dimensional numpy array of matplotlib axes objects which is stored in axes. To get the top left plot you would do axes[0, 0]. If you wanted the bottom right hand plot you would do axes[1, 1]. If you created a 5 row by 2 column figure axes[3,0] would be the plot on the 4th row and first column.
import matplotlib.pyplot as plt
import seaborn as sns

# create function that will return a list of items
# this particular example returns just a list of one element
def buckets(i):
    return [tips.day.unique()[i]]

# load dataset and create figure
tips = sns.load_dataset("tips")
num_plots = 4
fig, axes = plt.subplots(2, 2, figsize=(10, 10))

# iterate through all axes and create a violin plot
for i in range(num_plots):
    df = tips[tips.day.isin(buckets(i))]
    row = i // 2
    col = i % 2
    ax_curr = axes[row, col]
    sns.violinplot(x="sex", y="tip", data=df, ax=ax_curr)
    ax_curr.set_title(buckets(i)[0])  # use the day name as the title
Note that in this particular example you can use a facet grid which will do the same exact thing as what I did by plotting each day in a separate plot. You can take advantage of the facet grid if you label each bucket of SKUs a unique id. See the very last example on this page
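On the two follow-up points in the question's edit (overall size and 45-degree tick labels): the size of the whole figure is set by figsize, and the rotation has to be applied per axes, because plt.xticks(rotation=45) only acts on the current axes, which right after plt.subplots is the last one created. A rough sketch for a 5x2 grid follows. The data and SKU names are made up (an assumption), and it draws with Matplotlib's own Axes.violinplot so that it runs without the att_sales table; with seaborn the drawing line would be sns.violinplot(..., ax=ax) and the tick_params call stays the same.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
skus_per_plot = 20  # bucket size taken from the question

# figsize is the size of the whole figure in inches; subplots share that space
fig, axes = plt.subplots(5, 2, figsize=(15, 30), constrained_layout=True)
for i, ax in enumerate(axes.flat):  # i runs 0..9, one value per subplot
    # Stand-in data (assumption): 12 monthly quantities for each of 20 SKUs
    labels = [f"sku{i * skus_per_plot + j}" for j in range(skus_per_plot)]
    monthly_qty = [rng.normal(50, 10, size=12) for _ in labels]
    positions = np.arange(1, skus_per_plot + 1)
    ax.violinplot(monthly_qty, positions=positions)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.tick_params(axis="x", labelrotation=45)  # rotate this subplot's tick labels
    ax.set_title(f"bucket {i}")
```

Iterating over axes.flat avoids computing row and column indices by hand, and starting the counter at 0 matches how the axes array is indexed.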
