I want to plot all columns in my dataframe against one column in the same df: totCost. The following code works fine:
for i in range(0, len(df.columns), 5):
    g=sns.pairplot(data=df,
                   x_vars=df.columns[i:i+5],
                   y_vars=['totCost'])
    g.set(xticklabels=[])
    g.savefig('output.png')
Problem is output.png only contains the last 3 graphs (there are 18 total). Same happens if I de-dent that line. How do I write all 18 as a single graphic?
The problem with using pairplot this way is that every iteration of the loop creates a new figure and assigns it to g.
If you move the last line, g.savefig('output.png'), outside the loop, only the last version of g is saved to disk, and that is the one containing only the last three subplots.
If you keep that line inside your loop, every figure is saved to disk, but under the same name, so each one overwrites the previous one and the file again ends up holding the figure with three subplots.
A way around this is to create a figure, and assign all subplots to it, as they come, and then save that figure to disk:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
# generate random data, with 18 columns
dic = {str(a): np.random.randint(0,10,10) for a in range(18)}
df = pd.DataFrame(dic)
# rename first column of dataframe
df.rename(columns={'0':'totCost'}, inplace=True)
#instantiate figure
fig = plt.figure()
# loop through all columns, create subplots in 5 by 5 grid along the way,
# and add them to the figure
for i in range(len(df.columns)):
    ax = fig.add_subplot(5,5,i+1)
    ax.scatter(df['totCost'], df[df.columns[i]])
    ax.set_xticklabels([])
plt.tight_layout()
fig.savefig('figurename.png')
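A variation on the same idea, as a minimal sketch: plt.subplots builds the whole grid in one call, the number of rows is computed from the number of columns, and totCost goes on the y-axis as in the question. The random data, the column names col1 to col17, and the figure size are placeholders I made up for illustration.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: 'totCost' plus 17 made-up columns (18 in total).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(0, 10, size=(10, 18)),
    columns=["totCost"] + [f"col{i}" for i in range(1, 18)],
)

x_cols = [c for c in df.columns if c != "totCost"]
ncols = 5
nrows = math.ceil(len(x_cols) / ncols)

# One figure holding every subplot; squeeze=False keeps axes 2-D.
fig, axes = plt.subplots(nrows, ncols, figsize=(3 * ncols, 3 * nrows),
                         sharey=True, squeeze=False)
for ax, col in zip(axes.flat, x_cols):
    ax.scatter(df[col], df["totCost"])
    ax.set_xlabel(col)
    ax.tick_params(labelbottom=False)  # hide x tick labels

# Hide the grid cells that have no column to show.
for ax in axes.flat[len(x_cols):]:
    ax.set_visible(False)

for ax in axes[:, 0]:
    ax.set_ylabel("totCost")

fig.tight_layout()
fig.savefig("output.png")
```

Because savefig is called once, after the loop, the file contains the complete grid rather than the last chunk.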
I am using the integrated plot() function in pandas to generate a graph with two y-axes. This works well and the legend even points to the (right) y-axis for the second data set. But imho the legend's position is bad.
However, when I update the legend position I get two legends: the correct one ('A', 'B (right)') at an inconvenient location, and a wrong one ('A' only) at the chosen location.
So now I want to generate a legend on my own and was looking for the second <matplotlib.lines.Line2D>, but it is not contained in the ax environment.
import pandas as pd
df = pd.DataFrame({"A":[1,2,3],"B":[1/4,1/5,1/6]})
ax = df.plot(secondary_y=['B'])
len(ax.lines)
>>> 1
My ultimate objective is to be able to move the correct legend around, but I am confident I could manually place a legend, if only I had access to the second line container.
If I had, I was going to suppress the original legend by invoking df.plot(...,legend=None) and do something like plt.legend([ax.lines[0],ax.lines[1]],['A','B (right)'],loc='center left',bbox_to_anchor=(1.2, 0.5)). But ax only stores the first line "A", where is the second?
Also ax.get_legend_handles_labels() only contains ([<matplotlib.lines.Line2D at 0x2630e2193c8>], ['A']).
You create two axes. Each contains a line. So you need to loop over the axes and take the line(s) from each of them.
import numpy as np
import pandas as pd
df = pd.DataFrame({"A":[1,2,3],"B":[1/4,1/5,1/6]})
ax = df.plot(secondary_y=['B'])
lines = np.array([axes.lines for axes in ax.figure.axes]).flatten()
print(lines)
For the purpose of creating a single legend you may however just use a figure legend,
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"A":[1,2,3],"B":[1/4,1/5,1/6]})
ax = df.plot(secondary_y=['B'], legend=False)
ax.figure.legend()
plt.show()
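If you prefer to place the legend by hand, as the question intended, here is a minimal sketch. It assumes the second line lives on the secondary axes that pandas exposes as ax.right_ax; the labels and the anchor position are taken from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [1/4, 1/5, 1/6]})
ax = df.plot(secondary_y=["B"], legend=False)

# 'A' is drawn on the left axes, 'B' on the secondary (right) axes.
handles = list(ax.get_lines()) + list(ax.right_ax.get_lines())
labels = ["A", "B (right)"]

# Place a single legend outside the plot area, to the right.
legend = ax.legend(handles, labels, loc="center left",
                   bbox_to_anchor=(1.2, 0.5))
```

When saving, passing bbox_inches='tight' to savefig should keep a legend placed outside the axes from being cut off.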
I have
x = collections.Counter(df.f.values.tolist())
if 'nan' in x:
    del x['nan']
plt.bar(range(len(x)), x.values(), align='center')
plt.xticks(range(len(x)), list(x.keys()))
plt.show()
My question is, how can I remove the nan's from the dictionary that is created, and how can I change the order of the bar plot to go from 1-5? The first 3 nan's are empty spots in the data (intentional since it's from a poll), and the last one is the title of the column. I tried manually changing the range part of plt.bar to be 1-5 but it does not seem to work.
You can use .value_counts on a pandas.Series to simply get how many times each value occurs. This makes it simple to then make a barplot.
By default, value_counts will ignore the NaN values, so that takes care of that, and by using .sort_index() we can guarantee the values are plotted in order. It seems we need to use .to_frame() so that it only plots one color for the column (it chooses one color per row for a Series).
Sample Data
import pandas as pd
import numpy as np
# Get your plot settings
import seaborn as sns
sns.set()
np.random.seed(123)
df = pd.DataFrame({'f': np.random.randint(1,6,100)})
# DataFrame.append and np.NaN were removed in recent pandas/NumPy releases
df = pd.concat([df, pd.DataFrame({'f': np.repeat(np.nan, 1000)})])
Code
df.f.value_counts().to_frame().sort_index().plot(kind='bar', legend=False)
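To see what the counting step does before any plotting, here is a small sketch on a made-up Series (the values are placeholders, not the poll data from the question):

```python
import numpy as np
import pandas as pd

# Made-up answers on a 1-5 scale with two missing entries.
s = pd.Series([3, 1, 5, np.nan, 1, 2, np.nan, 4, 1], name="f")

# NaN is dropped by default; sort_index orders the bars by answer value.
counts = s.value_counts().sort_index()

# For comparison: dropna=False keeps NaN as its own category.
counts_with_nan = s.value_counts(dropna=False)

print(counts)
```

The sorted counts are what the bar plot then draws, one bar per index value.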
I have a dataframe with a column called "my_row". It has many values. I only want to see some of the data on FacetGrid that belong to specific values of "my_row" on the row. I tried to make a subset of my dataframe and visualize that, but still somehow seaborn "knows" that my original dataframe had more values in the "my_row" column and shows empty plots for the rows that I don't want.
So using the following code still gives me a figure with 2 rows of data that I want and many empty plots after that.
X = df[(df['my_row']=='1') | (df['my_row']=='2')].copy()
g = sns.FacetGrid(X, row='my_row', col='column')
How can I tell Python to plot just those 2 rows?
I get plots like this with many empty plots:
I cannot reproduce this. The code from the question seems to work fine. Here we have a dataframe with four different values in the my_row column. Then filtering out two of them creates a FacetGrid with only two rows.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame({"my_row" : np.random.choice(list("1234"), size=40),
                   "column" : np.random.choice(list("AB"), size=40),
                   "x" : np.random.rand(40),
                   "y" : np.random.rand(40)})
X = df[(df['my_row']=='1') | (df['my_row']=='2')].copy()
g = sns.FacetGrid(X, row='my_row', col='column')
g.map(plt.scatter, "x", "y")
plt.show()
For anyone encountering this problem: the issue is that my_row is a categorical type. To solve it, convert the column to str.
i.e.
X = df[(df['my_row']=='1') | (df['my_row']=='2')].copy()
X['my_row']=X['my_row'].astype(str)
g = sns.FacetGrid(X, row='my_row', col='column')
This should now work! :)
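A small sketch of why the empty rows appear, assuming my_row really is a pandas categorical (the toy data below is made up): filtering the rows does not shrink the list of categories, and seaborn appears to lay out one facet row per category. Dropping the unused categories is an alternative to converting to str.

```python
import pandas as pd

# Toy frame whose 'my_row' column is categorical with four categories.
df = pd.DataFrame({
    "my_row": pd.Categorical(list("1234") * 3),
    "x": range(12),
})

X = df[df["my_row"].isin(["1", "2"])].copy()

# The subset still remembers all four categories ...
before = list(X["my_row"].cat.categories)

# ... until the unused ones are dropped explicitly.
X["my_row"] = X["my_row"].cat.remove_unused_categories()
after = list(X["my_row"].cat.categories)

print(before, after)
```

After this step, passing X to sns.FacetGrid should only produce rows for the remaining categories, and the column keeps its categorical dtype.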
I got inspired by this link:
Plot lower triangle in a seaborn Pairgrid
and changed my code to this:
g = sns.FacetGrid(df, row='my_row', col='column')
for i in list(range(2,48)):
    for j in list(range(0,12)):
        g.axes[i,j].set_visible(False)
So I had to iterate over each plot individually and make it invisible. But I think there should be an easier way to do this. And in the end I still don't understand how FacetGrid knows anything about the size of my original dataframe df when I use X as its input.
This is an answer that works, but I think there must be better solutions. One problem with my answer is that when I save the figure, I get a big white space in the saved plot (corresponding to the axes that I set their visibility to False) that I do not see in jupyter notebooks when I am running the code. If FacetGrid just plots the dataframe that I am giving it as the input (in this case X), there would have been no problem anymore. There should be a way to do that.
I am looping through a bunch of CSV files containing various measurements.
Each file might be from one of 4 different data sources.
In each file, I merge the data into monthly datasets, that I then plot in a 3x4 grid. After this plot has been saved, the loop moves on and does the same to the next file.
I have this part figured out; however, I would like to add a visual cue to the plots indicating which data source they come from. As far as I understand it (and have tried it),
plt.subplot(4,3,1)
plt.hist(Jan_Data,facecolor='Red')
plt.ylabel('value count')
plt.title('January')
does work; however, this way I would have to add facecolor='Red' by hand to each of the 12 subplots. Looping through the plots won't work for this situation, since I want the ylabel only on the leftmost plots and the xlabels only on the bottom row.
Setting facecolor at the beginning in
fig = plt.figure(figsize=(20,15),facecolor='Red')
does not work, since it only changes the background color of the 20 by 15 figure now, which subsequently gets ignored when I save it to a PNG, since it only gets set for screen output.
So is there just a simple setthecolorofallbars='Red' command for plt.hist(… or plt.savefig(… I am missing, or should I just copy n' paste it to all twelve months?
You can use mpl.rc("axes", prop_cycle=cycler(color=["red"])) to set the default color cycle for all your axes. (Older Matplotlib releases used mpl.rc("axes", color_cycle="red"), but the color_cycle setting has since been removed in favor of prop_cycle.)
In this little toy example, I use a with mpl.rc_context() block to limit the effects of mpl.rc to just that block. This way you don't spoil the default parameters for your whole session.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler
np.random.seed(42)
# create some toy data
n, m = 2, 2
data = []
for i in range(n*m):
    data.append(np.random.rand(30))
# and do the plotting
with mpl.rc_context():
    # every axes created inside this block starts its color cycle at red
    mpl.rc("axes", prop_cycle=cycler(color=["red"]))
    fig, axes = plt.subplots(n, m, figsize=(8,8))
    for ax, d in zip(axes.flat, data):
        ax.hist(d)
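Since the question mentions four data sources, here is a sketch of how the color could be chosen per source and applied to a whole figure at once. The source names, the SOURCE_COLORS mapping, and the plot_months helper are hypothetical; only rc_context and the axes.prop_cycle setting come from Matplotlib.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler

# Hypothetical mapping from data source to bar color (placeholder names).
SOURCE_COLORS = {
    "source_a": "red",
    "source_b": "blue",
    "source_c": "green",
    "source_d": "orange",
}


def plot_months(monthly_data, source):
    """Draw one histogram per month, all in the color of the given source."""
    color = SOURCE_COLORS[source]
    # The color cycle only applies to axes created inside this block.
    with mpl.rc_context({"axes.prop_cycle": cycler(color=[color])}):
        fig, axes = plt.subplots(4, 3, figsize=(20, 15))
        for ax, values in zip(axes.flat, monthly_data):
            ax.hist(values)
    return fig, axes


# Placeholder data: 12 months of random values.
rng = np.random.default_rng(42)
monthly_data = [rng.random(30) for _ in range(12)]
fig, axes = plot_months(monthly_data, "source_b")
```

This keeps the color decision in one place, so the per-file loop only has to pass the source name.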
The problem with the x- and y-labels (when you use loops) can be solved by using plt.subplots, as you can then access each axes separately.
import matplotlib.pyplot as plt
import numpy.random
# creating figure with 4 plots
fig,ax = plt.subplots(2,2)
# some data
data = numpy.random.randn(4,1000)
# some titles
title = ['Jan','Feb','Mar','April']
xlabel = ['xlabel1','xlabel2']
ylabel = ['ylabel1','ylabel2']
for i in range(ax.size):
    a = ax[i//2,i%2]  # integer division gives the row, the remainder the column
    a.hist(data[i],facecolor='r',bins=50)
    a.set_title(title[i])
# write the ylabels on all axes on the left-hand side
for j in range(ax.shape[0]):
    ax[j,0].set_ylabel(ylabel[j])
# write the xlabels on all axes on the bottom
for j in range(ax.shape[1]):
    ax[-1,j].set_xlabel(xlabel[j])
fig.tight_layout()
All features (like titles) which are not constant can be put into arrays and placed at the appropriate axis.
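Scaled up to the twelve-month case from the question, the same pattern might look like the sketch below. The 4 by 3 layout follows the plt.subplot(4,3,1) call in the question; the random data and the x-label text "value" are placeholders.

```python
import calendar

import matplotlib.pyplot as plt
import numpy as np

# Placeholder data: one array of measurements per month.
rng = np.random.default_rng(0)
monthly = [rng.normal(size=200) for _ in range(12)]

fig, axes = plt.subplots(4, 3, figsize=(12, 12))

# One histogram per month, titled with the month name.
for month, (ax, values) in enumerate(zip(axes.flat, monthly), start=1):
    ax.hist(values, bins=30, color="red")
    ax.set_title(calendar.month_name[month])

# y-labels only on the leftmost column.
for ax in axes[:, 0]:
    ax.set_ylabel("value count")

# x-labels only on the bottom row.
for ax in axes[-1, :]:
    ax.set_xlabel("value")

fig.tight_layout()
```

Slicing the axes array by column and by row avoids any per-subplot special cases inside the main loop.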
I am trying to make a matplotlib figure that will have multiple horizontal boxplots stacked on one another. The documentation shows both how to make a single horizontal boxplot and how to make multiple vertically oriented plots in this section.
I tried using subplots as in the following code:
import numpy as np
import pylab as plt
totfigs = 5
plt.figure()
plt.hold = True
for i in np.arange(totfigs):
    x = np.random.random(50)
    plt.subplot('{0}{1}{2}'.format(totfigs,1,i+1))
    plt.boxplot(x,vert=0)
plt.show()
My output results in just a single horizontal boxplot though.
Any suggestions anyone?
Edit: Thanks to @joaquin, I fixed the plt.subplot call line. Now the subplot version works, but I would still like the boxplots all in one figure...
If I'm understanding you correctly, you just need to pass boxplot a list (or a 2d array) containing each array you want to plot.
import numpy as np
import pylab as plt
totfigs = 5
plt.figure()
plt.hold = True
boxes=[]
for i in np.arange(totfigs):
    x = np.random.random(50)
    boxes.append(x)
plt.boxplot(boxes,vert=0)
plt.show()
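Building on the list approach, a short sketch that also names each box on the y-axis. The sample names and the shifted random data are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
names = [f"sample {i + 1}" for i in range(5)]   # placeholder labels
boxes = [rng.random(50) + i for i in range(5)]  # placeholder data

fig, ax = plt.subplots()

# One call draws all boxes; vert=False lays them out horizontally.
result = ax.boxplot(boxes, vert=False)

# By default the boxes sit at positions 1..n, so label those ticks.
ax.set_yticks(list(range(1, len(boxes) + 1)))
ax.set_yticklabels(names)
ax.set_xlabel("value")
```

The dictionary returned by boxplot holds the drawn artists (boxes, medians, whiskers and so on) in case you want to restyle them afterwards.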
Try:
plt.subplot(totfigs, 1, i+1) # n rows, 1 column
or
plt.subplot(1, totfigs, i+1) # 1 row, n columns
From the docstring:
subplot(*args, **kwargs)
    Create a subplot command, creating axes with::
        subplot(numRows, numCols, plotNum)
    where plotNum = 1 is the first plot number and increasing plotNums
    fill rows first. max(plotNum) == numRows * numCols
If you want them all together in one axes, shift them by a convenient offset. As an example with a constant shift:
for i in np.arange(totfigs):
    x = np.random.random(50)
    plt.boxplot(x+(i*2),vert=0)