Graph axes not showing correctly in Python - python

I'm trying to create a 2x2 grid of graphs in Python and am struggling with the axes. This is what I get so far: the axes on each subplot are messed up.
This is my code:
def plotCarBar(df):
    fig = plt.figure()
    j = 1
    for i in pandaDF.columns[15:18]:
        cat_count = df.groupby(i)[i].count().sort_values().plot(figsize=(12, 12), kind='line')
        ax = fig.add_subplot(2, 2, j)
        j += 1
    return ax.plot(lw=1.3)
plotCarBar(pandaDF)
Can someone please help? Thanks in advance!

I am not sure you need two counters (i and j). If you post some sample data, we may be able to make better sense of what your cat_count line is doing.
Generally, I would also recommend using matplotlib directly, unless you're really just doing some quick and dirty plotting in pandas.
So, something like this might work:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
randoms = np.random.rand(10, 4) # generate some data
print(randoms)
fig = plt.figure()
for i in range(1, randoms.shape[1] + 1): # one subplot per column
    ax = fig.add_subplot(2, 2, i)
    ax.plot(randoms[:, i - 1]) # subplot numbers start at 1, column indices at 0
plt.show()
Output:
[[0.78436298 0.85009767 0.28524816 0.28137471]
[0.58936976 0.00614068 0.25312449 0.58549765]
[0.24216048 0.13100618 0.76956316 0.66210005]
[0.95156085 0.86171181 0.40940887 0.47077143]
[0.91523306 0.33833055 0.74360696 0.2322519 ]
[0.68563804 0.69825892 0.5836696 0.97711073]
[0.62709986 0.44308186 0.24582971 0.97697002]
[0.04356271 0.01488111 0.73322443 0.04890864]
[0.9090653 0.25895051 0.73163902 0.83620635]
[0.51622846 0.6735348 0.20570992 0.13803589]]
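For the plot in the question specifically, a sketch along the following lines may help, assuming the goal is one value-count line plot per column. It creates the grid first and hands each axes to pandas through the ax= argument, so pandas draws into that subplot instead of into a separate figure. The function name plot_category_counts and the demo frame are made up for illustration (they stand in for plotCarBar and pandaDF), and value_counts() is used here in place of the groupby count.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_category_counts(df, columns, nrows=2, ncols=2):
    """Draw one value-count line plot per column on a shared grid of axes."""
    # squeeze=False keeps `axes` two-dimensional for any grid size.
    fig, axes = plt.subplots(nrows, ncols, figsize=(12, 12), squeeze=False)
    flat_axes = axes.ravel()
    for ax, col in zip(flat_axes, columns):
        counts = df[col].value_counts().sort_values()
        counts.plot(kind="line", ax=ax, lw=1.3)  # draw into this subplot
        ax.set_title(col)
    # Hide any grid cells that did not receive a plot.
    for ax in flat_axes[len(columns):]:
        ax.set_visible(False)
    return fig, axes


# Hypothetical stand-in for the asker's pandaDF.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "colour": rng.choice(["red", "green", "blue"], size=50),
    "size": rng.choice(["S", "M", "L", "XL"], size=50),
    "doors": rng.choice(["2", "4"], size=50),
})
fig, axes = plot_category_counts(demo, list(demo.columns))
```

Call plt.show() afterwards to display the figure. With three columns on a 2x2 grid, the fourth cell is simply hidden.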

Related

How to create and save distinct scatterplots using matplotlib and nested 'for-loops' for labelled data?

I have a dataset containing 10 features and corresponding labels. I am using scatter plots of each distinct pair of features to see which pairs describe the labels best (45 plots in total). To do that, I used nested loops. The code raises no error and I obtained all the plots. However, there is clearly something wrong with the code, because each new scatter plot that gets created and saved also accumulates the points from the previous ones. I am attaching the complete code I used. How can I fix this problem? Below is the link to the raw dataset:
https://github.com/IITGuwahati-AI/Learning-Content/raw/master/Phase%203%20-%202020%20(Summer)/Week%201%20(Mar%2028%20-%20Apr%204)/assignment/data.txt
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
data_url ='https://raw.githubusercontent.com/diwakar1412/Learning-Content/master/DiwakarDas_184104503/datacsv.csv'
df = pd.read_csv(data_url)
df.head()
def transform_label(value):
    if value >= 2:
        return "BLUE"
    else:
        return "RED"
df["Label"] = df.Label.apply(transform_label)
df.head()
colors = {'RED':'r', 'BLUE':'b'}
fig, ax = plt.subplots()
for i in range(1,len(df.columns)):
    for j in range(i+1,len(df.columns)):
        for k in range(len(df[str(i)])):
            ax.scatter(df[str(i)][k], df[str(j)][k], color=colors[df['Label'][k]])
        ax.set_title('F%svsF%s' %(i,j))
        ax.set_xlabel('%s' %i)
        ax.set_ylabel('%s' %j)
        plt.savefig('F%svsF%s' %(i,j))
You have to create a new figure each time. Try to put
fig, ax = plt.subplots()
inside your loop:
for i in range(1,len(df.columns)):
    for j in range(i+1,len(df.columns)):
        fig, ax = plt.subplots() # <-------------- here
        for k in range(len(df[str(i)])):
            ax.scatter(df[str(i)][k], df[str(j)][k], color=colors[df['Label'][k]])
        ax.set_title('F%svsF%s' %(i,j))
        ax.set_xlabel('%s' %i)
        ax.set_ylabel('%s' %j)
        plt.savefig('/Users/Alessandro/Desktop/tmp/F%svsF%s' %(i,j))
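With 45 figures it may also be worth closing each one after saving, and the innermost per-point loop can probably be replaced by a single scatter call per pair. The following is a sketch under those assumptions, not the answerer's code: save_pair_plots and the demo frame are hypothetical, and the demo writes into a temporary directory rather than a real output folder.

```python
import itertools
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def save_pair_plots(df, out_dir, label_col="Label", colors=None):
    """Save one scatter plot per pair of feature columns; return the file paths."""
    colors = colors or {"RED": "r", "BLUE": "b"}
    features = [c for c in df.columns if c != label_col]
    point_colors = df[label_col].map(colors).tolist()
    paths = []
    for x_name, y_name in itertools.combinations(features, 2):
        fig, ax = plt.subplots()  # fresh figure, so nothing carries over
        ax.scatter(df[x_name], df[y_name], c=point_colors)  # all rows in one call
        ax.set_title(f"F{x_name}vsF{y_name}")
        ax.set_xlabel(x_name)
        ax.set_ylabel(y_name)
        path = os.path.join(out_dir, f"F{x_name}vsF{y_name}.png")
        fig.savefig(path)
        plt.close(fig)  # release the figure once it is on disk
        paths.append(path)
    return paths


# Made-up data laid out like the question: feature columns "1".."4" plus "Label".
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(30, 4)), columns=["1", "2", "3", "4"])
demo["Label"] = rng.choice(["RED", "BLUE"], size=30)
with tempfile.TemporaryDirectory() as tmp:
    saved = save_pair_plots(demo, tmp)
    files_written = all(os.path.exists(p) for p in saved)
```

Four feature columns give six pairs, so six files are written; ten features would give the 45 plots mentioned in the question.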

How to make dynamic subplots (optimization of code)

I am making a figure with subplots. The number of subplots is dynamic and depends on df.shape.
This works, but I am not satisfied. I have 3 questions:
1) Is it possible to optimize the plotting part? The if k == b check is a bit annoying.
2) How can I delete the last 3 empty subplots?
3) I was thinking of making a static figure (size=4,4) and opening a new one once the figure is full. How can I do this?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import string
import random
# got generator from here:
# https://stackoverflow.com/questions/2257441/random-string-generation-with-upper-case-letters-and-digits
def id_generator(size=4, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))
#%% make random data
labels = []
for i in range(0,27):
    labels.append(id_generator())
mat = np.random.rand(20,27)
df = pd.DataFrame(mat,columns=labels)
#%% plot
k = 0 #go left
l=0 #go down
b = 5 #static number for columns
a = math.ceil(len(labels)/5) #round up for 'go down'
fig, axs = plt.subplots(
    a,b, figsize=(10, 10),sharex=True, constrained_layout=True
)
for j in labels:
    axs[l,k].plot(df[j])
    k+=1
    if k == b:
        k = 0
        l+=1
Edit:
With the help of Chris A, the 3 questions above are solved.
I found out how to change the xlabel, xlim and title.
Is it possible to move the legend to the top left corner?
I can't find anything in the documentation. Also, how can I pass in a list of ylabels?
df.index.name = 'xlabel'
fig = df.plot(subplots=True, title= 'Make title',y=labels,layout=(-1, 5), figsize=(10, 10),grid=True,xlim=[0,20]) #xticks=[0,5,10,15,20]
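One possible way to handle the grid, the empty subplots, the legend position and the ylabels in plain matplotlib is sketched below. This is not Chris A's answer (which is not reproduced here); plot_columns_grid, the C00..C26 column names and the ylabels list are hypothetical. The legend is placed with ax.legend(loc="upper left"), and the unused axes are removed after flattening the axes array, which also avoids the k/l bookkeeping.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_columns_grid(df, ncols=5, ylabels=None):
    """Plot every column of df in its own subplot on a grid with ncols columns."""
    nrows = math.ceil(len(df.columns) / ncols)
    fig, axs = plt.subplots(
        nrows, ncols, figsize=(10, 10), sharex=True,
        squeeze=False, constrained_layout=True,
    )
    flat = axs.ravel()  # one flat sequence instead of row/column counters
    for idx, (ax, name) in enumerate(zip(flat, df.columns)):
        ax.plot(df[name], label=name)
        ax.legend(loc="upper left")  # legend in the top left corner
        if ylabels is not None:
            ax.set_ylabel(ylabels[idx])
    for ax in flat[len(df.columns):]:
        ax.remove()  # drop the unused trailing subplots
    return fig


# Made-up data: 27 columns, as in the question.
names = [f"C{n:02d}" for n in range(27)]
df = pd.DataFrame(np.random.rand(20, 27), columns=names)
ylabels = [f"unit {n}" for n in range(27)]
fig = plot_columns_grid(df, ylabels=ylabels)
```

Note that with sharex=True the x tick labels are drawn only on the bottom row, so the columns above the removed axes end up without them. The axes array returned by df.plot(subplots=True, ...) can presumably be looped over in the same way to set each legend and ylabel.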

Page break inside subplot? Matplotlib subplots over multiple pages

I want to create a Python program that can plot multiple graphs into one PDF file; however, the number of subplots is variable. I already did this with one plot per page. However, since I sometimes have around 100 plots, that makes for a lot of scrolling and is not very clear. Therefore I would like to get something like 5x4 subplots per page.
I already wrote code for that. The whole code is long, and since I'm very new to Python it probably looks terrible to someone who knows what they are doing, but the plotting part looks like this:
rows = (len(tags))/5
fig = plt.figure()
count = 0
for keyInTags in tags:
    count = count + 1
    ax = fig.add_subplot(int(rows), 5, count)
    ax.set_title("cell" + keyInTags)
    ax.plot(x, y_green, color='k')
    ax.plot(x, y_red, color='k')
plt.subplots_adjust(hspace=0.5, wspace=0.3)
pdf.savefig(fig)
The idea is that I get a PDF with all "cells" (it's for biological research) plotted. The code I wrote is working fine so far; however, if I have more than 4 rows of subplots I would like to do a "page break". In some cases I have over 21 rows on one page, which makes it impossible to see anything.
So, is there a way to, for example, tell Python to do a page break after 4 rows? In the case with 21 rows I'd like to have 6 pages with nicely visible plots. Or is it done by making 5x4 plots and then iterating somehow over the file?
I would be really happy if someone could help a little or give a hint. I have been sitting here for 4 hours without finding a solution.
A. Loop over pages
You could find out how many pages you need (npages) and create a new figure per page.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
tags = ["".join(np.random.choice(list("ABCDEFG123"), size=5)) for _ in range(53)]
N = len(tags) # number of subplots
nrows = 5 # number of rows per page
ncols = 4 # number of columns per page
# calculate number of pages needed
npages = N // (nrows*ncols)
if N % (nrows*ncols) > 0:
    npages += 1
pdf = PdfPages('out2.pdf')
for page in range(npages):
    fig = plt.figure(figsize=(8,11))
    for i in range(min(nrows*ncols, N-page*(nrows*ncols))):
        # Your plot here
        count = page*ncols*nrows+i
        ax = fig.add_subplot(nrows, ncols, i+1)
        ax.set_title(f"{count} - {tags[count]}")
        ax.plot(np.cumsum(np.random.randn(33)))
        # end of plotting
    fig.tight_layout()
    pdf.savefig(fig)
pdf.close()
plt.show()
B. Loop over data
Or alternatively you could loop over the tags themselves and create a new figure once it's needed:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
tags = ["".join(np.random.choice(list("ABCDEFG123"), size=5)) for _ in range(53)]
nrows = 5 # number of rows per page
ncols = 4 # number of columns per page
pdf = PdfPages('out2.pdf')
for i, tag in enumerate(tags):
    j = i % (nrows*ncols)
    if j == 0:
        fig = plt.figure(figsize=(8,11))
    ax = fig.add_subplot(nrows, ncols,j+1)
    ax.set_title(f"{i} - {tags[i]}")
    ax.plot(np.cumsum(np.random.randn(33)))
    # end of plotting
    if j == (nrows*ncols)-1 or i == len(tags)-1:
        fig.tight_layout()
        pdf.savefig(fig)
pdf.close()
plt.show()
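With around 100 plots, it may help to close each page's figure once it has been saved and to let a with block finalise the PDF. A variant of approach A under those assumptions is sketched below; save_paged_grid, demo_tags and the temporary output path are hypothetical names, and the random walk is only placeholder data.

```python
import math
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages


def save_paged_grid(tags, path, nrows=5, ncols=4):
    """Write one nrows x ncols page of subplots per chunk of tags; return the page count."""
    per_page = nrows * ncols
    npages = math.ceil(len(tags) / per_page)
    with PdfPages(path) as pdf:  # the file is finalised when the block exits
        for page in range(npages):
            chunk = tags[page * per_page:(page + 1) * per_page]
            fig = plt.figure(figsize=(8, 11))
            for i, tag in enumerate(chunk):
                ax = fig.add_subplot(nrows, ncols, i + 1)
                ax.set_title(f"cell {tag}")
                ax.plot(np.cumsum(np.random.randn(33)))  # placeholder data
            fig.tight_layout()
            pdf.savefig(fig)
            plt.close(fig)  # free the page's figure before starting the next one
    return npages


demo_tags = [f"T{n:02d}" for n in range(53)]
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "paged.pdf")
    pages = save_paged_grid(demo_tags, out_path)
    pdf_size = os.path.getsize(out_path)
```

For 53 tags at 20 subplots per page this produces 3 pages, the last one only partly filled.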
You can use matplotlib's PdfPages as follows.
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
import numpy as np
pp = PdfPages('multipage.pdf')
x=np.arange(1,10)
y=np.arange(1,10)
fig=plt.figure()
ax1=fig.add_subplot(211)
# ax1.set_title("cell" + keyInTags)
# ax1.plot(x, y, color='k')
# ax.plot(x, y_red, color='k')
ax2=fig.add_subplot(212)
pp.savefig(fig)
fig2=plt.figure()
ax1=fig2.add_subplot(321)
ax1.plot(x, y, color='k')
ax2=fig2.add_subplot(322)
ax2.plot(x, y, color='k')
ax3=fig2.add_subplot(313)
pp.savefig(fig2)
pp.close()
Experiment with these subplot numbers a little to see how they determine which graph goes where.

Python Animated plotting, one point at a time

I have a set of points [index, minimum] and I would like to scatter one point i (index[i],minimum[i]) at a time so that I can see the evolution of the plot.
I would like to know how I can do that. I have tried a time delay like:
plt.figure()
for i in range (np.size(index)):
    plt.plot(index[i], minimum[i],'*')
    plt.show()
    time.sleep(1)
It did not work.
Thanks in advance.
This might seem like a silly question, but did you import the time library? Also, there is no indentation: is your code really like that, or is that a copy/paste failure?
Edit: Answer in comments, use plt.pause(1), see http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.pause
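A minimal sketch of the plt.pause approach, assuming index and minimum are equal-length sequences of numbers. The function name animate_points and its pause parameter are made up for illustration. The axis limits are fixed up front so the view does not rescale as points arrive.

```python
import matplotlib.pyplot as plt


def animate_points(index, minimum, pause=1.0):
    """Add one point at a time, redrawing the figure between points."""
    fig, ax = plt.subplots()
    # Fix the limits first so the axes do not jump while points are added.
    ax.set_xlim(min(index) - 1, max(index) + 1)
    ax.set_ylim(min(minimum) - 1, max(minimum) + 1)
    for x, y in zip(index, minimum):
        ax.plot(x, y, "*")
        plt.pause(pause)  # redraws and waits, unlike time.sleep
    return ax
```

For example, animate_points(range(10), [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]) should show ten points appearing one second apart when run with an interactive backend.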
You should use an animated plot:
http://matplotlib.org/api/animation_api.html
and here are some good examples:
http://matplotlib.org/examples/animation/index.html
You have to use a NaN array to plot the empty values, then update the array as you move through time. Here is a working example:
import numpy as np
import matplotlib.pyplot as plt
import time
nbPoints = 100
nanArray = np.array(np.ones(nbPoints))
nanArray[:] = np.nan
index = range(nbPoints)
minimum = np.random.randint(5, size=nbPoints)
minimumPlotData = nanArray
fig = plt.figure()
ax = plt.subplot(111)
ax.set_xlim(0, nbPoints)
ax.set_ylim(min(minimum), max(minimum))
li, = ax.plot(index,minimumPlotData, marker = 'o', linestyle="")
fig.canvas.draw()
plt.show(block=False)
for i in range(nbPoints):
    minimumPlotData[i]=minimum[i]
    li.set_ydata(minimumPlotData)
    fig.canvas.draw()
    time.sleep(1)

Plotting profile histograms in python

I am trying to make a profile plot for two columns of a pandas.DataFrame. I would not expect this to be in pandas directly but it seems there is nothing in matplotlib either. I have searched around and cannot find it in any package other than rootpy. Before I take the time to write this myself I thought I would ask if there was a small package that contained profile histograms, perhaps where they are known by a different name.
If you don't know what I mean by "profile histogram" have a look at the ROOT implementation. http://root.cern.ch/root/html/TProfile.html
You can easily do it using scipy.stats.binned_statistic.
import scipy.stats
import numpy
import matplotlib.pyplot as plt
x = numpy.random.rand(10000)
y = x + scipy.stats.norm(0, 0.2).rvs(10000)
means_result = scipy.stats.binned_statistic(x, [y, y**2], bins=50, range=(0,1), statistic='mean')
means, means2 = means_result.statistic
standard_deviations = numpy.sqrt(means2 - means**2)
bin_edges = means_result.bin_edges
bin_centers = (bin_edges[:-1] + bin_edges[1:])/2.
plt.errorbar(x=bin_centers, y=means, yerr=standard_deviations, linestyle='none', marker='.')
Use seaborn. Data as from @MaxNoe
import numpy as np
import seaborn as sns
# just some random numbers to get started
x = np.random.uniform(-2, 2, 10000)
y = np.random.normal(x**2, np.abs(x) + 1)
sns.regplot(x=x, y=y, x_bins=10, fit_reg=None)
You can do much more (error bands are from bootstrap, you can change the estimator on the y-axis, add regression, ...)
While @Keith's answer seems to fit what you mean, it is quite a lot of code. I think this can be done much more simply, so that one gets the key concepts and can adjust and build on top of them.
Let me stress one thing: what ROOT calls a ProfileHistogram is not a special kind of plot. It is an errorbar plot, which can easily be made in matplotlib.
It is a special kind of computation, and that is not the task of a plotting library. This lies in the pandas realm, and pandas is great at stuff like this. It is symptomatic of ROOT, as the giant monolithic pile it is, to have an extra class for this.
So what you want to do is: discretize in some variable x and, for each bin, calculate something in another variable y.
This can easily be done using np.digitize together with the pandas groupby and aggregate methods.
Putting it all together:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# just some random numbers to get started
x = np.random.uniform(-2, 2, 10000)
y = np.random.normal(x**2, np.abs(x) + 1)
df = pd.DataFrame({'x': x, 'y': y})
# calculate which bin each row belongs to, based on `x`
# np.digitize needs the bin edges, so this gives us 100 equally sized bins
bins = np.linspace(-2, 2, 101)
df['bin'] = np.digitize(x, bins=bins)
bin_centers = 0.5 * (bins[:-1] + bins[1:])
bin_width = bins[1] - bins[0]
# group by bin, so we can calculate stuff
binned = df.groupby('bin')
# calculate mean and standard error of the mean for y in each bin
result = binned['y'].agg(['mean', 'sem'])
# look up each center by bin number (digitize counts from 1), so an empty bin cannot misalign them
result['x'] = bin_centers[result.index.to_numpy() - 1]
result['xerr'] = bin_width / 2
# plot it
result.plot(
    x='x',
    y='mean',
    xerr='xerr',
    yerr='sem',
    linestyle='none',
    capsize=0,
    color='black',
)
plt.savefig('result.png', dpi=300)
Just like ROOT ;)
I made a module myself for this functionality.
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
def Profile(x,y,nbins,xmin,xmax,ax):
    df = DataFrame({'x' : x , 'y' : y})
    binedges = xmin + ((xmax-xmin)/nbins) * np.arange(nbins+1)
    df['bin'] = np.digitize(df['x'],binedges)
    bincenters = xmin + ((xmax-xmin)/nbins)*np.arange(nbins) + ((xmax-xmin)/(2*nbins))
    ProfileFrame = DataFrame({'bincenters' : bincenters, 'N' : df['bin'].value_counts(sort=False)},index=range(1,nbins+1))
    bins = ProfileFrame.index.values
    for bin in bins:
        # label-based access with .loc (the .ix indexer is gone from current pandas)
        ProfileFrame.loc[bin,'ymean'] = df.loc[df['bin']==bin,'y'].mean()
        ProfileFrame.loc[bin,'yStandDev'] = df.loc[df['bin']==bin,'y'].std()
        ProfileFrame.loc[bin,'yMeanError'] = ProfileFrame.loc[bin,'yStandDev'] / np.sqrt(ProfileFrame.loc[bin,'N'])
    # fmt='none' draws the error bars without markers or a connecting line
    ax.errorbar(ProfileFrame['bincenters'], ProfileFrame['ymean'], yerr=ProfileFrame['yMeanError'], xerr=(xmax-xmin)/(2*nbins), fmt='none')
    return ax
def Profile_Matrix(frame):
    #Much of this is stolen from https://github.com/pydata/pandas/blob/master/pandas/tools/plotting.py
    from matplotlib.artist import setp
    range_padding=0.05
    df = frame._get_numeric_data()
    n = df.columns.size
    # public matplotlib/pandas calls stand in for the private pandas plotting helpers
    fig, axes = plt.subplots(nrows=n, ncols=n, squeeze=False)
    # no gaps between subplots
    fig.subplots_adjust(wspace=0, hspace=0)
    mask = pd.notnull(df)
    boundaries_list = []
    for a in df.columns:
        values = df[a].values[mask[a].values]
        rmin_, rmax_ = np.min(values), np.max(values)
        rdelta_ext = (rmax_ - rmin_) * range_padding / 2.
        boundaries_list.append((rmin_ - rdelta_ext, rmax_+ rdelta_ext))
    for i, a in enumerate(df.columns):
        for j, b in enumerate(df.columns):
            common = (mask[a] & mask[b]).values
            nbins = 100
            (xmin,xmax) = boundaries_list[i]
            ax = axes[i, j]
            Profile(df[a][common],df[b][common],nbins,xmin,xmax,ax)
            ax.set_xlabel(b)
            ax.set_ylabel(a)
            # only the left column and the bottom row keep their axis decorations
            if j!= 0:
                ax.yaxis.set_visible(False)
            if i != n-1:
                ax.xaxis.set_visible(False)
    for ax in axes.flat:
        setp(ax.get_xticklabels(), fontsize=8)
        setp(ax.get_yticklabels(), fontsize=8)
    return axes
To my knowledge, matplotlib still doesn't allow you to produce profile histograms directly.
You can instead take a look at Hippodraw, a package developed at SLAC, which can be used as a Python extension module.
Here is a profile histogram example:
http://www.slac.stanford.edu/grp/ek/hippodraw/datareps_root.html#datareps_profilehist
