Standardize axis as multiple graphs generated from dataframe - python

I have a seemingly simple problem of standardizing and labeling my axis on a series of graphs I am creating from a DataFrame. This dataframe contains a column with a sort of ID and each row contains a value for x and a value for y. I am generating a separate graph for each ID; however, I would like a standard axis across all of these graphs. Here is my code:
groups = data.groupby('Pedigree')
for Pedigree,group in groups:
    group.plot(x='EnvironmentalIndex',y='GrainYield',marker='o',linestyle='',color ='white',label=Pedigree)
    plt.plot([0,250],[0,250],linestyle = 'dashed',color='black')
    x = group.EnvironmentalIndex
    y = group.GrainYield
    z = np.polyfit(x,y,1)
    p = np.poly1d(z)
    q = sum(y)/len(y)
    plt.plot(x,p(x),color='green')
    plt.text(25,220,'Stability=%.6f'%(z[0]))
    plt.text(25,205,'Mean Yield=%.6f'%(q))
I know there is an axes function in Matplotlib, but I can't get the formatting right so that it plays well with the for loop. I have tried inserting a
group.axes()
inside of the for loop but I get the error that the list object is not callable.

If by "standard" you mean having the same ticks, there are several ways of doing this. One, if you don't have a lot of plots, is to create subplots that share the same x-axis:
no_rows = len(data.groupby('Pedigree'))
no_columns = 1
fig, ax = plt.subplots(no_rows, no_columns, sharex = True)
ax = ax.reshape(-1)
count = 0
for Pedigree,group in groups:
    ...
    q = sum(y)/len(y)
    ax[count].plot(x,p(x),color='green')
    ax[count].text(25,220,'Stability=%.6f'%(z[0]))
    ax[count].text(25,205,'Mean Yield=%.6f'%(q))
    count+=1
With sharex = True, all plots use the same x ticks and the tick labels are drawn only on the bottom plot. You can also define a different number of columns, but make sure no_rows * no_columns >= the number of plots.
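A minimal sketch of the shared-axis idea, under stated assumptions: the column names follow the question, but the data, the three pedigree labels, and the 0 to 250 limits are made up for illustration. Passing sharex=True and sharey=True links the panels, so setting the limits on one of them should fix them for all.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's DataFrame.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Pedigree": np.repeat(["A", "B", "C"], 10),
    "EnvironmentalIndex": rng.uniform(20, 230, 30),
    "GrainYield": rng.uniform(20, 230, 30),
})

groups = data.groupby("Pedigree")
# squeeze=False keeps `axes` two-dimensional even if there is only one group.
fig, axes = plt.subplots(len(groups), 1, sharex=True, sharey=True,
                         squeeze=False, figsize=(5, 3 * len(groups)))

for ax, (pedigree, group) in zip(axes.ravel(), groups):
    x = group["EnvironmentalIndex"]
    y = group["GrainYield"]
    slope, intercept = np.polyfit(x, y, 1)      # linear fit per pedigree
    ax.plot(x, y, marker="o", linestyle="", label=pedigree)
    ax.plot(x, slope * x + intercept, color="green")
    ax.plot([0, 250], [0, 250], linestyle="dashed", color="black")
    ax.text(25, 220, "Stability=%.6f" % slope)
    ax.text(25, 205, "Mean Yield=%.6f" % y.mean())
    ax.legend(loc="lower right")

# The axes are shared, so one call sets the limits for every panel.
axes[0, 0].set_xlim(0, 250)
axes[0, 0].set_ylim(0, 250)
```

Call fig.savefig(...) or plt.show() afterwards to see the result. If separate figures are preferred over subplots, calling set_xlim and set_ylim with the same values on each figure's axes inside the loop gives the same standardization.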

Related

'numpy.ndarray' object has no attribute 'twinx' [duplicate]

The information below may be superfluous if you are trying to understand the error message. Please start off by reading the answer by @user707650.
Using MatPlotLib, I wanted a generalizable script that creates the following from my data.
A window containing "a" subplots, arranged so that there are "b" subplots per column. I want to be able to change the values of a and b.
If I have data for 2a subplots, I want 2 windows, each with the previously described "a" subplots arranged with "b" subplots per column.
The x and y data I am plotting are floats stored in np.arrays and are structured as follows:
The x data is always the same for all plots and is of length 6.
'x_vector': [0.000, 0.005, 0.010, 0.020, 0.030, 0.040]
The y data of all plots are stored in y_vector, where the data for the first plot is stored at indexes 0 through 5. The data for the second plot is stored at indexes 6 through 11. The third plot gets 12-17, the fourth 18-23, and so on.
In total, for this dataset, I have 91 plots (i.e. 91*6 = 546 values stored in y_vector).
Attempt:
import matplotlib.pyplot as plt
# Options:
plots_tot = 14 # Total number of plots. In reality there is going to be 7*13 = 91 plots.
location_of_ydata = 6 # The values for the n:th plot can be found in the y_vector at index 'n*6' through 'n*6 + 6'.
plots_window = 7 # Total number of plots per window.
rows = 2 # Number of rows, i.e. number of subplots per column.
# Calculating number of columns:
prim_cols = plots_window // rows  # integer division, also on Python 3
extra_cols = 0
if plots_window % rows > 0:
    extra_cols = 1
cols = prim_cols + extra_cols
print('cols:', cols)
print('rows:', rows)
# Plotting:
n=0
x=0
fig, ax = plt.subplots(rows, cols)
while x <= plots_tot:
    ax[x].plot(x_vector, y_vector[n:(n+location_of_ydata)], 'ro')
    if x % plots_window == plots_window - 1:
        plt.show() # New window for every 7 plots.
    n = n+location_of_ydata
    x = x+1
I get the following error:
cols: 4
rows: 2
Traceback (most recent call last):
  File "Script.py", line 222, in <module>
    ax[x].plot(x_vector, y_vector[n:(n+location_of_ydata)], 'ro')
AttributeError: 'numpy.ndarray' object has no attribute 'plot'
If you debug your program by simply printing ax, you'll quickly find out that ax is a two-dimensional array: one dimension for the rows, one for the columns.
Thus, you need two indices to index ax to retrieve the actual AxesSubplot instance, like:
ax[1,1].plot(...)
If you want to iterate through the subplots the way you do now, flatten ax first:
ax = ax.flatten()
and now ax is a one-dimensional array. flatten works in row-major order by default, so the first row is stepped through first, then the second row, and so on. If you want to step through column by column instead, use the transpose:
ax = ax.T.flatten()
Of course, by now it makes more sense to simply create each subplot on the fly, because that already has an index, and the other two numbers are fixed:
for x in range(plots_tot):
    ax = plt.subplot(nrows, ncols, x+1)
Note: you have x <= plots_tot, but with x starting at 0, you'll get an IndexError next with your current code (after flattening your array). Matplotlib is (unfortunately) 1-indexed for subplots. I prefer using a 0-indexed variable (Python style), and just add +1 for the subplot index (like above).
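A rough sketch of how the flattening approach could be combined with one figure per batch of plots. The option values come from the question; y_vector is filled with made-up numbers, and the names points_per_plot, figures, flat and slot are introduced here purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

x_vector = np.array([0.000, 0.005, 0.010, 0.020, 0.030, 0.040])
points_per_plot = len(x_vector)   # 6 values per plot
plots_tot = 14                    # total number of plots
plots_window = 7                  # plots per window (figure)
rows = 2                          # subplots per column

# Made-up y data: plots_tot consecutive chunks of 6 values each.
y_vector = np.arange(plots_tot * points_per_plot, dtype=float)

cols = -(-plots_window // rows)   # ceiling division, 4 columns here

figures = []
for start in range(0, plots_tot, plots_window):
    fig, ax = plt.subplots(rows, cols, squeeze=False)
    flat = ax.flatten()           # row-major: first row, then second row
    stop = min(start + plots_window, plots_tot)
    for slot, plot_index in enumerate(range(start, stop)):
        n = plot_index * points_per_plot
        flat[slot].plot(x_vector, y_vector[n:n + points_per_plot], "ro")
    for unused in flat[stop - start:]:
        unused.set_visible(False)  # hide grid cells that have no data
    figures.append(fig)
```

Each entry in figures corresponds to one window. Calling plt.show() once at the end (with an interactive backend) would open them all.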
The problem here is with how matplotlib handles subplots. Just do the following:
fig, axes = plt.subplots(nrows=1, ncols=2)
for axis in axes:
    print(type(axis))
Each printed item is a matplotlib Axes object, because axes is a 1D array that can be traversed using a single index, i.e. axes[0], axes[1], and so on. But if you do
fig, axes = plt.subplots(nrows=2, ncols=2)
for axis in axes:
    print(type(axis))
each printed item is a numpy ndarray (one row of subplots), because axes is now a 2D array whose subplots can only be reached using two indices, i.e. axes[0, 0], axes[1, 0], and so on. So be mindful of how you write the for loop that traverses the axes object.
If you use N by 1 graphs, for example fig, ax = plt.subplots(3, 1), then index with a single counter, like ax[plot_count].plot(...)
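One way to sidestep the shape differences, sketched here with arbitrary grid sizes: plt.subplots accepts squeeze=False, which always returns a 2D array of axes, and ravel() then gives a flat sequence that a single loop can walk. The shapes dictionary is only there to record what comes back.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

a = np.linspace(0, 1, 20)
shapes = {}
for nrows, ncols in [(1, 1), (1, 2), (3, 1), (2, 2)]:
    # squeeze=False: `axes` is always a 2D array of shape (nrows, ncols).
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, squeeze=False)
    shapes[(nrows, ncols)] = axes.shape
    for ax in axes.ravel():       # one loop works for every grid size
        ax.plot(a, a ** 2)
    plt.close(fig)
```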
The axes are in 2-d, not 1-d so you can't iterate through using one loop. You need one more loop:
fig,axes=plt.subplots(nrows=2,ncols=2)
plt.tight_layout()
for ho in axes:
    for i in ho:
        i.plot(a,a**2)
This gives no problem but if I try:
for i in axes:
    i.plot(a,a**2)
the error occurs.

How to make a heatmap in python with aggregated/summarized data?

I'm trying to plot some X and Z coordinates on an image to show which parts of the image have higher counts. Y values are height in this case, so I am excluding them since I want 2D. Since I have many millions of data points, I have grouped by the combinations of X and Z coordinates and counted how many times each combination occurred. The data should contain almost all combinations of X and Z coordinates. It looks something like this (fake data):
I have experimented with matplotlib.pyplot by using the plt.hist2d(x,y) function but it seems like this takes raw data and not already-summarized data like I've got.
Does anyone know if this is possible?
Note: I can figure out the plotting on an image part later, first I'm trying to get the scatter-plot/heatmap to show aggregated data.
I managed to figure this out. After loading in the data in the format of the original post, step one is pivoting the data so you have x values as columns and z values as rows. Then you plot it using seaborn heatmap. See below:
#pivot columns
values = pd.pivot_table(raw, values='COUNT_TICKS', index=['Z_LOC'], columns = ['X_LOC'], aggfunc=np.sum)
plt.figure(figsize=(20, 20))
sns.set(rc={'axes.facecolor':'cornflowerblue', 'figure.facecolor':'cornflowerblue'})
#ax = sns.heatmap(values, vmin=100, vmax=5000, cmap="Oranges", robust = True, xticklabels = x_labels, yticklabels = y_labels, alpha = 1)
ax = sns.heatmap(values,
                 #vmin=1,
                 vmax=1000,
                 cmap="Greens", #BrBG is also good
                 robust = True,
                 alpha = 1)
plt.show()
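For reference, a small sketch of two ways to draw already-aggregated counts with matplotlib alone, swapping in pcolormesh for the seaborn heatmap used above. The column names follow the answer, the six rows of data are invented, and the bin edges assume integer coordinates. The second panel relies on the weights argument of hist2d, which lets each (x, z) pair contribute its count instead of 1.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical pre-aggregated data: one row per (X_LOC, Z_LOC) pair.
raw = pd.DataFrame({
    "X_LOC": [0, 0, 1, 1, 2, 2],
    "Z_LOC": [0, 1, 0, 1, 0, 1],
    "COUNT_TICKS": [5, 10, 20, 40, 80, 160],
})

# Option 1: pivot to a Z-by-X grid and draw the grid directly.
values = raw.pivot_table(values="COUNT_TICKS", index="Z_LOC",
                         columns="X_LOC", aggfunc="sum")
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
mesh = ax1.pcolormesh(values.columns.to_numpy(), values.index.to_numpy(),
                      values.to_numpy(), shading="nearest", cmap="Greens")
fig.colorbar(mesh, ax=ax1)

# Option 2: pass the counts as weights so hist2d sums them per bin.
x_edges = np.arange(-0.5, 3.5)   # edges -0.5, 0.5, 1.5, 2.5: one bin per X
z_edges = np.arange(-0.5, 2.5)   # edges -0.5, 0.5, 1.5: one bin per Z
counts, _, _, image = ax2.hist2d(raw["X_LOC"], raw["Z_LOC"],
                                 bins=[x_edges, z_edges],
                                 weights=raw["COUNT_TICKS"], cmap="Greens")
fig.colorbar(image, ax=ax2)
```

Note that hist2d returns counts indexed as [x, z], so it is the transpose of the pivoted table.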

plotting multiple figures in python

Perhaps the title is misleading, or someone will come along telling me that this is a duplicate. However, after many (many) hours of browsing I haven't found anything. I want to plot multiple scatter diagrams and merge them into a subplot grid for some nrows and ncols.
Assume that we have the following:
new_list=[]
for j in list(set(lala)):
    df1 = df[df['Date'] == j]
    df1.drop('Date', axis = 1, inplace = True)
    df2 = df1.groupby('Z').mean()
    df2.reset_index(inplace = True)
    new_list.append(df2)
for j in range(0, len(new_list)):
    plt.figure(figsize=(6, 6), dpi=80)
    plt.scatter(new_list[j]['X'],new_list[j]['Y'])
and let me explain a little bit of what it does; I create a list called new_list, which contains data frames constructed in the for loop (you can ignore the construction since I'm asking for a global approach). Afterwards, I print scatter diagrams (in total as many as the number of elements of new_list) for each data frame in new_list.
Because the number of the printouts is big, I want to create subplots off these printouts to make the final image easier for the eye.
So how can I take all these scatter diagrams and merge them up into a subplot for some nrows and ncols?
Assuming you have 4 rows and 10 columns, you can do something like this (just one way of doing it). Here flatten returns a 1D array of 40 axis objects (4 x 10) in row order: the ten columns of the first row first, then the ten columns of the second row, and so on.
fig, axes = plt.subplots(nrows=4, ncols=10)
for i, ax in enumerate(axes.flatten()):
    ax.scatter(new_list[i]['X'],new_list[i]['Y'])
If you don't want to use enumerate, alternatively you can also use the following
fig, axes = plt.subplots(nrows=4, ncols=10)
ax = axes.flatten()
for j in range(0, len(new_list)):
    ax[j].scatter(new_list[j]['X'],new_list[j]['Y'])
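When the number of data frames does not fill the grid exactly, a variation along these lines may help. It is only a sketch: new_list is replaced by seven random data frames, and the choice of three columns is arbitrary. zip stops at the shorter sequence, so no index goes out of range, and the leftover cells are switched off.

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for new_list: 7 data frames with X and Y columns.
rng = np.random.default_rng(1)
new_list = [pd.DataFrame({"X": rng.random(15), "Y": rng.random(15)})
            for _ in range(7)]

ncols = 3
nrows = math.ceil(len(new_list) / ncols)   # 3 rows are needed for 7 frames
fig, axes = plt.subplots(nrows, ncols, squeeze=False,
                         figsize=(3 * ncols, 3 * nrows))
flat = axes.flatten()

# Pair each data frame with the next free cell of the grid.
for ax, frame in zip(flat, new_list):
    ax.scatter(frame["X"], frame["Y"])

# Switch off the grid cells left over after the last data frame.
for ax in flat[len(new_list):]:
    ax.axis("off")
fig.tight_layout()
```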

plot histogram for many columns quickly using groupby function of pandas dataframe

I have a pandas dataframe, which looks like
Here one of the columns is named label, which can take only two possible values, 0 or 1.
I would like to make histograms for label 1 and for label 0 separately, one on top of the other, like
I am able to make this for one of the column (named "MA_CL05") like:
temp = infile.groupby('label')
for k, v in temp:
    if k == 1:
        v.MA_CL05.hist(label='1',figsize=(15,15),bins=25,alpha=1.0,histtype = 'step',lw=4)
    if k == 0:
        v.MA_CL05.hist(label='0',figsize=(15,15),bins=25,alpha=1.0,histtype = 'step',lw=4)
plt.legend(loc=1, prop={'size': 51})
plt.show()
I can copy and paste this snippet for all 20 columns and it will be fine. But is there an easy way to plot this histogram of type (2) in one go?
You can add another loop, looping over the columns of the dataframe and specifying the axes to plot to.
fig, axes = plt.subplots(4,5)
for col,ax in zip(infile.columns[2:],axes.flatten()):
    temp = infile.groupby('label')
    for k, v in temp:
        v[col].hist(label=str(k),bins=25,alpha=1.0,histtype = 'step',lw=4, ax=ax)
plt.legend(loc=1, prop={'size': 51})
plt.show()
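A self-contained sketch of the same two-loop idea, with assumptions flagged: the data are random, the feature column names f0 to f3 are invented, and the grid is 2 by 2 instead of 4 by 5. Since plt.legend only acts on the current axes, this version calls ax.legend on each panel so every histogram pair gets its own legend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: four numeric feature columns plus a binary label.
rng = np.random.default_rng(2)
infile = pd.DataFrame(rng.normal(size=(200, 4)),
                      columns=["f0", "f1", "f2", "f3"])
infile["label"] = rng.integers(0, 2, size=200)

feature_cols = [c for c in infile.columns if c != "label"]
fig, axes = plt.subplots(2, 2, figsize=(8, 8))

for col, ax in zip(feature_cols, axes.flatten()):
    # One step-style histogram per label value, drawn on the same panel.
    for k, v in infile.groupby("label"):
        ax.hist(v[col], bins=25, histtype="step", lw=2, label=str(k))
    ax.set_title(col)
    ax.legend(loc="upper right")   # one legend per panel
```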

Plot points in different colors

I have a dataframe created using
results = df2[(df2['R'] > 100)].sort_values(by='ZR', ascending=False)
I would like to do
plt.plot(results['ZR'], marker='o')
except I would like the points where results['I'] == foo to be in red and the points where results['I'] != foo to be in blue.
I tried
firstset = results.ZR[results.I.str.contains('foo')]
secondset = results.ZR[~results.I.str.contains('foo')]
plt.plot(firstset, marker='o', color='red')
plt.plot(secondset, marker='o', color='blue')
but this plots both halves starting from x axis 0 which is not what I need.
I would instead just like the original graph but with some of the points in red and some in blue. That is no new points and no points with changed positions. Here is the original graph.
First of all, in the absence of an x argument, plt.plot uses consecutive integers starting from 0 as the x values when it is given only a plain sequence of y values. To find the correct index using logical operations, do this:
firstIndex = results.index[results.I.str.contains('foo')]
secondIndex = results.index[~results.I.str.contains('foo')]
If your original index is complex, create a dummy pandas DataFrame for new index.
newDf = pd.DataFrame(range(len(results)))
firstIndex = newDf.index[results.I.str.contains('foo')]
secondIndex = newDf.index[~results.I.str.contains('foo')]
This is guaranteed to create two indices that are subsets of 1:230.
Next, pass a format string to plt.plot to specify the color and marker type.
plt.plot(results['ZR'])
plt.plot(firstIndex, firstset, "ro")
plt.plot(secondIndex, secondset, "bo")
This will plot the entire underlying data using lines, with no markers for the points. It will then overlay the two sets of points with their respective colors.
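The same idea can be sketched with a boolean mask and explicit row positions, which avoids depending on the DataFrame index at all. The data below are invented, and the names positions and is_foo are introduced only for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data; rows whose 'I' contains 'foo' should be red.
results = pd.DataFrame({
    "ZR": [9.0, 7.5, 6.0, 4.0, 2.5, 1.0],
    "I": ["foo", "bar", "foo", "baz", "bar", "foo"],
}, index=[42, 7, 13, 99, 3, 58])   # deliberately non-consecutive index

zr = results["ZR"].to_numpy()
positions = np.arange(len(results))             # 0, 1, 2, ... in row order
is_foo = results["I"].str.contains("foo").to_numpy()

fig, ax = plt.subplots()
ax.plot(positions, zr, color="gray")            # connecting line, no markers
ax.plot(positions[is_foo], zr[is_foo], "ro")    # 'foo' points in red
ax.plot(positions[~is_foo], zr[~is_foo], "bo")  # all other points in blue
```

Every point keeps its original x position because both subsets are drawn against the same positions array.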
