Plot pandas series in separate subplots using matplotlib - python

Hoping to get some help, please. I'm trying to plot simulation data in separate subplots using pandas and matplotlib. My code so far is:
import matplotlib.pylab as plt
import pandas as pd
fig, ax = plt.subplots(2, 3)
for i in range(2):
    for j in range(50, 101, 10):
        for e in range(3):
            Var=(700* j)/ 100
            Names1 = ['ig','M_GZ']
            Data1 = pd.read_csv('~/File/JTL_'+str(Var)+'/GZ.csv', names=Names1)
            ig = Data1['ig']
            M_GZ=Data1['M_GZ']
            MGZ = Data1[Data1.M_GZ != 0]
            ax[i, e].plot(MGZ['ig'][:4], MGZ['M_GZ'][:4], '--v', linewidth=1.75)
plt.tight_layout()
plt.show()
But the code gives me six duplicate copies of the same plot instead of each value of Var having its own plot. I've tried changing the loop and using different variations, like:
fig = plt.figure()
for i in range(1, 7):
    ax = fig.add_subplot(2, 3, i)
    for j in range(50, 101, 10):
        Var=(700* j)/ 100
        Names1 = ['ig','M_GZ']
        Data1 = pd.read_csv('~/File/JTL_'+str(Var)+'/GZ.csv', names=Names1)
        ig = Data1['ig']
        M_GZ=Data1['M_GZ']
        MGZ = Data1[Data1.M_GZ != 0]
        ax.plot(MGZ['ig'][:4], MGZ['M_GZ'][:4], '--v', linewidth=1.75)
plt.tight_layout()
plt.show()
but that changes nothing; I still get the same plot as above. Any help would be appreciated. I'm hoping that each subplot contains one set of data instead of all six.
This is a link to one of the DataFrames. Each subdirectory ~/File/JTL_'+str(Var)+'/ contains a copy of this file; there are 6 in total.

The problem is in your loop:
for i in range(2):  # iterating the rows of the plot
    for j in range(50, 101, 10):  # iterating your file names
        for e in range(3):  # iterating the columns of the plot
The end result is that every file is plotted in every column of every row, so all six subplots show the same six curves.
For it to work, you should have only two nesting levels in your loop and derive the file index from the row and column. Potential code (updated):
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots(2, 3)
for row in range(2):
    for col in range(3):
        # row-major position in the 2x3 grid (0..5) selects one of the six file values
        f_index = range(50, 101, 10)[row * 3 + col]
        print(row, col, f_index)
        Var=(700* f_index)/ 100
        Names1 = ['ig','M_GZ']
        Data1 = pd.read_csv('~/File/JTL_'+str(Var)+'/GZ.csv', names=Names1)
        ig = Data1['ig']
        M_GZ=Data1['M_GZ']
        MGZ = Data1[Data1.M_GZ != 0]
        ax[row, col].plot(MGZ['ig'][:4], MGZ['M_GZ'][:4], '--v',linewidth=1.75)
plt.tight_layout()
plt.show()
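A minimal sketch of the same idea without index arithmetic: zip ax.flat (the 2x3 grid walked row by row) with the six j values, so each axes gets exactly one data set. Because the CSV files aren't available here, load_gz is a hypothetical stand-in that returns synthetic ig/M_GZ data; replace its body with the pd.read_csv(...) call above. The integer division used for var is also an assumption about how the JTL_... directories are named.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def load_gz(var):
    """Hypothetical loader: synthetic data in place of
    pd.read_csv('~/File/JTL_' + str(var) + '/GZ.csv', names=['ig', 'M_GZ'])."""
    ig = np.arange(8)
    m_gz = np.where(ig % 3 == 0, 0, ig * var / 100.0)  # some zeros to exercise the filter
    return pd.DataFrame({"ig": ig, "M_GZ": m_gz})


fig, ax = plt.subplots(2, 3)
# ax.flat walks the grid row by row; zip stops after six pairs,
# so each subplot receives exactly one file.
for a, j in zip(ax.flat, range(50, 101, 10)):
    var = 700 * j // 100            # 350, 420, ..., 700 (assumed directory naming)
    data = load_gz(var)
    mgz = data[data.M_GZ != 0]      # drop rows where M_GZ is zero
    a.plot(mgz["ig"].iloc[:4], mgz["M_GZ"].iloc[:4], "--v", linewidth=1.75)
    a.set_title("JTL_" + str(var))  # shows which file landed in which subplot
fig.tight_layout()
# plt.show()  # display the figure with an interactive backend
```

With an interactive backend, uncomment plt.show() to see the grid; each panel title names the file it was drawn from.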

Related

Plotting each column in a single subplot

I have a text file that contains 2048 rows and 256 columns, and I want to plot only 10 of the columns, each in its own subplot.
I tried:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data=np.loadtxt("input_data.txt")
data=data[:,0:10]
print(data.shape)
nrows=5
ncols=10
fig, axs = plt.subplots(nrows,ncols, figsize=(13,10))
count = 0
for i in range(ncols):
    for j in range(nrows):
        axs[i,j].plot(data[count])
        count += 1
        print(count)
plt.show()
But it does not plot each column's values. I hope the experts can help me. Thanks.
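I used random numbers to reproduce your problem. Two things need to change: the grid must have exactly as many axes as columns (2 x 5 = 10), with the loops running over nrows and then ncols, and data[:, count] selects a column, whereas data[count] selects a row.
import matplotlib.pyplot as plt
import numpy as np
data=np.random.rand(2048,256)
data=data[:,0:10]
print(np.shape(data))
nrows=2
ncols=5
fig, axs = plt.subplots(nrows,ncols, figsize=(13,10))
count = 0
for i in range(nrows):
    for j in range(ncols):
        print(count)
        axs[i,j].plot(data[:,count])  # one column per subplot
        count += 1
plt.show()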
Here is your plot.
With your real data, each subplot will show the variation of its own column.
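The same fix can be written without a manual counter by pairing axs.flat with the columns of data.T (iterating over a transposed 2-D array yields its columns). This sketch uses seeded random data in place of input_data.txt, and the figure size is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display interactively
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((2048, 256))[:, :10]   # keep the first 10 columns -> shape (2048, 10)

fig, axs = plt.subplots(2, 5, figsize=(13, 6), sharex=True)
# axs.flat yields the 10 axes row by row; data.T yields the 10 columns in order.
for col_idx, (ax, column) in enumerate(zip(axs.flat, data.T)):
    ax.plot(column)
    ax.set_title("column %d" % col_idx)
fig.tight_layout()
```

If the number of columns changes, only the subplots(...) shape needs to match it; zip stops at whichever runs out first.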

Subplotting of Pandas.DataFrameGroupBy[group_name] does not yield expected results

This is a reopening of my initial question with the same title, which was closed as a duplicate. As none of the suggested duplicates helped me solve my problem, I am posting this question again.
I have a DataFrame with time series for several devices, read from an HDF5 file:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from pandas import DataFrame

def open_dataset(file_name: str, name: str, combined_frame: DataFrame):
    data_set: DataFrame = pd.read_hdf(file_name, key=name)
    data_set['name'] = name
    combined_frame = pd.concat([combined_frame, data_set], axis=0)
    return combined_frame

if __name__ == '__main__':
    names = ['YRT1IN1E', 'YRT1LE1', 'YRT1MH1', 'YR08DT1ML']
    working_frame = DataFrame()
    for name in names:
        working_frame = open_dataset('data.h5', name, working_frame)
    grouped_frame = working_frame.groupby('name')
    fig, axs = plt.subplots(figsize=(10, 5),
                            nrows=4, ncols=1,  # fix as above
                            gridspec_kw=dict(hspace=0), sharex=True)
    axs = grouped_frame.get_group('YR08DT1ML').rawsum.plot()
    axs = grouped_frame.get_group('YRT1LE1').voltage.plot()
    axs = grouped_frame.get_group('YRT1MH1').current.plot()
    axs = grouped_frame.get_group('YRT1IN1E').current.plot()
    plt.show()
This produces the following output:
What am I doing wrong? I would like each of the plots in its own row, not all in one row.
The data file "data.h5" is available at: Google Drive
What I tried from the suggested posts:
The answer by joris (Mar 18, 2014 at 15:45) causes the code to go into an infinite loop; the data is never plotted:
fig, axs = plt.subplots(nrows=2, ncols=2)
grouped_frame.get_group('YR08DT1ML').rawsum.plot(ax=axs[0,0])
grouped_frame.get_group('YR...').rawsum.plot(ax=axs[0,1])
grouped_frame.get_group('YR...').rawsum.plot(ax=axs[1,0])
grouped_frame.get_group('YR...').rawsum.plot(ax=axs[1,1])
A variation leads to the same result as described above:
axes[0,0] = grouped_frame.get_group('YR08DT1ML').rawsum.plot()
axes[0,1] = grouped_frame.get_group('YR...').rawsum.plot()
...
An infinite loop also happens with sedeh's answer (Jun 4, 2015 at 15:26):
grouped_frame.get_group('YR08DT1ML').rawsum.plot(subplots=True, layout=(1,2))
...
An infinite loop also happens with Justice_Lords' answer (Mar 15, 2019 at 7:26):
fig=plt.figure()
ax1=fig.add_subplot(4,1,1)
ax2=fig.add_subplot(4,1,2)
ax3=fig.add_subplot(4,1,3)
ax4=fig.add_subplot(4,1,4)
grouped_frame.get_group('YR08DT1ML').rawsum.plot(ax=ax1)
...
It seems to me that the problem is related to the fact that I plot from a pandas.DataFrameGroupBy rather than a pandas.DataFrame.
It seems matplotlib was taking a long time to process the DatetimeIndex. Converting the index to times and cleaning everything up did the trick:
import matplotlib.pyplot as plt
import pandas as pd
names = ['YR08DT1ML', 'YRT1LE1', 'YRT1MH1', 'YRT1IN1E']
df = pd.concat([pd.read_hdf('data.h5', name) for name in names])
df.reset_index(inplace=True)
df.index = df['time'].dt.time  # plot against time of day instead of the DatetimeIndex
df.sort_index(inplace=True)
fig, axes = plt.subplots(figsize=(10, 5), nrows=4, ncols=1, gridspec_kw=dict(hspace=0), sharex=True)
cols = ['rawsum', 'voltage', 'current', 'current']
for ix, name in enumerate(names):
    # one device per row: select its rows, then the column to draw
    df.loc[df['nomen'].eq(name), cols[ix]].plot(ax=axes[ix])
plt.show()
Hope this helps.
Thanks to @fishmulch's answer, I found a way to do what I wanted. However, I want to provide an answer to my initial question of how to plot the groupby data set. The following __main__ block provides the desired output with the input file data.h5:
from matplotlib import gridspec  # not imported in the question's code

if __name__ == '__main__':
    names = ['YRT1IN1E', 'YRT1LE1', 'YRT1MH1', 'YR08DT1ML']
    working_frame = DataFrame()
    for name in names:
        working_frame = open_dataset('data.h5', name, working_frame)
    grouped_frame = working_frame.groupby('name')
    fig = plt.figure(1)
    gs = gridspec.GridSpec(4, 1)
    gs.update(wspace=0.0, hspace=0.0)  # set the spacing between axes
    cols = ['current', 'voltage', 'current', 'rawsum']
    row = 0
    for name, col in zip(names, cols):
        data = grouped_frame.get_group(name)
        if row == 0:
            ax = fig.add_subplot(gs[row])
        else:
            ax = fig.add_subplot(gs[row], sharex=ax)  # share the x axis with the row above
        ax.plot(data.get(col))
        row += 1
    plt.show()
... some beautification still needed ...
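The underlying issue in the question's code is that, as far as I know, Series.plot() without ax= draws on the current axes, which after plt.subplots is typically the last one created, so every group lands in the same row; reassigning axs also throws away the array of axes. Below is a sketch of the groupby version that passes ax= explicitly, using a small synthetic frame (random values, hypothetical one-minute timestamps) in place of data.h5. It does not address the slow rendering of the real DatetimeIndex discussed above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for data.h5 (hypothetical values): four devices sharing
# the same one-minute timestamps, tagged with a 'name' column as in the question.
idx = pd.date_range("2021-01-01", periods=100, freq="min")
frames = []
for device in ["YRT1IN1E", "YRT1LE1", "YRT1MH1", "YR08DT1ML"]:
    frames.append(pd.DataFrame({
        "current": np.random.rand(100),
        "voltage": np.random.rand(100),
        "rawsum": np.random.rand(100),
        "name": device,
    }, index=idx))
working_frame = pd.concat(frames)

# Column to draw for each device, in top-to-bottom row order (from the question).
wanted = {"YR08DT1ML": "rawsum", "YRT1LE1": "voltage",
          "YRT1MH1": "current", "YRT1IN1E": "current"}

grouped = working_frame.groupby("name")
fig, axs = plt.subplots(nrows=4, ncols=1, figsize=(10, 5), sharex=True,
                        gridspec_kw=dict(hspace=0))
for ax, (name, col) in zip(axs, wanted.items()):
    # ax= picks the row; the return value does not need to be stored.
    grouped.get_group(name)[col].plot(ax=ax, label=name)
    ax.legend(loc="upper right")
```

The dictionary keeps the row order explicit, so reordering the rows only means reordering its entries.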

Speed up loop for matplotlib in python

I have a similar but larger data set with more dates and over ten thousand rows. It usually takes 3 minutes or longer to run the code and plot. I think the problem comes from the loop, since looping is time-consuming in Python. I would appreciate it if someone knows how to rewrite the code to make it faster.
import pandas as pd
import matplotlib.pyplot as plt
data = {'Date' : ["2022-07-01"]*5000 + ["2022-07-02"]*5000+ ["2022-07-03"]*5000,
        'OB1' : range(1,15001),
        'OB2' : range(1,15001)}
df = pd.DataFrame(data)
# index by Date
df = df.set_index(['Date'])
# loop for plot
i = 1
fig, axs = plt.subplots(nrows = 1, ncols = 3, sharey = True)
fig.subplots_adjust(wspace=0)
for j, sub_df in df.groupby(level=0):
    plt.subplot(130 + i)
    x = sub_df['OB1']
    y = sub_df['OB2']
    plt.barh(x, y)
    i = i + 1
plt.show()
The slowness comes from the barh function, which draws one rectangle per bar (15,000 here). Your example is already quite slow (about a minute on my laptop); the version below runs in less than a second. I replaced barh with fill_betweenx, which fills the area between two curves (here 0 and the bar lengths) instead of drawing rectangles. It is much faster but not strictly the same. I also use the option step="post", so if you zoom in, you get a bar-style graph.
import pandas as pd
import matplotlib.pyplot as plt
data = {
    "Date": ["2022-07-01"] * 5000
    + ["2022-07-02"] * 5000
    + ["2022-07-03"] * 5000,
    "OB1": range(1, 15001),
    "OB2": range(1, 15001),
}
df = pd.DataFrame(data)
# index by Date
df = df.set_index(["Date"])
# loop for plot
i = 1
fig, axs = plt.subplots(nrows=1, ncols=3, sharey=True)
fig.subplots_adjust(wspace=0)
for j, sub_df in df.groupby(level=0):
    plt.subplot(130 + i)
    x = sub_df["OB1"]
    y = sub_df["OB2"]
    # plt.barh(x, y)
    # barh(x, y) puts bars at vertical positions x with lengths y,
    # so fill horizontally between 0 and y along x
    plt.fill_betweenx(x, 0, y, step="post")
    i = i + 1
plt.show()
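A sketch of the same fill_betweenx idea written against the axes returned by plt.subplots, instead of recreating them with plt.subplot(130 + i). It iterates over zip(axs, groups) so there is no manual counter, and uses groups.ngroups to size the grid. The argument order follows barh(x, y): positions along the y axis first, then lengths.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display interactively
import matplotlib.pyplot as plt
import pandas as pd

data = {
    "Date": ["2022-07-01"] * 5000 + ["2022-07-02"] * 5000 + ["2022-07-03"] * 5000,
    "OB1": range(1, 15001),
    "OB2": range(1, 15001),
}
df = pd.DataFrame(data).set_index("Date")

groups = df.groupby(level=0)
fig, axs = plt.subplots(nrows=1, ncols=groups.ngroups, sharey=True)
fig.subplots_adjust(wspace=0)
for ax, (date, sub_df) in zip(axs, groups):
    # One filled step outline per date instead of 5000 separate rectangles.
    ax.fill_betweenx(sub_df["OB1"], 0, sub_df["OB2"], step="post")
    ax.set_title(date)
```

Each axes ends up with a single polygon collection rather than thousands of patches, which is where the speed-up comes from.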

Python: Conditionally plotting data from many columns from a Dataframe in a loop

I have about 200 pairs of columns in a DataFrame that I would like to plot in a single plot. Each pair of columns can be thought of as related "x" and "y" variables. Some of the "y variables" are 0 at certain points in the data. I don't want to plot those; I would rather they show up as a discontinuity in the plot. I am not able to figure out an efficient way to exclude those values. There is also a "date" variable that I don't need in the plot, but I am keeping it in the sample data to mirror the real data.
Here is a sample data set and what I have done with it. I created the sample dataset in a hurry; the original data has unique "y values" for a given "x value" for every pair of columns.
import pandas as pd
import matplotlib.pyplot as plt
from numpy.random import randint
data1y = [n**3 -n**2+n for n in range(12)]
data1x = [randint(0, 100) for n in range(12)]
data1x.sort()
data2y = [n**3 for n in range(12)]
data2x = [randint(0, 100) for n in range(12)]
data2x.sort()
data3y = [n**3 - n**2 for n in range(12)]
data3x = [randint(0, 100) for n in range(12)]
data3x.sort()
data1y = [0 if x%7==0 else x for x in data1y]
data2y = [0 if x%7==0 else x for x in data2y]
data3y = [0 if x%7==0 else x for x in data3y]
date = ['Jan','Feb','Mar','Apr','May', 'Jun','Jul','Aug','Sep','Oct','Nov','Dec']
df = pd.DataFrame({'Date':date,'Var1':data1y, 'Var1x':data1x, 'Vartwo':data2y, 'Vartwox':data2x,'datatree':data3y, 'datatreex':data3x})
print(df)
ax = plt.gca()
fig = plt.figure()
for k in ['Var1','Vartwo','datatree']:
    df.plot(x=k+'x', y=k, kind = 'line',ax=ax)
The output I get is this:
I would like to see discontinuity where the 'y variables' are zero.
I have tried:
import numpy as np
df2 = df.copy()
df2[df2.Var1 < 0.5] = np.nan
But this sets the entire row to NaN when I only want it for a particular variable.
I'm trying this, but it isn't working:
ax = plt.gca()
fig = plt.figure()
for k in ['Var1','Vartwo','datatree']:
    filter = df.k.values > 0
    x = df.k+'x'
    y = df.k
    plot(x[filter], y[filter], kind = 'line',ax=ax)
This works for a single variable, but I don't know how to loop it across 200 variables, and it also doesn't show the discontinuities:
import matplotlib.pyplot as plt
ax = plt.gca()
fig = plt.figure()
for k in ['Var1','Vartwo','datatree']:
    filter = df.Var1.values > 0
    x = df.Var1x[filter]
    y = df.Var1[filter]
    plt.plot(x, y)
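You're looking for .replace(), applied to the y columns only (a NaN breaks the line, which gives the discontinuity):
import numpy as np
import matplotlib.pyplot as plt
df2 = df.copy()
cols_to_replace = ['Var1','Vartwo','datatree']  # the y columns; x columns keep their values
df2[cols_to_replace] = df2[cols_to_replace].replace({0:np.nan})
fig, ax = plt.subplots()
for k in ['Var1','Vartwo','datatree']:
    df2.plot(x=k+'x', y=k, kind = 'line',ax=ax)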
Result:
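A sketch for the 200-pair case, assuming every y column k has a matching k + 'x' column as in the sample data. It finds the pairs from the column names instead of a hand-written list, and uses DataFrame.mask (equivalent here to .replace({0: np.nan})) to turn zeros into NaN in the y columns only; matplotlib leaves a gap wherever a value is NaN. For a single column, df2.loc[df2.Var1 == 0, 'Var1'] = np.nan likewise avoids the whole-row assignment from the question. The small frame below is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up frame in the question's layout: each y column "k" has an x column "k + 'x'".
df = pd.DataFrame({
    "Date": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "Var1": [1, 0, 3, 4, 0, 6],
    "Var1x": [0, 10, 20, 30, 40, 50],
    "Vartwo": [2, 4, 0, 8, 10, 12],
    "Vartwox": [5, 15, 25, 35, 45, 55],
})

# Discover the y columns instead of typing 200 names
# (assumes the "name" / "name + 'x'" convention holds for every pair).
y_cols = [c for c in df.columns if c + "x" in df.columns]

# Zeros become NaN in the y columns only; x columns keep their zeros.
df2 = df.copy()
df2[y_cols] = df2[y_cols].mask(df2[y_cols] == 0)

fig, ax = plt.subplots()
for k in y_cols:
    ax.plot(df2[k + "x"], df2[k], label=k)  # NaN values break the line
ax.legend()
```

With the question's sample data, y_cols would come out as Var1, Vartwo and datatree without listing them by hand.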

Multiple titles (suptitle) with subplots

I have a series of 9 subplots in a 3x3 grid, each subplot with a title.
I want to add a title for each row. To do so I thought about using suptitle.
The problem is that if I use three suptitles, they seem to be overwritten and only the last one is shown.
Here is my basic code:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(3,3,sharex='col', sharey='row')
for j in range(9):
    axes.flat[j].set_title('plot '+str(j))
plt1 = fig.suptitle("row 1",x=0.6,y=1.8,fontsize=18)
plt2 = fig.suptitle("row 2",x=0.6,y=1.2,fontsize=18)
plt3 = fig.suptitle("row 3",x=0.6,y=0.7,fontsize=18)
fig.subplots_adjust(right=1.1,top=1.6)
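A figure holds only one suptitle, so each call to fig.suptitle replaces the text of the previous one. Instead, you can tinker with the axes' titles and labels. Check the following example adapted from your code: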
import matplotlib.pyplot as plt
fig, axes = plt.subplots(3,3,sharex='col', sharey='row')
counter = 0
for j in range(9):
    if j in [0,3,6]:
        # first plot of each row: use its y label as the row title
        axes.flat[j].set_ylabel('Row '+str(counter), rotation=0, size='large',labelpad=40)
        axes.flat[j].set_title('plot '+str(j))
        counter = counter + 1
    if j in [0,1,2]:
        # top row: prepend a column title to the plot title
        axes.flat[j].set_title('Column '+str(j)+'\n\nplot '+str(j))
    else:
        axes.flat[j].set_title('plot '+str(j))
plt.show()
This results in:
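An alternative, assuming matplotlib 3.4 or newer (the version that added subfigures): give each row its own SubFigure, which has its own suptitle, so three row titles can coexist. Unlike the 3x3 plt.subplots grid in the question, axes are only shared within a row here; the figure size and font size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display interactively
import matplotlib.pyplot as plt

fig = plt.figure(constrained_layout=True, figsize=(8, 7))
subfigs = fig.subfigures(nrows=3, ncols=1)   # one SubFigure per row
row_titles = []
for row, subfig in enumerate(subfigs):
    # each SubFigure keeps its own suptitle, so earlier rows are not overwritten
    row_titles.append(subfig.suptitle("row %d" % (row + 1), fontsize=14))
    axes = subfig.subplots(nrows=1, ncols=3, sharey=True)
    for col, ax in enumerate(axes):
        ax.set_title("plot %d" % (row * 3 + col))
```

Row titles sit above each row rather than to the left; the set_ylabel approach above remains an option if side labels are preferred.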
