I have multiple .dat files (30). I load each one as a DataFrame and add more columns:
A_files = glob.glob("*.A*.dat")  # find the .dat files that contain "A" in their name
for files in A_files:
    df = pd.read_fwf(files, header=None, infer_nrows=300, names=["Time", "Result", "Error"])  # load each file as a dataframe
    df['error_plus'] = df["Result"] + df['Error']  # upper curve for the error band
    df['error_minus'] = df["Result"] - df['Error']  # lower curve for the error band
Now I'd like to make subplots from the dataframes, where x='Time', y='Result', and 'error_plus' and 'error_minus' are passed to ax.fill_between. I tried to extend the code above with this:
A_files = glob.glob("*.A*.dat")
for files in A_files:
    df = pd.read_fwf(files, header=None, infer_nrows=300, names=["Time", "Result", "Error"])
    df['error_plus'] = df["Result"] + df['Error']
    df['error_minus'] = df["Result"] - df['Error']
    ax = df.plot(subplots=True, x='Time', y="Result", sharey=True, sharex=True)
    ax.fill_between(df["Time"], df["error_plus"], df["error_minus"], color="r")
However, it didn't make subplots as I expected, and this error was raised: 'numpy.ndarray' object has no attribute 'fill_between' (when I plot just one dataframe without looping, this error doesn't occur).
Is there an easy/elegant way to make subplots from a loop of dataframes, with shared axes and with fill_between highlighting the error?
Thanks.
The problem is in the use of ax=df.plot(): you need to pass a previously created matplotlib axes object to df.plot() through its ax parameter.
You also don't need the subplots parameter. It has a different meaning: it controls whether each of df's columns is plotted in its own subplot or all of them together in one. With subplots=True, df.plot() returns a NumPy array of axes rather than a single axes object, which is why fill_between is not found.
See: pandas.DataFrame.plot documentation.
Finally, pass sharex and sharey to pyplot.subplots() instead, because that is where the figure with its subplots is created.
Here is a mockup of all the changes:
fig, axes = plt.subplots(nrows=5, ncols=6, sharex=True, sharey=True)
axes_flat = axes.flatten()
for file, ax in zip(A_files, axes_flat):
    df = pd.read_fwf(file, ...)
    ...
    df.plot(x='Time', y="Result", ax=ax)
plt.show()
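For completeness, here is a minimal sketch of the whole loop including fill_between. It uses synthetic data in place of the .dat files: the make_fake_frame helper, the 2x3 grid and the alpha value are assumptions made up for illustration. With real files you would call pd.read_fwf inside the loop as in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def make_fake_frame(seed):
    """Hypothetical stand-in for pd.read_fwf(file, ...): Time, Result, Error columns."""
    rng = np.random.default_rng(seed)
    time = np.linspace(0.0, 10.0, 50)
    result = np.sin(time) + 0.1 * rng.standard_normal(time.size)
    error = np.full(time.size, 0.2)
    return pd.DataFrame({"Time": time, "Result": result, "Error": error})


frames = [make_fake_frame(seed) for seed in range(6)]

# One figure, one axes per dataframe; axis sharing is requested here.
fig, axes = plt.subplots(nrows=2, ncols=3, sharex=True, sharey=True, figsize=(9, 5))

for df, ax in zip(frames, axes.flatten()):
    df["error_plus"] = df["Result"] + df["Error"]
    df["error_minus"] = df["Result"] - df["Error"]
    # Draw the curve on the given axes, then shade the error band on the same axes.
    df.plot(x="Time", y="Result", ax=ax, legend=False)
    ax.fill_between(df["Time"], df["error_minus"], df["error_plus"], color="r", alpha=0.3)
```

In an interactive session you would finish with plt.show() or fig.savefig(...). Because ax is a single axes object here, fill_between is available on it.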
Related
I have a list of data frames called listofdf. It contains 5 data frames, and for each one I am trying to plot a bar chart of 2 of its columns at a certain size, for a total of 10 charts. I want to produce these 10 plots in a single for loop. I tried subplots in the code below because I couldn't find a way to make separate plots, but separate plots would be ideal.
The columns all contain normal numerical data and I can plot them if I do,
listofdf[0]['col1'].plot(kind='bar', figsize=(20,5))
so the data should be fine.
Here is the code I am using to iterate over all the data frames and the columns I want to display:
plotsperloop = 2
fig, ax = plt.subplots(nrows=len(listofdf)*plotsperloop, ncols=1)
for idx, df in enumerate(listofdf):
    idxcount = idx * plotsperloop
    ax[idxcount].plot(df['col1'], kind='bar', figsize=(20,5))
    ax[idxcount+1].plot(df['col2'], kind='bar', figsize=(20,5))
    #plt.show()
plt.show()
However, I am unable to select the kind as bar. I have tried adding the argument kind = 'bar' inside the plot method, but I keep getting an error AttributeError: 'Line2D' object has no property 'kind'.
If I don't include the kind and figsize arguments, I am able to display multiple line graphs in a column. But I need them to be bar graphs of a certain size.
I also tried,
for df in listofdf:
df.plot(kind='bar', figsize=(20,5))
but I only get 1 plot instead of 10.
What is the proper way to print multiple dynamically generated data frames as plots like this?
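One possible approach, sketched with made-up data: the AttributeError arises because ax[i].plot is matplotlib's line-plotting method, which does not accept kind. The kind argument belongs to the pandas plot method, which can be pointed at a specific axes with ax=. The figure size is set once in plt.subplots. The contents of listofdf below and the overall figure height are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in for the real list of 5 data frames.
listofdf = [
    pd.DataFrame({"col1": rng.integers(1, 10, 4), "col2": rng.integers(1, 10, 4)})
    for _ in range(5)
]

plots_per_df = 2
n_plots = len(listofdf) * plots_per_df

# figsize applies to the whole figure: 20 wide, 5 high per chart.
fig, axes = plt.subplots(nrows=n_plots, ncols=1, figsize=(20, 5 * n_plots))

for idx, df in enumerate(listofdf):
    first = idx * plots_per_df
    # Use the pandas plot method (which understands kind="bar") and target an axes.
    df["col1"].plot(kind="bar", ax=axes[first])
    df["col2"].plot(kind="bar", ax=axes[first + 1])
```

If separate figures are preferred, the same idea works with one plt.subplots(figsize=(20, 5)) call per chart inside the loop, passing the resulting axes as ax=.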
I am trying to merge an arbitrary number of line charts into a single image, and while there are many, many questions about this sort of thing, none of them seem applicable to the code I'm working with.
Unlike a large number of answers, I don't want to have the separate graphs displayed side by side, or above one another, in a single output, but rather, combined together.
For all of these graphs the value of the "y_x" column would be the same, but the "yhat_y" produced during each loop would be different.
Adding subplots=True to the plot method of a dataframe seems to change the return type to something that is no longer compatible with the code: 'numpy.ndarray' object has no attribute 'get_figure'.
#ax = plt.subplot(111) doesnt seem to do anything
for variable in range(max_num):
forecast = get_forecast(variable)
cmp1 = forecast.set_index("ds")[["yhat", "yhat_lower", "yhat_upper"]].join(
both.set_index("ds")
)
e.augmented_error[variable]= sklearn.metrics.mean_absolute_error(
cmp["y"].values, cmp1["yhat"].values
)
cmp2=cmp.merge(cmp1,on='ds')
plot = cmp2[['y_x', 'yhat_y']].plot(title =e)
fig1 = plot.get_figure()
plot.set_title("prediction")
plt.show()
fig1.savefig('output.pdf', format="pdf")
plt.close()
The most straightforward way is to create a reusable ax handle outside the loop, then call ax.plot inside the loop:
fig, ax = plt.subplots()  # create reusable `fig` and `ax` handles
for variable in range(max_num):
    ...
    ax.plot(cmp2.index, cmp2['yhat_y'])  # use `ax.plot(...)` instead of `cmp2.plot()`: one line per forecast
ax.plot(cmp2.index, cmp2['y_x'])  # 'y_x' is the same in every iteration, so draw it once
ax.set_title('predictions')
fig.savefig('output.pdf', format='pdf')
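To make the idea concrete, here is a self-contained sketch with synthetic data. The forecasts are faked by shifting the actual series, since get_forecast and the real data are not available here. The column names y_x and yhat_y are reused only as legend labels.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ds = pd.date_range("2021-01-01", periods=30, freq="D")
actual = pd.Series(np.linspace(0.0, 1.0, 30), index=ds)  # plays the role of "y_x"

fig, ax = plt.subplots()  # one axes, created once and reused for every line
ax.plot(actual.index, actual.values, color="black", label="y_x")

for variable in range(3):
    # Hypothetical forecast: the actual series plus a per-iteration offset.
    yhat = actual + 0.1 * (variable + 1)
    ax.plot(yhat.index, yhat.values, label=f"yhat_y ({variable})")

ax.set_title("predictions")
ax.legend()
```

All lines end up on the same axes, so a single fig.savefig call after the loop writes one combined chart.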
I have two dataframes with the same headers. I want to plot the 'Close' column from both data frames into one chart of lines.
so I have:
(aapl.Close).plot()
and
(tsla.Close).plot()
which clearly does what I need to plot but in two different charts. I need two lines in one line chart.
Tried the below, literally after I posted.
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(aapl.Close)
ax.plot(tsla.Close)
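The same result can be obtained with the pandas plot method by passing the shared axes through ax=. This is a sketch with made-up prices: the aapl and tsla frames below are synthetic stand-ins for the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = pd.date_range("2021-01-04", periods=20, freq="B")
# Synthetic stand-ins for the two price dataframes.
aapl = pd.DataFrame({"Close": np.linspace(130.0, 140.0, 20)}, index=dates)
tsla = pd.DataFrame({"Close": np.linspace(700.0, 650.0, 20)}, index=dates)

fig, ax = plt.subplots()
aapl["Close"].plot(ax=ax, label="AAPL")  # first line
tsla["Close"].plot(ax=ax, label="TSLA")  # second line on the same axes
ax.legend()

line_labels = [line.get_label() for line in ax.get_lines()]
```

Both series are Series objects, so the label argument is used directly for the legend.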
I am plotting two dataframes in the same chart: the USDEUR exchange rate and the 3-day moving average.
df.plot(ax=ax, linewidth=1)
rolling_mean.plot(ax=ax, linewidth=1)
Both dataframes are labelled "Value" so I would like to customize that:
I tried passing the label option but that didn't work, as it seems that this option is exclusive to matplotlib.axes.Axes.plot and not to pandas.DataFrame.plot. So I tried using axes instead, and passing each label:
ax.plot(df, linewidth=1, label='FRED/DEXUSEU')
ax.plot(rolling_mean, linewidth=1, label='3-day SMA')
However now the legend is not showing up at all unless I explicitly call ax.legend() afterwards.
Is it possible to plot the dataframes while passing custom labels without the need of an additional explicit call?
When setting a label using df.plot(), you have to specify the data that is being plotted:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1,2)
df = pd.DataFrame({'Value':np.random.randn(10)})
df2 = pd.DataFrame({'Value':np.random.randn(10)})
df.plot(label="Test",ax=ax1)
df2.plot(ax=ax1)
df.plot(y="Value", label="Test",ax=ax2)
df2.plot(y="Value", ax=ax2)
ax1.set_title("Reproduce problem")
ax2.set_title("Possible solution")
plt.show()
Which gives a figure with two panels, titled "Reproduce problem" and "Possible solution".
Update: It appears that there is a difference between plotting a dataframe, and plotting a series. When plotting a dataframe, the labels are taken from the column names. However, when specifying y="Value" you are then plotting a series, which then actually uses the label argument.
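Applied to the exchange-rate case from the question, this could look like the sketch below. The data is synthetic and the numbers are made up. It shows two options: selecting the column with y= so that label= is honoured, and renaming the column so that pandas picks up the new name as the legend label.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for the exchange-rate dataframe with a single "Value" column.
df = pd.DataFrame({"Value": 1.1 + 0.01 * rng.standard_normal(30)})
rolling_mean = df.rolling(window=3).mean()

fig, ax = plt.subplots()

# Option 1: select the column with y=, so a Series is plotted and label= is used.
df.plot(y="Value", label="FRED/DEXUSEU", ax=ax, linewidth=1)

# Option 2: rename the column; pandas takes legend labels from column names.
rolling_mean.rename(columns={"Value": "3-day SMA"}).plot(ax=ax, linewidth=1)

handles, labels = ax.get_legend_handles_labels()
```

In both cases pandas draws the legend itself, so no separate ax.legend() call is needed.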
I'm puzzled by the meaning of the 'ax' keyword in the pandas scatter_matrix function:
pd.scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal='hist', marker='.', density_kwds={}, hist_kwds={}, **kwds)
The only clue given in the docstring for the ax keyword is too cryptic for me:
ax : Matplotlib axis object
I had a look in the pandas code for the scatter_matrix function, and the ax variable is incorporated in the following matplotlib subplots call:
fig, axes = plt.subplots(nrows=n, ncols=n, figsize=figsize, ax=ax,
                         squeeze=False)
But, for the life of me, I can't find any reference to an 'ax' keyword in matplotlib subplots!
Can anyone tell me what this ax keyword is for???
This one is tricky. If you look at the source of pandas' scatter_matrix, you will find this line right after the docstring:
fig, axes = _subplots(nrows=n, ncols=n, figsize=figsize, ax=ax, squeeze=False)
Hence, internally, a new figure and axes grid is created using the internal _subplots helper. It is closely related to matplotlib's subplots command but slightly different: here, the ax keyword is accepted as well. If you look at the corresponding source (pandas.tools.plotting._subplots), you will find these lines:
if ax is None:
    fig = plt.figure(**fig_kw)
else:
    fig = ax.get_figure()
    fig.clear()
Hence, if you supply an axes object (e.g. created with matplotlib's subplots command), pandas' scatter_matrix grabs the corresponding matplotlib figure object and deletes its content. Afterwards, a new grid of subplots is created in this figure object.
All in all, the ax keyword lets you plot the scatter matrix into a given figure (even though, IMHO, in a slightly strange way).
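A small sketch of this behaviour, assuming a recent pandas where the function is available as pandas.plotting.scatter_matrix (the frame below is random data made up for illustration). The axes passed in is only used to find its figure. The figure is then cleared and refilled with the grid.

```python
import warnings

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.standard_normal((50, 3)), columns=["a", "b", "c"])

fig, ax = plt.subplots()  # the figure we want the scatter matrix to land in

with warnings.catch_warnings():
    # Recent pandas versions may warn that the figure is being cleared.
    warnings.simplefilter("ignore", UserWarning)
    axes = pd.plotting.scatter_matrix(frame, ax=ax)

# axes is a 3x3 array of new axes that live in the original figure.
same_figure = all(a.figure is fig for a in axes.ravel())
```

The original ax object is no longer part of the figure afterwards, which matches the "slightly strange" behaviour described above.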
In short, it targets a subplot within a grid.
If you have nrows=2 and ncols=2, for example, then ax allows you to plot on a specific axis by passing ax=axes[0,0] (top left) or ax=axes[1,1] (bottom right), etc.
When you create the subplots, you receive an axes variable. You can later plot (or subplot) with an element of that axes variable as above.
Take a look at the "Targeting different subplots" section of this page: http://pandas.pydata.org/pandas-docs/dev/visualization.html#targeting-different-subplots
I hope this helps.
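As a minimal sketch of targeting individual subplots with the regular pandas plot method (random data and titles are made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((20, 2)), columns=["a", "b"])

fig, axes = plt.subplots(nrows=2, ncols=2)  # axes is a 2x2 array of axes objects

df["a"].plot(ax=axes[0, 0], title="top left")      # draw on the top-left subplot
df["b"].plot(ax=axes[1, 1], title="bottom right")  # draw on the bottom-right subplot
```

The other two subplots stay empty because nothing was directed at them.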