I am new to Pyplot and simply trying to read data from a .csv file and produce a line plot using ax.plot(x,y):
import pandas as pd
import matplotlib.pyplot as plt
filepath = 'Monthly_Raw_Financials.csv'
raw_data = pd.read_csv(filepath, index_col='Month', parse_dates=True)
fig, ax = plt.subplots()
ax.plot(raw_data.index, raw_data['Profit'])
plt.show()
I get only an empty axis with no data plotted and an error message: "'Series' object has no attribute 'find'". I am following the example of a number of tutorials. What am I doing wrong?
In pandas, a column is a Series object, which isn't quite the same as a numpy array. It holds a numpy array in its .values attribute, but it also holds an index (.index attribute). I don't understand where your error comes from, but you could try plotting the values instead, i.e.
fig, ax = plt.subplots()
ax.plot(raw_data.index, raw_data['Profit'].values)
plt.show()
Note that you could use the plot method on your dataframe as:
ax = raw_data.plot(y='Profit')
plt.show()
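A minimal sketch of both suggestions side by side. The CSV file is not available here, so the data is a synthetic stand-in; the assumed layout (a 'Month' date index and a numeric 'Profit' column) is taken from the question, and the values themselves are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption, so this runs without a display)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for Monthly_Raw_Financials.csv (hypothetical values).
months = pd.date_range("2020-01-01", periods=12, freq="MS")
raw_data = pd.DataFrame({"Profit": np.arange(12) * 10.0}, index=months)
raw_data.index.name = "Month"

fig, (ax_values, ax_series) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: hand matplotlib plain NumPy arrays instead of pandas objects.
ax_values.plot(raw_data.index.to_numpy(), raw_data["Profit"].to_numpy())

# Option 2: let pandas draw the Series onto an existing Axes.
raw_data["Profit"].plot(ax=ax_series)
```

In an interactive session, drop the matplotlib.use("Agg") line and finish with plt.show().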
Related
I am trying to make a plot that uses subplots in Python, but it doesn't seem to be working.
I have a DataFrame whose index holds datetimes. I have two columns, Data1 and Data2, that I would like to plot, and I am trying to use subplots. My code is below:
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.plot(x=df[index], y=df['Data1'])
ax2.plot(x=df[index], y=df['Data2'])
plt.show()
When I run this I get the following error.
TypeError: 'DataFrame' object cannot be interpreted as an integer
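A minimal sketch of the twin-axes layout described above, assuming a DataFrame with a datetime index and columns Data1 and Data2 (the data here is synthetic). Note that Axes.plot takes x and y as positional arguments, whereas DataFrame.plot is the one that accepts x= and y= keywords. Whether that accounts for the exact TypeError shown is not certain.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption, so this runs without a display)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data standing in for the real DataFrame (hypothetical values).
idx = pd.date_range("2021-01-01", periods=30, freq="D")
df = pd.DataFrame(
    {"Data1": np.linspace(0.0, 1.0, 30), "Data2": np.linspace(100.0, 50.0, 30)},
    index=idx,
)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis that shares the same x-axis

# x and y are passed positionally; the index is df.index, not df[index].
ax1.plot(df.index, df["Data1"].to_numpy(), color="tab:blue")
ax2.plot(df.index, df["Data2"].to_numpy(), color="tab:orange")
ax1.set_ylabel("Data1")
ax2.set_ylabel("Data2")
```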
I was looking to format the ticks on my y-axis (the thread "Format y axis as percent" is great), but a lot of the solutions were causing an AttributeError for my particular code:
#Example code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# arbitrary data to plot
df = pd.DataFrame(np.random.randn(100,2))
#plotting the data and then plotting a 0 line over the top
ax = df.plot()
ax = plt.plot([0,100],[0,0])
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])
plt.show()
A bit of testing revealed that the second plot call clobbered the ax object: it went from class 'matplotlib.axes._subplots.AxesSubplot' to class 'list'. That makes it easy to work around. Either don't assign the second plot call to ax:
ax = df.plot()
plt.plot([0,100],[0,0])
Or move the formatting line up in the code before the ax object gets changed:
ax = df.plot()
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])
ax = plt.plot([0,100],[0,0])
So my question is: what is the best practice? Both workarounds feel incorrect. What is the best way to assign the plt.plot() call to the ax object without changing it? Or would best practice be to use plt.plot(df.index, df) instead of df.plot()?
*Note dp8's answer in Format y axis as percent "plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()])" worked regardless of the mess that I made with calls to ax
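I think you are getting mixed up. When you do:
ax = df.plot()
you've already made the ax object. plt.plot() returns a list of Line2D objects, not an Axes, which is why assigning its result to ax broke the later calls. If you want to add more to the axes (like another plot), you can simply use its methods, such as: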
ax.plot([0,100],[0,0])
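Putting that together, a minimal sketch that keeps ax as the Axes throughout. Using matplotlib.ticker.PercentFormatter for the tick labels is a swapped-in alternative to the set_yticklabels approach from the question, not something the original thread prescribes; the variable name zero_line is purely illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption, so this runs without a display)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import PercentFormatter

# Arbitrary data to plot, as in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 2)))

ax = df.plot()  # pandas returns the Axes it drew on

# Axes.plot returns a list of Line2D objects; keep it under a
# different name so ax still refers to the Axes.
zero_line = ax.plot([0, 100], [0, 0], color="black")

# Format y ticks as percentages, treating 1.0 as 100%.
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0, decimals=2))
```

For a horizontal reference line specifically, ax.axhline(0) is another option that spans the full width of the axes regardless of the x-limits.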
I have multiple CSV files that I am trying to plot in the same figure to compare them. I have already read that pandas does not keep the previous plot and creates a new one every time. People were talking about using an ax variable, but I do not understand it...
For now I have:
def scatter_plot(csvfile, param, exp):
    for i in range(1, 10):
        df = pd.read_csv('{}{}.csv'.format(csvfile, i))
        ax = df.plot(kind='scatter', x=param, y='Adjusted')
        df.plot.line(x=param, y='Adjusted', ax=ax, style='b')
    plt.show()
    plt.savefig('plot/{}/{}'.format(exp, param), dpi=100)
But it shows me ten plots and only saves the last one.
Any idea?
The structure is:
1. Create an axes to plot to.
2. Run the loop to populate the axes.
3. Save and/or show (save before show).
In terms of code:
import matplotlib.pyplot as plt
import pandas as pd
ax = plt.gca()
for i in range(1, 10):
    df = pd.read_csv(...)
    df.plot(..., ax=ax)
    df.plot.line(..., ax=ax)
plt.savefig(...)
plt.show()
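A runnable sketch of that structure. The nine CSV files are replaced with synthetic DataFrames, and the column name "param" and the output file name are hypothetical stand-ins for whatever the real files and paths are.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption, so this runs without a display)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the nine CSV files.
frames = [
    pd.DataFrame({"param": np.arange(5), "Adjusted": np.arange(5) * i})
    for i in range(1, 10)
]

fig, ax = plt.subplots()  # 1. create a single Axes
for df in frames:         # 2. populate it inside the loop
    df.plot(kind="scatter", x="param", y="Adjusted", ax=ax)
    df.plot.line(x="param", y="Adjusted", ax=ax, style="b", legend=False)

# 3. save before showing; the path is illustrative only
out_path = os.path.join(tempfile.gettempdir(), "comparison.png")
fig.savefig(out_path, dpi=100)
```

Because every call receives ax=ax, pandas draws onto the same Axes instead of opening a new figure per file. In an interactive session, call plt.show() after the save.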
Pretty much what it says in the title. Most pandas examples suggest doing fig = plt.figure() before df.plot(..). But if I do that, two figures pop up after plt.show(): the first completely empty and the second with the actual pandas figure. Any ideas why?
On a DataFrame, df.plot(..) will create a new figure, unless you provide an Axes object to the ax keyword argument.
So you are correct that the plt.figure() is not needed in this case. The plt.figure() calls in the pandas documentation should be removed, as they indeed are not needed. There is an issue about this: https://github.com/pydata/pandas/issues/8776
What you can do with the ax keyword is, e.g.:
fig, ax = plt.subplots()
df.plot(..., ax=ax)
Note that when plotting a Series, this will by default plot on the 'current' axes (plt.gca()) if you don't provide ax.
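A small sketch that checks this behaviour, assuming a throwaway DataFrame (the data and the names figs_before, figs_after, and returned are illustrative). Passing ax=ax makes pandas draw on the existing Axes, so no additional figure should appear.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption, so this runs without a display)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(10), "b": np.arange(10) ** 2})

fig, ax = plt.subplots()
figs_before = plt.get_fignums()  # figure numbers that exist before plotting

returned = df.plot(ax=ax)        # pandas draws on the Axes we supplied

figs_after = plt.get_fignums()   # unchanged if no new figure was created
```

Here returned is the same object as ax, which is a convenient way to confirm that pandas reused the supplied Axes.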
In trying to do some small multiple stuff, I want to make a bunch of subplots with Matplotlib and toss in varying data to each. pyplot.subplots() gives me a Figure and Numpy array of Axes, but in trying to iterate over the axes, I am stumped on how to actually get at them.
I'm trying something akin to:
import numpy as np
import matplotlib.pyplot as plt
f, axs = plt.subplots(2,2)
for ax in np.nditer(axs, flags=['refs_ok']):
    ax.plot([1,2],[3,4])
However, the type of ax in each iteration is not an Axes, but rather an ndarray, so attempting to plot fails with:
AttributeError: 'numpy.ndarray' object has no attribute 'plot'
How can I loop over my axes?
You can do this more simply:
for ax in axs.ravel():
    ax.plot(...)
NumPy arrays have a .flat attribute that returns a 1-D iterator:
for ax in axs.flat:
    ax.plot(...)
Another option is reshaping the array to a single dimension:
for ax in np.reshape(axs, -1):
    ax.plot(...)
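A minimal sketch of the small-multiples loop using axs.flat, assuming four made-up data series (the name datasets is illustrative). Pairing the axes with the data via zip gives each subplot its own series.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption, so this runs without a display)
import matplotlib.pyplot as plt
import numpy as np

fig, axs = plt.subplots(2, 2)  # axs is a 2x2 NumPy array of Axes

# Hypothetical data: one series per subplot.
datasets = [np.arange(4) * k for k in range(1, 5)]

# axs.flat yields the Axes objects themselves, in row-major order.
for ax, y in zip(axs.flat, datasets):
    ax.plot(y)
```

One caveat: plt.subplots(1, 1) returns a single Axes rather than an array, so passing squeeze=False is a way to make sure axs is always an array that can be iterated like this.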