I am iterating through the files in a folder, and for each file I am plotting close_price on the x-axis and date on the y-axis.
Here is the code. Everything works fine, except that I want the title "abc" to appear on each page and it does not show up. What am I doing wrong?
import os
from matplotlib.backends.backend_pdf import PdfPages
import pandas as pd
import matplotlib.pyplot as plt
pp = PdfPages('multipage.pdf')
pth = "D:/Technical_Data/"
for fle in os.listdir(pth):
    df = pd.read_csv(os.path.join(pth, fle),usecols=(0, 4))
    if not df.empty:
        df=df.astype(float)
        plt.title("abc")
        df.plot()
        pp.savefig()
pp.close()
You should pass the title as an argument of the plot() method, like:
import os
from matplotlib.backends.backend_pdf import PdfPages
import pandas as pd
import matplotlib.pyplot as plt
pp = PdfPages('multipage.pdf')
pth = "D:/Technical_Data/"
for fle in os.listdir(pth):
    df = pd.read_csv(os.path.join(pth, fle),usecols=(0, 4))
    if not df.empty:
        df=df.astype(float)
        df.plot(title="abc")
        pp.savefig()
pp.close()
Another way would be to call plt.title("abc") after df.plot(). When no ax is passed, df.plot() draws on a new figure, so a title set beforehand ends up on a different figure from the one that pp.savefig() writes to the PDF.
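A minimal sketch of the same idea with an explicit figure and axes per page, which avoids relying on whichever figure happens to be current. The two small DataFrames, their column names, and the output file name multipage_sketch.pdf are made up for illustration; they stand in for the CSV files read in the loop.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

# Hypothetical stand-ins for the per-file DataFrames.
frames = [
    pd.DataFrame({"date": [1.0, 2.0, 3.0], "close_price": [10.0, 11.5, 11.0]}),
    pd.DataFrame({"date": [1.0, 2.0, 3.0], "close_price": [20.0, 19.0, 21.0]}),
]

# Hypothetical output location (a file in the system temp directory).
out_path = os.path.join(tempfile.gettempdir(), "multipage_sketch.pdf")

titles = []
with PdfPages(out_path) as pp:
    for df in frames:
        fig, ax = plt.subplots()        # one fresh figure per page
        df.plot(ax=ax, title="abc")     # the title is set on the axes that gets saved
        titles.append(ax.get_title())
        pp.savefig(fig)                 # save this figure explicitly
        plt.close(fig)                  # free the figure once it is written
    n_pages = pp.get_pagecount()
```

Passing the figure to pp.savefig() and closing it afterwards also keeps open figures from piling up when the folder contains many files.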
I am using an ipywidgets dropdown to create plots for the columns listed in the dropdown. I have two issues. Could anyone help?
1. I used clear_output() to clear the graph before the next selection, but it did not work.
2. The first time I clicked the first item in the dropdown list ("quarter"), nothing happened (no graph was shown). I have to select another item first before I can generate the graph for "quarter".
Thanks a lot!
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import ipywidgets as ipw
url = "https://data.london.gov.uk/download/number-international-visitors-london/b1e0f953-4c8a-4b45-95f5-e0d143d5641e/international-visitors-london-raw.csv"
df_london = pd.read_csv(url)
dropdown_Col = ipw.Dropdown(options = ['quarter', 'market', 'dur_stay', 'mode'], description='Sel Col NM:')
output = ipw.Output()
def Col_Sel(ColNm):
    output.clear_output()
    with output:
        sns.set_style("whitegrid")
        sns.relplot(x=ColNm, y='visits', data=df_london, kind='line', ci=None)
def dropdown_Col_eventhandler(change):
    Col_Sel(change.new)
dropdown_Col.observe(dropdown_Col_eventhandler, names='value')
display(dropdown_Col)
display(output)
I added plt.show() after the plotting call, and now clear_output() works.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import ipywidgets as ipw
url = "https://data.london.gov.uk/download/number-international-visitors-london/b1e0f953-4c8a-4b45-95f5-e0d143d5641e/international-visitors-london-raw.csv"
df_london = pd.read_csv(url)
dropdown_Col = ipw.Dropdown(options = ['quarter', 'market', 'dur_stay', 'mode'], description='Sel Col NM:')
output = ipw.Output()
def Col_Sel(ColNm):
    output.clear_output()
    with output:
        sns.set_style("whitegrid")
        sns.relplot(x=ColNm, y='visits', data=df_london, kind='line', ci=None)
        # Render the figure inside the Output widget so clear_output() can remove it
        plt.show()
def dropdown_Col_eventhandler(change):
    Col_Sel(change.new)
dropdown_Col.observe(dropdown_Col_eventhandler, names='value')
display(dropdown_Col)
display(output)
I have the following pandas plot:
Is it possible to add a '%' sign to the numbers on the y-axis, rather than as an axis label? For example, instead of 0.0 it would show 0.0%, and so on for all the tick values.
Code:
import pandas as pd
from pandas import DataFrame as df
import matplotlib
from pandas_datareader import data as web
import matplotlib.pyplot as plt
import datetime
end = datetime.date.today()
start = datetime.date(2020,1,1)
data = web.DataReader('fb', 'yahoo', start, end)
data['percent'] = data['Close'].pct_change()
data['percent'].plot()
Here is how you can use matplotlib.ticker:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
# PercentFormatter() treats 100 as 100%; pass xmax=1.0 if the data are fractions
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
plt.show()
You can control the display format of the y-axis tick labels directly. With the format below, the values should be shown as 0.0%.
yvals = ax.get_yticks()
ax.set_yticklabels(["{:,.1%}".format(y) for y in yvals], fontsize=12)
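If the labels need to stay in sync when the axis is rescaled, a formatter function is one alternative to fixed tick labels. This is a minimal sketch, assuming the plotted values are fractions (0.05 meaning 5%); the returns series is invented sample data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas as pd

# Made-up fractional returns standing in for data['percent'].
returns = pd.Series([0.0, 0.012, -0.008, 0.021, 0.005])

fig, ax = plt.subplots()
returns.plot(ax=ax)

# Format each tick value as a percentage with one decimal place.
as_percent = mtick.FuncFormatter(lambda y, pos: "{:.1%}".format(y))
ax.yaxis.set_major_formatter(as_percent)
```

Because the formatter is called for whatever ticks matplotlib chooses, the labels remain correct after zooming or changing the axis limits.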
You can also use plt.gca() instead of ax:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1.0))
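Applied to the kind of data in the question, this might look like the following sketch. The price series is invented; xmax=1.0 tells the formatter that 1.0 corresponds to 100%, which matches pct_change() output, and decimals=1 is an assumption about the desired precision.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas as pd

# Made-up closing prices standing in for data['Close'].
prices = pd.Series([100.0, 101.0, 99.5, 102.0, 103.5])
percent = prices.pct_change()  # fractions, e.g. 0.01 for a 1% change

fig, ax = plt.subplots()
percent.plot(ax=ax)

# Interpret 1.0 as 100% and show one decimal place on each tick label.
ax.yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1.0, decimals=1))
```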
Continuation from: Getting date/time and data out of csv into matplotlib
import datetime
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
import pandas
import io
#Make a string buffer and read in the CSV file while stripping \x00's
output = io.StringIO()
#The with block closes the input file when done
with open(r'clean data.csv') as f:
    for x in f:
        output.write(x.replace('\x00',''))
#Set position back to start for pandas to read
output.seek(0)
df = pandas.read_csv(output, skiprows=38, parse_dates=['Time'], index_col="Time")
fig, ax = plt.subplots()
ax.plot(df.index,df['108 <Air> (C)'])
#ax.xaxis.set_major_locator(mdates.DayLocator())
#ax.format_xdata = mdates.DateFormatter('%Y-%m-%d')
#fig.autofmt_xdate()
plt.show()
I can plot this data with the current code. The problem occurs when I try to continue with this example: https://matplotlib.org/gallery/api/date.html#sphx-glr-gallery-api-date-py
If I uncomment
ax.xaxis.set_major_locator(mdates.DayLocator())
I get
OverflowError: Python int too large to convert to C long
What is going on here?
Here is some input data: https://pastebin.com/SSZyaSJ4
I am interested in plotting a time series with data from several different pandas DataFrames. I know how to plot the data for a single time series and I know how to do subplots, but how would I plot data from several different DataFrames in a single plot? My code is below. Basically, I scan through a folder of JSON files and parse each JSON file into a DataFrame so that I can plot it. When I run this code, it only plots from one of the DataFrames instead of the ten that are created. I know that ten DataFrames are created because I have a print statement to check that they are all correct.
import sys, re
import numpy as np
import smtplib
import matplotlib.pyplot as plt
from random import randint
import csv
import pylab as pl
import math
import pandas as pd
from pandas.tools.plotting import scatter_matrix
import argparse
import matplotlib.patches as mpatches
import os
import json
parser = argparse.ArgumentParser()
parser.add_argument('-file', '--f', help = 'folder where JSON files are stored')
if len(sys.argv) == 1:
    parser.print_help()
    sys.exit(1)
args = parser.parse_args()
dat = {}
i = 0
direc = args.f
directory = os.fsencode(direc)
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
for files in os.listdir(direc):
    filename = os.fsdecode(files)
    if filename.endswith(".json"):
        path = '/Users/Katie/Desktop/Work/' + args.f + "/" +filename
        with open(path, 'r') as data_file:
            data = json.load(data_file)
            for r in data["commits"]:
                dat[i] = (r["author_name"], r["num_deletions"], r["num_insertions"], r["num_lines_changed"],
                          r["num_files_changed"], r["author_date"])
                name = "df" + str(i).zfill(2)
                i = i + 1
            name = pd.DataFrame.from_dict(dat, orient='index').reset_index()
            name.columns = ["index", "author_name", "num_deletions",
                            "num_insertions", "num_lines_changed",
                            "num_files_changed", "author_date"]
            del name['index']
            name['author_date'] = name['author_date'].astype(int)
            name['author_date'] = pd.to_datetime(name['author_date'], unit='s')
            ax1.plot(name['author_date'], name['num_lines_changed'], '*',c=np.random.rand(3,))
            print(name)
        continue
    else:
        continue
plt.xticks(rotation='35')
plt.title('Number of Lines Changed vs. Author Date')
plt.show()
Quite straightforward actually. Don't let pandas confuse you. Underneath it every column is just a numpy array.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
ax1.plot(df1['A'])
ax1.plot(df2['B'])
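The same pattern extends to frames built in a loop, as in the question: create the axes once and call its plot method on every iteration. A minimal sketch, assuming each frame has a datetime column and a numeric column; the three frames here are generated purely for illustration, with column names borrowed from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Generate three hypothetical frames covering consecutive five-day windows.
rng = np.random.default_rng(0)
frames = []
for k in range(3):
    dates = pd.date_range("2020-01-01", periods=5, freq="D") + pd.Timedelta(days=5 * k)
    frames.append(pd.DataFrame({
        "author_date": dates,
        "num_lines_changed": rng.integers(0, 100, size=5),
    }))

# One shared axes; each frame adds its own series of markers.
fig, ax = plt.subplots()
for k, df in enumerate(frames):
    ax.plot(df["author_date"], df["num_lines_changed"], "*", label="frame {}".format(k))
ax.legend()
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Giving each call a label and adding a legend makes it easier to confirm that every frame really ended up on the plot.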
pd.DataFrame.plot method has an argument ax for this:
fig = plt.figure()
ax = plt.subplot(111)
df1['Col1'].plot(ax=ax)
df2['Col2'].plot(ax=ax)
If you are using pandas plotting, DataFrame.plot returns the axes it drew on, so you can pass that axes to the next DataFrame.plot call.
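import numpy as np
import pandas as pd
# pd.np has been removed from recent pandas versions, so use numpy directly
df1 = pd.DataFrame({'Frame 1':np.arange(5)*2},index=np.arange(5))
df2 = pd.DataFrame({'Frame 2':np.arange(5)*.5},index=np.arange(5))
ax = df1.plot(label='df1')
df2.plot(ax=ax)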
Or if your dataframes have the same index, you can use pd.concat:
pd.concat([df1,df2],axis=1).plot()
Trust me: @omdv's answer is the only solution I have found so far. In my experience, the pandas DataFrame plot function did not show anything at all when I passed ax to it.
df_hdf = pd.read_csv(f_hd, header=None,names=['degree', 'rank', 'hits'],
                     dtype={'degree': np.int32, 'rank': np.float32, 'hits': np.float32})
df_hdf_pt = pd.read_csv(pt_f_hd, header=None,names=['degree', 'rank', 'hits'],
                        dtype={'degree': np.int32, 'rank': np.float32, 'hits': np.float32})
ax = plt.subplot()
ax.plot(df_hdf_pt['hits'])
ax.plot(df_hdf['hits'])
I am trying to label the columns of a CSV in Python. How do I do this using the legend given in the comment at the end of the attached code?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tarfile
import os
%matplotlib inline
%autosave 20
# Raw strings keep the backslashes in Windows paths from being read as escape sequences
tar = tarfile.open(r'C:\Users\mpiercy\projects\sc-sessions-09-25.csv.tgz', mode='r:gz')
tar.extractall(r'C:\Users\mpiercy\projects\sc-sessions-09-25')
df = pd.read_csv(r'C:\Users\mpiercy\projects\sc-sessions-09-25\sc-sessions-09-25.csv', header=None)
df.head()
#site name, site id, start_time, end_time, energy_added, start_soe, end_soe
It sounds like you want to label the columns, which you can just do like this:
df.columns = ('site_name', 'site_id', 'start_time', 'end_time', 'energy_added', 'start_soe', 'end_soe')
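The names can also be supplied while reading, using the names parameter of pd.read_csv together with header=None. A minimal sketch; the two rows of data are invented and read from an in-memory buffer so that the example does not depend on the real file.

```python
import io

import pandas as pd

column_names = ["site_name", "site_id", "start_time", "end_time",
                "energy_added", "start_soe", "end_soe"]

# Two invented rows standing in for the header-less CSV.
raw = io.StringIO(
    "Site A,1,2017-09-25 08:00,2017-09-25 09:00,12.5,0.20,0.80\n"
    "Site B,2,2017-09-25 10:00,2017-09-25 10:30,6.0,0.50,0.75\n"
)

# header=None says the file has no header row; names supplies the labels.
df = pd.read_csv(raw, header=None, names=column_names)
```

This avoids a separate assignment step and makes the column labels visible at the point where the file is loaded.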