Title not appearing in pdf

I am iterating through the files in a folder, and for each file I am plotting close_price on the x-axis and date on the y-axis.
Here is the code. Everything works fine, except that I want the title "abc" to appear on each page and it does not show up. What am I doing wrong here?
import os
from matplotlib.backends.backend_pdf import PdfPages
import pandas as pd
import matplotlib.pyplot as plt
pp = PdfPages('multipage.pdf')
pth = "D:/Technical_Data/"
for fle in os.listdir(pth):
    df = pd.read_csv(os.path.join(pth, fle), usecols=(0, 4))
    if not df.empty:
        df = df.astype(float)
        plt.title("abc")
        df.plot()
        pp.savefig()
pp.close()

You should pass the title as an argument of the plot() method, like:
import os
from matplotlib.backends.backend_pdf import PdfPages
import pandas as pd
import matplotlib.pyplot as plt
pp = PdfPages('multipage.pdf')
pth = "D:/Technical_Data/"
for fle in os.listdir(pth):
    df = pd.read_csv(os.path.join(pth, fle), usecols=(0, 4))
    if not df.empty:
        df = df.astype(float)
        df.plot(title="abc")
        pp.savefig()
pp.close()
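Another way would be to put plt.title("abc") after df.plot(). Currently, plt.title("abc") is applied to whichever axes are current before df.plot() runs. df.plot() then creates a new figure with no title, and that is the figure pp.savefig() writes to the PDF.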
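A minimal sketch of the "title after plotting" variant, in case it helps. It assumes two small made-up DataFrames instead of the CSV files in D:/Technical_Data/, writes to a temporary directory, and uses the Agg backend so it runs without a display; the plt.close() call is an addition that is not in the original code and only keeps figures from piling up.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no GUI backend needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

# Hypothetical stand-ins for the per-file DataFrames read in the question
frames = [
    pd.DataFrame({"close_price": [1.0, 2.0, 3.0]}),
    pd.DataFrame({"close_price": [3.0, 1.0, 2.0]}),
]

pdf_path = os.path.join(tempfile.mkdtemp(), "multipage.pdf")
titles = []

with PdfPages(pdf_path) as pp:
    for df in frames:
        ax = df.plot()            # creates a new figure and returns its Axes
        ax.set_title("abc")       # set the title on the Axes that was just drawn
        titles.append(ax.get_title())
        pp.savefig(ax.figure)     # save exactly this figure as one PDF page
        plt.close(ax.figure)      # free the figure once it is saved
    page_count = pp.get_pagecount()
```

Setting the title on the returned Axes avoids any dependence on which figure pyplot currently considers active.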

Related

Jupyter widgets clear_output() does not work

I am using an ipywidgets dropdown to create plots for the columns listed in the dropdown. I have two issues. Could anyone help?
1. I used clear_output() to clear the graph before the next selection, but it did not work.
2. The first time I click the first item in the dropdown list ("quarter"), it does not respond (no graph is shown). I have to select other items first before I can generate the graph for "quarter".
Thanks a lot!
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import ipywidgets as ipw
url = "https://data.london.gov.uk/download/number-international-visitors-london/b1e0f953-4c8a-4b45-95f5-e0d143d5641e/international-visitors-london-raw.csv"
df_london = pd.read_csv(url)
dropdown_Col = ipw.Dropdown(options = ['quarter', 'market', 'dur_stay', 'mode'], description='Sel Col NM:')
output = ipw.Output()
def Col_Sel(ColNm):
    output.clear_output()
    with output:
        sns.set_style("whitegrid")
        sns.relplot(x=ColNm, y='visits', data=df_london, kind='line', ci=None)
def dropdown_Col_eventhandler(change):
    Col_Sel(change.new)
dropdown_Col.observe(dropdown_Col_eventhandler, names='value')
display(dropdown_Col)
display(output)

I added plt.show() after the relplot call, and now clear_output() works.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import ipywidgets as ipw
url = "https://data.london.gov.uk/download/number-international-visitors-london/b1e0f953-4c8a-4b45-95f5-e0d143d5641e/international-visitors-london-raw.csv"
df_london = pd.read_csv(url)
dropdown_Col = ipw.Dropdown(options = ['quarter', 'market', 'dur_stay', 'mode'], description='Sel Col NM:')
output = ipw.Output()
def Col_Sel(ColNm):
    output.clear_output()
    with output:
        sns.set_style("whitegrid")
        sns.relplot(x=ColNm, y='visits', data=df_london, kind='line', ci=None)
        # Render the figure inside the Output widget so clear_output() can remove it
        plt.show()
def dropdown_Col_eventhandler(change):
    Col_Sel(change.new)
dropdown_Col.observe(dropdown_Col_eventhandler, names='value')
display(dropdown_Col)
display(output)

How to format y-axis to show percentage with Python Jupyter Notebook lmplot? [duplicate]

I have the following pandas plot:
Is it possible to add a '%' sign to the numbers on the y-axis, rather than as an axis label? For example, instead of 0.0 it would show 0.0%, and so on for all the numbers.
Code:
import pandas as pd
from pandas import DataFrame as df
import matplotlib
from pandas_datareader import data as web
import matplotlib.pyplot as plt
import datetime
end = datetime.date.today()
start = datetime.date(2020,1,1)
data = web.DataReader('fb', 'yahoo', start, end)
data['percent'] = data['Close'].pct_change()
data['percent'].plot()

Here is how you can use matplotlib.ticker:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
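# PercentFormatter() defaults to xmax=100; for fractions such as pct_change() output, pass xmax=1.0
ax.yaxis.set_major_formatter(mtick.PercentFormatter())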
plt.show()

You can also control the display format of the y-axis tick labels directly. With the format string below, I think the labels will read like 0.0%.
yvals = ax.get_yticks()
ax.set_yticklabels(["{:,.1%}".format(y) for y in yvals], fontsize=12)

You can also use plt.gca() instead of ax:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1.0))
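Putting the formatter together with pct_change() data might look like the sketch below. It assumes a short made-up price series in place of the DataReader download, the Agg backend so it runs without a display, and decimals=1 as an illustrative choice for the number of decimal places.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no GUI backend needed
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas as pd

# Hypothetical closing prices standing in for data['Close']
close = pd.Series([100.0, 102.0, 99.0, 104.0])
percent = close.pct_change()  # fractions, e.g. 0.02 means 2%

ax = percent.plot()

# xmax=1.0 tells the formatter that 1.0 corresponds to 100%
formatter = mtick.PercentFormatter(xmax=1.0, decimals=1)
ax.yaxis.set_major_formatter(formatter)

# format_pct shows what a single tick value would look like
example_label = formatter.format_pct(0.02, 1.0)
```

A formatter keeps working when the axis limits change, whereas set_yticklabels freezes the label text at the moment it is called.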

pandas dataset OverflowError when trying to use datetime data

Continuation from: Getting date/time and data out of csv into matplotlib
import datetime
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
import pandas
import StringIO
f = open(r'clean data.csv')
#Make a string buffer and read in the CSV file while stripping \x00's
output = StringIO.StringIO()
for x in f.readlines():
    output.write(x.replace('\x00',''))
#Close out input file
f.close()
#Set position back to start for pandas to read
output.seek(0)
df = pandas.read_csv(output, skiprows=38, parse_dates=['Time'], index_col="Time")
fig, ax = plt.subplots()
ax.plot(df.index,df['108 <Air> (C)'])
#ax.xaxis.set_major_locator(mdates.DayLocator())
#ax.format_xdata = mdates.DateFormatter('%Y-%m-%d')
#fig.autofmt_xdate()
plt.show()
I can plot the data with the code above. The problem occurs when I try to continue with this example: https://matplotlib.org/gallery/api/date.html#sphx-glr-gallery-api-date-py
If I uncomment
ax.xaxis.set_major_locator(mdates.DayLocator())
I get
OverflowError: Python int too large to convert to C long
What is going on here?
Here is some input data: https://pastebin.com/SSZyaSJ4

Plotting data from multiple pandas data frames in one plot

I am interested in plotting a time series with data from several different pandas DataFrames. I know how to plot the data for a single time series and I know how to do subplots, but how would I plot from several different DataFrames in a single plot? My code is below. Basically, I scan through a folder of JSON files and parse each JSON file into a DataFrame so that I can plot it. When I run this code, it only plots from one of the DataFrames instead of the ten that are created. I know that ten DataFrames are created because I have a print statement to check that they are all correct.
import sys, re
import numpy as np
import smtplib
import matplotlib.pyplot as plt
from random import randint
import csv
import pylab as pl
import math
import pandas as pd
from pandas.tools.plotting import scatter_matrix
import argparse
import matplotlib.patches as mpatches
import os
import json
parser = argparse.ArgumentParser()
parser.add_argument('-file', '--f', help = 'folder where JSON files are stored')
if len(sys.argv) == 1:
    parser.print_help()
    sys.exit(1)
args = parser.parse_args()
dat = {}
i = 0
direc = args.f
directory = os.fsencode(direc)
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
for files in os.listdir(direc):
    filename = os.fsdecode(files)
    if filename.endswith(".json"):
        path = '/Users/Katie/Desktop/Work/' + args.f + "/" +filename
        with open(path, 'r') as data_file:
            data = json.load(data_file)
            for r in data["commits"]:
                dat[i] = (r["author_name"], r["num_deletions"], r["num_insertions"], r["num_lines_changed"],
                          r["num_files_changed"], r["author_date"])
                name = "df" + str(i).zfill(2)
                i = i + 1
            name = pd.DataFrame.from_dict(dat, orient='index').reset_index()
            name.columns = ["index", "author_name", "num_deletions",
                            "num_insertions", "num_lines_changed",
                            "num_files_changed", "author_date"]
            del name['index']
            name['author_date'] = name['author_date'].astype(int)
            name['author_date'] = pd.to_datetime(name['author_date'], unit='s')
            ax1.plot(name['author_date'], name['num_lines_changed'], '*',c=np.random.rand(3,))
            print(name)
        continue
    else:
        continue
plt.xticks(rotation='35')
plt.title('Number of Lines Changed vs. Author Date')
plt.show()

Quite straightforward actually. Don't let pandas confuse you. Underneath it every column is just a numpy array.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
ax1.plot(df1['A'])
ax1.plot(df2['B'])

The pd.DataFrame.plot method has an ax argument for this:
fig = plt.figure()
ax = plt.subplot(111)
df1['Col1'].plot(ax=ax)
df2['Col2'].plot(ax=ax)

If you are using the pandas plot method, DataFrame.plot returns the Axes it drew on, so you can pass that Axes to the next DataFrame.plot call.
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'Frame 1': np.arange(5)*2}, index=np.arange(5))
df2 = pd.DataFrame({'Frame 2': np.arange(5)*.5}, index=np.arange(5))
ax = df1.plot(label='df1')
df2.plot(ax=ax)
Or if your dataframes have the same index, you can use pd.concat:
pd.concat([df1,df2],axis=1).plot()
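Applied to a loop over several DataFrames, the shared-Axes idea could be sketched as follows. The three small DataFrames, their names (df00, df01, df02) and their values are made up for illustration, and the Agg backend is assumed so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no GUI backend needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-file DataFrames, one per JSON file in the question
frames = {
    "df00": pd.DataFrame({"num_lines_changed": [10, 40, 25]}),
    "df01": pd.DataFrame({"num_lines_changed": [5, 15, 60]}),
    "df02": pd.DataFrame({"num_lines_changed": [30, 20, 10]}),
}

# Create the figure and Axes once, outside the loop
fig, ax = plt.subplots()

for name, df in frames.items():
    # Every call draws onto the same Axes because ax=ax is passed explicitly
    df["num_lines_changed"].plot(ax=ax, style="*", label=name)

ax.set_title("Number of Lines Changed")
ax.legend()

line_count = len(ax.get_lines())
legend_labels = ax.get_legend_handles_labels()[1]
```

Each DataFrame contributes its own line to the shared Axes, so the legend lists one entry per DataFrame.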

Trust me, @omdv's answer is the only solution I have found so far. In my case, the pandas DataFrame plot function did not show any plot at all when I passed ax to it.
df_hdf = pd.read_csv(f_hd, header=None, names=['degree', 'rank', 'hits'],
                     dtype={'degree': np.int32, 'rank': np.float32, 'hits': np.float32})
df_hdf_pt = pd.read_csv(pt_f_hd, header=None, names=['degree', 'rank', 'hits'],
                        dtype={'degree': np.int32, 'rank': np.float32, 'hits': np.float32})
ax = plt.subplot()
ax.plot(df_hdf_pt['hits'])
ax.plot(df_hdf['hits'])

trying to label the csv in python...how do i do this loading the legend in the comment in attached code

I am trying to label the columns of a CSV in Python. How do I do this using the legend in the comment at the end of the attached code?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tarfile
import os
%matplotlib inline
%autosave 20
# Raw strings keep the backslashes in Windows paths from being read as escape sequences
tar = tarfile.open(r'C:\Users\mpiercy\projects\sc-sessions-09-25.csv.tgz', mode='r:gz')
tar.extractall(r'C:\Users\mpiercy\projects\sc-sessions-09-25')
df = pd.read_csv(r'C:\Users\mpiercy\projects\sc-sessions-09-25\sc-sessions-09-25.csv', header=None)
df.head()
#site name, site id, start_time, end_time, energy_added, start_soe, end_soe

It sounds like you want to label the columns, which you can just do like this:
df.columns = ('site_name', 'site_id', 'start_time', 'end_time', 'energy_added', 'start_soe', 'end_soe')
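The names can also be supplied while reading the file, through the names parameter of pd.read_csv. The sketch below assumes two made-up rows held in memory instead of the real sc-sessions-09-25.csv, so the values are purely illustrative.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the header-less CSV file
sample_csv = io.StringIO(
    "Site A,101,2017-09-25 08:00,2017-09-25 09:00,12.5,0.20,0.80\n"
    "Site B,102,2017-09-25 10:00,2017-09-25 10:30,6.0,0.50,0.75\n"
)

column_names = ['site_name', 'site_id', 'start_time', 'end_time',
                'energy_added', 'start_soe', 'end_soe']

# header=None says the file has no header row; names supplies the labels
df = pd.read_csv(sample_csv, header=None, names=column_names)
```

Both approaches give the same column labels; passing names at read time just avoids a separate assignment step.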
