Plotting data from multiple pandas data frames in one plot

Plotting data from multiple pandas data frames in one plot - python

I am interested in plotting a time series with data from several different pandas data frames. I know how to plot a data for a single time series and I know how to do subplots, but how would I manage to plot from several different data frames in a single plot? I have my code below. Basically what I am doing is I am scanning through a folder of json files and parsing that json file into a panda so that I can plot. When I run this code it is only plotting from one of the pandas instead of the ten pandas created. I know that 10 pandas are created because I have a print statement to ensure they are all correct.
import sys, re
import numpy as np
import smtplib
import matplotlib.pyplot as plt
from random import randint
import csv
import pylab as pl
import math
import pandas as pd
from pandas.tools.plotting import scatter_matrix
import argparse
import matplotlib.patches as mpatches
import os
import json
parser = argparse.ArgumentParser()
parser.add_argument('-file', '--f', help = 'folder where JSON files are stored')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
args = parser.parse_args()
dat = {}
i = 0
direc = args.f
directory = os.fsencode(direc)
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
for files in os.listdir(direc):
filename = os.fsdecode(files)
if filename.endswith(".json"):
path = '/Users/Katie/Desktop/Work/' + args.f + "/" +filename
with open(path, 'r') as data_file:
data = json.load(data_file)
for r in data["commits"]:
dat[i] = (r["author_name"], r["num_deletions"], r["num_insertions"], r["num_lines_changed"],
r["num_files_changed"], r["author_date"])
name = "df" + str(i).zfill(2)
i = i + 1
name = pd.DataFrame.from_dict(dat, orient='index').reset_index()
name.columns = ["index", "author_name", "num_deletions",
"num_insertions", "num_lines_changed",
"num_files_changed", "author_date"]
del name['index']
name['author_date'] = name['author_date'].astype(int)
name['author_date'] = pd.to_datetime(name['author_date'], unit='s')
ax1.plot(name['author_date'], name['num_lines_changed'], '*',c=np.random.rand(3,))
print(name)
continue
else:
continue
plt.xticks(rotation='35')
plt.title('Number of Lines Changed vs. Author Date')
plt.show()

Quite straightforward actually. Don't let pandas confuse you. Underneath it every column is just a numpy array.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
ax1.plot(df1['A'])
ax1.plot(df2['B'])

pd.DataFrame.plot method has an argument ax for this:
fig = plt.figure()
ax = plt.subplot(111)
df1['Col1'].plot(ax=ax)
df2['Col2'].plot(ax=ax)

If you are using pandas plot, the return from datafame.plot is axes, so you can assign the next dataframe.plot equal to that axes.
df1 = pd.DataFrame({'Frame 1':pd.np.arange(5)*2},index=pd.np.arange(5))
df2 = pd.DataFrame({'Frame 2':pd.np.arange(5)*.5},index=pd.np.arange(5))
ax = df1.plot(label='df1')
df2.plot(ax=ax)
Output:
Or if your dataframes have the same index, you can use pd.concat:
pd.concat([df1,df2],axis=1).plot()

Trust me. #omdv's answer is the only solution I have found so far. Pandas dataframe plot function doesn't show plotting at all when you pass ax to it.
df_hdf = pd.read_csv(f_hd, header=None,names=['degree', 'rank', 'hits'],
dtype={'degree': np.int32, 'rank': np.float32, 'hits': np.float32})
df_hdf_pt = pd.read_csv(pt_f_hd, header=None,names=['degree', 'rank', 'hits'],
dtype={'degree': np.int32, 'rank': np.float32, 'hits': np.float32})
ax = plt.subplot()
ax.plot(df_hdf_pt['hits'])
ax.plot(df_hdf['hits'])

Related

Plotting 3 diffrent coloums from a CSV file python

My goal is to use the sorted result data to plot "Month vs Mean Temp" graph for each year on the same window.
I've sorted the first two columns that have the year and the month respectively and then saved the new sorted data into a file called NewFile, but I can't seem to get to a solution here, I used csv reader and now I'm using numpy,
Code:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
csv1 = open('Data_5.1.csv')
data = np.array(list(csv.reader(csv1,delimiter=',').astype("string")
year = data[:,0]
mounth = data[:,1]
temp= data[:,3]
fig, ax = plt.subplots(figsize=(10,10))
ax.plot(year, mounth, label='mounth/year')
ax.plot(year, temp, label='year/temp')
plt.legend()
But it just throws an error saying:
File "<ipython-input-282-282e91df631f>", line 9
year = data[:,0]
^
SyntaxError: invalid syntax
I will put two links to the files, the Data_5.1 and the NewFile respectively
Data_5.1
NewFile

1 - You didn't close brackets in line 6, hence you are getting the error in line 8.
2 - astype("string") is not needed in line 6.
I fixed your code, but you will have to complete the subplotting. Good luck!
import numpy as np
import matplotlib.pyplot as plt
import csv
plt.style.use('ggplot')
csv1 = open('Data_5.1.csv')
data = np.array(list(csv.reader(csv1,delimiter=',')))
year = data[:,0]
mounth = data[:,1]
temp= data[:,3]
fig, ax = plt.subplots(2,2) #This will create 4X4 subplots in one window
ax[0,0].plot(year, mounth, label='mounth/year') #This will plot in the 0,0 subplot
ax[0,1].plot(year, temp, label='year/temp') #This will plot in the 0,1 subplot
'''
For you to continue.
'''
plt.legend()
plt.show()

Your data is in a CSV file, and it's non-homogenous in type. Pandas is really the more appropriate tool for this.
I had to adapt your CSV slightly due to encoding errors, here is what it ended up looking like:
year,Month,Other Month,temperature_C
2003,Jan.,Some val,17.7
2004,Jan.,Some val,19.5
2005,Jan.,Some val,17.3
2006,Jan.,Some val,17.8
...
Here is a general sketch of what the code you shared could look like after the refactoring:
import matplotlib.pyplot as plt
import pandas as pd
plt.style.use('ggplot')
# csv1 = open('Data_5.1.csv')
# data = np.array(list(csv.reader(csv1,delimiter=',').astype("string")
df_1 = pd.read_csv('../resources/Data_5.1.csv', header=0, names=['year', 'month', 'some_col', 'temp'],
dtype={'some_col': str, 'temp': float, 'month': str, 'year': str})
year = df_1['year']
month = df_1['month']
temp = df_1['temp']
fig, ax = plt.subplots(figsize=(10, 10))
ax.plot(year, month, label='month/year')
ax.plot(year, temp, label='year/temp')
plt.show()
Let me know if you have any questions :)

Plotting 2 columns from multiple csv files from NASDAQ in a directory

I am trying to make a program where I read multiple csv files in a directory. The files has been downloaded from http://www.nasdaqomxnordic.com/aktier/historiskakurser
The first row is sep= and it is skipped. The separator is ';'
The problem is that even though I get the data printed from all csv files, I get only blank plots.
The idea is to show a plot of data in column 6 with date as x-axis (column 0) for one csv file at a time and so on until the given directory is empty.
I would prefer the name of the csv file (paper) only as title. Now I get the directory/csv name.
It seems as matplotlib do not understand the csv file correct even though the data is printed.
My code looks as this:
import pandas as pd
#import csv
import glob
import matplotlib.pyplot as plt
#from matplotlib.dates import date2num
import pylab
#import numpy as np
#from matplotlib import style
ferms = glob.glob("OMX-C20_ScrapeData_Long_Name/*.csv")
for ferm in ferms:
print(ferm)
# define the dataframe
data = pd.read_csv(ferm, skiprows=[0], encoding='utf-8', sep=';', header=0)
print(data)
data.head()
pylab.rcParams['figure.figsize'] = (25, 20)
plt.rcParams['figure.dpi'] = 80
plt.rcParams['legend.fontsize'] = 'medium'
plt.rcParams['figure.titlesize'] = 'large'
plt.rcParams['figure.autolayout'] = 'true'
plt.rcParams['xtick.minor.visible'] = 'true'
plt.xlabel('Date')
plt.ylabel('Closing price')
plt.title(ferm)
plt.show()
I have tried some other ways to open the csv files but the result is the same. No curves.
Hope one of you experienced guys can give a hint.

I have made a few additions to your code. I downloaded a single file from the page you linked and ran the below code. Change your ferms and add the for loop back again. One reason why you weren't getting anything is because you haven't plotted the data anywhere. You have changed the aesthetics and everything but nowhere in your code you are telling python that you want to plot this data.
Secondly even if you add the plotting command it still wouldn't plot because neither of Date and Closing price are in numeric format. I change the Date column to datetime format. Your Closing price is a comma separated string. It could be representing a number either in thousands or maybe a decimal. I have assumed it is a decimal although its more likely to a number in thousands separated by a comma. I have changed it to numeric by using a self defined function called to_num using the apply method of pandas dataframe. It replaces the comma with a decimal.
import pandas as pd
#import csv
import glob
import matplotlib.pyplot as plt
#from matplotlib.dates import date2num
import pylab
#import numpy as np
#from matplotlib import style
ferm = glob.glob("Downloads/trial/*.csv")[0]
def to_num(inpt_string):
nums = [x.strip() for x in inpt_string.split()]
return float(''.join(nums).replace(',', '.'))
# print(ferm)
data = pd.read_csv(ferm, skiprows=[0], encoding='utf-8', sep=';', header=0)
data['Date'] = pd.to_datetime(data['Date'])
data['Closing price'] = data['Closing price'].apply(to_num)
# print(data)
# data.head()
pylab.rcParams['figure.figsize'] = (25, 20)
plt.rcParams['figure.dpi'] = 80
plt.rcParams['legend.fontsize'] = 'medium'
plt.rcParams['figure.titlesize'] = 'large'
plt.rcParams['figure.autolayout'] = 'true'
plt.rcParams['xtick.minor.visible'] = 'true'
plt.xlabel('Date')
plt.ylabel('Closing price')
plt.title(ferm)
plt.plot(data.loc[:,'Date'], data.loc[:,'Closing price']) # this line plots the data
plt.show()
EDIT
Maintaining the same code structure as yours -
import pandas as pd
#import csv
import glob
import matplotlib.pyplot as plt
#from matplotlib.dates import date2num
import pylab
#import numpy as np
#from matplotlib import style
ferms = glob.glob("OMX-C20_ScrapeData_Long_Name/*.csv")
def to_num(inpt_string):
nums = [x.strip() for x in inpt_string.split()]
return float(''.join(nums).replace(',', '.'))
for ferm in ferms:
data = pd.read_csv(ferm, skiprows=[0], encoding='utf-8', sep=';', header=0)
data['Date'] = pd.to_datetime(data['Date'])
data['Closing price'] = data['Closing price'].apply(to_num) # change to numeric
# print(data)
# data.head()
pylab.rcParams['figure.figsize'] = (25, 20)
plt.rcParams['figure.dpi'] = 80
plt.rcParams['legend.fontsize'] = 'medium'
plt.rcParams['figure.titlesize'] = 'large'
plt.rcParams['figure.autolayout'] = 'true'
plt.rcParams['xtick.minor.visible'] = 'true'
plt.xlabel('Date')
plt.ylabel('Closing price')
plt.title(ferm)
plt.plot(data.loc[:,'Date'], data.loc[:,'Closing price'])
plt.show()

How to plot DataFrames? in Python

I'm trying to plot a DataFrame, but I'm not getting the results I need. This is an example of what I'm trying to do and what I'm currently getting. (I'm new in Python)
import pandas as pd
import matplotlib.pyplot as plt
my_data = {1965:{'a':52, 'b':54, 'c':67, 'd':45},
1966:{'a':34, 'b':34, 'c':35, 'd':76},
1967:{'a':56, 'b':56, 'c':54, 'd':34}}
df = pd.DataFrame(my_data)
df.plot( style=[])
plt.show()
I'm getting the following graph, but what I need is: the years in the X axis and each line must be what is currently in X axis (a,b,c,d). Thanks for your help!!.

import pandas as pd
import matplotlib.pyplot as plt
my_data = {1965:{'a':52, 'b':54, 'c':67, 'd':45},
1966:{'a':34, 'b':34, 'c':35, 'd':76},
1967:{'a':56, 'b':56, 'c':54, 'd':34}}
df = pd.DataFrame(my_data)
df.T.plot( kind='bar') # or df.T.plot.bar()
plt.show()
Updates:
If this is what you want:
df = pd.DataFrame(my_data)
df.columns=[str(x) for x in df.columns] # convert year numerical values to str
df.T.plot()
plt.show()

you can do it this way:
ax = df.T.plot(linewidth=2.5)
plt.locator_params(nbins=len(df.columns))
ax.xaxis.set_major_formatter(mtick.FormatStrFormatter('%4d'))

Title not appearing in pdf

I am iterating through files in folder and for each file I am plotting the close_price on x-axis and date on y-axis.
here is code.Everything is working fine except I want title "abc" to appear on each page but it not coming. What am I doing wrong here.
import os
from matplotlib.backends.backend_pdf import PdfPages
import pandas as pd
import matplotlib.pyplot as plt
pp = PdfPages('multipage.pdf')
pth = "D:/Technical_Data/"
for fle in os.listdir(pth):
df = pd.read_csv(os.path.join(pth, fle),usecols=(0, 4))
if not df.empty:
df=df.astype(float)
plt.title("abc")
df.plot()
pp.savefig()
pp.close()

You should pass the title as an argument of the plot() method, like:
import os
from matplotlib.backends.backend_pdf import PdfPages
import pandas as pd
import matplotlib.pyplot as plt
pp = PdfPages('multipage.pdf')
pth = "D:/Technical_Data/"
for fle in os.listdir(pth):
df = pd.read_csv(os.path.join(pth, fle),usecols=(0, 4))
if not df.empty:
df=df.astype(float)
df.plot(title="abc")
pp.savefig()
pp.close()
Another way would be to put plt.title("abc") after df.plot(). Currently, your title "abc" was overwritten by the default title of df.plot()… which is None.

Matplotlib DateFormatter for axis label not working

I'm trying to adjust the formatting of the date tick labels of the x-axis so that it only shows the Year and Month values. From what I've found online, I have to use mdates.DateFormatter, but it's not taking effect at all with my current code as is. Anyone see where the issue is? (the dates are the index of the pandas Dataframe)
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
fig = plt.figure(figsize = (10,6))
ax = fig.add_subplot(111)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
basicDF['some_column'].plot(ax=ax, kind='bar', rot=75)
ax.xaxis_date()
Reproducible scenario code:
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
rng = pd.date_range('1/1/2014', periods=20, freq='m')
blah = pd.DataFrame(data = np.random.randn(len(rng)), index=rng)
fig = plt.figure(figsize = (10,6))
ax = fig.add_subplot(111)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
blah.plot(ax=ax, kind='bar')
ax.xaxis_date()
Still can't get just the year and month to show up.
If I set the format after .plot , get an error like this:
ValueError: DateFormatter found a value of x=0, which is an illegal date. This usually occurs because you have not informed the axis that it is plotting dates, e.g., with ax.xaxis_date().
It's the same for if I put it before ax.xaxis_date() or after.

pandas just doesn't work well with custom date-time formats.
You need to just use raw matplotlib in cases like this.
import numpy
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas
N = 20
numpy.random.seed(N)
dates = pandas.date_range('1/1/2014', periods=N, freq='m')
df = pandas.DataFrame(
data=numpy.random.randn(N),
index=dates,
columns=['A']
)
fig, ax = plt.subplots(figsize=(10, 6))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax.bar(df.index, df['A'], width=25, align='center')
And that gives me:

Solution with pandas only
You can create nicely formatted ticks by using the DatetimeIndex and taking advantage of the datetime properties of the timestamps. Tick locators and formatters from matplotlib.dates are not necessary for a case like this unless you would want dynamic ticks when using the interactive interface of matplotlib for zooming in and out (more relevant for time ranges longer than in this example).
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
# Create sample time series with month start frequency, plot it with a pandas bar chart
rng = np.random.default_rng(seed=1) # random number generator
dti = pd.date_range('1/1/2014', periods=20, freq='m')
df = pd.DataFrame(data=rng.normal(size=dti.size), index=dti)
ax = df.plot.bar(figsize=(10,4), legend=None)
# Set major ticks and tick labels
ax.set_xticks(range(df.index.size))
ax.set_xticklabels([ts.strftime('%b\n%Y') if ts.year != df.index[idx-1].year
else ts.strftime('%b') for idx, ts in enumerate(df.index)])
ax.figure.autofmt_xdate(rotation=0, ha='center');

The accepted answer claims that "pandas won't work well with custom date-time formats", but you can make use of pandas' to_datetime() function to use your existing datetime Series in the dataframe:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import pandas as pd
rng = pd.date_range('1/1/2014', periods=20, freq='m')
blah = pd.DataFrame(data = np.random.randn(len(rng)), index=pd.to_datetime(rng))
fig, ax = plt.subplots()
ax.xaxis.set_major_formatter(DateFormatter('%m-%Y'))
ax.bar(blah.index, blah[0], width=25, align='center')
Will result in:
You can see the different available formats here.

I stepped into the same problem and I used an workaround to transform the index from date time format into the desired string format:
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
rng = pd.date_range('1/1/2014', periods=20, freq='m')
blah = pd.DataFrame(data = np.random.randn(len(rng)), index=rng)
fig = plt.figure(figsize = (10,6))
ax = fig.add_subplot(111)
# transform index to strings
blah_test = blah.copy()
str_index = []
for s_year,s_month in zip(blah.index.year.values,blah.index.month.values):
# build string accorind to format "%Y-%m"
string_day = '{}-{:02d}'.format(s_year,s_month)
str_index.append(string_day)
blah_test.index = str_index
blah_test.plot(ax=ax, kind='bar', rot=45)
plt.show()
which results in the following figure:

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Plotting data from multiple pandas data frames in one plot - python

pd.DataFrame.plot method has an argument ax for this: fig = plt.figure() ax = plt.subplot(111) df1['Col1'].plot(ax=ax) df2['Col2'].plot(ax=ax)

Related

Plotting 3 diffrent coloums from a CSV file python

Plotting 2 columns from multiple csv files from NASDAQ in a directory

How to plot DataFrames? in Python

Title not appearing in pdf

Matplotlib DateFormatter for axis label not working

Categories

Resources