Plotting dates only when frequency changes - python

I have been trying to plot date against frequency.
This is what my data set looks like:
2017-07-04,13
2018-04-11,13
2017-08-17,13
2017-08-30,13
2018-04-26,12
2018-01-03,12
2017-07-05,11
2017-06-21,11
This is the code I have tried:
import csv
import matplotlib.pyplot as plt

# Write the (date, frequency) rows built earlier (not shown) to a CSV file
with open('test.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(temp)

### Extract data from CSV ###
with open('test.csv', 'r') as n:
    reader = csv.reader(n)
    dates = []
    freq = []
    for row in reader:
        dates.append(row[0])
        freq.append(row[1])

fig = plt.figure()
graph = fig.add_subplot(111)

# Plot the data as a red line with round markers
graph.plot(dates, freq, 'r-o')
graph.set_xticks(dates)
graph.set_xticklabels(dates)

plt.show()
This is the result I got: the x-axis labels are very cluttered. I want the dates in the labels to be displayed only when there is a change in value. I don't know how to do that.
Help is appreciated. Thanks!

Firstly, I would strongly encourage you to use the pandas library and its DataFrame object to handle your data. It has some very useful functions, such as read_csv, which will save you some work.
To have matplotlib space the xticks more sensibly, you'll want to convert your dates to datetime objects (instead of storing your dates as strings).
Here I'll read your data in with pandas, parse the dates and order by date:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Read data
df = pd.read_csv('/path/to/test.csv', names=['date', 'freq'], parse_dates=['date'])
# Sort by date
df.sort_values(by='date', inplace=True)
You can then go ahead and plot the data (you'll need the latest version of pandas to automatically handle the dates):
fig, ax = plt.subplots(1, 1)
# Plot date against frequency
ax.plot(df['date'], df['freq'], 'r-o')
# Rotate the tick labels
ax.tick_params(axis='x', rotation=45)
fig.tight_layout()
If you only wanted to display dates when the frequency changes, the following would work:
ax.set_xticks(df.loc[df['freq'].diff() != 0, 'date'])
though I wouldn't recommend it (the unequal spacing looks messy).
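If instead you keep every date on the axis and just let matplotlib thin out and format the labels, here is a minimal sketch using matplotlib's date locator and formatter (building on the df above; the particular locator and format string are only one reasonable choice):
import matplotlib.dates as mdates

fig, ax = plt.subplots(1, 1)
ax.plot(df['date'], df['freq'], 'r-o')

# Let matplotlib choose sensible date ticks and label them as YYYY-MM-DD
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax.tick_params(axis='x', rotation=45)
fig.tight_layout()
plt.show()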

Related

Newbie Matplotlib and Pandas Plotting from CSV file

I haven't had much training with Matplotlib at all, and this really seems like a basic plotting application, but I'm getting nothing but errors.
Using Python 3, I'm simply trying to plot historical stock price data from a CSV file, using the date as the x axis and prices as the y. The data CSV looks like this:
(only just now noticing the big gap in times, but whatever)
import glob
import pandas as pd
import matplotlib.pyplot as plt
def plot_test():
files = glob.glob('./data/test/*.csv')
for file in files:
df = pd.read_csv(file, header=1, delimiter=',', index_col=1)
df['close'].plot()
plt.show()
plot_test()
I'm using glob for now just to identify any CSV file in that folder, but I've also tried just designating one specific CSV filename and get the same error:
KeyError: 'close'
I've also tried just designating a specific column number to only plot one particular column instead, but I don't know what's going on.
Ideally, I would like to plot it just like real stock data, where everything is on the same graph: volume at the bottom on its own axis, open/high/low/close on the y axis, and date on the x axis for every row in the file. I've tried a few different solutions but can't seem to figure it out. I know this has probably been asked before, but I've tried lots of different solutions from SO and others and mine seems to be hanging up on me. Thanks so much for the newbie help!
In the pandas documentation you can find that the header kwarg should be 0 for your csv, as the first row contains the column names. What is happening is that the DataFrame you are building doesn't have the column close, because it is taking the headers from the "second" row. It will probably work fine if you remove the header kwarg or change it to header=0. It is the same with the other kwargs: there is no need to define them. A simple df = pd.read_csv(file) will do just fine.
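For instance, a minimal version of the question's plot_test with the header kwarg dropped might look like this (keeping the glob loop from the question):
import glob
import pandas as pd
import matplotlib.pyplot as plt

def plot_test():
    files = glob.glob('./data/test/*.csv')
    for file in files:
        # header defaults to 0, so the first row supplies the column names
        # and df['close'] exists
        df = pd.read_csv(file)
        df['close'].plot()
    plt.show()

plot_test()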
You can prettify this according to your needs:
import pandas
import matplotlib.pyplot as plt

def plot_test(file):
    df = pandas.read_csv(file)
    # convert timestamp
    df['timestamp'] = pandas.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M')
    # plot prices
    ax1 = plt.subplot(211)
    ax1.plot_date(df['timestamp'], df['open'], '-', label='open')
    ax1.plot_date(df['timestamp'], df['close'], '-', label='close')
    ax1.plot_date(df['timestamp'], df['high'], '-', label='high')
    ax1.plot_date(df['timestamp'], df['low'], '-', label='low')
    ax1.legend()
    # plot volume
    ax2 = plt.subplot(212)
    # issue: https://github.com/matplotlib/matplotlib/issues/9610
    df.set_index('timestamp', inplace=True)
    df.index.to_pydatetime()
    ax2.bar(df.index, df['volume'], width=1e-3)
    ax2.xaxis_date()
    plt.show()
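For example, you could reuse the glob pattern from the question to call it on each file:
import glob

for file in glob.glob('./data/test/*.csv'):
    plot_test(file)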

How to filter csv with matplotlib using dates on the x-axis and rainfall on the y-axis?

I have a csv file with two columns: one called 'Date' and the other, called 'Rainfall', with the rainfall amount in inches. I am not sure how to go about this; so far my approach has not been working. I also need to skip the first 5 lines of the file before I get to the 'Date' and 'Rainfall' columns.
Here is the code I have so far:
import matplotlib.pyplot as plt
import csv

x = []
y = []

with open('1541553208_et.csv', 'r') as csvfile:
    plots = csv.reader(csvfile, delimiter=',')
    for row in plots:
        for i in row:
            x.append(row[0])
            y.append(row[1])

plt.plot(x, y, label='Loaded from file!')
plt.xlabel('Dates')
plt.ylabel('Evaporation (inches)')
plt.title('Eden_7')
plt.legend()
plt.show()
When I run the code I get the following incorrect results:
I want to have it so that each month's rainfall data is clustered into one group.
Here is an example of what I am going for:
I am trying to get the same effect as the top. How could this be done?
Thank you
You may have a simpler time using the pandas library instead of the csv library.
For instance, pandas allows you to store the csv file directly into a data structure called a dataframe. This will allow you to group on dates or rainfall and plot the data.
import pandas as pd
import matplotlib.pyplot as plt

# rain will be a DataFrame instance (filename taken from the question)
rain = pd.read_csv('1541553208_et.csv')
rain = rain.groupby(rain['rainfall'])
rain.plot(kind='bar')
plt.show()
Play around with it, pandas is very powerful.
You can find the pandas documentation here: https://pandas.pydata.org/pandas-docs/stable/
While this may not be an immediate solution, it may help in the long run.
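If the goal is the monthly clustering described in the question, a minimal sketch along those lines might be as follows (the column names 'Date' and 'Rainfall' and the five skipped lines come from the question; the rest is an assumption about the file layout):
import pandas as pd
import matplotlib.pyplot as plt

# Skip the five leading lines, then parse the 'Date' column as datetimes
# (column names and skipped rows follow the question; layout is an assumption)
rain = pd.read_csv('1541553208_et.csv', skiprows=5, parse_dates=['Date'])

# Collapse the rainfall into one total per calendar month and plot it as bars
monthly = rain.resample('M', on='Date')['Rainfall'].sum()
monthly.plot(kind='bar')

plt.ylabel('Rainfall (inches)')
plt.show()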
Using the pandas library would be easier, as previously mentioned. But following your csv approach, can you try running this:
import matplotlib.pyplot as plt
import csv

x = []
y = []

f = open('1541553208_et.csv')
csv_f = csv.reader(f, delimiter=',')
# Read both columns in a single pass (a csv reader can only be iterated once)
for row in csv_f:
    x.append(row[0])
    y.append(row[1])
f.close()

plt.plot(x, y, label='Loaded from file!')
plt.xlabel('Dates')
plt.ylabel('Evaporation (inches)')
plt.title('Eden_7')
plt.legend()
plt.show()

Plot data according to the date recorded

I have the following code that takes a list of files. Using their data, it plots a pie chart and extracts the mean value and percentiles for each file.
The file, however, might contain recorded data from several days. (The file has the date in the left column and the recorded values in the right column.) Now I have to do the same thing as before, but instead of plotting and getting the mean value from each whole file, I need to plot the pie chart and get the mean value for each date recorded in the file.
import dateutil.parser
import glob
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

selection = input('Press all ')
counter = 0
files1 = glob.glob(r'C:\Users\And\Documents\testing *.csv')
d2 = {}

for sfile in files1:
    if selection == 'all':
        x = []
        y = []
        z = []
        xtime = 0
        ytime = 0
        ztime = 0
        data_file = np.genfromtxt(sfile, delimiter=',', usecols=range(2), unpack=True, skip_header=10, dtype='U')
        tdelta = dateutil.parser.parse(data_file[0][1][11:]) - dateutil.parser.parse(data_file[0][0][11:])
        tseconds = tdelta.total_seconds()
        for i in data_file[1]:
            if i != 0:
                if float(i) >= 55:
                    x.append(float(i))
                    xtime += tseconds
                elif float(i) > 40 and float(i) < 55:
                    y.append(float(i))
                    ytime += tseconds
                else:
                    z.append(float(i))
                    ztime += tseconds
        labels = ["upper", "middle", "lower"]
        sizes = [xtime, ytime, ztime]
        legends = [xtime, ytime, ztime]
        colors = ["blue", "orange", "yellow"]
        plt.pie(sizes, explode=(0.1, 0, 0), labels=labels, colors=colors, autopct='%1.1f%%', shadow=False, startangle=140)
        plt.legend(legends, loc='best')
        plt.axis('equal')
        plt.show()
        plt.savefig("test{filename}.png".format(filename=counter))
        plt.clf()
        xarray = np.asarray(x)
        yarray = np.asarray(y)
        zarray = np.asarray(z)
        totalarray = np.append(zarray, np.append(xarray, yarray))
        counter += 1
        EQ = np.mean(totalarray)
        P15, P50, P85 = np.percentile(totalarray, 15), np.percentile(totalarray, 50), np.percentile(totalarray, 85)
        d2[sfile[36:]] = [f'{P15:.2f}', f'{P50:.2f}', f'{P85:.2f}', f'{EQ:.2f}']

table1 = pd.DataFrame(d2, index=['P15', 'P50', 'P85', 'EQ'])
table = table1.T
The image shows a portion of the data in the csv file
I am having trouble writing code that creates a separate pie chart for each of the different dates that the files contain, instead of plotting one pie chart for the whole file. At the end I would like to have a table with the mean value for each date. Any help on how to modify the code to do this would be appreciated.
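One way to split each file by date before the existing binning runs, sketched with pandas (the two-column layout and the 10 skipped header rows mirror the genfromtxt call above; the file name and column names here are only placeholders):
import pandas as pd

# Placeholder file name; the real files come from the glob above
sfile = r'C:\Users\And\Documents\testing example.csv'

# Two columns (timestamp, value), data starting after 10 header rows,
# mirroring the genfromtxt call in the question
df = pd.read_csv(sfile, header=None, skiprows=10,
                 names=['datetime', 'value'], parse_dates=['datetime'])

# One group per recorded date; each group's values can be fed through the
# existing binning/pie-chart code instead of the whole file at once
daily_means = {}
for day, group in df.groupby(df['datetime'].dt.date):
    values = group['value'].astype(float)
    daily_means[day] = values.mean()

# Table with the mean value for each date
print(pd.Series(daily_means, name='mean'))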

Reading data from csv and create a graph

I have a csv file with data in the following format -
Issue_Type DateTime
Issue1 03/07/2011 11:20:44
Issue2 01/05/2011 12:30:34
Issue3 01/01/2011 09:44:21
... ...
I'm able to read this csv file, but what I'm unable to achieve is to plot a graph or rather trend based on the data.
For instance, I'm trying to plot a graph with the X-axis as DateTime (only the month) and the Y-axis as the number of issues. So I would show the trend as a line graph with 3 lines indicating the pattern of issues under each category for the month.
I really don't have a code for plotting the graph and hence can't share any, but so far I'm only reading the csv file. I'm not sure how to proceed further to plot a graph
PS: I'm not bent on using Python. Since I've parsed csv files with Python before, I thought of using it here, but if there is an easier approach in some other language I would be open to exploring that as well.
A way to do this is to use dataframes with pandas.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.read_csv("D:\Programmes Python\Data\Data_csv.txt",sep=";") #Reads the csv
df.index = pd.to_datetime(df["DateTime"]) #Set the index of the dataframe to the DateTime column
del df["DateTime"] #The DateTime column is now useless
fig, ax = plt.subplots()
ax.plot(df.index,df["Issue_Type"])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m')) #This will only show the month number on the graph
This assumes that Issue1/2/3 are integers; I assumed they were, as I didn't really understand what they were supposed to be.
Edit: this should do the trick then. It's not pretty and can probably be optimised, but it works well:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

df = pd.read_csv("D:\Programmes Python\Data\Data_csv.txt", sep=";")
df.index = pd.to_datetime(df["DateTime"])
del df["DateTime"]

# Turn each "IssueN" label into the number N so it can be plotted
issue_numbers = []
for issue in df["Issue_Type"]:
    issue_numbers.append(int(issue[5:]))
df["Issue_number"] = issue_numbers

fig, ax = plt.subplots()
ax.plot(df.index, df["Issue_number"])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m'))
plt.show()
The first thing you need to do is to parse the datetime fields as dates/times. Try using dateutil.parser for that.
Next, you will need to count the number of issues of each type in each month. The naive way to do that would be to maintain lists of lists for each issue type, and just iterate through each row, see which month and which issue type it is, and then increment the appropriate counter.
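Here is a sketch of that counting step, using a dict keyed by issue type and (year, month) rather than lists of lists (the file name and the comma delimiter are assumptions; the column layout follows the question):
import csv
import dateutil.parser

counts = {}  # counts[issue_type][(year, month)] -> number of issues that month
with open('issues.csv') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for issue_type, stamp in reader:
        when = dateutil.parser.parse(stamp)
        key = (when.year, when.month)
        counts.setdefault(issue_type, {})
        counts[issue_type][key] = counts[issue_type].get(key, 0) + 1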
When you have such a frequency count of issues, sorted by issue types, you can simply plot them against dates like this:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt

# Build one tick per month over the years covered by the data
dates = []
for year in range(starting_year, ending_year):
    for month in range(1, 13):
        dates.append(dt.datetime(year=year, month=month, day=1))

month_formatter = mdates.DateFormatter('%b')  # Format dates to only show month names

fig = plt.figure()
ax = fig.add_subplot(111)
# issues[i] holds the monthly counts for issue type i+1, built as described above
ax.plot(dates, issues[0])  # To plot just issues of type 1
ax.plot(dates, issues[1])  # To plot just issues of type 2
ax.plot(dates, issues[2])  # To plot just issues of type 3
ax.xaxis.set_major_formatter(month_formatter)  # Format X tick labels
plt.show()
plt.close()
Honestly, I would just use R. Check out this link on downloading / setting up R & RStudio.
data <- read.csv(file="c:/yourdatafile.csv", header=TRUE, sep=",")
attach(data)
data$Month <- format(as.Date(data$DateTime), "%m")
plot(DateTime, Issue_Type)

Using pandas/matplotlib/python, I cannot visualize my csv file as clusters

My csv file is,
https://github.com/camenergydatalab/EnergyDataSimulationChallenge/blob/master/challenge2/data/total_watt.csv
I want to visualize this csv file as clusters.
My ideal result would be the following image. (Higher points (red zone) would be higher energy consumption and lower points (blue zone) would be lower energy consumption.)
I want to set x-axis as dates (e.g. 2011-04-18), y-axis as time (e.g. 13:22:00), and z-axis as energy consumption (e.g. 925.840613752523).
I successfully visualized the csv data file as values per 30mins with the following program.
from matplotlib import style
from matplotlib import pylab as plt
import numpy as np

style.use('ggplot')

filename = 'total_watt.csv'
date = []
number = []

import csv
with open(filename, 'rb') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in csvreader:
        if len(row) == 2:
            date.append(row[0])
            number.append(row[1])

number = np.array(number)

import datetime
for ii in range(len(date)):
    date[ii] = datetime.datetime.strptime(date[ii], '%Y-%m-%d %H:%M:%S')

plt.plot(date, number)
plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()
I also succeeded to visualize the csv data file as values per day with the following program.
from matplotlib import style
from matplotlib import pylab as plt
import numpy as np
import pandas as pd
style.use('ggplot')
filename='total_watt.csv'
date=[]
number=[]
import csv
with open(filename, 'rb') as csvfile:
df = pd.read_csv('total_watt.csv', parse_dates=[0], index_col=[0])
df = df.resample('1D', how='sum')
import datetime
for ii in range(len(date)):
date[ii]=datetime.datetime.strptime(date[ii], '%Y-%m-%d %H:%M:%S')
plt.plot(date,number)
plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')
df.plot()
plt.show()
Although I could visualize the csv file as values per 30 mins and per day, I do not have any idea how to visualize the csv data as clusters in 3D.
How can I program it?
Your main issue is probably just reshaping your data so that you have date along one dimension and time along the other. Once you do that you can use whatever plotting you like best (here I've used matplotlib's mplot3d, but it has some quirks).
What follows takes your data and reshapes it appropriately so you can then plot a surface that I believe is what you are looking for. The key is using the pivot method, which restructures your data by date and time.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
fname = 'total_watt.csv'
# Read in the data, but I skipped setting the index and made sure no data
# is lost to a nonexistent header
df = pd.read_csv(fname, parse_dates=[0], header=None, names=['datetime', 'watt'])
# We want to separate the date from the time, so create two new columns
df['date'] = [x.date() for x in df['datetime']]
df['time'] = [x.time() for x in df['datetime']]
# Now we want to reshape the data so we have dates and times making the result 2D
pv = df.pivot(index='time', columns='date', values='watt')
# Not every date has every time, so fill in the subsequent NaNs or there will be holes
# in the surface
pv = pv.fillna(0.0)
# Now, we need to construct some arrays that matplotlib will like for X and Y values
xx, yy = np.mgrid[0:len(pv),0:len(pv.columns)]
# We can now plot the values directly in matplotlib using mplot3d
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(xx, yy, pv.values, cmap='jet', rstride=1, cstride=1)
ax.grid(False)
# Now we have to adjust the ticks and ticklabels - so turn the values into strings
dates = [x.strftime('%Y-%m-%d') for x in pv.columns]
times = [str(x) for x in pv.index]
# Setting a tick every fifth element seemed about right
ax.set_xticks(xx[::5,0])
ax.set_xticklabels(times[::5])
ax.set_yticks(yy[0,::5])
ax.set_yticklabels(dates[::5])
plt.show()
This gives me (using your data) the following graph:
Note that I've assumed when plotting and making the ticks that your dates and times are linear (which they are in this case). If you have data with uneven samples, you'll have to do some interpolation before plotting.
