Python Matplotlib Plotting CSV data, formatting date X label - python

My data looks as follows:
2012021305, 65217
2012021306, 82418
2012021307, 71316
2012021308, 66833
2012021309, 69406
2012021310, 76422
2012021311, 94188
2012021312, 111817
2012021313, 127002
2012021314, 141099
2012021315, 147830
2012021316, 136330
2012021317, 122252
2012021318, 118619
2012021319, 115763
2012021320, 121393
2012021321, 130022
2012021322, 137658
2012021323, 139363
The first column is the date in YYYYMMDDHH format. I'm trying to graph the data using the csv2rec function from matplotlib.mlab. I can get the data to graph, but the x axis and its labels are not showing up the way I expect them to.
import matplotlib
matplotlib.use('Agg')
from matplotlib.mlab import csv2rec
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from pylab import *
output_image_name='plot1.png'
input_filename="data.log"
input = open(input_filename, 'r')
input.close()
data = csv2rec(input_filename, names=['time', 'count'])
rcParams['figure.figsize'] = 10, 5
rcParams['font.size'] = 8
fig = plt.figure()
plt.plot(data['time'], data['count'])
ax = fig.add_subplot(111)
ax.plot(data['time'], data['count'])
hours = mdates.HourLocator()
fmt = mdates.DateFormatter('%Y%M%D%H')
ax.xaxis.set_major_locator(hours)
ax.xaxis.set_major_formatter(fmt)
ax.grid()
plt.ylabel("Count")
plt.title("Count Log Per Hour")
fig.autofmt_xdate(bottom=0.2, rotation=90, ha='left')
plt.savefig(output_image_name)
I assume this has something to do with the date format. Any suggestions?

You need to convert the x-values to datetime objects.
Something like:
from datetime import datetime
time_vec = [datetime.strptime(str(x), '%Y%m%d%H') for x in data['time']]
plot(time_vec, data['count'])
Currently, you are telling Python to format integers (2012021305) as dates, which it does not know how to do, so it returns an empty string (although I suspect that errors are being raised somewhere).
You should also check your format string: in '%Y%M%D%H', %M means minutes, while month and day of month are the lowercase %m and %d.
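Putting the two points above together, here is a minimal sketch of the whole pipeline. It reads the rows with the standard csv module rather than csv2rec (a substitution on my part, since csv2rec is not available in current Matplotlib releases), and it inlines a few rows from the question so that it runs on its own. The label format '%Y-%m-%d %H:00' is just one reasonable choice, not something the question asked for.

```python
import csv
import io
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as in the question
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# A few rows from the question, inlined so the sketch is self-contained.
raw = io.StringIO(
    "2012021305, 65217\n"
    "2012021306, 82418\n"
    "2012021307, 71316\n"
    "2012021308, 66833\n"
)

times, counts = [], []
for stamp, count in csv.reader(raw, skipinitialspace=True):
    # %Y%m%d%H: four-digit year, month, day, hour (lowercase m and d).
    times.append(datetime.strptime(stamp.strip(), "%Y%m%d%H"))
    counts.append(int(count))

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(times, counts)  # x values are datetime objects, not integers
ax.xaxis.set_major_locator(mdates.HourLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:00"))
ax.set_ylabel("Count")
ax.set_title("Count Log Per Hour")
ax.grid(True)
fig.autofmt_xdate(rotation=90, ha="center")
fig.canvas.draw()  # render once; call fig.savefig(...) to write an image
```

To use it on the real file, replace the io.StringIO object with an open file handle for data.log.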

Related

How to plot time on the y axis correctly using python matplotlib?

I have two lists containing the sunset and sunrise times and the corresponding dates.
It looks like:
sunrises = ['06:30', '06:28', '06:27', ...]
dates = ['3.21', '3.22', '3.23', ...]
I want to make a plot of the sunrise times as the Y axis and the dates as the X axis.
Simply using
ax.plot(dates, sunrises)
ax.xaxis.set_major_locator(matplotlib.ticker.MultipleLocator(7))
ax.yaxis.set_major_locator(matplotlib.ticker.MultipleLocator(7))
plt.show()
can plot the dates correctly, but the time is wrong:
And actually, the sunrise time isn't supposed to be a straight line.
How do I solve this problem?
You need to convert the time strings into datetime objects, a format that matplotlib understands:
from matplotlib import pyplot as plt
import matplotlib as mpl
from datetime import datetime
import matplotlib.dates as mdates
sunrises = ['06:30', '06:28', '06:27',]
sunrises_dt = [datetime.strptime(item,'%H:%M') for item in sunrises]
dates = ['3.21', '3.22', '3.23',]
fig,ax = plt.subplots()
ax.plot(dates, sunrises_dt)
ax.yaxis.set_major_formatter(mdates.DateFormatter('%H:%M',))
ax.xaxis.set_major_locator(mpl.ticker.MultipleLocator(1))
plt.show()
This is because your sunrises are not numerical. I'm assuming you want them in a form where "6:30" means 6.5, which is calculated below:
import matplotlib.pyplot as plt
sunrises = ['06:30', '06:28', '06:27']
# This converts to decimals
sunrises = [float(x[0:2])+(float(x[-2:])/60) for x in sunrises]
dates = ['3.21', '3.22', '3.23']
plt.plot(sunrises, dates)
plt.xlabel('sunrises')
plt.ylabel('dates')
plt.show()
Note, your dates are being treated as decimals. Is this correct?
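If the dates should be real dates rather than strings or decimals, both axes can be parsed with strptime. This is only a sketch: the year 2021 is my assumption (the question gives month and day only), and the tick formats are one possible choice.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

sunrises = ["06:30", "06:28", "06:27"]
dates = ["3.21", "3.22", "3.23"]

YEAR = 2021  # assumption: the question does not say which year

# strptime without a date part defaults to 1900-01-01, which is fine here
# because only the time of day is shown on the y axis.
sunrise_dt = [datetime.strptime(s, "%H:%M") for s in sunrises]
date_dt = [datetime.strptime(f"{YEAR}.{d}", "%Y.%m.%d") for d in dates]

fig, ax = plt.subplots()
ax.plot(date_dt, sunrise_dt, marker="o")
ax.xaxis.set_major_locator(mdates.DayLocator())            # one tick per day
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m.%d"))
ax.yaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
fig.canvas.draw()
```

With a full year of data, a coarser locator such as mdates.WeekdayLocator or mdates.MonthLocator would probably be a better fit than one tick per day.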

Only show the first letter of the Month as label of a matplotlib datetime axis [duplicate]

I have time-series plots (over 1 year) where the months on the x-axis are of the form Jan, Feb, Mar, etc, but I would like to have just the first letter of the month instead (J,F,M, etc). I set the tick marks using
ax.xaxis.set_major_locator(MonthLocator())
ax.xaxis.set_minor_locator(MonthLocator())
ax.xaxis.set_major_formatter(matplotlib.ticker.NullFormatter())
ax.xaxis.set_minor_formatter(matplotlib.dates.DateFormatter('%b'))
Any help would be appreciated.
The following snippet based on the official example here works for me.
This uses a function-based index formatter in order to return only the first letter of the month, as requested.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import matplotlib.cbook as cbook
import matplotlib.ticker as ticker
datafile = cbook.get_sample_data('aapl.csv', asfileobj=False)
print('loading', datafile)
# Note: mlab.csv2rec comes from older Matplotlib releases and is not available in current ones
r = mlab.csv2rec(datafile)
r.sort()
r = r[-365:] # get the last year
# next we'll write a custom formatter
N = len(r)
ind = np.arange(N) # the evenly spaced plot indices
def format_date(x, pos=None):
    thisind = np.clip(int(x+0.5), 0, N-1)
    return r.date[thisind].strftime('%b')[0]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(ind, r.adj_close, 'o-')
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
fig.autofmt_xdate()
plt.show()
I tried to make the solution suggested by @Appleman1234 work, but since I wanted a solution that I could save in an external configuration script and import in other programs, I found it inconvenient that the formatter relied on variables defined outside the formatter function itself.
I did not solve this but I just wanted to share my slightly shorter solution here so that you and maybe others can take it or leave it.
It turned out to be a little tricky to get the labels in the first place, since you need to draw the axes, before the tick labels are set. Otherwise you just get empty strings, when you use Text.get_text().
You may want to get rid of the argument minor=True, which was specific to my case.
# ...
# Manipulate tick labels
plt.draw()
ax.set_xticklabels(
[t.get_text()[0] for t in ax.get_xticklabels(minor=True)], minor=True
)
I hope it helps:)
The original answer uses the index of the dates. This is not necessary. One can instead get the month names from the DateFormatter('%b') and use a FuncFormatter to use only the first letter of the month.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
from matplotlib.dates import MonthLocator, DateFormatter
x = np.arange("2019-01-01", "2019-12-31", dtype=np.datetime64)
y = np.random.rand(len(x))
fig, ax = plt.subplots()
ax.plot(x,y)
month_fmt = DateFormatter('%b')
def m_fmt(x, pos=None):
    return month_fmt(x)[0]
ax.xaxis.set_major_locator(MonthLocator())
ax.xaxis.set_major_formatter(FuncFormatter(m_fmt))
plt.show()

Adding formatted dates as xticks in Matplotlib

I am trying to add a list of dates to Matplotlib xticks and when I do that the actual plot disappears keeping only xticks.
For example, I have the following code:
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.dates import (DateFormatter, rrulewrapper, RRuleLocator, YEARLY)
# Generate random data and dates
data = np.random.randn(10000)
start = dt.datetime.strptime("2019-03-14", "%Y-%m-%d")
end = dt.datetime.strptime("2046-07-30", "%Y-%m-%d")
date = [start + dt.timedelta(days=x) for x in range(0, (end-start).days)]
rule = rrulewrapper(YEARLY, byeaster=1, interval=2)
loc = RRuleLocator(rule)
formatter = DateFormatter('%d/%m/%y')
fig, ax = plt.subplots()
ax.xaxis.set_major_locator(loc)
ax.xaxis.set_major_formatter(formatter)
ax.xaxis.set_tick_params(rotation=30, labelsize=10)
plt.plot(data)
# ax.set_xlim(min(date), max(date))
plt.show()
This code plots the data which looks like this:
Now if I uncomment ax.set_xlim(min(date), max(date)) and rerun the code I get:
You can see that I only get the dates, formatted correctly, but not the plot. I am not sure what the problem is here. Any help would be appreciated.
Update
If I change data = np.random.randn(10000) to data = np.random.randn(1000000), then I am able to see the plot, which is not what I want.
Most likely your data is plotted, but not at the correct location. If you go along that example you would need to add something like fig.autofmt_xdate() to your code.
The way to do this is by passing the date array along with data array in the plot method. That is with the given example it will be:
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.dates import (DateFormatter, rrulewrapper, RRuleLocator, YEARLY)
# Generate random data and dates
data = np.random.randn(10000)
start = dt.datetime.strptime("2019-03-14", "%Y-%m-%d")
end = dt.datetime.strptime("2046-07-30", "%Y-%m-%d")
date = [start + dt.timedelta(days=x) for x in range(0, (end-start).days)]
rule = rrulewrapper(YEARLY, byeaster=1, interval=2)
loc = RRuleLocator(rule)
formatter = DateFormatter('%d/%m/%y')
fig, ax = plt.subplots()
ax.xaxis.set_major_locator(loc)
ax.xaxis.set_major_formatter(formatter)
ax.xaxis.set_tick_params(rotation=30, labelsize=10)
plt.plot(date, data)
ax.set_xlim(min(date), max(date))
plt.show()
Then you'll get:
See matplotlib.pyplot.plot() for more information.
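To see why the line vanished in the question's version, it may help to look at the numbers involved. plt.plot(data) uses the sample index (0, 1, 2, ...) as x, whereas the date limits are converted to floats counting days from an epoch. The sketch below only compares the two ranges. The exact float values depend on the epoch your Matplotlib version uses, so none are shown here.

```python
import datetime as dt

import matplotlib.dates as mdates

start = dt.datetime(2019, 3, 14)
end = dt.datetime(2046, 7, 30)

# Matplotlib represents dates internally as floats (days since an epoch).
lo = mdates.date2num(start)
hi = mdates.date2num(end)

n_points = 10000
last_index = n_points - 1  # largest x value used by plt.plot(data)

print("x limits as floats:", lo, hi)
print("index range of the data: 0 ..", last_index)
print("data inside the limits:", lo <= last_index <= hi)
```

The index range ends before the lower date limit, so the line is drawn outside the visible area. With a million points the indices do reach into the limits, which matches the update in the question.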

Matplotlib x-axis overlapping using time string

I want to create a plot from the following data:
timeArray= ['11:47:46.585', '11:47:46.695', '11:47:46.805', '11:47:46.915', '11:47:47.025', '11:47:47.135', '11:47:47.245', '11:47:47.355', '11:47:47.465', '11:47:47.575', '11:47:47.685', '11:47:47.795', '11:47:47.905', '11:47:48.015', '11:47:48.125', '11:47:48.235', '11:47:48.345', '11:47:48.455', '11:47:48.565', '11:47:48.675', '11:47:48.785', '11:47:48.895', '11:47:49.005', '11:47:49.115', '11:47:49.225', '11:47:49.335', '11:47:49.445', '11:47:49.555', '11:47:49.665', '11:47:49.775', '11:47:49.885', '11:47:49.995', '11:47:50.105', '11:47:50.215', '11:47:50.325', '11:47:50.435', '11:47:50.545', '11:47:50.655', '11:47:50.765', '11:47:50.875', '11:47:50.985', '11:47:51.095', '11:47:51.205', '11:47:51.315', '11:47:51.425', '11:47:51.535', '11:47:51.645', '11:47:51.755', '11:47:51.865', '11:47:51.975', '11:47:52.085', '11:47:52.195', '11:47:52.305', '11:47:52.415']
valueArray = [10382.0, 8372.0, 11117.0, 11804.0, 10164.0, 10221.0, 10488.0, 7910.0, 12911.0, 11422.0, 15361.0, 15424.0, 10629.0, 14993.0, 13827.0, 15164.0, 10514.0, 10356.0, 14638.0, 12272.0, 14980.0, 14391.0, 12984.0, 18967.0, 15792.0, 14753.0, 16205.0, 19187.0, 13922.0, 10787.0, 14500.0, 12918.0, 13985.0, 14695.0, 14014.0, 12087.0, 12163.0, 11424.0, 8598.0, 8573.0, 9986.0, 10315.0, 11449.0, 9146.0, 11160.0, 6861.0, 10211.0, 9097.0, 8443.0, 5446.0, 6354.0, 6829.0, 5786.0, 7860.0]
timeArray will be x-axis, valueArray will be y-axis.
plot line looks like this:
import matplotlib.pyplot as plt
plt.plot(timeArray,valueArray,'r', label='values over time')
And I'm getting this graph:
I have used: plt.gcf().autofmt_xdate(), but still getting one time over the next.
I have also tried:
xaxis = np.linspace(min(timeArray),max(timeArray), 10)
plt.xticks(xaxis)
but I got a TypeError: ufunc 'multiply' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
Is there a simple way to keep the data as it is but without showing every single time with microseconds?
I'd suggest you convert your times to datetime objects rather than strings, and then use matplotlib.dates.DateFormatter() with an appropriate date format:
import matplotlib.dates as mdates
import datetime
fmt = mdates.DateFormatter('%H:%M:%S')
timeArray = [datetime.datetime.strptime(i, '%H:%M:%S.%f') for i in timeArray]
fig, ax = plt.subplots()
plt.plot(timeArray,valueArray,'r', label='values over time')
ax.xaxis.set_major_formatter(fmt)
The result:
You can do so:
import datetime
import matplotlib.pyplot as plt
import pandas as pd
timeArray = pd.to_datetime(pd.Series(timeArray))
plt.plot(timeArray,valueArray,'r', label='values over time')
plt.show()
Output:
Or adding some rotation to the ticks:
plt.xticks(rotation=45) # or 90 to be vertical
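Once the times are datetime objects, the number of ticks can also be controlled explicitly with a locator. The following is a rough sketch that places one tick per second. The values are placeholders rather than the question's data, and the times are regenerated at the same 110 ms spacing to keep the example short.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Stand-in data: 54 samples, 110 ms apart, starting at the question's first time.
start = datetime.strptime("11:47:46.585", "%H:%M:%S.%f")
times = [start + timedelta(milliseconds=110 * i) for i in range(54)]
values = [float(i % 10) for i in range(54)]  # placeholder values

fig, ax = plt.subplots()
ax.plot(times, values, "r", label="values over time")
ax.xaxis.set_major_locator(mdates.SecondLocator(interval=1))   # one tick per second
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))  # no fractional seconds
fig.autofmt_xdate()
fig.canvas.draw()
```

For longer recordings, a larger interval or mdates.AutoDateLocator would keep the tick count manageable.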

Using pandas/matplotlib/python, I cannot visualize my csv file as clusters

My csv file is,
https://github.com/camenergydatalab/EnergyDataSimulationChallenge/blob/master/challenge2/data/total_watt.csv
I want to visualize this csv file as clusters.
My ideal result would be the following image.(Higher points (red zone) would be higher energy consumption and lower points (blue zone) would be lower energy consumption.)
I want to set x-axis as dates (e.g. 2011-04-18), y-axis as time (e.g. 13:22:00), and z-axis as energy consumption (e.g. 925.840613752523).
I successfully visualized the csv data file as values per 30mins with the following program.
from matplotlib import style
from matplotlib import pylab as plt
import numpy as np
style.use('ggplot')
filename='total_watt.csv'
date=[]
number=[]
import csv
# Text mode with newline='' is what the csv module expects on Python 3
with open(filename, 'r', newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in csvreader:
        if len(row) == 2:
            date.append(row[0])
            number.append(float(row[1]))  # convert readings to numbers for plotting
number = np.array(number)
import datetime
for ii in range(len(date)):
    date[ii] = datetime.datetime.strptime(date[ii], '%Y-%m-%d %H:%M:%S')
plt.plot(date,number)
plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()
I also succeeded to visualize the csv data file as values per day with the following program.
from matplotlib import style
from matplotlib import pylab as plt
import numpy as np
import pandas as pd
style.use('ggplot')
filename='total_watt.csv'
# Parse the first column as datetimes and use it as the index
df = pd.read_csv(filename, parse_dates=[0], index_col=[0])
# Sum the readings for each day (recent pandas no longer accepts how='sum' in resample)
df = df.resample('1D').sum()
df.plot()
plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()
Although I could visualize the csv file as values per 30 minutes and per day, I do not have any idea how to visualize the csv data as clusters in 3D.
How can I program it...?
Your main issue is probably just reshaping your data so that you have date along one dimension and time along the other. Once you do that you can use whatever plotting you like best (here I've used matplotlib's mplot3d, but it has some quirks).
What follows takes your data and reshapes it appropriately so you can then plot a surface that I believe is what you are looking for. The key is using the pivot method, which restructures your data by date and time.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
fname = 'total_watt.csv'
# Read in the data, but I skipped setting the index and made sure no data
# is lost to a nonexistent header
df = pd.read_csv(fname, parse_dates=[0], header=None, names=['datetime', 'watt'])
# We want to separate the date from the time, so create two new columns
df['date'] = [x.date() for x in df['datetime']]
df['time'] = [x.time() for x in df['datetime']]
# Now we want to reshape the data so we have dates and times making the result 2D
pv = df.pivot(index='time', columns='date', values='watt')
# Not every date has every time, so fill in the subsequent NaNs or there will be holes
# in the surface
pv = pv.fillna(0.0)
# Now, we need to construct some arrays that matplotlib will like for X and Y values
xx, yy = np.mgrid[0:len(pv),0:len(pv.columns)]
# We can now plot the values directly in matplotlib using mplot3d
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(xx, yy, pv.values, cmap='jet', rstride=1, cstride=1)
ax.grid(False)
# Now we have to adjust the ticks and ticklabels - so turn the values into strings
dates = [x.strftime('%Y-%m-%d') for x in pv.columns]
times = [str(x) for x in pv.index]
# Setting a tick every fifth element seemed about right
ax.set_xticks(xx[::5,0])
ax.set_xticklabels(times[::5])
ax.set_yticks(yy[0,::5])
ax.set_yticklabels(dates[::5])
plt.show()
This gives me (using your data) the following graph:
Note that I've assumed when plotting and making the ticks that your dates and times are linear (which they are in this case). If you have data with uneven samples, you'll have to do some interpolation before plotting.
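For the unevenly sampled case, one possible approach is to put the readings on a regular grid with pandas before pivoting. The sketch below uses made-up readings, not the question's file, and a 30-minute grid. Both the grid size and the time-weighted interpolation are assumptions to adapt to your data.

```python
import pandas as pd

# Hypothetical, unevenly spaced readings (not taken from total_watt.csv).
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2011-04-18 13:22:00",
                "2011-04-18 13:52:00",
                "2011-04-18 15:01:00",
                "2011-04-18 15:30:00",
            ]
        ),
        "watt": [900.0, 1000.0, 1400.0, 1200.0],
    }
).set_index("datetime")

# Average the readings into 30-minute bins; bins with no reading become NaN.
binned = df.resample("30min").mean()

# Fill the empty bins by interpolating linearly in time between neighbours.
regular = binned.interpolate(method="time")

print(regular)
```

Note that binning labels each reading with the start of its 30-minute bin, which shifts timestamps slightly. Whether that is acceptable depends on how precise the time axis needs to be.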
