I'm using matplotlib to plot some data imported from CSV files. These files have the following format:
Date,Time,A,B
25/07/2016,13:04:31,5,25550
25/07/2016,13:05:01,0,25568
....
01/08/2016,19:06:43,0,68425
The dates are formatted as they would be in the UK, i.e. %d/%m/%Y. The end result is to have two plots: one of how A changes with time, and one of how B changes with time. I'm importing the data from the CSV like so:
import matplotlib
matplotlib.use('Agg')
from matplotlib.mlab import csv2rec
import matplotlib.pyplot as plt
from datetime import datetime
import sys
...
def analyze_log(file, y):
    data = csv2rec(open(file, 'rb'))
    fig = plt.figure()
    date_vec = [datetime.strptime(str(x), '%Y-%m-%d').date() for x in data['date']]
    print date_vec[0]
    print date_vec[-1]
    time_vec = [datetime.strptime(str(x), '%Y-%m-%d %X').time() for x in data['time']]
    print time_vec[0]
    print time_vec[-1]
    datetime_vec = [datetime.combine(d, t) for d, t in zip(date_vec, time_vec)]
    print datetime_vec[0]
    print datetime_vec[-1]
    y_vec = data[y]
    plt.plot(datetime_vec, y_vec)
    ...
    # formatters, axis headers, etc.
    ...
    return plt
And all was working fine before 01 August. However, since then, matplotlib is trying to plot my 01/08/2016 data points as 2016-01-08 (08 Jan)!
I get a plotting error because it tries to plot from January to July:
RuntimeError: RRuleLocator estimated to generate 4879 ticks from 2016-01-08 09:11:00+00:00 to 2016-07-29 16:22:34+00:00:
exceeds Locator.MAXTICKS * 2 (2000)
What am I doing wrong here? The results of the print statements in the code above are:
2016-07-25
2016-01-08 #!!!!
13:04:31
19:06:43
2016-07-25 13:04:31
2016-01-08 19:06:43 #!!!!
Matplotlib's csv2rec function already parses your dates and tries to be intelligent about it. The function has two options to influence the parsing; dayfirst should help here:
dayfirst: default is False so that MM-DD-YY has precedence over DD-MM-YY.
yearfirst: default is False so that MM-DD-YY has precedence over YY-MM-DD.
See http://labix.org/python-dateutil#head-b95ce2094d189a89f80f5ae52a05b4ab7b41af47 for further information.
Note also that your file's dates are in %d/%m/%Y format, but you've given strptime the format specifier %Y-%m-%d; that only works because csv2rec has already parsed the column into date objects whose str() form is ISO.
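To see the dayfirst behaviour in isolation, here is a small sketch that calls dateutil's parser directly (the parser documented at the link above); the sample date is taken from the question:

```python
from dateutil import parser

# Ambiguous UK-style date from the question: 1 August 2016
s = "01/08/2016 19:06:43"

# Default (dayfirst=False): parsed US-style as 8 January
us = parser.parse(s)

# With dayfirst=True: parsed UK-style as 1 August
uk = parser.parse(s, dayfirst=True)

print(us.date())  # 2016-01-08
print(uk.date())  # 2016-08-01
```

This reproduces exactly the January/August flip seen in the question's print output.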
Related
I have to make spatial plots from a bunch of WRFout files that I have. Currently, I am using following lines of code to print the respective times for each spatial plot
#..Load packages
import glob
import os
import netCDF4
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap,addcyclic,cm,shiftgrid
from wrf import getvar,get_basemap,to_np,latlon_coords
#..Read the files (glob returns a list, so take the first match)
fpath = sorted(glob.glob("/path/wrfout_d01_2017-03-02_00:00:00"))[0]
with netCDF4.Dataset(fpath, 'r') as fin:
    #..Read variables
    p = getvar(fin,'pressure')
    times = getvar(fin,'times',meta=False)
    lats, lons = latlon_coords(p)
#..Make the pressure plot
fig = plt.figure()
mp = get_basemap(p)
x,y = mp(to_np(lons),to_np(lats))
cntrs = mp.contourf(x,y,to_np(p),cmap='jet')
plt.title(str(to_np(times))[0:-10])
plt.show()
The times variable gives time in the format 2017-03-02T00:00:00.000000000.
The line of code plt.title(str(to_np(times))[0:-10]) prints the time as 2017-03-02T00:00:00, which is a UTC time. But, I want it to be printed as 2017-03-01 17:00:00, which is the local time (UTC- 7 hours).
Thanks in advance, any suggestions will be highly appreciated.
You can use pandas to do the conversion, and you can choose the timezone that works for you.
I've added just the snippet that's useful.
import pandas as pd
#..Read variables
...
times = getvar(fin,'times',meta=False)
# interpret the naive timestamp as UTC, then convert to Mountain time
mountainTime = pd.Timestamp(times, tz='UTC').tz_convert('US/Mountain')
#..Make the pressure plot
...
plt.title(str(mountainTime)[0:-6])
This might help.
import datetime
dt=datetime.datetime.strptime("2017-03-02T00:00:00", "%Y-%m-%dT%H:%M:%S") #Get your datetime object
dt=dt.replace(tzinfo=datetime.timezone.utc) #Convert it to an aware datetime object in UTC time.
print(dt) #You do not need this line. For show only :P
dt=dt.astimezone() #Convert it to your local timezone
print(dt.strftime("%Y-%m-%d %H:%M:%S"))
Output:
2017-03-02 00:00:00+00:00
2017-03-02 05:30:00
My timezone is UTC+5:30 (India), so that's what is shown above. Yours should give your local time.
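If you want a fixed target timezone rather than whatever the machine's local zone happens to be, a minimal sketch using the standard-library zoneinfo module (Python 3.9+) for the question's exact case:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

# The WRF timestamp from the question, interpreted as UTC
dt_utc = datetime(2017, 3, 2, 0, 0, 0, tzinfo=timezone.utc)

# Convert to US Mountain time (UTC-7 on this date, before DST starts)
dt_local = dt_utc.astimezone(ZoneInfo("US/Mountain"))

print(dt_local.strftime("%Y-%m-%d %H:%M:%S"))  # 2017-03-01 17:00:00
```

This prints the 2017-03-01 17:00:00 the question asks for, and it keeps working correctly across DST transitions because the zone database handles the offset.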
I am reading in data from a text file which contains data in the format (date time; microVolts):
e.g. 07.03.2017 23:14:01,000; 279
And I wish to plot a graph using matplotlib by capturing only the time (x-axis) and plotting it against microVolts (y-axis). So far, I've managed to extract the time element from the string and convert it into datetime format (shown below).
I tried to append each value of time into x to plot, but the program just freezes and displays nothing.
Here is part of the code:
from datetime import datetime
import matplotlib.pyplot as plt
ecg = open(file2).readlines()
x = []
for line in range(len(ecg)):
    ecgtime = ecg[7:][line][:23]
    ecgtime = datetime.strptime(ecgtime, '%d.%m.%Y %H:%M:%S,%f')
    x.append(ecgtime.time())
I'm aware the datetime format is causing the issue but I can't figure out how to convert it into float/int as it says:
'invalid literal for float(): 23:14:01,000'
I don't have enough reputation to comment, so I'm answering instead.
datetime.datetime.time() gives a datetime.time object, but you need a float.
Could you try datetime.datetime.timestamp()?
See the last line:
from datetime import datetime
import matplotlib.pyplot as plt
ecg = open(file2).readlines()
x = []
for line in ecg[7:]:  # iterate over the data lines directly (skips the 7 header lines)
    ecgtime = line[:23]
    ecgtime = datetime.strptime(ecgtime, '%d.%m.%Y %H:%M:%S,%f')
    x.append(ecgtime.timestamp())
EDIT: timestamp() is available since Python 3.3. For Python 2 you can use
from time import mktime
...
x.append(mktime(ecgtime.timetuple()))
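To sanity-check the slicing and the format string against the sample line from the question, a small self-contained sketch:

```python
from datetime import datetime

# One data line in the file's format: "date time; microVolts"
line = "07.03.2017 23:14:01,000; 279"

# The first 23 characters hold the timestamp
ecgtime = datetime.strptime(line[:23], "%d.%m.%Y %H:%M:%S,%f")
print(ecgtime)  # 2017-03-07 23:14:01

# The microvolt value follows the semicolon
uv = int(line.split(";")[1])
print(uv)  # 279
```

Note that %f consumes the comma-separated milliseconds part ("000"), which is why the 23-character slice and the format string line up.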
I'm trying to evaluate some quadcopter flight data and got some log-files with epoch timestamps.
I then converted them to datetime values (with pd.to_datetime([...], unit='ms')) and truncated some digits.
My problem is, that I actually don't need the dates, which also makes plotting the data a lot more complicated (unwanted xtick distances, error inducing matplotlib.dates locators, etc).
Now I'm left with the following index:
2019-09-13 10:09:16.200,...
2019-09-13 10:09:16.300,...
2019-09-13 10:09:16.400,...
...
2019-09-13 10:12:18.300,...
My imports:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import glob
import os.path
from datetime import datetime
from mpl_toolkits.mplot3d import Axes3D
My data input/initialization:
data = pd.read_csv(s,',',index_col=0) # Commands
data2 = pd.read_csv(s2,',',index_col=0) # Observations
d1 = data[data['field.handle']==d].drop(['field.handle','Commanded alpha','Commanded beta','Commanded gamma'], axis=1)
d2 = data2[data2['field.handle']==d].drop(['field.handle','Observed alpha','Observed beta','Observed gamma'], axis=1)
merged_data = pd.concat([d1,d2], axis=1, sort=False)
merged_data.index = truncate_index(merged_data)
filled_merge = merged_data.groupby(merged_data.index).mean().fillna(method='ffill')
finished_merge = filled_merge.copy().dropna()
deviations = finished_merge.copy()
My plot code (sometimes working, sometimes not - depends on data, locators and formatter)
myFmt = mdates.DateFormatter('%M')
ax = deviations.plot(figsize=(14,9), use_index=True, y=['Positional x deviation','Positional y deviation','Positional z deviation'], subplots=True, sharex=True, layout=(3,1))
for axis in ax:
    for axi in axis:
        axi.set(xlabel = "Time in minutes (minor ticks in seconds)", ylabel="Deviation in meters")
        axi.xaxis.set_major_formatter(myFmt)
        axi.xaxis.set_minor_locator(mdates.SecondLocator())
        axi.xaxis.set_major_locator(mdates.MinuteLocator())
plt.suptitle(plot_title, fontsize=14)
plt.subplots_adjust(top=0.92)
It would be more beneficial for my work, I think, if the index could be in milliseconds (or fractions of a second - e.g., a float value) - starting at the first datetime value, like for example:
2019-09-13 10:09:16.200 (the first entry) would become 0 or 0.0, and the second entry would change from 2019-09-13 10:09:16.300 to 0.1.
I sadly cannot drop the index altogether and just number the rows, because there are gaps in the datetimes (of, for example, 300 milliseconds) that I want to preserve.
I tried different things to plot my data consistently, but in the end nothing worked, and I hope a new approach with a new index will solve my problem(s).
I also looked at possible candidates in the pandas and matplotlib APIs (from timedeltas to date2num, etc.) to enable my envisioned index transformation, but to no avail. Probably because I'm not really sure what the correct terminology for this 'transformation' would be.
Any help is really appreciated!
If your index looks like this:
>>> d = ['2019-09-13 10:09:16.200',
'2019-09-13 10:09:16.300',
'2019-09-13 10:09:16.400',
'2019-09-13 10:12:18.300']
>>> s = pd.Series([pd.Timestamp(thing) for thing in d])
>>> s
0 2019-09-13 10:09:16.200
1 2019-09-13 10:09:16.300
2 2019-09-13 10:09:16.400
3 2019-09-13 10:12:18.300
dtype: datetime64[ns]
>>>
You can create a timedelta series and get total seconds relative to the first item. And use it.
>>> a = s - s[0]
>>> a
0 00:00:00
1 00:00:00.100000
2 00:00:00.200000
3 00:03:02.100000
dtype: timedelta64[ns]
>>> a.dt.total_seconds()
0 0.0
1 0.1
2 0.2
3 182.1
dtype: float64
>>>
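Applied to a DataFrame index like the one in the question, the same idea becomes (df here is a hypothetical stand-in for your deviations frame):

```python
import pandas as pd

# Hypothetical DataFrame with the question's datetime index
df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime([
        "2019-09-13 10:09:16.200",
        "2019-09-13 10:09:16.300",
        "2019-09-13 10:09:16.400",
        "2019-09-13 10:12:18.300",
    ]),
)

# Replace the DatetimeIndex with seconds elapsed since the first entry;
# gaps in the original timestamps are preserved as gaps on the new axis.
df.index = (df.index - df.index[0]).total_seconds()
print(df.index.tolist())  # [0.0, 0.1, 0.2, 182.1] (up to float rounding)
```

With a plain float index, pandas plotting no longer invokes matplotlib's date locators at all, which sidesteps the tick problems described above.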
I am trying to plot histogram of percentage change in stock
my code looks like:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv("M:/Trading/1.JOZO/ALXN.csv")
dataframe = (data['Adj Close'])
zmena1 = (dataframe.pct_change(periods = 1)*100)
data["Zmena"] = zmena1
plt.hist(zmena1, bins = "auto", range = "auto" )
plt.show
but i get an error:
mn, mx = [mi + 0.0 for mi in range]
TypeError: Can't convert 'float' object to str implicitly
I tried str(zmena1) but couldn't get it to work...
I don't know how to get past this one...
From the name of the csv file, I can guess that your data can be retrieved from Yahoo Finance, so I'm using the pandas-datareader remote-access module to download all 2016 data to play with:
import datetime
from pandas_datareader import data as web
data = web.DataReader('ALXN', data_source='yahoo',
                      start=datetime.datetime(2016, 1, 1))
Now I can calculate the percent change, expressed as a percentage:
data['Zmena'] = data['Adj Close'].pct_change(periods=1)*100
From there, I would definitely use the built-in DataFrame.hist function:
data['Zmena'].hist()
Using plt.hist
In case you do want to use plt.hist instead, you need to filter out the NaN (not a number) values; in particular, the first entry will always be NaN:
print(data[['Adj Close','Zmena']].head())
Adj Close Zmena
Date
2016-01-04 184.679993 NaN
2016-01-05 184.899994 0.119126
2016-01-06 184.070007 -0.448884
2016-01-07 174.369995 -5.269741
2016-01-08 168.130005 -3.578592
So, in order to use plt.hist:
plt.hist(data.Zmena.dropna())
Another problem is that you're specifying bins = "auto", range = "auto". bins="auto" is fine, but range must be a tuple of two floats (or simply omitted for the default); passing the string "auto" is what raises the error in your traceback. See the documentation for both parameters at pyplot.hist.
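Putting the pieces together with hypothetical prices (the real CSV isn't available here, so the numbers below just stand in for the Adj Close column):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical adjusted-close prices standing in for the CSV column
adj_close = pd.Series([184.68, 184.90, 184.07, 174.37, 168.13])
zmena = adj_close.pct_change(periods=1) * 100

# The first entry is NaN, so drop it before handing values to plt.hist
plt.hist(zmena.dropna(), bins="auto")  # note: no range argument at all
plt.show()                             # show() is a call, not an attribute
```

Two incidental fixes are folded in: the NaN is dropped, and plt.show is actually called (the original plt.show without parentheses silently does nothing).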
I have list of timestamps in the format of HH:MM:SS and want to plot against some values using datetime.time. Seems like python doesn't like the way I do it. Can someone please help ?
import datetime
import matplotlib.pyplot as plt
# random data
x = [datetime.time(12,10,10), datetime.time(12, 11, 10)]
y = [1,5]
# plot
plt.plot(x,y)
plt.show()
*TypeError: float() argument must be a string or a number*
Well, it's a two-step story to get these to plot really nicely.
Step 1: prepare the data in a proper format
From a datetime to a float that is compatible with matplotlib's convention for dates/times.
As usual, the devil is hidden in the detail.
matplotlib dates are almost equal to ordinary datetimes, but not equal:
# mPlotDATEs.date2num.__doc__
#
# *d* is either a class `datetime` instance or a sequence of datetimes.
#
# Return value is a floating point number (or sequence of floats)
# which gives the number of days (fraction part represents hours,
# minutes, seconds) since 0001-01-01 00:00:00 UTC, *plus* *one*.
# The addition of one here is a historical artifact. Also, note
# that the Gregorian calendar is assumed; this is not universal
# practice. For details, see the module docstring.
So it is highly recommended to reuse matplotlib's own tools:
from matplotlib import dates as mPlotDATEs # helper functions num2date()
# # and date2num()
# # to convert to/from.
Step 2: manage axis labels, formatting, and scale (min/max) as the next issue
matplotlib brings you arms for this part too.
Check the code in this answer for all the details.
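A minimal round-trip sketch with those helper functions (the exact float values depend on the matplotlib version's date epoch, so only the round-trip is checked, not a specific number):

```python
from datetime import datetime
from matplotlib import dates as mdates

dt = datetime(2016, 7, 25, 13, 4, 31)

# Convert to matplotlib's float representation and back
num = mdates.date2num(dt)
roundtrip = mdates.num2date(num)  # returns a tz-aware datetime (UTC)

# The round-trip is accurate to well under a millisecond
assert abs((roundtrip.replace(tzinfo=None) - dt).total_seconds()) < 1e-3
```

Once the data is in this float form, the locators and formatters from matplotlib.dates operate on it directly.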
This is still a valid issue in Python 3.5.3 and Matplotlib 2.1.0.
A workaround is to use datetime.datetime objects instead of datetime.time ones:
import datetime
import matplotlib.pyplot as plt
# random data
x = [datetime.time(12,10,10), datetime.time(12, 11, 10)]
x_dt = [datetime.datetime.combine(datetime.date.today(), t) for t in x]
y = [1,5]
# plot
plt.plot(x_dt, y)
plt.show()
By default the date part should not be visible. Otherwise you can always use a DateFormatter:
import matplotlib.dates as mdates
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H-%M-%S'))
I came to this page because I have a similar issue. I have a Pandas DataFrame df with a datetime column df.dtm and a data column df.x, spanning several days, but I want to plot them using matplotlib.pyplot as a function of the time of day, not the date and time (datetime, datetimeindex). That is, I want all data points folded into the same 24h range in the plot. I can plot df.x vs. df.dtm without issue, but I've just spent two hours trying to figure out how to convert df.dtm to df.time (containing the time of day without a date) and then plot it. The (to me) straightforward solution does not work:
df.dtm = pd.to_datetime(df.dtm)
ax.plot(df.dtm, df.x)
# Works (with times on different dates; a range >24h)
df['time'] = df.dtm.dt.time
ax.plot(df.time, df.x)
# DOES NOT WORK: ConversionError('Failed to convert value(s) to axis '
matplotlib.units.ConversionError: Failed to convert value(s) to axis units:
array([datetime.time(0, 0), datetime.time(0, 5), etc.])
This does work:
pd.plotting.register_matplotlib_converters() # Needed to plot Pandas df with Matplotlib
df.dtm = pd.to_datetime(df.dtm, utc=True) # NOTE: MUST add a timezone, even if undesired
ax.plot(df.dtm, df.x)
# Works as before
df['time'] = df.dtm.dt.time
ax.plot(df.time, df.x)
# WORKS!!! (with time of day, all data in the same 24h range)
Note that the differences are in the first two lines. The first line enables better interplay between Pandas and Matplotlib; the second seems redundant (or even wrong), but that doesn't matter in my case, since I use a single timezone and it is not plotted.
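The working pattern above can be condensed into a self-contained sketch (the DataFrame here is a hypothetical stand-in for df, with two days of data folded onto one clock):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering
import matplotlib.pyplot as plt
import pandas as pd

# Register pandas' unit converters (including one for datetime.time)
pd.plotting.register_matplotlib_converters()

# Hypothetical data spanning two days
df = pd.DataFrame({
    "dtm": pd.to_datetime([
        "2019-09-13 10:09:16", "2019-09-13 10:10:16",
        "2019-09-14 10:09:46", "2019-09-14 10:10:46",
    ], utc=True),
    "x": [1.0, 2.0, 1.5, 2.5],
})
df["time"] = df.dtm.dt.time  # datetime.time objects, dates dropped

fig, ax = plt.subplots()
ax.plot(df["time"], df["x"])  # all four points land in the same 24 h range
```

The key step is register_matplotlib_converters(): it teaches matplotlib how to place raw datetime.time values on an axis, which is what fails with the ConversionError otherwise.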