I am inexperienced with Python and am trying to parse all timestamps in the following CSV as datetime objects so that I can then perform operations on them (e.g. find timestamp differences).
I can parse single values but not the whole timestamp column. I get KeyError: '2010-12-30 14:32:00' for the first date of the timestamp column when execution reaches the line below my 'Not working' comment.
Thanks in advance.
from datetime import datetime, timedelta
import pandas as pd
from dateutil.parser import parse
csvFile = pd.read_csv('runningComplete.csv')
column = csvFile['timestamp']
column = column.str.slice(0, 19, 1)
print(column)
dt1 = datetime.strptime(column[1], '%Y-%m-%d %H:%M:%S')
print(dt1)
dt2 = datetime.strptime(column[2], '%Y-%m-%d %H:%M:%S')
print(dt2)
dt3 = dt1 - dt2
print(dt3)
for row in column:
    print(row)
Not working:
for row in column:
    timestamp = datetime.strptime(column[row], '%Y-%m-%d %H:%M:%S')
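For what it's worth, the KeyError seems to come from column[row]: `for row in column` already yields each timestamp string, so column[row] then tries to use that string as an index label. Here is a minimal sketch of two ways around it. It assumes the column holds strings like the one in the error message; the sample values (including the trailing fractional part that the slice removes) are made up for illustration.

```python
from datetime import datetime

import pandas as pd

# Hypothetical stand-in for csvFile['timestamp']; only the first
# date/time comes from the error message, the rest is invented.
column = pd.Series([
    "2010-12-30 14:32:00.000",
    "2010-12-30 14:32:05.000",
    "2010-12-30 14:33:10.000",
])
column = column.str.slice(0, 19)

fmt = "%Y-%m-%d %H:%M:%S"

# Option 1: iterate over the values directly instead of indexing with them.
parsed = [datetime.strptime(value, fmt) for value in column]

# Option 2: let pandas convert the whole column in one call.
timestamps = pd.to_datetime(column, format=fmt)

# Gap between consecutive rows in seconds (the first row has no
# predecessor, so its gap is NaN).
gaps = timestamps.diff().dt.total_seconds()
print(gaps.tolist())
```

Option 2 is usually the more convenient one when the goal is to compute differences over the whole column.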
I have two times in a CSV file, 9:20:00 AM and 4:09:21 PM, and I need to read them in Python. How do I find the time between them?
There are various ways of reading CSVs in Python. I would suggest you take a look at the official Python csv library, and at pandas and its easy-to-use read_csv function.
As for finding the time difference, you will have to parse the strings and subtract one from the other, like this:
from dateutil.parser import parse
a = "9:20:00 AM"
b = "4:09:21 PM"
a_obj = parse(a)
b_obj = parse(b)
time_diff = b_obj - a_obj
print(time_diff.total_seconds())
You can use Python's csv package to read and write CSV files.
For example:
import csv
with open('eggs.csv', newline='') as csvfile:  # on Python 2, open with 'rb' instead
    spamreader = csv.reader(csvfile)
    for row in spamreader:
        pass  # your logic goes here
Using the package, read the two dates. They will probably be strings, so convert them into Python datetime objects and then find the time between the two dates.
a = first_date
b = second_date
c = b - a
divmod(c.days * 86400 + c.seconds, 60)
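To tie the pieces together, here is a minimal sketch that reads the two times with the csv module and computes the gap. It assumes both times sit on one row and use the 12-hour format from the question; the in-memory file is only a stand-in for a real CSV.

```python
import csv
import io
from datetime import datetime

# In-memory stand-in for the CSV file (assumed layout: both times on one row).
csvfile = io.StringIO("9:20:00 AM,4:09:21 PM\n")

fmt = "%I:%M:%S %p"  # 12-hour clock with AM/PM
for row in csv.reader(csvfile):
    first_date = datetime.strptime(row[0], fmt)
    second_date = datetime.strptime(row[1], fmt)
    c = second_date - first_date
    # Split the total number of seconds into whole minutes and leftover seconds.
    minutes, seconds = divmod(c.days * 86400 + c.seconds, 60)
    print(minutes, seconds)
```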
You may be interested in the datetime module.
Try this in your interpreter.
>>> from datetime import datetime
>>> a = datetime.strptime('9:20:00 AM','%I:%M:%S %p')
>>> a
datetime.datetime(1900, 1, 1, 9, 20)
>>> b = datetime.strptime('4:09:21 PM','%I:%M:%S %p')
>>> b
datetime.datetime(1900, 1, 1, 16, 9, 21)
>>> c= b-a
>>> c
datetime.timedelta(0, 24561)
So your '9:20:00 AM' was converted into a datetime object, and likewise the other time.
NOTE: '%I:%M:%S %p' is the format that tells strptime how to convert the string into a datetime object. Subtracting one datetime object from another gives you a timedelta object:
datetime.timedelta(0, 24561)
You can read the days and seconds from it directly. It has no hours or minutes attributes, though; for those you have to do some simple arithmetic.
Here is some code for you:
import datetime
def days_hours_minutes(td):
    # Split a timedelta into days, whole hours and whole minutes
    return td.days, td.seconds//3600, (td.seconds//60)%60
a = datetime.datetime.strptime('9:20:00 AM','%I:%M:%S %p')
b = datetime.datetime.strptime('4:09:21 PM','%I:%M:%S %p')
c = b - a  # same as datetime.timedelta(0, 24561)
days,hours,minutes = days_hours_minutes(c)
print("{0}:Days,{1}:Hours,{2}:Minutes".format(days,hours,minutes))
output:
0:Days,6:Hours,49:Minutes
I have defined a function days_hours_minutes() that returns the days, hours and minutes.
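If you also want the leftover seconds, the same arithmetic can be written with divmod. This is a small sketch; the helper name days_hours_minutes_seconds is made up for this example.

```python
from datetime import datetime

def days_hours_minutes_seconds(td):
    """Split a timedelta into days, hours, minutes and seconds."""
    minutes, seconds = divmod(td.seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return td.days, hours, minutes, seconds

a = datetime.strptime('9:20:00 AM', '%I:%M:%S %p')
b = datetime.strptime('4:09:21 PM', '%I:%M:%S %p')
result = days_hours_minutes_seconds(b - a)
print("{0}:Days,{1}:Hours,{2}:Minutes,{3}:Seconds".format(*result))
```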
I have about 100 CSV files in one location (at the moment; there will be more tomorrow), and 24-40 new files are added every day. What is the best way to import only the files from the past day, without having to type each file name like this:
data = pd.read_csv('/data/testingfile-PM_18707-2017_06_14-05_03_23__382.csv', delimiter = ';', low_memory=False)
data1 = pd.read_csv('/data/testingfile--PM_18707-2017_06_14-06_30_56__131.csv', delimiter = ';', low_memory=False)
Is it possible to write some timestamp recognition function?
from datetime import time
from datetime import date
from datetime import datetime
import fnmatch
def get_local_file(date, hour, path='data/'):
    """Get date+hour processing file from local drive
    :param date: str Processing date
    :param hour: str Processing hour
    :param path: str Path to file location
    :return: Pandas DF Retrieved DataFrame
    """
    hour = [time(i).strftime(%H) for i in range(24)]
    sdate = date.replace('-', '_') + "-" + str(hour)
    for p_file in os.listdir(path):
        if fnmatch.fnmatch(p_file, 'testingfile-PM*'+sdate+'*.csv'):
            return pd.read_csv(path+p_file, delimiter=';')
I found something like this, but I can't make it work.
If you are looking for a way to extract the date from the name of your CSV file, have a look at Python's datetime module (or, to be precise, its strptime method). It lets you parse strings into datetimes like this:
from datetime import datetime
name = "data/testingfile-PM_18707-2017_06_14-05_03_23__382.csv"
datepart = name.strip("data/testingfile-PM_18707-").split("__")[0] #quick and dirty parsing method that satisfies the given two examples.
date = datetime.strptime(datepart,"%Y_%m_%d-%H_%M_%S")
print(datepart)
print(date)
2017_06_14-05_03_23
2017-06-14 05:03:23
So if you want to selectively open only CSVs that are less than a day old, you could do something like this:
import glob
from datetime import datetime
import pandas as pd
now = datetime.now()
frames = []  # DataFrames read from the recent files
for csv_path in glob.glob("data/*.csv"):
    datepart = csv_path.strip("data/testingfile-PM_18707-").split("__")[0]
    date = datetime.strptime(datepart, "%Y_%m_%d-%H_%M_%S")
    if (now - date).total_seconds() < 3600*24:
        frames.append(pd.read_csv(csv_path, delimiter=';'))
    else:
        print("Too old to care!")
Note that this has nothing to do with Pandas itself.
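One caveat: str.strip removes any of the listed characters from both ends rather than a literal prefix, so the quick-and-dirty parsing can misfire on other file names. A slightly more defensive sketch pulls the timestamp out with a regular expression instead. The helper names file_timestamp and is_recent and the pattern are assumptions based on the two file names in the question, and "now" is fixed only so the example is reproducible.

```python
import re
from datetime import datetime, timedelta

# Matches e.g. 2017_06_14-05_03_23 anywhere in the file name.
STAMP = re.compile(r"\d{4}_\d{2}_\d{2}-\d{2}_\d{2}_\d{2}")

def file_timestamp(name):
    """Return the datetime embedded in a file name, or None if there is none."""
    match = STAMP.search(name)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y_%m_%d-%H_%M_%S")

def is_recent(name, now, max_age=timedelta(days=1)):
    """True if the embedded timestamp lies within max_age before now."""
    stamp = file_timestamp(name)
    return stamp is not None and timedelta(0) <= now - stamp < max_age

names = [
    "/data/testingfile-PM_18707-2017_06_14-05_03_23__382.csv",
    "/data/testingfile--PM_18707-2017_06_14-06_30_56__131.csv",
]
now = datetime(2017, 6, 15, 6, 0, 0)  # in real use: datetime.now()
recent = [name for name in names if is_recent(name, now)]
print(recent)
```

In real use the list of names would come from glob.glob or os.listdir, and each recent name would be passed to pd.read_csv.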
import csv
import pandas as pd
from datetime import datetime,time,date
from pandas.io.data import DataReader
fd = pd.read_csv('c:\\path\\to\\file.csv')
fd.columns = ['Date','Time']
datex = fd.Date
timex = fd.Time
timestr = datetime.strptime ( str(datex+" "+timex) , "%m/%d/%Y %H:%M")
So, what I'm trying to do is pass columns Date and Time to datetime. There are two columns, date and time containing, obviously, the date and time. But when I try the above method, I receive this error:
\n35760 08/07/2015 04:56\n35761 08/07/2015 04:57\n35762 08/07/2015 04:58\n35763 08/07/2015 04:59\ndtype: object' does not match format '%m/%d/%Y %H:%M'
So, how do I either strip or remove \nXXXXX from datex and timex? Or otherwise match the format?
# Concatenate the Date and Time columns into a single string column
# (named date_time so that it does not shadow the imported datetime class)
date_time = datex + timex
# Remove stray characters and whitespace from the newly created column
date_time = date_time.str.replace(r'\n\d+', '', regex=True).str.strip()
# The strings are now ready to be converted to datetimes in one call
pd.to_datetime(date_time, format='%m/%d/%Y%H:%M')
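The error message itself is a hint: str(datex+" "+timex) turns the whole Series into its printed form (index numbers, newlines and the trailing dtype line), which is why strptime cannot match it. Here is a minimal sketch that joins the two columns with a space and converts the result directly. The three sample rows are taken from the error message; the in-memory CSV layout is an assumption.

```python
import io

import pandas as pd

# Stand-in for the real file: one date column and one time column, no header.
csv_text = "08/07/2015,04:56\n08/07/2015,04:57\n08/07/2015,04:58\n"
fd = pd.read_csv(io.StringIO(csv_text), header=None, names=["Date", "Time"])

# Element-wise string concatenation keeps one value per row,
# unlike str(series), which produces a single big string.
combined = fd["Date"] + " " + fd["Time"]
fd["Timestamp"] = pd.to_datetime(combined, format="%m/%d/%Y %H:%M")
print(fd["Timestamp"].tolist())
```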
Use the parse_dates parameter built into pandas' read_csv :)
pd.read_csv('c:\\path\\to\\file.csv', parse_dates=True)
I am trying to read date stamps in from a data logger and use these dates in plots. I have been playing with matplotlib dates date2num, datestr2num, and datetime but I keep getting formatting errors and am having trouble finding what the correct syntax and keywords are to do this (and also what they mean). I have been reading through the matplotlib help with not much luck. If you have any help or a better way to read in this information I would love the feedback.
import numpy as n
import matplotlib.pyplot as p
import matplotlib.dates as d
import datetime as dt
fileobj=open("filename",'r')
data=fileobj.readlines()
fileobj.close()
time=n.empty(len(data))
for i in range(len(data)):
    strings=data[i].split(',')
    if i >5:
        some_time_dt = dt.datetime.strptime(str(strings[0]), '%Y-%m-%d %H:%M:%S')
        time = d.date2num(some_time_dt)
Example data:
"2013-02-28 16:53:30",1588,11.85,24.35,22.93,24.1,25.05,22.06,22.2,30.94,21.99,22.7,21.91,22.02,21.79 ,21.72
"2013-02-28 16:53:31",1589,11.85,24.35,23,24.12,25.05,22.09,22.25,31.19,21.97,22.71,21.91,22.02,21.78 ,21.72
"2013-02-28 16:53:32",1590,11.85,24.35,22.98,24.12,25.05,22.12,22.3,31.35,21.98,22.68,21.9,22.01,21.7 4,21.69
"2013-02-28 16:53:33",1591,11.85,24.35,22.95,24.14,25.06,22.15,22.33,31.49,21.96,22.67,21.87,22,21.73 ,21.66
March 20, 2013
I was able to get this to plot, but I need to know how to get rid of the UTC label that is printed, since the time is in PST, not UTC. I would prefer not to show a timezone at all.
A simple solution would be to parse the file twice, once for the dates and once for the data:
import numpy as np
import datetime as dt
D = np.loadtxt("filename",delimiter=",",usecols=[0],dtype="str")
Z = np.loadtxt("filename",delimiter=",",usecols=range(1,10))
DATES = [dt.datetime.strptime(d,'"%Y-%m-%d %H:%M:%S"') for d in D]
You could also use the converters argument to pass a lambda function to loadtxt() so that it does the string to datetime object conversion for you. It doesn't save you any lines of code, I'm just noting it for a bit of variety:
datey = lambda x: dt.datetime.strptime(x,'"%Y-%m-%d %H:%M:%S"')
D = np.loadtxt("filename",delimiter=",",usecols=[0],
               dtype=dt.datetime,converters={0:datey})
Z = np.loadtxt("filename",delimiter=",",usecols=range(1,10))
It sounds like you are having errors with parsing the dates. There may be a problem with the date format string you are using.
There is a table with all the options available for datetime parsing here:
http://docs.python.org/2/library/datetime.html#strftime-strptime-behavior
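For the sample data in the question, the culprit is most likely the double quotes around each timestamp. This small sketch uses the first timestamp from the example data to show that the plain format fails on the quoted string, along with two ways to make it match:

```python
import datetime as dt

raw = '"2013-02-28 16:53:30"'  # first field of the sample data, quotes included
fmt = '%Y-%m-%d %H:%M:%S'

try:
    dt.datetime.strptime(raw, fmt)
    failed = False
except ValueError:
    # The leading '"' does not match '%Y', so strptime raises ValueError.
    failed = True

# Fix 1: strip the quotes before parsing.
stripped = dt.datetime.strptime(raw.strip('"'), fmt)

# Fix 2: include the quotes in the format string.
quoted = dt.datetime.strptime(raw, '"' + fmt + '"')

print(failed, stripped, quoted)
```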
I normally do this kind of thing with pandas (see my comment above) but here's a rough solution using Python's built-in CSV module.
import csv
import datetime
data = []
for row in csv.reader(open('file.txt')):
    row[0] = datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S') # Parse date.
    row[1:] = map(float, row[1:]) # Convert data from strings to floats.
    data.append(row)
There are fancier ways, but this is a straightforward approach.
Running this on the data above, I get
[[datetime.datetime(2013, 2, 28, 16, 53, 30), 1588.0, 11.85...], ...]
You need to strip the " from your time string!
If there's only one time per line, I would change the for loop to something like this:
import datetime as dt  # time.strptime returns a struct_time, which date2num cannot convert
lineNumber = 0
for line in data:
    lineNumber += 1
    if lineNumber <= 5:
        continue # skip the first 5 lines
    line = line.split(',')
    timeString = line[0].strip('"')
    print(timeString)
    timestamp = dt.datetime.strptime(timeString, '%Y-%m-%d %H:%M:%S')
    print(timestamp)
    time = d.date2num(timestamp)