I am trying to read dates from a text file and convert them to datetime objects.
Code:
from datetime import datetime, date
with open("birth.txt") as f:
    content = f.readlines()
    content = [x.strip() for x in content]
    for i in content:
        a = i.split(":")
        date_b = []
        date_b.append(a[-1])
        print date_b
        for j in date_b:
            date_object = datetime.strptime(str(j), '%m-%d-%Y')
            print date_object
Text File:
a:11-23-2001
b:02-14-2002
ValueError: time data ' 11-23-2001' does not match format '%m-%d-%Y'
Can someone help me resolve this error?
There are several problems with your code. The error is caused by a space before your date string, although I'm not sure where it comes from given the file you posted. Also, the second loop is unnecessary, and you re-create date_b on every iteration of the line loop, so it never holds more than one item. Try this:
from datetime import datetime
with open("birth.txt") as f:
    dates = []  # store this outside of your loop
    for line in f:  # read line by line
        v, d = line.strip().split(":")
        d = datetime.strptime(d.strip(), '%m-%d-%Y')  # just in case of additional whitespace
        dates.append((v, d))
print(dates)
# [('a', datetime.datetime(2001, 11, 23, 0, 0)), ('b', datetime.datetime(2002, 2, 14, 0, 0))]
You can turn the latter into a dictionary, too (dict(dates)) or build a dictionary immediately.
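To make the cause of the error concrete, here is a minimal sketch that reproduces it in memory and builds the dictionary directly. The sample lines (including the space after the colon on the first one) are only an assumption about what the real file contains, and parse_birth_lines is a hypothetical helper name, not something from the question.

```python
from datetime import datetime

DATE_FORMAT = '%m-%d-%Y'

def parse_birth_lines(lines):
    """Return a dict mapping each key to its parsed datetime."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        key, _, raw_date = line.partition(':')
        # Strip both parts so stray spaces around the colon do not matter.
        result[key.strip()] = datetime.strptime(raw_date.strip(), DATE_FORMAT)
    return result

# Assumed file contents: note the space after the colon on the first line.
sample = ["a: 11-23-2001\n", "b:02-14-2002\n"]

try:
    datetime.strptime(" 11-23-2001", DATE_FORMAT)
    leading_space_ok = True
except ValueError:
    # The format has no whitespace, so the leading space is rejected.
    leading_space_ok = False

birthdays = parse_birth_lines(sample)
print(leading_space_ok)
print(birthdays)
```

Using partition(':') instead of split(':') is a design choice here: it splits only on the first colon, so the unpacking cannot fail if the rest of the line happens to contain another one.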
Besides the issues that @zwer points out, you have the major inefficiency of reading the entire file into memory before processing it. This actually makes your job harder than it needs to be, because files in Python are iterable over their lines. You can do something like:
from datetime import datetime, date
with open('birth.txt') as f:
    for line in f:
        key, datestr = line.strip().split(':')
        # strip() guards against a space after the colon
        dateobj = datetime.strptime(datestr.strip(), '%m-%d-%Y')
        print(dateobj)
Using the fact that the file is an iterator, you can write a one-line list comprehension to generate a full list of dates:
with open('birth.txt') as f:
    dates = [datetime.strptime(line.strip().split(':')[1], '%m-%d-%Y') for line in f]
If the key has some significance, you can create a dictionary with a dictionary comprehension using a similar syntax:
with open('birth.txt') as f:
    dates = {key: datetime.strptime(datestr, '%m-%d-%Y') for key, datestr in (line.strip().split(':') for line in f)}
Related
I am working in Python with a JSON file, pulling info from it and sending it to a CSV file. The code I am using is as follows:
import csv
import json
csv_kwargs = {
    'dialect': 'excel',
    'doublequote': True,
    'quoting': csv.QUOTE_MINIMAL
}
inpfile = open('checkin.json', 'r', encoding='utf-8')
outfile = open('checkin.csv', 'w', encoding='utf-8')
writer = csv.writer(outfile, **csv_kwargs, lineterminator="\n")
for line in inpfile:
    d = json.loads(line)
    writer.writerow([d['business_id'],d['date']])
inpfile.close()
outfile.close()
checkin.json has the key values business_id and date. The date values are in the form of 'MM:DD:YYYY HH:MM:SS', showing the date and then the time. Each business_id has multiple dates associated with it. A line from the JSON file is shown below to illustrate this:
{"business_id":"--1UhMGODdWsrMastO9DZw","date":"2016-04-26 19:49:16, 2016-08-30 18:36:57, 2016-10-15 02:45:18, 2016-11-18 01:54:50, 2017-04-20 18:39:06, 2017-05-03 17:58:02"}
My question is: how do I code this to keep the date but not the time, given that both are in the same key value?
You can parse the date in your JSON as a timestamp and then truncate it to date using Python's built-in datetime module.
Import the module:
from datetime import datetime
Parse the date while writing:
for line in inpfile:
    d = json.loads(line)
    # the key is 'date', and its timestamps are separated by commas
    dates = map(lambda dt: datetime.strptime(dt.strip(), '%Y-%m-%d %H:%M:%S').strftime('%Y-%m-%d'), d['date'].split(','))
    for date in dates:
        writer.writerow([d['business_id'], date])
The formatting for date values described in your question isn't consistent: first you say it's MM:DD:YYYY, but in the sample line from the JSON input file it appears to be YYYY-MM-DD. While such details may matter, that particular one doesn't affect the revised code below. What did matter was the fact that there can be more than one date, which is why I'm updating my answer.
import csv
import json
csv_kwargs = {
    'dialect': 'excel',
    'doublequote': True,
    'quoting': csv.QUOTE_MINIMAL,
}
with open('checkin.json', 'r', encoding='utf-8') as inpfile, \
     open('checkin.csv', 'w', encoding='utf-8', newline='') as outfile:
    writer = csv.writer(outfile, **csv_kwargs)
    for line in inpfile:
        d = json.loads(line)
        # Convert date value string into list of dates with the times removed.
        dates = [date.strip().split(' ')[0] for date in d['date'].split(',')]
        writer.writerow([d['business_id']] + dates)
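As a quick way to check the date-stripping step without touching any files, here is a small in-memory sketch. The shortened sample line and the helper name dates_only are assumptions made for illustration; io.StringIO stands in for the real output file.

```python
import csv
import io
import json

def dates_only(date_field):
    """Split a comma-separated timestamp field and keep only the date parts."""
    return [stamp.strip().split(' ')[0]
            for stamp in date_field.split(',')
            if stamp.strip()]

# Shortened version of the sample line from the question.
sample_line = ('{"business_id":"--1UhMGODdWsrMastO9DZw",'
               '"date":"2016-04-26 19:49:16, 2016-08-30 18:36:57"}')

record = json.loads(sample_line)
buffer = io.StringIO()  # stands in for checkin.csv
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow([record['business_id']] + dates_only(record['date']))
print(buffer.getvalue())
```

This writes one row per business with a variable number of date columns. If one row per date is easier to work with downstream, loop over dates_only(...) and call writerow once per date instead.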
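If you're strictly using this program to convert the JSON file to CSV, you can simply use string slices. For a single timestamp string ts in the 'YYYY-MM-DD HH:MM:SS' form shown in the sample line (for example, one item of d['date'].split(', ')):
date, time = ts[:10], ts[11:]
If you want to store it as a datetime object to do something else with it (this needs from datetime import datetime):
dt = datetime.strptime(ts, '%Y-%m-%d %H:%M:%S')
# Other stuff
dt_string = dt.strftime('%Y-%m-%d')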
I need to remove the day from a date. I tried to use datetime.strftime and datetime.strptime but couldn't get them to work. I need to create tuples of 2 items (date, price) from a nested list, but I need to change the date format first.
Here's part of the code:
def get_data(my_csv):
    with open("my_csv.csv", "r") as csv_file:
        csv_reader = csv.reader(csv_file, delimiter = (','))
        next(csv_reader)
        data = []
        for line in csv_reader:
            data.append(line)
        return data
def get_monthly_avg(data):
    oldformat = '20040819'
    datetimeobject = datetime.strptime(oldformat,'%y%m%d')
    newformat = datetime.strftime('%y%m ')
You have a typo in the date formats: 'Y' has to be capitalized, because %Y is the four-digit year and %y is the two-digit year.
from datetime import datetime
# use datetime to convert
def strip_date(data):
    d = datetime.strptime(data,'%Y%m%d')
    return datetime.strftime(d,'%Y%m')
data = '20110513'
print (strip_date(data))
# or just cut off day (2 last symbols) from date string
print (data[:6])
The first variant is better because it verifies that the string is in a proper date format.
Output:
201105
201105
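Since the question also asks for (date, price) tuples, here is a minimal sketch of how the conversion could feed into that. The row layout (date first, price second) and the sample values are assumptions, because the real CSV is not shown; strip_day and to_month_price_pairs are hypothetical names.

```python
from datetime import datetime

def strip_day(date_string):
    """Convert 'YYYYMMDD' to 'YYYYMM', raising ValueError on invalid dates."""
    return datetime.strptime(date_string, '%Y%m%d').strftime('%Y%m')

def to_month_price_pairs(rows):
    """Turn [date, price] rows into (yyyymm, price) tuples."""
    return [(strip_day(date), float(price)) for date, price in rows]

# Hypothetical rows; the real CSV layout is not shown in the question.
rows = [['20040819', '100.34'], ['20040820', '108.31']]
pairs = to_month_price_pairs(rows)
print(pairs)

try:
    strip_day('20040231')  # February 31 does not exist
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The try/except at the end shows the validation benefit mentioned above: plain slicing would happily return '200402' for the impossible date, while strptime refuses it.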
You didn't specify any code, but this might work:
date = functionThatGetsDate()
date = date[0:6]
I have a file that consists of a bunch of lines that have dates in them, for example:
1, '01-JAN-10', '04-JAN-10', 100, 'HELEN', 'PRICE'
2, 'MARK', 'TYER', '05-JAN-10', '06-JAN-10', 120
I want to change the date parts of the lines to a different format, but I don't know how to detect which part of the line has the date fields and I don't know how to replace them with the new date format. I already have a function called changeDate(date) that returns a correctly formatted date given a bad format date. This is my code so far:
def editFile(filename):
    f = open(filename)
    for line in f:
        for word in line.split():
            # detect if it is a date, and change to new format
            pass
    f.close()
You can use strptime and try/except to do this. From the documentation for strptime:
Return a datetime corresponding to date_string, parsed according to format.
For more details, see the "strftime() and strptime() Behavior" section of the datetime documentation.
from datetime import datetime
s="1, '01-JAN-10', '04-FEB-28', 100, 'HELEN', 'PRICE'"
for word in s.replace(' ','').replace('\'','').split(','):
    try:
        # '%y-%b-%d' reads each field as YY-MON-DD
        dt=datetime.strptime(word,'%y-%b-%d')
        print('{0}/{1}/{2}'.format(dt.month, dt.day, dt.year))
    except ValueError:
        # not a date, so print the word unchanged
        print(word)
Result:
1
1/10/2001
2/28/2004
100
HELEN
PRICE
You can use regex to detect. It's hard to modify the file in place, maybe you could write all the new contents to a new file.
import re
with open('filename', 'r') as f:
    input_file = f.read()
# input_file = "1, '01-JAN-10', '04-JAN-10', 100, 'HELEN', 'PRICE'"
dates = re.findall(r'\d+-[A-Za-z]+-\d+', input_file) # output: ['01-JAN-10', '04-JAN-10']
for old in dates:
    # str.replace returns a new string, so assign the result back
    input_file = input_file.replace(old, changeDate(old)) # your changeDate(date) in your question
with open('new_file', 'w+') as f:
    f.write(input_file)
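The find-then-replace loop can also be collapsed into a single re.sub call with a callback, which is roughly what the sketch below does. The change_date function here is only a stand-in for the asker's changeDate(), and it assumes the fields are DD-MON-YY and should become YYYY-MM-DD; neither detail is confirmed by the question. Note that %b matches month abbreviations for the current locale, so this assumes an English locale.

```python
import re
from datetime import datetime

# Two digits, a three-letter month abbreviation, two digits.
DATE_PATTERN = re.compile(r'\b\d{2}-[A-Za-z]{3}-\d{2}\b')

def change_date(old):
    """Stand-in for the asker's changeDate(): 'DD-MON-YY' -> 'YYYY-MM-DD'."""
    return datetime.strptime(old, '%d-%b-%y').strftime('%Y-%m-%d')

def convert_dates(text):
    """Replace every date-like token in text with its reformatted version."""
    return DATE_PATTERN.sub(lambda match: change_date(match.group(0)), text)

line = "1, '01-JAN-10', '04-JAN-10', 100, 'HELEN', 'PRICE'"
converted = convert_dates(line)
print(converted)
```

Because re.sub handles each match where it occurs, the position of the date fields within the line does not matter, and non-date fields such as names and numbers are left untouched.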
Below is some code written to open a CSV file. Its values are stored like this:
03/05/2017 09:40:19,21.2,35.0
03/05/2017 09:40:27,21.2,35.0
03/05/2017 09:40:38,21.1,35.0
03/05/2017 09:40:48,21.1,35.0
This is just a snippet of code I use in a real-time plotting program. It fully works, but the arrays keep getting bigger, which is unclean. Normally new values get added to the CSV while the program is running, so the arrays become very long. Is there a way to avoid exploding arrays like this?
To see the problem, create a CSV with the values above and run the program.
from datetime import datetime
import time
y = []  # temperature
t = []  # time object
h = []  # humidity
def readfile():
    readFile = open('document.csv', 'r')
    sepFile = readFile.read().split('\n')
    readFile.close()
    for idx, plotPair in enumerate(sepFile):
        if plotPair in '. ':
            # skip. or space
            continue
        if idx > 1:  # to skip the first line
            xAndY = plotPair.split(',')
            time_string = xAndY[0]
            time_string1 = datetime.strptime(time_string, '%d/%m/%Y %H:%M:%S')
            t.append(time_string1)
            y.append(float(xAndY[1]))
            h.append(float(xAndY[2]))
    print([y])
while True:
    readfile()
    time.sleep(2)
This is the output I get:
[[21.1]]
[[21.1, 21.1]]
[[21.1, 21.1, 21.1]]
[[21.1, 21.1, 21.1, 21.1]]
[[21.1, 21.1, 21.1, 21.1, 21.1]]
Any help is appreciated.
You can use Python's deque if you also want to limit the total number of entries you keep. A deque created with maxlen behaves like a list with a maximum length: once it is full, each new entry pushes the oldest entry off the start.
The reason your list is growing is that every call re-reads the whole file and appends all of its rows again. Instead, you need to skip forward to your last entry before adding new ones. Assuming your timestamps are unique, you could use takewhile() to do this; it reads entries for as long as a condition holds.
from itertools import takewhile
from collections import deque
from datetime import datetime
import csv
import time
max_length = 1000  # keep this many entries
t = deque(maxlen=max_length)  # time object
y = deque(maxlen=max_length)  # temperature
h = deque(maxlen=max_length)  # humidity
def read_file():
    with open('document.csv', newline='') as f_input:
        csv_input = csv.reader(f_input)
        header = next(csv_input)  # skip over the header line
        # If there are existing entries, read until the last read item is found again
        if len(t):
            list(takewhile(lambda row: datetime.strptime(row[0], '%d/%m/%Y %H:%M:%S') != t[-1], csv_input))
        for row in csv_input:
            print(row)
            t.append(datetime.strptime(row[0], '%d/%m/%Y %H:%M:%S'))
            y.append(float(row[1]))
            h.append(float(row[2]))
while True:
    read_file()
    print(t)
    time.sleep(1)
Also, it is easier to work with the entries using Python's built-in csv library, which reads the values of each row into a list. As you have a header row, read this in using next() before starting the loop.
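The two building blocks used above can be tried in isolation with a few in-memory values. This is just an illustrative sketch: the readings and timestamps are made up, and plain strings stand in for the parsed datetime objects.

```python
from collections import deque
from itertools import takewhile

# A deque with maxlen silently discards the oldest item once it is full.
recent = deque(maxlen=3)  # keep only the three newest readings
for reading in [21.2, 21.2, 21.1, 21.1, 21.0]:
    recent.append(reading)
print(list(recent))

# takewhile() on an iterator consumes items while the condition holds,
# plus the first item that fails it (here: the last row already seen).
rows = iter(['09:40:19', '09:40:27', '09:40:38', '09:40:48'])
last_seen = '09:40:27'
for _ in takewhile(lambda row: row != last_seen, rows):
    pass  # discard rows that were processed on an earlier pass
new_rows = list(rows)  # whatever is left is new
print(new_rows)
```

The second half is why the skipping trick only works on an iterator such as a csv.reader: the rows consumed by takewhile() are gone, so the following loop starts right after the last entry that was already stored.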
I have a list of blog posts with two columns: the date each post was created and the unique ID of the person who created it.
I want to return the date of the most recent blog post for each unique ID. Simple, but all of the date values are stored as strings, and none of them has a leading 0 when the month is less than 10.
I've been struggling with strftime and strptime but can't get them to work.
import csv
Posters = {}
with open('datetouched.csv','rU') as f:
    reader = csv.reader(f)
    for i in reader:
        UID = i[0]
        Date = i[1]
        if UID in Posters:
            Posters[UID].append(Date)
        else:
            Posters[UID] = [Date]
for i in Posters:
    print i, max(Posters[i]), Posters[i]
This returns the following output
0014000000s5NoEAAU 7/1/10 ['1/6/14', '7/1/10', '1/18/14', '1/24/14', '7/1/10', '2/5/14']
0014000000s5XtPAAU 2/3/14 ['1/4/14', '1/10/14', '1/16/14', '1/22/14', '1/28/14', '2/3/14']
0014000000vHZp7AAG 2/1/14 ['1/2/14', '1/8/14', '1/14/14', '1/20/14', '1/26/14', '2/1/14']
0014000000wnPK6AAM 2/2/14 ['1/3/14', '1/9/14', '1/15/14', '1/21/14', '1/27/14', '2/2/14']
0014000000d5YWeAAM 2/4/14 ['1/5/14', '1/11/14', '1/17/14', '1/23/14', '1/29/14', '2/4/14']
0014000000s5VGWAA2 7/1/10 ['7/1/10', '1/7/14', '1/13/14', '1/19/14', '7/1/10', '1/31/14']
It's returning 7/1/10 because the values are compared as strings, and '7' sorts after '1' and '2'. I need the max value of the list returned as the exact same string value.
I'd convert the date to a datetime when loading, and store the results in a defaultdict, eg:
import csv
from collections import defaultdict
from datetime import datetime
posters = defaultdict(list)
with open('datetouched.csv','rU') as fin:
    csvin = csv.reader(fin)
    items = ((row[0], datetime.strptime(row[1], '%m/%d/%y')) for row in csvin)
    for uid, dt in items:
        posters[uid].append(dt)
for uid, dates in posters.iteritems():
    # print uid, list of datetime objects, and max date in same format as input
    print uid, dates, '{0.month}/{0.day}/{0:%y}'.format(max(dates))
Parse the dates with datetime.datetime.strptime(), either when loading the CSV or as a key function to max().
While loading:
from datetime import datetime
Date = datetime.strptime(i[1], '%m/%d/%y')
or when using max():
print i, max(Posters[i], key=lambda d: datetime.strptime(d, '%m/%d/%y')), Posters[i]
Demo of the latter:
>>> from datetime import datetime
>>> dates = ['1/6/14', '7/1/10', '1/18/14', '1/24/14', '7/1/10', '2/5/14']
>>> max(dates, key=lambda d: datetime.strptime(d, '%m/%d/%y'))
'2/5/14'
Your code can be optimised a little:
import csv
from datetime import datetime
posters = {}
with open('datetouched.csv','rb') as f:
    reader = csv.reader(f)
    for row in reader:
        uid, date = row[:2]
        # the dates are month/day/year, e.g. '1/18/14'
        posters.setdefault(uid, []).append(datetime.strptime(date, '%m/%d/%y'))
for uid, dates in posters.iteritems():
    print uid, max(dates), dates
The dict.setdefault() method sets a default value (an empty list here) whenever the key is not present yet.
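For anyone on Python 3, where print is a function and iteritems() no longer exists, the same idea might look like the sketch below. It keeps the original strings and only parses them for comparison, so the result is returned in exactly the input format. The sample rows and the name latest_post_per_user are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

def parse_date(date_string):
    """Parse 'M/D/YY' strings; leading zeros are optional for strptime."""
    return datetime.strptime(date_string, '%m/%d/%y')

def latest_post_per_user(rows):
    """Map each user ID to its most recent date, kept as the original string."""
    dates_by_user = defaultdict(list)
    for uid, date_string in rows:
        dates_by_user[uid].append(date_string)
    # Compare by parsed date, but return the unmodified string.
    return {uid: max(dates, key=parse_date)
            for uid, dates in dates_by_user.items()}

# Made-up sample data in the same two-column layout as the question.
sample_csv = "A1,1/6/14\nA1,7/1/10\nA1,2/5/14\nB2,1/4/14\nB2,2/3/14\n"
latest = latest_post_per_user(csv.reader(io.StringIO(sample_csv)))
print(latest)
```

With a real file, open it with open('datetouched.csv', newline='') and pass csv.reader(f) in place of the StringIO reader.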