I am trying to read date stamps in from a data logger and use these dates in plots. I have been playing with matplotlib dates date2num, datestr2num, and datetime but I keep getting formatting errors and am having trouble finding what the correct syntax and keywords are to do this (and also what they mean). I have been reading through the matplotlib help with not much luck. If you have any help or a better way to read in this information I would love the feedback.
import numpy as n
import matplotlib.pyplot as p
import matplotlib.dates as d
import datetime as dt
fileobj=open("filename",'r')
data=fileobj.readlines()
fileobj.close()
time=n.empty(len(data))
for i in range(len(data)):
    strings=data[i].split(',')
    if i >5:
        some_time_dt = dt.datetime.strptime(str(strings[0]), '%Y-%m-%d %H:%M:%S')
        time = d.date2num(some_time_dt)
Example data:
"2013-02-28 16:53:30",1588,11.85,24.35,22.93,24.1,25.05,22.06,22.2,30.94,21.99,22.7,21.91,22.02,21.79,21.72
"2013-02-28 16:53:31",1589,11.85,24.35,23,24.12,25.05,22.09,22.25,31.19,21.97,22.71,21.91,22.02,21.78,21.72
"2013-02-28 16:53:32",1590,11.85,24.35,22.98,24.12,25.05,22.12,22.3,31.35,21.98,22.68,21.9,22.01,21.74,21.69
"2013-02-28 16:53:33",1591,11.85,24.35,22.95,24.14,25.06,22.15,22.33,31.49,21.96,22.67,21.87,22,21.73,21.66
March 20,2013
I was able to get this to plot but I need to know how to get rid of the UTC label that prints as the time is not in UTC but in PST. I would prefer to just not show a timezone at all.
A simple solution would be to parse the file twice, once for the dates and once for the data:
import numpy as np
import datetime as dt
D = np.loadtxt("filename",delimiter=",",usecols=[0],dtype="str")
Z = np.loadtxt("filename",delimiter=",",usecols=range(1,10))
DATES = [dt.datetime.strptime(d,'"%Y-%m-%d %H:%M:%S"') for d in D]
You could also use the converters argument to pass a lambda function to loadtxt() so that it does the string to datetime object conversion for you. It doesn't save you any lines of code, I'm just noting it for a bit of variety:
datey = lambda x: dt.datetime.strptime(x,'"%Y-%m-%d %H:%M:%S"')
D = np.loadtxt("filename",delimiter=",",usecols=[0],
               dtype=dt.datetime,converters={0:datey})
Z = np.loadtxt("filename",delimiter=",",usecols=range(1,10))
It sounds like you are having errors with parsing the dates. There may be a problem with the date format string you are using.
There is a table with all the options available for datetime parsing here:
http://docs.python.org/2/library/datetime.html#strftime-strptime-behavior
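As a small illustration of how strictly the format string has to match the input, here is a sketch that parses one timestamp from the example data. The variable names are made up for illustration. The point is that the double quotes are part of the string read from the file, so they either have to be stripped or be included in the format.

```python
import datetime as dt

raw = '"2013-02-28 16:53:30"'  # first field exactly as it appears in the file
fmt = '%Y-%m-%d %H:%M:%S'

try:
    dt.datetime.strptime(raw, fmt)
    quoted_parses = True
except ValueError:
    # The leading '"' does not match %Y, so parsing fails.
    quoted_parses = False

# Option 1: strip the quotes before parsing.
stripped = dt.datetime.strptime(raw.strip('"'), fmt)

# Option 2: put the literal quotes into the format string.
quoted = dt.datetime.strptime(raw, '"' + fmt + '"')

print(quoted_parses, stripped, stripped == quoted)
```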
I normally do this kind of thing with pandas (see my comment above) but here's a rough solution using Python's built-in CSV module.
import csv
import datetime
data = []
for row in csv.reader(open('file.txt')):
    row[0] = datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')  # Parse date.
    row[1:] = map(float, row[1:])  # Convert data from strings to floats.
    data.append(row)
There are fancier ways, but this is a straightforward approach.
Running this on the data above, I get
[[datetime.datetime(2013, 2, 28, 16, 53, 30), 1588.0, 11.85...], ...]
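For plotting it is often handier to keep the timestamps and the readings in separate lists. The following is a minimal sketch of that variation, assuming the same column layout as the example data. The names times and values are hypothetical, the rows are shortened to three readings, and io.StringIO stands in for the real file.

```python
import csv
import datetime as dt
import io

# Two shortened rows in the shape of the example data; io.StringIO replaces the file.
sample = io.StringIO(
    '"2013-02-28 16:53:30",1588,11.85,24.35\n'
    '"2013-02-28 16:53:31",1589,11.85,24.35\n'
)

times = []
values = []
for row in csv.reader(sample):
    # csv.reader removes the surrounding double quotes from the first field.
    times.append(dt.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S'))
    values.append([float(x) for x in row[1:]])

print(times[0], values[0])
```

A list of datetime objects like times can usually be passed straight to matplotlib's plotting functions as the x values.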
You need to strip the " from your time string!
If there's only one time per line, I would change the for loop to something like this:
# dt and d are the datetime and matplotlib.dates imports from the question
times = []  # collect the results; avoid reusing the name "time"
lineNumber = 0
for line in data:
    lineNumber += 1
    if lineNumber <= 5:
        continue  # skip the first 5 lines
    fields = line.split(',')
    timeString = fields[0].strip('"')
    print(timeString)
    # date2num expects a datetime object, so parse with datetime.strptime
    parsed = dt.datetime.strptime(timeString, '%Y-%m-%d %H:%M:%S')
    times.append(d.date2num(parsed))
Using a Python script, I need to read a CSV file where dates are formatted as DD/MM/YYYY, and convert them to YYYY-MM-DD before saving this into a SQLite database.
This almost works, but fails because I don't provide time:
from datetime import datetime
lastconnection = datetime.strptime("21/12/2008", "%Y-%m-%d")
#ValueError: time data did not match format: data=21/12/2008 fmt=%Y-%m-%d
print lastconnection
I assume there's a method in the datetime object to perform this conversion very easily, but I can't find an example of how to do it. Thank you.
Your example code is wrong. This works:
import datetime
datetime.datetime.strptime("21/12/2008", "%d/%m/%Y").strftime("%Y-%m-%d")
The call to strptime() parses the first argument according to the format specified in the second, so those two need to match. Then you can call strftime() to format the result into the desired final format.
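The parse-then-format pattern can be wrapped in a small helper. This is only a sketch: the function name reformat_date and its default formats are made up for illustration and are not part of the datetime module.

```python
from datetime import datetime

def reformat_date(text, in_fmt="%d/%m/%Y", out_fmt="%Y-%m-%d"):
    """Parse text using in_fmt, then return it formatted with out_fmt."""
    return datetime.strptime(text, in_fmt).strftime(out_fmt)

print(reformat_date("21/12/2008"))
# The same helper works in the other direction by swapping the formats.
print(reformat_date("2008-12-21", "%Y-%m-%d", "%d/%m/%Y"))
```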
You first need to parse the string into a datetime object, and then format that datetime object back into a string. It would go like this:
lastconnection = datetime.strptime("21/12/2008", "%d/%m/%Y").strftime('%Y-%m-%d')
I am new to programming. I wanted to convert from yyyy-mm-dd to dd/mm/yyyy to print out a date in the format that people in my part of the world use and recognise.
The accepted answer above got me on the right track.
The answer I ended up with to my problem is:
import datetime
today_date = datetime.date.today()
print(today_date)
new_today_date = today_date.strftime("%d/%m/%Y")
print (new_today_date)
The first two lines after the import statement give today's date in ISO 8601 format (2017-01-26). The last two lines convert this to the day-first format recognised in the UK and other countries (26/01/2017).
You can shorten this code, but I left it as is because it is helpful to me as a beginner. I hope this helps other beginner programmers starting out!
Does anyone else think it's a waste to convert these strings to date/time objects for what is, in the end, a simple text transformation? If you're certain the incoming dates will be valid, you can just use:
>>> ddmmyyyy = "21/12/2008"
>>> yyyymmdd = ddmmyyyy[6:] + "-" + ddmmyyyy[3:5] + "-" + ddmmyyyy[:2]
>>> yyyymmdd
'2008-12-21'
This will almost certainly be faster than the conversion to and from a date.
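The trade-off is validation. The sketch below, with hypothetical helper names by_slicing and by_parsing, feeds both approaches a date that does not exist: slicing happily rearranges it, while strptime() raises ValueError.

```python
from datetime import datetime

def by_slicing(ddmmyyyy):
    # Pure text shuffle: no validation at all.
    return ddmmyyyy[6:] + "-" + ddmmyyyy[3:5] + "-" + ddmmyyyy[:2]

def by_parsing(ddmmyyyy):
    # Raises ValueError if the text is not a real calendar date.
    return datetime.strptime(ddmmyyyy, "%d/%m/%Y").strftime("%Y-%m-%d")

bad = "31/02/2008"  # 31 February does not exist
sliced = by_slicing(bad)
try:
    by_parsing(bad)
    parse_failed = False
except ValueError:
    parse_failed = True

print(sliced, parse_failed)
```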
case_date = "03/31/2020"
# Above is the value stored in case_date, in the format mm/dd/yyyy
demo = case_date.split("/")
new_case_date = demo[1] + "-" + demo[0] + "-" + demo[2]
# The new format is dd-mm-yyyy; test by printing it
print(new_case_date)
If you need to convert an entire column (from pandas DataFrame), first convert it (pandas Series) to the datetime format using to_datetime and then use .dt.strftime:
def conv_dates_series(df, col, old_date_format, new_date_format):
    df[col] = pd.to_datetime(df[col], format=old_date_format).dt.strftime(new_date_format)
    return df
Sample usage:
import pandas as pd
test_df = pd.DataFrame({"Dates": ["1900-01-01", "1999-12-31"]})
old_date_format='%Y-%m-%d'
new_date_format='%d/%m/%Y'
conv_dates_series(test_df, "Dates", old_date_format, new_date_format)
Dates
0 01/01/1900
1 31/12/1999
The simplest way
While reading the CSV file, pass the parse_dates argument:
df = pd.read_csv("sample.csv", parse_dates=['column_name'])
This parses the named column into datetime values, which display in YYYY-MM-DD format. For DD/MM/YYYY input, also pass dayfirst=True so that day and month are not swapped.
To convert the date format from DD/MM/YYYY to YYYY-MM-DD, as in your question, you can use this:
from datetime import datetime
lastconnection = datetime.strptime("21/12/2008", "%d/%m/%Y").strftime("%Y-%m-%d")
print(lastconnection)
df is your data frame
Dateclm is the column that you want to change
This column should be in DateTime datatype.
df['Dateclm'] = pd.to_datetime(df['Dateclm'])
df.dtypes
#Here is the solution to change the format of the column
df["Dateclm"] = pd.to_datetime(df["Dateclm"]).dt.strftime('%Y-%m-%d')
print(df)
I'm doing some data analysis on a dataset (https://www.kaggle.com/sudalairajkumar/covid19-in-usa) and I'm trying to convert the date and time column (lastModified) to the proper datetime format. When I first tried it, it returned an error:
ValueError: hour must be in 0..23
so I tried doing this -
data_df[['date','time']] = data_df['lastModified'].str.split(expand=True)
data_df['lastModified'] = (pd.to_datetime(data_df.pop('date'), format='%d/%m/%Y') +
                           pd.to_timedelta(data_df.pop('time') + ':00'))
This gives an error: Columns must be same length as key
I understand this means that the two columns I'm splitting into aren't the same size. How do I resolve this issue? I'm relatively new to Python. Please explain in an easy-to-understand manner. Thanks very much.
This is my whole code-
import pandas as pd
dataset_url = 'https://www.kaggle.com/sudalairajkumar/covid19-in-usa'
import opendatasets as od
od.download(dataset_url)
data_dir = './covid19-in-usa'
import os
os.listdir(data_dir)
data_df = pd.read_csv('./covid19-in-usa/us_covid19_daily.csv')
data_df
data_df[['date','time']] = data_df['lastModified'].str.split(expand=True)
data_df['lastModified'] = (pd.to_datetime(data_df.pop('date'), format='%d/%m/%Y') +
                           pd.to_timedelta(data_df.pop('time') + ':00'))
It looks like lastModified is in ISO 8601 format. I have used something like the following to convert an ISO date string:
from dateutil import parser
from datetime import datetime
...
timestamp = parser.isoparse(lastModified).timestamp()
dt = datetime.fromtimestamp(timestamp)
...
On this line:
data_df[['date','time']] = data_df['lastModified'].str.split(expand=True)
In order to do this assignment, the number of columns on both sides of the = must be the same. split can output multiple columns, but it will only do this if it finds the character it's looking for to split on. By default, it splits by whitespace. There is no whitespace in the date column, and therefore it will not split. You can read the documentation for this here.
For that reason, this line should be like this, so it splits on the T:
data_df[['date','time']] = data_df['lastModified'].str.split('T', expand=True)
But the solution posted by @southiejoe is likely to be more reliable. These timestamps are in a standard format; parsing them is a previously-solved problem.
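The splitting rule is easy to check on a single string. The sketch below uses plain Python strings to stay dependency-free; the sample value is an assumption in the ISO shape discussed here, not a row copied from the dataset. The pandas .str.split used in the question applies the same default of splitting on whitespace.

```python
from datetime import datetime

value = "2020-12-06T00:00:00Z"  # assumed sample value, not taken from the dataset

no_sep = value.split()      # default: split on whitespace, so nothing is split
on_t = value.split("T")     # split on the literal 'T': date part and time part

# Parsing the whole string in one go avoids the split entirely.
parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
print(no_sep, on_t, parsed)
```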
You need these libraries:
# imports
from dateutil import parser
from datetime import datetime
Then try writing something similar to convert the date and time column. This way the columns should be the same length as the key.
# lastModified is a single ISO-format string taken from the column
# Parse it and get the POSIX timestamp (seconds since the epoch)
timestamp = parser.isoparse(lastModified).timestamp()
# Convert the timestamp back to a datetime object (in local time)
data = datetime.fromtimestamp(timestamp)
I have data of timedeltas which looks like this:
time_delta = '+414 00:45:41.004000'
So, these values are strings in the format ddd hh:mm:ss.f. I now want to convert these deltas to seconds. I tried to use .total_seconds() but it did not work.
How could I achieve what I am trying to do?
If you always assume the same input format, you can build a function as below (result to be checked with a simple case) :
import datetime as dt
def parseTimeDelta(time_delta_str):
    splitted = time_delta_str.split(' ')
    day_part = int(splitted[0][1:])
    time_part = dt.datetime.strptime(splitted[1], "%H:%M:%S.%f")
    delta = dt.timedelta(days=day_part, hours=time_part.hour, minutes=time_part.minute, seconds=time_part.second, microseconds=time_part.microsecond)
    return delta.total_seconds()
time_delta = '+414 00:45:41.004000'
parseTimeDelta(time_delta)
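You can do this with the pandas library:
import pandas as pd
# Create the Timedelta object
td = pd.Timedelta('3 days 06:05:01.000000111')
print(td)
# .seconds holds only the seconds component (less than one day); total_seconds() gives the full duration
print(td.total_seconds())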
Unfortunately we can't create a timedelta with a formatted string directly, but we can get a similar effect with regex then unpack parsed values into a timedelta.
import re
import datetime
# Create parser for your time format with named groups that match timedelta kwargs
time_parser = re.compile(r"\+(?P<days>\d+)\s+(?P<hours>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})\.(?P<microseconds>\d+)")
# Get the values from your example string
regex_match = time_parser.match("+414 00:45:41.004000")
time_dict = regex_match.groupdict()
# Convert the time values to integers from strings
timedelta_kwargs = {k: int(v) for k, v in time_dict.items()}
# Make a time delta object
delta = datetime.timedelta(**timedelta_kwargs)
# Get total seconds
delta_in_seconds = delta.total_seconds()
Organise that into some functions and you'll get the functionality you're looking for with standard python packages.
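One possible way to organise this into a function is sketched below. The name delta_to_seconds is hypothetical. It also makes two assumptions that go beyond the original example: a leading "-" negates the whole value, and the fractional part is right-padded to six digits so that ".5" means half a second rather than 5 microseconds.

```python
import datetime
import re

# Same idea as the pattern above, with the sign captured separately.
_DELTA_RE = re.compile(
    r"(?P<sign>[+-]?)(?P<days>\d+)\s+"
    r"(?P<hours>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})"
    r"\.(?P<fraction>\d{1,6})$"
)

def delta_to_seconds(text):
    """Return the total seconds in a 'ddd hh:mm:ss.f' string."""
    match = _DELTA_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognised timedelta string: %r" % text)
    parts = match.groupdict()
    delta = datetime.timedelta(
        days=int(parts["days"]),
        hours=int(parts["hours"]),
        minutes=int(parts["minutes"]),
        seconds=int(parts["seconds"]),
        # Right-pad the fraction so '.5' becomes 500000 microseconds.
        microseconds=int(parts["fraction"].ljust(6, "0")),
    )
    seconds = delta.total_seconds()
    # Assumption: a leading '-' negates the whole duration.
    return -seconds if parts["sign"] == "-" else seconds

print(delta_to_seconds("+414 00:45:41.004000"))
```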
I want to create a new column which contains seconds since 1970 for each row for the following input file:
timestamp, air_temp, rh, pressure, dir, spd
2016-11-30T00:00:00Z,-36.50,56.00,624.60,269.00,5.80
2016-11-30T01:00:00Z,-35.70,55.80,624.70,265.00,5.90
2016-11-30T02:00:00Z,-34.80,56.00,625.00,266.00,6.30
The first column represents the timestamp but it contains extra characters 'T' and 'Z'. My current code looks like this:
i = 0
ip_file.readline()
for line in ip_file:
    line = line.strip()
    year[i] = int(line[0:4])
    month[i] = int(line[5:7])
    day[i] = int(line[8:10])
    hour[i] = int(line[11:13])
    time[i] = (datetime(year[i],month[i],day[i],hour[i])-datetime(1970, 1, 1)).total_seconds()
    i += 1
This returns what I want, but it takes a long time if the input file is big. If the timestamp didn't have those extra characters, I would have used it directly instead of calculating year, month, day and hour. Is there a better way? Any thoughts would be appreciated.
Instead of slicing the string, why not split it on the comma and use the strptime method in the datetime module to convert the timestamp string to a datetime object?
Example:
import datetime
with open(path, "r") as infile:
    for i in infile.readlines()[1:]:
        dVal = i.strip().split(",")[0]
        print((datetime.datetime.strptime(dVal, '%Y-%m-%dT%H:%M:%SZ') - datetime.datetime(1970, 1, 1)).total_seconds())
Output:
1480464000.0
1480467600.0
1480471200.0
Input:
import datetime as dt
line = '2016-11-30T00:00:00Z,-36.50,56.00,624.60,269.00,5.80'
# We know the datetime data is always 20 characters long
line_dt_str = line[:20]
line_secs_since_epoch = dt.datetime.strptime(line_dt_str, '%Y-%m-%dT%H:%M:%SZ').timestamp()
print(line_secs_since_epoch)
Output:
1480482000.0
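Note that there is a difference between calling .timestamp() and subtracting your datetime from the 1970 epoch. On a naive datetime, .timestamp() interprets the value as local time (including any daylight saving offset), whereas the subtraction treats it as UTC. The output above is five hours later than 1480464000.0, which is consistent with a machine whose local time is UTC-5. Read more here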
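Since the trailing Z in these timestamps marks them as UTC, one way to make .timestamp() independent of the machine's timezone is to attach UTC explicitly before calling it. A minimal sketch using only the standard library:

```python
from datetime import datetime, timezone

stamp = "2016-11-30T00:00:00Z"
naive = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")

# Subtracting the epoch treats the naive value as UTC.
by_subtraction = (naive - datetime(1970, 1, 1)).total_seconds()

# Attaching UTC makes .timestamp() ignore the local timezone.
by_timestamp = naive.replace(tzinfo=timezone.utc).timestamp()

print(by_subtraction, by_timestamp)
```

With the timezone attached, both approaches agree regardless of where the script runs.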
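You can achieve this by first splitting each line of the file on "," and parsing the first field into a datetime object:
>>> from datetime import datetime
>>> line = '2016-11-30T00:00:00Z,-36.50,56.00,624.60,269.00,5.80'
>>> t = datetime.strptime(line.split(',')[0], '%Y-%m-%dT%H:%M:%SZ')
To convert to seconds you can use the following. Note that "%s" is a platform-specific extension that is not available on every system, and it interprets t as local time, so the result depends on the machine's timezone:
>>> int(t.strftime("%s"))
1480435200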
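A portable alternative that stays in the standard library is calendar.timegm(), which interprets a time tuple as UTC and returns whole seconds. This is a sketch of a swapped-in technique, not the method used above:

```python
import calendar
from datetime import datetime

line = "2016-11-30T00:00:00Z,-36.50,56.00,624.60,269.00,5.80"
t = datetime.strptime(line.split(",")[0], "%Y-%m-%dT%H:%M:%SZ")

# timegm() treats the time tuple as UTC, so the local timezone does not matter.
seconds = calendar.timegm(t.timetuple())
print(seconds)
```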