In Python:
Got a .dat file one column is a datestr 'yyyy-mm-dd'. Column years range from 2000, to 2010 I only want to use 2005.
How can I successfully read using np.loadtxt, keeping in the same format.
I am then going to use:
time_string = yyyy-mm-dd
doy = int (time.strftime ("%j", time.strptime ( time_string, "%Y, %m, %d")))
to convert yyyy-mm-dd to day of year (1-365)
The question doesn't point any reason to the use of loadtxt from numpy, so you actually don't care about how your data is loaded. Said that, in this case you simply use dtype=object for loading it.
Suppose this is your .dat file, let us call it d1.dat:
1 2000-01-01 blah
2 2005-01-01 bleh
3 2006-02-03 blih
4 2008-03-04 bloh
5 2010-04-05 bluh
6 2005-03-12 blahr
Then (for example) to load it using numpy:
import numpy
data = numpy.loadtxt('d1.dat', usecols=[1,2], dtype=object)
Now you can apply your function to extract the day of the year from the first column in data:
for date, _ in data:
print time.strftime("%j", time.strptime(date, "%Y-%m-%d"))
from datetime import datetime
time_string = '2012-12-31'
dt = datetime.strptime(time_string, '%Y-%m-%d')
print dt.timetuple().tm_yday
366
Related
I have a dataframe containing different recorded times as string objects, such as 1:02:45, 51:11, 54:24.
I can't convert to time objects, this is the error I am getting:
"time data '49:49' does not match format '%H:%M:%S"
This is the code I am using:
df_plot2 = df[['year', 'gun_time', 'chip_time']]
df_plot2['gun_time'] = pd.to_datetime(df_plot2['gun_time'], format = '%H:%M:%S')
df_plot2['chip_time'] = pd.to_datetime(df_plot2['chip_time'], format = '%H:%M:%S')
Thanks in advance for your help!
you can create a common format in the time Series by checking string len and adding the hours as zero '00:' where there are only minutes and seconds. Then parse to datetime. Ex:
import pandas as pd
s = pd.Series(["1:02:45", "51:11", "54:24"])
m = s.str.len() <= 5
s.loc[m] = '00:' + s.loc[m]
dts = pd.to_datetime(s)
print(dts)
0 2021-12-01 01:02:45
1 2021-12-01 00:51:11
2 2021-12-01 00:54:24
dtype: datetime64[ns]
I believe it may be because for %H python expects to see 01, 02, 03 etc instead of 1, 2, 3. To use your specific example 1:02:45 may have to be in the 01:02:45 format for python to be able to convert it to a datetime variable with %H:%M:$S.
I have a column in a Pandas dataframe that contains the year and the week number (1 up to 52) in one string in this format: '2017_03' (meaning 3d week of year 2017).
I want to convert the column to datetime and I am using the pd.to_datetime() function. However I get an exception:
pd.to_datetime('2017_01',format = '%Y_%W')
ValueError: Cannot use '%W' or '%U' without day and year
On the other hand the strftime documentation mentions that:
I am not sure what I am doing wrong.
You need also define start day:
a = pd.to_datetime('2017_01_0',format = '%Y_%W_%w')
print (a)
2017-01-08 00:00:00
a = pd.to_datetime('2017_01_1',format = '%Y_%W_%w')
print (a)
2017-01-02 00:00:00
a = pd.to_datetime('2017_01_2',format = '%Y_%W_%w')
print (a)
2017-01-03 00:00:00
I have a data sheet in which issue_d is a date column having values stored in a format - 11-Dec. On clicking any cell of the column, date is coming as 12/11/2018.
But while reading the csv file, issue_d is getting imported as 11-Dec. Year is not getting imported.
How do I get the issue_d column in format- d/m/y?
Code i tried -
import pandas
data=pandas.read_csv('Project_data.csv')
print(data)
checking issue_d column: data['issue_d']
result :
0 11-Dec
1 11-Dec
2 11-Dec
expected:
0 11-Dec-2018
1 11-Dec-2018
2 11-Dec-201
You can use to_datetime with add year to column:
df['issue_d'] = pd.to_datetime(df['issue_d'] + '-2018')
print (df)
issue_d
0 2018-12-11
1 2018-12-11
2 2018-12-11
A more 'controllable' way of getting the data is to first get the datetime from the data frame as normal, and then convert it:
dt = dt.strftime('%Y-%m-%d')
In this case, you'd put %d in front. strftime is a great technique because it allows the most customization when converting a datetime variable, and I used it in my tutorial book - if you're a beginner to python algorithms, you should definitely check it out!
After you do this, you can splice out each individual month, day, and year, and then use
strftime("%B")
to get the string-name of the month (e.g. "February").
Good Luck!
I'm starting with python and pandas and matplotlib. I'm working with data with over million entries. I'm trying to change the date format. In CSV file date format is 23-JUN-11. I will like to use dates in future to plot amount of donation for each candidate. How to convert the date format to a readable format for pandas?
Here is the link to cut file 149 entries
My code:
%matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
First candidate
reader_bachmann = pd.read_csv('P00000001-ALL.csv' ,converters={'cand_id': lambda x: str(x)[1:]},parse_dates=True, squeeze=True, low_memory=False, nrows=411 )
date_frame = pd.DataFrame(reader_bachmann, columns = ['contb_receipt_dt'])
Data slice
s = date_frame.iloc[:,0]
date_slice = pd.Series([s])
date_strip = date_slice.str.replace('JUN','6')
Trying to convert to new date format
date = pd.to_datetime(s, format='%d%b%Y')
print(date_slice)
Here is the error message
ValueError: could not convert string to float: '05-JUL-11'
You need to use a different date format string:
format='%d-%b-%y'
Why?
The error message gives a clue as to what is wrong:
ValueError: could not convert string to float: '05-JUL-11'
The format string controls the conversion, and is currently:
format='%d%b%Y'
And the fields needed are:
%y - year without a century (range 00 to 99)
%b - abbreviated month name
%d - day of the month (01 to 31)
What is missing is the - that are separating the field in your data string, and the y for a two digit year instead of the current Y for a four digit year.
As an alternative you can use dateutil.parser to parse dates containing string directly, I have created a random dataframe for demo.
l = []
for i in range(100):
l.append('23-JUN-11')
B = pd.DataFrame({'Date':l})
Now, Let's import dateutil.parser and apply it on our date column
import dateutil.parser
B['Date2'] = B['Date'].apply(lambda x : dateutil.parser.parse(x))
B.head()
Out[106]:
Date Date2
0 23-JUN-11 2011-06-23
1 23-JUN-11 2011-06-23
2 23-JUN-11 2011-06-23
3 23-JUN-11 2011-06-23
4 23-JUN-11 2011-06-23
I have am trying to process data with a timestamp field. The timestamp looks like this:
'20151229180504511' (year, month, day, hour, minute, second, millisecond)
and is a python string. I am attempting to convert it to a python datetime object. Here is what I have tried (using pandas):
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda x:datetime.strptime(x,"%Y%b%d%H%M%S"))
# returns error time data '20151229180504511' does not match format '%Y%b%d%H%M%S'
So I add milliseconds:
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda x:datetime.strptime(x,"%Y%b%d%H%M%S%f"))
# also tried with .%f all result in a format error
So tried using the dateutil.parser:
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda s: dateutil.parser.parse(s).strftime(DateFormat))
# results in OverflowError: 'signed integer is greater than maximum'
Also tried converting these entries using the pandas function:
data['TIMESTAMP'] = pd.to_datetime(data['TIMESTAMP'], unit='ms', errors='coerce')
# coerce does not show entries as NaT
I've made sure that whitespace is gone. Converting to Strings, to integers and floats. No luck so far - pretty stuck.
Any ideas?
p.s. Background info: The data is generated in an Android app as a the java.util.Calendar class, then converted to a string in Java, written to a csv and then sent to the python server where I read it in using pandas read_csv.
Just try :
datetime.strptime(x,"%Y%m%d%H%M%S%f")
You miss this :
%b : Month as locale’s abbreviated name.
%m : Month as a zero-padded decimal number.
%b is for locale-based month name abbreviations like Jan, Feb, etc.
Use %m for 2-digit months:
In [36]: df = pd.DataFrame({'Timestamp':['20151229180504511','20151229180504511']})
In [37]: df
Out[37]:
Timestamp
0 20151229180504511
1 20151229180504511
In [38]: pd.to_datetime(df['Timestamp'], format='%Y%m%d%H%M%S%f')
Out[38]:
0 2015-12-29 18:05:04.511
1 2015-12-29 18:05:04.511
Name: Timestamp, dtype: datetime64[ns]