Datetime strftime formatting - python

I am converting the date column in my dataframe to a string so that the time element is dropped when I write it to Excel. For some reason I can't seem to write the date in the format I need, which is (10/2/2016). I can get it to appear in this format (10/02/2016), but two issues arise: I need the day to be one digit rather than two, and the output is not in date order (it seems to be sorted by month first, rather than by year, then month, then day).
Here is my code:
df8 = df.set_index('DATE').resample('W-WED').apply(pd.DataFrame.tail, n=1)
df8.index= df8.index.droplevel(0)
df8 = df8.reset_index('DATE', drop=False)
df8['DATE'] = pd.to_datetime(df8['DATE']).apply(lambda x:x.date().strftime('%m/%d/%Y'))
Sample data (this is what is showing with the above formatting)
DATE Distance (cm)
01/02/2013 206.85
01/04/2012 315.33
01/05/2011 219.46
01/06/2016 180.44
01/07/2015 168.55
01/08/2014 156.89

You can use the .day attribute instead of the %d directive. It is a plain integer, so formatting it drops the leading zero and gives the desired date strings, as shown:
pd.to_datetime(df8['DATE']).map(lambda x: '{}/{}/{}'.format(x.month, x.day, x.year))
Demo:
df = pd.DataFrame(dict(date=['2016/10/02', '2016/10/03',
'2016/10/04', '2016/10/05', '2016/10/06']))
>>> pd.to_datetime(df['date']).map(lambda x: '{}/{}/{}'.format(x.month, x.day, x.year))
0 10/2/2016
1 10/3/2016
2 10/4/2016
3 10/5/2016
4 10/6/2016
Name: date, dtype: object
EDIT based on sample data added:
In order to affect only the days and not the months, pad the month on the left with zeros using str.zfill with a width of 2, so that single-digit months are left-padded with 0 and double-digit months are left unchanged.
>>> pd.to_datetime(df['DATE']).map(lambda x: '{}/{}/{}'.format(str(x.month).zfill(2), x.day, x.year))
0 01/2/2013
1 01/4/2012
2 01/5/2011
3 01/6/2016
4 01/7/2015
5 01/8/2014
Name: DATE, dtype: object
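On the ordering problem from the question: strings such as 01/02/2013 are compared character by character, so the month is compared before the year. A minimal sketch, assuming the column can stay a datetime until the last step, sorts first and formats afterwards. The sample frame and the DATE_STR column name are made up for illustration.

```python
import pandas as pd

# Illustrative data only; the real frame comes from the resample step above.
df = pd.DataFrame({
    "DATE": ["2016-01-06", "2011-01-05", "2013-01-02"],
    "Distance (cm)": [180.44, 219.46, 206.85],
})

# Sort while the column is still datetime64, so the order is chronological.
df["DATE"] = pd.to_datetime(df["DATE"])
df = df.sort_values("DATE").reset_index(drop=True)

# Build the display string only afterwards: padded month, unpadded day.
df["DATE_STR"] = df["DATE"].map(lambda x: f"{x.month:02d}/{x.day}/{x.year}")
print(df["DATE_STR"].tolist())
```

Keeping the original datetime column alongside the string means the frame can still be sorted or resampled later.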

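From here, you can remove the zero padding by adding a - flag to the directive:
Code   Meaning                                          Example
%m     Month as a zero-padded decimal number.           09
%-m    Month as a decimal number. (Platform specific)   9
So use %-m instead of %m. The same flag applies to the day: use %-d instead of %d (also platform specific).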
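Because the - flag is platform specific (it is commonly reported not to work with the Windows C runtime), a portable alternative is to format the integer attributes directly. This is only a sketch; format_mdy is a hypothetical helper name, not part of any library.

```python
from datetime import date


def format_mdy(d: date) -> str:
    """Return M/D/YYYY without zero padding, avoiding %-m and %-d entirely."""
    return f"{d.month}/{d.day}/{d.year}"


print(format_mdy(date(2016, 10, 2)))  # 10/2/2016
print(format_mdy(date(2013, 1, 2)))   # 1/2/2013
```

Since pandas Timestamp objects expose the same month, day and year attributes, the helper can also be passed to Series.map.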

Related

Combining Year and DayOfYear, H:M:S columns into date time object

I have a time column with the format XXXHHMMSS where XXX is the Day of Year. I also have a year column. I want to merge both these columns into one date time object.
Before I had detached XXX into a new column but this was making it more complicated.
I've converted the two columns to strings
points['UTC_TIME'] = points['UTC_TIME'].astype(str)
points['YEAR_'] = points['YEAR_'].astype(str)
Then I have the following line:
points['Time'] = pd.to_datetime(points['YEAR_'] * 1000 + points['UTC_TIME'], format='%Y%j%H%M%S')
I'm getting the following error: ValueError: time data '137084552' does not match format '%Y%j%H%M%S' (match)
Here is a photo of my columns and a link to the data
This works fine for me if you combine both columns as strings, for example:
import pandas as pd
df = pd.DataFrame({'YEAR_': [2002, 2002, 2002],
'UTC_TIME': [99082552, 135082552, 146221012]})
pd.to_datetime(df['YEAR_'].astype(str) + df['UTC_TIME'].astype(str).str.zfill(9),
format="%Y%j%H%M%S")
# 0 2002-04-09 08:25:52
# 1 2002-05-15 08:25:52
# 2 2002-05-26 22:10:12
# dtype: datetime64[ns]
Note, since %j expects zero-padded day of year, you might need to zero-fill, see first row in the example above.
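The zero-fill point can also be checked for a single value with the standard library alone. A small sketch, assuming the year and the DDDHHMMSS value arrive as integers; parse_year_doy is a hypothetical helper name.

```python
from datetime import datetime


def parse_year_doy(year: int, utc_time: int) -> datetime:
    """Combine a year with a DDDHHMMSS integer (DDD = day of year)."""
    # Pad to 9 digits so an 8-digit value like 99082552 becomes 099082552.
    stamp = f"{year}{utc_time:09d}"
    return datetime.strptime(stamp, "%Y%j%H%M%S")


print(parse_year_doy(2002, 99082552))   # 2002-04-09 08:25:52
print(parse_year_doy(2002, 146221012))  # 2002-05-26 22:10:12
```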

Converting a string containing year and week number to datetime in Pandas

I have a column in a Pandas dataframe that contains the year and the week number (1 up to 52) in one string in this format: '2017_03' (meaning the 3rd week of year 2017).
I want to convert the column to datetime and I am using the pd.to_datetime() function. However I get an exception:
pd.to_datetime('2017_01',format = '%Y_%W')
ValueError: Cannot use '%W' or '%U' without day and year
On the other hand the strftime documentation mentions that:
I am not sure what I am doing wrong.
You also need to define the start day:
a = pd.to_datetime('2017_01_0',format = '%Y_%W_%w')
print (a)
2017-01-08 00:00:00
a = pd.to_datetime('2017_01_1',format = '%Y_%W_%w')
print (a)
2017-01-02 00:00:00
a = pd.to_datetime('2017_01_2',format = '%Y_%W_%w')
print (a)
2017-01-03 00:00:00
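Note that %W counts weeks from the first Monday of the year, which is not always the same as ISO 8601 week numbering. The sketch below, using only the standard library, shows both readings of a 'YYYY_WW' string. The two helper names are hypothetical, and date.fromisocalendar requires Python 3.8 or later.

```python
from datetime import date, datetime


def week_string_to_monday(value: str) -> date:
    """Monday of week WW, where week 1 starts on the first Monday (%W rules)."""
    # Append "_1" so the parser has a weekday to work with (1 = Monday for %w).
    return datetime.strptime(value + "_1", "%Y_%W_%w").date()


def iso_week_string_to_monday(value: str) -> date:
    """Monday of ISO week WW, where week 1 contains the first Thursday."""
    year, week = (int(part) for part in value.split("_"))
    return date.fromisocalendar(year, week, 1)


print(week_string_to_monday("2017_01"))      # 2017-01-02
print(iso_week_string_to_monday("2017_01"))  # 2017-01-02
print(week_string_to_monday("2019_01"))      # 2019-01-07
print(iso_week_string_to_monday("2019_01"))  # 2018-12-31
```

Which of the two is right depends on how the week numbers in the column were produced in the first place.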

Get week numbers for different dates coming in the given format python

I am getting a list of dates in the format
Date
20180223
20180120
20180201
I want to get the week numbers for these in a new column
Date Week_num
20180223 8
20180120 3
20180210 6
The code being used to get the date here is :
yyyymmdd= (dt.datetime.today()-timedelta(days=1)+timedelta(hours=5.3)).strftime('%Y%m%d')
I need help with getting the weeks for the same.
You'll have to convert your date back to a datetime.date() object:
>>> mydate = datetime.datetime.strptime("20180223", "%Y%m%d").date()
>>> mydate
datetime.date(2018, 2, 23)
A datetime.date object has an isocalendar() method, which returns the ISO year, ISO week number and ISO weekday:
>>> mydate.isocalendar()
(2018, 8, 5)
As you can see, the second entry of the tuple is the week number you are looking for.
>>> mydate.isocalendar()[1]
8
You can create a new week column using .apply() and a lambda to apply this to every value in a column (this assumes the column already holds date or datetime objects).
df['week'] = df['existing_date_col'].apply(lambda x: x.isocalendar()[1])
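Putting the parsing and the week lookup together for the sample dates in the question, here is a standard-library sketch. The week_number helper is a hypothetical name, and the values are assumed to be 'YYYYMMDD' strings.

```python
from datetime import datetime


def week_number(yyyymmdd: str) -> int:
    """Return the ISO week number for a date given as a 'YYYYMMDD' string."""
    return datetime.strptime(yyyymmdd, "%Y%m%d").date().isocalendar()[1]


dates = ["20180223", "20180120", "20180210"]
weeks = [week_number(d) for d in dates]
print(weeks)  # [8, 3, 6]
```

If the column is stored as integers, converting each value with str() first keeps the same format string usable.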

Converting days since epoch to date

How can one convert a serial date number, representing the number of days since epoch (1970), to the corresponding date string? I have seen multiple posts showing how to go from string to date number, but I haven't been able to find any posts on how to do the reverse.
For example, 15951 corresponds to "2013-09-02".
>>> import datetime
>>> (datetime.datetime(2013, 9, 2) - datetime.datetime(1970,1,1)).days + 1
15951
(The + 1 because whatever generated these date numbers followed the convention that Jan 1, 1970 = 1.)
TL;DR: Looking for something to do the following:
>>> serial_date_to_string(15951) # arg is number of days since 1970
"2013-09-02"
This is different from Python: Converting Epoch time into the datetime because I am starting with days since 1970. I am not sure whether you can just multiply by 86,400, because of leap seconds, etc.
Use the datetime package as follows:
import datetime
def serial_date_to_string(srl_no):
    new_date = datetime.datetime(1970,1,1,0,0) + datetime.timedelta(srl_no - 1)
    return new_date.strftime("%Y-%m-%d")
This is a function which returns the string as required.
So:
serial_date_to_string(15951)
Returns
>> "2013-09-02"
And for a Pandas Dataframe:
df["date"] = pd.to_datetime(df["date"], unit="d")
... assuming that the "date" column contains values like 18687 which is days from Unix Epoch of 1970-01-01 to 2021-03-01.
Also handles seconds and milliseconds since Unix Epoch, use unit="s" and unit="ms" respectively.
Also see my other answer with the exact reverse.
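The two conventions in this thread differ by one day: the question counts Jan 1, 1970 as day 1, while unit="d" counts it as day 0. A sketch that makes the convention an explicit parameter is shown below; the helper names and the first_day parameter are assumptions for illustration. Because it adds whole calendar days, leap seconds do not come into play.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def serial_to_date(serial: int, first_day: int = 1) -> date:
    """Convert a day count to a date; first_day is the number given to 1970-01-01."""
    return EPOCH + timedelta(days=serial - first_day)


def date_to_serial(d: date, first_day: int = 1) -> int:
    """Reverse conversion, using the same convention."""
    return (d - EPOCH).days + first_day


print(serial_to_date(15951).isoformat())               # 2013-09-02
print(serial_to_date(18687, first_day=0).isoformat())  # 2021-03-01
print(date_to_serial(date(2013, 9, 2)))                # 15951
```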

Python string of numbers to date

I am trying to process data with a timestamp field. The timestamp looks like this:
'20151229180504511' (year, month, day, hour, minute, second, millisecond)
and is a python string. I am attempting to convert it to a python datetime object. Here is what I have tried (using pandas):
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda x:datetime.strptime(x,"%Y%b%d%H%M%S"))
# returns error time data '20151229180504511' does not match format '%Y%b%d%H%M%S'
So I add milliseconds:
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda x:datetime.strptime(x,"%Y%b%d%H%M%S%f"))
# also tried with .%f all result in a format error
So tried using the dateutil.parser:
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda s: dateutil.parser.parse(s).strftime(DateFormat))
# results in OverflowError: 'signed integer is greater than maximum'
Also tried converting these entries using the pandas function:
data['TIMESTAMP'] = pd.to_datetime(data['TIMESTAMP'], unit='ms', errors='coerce')
# coerce does not show entries as NaT
I've made sure that whitespace is gone. Converting to Strings, to integers and floats. No luck so far - pretty stuck.
Any ideas?
p.s. Background info: The data is generated in an Android app as a java.util.Calendar object, then converted to a string in Java, written to a CSV and sent to the Python server, where I read it in using pandas read_csv.
Just try:
datetime.strptime(x,"%Y%m%d%H%M%S%f")
You mixed up these two directives:
%b : Month as locale’s abbreviated name.
%m : Month as a zero-padded decimal number.
%b is for locale-based month name abbreviations like Jan, Feb, etc.
Use %m for 2-digit months:
In [36]: df = pd.DataFrame({'Timestamp':['20151229180504511','20151229180504511']})
In [37]: df
Out[37]:
Timestamp
0 20151229180504511
1 20151229180504511
In [38]: pd.to_datetime(df['Timestamp'], format='%Y%m%d%H%M%S%f')
Out[38]:
0 2015-12-29 18:05:04.511
1 2015-12-29 18:05:04.511
Name: Timestamp, dtype: datetime64[ns]
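The same format string can be checked on a single value with the standard library before applying it to a whole column. This is only a quick sketch; the variable names are illustrative.

```python
from datetime import datetime

raw = "20151229180504511"  # year, month, day, hour, minute, second, millisecond

# %m reads the two-digit month; %f accepts one to six digits and treats
# them as a fraction of a second, so "511" becomes 511000 microseconds.
parsed = datetime.strptime(raw, "%Y%m%d%H%M%S%f")
print(parsed.isoformat(timespec="milliseconds"))  # 2015-12-29T18:05:04.511

# %b expects a month name such as "Dec", so the numeric month is rejected.
try:
    datetime.strptime(raw, "%Y%b%d%H%M%S%f")
    used_b_ok = True
except ValueError:
    used_b_ok = False
print(used_b_ok)  # False
```

This relies on every field being fixed width; a timestamp with an unpadded month or day would need a delimiter to parse unambiguously.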
