I need to convert a string that contains date information (e.g., November 3, 2020) into a date format (i.e., 11/03/2020).
I wrote
df['Date']=pd.to_datetime(df['Date']).map(lambda x: x.strftime('%m/%d/%y'))
where Date is
November 3, 2020
June 26, 2002
July 02, 2010
and many other dates, but I got the error ValueError: NaTType does not support strftime.
You can use pandas.Series.dt.strftime, which handles NaT values:
import pandas as pd
dates = ['November 3, 2020',
'June 26, 2002',
'July 02, 2010',
'NaT']
dates = pd.to_datetime(dates)
df = pd.DataFrame(dates, columns=['Date'])
# %Y (not %y) gives the four-digit year asked for in the question
df['Date'] = df['Date'].dt.strftime('%m/%d/%Y')
Output:
Date
0 11/03/2020
1 06/26/2002
2 07/02/2010
3 NaN
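The NaT values come from entries that are missing or could not be parsed, so it can help to find out which rows those are. A minimal sketch, assuming a hypothetical sample in which the last entry is not a date and assuming English month names; the names raw, parsed, bad_rows and formatted are illustrative only:

```python
import pandas as pd

# Hypothetical sample data; the last entry is deliberately not a date.
raw = pd.Series(['November 3, 2020', 'June 26, 2002', 'July 02, 2010', 'unknown'])

# An explicit format avoids guessing; errors='coerce' turns entries
# that do not match into NaT instead of raising.
parsed = pd.to_datetime(raw, format='%B %d, %Y', errors='coerce')

# The original strings that failed to parse, for inspection.
bad_rows = raw[parsed.isna()]

# .dt.strftime leaves missing values as NaN rather than raising.
formatted = parsed.dt.strftime('%m/%d/%Y')
print(bad_rows)
print(formatted)
```

Inspecting bad_rows first shows whether the unparseable entries are genuinely missing or just written in a different format.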
So I have a data frame in python. I want to make a new column that has solely the year from the column found here.
The column is not in datetime format or anything due to the country listed at the tail end and I've tried using split() like so:
df['new_column'] = df['column_name'].astype(str).split(",", 3)[2]
but apparently, that doesn't work on objects.
Again, the columns are listed like so:
October 1, 2020 (United States)
April 27, 2019 (Cameroon)
but are type object and not string.
It is primarily the differing lengths in the countries at the end that has kept me from pulling from index like so:
df['new_column'] = df['column_name'].astype(str).str[x:x]
Thank You!
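You can convert a column to datetime with pandas.to_datetime(). You can either:
pass the format explicitly, using strftime-style directives (see the pandas.to_datetime documentation); or
if you do not know the format, let pandas infer it. Older versions need infer_datetime_format=True for this; since pandas 2.0 that parameter is deprecated and inference is the default. Be careful with inference, though, because ambiguous dates may be parsed with the day and month in the wrong order.
After that, the year can be extracted as follows: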
# Create sample df:
df = pd.DataFrame({
'id': [1, 2],
'date': [
'April 27, 2019 (Cameroon)',
'October 1, 2020 (United States)'
]
})
# Remove country names
df['new_date'] = df['date'].apply(lambda x: str(x).split(' (')[0])
print(df)
Output:
id date new_date
0 1 April 27, 2019 (Cameroon) April 27, 2019
1 2 October 1, 2020 (United States) October 1, 2020
Then new_date can be converted to datetime:
# pandas >= 2.0 infers the format by default; older versions accept infer_datetime_format=True
df['new_date'] = pd.to_datetime(df['new_date'])
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 2 non-null int64
1 date 2 non-null object
2 new_date 2 non-null datetime64[ns]
Then we can extract the year from new_date:
df['year'] = df['new_date'].dt.year
Here is the final df:
id date new_date year
0 1 April 27, 2019 (Cameroon) 2019-04-27 2019
1 2 October 1, 2020 (United States) 2020-10-01 2020
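If only the year is needed, parsing the full date may be unnecessary. A minimal sketch, assuming every value contains exactly one standalone four-digit number (the year) and that no country name contains one; the regular expression is my own choice, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({
    'date': [
        'April 27, 2019 (Cameroon)',
        'October 1, 2020 (United States)',
    ]
})

# Capture the first standalone run of four digits.
# expand=False returns a Series instead of a one-column DataFrame.
df['year'] = df['date'].str.extract(r'\b(\d{4})\b', expand=False).astype(int)
print(df)
```

Note that astype(int) raises if a row has no match, which may be a useful check that the assumption holds.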
I have a DataFrame with a column containing seconds. I would like to convert that column to date and time and save the file with the converted column. The column looks like this, in seconds:
time
2384798300
1500353475
7006557825
1239779541
1237529231
I was able to do it, but only for a single number of seconds at a time, with the following code:
datetime.fromtimestamp(1238479969).strftime("%A, %B %d, %Y %I:%M:%S")
Output: 'Tuesday, March 31, 2009 06:12:49'
What I want is to convert the whole column. I tried this:
datetime.fromtimestamp(df['time']).strftime("%A, %B %d, %Y %I:%M:%S")
but it does not work. Any help on how to do this would be appreciated.
Use apply on the column:
In [200]: from datetime import datetime
In [203]: df['time'] = df['time'].apply(lambda x: datetime.fromtimestamp(x).strftime("%A, %B %d, %Y %I:%M:%S"))
In [204]: df
Out[204]:
time
0 Friday, July 28, 2045 01:28:20
1 Tuesday, July 18, 2017 10:21:15
2 Wednesday, January 11, 2192 03:33:45
3 Wednesday, April 15, 2009 12:42:21
4 Friday, March 20, 2009 11:37:11
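A vectorised alternative, as a minimal sketch: pd.to_datetime with unit='s' converts the whole column at once. Note that this treats the values as seconds since the Unix epoch in UTC, whereas datetime.fromtimestamp uses the machine's local time zone, so the results can differ from the output above by the local UTC offset. The column names time_utc and formatted are illustrative, and the day and month names assume an English locale.

```python
import pandas as pd

df = pd.DataFrame({'time': [2384798300, 1500353475, 7006557825,
                            1239779541, 1237529231]})

# Interpret the integers as seconds since 1970-01-01 00:00:00 UTC.
df['time_utc'] = pd.to_datetime(df['time'], unit='s')

# Format the whole column in one call; %p adds AM/PM, which the
# 12-hour %I needs to be unambiguous.
df['formatted'] = df['time_utc'].dt.strftime('%A, %B %d, %Y %I:%M:%S %p')
print(df)
```

Keeping the datetime64 column alongside the formatted strings means the data can still be sorted and filtered by date before saving.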
I am trying to convert dates of the following format:
2007-10-18 11:31:46 -0400 (Thu, 18 Oct 2007)
or
Thu, 18 Oct 2007 11:31:49 -0400
to the day of the week. The day is already present in the strings above, but how can I extract only the day of the week from them?
Is this too simple:
days = {
'mon': 'Monday',
'thu': 'Thursday',
# and the others
}
haystack = '2007-10-18 11:31:46 -0400 (Thu, 18 Oct 2007)'
for acronym, full_name in days.items():
    if acronym in haystack.lower():
        print(f"Found {full_name}")
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
date = "2007-10-18 11:31:46 -0400 (Thu, 18 Oct 2007)"
for d in days:
    if d in date:
        print(d)
You can use dateutil.parser to get a datetime object from a string:
from dateutil import parser
dt = parser.parse("2007-10-18 11:31:46 -0400")
# dt = datetime.datetime(2007, 10, 18, 11, 31, 46, tzinfo=tzoffset(None, -14400))
Then call datetime.weekday(). It returns an integer from 0 to 6, where 0 is Monday and 6 is Sunday, regardless of time zone or locale.
dt.weekday()
# 3, i.e. Thursday
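Both formats from the question can also be handled with the standard library alone. A minimal sketch, assuming the first format always carries its parenthesised suffix after a space and that an English locale is in effect for the day name; the variable names are illustrative:

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

# The second format is an RFC 2822 style date, which email.utils can parse.
dt1 = parsedate_to_datetime('Thu, 18 Oct 2007 11:31:49 -0400')

# For the first format, drop the parenthesised part and parse the rest
# with an explicit format; %z reads the -0400 offset.
raw = '2007-10-18 11:31:46 -0400 (Thu, 18 Oct 2007)'
dt2 = datetime.strptime(raw.split(' (')[0], '%Y-%m-%d %H:%M:%S %z')

# %A formats the full weekday name.
day1 = dt1.strftime('%A')
day2 = dt2.strftime('%A')
print(day1, day2)
```

Parsing the date itself, rather than trusting the day name embedded in the string, also guards against strings where the two disagree.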
One of my columns in a pandas dataframe has dates formatted like so:
Saturday, April 29th, 2017
How would I change this to a pandas readable date type so that I can sort by date?
(python 3)
Use to_datetime; see the example below:
import pandas as pd
df = pd.DataFrame({'date': ["Saturday, April 29th, 2017", "Wednesday, March 22nd, 2017"]})
print(df.head())
# conversion to pandas date time
df['date'] = pd.to_datetime(df['date'])
print(df.head())
# Sorting by Date
print("sorted by Date")
print(df.sort_values(['date']).head())
results in
date
0 Saturday, April 29th, 2017
1 Wednesday, March 22nd, 2017
date
0 2017-04-29
1 2017-03-22
sorted by Date
date
1 2017-03-22
0 2017-04-29
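If you would rather not rely on the parser guessing the format, one option is to strip the ordinal suffixes first and then pass an explicit format. A minimal sketch, assuming English day and month names; the regular expression and the parsed column name are my own additions:

```python
import pandas as pd

df = pd.DataFrame({'date': ["Saturday, April 29th, 2017",
                            "Wednesday, March 22nd, 2017"]})

# Remove st/nd/rd/th only when it directly follows a digit, so words
# such as "August" or "Saturday" are left alone.
cleaned = df['date'].str.replace(r'(?<=\d)(st|nd|rd|th)\b', '', regex=True)

# With the suffix gone, the strings match a fixed format.
df['parsed'] = pd.to_datetime(cleaned, format='%A, %B %d, %Y')

df = df.sort_values('parsed')
print(df)
```

An explicit format raises on rows that do not match, which surfaces malformed entries instead of silently misparsing them.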
My source data has a column including the date information but it is a string type.
Typical lines are like this:
04 13, 2013
07 1, 2012
I am trying to convert to a date format, so I used the pandas to_datetime function:
df['ReviewDate_formated'] = pd.to_datetime(df['ReviewDate'],format='%mm%d, %yyyy')
But I got this error message:
ValueError: time data '04 13, 2013' does not match format '%mm%d, %yyyy' (match)
My questions are:
How do I convert to a date format?
I also want to extract Month, Year and Day columns, because I need to do some month-over-month comparisons. The problem is that the length of the string varies.
Your format string is incorrect; you want '%m %d, %Y', where %m is the month number, %d the day of the month and %Y the four-digit year. The strftime/strptime reference lists the valid format directives:
In [30]:
import io
import pandas as pd
t="""ReviewDate
04 13, 2013
07 1, 2012"""
df = pd.read_csv(io.StringIO(t), sep=';')
df
Out[30]:
ReviewDate
0 04 13, 2013
1 07 1, 2012
In [31]:
pd.to_datetime(df['ReviewDate'], format='%m %d, %Y')
Out[31]:
0 2013-04-13
1 2012-07-01
Name: ReviewDate, dtype: datetime64[ns]
To answer the second part, once the dtype is a datetime64 then you can call the vectorised dt accessor methods to get just the day, month, and year portions:
In [33]:
df['Date'] = pd.to_datetime(df['ReviewDate'], format='%m %d, %Y')
df['day'],df['month'],df['year'] = df['Date'].dt.day, df['Date'].dt.month, df['Date'].dt.year
df
Out[33]:
ReviewDate Date day month year
0 04 13, 2013 2013-04-13 13 4 2013
1 07 1, 2012 2012-07-01 1 7 2012
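For the month-over-month comparison mentioned in the question, grouping by a monthly period is one possible approach. A minimal sketch with hypothetical review dates (not the asker's data), assuming the quantity being compared is simply the number of reviews per month:

```python
import pandas as pd

# Hypothetical sample: two reviews in April 2013, one in May 2013.
df = pd.DataFrame({'ReviewDate': ['04 13, 2013', '04 20, 2013', '05 1, 2013']})
df['Date'] = pd.to_datetime(df['ReviewDate'], format='%m %d, %Y')

# Group by calendar month (year and month together) and count rows.
monthly = df.groupby(df['Date'].dt.to_period('M')).size()

# Difference from the previous month; the first month has nothing to
# compare against, so it is NaN.
change = monthly.diff()
print(monthly)
print(change)
```

Grouping by period rather than by the month number alone keeps April 2013 and April 2012 in separate buckets.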