How to extract date AND hour from datetime in Python?

What's the best way to do this? I thought about extracting the two separately and then combining them, but that doesn't seem like the most efficient way:
df['date'] = df['datetime'].dt.date
df['hour'] = df['datetime'].dt.hour
df['dateAndHour'] = df['datetime'].dt.date.astype(str) + ' ' + df['datetime'].dt.hour.astype(str)

You can use strftime; the exact format string depends on how your date is stored and how you want to combine the date and hour:
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'date':[datetime.now()]})
df['date-hour'] = df.date.dt.strftime('%Y-%m-%d %H')
df
                        date      date-hour
0 2020-11-18 11:03:38.390393  2020-11-18 11

It depends on what you want to do with it, but one way is to use strftime to format the datetime column as %Y-%m-%d %H or similar:
>>> df
datetime
0 2020-01-01 12:15:00
1 2020-10-22 11:11:11
>>> df.datetime.dt.strftime("%Y-%m-%d %H")
0 2020-01-01 12
1 2020-10-22 11
Name: datetime, dtype: object
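Note that strftime returns strings (dtype object). If you would rather keep a real datetime for sorting, grouping or resampling, a minimal sketch is to truncate each value to the start of its hour with .dt.floor('h'). The column name datetime and the sample values follow the answer above; date_hour and date_hour_str are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {"datetime": pd.to_datetime(["2020-01-01 12:15:00", "2020-10-22 11:11:11"])}
)

# Truncate each value to the start of its hour; the result stays datetime64
df["date_hour"] = df["datetime"].dt.floor("h")

# If a text label is needed afterwards, format the truncated value
df["date_hour_str"] = df["date_hour"].dt.strftime("%Y-%m-%d %H")
print(df)
```

One floor call replaces the separate date and hour columns from the question, and the string is only produced at display time.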

Related

How to remove hours, minutes, seconds and UTC offset from a pandas date column? I'm running Streamlit with pandas

How do I remove T00:00:00+05:30 after the year, month and day values in pandas? I tried converting the column to datetime, but it still shows the same results. I'm using pandas in Streamlit. I tried the code below:
df['Date'] = pd.to_datetime(df['Date'])
The output is the same as below:
Date
2019-07-01T00:00:00+05:30
2019-07-01T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-05T00:00:00+05:30
Can anyone help me remove T00:00:00+05:30 from the rows above?
If I understand correctly, you want to keep only the date part.
Convert date strings to datetime
df = pd.DataFrame(
    columns=['date'],
    data=["2019-07-01T02:00:00+05:30", "2019-07-02T01:00:00+05:30"]
)
date
0 2019-07-01T02:00:00+05:30
1 2019-07-02T01:00:00+05:30
df['date'] = pd.to_datetime(df['date'])
date
0 2019-07-01 02:00:00+05:30
1 2019-07-02 01:00:00+05:30
Remove the timezone
df['date'] = df['date'].dt.tz_localize(None)
date
0 2019-07-01 02:00:00
1 2019-07-02 01:00:00
Keep the date only
df['date'] = df['date'].dt.date
0 2019-07-01
1 2019-07-02
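Here are the three steps above combined into one runnable sketch. The two sample strings are the ones from this answer, and the column is named date throughout.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2019-07-01T02:00:00+05:30", "2019-07-02T01:00:00+05:30"]})

# Parse the ISO 8601 strings; the shared +05:30 offset gives a tz-aware column
df["date"] = pd.to_datetime(df["date"])

# Drop the time zone but keep the local (+05:30) wall-clock time
df["date"] = df["date"].dt.tz_localize(None)

# Keep only the calendar date (datetime.date objects in an object-dtype column)
df["date"] = df["date"].dt.date
print(df)
```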
Don't bother with apply to Python dates or with string changes: the former leaves you with an object-dtype column and the latter is slow. Just floor to the day frequency with the library function (round would move any time after noon to the next day):
>>> pd.Series([pd.Timestamp('2000-01-05 12:01')]).dt.floor('D')
0 2000-01-05
dtype: datetime64[ns]
If you have a timezone-aware timestamp, drop the time zone while keeping the local wall-clock time, then floor. Converting to UTC first with tz_convert(None) would instead turn 2019-07-01 00:00+05:30 into 2019-06-30 18:30:
>>> pd.Series([pd.Timestamp('2019-07-01T00:00:00+05:30')]).dt.tz_localize(None).dt.floor('D')
0 2019-07-01
dtype: datetime64[ns]
If you want datetime.date objects instead of strings, you can parse each value with .apply (after pd.to_datetime, the .dt.date accessor gives the same result):
import pandas as pd
import datetime
df = pd.DataFrame(
{"date": [
"2019-07-01T00:00:00+05:30",
"2019-07-01T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-05T00:00:00+05:30"]})
df["date"] = df["date"].apply(lambda x: datetime.datetime.fromisoformat(x).date())
print(df)
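.apply calls Python once per row. A vectorized alternative is to parse the whole column with pd.to_datetime and then take .dt.date. All strings share the +05:30 offset, so the result is tz-aware and .dt.date returns the local calendar dates. The sample below is a shortened version of the question's data.

```python
import pandas as pd

df = pd.DataFrame(
    {"date": [
        "2019-07-01T00:00:00+05:30",
        "2019-07-02T00:00:00+05:30",
        "2019-07-05T00:00:00+05:30"]})

# Parse the column in one call, then keep the local calendar date
df["date"] = pd.to_datetime(df["date"]).dt.date
print(df)
```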

Adding a datetime column in pandas dataframe from minute values

I have a DataFrame with a time column holding minutes from 0 to 1439, i.e. the 1440 minutes of a day. I want to add a datetime column representing the day 2021-3-21, including hh and mm, like this: 1980-03-01 11:00. I tried the following code:
from datetime import datetime, timedelta
date = datetime.date(2021, 3, 21)
days = date - datetime.date(1900, 1, 1)
df['datetime'] = pd.to_datetime(df['time'],format='%H:%M:%S:%f') + pd.to_timedelta(days, unit='d')
But I get the error: descriptor 'date' requires a 'datetime.datetime' object but received a 'int'
Is there another way to solve this, or a way to fix this code?
>>df
time
0
1
2
3
..
1439
I want to convert these minutes to a format like 1980-03-01 11:00, using the date 2021-3-21 and turning the minutes into the hh:mm part. The DataFrame should look like this:
>df
datetime time
2021-3-21 00:00 0
2021-3-21 00:01 1
2021-3-21 00:02 2
...
How can I format my data in this way?
Try pd.to_timedelta instead to turn the minutes in time into durations, then add them to a Timestamp:
df['datetime'] = (
pd.Timestamp('2021-3-21') + pd.to_timedelta(df['time'], unit='m')
)
df.head():
time datetime
0 0 2021-03-21 00:00:00
1 1 2021-03-21 00:01:00
2 2 2021-03-21 00:02:00
3 3 2021-03-21 00:03:00
4 4 2021-03-21 00:04:00
Complete Working Example with Sample Data:
import numpy as np
import pandas as pd
df = pd.DataFrame({'time': np.arange(0, 1440)})
df['datetime'] = (
pd.Timestamp('2021-3-21') + pd.to_timedelta(df['time'], unit='m')
)
print(df)
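If the rows are exactly minutes 0 to 1439 in order, you could also generate the same timestamps directly with pd.date_range at one-minute frequency. This alternative only holds under that ordering assumption. The sketch below checks it against the timedelta approach and adds an optional string column in the "YYYY-MM-DD HH:MM" layout the question asks for. expected and datetime_str are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"time": np.arange(0, 1440)})

# Approach from the answer: base timestamp plus per-row minute offsets
df["datetime"] = pd.Timestamp("2021-3-21") + pd.to_timedelta(df["time"], unit="m")

# Alternative, valid only when rows are minutes 0..1439 in order
expected = pd.date_range("2021-03-21", periods=1440, freq="min")

# Optional text column in the "YYYY-MM-DD HH:MM" layout
df["datetime_str"] = df["datetime"].dt.strftime("%Y-%m-%d %H:%M")
print(df.tail(3))
```

The timedelta version stays correct even if rows are shuffled or missing, because each row's minute value drives its own offset.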

extract date only from pandas column

I have this column in pandas df:
'''
full_date
2020-12-02T08:11:30-0600
2020-12-02T02:11:50-0600
2020-12-03T08:56:29-0600
'''
I only need the date, hoping to have this column:
'''
date
2020-12-02
2020-12-02
2020-12-03
'''
I have tried to find a solution in previous questions, but still failed. If anyone can help, I would appreciate it a lot. Thanks.
In case your column is not a datetime type, you can convert it to that and then use the .dt accessor to get just the date:
>>> df["date"] = df["full_date"].pipe(pd.to_datetime, utc=True).dt.date
>>> print(df)
full_date date
0 2020-12-02T08:11:30-0600 2020-12-02
1 2020-12-02T02:11:50-0600 2020-12-02
2 2020-12-03T08:56:29-0600 2020-12-03
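utc=True converts each value to UTC before .dt.date is taken, so a late-evening local time can land on the next UTC day. The sketch below compares UTC dates, local dates (parsing without utc=True, which works here because all values share the -0600 offset) and plain string slicing. The second sample value, 2020-12-02T20:00:00-0600, is hypothetical and chosen to show the shift.

```python
import pandas as pd

# Second value is hypothetical: 20:00 at UTC-06:00 is 02:00 the next day in UTC
s = pd.Series(["2020-12-02T08:11:30-0600", "2020-12-02T20:00:00-0600"])

# Date after converting to UTC
utc_dates = pd.to_datetime(s, utc=True).dt.date

# Date in the original -0600 offset (same offset for every value)
local_dates = pd.to_datetime(s).dt.date

# String slicing, valid when every value starts with YYYY-MM-DD
sliced = s.str[:10]

print(utc_dates.tolist())
print(local_dates.tolist())
print(sliced.tolist())
```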
For a single value, you can also parse the string with dateutil, if that suits your case:
from dateutil.parser import parse
var = "2020-12-02T08:11:30-0600"
parseddate = parse(var).date()

Convert a number into a special datetime

Date1: 20061201
Date2: 01/12/2006
How can I use pandas in Python to convert Date1 into the Date2 (day/month/year) format? Thanks! Date1 and Date2 are two columns in CSV files.
Data:
In [151]: df
Out[151]:
Date
0 20061201
1 20170530
Option 1:
In [152]: pd.to_datetime(df.Date, format='%Y%m%d').dt.strftime('%d/%m/%Y')
Out[152]:
0 01/12/2006
1 30/05/2017
Name: Date, dtype: object
Option 2:
In [153]: df.Date.astype(str).str.replace(r'(\d{4})(\d{2})(\d{2})', r'\3/\2/\1', regex=True)
Out[153]:
0 01/12/2006
1 30/05/2017
Name: Date, dtype: object
If you're using pandas and want a timestamp object back
pd.to_datetime('20061201')
Timestamp('2006-12-01 00:00:00')
If you want a string back
str(pd.to_datetime('20061201').date())
'2006-12-01'
Assuming you have a dataframe df
df = pd.DataFrame(dict(Date1=['20161201']))
Then you can use the same techniques in vectorized form.
as timestamps
df.assign(Date2=pd.to_datetime(df.Date1))
Date1 Date2
0 20161201 2016-12-01
as strings
df.assign(Date2=pd.to_datetime(df.Date1).dt.date.astype(str))
Date1 Date2
0 20161201 2016-12-01
import datetime
A = datetime.datetime.strptime('20061201', '%Y%m%d')
A.strftime('%d/%m/%Y')  # day/month/year, i.e. '01/12/2006'
You can use apply with a lambda function here.
Suppose you have a dataset named df as below:
id  date1
0   20061201
2   20061202
You can use code like this (astype(str) makes the slicing work when date1 is read as integers):
df['date2'] = df['date1'].astype(str).apply(lambda x: x[6:] + '/' + x[4:6] + '/' + x[:4])
The result will be:
id  date1     date2
0   20061201  01/12/2006
2   20061202  02/12/2006
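The same slicing also works without a per-row lambda by using pandas' vectorized .str accessor. This minimal sketch uses the sample data from this answer and assumes date1 always holds 8-digit YYYYMMDD values.

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 2], "date1": [20061201, 20061202]})

# Work on strings; covers integer columns read from CSV
s = df["date1"].astype(str)

# Vectorized slicing: YYYYMMDD -> DD/MM/YYYY
df["date2"] = s.str[6:] + "/" + s.str[4:6] + "/" + s.str[:4]
print(df)
```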
The simplest way is probably using the date parsing provided by datetime:
from datetime import datetime
datetime.strptime(str(20061201), "%Y%m%d")
You can apply this transformation to all rows in your pandas dataframe/series using the following:
from datetime import datetime
def convert_date(d):
    # parse an integer like 20061201 as year, month, day
    return datetime.strptime(str(d), "%Y%m%d")
df['Date2'] = df.Date1.apply(convert_date)
This will add a Date2 column to your dataframe df, which is the datetime representation of the Date1 column.
You can then serialize the date again by using strftime:
def serialize_date(d):
    return d.strftime("%d/%m/%Y")
df['Date2'] = df.Date2.apply(serialize_date)
Alternatively, you can do it all with integer arithmetic:
def reformat_date(d):
    year = d // 10000
    month = d % 10000 // 100
    day = d % 100
    # zero-pad day and month so 20061201 becomes "01/12/2006"
    return "{day:02d}/{month:02d}/{year}".format(day=day, month=month, year=year)
df['Date2'] = df.Date1.apply(reformat_date)
This is quite a bit faster than using the parsing machinery provided by strptime.
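The integer arithmetic can also run on the whole column at once instead of through .apply. This minimal sketch assumes Date1 holds integers in YYYYMMDD form. The sample values come from the question.

```python
import pandas as pd

df = pd.DataFrame({"Date1": [20061201, 20170530]})

d = df["Date1"]
# Split YYYYMMDD into its parts with vectorized integer division and modulo
year, month, day = d // 10000, d % 10000 // 100, d % 100

# Zero-pad day and month, then join into DD/MM/YYYY
df["Date2"] = (
    day.astype(str).str.zfill(2)
    + "/"
    + month.astype(str).str.zfill(2)
    + "/"
    + year.astype(str)
)
print(df)
```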

convert to datetime64 format with to_datetime()

I'm trying to convert some date-time data with pandas.to_datetime(). It is not working, and the dtype of df['Time'] is object. What is wrong?
Please note that I have attached my time file.
My Code
import pandas as pd
import numpy as np
from datetime import datetime
f = open('time','r')
lines = f.readlines()
t = []
for line in lines:
    time = line.split()[1][-20:]
    time2 = time[:11] + ' ' + time[12:21]
    t.append(time2)
df = pd.DataFrame(t)
df.columns = ['Time']
df['Time'] = pd.to_datetime(df['Time'])
print(df['Time'])
Name: Time, Length: 16136, dtype: object
The file time contains some invalid data.
For example, line 8323 (shown as 8322 below) contains "5/Jul/2013::8:25:18 0530",
which differs from normal lines such as line 8322 (shown as 8321 below), "15/Jul/2013:18:25:18 +0530".
8321 "15/Jul/2013:18:25:18 +0530"
8322 "5/Jul/2013::8:25:18 0530"
For a normal line, time2 becomes 15/Jul/2013 18:25:18, but for the invalid line it becomes "5/Jul/2013::8:25:18:
15/Jul/2013 18:25:18
"5/Jul/2013::8:25:18
Because of this, the column cannot be fully parsed to datetime, so it is left with object dtype (holding strings rather than datetime64 values):
>>> pd.Series(pd.to_datetime(['15/Jul/2013 18:25:18', '15/Jul/2013 18:25:18']))
0 2013-07-15 18:25:18
1 2013-07-15 18:25:18
dtype: datetime64[ns]
>>> pd.Series(pd.to_datetime(['15/Jul/2013 18:25:18', '*5/Jul/2013 18:25:18']))
0 15/Jul/2013 18:25:18
1 *5/Jul/2013 18:25:18
dtype: object
If you take only the first 5 entries (which have the correct date format) from the file, you get what you expected:
...
df = pd.DataFrame(t[:5])
df.columns = ['Time']
df['Time'] = pd.to_datetime(df['Time'])
The above code yields:
0 2013-07-15 00:00:12
1 2013-07-15 00:00:18
2 2013-07-15 00:00:23
3 2013-07-15 00:00:27
4 2013-07-15 00:00:29
Name: Time, dtype: datetime64[ns]
UPDATE
Added a small example that shows why the dtype is object rather than datetime.
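In recent pandas versions, to_datetime raises by default on unparseable input instead of returning it unchanged. To find the bad lines rather than drop them blindly, you can pass an explicit format with errors='coerce', so failures become NaT. The sketch below assumes the "day/Mon/year hour:min:sec" layout produced by the slicing above. The two sample values are the normal and the malformed time2 strings from this answer.

```python
import pandas as pd

# A well-formed value and the malformed one described above
s = pd.Series(["15/Jul/2013 18:25:18", '"5/Jul/2013::8:25:18'])

# Explicit format; strings that don't match become NaT instead of failing
parsed = pd.to_datetime(s, format="%d/%b/%Y %H:%M:%S", errors="coerce")

# Original strings that failed to parse, for inspection
bad = s[parsed.isna()]
print(parsed.dtype)
print(bad)
```

The column now has datetime64 dtype, and bad lists exactly the lines to fix in the source file.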
