Rounding up the value to the nearest hour - python

This is my first time posting a question here; if I don't explain it clearly, please give me a chance to improve it. Thank you!
I have a dataset that contains dates and times like this:
TIME COL1 COL2 COL3 ...
2018/12/31 23:50:23 34 DC 23
2018/12/31 23:50:23 32 NC 23
2018/12/31 23:50:19 12 AL 33
2018/12/31 23:50:19 56 CA 23
2018/12/31 23:50:19 98 CA 33
I want to create a new column formatted like '2018-12-31 11:00:00 PM' instead of '2018/12/31 23:10:23', with the time rounded to the hour (for example, 17:40 rounded up to 6:00 PM).
I have tried .dt.strftime("%Y-%m-%d %H:%M:%S") to change the format, but I got stuck when trying to convert the time from 24h to 12h.
Name: TIME, Length: 3195450, dtype: datetime64[ns]
I found out the type of df['TIME'] is pandas.core.series.Series
Now I have no idea about how to continue. Please give me some ideas, hints or any instructions. Thank you very much!

From your example it seems you want to floor to the hour, instead of round? In any case, first make sure your TIME column is of datetime dtype.
df['TIME'] = pd.to_datetime(df['TIME'])
Now floor (or round) using the dt accessor and an offset alias:
df['newTIME'] = df['TIME'].dt.floor('H') # could use round instead of floor here
# df['newTIME']
# 0 2018-12-31 23:00:00
# 1 2018-12-31 23:00:00
# 2 2018-12-31 23:00:00
# 3 2018-12-31 23:00:00
# 4 2018-12-31 23:00:00
# Name: newTIME, dtype: datetime64[ns]
After that, you can format to a string in the desired format, again using the dt accessor to access properties of a datetime series:
df['timestring'] = df['newTIME'].dt.strftime("%Y-%m-%d %I:%M:%S %p")
# df['timestring']
# 0 2018-12-31 11:00:00 PM
# 1 2018-12-31 11:00:00 PM
# 2 2018-12-31 11:00:00 PM
# 3 2018-12-31 11:00:00 PM
# 4 2018-12-31 11:00:00 PM
# Name: timestring, dtype: object
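To compare the three options side by side, here is a minimal sketch on a few made-up timestamps (the sample values are assumptions, not taken from the asker's data). It uses the lowercase 'h' alias, which newer pandas versions prefer over 'H'.

```python
import pandas as pd

# Hypothetical sample values, chosen to show where floor/round/ceil differ
times = pd.Series(pd.to_datetime([
    "2018/12/31 23:50:23",
    "2018/12/31 17:40:00",
    "2018/12/31 17:20:00",
]))

floored = times.dt.floor("h")   # always down to the start of the hour
rounded = times.dt.round("h")   # to the nearest full hour
ceiled = times.dt.ceil("h")     # always up to the next full hour

# 12-hour clock with AM/PM, as requested in the question
fmt = "%Y-%m-%d %I:%M:%S %p"
print(floored.dt.strftime(fmt).tolist())
print(rounded.dt.strftime(fmt).tolist())
print(ceiled.dt.strftime(fmt).tolist())
```

Note that rounding or ceiling 23:50:23 rolls over to midnight of the next day, so the date part changes as well.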

Related

Inconsistency when parsing year-weeknum string to date

When parsing year-weeknum strings, I came across an inconsistency when comparing the results from %W and %U (docs):
What works:
from datetime import datetime
print("\nISO:") # for reference...
for i in range(1,8): # %u is 1-based
    print(datetime.strptime(f"2019-01-{i}", "%G-%V-%u"))
# ISO:
# 2018-12-31 00:00:00
# 2019-01-01 00:00:00
# 2019-01-02 00:00:00
# ...
# %U -> week start = Sun
# first Sunday 2019 was 2019-01-06
print("\n %U:")
for i in range(0,7):
    print(datetime.strptime(f"2019-01-{i}", "%Y-%U-%w"))
# %U:
# 2019-01-06 00:00:00
# 2019-01-07 00:00:00
# 2019-01-08 00:00:00
# ...
What is unexpected:
# %W -> week start = Mon
# first Monday 2019 was 2019-01-07
print("\n %W:")
for i in range(0,7):
    print(datetime.strptime(f"2019-01-{i}", "%Y-%W-%w"))
# %W:
# 2019-01-13 00:00:00 ## <-- ?! expected 2019-01-06
# 2019-01-07 00:00:00
# 2019-01-08 00:00:00
# 2019-01-09 00:00:00
# 2019-01-10 00:00:00
# 2019-01-11 00:00:00
# 2019-01-12 00:00:00
The date jumping from 2019-01-13 to 2019-01-07? What's going on here? I don't see any ambiguities in the calendar for 2019... I also tried to parse the same dates in rust with chrono, and it fails for the %W directive -> playground example. A jump backwards in Python and an error in Rust, what am I missing here?
That week goes from Monday January 7 to Sunday January 13.
%w is documented as "Weekday as a decimal number, where 0 is Sunday and 6 is Saturday.". So 0 means Sunday (= January 13), and 1 means Monday (= January 7).
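A quick way to confirm this is to parse all seven %w values for week 01 and look at which weekday each result falls on. This is just a small check using the standard library; the variable names are my own.

```python
from datetime import datetime

# Parse week 01 of 2019 with every %w value and record what comes back
parsed = {}
for w in range(7):
    d = datetime.strptime(f"2019-01-{w}", "%Y-%W-%w")
    parsed[w] = d.date()
    # isoweekday(): 1 = Monday ... 7 = Sunday
    print(w, d.date(), d.isoweekday())

# Chronological order of the %w values within a Monday-based week
order = sorted(parsed, key=parsed.get)
print(order)
```

With %W the week is Monday-based, so %w value 0 (Sunday) sorts last rather than first.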
In your code, the string "2019-01-0" is parsed with "%Y-%W-%w" as year 2019, week 01, weekday 0. That is a valid combination; it is just not the first day of that week. With %W, weeks start on Monday, so weekday 0 (Sunday) is the last day of week 01, which is why you get 2019-01-13.
If you want the dates in chronological order, iterate over the weekdays 1 to 6 first and put 0 last.
If the week number comes from a variable, a format spec such as {week:02d} (with week as a placeholder name) adds the leading zero when necessary.
Here is your modified code:
for i in [1, 2, 3, 4, 5, 6, 0]:
    print(datetime.strptime(f"2019-01-{i}", "%Y-%W-%w"))
# 2019-01-07 00:00:00
# 2019-01-08 00:00:00
# ...
# 2019-01-13 00:00:00

Convert a column to a specific time format which contains different types of time formats in python

This is my data frame
df = pd.DataFrame({
'Time': ['10:00PM', '15:45:00', '13:40:00AM','5:00']
})
Time
0 10:00PM
1 15:45:00
2 13:40:00AM
3 5:00
I need to convert the time format in a specific format which is my expected output, given below.
Time
0 22:00:00
1 15:45:00
2 01:40:00
3 05:00:00
I tried using split and endswith function of str which is a complicated solution. Is there any better way to achieve this?
Thanks in advance!
Here you go. One thing to mention, though: 13:40:00AM will result in an error, since a) 13 is not a valid hour with AM/PM, where hours only go from 1 to 12, and b) 13:40 would be PM, so it cannot be AM at the same time :)
Cheers
import pandas as pd
df = pd.DataFrame({'Time': ['10:00PM', '15:45:00', '01:40:00AM', '5:00']})
df['Time'] = pd.to_datetime(df['Time'])  # on pandas >= 2.0, mixed formats may require format='mixed'
print(df['Time'].dt.time)
<<< 22:00:00
<<< 15:45:00
<<< 01:40:00
<<< 05:00:00
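If the column really contains values that a single parser call does not accept, one alternative is to try a list of explicit formats per value. This is only a sketch using the standard library; the helper name parse_time and the list of candidate formats are my own assumptions, not something from the question.

```python
from datetime import datetime

# Candidate formats, tried in order (an assumption; extend as needed)
FORMATS = ["%I:%M:%S%p", "%I:%M%p", "%H:%M:%S", "%H:%M"]

def parse_time(text):
    """Return a datetime.time for the first matching format, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).time()
        except ValueError:
            continue
    return None

raw = ['10:00PM', '15:45:00', '13:40:00AM', '5:00']
for value in raw:
    print(value, '->', parse_time(value))
```

With a DataFrame this could be applied as df['Time'].map(parse_time). The invalid '13:40:00AM' comes back as None, so such rows can be inspected and corrected separately instead of raising an error.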

Python dataframe converting time date 'SylmiSeb' (2018-12-31 23:43:02+00:00) to datetime

I'm trying to convert a column with values like 2018-12-31 23:43:02+00:00 to 2018-12-31 by using pd.to_datetime. I got this dataset using the snscrape library (https://github.com/JustAnotherArchivist/snscrape).
However when I try this:
database_2018['date_created'] = pd.to_datetime(
    database_2018['date_created'],
    infer_datetime_format=True)
I get the following error: ParserError: Unknown string format: SylmiSeb
When I check the dtype of this column, it appears as object. Any ideas on how to solve this?
I also tried:
database_2018['date_created'] = pd.Timestamp(
    database_2018['date_created']).to_datetime()
But I get the following error:
TypeError: Cannot convert input [0 2018-12-31 23:43:02+00:00
1 2018-12-31 23:30:20+00:00
2 2018-12-31 23:30:00+00:00
3 2018-12-31 23:28:09+00:00
4 2018-12-31 23:28:08+00:00
...
105037 2018-01-01 00:29:18+00:00
105038 2018-01-01 00:25:04+00:00
105039 2018-01-01 00:10:03+00:00
105040 2018-01-01 00:03:28+00:00
105041 2018-01-01 00:00:44+00:00
Name: date_created, Length: 105042, dtype: object] of type <class 'pandas.core.series.Series'> to Timestamp
Thanks for the help !
Try:
database_2018['date_created'] = database_2018['date_created'].apply(
    lambda x: x[:x.rfind(':')] + x[x.rfind(':')+1:]  # drop the colon inside the UTC offset
)
database_2018['date_created'] = pd.to_datetime(
    database_2018['date_created'], format='%Y-%m-%d %H:%M:%S%z')
This is the format of your dates, where %z represents the UTC offset. For more information, see the datetime documentation. Before Python 3.7, %z required the UTC offset to be written without the colon character, so the first part of the code above removes that colon; from Python 3.7 on, strptime also accepts the form with a colon.
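A quick check, assuming Python 3.7 or newer, that strptime's %z accepts an offset written with a colon. The sample string is the first value from the question.

```python
from datetime import datetime

raw = "2018-12-31 23:43:02+00:00"

# %z parses the "+00:00" offset and yields a timezone-aware datetime
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S%z")
print(parsed)
print(parsed.date())  # only the calendar date
```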
IIUC, you are trying to extract only the date from a datetime column with a timezone.
Setup
from io import StringIO
import pandas as pd
d="""date_created
2018-12-31 23:30:20+00:00
2018-12-31 23:30:00+00:00
2018-12-31 23:28:09+00:00
2018-12-31 23:28:08+00:00"""
df=pd.read_csv(StringIO(d))
df
date_created
0 2018-12-31 23:30:20+00:00
1 2018-12-31 23:30:00+00:00
2 2018-12-31 23:28:09+00:00
3 2018-12-31 23:28:08+00:00
Code
Option 1
df['date_created'] = pd.to_datetime(df.date_created,errors='coerce').dt.date
df
Output
date_created
0 2018-12-31
1 2018-12-31
2 2018-12-31
3 2018-12-31
Option 2, if you just want to remove the timezone and keep the time
df['date_created'] = pd.to_datetime(df.date_created,errors='coerce').dt.tz_localize(None)
df
Output
date_created
0 2018-12-31 23:30:20
1 2018-12-31 23:30:00
2 2018-12-31 23:28:09
3 2018-12-31 23:28:08
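The ParserError in the question names a value ('SylmiSeb') that is not a date at all, so it may help to find such rows before converting. A minimal sketch, assuming a small made-up sample in place of the real column: errors='coerce' turns unparseable values into NaT, and those positions can then be looked up in the original data.

```python
import pandas as pd

# Hypothetical sample mimicking the question: one row is not a date at all
s = pd.Series([
    "2018-12-31 23:43:02+00:00",
    "SylmiSeb",
    "2018-12-31 23:30:20+00:00",
])

parsed = pd.to_datetime(s, errors="coerce", utc=True)  # bad rows become NaT
bad_rows = s[parsed.isna()]
print(bad_rows.tolist())

# Keep only the rows that parsed and reduce them to calendar dates
dates = parsed.dropna().dt.date
print(dates.tolist())
```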

Time difference in pandas (from string format to datetime)

I have the following column
Time
2:00
00:13
1:00
00:24
in object format (strings). Each value is a number of hours and minutes before a reference time that I need to use as a start: 8:00 (it might change; in this example it is 8:00).
Since the values in the Time column mean hours/minutes ago, the result I expect is
Time
6:00
07:47
7:00
07:36
calculated as time difference (e.g. 8:00 - 2:00).
However, I am having difficulty doing this calculation and transforming the result into a datetime (keeping only hours and minutes).
I hope you can help me.
Since the Time column contains only Hour:Minute, I suggest using timedelta instead of datetime:
df['Time'] = pd.to_timedelta(df.Time+':00')
df['Start_Time'] = pd.to_timedelta('8:00:00') - df['Time']
Output:
Time Start_Time
0 02:00:00 06:00:00
1 00:13:00 07:47:00
2 01:00:00 07:00:00
3 00:24:00 07:36:00
You can also do it using pd.to_datetime.
ref = pd.to_datetime('08:00') #here define the hour of reference
s = ref-pd.to_datetime(df['Time'])
print (s)
0 06:00:00
1 07:47:00
2 07:00:00
3 07:36:00
Name: Time, dtype: timedelta64[ns]
This returns a Series, which can be converted to a DataFrame with s.to_frame(), for example.
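Both answers leave the result as a timedelta with seconds. If only hours and minutes should be kept, as the question asks, the subtraction and formatting can be sketched with the standard library alone. The helper names parse_hm and format_hm are my own, and the sketch assumes the result is never negative.

```python
from datetime import timedelta

def parse_hm(text):
    """Turn 'H:MM' or 'HH:MM' into a timedelta."""
    hours, minutes = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))

def format_hm(delta):
    """Format a non-negative timedelta as 'HH:MM' (seconds are dropped)."""
    total_minutes = int(delta.total_seconds()) // 60
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"

start = parse_hm("8:00")                  # reference time from the question
ago = ["2:00", "00:13", "1:00", "00:24"]  # the Time column

result = [format_hm(start - parse_hm(t)) for t in ago]
print(result)
```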

How to add date and time information to time series data using python numpy or pandas

I am having trouble with date & time calculations in pandas.
I think there must be some functionality that generates timestamps automatically, starting from a specific date & time, but I couldn't find it.
I'd like to know how to add date & time info to time series data sampled at 1-second intervals.
Before:
10
12
13
..
20
21
19
18
After:
1013-01-01 00:00:00 10
1013-01-01 00:00:01 12
1013-01-01 00:00:02 13
..
1013-10-04 12:45:40 20
1013-10-04 12:45:41 21
1013-10-04 12:45:42 19
1013-10-04 12:45:43 18
Any help would be appreciated.
Thank you in advance.
The documentation gives a similar example at the beginning using date_range. If you have a Series object, you can make a DatetimeIndex starting at the appropriate time (I'm assuming 1013 was a typo for 2013), with a frequency of one second, and of the appropriate length:
>>> x = pd.Series(np.random.randint(8,24,23892344)) # make some random data
>>> when = pd.date_range(start=pd.Timestamp(2013,1,1),freq='S',periods=len(x))
>>> when
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-10-04 12:45:43]
Length: 23892344, Freq: S, Timezone: None
and then we can make a new series from the original data using this as the new index:
>>> x_with_time = pd.Series(x.values, index=when)
>>> x_with_time
2013-01-01 00:00:00 13
2013-01-01 00:00:01 14
2013-01-01 00:00:02 15
2013-01-01 00:00:03 22
2013-01-01 00:00:04 16
[...]
2013-10-04 12:45:41 21
2013-10-04 12:45:42 16
2013-10-04 12:45:43 15
Freq: S, Length: 23892344
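The end of the index follows directly from the length of the data: n samples at one-second spacing end n - 1 seconds after the start. A small standard-library check of the numbers shown above:

```python
from datetime import datetime, timedelta

start = datetime(2013, 1, 1)
n_samples = 23892344  # length of the series in the example above

# The last sample sits n_samples - 1 seconds after the first one
end = start + timedelta(seconds=n_samples - 1)
print(end)
```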
How about using Python's datetime module combined with timestamps?
from datetime import datetime
START_TIME = 1381069963.506736
secs = [10, 12, 13, 20, 21, 19, 18]
dates = [datetime.fromtimestamp(START_TIME + sec) for sec in secs]
After this, dates is a list of datetime objects, each equal to START_TIME + <given interval from time series data>.
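If the numbers are samples taken one second apart (as the question suggests) rather than offsets in seconds, a variant is to use each value's position as the offset. A sketch, assuming a start of 2013-01-01 00:00:00 and the first few values from the question:

```python
from datetime import datetime, timedelta

values = [10, 12, 13, 20, 21, 19, 18]  # measurements, one per second
start = datetime(2013, 1, 1)           # assumed start of the recording

# Pair each value with start + its position in seconds
stamped = [(start + timedelta(seconds=i), v) for i, v in enumerate(values)]

for when, value in stamped:
    print(when, value)
```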
