Convert date column formatted as xx:xx.x

I have come across a CSV file that contains a date column formatted as xx:xx.x. Here are a few of the values in the column marked as date:
07:33.0
34:53.0
06:30.0
30:09.0
02:18.0
What kind of format is this, and how can I convert it to a proper date format using Python?

These look like times without hours (minutes:seconds.fraction).
You can create timedeltas by prepending 0 hours and calling to_timedelta:
df['col'] = pd.to_timedelta('00:' + df['col'])
print (df)
col
0 0 days 00:07:33
1 0 days 00:34:53
2 0 days 00:06:30
3 0 days 00:30:09
4 0 days 00:02:18
Or convert to datetimes with to_datetime; a default date (1900-01-01) is added:
df['col'] = pd.to_datetime(df['col'], format='%M:%S.%f')
print (df)
col
0 1900-01-01 00:07:33
1 1900-01-01 00:34:53
2 1900-01-01 00:06:30
3 1900-01-01 00:30:09
4 1900-01-01 00:02:18
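As a self-contained sketch of the timedelta route, the sample values from the question can be rebuilt into a small frame (the column name col is only a placeholder, as in the answer) and turned into plain seconds, which is often easier to work with than a datetime carrying a dummy date:

```python
import pandas as pd

# Sample values copied from the question; "col" is a placeholder column name.
df = pd.DataFrame({"col": ["07:33.0", "34:53.0", "06:30.0", "30:09.0", "02:18.0"]})

# Prepend a zero hour so each value reads as HH:MM:SS.f, then parse as a duration.
as_timedelta = pd.to_timedelta("00:" + df["col"])

# Express each duration as a number of seconds.
total_seconds = as_timedelta.dt.total_seconds()
print(total_seconds.tolist())
```

This assumes every value really is minutes:seconds; if some rows hold hours:minutes instead, the prefix trick would misread them.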

Related

Desperately Need Advice on Converting Date Column

I have a dataset that has mixed data types in the Date column.
For example, the column looks like this:
ID Date
1 2019-01-01
2 2019-01-02
3 2019-11-01
4 40993
5 40577
6 39949
When I just try to convert the column using pd.to_datetime, I get an error message "mixed datetimes and integers in passed array".
I would really appreciate it if someone could help me out with this! Ideally, it would be nice to have all rows in 'yyyy-mm-dd' format.
Thank you!

I'm guessing those are Excel serial dates?
Convert Excel style date with pandas
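import pandas as pd
import xlrd
def read_date(date):
    # Whole numbers are treated as Excel serial dates (datemode 0 = 1900 date system)
    try:
        return xlrd.xldate.xldate_as_datetime(int(date), 0)
    except (TypeError, ValueError):
        # Anything that is not a whole number is parsed as an ordinary date string
        return pd.to_datetime(date)
df['New Date'] = df['Date'].apply(read_date)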
df
Out[1]:
ID Date New Date
0 1 2019-01-01 2019-01-01
1 2 2019-01-02 2019-01-02
2 3 2019-11-01 2019-11-01
3 4 40993 2012-03-25
4 5 40577 2011-02-03
5 6 39949 2009-05-16
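If installing xlrd is not an option, the same conversion can be sketched with pandas alone. This is an alternative to the answer above, not the answerer's method: it assumes the column holds either ISO date strings or whole-number Excel serials in the 1900 date system, and it uses 1899-12-30 as the origin, which is the usual choice for matching Excel's serial numbers.

```python
import pandas as pd

# Data rebuilt from the question; serials are stored as strings here for simplicity.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Date": ["2019-01-01", "2019-01-02", "2019-11-01", "40993", "40577", "39949"],
})

raw = df["Date"].astype(str)
# Rows made only of digits are assumed to be Excel serial day counts.
is_serial = raw.str.fullmatch(r"\d+")

# Serial days are counted from 1899-12-30 (assumption: 1900 date system).
serial_dates = pd.to_datetime(
    raw[is_serial].astype(int), unit="D", origin=pd.Timestamp("1899-12-30")
)
# The remaining rows are parsed as ordinary yyyy-mm-dd strings.
text_dates = pd.to_datetime(raw[~is_serial], format="%Y-%m-%d")

# Both pieces keep their original index, so the assignment lines up row by row.
df["New Date"] = pd.concat([serial_dates, text_dates]).sort_index()
print(df)
```

Working on the two subsets separately avoids a Python-level apply over every row, which may matter on larger files.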

Sort date in string format in a pandas dataframe?

I have a dataframe like this. How can I sort it?
df = pd.DataFrame({'Date':['Oct20','Nov19','Jan19','Sep20','Dec20']})
Date
0 Oct20
1 Nov19
2 Jan19
3 Sep20
4 Dec20
I am familiar with sorting a list of date strings:
a.sort(key=lambda date: datetime.strptime(date, "%d-%b-%y"))
Any thoughts? Should I split it?

First convert the column to datetimes and get the positions of the sorted values with Series.argsort, then use those positions to reorder the rows with DataFrame.iloc:
df = df.iloc[pd.to_datetime(df['Date'], format='%b%y').argsort()]
print (df)
Date
2 Jan19
1 Nov19
3 Sep20
0 Oct20
4 Dec20
Details:
print (pd.to_datetime(df['Date'], format='%b%y'))
0 2020-10-01
1 2019-11-01
2 2019-01-01
3 2020-09-01
4 2020-12-01
Name: Date, dtype: datetime64[ns]
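A variation on the same idea, assuming pandas 1.1 or newer, is to pass the conversion as the key argument of sort_values. The column keeps its original strings and only the sort order uses the parsed dates:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["Oct20", "Nov19", "Jan19", "Sep20", "Dec20"]})

# The key function receives the whole column and returns the values to sort by.
sorted_df = df.sort_values(
    "Date", key=lambda s: pd.to_datetime(s, format="%b%y")
)
print(sorted_df)
```

The month abbreviations are parsed according to the current locale, so this sketch assumes English month names.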

How to drop rows based on timestamp where hours are not in list

I have a large dataframe (several million rows) where one of my columns is a timestamp (labeled 'Timestamp') in the format "hh:mm:ss" e.g. "07:00:04". I want to drop the rows where the hour is NOT between or equal to 7 and 21.
I have tried to convert the timestamps to strings and use slicing, but I was not able to get it working, and I believe there should be a more efficient way.
# Create list of opening hours (these should not be dropped)
opening_hour = 7
closeing_hour = 21
trading_hours = []
for hour in range(closeing_hour - opening_hour + 1):
    add_hour = opening_hour + hour
    trading_hours.append(add_hour)
My dataframe looks something like this:
Date Timestamp Close
0 20180102 07:05:00 12925.979
1 20180102 21:05:02 12925.479
2 20180102 22:05:04 12925.280
3 20180102 23:55:06 12925.479
4 20180102 06:05:07 12925.780
5 20180103 07:05:07 12925.780
[...]
I want to drop the rows with index 2, 3 and 4 (there are several thousand), so the result should be something like:
Date Timestamp Close
0 20180102 07:05:00 12925.979
1 20180102 21:05:02 12925.479
2 20180103 07:05:07 12925.780
[...]

First you can give your DataFrame a proper DatetimeIndex as follows:
dtidx = pd.DatetimeIndex(df['Date'].astype(str) + ' ' + df['Timestamp'].astype(str))
df.index = dtidx
and then use between_time to get the hours between hours 07 and 21 inclusive:
df.between_time('07:00', '22:00')
# returns
Date Timestamp Close
2018-01-02 07:05:00 20180102 07:05:00 12926
2018-01-02 21:05:02 20180102 21:05:02 12925.5
2018-01-03 07:05:07 20180103 07:05:07 12925.8
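One detail worth checking is the end point. By default between_time includes both boundaries, so an end of '22:00' also keeps a row stamped exactly 22:00:00. The sketch below uses made-up boundary timestamps (not data from the question) to show the difference between the two end values:

```python
import pandas as pd

# Hypothetical rows sitting right on the boundaries.
idx = pd.DatetimeIndex([
    "2018-01-02 06:59:59",
    "2018-01-02 07:00:00",
    "2018-01-02 21:59:59",
    "2018-01-02 22:00:00",
])
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# End point 22:00 is inclusive, so the 22:00:00 row survives.
kept_wide = df.between_time("07:00", "22:00")

# Ending at 21:59:59 drops it (assumes timestamps have whole-second resolution).
kept_tight = df.between_time("07:00", "21:59:59")

print(len(kept_wide), len(kept_tight))
```

Whether this matters depends on whether the data can contain a timestamp at exactly 22:00:00.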

Since you mentioned slicing and another answer already covers it, I would like to introduce extracting the hour with dt.hour.
First convert the time strings to datetimes (in the question the column is called 'Timestamp'):
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%H:%M:%S')
You can now easily extract the hour part using dt.hour:
df['hour'] = df['Timestamp'].dt.hour
You can also extract year, month, second, and so on in a similar way.
Now you can filter as you would on any other column:
df[(df.hour >= 7) & (df.hour <= 21)]
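Since the question builds a list of allowed hours, the hour extraction can also be combined with Series.isin. The following is a minimal sketch on the sample rows from the question; it computes the hours in a temporary Series rather than adding a column, which is a design choice and not something the answer prescribes:

```python
import pandas as pd

# Sample rows rebuilt from the question.
df = pd.DataFrame({
    "Date": [20180102, 20180102, 20180102, 20180102, 20180102, 20180103],
    "Timestamp": ["07:05:00", "21:05:02", "22:05:04", "23:55:06", "06:05:07", "07:05:07"],
    "Close": [12925.979, 12925.479, 12925.280, 12925.479, 12925.780, 12925.780],
})

# Hours 7 through 21 inclusive, equivalent to the loop in the question.
trading_hours = list(range(7, 22))

# Parse the time strings and pull out the hour as an integer Series.
hours = pd.to_datetime(df["Timestamp"], format="%H:%M:%S").dt.hour

# Keep only rows whose hour is in the list, then renumber the index.
filtered = df[hours.isin(trading_hours)].reset_index(drop=True)
print(filtered)
```

Using isin also works when the allowed hours are not one contiguous range.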

I prefer the other answers which work with proper timestamp data types, but since you mentioned trying and failing with a string slicing method, it might be helpful for you to see a solution using string slicing that does work:
df['Hour'] = df['Timestamp'].str.slice(0, 2).astype(int)
df[(df['Hour'] >= 7) & (df['Hour'] <= 21)]
The first line creates a new integer column from the slice of the string which represents the hour, and the second line filters on said new column.
Date Timestamp Close Hour
0 20180102 07:05:00 12925.979 7
1 20180102 21:05:02 12925.479 21
5 20180103 07:05:07 12925.780 7

My guess would be to use DataFrame.between_time.
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df.set_index('Timestamp').between_time('07:00:00', '21:59:59')
Timestamp Date Close
2019-07-22 07:05:00 20180102 12925.979
2019-07-22 21:05:02 20180102 12925.479
2019-07-22 07:05:07 20180103 12925.78

how to convert time in unorthodox format to timestamp in pandas dataframe

I have a column in my dataframe which I want to convert to a Timestamp. However, it is in a bit of a strange format that I am struggling to manipulate. The column is in the format HHMMSS, but does not include the leading zeros.
For example for a time that should be '00:03:15' the dataframe has '315'. I want to convert the latter to a Timestamp similar to the former. Here is an illustration of the column:
message_time
25
35
114
1421
...
235347
235959
Thanks

Use Series.str.zfill to add leading zeros and then to_datetime:
s = df['message_time'].astype(str).str.zfill(6)
df['message_time'] = pd.to_datetime(s, format='%H%M%S')
print (df)
message_time
0 1900-01-01 00:00:25
1 1900-01-01 00:00:35
2 1900-01-01 00:01:14
3 1900-01-01 00:14:21
4 1900-01-01 23:53:47
5 1900-01-01 23:59:59
In my opinion it is better here to create timedeltas with to_timedelta:
s = df['message_time'].astype(str).str.zfill(6)
df['message_time'] = pd.to_timedelta(s.str[:2] + ':' + s.str[2:4] + ':' + s.str[4:])
print (df)
message_time
0 00:00:25
1 00:00:35
2 00:01:14
3 00:14:21
4 23:53:47
5 23:59:59
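If the column is already numeric, the string padding can be skipped altogether. The sketch below is an arithmetic alternative to the zfill approach, assuming the values are non-negative integers that encode HHMMSS with the leading zeros dropped:

```python
import pandas as pd

# Sample values taken from the question.
df = pd.DataFrame({"message_time": [25, 35, 114, 1421, 235347, 235959]})

n = df["message_time"].astype(int)

# Split the HHMMSS number into its parts with integer division and modulo.
hours = n // 10000
minutes = n // 100 % 100
seconds = n % 100

# Combine into a total number of seconds and convert to a timedelta.
df["message_time"] = pd.to_timedelta(hours * 3600 + minutes * 60 + seconds, unit="s")
print(df)
```

Note that an invalid value such as 99 in the minutes position would silently roll over here, whereas the to_datetime route with format='%H%M%S' would raise an error.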

Converting numeric SAS dates to datetimes Pandas

I am currently trying to reproduce this: convert numeric sas date to datetime in Pandas, but I get the following error:
"Python int too large to convert to C long"
Here is an example of my dates:
0 1.416096e+09
1 1.427069e+09
2 1.433635e+09
3 1.428624e+09
4 1.433117e+09
Name: dates, dtype: float64
Any ideas?

Here is a little hacky solution. If the date column is called 'date', try
df['date'] = pd.to_datetime(df['date'] - 315619200, unit = 's')
Here 315619200 is the number of seconds between Jan 1 1960 and Jan 1 1970.
You get
0 2004-11-15 00:00:00
1 2005-03-22 00:03:20
2 2005-06-05 23:56:40
3 2005-04-09 00:00:00
4 2005-05-31 00:03:20
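The constant can be checked, and avoided, by letting pandas handle the epoch shift through the origin argument of to_datetime. This is a sketch rather than the answerer's code, and the two sample values are illustrative whole-day numbers close to those in the question, which are only shown rounded there:

```python
import pandas as pd

# Seconds between the SAS epoch (1960-01-01) and the Unix epoch (1970-01-01).
offset = int((pd.Timestamp("1970-01-01") - pd.Timestamp("1960-01-01")).total_seconds())
print(offset)

# Illustrative SAS datetime values (seconds since 1960-01-01), assumed for this sketch.
sas_seconds = pd.Series([1416096000.0, 1427068800.0], name="dates")

# Option 1: subtract the offset by hand, as in the answer above.
manual = pd.to_datetime(sas_seconds - offset, unit="s")

# Option 2: tell pandas where the count starts.
with_origin = pd.to_datetime(sas_seconds, unit="s", origin=pd.Timestamp("1960-01-01"))

print(with_origin)
```

Both options give the same result; the second simply states the epoch instead of encoding it as a number.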
