This is the plain column
0 06:55:22
1 06:55:23
2 06:55:24
3 06:55:25
4 06:55:26
I would like to put that column in the index, but the problem is that when I try to use the method resample() I always get the same error:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
I've been using this to change the Time column to datetime:
dt['Time'] = pd.to_datetime(dt['Time'], format='%H:%M:%S').apply(lambda x: x.time())
You can use set_index to set the Time column as the index of the dataframe.
In [1954]: df.set_index('Time')
Out[1954]:
a
Time
06:55:23 1
06:55:24 2
06:55:25 3
06:55:26 4
Update after OP's comment
If you don't have a date column, pandas will attach the default date 1900-01-01 when you convert it to datetime. Like this:
In [1985]: pd.to_datetime(df['Time'], format='%H:%M:%S')
Out[1985]:
0 1900-01-01 06:55:23
1 1900-01-01 06:55:24
2 1900-01-01 06:55:25
3 1900-01-01 06:55:26
Name: Time, dtype: datetime64[ns]
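Putting it together for the original resample problem, here is a minimal sketch (assuming the column names Time and a from the example above, and a 2-second bin size picked arbitrarily): keep the full datetime instead of calling .time(), set it as the index, and resample works:
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S')   # keeps the default 1900-01-01 date
df = df.set_index('Time')                                    # index is now a DatetimeIndex
df.resample('2s').sum()                                      # aggregate over 2-second bins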
Related
Hello there stackoverflow community,
I would like to change the datetime format of a column, but it doesn't work and I don't know what I'm doing wrong.
After executing the following code:
df6['beginn'] = pd.to_datetime(df6['beginn'], unit='s', errors='ignore')
I get this output, and that's fine, but I would like to take out the time part to have only %m/%d/%Y left.
ID DATE
91060 2017-11-10 00:00:00
91061 2022-05-01 00:00:00
91062 2022-04-01 00:00:00
Name: beginn, Length: 91063, dtype: object
I've tried this one and many others
df6['beginn'] = df6['beginn'].dt.strftime('%m/%d/%Y')
and get the following error:
AttributeError: Can only use .dt accessor with datetimelike values.
But I don't understand why; haven't I already converted the data with pd.to_datetime?
Appreciate any hint you can give me! Thanks a lot!
The reason you have to use errors="ignore" is that not all the dates you are parsing are in the correct format. If you use errors="coerce", as @phi mentioned, then any dates that cannot be converted will be set to NaT. The column's datatype will still be converted to datetime64, and you can then format it as you like and deal with the NaT values as you want.
Example
A dataframe with one item in Date not written as Year/Month/Day (a 25th month is invalid):
>>> df = pd.DataFrame({'ID': [91060, 91061, 91062, 91063], 'Date': ['2017/11/10', '2022/05/01', '2022/04/01', '2055/25/25']})
>>> df
ID Date
0 91060 2017/11/10
1 91061 2022/05/01
2 91062 2022/04/01
3 91063 2055/25/25
>>> df.dtypes
ID int64
Date object
dtype: object
Using errors="ignore":
>>> df['Date'] = pd.to_datetime(df['Date'], errors='ignore')
>>> df
ID Date
0 91060 2017/11/10
1 91061 2022/05/01
2 91062 2022/04/01
3 91063 2055/25/25
>>> df.dtypes
ID int64
Date object
dtype: object
The Date column is still of type object because not all the values could be converted. Running df['Date'] = df['Date'].dt.strftime("%m/%d/%Y") at this point results in the AttributeError above.
Using errors="coerce":
>>> df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
>>> df
ID Date
0 91060 2017-11-10
1 91061 2022-05-01
2 91062 2022-04-01
3 91063 NaT
>>> df.dtypes
ID int64
Date datetime64[ns]
dtype: object
Invalid dates are set to NaT, the column is now of type datetime64, and you can format it:
>>> df['Date'] = df['Date'].dt.strftime("%m/%d/%Y")
>>> df
ID Date
0 91060 11/10/2017
1 91061 05/01/2022
2 91062 04/01/2022
3 91063 NaN
Note: when formatting datetime64, the column is converted back to type object, so NaTs are changed to NaN. The issue you are having is a case of some dirty data not being in the correct format.
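If you then need to deal with those NaT/NaN rows explicitly, two common options are dropping or labelling them (just a sketch of what you might want to do):
>>> df.dropna(subset=['Date'])       # drop rows whose date could not be parsed
>>> df['Date'].fillna('unknown')     # or keep them and mark them explicitly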
I am trying to convert a column in a df into a time series. The dataset goes from March 23rd, 2015 to August 17th, 2019 and looks like this:
time 1day_active_users
0 2015-03-23 00:00:00-04:00 19687.0
1 2015-03-24 00:00:00-04:00 19437.0
I am trying to convert the time column into a datetime series but it returns the column as an object. Here is the code:
data = pd.read_csv(data_path)
data.set_index('time', inplace=True)
data.index= pd.to_datetime(data.index)
data.index.dtype
data.index.dtype returns dtype('O'). I assume this is why, when I try to index an element by time, it returns an error. For example, when I run this:
data.loc['2015']
It gives me this error
KeyError: '2015'
Any help or feedback would be appreciated. Thank you.
As commented, the problem might be due to the different timezones. Try passing utc=True to pd.to_datetime:
df['time'] = pd.to_datetime(df['time'],utc=True)
df['time']
Test Data
time 1day_active_users
0 2015-03-23 00:00:00-04:00 19687.0
1 2015-03-24 00:00:00-05:00 19437.0
Output:
0 2015-03-23 04:00:00+00:00
1 2015-03-24 05:00:00+00:00
Name: time, dtype: datetime64[ns, UTC]
And then:
df.set_index('time', inplace=True)
df.loc['2015']
gives
1day_active_users
time
2015-03-23 04:00:00+00:00 19687.0
2015-03-24 05:00:00+00:00 19437.0
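If you would rather see the original local clock times again, you can convert the UTC index back afterwards (a sketch; 'US/Eastern' is an assumption based on the -04:00 offset in the sample):
df.index = df.index.tz_convert('US/Eastern')
df.loc['2015']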
I have a column in a dataframe which has timestamps and their datatype is object (string):
data_log = pd.read_csv(DATA_LOG_PATH)
print(data_log['LocalTime'])
0 09:38:49
1 09:38:50
2 09:38:51
3 09:38:52
4 09:38:53
...
Name: LocalTime, Length: 872, dtype: object
Now I try to convert to datetime:
data_log['LocalTime'] = pd.to_datetime(data_log['LocalTime'], format='%H:%M:%S')
print(data_log['LocalTime'])
0 1900-01-01 09:38:49
1 1900-01-01 09:38:50
2 1900-01-01 09:38:51
3 1900-01-01 09:38:52
4 1900-01-01 09:38:53
...
Name: LocalTime, Length: 872, dtype: datetime64[ns]
How do I remove that date there? I just want the time in the format that I specified, but it adds the 1900-01-01 to every row.
You can get the time part of a datetime series with Series.dt.time
print(data_log['LocalTime'].dt.time)
This series will consist of Python standard library datetime.time objects.
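For example, to overwrite the column with just the clock time (a small sketch; note the dtype becomes object because these are plain datetime.time values):
data_log['LocalTime'] = data_log['LocalTime'].dt.time
print(data_log['LocalTime'].dtype)   # object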
You can remove the 1900-01-01 date part in different ways, for example with a list comprehension:
data_log['LocalTime'] = pd.Series([lt.time() for lt in data_log['LocalTime']])
or using a lambda function:
data_log['LocalTime'] = data_log['LocalTime'].apply(lambda x: x.time())
To check the type of a specific column:
print(df['LocalTime'].dtypes)
Or use the to_datetime function from pandas:
https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
df['LocalTime'] = pd.to_datetime(df['timestamp'], unit='s')
where unit='s' defines the unit of the timestamp (seconds in this case).
To take timezones into account:
df['timestamp'].dt.tz_localize('UTC').dt.tz_convert('Europe/Brussels')
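Putting those two steps together, a rough sketch (the column name timestamp and the Europe/Brussels zone are taken from the snippets above, so adjust them to your data):
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')                                # epoch seconds -> naive datetimes
df['timestamp'] = df['timestamp'].dt.tz_localize('UTC').dt.tz_convert('Europe/Brussels')   # attach and convert timezone
print(df['timestamp'].dt.time)                                                             # clock times in the local timezone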
How can I convert a float64 value into a datetime value?
Here are the first five float values from the dataset:
0 41245.0
1 41701.0
2 36361.0
3 36145.0
4 42226.0
Name: product_first_sold_date, dtype: float64
And to convert the float values to datetime I wrote this:
from datetime import datetime
pd.to_datetime(y['product_first_sold_date'], format='%m%d%Y.0', errors='coerce')
but as the output I got 'NaT' for all the rows in the dataset:
0 NaT
1 NaT
2 NaT
3 NaT
4 NaT
Name: product_first_sold_date, Length: 19273, dtype: datetime64[ns]
Then I tried this:
print(pd.to_datetime(y.product_first_sold_date, infer_datetime_format=True))
but it shows the same date for all the rows in the dataset
0 1970-01-01 00:00:00.000041245
1 1970-01-01 00:00:00.000041701
2 1970-01-01 00:00:00.000036361
3 1970-01-01 00:00:00.000036145
4 1970-01-01 00:00:00.000042226
and I really can't figure out what's wrong with the code.
I have also tried this:
pd.to_datetime(pd.Series(g.product_first_sold_date).astype(str), format='%d%m%Y.0')
and got this as output (I have also tried format='%Y%m%d.0'):
ValueError: time data '41245.0' does not match format '%d%m%Y.0' (match)
It looks like nothing works, or maybe I just did something wrong; I don't know how to fix this. Thanks in advance!
I'd assume these floating point values represent dates as Excel handles them internally, i.e. days since 1900-01-01.
To convert this format to a Python/pandas datetime, set the appropriate origin and unit:
df['product_first_sold_date'] = pd.to_datetime(df['product_first_sold_date'],
origin='1899-12-30',
unit='D')
...which gives for the provided example
0 2012-12-02
1 2014-03-03
2 1999-07-20
3 1998-12-16
4 2015-08-10
Name: product_first_sold_date, dtype: datetime64[ns]
Important to note here (see @chux-ReinstateMonica's comment): 1900-01-01 is day 1 in Excel, not day zero, and it is day zero that you have to provide as origin. Day zero is 1899-12-30; in case you wonder why it's not 1899-12-31, the explanation is quite interesting, and you can find more info here.
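As a quick sanity check of that origin, you can reproduce the first row with just the standard library (a standalone sketch):
from datetime import date, timedelta
print(date(1899, 12, 30) + timedelta(days=41245))   # 2012-12-02, matching the first converted value above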
I have times in a Series, e.g. 002959.20. I would like to change them into the format 00:29:59.20, and then apply the timezone (+07:00) to get 07:29:59.20. I have tried strftime, but it's not working. How can I change the format? Really appreciate any help! :)
edit
Here's my data:
$PU 003114.00
$PU 003114.20
$PU 003114.40
$PU 003114.60
$PU 003114.80
Name: Time, dtype: object
Here's my code:
y = New['Time']
import time
time.strftime(y,'%H:%M:%S.%f')
output:
TypeError: strftime() argument 1 must be str, not Series
and when I tried converting it to a string I got:
TypeError: Tuple or struct_time argument required
You can convert to datetime this way:
Using your data:
s
0 003114.00
1 003114.20
2 003114.80
Name: test, dtype: object
# reassign the Series to datetime format
s = pd.to_datetime(s, format='%H%M%S.%f')
s
0 1900-01-01 00:31:14.000
1 1900-01-01 00:31:14.200
2 1900-01-01 00:31:14.800
Name: test, dtype: datetime64[ns]
Adding 7 hours:
s = s + pd.offsets.Hour(7)
0 1900-01-01 07:31:14.000
1 1900-01-01 07:31:14.200
2 1900-01-01 07:31:14.800
Name: test, dtype: datetime64[ns]
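If afterwards you only need the clock time as text in the HH:MM:SS.ff style asked about, one possible final step is strftime (a sketch; this turns the values back into strings):
s.dt.strftime('%H:%M:%S.%f')   # e.g. '07:31:14.000000', '07:31:14.200000', ...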
There most certainly is a better solution (this is literally the worst possible solution), but you could:
Add 70000 to your current time
Convert it to a string
and group them like this: string[0] string[1] : string[2] string[3] : string[4-8]