I am new to python (coming from R), and I am trying to understand how I can convert a timestamp series in a pandas dataframe (in my case this is called df['timestamp']) into what I would call a string vector in R. is this possible? How would this be done?
I tried df['timestamp'].apply('str'), but this seems to simply put the entire column df['timestamp'] into one long string. I'm looking to convert each element into a string and preserve the structure, so that it's still a vector (or maybe this a called an array?)
Consider the dataframe df
df = pd.DataFrame(dict(timestamp=pd.to_datetime(['2000-01-01'])))
df
timestamp
0 2000-01-01
Use the datetime accessor dt to access the strftime method. You can pass a format string to strftime and it will return a formatted string. When used with the dt accessor you will get a series of strings.
df.timestamp.dt.strftime('%Y-%m-%d')
0 2000-01-01
Name: timestamp, dtype: object
Visit strftime.org for a handy set of format strings.
Use astype
>>> import pandas as pd
>>> df = pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None]))
>>> df.astype(str)
0 2009-07-31
1 2010-01-10
2 NaT
dtype: object
returns an array of strings
Following on from VinceP's answer, to convert a datetime Series in-place do the following:
df['Column_name']=df['Column_name'].astype(str)
Related
How should I transform from datetime to string? My attempt:
dates = p.to_datetime(p.Series(['20010101', '20010331']), format = '%Y%m%d')
dates.str
There is no .str accessor for datetimes and you can't do .astype(str) either.
Instead, use .dt.strftime:
>>> series = pd.Series(['20010101', '20010331'])
>>> dates = pd.to_datetime(series, format='%Y%m%d')
>>> dates.dt.strftime('%Y-%m-%d')
0 2001-01-01
1 2001-03-31
dtype: object
See the docs on customizing date string formats here: strftime() and strptime() Behavior.
For old pandas versions <0.17.0, one can instead can call .apply with the Python standard library's datetime.strftime:
>>> dates.apply(lambda x: x.strftime('%Y-%m-%d'))
0 2001-01-01
1 2001-03-31
dtype: object
As of pandas version 0.17.0, you can format with the dt accessor:
dates.dt.strftime('%Y-%m-%d')
There is a pandas function that can be applied to DateTime index in pandas data frame.
date = dataframe.index #date is the datetime index
date = dates.strftime('%Y-%m-%d') #this will return you a numpy array, element is string.
dstr = date.tolist() #this will make you numpy array into a list
the element inside the list:
u'1910-11-02'
You might need to replace the 'u'.
There might be some additional arguments that I should put into the previous functions.
I have a DataFrame with one column sensorTime representing timestamps in nanoseconds (currentTimeMillis() * 1000000). An example is the following:
sensorTime
1597199687312000000
1597199687315434496
1597199687320437760
1597199687325465856
1597199687330448640
1597199687335456512
1597199687340429824
1597199687345459456
1597199687350439168
1597199687355445504
I'm transforming this column to datetime using
res['sensorTime'] = pd.to_datetime(res['sensorTime'])
How can I transform the datetime back to timestamps in nanoseconds so that I get the exact values as in the example above?
The obvious way to do this would be to convert the datetime series to Unix time (nanoseconds since the epoch) using astype:
import pandas as pd
res = pd.DataFrame({'sensorTime': [1597199687312000000, 1597199687315434496, 1597199687320437760]})
res['sensorTime'] = pd.to_datetime(res['sensorTime'])
# back to nanoseconds / Unix time - .astype to get a pandas.series
s = res['sensorTime'].astype('int64')
print(type(s), s)
<class 'pandas.core.series.Series'>
0 1597199687312000000
1 1597199687315434496
2 1597199687320437760
Name: sensorTime, dtype: int64
Another option is to use a view:
# .view to get a numpy.ndarray
arr = res['sensorTime'].values.view('int64')
print(type(arr), arr)
<class 'numpy.ndarray'>
[1597199687312000000 1597199687315434496 1597199687320437760]
But be careful, it's just a view, so the underlying data stays the same. Meaning that here, if you change a value in the dataframe's Series, this change will also be visible in the view array.
Hi I am using pandas to convert a column to month.
When I read my data they are objects:
Date object
dtype: object
So I am first making them to date time and then try to make them as months:
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', encoding='utf-8-sig', usecols= ['Date', 'ids'])
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month
Also if that helps:
In [10]: df['Date'].dtype
Out[10]: dtype('O')
So, the error I get is like this:
/Library/Frameworks/Python.framework/Versions/2.7/bin/User/lib/python2.7/site-packages/pandas/core/series.pyc in _make_dt_accessor(self)
2526 return maybe_to_datetimelike(self)
2527 except Exception:
-> 2528 raise AttributeError("Can only use .dt accessor with datetimelike "
2529 "values")
2530
AttributeError: Can only use .dt accessor with datetimelike values
EDITED:
Date columns are like this:
0 2014-01-01
1 2014-01-01
2 2014-01-01
3 2014-01-01
4 2014-01-03
5 2014-01-03
6 2014-01-03
7 2014-01-07
8 2014-01-08
9 2014-01-09
Do you have any ideas?
Thank you very much!
Your problem here is that to_datetime silently failed so the dtype remained as str/object, if you set param errors='coerce' then if the conversion fails for any particular string then those rows are set to NaT.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
So you need to find out what is wrong with those specific row values.
See the docs
First you need to define the format of date column.
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S')
For your case base format can be set to;
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
After that you can set/change your desired output as follows;
df['Date'] = df['Date'].dt.strftime('%Y-%m-%d')
Your problem here is that the dtype of 'Date' remained as str/object. You can use the parse_dates parameter when using read_csv
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', parse_dates= [col],encoding='utf-8-sig', usecols= ['Date', 'ids'],)
df['Month'] = df['Date'].dt.month
From the documentation for the parse_dates parameter
parse_dates : bool or list of int or names or list of lists or dict, default False
The behavior is as follows:
boolean. If True -> try parsing the index.
list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’
If a column or index cannot be represented as an array of datetimes, say because of an unparseable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more.
Note: A fast-path exists for iso8601-formatted dates.
The relevant case for this question is the "list of int or names" one.
col is the columns index of 'Date' which parses as a separate date column.
#Convert date into the proper format so that date time operation can be easily performed
df_Time_Table["Date"] = pd.to_datetime(df_Time_Table["Date"])
# Cal Year
df_Time_Table['Year'] = df_Time_Table['Date'].dt.strftime('%Y')
train_data=pd.read_csv("train.csv",parse_dates=["date"])
I encountered a similar problem when trying to use pd.Series.dt.floor, although all the elements in my pd.Series were datetime.datetime instances (absolutely no NAs). I suspect it had to do with having tz-aware instances with different timezones.
My workaround, in order to take advantage of the pd.Timestamp.floor method was to define the following function:
def floor_datetime(base_datetime_aware, freq="2H"):
return pd.Timestamp(base_datetime_aware).floor(freq)
The I would just use pd.Series.apply to get every element of my Series through the function.
In the end, when you use the .dt accessor, the functions you would use are methods of the base classes, so using apply with a short custom function like mine may solve your problem!
When you write
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df['Date'] = df['Date'].dt.strftime('%m/%d')
It can fixed
Hi I am using pandas to convert a column to month.
When I read my data they are objects:
Date object
dtype: object
So I am first making them to date time and then try to make them as months:
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', encoding='utf-8-sig', usecols= ['Date', 'ids'])
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month
Also if that helps:
In [10]: df['Date'].dtype
Out[10]: dtype('O')
So, the error I get is like this:
/Library/Frameworks/Python.framework/Versions/2.7/bin/User/lib/python2.7/site-packages/pandas/core/series.pyc in _make_dt_accessor(self)
2526 return maybe_to_datetimelike(self)
2527 except Exception:
-> 2528 raise AttributeError("Can only use .dt accessor with datetimelike "
2529 "values")
2530
AttributeError: Can only use .dt accessor with datetimelike values
EDITED:
Date columns are like this:
0 2014-01-01
1 2014-01-01
2 2014-01-01
3 2014-01-01
4 2014-01-03
5 2014-01-03
6 2014-01-03
7 2014-01-07
8 2014-01-08
9 2014-01-09
Do you have any ideas?
Thank you very much!
Your problem here is that to_datetime silently failed so the dtype remained as str/object, if you set param errors='coerce' then if the conversion fails for any particular string then those rows are set to NaT.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
So you need to find out what is wrong with those specific row values.
See the docs
First you need to define the format of date column.
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S')
For your case base format can be set to;
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
After that you can set/change your desired output as follows;
df['Date'] = df['Date'].dt.strftime('%Y-%m-%d')
Your problem here is that the dtype of 'Date' remained as str/object. You can use the parse_dates parameter when using read_csv
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', parse_dates= [col],encoding='utf-8-sig', usecols= ['Date', 'ids'],)
df['Month'] = df['Date'].dt.month
From the documentation for the parse_dates parameter
parse_dates : bool or list of int or names or list of lists or dict, default False
The behavior is as follows:
boolean. If True -> try parsing the index.
list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’
If a column or index cannot be represented as an array of datetimes, say because of an unparseable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more.
Note: A fast-path exists for iso8601-formatted dates.
The relevant case for this question is the "list of int or names" one.
col is the columns index of 'Date' which parses as a separate date column.
#Convert date into the proper format so that date time operation can be easily performed
df_Time_Table["Date"] = pd.to_datetime(df_Time_Table["Date"])
# Cal Year
df_Time_Table['Year'] = df_Time_Table['Date'].dt.strftime('%Y')
train_data=pd.read_csv("train.csv",parse_dates=["date"])
I encountered a similar problem when trying to use pd.Series.dt.floor, although all the elements in my pd.Series were datetime.datetime instances (absolutely no NAs). I suspect it had to do with having tz-aware instances with different timezones.
My workaround, in order to take advantage of the pd.Timestamp.floor method was to define the following function:
def floor_datetime(base_datetime_aware, freq="2H"):
return pd.Timestamp(base_datetime_aware).floor(freq)
The I would just use pd.Series.apply to get every element of my Series through the function.
In the end, when you use the .dt accessor, the functions you would use are methods of the base classes, so using apply with a short custom function like mine may solve your problem!
When you write
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df['Date'] = df['Date'].dt.strftime('%m/%d')
It can fixed
How should I transform from datetime to string? My attempt:
dates = p.to_datetime(p.Series(['20010101', '20010331']), format = '%Y%m%d')
dates.str
There is no .str accessor for datetimes and you can't do .astype(str) either.
Instead, use .dt.strftime:
>>> series = pd.Series(['20010101', '20010331'])
>>> dates = pd.to_datetime(series, format='%Y%m%d')
>>> dates.dt.strftime('%Y-%m-%d')
0 2001-01-01
1 2001-03-31
dtype: object
See the docs on customizing date string formats here: strftime() and strptime() Behavior.
For old pandas versions <0.17.0, one can instead can call .apply with the Python standard library's datetime.strftime:
>>> dates.apply(lambda x: x.strftime('%Y-%m-%d'))
0 2001-01-01
1 2001-03-31
dtype: object
As of pandas version 0.17.0, you can format with the dt accessor:
dates.dt.strftime('%Y-%m-%d')
There is a pandas function that can be applied to DateTime index in pandas data frame.
date = dataframe.index #date is the datetime index
date = dates.strftime('%Y-%m-%d') #this will return you a numpy array, element is string.
dstr = date.tolist() #this will make you numpy array into a list
the element inside the list:
u'1910-11-02'
You might need to replace the 'u'.
There might be some additional arguments that I should put into the previous functions.