Pandas object-list datetime series to datetime index - python

I'm using the fields parameter of the python-elasticsearch API to retrieve some data from Elasticsearch, and I'm trying to parse the #timestamp field, returned in ISO format, for use in a pandas DataFrame.
fields = \
[{
"field": "#timestamp",
"format": "strict_date_optional_time"
}]
By default, Elasticsearch returns the results as arrays (lists), as described in the docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-fields.html
The fields response always returns an array of values for each field, even when there is a single value in the _source.
Because of this, the resulting DataFrame contains an object Series of lists, which can't be parsed into a datetime Series by conventional methods.
Name: fields.#timestamp, Length: 18707, dtype: object
0 [2021-11-04T01:30:00.263Z]
1 [2021-11-04T01:30:00.385Z]
2 [2021-11-04T01:30:00.406Z]
3 [2021-11-04T01:30:00.996Z]
4 [2021-11-04T01:30:01.001Z]
...
8368 [2021-11-04T02:00:00.846Z]
8369 [2021-11-04T02:00:00.894Z]
8370 [2021-11-04T02:00:00.895Z]
8371 [2021-11-04T02:00:00.984Z]
8372 [2021-11-04T02:00:00.988Z]
When I try to parse the Series into a datetime Series:
# df is assumed to be the DataFrame built from the search hits
pd.to_datetime(df["fields.#timestamp"])
That results in:
TypeError: <class 'list'> is not convertible to datetime
My use case requires many datetime formats, and the fields parameter is well suited to querying in multiple formats, but having each datetime string wrapped in a list makes things difficult.

The issue here is that the items of fields.#timestamp are actually lists.
So you could do:
fields['timestamp'] = fields['timestamp'].str[0]
to extract the date string from each list, and then use pd.to_datetime:
fields['timestamp'] = pd.to_datetime(fields['timestamp'])
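The two steps above can be sketched end to end as follows. This is a minimal example under stated assumptions: the sample rows and the value column are made up to mimic the fields response, and the index name timestamp is hypothetical.

```python
import pandas as pd

# Hypothetical sample mimicking the "fields" response:
# every cell is a one-element list holding an ISO 8601 string.
df = pd.DataFrame({
    "fields.#timestamp": [
        ["2021-11-04T01:30:00.263Z"],
        ["2021-11-04T01:30:00.385Z"],
        ["2021-11-04T02:00:00.846Z"],
    ],
    "value": [1, 2, 3],
})

# .str[0] picks the first element of each list, leaving plain strings
ts = pd.to_datetime(df["fields.#timestamp"].str[0])

# Use the parsed values as the index and drop the original list column
df.index = pd.DatetimeIndex(ts, name="timestamp")
df = df.drop(columns="fields.#timestamp")

print(df)
```

Because the strings end in Z, the resulting index is timezone-aware (UTC).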

Related

How to prevent Pandas to_dict() from converting timestamps to string?

I have a dataframe with a date field which appears to be represented as Unix timestamps. When I call df.to_dict() on it, the dates are converted to strings like yyyy-mm-dd .... How can I prevent this from happening?
I'm using this code to return JSON from my FastAPI app ...
df_results = pd.read_sql_query(sql_query_str, _engine)
return_object["results"] = df_results.to_dict(orient='records')
# outputs "date": "2021-12-31" in the json
return_object["results"] = json.loads(df_results.to_json(orient='records'))
# outputs "date": 1640908800000 in the json
You can convert the data type before calling .to_dict(). Casting to an integer should keep the Unix timestamp (note that this casts every column, so it only works if all columns can be converted to integers), e.g.
df.astype(int).to_dict()
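As a hedged alternative that touches only the date column, the sketch below converts it to milliseconds since the Unix epoch before calling to_dict, which matches the 1640908800000 value shown in the question. The sample frame, the x column, and the names out and records are assumptions for illustration. To my understanding, a plain astype(int) on a datetime64[ns] column yields nanoseconds rather than milliseconds, which is why the division is spelled out here.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by read_sql_query
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-31", "2022-01-01"]),
    "x": [1, 2],
})

out = df.copy()
# Elapsed time since the epoch, floor-divided into whole milliseconds
out["date"] = (out["date"] - pd.Timestamp("1970-01-01")) // pd.Timedelta(milliseconds=1)

records = out.to_dict(orient="records")
print(records)
```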

How to convert timedelta data into datetime?

|Event Date|startTime|
|----------|---------|
|2022-11-23|0 days 08:30:00|
I got the table above when loading data from a SQL table into a dataframe, using variables from columns of another dataframe.
I want only the time, 08:30:00. What should I do to get the required output?
The output I need looks like this:
|Event Date|startTime|
|----------|---------|
|2022-11-23|08:30:00|
I tried:
sql['startTime']=pd.to_datetime(df1['startTime']).dt.time
It raises this error:
TypeError: <class 'datetime.time'> is not convertible to datetime
I searched for a solution but found nothing useful. I only came across a question about the opposite situation, which had no information relevant to my case.
Add the timedelta to a datetime; the result then has a time component. Example:
import pandas as pd
df = pd.DataFrame({"Event Date": ["2022-11-23"],
"startTime": ["0 days 08:30:00"]})
# ensure correct datatypes; that might not be necessary in your case
df["Event Date"] = pd.to_datetime(df["Event Date"])
df["startTime"] = pd.to_timedelta(df["startTime"])
# create a datetime Series that includes the timedelta
df["startDateTime"] = df["Event Date"] + df["startTime"]
df["startDateTime"].dt.time
0 08:30:00
Name: startDateTime, dtype: object

Incoherent handling of datetime objects by the DataFrame.to_markdown method

Main objective
I want to transform a pandas.DataFrame with a single column, containing datetime.datetime objects, into its markdown representation, using the pandas.DataFrame.to_markdown method.
Issue
The markdown table does not display the date as desired (displays a timestamp instead of the expected YYYY-MM-DD HH:mm:SS). How can I make it display the date in the usual format?
Code
from datetime import datetime
import pandas as pd
df: pd.DataFrame = pd.DataFrame({
"date": [
datetime(year=2022, month=1, day=1, hour=1, minute=1, second=1),
datetime(year=2022, month=6, day=2, hour=2, minute=2, second=2),
datetime(year=2022, month=10, day=3, hour=3, minute=3, second=3)
]
})
print(df.to_markdown())
Displays
| |date|
|---|----|
|0|1.641e+18|
|1|1.65414e+18|
|2|1.66477e+18|
Why is this "incoherent"?
When I first had to display this DataFrame, I had to add one column to it, in which I inserted the year of the corresponding Timestamp object. I thus have 2 columns, containing pandas._libs.tslibs.timestamps.Timestamp and numpy.int64 objects respectively.
When converted to markdown, it produced the desired effect by formatting the Timestamp as expected.
Code
# To add after the previous code
df["year"] = df.date.apply(lambda x: x.year)
print(df.to_markdown())
Displays
| |date|year|
|---|----|----|
|0|2022-01-01 01:01:01|2022|
|1|2022-06-02 02:02:02|2022|
|2|2022-10-03 03:03:03|2022|
Lead
By checking the types with the pandas.DataFrame.info method, and by calling the type function on the content of individual cells, I observed that the reported types are not always the same. Is this normal?
For instance, calling type on the content of a year cell shows that these cells contain numpy.int64 objects, while the info method displays the column as int64.
Additionally, the info method shows the date column as datetime64[ns], while type says the cells are pandas._libs.tslibs.timestamps.Timestamp.
Could it have any influence whatsoever?
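On the type question: the dtype reported by info describes how the whole column is stored, while type on a single cell shows the scalar object pandas hands back for that element, so the two are expected to differ. The sketch below illustrates this on the DataFrame from the question. It also shows one possible workaround, which is an assumption rather than a confirmed fix: formatting the dates as strings so that the rendered table no longer depends on how numbers are formatted. The name df_str is hypothetical.

```python
from datetime import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": [
        datetime(2022, 1, 1, 1, 1, 1),
        datetime(2022, 6, 2, 2, 2, 2),
        datetime(2022, 10, 3, 3, 3, 3),
    ]
})
df["year"] = df["date"].apply(lambda x: x.year)

# Column-level storage types, as reported by info()
print(df.dtypes)
# Scalar types returned when a single cell is accessed
print(type(df["date"].iloc[0]))
print(type(df["year"].iloc[0]))

# Possible workaround: turn the dates into strings before rendering
df_str = df.copy()
df_str["date"] = df_str["date"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(df_str["date"].tolist())
```

Calling df_str.to_markdown() (which needs the tabulate package) should then display the date strings as written, with or without the year column.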

How to change the data type of a date column from object to datetime in Python?

In a training data set, the datetime column has dtype object. The first row of this column is 2009-06-15 17:26:21 UTC. I tried splitting the data:
train['Date'] = train['pickup_datetime'].str.slice(0,11)
train['Time'] = test['pickup_datetime'].str.slice(11,19)
The idea was to split the date and time into two variables and change them to the datetime data type. I tried a lot of methods but could not get the result.
train['Date']=pd.to_datetime(train['Date'], format='%Y-%b-%d')
I also tried splitting the date, time, and UTC:
train['DateTime'] = pd.to_datetime(train['DateTime'])
Please suggest code for this. I am a beginner.
Thanks in advance
I would try the following
import pandas as pd
#create some random dates matching your formatting
df = pd.DataFrame({"date": ["2009-06-15 17:26:21 UTC", "2010-08-16 19:26:21 UTC"]})
#convert to datetime objects
df["date"] = pd.to_datetime(df["date"])
print(df["date"].dt.date) #returns the date part without tz information
print(df["date"].dt.time) #returns the time part
Output:
0 2009-06-15
1 2010-08-16
Name: date, dtype: object
0 17:26:21
1 19:26:21
Name: date, dtype: object
For further information feel free to consult the docs:
dt.date
dt.time
For your particular case:
#convert to datetime object
df['pickup_datetime']= pd.to_datetime(df['pickup_datetime'])
# separate date and time
df['Date'] = df['pickup_datetime'].dt.date
df['Time'] = df['pickup_datetime'].dt.time
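If you prefer to pass an explicit format, as attempted in the question, note that %b stands for the abbreviated month name (Jun), whereas the data uses month numbers, which need %m. A minimal sketch, assuming every row ends in the literal text UTC; the sample values are made up:

```python
import pandas as pd

# Hypothetical sample rows in the same layout as pickup_datetime
train = pd.DataFrame({
    "pickup_datetime": ["2009-06-15 17:26:21 UTC", "2010-08-16 19:26:21 UTC"]
})

# " UTC" is matched as literal text; utc=True marks the result as UTC
train["pickup_datetime"] = pd.to_datetime(
    train["pickup_datetime"], format="%Y-%m-%d %H:%M:%S UTC", utc=True
)

train["Date"] = train["pickup_datetime"].dt.date
train["Time"] = train["pickup_datetime"].dt.time
print(train)
```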

How to format Twitter (and other) timestamps?

UPDATE: The problem was dirty data and not a data type issue. The above options SHOULD work if your data is clean. In my case, I had about 10 records where the language code had been shifted over into the timestamp field :(
ORIGINAL POST:
I am trying to work with Twitter timestamps which look like this:
df.created_at.head()
0 2015-10-23T07:57:45.000Z
1 2015-10-23T07:56:04.000Z
2 2015-10-23T07:48:26.000Z
3 2015-10-23T07:48:07.000Z
4 2015-10-23T07:44:09.000Z
Name: created_at, dtype: object
I am trying to convert 'created_at' into a datetime data type. I have tried a few ways of doing this but they all give me errors.
If I try to change the data type I get this error:
df.created_at.astype('datetime64[ns]')
ValueError: Error parsing datetime string "en" at position 0
If I use a tweaked version of @Alexander's suggestion below, I get this error:
s = pd.Series(df.created_at)
datetime_idx = pd.DatetimeIndex(pd.to_datetime(s))
ValueError: Unable to convert 0 2015-10-23T07:57:45.000Z...
This approach gives me the following error:
pd.to_datetime(df.created_at, format="%Y-%m-%dT%H:%M:%S.000Z")
ValueError: time data u'en' does not match format '%Y-%m-%dT%H:%M:%S.000Z' (match)
Is this what you're looking for? I just used DatetimeIndex on the series converted to datetime with to_datetime.
s = pd.Series(['2015-10-23T07:57:45.000Z', '2015-10-23T07:56:04.000Z', '2015-10-23T07:48:26.000Z', '2015-10-23T07:48:07.000Z', '2015-10-23T07:44:09.000Z'], name='created_at')
datetime_idx = pd.DatetimeIndex(pd.to_datetime(s))
>>> datetime_idx
DatetimeIndex(['2015-10-23 07:57:45', '2015-10-23 07:56:04', '2015-10-23 07:48:26', '2015-10-23 07:48:07', '2015-10-23 07:44:09'], dtype='datetime64[ns]', freq=None, tz=None)
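Since the root cause turned out to be dirty data, one way to locate the offending rows is to parse with errors="coerce", which turns unparseable values into NaT instead of raising. This is a sketch under assumptions: the sample Series includes a made-up stray "en" value like the one in the error messages, and the names parsed, bad_rows, and clean are hypothetical.

```python
import pandas as pd

# Hypothetical sample with one shifted language code in the timestamp field
s = pd.Series(
    ["2015-10-23T07:57:45.000Z", "en", "2015-10-23T07:48:26.000Z"],
    name="created_at",
)

# Unparseable entries become NaT rather than raising ValueError
parsed = pd.to_datetime(s, errors="coerce", utc=True)

# Original values that failed to parse, for inspection or repair
bad_rows = s[parsed.isna()]
print(bad_rows)

# Keep only the rows that parsed cleanly
clean = parsed.dropna()
print(clean)
```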
