pd.read_sql formatting issue with timestamp - python

I am using the code below to extract data from a database, with cx_Oracle.connect for the connection. I have an issue when the field I'm trying to extract is of datatype TIMESTAMP(6).
The value retrieved is 1625236451324000000 instead of 02-JUL-21 02.54.05.569000 PM.
df_ora = pd.read_sql(sql_query_lpi, con=md_connection)
df_list=df_ora.values.tolist()
for columnname in df_list:
    run_info = dict()
    run_info['UPDATE_TS'] = columnname[0]
Is any special formatting required in pandas to handle this?
Thank you for any help/suggestion.

You need to pass the unit argument when converting it to a datetime:
pd.to_datetime(your_col, unit='ns')
In [4]: pd.Timestamp(1625236451324000000, unit='ns')
Out[4]: Timestamp('2021-07-02 14:34:11.324000')
import pandas as pd
df = pd.DataFrame({'col' : [1625236451324000000]})
df['date_col'] = pd.to_datetime(df['col'], unit='ns')
print(df)
                   col                date_col
0  1625236451324000000 2021-07-02 14:34:11.324
Edit.
If you need to preserve a format, use .dt.strftime. Note that this turns your timestamp into a string.
df['date_col_sql'] = pd.to_datetime(df['col'], unit='ns')\
    .dt.strftime('%d-%b-%y %I.%M.%S.%f')
                   col                date_col               date_col_sql
0  1625236451324000000 2021-07-02 14:34:11.324  02-Jul-21 02.34.11.324000
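If the goal is to reproduce the Oracle-style display from the question, including the AM/PM marker, something like the sketch below may help. It assumes the integers are nanoseconds since the Unix epoch and that an English locale is active (so `%b` yields `Jul` and `%p` yields `PM`). `UPDATE_TS` comes from the question; the `UPDATE_TS_TEXT` column name is made up for illustration.

```python
import pandas as pd

# Assumption: the TIMESTAMP(6) values arrived as integer nanoseconds since the epoch.
df = pd.DataFrame({"UPDATE_TS": [1625236451324000000]})

# Convert to real datetimes first; keep this column for any date arithmetic.
df["UPDATE_TS"] = pd.to_datetime(df["UPDATE_TS"], unit="ns")

# Build a display string: %I is the 12-hour clock, %p adds AM/PM,
# and .str.upper() turns 'Jul' into 'JUL'.
df["UPDATE_TS_TEXT"] = (
    df["UPDATE_TS"].dt.strftime("%d-%b-%y %I.%M.%S.%f %p").str.upper()
)

print(df["UPDATE_TS_TEXT"].iloc[0])
```

Keeping the datetime column and the text column separate means sorting and filtering still work on real timestamps, while the string is only used for display.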

Related

How to convert a column in a dataframe to an index datetime object?

I have a question about how to convert a column 'Timestamp' into an index&datetime. And then also drop the column once it's converted into an index.
df = {'Timestamp':['20/01/2021 01:00:00.12 AM','20/01/2021 01:00:00.21 AM','20/01/2021 01:00:01.34 AM'],
      'Value':['14','178','158','75']}
I tried the following, but it obviously didn't work.
df.Timestamp = pd.to_datetime(df.Timestamp.str[0])
df=df.set_index(['Timestamp'], drop=True)
FYI: the df is actually the result of a lot of text processing, so unfortunately I cannot just use read_csv and parse the datetimes there. :( So yes, the df is exactly as described above.
Thank you.
Don't enclose 'Timestamp' in square brackets.
import pandas as pd
df = pd.DataFrame({'Timestamp':['20/01/2021 01:00:00.12 AM','20/01/2021 01:00:00.21 AM','20/01/2021 01:00:01.34 AM'],
                   'Value':['14','178','158']})
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.set_index('Timestamp')
print(df)
## Output
                        Value
Timestamp
2021-01-20 01:00:00.120    14
2021-01-20 01:00:00.210   178
2021-01-20 01:00:01.340   158
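Because these strings are day-first and use a 12-hour clock, it may be safer to spell out the format rather than rely on inference. The following is a minimal sketch under that assumption; the format string is my reading of the sample values, not something stated in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": ["20/01/2021 01:00:00.12 AM",
                  "20/01/2021 01:00:00.21 AM",
                  "20/01/2021 01:00:01.34 AM"],
    "Value": ["14", "178", "158"],
})

# Explicit format: day/month/year, 12-hour clock, fractional seconds, AM/PM.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%d/%m/%Y %I:%M:%S.%f %p")

# set_index removes the column from the frame by default (drop=True),
# so no separate drop step is needed.
df = df.set_index("Timestamp")

print(type(df.index).__name__)
print(df.index[0])
```

With an explicit format, a malformed row raises an error instead of being silently parsed as month-first.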

Filter particular date in a DF column

I want to filter for a particular date in a DataFrame column.
My code:
df
df["Crawl Date"]=pd.to_datetime(df["Crawl Date"]).dt.date
date=pd.to_datetime("03-21-2020")
df=df[df["Crawl Date"]==date]
It shows no match.
Note: the column contains a time along with the date, and the time needs to be trimmed.
Thanks in advance.
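The following script assumes that the 'Crawl Date' column contains strings. Note that .dt.date produces datetime.date objects, so the value you compare against must also be a date rather than a Timestamp:
import pandas as pd
column_names = ["Crawl Date"]
df = pd.DataFrame(columns=column_names)
# Populate the DataFrame with dates
df.loc[0] = ['03-21-2020 23:45:57']
df.loc[1] = ['03-22-2020 23:12:33']
# Trim the time component; the column now holds datetime.date objects
df["Crawl Date"] = pd.to_datetime(df["Crawl Date"]).dt.date
# Compare date to date: since pandas 2.0 a Timestamp never compares equal to a datetime.date
date = pd.to_datetime("03-21-2020").date()
df = df[df["Crawl Date"] == date]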
Then df returns:
   Crawl Date
0  2020-03-21
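An alternative that keeps the column as datetime64 is to compare on the normalized (midnight) value instead of converting to `datetime.date`. This is only a sketch of that idea; the explicit `format` and the names `target` and `matches` are my own assumptions based on the sample strings.

```python
import pandas as pd

df = pd.DataFrame({"Crawl Date": ["03-21-2020 23:45:57", "03-22-2020 23:12:33"]})
df["Crawl Date"] = pd.to_datetime(df["Crawl Date"], format="%m-%d-%Y %H:%M:%S")

target = pd.Timestamp("2020-03-21")

# normalize() sets the time to midnight but keeps the datetime64 dtype,
# so this is a Timestamp-to-Timestamp comparison.
matches = df[df["Crawl Date"].dt.normalize() == target]
print(matches)
```

This leaves the original times in place, which is handy if they are needed later; only the comparison ignores them.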

Python Pandas sort by Time and group by user ID

I am loading a CSV file with pandas. It has three columns: a column with date and time, a column with a user id, and another 'campaignID'.
Example rows:
date                user_id    campaign_id
2018-01-10 0:21:09  151312395  GOOGLE
2018-01-10 0:21:19  151312395  GOOGLE
2018-01-10 0:21:32  151312395  GOOGLE
I want to group the data by the user id, and then for each user id group the rows by time and the campaign ID, it should look as follows.
user_id    date                ad_campaign
151312395  2018-01-10 0:21:09  GOOGLE
           2018-01-10 0:21:19  GOOGLE
           2018-01-10 0:21:32  GOOGLE
This is what I have made until now:
import pandas as pd
import numpy as np
import datetime
def dateparse(time_in_secs):
    return datetime.datetime.fromtimestamp(float(time_in_secs))
columnnames = ['date','user_id', 'ad_campaign']
df=pd.read_csv(r'C:\Users\L\Desktop\Data.csv' ,
               sep='\t',names = columnnames, usecols=[0,1,3],
               parse_dates=True,date_parser=dateparse)
df.date = pd.to_datetime(df.date)
df = df.sort_values(by = 'date')
g = df.groupby('user_id')['ad_campaign']
print(g)
This gives the following output:
<pandas.core.groupby.SeriesGroupBy object at 0x04EF26F0>
[Finished in 0.6s]
Why doesn't the print show the sorted columns?
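Firstly, print(g) only shows the GroupBy object itself, because groupby is lazy: nothing is computed until you aggregate or iterate over the groups. Also, if you are doing a groupby, you don't need to sort the column explicitly, since the group keys are sorted by default.
You can do: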
Method 1:
df.date = pd.to_datetime(df.date)
g = df.groupby(['user_id','date'])['ad_campaign']
print(g.first())
Method 2:
df.set_index(['user_id','date']).sort_index()
You could try df.set_index(['user_id', 'date']).
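To make the difference between a GroupBy object and a printable result concrete, here is a small sketch using rows shaped like the question's sample. The second user id and the zero-padded hours are invented for illustration, and `ordered` and `sizes` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2018-01-10 00:21:32", "2018-01-10 00:21:09",
             "2018-01-10 00:21:19", "2018-01-10 00:20:00"],
    "user_id": [151312395, 151312395, 151312395, 151312396],
    "ad_campaign": ["GOOGLE", "GOOGLE", "GOOGLE", "GOOGLE"],
})
df["date"] = pd.to_datetime(df["date"])

# Sorting and indexing give a plain DataFrame, which prints its rows directly.
ordered = df.sort_values(["user_id", "date"]).set_index(["user_id", "date"])
print(ordered)

# A GroupBy object has to be iterated or aggregated to see any data.
sizes = {}
for user_id, group in df.sort_values("date").groupby("user_id"):
    sizes[user_id] = len(group)
    print(user_id, group["date"].min())
```

If per-user time order is all that is needed, the `sort_values` plus `set_index` route is usually enough and no groupby is required.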

How to change datetime format?

I have timestamps given in the following format in my pandas DataFrame df: 2015-03-09 11:09:05.0.
How can I transform them into this format 2015-03-09T11:09:05.0 (i.e. separated by T)?
df["dt"] = df["dt"].apply(lambda x: ???)
You were almost there. You are looking for isoformat: https://docs.python.org/3.6/library/datetime.html#datetime.date.isoformat
import pandas as pd
df = pd.DataFrame({'dt':pd.to_datetime(['2015-03-09 11:09:05.0'])})
df["dt"] = df["dt"].apply(lambda x: x.isoformat())
df
Returns
                    dt
0  2015-03-09T11:09:05
You can change the T (default) by inserting a parameter to isoformat(), e.g. df["dt"] = df["dt"].apply(lambda x: x.isoformat(" "))
Use strftime with custom format:
df = pd.DataFrame({'dt':pd.to_datetime(['2015-03-09 11:09:05.0'])})
print (df)
df["dt"] = df["dt"].dt.strftime('%Y-%m-%dT%H:%M:%S.%f')
print (df)
                           dt
0  2015-03-09T11:09:05.000000
Or convert to string, split by whitespace and join by T:
df["dt"] = df["dt"].astype(str).str.split().str.join('T')

Replace text with numbers using dictionary in pandas

I'm trying to replace months represented as a character (e.g. 'NOV') for their numerical counterparts ('-11-'). I can get the following piece of code to work properly.
df_cohorts['ltouch_datetime'] = df_cohorts['ltouch_datetime'].str.replace('NOV','-11-')
df_cohorts['ltouch_datetime'] = df_cohorts['ltouch_datetime'].str.replace('DEC','-12-')
df_cohorts['ltouch_datetime'] = df_cohorts['ltouch_datetime'].str.replace('JAN','-01-')
However, to avoid redundancy, I'd like to use a dictionary and .replace to replace the character variable for all months.
r_month1 = {'JAN':'-01-','FEB':'-02-','MAR':'-03-','APR':'-04-','MAY':'-05-','JUN':'-06-','JUL':'-07-','AUG':'-08-','SEP':'-09-','OCT':'-10-','NOV':'-11-','DEC':'-12-'}
df_cohorts.replace({'conversion_datetime': r_month1,'ltouch_datetime': r_month1})
When I enter the code above, my output dataset is unchanged. For reference, please see my sample data below.
User_ID ltouch_datetime conversion_datetime
001 11NOV14:13:12:56 11NOV14:16:12:00
002 07NOV14:17:46:14 08NOV14:13:10:00
003 04DEC14:17:46:14 04DEC15:13:12:00
Thanks!
Let me suggest a different approach: you could parse the date strings into a column of pandas Timestamps like this:
import pandas as pd
df = pd.read_table('data', sep=r'\s+')
for col in ('ltouch_datetime', 'conversion_datetime'):
    df[col] = pd.to_datetime(df[col], format='%d%b%y:%H:%M:%S')
print(df)
#    User_ID     ltouch_datetime conversion_datetime
# 0        1 2014-11-11 13:12:56 2014-11-11 16:12:00
# 1        2 2014-11-07 17:46:14 2014-11-08 13:10:00
# 2        3 2014-12-04 17:46:14 2015-12-04 13:12:00
I would stop right here, since representing dates as Timestamps is the ideal
form for the data in pandas.
However, if you need/want date strings with 3-letter months like 'NOV' converted to -11-, then you can convert the Timestamps with strftime and apply:
for col in ('ltouch_datetime', 'conversion_datetime'):
    df[col] = df[col].apply(lambda x: x.strftime('%d-%m-%y:%H:%M:%S'))
print(df)
yields
   User_ID    ltouch_datetime conversion_datetime
0        1  11-11-14:13:12:56   11-11-14:16:12:00
1        2  07-11-14:17:46:14   08-11-14:13:10:00
2        3  04-12-14:17:46:14   04-12-15:13:12:00
To answer your question literally, in order to use Series.str.replace you need a column with the month string abbreviations all by themselves. You can arrange for that by first calling Series.str.extract. Then you can join the columns back into one using apply:
import pandas as pd
import calendar
month_map = {calendar.month_abbr[m].upper():'-{:02d}-'.format(m)
             for m in range(1,13)}
df = pd.read_table('data', sep=r'\s+')
for col in ('ltouch_datetime', 'conversion_datetime'):
    # Split into (text before month, month letters, text after month)
    tmp = df[col].str.extract(r'(.*?)(\D+)(.*)')
    tmp[1] = tmp[1].replace(month_map)
    df[col] = tmp.apply(''.join, axis=1)
print(df)
yields
   User_ID    ltouch_datetime conversion_datetime
0        1  11-11-14:13:12:56   11-11-14:16:12:00
1        2  07-11-14:17:46:14   08-11-14:13:10:00
2        3  04-12-14:17:46:14   04-12-15:13:12:00
Finally, although you haven't asked for this directly, it's good to be aware
that if your data is in a file, you can parse the datestring columns into
Timestamps directly using
import pandas as pd
import datetime as DT
# Note: date_parser is deprecated as of pandas 2.0, where
# date_format='%d%b%y:%H:%M:%S' is the suggested replacement.
df = pd.read_table(
    'data', sep=r'\s+', parse_dates=[1,2],
    date_parser=lambda x: DT.datetime.strptime(x, '%d%b%y:%H:%M:%S'))
This might be the most convenient method of all (assuming you want Timestamps).
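As for why the dictionary-based replace in the question left the data unchanged: by default `DataFrame.replace` matches whole cell values rather than substrings, and it returns a new DataFrame instead of modifying in place. A sketch of a variant that I would expect to work is below; it swaps in `regex=True` (not part of the approaches above) and uses a shortened month dictionary plus hand-built sample data for brevity.

```python
import pandas as pd

df_cohorts = pd.DataFrame({
    "User_ID": ["001", "002", "003"],
    "ltouch_datetime": ["11NOV14:13:12:56", "07NOV14:17:46:14", "04DEC14:17:46:14"],
    "conversion_datetime": ["11NOV14:16:12:00", "08NOV14:13:10:00", "04DEC15:13:12:00"],
})

# Shortened for the example; extend to all twelve months as in the question.
r_month1 = {"NOV": "-11-", "DEC": "-12-"}

# regex=True makes each key match inside the string instead of the whole cell.
# The result is assigned back because replace returns a new DataFrame.
df_cohorts = df_cohorts.replace(
    {"ltouch_datetime": r_month1, "conversion_datetime": r_month1},
    regex=True,
)
print(df_cohorts)
```

Parsing into Timestamps is still the more robust route, since plain text substitution does not validate that the result is a real date.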
