I have timestamps given in the following format in my pandas DataFrame df: 2015-03-09 11:09:05.0.
How can I transform them into this format 2015-03-09T11:09:05.0 (i.e. separated by T)?
df["dt"] = df["dt"].apply(lambda x: ???)
You were almost there. You are looking for isoformat: https://docs.python.org/3.6/library/datetime.html#datetime.date.isoformat
import pandas as pd
df = pd.DataFrame({'dt':pd.to_datetime(['2015-03-09 11:09:05.0'])})
df["dt"] = df["dt"].apply(lambda x: x.isoformat())
df
Returns
dt
0 2015-03-09T11:09:05
You can change the T (default) by inserting a parameter to isoformat(), e.g. df["dt"] = df["dt"].apply(lambda x: x.isoformat(" "))
Use strftime with custom format:
df = pd.DataFrame({'dt':pd.to_datetime(['2015-03-09 11:09:05.0'])})
print (df)
df["dt"] = df["dt"].dt.strftime('%Y-%m-%dT%H:%M:%S.%f')
print (df)
dt
0 2015-03-09T11:09:05.000000
Or convert to string, split by whitespace and join by T:
df["dt"] = df["dt"].astype(str).str.split().str.join('T')
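To compare the three approaches above side by side, here is a minimal sketch using the sample value from the question. The last step, which trims the strftime result to a single fractional digit, is an assumption about what the asker wanted; it truncates rather than rounds.

```python
import pandas as pd

# Sample value taken from the question; the column name "dt" follows the answers above.
df = pd.DataFrame({"dt": pd.to_datetime(["2015-03-09 11:09:05.0"])})

iso = df["dt"].apply(lambda x: x.isoformat())            # omits a zero fractional part
fmt = df["dt"].dt.strftime("%Y-%m-%dT%H:%M:%S.%f")       # %f always writes six digits
joined = df["dt"].astype(str).str.split().str.join("T")  # string-based variant

# Assumption: keep exactly one fractional digit by dropping the last five characters.
one_digit = fmt.str[:-5]

print(iso.iloc[0])
print(fmt.iloc[0])
print(joined.iloc[0])
print(one_digit.iloc[0])
```

Note that all of these produce strings, so the column is no longer a datetime column afterwards.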
Related
I am using the below to extract data from database using cx_Oracle.connect for connection. I am having issue when the field I'm trying to extract from database is of datatype TIMESTAMP(6).
Value retrieved is 1625236451324000000 instead of 02-JUL-21 02.54.05.569000 PM
df_ora = pd.read_sql(sql_query_lpi, con=md_connection)
df_list=df_ora.values.tolist()
for columnname in df_list:
    run_info = dict()
    run_info['UPDATE_TS'] = columnname[0]
Any special formatting required in pandas to handle this ?
Thank you for any help/suggestion.
You need to pass the unit argument when converting it to a datetime:
pd.to_datetime(your_col, unit='ns')
In [4]: pd.Timestamp(1625236451324000000, unit='ns')
Out[4]: Timestamp('2021-07-02 14:34:11.324000')
import pandas as pd
df = pd.DataFrame({'col' : [1625236451324000000]})
df['date_col'] = pd.to_datetime(df['col'], unit='ns')
print(df)
col date_col
0 1625236451324000000 2021-07-02 14:34:11.324
Edit.
If you need to preserve a format, use .dt.strftime; note that this will turn your timestamp into a string.
df['date_col_sql'] = pd.to_datetime(df['col'], unit='ns')\
.dt.strftime('%d-%b-%y %I.%M.%S.%f')
col date_col date_col_sql
0 1625236451324000000 2021-07-02 14:34:11.324 02-Jul-21 02.34.11.324000
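The format the asker quoted also has an upper-case month and an AM/PM marker. A possible sketch, assuming an English or C locale (both %b and %p are locale-dependent), adds %p and upper-cases the result:

```python
import pandas as pd

df = pd.DataFrame({"col": [1625236451324000000]})
ts = pd.to_datetime(df["col"], unit="ns")

# %p appends the AM/PM marker; .str.upper() turns "Jul" into "JUL".
# Both %b and %p depend on the active locale, so this assumes English names.
df["date_col_sql"] = ts.dt.strftime("%d-%b-%y %I.%M.%S.%f %p").str.upper()
print(df["date_col_sql"].iloc[0])
```

As with the strftime call above, the new column holds strings, not timestamps.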
I need to add seconds in YYYY-MM-DD-HH-MM-SS. My code works perfectly for one data point but not for the whole set. The data.txt consists of 7 columns and around 200 rows.
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
df = pd.read_csv('data.txt',sep='\t',header=None)
a = np.array(list(df[0]))
b = np.array(list(df[1]))
c = np.array(list(df[2]))
d = np.array(list(df[3]))
e = np.array(list(df[4]))
f = np.array(list(df[5]))
g = np.array(list(df[6]))
t1=datetime(year=a, month=b, day=c, hour=d, minute=e, second=f)
t = t1 + timedelta(seconds=g)
print(t)
You can pass the names parameter to read_csv to set the column names in the first step, then convert the first 6 columns to datetimes with to_datetime and add the seconds, converted to timedeltas with to_timedelta. Only the six date/time columns are passed to to_datetime, because it raises an error on extra columns:
names = ["year","month","day","hour","minute","second","new"]
df = pd.read_csv('data.txt',sep='\t',names=names)
df['out'] = pd.to_datetime(df[names[:6]]) + pd.to_timedelta(df["new"], unit='s')
Use apply with axis=1 to apply a function to every row of the DataFrame.
df.apply(lambda x: datetime(year=x[0],
                            month=x[1],
                            day=x[2],
                            hour=x[3],
                            minute=x[4],
                            second=x[5]) + timedelta(seconds=int(x[6])), axis=1)
Generating a sample dataset:
s = 20
df = pd.DataFrame(np.array([np.random.randint(2015,2020,s),np.random.randint(1,12,s),np.random.randint(1,28,s),
                            np.random.randint(0,23,s), np.random.randint(0,59,s), np.random.randint(0,59,s),
                            np.random.randint(0,200,s)]).T,
                  columns=["year","month","day","hour","minute","second","add"])
This is simple to do with pandas Series operations:
pd.to_datetime(df.loc[:,["year","month","day","hour","minute","second"]]) + df["add"].apply(lambda s: pd.Timedelta(seconds=s))
Without using apply():
pd.to_datetime(df.loc[:,["year","month","day","hour","minute","second"]]) + pd.to_timedelta(df["add"], unit="s")
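To check the vectorized approach end to end without the original file, here is a small sketch. The two rows below are made-up stand-ins for data.txt (tab-separated, no header, 7 columns), chosen so that the added seconds roll over a day boundary.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.txt: tab-separated, no header, 7 columns.
raw = "2019\t12\t31\t23\t59\t50\t20\n2020\t2\t28\t12\t0\t0\t86400\n"
names = ["year", "month", "day", "hour", "minute", "second", "add"]
df = pd.read_csv(io.StringIO(raw), sep="\t", names=names)

# Only the six date/time columns go to to_datetime; "add" holds the extra seconds.
df["out"] = pd.to_datetime(df[names[:6]]) + pd.to_timedelta(df["add"], unit="s")
print(df["out"])
```

The first row crosses into the next year and the second lands on 29 February, so the date arithmetic is handled by pandas rather than by hand.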
Trying to remove "time" from rows in column using pandas '06/07/2020 14:00'
How can I access last 6 characters of a string to replace it using str.replace("x", "")
Your advice will be much appreciated.
data = {'datetime': ['06/07/2020 14:00', '06/07/2020 16:00', '06/07/2020 18:00']}
df = pd.DataFrame(data)
df['date'] = df['datetime'].str[:-6]
.str[:-5]
is the solution I was looking for.
time_and_date = '06/07/2020 14:00'
only_time = time_and_date.split(' ')[1]
# or
only_time = time_and_date[-5:]
If you want to replace the time in the string, you can do it like this:
time_and_date = '06/07/2020 14:00'
new_value_to_be_placed = 'Some value'
new_time_and_date = time_and_date.split(' ')[0] + ' ' + new_value_to_be_placed
Use apply and lambda expressions. If your column is really a string:
import pandas as pd
from datetime import datetime, timedelta
d1 = {'my_date_str': ['06/07/2020 14:00', '08/07/2020 14:00'], 'my_date': [datetime.now(), datetime.now() - timedelta(days=10)]}
d1 = pd.DataFrame(data=d1)
d1['my_date_str_new'] = d1['my_date_str'].apply(lambda x: x[:10])
if your column is a datetime object:
d1['my_date_new'] = d1['my_date'].apply(lambda x: x.date())
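Fixed-width slicing such as .str[:-6] or x[:10] relies on every value having the same length. A sketch that does not depend on the width splits on the space instead; the second sample value with a one-digit hour is made up for illustration.

```python
import pandas as pd

# The "9:00" entry is a hypothetical value with a shorter time part.
df = pd.DataFrame({"datetime": ["06/07/2020 14:00", "06/07/2020 9:00"]})

# Keep everything before the first space, regardless of how long the time part is.
df["date"] = df["datetime"].str.split(" ").str[0]
print(df["date"].tolist())
```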
I tried the following code.
The result1 is filtered by a given date, but the result2 isn't filtered.
How can I filter by date in a function?
import pandas as pd
over20='https://gist.githubusercontent.com/shinokada/dfcdc538dedf136d4a58b9bcdcfc8f18/raw/d1db4261b76af67dd67c00a400e373c175eab428/LNS14000024.csv'
df_over20 = pd.read_csv(over20)
display(df_over20)
result1=df_over20[df_over20['DATE']>='1972-01-01']
display(result1)
def changedate(item):
    # something more here
    item['DATE']=pd.to_datetime(item['DATE'])
    start=pd.to_datetime('1972-01-01')
    item[item['DATE']>=start]
    return item
result2=changedate(df_over20)
display(result2)
In my experience, it helps to make the DATE column the index by running:
df.index = df["DATE"]
df.drop("DATE", inplace=True, axis=1)
Then filter on the index:
date = pd.to_datetime('2020-04-01')
x = df[df.index > date]
You can also use the following command to make sure your index is a datetime index:
df.index = pd.to_datetime(df.index)
You should not compare datetimes as raw strings; it can lead to wrong results.
Use this instead:
import datetime
def compare(date1, date2):
    date1 = datetime.datetime.fromisoformat(date1).timestamp()
    date2 = datetime.datetime.fromisoformat(date2).timestamp()
    if date1 > date2:
        return 1
    elif date1 == date2:
        return 0
    else:
        return -1
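For the function in the question itself, one likely cause is that the filtered result is computed but discarded: boolean indexing returns a new DataFrame and does not change item in place. A minimal sketch of a fix follows; the small DataFrame and the start parameter are hypothetical stand-ins, not taken from the linked CSV.

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question (DATE plus one value column).
df_over20 = pd.DataFrame({
    "DATE": ["1971-11-01", "1971-12-01", "1972-01-01", "1972-02-01"],
    "VALUE": [5.1, 5.2, 5.0, 4.9],
})

def changedate(item, start="1972-01-01"):
    item = item.copy()  # avoid modifying the caller's DataFrame
    item["DATE"] = pd.to_datetime(item["DATE"])
    # Boolean indexing returns a new DataFrame, so return that result.
    return item[item["DATE"] >= pd.to_datetime(start)]

result2 = changedate(df_over20)
print(result2)
```

Copying first is a design choice here: it keeps the original frame unchanged, at the cost of some extra memory.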
I have a dataframe with 'Date' and 'Value', where the Date is in format m/d/yyyy. I need to convert to yyyymmdd.
df2= df[["Date", "Transaction"]]
I know datetime can do this for me, but I can't get it to accept my format.
example data files:
6/15/2006,-4.27,
6/16/2006,-2.27,
6/19/2006,-6.35,
You first need to convert to datetime, using pd.to_datetime, then you can format it as you wish using strftime:
>>> df
Date Transaction
0 6/15/2006 -4.27
1 6/16/2006 -2.27
2 6/19/2006 -6.35
df['Date'] = pd.to_datetime(df['Date'],format='%m/%d/%Y').dt.strftime('%Y%m%d')
>>> df
Date Transaction
0 20060615 -4.27
1 20060616 -2.27
2 20060619 -6.35
You can say:
df['Date']=df['Date'].dt.strftime('%Y%m%d')
The dt accessor's strftime method is what you need here.
Note: if you haven't converted the column to a pandas datetime yet, do:
df['Date']=pd.to_datetime(df['Date']).dt.strftime('%Y%m%d')
Output:
Date Transaction
0 20060615 -4.27
1 20060616 -2.27
2 20060619 -6.35
For a raw python solution, you could try something along the following (assuming datafile is a string).
datafile="6/15/2006,-4.27,\n6/16/2006,-2.27,\n6/19/2006,-6.35"
def zeroPad(str, desiredLen):
    while (len(str) < desiredLen):
        str = "0" + str
    return str
def convToYYYYMMDD(datafile):
    datafile = ''.join(datafile.split('\n')) # remove \n's, they're unreliable and not needed
    datafile = datafile.split(',') # split by comma so '1,2' becomes ['1','2']
    out = []
    for i in range(0, len(datafile)):
        if (i % 2 == 0):
            tmp = datafile[i].split('/')
            yyyymmdd = zeroPad(tmp[2], 4) + zeroPad(tmp[0], 2) + zeroPad(tmp[1], 2)
            out.append(yyyymmdd)
        else:
            out.append(datafile[i])
    return out
print(convToYYYYMMDD(datafile))
This outputs: ['20060615', '-4.27', '20060616', '-2.27', '20060619', '-6.35'].
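As an alternative sketch that stays in the standard library but avoids manual zero-padding, datetime.strptime can parse the m/d/yyyy values directly and the csv module can split the records. The (date, value) tuple layout of the result is a choice made for this example, not part of the answer above.

```python
import csv
import io
from datetime import datetime

datafile = "6/15/2006,-4.27,\n6/16/2006,-2.27,\n6/19/2006,-6.35"

rows = []
for record in csv.reader(io.StringIO(datafile)):
    # strptime accepts months and days without leading zeros for %m and %d.
    date = datetime.strptime(record[0], "%m/%d/%Y").strftime("%Y%m%d")
    rows.append((date, record[1]))

print(rows)
```

Unlike the manual padding, strptime also rejects impossible dates such as 13/45/2006 with a ValueError.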