I have a CSV file with a long timestamp column spanning several years:
1990-05-12 14:01
.
.
1999-01-10 10:00
where the time is in hh:mm format. I'm trying to extract each day's worth of data into its own CSV file. Here's my code:
import datetime
import pandas as pd
df = pd.read_csv("/home/parallels/Desktop/ewh_log/hpwh_log.csv", parse_dates=True)
# change timestamp column format
def extract_months_data(df):
    df = pd.to_datetime(df['timestamp'])
    print(df)
def write_to_csv(df):
    print('writing ..')
    # todo
x1 = pd.to_datetime(df['timestamp'], format='%m-%d %H:%M').notnull().all()
if x1 == True:
    extract_months_data(df)
else:
    x2 = pd.to_datetime(df['timestamp'])
    x2 = x2.dt.strftime('%m-%d %H:%M')
    write_to_csv(df)
The issue is that when execution reaches these lines:
def extract_months_data(df):
    df = pd.to_datetime(df['timestamp'])
I get the following error:
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime
Is there an alternative solution in pandas that doesn't throw away the rest of the data? I saw posts suggesting errors='coerce', but that replaces the rest of the data with NaT.
Thanks
UPDATE:
This post answers half of the question, namely how to filter hours (or minutes) out of the timestamp column. The second part is how to extract a full day into another CSV file. I'll post updates here once I get to a solution.
You are converting to datetime twice, which is not needed.
Something like this should work:
import pandas as pd
df = pd.read_csv('data.csv')
df['month_data'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M')
df['month_data'] = df['month_data'].dt.strftime('%m-%d %H:%M')
# If you don't want rows where month_data is NaN
df = df[df['month_data'].notna()]
print(df)
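For the second part of the question (one CSV file per day), a minimal sketch is to parse the column once and then group the rows by calendar day. The sample rows, the extra value column and the temporary output directory are hypothetical stand-ins for hpwh_log.csv; with an explicit format, errors='coerce' only turns rows that fail to parse into NaT, which are then dropped while the well-formed rows are kept.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for hpwh_log.csv; only the
# 'timestamp' column name comes from the question.
df = pd.DataFrame({
    "timestamp": ["1990-05-12 14:01", "1990-05-12 15:30", "1990-05-13 09:00"],
    "value": [1, 2, 3],
})

# Parse once with an explicit format; unparseable rows become NaT and are dropped.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M", errors="coerce")
df = df.dropna(subset=["timestamp"])

out_dir = tempfile.mkdtemp()  # assumption: any writable directory will do

# Group rows by calendar day and write each group to its own file, e.g. 1990-05-12.csv.
for day, day_df in df.groupby(df["timestamp"].dt.date):
    day_df.to_csv(os.path.join(out_dir, f"{day}.csv"), index=False)

print(sorted(os.listdir(out_dir)))  # ['1990-05-12.csv', '1990-05-13.csv']
```

Grouping on .dt.date keeps each file aligned with midnight boundaries; grouping on .dt.to_period("M") instead would give one file per month.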
I am trying to split a column from a CSV file. The first column contains a date (YYYYmmdd) followed by a time (HHmmss), so a value looks like 20221001131245. I want to split this so that one column reads 2022 10 01 and another reads 13:12:45.
I have tried str.split, but I realise my data isn't stored as strings, so this isn't working.
Here is my code so far:
import pandas as pd
CSVPath = "/Desktop/Test Data.csv"
data = pd.read_csv(CSVPath)
print(data)
To answer the question from your comment:
You can use df.drop(['COLUMN_1', 'COLUMN_2'], axis=1) to drop unwanted columns.
I am guessing you want to write the data back to a .csv file? Use the following snippet to write only specific columns:
df[['COLUMN_1', 'COLUMN_2']].to_csv("/Desktop/Test Data Edited.csv")
Use to_datetime combined with strftime:
# convert to datetime
s = pd.to_datetime(df['col'], format='%Y%m%d%H%M%S')
# or, if the column holds integers
# s = pd.to_datetime(df['col'].astype(str), format='%Y%m%d%H%M%S')
# format strings
df['date'] = s.dt.strftime('%Y %m %d')
df['time'] = s.dt.strftime('%H:%M:%S')
Output:
                col        date      time
0    20221001131245  2022 10 01  13:12:45
Alternative, using string slicing and concatenation:
# cast to str first in case read_csv parsed the column as integers
s = df['col'].astype(str).str
df['date'] = s[:4] + ' ' + s[4:6] + ' ' + s[6:8]
df['time'] = s[8:10] + ':' + s[10:12] + ':' + s[12:]
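The reason str.split fails is that read_csv infers an integer dtype for a column of pure digits. A self-contained sketch of the to_datetime approach, using an in-memory CSV with hypothetical rows in place of "Test Data.csv":

```python
import io

import pandas as pd

# Hypothetical stand-in for "Test Data.csv"; the column name 'col' follows the answer.
csv_text = "col\n20221001131245\n20221002080507\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df["col"].dtype)  # an integer dtype: the digits were parsed as numbers

# Cast to str so the %Y%m%d%H%M%S format can be applied, then format both parts.
s = pd.to_datetime(df["col"].astype(str), format="%Y%m%d%H%M%S")
df["date"] = s.dt.strftime("%Y %m %d")
df["time"] = s.dt.strftime("%H:%M:%S")
print(df)
```

Alternatively, read_csv(..., dtype={"col": str}) keeps the column as strings from the start, so the slicing approach works without the cast.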
How to change the date format from 12-Mar-2022 to format='%d/%m/%Y' in Python
The problem is that I read data from a Google Sheet in which the dates come in multiple formats: some are 12/03/2022 and some are 12-Mar-2022.
I tried the following, but of course it raises an error because the format doesn't match 12-Mar-2022:
defectData_x['date'] = pd.to_datetime(defectData_x['date'], format='%d/%m/%Y')
Appreciate your help
defectData_x['date1'] = defectData_x['date'].dt.strftime('%d/%m/%Y')
Don't forget that date1's dtype is object (strings), not datetime, so it's better to keep both the date and date1 columns until you have the final result. After that, you can drop the date column.
Here is my example:
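import pandas as pd
df = pd.DataFrame(["12/03/2022", "12-Mar-2022"], columns=["date"])
# format="mixed" (pandas >= 2.0) parses each value on its own;
# dayfirst=True reads 12/03/2022 as 12 March rather than 3 December
df["date1"] = pd.to_datetime(df["date"], format="mixed", dayfirst=True)
df['date2'] = df['date1'].dt.strftime('%d/%m/%Y')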
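If you'd rather not rely on automatic format inference, one option is to parse each known layout explicitly and combine the results. This is a sketch assuming only the two layouts from the question occur; the third sample row is made up, and %b matches English month abbreviations only under an English/C locale.

```python
import pandas as pd

df = pd.DataFrame({"date": ["12/03/2022", "12-Mar-2022", "01/04/2022"]})

# Parse each known layout separately; rows that don't match a layout become NaT.
slash = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")
named = pd.to_datetime(df["date"], format="%d-%b-%Y", errors="coerce")

# Use the slash result where it parsed, otherwise fall back to the month-name result.
df["date1"] = slash.fillna(named)
df["date2"] = df["date1"].dt.strftime("%d/%m/%Y")
print(df)
```

Any value that matches neither layout stays NaT in date1, which makes unexpected formats easy to spot with df[df["date1"].isna()].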
I am pulling financial data from an API that stores the time values as (I think) UTC epoch timestamps in milliseconds, such as 1645804609719.
I cannot seem to convert the entire column into a usable date. I can do it for a single value with the following code, so I know the approach works, but I have thousands of rows and thought pandas would offer an easier way to update all the values.
from datetime import datetime
tx = int('1645804609719')/1000
print(datetime.utcfromtimestamp(tx).strftime('%Y-%m-%d %H:%M:%S'))
Any help would be greatly appreciated.
Simply use pandas.Series.apply:
df['date'] = df.date.apply(lambda x: datetime.utcfromtimestamp(int(x)/1000).strftime('%Y-%m-%d %H:%M:%S'))
Another way to do it is by using pd.to_datetime as recommended by Panagiotos in the comments:
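# cast the strings to numbers first; recent pandas versions warn when unit= is combined with strings
df['date'] = pd.to_datetime(pd.to_numeric(df['date']), unit='ms')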
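Combined with .dt.strftime, the vectorized conversion also yields the exact string format from the question without any loop. A sketch using the two sample values from the answer below:

```python
import pandas as pd

df = pd.DataFrame({"date": ["1584199972000", "1645804609719"], "values": [30, 40]})

# Convert the strings to integers, then interpret them as Unix epoch milliseconds (UTC).
ts = pd.to_datetime(pd.to_numeric(df["date"]), unit="ms")

# Format the whole column at once; strftime drops the millisecond part.
df["date"] = ts.dt.strftime("%Y-%m-%d %H:%M:%S")
print(df)
```

Keeping ts as a datetime column (without strftime) is usually preferable if you still need to sort, filter or resample by time.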
You can use "to_numeric" to convert the column to numbers, "div" to divide it by 1000 (milliseconds to seconds), and finally a loop over the column that applies datetime to get the format you want.
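import pandas as pd
from datetime import datetime
df = pd.DataFrame({'date': ['1584199972000', '1645804609719'], 'values': [30, 40]})
# strings -> numbers, then milliseconds -> seconds
df['date'] = pd.to_numeric(df['date']).div(1000)
# object dtype lets the column hold the formatted strings written below
df['date'] = df['date'].astype(object)
for i in range(len(df)):
    df.iloc[i, 0] = datetime.utcfromtimestamp(df.iloc[i, 0]).strftime('%Y-%m-%d %H:%M:%S')
print(df)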
Output:
date values
0 2020-03-14 15:32:52 30
1 2022-02-25 15:56:49 40
I'm trying to convert my time data, which is currently in the form 15:41:28:4330 (hh:mm:ss:msmsmsms), to seconds.
I browsed through some of the pandas documentation but can't seem to find this format anywhere.
Would it be possible to simply calculate the seconds from that time format row by row?
You'll want to obtain a timedelta and call its total_seconds method to get the seconds after midnight. You can parse to datetime first and then subtract the default date (which is added automatically). Example:
#1 - via datetime
import pandas as pd
df = pd.DataFrame({'time': ["15:41:28:4330"]})
df['time'] = pd.to_datetime(df['time'], format='%H:%M:%S:%f')
df['sec_after_mdnt'] = (df['time']-df['time'].dt.floor('d')).dt.total_seconds()
df
time sec_after_mdnt
0 1900-01-01 15:41:28.433 56488.433
Alternatively, you can clean your time format and parse directly to timedelta:
#2 - str cleaning & to timedelta
df = pd.DataFrame({'time': ["15:41:28:4330"]})
# last separator must be a dot...
df['time'] = df['time'].str[::-1].str.replace(':', '.', n=1, regex=False).str[::-1]
df['sec_after_mdnt'] = pd.to_timedelta(df['time']).dt.total_seconds()
df
time sec_after_mdnt
0 15:41:28.4330 56488.433
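To calculate the seconds directly from the fields, as asked, one option is to split the string and do the arithmetic column-wise. This sketch makes the same assumption as the answers above, namely that the last field is the fractional part of the second ("4330" means 0.4330 s); the second sample row is made up.

```python
import pandas as pd

df = pd.DataFrame({"time": ["15:41:28:4330", "00:00:01:5000"]})

# Split "hh:mm:ss:ffff" into four string columns (0..3).
parts = df["time"].str.split(":", expand=True)

# Assumption: the last field is a decimal fraction of a second.
frac = ("0." + parts[3]).astype(float)

df["sec_after_mdnt"] = (
    parts[0].astype(int) * 3600
    + parts[1].astype(int) * 60
    + parts[2].astype(int)
    + frac
)
print(df)
```

If the last field actually counts something else (for example tenths of a millisecond), only the frac line needs to change.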
I have a column (Created AT) in my DataFrame that holds timestamps like the ones shown below:
Created AT
1) 2021-04-19T09:14:10.526Z
2) 2021-04-19T09:13:06.809Z
3) 2021-04-19T09:13:06.821Z
I want to extract only the time from the column above. It should look like:
9:14:8 etc
How to extract this ?
If your date column is a string, you need to convert it to datetime and then take a substring of the time:
df = pd.DataFrame(data = {"Created At":["2021-04-19T09:14:10.526Z","2021-04-19T09:14:10.526Z"]})
df['Created At'] = pd.to_datetime(df['Created At'])
df['Created At'] = df['Created At'].dt.time.astype(str).str[:8]
df['time'] = pd.to_datetime(df['Created AT'])
print(df['time'].dt.time)
The first line converts the strings to datetime objects and writes them to a new column.
The second line gets the time from those datetime objects.
There are multiple possible solutions; here is one using plain string splitting and the datetime module.
You can get the time string with:
import datetime
s = '2021-04-19T09:14:10.526Z'
t = s.split('T')[1].split('.')[0]
print(t)
To turn it into a datetime object, add one more line (strptime fills in the default date 1900-01-01):
print(datetime.datetime.strptime(t,"%H:%M:%S"))
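If you want a time object without the placeholder 1900-01-01 date, a sketch with the standard library is to parse the full ISO 8601 string and keep only its time part. Replacing the trailing "Z" with "+00:00" is there because fromisoformat only accepts "Z" from Python 3.11 onward.

```python
import datetime

s = "2021-04-19T09:14:10.526Z"

# Parse the whole ISO 8601 timestamp as a UTC-aware datetime.
dt = datetime.datetime.fromisoformat(s.replace("Z", "+00:00"))

# Keep only the time of day and drop the milliseconds.
t = dt.time().replace(microsecond=0)
print(t)  # 09:14:10
```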
Convert to datetime and use strftime to format exactly as you like it.
import pandas as pd
data = ['2021-04-19T09:14:10.526Z',
'2021-04-19T09:13:06.809Z',
'2021-04-19T09:13:06.821Z']
df = pd.DataFrame(data=data, columns=['Created AT'])
df['Created AT'] = pd.to_datetime(df['Created AT']).dt.strftime('%H:%M:%S')
print(df)
Created AT
0 09:14:10
1 09:13:06
2 09:13:06
First convert the column to datetime format if not already in that format:
df['Created AT'] = pd.to_datetime(df['Created AT'])
Then add a new time column, formatted with .dt.strftime() as follows (if you don't want the sub-second part):
df['time'] = df['Created AT'].dt.strftime('%H:%M:%S')
print(df)
Created AT time
0 2021-04-19 09:14:10.526000+00:00 09:14:10
1 2021-04-19 09:13:06.809000+00:00 09:13:06
2 2021-04-19 09:13:06.821000+00:00 09:13:06
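The question's sample output (9:14:8) has no leading zeros, while %H, %M and %S always zero-pad. The %-H style flags that suppress padding are platform-specific (glibc supports them, Windows does not), so a portable sketch is to join the hour, minute and second components yourself:

```python
import pandas as pd

df = pd.DataFrame({"Created AT": ["2021-04-19T09:14:10.526Z",
                                  "2021-04-19T09:13:06.809Z",
                                  "2021-04-19T09:13:06.821Z"]})
ts = pd.to_datetime(df["Created AT"])

# Join the components as plain integers so none of them is zero-padded.
df["time"] = (ts.dt.hour.astype(str) + ":"
              + ts.dt.minute.astype(str) + ":"
              + ts.dt.second.astype(str))
print(df)
```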