Wrong week conversion in Python Pandas - python

I have the date 2019-11-20, which falls in week 47 of the calendar year; my Excel document agrees. However, when I compute the week in Python I get 46 instead. My code is below, but I can't see what's wrong with it. I tried splitting the column into separate date and time columns, but I still get the same result. My laptop's local time is set correctly. Thanks in advance for your help!
Here is my code:
import pandas as pd
from datetime import datetime
import numpy as np
import re
df = pd.read_csv (r'C:\Users\user\document.csv')
df['startedAt'].replace(regex=True,inplace=True,to_replace=r'\+01:00',value=r'')
df['startedAt'].replace(regex=True,inplace=True,to_replace=r'\+02:00',value=r'')
df['startedAt'] = df['startedAt'].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S').strftime('%d-%m-%y %H:%M:%S'))
df['endedAt'].replace(regex=True,inplace=True,to_replace=r'\+01:00',value=r'')
df['endedAt'].replace(regex=True,inplace=True,to_replace=r'\+02:00',value=r'')
df['endedAt'] = pd.to_datetime(df['endedAt'], format='%Y-%m-%d')
df['startedAt'] = pd.to_datetime(df['startedAt'])
df['Date_started'] = df['startedAt'].dt.strftime('%d/%m/%Y')
df['Time_started'] = df['startedAt'].dt.strftime('%H:%M:%S')
df['Date_started'] = pd.to_datetime(df['Date_started'], errors='coerce')
df['week'] = df['Date_started'].dt.strftime('%U')
print(df)
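A likely cause is the '%U' directive itself: it numbers weeks starting on Sunday and counts the days before the year's first Sunday as week 00, so its values run one behind a count in which the week containing 1 January is week 1. The sketch below compares '%U' with the ISO 8601 week number, which is 47 for this date. The sample series is hypothetical and stands in for the 'Date_started' column.

```python
import pandas as pd

# Hypothetical sample; the real values come from the poster's CSV.
dates = pd.Series(pd.to_datetime(["2019-11-20", "2019-01-01"]))

# %U: weeks start on Sunday; days before the first Sunday are week 00.
week_u = dates.dt.strftime("%U")

# ISO 8601 week number: weeks start on Monday.
week_iso = dates.dt.isocalendar().week

print(week_u.tolist())
print(week_iso.tolist())
```

If the ISO convention is the one wanted, replacing the strftime('%U') call with dt.isocalendar().week (available in pandas 1.1 and later) should be enough.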

Related

python pandas converting UTC integer to datetime

I am calling some financial data from an API which stores the time values as (I think) UTC epoch timestamps in milliseconds, for example 1645804609719.
I cannot seem to convert the entire column into a usable date. I can do it for a single value using the following code, so I know this works, but I have thousands of rows with this problem and thought pandas would offer an easier way to update all the values.
from datetime import datetime
tx = int('1645804609719')/1000
print(datetime.utcfromtimestamp(tx).strftime('%Y-%m-%d %H:%M:%S'))
Any help would be greatly appreciated.
Simply use pandas.Series.apply (with datetime imported via from datetime import datetime):
df['date'] = df.date.apply(lambda x: datetime.utcfromtimestamp(int(x)/1000).strftime('%Y-%m-%d %H:%M:%S'))
Another way to do it is by using pd.to_datetime as recommended by Panagiotos in the comments:
df['date'] = pd.to_datetime(df['date'],unit='ms')
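As a minimal sketch of the pd.to_datetime approach, assuming the column arrives as strings of epoch milliseconds (the frame below is hypothetical and reuses values quoted in this thread, and the 'date_text' column name is made up for illustration):

```python
import pandas as pd

# Hypothetical frame using epoch-millisecond values quoted in this thread.
df = pd.DataFrame({"date": ["1584199972000", "1645804609719"], "values": [30, 40]})

# Strings are converted to numbers first so that unit="ms" can be applied.
df["date"] = pd.to_datetime(pd.to_numeric(df["date"]), unit="ms")

# Optional: a text version without the fractional seconds.
df["date_text"] = df["date"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(df)
```

This converts the whole column in one vectorized call, which tends to be much faster than apply on thousands of rows.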
You can use to_numeric to convert the column to integers, div to divide it by 1000, and finally a loop over the column that formats each value with datetime.
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'date': ['1584199972000', '1645804609719'], 'values': [30,40]})
# Convert the strings to numbers and scale milliseconds to seconds
seconds = pd.to_numeric(df['date']).div(1000)
# Format each value as a UTC date string
formatted = []
for s in seconds:
    formatted.append(datetime.utcfromtimestamp(s).strftime('%Y-%m-%d %H:%M:%S'))
df['date'] = formatted
print(df)
Output:
                  date  values
0  2020-03-14 15:32:52      30
1  2022-02-25 15:56:49      40

get days from long timestamp csv file python

I have a csv file with a timestamp column spanning several years:
1990-05-12 14:01
.
.
1999-01-10 10:00
where the time is in hh:mm format. I'm trying to extract each day worth of data into a new csv file. Here's my code:
import datetime
import pandas as pd
df = pd.read_csv("/home/parallels/Desktop/ewh_log/hpwh_log.csv",parse_dates=True)
#change timestamp column format
def extract_months_data(df):
    df = pd.to_datetime(df['timestamp'])
    print(df)
def write_to_csv(df):
    print('writing ..')
    #todo
x1 = pd.to_datetime(df['timestamp'],format='%m-%d %H:%M').notnull().all()
if (x1)==True:
    extract_months_data(df)
else:
    x2 = pd.to_datetime(df['timestamp'])
    x2 = x2.dt.strftime('%m-%d %H:%M')
    write_to_csv(df)
The issue is that when I get to the following line
def extract_months_data(df):
    df = pd.to_datetime(df['timestamp'])
I get the following error:
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime
Is there an alternative way to do this with pandas without discarding the rest of the data? I saw posts that suggested using coerce, but that replaces the rest of the data with NaT.
Thanks
UPDATE:
This post answers half of the question: how to filter hours (or minutes) out of a timestamp column. The second part is how to extract a full day into another csv file. I'll post updates here once I reach a solution.
You are converting to datetime twice, which is not needed.
Something like this should work:
import pandas as pd
df = pd.read_csv('data.csv')
df['month_data'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M')
df['month_data'] = df['month_data'].dt.strftime('%m-%d %H:%M')
# Keep only the rows where month_data is not NaN
df = df[df['month_data'].notna()]
print(df)
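For the second half of the question, extracting each day's data into its own csv file, one possible approach is to group on the calendar date of the parsed timestamp. This is a sketch under assumptions: the in-memory sample and its 'value' column are hypothetical stand-ins for hpwh_log.csv, and the per-day file naming is only a suggestion.

```python
import io
import pandas as pd

# Hypothetical sample standing in for hpwh_log.csv; the 'value' column is assumed.
raw = io.StringIO(
    "timestamp,value\n"
    "1990-05-12 14:01,1\n"
    "1990-05-12 18:30,2\n"
    "1999-01-10 10:00,3\n"
)
df = pd.read_csv(raw)
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M")

# Group rows by calendar day; each group holds one day's worth of data.
daily = {str(day): group for day, group in df.groupby(df["timestamp"].dt.date)}

for day, group in daily.items():
    # group.to_csv(f"{day}.csv", index=False) would write one file per day.
    print(day, len(group))
```

Passing the explicit format with the year included keeps every timestamp inside the range pandas can represent, which is worth checking first when OutOfBoundsDatetime appears.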

Lack of desired output

In the code below, I am trying to get data for a specified date only.
It works perfectly for the code shown.
But if I change the date to 26-12-2020, it results in data of both 26-12-2020 and 27-12-2020.
import csv
import datetime
import os
import pandas as pd
import xlsxwriter
import numpy as np
from datetime import date
import datetime
import calendar
rdate = '27-12-2020'  # quoted: unquoted, 27-12-2020 is integer subtraction
data= pd.read_excel(r'C:/Clover Workspace/NPS/Customer Feedback-28-12-2020.xlsx')
data.drop(columns=['User ID','Comments','Purpose ID'],inplace= True, axis=1)
df = pd.DataFrame(data, columns=['Name','Rating','Date','Store','Feedback choice'])
df['Date'] = pd.to_datetime(data['Date'])
df= df[df['Date'].ge("27-12-2020")]
How can I generate the output only for the specified date, irrespective of the date on the excel sheet name?
The problem is here:
df= df[df['Date'].ge("27-12-2020")]
.ge means greater than or equal, so when you put in 26-12-2020 you get both days. Try using .eq instead:
df= df[df['Date'].eq("26-12-2020")]
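One caveat: an exact equality test only matches rows whose time of day is midnight. If the 'Date' column also carries times, a sketch like the following compares on the calendar date alone. The sample frame is hypothetical, and the explicit day-month-year format is an assumption about how the date is meant to be read.

```python
import pandas as pd

# Hypothetical data; the real frame comes from the poster's Excel file.
df = pd.DataFrame({
    "Name": ["a", "b", "c"],
    "Date": pd.to_datetime(["2020-12-26 09:15", "2020-12-26 17:40", "2020-12-27 08:00"]),
})

# An explicit format means day and month cannot be swapped during parsing.
target = pd.to_datetime("26-12-2020", format="%d-%m-%Y")

# normalize() resets each time to midnight, so rows match on the date alone.
one_day = df[df["Date"].dt.normalize() == target]
print(one_day)
```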

Can't parse a date from an excel file using Pandas

Data: pandas DataFrame, read from Excel
Month     Sales
01-01-17  1009
01-02-17  1004
..
01-12-19  2244
Code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.tsa.stattools import adfuller
import datetime
CHI = pd.read_excel('D:\DS\TS.xls', index="Month")
CHI['Month'] = pd.to_datetime(CHI['Month']).dt.date
CHI['NetSalesUSD'] = pd.to_numeric(CHI['NetSalesUSD'], errors='coerce')
result = adfuller(CHI)
Error received:
float() argument must be a string or a number, not 'datetime.date'
I tried converting to integer but still can't get the results. Any suggestions?
I think the issue here is Excel.
Excel likes to show dates as Month-Day for some reason.
Try changing the date format to Short Date in Excel, then save and run your Python script again.
It looks like Pandas is not recognizing the date format by default. You can instruct Pandas to use a custom date parser. See the Pandas documentation for more details.
In your case, it would look something like this:
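from datetime import datetime
def parse_custom_date(x):
    # pd.datetime was removed from pandas; use the standard library class
    return datetime.strptime(x, '%b-%y')
data_copy = pd.read_excel(
    r'D:\DS\DATA.xls',
    'CHI',
    index_col='Month',
    parse_dates=['Month'],
    # date_parser is deprecated since pandas 2.0 in favor of date_format
    date_parser=parse_custom_date,
)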
Note that your date format does not appear to have day of the month, so this would assume the first day of the month.
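The TypeError itself most likely arises because the whole frame, date column included, is handed to adfuller, which expects numeric values only. A minimal sketch of preparing a numeric series with a datetime index follows. The rows are hypothetical and copy the layout shown in the question, and the day-month-year reading of values like 01-02-17 is an assumption.

```python
import pandas as pd

# Hypothetical rows in the layout shown in the question (assumed day-month-year).
CHI = pd.DataFrame({
    "Month": ["01-01-17", "01-02-17", "01-12-19"],
    "Sales": ["1009", "1004", "2244"],
})

CHI["Month"] = pd.to_datetime(CHI["Month"], format="%d-%m-%y")
CHI["Sales"] = pd.to_numeric(CHI["Sales"], errors="coerce")

# Dates become the index, so only the numeric values remain as data.
sales = CHI.set_index("Month")["Sales"]
print(sales)
```

A test such as adfuller would then receive sales.dropna() rather than the full DataFrame.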

When I use apply function in pandas, it shows "TypeError: must be string, not float"

One column in dataframe is like this:
2018-01-23 23:55:07
I want to convert the values in this column to unix time.
Below is my code:
import time
import datetime
import pandas as pd
def convert_to_unix(s):
    return float(time.mktime(datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S").timetuple()))
pd.set_option('display.max_columns', None)
fields=['JOB_START_TIMESTAMP','JOB_END_TIMESTAMP','JOB_RUNTIME_SECONDS', 'JOB_NODES_USED']
df_temp=pd.read_csv('a.csv',usecols=fields)
df_temp['JOB_START_TIMESTAMP']=df_temp['JOB_START_TIMESTAMP'].apply(convert_to_unix)
Then it raises TypeError: must be string, not float.
Can anybody help me? Thanks very much!
The code below converts a date column (datetime64[ns]) to unix time (float64).
Import libraries
import pandas as pd
import numpy as np
from datetime import datetime
from time import mktime
Create sample dataframe
df = pd.DataFrame({'Date': ['2018-01-23 23:55:07', '2017-01-23 23:55:07', '2015-11-23 11:50:07',
'2013-01-03 13:55:07', '2007-01-24 23:55:07', '2017-12-23 12:55:07']})
df['Date'] = pd.to_datetime(df['Date'])
df
Function that converts to unix time
def convert_to_unix(s):
    # Use the frame passed in, not the global df; mktime interprets times as local time
    return s.apply(lambda x: mktime((x['Date']).timetuple()),axis=1)
Get unix time
df['unix_time'] = convert_to_unix(df)
df
df.dtypes
Alternative without using function
df['unix_time'] = df.apply(lambda x: mktime((x['Date']).timetuple()),axis=1)
df
Thanks Kunar. My problem was that there are NaT values in my data.
His answer works and is concise, but it was hidden in the comments, so I am putting it here.
df_temp['JOB_START_TIMESTAMP']=df_temp['JOB_START_TIMESTAMP'].apply(pd.Timestamp).apply(pd.Timestamp.timestamp)
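A vectorized alternative that tolerates missing values is to subtract the epoch and divide by one second. This is a sketch, not the poster's method: the sample series is hypothetical, and the naive timestamps are assumed to be UTC, whereas time.mktime interprets them as local time.

```python
import pandas as pd

# Hypothetical column with one missing entry.
s = pd.Series(["2018-01-23 23:55:07", None])

# errors="coerce" turns missing or unparseable entries into NaT.
ts = pd.to_datetime(s, errors="coerce")

# Seconds since the Unix epoch, treating the values as UTC; NaT becomes NaN.
unix_time = (ts - pd.Timestamp("1970-01-01")) / pd.Timedelta(seconds=1)
print(unix_time)
```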
