Data: pandas DataFrame, read from Excel
Month Sales
01-01-17 1009
01-02-17 1004
..
01-12-19 2244
Code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.tsa.stattools import adfuller
import datetime
CHI = pd.read_excel('D:\DS\TS.xls', index="Month")
CHI['Month'] = pd.to_datetime(CHI['Month']).dt.date
CHI['NetSalesUSD'] = pd.to_numeric(CHI['NetSalesUSD'], errors='coerce')
result = adfuller(CHI)
Error received:
float() argument must be a string or a number, not 'datetime.date'
I tried converting to integer, but I still cannot get a result. Any suggestions?
I think the issue here is Excel.
Excel likes to show dates as Month-Day for some reason.
Try changing the date format to Short Date in Excel, then save the file and run your Python script again.
It looks like Pandas is not recognizing the date format by default. You can instruct Pandas to use a custom date parser. See the Pandas documentation for more details.
In your case, it would look something like this:
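from datetime import datetime

def parse_custom_date(x):
    # Parse strings such as 'Jan-17'; the day defaults to the 1st
    return datetime.strptime(x, '%b-%y')

# Note: date_parser is deprecated since pandas 2.0; date_format='%b-%y' replaces it there
data_copy = pd.read_excel(
    r'D:\DS\DATA.xls',
    'CHI',
    index_col='Month',
    parse_dates=['Month'],
    date_parser=parse_custom_date,
)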
Note that your date format does not appear to have day of the month, so this would assume the first day of the month.
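Separately from date parsing, the error message suggests that adfuller was handed the whole DataFrame, date column included, while it expects a one-dimensional numeric series. A minimal sketch of one way to prepare such a series follows. It assumes the dates are day-month-year ('%d-%m-%y'), uses the first two sample values from the question plus a made-up unparseable third one, and leaves the adfuller call as a comment so that the snippet runs without statsmodels installed.

```python
import pandas as pd

# Small stand-in for the Excel data; the third row is made up for illustration.
CHI = pd.DataFrame({
    "Month": ["01-01-17", "01-02-17", "01-03-17"],
    "NetSalesUSD": ["1009", "1004", "n/a"],
})

# Assumption: the strings are day-month-year.
CHI["Month"] = pd.to_datetime(CHI["Month"], format="%d-%m-%y")
CHI["NetSalesUSD"] = pd.to_numeric(CHI["NetSalesUSD"], errors="coerce")

# Keep the dates as the index and pass only the numeric column on.
sales = CHI.set_index("Month")["NetSalesUSD"].dropna()

# from statsmodels.tsa.stattools import adfuller
# result = adfuller(sales)
```

Dropping the NaN values produced by errors='coerce' matters here, since the test is not meant to run on missing values.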
Related
I wrote a program that traverses a DataFrame and, whenever it encounters a column named 'Date', is supposed to convert every row in that column to a datetime object using pd.to_datetime with the given format. But this is not working for me.
The 'Date' column in my dataset contains dates in various formats and with different separators, for example 26/04/2007, 01-15-1998, 2020-12-2. When I debug, I get an error for the dates that are not in the specified format.
Isn't the whole point of the method to convert dates in any format to datetime objects in the format specified?
My code:
from dateutil.parser import parse
import re
from datetime import datetime
import calendar
import pandas as pd
def date_fun(filepath):
    date_list=['Date', 'date', 'Dates', 'dates']
    for i in filepath.columns:
        for j in date_list:
            if i==j:
                filepath[i]=pd.to_datetime(filepath[i], format='%d-%m-%Y')
main_path = pd.read_csv('C:/Data_Cleansing/lockdown_us.csv')
fpath=main_path.copy()
date_fun(fpath)
Error: time data '26-2016-09' does not match format '%d-%m-%Y' (match)
Where is the mistake in my code?
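The format argument tells pd.to_datetime what the input strings look like. It does not choose an output format, so any value that does not match it raises an error. One possible approach for a column with mixed formats is to try a short list of candidate formats per value. The sketch below is only an illustration: the helper name parse_mixed_date and the list of formats are assumptions based on the three example dates in the question, and an ambiguous value such as 01-02-1998 would be read month-first under this ordering.

```python
from datetime import datetime

import pandas as pd

# Candidate formats, tried in order (assumed from the examples in the question).
CANDIDATE_FORMATS = ("%d/%m/%Y", "%m-%d-%Y", "%Y-%m-%d")

def parse_mixed_date(text, formats=CANDIDATE_FORMATS):
    """Return the first successful parse of text, or NaT if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return pd.NaT

dates = pd.Series(["26/04/2007", "01-15-1998", "2020-12-2", "not a date"])
parsed = pd.to_datetime(dates.map(parse_mixed_date))
print(parsed)
```

Values that match none of the formats come back as NaT, so they can be inspected afterwards instead of stopping the whole conversion.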
I fetch data with the yfinance package and convert it into a pandas DataFrame.
However, I am unable to save the DataFrame to an Excel file.
ValueError: Excel does not support datetimes with timezones. Please
ensure that datetimes are timezone unaware before writing to Excel.
This is what the DataFrame looks like. It should have 8 columns, but Spyder says it has 7.
Below is my code:
import yfinance as yf
import pandas as pd
stock = yf.Ticker("BABA")
# get stock info
stock.info
# get historical market data
hist = stock.history(start="2021-03-25",end="2021-05-20",interval="15m")
hist = pd.DataFrame(hist)
# pd.to_datetime(hist['Datetime'])
# hist['Datetime'].dt.tz_localize(None)
hist.to_excel(excel_writer= "D:/data/python projects/stock_BABA2.xlsx")
You can remove the time zone information from the DatetimeIndex using DatetimeIndex.tz_localize(), as follows:
hist.index = hist.index.tz_localize(None)
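You can also use tz_convert(None), which converts the timestamps to UTC before dropping the time zone, so the resulting values are in UTC rather than local time. In your situation it should work with: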
hist.index = hist.index.tz_convert(None)
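The two suggestions both produce a timezone-naive index, but the resulting clock times differ. The sketch below illustrates the difference on a small made-up DataFrame. The America/New_York time zone and the Close values are assumptions for the example, not taken from the yfinance output.

```python
import pandas as pd

# Made-up stand-in for the yfinance history: a tz-aware 15-minute index.
idx = pd.date_range("2021-03-25 09:30", periods=3, freq="15min",
                    tz="America/New_York")
hist = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=idx)

# tz_localize(None) drops the zone and keeps the local wall-clock time.
local_naive = hist.index.tz_localize(None)

# tz_convert(None) converts to UTC first, then drops the zone.
utc_naive = hist.index.tz_convert(None)

print(local_naive[0])
print(utc_naive[0])
```

Either result can be assigned back to hist.index before calling to_excel. Which one to pick depends on whether the spreadsheet should show exchange-local times or UTC.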
In the code below, I am trying to get data for a specified date only.
It works perfectly for the code shown.
But if I change the date to 26-12-2020, it results in data of both 26-12-2020 and 27-12-2020.
import csv
import datetime
import os
import pandas as pd
import xlsxwriter
import numpy as np
from datetime import date
import datetime
import calendar
rdate = 27-12-2020
data= pd.read_excel(r'C:/Clover Workspace/NPS/Customer Feedback-28-12-2020.xlsx')
data.drop(columns=['User ID','Comments','Purpose ID'],inplace= True, axis=1)
df = pd.DataFrame(data, columns=['Name','Rating','Date','Store','Feedback choice'])
df['Date'] = pd.to_datetime(data['Date'])
df= df[df['Date'].ge("27-12-2020")]
How can I generate the output only for the specified date, irrespective of the date on the excel sheet name?
The problem is here:
df= df[df['Date'].ge("27-12-2020")]
.ge means "greater than or equal to", so when you put in 26-12-2020 you get both days. Try using .eq instead:
df= df[df['Date'].eq("26-12-2020")]
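One caveat: .eq compares full timestamps, so if the Date column also carries a time of day, rows from the requested date will not match a bare date. A minimal sketch of a workaround follows, using made-up rows and assuming the column has already been converted with pd.to_datetime. It normalizes each timestamp to midnight before comparing.

```python
import pandas as pd

# Made-up feedback rows; only the column names follow the question.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Date": pd.to_datetime([
        "2020-12-26 10:15", "2020-12-27 09:00", "2020-12-26 18:40",
    ]),
})

# Build the requested date explicitly to avoid day/month ambiguity.
rdate = pd.Timestamp(year=2020, month=12, day=26)

# normalize() sets the time of day to 00:00 so only the date is compared.
same_day = df[df["Date"].dt.normalize().eq(rdate)]
print(same_day)
```

Because rdate is a separate variable, the filter no longer depends on the date in the Excel file name.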
I have the date 2019-11-20, which falls in week 47 of the calendar year. My Excel document agrees. However, when I compute it in Python I get week 46 instead. My code is below, but I cannot see what is wrong with it. I tried splitting the column into separate date and time columns, but I still get the same result. My laptop's local time is correct. Thanks in advance for your help!
Here is my code:
import pandas as pd
from datetime import datetime
import numpy as np
import re
df = pd.read_csv (r'C:\Users\user\document.csv')
df['startedAt'].replace(regex=True,inplace=True,to_replace=r'\+01:00',value=r'')
df['startedAt'].replace(regex=True,inplace=True,to_replace=r'\+02:00',value=r'')
df['startedAt'] = df['startedAt'].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S').strftime('%d-%m-%y %H:%M:%S'))
df['endedAt'].replace(regex=True,inplace=True,to_replace=r'\+01:00',value=r'')
df['endedAt'].replace(regex=True,inplace=True,to_replace=r'\+02:00',value=r'')
df['endedAt'] = pd.to_datetime(df['endedAt'], format='%Y-%m-%d')
df['startedAt'] = pd.to_datetime(df['startedAt'])
df['Date_started'] = df['startedAt'].dt.strftime('%d/%m/%Y')
df['Time_started'] = df['startedAt'].dt.strftime('%H:%M:%S')
df['Date_started'] = pd.to_datetime(df['Date_started'], errors='coerce')
df['week'] = df['Date_started'].dt.strftime('%U')
print(df)
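A likely cause, judging from the code above, is the '%U' directive. It numbers weeks starting on Sunday and counts the days before the first Sunday of the year as week 0, whereas ISO 8601 week numbering puts 2019-11-20 in week 47. The sketch below compares the two for that single date. It is an illustration, not a drop-in replacement for the code above.

```python
from datetime import date

import pandas as pd

d = date(2019, 11, 20)

# '%U': Sunday-based week number; days before the first Sunday are week 00.
week_u = d.strftime("%U")

# ISO 8601 week number (weeks start on Monday).
week_iso = d.isocalendar()[1]

# The same ISO week number on a pandas datetime column.
s = pd.Series(pd.to_datetime(["2019-11-20"]))
iso_weeks = s.dt.isocalendar().week

print(week_u, week_iso, int(iso_weeks.iloc[0]))
```

Series.dt.isocalendar() requires pandas 1.1 or later. On older versions, calling .isocalendar() on each Python date is an alternative.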
I have a pandas.DataFrame indexed by time, as seen below. The time is in Epoch time. When I graph the second column these time values display along the x-axis. I want a more readable time in minutes:seconds.
In [13]: print df.head()
Time
1481044277379 0.581858
1481044277384 0.581858
1481044277417 0.581858
1481044277418 0.581858
1481044277467 0.581858
I have tried some pandas functions, and some methods for converting the whole column, I visited: Pandas docs, this question and the cool site.
I am using pandas 0.18.1
If you read your data with read_csv, you can use a custom date parser:
import datetime
import pandas as pd
# example.csv
'''
Time,Value
1481044277379,0.581858
1481044277384,0.581858
1481044277417,0.581858
1481044277418,0.581858
1481044277467,0.581858
'''
def dateparse(time_in_ms):
    # The file stores epoch milliseconds; fromtimestamp() expects seconds
    return datetime.datetime.fromtimestamp(float(time_in_ms) / 1000)
dtype = {"Value": float}
# Note: date_parser is deprecated since pandas 2.0
df = pd.read_csv("example.csv", dtype=dtype, parse_dates=["Time"], date_parser=dateparse)
print(df)
You can convert an epoch timestamp to HH:MM with:
import datetime as dt
hours_mins = dt.datetime.fromtimestamp(1347517370).strftime('%H:%M')
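Adding a column to your pandas.DataFrame can be done as follows (this assumes a 'timestamp' column holding epoch seconds; divide by 1000 first if the values are in milliseconds):
df['H_M'] = pd.Series([dt.datetime.fromtimestamp(int(ts)).strftime('%H:%M')
                       for ts in df['timestamp']]).values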
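Since the question asks for minutes:seconds and the index holds epoch milliseconds, another option is to convert the index with pd.to_datetime(..., unit='ms') and format it afterwards. The sketch below rebuilds the first three rows from the question. The column name Value and the variable labels are assumptions, and the resulting times are in UTC.

```python
import pandas as pd

# First three rows from the question; 'Value' is an assumed column name.
df = pd.DataFrame(
    {"Value": [0.581858, 0.581858, 0.581858]},
    index=pd.Index([1481044277379, 1481044277384, 1481044277417], name="Time"),
)

# Interpret the integers as milliseconds since the epoch (UTC, tz-naive).
df.index = pd.to_datetime(df.index, unit="ms")

# Minutes:seconds labels, e.g. for axis tick labels.
labels = df.index.strftime("%M:%S")
print(labels[0])
```

Once the index is a DatetimeIndex, plotting libraries such as matplotlib can usually format the axis themselves, so the string labels are only needed when a fixed text format is wanted.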