In the code below, I am trying to get data for a specified date only.
It works perfectly for the date shown in the code.
But if I change the date to 26-12-2020, it returns data for both 26-12-2020 and 27-12-2020.
import csv
import datetime
import os
import pandas as pd
import xlsxwriter
import numpy as np
from datetime import date
import calendar
rdate = "27-12-2020"  # quoted: unquoted, 27-12-2020 is integer subtraction (-2005)
data= pd.read_excel(r'C:/Clover Workspace/NPS/Customer Feedback-28-12-2020.xlsx')
data.drop(columns=['User ID','Comments','Purpose ID'],inplace= True, axis=1)
df = pd.DataFrame(data, columns=['Name','Rating','Date','Store','Feedback choice'])
df['Date'] = pd.to_datetime(data['Date'])
df= df[df['Date'].ge("27-12-2020")]
How can I generate the output only for the specified date, irrespective of the date on the excel sheet name?
The problem is here:
df= df[df['Date'].ge("27-12-2020")]
.ge means "greater than or equal to", so when you put in 26-12-2020 you get that day and every later day, which here is both days. Try using .eq instead:
df= df[df['Date'].eq("26-12-2020")]
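One caveat: .eq compares full timestamps, so it only matches rows whose time is exactly midnight. If the Date column may carry a time of day, a rough sketch like the one below compares the calendar day only. The sample rows and the name only_that_day are made up for illustration, and the dd-mm-yyyy format is an assumption about the data.

```python
import pandas as pd

# Made-up sample rows; the real data comes from the Excel file.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Date": ["26-12-2020 09:15:00", "26-12-2020 18:40:00", "27-12-2020 10:05:00"],
})

# Parse with an explicit day-first format so 05-12-2020 is never read as May 12.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y %H:%M:%S")

# Build the target day unambiguously instead of relying on string parsing.
rdate = pd.Timestamp(year=2020, month=12, day=26)

# normalize() sets the time part to midnight, so only the day is compared.
only_that_day = df[df["Date"].dt.normalize() == rdate]
print(only_that_day)
```

Because the filter uses rdate and not the file name, the date in the workbook's name does not matter.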
I have a DataFrame with a column containing timestamps. I would like to convert that column to datetime in Python and save the file with a column containing the date and time. Here is the code:
import pandas as pd
df = pd.DataFrame({
"time": [1465585763000, 1465586363000, 1465586963000,
1465587563000, 1465588163000]})
df
This could also work
import pandas as pd
from datetime import datetime as dt
d = {'time': [1465585763000, 1465586363000, 1465586963000,
1465587563000, 1465588163000]}
print(d['time'])
new = [dt.fromtimestamp(x/1000).strftime('%Y-%m-%d %H:%M:%S') for x in d['time']]
pd.to_datetime(new)
This could work
from datetime import datetime as dt
import pandas as pd
times = [
1465585763000,
1465586363000,
1465586963000,
1465587563000,
1465588163000]
start_ts = dt.timestamp(dt(1970, 1, 1))
dates = [dt.fromtimestamp(time / 1000 + start_ts) for time in times]
pd.to_datetime(dates)
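Another option, assuming the values are milliseconds since the Unix epoch (which the division by 1000 above suggests), is to let pandas do the conversion directly with the unit argument of pd.to_datetime. The column name datetime is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [1465585763000, 1465586363000, 1465586963000,
             1465587563000, 1465588163000]})

# unit="ms" tells pandas the integers are milliseconds since 1970-01-01 UTC.
# The result is timezone-naive and expressed in UTC.
df["datetime"] = pd.to_datetime(df["time"], unit="ms")
print(df)
```

Unlike datetime.fromtimestamp, this does not depend on the machine's local timezone. The frame can then be saved with df.to_csv or a similar writer.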
Data: pandas DataFrame, read from Excel
Month Sales
01-01-17 1009
01-02-17 1004
..
01-12-19 2244
Code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.tsa.stattools import adfuller
import datetime
CHI = pd.read_excel('D:\DS\TS.xls', index="Month")
CHI['Month'] = pd.to_datetime(CHI['Month']).dt.date
CHI['NetSalesUSD'] = pd.to_numeric(CHI['NetSalesUSD'], errors='coerce')
result = adfuller(CHI)
Error received:
float() argument must be a string or a number, not 'datetime.date'
I tried converting to integer but am still not able to get results. Any suggestions?
I think the issue here is Excel.
Excel likes to show dates as Month-Day for some reason.
Try changing the date format to Short Date in Excel, then save and run your Python script again.
It looks like Pandas is not recognizing the date format by default. You can instruct Pandas to use a custom date parser. See the Pandas documentation for more details.
In your case, it would look something like this:
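from datetime import datetime

def parse_custom_date(x):
    # '%b-%y' matches strings such as 'Jan-17'
    return datetime.strptime(x, '%b-%y')

# Note: date_parser is deprecated in pandas 2.0+ in favor of date_format.
data_copy = pd.read_excel(
    r'D:\DS\DATA.xls',  # raw string keeps the backslashes literal
    'CHI',              # sheet name
    index_col='Month',
    parse_dates=['Month'],
    date_parser=parse_custom_date,
)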
Note that your date format does not appear to have day of the month, so this would assume the first day of the month.
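Separately from date parsing, the error message itself points at another likely cause: adfuller expects a one-dimensional numeric series, while the code passes the whole DataFrame, including the date column. A minimal sketch, using made-up monthly numbers in place of the Excel sheet and the column name NetSalesUSD from the question, would pass only the numeric column:

```python
import numpy as np
import pandas as pd

# Made-up monthly data standing in for the Excel sheet (assumption).
rng = np.random.default_rng(0)
CHI = pd.DataFrame({
    "Month": pd.date_range("2017-01-01", periods=36, freq="MS"),
    "NetSalesUSD": rng.normal(1500, 300, size=36).round(),
})

# Keep the dates as the index and hand only the numeric column to the test.
CHI["NetSalesUSD"] = pd.to_numeric(CHI["NetSalesUSD"], errors="coerce")
sales = CHI.set_index("Month")["NetSalesUSD"].dropna()

try:
    from statsmodels.tsa.stattools import adfuller
except ImportError:
    adfuller = None  # statsmodels is not installed; skip the test itself

if adfuller is not None:
    # The first two items of the result are the test statistic and the p-value.
    adf_stat, p_value, *rest = adfuller(sales)
    print("ADF statistic:", adf_stat, "p-value:", p_value)
```

The dropna call matters because errors='coerce' turns unparseable values into NaN, which adfuller does not accept.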
I have the following date: 2019-11-20, which corresponds to week 47 of the calendar year. My Excel document says the same. However, when I compute it in Python I get week 46 instead. I have posted my code below, but I cannot see what is wrong with it. I tried splitting the column into separate date and time columns, but I still get the same problem. The local time on my laptop is correct. Thanks in advance for your help!
Here is my code:
import pandas as pd
from datetime import datetime
import numpy as np
import re
df = pd.read_csv (r'C:\Users\user\document.csv')
df['startedAt'].replace(regex=True,inplace=True,to_replace=r'\+01:00',value=r'')
df['startedAt'].replace(regex=True,inplace=True,to_replace=r'\+02:00',value=r'')
df['startedAt'] = df['startedAt'].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S').strftime('%d-%m-%y %H:%M:%S'))
df['endedAt'].replace(regex=True,inplace=True,to_replace=r'\+01:00',value=r'')
df['endedAt'].replace(regex=True,inplace=True,to_replace=r'\+02:00',value=r'')
df['endedAt'] = pd.to_datetime(df['endedAt'], format='%Y-%m-%d')
df['startedAt'] = pd.to_datetime(df['startedAt'])
df['Date_started'] = df['startedAt'].dt.strftime('%d/%m/%Y')
df['Time_started'] = df['startedAt'].dt.strftime('%H:%M:%S')
df['Date_started'] = pd.to_datetime(df['Date_started'], errors='coerce')
df['week'] = df['Date_started'].dt.strftime('%U')
print(df)
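A likely explanation is the '%U' directive: it counts weeks starting on Sunday, and days before the first Sunday of the year fall in week 0. ISO 8601 week numbering, which matches the week 47 expected here, can be read from dt.isocalendar() (available in pandas 1.1 and later). A small sketch with a single made-up row:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2019-11-20"]))

# %U: Sunday-based week of year, week 0 before the first Sunday.
week_u = dates.dt.strftime("%U")

# ISO 8601 week number (Monday-based, week 1 contains the first Thursday).
week_iso = dates.dt.isocalendar().week

print(week_u.iloc[0], week_iso.iloc[0])
```

Whether the spreadsheet uses ISO weeks is an assumption; it depends on which week function and options Excel is using.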
I am using the following code to generate a date series:
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import calendar
from datetime import datetime
from itertools import cycle, islice
month_input = "Jan"
year_input = 2018
month_start= str(month_input)
year_start = int(year_input)
start = pd.to_datetime(f'{month_start}{year_start}', format='%b%Y')
end = pd.to_datetime(f'{month_input}{year_start + 1}', format='%b%Y') - pd.Timedelta('1d') # Generating Date Range for an Year
daily_series_cal = pd.DataFrame({'Date': pd.date_range(start, end)})
When I run:
print(daily_series_cal["Date"][0])
the output is:
2018-01-01 00:00:00
How can I change the format of the whole column to 01/01/2018, i.e. mm/dd/yyyy?
It is possible with DatetimeIndex.strftime, but you lose the datetimes and get strings instead:
daily_series_cal = pd.DataFrame({'Date': pd.date_range(start, end).strftime('%m/%d/%Y')})
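If the datetimes are still needed for later calculations, one option is to keep the original column and add the formatted strings next to it. This is only a sketch: the column name Date_str is made up, and start and end are hard-coded here so the block is self-contained.

```python
import pandas as pd

# Same range as in the question: all days of 2018.
start = pd.Timestamp("2018-01-01")
end = pd.Timestamp("2018-12-31")
daily_series_cal = pd.DataFrame({"Date": pd.date_range(start, end)})

# Date stays datetime64; Date_str holds the mm/dd/yyyy text for display.
daily_series_cal["Date_str"] = daily_series_cal["Date"].dt.strftime("%m/%d/%Y")
print(daily_series_cal.head())
```

Sorting, filtering and date arithmetic keep working on Date, while Date_str is used for output only.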
One column in dataframe is like this:
2018-01-23 23:55:07
I want to convert the values in this column to unix time.
Below is my code:
def convert_to_unix(s):
    return float(time.mktime(datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S").timetuple()))
pd.set_option('display.max_columns', None)
fields=['JOB_START_TIMESTAMP','JOB_END_TIMESTAMP','JOB_RUNTIME_SECONDS', 'JOB_NODES_USED']
df_temp=pd.read_csv('a.csv',usecols=fields)
df_temp['JOB_START_TIMESTAMP']=df_temp['JOB_START_TIMESTAMP'].apply(convert_to_unix)
Then it raises the error: TypeError: must be string, not float.
Can anybody help me? Thanks very much!
The code below converts a date column (datetime64[ns]) to Unix time (float64).
Import libraries
import pandas as pd
import numpy as np
from datetime import datetime
from time import mktime
Create sample dataframe
df = pd.DataFrame({'Date': ['2018-01-23 23:55:07', '2017-01-23 23:55:07', '2015-11-23 11:50:07',
'2013-01-03 13:55:07', '2007-01-24 23:55:07', '2017-12-23 12:55:07']})
df['Date'] = pd.to_datetime(df['Date'])
df
Function that converts to unix time
def convert_to_unix(s):
    # s is the DataFrame; use the argument rather than the global df
    return s.apply(lambda x: mktime(x['Date'].timetuple()), axis=1)
Get unix time
df['unix_time'] = convert_to_unix(df)
df
df.dtypes
Alternative without using function
df['unix_time'] = df.apply(lambda x: mktime((x['Date']).timetuple()),axis=1)
df
Thanks Kunar. My problem is that there are NaT values (NaTType) in my data.
His answer works and is concise. Since it was hidden in the comments, I am putting it here.
df_temp['JOB_START_TIMESTAMP']=df_temp['JOB_START_TIMESTAMP'].apply(pd.Timestamp).apply(pd.Timestamp.timestamp)
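For columns with missing or malformed values, a vectorized variant may be more forgiving. The sketch below assumes the timestamps are naive and should be treated as UTC (unlike mktime, which uses the local timezone); the sample rows and the column name start_unix are made up.

```python
import pandas as pd

# Made-up rows: one valid timestamp, one missing value, one malformed string.
df_temp = pd.DataFrame({
    "JOB_START_TIMESTAMP": ["2018-01-23 23:55:07", None, "not a date"],
})

# errors="coerce" turns anything unparseable into NaT instead of raising.
start = pd.to_datetime(
    df_temp["JOB_START_TIMESTAMP"],
    format="%Y-%m-%d %H:%M:%S",
    errors="coerce",
)

# Seconds since the epoch as float64; NaT rows become NaN.
df_temp["start_unix"] = (start - pd.Timestamp("1970-01-01")) / pd.Timedelta(seconds=1)
print(df_temp)
```

Rows that failed to parse can then be found with df_temp["start_unix"].isna().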