How to extract year, month, date from a date column? - python

I'm trying to extract date information from a date column and append the new columns to the original dataframe. However, I keep getting a message saying I cannot use .dt with this column. I'm not sure what I did wrong here; any help will be appreciated.
Error message that I got in python:

First do df.datecolumn = pd.to_datetime(df.datecolumn), then live happily ever after.

This will give you year, month and day in that month. You can also easily get week of the year and day of the week.
import pandas as pd
df = pd.DataFrame(data=[['1920-01-01'], ['2008-12-06']], columns=['Date'])
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].apply(lambda x : x.year)
df['Month'] = df['Date'].apply(lambda x : x.month)
df['Day'] = df['Date'].apply(lambda x : x.day)
print(df)
In your Time list you have a typo Dayorweek should be dayofweek.
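The answers above can be combined into one minimal sketch: once the column has been converted with pd.to_datetime, the .dt accessor becomes available. The extra column names (DayOfWeek, WeekOfYear) are only illustrative, and the week-of-year line assumes pandas 1.1 or newer, where .dt.isocalendar() exists.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['1920-01-01', '2008-12-06']})

# .dt only works on datetime-like columns, so convert the strings first
df['Date'] = pd.to_datetime(df['Date'])

df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['DayOfWeek'] = df['Date'].dt.dayofweek            # Monday=0 ... Sunday=6
df['WeekOfYear'] = df['Date'].dt.isocalendar().week  # assumes pandas >= 1.1

print(df)
```

Compared with .apply(lambda x: x.year), the .dt accessor is vectorized, so it is usually faster on large columns.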

Related

Check if value in Date column value is month end

I am new to Python, so I'm sorry if this sounds silly. I have a date column in a DataFrame. I need to check whether each value in the date column is the end of the month. If it is, add one day and put the result in a new date column; if not, replace the day with the first of that month.
For example, if the date is 2000/3/31, then the output date column will be 2000/4/01,
and if the date is 2000/3/30, then the output value in the date column will be 2000/3/1.
I could iterate over the column row by row, but I was wondering if there is a more pythonic way to do it.
Let's say my date column is called "Date", the new column I want to create is "Date_new", and my dataframe is df. I am trying to code it like this, but it gives me an error:
if(df['Date'].dt.is_month_end == 'True'):
    df['Date_new'] = df['Date'] + timedelta(days = 1)
else:
    df['Date_new'] =df['Date'].replace(day=1)
I turned your if statement into a function and modified it slightly so that it works on a DataFrame. I used the DataFrame .apply method with axis=1, which calls the function once per row (not once per column).
import pandas as pd
import datetime
df = pd.DataFrame({'Date': [datetime.datetime(2022, 1, 31), datetime.datetime(2022, 1, 20)]})
print(df)
def my_func(row):
    # row is one row of df; row['Date'] is a single pandas Timestamp
    if row['Date'].is_month_end:
        # last day of the month: move to the next day
        return row['Date'] + datetime.timedelta(days=1)
    else:
        # any other day: snap back to the first of the month
        return row['Date'].replace(day=1)
df['Date_new'] = df.apply(my_func, axis=1)
print(df)
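For larger frames, a vectorized variant may be preferable to a row-wise apply. The following is only a sketch of one possible approach, not the answerer's method: it builds both candidate results for the whole column and picks between them with Series.where. The sample dates are the ones from the question, and the helper names (is_end, next_day, month_start) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2000-03-31', '2000-03-30'])})

# Boolean Series: True where the date is the last day of its month
is_end = df['Date'].dt.is_month_end

# Candidate 1: the following day (used for month-end dates)
next_day = df['Date'] + pd.Timedelta(days=1)

# Candidate 2: the first day of the same month (used for all other dates)
month_start = df['Date'].dt.to_period('M').dt.to_timestamp()

# Keep next_day where is_end is True, otherwise fall back to month_start
df['Date_new'] = next_day.where(is_end, month_start)

print(df)
```

This also shows why the original if statement fails: df['Date'].dt.is_month_end is a whole Series of booleans, so it cannot be used directly as the condition of a plain if.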

Error converting string to date field in Pandas

As you can infer from the above, when I try to convert the string, it gives an error.
I tried the code below but got the same error, saying that day is not defined:
df['day'] = pd.to_datetime(df['day'],format='%d %b %Y %H:%M:%S:%f')
As an SO member suggested, I edited the code, but the index is still a string; it was not converted to a date.
If you don't want to create another column, then just this will do:
df.index = pd.to_datetime(df.index)
In your example, df['day'] actually appears to be your index. To fix this, you'd want to call pd.to_datetime on your index:
df.index = pd.to_datetime(df.index)
I could tell it was your index because pandas prints the name of the index on its own line, one row below the other column headers. Take this example:
df = pd.DataFrame({'a':[1,2,3], 'b':['a','b','c']})
df.set_index('a', inplace=True)
outputs:
   b
a
1  a
2  b
3  c
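As a minimal sketch of the index conversion, assuming the index holds strings in the layout implied by the format string from the question (the sample values and the value column below are invented for illustration):

```python
import pandas as pd

# Hypothetical data: the date strings live in the index, not in a column
df = pd.DataFrame(
    {'value': [10, 20]},
    index=['01 Jan 2017 10:15:30:000123', '02 Feb 2017 23:59:59:999999'],
)
df.index.name = 'day'

# df['day'] would raise a KeyError here, because 'day' is the index name.
# Convert the index itself instead, reusing the format from the question.
df.index = pd.to_datetime(df.index, format='%d %b %Y %H:%M:%S:%f')

print(df.index)
```

If the dates are needed as a regular column instead, df.reset_index() moves the index back into the frame.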

How to get from array with Timestamps - years, months, days in python?

How can I get the years, months, and days from an array of Timestamps?
I have a DataFrame whose index consists of Timestamps, and I tried this:
for i in data_frame.index:
    print(datetime.fromtimestamp(i).isoformat())
But I got this error:
print(datetime.fromtimestamp(i).isoformat()) ===>
===> TypeError: an integer is required (got type Timestamp)
First use df['date'] = pd.to_datetime(df['timestamp']) to convert to a proper datetime format,
then create new columns for year, month, and day:
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
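Since the question states that the Timestamps are in the index rather than in a column, a sketch along the following lines may be closer to the asker's situation. A DatetimeIndex exposes year, month, and day directly, without the .dt accessor. The sample data is invented for illustration.

```python
import pandas as pd

# Hypothetical frame whose index is a DatetimeIndex of pandas Timestamps
data_frame = pd.DataFrame(
    {'value': [1, 2]},
    index=pd.to_datetime(['2019-05-17 08:00:00', '2020-11-02 21:30:00']),
)

# DatetimeIndex has the date parts as plain attributes (no .dt needed)
data_frame['year'] = data_frame.index.year
data_frame['month'] = data_frame.index.month
data_frame['day'] = data_frame.index.day

# Each element is already a Timestamp, so isoformat() can be called on it
# directly; datetime.fromtimestamp() expects a number and is not needed.
for ts in data_frame.index:
    print(ts.isoformat())
```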

Extracting the hour from a time column in pandas

Suppose I have the following dataset:
How would I create a new column, to be the hour of the time?
For example, the code below works for individual times, but I haven't been able to generalise it for a column in pandas.
t = datetime.strptime('9:33:07','%H:%M:%S')
print(t.hour)
Use to_datetime to convert the strings to datetimes, then dt.hour to get the hour:
df = pd.DataFrame({'TIME':['9:33:07','9:41:09']})
# without an explicit format, parsing should be slower
#df['hour'] = pd.to_datetime(df['TIME']).dt.hour
df['hour'] = pd.to_datetime(df['TIME'], format='%H:%M:%S').dt.hour
print(df)
      TIME  hour
0  9:33:07     9
1  9:41:09     9
If you want to work with datetimes in the TIME column, you can assign the converted values back:
df['TIME'] = pd.to_datetime(df['TIME'], format='%H:%M:%S')
df['hour'] = df['TIME'].dt.hour
print(df)
                 TIME  hour
0 1900-01-01 09:33:07     9
1 1900-01-01 09:41:09     9
My suggestion:
df = pd.DataFrame({'TIME':['9:33:07','9:41:09']})
df['hour'] = df.TIME.str.extract(r"(^\d+):", expand=False)
str.extract(...) is a vectorized method that extracts a regular expression pattern (here r"(^\d+):", the digits before the first colon, which is the hour of TIME). With expand=False it returns a pandas Series, which is stored in the "hour" column. Note that the extracted values are strings, not integers.
You can also use extract() twice to pull out the 'hour' column:
df['hour'] = df.TIME.str.extract(r"(\d+:)", expand=False)
df['hour'] = df.hour.str.extract(r"(\d+)", expand=False)
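The regex-based answers produce the hour as text. If a numeric hour is wanted, a cast can be added on top, as in this small sketch; it assumes every TIME value starts with digits followed by a colon, otherwise the cast would fail on missing values.

```python
import pandas as pd

df = pd.DataFrame({'TIME': ['9:33:07', '9:41:09']})

# Extract the leading digits (the hour) as strings, then cast to integers.
# The raw string avoids Python's invalid-escape warning for "\d".
df['hour'] = df['TIME'].str.extract(r'^(\d+):', expand=False).astype(int)

print(df)
```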

Concatenate/Merge/Join two different Dataframes Pandas

I am looking to join two dataframes using pandas on their 'Date' columns. I usually use df2= pd.concat([df, df1],axis=1), but for some reason this is not working.
In this example, I am pulling the data from a SQL file, creating a new column called 'Date' that merges my year and quarter columns, and then pivoting. When I try to concatenate the two dataframes, they show up side by side instead of merged together.
What comes up:
Date Count of Cats Date Count of Dogs
What I want to come up:
Date Count of Cats Count of Dogs
Any ideas?
My other problem is that I am trying to make sure the Date column is written to Excel as a string and not as a datetime. Please keep this in mind when thinking about a solution.
Here is my code:
executeScriptsFromFile('cats.sql')
df = pd.DataFrame(cursor.fetchall())
df.columns = [rec[0] for rec in cursor.description]
monthend = {'Q1':'3/31','Q2':'6/30','Q3':'9/30','Q4':'12/31'}
df['Date']=df['QUARTER'].map(monthend)+'/'+ df['YEAR']
df['Date'] = pd.to_datetime(df['Date'])
df10= df.pivot_table(['Breed'], ['Date'], aggfunc=np.sum,fill_value=0)
df10.reset_index(drop=False, inplace=True)
df10.reindex_axis(['Breed', 'Count of Cats'], axis=1)
df10.columns = ('Breed', 'Count of Cats')
executeScriptsFromFile('dogs.sql')
df = pd.DataFrame(cursor.fetchall())
df.columns = [rec[0] for rec in cursor.description]
monthend = {'Q1':'3/31','Q2':'6/30','Q3':'9/30','Q4':'12/31'}
df['Date']=df['QUARTER'].map(monthend)+'/'+ df['YEAR']
df['Date'] = pd.to_datetime(df['Date'])
df11= df.pivot_table(['Breed'], ['Date'], aggfunc=np.sum,fill_value=0)
df11.reset_index(drop=False, inplace=True)
df11.reindex_axis(['Breed', 'Count of Dogs'], axis=1)
df11.columns = ('Breed', 'Count of Dogs')
df11a= df11.round(0)
df12= pd.concat([df10, df11a],axis=1)
I think you have to remove these lines:
df10.reset_index(drop=False, inplace=True)
df11.reset_index(drop=False, inplace=True)
because Date needs to stay in the index for concat to align the two frames by date.
To convert the index to strings, use:
df.index = df.index.astype(str)
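The idea can be sketched with two small stand-in frames in place of the pivoted query results. The frame names (cats, dogs, combined), the counts, and the date format are assumptions for illustration only; the point is that concat with axis=1 aligns on the index, so Date has to be the index of both frames.

```python
import pandas as pd

# Stand-ins for the two pivoted results, each indexed by Date
cats = pd.DataFrame(
    {'Date': pd.to_datetime(['2016-03-31', '2016-06-30']),
     'Count of Cats': [3, 5]}
).set_index('Date')
dogs = pd.DataFrame(
    {'Date': pd.to_datetime(['2016-03-31', '2016-06-30']),
     'Count of Dogs': [2, 7]}
).set_index('Date')

# axis=1 places the frames side by side, aligned on the shared Date index,
# so there is only one Date in the result
combined = pd.concat([cats, dogs], axis=1)

# Move Date back to a column and store it as text for the Excel export
combined = combined.rename_axis('Date').reset_index()
combined['Date'] = combined['Date'].dt.strftime('%m/%d/%Y')

print(combined)
```

An alternative that keeps Date as a regular column throughout is pd.merge(df10, df11, on='Date').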
