I have a DataFrame with a column date of type datetime64[ns].
When I try to create a new column day in MM-DD format from the date column, only the first of the methods below works. Why doesn't the second method work in pandas?
df['day'] = df['date'].dt.strftime('%m-%d')
df['day2'] = str(df['date'].dt.month) + '-' + str(df['date'].dt.day)
Result for one row:
day 01-04
day2 0 1\n1 1\n2 1\n3 1\n4 ...
Column types:
day object
day2 object
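The problem is that str(df['date'].dt.month) does not convert each value separately. It converts the whole Series into one string (its printed representation, including the index and the dtype line), and that single string is then assigned to every row. The correct way is to use Series.astype(str):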
df['day2'] = df['date'].dt.month.astype(str) + '-' + df['date'].dt.day.astype(str)
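Note that astype(str) does not zero-pad, so January 4 becomes 1-4 rather than 01-04. A minimal sketch (the sample dates are made up for illustration) that pads each part with Series.str.zfill to match the strftime result, and also shows that str() on a Series returns one multi-line string:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'date': pd.to_datetime(['2023-01-04', '2023-11-25'])})

# Vectorized formatting; strftime zero-pads month and day
df['day'] = df['date'].dt.strftime('%m-%d')

# Build the same text from the parts; zfill(2) adds the leading zero
df['day2'] = (df['date'].dt.month.astype(str).str.zfill(2)
              + '-'
              + df['date'].dt.day.astype(str).str.zfill(2))

# str() on a whole Series gives ONE string (its printed repr), not one per row
whole_series_as_text = str(df['date'].dt.month)

print(df)
print(repr(whole_series_as_text))
```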
I have a column in my DataFrame containing dates such as 1/6/2023 (m/d/yyyy format). The column's dtype is object, and I want to convert it to int64. I tried the following code, but it changes the date values drastically:
df = df.astype({'date':'int'})
Is there an alternative way to do this?
Convert the values to datetimes, then to strings (here in YYYYMMDD format), and finally to integers. The example below parses day-first (dayfirst=True), so 1/6/2023 becomes 1 June 2023:
print (df)
date
0 1/6/2023
df['date'] = pd.to_datetime(df['date'], dayfirst=True).dt.strftime('%Y%m%d').astype(int)
print (df)
date
0 20230601
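If the strings really are month-first (m/d/yyyy, as described in the question), a sketch under that assumption passes an explicit format instead of dayfirst=True, so 1/6/2023 becomes 20230106. The integer can also be built arithmetically from the date parts, which skips the string round trip. The sample values are illustrative:

```python
import pandas as pd

# Hypothetical sample data in m/d/yyyy format
df = pd.DataFrame({'date': ['1/6/2023', '12/25/2022']})

# Parse month-first explicitly
parsed = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Option 1: format as YYYYMMDD text, then cast to int64
df['date_int'] = parsed.dt.strftime('%Y%m%d').astype('int64')

# Option 2: combine the parts numerically: YYYY*10000 + MM*100 + DD
df['date_int2'] = (parsed.dt.year * 10000
                   + parsed.dt.month * 100
                   + parsed.dt.day).astype('int64')

print(df)
```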
I am new to Python, so I'm sorry if this sounds silly. I have a date column in a DataFrame. I need to check whether each value in the date column is the end of the month. If it is, add one day and put the result in a new date column; if not, replace the day with the first of that month.
For example, if the date is 2000/3/31, the new column will be 2000/4/01,
and if the date is 2000/3/30, the new value will be 2000/3/1.
I could iterate over the column row by row, but I was wondering whether there is a more pythonic way to do it.
Say my date column is called "Date", the new column I want to create is "Date_new" and my DataFrame is df. I am trying to code it like this, but it gives me an error:
if(df['Date'].dt.is_month_end == 'True'):
    df['Date_new'] = df['Date'] + timedelta(days = 1)
else:
    df['Date_new'] =df['Date'].replace(day=1)
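I turned your if statement into a function and modified it a bit so it works on a DataFrame. Your version fails because if needs a single True/False, while df['Date'].dt.is_month_end is a whole Series of booleans. I used DataFrame.apply with axis=1, so the function is called once per row (each row is passed in as a Series):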
import pandas as pd
import datetime
df = pd.DataFrame({'Date': [datetime.datetime(2022, 1, 31), datetime.datetime(2022, 1, 20)]})
print(df)
def my_func(row):
    # row is one row of df, so row['Date'] is a single Timestamp
    if row['Date'].is_month_end:
        return row['Date'] + datetime.timedelta(days = 1)
    else:
        return row['Date'].replace(day=1)
df['Date_new'] = df.apply(my_func, axis=1)
print(df)
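apply with axis=1 still calls a Python function once per row. A vectorized alternative, sketched here with the same sample data, uses Series.where: month-end dates get one day added, and all other dates fall back to the first of their month (computed by subtracting day - 1 days). Both branches therefore land on the first day of a month:

```python
import datetime

import pandas as pd

df = pd.DataFrame({'Date': [datetime.datetime(2022, 1, 31), datetime.datetime(2022, 1, 20)]})

# First day of the month each date falls in: subtract (day - 1) days
month_start = df['Date'] - pd.to_timedelta(df['Date'].dt.day - 1, unit='D')

# Keep Date + 1 day where the date is a month end, otherwise use month_start
df['Date_new'] = (df['Date'] + pd.Timedelta(days=1)).where(df['Date'].dt.is_month_end, month_start)

print(df)
```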
In my DataFrame I have a column [date_time/full_company_name] that contains a date, a time and a company name. I want to split it into 2 columns, one with the date and time and one with the company name. The issue is that they are directly adjacent, e.g.
[2011-11-19 12:22:10Anderson-Henderson]
So my initial idea of using the following code:
split = df[['date_time', 'full_company_name']] = df['date_time/full_company_name'].str.split('/', n=1, expand=True)
returned 2 columns, but the first one contains all the information and the second one has no values.
How can I insert a '/' between the date and the company name in my initial DataFrame to make use of this kind of splitting? Or is there an easier way overall?
You can also do this with string slicing.
First, use astype(str) and str.strip() to remove the surrounding square brackets:
df['date_time/full_company_name']=df['date_time/full_company_name'].astype(str).str.strip('[]')
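Then assign the columns. The slices assume each value is still wrapped in quotes after the strip (as in the output below), so [1:20] picks the 19-character timestamp after the opening quote and [20:-1] the company name before the closing quote. For plain strings without quotes, use .str[:19] and .str[19:] instead: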
df['date_time']=df['date_time/full_company_name'].str[1:20]
df['full_company_name']=df['date_time/full_company_name'].str[20:-1]
Now if you print df you will get:
date_time/full_company_name date_time full_company_name
0 '2011-11-19 12:22:10Anderson-Henderson' 2011-11-19 12:22:10 Anderson-Henderson
1 '2011-11-19 12:22:10Anderson-Henderson' 2011-11-19 12:22:10 Anderson-Henderson
I hope you find a better solution, but until then, here is one that works.
Split on ":" into three columns. The last piece then starts with the two-digit seconds, so move those two characters back onto the date column:
# n=2 keeps any ':' inside the company name in the last column
df[['date', 'hour', 'name']] = df["col"].str.split(':', n=2, expand=True)
df['date'] = df['date'] + ":" + df['hour'] + ":" + df['name'].str[:2]
df['name'] = df['name'].str[2:]
Output:
col date name
2011-11-19 12:22:10Anderson-Henderson 2011-11-19 12:22:10 Anderson-Henderson
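A regex-based variant, sketched under the assumption that every value starts with a timestamp in exactly the YYYY-MM-DD HH:MM:SS form shown above, splits with Series.str.extract. Named groups become the new column names, and no '/' separator is needed. The sample row is taken from the question:

```python
import pandas as pd

df = pd.DataFrame({'date_time/full_company_name': ['2011-11-19 12:22:10Anderson-Henderson']})

# Group 1: the fixed-width timestamp; group 2: everything after it
pattern = r'^(?P<date_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?P<full_company_name>.+)$'
parts = df['date_time/full_company_name'].str.extract(pattern)

# Attach the new columns and parse the timestamp text into datetimes
df = df.join(parts)
df['date_time'] = pd.to_datetime(df['date_time'])

print(df)
```

Rows that do not match the pattern get NaN in both new columns, which makes malformed values easy to spot.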
Suppose I have the following dataset:
How would I create a new column containing the hour of each time?
For example, the code below works for individual times, but I haven't been able to generalise it for a column in pandas.
from datetime import datetime
t = datetime.strptime('9:33:07','%H:%M:%S')
print(t.hour)
Use to_datetime to convert the column to datetimes, then take dt.hour:
df = pd.DataFrame({'TIME':['9:33:07','9:41:09']})
#should be slower, because pandas has to infer the format
#df['hour'] = pd.to_datetime(df['TIME']).dt.hour
df['hour'] = pd.to_datetime(df['TIME'], format='%H:%M:%S').dt.hour
print (df)
TIME hour
0 9:33:07 9
1 9:41:09 9
If you want to keep working with datetimes in column TIME, you can assign the converted values back:
df['TIME'] = pd.to_datetime(df['TIME'], format='%H:%M:%S')
df['hour'] = df['TIME'].dt.hour
print (df)
TIME hour
0 1900-01-01 09:33:07 9
1 1900-01-01 09:41:09 9
My suggestion:
df = pd.DataFrame({'TIME':['9:33:07','9:41:09']})
df['hour']= df.TIME.str.extract(r"(^\d+):", expand=False)
str.extract(...) is a vectorized method that extracts the first match of a regular expression pattern (here "(^\d+):", which captures the hour part of TIME). With expand=False it returns a pandas Series instead of a DataFrame.
The result is stored in the "hour" column
You can also use extract() twice to pull out the 'hour' column:
df['hour'] = df.TIME.str.extract(r"(\d+:)", expand=False)
df['hour'] = df.hour.str.extract(r"(\d+)", expand=False)
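Both extract answers return the hour as text (for example '9'), not as a number. A short sketch, with sample times assumed for illustration, converts it with astype(int) and shows an equivalent split-based version:

```python
import pandas as pd

df = pd.DataFrame({'TIME': ['9:33:07', '9:41:09', '14:05:00']})

# extract returns strings; cast to int for numeric work
df['hour'] = df['TIME'].str.extract(r'^(\d+):', expand=False).astype(int)

# Same result: take everything before the first ':'
df['hour_split'] = df['TIME'].str.split(':').str[0].astype(int)

print(df)
```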
I have the following dataframe
How can I aggregate (sum) the number of tickets for every month?
I tried:
df_res[df_res["type"]=="other"].groupby(["type","date"])["n_tickets"].sum()
The date column has dtype object.
Assign the filtered rows to a new DataFrame first, so that the Series created by Series.dt.month has the same index as the rows being grouped:
#if necessary convert to datetimes
df_res['date'] = pd.to_datetime(df_res['date'])
df = df_res[df_res["type"]=="pax"]
#type is the same in every filtered row, so it can be omitted
out = df.groupby(df["date"].dt.month)["n_tickets"].sum()
#if you need a level with the constant value `pax`
#out = df.groupby(['type',df["date"].dt.month])["n_tickets"].sum()
If you want to group by pax versus everything else (no pax):
import numpy as np
types = np.where(df_res["type"]=="pax", 'pax', 'no pax')
df_res.groupby([types, df_res["date"].dt.month])["n_tickets"].sum()
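Grouping by dt.month alone merges the same month from different years (January 2022 and January 2023 end up in one group). If that is not wanted, one option is to group by calendar month with dt.to_period('M'). A sketch with made-up sample data, reusing the column names from the question:

```python
import pandas as pd

# Hypothetical sample data
df_res = pd.DataFrame({
    'type': ['pax', 'pax', 'other', 'pax'],
    'date': ['2022-01-05', '2023-01-10', '2022-01-07', '2022-02-01'],
    'n_tickets': [2, 3, 5, 4],
})
df_res['date'] = pd.to_datetime(df_res['date'])

pax = df_res[df_res['type'] == 'pax']

# Month number only: January of every year falls into one group
by_month = pax.groupby(pax['date'].dt.month)['n_tickets'].sum()

# Year and month together: one group per calendar month
by_year_month = pax.groupby(pax['date'].dt.to_period('M'))['n_tickets'].sum()

print(by_month)
print(by_year_month)
```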