How do you pull a certain date from a datetime column? I have been using this:
df.loc[(df['column name'] == 'date')]
But it cannot find the date, although it is in the df.
Your datetime column probably has a finer granularity than just the date (year, month, day). The default in pandas is nanoseconds (ns), but it could also be just seconds in your case, depending on the data source. You can check the dtype with df['column name'].dtype; this also reveals the case where your column isn't actually of datetime dtype, in which case you need to cast it to datetime first.
For example, '2001-12-15' is not equal to '2001-12-15 18:36:45.2242',
nor is '2001-12-15' equal to '2001-12-15 18:36:45'.
If you only need dates, set the datetime column to just the date like this, using the .dt accessor for datetime components:
df['column name'] = df['column name'].dt.date
Then you'll be able to select rows by date. Note that .dt.date produces Python date objects, so compare against a date object rather than a plain string:
df.loc[df['column name'] == pd.Timestamp('2001-12-15').date()]
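As a minimal sketch of the filtering step, here are two ways to match a calendar day without overwriting the original column. The column names `when` and `value` and the sample data are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; "when" and "value" are illustrative names.
df = pd.DataFrame({
    "when": pd.to_datetime([
        "2001-12-15 18:36:45",
        "2001-12-15 09:00:00",
        "2001-12-16 00:00:01",
    ]),
    "value": [1, 2, 3],
})

target = pd.Timestamp("2001-12-15")

# Option 1: normalize() resets the time part to midnight and keeps the
# datetime64 dtype, so the comparison stays vectorised.
same_day = df.loc[df["when"].dt.normalize() == target]

# Option 2: compare Python date objects on both sides.
same_day_alt = df.loc[df["when"].dt.date == target.date()]

print(same_day)
```

Both options leave the time-of-day information in the column intact, which helps if it is needed later.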
Related
1. Question
I have a dataframe, and the Year-Month column contains the year and month which I want to extract.
For example, an element in this column is "2022-10". And I want to extract year=2022, month=10 from it.
My current solution is to use apply and lambda function:
df['xx_month'] = df['Year-Month'].apply(lambda x: int(x.split('-')[1]))
But it's super slow on a huge dataframe.
How to do it more efficiently?
2. Solutions
Thanks for your wisdom, I summarized each one's solution with the code:
(1) split by '-' and join #Vitalizzare
pandas.Series.str.split - split strings of a series, if expand=True then return a data frame with each part in a separate column;
pandas.DataFrame.set_axis - if axis='columns' then rename column names of a data frame;
pandas.DataFrame.join - if the indices are equal, then the frames stacked together horizontally are returned.
import pandas as pd
df = pd.DataFrame({'Year-Month': ['2022-10', '2022-11', '2022-12']})
df = df.join(
    df['Year-Month']
    .str.split('-', expand=True)
    .set_axis(['year', 'month'], axis='columns')
)
(2) convert the datatype from object (str) into datetime format #Neele22
import pandas as pd
df['Year-Month'] = pd.to_datetime(df['Year-Month'], format="%Y-%m")
(3) use regex or datetime to extract year and month #mozway
df['Year-Month'].str.extract(r'(?P<year>\d+)-(?P<month>\d+)').astype(int)
# If you want to assign the output to the same DataFrame while removing the original Year-Month:
df[['year', 'month']] = df.pop('Year-Month').str.extract(r'(\d+)-(\d+)').astype(int)
Or use datetime:
date = pd.to_datetime(df['Year-Month'])
df['year'] = date.dt.year
df['month'] = date.dt.month
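The approaches summarised above can be put side by side in a short sketch. One detail worth noting: `str.split` returns strings, so a cast to `int` is needed to get the same integer result as the `apply`/`lambda` version. The variable names `parts` and `via_datetime` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Year-Month": ["2022-10", "2022-11", "2022-12"]})

# Vectorised string split; the pieces are strings, so cast them to int.
parts = df["Year-Month"].str.split("-", expand=True).astype(int)
parts.columns = ["year", "month"]

# Datetime intermediate; the explicit format avoids format guessing.
date = pd.to_datetime(df["Year-Month"], format="%Y-%m")
via_datetime = pd.DataFrame({"year": date.dt.year, "month": date.dt.month})

print(parts)
print(via_datetime)
```

Both variants avoid a Python-level function call per row, which is the likely reason the `apply` version is slow on a huge dataframe.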
3. Follow up question
But there is a problem if I want to subtract the 'Year-Month' column from other datetime columns after converting the incomplete 'Year-Month' column from string to datetime.
For example, say I want the rows whose timestamp is no more than 2 months after the Year-Month of each record.
import dateutil # dateutil is a better package than datetime package according to my experience
df[(df['timestamp'] - df['Year-Month'])>= dateutil.relativedelta.relativedelta(months=0) and (df['timestamp'] - df['Year-Month'])<= datetime.timedelta(months=2)]
This code raises a TypeError when subtracting the converted Year-Month column from the actual datetime column:
TypeError: Cannot subtract tz-naive and tz-aware datetime-like objects
The types for these two columns are:
Year-Month is datetime64[ns]
timestamp is datetime64[ns, UTC]
Then, I tried to specify utc=True when changing Year-Month to datetime type:
df[["Year-Month"]] = pd.to_datetime(df[["Year-Month"]],utc=True,format="%Y-%m")
But I got a ValueError:
ValueError: to assemble mappings requires at least that [year, month,
day] be specified: [day,month,year] is missing
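The ValueError comes from passing a DataFrame to `pd.to_datetime`: `df[["Year-Month"]]` (double brackets) is a one-column DataFrame, and for DataFrame input pandas tries to assemble dates from columns named year, month, and day. A minimal sketch of one way around both errors follows, assuming the goal is to keep rows whose timestamp lies within two calendar months of the start of the Year-Month. The sample data is made up, and `pd.DateOffset` is swapped in for `dateutil.relativedelta`.

```python
import pandas as pd

# Hypothetical sample data with a tz-aware timestamp column.
df = pd.DataFrame({
    "Year-Month": ["2022-10", "2022-11", "2022-12"],
    "timestamp": pd.to_datetime(
        ["2022-11-15 08:00", "2023-03-01 12:00", "2022-12-31 23:59"],
        utc=True,
    ),
})

# Pass the Series (single brackets); the missing day defaults to 1.
start = pd.to_datetime(df["Year-Month"], format="%Y-%m", utc=True)

# Calendar-aware upper bound: two months after the month start.
end = start + pd.DateOffset(months=2)

# Combine element-wise conditions with & rather than the `and` keyword.
mask = (df["timestamp"] >= start) & (df["timestamp"] <= end)
result = df[mask]
print(result)
```

Comparing against explicit bounds sidesteps the question of how to express "2 months" as a fixed-length timedelta, since months differ in length.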
4. Take away
If a column holds only part of a date (in my case, only year and month), it can still be converted from string to datetime; pd.to_datetime fills in the missing day as 1. The ValueError above arises because df[["Year-Month"]] (double brackets) is a DataFrame, and pd.to_datetime then tries to assemble dates from columns named year, month, and day. Passing the Series df["Year-Month"] avoids it. Alternatively, do the calculations with the extracted year and month.
If you don't need to do calculations between the incomplete datetime column and other datetime columns, you can convert the incomplete datetime string into datetime type and extract the year and month from it. It's easier than using regex, split, and join.
df = pd.DataFrame({'Year-Month': ['2022-10', '2022-11', '2022-12']})
df = df.join(
    df['Year-Month']
    .str.split('-', expand=True)
    .set_axis(['year', 'month'], axis='columns')
)
pandas.Series.str.split - split strings of a series, if expand=True then return a data frame with each part in a separate column;
pandas.DataFrame.set_axis - if axis='columns' then rename column names of a data frame;
pandas.DataFrame.join - if the indices are equal, then the frames stacked together horizontally are returned.
You can use a regex for that.
Creating a new DataFrame:
df['Year-Month'].str.extract(r'(?P<year>\d+)-(?P<month>\d+)').astype(int)
If you want to assign the output to the same DataFrame while removing the original Year-Month:
df[['year', 'month']] = df.pop('Year-Month').str.extract(r'(\d+)-(\d+)').astype(int)
Example input:
Year-Month
0 2022-10
output:
year month
0 2022 10
alternative using datetime:
You can also use a datetime intermediate
date = pd.to_datetime(df['Year-Month'])
df['year'] = date.dt.year
df['month'] = date.dt.month
output:
Year-Month year month
0 2022-10 2022 10
You can also convert the datatype from object (str) into datetime format. This will make it easier to work with the dates.
import pandas as pd
df['Year-Month'] = pd.to_datetime(df['Year-Month'], format="%Y-%m")
I am trying to create a datetime index in Python. I have an existing dataframe with a date column (CrimeDate); here is a snapshot of it:
The date is not in datetime format, though.
I intend to have an output similar to the format below, but using my existing dataframe's date column.
The CrimeDate column has approximately 334,192 rows, with dates running from 2021-04-24 back to 1963-10-30 (all in month and year sequence).
First you'll need to convert the date column to datetime:
df['CrimeDate'] = pd.to_datetime(df['CrimeDate'])
And after that set that column as the index:
df.set_index(['CrimeDate'], inplace=True)
Once set, you can access the datetime index directly:
df.index
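A short sketch of what the datetime index makes possible, assuming the dates are stored as month/day/year strings (the actual format of CrimeDate is not shown in the question, and the `Description` column is invented for illustration):

```python
import pandas as pd

# Hypothetical sample; the real CrimeDate format may differ.
df = pd.DataFrame({
    "CrimeDate": ["04/24/2021", "04/23/2021", "10/30/1963"],
    "Description": ["A", "B", "C"],
})

df["CrimeDate"] = pd.to_datetime(df["CrimeDate"], format="%m/%d/%Y")

# Sorting the index keeps date-based slicing predictable.
df = df.set_index("CrimeDate").sort_index()

# Partial-string indexing: select every row in April 2021.
april_2021 = df.loc["2021-04"]
print(april_2021)
```

Passing an explicit `format` is optional, but on a column of several hundred thousand rows it avoids per-value format guessing.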
I am trying to change multiple columns to the same datatype at once. The columns contain time data (hours, minutes, and seconds).
I'm not able to convert multiple columns at once with pd.to_datetime so that they hold only the time. I don't want the date, but pd.to_datetime also adds a date to the column, which is not required; I just want the time.
How do I convert the columns to datetime and keep only the time in the column?
First, you can't have a datetime with only a time in it in pandas/Python.
Because Python time objects are stored with object dtype in pandas, convert all columns to datetimes (these will also contain dates):
cols = ['Total Break Time','col1','col2']
df[cols] = df[cols].apply(pd.to_datetime)
Or convert the columns to timedeltas; they look similar to times and still work with the datetime-like methods in pandas:
df[cols] = df[cols].apply(pd.to_timedelta)
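As a sketch of why timedeltas can be convenient for durations such as break times, here is the conversion on made-up data (only `Total Break Time` and `col1` are taken from the answer; the values are illustrative):

```python
import pandas as pd

# Hypothetical HH:MM:SS strings.
df = pd.DataFrame({
    "Total Break Time": ["00:15:30", "01:05:00"],
    "col1": ["00:00:45", "00:10:00"],
})

cols = ["Total Break Time", "col1"]
df[cols] = df[cols].apply(pd.to_timedelta)

# Timedelta columns support arithmetic and the .dt accessor.
total_seconds = df["Total Break Time"].dt.total_seconds()
combined = df["Total Break Time"] + df["col1"]

print(df.dtypes)
print(combined)
```

Unlike a column of `datetime.time` objects, these columns keep a native pandas dtype, so sums and comparisons stay vectorised.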
You can pick only the time as below:
import pandas as pd
df['Total Break Time'] = pd.to_datetime(df['Total Break Time'], format='%H:%M:%S').dt.time
Then you can repeat this for all your columns, as I suppose you already are.
The trick is to convert to datetime and then pick out only what you need.
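Repeating the conversion for several columns can be sketched with a simple loop. The column list and sample values here are assumptions for illustration.

```python
import pandas as pd

# Hypothetical HH:MM:SS strings in several columns.
df = pd.DataFrame({
    "Total Break Time": ["00:15:30", "01:05:00"],
    "col1": ["00:00:45", "00:10:00"],
})

cols = ["Total Break Time", "col1"]
for col in cols:
    # Parse as datetime, then keep only the time-of-day part.
    df[col] = pd.to_datetime(df[col], format="%H:%M:%S").dt.time

print(df)
print(df.dtypes)  # the columns hold datetime.time objects (object dtype)
```

The resulting columns have object dtype, so datetime-like methods such as `.dt` are no longer available on them.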
A very simple query, but I did not find the answer on Google.
I have a df with a timestamp in the Date column:
Date
22/11/2019 22:30:10 etc., which is of type object according to df.dtypes
Code:
df['Date']=pd.to_datetime(df['Date']).dt.date
Now I want to convert the date to datetime using the column number rather than the column name. The column number in this case is 0 (I have very long column names and multiple similar files, so I want to convert the date column using its position, 0 in this case).
Can anyone help?
Use DataFrame.iloc for column (Series) by position:
df.iloc[:, 0] = pd.to_datetime(df.iloc[:, 0]).dt.date
Or it is also possible to extract the column name by indexing:
df[df.columns[0]] = pd.to_datetime(df[df.columns[0]]).dt.date
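A small sketch of the position-based variant, keeping the full datetime rather than only the date. The long column name is invented, and the day-first format string is an assumption based on the sample value 22/11/2019 22:30:10.

```python
import pandas as pd

# Hypothetical frame; the first column holds day-first date strings.
df = pd.DataFrame({
    "A very long column name for the date": [
        "22/11/2019 22:30:10",
        "23/11/2019 01:05:00",
    ],
    "value": [1, 2],
})

# Look the name up by position, then assign through the name so the
# column is replaced and takes on the datetime64 dtype.
first = df.columns[0]
df[first] = pd.to_datetime(df[first], format="%d/%m/%Y %H:%M:%S")

print(df.dtypes)
```

Spelling out the format avoids ambiguity between day-first and month-first parsing for values such as 05/11/2019.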
I have a column in my dataframe that lists time in HH:MM:SS. When I run dtype on the column, it comes up with dtype('O'), and I want to be able to use it as the x-axis for plotting some of my other signals. I saw previous documentation on using to_datetime and tried to use that to convert it to a usable time format for matplotlib.
The pandas version used is 0.18.1.
I used:
time=pd.to_datetime(df.Time,format='%H:%M:%S')
where the output then becomes:
time
0 1900-01-01 00:00:01
and is carried out for the rest of the data points in the column.
Even though I specified just hour,minutes,and seconds I am still getting date. Why is that? I also tried
time.hour()
just to extract the hour portion but then I get an error that it doesn't have an 'hour' attribute.
Any help is much appreciated! Thanks!
Now in 2019, using pandas 0.25.0 and Python 3.7.3.
(Note : Edited answer to take plotting in account)
Even though I specified just hour,minutes,and seconds I am still getting date. Why is that?
According to the pandas documentation, I think it's because in a pandas Timestamp (the equivalent of datetime) object, the year, month and day arguments are mandatory, while hour, minute and second are optional.
Therefore, if you convert your object-dtype values into datetimes, they must have a year-month-day part; if you don't provide one, it defaults to 1900-01-01.
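The default date can be seen on a single value. This is a minimal illustration using the same format string as in the question:

```python
import pandas as pd

# Parsing a time-only string still yields a full timestamp.
ts = pd.to_datetime("00:00:01", format="%H:%M:%S")
print(ts)         # 1900-01-01 00:00:01

# The time-of-day part can be taken back out as a datetime.time object.
print(ts.time())  # 00:00:01
```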
Since you also have a Date column in your sample, you can use it to have a datetime column with the right dates that you can use to plot :
import pandas as pd
df['Time'] = df.Date + " " + df.Time
df['Time'] = pd.to_datetime(df['Time'], format='%m/%d/%Y %H:%M:%S')
df.plot('Time', subplots=True)
With this your 'Time' column will display values like : 2016-07-25 01:12:07 and its dtype is datetime64[ns].
That being said, if you plot day by day and only want to compare times within a day (not dates plus times), having a default date should not be a problem as long as it's the same date for all times: the times will be compared correctly within the same day, even if that day is the wrong one.
And in the unlikely case that you still want a time-only column, this is the reverse operation:
import pandas as pd
df['Time-only'] = pd.to_datetime(df['Time'], format='%H:%M:%S').dt.time
As explained before, it doesn't have a date (year-month-day), so it cannot be a datetime object; this column will therefore have object dtype.
You can extract a time object like:
import pandas as pd
df = pd.DataFrame([['12:10:20']], columns=['time'])
time = pd.to_datetime(df.time, format='%H:%M:%S').dt.time[0]
After which you can extract desired properties as:
hour = time.hour
(Source)
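For the whole column rather than a single element, the hour is available through the `.dt` accessor as a property, not a method call, which is likely why `time.hour()` failed in the question. A minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical HH:MM:SS strings, as in the question's Time column.
df = pd.DataFrame({"Time": ["00:00:01", "13:45:10"]})

parsed = pd.to_datetime(df["Time"], format="%H:%M:%S")

# .dt.hour is a property on the Series accessor, so no parentheses.
hours = parsed.dt.hour
print(hours)
```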