Sorting Pandas data by hour of the day - python

Given the following dataset that I have extracted from an Excel file via pandas:
…
[131124 rows x 2 columns]
date datetime64[ns]
places_occupees int64
dtype: object
Is there a way to sort this data by the hour of the day, regardless of the date?
For instance, I would like to get all the data between 9 and 10 o'clock in the morning.
You can find a sample of the dataset below.
https://ufile.io/jlilr

After converting the column to datetime with pd.to_datetime(df['date']), you can create a separate column holding the hour, e.g. df['Hour'] = df.date.dt.hour, and then sort by it:
df.sort_values('Hour')
EDIT:
Since you want to sort by time, you can put the time-of-day part of the timestamp into a 'time' column instead of using the hour. To get the times between 9 and 10, filter on hour == 9 and then sort by the time column, as below:
df['date'] = pd.to_datetime(df['date'])
# Put the time-of-day part of the datetime into a separate column
df['time'] = df['date'].dt.time
# Keep rows from 9:00:00 up to 9:59:59 and sort them by time of day
df.loc[df.date.dt.hour==9].sort_values('time')
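A minimal sketch of an alternative, assuming the 'date' column is already datetime: DataFrame.between_time selects rows by time of day regardless of the date, but it needs a DatetimeIndex. The sample rows below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the Excel data (date, places_occupees).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-01 09:45", "2018-01-01 14:00",
        "2018-01-02 09:05", "2018-01-02 10:30",
    ]),
    "places_occupees": [12, 40, 8, 25],
})

# between_time needs a DatetimeIndex; both bounds are inclusive by default.
morning = (
    df.set_index("date")
      .between_time("09:00", "10:00")
      # Sort by time of day only, ignoring the calendar date.
      .assign(time=lambda d: d.index.time)
      .sort_values("time")
)
print(morning)
```

Unlike hour == 9, this also keeps rows stamped exactly 10:00:00, so pick whichever boundary matches what you need.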

Related

Splitting Date in a Pandas Dataframe

Dataset
Hi, I have an index ['release_date'] in the format month, day, year. I was trying to split this column by doing
test['date_added'].str.split(' ',expand=True) #code_1
but now it's creating 4 columns: for a few rows, the values are shifted over by one column, which creates a 4th column.
code_1
This is the error I am facing
I tried splitting ['release_date'] and expected it to be split into 3 columns, but for some reason a few rows are being shifted into another column.
If anyone wants to inspect the dataframe, you can load it in Google Colab with:
!gdown 1x-_Kq9qYrybB9-DxJHoeVlPabmAm6xbQ
You can use:
df['day'] = pd.DatetimeIndex(df['date_added']).day
df['month'] = pd.DatetimeIndex(df['date_added']).month
df['year'] = pd.DatetimeIndex(df['date_added']).year
or, in a single pass over the parsed dates:
day, month, year = zip(*[(d.day, d.month, d.year) for d in pd.to_datetime(df['date_added'])])
df = df.assign(day=day, month=month, year=year)
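One possible cause of the extra column, assuming some 'date_added' values carry a leading space (a guess, since the data isn't shown here): split(' ') then yields an empty first token, which pushes the remaining parts one column to the right. The sketch below uses hypothetical sample strings to show the effect and two ways around it.

```python
import pandas as pd

# Hypothetical values in "Month day, year" form; the second has a leading space.
s = pd.Series(["September 25, 2021", " August 4, 2020"])

# (2, 4): the leading space produces an empty first token on the second row.
print(s.str.split(" ", expand=True).shape)

# Stripping first and splitting on any whitespace gives three columns.
parts = s.str.strip().str.split(expand=True)

# Alternatively, parse to datetime and read the parts with the .dt accessor.
dates = pd.to_datetime(s.str.strip(), format="%B %d, %Y")
out = pd.DataFrame({
    "day": dates.dt.day,
    "month": dates.dt.month,
    "year": dates.dt.year,
})
print(out)
```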

How to convert int64 data into day, month and year?

I have a date feature in the format 20001130 and another in the format 2000-11-30 (without any spaces). How can I write optimized code that works for both and efficiently splits the date into day, month and year?
You can use pandas.to_datetime:
import pandas as pd
pd.to_datetime([20001130, 20001129], format='%Y%m%d')
or with a dataframe.
df = pd.DataFrame({'time': [20001129, 20001130]})
df.time = pd.to_datetime(df.time, format='%Y%m%d')
EDIT
If the two date formats are in one column, convert all the values to strings and let pandas.to_datetime interpret them, since it can handle different formats in one column (in pandas 2.0 and later, pass format='mixed' to allow this).
df = pd.DataFrame({'time': [20001129, '2000-11-30']})
df.time = pd.to_datetime(df.time.astype(str))
         time
0  2000-11-29
1  2000-11-30
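If you would rather not rely on format inference, a sketch that tries each known format explicitly with errors='coerce' and merges the results could look like this; the two format strings are assumptions based on the examples in the question.

```python
import pandas as pd

df = pd.DataFrame({"time": [20001129, "2000-11-30"]})
as_text = df["time"].astype(str)

# Try each known format separately; values that don't match become NaT.
compact = pd.to_datetime(as_text, format="%Y%m%d", errors="coerce")
dashed = pd.to_datetime(as_text, format="%Y-%m-%d", errors="coerce")

# Take the compact parse where it worked, otherwise fall back to the dashed one.
df["time"] = compact.fillna(dashed)

# Split into day, month and year with the vectorized .dt accessor.
df["day"] = df["time"].dt.day
df["month"] = df["time"].dt.month
df["year"] = df["time"].dt.year
print(df)
```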

retrieve only months with at least 28 sample days - pandas dataframe

Hello to the people of the web,
I have a dataframe containing 'DATE' (datetime) as index and TMAX as column with values:
tmax dataframe
What I'm trying to do is check, for every month (of each year), the number of samples (each TMAX value counts as one sample).
If there are fewer than 28 samples, I want to drop that month (of that particular year) and all its samples.
I have the following code:
import pandas as pd

if __name__ == '__main__':
    df = pd.read_csv("2961941.csv")
    # Parse 'DATE' as datetime and set it as the index (so it isn't repeated as a column).
    # infer_datetime_format=True speeds up parsing.
    df['DATE'] = pd.to_datetime(df['DATE'], infer_datetime_format=True)
    df.set_index('DATE', inplace=True)
    # create new table out of 'DATE' and 'TMAX'
    tmax = df.filter(['DATE', 'TMAX'], axis=1)
    # erase rows with missing data (dropna returns a new frame, so reassign it)
    tmax = tmax.dropna()
    # create snow table & delete rows with missing info
    snow = df.filter(['DATE', 'SNOW']).dropna()
    # for index, row in tmax.iterrows():
Thanks for the help.
I suggest trying the following.
Here I store the number of samples in each month in a variable 'a'; transform('count') broadcasts each month's count back onto every row of that month.
Then I keep only the rows whose month has at least 28 samples.
It worked for me.
a = df.groupby(pd.Grouper(level='DATE', freq="M")).transform('count')
print(df[a['TMAX'] >= 28])
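A self-contained sketch of the same idea on made-up data, using groupby(...).filter instead of transform; grouping by index.to_period('M') keeps each month of each year separate.

```python
import pandas as pd

# Hypothetical daily TMAX data: all of January 2020, but only 10 days of February.
idx = pd.date_range("2020-01-01", "2020-01-31", freq="D").append(
    pd.date_range("2020-02-01", periods=10, freq="D")
)
tmax = pd.DataFrame({"TMAX": range(len(idx))}, index=idx).rename_axis("DATE")

# Group by calendar month (per year) and keep only the groups
# with at least 28 non-missing TMAX samples.
by_month = tmax.index.to_period("M")
kept = tmax.groupby(by_month).filter(lambda g: g["TMAX"].count() >= 28)
print(kept.index.min(), kept.index.max(), len(kept))
```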

Creating Datetime index in python

I am trying to create datetime index in python. I have an existing dataframe with date column (CrimeDate), here is a snapshot of it:
The date is not in datetime format though.
I intend to have output similar to the format below, but using my existing dataframe's date column:
The CrimeDate column has approximately 334,192 rows, with dates running from 2021-04-24 back to 1963-10-30 (in order of month and year).
First you'll need to convert the date column to datetime:
df['CrimeDate'] = pd.to_datetime(df['CrimeDate'])
And after that set that column as the index:
df.set_index(['CrimeDate'], inplace=True)
Once set, you can access the datetime index directly:
df.index
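Since the dates run from 2021 back to 1963, it may help to sort the new index before slicing it by date. A sketch on hypothetical rows (the 'Description' column is a placeholder):

```python
import pandas as pd

# Hypothetical rows in descending date order, like the CrimeDate column.
df = pd.DataFrame({
    "CrimeDate": ["2021-04-24", "2020-07-01", "1963-10-30"],
    "Description": ["A", "B", "C"],  # placeholder column
})
df["CrimeDate"] = pd.to_datetime(df["CrimeDate"])

# Ascending order makes label-based slicing on the DatetimeIndex reliable.
df = df.set_index("CrimeDate").sort_index()

# Partial-string slicing: every row from the start of 2020 to the end of 2021.
recent = df.loc["2020":"2021"]
print(recent)
```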

Convert string time into DatetimeIndex and then resample

Two of the columns in my dataset are hour and mins as integers. Here's a snippet of the dataset.
I'm creating a timestamp through the following code:
TIME = pd.to_timedelta(df["hour"], unit='h') + pd.to_timedelta(df["mins"], unit='m')
#df['TIME'] = TIME
df['TIME'] = TIME.astype(str)
I convert TIME to string format because I'm exporting the dataframe to MS Excel which doesn't support timedelta format.
Now I want a timestamp for every minute.
To do that, I want to fill in the missing minutes with a TOTAL_TRADE_RATE of zero, for which I first have to set the TIME column as the index. I'm applying this:
df = df.set_index('TIME')
df.index = pd.DatetimeIndex(df.index)
df.resample('60s').sum().reset_index()
but it's giving the following error:
Unknown string format: 0 days 09:33:00.000000000
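The error arises because strings like "0 days 09:33:00" are durations, not dates, so pd.DatetimeIndex cannot parse them. A minimal sketch, assuming the column names from the question and made-up sample values: keep TIME as a timedelta, resample on the resulting TimedeltaIndex, and convert to text only at export time.

```python
import pandas as pd

# Hypothetical sample in the shape described: integer hour/mins columns
# plus TOTAL_TRADE_RATE, with the 09:34 minute missing.
df = pd.DataFrame({
    "hour": [9, 9],
    "mins": [33, 35],
    "TOTAL_TRADE_RATE": [5, 7],
})

# Keep TIME as a timedelta; resample works directly on a TimedeltaIndex.
df["TIME"] = pd.to_timedelta(df["hour"], unit="h") + pd.to_timedelta(df["mins"], unit="m")
per_minute = (
    df.set_index("TIME")[["TOTAL_TRADE_RATE"]]
      .resample("60s")
      .sum()  # minutes with no rows sum to 0
)

# Convert to strings only at the very end, right before exporting to Excel.
per_minute.index = per_minute.index.astype(str)
per_minute = per_minute.reset_index()
print(per_minute)
```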
