Pandas dataframe delete rows by date

I have two pandas DataFrames named dataset and startdate. The dataset DataFrame contains rows with dates from 1961-02-01 to 1961-12-31, and likewise for many other years.
The startdate DataFrame contains the start day for each year; for example, the start date for 1961 is 1961-02-08. So I need to remove the rows from dataset dated before that start date in 1961, i.e. the rows dated 1961-02-01 through 1961-02-07.
I need to do the same for all the other years.
For 1961 I can do:
dataset[dataset['date'] >= '1961-02-08']
But the problem is that the start date in startdate is different for each year.

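Reindex the start dates by each row's year so they line up with dataset, then compare (this assumes startdate is indexed by year and dataset has a 'year' column):
s = startdate.date.reindex(dataset['year'])  # one start date per dataset row, looked up by year
s.index = dataset.index                       # align s with dataset so the comparison is row by row
df = dataset[dataset['date'] >= s].copy()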
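A self-contained sketch of the reindex approach on small, made-up data. It assumes, as the answer's code implies, that startdate is indexed by year and has a date column, and that dataset has date and year columns. The Series.map lookup at the end is an equivalent alternative, not part of the answer.

```python
import pandas as pd

# Hypothetical data: rows from Feb 1 to Feb 10 in two different years
dates = pd.date_range("1961-02-01", "1961-02-10").append(
    pd.date_range("1962-02-01", "1962-02-10")
)
dataset = pd.DataFrame({"date": dates})
dataset["year"] = dataset["date"].dt.year

# Hypothetical per-year start dates, indexed by year (assumed layout)
startdate = pd.DataFrame(
    {"date": pd.to_datetime(["1961-02-08", "1962-02-05"])},
    index=[1961, 1962],
)

# Look up each row's start date by its year, then give it dataset's index
s = startdate["date"].reindex(dataset["year"])
s.index = dataset.index
result = dataset[dataset["date"] >= s].copy()

# Equivalent lookup with Series.map, which keeps dataset's index directly
s_mapped = dataset["year"].map(startdate["date"])
result_mapped = dataset[dataset["date"] >= s_mapped].copy()
```

With these sample dates, 1961 keeps Feb 8 to 10 and 1962 keeps Feb 5 to 10.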

Related

Python3: How can I select only weekdays from a pandas dataframe?

I currently have a dataframe with sales data, named "visitresult_and_outcome".
I have a column named "DATEONLY" that holds the sale date (format yyyy-mm-dd) in string format.
I now want to make two new DataFrames: one for the sales made on weekends and one for the sales made on weekdays. How can I do this efficiently?

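df['dayofweek'] = pd.to_datetime(df['DATEONLY']).dt.dayofweek
This pulls the day of the week (Monday=0, ..., Sunday=6) out of your dates. Because DATEONLY holds strings, it must first be converted with pd.to_datetime, since the .dt accessor only works on datetime values. Creating your other DataFrames is then just a matter of slicing.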
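A minimal sketch of the slicing step, on a small hypothetical sample shaped like the question's data (a string DATEONLY column plus a made-up sales column). Saturday and Sunday are dayofweek 5 and 6.

```python
import pandas as pd

# Hypothetical sample: dates stored as yyyy-mm-dd strings, as in the question
visitresult_and_outcome = pd.DataFrame({
    "DATEONLY": ["2023-05-05", "2023-05-06", "2023-05-07", "2023-05-08"],
    "sales": [10, 20, 30, 40],
})

# Parse the strings first, then extract the weekday (Monday=0 ... Sunday=6)
visitresult_and_outcome["dayofweek"] = pd.to_datetime(
    visitresult_and_outcome["DATEONLY"]
).dt.dayofweek

# Saturday (5) and Sunday (6) are the weekend; everything else is a weekday
is_weekend = visitresult_and_outcome["dayofweek"] >= 5
weekend_sales = visitresult_and_outcome[is_weekend]
weekday_sales = visitresult_and_outcome[~is_weekend]
```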

How can I group monthly over years in Python with pandas?

I have a dataset ranging from 2009 to 2019. The dates include year, month, and day. I have two columns: one with dates and the other with values. I need to group my DataFrame by month, summing up the values in the other column. At the moment I am setting the date column as the index and using df.resample('M').sum().
The problem is that this groups my DataFrame by month separately for each year (so I have 128 values in the "date" column). How can I group my data into just the 12 months, regardless of year?
Thank you very much in advance
I attached two images as examples of the dataset I have and the one I want to obtain: [Image: Dataframe I have] [Image: Dataframe I want to obtain]

Use dt.month on your date column. For example:
df.groupby(df['date'].dt.month).agg({'value':'sum'})
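A small sketch with hypothetical values spread over two years, showing that grouping on dt.month adds up the same month across years. Since the question sets the date column as the index, the second variant groups on index.month instead.

```python
import pandas as pd

# Hypothetical data: January and February values in two different years
df = pd.DataFrame({
    "date": pd.to_datetime(["2009-01-15", "2009-02-15", "2010-01-15", "2010-02-15"]),
    "value": [1, 2, 10, 20],
})

# Group on the month number only: January 2009 and January 2010 share a group
by_month = df.groupby(df["date"].dt.month).agg({"value": "sum"})

# Same idea when the dates are the index (as after set_index in the question)
indexed = df.set_index("date")
by_month_from_index = indexed.groupby(indexed.index.month).sum()
```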

Can you extract both year AND month from date in Pandas [duplicate]

This question already has answers here:
Extracting just Month and Year separately from Pandas Datetime column
I have a DataFrame with a date column (type datetime). I can easily extract the year or the month to perform groupings, but I can't find a way to extract both year and month at the same time from a date. I need to analyze the performance of a product over a one-year period and make a graph of how it performed each month. Naturally I can't just group by month, because that would combine the same month from two different years, and grouping by year doesn't produce the results I want because I need to look at performance monthly.
I've been looking at several solutions, but none of them have worked so far.
So basically, my current dates look like this:
2018-07-20
2018-08-20
2018-08-21
2018-10-11
2019-07-20
2019-08-21
And I'd just like to have 2018-07, 2018-08, 2018-10, and so on.

You can use to_period:
df['month_year'] = df['date'].dt.to_period('M')
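A runnable sketch using the dates from the question. The to_period('M') column keeps 2018-07 and 2019-07 apart, so grouping on it gives one row per year-month; counting rows per group is just an example aggregation.

```python
import pandas as pd

# The dates listed in the question
df = pd.DataFrame({"date": pd.to_datetime([
    "2018-07-20", "2018-08-20", "2018-08-21",
    "2018-10-11", "2019-07-20", "2019-08-21",
])})
df["month_year"] = df["date"].dt.to_period("M")

# One group per year-month; July 2018 and July 2019 stay separate
counts = df.groupby("month_year").size()
```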

If they are stored as datetime, you can create a string with just the year and month to group by, using Series.dt.strftime (format codes: https://strftime.org/).
It would look something like:
df['ym-date'] = df['date'].dt.strftime('%Y-%m')

If you have some data that uses datetime values, like this:
import numpy as np
import pandas as pd
data = [
    pd.date_range('2017', freq='W', periods=121).to_series().reset_index(drop=True).rename('Sale Date'),
    pd.Series(np.random.normal(1000, 100, 121)).rename('Quantity'),
]
sales = pd.concat(data, axis='columns')
You can group by year and month simultaneously like this (selecting 'Quantity' so the datetime column itself is not summed):
d = sales['Sale Date']
sales.groupby([d.dt.year.rename('Year'), d.dt.month.rename('Month')])['Quantity'].sum()
You can also create a string that represents the combination of month and year and group by that:
ym_id = d.apply("{:%Y-%m}".format).rename('Sale Month')
sales.groupby(ym_id)['Quantity'].sum()

A couple of options. One is to map each date to the first of its month.
Assuming your dates are in a column called 'Date', something like:
df['Date_no_day'] = df['Date'].apply(lambda x: x.replace(day=1))
If you are really keen on storing only the year and month, you could map to a (year, month) tuple, e.g.:
df['Date_no_day'] = df['Date'].apply(lambda x: (x.year, x.month))
From here, you can group by and aggregate on this new column and perform your analysis.
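A short sketch of the first-of-month mapping on a few of the question's dates. The vectorized to_period('M').dt.to_timestamp() line is a swapped-in alternative, not part of the answer; for date-only values it matches the row-wise replace(day=1), but it also resets any time of day to midnight.

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2018-07-20", "2018-08-20", "2018-08-21", "2019-08-21"])})

# Row-wise, as in the answer: snap each date to the 1st of its month
df["Date_no_day"] = df["Date"].apply(lambda x: x.replace(day=1))

# Vectorized alternative (not from the answer): go through a monthly period and back
df["month_start"] = df["Date"].dt.to_period("M").dt.to_timestamp()

# One group per year-month; August 2018 and August 2019 stay separate
counts = df.groupby("Date_no_day").size()
```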

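One way could be to transform the column to get the first of the month for all of these dates and then do your analysis month to month:
import pandas as pd
date_col = pd.to_datetime(['2011-09-30', '2012-02-28'])
new_col = date_col.to_period('M').to_timestamp()  # 2011-09-01, 2012-02-01
Note that date_col + pd.offsets.MonthBegin(1) would instead move each date to the first day of the following month (2011-09-30 becomes 2011-10-01). Either way, each month maps to a single date, so your analysis stays monthly.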

Remove last n days from dataframe

I have a pandas DataFrame with a datetime index (30-minute frequency), and I want to remove the last n days from it. My DataFrame does not include weekends, so if its last day is a Monday and n = 3, I want to remove Monday, Friday and Thursday (counting from the end). In other words, I mean observed days, not calendar days. What is the most pythonic way to do it?
Thanks.

Pandas treats Monday to Friday as business days.
So if you want to remove the last n business days from your DataFrame, you can just do:
df.drop(df[df.index >= df.index.max().normalize() - pd.offsets.BDay(n - 1)].index, inplace=True)
If you really need to remove observed days in the DataFrame (for example, because a holiday is missing from the data), it is slightly more complex, because you have to count the days. Code could be (using a companion DataFrame called df_days):
# create a DataFrame with the same index and only one row per day (the first timestamp of each day):
df_days = pd.DataFrame(index=df.index).assign(day=df.index.date).drop_duplicates('day')
# now count the observed days in the companion DataFrame
df_days['new_day'] = 1
df_days['days'] = df_days['new_day'].cumsum()
# first index to remove: the first timestamp of the n-th observed day from the end
ix = df_days.loc[df_days['days'] == df_days['days'].max() + 1 - n].index[0]
# drop the last n observed days (>= so the first row of that day goes too) and delete the companion DataFrame
df = df.drop(df.loc[df.index >= ix].index)
del df_days
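A simpler sketch of the observed-days idea, written as a hypothetical helper drop_last_observed_days: it takes the sorted unique dates that appear in the index and cuts at midnight of the n-th one from the end. The sample data (30-minute bars from Thursday 2020-01-02 to Monday 2020-01-06, weekends removed) is made up; since no weekday is missing, the business-day version from the answer gives the same result here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 30-minute bars, weekdays only (Thu 2020-01-02 to Mon 2020-01-06)
idx = pd.date_range("2020-01-02", "2020-01-06 23:30", freq="30min")
idx = idx[idx.dayofweek < 5]  # drop Saturday and Sunday
df = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)


def drop_last_observed_days(frame: pd.DataFrame, n: int) -> pd.DataFrame:
    """Remove the last n distinct dates that actually appear in the index."""
    if n <= 0:
        return frame.copy()
    # Sorted unique dates present in the data (observed days, not calendar days)
    observed = frame.index.normalize().unique().sort_values()
    if n >= len(observed):
        return frame.iloc[0:0]
    cutoff = observed[-n]  # midnight of the earliest day to remove
    return frame[frame.index < cutoff]


n = 2
trimmed = drop_last_observed_days(df, n)

# Business-day version from the answer, in non-destructive form, for comparison
by_bday = df[df.index < df.index.max().normalize() - pd.offsets.BDay(n - 1)]
```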

Wrong index after monthly resampling of Dataframe, Pandas

My DataFrame has the following format:
I resampled the values on a monthly basis, but the problem is that even though the datetime index starts from 2017-07-08, the Date column after grouping by month and taking the mean starts from 2017-01-31. (There is no data at all in my DataFrame from January 2017 to August 2017.) The data recording started in August 2017.
Could you please give me some insights to understand what is happening?
