I currently have a dataframe with sales data, named "visitresult_and_outcome".
I have a column named "DATEONLY" that holds the sale date (format yyyy-mm-dd) in string format.
I now want to make 2 new dataframes: 1 for the sales made on the weekend, 1 for the sales made on weekdays. How can I do this in an efficient way?
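df['dayofweek'] = pd.to_datetime(df['DATEONLY']).dt.dayofweek
DATEONLY holds strings, so it has to be converted with pd.to_datetime before the .dt accessor is available. This then pulls the day of the week out of your dates as an integer from 0 (Monday) to 6 (Sunday). Creating your other dataframes will just be a matter of slicing.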
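The slicing step could look roughly like the sketch below. The sample rows and the "amount" column are made up for illustration; only the DATEONLY column name comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's "visitresult_and_outcome" frame.
df = pd.DataFrame({
    "DATEONLY": ["2021-03-05", "2021-03-06", "2021-03-07", "2021-03-08"],
    "amount": [10, 20, 30, 40],  # assumed column, not in the question
})

# Parse the strings once; dayofweek is 0 for Monday through 6 for Sunday.
dayofweek = pd.to_datetime(df["DATEONLY"], format="%Y-%m-%d").dt.dayofweek

# Boolean mask: Saturday (5) or Sunday (6).
is_weekend = dayofweek >= 5

weekend_sales = df[is_weekend]
weekday_sales = df[~is_weekend]
```

A boolean mask avoids looping over rows, and the same mask gives both frames.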
I'm looking at a fruit and veg dataset with prices and dates. However, when I try to plot anything against the date there are far too many points, because the date column has one entry per week. Is there any way to group the dates by month or something similar? The date format is like 2022-02-11.
A simple way is to add a month column and group by it. We use pandas.DataFrame.groupby and pandas.DatetimeIndex to do this.
df['Month'] = pd.DatetimeIndex(df.date).month
df.groupby(['Month']).sum(numeric_only=True).plot()
I have a pandas dataframe with 3 columns:
OrderID_new (integer)
OrderTotal (float)
OrderDate_new (string or datetime sometimes)
Sales order ID's are in the first column, order values (totals) are in the 2nd column and order date - in mm/dd/yyyy format are in the last column.
I need to do 2 things:
to aggregate the order totals:
a) first into total sales per each day and then
b) into total sales per each calendar month
to convert values in OrderDate_new from mm/dd/yyyy format (e.g. 01/30/2015) into MM YYYY (e.g. January 2015) format.
The problem is some input files have 3rd column (date) already in datetime format while some have it as string format so that means sometimes string to datetime parsing will be needed while in other cases, reformatting datetime.
I have been trying to do 2 step aggregation with groupby but I'm getting some strange daily and monthly totals that make no sense.
What I need as the final stage is time series with 2 columns - 1. monthly sales and 2. month (Month Year)...
Then I will need to select and train some model for monthly sales time series forecast (out of scope for this question)...
What am I doing wrong?
How to do it effectively in Python?
dataframe example:
You did not provide usable sample data, so I've synthesized some.
resample() lets you roll up on a date column. I have provided both daily and monthly totals.
pd.to_datetime() handles the mixed string/datetime input, so everything ends up as a datetime.
import datetime as dt
import numpy as np
import pandas as pd

def mydf(size=10):
    return pd.DataFrame({"OrderID_new": np.random.randint(100, 200, size),
                         "OrderTotal": np.random.randint(200, 10000, size),
                         "OrderDate_new": np.random.choice(pd.date_range(dt.date(2019, 8, 1), dt.date(2020, 1, 1)), size)})
# smash orderdate to be a string for some rows
df = pd.concat([mydf(5), mydf(5).assign(OrderDate_new=lambda dfa: dfa.OrderDate_new.dt.strftime("%Y/%m/%d"))])
# make sure everything is a date..
df.OrderDate_new = pd.to_datetime(df.OrderDate_new)
# daily totals
df.resample("1D", on="OrderDate_new")["OrderTotal"].sum()
# monthly totals (month-end frequency; spelled "ME" in pandas 2.2 and later)
df.resample("M", on="OrderDate_new")["OrderTotal"].sum()
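The question also asks for a "Month Year" label such as January 2015. One possible way to get it, sketched on made-up rows: group on a monthly period so the months stay in chronological order, then format the period as text. The sample values are assumptions, and the month name produced by %B depends on the active locale.

```python
import pandas as pd

# Hypothetical sample; dates arrive as mm/dd/yyyy strings.
orders = pd.DataFrame({
    "OrderTotal": [100.0, 50.0, 25.0],
    "OrderDate_new": ["01/30/2015", "01/31/2015", "02/01/2015"],
})
orders["OrderDate_new"] = pd.to_datetime(orders["OrderDate_new"], format="%m/%d/%Y")

# Sum per calendar month; the Period index sorts chronologically.
monthly = (
    orders.groupby(orders["OrderDate_new"].dt.to_period("M"))["OrderTotal"]
    .sum()
    .reset_index()
)

# Human-readable label, e.g. "January 2015" (locale dependent).
monthly["Month"] = monthly["OrderDate_new"].dt.strftime("%B %Y")
```

Formatting only at the end matters here: grouping directly on the text label would sort the months alphabetically.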
I have a dataset ranging from 2009 to 2019. The Dates include Years, months and days. I have two columns: one with dates and the other with values. I need to group my Dataframe monthly summing up the Values in the other column. At the moment what I am doing is setting the date column as index and using "df.resample('M').sum()".
The problem is that this is grouping my Dataframe monthly but for each different year (so I have 128 values in the "date" column). How can I group my data only for the 12 months without taking into consideration years?
Thank you very much in advance
I attached two images as example of the Dataset I have and the one I want to obtain.
[Image: Dataframe I have]
[Image: Dataframe I want to obtain]
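Use dt.month on your date column. It returns only the month number (1 to 12), so the same month from different years lands in one group.
For example: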
df.groupby(df['date'].dt.month).agg({'value':'sum'})
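As a small self-contained check of that idea, here is a sketch with invented data, using the 'date' and 'value' column names from the line above. January and February each appear in two different years and collapse into one row per month.

```python
import pandas as pd

# Made-up rows spanning several years.
df = pd.DataFrame({
    "date": pd.to_datetime(["2009-01-15", "2010-01-20", "2009-02-10", "2019-02-05"]),
    "value": [1, 2, 3, 4],
})

# Group on the month number only, ignoring the year.
by_month = df.groupby(df["date"].dt.month)["value"].sum()
```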
I have a dataframe with a date column (type datetime). I can easily extract the year or the month to perform groupings, but I can't find a way to extract both year and month at the same time from a date. I need to analyze performance of a product over a 1 year period and make a graph with how it performed each month. Naturally I can't just group by month because it will add the same months for 2 different years, and grouping by year doesn't produce my desired results because I need to look at performance monthly.
I've been looking at several solutions, but none of them have worked so far.
So basically, my current dates look like this
2018-07-20
2018-08-20
2018-08-21
2018-10-11
2019-07-20
2019-08-21
And I'd just like to have 2018-07, 2018-08, 2018-10, and so on.
You can use to_period
df['month_year'] = df['date'].dt.to_period('M')
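A short sketch of how that column could then be used for the monthly grouping, with the dates taken from the question. The "units" column is a hypothetical performance measure added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-07-20", "2018-08-20", "2018-08-21",
        "2018-10-11", "2019-07-20", "2019-08-21",
    ]),
    "units": [1, 2, 3, 4, 5, 6],  # assumed column, not in the question
})

# One Period per calendar month, e.g. 2018-08; years stay separate.
df["month_year"] = df["date"].dt.to_period("M")

monthly = df.groupby("month_year")["units"].sum()
```

Because the result is indexed by periods rather than strings, it sorts chronologically and July 2018 is kept apart from July 2019.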
If they are stored as datetime you should be able to create a string with just the year and month to group by using datetime.strftime (https://strftime.org/).
It would look something like:
df['ym-date'] = df['date'].dt.strftime('%Y-%m')
If you have some data that uses datetime values, like this:
sale_date = [
pd.date_range('2017', freq='W', periods=121).to_series().reset_index(drop=True).rename('Sale Date'),
pd.Series(np.random.normal(1000, 100, 121)).rename('Quantity')
]
sales = pd.concat(sale_date, axis='columns')
You can group by year and month simultaneously like this:
d = sales['Sale Date']
sales.groupby([d.dt.year.rename('Year'), d.dt.month.rename('Month')])['Quantity'].sum()
You can also create a string that represents the combination of month and year and group by that:
ym_id = d.apply("{:%Y-%m}".format).rename('Sale Month')
sales.groupby(ym_id)['Quantity'].sum()
There are a couple of options. One is to map each date to the first of its month.
Assuming your dates are in a column called 'Date', something like:
df['Date_no_day'] = df['Date'].apply(lambda x: x.replace(day=1))
If you are really keen on storing the year and month only, you could map to a (year, month) tuple, eg:
df['Date_no_day'] = df['Date'].apply(lambda x: (x.year, x.month))
From here, you can group and aggregate by this new column and perform your analysis.
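One way could be to transform the column so that every date becomes a first-of-month date, and then do your analysis month to month:
date_col = pd.to_datetime(['2011-09-30', '2012-02-28'])
new_col = date_col + pd.offsets.MonthBegin(1)
Note that adding MonthBegin(1) rolls each date forward to the first day of the following month (2011-09-30 becomes 2011-10-01). All dates within one month still share the same label, so your analysis remains intact as monthly.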
I have 4 columns: Date, Account #, Quantity and Sale. I have daily data, but I want to be able to show weekly Sales and Quantity per customer.
I have been able to group by week, but I also want to group by OracleNumber and sum the Quantity and Sale columns. How would I get that to work without messing up the week format?
import pandas as pd
names = ['Date','OracleNumber','Quantity','Sale']
sales = pd.read_csv("CustomerSalesNVG.csv",names=names)
sales['Date'] = pd.to_datetime(sales['Date'])
grouped=sales.groupby(sales['Date'].map(lambda x:x.week))
print(grouped.head())
IIUC, you can group by both the week and the OracleNumber column by passing a list of keys to groupby, and then sum the columns you need. Series.dt.week was removed in pandas 2.0, so the week number comes from dt.isocalendar() here:
sales.groupby([sales['Date'].dt.isocalendar().week, 'OracleNumber'])[['Quantity', 'Sale']].sum()
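Week numbers repeat every year, so if the data spans more than one year an alternative worth considering is pd.Grouper, which bins on actual week-ending dates. This is a sketch on invented rows; the column names follow the question, and freq="W" means weeks ending on Sunday.

```python
import pandas as pd

# Made-up daily sales for two customers.
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2021-03-01", "2021-03-03", "2021-03-09", "2021-03-02"]),
    "OracleNumber": [1, 1, 1, 2],
    "Quantity": [1, 2, 3, 4],
    "Sale": [10.0, 20.0, 30.0, 40.0],
})

# Weekly bins (labelled by the Sunday that ends each week) per customer.
weekly = (
    sales.groupby([pd.Grouper(key="Date", freq="W"), "OracleNumber"])[["Quantity", "Sale"]]
    .sum()
)
```

The result has a two-level index of week-ending date and OracleNumber, so the week keeps a real date instead of a bare number.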