I have a dataset with sales per customer, per month. I have both a date field (e.g. June 2018) and a "month counter" which gives each month a progressive number (e.g., if data starts in Jan 2018, Jan 2018 is "1", Dec 2018 is "12", and Jan 2019 is "13").
Please see the image: the first four columns are a sample of the data I have.
I'd like, for each month and each customer, to sum the sales of the previous 6 months and of the next 6 months, like in the last 2 columns in the attached image.
For instance, for month 1 and customer "John", I'd like to sum the sales for months 2, 3, 4, 5, 6 and 7, looking only at "John". This would be the "Next 6 months sales" for John in month 1. The same logic in reverse gives the last 6 months' sales.
I tried building a for loop and building some functions, but I didn't quite manage to build anything like what I need.
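One possible approach, sketched under assumptions: the column names `customer`, `month_counter` and `sales` are hypothetical, and each customer is assumed to have exactly one row per month with no gaps. Within each customer, a shifted rolling sum gives the previous six months, and the same operation on the reversed series gives the next six.

```python
import pandas as pd

# Hypothetical sample; the column names are assumptions, not taken from the question.
df = pd.DataFrame({
    "customer": ["John"] * 8 + ["Mary"] * 8,
    "month_counter": list(range(1, 9)) * 2,
    "sales": [10, 20, 30, 40, 50, 60, 70, 80, 1, 2, 3, 4, 5, 6, 7, 8],
})

def add_window_sums(data, window=6):
    """Add per-customer sums of sales over the previous and next `window` months.

    Assumes one row per customer per month, with no missing months.
    """
    data = data.sort_values(["customer", "month_counter"]).copy()
    sales = data.groupby("customer")["sales"]
    # shift(1) excludes the current month; rolling then sums up to `window` earlier rows.
    data[f"prev_{window}m_sales"] = sales.transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).sum()
    ).fillna(0)
    # Same idea on the reversed series, flipped back, to look forward in time.
    data[f"next_{window}m_sales"] = sales.transform(
        lambda s: s[::-1].shift(1).rolling(window, min_periods=1).sum()[::-1]
    ).fillna(0)
    return data

result = add_window_sums(df)
print(result)
```

If some months are missing for a customer, the rows would first need to be reindexed to a complete range of month counters, otherwise the window counts rows rather than months.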
I am trying to create two plots with data similar to the df I created below. Sr no. is the serial number of each publication, from which the total number of publications each year can be counted. For example, there are 4 publications in total in 2022, 2 in 2021, and 6 in 2020.
I want:
In the first plot: 'total number of publications per year' and 'total citations per year'. The x-axis is the year, the left y-axis is the number of publications each year, and the right y-axis is the total citations per year, with a bar graph for publications and a line/dot graph for citations.
In the second plot: 'total number of publications per year', 'mean total citations per year', and 'mean total citations per article'. The x-axis is the year, the left y-axis is the number of publications / mean total citations per article, and the right y-axis is the mean total citations per year.
Example plots of what I want for this data are posted below:
pub vs citation per year
pub and citation history
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv("cancer.csv")
Data
Sr Year Cited by
1 2022 5
2 2022 2
3 2022 7
4 2022 3
5 2021 5
6 2021 25
7 2020 23
8 2020 16
9 2020 1
10 2020 3
11 2020 23
12 2020 3
I tried the groupby command as follows:
figure = df.groupby(['Year'])['Cited by'].mean()
But I am not sure how to continue to generate the graphs like in the example above. Any help will be highly appreciated.
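A minimal sketch of the first plot, assuming the sample data above and using `twinx()` for the second y-axis. The aggregate column names (`publications`, `total_citations`, `mean_citations_per_article`) are made up for illustration. The second plot would follow the same pattern once "mean total citation per year" is defined, which the question leaves open.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data copied from the question.
df = pd.DataFrame({
    "Sr": range(1, 13),
    "Year": [2022] * 4 + [2021] * 2 + [2020] * 6,
    "Cited by": [5, 2, 7, 3, 5, 25, 23, 16, 1, 3, 23, 3],
})

# One row per year: publication count, total citations, mean citations per article.
per_year = df.groupby("Year")["Cited by"].agg(
    publications="count",
    total_citations="sum",
    mean_citations_per_article="mean",
)

fig, ax_left = plt.subplots()
ax_left.bar(per_year.index, per_year["publications"], color="tab:blue")
ax_left.set_xlabel("Year")
ax_left.set_ylabel("Number of publications")
ax_left.set_xticks(per_year.index)

# twinx() creates a second y-axis that shares the same x-axis.
ax_right = ax_left.twinx()
ax_right.plot(per_year.index, per_year["total_citations"], color="tab:red", marker="o")
ax_right.set_ylabel("Total citations")

fig.tight_layout()
```

Calling `plt.show()` afterwards displays the figure in an interactive session; `fig.savefig(...)` writes it to a file instead.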
I am trying to summarize flight accident information in charts. The csv file contains the airline, the year of the accident and a bunch of other things. I want one chart that adds up all the incidents by year, and another that adds them up by year and by airline:
First chart desirable outcome:
year incidents
2012 11
2013 12
Second chart desirable outcome:
year incidents Airline
2011 23 United
2011 20 Hawaii
2011 30 United
I tried to use dt.year but it's not working, because the year in the csv is stored as a plain number (2018, 2019) rather than a full date such as 2018-10-12, so I cannot treat it as date information.
Try:
import matplotlib.pyplot as plt
# Per year (value_counts sorts by count, so sort_index restores chronological order)
df.value_counts('year').sort_index().plot()
# Per year, for each company
df.value_counts(['year', 'Airline']).unstack('Airline').plot(kind='bar')
plt.show()
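Note that `value_counts` counts rows, which fits if every row is one incident. If the file instead has a numeric incidents column, as the desired tables suggest, a `groupby` sum may be closer to what is wanted. A sketch, assuming hypothetical column names `year`, `Airline` and `incidents` and made-up values:

```python
import pandas as pd

# Made-up data; the column names are assumed from the tables in the question.
df = pd.DataFrame({
    "year": [2011, 2011, 2011, 2012, 2012],
    "Airline": ["United", "Hawaii", "United", "United", "Hawaii"],
    "incidents": [23, 20, 30, 5, 6],
})

# The year is already a plain integer, so no .dt accessor is needed.
by_year = df.groupby("year", as_index=False)["incidents"].sum()
by_year_airline = df.groupby(["year", "Airline"], as_index=False)["incidents"].sum()

print(by_year)
print(by_year_airline)
```

Either result can then be plotted, for example with `by_year.plot(x="year", y="incidents", kind="bar")`.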
I have monthly and daily weather data that runs from 1981 to 2018, and I want to estimate data for 2019 and 2020 based on the previous years. How can I do this using Python?
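One simple baseline, offered as a sketch rather than a real forecast, is a monthly climatology: the estimate for each month of 2019 and 2020 is the average of that calendar month over 1981 to 2018. The `date` and `temperature` columns and the synthetic values below are assumptions standing in for the actual data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the monthly history; the real data would be loaded instead.
dates = pd.date_range("1981-01-01", "2018-12-01", freq="MS")
history = pd.DataFrame({
    "date": dates,
    "temperature": 20 + 5 * np.sin(2 * np.pi * (dates.month.to_numpy() - 1) / 12),
})

# Average of each calendar month (1..12) across all historical years.
monthly_mean = history.groupby(history["date"].dt.month)["temperature"].mean()

# Reuse those averages as the estimate for every month of 2019 and 2020.
future_dates = pd.date_range("2019-01-01", "2020-12-01", freq="MS")
estimate = pd.DataFrame({
    "date": future_dates,
    "temperature": pd.Series(future_dates.month).map(monthly_mean).to_numpy(),
})

print(estimate.head())
```

This ignores any long-term trend and year-to-year variation, so a dedicated time-series model may be more appropriate if those matter.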
I have a pandas dataframe that has a datetime column called date.
How can I create a new column to represent the Australian financial year using the date column?
The Australian financial year starts on 1 July and ends the next year on 30 June.
Example 1: 10 June 2019 is FY 2019
Example 2: 5 July 2019 is FY 2020
The code below creates a new column representing Australian financial year using the existing 'date' column:
df['FY'] = df['date'].map(lambda d: d.year + 1 if d.month > 6 else d.year)
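A vectorized variant of the same rule, which avoids calling a Python lambda per row, might look like this. The two dates are the examples from the question.

```python
import pandas as pd

# The two example dates from the question.
df = pd.DataFrame({"date": pd.to_datetime(["2019-06-10", "2019-07-05"])})

# Months July to December belong to the financial year ending the following June.
df["FY"] = df["date"].dt.year + (df["date"].dt.month >= 7).astype(int)

print(df)
```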
I have a housing market dataset categorized by U.S. counties, with columns such as total_homes_sold. I'm trying to show a comparison of housing sales YoY (e.g. Jan 2020 vs. Jan 2019) and by county (e.g. Aberdeen Mar 2020 vs. Suffolk Mar 2020). However, I'm not sure how to group the dates, as they are not organized by month (Jan, Feb, Mar etc.) but by 4-week intervals: period_begin and period_end.
Intervals between years vary. The period_begin for Aberdeen (around Jan) for 2019 might be 1/7 to 2/3 but 1/6 to 2/2 for 2020 (image shown below).
I tried using count (code below) to label each 4-week period with a number, thinking I could compare Aberdeen 2017-1 to Aberdeen 2020-1 (1 being the first time interval), but realized that some years have more 4-week periods than others for some regions (2017 has 13 whereas 2018 has 14).
df['count'] = df.groupby((everyfourth['region_name'] != df['region_name'].shift(1)).cumsum()).cumcount()+1
Any ideas on what code I could use to closely categorize these two columns into month-like periods?
Snippet of Dataset here
Let me know if you have any questions. Not sure I made sense! Thanks.
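One way to get month-like periods, sketched under assumptions, is to assign each 4-week interval to the calendar month that contains its midpoint. The column names come from the question, but the rows below are made up, with dates matching the Aberdeen example.

```python
import pandas as pd

# Made-up rows; only the column names and the Aberdeen date pattern come from the question.
df = pd.DataFrame({
    "region_name": ["Aberdeen"] * 4,
    "period_begin": pd.to_datetime(["2019-01-07", "2019-02-04", "2020-01-06", "2020-02-03"]),
    "period_end": pd.to_datetime(["2019-02-03", "2019-03-03", "2020-02-02", "2020-03-01"]),
    "total_homes_sold": [100, 110, 120, 130],
})

# Label each interval with the year and month that contain its midpoint.
midpoint = df["period_begin"] + (df["period_end"] - df["period_begin"]) / 2
df["year"] = midpoint.dt.year
df["month"] = midpoint.dt.month

# Averaging keeps a month that happens to receive two intervals comparable to the others.
monthly = df.groupby(["region_name", "year", "month"], as_index=False)["total_homes_sold"].mean()

# One column per year makes the year-over-year comparison a simple lookup.
yoy = monthly.pivot_table(index=["region_name", "month"], columns="year", values="total_homes_sold")

print(yoy)
```

Because a year holds 13 or 14 four-week periods, some months will occasionally receive two intervals. Whether to average or sum them is a judgment call that depends on how the comparison will be read.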