I have monthly and daily weather data from 1981 to 2018, but I also want data for 2019 and 2020, estimated from the previous years. How can I generate data for 2019 and 2020 using Python?
Related
I know the year-on-year inflation rates for the past 5yrs. But I want to derive another column containing compounded inflation relative to the current year.
To illustrate, I have the below table where compound_inflation_to_2022 is the product of all yoy_inflation instances from each year prior to 2022.
So, for 2021 this is simply 2021's yoy_inflation rate.
For 2020 the compound rate is 2020 x 2021.
For 2019 the compound rate is 2019 x 2020 x 2021, and so on.
year  yoy_inflation  compound_inflation_to_2022
2021  1.048          1.048
2020  1.008          1.056
2019  1.014          1.071
2018  1.02           1.093
2017  1.027          1.122
2016  1.018          1.142
Does anyone have an elegant solution for calculating this compound inflation column in python?
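Pandas has a method called .cumprod() (available on both Series and DataFrame) that should help here. It multiplies the values cumulatively from top to bottom, so the rows need to be sorted from the most recent year downwards, as in your table.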
df['compound_inflation_to_2022'] = df['yoy_inflation'].cumprod()
I hope this was what you were looking for ^_^
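A minimal sketch of the above, using the figures from the question's table. It assumes the rows may arrive in any order, so it sorts by year in descending order before taking the cumulative product; the rounding to 3 decimals is only there to match the table.

```python
import pandas as pd

# Year-on-year inflation factors copied from the question's table
df = pd.DataFrame({
    "year": [2016, 2017, 2018, 2019, 2020, 2021],
    "yoy_inflation": [1.018, 1.027, 1.02, 1.014, 1.008, 1.048],
})

# cumprod() runs top to bottom, so put the most recent year first
df = df.sort_values("year", ascending=False).reset_index(drop=True)

# Running product of the factors from 2021 back to each year
df["compound_inflation_to_2022"] = df["yoy_inflation"].cumprod().round(3)

print(df)
```

If the frame has to stay in ascending year order, the same result can be obtained by reversing the column, taking the cumulative product, and reversing back.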
I am trying to summarize a table of flight accident information. The CSV file lists the airline, the year of the accident and a bunch of other things. I want one chart that adds up all the incidents by year, and another that adds them up by year and by airline:
Desired outcome for the first chart:
year  incidents
2012  11
2013  12
Desired outcome for the second chart:
year  incidents  Airline
2011  23         United
2011  20         Hawaii
2011  30         United
I tried to use dt.year but it's not working, because the year column in the CSV holds plain years such as 2018 or 2019 rather than full dates such as 2018-10-12, so I cannot treat it as date information.
Try:
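import matplotlib.pyplot as plt
# Incidents per year (one row per incident); sort_index() puts the years in order
df.value_counts('year').sort_index().plot()
# Incidents per year, with one bar per airline
df.value_counts(['year', 'Airline']).unstack('Airline').sort_index().plot(kind='bar')
plt.show()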
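If the goal is the two summary tables rather than the plots, a groupby gives the same counts. This is a sketch on made-up rows; it assumes each row of the CSV is one incident and that the columns are named year and Airline, as in the answer above.

```python
import pandas as pd

# Hypothetical sample: one row per incident
df = pd.DataFrame({
    "year": [2011, 2011, 2011, 2012, 2012, 2013],
    "Airline": ["United", "Hawaii", "United", "United", "Hawaii", "Hawaii"],
})

# Table 1: number of incidents per year
per_year = df.groupby("year").size().reset_index(name="incidents")

# Table 2: number of incidents per year and airline
per_year_airline = (
    df.groupby(["year", "Airline"]).size().reset_index(name="incidents")
)

print(per_year)
print(per_year_airline)
```

Because the year column holds plain integers, no date parsing (and no dt.year) is needed.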
I have a dataset with sales per customer, per month. I have both a date field (e.g. June 2018) and a "month counter" which gives each month a progressive number (e.g., if data starts in Jan 2018, Jan 2018 is "1", Dec 2018 is "12", and Jan 2019 is "13").
Please see the image, the first 4 columns is a sample of the data I have.
I'd like, for each month and each customer, to sum the sales of the previous 6 months and of the next 6 months, like in the last 2 columns in the attached image.
For instance: for month 1 and customer "John", I'd like to sum sales for month 2,3,4,5,6,7, only looking at "John", this would be "Next 6 months sales" for John in month 1. Reverse logic for the last 6 months sales.
I tried building a for loop and building some functions, but I didn't quite manage to build anything like what I need.
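One possible approach, sketched on made-up data: shift the sales by one month within each customer and take a rolling sum, then do the same on the reversed series for the forward-looking window. The column names customer, month_counter and sales are assumptions, and the sketch assumes every customer has exactly one row per month with no gaps.

```python
import pandas as pd

# Hypothetical sample: two customers, eight consecutive months each
df = pd.DataFrame({
    "customer": ["John"] * 8 + ["Mary"] * 8,
    "month_counter": list(range(1, 9)) * 2,
    "sales": [10, 20, 30, 40, 50, 60, 70, 80] + [1] * 8,
})
df = df.sort_values(["customer", "month_counter"]).reset_index(drop=True)


def prev_sum(s, window=6):
    # shift(1) excludes the current month; rolling sums the months before it
    return s.shift(1).rolling(window, min_periods=1).sum()


def next_sum(s, window=6):
    # Same idea on the reversed series, then restore the original order
    return s[::-1].shift(1).rolling(window, min_periods=1).sum()[::-1]


grouped = df.groupby("customer")["sales"]
df["prev_6_months_sales"] = grouped.transform(prev_sum)
df["next_6_months_sales"] = grouped.transform(next_sum)

print(df)
```

With min_periods=1 the sum uses whatever months are available near the start and end of each customer's history; the first month has no previous sales and the last month has no next sales, so those cells are NaN.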
I have a housing market dataset categorized by U.S. counties, with columns such as total_homes_sold. I'm trying to compare housing sales year over year (e.g. Jan 2020 vs. Jan 2019) and by county (e.g. Aberdeen Mar 2020 vs. Suffolk Mar 2020). However, I'm not sure how to group the dates, because the rows are not organized by month (Jan, Feb, Mar etc.) but by 4-week intervals defined by period_begin and period_end.
The intervals vary between years. The period around January for Aberdeen might run from 1/7 to 2/3 in 2019 but from 1/6 to 2/2 in 2020 (image shown below).
I tried using a counter (code below) to label each 4-week period with a number (shown below), thinking I could compare Aberdeen 2017-1 to Aberdeen 2020-1 (1 being the first time interval), but realized that some years for some regions have more 4-week periods than others (2017 has 13 whereas 2018 has 14).
df['count'] = df.groupby((everyfourth['region_name'] != df['region_name'].shift(1)).cumsum()).cumcount()+1
Any ideas on what code I could use to closely categorize these two columns into month-like periods?
Snippet of Dataset here
Let me know if you have any questions. Not sure I made sense! Thanks.
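One way to get month-like labels, sketched under assumptions: assign each 4-week period to the calendar month that contains its midpoint, then group by region, year and month. The Aberdeen dates come from the example above; the Suffolk dates and all sales figures are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows using the column names from the question
df = pd.DataFrame({
    "region_name": ["Aberdeen", "Aberdeen", "Suffolk"],
    "period_begin": ["2019-01-07", "2020-01-06", "2020-03-02"],
    "period_end": ["2019-02-03", "2020-02-02", "2020-03-29"],
    "total_homes_sold": [100, 120, 90],
})
df["period_begin"] = pd.to_datetime(df["period_begin"])
df["period_end"] = pd.to_datetime(df["period_end"])

# Label each period by the month in which its midpoint falls
midpoint = df["period_begin"] + (df["period_end"] - df["period_begin"]) / 2
df["year"] = midpoint.dt.year
df["month"] = midpoint.dt.month

# Years with 13 or 14 periods put two periods into some month, so aggregate
monthly = (
    df.groupby(["region_name", "year", "month"])["total_homes_sold"]
      .sum()
      .reset_index()
)
print(monthly)
```

Whether to sum or average the periods that share a month is a judgment call that depends on how the comparison will be read.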
You can use this to create the dataframe:
import pandas as pd
dataframe = pd.DataFrame({'release': ['7 June 2013', '2012', '31 January 2013',
                                      'February 2008', '17 June 2014', '2013']})
I am trying to split the data and save it into 3 columns named day, month and year, using this command:
dataframe[['day','month','year']] = dataframe['release'].str.rsplit(expand=True)
The resulting dataframe shows that this works perfectly when a string has 3 parts, but whenever it has fewer than 3, the values end up in the wrong columns.
I have tried split and rsplit, both are giving the same result.
Any solution to get the data at the right place?
The year comes last and is present in every case, so it should be stored first; then the month if it is present, otherwise nothing; and the day in the same way.
You could reverse each split list before turning it into columns:
In [17]: dataframe[['year', 'month', 'day']] = dataframe['release'].apply(
    ...:     lambda x: pd.Series(x.split()[::-1]))
In [18]: dataframe
Out[18]:
           release  year     month  day
0      7 June 2013  2013      June    7
1             2012  2012       NaN  NaN
2  31 January 2013  2013   January   31
3    February 2008  2008  February  NaN
4     17 June 2014  2014      June   17
5             2013  2013       NaN  NaN
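Try reversing each split list before expanding it, so the year always comes first. (A DataFrame has no .reverse() method, and reversing after expand=True would move the padding to the front.)
dataframe[['year','month','day']] = dataframe['release'].str.split().str[::-1].apply(pd.Series)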
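For reference, a self-contained sketch of the same idea that builds the three columns in one go instead of creating a Series per row. The intermediate names parts and expanded are only illustrative.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "release": ["7 June 2013", "2012", "31 January 2013",
                "February 2008", "17 June 2014", "2013"],
})

# Split on whitespace, then reverse each list so the year is always first
parts = dataframe["release"].str.split().str[::-1]

# Shorter lists are padded with missing values on the right
expanded = pd.DataFrame(
    parts.tolist(), index=dataframe.index, columns=["year", "month", "day"]
)
dataframe = dataframe.join(expanded)

print(dataframe)
```

The values stay as strings; converting year and day to numbers would be a separate step.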