How to compute monthly averages for each station using pandas?

I have 30 years of data collected from 385 stations. I would like to calculate the monthly average for every year, separately for each station, and export the results to CSV files. I am very new to coding and don't know how to do this, so any help would be appreciated. Below is my code for one station; I need to produce a CSV file in the same way for all 385 stations.
#selective column only
ap= data[data["station_id"]=='C0A520']
ap=ap[['station_id','TEMP','YEAR','MONTH']]
grouped = ap.groupby(by=["YEAR","MONTH"])
monthly_mean = grouped.mean()
monthly_mean.head()
#export groupby
grouped.mean().reset_index().to_csv('D:/My_files/Research Progress/data/Temperature/final/coa520.csv')

I am assuming that your existing code works as intended and that you do not want to write the code for each of the 385 stations. This can be achieved in a simple for loop iterating over the station names:
for station in data["station_id"].unique():
    # selective column only
    ap = data[data["station_id"] == station]
    ap = ap[['station_id', 'TEMP', 'YEAR', 'MONTH']]
    grouped = ap.groupby(by=["YEAR", "MONTH"])
    # export groupby
    grouped.mean().reset_index().to_csv(f'D:/My_files/Research Progress/data/Temperature/final/{station}.csv')
(You did not use the monthly_mean variable, so I left it out.)
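As an alternative sketch, the same per-station tables can be built with a single groupby over station_id, YEAR and MONTH instead of filtering the frame once per station. The sample data below are made up for illustration; the column names follow the question, while monthly_means_by_station and out_dir are hypothetical names, not part of the original code.

```python
import pandas as pd

# Made-up sample standing in for the real 385-station dataset.
data = pd.DataFrame({
    "station_id": ["C0A520", "C0A520", "C0A520", "C0A530", "C0A530"],
    "YEAR":       [1990, 1990, 1990, 1990, 1990],
    "MONTH":      [1, 1, 2, 1, 1],
    "TEMP":       [10.0, 12.0, 14.0, 20.0, 22.0],
})


def monthly_means_by_station(df):
    """Return {station_id: frame with the mean TEMP per YEAR and MONTH}."""
    # One pass over the data: mean TEMP per station, year and month.
    means = df.groupby(["station_id", "YEAR", "MONTH"], as_index=False)["TEMP"].mean()
    # Split the aggregated frame into one small frame per station.
    return {
        station: part.drop(columns="station_id").reset_index(drop=True)
        for station, part in means.groupby("station_id")
    }


result = monthly_means_by_station(data)

# Exporting could then look like this (out_dir is a hypothetical folder):
# for station, frame in result.items():
#     frame.to_csv(f"{out_dir}/{station}.csv", index=False)
```

Selecting only the TEMP column before calling mean() also avoids averaging the text column station_id.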

Related

pandas computing new column as a average of other two conditions

I have a dataset of temperatures. Each row describes the temperature in Celsius, measured hourly.
I need to compute a new variable called avg_temp_ar_mensal, which represents the average temperature of a city in a month. In this dataset the city is represented as estacao and the month as mes.
I'm trying to do this using pandas. The following line of code is the one I'm trying to use to solve this problem:
df2['avg_temp_ar_mensal'] = df2['temp_ar'].groupby(df2['mes', 'estacao']).mean()
The goal of this code is to store in a new column the average of the temperature of the city and month. But it doesn't work. If I try the following line of code:
df2['avg_temp_ar_mensal'] = df2['temp_ar'].groupby(df2['mes']).mean()
It works, but the result is wrong: it averages over every city in the dataset, which I don't want because it adds noise to my data. I need to separate the temperatures by month and city and then calculate the mean.

The dataframe returned by groupby is smaller than the initial dataframe, which is why your code runs into an error.
There are two ways to solve this problem. The first is to use transform:
df.groupby(['mes', 'estacao'])['temp_ar'].transform(lambda g: g.mean())
The second is to create a new dataframe dfn from the groupby and then merge it back into df:
dfn = df.groupby(['mes', 'estacao'])['temp_ar'].mean().reset_index(name='average')
df = pd.merge(df, dfn, on=['mes', 'estacao'], how='left')
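A minimal sketch comparing the two approaches on made-up data (the values are invented; the column names follow the question). It passes the string "mean" to transform, which is a shorthand for the lambda used above.

```python
import pandas as pd

# Made-up hourly readings for two stations.
df = pd.DataFrame({
    "estacao": ["A", "A", "A", "B", "B"],
    "mes":     [1, 1, 2, 1, 1],
    "temp_ar": [20.0, 22.0, 30.0, 10.0, 14.0],
})

# Approach 1: transform returns one value per original row,
# so the result can be assigned directly as a new column.
df["avg_temp_ar_mensal"] = df.groupby(["mes", "estacao"])["temp_ar"].transform("mean")

# Approach 2: aggregate to one row per (mes, estacao), then merge back.
dfn = df.groupby(["mes", "estacao"])["temp_ar"].mean().reset_index(name="average")
merged = pd.merge(df, dfn, on=["mes", "estacao"], how="left")

print(merged)
```

Both routes give the same per-row averages; transform avoids the intermediate frame, while the merge route is handy when the aggregated table is needed on its own.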

You are calling groupby on a single column when you write df2['temp_ar'].groupby(...), so the grouping keys have to be passed in separately, and df2['mes', 'estacao'] does not select two columns (it raises a KeyError).
Instead, group the dataframe by the key columns and then select the temperature column. Also make sure that the result is a series with one value per original row, which is what transform returns; a plain mean() is indexed by the group keys and will not line up with df when assigned.
df['new_column'] = df.groupby(['city_column', 'month_column'])['temp_column'].transform('mean')
This should do the trick if I understand your dataset correctly. If not, please provide a reproducible version of your df

How can I take certain elements of a larger dataframe and build another dataframe from them in Python?

I am currently working on a project that uses a dataframe of almost 24000 basketball games from the years 2004-2021. What I want in the end is a single dataframe with only one row per year, where each column holds the mean for that category. What I have so far is a mask function that can separate the data by year, but I want a for loop that goes through the list of years, takes the mean for each, and concatenates the results into a new dataframe. The code might help explain this better.
# Now I want to separate this into datasets by year, so I'll write a function for that.
# In my original dataset "SEASON" is the year.
def mask(year):
    mask = stats['SEASON'] == year
    year_mask = stats[mask]
    return year_mask
How can I turn this into a loop that separates the data by year, finds the mean values of all columns in that year, and combines them into one dataframe with 18 rows spanning 2004-2021?

If you are using Pandas dataframes it's best to let pandas do the work for you.
I assume you want to calculate the mean of some category in your dataframe grouped by the year. To do this we can create a function like so:
def foo(df, category):
    return df.groupby(by=["year"])[category].mean()
If you want the mean of all the categories, just use:
df.groupby(by=["year"]).mean()
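A small sketch of the same idea using the column name from the question ("SEASON" rather than "year"). The data and the TEAM, PTS and AST columns are invented for illustration. It assumes the frame may contain non-numeric columns, so numeric_only=True is passed to skip them.

```python
import pandas as pd

# Made-up game statistics; TEAM is a non-numeric column.
stats = pd.DataFrame({
    "SEASON": [2004, 2004, 2005, 2005],
    "TEAM":   ["X", "Y", "X", "Y"],
    "PTS":    [100, 90, 110, 95],
    "AST":    [20, 22, 25, 21],
})

# One row per season, each numeric column replaced by its mean.
season_means = stats.groupby("SEASON").mean(numeric_only=True).reset_index()

print(season_means)
```

With the full dataset this would yield one row per season, so no explicit loop or concatenation is needed.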

How do I create new pandas dataframe by grouping multiple variables?

I am having tremendous difficulty getting my data sorted. I'm at the point where I could have manually created a new .csv file in the time I have spent trying to figure this out, but I need to do this through code. I have a large dataset of baseball salaries by player going back 150 years.
This is what my dataset looks like.
I want to create a new dataframe that adds the individual player salaries for a given team for a given year, organized by team and by year. Using the following technique I have come up with this: team_salaries_groupby_team = salaries.groupby(['teamID','yearID']).agg({'salary' : ['sum']}), which outputs this: my output. On screen it looks sort of like what I want, but I want a dataframe with three columns (plus an index on the left). I can't really do the sort of analysis I want to do with this output.
Lastly, I have also tried this method:
new_column = salaries['teamID'] + salaries['yearID'].astype(str)
salaries['teamyear'] = new_column
salaries
teamyear = salaries.groupby(['teamyear']).agg({'salary' : ['sum']})
print(teamyear)
This gives another output. It adds the individual player salaries per team for a given year, but now I don't know how to separate the year and put it into its own column. Help please?

You just need to call reset_index().
Here is some sample code:
import pandas as pd

# Build the sample frame in one step (DataFrame.append was removed in pandas 2.0).
rows = [
    (1985, 'ATL', 'NL', 'A', 10000),
    (1985, 'ATL', 'NL', 'B', 20000),
    (1985, 'ATL', 'NL', 'A', 10000),
    (1985, 'ATL', 'NL', 'C', 5000),
    (1985, 'ATL', 'NL', 'B', 20000),
    (2016, 'ATL', 'NL', 'A', 100000),
    (2016, 'ATL', 'NL', 'B', 200000),
    (2016, 'ATL', 'NL', 'C', 50000),
    (2016, 'ATL', 'NL', 'A', 100000),
    (2016, 'ATL', 'NL', 'B', 200000),
]
salaries = pd.DataFrame(rows, columns=['yearID', 'teamID', 'igID', 'playerID', 'salary'])
After that, group by team and year and reset the index:
sample_df = salaries.groupby(['teamID', 'yearID']).salary.sum().reset_index()
Is this what you are looking for?
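As a variation, here is a sketch that gets flat columns straight from the groupby by passing as_index=False and using named aggregation. The data are made up, and the output column name total_salary is an illustrative choice, not something from the question.

```python
import pandas as pd

# Made-up salaries for two teams.
salaries = pd.DataFrame({
    "yearID": [1985, 1985, 2016, 2016, 2016],
    "teamID": ["ATL", "ATL", "ATL", "BOS", "BOS"],
    "salary": [10000, 20000, 100000, 5000, 7000],
})

# as_index=False keeps teamID and yearID as ordinary columns, and the
# named aggregation yields a single-level column instead of ('salary', 'sum').
team_salaries = salaries.groupby(["teamID", "yearID"], as_index=False).agg(
    total_salary=("salary", "sum")
)

print(team_salaries)
```

The result has exactly three columns plus the default integer index, which is the shape asked for in the question.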

Conditional Average from Pandas DataFrame

I have a dataframe with multiple columns of real estate sales data. I would like to find the average price-per-square-foot 'ppsf' for all 1bed-1bath sales by zip code. Here is my attempt (each key in the dict is a zip code):
bed1_bath1={}
for zip in zip_codes:
    bed1_bath1[zip] = (df.loc[(df['bed']==1) & (df['bath']==1) & (df['zip']==zip)]).mean()
The problem is that this adds the mean of all columns from the dataframe to the dictionary. I'm sure there is a better way to do this; maybe using numpy.where?

(df[(df['bed']==1) & (df['bath']==1) & (df['zip']==zip)])['ppsf'].mean() would do it. You simply choose the column you are interested in before calculating the mean (so you will not even do the processing for the rest of the columns).
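The loop over zip codes can also be replaced by a groupby. The following is a sketch on invented data, assuming the column names from the question (bed, bath, zip, ppsf); one_by_one is a hypothetical variable name.

```python
import pandas as pd

# Made-up sales records.
df = pd.DataFrame({
    "zip":  [10001, 10001, 10002, 10002, 10002],
    "bed":  [1, 1, 1, 2, 1],
    "bath": [1, 1, 1, 1, 1],
    "ppsf": [500.0, 700.0, 300.0, 999.0, 500.0],
})

# Keep only the 1-bed, 1-bath sales, then average ppsf per zip code.
one_by_one = df[(df["bed"] == 1) & (df["bath"] == 1)]
bed1_bath1 = one_by_one.groupby("zip")["ppsf"].mean().to_dict()

print(bed1_bath1)
```

Zip codes with no 1-bed, 1-bath sales simply do not appear in the dictionary.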

How to calculate based on multiple conditions using Python data frames?

I have an Excel data file with thousands of rows and columns.
I am using Python and have started using pandas dataframes to analyze the data.
What I want to do in column D is calculate the annual change of the values in column C, for each year and each ID.
I can do this in Excel: if the org ID is the same as that in the prior row, calculate the annual change (leaving the cells highlighted in blue empty, because that's the first period for that particular ID). I don't know how to do this using Python. Can anyone help?

Assuming the dataframe is already sorted:
df.groupby('ID').Cash.pct_change()
However, you can speed things up by relying on that sort order, because it is not necessary to group in order to calculate the percentage change from one row to the next. Compute the change over the whole column and mask the first row of each ID:
df.Cash.pct_change().mask(
    df.ID != df.ID.shift()
)
Both should produce the column values you are looking for. To add the column, assign the result to a column or create a new dataframe with the new column:
df['AnnChange'] = df.groupby('ID').Cash.pct_change()
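A short sketch checking that the grouped and the masked versions agree on made-up data. The Year column and the values are assumptions added for illustration; ID and Cash follow the answer.

```python
import pandas as pd

# Made-up yearly cash values for two organisations.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 2, 2],
    "Year": [2019, 2020, 2021, 2019, 2020],
    "Cash": [100.0, 110.0, 121.0, 50.0, 75.0],
})

# Both approaches rely on rows being ordered by ID and then by year.
df = df.sort_values(["ID", "Year"])

# Grouped version: the first row of each ID has no previous value, so it is NaN.
grouped_change = df.groupby("ID")["Cash"].pct_change()

# Masked version: change over the whole column, blanked where the ID switches.
masked_change = df["Cash"].pct_change().mask(df["ID"] != df["ID"].shift())

df["AnnChange"] = grouped_change
print(df)
```

The first row of each ID stays empty (NaN) in both versions, matching the blank cells described in the question.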
