How to change the index to something else on pandas

date = pd.DatetimeIndex(df['Release Date']).to_period("M")
per = df['Release Date'].dt.to_period("M")
g = df.groupby(per)
I want to set something different as the index, but I also want my data grouped by month of the year so that I can plot months against quantities sold. I don't know how to do this. Please help!

date.groupby(data['Release Date'].map(lambda x: x.month))
The given code is poorly formatted, so I cannot tell what the actual dataframe looks like. Please use the code sample format.
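Since the actual dataframe isn't shown, here is only a rough sketch of the month grouping the question describes. The 'Release Date' column name comes from the question; the 'Quantity' column and the sample values are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the 'Release Date' column name comes from the question.
df = pd.DataFrame({
    "Release Date": pd.to_datetime(
        ["2021-01-05", "2021-01-20", "2021-02-11", "2021-03-02", "2021-03-30"]
    ),
    "Quantity": [10, 5, 7, 3, 4],  # assumed column name for quantities sold
})

# Group by year-month period and total the quantities in each month.
monthly = df.groupby(df["Release Date"].dt.to_period("M"))["Quantity"].sum()

# The grouping key becomes the index of the result, so no separate set_index call is needed.
print(monthly)
```

If matplotlib is installed, calling monthly.plot(kind="bar") on the result should draw one bar per month.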

Related

How to create a line plot using the mean of a column and extracting year from a Date column?

Update: I've now managed to solve this. For extracting the year this is what I used,
df['year'] = pd.DatetimeIndex(df['Date']).year
this allowed me to add a new column for the year and then use that column to plot the chart.
sns.lineplot(y="Class", x="year", data=df)
plt.xlabel("Year",fontsize=20)
plt.ylabel("Success Rate",fontsize=20)
plt.show()
Now I get the right chart.
I'm trying to get a line plot using the mean of a column, linked to a value (the year) extracted from the date column. However, I can't seem to get the right outcome.
Here's how I extracted the year from the date column:
year=[]
def Extract_year(date):
    for i in df["Date"]:
        year.append(i.split("-")[0])
    return year
And here's how I plotted the values to create a line plot:
sns.lineplot(y=df['Class'].mean(), x=Extract_year(df))
plt.xlabel("Year",fontsize=20)
plt.ylabel("Success Rate",fontsize=20)
plt.show()
But instead of showing a trend (see screenshot-1), it only displays a straight line at the mean value (see screenshot-2). Could someone please explain what I am doing wrong and how I can correct it?
Thanks!

What you are plotting is df['Class'].mean(), which is a single fixed value, so the line is flat. I don't know which date format you're using, but you probably need to calculate a separate mean for each year.
EDIT:
Yes there is:
df = pd.DataFrame({'Date':['2020-01-20','2019-01-20','2022-01-20','2021-01-20','2012-01-20','2013-01-20','2016-01-20','2018-01-20']})
years = pd.to_datetime(df['Date'],format='%Y-%m-%d').dt.year.sort_values().tolist()
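To make the "different means for different years" idea concrete, here is a minimal sketch. The 'Date' and 'Class' column names come from the question; the sample values are invented, and the date format is assumed to be YYYY-MM-DD.

```python
import pandas as pd

# Hypothetical data; 'Date' and 'Class' are the column names used in the question.
df = pd.DataFrame({
    "Date": ["2019-03-01", "2019-07-15", "2020-01-20", "2020-06-30", "2021-02-10"],
    "Class": [0, 1, 1, 1, 0],
})

# Parse the strings as dates (format assumed) and pull out the year.
df["year"] = pd.to_datetime(df["Date"], format="%Y-%m-%d").dt.year

# One mean per year instead of a single overall mean.
yearly_mean = df.groupby("year")["Class"].mean()
print(yearly_mean)
```

The index of yearly_mean holds the years and the values hold the per-year success rate, so the two can be passed as x and y to a line plot.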

Calculating number of transaction that occur in each month in pandas

example data
I've been given a set of transaction data gathered over 3 years. I need to count the number of transactions that occur each month and identify which month and year have more than 300 transactions.
I tried using this, but I don't know how else I can do it.
Can you help me please?
The attached image shows an example of the data I want to process.
df['Transaction_date'].value_counts()

You need to preprocess your data further so that you can group by year and month. The question would need more detail for a specific answer, so this one is general. It assumes Transaction_date already has a datetime dtype, which the .dt accessor requires:
df['year'] = df['Transaction_date'].dt.year
df['month'] = df['Transaction_date'].dt.month
df.groupby(['year','month']).size()
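For the second half of the question (finding the months above a threshold), the counts can be filtered with a boolean mask. This is only a sketch: the sample rows are made up, and the threshold is lowered from 300 to 2 so the tiny sample produces a result.

```python
import pandas as pd

# Hypothetical sample; only the 'Transaction_date' column name comes from the question.
df = pd.DataFrame({
    "Transaction_date": [
        "2020-01-03", "2020-01-17", "2020-01-29", "2020-02-08", "2021-01-05",
    ],
})

# The .dt accessor only works on datetime columns, so convert first.
df["Transaction_date"] = pd.to_datetime(df["Transaction_date"])

# Count transactions per (year, month) pair.
counts = df.groupby(
    [
        df["Transaction_date"].dt.year.rename("year"),
        df["Transaction_date"].dt.month.rename("month"),
    ]
).size()

threshold = 2  # the question uses 300; lowered here to suit the tiny sample
busy_months = counts[counts > threshold]
print(busy_months)
```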

How do I create new pandas dataframe by grouping multiple variables?

I am having tremendous difficulty getting my data sorted. I'm at the point where I could have manually created a new .csv file in the time I have spent trying to figure this out, but I need to do this through code. I have a large dataset of baseball salaries by player going back 150 years.
This is what my dataset looks like.
I want to create a new dataframe that adds the individual player salaries for a given team for a given year, organized by team and by year. Using the following technique I have come up with this:
team_salaries_groupby_team = salaries.groupby(['teamID','yearID']).agg({'salary' : ['sum']})
which outputs this: my output. On screen it looks sort of like what I want, but I want a dataframe with three columns (plus an index on the left). I can't really do the sort of analysis I want to do with this output.
Lastly, I have also tried this method:
new_column = salaries['teamID'] + salaries['yearID'].astype(str)
salaries['teamyear'] = new_column
salaries
teamyear = salaries.groupby(['teamyear']).agg({'salary' : ['sum']})
print(teamyear)
Another output. It adds the individual player salaries per team for a given year, but now I don't know how to separate the year and put it into its own column. Help please?

You just need to call reset_index().
Here is some sample code:
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows instead
rows = [
    (1985, 'ATL', 'NL', 'A', 10000),
    (1985, 'ATL', 'NL', 'B', 20000),
    (1985, 'ATL', 'NL', 'A', 10000),
    (1985, 'ATL', 'NL', 'C', 5000),
    (1985, 'ATL', 'NL', 'B', 20000),
    (2016, 'ATL', 'NL', 'A', 100000),
    (2016, 'ATL', 'NL', 'B', 200000),
    (2016, 'ATL', 'NL', 'C', 50000),
    (2016, 'ATL', 'NL', 'A', 100000),
    (2016, 'ATL', 'NL', 'B', 200000),
]
salaries = pd.DataFrame(rows, columns=['yearID','teamID','igID','playerID','salary'])
After that, group by team and year and call reset_index:
sample_df = salaries.groupby(['teamID', 'yearID']).salary.sum().reset_index()
Is this what you are looking for?
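An alternative that should give the same three-column shape is to pass as_index=False to groupby, which keeps the grouping keys as ordinary columns. A minimal sketch, with invented rows in the shape the question describes:

```python
import pandas as pd

# Hypothetical rows; column names follow the question, values are made up.
salaries = pd.DataFrame({
    "yearID": [1985, 1985, 2016, 2016],
    "teamID": ["ATL", "ATL", "ATL", "ATL"],
    "salary": [10000, 20000, 100000, 200000],
})

# as_index=False keeps teamID and yearID as regular columns instead of a MultiIndex.
team_salaries = salaries.groupby(["teamID", "yearID"], as_index=False)["salary"].sum()
print(team_salaries)
```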

Is there function for computing percentage change for each month in dataframe python

I have a dataframe (df) table as follows (imported from excel)
From this, I would like to obtain percentage change for 'Total' based on month.
e.g.
((Total for 2020-01 - Total for 2019-01) / Total for 2019-01) * 100
((Total for 2020-02 - Total for 2019-02) / Total for 2019-02) * 100 etc.
For now, I have come up with following function.
def TotalChange(last_year, current_year):
    return ((float(current_year) - last_year) / last_year) * 100
Are there any functions which will automate this pattern and how I can derive those figures and put them in a table?

Try:
df['monthly_perc'] = df['Total']/ df.groupby(df['Date'].dt.to_period('M'))['Total'].transform('sum')
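Note that the line above gives each row's share of its own month's total rather than a change between years. If the goal is the year-over-year formula from the question, one possible sketch is to shift within each calendar month. It assumes one row per month, a datetime 'Date' column, and no missing years; the sample numbers are invented.

```python
import pandas as pd

# Hypothetical monthly totals; 'Date' and 'Total' are the column names used in the answer.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-01", "2019-02-01", "2020-01-01", "2020-02-01"]),
    "Total": [100.0, 200.0, 150.0, 180.0],
})

df = df.sort_values("Date")

# Within each calendar month, fetch the previous year's total for the same month.
previous = df.groupby(df["Date"].dt.month)["Total"].shift()

# Same formula as TotalChange: (current - last) / last * 100.
df["yoy_pct"] = (df["Total"] - previous) / previous * 100
print(df)
```

The first year has nothing to compare against, so its rows come out as NaN.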

How to sort bars in a Pandas Histogram

I have a problem with the following code.
databisclose.loc[:,"Close Month Only"]=databisclose.loc[:,"Close Month"].dt.month
serie = databisclose.loc[:,"Close Month Only"].value_counts()
serie.plot(kind='bar')
Databisclose is a dataframe.
The output is the following histogram :
Histogram
I would like to sort the columns in the month normal order (1,2,3,4..).
Do you know how I can do that ?
Thanks for your help, and don't hesitate to tell me if something is not understandable, it's the first time I ask a question !

Just update this line by adding a parameter that turns off sorting by count (it is True by default):
serie = databisclose.loc[:,"Close Month Only"].value_counts(sort = False)
More about this function in the docs
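As far as I can tell, sort=False only stops value_counts from ordering by frequency and does not promise month order, so a more explicit option is to sort the result by its index. A small sketch with made-up dates standing in for databisclose:

```python
import pandas as pd

# Hypothetical data standing in for databisclose; the column names follow the question.
databisclose = pd.DataFrame({
    "Close Month": pd.to_datetime(
        ["2021-03-10", "2021-01-05", "2021-03-22", "2021-02-14", "2021-01-30", "2021-03-01"]
    ),
})

databisclose["Close Month Only"] = databisclose["Close Month"].dt.month

# Count rows per month, then order by month number rather than by frequency.
serie = databisclose["Close Month Only"].value_counts().sort_index()
print(serie)
```

Calling serie.plot(kind='bar') on this result should then draw the bars in month order (1, 2, 3, ...).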
