Update: I've now managed to solve this. For extracting the year this is what I used,
df['year'] = pd.DatetimeIndex(df['Date']).year
This allowed me to add a new column for the year and then use that column to plot the chart.
sns.lineplot(y="Class", x="year", data=df)
plt.xlabel("Year",fontsize=20)
plt.ylabel("Success Rate",fontsize=20)
plt.show()
Now I get the right chart.
I'm trying to get a line plot of the mean of a column against a value (the year) extracted from the date column. However, I can't seem to get the right outcome.
Here's how I extracted the Year value from the date column,
year=[]
def Extract_year(date):
    for i in df["Date"]:
        year.append(i.split("-")[0])
    return year
And here's how I plotted the values to create a line plot,
sns.lineplot(y=df['Class'].mean(), x=Extract_year(df))
plt.xlabel("Year",fontsize=20)
plt.ylabel("Success Rate",fontsize=20)
plt.show()
But instead of showing a trend (see screenshot-1), it only displays a straight line (see screenshot-2) at the mean value. Could someone please explain what I am doing wrong and how I can correct it?
Thanks!
What you are plotting is df['Class'].mean(), which is of course a single fixed value. I don't know which date format you're using, but you probably need to calculate a separate mean for each year.
EDIT:
Yes there is:
df = pd.DataFrame({'Date':['2020-01-20','2019-01-20','2022-01-20','2021-01-20','2012-01-20','2013-01-20','2016-01-20','2018-01-20']})
years = pd.to_datetime(df['Date'],format='%Y-%m-%d').dt.year.sort_values().tolist()
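The per-year means suggested in this answer could be sketched as follows. The data here is made up for illustration, and it assumes 'Class' holds 1 for a success and 0 for a failure, which the question does not state explicitly.

```python
import pandas as pd

# Hypothetical data: 'Class' is assumed to be 1 (success) or 0 (failure).
df = pd.DataFrame({
    "Date": ["2019-03-01", "2019-07-15", "2020-01-20", "2020-06-05", "2020-11-30"],
    "Class": [0, 1, 1, 1, 0],
})

# Extract the year from the date strings.
df["year"] = pd.to_datetime(df["Date"], format="%Y-%m-%d").dt.year

# One mean per year instead of a single mean for the whole column.
yearly_mean = df.groupby("year")["Class"].mean()
print(yearly_mean)
```

Plotting yearly_mean.index against yearly_mean.values then gives one point per year. Passing x="year", y="Class", data=df to sns.lineplot, as in the update above, leaves the same averaging to seaborn, whose default estimator is the mean.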
example data
I'm given a set of transaction data gathered over 3 years. I am required to count the number of transactions that occur each month and identify which months (and years) have more than 300 transactions.
I tried using this, but I don't know how else I can do it.
Can you help me please?
The attached image has an example of the data I want to process.
df['Transaction_date'].value_counts()
You need to preprocess your data further so that you can group by month and year. Please add more information to the question so that I can make this answer specific to your case; for now it is only general.
# Assumes Transaction_date can be parsed as dates; .dt only works on datetime columns.
df['Transaction_date'] = pd.to_datetime(df['Transaction_date'])
df['year'] = df['Transaction_date'].dt.year
df['month'] = df['Transaction_date'].dt.month
df.groupby(['year','month']).size()
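To go from the monthly counts to the months with more than 300 transactions, a filter on the grouped sizes should be enough. This is only a sketch on invented data; the column name n_transactions is my own choice, not something from the question.

```python
import pandas as pd

# Hypothetical data: 350 transactions in March 2020 and 120 in April 2020.
dates = ["2020-03-15"] * 350 + ["2020-04-02"] * 120
df = pd.DataFrame({"Transaction_date": pd.to_datetime(dates)})

# Count transactions per (year, month) and turn the result into a flat table.
counts = (
    df.groupby([
        df["Transaction_date"].dt.year.rename("year"),
        df["Transaction_date"].dt.month.rename("month"),
    ])
    .size()
    .reset_index(name="n_transactions")
)

# Keep only the months with more than 300 transactions.
busy = counts[counts["n_transactions"] > 300]
print(busy)
```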
I am having tremendous difficulty getting my data sorted. I'm at the point where I could have manually created a new .csv file in the time I have spent trying to figure this out, but I need to do this through code. I have a large dataset of baseball salaries by player going back 150 years.
This is what my dataset looks like.
I want to create a new dataframe that adds the individual player salaries for a given team for a given year, organized by team and by year. Using the following technique I have come up with this: team_salaries_groupby_team = salaries.groupby(['teamID','yearID']).agg({'salary' : ['sum']}), which outputs this: my output. On screen it looks sort of like what I want, but I want a dataframe with three columns (plus an index on the left). I can't really do the sort of analysis I want to do with this output.
Lastly, I have also tried this method:
new_column = salaries['teamID'] + salaries['yearID'].astype(str)
salaries['teamyear'] = new_column
salaries
teamyear = salaries.groupby(['teamyear']).agg({'salary' : ['sum']})
print(teamyear)
Another output
It adds the individual player salaries per team for a given year, but now I don't know how to separate the year and put it into its own column. Help please?
You just need to call reset_index() on the grouped result.
Here is some sample code:
import pandas as pd

# DataFrame.append was removed in pandas 2.0, so the sample rows are passed to the constructor instead.
rows = [
    (1985, 'ATL', 'NL', 'A', 10000),
    (1985, 'ATL', 'NL', 'B', 20000),
    (1985, 'ATL', 'NL', 'A', 10000),
    (1985, 'ATL', 'NL', 'C', 5000),
    (1985, 'ATL', 'NL', 'B', 20000),
    (2016, 'ATL', 'NL', 'A', 100000),
    (2016, 'ATL', 'NL', 'B', 200000),
    (2016, 'ATL', 'NL', 'C', 50000),
    (2016, 'ATL', 'NL', 'A', 100000),
    (2016, 'ATL', 'NL', 'B', 200000),
]
salaries = pd.DataFrame(rows, columns=['yearID','teamID','igID','playerID','salary'])
After that, group by team and year, sum the salaries, and call reset_index():
sample_df = salaries.groupby(['teamID', 'yearID']).salary.sum().reset_index()
Is this what you are looking for?
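If you would rather keep .agg(), as in the question, named aggregation together with as_index=False should also give a flat three-column frame instead of the two-level column header. This is a sketch on a shortened, made-up version of the data; total_salary is a column name I picked for illustration.

```python
import pandas as pd

# Hypothetical, shortened sample of the salaries table.
salaries = pd.DataFrame({
    "yearID": [1985, 1985, 2016, 2016],
    "teamID": ["ATL", "ATL", "ATL", "ATL"],
    "salary": [10000, 20000, 100000, 200000],
})

# as_index=False keeps teamID and yearID as ordinary columns;
# the keyword in agg() names the output column directly.
team_salaries = salaries.groupby(["teamID", "yearID"], as_index=False).agg(
    total_salary=("salary", "sum")
)
print(team_salaries)
```

Because the year stays in its own column, there is no need to build a combined teamyear key and split it apart again.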
I have a dataframe (df) table as follows (imported from excel)
From this, I would like to obtain the percentage change in 'Total' for each month compared with the same month of the previous year.
e.g.
((Total for 2020-01 - Total for 2019-01) / Total for 2019-01) * 100
((Total for 2020-02 - Total for 2019-02) / Total for 2019-02) * 100 etc.
For now, I have come up with the following function.
def TotalChange (last_year,current_year):
    return((float(current_year)-last_year)/(last_year))*100
Is there a function that will automate this pattern, and how can I derive those figures and put them in a table?
Try:
df['monthly_perc'] = df['Total']/ df.groupby(df['Date'].dt.to_period('M'))['Total'].transform('sum')
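Note that the line above gives each row's share of its own month's total. If what you want is the change against the same month of the previous year, as in the formula from the question, one possible sketch is to group by calendar month and compare each row with the previous row in its group. The data and the column name yoy_pct are invented here, and the sketch assumes one row per month with no missing years.

```python
import pandas as pd

# Hypothetical data: one 'Total' per month for two consecutive years.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-01", "2019-02-01", "2020-01-01", "2020-02-01"]),
    "Total": [100.0, 200.0, 150.0, 180.0],
})
df = df.sort_values("Date")

# Within each calendar month, shift(1) fetches the previous year's Total.
last_year = df.groupby(df["Date"].dt.month)["Total"].shift(1)

# Same formula as TotalChange in the question; the first year has no predecessor and stays NaN.
df["yoy_pct"] = (df["Total"] - last_year) / last_year * 100
print(df)
```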
I have a problem with the following code.
databisclose.loc[:,"Close Month Only"]=databisclose.loc[:,"Close Month"].dt.month
serie = databisclose.loc[:,"Close Month Only"].value_counts()
serie.plot(kind='bar')
Databisclose is a dataframe.
The output is the following histogram :
Histogram
I would like to sort the bars in normal month order (1, 2, 3, 4, ...).
Do you know how I can do that?
Thanks for your help, and don't hesitate to tell me if something is unclear; it's the first time I've asked a question!
Just update this line by adding a parameter that turns off sorting by count (it is True by default):
serie = databisclose.loc[:,"Close Month Only"].value_counts(sort = False)
More about this function in the docs
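One caveat: with sort=False the counts are no longer ordered by frequency, but that does not by itself guarantee the order 1, 2, 3, ... if the months appear in a different order in the data. Sorting the result by its index is a way to force calendar order. A small sketch with made-up dates:

```python
import pandas as pd

# Hypothetical data whose months do not appear in calendar order.
databisclose = pd.DataFrame({
    "Close Month": pd.to_datetime([
        "2021-03-05", "2021-01-10", "2021-03-20",
        "2021-02-14", "2021-01-30", "2021-03-25",
    ])
})
databisclose["Close Month Only"] = databisclose["Close Month"].dt.month

# Count occurrences per month, then order the result by month number.
serie = databisclose["Close Month Only"].value_counts().sort_index()
print(serie)

# serie.plot(kind='bar') would then draw the bars in month order.
```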