How to sort bars in a Pandas Histogram - python

I have a problem with the following code.
databisclose.loc[:,"Close Month Only"]=databisclose.loc[:,"Close Month"].dt.month
serie = databisclose.loc[:,"Close Month Only"].value_counts()
serie.plot(kind='bar')
databisclose is a DataFrame.
The output is the following histogram:
[Image: Histogram]
I would like to sort the bars in normal month order (1, 2, 3, 4, ...).
Do you know how I can do that?
Thanks for your help, and don't hesitate to tell me if something is unclear; it's the first time I've asked a question!

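By default, value_counts sorts the result by count (sort=True), which is why the bars are not in month order. Disable that sort and then sort by the index, which holds the month numbers:
serie = databisclose.loc[:,"Close Month Only"].value_counts(sort=False).sort_index()
Note that sort=False on its own only turns off the sort by count; it does not guarantee month order, so sort_index() is what actually orders the bars.
More about this function in the docs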
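A minimal sketch of ordering the counts by month number, assuming a small made-up set of dates in place of the real databisclose data. The plotting call is left as a comment because it needs matplotlib.

```python
import pandas as pd

# Made-up dates standing in for the real "Close Month" column.
databisclose = pd.DataFrame({
    "Close Month": pd.to_datetime([
        "2021-03-05", "2021-01-10", "2021-03-20",
        "2021-02-14", "2021-03-30", "2021-01-25",
    ])
})
databisclose["Close Month Only"] = databisclose["Close Month"].dt.month

# Count rows per month, then order the result by month number (the index).
serie = databisclose["Close Month Only"].value_counts().sort_index()
print(serie)

# serie.plot(kind="bar")  # bars now appear in the order 1, 2, 3
```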

Related

How to create a line plot using the mean of a column and extracting year from a Date column?

Update: I've now managed to solve this. To extract the year, I used:
df['year'] = pd.DatetimeIndex(df['Date']).year
This allowed me to add a new column for the year and then use that column to plot the chart.
sns.lineplot(y="Class", x="year", data=df)
plt.xlabel("Year",fontsize=20)
plt.ylabel("Success Rate",fontsize=20)
plt.show()
Now I get the right line plot.
I'm trying to get a line plot using the mean of a column, linked to a value (the year) extracted from the date column. However, I can't seem to get the right outcome.
Here's how I extracted the year from the date column:
year = []
def Extract_year(date):
    for i in df["Date"]:
        year.append(i.split("-")[0])
    return year
And here's how I plotted the values to create a line plot:
sns.lineplot(y=df['Class'].mean(), x=Extract_year(df))
plt.xlabel("Year",fontsize=20)
plt.ylabel("Success Rate",fontsize=20)
plt.show()
But instead of seeing a trend (see screenshot-1), it only displays a straight line (see screenshot-2) for the mean value. Could someone please explain to me, what I am doing wrong and how can I correct it?
Thanks!
What you are plotting is df['Class'].mean(), which is a single fixed value, so the line is flat. I don't know which date format you're using, but you probably need to calculate a separate mean for each year.
EDIT:
Yes there is:
df = pd.DataFrame({'Date':['2020-01-20','2019-01-20','2022-01-20','2021-01-20','2012-01-20','2013-01-20','2016-01-20','2018-01-20']})
years = pd.to_datetime(df['Date'],format='%Y-%m-%d').dt.year.sort_values().tolist()
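A short sketch of computing one mean per year, which is what produces a trend rather than a flat line. The sample rows are made up; only the column names Date and Class come from the question.

```python
import pandas as pd

# Made-up rows; "Class" is 1 for success and 0 for failure.
df = pd.DataFrame({
    "Date": ["2019-01-20", "2019-06-01", "2020-01-20", "2020-03-05"],
    "Class": [0, 1, 1, 1],
})

# Extract the year as an integer column.
df["year"] = pd.to_datetime(df["Date"], format="%Y-%m-%d").dt.year

# One mean per year instead of a single overall mean.
yearly = df.groupby("year")["Class"].mean()
print(yearly)
```

Plotting yearly.index against yearly.values then gives one point per year.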

ffill() does not fill the last year of observations

I have an issue with Python not filling the months of the latest observed year. Does anyone know how to expand the code below to get values for all the months of the whole year 2021?
I understand that our df does not extend past the last observation in 2021 and that ffill does exactly what I have asked it to do. But is there any way to lengthen the data by adding something that tells it to?
#to monthly scores
esg_data = esg_data.set_index("datadate")
esg_data = esg_data.groupby("conm").resample('M').ffill()
esg_data = esg_data.reset_index(level=0, drop=True).reset_index()
The output of the code is shown here:
[Image: Screenshot of the output]
Thanks in advance
Angela
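One possible approach, offered as a sketch rather than a confirmed fix: build a month-end date range for each company that runs to December of its last observed year, then reindex onto it with forward fill. The sample data, the score column and the helper fill_to_year_end are assumptions for illustration; the sketch also assumes each company has unique dates.

```python
import pandas as pd

# Made-up data; "score" is a hypothetical value column.
esg_data = pd.DataFrame({
    "conm": ["A", "A", "B"],
    "datadate": pd.to_datetime(["2020-12-31", "2021-03-31", "2021-06-30"]),
    "score": [1.0, 2.0, 3.0],
})

def fill_to_year_end(group):
    """Forward-fill one company's rows on month ends up to December of its last year."""
    g = group.set_index("datadate").sort_index()
    end = pd.Timestamp(year=g.index.max().year, month=12, day=31)
    months = pd.date_range(g.index.min(), end, freq=pd.offsets.MonthEnd())
    # method="ffill" carries the last observation forward onto each new date.
    return g.reindex(months, method="ffill")

parts = [fill_to_year_end(group) for _, group in esg_data.groupby("conm")]
monthly = pd.concat(parts).rename_axis("datadate").reset_index()
print(monthly.tail())
```

Here the range is extended explicitly because resample only spans the dates that already exist in each group.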

Need to do calculation in dataframe with previous row value

I have this data frame with two columns. The condition I need to implement is: when the 'Balance Created' column is empty, I need to take the last filled value of 'Balance Created' and add the Amount value of the following row (the row with the empty balance) to it.
Original Data frame:
After Calculation, my desired result should be:
You can try using the cumulative sum in pandas to achieve this:
df['Amount'].cumsum()
# Edit-1
condition = df['Balance Created'].isnull()
df.loc[condition, 'Balance Created'] = df['Amount'].loc[condition]
You can also apply it per group, such as deposits and withdrawals:
df.groupby('transaction')['Amount'].cumsum()
I assume your question is mostly "How do I solve this using pandas", which is a good question that others have given you pandas-specific answers for.
But in case the question is more along the lines of "How do I solve this using an algorithm", which is a common problem for people just starting out writing code, then this pseudocode might push you in the right direction.
for i in indices of frame do
    if frame.balance[i] is empty do
        if i equals 0 do // Edge case where the first balance is missing
            frame.balance[i] = frame.amount[i]
        else do
            frame.balance[i] = frame.amount[i] + frame.balance[i-1]
        end
    end
end
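The loop above can be written in runnable Python roughly as follows. This is a sketch with made-up numbers; only the column names Amount and Balance Created come from the question.

```python
import numpy as np
import pandas as pd

# Made-up data: only the first balance is filled in.
df = pd.DataFrame({
    "Amount": [100.0, 50.0, -30.0, 20.0],
    "Balance Created": [100.0, np.nan, np.nan, np.nan],
})

amount = df["Amount"].tolist()
balance = df["Balance Created"].tolist()

for i in range(len(balance)):
    if pd.isna(balance[i]):
        # Edge case: a missing first balance starts from the amount alone.
        previous = balance[i - 1] if i > 0 else 0.0
        balance[i] = previous + amount[i]

df["Balance Created"] = balance
print(df)
```

Working on plain lists keeps each step dependent on the previously computed balance, which a single vectorised assignment cannot express directly.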

How to change the index to something else on pandas

date = pd.DatetimeIndex(df['Release Date']).to_period("M")
per = df['Release Date'].dt.to_period("M")
g = df.groupby(per)
I want to set something different as an index but I want my data to be grouped by months of the year, because I want to be able to plot a graph with months and quantities sold, but I don't know how to. Please help!
date.groupby(data['Release Date'].map(lambda x: x.month))
The code as posted is hard to read, so I cannot tell what the actual dataframe looks like. Please use the code sample format.
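A minimal sketch of grouping by month and totalling a quantity, assuming the dataframe has a datetime Release Date column. The Quantity column and the sample values are hypothetical, since the question does not show the real data. The plotting call is left as a comment because it needs matplotlib.

```python
import pandas as pd

# Made-up data; "Quantity" is a hypothetical column name.
df = pd.DataFrame({
    "Release Date": pd.to_datetime(
        ["2021-01-05", "2021-01-20", "2021-02-11", "2021-03-02"]
    ),
    "Quantity": [3, 4, 5, 6],
})

# Group by year-month period and total the quantities in each month.
monthly = df.groupby(df["Release Date"].dt.to_period("M"))["Quantity"].sum()
print(monthly)

# monthly.plot(kind="bar")  # one bar per month
```

The grouped result is indexed by month, so there is no need to change the index of the original dataframe first.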

Factorplot with multiindex dataframe

This is the dataframe I am working with:
(Only the first two years don't have data for country 69; I will fix this.) nkill is the number of people killed in that year, summed from the original long-form dataframe.
I am trying to do something similar to this plot:
However, with the country code as a hue. I know there are similar posts but none have helped me solve this, thank you in advance.
By hue I mean the seaborn parameter, as pictured in this third picture. In this example, hue creates a separate bar for every value of the variable in that column. So if I had two country codes in the country column, it would plot two bars side by side for every year (one for each country).
Just looking at the data it should be possible to directly use the hue argument.
But first you would need to turn the dataframe's index levels into actual columns:
df.reset_index(inplace=True)
Then something like
sns.barplot(x = "year", y="nkill", hue="country", data=df)
should give you the desired plot.
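To illustrate what reset_index does here, a small sketch with made-up numbers, assuming the index levels are named year and country as in the question. The seaborn call is left as a comment because it needs seaborn and matplotlib.

```python
import pandas as pd

# Made-up numbers; the (year, country) MultiIndex mirrors the question's layout.
index = pd.MultiIndex.from_product(
    [[1970, 1971], [4, 69]], names=["year", "country"]
)
df = pd.DataFrame({"nkill": [10, 0, 7, 3]}, index=index)

# Move both index levels into ordinary columns so they can be named in a plot call.
flat = df.reset_index()
print(flat.columns.tolist())

# sns.barplot(x="year", y="nkill", hue="country", data=flat)
```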
