I am trying to visualize the difference in education between genders in a given dataset.
I group the employees by gender and sum their years_in_education with this code:
df1 = df[["gender","years_in_education"]] #creating a sub-dataframe with only the gender and years_in_education columns
staff4=df1.groupby(['gender']).sum() #grouping the data frame by gender and assigning it to a new variable 'staff4'
staff4.head() #creating a visual of the grouped data for inspection
Then I use a bar chart to plot the difference with this code >>
my_plot = staff4.T.plot(kind='bar',title="Education difference between Genders") #creating the parameters to plot the graph
The graph comes out as this >>
But the scale of the y-axis is far too large, since the highest number of years in the data is 30. I want the scale to range from 0 to 30. I did that using my_plot.set_ylim([0,30]) and the result was >>
This graph does not reflect the data shown in the first one. What can I do to fix that?
Any ideas, please? Also, how can I change the orientation of the label on the y-axis?
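One possible reading of the problem: sum() adds up the years of every employee, so the bars grow with head-count and easily exceed 30. If the goal is to compare typical years per gender, mean() keeps the values on the 0 to 30 scale. The sketch below uses made-up sample data (not the original dataset), and the axis label text and rotation settings are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up sample data standing in for the real df
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "years_in_education": [12, 16, 30, 10, 18, 14],
})

# Average years per gender instead of the sum over all employees
staff4 = df.groupby("gender")[["years_in_education"]].mean()

ax = staff4.T.plot(kind="bar", title="Education difference between Genders")
ax.set_ylim(0, 30)  # now a sensible range, because no mean can exceed the maximum

# Orientation of the y-axis label and of the tick labels (assumed preferences)
ax.set_ylabel("Average years in education", rotation=0, labelpad=90)
ax.tick_params(axis="x", labelrotation=0)
```

If the totals really are what should be plotted, set_ylim([0, 30]) will always cut the bars off, and the limit would need to follow the largest sum instead.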
I am new to this field and I want to create a grouped bar graph for the grouped df below.
Can anyone help me?
x = df.groupby(["year","continent"])[["lifeExp"]].mean()
The above is the grouped df that I have created, but I don't know how to create a grouped bar graph from it, where the x-axis is year, the y-axis is lifeExp, and each year has one bar per continent.
My x object for grouped df
Need my grouped graph chart like this
I have not been able to come up with a solution yet, as I only recently started studying data visualization.
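A common approach for this kind of grouped df is to move the continent level of the index into columns with unstack() and then let pandas draw one bar per column. The sketch below assumes a small made-up table with the same column names (year, continent, lifeExp); the numbers are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up data with the same columns as in the question
df = pd.DataFrame({
    "year": [1952, 1952, 1952, 2007, 2007, 2007],
    "continent": ["Africa", "Asia", "Europe", "Africa", "Asia", "Europe"],
    "lifeExp": [39.0, 46.0, 64.0, 55.0, 71.0, 78.0],
})

x = df.groupby(["year", "continent"])[["lifeExp"]].mean()

# One row per year, one column per continent
wide = x["lifeExp"].unstack("continent")

# Each row becomes a group of bars on the x-axis, each column one bar in the group
ax = wide.plot(kind="bar")
ax.set_xlabel("year")
ax.set_ylabel("lifeExp")
```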
I'm pretty new to all of this. I've been looking online for how to change my y-ticks to show a column that I originally wanted to graph.
Anyway, I have a dataframe that I created by using a SQL command...it's called eastasia_pac. The columns are index (although apparently it's not really a column), country, region, no.female_borrowers.
My dataframe looks like this:
I want to graph my dataframe so...
y = eastasia_pac.country
eastasia_pac.plot.barh()
plt.barh(y, width= 0.3, height= 0.8)
plt.xlabel("Number of Female Active Borrowers")
plt.ylabel("Country")
plt.yticks(rotation= 90)
plt.title("East Asia and the Pacific: No. of Female Active Borrowers")
It came out like this:
For some reason, the index is on the y-axis instead of country, and I don't know how to fix it.
By default, pandas plots against the index of the DataFrame. You can solve this either by setting the index to the column you want or by overwriting the tick labels.
Overwrite method:
# I'm assuming this is what's creating your plot
eastasia_pac.plot.barh()
# Set your ticks
plt.yticks(range(len(eastasia_pac)), eastasia_pac.country, rotation= 90) # tick positions first, then the labels
# Set your labels
plt.xlabel("Number of Female Active Borrowers")
plt.ylabel("Country")
Set Index:
# Create your bar plot using country as an index
eastasia_pac.set_index("country").plot.barh()
# Set your labels
plt.xlabel("Number of Female Active Borrowers")
plt.ylabel("Country")
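For reference, here is a self-contained sketch of the set-index approach. The rows are made up, and selecting the single no.female_borrowers column before plotting is my own assumption so that only one set of bars is drawn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up rows; column names follow the question
eastasia_pac = pd.DataFrame({
    "country": ["Cambodia", "Indonesia", "Vietnam"],
    "region": ["East Asia and the Pacific"] * 3,
    "no.female_borrowers": [1200, 3400, 2100],
})

# With country as the index, the bars are labelled by country automatically
ax = eastasia_pac.set_index("country")["no.female_borrowers"].plot.barh()
ax.set_xlabel("Number of Female Active Borrowers")
ax.set_ylabel("Country")
ax.set_title("East Asia and the Pacific: No. of Female Active Borrowers")

labels = [t.get_text() for t in ax.get_yticklabels()]
```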
I'm new to using Seaborn and usually only use Matplotlib.pyplot.
With the recent COVID developments I was asked by a supervisor to put together estimates of how changes to the student population & expenses we need to fund affected student fees (I work in a college budgeting office). I've been able to put together my scenario analysis, but am now trying to visualize these results in a heatmap.
What I'd like to be able to do is have the:
x-axis be my population change rates,
y-axis be my expense change rates,
cmap be my new student fees depending on the x & y axis.
What my code is currently doing is:
x-axis is displaying the new student fee category (not sure how to describe this - see picture)
y-axis is displaying the population change and expense change (population, expenses)
cmap is displaying accurately
Essentially, my code is stacking each scenario on top of the others along the y-axis.
Here is a picture of what is currently being produced, which is not correct:
I've attached a link to a Colab Jupyter notebook with my code, and below is a snippet of the section giving me problems.
# Create Pandas DF of Scenario Analysis
df = pd.DataFrame(list(zip(Pop, Exp, NewStud, NewTotal)),
index = [i for i in range(0,len(NewStud))],
columns=['Population_Change', 'Expense_Change', 'New_Student_Activity_Fee', 'New_Total_Fee'])
# Group this scenario analysis
df = df.groupby(['Population_Change', 'Expense_Change'], sort=False).max()
# Create Figure
fig = plt.figure(figsize=(15,8))
ax = plt.subplot(111)
# Drop New Student Activity Fee Column. Analyze Only New Total Fee
df = df.drop(['New_Student_Activity_Fee'], axis=1)
########################### Not Working As Desired
sb.heatmap(df)
###########################
Your DataFrame is not in the right shape for seaborn.heatmap(). For example, as a result of the groupby operation, you have Population_Change and Expense_Change as a MultiIndex, which would only be used for labelling by the plotting function.
So instead of the groupby, first drop the superfluous column, and then do this:
df = df.pivot(index='Expense_Change', columns='Population_Change', values='New_Total_Fee')
Then seaborn.heatmap(df) should work as expected.
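To show what the pivot does to the shape, here is a minimal sketch with made-up scenario values (the rates and fees are invented; only the column names come from the question). The heatmap call itself is left as a comment so the snippet runs with pandas alone.

```python
import pandas as pd

# Made-up scenario table in the same long format as the question
df = pd.DataFrame({
    "Population_Change": [-0.1, -0.1, 0.0, 0.0],
    "Expense_Change": [0.0, 0.05, 0.0, 0.05],
    "New_Student_Activity_Fee": [55.0, 58.0, 50.0, 52.5],
    "New_Total_Fee": [110.0, 116.0, 100.0, 105.0],
})

# Drop the column that is not plotted, then reshape long -> wide
wide = (
    df.drop(columns=["New_Student_Activity_Fee"])
      .pivot(index="Expense_Change", columns="Population_Change", values="New_Total_Fee")
)
print(wide)

# seaborn.heatmap(wide) would then use the columns (Population_Change) for the
# x-axis, the index (Expense_Change) for the y-axis, and the fees for the colours.
```

Note that pivot() raises an error if the same (Expense_Change, Population_Change) pair occurs more than once, so duplicates would have to be aggregated first.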
I am making a heatmap where I want the density to be determined by the value of a cell in a Pandas data frame, instead of the heatmap being determined by the amount of data points in certain areas.
HeatMap(data=df[["latitude", "longitude", "price_total"]].groupby(
["latitude", "longitude"]
).sum().reset_index().values.tolist(),
radius=8,
max_zoom=13).add_to(base_map)
How can I alter the code to make the heatmap density be determined by column df["price_total"]?
What data is your heat map expecting? Without knowing that or what your data looks like, I can only guess. If you want a sum of prices in your matrix, you can form it like this:
Heatmap(pd.crosstab(index = df['latitude'], columns = df['longitude'],
values = df['price_total'], aggfunc = 'sum').fillna(0))
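As a rough illustration of what the crosstab call produces, the sketch below builds the summed-price matrix from a few made-up points. Whether a matrix like this is what your heat map function accepts is still an open assumption.

```python
import pandas as pd

# Made-up points; two of them share the same coordinates
df = pd.DataFrame({
    "latitude": [52.1, 52.1, 52.2, 52.2],
    "longitude": [4.3, 4.3, 4.3, 4.4],
    "price_total": [100.0, 50.0, 70.0, 30.0],
})

# Rows are latitudes, columns are longitudes, cells are summed prices
matrix = pd.crosstab(
    index=df["latitude"],
    columns=df["longitude"],
    values=df["price_total"],
    aggfunc="sum",
).fillna(0)  # coordinate pairs with no data become 0
print(matrix)
```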
I have a plot with hourly values for 2019. When plotting with a sub-set of dates (January only) on the x-axis, my plot goes blank.
I have a DF that I group on the row-axis based on Months and Hours from the time index, for a specific column 'SE3'. The grouping looks good.
Now, I want to plot. The plot looks potentially good, but I want to zoom in on one month only. Based on another post on stackoverflow, I use set_xlim.
Then my plot does not show anything.
#Grouping of DF
df['SE3'].groupby([df.index.month, df.index.hour]).mean().round(2).head()
Picture of grouped DF1
#Plotting and setting new, shorter in time x-axis
ax=df['SE3'].groupby([df.index.month, df.index.hour]).mean().round(2).plot()
ax.set_xlim(pd.Timestamp('2019-01-01 01:00:00'), pd.Timestamp('2019-01-31 23:00:00'))
The expected result is the same plot, but only for January. Instead the graph goes blank. However, the output shows (737060.0416666666, 737090.9583333334), which seems to be date data.
Picture without set_xlim
Picture with set_xlim (empty)
My final aim when I understand why my plot is blank, is to show hourly averages for each month, like this:
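A likely explanation: after grouping by month and hour, the index is no longer a DatetimeIndex but a MultiIndex of (month, hour) pairs, which pandas plots against integer positions. Timestamp limits passed to set_xlim then fall far outside the plotted range, so the axes look empty. One way around it is to select the month from the grouped result instead of zooming. The sketch below assumes synthetic hourly data for 2019 in place of the real SE3 column; the level names month and hour are my own labels.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Synthetic hourly data for 2019 standing in for the real 'SE3' column
idx = pd.date_range("2019-01-01 00:00", "2019-12-31 23:00", freq="h")
df = pd.DataFrame({"SE3": np.arange(len(idx), dtype=float)}, index=idx)

grouped = df["SE3"].groupby([df.index.month, df.index.hour]).mean().round(2)
grouped = grouped.rename_axis(["month", "hour"])  # name the two index levels

# January only: select month 1, leaving a Series indexed by hour
january = grouped.loc[1]
ax = january.plot(title="Hourly average, January")

# Hourly averages for every month: one column (and one line) per month
by_month = grouped.unstack("month")
ax_all = by_month.plot(title="Hourly average per month")
```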