I have a df from which I've taken a subset, europe_n, and I've drawn a bar plot.
europe_n (r=5, c=45) looks like this:
The relevant variables are df['Country'] (string) and df['Population'] (numeric).
plt.bar(df['Country'],df['Population'], label='Population')
plt.xlabel('Country')
plt.ylabel('Population')
plt.legend()
plt.show()
Which gives me:
Objective: I'm trying to change my y-axis limit to start from 0 instead of 43,094.
I ran plt.ylim(0,500000), but the y-axis did not change and an error was thrown. Any suggestions from the matplotlib library?
Error:
Conclusion: The reason I wasn't able to plot the graph as I wanted was that all columns had the object dtype. I only realized this when Jupyter threw an error stating, 'there are no integers to plot'. I eventually converted the Population column to int, the code worked, and I got the graph!
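The fix described in the conclusion can be sketched as follows. This is a minimal example under assumptions: the three-row frame is made up (the digits are stored as strings, as they might be after a loosely typed import), and `pd.to_numeric` is used for the conversion, which may differ from the exact call the author ran.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data: the numbers are stored as strings, so the column is not numeric
europe_n = pd.DataFrame({
    "Country": ["Denmark", "Finland", "Iceland"],
    "Population": ["5882261", "5540745", "372899"],
})
was_numeric = pd.api.types.is_numeric_dtype(europe_n["Population"])

# Convert to a numeric dtype; errors="raise" surfaces any value that is not a number
europe_n["Population"] = pd.to_numeric(europe_n["Population"], errors="raise")

fig, ax = plt.subplots()
ax.bar(europe_n["Country"], europe_n["Population"], label="Population")
ax.set_xlabel("Country")
ax.set_ylabel("Population")
ax.set_ylim(bottom=0)  # start the y-axis at zero, keep the automatic upper limit
ax.legend()
```

Checking `europe_n.dtypes` before plotting is a quick way to spot this kind of problem; call `plt.show()` or `fig.savefig(...)` to view the result.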
ax.set_ylim([0,max_value])
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
df = pd.DataFrame({
'Country':['Denmark', 'Finland', 'Iceland', 'Norway', 'Sweden'],
'Population':[5882261, 5540745, 372899, 5434319, 10549347]
})
print(df)
###
Country Population
0 Denmark 5882261
1 Finland 5540745
2 Iceland 372899
3 Norway 5434319
4 Sweden 10549347
fig, ax = plt.subplots()
ax.bar(df['Country'], df['Population'], color='#3B4B59')
ax.set_title('Population of Countries')
ax.set_xlabel('Country')
ax.set_ylabel('Population')
max_value = 12000000
ticks_loc = np.arange(0, max_value, step=2000000)
ax.set_yticks(ticks_loc)
ax.set_ylim([0,max_value])
ax.set_yticklabels(['{:,.0f}'.format(x) for x in ax.get_yticks()])
ax.grid(False)
fig.set_size_inches(10,5)
fig.set_dpi(300)
plt.show()
Be sure that you already imported the following packages,
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
Your code should probably look like this:
fig, ax = plt.subplots()
ax.bar(europe_n['Country'].values, europe_n['Area(sq km)'].values, color='#3B4B59')
ax.set_xlabel('Country')
ax.set_ylabel('Population')
max_value = 500000
ticks_loc = np.arange(0, max_value, step=10000)
ax.set_yticks(ticks_loc)
ax.set_ylim(0,max_value)
ax.set_yticklabels(['{:,.0f}'.format(x) for x in ax.get_yticks()])
ax.grid(False)
fig.set_size_inches(10,5)
fig.set_dpi(300)
plt.show()
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.ylim.html
To set the y limit
plt.ylim(start,end)
To set the x limit
plt.xlim(start,end)
Example
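A minimal sketch of both calls, using made-up values. Calling `plt.ylim()` with no arguments returns the current limits, which is a handy way to confirm that the setting took effect.

```python
import matplotlib.pyplot as plt

# Made-up values for illustration
countries = ["A", "B", "C"]
values = [43094, 338145, 103000]

plt.bar(countries, values)
plt.ylim(0, 500000)        # explicit lower and upper limit on the y-axis
plt.xlim(-1, 3)            # categorical bars sit at positions 0, 1, 2
bottom, top = plt.ylim()   # no arguments: returns the current limits
```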
I have an issue with axis labels when using groupby and trying to plot with seaborn. Here is my problem:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.DataFrame({'user': ['Bob', 'Jane','Alice','Bob','Jane','Alice'],
'income': [40000, 50000, 42000,47000,53000,46000]})
df_group_user = df.groupby(['user']).sum().reset_index()
I then plot a horizontal bar plot using seaborn:
bar = sns.barplot( x="income", y="user", data=df_group_user, color="b" )
#Prettify the plot
bar.set_yticklabels( bar.get_yticks(), size = 10)
bar.set_xticklabels( bar.get_xticks(), size = 10)
bar.set_ylabel("User", fontsize = 20)
bar.set_xlabel("Income ($)", fontsize = 20)
bar.set_title("Total income per user", fontsize = 20)
sns.set_theme(style="whitegrid")
sns.set_color_codes("muted")
Unfortunately, when I run the code this way, the y-axis ticks are labelled 0, 1, 2 instead of Bob, Jane, Alice as I'd like them to be.
I can get around the issue if I use matplotlib in the following manner:
df_group_user = df.groupby(['user']).sum()
df_group_user['income'].plot(kind="barh")
plt.title("Total income per user")
plt.ylabel("User")
plt.xlabel("Income ($)")
Ideally, I'd like to use seaborn for plotting, but if I don't use reset_index() like above, when calling sns.barplot:
bar = sns.barplot( x="income", y="user", data=df_group_user, color="b" )
ValueError: Could not interpret input 'user'
Just try swapping which columns go on the x and y axes.
I'm using a different dataframe to show a similar situation.
gp = df.groupby("Gender")['Salary'].sum().reset_index()
gp
Output:
Gender Salary
0 Female 8870
1 Male 23667
Now, when plotting the bar chart, pass the x-axis column first and then the y-axis column, and check:
bar = sns.barplot(x = 'Salary', y = "Gender", data = gp);
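A likely cause of the `0, 1, 2` labels in the question is the line `bar.set_yticklabels(bar.get_yticks(), size=10)`: `get_yticks()` returns the tick positions, not the category names, so the names are overwritten. `sns.barplot` returns a regular matplotlib Axes, so the sketch below reproduces the situation with plain matplotlib `barh` (an assumption made to keep the example free of extra dependencies) and uses `tick_params` to resize the labels without replacing their text.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "user": ["Bob", "Jane", "Alice", "Bob", "Jane", "Alice"],
    "income": [40000, 50000, 42000, 47000, 53000, 46000],
})
# Sum per user; as_index=False keeps 'user' as a regular column
df_group_user = df.groupby("user", as_index=False)["income"].sum()

fig, ax = plt.subplots()
ax.barh(df_group_user["user"], df_group_user["income"], color="b")

positions = list(ax.get_yticks())   # tick positions, not the user names

# Resize the tick labels without overwriting their text
ax.tick_params(axis="both", labelsize=10)
ax.set_xlabel("Income ($)", fontsize=20)
ax.set_ylabel("User", fontsize=20)
ax.set_title("Total income per user", fontsize=20)

fig.canvas.draw()  # make sure the tick labels are populated
names = [t.get_text() for t in ax.get_yticklabels()]
```

With seaborn, the same `tick_params` call can be made on the Axes that `sns.barplot` returns.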
I've got a df with messages from a WhatsApp chat, the sender and the corresponding time in datetime format.
Time                 Sender    Message
2020-12-21 22:23:00  Sender 1  "..."
2020-12-21 22:26:00  Sender 2  "..."
2020-12-21 22:35:00  Sender 1  "..."
I can plot the histogram with sns.histplot(df["Time"], bins=48).
But now the ticks on the x-axis don't make much sense. I end up with 30 ticks even though there should be 24, and each tick shows the whole date plus the time, whereas I want only the time as "%H:%M".
Where is the issue with the wrong ticks coming from?
Thanks!
Both seaborn and pandas use matplotlib for their plotting functions. Let's see which of them returns the bin values that we need in order to adapt the x-ticks:
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 5))
#fake data generation
np.random.seed(1234)
n=20
start = pd.to_datetime("2020-11-15")
df = pd.DataFrame({"Time": pd.to_timedelta(np.random.rand(n), unit="D") + start, "A": np.random.randint(1, 100, n)})
#print(df)
#pandas histogram plotting function, left
pd_g = df["Time"].hist(bins=5, xrot=90, ax=ax1)
#no bin information
print(pd_g)
ax1.set_title("Pandas")
#seaborn histogram plotting, middle
sns_g = sns.histplot(df["Time"], bins=5, ax=ax2)
ax2.tick_params(axis="x", labelrotation=90)
#no bin information
print(sns_g)
ax2.set_title("Seaborn")
#matplotlib histogram, right
mpl_g = ax3.hist(df["Time"], bins=5, edgecolor="white")
ax3.tick_params(axis="x", labelrotation=90)
#hooray, bin information, alas in floats representing dates
print(mpl_g)
ax3.set_title("Matplotlib")
plt.tight_layout()
plt.show()
Sample output:
From this exercise we can conclude that all three refer to the same routine. So, we can directly use matplotlib which provides us with the bin values:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.dates import num2date
fig, ax = plt.subplots(figsize=(8, 5))
#fake data generation
np.random.seed(1234)
n=20
start = pd.to_datetime("2020-11-15")
df = pd.DataFrame({"Time": pd.to_timedelta(np.random.rand(n), unit="D") + start, "A": np.random.randint(1, 100, n)})
#plots histogram, returns counts, bin border values, and the bars themselves
h_vals, h_bins, h_bars = ax.hist(df["Time"], bins=5, edgecolor="white")
#plot x ticks at the place where the bin borders are
ax.set_xticks(h_bins)
#label them with dates in HH:MM format after conversion of the float values that matplotlib uses internally
ax.set_xticklabels([num2date(curr_bin).strftime("%H:%M") for curr_bin in h_bins])
plt.show()
Sample output:
Seaborn and pandas make life easier because they provide convenience wrappers and some additional functionality for commonly used plotting functions. However, if the parameters they provide do not suffice, one often has to fall back on matplotlib, which is more flexible. There may well be an easier way in pandas or seaborn that I am not aware of; I will happily upvote any better suggestion within these libraries.
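One alternative worth trying is to let matplotlib's date tools place and format the ticks instead of building the labels by hand. A minimal sketch, assuming synthetic timestamps within a single day and a tick every three hours (both choices are illustrative and not taken from the original chat data):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Synthetic timestamps spread over one day (illustrative data)
rng = np.random.default_rng(1234)
start = pd.Timestamp("2020-11-15")
times = start + pd.to_timedelta(rng.random(200), unit="D")

fig, ax = plt.subplots(figsize=(8, 5))
# Convert explicitly to the float date representation matplotlib uses internally
ax.hist(mdates.date2num(times), bins=24, edgecolor="white")

# Place a tick every three hours and print only hours and minutes
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0, 24, 3)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.tick_params(axis="x", labelrotation=90)

fig.canvas.draw()  # make sure the tick labels are populated
labels = [t.get_text() for t in ax.get_xticklabels()]
```

Because the locator works on clock hours rather than bin edges, the ticks stay readable even when the number of bins changes.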
I'm trying to have my y-axis range from 0 to 450,000 in increments of 50,000. I believe I have the right technique with plt.yticks(np.arange(0,450001,50000)), but I'm confused as to why all my y-axis values disappear when I run it. I have also tried
ax = plt.gca()
ax.set_ylim([0,450000])
The numbers just end up looking smudged at the bottom of the y-axis. Here is my code so far:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import rcParams
import numpy as np
%matplotlib inline
rcParams['figure.figsize'] = 20,10
df = pd.read_csv('https://raw.githubusercontent.com/ObiP1/The-Future-Value-of-Homes/master/AverageHomeValues.csv')
plt.title('Median Cost Of Maryland Homes', fontsize=30)
plt.ylabel('Median Price Of Home',fontsize=25)
plt.yticks(np.arange(0,450001,50000))
plt.xlabel('Year', fontsize=25)
plt.plot(df.YEAR, df.MED_COST)
plt.grid(True)
The problem is that the $ values are read as strings, not as numbers (that line looked much too straight, didn't it?). If you convert them, you get this:
df = pd.read_csv('https://raw.githubusercontent.com/ObiP1/The-Future-Value-of-Homes/master/AverageHomeValues.csv')
# strip '$' and ',' from every column except the first, then convert to float
df[df.columns[1:]] = df[df.columns[1:]].replace(r'[\$,]', '', regex=True).astype(float)
plt.title('Median Cost Of Maryland Homes', fontsize=30)
plt.ylabel('Median Price Of Home',fontsize=25)
plt.yticks(np.arange(0,450001,50000))
plt.xlabel('Year', fontsize=25)
plt.plot(df.YEAR, df.MED_COST, 'o')
plt.grid(True)
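To see why the tick values vanish, it helps to check where matplotlib puts string data. In this small sketch (three sample values written as strings, for illustration), the strings are treated as categories and placed at positions 0, 1, 2, so ticks requested at 0, 50000, ... fall far outside the plotted range.

```python
import matplotlib.pyplot as plt

years = [1940, 1950, 1960]
med_cost_as_text = ["$31500", "$48700", "$58600"]   # strings, not numbers

fig, ax = plt.subplots()
ax.plot(years, med_cost_as_text)

# Strings are plotted as categories at integer positions
category_positions = list(ax.get_yticks())
y_low, y_high = ax.get_ylim()
```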
The problem is that your MED_COST column contains strings, not numbers. These strings are used as tick labels, but at tick positions 0, 1, 2, 3, 4, 5, ... Setting tick positions at 0, 50000, ... therefore makes everything invisible except the tick at 0.
So, converting these strings to numbers should solve the issue. They can be shown as currency via StrMethodFormatter. Instead of setting the ticks explicitly, MultipleLocator(50000) is another option; it avoids having to recalculate the ticks when new data becomes available.
Because plot can change some of the settings, it can help to call plot first and set labels and ticks only afterwards.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams
import numpy as np
from matplotlib import ticker
rcParams['figure.figsize'] = 20, 10
df = pd.DataFrame({
'YEAR': [1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020],
'MED_COST': ['$31500', '$48700', '$58600', '$71800', '$115400', '$148800', '$146000', '$250242', '$295000']})
# make the 'MED_COST' column numeric
df.MED_COST = [int(m[1:]) for m in df.MED_COST]
plt.plot(df.YEAR, df.MED_COST)
plt.title('Median Cost Of Maryland Homes', fontsize=30)
plt.ylabel('Median Price Of Home', fontsize=25)
plt.xlabel('Year', fontsize=25)
plt.yticks(np.arange(0, 450001, 50000))
# plt.gca().yaxis.set_major_locator(ticker.MultipleLocator(50000))
plt.gca().yaxis.set_major_formatter(ticker.StrMethodFormatter('${x:,.0f}'))
plt.grid(True)
plt.show()
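The commented-out `MultipleLocator` line can be combined with a vectorized conversion. The following is a sketch rather than the answer's exact method: it assumes a shortened copy of the sample values and uses `str.replace` plus `pd.to_numeric` for the conversion, so that values containing thousands separators would be handled as well.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import ticker

df = pd.DataFrame({
    "YEAR": [1940, 1950, 1960, 1970],
    "MED_COST": ["$31500", "$48700", "$58600", "$71800"],
})
# Strip '$' and ',' from every entry, then convert the column to numbers
df["MED_COST"] = pd.to_numeric(
    df["MED_COST"].str.replace(r"[$,]", "", regex=True)
)

fig, ax = plt.subplots()
ax.plot(df["YEAR"], df["MED_COST"])
ax.set_ylim(0, 450000)
# One tick every 50,000, recomputed automatically if the limits change
ax.yaxis.set_major_locator(ticker.MultipleLocator(50000))
ax.yaxis.set_major_formatter(ticker.StrMethodFormatter("${x:,.0f}"))

tick_positions = ax.get_yticks()
```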
I would like to plot certain slices of my pandas DataFrame, row by row (based on the row index), with a different color for each row.
My data look like the following:
I already tried with the help of this tutorial to find a way but I couldn't - probably due to a lack of skills.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv(r"D:\SOF10.csv", header=None)  # raw string so the backslash is kept literally
df.head()
#Slice interested data
C = df.iloc[:, 2::3]
#Plot Temp base on row index colorfully
C.apply(lambda x: plt.scatter(x.index, x, c='g'))
plt.show()
Following is my expected plot:
I was also wondering whether I could display the mean of each row of the sliced data (each row contains 480 values) somewhere in the plot or in a legend beside it. Is it feasible (as in the following picture) to calculate the mean and show it in the legend, or to print it in a small font next to its own data in the graph?
Data sample: data
This gives the plot without legend
C = df.iloc[:,2::3].stack().reset_index()
C.columns = ['level_0', 'level_1', 'Temperature']
fig, ax = plt.subplots(1,1)
C.plot('level_0', 'Temperature',
ax=ax, kind='scatter',
c='level_0', colormap='tab20',
colorbar=False, legend=True)
ax.set_xlabel('Cycles')
plt.show()
Edit to reflect modified question:
stack() transforms your (sliced) dataframe into a series with a two-level index (row, col).
reset_index() turns that two-level index into the columns level_0 (row) and level_1 (col).
set_xlabel sets the label of the x-axis to what you want.
Edit 2: The following produces scatter with legend:
CC = df.iloc[:,2::3]
fig, ax = plt.subplots(1,1, figsize=(16,9))
labels = CC.mean(axis=1)
for i in CC.index:
ax.scatter([i]*len(CC.columns[1:]), CC.iloc[i,1:], label=labels[i])
ax.legend()
ax.set_xlabel('Cycles')
ax.set_ylabel('Temperature')
plt.show()
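If the means should appear next to the points rather than only in the legend, `ax.annotate` can place a small text label per row. A minimal sketch with randomly generated stand-in data (the real file is not available here, so the frame shape, the seed, and the label offset are all assumptions):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data: 4 rows (cycles) x 6 temperature readings
rng = np.random.default_rng(0)
CC = pd.DataFrame(rng.normal(25, 2, size=(4, 6)))
row_means = CC.mean(axis=1)

fig, ax = plt.subplots(figsize=(8, 5))
for i in CC.index:
    # One colour per row; the mean goes into the legend entry
    ax.scatter([i] * CC.shape[1], CC.loc[i],
               label=f"row {i}: mean {row_means[i]:.2f}")
    # Small text at the height of the mean, shifted 8 points to the right
    ax.annotate(f"{row_means[i]:.2f}", xy=(i, row_means[i]),
                xytext=(8, 0), textcoords="offset points", fontsize=8)
ax.legend()
ax.set_xlabel("Cycles")
ax.set_ylabel("Temperature")

legend_texts = [t.get_text() for t in ax.get_legend().get_texts()]
```

With many rows the legend becomes long, in which case the annotations alone may be the more readable choice.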
This may be only an approximate answer. The c= and cmap= arguments of scatter() can be used to get the desired coloring.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import itertools
df = pd.DataFrame({'a':[34,22,1,34]})
fig, subplot_axes = plt.subplots(1, 1, figsize=(20, 10)) # width, height
colors = ['red','green','blue','purple']
cmap=matplotlib.colors.ListedColormap(colors)
for col in df.columns:
subplot_axes.scatter(df.index, df[col].values, c=df.index, cmap=cmap, alpha=.9)
[The resolution is described below.]
I'm trying to create a PairGrid. The X-axis has at least 2 different value ranges, although even when 'cvar' below is plotted by itself the x-axis overwrites itself.
My question: is there a way to tilt the x-axis labels to be vertical or have fewer x-axis labels so they don't overlap? Is there another way to solve this issue?
====================
import seaborn as sns
import matplotlib.pylab as plt
import pandas as pd
import numpy as np
columns = ['avar', 'bvar', 'cvar']
index = np.arange(10)
df = pd.DataFrame(columns=columns, index = index)
myarray = np.random.random((10, 3))
for val, item in enumerate(myarray):
df.loc[val] = item  # .ix was removed from pandas; .loc is the label-based replacement
df['cvar'] = [400,450,43567,23000,19030,35607,38900,30202,24332,22322]
fig1 = sns.PairGrid(df, y_vars=['avar'],
x_vars=['bvar', 'cvar'],
palette="GnBu_d")
fig1.map(plt.scatter, s=40, edgecolor="white")
# The fix: Add the following to rotate the x axis.
plt.xticks( rotation= -45 )
=====================
The code above produces this image
Thanks!
I finally figured it out. I added "plt.xticks( rotation= -45 )" to the original code above. More can be found on the Matplotlib site.
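One caveat: `plt.xticks` acts on the current Axes only, so in a grid with several panels it may rotate the labels of just one of them. Below is a sketch of rotating every panel, shown with plain matplotlib subplots to keep it self-contained; with a `PairGrid`, the same loop should work over `fig1.axes.flat`. The random data and the two-panel layout are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
avar = rng.random(10)
bvar = rng.random(10)
cvar = [400, 450, 43567, 23000, 19030, 35607, 38900, 30202, 24332, 22322]

fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 4))
axes[0].scatter(bvar, avar, s=40, edgecolor="white")
axes[1].scatter(cvar, avar, s=40, edgecolor="white")

# Rotate the x tick labels on every panel, not only the current one
for ax in axes.flat:
    ax.tick_params(axis="x", labelrotation=-45)

fig.canvas.draw()  # make sure the tick labels are populated
rotations = [ax.get_xticklabels()[0].get_rotation() for ax in axes.flat]
```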