I am trying to display a histogram from this dataframe.
   gr_age  weighted_cost
0       1    2272.985462
1       2    2027.919360
2       3    1417.617779
3       4     946.568598
4       5     715.731002
5       6     641.716770
I want to use gr_age column as the X axis and weighted_cost as the Y axis. Here is an example of what I am looking for with Excel:
I tried the following code, and also with discrete=True, but it gives a different result, and displot did no better.
sns.histplot(data=df, x="gr_age", y="weighted_cost")
plt.show()
Thank you for your ideas!
You want a barplot (x vs. y values), not a histplot, which plots the distribution of a dataset:
import seaborn as sns
ax = sns.barplot(data=df, x='gr_age', y='weighted_cost', color='#4473C5')
ax.set_title('Values by age group')
output:
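For comparison, the same chart can be sketched with plain matplotlib. This is only a minimal sketch, assuming the data from the question is typed in by hand and that a headless backend is acceptable; the variable name `bars` is my own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "gr_age": [1, 2, 3, 4, 5, 6],
    "weighted_cost": [2272.985462, 2027.919360, 1417.617779,
                      946.568598, 715.731002, 641.716770],
})

fig, ax = plt.subplots()
# One bar per row: x position from gr_age, bar height from weighted_cost
bars = ax.bar(df["gr_age"], df["weighted_cost"], color="#4473C5")
ax.set_xlabel("gr_age")
ax.set_ylabel("weighted_cost")
ax.set_title("Values by age group")
```

A histogram would instead count how often each value occurs, which is why histplot gave a different picture.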
I would like to print the DataFrame beside the plot. What would be a Pythonic way to do that?
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'Age':[21,22,23,24,25,26,27,28,29,30],'Count':[4,1,3,7,2,3,5,1,1,5]})
print(df)
Age Count
0 21 4
1 22 1
2 23 3
3 24 7
4 25 2
5 26 3
6 27 5
7 28 1
8 29 1
9 30 5
plt.rcParams['figure.figsize']=(10,6)
fig,ax = plt.subplots()
font_used={'fontname':'pristina', 'color':'Black'}
ax.set_ylabel('Count',fontsize=20,**font_used)
ax.set_xlabel('Age',fontsize=20,**font_used)
plt.plot(df['Age'],df['Count'])
I would like to have a graph like this. How can I have the DataFrame's plotted values printed alongside it?
You can use ax.text to add the DataFrame to the plot. DataFrames have a .to_string method which makes formatting nice. Supply index=False to remove the row index.
plt.rcParams['figure.figsize']=(10, 6)
fig,ax = plt.subplots()
font_used={'fontname':'pristina', 'color':'Black'}
ax.set_ylabel('Count',fontsize=20,**font_used)
ax.set_xlabel('Age',fontsize=20,**font_used)
# Adjust to where you want.
ax.text(x=28.5, y=4.5, s=df.to_string(index=False))
plt.plot(df['Age'],df['Count'])
plt.show()
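If the hard-coded data coordinates are awkward to tune, one possible variation is to position the text in axes coordinates instead. This is a sketch under my own assumptions: the position (1.02, 0.5), the `right=0.8` margin and the monospace font are illustrative choices, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Age": [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
                   "Count": [4, 1, 3, 7, 2, 3, 5, 1, 1, 5]})

fig, ax = plt.subplots(figsize=(10, 6))
fig.subplots_adjust(right=0.8)  # leave room on the right for the text
ax.plot(df["Age"], df["Count"])
ax.set_xlabel("Age")
ax.set_ylabel("Count")

# transform=ax.transAxes means (0, 0) is the lower-left and (1, 1) the
# upper-right corner of the axes, so x=1.02 sits just outside the right edge.
label = ax.text(1.02, 0.5, df.to_string(index=False),
                transform=ax.transAxes, va="center", fontfamily="monospace")
```

A monospace font keeps the columns produced by to_string lined up.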
Another option is to use the function plt.table():
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'Age':[21,22,23,24,25,26,27,28,29,30],'Count':[4,1,3,7,2,3,5,1,1,5]})
plt.rcParams['figure.figsize']=(10,15)
fig,ax = plt.subplots()
plt.subplots_adjust(left=0.1, right=0.85, top=0.9, bottom=0.1)
font_used={'fontname':'pristina', 'color':'Black'}
ax.set_ylabel('Count',fontsize=20,**font_used)
ax.set_xlabel('Age',fontsize=20,**font_used)
plt.plot(df['Age'],df['Count'])
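# cellText must be a 2D sequence (a list of rows), so wrap each Count in its own row
ax.table(cellText=[[str(c)] for c in df['Count']],
         rowLabels=df['Age'].map(str).tolist(),
         colWidths=[0.2],
         loc='right')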
plt.show()
This approach draws a table with cell borders next to the plot. Just make sure to leave room for it with subplots_adjust().
Pandas has a to_html function; you can use it and place the resulting HTML next to the plot. What are you placing the graph and DataFrame into?
df.to_html()
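As a rough illustration of what to_html returns, assuming the Age/Count frame from the question; how the string is embedded (notebook, web page, report) is left open here.

```python
import pandas as pd

df = pd.DataFrame({"Age": [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
                   "Count": [4, 1, 3, 7, 2, 3, 5, 1, 1, 5]})

# to_html returns the table as a plain string of HTML markup;
# index=False drops the row index column.
html = df.to_html(index=False)
print(html[:80])
```

The string can then be written to a file or placed into whatever page holds the saved figure.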
I am trying to plot the following data as a horizontal stacked bar plot. I would like to show Week 1 and Week 2 as bars, with the largest bar ('Total') at the top and the rest in descending order. The actual data has 100 lines, so I arrived at using Seaborn catplots with kind='bar'. I'm not sure whether stacking is possible (as in Matplotlib), so I opted to create two charts and overlay 'Week 1' on top of 'Total' for the same stacked effect.
However, when I run the code below I get two separate plots, and the chart title and axis settings are applied to only one of them. Can I combine these into one stacked horizontal chart? If there is an easier way, I would appreciate finding out.
Company           Week 1  Week 2  Total
Stanley Atherton       0       1      1
Dennis Auton           1       1      2
David Bailey           3       8     11
Alan Ball              5       2      7
Philip Barker          3       0      3
Mark Beirne            0       1      1
Phyllis Blitz          3       0      3
Simon Blower           4       2      6
Steven Branton         5       7     12
Rebecca Brown          0       4      4
(Names created from random name generator)
Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
data = pd.read_csv('Sample1.csv', delimiter="\t", error_bad_lines=False)
data_rank = data.sort_values(["Attending", "Company"], ascending=[False,True])
sns.set(style="ticks")
g = sns.catplot(y='Company', x='Total', data=data_rank, kind='bar', height=4, color='red', aspect=0.8, ax=ax)
ax2 =ax.twinx()
g = sns.catplot(y='Company', x='Week 1', data=data_rank, kind='bar', height=4, color='blue', aspect=0.8, ax=ax2)
for ax in g.axes[0]:
ax.xaxis.tick_top()
ax.xaxis.set_label_position('top')
ax.spines['bottom'].set_visible(True)
ax.spines['top'].set_visible(True)
plt.title("Company by week ", size=7)
[Images: catplot 1, catplot 2]
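catplot is a figure-level function: each call creates its own figure and does not draw onto an existing ax, which is why you get two separate plots. barplot is axes-level, so two calls draw onto the same axes and the 'Week 1' bars overlay the 'Total' bars. I think something like this works:
g = sns.barplot(y='Company', x='Total', data=data_rank, color='red', label='Total')
g = sns.barplot(y='Company', x='Week 1', data=data_rank, color='blue', label='Week 1')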
plt.title("Company by week ", size=12)
plt.xlabel('Frequency')
plt.legend()
plt.show()
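If a true stacked chart is preferred over the overlay, pandas plotting might be an easier route. The following is a sketch only, assuming a handful of rows from the sample table typed in by hand; `ranked` is a name I made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the sample table in the question
data = pd.DataFrame({
    "Company": ["Stanley Atherton", "Dennis Auton", "David Bailey",
                "Alan Ball", "Steven Branton"],
    "Week 1": [0, 1, 3, 5, 5],
    "Week 2": [1, 1, 8, 2, 7],
})
data["Total"] = data["Week 1"] + data["Week 2"]

# barh draws the first row at the bottom, so sorting ascending by Total
# puts the largest total at the top of the chart.
ranked = data.sort_values("Total").set_index("Company")

# stacked=True places each Week 2 segment where the Week 1 segment ends
ax = ranked[["Week 1", "Week 2"]].plot.barh(stacked=True, color=["blue", "red"])
ax.set_xlabel("Frequency")
ax.set_title("Company by week")
```

Here the total is visible as the full bar length, so no separate 'Total' series has to be drawn.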
import pandas as pd
income_analysis = pd.DataFrame({'Household Income': ['0-24,999', '25,000-49,999', '50,000'], 'rank1': [3,2,1], 'rank2': [1,2,3]})
Household Income rank1 rank2
0 0-24,999 3 1
1 25,000-49,999 2 2
2 50,000 1 3
sns.barplot(data = income_analysis, x = 'Household Income', y = 'rank1')
I am trying to make a bar chart where each set of bars is a different rank, and within each set the bars are divided by household income. So altogether there are 6 bars: 2 sets, with 3 bars in each set. My barplot above plots one of them, but how do I do it for both?
Try this: transpose, then use the pandas plot:
income_analysis.set_index('Household Income', inplace=True)
income_analysis.T.plot.bar()
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
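An alternative to transposing is to reshape the frame to long form, which is the layout seaborn's hue argument expects. A minimal sketch of the reshaping step; the column names `rank` and `value` are my own choice.

```python
import pandas as pd

income_analysis = pd.DataFrame({
    "Household Income": ["0-24,999", "25,000-49,999", "50,000"],
    "rank1": [3, 2, 1],
    "rank2": [1, 2, 3],
})

# Long form: one row per (income bracket, rank) pair
long_form = income_analysis.melt(id_vars="Household Income",
                                 var_name="rank", value_name="value")
print(long_form)
```

The long frame could then be passed to something like sns.barplot(data=long_form, x='rank', y='value', hue='Household Income') to get two sets of three bars.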
Data in the form:
         x1    x2
data =   2104, 3
         1600, 3
         2400, 3
         1416, 2
         3000, 4
         1985, 4
y =      399900
         329900
         369000
         232000
         539900
         299900
I want to draw a scatter plot that has two X features (x1 and x2) and a single Y,
but when I try
y=data.loc[:'y']
px=data.loc[:,['x1','x2']]
plt.scatter(px,y)
I get:
'ValueError: x and y must be the same size'.
So I tried this:
data=pd.read_csv('ex1data2.txt',names=['x1','x2','y'])
px=data.loc[:,['x1','x2']]
x1=px['x1']
x2=px['x2']
y=data.loc[:'y']
plt.scatter(x1,x2,y)
This time I got a blank graph filled entirely with blue.
I would be grateful for any guidance.
You can only plot with one x and several y's. You could plot the different x's in a twiny axis:
fig, ax = plt.subplots()
ay = ax.twiny()
ax.scatter(df['x1'], df['y'])
ay.scatter(df['x2'], df['y'], color='r')
plt.show()
Output:
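A self-contained version of the twiny idea, as a sketch: the frame is built by hand from the values in the question, and the axis labels are my own additions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "x1": [2104, 1600, 2400, 1416, 3000, 1985],
    "x2": [3, 3, 3, 2, 4, 4],
    "y": [399900, 329900, 369000, 232000, 539900, 299900],
})

fig, ax = plt.subplots()
ay = ax.twiny()  # second x axis drawn on top, sharing the same y axis

ax.scatter(df["x1"], df["y"], label="x1")
ay.scatter(df["x2"], df["y"], color="r", label="x2")

# Label each x axis so it is clear which points belong to which scale
ax.set_xlabel("x1")
ay.set_xlabel("x2")
ax.set_ylabel("y")
```

Because x1 is in the thousands and x2 is a single digit, separate x scales keep both point clouds readable.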
You can check the pandas functions for plotting dataframe content, it's very powerful.
But if you want to use matplotlib you can check the documentation (https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.scatter.html), which says that x and y must be array-like and of the same size. You are instead passing a two-column DataFrame as x, so x and y do not have the same size.
So the working code looks like this:
data = pd.read_csv("test.txt", header=None)
data
0 1 2
0 2104 3 399900
1 1600 3 329900
2 2400 3 369000
3 1416 2 232000
4 3000 4 539900
5 1985 4 299900
data.columns = ["x1", "x2", "y"]
data
x1 x2 y
0 2104 3 399900
1 1600 3 329900
2 2400 3 369000
3 1416 2 232000
4 3000 4 539900
5 1985 4 299900
# If you call scatter many times and then plt.show() a single image is created
plt.scatter(data["x1"], data["y"])
plt.scatter(data["x2"], data["y"])
plt.show()
Note that if you want to have data in an array format you can do data["x1"].values and it will return an ndarray.
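To see where the size mismatch comes from, it may help to compare element counts directly. A sketch, assuming the intent in the question was to select the whole y column (note the comma in `data.loc[:, "y"]`):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "x1": [2104, 1600, 2400, 1416, 3000, 1985],
    "x2": [3, 3, 3, 2, 4, 4],
    "y": [399900, 329900, 369000, 232000, 539900, 299900],
})

px = data.loc[:, ["x1", "x2"]]  # 6 rows x 2 columns
y = data.loc[:, "y"]            # all rows of column "y"

# scatter needs the same number of elements in x and y
print(px.size, y.size)  # prints: 12 6

# A single column has the right size; to_numpy() gives a plain ndarray
x1 = data["x1"].to_numpy()
print(type(x1), x1.shape)
```

Passing one column at a time, as in the working code above, gives 6 values on each side.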
You could use seaborn with a melted dataframe. seaborn.scatterplot has a hue argument, which lets you include multiple data series.
import seaborn as sns
ax = sns.scatterplot(x='value', hue='series', y='y',
data=data.melt(value_vars=['x1', 'x2'],
id_vars='y',
var_name='series'))
However, if your x values are that different, you might want to use twin axes, as in @Quang Hoang's answer.
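To make the melt step less opaque, here is a small sketch of what the melted frame looks like, again assuming the six rows from the question:

```python
import pandas as pd

data = pd.DataFrame({
    "x1": [2104, 1600, 2400, 1416, 3000, 1985],
    "x2": [3, 3, 3, 2, 4, 4],
    "y": [399900, 329900, 369000, 232000, 539900, 299900],
})

# Each (x1, y) and (x2, y) pair becomes its own row; the 'series' column
# records which original column the x value came from.
melted = data.melt(value_vars=["x1", "x2"], id_vars="y", var_name="series")
print(melted)
```

The result has 12 rows with columns y, series and value, which is the shape that hue='series' works with.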
I'm trying to visualise a large (pandas) dataframe in Python as a heatmap. This dataframe has two types of variables: strings ("Absent" or "Unknown") and floats.
I want the heatmap to show cells with "Absent" in black and "Unknown" in red, and the rest of the dataframe as a normal heatmap, with the floats in a scale of greens.
I can do this easily in Excel with conditional formatting of cells, but I can't find any help online to do this with Python either with matplotlib, seaborn, ggplot. What am I missing?
Thank you for your time.
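You could use cmap_custom.set_under('red') and cmap_custom.set_over('black') to apply custom colors to values below vmin and above vmax. 'Unknown' is mapped to vmin-1 so it falls under the range (red), and 'Absent' to vmax+1 so it falls over it (black):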
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.axes_grid1 as axes_grid1
import pandas as pd
# make a random DataFrame
np.random.seed(1)
arr = np.random.choice(['Absent', 'Unknown']+list(range(10)), size=(5,7))
df = pd.DataFrame(arr)
# find the largest and smallest finite values
finite_values = pd.to_numeric(list(set(np.unique(df.values))
.difference(['Absent', 'Unknown'])))
vmin, vmax = finite_values.min(), finite_values.max()
# change Absent and Unknown to numeric values
df2 = df.replace({'Absent': vmax+1, 'Unknown': vmin-1})
# make sure the values are numeric
for col in df2:
df2[col] = pd.to_numeric(df2[col])
fig, ax = plt.subplots()
cmap_custom = plt.get_cmap('Greens')
cmap_custom.set_under('red')
cmap_custom.set_over('black')
im = plt.imshow(df2, interpolation='nearest', cmap = cmap_custom,
vmin=vmin, vmax=vmax)
# add a colorbar (https://stackoverflow.com/a/18195921/190597)
divider = axes_grid1.make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.05)
plt.colorbar(im, cax=cax, extend='both')
plt.show()
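A quick way to check the under/over behaviour without drawing anything is to push two out-of-range values through the colormap by hand. This is a sketch with vmin and vmax fixed at 0 and 9 for illustration; copying the colormap first is my own precaution so the registered 'Greens' map is left untouched.

```python
import copy
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

vmin, vmax = 0, 9  # illustrative range for the finite values

cmap = copy.copy(plt.get_cmap("Greens"))  # work on a copy of the colormap
cmap.set_under("red")
cmap.set_over("black")

# Normalize maps [vmin, vmax] to [0, 1]; values outside land below 0 or above 1
norm = Normalize(vmin=vmin, vmax=vmax)

unknown_rgba = cmap(norm(vmin - 1))  # below the range: uses the 'under' color
absent_rgba = cmap(norm(vmax + 1))   # above the range: uses the 'over' color
print(unknown_rgba, absent_rgba)
```

imshow applies the same normalisation when vmin and vmax are passed, which is why the replaced cells show up in red and black.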
The DataFrame
In [117]: df
Out[117]:
0 1 2 3 4 5 6
0 3 9 6 7 9 3 Absent
1 Absent Unknown 5 4 7 0 2
2 3 0 2 9 8 0 2
3 5 5 7 Unknown 5 Absent 4
4 7 7 5 4 7 Unknown Absent
becomes