I have the following table which shows the item and price for that item.
  item  CAR_PRICE1  CAR_PRICE2
0   H1    17400.00    18400.00
1   H2    35450.00    27400.00
2   H3    55780.00    57400.00
3   H4    78500.00    37400.00
4   H5    25609.55    77400.00
5   H6    96000.00    97400.00
How can I draw a histogram that shows the price categories on the Y-axis and, on the X-axis, the percentage of all contracts that fall into each price category?
It's straightforward with seaborn displot (editing your df a bit to make the plot more readable):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {
    'item': {0: 'H1', 1: 'H2', 2: 'H3', 3: 'H4', 4: 'H5', 5: 'H6'},
    'CAR_PRICE1': {0: 7400.0, 1: 135450.0, 2: 5780.0, 3: 78500.0, 4: 25609.55, 5: 126000.0},
    'CAR_PRICE2': {0: 78400.0, 1: 27400.0, 2: 37600.0, 3: 37400.0, 4: 77400.0, 5: 97400.0}
}
df = pd.DataFrame(data)
sns.displot(data=df[['CAR_PRICE1', 'CAR_PRICE2']])
plt.show()
If you want percentage instead of count:
sns.displot(data=df[['CAR_PRICE1', 'CAR_PRICE2']], stat='percent')
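If the price categories need to be explicit bins rather than the ones displot picks automatically, one option is to bin the values with pd.cut and compute the share of each bin. This is only a sketch: the bin edges and labels below are assumptions chosen to fit the sample data, and both price columns are pooled into one series.

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["H1", "H2", "H3", "H4", "H5", "H6"],
    "CAR_PRICE1": [7400.0, 135450.0, 5780.0, 78500.0, 25609.55, 126000.0],
    "CAR_PRICE2": [78400.0, 27400.0, 37600.0, 37400.0, 77400.0, 97400.0],
})

# Pool both price columns into a single long series
prices = df.melt(id_vars="item", value_name="price")["price"]

# Assumed bin edges and labels; adjust them to your own price ranges
bins = [0, 50_000, 100_000, 150_000]
labels = ["0-50k", "50k-100k", "100k-150k"]
categories = pd.cut(prices, bins=bins, labels=labels)

# Share of all prices falling in each category, in percent
percent = categories.value_counts(normalize=True).sort_index() * 100
print(percent.round(1))
```

Calling percent.plot.barh() on the result should then put the categories on the Y-axis and the percentages on the X-axis.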
I want to superimpose two graphs that share the same x-axis. The first covers the full range, while the second covers only a subset of it.
test1 = pd.DataFrame(
    {
        'x': [1,2,3,4,5,6,7,8,9],
        'y': [0,1,1,2,1,2,1,1,1]
    }
)
test2 = pd.DataFrame(
    {
        'x': [1,2,4,5,8],
        'y': [3,2,2,3,3]
    }
)
You can use the xlim() function in matplotlib.
Example:
import matplotlib.pyplot as plt
import pandas as pd
test1 = pd.DataFrame(
    {
        'x': [1,2,3,4,5,6,7,8,9],
        'y': [0,1,1,2,1,2,1,1,1]
    }
)
test2 = pd.DataFrame(
    {
        'x': [1,2,4,5,8],
        'y': [3,2,2,3,3]
    }
)
plt.plot(test1['x'], test1['y'], 'b-', label='test1')
plt.plot(test2['x'], test2['y'], 'r-', label='test2')
plt.xlim(min(test1['x']), max(test1['x']))
plt.legend()
plt.show()
Result: https://i.stack.imgur.com/bz41W.png
IIUC, you can take the "y" values of the first frame and add the "y" values of the second wherever the two collide on "x" (x values missing from the second frame count as 0):
plt.plot(test1["x"], test1["y"].add(test1["x"].map(test2.set_index("x")["y"]), fill_value=0))
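To check what that kind of lookup-and-add produces before plotting it, here is a small sketch using the two frames from the question. The names extra and stacked are illustrative only and do not come from the original answer.

```python
import pandas as pd

test1 = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8, 9],
                      "y": [0, 1, 1, 2, 1, 2, 1, 1, 1]})
test2 = pd.DataFrame({"x": [1, 2, 4, 5, 8],
                      "y": [3, 2, 2, 3, 3]})

# Look up test2's y for every x in test1; x values absent from test2 give NaN
extra = test1["x"].map(test2.set_index("x")["y"])

# Add the two series, treating the missing (non-colliding) entries as 0
stacked = test1["y"].add(extra, fill_value=0)
print(stacked.tolist())
```

Without fill_value=0 the sum would be NaN at every x that appears only in test1, which would leave gaps in the plotted line.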
My dataframe looks like the following:
df = pd.DataFrame(
{'id': [543476, 539345, 536068, 537710, 538255],
'true_distance': [22836.49,7920.67,720.39,1475.87,35212.81],
'simulated_distance': [19670.69,7811.64,386.67,568.95,24720.94]}
)
df
       id  true_distance  simulated_distance
0  543476       22836.49            19670.69
1  539345        7920.67             7811.64
2  536068         720.39              386.67
3  537710        1475.87              568.95
4  538255       35212.81            24720.94
I need to compare the true distance and simulated distance in a single cdf plot.
EDIT
I want the cdf of true_distance and simulated_distance in one figure (identified by legend).
import pandas as pd
import seaborn as sns
# create a long format of your df
df_long = df.melt(id_vars=["id"],
                  value_vars=["true_distance", "simulated_distance"],
                  var_name="Variable",
                  value_name="Distance",
                  ignore_index=True,
                  )
# create a lineplot with one line per variable
sns.lineplot(data=df_long,
             y="Distance",
             x="id",
             hue="Variable",
             linewidth=2,
             )
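Note that a line plot against id is not a cumulative distribution. If an empirical CDF is what is wanted, a minimal sketch could look like the following. The ecdf helper is a hypothetical name written for this example, not part of pandas or seaborn. seaborn 0.11 and later also provide sns.ecdfplot, which should accept the long-format frame with x="Distance" and hue="Variable".

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [543476, 539345, 536068, 537710, 538255],
    "true_distance": [22836.49, 7920.67, 720.39, 1475.87, 35212.81],
    "simulated_distance": [19670.69, 7811.64, 386.67, 568.95, 24720.94],
})


def ecdf(values):
    """Hypothetical helper: sorted values and the fraction of observations <= each."""
    x = np.sort(np.asarray(values))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


fig, ax = plt.subplots()
for column in ["true_distance", "simulated_distance"]:
    x, y = ecdf(df[column])
    # Step plot: the curve jumps by 1/n at every observed value
    ax.step(x, y, where="post", label=column)

ax.set_xlabel("Distance")
ax.set_ylabel("Cumulative proportion")
ax.legend()
plt.show()
```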
I would like to plot pandas dataframes as subplots.
I read this post: How can I plot separate Pandas DataFrames as subplots?
Here is my minimum example where, like the accepted answer in the post, I used the ax keyword:
import pandas as pd
from matplotlib.pyplot import plot, show, subplots
import numpy as np
# Definition of the dataframe
df = pd.DataFrame({'Pressure': {0: 1, 1: 2, 2: 4}, 'Volume': {0: 2, 1: 4, 2: 8}, 'Temperature': {0: 3, 1: 6, 2: 12}})
# Plot
fig,axes = subplots(2,1)
df.plot(x='Temperature', y=['Volume'], marker = 'o',ax=axes[0,0])
df.plot(x='Temperature', y=['Pressure'], marker = 'o',ax=axes[1,0])
show()
Unfortunately, there is a problem with the indices:
df.plot(x='Temperature', y=['Volume'], marker = 'o',ax=axes[0,0])
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
Could you please help me?
If your subplots are arranged along a single dimension (2 x 1, for example), subplots returns a one-dimensional array of axes, so you can just use axes[0] and axes[1]. With a two-dimensional grid (2 x 3 subplots, for example), you do need to index with two numbers.
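A short sketch of both options, using the dataframe from the question: index the one-dimensional array with a single number, or pass squeeze=False so that the array is always two-dimensional. The names fig2 and axes2 are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Pressure": [1, 2, 4],
                   "Volume": [2, 4, 8],
                   "Temperature": [3, 6, 12]})

# With a single column, subplots() returns a 1-D array of Axes
fig, axes = plt.subplots(2, 1)
print(axes.shape)  # (2,)
df.plot(x="Temperature", y=["Volume"], marker="o", ax=axes[0])
df.plot(x="Temperature", y=["Pressure"], marker="o", ax=axes[1])

# squeeze=False always returns a 2-D array, so axes2[row, col] works for any grid
fig2, axes2 = plt.subplots(2, 1, squeeze=False)
print(axes2.shape)  # (2, 1)
df.plot(x="Temperature", y=["Volume"], marker="o", ax=axes2[0, 0])
df.plot(x="Temperature", y=["Pressure"], marker="o", ax=axes2[1, 0])

plt.show()
```

The squeeze=False variant is convenient when the grid size is computed at run time, because the indexing code does not have to change when a dimension happens to be 1.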
I have a dictionary of values (drug) as follows:
{0: {0: 100.0, 1: 0.41249706379061035, 2: 5.144449764434768, 3: 31.078456871927678}, 1: {0: 100.0, 1: 0.6688801420346955, 2: 77.32360971119694, 3: 78.15132480853421}, 2: {0: 100.0, 1: 136.01949766418852, 2: 163.4967732211563, 3: 146.7726208999281}}
It contains 3 drug types and, for each one, the efficacy of that drug at 4 different concentrations.
I'm trying to make a clustered bar chart which compares the 3 drugs against each other.
Currently, my code is the following:
fig, ax = plt.subplots()
width = 0.35
ind = np.arange(3)
for x in range(3):
    ax.bar(ind + (width * x), drug[x].values(), width, bottom=0)
ax.set_title('Drug efficacy')
ax.set_xticks(ind + width / 2)
ax.set_xticklabels(list(string.ascii_uppercase[0:drugCount]))
ax.autoscale_view()
plt.show()
I have adapted the code from this guide, but am having multiple problems.
I think the main reason is that the data used in the example is such that values in one group correspond to the same colour rather than the same cluster.
How can I adapt this code such that it will plot the efficacy of each drug in the 4 different concentrations in isolation compared to the other drugs?
IIUC you want to normalize your values by column, which can be done using sklearn:
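import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
# columns are the drugs, rows are the concentrations
df = pd.DataFrame(drug)
# scale each column (drug) to the range [0, 1]
scaler = preprocessing.MinMaxScaler()
df = pd.DataFrame(scaler.fit_transform(df))
# transpose so that each drug becomes one cluster of bars
df.T.plot(kind="bar")
plt.show()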
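If you would rather stay with plain matplotlib and the raw (unscaled) values, the loop from the question can be turned around so that it runs over concentrations rather than drugs. The following is a sketch under that assumption: each ax.bar call then receives one height per drug, which matches the three x positions. The "concentration N" legend labels are placeholders, since the question does not name the concentrations.

```python
import string

import matplotlib.pyplot as plt
import numpy as np

drug = {
    0: {0: 100.0, 1: 0.41249706379061035, 2: 5.144449764434768, 3: 31.078456871927678},
    1: {0: 100.0, 1: 0.6688801420346955, 2: 77.32360971119694, 3: 78.15132480853421},
    2: {0: 100.0, 1: 136.01949766418852, 2: 163.4967732211563, 3: 146.7726208999281},
}

n_drugs = len(drug)        # one cluster per drug
n_conc = len(drug[0])      # one bar per concentration inside each cluster
width = 0.8 / n_conc       # leave a gap between neighbouring clusters
ind = np.arange(n_drugs)

fig, ax = plt.subplots()
for c in range(n_conc):
    # Heights of concentration c for every drug, shifted by c bar widths
    heights = [drug[d][c] for d in range(n_drugs)]
    ax.bar(ind + c * width, heights, width, label=f"concentration {c}")

ax.set_title("Drug efficacy")
# Centre each tick under its cluster of bars
ax.set_xticks(ind + width * (n_conc - 1) / 2)
ax.set_xticklabels(list(string.ascii_uppercase[:n_drugs]))
ax.legend()
plt.show()
```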
I need to plot a pie chart using matplotlib but my DataFrame has 3 columns namely gender, segment and total_amount.
I have tried playing with plt.pie() arguments but it only takes x and labels for data. I tried setting gender as a legend but then it doesn't look right.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({
    'gender': {0: 'Female', 1: 'Female', 2: 'Female', 3: 'Male', 4: 'Male', 5: 'Male'},
    'Segment': {0: 'Gold', 1: 'Platinum', 2: 'Silver', 3: 'Gold', 4: 'Platinum', 5: 'Silver'},
    'total_amount': {0: 2110045.0, 1: 2369722.0, 2: 1897545.0,
                     3: 2655970.0, 4: 2096445.0, 5: 2347134.0},
})
plt.pie(data = df,x="claim_amount",labels="Segment")
plt.legend(d3.gender)
plt.show()
The result I want is a pie chart of total_amount and its labels as gender and segment. If I can get the percentage, it will be a bonus.
I suggest the following:
# Data to plot
# Take the information from the Segment and gender columns and join them into one string
labels = df["Segment"]+ " " + df["gender"].map(str)
# Extract the sizes of the segments
sizes = df["total_amount"]
# Plot with labels and percentage
plt.pie(sizes, labels=labels,autopct='%1.1f%%')
plt.show()