How do I plot a Bar graph when comparing the rows? [duplicate] - python

This question already has answers here:
How to create a grouped bar plot
(4 answers)
Annotate bars with values on Pandas bar plots
(4 answers)
Closed 3 years ago.
I am having trouble plotting a bar graph for this dataset.
+------+------------+--------+
| Year | Discipline | Takers |
+------+------------+--------+
| 2010 | BSCS | 213 |
| 2010 | BSIS | 612 |
| 2010 | BSIT | 796 |
| 2011 | BSCS | 567 |
| 2011 | BSIS | 768 |
| 2011 | BSIT | 504 |
| 2012 | BSCS | 549 |
| 2012 | BSIS | 595 |
| 2012 | BSIT | 586 |
+------+------------+--------+
I'm trying to plot a grouped bar chart with three bars, one per year, showing the number of takers. This is the code I wrote:
import matplotlib.pyplot as plt
import pandas as pd
Y = df_group['Takers']
Z = df_group['Year']
df = pd.DataFrame(df_group['Takers'], index = df_group['Discipline'])
df.plot.bar(figsize=(20,10)).legend(["2010", "2011","2012"])
plt.show()
I'm expecting something like this graph, with the same legend.

First reshape with DataFrame.pivot, then plot, and finally add the labels:
import numpy as np
# pandas 2.0+ only accepts keyword arguments for pivot
ax = df.pivot(index='Discipline', columns='Year', values='Takers').plot.bar(figsize=(10,10))
for p in ax.patches:
    ax.annotate(np.round(p.get_height(),decimals=2), (p.get_x()+p.get_width()/2., p.get_height()), ha='center', va='center', xytext=(0, 10), textcoords='offset points')
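For reference, here is a self-contained sketch of the same reshape on the data from the question. It assumes matplotlib 3.4 or newer, where `Axes.bar_label` can stand in for the manual `annotate` loop; the names `wide` and `container` are only illustrative, and the `Agg` backend is selected so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# The data from the question, in long format (one row per year and discipline).
df = pd.DataFrame({
    "Year": [2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012],
    "Discipline": ["BSCS", "BSIS", "BSIT"] * 3,
    "Takers": [213, 612, 796, 567, 768, 504, 549, 595, 586],
})

# Wide format: one row per discipline, one column per year.
wide = df.pivot(index="Discipline", columns="Year", values="Takers")

ax = wide.plot.bar(figsize=(10, 6))
# Each year forms one bar container; bar_label writes the heights above its bars.
for container in ax.containers:
    ax.bar_label(container, padding=3)
ax.set_ylabel("Takers")
plt.tight_layout()
```

Printing `wide` shows why the pivot is needed: pandas draws one group per index row and one bar per column, so the years have to become columns before plotting.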

With Seaborn, you can use your DataFrame directly, without reshaping it first:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
ax = sns.barplot(data=df, x="Discipline", hue="Year", y="Takers")
To add the labels, you can use the snippet from jezrael:
for p in ax.patches:
    ax.annotate(np.round(p.get_height(),decimals=2), (p.get_x()+p.get_width()/2., p.get_height()), ha='center', va='center', xytext=(0, 10), textcoords='offset points')
plt.tight_layout()

Just add 2 more lines before plt.show() in your code and you will get your result.
The whole code is given below.
import matplotlib.pyplot as plt
import pandas as pd
Y = df_group['Takers']
Z = df_group['Year']
df = pd.DataFrame(df_group['Takers'], index = df_group['Discipline'])
df.plot.bar(figsize=(20,10)).legend(["2010", "2011","2012"])
for i, v in enumerate(Y):
    # i is the index (the bar's x position) and v is the value stored in Y.
    # x and y are the text coordinates; s is the value to show on the plot.
    plt.text(x=i, y=v+2, s=v)
plt.show()

Related

How to plot each row in Pandas dataframe and color it by data from one column

I have a pandas table that looks like this:
| Sample | Type | 1 | 2 | 3 | ...
| S1 | Type1 | 1 | 2 | 3 | ...
| S2 | Type2 | 5 | 6 | 7 | ...
| S3 | Type3 | 8 | 9 | 10 | ...
....
| S100 | Type3 | n | n | n | ...
I want to draw a multi-line plot in which each line is colored by its type from the 'Type' column (there are only three types). The x-axis must show the numbers from the column names (1, 2, 3, etc.).
I have tried the solution from here, but because it plots each row separately, I end up with more than 100 different colors.
Here is a CSV file with a toy example.
This is what I did, based on the linked solution. The result is good, apart from the fact that I want only 3 line colors, based on the 'Type' column:
df = pd.read_csv('data.csv', sep=',')
df = df.set_index('Type')
df = df.drop(columns='Sample')
ax = df.T.plot(figsize=(7, 6))
ax.set_ylabel('Absolute Power (log)', fontsize=12)
ax.set_xlabel('Frequencies', fontsize=12)
plt.show()
IIUC, you can use groupby to draw one collection of lines per type, then keep only one line of each type to build the legend:
fig, ax = plt.subplots(figsize=(8, 6))
# The keys must match the values in the 'Type' index.
colormap = {'type 1': 'red', 'type 2': 'green', 'type 3': 'blue'}
custom_lines = {}
for name, subdf in df.groupby(level='Type'):
    lines = ax.plot(subdf.T, label=name, color=colormap[name])
    custom_lines[name] = lines[0]  # one representative line per type
ax.set_ylabel('Absolute Power (log)', fontsize=12)
ax.set_xlabel('Frequencies', fontsize=12)
plt.legend(custom_lines.values(), custom_lines.keys())
plt.show()

How do I cluster two stacked bars using matplotlib/python? [duplicate]

This question already has answers here:
grouped stacked bar plot of different datasets stored as np.arrays
(1 answer)
How can I group a stacked bar chart?
(2 answers)
Closed 9 months ago.
I am trying to create a plot with two stacked bars, side by side, for each FiscalYear, using matplotlib / Python, and I can't see how to "group" the stacked bars.
The post "How to have clusters of stacked bars with python (Pandas)" is very close to what I'm trying to do, but I have not managed to get a solution working from it.
I can create the stacked bars, but not break them down into clusters or groups.
How do I turn this data
+----+--------------+---------------+-----------+----------+------------+
| | FiscalYear | Unallocated | Planned | Actual | Forecast |
|----+--------------+---------------+-----------+----------+------------|
| 0 | 2022 | 744765 | 685998 | 516718 | 442575 |
| 1 | 2023 | 51459 | 323787 | 372689 | 9759 |
| 2 | 2024 | 976143 | 560108 | 255508 | 36041 |
| 3 | 2025 | 695902 | 471972 | 464622 | 332749 |
| 4 | 2026 | 165179 | 345003 | 416089 | 729036 |
+----+--------------+---------------+-----------+----------+------------+
into this picture?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tabulate import tabulate
df = pd.DataFrame(np.random.randint(1000, 1000000, size=(5, 4)),
                  columns=['Unallocated', 'Planned', 'Actual', 'Forecast'])
df.insert(loc=0,
          column='FiscalYear',
          value=[2022, 2023, 2024, 2025, 2026])
print(tabulate(df, headers='keys', tablefmt='psql'))
labels = df['FiscalYear']
u = df['Unallocated']
p = df['Planned']
a = df['Actual']
f = df['Forecast']
width = 0.35 # the width of the bars: can also be len(x) sequence
fig, ax = plt.subplots()
ax.bar(labels, u, width, label='o')
ax.bar(labels, p, width, bottom=u, label='p')
ax.bar(labels, a, width, label='a')
ax.bar(labels, f, width, bottom=a, label='f')
ax.legend()
plt.show()
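In the code above all four `ax.bar` calls use the same x positions, so the second stack is drawn on top of the first. One possible approach, sketched here on the assumption that the two stacks are Unallocated + Planned and Actual + Forecast (as the `bottom=` arguments suggest), is to place the bars on integer positions and shift each stack by half a bar width. The variable `x` and the full-word legend labels are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The values from the table in the question.
df = pd.DataFrame({
    "FiscalYear": [2022, 2023, 2024, 2025, 2026],
    "Unallocated": [744765, 51459, 976143, 695902, 165179],
    "Planned": [685998, 323787, 560108, 471972, 345003],
    "Actual": [516718, 372689, 255508, 464622, 416089],
    "Forecast": [442575, 9759, 36041, 332749, 729036],
})

x = np.arange(len(df))  # one integer slot per fiscal year
width = 0.35            # width of each stacked bar

fig, ax = plt.subplots()
# Left stack: Planned on top of Unallocated, shifted left by half a bar width.
ax.bar(x - width / 2, df["Unallocated"], width, label="Unallocated")
ax.bar(x - width / 2, df["Planned"], width, bottom=df["Unallocated"], label="Planned")
# Right stack: Forecast on top of Actual, shifted right by half a bar width.
ax.bar(x + width / 2, df["Actual"], width, label="Actual")
ax.bar(x + width / 2, df["Forecast"], width, bottom=df["Actual"], label="Forecast")

# Put the tick for each group between its two stacks and label it with the year.
ax.set_xticks(x)
ax.set_xticklabels(df["FiscalYear"].astype(str))
ax.legend()
```

Plotting against integer positions rather than the year values keeps the offsets independent of the data; the years only appear as tick labels.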

How to add colormap and rectangular boxes using matplotlib python?

I'm trying to reproduce the precedence matrix plot from bupaR in Python.
So far I'm able to add the text and plot the categorical variables with the count.
def plot_precedence_matrix(data,colx,coly,cols,color=['grey','black'],ratio=10,font='Helvetica',save=False,save_name='Default'):
    df = data.copy()
    # Create a dict to encode the categories into numbers (sorted)
    colx_codes=dict(zip(df[colx].sort_values().unique(),range(len(df[colx].unique()))))
    coly_codes=dict(zip(df[coly].sort_values(ascending=False).unique(),range(len(df[coly].unique()))))
    # Apply the encoding
    df[colx]=df[colx].apply(lambda x: colx_codes[x])
    df[coly]=df[coly].apply(lambda x: coly_codes[x])
    ax=plt.gca()
    ax.xaxis.set_label_position('top')
    ax.xaxis.set_ticks_position('top')
    # Prepare the aspect of the plot
    # plt.rcParams['xtick.bottom'] = plt.rcParams['xtick.labelbottom'] = False
    # plt.rcParams['xtick.top'] = plt.rcParams['xtick.labeltop'] = True
    plt.rcParams['font.sans-serif']=font
    plt.rcParams['xtick.color']=color[-1]
    plt.rcParams['ytick.color']=color[-1]
    # plt.box(False)
    # Plot all the lines for the background
    for num in range(len(coly_codes)):
        plt.hlines(num,-1,len(colx_codes),linestyle='dashed',linewidth=2,color=color[num%2],alpha=0.1)
    for num in range(len(colx_codes)):
        plt.vlines(num,-1,len(coly_codes),linestyle='dashed',linewidth=2,color=color[num%2],alpha=0.1)
    for x, y, tex in zip(df[colx], df[coly], df[colx]):
        t = plt.text(x, y, round(tex, 1), horizontalalignment='center',
                     verticalalignment='center', fontdict={'color':'black',
                                                           'size':30})
    # Change the ticks numbers to categories and limit them
    plt.xticks(ticks=list(colx_codes.values()),labels=colx_codes.keys(),rotation=90)
    plt.yticks(ticks=list(coly_codes.values()),labels=coly_codes.keys())
    # Lighten borders
    plt.gca().spines["top"].set_alpha(0.1)
    plt.gca().spines["bottom"].set_alpha(0.1)
    plt.gca().spines["right"].set_alpha(0.1)
    plt.gca().spines["left"].set_alpha(0.1)
    # Save if wanted
    if save:
        plt.savefig(save_name+'.png')
Sample dataset
| Antecedent | Consequent | Count |
|-------------------:|-------------------:|-------|
| register request | examine thoroughly | 1 |
| examine thoroughly | check ticket | 2 |
| check ticket | decide | 6 |
| decide | reject request | 3 |
| register request | check ticket | 2 |
| check ticket | examine casually | 2 |
| examine casually | decide | 2 |
| decide | pay compensation | 3 |
| register request | examine casually | 3 |
| examine casually | check ticket | 4 |
| decide | reinitiate request | 3 |
| reinitiate request | examine thoroughly | 1 |
| check ticket | examine thoroughly | 1 |
| examine thoroughly | decide | 1 |
| reinitiate request | check ticket | 1 |
| reinitiate request | examine casually | 1 |
colors=['darkorange','grey','darkblue']
#create the plot
fig = plt.figure(figsize=(12,8))
plot_precedence_matrix(df, 'Antecedent', 'Consequent', 'Count',color=colors,ratio=100, font='cursive')
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.show()
How can I add rectangular boxes colored by a color scale using matplotlib? Any pointers on producing the plot above in Python would be appreciated.
You could draw colored rectangles at each of the positions. A colormap together with a norm could define the color.
Here is an example:
from matplotlib import pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import ListedColormap
import pandas as pd
import numpy as np
from io import StringIO

def plot_precedence_matrix(data, colx, coly, cols, color=['grey', 'black'], ratio=10, font='Helvetica',
                           save=False, save_name='Default'):
    df = data.copy()
    # Create a dict to encode the categories into numbers (sorted)
    colx_codes = dict(zip(df[colx].sort_values().unique(), range(len(df[colx].unique()))))
    coly_codes = dict(zip(df[coly].sort_values(ascending=False).unique(), range(len(df[coly].unique()))))
    # Apply the encoding
    df[colx] = df[colx].apply(lambda x: colx_codes[x])
    df[coly] = df[coly].apply(lambda x: coly_codes[x])
    ax = plt.gca()
    ax.xaxis.set_label_position('top')
    ax.xaxis.set_ticks_position('top')
    # Prepare the aspect of the plot
    plt.rcParams['font.sans-serif'] = font
    plt.rcParams['xtick.color'] = color[-1]
    plt.rcParams['ytick.color'] = color[-1]
    # Plot the lines for the background
    for num in range(len(coly_codes)):
        ax.hlines(num, -1, len(colx_codes), linestyle='dashed', linewidth=2, color=color[num % 2], alpha=0.1)
    for num in range(len(colx_codes)):
        ax.vlines(num, -1, len(coly_codes), linestyle='dashed', linewidth=2, color=color[num % 2], alpha=0.1)
    cmap = ListedColormap(plt.get_cmap('Blues')(np.linspace(0.1, 1, 256)))  # skip too light colors
    # Use the count column (cols) for both the cell text and the cell color
    norm = plt.Normalize(df[cols].min(), df[cols].max())
    for x, y, tex in zip(df[colx], df[coly], df[cols]):
        t = ax.text(x, y, round(tex, 1), horizontalalignment='center', verticalalignment='center',
                    fontdict={'color': 'black' if norm(tex) < 0.6 else 'white', 'size': 30})
        ax.add_patch(plt.Rectangle((x - .5, y - .5), 1, 1, facecolor=cmap(norm(tex)), edgecolor='white'))
    plt.colorbar(ScalarMappable(cmap=cmap, norm=norm), ax=ax)
    # Change the ticks numbers to categories and limit them
    ax.set_xticks(list(colx_codes.values()))
    ax.set_xticklabels(colx_codes.keys(), rotation=90, fontsize=14)
    ax.set_yticks(list(coly_codes.values()))
    ax.set_yticklabels(coly_codes.keys(), fontsize=14)
    # Lighten borders
    for spine in ax.spines:
        ax.spines[spine].set_alpha(0.1)
    plt.tight_layout()  # fit the labels into the figure
    if save:
        plt.savefig(save_name + '.png')

df_str = """
register request | examine thoroughly | 1
examine thoroughly | check ticket | 2
check ticket | decide | 6
decide | reject request | 3
register request | check ticket | 2
check ticket | examine casually | 2
examine casually | decide | 2
decide | pay compensation | 3
register request | examine casually | 3
examine casually | check ticket | 4
decide | reinitiate request | 3
reinitiate request | examine thoroughly | 1
check ticket | examine thoroughly | 1
examine thoroughly | decide | 1
reinitiate request | check ticket | 1
reinitiate request | examine casually | 1
"""
# Raw string, so the regex backslashes are not treated as string escapes
df = pd.read_csv(StringIO(df_str), delimiter=r"\s*\|\s*", engine='python', names=['Antecedent', 'Consequent', 'Count'])
colors = ['darkorange', 'grey', 'darkblue']
fig = plt.figure(figsize=(12, 8))
plot_precedence_matrix(df, 'Antecedent', 'Consequent', 'Count', color=colors, ratio=100, font='cursive')
plt.show()
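The colormap-plus-norm step can also be checked in isolation. The sketch below uses a handful of counts from the sample dataset (the array `counts` is only illustrative) and prints, for each count, its normalized position and the RGBA color that the truncated 'Blues' colormap assigns to it.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, Normalize

counts = np.array([1, 2, 6, 3, 4])  # a few counts from the sample dataset

# Drop the lightest 10% of 'Blues' so the smallest count is still visible.
cmap = ListedColormap(plt.get_cmap("Blues")(np.linspace(0.1, 1, 256)))
# Normalize maps the interval [min, max] linearly onto [0, 1].
norm = Normalize(vmin=counts.min(), vmax=counts.max())

for c in counts:
    position = float(norm(c))  # 0.0 for the smallest count, 1.0 for the largest
    rgba = cmap(position)      # (red, green, blue, alpha), each between 0 and 1
    print(int(c), round(position, 2), [round(float(v), 2) for v in rgba])
```

The same `norm` value drives the text color in the function above: cells whose normalized count is 0.6 or more get white text so it stays readable on the darker blues.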

How to add a circle over a scatter plot I have created using seaborn [duplicate]

This question already has answers here:
plot a circle with pyplot
(9 answers)
Closed 2 years ago.
As the title states, I want to superimpose a circle on my scatter plot.
Let's say I have a df like this:
+--------+---------+------+
| X_GRID | Y_GRID | GRP |
+--------+---------+------+
| 1 | 0 | HOT |
| 2 | -1 | COLD |
| 1 | 2 | COLD |
| 2 | 1 | HOT |
+--------+---------+------+
and I use it to create a scatter plot like so
ax = sns.scatterplot(data=df, hue='GRP', x='X_GRID', y='Y_GRID')
ax.set(xlim=(-4, 4))
ax.set(ylim=(-4, 4))
How would I go about drawing a red circle (or any color) over my scatter plot, say with radius 2 and centered at around (0, 0)?
Adapted from here:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.DataFrame()
df['X_GRID'] = [1,2,1,2]
df['Y_GRID'] = [0,-1,2,1]
df['GRP'] = ['HOT','COLD','COLD','HOT']
# A Circle is a patch: center (0, 0), radius 2, outline only
circle1 = plt.Circle(xy=(0, 0), radius=2, color='red', fill=False)
ax = sns.scatterplot(data=df, hue='GRP', x='X_GRID', y='Y_GRID')
ax.add_patch(circle1)
ax.set(xlim=(-4, 4))
ax.set(ylim=(-4, 4))
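One thing to watch with this approach: on a non-square axes the circle is drawn as an ellipse, because the x and y units have different lengths on screen. The sketch below swaps seaborn for a plain matplotlib scatter (an assumption made only to keep it dependency-light) and sets an equal aspect ratio; `ax.set_aspect("equal")` can be added in the same way after `sns.scatterplot`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# The four points from the question, without the GRP coloring.
ax.scatter([1, 2, 1, 2], [0, -1, 2, 1])

# Outline-only circle of radius 2 centered at the origin.
circle = plt.Circle(xy=(0, 0), radius=2, color="red", fill=False)
ax.add_patch(circle)

ax.set(xlim=(-4, 4), ylim=(-4, 4))
# Make one x unit as long as one y unit so the circle looks round.
ax.set_aspect("equal")
```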

Red line when data is negative and green line when data is positive in Pandas

I want the line in this graph to be red where y is below 0 and green where it is above 0:
I'm trying this, but without success:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import seaborn as sns
sns.set(rc={"figure.figsize": (20, 10)})
df_positive = df[df["cum_profit"] > 0]["cum_profit"]
df_negative = df[df["cum_profit"] < 0]["cum_profit"]
plt.plot(df_positive, color='green')
plt.plot(df_negative, color='red')
plt.show()
My data looks like this:
+---+---------------------+------------+-----------+
| | placed_date | cum_profit | cum_stake |
+---+---------------------+------------+-----------+
| 0 | 2017-07-14 16:06:38 | -25.0 | 25 |
| 1 | 2017-07-14 16:26:42 | -50.0 | 50 |
| 2 | 2017-07-14 16:54:53 | -75.0 | 75 |
| 3 | 2017-07-17 16:48:07 | -150.0 | 150 |
| 4 | 2017-07-17 18:52:22 | -200.0 | 200 |
| 5 | 2017-07-17 18:54:51 | 10.0 | 250 |
| 6 | 2017-07-17 18:59:19 | 190.0 | 300 |
| 7 | 2017-07-17 19:06:41 | 140.0 | 350 |
| 8 | 2017-07-17 19:42:42 | 90.0 | 400 |
| 9 | 2017-07-18 12:46:59 | 154.0 | 450 |
+---+---------------------+------------+-----------+
Update
Latest attempt:
#df["positive"] = np.where(df["cum_profit"] > 0, df["cum_profit"], None)
#df["negative"] = np.where(df["cum_profit"] < 0, df["cum_profit"], None)
df.cum_profit.where(df.cum_profit.ge(0), np.nan).plot(color='green')
df.cum_profit.where(df.cum_profit.lt(0), np.nan).plot(color='red')
#plt.plot(df["positive"] , color='green')
#plt.plot(df["negative"], color='red')
plt.show()
The problem you are running into is that matplotlib draws a line connecting each pair of consecutive plottable points. By slicing your data frame, you are still providing only plottable points, just with a spaced-out index.
To get around this, you can include the non-plottable points in the plotting operation. Instead of slicing, use .where() with NaN as the fill value.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(rc={"figure.figsize": (20, 10)})
np.random.seed(200)
df = pd.DataFrame(np.cumsum(np.random.rand(10000)-0.5), columns=['cum_profit'])
df.cum_profit.where(df.cum_profit.ge(0), np.nan).plot(color='green')
df.cum_profit.where(df.cum_profit.lt(0), np.nan).plot(color='red')
plt.show()
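The difference between slicing and `.where()` can be seen on the `cum_profit` values from the table in the question. This is a small sketch; the names `sliced` and `masked` are illustrative.

```python
import numpy as np
import pandas as pd

# cum_profit values from the table in the question.
cum_profit = pd.Series(
    [-25.0, -50.0, -75.0, -150.0, -200.0, 10.0, 190.0, 140.0, 90.0, 154.0],
    name="cum_profit",
)

# Boolean slicing drops the non-matching rows entirely ...
sliced = cum_profit[cum_profit < 0]
# ... while where() keeps every row and fills the non-matching ones with NaN.
masked = cum_profit.where(cum_profit.lt(0), np.nan)

print(len(sliced), len(masked))  # 5 10
print(int(masked.isna().sum()))  # 5
```

Because matplotlib breaks a line at NaN, the segment between the last negative point and the first positive point is not drawn in either color, which leaves a small gap at each sign change.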
Here is an example with a different data set, but it should be easy to apply to your data. Use np.ma.masked_where() to split the data into two masked arrays, then plot each one. The t variable defines the x range from 0.0 to 2.0 in steps of 0.01.
import numpy as np
import matplotlib.pyplot as plt
t = np.arange(0.0, 2.0, 0.01)
s = np.sin(2 * np.pi * t)
lower = 0
# supper hides everything above the threshold, so only values <= 0 remain
supper = np.ma.masked_where(s > lower, s)
# slower hides everything below the threshold, so only values >= 0 remain
slower = np.ma.masked_where(s < lower, s)
fig, ax = plt.subplots()
ax.plot(t, slower, color='green')
ax.plot(t, supper, color='red')
plt.show()