How to plot certain values from columns (subplots) - python

My current dataframe looks like this:
                      a    b    c    d    e
in_1     | in_2    |
---------|---------|--------------------------
car      | bmw        2    4    5   34   46
         | merc      23    4   55   64   21
         | range    453   32    2   56   21
         | lambo      4    6    2    5   12
         | ferrari   12   46   34   23  642
fastfood | burger   123   34  213   23  234
         | kfc      123   34  235  123   24
         | tacoBell 213  432  124   12    1
I am trying to plot a subplot for each 'in_1' in which the x-axis is the column names (a, b, c, d, e), while the y-axis is the counts (the numbers in the cells).
So the first subplot would have the title "car". The x-axis would have 'a','b', 'c', 'd', 'e'. The y-axis will have the counts for each of 'bmw', 'merc', 'range', 'lambo', 'ferrari'.
The subplots can be bar or line plots and the values of in_2 can be represented in the form of a legend.

So I guess you could do something like this:
import numpy as np
import matplotlib.pyplot as plt
ng = 5 #number of groups
bmw = ..
merc = ..
..
fig, ax = plt.subplots()
index = np.arange(ng)
bar_width = 0.2
fbmw = ax.bar(index, bmw, bar_width, color='r', label='BMW')
fmerc = ax.bar(index + bar_width, merc, bar_width, color='b', label='MERC')
...
# don't forget to shift each further series by one more bar_width
ax.set_xlabel('Cars')
ax.set_ylabel('Whatever this is')
ax.set_xticks(index + bar_width/2)
ax.set_xticklabels(('a', 'b', 'c', 'd', 'e'))
ax.legend()
fig.tight_layout()
Since I don't know what the numbers and the columns a, b, c, d, e represent, I used placeholder axis labels. I also assumed you already have the values for bmw, merc, etc., so I left them out. Hope this helps!

You can use a simple loop to pick up all your columns and assign them to an axis. It also creates one subplot per row, with the number of rows determined by the unique values in in_1.
Please note that it assumes you have a MultiIndex df with in_1 and in_2 as the index levels.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
n = len(df.columns)
rows = df.index.get_level_values('in_1').unique()
index = np.arange(n)
fig, axs = plt.subplots(len(rows),1,figsize=(15,10))
width = 0.2
colors=["#e08283", "#52b3d9", "#fde3a7", "#3fc380"]
for i in range(len(rows)):
    # rows belonging to the current in_1 group
    intdf = df[df.index.get_level_values('in_1') == rows[i]]
    offset = 0
    for j in range(len(intdf.index.get_level_values('in_2'))):
        v = intdf.iloc[j].values
        # cycle through the palette so a group with more rows than colors does not raise IndexError
        axs[i].bar(index + offset, v, width,
                   label=intdf.index.get_level_values('in_2')[j],
                   color=colors[j % len(colors)])
        offset += width
    axs[i].set_xlabel(rows[i])
    axs[i].set_xticks(index + width)
    axs[i].set_xticklabels(tuple(df.columns))
    axs[i].legend(loc=2)
plt.show()
Here is the output.
As a reminder more information can be found here
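For comparison, the same layout can be sketched with pandas' built-in plotting, which handles the bar offsets and colors itself. This is a minimal sketch, assuming the table from the question is a DataFrame with a two-level index named in_1 and in_2; the figure size, the "count" label, and the Agg backend (chosen only so the code runs without a display) are assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the question's table with a two-level index (in_1, in_2).
index = pd.MultiIndex.from_tuples(
    [("car", "bmw"), ("car", "merc"), ("car", "range"), ("car", "lambo"),
     ("car", "ferrari"), ("fastfood", "burger"), ("fastfood", "kfc"),
     ("fastfood", "tacoBell")],
    names=["in_1", "in_2"],
)
df = pd.DataFrame(
    [[2, 4, 5, 34, 46], [23, 4, 55, 64, 21], [453, 32, 2, 56, 21],
     [4, 6, 2, 5, 12], [12, 46, 34, 23, 642], [123, 34, 213, 23, 234],
     [123, 34, 235, 123, 24], [213, 432, 124, 12, 1]],
    index=index, columns=list("abcde"),
)

groups = df.index.get_level_values("in_1").unique()
# squeeze=False keeps axs two-dimensional even when there is a single group.
fig, axs = plt.subplots(len(groups), 1, figsize=(10, 8), squeeze=False)

for ax, key in zip(axs.flat, groups):
    # xs() selects one in_1 group and drops that level; transposing puts
    # a..e on the x-axis and makes each in_2 value its own bar series.
    df.xs(key, level="in_1").T.plot(kind="bar", ax=ax, title=key, rot=0)
    ax.set_ylabel("count")
    ax.legend(title="in_2")

fig.tight_layout()
# In an interactive session, call plt.show() here (or fig.savefig(...)).
```

Switching kind="bar" to kind="line" gives the line-plot variant the question also allows.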

Related

Change plot color according to the values from array [duplicate]

This question already has answers here:
matplotlib color line by "value" [duplicate]
(2 answers)
How to manually create a legend
(5 answers)
map pandas values to a categorical level
(1 answer)
Closed 1 year ago.
I tried the solutions from these similar questions, but did not succeed:
python: how to plot one line in different colors
Color by Column Values in Matplotlib
pandas plot one line graph with color change on column
I want the plot to change color when the value changes: for instance, if the emotion is 0 the line stays black, if the value changes to 1 the color becomes red, if the value is 2 the color becomes blue, and so on. The progress I've made so far is attached to this question. Thank you in advance.
random_emotions = [0,0,0,0,0,0,0,1,2,3,2,1,2,1,
2,3,2,2,2,2,1,1,2,3,3,3,3,3,3,4,
4,4,4,4,2,1,2,2,1,2,3,4,0,0,0,0,0]
random_emotions = np.array(random_emotions)
EmotionsInNumber = random_emotions
x = np.array(list(range(0,len(EmotionsInNumber))))
Angry = np.ma.masked_where(EmotionsInNumber == 0,EmotionsInNumber)
Fear = np.ma.masked_where(EmotionsInNumber == 1,EmotionsInNumber)
Happy = np.ma.masked_where(EmotionsInNumber == 2,EmotionsInNumber)
Neutral = np.ma.masked_where(EmotionsInNumber == 3, EmotionsInNumber)
Sad = np.ma.masked_where(EmotionsInNumber == 4,EmotionsInNumber)
fig, ax = plt.subplots()
ax.plot(x, Angry,linewidth = 4, color = 'black')
ax.plot(x, Fear,linewidth = 4, color = 'red')
ax.plot(x, Happy,linewidth = 4, color = 'blue')
ax.plot(x, Neutral,linewidth = 4, color = 'yellow')
ax.plot(x, Sad,linewidth = 4, color = 'green')
ax.legend(['Angry','Fear','Happy','Neutral','Sad',])
ax.set_title("Emotion Report of ")
plt.show()
This is the result that I am getting
The color is not changed accordingly, the legends are wrong and I have no idea how to fix this.
matplotlib color line by "value" [duplicate]
This 'matplotlib color line by "value" [duplicate]' is the closest I got, but when the color changes to cyan on index 1 and 5, the blue should be empty but it keeps plotting both blue and cyan. This is because the dataframe is grouped by 'colors' but it should not plot blue on 1 and 5 and cyan on 2,3,4 on the graph.
The main question will be closed as a duplicate to this answer of this question
The code is explained in the duplicates.
When a question is marked as a duplicate and you don't agree, it is your responsibility to show with code, exactly how you tried to incorporate the duplicate, and what's not working.
SO is a repository of questions and answers, which can be used as a reference to answer new questions. When a question is answered by code in an existing question/answer, it is up to you to do the work.
Since it's a duplicate, this answer has been added as a community wiki.
from matplotlib.lines import Line2D
import pandas as pd
import matplotlib.pyplot as plt
# set up the dataframe to match the duplicate
random_emotions = [0,0,0,0,0,0,0,1,2,3,2,1,2,1, 2,3,2,2,2,2,1,1,2,3,3,3,3,3,3,4, 4,4,4,4,2,1,2,2,1,2,3,4,0,0,0,0,0]
df = pd.DataFrame({'val': random_emotions})
# mapping values is covered in the duplicate
emotion_dict = {0: 'Angry', 1: 'Fear', 2: 'Happy', 3: 'Neutral', 4: 'Sad'}
color_dict = {0: 'k', 1: 'r', 2: 'b', 3: 'y', 4: 'g'}
df['emotion'] = df.val.map(emotion_dict)
df['color'] = df.val.map(color_dict)
# everything from here on is taken from the duplicate
df['change'] = df.val.ne(df.val.shift().bfill()).astype(int)
df['subgroup'] = df['change'].cumsum()
df.index += df['subgroup'].values
first_i_of_each_group = df[df['change'] == 1].index
for i in first_i_of_each_group:
    # Copy next group's first row to current group's last row
    df.loc[i-1] = df.loc[i]
    # But make this new row part of the current group
    df.loc[i-1, 'subgroup'] = df.loc[i-2, 'subgroup']
# Don't need the change col anymore
df.drop('change', axis=1, inplace=True)
df.sort_index(inplace=True)
# Create duplicate indexes at each subgroup border to ensure the plot is continuous.
df.index -= df['subgroup'].values
fig, ax = plt.subplots(figsize=(15, 4))
for k, g in df.groupby('subgroup'):
    g.plot(ax=ax, y='val', color=g['color'].values[0], marker='.', legend=False, xticks=df.index)
ax.margins(x=0)
# create custom legend is covered in duplicate
custom_lines = [Line2D([0], [0], color=color, lw=4) for color in color_dict.values()]
_ = ax.legend(title='Emotion', handles=custom_lines, labels=emotion_dict.values(), bbox_to_anchor=(1, 1.02), loc='upper left')
# display(df.T)
0 1 2 3 4 5 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 16 16 17 18 19 20 20 21 22 22 23 23 24 25 26 27 28 29 29 30 31 32 33 34 34 35 35 36 36 37 38 38 39 39 40 40 41 41 42 42 43 44 45 46
val 0 0 0 0 0 0 0 1 1 2 2 3 3 2 2 1 1 2 2 1 1 2 2 3 3 2 2 2 2 2 1 1 1 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 2 2 1 1 2 2 2 1 1 2 2 3 3 4 4 0 0 0 0 0 0
emotion Angry Angry Angry Angry Angry Angry Angry Fear Fear Happy Happy Neutral Neutral Happy Happy Fear Fear Happy Happy Fear Fear Happy Happy Neutral Neutral Happy Happy Happy Happy Happy Fear Fear Fear Happy Happy Neutral Neutral Neutral Neutral Neutral Neutral Neutral Sad Sad Sad Sad Sad Sad Happy Happy Fear Fear Happy Happy Happy Fear Fear Happy Happy Neutral Neutral Sad Sad Angry Angry Angry Angry Angry Angry
color k k k k k k k r r b b y y b b r r b b r r b b y y b b b b b r r r b b y y y y y y y g g g g g g b b r r b b b r r b b y y g g k k k k k k
subgroup 0 0 0 0 0 0 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 10 10 10 11 11 11 12 12 13 13 13 13 13 13 13 14 14 14 14 14 14 15 15 16 16 17 17 17 18 18 19 19 20 20 21 21 22 22 22 22 22
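As an alternative to duplicating rows at each group border, the line can be split into one segment per pair of consecutive points and drawn with matplotlib's LineCollection. This is a different technique from the one in the answer above, shown as a minimal sketch: the rule that each segment takes the color of the point it starts from is an assumption (it matches how the border rows are colored above), and the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection
from matplotlib.lines import Line2D

random_emotions = np.array([0,0,0,0,0,0,0,1,2,3,2,1,2,1,
                            2,3,2,2,2,2,1,1,2,3,3,3,3,3,3,4,
                            4,4,4,4,2,1,2,2,1,2,3,4,0,0,0,0,0])
x = np.arange(len(random_emotions))

emotion_names = {0: 'Angry', 1: 'Fear', 2: 'Happy', 3: 'Neutral', 4: 'Sad'}
emotion_colors = {0: 'black', 1: 'red', 2: 'blue', 3: 'yellow', 4: 'green'}

# One segment per pair of consecutive points: shape (n - 1, 2, 2).
points = np.column_stack([x, random_emotions]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

# Assumption: a segment is colored by the emotion at its starting point.
segment_colors = [emotion_colors[int(v)] for v in random_emotions[:-1]]

fig, ax = plt.subplots(figsize=(15, 4))
ax.add_collection(LineCollection(segments, colors=segment_colors, linewidths=4))
ax.autoscale_view()  # apply the data limits registered by add_collection

# Build the legend by hand, since the collection is a single artist.
handles = [Line2D([0], [0], color=emotion_colors[k], lw=4) for k in emotion_names]
ax.legend(handles=handles, labels=list(emotion_names.values()), title='Emotion',
          bbox_to_anchor=(1, 1.02), loc='upper left')
ax.set_title("Emotion Report")
# In an interactive session, call plt.show() here.
```

Because every segment is colored individually, no color is drawn over stretches that belong to another value, which was the problem with the masked-array attempt.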

Plot "stacked" density distributions of variables, categorized by 0 or 1, in Python

I have the following dataset:
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 6)), columns = ['Var_1', 'Var_2', 'Var_3', 'Var_4', 'Var_5', 'Var_6'])
df['Status'] = np.random.randint(0, 2, size=(100, 1))
df
Out[1]:
Var_1 Var_2 Var_3 Var_4 Var_5 Var_6 Status
0 32 65 48 83 60 21 1
1 44 49 65 84 52 34 1
2 9 2 3 14 82 80 1
3 66 90 97 60 28 12 0
4 28 95 64 53 39 30 1
.. ... ... ... ... ... ... ...
95 22 4 43 9 79 46 1
96 10 26 91 59 99 93 0
97 10 31 33 15 99 25 1
98 41 48 80 65 58 18 1
99 39 42 22 56 91 40 1
[100 rows x 7 columns]
How can I create a "stacked" density distribution plot of each variable, categorized by Status (0 or 1)? I would like the plot to look like this:
This plot was created in R. The plot in Python does not have to look exactly the same. What code could I use to accomplish this? Thank you.
Here is an adaption of seaborn's ridgeplot example for the given structure. Here multiple='stack' is selected in sns.kdeplot (the default is multiple='layer' plotting them both starting from y=0). Note that common_norm defaults to True, which scales down both curves in proportion to the number of samples.
As seaborn works with data in "long form", pd.melt() transforms the given dataframe. The long form looks like:
Status variable value
0 0 Var 1 -0.961877
1 1 Var 1 6.454942
2 0 Var 1 6.020015
3 0 Var 1 7.094057
4 0 Var 1 10.289022
... ... ...
2995 0 Var 6 -5.718156
2996 0 Var 6 -5.142314
2997 0 Var 6 -5.155104
2998 1 Var 6 3.339401
2999 1 Var 6 7.912669
Here is a full code example:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# Create the data
rs = np.random.RandomState(1979)
data = rs.randn(30, 100).cumsum(axis=1).reshape(-1, 6)
column_names = [f'Var {i}' for i in range(1, 7)]
df = pd.DataFrame(data, columns=column_names)
df['Status'] = rs.randint(0, 2, len(df))
for col in column_names:
    df.loc[df['Status'] == 1, col] += 5
df_long = df.melt(id_vars='Status', value_vars=column_names)
# Initialize the FacetGrid object
g = sns.FacetGrid(data=df_long, row="variable", aspect=6, height=1.8)
# Draw the densities
g.map_dataframe(sns.kdeplot, "value",
                bw_adjust=.5, clip_on=False, fill=True, alpha=1, linewidth=1.5,
                hue="Status", hue_order=[0, 1], palette=['tomato', 'turquoise'], multiple='stack')
g.map(plt.axhline, y=0, lw=2, clip_on=False, color='black')
# Define and use a simple function to label the plot in axes coordinates
def label(x, color):
    ax = plt.gca()
    ax.text(0, .2, x.iloc[0], fontweight="bold", color='black',
            ha="left", va="center", transform=ax.transAxes)
g.map(label, "variable")
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25)
# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[], xlabel="")
g.despine(bottom=True, left=True)
plt.show()
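The example above generates its own data with columns named 'Var 1' to 'Var 6'. To apply the same reshaping step to a dataframe shaped like the one in the question (columns Var_1 to Var_6 plus Status), a minimal sketch could look like this; the seed and the use of NumPy's default_rng are assumptions made only for reproducibility.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
value_cols = [f"Var_{i}" for i in range(1, 7)]

# Same structure as the question's dataframe: six integer columns plus Status.
df = pd.DataFrame(rng.integers(0, 100, size=(100, 6)), columns=value_cols)
df["Status"] = rng.integers(0, 2, size=100)

# Wide -> long: one row per (Status, variable, value) triple.
df_long = df.melt(id_vars="Status", value_vars=value_cols)

print(df_long.shape)             # (600, 3)
print(df_long.columns.tolist())  # ['Status', 'variable', 'value']
```

The resulting df_long can then be passed to sns.FacetGrid with row="variable" exactly as in the full example.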

Multiple multi-line plots group wise in Python

I have a pandas dataframe like this -
(Creating a random dataframe)
import random
from random import randint
import pandas as pd
x = [randint(1,20) for i in range(20)]
# use random.random(): "import random" after "from random import random" rebinds the name to the module
y1 = [random.random() for i in range(20)]
y2 = [random.random() for i in range(20)]
y3 = [random.random() for i in range(20)]
y4 = [random.random() for i in range(20)]
g = ['a', 'b', 'c']
group = [random.choice(g) for i in range(20)]
data = {'Group': group, 'x': x, 'y1':y1, 'y2':y2, 'y3':y3, 'y4':y4}
df = pd.DataFrame(data)
df.sort_values('Group')
The dataframe is like this -
>>> df.sort_values('Group')
Group x y1 y2 y3 y4
17 a 9 0.400730 0.242629 0.858307 0.799613
16 a 14 0.644299 0.952255 0.257262 0.376845
5 a 3 0.784374 0.800639 0.753612 0.441645
18 a 3 0.988016 0.739003 0.741000 0.299011
11 a 18 0.672816 0.232951 0.763451 0.762478
0 b 7 0.670889 0.785928 0.604563 0.620951
15 b 3 0.838479 0.286988 0.374546 0.013822
4 b 4 0.495855 0.159839 0.984262 0.882428
13 b 3 0.756058 0.979226 0.423426 0.297381
8 b 13 0.835705 0.374927 0.492676 0.939113
12 b 17 0.643511 0.156267 0.248037 0.316526
14 c 13 0.303215 0.177303 0.980071 0.705428
9 c 16 0.829414 0.173755 0.992532 0.398509
7 c 9 0.774353 0.082118 0.089582 0.587679
6 c 14 0.551595 0.737882 0.127206 0.985017
3 c 4 0.072765 0.497016 0.634819 0.149798
2 c 1 0.971598 0.254215 0.325086 0.588159
1 c 14 0.467277 0.631844 0.927199 0.051251
10 c 13 0.346592 0.384929 0.185384 0.330408
19 c 16 0.790785 0.449498 0.176042 0.036896
Using this dataframe, I intend to plot multiple graphs group-wise (in this case 3 graphs, as there are only 3 groups). Each graph is a multi-line graph with x on the x-axis and [y1, y2, y3, y4] on the y-axis.
How can I achieve this? I can plot a single multi-line graph, but I am unable to plot multiple plots group-wise.
You can use groupby:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 3, figsize=(10,3))
for (grp, data), ax in zip(df.groupby('Group'), axes.flat):
    data.plot(x='x', ax=ax)
plt.show()
Output:
Note: You don't really need to sort by group.
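Since x is random and unsorted in the question's data, the lines drawn by the loop above will double back on themselves. A minimal self-contained sketch of the same groupby approach that sorts by x inside each group and titles each subplot follows; the deterministic group assignment, the seed, the "Group ..." titles, and the Agg backend are assumptions added so the example is reproducible and runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed
n = 20
df = pd.DataFrame({
    # Cycle a, b, c so that every group is guaranteed to be present.
    "Group": [("a", "b", "c")[i % 3] for i in range(n)],
    "x": rng.integers(1, 21, size=n),
    **{f"y{k}": rng.random(n) for k in range(1, 5)},
})

groups = df.groupby("Group")
fig, axes = plt.subplots(1, groups.ngroups, figsize=(12, 3),
                         sharey=True, squeeze=False)

for (name, data), ax in zip(groups, axes.flat):
    # Sort by x within the group so each line runs left to right.
    data.sort_values("x").plot(x="x", y=["y1", "y2", "y3", "y4"],
                               ax=ax, title=f"Group {name}")

fig.tight_layout()
# In an interactive session, call plt.show() here.
```

Using groups.ngroups instead of a hard-coded 3 keeps the grid in step with the data if the number of groups changes.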

How to control width of graph line in matplotlib?

I am trying to plot line graphs in matplotlib with the following data. The x, y points belonging to the same id form one line, so there are 3 lines in the df below.
id x y
0 1 0.50 0.0
1 1 1.00 0.3
2 1 1.50 0.5
4 1 2.00 0.7
5 2 0.20 0.0
6 2 1.00 0.8
7 2 1.50 1.0
8 2 2.00 1.2
9 2 3.50 2.0
10 3 0.10 0.0
11 3 1.10 0.5
12 3 3.55 2.2
It can be simply plotted with following code:
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib notebook
fig, ax = plt.subplots(figsize=(12,8))
cmap = plt.get_cmap("viridis")
groups = df.groupby("id")
ngroups = len(groups)
for i1, (key, grp) in enumerate(groups):
    grp.plot(linestyle="solid", x = "x", y = "y", ax = ax, label = key)
plt.show()
However, I have another data frame, df2, that gives the weight of each id, and I am hoping to find a way to control the thickness of each line according to its weight: the larger the weight, the thicker the line. How can I do this? Also, what relation should hold between the weight and the width of the line?
id weight
0 1 5
1 2 15
2 3 2
Please let me know if anything is unclear.
Based on the comments, you need to know a few things:
How to set the line width?
That's simple: linewidth=number. See https://matplotlib.org/examples/pylab_examples/set_and_get.html
How to take the weight and make it a significant width?
This depends on the range of your weight. If it's consistently between 2 and 15, I'd recommend simply dividing it by 2, i.e.:
linewidth=weight/2
If you find this aesthetically unpleasing, divide by a bigger number, though that would obviously reduce the number of linewidths you get.
How to get the weight out of df2?
Given the df2 you described and the code you showed, key is the id of df2. So you want:
df2.loc[df2['id'] == key, 'weight'].iloc[0]
Putting it all together:
Replace your grp.plot line with the following:
grp.plot(linestyle="solid",
         linewidth=df2.loc[df2['id'] == key, 'weight'].iloc[0] / 2.0,  # .iloc[0] extracts a scalar
         x = "x", y = "y", ax = ax, label = key)
(This is just your original line with the linewidth argument added.)
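On the question of which relation to use between weight and width: dividing by a constant works for the weights shown, but a linear rescaling onto a fixed range of widths stays readable whatever the weight range is. The following is a minimal sketch of that idea, not the only reasonable mapping; the 1.0 to 6.0 point range, the legend text, and the Agg backend are assumptions, and the data is copied from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
    "x": [0.5, 1.0, 1.5, 2.0, 0.2, 1.0, 1.5, 2.0, 3.5, 0.1, 1.1, 3.55],
    "y": [0.0, 0.3, 0.5, 0.7, 0.0, 0.8, 1.0, 1.2, 2.0, 0.0, 0.5, 2.2],
})
df2 = pd.DataFrame({"id": [1, 2, 3], "weight": [5, 15, 2]})

# Index the weights by id so each lookup returns a scalar.
weights = df2.set_index("id")["weight"]

# Hypothetical width range in points; pick whatever looks good.
min_lw, max_lw = 1.0, 6.0
widths = pd.Series(
    np.interp(weights, [weights.min(), weights.max()], [min_lw, max_lw]),
    index=weights.index,
)

fig, ax = plt.subplots(figsize=(12, 8))
for key, grp in df.groupby("id"):
    grp.plot(x="x", y="y", ax=ax, linestyle="solid",
             linewidth=float(widths[key]),
             label=f"id {key} (weight {weights[key]})")
# In an interactive session, call plt.show() here.
```

With these weights, the lightest line (weight 2) is drawn at 1.0 pt and the heaviest (weight 15) at 6.0 pt, with weight 5 falling proportionally in between.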

How to create a Gantt plot

How can I plot a graph from the following data with matplotlib? The goal is to visualize the span from column 2 to column 3, so that the result looks like a Gantt chart.
0 0 0.016 19.833
1 0 19.834 52.805
2 0 52.806 84.005
5 0 84.012 107.305
8 0 107.315 128.998
10 0 129.005 138.956
11 0 138.961 145.587
13 0 145.594 163.863
15 0 163.872 192.118
16 0 192.127 193.787
17 0 193.796 197.106
20 0 236.099 246.223
25 1 31.096 56.180
27 1 58.097 64.857
28 1 64.858 66.494
29 1 66.496 89.908
31 1 89.918 111.606
34 1 129.007 137.371
35 1 137.372 145.727
39 1 176.097 209.461
42 1 209.476 226.207
44 1 226.217 259.317
46 1 259.329 282.488
47 1 282.493 298.905
I need 2 colors, one for each value in column 1. Column 0 gives the y-axis position, and columns 2 and 3 give the extent along the x-axis. One line should be plotted per row: column 2 is the start time and column 3 is the stop time.
If I have understood you correctly, you want to plot a horizontal line between the x-values in columns 2 and 3, at the y-value given in column 0. To plot a horizontal line at a given y-value between two x-values, you could use hlines. I believe the code below is a possible solution.
import numpy as np
import matplotlib.pyplot as plt
# Read data from file into variables
y, c, x1, x2 = np.loadtxt('data.txt', unpack=True)
# Map value to color
color_mapper = np.vectorize(lambda x: {0: 'red', 1: 'blue'}.get(x))
# Plot a line for every line of data in your file
plt.hlines(y, x1, x2, colors=color_mapper(c))
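The snippet above reads from a data.txt file that is not shown. A minimal self-contained sketch of the same hlines approach, with a few rows from the question inlined in place of the file, could look like this; the choice of rows, the palette dictionary, the line width, the axis labels, and the Agg backend are assumptions for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# A few rows from the question, inlined as a stand-in for data.txt.
sample = io.StringIO("""\
0 0 0.016 19.833
1 0 19.834 52.805
2 0 52.806 84.005
25 1 31.096 56.180
27 1 58.097 64.857
28 1 64.858 66.494
""")
row_id, category, start, stop = np.loadtxt(sample, unpack=True)

# Map the category in column 1 to a color with a plain dictionary lookup.
palette = {0: "red", 1: "blue"}
colors = [palette[int(c)] for c in category]

fig, ax = plt.subplots(figsize=(10, 4))
# One horizontal bar per row, from start time to stop time.
ax.hlines(row_id, start, stop, colors=colors, linewidth=6)
ax.set_xlabel("time")
ax.set_ylabel("row id (column 0)")
# In an interactive session, call plt.show() here.
```

A thicker linewidth makes the lines read as bars, which is closer to the usual Gantt look.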
You can read the text file using numpy.loadtxt and then plot it using matplotlib. For example:
import numpy as np
import matplotlib.pyplot as plt
x, y = np.loadtxt('file.txt', usecols=(2,3), unpack=True)
plt.plot(x,y)
You should see the matplotlib documentation for more options.
