I have a dataset of four years' worth of ACT participation percentages by state entitled 'part_ACT'. Here's a snippet of it:
Index State ACT17 ACT18 ACT19 ACT20
0 Alabama 100 100 100 100
1 Alaska 65 33 38 33
2 Arizona 62 66 73 71
3 Arkansas 100 100 100 100
4 California 31 27 23 19
5 Colorado 100 30 27 25
6 Connecticut 31 26 22 19
I'm trying to produce a line graph with each of the four column headings on the x-axis and their values on the y-axis (1-100). I would prefer to display all of these lines in a single figure.
What's the easiest way to do this? I'm fine with Pandas, Matplotlib, Seaborn, or whatever. Thanks much!
One solution is to melt the df from wide to long form and plot with hue='State':
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame({
'State': ['A', 'B', 'C', 'D'],
'x18': sorted(np.random.randint(0, 100, 4)),
'x19': sorted(np.random.randint(0, 100, 4)),
'x20': sorted(np.random.randint(0, 100, 4)),
'x21': sorted(np.random.randint(0, 100, 4)),
})
df_melt = df.melt(id_vars='State', var_name='year')
sns.relplot(
kind='line',
data=df_melt,
x='year', y='value',
hue='State'
)
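For reference, here is a minimal sketch of the same melt applied to a few rows copied from the question's snippet, rather than to random data. The names long_df, 'year' and 'participation' are my own choices for illustration and are not part of the original dataset.

```python
import pandas as pd

# Three rows copied from the question's snippet of part_ACT
part_ACT = pd.DataFrame({
    'State': ['Alabama', 'Alaska', 'Arizona'],
    'ACT17': [100, 65, 62],
    'ACT18': [100, 33, 66],
    'ACT19': [100, 38, 73],
    'ACT20': [100, 33, 71],
})

# Wide -> long: one row per (State, year) pair.
# 'year' and 'participation' are column names chosen for this sketch.
long_df = part_ACT.melt(id_vars='State', var_name='year',
                        value_name='participation')
print(long_df.head())
```

The long frame can then be passed to sns.relplot(kind='line', data=long_df, x='year', y='participation', hue='State') in the same way as above.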
Creating a plot is all about the shape of the DataFrame.
One way to accomplish this is by converting the DataFrame from wide to long, with melt, but this isn't necessary.
The primary requirement is to set 'State' as the index.
Plots can be generated directly with df, or df.T (.T is the transpose of the DataFrame).
The OP requests a line plot, but the years are discrete categories, so a bar plot is usually the clearer way to visualize this data.
Tested with pandas v1.2.3, seaborn v0.11.1, and matplotlib v3.3.4.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = {'State': ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut'],
'ACT17': [100, 65, 62, 100, 31, 100, 31],
'ACT18': [100, 33, 66, 100, 27, 30, 26],
'ACT19': [100, 38, 73, 100, 23, 27, 22],
'ACT20': [100, 33, 71, 100, 19, 25, 19]}
df = pd.DataFrame(data)
# set State as the index - this is important
df.set_index('State', inplace=True)
# display(df)
ACT17 ACT18 ACT19 ACT20
State
Alabama 100 100 100 100
Alaska 65 33 38 33
Arizona 62 66 73 71
Arkansas 100 100 100 100
California 31 27 23 19
Colorado 100 30 27 25
Connecticut 31 26 22 19
# display(df.T)
State Alabama Alaska Arizona Arkansas California Colorado Connecticut
ACT17 100 65 62 100 31 100 31
ACT18 100 33 66 100 27 30 26
ACT19 100 38 73 100 23 27 22
ACT20 100 33 71 100 19 25 19
Plot 1
Use pandas.DataFrame.plot
df.T.plot()
plt.legend(title='State', bbox_to_anchor=(1.05, 1), loc='upper left')
# get rid of the ticks between the labels - not necessary
plt.xticks(ticks=range(0, len(df.T)))
plt.show()
Plot 2 & 3
Use pandas.DataFrame.plot with kind='bar' or kind='barh'
The bar plot is much better at conveying the yearly changes in the data, and allows for an easy comparison between states.
df.plot(kind='bar')
plt.legend(title='Year', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
kind='bar'
kind='barh'
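Only the kind='bar' call is shown above, so here is a minimal sketch of the horizontal variant. It assumes the same df with 'State' as the index (rebuilt here from three rows of the question's data); the axis label text is my own addition.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Three rows from the question, with State as the index
df = pd.DataFrame({
    'State': ['Alabama', 'Alaska', 'Arizona'],
    'ACT17': [100, 65, 62],
    'ACT18': [100, 33, 66],
    'ACT19': [100, 38, 73],
    'ACT20': [100, 33, 71],
}).set_index('State')

# kind='barh' puts the states on the y-axis and one bar per year next to each
ax = df.plot(kind='barh')
ax.legend(title='Year', bbox_to_anchor=(1.05, 1), loc='upper left')
ax.set_xlabel('Participation (%)')  # label text chosen for this sketch
plt.tight_layout()
```

Call plt.show() afterwards to display the figure interactively.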
Plot 4
Use seaborn.lineplot
seaborn.lineplot accepts a wide dataframe directly: the index labels go on the x-axis and each column becomes its own line.
sns.lineplot(data=df.T)
plt.legend(title='State', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
I have some data for conditions that go together by pairs, structured like this:
mydata = {
"WT_before": [11,12,13],
"WT_after": [16,17,18],
"MRE11_before": [21,22,23,24,25],
"MRE11_after": [26,27,28,29,30],
"NBS1_before": [31,32,33,34],
"NBS1_after": [36,37,38,39]
}
(my real data has more conditions and more values per condition, this is just an example)
I looked into colouring the boxplots by pairs to help reading the figure, but it seemed quite convoluted to do in matplotlib.
For the moment I'm doing it this way:
bxplt_labels, bxplt_data = mydata.keys(), mydata.values()
bxplt_colors = ["pink", "pink", "lightgreen", "lightgreen", "lightblue", "lightblue"]
fig2, ax = plt.subplots(figsize=(20, 10), dpi=500)
bplot = plt.boxplot(bxplt_data, vert=False, showfliers=False, notch=False, patch_artist=True,)
for patch, color in zip(bplot['boxes'], bxplt_colors):
patch.set_facecolor(color)
plt.yticks(range(1, len(bxplt_labels) + 1), bxplt_labels)
fig2.show()
which produces the figure:
I would like:
to sort the condition names, so that I can order them to my choosing, and
to get a more elegant way of choosing the colours used, in particular because I will need to reuse this data for more figures afterwards (like scatterplot before/after for each condition)
If needed, I can rearrange the data structure, but the conditions don't all have the same number of values, so a dictionary seemed like the best option to me. Alternatively, I can use seaborn, which I saw offers quite a few possibilities, but I'm not familiar with it, so I would need more time to understand it.
Could you help me figure this out?
Seaborn works easiest with a dataframe in "long form". In this case, there would be rows with the condition repeated for every value with that condition.
Seaborn's boxplot accepts an order= keyword, where you can change the order of the x-values. E.g. order=sorted(mydata.keys()) to sort the values alphabetically. Or list(mydata.keys())[::-1] to use the original order, but reversed. The default order would be how the values appear in the dataframe.
For a horizontal boxplot, you can use x='value', y='condition'. The order will apply to either x or y, depending on which column contains strings.
For coloring, you can use the palette= keyword. This can be either a string naming one of matplotlib's or seaborn's colormaps, or a list of colors. Many more options are possible.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
mydata = {
"WT_before": [11, 12, 13],
"WT_after": [16, 17, 18],
"MRE11_before": [21, 22, 23, 24, 25],
"MRE11_after": [26, 27, 28, 29, 30],
"NBS1_before": [31, 32, 33, 34],
"NBS1_after": [36, 37, 38, 39]
}
df = pd.DataFrame([[k, val] for k, vals in mydata.items() for val in vals],
columns=['condition', 'value'])
fig, ax = plt.subplots(figsize=(12, 5))
sns.boxplot(data=df, x='condition', y='value',
order=['WT_before', 'WT_after', 'MRE11_before', 'MRE11_after', 'NBS1_before', 'NBS1_after'],
palette='turbo', ax=ax)
plt.tight_layout()
plt.show()
Here is an example with horizontal boxes:
sns.boxplot(data=df, x='value', y='condition', palette='Paired')
sns.despine()
plt.xlabel('')
plt.ylabel('')
plt.tight_layout()
plt.show()
The dataframe would look like:
       condition  value
0      WT_before     11
1      WT_before     12
2      WT_before     13
3       WT_after     16
4       WT_after     17
5       WT_after     18
6   MRE11_before     21
7   MRE11_before     22
8   MRE11_before     23
9   MRE11_before     24
10  MRE11_before     25
11   MRE11_after     26
12   MRE11_after     27
13   MRE11_after     28
14   MRE11_after     29
15   MRE11_after     30
16   NBS1_before     31
17   NBS1_before     32
18   NBS1_before     33
19   NBS1_before     34
20    NBS1_after     36
21    NBS1_after     37
22    NBS1_after     38
23    NBS1_after     39
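To give each before/after pair one shared colour, as in the question's hand-built colour list, one possible approach is to derive a grouping column from the condition name and pass a dictionary as the palette. This is only a sketch: the 'group' column and the palette dictionary are helpers I introduce here, and it assumes every condition name ends in _before or _after.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

mydata = {
    "WT_before": [11, 12, 13],
    "WT_after": [16, 17, 18],
    "MRE11_before": [21, 22, 23, 24, 25],
    "MRE11_after": [26, 27, 28, 29, 30],
    "NBS1_before": [31, 32, 33, 34],
    "NBS1_after": [36, 37, 38, 39],
}
df = pd.DataFrame([[k, val] for k, vals in mydata.items() for val in vals],
                  columns=['condition', 'value'])

# Helper column for this sketch: everything before the last underscore
df['group'] = df['condition'].str.rsplit('_', n=1).str[0]

# One colour per group, reusing the colour names from the question
palette = {'WT': 'pink', 'MRE11': 'lightgreen', 'NBS1': 'lightblue'}
order = ['WT_before', 'WT_after', 'MRE11_before', 'MRE11_after',
         'NBS1_before', 'NBS1_after']

fig, ax = plt.subplots(figsize=(8, 5))
# hue picks the colour; dodge=False keeps one full-width box per condition
sns.boxplot(data=df, x='value', y='condition', order=order,
            hue='group', palette=palette, dodge=False, ax=ax)
plt.tight_layout()
```

The same palette dictionary can be reused for later figures (for example a scatterplot with hue='group'), which keeps the colours consistent across plots.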
I'm trying to get familiar with seaborn. I want to create one or both of these figures (bar plot and line plot). They show the 12 months on the x-axis and 3 years, each with its own line or bar color.
Here is the script that creates the data, with the data itself shown in comments.
#!/usr/bin/env python3
import random as rd
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
rd.seed(0)
a = pd.DataFrame({
'Y': [2016]*12 + [2017]*12 + [2018]*12,
'M': list(range(1, 13)) * 3,
'n': rd.choices(range(100), k=36)
})
print(a)
# Y M n
# 0 2016 1 84
# 1 2016 2 75
# 2 2016 3 42
# ...
# 21 2017 10 72
# 22 2017 11 89
# 23 2017 12 68
# 24 2018 1 47
# 25 2018 2 10
# ...
# 34 2018 11 54
# 35 2018 12 1
b = a.pivot_table(columns='M', index='Y')
print(b)
# n
# M 1 2 3 4 5 6 7 8 9 10 11 12
# Y
# 2016 84 75 42 25 51 40 78 30 47 58 90 50
# 2017 28 75 61 25 90 98 81 90 31 72 89 68
# 2018 47 10 43 61 91 96 47 86 26 80 54 1
I'm not even sure which form of the dataframe (a, b, or something else) I should use here.
What I tried
I assume that in seaborn terms it is a countplot() I want. Maybe I am wrong?
>>> sns.countplot(data=a)
<AxesSubplot:ylabel='count'>
>>> plt.show()
The result makes no sense.
I don't know how I could add the pivoted dataframe b to seaborn.
You could do the first plot with a relplot, using hue as a categorical grouping variable:
sns.relplot(data=a, x='M', y='n', hue='Y', kind='line')
I'd use these colour and size settings to make it more similar to the plot you wanted:
sns.relplot(data=a, x='M', y='n', hue='Y', kind='line', palette='pastel', height=3, aspect=3)
The equivalent axes-level code would be sns.lineplot(data=a, x='M', y='n', hue='Y', palette='pastel')
Your second plot can be done with catplot:
sns.catplot(kind='bar', data=a, x='M', y='n', hue='Y')
Or the axes-level function sns.barplot. In that case let's move the default legend location:
sns.barplot(data=a, x='M', y='n', hue='Y')
plt.legend(bbox_to_anchor=(1.05, 1))
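On the question of whether form a or b is the right one: the seaborn calls above use the long form a. If you would rather plot from a pivoted (wide) frame, pandas' own plotting is one option. A minimal sketch, assuming the same a as in the question; note that wide puts the months in the index and the years in the columns, which is the transpose of the question's b.

```python
import random as rd
import pandas as pd

rd.seed(0)
a = pd.DataFrame({
    'Y': [2016]*12 + [2017]*12 + [2018]*12,
    'M': list(range(1, 13)) * 3,
    'n': rd.choices(range(100), k=36)
})

# Months as the index (x-axis), one column per year (one line / bar colour each)
wide = a.pivot(index='M', columns='Y', values='n')

ax_line = wide.plot(kind='line')  # one line per year
ax_bar = wide.plot(kind='bar')    # grouped bars, one colour per year
```

Each call creates its own figure; matplotlib's plt.show() displays them.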
I have a df with different groups and two predictions for them (iqr, median).
cntx_iqr pred_iqr cntx_median pred_median
18-54 83 K18-54 72
R18-54 34 R18-54 48
25-54 33 18-34 47
K18-54 29 18-54 47
18-34 27 R25-54 29
K18-34 25 25-54 23
K25-54 24 K25-54 14
R18-34 22 R18-34 8
R25-54 17 K18-34 6
Now I want to plot them using seaborn, so I have melted the data for the plots. However, the result does not look right to me.
pd.melt(df, id_vars=['cntx_iqr', 'cntx_median'], value_name='category', var_name="kind")
I am aiming to compare the predictions (pred_iqr, pred_median) for those two groups (cntx_iqr, cntx_median), perhaps with a stacked bar plot or some other useful plot, to see how each group differs between the two predictions.
Any help or suggestions would be appreciated.
Thanks in advance.
Not sure how you obtained the data frame, but you need to match the values first:
df = df[['cntx_iqr','pred_iqr']].merge(df[['cntx_median','pred_median']],
left_on="cntx_iqr",right_on="cntx_median")
df.head()
cntx_iqr pred_iqr cntx_median pred_median
0 18-54 83 18-54 47
1 R18-54 34 R18-54 48
2 25-54 33 25-54 23
3 K18-54 29 K18-54 72
4 18-34 27 18-34 47
Once you have this, you can just make a scatterplot:
sns.scatterplot(x = 'pred_iqr',y = 'pred_median',data=df)
The barplot requires a bit of pivoting, but should be:
sns.barplot(x = 'cntx_iqr', y = 'value', hue='variable',
data = df.melt(id_vars='cntx_iqr',value_vars=['pred_iqr','pred_median']))
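The matching and reshaping steps can be checked on the table from the question without drawing anything. This is a small sketch; merged and long_df are names I chose for illustration.

```python
import pandas as pd

# Data copied from the question: the two context columns are in different orders
df = pd.DataFrame({
    'cntx_iqr': ['18-54', 'R18-54', '25-54', 'K18-54', '18-34',
                 'K18-34', 'K25-54', 'R18-34', 'R25-54'],
    'pred_iqr': [83, 34, 33, 29, 27, 25, 24, 22, 17],
    'cntx_median': ['K18-54', 'R18-54', '18-34', '18-54', 'R25-54',
                    '25-54', 'K25-54', 'R18-34', 'K18-34'],
    'pred_median': [72, 48, 47, 47, 29, 23, 14, 8, 6],
})

# Align the two predictions on the group label
merged = df[['cntx_iqr', 'pred_iqr']].merge(
    df[['cntx_median', 'pred_median']],
    left_on='cntx_iqr', right_on='cntx_median')

# After the merge, every row describes a single group
print((merged['cntx_iqr'] == merged['cntx_median']).all())

# Long form for the barplot: default column names are 'variable' and 'value'
long_df = merged.melt(id_vars='cntx_iqr',
                      value_vars=['pred_iqr', 'pred_median'])
print(long_df.head())
```

long_df has one row per group and prediction type, which is the shape sns.barplot expects for x='cntx_iqr', y='value', hue='variable'.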
I have two dataframes, one for females and one for males, and I plot each of them.
I want to merge them into one graph, with a different color for each
(since they have the same features).
Here is the code:
female[feature].plot(kind='bar')
male[feature].plot(kind = "bar")
feature is the column name of the data frame.
The data frame looks like this:
X1 X2 X3 ..... X46
male 100 65 75 ..... 150
female 500 75 30 ..... 350
I think you can use DataFrame.plot.bar after transposing the DataFrame with T:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
'X2': {'female': 75, 'male': 65},
'X46': {'female': 350, 'male': 150},
'X1': {'female': 500, 'male': 100},
'X3': {'female': 30, 'male': 75}})
print (df)
X1 X2 X3 X46
female 500 75 30 350
male 100 65 75 150
df.T.plot.bar()
plt.show()
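If the female and male values really do live in two separate objects, as in the question, they can be combined first. A minimal sketch, assuming each one is a Series indexed by feature name (that shape is my assumption, the question does not show it); the numbers are taken from the question's table.

```python
import pandas as pd

features = ['X1', 'X2', 'X3', 'X46']
# Assumed shape: one Series per group, indexed by feature name
female = pd.Series([500, 75, 30, 350], index=features)
male = pd.Series([100, 65, 75, 150], index=features)

# Side by side: features in the index, one column per group
combined = pd.concat([female, male], axis=1, keys=['female', 'male'])

# One group of bars per feature, one colour per column
ax = combined.plot.bar()
```

Here combined has the same layout as df.T above, so no transpose is needed before plotting.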
I have a dataframe like this.
column1 column2 column3
MyIndexes
7 22 90 98
8 50 06 56
23 60 58 44
49 30 62 00
I am using df.plot to plot a line chart. The problem is that using df.plot() treats the index as categorical data and plots graph for each of them (7, 8, 23 and 49). However I want these to be treated as numeric values and have a graph with even xticks and then plot these points into the graph. How will I be able to do that?
When I construct the dataframe as such:
df = pd.DataFrame([[22, 90, 98],
                   [50, 6, 56],
                   [60, 58, 44],
                   [30, 62, 0]],
                  index=pd.Index([7, 8, 23, 49], name='MyIndexes'),
                  columns=['column1', 'column2', 'column3'])
print(df)
           column1  column2  column3
MyIndexes
7               22       90       98
8               50        6       56
23              60       58       44
49              30       62        0
then plot:
df.plot()
I suspect your index is not what you think it is.
To force your index to be integers do:
df.index = df.index.astype(int)
df.plot()
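To check whether the index is the culprit, inspect its dtype before and after the conversion. The sketch below assumes, purely for illustration, that the index was read in as strings; that is one common way to end up with a non-numeric index.

```python
import pandas as pd

# Hypothetical starting point: index labels stored as strings
df = pd.DataFrame(
    [[22, 90, 98], [50, 6, 56], [60, 58, 44], [30, 62, 0]],
    index=pd.Index(['7', '8', '23', '49'], name='MyIndexes'),
    columns=['column1', 'column2', 'column3'],
)

# A non-numeric index is plotted as evenly spaced categories
was_numeric = pd.api.types.is_numeric_dtype(df.index)
print(df.index.dtype, was_numeric)

# Convert the labels to integers so the x-axis is a true numeric scale
df.index = df.index.astype(int)
print(df.index.dtype, pd.api.types.is_numeric_dtype(df.index))
```

After the conversion, df.plot() places the points at x = 7, 8, 23 and 49 instead of spacing them evenly.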