I have a pandas DataFrame like this (created with random data):
# import only the module: "from random import random" followed by "import random"
# rebinds the name random to the module, so random() would fail
import random
import pandas as pd
x = [random.randint(1, 20) for i in range(20)]
y1 = [random.random() for i in range(20)]
y2 = [random.random() for i in range(20)]
y3 = [random.random() for i in range(20)]
y4 = [random.random() for i in range(20)]
g = ['a', 'b', 'c']
group = [random.choice(g) for i in range(20)]
data = {'Group': group, 'x': x, 'y1':y1, 'y2':y2, 'y3':y3, 'y4':y4}
df = pd.DataFrame(data)
df.sort_values('Group')
The sorted DataFrame looks like this:
>>> df.sort_values('Group')
Group x y1 y2 y3 y4
17 a 9 0.400730 0.242629 0.858307 0.799613
16 a 14 0.644299 0.952255 0.257262 0.376845
5 a 3 0.784374 0.800639 0.753612 0.441645
18 a 3 0.988016 0.739003 0.741000 0.299011
11 a 18 0.672816 0.232951 0.763451 0.762478
0 b 7 0.670889 0.785928 0.604563 0.620951
15 b 3 0.838479 0.286988 0.374546 0.013822
4 b 4 0.495855 0.159839 0.984262 0.882428
13 b 3 0.756058 0.979226 0.423426 0.297381
8 b 13 0.835705 0.374927 0.492676 0.939113
12 b 17 0.643511 0.156267 0.248037 0.316526
14 c 13 0.303215 0.177303 0.980071 0.705428
9 c 16 0.829414 0.173755 0.992532 0.398509
7 c 9 0.774353 0.082118 0.089582 0.587679
6 c 14 0.551595 0.737882 0.127206 0.985017
3 c 4 0.072765 0.497016 0.634819 0.149798
2 c 1 0.971598 0.254215 0.325086 0.588159
1 c 14 0.467277 0.631844 0.927199 0.051251
10 c 13 0.346592 0.384929 0.185384 0.330408
19 c 16 0.790785 0.449498 0.176042 0.036896
Using this DataFrame, I want to plot one graph per group (three graphs here, since there are only three groups). Each graph is a multi-line graph with x on the x-axis and y1, y2, y3 and y4 on the y-axis.
How can I achieve this? I can plot a single multi-line graph, but I am unable to plot multiple graphs group-wise.
You can use groupby:
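import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 3, figsize=(10, 3))
for (grp, data), ax in zip(df.groupby('Group'), axes.flat):
    # one subplot per group; every remaining numeric column becomes a line
    data.plot(x='x', ax=ax, title=grp)
plt.show()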
Output: [image not preserved: one line-plot subplot per group]
Note: You don't really need to sort by group.
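A self-contained variant of the same idea, as a sketch: it regenerates random data with NumPy's seeded generator (so the values differ from the question's), sizes the subplot grid from the number of groups instead of hard-coding 3, and sorts each group by x so the lines run left to right instead of zig-zagging between unsorted x values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 20
df = pd.DataFrame({
    "Group": rng.choice(["a", "b", "c"], size=n),
    "x": rng.integers(1, 21, size=n),          # 1..20, like randint(1, 20)
    **{f"y{k}": rng.random(n) for k in range(1, 5)},
})

groups = df.groupby("Group")
fig, axes = plt.subplots(1, groups.ngroups, figsize=(12, 3), sharey=True, squeeze=False)
for ax, (name, grp) in zip(axes.flat, groups):
    # sort by x so each line is drawn left to right
    grp.sort_values("x").plot(x="x", y=["y1", "y2", "y3", "y4"], ax=ax, title=f"Group {name}")
fig.tight_layout()
```

`squeeze=False` keeps `axes` two-dimensional even when there is only one group, so `axes.flat` always works.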
I am struggling with this problem.
These are my initial matrices:
import pandas as pd
columnsx = {'X1':[6,11,17,3,12],'X2':[1,2,10,24,18],'X3':[8,14,9,15,7], 'X4':[22,4,20,16,5],'X5':[19,21,13,23,25]}
columnsy = {'y1':[0,1,1,2,0],'y2':[1,0,0,2,1]}
X = pd.DataFrame(columnsx)
y = pd.DataFrame(columnsy)
This is the result I am trying to get. It adds a column X_i to X containing the name of the y column (y1 or y2) whose value is greater than 0, and a single column y holding that value. In other words, only the positive values of y are kept, and each X row is repeated once for every y column in which it has a positive value.
columnsx = {'X1':[11,17,3,6,3,12],'X2':[2,10,24,1,24,18],'X3':[14,9,15,8,15,7],
'X4':[4,20,16,22,16,5],'X5':[21,13,23,19,23,25], 'X_i':['y1','y1','y1','y2','y2','y2']}
columnsy = {'y':[1,1,2,1,2,1]}
X = pd.DataFrame(columnsx)
y = pd.DataFrame(columnsy)
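Use DataFrame.melt after putting X and y side by side:
df = pd.concat([X, y], axis=1)  # X and y are the initial DataFrames from the question
new_df = (df.melt(df.columns[df.columns.str.contains('X')],
                  var_name='X_y', value_name='y')
            .loc[lambda df: df['y'].gt(0)])
print(new_df)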
Output
X1 X2 X3 X4 X5 X_y y
1 11 2 14 4 21 y1 1
2 17 10 9 20 13 y1 1
3 3 24 15 16 23 y1 2
5 6 1 8 22 19 y2 1
8 3 24 15 16 23 y2 2
9 12 18 7 5 25 y2 1
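A sketch of the same reshape that produces the question's column name X_i directly and a clean 0..n-1 index, assuming the initial X and y frames. Passing `value_vars` explicitly avoids relying on the `'X'` substring match to pick the id columns:

```python
import pandas as pd

X = pd.DataFrame({'X1': [6, 11, 17, 3, 12], 'X2': [1, 2, 10, 24, 18],
                  'X3': [8, 14, 9, 15, 7], 'X4': [22, 4, 20, 16, 5],
                  'X5': [19, 21, 13, 23, 25]})
y = pd.DataFrame({'y1': [0, 1, 1, 2, 0], 'y2': [1, 0, 0, 2, 1]})

result = (
    pd.concat([X, y], axis=1)
      # one row per (X row, y column) pair; the y column name goes into X_i
      .melt(id_vars=list(X.columns), value_vars=list(y.columns),
            var_name='X_i', value_name='y')
      .query('y > 0')              # keep only the positive y values
      .reset_index(drop=True)
)
print(result)
```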
My current dataframe looks like this:
                    a    b    c    d    e
in_1     in_2
car      bmw        2    4    5   34   46
         merc      23    4   55   64   21
         range    453   32    2   56   21
         lambo      4    6    2    5   12
         ferrari   12   46   34   23  642
fastfood burger   123   34  213   23  234
         kfc      123   34  235  123   24
         tacoBell 213  432  124   12    1
I am trying to plot one subplot for each value of in_1, with the column names (a, b, c, d, e) on the x-axis and the counts (the numbers in the cells) on the y-axis.
So the first subplot would have the title "car", its x-axis would have 'a', 'b', 'c', 'd', 'e', and its y-axis would show the counts for each of 'bmw', 'merc', 'range', 'lambo', 'ferrari'.
The subplots can be bar or line plots and the values of in_2 can be represented in the form of a legend.
So I guess you could do something like this:
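import numpy as np
import matplotlib.pyplot as plt
ng = 5  # number of groups (columns a..e)
# row values copied from the question's table
bmw = [2, 4, 5, 34, 46]
merc = [23, 4, 55, 64, 21]
# ... range, lambo, ferrari in the same way
fig, ax = plt.subplots()
index = np.arange(ng)
bar_width = 0.2
fbmw = ax.bar(index, bmw, bar_width, color='r', label='BMW')
fmerc = ax.bar(index + bar_width, merc, bar_width, color='b', label='MERC')
# each further series needs a larger offset: index + 2*bar_width, index + 3*bar_width, ...
ax.set_xlabel('Cars')
ax.set_ylabel('Whatever this is')
# bars are centred on their x positions, so the middle of a two-bar group is index + bar_width/2
ax.set_xticks(index + bar_width / 2)
ax.set_xticklabels(('a', 'b', 'c', 'd', 'e'))
ax.legend()
fig.tight_layout()
plt.show()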
Since I don't know what the numbers in columns a, b, c, d, e represent, I used placeholder axis labels. I also assumed you already have the data for bmw, merc, etc., so I didn't show how to load it. Hope this helps!
You can use a simple loop that walks over the in_1 groups and draws each group's rows as bars on its own axis. It creates one subplot row per unique value of in_1.
Note that it assumes df has a MultiIndex with in_1 and in_2 as its levels.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
n = len(df.columns)
rows = df.index.get_level_values('in_1').unique()
index = np.arange(n)
fig, axs = plt.subplots(len(rows), 1, figsize=(15, 10))
width = 0.2
# one color per in_2 entry; 'car' has five entries, so five colors are needed
colors = ["#e08283", "#52b3d9", "#fde3a7", "#3fc380", "#9b59b6"]
for i in range(len(rows)):
    # all rows that belong to this in_1 value (e.g. every car)
    intdf = df[df.index.get_level_values('in_1') == rows[i]]
    offset = 0
    for j in range(len(intdf)):
        v = intdf.iloc[j].values
        axs[i].bar(index + offset, v, width,
                   label=intdf.index.get_level_values('in_2')[j], color=colors[j])
        offset += width  # shift the next bar to the right
    axs[i].set_xlabel(rows[i])
    # centre the tick under each group of bars
    axs[i].set_xticks(index + width * (len(intdf) - 1) / 2)
    axs[i].set_xticklabels(tuple(df.columns))
    axs[i].legend(loc=2)
plt.show()
Here is the output: [image not preserved: one bar-chart subplot per in_1 value]
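A shorter alternative using groupby, as a sketch that assumes df is rebuilt from the question's table with a MultiIndex whose levels are named in_1 and in_2. Dropping the in_1 level and transposing each group turns a..e into the x-axis and the in_2 labels into bar series, so pandas draws the grouped bars and the legend itself:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('car', 'bmw'), ('car', 'merc'), ('car', 'range'), ('car', 'lambo'),
     ('car', 'ferrari'), ('fastfood', 'burger'), ('fastfood', 'kfc'),
     ('fastfood', 'tacoBell')],
    names=['in_1', 'in_2'])
df = pd.DataFrame(
    [[2, 4, 5, 34, 46], [23, 4, 55, 64, 21], [453, 32, 2, 56, 21],
     [4, 6, 2, 5, 12], [12, 46, 34, 23, 642], [123, 34, 213, 23, 234],
     [123, 34, 235, 123, 24], [213, 432, 124, 12, 1]],
    index=index, columns=list('abcde'))

groups = df.groupby(level='in_1', sort=False)
fig, axs = plt.subplots(groups.ngroups, 1, figsize=(8, 6), squeeze=False)
for ax, (name, grp) in zip(axs[:, 0], groups):
    # drop in_1 and transpose: rows become a..e (x-axis),
    # columns become the in_2 labels (one bar series and legend entry each)
    grp.droplevel('in_1').T.plot(kind='bar', ax=ax, title=name, rot=0)
fig.tight_layout()
```

Unlike the manual loop, the number of bars per group does not need a matching color list; pandas cycles through matplotlib's default colors.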
I would like to plot a line between two points whose coordinates are stored in different columns.
#coordinates of the points
#point1(A[0],B[0])
#point2(C[0],D[0])
#line between point1 and point 2
#next line would be
#point3(A[1],B[1])
#point4(C[1],D[1])
#line between point3 and point 4
plot_result:
A B C D E F
0 0 4 7 1 5 1
1 2 5 8 3 3 1
2 3 4 9 5 6 1
3 4 5 4 7 9 4
4 6 5 2 1 2 7
5 1 4 3 0 4 7
I tried this code:
import numpy as np
import matplotlib.pyplot as plt
for i in range(0, len(plot_result.A), 1):
    plt.plot(plot_result.A[i]:plot_result.B[i], plot_result.C[i]:plot_result.D[i], 'ro-')
plt.show()
but it raises a SyntaxError. I have no idea how to implement this.
The first two parameters of plt.plot are x and y, which can be single values or array-like objects. To plot a line from the point (x1, y1) to the point (x2, y2), pass the two x coordinates and the two y coordinates as separate lists:
import matplotlib.pyplot as plt
for row in plot_result.values:  # plot_result is the DataFrame above
    x1 = row[0]  # A[i]
    y1 = row[1]  # B[i]
    x2 = row[2]  # C[i]
    y2 = row[3]  # D[i]
    plt.plot([x1, x2], [y1, y2], 'ro-')  # one line for every row in the DataFrame
plt.show()
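If you would rather avoid the Python loop, plt.plot also accepts 2-D arrays and draws one line per column. A sketch with the question's data (only columns A to D are used):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

plot_result = pd.DataFrame({
    'A': [0, 2, 3, 4, 6, 1], 'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3], 'D': [1, 3, 5, 7, 1, 0],
})

fig, ax = plt.subplots()
# shape (2, n): row 0 holds the start coordinates, row 1 the end coordinates;
# matplotlib draws one line per column, i.e. one line per DataFrame row
xs = plot_result[['A', 'C']].to_numpy().T
ys = plot_result[['B', 'D']].to_numpy().T
ax.plot(xs, ys, 'ro-')
```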
Is there a way to group boxplots in matplotlib WITHOUT the use of seaborn or some other library?
For example, in the data below I want the blocks along the x-axis, with the values in each block grouped by condition (so there will be 16 boxes), like what seaborn's hue argument does.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
blocks = 4
conditions = 4
ndatapoints = blocks * conditions
# each block number repeated once per condition: 1, 1, 1, 1, 2, 2, ...
blockcol = np.repeat(np.arange(1, blocks + 1), conditions)
# condition numbers cycle within each block: 1, 2, 3, 4, 1, 2, ...
concol = np.tile(np.arange(1, conditions + 1), blocks)
trialcol = np.arange(1, ndatapoints + 1)
valcol = np.random.normal(0, 1, ndatapoints)
raw_data = {'blocks': blockcol,
            'condition': concol,
            'trial': trialcol,
            'value': valcol}
df = pd.DataFrame(raw_data)
df
blocks condition trial value
0 1 1 1 1.306146
1 1 2 2 -0.024201
2 1 3 3 -0.374561
3 1 4 4 -0.093366
4 2 1 5 -0.548427
5 2 2 6 -1.205077
6 2 3 7 0.617165
7 2 4 8 -0.239830
8 3 1 9 -0.876789
9 3 2 10 0.656436
10 3 3 11 -0.471325
11 3 4 12 -1.465787
12 4 1 13 -0.495308
13 4 2 14 -0.266914
14 4 3 15 -0.305884
15 4 4 16 0.546730
I can't seem to find any examples.
I think you just want a factor plot (seaborn.catplot with kind='box'; the function was called factorplot in older seaborn versions):
import numpy
import pandas
import seaborn
blocks = 3
conditions = 4
trials = 12
ndatapoints = blocks * conditions * trials
blockcol = list(range(1, blocks + 1)) * (conditions * trials)
concol = list(range(1, conditions + 1)) * (blocks * trials)
trialcol = list(range(1, trials + 1)) * (blocks * conditions)
valcol = numpy.random.normal(0, 1, ndatapoints)
fg = pandas.DataFrame({
'blocks': blockcol,
'condition': concol,
'trial': trialcol,
'value': valcol
}).pipe(
    (seaborn.catplot, 'data'),  # called factorplot before seaborn 0.9
    x='blocks', y='value', hue='condition',
    kind='box'
)
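Since the question asks for plain matplotlib, here is a sketch without seaborn. It assumes several trials per block/condition pair so that each box summarizes more than one value (the data are random and hypothetical). Each condition gets its own ax.boxplot call with positions shifted by a fraction of the block spacing, which mimics seaborn's hue:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
blocks, conditions, trials = 4, 4, 10
df = pd.DataFrame({
    'blocks': np.repeat(np.arange(1, blocks + 1), conditions * trials),
    'condition': np.tile(np.repeat(np.arange(1, conditions + 1), trials), blocks),
    'value': rng.normal(0, 1, blocks * conditions * trials),
})

fig, ax = plt.subplots(figsize=(10, 4))
width = 0.8 / conditions                   # the boxes of one block share 0.8 x-units
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
block_ids = np.arange(1, blocks + 1)
handles, labels = [], []
for k, cond in enumerate(sorted(df['condition'].unique())):
    sub = df[df['condition'] == cond]
    # one array of values per block for this condition
    data = [sub.loc[sub['blocks'] == b, 'value'].to_numpy() for b in block_ids]
    # shift this condition's boxes so each block's boxes sit side by side around its centre
    positions = block_ids - 0.4 + width * (k + 0.5)
    bp = ax.boxplot(data, positions=positions, widths=width * 0.9, patch_artist=True)
    for box in bp['boxes']:
        box.set_facecolor(colors[k % len(colors)])
    handles.append(bp['boxes'][0])         # one legend handle per condition
    labels.append(f'condition {cond}')
ax.set_xticks(block_ids)
ax.set_xticklabels([f'block {b}' for b in block_ids])
ax.set_xlim(0.5, blocks + 0.5)
ax.legend(handles, labels, title='condition')
```

The legend handles are passed explicitly because boxplot does not label its boxes, so the automatic legend would find nothing to show.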
I want to draw a bar chart for the data below:
4 1406575305 4
4 -220936570 2
4 2127249516 2
5 -1047108451 4
5 767099153 2
5 1980251728 2
5 -2015783241 2
6 -402215764 2
7 927697904 2
7 -631487113 2
7 329714360 2
7 1905727440 2
8 1417432814 2
8 1906874956 2
8 -1959144411 2
9 859830686 2
9 -1575740934 2
9 -1492701645 2
9 -539934491 2
9 -756482330 2
10 1273377106 2
10 -540812264 2
10 318171673 2
The first column is the x-axis value and the third column is the y-axis value. Several rows can share the same x value. For example,
4 1406575305 4
4 -220936570 2
4 2127249516 2
This means there should be three bars at x = 4, each labelled with its tag (the value in the middle column). The sample bar chart looks like this:
http://matplotlib.org/examples/pylab_examples/barchart_demo.html
I am using matplotlib.pyplot and NumPy (np). Thanks!
I followed the tutorial you linked to, but it's a bit tricky to shift them by a nonuniform amount:
import numpy as np
import matplotlib.pyplot as plt
x, label, y = np.genfromtxt('tmp.txt', dtype=int, unpack=True)
ux, uidx, uinv = np.unique(x, return_index=True, return_inverse=True)
max_width = np.bincount(x).max()
bar_width = 1/(max_width + 0.5)
locs = x.astype(float)
shifted = []
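for i in range(max_width):
    # candidates: the i-th row of each x group, minus rows that were already shifted
    where = np.setdiff1d(uidx + i, shifted)
    locs[where[where < len(locs)]] += i * bar_width
    shifted = np.concatenate([shifted, where])
# align='edge' puts each bar's left edge at locs (matplotlib 2.0+ centres bars by default)
plt.bar(locs, y, bar_width, align='edge')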
If you want you can label them with the second column instead of x:
plt.xticks(locs + bar_width/2, label, rotation=-90)
I'll leave doing both of them as an exercise to the reader (mainly because I have no idea how you want them to show up).
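An alternative that avoids the set-difference bookkeeping, as a sketch assuming pandas is available: number each row within its x group with groupby().cumcount(), then centre each group's bars on its x value. ax.bar_label (matplotlib 3.4 or later) writes the middle column on top of each bar, which covers both the x ticks and the tag labels:

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

raw = """4 1406575305 4
4 -220936570 2
4 2127249516 2
5 -1047108451 4
5 767099153 2
5 1980251728 2
5 -2015783241 2
6 -402215764 2
7 927697904 2
7 -631487113 2
7 329714360 2
7 1905727440 2
8 1417432814 2
8 1906874956 2
8 -1959144411 2
9 859830686 2
9 -1575740934 2
9 -1492701645 2
9 -539934491 2
9 -756482330 2
10 1273377106 2
10 -540812264 2
10 318171673 2"""
df = pd.read_csv(io.StringIO(raw), sep=r'\s+', header=None, names=['x', 'tag', 'y'])

# position of each row inside its x group (0, 1, 2, ...) and the size of that group
df['slot'] = df.groupby('x').cumcount()
df['n'] = df.groupby('x')['x'].transform('size')
width = 0.8 / df['n'].max()           # the widest group spans 0.8 x-units
# centre each group's bars on its x value
df['pos'] = df['x'] + (df['slot'] - (df['n'] - 1) / 2) * width

fig, ax = plt.subplots(figsize=(10, 4))
bars = ax.bar(df['pos'], df['y'], width)
ax.bar_label(bars, labels=df['tag'].astype(str).tolist(), rotation=90, fontsize=7)
ax.set_xticks(sorted(df['x'].unique()))
```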