I am plotting 2D numpy arrays using
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1,2,3])
y = np.array([[2,2.2,3],[1,5,1]])
plt.plot(x,y.T[:,:])
plt.legend()
plt.show()
I want a legend that tells which line belongs to which row. Of course, I realize I can't give the lines meaningful names, but I need some sort of unique label for each line without running through a loop.
import numpy as np
import matplotlib.pyplot as plt
import uuid
x = np.array([1,2,3])
y = np.array([[2,2.2,3],[1,5,1]])
fig, ax = plt.subplots()
lines = ax.plot(x,y.T[:,:])
ax.legend(lines, [str(uuid.uuid4())[:6] for j in range(len(lines))])
plt.show()
(This is off of the current mpl master branch with a preview of the 2.0 default styles)
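If random IDs are more than is needed, the row index itself already works as a unique label. A minimal sketch, assuming labels of the form "row 0", "row 1" are acceptable (that naming scheme is an assumption, not part of the answer above). The Agg backend line is only there so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3])
y = np.array([[2, 2.2, 3], [1, 5, 1]])

fig, ax = plt.subplots()
# One Line2D object is returned per column of y.T, i.e. per row of y
lines = ax.plot(x, y.T)

# Hypothetical naming scheme: label each line with the index of its row
labels = [f"row {i}" for i in range(len(lines))]
legend = ax.legend(lines, labels)

# Collect the legend entries so they can be inspected
legend_texts = [t.get_text() for t in legend.get_texts()]
```

Because ax.plot returns the lines in row order, lines[k] always corresponds to y[k], so the label can be traced back to the data.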
I am trying to connect lines based on a specific relationship associated with the points. In this example the lines would connect the players by which court they played in. I can create the basic structure but haven't figured out a reasonably simple way to create this added feature.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df_dict={'court':[1,1,2,2,3,3,4,4],
'player':['Bob','Ian','Bob','Ian','Bob','Ian','Ian','Bob'],
'score':[6,8,12,15,8,16,11,13],
'win':['no','yes','no','yes','no','yes','no','yes']}
df=pd.DataFrame.from_dict(df_dict)
ax = sns.boxplot(x='score',y='player',data=df)
ax = sns.swarmplot(x='score',y='player',hue='win',data=df,s=10,palette=['red','green'])
plt.show()
This code generates the following plot minus the gray lines that I am after.
You can use lineplot here:
sns.lineplot(
data=df, x="score", y="player", units="court",
color=".7", estimator=None
)
Alternatively, convert the player name to an integer flag, use that flag as the y-axis value, and loop over the courts to draw one line per court.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df_dict={'court':[1,1,2,2,3,3,4,4],
'player':['Bob','Ian','Bob','Ian','Bob','Ian','Ian','Bob'],
'score':[6,8,12,15,8,16,11,13],
'win':['no','yes','no','yes','no','yes','no','yes']}
df=pd.DataFrame.from_dict(df_dict)
ax = sns.boxplot(x='score',y='player',data=df)
ax = sns.swarmplot(x='score',y='player',hue='win',data=df,s=10,palette=['red','green'])
df['flg'] = df['player'].apply(lambda x: 0 if x == 'Bob' else 1)
for i in df.court.unique():
    dfq = df.query('court == @i').reset_index()
    ax.plot(dfq['score'], dfq['flg'], 'g-')
plt.show()
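The same idea can be sketched without seaborn, which makes the mechanics easier to see. This is only an illustration under some assumptions: the column name ypos, the connectors list, and the gray shade "0.7" are my own choices, and a plain scatter stands in for the box and swarm plots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'court': [1, 1, 2, 2, 3, 3, 4, 4],
    'player': ['Bob', 'Ian', 'Bob', 'Ian', 'Bob', 'Ian', 'Ian', 'Bob'],
    'score': [6, 8, 12, 15, 8, 16, 11, 13],
})

# Map each player to a y position in order of first appearance (Bob -> 0, Ian -> 1)
positions = {name: k for k, name in enumerate(df['player'].unique())}
df['ypos'] = df['player'].map(positions)

fig, ax = plt.subplots()
ax.scatter(df['score'], df['ypos'], zorder=2)

# Draw one gray connector per court, behind the points
connectors = []
for court, grp in df.groupby('court'):
    (line,) = ax.plot(grp['score'], grp['ypos'], color='0.7', zorder=1)
    connectors.append(line)

# Show player names instead of the integer positions
ax.set_yticks(list(positions.values()))
ax.set_yticklabels(list(positions.keys()))
```

Grouping by court replaces the query call inside the loop, so each group already holds exactly the two points that should be joined.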
I can plot multiple histograms in a single plot using pandas, but a few things are missing:
How do I give each histogram a label?
I can only plot one figure; how do I change it to layout=(3,1) or something else?
Also, in figure 1, all the bins are filled with solid colors, and it's kind of difficult to know which is which. How can I fill them with different patterns (e.g. crosses, slashes, etc.)?
Here is the MWE:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('iris')
df.groupby('species')['sepal_length'].hist(alpha=0.7,label='species')
plt.legend()
Output:
To change the layout I can use the by keyword, but then I can't give the histograms different colors.
How can I give them different colors?
df.hist('sepal_length',by='species',layout=(3,1))
plt.tight_layout()
Gives:
You can resort to groupby:
fig,ax = plt.subplots()
hatches = ('\\', '//', '..') # fill pattern
for (i, d), hatch in zip(df.groupby('species'), hatches):
    d['sepal_length'].hist(alpha=0.7, ax=ax, label=i, hatch=hatch)
ax.legend()
Output:
From pandas version 1.1.0 onward you can simply set the legend keyword to True.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('iris')
df.groupby('species')['sepal_length'].hist(alpha=0.7, legend = True)
It's more code, but using pure matplotlib will always give you more control over the plots. For your second case:
import matplotlib.pyplot as plt
import numpy as np
from itertools import zip_longest
# Dictionary of color for each species
color_d = dict(zip_longest(df.species.unique(),
plt.rcParams['axes.prop_cycle'].by_key()['color']))
# Use the same bins for each
xmin = df.sepal_length.min()
xmax = df.sepal_length.max()
bins = np.linspace(xmin, xmax, 20)
# Set up correct number of subplots, space them out.
fig, ax = plt.subplots(nrows=df.species.nunique(), figsize=(4,8))
plt.subplots_adjust(hspace=0.4)
for i, (lab, gp) in enumerate(df.groupby('species')):
    ax[i].hist(gp.sepal_length, ec='k', bins=bins, color=color_d[lab])
    ax[i].set_title(lab)
    # same xlim for each so we can see differences
    ax[i].set_xlim(xmin, xmax)
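The first case (overlaid histograms with a label and a fill pattern each) can be sketched in pure matplotlib too. To keep the snippet self-contained, it uses synthetic numbers as a stand-in for the iris data; the values, the random seed, and the hatch choices are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the iris columns used above (NOT the real dataset)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'species': np.repeat(['setosa', 'versicolor', 'virginica'], 50),
    'sepal_length': np.concatenate([
        rng.normal(5.0, 0.35, 50),
        rng.normal(5.9, 0.50, 50),
        rng.normal(6.6, 0.60, 50),
    ]),
})

# Shared bin edges so the three histograms are directly comparable
bins = np.linspace(df['sepal_length'].min(), df['sepal_length'].max(), 20)
hatches = ['\\', '//', '..']  # one fill pattern per species

fig, ax = plt.subplots()
for (name, grp), hatch in zip(df.groupby('species'), hatches):
    ax.hist(grp['sepal_length'], bins=bins, alpha=0.5,
            label=name, hatch=hatch, edgecolor='k')

legend = ax.legend()
legend_labels = [t.get_text() for t in legend.get_texts()]
```

Passing the group key as label is what gives each histogram its own legend entry, which is the part the original label='species' call was missing.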
I tried to plot a bar chart and I want the x-labels to keep a specific order, so I used set_xticklabels. However, the y-values turned out not to match the x-labels.
import matplotlib.pyplot as plt
A=['Dog','Cat','Fish','Bird']
B=[26,39,10,20]
fig=plt.figure()
ax1 = fig.add_subplot(1,1,1)
ax1.bar(A, B)
ax1.set_xticklabels(A)
plt.title("Animals")
plt.show()
The expected result is Dog=26 Cat=39 Fish=10 Bird=20, but the result I got is Dog=20 Cat=39 Fish=26 Bird=20.
Here is one answer I found. However, if I use this method I cannot keep the original order I want.
import matplotlib.pyplot as plt
A=['Dog','Cat','Fish','Bird']
B=[26,39,10,20]
# itertools.izip exists only in Python 2; the built-in zip does the same job in Python 3
lists = sorted(zip(A, B))
new_x, new_y = zip(*lists)
fig=plt.figure()
ax1 = fig.add_subplot(1,1,1)
ax1.bar(new_x, new_y )
ax1.set_xticklabels(new_x)
plt.title("Animals")
plt.show()
Is there any way I can keep the original order of the x-labels and make the y-values match them?
This code will serve the purpose:
import numpy as np
import matplotlib.pyplot as plt
A=['Dog','Cat','Fish','Bird']
B=[26,39,10,20]
y_pos = np.arange(len(A))
plt.bar(y_pos, B)
plt.xticks(y_pos, A)
plt.title("Animals")
plt.show()
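The approach works because the bars are placed at explicit numeric positions and the names are attached to those same positions, so nothing gets reordered. A small sketch that checks the pairing on the Axes object; the shown dictionary is a hypothetical helper added only for the check:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np

A = ['Dog', 'Cat', 'Fish', 'Bird']
B = [26, 39, 10, 20]

fig, ax = plt.subplots()
positions = np.arange(len(A))

# Bars go at 0, 1, 2, 3 in list order; labels are pinned to the same positions
bars = ax.bar(positions, B)
ax.set_xticks(positions)
ax.set_xticklabels(A)
ax.set_title("Animals")

# Pair each tick label with the height of the bar drawn at that tick
shown = {tick.get_text(): bar.get_height()
         for tick, bar in zip(ax.get_xticklabels(), bars)}
```

Here shown pairs every name with the value that was passed for it, in the original order.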
Why don't you use pandas for storing your data:
import pandas as pd
import matplotlib.pyplot as plt
A= ['Dog','Cat','Fish','Bird']
B= [26,39,10,20]
ser = pd.Series(data=B, index=A)
ax = ser.loc[A].plot(kind='bar', legend=False)
ax.set_ylabel("Value")
ax.set_xlabel("Animals")
plt.show()
From matplotlib 2.2 onward you can just plot those lists as they are and get the correct result.
import matplotlib.pyplot as plt
A=['Dog','Cat','Fish','Bird']
B=[26,39,10,20]
plt.bar(A, B)
plt.title("Animals")
plt.show()
I have this code:
import numpy as np
import pylab as plt
a = np.array([1,2,3,4,5,6,7,8,9,10])
b = np.exp(a)
plt.plot(a,b,'.')
plt.show()
The code works fine, but I need to modify the x-axis labels of the plot.
I would like the x-axis labels to be powers of 10 corresponding to the values in a. For the example code, they would be [10^1, 10^2, ..., 10^10].
I would appreciate any suggestions.
Thank you!
import numpy as np
import pylab as plt
a = np.array([1,2,3,4,5,6,7,8,9,10])
# this is it, but it is better to use floats like 10.0;
# an integer might not hold values that big
b = 10.0 ** a
plt.plot(a,b,'.')
plt.show()
This code is probably what you need:
import numpy as np
import pylab as plt
a = np.asarray([1,2,3,4,5,6,7,8,9,10])
b = np.exp(a)
c = np.asarray([10**i for i in a])
print(list(zip(a,c)))
plt.xticks(a, c)
plt.plot(a,b,'.')
plt.show()
By using plt.xticks() you can customize the x-axis tick labels of the plot. I also replaced 10^i with 10**i, since ^ is not the power operator in Python.
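If the ticks should read as 10 with an exponent rather than as long integers, the labels can be written as mathtext strings. A minimal sketch, assuming that rendering is what is wanted; the labels list and the use of an explicit Axes object are my own additions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
b = np.exp(a)

# Mathtext labels such as $10^{3}$; doubled braces are literal braces in an f-string
labels = [f"$10^{{{i}}}$" for i in a]

fig, ax = plt.subplots()
ax.plot(a, b, '.')
ax.set_xticks(a)
ax.set_xticklabels(labels)

tick_texts = [t.get_text() for t in ax.get_xticklabels()]
```

Only the tick labels change here; the points are still plotted at x = 1, ..., 10.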
I'm making a clustered heatmap in seaborn as follows:
import numpy as np
import seaborn as sns
np.random.seed(2)
data = np.random.randn(100, 10)
sns.clustermap(data)
but the rows are squished:
If I pass a size to the clustermap function, it looks terrible:
Is there a way to increase the size of only the heatmap part, so that the row names can be read without stretching out the cluster portions?
As @mwaskom commented, I was able to use ax_heatmap.set_position along with the get_position function to achieve the correct result.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(2)
data = np.random.randn(100, 10)
cm = sns.clustermap(data)
hm = cm.ax_heatmap.get_position()
plt.setp(cm.ax_heatmap.yaxis.get_majorticklabels(), fontsize=6)
cm.ax_heatmap.set_position([hm.x0, hm.y0, hm.width*0.25, hm.height])
col = cm.ax_col_dendrogram.get_position()
cm.ax_col_dendrogram.set_position([col.x0, col.y0, col.width*0.25, col.height*0.5])
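The get_position / set_position pattern used above is plain matplotlib and can be tried on any Axes. A stripped-down sketch on a single hand-placed axes; the starting rectangle and the 0.25 factor are arbitrary values chosen for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt

fig = plt.figure()
# Axes rectangle in figure fractions: [left, bottom, width, height]
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])

# Read the current bounding box, then write back a narrower one
old = ax.get_position()
ax.set_position([old.x0, old.y0, old.width * 0.25, old.height])
new = ax.get_position()
```

The left and bottom edges stay where they were and only the width shrinks, which is why the dendrogram axes in the answer need the same width factor to stay aligned with the heatmap.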
This can be done by passing the dendrogram_ratio keyword argument to clustermap:
import numpy as np
import seaborn as sns
np.random.seed(2)
data = np.random.randn(100, 10)
sns.clustermap(data,figsize=(12,30),dendrogram_ratio=0.02,cmap='RdBu')