My data and code are below:
w = [1,2,3,4,5,6,7,8,9,10]
vals = [[1,2,3,4,5,6,7,8,9,10],[2,4,6,8,8,8,8,8,7,1],[1,4,2,4,8,9,8,8,7,2]]
def plot_compare(*id_nums):
    fig = plt.figure(figsize=(10, 5))
    leg=[]
    for id_num in id_nums:
        rel = vals[id_num]
        sns.lineplot(x=w, y=rel)
        leg.append(id_num)
    fig.legend(labels=[leg],loc=5,);
plot_compare(0,2)
The idea was to get multiple line plots with just one function (in my actual data I have a lot of values that need to be plotted).
When I run the code as above, I get the plot as below.
Line plots are exactly as I want, but the legend is just one item instead of 2 items (since I have plotted 2 line graphs).
I have tried moving the legend line inside the for loop, but to no avail. I want as many legend entries as there are line plots.
Can anyone help?
You are passing the legend labels as a list of lists ([leg]), so the legend sees only one entry. Pass the list itself instead: fig.legend(labels=leg,loc=5)
Use:
import matplotlib.pyplot as plt
import seaborn as sns
w = [1,2,3,4,5,6,7,8,9,10]
vals = [[1,2,3,4,5,6,7,8,9,10],[2,4,6,8,8,8,8,8,7,1],[1,4,2,4,8,9,8,8,7,2]]
def plot_compare(*id_nums):
    fig = plt.figure(figsize=(10, 5))
    leg=[]
    for id_num in id_nums:
        rel = vals[id_num]
        sns.lineplot(x=w, y=rel)
        leg.append(id_num)
    # pass the flat list, one label per plotted line
    fig.legend(labels=leg,loc=5)
    plt.show()
plot_compare(0,2)
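An alternative that avoids keeping a separate label list is to attach a label to each line when it is drawn and let the axes build the legend. The following is only a minimal sketch: it uses plain matplotlib (ax.plot) in place of sns.lineplot so that it is self-contained, and the "series N" label text is an assumption for illustration. The same label= keyword can be passed to sns.lineplot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt

w = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
vals = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [2, 4, 6, 8, 8, 8, 8, 8, 7, 1],
    [1, 4, 2, 4, 8, 9, 8, 8, 7, 2],
]


def plot_compare(*id_nums):
    """Plot the selected rows of vals, one labelled line per id."""
    fig, ax = plt.subplots(figsize=(10, 5))
    for id_num in id_nums:
        # the label travels with the line, so no separate list is needed
        ax.plot(w, vals[id_num], label=f"series {id_num}")
    ax.legend(loc="center right")
    return fig, ax


fig, ax = plot_compare(0, 2)
legend_texts = [t.get_text() for t in ax.get_legend().get_texts()]
```

Here legend_texts holds one entry per plotted line, which is the behaviour the question asks for.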
Here's my chart:
Unfortunately, this is there too, right below:
This is the code:
fig,ax1 = plt.subplots(6,1, figsize=(20,10),dpi=300)
fig2,ax2 = plt.subplots(6,1, figsize=(20,10),dpi=300)
for index, val in enumerate(datedf.columns):
    g = ax1[index].plot(datedf.index, datedf[val], color=colors[index])
    ax1[index].set(ylim=[-100,6500])
    ax2[index] = ax1[index].twinx()
    a = ax2[index].plot(qtydf.index, qtydf[val], color=colors[index], alpha=0.5)
    ax2[index].set(ylim=[200,257000])
I tried this answer but I got an error on the first line (too many values to unpack)
Can anyone explain why?
You call plt.subplots twice, so you end up with 2 figures; the second one stays empty because the twin axes are drawn on the first.
Instead you should do something like:
fig, axes = plt.subplots(6,1, figsize=(20,10),dpi=300)
for index, val in enumerate(datedf.columns):
    ax1 = axes[index]
    g = ax1.plot(datedf.index, datedf[val], color=colors[index])
    ax1.set(ylim=[-100,6500])
    ax2 = ax1.twinx()
    ax2.plot(qtydf.index, qtydf[val], color=colors[index], alpha=0.5)
    ax2.set(ylim=[200,257000])
NB. The code is untested as I don't have the original dataset.
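Since the original dataset is not available, here is a small self-contained sketch of the same idea with synthetic data. The arrays amounts and quantities are made-up stand-ins for datedf and qtydf. Note also that plt.subplots(6, 1) returns the figure plus an array of six axes, so a pattern such as fig, (ax1, ax2) = plt.subplots(6, 1) cannot unpack it; that may be what produced the "too many values to unpack" error, although this is a guess because the linked answer is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins (assumptions) for the six columns of datedf and qtydf
rng = np.random.default_rng(0)
x = np.arange(30)
amounts = rng.uniform(0, 6000, size=(6, 30))
quantities = rng.uniform(500, 250000, size=(6, 30))
colors = [f"C{i}" for i in range(6)]  # default colour cycle entries

# One figure only: six rows, each row gets a twin y-axis
fig, axes = plt.subplots(6, 1, figsize=(20, 10), sharex=True)
twin_axes = []
for index, ax_left in enumerate(axes):
    ax_left.plot(x, amounts[index], color=colors[index])
    ax_left.set(ylim=[-100, 6500])
    ax_right = ax_left.twinx()  # shares x with ax_left, lives in the same figure
    ax_right.plot(x, quantities[index], color=colors[index], alpha=0.5)
    ax_right.set(ylim=[200, 257000])
    twin_axes.append(ax_right)
```

Every twin axis belongs to the single figure fig, so no second, empty figure is created.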
I am trying to create a Manhattan plot in which certain regions are highlighted with vertical bands, given a list of values corresponding to points in the scatter plot. I looked at several examples but I am not sure how to proceed. I think axvspan or ax.fill_between should work, but I am not sure how to use them here. The code below was lifted directly from "How to create a Manhattan plot with matplotlib in python?"
from pandas import DataFrame
from scipy.stats import uniform
from scipy.stats import randint
import numpy as np
import matplotlib.pyplot as plt
# some sample data
df = DataFrame({'gene' : ['gene-%i' % i for i in np.arange(10000)],
                'pvalue' : uniform.rvs(size=10000),
                'chromosome' : ['ch-%i' % i for i in randint.rvs(0,12,size=10000)]})
# -log_10(pvalue)
df['minuslog10pvalue'] = -np.log10(df.pvalue)
df.chromosome = df.chromosome.astype('category')
df.chromosome = df.chromosome.cat.set_categories(['ch-%i' % i for i in range(12)], ordered=True)
df = df.sort_values('chromosome')
# How to plot gene vs. -log10(pvalue) and colour it by chromosome?
df['ind'] = range(len(df))
df_grouped = df.groupby(('chromosome'))
fig = plt.figure()
ax = fig.add_subplot(111)
colors = ['red','green','blue', 'yellow']
x_labels = []
x_labels_pos = []
for num, (name, group) in enumerate(df_grouped):
    group.plot(kind='scatter', x='ind', y='minuslog10pvalue',color=colors[num % len(colors)], ax=ax)
    x_labels.append(name)
    x_labels_pos.append((group['ind'].iloc[-1] - (group['ind'].iloc[-1] - group['ind'].iloc[0])/2))
ax.set_xticks(x_labels_pos)
ax.set_xticklabels(x_labels)
ax.set_xlim([0, len(df)])
ax.set_ylim([0, 3.5])
ax.set_xlabel('Chromosome')
Given a list of values for the points (p-values), e.g.
lst = [0.288686, 0.242591, 0.095959, 3.291343, 1.526353]
How do I highlight the region containing these points on the plot just as shown in green in the image below? Something similar to:
It would help if you included a sample of your dataframe for reference.
Assuming you want to match your lst values against the Y values, you need to go through the Y values you are plotting and check whether they are in lst.
for num, (name, group) in enumerate(df_grouped):
The group variable in this loop is a partial dataframe of your main dataframe, df. Hence, inside the loop you need another step that searches that group's Y values for matches with lst:
region_plot = []
for num, (name, group) in enumerate(df_grouped):
    group.plot(kind='scatter', x='ind', y='minuslog10pvalue',color=colors[num % len(colors)], ax=ax)
    # keep only the rows whose Y value matches a value in lst
    # (isin compares floats exactly, so rounded values in lst may not match)
    temp_group = group[group['minuslog10pvalue'].isin(lst)]
    for x_group in temp_group['ind']:
        # make sure the same region is not highlighted again
        if x_group not in region_plot:
            region_plot.append(x_group)
            ax.axvspan(x_group, x_group+1, alpha=0.5, color='green')
            # I used x_group+1 because I'm not sure how wide a highlight you want
Hope this helps!
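One caveat with the approach above is that isin tests floats for exact equality, so values copied from printed output (rounded to six decimals, say) may never match. A hedged sketch of a tolerant variant follows: it matches with np.isclose and merges neighbouring matches into one band. The data, the tolerance of 1e-6 and the helper contiguous_runs are all assumptions made for illustration, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Deterministic toy data (assumption): 30 points with y = 0.1, 0.2, ..., 3.0
x = np.arange(30)
y = np.linspace(0.1, 3.0, 30)
lst = [0.5, 0.6, 2.0]  # values to highlight, possibly rounded

# Tolerant membership test: True where y is within 1e-6 of any value in lst
targets = np.asarray(lst)
matches = np.isclose(y[:, None], targets[None, :], rtol=0, atol=1e-6).any(axis=1)
hit_x = x[matches].tolist()


def contiguous_runs(indices):
    """Group integer indices into (start, end) runs of consecutive values."""
    runs = []
    for i in sorted(indices):
        if runs and i == runs[-1][1] + 1:
            runs[-1][1] = i  # extend the current run
        else:
            runs.append([i, i])  # start a new run
    return [tuple(r) for r in runs]


runs = contiguous_runs(hit_x)

fig, ax = plt.subplots()
ax.scatter(x, y)
for start, end in runs:
    # half a unit of padding so the band is centred on the points
    ax.axvspan(start - 0.5, end + 0.5, alpha=0.5, color="green")
```

Merging consecutive indices gives one wide band per region instead of many overlapping unit-width bands; whether that is the desired look depends on the plot you are after.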
I have a dataset containing 10 features and corresponding labels. I am using scatter plots of distinct pairs of features to see which pairs separate the labels best (which means that 45 plots will be created in total). To do that, I used nested loops. The code raises no error and I obtained all the plots. However, something is clearly wrong, because each new scatter plot that gets created and saved also contains the points from the previous ones. I am attaching the complete code I used. How can I fix this? Below is the link to the raw dataset:
https://github.com/IITGuwahati-AI/Learning-Content/raw/master/Phase%203%20-%202020%20(Summer)/Week%201%20(Mar%2028%20-%20Apr%204)/assignment/data.txt
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
data_url ='https://raw.githubusercontent.com/diwakar1412/Learning-Content/master/DiwakarDas_184104503/datacsv.csv'
df = pd.read_csv(data_url)
df.head()
def transform_label(value):
    if value >= 2:
        return "BLUE"
    else:
        return "RED"
df["Label"] = df.Label.apply(transform_label)
df.head()
colors = {'RED':'r', 'BLUE':'b'}
fig, ax = plt.subplots()
for i in range(1,len(df.columns)):
    for j in range(i+1,len(df.columns)):
        for k in range(len(df[str(i)])):
            ax.scatter(df[str(i)][k], df[str(j)][k], color=colors[df['Label'][k]])
        ax.set_title('F%svsF%s' %(i,j))
        ax.set_xlabel('%s' %i)
        ax.set_ylabel('%s' %j)
        plt.savefig('F%svsF%s' %(i,j))
You have to create a new figure each time; otherwise every scatter call draws onto the same axes and the points accumulate. Try putting
fig, ax = plt.subplots()
inside your loop:
for i in range(1,len(df.columns)):
    for j in range(i+1,len(df.columns)):
        fig, ax = plt.subplots() # <-------------- here
        for k in range(len(df[str(i)])):
            ax.scatter(df[str(i)][k], df[str(j)][k], color=colors[df['Label'][k]])
        ax.set_title('F%svsF%s' %(i,j))
        ax.set_xlabel('%s' %i)
        ax.set_ylabel('%s' %j)
        plt.savefig('/Users/Alessandro/Desktop/tmp/F%svsF%s' %(i,j))
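A possible refinement, sketched below under stated assumptions: draw each pair with a single vectorised scatter call instead of one call per point, and close every figure after saving so that 45 open figures do not pile up in memory. The random features and labels arrays and the temporary output directory are placeholders, not the asker's dataset or paths.

```python
import itertools
import pathlib
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data (assumption): 200 samples, 10 features, two classes
rng = np.random.default_rng(0)
n_features = 10
features = rng.normal(size=(200, n_features))
labels = rng.choice(["RED", "BLUE"], size=200)

colors = {"RED": "r", "BLUE": "b"}
point_colors = [colors[label] for label in labels]

out_dir = pathlib.Path(tempfile.mkdtemp())  # stand-in for a real output folder
saved = []
for i, j in itertools.combinations(range(n_features), 2):
    fig, ax = plt.subplots()  # fresh figure for every pair
    # one call draws all points of this pair
    ax.scatter(features[:, i], features[:, j], c=point_colors, s=10)
    ax.set_title(f"F{i + 1}vsF{j + 1}")
    ax.set_xlabel(str(i + 1))
    ax.set_ylabel(str(j + 1))
    path = out_dir / f"F{i + 1}vsF{j + 1}.png"
    fig.savefig(path)
    plt.close(fig)  # release the figure once it is on disk
    saved.append(path)
```

itertools.combinations yields each unordered pair once, which replaces the two nested range loops.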
The code below achieves what I want to do, but does so in a very roundabout way. I have looked around for a succinct way to produce a single legend for a figure that includes multiple subplots that takes into account their labels, to no avail. plt.figlegend() requires you to pass in labels and lines, and plt.legend() requires only handles (slightly better).
My example below illustrates what I want. I have 9 vectors, each with one of 3 categories. I want to plot each vector on a separate sub plot, label it, and plot a legend which indicates (using colour) what the label means; this is the automatic behaviour on a single plot.
Do you know of a better way of achieving the plot below?
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
nr_lines = 9
nr_cats = 3
np.random.seed(1337)
# Data
X = np.random.randn(nr_lines, 100)
labels = ['Category {}'.format(ii) for ii in range(nr_cats)]
y = np.random.choice(labels, nr_lines)
# Ideally wouldn't have to manually pick colours
clrs = matplotlib.rcParams['axes.prop_cycle'].by_key()['color']
clrs = [clrs[ii] for ii in range(nr_cats)]
lab_clr = {k: v for k, v in zip(labels, clrs)}
fig, ax = plt.subplots(3, 3)
ax = ax.flatten()
for ii in range(nr_lines):
    ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
lines = [a.lines[0] for a in ax]
l_labels = [l.get_label() for l in lines]
# the hack - get a single occurrence of each label
idx_list = [l_labels.index(lab) for lab in labels]
lines_ = [lines[idx] for idx in idx_list]
#l_labels_ = [l_labels[idx] for idx in idx_list]
plt.legend(handles=lines_, bbox_to_anchor=[2, 2.5])
plt.tight_layout()
plt.savefig('/home/james/Downloads/stack_figlegend_example.png',
            bbox_inches='tight')
You could use a dictionary to collect the handles, using the label as the key. For example:
handles = {}
for ii in range(nr_lines):
    l1, = ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
    # keep only the first line seen for each category
    if y[ii] not in handles:
        handles[y[ii]] = l1
plt.legend(handles=list(handles.values()), bbox_to_anchor=[2, 2.5])
A handle is added to the dictionary only if its category isn't already present, so each category appears once in the legend.
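For completeness, a self-contained sketch of the dictionary idea, this time attaching the legend to the figure with fig.legend rather than to the last subplot. The deterministic category assignment (cycling through the labels) and the "C0", "C1", "C2" colours are assumptions chosen so the example is reproducible.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

nr_lines, nr_cats = 9, 3
rng = np.random.default_rng(1337)
X = rng.standard_normal((nr_lines, 100))
labels = [f"Category {i}" for i in range(nr_cats)]
# Assumption: cycle through the categories instead of drawing them at random
y = [labels[i % nr_cats] for i in range(nr_lines)]
lab_clr = {lab: f"C{i}" for i, lab in enumerate(labels)}

fig, axes = plt.subplots(3, 3)
handles = {}
for ax, row, lab in zip(axes.flat, X, y):
    (line,) = ax.plot(row, label=lab, color=lab_clr[lab])
    handles.setdefault(lab, line)  # keeps the first handle per category

# One legend for the whole figure, one entry per category
legend = fig.legend(
    handles=list(handles.values()),
    labels=list(handles.keys()),
    loc="center right",
)
legend_texts = [t.get_text() for t in legend.get_texts()]
```

dict.setdefault does the "add only if missing" check in one step, and a figure-level legend avoids the bbox_to_anchor offsets relative to a single subplot.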
I am plotting a grouped pandas DataFrame:
score = pd.DataFrame()
score['Score'] = svm_score
score['Wafer_Slot'] = desc.Wafer_Slot[test_index].tolist()
gscore = score.groupby('Wafer_Slot')
score_plot = [score for ws, score in gscore]
ax = gscore.boxplot(subplots=False)
ax.set_xticklabels(range(52)) # does not work
plt.xlabel('Wafer Slot')
plt.show()
It works well, but the x axis is impossible to read because the many tick labels overlap. I would like the x axis to be a simple counter of the boxplots.
How can I do that?
The boxplot method doesn't return the axes object like the plot method of DataFrames and Series. Try this:
gscore.boxplot(subplots=False)
ax = plt.gca()
ax.set_xticklabels(range(52))
The boxplot method returns a dict or OrderedDict of dicts of line objects by the look of it.
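If 52 labels are still too crowded, one option is to label only every few boxes. The sketch below uses matplotlib's own ax.boxplot on made-up data as a stand-in for the grouped pandas plot; the step of 5 is an arbitrary choice. The same set_xticks and set_xticklabels calls should apply to the axes obtained from plt.gca(), assuming the boxes sit at positions 1 to 52 as they do for a default matplotlib boxplot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Made-up scores (assumption): 52 wafer slots with 20 scores each
rng = np.random.default_rng(0)
n_slots = 52
scores = [rng.normal(size=20) for _ in range(n_slots)]

fig, ax = plt.subplots(figsize=(12, 4))
ax.boxplot(scores)  # boxes are drawn at x positions 1..n_slots

# Label every 5th box with a zero-based counter
positions = list(range(1, n_slots + 1, 5))
ax.set_xticks(positions)
ax.set_xticklabels([str(p - 1) for p in positions])
ax.set_xlabel("Wafer Slot")

tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```

Setting the tick positions before the labels keeps the two lists the same length, which set_xticklabels expects.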