How does one stop graphs from the same code from overlapping? - python

The graphs that are output from two distinct nx.draw_networkx commands seem to overlap in the console. How does one properly separate different graphs?
This feels like a silly question, but I have yet to find any solution on the web.
def desenho(dados):
    edges, weights = zip(*nx.get_edge_attributes(dados, 'weight').items())
    pos = nx.spring_layout(dados)
    print(nx.draw_networkx(dados, pos, node_color='purple', edgelist=edges, edge_color=weights, width=5.0, edge_cmap=plt.cm.jet), '\n')
graph_1 = desenho(data_1)
graph_2 = desenho(data_2)
I'd expect that each output would process and match with the empty string I put in to create some distance between them, but that isn't happening. What am I doing wrong here?
I'd also appreciate suggestions on how to make the color_map a bit less extreme in its gradient.

You need to create a subplot for each graph that you want to plot. For example:
import matplotlib.pyplot as plt
import networkx as nx
def desenho(dados):
    edges, weights = zip(*nx.get_edge_attributes(dados, 'weight').items())
    pos = nx.spring_layout(dados)
    # draw_networkx draws on the current axes, i.e. the most recently added subplot
    nx.draw_networkx(dados, pos, node_color='purple', edgelist=edges, edge_color=weights, width=5.0, edge_cmap=plt.cm.jet)
fig = plt.figure()
ax = fig.add_subplot(2,1,1) # 2,1,1 means: 2: two rows, 1: one column, 1: first plot
graph_1 = desenho(data_1)
ax2 = fig.add_subplot(2,1,2) # 2,1,2 means: 2: two rows, 1: one column, 2: second plot
graph_2 = desenho(data_2)
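A slightly more explicit variant is sketched below. It passes each Axes to nx.draw_networkx through the ax keyword instead of relying on the current axes, and it swaps jet for a sequential colormap to soften the gradient, which the question also asked about. The toy graphs, the helper name draw_weighted, the choice of plt.cm.Blues and the widened colour range are assumptions for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import networkx as nx


def draw_weighted(graph, ax, cmap=plt.cm.Blues):
    """Draw `graph` on `ax`, colouring edges by their 'weight' attribute."""
    edges, weights = zip(*nx.get_edge_attributes(graph, "weight").items())
    pos = nx.spring_layout(graph, seed=0)  # fixed seed for a reproducible layout
    nx.draw_networkx(
        graph, pos, ax=ax, node_color="purple",
        edgelist=edges, edge_color=weights, width=5.0,
        edge_cmap=cmap,
        # Start the colour range below the smallest weight so that no edge
        # ends up at the palest end of the colormap.
        edge_vmin=min(weights) - 1, edge_vmax=max(weights),
    )


# Two small stand-in graphs (hypothetical data replacing data_1 and data_2).
data_1 = nx.Graph()
data_1.add_weighted_edges_from([("a", "b", 1), ("b", "c", 3), ("c", "a", 2)])
data_2 = nx.Graph()
data_2.add_weighted_edges_from([("x", "y", 4), ("y", "z", 1)])

# One figure, two stacked axes; each graph gets its own axes.
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(6, 8))
draw_weighted(data_1, ax1)
draw_weighted(data_2, ax2)
```

Calling plt.show() or fig.savefig(...) afterwards displays or stores the result. Creating a separate plt.figure() per graph would also keep the drawings apart if two windows are preferred over two subplots.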

Related

Creating a complex legend with matplotlib

I'm currently trying to make a nested doughnut chart with four layers and I have come across some problems with it.
There is one dependency in my data. I look into the changes done with a specific method and divide them into agronomical and academic traits. I then create a fourth ring which shows basically the amount of academic and each agronomical trait. I don't know how to automatically align both doughnut rings so they match.
I looked into the matplotlib documentation, but I don't understand how the colormaps are addressed. I adapted the example code, but it is still not clear to me how it selects the colors.
I need to make a legend for the chart. Because some of the subgroup names are long, I cannot show them in the pie chart itself, but they should appear in the legend. When I draw the legend with ax.legend, it only includes the groups that I passed to ax.pie via labels=. If I use fig.legend instead, the colors do not match at all. I also tried the handles= argument, which I found in some posts here on StackOverflow, but that just gives me an error:
AttributeError: 'tuple' object has no attribute 'legend'
I would like to add the percentage and the number of occurrences to my legend, but I guess there is no "easy" way to do that?
```
import numpy as np
import pandas
import pandas as pd
import matplotlib.pyplot as plt
import openpyxl
df = pandas.read_excel("savedrecs.xlsx", sheet_name="test")
#print(df.head())
size = 0.3
fig, ax = plt.subplots(figsize=(12,8))
#Colors----
cmap1 = plt.get_cmap("tab20c")
cmap2 = plt.get_cmap("tab10")
outer_colors = cmap1(np.arange(20))
inner_colors = cmap1(np.arange(12))
sr_colors = cmap1(np.arange(5,6))
#Data----
third_ring = df[df["Group"].str.contains("group")]
fourth_ring = df[df["Group"].str.contains("Target trait")]
second_ring = df[df["Group"].str.contains("Cultivar")]
first_ring = df[df["Group"].str.contains("Mutation")]
#----
#---Testautopct---
def make_autopct(values):
    def my_autopct(pct):
        total = sum(values)
        val = int(round(pct*total/100.0))
        return '{p:.2f}%\n({v:d})'.format(p=pct,v=val)
    return my_autopct
#-----
#Piechart----
ir = ax.pie(first_ring["Occurence"], radius=1-size, labels=first_ring["Name"], textprops={"fontsize":8},labeldistance=0,
            colors=sr_colors, wedgeprops=dict(edgecolor="w"))
sr = ax.pie(second_ring["Occurence"],
            autopct=make_autopct(second_ring["Occurence"]),pctdistance=0.83,textprops={"fontsize":8},
            radius=1,wedgeprops=dict(width=size, edgecolor="w"),startangle=90,colors=inner_colors)
tr = ax.pie(third_ring["Occurence"],
            autopct=make_autopct(third_ring["Occurence"]),labels=third_ring["Name"],pctdistance=0.83,textprops={"fontsize":8},
            radius=1+size,wedgeprops=dict(width=size, edgecolor="w"),startangle=90,colors=outer_colors)
fr = ax.pie(fourth_ring["Occurence"],
            autopct=make_autopct(fourth_ring["Occurence"]),labels=fourth_ring["Name"],pctdistance=0.83,textprops={"fontsize":8},
            radius=1+size*2,wedgeprops=dict(width=size, edgecolor="w"),startangle=90,colors=outer_colors)
#---Legend & Title----
ax.legend( bbox_to_anchor=(1.04, 0.5), loc="center left", borderaxespad=10 ,fancybox=True, shadow=False, ncol=1, title="This will be a fancy legend title")
fig.suptitle("This will be a fancy title, which I don't know yet!")
#----
plt.tight_layout()
plt.show()
```
The output of this code is shown in a figure (not reproduced here).
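One possible way to approach the legend is sketched below, under stated assumptions. ax.pie returns a tuple (the wedge patches, the label texts, and the percentage texts when autopct is given), which is why calling .legend on its return value raises the AttributeError quoted above. The wedge patches from that tuple can be passed to ax.legend as handles together with custom label strings, so long names, percentages and counts can live in the legend only. The names, counts and the two-ring layout are made-up stand-ins for the Excel data; the rings line up because they share the same total, start angle and ordering.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data: each inner group is split into outer subgroups
# whose counts add up to the inner count (30 = 18 + 12, 20 = 20).
inner_names = ["Cultivar A", "Cultivar B"]
inner_counts = [30, 20]
outer_names = ["Yield", "Disease resistance", "A very long academic trait name"]
outer_counts = [18, 12, 20]

size = 0.3
cmap = plt.get_cmap("tab20c")            # 20 colours: 5 hues with 4 shades each
inner_colors = cmap(np.array([0, 4]))    # darkest shade of the first two hues
outer_colors = cmap(np.array([1, 2, 5])) # lighter shades of the matching hues

fig, ax = plt.subplots(figsize=(10, 6))
# Without autopct, ax.pie returns (wedges, texts).
inner_wedges, _ = ax.pie(inner_counts, radius=1 - size, startangle=90,
                         colors=inner_colors,
                         wedgeprops=dict(width=size, edgecolor="w"))
outer_wedges, _ = ax.pie(outer_counts, radius=1, startangle=90,
                         colors=outer_colors,
                         wedgeprops=dict(width=size, edgecolor="w"))


def legend_labels(names, counts):
    """Build 'name: percent (count)' strings for one ring."""
    total = sum(counts)
    return [f"{n}: {100 * c / total:.1f}% ({c})" for n, c in zip(names, counts)]


handles = list(inner_wedges) + list(outer_wedges)
labels = (legend_labels(inner_names, inner_counts)
          + legend_labels(outer_names, outer_counts))
legend = ax.legend(handles, labels, loc="center left",
                   bbox_to_anchor=(1.04, 0.5), title="Groups")
```

Because the handles are the wedges themselves, the legend colours follow whatever colours the wedges were drawn with.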

Highlight part of scatter plot containing specific points in python

I am trying to create a Manhattan plot that will be vertically highlighted at certain parts of the plot given a list of values corresponding to points in the scatter plot. I looked at several examples but I am not sure how to proceed. I think using axvspan or ax.fill_between should work but I am not sure how. The code below was lifted directly from "How to create a Manhattan plot with matplotlib in python?":
from pandas import DataFrame
from scipy.stats import uniform
from scipy.stats import randint
import numpy as np
import matplotlib.pyplot as plt
# some sample data
df = DataFrame({'gene' : ['gene-%i' % i for i in np.arange(10000)],
'pvalue' : uniform.rvs(size=10000),
'chromosome' : ['ch-%i' % i for i in randint.rvs(0,12,size=10000)]})
# -log_10(pvalue)
df['minuslog10pvalue'] = -np.log10(df.pvalue)
df.chromosome = df.chromosome.astype('category')
df.chromosome = df.chromosome.cat.set_categories(['ch-%i' % i for i in range(12)], ordered=True)
df = df.sort_values('chromosome')
# How to plot gene vs. -log10(pvalue) and colour it by chromosome?
df['ind'] = range(len(df))
df_grouped = df.groupby(('chromosome'))
fig = plt.figure()
ax = fig.add_subplot(111)
colors = ['red','green','blue', 'yellow']
x_labels = []
x_labels_pos = []
for num, (name, group) in enumerate(df_grouped):
    group.plot(kind='scatter', x='ind', y='minuslog10pvalue',color=colors[num % len(colors)], ax=ax)
    x_labels.append(name)
    x_labels_pos.append((group['ind'].iloc[-1] - (group['ind'].iloc[-1] - group['ind'].iloc[0])/2))
ax.set_xticks(x_labels_pos)
ax.set_xticklabels(x_labels)
ax.set_xlim([0, len(df)])
ax.set_ylim([0, 3.5])
ax.set_xlabel('Chromosome')
Given a list of values for the points (-log10 p-values), e.g.
lst = [0.288686, 0.242591, 0.095959, 3.291343, 1.526353]
how do I highlight the regions of the plot containing these points, similar to the green bands in the example image (not reproduced here)?
It would help if you provided a sample of your dataframe for reference.
Assuming you want to match your lst values with Y values, you need to iterate through each Y value you're plotting and check whether it is in lst.
for num, (name, group) in enumerate(df_grouped):
The group variable in your code is a partial dataframe of your main dataframe, df. Hence, you need another loop that looks through all Y values for matches with lst:
region_plot = []
for num, (name, group) in enumerate(df_grouped):
    group.plot(kind='scatter', x='ind', y='minuslog10pvalue',color=colors[num % len(colors)], ax=ax)
    # create a new df containing only the rows whose values match lst
    temp_group = group[group['minuslog10pvalue'].isin(lst)]
    for x_group in temp_group['ind']:
        # make sure the same region is not highlighted again
        if x_group not in region_plot:
            region_plot.append(x_group)
            ax.axvspan(x_group, x_group+1, alpha=0.5, color='green')
            # I used x_group+1 because I'm not sure how wide a highlight range you want
Hope this helps!
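One caveat with isin is that it compares floats exactly, so values copied by hand (rounded to six decimals, as in lst) may not match anything. A minimal sketch of a tolerance-based match with np.isclose follows; the synthetic data, the tolerance of 1e-6 and the band half-width are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the Manhattan-plot data.
rng = np.random.default_rng(0)
df = pd.DataFrame({"minuslog10pvalue": -np.log10(rng.uniform(size=200))})
df["ind"] = range(len(df))

# Values to highlight: taken from the data and rounded to 6 decimals,
# to mimic a hand-copied list like `lst` in the question.
lst = df["minuslog10pvalue"].iloc[[10, 50, 120]].round(6).tolist()

# Compare every y value against every entry of lst with an absolute tolerance.
y = df["minuslog10pvalue"].to_numpy()
matches = np.isclose(y[:, None], np.asarray(lst)[None, :],
                     rtol=0, atol=1e-6).any(axis=1)
highlighted = df.loc[matches, "ind"].tolist()

fig, ax = plt.subplots()
ax.scatter(df["ind"], df["minuslog10pvalue"], s=8)
half_width = 2  # assumed half-width of each highlighted band, in x units
for x in highlighted:
    ax.axvspan(x - half_width, x + half_width, alpha=0.3, color="green")
```

The boolean mask is computed once for the whole dataframe, so no per-group loop is needed for the matching step.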

Replacing part of a plot with a dotted line

I would like to replace part of my plot where the function dips down to '-1' with a dashed line carrying on from the previous point (see plots below).
Here's some code I've written, along with its output:
import numpy as np
import matplotlib.pyplot as plt
y = [5,6,8,3,5,7,3,6,-1,3,8,5]
plt.plot(np.linspace(1,12,12),y,'r-o')
plt.show()
for i in range(1,len(y)):
    if y[i]!=-1:
        plt.plot(np.linspace(i-1,i,2),y[i-1:i+1],'r-o')
    else:
        y[i]=y[i-1]
        plt.plot(np.linspace(i-1,i,2),y[i-1:i+1],'r--o')
plt.ylim(-1,9)
plt.show()
(Figures: the original plot and the modified plot; images not reproduced here.)
The code I've written works (it produces the desired output), but it's inefficient and takes a long time when I actually run it on my (much larger) dataset. Is there a smarter way to go about doing this?
You can achieve something similar without the loops:
import pandas as pd
import matplotlib.pyplot as plt
# Create a data frame from the list
a = pd.DataFrame([5,6,-1,-1, 8,3,5,7,3,6,-1,3,8,5])
# Prepare a boolean mask
mask = a > 0
# New data frame with missing values filled with the last element of
# the previous segment. Use .bfill() instead to take the first element
# of the next segment. (fillna(method='ffill') is deprecated in recent pandas.)
a_masked = a[mask].ffill()
# Prepare the plot
fig, ax = plt.subplots()
line, = ax.plot(a_masked, ls = '--', lw = 1)
ax.plot(a[mask], color=line.get_color(), lw=1.5, marker = 'o')
plt.show()
You can also highlight the negative regions by choosing a different colour for the lines:
My answer is based on a great post from July, 2017. The latter also tackles the case when the first element is NaN or in your case a negative number:
Dotted lines instead of a missing value in matplotlib
I would use numpy functionality to cut your line into segments and then plot all solid and dashed lines separately. In the example below I added two additional -1s to your data to see that this works universally.
import numpy as np
import matplotlib.pyplot as plt
Y = np.array([5,6,-1,-1, 8,3,5,7,3,6,-1,3,8,5])
X = np.arange(len(Y))
idxs = np.where(Y==-1)[0]
sub_y = np.split(Y,idxs)
sub_x = np.split(X,idxs)
fig, ax = plt.subplots()
## replacing -1 values and plotting dotted lines
for i in range(1,len(sub_y)):
    val = sub_y[i-1][-1]
    sub_y[i][0] = val
    ax.plot([sub_x[i-1][-1], sub_x[i][0]], [val, val], 'r--')
## plotting the rest
for x,y in zip(sub_x, sub_y):
    ax.plot(x, y, 'r-o')
plt.show()
Note, however, that this will fail if the first value is -1, as then your problem is not well defined (no previous value to copy from). Hope this helps.
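For large arrays, a fully vectorised variant with two plot calls may be worth trying. The sketch below forward-fills the -1 entries with NumPy and draws a dashed line under a solid line that is broken by NaN at the gaps. It is an assumption-laden alternative rather than a drop-in replacement: the segment leaving each gap is drawn dashed here, filled points get no marker, and the first value is assumed to be valid.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

y = np.array([5, 6, -1, -1, 8, 3, 5, 7, 3, 6, -1, 3, 8, 5], dtype=float)
x = np.arange(len(y))
valid = y != -1

# Forward fill: for each position, the index of the most recent valid sample.
last_valid = np.maximum.accumulate(np.where(valid, x, 0))
filled = y[last_valid]

# Solid line: NaN breaks the line wherever the data is missing.
solid = np.where(valid, y, np.nan)

fig, ax = plt.subplots()
ax.plot(x, filled, "r--")  # dashed line underneath, visible only in the gaps
ax.plot(x, solid, "r-o")   # solid line and markers on top of the valid data
ax.set_ylim(-1, 9)
```

Only two Line2D objects are created regardless of the number of gaps, which is usually much cheaper than one plot call per segment.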
Not too elegant, but here's a working solution without loops that I came up with, based on the above answers. @KRKirov and @Thomas Kühn, thank you for your answers, I really appreciate them.
import pandas as pd
import matplotlib.pyplot as plt
# Create a data frame from the list
a = pd.DataFrame([5,6,-1,-1, 8,3,5,7,3,6,-1,3,8,5])
b=a.copy()
b[2]=b[0].shift(1,axis=0)
b[4]=(b[0]!=-1) & (b[2]==-1)
b[5]=b[4].shift(-1,axis=0)
b[0] = (b[5] | b[4])
c=b[0]
d=pd.DataFrame(c)
# Prepare a boolean mask
mask = a > 0
# New data frame with missing values filled with the last element of
# the previous segment. Use .bfill() instead to take the first element
# of the next segment. (fillna(method='ffill') is deprecated in recent pandas.)
a_masked = a[mask].ffill()
# Prepare the plot
fig, ax = plt.subplots()
line, = ax.plot(a_masked, 'b:o', lw = 1)
ax.plot(a[mask], color=line.get_color(), lw=1.5, marker = 'o')
ax.plot(a_masked[d], color=line.get_color(), lw=1.5, marker = 'o')
plt.show()

Matplotlib: Automatic coloured legend for all subplots using subplot line labels

The code below achieves what I want to do, but does so in a very roundabout way. I have looked around for a succinct way to produce a single legend for a figure that includes multiple subplots that takes into account their labels, to no avail. plt.figlegend() requires you to pass in labels and lines, and plt.legend() requires only handles (slightly better).
My example below illustrates what I want. I have 9 vectors, each with one of 3 categories. I want to plot each vector on a separate sub plot, label it, and plot a legend which indicates (using colour) what the label means; this is the automatic behaviour on a single plot.
Do you know of a better way of achieving the plot below?
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
nr_lines = 9
nr_cats = 3
np.random.seed(1337)
# Data
X = np.random.randn(nr_lines, 100)
labels = ['Category {}'.format(ii) for ii in range(nr_cats)]
y = np.random.choice(labels, nr_lines)
# Ideally wouldn't have to manually pick colours
clrs = matplotlib.rcParams['axes.prop_cycle'].by_key()['color']
clrs = [clrs[ii] for ii in range(nr_cats)]
lab_clr = {k: v for k, v in zip(labels, clrs)}
fig, ax = plt.subplots(3, 3)
ax = ax.flatten()
for ii in range(nr_lines):
    ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
lines = [a.lines[0] for a in ax]
l_labels = [l.get_label() for l in lines]
# the hack - get a single occurrence of each label
idx_list = [l_labels.index(lab) for lab in labels]
lines_ = [lines[idx] for idx in idx_list]
#l_labels_ = [l_labels[idx] for idx in idx_list]
plt.legend(handles=lines_, bbox_to_anchor=[2, 2.5])
plt.tight_layout()
plt.savefig('/home/james/Downloads/stack_figlegend_example.png',
bbox_inches='tight')
You could use a dictionary to collect them using the label as a key. For example:
handles = {}
for ii in range(nr_lines):
    l1, = ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
    if y[ii] not in handles:
        handles[y[ii]] = l1
plt.legend(handles=handles.values(), bbox_to_anchor=[2, 2.5])
You only add a handle to the dictionary if the category isn't already present.
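A related sketch, assuming the lines are already labelled when plotted: each Axes can report its own handles and labels through get_legend_handles_labels, and merging those into one dictionary removes duplicates without touching the plotting loop. The data below is synthetic, the categories are assigned deterministically so that every label appears, and the legend placement is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1337)
nr_lines, nr_cats = 9, 3
X = rng.standard_normal((nr_lines, 100))
labels = [f"Category {ii}" for ii in range(nr_cats)]
y = [labels[ii % nr_cats] for ii in range(nr_lines)]  # deterministic categories
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
lab_clr = dict(zip(labels, colors))  # zip stops after nr_cats colours

fig, axes = plt.subplots(3, 3)
for ax, row, lab in zip(axes.flat, X, y):
    ax.plot(row, label=lab, color=lab_clr[lab])

# Merge label -> handle over all subplots; a repeated label simply
# overwrites the earlier entry, leaving one handle per category.
by_label = {}
for ax in axes.flat:
    handles, labs = ax.get_legend_handles_labels()
    by_label.update(zip(labs, handles))

legend = fig.legend(list(by_label.values()), list(by_label.keys()),
                    loc="center right")
fig.tight_layout(rect=(0, 0, 0.8, 1))  # leave room on the right for the legend
```

Using fig.legend rather than plt.legend attaches the legend to the figure instead of the last subplot, so no large bbox_to_anchor offsets are needed.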

what is the corresponding matplotlib code of this matlab code

I'm trying to go away from matlab and use python + matplotlib instead. However, I haven't really figured out what the matplotlib equivalent of matlab 'handles' is. So here's some matlab code where I return the handles so that I can change certain properties. What is the exact equivalent of this code using matplotlib? I very often use the 'Tag' property of handles in matlab and use 'findobj' with it. Can this be done with matplotlib as well?
% create figure and return figure handle
h = figure();
% add a plot and tag it so we can find the handle later
plot(1:10, 1:10, 'Tag', 'dummy')
% add a legend
my_legend = legend('a line')
% change figure name
set(h, 'name', 'myfigure')
% find current axes
my_axis = gca();
% change xlimits
set(my_axis, 'XLim', [0 5])
% find the plot object generated above and modify YData
set(findobj('Tag', 'dummy'), 'YData', repmat(10, 1, 10))
There is a findobj method in matplotlib too:
import matplotlib.pyplot as plt
import numpy as np
h = plt.figure()
plt.plot(range(1,11), range(1,11), gid='dummy')
my_legend = plt.legend(['a line'])
plt.title('myfigure') # not sure if this is the same as set(h, 'name', 'myfigure')
my_axis = plt.gca()
my_axis.set_xlim(0,5)
for p in set(h.findobj(lambda x: x.get_gid()=='dummy')):
    p.set_ydata(np.ones(10)*10.0)
plt.show()
Note that the gid parameter in plt.plot is usually used by matplotlib (only) when the backend is set to 'svg'. It uses the gid as the id attribute of some grouping elements (like line2d, patch, text).
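A compact object-oriented sketch of the same idea follows, with two assumptions worth flagging: a string passed as num to plt.figure is used as the figure label and window title, which is probably closer to MATLAB's figure 'name' than an axes title, and findobj is given the Line2D class to narrow the search before filtering on the gid.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

# A string `num` becomes the figure label and the window title.
fig = plt.figure(num="myfigure")
ax = fig.add_subplot(111)
ax.plot(range(1, 11), range(1, 11), gid="dummy", label="a line")
ax.legend()
ax.set_xlim(0, 5)

# Restrict the search to Line2D artists, then filter on the gid "tag".
tagged = [line for line in fig.findobj(Line2D) if line.get_gid() == "dummy"]
for line in tagged:
    line.set_ydata(np.full(10, 10.0))
```

Keeping a reference to the object returned by ax.plot is usually simpler than searching for it later; findobj is mainly useful when the artist was created elsewhere.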
I have not used matlab, but I think this is what you want:
import matplotlib
import matplotlib.pyplot as plt
x = [1,3,4,5,6]
y = [1,9,16,25,36]
fig = plt.figure()
ax = fig.add_subplot(111) # add a plot
ax.set_title('y = x^2')
line1, = ax.plot(x, y, 'o-') # x and y are lists of equal size
y2 = [2 * v for v in y] # example replacement data (not defined in the original answer)
line1.set_ydata(y2) # use this to modify the y data
plt.show()
Of course, this is just a basic plot; there is more to it. Go through this to find the graph you want and view its source code.
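# imports assumed by the code below (the original snippet relied on a pylab-style namespace)
from matplotlib.pyplot import figure, plot, legend, gca, draw
from numpy import arange, ones
from matplotlib.lines import Line2D
# create figure and return figure handle
h = figure()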
# add a plot but tagging like matlab is not available here. But you can
# set one of the attributes to find it later. url seems harmless to modify.
# plot() returns a list of Line2D instances which you can store in a variable
p = plot(arange(1,11), arange(1,11), url='my_tag')
# add a legend
my_legend = legend(p,('a line',))
# you could also do
# p = plot(arange(1,11), arange(1,11), label='a line', url='my_tag')
# legend()
# or
# p[0].set_label('a line')
# legend()
# change figure name: not sure what this is for.
# set(h, 'name', 'myfigure')
# find current axes
my_axis = gca()
# change xlimits
my_axis.set_xlim(0, 5)
# You could compress the above two lines of code into:
# xlim(start, end)
# find the plot object generated above and modify YData
# findobj in matplotlib needs you to write a boolean function to
# match selection criteria.
# Here we use a lambda function to return only Line2D objects
# with the url property set to 'my_tag'
q = h.findobj(lambda x: isinstance(x, Line2D) and x.get_url() == 'my_tag')
# findobj returns duplicate objects in the list. We can take the first entry.
q[0].set_ydata(ones(10)*10.0)
# now refresh the figure
draw()
