Trend graph with Matplotlib

I have the following lists:
input = ['"25', '"500', '"10000', '"200000', '"1000000']
inComp = ['0.000001', '0.0110633', '4.1396405', '2569.270532', '49085.86398']
quickrComp=['0.0000001', '0.0003665', '0.005637', '0.1209121', '0.807273']
quickComp = ['0.000001', '0.0010253', '0.0318653', '0.8851902', '5.554448']
mergeComp = ['0.000224', '0.004089', '0.079448', '1.973014', '13.034443']
I need to create a trend graph to demonstrate the growth of the values of inComp, quickrComp, quickComp, mergeComp as the input values grow (input is the x-axis). I am using matplotlib.pyplot, and the following code:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(input,quickrComp, label="QR")
ax.plot(input,mergeComp, label="merge")
ax.plot(input, quickComp, label="Quick")
ax.plot(input, inComp, label="Insertion")
ax.legend()
plt.show()
However, the y-axis comes out disordered: the values of quickrComp are placed on the axis first, then all the mergeComp values, and so on. I need the y-axis to start at 0 and end at the highest value across the four lists. How can I do this?

Two things. First, your y-values are strings, so Matplotlib treats them as categories and places them on the axis in the order it encounters them; you need to convert the data to a numeric (float) type. Second, the y-values in one of the lists are huge compared to the other three, so you will have to switch the y-axis to a logarithmic scale to see the trend. You can, in principle, convert your x-values to numbers (integers) as well, but in your example you don't need to. If you want to do that, you will also have to remove the " from the front of each x-value.
A word of caution: don't give your variables the same names as built-in functions. In your case, you should rename input to something else, input1 for instance.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
input1 = ['"25', '"500', '"10000', '"200000', '"1000000']
inComp = ['0.000001', '0.0110633', '4.1396405', '2569.270532', '49085.86398']
quickrComp=['0.0000001', '0.0003665', '0.005637', '0.1209121', '0.807273']
quickComp = ['0.000001', '0.0010253', '0.0318653', '0.8851902', '5.554448']
mergeComp = ['0.000224', '0.004089', '0.079448', '1.973014', '13.034443']
ax.plot(input1, list(map(float, quickrComp)), label="QR")
ax.plot(input1, list(map(float, mergeComp)), label="merge")
ax.plot(input1, list(map(float, quickComp)), label="Quick")
ax.plot(input1, list(map(float, inComp)), label="Insertion")
ax.set_yscale('log')
ax.legend()
plt.show()
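If you do want a numeric x-axis, a minimal sketch of that conversion follows. It assumes the stray leading " is the only non-digit character in each x-value, and it puts both axes on a log scale because the input sizes also span several orders of magnitude. The names sizes_raw, sizes and series are made up for this example.

```python
import matplotlib.pyplot as plt

sizes_raw = ['"25', '"500', '"10000', '"200000', '"1000000']
series = {
    "QR": ['0.0000001', '0.0003665', '0.005637', '0.1209121', '0.807273'],
    "merge": ['0.000224', '0.004089', '0.079448', '1.973014', '13.034443'],
    "Quick": ['0.000001', '0.0010253', '0.0318653', '0.8851902', '5.554448'],
    "Insertion": ['0.000001', '0.0110633', '4.1396405', '2569.270532', '49085.86398'],
}

# Strip the stray leading quote and convert to int
# (assumption: there is no other junk in the strings).
sizes = [int(s.lstrip('"')) for s in sizes_raw]

fig, ax = plt.subplots()
for label, values in series.items():
    # Convert the string measurements to floats before plotting.
    ax.plot(sizes, [float(v) for v in values], marker="o", label=label)

# Both axes span several orders of magnitude, so use log scales.
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("input size")
ax.legend()
```

Call plt.show() (or fig.savefig) afterwards to see the result. On a log-log plot, the different growth rates show up as lines with different slopes.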

Related

Highlight part of scatter plot containing specific points in python

I am trying to create a Manhattan plot that will be vertically highlighted at certain parts of the plot given a list of values corresponding to points in the scatter plot. I looked at several examples but I am not sure how to proceed. I think using axvspan or ax.fill_between should work but I am not sure how. The code below was lifted directly from
How to create a Manhattan plot with matplotlib in python?
from pandas import DataFrame
from scipy.stats import uniform
from scipy.stats import randint
import numpy as np
import matplotlib.pyplot as plt
# some sample data
df = DataFrame({'gene' : ['gene-%i' % i for i in np.arange(10000)],
                'pvalue' : uniform.rvs(size=10000),
                'chromosome' : ['ch-%i' % i for i in randint.rvs(0,12,size=10000)]})
# -log_10(pvalue)
df['minuslog10pvalue'] = -np.log10(df.pvalue)
df.chromosome = df.chromosome.astype('category')
df.chromosome = df.chromosome.cat.set_categories(['ch-%i' % i for i in range(12)], ordered=True)
df = df.sort_values('chromosome')
# How to plot gene vs. -log10(pvalue) and colour it by chromosome?
df['ind'] = range(len(df))
df_grouped = df.groupby(('chromosome'))
fig = plt.figure()
ax = fig.add_subplot(111)
colors = ['red','green','blue', 'yellow']
x_labels = []
x_labels_pos = []
for num, (name, group) in enumerate(df_grouped):
    group.plot(kind='scatter', x='ind', y='minuslog10pvalue',color=colors[num % len(colors)], ax=ax)
    x_labels.append(name)
    x_labels_pos.append((group['ind'].iloc[-1] - (group['ind'].iloc[-1] - group['ind'].iloc[0])/2))
ax.set_xticks(x_labels_pos)
ax.set_xticklabels(x_labels)
ax.set_xlim([0, len(df)])
ax.set_ylim([0, 3.5])
ax.set_xlabel('Chromosome')
Given a list of values (p-values) for the points to highlight, e.g.
lst = [0.288686, 0.242591, 0.095959, 3.291343, 1.526353]
how do I highlight (for example in green) the regions of the plot that contain these points?

It would help if you included a sample of your dataframe for reference.
Assuming you want to match your lst values against the Y values, you need to go through each Y value you're plotting and check whether it is in lst.
In your loop,
for num, (name, group) in enumerate(df_grouped):
each group variable is essentially a partial dataframe of your main dataframe, df. Hence, you need another loop inside it that looks through the Y values for matches with lst:
region_plot = []
for num, (name, group) in enumerate(df_grouped):
    group.plot(kind='scatter', x='ind', y='minuslog10pvalue',color=colors[num % len(colors)], ax=ax)
    # create a new df with only the rows whose values match lst
    temp_group = group[group['minuslog10pvalue'].isin(lst)]
    for x_group in temp_group['ind']:
        # make sure the same region is not highlighted again
        if x_group not in region_plot:
            region_plot.append(x_group)
            ax.axvspan(x_group, x_group+1, alpha=0.5, color='green')
            # I put x_group+1 because I'm not sure how big of a highlight range you want
Hope this helps!
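One caveat with the approach above: isin compares floats exactly, so values printed to six decimals (like those in lst) will usually not match the full-precision values in the dataframe. The sketch below uses numpy.isclose with a tolerance instead. The data, the tolerance atol and the half-width pad of the highlighted band are all assumptions made for this example, not values from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-in for df['ind'] and df['minuslog10pvalue'].
rng = np.random.default_rng(0)
ind = np.arange(200)
y = -np.log10(rng.uniform(size=200))

# Values to highlight, rounded the way they might have been printed.
lst = np.round(y[[10, 50, 120]], 6)

# Compare every y against every entry of lst within an absolute
# tolerance (assumed: 1e-6, matching the six printed decimals).
atol = 1e-6
hits = np.isclose(y[:, None], lst[None, :], rtol=0, atol=atol).any(axis=1)

fig, ax = plt.subplots()
ax.scatter(ind, y, s=8)

pad = 2  # half-width of each highlighted band in x units (assumed)
for x in ind[hits]:
    ax.axvspan(x - pad, x + pad, alpha=0.3, color="green")
```

With the dataframe from the question, ind and y would be df['ind'].to_numpy() and df['minuslog10pvalue'].to_numpy(). Overlapping bands simply stack, so nearby hits show up as one darker region.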

Matplotlib plot already binned data

I want to plot the mean local binary patterns histograms of a set of images. Here is what I did:
#calculates the lbp
lbp = feature.local_binary_pattern(image, 24, 8, method="uniform")
#Now I calculate the histogram of LBP Patterns
(hist, _) = np.histogram(lbp.ravel(), bins=np.arange(0, 27))
After that I simply sum up all the LBP histograms and take the mean of them. These are the values found, which are saved in a txt file:
2.962000000000000000e+03
1.476000000000000000e+03
1.128000000000000000e+03
1.164000000000000000e+03
1.282000000000000000e+03
1.661000000000000000e+03
2.253000000000000000e+03
3.378000000000000000e+03
4.490000000000000000e+03
5.010000000000000000e+03
4.337000000000000000e+03
3.222000000000000000e+03
2.460000000000000000e+03
2.495000000000000000e+03
2.599000000000000000e+03
2.934000000000000000e+03
2.526000000000000000e+03
1.971000000000000000e+03
1.303000000000000000e+03
9.900000000000000000e+02
7.980000000000000000e+02
8.680000000000000000e+02
1.119000000000000000e+03
1.479000000000000000e+03
4.355000000000000000e+03
3.112600000000000000e+04
I am trying to simply plot these values (don't need to calculate the histogram, because the values are already from a histogram). Here is what I've tried:
import matplotlib
matplotlib.use('Agg')
import numpy as np
import matplotlib.pyplot as plt
import plotly.plotly as py
#load data
data=np.loadtxt('original_dataset1.txt')
#convert to float
data=data.astype('float32')
#define number of Bins
n_bins = data.max() + 1
plt.style.use("ggplot")
(fig, ax) = plt.subplots()
fig.suptitle("Local Binary Patterns")
plt.ylabel("Frequency")
plt.xlabel("LBP value")
plt.bar(n_bins, data)
fig.savefig('lbp_histogram.png')
However, look at the figure these commands produce:
I still don't understand what is happening. I would like to make a figure like the one I produced in Excel using the same data, as follows:
I must confess that I am quite a rookie with matplotlib. So, what was my mistake?

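The first argument of plt.bar is the x position of each bar, but n_bins = data.max() + 1 is a single number, so all your bars are drawn at the same x position. Pass one position per value instead. Try this; here array holds the mean values from your bins (only the first few are shown).
import numpy as np
import matplotlib.pyplot as plt
array = [2962,1476,1128,1164,1282,1661,2253]
fig, ax = plt.subplots(nrows=1, ncols=1)
# one bar per value, at x = 1, 2, 3, ...
ax.bar(np.arange(len(array)) + 1, array, color='orangered')
ax.grid(axis='y')
# write each value above its bar
for i, v in enumerate(array):
    ax.text(i+1, v, str(v), color='black', fontweight='bold',
            verticalalignment='bottom', horizontalalignment='center')
plt.savefig('savefig.png', dpi=150)
The plot looks like this.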
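Applied to the full set of 26 values from the question, the same idea looks like the sketch below. The values are typed in directly so the example runs on its own; with the real file you would load them with np.loadtxt as in the question. The bars are placed at 0..25 on the assumption that bin i of the histogram corresponds to LBP value i, which is what bins=np.arange(0, 27) produces.

```python
import numpy as np
import matplotlib.pyplot as plt

# The 26 mean histogram values listed in the question.
data = np.array([
    2962, 1476, 1128, 1164, 1282, 1661, 2253, 3378, 4490, 5010,
    4337, 3222, 2460, 2495, 2599, 2934, 2526, 1971, 1303, 990,
    798, 868, 1119, 1479, 4355, 31126,
], dtype=float)

# One x position per bin: 0, 1, ..., 25.
x = np.arange(len(data))

fig, ax = plt.subplots()
ax.bar(x, data, width=0.8)
ax.set_title("Local Binary Patterns")
ax.set_xlabel("LBP value")
ax.set_ylabel("Frequency")
ax.set_xticks(x)
```

Save the result with fig.savefig('lbp_histogram.png') as in the question. The last bin is much taller than the rest, so ax.set_yscale('log') may be worth trying if the smaller bars are hard to read.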

Replacing part of a plot with a dotted line

I would like to replace part of my plot where the function dips down to '-1' with a dashed line carrying on from the previous point (see plots below).
Here's some code I've written, along with its output:
import numpy as np
import matplotlib.pyplot as plt
y = [5,6,8,3,5,7,3,6,-1,3,8,5]
plt.plot(np.linspace(1,12,12),y,'r-o')
plt.show()
for i in range(1,len(y)):
    if y[i]!=-1:
        plt.plot(np.linspace(i-1,i,2),y[i-1:i+1],'r-o')
    else:
        y[i]=y[i-1]
        plt.plot(np.linspace(i-1,i,2),y[i-1:i+1],'r--o')
plt.ylim(-1,9)
plt.show()
Here's the original plot
Modified plot:
The code I've written works (it produces the desired output), but it's inefficient and takes a long time when I actually run it on my (much larger) dataset. Is there a smarter way to go about doing this?

You can achieve something similar without the loops:
import pandas as pd
import matplotlib.pyplot as plt
# Create a data frame from the list
a = pd.DataFrame([5,6,-1,-1, 8,3,5,7,3,6,-1,3,8,5])
# Prepare a boolean mask
mask = a > 0
# New data frame with missing values filled with the last element of
# the previous segment. Use bfill() instead to take the first element
# of the next segment. (fillna(method='ffill') is deprecated in
# recent pandas versions.)
a_masked = a[mask].ffill()
# Prepare the plot
fig, ax = plt.subplots()
line, = ax.plot(a_masked, ls = '--', lw = 1)
ax.plot(a[mask], color=line.get_color(), lw=1.5, marker = 'o')
plt.show()
You can also highlight the negative regions by choosing a different colour for the lines:
My answer is based on a great post from July, 2017. The latter also tackles the case when the first element is NaN or in your case a negative number:
Dotted lines instead of a missing value in matplotlib

I would use numpy functionality to cut your line into segments and then plot all solid and dashed lines separately. In the example below I added two additional -1s to your data to see that this works universally.
import numpy as np
import matplotlib.pyplot as plt
Y = np.array([5,6,-1,-1, 8,3,5,7,3,6,-1,3,8,5])
X = np.arange(len(Y))
idxs = np.where(Y==-1)[0]
sub_y = np.split(Y,idxs)
sub_x = np.split(X,idxs)
fig, ax = plt.subplots()
##replacing -1 values and plotting dotted lines
for i in range(1,len(sub_y)):
    val = sub_y[i-1][-1]
    sub_y[i][0] = val
    ax.plot([sub_x[i-1][-1], sub_x[i][0]], [val, val], 'r--')
##plotting rest
for x,y in zip(sub_x, sub_y):
    ax.plot(x, y, 'r-o')
plt.show()
The result looks like this:
Note, however, that this will fail if the first value is -1, as then your problem is not well defined (no previous value to copy from). Hope this helps.
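If the loop over segments is itself too slow on a large dataset, the dashed/solid split can also be done with just two plot calls. The sketch below forward-fills the -1 entries in NumPy by running np.maximum.accumulate over the indices of the valid points; like the code above, it assumes that the first value is not -1. The names valid, last_valid, filled and solid are made up for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

Y = np.array([5, 6, -1, -1, 8, 3, 5, 7, 3, 6, -1, 3, 8, 5], dtype=float)
X = np.arange(len(Y))

valid = Y != -1

# For every position, the index of the most recent valid point
# (assumption: Y[0] is valid, otherwise there is nothing to copy).
last_valid = np.maximum.accumulate(np.where(valid, X, 0))
filled = Y[last_valid]

# NaN breaks a line, so the solid curve skips the -1 positions.
solid = np.where(valid, Y, np.nan)

fig, ax = plt.subplots()
ax.plot(X, filled, "r--")  # dashed line, visible only across the gaps
ax.plot(X, solid, "r-o")   # solid line and markers on real data
```

One difference from the loop version: the segment leaving a gap (from the last filled point to the next real value) is drawn dashed here rather than solid, because the solid curve is interrupted on both sides of each gap.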

Not too elegant, but here's something I came up with (based on the above answers) that doesn't use loops and works. @KRKirov and @Thomas Kühn, thank you for your answers, I really appreciate them.
import pandas as pd
import matplotlib.pyplot as plt
# Create a data frame from the list
a = pd.DataFrame([5,6,-1,-1, 8,3,5,7,3,6,-1,3,8,5])
b=a.copy()
b[2]=b[0].shift(1,axis=0)
b[4]=(b[0]!=-1) & (b[2]==-1)
b[5]=b[4].shift(-1,axis=0)
b[0] = (b[5] | b[4])
c=b[0]
d=pd.DataFrame(c)
# Prepare a boolean mask
mask = a > 0
# New data frame with missing values filled with the last element of
# the previous segment. Use bfill() instead to take the first element
# of the next segment. (fillna(method='ffill') is deprecated in
# recent pandas versions.)
a_masked = a[mask].ffill()
# Prepare the plot
fig, ax = plt.subplots()
line, = ax.plot(a_masked, 'b:o', lw = 1)
ax.plot(a[mask], color=line.get_color(), lw=1.5, marker = 'o')
ax.plot(a_masked[d], color=line.get_color(), lw=1.5, marker = 'o')
plt.show()

Overlapping boxplots in python

I have the following dataframe:
Av_Temp Tot_Precip
278.001 0
274 0.0751864
270.294 0.631634
271.526 0.229285
272.246 0.0652201
273 0.0840059
270.463 0.0602944
269.983 0.103563
268.774 0.0694555
269.529 0.010908
270.062 0.043915
271.982 0.0295718
and want to plot a boxplot where the x-axis is 'Av_Temp' divided into equal-sized bins (say 2 in this case), and the y-axis shows the corresponding range of values for Tot_Precip. I have the following code (thanks to "Find pandas quartiles based on another column"); however, when I plot the boxplots, they are drawn one on top of another. Any suggestions?
expl_var = 'Av_Temp'
cname = 'Tot_Precip'
df[expl_var+'_Deciles'] = pandas.qcut(df[expl_var], 2)
grp_df = df.groupby(expl_var+'_Deciles').apply(lambda x: numpy.array(x[cname]))
fig, ax = plt.subplots()
for i in range(len(grp_df)):
    box_arr = grp_df[i]
    box_arr = box_arr[~numpy.isnan(box_arr)]
    stats = cbook.boxplot_stats(box_arr, labels = str(i))
    ax.bxp(stats)
ax.set_yscale('log')
plt.show()

Since you're using pandas already, why not use the boxplot method on dataframes?
expl_var = 'Av_Temp'
cname = 'Tot_Precip'
df[expl_var+'_Deciles'] = pandas.qcut(df[expl_var], 2)
ax = df.boxplot(by='Av_Temp_Deciles', column='Tot_Precip')
ax.set_yscale('log')
That produces this: http://i.stack.imgur.com/20KPx.png
If you don't like the labels, throw in a
plt.xlabel('');plt.suptitle('');plt.title('')
If you want a standard boxplot, the above should be fine. My understanding of the separation of boxplot into boxplot_stats and bxp is to allow you to modify or replace the stats generated and fed to the plotting routine. See https://github.com/matplotlib/matplotlib/pull/2643 for some details.
If you need to draw a boxplot with non-standard stats, you can use boxplot_stats on 2D numpy arrays, so you only need to call it once. No loops required.
expl_var = 'Av_Temp'
cname = 'Tot_Precip'
df[expl_var+'_Deciles'] = pandas.qcut(df[expl_var], 2)
# I moved your nan check into the df apply function
grp_df = df.groupby('Av_Temp_Deciles').apply(lambda x: numpy.array(x[cname][~numpy.isnan(x[cname])]))
# boxplot_stats can take a 2D numpy array of data, and a 1D array of labels
# stats is now a list of dictionaries of stats, one dictionary per quantile
stats = cbook.boxplot_stats(grp_df.values, labels=grp_df.index)
# now it's a one-shot plot, no loops
fig, ax = plt.subplots()
ax.bxp(stats)
ax.set_yscale('log')

Formatting X axis with dates format Matplotlib

I have written code which plots the past seven day stock value for a user-determined stock market over time.
The problem I have is that I want to format the x axis in a YYMMDD format.
I also don't understand what 2.014041e7 means at the end of the x axis.
Values for x are:
20140421.0, 20140417.0, 20140416.0, 20140415.0, 20140414.0, 20140411.0, 20140410.0
Values for y are:
531.17, 524.94, 519.01, 517.96, 521.68, 519.61, 523.48
My code is as follows:
mini = min(y)
maxi = max(y)
minimum = mini - 75
maximum = maxi + 75
mini2 = int(min(x))
maxi2 = int(max(x))
plt.close('all')
fig, ax = plt.subplots(1)
pylab.ylim([minimum,maximum])
pylab.xlim([mini2,maxi2])
ax.plot(x, y)
ax.plot(x, y,'ro')
ax.plot(x, m*x + c)
ax.grid()
ax.plot()

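When you plot your data this way, you are simply plotting your y data against numbers (floats) in x, such as 20140421.0 (which I assume is meant to be the date 21/04/2014). The 2.014041e7 at the end of the x-axis is an offset: because the x-values are large and close together, Matplotlib labels the ticks relative to that number instead of printing each value in full.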
You need to convert your data from these floats into an appropriate format for matplotlib to understand. The code below takes your two lists (x, y) and converts them.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
# Original data
raw_x = [20140421.0, 20140417.0, 20140416.0, 20140415.0, 20140414.0, 20140411.0, 20140410.0]
y = [531.17, 524.94, 519.01, 517.96, 521.68, 519.61, 523.48]
# Convert your x-data into an appropriate format.
# date_fmt is a string giving the correct format for your data. In this case
# we are using 'YYYYMMDD.0' as your dates are actually floats.
date_fmt = '%Y%m%d.0'
# Use a list comprehension to convert your dates into datetime objects.
# In the list comp. strptime is used to convert from a string to a datetime
# object.
dt_x = [dt.datetime.strptime(str(i), date_fmt) for i in raw_x]
# Finally we convert the datetime objects into the format used by matplotlib
# in plotting using matplotlib.dates.date2num
x = [mdates.date2num(i) for i in dt_x]
# Now to actually plot your data.
fig, ax = plt.subplots()
# Use plot_date rather than plot when dealing with time data.
ax.plot_date(x, y, 'bo-')
# Create a DateFormatter object which will format your tick labels properly.
# As given in your question I have chosen "YYMMDD"
date_formatter = mdates.DateFormatter('%y%m%d')
# Set the major tick formatter to use your date formatter.
ax.xaxis.set_major_formatter(date_formatter)
# This simply rotates the x-axis tick labels slightly so they fit nicely.
fig.autofmt_xdate()
plt.show()
The code is commented throughout, so it should be self-explanatory. Details on the various modules can be found below:
datetime
matplotlib.dates
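Matplotlib can also plot datetime objects directly with ax.plot, so the date2num step and plot_date are optional (newer Matplotlib documentation discourages plot_date in favor of plot). The sketch below is one way to write that; it converts each float to an int before parsing so the format string does not need the trailing .0. The variable name dates and the choice of one tick per day are assumptions made for this example.

```python
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

raw_x = [20140421.0, 20140417.0, 20140416.0, 20140415.0,
         20140414.0, 20140411.0, 20140410.0]
y = [531.17, 524.94, 519.01, 517.96, 521.68, 519.61, 523.48]

# 20140421.0 -> "20140421" -> datetime(2014, 4, 21)
dates = [dt.datetime.strptime(str(int(v)), "%Y%m%d") for v in raw_x]

fig, ax = plt.subplots()
ax.plot(dates, y, "bo-")

# One tick per day (assumed), labelled as YYMMDD.
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%y%m%d"))

# Rotate the tick labels so they do not overlap.
fig.autofmt_xdate()
```

Because the x-axis now holds real dates, the days without data between 17 and 21 April appear as a gap, rather than being squeezed together as they were with the raw floats.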
