How to plot one to many plot using matplotlib - python

Update: I removed the screenshot. Below is the code from the screenshot:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(5)
y = np.array([[1,2],[1,2,3],[1,2,3,4],[1,2,3,4,5],[1,2,5,7,9]])
plt.plot(x,y) #gives ValueError: setting an array element with a sequence.
#A realistic example
age = [20,30,40,50,60]
salary = np.array([[200,350,414],[300,500,612,700],[500,819],[900,1012],[812,712]])
plt.plot(age,salary) #gives ValueError: setting an array element with a sequence.
I have two arrays, each of size 5. The elements of y are themselves arrays, and I want them plotted against the corresponding x. For example, at x = 0 I want to plot all the points from y[0]. Is there a way to do this?
Update: I added another example above to show a realistic case, where I need to plot the salaries of people of different ages; each age can have more than one salary.

List comprehension to the rescue!
import numpy as np
import matplotlib.pyplot as plt
age = [20,30,40,50,60]
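#dtype=object keeps the ragged rows as lists; NumPy 1.24+ raises a ValueError without it
salary = np.array([[200,350,414],[300,500,612,700],[500,819],[900,1012],[812,712]], dtype=object)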
#creating x-y tuples
xy = [(k, j) for i, k in enumerate(age) for j in salary[i]]
#unpacking the tuples with zip
plt.scatter(*zip(*xy))
plt.show()
However, irregular numpy arrays should not be created, and this example works perfectly well with a normal list. Just saying.
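As a rough alternative to building the tuples one by one, the ragged data can be flattened with NumPy. This is only a sketch using the same age and salary values as above, kept as plain lists; the names counts, x_flat and y_flat are made up for illustration, and the Agg backend line is there only so the snippet runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; remove this line to get an interactive window
import matplotlib.pyplot as plt

age = [20, 30, 40, 50, 60]
salary = [[200, 350, 414], [300, 500, 612, 700], [500, 819], [900, 1012], [812, 712]]

# Repeat each age once per salary recorded for it
counts = [len(s) for s in salary]
x_flat = np.repeat(age, counts)

# Join the ragged rows into one flat array of salaries
y_flat = np.concatenate(salary)

fig, ax = plt.subplots()
ax.scatter(x_flat, y_flat)
ax.set_xlabel("age")
ax.set_ylabel("salary")
# In an interactive session, call plt.show() here
```

Both flat arrays have one entry per (age, salary) pair, so a single scatter call draws every point.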

For now I am using the following workaround, but I am looking for a simpler solution:
indx = -1
for a in age:
    indx+=1
    for s in salary[indx]:
        plt.plot(a,s,'o')
plt.show()

Related

MatplotLib.pyplot.scatter not plotting normally when a new list added to the array

I was working with NumPy and Pandas to create some artificial data for testing models.
First, I coded this:
# Constructing some random data for experiments
import math
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
np.random.seed(42)
# Rectangular Data
total_n = 500
x = np.random.rand(total_n)*10
y = np.random.rand(total_n)*10
divider = 260
# Two lambda functions are for shifting the data, the numbers are chosen arbitrarily
f = lambda a: a*2
x[divider:] = f(x[divider:])
y[divider:] = f(y[divider:])
g = lambda a: a*3 + 5
x[:divider] = g(x[:divider])
y[:divider] = g(y[:divider])
# Colours array for separating the data
colors = ['blue']*divider + ['red']*(total_n-divider)
squares = np.array([x,y])
plt.scatter(squares[0],squares[1], c=colors, alpha=0.5)
I got what I wanted:
The Data I wanted
But I wanted to add the colors array to the NumPy array, to use it as a label variable, so I changed the code to this:
# Constructing some random data for experiments
import math
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
np.random.seed(42)
# Rectangular Data
total_n = 500
x = np.random.rand(total_n)*10
y = np.random.rand(total_n)*10
divider = 260
# Two lambda functions are for shifting the data, the numbers are chosen arbitrarily
f = lambda a: a*2
x[divider:] = f(x[divider:])
y[divider:] = f(y[divider:])
g = lambda a: a*3 + 5
x[:divider] = g(x[:divider])
y[:divider] = g(y[:divider])
# Colours array for separating the data
colors = ['blue']*divider + ['red']*(total_n-divider)
squares = np.array([x,y,colors])
plt.scatter(squares[0],squares[1], c=colors, alpha=0.5)
And everything just blows out:
The Blown out Data
I worked around this by keeping the labels separate from the NumPy array. But what is actually going on here?
Alright, so I think I have the answer. A NumPy array can hold only one data type, which is inferred when the array is created if it is not given explicitly. When you create squares with colors in it, squares.dtype is '<U32', which means that all values, including the numbers, are converted to strings of up to 32 characters (stored little-endian).
To avoid that you can:
use a simple list
use a pandas dataframe, as they accept columns of different types
if you want to use numpy, you can use a structured array as follows:
#the input must be a list of tuples, one per row; zip builds them
zipped = list(zip(x, y, colors))
#one field per column; 'U10' is a string of up to 10 characters
dtype = np.dtype([('x', float), ('y', float), ('colors', 'U10')])
#create the array, specifying the dtype explicitly
squares = np.array(zipped, dtype=dtype)
#when plotting, select each column by field name, just as in a dataframe
plt.scatter(squares["x"],squares["y"], c=squares["colors"], alpha=0.5)
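To see the type coercion described above in isolation, here is a small sketch with made-up values (the names mixed and x_back are only for illustration). It uses plain lists rather than the random arrays from the question, and it assumes nothing beyond NumPy itself.

```python
import numpy as np

x = [1.5, 2.5]
y = [3.0, 4.0]
colors = ["blue", "red"]

# Mixing numbers and strings forces a single string dtype for the whole array
mixed = np.array([x, y, colors])
print(mixed.dtype)   # a Unicode string dtype
print(mixed[0])      # the numbers are now strings

# Converting a row back to float recovers usable coordinates
x_back = mixed[0].astype(float)
```

Passing string rows like mixed[0] to plt.scatter is the likely reason the plot looks wrong, since the coordinates are no longer numeric.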

Simulate the compound random variable S

Let S = X_1 + X_2 + ... + X_N, where N is a nonnegative integer-valued random variable and X_1, X_2, ... are i.i.d. random variables. (If N = 0, we set S = 0.)
Simulate S in the case where N ~ Poi(100) and X_i ~ Exp(0.5). (Draw histograms and use the numpy or scipy built-in functions.) Also check the equations E(S) = E(N)*E(X_1) and Var(S) = E(N)*Var(X_1) + E(X_1)^2 * Var(N).
I tried to solve it, but I'm not sure about everything yet and I got stuck on the histogram part. Note: I'm new to Python and, more generally, new to programming.
My work:
import scipy.stats as stats
import matplotlib as plt
N = stats.poisson(100)
X = stats.expon(0.5)
arr = X.rvs(N.rvs())
S = 0
for i in arr:
    S=S+i
print(arr)
print("S=",S)
expected_S = (N.mean())*(X.mean())
variance_S = (N.mean()*X.var()) + (X.mean()*X.mean()*N.var())
print("E(S)=",expected_S)
print("Var(S)=",variance_S)
Your existing code mostly looks sensible, but I'd simplify:
arr = X.rvs(N.rvs())
S = 0
for i in arr:
    S=S+i
down to:
S = X.rvs(N.rvs()).sum()
To draw a histogram, you need many samples from this distribution, which is now easily accomplished via:
arr = []
for _ in range(10_000):
    arr.append(X.rvs(N.rvs()).sum())
or, equivalently, using a list comprehension:
arr = [X.rvs(N.rvs()).sum() for _ in range(10_000)]
To plot these in a histogram, you need the pyplot module from Matplotlib, so your import should be:
import matplotlib.pyplot as plt
plt.hist(arr, 50)
The 50 above says to use that number of "bins" when drawing the histogram. We can also compare these to the mean and variance you calculated by assuming the distribution is well approximated by a normal:
import numpy as np
approx = stats.norm(expected_S, np.sqrt(variance_S))
_, x, _ = plt.hist(arr, 50, density=True)
plt.plot(x, approx.pdf(x))
This works because the second value returned by Matplotlib's hist function is the array of bin edges. I used density=True so I could work with probability densities; another option would be to multiply the densities by the number of samples and the bin width to get expected counts, as in the previous histogram.
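For checking the two moment equations numerically, a vectorized NumPy sketch may help. It rests on an assumption: Exp(0.5) is read as rate 0.5, which corresponds to scale = 1/0.5 = 2 (note that in SciPy the first positional argument of stats.expon is loc, a shift, not the rate). The names lam, rate, n_samples, ids and the seed are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

lam = 100.0          # N ~ Poi(lam), so E(N) = Var(N) = lam
rate = 0.5           # assumption: Exp(0.5) means rate 0.5
scale = 1.0 / rate   # E(X_1) = scale, Var(X_1) = scale**2

n_samples = 20_000
counts = rng.poisson(lam, size=n_samples)           # one N per simulated S
draws = rng.exponential(scale, size=counts.sum())   # all X_i drawn in one call

# Label each draw with the sample it belongs to, then sum per sample
ids = np.repeat(np.arange(n_samples), counts)
S = np.bincount(ids, weights=draws, minlength=n_samples)

expected_S = lam * scale                         # E(N) * E(X_1)
variance_S = lam * scale**2 + scale**2 * lam     # E(N)*Var(X_1) + E(X_1)^2*Var(N)

print("sample mean:", S.mean(), "theory:", expected_S)
print("sample var: ", S.var(), "theory:", variance_S)
```

The array S can be passed to plt.hist in the same way as arr above; the sample mean and variance should land close to the theoretical values, with some random scatter.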

I am beginner, and have a question related to plotting in Python

I am new to Python.
I would like to know the syntax for the following problem.
Suppose I want to plot a quantity x = (a constant with a fixed, given value) * ln(1+z) versus z (which varies from c to d).
How do I define the variables x and z, and how do I write the 'ln' function?
I have imported numpy, scipy and matplotlib, but I do not know how to proceed from there.
Since you already imported numpy, here is just another answer:
import numpy as np
import matplotlib.pyplot as plt
x_coeff = 10
c = 0
d = 100
z = list(range(c, d))
x = [x_coeff * np.log(1+v) for v in z]
plt.plot(z, x)
plt.show()
It's always better to check the documentation and show your first attempt:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.log.html
You might also need to understand list comprehensions.
They are a concise and convenient way to create lists in Python.
To plot a curve, you need two lists: one holding the domain values for the x-axis and the other holding the corresponding values for the y-axis. First, read the constant with Python's built-in input function and convert it to an int; then use the log function from the math library to compute the logarithm.
import math
import matplotlib.pyplot as plt
a = int(input("enter a value for constant : "))
c,d = 0,100
xvals = list(range(c,d,1)) # start,end,step
print(xvals)
yvals = [a*math.log(1+x) for x in xvals]
print(yvals)
plt.plot(xvals,yvals)
plt.show()
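Since numpy is already imported, the same curve can also be computed without a loop. The following is a sketch with assumed values for the constant and the range (a, c and d are placeholders, as is the choice of 500 points); np.log1p(z) computes ln(1 + z), and the Agg backend line is there only so the snippet runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; remove this line to get an interactive window
import matplotlib.pyplot as plt

a = 10.0            # the fixed constant (assumed value)
c, d = 0.0, 100.0   # range of z (assumed values)

z = np.linspace(c, d, 500)   # 500 evenly spaced points from c to d
x = a * np.log1p(z)          # a * ln(1 + z), evaluated for the whole array at once

fig, ax = plt.subplots()
ax.plot(z, x)
ax.set_xlabel("z")
ax.set_ylabel("x = a * ln(1 + z)")
# In an interactive session, call plt.show() here
```

Using np.linspace gives a smooth curve even when c and d are not integers, which range cannot handle.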

Matplotlib plot pmf from list of 2D numpy arrays

I have a dataset from my simulations where I combine the results from each simulation seed into a bigger list using bl.extend(df['column'].tolist()).
I'm also running several simulation scenarios, so I append each scenario to a list of lists.
Finally, I'm computing the Probability Mass Function (PMF) of each list as follows (from How to plot a PMF of a sample?)
for idx,sublist in enumerate(pmf_list):
    val, cnt = np.unique(sublist, return_counts=True)
    pmf = cnt / float(len(sublist))
    plot_pmf.append(np.column_stack((val, pmf)))
The issue is that I end up with a list of numpy arrays which I don't know how to plot. The minimum code to reproduce the problem is the following:
import numpy as np
list1 = np.empty([2, 2])
list2 = np.empty([2, 2])
list3 = np.empty([2, 2])
bl = [] # big list
bl.append(list1)
bl.append(list2)
bl.append(list3)
print(bl)
I can plot using plt.hist(bl[0]) but it doesn't give me the right results. See plot attached for the following list.
<type 'numpy.ndarray'>
[[0.00000000e+00 1.91734780e-01]
[1.00000000e+00 2.94277080e-02]
[2.00000000e+00 3.28276369e-01]
[3.00000000e+00 4.43357154e-01]
[4.00000000e+00 3.54294582e-03]
[5.00000000e+00 1.57306794e-03]
[6.00000000e+00 2.00530733e-03]
[7.00000000e+00 2.95245485e-05]
[8.00000000e+00 2.24386568e-05]
[9.00000000e+00 2.83435665e-05]
[1.00000000e+01 1.18098194e-06]
[1.20000000e+01 1.18098194e-06]]
Formatting the y-values I get:
0.1944084241
0.0415880165
0.3480178394
0.4031723062
0.0050902199
0.0033411939
0.0040175705
0.0001480127
0.0001031961
0.0001008373
0.0000058969
0.0000011794
0.0000047175
0.0000005897
very different from the y-values on the histogram plot
Does the following graph look right?
import matplotlib.pyplot as plt
import numpy as np
X = np.array([[0.00000000e+00, 1.91734780e-01],
[1.00000000e+00, 2.94277080e-02],
[2.00000000e+00, 3.28276369e-01],
[3.00000000e+00, 4.43357154e-01],
[4.00000000e+00, 3.54294582e-03],
[5.00000000e+00, 1.57306794e-03],
[6.00000000e+00, 2.00530733e-03],
[7.00000000e+00, 2.95245485e-05],
[8.00000000e+00, 2.24386568e-05],
[9.00000000e+00, 2.83435665e-05],
[1.00000000e+01, 1.18098194e-06],
[1.20000000e+01, 1.18098194e-06],])
plt.bar(x=X[:, 0], height=X[:, 1])
plt.show()
If you already have the first column as the possible values of the random variable, and the second column as the corresponding probability values, you could use a bar plot to visualize the PMF.
The histogram plot function plt.hist is for a vector of observed values. For example,
import matplotlib.pyplot as plt
import numpy as np
# %matplotlib inline is an IPython/Jupyter magic; drop it when running as a plain script
%matplotlib inline
np.random.seed(0)
plt.hist(np.random.normal(size=1000))
plt.show()
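Putting the two ideas together for a list of scenarios, one possible sketch is below. The scenario data is synthetic (Poisson draws chosen purely for illustration), and empirical_pmf, scenarios and pmfs are hypothetical names; the PMF itself is computed with np.unique as in the question. The Agg backend line is there only so the snippet runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; remove this line to get an interactive window
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Synthetic stand-ins for the per-scenario lists of observed values
scenarios = [rng.poisson(2.0, size=1000), rng.poisson(4.0, size=1000)]

def empirical_pmf(samples):
    """Return a 2-column array: observed value, relative frequency."""
    values, counts = np.unique(samples, return_counts=True)
    return np.column_stack((values, counts / counts.sum()))

pmfs = [empirical_pmf(s) for s in scenarios]

# One bar plot per scenario, sharing the probability axis
fig, axes = plt.subplots(1, len(pmfs), sharey=True)
for ax, pmf, label in zip(axes, pmfs, ["scenario 0", "scenario 1"]):
    ax.bar(pmf[:, 0], pmf[:, 1])
    ax.set_title(label)
    ax.set_xlabel("value")
axes[0].set_ylabel("probability")
# In an interactive session, call plt.show() here
```

Each entry of pmfs has the same two-column layout as the arrays in the question, so the loop would work unchanged on the real list.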

Slicing a graph

I have created a graph in Python, but I now need to take a section of the graph and expand it by using a small range of the original data. I don't know how to find the row numbers of the results that form the range, or how to create a graph using just these results from the file. This is the code I have for the graph:
import numpy as np
import matplotlib.pyplot as plt
#variable for data to plot
spec_to_plot = "SN2012fr_20121129.42_wifes_BR.dat"
#tells python where to look for the file
spec_directory = '/home/fh1u16/Documents/spectra/'
data = np.loadtxt(spec_directory + spec_to_plot, dtype=float)
x = data[:,0]
y = data[:,1]
plt.plot(x, y)
plt.xlabel("Wavelength")
plt.ylabel("Flux")
plt.title(spec_to_plot)
plt.show()
edit: data is between 3.5e+3 and 9.9e+3 in the first column, I need to use just the data between 5.5e+3 and 6e+3 to plot another graph, but this only applies to the first column. Hope this makes a bit more sense?
Python version 2.7
If I understand you correctly, you could do it this way:
my_slice = slice(np.argwhere(x>5.5e3)[0, 0], np.argwhere(x>6e3)[0, 0])
x = data[my_slice,0]
y = data[my_slice,1]
np.argwhere(x>5.5e3)[0, 0] is the index of the first occurrence of x>5.5e3, and likewise for the end of the slice (assuming your data is sorted). The [0, 0] index is needed because np.argwhere returns a 2-D array and slice bounds must be plain integers.
A more general way working even if your data is not sorted:
mask = (x>5.5e3) & (x<6e3)
x = data[mask, 0]
y = data[mask, 1]
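Both approaches can be tried on synthetic data to confirm they pick the same rows. This is a sketch only: the wavelength grid and flux values below are invented stand-ins for the two columns of the file, and lo, hi, start and stop are illustrative names. For sorted data, np.searchsorted is used here in place of np.argwhere.

```python
import numpy as np

# Synthetic stand-in for the two columns of the spectrum file
wavelength = np.linspace(3.5e3, 9.9e3, 641)
flux = np.sin(wavelength / 500.0)

lo, hi = 5.5e3, 6.0e3

# Boolean mask: works whether or not the data is sorted
mask = (wavelength > lo) & (wavelength < hi)
x_sel = wavelength[mask]
y_sel = flux[mask]

# Sorted data only: binary search for the slice bounds
start = np.searchsorted(wavelength, lo, side="right")  # first index with value > lo
stop = np.searchsorted(wavelength, hi, side="left")    # first index with value >= hi
x_slice = wavelength[start:stop]
y_slice = flux[start:stop]

print(len(x_sel), len(x_slice))
```

The selected arrays can then be passed to plt.plot in place of the full columns.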
Solved by using
plt.axis([5500, 6000, 0, 8e-15])
Thanks for the help.
