Step plot by reading from file - python

I am a newbie to matplotlib. I am trying to plot a step function and am having some trouble. Right now I am able to read from the file and plot it as shown below, but the top graph is not drawn in steps and the bottom one is not a proper step. I have seen examples that plot a step function from given x and y values, but I am not sure how to do it when reading from a file. Can someone help me?
from pylab import plotfile, show, gca
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook
fname = cbook.get_sample_data('sample.csv', asfileobj=False)
plotfile(fname, cols=(0,1), delimiter=' ')
plotfile(fname, cols=(0,2), newfig=False, delimiter=' ')
plt.show()
Sample input (3 columns):
27023927 3 0
27023938 2 0
27023949 3 0
27023961 2 0
27023972 3 0
27023984 2 0
27023995 3 0
27024007 2 0
27024008 2 1
27024018 3 1
27024030 2 1
27024031 2 0
27024041 3 0
27024053 2 0
27024054 2 1
27024098 2 0
Note: I have set the first y-series values to 3 and 2 so that this graph appears at the top, and the second y-series values to 0 and 1 so that it appears at the bottom, as shown below.
(Figure: waveform as it looks now)

Essentially your resolution is too low. In the lower plot the transitions (except the last one) occur over 1 unit in x, while the gaps between the other samples are about an order of magnitude larger. This gives the appearance of steps, but if you zoom in you will see that the "vertical" lines have a finite gradient (a true step changes with an infinite gradient).
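A quick way to check this is to measure how much x distance each change in y is spread over, compared with the typical spacing between samples. The sketch below hard-codes the x column and the bottom (0/1) column of the question's sample input; the variable names are illustrative only.

```python
import numpy as np

# x and the bottom (0/1) column of the sample input from the question
x = np.array([27023927, 27023938, 27023949, 27023961, 27023972, 27023984,
              27023995, 27024007, 27024008, 27024018, 27024030, 27024031,
              27024041, 27024053, 27024054, 27024098])
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0])

gaps = np.diff(x)                          # x distance between consecutive samples
transition_widths = gaps[np.diff(y) != 0]  # x distance over which each change in y is drawn
typical_gap = np.median(gaps)              # typical spacing between samples
```

Here the first three transitions are 1 unit wide against a median spacing of 11, which is why a plain line plot looks almost, but not quite, like steps; the last transition is drawn over 44 units and is visibly slanted.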
The same problem affects both the top and bottom plots. We can easily remedy this by using the step function. You will generally find it easier to import the data first; in this example I use NumPy's genfromtxt, which loads the file into an array called data:
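import numpy as np
import matplotlib.pyplot as plt
# load the space-separated file into a 2D array (one row per line of the file)
data = np.genfromtxt('test.csv', delimiter=" ")
# top subplot: column 0 (x) against column 1
ax1 = plt.subplot(2, 1, 1)
ax1.step(data[:, 0], data[:, 1])
# bottom subplot: column 0 (x) against column 2
ax2 = plt.subplot(2, 1, 2)
ax2.step(data[:, 0], data[:, 2])
plt.show()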
If you are new to Python, two things are worth mentioning. First, we use two subplots (ax1 and ax2) to plot the data rather than plotting both on the same axes, so you no longer need to offset the values to separate them spatially. Second, we access the elements of the array with [], which takes [row, column]: : means all rows, and an index i selects the ith column (counting from 0).
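To make the [row, column] indexing concrete, here is a small sketch that loads a few rows of the question's sample input from an in-memory string (io.StringIO stands in for the real file) and slices them the same way as the answer does.

```python
import io
import numpy as np

# A few rows of the sample input, inlined so the sketch runs without a file
sample = io.StringIO("""27023927 3 0
27023938 2 0
27024007 2 0
27024008 2 1
27024030 2 1
27024031 2 0""")
data = np.genfromtxt(sample, delimiter=" ")

x = data[:, 0]        # every row, column 0: the x values
top = data[:, 1]      # every row, column 1: first signal
bottom = data[:, 2]   # every row, column 2: second signal
first_row = data[0]   # a single integer index selects a whole row
```

data has shape (6, 3) here: six rows (lines of the file) and three columns.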

I would propose loading the data into a NumPy array:
import numpy as np
data = np.loadtxt('sample.csv')
And then plot it:
import matplotlib.pyplot as plt
# start with the first point
ax = [data[0, 0]]
ay = [data[0, 1]]
for i in range(1, data.shape[0]):
    if ay[-1] != data[i, 1]:  # if the y value has changed
        # add current x and old y
        ax.append(data[i, 0])
        ay.append(ay[-1])
    # add current x and current y
    ax.append(data[i, 0])
    ay.append(data[i, 1])
plt.plot(ax, ay)
plt.show()
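Where my solution differs from yours is that I plot two points for every change in y: the old y and the new y at the same x. Those two points produce the 90-degree bend. I only plot the first curve; for the second one, change the column index 1 to 2 (data[0,1] and data[i,1] become data[0,2] and data[i,2]).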
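The same "two points per change" idea can be written without a Python loop. The sketch below (post_step_vertices is a hypothetical helper name) repeats every point and shifts the x and y copies against each other; it should trace the same outline as plt.step(x, y, where='post'), in which each y value is held from its own x until the next x.

```python
import numpy as np

def post_step_vertices(x, y):
    """Vertices of a 'post' step curve: y[k] is held from x[k] until x[k + 1]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # every x after the first appears twice: first with the previous y, then with its own y
    xs = np.repeat(x, 2)[1:]
    ys = np.repeat(y, 2)[:-1]
    return xs, ys

xs, ys = post_step_vertices([0, 1, 2, 3], [5, 6, 6, 7])
# plt.plot(xs, ys) then draws the step curve
```

Unlike the loop, this also emits a duplicate vertex where y does not change (here at x = 2), which draws nothing extra.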

Thanks for the suggestions. I was able to plot it after some research; here is my code:
import csv
import matplotlib.pyplot as plt
fname = "output.csv"
x, a, b = [], [], []
# DictReader uses the header row (i, a, b) as keys; the values come back as strings
with open(fname, newline="") as f:
    for row in csv.DictReader(f):
        x.append(float(row['i']))
        a.append(float(row['a']))
        b.append(float(row['b']))
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
# where='post': each y value is held from its own x until the next x
ax.step(x, a, 'g', where='post')
ax.step(x, b, 'r', where='post')
plt.show()
and got the resulting step plot.
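csv.DictReader returns every field as a string, so converting with float() before calling step keeps the axes numeric rather than categorical. A self-contained sketch of the same approach, assuming a hypothetical output.csv with the header i,a,b (inlined here with io.StringIO) and the non-interactive Agg backend so it runs without a display:

```python
import csv
import io
import matplotlib
matplotlib.use("Agg")  # off-screen rendering, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical contents of output.csv with the i, a, b header the code above expects
text = io.StringIO("i,a,b\n27023927,3,0\n27024007,2,0\n27024008,2,1\n27024031,2,0\n")
rows = list(csv.DictReader(text))
x = [float(r["i"]) for r in rows]  # DictReader yields strings; convert before plotting
a = [float(r["a"]) for r in rows]
b = [float(r["b"]) for r in rows]

fig, ax = plt.subplots()
(line_a,) = ax.step(x, a, "g", where="post")
(line_b,) = ax.step(x, b, "r", where="post")
```

Note that Axes.step defaults to where='pre', which holds each y value over the interval that ends at its x; 'post' matches the "value changes at this timestamp" reading of the data.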


Trying to resolve problem in pandas-python

I have one question. I have point cloud data, and I need to read and plot the points. If anyone can help me, I would be very thankful. I am using Python (pandas, matplotlib, ...), and I have all the X, Y, Z values, but I don't know how to plot all of them to get a 3D plot. The values are taken from point cloud data with 170 rows and 254 combinations of x, y, z, I, N values.
https://datalore.jetbrains.com/notebook/n9MPhjVrtrIoU1buWmQuDh/MT7MrS1buzmbD7VSDqhGqu/
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
import pandas as pd
df1 = pd.read_csv('cloud.txt',delimiter='\t')
pd.set_option('display.max_columns', None)
df1 = df1.apply (pd.to_numeric, errors='coerce')
#cloud.dropna()
df1.fillna(0,axis=0,inplace=True)
df2=df1.iloc[:,:-1]
df2.head(170)
kolone=[]
i=1
while i<6:
    kolone.append(i)
    i=i+1
display(kolone)
c=[]
columns=kolone*224
c=c+columns
df2.columns=c
display(df2)
# Reading the points: column 1 is the x value, column 2 the y value and
# column 3 the z value. Columns 4 and 5 are intensity and noise values and
# are not important here.
# The first row is replaced by column numbering: the values 1,2,3,4,5
# stand for the x, y, z, I, N values.
x=df2[1]
y=df2[2]
z=df2[3]
r=[]
i=1
while i<225:
    r.append(i)
    i=i+1
#print(r)
x.columns=r
display(x)
#Reading x coordinates--224 values of x
i=1
p=[]
while i<225:
    p.append(i)
    i=i+1
#print(p)
y.columns=p
display(y)
#Reading y coordinates--224 values of y
i=1
q=[]
while i<225:
    q.append(i)
    i=i+1
#print(q)
z.columns=q
display(z)
#Reading z coordinates--224 values of z
It is a bit upsetting that you haven't tried anything at all yet. The documentation page for matplotlib's 3D scatter plot includes a complete example.
There is no point in going to all that trouble to assign column names. Indeed, there is really no point in using pandas at all for this; you could read the CSV directly into a numpy array. However, assuming you have a dataframe with unnamed columns, it's still pretty easy.
In this code, I create a 50x3 array of random integers, then I pull the columns as lists and pass them to scatter. You ought to be able to adapt this to your own code.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randint( 256, size=(50,3))
df = pd.DataFrame(data)
x = df[0].tolist()
y = df[1].tolist()
z = df[2].tolist()
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter( x, y, z )
plt.show()
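The question's frame stores each row as repeated x, y, z, I, N groups. Assuming exactly that layout (and nothing else in the row), a single NumPy reshape turns it into one point per row, which can then go straight to scatter. wide_to_points is a hypothetical helper; with the question's data it would be called as wide_to_points(df2.to_numpy()). A tiny toy array stands in for the real file here.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d  # registers the "3d" projection on older matplotlib versions
import numpy as np

def wide_to_points(wide, fields_per_point=5):
    """Reshape rows laid out as x, y, z, I, N, x, y, z, I, N, ... into an (n_points, 3) array."""
    wide = np.asarray(wide, dtype=float)
    # each consecutive run of `fields_per_point` values in a row is one point
    points = wide.reshape(-1, fields_per_point)
    return points[:, :3]  # keep x, y, z; drop intensity and noise

# toy stand-in: 2 rows, 2 points per row, 5 fields per point
xyz = wide_to_points(np.arange(20).reshape(2, 10))
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(xyz[:, 0], xyz[:, 1], xyz[:, 2])
```

Because NumPy arrays are row-major, the reshape walks along each row before moving to the next, so the five fields of every point stay together.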

Plotting colored lines connecting individual data points of two swarmplots

I have:
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
# Generate random data
set1 = np.random.randint(0, 40, 24)
set2 = np.random.randint(0, 100, 24)
# Put into dataframe and plot
df = pd.DataFrame({'set1': set1, 'set2': set2})
data = pd.melt(df)
sb.swarmplot(data=data, x='variable', y='value')
The two random distributions plotted with seaborn's swarmplot function:
I want the individual points of both distributions to be connected with a colored line, such that the first data point of set1 in the dataframe is connected with the first data point of set2.
I realize that this would probably be relatively simple without seaborn but I want to keep the feature that the individual data points do not overlap.
Is there any way to access the individual plot coordinates in the seaborn swarmplot function?
EDIT: Thanks to @Mead, who pointed out a bug in my post prior to 2021-08-23 (I forgot to sort the locations in the prior version).
I gave the nice answer by Paul Brodersen a try, and despite him saying that "Madness lies this way", I actually think it's pretty straightforward and yields nice results:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
# Generate random data
rng = np.random.default_rng(42)
set1 = rng.integers(0, 40, 5)
set2 = rng.integers(0, 100, 5)
# Put into dataframe
df = pd.DataFrame({"set1": set1, "set2": set2})
print(df)
data = pd.melt(df)
# Plot
fig, ax = plt.subplots()
sns.swarmplot(data=data, x="variable", y="value", ax=ax)
# Now connect the dots
# Find idx0 and idx1 by inspecting the elements returned from ax.get_children()
# ... or find a way to automate it
idx0 = 0
idx1 = 1
locs1 = ax.get_children()[idx0].get_offsets()
locs2 = ax.get_children()[idx1].get_offsets()
# before plotting, we need to sort so that the data points
# correspond to each other as they did in "set1" and "set2"
sort_idxs1 = np.argsort(set1)
sort_idxs2 = np.argsort(set2)
# revert "ascending sort" through sort_idxs2.argsort(),
# and then sort into order corresponding with set1
locs2_sorted = locs2[sort_idxs2.argsort()][sort_idxs1]
for i in range(locs1.shape[0]):
    x = [locs1[i, 0], locs2_sorted[i, 0]]
    y = [locs1[i, 1], locs2_sorted[i, 1]]
    ax.plot(x, y, color="black", alpha=0.1)
plt.show()
It prints:
set1 set2
0 3 85
1 30 8
2 26 69
3 17 20
4 17 9
And you can see that the data is linked correspondingly in the plot.
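The sorting step is the subtle part, so here is a sketch of just that index logic with toy offsets and no seaborn. It assumes, as the answer's sort does, that each collection lists its points in ascending order of value; match_offsets is a hypothetical helper name, and the toy values are distinct so ties play no role.

```python
import numpy as np

def match_offsets(values1, values2, locs2):
    """Reorder locs2 so that row i pairs with row i of the (value-sorted) locs1."""
    sort_idxs1 = np.argsort(values1)
    sort_idxs2 = np.argsort(values2)
    # argsort of a permutation is its inverse: this undoes set2's ascending sort
    locs2_original_order = locs2[sort_idxs2.argsort()]
    # then put set2's points in the same sorted-by-set1 order as locs1
    return locs2_original_order[sort_idxs1]

set1 = np.array([3, 30, 26, 17, 9])
set2 = np.array([85, 8, 69, 20, 9])
# toy offsets: x is the category position, y the value, stored in ascending order
locs1 = np.column_stack([np.zeros(5), np.sort(set1)])
locs2 = np.column_stack([np.ones(5), np.sort(set2)])
locs2_matched = match_offsets(set1, set2, locs2)
pairs = sorted(zip(locs1[:, 1].tolist(), locs2_matched[:, 1].tolist()))
```

Every (set1, set2) pair reconstructed from the offsets matches a row of the original DataFrame, which is exactly what the connecting lines need.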
Sure, it's possible (but you really don't want to).
seaborn.swarmplot returns the Axes instance (here: ax). You can call ax.get_children() to get all plot elements. You will see that for each set of points there is an element of type PathCollection. You can determine the x, y coordinates by using the PathCollection.get_offsets() method.
I do not suggest you do this! Madness lies this way.
I suggest you have a look at the seaborn source code and derive your own _PairedSwarmPlotter from _SwarmPlotter, changing the draw_swarmplot method to your needs.
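The hard-coded idx0 = 0 and idx1 = 1 in the first answer ("or find a way to automate it") can be replaced by filtering ax.get_children() for PathCollection instances. In this sketch two plain ax.scatter calls stand in for seaborn's swarmplot (which also draws its points as scatter collections); point_collections is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection

def point_collections(ax):
    """Return the PathCollection children of an Axes, in the order they were added."""
    return [c for c in ax.get_children() if isinstance(c, PathCollection)]

fig, ax = plt.subplots()
# stand-ins for the two swarms: category position on x, value on y
ax.scatter([0, 0, 0], [3, 17, 30])
ax.scatter([1, 1, 1], [8, 20, 85])
collections = point_collections(ax)
locs1 = collections[0].get_offsets()
locs2 = collections[1].get_offsets()
```

If other scatter-based artists are on the same Axes, they will show up in this list too, so the filter may need narrowing in real plots.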

Error when trying to plot multi-colored line in Python

I am unable to plot a variable where the points are coloured by reference to an index. What I ultimately want is the line-segment of each point (connecting to the next point) to be a particular colour. I tried with both Matplotlib and pandas. Each method throws a different error.
Generating a trend-line:
datums = np.linspace(0,10,5)
sinned = np.sin(datums)
plt.plot(sinned)
So now we generate a new column of the labels:
sinned['labels'] = np.where((sinned < 0), 1, 2)
print(sinned)
Which generates our final dataset:
0 labels
0 0.000000 2
1 0.598472 2
2 -0.958924 1
3 0.938000 2
4 -0.544021 1
And now for the plotting attempt:
plt.plot(sinned[0], c = sinned['labels'])
Which results in the error: length of rgba sequence should be either 3 or 4
I also tried setting the labels to be the strings 'r' or 'b', which didn't work either :-/
1 and 2 are not colors; 'b' (blue) and 'r' (red) are used in the example below. You need to plot each segment separately.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
datums = np.linspace(0, 10, 5)
sinned = pd.DataFrame(data=np.sin(datums))
# one colour code per point: blue where the value is negative, red otherwise
sinned['labels'] = np.where(sinned[0] < 0, 'b', 'r')
fig, ax = plt.subplots()
# draw each segment (point s to point s + 1) separately, coloured by its starting point
for s in range(len(sinned) - 1):
    x = (sinned.index[s], sinned.index[s + 1])
    y = (sinned[0][s], sinned[0][s + 1])
    ax.plot(x, y, c=sinned['labels'][s])
plt.show()
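One ax.plot call per segment works, but matplotlib also provides LineCollection, which takes all segments at once together with one colour per segment. A minimal sketch applying the same rule as the answer (a segment is blue if its starting value is negative, red otherwise), with the Agg backend as an assumption so it runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

y = np.sin(np.linspace(0, 10, 5))
x = np.arange(len(y))
points = np.column_stack([x, y])
# segments[k] joins point k to point k + 1, giving shape (n - 1, 2, 2)
segments = np.stack([points[:-1], points[1:]], axis=1)
# colour each segment by the sign of its starting point
colors = ["b" if v < 0 else "r" for v in y[:-1]]

fig, ax = plt.subplots()
lc = LineCollection(segments, colors=colors)
ax.add_collection(lc)
ax.autoscale()  # update the view limits to include the new segments
```

With many points this is usually much faster than a Python loop of plot calls, because everything is drawn as a single artist.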

Multiple legends and multiple colors/shapes matplotlib

I want to plot data from 20+ files at the same time. I am trying to plot the data from each file in a different color, each with its own legend entry. I have seen some examples and the matplotlib tutorial, but I am a little lost. How do I add legends and give a different marker shape to every set?
E.g. the inputs are sets of data from several files with separate thresholds.
The filenames are file1_th0, file1_th0.1 and so on. I want data with the same threshold from different files to share the same shape/color, with proper legends. I can plot whichever data set I need, but I am not able to give separate shapes to different threshold values. Any suggestions in this regard would be great.
Code:
import numpy as np
import matplotlib.pyplot as plt
from pylab import*
import math
from matplotlib.ticker import LogLocator
for fname in ('file1_th0', 'file1_th0.1','file1_th0.01', 'file1_th0.001', 'file1_th0.001'):
    data=np.loadtxt(fname)
    X=data[:,2]
    sorted_data = np.sort(X)
    cdf=np.arange(len(sorted_data))/float(len(sorted_data))
    ccdf = 1 - cdf
    plt.plot(sorted_data,ccdf,'r-', label = 'label1')
for fname in ('file2_th0', 'file2_th0.1', 'file2_th0.01', 'file2_th0.001','file2_th0.0001'):
    data=np.loadtxt(fname)
    X=data[:,2]
    sorted_data = np.sort(X)
    cdf=np.arange(len(sorted_data))/float(len(sorted_data))
    ccdf = 1 - cdf
    plt.plot(sorted_data,cdf,'b-')
for fname in ('file3_th0','file3_th0.1','file3_th0.01','file3_th0.001', 'file3_th0.0001'):
    data=np.loadtxt(fname)
    X=data[:,4]
    sorted_data = np.sort(X)
    cdf=np.arange(len(sorted_data))/float(len(sorted_data))
    ccdf = 1 - cdf
    plt.plot(sorted_data,cdf,'m-')
for fname in ('file4_th0', 'file4_th0.1', 'file4_th0.01', 'file4_th0.001','file4_th0.0001'):
    data=np.loadtxt(fname)
    X=data[:,4]
    sorted_data = np.sort(X)
    cdf=np.arange(len(sorted_data))/float(len(sorted_data))
    ccdf = 1 - cdf
    plt.plot(sorted_data,cdf,'c--')
plt.xlabel('this is x!')
plt.ylabel('this is y!')
plt.gca().set_xscale("log")
#plt.gca().set_yscale("log")
plt.show()
First of all, you need to add labels and markers to your plot calls and add a legend call, e.g.:
import numpy as np
import matplotlib.pyplot as plt
b=np.arange(0,20,1)
c=b*0.5
d=b*2
plt.plot(b,d,color='r',marker='o',label='set 1')
plt.plot(b,c,color='g',marker='*',label='set 2')
plt.legend(loc='upper left')
However, in your looped example you will end up with lots of identical legend entries, which I presume you don't want.
To get round it, you could:
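import numpy as np
import matplotlib.pyplot as plt
for n, fname in enumerate(('file1_th0', 'file1_th0.1', 'file1_th0.01')):  # e.g. your for loops
    data = np.loadtxt(fname)
    sorted_data = np.sort(data[:, 2])
    cdf = np.arange(len(sorted_data)) / float(len(sorted_data))
    # label only the first curve of the group, so the legend shows 'set 1' once
    if n == 0:
        plt.plot(sorted_data, cdf, color='r', marker='o', label='set 1')
    else:
        plt.plot(sorted_data, cdf, color='r', marker='o')
plt.legend(loc='upper left')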
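For the original goal (same threshold means same marker, same file group means same colour), another option is two separate legends built from proxy Line2D handles, so the plotting loops need no labels at all. The group and threshold tables below are hypothetical, and random data stands in for np.loadtxt; ax.add_artist keeps the first legend on the Axes when legend() is called a second time.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

# Hypothetical style tables: one colour per file group, one marker per threshold
group_colors = {"file1": "r", "file2": "b"}
threshold_markers = {"0": "o", "0.1": "s", "0.01": "^"}

rng = np.random.default_rng(0)
fig, ax = plt.subplots()
for group, color in group_colors.items():
    for th, marker in threshold_markers.items():
        values = np.sort(rng.random(20))  # stand-in for np.loadtxt(f"{group}_th{th}")[:, 2]
        ccdf = 1 - np.arange(len(values)) / len(values)
        ax.plot(values, ccdf, color=color, marker=marker)  # no label needed

# Proxy artists: they are never drawn, they only describe the legend entries
color_handles = [Line2D([], [], color=c, label=g) for g, c in group_colors.items()]
marker_handles = [Line2D([], [], color="k", marker=m, linestyle="none", label=f"th={th}")
                  for th, m in threshold_markers.items()]
first = ax.legend(handles=color_handles, title="file", loc="upper right")
ax.add_artist(first)  # keep the first legend when legend() is called again
ax.legend(handles=marker_handles, title="threshold", loc="lower left")
```

With 20+ files this keeps the legend size proportional to the number of groups plus thresholds, rather than one entry per file.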

Matplotlib contour plot with intersecting contour lines

I am trying to make a contour plot of the following data using matplotlib in Python. The data is of this form:
# x y height
77.23 22.34 56
77.53 22.87 63
77.37 22.54 72
77.29 22.44 88
The data actually consists of nearly 10,000 points, which I am reading from an input file. However, the set of distinct values of z is small (integers between 50 and 90), and I want a contour line for every distinct z.
Here is my code:
import matplotlib
import numpy as np
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import csv
import sys
# read data from file
data = csv.reader(open(sys.argv[1], 'rb'), delimiter='|', quotechar='"')
x = []
y = []
z = []
for row in data:
    try:
        x.append(float(row[0]))
        y.append(float(row[1]))
        z.append(float(row[2]))
    except Exception as e:
        pass
        #print e
X, Y = np.meshgrid(x, y) # (I don't understand why is this required)
# creating a 2D array of z whose leading diagonal elements
# are the z values from the data set and the off-diagonal
# elements are 0, as I don't care about them.
z_2d = []
default = 0
for i, no in enumerate(z):
    z_temp = []
    for j in xrange(i): z_temp.append(default)
    z_temp.append(no)
    for j in xrange(i+1, len(x)): z_temp.append(default)
    z_2d.append(z_temp)
Z = z_2d
CS = plt.contour(X, Y, Z, list(set(z)))
plt.figure()
CB = plt.colorbar(CS, shrink=0.8, extend='both')
plt.show()
Here is the plot of a small sample of data -
Here is a close-up of one of the regions of the above plot (note the overlapping/intersecting lines):
I don't understand why it doesn't look like a contour plot. The lines are intersecting, which shouldn't happen. What could possibly be wrong? Please help.
Try the following code; it might help. It is the same approach as in the Cookbook:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.mlab import griddata
# with this way you can load your csv-file really easy -- maybe you should change
# the last 'dtype' to 'int', because you said you have int for the last column
data = np.genfromtxt('output.csv', dtype=[('x',float),('y',float),('z',float)],
comments='"', delimiter='|')
# just an assigning for better look in the plot routines
x = data['x']
y = data['y']
z = data['z']
# just an arbitrary number for grid point
ngrid = 500
# create an array with same difference between the entries
# you could use x.min()/x.max() for creating xi and y.min()/y.max() for yi
xi = np.linspace(-1,1,ngrid)
yi = np.linspace(-1,1,ngrid)
# create the grid data for the contour plot
zi = griddata(x,y,z,xi,yi)
# plot the contour and a scatter plot for checking if everything went right
plt.contour(xi,yi,zi,20,linewidths=1)
plt.scatter(x,y,c=z,s=20)
plt.xlim(-1,1)
plt.ylim(-1,1)
plt.show()
I created a sample output file with a 2D Gaussian distribution. My result using the code above:
NOTE:
You may notice that the edges are somewhat cropped. This is because the griddata function creates masked arrays: the border of the plot is defined by the outermost points, and everything outside that border is masked out. If your points all lay on a line, you would have no contour to plot at all, which is logical. I mention it because your four posted data points look like this case. Maybe you don't have it =)
UPDATE
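I edited the code a bit. Your problem was probably that the relationship between the x, y and z values in your input file was not resolved correctly: contour expects Z[i, j] to be the height at (X[i, j], Y[i, j]), and a meshgrid of the raw, unsorted x and y values with z only on the diagonal does not describe a surface. With the following code the plot should work correctly.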
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.mlab import griddata
import csv
data = np.genfromtxt('example.csv', dtype=[('x',float),('y',float),('z',float)],
comments='"', delimiter=',')
sample_pts = 500
con_levels = 20
x = data['x']
xmin = x.min()
xmax = x.max()
y = data['y']
ymin = y.min()
ymax = y.max()
z = data['z']
xi = np.linspace(xmin,xmax,sample_pts)
yi = np.linspace(ymin,ymax,sample_pts)
zi = griddata(x,y,z,xi,yi)
plt.contour(xi,yi,zi,con_levels,linewidths=1)
plt.scatter(x,y,c=z,s=20)
plt.xlim(xmin,xmax)
plt.ylim(ymin,ymax)
plt.show()
With this code and your small sample I get the following plot:
Try my snippet and adapt it as needed. For example, for the given sample CSV file I had to change the delimiter from | to ,. The code I wrote for you is not especially elegant, but it is straightforward.
Sorry for the late response.
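matplotlib.mlab.griddata, used in both snippets above, was removed in matplotlib 3.1. A minimal sketch of the same workflow with scipy.interpolate.griddata (a swapped-in function; method="linear" is used here, whereas mlab's default was natural-neighbour interpolation). Random samples of a 2D Gaussian stand in for the input file, and the Agg backend is an assumption so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import griddata

# Stand-in for the scattered (x, y, height) data: a 2D Gaussian sampled at random points
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 400)
y = rng.uniform(-1, 1, 400)
z = np.exp(-(x**2 + y**2) / 0.2)

# Regular grid covering the data; meshgrid gives Xi[i, j] = xi[j] and Yi[i, j] = yi[i]
xi = np.linspace(x.min(), x.max(), 100)
yi = np.linspace(y.min(), y.max(), 100)
Xi, Yi = np.meshgrid(xi, yi)

# Interpolate the heights onto the grid; grid points outside the convex hull become NaN
Zi = griddata((x, y), z, (Xi, Yi), method="linear")

fig, ax = plt.subplots()
cs = ax.contour(Xi, Yi, Zi, levels=10, linewidths=1)
ax.scatter(x, y, c=z, s=5)
```

The NaN cells play the same role as the masked border described in the NOTE. For integer heights like those in the question, passing levels=np.unique(z) should give one contour line per distinct z value.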
