I'm new to Python and want to perform a rather simple task. I've got a two-dimensional point set, which is stored as binary data (i.e. (x, y)-coordinates) in a file, which I want to visualize. The output should look as in the picture below.
However, I'm somewhat overwhelmed by the number of Google results on this topic, and many of them seem to be about three-dimensional point cloud visualization and/or massive numbers of data points. So, if anyone could point me to a suitable solution for my problem, I would be really thankful.
EDIT: The point set is contained in a file which is formatted as follows:
0.000000000000000 0.000000000000000
1.000000000000000 1.000000000000000
1
0.020375738732779 0.026169010160356
0.050815740313746 0.023209931647163
0.072530406907906 0.023975230642589
The first data vector is the one in the line below the single "1"; i.e. (0.020375738732779, 0.026169010160356). How do I read this into a vector in python? I can open the file using f = open("pointset file")
Install and import matplotlib and pyplot:
import matplotlib.pyplot as plt
Assuming this is your data:
x = [1, 2, 5, 1, 5, 7, 8, 3, 2, 6]
y = [6, 7, 1, 2, 6, 2, 1, 6, 3, 1]
If your points are stored as a list of (x, y) pairs called points, you can use comprehensions to split the coordinates into separate lists:
x = [p[0] for p in points]
y = [p[1] for p in points]
Plotting is as simple as:
plt.scatter(x=x, y=y)
Result:
Many customizations are possible.
EDIT: following question edit
In order to read the file:
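x = []
y = []
with open('pointset_file.txt', 'r') as f:
    for _ in range(3):
        next(f)  # skip the three header lines, up to and including the single "1"
    for line in f:
        coords = line.split()
        if len(coords) < 2:
            continue  # ignore blank or incomplete lines
        x.append(float(coords[0]))
        y.append(float(coords[1]))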
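If NumPy is available, the same parsing can be sketched more compactly with np.loadtxt. This assumes, as in the question, that the first three lines are header lines and that every remaining line holds two whitespace-separated numbers. The name sample_file and its contents are only a stand-in for the real file.

```python
import io
import numpy as np

# Stand-in for the real file (an assumption for this sketch);
# np.loadtxt also accepts a filename instead of a file-like object.
sample_file = io.StringIO(
    "0.000000000000000 0.000000000000000\n"
    "1.000000000000000 1.000000000000000\n"
    "1\n"
    "0.020375738732779 0.026169010160356\n"
    "0.050815740313746 0.023209931647163\n"
    "0.072530406907906 0.023975230642589\n"
)

# skiprows=3 drops the header lines; unpack=True returns one array per column.
x, y = np.loadtxt(sample_file, skiprows=3, unpack=True)
print(x)
print(y)
```

The resulting x and y arrays can be passed straight to plt.scatter.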
You could read your data as follows and plot it using a scatter plot. This method is intended for a small number of data points and for exactly the format you have presented, not for CSV.
import matplotlib.pyplot as plt
with open("pointset file") as fid:
    lines = fid.read().split("\n")
# lines[:2] looks like the bounds for each axis; if so, use them as axis limits
data = [[float(d) for d in line.split(" ") if d] for line in lines[3:] if line.strip()]
# transpose the list of [x, y] rows into one x sequence and one y sequence
x, y = zip(*data)
plt.scatter(x, y)
plt.show()
Assuming you want a plot looking pretty much exactly like the sample image you give, and you want the plot to display the data with both axes in equal proportion, one could use a general purpose multimedia library like pygame to achieve this:
#!/usr/bin/env python3
import sys
import pygame
# windows will never be larger than this in their largest dimension
MAX_WINDOW_SIZE = 400
BG_COLOUR = (255, 255, 255,)
FG_COLOUR = (0, 0, 0,)
DATA_POINT_SIZE = 2
pygame.init()
if len(sys.argv) < 2:
    print('Error: need filename to read data from')
    pygame.quit()
    sys.exit(1)
else:
    data_points = []
    # read in data points from file first
    with open(sys.argv[1], 'r') as file:
        [next(file) for _ in range(3)] # discard first 3 lines of file
        # now the rest of the file contains actual data to process
        data_points.extend(tuple(float(x) for x in line.split()) for line in file)
    # file read complete. now let's find the min and max bounds of the data
    top_left = [float('+Inf'), float('+Inf')]
    bottom_right = [float('-Inf'), float('-Inf')]
    for datum in data_points:
        if datum[0] < top_left[0]:
            top_left[0] = datum[0]
        if datum[1] < top_left[1]:
            top_left[1] = datum[1]
        if datum[0] > bottom_right[0]:
            bottom_right[0] = datum[0]
        if datum[1] > bottom_right[1]:
            bottom_right[1] = datum[1]
    # calculate space dimensions
    space_dimensions = (bottom_right[0] - top_left[0], bottom_right[1] - top_left[1])
    # take the biggest of the X or Y dimensions of the point space and scale it
    # up to our maximum window size
    biggest = max(space_dimensions)
    scale_factor = MAX_WINDOW_SIZE / biggest # all points will be scaled up by this factor
    # screen dimensions, rounded to whole pixels for the window size
    screen_dimensions = tuple(int(round(sd * scale_factor)) for sd in space_dimensions)
    # basic init and draw all points to screen
    display = pygame.display.set_mode(screen_dimensions)
    display.fill(BG_COLOUR)
    for point in data_points:
        # translate and scale each point
        x = point[0] * scale_factor - top_left[0] * scale_factor
        y = point[1] * scale_factor - top_left[1] * scale_factor
        pygame.draw.circle(display, FG_COLOUR, (x, y), DATA_POINT_SIZE)
    pygame.display.update()
    while True:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                pygame.quit()
                sys.exit(0)
        pygame.time.wait(50)
Execute this script and pass the name of the file which holds your data in as the first argument. It will spawn a window with the data points displayed.
I generated a bunch of uniformly distributed random x,y points to test it, with:
from random import random
for _ in range(1000):
    print(random(), random())
This produces a window looking like the following:
If the space your data points are within is not of square size, the window shape will change to reflect this. The largest dimension of the window, either width or height, will always stay at a specified size (I used 400px as a default in my demo).
Admittedly, this is not the most elegant or concise solution, and reinvents the wheel a little bit, however it gives you the most control on how to display your data points, and it also deals with both the reading in of the file data and the display of it.
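The scaling step described above can be isolated into a small pure-Python sketch that does not need pygame. The function name fit_to_window and its return format are illustrative choices, not part of the script, and the sketch assumes the points are not all identical (otherwise the scale factor would divide by zero).

```python
def fit_to_window(points, max_window_size=400):
    """Map (x, y) points to window coordinates, keeping both axes at the same scale."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    width, height = max(xs) - min_x, max(ys) - min_y
    # Scale so that the larger extent fills max_window_size pixels.
    scale = max_window_size / max(width, height)
    window_size = (round(width * scale), round(height * scale))
    # Translate to the origin, then scale each point.
    scaled = [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]
    return window_size, scaled

size, pixels = fit_to_window([(0.0, 0.0), (2.0, 1.0), (1.0, 0.5)])
print(size)    # (400, 200)
print(pixels)  # [(0.0, 0.0), (400.0, 200.0), (200.0, 100.0)]
```

Because one scale factor is used for both axes, a point set that is twice as wide as it is tall yields a 400 x 200 window rather than a stretched square.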
To read your file:
import pandas as pd
import numpy as np
df = pd.read_csv('your_file',
                 sep=r'\s+',
                 header=None,
                 skiprows=3,
                 names=['x', 'y'])
For now, I've created a random dataset:
import random
df = pd.DataFrame({'x': [random.uniform(0, 1) for n in range(100)],
                   'y': [random.uniform(0, 1) for n in range(100)]})
I prefer Plotly for any kind of figure:
import plotly.express as px
fig = px.scatter(df,
                 x='x',
                 y='y')
fig.show()
From here you can easily update labels, colors, etc.
Explanation:
I have two NumPy arrays, dataX and dataY, and I am trying to filter each array to reduce the noise. The image below shows the actual input data (blue dots) and an example of what I want it to look like (red dots). I do not need the filtered data to be as perfect as in the example, but I do want it to be as straight as possible. I have provided sample data in the code.
What I have tried:
Firstly, you can see that the data isn't 'continuous', so I first divided it into individual 'segments' (4 of them in this example) and then applied a filter to each 'segment'. Someone suggested that I use a Savitzky-Golay filter. The full, runnable code is below:
import scipy as sc
import scipy.signal
import numpy as np
import matplotlib.pyplot as plt
# Sample Data
ydata = np.array([1,0,1,2,1,2,1,0,1,1,2,2,0,0,1,0,1,0,1,2,7,6,8,6,8,6,6,8,6,6,8,6,6,7,6,5,5,6,6, 10,11,12,13,12,11,10,10,11,10,12,11,10,10,10,10,12,12,10,10,17,16,15,17,16, 17,16,18,19,18,17,16,16,16,16,16,15,16])
xdata = np.array([1,2,3,1,5,4,7,8,6,10,11,12,13,10,12,13,17,16,19,18,21,19,23,21,25,20,26,27,28,26,26,26,29,30,30,29,30,32,33, 1,2,3,1,5,4,7,8,6,10,11,12,13,10,12,13,17,16,19,18,21,19,23,21,25,20,26,27,28,26,26,26,29,30,30,29,30,32])
# Used a diff array to find where there is a big change in Y.
# If there's a big change in Y, then there must be a change of 'segment'.
diffy = np.diff(ydata)
# Create empty numpy arrays to append values into
filteredX = np.array([])
filteredY = np.array([])
# Chose 3 to be the value indicating the change in Y
index = np.where(diffy >3)
# Loop through the array
start = 0
for i in range (0, (index[0].size +1) ):
    # Check if last segment is reached
    if i == index[0].size:
        print(xdata[start:])
        partSize = xdata[start:].size
        # Window length must be an odd integer
        if partSize % 2 == 0:
            partSize = partSize - 1
        filteredDataX = sc.signal.savgol_filter(xdata[start:], partSize, 3)
        filteredDataY = sc.signal.savgol_filter(ydata[start:], partSize, 3)
        filteredX = np.append(filteredX, filteredDataX)
        filteredY = np.append(filteredY, filteredDataY)
    else:
        print(xdata[start:index[0][i]])
        partSize = xdata[start:index[0][i]].size
        if partSize % 2 == 0:
            partSize = partSize - 1
        filteredDataX = sc.signal.savgol_filter(xdata[start:index[0][i]], partSize, 3)
        filteredDataY = sc.signal.savgol_filter(ydata[start:index[0][i]], partSize, 3)
        start = index[0][i]
        filteredX = np.append(filteredX, filteredDataX)
        filteredY = np.append(filteredY, filteredDataY)
# Plots
plt.plot(xdata,ydata, 'bo', label = 'Input Data')
plt.plot(filteredX, filteredY, 'ro', label = 'Filtered Data')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Result')
plt.legend()
plt.show()
This is my result:
When each point is connected, the result looks as follows.
I have played around with the order, but it seems like a third order gave the best result.
I have also tried these filters, among a few others:
scipy.signal.medfilt
scipy.ndimage.filters.uniform_filter1d
But so far none of the filters I have tried were close to what I really wanted. What is the best way to filter data such as this? Looking forward to your help.
One way to get something looking close to your ideal would be clustering + linear regression.
Note that you have to provide the number of clusters and I also cheated a bit in scaling up y before clustering.
import numpy as np
from scipy import cluster, stats
ydata = np.array([1,0,1,2,1,2,1,0,1,1,2,2,0,0,1,0,1,0,1,2,7,6,8,6,8,6,6,8,6,6,8,6,6,7,6,5,5,6,6, 10,11,12,13,12,11,10,10,11,10,12,11,10,10,10,10,12,12,10,10,17,16,15,17,16, 17,16,18,19,18,17,16,16,16,16,16,15,16])
xdata = np.array([1,2,3,1,5,4,7,8,6,10,11,12,13,10,12,13,17,16,19,18,21,19,23,21,25,20,26,27,28,26,26,26,29,30,30,29,30,32,33, 1,2,3,1,5,4,7,8,6,10,11,12,13,10,12,13,17,16,19,18,21,19,23,21,25,20,26,27,28,26,26,26,29,30,30,29,30,32])
def split_to_lines(x, y, k):
    yo = np.empty_like(y, dtype=float)
    # get the cluster centers and the labels for each point
    centers, map_ = cluster.vq.kmeans2(np.array((x, y * 2)).T.astype(float), k)
    # for each cluster, use the labels to select the points belonging to
    # the cluster and do a linear regression
    for i in range(k):
        slope, interc, *_ = stats.linregress(x[map_==i], y[map_==i])
        # use the regression parameters to construct y values on the
        # best fit line
        yo[map_==i] = x[map_==i] * slope + interc
    return yo
import pylab
pylab.plot(xdata, ydata, 'or')
pylab.plot(xdata, split_to_lines(xdata, ydata, 4), 'ob')
pylab.show()
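If the number of clusters is not known in advance, a possible alternative is to reuse the jump detection from the question (a large positive step in y starts a new segment) and fit one straight line per segment with np.polyfit. This is only a sketch under that assumption: the name fit_segments, the default threshold of 3, and the tiny data set are illustrative, and it relies on the points of each segment being stored contiguously, as in the sample data.

```python
import numpy as np

def fit_segments(x, y, jump=3):
    """Fit one least-squares line per segment; a step in y larger than `jump` starts a new segment."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Index of the first point of every new segment.
    breaks = np.where(np.diff(y) > jump)[0] + 1
    fitted = np.empty_like(y)
    for idx in np.split(np.arange(y.size), breaks):
        if idx.size < 2:
            fitted[idx] = y[idx]  # a single point cannot define a line
            continue
        slope, intercept = np.polyfit(x[idx], y[idx], 1)
        fitted[idx] = slope * x[idx] + intercept
    return fitted

# Two noisy, roughly horizontal segments, separated by a jump of 9 in y.
x_demo = np.array([0, 1, 2, 3, 0, 1, 2, 3])
y_demo = np.array([0, 1, 0, 1, 10, 11, 10, 11])
print(fit_segments(x_demo, y_demo))
```

Unlike the clustering approach, this needs a jump threshold instead of a cluster count, and it returns the fitted y values in the same order as the input.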
EDIT: I figured out that the problem always occurs when one tries to plot to two different lists of figures. Does that mean that one cannot plot to different figure lists in the same loop? See the latest code for a much simpler example of the problem.
I am trying to analyze a complex data set that basically consists of measurements of electric devices under different conditions. The code is therefore somewhat complex; I tried to strip it down to a working example, but it is still pretty long. So let me explain what you see: there are 3 classes, with Transistor representing an electronic device. Its attribute Y holds the measurement data, consisting of 2 sets of measurements. Each Transistor belongs to a group (there are 2 in this example), and some groups belong to the same series (in this example there is one series that includes both groups).
The aim is to plot all measurement data for each Transistor (not shown), then to plot all data belonging to the same group in one plot per group, and all data of the same series in one plot. To program this efficiently without a lot of loops, my idea was to use the object-oriented nature of matplotlib: I have figures and subplots for each level of plotting (initialized in initGrpPlt and initSeriesPlt), which are then filled with only one loop over all Transistors (in MainPlt: toGPlt and toSPlt). In the end, the plots only need to be shown, saved to a file, or whatever (PltGrp and PltSeries).
The problem: even though I specify where to plot, Python plots the series plots into the group plots. You can check this yourself by running the code with and without the line 'toSPlt(trans,j)'. I have no clue why Python does this, because in the function toSPlt I explicitly say that Python should use the subplots from the series subplot list. Does anyone have an idea why this happens and how to solve it in an elegant way?
Read the code from the bottom to the top, that should help with understanding.
Kind regards
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
maxNrVdrain = 2
X = np.linspace(-np.pi, np.pi, 256,endpoint=True)
A = [[1*np.cos(X),2*np.cos(X),3*np.cos(X),4*np.cos(X)],[1*np.tan(X),2*np.tan(X),3*np.tan(X),4*np.tan(X)]]
B = [[2* np.sin(X),4* np.sin(X),6* np.sin(X),8* np.sin(X)],[2*np.cos(X),4*np.cos(X),6*np.cos(X),8*np.cos(X)]]
class Transistor(object):
    _TransRegistry = []
    def __init__(self,y1,y2):
        self._TransRegistry.append(self)
        self.X = X
        self.Y = [y1,y2]
        self.group = ''
class Groups():
    _GroupRegistry = []
    def __init__(self,trans):
        self._GroupRegistry.append(self)
        self.transistors = [trans]
        self.figlist = []
        self.axlist = []
class Series():
    _SeriesRegistry = []
    def __init__(self,group):
        self._SeriesRegistry.append(self)
        self.groups = [group]
        self.figlist = []
        self.axlist = []
def initGrpPlt():
    for group in Groups._GroupRegistry:
        for j in range(maxNrVdrain):
            group.figlist.append(plt.figure(j))
            group.axlist.append(group.figlist[j].add_subplot(111))
    return
def initSeriesPlt():
    for series in Series._SeriesRegistry:
        for j in range(maxNrVdrain):
            series.figlist.append(plt.figure(j))
            series.axlist.append(series.figlist[j].add_subplot(111))
    return
def toGPlt(trans,j):
    colour = cm.rainbow(np.linspace(0, 1, 4))
    group = trans.group
    group.axlist[j].plot(trans.X,trans.Y[j], color=colour[group.transistors.index(trans)], linewidth=1.5, linestyle="-")
    return
def toSPlt(trans,j):
    colour = cm.rainbow(np.linspace(0, 1, 2))
    series = Series._SeriesRegistry[0]
    group = trans.group
    if group.transistors.index(trans) == 0:
        series.axlist[j].plot(trans.X,trans.Y[j],color=colour[series.groups.index(group)], linewidth=1.5, linestyle="-", label = 'T = nan, RH = nan' )
    else:
        series.axlist[j].plot(trans.X,trans.Y[j],color=colour[series.groups.index(group)], linewidth=1.5, linestyle="-")
    return
def PltGrp(group,j):
    ax = group.axlist[j]
    ax.set_title('Test Grp')
    return
def PltSeries(series,j):
    ax = series.axlist[j]
    ax.legend(loc='upper right', frameon=False)
    ax.set_title('Test Series')
    return
def MainPlt():
    initGrpPlt()
    initSeriesPlt()
    for trans in Transistor._TransRegistry:
        for j in range(maxNrVdrain):
            toGPlt(trans,j)
            toSPlt(trans,j)#plots to group plot for some reason
    for j in range(maxNrVdrain):
        for group in Groups._GroupRegistry:
            PltGrp(group,j)
    plt.show()
    return
def Init():
    for j in range(4):
        trans = Transistor(A[0][j],A[1][j])
        if j == 0:
            Groups(trans)
        else:
            Groups._GroupRegistry[0].transistors.append(trans)
        trans.group = Groups._GroupRegistry[0]
    Series(Groups._GroupRegistry[0])
    for j in range(4):
        trans = Transistor(B[0][j],B[1][j])
        if j == 0:
            Groups(trans)
        else:
            Groups._GroupRegistry[1].transistors.append(trans)
        trans.group = Groups._GroupRegistry[1]
    Series._SeriesRegistry[0].groups.append(Groups._GroupRegistry[1])
    return
def main():
    Init()
    MainPlt()
    return
main()
Latest example that does not work:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
X = np.linspace(-np.pi, np.pi, 256,endpoint=True)
Y1 = np.cos(X)
Y2 = np.sin(X)
figlist1 = []
figlist2 = []
axlist1 = []
axlist2 = []
for j in range(4):
    figlist1.append(plt.figure(j))
    axlist1.append(figlist1[j].add_subplot(111))
    figlist2.append(plt.figure(j))#this should be a new set of figures!
    axlist2.append(figlist2[j].add_subplot(111))
    colour = cm.rainbow(np.linspace(0, 1, 4))
    axlist1[j].plot(X,j*Y1, color=colour[j], linewidth=1.5, linestyle="-")
    axlist1[j].set_title('Test Grp 1')
    colour = cm.rainbow(np.linspace(0, 1, 4))
    axlist2[j].plot(X,j*Y2, color=colour[int(j/2)], linewidth=1.5, linestyle="-")
    axlist2[j].set_title('Test Grp 2')
plt.show()
OK, this was a silly mistake once one thinks about the background, but maybe someone else has a similar problem and cannot see the cause, just as I couldn't at first. So here is the solution:
The problem is that the names of list entries such as figlist1[j] do not define the figure; they are just references to the actual figure object. If such an object is created by plt.figure(j), one has to make sure that j is different for each figure. Hence, in a loop where multiple figures are to be initialized, one needs to change the figure number somehow; otherwise plt.figure(j) returns the already existing figure and both list entries end up referring to the same figure. Hope that helps! Cheers.
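The behaviour can be checked in a few lines. The sketch below first shows that calling plt.figure with the same number twice returns the same object, and then builds the two lists by letting pyplot choose fresh numbers. Selecting the Agg backend is only an assumption made here so that the sketch runs without a display; remove that line to get windows.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

first = plt.figure(0)
second = plt.figure(0)   # same number: the existing figure is returned
print(first is second)   # True

# Letting pyplot pick the numbers gives eight distinct figures.
figlist1, figlist2, axlist1, axlist2 = [], [], [], []
for j in range(4):
    figlist1.append(plt.figure())
    axlist1.append(figlist1[j].add_subplot(111))
    figlist2.append(plt.figure())
    axlist2.append(figlist2[j].add_subplot(111))

print(len(plt.get_fignums()))  # 9: figure 0 plus the eight new ones
```

An explicit numbering scheme, such as plt.figure(j) for the first list and plt.figure(j + 4) for the second, would work equally well.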
I have python code to create a bezier curve, from which I create a bezier path.
Here are my imports:
from svgpathtools import Path, Line, CubicBezier
Here is my code:
bezier_curve = CubicBezier(start_coordinate, control_point_1, control_point_2, end_coordinate)
bezier_path = Path(bezier_curve)
I would like to create a list of coordinates that make up this curve, but none of the documentation I am reading gives a straightforward way to do that. bezier_curve and bezier_path only have parameters for the start point, end point, and control point.
Seems like a pretty reasonable question. Surprised there's no answer. I had to do this myself recently, and the secret is point().
Here's how I got it done, using your boilerplate as a starting point:
from svgpathtools import Path, Line, CubicBezier
bezier_curve = CubicBezier(start=(300+100j), control1=(100+100j), control2=(200+200j), end=(200+300j))
bezier_path = Path(bezier_curve)
NUM_SAMPLES = 10
myPath = []
for i in range(NUM_SAMPLES):
    myPath.append(bezier_path.point(i/(NUM_SAMPLES-1)))
print(myPath)
Output:
[(300+100j), (243.8957475994513+103.56652949245542j), (206.72153635116598+113.71742112482853j), (185.1851851851852+129.62962962962962j), (175.99451303155004+150.480109739369j), (175.85733882030178+175.44581618655695j), (181.4814814814815+203.7037037037037j), (189.57475994513032+234.43072702331963j), (196.84499314128942+266.8038408779149j), (200+300j)]
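For reference, the sampled values above agree with the standard Bernstein form of a cubic Bézier curve, so they can also be reproduced in plain Python without svgpathtools. This is a sketch of the textbook formula, not a description of the library's internals; cubic_bezier_point is a hypothetical helper name.

```python
def cubic_bezier_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t (0 <= t <= 1)."""
    u = 1 - t
    return u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3

NUM_SAMPLES = 10
samples = [
    cubic_bezier_point(300+100j, 100+100j, 200+200j, 200+300j, i / (NUM_SAMPLES - 1))
    for i in range(NUM_SAMPLES)
]
# Convert the complex numbers to plain (x, y) tuples.
xy = [(p.real, p.imag) for p in samples]
print(xy)
```

The last step is also useful with the svgpathtools output, since point() returns complex numbers whose real part is x and whose imaginary part is y.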
The answer given above worked very well for me. I only had to make a tiny modification to the code:
from svgpathtools import Path, Line, CubicBezier
bezier_curve = CubicBezier(start=(300+100j), control1=(100+100j), control2=(200+200j), end=(200+300j))
bezier_path = Path(bezier_curve)
NUM_SAMPLES = 10
myPath = []
for i in range(NUM_SAMPLES):
    myPath.append(bezier_path.point(i/(float(NUM_SAMPLES)-1)))
print(myPath)
Replacing i/(NUM_SAMPLES-1) with i/(float(NUM_SAMPLES)-1) ensures correct behavior when the curve is parametrized from 0 to 1. Otherwise, Python 2 performs an integer division, so every parameter except the last one evaluates to 0. In Python 3, / is already true division, so the change is not needed there.
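The difference can be reproduced in Python 3 with the floor-division operator //, which behaves like Python 2's / on two integers. A minimal sketch:

```python
NUM_SAMPLES = 10

# True division (Python 3 "/"): parameters spread evenly over [0, 1].
t_true = [i / (NUM_SAMPLES - 1) for i in range(NUM_SAMPLES)]

# Floor division ("//"), which is what "/" does to two ints in Python 2.
t_floor = [i // (NUM_SAMPLES - 1) for i in range(NUM_SAMPLES)]

print(t_true[:3])
print(t_floor)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

With the floored parameters, nine of the ten samples would sit on the start point of the curve.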
# Demonstrates lines and cubics, with more readable names
from svgpathtools import Path, Line, CubicBezier
cubic = CubicBezier(300+100j, 100+100j, 200+200j, 200+300j) # A cubic beginning at (300, 100) and ending at (200, 300)
line = Line(200+300j, 250+350j) # A line beginning at (200, 300) and ending at (250, 350)
number_of_points = 10
cubic_points = []
for i in range(number_of_points):
    cubic_points.append(cubic.point(i/(number_of_points-1)))
print('cubic points', cubic_points)
line_points = []
for i in range(number_of_points):
    line_points.append(line.point(i/(number_of_points-1)))
print('line points', line_points)
I needed a more general form to extract multiple paths with different features. It's a recursive solution that works with a path, a list of paths, a list of segments, and single segments alike. You can specify sample_points per segment for curves, but a line stays at 2 points so that no unnecessary points are added:
import numpy as np
import svgpathtools.path
def svgpathtools_unpacker(obj, sample_points=10):
    path = []
    if isinstance(obj, (svgpathtools.path.Path, list)):
        # containers: unpack every element recursively
        for i in obj:
            path.extend(svgpathtools_unpacker(i, sample_points=sample_points))
    elif isinstance(obj, svgpathtools.path.Line):
        # a line is fully described by its two end points
        path.extend(obj.bpoints())
    elif isinstance(obj, (svgpathtools.path.CubicBezier, svgpathtools.path.QuadraticBezier)):
        path.extend(obj.points(np.linspace(0,1,sample_points)))
    else:
        # unsupported type: report it and contribute no points
        print(type(obj))
    return np.array(path)
and how you can use it:
import matplotlib.pyplot as plt
bezier_curve = svgpathtools.path.CubicBezier(start=(300+100j), control1=(100+100j), control2=(200+200j), end=(200+300j))
bezier_quad = svgpathtools.path.QuadraticBezier(bezier_curve.end, control=(200+200j), end=(300+150j))
line = svgpathtools.path.Line(start=bezier_quad.end, end=bezier_curve.start)
bezier_path = svgpathtools.path.Path(bezier_curve, bezier_quad, line)
plt.figure()
for i in [10, 100]:
    xy = svgpathtools_unpacker(bezier_path, sample_points=i)
    plt.plot(xy.real, xy.imag, label=f'sample_points={i}')
plt.legend()
plt.show()
I wrote some code to shift an array, and was trying to generalize it to handle non-integer shifts using the "shift" function in scipy.ndimage. The data is circular and so the result should wrap around, exactly as the np.roll command does it.
However, scipy.ndimage.shift does not appear to wrap integer shifts properly. The following code snippet shows the discrepancy:
import numpy as np
import scipy.ndimage as sciim
import matplotlib.pyplot as plt
def shiftfunc(data, amt):
    return sciim.interpolation.shift(data, amt, mode='wrap', order = 3)
if __name__ == "__main__":
    xvals = np.arange(100)*1.0
    yvals = np.sin(xvals*0.1)
    rollshift = np.roll(yvals, 2)
    interpshift = shiftfunc(yvals, 2)
    plt.plot(xvals, rollshift, label = 'np.roll', alpha = 0.5)
    plt.plot(xvals, interpshift, label = 'interpolation.shift', alpha = 0.5)
    plt.legend()
    plt.show()
It can be seen that the first couple of values are highly discrepant, while the rest are fine. I suspect this is an implementation error of the prefiltering and interpolation operation when using the wrap option. A way around this would be to modify shiftfunc to revert to np.roll when the shift value is an integer, but this is unsatisfying.
Am I missing something obvious here?
Is there a way to make ndimage.shift coincide with np.roll?
I don't think there is anything wrong with the shift function. When you use roll, you need to drop an extra element for a fair comparison. Please see the code below.
import numpy as np
import scipy.ndimage as sciim
import matplotlib.pyplot as plt
def shiftfunc(data, amt):
    return sciim.interpolation.shift(data, amt, mode='wrap', order = 3)
def rollfunc(data,amt):
    # roll the array that was passed in, not the global yvals
    rollshift = np.roll(data, amt)
    # Here I remove one element (first one before rollshift) from the array
    return np.concatenate((rollshift[:amt], rollshift[amt+1:]))
if __name__ == "__main__":
    shift_by = 5
    xvals = np.linspace(0,2*np.pi,20)
    yvals = np.sin(xvals)
    rollshift = rollfunc(yvals, shift_by)
    interpshift = shiftfunc(yvals,shift_by)
    plt.plot(xvals, yvals, label = 'original', alpha = 0.5)
    plt.plot(xvals[1:], rollshift, label = 'np.roll', alpha = 0.5,marker='s')
    plt.plot(xvals, interpshift, label = 'interpolation.shift', alpha = 0.5,marker='o')
    plt.legend()
    plt.show()
results in
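The need to drop one element is consistent with how the SciPy documentation describes mode='wrap': the input is wrapped so that the last and the first sample overlap, which gives a period of one sample less than np.roll uses. As far as I know, SciPy 1.6 and later also offer mode='grid-wrap', which wraps with the full array length. The sketch below assumes such a SciPy version and calls scipy.ndimage.shift directly instead of going through the interpolation submodule.

```python
import numpy as np
from scipy import ndimage

yvals = np.sin(np.arange(100) * 0.1)

rolled = np.roll(yvals, 2)
# 'grid-wrap' (assumes SciPy >= 1.6) wraps with a period of len(yvals) samples.
grid_wrapped = ndimage.shift(yvals, 2, mode='grid-wrap', order=3)
# 'wrap' treats the first and the last sample as the same point.
wrapped = ndimage.shift(yvals, 2, mode='wrap', order=3)

# Compare both modes against np.roll for this integer shift.
print(np.allclose(rolled, grid_wrapped, atol=1e-6))
print(np.abs(rolled - wrapped).max())
```

For non-integer shifts, np.roll has no equivalent, but 'grid-wrap' keeps the same periodic convention, so integer and fractional shifts stay consistent with each other.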