I have a new dataset in which the first two columns are X and Y points (in general they represent locations). The data files are pretty large, and for initial data analysis I need to extract specific lines (or data close to a line). Is there any way to tell numpy (or Python, or pandas) to extract this specific subset? Below is an example, which is simplified and uses nicely rounded data (not the case with the real data), but it clearly shows what I need:
Example EDIT*
import os
import sys
import numpy as np
X = list(range(45))*3
Y = list(range(1, 91, 2)) + list(range(20, 65, 1)) + list(range(1, 136, 3))
XY = list(zip(X, Y))  # zip is lazy in Python 3, so build the list first
XYarray = np.array(XY).reshape(135, 2)
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
plt.plot(XYarray[:,0], XYarray[:,1], 'ro') # all data
plt.plot(XYarray[0:45,0], XYarray[0:45,1], 'b*') # first line to be tested
#plt.plot(XYarray[45:90,0], XYarray[45:90,1], 'g*') # other line of interest
#plt.plot(XYarray[90:135,0], XYarray[90:135,1], 'gx') # other line of interest
plt.show()
All my data lie within an arbitrary XY array (surface spatial data). I need to extract the lines available in it; for example, I want to extract only the uncommented blue-star line, and then move on to the next ones (currently commented out in the code).
Keep in mind that my actual data are not that regular
Hope that helps
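One possible approach, as a minimal sketch only: if a candidate line can be written as y = slope*x + intercept, you can keep the points whose perpendicular distance to it is below a tolerance. The helper name points_near_line, the slope/intercept values and the tolerance are all assumptions made for illustration; with real, irregular data you would have to estimate the line parameters first and pick a tolerance that matches the scatter.

```python
import numpy as np

def points_near_line(xy, slope, intercept, tol):
    """Boolean mask of the rows of xy lying within tol of y = slope*x + intercept."""
    x, y = xy[:, 0], xy[:, 1]
    # perpendicular distance from (x, y) to the line slope*x - y + intercept = 0
    dist = np.abs(slope * x - y + intercept) / np.hypot(slope, 1.0)
    return dist <= tol

# same synthetic data as in the example above
X = np.tile(np.arange(45), 3)
Y = np.concatenate([np.arange(1, 91, 2), np.arange(20, 65), np.arange(1, 136, 3)])
xy = np.column_stack([X, Y])

# hypothetical parameters: the first line in the example is y = 2*x + 1
mask = points_near_line(xy, slope=2.0, intercept=1.0, tol=0.1)
first_line = xy[mask]   # points on (or very close to) that line
remaining = xy[~mask]   # everything else, ready for the next line
```

Note that points from other lines that happen to cross the selected one are picked up too, which may or may not be what you want.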
Related
I would like to plot in 3D with Pandas / Matplotlib (wireframe or other, I do not mind), but in a specific way.
I'm using RFID sensors and I'm trying to record the signal I receive at different distances and different angles. I want to see how the signal correlates with increasing distance and angle.
That's why I want to plot in 3D:
X axis -> the distance, Y axis -> the angle, Z axis -> the received signal, which is a float
My CSV file, from which I generate my DataFrame, is organized as a double-entry table like this:
Distance;0;23;45;90;120;180
0;-53.145;-53.08;-53.1;-53.035;-53.035;-53.035
5;-53.145;-53.145;-53.05;-53.145;-53.145;-53.145
15;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
25;-53.145;-52.145;-53.145;-53.002;-53.145;-53.145
40;-53.145;-53.002;-51.145;-53.145;-54.255;-53.145
60;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
80;-53.145;-53.145;-53.145;-53.145;-60;-53.145
100;-53.145;-52;-53.145;-54;-53.145;-53.145
120;-53.145;-53.145;-53.145;-53.145;-53.002;-53.145
140;-51.754;-53.145;-51.845;-53.145;-53.145;-53.145
160;-53.145;-53.145;-49;-53.145;-53.145;-53.145
180;-53.145;-53.145;-53.145;-53.145;-53.145;-53.002
200;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
In the first (header) row we have the different angles: 0°, 23°, 45°, ...
The index of the DataFrame is the distance: 0 cm, 15 cm, ...
The matrix inside holds the signal, i.e. the values for the Z axis.
But I do not know how to generate a 3D scatter or wireframe plot, because in every tutorial I see, people use specific columns as axes.
In my CSV file, however, the first row holds the labels of all the columns:
Distance;0 ;23 ;45 ;90 ;120;180
And I do not know how to generate a 3D plot from a double-entry table.
Do you know how to do it? Or how to lay out my CSV file in a better way so that I get the same result in the end?
I would be grateful for any help with this!
Thank you !
Maybe a contour plot is enough:
import numpy as np
import matplotlib.pyplot as plt
b = np.array([0,5,15,25,40,60,80,100,120,140,160,180,200])  # distances
a = np.array([0,23,45,90,120,180])  # angles
x, y = np.meshgrid(a, b)
z = np.random.randint(-50,-40, (x.shape))  # random stand-in for the signal
scm = plt.contourf(x, y, z, cmap='inferno')
plt.colorbar(scm)
plt.xticks(a)
plt.yticks(b)
plt.xlabel('Angle')
plt.ylabel('Distance')
plt.show()
You can get a contour plot with something like this (for the data shown it is not very interesting, since almost all the values are the same):
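import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv', sep=';')  # 'data.csv' is a placeholder; use the path to your file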
df = df.set_index('Distance')
x = df.index
y = df.columns.astype(int)
z = df.values
X,Y = np.meshgrid(x,y)
Z = z.T
plt.contourf(X,Y,Z,cmap='jet')
plt.colorbar()
plt.show()
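On the "better way to lay out the CSV" part of the question: one option is to reshape the double-entry table into long form, with one row per (distance, angle) pair, which is the layout most scatter-plot tutorials assume. This is only a sketch; the column names Angle and Signal are my own choice, and the inline text is a three-row excerpt of the table from the question.

```python
import io
import pandas as pd

# three rows copied from the table in the question, inlined so the snippet runs on its own
csv_text = """Distance;0;23;45;90;120;180
0;-53.145;-53.08;-53.1;-53.035;-53.035;-53.035
5;-53.145;-53.145;-53.05;-53.145;-53.145;-53.145
15;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
"""

wide = pd.read_csv(io.StringIO(csv_text), sep=';')

# one row per (Distance, Angle) pair; 'Angle' and 'Signal' are illustrative names
long_df = wide.melt(id_vars='Distance', var_name='Angle', value_name='Signal')
long_df['Angle'] = long_df['Angle'].astype(int)  # header labels are read as strings

print(long_df.head())
```

The three columns can then be passed directly as x, y and z to a 3D scatter plot.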
Welcome to Stack Overflow. Your question can be split into several steps:
Step 1 - read the data
I have stored your data in a file called data.txt.
I don't know Pandas very well, but this can also be handled with NumPy's simple loadtxt function. Your data is a bit awkward because of the text value 'Distance' in the first row and first column. But don't panic: we load the file as a matrix of strings:
raw_data = np.loadtxt('data.txt', delimiter=';', dtype=str)
Step 2 - transform the raw data
To extract the wanted data from the raw data we can do the following:
angle = raw_data[0 , 1:].astype(float)
distance = raw_data[1:, 0 ].astype(float)
data = raw_data[1:, 1:].astype(float)
By indexing the raw data we select the parts we want, and with astype we convert the string values to numbers.
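To see what the slicing in steps 1 and 2 produces without needing the file on disk, here is a small self-contained sketch. It reads a three-row excerpt of the table from an in-memory string (the io.StringIO stand-in for data.txt is my assumption, not part of the original recipe).

```python
import io
import numpy as np

# excerpt of the table, inlined in place of data.txt
text = """Distance;0;23;45;90;120;180
0;-53.145;-53.08;-53.1;-53.035;-53.035;-53.035
5;-53.145;-53.145;-53.05;-53.145;-53.145;-53.145
15;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
"""

# everything is read as strings because of the 'Distance' label in the corner
raw_data = np.loadtxt(io.StringIO(text), delimiter=';', dtype=str)

angle = raw_data[0, 1:].astype(float)      # header row without the corner label
distance = raw_data[1:, 0].astype(float)   # first column without the corner label
data = raw_data[1:, 1:].astype(float)      # the signal matrix itself

print(angle.shape, distance.shape, data.shape)
```

The signal matrix ends up with one row per distance and one column per angle, which is the orientation the later steps rely on.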
Intermediate step - making the data a bit fancier
Your data was a bit boring, with nearly the same value everywhere, so I took the liberty of making it a bit fancier:
data = (50 + angle[np.newaxis,:]) / (10 + np.sqrt(distance[:,np.newaxis]))
Step 4 - make a wireframe plot
The example at matplotlib.org looks easy enough:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(X, Y, Z)
plt.show()
But the trick is to get the X, Y, Z parameters right...
Step 3 - make the X and Y data
The Z data is simply our data values:
Z = data
X and Y should also be 2D arrays, so that plot_wireframe can find the x and y for each value of Z at the same array location in X and Y. There is a NumPy function to create these 2D arrays:
X, Y = np.meshgrid(angle, distance)
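A tiny illustration of what meshgrid returns here may help. The short angle and distance arrays below are made-up values for the sketch, not the full data.

```python
import numpy as np

angle = np.array([0.0, 23.0, 45.0])           # runs along the columns
distance = np.array([0.0, 5.0, 15.0, 25.0])   # runs along the rows

X, Y = np.meshgrid(angle, distance)

print(X.shape, Y.shape)   # one row per distance, one column per angle
print(X[2, 1], Y[2, 1])   # the angle and distance that belong to Z[2, 1]
```

Every row of X is a copy of angle and every column of Y is a copy of distance, so X, Y and Z all share the same shape.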
Step 5 - fancying it up a bit
ax.set_xticks(angle)
ax.set_yticks(distance[::2])
ax.set_xlabel('angle')
ax.set_ylabel('distance')
Putting it together
All steps together in the right order:
# necessary imports...
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
import numpy as np
raw_data = np.loadtxt('data.txt', delimiter=';', dtype=str)
angle = raw_data[0 , 1:].astype(float)
distance = raw_data[1:, 0 ].astype(float)
data = raw_data[1:, 1:].astype(float)
# make the example data a bit more interesting...
data = (50 + angle[np.newaxis,:]) / (10 + np.sqrt(distance[:,np.newaxis]))
# setting up the plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# the tricky part: creating the data that plot_wireframe wants
Z = data
X, Y = np.meshgrid(angle, distance)
ax.plot_wireframe(X, Y, Z)
# fancying it up a bit
ax.set_xticks(angle)
ax.set_yticks(distance[::2])
ax.set_xlabel('angle')
ax.set_ylabel('distance')
# and showing the plot ...
plt.show()
I am trying to plot a single line (or tube) in Mayavi that has a non-constant width or radius. This seems like a simple task though I may not be understanding what is happening behind the scenes well enough to make this happen.
The following code creates the line I want, and I am able to scale by color; however, I would also like to scale by width.
import mayavi.mlab as mlab
import numpy as np
x = range(100)
y = range(100)
z = range(100)
s = np.random.uniform(0, 1, 100)
mlab.plot3d(x, y, z, s, tube_radius=10)
I don't have an image of the desired output as I am unable to create it, though it would essentially be the preceding image scaled by radius instead of color, so that some areas of the line would be wider than other areas. One possible solution would be to use the tube_radius parameter and plot each section individually, though this really seems like poor practice as the lines can get quite long and have many different sections.
In the GUI, you can go to the Tube pipeline and use Vary_radius = 'vary_radius_by_scalar'
In the script you can do
import mayavi.mlab as mlab
import numpy as np
x = range(100)
y = range(100)
z = range(100)
s = np.random.uniform(0, 1, 100)
t = mlab.plot3d(x, y, z, s, tube_radius=10)
t.parent.parent.filter.vary_radius = 'vary_radius_by_scalar'
This works because the parent of the surface is the module manager (colors, etc.), and its parent is the Tube pipeline.
Is there a way to extract the data from an array, which corresponds to a line of a contourplot in python? I.e. I have the following code:
n = 100
x, y = np.mgrid[0:1:n*1j, 0:1:n*1j]
plt.contour(x,y,values)
where values is a 2D array of data (I stored the data in a file, but it does not seem possible to upload it here). The picture below shows the corresponding contour plot. My question is whether it is possible to get exactly the data from values that corresponds to, e.g., the left contour line in the plot.
Worth noting here, since this post was the top hit when I had the same question, that this can be done with scikit-image much more simply than with matplotlib. I'd encourage you to check out skimage.measure.find_contours. A snippet of their example:
import numpy as np
from skimage import measure
x, y = np.ogrid[-np.pi:np.pi:100j, -np.pi:np.pi:100j]
r = np.sin(np.exp((np.sin(x)**3 + np.cos(y)**2)))
contours = measure.find_contours(r, 0.8)
which can then be plotted/manipulated as you need. I like this more because you don't have to get into the deep weeds of matplotlib.
plt.contour returns a QuadContourSet. From that, we can access the individual lines using:
cs.collections[0].get_paths()
This returns all the individual paths. To access the actual x, y locations, we need to look at the vertices attribute of each path. The first contour drawn should be accessible using:
X, Y = cs.collections[0].get_paths()[0].vertices.T
See the example below to see how to access any of the given lines. In the example I only access the first one:
import matplotlib.pyplot as plt
import numpy as np
n = 100
x, y = np.mgrid[0:1:n*1j, 0:1:n*1j]
values = x**0.5 * y**0.5
fig1, ax1 = plt.subplots(1)
cs = plt.contour(x, y, values)
lines = []
for line in cs.collections[0].get_paths():
    lines.append(line.vertices)
fig1.savefig('contours1.png')
fig2, ax2 = plt.subplots(1)
ax2.plot(lines[0][:, 0], lines[0][:, 1])
fig2.savefig('contours2.png')
contours1.png:
contours2.png:
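As a possible alternative to going through collections (which, as far as I know, newer Matplotlib releases deprecate), the returned contour set also exposes allsegs: one list per contour level, each holding (N, 2) arrays of vertices. The sketch below reuses the same synthetic values; the explicit levels and the Agg backend (so that it runs without a display) are my own choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

n = 100
x, y = np.mgrid[0:1:n*1j, 0:1:n*1j]
values = x**0.5 * y**0.5

fig, ax = plt.subplots()
# explicit levels chosen for this sketch so the level index is predictable
cs = ax.contour(x, y, values, levels=[0.25, 0.5, 0.75])

# cs.allsegs[i] lists the line segments drawn for cs.levels[i]
segments = cs.allsegs[1]   # all segments of the 0.5 contour
line = segments[0]         # first segment, shape (N, 2)
line_x, line_y = line[:, 0], line[:, 1]
```

A level can consist of several disconnected segments, so in general you would loop over everything in cs.allsegs[i] instead of taking only the first entry.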
plt.contour returns a QuadContourSet which holds the data you're after.
See Get coordinates from the contour in matplotlib? (which this question is probably a duplicate of...)
I have got this code to generate a surface plot. But it gives a zero division error. I am not able to figure out what is wrong. Thank you.
import pylab, csv
import numpy
from mayavi.mlab import *
def getData(fileName):
    try:
        data = csv.reader(open(fileName,'rb'))
    except:
        print 'File not found'
    else:
        data = [[float(row[0]), float(row[1]),float(row[2])] for row in data]
        x = [row[0] for row in data]
        y = [row[1] for row in data]
        z = [row[2] for row in data]
        return (x, y, z)
def plotData(fileName):
    xVals, yVals, zVals = getData(fileName)
    xVals = pylab.array(xVals)
    yVals = pylab.array(yVals)
    zVals = (pylab.array(zVals)*10**3)
    x, y = numpy.mgrid[-0.5:0.5:0.001, -0.5:0.5:0.001]
    s = surf(x, y, zVals)
    return s
plotData('data')
If I have understood the code correctly, there is a problem with zVals in mayavi.mlab.surf.
According to the documentation of the function, s is the elevation matrix, a 2D array, where indices along the first array axis represent x locations, and indices along the second array axis represent y locations. Your file reader seems to return a 1D vector instead of an array.
However, this may not be the most difficult problem. Your file seems to contain triplets of x, y, and z coordinates. You can use mayavi.mlab.surf only if your x and y coordinates in the file form a regular square grid. If this is the case, then you just have to recover that grid and form nice 2D arrays of all three parts. If the points are in the file in a known order, it is easy, otherwise it is rather tricky.
Maybe you would want to start with mayavi.mlab.points3d(xVals, yVals, zVals). That will give you an overall impression of your data. (Or, if you already know more about your data, you might give us a hint by editing your question and adding more information!)
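If the x and y coordinates really do form a complete rectangular grid but the rows are in an unknown order, the 2D arrays can be recovered along these lines. This is a sketch under that assumption only; triplets_to_grid is a hypothetical helper name, and it relies on repeated coordinates being exactly equal, so noisy floating-point coordinates would need rounding first.

```python
import numpy as np

def triplets_to_grid(x, y, z):
    """Rebuild a 2D elevation array from (x, y, z) rows that cover a full regular grid."""
    x, y, z = (np.asarray(a, dtype=float).ravel() for a in (x, y, z))
    xs, ix = np.unique(x, return_inverse=True)   # sorted x values, and each row's x index
    ys, iy = np.unique(y, return_inverse=True)   # sorted y values, and each row's y index
    if xs.size * ys.size != z.size:
        raise ValueError("points do not fill a complete rectangular grid")
    grid = np.full((xs.size, ys.size), np.nan)
    grid[ix, iy] = z   # first axis follows x, second axis follows y
    return xs, ys, grid

# small made-up example: a 3 x 2 grid stored in shuffled order
gx, gy = np.meshgrid([0.0, 1.0, 2.0], [10.0, 20.0], indexing='ij')
gz = gx * 100 + gy
order = np.random.default_rng(0).permutation(gz.size)
xs, ys, grid = triplets_to_grid(gx.ravel()[order], gy.ravel()[order], gz.ravel()[order])
```

The resulting grid has x along the first axis and y along the second, which matches the layout the surf documentation describes for the elevation matrix.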
Just to give you an idea of a slightly more Pythonic way of writing this, here is your code rewritten (with surf replaced):
import mayavi.mlab as ml
import numpy
def plot_data(filename):
    data = numpy.loadtxt(filename)
    xvals = data[:,0]
    yvals = data[:,1]
    zvals = data[:,2] * 1000.
    return ml.points3d(xvals, yvals, zvals)
plot_data('data')
(Essential changes: the use of numpy.loadtxt, get rid of pylab namespace here, no import *, no CamelCase variable or function names. For more information, see PEP 8.)
If you only need to see the shape of the surface, and the data in the file is ordered row-by-row and with the same number of data points in each row (i.e. fixed number of columns), then you may use:
import mayavi.mlab as ml
import numpy
import matplotlib.pyplot as plt
# whatever you have as the number of points per row
columns = 13
filename = 'data'
data = numpy.loadtxt(filename)
# draw the data points into an XY plane to check that they really form a rectangular grid:
plt.plot(data[:,0], data[:,1])
# draw the surface
zvals = data[:,2].reshape(-1,columns)
ml.surf(zvals, warp_scale='auto')
As you can see, this code allows you to check that your values really are in the right kind of grid. It does not check that they are in the correct order, but at least you can see they form a nice grid. Also, you have to input the number of columns manually. The keyword warp_scale takes care of the surface scaling so that it should look reasonable.
I have the following script, which reads an ASCII file of two columns and generates a 1D plot. The graph has several peaks. What I want is to give every peak a number: 1 for the first peak, 2 for the second, and so on. The peaks appear at equidistant positions on the X axis. Can someone tell me how to do that in Python? The code:
from pylab import*
# Read the file.
f2 = open('d012_SAXS-recomb.txt', 'r')
# read the whole file into a single variable, which is a list of every row of the file.
lines = f2.readlines()[2:-100]
f2.close()
# initialize some variable to be lists:
x1 = []
y1 = []
# scan the rows of the file stored in lines, and put the values into some variables:
for line in lines:
    p = line.split()
    x1.append(float(p[0]))
    y1.append(float(p[1]))
x = np.array(x1)
y = np.array(y1)
xlim(0.0,4.0)
# now, plot the data:
#subplot(211)
plt.plot(x, y, color='orange',linewidth=2.0, linestyle='-', label='Arabic - LPP''\nRoman - SPP''\nAsterisk - CHOL')
legend(loc='upper right')
xlabel('q')
ylabel('Intensity')
plt.show()
Here's some example code that finds the first (highest) peak. (BTW, I'm using pylab here, so the plot and numpy modules are already imported).
x = linspace(0,10,501)
y = exp(-0.2*x)*sin(x)
k = y.argmax()
plot(x,y)
text(x[k],y[k],'Peak1')
Try that to get started.
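To number every peak and not only the highest one, a simple extension is to look for samples that are larger than both neighbours. This is a minimal sketch on the same synthetic curve; local_maxima is a hypothetical helper, and real (noisy) data would probably need smoothing or a height threshold first, for example via scipy.signal.find_peaks.

```python
import numpy as np

def local_maxima(y):
    """Indices of interior samples that are strictly greater than both neighbours."""
    y = np.asarray(y)
    interior = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])
    return np.flatnonzero(interior) + 1   # +1 because the comparison skipped y[0]

x = np.linspace(0, 10, 501)
y = np.exp(-0.2 * x) * np.sin(x)

peaks = local_maxima(y)
# (x position, y position, label) for each peak, numbered from left to right
labels = [(x[k], y[k], str(i)) for i, k in enumerate(peaks, start=1)]

# with pylab loaded, the labels could then be drawn with:
#     for xk, yk, s in labels:
#         text(xk, yk, s)
```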