Line/surface of best fit for 3D scatterplot from Excel data - python

I have a database of the height, weight, and age of 100s of people. Using matplotlib, I've been able to create a 3D scatterplot of these 3 variables with the xyz co-ordinates of each point representing the (height,weight,age) of one person.
Is it possible to create a (i) line (ii) surface of best fit for the data? The meshgrid would be incomplete since I don't have an age (z) value for each pairing of height and weight (x,y) values. Can we draw the line/surface regardless? Do I have to impute the missing z-values in the meshgrid, and if so, how would I do that?
Most other answers I've seen on this topic assume z is a function of x and y, or that the meshgrid is complete, neither of which is the case here.

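Can you try using numpy.meshgrid and then filling the unknown z values with numpy.nan? Matplotlib should skip numpy.nan values when plotting.
By 'best fit', did you mean an interpolation? If so, you can pass your data through a scipy.interpolate routine. Note that scipy.interpolate.RectBivariateSpline expects z values on a complete rectangular grid, so for scattered points like yours scipy.interpolate.griddata is probably the better match.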
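If none of the three variables is treated as dependent on the others, one option is an orthogonal (total least squares) fit, which needs no meshgrid at all. Below is a minimal sketch, assuming the data is available as an (n, 3) array of (height, weight, age) rows. The function name fit_line_and_plane and the synthetic demo_points are hypothetical and only there for illustration.

```python
import numpy as np

def fit_line_and_plane(points):
    """Orthogonal least-squares line and plane through an (n, 3) point cloud.

    Returns the centroid, the unit direction of the best-fit line, and the
    unit normal of the best-fit plane. Both pass through the centroid.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing spread.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    direction = vt[0]   # direction of largest spread: the best-fit line
    normal = vt[-1]     # direction of smallest spread: normal of the best-fit plane
    return centroid, direction, normal

# Synthetic stand-in for the real data (assumption): noisy points along a known line.
rng = np.random.default_rng(0)
true_dir = np.array([1.0, 2.0, 2.0]) / 3.0
t = rng.uniform(-1.0, 1.0, 200)
demo_points = (np.array([170.0, 70.0, 40.0])
               + 10.0 * np.outer(t, true_dir)
               + rng.normal(scale=0.1, size=(200, 3)))

centroid, direction, normal = fit_line_and_plane(demo_points)
```

To draw the line, plot centroid + s * direction for a range of s values with ax.plot. The plane is the set of points p with normal · (p - centroid) = 0, which can be evaluated on a small meshgrid and passed to ax.plot_surface. If the variables have very different units, it may be worth standardizing them first, since the fit depends on scale.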

Related

In which fields is 3D graph plotting used?

We know that we can plot 3D graphs in Matplotlib, but in which fields is this used, and for what purpose?
3D graphs are used when you need to establish the relationship between three variables (x, y and z).
They can be used in the following fields:
1) Geography: x and y are used as latitude and longitude, and z as altitude, to model terrain such as hills, buildings, etc.
2) Geometry: 3D plotting is used to visualize 3D objects such as planes, spheres and cubes in three-dimensional space.
3) Statistics: 3D plots such as 3D bar charts and scatter plots are used to compare two variables against a third.
There are many other fields where 3D plots are used instead of 2D plots because they convey more information visually.
When you are working with three variables and want to plot them against each other to identify the relationship between them, you should use 3D graphs.
There can be many use cases for 3D plots:
To describe the position of a point in a plane when the position varies with time, we need 3 measurements: x distance, y distance, and time elapsed.
To describe the position of a point in 3D space, we need 3 measurements: x distance, y distance and z distance.
To describe the position of a point in 3D space when the position varies with time, we need 4 measurements: x distance, y distance, z distance and time.
Each of those measurements represents a dimension, and each dimension requires its own axis.
See the docs for more information.

Heatmap from 1D array or list, what should x-axis be?

I have a 1D array consisting of over 100,000 values. If I were to plot it on a scatterplot, it would pretty much just be one solid color block. So, I want to use a heatmap instead.
I saw various methods, but they either want a 2D array or have "x" and "y" values. If my 1D array values were y, what should the x-axis be? I only want to see how highly "concentrated" those values are in one area of the plot.
plt.imshow() requires at least a 2D array.
plt.pcolormesh() requires X and Y.
plt.hexbin() requires X and Y.
np.histogram2d() requires X and Y (from an example I saw).
Thank you.
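One possible reading, offered as a sketch and not a definitive answer: if the array is ordered (for example, samples over time), the x-axis can simply be the index of each value. Binning index against value with np.histogram2d then gives a 2D array of counts that the 2D plotting functions accept. The synthetic values array and the bin counts below are assumptions for illustration.

```python
import numpy as np

# Stand-in for the real 1D array (assumption): 100,000 random values.
rng = np.random.default_rng(0)
values = rng.normal(size=100_000)

# Use the position in the array as the x coordinate.
index = np.arange(values.size)

# heat[i, j] = number of values that fall in index bin i and value bin j.
heat, x_edges, y_edges = np.histogram2d(index, values, bins=(50, 40))
```

The result can be shown with plt.imshow(heat.T, origin='lower', aspect='auto', extent=[x_edges[0], x_edges[-1], y_edges[0], y_edges[-1]]). If the order of the values carries no meaning, a plain 1D histogram (plt.hist) shows where the values are concentrated without needing an x-axis at all.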

Contour plot on 2 parameter data in python

Hi, I have a numpy array with shape (1000,3) that I wish to make a contour plot of. The first two columns represent x and y values, and the third is the associated density value at the point denoted by those x and y values. These are NOT evenly spaced, as the x and y values were generated by MCMC sampling methods. I wish to plot the x and y values and demarcate the points that have a density at a certain level.
I have tried calling the contour function, but it does not appear to work.
Presuming I have a data object such that np.shape(data) gives (1000,3):
plt.figure()
plt.contour(data[:,0],data[:,1],data[:,2])
plt.show()
This does not work and gives the following error:
TypeError: Input z must be a 2D array
I understand that z, the third column, needs to be arranged on some sort of meshgrid, but all the examples I have seen seem to rely on constructing one from evenly spaced x and y values, which I do not have. Help appreciated on how I can resolve this.
EDIT: I have figured it out. I need to use the method for unevenly spaced points, as described here.
https://matplotlib.org/devdocs/gallery/images_contours_and_fields/griddata_demo.html#sphx-glr-gallery-images-contours-and-fields-griddata-demo-py
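For readers who land here, one way to contour unevenly spaced points directly is Matplotlib's tricontour, which triangulates the (x, y) points itself, so no meshgrid is needed. This is a minimal sketch and not necessarily the exact method on the linked page. The synthetic data array and the contour levels are assumptions for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the plot on screen
import matplotlib.pyplot as plt

# Stand-in for the MCMC samples (assumption): scattered (x, y) with a density column.
rng = np.random.default_rng(0)
xy = rng.normal(size=(1000, 2))
density = np.exp(-0.5 * (xy ** 2).sum(axis=1)) / (2.0 * np.pi)
data = np.column_stack([xy, density])  # shape (1000, 3), as in the question

fig, ax = plt.subplots()
ax.scatter(data[:, 0], data[:, 1], s=2, color="0.6")
# tricontour accepts 1D x, y, z arrays and builds a Delaunay triangulation internally.
contours = ax.tricontour(data[:, 0], data[:, 1], data[:, 2],
                         levels=[0.02, 0.06, 0.10])
```

Call plt.show() or fig.savefig(...) afterwards to see the result. An alternative is to interpolate onto a regular grid first (for example with scipy.interpolate.griddata) and then call plt.contour on the gridded values.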

4D heat map in matplotlib

I want to plot a 4D heatmap in Python with matplotlib, like this 4d map.
I already have a set of 3D grid points (x,y,z) and the corresponding function value f at each point.
I am thinking of plotting it using plot_surface, with x, y, z as the three required arrays, and setting the color gradient from f.
There is a way here to use f for the color gradient, but I have trouble plotting the 3D grid, in which, I should emphasize, the third dimension is independent of the first two. (The second link shows otherwise.)
Or is there a better way to visualize this 4D data using matplotlib?
Your data is probably in a slightly different form, but as long as you have a point for everything you need to plot, you could use something like what they did here:
How to make a 4d plot using Python with matplotlib
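A minimal sketch of that idea, assuming the grid points and function values are available as arrays: draw every (x, y, z) point with a 3D scatter and map f to the marker color. The grid size and the example function are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the plot on screen
import matplotlib.pyplot as plt

# Full 3D grid in which z is independent of x and y (8 x 8 x 8 points, an arbitrary size).
axis = np.linspace(0.0, 1.0, 8)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
f = np.sin(x) * y * np.cos(z) - x ** 3  # arbitrary example function

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
# One marker per grid point; the fourth dimension f is shown as color.
sc = ax.scatter(x.ravel(), y.ravel(), z.ravel(), c=f.ravel(), cmap="viridis")
fig.colorbar(sc, ax=ax, label="f(x, y, z)")
```

On dense grids the outer points hide the inner ones, so lowering alpha or plotting only a few z slices may read better.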
There aren't great existing ways to visualize true 4D functions (where the third dimension is independent of the first two as you described), so I wrote a small package plot4d. It should be able to help you visualize your function.
import numpy as np
from plot4d import plotter
# Function of three independent variables to visualize
f = lambda x, y, z: np.sin(x)*y*np.cos(z)-x**3
z_range = np.linspace(0,2,10)
frame = plotter.Frame2D(xmin=0, xmax=1, ymin=0, ymax=1)
plotter.plot4d(f, z_range, frame=frame, func_name='f')
Installation:
pip install plot4d

How can I account for identical data points in a scatter plot?

I'm working with some data that has several identical data points. I would like to visualize the data in a scatter plot, but scatter plotting doesn't do a good job of showing the duplicates.
If I change the alpha value, then the identical data points become darker, which is nice, but not ideal.
Is there some way to map the color of a dot to how many times it occurs in the data set? What about size? How can I assign the size of the dot to how many times it occurs in the data set?
As was pointed out, whether this makes sense depends a bit on your dataset. If you have reasonably discrete points and exact matches make sense, you can do something like this:
import numpy as np
import matplotlib.pyplot as plt
test_x=[2,3,4,1,2,4,2]
test_y=[1,2,1,3,1,1,1] # I am just generating some test x and y values. Use your data here
#Generate a list of unique points
points=list(set(zip(test_x,test_y)))
#Generate a list of point counts
count=[len([x for x,y in zip(test_x,test_y) if x==p[0] and y==p[1]]) for p in points]
#Now for the plotting:
plot_x=[i[0] for i in points]
plot_y=[i[1] for i in points]
count=np.array(count)
plt.scatter(plot_x,plot_y,c=count,s=100*count**0.5,cmap='Spectral_r')
plt.colorbar()
plt.show()
Notice: You will need to adjust the marker size (the value 100 in the s argument) according to your point density. Note that s sets the marker area in points squared, so scaling by the square root of the count makes the area grow with the square root of the count, which keeps frequent points from dominating the plot. Use s=100*count if you want the area to be proportional to the count.
Also note: If you have very dense points, it might be more appropriate to use a different kind of plot. Histograms, for example (I personally like hexbin for 2D data), are a decent alternative in these cases.
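As a side note, the unique points and their counts can also be obtained in one call with np.unique, which avoids rescanning the data once per unique point. A minimal sketch using the same test values as above (the variable names pairs, unique_pairs and counts are only for illustration):

```python
import numpy as np

test_x = [2, 3, 4, 1, 2, 4, 2]
test_y = [1, 2, 1, 3, 1, 1, 1]

# Stack into an (n, 2) array so that each row is one (x, y) point.
pairs = np.column_stack([test_x, test_y])

# axis=0 treats each row as one item; return_counts gives the multiplicity of each row.
unique_pairs, counts = np.unique(pairs, axis=0, return_counts=True)
# unique_pairs → [[1, 3], [2, 1], [3, 2], [4, 1]], counts → [1, 3, 1, 2]
```

unique_pairs[:, 0] and unique_pairs[:, 1] can then be passed to plt.scatter as the x and y values, with counts driving c and s as in the code above.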
