Paraview glyph uniform distribution does not work on my dataset - python

I'm running Paraview 4.2 on Linux. Here's what's happening:
I load my XDMF/hdf5 data into PV, which contains vector data.
I apply a glyph filter to the loaded data, and hit apply (thereby using the default mode of Uniform Spatial Distribution).
No glyphs appear on screen, and the information tab shows that the filter has no data (0 points, etc.).
If I switch to All Points, or Every Nth Point, it works fine and displays the glyphs oriented correctly.
Annoyingly, if I then make a cone source, and change the input of the glyph to the cone, Uniform Spatial Distribution works fine for the cone.
No errors come up anywhere when I do this, whether in the PV GUI or through pvpython.
Any ideas?

Uniform distribution works by picking a set of random locations in space and finding the data points closest to those locations to glyph. Try changing the Seed to see if that picks different random locations that yield better results.
If you could share the data, that'd make it easier to figure out what could be going on here, as well.
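Here is a minimal sketch of the sampling idea described above, assuming the data is just an array of point coordinates. It draws random probe locations inside the bounding box and keeps the data point nearest to each probe. This only illustrates the answer's description and is not ParaView's code, whose details (tolerances, handling of distributed or multiblock data) may differ. The `seed` parameter stands in for the Seed property: the same seed reproduces the same picks, and a different seed usually picks different points.

```python
import numpy as np

def uniform_spatial_sample(points, n_samples, seed=0):
    """Indices of data points nearest to random probe locations.

    points: (N, D) array of coordinates. Returns sorted, unique indices.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)  # same seed -> same probes -> same picks
    lo, hi = points.min(axis=0), points.max(axis=0)
    # random probe locations inside the axis-aligned bounding box
    probes = rng.uniform(lo, hi, size=(n_samples, points.shape[1]))
    # brute-force squared distances, shape (n_samples, N)
    d2 = ((probes[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # closest data point to each probe
    return np.unique(nearest)    # several probes may land on the same point
```

The brute-force distance matrix keeps the sketch short; for a large mesh a spatial index such as a k-d tree would be the usual choice.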

Related

How to find the exact data value on click event or mouse hover in a time series graph drawn by datashader and HoloViews

Here I am trying to visualize 1 billion data points.
The scatter plot below shows the time-value pairs.
For example, df:
TIME,VAL
145000000,1.464000
150000000,1.466000
155000000,1.461250
160000000,1.481750
165000000,1.493500
170000000,1.514500
175000000,1.524000
180000000,1.543750
185000000,1.553750
190000000,1.582000
195000000,1.594000
200000000,1.625000
205000000,1.639500
210000000,1.679250
215000000,1.697250
220000000,1.720000
I need to find the exact time-value pair that is mapped to a given (x, y) point.
Is there any way to find the actual time value for a particular (x, y) click on the raster image rendered on screen?
If you're looking to find the individual row contributing to a specific point, the answer is that you can't.
Unlike matplotlib, datashader does not render individual points. Instead, it first defines the image boundaries, then, using the requested number of pixels, computes the range of values in (x, y) that falls into each pixel. It then bins/discretizes your data, so the rendering engine only works with summary statistics for each pixel, not the individual values from your source data. This is what makes datashader so powerful when rendering huge datasets, but it also means that no mapping from rows to pixels is kept anywhere.
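To illustrate the point about summary statistics, here is a sketch that uses NumPy's `histogram2d` as a stand-in for a datashader count aggregation (it is not datashader itself). The `x_range`, `y_range`, `width` and `height` arguments play the role of the canvas bounds and pixel counts. After aggregation only one number per pixel remains, and nothing records which rows produced it.

```python
import numpy as np

def count_per_pixel(x, y, x_range, y_range, width, height):
    """Stand-in for a count aggregation: one number per pixel, no row ids.

    Returns a (height, width) array with row 0 at y_range[0], mirroring a
    y-then-x layout (an assumption of this sketch).
    """
    counts, _, _ = np.histogram2d(x, y, bins=(width, height), range=[x_range, y_range])
    return counts.T  # histogram2d indexes [x_bin, y_bin]; transpose to [row, col]
```

Two rows landing in the same pixel simply become a count of 2, which is why the rendered image cannot be traced back to individual rows.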
You could of course identify the boundaries of a given pixel and then filter your dataset to pull all rows with data falling into these bounds. But there's no guarantee that the match will be unique (this depends on your data).
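Pulling out the rows that fall inside one pixel's bounds can be sketched as follows, assuming linear axes and equal-width, half-open pixels laid out from `x_range[0]` and `y_range[0]`. Datashader's exact handling of points lying on pixel edges may differ. The ranges and pixel counts would be the same values passed to the datashader `Canvas` (`x_range`, `y_range`, `plot_width`, `plot_height`); the function names here are hypothetical.

```python
import numpy as np

def pixel_bounds(i, j, x_range, y_range, width, height):
    """Data-space bounds of pixel column i, row j (row 0 starts at y_range[0])."""
    dx = (x_range[1] - x_range[0]) / width
    dy = (y_range[1] - y_range[0]) / height
    x0 = x_range[0] + i * dx
    y0 = y_range[0] + j * dy
    return (x0, x0 + dx), (y0, y0 + dy)

def rows_in_pixel(x, y, i, j, x_range, y_range, width, height):
    """Indices of all rows whose (x, y) falls in pixel (i, j); may be several."""
    (x0, x1), (y0, y1) = pixel_bounds(i, j, x_range, y_range, width, height)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # half-open intervals, so a point on a shared edge belongs to one pixel only
    mask = (x >= x0) & (x < x1) & (y >= y0) & (y < y1)
    return np.flatnonzero(mask)
```

With the question's data this would be called as `rows_in_pixel(df["TIME"].to_numpy(), df["VAL"].to_numpy(), i, j, ...)`; as noted, more than one row can come back for a single pixel.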
Michael Delgado is correct that the rendered image doesn't contain information about the rows, but (a) you can use the HoloViews "inspect" tools to look up the original datapoints mapped into that pixel (automating the process he describes; see https://examples.pyviz.org/ship_traffic), and (b) it's on our list to provide such an inverse mapping from pixel to rows, with the constraint that only n datapoints can be returned for each pixel (see proposal in https://github.com/holoviz/datashader/issues/1126). Once we have such a mapping, it should be trivial to provide hover and click information in holoviews for datashader plots without the cost of searching through the original dataset. Wish us luck, and in the meantime, use inspect_points!
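The inverse mapping in the linked datashader issue is still a proposal, but the idea of capping the lookup at n rows per pixel can be illustrated in plain NumPy. Everything below is hypothetical (the names, the `(col, row)` key layout, the default `n`) and is not how datashader or HoloViews would implement it; the Python loop in particular is only meant for small examples.

```python
from collections import defaultdict

import numpy as np

def capped_pixel_index(x, y, x_range, y_range, width, height, n=3):
    """Hypothetical inverse map: (col, row) -> at most n row indices in that pixel."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cols = np.floor((x - x_range[0]) / (x_range[1] - x_range[0]) * width).astype(int)
    rows = np.floor((y - y_range[0]) / (y_range[1] - y_range[0]) * height).astype(int)
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    index = defaultdict(list)
    for r in np.flatnonzero(inside):  # plain Python loop: fine for a sketch, far too slow for 1bn rows
        key = (int(cols[r]), int(rows[r]))
        if len(index[key]) < n:       # keep only the first n rows per pixel
            index[key].append(int(r))
    return dict(index)
```

A hover or click handler could then look up `(col, row)` directly instead of filtering the whole dataset, at the price of seeing at most `n` rows per pixel.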

Generate schematic (geographic) diagram from graph

I would like to know how best to generate a schematic diagram, something like this, from a graph (created using the Python NetworkX library) that contains the latitude and longitude of each node (city) and the lines connecting them in the Indian railway network.
The cities (nodes) should be located reasonably close to their actual position, but not necessarily exactly. I am OK with using the plate carrée projection that simply maps lat/long onto X/Y in the diagram.
The rail lines (edges) can be straight lines or even curves if it fits better.
The diagram should display the cities (preferably as dots), each with a short (max 4 characters) label, the lines connecting them, and a single label for each line (the given example has quite long labels for the lines).
Preferably the amount of manual tweaking of coordinates to get things to fit should be minimised.
Using Graphviz was my first idea. But I don't know how well neato/fdp (required for fixed positioning of nodes) will perform with large numbers of nodes/edges. Also, making Graphviz display labels separately outside the nodes (rather than inline) seems to need a lot of manual positioning of each label, which would be pretty boring. Is there any better way to get this kind of layout?
It's doable (https://forum.graphviz.org/t/another-stupid-graphviz-trick-geographic-graphs/256), but that approach does not seem to use many Graphviz features. In addition to the tools mentioned in the link, maybe consider pikchr (https://pikchr.org/home/doc/trunk/homepage.md).
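One way to try the Graphviz route is to generate the DOT text straight from the coordinates. The sketch below assumes plate carrée (x = longitude, y = latitude, times a scale factor), draws cities as point-shaped nodes with their short code as an `xlabel` (which Graphviz places outside the node), and labels only the middle edge of each rail line so every line gets a single label. It assumes the layout is run with `neato -n`, which treats `pos` values as already-computed positions in points. The input format (a dict of code to (lat, lon) and a dict of line name to ordered station codes) and the scale are assumptions; with NetworkX you would build them from your node attributes. Labels containing double quotes would need escaping, which the sketch does not do.

```python
def geo_dot(cities, lines, scale=20.0):
    """Build Graphviz DOT text for a schematic map laid out with `neato -n`.

    cities: {code: (lat, lon)}, code being a short (<= 4 char) label.
    lines:  {line_name: [code, code, ...]} with stations in route order.
    scale:  points per degree (hypothetical default; tune to taste).
    """
    out = ["graph rail {",
           "  node [shape=point, width=0.06];",  # cities drawn as dots
           "  edge [fontsize=9];"]
    for code, (lat, lon) in cities.items():
        x, y = lon * scale, lat * scale  # plate carree: x = lon, y = lat
        # xlabel puts the text next to the dot instead of inside the node
        out.append(f'  "{code}" [pos="{x:.1f},{y:.1f}", xlabel="{code}"];')
    for name, stations in lines.items():
        mid = (len(stations) - 1) // 2  # label one edge per line, near its middle
        for k, (a, b) in enumerate(zip(stations, stations[1:])):
            attrs = f' [label="{name}"]' if k == mid else ""
            out.append(f'  "{a}" -- "{b}"{attrs};')
    out.append("}")
    return "\n".join(out)
```

Save the returned text to, say, `rail.dot` and render it with `neato -n -Tsvg rail.dot -o rail.svg`. For plain `neato` or `fdp` without `-n`, `pos` is read in inches and nodes would need pinning, so the scale would have to change accordingly.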

Creating a packed bubble / scatter plot in python (jitter based on size to avoid overlapping)

I have come across a number of plots (end of page) that are very similar to scatter / swarm plots, which jitter the y-axis in order to avoid overlapping dots / bubbles.
How can I get the y values (ideally in an array) based on a given set of x and z values (dot sizes)?
I found the python circlify library but it's not quite what I am looking for.
Example of what I am trying to create
EDIT: For this project I need to be able to output the x, y and z values so that they can be plotted in the user's tool of choice. Therefore I am more interested in solutions that generate the y-coords rather than the actual plot.
Answer:
What you describe in your text is known as a swarm plot (or beeswarm plot), and there are Python implementations of these (see especially seaborn), as well as implementations in, e.g., R. These plots adjust the y-position of each data point so the points don't overlap but are otherwise closely packed.
Seaborn swarm plot:
Discussion:
But the plots that you show aren't standard swarm plots (which almost always have the weird-looking "arms"). Instead, they seem to be driven by some type of physics engine that allows motion along x as well as y, which produces the well-packed structures you see in the plots (e.g., like a water drop on a spider's web).
That is, in the plot above, if you imagine moving points only along the vertical axis so that they pack better, you can see that, for the most part, you can't really do it. (Honestly, maybe the data shown could be packed a bit better, but not dramatically so: e.g., the first arm from the left couldn't be improved, and if any of them could, it's only by moving one or two points inward.) Instead, to get a plot like the ones you show, you'll need some motion in x, as a physics engine would provide, ideally holding x close to its original value while allowing some variation. But that's a trade-off that needs to be decided at the data level, not the programming level.
For example, here's a plotting library, RAWGraphs, which produces a compact beeswarm plot like the Politico graphs in the question:
But critically, they give the warning:
"It’s important to keep in mind that a Beeswarm plot uses forces to avoid collision between the single elements of the visual model. While this helps to see all the circles in the visualization, it also creates some cases where circles are not placed in the exact position they should be on the linear scale of the X Axis."
Or, similarly, in the notes from this D3 package: "Other implementations use force layout, but the force layout simulation naturally tries to reach its equilibrium by pushing data points along both axes, which can be disruptive to the ordering of the data." And here's a nice demo based on the D3 force layout where sliders adjust the relative forces pulling the points to their correct values.
Therefore, this plot is a compromise between a swarm plot and a violin plot (which shows a smoothed average for the distribution envelope). Both of those plots give an honest representation of the data, whereas in these closely packed plots the compact representation comes at the cost of misrepresenting the x-position of the individual data points. Their advantage seems to be that you can color and click on the individual points (where, if you wanted, you could show the actual x-data, although that's not done in the linked plots).
Seaborn violin plot:
Personally, I'm really hesitant to misrepresent the data in some unknown way (that's the outcome of a physics engine calculation but not obvious to the reader). Maybe a better compromise would be a violin filled with non-circular patches, or something like a Raincloud plot.
I created an Observable notebook to calculate the y values of a beeswarm plot with variable-sized circles. The image below gives an example of the results.
If you need to use the JavaScript code in a script, it should be straightforward to copy and paste the code for the AccurateBeeswarm class.
The algorithm simply places the points one by one, as close as possible to the horizontal axis (y = 0) while avoiding overlaps. There are also options to add a little randomness to improve the appearance. x values are never altered; this is the one big advantage of this approach over force-directed algorithms such as the one used by RAWGraphs.
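Since the question asks for y values rather than a plot, here is a Python sketch of the greedy placement just described (it is not a port of the AccurateBeeswarm class). Each circle keeps its x; among the candidate heights (y = 0, plus the heights at which it would just touch an already-placed neighbor) the one closest to 0 that causes no overlap is chosen. Points are placed in input order here, and the notebook's randomness options are left out; both are assumptions of the sketch.

```python
import math

def beeswarm_y(xs, radii):
    """Greedy beeswarm: keep each x, return y values that avoid circle overlaps."""
    ys = [0.0] * len(xs)
    placed = []
    for i in range(len(xs)):  # placement order: input order (an assumption)
        xi, ri = xs[i], radii[i]
        # only circles this close horizontally can possibly collide
        near = [j for j in placed if abs(xs[j] - xi) < radii[j] + ri]
        candidates = [0.0]
        for j in near:
            d = radii[j] + ri
            dy = math.sqrt(d * d - (xs[j] - xi) ** 2)
            candidates += [ys[j] + dy, ys[j] - dy]  # heights where circle i touches j
        candidates.sort(key=abs)  # prefer positions closest to y = 0
        for y in candidates:
            # small relative tolerance so exact tangency is not rejected by rounding
            if all((xs[j] - xi) ** 2 + (ys[j] - y) ** 2 >= (radii[j] + ri) ** 2 * (1 - 1e-9)
                   for j in near):
                ys[i] = y
                break
        placed.append(i)
    return ys
```

A free candidate always exists (the highest tangent position clears every neighbor), and the returned list lines up with the inputs, so the x, y and z (size) columns can be exported as-is for any plotting tool.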

Highlighting many ranges on an axis of a Bokeh plot?

I have a scatter plot of data and would like to highlight certain ranges of the x-axis. When the number of ranges to highlight is relatively small, using BoxAnnotation works well. However, I'm trying to make many adjacent highlights (with different opacities). With many adjacent BoxAnnotations, when zoomed out, the boxes slightly overlap, creating lines. Additionally, thousands of BoxAnnotations take a long time to generate and do not run smoothly when interacting with the plot.
To be more specific about my case, I have some temporal data and a predictive model detecting the probability of some event occurring in the data. I want each segment to be highlighted with an opacity given by the probability that an event is occurring at that point in time. However, my current BoxAnnotation approach results in artificial lines from overlap of boxes when zoomed out (they disappear when zooming in on a region), and slow responsiveness of the interactive plot.
Is there a way to accomplish something similar to this without the artifacts and with a smoother experience?
Current method:
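from bokeh.models import BoxAnnotation, ColumnDataSource
from bokeh.plotting import figure, show

source = ColumnDataSource(data=data_frame)  # data_frame: pandas DataFrame with 'time', 'intensity', 'prediction'
figure_ = figure(x_axis_label='Time', y_axis_label='Intensity')
for index in range(data_frame.shape[0] - 1):  # one BoxAnnotation per interval between consecutive times
    figure_.add_layout(
        BoxAnnotation(left=data_frame['time'].values[index], right=data_frame['time'].values[index + 1],
                      fill_alpha=data_frame['prediction'].values[index], fill_color='red', line_alpha=0)
    )
figure_.circle(x='time', y='intensity', source=source)
show(figure_)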
Example of artificial lines when there are too many small adjacent BoxAnnotations:
When zooming on the x-axis, the lines disappear:
There's probably no way to salvage this exact approach. The artifacts are due to how the underlying raster HTML canvas works, and there's nothing that can be done about that. Any slowness is due to the fact that this kind of use of BoxAnnotation (with so very many individual instances) is not at all what was envisioned, and it is simply not optimized to show hundreds of instances the way, e.g., scatter glyphs are. You are trying to use box annotations to construct a sort of translucent heat map, and that is not a good fit, for the reasons above.
You could potentially overcome slowness by using a single rect or vbar glyph that draws all the boxes at once in a vectorized way. But that won't alleviate the compositing issues.
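A sketch of the single-glyph idea, assuming the same `data_frame` columns as in the question: one `quad` call draws every box from column data, with a per-box `fill_alpha` read from a column. Unlike BoxAnnotation, quads live in data coordinates, so `bottom` and `top` have to be chosen explicitly; they are parameters here, and using the intensity range is an assumption. As the answer says, this targets the slowness, not the compositing seams.

```python
import numpy as np

def band_columns(time, prediction):
    """Column data for one box per interval [time[k], time[k+1]), alpha = prediction[k]."""
    time = np.asarray(time, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return {
        "left": time[:-1],
        "right": time[1:],
        "alpha": np.clip(prediction[:-1], 0.0, 1.0),  # alpha must lie in [0, 1]
    }

def add_bands(fig, time, prediction, bottom, top, color="red"):
    """Draw every box with a single vectorized quad glyph."""
    # imported here so band_columns can be used (and tested) without Bokeh installed
    from bokeh.models import ColumnDataSource
    source = ColumnDataSource(data=band_columns(time, prediction))
    return fig.quad(left="left", right="right", bottom=bottom, top=top,
                    fill_alpha="alpha", fill_color=color, line_alpha=0, source=source)
```

In the question's script this would replace the loop, e.g. `add_bands(figure_, data_frame['time'], data_frame['prediction'], data_frame['intensity'].min(), data_frame['intensity'].max())`, called before the `circle` call so the points are drawn on top.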
Your best bet is to create a semi-transparent "heatmap" image overlay yourself with a tool or code that can afford better control over the details of rasterization and compositing. I can't really advise you on how to do that in any detail. The Datashader library might be useful for this.
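For the image-overlay route, one possible sketch (not something the answer specifies) is to resample the probabilities onto a uniform time grid and draw them as a single one-row RGBA image with Bokeh's `image_rgba`, so there are no adjacent shapes whose edges get composited against each other. Treating `prediction[k]` as holding from `time[k]` to `time[k+1]`, the red color, and the `n_cols` resolution are all assumptions; since the overlay resolution is fixed by `n_cols`, zooming in far enough will reveal its cells.

```python
import numpy as np

def probability_rgba(time, prediction, n_cols=2000, rgb=(255, 0, 0)):
    """One-row RGBA image whose alpha follows a step-wise probability.

    prediction[k] is taken to hold on [time[k], time[k+1]); time must be sorted.
    """
    time = np.asarray(time, dtype=float)
    prediction = np.clip(np.asarray(prediction, dtype=float), 0.0, 1.0)
    edges = np.linspace(time[0], time[-1], n_cols + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # interval index k with time[k] <= center < time[k+1]
    idx = np.clip(np.searchsorted(time, centers, side="right") - 1, 0, len(time) - 2)
    img = np.empty((1, n_cols), dtype=np.uint32)
    view = img.view(dtype=np.uint8).reshape((1, n_cols, 4))  # bytes in R, G, B, A order
    view[..., 0], view[..., 1], view[..., 2] = rgb
    view[..., 3] = np.round(prediction[idx] * 255).astype(np.uint8)
    return img

def add_overlay(fig, time, prediction, bottom, top, **kwargs):
    """Stretch the image over [time[0], time[-1]] x [bottom, top] in data units."""
    t = np.asarray(time, dtype=float)
    img = probability_rgba(t, prediction, **kwargs)
    return fig.image_rgba(image=[img], x=t[0], y=bottom, dw=t[-1] - t[0], dh=top - bottom)
```

Like the quad approach, the overlay only spans `bottom` to `top` in data units, so those limits need to cover the region you want shaded.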

Paraview programmatically plot quantities over line, using only data on nodes

I have a dataset on a 2D domain.
I mean to get an XY plot of data along a line which contains a number of element edges.
I would typically do this with PlotOverLine.
Now I want to obtain a similar plot, but with my data points located only at the actual locations of nodes in my mesh.
In this way, my XY plot (and its underlying data) would be directly representative of the mesh resolution in my original data.
How can this be done programmatically, 1) from the UI as a macro, 2) from the UI at the python shell, 3) at the CLI (pvpython myscript.py)?
I couldn't even find how to do this via GUI.
I started by Select Points On, but I couldn't go any further.
Moreover, if this is the right way to start, I wouldn't be able to transfer it to python code, since Start Trace does not include Select Points On in the output trace, so I wouldn't know how to add it to code.
PS: This resembles the procedure in Abaqus whereby one defines a Path based on an arbitrary set of points (which can be nodes, e.g.) and then plots a given quantity along that path.
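The geometric part of this task (keeping only mesh nodes that lie on the line, ordered by distance along it) can be sketched independently of ParaView. The sketch assumes you already have the node coordinates and a point-data array as NumPy arrays; in pvpython one possible route is `paraview.servermanager.Fetch` followed by `vtk.numpy_interface.dataset_adapter.WrapDataObject`, assuming a single, non-composite dataset. The relative tolerance `tol` is an assumption and should be matched to your mesh.

```python
import numpy as np

def nodes_on_segment(points, values, p0, p1, tol=1e-8):
    """Nodes lying on segment p0-p1, as (distance along segment, value), sorted.

    tol is relative to the segment length (an assumption; adjust for your mesh).
    """
    points = np.asarray(points, dtype=float)
    values = np.asarray(values)
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    length = np.linalg.norm(p1 - p0)
    u = (p1 - p0) / length                                # unit direction of the line
    rel = points - p0
    s = rel @ u                                           # coordinate along the line
    perp = np.linalg.norm(rel - np.outer(s, u), axis=1)   # distance off the line
    eps = tol * length
    on = (perp <= eps) & (s >= -eps) & (s <= length + eps)
    order = np.argsort(s[on])
    return s[on][order], values[on][order]
```

Plotting the returned distances against the returned values gives an XY plot with exactly one sample per mesh node on the line, which is the mesh-resolution view the question asks for.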
