Stitching grids of varying grid spacing - python

I read data from binary files into numpy arrays with np.fromfile. These data represent Z values on a grid whose spacing and shape are known, so there is no problem reshaping the 1D array into the shape of the grid and plotting it with plt.imshow. So if I have N grids, I can plot N subplots showing all the data in one figure, but what I'd really like to do is plot them as one image.
I can't just stack the arrays because the data in each array is spaced differently and because they have different shapes.
My idea was to "supersample" all grids to the spacing of the finest grid, then stack and plot, but I am not sure that is such a good idea because these grid files can become quite large.
By the way, let's say I wanted to do that: how do I go from:
0, 1, 2
3, 4, 5
to:
0, 0, 1, 1, 2, 2
0, 0, 1, 1, 2, 2
3, 3, 4, 4, 5, 5
3, 3, 4, 4, 5, 5
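For the 2× expansion shown above, a minimal NumPy sketch (assuming an integer factor that is the same in both directions) is to repeat each row and then each column; np.kron with a block of ones gives the same result:

```python
import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5]])
factor = 2  # assumed integer upsampling factor, same along both axes

# Repeat each row `factor` times, then each column `factor` times
up_repeat = np.repeat(np.repeat(a, factor, axis=0), factor, axis=1)

# Equivalent: Kronecker product with a factor x factor block of ones
up_kron = np.kron(a, np.ones((factor, factor), dtype=a.dtype))

print(up_repeat)
# [[0 0 1 1 2 2]
#  [0 0 1 1 2 2]
#  [3 3 4 4 5 5]
#  [3 3 4 4 5 5]]
```

Either way the result has factor**2 times as many elements as the input, which is exactly the memory cost the question worries about for large grids.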
I'm open to any suggestions.
Thanks,
Shahar

If you only need to plot, the answer is: don't resample. plt.imshow has a keyword argument extent, which you can use to place and stretch each image in data coordinates when plotting. Other than that, I would suggest scipy.ndimage.zoom with order=0, which is equivalent to repeating values; it lets you zoom to any size easily, or you can use a higher order to get smooth interpolation. For very simple integer zooming, np.repeat along each axis (or np.kron with a block of ones) could be an option too.
Here is an example:
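import numpy as np
import matplotlib.pyplot as plt

a = np.arange(9).reshape(3, 3)
b = np.arange(36).reshape(6, 6)
plt.imshow(a, extent=[0, 1, 0, 1], interpolation='none')
plt.imshow(b, extent=(1, 2, 0, 1), interpolation='none')
# note: the axis limits now only cover the last image, so reset them
plt.xlim(0, 2)
plt.show()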
Of course, to get the same color range for both, you should pass the vmin=... and vmax=... keywords to each imshow call.
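Generalizing the extent approach from this answer to N grids, a minimal sketch might compute each extent from the grid's origin and spacing (which the question says are known) and share one color scale. The dict layout, the grid_extent helper and origin="lower" (row 0 drawn at the bottom) are hypothetical choices for illustration, not part of the original answer:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical grids: the Z array plus its lower-left corner (x0, y0)
# and its cell spacing (dx, dy)
grids = [
    {"z": np.arange(9).reshape(3, 3), "x0": 0.0, "y0": 0.0, "dx": 1 / 3, "dy": 1 / 3},
    {"z": np.arange(36).reshape(6, 6), "x0": 1.0, "y0": 0.0, "dx": 1 / 6, "dy": 1 / 6},
]

def grid_extent(g):
    """Hypothetical helper: (left, right, bottom, top) of a grid in data coordinates."""
    ny, nx = g["z"].shape
    return (g["x0"], g["x0"] + nx * g["dx"], g["y0"], g["y0"] + ny * g["dy"])

# One shared color scale so equal Z values get the same color in every grid
vmin = min(g["z"].min() for g in grids)
vmax = max(g["z"].max() for g in grids)

fig, ax = plt.subplots()
for g in grids:
    im = ax.imshow(g["z"], extent=grid_extent(g), origin="lower",
                   interpolation="none", vmin=vmin, vmax=vmax)

# imshow sets the view to the latest image's extent, so cover all grids explicitly
extents = np.array([grid_extent(g) for g in grids])
ax.set_xlim(extents[:, 0].min(), extents[:, 1].max())
ax.set_ylim(extents[:, 2].min(), extents[:, 3].max())
fig.colorbar(im, ax=ax)
plt.show()
```

No grid is resampled here; each is drawn at its own resolution, which avoids the memory cost of supersampling.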

Related

Visualising entity density on a 2D plane using pcolormesh in matplotlib, Python

I am trying to recreate the following heatmap (created with R) with Python.
This heatmap represents the entity concentration in a room, where lighter colors indicate a denser concentration of entities. The X axis is length in meters and the Y axis is height in meters.
Currently, I have been trying to recreate the plot with matplotlib's pcolormesh. Unfortunately, I cannot work out how to pass the X and Y columns to it.
The following code produces the heatmap below:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"x": [0, 0, 1, 1, 2, 2],
                   "y": [3, 4, 3, 4, 3, 4],
                   "concentration": [1123, 1238, 1285, 1394, 5123, 8712]})
plt.pcolormesh(df)
plt.show()
Whereas I would like to see something like this:
As you can see, for X and Y I actually use values from columns in the dataframe.
Basically, X and Y are coordinates and the color at those pixels is dependent on the value of concentration (third column).
I tried to pass the arguments like this:
plt.pcolormesh([df["x"], df["y"]], df["concentration"])
plt.show()
But this leads to the following error:
TypeError: pcolormesh() takes 1 or 3 positional arguments but 2 were given
How am I to represent the concentration data as a 2D array?
Am I even on the correct path?
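A minimal sketch of one way to do this, assuming each (x, y) pair appears exactly once so the long table can be pivoted into a 2D grid (rows = y, columns = x). shading="nearest" (available in Matplotlib 3.3 and later) lets X and Y be the cell centres, with the same lengths as the grid's columns and rows:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"x": [0, 0, 1, 1, 2, 2],
                   "y": [3, 4, 3, 4, 3, 4],
                   "concentration": [1123, 1238, 1285, 1394, 5123, 8712]})

# Long (x, y, value) table -> 2D grid: one row per y, one column per x
grid = df.pivot(index="y", columns="x", values="concentration")

fig, ax = plt.subplots()
# X = x centres (columns), Y = y centres (rows), C = the 2D concentration array
mesh = ax.pcolormesh(grid.columns, grid.index, grid.values, shading="nearest")
fig.colorbar(mesh, ax=ax, label="concentration")
ax.set_xlabel("x (m)")
ax.set_ylabel("y (m)")
plt.show()
```

If some (x, y) cells are missing, pivot leaves NaN there, and pcolormesh should leave those cells unpainted; duplicated pairs would instead need pivot_table with an aggregation.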

ValueError: color kwarg must have one color per data set. 4462 data sets and 1 colors were provided

I have an error that I do not understand; I am quite new to this, so thank you all in advance!
I am using Jupyter Notebook (Anaconda3). The link here shows my code and error message.
The problem is the dimensionality of your all_unbalanced_data array/list.
If you pass N datasets (N different lists of data, or a 2D array with N columns) as input to plt.hist, then the color kwarg must provide N colors.
You input one single color, so for the script to work your data must be shaped as a 1-dimensional array.
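A rule of thumb
Suppose your data are contained in a numpy array:
all_unbalanced_data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
Then you can check the shape (dimensionality) of the array:
>>> all_unbalanced_data.shape
(2, 4)
plt.hist treats each column of a 2D array as a separate dataset, so the number of colors it will expect here is 4 (all_unbalanced_data.shape[1]):
color = ['color_code_1', 'color_code_2', 'color_code_3', 'color_code_4']
(A plain list of lists is different: there, each inner list is one dataset.)
So in your case plt.hist is expecting 4462 different colors.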
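Two ways to make the call consistent, sketched with the small example array from this answer rather than the asker's real data: flatten the array so there is a single dataset and a single color, or keep one dataset per column and supply one color per column. Sampling the viridis colormap is just an arbitrary way to get many colors:

```python
import numpy as np
import matplotlib.pyplot as plt

all_unbalanced_data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

fig, (ax1, ax2) = plt.subplots(1, 2)

# Option 1: flatten to one 1D dataset, so a single color is enough
n_single, bins_single, _ = ax1.hist(all_unbalanced_data.ravel(), color="tab:blue")

# Option 2: keep the 2D array (one dataset per column) and give one color per column
n_datasets = all_unbalanced_data.shape[1]
colors = plt.cm.viridis(np.linspace(0, 1, n_datasets))  # (n_datasets, 4) RGBA array
n_multi, bins_multi, _ = ax2.hist(all_unbalanced_data, color=colors)

plt.show()
```

Which option is right depends on whether the 4462 columns really are separate datasets or just one long list of values that ended up in a 2D shape.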

What is being plotted by plt.plot with a tuple argument?

This code snippet:
import matplotlib.pyplot as plt
plt.plot(([1, 2, 3], [1, 2, 3]))
plt.show()
produces:
What function is being plotted here? Is this use case described in matplotlib documentation?
This snippet:
plt.plot(([1, 2, 3], [1, 2, 3], [2, 3, 4]))
produces:
From the new test case you provided, we can see that it picks the i-th element of each list and builds a series from them.
So it ends up plotting the series y = {1, 1, 2}, y = {2, 2, 3} and y = {3, 3, 4}.
More generally, we can assume that passing a tuple of lists will plot multiple series.
Honestly, it doesn't look that user-friendly to write the input like that, but there might be some cases where it is more convenient.
The x-values are picked by default according to the docs:
The horizontal / vertical coordinates of the data points. x values are optional and default to range(len(y)).
Calling plt.plot(y) is calling plot in the Axes class. Looking at the source code, the key description closest to your problem states the following for plotting multiple sets of data:
- If *x* and/or *y* are 2D arrays a separate data set will be drawn
for every column. If both *x* and *y* are 2D, they must have the
same shape. If only one of them is 2D with shape (N, m) the other
must have length N and will be used for every data set m.
Example:
>>> x = [1, 2, 3]
>>> y = np.array([[1, 2], [3, 4], [5, 6]])
>>> plot(x, y)
is equivalent to:
>>> for col in range(y.shape[1]):
... plot(x, y[:, col])
The main difference here compared to your example is that x is implicitly defined based on the length of your tuple (described elsewhere in the documentation) and that you are using a tuple rather than an np.array. I tried digging further into the source code to see where tuples would become arrays. In particular at line 1632: lines = [*self._get_lines(*args, data=data, **kwargs)] seems to be where the different lines are likely generated, but that is as far as I got.
Of note, this is one of three ways to plot multiple lines of data, this being the most compact.
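One way to confirm this reading is to inspect the Line2D objects that plot returns. A minimal sketch using the first snippet from the question: the tuple becomes a (2, 3) array, so three lines are drawn, one per column, each over the default x = [0, 1]:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# plot() returns one Line2D per data set it drew
lines = ax.plot(([1, 2, 3], [1, 2, 3]))

y = np.array(([1, 2, 3], [1, 2, 3]))  # shape (2, 3): 3 columns -> 3 lines
x_default = np.arange(y.shape[0])     # range(len(y)) -> [0, 1]

for col, line in enumerate(lines):
    # each line's data matches one column of the equivalent 2D array
    print(np.array_equal(line.get_xdata(), x_default),
          np.array_equal(line.get_ydata(), y[:, col]))

plt.show()
```

So the first snippet draws three short horizontal lines at y = 1, 2 and 3, which matches the column-wise rule quoted from the docstring.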

Count elements in numpy array between bounds [duplicate]

While reading up on numpy, I encountered the function numpy.histogram().
What is it for and how does it work? In the docs they mention bins: What are they?
Some googling led me to the definition of Histograms in general. I get that. But unfortunately I can't link this knowledge to the examples given in the docs.
A bin is a range that represents the width of a single bar of the histogram along the X-axis. You could also call this the interval. (Wikipedia defines them more formally as "disjoint categories".)
The NumPy histogram function doesn't draw the histogram; it computes the number of input values that fall within each bin, which in turn determines the area (not necessarily the height, if the bins aren't of equal width) of each bar.
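A small sketch of the area-versus-height point, with made-up data and deliberately unequal bins: with density=True, np.histogram divides each count by the total count and by the bin width, so it is the bar areas (not the heights) that sum to 1:

```python
import numpy as np

data = [0.5, 1, 2, 2.5]   # made-up values
edges = [0, 1, 3]         # bin widths 1 and 2 (unequal on purpose)

counts, _ = np.histogram(data, bins=edges)
density, _ = np.histogram(data, bins=edges, density=True)
widths = np.diff(edges)

print(counts)                    # [1 3]
print(density)                   # heights: 1/4 and 3/8
print((density * widths).sum())  # 1.0
```

The second bin holds three times as many values but is only 1.5 times as tall, because it is twice as wide.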
In this example:
np.histogram([1, 2, 1], bins=[0, 1, 2, 3])
There are 3 bins, for values ranging from 0 to 1 (excl. 1), 1 to 2 (excl. 2) and 2 to 3 (incl. 3), respectively. The way NumPy defines these bins is by giving a list of delimiters ([0, 1, 2, 3] in this example), although it also returns the bins in the results, since it can choose them automatically from the input if none are specified. If bins=5, for example, it will use 5 bins of equal width spread between the minimum input value and the maximum input value.
The input values are 1, 2 and 1. Therefore, bin "1 to 2" contains two occurrences (the two 1 values), and bin "2 to 3" contains one occurrence (the 2). These results are in the first item in the returned tuple: array([0, 2, 1]).
Since the bins here are of equal width, you can use the number of occurrences for the height of each bar. When drawn, you would have:
a bar of height 0 for range/bin [0,1) on the X-axis,
a bar of height 2 for range/bin [1,2),
a bar of height 1 for range/bin [2,3].
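The bin rules described in this answer (half-open bins except the last one, and equal-width bins between the data's minimum and maximum when bins is an integer) can be checked directly in a few lines:

```python
import numpy as np

# Explicit edges: [0, 1), [1, 2) and [2, 3] (the last bin is closed)
counts, edges = np.histogram([1, 2, 1], bins=[0, 1, 2, 3])
print(counts)       # [0 2 1]

# A value equal to the last edge still lands in the last bin
counts_edge, _ = np.histogram([3], bins=[0, 1, 2, 3])
print(counts_edge)  # [0 0 1]

# Integer bins: that many equal-width bins from min (1) to max (2) of the data
counts5, edges5 = np.histogram([1, 2, 1], bins=5)
print(counts5)      # [2 0 0 0 1]
print(len(edges5))  # 6 edges for 5 bins
```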
You can plot this directly with Matplotlib (its hist function also returns the bins and the values):
>>> import matplotlib.pyplot as plt
>>> plt.hist([1, 2, 1], bins=[0, 1, 2, 3])
(array([0, 2, 1]), array([0, 1, 2, 3]), <a list of 3 Patch objects>)
>>> plt.show()
import numpy as np
hist, bin_edges = np.histogram([1, 1, 2, 2, 2, 2, 3], bins=range(5))
Below, hist indicates that there are 0 items in bin #0, 2 in bin #1, 4 in bin #2 and 1 in bin #3.
print(hist)
# [0 2 4 1]
bin_edges indicates that bin #0 is the interval [0,1), bin #1 is [1,2), ...,
bin #3 is [3,4] (the last bin also includes its right edge).
print(bin_edges)
# [0 1 2 3 4]
Play with the above code, change the input to np.histogram and see how it works.
But a picture is worth a thousand words:
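import matplotlib.pyplot as plt
# align='edge' puts each bar's left side on its bin edge instead of centering the bar there
plt.bar(bin_edges[:-1], hist, width=1, align='edge')
plt.xlim(min(bin_edges), max(bin_edges))
plt.show()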
Another useful thing to do with numpy.histogram is to plot the output as the x and y coordinates of a line graph. For example:
import numpy as np
import matplotlib.pyplot as plt

arr = np.random.randint(1, 51, 500)
y, x = np.histogram(arr, bins=np.arange(51))
fig, ax = plt.subplots()
ax.plot(x[:-1], y)  # left bin edges on x, counts on y
plt.show()
This can be a useful way to visualize histograms where you would like finer granularity without bars everywhere. It is especially useful for image histograms, for example to identify extreme pixel values.
