I have a specific problem that maybe someone can help me with. I currently have 3 arrays of data, and I want to make a 2D histogram of the first two while using the third array as values that get summed up in each bin. I also want to include a color bar that shows the scale of the colors used in the histogram.
As a start I looked into using matplotlib.pyplot.hexbin to do this and it seems to work fine but I don't want to have hexagons as the shape of my bins. Is somebody able to point me to some resources on how to do this?
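One possible approach with rectangular bins, sketched here under assumptions: numpy.histogram2d accepts a weights argument, so the third array can be summed per bin, and the result can be drawn with pcolormesh plus a color bar. The arrays x, y and w below are made-up stand-ins for the three arrays in the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the three arrays in the question.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)
w = rng.uniform(size=1000)  # values to be summed in each bin

# weights=w makes each bin hold the sum of w instead of a count.
sums, xedges, yedges = np.histogram2d(x, y, bins=20, weights=w)

fig, ax = plt.subplots()
# histogram2d returns an array indexed [x_bin, y_bin]; pcolormesh expects
# [row (y), column (x)], hence the transpose.
mesh = ax.pcolormesh(xedges, yedges, sums.T)
fig.colorbar(mesh, ax=ax, label="sum of w per bin")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

matplotlib's hist2d also takes weights and would do the binning and drawing in one call; the two-step version above just makes the summed array available for inspection.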
I have an array of 575 points.
When I plot it, I get the curve shown in the attached image.
I want to split it into sub-graphs wherever the slope becomes 0, that is, wherever the graph becomes parallel to the x-axis.
Thanks in advance.
I understand you want to have some degree of smoothness, otherwise you will have as a result many small separated regions of the graph.
You also may need to specifically define what you want to consider as parallel to the x-axis.
I suggest starting with a running window of a fixed length that classifies each range it covers as horizontal when a given condition holds.
This condition can be something like "all values are inside certain range". This condition may take into account characteristics like the variance and the mean of the points inside the window. For example, "all values are between 101% and 99% of the mean."
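The running-window idea could be sketched roughly as follows. The function names, the window length and the 1% tolerance are all illustrative assumptions, and the relative tolerance breaks down when the mean is close to zero, where an absolute tolerance would be needed instead.

```python
import numpy as np

def flat_mask(y, window=15, tol=0.01):
    """Mark points that lie in at least one window whose values all stay
    within tol (relative) of that window's mean. Hypothetical helper."""
    y = np.asarray(y, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(y, window)
    mean = windows.mean(axis=1)
    spread = np.abs(windows - mean[:, None]).max(axis=1)
    is_flat_window = spread <= tol * np.abs(mean)
    mask = np.zeros(y.size, dtype=bool)
    for start in np.flatnonzero(is_flat_window):
        mask[start:start + window] = True
    return mask

def split_at_flat(y, window=15, tol=0.01):
    """Split the index range wherever the flat/non-flat label changes."""
    mask = flat_mask(y, window, tol)
    change_points = np.flatnonzero(np.diff(mask.astype(int))) + 1
    return np.split(np.arange(mask.size), change_points), mask

# Made-up curve: a ramp, a plateau, then another ramp.
y = np.concatenate([np.linspace(0, 10, 50), np.full(30, 10.0), np.linspace(10, 20, 50)])
segments, mask = split_at_flat(y)
```

Each entry of segments is an array of indices, so y[segments[k]] gives the k-th piece of the curve. A longer window or a tighter tolerance yields fewer, larger horizontal regions.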
I have a numpy array whose shape is:
(30,40,100,200)
Those are 3D points (30(x-axis)x40(y-axis)x100(z-axis)) for different times (200 in total):
For visualization only (this is not my dataset, the picture comes from here: http://15462.courses.cs.cmu.edu/fall2016/article/35)
Now, I have issues with understanding how I can slice it:
How do I extract a 3D cluster for one specific time, e.g. 140?
From that extracted 3D cluster, how can I plot a 2D x-z cross-section for a specific y-position, e.g. 45?
You should read up on basic numpy slicing: https://numpy.org/doc/stable/reference/arrays.indexing.html
How do I extract a 3D cluster for one specific time, e.g. 140?
Just specify the time index, i.e. data[:, :, :, 140]. Be aware that Python indexing starts from 0.
From that extracted 3D cluster, how can I plot a 2D x-z cross-section for a specific y-position, e.g. 45?
You can get a 2D cross-section with a similar slicing operation, i.e. cluster[:, 45, :]. Note that with a y-axis of length 40 the valid indices are 0 to 39, so index 45 would raise an IndexError; pick an index inside that range. The slice can be plotted in various ways depending on the plotting library. imshow() from matplotlib is one possibility.
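A minimal sketch of both slicing steps, assuming the axis order (x, y, z, time) described in the question. The array contents are made up, and the y index 25 is an arbitrary valid choice because the y-axis only has 40 positions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt

# Placeholder array with the shape from the question: (x, y, z, time).
data = np.zeros((30, 40, 100, 200), dtype=np.float32)
data[:, :, :, 140] = np.random.default_rng(0).random((30, 40, 100))

cluster = data[:, :, :, 140]   # 3D volume at time index 140 -> shape (30, 40, 100)
section = cluster[:, 25, :]    # x-z cross-section at y index 25 -> shape (30, 100)

fig, ax = plt.subplots()
# imshow draws rows vertically, so transpose to put x horizontal and z vertical.
im = ax.imshow(section.T, origin="lower", aspect="auto")
fig.colorbar(im, ax=ax)
ax.set_xlabel("x index")
ax.set_ylabel("z index")
```

Both slices are views into data rather than copies, so modifying section would also modify data.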
Is your question about the data set (how does data categorize and how to get a 3D cluster at a specific time), or about the coding?
If it is about "how to get a cluster at a specific time", then your problem concerns your particular dataset, and Stack Overflow is not the right place for that type of question.
If it is about coding, then state your question clearly and show us your code and the problem with it.
Based on your explanation, I think you have a complete set of xyz data for each time step, so the solution is very straightforward.
I am using Python's matplotlib.pyplot.contourf to create a contour plot of my data with a color bar. I have done this successfully countless times, even with other layers of the same variable. However, when the values get small (on the order of 1E-12), parts of the contour show up white. The white color does not show up in the color bar either. Does anyone know what causes this and how to fix this? The faulty contour is attached below.
a1 = plt.contourf(np.linspace(1,24,24),np.linspace(1,20,20),np.transpose(data[:,:,15]))
plt.colorbar(a1)
plt.show()
tl;dr
Given the new information, matplotlib did not choose levels that cover your whole data range (see the levels parameter in the documentation), so some of the data was left unplotted. To fix that, tell matplotlib to extend the filled range with either plt.contourf(..., extend="max") or plt.contourf(..., extend="both").
Extensive answer
There are a few reasons why contourf() is showing white zones with a colormap that doesn't include white.
NaN values
NaN values are never plotted.
Masked data
If you mask data before plotting, it won't appear in the plot. But you should know if you masked your data.
However, you may have masked your data without noticing if you use something like a tick locator set to LogLocator().
Matplotlib couldn't set the right levels for your data
Sometimes matplotlib doesn't set levels that cover all of your data, which leaves part of it unplotted.
To fix that you can use plt.contourf(..., extend=EXTENDS), where EXTENDS can be "neither", "both", "min" or "max".
Coarse grid
contourf plots whitespace over finite data. Past answers do not correct
One remark: white sections in the plot can also occur if the data points in the X and Y vectors are not equally spaced. In that case it is best to use the function tricontourf().
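As a rough illustration of that suggestion, tricontourf() takes 1D arrays of point coordinates and values and triangulates them itself, so the points do not need to sit on a regular grid. The scattered sample points below are invented for the example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt

# Hypothetical irregularly spaced sample points.
rng = np.random.default_rng(0)
x = rng.uniform(0, 24, 200)
y = rng.uniform(0, 20, 200)
z = np.sin(x / 4.0) * np.cos(y / 4.0)

fig, ax = plt.subplots()
# x, y and z are all 1D and the same length; no meshgrid is needed.
cs = ax.tricontourf(x, y, z, levels=10)
fig.colorbar(cs, ax=ax)
```

The filled area only covers the convex hull of the points, so some white can remain near the axes limits.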
I ran into the same problem recently when my data contained values higher or lower than the levels I had set. plt.contourf fills only the contour levels you give it and ignores any higher or lower values present in your data.
I solved this by adding a key word argument extend="both", which for your case would be something like this:
a1 = plt.contourf(np.linspace(1,24,24),np.linspace(1,20,20),np.transpose(data[:,:,15]), extend="both")
or in general form:
a1 = plt.contourf(x,y,variable[:,:,15],extend="both")
By doing this, you're instructing the module to plot the higher(/lower) values according to the highest(/lowest) filled contour.
If you want only to extend in the lower or higher range, you can change the keyword argument to
extend="min" or extend="max"
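To see the effect side by side, here is a small sketch with synthetic data of the same order of magnitude (1E-12) on the same 24 by 20 grid. The levels are deliberately chosen too narrow, which is an assumption made to reproduce the white patches; the real data and levels in the question may differ.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt

x = np.linspace(1, 24, 24)
y = np.linspace(1, 20, 20)
X, Y = np.meshgrid(x, y)
z = 1e-12 * np.sin(X / 4.0) * np.cos(Y / 4.0)  # synthetic data, shape (20, 24)

# Levels that deliberately do not cover the full data range.
levels = np.linspace(-0.5e-12, 0.5e-12, 11)
outside = (z < levels[0]) | (z > levels[-1])  # points that would be left white

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(10, 4))
cs0 = ax0.contourf(x, y, z, levels=levels)                 # white where outside
cs1 = ax1.contourf(x, y, z, levels=levels, extend="both")  # out-of-range filled
fig.colorbar(cs0, ax=ax0)
fig.colorbar(cs1, ax=ax1)
ax0.set_title("default")
ax1.set_title('extend="both"')
```

With extend="both" the color bar gains pointed ends that show the colors used for values beyond the lowest and highest levels.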
I am trying to make a swarm plot that contains more information than a single categorical level and two variables. I am looking to create something like this
So ideally, something like this would work (but it does not):
ax = sns.swarmplot(x="round_id", y="independent_error_abs", hue="difficulty", hue_order=['easy','medium','hard'], size="followers", markershape="rank",data=df)
where "difficulty", "followers", and "rank" determine the color of the point, the size of the point, and the shape of the point, respectively.
No, this is not possible with swarmplot. Personally I find this kind of plot very difficult to interpret: a good statistical plot should make the patterns in the data immediately apparent, whereas plots with multiple categorical variables that manipulate the size or shape of the points quickly become more like puzzles. My recommendation in these cases (following Andrew Gelman) is to make more than one plot, each with relatively simple semantics.
You don't have to agree, of course, but you will have to make it yourself using matplotlib.
I am facing the same issue, and actually the solution seems to be pretty simple at least for the marker type!
Just divide your dataframe into sub-dataframes, one for each marker type. Then you draw the swarmplots on top of each other, and that's it.
If the size of the dot is also a categorical variable, you just need to do the same as above, with each sub-dataframe representing one marker and one size.
If size is continuous, then it seems you would need to plot each dot independently in a for loop, but for that I would use matplotlib.pyplot.
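For the pure-matplotlib route, here is a sketch under assumptions. It is a jittered strip plot, not a true swarm layout, so points can overlap. The column names come from the question, but the data, the color map and the marker map are invented. Since scatter accepts an array of sizes, one call per marker type is enough rather than one call per dot.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt

# Made-up data using the column names from the question.
rng = np.random.default_rng(0)
n = 60
round_id = rng.integers(0, 3, n)
independent_error_abs = rng.random(n)
difficulty = rng.choice(["easy", "medium", "hard"], n)
followers = rng.integers(10, 500, n)  # continuous variable mapped to size
rank = rng.choice(["low", "high"], n)  # categorical variable mapped to marker

colors = {"easy": "tab:green", "medium": "tab:orange", "hard": "tab:red"}
markers = {"low": "o", "high": "^"}  # hypothetical rank categories

fig, ax = plt.subplots()
jitter = rng.uniform(-0.2, 0.2, n)  # horizontal spread instead of a swarm layout
for rank_value, marker in markers.items():
    sel = rank == rank_value
    ax.scatter(
        round_id[sel] + jitter[sel],
        independent_error_abs[sel],
        c=[colors[d] for d in difficulty[sel]],
        s=followers[sel] / 5.0,  # marker area scaled from followers
        marker=marker,
        alpha=0.7,
        label=f"rank={rank_value}",
    )
ax.set_xticks([0, 1, 2])
ax.set_xlabel("round_id")
ax.set_ylabel("independent_error_abs")
ax.legend()
```

The loop is over marker types because a single scatter call can vary color and size per point but only takes one marker.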
I am trying to write a program to generate a burndown chart. The x-axis shows dates. The y-axis shows the remaining hours on a particular date. The problem is that data is not available for all the dates in advance, as it is a burndown chart. This results in the error:
"ValueError: x and y must have same first dimension"
So my question is what default values I can assign to the remaining points on Y-axis?
I will paste actual code if this information is not sufficient. Thanks.
I'm not sure if this is what you want, but by using masked arrays you can avoid plotting specific points. See my answer here.
Or maybe you'd like something more like this, which skips them on the x-axis as well as not plotting them?
I know this question is 2 1/2 years old, but I had a similar problem with plotting graphs with missing data and thought that masked arrays were more complex than they needed to be.
My solution was to put the numpy.inf value at every point where I was missing data and then use the 'o-' option when calling matplotlib.pyplot.plot. This breaks the line wherever you don't have data, and a single data point with missing data on each side of it still shows up as a circle.
The only downside is that you end up with circles at every point on the line, so it might not be pretty.
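A sketch of the placeholder idea applied to the burndown chart: pad the y values out to the length of the date axis so that x and y have the same first dimension. This version uses np.nan as the placeholder, which is a substitution on my part (the answer above used numpy.inf), and the dates and hours are made up.

```python
from datetime import date, timedelta

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt

# Hypothetical 10-day sprint with data for the first 5 days only.
start = date(2024, 1, 1)
dates = [start + timedelta(days=i) for i in range(10)]
remaining_known = [80, 72, 65, 65, 50]

# Same length as dates; days without data stay NaN and are not drawn.
remaining = np.full(len(dates), np.nan)
remaining[:len(remaining_known)] = remaining_known

fig, ax = plt.subplots()
ax.plot(dates, remaining, "o-")
ax.set_xlim(dates[0], dates[-1])  # keep the full date range on the x-axis
ax.set_ylabel("remaining hours")
fig.autofmt_xdate()
```

Setting the x limits explicitly keeps the future dates visible even though nothing is plotted there yet.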