Python - How to bin float number when using pandas.plot.hist()

I am using Pandas and trying to create a histogram of the frequency of float numbers (in bins).
I don't need to set the bins myself to any special value; I just want a reasonable-looking histogram that shows the frequency of the data points with reasonable binning or on a continuous axis.
I am trying to use this code:
vals, bins = np.histogram(total["ratio"].tolist(), bins=10)
total[["ratio"]].plot().hist(x="ratio", bins=bins)
I also tried using 10 as the parameter for bins and got the same result.
Taken from the docs
The axis is still not binned, which creates a large mess.
What am I missing here in order to bin the data?
Update
Using df["ratio"].hist() worked like a charm.
Why did df[["ratio"]].plot().hist(), which looks like it does the same thing, not work?
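One likely explanation, offered as a sketch rather than a definitive diagnosis: df[["ratio"]].plot() is itself a call that draws a line plot and returns a matplotlib Axes, so the trailing .hist(...) is matplotlib's Axes.hist and not the pandas histogram. The pandas histogram is reached through the accessor, without parentheses after plot. The frame below is random stand-in data, not the original total frame.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this in a notebook
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original `total` frame
rng = np.random.default_rng(0)
total = pd.DataFrame({"ratio": rng.random(1000)})

# `plot` is an accessor: `.plot.hist(...)` (no parentheses after `plot`) bins the data
ax = total["ratio"].plot.hist(bins=10)

# The same bin counts computed directly with NumPy
counts, edges = np.histogram(total["ratio"], bins=10)
```

total["ratio"].hist(bins=10) should give an equivalent picture; both let pandas do the binning instead of handing a column name to matplotlib.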

Related

Pyplot Scatter Plot

I'm not great at matplotlib, but I need to use it for some work I am doing. I have a set of 9 columns of data, with around 100k lines. I want to produce a scatter plot, and I don't care about the rows; they're meaningless for my purposes. What I need is for the values to be plotted as a scatter against the column they are in, regardless of which row they are a part of.
This is all loaded in from a text file in a simple 2D array using numpy.loadtxt. It's just a set of numbers, so any substitution of random numbers should work. I'm just not sure how to manipulate it in a way that the scatter command will like. I often get it giving me errors like I'm giving it too few arguments, or it just iterates over the array (or arrays if I separate them), in ways I do not anticipate.
My first thought is that I could somehow break it down into a set of series by column, but I don't think the scatter command will take that. Any help would be very much appreciated.

The scatter function takes two sequences of the same length. To access a single column n of your numpy array, just use data[:, n]. Since you want to plot every value against its column number, you also need a list of the same length as data that contains only that column number. To create the plot you want, just do the following:
for i in range(9):
    plt.scatter([i + 1] * len(data), data[:, i])
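As an alternative to the loop, the same picture can probably be drawn with a single scatter call by flattening the array and building a matching array of column numbers. This is only a sketch: the random array stands in for the data loaded with numpy.loadtxt, and the marker size is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in an interactive session
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for the array loaded with numpy.loadtxt
data = np.random.default_rng(0).normal(size=(1000, 9))
n_rows, n_cols = data.shape

# Row-major flattening yields row 0's columns, then row 1's, and so on,
# so the column numbers 1..n_cols are tiled once per row to line up with it
x = np.tile(np.arange(1, n_cols + 1), n_rows)
y = data.ravel()

fig, ax = plt.subplots()
ax.scatter(x, y, s=2)
```

With around 100k rows, one call that creates a single collection may be noticeably lighter than nine separate ones, though the loop is easier to read.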

Different result using welch function between Matlab and Python

When I run the welch function on the same data in Matlab and Python, I get slightly different PSD estimates, while the sample frequencies are identical.
Here are the parameters I used in both Matlab and Python:
Matlab:
winlength=512;
overlap=0;
fftlength=1024;
fs=127.886;
[PSD, freqs] = pwelch(data, winlength, overlap, fftlength, fs);
Python:
freqs, PSD = welch(data, fs=127.886, window='hamming', nperseg=512,
                   noverlap=None, nfft=1024)
Here's a plot presenting the difference: (image not available)
Does anyone have any idea what I should change to get the same PSD results?

The Matlab documentation (https://se.mathworks.com/help/signal/ref/pwelch.html) says that the overlap parameter has to be a positive integer, so 0 may not be treated as a valid value.
If you omit the overlap value (or pass an invalid one), the parameter is automatically set to a 50% overlap, which changes the curve.
Make sure both calls use the same overlap explicitly and see if they match. Note that in SciPy, noverlap=None already means nperseg // 2, i.e. a 50% overlap, so the Python call above is not using zero overlap.
BTW you rarely want to have zero overlap as this is likely to cause transients in the signal.
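To check how much the overlap setting matters on the Python side, a sketch like the following compares the default, an explicit 50% overlap, and zero overlap. The random signal is a stand-in for the real data; the other parameters are the ones from the question.

```python
import numpy as np
from scipy.signal import welch

fs = 127.886
rng = np.random.default_rng(0)
data = rng.normal(size=8192)  # hypothetical stand-in signal

common = dict(fs=fs, window="hamming", nperseg=512, nfft=1024)

freqs, psd_default = welch(data, noverlap=None, **common)  # SciPy default overlap
_, psd_half = welch(data, noverlap=256, **common)          # explicit 50% overlap
_, psd_zero = welch(data, noverlap=0, **common)            # no overlap between segments

# noverlap=None and noverlap=nperseg // 2 give the same estimate
same_as_half = np.allclose(psd_default, psd_half)
```

If the curves still differ once the overlap matches on both sides, other defaults may be worth checking; for example, SciPy's welch uses detrend='constant' unless told otherwise.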

matplotlib x-axis formatting if x-axis is pandas index

I'm using IPython notebook's %matplotlib inline and I'm having trouble formatting my plot.
As you can see, my first and last data point aren't showing up the way the other data points are showing up. I'd like to have the error bars visible and have the graph be "zoomed out" a bit.
df.plot(yerr=df['std dev'],color='b', ecolor='r')
plt.title('SpO2 Mean with Std Dev')
plt.xlabel('Time (s)')
plt.ylabel('SpO2')
I assume I have to use
matplotlib.pyplot.xlim()
but I'm not sure how to use it properly if my x-axis is a DataFrame index composed of strings:
index = ['-3:0','0:3','3:6','6:9','9:12','12:15','15:18','18:21','21:24']
Any ideas? Thanks!

You can see the usage of xlim here. In this case, if you ran plt.xlim() you would get (0.0, 8.0). Because your index uses text rather than numbers, the values for xlim are simply the positions of the entries in your index. So you just need to widen the limits by however many steps left and right you want your graph to extend. For example:
plt.xlim(-1,len(df))
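This adds one step of padding on each side of the plotted range, so the first and last points and their error bars are fully visible.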
Hope that helps.
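A self-contained sketch of the idea follows. Only the string index comes from the question; the mean and standard deviation values are made up, and ax.set_xlim is used here as the object-oriented equivalent of plt.xlim on the current axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed with %matplotlib inline
import pandas as pd

index = ['-3:0', '0:3', '3:6', '6:9', '9:12', '12:15', '15:18', '18:21', '21:24']
# Hypothetical values; only the string index comes from the question
df = pd.DataFrame({'mean': [97, 96, 98, 97, 95, 96, 97, 98, 96],
                   'std dev': [1.0, 1.5, 0.8, 1.2, 2.0, 1.1, 0.9, 1.3, 1.0]},
                  index=index)

ax = df['mean'].plot(yerr=df['std dev'], color='b', ecolor='r')
# The string labels sit at positions 0..len(df)-1, so pad one step on each side
ax.set_xlim(-1, len(df))
```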

Python 2.7 time series non numeric values

I am using Python 2.7 and need to draw a time series using matplotlib library. My y axis data is numeric and everything is ok with it.
The problem is my x axis data, which is not numeric, and matplotlib does not cooperate in this case. It does not draw the time series, even though this should not affect the correctness of the plot: the x axis data is arranged in a given order anyway, and its order does not affect anything logically.
For example let's say the x data is ["i","like","python"] and the y axis data is [1,2,3].
I did not add my code because I've found that the code is ok, it works if I change the data to all numeric data.
Please explain how I can use matplotlib to draw the time series without having to convert the x values to numeric values.
I've based my matplotlib code on following answers: How to plot Time Series using matplotlib Python, Time Series Plot Python.

Matplotlib requires some way of positioning those labels. See the following example:
import matplotlib.pyplot as plt
x = ["i","like","python"]
y = [1,2,3]
plt.plot(y,y) # y,y because both are numeric (you could instead create xt = [1,2,3] for the positions)
plt.xticks(y,x) # same here, the second argument are the labels.
plt.show()
Notice how I've put the labels there but had to somehow say where they are supposed to be.
I also think you should post part of your code so that it's easier for other people to make suggestions.
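The example above reuses y for the positions, which only works because the values happen to be 1, 2, 3. A slightly more general sketch keeps the positions separate from the values; the name positions is introduced here purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in an interactive session
import matplotlib.pyplot as plt

x = ["i", "like", "python"]
y = [1, 2, 3]
positions = list(range(len(x)))  # 0, 1, 2: one numeric slot per label

fig, ax = plt.subplots()
ax.plot(positions, y)      # plot against numeric positions
ax.set_xticks(positions)   # put a tick at every position
ax.set_xticklabels(x)      # and show the original strings there
```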

Colorbar based legend in python matplotlib

In the graphic below, I want to put in a legend for the calendar plot. The calendar plot was made using ax.plot(...,label='a') and drawing rectangles in a 52x7 grid (52 weeks, 7 days per week).
The legend is currently made using:
plt.gca().legend(loc="upper right")
How do I correct this legend to something more like a colorbar? Also, the colorbar should be placed at the bottom of the plot.
EDIT:
Uploaded code and data for reproducing this here:
https://www.dropbox.com/sh/8xgyxybev3441go/AACKDiNFBqpsP1ZttsZLqIC4a?dl=0

Aside - existing bugs
The code you put on the dropbox doesn't work "out of the box". In particular - you're trying to divide a datetime.timedelta by a numpy.timedelta64 in two places and that fails.
You do your own normalisation and colour mapping (calling into color_list based on an int() conversion of your normalised value). You subtract 1 from this and you don't need to - you already floor the value by using int(). The result of doing this is that you can get an index of -1 which means your very smallest values are incorrectly mapped to the colour for the maximum value. This is most obvious if you plot column 'BIOM'.
I've hacked this by adding a tiny value (0.00001) to the total range of the values that you divide by. It's a hack - I'm not sure that this method of mapping is at all the best use of matplotlib, but that's a different question entirely.
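The indexing problem can be illustrated with a small reconstruction. This is a hypothetical sketch of the pattern described above, not the code from the dropbox: color_list, buggy_index and fixed_index are stand-in names, and the palette entries are placeholder strings.

```python
# Hypothetical stand-in for the 9-entry palette
color_list = ["color_%d" % i for i in range(9)]

def buggy_index(value, min_val, max_val):
    # Normalise to [0, 1], floor with int(), then subtract 1 (the unnecessary step)
    normalised = (value - min_val) / (max_val - min_val)
    return int(normalised * len(color_list)) - 1

def fixed_index(value, min_val, max_val, eps=0.00001):
    # Slightly widen the range so the maximum maps just below 1.0
    normalised = (value - min_val) / (max_val - min_val + eps)
    return int(normalised * len(color_list))

lowest_buggy = buggy_index(0.0, 0.0, 10.0)   # -1: wraps around to the last colour
lowest_fixed = fixed_index(0.0, 0.0, 10.0)   # 0: the first colour
highest_fixed = fixed_index(10.0, 0.0, 10.0) # 8: the last colour
```

Because Python accepts negative list indices, the -1 does not raise an error; it silently picks the colour meant for the largest values.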
Solution adapting your code
With those bugs fixed, and a last subplot added below all the existing ones (i.e. replacing 3 with 4 in all your calls to subplot2grid()), you can do the following:
Replace your
plt.gca().legend(loc="upper right")
with
# plot an overall colorbar type legend
# Grab the new axes object to plot the colorbar on
ax_colorbar = plt.subplot2grid((4,num_yrs), (3,0),rowspan=1,colspan=num_yrs)
mappableObject = matplotlib.cm.ScalarMappable(cmap = palettable.colorbrewer.sequential.BuPu_9.mpl_colormap)
mappableObject.set_array(numpy.array(df[col_name]))
col_bar = fig.colorbar(mappableObject, cax = ax_colorbar, orientation = 'horizontal', boundaries = numpy.arange(min_val,max_val,(max_val-min_val)/10))
# You can change the boundaries kwarg to either make the scale look less boxy (increase 10)
# or to get different values on the tick marks, or even omit it altogether to let
# matplotlib work the boundaries out for itself
col_bar.set_label(col_name)
ax_colorbar.set_title(col_name + ' color mapping')
I tested this with two of your columns ('NMN' and 'BIOM') on Python 2.7 (I assume you're using Python 2.x given the print statement syntax).
The finalised code that works directly with your data file is in a gist here
How does it work?
It creates a ScalarMappable object that matplotlib can use to map values to colors. It sets the array that this map is based on to all the values in the column you are dealing with. It then uses Figure.colorbar() to add the colorbar, passing in the mappable object so that the labels are correct. I've added boundaries so that the minimum value is shown explicitly; you can omit that if you want matplotlib to sort that out for itself.
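A stripped-down sketch of the same mechanism, independent of the calendar code, might look like this. The values are random stand-ins for a data column, the built-in "BuPu" colormap is used in place of the palettable one, and an explicit Normalize is passed so the colour limits do not depend on autoscaling.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in an interactive session
import matplotlib.cm
import matplotlib.colors
import matplotlib.pyplot as plt
import numpy as np

values = np.random.default_rng(0).uniform(0.0, 50.0, size=364)  # hypothetical column data
min_val, max_val = values.min(), values.max()

fig = plt.figure()
ax_main = plt.subplot2grid((4, 1), (0, 0), rowspan=3)      # placeholder for the calendar
ax_colorbar = plt.subplot2grid((4, 1), (3, 0), rowspan=1)  # bottom row for the colorbar

# The mappable carries the value-to-colour mapping that the colorbar displays
norm = matplotlib.colors.Normalize(vmin=min_val, vmax=max_val)
mappable = matplotlib.cm.ScalarMappable(norm=norm, cmap="BuPu")
mappable.set_array(values)

col_bar = fig.colorbar(mappable, cax=ax_colorbar, orientation="horizontal")
col_bar.set_label("value")
```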
P.S. I've set the colormap to palettable.colorbrewer.sequential.BuPu_9.mpl_colormap, matching your get_colors() function, which gets these colours as a 9-member list. I strongly recommend importing the colormap you want to use under a nice name to make the use of mpl_colors and mpl_colormap easier to understand, e.g.
from palettable.colorbrewer.sequential import BuPu_9 as color_scale
Then access it as
color_scale.mpl_colormap
That way, you can keep your code DRY and change the colors with only one change.
Layout (in response to comments)
The colorbar may be a little big (certainly tall) to be aesthetically ideal. There are a few possible ways to fix that. I'll point you to two:
The "right" way to do it is probably to use a Gridspec
You could use your existing approach, but increase the number of rows and have the colorbar still in one row, while the other elements span more rows than they do currently.
I've implemented the second option with 9 rows, an extra column (so that the month labels don't get lost) and the colorbar on the bottom row, spanning 2 fewer columns than the main figure. I've also used tight_layout with w_pad=0.0 to avoid label clashes. You can play with this to get your exact preferred size. New code here.

There are functions to do this in matplotlib.colorbar. With some specific code from your example, I could give you a better answer, but you'll use something like:
myColorbar = matplotlib.colorbar.ColorbarBase(myAxes, cmap=myColorMap,
                                              norm=myNorm,
                                              orientation='vertical')
