Histogram of a dataframe

I have a data frame (its contents are not shown here), and I am trying to plot a histogram from it such that the letters {A,B,C,D} are on the x-axis and the y-axis shows the numbers. I have tried the following:
df.plot(kind='hist')
for which I get the address of the Axes object instead of the plot, i.e.:
<matplotlib.axes._subplots.AxesSubplot at 0x11217d5f8>
How can I show the plot?

IIUC, you need to transpose the dataframe so that the index ['A','B','C','D'] ends up on the x-axis, and then plot. The AxesSubplot line you see means the plot object was created but not rendered. Depending on your environment and versions, the plot may be displayed automatically together with that Axes object; otherwise, call plt.show() explicitly to display it.
import matplotlib.pyplot as plt
df.T.plot(kind='hist')
plt.show()
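If the goal is one bar per letter rather than a distribution of values, a bar plot may be closer to what the question describes. The sketch below assumes a one-row data frame with columns A to D; the values are hypothetical, since the original data is not shown.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: the data frame from the question is not shown.
df = pd.DataFrame({"A": [3], "B": [7], "C": [1], "D": [5]})

fig, ax = plt.subplots()
# Transposing turns the column labels A..D into the index,
# which pandas then uses as the x-axis categories of the bar plot.
df.T.plot(kind="bar", ax=ax, legend=False)
ax.set_xlabel("letter")
ax.set_ylabel("value")
```

Call plt.show() afterwards when running this as a script.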

Related

xticklabels disappear when using ax.twin in subplots

I'm trying to create a two-row subplot figure using pandas plot, where the upper subplot has a secondary y-axis, but creating the secondary axis makes the xticklabels of that subplot disappear.
I used the following code:
fig,axes=plt.subplots(2,1)
ax=axes[0]
pd.Series(range(10)).plot(ax=ax)
ax2=ax.twinx()
(pd.Series(range(10))**2).plot(ax=ax2)
ax=axes[1]
pd.Series(range(10)).plot(ax=ax)
When I use the same code but swap the order of the subplots, it works fine:
fig,axes=plt.subplots(2,1)
ax=axes[0]
pd.Series(range(10)).plot(ax=ax)
ax=axes[1]
pd.Series(range(10)).plot(ax=ax)
ax2=ax.twinx()
(pd.Series(range(10))**2).plot(ax=ax2)

For this, I suggest using matplotlib directly rather than through pandas. That should solve your issue.
It would look something like this:
import matplotlib.pyplot as plt
import pandas as pd
fig,axes=plt.subplots(2,1)
ax=axes[0]
ax.plot(pd.Series(range(10)))
ax2=ax.twinx()
ax2.plot(pd.Series(range(10))**2)
ax=axes[1]
ax.plot(pd.Series(range(10)))

How do I use matplotlib's hist with x and y values using bins=40 in Python 3?

I have a large list of data points with x and y values that I need to put into a histogram with 40 bins, but matplotlib's hist only lets me enter one variable together with bins. I've tried hist2d as well, but the result is not very clean. Any help would be appreciated!

Since you have data points x and y, you can pass both to the hist method to plot them as two histograms on the same axes.
The following code creates the histogram:
import matplotlib.pyplot as plt
plt.hist([x,y],bins=40, histtype='step',fill=True)
plt.show()
You can also change the style, for example by drawing unfilled bars with fill=False, or add a title and axis labels.
If you still run into problems, let me know.
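As a minimal end-to-end sketch of the hist call above: x and y here are synthetic stand-ins (an assumption, since the asker's data is not shown), and the axis labels are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the asker's x and y lists (assumption).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=1000)
y = rng.normal(loc=1.0, scale=1.5, size=1000)

fig, ax = plt.subplots()
# Passing a list of datasets bins both of them with the same 40 bin edges,
# computed over the combined range of x and y.
counts, edges, _ = ax.hist([x, y], bins=40, histtype="step", label=["x", "y"])
ax.set_xlabel("value")
ax.set_ylabel("count")
ax.legend(loc="best")
```

Because the bin edges are shared, the two outlines can be compared bin by bin. Call plt.show() to display the figure.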

Maybe you can use matplotlib to overlay two histograms on top of each other.
The code below plots histograms of y_train and regressor.predict(X_train) on the same axes.
Replace these variables with your own data as required.
import matplotlib.pyplot as plt
plt.hist(y_train, stacked=True,bins=40, label='Actual', alpha=0.5)
plt.hist(regressor.predict(X_train),bins=40, stacked=True, label='Predicted', alpha=0.5)
plt.legend(loc='best')
plt.show()
Hope this helps!

Display all x values of a graph

I know it has already been asked, but I could not solve my problem.
I have three pandas columns: one with dates and the other two with values.
I can plot the two curves against the date.
However, I cannot display all the dates on the x-axis. Can you help me?
import pandas as pd
import matplotlib.pyplot as plt
# mau_file is the pandas dataframe with three columns.
plt.figure()
mau_file.plot(x='month_date', y=['mau', 'nb_migs'], figsize=(10,5), grid=True)
plt.set_xticklabels(mau_file['month_date'])
plt.legend(loc='best')
plt.show()

Usually, plt.xticks() is used to set the tick locations and labels of the x-axis.
As I'm not sure it is fully compatible with a pandas structure, you may need to convert your data to a plain list or a numpy array first.
Documentation of plt.xticks()
EDIT: It is also possible to choose the orientation of the labels.
For example, plt.xticks(x, labels, rotation='vertical') will give you vertical labels.
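One way to apply this idea to the question is sketched below. The data frame is a hypothetical stand-in for mau_file (the real data is not shown), and the curves are plotted against integer positions so that every row gets its own tick; this is an illustrative choice, not the only option.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for mau_file; the real data is not shown.
mau_file = pd.DataFrame({
    "month_date": pd.date_range("2020-01-01", periods=6, freq="MS"),
    "mau": [10, 12, 15, 14, 18, 21],
    "nb_migs": [1, 3, 2, 4, 3, 5],
})

# One tick position per row, labelled with the formatted date.
positions = list(range(len(mau_file)))
labels = mau_file["month_date"].dt.strftime("%Y-%m").tolist()

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(positions, mau_file["mau"], label="mau")
ax.plot(positions, mau_file["nb_migs"], label="nb_migs")
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation="vertical")
ax.grid(True)
ax.legend(loc="best")
```

Note that plotting against positions spaces the points evenly, which is fine for regular monthly data but hides gaps in irregular dates.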

Single column heat map in python

My goal is to have a single-column heat map, but for some reason the code I normally use for heat maps doesn't work if I'm not using a 2-D array.
vec1 = np.asarray([1,2,3,4,5])
fig, ax = plt.subplots()
plt.imshow(vec1, cmap='jet')
I know it's weird to show a single column vector as a heat map, but it's a nice visual for my purposes. I just want a column of colored squares that I can label along the y-axis to show a ranked list of things to people.

You could use the Seaborn library to do this. seaborn.heatmap expects 2-D data, so wrap your array in a list to make it a single row. The following should accomplish what you want:
import numpy as np
import matplotlib.pyplot as plt
import seaborn
vec1 = np.asarray([1,2,3,4,5])
fig, ax = plt.subplots()
seaborn.heatmap([vec1])
Then you just format that heatmap as you would any other matplotlib plot.
http://seaborn.pydata.org/generated/seaborn.heatmap.html

Starting from the previous answer, I've come up with an approach that uses both Seaborn and Matplotlib's transforms to do what pavlov requested in a comment (that is, swapping the axes of a heatmap even though Seaborn does not have an orientation parameter).
Let's start from the previous answer:
import seaborn as sns
vec1 = np.asarray([1,2,3,4,5])
sns.heatmap([vec1])
plt.show()
Using heatmap on a single vector wrapped in a list draws a single row of cells.
Ok, let's swap the x-axis with the y-axis. To do that, we can use an Affine2D transform, applying a rotation of 90 degrees.
from matplotlib import transforms
tr = transforms.Affine2D().rotate_deg(90)
Let's also reshape the initial array to make it resemble a column vector:
vec2 = vec1.reshape(vec1.shape[0], 1)
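Now we can plot the heatmap. Note that plt.show() does not apply a transform to the figure, so passing tr to it has no effect on the layout (and may raise an error in recent Matplotlib versions); the column layout comes from the reshape above:
sns.heatmap(vec2)
plt.show()
The resulting plot is a single column of cells.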
Now, if we want to force each cell to be a square, we can simply use the square=True parameter:
sns.heatmap(vec2, square=True)
plt.show()
This gives the final result: a column of square cells.
Hope it helps!
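A column of colored cells can also be drawn with plain imshow, as in the question, once the vector is reshaped into a 2-D array with one column. The following is a minimal sketch; the "item N" labels are hypothetical placeholders for the ranked list.

```python
import matplotlib.pyplot as plt
import numpy as np

vec1 = np.asarray([1, 2, 3, 4, 5])
# imshow needs 2-D data, so turn the vector into a 5x1 column.
column = vec1.reshape(-1, 1)

fig, ax = plt.subplots()
image = ax.imshow(column, cmap="jet")
ax.set_xticks([])  # a single column needs no x ticks
ax.set_yticks(list(range(len(vec1))))
# Hypothetical labels for the ranked items.
item_labels = ["item {}".format(i + 1) for i in range(len(vec1))]
ax.set_yticklabels(item_labels)
fig.colorbar(image, ax=ax)
```

With the default aspect setting of imshow, each cell is drawn as a square.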

y-axis scaling in seaborn vs pandas

I'm plotting a scatter plot using a pandas dataframe. This works correctly, but I wanted to use seaborn themes and special functions. When I plot the same data points with seaborn, the y-axis values are almost invisible. The x-axis values range from 5000 to 15000, while the y-axis values are in [-6:6]*10^-7.
If I multiply the y-axis values by 10^6, they display correctly, but the actual values remain invisible/indistinguishable in a seaborn-generated plot.
How can I configure seaborn so that the y-axis scales automatically in the resulting plot?
Also, some rows may contain NaN (not in this case). How can I disregard them while plotting, short of manually weeding out the rows containing NaN?
Below is the code I've used to plot.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("datascale.csv")
subdf = df.loc[(df.types == "easy") & (df.weight > 1300), ]
subdf = subdf.iloc[1:61, ]
subdf.drop(subdf.index[[25]], inplace=True) #row containing NaN
subdf.plot(x='length', y='speed', style='s') #scales y-axis correctly
sns.lmplot("length", "speed", data=subdf, fit_reg=True, lowess=True) #doesn't scale y-axis properly
# multiplying by 10^6 displays the plot correctly, in matplotlib
plt.scatter(subdf['length'], 10**6*subdf['speed'])

It is strange that seaborn does not scale the axis correctly. Nonetheless, you can correct this behaviour. First, keep a reference to the FacetGrid returned by lmplot, which gives access to the axes of the plot:
lm = sns.lmplot(x="length", y="speed", data=subdf, fit_reg=True)
After that you can manually set the y-axis limits:
lm.axes[0,0].set_ylim(min(subdf.speed), max(subdf.speed))
The result should look something like this:
Example Jupyter notebook here.
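The same idea of setting the y-limits from the data can be sketched in plain matplotlib. The helper name padded_limits and the synthetic data are assumptions made for illustration; the NaN-aware min and max mean that rows with missing values do not need to be dropped first.

```python
import matplotlib.pyplot as plt
import numpy as np

def padded_limits(values, pad_fraction=0.05):
    """Return (low, high) spanning the non-NaN values plus a small margin."""
    low = np.nanmin(values)
    high = np.nanmax(values)
    pad = (high - low) * pad_fraction
    return low - pad, high + pad

# Synthetic data on the scales mentioned in the question (assumption).
length = np.linspace(5000, 15000, 50)
speed = 6e-7 * np.sin(length / 1500.0)
speed[10] = np.nan  # a missing value, skipped when computing the limits

fig, ax = plt.subplots()
ax.scatter(length, speed, marker="s")
low, high = padded_limits(speed)
ax.set_ylim(low, high)
```

The same limits could be passed to lm.axes[0,0].set_ylim in the seaborn version.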

Seaborn and matplotlib should just ignore NaN values when plotting. You should be able to leave them as is.
As for the y scaling: there might be a bug in seaborn.
The most basic workaround is still to scale the data before plotting.
Scale to microspeed in the dataframe before plotting and plot microspeed instead.
subdf['microspeed']=subdf['speed']*10**6
Or, if all the values are positive, transform to log y before plotting, e.g.
import math
df = pd.DataFrame({'speed':[1, 100, 10**-6]})
df['logspeed'] = df['speed'].map(lambda x: math.log(x,10))
then plot logspeed instead of speed.
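A vectorised variant of this transform, as a sketch using numpy instead of math.log. Masking non-positive values to NaN is an assumption added here, since the logarithm is undefined for them and the data in the question includes negative speeds.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"speed": [1, 100, 10**-6]})

# Keep only positive speeds; anything else becomes NaN before taking the log.
positive_speed = df["speed"].where(df["speed"] > 0)
df["logspeed"] = np.log10(positive_speed)
```

Rows masked to NaN are simply left out of the plot.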
Another approach would be to use seaborn regplot instead.
Matplotlib correctly scales and plots the data for me as follows:
plt.plot(subdf['length'], subdf['speed'], 'o')
