Plot a histogram of dates - python

I'm trying to plot a histogram of dates from a pandas DataFrame. I have looked at the question "Can Pandas plot a histogram of dates?" and some others; the approach there works, but it plots a bar chart instead of a histogram.
Is there an easy way of plotting a histogram of dates or should I extract the year as numbers and plot a histogram of an array of numbers?
Thanks!

You can explicitly register a converter using:
pd.plotting.register_matplotlib_converters()
In some pandas versions the date converters are not registered with matplotlib automatically, so datetime values may fail to plot until you register the converters explicitly like this.
As a very simple example, try:
import pandas as pd
pd.plotting.register_matplotlib_converters()
df = pd.DataFrame({'date': [pd.to_datetime('1/1/2019')]*8 + [pd.to_datetime('2/1/2019')]*4})
df['date'].hist()
which draws a histogram with a count of 8 at the first date and 4 at the second.
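On the second part of the question (extracting the year as numbers), here is a minimal sketch using the same toy data as above. The names `years` and `per_month` are only illustrative, and whether you bin by year or by month is an assumption about what you want to see.

```python
import pandas as pd

# Same toy data as in the example above.
df = pd.DataFrame(
    {"date": [pd.Timestamp("2019-01-01")] * 8 + [pd.Timestamp("2019-02-01")] * 4}
)

# Option 1: extract the year as plain integers, which hist() can bin directly.
years = df["date"].dt.year

# Option 2: count rows per calendar month; sort_index keeps the months in time order.
per_month = df["date"].dt.to_period("M").value_counts().sort_index()
print(per_month)
```

From there, `years.hist()` or `per_month.plot(kind="bar")` should draw the counts (both need matplotlib installed).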

Related

When making histogram Matplotlib Python the x-values and y-values are interchanged?

After I created a simple series, I wanted to create a histogram by doing this:
data = np.array([3368, 4043])
s = pd.Series(data,index=['FEMALE','MALE'])
s.hist()
However, the histogram will look like this:
This is not correct, as I want the variable sex on the x-axis and I want the values of 3368 and 4043 on the y-axis. How do I fix this?
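A possible fix, as a minimal sketch: `hist()` bins the values themselves (3368 and 4043), so the labels never reach the x-axis. A bar plot uses the index as the x-axis and the values as bar heights, which seems to be what is wanted here.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = np.array([3368, 4043])
s = pd.Series(data, index=["FEMALE", "MALE"])

# A bar plot puts the index labels on the x-axis and the values on the y-axis.
ax = s.plot(kind="bar", rot=0)
ax.set_xlabel("sex")
ax.set_ylabel("count")
# In a script (outside a notebook), call plt.show() to open the figure.
```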

Plot a graph in matplotlib with two different scales on one axis

I'm trying to plot a graph with time data on the x-axis. My data is daily, but I want two different date scales on the x-axis.
The axis should run in years from 2005 to 2014, and after 2014 it should continue in months of 2015. Is this possible? If so, how can I create this kind of plot?
Thanks.
I provided an image below:
Yes, you can. Plot each date range as its own series on the same axes; since both use dates on the x-axis, the second series is drawn to the right of the first.
For a dataframe:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = np.array([45, 63, 83, 91, 101])
# Weekly data starting in 2005
df1 = pd.DataFrame(data, index=pd.date_range('2005-10-09', periods=5, freq='W'), columns=['events'])
# Monthly data starting in 2015 (recent pandas versions prefer freq='ME' over 'M')
df2 = pd.DataFrame(np.arange(10, 21, 2), index=pd.date_range('2015-01-09', periods=6, freq='M'), columns=['events'])
plt.plot(df1.index, df1.events)
plt.plot(df2.index, df2.events)
plt.show()
You can change the parameters according to your convenience.
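If the goal is two genuinely different scales (years up to 2014, then months of 2015), one alternative is two side-by-side subplots that share the y-axis. This is only a sketch under assumptions: the daily series `daily` is made-up data, and the width ratio and tick formats are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical daily data covering 2005 through 2015.
idx = pd.date_range("2005-01-01", "2015-12-31", freq="D")
daily = pd.Series(np.arange(len(idx), dtype=float), index=idx)

early = daily.loc[:"2014-12-31"]   # shown on a yearly scale
late = daily.loc["2015-01-01":]    # shown on a monthly scale

# Two panels sharing the y-axis; the left one is wider than the right one.
fig, (ax_years, ax_months) = plt.subplots(
    1, 2, sharey=True, figsize=(10, 4),
    gridspec_kw={"width_ratios": [2, 1], "wspace": 0.05},
)
ax_years.plot(early.index, early.values)
ax_months.plot(late.index, late.values)

# Yearly ticks on the left panel, monthly ticks on the right panel.
ax_years.xaxis.set_major_locator(mdates.YearLocator(1))
ax_years.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax_months.xaxis.set_major_locator(mdates.MonthLocator())
ax_months.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
# In a script (outside a notebook), call plt.show() to open the figure.
```

The two panels read as one axis with a change of scale at the boundary, at the cost of a visible seam between them.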

How to align bars with tick labels in plt or pandas histogram (when plotting multiple columns)

I have started using python for lots of data problems at work and the datasets are always slightly different. I'm trying to explore more efficient ways of plotting data using the inbuilt pandas function rather than individually writing out the code for each column and editing the formatting to get a nice result.
Background: I'm using Jupyter notebook and looking at histograms where the values are all unique integers.
Problem: I want the xtick labels to align with the centers of the histogram bars when plotting multiple columns of data with the one function e.g. df.hist() to get histograms of all columns at once.
Does anyone know if this is possible?
Or is it recommended to do each graph on its own vs. using the inbuilt function applied to all columns?
I can modify them individually following this post: Matplotlib xticks not lining up with histogram
which gives me what I would like but only for one graph and with some manual processing of the values.
Desired outcome example for one graph:
Basic example of data I have:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create list of datapoints
data = [[170,30,210],
[170,50,200],
[180,50,210],
[165,35,180],
[170,30,190],
[170,70,190],
[170,50,190]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['height', 'width','weight'])
# print dataframe.
df
Code that displays the graphs in the problem statement
df.hist(figsize=(5,5))
plt.show()
Code that displays the graph for weight how I would like it to be for all
df.hist(column='weight',bins=[175,185,195,205,215])
plt.xticks([180,190,200,210])
plt.yticks([0,1,2,3,4,5])
plt.xlim([170, 220])
plt.show()
Any tips or help would be much appreciated!
Thanks
I hope this helps. Take the column and count the frequency of each value (value_counts), then call sort_index so the bars are ordered by value rather than by frequency, and draw a bar plot.
import pandas as pd
import matplotlib.pyplot as plt
data = [[170,30,210],
[170,50,200],
[180,50,210],
[165,35,180],
[170,30,190],
[170,70,190],
[170,50,190]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['height', 'width','weight'])
df.weight.value_counts().sort_index().plot(kind = 'bar')
plt.show()
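To get this for every column at once, the same idea can be applied in a loop. This is a rough sketch rather than a built-in pandas feature; the figure size and the one-row layout are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = [[170, 30, 210],
        [170, 50, 200],
        [180, 50, 210],
        [165, 35, 180],
        [170, 30, 190],
        [170, 70, 190],
        [170, 50, 190]]
df = pd.DataFrame(data, columns=["height", "width", "weight"])

# One subplot per column; each bar is centered on its tick label.
fig, axes = plt.subplots(1, len(df.columns), figsize=(9, 3))
for ax, col in zip(axes, df.columns):
    counts = df[col].value_counts().sort_index()
    counts.plot(kind="bar", ax=ax, title=col, rot=0)
fig.tight_layout()
# In a script (outside a notebook), call plt.show() to open the figure.
```

Note that a bar plot treats the values as categories, so the bars are evenly spaced even when the underlying numbers are not.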

Histogram of a dataframe

I have a data frame as follows:
and I am trying to plot a histogram from it such that the letters {A,B,C,D} are on the x-axis and the y-axis shows the numbers. I have tried the following:
df.plot(kind='hist')
for which I get the address instead of the plot, i.e:
<matplotlib.axes._subplots.AxesSubplot at 0x11217d5f8>
How can I show the plot?
IIUC, you need to transpose the dataframe so that the index ['A','B','C','D'] ends up on the x-axis, and then plot. Also call plt.show() to display the histogram. Depending on your environment and versions, the plot may be displayed directly along with the Axes object; if it is not, you need to call plt.show() explicitly.
import matplotlib.pyplot as plt
df.T.plot(kind='hist')
plt.show()

Graphing percentage data in seaborn

How would I graph this data in seaborn? I would like to have the various categories on the x-axis and the data on the y-axis as percentages.
I tried to create a barplot with seaborn but I can't get it to look right.
Any help would be appreciated!
Thanks
Edit: code:
sns.barplot(x = new_df.columns,data=new_df)
I suggest you organize your DataFrame more like this; it will make it much easier to plot and organize this type of data.
Rather than keeping the DataFrame as you have it, transpose it into two simple columns like so:
name value
debt_consolidation 0.152388
credit_card 0.115689
all_other 0.170111
etc. By doing this you can simply plot your data in Seaborn by doing the below:
sns.barplot(x="name",y="value", data = df)
This produces a bar chart with one bar per category.
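The reshaping step could look roughly like this. It assumes the original `new_df` has a single row with one column per category, which is a guess because the original data is not shown; the `percent` column is a hypothetical addition for a percentage axis.

```python
import pandas as pd

# Assumed wide layout: one row, one column per category
# (values taken from the table in the answer above).
new_df = pd.DataFrame(
    {
        "debt_consolidation": [0.152388],
        "credit_card": [0.115689],
        "all_other": [0.170111],
    }
)

# Reshape to two columns: the category name and its value.
long_df = new_df.melt(var_name="name", value_name="value")

# Hypothetical extra column holding the values as percentages.
long_df["percent"] = long_df["value"] * 100
print(long_df)
```

The result can then be passed to seaborn as in the answer, for example `sns.barplot(x="name", y="percent", data=long_df)`.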
