I have a table that I'm trying to display as a bar chart. It is annual data, with entries from 1 January until 31 December of the same year:
DATE COUNT
0 2019-01-01 42
1 2019-02-01 3
2 2019-03-01 31
3 2019-04-01 13
4 2019-05-01 1
...
When I plot this with 'DATE' on the x-axis, plotly automatically groups the x-axis into weeks, so that I have 52 bars instead of 365.
fig = px.histogram(df, x="DATE", y="COUNT", title="title")
fig.update_layout(bargap=0.30)
fig
I've tried updating the ticks with various formats, but this only changes the x-axis labels, not the number of bars.
I'm not sure how to change the x-axis from weekly to daily.
Related
My dataset is like this
Days Visitors
Tuesday 23
Monday 30
Sunday 120
Friday 2
Friday 30
Tuesday 13
Monday 20
Saturday 100
How can I plot a histogram for this dataset? Assume it is a large dataset (560030 rows), not just these values.
I want to have the days on the x-axis and the visitors on the y-axis.
Use seaborn, which is an API for matplotlib.
seaborn.histplot
seaborn.displot
This will show the distribution of the number of visitors for each day of the week.
sns.histplot
import seaborn as sns
import pandas as pd
import numpy as np # for test data
import random # for test data
import calendar # for test data
# test dataframe
np.random.seed(365)
random.seed(365)
df = pd.DataFrame({'Days': random.choices(calendar.day_name, k=1000), 'Visitors': np.random.randint(1, 121, size=(1000))})
# display(df.head(6))
Days Visitors
0 Friday 83
1 Sunday 53
2 Saturday 34
3 Wednesday 92
4 Tuesday 45
5 Wednesday 6
# plot the histogram
sns.histplot(data=df, x='Visitors', hue='Days', multiple="stack")
Once the histogram is plotted, if the legend needs to be moved, the workaround found in the seaborn issue "Not clear how to reposition seaborn.histplot legend #2280" may be necessary.
sns.displot
This option most clearly conveys the daily distribution of visitor counts
sns.displot(data=df, col='Days', col_wrap=4, x='Visitors')
Barplot
seaborn.barplot
This will show the sum of all visits for a given day
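import matplotlib.pyplot as plt  # needed for plt.xticks
# ci=None hides the error bars; newer seaborn versions use errorbar=None instead
sns.barplot(data=df, x='Days', y='Visitors', estimator=sum, ci=None)
plt.xticks(rotation=90)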
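For a very large table it can be cheaper to aggregate with pandas first and plot only the seven totals. This is a minimal sketch using the eight sample rows from the question; the weekday ordering via calendar.day_name assumes an English locale.

```python
import calendar

import pandas as pd

# The sample rows from the question
df = pd.DataFrame({
    "Days": ["Tuesday", "Monday", "Sunday", "Friday",
             "Friday", "Tuesday", "Monday", "Saturday"],
    "Visitors": [23, 30, 120, 2, 30, 13, 20, 100],
})

day_order = list(calendar.day_name)  # Monday ... Sunday

# Sum the visitors per day, then put the days in calendar order;
# days that never occur get a total of 0
totals = df.groupby("Days")["Visitors"].sum().reindex(day_order, fill_value=0)

ax = totals.plot(kind="bar", rot=90)
ax.set_ylabel("Visitors")
```

Only the aggregated Series is handed to the plotting call, so the cost of drawing does not grow with the number of rows.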
I would like to plot a graph with the most recent dates on the left instead of the right of the x-axis.
Is there a way to do this in pandas and matplotlib and still get the date axis?
Invert an axis in a matplotlib grafic
shows how to do this for the y-axis using invert_yaxis(). However, this is not available for xaxis.
Set xlim() from pyplot. Let's take this example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
period = pd.period_range("1.1.2013","12.1.2013",freq="M")
data = np.arange(12)
s = pd.Series(data=data,index=period)
#Output
2013-01 0
2013-02 1
2013-03 2
2013-04 3
2013-05 4
2013-06 5
2013-07 6
2013-08 7
2013-09 8
2013-10 9
2013-11 10
2013-12 11
Set first value of xlim to be last index of series and second value to be the first index, like this:
s.plot()
plt.xlim(s.index[-1],s.index[0])
plt.show()
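Another option: matplotlib Axes objects have an invert_xaxis() method alongside invert_yaxis(), so the limits do not need to be set by hand. A minimal sketch, assuming a DatetimeIndex instead of the PeriodIndex used above:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Monthly dates for 2013, as in the example above but as timestamps
dates = pd.date_range("2013-01-01", periods=12, freq="MS")
s = pd.Series(np.arange(12), index=dates)

fig, ax = plt.subplots()
ax.plot(s.index, s.values)
ax.invert_xaxis()    # most recent date now appears on the left
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```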
I have a Series with more than 100,000 rows that I want to plot. I have a problem with the x-axis of my figure: since it is made up of many dates, nothing is readable if all of them are shown.
How can I show only one out of every x labels on the x-axis?
Here is an example of code that produces a plot with an ugly x-axis:
sr = pd.Series(np.array(range(15)))
sr.index = [ '2018-06-' + str(x).zfill(2) for x in range(1,16)]
Out :
2018-06-01 0
2018-06-02 1
2018-06-03 2
2018-06-04 3
2018-06-05 4
2018-06-06 5
2018-06-07 6
2018-06-08 7
2018-06-09 8
2018-06-10 9
2018-06-11 10
2018-06-12 11
2018-06-13 12
2018-06-14 13
2018-06-15 14
fig = plt.plot(sr)
plt.xlabel('Date')
plt.ylabel('Sales')
Using xticks you can achieve the desired effect:
In your example:
sr = pd.Series(np.array(range(15)))
sr.index = [ '2018-06-' + str(x).zfill(2) for x in range(1,16)]
fig = plt.plot(sr)
plt.xlabel('Date')
plt.xticks(sr.index[::4]) #Show one in every four dates
plt.ylabel('Sales')
Also, if you want to set the number of ticks instead, you can use locator_params:
sr.plot(xticks=sr.reset_index().index)
plt.locator_params(axis='x', nbins=5) #Show five dates
plt.ylabel('Sales')
plt.xlabel('Date')
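With a much longer series, converting the string index to real datetimes lets matplotlib choose the tick spacing itself. The following is a sketch under that assumption, using matplotlib.dates.AutoDateLocator and ConciseDateFormatter; the minticks and maxticks values are arbitrary choices.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(15))
# Parse the date strings so the x-axis is a true date axis
sr.index = pd.to_datetime(["2018-06-" + str(x).zfill(2) for x in range(1, 16)])

fig, ax = plt.subplots()
ax.plot(sr.index, sr.values)

# Let matplotlib pick a sensible date spacing for the plotted range
locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

ax.set_xlabel("Date")
ax.set_ylabel("Sales")
```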
I have data with dates that I would like to split automatically by year in a scatterplot. It looks like:
Date Level Price
2008-01-01 56 11
2008-01-03 10 12
2008-01-05 52 13
2008-02-01 66 14
2008-05-01 20 10
..
2009-01-01 12 11
2009-02-01 70 11
2009-02-05 56 12
..
2018-01-01 56 10
2018-01-11 10 17
..
The only way I know how to tackle this is to manually select rows using iloc and eyeball the dates in the dataframe, like this:
fig = plt.figure(figsize=(15,10))
ax1 = fig.add_subplot(111)
ax1.scatter(df.iloc[____, 1], df.iloc[_____, 2], s=10, c='r', marker="o", label='2008')
ax1.scatter(df.iloc[____, 1],df.iloc[____, 2], s=10, c='mediumblue', marker="o", label='2009')
.
.
. (for each year I want)
plt.ylabel('Level', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(loc='upper left', prop={'size': 12});
plt.show()
But this takes a lot of time.
I'd like to automatically loop through each date's year, plot Level (y) against Price (x) in a different color for each year, and make a legend label for each year.
What would be a good strategy to do this?
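One possible strategy, sketched with a few of the sample rows above: group the rows by the year of the Date column and call scatter once per group, so each year gets its own color and legend entry. It assumes Date can be parsed as a datetime.

```python
import matplotlib.pyplot as plt
import pandas as pd

# A few of the sample rows from the question
df = pd.DataFrame({
    "Date": pd.to_datetime(["2008-01-01", "2008-01-03", "2009-01-01",
                            "2009-02-01", "2018-01-01", "2018-01-11"]),
    "Level": [56, 10, 12, 70, 56, 10],
    "Price": [11, 12, 11, 11, 10, 17],
})

fig, ax = plt.subplots(figsize=(15, 10))

# One scatter call per year; matplotlib cycles the colors automatically
for year, group in df.groupby(df["Date"].dt.year):
    ax.scatter(group["Price"], group["Level"], s=10, marker="o", label=str(year))

ax.set_xlabel("Price", fontsize=14)
ax.set_ylabel("Level", fontsize=14)
ax.legend(loc="upper left", prop={"size": 12})
```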
I've made a dataframe that has dates and 2 values that looks like:
Date Year Level Price
2008-01-01 2008 56 11
2008-01-03 2008 10 12
2008-01-05 2008 52 13
2008-02-01 2008 66 14
2008-05-01 2008 20 10
..
2009-01-01 2009 12 11
2009-02-01 2009 70 11
2009-02-05 2009 56 12
..
2018-01-01 2018 56 10
2018-01-11 2018 10 17
..
I'm able to color the points by year by creating a year column with df['Year'] = df['Date'].dt.year, but I also want a label for each year in the legend.
My code right now for plotting by year looks like:
colors = ['turquoise','orange','red','mediumblue', 'orchid', 'limegreen']
fig = plt.figure(figsize=(15,10))
ax = fig.add_subplot(111)
ax.scatter(df['Price'], df['Level'], s=10, c=df['Year'], marker="o", label=df['Year'], cmap=matplotlib.colors.ListedColormap(colors))
plt.title('Title', fontsize=16)
plt.ylabel('Level', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(loc='upper left', prop={'size': 12});
plt.show()
How can I adjust the labels in the legend to show the year? The way I've done it is just using the Year column but that obviously just gives me results like this:
When you are scattering your points, you will want to make sure that you are accessing a column that exists in your dataframe. In your code, you are trying to access a column called 'Year' which doesn't exist. See below for the problem:
ax.scatter(df['Price'], df['Level'], s=10, c=df['Year'], marker="o", label=df['Year'], cmap=matplotlib.colors.ListedColormap(colors))
In this line of code, where you specify the color (c), you are looking for a column that doesn't exist. You have the same problem with the label that you are passing in. To solve this you need to create a column that contains the year:
Extract all the dates
Grab just the year from each date
Add this to your dataframe
Below is some code to implement these steps:
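# Collect all the dates as strings (e.g. '2008-01-01')
dates = df.Date.astype(str).values
# Create a list of the years by taking the part before the first '-'
# (converted to int so the column can be used for the color argument)
years = [int(x.split('-')[0]) for x in dates]
# Add this column to your dataframe
df['Year'] = years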
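Once a numeric Year column exists, one way to get a legend entry per year from a single scatter call is the legend_elements() method of the returned collection (available in matplotlib 3.1 and later). This is a sketch with a few of the sample rows and a shortened color list, not necessarily the only approach.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import ListedColormap

# A few of the sample rows from the question
df = pd.DataFrame({
    "Date": pd.to_datetime(["2008-01-01", "2008-01-03", "2009-01-01",
                            "2009-02-01", "2018-01-01", "2018-01-11"]),
    "Level": [56, 10, 12, 70, 56, 10],
    "Price": [11, 12, 11, 11, 10, 17],
})
df["Year"] = df["Date"].dt.year

colors = ["turquoise", "orange", "red"]
fig, ax = plt.subplots()
sc = ax.scatter(df["Price"], df["Level"], s=10, c=df["Year"],
                marker="o", cmap=ListedColormap(colors))

# legend_elements() returns one handle per unique value of c, in sorted order
handles, _ = sc.legend_elements()
year_labels = [str(y) for y in sorted(df["Year"].unique())]
ax.legend(handles, year_labels, loc="upper left", title="Year")
```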
I would also point you to this course to learn more about plotting in Python:
https://exlskills.com/learn-en/courses/python-data-modeling-intro-for-machine-learning-python_modeling_for_machine_learning/content