Plotting multiple dates by year in a scatterplot (Python)

I have data with a Date column that I would like to split by year automatically in a scatterplot. It looks like this:
Date Level Price
2008-01-01 56 11
2008-01-03 10 12
2008-01-05 52 13
2008-02-01 66 14
2008-05-01 20 10
..
2009-01-01 12 11
2009-02-01 70 11
2009-02-05 56 12
..
2018-01-01 56 10
2018-01-11 10 17
..
The only way I know to tackle this is to select the rows manually with iloc, eyeballing the dates in the dataframe, like this:
fig = plt.figure(figsize=(15,10))
ax1 = fig.add_subplot(111)
ax1.scatter(df.iloc[____, 1], df.iloc[_____, 2], s=10, c='r', marker="o", label='2008')
ax1.scatter(df.iloc[____, 1],df.iloc[____, 2], s=10, c='mediumblue', marker="o", label='2009')
.
.
. (for each year I want)
plt.ylabel('Level', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(loc='upper left', prop={'size': 12});
plt.show()
But this takes a lot of time.
I'd like to loop automatically through each Date's year and plot Level (Y) against Price (X), colored by year, with a legend label for each year.
What would be a good strategy to do this?
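One possible strategy, sketched here under assumptions: group the rows by the year of the Date column and call scatter once per group, so each year gets its own color from matplotlib's default color cycle and its own legend entry. The small dataframe below is made up to mirror the shape of the data in the question, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample in the same shape as the question's data
df = pd.DataFrame({
    "Date": pd.to_datetime(["2008-01-01", "2008-01-03", "2009-01-01",
                            "2009-02-01", "2018-01-01", "2018-01-11"]),
    "Level": [56, 10, 12, 70, 56, 10],
    "Price": [11, 12, 11, 11, 10, 17],
})

fig, ax = plt.subplots(figsize=(15, 10))

# One scatter call per year: each call takes the next color in the
# default cycle and contributes one legend entry
for year, group in df.groupby(df["Date"].dt.year):
    ax.scatter(group["Price"], group["Level"], s=10, marker="o", label=str(year))

ax.set_xlabel("Price", fontsize=14)
ax.set_ylabel("Level", fontsize=14)
legend = ax.legend(loc="upper left", prop={"size": 12})

# Collect the legend texts so they can be inspected
legend_labels = [text.get_text() for text in legend.get_texts()]
```

With real data, the Date column would need to be parsed first (for example with pd.to_datetime) so that the .dt accessor is available.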

Related

Matplotlib scatter plot automatically duplicate datetime xticks

In my dataframe I have, for each day, the mean value of cardiac frequency during the day and during the night that I previously calculated. The problem is that when I try to do a scatter plot, the values on the x axis get duplicated.
So this is my df.
date FC_mean_day FC_mean_night
0 2022-12-28 79.43 74.11
1 2022-12-29 74.25 75.00
2 2022-12-30 75.75 74.40
3 2022-12-31 70.91 72.90
4 2023-01-01 68.43 73.00
date datetime64[ns]
FC_mean_day float64
FC_mean_night float64
dtype: object
And this is what I have done to generate the scatter plot
fig,ax = plt.subplots()
plt.scatter(df5.date, df5.FC_mean_night, color='blue', label='Night (22-06)', alpha=1)
plt.scatter(df5.date, df5.FC_mean_day, color='yellow', label='Day (06-22)', alpha=1)
my_format = mdates.DateFormatter('%d/%m/%y')
plt.rcParams["figure.figsize"] = (3,4)
plt.title("Mean heart rate")
plt.xlabel('Date')
plt.ylabel('Value')
plt.xticks(rotation = 90)
plt.legend(loc='center left', bbox_to_anchor=(1,0.5))
ax.yaxis.grid(color='grey', linestyle = 'dashed')
ax.set_axisbelow(True)
ax.xaxis.set_major_formatter(my_format)
plt.show()
And this is what it generates:
scatter with duplicates
I noticed that the problem doesn't happen when the following day has a daytime value but no night value, as in this case:
date FC_mean_day FC_mean_night
0 2022-12-28 79.43 74.11
1 2022-12-29 74.25 75.00
2 2022-12-30 75.75 74.40
3 2022-12-31 70.91 72.90
4 2023-01-01 68.43 73.00
5 2023-01-02 75.00 NaN
scatter without duplicates
What am I missing?
You already use a DateFormatter, so you just need to set a locator with Axis.set_major_locator.
ax.xaxis.set_major_formatter(my_format)
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1)) # <- add this line
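A self-contained sketch of that fix, using the five rows from the question. The duplicates appear because the default locator places several ticks within the same day and the day-only format then prints the same label for each of them; pinning one tick per day avoids that. The Agg backend and the variable tick_days are only there so the result can be checked without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# The five rows shown in the question
df5 = pd.DataFrame({
    "date": pd.date_range("2022-12-28", periods=5, freq="D"),
    "FC_mean_day": [79.43, 74.25, 75.75, 70.91, 68.43],
    "FC_mean_night": [74.11, 75.00, 74.40, 72.90, 73.00],
})

fig, ax = plt.subplots()
ax.scatter(df5["date"], df5["FC_mean_night"], color="blue", label="night")
ax.scatter(df5["date"], df5["FC_mean_day"], color="yellow", label="day")

# One major tick per calendar day, formatted as day/month/year
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m/%y"))

# Render once, then read back the tick positions as formatted dates
fig.canvas.draw()
tick_days = [mdates.num2date(t).strftime("%d/%m/%y") for t in ax.get_xticks()]
```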

Change plotly bar chart x axis from weeks to days

I have a table that I'm currently trying to display on a bar chart. It is annual data, with values from 1 January to 31 December of the same year.
DATE COUNT
0 2019-01-01 42
1 2019-02-01 3
2 2019-03-01 31
3 2019-04-01 13
4 2019-05-01 1
...
When I plot this with 'DATE' as the x-axis, plotly automatically bins the x-axis into weeks, so that I have 52 bars instead of 365.
fig = px.histogram(df, x="DATE", y="COUNT", title="title")
fig.update_layout(bargap=0.30)
fig
I've tried updating the ticks with various formats, but this just changes the x-axis labels, not the number of bars.
I'm not sure how to change the x-axis from weekly to daily.

How to plot evenly spaced values on the x axis while plotting using matplotlib

I have a Series with more than 100 000 rows that I want to plot. I have a problem with the x-axis of my figure: since it is made of many dates, you can't read anything if all of them are shown.
How can I choose to show only 1 out of every x labels on the x-axis?
Here is an example of code that produces a figure with an ugly x-axis:
sr = pd.Series(np.array(range(15)))
sr.index = [ '2018-06-' + str(x).zfill(2) for x in range(1,16)]
Out :
2018-06-01 0
2018-06-02 1
2018-06-03 2
2018-06-04 3
2018-06-05 4
2018-06-06 5
2018-06-07 6
2018-06-08 7
2018-06-09 8
2018-06-10 9
2018-06-11 10
2018-06-12 11
2018-06-13 12
2018-06-14 13
2018-06-15 14
fig = plt.plot(sr)
plt.xlabel('Date')
plt.ylabel('Sales')
Using xticks you can achieve the desired effect:
In your example:
sr = pd.Series(np.array(range(15)))
sr.index = [ '2018-06-' + str(x).zfill(2) for x in range(1,16)]
fig = plt.plot(sr)
plt.xlabel('Date')
plt.xticks(sr.index[::4]) #Show one in every four dates
plt.ylabel('Sales')
Alternatively, if you want to set the number of ticks instead, you can use locator_params:
sr.plot(xticks=sr.reset_index().index)
plt.locator_params(axis='x', nbins=5) #Show five dates
plt.ylabel('Sales')
plt.xlabel('Date')
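Another option, sketched here as an assumption rather than as part of the answer above: if the index is converted to real datetimes, matplotlib's date locators can space the ticks for you. The example uses the same 15 days as the question and asks for one tick every four days. The Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same 15 days as in the question, but with a real DatetimeIndex
sr = pd.Series(np.arange(15),
               index=pd.date_range("2018-06-01", periods=15, freq="D"))

fig, ax = plt.subplots()
# Plot through matplotlib directly so the x-axis uses matplotlib date units
ax.plot(sr.index, sr.values)

# One tick every 4 days, labelled as YYYY-MM-DD
ax.xaxis.set_major_locator(mdates.DayLocator(interval=4))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.set_xlabel("Date")
ax.set_ylabel("Sales")
fig.autofmt_xdate()  # rotate the labels so they don't overlap

# Render once, then read back the tick positions (in days)
fig.canvas.draw()
tick_positions = ax.get_xticks()
```

For a much longer series, mdates.AutoDateLocator is a reasonable alternative, since it picks the tick spacing from the visible date range.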

Creating labels based on year in column Python

I've made a dataframe that has dates and 2 values that looks like:
Date Year Level Price
2008-01-01 2008 56 11
2008-01-03 2008 10 12
2008-01-05 2008 52 13
2008-02-01 2008 66 14
2008-05-01 2008 20 10
..
2009-01-01 2009 12 11
2009-02-01 2009 70 11
2009-02-05 2009 56 12
..
2018-01-01 2018 56 10
2018-01-11 2018 10 17
..
I'm able to color these by year by creating a year column with df['Year'] = df['Date'].dt.year, but I also want a label for each year in the legend.
My code right now for plotting by year looks like:
colors = ['turquoise','orange','red','mediumblue', 'orchid', 'limegreen']
fig = plt.figure(figsize=(15,10))
ax = fig.add_subplot(111)
ax.scatter(df['Price'], df['Level'], s=10, c=df['Year'], marker="o", label=df['Year'], cmap=matplotlib.colors.ListedColormap(colors))
plt.title('Title', fontsize=16)
plt.ylabel('Level', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(loc='upper left', prop={'size': 12});
plt.show()
How can I adjust the labels in the legend to show the year? I have tried just passing the Year column as the label, but that obviously gives me results like this:
When you scatter your points, make sure that every column you access actually exists in your dataframe. In your code, you are trying to access a column called 'Year' which doesn't exist. The problem is in this line:
ax.scatter(df['Price'], df['Level'], s=10, c=df['Year'], marker="o", label=df['Year'], cmap=matplotlib.colors.ListedColormap(colors))
Here the color argument (c) refers to a column that doesn't exist, and the label argument has the same problem. To solve this, you need to create a column that contains the year:
Extract all the dates
Grab just the year from each date
Add this to your dataframe
Below is some code to implement these steps:
# Get all the dates as an array
dates = df.Date.values
# Build a list of the years with a list comprehension
# (the year is the part before the first '-')
years = [int(str(x).split('-')[0]) for x in dates]
# Add this column to your dataframe
df['Year'] = years
I would also point you to this course to learn more about plotting in Python!
https://exlskills.com/learn-en/courses/python-data-modeling-intro-for-machine-learning-python_modeling_for_machine_learning/content
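For the legend itself, here is a minimal sketch of one way to get one entry per year from a single scatter call. It is an assumption on my part, not the method from the answer above: the years are converted to integer codes with pd.factorize so that each year maps to exactly one color in the ListedColormap, and the handles returned by legend_elements() are then relabelled with the years. The sample rows are made up to match the question's columns, and the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import ListedColormap

# Made-up sample in the same shape as the question's data
df = pd.DataFrame({
    "Date": pd.to_datetime(["2008-01-01", "2008-01-03", "2009-01-01",
                            "2009-02-01", "2018-01-01", "2018-01-11"]),
    "Level": [56, 10, 12, 70, 56, 10],
    "Price": [11, 12, 11, 11, 10, 17],
})
df["Year"] = df["Date"].dt.year

# Map each distinct year to a code 0, 1, 2, ... so that every year
# lands on its own color instead of being scaled along the colormap
codes, years = pd.factorize(df["Year"], sort=True)
colors = ["turquoise", "orange", "red", "mediumblue", "orchid", "limegreen"]
cmap = ListedColormap(colors[:len(years)])

fig, ax = plt.subplots(figsize=(15, 10))
points = ax.scatter(df["Price"], df["Level"], s=10, c=codes, cmap=cmap)

# One legend handle per distinct code, relabelled with the actual year
handles, _ = points.legend_elements()
legend = ax.legend(handles, [str(y) for y in years],
                   loc="upper left", prop={"size": 12})
legend_labels = [text.get_text() for text in legend.get_texts()]
```

Passing the raw years as c would scale them linearly across the colormap, so neighbouring years such as 2008 and 2009 can end up with the same color. That is why the codes are used here.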

How to plot different groups of data from a dataframe into a single figure

I have a temperature file with many years of temperature records, in the format below:
2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
Every year has a different number of records at different times, so the pandas DatetimeIndexes are all different.
I want to plot the different years' data in the same figure for comparison. The x-axis is Jan to Dec and the y-axis is temperature. How should I go about doing this?
Try:
ax = df1.plot()
df2.plot(ax=ax)
If you are running a Jupyter/IPython notebook and having problems using:
ax = df1.plot()
df2.plot(ax=ax)
run the commands inside the same cell! For some reason it won't work when they are separated into sequential cells. For me at least.
Chang's answer shows how to plot a different DataFrame on the same axes.
In this case, all of the data is in the same dataframe, so it's better to use groupby and unstack.
Alternatively, pandas.DataFrame.pivot_table can be used.
dfp = df.pivot_table(index='Month', columns='Year', values='value', aggfunc='mean')
When using pandas.read_csv, names= creates column headers when there are none in the file. The 'date' column must be parsed into datetime64[ns] Dtype so the .dt extractor can be used to extract the month and year.
import pandas as pd
# given the data in a file as shown in the op
df = pd.read_csv('temp.csv', names=['date', 'time', 'value'], parse_dates=['date'])
# create additional month and year columns for convenience
df['Year'] = df.date.dt.year
df['Month'] = df.date.dt.month
# group by month and year, and aggregate the mean of the value column
dfg = df.groupby(['Month', 'Year'])['value'].mean().unstack()
# display(dfg)
Year        2005  2006       2007  2012
Month
4            NaN  20.6        NaN  20.7
5            NaN   NaN  15.533333   NaN
8      19.566667   NaN        NaN   NaN
Now it's easy to plot each year as a separate line. The OP only has one observation for each year, so only a marker is displayed.
ax = dfg.plot(figsize=(9, 7), marker='.', xticks=dfg.index)
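As a quick check that the groupby/unstack and pivot_table routes give the same table, here is a small self-contained sketch. It reads the ten rows from the question out of an in-memory string instead of temp.csv (an assumption made so the example runs on its own); the name same is only for illustration.

```python
import io

import numpy as np
import pandas as pd

# The ten records from the question, inlined instead of read from temp.csv
raw = """2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
"""

df = pd.read_csv(io.StringIO(raw), names=["date", "time", "value"],
                 parse_dates=["date"])
df["Year"] = df["date"].dt.year
df["Month"] = df["date"].dt.month

# Route 1: group by month and year, then move the years into columns
dfg = df.groupby(["Month", "Year"])["value"].mean().unstack()

# Route 2: the same aggregation expressed as a pivot table
dfp = df.pivot_table(index="Month", columns="Year", values="value",
                     aggfunc="mean")

# Both tables have months as rows and years as columns; compare the values,
# treating NaN in the same position as equal
same = dfg.shape == dfp.shape and np.allclose(
    dfg.to_numpy(), dfp.to_numpy(), equal_nan=True)
```

Either table can then be passed to .plot() as in the line above, giving one line (or marker) per year.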
To do this for multiple dataframes, you can do a for loop over them:
fig = plt.figure(num=None, figsize=(10, 8))
ax = dict_of_dfs['FOO'].column.plot()
for BAR in dict_of_dfs.keys():
    if BAR == 'FOO':
        pass
    else:
        dict_of_dfs[BAR].column.plot(ax=ax)
This can also be implemented without the if condition:
fig, ax = plt.subplots()
for BAR in dict_of_dfs.keys():
    dict_of_dfs[BAR].plot(ax=ax)
You can make use of the hue parameter in seaborn. For example:
import seaborn as sns
df = sns.load_dataset('flights')
year month passengers
0 1949 Jan 112
1 1949 Feb 118
2 1949 Mar 132
3 1949 Apr 129
4 1949 May 121
.. ... ... ...
139 1960 Aug 606
140 1960 Sep 508
141 1960 Oct 461
142 1960 Nov 390
143 1960 Dec 432
sns.lineplot(x='month', y='passengers', hue='year', data=df)
