I plot data from this dataframe.
Date 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
Date
01-01 12.896 13.353 12.959 13.011 13.073 12.721 12.643 12.484 12.876 13.102
01-02 12.915 13.421 12.961 13.103 13.125 12.806 12.644 12.600 12.956 13.075
01-03 12.926 13.379 13.012 13.116 13.112 12.790 12.713 12.634 12.959 13.176
01-04 13.051 13.414 13.045 13.219 13.051 12.829 12.954 12.724 13.047 13.187
01-05 13.176 13.417 13.065 13.148 13.115 12.874 12.956 12.834 13.098 13.123
The plot generates a line plot for each year across every day of the year.
ice_data.plot(figsize=(20,12), title='Arctic Sea Ice Extent', lw=4, fontsize=16, ax=ax, grid=True)
However, this plot generates the xtick labels in the format from the dataframe, which is 01-01 (January 1). I want to change the xtick label to be 01 Jan, but cannot seem to figure out how to manipulate the dataframe using strftime() to accomplish this. Here is a picture for reference.
Convert the index values to datetimes with to_datetime, prefixing a leap year (2000 here) so that 02-29 parses correctly, then convert to the custom format with DatetimeIndex.strftime:
idx = pd.to_datetime('2000' + ice_data.index, format='%Y%m-%d').strftime('%d %b')
ice_data = ice_data.set_index(idx)
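A minimal sketch of why the prefixed year matters, using a small hypothetical index (`labels`) in place of the real `ice_data.index`. The month abbreviations assume an English locale.

```python
import pandas as pd

# Hypothetical stand-in for ice_data.index: month-day strings, including the leap day.
labels = pd.Index(["01-01", "02-29", "12-31"])

# Prefixing a leap year (2000) makes "02-29" a valid calendar date.
parsed = pd.to_datetime("2000" + labels, format="%Y%m-%d")
pretty = parsed.strftime("%d %b")

# With a non-leap year the same labels cannot all be parsed.
try:
    pd.to_datetime("2001" + labels, format="%Y%m-%d")
    leap_day_ok = True
except ValueError:
    leap_day_ok = False
```

Here `pretty` holds the day-first labels that can be passed to `set_index`, and `leap_day_ok` ends up `False` because 29 February does not exist in 2001.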
The dataframe I am trying to graph is below. I want to plot each fieldname as the legend item with x=year and y=value
The name of the dataframe is my_gross
fieldName thisType value year
0 diluted_shares_outstanding unit 9.637900e+07 2015
1 diluted_shares_outstanding unit 8.777500e+07 2016
2 diluted_shares_outstanding unit 8.556200e+07 2017
3 diluted_shares_outstanding unit 8.353000e+07 2018
4 diluted_shares_outstanding unit 7.771000e+07 2019
5 diluted_shares_outstanding unit 7.292900e+07 2020
6 eps gross 7.360470e+08 2015
7 eps gross 7.285207e+08 2016
8 eps gross 8.944702e+08 2017
9 eps gross 1.298734e+09 2018
10 eps gross 1.451550e+09 2019
11 eps gross 1.259110e+09 2020
18 sales_revenue gross 5.817000e+09 2015
19 sales_revenue gross 5.762000e+09 2016
20 sales_revenue gross 6.641000e+09 2017
21 sales_revenue gross 8.047000e+09 2018
22 sales_revenue gross 9.351000e+09 2019
23 sales_revenue gross 8.530000e+09 2020
The following code is what I ran to create a graph, but I get undesired results.
for item in my_gross['fieldName']:
    plt.plot(my_gross['year'], my_gross['value'], label=item)
plt.legend()
plt.xticks(rotation=45)
plt.show()
Results
undesired graph
The result I am trying to get is similar to this graph
desired graph
Do I need to create a dictionary for unique values and do some sort of count and then loop through that dictionary instead of the df itself?
The standard pandas and matplotlib approach is to pivot to wide-form and plot:
import pandas as pd
from matplotlib import pyplot as plt
plot_df = df.pivot(index='year',
columns='fieldName',
values='value')
plot_df.plot()
plt.tight_layout()
plt.show()
plot_df:
fieldName diluted_shares_outstanding eps sales_revenue
year
2015 96379000.0 7.360470e+08 5.817000e+09
2016 87775000.0 7.285207e+08 5.762000e+09
2017 85562000.0 8.944702e+08 6.641000e+09
2018 83530000.0 1.298734e+09 8.047000e+09
2019 77710000.0 1.451550e+09 9.351000e+09
2020 72929000.0 1.259110e+09 8.530000e+09
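For a version that runs on its own, here is a sketch of the same reshape on a trimmed copy of the question's data (two fields, two years). The frame name `df` follows the answer; the trimming is only for brevity.

```python
import pandas as pd

# Trimmed copy of the long-form data from the question.
df = pd.DataFrame({
    "fieldName": ["eps", "eps", "sales_revenue", "sales_revenue"],
    "value": [7.360470e+08, 7.285207e+08, 5.817000e+09, 5.762000e+09],
    "year": [2015, 2016, 2015, 2016],
})

# One row per year, one column per fieldName.
plot_df = df.pivot(index="year", columns="fieldName", values="value")
```

Calling `plot_df.plot()` then draws one line per column. Note that `pivot` raises an error if the same (year, fieldName) pair appears more than once, so this assumes each pair is unique.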
seaborn.lineplot has built-in functionality with hue without needing to reshape:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
sns.lineplot(data=df, x='year', y='value', hue='fieldName')
plt.tight_layout()
plt.show()
There are several ways to do it, depending on the libraries available.
Using just pandas (which uses matplotlib as its plotting backend):
Loop over the unique values in your 'fieldName' column, filter the DataFrame to only the rows with that value, set the index to year (this will be your x-axis), select the column you intend to plot (the 'value' Series), then plot it.
for fieldname in df['fieldName'].unique():
    df[df['fieldName'] == fieldname].set_index('year')['value'].plot(label=fieldname)
plt.legend()
EDIT:
Seems like a relatively simple groupby works (no loops needed):
df.set_index('year').groupby('fieldName')['value'].plot()
plt.legend()
I have a dataframe here that contains a value daily since 2000 (ignore the index).
Extent Date
6453 13.479 2001-01-01
6454 13.385 2001-01-02
6455 13.418 2001-01-03
6456 13.510 2001-01-04
6457 13.566 2001-01-05
I would like to make a plot where the x-axis is the day of the year, and the y-axis is the value. The plot would contain 20 different lines, with each line corresponding to the year of the data. Is there an intuitive way to do this using pandas, or is it easier to do with matplotlib?
Here is a quick paint sketch to illustrate.
One quick way is to plot x-axis as strings:
df['Date'] = pd.to_datetime(df['Date'])
(df.set_index([df.Date.dt.strftime('%m-%d'),
df.Date.dt.year])
.Extent.unstack()
.plot()
)
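To see what the chain produces before `.plot()` is called, here is a small sketch. The two 2001 rows come from the question; the 2002 values are made up purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    # The 2002 extents are invented for illustration only.
    "Extent": [13.479, 13.385, 12.9, 12.8],
    "Date": ["2001-01-01", "2001-01-02", "2002-01-01", "2002-01-02"],
})
df["Date"] = pd.to_datetime(df["Date"])

# Index by (month-day string, year), then move the year level into the columns.
wide = (
    df.set_index([df["Date"].dt.strftime("%m-%d"), df["Date"].dt.year])
      .Extent.unstack()
)
```

`wide` has one row per day of the year and one column per year, which is why `.plot()` draws one line per year.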
I've made a dataframe that has dates and 2 values that looks like:
Date Year Level Price
2008-01-01 2008 56 11
2008-01-03 2008 10 12
2008-01-05 2008 52 13
2008-02-01 2008 66 14
2008-05-01 2008 20 10
..
2009-01-01 2009 12 11
2009-02-01 2009 70 11
2009-02-05 2009 56 12
..
2018-01-01 2018 56 10
2018-01-11 2018 10 17
..
I can color the points by year by creating a year column with df['Year'] = df['Date'].dt.year, but I also want the legend to show a label for each year.
My code right now for plotting by year looks like:
colors = ['turquoise','orange','red','mediumblue', 'orchid', 'limegreen']
fig = plt.figure(figsize=(15,10))
ax = fig.add_subplot(111)
ax.scatter(df['Price'], df['Level'], s=10, c=df['Year'], marker="o", label=df['Year'], cmap=matplotlib.colors.ListedColormap(colors))
plt.title('Title', fontsize=16)
plt.ylabel('Level', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(loc='upper left', prop={'size': 12});
plt.show()
How can I adjust the labels in the legend to show the year? The way I've done it is just using the Year column but that obviously just gives me results like this:
When you scatter your points, make sure that every column you access actually exists in your dataframe. Your code reads a column called 'Year', so that column has to be created before this call:
ax.scatter(df['Price'], df['Level'], s=10, c=df['Year'], marker="o", label=df['Year'], cmap=matplotlib.colors.ListedColormap(colors))
In this line, both the color argument (c) and the label argument read df['Year']. If that column is missing, create a column that contains the year:
Extract all the dates
Grab just the year from each date
Add this to your dataframe
Below is some code to implement these steps:
# Get an array of all the dates
dates = df.Date.values
# Take the year (the part before the first '-') from each date
years = [int(str(x).split('-')[0]) for x in dates]
# Add this column to your dataframe
df['Year'] = years
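For the legend itself, one common approach (sketched here as an alternative, not as the answerer's method) is to call `scatter` once per year so that each year gets its own label. The rows are a few taken from the question; the `Agg` backend is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2008-01-01", "2008-01-03", "2009-01-01", "2018-01-01"]),
    "Level": [56, 10, 12, 56],
    "Price": [11, 12, 11, 10],
})
df["Year"] = df["Date"].dt.year

fig, ax = plt.subplots()
# One scatter call per year, so each year gets its own color and legend entry.
for year, group in df.groupby("Year"):
    ax.scatter(group["Price"], group["Level"], s=10, label=str(year))
ax.set_xlabel("Price")
ax.set_ylabel("Level")
legend = ax.legend(loc="upper left", title="Year")

# Collect the legend text to confirm there is one entry per year.
legend_labels = [text.get_text() for text in legend.get_texts()]
```

Passing a whole column as `label` gives a single legend entry for the one scatter collection, which is why the per-year loop is used here.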
As well I would direct you to this course to learn more about plotting in python!
https://exlskills.com/learn-en/courses/python-data-modeling-intro-for-machine-learning-python_modeling_for_machine_learning/content
I have a DataFrame with month, year and Value columns, and I want to make a time series plot.
Sample:
month year Value
12 2016 0.006437804129357764
1 2017 0.013850880792606646
2 2017 0.013330349031207292
3 2017 0.07663058273768052
4 2017 0.7822831457266424
5 2017 0.8089573099244689
6 2017 1.1634845000200715
I'm trying to plot the Value data against year and month, with year and month on the x-axis and Value on the y-axis.
One way is this:
import pandas as pd
import matplotlib.pyplot as plt
# Build a "month-year" string such as "12-2016"
df['date'] = df['month'].map(str) + '-' + df['year'].map(str)
# Parse it into real datetimes so matplotlib draws a proper date axis
df['date'] = pd.to_datetime(df['date'], format='%m-%Y')
fig, ax = plt.subplots()
ax.plot(df['date'], df['Value'], 'o-')
plt.show()
You need to set a DatetimeIndex for pandas to plot the axis properly. A one-line modification of your dataframe would do (assuming you no longer need year and month as columns, and that using the first day of each month is acceptable):
df.set_index(pd.to_datetime({
'day': 1,
'month': df.pop('month'),
'year': df.pop('year')
}), inplace=True)
df.Value.plot()
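A self-contained sketch of the same idea on the first three sample rows. It is a variant rather than the exact call above: it keeps the month and year columns and builds the day column with `assign`.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [12, 1, 2],
    "year": [2016, 2017, 2017],
    "Value": [0.006437804129357764, 0.013850880792606646, 0.013330349031207292],
})

# to_datetime assembles dates from columns named year, month and day.
dates = pd.to_datetime(df[["year", "month"]].assign(day=1))

# Use the assembled dates as the index of the series to plot.
ts = df.set_index(dates)["Value"]
```

`ts.plot()` then gets a date-aware x-axis because `ts.index` is a `DatetimeIndex`.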
I have a dataframe df2 with dates in string form and numbers
date | value
2018-02-02 130
2018-02-05 360
2018-02-06 98
2018-02-07 150
When I plot the dates in a plot along the x axis, the dates returned are incorrect. They seem to translate like so:
2018-02-06 = Jan 1, 1970
2018-02-05 = Jan 13.5, 1970
source=ColumnDataSource(data=df2)
p= figure(plot_width=700, plot_height=400, x_axis_type="datetime")
p.xaxis.formatter= DatetimeTickFormatter(days="%B/%d/%Y")
p.triangle("DATE","VALUE",color='black',source=source)
The glyphs don't fall exactly on the gridlines either. What is happening?
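p.triangle(df2["DATE"].dt.date, df2["VALUE"], color='Black') plots the dates correctly. Note that the .dt accessor only works when the DATE column holds datetimes rather than strings.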
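Since the question describes the dates as strings, a conversion step is presumably needed before `.dt` can be used. A minimal pandas-only sketch, assuming the upper-case column names `DATE` and `VALUE` used in the plotting code:

```python
import pandas as pd

df2 = pd.DataFrame({
    "DATE": ["2018-02-02", "2018-02-05", "2018-02-06", "2018-02-07"],
    "VALUE": [130, 360, 98, 150],
})

# Parse the strings into real datetimes; after this the .dt accessor is available.
df2["DATE"] = pd.to_datetime(df2["DATE"])

# Plain datetime.date objects, as passed to the glyph call.
plain_dates = df2["DATE"].dt.date
```

Either the converted column or `plain_dates` can then be handed to the glyph method.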
I have a nasty habit of finding the right answer shortly after posting the question on Stack Overflow. I think I may do that more often!