I have a dataframe with dates and two value columns that looks like this:
Date Year Level Price
2008-01-01 2008 56 11
2008-01-03 2008 10 12
2008-01-05 2008 52 13
2008-02-01 2008 66 14
2008-05-01 2008 20 10
..
2009-01-01 2009 12 11
2009-02-01 2009 70 11
2009-02-05 2009 56 12
..
2018-01-01 2018 56 10
2018-01-11 2018 10 17
..
I can color the points by year by creating a year column with df['Year'] = df['Date'].dt.year, but I also want the legend to show a label for each year.
My code right now for plotting by year looks like:
colors = ['turquoise','orange','red','mediumblue', 'orchid', 'limegreen']
fig = plt.figure(figsize=(15,10))
ax = fig.add_subplot(111)
ax.scatter(df['Price'], df['Level'], s=10, c=df['Year'], marker="o", label=df['Year'], cmap=matplotlib.colors.ListedColormap(colors))
plt.title('Title', fontsize=16)
plt.ylabel('Level', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(loc='upper left', prop={'size': 12});
plt.show()
How can I adjust the labels in the legend to show the year? The way I've done it is just using the Year column but that obviously just gives me results like this:
When you scatter your points, make sure that every column you access actually exists in your dataframe. The line below reads a column called 'Year' for both the color and the label, so it fails if that column has not been created:
ax.scatter(df['Price'], df['Level'], s=10, c=df['Year'], marker="o", label=df['Year'], cmap=matplotlib.colors.ListedColormap(colors))
Both the color argument (c) and the label argument look up df['Year']. If the column is missing, create a column that contains the year:
1. Extract all the dates
2. Grab just the year from each date
3. Add this to your dataframe
Below is some code to implement these steps:
# Collect the dates as strings (works whether the column holds strings or datetimes)
dates = df['Date'].astype(str)
# Take the year, i.e. the part before the first '-', from each date
years = [int(d.split('-')[0]) for d in dates]
# Add the years to your dataframe as a new column
df['Year'] = years
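As a shorter alternative, here is a minimal sketch that assumes the Date column can be parsed by pd.to_datetime. It uses the same .dt.year accessor mentioned in the question; the small frame below just reuses a few rows from the question for illustration.

```python
import pandas as pd

# A few sample rows taken from the question
df = pd.DataFrame({
    "Date": ["2008-01-01", "2008-05-01", "2009-02-05", "2018-01-11"],
    "Level": [56, 20, 56, 10],
    "Price": [11, 10, 12, 17],
})

# Parse the strings into datetimes, then pull out the year as an integer
df["Date"] = pd.to_datetime(df["Date"])
df["Year"] = df["Date"].dt.year

print(df["Year"].tolist())
```

This avoids the string splitting entirely and leaves Year as an integer column, which is what the c argument of scatter expects.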
I would also point you to this course to learn more about plotting in Python:
https://exlskills.com/learn-en/courses/python-data-modeling-intro-for-machine-learning-python_modeling_for_machine_learning/content
Related
I have a table that I'm trying to display as a bar chart. It holds one year of data, running from 1 January to 31 December of the same year:
DATE COUNT
0 2019-01-01 42
1 2019-02-01 3
2 2019-03-01 31
3 2019-04-01 13
4 2019-05-01 1
...
When I plot this with the date on the x-axis, plotly automatically bins the x-axis into weeks, so I get 52 bars instead of 365.
fig = px.histogram(df, x="DATE", y="COUNT", title="title")
fig.update_layout(bargap=0.30)
fig
I've tried updating the ticks with various formats, but that only changes the x-axis labels, not the number of bars.
I'm not sure how to change the x-axis from weekly to daily bars.
I plot data from this dataframe.
Date 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
Date
01-01 12.896 13.353 12.959 13.011 13.073 12.721 12.643 12.484 12.876 13.102
01-02 12.915 13.421 12.961 13.103 13.125 12.806 12.644 12.600 12.956 13.075
01-03 12.926 13.379 13.012 13.116 13.112 12.790 12.713 12.634 12.959 13.176
01-04 13.051 13.414 13.045 13.219 13.051 12.829 12.954 12.724 13.047 13.187
01-05 13.176 13.417 13.065 13.148 13.115 12.874 12.956 12.834 13.098 13.123
The plot generates a line plot for each year across every day of the year.
ice_data.plot(figsize=(20,12), title='Arctic Sea Ice Extent', lw=4, fontsize=16, ax=ax, grid=True)
However, this plot generates the xtick labels in the format from the dataframe, which is 01-01 (January 1). I want to change the xtick label to be 01 Jan, but cannot seem to figure out how to manipulate the dataframe using strftime() to accomplish this. Here is a picture for reference.
Convert the index values to datetimes with to_datetime, prepending a leap year (2000) so that 02-29 parses correctly, then convert to the custom format with DatetimeIndex.strftime:
idx = pd.to_datetime('2000' + ice_data.index, format='%Y%m-%d').strftime('%d %b')
ice_data = ice_data.set_index(idx)
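To try this out in isolation, here is a small sketch. The two-row frame is a hypothetical stand-in for ice_data (only two of the year columns are kept), and it assumes the index holds 'MM-DD' strings as in the question. The month abbreviations produced by %b depend on the active locale.

```python
import pandas as pd

# Hypothetical two-row stand-in for ice_data, indexed by 'MM-DD' strings
ice_data = pd.DataFrame(
    {"2011": [12.896, 12.915], "2012": [13.353, 13.421]},
    index=pd.Index(["01-01", "01-02"], name="Date"),
)

# Prepend a leap year so that '02-29' would also parse, then reformat
idx = pd.to_datetime("2000" + ice_data.index, format="%Y%m-%d").strftime("%d %b")
ice_data = ice_data.set_index(idx)

print(ice_data.index.tolist())
```

Plotting ice_data after this step should pick up the new index values as the xtick labels.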
I have a pandas DataFrame that looks like this:
Year EventCode CityName EventCount
2015 10 Jakarta 12
2015 10 Yogjakarta 15
2015 10 Padang 27
...
2015 13 Jayapura 34
2015 14 Jakarta 24
2015 14 Yogjaarta 15
...
2019 14 Jayapura 12
I want to visualize, as pie charts, the top 5 cities with the largest EventCount, grouped by EventCode for every year.
How can I do that?
This could be achieved by restructuring your data with pivot_table, filtering on top cities using sort_values and the DataFrame.plot.pie method with subplots parameter:
# Pivot your data
df_piv = df.pivot_table(index='EventCode', columns='CityName',
values='EventCount', aggfunc='sum', fill_value=0)
# Get top 5 cities by total EventCount
plot_cities = df_piv.sum().sort_values(ascending=False).head(5).index
# Plot
df_piv.reindex(columns=plot_cities).plot.pie(subplots=True,
figsize=(10, 7),
layout=(-1, 3))
[out]
Pandas can plot each column into its own subplot automatically, so set CityName as the index, make EventCode the columns, and plot.
(df.sort_values('EventCount', ascending=False) # sort descending by `EventCount`
.groupby('EventCode', as_index=False)
.head(5) # get 5 most count within `EventCode`
.pivot(index='CityName', # pivot for plot.pie
columns='EventCode',
values='EventCount'
)
.plot.pie(subplots=True, # plot with some options
figsize=(10,6),
layout=(2,3))
)
Output:
I have data with dates that I would like to split by year automatically in a scatter plot. It looks like this:
Date Level Price
2008-01-01 56 11
2008-01-03 10 12
2008-01-05 52 13
2008-02-01 66 14
2008-05-01 20 10
..
2009-01-01 12 11
2009-02-01 70 11
2009-02-05 56 12
..
2018-01-01 56 10
2018-01-11 10 17
..
The only way I know to tackle this is to select rows manually with iloc, eyeballing the dates in the dataframe, like this:
fig = plt.figure(figsize=(15,10))
ax1 = fig.add_subplot(111)
ax1.scatter(df.iloc[____, 1], df.iloc[_____, 2], s=10, c='r', marker="o", label='2008')
ax1.scatter(df.iloc[____, 1],df.iloc[____, 2], s=10, c='mediumblue', marker="o", label='2009')
.
.
. (for each year I want)
plt.ylabel('Level', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(loc='upper left', prop={'size': 12});
plt.show()
But this takes a lot of time.
I'd like to loop through the years in Date automatically, plot Level (Y) against Price (X) with one color per year, and add a legend label for each year.
What would be a good strategy to do this?
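One possible strategy, shown here as a minimal sketch: group the rows by the year of the Date column and issue one scatter call per group, so that each year gets its own color from matplotlib's default cycle and its own legend label. The sample frame reuses a few rows from the question, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A few sample rows taken from the question
df = pd.DataFrame({
    "Date": pd.to_datetime(["2008-01-01", "2008-01-03", "2009-01-01",
                            "2009-02-01", "2018-01-01", "2018-01-11"]),
    "Level": [56, 10, 12, 70, 56, 10],
    "Price": [11, 12, 11, 11, 10, 17],
})

fig, ax = plt.subplots(figsize=(15, 10))

# One scatter call per year: each call gets its own color and legend label
for year, group in df.groupby(df["Date"].dt.year):
    ax.scatter(group["Price"], group["Level"], s=10, marker="o", label=str(year))

ax.set_xlabel("Price", fontsize=14)
ax.set_ylabel("Level", fontsize=14)
legend = ax.legend(loc="upper left", prop={"size": 12})

# Collect the legend labels to confirm there is one entry per year
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```

To control the colors yourself, pass color=... inside the loop, for example by zipping the groups with a list of color names.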
I have a temperature file with many years of temperature records, in the format below:
2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
Each year has a different number of records, taken at different times, so the pandas DatetimeIndex differs from year to year.
I want to plot the data for the different years in the same figure for comparison. The X-axis is Jan to Dec, the Y-axis is temperature. How should I go about doing this?
Try:
ax = df1.plot()
df2.plot(ax=ax)
If you are running a Jupyter/IPython notebook and having problems using:
ax = df1.plot()
df2.plot(ax=ax)
Run both commands inside the same cell. For some reason it won't work when they are split across sequential cells, at least for me.
Chang's answer shows how to plot a different DataFrame on the same axes.
In this case, all of the data is in the same dataframe, so it's better to use groupby and unstack.
Alternatively, pandas.DataFrame.pivot_table can be used.
dfp = df.pivot_table(index='Month', columns='Year', values='value', aggfunc='mean')
When using pandas.read_csv, names= creates column headers when there are none in the file. The 'date' column must be parsed into datetime64[ns] Dtype so the .dt extractor can be used to extract the month and year.
import pandas as pd
# given the data in a file as shown in the op
df = pd.read_csv('temp.csv', names=['date', 'time', 'value'], parse_dates=['date'])
# create additional month and year columns for convenience
df['Year'] = df.date.dt.year
df['Month'] = df.date.dt.month
# group by month and year, then aggregate the mean of the value column
dfg = df.groupby(['Month', 'Year'])['value'].mean().unstack()
# display(dfg)
Year 2005 2006 2007 2012
Month
4 NaN 20.6 NaN 20.7
5 NaN NaN 15.533333 NaN
8 19.566667 NaN NaN NaN
Now it's easy to plot each year as a separate line. The OP's sample only has data for a single month in each year, so only a marker is displayed.
ax = dfg.plot(figsize=(9, 7), marker='.', xticks=dfg.index)
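As a self-contained check of the two approaches, the sketch below reads the ten rows from the question out of a string (instead of the assumed temp.csv file) and builds the month-by-year table both ways, with groupby plus unstack and with pivot_table.

```python
import io
import pandas as pd

# The rows from the question, read from a string instead of a file
raw = """2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
"""
df = pd.read_csv(io.StringIO(raw), names=["date", "time", "value"],
                 parse_dates=["date"])
df["Year"] = df["date"].dt.year
df["Month"] = df["date"].dt.month

# Route 1: group by month and year, then move Year into the columns
dfg = df.groupby(["Month", "Year"])["value"].mean().unstack()

# Route 2: let pivot_table do the grouping and reshaping in one call
dfp = df.pivot_table(index="Month", columns="Year", values="value",
                     aggfunc="mean")

print(dfp.round(2))
```

Either table can then be passed to .plot() to draw one line (or marker) per year.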
To do this for multiple dataframes, you can do a for loop over them:
fig = plt.figure(num=None, figsize=(10, 8))
ax = dict_of_dfs['FOO'].column.plot()
for BAR in dict_of_dfs.keys():
    if BAR == 'FOO':
        pass
    else:
        dict_of_dfs[BAR].column.plot(ax=ax)
This can also be implemented without the if condition:
fig, ax = plt.subplots()
for BAR in dict_of_dfs.keys():
    dict_of_dfs[BAR].plot(ax=ax)
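A runnable version of that loop, as a sketch: dict_of_dfs here is a hypothetical dictionary with two tiny single-column frames, and the Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for dict_of_dfs: two small single-column frames
dict_of_dfs = {
    "FOO": pd.DataFrame({"FOO": [1.0, 2.0, 3.0]}),
    "BAR": pd.DataFrame({"BAR": [3.0, 1.0, 2.0]}),
}

# Create the Axes once, then let every frame draw onto it
fig, ax = plt.subplots()
for name, frame in dict_of_dfs.items():
    frame.plot(ax=ax)

ax.legend()
line_count = len(ax.get_lines())
print(line_count)
```

Creating the Axes up front with plt.subplots() is what removes the need to special-case the first dataframe.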
You can make use of the hue parameter in seaborn. For example:
import seaborn as sns
df = sns.load_dataset('flights')
year month passengers
0 1949 Jan 112
1 1949 Feb 118
2 1949 Mar 132
3 1949 Apr 129
4 1949 May 121
.. ... ... ...
139 1960 Aug 606
140 1960 Sep 508
141 1960 Oct 461
142 1960 Nov 390
143 1960 Dec 432
sns.lineplot(x='month', y='passengers', hue='year', data=df)