How can I change the x- and y-axis labels in plotly? In matplotlib I can simply use plt.xlabel, but I can't find the equivalent in plotly.
I am running this code on a dataframe:
Date = df[df.Country=="India"].Date
New_cases = df[df.Country=="India"]['7day_rolling_avg']
px.line(df,x=Date, y=New_cases, title="India Daily New Covid Cases")
I get this output:
In this plot the x and y axes are labeled "x" and "y". How can I change the axis names to "Date" and "Cases"?
This is a simple case of setting the axis titles. Call update_layout on the figure returned by px.line (named fig here):
fig.update_layout(
    xaxis_title="Date", yaxis_title="7 day avg"
)
Full code as an MWE:
import pandas as pd
import plotly.express as px
import io, requests
df = pd.read_csv(
io.StringIO(
requests.get(
"https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv"
).text
)
)
df["Date"] = pd.to_datetime(df["date"])
df["Country"] = df["location"]
df["7day_rolling_avg"] = df["daily_people_vaccinated_per_hundred"]
Date = df[df.Country == "India"].Date
New_cases = df[df.Country == "India"]["7day_rolling_avg"]
px.line(df, x=Date, y=New_cases, title="India Daily New Covid Cases").update_layout(
xaxis_title="Date", yaxis_title="7 day avg"
)
I would like to improve the plot of my bitcoin dataset: the dates are not sorted correctly, and I want to show only the month and year on the axis. How can I do it?
data = Bitcoin_Historical['Price']
Date1 = Bitcoin_Historical['Date']
train1 = Bitcoin_Historical[['Date','Price']]
#Setting the Date as Index
train2 = train1.set_index('Date')
train2.sort_index(inplace=True)
cols = ['Price']
train2 = train2[cols].apply(lambda x: pd.to_numeric(x.astype(str)
.str.replace(',',''), errors='coerce'))
print (type(train2))
print (train2.head())
plt.figure(figsize=(15, 5))
plt.plot(train2)
plt.xlabel('Date', fontsize=12)
plt.xlim(0,20)
plt.ylabel('Price', fontsize=12)
plt.title("Closing price distribution of bitcoin", fontsize=15)
plt.gcf().autofmt_xdate()
plt.show()
The result is shown in the picture below:
The dates are not in order and every date is shown. I would like to order the data chronologically and show only the month name and year. How can that be done?
Example of Data:
Thank you
I've made the following edits to your code:
converted the Date column to datetime type
cleaned up the Price column and converted it to float
removed the line plt.xlim(0,20), which was causing the output to display 1970
plotted through the DataFrame's plot method, so that the x-axis can be formatted with monthly tick marks
Please try the code below:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
Bitcoin_Historical = pd.read_csv('data.csv')
# Work on a copy so the column assignments below do not trigger SettingWithCopyWarning
train1 = Bitcoin_Historical[['Date','Price']].copy()
# Dates that cannot be parsed become NaT
train1['Date'] = pd.to_datetime(train1['Date'], errors='coerce')
# Strip thousands separators and spaces before converting to float
train1['Price'] = train1['Price'].str.replace(',','').str.replace(' ','').astype(float)
train2 = train1.set_index('Date') #Setting the Date as Index
train2.sort_index(inplace=True)
print (type(train2))
print (train2.head())
ax = train2.plot(figsize=(15, 5))
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%b'))
plt.xlabel('Date', fontsize=12)
plt.ylabel('Price', fontsize=12)
plt.title("Closing price distribution of bitcoin", fontsize=15)
plt.show()
Try to cast your "Date" column into datetime, check if it does the trick:
train1.Date = pd.to_datetime(train1.Date)
train2 = train1.set_index('Date')
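To see why the cast matters, here is a minimal sketch with made-up rows. The date format "Mon DD, YYYY" is an assumption, since the original data is not shown. Sorting the text column orders the rows alphabetically, while sorting after pd.to_datetime orders them chronologically.

```python
import pandas as pd

# Made-up rows; the "Mon DD, YYYY" text format is an assumption
prices = pd.DataFrame({
    "Date": ["Feb 01, 2021", "Jan 15, 2021", "Dec 31, 2020"],
    "Price": [33500.0, 36800.0, 28900.0],
})

# Sorting strings compares characters, so "Feb" sorts before "Jan"
as_text = prices.sort_values("Date")["Date"].tolist()

# After the cast, sorting follows the calendar
prices["Date"] = pd.to_datetime(prices["Date"], format="%b %d, %Y")
as_dates = prices.sort_values("Date")["Date"].dt.strftime("%Y-%m-%d").tolist()

print(as_text)
print(as_dates)
```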
I have two data frames: one with weekly data (df) and a second one (df2) with the daily values of the stock/crypto that I plot. On df, I created a pandas column 'pivot' ((high+low+close)/3) from the weekly data, giving one pivot value per week.
Now I want to plot these weekly values as lines on top of the daily data in df2. For each line, x1 would be the start of the week and x2 the end of the week, with both y values equal to that week's pivot value from the weekly df.
Here is what I would want it to look like:
My Approach & Problem:
First of all, I am a beginner in Python, this is my second month of learning. My apologies if this was asked before.
I know the pivot values can be calculated using a single data frame & pandas group-by but I want to take the issue after this is done, so both ways should be fine if you are approaching this issue. What I would like to have is those final lines with OHLC candlesticks. I would like to plot these results using Plotly OHLC and go Shapes. What I am stuck with is iterating through the pivot weekly data frame and adding the lines as traces on top of the OHLC data daily data.
Here's my code so far:
import yfinance as yf
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta
df = yf.download( tickers = 'BTC-USD',
start = '2021-08-30',
end = datetime.today().strftime('%Y-%m-%d'),
interval = '1wk',
group_by = 'ticker',
auto_adjust = True).reset_index()
#daily df for plot
df2 = yf.download( tickers = 'BTC-USD',
start = '2021-08-30',
end = datetime.today().strftime('%Y-%m-%d'),
interval = '1d',
group_by = 'ticker',
auto_adjust = True).reset_index()
#small cap everything
df = df.rename(columns={'Date':'date',
'Open': 'open',
'High': 'high',
'Low' : 'low',
'Close' : 'close'})
df['pivot'] = (df['high']+ df['low'] + df['close'])/3
result = df.copy()
fig = go.Figure(data = [go.Candlestick(x= df['date'],
open = df['open'],
high = df['high'],
low = df['low'],
close = df['close'],
name = 'Price Candle')])
This covers plotting the OHLC candlesticks. The iteration that adds the pivot lines is what is troubling me. The lines could be drawn on either a line chart or an OHLC chart:
fig = px.line(df, x='time', y='close')
result = df.copy()
for i, pivot in result.iterrows():
fig.add_shape(type="line",
x0=pivot.date, y0=pivot, x1=pivot.date, y1=pivot,
line=dict(
color="green",
width=3)
)
fig
When I display this figure, no pivot lines appear the way I want them to. Only the original price line shows.
Thanks in advance for taking the time to read this so far.
There are two ways to create a line segment: add a shape, or use a scatter trace in lines mode. I think the scatter trace is preferable because it allows more detailed settings. Loop over the data frame row by row and use the index to look up the next row: each segment runs from the current row's date to the next row's date, and both y values are the current row's pivot. I wanted to use the full width of the plot, so I moved the legend above it. Also, because each segment is its own scatter trace, every segment would get its own legend entry for the pivot, so showlegend is set to True for the first segment only.
import yfinance as yf
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta
df = yf.download( tickers = 'BTC-USD',
start = '2021-08-30',
end = datetime.today().strftime('%Y-%m-%d'),
interval = '1wk',
group_by = 'ticker',
auto_adjust = True).reset_index()
#daily df for plot
df2 = yf.download( tickers = 'BTC-USD',
start = '2021-08-30',
end = datetime.today().strftime('%Y-%m-%d'),
interval = '1d',
group_by = 'ticker',
auto_adjust = True).reset_index()
#small cap everything
df = df.rename(columns={'Date':'date',
'Open': 'open',
'High': 'high',
'Low' : 'low',
'Close' : 'close'})
df['pivot'] = (df['high']+ df['low'] + df['close'])/3
fig = go.Figure()
fig.add_trace(go.Candlestick(x= df['date'],
open = df['open'],
high = df['high'],
low = df['low'],
close = df['close'],
name = 'Price Candle',
legendgroup='one'
)
)
#fig.add_trace(go.Scatter(mode='lines', x=df['date'], y=df['pivot'], line=dict(color='green'), name='pivot'))
for idx, row in df.iterrows():
    #print(idx)
    if idx == len(df)-2:
        break
    fig.add_trace(go.Scatter(mode='lines',
                             x=[row['date'], df.loc[idx+1,'date']],
                             y=[row['pivot'], row['pivot']],
                             line=dict(color='blue', width=1),
                             name='pivot',
                             showlegend=True if idx == 0 else False,
                             )
                  )
fig.update_layout(
autosize=False,
height=600,
width=1100,
legend=dict(
orientation="h",
yanchor="bottom",
y=1.02,
xanchor="right",
x=1)
)
fig.update_xaxes(rangeslider_visible=False)
fig.show()
Data I'm working with: https://drive.google.com/file/d/1xb7icmocz-SD2Rkq4ykTZowxW0uFFhBl/view?usp=sharing
Hey everyone,
I am a bit stuck with editing a plot.
Basically, I would like my x value to display the months in the year, but it doesn't seem to work because of the data type (?). Do you have any idea how I could get my plot to have months in the x axis?
If you need more context about the data, please let me know!!!
Thank you!
Here's my code for the plot and the initial data modifications:
import matplotlib.pyplot as plt
import mplleaflet
import pandas as pd
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter
import numpy as np
df = pd.read_csv("data/C2A2_data/BinnedCsvs_d400/fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv")
df['degrees']=df['Data_Value']/10
df['Date'] = pd.to_datetime(df['Date'])
df2 = df[df['Date']<'2015-01-01']
df3 = df[df['Date']>='2015-01-01']
max_temp = df2.groupby([(df2.Date.dt.month),(df2.Date.dt.day)])['degrees'].max()
min_temp = df2.groupby([(df2.Date.dt.month),(df2.Date.dt.day)])['degrees'].min()
max_temp2 = df3.groupby([(df3.Date.dt.month),(df3.Date.dt.day)])['degrees'].max()
min_temp2 = df3.groupby([(df3.Date.dt.month),(df3.Date.dt.day)])['degrees'].min()
max_temp.plot(x ='Date', y='degrees', kind = 'line')
min_temp.plot(x ='Date',y='degrees', kind= 'line')
plt.fill_between(range(len(min_temp)),min_temp, max_temp, color='C0', alpha=0.2)
ax = plt.gca()
ax.set(xlabel="Date",
ylabel="Temperature",
title="Extreme Weather in 2015")
plt.legend()
plt.tight_layout()
x = plt.gca().xaxis
for item in x.get_ticklabels():
    item.set_rotation(45)
plt.show()
Plot I'm getting:
Option 1 (Most Similar Approach)
Change the index based on month abbreviations using Index.map and calendar
This is just for df2:
import calendar
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("...")
df['degrees'] = df['Data_Value'] / 10
df['Date'] = pd.to_datetime(df['Date'])
df2 = df[df['Date'] < '2015-01-01']
max_temp = df2.groupby([df2.Date.dt.month, df2.Date.dt.day])['degrees'].max()
min_temp = df2.groupby([df2.Date.dt.month, df2.Date.dt.day])['degrees'].min()
# Update the index to be the desired display format for x-axis
max_temp.index = max_temp.index.map(lambda x: f'{calendar.month_abbr[x[0]]}')
min_temp.index = min_temp.index.map(lambda x: f'{calendar.month_abbr[x[0]]}')
max_temp.plot(x='Date', y='degrees', kind='line')
min_temp.plot(x='Date', y='degrees', kind='line')
plt.fill_between(range(len(min_temp)), min_temp, max_temp,
color='C0', alpha=0.2)
ax = plt.gca()
ax.set(xlabel="Date", ylabel="Temperature", title="Extreme Weather 2005-2014")
x = plt.gca().xaxis
for item in x.get_ticklabels():
    item.set_rotation(45)
plt.margins(x=0)
plt.legend()
plt.tight_layout()
plt.show()
As an aside: the title "Extreme Weather in 2015" is incorrect because this data includes all years before 2015. This is "Extreme Weather 2005-2014"
The year range can be checked with min and max as well:
print(df2.Date.dt.year.min(), '-', df2.Date.dt.year.max())
# 2005 - 2014
The title could be programmatically generated with:
title=f"Extreme Weather {df2.Date.dt.year.min()}-{df2.Date.dt.year.max()}"
Option 2 (Simplifying groupby step)
Simplify the code using groupby aggregate to create a single DataFrame then convert the index in the same way as above:
import calendar
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("...")
df['degrees'] = df['Data_Value'] / 10
df['Date'] = pd.to_datetime(df['Date'])
df2 = df[df['Date'] < '2015-01-01']
# Get Max and Min Degrees in Single Groupby
df2_temp = (
df2.groupby([df2.Date.dt.month, df2.Date.dt.day])['degrees']
.agg(['max', 'min'])
)
# Convert Index to whatever display format is desired:
df2_temp.index = df2_temp.index.map(lambda x: f'{calendar.month_abbr[x[0]]}')
# Plot
ax = df2_temp.plot(
kind='line', rot=45,
xlabel="Date", ylabel="Temperature",
title=f"Extreme Weather {df2.Date.dt.year.min()}-{df2.Date.dt.year.max()}"
)
# Fill between
plt.fill_between(range(len(df2_temp)), df2_temp['min'], df2_temp['max'],
color='C0', alpha=0.2)
plt.margins(x=0)
plt.tight_layout()
plt.show()
Option 3 (Best overall functionality)
Convert the index to a datetime using pd.to_datetime. Choose any leap year to put all rows in the same year (it must be a leap year so that Feb-29 does not raise an error). Then call set_major_formatter with a DateFormatter that uses the format string %b to show the month abbreviation:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("...")
df['degrees'] = df['Data_Value'] / 10
df['Date'] = pd.to_datetime(df['Date'])
df2 = df[df['Date'] < '2015-01-01']
# Get Max and Min Degrees in Single Groupby
df2_temp = (
df2.groupby([df2.Date.dt.month, df2.Date.dt.day])['degrees']
.agg(['max', 'min'])
)
# Convert to DateTime of Same Year
# (Must be a leap year so Feb-29 doesn't raise an error)
df2_temp.index = pd.to_datetime(
'2000-' + df2_temp.index.map(lambda s: '-'.join(map(str, s)))
)
# Plot
ax = df2_temp.plot(
kind='line', rot=45,
xlabel="Date", ylabel="Temperature",
title=f"Extreme Weather {df2.Date.dt.year.min()}-{df2.Date.dt.year.max()}"
)
# Fill between
plt.fill_between(df2_temp.index, df2_temp['min'], df2_temp['max'],
color='C0', alpha=0.2)
# Set xaxis formatter to month abbr with the %b format string
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
plt.tight_layout()
plt.show()
The benefit of this approach is that the index is a datetime and therefore will format better than the string representations of options 1 and 2.
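The leap-year requirement in Option 3 can be checked in isolation. This is a minimal sketch with a hypothetical helper, to_dates, and a few hand-picked (month, day) pairs like those in the groupby index. Building the dates in 2000 works, while the same pairs in 2001 fail on Feb 29.

```python
import pandas as pd

# (month, day) pairs like the groupby index in Option 3, including Feb 29
month_day = [(2, 28), (2, 29), (3, 1)]

def to_dates(year, pairs):
    """Hypothetical helper: build one Timestamp per (month, day) pair in the given year."""
    return [pd.Timestamp(year=year, month=m, day=d) for m, d in pairs]

# 2000 is a leap year, so Feb 29 exists
leap_dates = to_dates(2000, month_day)

# 2001 is not a leap year, so Feb 29 raises a ValueError
try:
    to_dates(2001, month_day)
    non_leap_ok = True
except ValueError:
    non_leap_ok = False

print(leap_dates[1], non_leap_ok)
```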
I have the following dataframe 'df_percentages':
df_percentages
                 Percentages_center  Percentages zone2  Percentages total
Sleeping                  77.496214          87.551742          12.202591
Low activity              21.339391          12.286724          81.511021
Middle activity            0.969207           0.124516           5.226317
High activity              0.158169           0.000000           1.009591
I am trying to create a vertically stacked bar chart with 3 separate bars on the x-axis: one for 'Percentages_center', one for 'Percentages zone2' and one for 'Percentages total'. Each bar should show the percentages of sleeping, low activity, middle activity and high activity stacked on top of each other.
I've tried the following code, but I can't figure out how to make the bar chart:
x = ['Center', 'Zone2', 'Total']
plot = px.Figure(data=[go.Bar(
name = 'Sleeping (0-150 MP)',
x = x,
y = df_percentages['Percentages center']
),
go.Bar(
name = 'Low activity (151-2000 MP)',
x = x,
y = df_percentages['Percentages zone2']
),
go.Bar(
name = 'Middle activity (2001-6000 MP)',
x = x,
y = df_percentages['Percentages center']
),
go.Bar(
name = 'High activity (6000-10000)',
x = x,
y = df_percentages['Percentages zone2']
)
])
plot.update_layout(barmode='stack')
plot.show()
If you're open to plotly.express, I would suggest using
df = df.T # to get your df in the right shape
fig = px.bar(df, x = df.index, y = df.columns)
Complete code:
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px
df = pd.DataFrame({'Percentages_center': {'Sleeping': 77.496214,
'Low_activity': 21.339391,
'Middle_activity': 0.9692069999999999,
'High_activity': 0.158169},
'Percentages_zone2': {'Sleeping': 87.551742,
'Low_activity': 12.286724000000001,
'Middle_activity': 0.124516,
'High_activity': 0.0},
'Percentages_total': {'Sleeping': 12.202591,
'Low_activity': 81.511021,
'Middle_activity': 5.226317,
'High_activity': 1.009591}})
df = df.T
fig = px.bar(df, x = df.index, y = df.columns)
fig.show()
I'm learning Seaborn and trying to figure out how I can format an X axis for dates over a yearly period, so that it is readable. Let's assume we have a dataframe which holds weather measurements for each day of an entire year (365 rows).
sns.scatterplot(x = df_weather["DATE"], y = df_weather["MAX_TEMPERATURE_C"], color = 'red')
sns.scatterplot(x = df_weather["DATE"], y = df_weather["MIN_TEMPERATURE_C"], color = 'blue')
plt.show()
How can I ensure that the X axis labels are readable? Ideally, one label per month would be fine.
Thanks!
I'm not sure what your DATE column looks like, so first generate some data. Here the date is stored as a string, which I guess is what you have:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
# Dates as 'YYYY-MM-DD' strings
DATE = pd.date_range('2020-01-01', periods=365, freq='D').strftime('%Y-%m-%d')
MIN = np.random.uniform(low=10,high=25,size = len(DATE))
MAX = MIN + np.random.uniform(low=5,high=10,size =len(DATE))
df = pd.DataFrame({'DATE':DATE,'MIN':MIN,'MAX':MAX})
Plot like you did using sns:
fig, ax = plt.subplots(figsize = (10,4))
ax = sns.scatterplot(x = "DATE", y = "MAX",data=df, color = 'red')
ax = sns.scatterplot(x = "DATE", y = "MIN",data=df, color = 'blue')
Now we take the first day of each month and use those dates as the tick positions:
mths = pd.date_range('2020-01-01', periods=12, freq='MS')
ax.set_xticks(mths.strftime('%Y-%m-%d'))
ax.set(xticklabels=mths.strftime('%b'))
plt.show()
And it should look ok:
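If the DATE column can be converted to real datetimes, another option is to let matplotlib's date locators place the ticks. The sketch below is an alternative to the string-based ticks above, not the same method. It uses plain matplotlib scatter calls on generated data, and the Agg backend is chosen only so that it runs without a display. The same locator and formatter calls should apply to the axes returned by sns.scatterplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Generated data with the dates stored as strings, as in the answer above
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=365, freq="D")
low = rng.uniform(10, 25, size=len(dates))
df = pd.DataFrame({
    "DATE": dates.strftime("%Y-%m-%d"),
    "MIN": low,
    "MAX": low + rng.uniform(5, 10, size=len(dates)),
})

# Convert the string column to real datetimes before plotting
df["DATE"] = pd.to_datetime(df["DATE"])

fig, ax = plt.subplots(figsize=(10, 4))
ax.scatter(df["DATE"], df["MAX"], color="red", s=8)
ax.scatter(df["DATE"], df["MIN"], color="blue", s=8)

# One tick at the start of each month, labeled with the month abbreviation
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
fig.canvas.draw()
```

With a datetime axis the tick positions stay correct when the date range changes, whereas the string approach needs the tick list rebuilt by hand.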