I have this code:
fig = plt.figure(figsize=(20,8))
ax1 = fig.add_subplot(111, ylabel='Price in $')
df['Close'].plot(ax=ax1, color='r', lw=2.)
signals[['short_mavg', 'long_mavg']].plot(ax=ax1, lw=2.)
ax1.plot(signals.loc[signals.positions == 1.0].index,
signals.short_mavg[signals.positions == 1.0],
'^', markersize=10, color='m')
ax1.plot(signals.loc[signals.positions == -1.0].index,
signals.short_mavg[signals.positions == -1.0],
'v', markersize=10, color='k')
plt.show()
The problem is that all of the '^' and 'v' markers, and the 'Date' values from df, end up on the Y axis =(
I have added the full code from my Jupyter notebook and a .csv sample.
CSV data:
2013.12.17,00:00,0.89469,0.89571,0.88817,0.88973,4
2013.12.18,00:00,0.88974,0.89430,0.88200,0.88595,4
code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
df = pd.read_csv("AUDUSD.csv",header = None)
df.columns = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume']
df=df.set_index('Date')
second df:
short_window = 20
long_window = 90
signals = pd.DataFrame(index=df.index)
signals['signal'] = 0.0
#calculating MAs
signals['short_mavg'] = df['Close'].rolling(short_window).mean()
signals['long_mavg'] = df['Close'].rolling(long_window).mean()
signals['signal'][short_window:] = np.where(signals['short_mavg'][short_window:]
> signals['long_mavg'][short_window:], 1.0, 0.0)
signals['positions'] = signals['signal'].diff()
I created a graph with your code using Yahoo Finance currency data. The cause may be that your data does not have a time-series (datetime) index. Please compare the contents of your data with mine.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import yfinance as yf
ticker = yf.Ticker("AUDUSD=X")
df = ticker.history(start='2013-01-01', end='2021-01-01')
short_window = 20
long_window = 90
signals = pd.DataFrame(index=df.index)
signals['signal'] = 0.0
#calculating MAs
signals['short_mavg'] = df['Close'].rolling(short_window).mean()
signals['long_mavg'] = df['Close'].rolling(long_window).mean()
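# Compare the whole columns at once; rows where either average is still NaN
# compare as False and therefore stay 0.0 (avoids chained assignment)
signals['signal'] = np.where(signals['short_mavg'] > signals['long_mavg'], 1.0, 0.0)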
signals['positions'] = signals['signal'].diff()
fig = plt.figure(figsize=(20,8))
ax1 = fig.add_subplot(111, ylabel='Price in $')
df['Close'].plot(ax=ax1, color='r', lw=2.)
signals[['short_mavg', 'long_mavg']].plot(ax=ax1, lw=2.)
ax1.plot(signals.loc[signals.positions == 1.0].index,
signals.short_mavg[signals.positions == 1.0],
'^', markersize=10, color='m')
ax1.plot(signals.loc[signals.positions == -1.0].index,
signals.short_mavg[signals.positions == -1.0],
'v', markersize=10, color='k')
plt.show()
signals
signal short_mavg long_mavg positions
Date
2012-12-31 0.0 NaN NaN NaN
2013-01-01 0.0 NaN NaN 0.0
2013-01-02 0.0 NaN NaN 0.0
2013-01-03 0.0 NaN NaN 0.0
2013-01-04 0.0 NaN NaN 0.0
... ... ... ... ...
2020-12-25 1.0 0.749732 0.727486 0.0
2020-12-28 1.0 0.750791 0.727987 0.0
2020-12-29 1.0 0.751951 0.728454 0.0
2020-12-30 1.0 0.753096 0.728910 0.0
2020-12-31 1.0 0.754453 0.729403 0.0
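If the index in the original notebook holds plain strings such as '2013.12.17' rather than timestamps, converting it before plotting may be enough. The following is a minimal sketch, assuming the CSV layout shown in the question (no header row, dates formatted as %Y.%m.%d); the inline sample stands in for AUDUSD.csv:

```python
import io
import pandas as pd

# Two rows copied from the question; this stands in for AUDUSD.csv.
sample = io.StringIO(
    "2013.12.17,00:00,0.89469,0.89571,0.88817,0.88973,4\n"
    "2013.12.18,00:00,0.88974,0.89430,0.88200,0.88595,4\n"
)

df = pd.read_csv(sample, header=None)
df.columns = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume']

# Convert the text dates to real timestamps before using them as the index.
df['Date'] = pd.to_datetime(df['Date'], format='%Y.%m.%d')
df = df.set_index('Date')

print(type(df.index).__name__)
```

With a datetime index, signals = pd.DataFrame(index=df.index) inherits it, so the markers and the lines should share the same x axis.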
I am producing a pandas barplot with raw counts represented by the plot, however I would like to annotate the bars with the pct of those counts as a whole. I have seen a lot of people using ax.patches methods to annotate but my values are unrelated to the get_height of the actual bars.
Here is some toy data. The plot produced will be the individual counts of the specific type. However, I want to add annotations above that specific bar that represent the pct total of that specific type to all types for that person's name.
Let me know if you need any more clarification.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
d = {'ID': [1,1,1,2,2,3,3,3,4],
'name': ['bob','bob','bob','shelby','shelby','jordan','jordan','jordan','jeff'],
'type': ['type1','type2','type4','type1','type6','type5','type8','type2',None]}
df: pd.DataFrame = pd.DataFrame(data=d)
df_pivot: pd.DataFrame = df.pivot_table(index='type', columns=['name'], values='ID', aggfunc={'ID': np.sum}).fillna(0)
# create percent totals of the specific type's row of the total
df_pivot['bob_pct_total']: pd.Series = (df_pivot['bob']/df_pivot['bob'].sum()).mul(100).round(1)
df_pivot['shelby_pct_total']: pd.Series = (df_pivot['shelby']/df_pivot['shelby'].sum()).mul(100).round(1)
df_pivot['jordan_pct_total']: pd.Series = (df_pivot['jordan']/df_pivot['jordan'].sum()).mul(100).round(1)
df_pivot.head(10)
name bob jordan shelby bob_pct_total shelby_pct_total jordan_pct_total
type
type1 1.0 0.0 2.0 33.3 50.0 0.0
type2 1.0 3.0 0.0 33.3 0.0 33.3
type4 1.0 0.0 0.0 33.3 0.0 0.0
type5 0.0 3.0 0.0 0.0 0.0 33.3
type6 0.0 0.0 2.0 0.0 50.0 0.0
type8 0.0 3.0 0.0 0.0 0.0 33.3
fig, ax = plt.subplots(figsize=(15,15))
df_pivot.plot(kind='bar', y=['bob','jordan','shelby'], ax=ax)
You can use the old approach: loop through the bars and use each bar's height to position whatever text you want. Since matplotlib 3.4.0 there is also a new function, bar_label, that removes much of the boilerplate:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
d = {'ID': [1, 1, 1, 2, 2, 3, 3, 3, 4],
'name': ['bob', 'bob', 'bob', 'shelby', 'shelby', 'jordan', 'jordan', 'jordan', 'jeff'],
'type': ['type1', 'type2', 'type4', 'type1', 'type6', 'type5', 'type8', 'type2', None]}
df: pd.DataFrame = pd.DataFrame(data=d)
df_pivot: pd.DataFrame = df.pivot_table(index='type', columns=['name'], values='ID', aggfunc={'ID': np.sum}).fillna(0)
# create percent totals of the specific type's row of the total
df_pivot['bob_pct_total']: pd.Series = (df_pivot['bob'] / df_pivot['bob'].sum()).mul(100).round(1)
df_pivot['shelby_pct_total']: pd.Series = (df_pivot['shelby'] / df_pivot['shelby'].sum()).mul(100).round(1)
df_pivot['jordan_pct_total']: pd.Series = (df_pivot['jordan'] / df_pivot['jordan'].sum()).mul(100).round(1)
fig, ax = plt.subplots(figsize=(12, 5))
columns = ['bob', 'jordan', 'shelby']
df_pivot.plot(kind='bar', y=columns, rot=0, ax=ax)
# ax.containers holds one group of bars per plotted column, in the same order as `columns`
for bars, col in zip(ax.containers, ['bob_pct_total', 'jordan_pct_total', 'shelby_pct_total']):
    ax.bar_label(bars, labels=['' if val == 0 else f'{val}' for val in df_pivot[col]])
plt.tight_layout()
plt.show()
PS: To skip labeling the first bars, you could experiment with:
for bars, col in zip(ax.containers, ['bob_pct_total', 'jordan_pct_total', 'shelby_pct_total']):
    labels = ['' if val == 0 else f'{val}' for val in df_pivot[col]]
    labels[0] = ''
    ax.bar_label(bars, labels=labels)
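For matplotlib versions older than 3.4.0, the "old approach" mentioned above could look roughly like the sketch below. The toy counts and percentages are hypothetical stand-ins for df_pivot, and the Agg backend is selected only so the example runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: raw counts to plot, and the percentages to print.
counts = pd.DataFrame({'bob': [1, 1, 0], 'jordan': [0, 3, 3]},
                      index=['type1', 'type2', 'type5'])
pcts = pd.DataFrame({'bob': [50.0, 50.0, 0.0], 'jordan': [0.0, 50.0, 50.0]},
                    index=counts.index)

fig, ax = plt.subplots()
counts.plot(kind='bar', rot=0, ax=ax)

texts = []
# One container per plotted column; each container iterates over its bars.
for container, col in zip(ax.containers, counts.columns):
    for bar, pct in zip(container, pcts[col]):
        if pct == 0:
            continue  # leave empty bars unlabeled
        x_center = bar.get_x() + bar.get_width() / 2
        # The bar height only positions the text; the label comes from pcts.
        texts.append(ax.annotate(f'{pct}', (x_center, bar.get_height()),
                                 ha='center', va='bottom'))
```

The label text is taken from a separate table, so it does not have to match the bar height.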
I am trying to plot the rolling mean on a double-axis graph. However, I am unable to create my legend correctly. Any pointers?
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
# df6 = t100df5.rolling(window=12).mean()
lns1 = ax1.plot(
df6,
label = ['Alpha', 'Beta'], # how do I add 'Beta' label correctly?
linewidth = 2.0)
lns2 = ax2.plot(temp,
label = 'Dollars',
color='black')
lns = lns1+lns2
labs = [l.get_label() for l in lns]
L = ax1.legend(lns, labs, loc = 0, frameon = True)
df6 looks like this:
Alpha Beta
TIME
1990-01-01 NaN NaN
1990-02-01 NaN NaN
1990-03-01 NaN NaN
1990-04-01 NaN NaN
1990-05-01 NaN NaN
... ... ...
2019-08-01 10.012447 8.331901
2019-09-01 9.909044 8.263813
2019-10-01 9.810155 8.185539
2019-11-01 9.711690 8.085016
2019-12-01 9.619968 8.03533
And temp looks like this:
Dollars
date
1994-01-01 NaN
1994-02-01 NaN
1994-03-01 225.664248
1994-04-01 217.475670
1995-01-01 216.464499
... ...
2018-04-01 179.176545
2019-01-01 177.624369
2019-02-01 178.731035
2019-03-01 176.624608
2019-04-01 177.357060
Note that the datetime objects are the indices for the dataframes.
How can I add a legend with appropriate labels for the graph below? The black line is from temp and both of the other lines are from df6.
I just added another ax1.plot statement like this:
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
lns1 = ax1.plot(
df6.index, df6.Alpha,
label = 'Alpha',
linewidth = 2.0)
lns1_5 = ax1.plot(df6.index, df6.Beta, label = 'Beta')
lns2 = ax2.plot(temp,
label = 'Dollars',
color='black')
lns = lns1+lns1_5+lns2
labs = [l.get_label() for l in lns]
L = ax1.legend(lns, labs, loc = 0, frameon = True)
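An alternative that avoids tracking each list of lines by hand is to plot every column in a loop and then ask both axes for their legend entries. This is only a sketch: the small df6 and temp frames below are made-up stand-ins for the real data, and the Agg backend is chosen so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-ins for df6 and temp.
idx = pd.date_range('2019-01-01', periods=6, freq='MS')
df6 = pd.DataFrame({'Alpha': np.arange(6.0), 'Beta': np.arange(6.0) * 2}, index=idx)
temp = pd.DataFrame({'Dollars': np.arange(6.0) + 100}, index=idx)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

# One plot call per column so that each line gets its own label.
for col in df6.columns:
    ax1.plot(df6.index, df6[col], label=col, linewidth=2.0)
ax2.plot(temp.index, temp['Dollars'], label='Dollars', color='black')

# Collect handles and labels from both axes and build a single legend.
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
legend = ax1.legend(h1 + h2, l1 + l2, loc=0, frameon=True)
```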
Here is the sample data:
Datetime Price Data1 Data2 ShiftedPrice
0 2017-11-05 09:20:01.134 2123.0 12.23 34.12 300.0
1 2017-11-05 09:20:01.789 2133.0 32.43 45.62 330.0
2 2017-11-05 09:20:02.238 2423.0 35.43 55.62 NaN
3 2017-11-05 09:20:02.567 3423.0 65.43 56.62 NaN
4 2017-11-05 09:20:02.948 2463.0 45.43 58.62 NaN
I am trying to plot the ShiftedPrice column against the Datetime column, with horizontal lines for the mean and the confidence intervals of ShiftedPrice.
Have a look at the code below:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
df1 = df.dropna(subset=['ShiftedPrice'])
df1
fig = plt.figure(figsize=(20,10))
ax = fig.add_subplot(121)
ax = df1.plot(x='Datetime',y='ShiftedPrice')
# Plotting the mean
ax.axhline(y=df1['ShiftedPrice'].mean(), color='r', linestyle='--', lw=2)
plt.show()
# Plotting Confidence Intervals
ax.axhline(y=df1['ShiftedPrice'].mean() + 1.96*np.std(df1['ShiftedPrice'],ddof=1), color='g', linestyle=':', lw=2)
ax.axhline(y=df1['ShiftedPrice'].mean() - 1.96*np.std(df1['ShiftedPrice'],ddof=1), color='g', linestyle=':', lw=2)
plt.show()
My problem is that the horizontal lines are not appearing. Instead, I get the following message:
ax.axhline(y=df1['ShiftedPrice'].mean(), color='r', linestyle='--', lw=2)
Out[22]: <matplotlib.lines.Line2D at 0xccc5c18>
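A possible fix, assuming the lines go missing because df1.plot draws on a new Axes and because plt.show() is called before the later axhline calls: pass ax=ax so pandas reuses the existing Axes, add every line first, and call plt.show() once at the end. The three-row frame below is a cut-down copy of the sample data, and the Agg backend is used only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Cut-down copy of the sample data from the question.
df = pd.DataFrame({
    'Datetime': pd.to_datetime(['2017-11-05 09:20:01.134',
                                '2017-11-05 09:20:01.789',
                                '2017-11-05 09:20:02.238']),
    'ShiftedPrice': [300.0, 330.0, None],
})
df1 = df.dropna(subset=['ShiftedPrice'])

fig, ax = plt.subplots(figsize=(10, 5))
# ax=ax makes pandas draw on the Axes created above instead of a new one.
df1.plot(x='Datetime', y='ShiftedPrice', ax=ax)

mean = df1['ShiftedPrice'].mean()
spread = 1.96 * df1['ShiftedPrice'].std(ddof=1)

mean_line = ax.axhline(y=mean, color='r', linestyle='--', lw=2)
upper_line = ax.axhline(y=mean + spread, color='g', linestyle=':', lw=2)
lower_line = ax.axhline(y=mean - spread, color='g', linestyle=':', lw=2)

plt.show()  # called once, after all lines have been added
```

The text `<matplotlib.lines.Line2D at ...>` is just the return value of axhline echoed by the console, not an error.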
I have the following df:
[A B C D
1Q18 6.9 0.0 25.0 9.9
2Q17 NaN NaN NaN NaN
2Q18 7.1 0.0 25.0 4.1
3Q17 NaN NaN NaN NaN
3Q18 7.3 0.0 25.0 5.3
4Q17 NaN NaN NaN NaN
4Q18 7.0 0.0 25.0 8.3]
And I would like to obtain a graph such as
I first tried Bar(df), but it only graphs the first column:
p=Bar(df)
show(p)
I also tried:
p=Bar(popo, values=["A","B"])
show(p)
>raise ValueError("expected an element of either %s, got %r" % (nice_join(self.type_params), value))
ValueError: expected an element of either Column(Float) or Column(String), got array([[ 6.9, 0. ]])
Thank you in advance for letting me know what I am doing wrong.
Cheers
In Bokeh 0.12.6+ it is possible to use a visual dodge:
from bokeh.core.properties import value
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import dodge
df.index = df.index.str.split('Q', expand=True)
df = df.sort_index(level=[1,0])
df.index = df.index.map('Q'.join)
# remove all NaNs, because plotting them is not supported
df = df.dropna()
print (df)
A B C D
1Q18 6.9 0.0 25.0 9.9
2Q18 7.1 0.0 25.0 4.1
3Q18 7.3 0.0 25.0 5.3
4Q18 7.0 0.0 25.0 8.3
output_file("dodged_bars.html")
df = df.reset_index().rename(columns={'index':'qrange'})
data = df.to_dict(orient='list')
idx = df['qrange'].tolist()
source = ColumnDataSource(data=data)
p = figure(x_range=idx, y_range=(0, df[['A','B','C','D']].values.max() + 5),
plot_height=250, title="Report",
toolbar_location=None, tools="")
p.vbar(x=dodge('qrange', -0.3, range=p.x_range), top='A', width=0.2, source=source,
color="#c9d9d3", legend=value("A"))
p.vbar(x=dodge('qrange', -0.1, range=p.x_range), top='B', width=0.2, source=source,
color="#718dbf", legend=value("B"))
p.vbar(x=dodge('qrange', 0.1, range=p.x_range), top='C', width=0.2, source=source,
color="#e84d60", legend=value("C"))
p.vbar(x=dodge('qrange', 0.3, range=p.x_range), top='D', width=0.2, source=source,
color="#ddb7b1", legend=value("D"))
p.x_range.range_padding = 0.2
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
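Your data is pivoted, so I unpivoted it and then used a Bar plot. I hope this is what you need:
import numpy as np
import pandas as pd
# Bar and show are assumed to come from the legacy bokeh.charts interface used in the question
a = [6.9, np.nan, 7.1, np.nan, 7.3, np.nan, 7.0]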
b = [0.0, np.nan, 0.0, np.nan, 0.0, np.nan, 0.0]
c = [25.0, np.nan, 25.0, np.nan, 25.0, np.nan, 25.0]
d = [9.9, np.nan, 4.1, np.nan, 5.3, np.nan, 8.3]
df = pd.DataFrame({'A': a, 'B': b, 'C': c, 'D': d}, index =['1Q18', '2Q17', '2Q18', '3Q17', '3Q18', '4Q17', '4Q18'])
df.reset_index(inplace=True)
df = pd.melt(df, id_vars='index').dropna().set_index('index')
p = Bar(df, values='value', group='variable')
show(p)
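To see what the unpivot step produces before handing the data to a plotting call, the melt can be run on its own. A minimal sketch on a cut-down, two-column version of the frame (pandas only, no Bokeh needed):

```python
import numpy as np
import pandas as pd

# Cut-down version of the question's frame: two columns, three quarters.
df = pd.DataFrame(
    {'A': [6.9, np.nan, 7.1], 'B': [0.0, np.nan, 0.0]},
    index=['1Q18', '2Q17', '2Q18'],
)

long_df = (
    df.reset_index()            # the unnamed index becomes a column called 'index'
      .melt(id_vars='index')    # wide -> long: columns 'index', 'variable', 'value'
      .dropna()                 # drop the quarters that have no data
      .set_index('index')
)
print(long_df)
```

Each row of the long frame is one bar: 'variable' names the original column and 'value' holds its height.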
This is my Python code for plotting graph from txt file:
import matplotlib.pyplot as plt
x = []
y = []
fig = plt.figure()
rect = fig.patch
rect.set_facecolor('#31312e')
readFile = open('data.txt', 'r')
sepFile = readFile.read().split('\n')
readFile.close()
for plotPair in sepFile:
    xAndY = plotPair.split(';')
    x.append(int(xAndY[0]))
    y.append(int(xAndY[1]))
ax1 = fig.add_subplot(1,1,1, axisbg='blue')
ax1.plot(x,y, 'c', linewidth=3.3)
plt.show()
This is what my data.txt looks like:
[info]
Datum=100221
[messung]
Uhrzeit;Interval;AMB_TEMP;IRAD;W_D;W_S;Poly_M_TEMP;TF_M_TEMP
;s;DegC;W/m2;Deg;m/s;DegC;DegC
[Start]
00:15:00;900;26.1;55.8;5.5;1.0;;
00:30:00;900;26.1;55.8;6.1;1.0;;
00:45:00;900;26.1;55.9;5.7;0.9;;
01:00:00;900;26.1;55.9;5.8;0.7;;
01:15:00;900;26.1;55.8;6.4;0.8;;
01:30:00;900;26.1;55.8;6.1;0.8;;
01:45:00;900;26.1;55.8;5.7;1.0;;
02:00:00;900;26.0;55.8;5.8;1.1;;
02:15:00;900;25.9;55.9;5.4;1.1;;
02:30:00;900;25.8;55.9;5.9;0.9;;
02:45:00;900;25.8;55.9;8.0;1.0;;
03:00:00;900;25.8;55.8;7.2;0.9;;
03:15:00;900;25.7;55.8;11.1;0.7;;
03:30:00;900;25.6;55.9;8.5;1.0;;
03:45:00;900;25.7;55.8;6.8;1.1;;
04:00:00;900;25.7;55.8;6.8;0.9;;
04:15:00;900;25.7;55.8;7.0;0.9;;
04:30:00;900;25.6;55.8;6.6;0.6;;
04:45:00;900;25.7;55.8;6.3;0.5;;
05:00:00;900;25.6;55.8;6.1;0.5;;
05:15:00;900;25.5;55.8;5.6;0.8;;
05:30:00;900;25.5;55.8;5.0;0.6;;
05:45:00;900;25.5;55.8;5.2;0.7;;
06:00:00;900;25.5;55.8;5.1;0.7;;
06:15:00;900;25.4;55.8;5.5;0.6;;
06:30:00;900;25.4;55.8;6.1;0.6;;
06:45:00;900;25.4;55.8;5.9;0.6;;
07:00:00;900;25.4;55.8;6.1;0.7;;
07:15:00;900;25.3;55.8;6.2;0.9;;
07:30:00;900;25.4;55.8;5.8;0.9;;
07:45:00;900;25.5;57.4;6.1;0.8;;
08:00:00;900;25.7;68.7;5.9;0.8;;
08:15:00;900;26.0;85.5;6.1;0.8;;
08:30:00;900;26.2;95.5;5.6;0.9;;
08:45:00;900;26.4;110.5;5.5;1.0;;
09:00:00;900;26.8;137.7;5.7;1.2;;
09:15:00;900;27.4;175.7;5.6;1.3;;
09:30:00;900;28.1;223.1;6.0;1.6;;
09:45:00;900;28.7;275.1;5.9;1.9;;
10:00:00;900;29.5;317.7;6.1;2.5;;
10:15:00;900;31.3;633.4;6.3;2.8;;
10:30:00;900;31.4;601.3;6.0;3.0;;
10:45:00;900;32.6;719.6;6.4;3.1;;
11:00:00;900;32.6;695.0;6.5;2.9;;
11:15:00;900;32.8;656.7;6.7;2.5;;
11:30:00;900;33.3;755.1;6.6;2.7;;
11:45:00;900;33.5;773.4;6.4;2.7;;
12:00:00;900;34.0;912.4;6.1;3.0;;
12:15:00;900;34.0;842.2;5.9;3.2;;
12:30:00;900;34.1;594.6;6.5;2.3;;
12:45:00;900;33.7;755.2;7.2;2.6;;
13:00:00;900;34.2;560.3;6.1;2.5;;
13:15:00;900;33.4;437.0;6.9;2.2;;
13:30:00;900;32.7;411.4;6.2;2.7;;
13:45:00;900;32.9;296.0;7.1;1.8;;
14:00:00;900;32.1;289.3;6.9;2.5;;
14:15:00;900;33.2;441.0;6.2;2.1;;
14:30:00;900;31.8;275.0;5.9;2.6;;
14:45:00;900;31.1;206.9;6.7;2.6;;
15:00:00;900;31.0;294.3;6.1;2.1;;
15:15:00;900;33.7;750.2;6.2;2.8;;
15:30:00;900;35.0;729.4;6.6;2.6;;
15:45:00;900;33.4;480.6;6.1;3.2;;
16:00:00;900;33.5;502.6;6.8;3.0;;
16:15:00;900;33.1;391.8;6.6;2.3;;
16:30:00;900;33.3;490.9;6.7;2.8;;
16:45:00;900;33.2;419.9;6.6;2.7;;
17:00:00;900;31.2;168.5;6.2;2.7;;
17:15:00;900;30.5;147.6;6.5;2.8;;
17:30:00;900;30.0;96.0;7.3;2.0;;
17:45:00;900;28.0;58.0;14.4;2.1;;
18:00:00;900;25.2;57.0;20.3;3.1;;
18:15:00;900;23.7;58.0;19.7;2.6;;
18:30:00;900;23.5;55.9;19.6;1.9;;
18:45:00;900;23.8;55.8;23.4;1.2;;
19:00:00;900;24.1;56.5;18.6;0.5;;
19:15:00;900;24.4;57.6;17.7;0.3;;
19:30:00;900;24.8;56.8;9.7;0.3;;
19:45:00;900;25.1;55.8;5.4;0.4;;
20:00:00;900;25.0;55.8;7.8;0.3;;
20:15:00;900;25.2;55.8;6.7;0.5;;
20:30:00;900;25.2;55.8;5.9;0.8;;
20:45:00;900;25.2;55.8;5.6;0.8;;
21:00:00;900;25.0;55.8;5.6;1.0;;
21:15:00;900;24.9;55.8;5.7;1.3;;
21:30:00;900;24.9;55.8;5.8;1.2;;
21:45:00;900;24.9;55.8;5.7;1.0;;
22:00:00;900;25.0;55.8;6.0;0.8;;
22:15:00;900;25.0;55.8;6.0;0.9;;
22:30:00;900;25.0;55.8;5.9;1.0;;
22:45:00;900;25.0;55.7;6.1;0.6;;
23:00:00;900;25.0;55.8;5.2;0.4;;
23:15:00;900;25.2;55.8;5.7;0.5;;
23:30:00;900;25.3;55.8;6.2;0.5;;
23:45:00;900;25.4;55.8;5.8;0.4;;
24:00:00;900;25.3;55.8;4.5;0.4;;
When I run the module in Python, I get this error:
ValueError: invalid literal for int() with base 10: '[info]r'
My txt file has 6 columns.
How can I choose which column is plotted on the graph?
import matplotlib.pyplot as plt
from datetime import time, datetime
x = []
y = []
t = []
fig = plt.figure()
rect = fig.patch
rect.set_facecolor('#31312e')
readFile = open('data.txt', 'r')
sepFile = readFile.read().split('\n')
readFile.close()
for idx, plotPair in enumerate(sepFile):
    # skip the 6 header rows and any blank trailing line
    if idx > 5 and plotPair.strip():
        xAndY = plotPair.split(';')
        time_string = xAndY[0]
        time_string = time_string.replace(' ', '') # remove blanks
        datetime_obj = datetime.strptime(time_string, '%H:%M:%S')
        t.append(datetime_obj)
        x.append(float(xAndY[2]))
        y.append(float(xAndY[3]))
# axisbg was removed in Matplotlib 2.2; facecolor is its replacement
ax1 = fig.add_subplot(1, 1, 1, facecolor='blue')
ax1.plot(t, y, 'c', linewidth=3.3)
plt.show()
You have to ignore the first 6 header rows; I'm using if idx > 5 for this purpose.
I changed the columns that are read to [2] and [3]. If you want to plot the first column, you have to handle the : in the strings.
I changed int() to float() for casting the strings into numbers. For more information on that aspect, see:
Parse String to Float or Int
The time column is now parsed with datetime.
However, 24:00:00 should be written in the data file as 00:00:00, because strptime's %H only accepts hours from 00 to 23.
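If editing the data file is not an option, the 24:00:00 row can also be handled in code by adding the clock time to a base date as a timedelta, which rolls over to the next day. This is a sketch using only the standard library; parse_clock is a hypothetical helper name, and the base date of 1900-01-01 is chosen to match what strptime uses when no date is given.

```python
from datetime import datetime, timedelta

def parse_clock(text, base=datetime(1900, 1, 1)):
    """Parse 'HH:MM:SS' where HH may be 24 (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(':'))
    # timedelta accepts hours=24, so 24:00:00 becomes midnight of the next day
    return base + timedelta(hours=hours, minutes=minutes, seconds=seconds)

# Last two data rows from the file in the question.
rows = [
    "23:45:00;900;25.4;55.8;5.8;0.4;;",
    "24:00:00;900;25.3;55.8;4.5;0.4;;",
]
t = [parse_clock(row.split(';')[0]) for row in rows]
y = [float(row.split(';')[3]) for row in rows]  # IRAD column
print(t, y)
```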
With Pandas, 3 lines:
import pandas as pd
df = pd.read_table("file", skiprows=6, sep=";", index_col=0,
                   parse_dates=True, header=None,
                   names=["Interval","AMB_TEMP","IRAD","W_D","W_S","Poly_M_TEMP","TF_M_TEMP"])
df.AMB_TEMP.plot()
skiprows skips the first 6 lines.
index_col and parse_dates make the first column the index and parse it as dates.
names gives the column names, since there is no header row.
The last line plots the AMB_TEMP column. It could also have been:
df.iloc[:, 1:5].plot()  # .ix was removed in pandas 1.0; .iloc selects by position
to plot columns from AMB_TEMP to W_S.
With df being a dataframe:
df.head()
Interval AMB_TEMP IRAD W_D W_S Poly_M_TEMP \
2014-07-22 00:15:00 900 26.1 55.8 5.5 1.0 NaN
2014-07-22 00:30:00 900 26.1 55.8 6.1 1.0 NaN
2014-07-22 00:45:00 900 26.1 55.9 5.7 0.9 NaN
2014-07-22 01:00:00 900 26.1 55.9 5.8 0.7 NaN
2014-07-22 01:15:00 900 26.1 55.8 6.4 0.8 NaN
TF_M_TEMP
2014-07-22 00:15:00 NaN
2014-07-22 00:30:00 NaN
2014-07-22 00:45:00 NaN
2014-07-22 01:00:00 NaN
2014-07-22 01:15:00 NaN
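To choose which column is plotted, it can help to name all eight fields explicitly and then select by name. The sketch below is one way to do that, not the only one: it reads an inline copy of the file header plus two data rows, and keeps the time of day as a Timedelta (an assumption made here so that the 24:00:00 row parses without changes).

```python
import io
import pandas as pd

# Inline stand-in for data.txt: the 6 header lines plus the last two data rows.
raw = io.StringIO(
    "[info]\n"
    "Datum=100221\n"
    "[messung]\n"
    "Uhrzeit;Interval;AMB_TEMP;IRAD;W_D;W_S;Poly_M_TEMP;TF_M_TEMP\n"
    ";s;DegC;W/m2;Deg;m/s;DegC;DegC\n"
    "[Start]\n"
    "23:45:00;900;25.4;55.8;5.8;0.4;;\n"
    "24:00:00;900;25.3;55.8;4.5;0.4;;\n"
)
names = ["Uhrzeit", "Interval", "AMB_TEMP", "IRAD",
         "W_D", "W_S", "Poly_M_TEMP", "TF_M_TEMP"]

df = pd.read_csv(raw, sep=";", skiprows=6, header=None, names=names)

# Keep the clock time as an offset from midnight; "24:00:00" becomes 1 day.
df["Uhrzeit"] = pd.to_timedelta(df["Uhrzeit"])
df = df.set_index("Uhrzeit")

selected = df[["AMB_TEMP", "W_S"]]  # pick the columns to plot by name
print(selected)
```

Calling selected.plot() would then draw one line per chosen column.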