I have the following df:
A B C D
1Q18 6.9 0.0 25.0 9.9
2Q17 NaN NaN NaN NaN
2Q18 7.1 0.0 25.0 4.1
3Q17 NaN NaN NaN NaN
3Q18 7.3 0.0 25.0 5.3
4Q17 NaN NaN NaN NaN
4Q18 7.0 0.0 25.0 8.3
And I would like to obtain a graph such as
I first tried Bar(df), but it only plots the first column:
p=Bar(df)
show(p)
I also tried:
p=Bar(popo, values=["A","B"])
show(p)
>raise ValueError("expected an element of either %s, got %r" % (nice_join(self.type_params), value))
ValueError: expected an element of either Column(Float) or Column(String), got array([[ 6.9, 0. ]])
Thank you in advance for letting me know what I am doing wrong.
Cheers
In Bokeh 0.12.6+ it is possible to use a visual dodge:
from bokeh.core.properties import value
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import dodge
df.index = df.index.str.split('Q', expand=True)
df = df.sort_index(level=[1,0])
df.index = df.index.map('Q'.join)
# remove all rows with NaNs, because they cannot be plotted
df = df.dropna()
print (df)
A B C D
1Q18 6.9 0.0 25.0 9.9
2Q18 7.1 0.0 25.0 4.1
3Q18 7.3 0.0 25.0 5.3
4Q18 7.0 0.0 25.0 8.3
output_file("dodged_bars.html")
df = df.reset_index().rename(columns={'index':'qrange'})
data = df.to_dict(orient='list')
idx = df['qrange'].tolist()
source = ColumnDataSource(data=data)
p = figure(x_range=idx, y_range=(0, df[['A','B','C','D']].values.max() + 5),
plot_height=250, title="Report",
toolbar_location=None, tools="")
p.vbar(x=dodge('qrange', -0.3, range=p.x_range), top='A', width=0.2, source=source,
color="#c9d9d3", legend=value("A"))
p.vbar(x=dodge('qrange', -0.1, range=p.x_range), top='B', width=0.2, source=source,
color="#718dbf", legend=value("B"))
p.vbar(x=dodge('qrange', 0.1, range=p.x_range), top='C', width=0.2, source=source,
color="#e84d60", legend=value("C"))
p.vbar(x=dodge('qrange', 0.3, range=p.x_range), top='D', width=0.2, source=source,
color="#ddb7b1", legend=value("D"))
p.x_range.range_padding = 0.2
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
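For readers on a newer Bokeh, here is a minimal sketch of the same dodged bars. It assumes Bokeh 1.4 or later, where `legend_label` is available; the `legend=value(...)` and `plot_height` arguments used above are no longer accepted by recent releases. The loop, the inline sample data, and the omission of an explicit height are illustrative choices, not part of the original answer.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import dodge

# Sample data from the question, with the all-NaN rows already dropped.
df = pd.DataFrame(
    {
        "A": [6.9, 7.1, 7.3, 7.0],
        "B": [0.0, 0.0, 0.0, 0.0],
        "C": [25.0, 25.0, 25.0, 25.0],
        "D": [9.9, 4.1, 5.3, 8.3],
    },
    index=["1Q18", "2Q18", "3Q18", "4Q18"],
)
df.index.name = "qrange"

source = ColumnDataSource(df.reset_index())

p = figure(
    x_range=list(df.index),
    y_range=(0, float(df.to_numpy().max()) + 5),
    title="Report",
    toolbar_location=None,
    tools="",
)

# One vbar call per column; each group is shifted sideways by its offset.
offsets = [-0.3, -0.1, 0.1, 0.3]
colors = ["#c9d9d3", "#718dbf", "#e84d60", "#ddb7b1"]
for col, offset, color in zip(df.columns, offsets, colors):
    p.vbar(
        x=dodge("qrange", offset, range=p.x_range),
        top=col,
        width=0.2,
        source=source,
        color=color,
        legend_label=col,  # assumes Bokeh >= 1.4
    )

p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
```

Call `show(p)` from `bokeh.io` to render the figure.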
Your data is pivoted so I unpivoted it and then went with Bar plot, hope this is what you need:
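import numpy as np
import pandas as pd
from bokeh.charts import Bar  # legacy bokeh.charts API, as used in the question
from bokeh.io import show
a = [6.9, np.nan, 7.1, np.nan, 7.3, np.nan, 7.0]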
b = [0.0, np.nan, 0.0, np.nan, 0.0, np.nan, 0.0]
c = [25.0, np.nan, 25.0, np.nan, 25.0, np.nan, 25.0]
d = [9.9, np.nan, 4.1, np.nan, 5.3, np.nan, 8.3]
df = pd.DataFrame({'A': a, 'B': b, 'C': c, 'D': d}, index =['1Q18', '2Q17', '2Q18', '3Q17', '3Q18', '4Q17', '4Q18'])
df.reset_index(inplace=True)
df = pd.melt(df, id_vars='index').dropna().set_index('index')
p = Bar(df, values='value', group='variable')
show(p)
There are dozens of similar-sounding questions here. I think I've searched them all and could not find a solution to my problem:
I have 2 df: df_c:
CAN-01 CAN-02 CAN-03
CE
ce1 0.84 0.73 0.50
ce2 0.06 0.13 0.05
And df_z:
CAN-01 CAN-02 CAN-03
marker
cell1 0.29 1.5 7
cell2 1.00 3.0 1
I want to join for each 'marker' + 'CE' combination over their column names
Example: cell1 + ce1:
[[0.29, 0.84],[1.5,0.73],[7,0.5], ...]
(Continuing for cell1 + ce2, cell2 + ce1, cell2 + ce2)
I have a working example using two loops and .loc twice, but it takes forever on the full data set.
I think the best thing to build is a MultiIndex DataFrame with some merge/join/concat magic:
CAN-01 CAN-02 CAN-03
Source
0 CE 0.84 0.73 0.50
Marker 0.29 1.5 7
1 CE ...
Marker ...
Sample Code
dc = [['ce1', 0.84, 0.73, 0.5],['ce2', 0.06,0.13,0.05]]
dat_c = pd.DataFrame(dc, columns=['CE', 'CAN-01', 'CAN-02', 'CAN-03'])
dat_c.set_index('CE',inplace=True)
dz = [['cell1', 0.29, 1.5, 7],['cell2', 1, 3, 1]]
dat_z = pd.DataFrame(dz, columns=['marker', "CAN-01", "CAN-02", "CAN-03"])
dat_z.set_index('marker',inplace=True)
Bad/Slow Solution
for ci, c_row in dat_c.iterrows(): # for each CE in CE table
    tmp = []
    for j,colz in enumerate(dat_z.columns[1:]):
        if not colz in dat_c:
            continue
        entry_c = c_row.loc[colz]
        if len(entry_c.shape) > 0:
            continue
        tmp.append([dat_z.loc[marker,colz],entry_c])
IIUC:
use pd.concat()+groupby() (DataFrame.append was removed in pandas 2.0, so pd.concat is used here):
dat_c.index=[f"cell{x+1}" for x in range(len(dat_c))]
df=pd.concat([dat_c, dat_z]).groupby(level=0).agg(list)
output of df:
CAN-01 CAN-02 CAN-03
cell1 [0.84, 0.29] [0.73, 1.5] [0.5, 7.0]
cell2 [0.06, 1.0] [0.13, 3.0] [0.05, 1.0]
If you need a list:
dat_c.index=[f"cell{x+1}" for x in range(len(dat_c))]
lst=pd.concat([dat_c, dat_z]).groupby(level=0).agg(list).to_numpy().tolist()
output of lst:
[[[0.84, 0.29], [0.73, 1.5], [0.5, 7.0]],
[[0.06, 1.0], [0.13, 3.0], [0.05, 1.0]]]
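Note that the answer above pairs rows by position (ce1 with cell1, ce2 with cell2), while the question asks for every 'marker' + 'CE' combination. A possible sketch for the full set of combinations, using NumPy broadcasting instead of nested loops, might look like the following. The names `shared`, `pairs` and `result` are hypothetical, and the dictionary keyed by `(marker, CE)` is just one assumed output shape.

```python
import numpy as np
import pandas as pd

# Sample frames from the question.
dat_c = pd.DataFrame(
    [["ce1", 0.84, 0.73, 0.5], ["ce2", 0.06, 0.13, 0.05]],
    columns=["CE", "CAN-01", "CAN-02", "CAN-03"],
).set_index("CE")
dat_z = pd.DataFrame(
    [["cell1", 0.29, 1.5, 7], ["cell2", 1, 3, 1]],
    columns=["marker", "CAN-01", "CAN-02", "CAN-03"],
).set_index("marker")

# Keep only the columns present in both frames.
shared = dat_z.columns.intersection(dat_c.columns)
z = dat_z[shared].to_numpy(dtype=float)  # shape (n_markers, n_cols)
c = dat_c[shared].to_numpy(dtype=float)  # shape (n_ce, n_cols)

# pairs[i, j, k] == [marker i value, CE j value] for shared column k
pairs = np.stack(np.broadcast_arrays(z[:, None, :], c[None, :, :]), axis=-1)

# One list of [marker, CE] pairs per (marker, CE) combination.
result = {
    (marker, ce): pairs[i, j].tolist()
    for i, marker in enumerate(dat_z.index)
    for j, ce in enumerate(dat_c.index)
}

print(result[("cell1", "ce1")])
```

The 4-D array `pairs` holds everything at once, so on a large data set it may be better to keep working with the array than to build the dictionary.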
I have this code:
fig = plt.figure(figsize=(20,8))
ax1 = fig.add_subplot(111, ylabel='Price in $')
df['Close'].plot(ax=ax1, color='r', lw=2.)
signals[['short_mavg', 'long_mavg']].plot(ax=ax1, lw=2.)
ax1.plot(signals.loc[signals.positions == 1.0].index,
signals.short_mavg[signals.positions == 1.0],
'^', markersize=10, color='m')
ax1.plot(signals.loc[signals.positions == -1.0].index,
signals.short_mavg[signals.positions == -1.0],
'v', markersize=10, color='k')
plt.show()
The problem is that all of the '^' and 'v' markers and the 'Date' values from df are placed on the Y axis =(
I've added all the code from my Jupyter notebook and a .csv sample.
csv data:
2013.12.17,00:00,0.89469,0.89571,0.88817,0.88973,4
2013.12.18,00:00,0.88974,0.89430,0.88200,0.88595,4
code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
df = pd.read_csv("AUDUSD.csv",header = None)
df.columns = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume']
df=df.set_index('Date')
second df:
short_window = 20
long_window = 90
signals = pd.DataFrame(index=df.index)
signals['signal'] = 0.0
#calculating MAs
signals['short_mavg'] = df['Close'].rolling(short_window).mean()
signals['long_mavg'] = df['Close'].rolling(long_window).mean()
signals['signal'][short_window:] = np.where(signals['short_mavg'][short_window:]
> signals['long_mavg'][short_window:], 1.0, 0.0)
signals['positions'] = signals['signal'].diff()
I created a graph with your code using Yahoo Finance currency data. The cause may be that your data does not have a datetime index. Please compare your data with mine.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import yfinance as yf
ticker = yf.Ticker("AUDUSD=X")
df = ticker.history(start='2013-01-01', end='2021-01-01')
short_window = 20
long_window = 90
signals = pd.DataFrame(index=df.index)
signals['signal'] = 0.0
#calculating MAs
signals['short_mavg'] = df['Close'].rolling(short_window).mean()
signals['long_mavg'] = df['Close'].rolling(long_window).mean()
signals['signal'][short_window:] = np.where(signals['short_mavg'][short_window:]
> signals['long_mavg'][short_window:], 1.0, 0.0)
signals['positions'] = signals['signal'].diff()
fig = plt.figure(figsize=(20,8))
ax1 = fig.add_subplot(111, ylabel='Price in $')
df['Close'].plot(ax=ax1, color='r', lw=2.)
signals[['short_mavg', 'long_mavg']].plot(ax=ax1, lw=2.)
ax1.plot(signals.loc[signals.positions == 1.0].index,
signals.short_mavg[signals.positions == 1.0],
'^', markersize=10, color='m')
ax1.plot(signals.loc[signals.positions == -1.0].index,
signals.short_mavg[signals.positions == -1.0],
'v', markersize=10, color='k')
plt.show()
signals
signal short_mavg long_mavg positions
Date
2012-12-31 0.0 NaN NaN NaN
2013-01-01 0.0 NaN NaN 0.0
2013-01-02 0.0 NaN NaN 0.0
2013-01-03 0.0 NaN NaN 0.0
2013-01-04 0.0 NaN NaN 0.0
... ... ... ... ...
2020-12-25 1.0 0.749732 0.727486 0.0
2020-12-28 1.0 0.750791 0.727987 0.0
2020-12-29 1.0 0.751951 0.728454 0.0
2020-12-30 1.0 0.753096 0.728910 0.0
2020-12-31 1.0 0.754453 0.729403 0.0
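If the difference really is the index type, one way to check and fix it is to parse the 'Date' column before setting it as the index. This is a sketch under that assumption; the two CSV rows from the question are read from an in-memory string here as a stand-in for "AUDUSD.csv".

```python
import io

import pandas as pd

# The two sample rows from the question, used in place of the real file.
csv_text = """2013.12.17,00:00,0.89469,0.89571,0.88817,0.88973,4
2013.12.18,00:00,0.88974,0.89430,0.88200,0.88595,4
"""

df = pd.read_csv(io.StringIO(csv_text), header=None)
df.columns = ["Date", "Time", "Open", "High", "Low", "Close", "Volume"]

# Without this step the dates stay plain strings such as "2013.12.17".
df["Date"] = pd.to_datetime(df["Date"], format="%Y.%m.%d")
df = df.set_index("Date")

print(type(df.index).__name__)  # DatetimeIndex
```

With a DatetimeIndex, both `df['Close'].plot(ax=ax1)` and `ax1.plot(signals.loc[...].index, ...)` receive real dates for the x values.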
Here is my first dataframe, df1. For each pair of values in its Starting DOY and Ending DOY columns (for example, 3.0 and 6.0), I want to print the values of the columns By, Bz, Vsw, etc. from another dataframe, df2, for the rows whose DOY column falls in that range.
Here is a simple example of how you can do it:
from pandas import DataFrame
if __name__ == '__main__':
    data1 = {'Starting DOY': [3.0, 3.0, 13.0],
             'Ending DOY': [6.0, 6.0, 15.0]}
    data2 = {'YEAR': [1975, 1975, 1975],
             'DOY': [1.0, 3.0, 6.0],
             'HR': [0, 1, 2],
             'By': [-7.5, -4.0, -3.6],
             'Bz': [0.2, 2.4, -2.3],
             'Nsw': [999.9, 6.2, 5.9],
             'Vsw': [9999.0, 476.0, 482.0],
             'AE': [181, 138, 86]}
    df1 = DataFrame(data1, columns=['Starting DOY',
                                    'Ending DOY'])
    df2 = DataFrame(data2, columns=['YEAR', 'DOY',
                                    'HR', 'By', 'Bz',
                                    'Nsw', 'Vsw', 'AE'])
    for doy in df1.values:
        start_doy = doy[0]
        end_doy = doy[1]
        for val in df2.values:
            year = val[0]
            current_doy = val[1]
            hr = val[2]
            By = val[3]
            Bz = val[4]
            Nsw = val[5]
            Vsw = val[6]
            AE = val[7]
            if start_doy <= current_doy <= end_doy:
                print("For DOY {}".format(current_doy))
                print("By: {}".format(By))
                print("Bz: {}".format(Bz))
                print("Vsw: {}".format(Vsw))
                print("--------------------")
Output:
For DOY 3.0
By: -4.0
Bz: 2.4
Vsw: 476.0
--------------------
For DOY 6.0
By: -3.6
Bz: -2.3
Vsw: 482.0
--------------------
For DOY 3.0
By: -4.0
Bz: 2.4
Vsw: 476.0
--------------------
For DOY 6.0
By: -3.6
Bz: -2.3
Vsw: 482.0
--------------------
I think the easiest way to go about this would be:
>>> df1 = pd.DataFrame([[1,2],[3,4],[5,6]], columns=["Starting DOY", "Ending DOY"])
>>> df2 = pd.DataFrame([[6,5,8, 1.5],[4,3,9, 3.5],[2,1,5, 5.5]], columns=["By", "Bz", "Vsw", "DOY"])
>>> df1.apply(lambda row: df2[(df2['DOY'] >= row['Starting DOY']) & (df2['DOY'] <= row['Ending DOY'])], axis=1)
0 By Bz Vsw DOY
0 6 5 8 1.5
1 By Bz Vsw DOY
1 4 3 9 3.5
2 By Bz Vsw DOY
2 2 1 5 5.5
dtype: object
The best approach also depends on what you want the output for and how you need to format it.
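If a single flat table is more convenient than one sub-frame per row, a cross join followed by a range filter is another option. The sketch below assumes pandas 1.2 or later (for `how="cross"`) and reuses the sample data from the first answer; the names `pairs`, `in_range` and `matches` are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"Starting DOY": [3.0, 3.0, 13.0],
                    "Ending DOY": [6.0, 6.0, 15.0]})
df2 = pd.DataFrame({"YEAR": [1975, 1975, 1975],
                    "DOY": [1.0, 3.0, 6.0],
                    "HR": [0, 1, 2],
                    "By": [-7.5, -4.0, -3.6],
                    "Bz": [0.2, 2.4, -2.3],
                    "Nsw": [999.9, 6.2, 5.9],
                    "Vsw": [9999.0, 476.0, 482.0],
                    "AE": [181, 138, 86]})

# Pair every df1 row with every df2 row (len(df1) * len(df2) rows).
pairs = df1.merge(df2, how="cross")

# Keep the pairs whose DOY lies inside [Starting DOY, Ending DOY].
in_range = pairs["DOY"].between(pairs["Starting DOY"], pairs["Ending DOY"])
matches = pairs.loc[in_range, ["Starting DOY", "Ending DOY", "DOY", "By", "Bz", "Vsw"]]

print(matches)
```

The cross join grows with the product of the two row counts, so this is best suited to frames of moderate size.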
I have a dataframe. I tried a simple function to convert the column names to a list. Surprisingly, I am getting tuples. Why?
My code:
big_list = [[np.nan, 21.0, 3.2, 12.0, 3.24],
[42.0, 23.799999999999997, 6.0, 13.599999999999998, 5.24],
[112.0, 32.199999999999996, 14.400000000000002, 18.4, 11.24],
[189.0, 46.2, 28.400000000000002, 26.400000000000002, 21.240000000000002]]
df= pd.DataFrame(np.array(big_list),index=range(0,4,1),columns=[sns])
df =
ig abc def igh klm
0 NaN 21.0 3.2 12.0 3.24
1 42.0 23.8 6.0 13.6 5.24
2 112.0 32.2 14.4 18.4 11.24
3 189.0 46.2 28.4 26.4 21.24
print(list(df))
Present output:
[('ig',), ('abc',), ('def',), ('igh',), ('klm',)]
Expected output:
['ig','abc','def','igh','klm']
The following code should do it:
df = pd.DataFrame([[np.nan, 21.0, 3.2, 12.0, 3.24]], columns=['ig','abc','def','igh','klm'])
print(list(df.columns))
It gives the following output:
['ig', 'abc', 'def', 'igh', 'klm']
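If your output is different, then the dataframe is probably constructed incorrectly. In the question, columns=[sns] wraps the list of names in another list, which makes pandas build a one-level MultiIndex whose labels are 1-tuples. Passing columns=sns instead gives plain string labels.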
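A short sketch of the difference, in case it helps to see it side by side. Here `names` is a hypothetical stand-in for the `sns` list from the question, and only the first data row is used.

```python
import numpy as np
import pandas as pd

names = ["ig", "abc", "def", "igh", "klm"]  # stand-in for `sns`
data = np.array([[np.nan, 21.0, 3.2, 12.0, 3.24]])

# A list wrapped in another list is read as one level of a MultiIndex.
wrapped = pd.DataFrame(data, columns=[names])
wrapped_labels = list(wrapped)
print(wrapped_labels)  # labels are 1-tuples

# Passing the list directly gives plain string labels.
flat = pd.DataFrame(data, columns=names)
print(list(flat))

# An existing frame can be repaired by taking the only level.
wrapped.columns = wrapped.columns.get_level_values(0)
print(list(wrapped))
```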
I have a dataframe which looks like this:
df = pd.DataFrame({'Pred': [10, 9.5, 9.8], 'Actual': [10.2, 9.9, 9.1], 'STD': [0.1, 0.2, 0.6]})
Pred Actual STD
0 10.0 10.2 0.1
1 9.5 9.9 0.2
2 9.8 9.1 0.6
I want to make a bar plot with error bars using STD only on the Pred column, and not on the Actual column. So far I have this:
df.plot.bar(yerr='STD', capsize=4)
but this adds the error bars to both Actual and Pred. Is there a straightforward way to tell Pandas to add the error bars to a single column?
You can do it with:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
errors=df.STD.to_frame('Pred')
df[['Actual','Pred']].plot.bar(yerr=errors, ax=ax)
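For comparison, the same chart can be drawn with matplotlib directly, where `yerr` is passed only to the Pred bars. This is a sketch rather than the pandas way; the bar width and the variable names `pred_bars` and `actual_bars` are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'Pred': [10, 9.5, 9.8],
                   'Actual': [10.2, 9.9, 9.1],
                   'STD': [0.1, 0.2, 0.6]})

x = np.arange(len(df))  # one group of bars per row
width = 0.4

fig, ax = plt.subplots()
# Only the Pred bars receive error bars.
pred_bars = ax.bar(x - width / 2, df['Pred'], width,
                   yerr=df['STD'], capsize=4, label='Pred')
actual_bars = ax.bar(x + width / 2, df['Actual'], width, label='Actual')

ax.set_xticks(x)
ax.set_xticklabels(df.index)
ax.legend()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or save the figure.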