Return dataframe with columns in Python

I'm trying to create a function to calculate Heikin Ashi candles (for financial analysis).
My indicators.py file looks like:

from pandas import DataFrame, Series

def heikinashi(dataframe):
    open = dataframe['open']
    high = dataframe['high']
    low = dataframe['low']
    close = dataframe['close']
    ha_close = 0.25 * (open + high + low + close)
    ha_open = 0.5 * (open.shift(1) + close.shift(1))
    ha_low = max(high, ha_open, ha_close)
    ha_high = min(low, ha_open, ha_close)
    return dataframe, ha_close, ha_open, ha_low, ha_high
And in my main script I'm trying to call this function in the most effective way to get back those four series: ha_close, ha_open, ha_low and ha_high.
My main script looks something like:
import indicators as ata
ha_close, ha_open, ha_low, ha_high = ata.heikinashi(dataframe)
dataframe['ha_close'] = ha_close(dataframe)
dataframe['ha_open'] = ha_open(dataframe)
dataframe['ha_low'] = ha_low(dataframe)
dataframe['ha_high'] = ha_high(dataframe)
but for some strange reason I cannot find the dataframes
What would be the most efficient way to do this, code-wise and with minimal calls?
I expect dataframe['ha_close'] etc. to contain the correct data as computed in the function.
Any advice appreciated
Thank you!

In heikinashi() you are returning five elements, dataframe and the four ha_* series; however, in your main script you are only assigning the return values to four variables.
Try changing the main script to:
dataframe, ha_close, ha_open, ha_low, ha_high = ata.heikinashi(dataframe)
Apart from that, it looks like dataframe which you are passing into heikinashi() is not defined anywhere.
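If the goal is simply to end up with the ha_* columns on the dataframe in one call, a minimal sketch of an alternative is below. It assumes the standard Heikin Ashi definitions; note that the question's code also swaps min and max for ha_low/ha_high, and that the built-in max()/min() don't work element-wise on Series (pandas' .max(axis=1)/.min(axis=1) do):

```python
import pandas as pd

def heikinashi(dataframe):
    """Return a copy of the input with Heikin Ashi columns added."""
    df = dataframe.copy()
    df['ha_close'] = 0.25 * (df['open'] + df['high'] + df['low'] + df['close'])
    # Simplified (non-recursive) open, as in the question
    df['ha_open'] = 0.5 * (df['open'].shift(1) + df['close'].shift(1))
    # Element-wise max/min across columns; built-in max()/min() on Series
    # would raise, and the original swapped high/low here
    df['ha_high'] = df[['high', 'ha_open', 'ha_close']].max(axis=1)
    df['ha_low'] = df[['low', 'ha_open', 'ha_close']].min(axis=1)
    return df

dataframe = pd.DataFrame({'open': [10.0, 11.0], 'high': [12.0, 13.0],
                          'low': [9.0, 10.0], 'close': [11.0, 12.0]})
dataframe = heikinashi(dataframe)
print(dataframe[['ha_close', 'ha_open', 'ha_high', 'ha_low']])
```

This way the main script is a single assignment, dataframe = heikinashi(dataframe), with no unpacking to get wrong.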


np.where is not calculating correctly?

I am using the np.where function to calculate Supertrend (a stock-market indicator). For this I need to do a simple calculation, but np.where is giving wrong results for most of the items. I have done the same calculations in Excel too, and I am not sure why np.where gives a wrong calculation for the Upper_Band column.
Here is the Excel calculation, which I think is correct, and here is the Excel sheet generated after running the Python code; from S.NO 6 onwards the calculations differ.
This is the actual Python code:
import numpy as np
data["High-Low"] = data["High"] - data["Low"]
data["Close-low"] = abs(data["Close"].shift(1) - data["Low"])
data["Close-High"] = abs(data["Close"].shift(1) - data["High"])
data["True_Range"] = data[["High-Low", "Close-low", "Close-High"]].max(axis=1)
data["ATR"] = data["True_Range"].rolling(window=10).mean()
data["Basic_Upper_Band"] = (data["High"] + data["Low"])/2 + (data["ATR"]*2)
data["Basic_Lower_Band"] = (data["High"] + data["Low"])/2 - (data["ATR"]*2)
data["Upper_Band"] = 0
data["Lower_Band"] = 0
PreviousUB = data["Upper_Band"].shift(1)
BasicUB = data["Basic_Upper_Band"]
PreviousClose = data["Close"].shift(1)
data["Upper_Band"] = np.where((PreviousUB > BasicUB) | (PreviousUB < PreviousClose),BasicUB,PreviousUB)
Here is the output in python format too
The main calculation in the code is data["Upper_Band"]. I am using the same formula in Excel and through np.where, but it gives me different results.
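One likely culprit, as a guess from the code shown: Upper_Band is initialised to 0 before PreviousUB = data["Upper_Band"].shift(1) is taken, so np.where ends up comparing every row against 0/NaN rather than against the previously computed final upper band. Because each row's final band depends on the prior row's result, this recurrence generally cannot be expressed as one vectorised np.where; a plain loop (a sketch below, with toy numbers and the same column names and condition as the question) makes the dependency explicit:

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real frame (column names as in the question)
data = pd.DataFrame({'Basic_Upper_Band': [105.0, 104.0, 106.0, 103.0],
                     'Close': [100.0, 106.0, 101.0, 99.0]})

upper = np.empty(len(data))
upper[0] = data['Basic_Upper_Band'].iloc[0]  # seed with the first basic band
for i in range(1, len(data)):
    prev_ub = upper[i - 1]                   # previously *computed* final band
    basic_ub = data['Basic_Upper_Band'].iloc[i]
    prev_close = data['Close'].iloc[i - 1]
    # Same condition as the np.where: take the basic band when it drops
    # below the previous final band, or when price closed above the band
    if basic_ub < prev_ub or prev_close > prev_ub:
        upper[i] = basic_ub
    else:
        upper[i] = prev_ub
data['Upper_Band'] = upper
print(data['Upper_Band'].tolist())
```

This mirrors what the Excel sheet does cell by cell, which is why Excel and the single-shot np.where disagree from some row onwards.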

Is there a faster method to calculate implied volatility using mibian module for millions of rows in a csv/xl file?

My situation:
The CSV file has been converted to a data frame df5, and all the columns used in the for loop below are of float type. This code works, but it takes many hours to process just 30,000 rows.
What I want from my situation:
I need to do the same operation on millions of rows and I am looking for fixes/alternate solutions that make it considerably faster.
Below is the code I am using currently:
for row in np.arange(0, len(df5)):
    underlyingPrice = df5.iloc[row]['CLOSE_y']
    strikePrice = df5.iloc[row]['STRIKE_PR']
    interestRate = 10
    dayss = df5.iloc[row]['Days']
    optPrice = df5.iloc[row]['CLOSE_x']
    result = BS([underlyingPrice, strikePrice, interestRate, dayss], callPrice=optPrice)
    df5.iloc[row, df5.columns.get_loc('IV')] = result.impliedVolatility
Your loop takes values from each row to build another column, IV.
This can be done much faster with the apply method, which lets you run a function on each row (or column) to calculate a result.
Something like this:
def useBS(row):
    underlyingPrice = row['CLOSE_y']
    strikePrice = row['STRIKE_PR']
    interestRate = 10
    dayss = row['Days']
    optPrice = row['CLOSE_x']
    result = BS([underlyingPrice, strikePrice, interestRate, dayss], callPrice=optPrice)
    return result.impliedVolatility

df5['IV'] = df5.apply(useBS, axis=1)

Python Loop Addition

No matter what I do I don't seem to be able to add all the base volumes and quote volumes together easily! I want to end up with a total base volume and a total quote volume of all the data in the data frame. Can someone help me on how you can do this easily?
I have tried summing and saving the data in a dictionary first and then adding it but I just don't seem to be able to make this work!
import urllib.request
import pandas as pd
import json

def call_data():  # Call data from Poloniex
    global df
    datalink = 'https://poloniex.com/public?command=returnTicker'
    df = urllib.request.urlopen(datalink)
    df = df.read().decode('utf-8')
    df = json.loads(df)
    global current_eth_price
    for k, v in df.items():
        if 'ETH' in k:
            if 'USDT_ETH' in k:
                current_eth_price = round(float(v['last']), 2)
    print("Current ETH Price $:", current_eth_price)

def calc_volumes():  # Calculate the base & quote volumes
    global volume_totals
    for k, v in df.items():
        if 'ETH' in k:
            basevolume = float(v['baseVolume']) * current_eth_price
            quotevolume = float(v['quoteVolume']) * float(v['last']) * current_eth_price
            if quotevolume > 0:
                percentages = (quotevolume - basevolume) / basevolume * 100
                volume_totals = {'key': [k],
                                 'basevolume': [basevolume],
                                 'quotevolume': [quotevolume],
                                 'percentages': [percentages]}
                print("volume totals:", volume_totals)
                print("#" * 8)

call_data()
calc_volumes()
A few notes:
For the next 2 years, don't use the global keyword for anything.
Put function documentation in a docstring under the def line.
Using the requests library would be much easier than urllib. However ...
pandas can fetch the JSON and parse it all in one step.
OK, it doesn't have to be as split up as this; I'm just showing you how to properly pass variables around instead of using globals.
I could not find "ETH" by itself. In the data they sent there are these 3: ['BTC_ETH', 'USDT_ETH', 'USDC_ETH']. So I used "USDT_ETH"; I hope the substitution is OK.
calc_volumes seems to do the calculation and also act as some sort of filter (it's picky as to what it prints). This function needs to be broken up into its two separate jobs: printing and calculating. (Maybe there was a filter step, but I leave that for homework.)
import pandas as pd

eth_price_url = 'https://poloniex.com/public?command=returnTicker'

def get_data(url=''):
    """Call data from Poloniex and put it in a dataframe"""
    data = pd.read_json(url)
    return data

def get_current_eth_price(data=None):
    """Grab the price out of the dataframe"""
    current_eth_price = data['USDT_ETH']['last'].round(2)
    return current_eth_price

def calc_volumes(data=None, current_eth_price=None):
    """Calculate the base & quote volumes"""
    # use the passed-in data, not the global df
    data = data[data.columns[data.columns.str.contains('ETH')]].loc[['baseVolume', 'quoteVolume', 'last']]
    data = data.transpose()
    data[['baseVolume', 'quoteVolume']] *= current_eth_price
    data['quoteVolume'] *= data['last']
    data['percentages'] = (data['quoteVolume'] - data['baseVolume']) / data['quoteVolume'] * 100
    return data

df = get_data(url=eth_price_url)
the_price = get_current_eth_price(data=df)
print(f'the current eth price is: {the_price}')
volumes = calc_volumes(data=df, current_eth_price=the_price)
print(volumes)
This code seems kind of odd and inconsistent... for example, you're importing pandas and calling your variable df but you're not actually using dataframes. If you used df = pd.read_json('https://poloniex.com/public?command=returnTicker', 'index')* to get a dataframe, most of your data manipulation here would become much easier, and wouldn't require any loops either.
For example, the first function's code would become as simple as current_eth_price = df.loc['USDT_ETH','last'].
The second function's code would basically be
eth_rows = df[df.index.str.contains('ETH')]
total_base_volume = (eth_rows.baseVolume * current_eth_price).sum()
total_quote_volume = (eth_rows.quoteVolume * eth_rows['last'] * current_eth_price).sum()
(*The 'index' argument tells pandas to read the JSON dictionary indexed by rows, then columns, rather than columns, then rows.)
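A small self-contained illustration of that 'index' orientation, using an inline JSON string shaped like the Poloniex payload instead of the live URL (field values are made up):

```python
import io
import pandas as pd

# Minimal stand-in for the ticker payload: {pair: {field: value}}
payload = ('{"USDT_ETH": {"last": 290.1, "baseVolume": 1000.0, "quoteVolume": 3.5},'
           ' "BTC_ETH": {"last": 0.03, "baseVolume": 12.0, "quoteVolume": 400.0}}')

# orient='index' makes each outer key a row, each inner key a column
df = pd.read_json(io.StringIO(payload), orient='index')
current_eth_price = df.loc['USDT_ETH', 'last']

eth_rows = df[df.index.str.contains('ETH')]
total_base_volume = (eth_rows.baseVolume * current_eth_price).sum()
print(current_eth_price, total_base_volume)
```

No loops and no globals: the row selection and the sums are single vectorised expressions on the dataframe.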

Formatting time and plotting it

I have the following Excel file, with the timestamp in the format
20180821_2330
1) For a lot of days: how would I format this as a standard time so that I can plot it against the other sensor values?
2) I would like to have one big plot with, for example, sensor 1 readings against all the days. Is that possible?
https://www.mediafire.com/file/m36ha4777d6epvd/median_data.xlsx/file
Is this something you are looking for? I improvised and created an 'n' dictionary which could represent your 'timestamp' column as the data frame. Basically, what I think you should do is apply another function, let's call it 'apply_fun', to the column which stores the 'timestamps': a function which takes each element and transforms it via strptime().
import datetime
import pandas as pd

n = {'timestamp': ['20180822_2330', '20180821_2334', '20180821_2334', '20180821_2330']}
data_series = pd.DataFrame(n)

def format_dates(n):
    x = n.find('_')
    y = datetime.datetime.strptime(n[:x] + n[x+1:], '%Y%m%d%H%M')
    return y

def apply_fun(dataset):
    dataset['timestamp2'] = dataset['timestamp'].apply(format_dates)
    return dataset

print(apply_fun(data_series))
When it comes to the 2nd point, I am not able to reach the site due to the McAfee agent at work, which does not allow me to open it. Once you have the 1st, you can ask about the 2nd separately.
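For larger frames, the per-element apply can be skipped entirely: pd.to_datetime accepts a format string, and the underscore can stay in it, so no find()/slicing is needed. A sketch with the same sample timestamps:

```python
import pandas as pd

data_series = pd.DataFrame({'timestamp': ['20180822_2330', '20180821_2334',
                                          '20180821_2334', '20180821_2330']})
# Vectorised parse: '_' is matched literally inside the format string
data_series['timestamp2'] = pd.to_datetime(data_series['timestamp'],
                                           format='%Y%m%d_%H%M')
print(data_series['timestamp2'].iloc[0])
```

The resulting datetime64 column plots directly on the x-axis with pandas or matplotlib.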

Tracking Error on a number of benchmarks

I'm trying to calculate tracking error for a number of different benchmarks versus a fund that I'm looking at (tracking error is defined as the standard deviation of the percent difference between the fund and the benchmark). The time series for the fund and all the benchmarks are in a data frame that I'm reading from an Excel file. What I have so far is below (the idea being that arg1 represents each benchmark, applied using applymap), but it's returning a KeyError. Any suggestions?
import pandas as pd
import numpy as np

data = pd.read_excel('File_Path.xlsx')

def index_analytics(arg1):
    tracking_err = np.std((data['Fund'] - data[arg1]) / data[arg1])
    return tracking_err

data.applymap(index_analytics)
There are a few things that need to be fixed. First, applymap passes each individual value from all the columns to your function (index_analytics). So arg1 is an individual scalar value from your dataframe, and data[arg1] will always raise a KeyError unless all your values happen to also be column names.
You also shouldn't need apply to do this. Assuming your benchmarks are in the same dataframe, you should be able to do something like this for each benchmark (next time, include a sample of your dataframe):
data['Benchmark1_result'] = (data['Fund'] - data['Benchmark1']) / data['Benchmark1']
And if you want to calculate the standard deviations for all the benchmarks at once, you can do this:
# assume you have a list of all the benchmark columns
benchmark_columns = [list, of, benchmark, columns]
np.std((data['Fund'].values[:, None] - data[benchmark_columns].values) / data[benchmark_columns].values, axis=0)
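A self-contained sketch of that per-benchmark calculation, with hypothetical column names and made-up numbers. The fund column is broadcast against each benchmark column ([:, None] adds the needed axis), and taking the std down each column (axis=0) yields one tracking error per benchmark:

```python
import numpy as np
import pandas as pd

# Hypothetical fund and two benchmark return series
df = pd.DataFrame({'Fund': [1.0, 2.0, 3.0],
                   'Benchmark1': [1.1, 1.9, 3.2],
                   'Benchmark2': [0.9, 2.1, 2.8]})
benchmark_columns = ['Benchmark1', 'Benchmark2']

# (n, 1) fund values broadcast against the (n, k) benchmark matrix
rel_diff = ((df[['Fund']].values - df[benchmark_columns].values)
            / df[benchmark_columns].values)
tracking_errors = np.std(rel_diff, axis=0)   # one value per benchmark
print(dict(zip(benchmark_columns, tracking_errors)))
```

This matches the question's definition (std of the percent difference between fund and benchmark) without any applymap.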
Assuming you're defining tracking error as the square root of the sum of squared active returns:
import pandas as pd
import numpy as np

# Example DataFrame
df = pd.DataFrame({'Portfolio_Returns': [5.00, 1.67], 'Bench_Returns': [2.89, .759]})
df['Active_Return'] = df['Portfolio_Returns'] - df['Bench_Returns']
print(df.head())

list_ = df['Active_Return']
temp_ = []
for val in list_:
    x = val**2
    temp_.append(x)
tracking_error = np.sqrt(sum(temp_))
print(f"Tracking Error is: {tracking_error}")
Or if you want it more compact (because apparently the cool kids do it):
df = pd.DataFrame({'Portfolio_Returns': [5.00, 1.67], 'Bench_Returns': [2.89, .759]})
tracking_error = np.sqrt(sum([val**2 for val in df['Portfolio_Returns'] - df['Bench_Returns']]))
print(f"Tracking Error is: {tracking_error}")
