I have this data entry:
[{'id': 2269396, 'from': 1647086100, 'at': 1647086160000000000, 'to': 1647086160, 'open': 1.072652, 'close': 1.072691, 'min': 1.072641, 'max': 1.072701, 'volume': 0},..]
Applying this pandas indexing:
current = self.getAllCandles(self.active_id, start_candle)
main = pd.DataFrame()
useful_frame = pd.DataFrame()
for candle in current:
    useful_frame = pd.DataFrame(list(candle.values()), index=list(candle.keys())).T.drop(columns=['at'])
    useful_frame = useful_frame.set_index(useful_frame['from']).drop(columns=['id'])
    main = main.append(useful_frame)
main.drop_duplicates()
final_data = main.drop(columns={'to'})
final_data = final_data.loc[~final_data.index.duplicated(keep='first')]
return final_data
After that I have the following result:
                      from      open     close       min       max  volume
from
1.647086e+09  1.647086e+09  1.072652  1.072691  1.072641  1.072701     0.0
...                    ...       ...       ...       ...       ...     ...
Since df.append() will be deprecated, I'm struggling to do the same thing with pd.concat(), but I'm not getting it right. How could I change that?
Thank you all, I made a small modification to the code suggested by our friend Stuart Berg, and it was perfect:
current = self.getAllCandles(self.active_id, start_candle)
frames = []
useful_frame = pd.DataFrame.from_dict(current, orient='columns')
useful_frame = useful_frame.set_index('from')
useful_frame = useful_frame.drop(columns=['at', 'id'])
frames.append(useful_frame)
main = pd.concat(frames).drop_duplicates()
final_data = main.drop(columns='to')
final_data = final_data.loc[~final_data.index.duplicated()]
return final_data
I think this is what you're looking for:
current = self.getAllCandles(self.active_id, start_candle)
frames = []
for candle in current:
    useful_frame = pd.DataFrame.from_dict(candle, orient='columns')
    #useful_frame['from'] = datetime.datetime.fromtimestamp(int(useful_frame['from'])).strftime('%Y-%m-%d %H:%M:%S')
    useful_frame = useful_frame.set_index('from')
    useful_frame = useful_frame.drop(columns=['at', 'id'])
    frames.append(useful_frame)
main = pd.concat(frames).drop_duplicates()
final_data = main.drop(columns='to')
final_data = final_data.loc[~final_data.index.duplicated()]
Create an empty Python list, append each per-candle DataFrame to it, and finally call pandas' concat on that list; this gives you the combined DataFrame.
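In isolation, the pattern looks like this (a minimal sketch, with made-up records standing in for the candle dicts):
import pandas as pd

records = [{'from': 1, 'open': 1.0}, {'from': 2, 'open': 2.0}]  # stand-in data
frames = []                                  # start with an empty list
for rec in records:
    frames.append(pd.DataFrame([rec]).set_index('from'))
df = pd.concat(frames)                       # a single concat at the end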
So I have a JSON file that has multiple levels. Using pandas I can read the first level and put it in a dataframe, but the problem is that, as you can see in the dataframe, the columns Comments and hashtags hold the second level as a list of dictionaries inside a single column. Is there any solution for turning the second-level dictionaries into a dataframe? I tried to use a for loop and json_normalize but it always throws an error. Any suggestion? My code is like this:
import pandas as pd

df2 = pd.read_json("data.json")
cid = []
for x in df2["comments"]:
    cid.append(x.get('cid'))
data = pd.DataFrame({'cid': cid})
If I use this code it throws an error, since I try to access the list by string instead of by index:
AttributeError: 'list' object has no attribute 'get'
Even if I change it to an integer index, I still get a dictionary inside a column. How can I unpack the dict inside the column, or is there another, easier way to do this?
for x in df2["comments"]:
    cid.append(x[0])
data = pd.DataFrame({'cid': cid})
for y in data:
    print(y.get('cid'))
Example of the first row of the dataframe:
[{'cid': '7000061798167266075', 'createTime': 1629828926, 'text': 'Done 🥰', 'diggCount': 1, 'replyCommentTotal': 0, 'uid': '6529379534374092801', 'uniqueId': 'alikayanti'}, {'cid': '6999869922566783771', 'createTime': 1629784228, 'text': 'haloo min, yg udah ikutan di misi sebelumnya boleh ikutan lagi gaa?', 'diggCount': 1, 'replyCommentTotal': 1, 'uid': '6842932775562642433', 'uniqueId': 'fia_654'}, {'cid': '7000248857603588891', 'createTime': 1629872457, 'text': 'bell bottoms maksudnya apa kak?\napakah benar artinya bel bawah?', 'diggCount': 0, 'replyCommentTotal': 2, 'uid': '6960940768727417857', 'uniqueId': 'keterimadiptn1'}, {'cid': '7000322023545455387', 'createTime': 1629889491, 'text': 'syudah🥰', 'diggCount': 0, 'replyCommentTotal': 0, 'uid': '6806645499672839170', 'uniqueId': 'miftahulhaqqu'}, {'cid': '7001271117180977947', 'createTime': 1630110475, 'text': 'kak, perpanjang dong waktu posting videonya :)', 'diggCount': 1, 'replyCommentTotal': 0, 'uid': '6921267441846830082', 'uniqueId': 'elisabetkst'}]
Maybe this solves your problem:
Define the following function, which unnests any JSON:
import json
import pandas as pd

def flatten_nested_json_df(df):
    df = df.reset_index()
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()
    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()
    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []
        for col in dict_columns:
            # expand dicts horizontally into prefixed columns
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns)
        for col in list_columns:
            # expand lists vertically, one row per element
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)
        # recompute which of the new columns still hold lists or dicts
        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()
        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df
and do this:
results = pd.json_normalize(data)
df = pd.DataFrame(results)
outdf = flatten_nested_json_df(df)
which returns:
index cid createTime \
0 0 7000061798167266075 1629828926
1 1 6999869922566783771 1629784228
2 2 7000248857603588891 1629872457
3 3 7000322023545455387 1629889491
4 4 7001271117180977947 1630110475
text diggCount \
0 Done 🥰 1
1 haloo min, yg udah ikutan di misi sebelumnya b... 1
2 bell bottoms maksudnya apa kak?\napakah benar ... 0
3 syudah🥰 0
4 kak, perpanjang dong waktu posting videonya :) 1
replyCommentTotal uid uniqueId
0 0 6529379534374092801 alikayanti
1 1 6842932775562642433 fia_654
2 2 6960940768727417857 keterimadiptn1
3 0 6806645499672839170 miftahulhaqqu
4 0 6921267441846830082 elisabetkst
I found a solution using multiple for loops: after looping over the rows of the targeted column, loop over each dictionary inside it and append the values to lists:
unique_id = []
cid = []
createTime = []
text = []
diggCount = []
replyCommentTotal = []
uid = []

for items in df2["comments"]:
    for x in items:
        unique_id.append(x["uniqueId"])
        cid.append(x["cid"])
        createTime.append(x["createTime"])
        text.append(x["text"])
        diggCount.append(x["diggCount"])
        replyCommentTotal.append(x["replyCommentTotal"])
        uid.append(x["uid"])

data = pd.DataFrame({'unique_id': unique_id,
                     'cid': cid,
                     'createTime': createTime,
                     'text': text,
                     'diggCount': diggCount,
                     'replyCommentTotal': replyCommentTotal,
                     'uid': uid})
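Since each dictionary in these lists is already flat, the same frame can also be built in a single step (a sketch, assuming every element of df2["comments"] is a list of flat dicts like the example row above):
import pandas as pd

# one row per comment dict; the column names come straight from the dict keys
data = pd.DataFrame([comment for comments in df2["comments"] for comment in comments])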
I'm having some difficulty properly constraining the returned solution to min_cal <= sum(calories) <= max_cal. There are combinations that yield solutions where sum(calories) < min_cal. In addition to staying within the calorie range, the solution should yield a list where sum(cost) approaches the budget as closely as possible without exceeding it. Here's some re-contextualized code. I could really use a hand:
menu = [
    {'name': 'Cheese Pizza Slice', 'calories': 700, 'cost': 4},
    {'name': 'House Salad', 'calories': 100, 'cost': 8.5},
    {'name': 'Grilled Shrimp', 'calories': 400, 'cost': 15},
    {'name': 'Beef Brisket', 'calories': 400, 'cost': 12},
    {'name': 'Soda', 'calories': 100, 'cost': 1},
    {'name': 'Cake', 'calories': 300, 'cost': 3},
]

def menu_recommendation(menu, min_cal, max_cal, budget):
    menu = [item for item in menu if item['calories'] <= max_cal and item['cost'] <= budget]
    if len(menu) == 0: return []
    return min((
        [item] + menu_recommendation(menu, min_cal - item['calories'], max_cal - item['calories'], budget - item['cost'])
        for item in menu
    ), key=
        lambda recommendations: [budget - sum(item['cost'] for item in recommendations) and min_cal <= sum(item['calories'] for item in recommendations) <= max_cal, -sum(item['calories'] for item in recommendations)]
    )

recommendation = menu_recommendation(menu, 1000, 1200, 15)
total_cost = sum(item['cost'] for item in recommendation)
total_cals = sum(item['calories'] for item in recommendation)
print(f'recommendation: {recommendation}')
print(f'total cost: {total_cost}')
print(f'total calories: {total_cals}')
For example, the following returns a solution with a total calorie count of 700, which is below the 1000 minimum:
recommendation = menu_recommendation(menu, 1000, 1200, 15)
We can work up something recursive, probably.
def smallest_combo(lst, m, n, z):
    # filter list to remove elements we can't use next without breaking the rules
    lst = [dct for dct in lst if m <= dct['x'] <= n and dct['y'] <= z]
    # recursive base case
    if len(lst) == 0:
        return []
    # go through our list of eligibles
    # simulate 'what would the best possibility be if we picked that one to go with next'
    # then of those results select the one with the sum('y') closest to z
    # (breaking ties with the largest sum('x'))
    return min((
        [dct] + smallest_combo(lst, m - dct['x'], n - dct['x'], z - dct['y'])
        for dct in lst
    ), key=
        lambda com: [z - sum(d['y'] for d in com), -sum(d['x'] for d in com)]
    )
inp = [{'name': 'item1', 'x': 600, 'y': 5},
{'name': 'item2', 'x': 200, 'y': 8},
{'name': 'item3', 'x': 500, 'y': 12.5},
{'name': 'item4', 'x': 0, 'y': 1.5},
{'name': 'item5', 'x': 100, 'y': 1}]
print(smallest_combo(inp, 500, 1500, 25))
# [{'name': 'item3', 'x': 500, 'y': 12.5}, {'name': 'item3', 'x': 500, 'y': 12.5}]
There would be a number of ways to speed this up. First by making a recursive cache, and second by taking a dynamic programming approach instead (i.e. start at the bottom instead of at the top).
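For instance, the first idea could look like this (a sketch, not the code above: the item names are dropped so every argument is hashable for functools.lru_cache):
from functools import lru_cache

@lru_cache(maxsize=None)
def smallest_combo_cached(items, m, n, z):
    # items is a tuple of (x, y) pairs, so the whole argument list is hashable
    items = tuple((x, y) for (x, y) in items if m <= x <= n and y <= z)
    if not items:
        return ()
    return min(
        (((x, y),) + smallest_combo_cached(items, m - x, n - x, z - y)
         for (x, y) in items),
        key=lambda com: [z - sum(y for _, y in com), -sum(x for _, x in com)]
    )

inp = ((600, 5), (200, 8), (500, 12.5), (0, 1.5), (100, 1))
print(smallest_combo_cached(inp, 500, 1500, 25))
# -> ((500, 12.5), (500, 12.5)), the same combination as before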
Here is a dynamic programming solution that builds up a data structure showing all of the (calorie, cost) options we can wind up with along with one item each. We look for the best one meeting the criteria, then find what recommendation that is.
def menu_recommendation(menu, min_cal, max_cal, budget):
    # This finds the best possible solution in pseudo-polynomial time.
    recommendation_tree = {(0, 0.0): None}
    for item in menu:
        # This tree will wind up being the old plus new entries from adding this item.
        new_recommendation_tree = {}
        for key in recommendation_tree.keys():
            calories, cost = key
            new_recommendation_tree[key] = recommendation_tree[key]
            new_key = (calories + item['calories'], cost + item['cost'])
            if new_key not in recommendation_tree and new_key[0] <= max_cal:
                # This is a new calorie/cost combination to look at.
                new_recommendation_tree[new_key] = item
        # And now save the work.
        recommendation_tree = new_recommendation_tree
    # Now we look for the best combination.
    best = None
    for key in recommendation_tree:
        calories, cost = key
        # By construction, we know that calories <= max_cal
        if min_cal <= calories:
            if best is None or abs(budget - cost) < abs(budget - best[1]):
                # We improved!
                best = key
    if best is None:
        return None
    else:
        # We need to follow the tree back to the root to find the recommendation.
        calories, cost = best
        item = recommendation_tree[best]
        answer = []
        while item is not None:
            # This item is part of the menu.
            answer.append(item)
            # And now find where we were before adding this item.
            calories = calories - item['calories']
            cost = cost - item['cost']
            best = (calories, cost)
            item = recommendation_tree[best]
        return answer
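For instance, reusing the menu from the question, the recommendation can be checked like this:
recommendation = menu_recommendation(menu, 1000, 1200, 15)
if recommendation is not None:
    print('total cost:', sum(item['cost'] for item in recommendation))
    print('total calories:', sum(item['calories'] for item in recommendation))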
I came up with this. It's basically a knapsack, but it recursively removes dishes from the menu if they are not suitable for the recommendation:
menu = [
    {'name': 'Cheese Pizza Slice', 'calories': 700, 'cost': 4},
    {'name': 'House Salad', 'calories': 100, 'cost': 8.5},
    {'name': 'Grilled Shrimp', 'calories': 400, 'cost': 15},
    {'name': 'Beef Brisket', 'calories': 400, 'cost': 12},
    {'name': 'Soda', 'calories': 100, 'cost': 1},
    {'name': 'Cake', 'calories': 300, 'cost': 3},
]

def get_price(recommendation):
    return sum(dish["cost"] for dish in recommendation)

def get_calories(recommendation):
    return sum(dish["calories"] for dish in recommendation)

def menu_recommendation(menu, min_cal, max_cal, budget):
    sorted_menu = sorted(menu, key=lambda dish: dish["cost"], reverse=True)
    recommendation = []
    for dish in sorted_menu:
        if dish["cost"] + get_price(recommendation) <= budget:
            recommendation.append(dish)
    if recommendation:
        if get_calories(recommendation) < min_cal:
            sorted_menu.remove(min(recommendation, key=lambda dish: dish["calories"]/dish["cost"]))
            return menu_recommendation(sorted_menu, min_cal, max_cal, budget)
        if get_calories(recommendation) > max_cal:
            sorted_menu.remove(max(recommendation, key=lambda dish: dish["calories"]/dish["cost"]))
            return menu_recommendation(sorted_menu, min_cal, max_cal, budget)
    return recommendation

recommendation = menu_recommendation(menu, 500, 800, 15)
total_cost = sum(item['cost'] for item in recommendation)
total_cals = sum(item['calories'] for item in recommendation)
print(f'recommendation: {recommendation}')
print(f'total cost: {total_cost}')
It removes elements according to the calorie/cost ratio, because the knapsack is applied to the cost.
Please let me know if you have any questions.
I believe I have solved the issue of making recommendations that fell outside of the min_cal / max_cal boundaries, but I still feel like there could be a solution that more closely approaches the budget.
Here is my updated code:
menu = [
    {'name': 'Cheese Pizza Slice', 'calories': 700, 'cost': 4},
    {'name': 'House Salad', 'calories': 100, 'cost': 8.5},
    {'name': 'Grilled Shrimp', 'calories': 400, 'cost': 15},
    {'name': 'Beef Brisket', 'calories': 400, 'cost': 12},
    {'name': 'Soda', 'calories': 100, 'cost': 1},
    {'name': 'Cake', 'calories': 300, 'cost': 3},
]

def menu_recommendation(menu, min_cal, max_cal, budget):
    menu = [item for item in menu if item['calories'] <= max_cal and item['cost'] <= budget]
    if len(menu) == 0: return []
    return min(([item] + menu_recommendation(menu, min_cal - item['calories'], max_cal - item['calories'], budget - item['cost'])
                for item in menu
               ), key=
        lambda recommendations: [budget - sum(item['cost'] for item in recommendations)
                                 and min_cal - sum(item['calories'] for item in recommendations) >= 0
                                 and max_cal - sum(item['calories'] for item in recommendations) >= 0,
                                 -sum(item['calories'] for item in recommendations)]
    )

recommendation = menu_recommendation(menu, 1000, 1200, 15)
total_cost = sum(item['cost'] for item in recommendation)
total_cals = sum(item['calories'] for item in recommendation)
print(f'recommendation: {recommendation}')
print(f'total cost: {total_cost}')
print(f'total calories: {total_cals}')
If anyone has any improvements I'd love to hear them!
So I am trying to get multiple stock prices using pandas and pandas-datareader. If I only try to import one ticker it runs fine, but if I use more than one then an error arises. The code is:
import pandas as pd
import pandas_datareader as web
import datetime as dt
stocks = ['BA', 'AMD']
start = dt.datetime(2018, 1, 1)
end = dt.datetime(2020, 1, 1)
d = web.DataReader(stocks, 'yahoo', start, end)
Though I get the error:
ValueError: Wrong number of items passed 2, placement implies 1
So how do I get around it only allowing one stock to be passed?
So far I have tried using quandl and google instead, which don't work either. I have also tried pdr.get_data_yahoo and yf.download(), and I still get the same issue. Does anyone have any ideas to get around this? Thank you.
EDIT: Full code:
import pandas as pd
import pandas_datareader as web
import datetime as dt
import yfinance as yf
import numpy as np
stocks = ['BA', 'AMD', 'AAPL']
start = dt.datetime(2018, 1, 1)
end = dt.datetime(2020, 1, 1)
d = web.DataReader(stocks, 'yahoo', start, end)
d['sma50'] = np.round(d['Close'].rolling(window=2).mean(), decimals=2)
d['sma200'] = np.round(d['Close'].rolling(window=14).mean(), decimals=2)
d['200-50'] = d['sma200'] - d['sma50']
_buy = -2
d['Crossover_Long'] = np.where(d['200-50'] < _buy, 1, 0)
d['Crossover_Long_Change']=d.Crossover_Long.diff()
d['buy'] = np.where(d['Crossover_Long_Change'] == 1, 'buy', 'n/a')
d['sell'] = np.where(d['Crossover_Long_Change'] == -1, 'sell', 'n/a')
pd.set_option('display.max_rows', 5093)
d.drop(['High', 'Low', 'Close', 'Volume', 'Open'], axis=1, inplace=True)
d.dropna(inplace=True)
#make 2 dataframe
d.set_index(d['Adj Close'], inplace=True)
buy_price = d.index[d['Crossover_Long_Change']==1]
sell_price = d.index[d['Crossover_Long_Change']==-1]
d['Crossover_Long_Change'].value_counts()
profit_loss = (sell_price - buy_price)*10
commision = buy_price*.01
position_value = (buy_price + commision)*10
percent_return = (profit_loss/position_value)*100
percent_rounded = np.round(percent_return, decimals=2)
prices = {
    "Buy Price": buy_price,
    "Sell Price": sell_price,
    "P/L": profit_loss,
    "Return": percent_rounded
}
df = pd.DataFrame(prices)
print('The return was {}%, and profit or loss was ${} '.format(np.round(df['Return'].sum(), decimals=2),
                                                               np.round(df['P/L'].sum(), decimals=2)))
d
I tried 3 stocks in your code and it returned data for all 3, so I'm not sure I understood the problem you're facing.
import pandas as pd
import pandas_datareader as web
import datetime as dt
stocks = ['BA', 'AMD', 'AAPL']
start = dt.datetime(2018, 1, 1)
end = dt.datetime(2020, 1, 1)
d = web.DataReader(stocks, 'yahoo', start, end)
print(d)
Output:
Attributes Adj Close Close ... Open Volume
Symbols BA AMD AAPL BA AMD AAPL ... BA AMD AAPL BA AMD AAPL
Date ...
2018-01-02 282.886383 10.980000 166.353714 296.839996 10.980000 172.259995 ... 295.750000 10.420000 170.160004 2978900.0 44146300.0 25555900.0
2018-01-03 283.801239 11.550000 166.324722 297.799988 11.550000 172.229996 ... 295.940002 11.610000 172.529999 3211200.0 154066700.0 29517900.0
2018-01-04 282.724396 12.120000 167.097290 296.670013 12.120000 173.029999 ... 297.940002 12.100000 172.539993 4171700.0 109503000.0 22434600.0
2018-01-05 294.322296 11.880000 168.999741 308.839996 11.880000 175.000000 ... 296.769989 12.190000 173.440002 6177700.0 63808900.0 23660000.0
2018-01-08 295.570740 12.280000 168.372040 310.149994 12.280000 174.350006 ... 308.660004 12.010000 174.350006 4124900.0 63346000.0 20567800.0
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2019-12-24 331.030457 46.540001 282.831299 333.000000 46.540001 284.269989 ... 339.510010 46.099998 284.690002 4120100.0 44432200.0 12119700.0
2019-12-26 327.968689 46.630001 288.442780 329.920013 46.630001 289.910004 ... 332.700012 46.990002 284.820007 4593400.0 57562800.0 23280300.0
2019-12-27 328.187408 46.180000 288.333313 330.140015 46.180000 289.799988 ... 330.200012 46.849998 291.119995 4124000.0 36581300.0 36566500.0
2019-12-30 324.469513 45.520000 290.044617 326.399994 45.520000 291.519989 ... 330.500000 46.139999 289.459991 4525500.0 41149700.0 36028600.0
2019-12-31 323.833313 45.860001 292.163818 325.760010 45.860001 293.649994 ... 325.410004 45.070000 289.929993 4958800.0 31673200.0 25201400.0
I think the error comes from your moving average and the line
d['sma50'] = np.round(d['Close'].rolling(window=2).mean(), decimals=2)
because d represents 3 stocks. I think you have to separate each stock and compute the moving average separately, as in the sketch below.
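For example, one way to stay inside the combined frame (a sketch, assuming d has the usual two-level columns from the Yahoo reader, e.g. ('Close', 'BA')) is to compute the rolling mean per ticker and concatenate it back under its own top-level label:
import pandas as pd

# d['Close'] is a DataFrame with one column per ticker, so the rolling
# mean is computed for every ticker in a single call
sma50 = d['Close'].rolling(window=50).mean().round(2)
sma50.columns = pd.MultiIndex.from_product([['sma50'], sma50.columns])
d = pd.concat([d, sma50], axis=1)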
EDIT: I tried something for two stocks only (BA and AMD), but it is not the best solution because I'm repeating myself on every line.
I'm just a beginner in Python, but maybe this will help you to find a solution to your problem.
PS: The last line (the printing of the P&L and Return) doesn't work really well.
import pandas as pd
import pandas_datareader as web
import datetime as dt
import numpy as np
stock1 = ['BA']
stock2=['AMD']
start = dt.datetime(2018, 1, 1)
end = dt.datetime(2020, 1, 1)
d1 = web.DataReader(stock1, 'yahoo', start, end)
d2 = web.DataReader(stock2, 'yahoo', start, end)
d1['sma50'] = np.round(d1['Close'].rolling(window=2).mean(), decimals=2)
d2['sma50'] = np.round(d2['Close'].rolling(window=2).mean(), decimals=2)
d1['sma200'] = np.round(d1['Close'].rolling(window=14).mean(), decimals=2)
d2['sma200'] = np.round(d2['Close'].rolling(window=14).mean(), decimals=2)
d1['200-50'] = d1['sma200'] - d1['sma50']
d2['200-50'] = d2['sma200'] - d2['sma50']
_buy = -2
d1['Crossover_Long'] = np.where(d1['200-50'] < _buy, 1, 0)
d2['Crossover_Long'] = np.where(d2['200-50'] < _buy, 1, 0)
d1['Crossover_Long_Change']=d1.Crossover_Long.diff()
d2['Crossover_Long_Change']=d2.Crossover_Long.diff()
d1['buy'] = np.where(d1['Crossover_Long_Change'] == 1, 'buy', 'n/a')
d2['buy'] = np.where(d2['Crossover_Long_Change'] == 1, 'buy', 'n/a')
d1['sell_BA'] = np.where(d1['Crossover_Long_Change'] == -1, 'sell', 'n/a')
d2['sell_AMD'] = np.where(d2['Crossover_Long_Change'] == -1, 'sell', 'n/a')
pd.set_option('display.max_rows', 5093)
d1.drop(['High', 'Low', 'Close', 'Volume', 'Open'], axis=1, inplace=True)
d2.drop(['High', 'Low', 'Close', 'Volume', 'Open'], axis=1, inplace=True)
d2.dropna(inplace=True)
d1.dropna(inplace=True)
d1.set_index("Adj Close",inplace=True)
d2.set_index("Adj Close",inplace=True)
buy_price_BA = np.array(d1.index[d1['Crossover_Long_Change']==1])
buy_price_AMD = np.array(d2.index[d2['Crossover_Long_Change']==1])
sell_price_BA = np.array(d1.index[d1['Crossover_Long_Change']==-1])
sell_price_AMD = np.array(d2.index[d2['Crossover_Long_Change']==-1])
d1['Crossover_Long_Change'].value_counts()
d2['Crossover_Long_Change'].value_counts()
profit_loss_BA = (sell_price_BA - buy_price_BA)*10
profit_loss_AMD = (sell_price_AMD - buy_price_AMD)*10
commision_BA = buy_price_BA*.01
commision_AMD = buy_price_AMD*.01
position_value_BA = (buy_price_BA + commision_BA)*10
position_value_AMD = (buy_price_AMD + commision_AMD)*10
percent_return_BA = np.round(((profit_loss_BA/position_value_BA)*100),decimals=2)
percent_return_AMD = np.round(((profit_loss_AMD/position_value_AMD)*100),decimals=2)
prices_BA = {
    "Buy Price BA": [buy_price_BA],
    "Sell Price BA": [sell_price_BA],
    "P/L BA": [profit_loss_BA],
    "Return BA": [percent_return_BA]}
df = pd.DataFrame(prices_BA)
print('The return was {}%, and profit or loss was ${} '.format(np.round(df['Return BA'].sum(), decimals=2),
                                                               np.round(df['P/L BA'].sum(), decimals=2)))
prices_AMD = {
    "Buy Price AMD": [buy_price_AMD],
    "Sell Price AMD": [sell_price_AMD],
    "P/L AMD": [profit_loss_AMD],
    "Return AMD": [percent_return_AMD]}
df = pd.DataFrame(prices_AMD)
print('The return was {}%, and profit or loss was ${} '.format(np.round(df['Return AMD'].sum(), decimals=2),
                                                               np.round(df['P/L AMD'].sum(), decimals=2)))
It seems like there's a bug in the pandas data reader. I work around it by initialising with one symbol and then setting the symbols property on the instantiated object. After doing that, it works fine to call read() on tmp below.
import pandas_datareader as pdr
all_symbols = ['ibb', 'xly', 'fb', 'exx1.de']
tmp = pdr.yahoo.daily.YahooDailyReader(symbols=all_symbols[0])
# this is a work-around, pdr is broken...
tmp.symbols = all_symbols
data = tmp.read()
import json, os

def load_data(filepath):
    if not os.path.exists(filepath):
        return None
    with open(filepath, 'r') as file:
        return json.load(file)

def get_biggest_bar(data):
    bars = []
    for bar in data:
        bars.append((bar['Cells']['SeatsCount'], bar['Number']))
    max_number = max(bars)[1]
    (item for item in data if item['Number'] == max_number).__next__()
    return item, max_number

def get_smallest_bar(data):
    bars = []
    for bar in data:
        bars.append((bar['Cells']['SeatsCount'], bar['Number']))
    min_number = min(bars)[1]
    (item for item in data if item['Number'] == min_number).__next__()
    return item, min_number

def get_closest_bar(data, longitude, latitude):
    coordinates = []
    def get_distance(point, input_point):
        return ((longitude - input_point[0])**2 + (latitude - input_point[1])**2)**1/2
    for cell in data:
        coordinates.append([cell['Cells']['geoData']['coordinates'], cell['Number']])
    for coor in coordinates:
        coor[0] = get_distance(point, coor[0])
    closest_bar = min(coordinates)[1]
    (item for item in data if item['Number'] == closest_bar).__next__()
    return item, closest_bar

if __name__ == '__main__':
    data = load_data("Bars.json")
    print(get_smallest_bar(data))
    print(get_biggest_bar(data))
    print(get_closest_bar(data, 50.0, 50.0))
And its output is:
(dict_values(['Семёновский переулок, дом 21', 'нет', 'район Соколиная Гора', 'Восточный административный округ', 'да', 177, {'type': 'Point', 'coordinates': [37.717115000077776, 55.78262800012168]}, 'СПБ', 272459722, [{'PublicPhone': '(916) 223-32-98'}], 'SПБ']), 37)
(dict_values(['Семёновский переулок, дом 21', 'нет', 'район Соколиная Гора', 'Восточный административный округ', 'да', 177, {'type': 'Point', 'coordinates': [37.717115000077776, 55.78262800012168]}, 'СПБ', 272459722, [{'PublicPhone': '(916) 223-32-98'}], 'SПБ']), 434)
(dict_values(['Семёновский переулок, дом 21', 'нет', 'район Соколиная Гора', 'Восточный административный округ', 'да', 177, {'type': 'Point', 'coordinates': [37.717115000077776, 55.78262800012168]}, 'СПБ', 272459722, [{'PublicPhone': '(916) 223-32-98'}], 'SПБ']), 170)
As you see, the items are COMPLETELY identical, but they should be different (I tried to divide the functions and run them separately, and they output different items)! Also, you can see the second number in the functions' returns: those are different! What's the matter?!
You are using a generator to get item, but on the next line that variable is not what you think it is: item from inside the generator is out of scope. I would prefer to return the actual generated value. Also, you are getting the closest bar to some point, but not the one you passed into the function.
Thus I think item and point are both global variables that you are using by mistake inside your functions.
I have Python 2.7, so the syntax to get the next value from the generator may be slightly different.
def load_data(filepath):
    data = [
        {'Number': 10, 'Cells': {'SeatsCount': 10, 'geoData': {'coordinates': (10, 10)}}},
        {'Number': 50, 'Cells': {'SeatsCount': 50, 'geoData': {'coordinates': (50, 50)}}},
        {'Number': 90, 'Cells': {'SeatsCount': 90, 'geoData': {'coordinates': (90, 90)}}}
    ]
    return data

def get_biggest_bar(data):
    bars = []
    for bar in data:
        bars.append((bar['Cells']['SeatsCount'], bar['Number']))
    max_number = max(bars)[1]
    g = (item for item in data if item['Number'] == max_number)
    return next(g), max_number

def get_smallest_bar(data):
    bars = []
    for bar in data:
        bars.append((bar['Cells']['SeatsCount'], bar['Number']))
    min_number = min(bars)[1]
    g = (item for item in data if item['Number'] == min_number)
    return next(g), min_number

def get_closest_bar(data, longitude, latitude):
    point = (longitude, latitude)
    coordinates = []
    def get_distance(point, input_point):
        return ((longitude - input_point[0])**2 + (latitude - input_point[1])**2)**1/2
    for cell in data:
        coordinates.append([cell['Cells']['geoData']['coordinates'], cell['Number']])
    for coor in coordinates:
        coor[0] = get_distance(point, coor[0])
    closest_bar = min(coordinates)[1]
    g = (item for item in data if item['Number'] == closest_bar)
    return next(g), closest_bar

if __name__ == '__main__':
    data = load_data("Bars.json")
    print("smallest", get_smallest_bar(data))
    print("biggest", get_biggest_bar(data))
    print("closest", get_closest_bar(data, 50.0, 50.0))
Output:
('smallest', ({'Cells': {'geoData': {'coordinates': (10, 10)}, 'SeatsCount': 10}, 'Number': 10}, 10))
('biggest', ({'Cells': {'geoData': {'coordinates': (90, 90)}, 'SeatsCount': 90}, 'Number': 90}, 90))
('closest', ({'Cells': {'geoData': {'coordinates': (50, 50)}, 'SeatsCount': 50}, 'Number': 50}, 50))
Assign the result of the call to __next__() before returning it, like this:
result = (item for item in data if item['Number'] == closest_bar).__next__()
return result, closest_bar
Answers about what went wrong have already been given. In my opinion the code searching for min and max was not "pythonic", so I'd like to suggest another approach:
(Using sample data from Kenny Ostrom's answer)
data = [{'Number': 10, 'Cells': {'SeatsCount': 10, 'geoData': {'coordinates': (10, 10)}}},
        {'Number': 50, 'Cells': {'SeatsCount': 50, 'geoData': {'coordinates': (50, 50)}}},
        {'Number': 90, 'Cells': {'SeatsCount': 90, 'geoData': {'coordinates': (90, 90)}}}]
biggest = max(data, key=lambda bar: bar['Cells']['SeatsCount'])
smallest = min(data, key=lambda bar: bar['Cells']['SeatsCount'])
For the closest bar, a simple custom key function based on the get_distance from the original code is required, but you get the idea; see the sketch below.
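Something like this, for instance (a sketch, assuming the coordinates are (longitude, latitude) pairs as in the sample data):
import math

def get_closest_bar(data, longitude, latitude):
    # the key is the straight-line distance from (longitude, latitude) to each bar
    def get_distance(bar):
        x, y = bar['Cells']['geoData']['coordinates']
        return math.hypot(longitude - x, latitude - y)
    return min(data, key=get_distance)

print(get_closest_bar(data, 50.0, 50.0))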