How to efficiently iterate through a dictionary? - python

I'm new to Python and programming.
My textbook says I have to do the following problem set:
Create a second purchase summary that accumulates total investment by ticker symbol. In the above sample data, there are two blocks of CAT. These can easily be combined by creating a dict where the key is the ticker and the value is the list of blocks purchased. The program makes one pass through the data to create the dict. A pass through the dict can then create a report showing each ticker symbol and all blocks of stock.
I cannot think of a way, apart from hard-coding, to combine the two 'CAT' entries.
## Stock Reports
stockDict = {"GM":"General Motors", "CAT":"Caterpillar", "EK":"Eastman Kodak",
             "FB":"Facebook"}
# symbol,prices,dates,shares
purchases = [("GM",100,"10-sep-2001",48), ("CAT",100,"01-apr-1999",24),
             ("FB",200,"01-jul-2013",56), ("CAT", 200,"02-may-1999",53)]
# purchase history:
print "Company", "\t\tPrice", "\tDate\n"
for stock in purchases:
    price = stock[1] * stock[3]
    name = stockDict[stock[0]]
    print name, "\t\t", price, "\t", stock[2]
print "\n"
# THIS IS THE PROBLEM SET I NEED HELP WITH:
# accumulate total investment by ticker symbol
byTicker = {}
# create dict
for stock in purchases:
    ticker = stock[0]
    block = [stock]
    if ticker in byTicker:
        byTicker[ticker] += block
    else:
        byTicker[ticker] = block
for i in byTicker.values():
    shares = i[0][3]
    price = i[0][1]
    investment = shares * price
    print investment
Right now, the output is:
4800
11200
2400
This is wrong because it only counts one of the two CAT blocks. The code should be flexible enough that I could add more CAT purchases.

Your problem is in the last part of your code. The part before it, which builds a list of all purchases for each ticker, is fine; the problem is here:
for i in byTicker.values():
    shares = i[0][3]
    price = i[0][1]
    investment = shares * price
    print investment
Here you only use the first purchase (index 0) for each ticker. Instead, try:
for name, blocks in byTicker.items():
    # each block is (symbol, price, date, shares)
    investment = sum(price * shares for _, price, _, shares in blocks)
    print name, investment
This adds up all of the purchases for each ticker, and for your example gives:
CAT 13000
FB 11200
GM 4800
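The assignment in the question asks for a report that shows each ticker together with all of its blocks, not only the total. A minimal sketch, assuming Python 3 and using `collections.defaultdict(list)` in place of the explicit `if ticker in byTicker` test (the function name `blocks_by_ticker` is hypothetical):

```python
from collections import defaultdict

# symbol, price, date, shares (same sample data as the question)
purchases = [("GM", 100, "10-sep-2001", 48), ("CAT", 100, "01-apr-1999", 24),
             ("FB", 200, "01-jul-2013", 56), ("CAT", 200, "02-may-1999", 53)]

def blocks_by_ticker(purchases):
    """One pass over the data: map each ticker to the list of its purchase blocks."""
    by_ticker = defaultdict(list)  # a missing key starts as an empty list
    for block in purchases:
        by_ticker[block[0]].append(block)
    return by_ticker

by_ticker = blocks_by_ticker(purchases)

# Second pass: report every block per ticker, then the ticker's total investment.
for ticker, blocks in by_ticker.items():
    print(ticker)
    for _, price, date, shares in blocks:
        print("   ", date, shares, "@", price)
    total = sum(price * shares for _, price, _, shares in blocks)
    print("    total:", total)
```

Adding a third CAT purchase to `purchases` needs no code change; it simply lands in the CAT list.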

The problem with your code is that you are not iterating over the purchases, but only taking the first element of each ticker's list. That is, byTicker looks something like:
byTicker = {
    "GM": [("GM",100,"10-sep-2001",48)],
    "CAT": [("CAT",100,"01-apr-1999",24), ("CAT", 200,"02-may-1999",53)],
    "FB": [("FB",200,"01-jul-2013",56)]
}
so when you iterate over the values, you get three lists. But when you process each list, you only access its first element:
price = i[0][1]
For the value corresponding to "CAT", i[0] is ("CAT",100,"01-apr-1999",24); you need i[1] as well. Rather than indexing, iterate over all of the purchases:
for company, purchases in byTicker.items():
    investment = 0
    for purchase in purchases:
        investment += purchase[1] * purchase[3]
    print(company, investment)

Maybe something like this:
## Stock Reports
stockDict = {"GM":"General Motors", "CAT":"Caterpillar", "EK":"Eastman Kodak",
             "FB":"Facebook"}
# symbol,prices,dates,shares
purchases = [("GM",100,"10-sep-2001",48), ("CAT",100,"01-apr-1999",24),
             ("FB",200,"01-jul-2013",56), ("CAT", 200,"02-may-1999",53)]
# purchase history:
print "Company", "\t\tPrice", "\tDate\n"
for stock in purchases:
    price = stock[1] * stock[3]
    name = stockDict[stock[0]]
    print name, "\t\t", price, "\t", stock[2]
print "\n"
# THIS IS THE PROBLEM SET I NEED HELP WITH:
# accumulate total investment by ticker symbol
byTicker = {}
# create dict
for stock in purchases:
    ticker = stock[0]
    price = stock[1] * stock[3]
    if ticker in byTicker:
        byTicker[ticker] += price
    else:
        byTicker[ticker] = price
for ticker, price in byTicker.iteritems():
    print ticker, price
The output I get is:
Company Price Date
General Motors 4800 10-sep-2001
Caterpillar 2400 01-apr-1999
Facebook 11200 01-jul-2013
Caterpillar 10600 02-may-1999
GM 4800
FB 11200
CAT 13000
which appears to be correct.
Testing whether a ticker is in the byTicker dict tells you whether a purchase has already been recorded for that stock. If it has, you add to the running total; if not, you start a new one. This is basically what you were doing, except that you were collecting every purchase record for a given stock in the dict, when all you really needed was the cost of each purchase.
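The add-or-start-fresh test can also be folded into one line with `dict.get`, which returns a default (here `0`) when the key is missing. A sketch of the same accumulation in Python 3, with the question's sample data:

```python
# symbol, price, date, shares
purchases = [("GM", 100, "10-sep-2001", 48), ("CAT", 100, "01-apr-1999", 24),
             ("FB", 200, "01-jul-2013", 56), ("CAT", 200, "02-may-1999", 53)]

byTicker = {}
for ticker, price, _, shares in purchases:
    # get() returns 0 the first time a ticker is seen, so no if/else is needed
    byTicker[ticker] = byTicker.get(ticker, 0) + price * shares

for ticker, total in byTicker.items():
    print(ticker, total)
```

In Python 3.7+ dicts keep insertion order, so this prints GM 4800, CAT 13000, FB 11200 in that order.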
You could build the dict the same way you were originally, and then iterate over the items stored under each key, and add them up. Something like this:
totals = []
for ticker in byTicker:
    total = 0
    for purchase in byTicker[ticker]:
        total += purchase[1] * purchase[3]
    totals.append((ticker, total))
for ticker, total in totals:
    print ticker, total
And just for kicks, you could compress it all into one line with generator expressions:
print "\n".join("%s: %d" % (ticker, sum(purchase[1]*purchase[3] for purchase in byTicker[ticker])) for ticker in byTicker)
Neither of these last two is really necessary, though: since you're already iterating through every purchase, you may as well accumulate the total cost for each stock as you go, as in the first example.

Related

pandas returning line number and type?

I have a CSV file, and I'm using Python to get the highest average price of avocados from the data. Everything works until I print the region:
import pandas as pd
avocadoesDB = pd.read_csv("avocado.csv")
avocadoesDB = pd.DataFrame(avocadoesDB)
avocadoesDB = avocadoesDB[['AveragePrice', 'type', 'year', 'region']]
regions = avocadoesDB[['AveragePrice', 'region']]
regionMax = max(regions['AveragePrice'])
region = regions.loc[regions['AveragePrice']==regionMax]
print(f"The highest average price for both types of potatoes is ${regionMax} from {region['region']}.")
Output:
The highest average price for both types of potatoes is $3.25 from 14125 SanFrancisco
Name: region, dtype: object.
Expected:
The highest average price for both types of potatoes is $3.25 from SanFrancisco.
I tried the same approach on a simple dataset and it seems to work. Here's the code snippet:
mx = max(df1['Salary'])
plc = df1.loc[df1['Salary']==mx]['Name']
print('Max Sal : ' + str(plc.iloc[0]))
Output:
Max Sal : Farah
According to this post on Stack Overflow, df1.loc[df1['Salary']==mx]['Name'] returns a Series object, so to retrieve the actual value you take its first element with .iloc[0] (if I understood the post correctly).
So for your code, you can replace
region = regions.loc[regions['AveragePrice']==regionMax]
print(f"The highest average price for both types of potatoes is ${regionMax} from {region['region']}.")
with
region = regions.loc[regions['AveragePrice']==regionMax, 'region'].iloc[0]
print(f"The highest average price for both types of potatoes is ${regionMax} from {region}.")
This should work. Hope this helps!
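A boolean `.loc` filter returns a Series that keeps the original row labels (here `14125`), which is why printing it shows the index and the `Name:`/`dtype:` footer. A minimal sketch on a small made-up frame (the values are illustrative, not from avocado.csv), showing `.iloc[0]` for the first match and `Series.idxmax` as an alternative that returns the row label of the maximum directly:

```python
import pandas as pd

# Illustrative data only; the real frame comes from pd.read_csv("avocado.csv").
regions = pd.DataFrame(
    {"AveragePrice": [1.10, 3.25, 2.00],
     "region": ["Albany", "SanFrancisco", "Boston"]},
    index=[14124, 14125, 14126],
)

regionMax = regions["AveragePrice"].max()

# Option 1: filter, select the column, then take the first match by position.
matches = regions.loc[regions["AveragePrice"] == regionMax, "region"]
region = matches.iloc[0]

# Option 2: idxmax() returns the row label of the (first) maximum value.
region_alt = regions.loc[regions["AveragePrice"].idxmax(), "region"]

print(f"The highest average price is ${regionMax} from {region}.")
```

Plain `matches[0]` would look up the label `0`, which does not exist in this index, so the positional `.iloc[0]` is the safer choice.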

Dataframe - for each row, compare values of two columns, get value of third column on match

I have a pandas dataframe in Python that contains a list of different stocks by ticker symbol, and for each one it also records the current price and low and high price-alert thresholds.
Below is a sample of the dataframe:
TICKER   CURRENT PRICE($)   ALERT PRICE HIGH ($)   ALERT PRICE LOW ($)
AMZN     114                180                    105
APPL     140                110                    190
MSFT     235                340                    210
NOTE: I've excluded the pandas index column above, because that integer can change for the same TICKER depending on the order in which the API request returns the stocks.
For each row in the dataframe, I want to test whether ['CURRENT PRICE($)'] is above ['ALERT PRICE HIGH ($)'] or below ['ALERT PRICE LOW ($)'].
Where the condition is true, I want to pass the 'TICKER' to a print statement that reports that the price alert has been reached.
In pseudo-code it would be along the lines of the below:
for each row in df:
    if CURRENT PRICE($) > ALERT PRICE HIGH ($):
        print('High Price Alert for: ' + TICKER)
    if CURRENT PRICE($) < ALERT PRICE LOW ($):
        print('Low Price Alert for: ' + TICKER)
Being fairly new to dataframes, I'm not sure how to translate this into code that will achieve my goal, or if looping over a dataframe in this way is even the best method for this. Hoping someone can help please.
You can loop over a dataframe, but you should use vectorized code whenever possible. For a dataframe this small, I doubt either method makes a noticeable difference.
Here's how I would do it:
# high_alert and low_alert are Series of True/False values
high_alert = df["CURRENT PRICE($)"] > df["ALERT PRICE HIGH ($)"]
low_alert = df["CURRENT PRICE($)"] < df["ALERT PRICE LOW ($)"]
# df.loc[high_alert, "TICKER"] gets the rows where `high_alert == True`
# and extracts the TICKER column.
# Likewise for df.loc[low_alert, "TICKER"]
print("High Prices Alert for: " + " ".join(df.loc[high_alert, "TICKER"]))
print("Low Prices Alert for: " + " ".join(df.loc[low_alert, "TICKER"]))
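For comparison, the row-by-row version of the pseudo-code in the question could look like the sketch below. It uses `DataFrame.iterrows`, which is fine for a handful of tickers but slower than the vectorized comparison on large frames. The frame is rebuilt from the sample table, keeping the question's column names:

```python
import pandas as pd

df = pd.DataFrame({
    "TICKER": ["AMZN", "APPL", "MSFT"],
    "CURRENT PRICE($)": [114, 140, 235],
    "ALERT PRICE HIGH ($)": [180, 110, 340],
    "ALERT PRICE LOW ($)": [105, 190, 210],
})

high_alerts, low_alerts = [], []
# iterrows() yields (index label, row as a Series) pairs
for _, row in df.iterrows():
    if row["CURRENT PRICE($)"] > row["ALERT PRICE HIGH ($)"]:
        high_alerts.append(row["TICKER"])
        print("High Price Alert for: " + row["TICKER"])
    if row["CURRENT PRICE($)"] < row["ALERT PRICE LOW ($)"]:
        low_alerts.append(row["TICKER"])
        print("Low Price Alert for: " + row["TICKER"])
```

With the sample data only APPL triggers, and it triggers both alerts, because its high threshold (110) is below its low threshold (190).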

Is there any way to get rid of for loops and get the number of stocks as a variable of cash in hand?

I got the following code from DataCamp to build a portfolio of returns for trading in the stock market.
# Set the initial capital
initial_capital= float(100000)
positions = pd.DataFrame(index=final.index).fillna(0.0)
number_of_stocks = 3
positions['signal'] = number_of_stocks*final['signal'] #Buy shares
portfolio = positions.multiply(final['Close'], axis=0)
pos_diff = positions.diff()
portfolio['holdings'] = (positions.multiply(final['Close'], axis=0)).sum(axis=1) # holding amount
portfolio['cash'] = initial_capital - (pos_diff.multiply(final['Close'], axis=0)).sum(axis=1).cumsum() # cash amount
portfolio['total'] = portfolio['cash'] + portfolio['holdings'] # total amount
portfolio['returns'] = portfolio['total'].pct_change() # Return percentages
portfolio['diff'] = pos_diff
portfolio['positions'] = positions['signal']
portfolio.tail()
All I want to do now is replace the fixed number_of_stocks with an amount based on the cash in hand, so that I can trade with all the cash I have each time I buy or sell.
I have tried using nested for loops but did not get any good outputs. Is there any way to get the thing done without using a complex looping structure?
Thanks for your aid.

Plotting (discrete sum over time period) vs. (time period) yields graph with discontinuities

I have some lists related to buying and selling bitcoin.
One holds the price (of a buy or sell) and the other the associated date.
When I plot the total money made (or lost) from my buying/selling over different lengths of time against those lengths of time, the result is 'choppy', which is not what I expected. I think my logic might be wrong.
My raw input lists look like:
dates=['2013-05-12 00:00:00', '2013-05-13 00:00:00', '2013-05-14 00:00:00', ....]
prices=[114.713, 117.18, 114.5, 114.156,...]
# simple moving average of prices, calculated over a short period
sma_short_list = [None, None, None, None, 115.2098, 116.8872, 118.2272, 119.42739999999999, 121.11219999999999, 122.59219999999998....]
# simple moving average of prices, calculated over a longer period
sma_long_list = [...None, None, None, None, 115.2098, 116.8872, 118.2272, 119.42739999999999, 121.11219999999999, 122.59219999999998....]
Based on the moving average cross-overs (which were calculated based on https://stackoverflow.com/a/14884058/2089889) I will either buy or sell the bitcoin at the date/price where crossover occurred.
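The crossover rule used in the question's code (a trade whenever `diff` changes sign between consecutive days) can be isolated into a small helper, which makes it easier to test separately from the profit accounting. A sketch, assuming the SMA lists contain `None` until enough prices exist; the function name `crossover_trades` and the `("buy" | "sell", price)` output format are hypothetical:

```python
def crossover_trades(prices, sma_short, sma_long):
    """Return ("buy" | "sell", price) tuples where the short SMA crosses the long SMA."""
    trades = []
    prev_diff = None
    for price, short, long_ in zip(prices, sma_short, sma_long):
        if short is None or long_ is None:
            continue  # averages are not defined yet for the first days
        diff = short - long_
        if prev_diff is not None:
            if prev_diff < 0 < diff:    # short crosses above long: buy
                trades.append(("buy", price))
            elif prev_diff > 0 > diff:  # short crosses below long: sell
                trades.append(("sell", price))
        prev_diff = diff
    return trades
```

Skipping the leading `None` values also avoids the `TypeError` that subtracting `None` would raise if the window starts before the long average is defined.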
I wanted to plot (how much money this approach would have made me as of today) vs. (how many days ago I started using it).
But the resulting graph is really choppy. At first I thought this was because I have one more buy than sell (or vice versa), so I tried to account for that, but it was still choppy.
NOTE: the following code is called in a loop, for days_ago in reversed(range(0,approach_started_days_ago)):, so each time it executes it should return how much money the approach would have made had I started it days_ago days ago (I call this bank). The choppy plot is days_ago vs. bank.
dates = data_dict[file]['dates']
prices = data_dict[file]['prices']
sma_short_list = data_dict[file]['sma'][str(sma_short)]
sma_long_list = data_dict[file]['sma'][str(sma_long)]
prev_diff = 0
bank = 0.0
buy_amt, sell_amt = 0.0, 0.0
buys, sells, amt, first_tx_amt, last_tx_amt = 0, 0, 0, 0, 0
start, finish = len(dates) - days_ago, len(dates)
for j in range(start, finish):
    diff = sma_short_list[j] - sma_long_list[j]
    amt = prices[j]
    # If a crossover of the moving averages occurred
    if diff * prev_diff < 0:
        if first_tx_amt == 0:
            first_tx_amt = amt
        # BUY
        if diff >= 0 and prev_diff <= 0:
            buys += 1
            bank = bank - amt
            #buy_amt = buy_amt+amt
            #print('BUY ON %s (PRICE %s)'%(dates[j], prices[j]))
        # SELL
        elif diff <= 0 and prev_diff >= 0:
            sells += 1
            bank = bank + amt
            #sell_amt = sell_amt + amt
            #print('SELL ON %s (PRICE %s)'%(dates[j], prices[j]))
    prev_diff = diff
    last_tx_amt = amt
# More buys than sells: the last buy is still open, so add the last price
if buys > sells:
    bank = bank + amt
# More sells than buys: subtract the last price
elif sells > buys:
    bank = bank - amt
#THIS IS RELATED TO SOME OTHER APPROACH I TRIED
#a = (buy_amt) / buys if buys else 0
#b = (sell_amt) / sells if sells else 0
#diff_of_sum_of_avg_tx_amts = a - b
start_date = datetime.now() - timedelta(days=days_ago)
return bank, start_date
I reasoned that the amount in my 'bank' would be the amount I have sold minus the amount I have bought.
But if the first crossover was a sell, I don't want to count it (I am going to assume that the first transaction I make will be a buy).
Then, if the last transaction I make is a buy (negative to my bank), I will count today's price into my 'bank':
if last_tx_type == 'buy':
    sell_amt = sell_amt + prices[-1]  # if the last transaction was a buy, add today's price to the sell amount
if sell_first:
    sell_amt = sell_amt - first_tx_amt  # if the first transaction was a sell, don't count it as money made: it used money I already had
bank = sell_amt - buy_amt
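The accounting described above (sells minus buys, skipping a leading sell and valuing an open buy at today's price) can be written as a small function over the list of trades. A sketch, assuming trades arrive as `("buy" | "sell", price)` tuples in date order; `bank_from_trades` and that input format are hypothetical:

```python
def bank_from_trades(trades, current_price):
    """Profit from alternating buy/sell trades, in the same units as the prices.

    A leading sell is skipped (we assume we start with no bitcoin), and a
    position still open after the last buy is valued at current_price.
    """
    # Skip a leading sell: it would sell coins never bought within this window.
    if trades and trades[0][0] == "sell":
        trades = trades[1:]
    bank = 0.0
    for side, price in trades:
        if side == "buy":
            bank -= price
        else:
            bank += price
    # If the last trade was a buy, value the held coin at today's price.
    if trades and trades[-1][0] == "buy":
        bank += current_price
    return bank
```

For example, `bank_from_trades([("sell", 120.0), ("buy", 100.0), ("sell", 110.0), ("buy", 105.0)], 130.0)` skips the first sell, computes -100 + 110 - 105 = -95, then adds 130 for the open position, giving 35.0.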

How to make dict from list components - python

I have a list of dates:
dates = ['2018-11-13 ', '2018-11-14 ']
and I have a list of weather data for various cities:
weather_data = [('Carbondale', 1875.341, '2018-11-13 '), ('Carbondale', 1286.16, '2018-11-14 '), ('Davenport', 708.5, '2018-11-13 '), ('Davenport', 506.1, '2018-11-14 ')]
The second element of each tuple in weather_data (i[1]) is a climate score, based on the climate data for each day. I have shortened the above lists for the sake of this example.
My goal is to find the city with the lowest climate score for each day. I thought a good way to do that would be to put them in a dictionary.
An example of what I want is...
conditions_dict = {'2018-11-13': [('Carbondale', 1875.341), ('Davenport', 708.5)]}
and my end output would be...
The best weather on 2018-11-13 is in Davenport with a value of 708.5
Basically, if I had a dict with a date as the key, and (city,value) as the value, I could then easily find the lowest value by city for each day.
However, I cannot figure out how to make my dictionary look like this. The part I am really struggling with is matching one date to multiple readings for various cities.
Is using a dictionary even a good way to do this?
You don't really need an intermediate dict with all cities and scores for each date if your goal is to find the minimum score and its city for each date. You can simply iterate through weather_data and keep track, in a dict, of the lowest score so far and its city for each date:
min_score_of_date = {}
for city, score, date in weather_data:
    if date not in min_score_of_date or score < min_score_of_date[date][1]:
        min_score_of_date[date] = (city, score)
Given your sample input, min_score_of_date would become:
{'2018-11-13 ': ('Davenport', 708.5), '2018-11-14 ': ('Davenport', 506.1)}
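If you do want the intermediate dict described in the question (each date mapped to a list of `(city, score)` pairs), `collections.defaultdict(list)` builds it in one pass, and `min` with a `key` function then picks the lowest score per date. A sketch; note that the sample dates carry a trailing space, so they are stripped here:

```python
from collections import defaultdict

weather_data = [('Carbondale', 1875.341, '2018-11-13 '), ('Carbondale', 1286.16, '2018-11-14 '),
                ('Davenport', 708.5, '2018-11-13 '), ('Davenport', 506.1, '2018-11-14 ')]

# date -> list of (city, score) pairs
conditions_dict = defaultdict(list)
for city, score, date in weather_data:
    conditions_dict[date.strip()].append((city, score))

best = {}
for date, readings in conditions_dict.items():
    # key= makes min() compare the readings by score (the second element)
    best[date] = min(readings, key=lambda reading: reading[1])
    city, score = best[date]
    print(f"The best weather on {date} is in {city} with a value of {score}")
```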
Here is another way to go about it, starting from a dict that groups every city's reading by date, in case the lowest values haven't already been picked out for you.
# each date maps to a tuple of (city, score) pairs
conditions = {
    '2018-11-13': (
        ('Carbondale', 1875.341),
        ('Davenport', 708.5),
    )
}
# loop through every date
for date, cities in conditions.items():
    # collect every city's value for this date
    temperatures = [city[1] for city in cities]
    # find the minimum value
    min_temperature = min(temperatures)
    # loop through all cities
    for city in cities:
        # if this city has the minimum value, report it
        if city[1] == min_temperature:
            # city name
            name = city[0]
            # city value
            temperature = str(city[1])
            print(
                "The best weather on " + date
                + " is in " + name
                + " with a value of " + temperature
            )
