I am taking a course in Python and one of the problem sets is as follows:
Read in the contents of the file SP500.txt which has monthly data for 2016 and 2017 about the S&P 500 closing prices as well as some other financial indicators, including the “Long Term Interest Rate”, which is the interest rate paid on 10-year U.S. government bonds.
Write a program that computes the average closing price (the second column, labeled SP500) and the highest long-term interest rate. Both should be computed only for the period from June 2016 through May 2017. Save the results in the variables mean_SP and max_interest.
SP500.txt:
Date,SP500,Dividend,Earnings,Consumer Price Index,Long Interest Rate,Real Price,Real Dividend,Real Earnings,PE10
1/1/2016,1918.6,43.55,86.5,236.92,2.09,2023.23,45.93,91.22,24.21
2/1/2016,1904.42,43.72,86.47,237.11,1.78,2006.62,46.06,91.11,24
3/1/2016,2021.95,43.88,86.44,238.13,1.89,2121.32,46.04,90.69,25.37
4/1/2016,2075.54,44.07,86.6,239.26,1.81,2167.27,46.02,90.43,25.92
5/1/2016,2065.55,44.27,86.76,240.23,1.81,2148.15,46.04,90.23,25.69
6/1/2016,2083.89,44.46,86.92,241.02,1.64,2160.13,46.09,90.1,25.84
7/1/2016,2148.9,44.65,87.64,240.63,1.5,2231.13,46.36,91,26.69
8/1/2016,2170.95,44.84,88.37,240.85,1.56,2251.95,46.51,91.66,26.95
9/1/2016,2157.69,45.03,89.09,241.43,1.63,2232.83,46.6,92.19,26.73
10/1/2016,2143.02,45.25,90.91,241.73,1.76,2214.89,46.77,93.96,26.53
11/1/2016,2164.99,45.48,92.73,241.35,2.14,2241.08,47.07,95.99,26.85
12/1/2016,2246.63,45.7,94.55,241.43,2.49,2324.83,47.29,97.84,27.87
1/1/2017,2275.12,45.93,96.46,242.84,2.43,2340.67,47.25,99.24,28.06
2/1/2017,2329.91,46.15,98.38,243.6,2.42,2389.52,47.33,100.89,28.66
3/1/2017,2366.82,46.38,100.29,243.8,2.48,2425.4,47.53,102.77,29.09
4/1/2017,2359.31,46.66,101.53,244.52,2.3,2410.56,47.67,103.74,28.9
5/1/2017,2395.35,46.94,102.78,244.73,2.3,2445.29,47.92,104.92,29.31
6/1/2017,2433.99,47.22,104.02,244.96,2.19,2482.48,48.16,106.09,29.75
7/1/2017,2454.1,47.54,105.04,244.79,2.32,2504.72,48.52,107.21,30
8/1/2017,2456.22,47.85,106.06,245.52,2.21,2499.4,48.69,107.92,29.91
9/1/2017,2492.84,48.17,107.08,246.82,2.2,2523.31,48.76,108.39,30.17
10/1/2017,2557,48.42,108.01,246.66,2.36,2589.89,49.05,109.4,30.92
11/1/2017,2593.61,48.68,108.95,246.67,2.35,2626.9,49.3,110.35,31.3
12/1/2017,2664.34,48.93,109.88,246.52,2.4,2700.13,49.59,111.36,32.09
My solution (correct but not optimal):
file = open("SP500.txt", "r")
content = file.readlines()
# List that will hold the range of months we need
data = []
for line in content:
    # Get a list of values for each line
    values = line.split(',')
    # Keep lines dated June through December 2016
    for i in range(6, 13):
        month_range = f"{i}/1/2016"
        if month_range == values[0]:
            data.append(values)
    # Keep lines dated January through May 2017
    for i in range(1, 6):
        month_range = f"{i}/1/2017"
        if month_range == values[0]:
            data.append(values)
sum_total = 0
max_interest = 0
# Loop through the data of our required months
for entry in data:
    # Get the sum total
    sum_total += float(entry[1])
    # Find the highest interest rate in list
    if max_interest < float(entry[5]):
        max_interest = float(entry[5])
mean_SP = sum_total / len(data)
I'm teaching myself these concepts and I would love to learn a better way of implementing this solution. My code seems close to hard coding (it matches the exact date in values[0]) and I imagine it would be error prone for bigger problems. The excessive looping in particular seems wasteful for such a simple problem.
Thanks in advance.
EDIT:
New code (based on Deepak Tripathi's answer):
with open('SP500.txt') as f:
    lines = f.readlines()
lines = [line.rstrip().split(",") for line in lines]
date_index, spf_index, long_interest_rate = 0, 1, 5
start_year, end_year = 2016, 2017
start_month, end_month = 6, 5
mean_SP, max_interest = 0, -1000 # Some random negative number
total_entries = 0
for line in lines[1:]:
    date_values = line[date_index].split('/')
    if (int(date_values[2]) == start_year and int(date_values[0]) >= start_month) or (int(date_values[2]) == end_year and int(date_values[0]) <= end_month):
        total_entries += 1
        mean_SP += float(line[spf_index])
        max_interest = max(max_interest, float(line[long_interest_rate]))
mean_SP /= total_entries
print(mean_SP, max_interest)
I think you can improve it by storing the column indices in variables and filtering on the date in a single pass:
from datetime import datetime

with open('temp.txt') as f:
    lines = [line.rstrip().split(",") for line in f]
date_index, spf_index, long_interest_rate = 0, 1, 5
# Parse the dates rather than comparing strings: as text, "10/1/2016" sorts before "6/1/2016"
date_format = "%m/%d/%Y"
start_date, end_date = datetime(2016, 6, 1), datetime(2017, 5, 31)
total, count, max_interest = 0.0, 0, float("-inf")
for line in lines[1:]:
    if start_date <= datetime.strptime(line[date_index], date_format) <= end_date:
        total += float(line[spf_index])
        count += 1
        max_interest = max(max_interest, float(line[long_interest_rate]))
mean_SP = total / count  # divide by the rows in range, not by every row in the file
print(mean_SP, max_interest)
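As a further sketch, assuming the file layout shown in the question, the standard csv module can look columns up by header name so that no index variables are needed, and datetime.strptime makes the Date column comparable. The function name summarize, its parameters, and the short inline sample are illustrative only.

```python
import csv
from datetime import datetime


def summarize(lines, start, end):
    """Return (mean SP500 close, max long interest rate) for start <= date <= end.

    `lines` is any iterable of CSV text lines including the header row,
    for example an open file object.
    """
    closes = []
    rates = []
    for row in csv.DictReader(lines):
        date = datetime.strptime(row["Date"], "%m/%d/%Y")
        if start <= date <= end:
            closes.append(float(row["SP500"]))
            rates.append(float(row["Long Interest Rate"]))
    return sum(closes) / len(closes), max(rates)


# A few rows copied from the question's SP500.txt, kept inline for the example
sample = [
    "Date,SP500,Dividend,Earnings,Consumer Price Index,Long Interest Rate,Real Price,Real Dividend,Real Earnings,PE10",
    "5/1/2016,2065.55,44.27,86.76,240.23,1.81,2148.15,46.04,90.23,25.69",
    "6/1/2016,2083.89,44.46,86.92,241.02,1.64,2160.13,46.09,90.1,25.84",
    "7/1/2016,2148.9,44.65,87.64,240.63,1.5,2231.13,46.36,91,26.69",
    "6/1/2017,2433.99,47.22,104.02,244.96,2.19,2482.48,48.16,106.09,29.75",
]
mean_SP, max_interest = summarize(sample, datetime(2016, 6, 1), datetime(2017, 5, 31))
print(mean_SP, max_interest)
```

With the real file, the same function could be given the open file object instead of the sample list. Only the two middle rows fall inside the requested period, so the first and last sample rows are ignored.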
I have a text file containing some stocks, their prices and other details. I am trying to print the stock with the lowest price along with the name of the company. Here is my code:
stocks = open("P:\COM661\stocks.txt")
name_lowest = ""
price_lowest = 0
for line in stocks:
    rows = line.split("\t")
    price = float(rows[2])
    if price>price_lowest:
        price_lowest = price
        name_lowest = rows[1]
print(name_lowest + "\t" + str(price_lowest))
I'm trying to go through the file and compare each numeric value to the one before it to see if it is higher or lower and then at the end it should have saved the lowest price and print it along with the name of the company.
Instead it prints the value of the last company in the file along with its name.
How can I fix this?
You made two mistakes.
First, you initialised price_lowest to 0, so no positive price can ever be lower than it.
You should initialise it to the largest float available in Python:
import sys
price_lowest = sys.float_info.max
Alternatively, you could initialise it to the first price in the file.
Second, your if statement should be:
if price < price_lowest:
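Putting both fixes together, a minimal sketch might look like the following. It assumes, as in the question's code, tab-separated lines with the company name in column 1 and the price in column 2. The function name find_lowest and the sample rows are made up for illustration.

```python
import sys


def find_lowest(lines):
    """Return (name, price) of the row with the lowest price."""
    name_lowest = ""
    price_lowest = sys.float_info.max  # larger than any real price
    for line in lines:
        rows = line.rstrip("\n").split("\t")
        price = float(rows[2])
        if price < price_lowest:  # keep the smaller value
            price_lowest = price
            name_lowest = rows[1]
    return name_lowest, price_lowest


# Hypothetical sample rows: id, company name, price
sample = ["1\tAcme\t12.50\n", "2\tGlobex\t7.25\n", "3\tInitech\t9.00\n"]
name, price = find_lowest(sample)
print(name + "\t" + str(price))
```

Passing the open file object in place of the sample list would apply the same loop to the real data.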
Initialize:
price_lowest = 999999 # start with absurdly high value, or take first one
Plus, your if check is reversed.
It should be:
if price < price_lowest:
Others already suggested a solution that fixes your current code. However, using Python you can have a shorter solution:
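with open('file') as f:
    # Column positions follow the question's code: name in column 1, price in column 2
    rows = (line.rstrip('\n').split('\t') for line in f)
    print(min(((row[1], float(row[2])) for row in rows), key=lambda t: t[1]))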
Your "if" logic is backwards; it should be price < price_lowest.
Just make a small adjustment: start price_lowest at None, set it to the first price you encounter, and compare from there on.
stocks = open(r"P:\COM661\stocks.txt")
name_lowest = ""
price_lowest = None
for line in stocks:
    rows = line.split("\t")
    price = float(rows[2])
    if price_lowest is None:
        # First row: nothing to compare against yet
        price_lowest = price
        name_lowest = rows[1]
    elif price < price_lowest:
        price_lowest = price
        name_lowest = rows[1]
print(name_lowest + "\t" + str(price_lowest))
I have a huge CSV file (524 MB; Notepad takes 4 minutes to open it) whose formatting I need to change. Currently it looks like this:
1315922016 5.800000000000 1.000000000000
1315922024 5.830000000000 3.000000000000
1315922029 5.900000000000 1.000000000000
1315922034 6.000000000000 20.000000000000
1315924373 5.950000000000 12.452100000000
The lines are separated by a newline character; when I paste the data into Excel it splits into rows. I would have done this with Excel functions, but the file is too big to be opened.
The first value is the number of seconds since 1-01-1970, the second is the price, and the third is the volume.
I need it to be like this:
01-01-2009 13:55:59 5.800000000000 1.000000000000 01-01-2009 13:56:00 5.830000000000 3.000000000000
etc.
Records need to be divided by a space. Sometimes there are multiple values of price from the same second like this:
1328031552 6.100000000000 2.000000000000
1328031553 6.110000000000 0.342951630000
1328031553 6.110000000000 0.527604200000
1328031553 6.110000000000 0.876088370000
1328031553 6.110000000000 0.971026920000
1328031553 6.100000000000 0.965781090000
1328031589 6.150000000000 0.918752490000
1328031589 6.150000000000 0.940974100000
When this happens, I need the code to take the average price for that second and save just one price per second.
These are bitcoin transactions, which didn't happen every second when BTC started.
When there is no record for a given second, a new record needs to be created for that second, with the price and volume copied from the last known record.
Then save everything to a new txt file.
I can't seem to do it, I've been trying to write a converter in python for hours, please help.
shlex is a lexical parser. We use it to pick the numbers from the input one at a time. The generator function Records groups these into lists where the first element of the list is an integer and the other elements are floating-point numbers.
The loop reads the results of records and averages on times as necessary. It also prints two outputs to a line.
import time
from shlex import shlex

lexer = shlex(instream=open('temp.txt'), posix=False)
lexer.wordchars = r'0123456789.\n'
lexer.whitespace = ' \n'
lexer.whitespace_split = True

def Records():
    record = []
    while True:
        token = lexer.get_token()
        if token:
            token = token.strip()
            if token:
                record.append(token)
                if len(record)==3:
                    record[0] = int(record[0])
                    record[1] = float(record[1])
                    record[2] = float(record[2])
                    yield record
                    record=[]
            else:
                break
        else:
            break

def conv_time(t):
    return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(t))

records = Records()
pos = 1
current_date, price, volume = next(records)
price_sum = price
volume_sum = volume
count = 1
for raw_date, price, volume in records:
    if raw_date == current_date:
        price_sum += price
        volume_sum += volume
        count += 1
    else:
        print(conv_time(current_date), price_sum/count, volume_sum/count, end=' ' if pos else '\n')
        pos = (pos+1)%2
        current_date = raw_date
        price_sum = price
        volume_sum = volume
        count = 1
print(conv_time(current_date), price_sum/count, volume_sum/count, end=' ' if pos else '\n')
Here are the results. You might need to do something about the number of digits to the right of the decimal point.
2011-09-13 09:53:36 5.8 1.0 2011-09-13 09:53:44 5.83 3.0
2011-09-13 09:53:49 5.9 1.0 2011-09-13 09:53:54 6.0 20.0
2011-09-13 10:32:53 5.95 12.4521 2012-01-31 12:39:12 6.1 2.0
2012-01-31 12:39:13 6.108 0.736690442 2012-01-31 12:39:49 6.15 0.9298632950000001
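One way to control the digits is a format specifier, sketched here separately from the code above. The question's sample data carries 12 decimal places, so the hypothetical helper format_record assumes that width is what is wanted. It also uses time.gmtime (UTC) rather than time.localtime so that the result does not depend on the machine's time zone, which is why the hour differs from the local-time results shown above.

```python
import time


def format_record(timestamp, price, volume):
    """Format one record with a fixed 12 digits after the decimal point."""
    stamp = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(timestamp))
    return f"{stamp} {price:.12f} {volume:.12f}"


print(format_record(1328031589, 6.15, 0.9298632950000001))
# prints: 2012-01-31 17:39:49 6.150000000000 0.929863295000
```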
1) Reading a single line from a file
data = {}
with open(<path to file>) as fh:
    while True:
        line = fh.readline()[:-1]
        if not line: break
        values = line.split(' ')
        for n in range(0, len(values), 3):
            dt, price, volumen = values[n:n+3]
2) Checking if it's the next second after the last record's
If so, adding the price and volumen values to a variable and increasing a counter for later use in calculating the average
3) If the second is not the next second, copy values of last price and volumen.
            if dt not in data:
                data[dt] = []
            data[dt].append((price, volumen))
4) Divide timestamps like "1328031552" into seconds, minutes, hours, days, months, years.
Somehow take care of leap years.
for dt in data:
    # seconds, minutes, hours, days, months, years = datetime (dt)
    # ... for later use in calculating the average
    p_sum, v_sum = 0, 0
    for p, v in data[dt]:
        # The values were read from the file as strings, so convert before adding
        p_sum += float(p)
        v_sum += float(v)
    n = len(data[dt])
    price = p_sum / n
    volumen = v_sum / n
5) Arrange values in the 01-01-2009 13:55:59 1586.12 220000 order
6) Add the record to the end of the new database file.
    print(datetime, price, volumen)
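The steps above can be sketched as one streaming pass. This assumes the input has one whitespace-separated record per line and is sorted by timestamp, as in the question's sample. itertools.groupby collects the rows that share a second, missing seconds repeat the last known values, and datetime takes care of leap years. The names parse, fill_seconds and format_record are hypothetical. Rendering in UTC, reading the target format as day-month-year, and averaging the volume as well as the price are assumptions, since the question does not settle them.

```python
from datetime import datetime, timezone
from itertools import groupby


def parse(lines):
    """Yield (timestamp, price, volume) tuples from whitespace-separated lines."""
    for line in lines:
        fields = line.split()
        if len(fields) == 3:  # skip blank or malformed lines
            yield int(fields[0]), float(fields[1]), float(fields[2])


def fill_seconds(records):
    """Yield one (timestamp, price, volume) per second, averaging duplicates
    and repeating the last known values for seconds with no record."""
    previous = None
    for stamp, group in groupby(records, key=lambda r: r[0]):
        rows = list(group)
        price = sum(r[1] for r in rows) / len(rows)
        volume = sum(r[2] for r in rows) / len(rows)
        if previous is not None:
            # Fill the gap between the previous second and this one
            for missing in range(previous[0] + 1, stamp):
                yield missing, previous[1], previous[2]
        previous = (stamp, price, volume)
        yield previous


def format_record(record):
    """Render a record as 'DD-MM-YYYY HH:MM:SS price volume' in UTC."""
    stamp, price, volume = record
    when = datetime.fromtimestamp(stamp, tz=timezone.utc)
    return f"{when:%d-%m-%Y %H:%M:%S} {price:.12f} {volume:.12f}"


# Made-up sample: a duplicated second and a two-second gap
sample = [
    "1328031552 6.100000000000 2.000000000000",
    "1328031553 6.110000000000 0.500000000000",
    "1328031553 6.130000000000 1.500000000000",
    "1328031556 6.150000000000 1.000000000000",
]
result = list(fill_seconds(parse(sample)))
print(" ".join(format_record(r) for r in result))
```

Because every stage is a generator, the real 524 MB file could be processed by passing the open input file to parse and writing each formatted record to the output file as it is produced, without holding the whole file in memory. Note that filling every missing second can make the output far larger than the input when trades are sparse.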