Convert str to float explicitly in Python 3

I am getting a "TypeError: Can't convert 'float' object to str implicitly" error because I am trying to divide a float by a string.
I am trying to cast the string to a float, but am still getting an error.
The string 'empNumber' is all digits but contains a comma (e.g. 112,000), hence the replace() call to strip the comma. The error appears when I try to divide final/decimal. How can I fix this TypeError?
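For reference, the conversion itself works in isolation once the comma is stripped (a minimal example with a made-up count):

empNumber = "112,000".replace(",", "")  # '112000'
decimal = float(empNumber)              # 112000.0

Here is the full function: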
def revPerEmployee():
    for ticker in sp500short:
        searchurl = "http://finance.yahoo.com/q/ks?s="+ticker
        f = urlopen(searchurl)
        html = f.read()
        soup = BeautifulSoup(html, "html.parser")

        searchurlemp = "http://finance.yahoo.com/q/pr?s="+ticker+"+Profile"
        femp = urlopen(searchurlemp)
        htmlemp = femp.read()
        soupemp = BeautifulSoup(htmlemp, "html.parser")

        try:
            revenue2 = soup.find("td", text="Revenue (ttm):").find_next_sibling("td").text
            empCount2 = soupemp.find("td", text="Full Time Employees:").find_next_sibling("td").text
        except:
            revenue2 = "There is no data for this company"
            empCount2 = "There is no data for this company"

        if revenue2 == "There is no data for this company" or empCount2 == "There is no data for this company":
            lastLetter = ticker+": There is no data for this company"
        else:
            lastLetter = revenue2[len(revenue2)-1:len(revenue2)]
            empNumber = empCount2.replace(",", "")
            decimal = float(empNumber)

        if lastLetter == "B":
            result = revenue2[:-1]
            revNum = float(result)
            final = revNum * 1000000000.0
            revPerEmp = final/decimal
            print(ticker+": "+revPerEmp)
        elif lastLetter == "M":
            result = revenue2[:-1]
            revNum = float(result)
            final = revNum * 1000000.0
            #newnum = "{:0,.2f}".format(final)
            revPerEmp = final/decimal
            print(ticker+": "+revPerEmp)
        elif lastLetter == "K":
            result = revenue2[:-1]
            revNum = float(result)
            final = revNum * 1000.0
            #newnum = "{:0,.2f}".format(final)
            revPerEmp = final/decimal
            print(ticker+": "+revPerEmp)
        else:
            print(lastLetter)

17 + "orange" is nonsense; you can't add numbers and strings. You want
print("%s: %s" % (ticker, revPerEmp))
(you can switch %s for other formats, like %.2f), or
print(str(ticker) + ": " + str(revPerEmp))
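For example, with a numeric format (a quick sketch; the ticker and value here are made up):

ticker = "ABC"
revPerEmp = 1234567.891
print("%s: %.2f" % (ticker, revPerEmp))     # ABC: 1234567.89
print(str(ticker) + ": " + str(revPerEmp))  # ABC: 1234567.891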

The problem is that your program assumes that what is obtained from the URL request is a number in the form of digits followed by a suffix (K, M or B). This is never tested for.
There are also two suggestions to improve your code. First, you already use a try ... except clause to detect when data cannot be obtained; you can also use it when conversion fails. The message "There is no data for this company" can be printed in the except clause.
Second, you have three if clauses that are very much alike, suggesting they can be condensed. A Python dictionary can hold the suffix values.
SUFFIX_VALUES = {'K': 1000.0, 'M': 1000000.0, 'B': 1000000000.0}

try:
    # taken from your code
    revenue2 = soup.find("td", text="Revenue (ttm):").find_next_sibling("td").text
    empCount2 = soupemp.find("td", text="Full Time Employees:").find_next_sibling("td").text

    revNum = float(revenue2[:-1])
    empNumber = empCount2.replace(",", "")
    decimal = float(empNumber)
    lastLetter = revenue2[-1]
    final = revNum * SUFFIX_VALUES[lastLetter]
    revPerEmp = final/decimal
    print("%s: %d" % (ticker, revPerEmp))
except:
    print(ticker + ": There is no data for this company")
Now, if data is missing from the URL request, if conversion fails, or if the suffix is wrong, the program will execute the except clause.
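A quick illustration of why the dictionary lookup also covers a bad suffix (a minimal example):

SUFFIX_VALUES = {'K': 1000.0, 'M': 1000000.0, 'B': 1000000000.0}
print(4.5 * SUFFIX_VALUES['B'])  # 4500000000.0
SUFFIX_VALUES['X']               # raises KeyError, which lands in the except clause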

Python Won't Enter Try/Except In For Loop On 50th Iteration

First off, I want to apologize to everyone who's about to read this code... I know it's a mess.
For anyone who is able to decipher it: I have a list of ~16,500 website URLs that I am scraping and then categorizing with Google's NLP. The list of URLs is created with the following chunk of code, and as far as I can tell nothing is broken here.
url_list = open("/Users/my_name/Documents/Website Categorization and Scrapper Python/url_list copy", "r")
indexed_url_list = url_list.readlines()
clean_url_list = []
clean_url_list = [x[:-1] for x in indexed_url_list]
When I print the length of this list it correctly gives me the count of ~16,500.
The main block of code is as follows:
for x in clean_url_list:
    print('1')
    url = x
    print('1.1')
    try:
        r = scraper.get(url, headers=headers)
        print('1.2')
        soup = BeautifulSoup(r.text, 'html.parser')
        print('1.3')
        title = soup.find('title').text
        print('1.4')
        description = soup.find('meta', attrs={'name': 'description'})["content"]
        print('2')
        if "content" in str(description):
            description = description.get("content")
        else:
            description = ""

        h1 = soup.find_all('h1')
        h1_all = ""
        for x in range(len(h1)):
            if x == len(h1) - 1:
                h1_all = h1_all + h1[x].text
            else:
                h1_all = h1_all + h1[x].text + ". "

        paragraphs_all = ""
        paragraphs = soup.find_all('p')
        for x in range(len(paragraphs)):
            if x == len(paragraphs) - 1:
                paragraphs_all = paragraphs_all + paragraphs[x].text
            else:
                paragraphs_all = paragraphs_all + paragraphs[x].text + ". "

        h2 = soup.find_all('h2')
        h2_all = ""
        for x in range(len(h2)):
            if x == len(h2) - 1:
                h2_all = h2_all + h2[x].text
            else:
                h2_all = h2_all + h2[x].text + ". "

        h3 = soup.find_all('h3')
        h3_all = ""
        for x in range(len(h3)):
            if x == len(h3) - 1:
                h3_all = h3_all + h3[x].text
            else:
                h3_all = h3_all + h3[x].text + ". "

        allthecontent = str(title) + " " + str(description) + " " + str(h1_all) + " " + str(h2_all) + " " + str(h3_all) + " " + str(paragraphs_all)
        allthecontent = str(allthecontent)[0:999]
        print(allthecontent)
    except Exception as e:
        print(e)
When I run this it successfully categorizes the first 49 URLs but ALWAYS stops on the 50th, no matter which URL it is. No error is thrown, and even if one were, the try/except should handle it. Judging by the debug print statements, the 50th iteration never even enters the try block, and it is always the 50th iteration.
Any help would be much appreciated, and I hope you have some good eye wash to wipe away the code you just had to endure.
I helped look at this at work. The actual issue was a bad 50th URL that would never return. Adding a timeout allowed the code to escape the try/except block and move on to the next URL in a manageable fashion.
try:
    r = scraper.get(url, headers=headers, timeout=5)
except:
    continue  # handle next url in list
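For reference, a minimal sketch of the same pattern with the plain requests library (assuming scraper behaves like a requests session; clean_url_list is from the question):

import requests

for url in clean_url_list:
    try:
        # without a timeout, a request to a dead host can hang indefinitely;
        # timeout=5 raises an exception after 5 seconds instead
        r = requests.get(url, timeout=5)
    except requests.exceptions.RequestException as e:
        print("skipping %s: %s" % (url, e))
        continue
    # ...parse r.text with BeautifulSoup as before...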

SHA256 doesn't yield same result

I'm following this tutorial, and in part 2 it shows that SHA256 yields a result different from what I get when I run my Python code:
the string is: 0450863AD64A87AE8A2FE83C1AF1A8403CB53F53E486D8511DAD8A04887E5B23522CD470243453A299FA9E77237716103ABC11A1DF38855ED6F2EE187E9C582BA6
While the tutorial SHA256 comes to: 600FFE422B4E00731A59557A5CCA46CC183944191006324A447BDB2D98D4B408
My short Python code shows:
sha_result = sha256(bitconin_addresss).hexdigest().upper()
print sha_result
32511E82D56DCEA68EB774094E25BAB0F8BDD9BC1ECA1CEEDA38C7A43ACEDDCE
In fact, any online SHA256 tool shows the same result as my Python code, so am I missing something here?
You're hashing the string when you're supposed to be hashing the bytes represented by that string.
>>> hashlib.sha256('0450863AD64A87AE8A2FE83C1AF1A8403CB53F53E486D8511DAD8A04887E5B23522CD470243453A299FA9E77237716103ABC11A1DF38855ED6F2EE187E9C582BA6'.decode('hex')).hexdigest().upper()
'600FFE422B4E00731A59557A5CCA46CC183944191006324A447BDB2D98D4B408'
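If you are on Python 3, where str has no .decode method, a minimal equivalent uses bytes.fromhex:

import hashlib

pubkey_hex = '0450863AD64A87AE8A2FE83C1AF1A8403CB53F53E486D8511DAD8A04887E5B23522CD470243453A299FA9E77237716103ABC11A1DF38855ED6F2EE187E9C582BA6'
# hash the 65 raw bytes, not the 130-character hex string
digest = hashlib.sha256(bytes.fromhex(pubkey_hex)).hexdigest().upper()
print(digest)  # 600FFE422B4E00731A59557A5CCA46CC183944191006324A447BDB2D98D4B408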
You could use Gavin's "base58.py", which I believe he no longer shares on his GitHub page. However, you can probably google it and find different versions of it on GitHub.
Here is one version, edited a little by me:
#!/usr/bin/env python
"""encode/decode base58 in the same way that Bitcoin does"""
import math
import sys

__b58chars = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
__b58base = len(__b58chars)

def b58encode(v):
    """ encode v, which is a string of bytes, to base58.
    """
    long_value = 0L
    for (i, c) in enumerate(v[::-1]):
        long_value += ord(c) << (8*i)  # 2x speedup vs. exponentiation
    result = ''
    while long_value >= __b58base:
        div, mod = divmod(long_value, __b58base)
        result = __b58chars[mod] + result
        long_value = div
    result = __b58chars[long_value] + result
    # Bitcoin does a little leading-zero-compression:
    # leading 0-bytes in the input become leading-1s
    nPad = 0
    for c in v:
        if c == '\0': nPad += 1
        else: break
    return (__b58chars[0]*nPad) + result

def b58decode(v):
    """ decode v into a string of len bytes
    """
    long_value = 0L
    for (i, c) in enumerate(v[::-1]):
        long_value += __b58chars.find(c) * (__b58base**i)
    result = ''
    while long_value >= 256:
        div, mod = divmod(long_value, 256)
        result = chr(mod) + result
        long_value = div
    result = chr(long_value) + result
    nPad = 0
    for c in v:
        if c == __b58chars[0]: nPad += 1
        else: break
    result = chr(0)*nPad + result
    return result

try:
    import hashlib
    hashlib.new('ripemd160')
    have_crypto = True
except ImportError:
    have_crypto = False

def hash_160(public_key):
    if not have_crypto:
        return ''
    h1 = hashlib.sha256(public_key).digest()
    r160 = hashlib.new('ripemd160')
    r160.update(h1)
    h2 = r160.digest()
    return h2

def hash_160_to_bc_address(h160, version="\x00"):
    if not have_crypto:
        return ''
    vh160 = version + h160
    h3 = hashlib.sha256(hashlib.sha256(vh160).digest()).digest()
    addr = vh160 + h3[0:4]
    return b58encode(addr)

def public_key_to_bc_address(public_key, version="\x00"):
    if not have_crypto or public_key is None:
        return ''
    h160 = hash_160(public_key)
    return hash_160_to_bc_address(h160, version=version)

def sec_to_bc_key(sec, version="\x80"):
    if not have_crypto or sec is None:
        return ''
    vsec = version + sec + "\x01"
    hvsec = hashlib.sha256(hashlib.sha256(vsec).digest()).digest()
    return b58encode(vsec + hvsec[0:4])

def bc_key_to_sec(prv):
    return b58decode(prv)[1:33]

def bc_address_to_hash_160(addr):
    bytes = b58decode(addr)
    return bytes[1:21]

if __name__ == '__main__':
    if len(sys.argv) > 1:
        if sys.argv[1] == '-en':
            print b58encode(sys.argv[2].decode('hex_codec'))
        if sys.argv[1] == '-de':
            print b58decode(sys.argv[2]).encode('hex_codec')
        if sys.argv[1] == '-pub':
            print public_key_to_bc_address(sys.argv[2].decode('hex_codec'))
        if sys.argv[1] == '-adr':
            print bc_address_to_hash_160(sys.argv[2]).encode('hex_codec')
        if sys.argv[1] == '-sec':
            print sec_to_bc_key(sys.argv[2].decode('hex_codec'))
        if sys.argv[1] == '-prv':
            print bc_key_to_sec(sys.argv[2]).encode('hex_codec')
    else:
        print ''
        print 'Usage: ./base58.py [options]'
        print ''
        print ' -en converts hex to base58'
        print ' -de converts base58 to hex'
        print
        print ' -pub public_key_to_bc_address'
        print ' -adr bc_address_to_hash_160'
        print
        print ' -sec sec_to_bc_key'
        print ' -prv bc_key_to_sec'
        print
To answer your specific question: based on the above code, you could use this command:
hashlib.sha256('0450863AD64A87AE8A2FE83C1AF1A8403CB53F53E486D8511DAD8A04887E5B23522CD470243453A299FA9E77237716103ABC11A1DF38855ED6F2EE187E9C582BA6'.decode('hex_codec')).digest().encode('hex_codec').upper()

Python 3: ValueError: invalid literal for int() with base 10: '0001.0110010110010102e+22'

I'm a beginner, so sorry if this is obvious.
I'm at a loss here. I've been trying to make an encryption/decryption program, but I keep getting this error. I'm aware that there are other questions on this issue, but I still can't resolve it.
Encryptor:
import binascii

def text_to_bits(text, encoding='utf-8', errors='surrogatepass'):
    bits = bin(int(binascii.hexlify(text.encode(encoding, errors)), 16))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))

def text_from_bits(bits, encoding='utf-8', errors='surrogatepass'):
    n = int(bits, 2)
    return int2bytes(n).decode(encoding, errors)

def int2bytes(i):
    hex_string = '%x' % i
    n = len(hex_string)
    return binascii.unhexlify(hex_string.zfill(n + (n & 1)))

# ENCRYPTION ALGORITHM
algorithm = 61913299

# ASCII ----> NUMBERS
raw = input("Enter text to encrypt:")
binary = text_to_bits(raw)
binary = int(binary)
algorithm = int(algorithm)
encrypted = binary * algorithm
encrypted = str(encrypted)
print(encrypted)
print("Done")
Decryptor:
import sys
import time

def to_bin(string):
    res = ''
    for char in string:
        tmp = bin(ord(char))[2:]
        tmp = '%08d' % int(tmp)
        res += tmp
    return res

def to_str(string):
    res = ''
    for idx in range(len(string)/8):
        tmp = chr(int(string[idx*8:(idx+1)*8], 2))
        res += tmp
    return res

incorrectpasswords = 0
password = ("password")
originpassword = password
x = 1
algorithm = 61913299

while x == 1:
    passwordattempt = input("Enter Password:")
    if passwordattempt == password:
        print("Correct")
        x = 2
    if passwordattempt != password:
        print("Incorrect")
        incorrectpasswords = incorrectpasswords + 1
        if incorrectpasswords > 2:
            if x == 1:
                print("Too many wrong attempts, please try again in one minute.")
                time.sleep(60)

encrypted = input("Enter numbers to unencrypt:")
encrypted = int(encrypted)
one = encrypted / algorithm
size = sys.getsizeof(one)
one = str(one).zfill(size + 1)
one = int(one)
unencrypted = to_str(one)
x = unencrypted
For the conversion between binary and text, and text and binary, I used some code I found online.
I believe your code is not working because
one = encrypted / algorithm
generates a float.
To turn your string back into a number you should apply
eval(one)
or
float(one)
instead of
int(one)
(you can also turn it into an int after applying float or eval).
Alternatively, you might be able to use integer division // instead of /, which makes one an int by flooring the result of the division. I'm not sure if that is the behavior you are looking for, but there is a sketch of it after the shell example below.
Example in a Python 3 shell:
>>> import sys
>>> one = 15/25
>>> size = sys.getsizeof(one)
>>> one = str(one).zfill(size+1)
>>> one
'00000000000000000000000.6'
>>> type(one)
<class 'str'>
>>> one = eval(one)
>>> one
0.6
>>> type(one)
<class 'float'>
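If you go the // route: since the encryptor multiplies two ints, integer division recovers the original number exactly. A minimal sketch of the decrypt step under that assumption (algorithm is the constant from the question):

algorithm = 61913299
encrypted = int(input("Enter numbers to unencrypt:"))

number = encrypted // algorithm                # exact: both factors were ints
bits = str(number)                             # the decimal digits are the original bit string
bits = bits.zfill(8 * ((len(bits) + 7) // 8))  # restore any stripped leading zeros

text = ''.join(chr(int(bits[i:i+8], 2)) for i in range(0, len(bits), 8))
print(text)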

BeautifulSoup error handling when find returns NoneType

I am scraping search results from a website where each result is contained in a <div> and has a range of data associated with it. However, some of these data values are missing, and when they are, the error "'NoneType' object has no attribute 'text'" is returned.
I have put in a try/except block. Currently the entire search result is skipped when one of the values is missing. What can I do to allow the missing values to be replaced with "" (a blank) in the xls file I am saving to?
My code is below:
divs = soup.find_all("div", class_="result-item standard") + soup.find_all("div", class_="result-item standard basic-ad")

for div in divs:
    try:
        #item_title = " ".join(div.h2.a.text.split())
        item = div.h2.a.text.split()
        item_year = item[0]
        item_make = item[1]
        item_model = ""
        for i in range(2, len(item)):
            item_model = item_model + item[i] + " "
        item_eng = div.find("li", "item-engine").text
        item_trans = div.find("li", "item-transmission").text
        item_body = div.find("li", "item-body").text
        item_odostr = div.find("li", "item-odometer").text
        item_odo = ''.join(c for c in item_odostr if c.isdigit())
        item_pricestr = " ".join(div.find("div", "primary-price").text.split())
        item_price = ''.join(c for c in item_pricestr if c.isdigit())
        item_adtype = div.find("div", "ad-type").span.text
        #item_distance = div.find("a", "distance-from-me-link").text
        item_loc = div.find("div", "call-to-action").p.text
        item_row = (str(x), item_year, item_make, item_model, item_eng, item_trans, item_body, item_odo, item_price, item_adtype, item_loc)
        print ",".join(item_row)
        print(" ")
        for i in range(len(item_row)):
            ws.write(x, i, item_row[i])
        if x % 500 == 0:
            wb.save("data.xls")
    except AttributeError as e:
        with open("error" + str(x) + ".txt", "w+") as error_file:
            error_file.write(div.text.encode("utf-8"))
For example:
item_eng = div.find("li", "item-engine").text if div.find("li", "item-engine") else ''
or:
item_eng = div.find("li", "item-engine").text if len(div.find_all("li", "item-engine")) != 0 else ''
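If there are many such fields, a small helper keeps this readable (a minimal sketch; the helper name is made up, the field names are from the question):

def text_or_blank(parent, name, css_class):
    # return the tag's text, or '' when find() returns None
    tag = parent.find(name, css_class)
    return tag.text if tag is not None else ''

item_eng = text_or_blank(div, "li", "item-engine")
item_trans = text_or_blank(div, "li", "item-transmission")
item_body = text_or_blank(div, "li", "item-body")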

Parse the HTML Table

I have an HTML table that I need to parse into a CSV file.
import urllib2, datetime
from BeautifulSoup import BeautifulSoup

olddate = datetime.datetime.strptime('5/01/13', "%m/%d/%y")

print("dates,location,name,url")

def genqry(arga, argb, argc, argd):
    return arga + "," + argb + "," + argc + "," + argd

part = 1
row = 1
contenturl = "http://www.robotevents.com/robot-competitions/vex-robotics-competition"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())
table = soup.find('table', attrs={'class': 'catalog-listing'})
rows = table.findAll('tr')

for tr in rows:
    try:
        if row != 1:
            cols = tr.findAll('td')
            for td in cols:
                if part == 1:
                    keep = 0
                    dates = td.find(text=True)
                    part = 2
                if part == 2:
                    location = td.find(text=True)
                    part = 2
                if part == 3:
                    name = td.find(text=True)
            for a in tr.findAll('a', href=True):
                url = a['href']
            # Compare Dates
            if len(dates) < 6:
                newdate = datetime.datetime.strptime(dates, "%m/%d/%y")
                if newdate > olddate:
                    keep = 1
                else:
                    keep = 0
            else:
                newdate = datetime.datetime.strptime(dates[:6], "%m/%d/%y")
                if newdate > olddate:
                    keep = 1
                else:
                    keep = 0
            if keep == 1:
                qry = genqry(dates, location, name, url)
                print(qry)
            row = row + 1
            part = 1
        else:
            row = row + 1
    except (RuntimeError, TypeError, NameError):
        print("Error: " + name)
I need to be able to get every VEX Event in that table that is after 5/01/13. So far, this code gives me an error about the dates that I can't seem to fix. Maybe someone who is better at this than I am can fix this code? Thanks in advance, Smith.
EDIT #1: The error that I am getting is:
ValueError: '\n10/5/13' does not match format '%m/%d/%y'
I think that I need to remove newlines at the beginning of the string first.
EDIT #2: Got it to run without any output. Any help?
Your question is very poor. Without knowing the exact error, I would guess the problem is with your if len(dates) < 6: block. Consider the following:
>>> date = '10/5/13 - 12/14/13'
>>> len(date)
18
>>> date = '11/9/13'
>>> len(date)
7
>>> date[:6]
'11/9/1'
One suggestion to make your code more Pythonic: instead of doing row = row + 1, use enumerate, as sketched below.
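A minimal sketch of the enumerate suggestion (rows is the list from your code; the counter replaces the manual row = row + 1 bookkeeping):

for row, tr in enumerate(rows, start=1):
    if row == 1:
        continue  # skip the header row
    cols = tr.findAll('td')
    # ...process the columns as before...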
Update: Tracing your code, I get the value of dates as follows:
>>> dates
u'\n10/5/13 - 12/14/13 \xa0\n '
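So the leading newline and trailing non-breaking space need to be cleaned off before parsing. A minimal sketch under that assumption (splitting on ' - ' handles both single dates and date ranges):

dates = dates.replace(u'\xa0', u' ').strip()  # u'10/5/13 - 12/14/13'
first = dates.split(' - ')[0]                 # u'10/5/13'
newdate = datetime.datetime.strptime(first, "%m/%d/%y")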
