import time
import traceback
import sys
import tools
from BeautifulSoup import BeautifulSoup

f = open("randomwords.txt","w")
while 1:
    try:
        page = tools.download("http://wordnik.com/random")
        soup = BeautifulSoup(page)
        si = soup.find("h1")
        w = si.string
        print w
        f.write(w)
        f.write("\n")
        time.sleep(3)
    except:
        traceback.print_exc()
        continue
f.close()
It prints just fine. It just won't write to the file. It's 0 bytes.
You can never leave the while loop, so the f.close() call is never reached and the file's stream buffer is never flushed.
Let me explain a little further: your except block ends with continue, so there is no exit from the loop. Perhaps you should add some sort of indicator that you've reached the end of the page instead of a static 1 as the loop condition. Then the close call would run and the information would actually be written to the file.
A bare except is almost certainly a bad idea; you should only handle the exception you expect to see. Then if it does something totally unexpected you will still get a useful error trace about it.
import time
import tools
from BeautifulSoup import BeautifulSoup

def scan_file(url, logf):
    try:
        page = tools.download(url)
    except IOError:
        print("Couldn't read url {0}".format(url))
        return
    try:
        soup = BeautifulSoup(page)
        w = soup.find("h1").string
    except AttributeError:
        print("Couldn't find <h1> tag")
        return
    print(w)
    logf.write(w)
    logf.write('\n')

def main():
    with open("randomwords.txt","a") as logf:
        try:
            while True:
                time.sleep(3)
                scan_file("http://wordnik.com/random", logf)
        except KeyboardInterrupt:
            pass  # fall through so the with block can close the file

if __name__=="__main__":
    main()
Now you can close the program by typing Ctrl-C, and the "with" clause will ensure that the log file is closed properly.
From what I understand, you want to output a random word every three seconds into a file. But buffering will take place, so you will not see your words in the file until the buffer has grown too large, typically on the order of 4K bytes.
I suggest that in your loop, you add an f.flush() before the sleep() line.
Also, as wheaties suggested, you should have proper exception handling (if I want to stop your program, I will likely send a SIGINT using Ctrl+C, and your program won't stop in that case) and a proper exit path.
I'm sure that when you test your program, you kill it hard to stop it, and anything it has written is lost because the file is never properly closed. If your program could exit normally, you would have close()d the file, and close() triggers a flush(), so you would have something written in your file.
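For illustration, the relevant part of the question's loop with that change (a sketch; only the flush() line is new, the rest reuses the question's names):

    w = si.string
    print w
    f.write(w)
    f.write("\n")
    f.flush()      # force Python's buffer out to the OS so the file grows right away
    time.sleep(3)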
Read the answer posted by wheaties.
And if you want to force the file's buffer to be written to disk, read:
http://docs.python.org/library/stdtypes.html#file.flush
Related
I am trying to find a way in which I can re-run my code if there is a timeout or an error.
This is something that happens if the internet connection drops - so I'd need to delay for a few seconds whilst it comes back online and try again.
Would there be a way in which I could run my code from another python script and tell it to re-run if it times out or disconnects?
Thanks in advance
If you are talking about HTTP request connection errors and you are using the requests library, you can use urllib3's retry support.
FYI https://docs.python-requests.org/en/master/api/#requests.adapters.HTTPAdapter
Of course, other libraries will have their own retry function.
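As a rough sketch of the requests/urllib3 approach (the URL, mount prefix and retry numbers here are placeholders, not values from the question):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=3,                          # give up after 3 attempts
                backoff_factor=2,                 # exponential backoff between tries
                status_forcelist=[500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retries))
response = session.get("https://example.com")     # retried automatically on failure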
If you just want a simple retry, code along these lines works:

retries_count = 3  # hard value
delay = 3          # hard value

while True:
    try:
        # ... run some code
        return  # or break, once it succeeds
    except YourCustomError:  # replace with the error you expect
        if retries_count <= 0:
            raise
        retries_count -= 1
        time.sleep(delay)
A Google search for "python throttling" will give you a lot of references.
You can use a try/except in an infinite loop like this:
while True:
    try:
        # your code
        # call another python script
        pass
    except:
        time.sleep(10)
You can also check the error and run whatever you need to, depending on the error type. For example:

while True:
    try:
        # your code
        # call another python script
        # break
        pass
    except Exception as e:
        if isinstance(e, SomeErrorType):  # check against the error type you expect
            time.sleep(10)
        else:
            pass
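A cleaner variant of the same idea is to give each error type its own except clause (a sketch; ConnectionError and ValueError here are placeholder examples, not from the question):

import time

while True:
    try:
        # your code / call another python script
        break
    except ConnectionError:   # an error worth waiting out and retrying
        time.sleep(10)
    except ValueError:        # an error to skip, retrying immediately
        pass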
You can actually do it with a try/except block.
You can find all about it here: https://docs.python.org/3/tutorial/errors.html
You just make one script that has something like this:
import time
import another_script_name  # the other script you made

while True:
    try:
        another_script_name.main()
    except:
        time.sleep(20)

As shown, you need to import both time and the other script you made.
What these lines do is run an infinite loop that always tries to call the main function of the other script; if some sort of error occurs, the program sleeps for 20 seconds and then tries again, because it is in the infinite loop.
I'm having trouble working out how to end a 'try' loop; the problem has occurred since I added the 'try'. Here is the code:
import time

class exe_loc:
    mem = ''
    lib = ''
    main = ''

def wizard():
    while True:
        try:
            temp_me = input('Please specify the full directory of the memory, usually it will be a folder called "mem"> ')
            if temp_me is True:
                exe_loc.mem = temp_me
                time.sleep(1)
            else:
                print('Error value! Please run this configurator again!')
                sys.exit()
            temp_lib = input('Please specify the full directory of the library, usually it will be a folder called "lib"> ')
            if temp_lib is True:
                exe_loc.lib = temp_lib
                time.sleep(1)
            else:
                print('Invalid value! Please run this configurator again!')
                sys.exit()
            temp_main = input('Please specify the full main executable directory, usually it will be app main directory> ')
            if temp_main is True:
                exe_loc.main = temp_main
                time.sleep(1)
I tried ending it with break, with pass, and even by leaving it empty, and what I get is "unexpected EOF while parsing". I searched online and it's said to be caused by a code block that was never completed. Please show me if any of my code is wrong, thanks.
Btw, I'm using Python 3 and I don't know how to be more specific about this question; kindly ask me if you don't understand. Sorry for my poor English.
EDIT: Solved by removing the try because I'm not using it, but I still want to know how to end a try properly, thanks.
Your problem isn't the break, it's the overall, high-level shape of your try clause.
A try requires either an except or a finally block. You have neither, which means your try clause is never actually complete, so Python keeps looking for the next bit until it reaches EOF (End Of File), at which point it complains.
The Python docs explain in more detail, but basically you need either:
try:
    do_stuff_here()
finally:
    do_cleanup_here()  # always runs, even if the above raises an exception
or
try:
    do_stuff_here()
except SomeException:
    handle_exception_here()  # if do_stuff_here raised a SomeException
(You can also have both the except and finally.) If you don't need either the cleanup or the exception handling, that's even easier: just get rid of the try altogether, and have the block go directly under that while True.
Finally, as a terminology thing: try is not a loop. A loop is a bit of code that gets executed multiple times -- it loops. The try gets executed once. It's a "clause," not a "loop."
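To tie this back to your EDIT: here is a minimal sketch of a while loop containing a complete try clause, where break ends the loop and the except completes the try (the prompt text is shortened from your code):

while True:
    try:
        temp_me = input('Please specify the "mem" directory> ')
        break                      # break ends the while loop
    except KeyboardInterrupt:      # the except completes the try clause
        print('Aborted by user.')
        raise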
You also have to 'catch' the exception with an except statement, otherwise the try has no use.
So if you do something like:
try:
    # some code here
    pass
except Exception:
    # what to do if the specified error is encountered
    pass
This way, if an exception is raised anywhere in your try block, it will not break your code; it will be caught by your except.
I have data coming in through sockets, being queued up to be parsed and checked for certain requirements and then passed to my FileWrite() function.
It seems to slow down over time even with a fairly even level of data. I can't seem to find a leak in it or a reason for it to take so long.
This is a snippet of the code. It goes backwards through the list because there's also a quick check it makes to be sure an item is worth writing, and it might pop something out of the list once in a while, but I've run timers without that part and it doesn't make a difference.
Snippet:
def FileWrite():
    CopiedList = []
    del CopiedList[:]
    CopiedList[:] = []
    CopiedList = list(ListForMatches)
    try:
        if os.path.exists("Temp.xml") == True:
            os.remove("Temp.xml")
        if os.path.exists("Finished.xml") == True:
            os.remove("Finished.xml")
        try:
            if os.path.exists("Temp.xml") == True:
                try:
                    os.remove("Temp.xml")
                except:
                    print "problem removing temp.xml?"
            root = ET.Element("Refunds")
            tree = ET.ElementTree(root)
            for Events in reversed(CopiedList):
                try:
                    XMLdataparse2 = []
                    del XMLdataparse2[:]
                    XMLdataparse2[:] = []
                    XMLdataparse2 = Events.split('-Delimeter_Here-')
                    try:
                        if "Snowboarding" in XMLdataparse2[0]: #Football
                            matches = ET.SubElement(root, "Event", type = XMLdataparse2[0])
                            ET.SubElement(matches, "EventTime").text = str(XMLdataparse2[1])
                            ET.SubElement(matches, "EventName").text = str(XMLdataparse2[2])
                            ET.SubElement(matches, "Location").text = str(XMLdataparse2[3])
                            ET.SubElement(matches, "Distance").text = str(XMLdataparse2[4])
                    except:
                        print "problem preparing XML tree"
                except:
                    agecounter = agecounter - 1 #Prevent infinite loop in case of problem
                    print "Problem moving to next XML tree"
            trry = 0 #Attempts writing the XML file a few times (just in case)
            while trry < 5:
                try:
                    tree.write("Temp.xml")
                    break
                except:
                    trry += 1
        except:
            print "problem writing XML file"
            e = sys.exc_info()[0]
            print e
    #except:
    except WindowsError, e:
        e = sys.exc_info()[0]
        print e
    #Let's get the file ready to go.
    try:
        if os.path.exists("Temp.xml") == True:
            os.rename("Temp.xml","Finished.xml")
    except:
        print "Problem creating Finished.xml"
    return
My question is: would it be possible to safely hand the splitting by '-Delimeter_Here-' and the writing to the XML file off to threads? If I have 200 pieces to add, could 200 threads parse that data and safely write it to the same file, or is that going to cause all kinds of havoc?
I can't really see another way of removing this bottleneck, but I'd appreciate any suggestions or a safe way to thread this.
Update:
So I've been reading around and doing a little testing. Suspicions confirmed: it would not be safe to have multiple threads writing into the XML at the same time.
In theory I could thread it so the list gets split by the delimiters in separate threads, which would stop one thread from having to do all of that. I don't know how I would write the strings to the XML from those threads, though, and I don't know how I could add them to another queue without having to parse them all over again anyway.
Maybe a function which takes a set of strings to write to the XML, trying the file-lock part of the threading library.
So for each result in that list, a thread starts by splitting up the data and checking it, then passes it into another thread function to write it to the XML, locking one at a time (see the sketch below).
I'm hoping it's possible to have threads wait until the lock comes off and rotate that way.
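Roughly what I have in mind (a rough sketch, names made up, assuming a shared ElementTree root):

import threading
import xml.etree.ElementTree as ET

xml_lock = threading.Lock()   # guards the shared tree

def parse_and_add(event, root):
    # each worker thread does its own splitting and checking...
    fields = event.split('-Delimeter_Here-')
    # ...then takes the lock, so only one thread writes at a time
    with xml_lock:
        matches = ET.SubElement(root, "Event", type=fields[0])
        ET.SubElement(matches, "EventTime").text = fields[1]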
Update
I wrote it using threading to parse up the strings and file locking to prevent it slowing down. This is the only way I can see to speed this up. I'm having some trouble getting it to update the file properly without parsing the XML constantly, which is a bigger slowdown (posted here).
I can't think of any other way of speeding this up.
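The queue idea mentioned above would look something like this: one writer thread drains a Queue of already-split fields, so the tree itself is never shared (again a rough sketch, not my real code):

import threading, Queue   # Queue is the Python 2 module name
import xml.etree.ElementTree as ET

work_q = Queue.Queue()

def writer(root):
    # the only thread that ever touches the tree
    while True:
        fields = work_q.get()
        if fields is None:    # sentinel: the parsers are finished
            break
        matches = ET.SubElement(root, "Event", type=fields[0])
        ET.SubElement(matches, "EventTime").text = fields[1]

def parser(event):
    work_q.put(event.split('-Delimeter_Here-'))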
I am parsing through the HTML returned from a list of links. When I reach a certain point in each HTML document I raise an Exception.
import urllib2, time
from HTMLParser import HTMLParser

class MyHTMLParser2(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if somethings:
            do_somethings()
        if tag == "div" and "section2" in attrs[0][1]:
            raise NameError('End')

parser2 = MyHTMLParser2()
cntr = 0
for links in ls:
    try:
        f = urllib2.urlopen(links)
        parser2.feed(f.read())
        cntr += 1
        if cntr % 10 == 0:
            print "Parsing...", " It has been", (time.clock()-start)/60, 'mins.'
            break
    except Exception, e:
        print 'There has been an error Jim. url_check number', cntr
        error_log.write(links)
        continue
It executes the try block just once, for the first link, and then executes the except clause to infinity.
How can I get it to move on to the next link once the exception is raised?
The error_log is for some other errors it runs into related to urllib2; mostly it seemed like it wasn't able to connect to the webpage fast enough. So if there were a way to stop MyHTMLParser2 without raising an exception, that would be great. That way I could keep the error_log for real errors.
No, your diagnosis is not correct; there is no infinite exception loop here. Each URL raises an entirely separate exception.
The cntr variable won't be updated whenever you have an exception, and perhaps that is what gives you the impression that you end up in an exception loop. Either move the cntr += 1 line out of the try: statement, or use enumerate() to generate the counter for you.
That said, why are you trying to parse multiple HTML pages with one parser instance? Most likely the exception you keep getting is that a specific page is malformed and has put the parser into a state it cannot continue from.
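For example, a sketch of the loop using enumerate() and a fresh parser per page (keeping the question's Python 2 names):

for cntr, links in enumerate(ls, 1):
    parser2 = MyHTMLParser2()          # a fresh parser for every page
    try:
        parser2.feed(urllib2.urlopen(links).read())
    except Exception:
        error_log.write(links)
        continue
    if cntr % 10 == 0:
        print "Parsed", cntr, "links so far."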
You should not stop the parser with an exception. Parsing is a pretty complex process, and usually it's better to let the parser complete, collecting the information you need and processing that information once the parser has done its job. That way, you keep different concerns in your software separate, making everything easier to maintain, debug and understand.
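For instance, a sketch of collecting state instead of raising, reusing the tag test from the question: set a flag at the marker and ignore everything after it.

from HTMLParser import HTMLParser

class MyHTMLParser2(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.finished = False          # set once the end marker is seen

    def handle_starttag(self, tag, attrs):
        if self.finished:
            return                     # ignore everything past the marker
        if tag == "div" and attrs and "section2" in attrs[0][1]:
            self.finished = True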
I currently have code that reads raw content from a file chosen by the user:
def choosefile():
    filec = tkFileDialog.askopenfile()
    # Wait a bit to prevent attempting to display the file's contents
    # before the entire file has been read.
    time.sleep(1)
    filecontents = filec.read()
But sometimes people open big files that take more than 2 seconds to read. Is there a callback for FileObject.read([size])? For people who don't know what a callback is, it's an operation executed once another operation has completed.
Slightly modified from the docs:
#!/usr/bin/env python
import signal, sys

def handler(signum, frame):
    print "You took too long"
    sys.exit(1)

f = open(sys.argv[1])

# Set the signal handler and a 2-second alarm
signal.signal(signal.SIGALRM, handler)
signal.alarm(2)

contents = f.read()
signal.alarm(0)  # Disable the alarm
print contents
Answer resolved by asker
Hm, I made a mistake at first: tkFileDialog.askopenfile() does not read the file; FileObject.read() reads the file, and that is what blocks the code. I found the solution thanks to @kindall. I'm not a complete expert at Python, though.
Your question seems to assume that Python will somehow start reading your file while some other code executes, and therefore you need to wait for the read to catch up. This is not even slightly true; both open() and read() are blocking calls and will not return until the operation has completed. Your sleep() is not necessary and neither is your proposed workaround. Simply open the file and read it. Python won't do anything else while that is happening.
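(If the real worry is a Tkinter GUI freezing during a long read, one option, purely a sketch and not part of the original answer, is to do the read in a worker thread and hand the result back to the GUI thread:)

import threading

def choosefile():
    filec = tkFileDialog.askopenfile()

    def worker():
        contents = filec.read()   # blocks this worker thread, not the GUI
        # hand `contents` back to the GUI thread, e.g. via root.after(0, ...)

    threading.Thread(target=worker).start()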
Thanks kindall! Resolved code:
def choosefile():
    filec = tkFileDialog.askopenfile()
    filecontents = filec.read()