Actually, IE is not hanging; it is just responding slowly.
In that case I would like to close IE and start over from the beginning.
Closing it is no problem. The problem is how to set a timeout: for example, if I set 15 seconds and the web page has not opened within 15 seconds, I want to close IE and start over.
Is this possible with the IE COM interface?
It is really hard to find a solution.
Paul,
I use the following code to check whether a web page has finished loading.
But as I mentioned, it does not work well, because IE.navigate appears to hang or not respond.
while ie.ReadyState != 4:
    time.sleep(0.5)
To avoid the blocking problem, use the IE COM object in a thread.
Here is a simple but powerful example demonstrating how you can use a thread and the IE COM object together. You can adapt it to your purpose.
This example starts a thread and uses a queue to communicate with the main thread. In the main thread the user adds URLs to the queue, and the IE thread visits them one by one; when it finishes one URL, it moves on to the next. Because the IE COM object is used in a thread, you need to call CoInitialize.
from threading import Thread
from Queue import Queue
from win32com.client import Dispatch
import pythoncom
import time

class IEThread(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.queue = Queue()

    def run(self):
        ie = None
        # as IE Com object will be used in thread, do CoInitialize
        pythoncom.CoInitialize()
        try:
            ie = Dispatch("InternetExplorer.Application")
            ie.Visible = 1
            while 1:
                url = self.queue.get()
                print "Visiting...",url
                ie.Navigate(url)
                while ie.Busy:
                    time.sleep(0.1)
        except Exception,e:
            print "Error in IEThread:",e
        if ie is not None:
            ie.Quit()

ieThread = IEThread()
ieThread.start()
while 1:
    url = raw_input("enter url to visit:")
    if url == 'q':
        break
    ieThread.queue.put(url)
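For the 15-second timeout asked about at the top of the thread, one option is to bound the polling loop with a deadline instead of waiting on ie.Busy indefinitely. The following is a minimal sketch and not part of the original answer: wait_until is a hypothetical helper name, and the ie object in the usage comment is assumed to be the Dispatch object created in run() above.

```python
import time

def wait_until(predicate, timeout, interval=0.1):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Returns True if the predicate became true in time, False on timeout.
    """
    deadline = time.time() + timeout
    while True:
        if predicate():
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)

# Hypothetical usage inside IEThread.run() (requires Windows and pywin32):
#     ie.Navigate(url)
#     if not wait_until(lambda: not ie.Busy, timeout=15):
#         ie.Quit()                                      # page did not load in time
#         ie = Dispatch("InternetExplorer.Application")  # start over
```

Note that this only bounds the wait after Navigate has returned; it does not help if the COM call itself blocks.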
Trying to deal with the creation of a webdriver timing out (which happens once in a while covered here). I can't use a signal based timeout because my server is running on Windows so I've been trying to find an alternative.
I looked at the timeout from eventlet but I don't think that will cut it. A time.sleep(10000) doesn't trigger the timeout so I don't think the timeout itself would.
What I'm thinking is calling a thread to create and return the browser and then setting a join timeout. So something like:
def SpawnPhantomJS(dcap, service_args):
    browser = webdriver.PhantomJS('C:\phantomjs.exe',desired_capabilities=dcap, service_args=service_args)
    print "browser made!"
    return browser

proxywrite = '--proxy=',nextproxy
service_args = [
    proxywrite,
    '--proxy-type=http',
    '--ignore-ssl-errors=true',
]
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (nextuseragent)
newDriver = Thread(target=SpawnPhantomJS, args=[dcap, service_args]).start().join(20)
So I'm having some issues with the syntax for doing this properly, but in theory this should work. If the creation stalls, the SpawnPhantomJS thread will stall rather than the main one, so the join timeout should let the main thread move on.
Is this possible though? Can I create a webdriver in a thread and return it? Any pointers appreciated.
Updates:
Just calling a function returned a webcontrol so that bodes well for what I'm trying to do.
newDriver = SpawnPhantomJS(dcap, service_args)
So I'm hoping it's just a syntax issue I have running this as a thread with a timeout.
This didn't do it however:
spawnthread = Thread(target=SpawnPhantomJS, args=[dcap, service_args])
spawnthread.start()
newDriver = spawnthread.join()
Wishful thinking there.
Thread pooling.
from multiprocessing.pool import ThreadPool
pool = ThreadPool(processes=1)
async_result = pool.apply_async(SpawnPhantomJS, (dcap, service_args))
newDriver = async_result.get(10)
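The get(10) call above raises multiprocessing.TimeoutError if the result is not ready within 10 seconds, so the caller needs to catch it. A minimal sketch of that pattern follows; call_with_timeout and slow_call are hypothetical names, with slow_call standing in for a constructor that stalls, such as SpawnPhantomJS from the question.

```python
import time
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool

def call_with_timeout(func, args=(), timeout=10.0):
    """Run func(*args) in a worker thread; return its result, or None on timeout."""
    pool = ThreadPool(processes=1)
    try:
        async_result = pool.apply_async(func, args)
        try:
            return async_result.get(timeout)
        except TimeoutError:
            return None     # the call did not finish within `timeout` seconds
    finally:
        pool.close()        # accept no further tasks on this pool

def slow_call(delay):
    # Stand-in for a constructor that may stall, e.g. webdriver.PhantomJS(...)
    time.sleep(delay)
    return "ready"

# Hypothetical usage with the function from the question:
#     newDriver = call_with_timeout(SpawnPhantomJS, (dcap, service_args), timeout=10)
#     if newDriver is None:
#         ...  # creation timed out; retry or give up
```

The timeout only stops the waiting. The worker thread keeps running the stalled call in the background, so a browser process that eventually starts may still need to be cleaned up.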
I am working on a web app with CherryPy that needs to access a few applications via COM.
Right now I create a new instance of the application with each request, which means each request waits 3 seconds for the application to start and 0.01 for the actual job.
I would like to start each COM application once and keep it alive and reuse it for a few seconds on the following requests because most of the time it is used by a burst of 5-10 ajax requests, then nothing for hours.
Is it possible to share a COM object across all the threads of a CherryPy application?
Here is the summary of a few experiments that show how it is working now on each request and how it does not work across threads.
The following code successfully starts and stops Excel:
>>> import pythoncom, win32com.client
>>> def start():
...     global xl
...     xl = win32com.client.Dispatch('Excel.Application')
...
>>> def stop():
...     global xl
...     xl.quit()
...     xl = None
...
>>> start()
>>> stop()
But the following code starts Excel and closes it after 3 seconds.
>>> import pythoncom, win32com.client, threading, time
>>> def start():
...     global xl
...     pythoncom.CoInitialize()
...     xl = win32com.client.Dispatch('Excel.Application')
...     time.sleep(3)
...
>>> threading.Thread(target=start).start()
I added the call to CoInitialize() otherwise the xl object would not work (see this post).
And I added the 3 second pause, so I could see on the task manager that the EXCEL.EXE process starts and is alive for 3 seconds.
Why does it die after the thread that started it ends?
I checked the documentation of CoInitialize(), but I couldn't understand if it is possible to get it to work in multithreaded environment.
If you want to use win32com in multiple threads, you need to do a little more work, because a COM object cannot be passed to another thread directly. Use CoMarshalInterThreadInterfaceInStream() and CoGetInterfaceAndReleaseStream() to pass the instance between threads:
import pythoncom, win32com.client, threading, time

def start():
    # Initialize
    pythoncom.CoInitialize()
    # Get instance
    xl = win32com.client.Dispatch('Excel.Application')
    # Create id
    xl_id = pythoncom.CoMarshalInterThreadInterfaceInStream(pythoncom.IID_IDispatch, xl)
    # Pass the id to the new thread
    thread = threading.Thread(target=run_in_thread, kwargs={'xl_id': xl_id})
    thread.start()
    # Wait for child to finish
    thread.join()

def run_in_thread(xl_id):
    # Initialize
    pythoncom.CoInitialize()
    # Get instance from the id
    xl = win32com.client.Dispatch(
        pythoncom.CoGetInterfaceAndReleaseStream(xl_id, pythoncom.IID_IDispatch)
    )
    time.sleep(5)

if __name__ == '__main__':
    start()
For more info see: https://mail.python.org/pipermail/python-win32/2008-June/007788.html
The answer from @Mauriusz Jamro ( https://stackoverflow.com/a/27966218/7733418 ) was really helpful. Just to add to it, also make sure that you call:
pythoncom.CoUninitialize()
at the end so that there is no memory leak. You can call it anywhere after using CoInitialize() and before your process ends.
Try using multiprocessing. Worked for me, after a long search.
from multiprocessing import Process
p = Process(target=test, args=())
p.start()
p.join()
I want to know how I can stop my program in the console with CTRL+C or something similar.
The problem is that there are two threads in my program. Thread one crawls the web and extracts some data, and thread two displays this data in a readable format for the user. Both parts share the same database. I run them like this:
from threading import Thread
import ResultsPresenter

def runSpider():
    Thread(target=initSpider).start()
    Thread(target=ResultsPresenter.runPresenter).start()

if __name__ == "__main__":
    runSpider()
How can I do that?
Ok so I created my own thread class :
import threading

class MyThread(threading.Thread):
    """Thread class with a stop() method. The thread itself has to check
    regularly for the stopped() condition."""

    def __init__(self):
        super(MyThread, self).__init__()
        self._stop = threading.Event()

    def stop(self):
        self._stop.set()

    def stopped(self):
        return self._stop.isSet()
OK so I will post here snippets of resultPresenter and crawler.
Here is the code of resultPresenter :
# configuration
DEBUG = False
DATABASE = database.__path__[0] + '/database.db'
app = Flask(__name__)
app.config.from_object(__name__)
app.config.from_envvar('CRAWLER_SETTINGS', silent=True)
def runPresenter():
    url = "http://127.0.0.1:5000"
    webbrowser.open_new(url)
    app.run()
There are also two more methods here that I omitted: one connects to the database and the other loads the HTML template to display the results. I repeat this until the conditions are met or the user stops the program (which is what I am trying to implement). There are two other methods as well: one gets the initial link from the command line and the other validates the arguments. If the arguments are invalid, I won't run the crawl() method.
Here is short version of crawler :
def crawl(initialLink, maxDepth):
    #here I am setting initial values, lists etc
    while not(depth >= maxDepth or len(pagesToCrawl) <= 0):
        #this is the main loop that stops when certain depth is
        #reached or there is nothing to crawl
        #Here I am popping urls from url queue, parse them and
        #insert interesting data into the database
    parser.close()
    sock.close()
    dataManager.closeConnection()
Here is the init file which starts those modules in threads:
import ResultsPresenter, MyThread, time, threading

def runSpider():
    MyThread.MyThread(target=initSpider).start()
    MyThread.MyThread(target=ResultsPresenter.runPresenter).start()

def initSpider():
    import Crawler
    import database.__init__
    import schemas.__init__
    import static.__init__
    import templates.__init__
    link, maxDepth = Crawler.getInitialLink()
    if link:
        Crawler.crawl(link, maxDepth)

killall = False

if __name__ == "__main__":
    global killall
    runSpider()
    while True:
        try:
            time.sleep(1)
        except:
            for thread in threading.enumerate():
                thread.stop()
            killall = True
            raise
Killing threads is not a good idea, since (as you already said) they may be performing crucial operations on the database. Instead, you can define a global flag that signals to the threads that they should finish what they are doing and quit.
import time

killall = False

if __name__ == "__main__":
    # killall is a module-level name, so no "global" statement is needed here
    runSpider()
    while True:
        try:
            time.sleep(1)
        except:
            # send a signal to threads, for example:
            killall = True
            raise
and in each thread you check in a similar loop whether the killall variable is set to True. If it is, close all activity and quit the thread.
EDIT
First of all: the exception is rather obvious. You are passing a target argument to __init__, but you didn't declare it in __init__. Do it like this:
class MyThread(threading.Thread):
    def __init__(self, *args, **kwargs):
        super(MyThread, self).__init__(*args, **kwargs)
        self._stop = threading.Event()
And secondly: you are not using my code. As I said: set the flag and check it in the thread. When I say "thread" I actually mean the handler, i.e. ResultsPresenter.runPresenter or initSpider. Show us the code of one of these and I'll try to show you how to handle stopping.
EDIT 2
Assuming that the code of crawl function is in the same file (if it is not, then you have to import killall variable), you can do something like this
def crawl(initialLink, maxDepth):
    global killall
    # Initialization.
    while not killall and not(depth >= maxDepth or len(pagesToCrawl) <= 0):
        # note the killall variable in while loop!
        # the other code
    parser.close()
    sock.close()
    dataManager.closeConnection()
So basically you just say: "Hey, thread, quit the loop now!". Optionally you can literally break a loop:
while not(depth >= maxDepth or len(pagesToCrawl) <= 0):
    # some code
    if killall:
        break
Of course it will still take some time before it quits (has to finish the loop and close parser, socket, etc.), but it should quit safely. That's the idea at least.
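The same idea can be expressed with a threading.Event instead of a bare global, which avoids having to import the flag into each module. This is a minimal sketch under that assumption: crawl_pages, run_until_interrupted, pages and visited are hypothetical stand-ins for the crawler loop and its data, not the asker's actual code.

```python
import threading

def crawl_pages(stop_event, pages, visited):
    """Process pages until the list is empty or stop_event is set."""
    while pages and not stop_event.is_set():
        visited.append(pages.pop(0))   # stand-in for fetching and parsing a page
        stop_event.wait(0.01)          # pauses briefly, returns early if the event is set
    # cleanup (closing the parser, socket and database) would go here

def run_until_interrupted(pages, visited):
    """Run the crawl in a thread and stop it cleanly on Ctrl+C."""
    stop_event = threading.Event()
    worker = threading.Thread(target=crawl_pages,
                              args=(stop_event, pages, visited))
    worker.start()
    try:
        while worker.is_alive():
            worker.join(0.1)   # short joins keep the main thread responsive to Ctrl+C
    except KeyboardInterrupt:
        stop_event.set()       # ask the worker to finish its current step and exit
        worker.join()
```

The worker still finishes the page it is on before it notices the flag, which matches the behaviour described above: it quits safely, not instantly.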
Try this:
ps aux | grep python
copy the id of the process you want to kill and:
kill -3 <process_id>
And in your code (adapted from here):
import signal
import sys

def signal_handler(signal, frame):
    print 'You killed me!'
    sys.exit(0)

signal.signal(signal.SIGQUIT, signal_handler)
print 'Kill me now'
signal.pause()
I'm running two Python threads (import threading). Both of them are blocked on an open() call; in fact they are trying to open named pipes in order to write to them, so it is normal for them to block until somebody tries to read from the named pipe.
In short, it looks like:
import threading

def f():
    open('pipe2', 'r')

if __name__ == '__main__':
    t = threading.Thread(target=f)
    t.start()
    open('pipe1', 'r')
When I type ^C, the open() in the main thread is interrupted (it raises IOError with errno == 4).
My problem is that thread t still waits, and I'd like to propagate the interruption so that it raises IOError too.
I found this in python docs:
"
... only the main thread can set a new signal handler, and the main thread will be the only one to receive signals (this is enforced by the Python signal module, even if the underlying thread implementation supports sending signals to individual threads). This means that signals can’t be used as a means of inter-thread communication. Use locks instead.
"
Maybe you should also check these docs:
exceptions.KeyboardInterrupt
library/signal.html
One other idea is to use select to read the pipe asynchronously in the threads. This works in Linux, not sure about Windows (it's not the cleanest, nor the best implementation):
#!/usr/bin/python
import threading
import os
import select

def f():
    f = os.fdopen(os.open('pipe2', os.O_RDONLY|os.O_NONBLOCK))
    finput = [ f ]
    foutput = []
    # here the pipe is scanned and whatever gets in will be printed out
    # ...as long as 'getout' is False
    while finput and not getout:
        fread, fwrite, fexcep = select.select(finput, foutput, finput)
        for q in fread:
            if q in finput:
                s = q.read()
                if len(s) > 0:
                    print s

if __name__ == '__main__':
    getout = False
    t = threading.Thread(target=f)
    t.start()
    try:
        open('pipe1', 'r')
    except:
        getout = True
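A variation on the select idea, as a minimal sketch: giving select a timeout lets the thread re-check a stop flag at regular intervals instead of blocking indefinitely. read_until_stopped, start_reader and poll_interval are hypothetical names, and the sketch uses a threading.Event in place of the getout global. Like the code above, it relies on select working on pipes, which holds on Linux but not on Windows.

```python
import os
import select
import threading

def read_until_stopped(read_fd, stop_event, chunks, poll_interval=0.1):
    """Append data read from read_fd to chunks until EOF or stop_event is set."""
    while not stop_event.is_set():
        # Wait at most poll_interval seconds so the stop flag is re-checked regularly.
        readable, _, _ = select.select([read_fd], [], [], poll_interval)
        if readable:
            data = os.read(read_fd, 4096)
            if not data:      # an empty read means the writer closed the pipe
                break
            chunks.append(data)

def start_reader(read_fd):
    """Start the reader in a thread; returns (thread, stop_event, chunks)."""
    stop_event = threading.Event()
    chunks = []
    thread = threading.Thread(target=read_until_stopped,
                              args=(read_fd, stop_event, chunks))
    thread.start()
    return thread, stop_event, chunks
```

The main thread can then call stop_event.set() in its except branch, and the reader exits within roughly one poll_interval.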
Basically, the problem is that the only way to get all instances of VLC is to search all non-named instances for the org.freedesktop.MediaPlayer Identity function and call it.
(Alternatively I could use the introspection API, but this doesn't seem to solve my problem.)
Unfortunately many programs simply do not respond to a D-Bus call, causing a long and costly timeout.
When this happens multiple times, it adds up.
Basically, the built-in timeout is excessively long.
If I can decrease the dbus timeout somehow that will solve my problem, but the ideal solution would be a way.
I got the idea that I could put each call to "Identity" inside a thread and kill the threads that take too long, but this does not seem to be recommended. Also, adding multithreading greatly increases the CPU load while not increasing the speed of the program all that much.
Here is (more or less) the code that I am trying to get to run quickly, which is currently painfully slow.
import dbus

bus = dbus.SessionBus()
dbus_proxy = bus.get_object('org.freedesktop.DBus', '/org/freedesktop/DBus')
names = dbus_proxy.ListNames()
for name in names:
    if name.startswith(':'):
        try:
            proxy = bus.get_object(name, '/')
            ident_method = proxy.get_dbus_method("Identity",
                dbus_interface="org.freedesktop.MediaPlayer")
            print ident_method()
        except dbus.exceptions.DBusException:
            pass
Easier than spawning a bunch of threads would be to make the calls to the different services asynchronously, providing a callback handler for when a result comes back or a D-Bus error occurs. All of the calls effectively happen in parallel, and your program can proceed as soon as it gets some positive results.
Here's a quick-and-dirty program that prints a list of all the services it finds. Note how quickly it gets all the positive results without having to wait for any timeouts from anything. In a real program you'd probably assign a do-nothing function to the error handler, since your goal here is to ignore the services that don't respond, but this example waits until it's heard from everything before quitting.
#! /usr/bin/env python
import dbus
import dbus.mainloop.glib
import functools
import glib

class VlcFinder (object):
    def __init__ (self, mainloop):
        self.outstanding = 0
        self.mainloop = mainloop
        bus = dbus.SessionBus ()
        dbus_proxy = bus.get_object ("org.freedesktop.DBus", "/org/freedesktop/DBus")
        names = dbus_proxy.ListNames ()
        # reuse the list fetched above instead of calling ListNames a second time
        for name in names:
            if name.startswith (":"):
                proxy = bus.get_object (name, "/")
                iface = dbus.Interface (proxy, "org.freedesktop.MediaPlayer")
                iface.Identity (reply_handler = functools.partial (self.reply_cb, name),
                                error_handler = functools.partial (self.error_cb, name))
                self.outstanding += 1

    def reply_cb (self, name, ver):
        print "Found {0}: {1}".format (name, ver)
        self.received_result ()

    def error_cb (self, name, msg):
        self.received_result ()

    def received_result (self):
        self.outstanding -= 1
        if self.outstanding == 0:
            self.mainloop.quit ()

if __name__ == "__main__":
    dbus.mainloop.glib.DBusGMainLoop (set_as_default = True)
    mainloop = glib.MainLoop ()
    finder = VlcFinder (mainloop)
    mainloop.run ()