Load a web page - python

I am trying to load a web page using PySide's QtWebKit module. According to the documentation (Elements of QWebView; QWebFrame::toHtml()), the following script should print the HTML of the Google Search Page:
from PySide import QtCore
from PySide import QtGui
from PySide import QtWebKit
# Needed if we want to display the webpage in a widget.
app = QtGui.QApplication([])
view = QtWebKit.QWebView(None)
view.setUrl(QtCore.QUrl("http://www.google.com/"))
frame = view.page().mainFrame()
print(frame.toHtml())
But alas it does not. All that is printed is the method's equivalent of a null response:
<html><head></head><body></body></html>
So I took a closer look at the setUrl documentation:
The view remains the same until enough data has arrived to display the new url.
This made me think that maybe I was calling the toHtml() method too soon, before a response has been received from the server. So I wrote a class that overrides the setUrl method, blocking until the loadFinished signal is triggered:
import time
class View(QtWebKit.QWebView):
    def __init__(self, *args, **kwargs):
        super(View, self).__init__(*args, **kwargs)
        self.completed = True
        self.loadFinished.connect(self.setCompleted)
    def setCompleted(self):
        self.completed = True
    def setUrl(self, url):
        self.completed = False
        super(View, self).setUrl(url)
        while not self.completed:
            time.sleep(0.2)
view = View(None)
view.setUrl(QtCore.QUrl("http://www.google.com/"))
frame = view.page().mainFrame()
print(frame.toHtml())
That made no difference at all. What am I missing here?
EDIT: Merely getting the HTML of a page is not my end game here. This is a simplified example of code that was not working the way I expected it to. Credit to Oleh for suggesting replacing time.sleep() with app.processEvents()

Copied from my other answer:
from PySide.QtCore import QObject, QUrl, Slot
from PySide.QtGui import QApplication
from PySide.QtWebKit import QWebPage, QWebSettings
qapp = QApplication([])
def load_source(url):
    page = QWebPage()
    page.settings().setAttribute(QWebSettings.AutoLoadImages, False)
    page.mainFrame().setUrl(QUrl(url))
    class State(QObject):
        src = None
        finished = False
        @Slot()
        def loaded(self, success=True):
            self.finished = True
            if self.src is None:
                self.src = page.mainFrame().toHtml()
    state = State()
    # Optional; reacts to DOM ready, which happens before a full load
    def js():
        page.mainFrame().addToJavaScriptWindowObject('qstate$', state)
        page.mainFrame().evaluateJavaScript('''
            document.addEventListener('DOMContentLoaded', qstate$.loaded);
        ''')
    page.mainFrame().javaScriptWindowObjectCleared.connect(js)
    page.mainFrame().loadFinished.connect(state.loaded)
    while not state.finished:
        qapp.processEvents()
    return state.src
load_source downloads the data from a URL and returns the HTML after modification by WebKit. It wraps Qt's event loop with its asynchronous events, and is a blocking function.
But you really should think about what you're doing. Do you actually need to invoke the engine and get the modified HTML? If you just want to download the HTML of some webpage, there are much, much simpler ways to do this.
Now, the problem with the code in your answer is that you don't let Qt do anything. There is no magic happening, no code running in the background. Qt is based on an event loop, and you never let it enter that loop. This is usually achieved by calling QApplication.exec_ or, as a workaround, by calling processEvents as shown in my code. You can replace time.sleep(0.2) with app.processEvents() and it might just work.
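To make the point about the event loop concrete, here is a minimal sketch in plain Python. It does not use Qt at all: ToyEventLoop, post, process_events and wait_until are hypothetical names invented for this illustration. The only claim is the general one made above, that a queued event is delivered when the loop is pumped and not while the caller sleeps.

```python
import time
from collections import deque


class ToyEventLoop:
    """Toy stand-in for a GUI event queue (an illustration, not Qt's implementation)."""

    def __init__(self):
        self._queue = deque()

    def post(self, callback):
        # Queue a callback; nothing runs until process_events() is called.
        self._queue.append(callback)

    def process_events(self):
        # Deliver everything queued so far, like one pump of the loop.
        while self._queue:
            self._queue.popleft()()


def wait_until(predicate, pump, timeout=1.0):
    """Block until predicate() is true, pumping events instead of sleeping."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            return False
        pump()
    return True


loop = ToyEventLoop()
state = {"finished": False}
loop.post(lambda: state.update(finished=True))  # plays the role of loadFinished

# Sleeping does not deliver the queued event.
time.sleep(0.05)
finished_after_sleep = state["finished"]

# Pumping the loop does.
finished_after_pump = wait_until(lambda: state["finished"], loop.process_events)
```

With a real QApplication, the pump argument would be app.processEvents, which is the substitution suggested above.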

Related

How can I monkey patch PyQT's QApplication.notify() to time events

In our PyQt application we want to time the duration of all Qt Events. Only in a special performance monitoring mode. Previously I subclassed QApplication and overrode the notify() method and that worked great. I wrote the data in chrome://tracing format and it was super helpful.
However when our application is run inside Jupyter there is a pre-existing QApplication instance. So I can't think of how to make it use my subclass.
Instead I tried monkey patching below, but my notify() is never called. I suspect notify() is a wrapped C++ method and it can't be monkey patched?
def monkey_patch_event_timing(app: QApplication):
    original_notify = app.notify
    def notify_with_timing(self, receiver, event):
        timer_name = _get_timer_name(receiver, event)
        # Time the event while we handle it.
        with perf.perf_timer(timer_name, "qt_event"):
            return original_notify(receiver, event)
    bound_notify = MethodType(notify_with_timing, app)
    # Check if we are already patched first.
    if not hasattr(app, '_myproject_event_timing'):
        print("Enabling Qt Event timing...")
        app.notify = bound_notify
        app._myproject_event_timing = True
Is there a way to monkey patch QApplication.notify or otherwise insert code somewhere that can time every Qt Event?
A possible solution is to remove the old QApplication with the help of sip and create a new one:
def monkey_patch_event_timing():
    app = QApplication.instance()
    if app is not None:
        import sip
        sip.delete(app)
    class MyApplication(QApplication):
        def notify(self, receiver, event):
            ret = QApplication.notify(self, receiver, event)
            print(ret, receiver, event)
            return ret
    app = MyApplication([])
    return app
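The question uses a perf.perf_timer helper and mentions chrome://tracing output, but shows neither. As a rough sketch only, a timer of that kind could look like the following. The names perf_timer, trace_events and dump_trace are assumptions for illustration and not the asker's actual code; the event fields follow the Chrome trace "complete event" shape (ph set to "X", with ts and dur in microseconds).

```python
import json
import os
import threading
import time
from contextlib import contextmanager

trace_events = []  # hypothetical in-memory store for recorded events


@contextmanager
def perf_timer(name, category):
    """Record one Chrome trace 'complete' event (ph='X') around a block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        end = time.perf_counter()
        trace_events.append({
            "name": name,
            "cat": category,
            "ph": "X",
            "ts": start * 1e6,            # start time in microseconds
            "dur": (end - start) * 1e6,   # duration in microseconds
            "pid": os.getpid(),
            "tid": threading.get_ident(),
        })


def dump_trace():
    # Serialize in the JSON object form that trace viewers accept.
    return json.dumps({"traceEvents": trace_events})


# Stand-in for timing one notify() call; the event name is made up.
with perf_timer("Paint:MyWidget", "qt_event"):
    sum(range(1000))

trace_json = dump_trace()
```

Because the event is appended in a finally clause, it is recorded even when the timed block returns early, as the return inside the with statement in the question does.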

Qt Signals/Slots in threaded Python

I'm having troubles using PyQt4 slots/signals.
I'm using PyLIRC and I'm listening for button presses on a remote. This part I have gotten to work outside of Qt. My problem comes when emitting the signal from the button listening thread and attempting to call a slot in the main thread.
My button listener is a QObject initialized like so:
buttonPressed = pyqtSignal(int)
def __init__(self):
    super(ButtonEvent, self).__init__()
    self.buttonPressed.connect(self.onButtonPressed)
def run(self):
    print 'running'
    while(self._isListening):
        s = pylirc.nextcode()
        if (s):
            print 'emitting'
            self.buttonPressed.emit(int(s[0]))
The onButtonPressed slot is internal to the button listener for testing purposes.
To move the button listener to another thread to do the work, I use the following:
event = ButtonEvent()
eventThread = QThread()
event.moveToThread(eventThread)
eventThread.started.connect(event.run)
Then in the main thread, I have my VideoTableController class that contains the slot in the main thread that doesn't get called. Inside of __init__ I have this:
class VideoTableController(QObject):
    def __init__(self, buttonEvent):
        buttonEvent.buttonPressed.connect(self.onButtonPressed)
Where onButtonPressed in this case is:
@pyqtSlot(int)
def onButtonPressed(self, bid):
    print 'handling button press'
    if bid not in listenButtons: return
    { ButtonEnum.KEY_LEFT : self.handleBack,
    #...
So when I start the event thread, it starts listening properly. When I press a button on the remote, the onButtonPressed slot internal to the ButtonEvent class is properly called, but the slot within VideoTableController, which resides in the main thread, is not called. I started my listening thread after connecting the slot to the signal, and I tested doing it the other way around, but to no avail.
I have looked around, but I haven't been able to find anything. I changed over to using QObject after reading You're doing it wrong. Any help with this is greatly appreciated. Let me know if you need anything else.
EDIT: Thanks for the responses! Here is a big chunk of code for you guys:
ButtonEvent (This class uses singleton pattern, excuse the poor coding because I'm somewhat new to this territory of Python also):
import pylirc
from PyQt4.QtCore import QObject, pyqtSignal, QThread, pyqtSlot
from PyQt4 import QtCore
class ButtonEvent(QObject):
    """
    A class used for firing button events
    """
    _instance = None
    _blocking = 0
    _isListening = False
    buttonPressed = pyqtSignal(int)
    def __new__(cls, configFileName="~/.lircrc", blocking=0, *args, **kwargs):
        if not cls._instance:
            cls._instance = super(ButtonEvent, cls).__new__(cls, args, kwargs)
            cls._blocking = blocking
            if not pylirc.init("irexec", configFileName, blocking):
                raise RuntimeError("Problem initializing PyLIRC")
            cls._isListening = True
        return cls._instance
    def __init__(self):
        """
        Creates an instance of the ButtonEvent class
        """
        super(ButtonEvent, self).__init__()
        self.buttonPressed.connect(self.button)
    ### init
    def run(self):
        print 'running'
        while(self._isListening):
            s = pylirc.nextcode()
            if (s):
                print 'emitting'
                self.buttonPressed.emit(int(s[0]))
    def stopListening(self):
        print 'stopping'
        self._isListening = False
    @pyqtSlot(int)
    def button(self, bid):
        print 'Got ' + str(bid)
def setupAndConnectButtonEvent(configFileName="~/.lircrc", blocking=0):
    """
    Initializes the ButtonEvent and puts it on a QThread.
    Returns the QThread it is running on.
    Does not start the thread
    """
    event = ButtonEvent().__new__(ButtonEvent, configFileName, blocking)
    eventThread = QThread()
    event.moveToThread(eventThread)
    eventThread.started.connect(event.run)
    return eventThread
Here is the VideoTableController:
from ControllerBase import ControllerBase
from ButtonEnum import ButtonEnum
from ButtonEvent import ButtonEvent
from PyQt4.QtCore import pyqtSlot
from PyQt4 import QtCore
class VideoTableController(ControllerBase):
    listenButtons = [ ButtonEnum.KEY_LEFT,
                      ButtonEnum.KEY_UP,
                      ButtonEnum.KEY_OK,
                      ButtonEnum.KEY_RIGHT,
                      ButtonEnum.KEY_DOWN,
                      ButtonEnum.KEY_BACK ]
    def __init__(self, model, view, parent=None):
        super(VideoTableController, self).__init__(model, view, parent)
        self._currentRow = 0
        buttonEvent = ButtonEvent()
        buttonEvent.buttonPressed.connect(self.onButtonPressed)
        self.selectRow(self._currentRow)
    @pyqtSlot(int)
    def onButtonPressed(self, bid):
        print 'handling button press'
        if bid not in listenButtons: return
        { ButtonEnum.KEY_LEFT : self.handleBack,
          ButtonEnum.KEY_UP : self.handleUp,
          ButtonEnum.KEY_OK : self.handleOk,
          ButtonEnum.KEY_RIGHT : self.handleRight,
          ButtonEnum.KEY_DOWN : self.handleDown,
          ButtonEnum.KEY_BACK : self.handleBack,
        }.get(bid, None)()
And here is my startup script:
import sys
from PyQt4 import QtCore, QtGui
from ui_main import Ui_MainWindow
from VideoTableModel import VideoTableModel
from VideoTableController import VideoTableController
from ButtonEvent import *
class Main(QtGui.QMainWindow):
    def __init__(self, parent=None):
        QtGui.QWidget.__init__(self, parent)
        self.ui = Ui_MainWindow()
        self.ui.setupUi(self)
        self.buttonEvent = ButtonEvent()
        self.bEventThread = setupAndConnectButtonEvent()
        model = VideoTableModel("/home/user/Videos")
        self.ui.videoView.setModel(model)
        controller = VideoTableController(model, self.ui.videoView)
        self.bEventThread.start()
    def closeEvent(self, event):
        self.buttonEvent.stopListening()
        self.bEventThread.quit()
        event.accept()
if __name__ == '__main__':
    app = QtGui.QApplication(sys.argv)
    buttonEvent = ButtonEvent()
    myapp = Main()
    myapp.show()
    sys.exit(app.exec_())
It turns out I was just making a foolish Python mistake. The signal was being emitted correctly, and the event loop was running properly in all threads. My problem was that in my Main.__init__ function I made a VideoTableController object, but I did not keep a copy in Main, so my controller did not persist, meaning the slot also left. When changing it to
self.controller = VideoTableController(model, self.ui.videoView)
Everything stayed around and the slots were called properly.
Moral of the story: it's not always a misuse of the library, it may be a misuse of the language.
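The lifetime problem described above can be reproduced without Qt. The sketch below is a toy model: ToySignal, Controller and Main are hypothetical classes, and the assumption that the signal holds its receivers weakly is made only to mimic the behaviour reported here (the slot vanishes once the controller is no longer referenced). It is not PyQt's actual implementation.

```python
import gc
import weakref


class ToySignal:
    """Toy signal that keeps only weak references to its receivers."""

    def __init__(self):
        self._slots = []

    def connect(self, bound_method):
        # WeakMethod does not keep the receiver object alive.
        self._slots.append(weakref.WeakMethod(bound_method))

    def emit(self, *args):
        delivered = 0
        for ref in self._slots:
            slot = ref()
            if slot is not None:  # receiver still exists
                slot(*args)
                delivered += 1
        return delivered


class Controller:
    def __init__(self, signal, log):
        self.log = log
        signal.connect(self.on_button_pressed)

    def on_button_pressed(self, bid):
        self.log.append(bid)


class Main:
    def __init__(self, signal, log, keep):
        controller = Controller(signal, log)
        if keep:
            # Storing the controller on self is what keeps the slot alive.
            self.controller = controller


log = []

sig_a = ToySignal()
main_a = Main(sig_a, log, keep=False)  # controller dropped when __init__ returns
gc.collect()
delivered_without_ref = sig_a.emit(1)

sig_b = ToySignal()
main_b = Main(sig_b, log, keep=True)   # controller kept on main_b
delivered_with_ref = sig_b.emit(2)
```

Only the second emit reaches a slot, which mirrors the fix of assigning the controller to self.controller.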
It seems that the quickest workaround would be to change your ButtonEvent code here:
...
    def run(self):
        print 'running'
        while(self._isListening):
            s = pylirc.nextcode()
            if (s):
                print 'emitting'
                self.buttonPressed.emit(int(s[0]))
...
to this:
    @pyqtSlot()
    def run(self):
        print 'running'
        while(self._isListening):
            s = pylirc.nextcode()
            if (s):
                print 'emitting'
                self.buttonPressed.emit(int(s[0]))
The short explanation to this issue is that PyQt uses a proxy internally, and this way you can make sure to avoid that. After all, your method is supposed to be a slot based on the connect statement.
Right... Now, I would encourage you to give some consideration to your current software design though. It seems that you are using a class in a dedicated thread for handling Qt button events. It may be a good idea, I am not sure, but I have not seen this before at least.
I think you could get rid of that class altogether in the future with a better approach where you connect from the push button signals directly to your handler slot. That would not be the run "slot" in your dedicated thread, however, but the canonical handler.
It is not a good design practice to introduce more complexity, especially in multi-threaded applications, than needed. Hope this helps.
I haven't actually tested this (because I don't have access to your compiled UI file), but I'm fairly certain I'm right.
The run method of your ButtonEvent (which is supposed to be running in a thread) is likely running in the main thread (you can test this by importing the Python threading module and adding the line print threading.current_thread().name). To solve this, decorate your run method with @pyqtSlot()
If that doesn't solve it, add the above print statement to various places until you find something running in the main thread that shouldn't be. The linked SO answer below will likely contain the answer to fix it.
For more details, see this answer: https://stackoverflow.com/a/20818401/1994235
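The thread-name check suggested above can be tried in isolation. This is a plain-Python sketch using the standard threading module instead of QThread; report_thread and the thread name "listener" are made up for the illustration.

```python
import threading


def report_thread(label, log):
    # Record which thread actually executes this call.
    log.append((label, threading.current_thread().name))


log = []

# Called directly: runs in whatever thread the caller is in.
report_thread("direct call", log)

# Called as a thread target: runs in the new thread.
worker = threading.Thread(
    target=report_thread, args=("worker call", log), name="listener"
)
worker.start()
worker.join()
```

If a method that should run in the worker reports the caller's thread name instead, it is being invoked directly and not through the thread.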

HTML page vastly different when using a headless webkit implementation using PyQT

I was under the impression that using a headless browser implementation of webkit using PyQT will automatically get me the html code for each URL even with heavy JS code in it. But I am only seeing it partially. I am comparing with the page I get when I save the page from the firefox window.
I am using the following code -
class JabbaWebkit(QWebPage):
    # 'html' is a class variable
    def __init__(self, url, wait, app, parent=None):
        super(JabbaWebkit, self).__init__(parent)
        JabbaWebkit.html = ''
        if wait:
            QTimer.singleShot(wait * SEC, app.quit)
        else:
            self.loadFinished.connect(app.quit)
        self.mainFrame().load(QUrl(url))
    def save(self):
        JabbaWebkit.html = self.mainFrame().toHtml()
    def userAgentForUrl(self, url):
        return USER_AGENT
def get_page(url, wait=None):
    # here is the trick how to call it several times
    app = QApplication.instance() # checks if QApplication already exists
    if not app: # create QApplication if it doesnt exist
        app = QApplication(sys.argv)
    #
    form = JabbaWebkit(url, wait, app)
    app.aboutToQuit.connect(form.save)
    app.exec_()
    return JabbaWebkit.html
Can someone see anything obviously wrong with the code?
After running the code through a few URLs, here is one I found that shows the problems I am running into quite clearly - http://www.chilis.com/EN/Pages/menu.aspx
Thanks for any pointers.
The page has AJAX code. When it finishes loading, it still needs some time to update the page via AJAX, but your code quits as soon as the load finishes.
You should add some code like this to wait for a while and process events in WebKit:
for i in range(200): #wait 2 seconds
    app.processEvents()
    time.sleep(0.01)
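A variant of the same idea, sketched under the assumption that a deadline is preferable to a fixed iteration count, since each call that processes events takes a variable amount of time. pump_events is a hypothetical helper; the callable passed in would be app.processEvents in the situation above, and a plain lambda stands in for it here so the sketch runs without Qt.

```python
import time


def pump_events(process_events, seconds, interval=0.01):
    """Call process_events() repeatedly for roughly `seconds` seconds."""
    deadline = time.monotonic() + seconds
    calls = 0
    while time.monotonic() < deadline:
        process_events()
        calls += 1
        time.sleep(interval)  # yield briefly between pumps
    return calls


# Stand-in for app.processEvents: just count how often it is invoked.
ticks = []
calls = pump_events(lambda: ticks.append(1), seconds=0.05)
```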

PySide wait for signal from main thread in a worker thread

I decided to add a GUI to one of my scripts. The script is a simple web scraper. I decided to use a worker thread as downloading and parsing the data can take a while. I decided to use PySide, but my knowledge of Qt in general is quite limited.
As the script is supposed to wait for user input upon coming across a captcha I decided it should wait until a QLineEdit fires returnPressed and then send its content to the worker thread so it can send it for validation. That should be better than busy-waiting for the return key to be pressed.
It seems that waiting for a signal isn't as straightforward as I thought it would be and after searching for a while I came across several solutions similar to this. Signaling across threads and a local event loop in the worker thread make my solution a bit more complicated though.
After tinkering with it for several hours it still won't work.
What is supposed to happen:
Download data until referred to captcha and enter a loop
Download captcha and display it to the user, start QEventLoop by calling self.loop.exec_()
Exit QEventLoop by calling loop.quit() in a worker thread's slot which is connected via self.line_edit.returnPressed.connect(self.worker.stop_waiting) in the main_window class
Validate captcha and loop if validation fails, otherwise retry the last url which should be downloadable now, then move on with the next url
What happens:
...see above...
Exiting QEventLoop doesn't work. self.loop.isRunning() returns False after calling its exit(). self.isRunning returns True, as such the thread didn't seem to die under odd circumstances. Still the thread halts at the self.loop.exec_() line. As such the thread is stuck executing the event loop even though the event loop tells me it is not running anymore.
The GUI responds, as do the slots of the worker thread class. I can see the text being sent to the worker thread, the status of the event loop and the thread itself, but nothing after the above-mentioned line gets executed.
The code is a bit convoluted, as such I add a bit of pseudo-code-python-mix leaving out the unimportant:
class MainWindow(...):
    # couldn't find a way to send the text with the returnPressed signal, so I
    # added a helper signal, seems to work though. Doesn't work in the
    # constructor, might be a PySide bug?
    helper_signal = PySide.QtCore.Signal(str)
    def __init__(self):
        # ...setup...
        self.worker = WorkerThread()
        self.line_edit.returnPressed.connect(self.helper_slot)
        self.helper_signal.connect(self.worker.stop_waiting)
    @PySide.QtCore.Slot()
    def helper_slot(self):
        self.helper_signal.emit(self.line_edit.text())
class WorkerThread(PySide.QtCore.QThread):
    wait_for_input = PySide.QtCore.QEventLoop()
    def run(self):
        # ...download stuff...
        for url in list_of_stuff:
            self.results.append(get(url))
    @PySide.QtCore.Slot(str)
    def stop_waiting(self, text):
        self.solution = text
        # this definitely gets executed upon pressing return
        self.wait_for_input.exit()
    # a wrapper for requests.get to handle captcha
    def get(self, *args, **kwargs):
        result = requests.get(*args, **kwargs)
        while result.history: # redirect means captcha
            # ...parse and extract captcha...
            # ...display captcha to user via not shown signals to main thread...
            # wait until stop_waiting stops this event loop and as such the user
            # has entered something as a solution
            self.wait_for_input.exec_()
            # ...this part never gets executed, unless I remove the event
            # loop...
            post = { # ...whatever data necessary plus solution... }
            # send the solution
            result = requests.post('http://foo.foo/captcha_url', data=post)
        # no captcha was there, return result
        return result
frame = MainWindow()
frame.show()
frame.worker.start()
app.exec_()
What you are describing looks ideal for QWaitCondition.
Simple example:
import sys
from PySide import QtCore, QtGui
waitCondition = QtCore.QWaitCondition()
mutex = QtCore.QMutex()
class Main(QtGui.QMainWindow):
    def __init__(self, parent=None):
        super(Main, self).__init__()
        self.text = QtGui.QLineEdit()
        self.text.returnPressed.connect(self.wakeup)
        self.worker = Worker(self)
        self.worker.start()
        self.setCentralWidget(self.text)
    def wakeup(self):
        waitCondition.wakeAll()
class Worker(QtCore.QThread):
    def __init__(self, parent=None):
        super(Worker, self).__init__(parent)
    def run(self):
        print "initial stuff"
        mutex.lock()
        waitCondition.wait(mutex)
        mutex.unlock()
        print "after returnPressed"
if __name__=="__main__":
    app = QtGui.QApplication(sys.argv)
    m = Main()
    m.show()
    sys.exit(app.exec_())
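The same wait-and-wake pattern can be sketched with the standard library's threading.Condition, which plays a role comparable to the QWaitCondition and QMutex pair. This is an analogue for illustration, not a drop-in replacement: InputGate, provide and wait_for_input are hypothetical names. The sketch also passes the entered text to the worker and keeps a ready flag, so a wake-up that arrives before the worker starts waiting is not lost.

```python
import threading


class InputGate:
    """Lets a worker thread sleep until another thread supplies a value."""

    def __init__(self):
        self._cond = threading.Condition()
        self._value = None
        self._ready = False

    def provide(self, value):
        # Called from the GUI thread, e.g. in a returnPressed handler.
        with self._cond:
            self._value = value
            self._ready = True
            self._cond.notify_all()

    def wait_for_input(self, timeout=None):
        # Called from the worker thread; returns None on timeout.
        with self._cond:
            if not self._cond.wait_for(lambda: self._ready, timeout):
                return None
            self._ready = False  # reset so the gate can be reused
            return self._value


gate = InputGate()
results = []

worker = threading.Thread(
    target=lambda: results.append(gate.wait_for_input(timeout=5))
)
worker.start()
gate.provide("captcha-solution")  # stands in for the text typed by the user
worker.join()
```

The worker blocks without busy-waiting and resumes with the supplied text, whichever of the two threads reaches the gate first.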
The slot is executed inside the thread which created the QThread, and not in the thread that the QThread controls.
You need to move a QObject to the thread and connect its slot to the signal, and that slot will be executed inside the thread:
class SignalReceiver(QtCore.QObject):
    def __init__(self):
        super(SignalReceiver, self).__init__()
        self.eventLoop = QtCore.QEventLoop(self)
    @PySide.QtCore.Slot(str)
    def stop_waiting(self, text):
        self.text = text
        self.eventLoop.exit()
    def wait_for_input(self):
        self.eventLoop.exec_()
        return self.text
class MainWindow(...):
    ...
    def __init__(self):
        ...
        self.helper_signal.connect(self.worker.signalReceiver.stop_waiting)
class WorkerThread(PySide.QtCore.QThread):
    def __init__(self):
        super(WorkerThread, self).__init__()
        self.signalReceiver = SignalReceiver()
        # After the following call the slots will be executed in the thread
        self.signalReceiver.moveToThread(self)
    def get(self, *args, **kwargs):
        result = requests.get(*args, **kwargs)
        while result.history:
            ...
            self.result = self.signalReceiver.wait_for_input()

pyQT QNetworkManager and ProgressBars

I'm trying to code something that downloads a file from a webserver and saves it, showing the download progress in a QProgressBar.
Now, there are ways to do this in regular Python and it's easy. Problem is that it locks the refresh of the progressBar. Solution is to use PyQT's QNetworkManager class. I can download stuff just fine with it, I just can't get the setup to show the progress on the progressBar. Here's an example:
class Form(QDialog):
    def __init__(self,parent=None):
        super(Form,self).__init__(parent)
        self.progressBar = QProgressBar()
        self.reply = None
        layout = QHBoxLayout()
        layout.addWidget(self.progressBar)
        self.setLayout(layout)
        self.manager = QNetworkAccessManager(self)
        self.connect(self.manager,SIGNAL("finished(QNetworkReply*)"),self.replyFinished)
        self.Down()
    def Down(self):
        address = QUrl("http://stackoverflow.com") #URL from the remote file.
        self.manager.get(QNetworkRequest(address))
    def replyFinished(self, reply):
        self.connect(reply,SIGNAL("downloadProgress(int,int)"),self.progressBar, SLOT("setValue(int)"))
        self.reply = reply
        self.progressBar.setMaximum(reply.size())
        alltext = self.reply.readAll()
        #print alltext
        #print alltext
    def updateBar(self, read,total):
        print "read", read
        print "total",total
        #self.progressBar.setMinimum(0)
        #self.progressBar.setMask(total)
        #self.progressBar.setValue(read)
In this case, my method "updateBar" is never called... any ideas?
Well, you haven't connected any of the signals to your updateBar() method.
Change
def replyFinished(self, reply):
    self.connect(reply,SIGNAL("downloadProgress(int,int)"),self.progressBar, SLOT("setValue(int)"))
to
def replyFinished(self, reply):
    # Qt declares this signal as downloadProgress(qint64,qint64)
    self.connect(reply,SIGNAL("downloadProgress(qint64,qint64)"),self.updateBar)
Note that in Python you don't have to explicitly use the SLOT() syntax; you can just pass the reference to your method or function.
Update:
I just wanted to point out that if you want to use a Progress bar in any situation where your GUI locks up during processing, one solution is to run your processing code in another thread so your GUI receives repaint events. Consider reading about the QThread class, in case you come across another reason for a progress bar that does not have a pre-built solution for you.
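One detail a slot like updateBar has to handle: QNetworkReply's downloadProgress signal carries two values, bytes received and bytes total, and Qt reports a total of -1 when the size is unknown. Note also that a connection made inside the finished handler comes too late to see any progress, so it would normally be made on the reply object returned by get(). The helper below is a hypothetical sketch of the arithmetic only and runs without Qt.

```python
def progress_percent(bytes_received, bytes_total):
    """Map a (received, total) pair to 0..100, or None if the total is unknown."""
    if bytes_total <= 0:
        # Qt reports -1 when the total size is not known.
        return None
    # Clamp so a misreported count can never exceed 100 percent.
    return min(bytes_received, bytes_total) * 100 // bytes_total


quarter = progress_percent(50, 200)
unknown = progress_percent(1024, -1)
done = progress_percent(200, 200)
```

When the result is None, one option is to show the bar in its busy state (minimum and maximum both 0) instead of a percentage.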
