spynner doesn't load XHR data - python

I'm building a script to monitor a reporting service. Depending on how long it takes to process the report, the report either appears in the HTML or arrives via XMLHttpRequest.
As a tool to check the page I want to use spynner, which works perfectly for HTML, but I can't seem to get it to work when the data comes via XHR.
The code for the test is the following:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__docformat__ = 'restructuredtext en'
from time import sleep
from spynner import browser
import pyquery
from PyQt4.QtCore import QUrl
from PyQt4.QtNetwork import QNetworkRequest, QNetworkAccessManager
from PyQt4.QtCore import QByteArray
def load_page(br):
    ret = br.load_jquery(True)
    print ret
    return 'Japan' in br.html
br = browser.Browser(
    debug_level=4
)
br.load('https://foobar.eu/newton/cgi-bin/cognos.cgi')
br.create_webview()
br.show()
#br.load("https://foobar.eu/newton/cgi-bin/cognos.cgi?b_action=xts.run&m=portal/cc.xts&m_folder=iA37B5BBC0615469DA37767D2B6F1DCF1")
#br.browse()
res = br.load("https://foobar.eu:443/newton/cgi-bin/cognos.cgi?b_action=cognosViewer&ui.action=run&ui.object=/content/folder[#name='DMA Admin Zone']/folder[#name='02. Performance Benchmark Module']/folder[#name='1. Reports']/report[#name='CQM_Test_3_HTML_Heavy_Local_Processing_Final']&ui.name=CQM_Test_3_HTML_Heavy_Local_Processing_Final&run.outputFormat=&run.prompt=true", 1, wait_callback=load_page)
d = str(pyquery.PyQuery(br.html))
if d.find("Japan") > -1:
    print 'We discovered Japan!'
else:
    print 'Japan is nowhere to be seen!'
sleep(10)
The URL in the comments is a page which contains a link to the report. When I click the report link by hand, the report works (via XHR). However, I can't seem to get it to work via scripting.
The br.load_jquery always returns None.
As a help I have added part of the spynner debug trace when I click the link by hand: http://fpaste.org/97583/13987135/
In Firebug I can clearly see the XHR response containing the string 'Japan'.
What am I missing?

Apparently, replacing the load_page function with the following code makes it work:
def load_page(br):
    br.wait(5)
    return 'Japan' in br.html
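
A fixed br.wait(5) only helps if the XHR finishes within five seconds. A possible alternative is to poll until the content shows up or a timeout expires. The sketch below is library-agnostic: wait_until, its parameters, and the fake page state are hypothetical names for illustration, not part of spynner. With spynner, the predicate would presumably be something like `lambda: 'Japan' in br.html`, and the pause between polls would likely need to keep the Qt event loop running (for example by passing br.wait as `pause`) rather than using time.sleep.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1, pause=time.sleep):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Hypothetical helper, not part of spynner. `pause` is the function used
    to wait between polls; it defaults to time.sleep.
    """
    deadline = time.time() + timeout
    while True:
        if predicate():
            return True
        if time.time() >= deadline:
            return False
        pause(interval)


# Stand-in for a page whose content arrives late (simulates the XHR).
page = {"html": "<html></html>", "polls": 0}


def japan_loaded():
    page["polls"] += 1
    if page["polls"] == 3:  # pretend the XHR response lands on the third poll
        page["html"] = "<html>Japan</html>"
    return "Japan" in page["html"]


found = wait_until(japan_loaded, timeout=2.0, interval=0.01)
print(found)
```

Compared with a fixed wait, this returns as soon as the data is present and fails explicitly (returns False) when it never arrives.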

Related

Python: Get pywebview Current URL

I am using the pywebview library to open a page that will redirect the user to another url. What I would like to do is get the URL the user is directed to.
my code so far:
import urllib.request
import urllib.parse
import webview
import threading
import time
def openwebview():
    time.sleep(1)
    page = webview.create_window("URL_that_redirects_user")
def geturl():
    #what goes here?
    pass
t = threading.Thread(target = openwebview)
t.start()
I am using Windows, thanks!
Author of pywebview here. There is no way to get the current URL. You have to dig into the underlying webview to get the URL.
Thanks for the suggestion, I will look into introducing this feature.
Now you can do it:
def geturl():
    print(webview.get_current_url())
See here:
https://github.com/r0x0r/pywebview/blob/master/examples/get_current_url.py

web browser created by spynner not responding

I am trying to use spynner for web scraping. Below I use www.google.com as an example: I want to automatically search for "Barack Obama" using spynner. However, the web browser created by spynner stops responding, and the search string ("Barack Obama") is not filled into the search box (you will see this if you run the code below yourself).
import spynner
browser = spynner.Browser()
browser.show()
browser.load("https://www.google.com")
browser.wait_page_load()
browser.fill("input[name=q]", "Barack Obama")
browser.click("input[name=btnK]")
The input fields are identified correctly in my code; you can check for yourself. So why is this not working?
Try this code snippet. I used Qt:
import spynner
from PyQt4.QtCore import Qt
b = spynner.Browser()
b.show()
b.load("http://www.google.com")
b.wk_fill('input[name=q]', 'soup')
b.sendKeys("input[name=q]",[Qt.Key_Enter])
b.browse()

My Python script doesn't give me an error or show any output

I'm creating a simple transit Twitter bot which posts a tweet to my API, then grabs the result to later reply with an answer on travel times and such. All the magic is on the server side, and this code should work just fine. Here's how:
A user composes like the tweet below:
#kollektiven Sundsvall Navet - Ljustadalen
My script removes the #kollektiven from the tweet and sends the rest, Sundsvall Navet - Ljustadalen, to our API. The API should then return JSON to the script. The script should later reply to you with an answer like this:
#jackbillstrom Sundsvall busstation Navet (2014-01-08 20:45) till Ljustadalen centrum (Sundsvall kn) (2014-01-08 20:59)
But it doesn't. I'm using this code from github called spritzbot. I edited the extensions/hello.py to look like the one below:
# -*- coding: utf-8 -*-
import json, urllib2, os
os.system("clear")
def process_mention(status, settings):
    print status.user.screen_name,':', status.text.encode('utf-8')
    urlencode = status.text.lower().replace(" ","%20") # URL-encoding
    tweet = urlencode.strip('#kollektiven ')
    try:
        call = "http://xn--datorkraftfrvrlden-xtb17a.se/kollektiven/proxy.php?input="+tweet # Endpoint
        endpoint = urllib2.urlopen(call) # GET-Request to API endpoint
        data = json.load(endpoint) # Load JSON
        answer = data['proxyOutput'] # The answer from the API
        return dict(response=str(answer)) # Posts answer tweet
    except:
        return dict(response="Error, kontakta #jackbillstrom") # Error message
What is causing this problem? And why? I made some changes before I came to this revision, and it worked back then.
You need:
if __name__ == '__main__':
    process_mention(...)
...
You're not calling process_mention anywhere, just defining it.
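
Separately from the missing call, one line in the question's code may not do what its comment suggests: str.strip('#kollektiven ') removes any of those characters from both ends of the string, not the literal prefix, and by that point the spaces have already been replaced with %20. Below is a minimal sketch of removing the tag first and URL-encoding afterwards. It is written for Python 3 (urllib.parse.quote; in Python 2 the equivalent is urllib.quote), and the helper name build_query is hypothetical.

```python
from urllib.parse import quote

TAG = "#kollektiven"


def build_query(text, tag=TAG):
    """Drop a leading tag from the tweet text, then percent-encode the rest."""
    text = text.strip()
    if text.lower().startswith(tag):
        text = text[len(tag):]
    return quote(text.strip().lower())


# strip() treats its argument as a set of characters, so it also eats
# leading/trailing letters that happen to appear in the tag.
print("#kollektiven navet".strip("#kollektiven "))
print(build_query("#kollektiven Sundsvall Navet - Ljustadalen"))
```

Using quote also covers characters other than spaces, such as the Swedish letters that are likely to appear in stop names.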

Spynner crash python

I'm building a Django app and I'm using Spynner for web crawling. I have this problem and I hope someone can help me.
I have this function in the module "crawler.py":
import spynner
def crawling_js(url):
    br = spynner.Browser()
    br.load(url)
    text_page = br.html
    br.close  # (*)
    return text_page
(*) I tried with br.close() too
In another module (e.g. "import.py") I call the function this way:
from crawler import crawling_js
l_url = ["https://www.google.com/", "https://www.tripadvisor.com/", ...]
for url in l_url:
    mytextpage = crawling_js(url)
    # ... parse mytextpage ...
When I pass the first URL to the function everything works, but when I pass the second URL Python crashes. The crash happens on this line: br.load(url). Can someone help me? Thanks a lot.
I have:
Django 1.3
Python 2.7
Spynner 1.1.0
PyQt4 4.9.1
Why do you need to instantiate br = spynner.Browser() and close it every time you call crawling_js()? In a loop this will use a lot of resources, which I think is the reason it crashes. Think of it like this: br is a browser instance, so you can make it browse any number of websites without closing and reopening it. Adjust your code this way:
import spynner
br = spynner.Browser() #you open it only once.
def crawling_js(url):
    br.load(url)
    text_page = br._get_html() #_get_html() to make sure you get the updated html
    return text_page
Then, if you insist on closing br later, you simply do:
from crawler import crawling_js , br
l_url = ["https://www.google.com/", "https://www.tripadvisor.com/", ...]
for url in l_url:
    mytextpage = crawling_js(url)
    # ... parse mytextpage ...
br.close()
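
The reuse pattern in this answer can be sketched without spynner at all. The example below uses a hypothetical FakeBrowser class as a stand-in for spynner.Browser (its load, html, and close members only mimic the calls used above). It creates the browser once, reuses it for every URL, and closes it in a finally block so cleanup happens even if a page fails to load.

```python
class FakeBrowser(object):
    """Hypothetical stand-in for spynner.Browser, for illustration only."""

    instances = 0  # counts how many browsers were created

    def __init__(self):
        FakeBrowser.instances += 1
        self.html = ""
        self.closed = False

    def load(self, url):
        self.html = "<html>%s</html>" % url

    def close(self):
        self.closed = True


def crawl_all(urls, browser_factory=FakeBrowser):
    """Load every URL with one shared browser and return the pages."""
    br = browser_factory()  # opened only once
    pages = []
    try:
        for url in urls:
            br.load(url)
            pages.append(br.html)
    finally:
        br.close()  # always closed, even if a load raises
    return pages


pages = crawl_all(["https://www.google.com/", "https://www.tripadvisor.com/"])
print(len(pages), FakeBrowser.instances)
```

Passing the browser class in as browser_factory keeps the loop testable without Qt; in real use one would presumably pass spynner.Browser instead.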

mechanize can't login python

I'm making an auto-login script using mechanize in Python.
I have used mechanize before with no problem, but I couldn't make it work on this site, www.gmarket.co.kr.
Whenever I try to log in, the login page is always returned, even with the correct gmarket ID and password. I can't log in, and I saw this suspicious message:
"<script language=javascript>top.location.reload();</script>"
I think this is related to my problem, but I don't know exactly how to handle it.
Here is sample id and pass for login test
id: tgi177 pass: tk1047
If anyone can help, I would much appreciate it. Thanks in advance.
CODE:
# -*- coding: cp949 -*-
from lxml.html import parse, fromstring
import sys,os
import mechanize, urllib
import cookielib
import re
from BeautifulSoup import BeautifulSoup,BeautifulStoneSoup,Tag
try:
    params = urllib.urlencode({'command':'login',
        'url':'http%3A%2F%2Fwww.gmarket.co.kr%2F',
        'member_type':'mem',
        'member_yn':'Y',
        'login_id':'tgi177',
        'image1.x':'31',
        'image1.y':'26',
        'passwd':'tk1047',
        'buyer_nm':'',
        'buyer_tel_no1':'',
        'buyer_tel_no2':'',
        'buyer_tel_no3':''
    })
    rq = mechanize.Request("http://www.gmarket.co.kr/challenge/login.asp")
    rs = mechanize.urlopen(rq)
    data = rs.read()
    logged_in = r'input_login_check_value' in data
    if logged_in:
        print ' login success !'
        rq = mechanize.Request("http://www.gmarket.co.kr")
        rs = mechanize.urlopen(rq)
        data = rs.read()
        print data
    else:
        print 'login failed!'
        pass
    quit()
except:
    pass
mechanize doesn't have the ability to interact with JavaScript. The spidermonkey module will probably help you (I have no experience with it, but the description is quite promising). You could also handle such a reload manually (e.g. Browser.reload() for this particular case) if it's the only site where you have this problem.
Update:
A quick look through your page shows that the form submits to another URL (with the https: scheme). Look through the checkValid() JavaScript function. Posting to that URL gives a different result. Note that this looks like homework you should do yourself before asking.
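
One detail in the question's code is worth checking alongside this advice: params is built but never passed to the request, so the script performs a plain GET of the login page. Below is a minimal sketch of attaching form data to a request. It is written with Python 3's standard urllib rather than mechanize; the URL and field values are placeholders, not the real gmarket endpoint, and nothing is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint and credentials, for illustration only.
LOGIN_URL = "https://example.com/login"

params = urlencode({"command": "login", "login_id": "user", "passwd": "secret"})

# Passing data= turns the request into a POST; without it, it is a GET.
post_request = Request(LOGIN_URL, data=params.encode("ascii"))
get_request = Request(LOGIN_URL)

print(post_request.get_method(), get_request.get_method())
```

The actual target URL and the fields expected by the site would still need to be read from the form and the checkValid() function mentioned above.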
