Getting Html from server using urllib - python

I am trying to get data from a website which has a form. Now i am using urllib for this purpose. Form has three fields and thus i need to provide these three values. But what i am getting is in response is empty form's code while i am expecting a table corresponding to my input values-
here is my code snippet
data=urllib.parse.urlencode(values)
data=data.encode('ascii')
req=urllib.request.Request(url,data,headers)
response=urllib.request.urlopen(req)
the_page=response.read()
Values are the input values given to the form. What i am doing wrong? Though i am not sending session id.

Try this from
http://wwwsearch.sourceforge.net/mechanize/
import re
import mechanize
br = mechanize.Browser()
br.open("http://www.example.com/")
# follow second link with element text matching regular expression
response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)
assert br.viewing_html()
print br.title()
print response1.geturl()
print response1.info() # headers
print response1.read() # body
br.select_form(name="order")
# Browser passes through unknown attributes (including methods)
# to the selected HTMLForm.
br["cheeses"] = ["mozzarella", "caerphilly"] # (the method here is __setitem__)
# Submit current form. Browser calls .close() on the current response on
# navigation, so this closes response1
response2 = br.submit()
# print currently selected form (don't call .submit() on this, use br.submit())
print br.form
response3 = br.back() # back to cheese shop (same data as response1)
# the history mechanism returns cached response objects
# we can still use the response, even though it was .close()d
response3.get_data() # like .seek(0) followed by .read()
response4 = br.reload() # fetches from server

Actually problem was that i was not sending enough information in the header. So i switched to "requests". I divided this request in two phase-
First to get cookie from server
second to get the data including the cookie received in first request
Then server responded with proper desired data.

Related

How to submit data to a webpage using xpaths and the such

I'm in the process of writing an API for a craigslist like website, and I finished the data getting part using html from lxml.
Now I want to submit data (login info, things to be posted ...) to the website.
Can I do it using lxml or do I have to use another module ?
As #furas mentioned you can use requests module for posting the data (login in your case).
Here is the simple example where you can use requests module for both get and post
import requests
# get the token
resp = requests.get("https://www.botoxcosmetic.com/sc/api/findclinic/GetFadToken?_=1556315102966")
# print the token
print (resp.json())
# storing all the input data in dataI. In your case you have to replace them with username and password (check the API documentation or devtools to make sure the param name(s) is correct)
dataI = {'ZipCode':'10022','MileRadius':'1','PerPage':'5','Token':resp.json()}
# post the data (you don't have to click on any submit button) and capture the response to the the post.
resp = requests.post("https://www.botoxcosmetic.com/sc/api/findclinic/FindSpecialists",data= dataI)
# print the response from post call
print(resp.json())

Python Requests Module to send sequential post requests

There are 2 forms. The urls of both forms are different.
1st is : mobileSearch.jsp
2nd is : mobileModify.jsp
On 1st form i need to provide ctn and click on submit.After that i will get 2nd form where i will have to provide some text fields let's say username.
Now the issue is i'm sending POST req to 1st form and getting proper response but i am not clear on how to use this response and send another POST req to update some text field.
I suppose i need to use Sessions module but not clear on its use.
I have been searching the net for the past few days but didn't find anything useful. Any help regarding this would be appreciated?
Below is my code. Its not giving any error but its also not updating the value of parameter "paramName1".
import requests
proxy = {'http':'http://myhost:myport','https':'http://myhost:myport'}
url = 'myUrl'
data = {'paramName':'paramValue'}
s = requests.session()
s.auth = ('uname', 'pswd')
s.proxies = (proxy)
s.post(url, data=data)
data1 = {'paramName1':'paramValue1'}
url1 = 'myUrlNew'
s.post(url1, data=data1)

Python Requests getting final link

I know I am probably going about this the wrong way. But i am trying to figure out what resource URL's in our proxy server config are broken or redirecting to a different URL than what we have on file.
An example of a resource being passed into out proxy prefix URL is:
https://login.proxy.library.ohio.edu/login?auth=ou&url=https://www.whatismyip.com/
When this URL is resolved it should redirect to the proxied link
https://www-whatismyip-com.proxy.library.ohio.edu/
What I am wanting is to get the final status code of the final URL after it is resolved and redirected
What I have code wise, just a snippet...
proxy_url = "https://login.proxy.library.ohio.edu/login?auth=ou&url=https://www.whatismyip.com/"
conn = requests.head(proxy_url, allow_redirects=True)
print conn.url[:-3]
The [:-3] is to remove some weird unwanted characters at the end of the string.
However it is only returning the original link I am passing it.
How can I get the correct proxied URL after it's resolved and redirects.
You can use history property of the Response object to track redirection:
resp = requests.head("http://some.url", allow_redirects=True)
for elem in resp.history:
print elem.url # "some_other.url"
Response.history contains all responses with their urls from intermediate steps. Read more here

Trying to access a password protected url using python

I've had problems accessing www.bizi.si or more specifically
http://www.bizi.si/BALMAR-D-O-O/ for instance. If you look at it without registering you won't see any financial data. But if you use the free registration, I used username: Lukec, password: lukec12345, you can see some of the financial data. I've used this next code:
import urllib.parse
import urllib.request
import re
import csv
username = 'Lukec'
password = 'lukec12345'
url = 'http://www.bizi.si/BALMAR-D-O-O/'
values = {'username':username, 'password':password}
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url,data,values)
resp = urllib.request.urlopen(req,data)
respData = resp.read()
paragraphs = re.findall('<tbody>(.*?)</tbody>',str(respData))
And my len(paragraphs) is zero. I would really be grateful if anyone of you would be able to tell me how to access the page correctly. I know that the length being zero isnt the best indicator, but also the len(respData) if I use values as stated in my code or if I take it out of my code is the same, so I know I have not accessed the page through username, password.
Thank you for you're help in advance and have a nice day.
There are two issues here:
You are not using POST, but GET for the request.
There is no <tbody> element in the HTML produced; any such tags have been automatically added by your browser, do not rely on them being there.
To create a POST request, use:
req = urllib.request.Request(url, data, method='POST')
resp = urllib.request.urlopen(req)
Note that I removed the values argument (those are not headers, the third positional argument to Request() and you don't pass in a data argument when using a Request object.
The resulting HTML returned does not necessarily include the same data that is sent to a browser; you probably need to maintain a session here, return the cookies that the site sets.
It is far easier to do this with better tools such as the requests library and BeautifulSoup (the latter lets you parse HTML without having to resort to regular expressions), which can be combined with the robobrowser project to help you fill and submit forms on websites.
Note however that the page forms and state are managed by ASP.NET JavaScript code and are not easily reverse-engineered even by robobrowser. When you log in with a browser (which has run the JavaScript code for you), the POST looks like this:
{'__EVENTTARGET': ['ctl00$ctl00$loginBoxPopup$loginBox1$ButtonLogin'],
'__VSTATE': ['H4sIAAAAAAAEAO18zXPbSJKvBYmkPixLdrfVs909EkbTPZa7JYrfH27bsyAJSRA/QJEUbXNmVg8kIAoSCNAASFmanXd7t+4XsaeNjY2NcEfsfU8bsZeek3V8t72+/+D9D/t+CYAU9WW7pydidzZMm0AhKzMrMysrqwqV4n+Mzf1s3Le4t5c1dNs0NKuivOypplI2LDsjtY7yysne3oLP92XL1kKhL9xrVZHM1gEn9yW9pcjhL1rNWqUidhXd9+CdaFnNsBTZt/JOxIxmtI6A+dXbMT1Iy1b7iu9X74Nb5n0PR/Fa3YOipOqDe9bQvij2NFutq8pxeG5O3pd9/Dvws0anK+knOcWWVM3KmGjxQLFyki2FL/Oa802vhRPRZCoejsfmFpjFOxsmTK7odjWfE3JW5P+O3Rq7devWf+BDd/pMUOF/Vk8sW+kE0Z6mQF1Dt4Kbiq6YaitYUC37f4R/8xsPRdDtaGSV7Vgtw9TU5ipbV0wLBE9iwRD9W2WzEKpnKk90pWebkrbKlntNTW2ht2vGkaI/aSaTUrwVT4TT0ZgSSqV/97txiODfU8He8u1Z6qkyudd3uQZu3ZqcnJxigDDmfefoYQLfSYdu5DOzwOzPkdr3N7maCQdTTM94NdXWFN9sRtI6ksnKQQP/ZMKWF27T5R7jI7pAC44Ka/n+bcxFXWW7pqGe9g1ZP5RWWdtsG31Vl1hVZy3bMFW7r3jcVtm8Kpvqm++UvsT2oK7ERmKZVTYaCoXYrKIdKkE2J/XffOdSdyRbdcpn39tKX9WOwL1rWJrR11Wq30crOhBUQGXJPnLuh4p9KLH9AaLSYSULnQOBe2xTPVWDlhqUGT80WWJ83485ra6ystfqSEvXtX4E1aUjW12FZpLds0CoHEp9HWN1lcWQWWVPpb5yKulKCyU2l6uvXpWS7KUeGDKVZKNJ5jgCa6mr2uQImmLrBnBN4813qmbIzDRZfaKmvLJ9k0FPBZmZIQ3GfX4C0PNt93nCNnuKzMy6TysHtt19tL5+fHw8oFxX9X3D7Egt9VBZ76rWkQGR1mXmjkvxr9OPrZapdm3WPukqT5ZtNLsOFSUXuvx0dprFRzZavQ5sGjxG/yqavrJ8gey3l+l+u8xaZgtwT6BTpQO791U5qEuHsiMX9F8vVRuDirVoMBwNRoOH1q/DwcRvl58+Xma/ZpfXXX5Plx9+MzvtytKHt3akLvuE1Xua9s05sH10FabCe69C4cUYBxfh+z3dGeSsdWAcF6Vu2YRYyvFKS9KhlSCvsi3DMOXn3v0F+t4wOqtsz0QnujTE9CH7e5ffiKhZhwWa+2LlwS8fQK0Bz4ffnOOq++zKEDdIA37lIfvkCRt6eI5DH1NBINGH1qDPOZmqIz5t1YoFNLa8/M0FlIvq0ueywRwxrhjMJb9osCuMlWN2pDNXrlMQmEFJlsuS3oDhvOlvpVTdK3OlhigW92ovynzk4RX5LrIObuZXLnbE5TYsxc7CVRVzpX3kdtJlM/9ipLveadyBQS6JIQAs2Kq18nsiwph/tC9pFkIL4VcV+9FyV9WX8ajLyqtHYfYPD6815yWurlCO4L93OD2iC+KGKbXbUlNTBq0cGJgM3IfrWJOh+T6sUHBiIVgutxB/jyDQ0M9XOgOHZY8hpXEcNLCiWHH8eXmvqUn6EUbde3LvGD1LIZlGWhhp4IuVZVnt/0aV/+bJA1q3FKQTzGV7zm2vjtnVMPcqhmGTV2CKV8wHv1t+GGwdqJpsQqaHQeXlSuhhULJttGXTxIV2lsumeiSd/VFlEbWvRphrhHcc0LOxJ9xV5eA/VYVmfEVeoeg6QPnD7PTjQTiSmXk3fs4goCOm9iUQyMxdZ8KsI0Cjqw5khnGR/m7smiDb1aDoMtvSJMt6stxqrUm6pJ3YastaftqV2oo3WQf3IE9dgto93VTaZHBTkaHbUJZ3BPBWSwqaiqCr9soIkcyMY4qfZMbouk9i+xxJfBOmccwuME6Fsxag5QOuiz+TerbRwtIBE5ayZilmX20pa/AWX+KaWcert1CgJWBQsjqv1rmsuyDEAqRj6DLaoIY/3lc1KJWVbKWN5YNikVdF/nn8yqrsNi1yfp3pWXAQyzpfZIXfd5FFwctZ2v3D2Hus7ThZatlSMJNTzeCg0eAzpRmsogCKoCfxyYYjvwB+q+xPlO5tyz4s7+J/gky/+R2ZDevF0YVhwHkYLnejfwJf4jojL/gdT/nC9ZcRr0HR8f3xOnYfTGAUjqHjX5ylxTotcjZURZN9E2JZqC7eIWBd0nqKC2WE3OLM3i8ImjF6utyW5xZuM2MMw4wzE4yP8TMBZpKZYqaZGeb2wu1539SmWOFL29U852NC6fa8bxIA4ew1PYfDeL6/vbsp1htbYq7EsdWCWOdLwjYhJwk5L1ZEFzkUxfO9klircMROXMtXOI9NCDV3xAxXKI1CI4DeFasVPlfa5l3GrhQpVEyVxVyFq7sAwpwsi8XdivtMYk1VubowkDsGwFwVJOBUFqsDugTRNbjhc7y9MLYvM1OXTbs06fOLmT2hyC9N+XxUyi1Nt+X5+/8+dus2t50T62ev33wrlDg/V+befMtPZvhCTShlBRRKfE7I1+5mhGqtImQ5VsywVbFWECYyBT7nyxTEPO/PiFsQdSoDW9VLQPJloGx2OlPhSG0wnMhUctxUpsI3xDrqJ1E6+0HI8oEsV8qLdc6X5Qvb/P0sX8njzqIjBr3GFycJSlz9TkGcphtsuc3lp7JCJb9b4Er8zJtvgX/2WqxnhXsogwP+e53HF6cIJBbRyGSOhyIlIR/ICXXoyvlyYmabn8a1AtOidT8VS9x9ukG0NQC3xDq7WeFyfHbKhYL+bk4ssOWKwBa2dzOFba4kEMSVma2J5QLUm8yJxbMfuAIfyIlgWeemqM9FYuXP7ZYLfP4jV0+OrXNVagl8+GnXQR1DorjN38GVcEC2KZa4GedRcCSa88plvlYh3ScIOLVZEatgv80HttC76Nc7WyJ6Ya1aEJz+8cPZYRjcKtu7hckt9FMNKt3ZqpSpH9byYoOcQchVMBIYYXNeKAjkmuzAC+Zgu9Kbb91+IjmFhljgJuHmGDzQe5tv8CAQp7Z3K2evyZP8ea6IJnx5rsQVpvPEGhLVxUAe46Yi5OguoG0MOIJv8748+qsUcK45Dk9lvvJxXqzWMPp4ciKngyt5cAaQLwCjQVQ0Omfz7hh1xOOm8jDnD3ydhIA0ebS5W9oUa2C62yhy/gINWNGPvuMqtUCBmqsDKtSg/NSgdwFAqSRO4laDTJVAQcT2mc/exn0N3bpJwNmCEyxYeAIMeNt5QueINTF/DyMBXZ0lscuiY3lfYZcGW2E374yKqSK3ffYaaub9RS4PXwoUYRkMqknc89S7gSKfq4s5VPClTf7sNe4YiXkOdxpO3KdFIV/g6mc/UCNueOGLLLnV7r2igNbXRg3oA6jETTtXONCbbyeLAjwEnna3KOYrYgmK1So8YgHULsKQNJLnncKIjweKcBY46awbvRAcMjDtRHEX9i1x+YIYKHENDggzJYwmz1+mURbZIoaiGKA4CAeZAjux5ninWCmKZz9MilWhQM+BMl91BkNZACau9Tw3g1gIaWqVGp+nUFrgt+BdiKG5Ol/Ic1OkMwZ3iQ+g1ODhmwibNXG7BG5QSSzkJnHH7hzWRaHOwfUmyrXd7UB5NytCnOkKBcK1jQpXDNDIx4NzR9Uc7hSlEAnJxoSKCFrYhoj38FziL4Qe1DYcB9zm71XAZhBDnegu3EGnUDsUFDBeAujvksPHvY84y3wF/uY4V7XAIThz0y7A8YuK64u+yu7Za36+yjt2Gwror/JFjLZAlXdGP+4ISggK7py07YW2e8MpajjM7w5APPSBtFl+ukrd5TjbZFUswMR17iOa4gSaQc5tEoCbYazxuBM691m1HmQRoshIbN2ZZoOeL2xNVet8jWMh0c+pJKDktHEZ76/cWoQUYZsaohY50kj4bLTmEtVttw5jFiwnz15zBbLYXchWRNDkK2v1Cvwiw6GKumK34hRqiEUzTmGbr5T4bReIBlxgZRfDtDZJcSOL4HCbChuI0JgNuWnnwXGHj85eF8n1nXlimy+g9a0BjJxjMIbmHVitJJJaZS4vfDQEODMMhSJhCn199ppGrx83TBu+GofgEKgJFGE5f00sFIXSZK2SIWfkAzR0MdJvY7YTaX1AU4yvVkFkRxV68M2307VdzIROxwUwbOAK/CzuCCgUM8hDnCeepQjpIoGQ3/bVhRxf9NeFMno/UBeciOJHYIIquG3T7FrHUCWHrldoYHKT9d0GTQ3cXINz5rTBGPA3uHrlzbf+Bo0N3u/MltlptFLgGyAUAijWAPNB4oowTVd3jTFz9sMu6ooZxEtax4yFx8Pp+FhkPBxL4Zsei46H46Gx2Fh8LIFSeCw5lhpLoxTBsg+LMyYcZcIxPAMvnWDCcSaMa5IJp5gwocWYSAi3OBMB43iCiYBzPMlEokwEVQADhMfUeCSEW5qJhpgoYaaYKGGCRQL0CYASESYaY6JxJppgokkmCow0EwsxMaqLMrEIE8M1xsTiTCzBxJJMLMXEiD7GxMEinUQxzsTDDEQnaRMJBtJBMkgFidBiPM0kQkwizKApcEzEIFSKAVECqieSDP3HM3imU0ySxMItzCQjTDLKJGGERJpJxscjYUieDDFJUKUBSTJJ4KWZVIhJhZlUhElFmVSMScWZFDDAIJVkUikmlWbSISYdZtJEHmHSUTSPRsA7Dd7JGJOO4xZnYOZ0koEIaUhCGCFYIASiELRCo5FQGl+CERnUDqGZEGhDVEavJvEFTjgM2jDwSN4waKkfk4CnQqAnGHQJkRSAp4CXIhj4h6ke/QCSFJGBbRgsqb/R2RFiSb1NXR1BfYTsCBEiEIE0iVAZNLABSYC+D0dAGkU5BRQyCrkA9b/TBFCjJAHwoqiPUj3B6JmkxTcGWvKDGPktaGLATxOMuoDgwCezpqMffz3+72Pt/4YfWWY+G2xYJs43LIyYw2aFcXcqcwtzN+3xmFnmDoP6ed9YCFsg2oqNhb091FjE21yNRb3921jc202hB50NE0qRkLeVwnDztn3knt5GjaKIt3Mkt/Z2e+MRgjr7x/G4A6VmJkBGZWpgujR43S/5MILcndn4ULuc+GfULjbQLj3ULj3QLpYeaJdOD7VzgK52DtTVjqCudjGCutqlnSI140un3Ycb1fv51R3/cMuvop+XBuo7e/tRA/jODbDgw143Z3SkN9+pgzMRKI02Z2q9w3MIiTRXVM6+l/QRPDJDoKSc6pJuwB60v8+7daylsLrCyoZmvPlOgtHI835xo0T+0S5Z8GPrnzN7Z39sKrrEapJl66ouuWLNNyRLaV4Ak2z3GpJMFBcqSLx5V+hRMFn0bs48+6PUvwCnHvx8eG4ksQZrNDX1SB2gKKQi1Pil+0IuIOpZeqfru0uvswtGW9WLoNRWHn4jMw+gKpCWZq6rBY+HczdXM1+/rXLtHZzD76iPv6M+/bbGv3lb5ZPRwHIj3uh7kokCX6Oh6d6dlyTjE5iUonSJ0CV8BYDQPI5A5o3wC1EsQG61VxWGnnWDCPP3ubvqaVOhszT2iK6meriwtraGtThWYlg0O/s2jgVosmoqsmX3pY9zhtZ2TgnhNJYH5EqK0UW5o9jKCJhOS+UhusLSGWYfdwwFiw5JWVM67SiHdFSbEkfoDUs3+pf46JJmOCOBBhRwO3RKCmdczI00MDj0VIboH49WQ8FDCY3dz5vw7MtKfDVQkV2xjnpd8KbDVEO2e5BQ6T+kp64J2x0a/gbYt5WlET7XNb54oX5wNDwUYz4HZc0RCVYu4L/FWp+LzT7CjWHZKgzM9lXT6HtclH7Aiz1zomPGYZxKVJRTxew7R7CXe6YrWTe1tTBqQWPY7icXZD2HX9ThLXw/JkcTGnmuwe5Wy/yW42YLZe+MmlVPPYNJ+qH06ZtvVdJ1cIKNXvG6QuGq5AxKpymxbB8WcI1ySv0zTBCAcciDFNk4MtBDEgk2zB1QPkGMbavnR/yDRubLA4ghO4Avq7ZpnH2P5pua1HbP6jFkJG0k6UDpzw6xZEWT5kW5q1rquWi3uY5h2uopdYO0cKlpr6GlnHc+z67DDu0edLtgjnsb54f7nrB3z0Eekzsebw9h1n30Kr/ISX2wB1c3FeDIUcGQyW8V8JCl/pE6sPogW4CVNNWT5+6wnwa5D8Wy0UdPQAJjkLpgoBtIaEVvKX06HAMuDOcYhtQ9YHsm24fvmJh2e7p1pHTwgNYOe5+Rc2xgh0kv/AQWTsIVaGdGPvJzqjuHkPOcvS6J1ZpTi1lcpnluYPBPLj07h3RSN+jnM0Itx91xb9R7XVj2Y9Ir6OE4oitt6Wfo0aDTm5cqPr8oSU6EFENBlhzrSbCeTvEMLXvjEPGlIvIPr9YPBjDVcw+/dpiL/IaQFbD7ZzdFehfPVdwGyiK9cBhE5082DYuma9Phc+4ol0yFfTV20UMJP9lSbRPhwVD2gyxm+L7quujncPwulgvkCFcqR2PaNdVfUYvVmlgmyQrOK156NUEtOt2ZpddprtCfV12be+sKDBln6DphNPhglA9Zte6KLZTqPO5DJp8OmLhzhsNB1fuKZQcfXTJgnisL2NXTi6WymK8INYehy1vYYumFUrXG14nrohdCg7AlTYwOW7QQdPooeJ+7dQuTcIqm41SMLnG6JOiSpEuKLmlc0iG6hOlC03aaKNJEkSaKNFGkiSJNFGlQYDFPlzBdIrfoGqVLjC5xuiTokqRLii5EEiaSMJHQ2iBMqwTsTulCFGGiCBNFmJAjhBwh5IjDP0LYEcKOEHaEsCOEHSH+MaeC6KJEFyW6KDUSJbIo1UaJLEpkUSKLElmUKGJEESOKGFHEaDfJ/Tl2buNvW339+h3ruvHhdmFipDzcOnjnV7LsLKwCo0u5CxuMycVxTbMOFsdty2gvTlBTeLAULfJ//oLz+AafucHD5MzAZCOqL90+WJptL905WJr7b6it73wTufhZAVFe2zDMjuXmYoBBVZe66G478v+Y61MDvsmop+pPSQv4Xz86LYCyHQaJvMGhzKvsT5DkHSkAX75H++9z5P/L9+DjHvFTGgtt7CcOHJek5MQDU9n3fXVd2oeXALpOySyQfM3JDl2XmYnhMF8gh/Z7I943tzCxOOVkrR3YHc23kjF0KG7TXqRFO5JDWtSzfUM7NFBwJ78joxNcmmi7wkAsN9lgYnFc0uxL+ayIDGbL8S3nzcSlZNfLEWhiGIdcvtNu5uxMSeqrbSw5d03N94trtP596A/rHeNQWaP11brsJbYs3fd9dJ6I86hvqPJKiPafNwRQxiNavqEFi95zdGBRc92NxQ72r27AHmx61hTdsBWimPAorksSdTSQbLWlG2vewlJZvxCBXCvNeglBsHagZhhaTe36vqoZLJYlIPJ6y01TMlVT0tUO2+saptR0e20xwOuUxSYfLExca4Wlu9fajIxzzYwx6eQqu17lukDA0Fs3vBO5kTW8aGLpZz+GwkcGgCXXr7FkhisUucpabk1cEy93gjuEprzMsarARkOJaCwUjQzTe6ejkWQilIqEQiEnqdd/gwyQd5uy9YpSt6ZaK/FIOJmMr7JYX4RTCOkPLvj5317JwmbZSIzN/K2TDP0A3Cba5wk7/3vssdWVdJbiDzZr3SfL6EtFsdckWTYVy1p+ei27zON1Inu6yl4mhxGwoltrGbKy/JRSvz3MK4ia0ULQsE+WnzqCeWiyF31GREy5lF7qXPfA0JXlET42ZgwX+BTrQDYWi6aHvD7yIhlzf5TpBMy80JFUzTYe9UwVG46/RrjVgrpiLzKmPtotwylqUuhIbSck/MPYNX7g1Dpzh4SYGJSs7qtf81w8Z9RfSFs7DelZ1nxVTm8YKTWX07ZKqV5L4Pu6JdXCm19G9kNWOpLrtyPdlxX7OHlUfN5/tnMQK0RCRyf5Qqa0vXWSNFMprcyVQ9Hq/g6vlEFUV7a04+dNM5o62Dmx0qEvo7Tu+mREUTcAHI6Iq9oqJRuvK6/sYaYeJC0dcx0uF+IEjjO4HEf3I46PcZXjTJPjd7iddqbB8RZXce5weO7ifYfL7HA8z3mv+/1e247/j8wmE+54xuM9X5GzaPrBADx0t/V9xXY2Uwr9FcCRBo+j12WIL26nstF0cDpLcwT+d1S9ZxvsEza8mk6z/G5lECDRzPlC0nGetbIX39gLg4TFPgg+zWIzo7CWQZvOfkt55E4dlIRGOW9Zo6fb+8P16vjAmZzlqhOjpofB4et3BYeReO5aI/CnzBpeNBkLUwwZybsbCw3/KmAs4maJ+b0o6oj9GTM28rIUgE99//S+fryfyti93ZIVfl7aifV3hC6c79gyT+qvpFNzu26VI1uyoZ4C2nhVFjp8vXGYKHOJZq2W2ai+iAKeOYjtb0RzTW278+zLSLNla6WdyE5DK5mb3czLaP7Fq3gx+ywXTbafx8yTrNl2fXn83HX8rhLMwsRFJf7+fZVISfqRHVeeZaXaYcnaiYXr9m7+RVnd2Hq2vxtN7lZ4K90/aeT6z05CnNnfCGdyiX6ef3lY2slkmvvlnePYfnkzcxp/GRcqpWoumkqWas3CK6hT3Xx28CpXTSbqWw11s7fVFa6Iz1wQ+h/fV+hSezPx/MR61T1uFnLF3Xi+3KxGQzCo8MyM67WadLCVr3R60mFms7gFScIHx+Uyt//ypVqt5NvPZCVVfl4vHhQSae2oEI9xG+rRSymhylrkJN8Bfr5eq2Or2+TkTG2n265YVwSfvsHu//KTVIhYfe7woFJoHh/WMulQNHsgmhnIk9isSRsHr7q70ecbcSu78aIC++7z7dP6zm69lTosV0PNciLXPmk8z0nJbVAkX+zU+6excvHgefKkqecytRehbvfVgcaVGr3Obo2M9bxsvTx6W3/8NCeSd7XnYvE01NjdN3dCfSUuHJe4rrHxMnwi5huV/ahy/PzFaTO0D3EbL3byL8KJTKOBEdHXzNSpnjN1e4vPp4uHuVOjpeVf9iKv7FziVepKX9CAvzOYv65dJlwfM8Z/NIXvR1MEfjTFzLlis+dFb/KnwuiLAgDv+r582w4EC/I1Aq5fWDxeeelwcfV//fng/P3Z8b6lnCcXD7OKr08nHuYRX5NAfClz+GrK8Hmu8DBJ+Dw7+HJa8DAf2JWQCaUp3yWUZEJRJhyixJdQignhCmCMCSWYUPzjr8dnL7zYuXhW7Teae2pHodRhKsmD1GFi/yF7+EP28Ifs4Q/Zwx+yhz9kD3/IHv6QPfwXmj1MS5kPCcQfEog/JBD/BX6c3crEyIbO3dz9Se/ORk4cB6dbo3++OrH0V6NvfhfHe6qsDM5RHHzfyPEIvRbZp/dSI69Krj3l9jun3LfoJzy8w5Thrvfi853B32Y77xiZQWFiUPAPtsa+C/tbel/J9Ns+f1pupiMx2X1T472CXZyrS5oqS3S2uWkave4QbfBmYXEE4rsCcd6TDA+E/AOLOW9w/wmBV9JYXeqw7m+ssDr9uTrbV958B+jZ9+cneMHHTXP96cXfXhh86oRpm4res3XV++EVesfr7fDpkKlpKqddTXJynkZOm87Pmt7G/y1VmXfxZTXp4Mh4dANjWe17hxIPmj1NU2zrwQ0NOeg97S21DoamPq0e0DHa6C/dWH3jkC0bpr2vaIf00yMOgmodSZp+nk7qvEM/VA9Yq2sa9BsJRhNGtDSFMu86Drp+2Hu8jibeLURdlVVW7kvDVEtpJEnZE6336C2GHf3cnNxGIju1yiB7ThlJOHyHrI/Xb7Tn43X0zNPhcP3grOwHZ/0v76wB9x0n+65jpJEshovh+Huf6OR0jnog5UBbNm6ukzgn9l3DVo4044OXf/Dy/9yQ/MFhPzjsf3mHff+w7HeXyg76j8qV8063Pr32Z5UoZ2Dd2wS4x/7+H/3bf4GLvyX4/wE1p0V+k1QAAA=='],
'ctl00$ctl00$SearchAdvanced1$ActivitiesAndProductsSearch1$RadioButtonList1': ['TSMEDIA '
'dejavnost'],
'ctl00$ctl00$SearchAdvanced1$DropDownListYearSelection': ['2013'],
'ctl00$ctl00$SearchAdvanced1$SteviloZaposlenihDo': ['do'],
'ctl00$ctl00$SearchAdvanced1$SteviloZaposlenihOd': ['od'],
'ctl00$ctl00$SearchAdvanced1$ddlLegalEvents': ['0'],
'ctl00$ctl00$loginBoxPopup$loginBox1$Password': ['lukec12345'],
'ctl00$ctl00$loginBoxPopup$loginBox1$UserName': ['Lukec'],
'ctl00_ctl00_ScriptManager1_HiddenField': [';;AjaxControlToolkit, '
'Version=3.5.40412.0, '
'Culture=neutral, '
'PublicKeyToken=28f01b0e84b6d53e:sl:1547e793-5b7e-48fe-8490-03a375b13a33:475a4ef5:effe2a26:3ac3e789:5546a2b:d2e10b12:37e2e5c9:5a682656:12bbc599:1d3ed089:497ef277:a43b07eb:751cdd15:dfad98a5:3cf12cf1'],
'hiddenInputToUpdateATBuffer_CommonToolkitScripts': ['1']}
That's a lot more information than a simple username / password combination.
See post request using python to asp.net page for approaches on how to handle such pages instead.

urllib2 sending postdata

I want to crawl bets of bookmakers directly from their webpages. Currently I try to get the quotes from a provider called unibet.com. The problem: I need to send a post request in order to get an appropriate filtering of the quotes I want.
Therefore I go to the following webpage https://www.unibet.com/betting/grid/all-football/germany/bundesliga/1000094994.odds# where in the upper part of the bets section are several checkboxes. I uncheck every box instead of "Match". Then I click on the update Button and recorded the post request with chrome. The following screenshot demonstrates what is being sent:
After that I get a filtered result that only contains the quotes for a match.
Now, I just want to have these quotes. Therefore I wrote the following python code:
req = urllib2.Request( 'https://www.unibet.com/betting/grid/grid.do?eventGroupIds=1000094994' )
req.add_header("Content-type", "application/x-www-form-urlencoded")
post_data = [ ('format','iframe'),
('filtered','true'),
('gridSelectedTab','1'),
('_betOfferCategoryTab.filterOptions[1_604139].checked','true'),
('betOfferCategoryTab.filterOptions[1_604139].checked','on'),
('_betOfferCategoryTab.filterOptions[1_611318].checked','false'),
('_betOfferCategoryTab.filterOptions[1_611319].checked','false'),
('_betOfferCategoryTab.filterOptions[1_611321].checked','false'),
('_betOfferCategoryTab.filterOptions[1_604144].checked','false'),
('_betOfferCategoryTab.filterOptions[1_624677].checked','false'),
('_betOfferCategoryTab.filterOptions[1_604142].checked','false'),
('_betOfferCategoryTab.filterOptions[1_604145].checked','false'),
('_betOfferCategoryTab.filterOptions[1_611322].checked','false'),
('_betOfferCategoryTab.filterOptions[1_604148].checked','false'),
('gridSelectedTimeframe','')]
post_data = urllib.urlencode(post_data)
req.add_header('Content-Length', len(post_data ))
resp = urllib2.urlopen(req, post_data )
html = resp.read()
The problem: Instead of a filtered result I get the full list of all quotes and bet types as if all checkboxes had been checked. I do not understand why my python request returns the unfiltered data?
The site stores your preferences in a session cookie. Because you're not capturing and sending the appropriate cookie, upon updating the site presents its default results.
Try this:
import cookielib
cookiejar = cookielib.CookieJar()
opener = urllib2.build_opener(
urllib2.HTTPRedirectHandler(),
urllib2.HTTPHandler(debuglevel=0),
urllib2.HTTPSHandler(debuglevel=0),
urllib2.HTTPCookieProcessor(cookiejar),
)
Now, instead of using urllib2.open() just call opener as a function call: opener() and pass your args.

Categories

Resources