Can't post to proxy form - python

I want to use urllib2 through a web proxy site: post to the form field named "what", submit the form, and get back the resulting web page as a string. I know many have asked this question before, see here for example. However, I couldn't get their solutions to work for my example code below:
url = "http://anonymouse.org/anonwww.html"
posturl = "www.google.ca"
values = {'what':posturl}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
html = response.read()
print html

Piggybacking on Christian's answer:
requests is a very good library for this kind of thing; however, urllib2 also suffices:
import urllib2

def get_anon_content(url):
    anon_url = 'http://anonymouse.org/cgi-bin/anon-www.cgi/%s' % url
    req = urllib2.Request(anon_url)
    response = urllib2.urlopen(req)
    content = response.read()
    return content

url = 'http://www.google.ca'
print get_anon_content(url)

In your case you can just use this URL:
http://anonymouse.org/cgi-bin/anon-www.cgi/http://www.google.ca
It is the same thing as using Anonymouse, except you don't have to go to the site first; you just use the URL directly.
Next time, make it easier on yourself and use requests: you can get the same effect as urllib in about four lines, so check that out.
Good luck :)
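For comparison, a minimal requests-based sketch of the same fetch through the Anonymouse proxy (requests is assumed to be installed; the URL format is the one shown above):

import requests

def get_anon_content(url):
    # Route the request through the Anonymouse CGI proxy
    anon_url = 'http://anonymouse.org/cgi-bin/anon-www.cgi/%s' % url
    response = requests.get(anon_url)
    response.raise_for_status()  # raise on HTTP errors instead of silently returning an error page
    return response.text

print get_anon_content('http://www.google.ca')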

How to get the response status code from a GET request?

Hi, I am very new to Python programming. I'm trying to write a Python script that gets a status code using a GET request. I can do it for a single URL, but how do I do it for multiple URLs in a single script?
Here is the basic code I have written, which gets the response code from a URL.
import requests
import json
import jsonpath

# API URL
url = "https://reqres.in/api/users?page=2"

# Send GET request
response = requests.get(url)
if response:
    print('Response OK')
else:
    print('Response Failed')

# Display response content
print(response.content)
print(response.headers)

# Parse response to JSON format
json_response = json.loads(response.text)
print(json_response)

# Fetch value using JSONPath
pages = jsonpath.jsonpath(json_response, 'total_pages')
print(pages[0])
Try this code:
import requests

with open("list_urls.txt") as f:
    for url in f:
        url = url.strip()  # drop the trailing newline read from the file
        response = requests.get(url)
        print("The url is", url, "and status code is", response.status_code)
I hope this helps.
You can access the status code with response.status_code.
You can put your code in a function like this:
def treat_url(url):
    response = requests.get(url)
    if response:
        print('Response OK')
    else:
        print('Response Failed')
    # Display response content
    print(response.content)
    print(response.headers)
    # Parse response to JSON format
    json_response = json.loads(response.text)
    print(json_response)
    # Fetch value using JSONPath
    pages = jsonpath.jsonpath(json_response, 'total_pages')
    print(pages[0])
Then have a list of URLs and iterate through it:
url_list = ["https://www.google.com", "https://reqres.in/api/users?page=2"]
for url in url_list:
    treat_url(url)
A couple of suggestions: the question itself is not very clear, so a clearer articulation would help all the contributors here :) ...
From what I was able to comprehend, there are a few modifications you can make:
response = requests.get(url) always gives you a response object. You probably want to check the status code here, which you can do via response.status_code, and based on its value decide whether or not you got a successful response.
Regarding looping: you can read the last page from the response JSON (total_pages in the question's code) and run a for loop over range(2, last_page + 1), appending the page number to the URI to fetch each page's response.
You can fetch JSON directly from the response object with response.json().
Please refer to the requests docs here.
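As a rough sketch of those suggestions (assuming the reqres.in API from the question, whose JSON carries total_pages and a data list; the helper name fetch_page is just for illustration):

import requests

base_url = "https://reqres.in/api/users"

def fetch_page(page):
    # Request one page and fail loudly on a non-2xx status code
    response = requests.get(base_url, params={"page": page})
    response.raise_for_status()
    return response.json()

first = fetch_page(1)
print("total_pages:", first["total_pages"])

# Fetch the remaining pages based on total_pages from the first response
for page in range(2, first["total_pages"] + 1):
    data = fetch_page(page)
    print("page", page, "has", len(data["data"]), "users")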

Python - Reading different urls using urllib2 returned the same results?

I'm trying to use Python's urllib2 to read some pages, but different URLs return the same page.
The page is an inquiry for campsite availability for a given campground from recreation.gov. Since there might be a lot of campsites in a campground, the last index in the URL tells the page which campsites will be listed.
For example, if startIdx=0 the page lists campsites 1~25, and if startIdx=25 the page lists campsites 26~50.
So I constructed some URLs with different startIdx, but after using urllib2 to read the pages, the returned HTML was all the same; it seems the startIdx in the URL was somehow ignored.
In addition, if I manually open those URLs in a browser the pages look normal, but if I use webbrowser.open to open them the pages look weird.
This brief sample code duplicates the problem I'm having:
import urllib2
url1 = 'http://www.recreation.gov/campsiteCalendar.do?page=calendar&contractCode=NRSO&parkId=70928&calarvdate=03/11/2016&sitepage=true&startIdx=0'
url2 = 'http://www.recreation.gov/campsiteCalendar.do?page=calendar&contractCode=NRSO&parkId=70928&calarvdate=03/11/2016&sitepage=true&startIdx=25'
hdr = {'User-Agent': 'Mozilla/5.0'}
request1 = urllib2.Request( url1, headers = hdr )
response1 = urllib2.urlopen( request1 )
html1 = response1.read()
request2 = urllib2.Request( url2, headers = hdr )
response2 = urllib2.urlopen( request2 )
html2 = response2.read()
In [1]: html1 == html2
Out[1]: True
I have no other knowledge about how these inquiries and the PHP-related parts work, so I'm curious why urllib2 behaves like this. The Python version I'm using is 2.7.
Thanks!
The web page may change at runtime, whereas you are only requesting the HTML. There is probably some JavaScript that changes the contents of the page based on the information encoded in the URL. If the content were loaded server-side with PHP, it would be present in the response, since the server changes the HTML before sending it; JavaScript changes the HTML after it is sent.
In other words, a regular browser will change the HTML based on the URL using JavaScript. Your simple request will not do that.

Python: Using a Cookie to Maintain State

I've set up a system to practice making a Padding Oracle attack and after much work I've discovered that my exploit isn't working because my code isn't maintaining state with a cookie! After reading up on cookies I could still use a little help on modifying my code so it properly maintains state.
I start off by making my cookie jar. This should also grab the cookie from the site I want (to my understanding):
import cookielib
import urllib2

cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
opener.open('http://192.168.1.12/main_login.php')
I have normal working code that grabs the website data so I can parse it with BeautifulSoup:
usock = urllib2.urlopen("http://192.168.1.12/main_login.php")
data = usock.read()
usock.close()
And code that sends the POST with the appropriate data:
import urllib
import urllib2

url = 'http://192.168.1.3/check_login.php'
values = {'login_captcha': CAPTCHAguess, 'captchaID': BogusCipher, 'iv': IVprime}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
response.close()
What do I need to change in the above two bits of code so that it will use the cookie to maintain state when pulling and POSTing the data?
You need to use the same opener for all requests for this to work. So instead of:
response = urllib2.urlopen(req)
use:
response = opener.open(req)
Mandatory note in these cases: consider using the excellent requests library
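Putting it together, a minimal sketch of the whole flow through one shared opener (using the URLs and form fields from the question; the placeholder values stand in for whatever you parse out of the login page):

import cookielib
import urllib
import urllib2

# One cookie jar and one opener, shared by every request
cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))

# GET the login page; any session cookie ends up in cookieJar
login_page = opener.open('http://192.168.1.12/main_login.php').read()

# Placeholder values; in the real attack these come from parsing login_page
CAPTCHAguess = 'abcd'
BogusCipher = '00' * 16
IVprime = '00' * 16

# POST through the same opener so the stored cookie is sent back
values = {'login_captcha': CAPTCHAguess, 'captchaID': BogusCipher, 'iv': IVprime}
req = urllib2.Request('http://192.168.1.3/check_login.php', urllib.urlencode(values))
the_page = opener.open(req).read()

Note that cookies are only sent back to the host that set them, so both URLs need to point at the same server for the state to carry over.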

how to encode POST data using python's urllib

I'm trying to connect to my Django server using a client-side Python script. Right now I'm just trying to connect to a view and retrieve the HttpResponse. The following works fine:
import urllib2
import urllib
url = "http://localhost:8000/minion/serve"
request = urllib2.Request(url)
response = urllib2.urlopen(request)
html = response.read()
print html
However, if I change it to:
import urllib2
import urllib
url = "http://localhost:8000/minion/serve"
values = {'name': 'Bob'}
data = urllib.urlencode(values)
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
html = response.read()
print html
I get urllib2.HTTPError: HTTP Error 500: INTERNAL SERVER ERROR. Am I doing something wrong here? Here are the tutorials I was trying to follow: http://techmalt.com/?p=212 and http://docs.python.org/2/howto/urllib2.html.
EDIT: tried to make the following change as per That1Guy's suggestion (other lines left the same)
request = urllib2.Request(url, data)
response = urllib2.urlopen(request, data)
This returned the same error message as before.
EDIT: It seems to work if I change the page I'm viewing, so the error isn't in the client-side script. In light of that revelation, here's the server-side view that's being accessed:
from django.http import HttpResponse

def serve(request):
    return HttpResponse("You've been served!")
As you can see, it's very straight forward.
EDIT: Tested to see if the Internal Error might be caused by a missing CSRF token, but the csrf_exempt decorator failed to resolve the error.
Finally figured it out with this: How to POST dictionary from python script to django url?
Turns out my URL was missing a trailing slash. It's the little things that always get you, I guess.
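In other words, the original snippet works once the URL carries its trailing slash (a minimal sketch; the view and port are the ones from the question):

import urllib
import urllib2

url = "http://localhost:8000/minion/serve/"  # note the trailing slash
values = {'name': 'Bob'}
data = urllib.urlencode(values)
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
print response.read()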

using python urlopen for a url query

Using urlopen for URL queries as well seems obvious. What I tried is:
import urllib2
query='http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627'
f = urllib2.urlopen(query)
s = f.read()
f.close()
However, for this specific URL query it fails with HTTP error 403: Forbidden.
When entering this query in my browser, it works.
Also when using http://www.httpquery.com/ to submit the query, it works.
Do you have suggestions how to use Python right to grab the correct response?
Looks like it requires cookies (which you can handle with urllib2), but an easier way, if you're doing this, is to use requests:
import requests
session = requests.session()
r = session.get('http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627')
This is generally a much easier and less stressful way of retrieving URLs in Python.
requests will automatically store and re-use cookies for you. Creating a session is slightly overkill here, but it is useful when you need to submit data to login pages, or re-use cookies across a site, etc.
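For illustration, a rough sketch of that login-page pattern (the URL and form field names here are invented placeholders):

import requests

session = requests.session()

# Log in once; the session stores whatever cookies the server sets
session.post('http://example.com/login', data={'user': 'me', 'password': 'secret'})

# Later requests on the same session send those cookies back automatically
page = session.get('http://example.com/members-only')
print page.status_code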
Using urllib2, it's something like:
import urllib2, cookielib
cookies = cookielib.CookieJar()
opener = urllib2.build_opener( urllib2.HTTPCookieProcessor(cookies) )
data = opener.open('url').read()
It appears that the urllib2 default user agent is banned by the host. You can simply supply your own user agent string:
import urllib2
url = 'http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627'
request = urllib2.Request(url, headers={"User-Agent" : "MyUserAgent"})
contents = urllib2.urlopen(request).read()
print contents
