Need a solution for urllib2 - Python

I'm working with urllib2 and I need some help.
I get the information I need from the website just fine, but when the info on the website changes, the result stays the same. I'm thinking that I have to find a way of clearing the "cache", or maybe calling "lib.close"... I don't know. Could someone help me out with that please? Thank you.
Here is the code:
import urllib2
url = 'http://website.com'
response = urllib2.urlopen(url)
webContent = response.read()
# Find the first '***' marker and skip past it (plus the '<strong>' tag, 11 chars total)
pos = webContent.find('***')
webContent = webContent[pos + 11:]
# Cut the text off at the second '***' marker
pos = webContent.find('***')
alert = webContent[:pos]
alert = alert.replace('</strong>', ' ')
print alert
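As an aside, the magic +11 offset (skipping '***' plus '<strong>') breaks as soon as the markup shifts; slicing between the markers with str.partition is sturdier. A sketch using hypothetical page content:

```python
# Hypothetical snippet of the fetched page, standing in for webContent
web_content = 'junk***<strong>Price: 42</strong>***more junk'

# Take everything after the first '***', then everything before the next one
_, _, rest = web_content.partition('***')
alert, _, _ = rest.partition('***')

# Strip the surrounding markup instead of relying on a fixed offset
alert = alert.replace('<strong>', '').replace('</strong>', '').strip()
print(alert)  # Price: 42
```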

urllib2 does not do any caching. Either an HTTP proxy is involved or the caching happens server-side.
Check the response headers: X-Cache or X-Cache-Lookup would mean that you are connected through a proxy.
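If a proxy is involved, one workaround is to send no-cache request headers so the proxy revalidates instead of serving a stale copy. A sketch in Python 3, where urllib2 became urllib.request (the URL is the placeholder from the question):

```python
import urllib.request

url = 'http://website.com'
request = urllib.request.Request(url, headers={
    'Cache-Control': 'no-cache',  # ask proxies to revalidate the resource
    'Pragma': 'no-cache',         # the same, for older HTTP/1.0 proxies
})

# The headers are attached before the request is ever sent
print(request.get_header('Cache-control'))  # no-cache
```

urllib.request.urlopen(request) would then fetch the page with those headers applied.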

Related

Trouble logging in to website using python requests module

I'm trying to log in to the Starbucks website (login URL: https://app.starbucks.com/account/signin?ReturnUrl=https%3A%2F%2Fapp.starbucks.com%2Fprofile) with no success.
I used the Firefox inspect tool to find out the URL I am supposed to send a POST request to and what the payload data should look like. I found that the request URL is "https://www.starbucks.com/bff/account/signin" and the payload is something like {"username": "my_username", "password": "my_password"}, so here's my code:
import requests
url = 'https://www.starbucks.com/bff/account/signin'
uname = "my_username"
pwd = "my_password"
payload = {"username": uname, "password": pwd}
with requests.Session() as s:
    p = s.post(url, data=payload)
    print(p.status_code)
The status code that is printed is always 200, which is strange, because whenever I type invalid credentials manually I see a 400 response code on the network tab of the inspect tool. Also, when I print(p.content) instead of the status code, the content is always the same (for both wrong and correct credentials).
Can somebody help me out?
Thanks in advance
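One thing worth checking: the browser sends that payload as a JSON body, while data=payload sends it form-encoded, so the server may ignore it either way. With requests you would pass json=payload instead of data=payload; the equivalent request can also be sketched with only the standard library (the endpoint and credentials are the placeholders from the question):

```python
import json
import urllib.request

# Placeholders taken from the question
url = 'https://www.starbucks.com/bff/account/signin'
payload = {"username": "my_username", "password": "my_password"}

# Serialize the payload as JSON, the way the browser does,
# and label the body accordingly
body = json.dumps(payload).encode('utf-8')
request = urllib.request.Request(url, data=body,
                                 headers={'Content-Type': 'application/json'})

print(request.get_method())                # POST
print(request.get_header('Content-type'))  # application/json
```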

How can I add a cookie to the headers?

I want to build an automated testing tool using an API.
First, I log in to the site and get a cookie.
My code is Python 3:
import urllib3
from bs4 import BeautifulSoup
url = 'http://ip:port/api/login'
http = urllib3.PoolManager()
r = http.request('POST', url, fields={'userName': 'id', 'password': 'password'})
soup = BeautifulSoup(r.data.decode('utf-8'), 'lxml')
# Pull the JSESSIONID out of the Set-Cookie response header
str1 = r.getheaders().get('Set-Cookie')
str2 = 'JSESSIONID' + str1.split('JSESSIONID')[1]
str2 = str2[0:-2]
print(str2)
-- JSESSIONID=df0010cf-1273-4add-9158-70d817a182f7; Path=/; HttpOnly
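Splitting the raw Set-Cookie string by hand is fragile; the standard library's http.cookies module can parse it instead. A sketch using the example value printed above:

```python
from http.cookies import SimpleCookie

# The Set-Cookie header value printed above
raw = 'JSESSIONID=df0010cf-1273-4add-9158-70d817a182f7; Path=/; HttpOnly'

cookie = SimpleCookie()
cookie.load(raw)

# The name/value pair, ready to be sent back in a Cookie request header
session_id = cookie['JSESSIONID'].value
print('JSESSIONID=' + session_id)
# JSESSIONID=df0010cf-1273-4add-9158-70d817a182f7
```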
Then I add the cookie to the headers of a request to another API on the site, but it is not working:
url2 = 'http://ip:port/api/notebook/job/paragraph'
r2 = http.request('POST', url2)
r2.headers['Set-Cookie'] = str2
r2.headers['Cookie'] = str2
http.request('POST', url2, headers=r2.headers)
Why is it not working? It shows another cookie.
If you understand this situation, please explain it to me.
The error contents are:
HTTP ERROR 500
Problem accessing /api/login;JSESSIONID=b8f6d236-494b-4646-8723-ccd0d7ef832f.
Reason: Server Error
Caused by: javax.servlet.ServletException: Filtered request failed.
ProtocolError: ('Connection aborted.', BadStatusLine('<html>\n',))
thanks a lot!
Use the requests module in Python 3.x. You have to create a session, which you are not doing now; that's why you are facing problems.
import requests
s = requests.Session()
url = 'http://ip:port/api/login'
r = s.get(url)
dct = s.cookies.get_dict()  # returns the cookies (if any) as a dict
Take whichever cookie the server wants, along with all the required headers, and pass them in the request headers:
jid = dct["JSESSIONID"]
head = {'Cookie': 'JSESSIONID=' + jid}  # plus any other headers the server requires
payload = {'userName': 'id', 'password': 'password'}
r = s.post(url, data=payload, headers=head)
r = s.get('whatever url after login')
To find out which specific headers you have to pass and which parameters are required for the POST:
Open the link in Google Chrome.
Open the developer console (Fn + F12).
In the Network tab, search for the login request (if you cannot find it, submit wrong details on purpose).
You will get info about the request headers and the POST parameters.

I can't get an HTML page with requests

I would like to get an HTML page and read its content. I use requests (Python) and my code is very simple:
import requests
url = "http://www.romatoday.it"
r = requests.get(url)
print r.text
When I try this procedure I always get:
Connection aborted.', error(110, 'Connection timed out')
If I open the URL in a browser everything works well, and if I use requests with other URLs everything is OK.
I think it is a peculiarity of "http://www.romatoday.it", but I don't understand what the problem is. Can you help me please?
Maybe the problem is that the comma here
>> url = "http://www.romatoday,it"
should be a dot
>> url = "http://www.romatoday.it"
I tried that and it worked for me
Hmm... Have you tried other packages instead of requests?
The code below gives the same result as your code:
import urllib
url = "http://www.romatoday.it"
r = urllib.urlopen(url)
print r.read()
Here is a picture that I captured after running your code.

How to encode POST data using Python's urllib

I'm trying to connect to my Django server using a client-side Python script. Right now I'm just trying to connect to a view and retrieve the HttpResponse. The following works fine:
import urllib2
import urllib
url = "http://localhost:8000/minion/serve"
request = urllib2.Request(url)
response = urllib2.urlopen(request)
html = response.read()
print html
However, if I change it to
import urllib2
import urllib
url = "http://localhost:8000/minion/serve"
values = {'name': 'Bob'}
data = urllib.urlencode(values)
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
html = response.read()
print html
I get urllib2.HTTPError: HTTP Error 500: INTERNAL SERVER ERROR. Am I doing something wrong here? Here are the tutorials I was trying to follow: http://techmalt.com/?p=212 and http://docs.python.org/2/howto/urllib2.html.
EDIT: I tried to make the following change as per That1Guy's suggestion (other lines left the same):
request = urllib2.Request(url, data)
response = urllib2.urlopen(request, data)
This returned the same error message as before.
EDIT: It seems to work if I change the page I'm viewing, so the error isn't in the client-side script. In light of that revelation, here's the server-side view that's being accessed:
def serve(request):
    return HttpResponse("You've been served!")
As you can see, it's very straight forward.
EDIT: I tested whether the internal error might be caused by a missing CSRF token, but the csrf_exempt decorator failed to resolve the error.
Finally figured it out with this: How to POST dictionary from python script to django url?
Turns out my URL was missing a trailing slash. It's the little things that always get you, I guess.
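For reference, a Python 3 sketch of the working POST (urllib2 was split into urllib.request and urllib.parse); the trailing slash on the URL reflects the fix described above:

```python
import urllib.parse
import urllib.request

# Trailing slash matters here: without it, the POST failed with a 500
url = "http://localhost:8000/minion/serve/"

values = {'name': 'Bob'}
data = urllib.parse.urlencode(values).encode('ascii')  # POST bodies must be bytes

# Supplying data makes this a POST request
request = urllib.request.Request(url, data)
print(request.get_method())  # POST
print(request.data)          # b'name=Bob'
```

urllib.request.urlopen(request) would then send it to the running server.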

Using Python urlopen for a URL query

Using urlopen for a URL with a query string seems straightforward. What I tried is:
import urllib2
query='http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627'
f = urllib2.urlopen(query)
s = f.read()
f.close()
However, for this specific URL the query fails with HTTP Error 403: Forbidden.
When I enter this query in my browser, it works.
It also works when I use http://www.httpquery.com/ to submit the query.
Do you have suggestions on how to use Python correctly to grab the correct response?
Looks like it requires cookies (which you can handle with urllib2), but an easier way, if you're doing this, is to use requests:
import requests
session = requests.session()
r = session.get('http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627')
This is generally a much easier and less stressful method of retrieving URLs in Python.
requests will automatically store and re-use cookies for you. Creating a session is slightly overkill here, but it is useful when you need to submit data to login pages, or re-use cookies across a site, etc.
Using urllib2, it is something like:
import urllib2, cookielib
cookies = cookielib.CookieJar()
opener = urllib2.build_opener( urllib2.HTTPCookieProcessor(cookies) )
data = opener.open('url').read()
It appears that the urllib2 default user agent is banned by the host. You can simply supply your own user agent string:
import urllib2
url = 'http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627'
request = urllib2.Request(url, headers={"User-Agent" : "MyUserAgent"})
contents = urllib2.urlopen(request).read()
print contents
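In Python 3 the two answers combine naturally (urllib2 became urllib.request and cookielib became http.cookiejar); a sketch that builds one opener with both a cookie jar and a custom User-Agent ("MyUserAgent" is the placeholder string from the answer above):

```python
import urllib.request
from http.cookiejar import CookieJar

# One opener that both stores cookies and sends a custom User-Agent
cookies = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookies))
opener.addheaders = [('User-Agent', 'MyUserAgent')]

# opener.open(query).read() would then fetch the page;
# the default headers are visible before any request is made:
print(opener.addheaders)  # [('User-Agent', 'MyUserAgent')]
```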
