google search error - python

I don't understand why this code works:
import urllib2
url = urllib2.urlopen('http://www.google.fr/search?hl=en&q=voiture').read()
print url
and not this one:
import urllib2
url = urllib2.urlopen('http://www.google.fr/search?hl=en&q=voiture&start=2&sa=N').read()
print url
It displays the following error:
**urllib2.HTTPError: HTTP Error 403: Forbidden**
Thanks ;)
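A likely cause is that Google blocks requests that do not look like they come from a browser, and the extra start/sa parameters trip its bot detection. A minimal sketch that sends a browser-style User-Agent (the header value is an illustrative assumption, not a confirmed fix):
import urllib2
# Assumption: the 403 is User-Agent-based bot blocking.
req = urllib2.Request(
    'http://www.google.fr/search?hl=en&q=voiture&start=2&sa=N',
    headers={'User-Agent': 'Mozilla/5.0'})
print urllib2.urlopen(req).read()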

Related

how to handle 30x when using feedparser to parse rss url

I am using Python 3's feedparser to parse an RSS URL; this is my code:
import logging
import feedparser

logger = logging.getLogger(__name__)

if __name__ == "__main__":
    try:
        feed = feedparser.parse("https://ucw.moe/feed/rss")
        print(feed.status)
    except Exception as e:
        logger.error(e)
but I get this error:
HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Found
What should I do to fix this problem?
Use requests to fetch the feed first, then hand the content to feedparser:
import requests
import feedparser
page = requests.get("https://ucw.moe/feed/rss")
print(page.status_code)
feed = feedparser.parse(page.content)
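This likely works because requests follows the redirect chain like a browser and hands feedparser the final document. A small extension of the same idea, checking the HTTP status and iterating the parsed entries (entry.title and entry.link are standard feedparser fields):
import requests
import feedparser

page = requests.get("https://ucw.moe/feed/rss", timeout=10)
page.raise_for_status()               # fail loudly on 4xx/5xx
feed = feedparser.parse(page.content)
for entry in feed.entries:
    print(entry.title, entry.link)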

Not able to get correct response using requests.session()

I am trying to scrape a movie booking page on the BookMyShow site using the requests library. My code is:
import requests
from bs4 import BeautifulSoup

s = requests.Session()

rs = s.get("https://in.bookmyshow.com/serv/getData?cmd=GETSHOWINFO&vid=MOMV&ssid=1112")
print(rs.status_code)
rt = s.get("https://in.bookmyshow.com/serv/doSecureTrans.bms")
print(rt.status_code)
print(rs.text)
This is the code I am using to fetch the page source. The first request returns a 200 response, and the second one also returns 200, but when I print the page source I get "invalid request" as the output. What could be the error?
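One common reason for an "invalid request" body despite a 200 status is that the endpoint expects browser-like headers or cookies that a bare session does not send. A minimal sketch to experiment with (the header values are illustrative assumptions, not a confirmed fix):
import requests

s = requests.Session()
# Hypothetical browser-like headers; the exact set the site
# checks is an assumption.
s.headers.update({
    'User-Agent': 'Mozilla/5.0',
    'Referer': 'https://in.bookmyshow.com/',
})
rs = s.get("https://in.bookmyshow.com/serv/getData?cmd=GETSHOWINFO&vid=MOMV&ssid=1112")
print(rs.status_code)
print(rs.text)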

Python 3, urlopen - HTTP Error 403: Forbidden

I'm trying to automatically download the first image that appears in a Google image search, but I can't read the website source; I get "HTTP Error 403: Forbidden".
Any ideas? Thank you for your help!
That's my code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
word = 'house'
r = urlopen('https://www.google.pl/search?&dcr=0&tbm=isch&q='+word)
data = r.read()
Apparently you have to pass a headers argument, because the website blocks requests it thinks come from a bot. I found an example of doing this here: HTTP error 403 in Python 3 Web Scraping.
Also, urlopen itself doesn't accept a headers argument, so I had to use a Request object instead.
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
word = 'house'
r = Request('https://www.google.pl/search?&dcr=0&tbm=isch&q='+word, headers={'User-Agent': 'Mozilla/5.0'})
response = urlopen(r).read()
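Since BeautifulSoup is already imported, the next step of grabbing the first image URL could look like the sketch below. Note that the selector is an assumption; Google's result markup changes often and may not expose plain <img> tags:
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup

word = 'house'
req = Request('https://www.google.pl/search?&dcr=0&tbm=isch&q=' + word,
              headers={'User-Agent': 'Mozilla/5.0'})
html = urlopen(req).read()

soup = BeautifulSoup(html, 'html.parser')
img = soup.find('img')  # assumes results render as plain <img> tags
if img is not None:
    print(img.get('src'))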

https get request with python urllib2

I am trying to fetch data from Quandl using urllib2. Please check the code below.
import json
from pymongo import MongoClient
import urllib2
import requests
import ssl
#import quandl

codes = [100526];
for id in codes:
    url = 'https://www.quandl.com.com//api/v3/datasets/AMFI/"+str(id)+".json?api_key=XXXXXXXX&start_date=2013-08-30'
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    data = response.read()
    print data
OR
for id in codes:
    url = "https://www.quandl.com.com//api/v3/datasets/AMFI/"+str(id)+".json?api_key=XXXXXXXX&start_date=2013-08-30"
    request = requests.get(url, verify=False)
    print request
I am getting an HTTPError 404 exception in the first case, and when I use the requests module I get an SSL error even after using verify=False. I have looked through previous posts, but most of them are about plain HTTP requests.
Thanks for the help.
J
This works for me; you get a warning about the SSL certificate, but you don't need to worry about it.
import requests

codes = [100526]
for id in codes:
    url = "https://www.quandl.com//api/v3/datasets/AMFI/"+str(id)+".json?api_key=XXXXXXXX&start_date=2013-08-30"
    request = requests.get(url, verify=False)
    print request.text
request.text has your response data.
You seem to be using a wrong URL (.com.com instead of .com), as well as a mix of different quote characters in the first version of your code. Use the following instead and it should work:
import urllib2
import requests

codes = [100526]
for id in codes:
    url = "https://www.quandl.com//api/v3/datasets/AMFI/"+str(id)+".json?start_date=2013-08-30"
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    print response.read()

for id in codes:
    url = "https://www.quandl.com//api/v3/datasets/AMFI/"+str(id)+".json?start_date=2013-08-30"
    response = requests.get(url, verify=False)
    print response.text
To disable the warning about the SSL certificate, run the following before making the request with requests:
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
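The same suppression can also be done through urllib3 directly, without going through the requests.packages alias; a short sketch:
import urllib3

# Suppress only the InsecureRequestWarning raised for verify=False.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)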

I can't get an HTML page with requests

I would like to fetch an HTML page and read its content. I am using requests (Python), and my code is very simple:
import requests
url = "http://www.romatoday.it"
r = requests.get(url)
print r.text
Whenever I run this, I get:
Connection aborted.', error(110, 'Connection timed out')
If I open the URL in a browser, everything works.
If I use requests with other URLs, everything is fine.
I think it is something particular to "http://www.romatoday.it", but I don't understand what the problem is. Can you help me, please?
Maybe the problem is that the comma here
>> url = "http://www.romatoday,it"
should be a dot:
>> url = "http://www.romatoday.it"
I tried that and it worked for me.
Hmm, have you tried other packages instead of requests?
The code below gives the same result as your code:
import urllib
url = "http://www.romatoday.it"
r = urllib.urlopen(url)
print r.read()
[Screenshot: output captured after running the code]
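If the hang is reproducible, a cheap experiment is to add a timeout and a browser-style User-Agent, since some servers stall connections that do not look like a browser. A sketch (the header value is an assumption, not a confirmed fix):
import requests

r = requests.get(
    "http://www.romatoday.it",
    headers={"User-Agent": "Mozilla/5.0"},  # assumed browser check
    timeout=10,  # fail fast instead of hanging
)
print(r.status_code)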
