Error when trying to open a webpage with mechanize - python

I'm trying to learn mechanize in order to create a chat-logging bot later, so I tested out some basic code:
import mechanize as mek
import re
br = mek.Browser()
br.open("google.com")
However, whenever I run it, I get this error:
Traceback (most recent call last):
File "/home/runner/.local/share/virtualenvs/python3/lib/python3.7/site-packages/mechanize/_mechanize.py", line 262, in _mech_open
url.get_full_url
AttributeError: 'str' object has no attribute 'get_full_url'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 5, in <module>
br.open("google.com")
File "/home/runner/.local/share/virtualenvs/python3/lib/python3.7/site-packages/mechanize/_mechanize.py", line 253, in open
return self._mech_open(url_or_request, data, timeout=timeout)
File "/home/runner/.local/share/virtualenvs/python3/lib/python3.7/site-packages/mechanize/_mechanize.py", line 269, in _mech_open
raise BrowserStateError("can't fetch relative reference: "
mechanize._mechanize.BrowserStateError: can't fetch relative reference: not viewing any document
I double-checked against the documentation on the mechanize page, and my code seems consistent with it. What am I doing wrong?

You have to include a URL scheme; otherwise mechanize thinks you are trying to open a local/relative path (as the error suggests).
br.open("google.com") should be br.open("http://google.com").
You will then see the error mechanize._response.httperror_seek_wrapper: HTTP Error 403: b'request disallowed by robots.txt', because google.com does not allow crawlers. You can work around this by calling br.set_handle_robots(False) before open().
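A minimal sketch of the fix, assuming mechanize is installed (pip install mechanize). The helper ensure_scheme is hypothetical and not part of mechanize: it prepends "http://" when the string has no scheme, using the standard library's urllib.parse.urlsplit. The mechanize import sits inside open_page so the helper can be used (and tested) without the library or network access.

```python
from urllib.parse import urlsplit


def ensure_scheme(url, default_scheme="http"):
    """Hypothetical helper: prepend a scheme if `url` has none.

    "google.com" has no scheme, so mechanize treats it as a relative
    reference; "http://google.com" is an absolute URL it can fetch.
    """
    if urlsplit(url).scheme:
        return url
    return f"{default_scheme}://{url}"


def open_page(url):
    """Fetch `url` with mechanize and return the raw response body."""
    import mechanize  # third-party; imported here so ensure_scheme works without it

    br = mechanize.Browser()
    # Skip the robots.txt check that otherwise produces
    # "HTTP Error 403: request disallowed by robots.txt".
    br.set_handle_robots(False)
    response = br.open(ensure_scheme(url))
    return response.read()
```

With this, open_page("google.com") requests http://google.com instead of raising BrowserStateError. Keep in mind that set_handle_robots(False) stops mechanize from honoring the site's crawling rules, so use it only where that is acceptable.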

Related

Python Requests unable to get redirected URL

I have defined the following function to get redirected URLs using the Requests library. However, I get the error KeyError: 'location'.
import requests

def get_redirected_url(r_url):
    r = requests.get(r_url, allow_redirects=False)
    url = r.headers['Location']
    return url
Calling the function:
get_redirected_url('http://omgili.com/ri/.wHSUbtEfZQujfav8g98PjRMi_ogV.5EwBTfg476RyS2Gqya3tDAwNIv8Yi8wQ9AK4.U2mxeyq2_xbUjqsOx8NYY8r0qgxD.4Bm2SrouZKnrg1jqRxEfVmGbtTaKTaaDJtOjtS46fYr6A5UJoh9BYxVtDGJIsbSfgshRXR3FVr4-')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in get_redirected_url
File "/home/user/PycharmProjects/untitled/venv/lib/python3.6/site-packages/requests/structures.py", line 54, in __getitem__
return self._store[key.lower()][1]
KeyError: 'location'
Is it failing because the redirect waits for 5 seconds? If so, how do I incorporate that as well?
I have tried other answers, like this and this, but I'm unable to crack it.
It is simple: r.headers doesn't have a 'Location' key. You may have used the wrong key.
Edit: the site you want to browse with requests is protected.
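A defensive sketch of the question's function, assuming the redirect (if any) is a plain HTTP 3xx response. It reads the header with .get(), so a missing Location header returns None instead of raising KeyError. The names REDIRECT_CODES and extract_location are hypothetical, and the 10-second timeout is an arbitrary choice. Note that requests only follows HTTP-level redirects: if a page waits 5 seconds and then redirects via an HTML meta refresh or JavaScript, there is no Location header to read at all.

```python
# HTTP status codes that use a Location header for redirects.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def extract_location(status_code, headers):
    """Return the redirect target, or None if this is not an HTTP redirect.

    headers.get() avoids the KeyError from headers['Location']; requests'
    response headers are case-insensitive, so 'Location' also matches 'location'.
    """
    if status_code in REDIRECT_CODES:
        return headers.get("Location")
    return None


def get_redirected_url(r_url):
    import requests  # third-party; imported here so extract_location works without it

    r = requests.get(r_url, allow_redirects=False, timeout=10)
    # None means the server did not answer with a 3xx redirect: it may redirect
    # via HTML/JavaScript instead, or it may be blocking the request.
    return extract_location(r.status_code, r.headers)
```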

python skype groupchat GET response error

I'm getting an error when trying to communicate in a group chat via Skype.
Traceback (most recent call last):
File "C:\Program Files\Python\Python38-32\ga.py", line 12, in
skc.recent()
File "C:\Program Files\Python\Python38-32\lib\site-packages\skpy\chat.py", line 452, in recent
info = self.skype.conn("GET", "{0}/threads/{1}".format(self.skype.conn.msgsHost, json.get("id")),
File "C:\Program Files\Python\Python38-32\lib\site-packages\skpy\conn.py", line 219, in call
raise SkypeApiException("{0} response from {1} {2}".format(resp.status_code, method, url), resp)
skpy.core.SkypeApiException: ('404 response from GET https://azwus1-client-s.gateway.messenger.live.com/v1/threads/19:*********************', )
[Finished in 7.483s]
I'm just trying to send a simple test message to a group chat, but I'm not sure what is causing this error.
I'm currently using this code:
ch = sk.chats["19:***********************@thread.skype"]  # due to privacy issues, I can't display the ID
ch.sendMsg("testing")
whereas if I use it this way, to create a new conversation:
ch = sk.contacts["live#123"].chat
ch.sendMsg("testing")
it works.
Can someone enlighten me as to what the issue is? I'd really appreciate it.

How to detect/trap error codes from convertapi so that my python app doesn't fail?

First, my apologies: I'm a Python newbie asking this question. It probably has nothing at all to do with convertapi and more to do with my basic lack of understanding of how to interact with APIs.
I'm reading a Google Sheet to find embedded hyperlinks that reference files (PDF, HTML, whatever) and then using convertapi to get a TXT version, so that I can do content analysis based on the existence, count, and proximity of various terms.
My question has to do with convertapi.convert failing because (in this case) convertapi considers the PDF invalid: I tested the file at convertapi.com and it returned a 5002 error. I don't dispute that the file may be bad; all I want to do is detect that convertapi.convert can't convert the file so that I can skip it and move on.
My Python code has a small function:
import convertapi

def convert_PDF_to_text(inputfilename):
    result = convertapi.convert('txt', { 'File': inputfilename }, from_format = 'pdf')
    result.save_files('converted_pdf_files')
...and while it works fine for some inputs, one particular PDF URL results in this output (including my own messages from the program):
about to call convertapi.convert with filename (https://www.epa.gov/sites/production/files/2016-06/documents/2016_policy_order_revision_6-10-16.pdf)
yes this is the specific file causing the problem: https://www.epa.gov/sites/production/files/2016-06/documents/2016_policy_order_revision_6-10-16.pdf
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/convertapi/client.py", line 46, in handle_response
r.raise_for_status()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://v2.convertapi.com/convert/pdf/to/txt?Secret=PIuLcqNVL8w4rc9Y
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./p1.py", line 244, in <module>
convert_PDF_to_text(source_URL)
File "./p1.py", line 63, in convert_PDF_to_text
result = convertapi.convert('txt', { 'File': inputfilename }, from_format = 'pdf')
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/convertapi/api.py", line 7, in convert
return task.run()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/convertapi/task.py", line 26, in run
response = convertapi.client.post(path, params, timeout = timeout)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/convertapi/client.py", line 16, in post
return self.handle_response(r)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/convertapi/client.py", line 49, in handle_response
raise ApiError(r.json())
convertapi.exceptions.ApiError: <exception str() failed>
I know it should be obvious from the errors what I should check... but I'm too new to Python and APIs to know how to decipher them.
How do I test for errors so that my Python code doesn't abort?
Thanks in advance, and again, sorry for the basic question. Yes, I did search for answers and didn't find anyone addressing my question; it's likely too simple...
All: disregard. I used try: and except: to handle this.
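The asker settled on try/except without posting the code, so here is one minimal way to do it, assuming the API secret is already configured elsewhere in the program. The exception class convertapi.exceptions.ApiError is taken from the traceback in the question; convert_or_skip and convert_and_save are hypothetical helper names. The warning logs only the exception's type name, because the traceback shows str() failing on this ApiError.

```python
import logging


def convert_or_skip(inputfilename, convert_fn, skip_on):
    """Call convert_fn(inputfilename). If it raises one of the exception
    types in `skip_on`, log a warning and return False instead of crashing.
    Any other exception still propagates."""
    try:
        convert_fn(inputfilename)
    except skip_on as exc:
        # Log only the type name: str() of convertapi's ApiError failed in the question.
        logging.warning("Skipping %s (%s)", inputfilename, type(exc).__name__)
        return False
    return True


def convert_PDF_to_text(inputfilename):
    # Third-party imports kept local so convert_or_skip works without convertapi.
    import convertapi
    from convertapi.exceptions import ApiError

    def convert_and_save(name):
        result = convertapi.convert('txt', {'File': name}, from_format='pdf')
        result.save_files('converted_pdf_files')

    return convert_or_skip(inputfilename, convert_and_save, (ApiError,))
```

In the main loop, something like `if not convert_PDF_to_text(source_URL): continue` then moves on to the next file. Catching only ApiError (rather than a bare except:) keeps genuine bugs in your own code visible.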

hubspot3 client and "too many retries" error

I'm trying to pull the details for a contact from HubSpot using the recipient's email. I'm using the Python 3 client "hubspot3" (https://github.com/jpetrucciani/hubspot3).
Here's the code I'm submitting:
import requests
from hubspot3.contacts import ContactsClient
API_KEY = "[myapikey]"  # redacted
client = ContactsClient(api_key=API_KEY, debug=True)
email = 'mytest@gmail.com'
client.get_contact_by_email(email)
The response:
WARNING:root:Too many retries for /contacts/v1/contact/email/nwnippy27+cb1@gmail.com/profile?hapikey=[myapikey]
Traceback (most recent call last):
File "hubspot_api_test.py", line 11, in <module>
client.get_contact_by_email(email)
File "/opt/virtual_env/hubspot-test/lib/python3.7/site-packages/hubspot3/contacts.py", line 38, in get_contact_by_email
"contact/email/{email}/profile".format(email=email), method="GET", **options
File "/opt/virtual_env/hubspot-test/lib/python3.7/site-packages/hubspot3/base.py", line 339, in _call
**options
File "/opt/virtual_env/hubspot-test/lib/python3.7/site-packages/hubspot3/base.py", line 245, in _call_raw
result = self._execute_request_raw(connection, request_info)
File "/opt/virtual_env/hubspot-test/lib/python3.7/site-packages/hubspot3/base.py", line 162, in _execute_request_raw
raise HubspotNotFound(result, request)
hubspot3.error.HubspotNotFound:
Hubspot Error
I'm reading this error as saying that the email address can't be found. Is that correct? If not, I'd appreciate any insight into the cause and solution.
OK... so not super useful, but it turned out that this is just the error you get when the email doesn't exist. The client retries a few times before giving up, which is why you also see the "too many retries" warning.
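Given that answer, a missing contact can be treated as a normal outcome rather than a crash. A minimal sketch, assuming hubspot3 is installed; HubspotNotFound and its module path come from the traceback above, and get_contact_or_none is a hypothetical helper. The not_found parameter exists only so the function can be exercised without hubspot3 or network access.

```python
def get_contact_or_none(client, email, not_found=None):
    """Return whatever client.get_contact_by_email(email) returns,
    or None if HubSpot reports that no such contact exists."""
    if not_found is None:
        from hubspot3.error import HubspotNotFound  # class name from the traceback
        not_found = HubspotNotFound
    try:
        return client.get_contact_by_email(email)
    except not_found:
        return None
```

Usage with the question's client: `contact = get_contact_or_none(client, 'mytest@gmail.com')`, then check `if contact is None:`. This does not skip the client's retries; it only stops the final HubspotNotFound from aborting the script.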

How to get active tab url from browser?

How do I get the URL of the current tab in a browser using Python? Using os.environ['REQUEST_URI'] gives an error.
The following is my code:
os.environ['REQUEST_URI']
and the error:
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
os.environ["REQUEST_URI"]
File "C:\Python27\lib\os.py", line 423, in __getitem__
return self.data[key.upper()]
KeyError: 'REQUEST_URI'
Any other alternatives are also welcome.
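As an alternative to os.environ (REQUEST_URI is set by a web server for CGI scripts, so it does not exist in a local Python shell, hence the KeyError), you can get the URL:
- by copying and pasting it by hand
- by means other than the os.environ approach you asked about:
  - with Brotab
  - with Selenium
  - with xdotool
  - with JavaScript, from the server side or by code injection on the client side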
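A minimal sketch of the Selenium route, assuming Selenium and a working Firefox driver are installed. Selenium's current_url reports the URL of the tab in a browser that Selenium itself launched and controls; it cannot read tabs from a browser you opened by hand. The function name and the driver_factory parameter are hypothetical; the parameter only exists so a stand-in driver can be substituted.

```python
def open_and_report_url(start_url, driver_factory=None):
    """Open start_url in a Selenium-controlled browser and return the URL
    of the tab the driver is focused on."""
    if driver_factory is None:
        from selenium import webdriver  # third-party: pip install selenium
        driver_factory = webdriver.Firefox
    driver = driver_factory()
    try:
        driver.get(start_url)
        # current_url is the address of the tab the driver is focused on.
        return driver.current_url
    finally:
        driver.quit()  # always close the browser, even if something failed
```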
