Invalid username using snscrape for Twitter - python

When I enter a username that is clearly invalid, it returns a ValueError as expected. However, I can't seem to catch this error and do something about it; the error is just shown in the terminal. This is the code I have so far.
import snscrape.modules.twitter as twitterScraper

scraper = twitterScraper.TwitterUserScraper("ksdbdbkvbdvvbdvbsdvbvbdskbksd")
try:
    if scraper._get_entity():
        print(True)
except ValueError:
    print("Not found")
This should output "Not found" but just outputs ValueError: Invalid username. Any help on how I can solve this? Thanks.

The problem is that the ValueError is raised on the line where scraper is created: the TwitterUserScraper constructor validates the username, so the exception occurs before your try block is ever entered. You have to move that line inside the try, like this:
import snscrape.modules.twitter as twitterScraper

try:
    scraper = twitterScraper.TwitterUserScraper("ksdbdbkvbdvvbdvbsdvbvbdskbksd")
    scraper._get_entity()
except ValueError:
    print("Not found")
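If you want to reuse the check, one option is to wrap it in a small helper that returns a boolean. This is only a sketch: the user_exists name is illustrative, and it relies on the same private _get_entity() method used above, which may change between snscrape versions.

import snscrape.modules.twitter as twitterScraper

def user_exists(username):
    # Hypothetical helper built on the question's code: the constructor call has to
    # sit inside the try, because that is where the ValueError for an invalid
    # username is raised. _get_entity() is a private snscrape method.
    try:
        scraper = twitterScraper.TwitterUserScraper(username)
        return bool(scraper._get_entity())
    except ValueError:
        return False

print(user_exists("ksdbdbkvbdvvbdvbsdvbvbdskbksd"))  # expected: False for an invalid username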

Related

Selenium, Firefox, Error checking if an ID is on a page?

Sorry if this is kind of vague, I don't know how to explain it well, but basically I am trying to run a function that checks whether an ID is on a page, and I don't know how to do it. Here is the code of what I've attempted so far.
def checkoutpage():
    driver1.find_element_by_id('test')

try:
    if checkoutpage == True:
        print("Working")
    else:
        print("Not working")
except:
    print("ERROR")
It prints Not working no matter whether the ID is on the page or not. Help is appreciated.
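For reference, this kind of check is usually written by calling the function (note the parentheses) and letting find_element_by_id signal absence by raising NoSuchElementException. A minimal sketch, assuming driver1 is an existing Selenium WebDriver and keeping the older find_element_by_id API from the question:

from selenium.common.exceptions import NoSuchElementException

def checkoutpage():
    # Return True if the element exists, False if Selenium raises
    # NoSuchElementException because the ID is not on the page.
    try:
        driver1.find_element_by_id('test')
        return True
    except NoSuchElementException:
        return False

if checkoutpage():          # the function must be called with ()
    print("Working")
else:
    print("Not working")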

After timeout keep trying request

I've just started using Python to scrape data. But my code below freezes while running, and I guess that's because some url did not respond at all; I guess it would work if I just tried that url again. My question is, if I just revise the code like
reshomee = requests.get(homeUrl, headers=headerss, timeout=10)
then does this code try that url again after 10 seconds of no response? I am worried it would just give up without trying again. I have to ask because I have no good way to test this: the url freezes only very rarely and at random. Thank you!
def reshome(tries=0):
    try:
        reshomee = requests.get(homeUrl, headers=headerss)
        return reshomee
    except Exception as e:
        print(e)
        if tries < 10:
            print('try:' + str(tries))
            sleep(tries*30+100)
            return reshome(tries+1)
        else:
            print('cannot make it')
No, timeout= does not retry by itself; it makes requests.get raise requests.exceptions.Timeout when the server has not responded within the given time. You can catch that exception and retry:
def reshome(tries=0):
    try:
        # The very short timeout here is only to make the Timeout easy to trigger.
        reshomee = requests.get(homeUrl, headers=headerss, timeout=0.001)
        return reshomee
    except requests.exceptions.Timeout as e:
        return reshome(tries+1)
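If the recursion feels awkward, the same idea can be written as a bounded loop. This is a sketch only, with illustrative names (fetch_home, max_tries) and a plain time.sleep pause, assuming homeUrl and headerss are defined as in the question:

import time
import requests

def fetch_home(url, headers, max_tries=10, timeout=10):
    # Give the server `timeout` seconds per attempt, retry on timeouts or
    # connection errors, and give up after max_tries attempts.
    for attempt in range(max_tries):
        try:
            return requests.get(url, headers=headers, timeout=timeout)
        except (requests.exceptions.Timeout, requests.exceptions.ConnectionError):
            print('retry %d' % (attempt + 1))
            time.sleep(5)
    raise RuntimeError('no response after %d tries' % max_tries)

# reshomee = fetch_home(homeUrl, headerss)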

How to bypass missing link and continue to scrape good data?

I am using Python 2 and Ubuntu 14.04.3.
I am scraping a web page with multiple links to associated data.
Some associated links are missing so I need a way to bypass the missing links and continue scraping.
Web page 1
    part description 1 with associated link
    part description 2 w/o associated link
    more part descriptions with and w/o associated links
Web page n+
    more part descriptions
I tried:
try:
    # Do some things.
    # Error caused by missing link.
except IndexError as e:
    print "I/O error({0}): {1}".format(e.errno, e.strerror)
    break  # to go on to next link.
# Did not work because the program stopped to report the error!
Since the link is simply missing from the web page, I can't check for it with an if statement.
Thanks again for your help!!!
I corrected my faulty except clause by following the Python 2 documentation. With the corrected except clause, the scraper skips the missing link on the faulty web page and continues scraping data.
Corrected except clause:
except:
    # catch AttributeError: 'exceptions.IndexError' object has no attribute 'errno'
    e = sys.exc_info()[0]
    print "Error: %s" % e
    break
I will look into the answer(s) posted to my questions.
Thanks again for your help!
Perhaps you are looking for something like this:
import urllib

def get_content_safe(url):
    try:
        contents = urllib.urlopen(url)
        return contents
    except IOError, ex:
        # Report ex your way
        return None

def scrape():
    # ....
    content = get_content_safe(url)
    if content == None:
        pass  # or continue or whatever
    # ....
Long story short, just like Basilevs said, when you catch the exception your code will not break and will keep executing.
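As a usage sketch of that pattern inside the scraping loop itself (Python 2; part_links is an illustrative placeholder for whatever (description, link) pairs the page actually yields), continue lets the loop skip parts whose link is missing or unreachable:

# part_links is a placeholder for the scraped (description, link) pairs.
for description, link in part_links:
    if not link:
        continue  # no associated link on the page, skip this part
    content = get_content_safe(link)
    if content is None:
        continue  # link was present but could not be fetched, skip it
    print "scraping %s" % description
    # ... parse content here ...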

Python mechanize is not handling form exception

I am writing a web scraper using Python and mechanize. The scraper looks for the "Next" button and loops until it comes to the last page, which does not have a "Next" button. That raises a FormNotFoundError exception, which stops the loop. When I try to catch the exception, I get a NameError instead of the actual error.
What am I doing wrong?
Alternatively, is there a better way to stop the loop when I have reached the end?
Here is the relevant code.
import mechanize

br = mechanize.Browser()
br.open("http://example.com")
x = 0
while x > 1:
    try:
        br.select_form(nr=2)
        response = br.submit("next")
        # *otherstuff*
    except FormNotFoundError:
        break
Here is the error output.
File "scraping.py", line 32, in <module>
except FormNotFoundError:
NameError: name 'FormNotFoundError' is not defined
Can you try to change this to:
    except mechanize._mechanize.FormNotFoundError:
instead of this:
    except FormNotFoundError:
FormNotFoundError is not a built-in exception, so the bare name raises the NameError unless it is imported from mechanize or qualified with the module.
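Putting that together, a sketch of the loop (keeping the question's br.select_form(nr=2) and br.submit("next") calls and the qualified exception name from the answer) might look like this:

import mechanize

br = mechanize.Browser()
br.open("http://example.com")

while True:
    try:
        br.select_form(nr=2)
        response = br.submit("next")
        # ... process the page here ...
    except mechanize._mechanize.FormNotFoundError:
        # Last page reached: there is no "Next" form any more, so stop looping.
        break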

python if $_GET is empty do something & url path

import cgi

form = cgi.FieldStorage()
test = form['name'].value
if test is None:
    print('empty')
else:
    print('Hello ' + test)
... and that doesn't seem to display anything when my url is something like .../1.py.
If I set it to .../1.py?name=asd it displays Hello asd.
Also, how do I get everything after the question mark and after the domain name? For example, if I try to access http://localhost/thisis/test I want to get /thisis/test.
Edit: I tried to use try: and couldn't get it working.
To answer the first part of my question, I found what the problem was, and this is the correct code:
import cgi

form = cgi.FieldStorage()
try:
    test = form['name'].value
except KeyError:
    print('not found')
else:
    print(test)
For my second question:
import os
print(os.environ["REQUEST_URI"])
