How to deal with a 404 status code in Selenium (Python)

I'm writing a Selenium script in Python. Something I found out is that when Selenium gets a 404 status code, it crashes. What is the best way to deal with this?

I had a similar problem. Sometimes a server we were using (i.e., not the main server we were testing, only a "sub-server") would crash during our tests. I added a minor sanity test to check whether the server was up before the main tests ran. That is, I performed a simple GET request to the server, surrounded it with a try-catch, and if that passed I continued with the tests. Let me stress this point: before I even started Selenium, I would perform a GET request using Python's urllib2. It's not the best of solutions, but it's fast and it was enough for me.
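For reference, a minimal sketch of that sanity check (Python 2's urllib2; the URL is a placeholder). Run something like this before starting Selenium, and only proceed if it passes:

import urllib2

def server_is_up(url):
    try:
        urllib2.urlopen(url, timeout=10)
        return True
    except urllib2.HTTPError as e:
        # The server answered, but with an error status (404, 500, ...)
        print('Server returned %d' % e.code)
        return False
    except urllib2.URLError as e:
        # The server could not be reached at all
        print('Server unreachable: %s' % e.reason)
        return False

if server_is_up('http://your-sub-server.example/health'):
    # ... start the Selenium tests ...
    pass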

Related

Python SimpleHTTPServer keeps going down and I don't know why

This is my first time working with SimpleHTTPServer, and honestly my first time working with web servers in general, and I'm having a frustrating problem. I'll start up my server (via SSH) and then I'll go try to access it and everything will be fine. But I'll come back a few hours later and the server won't be running anymore. And by that point the SSH session has disconnected, so I can't see if there were any error messages. (Yes, I know I should use something like screen to save the shell messages -- trying that right now, but I need to wait for it to go down again.)
I thought it might just be that my code was throwing an exception, since I had no error handling, but I added what should be a pretty catch-all try/catch block, and I'm still experiencing the issue. (I feel like this is probably not the best method of error handling, but I'm new at this... so let me know if there's a better way to do this)
class MyRequestHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    # (this is the only function my request handler has)
    def do_GET(self):
        if 'search=' in self.path:
            try:
                pass  # (my code that does stuff)
            except Exception as e:
                pass  # (log the error to a file)
            return
        else:
            SimpleHTTPServer.SimpleHTTPRequestHandler.do_GET(self)
Does anyone have any advice for things to check, or ways to diagnose the issue? Most likely, I guess, is that my code is just crashing somewhere else... but if there's anything in particular I should know about the way SimpleHTTPServer operates, let me know.
I've never had SimpleHTTPServer running for an extended period of time; usually I just use it to transfer a couple of files in an ad hoc manner. But I guess it wouldn't be so bad, as long as your security restraints are elsewhere (i.e., a firewall) and you don't need much scale.
The SSH session is ending, which is killing your tasks (both foreground and background tasks). There are two solutions to this:
As you've already mentioned, use a utility such as screen to prevent your session from ending.
If you really want this to run for an extended period of time, you should look into your operating system's documentation on how to start/stop/enable services (nowadays most of the cool kids are using systemd, but you might also find yourself using SysVinit or some other init system)
EDIT:
This link is in the comments, but I thought I should put it here as it answers this question pretty well
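On the missing error messages specifically, one small addition (a sketch using only the Python 2 stdlib; the port and log file name are placeholders) is to route the server's output through the logging module into a file, so crash details survive even after the SSH session dies:

import logging
import SimpleHTTPServer
import SocketServer

logging.basicConfig(filename='server.log', level=logging.INFO)

class LoggingHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        # SimpleHTTPRequestHandler writes request logs to stderr by
        # default; redirect them into the log file instead.
        logging.info('%s - %s', self.client_address[0], format % args)

if __name__ == '__main__':
    httpd = SocketServer.TCPServer(('', 8000), LoggingHandler)
    try:
        httpd.serve_forever()
    except Exception:
        logging.exception('server crashed')  # full traceback on disk
        raise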

Django Selenium test fails sometimes in Travis CI

My team has puzzled over this issue on and off for weeks now. We have a test suite using LiveServerTestCase that runs all of our Selenium-based tests. One test in particular will sometimes fail seemingly at random: I could change a comment in a different file and the test would fail; changing some other comment would fix it again. We are using the Firefox webdriver for the Selenium tests:
self.driver = Firefox()
Testing locally inside our Docker container can never reproduce the error. This is most likely because, when tests.py is run outside of Travis CI, a different web driver is used instead of Firefox(). The web driver in that case is as follows:
self.driver = WebDriver("http://selenium:4444/wd/hub", desired_capabilities={'browserName':'firefox'})
For local testing, we use a Selenium container.
The test that fails is a series of sub-tests that each tests a filtering search feature that we have; each sub-test is a different filter query. The sequence of each sub-test is:
Find the filter search bar element
Send the filter query (a string, i.e. something like "function = int main()")
Simulate the browser click to execute the query
For the specific filter on the set of data (the set of data is consistent throughout the subtests), assert that the length of the returned results matches what is expected for that specific filter
Very often this test will pass when run in Travis CI, and as noted before, this test always passes when run locally. The error cannot be reproduced when interacting with the site manually in a web browser. However, once in a while, this sort of error will appear in the test output in Travis CI:
- Broken pipe from ('127.0.0.1', 39000)
- Broken pipe from ('127.0.0.1', 39313)
39000 and 39313 are not always the numbers--these change every time a new Travis CI build is run. These seem like port numbers, though I'm not really sure what they actually are.
We have time.sleep(sec) lines right before fetching the list of results for a filter. Increasing the sleep time usually will correlate with a temporary fix of the broken pipe error. However, the test is very fickle and changing the sleep time likely does not have much to do with fixing the error at all; there have been times where the sleep time has been reduced or taken out of a subtest and the test will pass. In any case, as a result of the broken pipe, the filter cannot get executed and the assertion fails.
One potentially interesting detail is that regardless of the order of subtests, it is always the first subtest that fails if the broken pipe error occurs. If, however, the first subtest passes, then all subtests will always pass.
So, my question is: what on earth is going on here and how do we make sure that this random error stops happening? Apologies if this is a vague/confusing question, but unfortunately that is the nature of the problem.
It looks like your issue may be similar to what this fellow was running into. It's perhaps an issue with your timeouts. You may want to use an explicit wait, or try waiting for a specific element to load before comparing the data. I had similar issues where my Selenium test would poll for an image before the page had finished loading. As I say, this may not be the same issue, but it could potentially help. Good luck!
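To make the explicit-wait idea concrete, here is a sketch of one sub-test rewritten around WebDriverWait instead of time.sleep. The locators and expected_count are hypothetical; only the Selenium imports and wait API are real:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(self.driver, 10)  # poll for up to 10 seconds

# Wait for the search bar rather than sleeping a fixed amount of time
search_bar = wait.until(
    EC.presence_of_element_located((By.ID, 'filter-search')))
search_bar.send_keys('function = int main()')

# Only click once the button is actually clickable
wait.until(EC.element_to_be_clickable((By.ID, 'filter-submit'))).click()

# Wait for the results to render before counting them
results = wait.until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.result-row')))
self.assertEqual(len(results), expected_count)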
I just ran into this myself; it is caused by Django's built-in server not using Python's logging system. This has been fixed in 1.10, but that has not been released yet at the time of writing. In my case it is acceptable to leave the messages in the log until it is time to upgrade; that is better than adding timeouts and increasing build time.
Django ticket on the matter
Code that's causing the issue in 1.9.x

A curious case of nginx, uwsgi and Python

We have a Python MVC web application built using werkzeug, jinja2 and MongoEngine.
In production we have 4 nginx servers set up behind an nginx load balancer. All 4 servers share a common Mongo server, a Redis server and a Sphinx server. We are using uWSGI between nginx and the application.
Now to the curious case.
Once we deploy new code, we do a touch xyz.wsgi. For a few hours everything looks fine, but after that we randomly get the error:
'module' object is not callable
I have seen this error before, in other python development scenarios. But what confuses me this time is the total random behavior.
For example: example.com/multimedia?keywords=sdf&s=title&c=21830.
If we refresh, the error is gone. Try another value for any parameter, like 'keywords=xzy', and there it is again. Refresh, and it's gone.
That 'multimedia' module is something we added just recently, so we can assume it's the root cause. But why does the error occur randomly?
My assumption is that it might have something to do with nginx caching or the existence of .pyc/.pyo files. Could an illicit global variable be the cause?
Could your expert hands help me out?
The error probably occurs randomly because it's a runtime error in your code. That is, it doesn't get fired until a user visits your site with the right conditions to follow the code path that results in this error.
It's unlikely to be an nginx caching issue. If nginx were caching it, it would probably return the same result over and over, rather than change on reload.
However, you can test this by removing nginx and directly testing against werkzeug. Run the requests against it and see if you see the same behavior. No use in debugging Nginx unless you can prove that the underlying systems work the way you expect.
It's also probably worth the 30 seconds to search for module() in your code, since that's the most direct interpretation of that error message.
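For reference, this is the usual shape of the bug, shown with a stdlib module instead of your 'multimedia' module: the name ends up bound to the module itself rather than the class or function inside it, which is easy to do with a plain import.

import datetime                    # 'datetime' is bound to the MODULE here

try:
    d = datetime(2020, 1, 1)       # calling the module itself...
except TypeError as e:
    print(e)                       # "'module' object is not callable"

from datetime import datetime      # now 'datetime' is the CLASS
d = datetime(2020, 1, 1)           # works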

Is there a cleanup phase in mod_wsgi?

I am trying to do some logging of a view in Django (mod_wsgi). However, I want to do this in a way that doesn't hold up the client, similar to the PerlCleanupHandler phase available in mod_perl. Notice the line "It is used to execute some code immediately after the request has been served (the client went away)". This is exactly what I want.
I want the client to be serviced, and then I want to do the logging. Is there a good insertion point for this code in mod_wsgi or Django? I looked into the suggestions here and here. However, in both cases, when I put in a simple time.sleep(10) and do a curl/wget on the URL, the curl doesn't return for 10 seconds.
I even tried to put the time.sleep in the __del__ method of the HttpResponse object, as suggested in one of the comments, but still no dice.
I am aware that I can probably put the logging data onto a queue and do some background processing to store the logs, but I would like to avoid that approach if there is a simpler/easier one.
Any suggestions?
See documentation at:
http://code.google.com/p/modwsgi/wiki/RegisteringCleanupCode
for a WSGI-specific (not mod_wsgi-specific) way.
Django, as pointed out by others, may have its own ways of doing this as well, although whether they fire after all the response content has been written back to the client, I don't know.
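The pattern on that wiki page is roughly the following (a sketch, not a mod_wsgi-specific API): wrap the WSGI response iterable, and do the deferred work in close(), which the WSGI spec requires the server to call only after the response has been fully sent to the client.

class Generator:
    def __init__(self, iterable, callback, environ):
        self.__iterable = iterable
        self.__callback = callback
        self.__environ = environ

    def __iter__(self):
        for item in self.__iterable:
            yield item

    def close(self):
        try:
            if hasattr(self.__iterable, 'close'):
                self.__iterable.close()
        finally:
            # Runs after the client has been served
            self.__callback(self.__environ)

class ExecuteOnCompletion:
    def __init__(self, application, callback):
        self.__application = application
        self.__callback = callback

    def __call__(self, environ, start_response):
        result = self.__application(environ, start_response)
        return Generator(result, self.__callback, environ)

Wrapping the Django WSGI application as ExecuteOnCompletion(application, log_request) would then run a (hypothetical) log_request function after each response, without holding up the client.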

Python web-scraping threaded performance

I have a web app that needs both functionality and performance tested, and part of the test suite that we plan on using is already written in Python. When I first wrote this, I used mechanize as my means of web-scraping, but it seems to be too bulky for what I'm trying to do (either that or I'm missing something).
The basic layout of what I'm trying to do is as follows. All are objects.
User has Comm (used to be the interface between my stuff and mechanize)
Comm has Browser (holds my CookieJar, urllib2, and BeautifulSoup objects, used to be mechanize)
Browser has Form(s) (used to be mechanize-handled)
Now, as far as threading goes, I have that down. Adjustments between dealing with the GIL and running separate instances of Python will be made as needed, but suggestions are welcome.
So what I need to do is thread users hitting the application and doing various things (logging in, filling out forms, submitting forms for processing, etc.) while not making the testing box scream too loudly. My current problem with mechanize seems to be RAM.
Part of what's causing the RAM issue is the need for separate browser instances for each user to keep from overwriting the JSESSIONID cookie every time I do something with a different user.
Much of this might seem trivial, but I'm trying to run thousands of threads here, so little tweaks can mean a lot. Any input is appreciated.
Threading causes problems with the GIL, more so with more cores. Try using mechanize with eventlet to achieve concurrency (eventlet uses cooperative green threads in a single process), and also check out multi-mechanize.
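If you go the eventlet route, the classic pattern looks something like this (a sketch; Python 2, URLs hypothetical). monkey_patch() makes urllib2's blocking sockets cooperative, so hundreds of fetches can be in flight in one process:

import eventlet
eventlet.monkey_patch()  # make stdlib sockets cooperative
import urllib2

urls = ['http://example.com/page%d' % i for i in range(50)]

def fetch(url):
    return url, urllib2.urlopen(url).read()

pool = eventlet.GreenPool(200)   # up to 200 concurrent green threads
for url, body in pool.imap(fetch, urls):
    print('%s: %d bytes' % (url, len(body)))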
Have you considered Twisted, the asynchronous library, for at least doing interaction with users?
I actually ended up not using mechanize, and used the threading module instead. This allowed for fairly quick transactions, and I also made sure not to do too much inside each thread. Logging in and getting the web app into the necessary state before spawning threads helped the threads run shorter and therefore faster.
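For completeness, a sketch of that approach (Python 2 stdlib; the URL and form fields are made up): each thread gets its own CookieJar, which also addresses the JSESSIONID-overwriting problem mentioned in the question.

import threading
import urllib
import urllib2
import cookielib

def run_user(user_id):
    jar = cookielib.CookieJar()  # one session (JSESSIONID) per user
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    # Log in (hypothetical endpoint and fields)
    data = urllib.urlencode({'user': 'user%d' % user_id, 'pass': 'secret'})
    opener.open('http://example.com/login', data)
    # Exercise the app with the authenticated session
    opener.open('http://example.com/do_stuff').read()

threads = [threading.Thread(target=run_user, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()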
