I'm receiving intermittent blank pages on my App Engine Python website. Typically these come when a new process is started or when I flush the cache. A single white page is served, and once that has been served everything is fine.
It's basically the same error as here:
http://groups.google.com/group/google-appengine/browse_thread/thread/c072383dc970e450
However, I have double and triple checked that I have the correct code in my Python file (the following is copied and pasted):
def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()
Here is an example response in the logs that generated the blank page:
01-02 04:46AM 48.539 / 200 188ms 570cpu_ms 383api_cpu_ms 0kb
Mozilla/5.0 (Windows; U; Windows NT
6.0; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.19
Safari/534.13,gzip(gfe),gzip(gfe),gzip(gfe)
86.164.42.252 - tjcritchlow [02/Jan/2011:04:46:48 -0800] "GET /
HTTP/1.1" 200 124 - "Mozilla/5.0
(Windows; U; Windows NT 6.0; en-US)
AppleWebKit/534.13 (KHTML, like Gecko)
Chrome/9.0.597.19
Safari/534.13,gzip(gfe),gzip(gfe),gzip(gfe)"
"www.7bks.com" ms=188 cpu_ms=570
api_cpu_ms=383 cpm_usd=0.016028
I 01-02 04:46AM 48.724
Saved; key: appstats:008500, part: 82 bytes, full: 92081 bytes,
overhead: 0.001 + 0.005; link:
http://www.7bks.com/stats/details?time=1293972408543
Any suggestions welcome on how I might debug further or solve this issue.
I have a couple of different Python files; here are the handlers from my app.yaml. I've checked all of them to ensure they have the correct if __name__ == "__main__" code at the bottom.
handlers:
- url: /admin/.*
  script: admin.py
  login: admin
- url: /googleanalytics/
  script: googleanalytics.py
  login: admin
- url: /cleanupsessions/
  script: cleanupsessions.py
  login: admin
- url: /robots.txt
  static_files: robots.txt
  upload: robots.txt
- url: /favicon.ico
  static_files: images/favicon.ico
  upload: images/favicon.ico
- url: /images
  static_dir: images
- url: /css
  static_dir: css
- url: /jquery
  static_dir: jquery
- url: /.*
  script: 7books.py
error_handlers:
- file: customerror.html
Could the issue be with one of the libraries I'm importing? Should I check all of them to ensure they all have the if __name__ code?
Regarding the blank page issue, debugging that might be a bit involved unless you can narrow down the problem to a particular piece of your code or the GAE stack.
Appstats is a tool for GAE that hooks into each web request (as WSGI middleware) and records performance, debugging, and some other information from requests, aggregating them in an administrative interface which you can access through your GAE admin site. It's a nice way of seeing tracebacks for errors, and monitoring your site for errors (which show up as yellow boxed "E" line items in the request listing once you've installed appstats).
First, you should set up Appstats by following the directions at:
http://code.google.com/appengine/docs/python/tools/appstats.html
...which will tell you loads of useful information about every request, especially ones with errors, including a Python traceback showing what went wrong and where in the call stack, just like you're used to when debugging errors in your code locally using the Google App Engine Launcher or even just the Python CLI or IPython.
Then, next time you see a blank page, hop over to your admin page or to Appstats, log in, and you'll have a traceback that will tell you where in your code something broke.
Most likely, this is just something returning prematurely or an unhandled edge case somewhere in your code, but it could just as easily be cosmic rays until you have Appstats to look at your request logs and debug the error.
Once you've installed Appstats and reproduced the problem, paste the traceback into your question (remember to remove any user information, such as passwords, that may be in the traceback) and I'll revise this response to match.
I just resolved this after lots of frustration.
I had renamed my script handler to 'site.py'; renaming it back to 'main.py' resolved everything.
My best guess is that the stock Python 'site' module took precedence over my site.py on the import path. If someone knows more, please leave a comment here.
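A name clash like the one described above can be checked before choosing a handler filename. This is only a sketch: it uses Python 3's importlib (the App Engine runtime discussed here was Python 2), and the helper name shadows_existing_module is made up for illustration.

```python
import importlib.util


def shadows_existing_module(script_name):
    """Return True if a module with the script's name is already importable."""
    # Strip a trailing ".py" so "site.py" is checked as module "site".
    module_name = script_name[:-3] if script_name.endswith(".py") else script_name
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for unusual names; treat those as "no clash found".
        return False


print(shadows_existing_module("site.py"))  # "site" is a standard library module
```

Run it from outside your project directory; otherwise your own files are importable too and will be reported as clashes.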
Make sure your python indentation is consistent. For instance, either use tabs or 4 spaces or 2 spaces... don't mix them within the same function. This solved it for me! Which is annoying since a tab and 4 spaces look exactly the same to a human.
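Since tabs and spaces look the same on screen, a small script can point out where both are used. The following is a rough sketch, not a full checker; the function name indentation_styles and the sample source are made up for illustration. It only looks at the first character of each line.

```python
def indentation_styles(source):
    """Return (tab_lines, space_lines): line numbers indented with a tab or a space."""
    tab_lines, space_lines = [], []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.startswith("\t"):
            tab_lines.append(lineno)
        elif line.startswith(" "):
            space_lines.append(lineno)
    return tab_lines, space_lines


# Hypothetical sample: line 2 is indented with a tab, line 3 with four spaces.
sample = "def f():\n\tx = 1\n    return x\n"
tab_lines, space_lines = indentation_styles(sample)
if tab_lines and space_lines:
    print("mixed indentation: tabs on", tab_lines, "spaces on", space_lines)
```

The standard library's tabnanny module (python -m tabnanny yourfile.py) performs a more thorough check of the same kind.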
I recently launched a web application based on Django, and have been very pleased with its results. I also turned on a feature in Django where you can have emails sent to MANAGERS for 404s by adding the middleware 'django.middleware.common.BrokenLinkEmailsMiddleware'. However, ever since I did that, I'm getting LOTS of spam requests hitting 404s. I'm not sure if they are bots or what, but this is the information I'm getting from Django:
Referrer: http://34.212.239.19/index.php
Requested URL: /index.php
User agent: Mozilla/5.0 (Windows; U; Windows NT 6.0;en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6)
IP address: 172.31.23.16
Why am I getting requests to URLs that don't exist on my site, and is there a way to filter out requests so I don't get emails for them? These URLs have never existed on my site (my site is very recently launched). I'm getting roughly 50-100 emails a day from spam requests to my site.
I can't imagine an automated way of filtering out spam as a non-existent URL is indistinguishable from a spam URL, but you can filter out usual suspects using IGNORABLE_404_URLS:
List of compiled regular expression objects describing URLs that should be ignored when reporting HTTP 404 errors via email (see Error reporting). Regular expressions are matched against request's full paths (including query string, if any). Use this if your site does not provide a commonly requested file such as favicon.ico or robots.txt.
For example:
import re
IGNORABLE_404_URLS = [
    re.compile(r'\.(php|cgi)$'),
    re.compile(r'^/phpmyadmin/'),
]
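To see which request paths these two patterns would suppress, they can be tried out in plain Python. This is a sketch under the assumption that each pattern is applied with search() against the full path including the query string, as the quoted documentation describes; is_ignorable is a hypothetical helper written for this example, not part of Django's public API.

```python
import re

IGNORABLE_404_URLS = [
    re.compile(r'\.(php|cgi)$'),
    re.compile(r'^/phpmyadmin/'),
]


def is_ignorable(path):
    """Return True if any pattern matches somewhere in the path."""
    return any(pattern.search(path) for pattern in IGNORABLE_404_URLS)


for path in ["/index.php", "/phpmyadmin/setup", "/about/", "/index.php?id=1"]:
    print(path, is_ignorable(path))
```

Note that the first pattern is anchored with $, so a .php request with a query string is not covered by it; drop the anchor or extend the pattern if those should be ignored too.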
I'm dusting off an app that worked a few months ago. I've made no changes. Here's the code in question:
result = urlfetch.fetch(
    url=url,
    deadline=TWENTY_SECONDS)
if result.status_code != 200:  # pragma: no cover
    logging.error('urlfetch failed.')
    logging.error('result.status_code = %s' % result.status_code)
    logging.error('url =')
    logging.error(url)
Here's the output:
WARNING 2015-04-20 01:13:46,473 urlfetch_stub.py:118] No ssl package found. urlfetch will not be able to validate SSL certificates.
ERROR 2015-04-20 01:13:46,932 adminhandlers.py:84] urlfetch failed. url =
ERROR 2015-04-20 01:13:46,933 adminhandlers.py:85] http://www.stubhub.com/listingCatalog/select/?q=%2Bevent_date%3A%5BNOW%20TO%20NOW%2B1DAY%5D%0D%0A%2BancestorGeoDescriptions:%22New%20York%20Metro%22%0D%0A%2BstubhubDocumentType%3Aevent&version=2.2&start=0&rows=1&wt=json&fl=name_primary+event_date_time_local+venue_name+act_primary+ancestorGenreDescriptions+description
When I use a different url, e.g., "http://www.google.com/", the fetch succeeds.
When I paste the url string from the output into Chrome I get this response, which is the one I'm looking for:
{"responseHeader":{"status":0,"QTime":19,"params":{"fl":"name_primary event_date_time_local venue_name act_primary ancestorGenreDescriptions description","start":"0","q":"+event_date:[NOW TO NOW+1DAY]\r\n+ancestorGeoDescriptions:\"New York Metro\"\r\n+stubhubDocumentType:event +allowedViewingDomain:stubhub.com","wt":"json","version":"2.2","rows":"1"}},"response":{"numFound":26,"start":0,"docs":[{"act_primary":"Waka Flocka Flame","description":"Waka Flocka Flame Tickets (18+ Event)","event_date_time_local":"2015-04-20T20:00:00Z","name_primary":"Webster Hall","venue_name":"Webster Hall","ancestorGenreDescriptions":["All tickets","Concert tickets","Artists T - Z","Waka Flocka Flame Tickets"]}]}}
I hope I'm missing something simple. Any suggestions?
Update May 30, 2015
Anzel's suggestion of Apr 23 was correct. I need to add a user agent header. The one supplied by the AppEngine dev server is
AppEngine-Google; (+http://code.google.com/appengine)
The one supplied by hosted AppEngine is
AppEngine-Google; (+http://code.google.com/appengine; appid: s~MY_APP_ID)
The one supplied by requests.get() in pure Python (no AppEngine) on MacOS is
python-requests/2.2.1 CPython/2.7.6 Darwin/14.3.0
When I switch in the Chrome user agent header all is well in pure Python. Stubhub must have changed this since I last tried it. Curious that they would require an interactive user agent for a service that emits JSON, but I'm happy they offer the service at all.
When I add that header in AppEngine, though, AppEngine prepends it to its own user-agent header. Stubhub then turns down the request.
So I've made some progress, but have not yet solved my problem.
FYI:
In AppEngine I supply the user agent like this:
result = urlfetch.fetch(
    url=url,
    headers={'user-agent': USER_AGENT_STRING}
)
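For comparison outside App Engine, the same header can be set with the standard library. This is a minimal sketch using Python 3's urllib.request; the helper name build_request, the example URL and the placeholder user agent value are assumptions for illustration. It builds the request without sending it and says nothing about how App Engine combines the header with its own.

```python
from urllib.request import Request

# Placeholder value; substitute the browser user agent string you want to send.
USER_AGENT_STRING = "Mozilla/5.0 (example placeholder)"


def build_request(url, user_agent):
    """Build (but do not send) a GET request carrying a custom User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


request = build_request("http://www.example.com/", USER_AGENT_STRING)
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```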
This is a useful site for determining the user agent string your code or browser is sending:
http://myhttp.info/
I don't have privileges yet to post comments, so here goes.
Look at how you are assigning the URL to the variable 'url'. Is it already encoded, as the error message suggests? I would make sure the URL is a plain, non-encoded one and test that; perhaps the library is encoding it again, causing problems. If you could give us more of the surrounding code, that may help our diagnosis.
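To illustrate what re-encoding an already encoded URL would look like, here is a small sketch using Python 3's urllib.parse (not the App Engine urlfetch library, whose encoding behaviour is not confirmed here). The query fragment is taken from the URL in the question.

```python
from urllib.parse import quote, unquote

raw_query = "+event_date:[NOW TO NOW+1DAY]"

# Encoding once turns "+" into "%2B"; encoding again turns "%" into "%25".
encoded_once = quote(raw_query, safe="")
encoded_twice = quote(encoded_once, safe="")

print(encoded_once)
print(encoded_twice)
# Decoding the double-encoded value once only gets back to the single-encoded form.
print(unquote(encoded_twice) == encoded_once)
```

If sequences such as %252B show up in the URL that actually goes over the wire, the string was encoded twice.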
I'm trying to debug a "POST" request error but I do not have enough information, so I need help finding out more.
I get the following error in my tail -a. This is the only thing displayed, both in tail and inside the log itself. I assume that tail does not have a -v option for verbose output.
==> python/logs/access_log-20131102-000000-EST <==
85.75.241.1 - - [02/Nov/2013:09:09:47 -0400] "POST /dajaxice/async.store_event/ HTTP/1.1" 500 16516 "http://example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36"
(I replaced the original domain with example.com above.)
Where should I search to get additional information about this 500 error in the log files? Can I force Python to tell me more?
On my local server I get the following, which does not tell me anything in particular either.
[02/Nov/2013 14:22:15] "POST /dajaxice/async.store_event/ HTTP/1.1" 200 24
Finally, do the numbers 16516 and 24 in 500 16516 and 200 24 respectively tell me anything in particular? I know that 500 and 200 are the HTTP status codes, but what are the others?
You're looking in the access log. Errors, not surprisingly, are logged in the error log - you should look there for more detail.
(The second value is the number of bytes in the response.)
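The two numbers can be pulled out of an access log line with a short regular expression. This is a sketch that assumes the common layout shown in the question (quoted request line, then status, then size); the names ACCESS_LOG_PATTERN and parse_status_and_size are made up for illustration.

```python
import re

# Matches the quoted request line followed by the status code and response size.
ACCESS_LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_status_and_size(line):
    """Return (status, size_in_bytes) from an access log line, or None."""
    match = ACCESS_LOG_PATTERN.search(line)
    if match is None:
        return None
    size = match.group("size")
    # Some servers log "-" when no body was sent; treat that as 0 bytes.
    return int(match.group("status")), (0 if size == "-" else int(size))


line = ('85.75.241.1 - - [02/Nov/2013:09:09:47 -0400] '
        '"POST /dajaxice/async.store_event/ HTTP/1.1" 500 16516 "http://example.com/"')
print(parse_status_and_size(line))
```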
I know one should not use DEBUG = True on a live server, but if it is the only way to hunt down a bug, you should consider switching it on for a few minutes to get more information.
Furthermore, the Django debug toolbar can be of help, e.g. it can display additional logging messages which are not written to a file but emitted using
import logging
logger = logging.getLogger(__name__) # Get an instance of a logger __name__ will be your app name
and use e.g.
logger.debug(str(form.cleaned_data))
Or you write your own logger:
import logging
import logging.handlers
import os

# Logger that writes to a rotating file; MEDIA_ROOT is assumed to come from your Django settings.
file_logger = logging.getLogger("file_logger")
file_logger.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(message)s')
handler = logging.handlers.RotatingFileHandler(
    os.path.join(MEDIA_ROOT, "log", "filelogger_log.log"),
    maxBytes=10000000, backupCount=5)
handler.setFormatter(formatter)
file_logger.addHandler(handler)
HTML/CSS files load normally in my browser when doing a request on my GAE website, but .js files are insanely slow (all > 1s). The .js is in the static folder (just like the .css files).
The confusing part is that Chrome/Firefox report that it is "Waiting" for the whole time, but GAE's log shows a really fast request.
The handler for js is identical to css, in app.yaml:
- url: /(.*\.css)
  mime_type: text/css
  static_files: static/\1
  upload: static/(.*\.css)
- url: /(.*\.js)
  mime_type: text/javascript
  static_files: static/\1
  upload: static/(.*\.js)
Edit, more info:
runtime: python27
api_version: 1
threadsafe: yes
libraries:
- name: django
  version: "1.2"
- name: webapp2
  version: 2.5.1
There are no concurrent requests going on for this test, as I requested the file manually from the address bar (as opposed to letting the browser request it via its reference in the HTML).
There are no cron jobs/tasks running at that time.
I don't see a new instance created, there is just one always available apparently (according to the Instance chart on the dashboard and to the log not showing "... has caused the creation of a new instance").
The request is done directly on my .appspot.com sub domain.
All my tests are done with CTRL+SHIFT+R; the response is always a 200 (not from cache, not 304 unchanged).
Results are the same when running in Incognito mode.
I really wonder what's going on and where the time is actually spent.
As I am typing this, I made some tests by copying /static/main.js into several new file names and folders:
/static/main.css, request /main.css takes 180ms.
/static/css/main.css, request /css/main.css takes 180ms.
/static/css/main.js, request /css/main.js takes 1s.
Now for some reason, .css loads much faster than .js. But that's still not the 12ms reported by GAE's log.
Here are the requests / response headers:
Main.js:
Main.css:
The only difference I see is that, other than the extension, the css has Transfer-Encoding: chunked, while the js has Content-Length: 7930 instead.
I am having a problem with the AppEngine serving me a blank page every time the instance is loaded.
It is similar to a problem already covered here; however, no constructive solution was mentioned there that would help in my case.
As there are no errors in the console, this one seems to be extremely hard to debug and I frankly do not know where to start. I have not tried the solution mentioned in the previous post, as I do not want to rename any of my files and I want to keep my code organised as it is now.
Lastly, to ask a broader question: what is the difference in how the first and subsequent requests are handled for each instance? Is there anything a developer should be aware of?
So what exactly happens: if the App Engine code is loaded for the first time, I get a blank page. Every subsequent request is fine. I think I have all the required parts of the code there; just in case, here are the handlers from app.yaml (also, the app is using python27 as a platform):
handlers:
- url: /favicon\.ico
  static_files: favicon.ico
  upload: favicon\.ico
- url: /fb/.*
  script: fbhandler.py
- url: /xarpc/.*
  script: xmlrpchandler.py
- url: /
  script: main.py
- url: .*
  script: main.py
The handler in question is the fbhandler.py.
def main():
    application = webapp.WSGIApplication([("/fb/", FBHandler)], debug=True)
    util.run_wsgi_app(application)

if __name__ == '__main__':
    main()
If I examine the appengine logs, I see this for the "blank" request:
2012-05-07 10:43:14.822 /fb/?id=341108955956205 200 2998ms 0kb Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.168 Safari/535.19
95.23.245.xx - - [07/May/2012:03:43:14 -0700] "POST /fb/?id=xxx HTTP/1.1" 200 0 - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.168 Safari/535.19" "xxx.appspot.com" ms=2999 cpu_ms=563 api_cpu_ms=0 cpm_usd=0.015744 loading_request=1 instance=00c61b117cb0ca2fc61adc3939c4bd034dfa416f
I 2012-05-07 10:43:14.822
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
Compared to this for any good subsequent request:
2012-05-07 10:44:05.479 /fb/?id=341108955956205 200 832ms 0kb Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.168 Safari/535.19
95.23.245.xx - - [07/May/2012:03:44:05 -0700] "POST /fb/?id=xxx HTTP/1.1" 200 483 - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.168 Safari/535.19" "xxx.appspot.com" ms=832 cpu_ms=1272 api_cpu_ms=1078 cpm_usd=0.035497 instance=00c61b117cb0ca2fc61adc3939c4bd034dfa416f
As you can see, no obvious error messages anywhere.
Update:
Upon advice, I have removed the main() to try to rule out caching, now the handler looks like:
application = webapp.WSGIApplication([("/fb/", FBHandler)], debug=True)
util.run_wsgi_app(application)
Same effect: the first load gets a blank page and subsequent loads get proper data. What I have noticed is that it does not matter whether new code is uploaded or not: if an instance is running and I update the code, the page shows properly on the first load. I need to go into the instances admin and shut one down to reproduce the error.
Update 2:
I have now tried changing the handlers to the new python27 way:
- url: /fb/.*
  script: fbhandler.app
And also made the corresponding change to fbhandler:
import webapp2
app = webapp2.WSGIApplication([("/fb/", FBHandler)])
Again, same issue. I have also tried removing the debug parameter with no effect.
You are experiencing this behavior due to the way App Engine caches your CGI handler.
If you look in the runtime documentation (the "CGI Handler Scripts Can Also Be Cached" section) you can see that:
You can tell App Engine to cache the CGI handler script itself, in addition to imported modules. If the handler script defines a function named main(), then the script and its global environment will be cached like an imported module. The first request for the script on a given web server evaluates the script normally. For subsequent requests, App Engine calls the main() function in the cached environment.
To be sure that your handler works properly even during an instance startup, define your WSGI application as a global variable (outside of main).
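The shape of that advice can be sketched without the App Engine SDK. The example below is an illustration only: a plain WSGI callable stands in for webapp.WSGIApplication, and handle_once is a hypothetical helper that calls the application in-process. The point is that application exists at module level, so it is available regardless of whether a main() function is ever invoked.

```python
def application(environ, start_response):
    """Module-level WSGI callable; a stand-in for webapp.WSGIApplication(...)."""
    body = b"Hello from /fb/"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def handle_once(app, path):
    """Call a WSGI app in-process and return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(handle_once(application, "/fb/"))
```

With webapp, the equivalent would be to move the application = webapp.WSGIApplication(...) line out of main() so that main() only runs it.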
SOLUTION
I now seem to have found a solution. When I upgraded all of the application handlers to the webapp2 and used the new way of referencing them, it now seems to work without the first load error.