What is the best way to monitor website traffic for a Google App Engine hosted website?
It's fairly trivial to put some code in each page handler to record each page request to the datastore, and now (thanks stackoverflow) I have the code to log the referring site.
There's another question on logging traffic using the datastore, but it doesn't consider other options (if there are any).
My concern is that the datastore is expensive. Is there another way? Do people typically implement traffic monitoring, or am I being over-zealous?
If I do implement traffic monitoring via the datastore, what fields are recommended to capture? What's good and/or common practise?
I'd go with: time-stamp; page; referer; IP address; username (if logged in). Any other suggestions?
All of the items you mention are already logged by the built-in App Engine logger. Why do you need to duplicate that? You can download the logs at regular intervals for analysis if you need.
People usually use Google Analytics (or something similar) as it does client-side tracking and gives more insight then server-side tracking.
If you only need server-side tracking then analysing logs should be enough. The problem with Log API is that it can be expensive because it does not do real querying: for every log search it goes thorough all logs (within range).
You might want to look at Mache, a tool that exports all GAE logs to Google BigQuery which has proper query functionality.
Another option would be to download logs and analyse them with a local tools. GAE logs are in Apache format so there are plenty of tools available.
You can use the logging module and that comes with a separate quota limit.
7 MBytes spanning 69 days (1% of the Retention limit)
I don't know what the limit is but that's a line from my app so it seems to be quite large.
You can then add to the log with
logging.debug("something to store")
if it does not already come with what you need, then read it out locally with:
appcfg.py --num_days=0 request_logs appname/ output.txt
Anything you write out via System.err.println (or the python equivalent) will automatically be appended to the app engine log. So, for example, you can create you own logging format, put println's on all your pages, and then download the log and grep for that format. So for example, if this is your format:
MYLOG:url:userid:urlparams
then download the log and pipe it through grep ^MYLOG and it would give you all the traffic for your site.
Related
I would like to automate this process of viewing logs in dashboard and typing the information (Total messages sent in a time period, total errors, CPU usage, memory usage), this task is very time consuming at the moment.
The info is gathered from mulesoft anypoint platform. I'm currently thinking of a way to extract all of the data using python webscraping but I don't know how to use it perfectly.
You'll find here a screenshot of the website i'm trying to get the data off of, you can choose to see the logs specific to a certain time and date. My question is, do I start learning python webscrapping or is there another way of doing things that I am just unaware of ?
Logs website example
It doesn't make any sense to use web scrapping. All services in Anypoint Platform have a REST API. Most of them are documented at https://anypoint.mulesoft.com/exchange/portals/anypoint-platform/. Scrapping may broke with any minor change to the UI. The REST API is stable.
The screenshot seems to be from Anypoint Monitoring. I see in the catalog Anypoint Monitoring Archive API. I'm not sure if the API for getting Monitoring Dashboards data is documented. You could alternatively use the older CloudHub Dashboards API. It is probably not exactly the same but it will approximate.
As a non-admin user of open-stack, I do want to obtain how many vms our of the total quota are running at a specific time.
I do want to monitor usage of such resources by writing a collectd plugin for it.
I observed that there are already two types of collect plugins related to open-stack but none of seems seem to address this simple use case: a user that wants to monitor his own usage of these resources.
collectd-openstack which seems not to be maintained and that seems to require admin rights, a deal-breaker limitation
collectd-ceilometer-plugin which is mostly the oppisitve thing: feeding data captured by collectd to ceilometer.
I don't care about the state of the entire cloud, I am interested only about usage inside my project.
How API should I use in order to obtain this informations? Funny, most of the information I need is already published on the web dashboard. Still, I need to capture it with python/collect in order to send it to other systems for processing.
You need use nova client API for that. Have a look at http://docs.openstack.org/developer/python-novaclient/api.html
I have a django application hosted on a server running on Apache + Ubuntu. I deployed the application using mod_wsgi. Is there any way to find out the number of visitors to my web site.
I realize that this query might have little to do with django and more do with the server. Any help would be appreciated.
Why not just use Google Analytics? You can easily monitor user behavior, traffic source, time spend on each page, etc.
If you really want to do this with Django you could write a context processor to record each request, but then you would have to write the user's IP and check if the user has not visited before and this would be incredibly imprecise since there might be different users sharing the same IP, etc.
How about using some free statistics provider like Statcounter or Google Analytics?
If you don't want to use Google Analytics or similar, but do it all yourself, you have two options:
One is to alter all views, if you are using class-based view then add a mixin (see this SO question for more information about mixins,) or if you are using the old function-based view you have to manually call another function to keep track.
The other alternative, and probably best one, is to write a middleware class, and keep track through that.
There's also this free and powerful Django app Chartbeat that you could try to work with.
Chartbeat provides real-time analytics to websites and blogs. It shows visitors, load times, and referring sites on a minute-by-minute basis. The service also provides alerts the second your website crashes or slows to a crawl.
https://django-analytical.readthedocs.io/en/latest/services/chartbeat.html
I am working on a Python App, which runs on App Engine. Is there a way I can publish the app on each customers' appSpot account, so that the App uses the users' cloud storage? Instead of running the App on my AppSpot account and all the users storing the data on my Cloud space?
Yes, absolutely.
You just need to have each client create an App Engine account with an application to which you have administrator access. You can adjust the settings on the application to forbid downloads of your code by the other administrators if that's appropriate for your agreement with the client. This also allows the clients to be billed directly for their instances' usage, and makes it completely impossible for data to leak between different clients' instances.
Using multiple applications for multiple clients who are licensing your application almost certainly does not violate part 4.4 of the TOS, although don't take this as legal advice.
No, you cannot do that. The app is hosted and run in the administrator's account which would be you. What you can do is, release the source code and point your users do install it in their appspot account, just like creating a new application.
I suppose it's not exactly what you need. But it can give you an idea where to go. Please check DryDrop project. There is small Python application you can ask each user to install on their account, then they can configure it to fetch your site files from your GitHub repo through webhooks functionality. I didn't try it, but, theoretically, you update your site, commit it to your repo, and all users get your updated application automatically. You can share your thoughts if that works for you.
Maybe. If it's an open source app that you're giving away, you can publish the source and instruct users to upload it to their own accounts.
If you're selling the app, displaying ads or otherwise trying to monetize the service, you probably want to stick with one instance. Using multiple instances to avoid paying for quota usage is direct violation of the App Engine TOS:
4.4. You may not develop multiple Applications to simulate or act as a
single Application or otherwise access
the Service in a manner intended to
avoid incurring fees.
No. Writing an application that deploys other applications is in violation of the terms of service.
Note we don't have any 'hard' limits - those limits that aren't billing enabled can be increased on application to us if you provide a reasonable use-case.
Hi My app had visitors and I like to analyze the log file. Can I run a log analyzer program on the log file that google app engine allows us to download? Are third-party programs such as webalizer and visitors compatible?
Thank you
You can get a lot of meaningful and derived data out of Appstats instead of downloading and analyzing your log file. You may want to try with those server side stats and see if those fit your bill.
I suggest you use Google Analytics on your web app. If you want to do some sort of server-side visitor analaytics (instead of the client-side Javascript that Google Analytics uses), you'd have to store something in a database (BigTable on GAE) and run your own analytics.