I just deployed my first ever web app, and I'm curious whether there's an easy way to track every time someone visits my website. I'm sure there is, but how?
Easy as pie: use Google Analytics. You just have to include a tiny script in your app's pages.
http://www.google.com/analytics/
PythonAnywhere dev here. You also have your access log, which you can click through to from your web app tab. It shows you the raw data about your visitors. I would personally also use something like Google Analytics, but you don't need to do anything to see your raw visitor data. It's already there.
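If you want numbers rather than raw lines, a few lines of Python over the log will do. A minimal sketch, assuming a common/combined log format and a placeholder file path:

import collections

hits = collections.Counter()
with open("access.log") as log:  # placeholder path; use your actual log file
    for line in log:
        # In common/combined log format, the client IP is the first field.
        hits[line.split(" ", 1)[0]] += 1

print("%d requests from %d distinct IPs" % (sum(hits.values()), len(hits)))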
I know from my own experience that people are obsessed with traffic, statistics, looking at other sites, tracking their stats, and so on. And since there is enough demand, of course there are sites to satisfy it. I wanted to put those sites and tools together in one list, because this field was really unclear to me at first: I didn't know what Google PageRank, Alexa, Compete, or Technorati rankings meant, and I could go on. I must say these stats are not always precise, but they at least give an overview of how popular a certain page is and how many visitors it gets, and if you compare those stats with your own site's statistics, you can get pretty accurate results.
http://www.stuffedweb.com/3-tools-to-track-your-website-visitors/
http://www.1stwebdesigner.com/design/10-ways-how-to-track-site-traffic-popularity-statistics/
I am a huge fan of Cloudflare's analytics. It is super easy to set up, and you don't have to worry about adding a JavaScript blurb to each page. Because Cloudflare sits in front of your site as a proxy, it can also count visitors (including bots) that never load the JavaScript at all.
http://www.cloudflare.com
I am just starting out learning Python and have taken some online courses in my free time.
I am trying to find the data source for this website, making a daily count of departures from the airport, and eventually building a flights vs date plot.
I have spent two weeks investigating the page source, but am unable to find the JSON source. Would a kind soul please show me where the JSON source is? Thanks!
https://www.changiairport.com/en/flights/departures.html
https://www.changiairport.com/cag-web/flights/departures?lang=en&callback=JSON_CALLBACK&date=today
There you go. That will give you the flight schedule for today. The date parameter can probably take other values too, but I don't know what the options are. Any normal GET request should work; it seems publicly accessible.
You just need to right-click the page, choose "Inspect", hit the "Network" tab, and then browse through the different requests.
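For completeness, here is a minimal sketch of fetching that endpoint from Python with requests. Whether the callback parameter can be dropped, and what the top-level keys in the response are, are assumptions to verify against what you see in the network tab.

import json
import requests

URL = "https://www.changiairport.com/cag-web/flights/departures"

# The callback parameter in the original URL makes this JSONP; requesting
# without it may return plain JSON, so handle both cases.
resp = requests.get(URL, params={"lang": "en", "date": "today"})
resp.raise_for_status()

text = resp.text.strip()
if text.startswith("JSON_CALLBACK("):  # strip the JSONP padding if present
    text = text[len("JSON_CALLBACK("):].rstrip(");")
data = json.loads(text)

# Print the structure first; the exact key names are not documented.
print(list(data) if isinstance(data, dict) else len(data))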
Just a note:
Just for the record, this is called scraping, and it's often in a legal gray area: so long as you aren't using it too extensively or making a profit off of it, you probably won't get in any trouble, but make sure you have permission from the company if you plan to make a lot of calls to an open API like this. It's usually against their terms of service, but as an unenforced clause that they will only invoke if you become a nuisance.
How can I generate a random yet valid website link, regardless of language? Actually, the more diverse the languages of the websites it generates, the better.
I've been doing it by using other people's scripts on their webpages; how can I stop relying on these random-site forwarding scripts and make my own? I've been doing it like this:
import webbrowser
from random import choice

# Redirect services that each forward the browser to a random site.
random_page_generator = ['http://www.randomwebsite.com/cgi-bin/random.pl',
                         'http://www.uroulette.com/visit']

# new=2 asks for a new browser tab where possible.
webbrowser.open(choice(random_page_generator), new=2)
There are two ways to do this:
1. Create your own spider that amasses a huge collection of websites, and pick from that collection.
2. Access some pre-existing collection of websites, and pick from that collection. For example, DMOZ/ODP lets you download their entire database;* Google used to have a customized random site URL;** etc.
There is no other way around it (short of randomly generating and testing valid strings of arbitrary characters, which would be a ridiculously bad idea).
Building a web spider for yourself can be a fun project. Link-driven scraping libraries like Scrapy can do a lot of the grunt work for you, leaving you to write the part you care about (there's a minimal spider sketch after the footnotes below).
* Note that ODP is a pretty small database compared to something like Google's or Yahoo's, because it's primarily a human-edited collection of significant websites rather than an auto-generated collection of everything anyone has put on the web.
** Google's random site feature was driven by both popularity and your own search history. However, by feeding it an empty search history, you could remove that part of the equation. Anyway, I don't think it exists anymore.
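For illustration, a minimal sketch of such a collection-building spider using Scrapy's CrawlSpider. The seed URL, depth limit, and output file are placeholders; a real crawl would also want politeness settings and a much larger seed list.

from scrapy.crawler import CrawlerProcess
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class SiteCollector(CrawlSpider):
    name = "site_collector"
    start_urls = ["https://example.com"]  # placeholder seed

    # Follow every link found and record each page we reach.
    rules = (Rule(LinkExtractor(), callback="parse_item", follow=True),)

    def parse_item(self, response):
        yield {"url": response.url}

if __name__ == "__main__":
    process = CrawlerProcess(settings={
        "FEEDS": {"sites.jsonl": {"format": "jsonlines"}},  # collected URLs
        "DEPTH_LIMIT": 2,  # keep the toy crawl small
    })
    process.crawl(SiteCollector)
    process.start()

Pick a random line out of sites.jsonl afterwards and you have your random site.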
A conceptual explanation, not a code one.
Their scripts are likely very large and comprehensive. If it's a random website selector, they have a huge, huge list of websites line by line, and the script just picks one. If it's a random URL generator, it probably generates a string of letters (e.g. "asljasldjkns"), plugs it between http:// and .com, tries to see if it is a valid URL, and if it is, sends you that URL.
The easiest way to design your own might be to ask to have a look at theirs, though I'm not certain of the success you'd have there.
The best way, as a programmer, is simply to decipher how URLs are structured: practice building strings and testing them, or compile a huge database of them yourself.
As a hybridization, you might try building two things: one script that, while you're away, searches for and tests URLs and adds them to a database, and another script that randomly selects a line out of this database to send you on your way. The longer you run the first, the better the second becomes. A toy sketch of both is below.
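To make that concrete, here is a toy sketch of both scripts under the assumptions above; the database is just a text file, and the hit rate for random strings is tiny, so this is illustrative rather than practical.

import random
import string
import requests

DB_PATH = "sites.txt"  # placeholder database file

def random_candidate():
    # Generate a random .com URL from 4-10 lowercase letters.
    name = "".join(random.choice(string.ascii_lowercase)
                   for _ in range(random.randint(4, 10)))
    return "http://%s.com" % name

def harvest(attempts=100):
    # Script one: probe random URLs and append the live ones to the database.
    with open(DB_PATH, "a") as db:
        for _ in range(attempts):
            url = random_candidate()
            try:
                # HEAD keeps each probe cheap; expect almost all to fail.
                if requests.head(url, timeout=3, allow_redirects=True).ok:
                    db.write(url + "\n")
            except requests.RequestException:
                pass

def pick_random():
    # Script two: pick a random line from the database (assumes it's non-empty).
    with open(DB_PATH) as db:
        return random.choice(db.read().splitlines())

if __name__ == "__main__":
    harvest()
    print(pick_random())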
EDIT: Do Abarnert's thing with spiders; that's much better than my answer.
The other answers suggest building large databases of URLs; there is another method, which I've used in the past and documented here:
http://41j.com/blog/2011/10/find-a-random-webserver-using-libcurl/
The idea is to create a random IP address and then try to grab a site from port 80 of that address. This method is not perfect with modern virtually-hosted sites (many sites share one IP address and expect a Host header), and of course it only fetches the top page, but it can be an easy and effective way of getting random sites. The code linked above is C, but it should be easily callable from Python, or the method could easily be adapted to Python; a rough adaptation follows.
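Here is a rough Python adaptation of that method, using requests instead of libcurl. Note it makes no attempt to skip reserved or private address ranges, which a real version should.

import random
import requests

def random_ip():
    # Naive: does not exclude private or reserved ranges.
    return ".".join(str(random.randint(1, 254)) for _ in range(4))

def find_random_server(max_tries=500):
    # Try random addresses until one answers on port 80.
    for _ in range(max_tries):
        url = "http://%s/" % random_ip()
        try:
            if requests.get(url, timeout=2).ok:
                return url
        except requests.RequestException:
            continue
    return None

if __name__ == "__main__":
    print(find_random_server())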
I am a social scientist and a complete newbie/noob when it comes to coding. I have searched through the other questions/tutorials, but am unable to get the gist of how to crawl a news website while targeting the comments section specifically. Ideally, I'd like to tell Python to crawl a number of pages and return all the comments as a .txt file. I've tried
from bs4 import BeautifulSoup
import urllib2
url="http://www.xxxxxx.com"
and that's as far as I can go before I get an error message saying bs4 is not a module. I'd appreciate any kind of help on this, and please, if you decide to respond, DUMB IT DOWN for me!
I can run wget in the terminal and get all kinds of text from websites, which is awesome IF I could actually figure out how to save the individual output HTML files into one big .txt file. I will take a response to either question.
Try Scrapy. It is a fast, high-level screen-scraping and web-crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
You will most likely encounter this as you go, but in some cases, if the site is employing 3rd party services for comments, like Disqus, you will find that you will not be able to pull the comments down in this manner. Just a heads up.
I've gone down this route before and have had to tailor the script to a particular site's layout/design/etc.
I've found libcurl to be extremely handy, if you don't mind doing the post-processing using Python's string handler functions.
If you don't need to implement it purely in Python, you can make use of wget's recursive mirroring option to handle the content pull, then write your Python code to parse the downloaded files; see the sketch below.
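For instance, a sketch that shells out to wget for the mirror and then concatenates the visible text of every downloaded page into one file. The URL and directory names are placeholders.

import pathlib
import subprocess
from bs4 import BeautifulSoup

# Pull the site recursively (two levels deep) into ./mirror.
subprocess.run(["wget", "--recursive", "--level=2", "--no-parent",
                "--directory-prefix=mirror", "http://www.example.com/"],
               check=False)  # wget exits nonzero if any single fetch failed

# Concatenate the text of every mirrored page into one .txt file.
with open("all_pages.txt", "w") as out:
    for page in pathlib.Path("mirror").rglob("*.html"):
        soup = BeautifulSoup(page.read_text(errors="ignore"), "html.parser")
        out.write(soup.get_text() + "\n")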
I'll add my two cents here as well.
The first things to check are that you actually installed Beautiful Soup and that it lives somewhere it can be found. There are all kinds of things that can go wrong here; note that the package installs as beautifulsoup4 even though you import bs4.
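Once it imports cleanly, a minimal fetch-and-parse looks like the sketch below. This is Python 3 (on Python 2 you'd use urllib2 as in your snippet), and the "comment" class name is a placeholder you'd replace after inspecting the target page.

import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen("http://www.example.com").read()
soup = BeautifulSoup(html, "html.parser")

# "comment" is a placeholder class name; inspect the real page to find
# the element that wraps each comment.
with open("comments.txt", "w") as out:
    for node in soup.find_all("div", class_="comment"):
        out.write(node.get_text(strip=True) + "\n")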
My experience is similar to yours: I work at a web startup, and we have a bunch of users who register, but give us no information about their job (which is actually important for us). So my idea was to scrape the homepage and the "About us" page from the domain in their email address, and try to put a learning algorithm around the data that I captured to predict their job. The results for each domain are stored as a text file.
Unfortunately (for you...sorry), the code I ended up with was a bit complicated. The problem is that you'll end up getting a lot of garbage when you do the scraping, and you'll have to filter it out. You'll also end up with encoding issues, and (assuming you want to do some learning here) you'll have to get rid of low-value words. The total code is about 1000 lines, and I'll post some important pieces that may help you out here, if you're interested.
I need to get the number of unique visitors (say, over the last 5 minutes) currently looking at an article, so I can display that number and sort the articles by most popular.
e.g., similar to how most forums display 'There are n people viewing this thread'
How can I achieve this on Google App Engine? I am using Python 2.7.
Please try to explain in a simple way because I recently started learning programming and I am working on my first project. I don't have lots of experience. Thank you!
Create a counter (a property within an entity) and increase it transactionally for every page view. If you have more than a few page views a second, you need to look into sharded counters. A minimal sketch is below.
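A minimal sketch of such a transactional counter using the App Engine ndb API; the model and function names are placeholders, and a sharded version would split the count across several entities keyed by a random shard number.

from google.appengine.ext import ndb

class ArticleCounter(ndb.Model):
    count = ndb.IntegerProperty(default=0)

@ndb.transactional
def increment(article_id):
    # One entity per article; get-or-create, then bump inside the transaction.
    key = ndb.Key(ArticleCounter, article_id)
    counter = key.get() or ArticleCounter(key=key)
    counter.count += 1
    counter.put()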
There is no way to tell when someone stops viewing a page unless you use Javascript to inform the server when that happens. Forums etc typically assume that someone has stopped viewing a page after n minutes of inactivity, and base their figures on that.
For minimal resource use, I would suggest using memcache exclusively here. If a value gets evicted, the count will be incorrect, but the consequences of that are minimal, and other solutions will use a lot more resources. A sketch of the idea is below.
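Here is a sketch of that memcache-only idea, counting views in one-minute buckets and summing the last five. Note this counts views rather than strictly unique visitors; keying buckets by a visitor ID would get you closer to uniqueness.

import time
from google.appengine.api import memcache

def record_view(article_id):
    # Bucket key changes every minute; initial_value creates it on first use.
    bucket = int(time.time() // 60)
    memcache.incr("views:%s:%d" % (article_id, bucket), initial_value=0)

def recent_views(article_id, minutes=5):
    now = int(time.time() // 60)
    keys = ["views:%s:%d" % (article_id, now - i) for i in range(minutes)]
    counts = memcache.get_multi(keys)
    # Evicted buckets simply drop out of the sum.
    return sum(int(v) for v in counts.values())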
Did you consider the Google Analytics service for getting statistics? Read this article about real-time monitoring using the service. Please note: a special script must be embedded on every page you want to monitor.
I am working on a website for which it would be useful to know the number of links shared by a particular facebook page (e.g., http://www.facebook.com/cocacola) so that the user can know whether they are 'liking' a firehose of information or a dribble of goodness. What is the best way to get the number of links/status updates that are shared by a particular page?
+1 for implementations that use Python (this is a Django website), but any solutions are welcome! I tried using fbconsole to accomplish this, but I have come up a little short.
For what it is worth, this unanswered question seems relevant, as does the fact that, as of 2012-04-18, you can export your data to CSV from the Insights management page on the Facebook site. The information is in there; I just don't know how to get it out...
Thanks for your help!
In the event that anyone else finds this useful, I thought I'd post my gist example here. fbconsole makes it fairly simple to extract data through the Facebook Graph API.
The caveat is that it was not terribly easy to programmatically extract data through fbconsole, so I wrote fbconsole.automatically_authenticate to make it much easier to access this information in a systematic way. This addition has not yet been incorporated into the master branch of fbconsole (it was just posted this morning), but it is available here in the meantime for those who are interested.
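For reference, a rough sketch of what pulling a page's recent posts through fbconsole's standard helpers looks like. The endpoint, the 'type' field, and the required scope are assumptions tied to the Graph API version of that era, so treat this as an outline rather than working code.

import fbconsole

fbconsole.AUTH_SCOPE = ['read_stream']  # assumed scope for reading a page's feed
fbconsole.authenticate()                # opens a browser for the OAuth flow

# Counting 'link'-type posts approximates "links shared by the page".
posts = fbconsole.get('/cocacola/posts')
links = [p for p in posts.get('data', []) if p.get('type') == 'link']
print('%d of the most recent posts are shared links' % len(links))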