I am attempting to build a web crawler that signs into Facebook and checks the online status of some family members, for a project I'm building for my parents. From searching, I found that this is possible through FQL queries on friends' online presence, but it seems that this will be removed around April of this year. So I thought I could write a basic crawler myself in Python that gets the HTML for the online friends in my chat. However, when I print the HTML after attempting to log in, it returns a very large amount of jumbled HTML and JavaScript that mentions "BigPipe." I see that BigPipe breaks pages into pagelets, but I'm a little confused about what to make of this information.
So my questions are: does anyone know of another way to get online statuses other than FQL queries? Has anyone else attempted to crawl Facebook? Has anyone attempted to crawl any site that returns this BigPipe response?
Thank you in advance,
Jake
You may be able to write a Firefox extension. You will not be able to scrape FB without JavaScript, which pretty much rules out most traditional scraping methods.
Using PyQt4.QtWebKit will help you deal with JavaScript.
Here is some basic usage of it: webkit-pyqt-rendering-web-pages
Documentation: PyQt4-qtwebkit.html
I just finished a school project that required user data from Facebook group members. I used a web crawling tool, Octoparse, for data extraction; it's a non-programming application and can be used to crawl different types of data on Facebook. You can go to this tutorial: Facebook Scraping Case Study | Scraping Facebook Groups
Related
I am fairly new to Python coding and have a project coming up, for which I've decided to write some code that, given a Facebook user's URL, returns all the data their profile has to offer. Any help would be greatly appreciated, or if you have code that does something similar, I would love to look at it.
I am looking for this to be executed in Python.
I would recommend using a web scraping library or framework with Python. There are tons of them; Beautiful Soup and Scrapy are great options. However, most web applications have protections in place to prevent you from scraping data from their platforms, so I would recommend you do more research.
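To give a feel for the parsing step, here is a minimal sketch that uses the standard library's html.parser as a dependency-free stand-in for Beautiful Soup (which offers a friendlier API for the same job). It pulls the page title and any Open Graph <meta property="og:..."> tags out of HTML you have already downloaded. The sample markup is invented; a real Facebook profile is rendered with JavaScript behind a login, so the HTML you actually receive may contain none of this.

```python
from html.parser import HTMLParser


class ProfileMetaParser(HTMLParser):
    """Collects the <title> text and Open Graph <meta property="og:*"> values."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.og: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            prop = attributes.get("property") or ""
            if prop.startswith("og:") and attributes.get("content") is not None:
                # Keep the "og:" prefix so keys never clash with "title".
                self.og[prop] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append.
        if self._in_title:
            self.title += data


def extract_profile_meta(html: str) -> dict[str, str]:
    """Return the page title plus every og:* meta value found in html."""
    parser = ProfileMetaParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), **parser.og}


# Hypothetical profile markup, for illustration only.
sample = """
<html><head>
  <title>Jane Doe | Example</title>
  <meta property="og:title" content="Jane Doe">
  <meta property="og:url" content="https://example.com/jane.doe">
</head><body>...</body></html>
"""
print(extract_profile_meta(sample))
```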
I am trying to use BeautifulSoup (or another web scraping API) to automate web forms. For example, the Facebook login page also has a registration form, so let's say I want to fill out this form through automation. I would need to find the relevant HTML tags (such as the inputs for first name, last name, etc.), then take all of that input and send a request to Facebook to create the account. How would this be done?
I am a beginner in scraping too, and I faced these problems as well. For basic scraping operations we can use Beautiful Soup. While learning more about scraping I came across the Scrapy tool, which offers much more functionality, including what you described. Try out Scrapy here; it is recommended by many professional web scrapers.
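The find-the-inputs-then-POST idea can be sketched with the standard library: parse the first <form>, keep the default values of its named inputs (hidden fields such as a CSRF token usually have to be sent back), overlay your own values, and encode everything as a form-urlencoded POST. The form markup and the field names (firstname, lastname, csrf_token) are hypothetical, and the request is only built, not sent. In Scrapy, FormRequest.from_response does roughly the same job.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormParser(HTMLParser):
    """Records the action URL and the named <input> fields of the first <form>."""

    def __init__(self) -> None:
        super().__init__()
        self.action = None
        self.fields: dict[str, str] = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attributes.get("action") or ""
        elif tag == "input" and self._in_form:
            name = attributes.get("name")
            input_type = (attributes.get("type") or "text").lower()
            # Buttons are skipped for simplicity.
            if not name or input_type in {"submit", "button", "image", "reset"}:
                return
            # Browsers do not submit unchecked checkboxes or radio buttons.
            if input_type in {"checkbox", "radio"} and "checked" not in attributes:
                return
            # Keep default values, including hidden fields such as CSRF tokens.
            self.fields[name] = attributes.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_form_request(page_url: str, html: str, user_values: dict[str, str]) -> Request:
    """Build (but do not send) a POST request that submits the first form in html."""
    parser = FormParser()
    parser.feed(html)
    parser.close()
    if parser.action is None:
        raise ValueError("no <form> found")
    data = {**parser.fields, **user_values}  # user values override defaults
    body = urlencode(data).encode("utf-8")
    return Request(
        urljoin(page_url, parser.action),  # an empty action posts back to page_url
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical registration form.
sample = """
<form action="/signup" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="firstname">
  <input type="text" name="lastname">
  <input type="submit" value="Sign Up">
</form>
"""
req = build_form_request(
    "https://example.com/register", sample, {"firstname": "Ada", "lastname": "Lovelace"}
)
print(req.full_url, req.data)
```

Passing the request to urllib.request.urlopen would submit it. Sites such as Facebook add JavaScript and anti-bot checks that a plain POST like this does not handle, so expect it to work only on simple forms.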
I am learning Python right now and want to level up my knowledge, particularly in scraping. I am now using Scrapy and getting into using it along with Splash. I wanted to scrape a more challenging website, an airline website "https://www.airasia.com/en/home.page?cid=1". One of my web developer friends told me that it would be impossible to scrape this type of website, since no regular JSON or XML files are returned for the data to be scraped. He said the data can only be accessed using an API (he said something about a RESTful API). Somehow I don't believe him. So as not to waste my time: if someone can CONFIRM it, I would be happy, and if someone says it can be scraped, I would be even happier if they could give me tips on how to scrape it, and happiest of all if they can show proof.
Many thanks.
Almost ANY website can be scraped, but some websites are trickier than others.
Instead of Scrapy, I would recommend a better alternative called Selenium, which also has a Python library.
Long story short: you start a web browser in the form of a driver, navigate to the page of your choice, and simulate user interactions such as clicking, entering data in forms, and submitting them. You can also run JavaScript functions.
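The Selenium workflow described above (start a driver, navigate, fill in a form, click, run JavaScript) might look like the following sketch. It assumes Selenium 4 with Chrome installed; the URL, the input's name attribute and the CSS selectors are hypothetical parameters you would read off the target page with the browser's developer tools.

```python
def fill_and_submit(
    url: str,
    field_name: str,
    value: str,
    submit_selector: str,
    result_selector: str,
    timeout: float = 15.0,
) -> str:
    """Open url in headless Chrome, type value into the input named field_name,
    click the element matching submit_selector, wait for result_selector to
    appear, and return the rendered HTML."""
    # Imported inside the function so this module can be loaded on machines
    # where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Wait until the page's own JavaScript has rendered the input.
        field = wait.until(EC.element_to_be_clickable((By.NAME, field_name)))
        field.clear()
        field.send_keys(value)
        driver.find_element(By.CSS_SELECTOR, submit_selector).click()
        # Wait for the dynamically loaded results before reading the page.
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, result_selector)))
        # execute_script runs JavaScript in the page and returns its value.
        return driver.execute_script("return document.documentElement.outerHTML;")
    finally:
        driver.quit()
```

A call such as fill_and_submit("https://example.com/search", "q", "flights", "button[type=submit]", ".results") would return the HTML after the results have rendered, ready to be parsed with Beautiful Soup.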
You might also want to research the legal constraints to ensure your operation is not unlawful. For instance, refer to the case law of Ryanair Ltd v PR Aviation BV (Case C-30/14 CJEU).
You have two options: use their API, if they have one, to make HTTP requests and obtain data from their servers.
Or use a Python scraping / web testing framework, e.g. Scrapy or Selenium, to scrape their website directly from a Python program.
Scrapy will be harder than Selenium on this website because a lot of the content is dynamic and will require custom code to trigger. Selenium should be easy to use.
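The first option is often simpler than it sounds: pages that load data dynamically frequently fetch it from an HTTP endpoint that returns JSON, which you can spot in the Network tab of the browser's developer tools and call directly. This sketch assumes such an endpoint exists; the base URL, the query parameters and the response shape ({"fares": [...]}) are hypothetical placeholders, not AirAsia's real API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_api_request(base_url: str, params: dict[str, str]) -> Request:
    """Build a GET request for a JSON endpoint (base_url is a placeholder)."""
    url = f"{base_url}?{urlencode(params)}"
    return Request(url, headers={"Accept": "application/json"})


def parse_fares(payload: bytes) -> list[tuple[str, float]]:
    """Turn the assumed {"fares": [{"flight": ..., "price": ...}]} shape into tuples."""
    data = json.loads(payload)
    return [(item["flight"], float(item["price"])) for item in data["fares"]]


def fetch_fares(base_url: str, params: dict[str, str], timeout: float = 10.0) -> list[tuple[str, float]]:
    """Call the endpoint and parse its JSON body."""
    with urlopen(build_api_request(base_url, params), timeout=timeout) as response:
        return parse_fares(response.read())


# Offline check of the parsing step with a made-up payload.
sample = b'{"fares": [{"flight": "XX123", "price": "49.90"}, {"flight": "XX456", "price": 75}]}'
print(parse_fares(sample))
```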
For fun, I've been learning since last night how to do basic web scraping using Python's urllib, urllib2, a cookie jar, and BeautifulSoup. It only took a bit, but I've figured out how to get all the information I need from each user's profile (on OKCupid, to be exact). However, I've only figured out how to do that for individual profiles, and have no idea how to go through the site's public database of users without an API.
Is there any easy way to do so? Thanks.
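Without an API, the usual approach is to crawl: start from a page you can reach, extract links to other profiles from it, and visit each new profile exactly once. Below is a minimal breadth-first sketch. The fetch function is injected so the crawl logic can be tried on in-memory pages, and the /profile/ link pattern is a hypothetical stand-in for whatever URL scheme the site actually uses; with BeautifulSoup you would extract the links more robustly than with a regular expression.

```python
import re
from collections import deque
from typing import Callable
from urllib.parse import urljoin

# Hypothetical pattern: links whose path starts with /profile/.
PROFILE_LINK = re.compile(r'href="(/profile/[^"#?]+)"')


def crawl_profiles(start_url: str, fetch: Callable[[str], str], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl returning profile URLs in the order they were visited."""
    queue = deque([start_url])
    seen = {start_url}  # every URL ever queued, so nothing is fetched twice
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited.append(url)
        for href in PROFILE_LINK.findall(html):
            link = urljoin(url, href)  # resolve relative links against the page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Made-up pages standing in for real HTTP responses.
pages = {
    "https://example.com/profile/alice": '<a href="/profile/bob">Bob</a> <a href="/profile/carol">Carol</a>',
    "https://example.com/profile/bob": '<a href="/profile/alice">Alice</a> <a href="/profile/dave">Dave</a>',
    "https://example.com/profile/carol": "",
    "https://example.com/profile/dave": '<a href="/profile/carol">Carol</a>',
}
order = crawl_profiles("https://example.com/profile/alice", pages.__getitem__)
print(order)
```

To crawl a live site, pass a fetch function that reuses your logged-in session, for example one built with urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar())), and add a delay between requests.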
I have a friend who owns a small business and has a Page on Facebook. I want to help her manage it from a marketing perspective, and figure it may be best to do so through Facebook's API.
I have skimmed their API documentation and have a basic working knowledge of Python. What I can't figure out is whether I can access her page's data with Python and grab data on wall posts, who liked posts, etc. Is this possible? I can't find a decent tutorial for someone who is new to programming.
To provide context, I have been scraping the Twitter Search API for some time now, and I am hoping there is something similar (request certain data elements and have them returned as structured data I can analyze). I find Twitter's API extremely straightforward, but for Facebook I don't know where to begin.
I don't want to create an application, I simply want to access the data that is related to my friend's page.
I am hoping to find some decent tutorials and help on what I will need to get started. Any help you can provide will be greatly appreciated.
You could try Pyjamas Desktop.
http://pyjs.org/
It runs Python in an embedded web browser and gives you access to the HTML DOM.
This potentially means that you can use the JS API directly from Python.
You will need to be running a server locally though.
Basically, to automate posting to the person's profile, you need to get their OAuth token and then make API calls with that token.
Here are the steps to get an API token:
Register an app with Facebook and get an app ID.
Have your friend click this link: https://www.facebook.com/dialog/oauth?client_id=[your app id here]&type=user_agent&scope=email,read_stream,,,user_about_me,offline_access,publish_stream&redirect_uri=http://www.facebook.com/connect/login_success.html
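After she approves, Facebook redirects to the redirect_uri with the access token in the URL fragment (#access_token=...). Record that token for future use.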
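The token steps above can be sketched in Python: build the dialog link from the same parameters, then, once your friend has approved the app, pull access_token out of the fragment of the URL the browser was redirected to. The app ID and the example redirect URL are placeholders, and the parameters mirror the older flow described in this answer, so treat them as assumptions to check against Facebook's current Login documentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

DIALOG_URL = "https://www.facebook.com/dialog/oauth"
REDIRECT_URI = "http://www.facebook.com/connect/login_success.html"


def build_dialog_url(app_id: str, scopes: list[str]) -> str:
    """Rebuild the authorization link from the steps above (app_id is a placeholder)."""
    params = {
        "client_id": app_id,
        "type": "user_agent",
        "scope": ",".join(scopes),
        "redirect_uri": REDIRECT_URI,
    }
    return f"{DIALOG_URL}?{urlencode(params)}"


def token_from_redirect(redirected_url: str) -> str:
    """Extract access_token from a '#access_token=...&expires_in=...' fragment."""
    fragment = urlsplit(redirected_url).fragment
    values = parse_qs(fragment)
    if "access_token" not in values:
        raise ValueError("redirect URL carries no access_token")
    return values["access_token"][0]


print(build_dialog_url("YOUR_APP_ID", ["email", "user_about_me"]))
```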
You can now use any available Python Facebook library to post to and manage that FB page.
This should get you started:
http://eggie5.com/20-getting-started-w-facebook-api
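Once you have a token, the Graph API returns structured JSON, much like the Twitter Search API. This sketch builds a request for a Page's posts using only the standard library; the API version string (v19.0), the page ID and the chosen fields are assumptions, and which fields and edges you can read depends on the permissions your token carries.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GRAPH_ROOT = "https://graph.facebook.com"


def page_posts_url(page_id: str, token: str, api_version: str = "v19.0",
                   fields: str = "message,created_time") -> str:
    """URL for a Page's posts edge; api_version and fields are assumptions."""
    query = urlencode({"fields": fields, "access_token": token})
    return f"{GRAPH_ROOT}/{api_version}/{page_id}/posts?{query}"


def get_page_posts(page_id: str, token: str, api_version: str = "v19.0") -> list[dict]:
    """Fetch one page of results; Graph API lists wrap items in a "data" array."""
    with urlopen(page_posts_url(page_id, token, api_version), timeout=10) as response:
        payload = json.load(response)
    # Further pages, if any, are linked from payload["paging"]["next"].
    return payload.get("data", [])
```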