python read data from web application - python

Hi everyone, I would like to create a small bot to help me with binary options.
I am not a Python expert, but I can currently read a web page and
retrieve a specific value from a tag.
However, the information I need is in a web application
and not in the source code of the web page. I am not an expert in web applications, and I want to know whether I can retrieve a value displayed in the application with Python.
Here is a link to a picture of the application:
"http://comparatif-options-binaires.fr/wp-content/uploads/2014/05/optionweb-analyse-technique-ow-school.jpg"

I think the problem you face here is that the value you need is being loaded via JavaScript of some sort (though without access to the web application, and with no code shown in your question, I can't be sure).
Expanding on @sabhiram's answer (and agreeing that requests and BeautifulSoup are excellent libraries for static text), I would recommend having a look at the following:
Selenium - automates a web browser from Python (so it will run the full JavaScript).
Webkit - another headless browser option for Python, with some excellent SO questions on the matter.
Ghost.py - attempts to make the Webkit experience a little smoother.
pyv8 - something a bit more barebones; pyv8 is a Python wrapper for the Google V8 JavaScript engine and can be used to run the JavaScript on the page and, hopefully, extract the element you need.
And if you're not really set on Python, why not look at a JavaScript headless browser such as PhantomJS to run the JavaScript?
As mentioned before: respect others when scraping, and be aware that there may be consequences if you are caught.
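A minimal sketch of the Selenium route, assuming Selenium 4 is installed and Firefox with a matching driver is available. The function names, the URL and the CSS selector are hypothetical and are not taken from the application in the question. The polling helper only relies on the driver's find_elements(by, value) method, so it can be tried without a browser.

```python
import time


def wait_for_text(driver, css_selector, timeout=10.0, poll=0.25):
    """Poll the page until the first matching element has text, then return it.

    `driver` is any object exposing Selenium's find_elements(by, value) method.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        elements = driver.find_elements("css selector", css_selector)
        if elements:
            text = elements[0].text.strip()
            if text:
                return text
        if time.monotonic() >= deadline:
            raise TimeoutError("no text found for %r" % css_selector)
        time.sleep(poll)


def read_rendered_value(url, css_selector):
    """Open the page in a real browser so its JavaScript runs, then read the value."""
    from selenium import webdriver  # third-party: pip install selenium

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are installed
    try:
        driver.get(url)
        return wait_for_text(driver, css_selector)
    finally:
        driver.quit()
```

Usage would look something like read_rendered_value("http://example.com/app", "#price"), where both arguments are placeholders. Polling matters here because a value filled in by JavaScript is usually not present the moment the page finishes loading.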

I think you mean you want to build a script which can scrape a given webpage and extract a certain value from a given target DOM element.
I don't currently have the time to write this code for you, but it should be rather simple to put together. Here are some modules which might help you:
requests - Use this to fetch a given webpage into your Python script.
BeautifulSoup - Feed the fetched "DOM text" to Beautiful Soup, and you will be able to manipulate the HTML page more easily (fetch your value of interest, etc.).
EDIT:
As pointed out in the comments above, please consider the Terms and Conditions of the web service you are trying to scrape info from.
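A rough sketch of the fetch-then-parse idea for static HTML. To keep it free of third-party dependencies it swaps in the standard library (urllib and html.parser) in place of requests and BeautifulSoup; the class and function names and the "price" id are made up for illustration. It only sees what is in the page source, so it will not help with values filled in by JavaScript.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextById(HTMLParser):
    """Collect the text inside the first element with a given id attribute."""

    # Tags that never have a closing tag, so they must not change the depth.
    VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while the parser is inside the target element
        self.done = False   # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in self.VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_text(html, target_id):
    """Return the stripped text of the element with id=target_id, or ''."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


def fetch_html(url):
    """Download a page and decode it using the charset the server declares."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

With Beautiful Soup installed, the parsing half shrinks to roughly BeautifulSoup(html, "html.parser").find(id="price").get_text(), which is why the answer recommends it.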

Related

Interact with local HTML file with python

I want to make a script in python that interacts with a webpage that has quite a lot of javascript in it (it's a webpage that computes a bunch of physics stuff).
I don't want my code to break if the page formatting changes and I want it to run offline so I would prefer my script to run on a local html copy of the page I got (all the JS code is accessible in the HTML source, there is no call to an external server). I wanted to use the requests library to do it, but it only works with URLs. Is there any library to do this? Note that I want to interact with the HTML (input values and look at the outputs etc..), I know that I can parse the file but that's not what I'm asking. I'm also totally new to web bots or anything related.
Right now I can open my .html version of the page offline with chrome and interact with it, so there has to be a way to automate this somehow. I'm also not against using something else than python if there is a better library for this in another language.
Interesting question. The best way I can think of is to serve the local page with a web framework and then scrape the data using requests. I am familiar with Flask, which is simple to use, but I'm sure there are other options as well.
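A small sketch of the "serve it locally, then fetch it" idea, with the standard library's http.server swapped in for Flask and urllib for requests so that nothing needs installing. All names here are hypothetical. Note the limitation: fetching the page over HTTP returns the HTML as text and does not execute its JavaScript, so actually typing inputs and reading computed outputs would still need a browser driver, which can open the file:// URL produced by the last helper.

```python
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from urllib.request import ProxyHandler, build_opener


class QuietHandler(SimpleHTTPRequestHandler):
    """Static file handler that does not print a log line per request."""

    def log_message(self, format, *args):
        pass


def serve_directory(directory):
    """Serve `directory` on a free localhost port; return (server, base_url)."""
    handler = partial(QuietHandler, directory=str(directory))
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)  # port 0 = pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d/" % server.server_address[1]


# An opener with an empty proxy map talks to localhost directly,
# ignoring any proxy configured in the environment.
_direct = build_opener(ProxyHandler({}))


def fetch(url):
    """Return the body of `url` decoded as UTF-8 text."""
    with _direct.open(url, timeout=10) as response:
        return response.read().decode("utf-8")


def local_file_url(path):
    """Build a file:// URL for a local file, suitable for a browser or browser driver."""
    return Path(path).resolve().as_uri()
```

A typical call sequence would be server, base = serve_directory("saved_page"), then fetch(base + "page.html"), and finally server.shutdown(); the directory and file names are placeholders.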

How to get website script output on python

I am trying to write a web scraper in Python, but I have an issue: the contents of the site are not in the HTML itself; they seem to come from a different source. Is there a Python library that can fetch the contents for me? If such a tool exists in another language, I'm willing to learn it.
See: Is this possible to load the page after the javascript execute using python?
You'll have to execute the JS and whatever else it is that generates the HTML you want. You can do this in a lot of ways, but the answer I linked above suggests using Selenium Web Driver.

Python - how to trick anti adblock filter while scraping?

I'm trying to download the content of a website using Python's urllib, but I have a problem: the site has an anti-adblock filter, and the only thing I can get is text that asks me to disable adblock. Is there any way to trick this kind of filter?
Thanks in advance. (:
Javascript Parsing
The issue you are running into is a JavaScript filter that loads data after the page has loaded. The message warning that you are using adblock is present in the raw HTML and is completely static. It is replaced once a JavaScript call has been able to check whether or not adblock is present. There are several ways to get around this; however, each requires finding some way of running JavaScript.
Solution(s)
There are several solutions to your problem. You can read more about them here.
Embed a web browser within an application and simulate a normal user.
Remotely connect to a web browser and automate it from a scripting language.
Use special purpose add-ons to automate the browser.
Use a framework/library to simulate a complete browser.
As you can see each one in some way requires emulating a browser and DOM objects. Since there are several libraries to help you accomplish this, I highly recommend you look into the url above.
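The following is a code example from the same page that shows how to retrieve the URLs on a page that generates URLs via JavaScript. It relies on HtmlUnit, a Java library from gargoylesoftware; because it imports Java classes, the script has to be run under Jython rather than standard Python.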
import com.gargoylesoftware.htmlunit.WebClient as WebClient
import com.gargoylesoftware.htmlunit.BrowserVersion as BrowserVersion

def main():
    # Create a new web client that emulates Firefox.
    webclient = WebClient(BrowserVersion.FIREFOX_3_6)
    url = "http://www.gartner.com/it/products/mq/mq_ms.jsp"
    # Load the page; HtmlUnit runs the page's JavaScript.
    page = webclient.getPage(url)
    # Select all the hyperlinks inside the table with id "mqtable".
    articles = page.getByXPath("//table[@id='mqtable']//tr/td/a")

if __name__ == '__main__':
    main()
However,
I am not sure why you are scraping a webpage, or which website you are scraping. Automating such data collection is against the terms and conditions of various sites, and I advise you to review those terms before you get yourself into any trouble.
Further Research
If you are looking for a more generic answer to your question (e.g. "How can I load javascript with Python.") I highly recommend looking at previous answers on this site, because they offer some really good insight into the matter:
Web-scraping JavaScript page with Python

python open web page and get source code

We have developed a web-based application, with user login etc., and we have developed a Python application that has to get some data from one of its pages.
Is there any way for Python to communicate with the system default browser?
Our main goal is to open a webpage with the system browser and get the HTML source code from it. We tried the Python webbrowser module and opened the web page successfully, but could not get the source code. We also tried urllib2, but in that case I think we would have to use the system default browser's cookies etc., and I don't want to do this for security reasons.
https://pypi.python.org/pypi/selenium
You can try Selenium. It was built for testing, but nothing prevents you from using it for other purposes.
If your web site is navigable without Javascript, then you could try Mechanize or zope.testbrowser. These tools offer a higher level API than urllib2, letting you do things like follow links on pages and fill out HTML forms.
This can be helpful in navigating a site that uses cookie based authentication with HTML forms for login, for example.
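As a hedged illustration of cookie-based form login, here is a sketch that swaps in the standard library (urllib plus http.cookiejar) for Mechanize or zope.testbrowser. Unlike those tools it does not parse the form for you, so the field names must be read from the page's HTML by hand; the function names, URLs and field names are all hypothetical.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener


def make_session():
    """Return (opener, jar); the opener stores cookies and sends them back later."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar


def submit_form(opener, url, fields):
    """POST `fields` the way an HTML form would and return the response body."""
    data = urlencode(fields).encode("ascii")  # application/x-www-form-urlencoded
    with opener.open(url, data=data, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def get_page(opener, url):
    """GET a page through the same opener, so the session cookie is included."""
    with opener.open(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

The flow would be opener, jar = make_session(), then submit_form(opener, login_url, {"user": ..., "password": ...}), then get_page(opener, protected_url). Keeping the cookies in the script's own jar avoids touching the system browser's cookie store, which was the concern raised in the question.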
Have a look at the nltk module---they have some utilities for looking at web pages and getting text. There's also BeautifulSoup, which is a bit more elaborate. I'm currently using both to scrape web pages for a learning algorithm---they're pretty widely used modules, so that means you can find lots of hints here :)

Python 3 - way to interact with a web page

I have experience with reading and extracting HTML source "as given" (via urllib.request), but now I would like to perform browser-like actions (such as filling in a form or selecting a value from an option menu) and then, of course, read the resulting HTML as usual. I did come across some modules that seemed promising, but they turned out not to support Python 3.
So I'm asking for the name of a library/module that does this, or a pointer to a solution within the standard library if it's there and I failed to see it.
Many websites (like Twitter, Facebook or Wikipedia) provide APIs that let developers hook into their app and perform activities programmatically. Whatever website you wish to work with through code, look for its API support first.
If you need to do web scraping, you can use Scrapy, although at the time of writing it only supports Python up to 2.7.x. In any case, you can use requests as the HTTP client and Beautiful Soup for HTML parsing.
