I'm new to web programming, and have recently began looking into using Python to automate some manual processes. What I'm trying to do is log into a site, click some drop-down menus to select settings, and run a report.
I've found the acclaimed requests library: http://docs.python-requests.org/en/latest/user/advanced/#request-and-response-objects
and have been trying to figure out how to use it.
I've successfully logged in using bpbp's answer on this page: How to use Python to login to a webpage and retrieve cookies for later usage?
My understanding of "clicking" a button is to write a post() command that mimics a click: Python - clicking a javascript button
My question (since I'm new to web programming and this library) is how I would go about pulling the data I need to figure out how I would construct these commands. I've been looking into [RequestObject].headers, .text, etc. Any examples would be great.
As always, thanks for your help!
EDIT:::
To make this question more concrete, I'm having trouble interacting with different aspects of a web-page. The following image shows what I'm actually trying to do:
I'm on a web-page that looks like this. There is a drop-down menu with click-able dates that can be changed. My goal is to automate changing the date to the most recent date, "click"'Save and Run', and download the report when it's finished running.
The only solution to this I have found is Selenium. If it werent a javascript heavy website you could try mechanize but for this you need to render the javascript and then inject javascript...like Selenium does.
Upside: You can record actions in Firefox (using selenium) then export those actions to python. The downside is that this code has to open a browser window to run.
Related
I try to scrape a ajax/js based website for a .csv download.
If I visit the site in a browser, I need to click a button that calls a js function "downloadChartCSV()". This function generates and downloads a .csv file. That the file I'm after.
How can I automate this?
I have tried:
requests-html, which can login and render the page, but I miss an option to call that onClick event. There seems to be an open issue about exactly this: Github Issue
Selenium which probably works, but I don't want a full browser that is flashing up.
If you want to have the functionality of Selenium, but don't want a full browser flashing up: Use the PhantomJS Driver.
https://realpython.com/headless-selenium-testing-with-python-and-phantomjs/
This is a very good Introduction into it.
I'm currently working on a python script and I am stuck.
My script should go on a webpage, upload a file from the computer, fill a form, start an action (BLAST) and then wait since the action can be long. (sometimes one hour). After all this go on a link in the page and download one of the results (Hit table).
I currently just found how to open a webpage from my script but can't find something that would let me "navigate" on the page.
To simply navigate on a website and simulate user-interaction, you could use Selenium with Python:
References:
https://superuser.com/questions/59183/what-can-i-use-to-simulate-user-actions-within-firefox
WatiN or Selenium?
You could try Mechanize which allows you to simulate a browser in your script.
hi everyone I would like to create a small bot to help me on binary option.
i am not an expert on python but actualy I can read a web page and
retrieve a precise value in a tag,
but the information what I need is on a web application
and not in the source code of the web page. I am not an expert of eb application and I want to know if I retrieve a value displayed on the application with python.
here is a link to the picture of the application:
"http://comparatif-options-binaires.fr/wp-content/uploads/2014/05/optionweb-analyse-technique-ow-school.jpg"
I think the problem you face here is the value you need is being loaded via Javascript of some sort (though without access to the web application and no visible effort from your code I can't be sure).
Expanding on #sabhirams answer (and agreeing that requests and BeautifulSoup are excellent libraries for static text) I would recommend having a look at the following:
Selenium - automates web browser usage in python (so will run the full javascript).
Webkit - Again another headless browser for python that has some excellent SO questions on the matter.
Ghost.py - attempts to make the Webkit experience a little smoother.
pyv8 - something a bit more barebones, pyv8 is a python wrapper for the Google V8 Javascript engine and can be used to run the javascript on the page and, hopefully, extract the element you need.
And if you're really not settled with python why not look at using a Javascript headless browser to run the javascript like PhantomJS.
As mentioned before; Respect others when scraping and be aware there may be consequences if you are caught.
I think you mean you want to build a script which can scrape a given webpage, and extract a certain value out of a given target DOM element.
I dont currently have the time to write this code for you, but it should be rather simple to put together. Here are some modules which might help you:
Request - Use this to fetch a given webpage into your py script
BeautifulSoup - Feed the above "DOM text" to beautiful soup, and you will be able to more easily manipulate the HTML page (fetch your var of interest etc...).
EDIT:
As pointed out in the comments above, please consider the Terms and Conditions of the web-service you are trying to scrape info from.
I need to create a script that will log into an authenticated page and download a pdf.
However, the pdf that I need to download is not at a URL, but it is generated upon clicking on a specific input button on the page. When I check the HTML source, it only gives me the url of the button graphic and some obscure name of the button input, and action=".".
In addition, both the url where the button is and the form name is obscured, for example:
url = /WebObjects/MyStore.woa/wo/5.2.0.5.7.3
input name = 0.0.5.7.1.1.11.19.1.13.13.1.1
How would I log into the page, 'click' that button, and download the pdf file within a script?
Maybe Mechanize module can help.
I think that url on clicking the button maybe generated using javascript.So, to run javascript code from python script take a look at Spidermonkey.
Try mechanize or twill. HttpFox or firebug can help you to build your queries. Remember you can also pickle cookies from browser and use it later with py libs. If the code is generated by javascript it could be possible to 'reverse engineer' it. If nof you can run some javascript interpret or use selenium or windmill to script a real browser.
You could observe what requests are made when you click the button (using Firebug in Firefox or Developer Tools in Chrome). You may then be able to request the PDF directly.
It's difficult to help without seeing the page in question.
As Acorn said, you should try monitoring the actual requests and see if you can spot a pattern.
If not, then your best bet is actually to automate a fully-featured browser, that will be able to run Javascript, so you'll exactly mimic what a regular user would do. Have a look at this page on the Python Wiki for ideas, check the section Python Wrappers around Web "Libraries" and Browser Technology.
I have a program written for text simplification in python language, I need this program to be run on a browser as a plugin... If you click the plugin it should take the webpage's text as input and pass this input to my text simplification program and the output of the program should be again displayed in another web page...
Text simplification program takes input text and produces a simplified version of the text, so now I'm planning to create a plugin which uses this program and produces simplified version of text on the webpage...
It will be of great help if anyone help me out through this...
You would need to use NPAPI plugins in Chrome Extension:
http://code.google.com/chrome/extensions/npapi.html
Then you use Content Scripts to get the webpage text, you pass it to the Background Page via Messaging. Then your NPAPI plugin will call python (do it however you like since its all in C++), and from the Background Page, you send the text within the plugin.
Concerning your NPAPI plugin, you can take a look how it is done in pyplugin or gather ideas from here to create it.
Now the serious question, why can't you do this all in JavaScript?
If you want an easier way than trying to figure out plugins, make it run as a webservice somewhere (Google App Engine is good for Python, and free), then use a bookmarklet to send pages from the browser. As an added bonus, it works with any browser, not just Chrome.
More explanation:
Rather than running on your own computer, you make your program run on a computer at Google (or somewhere else), and access it over the web. See Google's introduction to App Engine. Then, if you want it in your browser, you make a "bookmarklet" - a little bit of javascript that grabs the web page you're currently on (either the code or the URL, depends on what you're trying to do), and sends it to your program over the web. You can add this to your browser's bookmark bar as a button you can click. There's some more info on this site.