Website automation using Python

I'm trying to automate a web application validation performed by my team. I have chosen Python as the language for this, although my experience with Python is very limited. I have done similar things in the past using Perl. The problem is that after I post the URL of the website, it redirects to a logon page that is built with JavaScript. From what little Python I know, I believe scraping or parsing a website built with JavaScript is not possible. I faced the same issue when doing this with Perl and wasn't able to proceed.
Any pointers or help in resolving this issue would be highly appreciated.
Thanks

Spynner may help http://code.google.com/p/spynner/

Maybe you can take a look at Selenium. It's a Firefox plugin (Selenium IDE) that enables automation, but it also has a WebDriver system where you can write automation scripts in various languages (including Python), and a server executes the code in various browsers. I have never tried the WebDriver part myself, but it should do what you want.
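For what it's worth, here is a minimal sketch of what the WebDriver route could look like for a JavaScript-built logon page. The field names ("username", "password"), the submit-button selector, and the helper names log_in and run are assumptions made up for illustration; the real page will have its own. It assumes Selenium 4 is installed (pip install selenium) and Firefox is available.

```python
# Locator strategy strings; these equal selenium's By.NAME and By.CSS_SELECTOR.
BY_NAME = "name"
BY_CSS = "css selector"


def log_in(driver, login_url, username, password):
    """Fill in and submit a logon form with an already-started WebDriver.

    The field names and button selector are hypothetical placeholders.
    """
    # Let find_element retry for up to 10 seconds, since the form is
    # built by JavaScript and may not exist right after the page loads.
    driver.implicitly_wait(10)
    driver.get(login_url)
    driver.find_element(BY_NAME, "username").send_keys(username)
    driver.find_element(BY_NAME, "password").send_keys(password)
    driver.find_element(BY_CSS, "button[type='submit']").click()
    # The URL the browser is on once the click has been issued.
    return driver.current_url


def run(login_url, username, password):
    """Start a real Firefox, log in, and always close the browser."""
    from selenium import webdriver  # third-party: pip install selenium

    driver = webdriver.Firefox()
    try:
        return log_in(driver, login_url, username, password)
    finally:
        driver.quit()
```

Because a real browser executes the page's JavaScript, the logon form exists in the DOM by the time the script looks for it, which is exactly what a plain HTTP fetch cannot give you.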

Related

Setup VPN through python script for web crawling

I have been using Selenium to do some web scraping and I need to change my IP. After some research I have discovered that it is fairly easy to set up and use a proxy. However, I am already paying for a VPN, so I would like to use it for this application as well. The free proxy lists that I have found have been way too slow to be useful for me.
I did some googling and found vpnc and other libraries, but I couldn't get them to work all the way. I'm fairly new to web scraping and Python, so I would appreciate it if someone could help me at my level of knowledge.
Is it possible to do this, or am I trying to achieve something that is way too difficult for an amateur like me? I'm trying to set this up on macOS as well as Windows 7.

How to run Selenium with Webdriver on Online Python interpreters?

Folks, I have a scraping script that I need to run at specific times for live info, but I can't have my computer with me all day. So I thought about running it on an online interpreter, but repl.it doesn't have WebDriver and neither did the others I found. Could you help me with that?
Thanks
I'm not sure, but I don't think you can do it on a free online interpreter!
You can buy a server and use that. You can SSH to it anytime you want, or even better, you can develop a micro web service using Flask or something else to report the data you need!
The other way I can think of is to leave your computer online 24/7 and use smtplib to email yourself the data at an interval!
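The smtplib idea could be sketched roughly like this, using only the standard library. The subject line, the function names, and the SMTP host, port, and credentials are all placeholders (assumptions), and the server is assumed to accept STARTTLS; your mail provider's settings may differ.

```python
import smtplib
import time
from email.message import EmailMessage


def build_report(sender, recipient, data):
    """Wrap the scraped text in an email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Scraper report"  # hypothetical subject line
    msg.set_content(data)
    return msg


def send_report(msg, host, port, user, password):
    """Send one message through an SMTP server that supports STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)


def run_periodically(scrape, send, interval_seconds, rounds):
    """Call scrape() and pass its result to send(), `rounds` times,
    sleeping `interval_seconds` between consecutive rounds."""
    for i in range(rounds):
        send(scrape())
        if i < rounds - 1:
            time.sleep(interval_seconds)
```

Here scrape would be your existing scraping function returning text, and send could be a small lambda that calls build_report and then send_report with your own account details.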

Will ghost.py allow my users to scrape javascript injected images?

My site, http://whatgoeswiththis.co, has a scraper that takes images from the web and posts to our site. I can get server rendered images no problem, but for sites like https://www.everlane.com/collections/mens-luxury-tees/products/mens-crew-antique, the images are rendered client-side with javascript.
I've succeeded in writing a script on my local machine that uses ghost.py to scrape the images from this site.
However, I've had to install various programs on my laptop like Qt, PySide, PyQt4, and XQuartz. To my knowledge, these aren't libraries I can just add to my app. My question is, is this stack something that is possible to add to my existing Django app that will allow users to scrape these javascript injected images? Or is there another solution used for webapps?
Sites like http://wanelo.com are able to scrape these images - is there something in particular they're using that is an optimal solution?
Thanks for your help, and I apologize if I sound inexperienced (I am but learning!).
My current answer is: maybe ghost.py works. But only after a lot of prerequisites that I found difficult to install and configure. My solution was to follow the advice of Pyklar to use PhantomJS through the selenium library here: https://stackoverflow.com/a/15699761/2532070.
I was able to switch from beautifulsoup to selenium/phantomjs simply by changing a few lines of code, brew install phantomjs, and pip install selenium.
I hope this helps someone avoid the same struggle!
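A rough sketch of that approach, for anyone landing here: let a real browser render the page through Selenium, then pull the image URLs out of the rendered HTML. Note the swap: this uses headless Firefox where the answer above used PhantomJS, since PhantomJS development has since been suspended. The function and class names are made up for illustration, and the parsing uses the standard library's html.parser rather than BeautifulSoup to keep the example self-contained.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute URLs from the src attribute of <img> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.urls.append(urljoin(self.base_url, src))


def image_urls(html, base_url):
    """Return the image URLs found in an HTML string, in document order."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.urls


def rendered_html(url):
    """Load a page in headless Firefox and return the DOM after
    its JavaScript has run (requires: pip install selenium)."""
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

Usage would be something like image_urls(rendered_html(page_url), page_url). If the images are injected late, you may also need an explicit wait before reading page_source.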
You can do something like:
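from ghost import Ghost
g = Ghost()
g.open(url, wait=False)  # start loading without blocking on the full page load
# Block until the JavaScript-injected image element appears in the page.
page, resources = g.wait_for_selector(your_image_css_selector)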

Make a web crawler in python to download pdf

I want to make a web crawler using Python and then download pdf file from that URL.
Can anyone help me? how to start?
A good place to start is ScraperWiki, a site where you can write and execute scrapers/crawlers online. Among other languages, it supports Python. It provides a lot of useful tutorials and libraries for a fast start.
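If you'd rather start locally, a minimal single-page sketch using only the standard library might look like the following. It is not a full crawler (it does not follow links to other pages), and the function names and the rule "a link is a PDF if its path ends in .pdf" are simplifying assumptions.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def pdf_links(html, base_url):
    """Return the links whose URL path ends in .pdf (case-insensitive)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [u for u in collector.links
            if urlparse(u).path.lower().endswith(".pdf")]


def download_pdfs(page_url, dest_dir="."):
    """Fetch one page and save every PDF it links to into dest_dir."""
    with urlopen(page_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    for url in pdf_links(html, page_url):
        name = os.path.basename(urlparse(url).path)
        with urlopen(url) as resp, open(os.path.join(dest_dir, name), "wb") as out:
            out.write(resp.read())
```

To turn this into a crawler, you would keep a queue of pages to visit and a set of pages already seen, and feed the non-PDF links back into the queue.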

Is there pluggable online python console?

I'm wondering whether there is already some sort of online, web-based live Python console with open source code available. Does anyone know of anything?
It would be really useful to have a console in the Django admin (like running python manage.py shell in the server's terminal), so it would be great to have a Django or other WSGI application that can be used to enable web-based live console access.
Thanks
You're looking for the Werkzeug debugger.
http://werkzeug.pocoo.org/
http://werkzeug.pocoo.org/docs/debug/
It's got an interactive javascript based in-browser debugger for your WSGI projects, among many other great tools. Fantastic stuff.
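As a rough sketch of how that is wired up: Werkzeug's DebuggedApplication wraps any WSGI app, and with evalex=True the traceback page shown after an exception lets you run Python in each stack frame from the browser. The tiny app below and its "/boom" path are made up for illustration; in a Django project you would wrap the project's WSGI application instead. Because the console executes arbitrary code, keep it off anything publicly reachable.

```python
def demo_app(environ, start_response):
    """A tiny WSGI app: '/boom' raises so the debugger has
    something to catch; every other path returns plain text."""
    if environ.get("PATH_INFO") == "/boom":
        raise RuntimeError("deliberate error to open the debugger")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello\n"]


def with_debugger(app):
    """Wrap a WSGI app in the Werkzeug debugger (pip install werkzeug).

    evalex=True enables the interactive in-browser console.
    """
    from werkzeug.debug import DebuggedApplication

    return DebuggedApplication(app, evalex=True)


def serve():
    """Run the wrapped app on localhost only."""
    from werkzeug.serving import run_simple

    run_simple("127.0.0.1", 5000, with_debugger(demo_app))
```

Calling serve() and then visiting http://127.0.0.1:5000/boom should bring up the interactive traceback page.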
For Django specifically, there's also RunServerPlus, which is part of the django-extensions package.
https://github.com/django-extensions/django-extensions
You should check out Python Anywhere. You can run python web apps, you get an SQL database, and you get a bash shell in your browser.
Have a look at the Python shell from Google. There's a link to the source code at the top. Loading the Django environment into it might not be very easy, but I believe it's possible.
I'm not sure if this meets your needs, but you might take a look at this Chrome extension: https://chrome.google.com/webstore/detail/gdiimmpmdoofmahingpgabiikimjgcia
There is a great website called Codecademy. It teaches the fundamentals of Python, Ruby, Javascript, and HTML/CSS.
They also have online consoles for each of the languages they teach, excluding HTML/CSS. This website is Codecademy Labs. Codecademy Labs has a console you can type directly in, and an editor that displays output in the console. I hope that this helped you find what you were looking for!
