Make a web crawler in Python to download PDFs

I want to make a web crawler using Python and then download the PDF files from the URLs it visits.
Can anyone help me? How should I start?

A good place to start is ScraperWiki, a site where you can write and execute scrapers/crawlers online. Among other languages, it supports Python, and it provides a lot of useful tutorials and libraries for a quick start.
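For the original question, a minimal sketch of a crawler that downloads every PDF linked from a page might look like this (it assumes the requests and beautifulsoup4 packages are installed; the start URL is a placeholder for whatever page you want to crawl):

import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

START_URL = "http://example.com/reports"  # placeholder page that links to PDFs

def download_pdfs(page_url, out_dir="pdfs"):
    os.makedirs(out_dir, exist_ok=True)
    html = requests.get(page_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Follow every anchor whose href ends in .pdf and save the file locally.
    for link in soup.find_all("a", href=True):
        href = link["href"]
        if href.lower().endswith(".pdf"):
            pdf_url = urljoin(page_url, href)
            filename = os.path.join(out_dir, os.path.basename(href))
            with open(filename, "wb") as f:
                f.write(requests.get(pdf_url, timeout=60).content)
            print("saved", filename)

if __name__ == "__main__":
    download_pdfs(START_URL)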

Related

Setup VPN through python script for web crawling

I have been using Selenium to do some web scraping and I need to change my IP. After doing some research I discovered that it is fairly easy to set up and use a proxy. However, I am already paying for a VPN, so I would like to use it for this application as well. The free proxy lists I have found have been way too slow to be useful.
I did some googling and found vpnc and other libraries, but I couldn't get them to work all the way. I'm fairly new to web scraping and Python, so I would appreciate it if someone could help me at my level of knowledge.
Is this possible, or am I trying to achieve something that is far too difficult for an amateur like me? I'm trying to set this up on macOS as well as Windows 7.
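Switching the VPN itself usually has to happen at the operating-system level (for example through the VPN client's own command-line tool), so there is no obvious portable pure-Python way to do it. If you do fall back to the proxy route mentioned above, wiring one into Selenium is short; here is a rough sketch assuming Chrome and a placeholder proxy address:

from selenium import webdriver

PROXY = "12.34.56.78:8080"  # placeholder host:port, substitute your own proxy

options = webdriver.ChromeOptions()
options.add_argument("--proxy-server=http://" + PROXY)  # route all browser traffic through the proxy

driver = webdriver.Chrome(options=options)
driver.get("https://httpbin.org/ip")  # the reported IP should now be the proxy's
print(driver.page_source)
driver.quit()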

Running SQL and Python on a webpage

Just a general question.
I have written some Python code in Spyder. In my code, I use SQL commands to pull data from SSMS and then use Python to manipulate it. Now I want to put my code on a webpage or some other online source so that others can run it. How would that be possible? Any thoughts would be appreciated :) Thanks!
P.S. I'm a PC user, not Mac.
Here's a link to a list of Python-based web frameworks:
https://wiki.python.org/moin/WebFrameworks
I know people who like Flask (http://flask.pocoo.org/), and it has some thoughtful-looking Windows installation instructions here: http://flask.pocoo.org/docs/0.12/installation/#installation.
You might also look at the Python library pyodbc to make your queries directly, if I understand you right that you're using sqlcmd.
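As a rough sketch of how those two pieces fit together (the connection string and table name are placeholders for your own SQL Server setup, and it assumes flask and pyodbc are installed):

import pyodbc
from flask import Flask

app = Flask(__name__)

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=mydb;Trusted_Connection=yes;"
)

@app.route("/data")
def data():
    # Run the same query you were running locally, then return the rows as a web page.
    conn = pyodbc.connect(CONN_STR)
    rows = conn.cursor().execute("SELECT TOP 10 * FROM sales_data").fetchall()
    conn.close()
    return "<br>".join(str(row) for row in rows)

if __name__ == "__main__":
    app.run()  # then open http://127.0.0.1:5000/data in a browser

Anyone who can reach the machine (or, with proper hosting, the internet) can then run the query just by visiting the page.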

Will ghost.py allow my users to scrape javascript injected images?

My site, http://whatgoeswiththis.co, has a scraper that takes images from the web and posts them to our site. I can get server-rendered images no problem, but for sites like https://www.everlane.com/collections/mens-luxury-tees/products/mens-crew-antique, the images are rendered client-side with JavaScript.
I've succeeded in writing a script on my local machine that uses ghost.py to scrape the images from this site.
However, I've had to install various programs on my laptop, like Qt, PySide, PyQt4, and XQuartz. To my knowledge, these aren't libraries I can just add to my app. My question is: can this stack be added to my existing Django app so it will let users scrape these JavaScript-injected images? Or is there another solution used for web apps?
Sites like http://wanelo.com are able to scrape these images - is there something in particular they're using that is an optimal solution?
Thanks for your help, and I apologize if I sound inexperienced (I am, but learning!).
My current answer is: maybe ghost.py works, but only after a lot of prerequisites that I found difficult to install and configure. My solution was to follow Pyklar's advice and use PhantomJS through the Selenium library, as described here: https://stackoverflow.com/a/15699761/2532070.
I was able to switch from BeautifulSoup to Selenium/PhantomJS simply by changing a few lines of code, plus a brew install phantomjs and pip install selenium.
I hope this helps someone avoid the same struggle!
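For reference, the swap described above boils down to something like this (a sketch that assumes an older Selenium release which still ships the PhantomJS driver, installed via brew install phantomjs and pip install selenium):

from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://www.everlane.com/collections/mens-luxury-tees/products/mens-crew-antique"

driver = webdriver.PhantomJS()  # headless browser, so JavaScript-injected images get rendered
driver.get(url)

# Once the page's JavaScript has run, the <img> tags are in the DOM
# and can be read just like server-rendered markup.
for img in driver.find_elements(By.TAG_NAME, "img"):
    print(img.get_attribute("src"))

driver.quit()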
You can do something like this with ghost.py (url and your_image_css_selector are placeholders for your page and the image's CSS selector):

from ghost import Ghost

g = Ghost()
g.open(url, wait=False)
# Block until the selector matches, i.e. the JavaScript has injected the image.
page, resources = g.wait_for_selector(your_image_css_selector)

How to make .py Python files actually execute in the browser?

I have tried searching online like crazy to no avail. PHP is as simple as naming the file .php and writing PHP. I know people say it's just as simple for Python, but I have found no useful guides for setting it up. I merely want to practice Python on my computer via WAMP or another alternative. I am on Windows Vista.
I cannot get .py files to execute correctly. The actual text:
print("Hello!")
appears just as that, rather than "Hello!". I don't know what to do to make it actually work in my browser.
Any help or pointers towards guides would be greatly appreciated.
PHP does not execute in the browser. It is executed on the server side, and the output is then sent by the web server to the browser.
If you want a simple way to use Python to process web requests, take a look at web.py (http://webpy.org).
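A web.py "hello" looks roughly like this (a minimal sketch, assuming pip install web.py); the return value is produced on the server and only that output reaches the browser:

import web

urls = ("/", "Index")  # map the root URL to the Index class below

class Index:
    def GET(self):
        return "Hello!"

if __name__ == "__main__":
    app = web.application(urls, globals())
    app.run()  # serves on http://localhost:8080/ by default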
Your server has to handle the Python code. Take a look at the Django framework. As for hosting, I can suggest http://webfaction.com.

Website automation using Python

I'm trying to automate a web application validation performed by my team. I have chosen Python as the language to do this, although my experience with Python is very limited. I have done similar things in the past using Perl. Now the problem is that after posting the URL of the website, it redirects to a logon page which is built with JavaScript. From whatever little Python I know, I believe scraping/parsing a website built with JavaScript is not possible. I faced the same issue while doing this with Perl and wasn't able to proceed.
Any pointers or help in resolving the above issue would be highly appreciated.
Thanks
Spynner may help http://code.google.com/p/spynner/
Maybe you can take a look at Selenium. It's a Firefox plugin that enables automation, but it also has a WebDriver system where you can write automation scripts in various languages (including Python) and have a server execute the code in various browsers. I never tried the WebDriver part myself, but that should do what you want.
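With WebDriver, the JavaScript logon page runs inside a real browser, so automating it is mostly a matter of locating the form fields. A rough sketch (the URL and field names are placeholders for your actual logon form):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()  # a real browser, so the JavaScript logon page renders normally
driver.get("https://example.com/login")  # placeholder for your web application's URL

driver.find_element(By.NAME, "username").send_keys("my_user")
driver.find_element(By.NAME, "password").send_keys("my_password")
driver.find_element(By.NAME, "submit").click()

# After logging in, the rendered HTML is available for the validation checks.
print(driver.page_source)
driver.quit()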
