My site, http://whatgoeswiththis.co, has a scraper that takes images from the web and posts them to our site. I can get server-rendered images no problem, but for sites like https://www.everlane.com/collections/mens-luxury-tees/products/mens-crew-antique, the images are rendered client-side with JavaScript.
I've succeeded in writing a script on my local machine that uses ghost.py to scrape the images from this site.
However, I've had to install various programs on my laptop, like Qt, PySide, PyQt4, and XQuartz. To my knowledge, these aren't libraries I can just add to my app. My question is: can this stack be added to my existing Django app so that users can scrape these JavaScript-injected images? Or is there another solution better suited to web apps?
Sites like http://wanelo.com are able to scrape these images - is there something in particular they're using that is an optimal solution?
Thanks for your help, and I apologize if I sound inexperienced (I am but learning!).
My current answer is: ghost.py may work, but only after a lot of prerequisites that I found difficult to install and configure. My solution was to follow Pyklar's advice and use PhantomJS through the selenium library, as described here: https://stackoverflow.com/a/15699761/2532070.
I was able to switch from BeautifulSoup to selenium/PhantomJS simply by changing a few lines of code, running brew install phantomjs, and running pip install selenium.
I hope this helps someone avoid the same struggle!
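For reference, here is a minimal sketch of what that switch looks like (the URL and selector are placeholders, using the Selenium 3-era API that PhantomJS worked with):

from selenium import webdriver

# phantomjs must be on your PATH (brew install phantomjs)
driver = webdriver.PhantomJS()
driver.get("https://www.everlane.com/collections/mens-luxury-tees")  # placeholder URL
# by now the page's JavaScript has run, so client-side images are in the DOM
images = driver.find_elements_by_css_selector("img")
print([img.get_attribute("src") for img in images])
driver.quit()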
With ghost.py, you can do something like:

from ghost import Ghost

g = Ghost()
g.open(url, wait=False)  # don't block on the initial load
# wait until an element matching your (placeholder) CSS selector is injected
page, resources = g.wait_for_selector(your_image_css_selector)
I am trying to create a script that gets the data from a Google Keep list. I was thinking Google Takeout might do part of what I want, but I cannot find an API to automate the downloads. Does anyone know a way to grab this data via a script (Python/bash) so that I can easily extract what I need?
I am not sure if it is allowed or not, but you could log in with an HTTP session (e.g. requests) and then parse the pages you need with BeautifulSoup.
I've written a quite similar script in Python; you can find it on GitHub. I think it's pretty self-explanatory, but if you require any more help feel free to ask.
You could use the selenium library for that. I used it to scrape the keep.google.com page for all of my notes and export them to a CSV file. This might be helpful; I made the script to back up my notes to my computer:
https://github.com/darshkpatel/GoogleKeep_Backup
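As a rough illustration of that approach, a minimal sketch (hedged: the note selector is an assumption, since Google changes Keep's markup, and it assumes you sign in manually in the opened browser window):

import csv
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://keep.google.com")
input("Sign in, then press Enter once your notes are visible... ")
# ".note" is a placeholder class; inspect the page for the real one
notes = driver.find_elements_by_css_selector(".note")
with open("notes.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for note in notes:
        writer.writerow([note.text])
driver.quit()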
There is no API for Google Keep at this time. I don't think you're going to be able to automate Google Takeout either; the best you can do is run it manually, then create your own application to import the data wherever it is you want it.
Here is an automated solution for this question: https://github.com/Dmitry9/exportKeep
Or just execute these commands in the terminal:
git clone https://github.com/Dmitry9/exportKeep.git;
cd exportKeep;
npm install;
npm run scrape;
After all dependencies are installed (which could take a minute or so), a Chrome instance will navigate to the sign-in page. After you post your credentials, it will scroll to the bottom of the window to force the browser to load all of the notes into the DOM. Inspecting the terminal output, you will find the path to the saved JSON file.
In the meantime there is an API; see here: https://developers.google.com/keep/api/reference/rest
Also, there is a Python library that implements this API (I'm not the author of the library): https://github.com/kiwiz/gkeepapi
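A minimal sketch based on gkeepapi's documented quick-start (the credentials are placeholders, and Google may require an app password):

import gkeepapi

keep = gkeepapi.Keep()
keep.login("you@gmail.com", "app-password")  # placeholder credentials
# print the title and body of every note
for note in keep.all():
    print(note.title, note.text)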
How can I update a Python script remotely? I have a program which I would like to share; however, it will be frequently updated, so I want to be able to update it remotely so that users do not have to re-install it every day. I have already searched Stack Overflow for an answer but did not find anything I could understand. Any help will be mentioned in the project's credits!
A very good solution would be to build a web app. You can use Django, Bottle, or Flask, for example.
Your users just connect to your url with a browser. You are in complete control of the code, and can update whenever you want without any action on their part.
They also do not need to install anything in the first place, and browsers nowadays provide a lot of flexibility and dynamic content.
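To make that concrete, a minimal Flask sketch (the route and body are placeholders for whatever your script does); once it runs on your server, editing the function there updates the behavior for every user on their next request:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # replace this with what your script currently does;
    # changing it on the server updates it for all users at once
    return "Hello from the latest version!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)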
I have a Python program that works with Selenium and PhantomJS, and I’d like to distribute it. The functionality is quite simple; it goes onto a website, fills certain forms and returns the outcome, without any visible browser action.
The problem is that I can’t expect an arbitrary user to have PhantomJS installed on their computers. How should I approach the distribution process?
I already checked Setuptools and PythonAnywhere, but I don’t think they work for what I want.
Edit: it may be too hopeful, but I'd like to be able to distribute it for Windows, OS X, and Ubuntu.
The way I do it is through a web application built on Flask (one of many great Python web frameworks) and hosted on PythonAnywhere.
To use PhantomJS and Selenium in PythonAnywhere you have to ask for Docker Consoles. Instructions here: https://www.pythonanywhere.com/forums/topic/1320/
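If you would rather ship the script itself than host it, one possible approach (a sketch under assumptions, not what the answer above does) is to bundle a PhantomJS binary per platform alongside your code and point Selenium at it; executable_path is a real Selenium parameter, but the vendor/ layout here is hypothetical:

import os
from selenium import webdriver

# assume a packaged phantomjs binary sits next to this file (hypothetical layout)
here = os.path.dirname(os.path.abspath(__file__))
phantomjs = os.path.join(here, "vendor", "phantomjs")

driver = webdriver.PhantomJS(executable_path=phantomjs)
driver.get("https://example.com/form")  # placeholder URL
# ... fill the forms and read the outcome here ...
driver.quit()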
I want to make a web crawler using Python that downloads the PDF files it finds at a given URL.
Can anyone help me get started?
A good place to start is ScraperWiki, a site where you can write and execute scrapers/crawlers online. Besides other languages, it supports Python, and it provides a lot of useful tutorials and libraries for a fast start.
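For a local starting point, here is a minimal sketch using requests and BeautifulSoup (the start URL is a placeholder): fetch a page, collect links ending in .pdf, and save each file.

import os
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

start_url = "https://example.com/papers"  # placeholder
soup = BeautifulSoup(requests.get(start_url).text, "html.parser")

for a in soup.find_all("a", href=True):
    href = urljoin(start_url, a["href"])  # resolve relative links
    if href.lower().endswith(".pdf"):
        name = os.path.basename(href)
        with open(name, "wb") as f:
            f.write(requests.get(href).content)

A real crawler would also follow page links recursively and keep a set of visited URLs, but this shows the core download loop.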
I'm trying to automate a web application validation performed by my team. I have chosen Python as the language to do this, although my experience with Python is very limited. I have done similar things in the past using Perl. The problem is that after posting the URL of the website, it redirects to a logon page built with JavaScript. From whatever little Python I know, I believe scraping/parsing a website built with JavaScript is not possible. I faced the same issue when doing this with Perl and wasn't able to proceed.
Any pointers or help in resolving the above issue would be highly appreciated.
Thanks
Spynner may help: http://code.google.com/p/spynner/
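A minimal sketch of Spynner usage, following its documented examples (the URL is a placeholder):

import spynner

browser = spynner.Browser()
browser.load("http://example.com/login")  # the page's JavaScript is executed
print(browser.html)  # the DOM after scripts have run
browser.close()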
Maybe you can take a look at Selenium. It started as a Firefox plugin that enables automation, but it also has a WebDriver system where you can write automation scripts in various languages (including Python), and a server executes the code in various browsers. I never tried the WebDriver part myself, but it should do what you want.
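For the JavaScript logon page specifically, a hedged WebDriver sketch (the URL and element IDs are assumptions; inspect the real page for the actual ones):

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://example.com/login")  # placeholder URL
# the browser runs the page's JavaScript, so the rendered form is reachable
driver.find_element_by_id("username").send_keys("myuser")      # hypothetical id
driver.find_element_by_id("password").send_keys("mypassword")  # hypothetical id
driver.find_element_by_id("login-button").click()              # hypothetical id
print(driver.page_source)  # the page after logging in
driver.quit()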