I have many tasks that need to be run via Scrapy on a production server.
My manager wants to add or remove the URLs to scrape, and he wants a web interface for that.
I am thinking of building a web app for it.
I have found this project:
https://github.com/holgerd77/django-dynamic-scraper/
I just want to know whether I can use it in production, or whether I can call Scrapy manually from my Django app and skip that app altogether.
I have tried it and it looks all right to me. It also has good documentation.
It is good if you are new and want to get things going, but I think once you know Scrapy and Django in detail you almost don't need it.
Python modules, including scrapy, are most of the time plug-and-play so yes, you should be able to do it with scrapy itself. And you will probably benefit from clearer documentation.
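If you go the plain-Scrapy route, one simple way to trigger a crawl from a Django view or management command is to launch the `scrapy crawl` command in a separate process. This is only a sketch: the spider name, the `url_list` argument and the project directory are hypothetical, and your spider would have to read and split that argument itself.

```python
import subprocess


def build_crawl_command(spider_name, urls):
    """Build the argument list for the `scrapy crawl` command.

    `-a NAME=VALUE` is Scrapy's way of passing an argument to a spider.
    `url_list` is a hypothetical argument name; the spider has to read it
    and split it on commas (so URLs containing commas would need another
    encoding).
    """
    return [
        "scrapy", "crawl", spider_name,
        "-a", "url_list=" + ",".join(urls),
    ]


def run_spider(spider_name, urls, project_dir):
    """Run the spider in its own process.

    `project_dir` must be the directory that contains scrapy.cfg.
    Raises subprocess.CalledProcessError if the crawl exits with an error.
    """
    return subprocess.run(
        build_crawl_command(spider_name, urls),
        cwd=project_dir,
        check=True,
    )
```

From Django, the URLs would typically come from a model that your manager edits through the admin, and you would call something like `run_spider("news", urls, project_dir)` from a management command or a background job rather than directly inside a request.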
I would like to know what is the fastest way to turn a simple Python script into a basic web app.
For example, say I would like to create a web app that takes a keyword from the user and displays the most retweeted tweet on Twitter. If I write a Python script that is capable of performing that task using Twitter's API, how would I go about turning it into a web app for people to access?
I have looked at frameworks such as Django, but it would take me weeks or months to learn how to use it. I just need something quick and simple. Any such alternatives?
Make a CGI script out of it. You basically get the request information from the webserver via environment variables and you print the desired HTML to stdout. There are helper libraries such as Werkzeug which help with abstracting away the handling of the environment variables by wrapping them in a Request object.
This technique is quite outdated and isn't normally used nowadays, because the script has to be started for every request and so pays the startup cost every time.
Nevertheless, this may actually be a good solution for you, because it is quick and almost every web server supports it.
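To make that concrete, here is a minimal sketch of such a CGI script using only the standard library. The `keyword` parameter name and the page content are made up for illustration; a real script would call your Twitter code where the placeholder paragraph is built.

```python
#!/usr/bin/env python3
import html
import os
from urllib.parse import parse_qs


def render_page(query_string):
    """Return the complete CGI response (headers plus HTML) as a string."""
    params = parse_qs(query_string)
    # "keyword" is a hypothetical form field name.
    keyword = params.get("keyword", [""])[0]
    body = "<html><body><p>You searched for: {}</p></body></html>".format(
        html.escape(keyword)  # never echo user input unescaped
    )
    # A CGI response is the headers, a blank line, then the body.
    return "Content-Type: text/html; charset=utf-8\n\n" + body


if __name__ == "__main__":
    # The web server passes the part of the URL after "?" in QUERY_STRING.
    print(render_page(os.environ.get("QUERY_STRING", "")))
```

The web server runs this file once per request, which is exactly the startup cost mentioned above.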
I have a basic Django web application running on Heroku. I would like to add a spider that crawls some websites (e.g. with Scrapy) as a scheduled task (e.g. via APScheduler) and loads the collected data into some of the Django database tables.
Does anybody know of documentation or examples that cover the basics of this kind of integration? I find it very hard to figure out.
I have not used Scrapy at all, but I'm actually working with APScheduler and it's very simple to use. So my first guess would be to use a BackgroundScheduler (inside your Django app) and add a job to it that would execute a callable "spider" periodically.
The open question is how to embed a Scrapy project inside your Django app so that you can access one of its "spiders" and use it as the callable in your scheduled job.
I may not be helping much, but I'm just trying to point you in a starting direction. I'm pretty sure that if you read the Scrapy documentation carefully you'll find your way.
Best.
I've been looking into microframeworks for Python and have come across two interesting options, Flask and Bottle. They have some similar features. One thing I noticed is that all the example sites show all the application code located inside a single Python file. Obviously, for even moderately sized sites, this would become difficult to manage quite quickly. Does either (or both) of these frameworks support being broken up among different files, and if so, how would that be accomplished?
I'm familiar with Django and like how it's a little more structured, but I would rather use something more lightweight that is still powerful.
I don't have any experience with Bottle, but take a look at the Flask docs on larger applications. My Flask apps all use multiple Flask Module objects as that page recommends, one per Python module, and it seems to work just fine.
One thing that's nice about the Module objects is that you can customize dispatch on each one to create URL routing "domains" in your app. So for example, I'm trying to ape a Windows app in some of my code so I have a CaseInsensitiveModule that does case-insensitive dispatch, and I rigged up a RemoteModule to turn HTTP requests into Python methods using the Flask/Werkzeug routing system.
(Note that in current Flask versions, Modules are now Blueprints.)
I can't see how there could be any way of stopping this from working. Flask and Bottle, like Django, are just Python underneath, and Python allows you to break up your code into modules. As long as you import the relevant functions into the main script, they will work exactly as if they were defined there.
I know a few people have started using my own article on doing this with Flask, although there are obviously other ways to do it depending on the size of the project; even I drop the directory type module for a file based one for smaller projects. Have a look at http://www.cols-code-snippets.co.uk/2011/02/my-take-on-flask-application-skeleton.html
I recently posted a sort of tutorial on how to get started with Bottle+Jinja2 in Google App Engine. My emphasis here is on how to organize the project files. You may be able to get something that you can use: http://codeaspoetry.wordpress.com/2011/11/27/how-to-build-a-web-app-using-bottle-with-jinja2-in-google-app-engine/
It really depends on what you are trying to achieve. For microservices and small applications or websites, Bottle is very straightforward and lightweight. If you expect your application to grow over time, Flask might be a good option for you because it has a lot of extensions. We have about 40 to 50 microservices written in Bottle and have never faced any issues.
I know that with SimpleHTTPServer I can make my directories accessible to web browsers over the Internet. I run just one line of code and, as a result, another person working on another computer can use his/her browser to see the contents of my directories.
But I wonder if I can do more complicated things. For example, somebody uses his/her browser to load my Python program with a set of parameters (example.py?x=2&y=2) and, as a result, sees the HTML page generated by the Python program (not the Python program itself).
I also wonder if I can process an HTML form submitted to SimpleHTTPServer.
While it is possible, you have to do pretty much everything yourself (parsing request parameters, handling routing, etc.).
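To give an idea of what "everything yourself" means, here is a rough sketch for the example.py?x=2&y=2 case. It uses `http.server`, the Python 3 home of the old BaseHTTPServer/SimpleHTTPServer modules. The handler name, the port and the "add two numbers" page are assumptions for illustration only, and there is no input validation.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def add_page(query):
    """Build the HTML for a query string such as 'x=2&y=2'."""
    params = parse_qs(query)
    # Missing parameters default to 0; non-numeric values raise ValueError.
    x = int(params.get("x", ["0"])[0])
    y = int(params.get("y", ["0"])[0])
    return "<html><body>{} + {} = {}</body></html>".format(x, y, x + y)


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path == "/example.py":  # hand-written routing
            body = add_page(url.query).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)


def serve(port=8000):
    """Serve requests until interrupted."""
    HTTPServer(("", port), Handler).serve_forever()
```

Calling `serve()` and opening http://localhost:8000/example.py?x=2&y=2 shows the generated page instead of the program's source. Handling a submitted form would need a similar `do_POST` method that reads and parses the request body.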
If you are not looking to get experience in creating web frameworks but just want to create a small site, you should probably use a minimalistic framework instead.
Try Bottle, a simple single-file web framework: http://bottlepy.org
Maybe the VerseMatch project and the related recipes over at ActiveState are something you would be interested in examining? It implements a small application using the standard library for dynamic running.
Have you considered using CGIHTTPServer instead of SimpleHTTPServer? Then you can toss your scripts into cgi-bin and they'll execute. You have to include a Content-Type header and whatnot, but if you're looking for quick and dirty, it's really convenient.
I'm just starting out with Python and have practiced so far in the IDLE interface. Now I'd like to configure Python with MAMP so I can start creating really basic webapps — using Python inside HTML, or well, vice-versa. (I'm assuming HTML is allowed in Python, just like PHP? If not, are there any modules/template engines for that?)
What modules do I need to install to run .py files from my localhost? Googling a bit, it seems there are various methods (mod_python, FastCGI, etc.). Which one should I use, and how do I install it with MAMP Pro 1.8.2?
Many thanks
I think probably the easiest way for you to get started is to work with something like Django. It's a top-to-bottom web development stack which provides you with everything you need to develop and run a backend server. Things can be very simple in that world, no need to mess around with mod_python or FastCGI unless you really have the need.
It's also nice because it conforms to WSGI, which is a Python standard which allows you to plug together unrelated bits of reusable code to add specific functionality to your web app when needed (say for example on-the-fly gzip compression, or OpenID authentication). Once you have outgrown the default Django stack, or want to change something specific you can go down this road if you want.
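As a rough illustration of that plug-together idea, here is a sketch of a tiny WSGI application wrapped by a gzip middleware, using only the standard library. The names `hello_app` and `gzip_middleware` are made up, and the middleware is deliberately simplified: it buffers the whole response and ignores the `write()` callable and `close()` handling that a production middleware would need.

```python
import gzip


def hello_app(environ, start_response):
    """A minimal WSGI application."""
    body = b"Hello, WSGI! " * 50
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def gzip_middleware(app):
    """Wrap any WSGI app and gzip its body if the client accepts gzip."""
    def wrapped(environ, start_response):
        if "gzip" not in environ.get("HTTP_ACCEPT_ENCODING", ""):
            return app(environ, start_response)

        captured = {}

        def capture(status, headers, exc_info=None):
            # Remember what the inner app wanted to send.
            captured["status"] = status
            captured["headers"] = headers

        body = gzip.compress(b"".join(app(environ, capture)))
        # Drop headers that no longer match the compressed body.
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() not in ("content-length", "content-encoding")]
        headers += [("Content-Encoding", "gzip"),
                    ("Content-Length", str(len(body)))]
        start_response(captured["status"], headers)
        return [body]
    return wrapped


# The wrapped object is itself a WSGI app, so any WSGI server can run it.
app = gzip_middleware(hello_app)
```

The point is that `hello_app` knows nothing about compression; the same wrapper would work around a Django project's WSGI application object as well.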
Those are a few pointers to get you started. You could also look at alternative frameworks such as TurboGears or Paste if you wanted, but Django is a great way to get something up and running quickly. Anyway, I'm sure you'll enjoy the experience: WSGI makes it a real joy knocking up web apps with the wealth of Python code you'll find on the web.
[edit: you may find it helpful to browse some of the many Django-related questions here on Stack Overflow if you run into problems]
You asked whether HTML is allowed within Python, which indicates that you still think too much in PHP terms about it. Contrary to PHP, Python was not designed to create dynamic web-pages. Instead, it was designed as a stand-alone, general-purpose programming language. Therefore you will not be able to put HTML into Python. There are some templating libraries which allow you to go the other way around, somewhat, but that's a completely different issue.
With things like Django or TurboGears or all the other web-frameworks, you essentially set up a small, stand-alone web-server (which comes bundled with the framework so you don't have to do anything), tell the server which function should handle what URL and then write those functions. In the simplest case, each URL you specify has its own function.
That 'handler function' (or 'view function' in Django terminology) receives a request object in which interesting info about the just-received request is contained. It then does whatever processing is required (a DB query for example). Finally, it produces some output, which is returned to the client. A typical way to get the output is to have some data passed to a template where it is rendered together with some HTML.
So, the HTML is separated in a template (in the typical case) and is not in the Python code.
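The flow described above (URL mapped to a handler, handler does its processing, template renders the HTML) can be sketched without any framework. This is not Django's API: the plain dict standing in for the request object, the `URLS` table and the `string.Template` page are all stand-ins chosen for illustration.

```python
import html
from string import Template

# The HTML lives in a template, separate from the Python logic.
PAGE = Template("<html><body><h1>Hello, $name!</h1></body></html>")


def hello_view(request):
    """Handler ('view') function: takes a request, returns rendered HTML."""
    # `request` is a plain dict here; a framework would pass a richer object.
    name = request.get("name", "world")
    return PAGE.substitute(name=html.escape(name))


# Tell the "server" which function handles which URL.
URLS = {"/hello": hello_view}


def dispatch(path, request):
    """Find the view for a URL and call it, as a framework would."""
    view = URLS.get(path)
    if view is None:
        return "404 Not Found"
    return view(request)
```

A real framework adds the web server, a proper request object and a far more capable template language, but the division of labour is the same.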
About Python 3: I think you will find that the vast majority of all Python development going on in the world is still with Python 2.*. As others have pointed out here, Python 3 is just coming out, most of the good stuff is not available for it yet, and you shouldn't be bothered about that.
My advice: Grab yourself Python 2.6 and Django 1.1 and dive in. It's fun.
Django is definitely not the easiest way.
Check out Pylons: http://pylonshq.com/
Also check out SQLAlchemy for SQL-related work. It's a very cool library.
On the other hand, you can always start with something very simple, like Mako for templating: http://www.makotemplates.org/