Running Python Scrapers and Database in online environment - python

I hope that this question is not stupid but I am a beginner and I can hardly find good suggestions for what I need.
I run python webscrapers that crawl data from several platforms. So far the whole system is running locally on my Mac. As I will me travelling and need to check the consistency of my scrapers that will operate on a daily basis for at least a month I need to find another solution.
My requirements are:
* Storage for MySQL Database (Would love to use PhPMyAdmin to administrate)
* Enough Processing Power for Python Scrapers
* Necessary Python Modules can be installed
* Data can be accessed at any time and data dumps can be created
I have already found providers like AktiveState (http://www.activestate.com/) or PiCloud (http://www.picloud.com/). They seem to be solutions for the processing of the code.
Does anyone have experience with them?
Can somebody give me a tip for good storage that can communicate with these platforms?
As I am a beginner, I really need something that is easy to use. Thanks!

I usually use Amazon Web Services, Heroku, or Google App Engine, though your choice will depend on your level of expertise and your cost basis.
Heroku is the easiest to use but most costly, followed by GAE, and then AWS is the hardest to use but cheapest.

Related

Coupling a SQL database through a urllib2 to a Python application

Momentarily I am creating a python based application through the programme Ren'py. Now I have to couple the game with a SQL database. The admin on the board of the programme recommended using urllib to do this.
http://lemmasoft.renai.us/forums/viewtopic.php?f=8&t=29954
This is my thread. Now, I've managed to succesfully add the urllib to the program, but I'm lost at the entire "talk to a web service, which would then talk to the sqlite database" segment. Could you perhaps provide any hints/ tips?
I've never worked with Python before, so it's kind of challenging.
The answer you got on that forum isn't very helpful, in that the poster didn't really explain what they meant.
urllib is really the most minor component of the solution they're suggesting. What they actually mean is that you set up an entire web service, hosted on a URL somewhere, which talks to its own database. Your local installations of the app would then use a Python web library to communicate with that remote database over the web.
While this isn't particularly difficult, it is a fair amount of work, especially if you don't have any experience in doing this. You'll need a Python web framework and somewhere to host it. Since you talk about admins needing to log in and view data, you might want to explore Django, which comes with a built-in admin interface.
You'll then need to design an API to allow your Ren'py app to communicate with that web service, and you might want to look at the Django REST framework for that. The final part is getting your app to talk to the web service, which is where the recommendation of urllib comes in - but to be honest, that isn't even a very good recommendation here: the third-party library requests would be much better.
As I say, there's quite a lot of work. A much simpler solution would be to use Python's built-in sqlite3 library to talk to a local database via SQL, but that wouldn't do anything to make people's data available in a central location and would be open to anyone who worked out how to query the database.

Standalone Desktop App with Centralized Database (file) w/o Database Server?

I have been developing a fairly simple desktop application to be used by a group of 100-150 people within my department mainly for reporting. Unfortunately, I have to build it within some pretty strict confines similar to the specs called out in this post. The application will just be a self contained executable with no need to install.
The problem I'm running into is figuring out how to handle the database need. There will probably only be about 1GB of data for the app, but it needs to be available to everyone.
I would embed the database with the application (SQLite), but the data needs to be refreshed every week from a centralized process, so I figure it would be easier to maintain one database, rather than pushing updates down to the apps. Plus users will need to write to the database as well and those updates need to be seen by everyone.
I'm not allowed to set up a server for the database, so that rules out any good options for a true database. I'm restricted to File Shares or SharePoint.
It seems like I'm down to MS Access or SQLite. I'd prefer to stick with SQLite because I'm a fan of python and SQLAlchemy - but based on what I've read SQLite is not a good solution for multiple users accessing it over the network (or even possible).
Is there another option I haven't discovered for this setup or am I stuck working with MS Access? Perhaps I'll need to break down and work with SharePoint lists and apps?
I've been researching this for quite a while now, and I've run out of ideas. Any help is appreciated.
FYI, as I'm sure you can tell, I'm not a professional developer. I have enough experience in web / python / vb development that I can get by - so I was asked to do this as a side project.
SQLite can operate across a network and be shared among different processes. It is not a good solution when the application is write-heavy (because it locks the database file for the duration of a write), but if the application is mostly reporting it may be a perfectly reasonable solution.
As my options are limited, I decided to go with a built in database for each app using SQLite. The db will only need updated every week or two, so I figured a 30 second update by pulling from flat files will be OK. then the user will have all data locally to browse as needed.

python semantic proxy/server, which framework to use?

This year me and a friend have to make a project for the final year of university. The plan is to make a proxy/sever that allows to store ontologies and RDF's, by this way this data is "chained" to a web, so you can make a request for that web and the proxy will send you the homepage with metadata.
We have been thinking to use python and rdflib, and for the web we don't know which framework is the best. We thought of django, but we think that is very big for our purpose, and we decided that webpy or web2py is a better option.
We don't have any python coding experience, this will be our very first time. We have been always programming in c++ and java.
So taking into account everything we've mentioned our question is, which would be the best web framework for our project? And will rdflib suit fine with this framework?
Thanks :)
I have developed several Web applications with Python framworks consuming RDF data. The choice always depends on the performance needed and the amount of data you'll have to handle.
If the number of triples you'll handle is in the magnitude of few thousands then you can easily put together a framework with RDFlib + Django. I have used this choice with toy applications but as soon as you have to deal with lots of data you'll realise that it simply doesn't scale. Not because of Django, the main problem is RDFlib's implementation of a triple store - it is not great.
If you're familiar with C/C++ I recommend you to have a look at Redland libraries. They are written in C and you have bindings for Python so you can still develop your Web layer with Django and pull out RDF data with Python. We do this quite a lot and it normally works. This option will scale a bit more but won't be great either.
In case your data grows to millions of triples then I recommend you to go for a Scalable Triple store. You can access them through SPARQL and HTTP. My choice is always 4store. Here you have a Python client to issue queries and assert/remove data 4store Python Client

Is Google App Engine right for me?

I am thinking about using Google App Engine.It is going to be a huge website. In that case, what is your piece of advice using Google App Engine. I heard GAE has restrictions like we cannot store images or files more than 1MB limit(they are going to change this from what I read in the GAE roadmap),query is limited to 1000 results, and I am also going to se web2py with GAE. So I would like to know your comments.
Thanks
Having developed a smallish site with GAE, I have some thoughts
If you mean "huge" like "the next YouTube", then GAE might be a great fit, because of the previously mentioned scaling.
If you mean "huge" like "massively complex, with a whole slew of screens, models, and features", then GAE might not be a good fit. Things like unit testing are hard on GAE, and there's not a built-in structure for your app that you'd get with something like (famously) (Ruby on) Rails, or (Python powered) Turbogears.
ie: there is no staging environment: just your development copy of the system and production. This may or may not be a bad thing, depending on your situation.
Additionally, it depends on the other Python modules you intend to pull in: some Python modules just don't run on GAE (because you can't talk to hardware, or because there are just too many files in the package).
Hope this helps
using web2py on Google App Engine is a great strategy. It lets you get up and running fast, and if you do outgrow the restrictions of GAE then you can move your web2py application elsewhere.
However, keeping this portability means you should stay away from the advanced parts of GAE (Task Queues, Transactions, ListProperty, etc).
The AppEngine uses BigTable as it's datastore backend. Don't try to write a traditional relational-database driven application. BigTable is much more well suited for use as a highly-scalable key-value store. Avoid joins if at all possible.
I wouldn't worry about any of this. After having played with Google App Engine for a while now, I've found that it scales quite well for large data sets. If your data elements are large (i.e. photos), then you'll need to integrate with another service to handle them, but that's probably going to be true no matter what with data of that size. Also, I've found BigTable relatively easy to work with having come from a background entirely in relational databases. Finally, Django is a somewhat hidden, but awesome, "feature" of Google App Engine. If you've never used it, it's a really nice, elegant web framework that makes a lot of common tasks trivial (forms come to mind here).
Google has just released version 1.3.0 of the SDK with support with a new Blobstore API for storage of files up to 50MB. See the post "App Engine SDK 1.3.0 Released Including Support for Larger User Uploads".
What about Google Wave? It's being built on appengine, and once live, real-time translatable chat reaches the corporate sector... I could see it hitting top 1000th... But then again, that's an internal project that gets to do special stuff other appengine apps can't.... Like hanging threads; I think... And whatever else Wave has under the hood...
If you are planning on a 'huge' website, then don't use App Engine. Simple as that. The App Engine is not built to deliver the next top 1000th website.
Allow me to also ask what do you mean by 'huge', how many simultaneous users? Queries per second? DB load?

Feedback on using Google App Engine? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
Looking to do a very small, quick 'n dirty side project. I like the fact that the Google App Engine is running on Python with Django built right in - gives me an excuse to try that platform... but my question is this:
Has anyone made use of the app engine for anything other than a toy problem? I see some good example apps out there, so I would assume this is good enough for the real deal, but wanted to get some feedback.
Any other success/failure notes would be great.
I have tried app engine for my small quake watch application
http://quakewatch.appspot.com/
My purpose was to see the capabilities of app engine, so here are the main points:
it doesn't come by default with Django, it has its own web framework which is pythonic has URL dispatcher like Django and it uses Django templates
So if you have Django exp. you will find it easy to use
But you can use any pure python framework and Django can be easily added see
http://code.google.com/appengine/articles/django.html
google-app-engine-django (http://code.google.com/p/google-app-engine-django/) project is excellent and works almost like working on a Django project
You can not execute any long running process on server, what you do is reply to request and which should be quick otherwise appengine will kill it
So if your app needs lots of backend processing appengine is not the best way
otherwise you will have to do processing on a server of your own
My quakewatch app has a subscription feature, it means I had to email latest quakes as they happend, but I can not run a background process in app engine to monitor new quakes
solution here is to use a third part service like pingablity.com which can connect to one of your page and which executes the subscription emailer
but here also you will have to take care that you don't spend much time here
or break task into several pieces
It provides Django like modeling capabilities but backend is totally different but for a new project it should not matter.
But overall I think it is excellent for creating apps which do not need lot of background processing.
Edit:
Now task queues can be used for running batch processing or scheduled tasks
Edit:
after working/creating a real application on GAE for a year, now my opnion is that unless you are making a application which needs to scale to million and million of users, don't use GAE. Maintaining and doing trivial tasks in GAE is a headache due to distributed nature, to avoid deadline exceeded errors, count entities or do complex queries requires complex code, so small complex application should stick to LAMP.
Edit:
Models should be specially designed considering all the transactions you wish to have in future, because entities only in same entity group can be used in a transaction and it makes the process of updating two different groups a nightmare e.g. transfer money from user1 to user2 in transaction is impossible unless they are in same entity group, but making them same entity group may not be best for frequent update purposes....
read this http://blog.notdot.net/2009/9/Distributed-Transactions-on-App-Engine
I am using GAE to host several high-traffic applications. Like on the order of 50-100 req/sec. It is great, I can't recommend it enough.
My previous experience with web development was with Ruby (Rails/Merb). Learning Python was easy. I didn't mess with Django or Pylons or any other framework, just started from the GAE examples and built what I needed out of the basic webapp libraries that are provided.
If you're used to the flexibility of SQL the datastore can take some getting used to. Nothing too traumatic! The biggest adjustment is moving away from JOINs. You have to shed the idea that normalizing is crucial.
Ben
One of the compelling reasons I have come across for using Google App Engine is its integration with Google Apps for your domain. Essentially it allows you to create custom, managed web applications that are restricted to the (controlled) logins of your domain.
Most of my experience with this code was building a simple time/task tracking application. The template engine was simple and yet made a multi-page application very approachable. The login/user awareness api is similarly useful. I was able to make a public page/private page paradigm without too much issue. (a user would log in to see the private pages. An anonymous user was only shown the public page.)
I was just getting into the datastore portion of the project when I got pulled away for "real work".
I was able to accomplish a lot (it still is not done yet) in a very little amount of time. Since I had never used Python before, this was particularly pleasant (both because it was a new language for me, and also because the development was still fast despite the new language). I ran into very little that led me to believe that I wouldn't be able to accomplish my task. Instead I have a fairly positive impression of the functionality and features.
That is my experience with it. Perhaps it doesn't represent more than an unfinished toy project, but it does represent an informed trial of the platform, and I hope that helps.
The "App Engine running Django" idea is a bit misleading. App Engine replaces the entire Django model layer so be prepared to spend some time getting acclimated with App Engine's datastore which requires a different way of modeling and thinking about data.
I used GAE to build http://www.muspy.com
It's a bit more than a toy project but not overly complex either. I still depend on a few issues to be addressed by Google, but overall developing the website was an enjoyable experience.
If you don't want to deal with hosting issues, server administration, etc, I can definitely recommend it. Especially if you already know Python and Django.
I think App Engine is pretty cool for small projects at this point. There's a lot to be said for never having to worry about hosting. The API also pushes you in the direction of building scalable apps, which is good practice.
app-engine-patch is a good layer between Django and App Engine, enabling the use of the auth app and more.
Google have promised an SLA and pricing model by the end of 2008.
Requests must complete in 10 seconds, sub-requests to web services required to complete in 5 seconds. This forces you to design a fast, lightweight application, off-loading serious processing to other platforms (e.g. a hosted service or an EC2 instance).
More languages are coming soon! Google won't say which though :-). My money's on Java next.
This question has been fully answered. Which is good.
But one thing perhaps is worth mentioning.
The google app engine has a plugin for the eclipse ide which is a joy to work with.
If you already do your development with eclipse you are going to be so happy about that.
To deploy on the google app engine's web site all I need to do is click one little button - with the airplane logo - super.
Take a look the the sql game, it is very stable and actually pushed traffic limits at one point so that it was getting throttled by Google. I have seen nothing but good news about App Engine, other than hosting you app on servers someone else controls completely.
I used GAE to build a simple application which accepts some parameters, formats and send email. It was extremely simple and fast. I also made some performance benchmarks on the GAE datastore and memcache services (http://dbaspects.blogspot.com/2010/01/memcache-vs-datastore-on-google-app.html ). It is not that fast. My opinion is that GAE is serious platform which enforce certain methodology. I think it will evolve to the truly scalable platform, where bad practices simply not allowed.
I used GAE for my flash gaming site, Bearded Games. GAE is a great platform. I used Django templates which are so much easier than the old days of PHP. It comes with a great admin panel, and gives you really good logs. The datastore is different than a database like MySQL, but it's much easier to work with. Building the site was easy and straightforward and they have lots of helpful advice on the site.
I used GAE and Django to build a Facebook application. I used http://code.google.com/p/app-engine-patch as my starting point as it has Django 1.1 support. I didn't try to use any of the manage.py commands because I assumed they wouldn't work, but I didn't even look into it. The application had three models and also used pyfacebook, but that was the extent of the complexity. I'm in the process of building a much more complicated application which I'm starting to blog about on http://brianyamabe.com.

Categories

Resources