I am building a database that has two clients: 1) a .NET web server that makes the data visible to users in some fashion, and 2) a Python data miner that creates the data and populates the tables.
I have several options. I can use the .NET Entity Framework to create the database and then reverse engineer it on the Python side, or I can do it the other way around. I could also just write raw SQL statements in one of the systems, or in both. What are the possible pitfalls of each approach? I'm worried, for example, that if I use the Python ORM to create the tables, then I'm going to have a hard time on the .NET side...
I love questions like that.
Here is what you have to consider: your web site has to be fast, and the bottleneck of most web sites is the database. So the answer to your question is: make it easy for .NET to work with the SQL schema. That will require a little more work on the Python side, like specifying explicit table names and possibly column names. I think Django and SQLAlchemy are both good for that.
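For instance, a minimal SQLAlchemy sketch (all names here are illustrative, and this assumes SQLAlchemy 1.4+) that pins down table and column names so that EF's reverse engineering on the .NET side has nothing to guess about:

```python
from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class MinedRecord(Base):
    # Explicit, .NET-friendly names so Entity Framework can map the
    # table cleanly instead of guessing at ORM-generated identifiers.
    __tablename__ = "MinedRecords"

    Id = Column("Id", Integer, primary_key=True)
    Source = Column("Source", String(200), nullable=False)
    CollectedAt = Column("CollectedAt", DateTime, nullable=False)
```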
Another solution could be to have a bridge between the database holding the gathered data and the database used to display it. In the background you can have a task/job that migrates collected data into your main database. That is also an option and will make your job easier; at least all the database-specific and strange code will be confined to that third component.
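A rough sketch of such a bridge job, using pandas to ferry batches across (connection strings and table names are made up for illustration):

```python
import pandas as pd
from sqlalchemy import create_engine

# Both connection strings and both table names are placeholders.
staging = create_engine("postgresql://miner:pw@localhost/staging")
main = create_engine("postgresql://web:pw@localhost/main")

# Pull the freshly mined rows and append them to the display table.
batch = pd.read_sql("SELECT * FROM mined_data", staging)
batch.to_sql("display_data", main, if_exists="append", index=False)
```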
I worked with .NET for quite a long time before I switched to Python, and what you should know is that whichever strategy you choose, it will be possible to work with the data in both languages and ORMs. Do the hardest part of the job in the language you know better. If you are a Python developer, pick Python to sort out the right names for tables and columns.
I am new to working with databases, and I have been given the task of combining data from two large databases as part of an internship program (heavily focused on the learning experience, but not many people at the job are familiar with databases). The options are either to create a new table or database, or to make a front-end that pulls data from both databases. Is it possible to just make a front-end for this? There is an issue of storage if a new database has to be created.
I'm still at the stage where I'm trying to figure out exactly how I'm going to go about doing this, and how to access the data in the first place. I have the table data for the two databases that already exist, and I know which items need to be pulled from both. The end goal is a website where the user can input one of the values and get back all the information about that item. One of the databases is an Oracle SQL database and the other is a Cisco Prime database. I am planning to work in Python if possible. Any guidance on this would be very helpful!
Yes, it is perfectly OK to access both data sources from a single frontend.
Having said that, it might be a problem if you need to combine data from both data sources in large quantities, because you might have to reimplement relational database operations such as join, sort, and group by in your own code.
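For example, even a simple inner join across the two sources ends up as hand-written dictionary work, something like this sketch (field names invented):

```python
# Rows as returned (hypothetically) by each driver, keyed on a shared ID.
oracle_rows = [{"item_id": 1, "name": "switch-a"}]
prime_rows = [{"item_id": 1, "status": "reachable"}]

# A hand-rolled hash join: index one side, then probe it with the other.
by_id = {row["item_id"]: row for row in prime_rows}
joined = [
    {**o, **by_id[o["item_id"]]}
    for o in oracle_rows
    if o["item_id"] in by_id
]
print(joined)  # [{'item_id': 1, 'name': 'switch-a', 'status': 'reachable'}]
```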
Python is perfectly OK for connecting to Oracle data sources. I'm not so sure about Cisco Prime (which is an unusual database).
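For the Oracle side, the usual route is the cx_Oracle driver; a minimal sketch, with all connection details and the table name as placeholders:

```python
import cx_Oracle  # Oracle's Python driver (nowadays also published as python-oracledb)

# Host, port, service name, credentials, and table are all illustrative.
dsn = cx_Oracle.makedsn("dbhost.example.com", 1521, service_name="ORCLPDB1")
conn = cx_Oracle.connect(user="reader", password="secret", dsn=dsn)

cursor = conn.cursor()
cursor.execute("SELECT item_id, name FROM inventory WHERE item_id = :id", id=42)
print(cursor.fetchall())
```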
I would recommend using Linux or macOS (not Windows) if you are new to Python, since both platforms are more Python-friendly than Windows.
Background:
I am developing a Django app for a business application that takes client data and displays charts in a dashboard. I have large databases full of raw information, such as part sales by customer, which I will use to populate the analyses. I have been able to do this very nicely in the past using Python with pandas, xlsxwriter, etc., and am now in the process of replicating that work in this web app. I am using a PostgreSQL database to store the data, Django to build the app, and FusionCharts for the visualization. To get the information into Postgres, I am using a Python script with SQLAlchemy, which does a great job.
The question:
There are two ways I can manipulate the data that will populate the charts. 1) I can use the same script that exports the data to Postgres to arrange the data the way I want it before it is exported. For instance, in certain cases I need to group the data by some parameter (by customer, for instance), then perform column-wise calculations on the groups (see the sketch after these two options). I could do this for each slice I want and then export a different table for each model class to Postgres.
2) I can upload the entire database to Postgres and manipulate it later with Django queries that get translated into SQL.
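For concreteness, the pre-aggregation in option 1 looks roughly like this in my current script (the file, column, and table names are placeholders):

```python
import pandas as pd
from sqlalchemy import create_engine

# Connection string, input file, and table names are placeholders.
engine = create_engine("postgresql://user:password@localhost/dbname")
raw = pd.read_csv("part_sales.csv")  # one row per sale

# Group by customer and compute the per-group columns up front...
by_customer = raw.groupby("customer").agg(
    total_sales=("amount", "sum"),
    order_count=("amount", "count"),
)

# ...then export each pre-computed slice as its own table.
by_customer.to_sql("sales_by_customer", engine, if_exists="replace")
```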
I am much more comfortable doing it up front with Python because I have been doing it that way for a while. I also understand that Django's queries are a little more difficult to implement. However, doing it with Python would mean I need more tables (because I will have grouped the data in different ways), and I don't want to do it the way I already know just because it is easier, if uploading a single database and using Django/SQL queries would be more efficient in the long run.
Any thoughts or suggestions are appreciated.
Well, it's the usual tradeoff between performance and flexibility. With the first approach you get better performance (your schema is tailored for the exact queries you want to run) but less flexibility: if you need to add more queries, the schema might not match so well, or even not match at all, in which case you'll have to repopulate the database, possibly from the raw sources, with an updated schema. With the second approach you (hopefully) have a well-normalized schema, but one that makes queries much more complex and much heavier on the database server.
Now the question is: do you really have to choose? You could also keep the fully normalized data AND the denormalized (pre-processed) data alongside each other.
As a side note: Django's ORM is indeed very much an "80/20" tool. It's designed to make the 80% of simple queries super easy (much easier than, say, SQLAlchemy), and beyond that it does become a bit of a PITA. But nothing forces you to use Django's ORM for everything; you can always drop down to raw SQL or use SQLAlchemy alongside it.
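Dropping below the ORM for the odd hairy query is as simple as this sketch (table and column names are placeholders):

```python
from django.db import connection

def sales_by_customer():
    # Plain DB-API cursor on Django's default connection; the ORM is
    # bypassed entirely for this one aggregate query.
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT customer_id, SUM(amount) AS total "
            "FROM app_partsale GROUP BY customer_id"
        )
        return cursor.fetchall()
```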
Oh, and yes: your problem is nothing new. You may want to read about OLAP.
I'm working on a project (written in Django) which has only a few entities, but many rows for each entity.
In my application I have several static "reports", written directly in plain SQL. The users can also search the database via a generic filter form. Since the target audience is really tech-savvy, and at some point the filter won't fit their needs, I am thinking about creating a query language for my database, like YQL or Jira's advanced search.
I found http://sourceforge.net/projects/littletable/ and http://www.quicksort.co.uk/DeeDoc.html, but it seems that they only operate on in-memory objects. Since the database can be too large to hold in memory, I would prefer that the query be translated into SQL (or better, a Django query) before doing the actual work.
Are there any libraries or best practices for doing this?
Writing such a DSL is actually surprisingly easy with PLY, and, what ho, there's already an example available for doing just what you want in Django. You see, Django has this fancy thing called a Q object which makes the Django querying side of things fairly easy.
At DjangoCon EU 2012, Matthieu Amiguet gave a session entitled Implementing Domain-specific Languages in Django Applications in which he went through the process, right down to implementing such a DSL as you desire. His slides, which include all you need, are available on his website. The final code (linked to from the last slide, anyway) is available at http://www.matthieuamiguet.ch/media/misc/djangocon2012/resources/compiler.html.
Reinout van Rees also produced some good comments on that session. (He normally does!) These cover a little of the missing context.
In the examples given there, you'll see something very similar to YQL and JQL:
groups__name="XXX" AND NOT groups__name="YYY"
(modified > 1/4/2011 OR NOT state__name="OK") AND groups__name="XXX"
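Under the hood, the first of those expressions reduces to a Q combination along these lines (MyModel stands in for whatever model you are filtering):

```python
from django.db.models import Q

# groups__name="XXX" AND NOT groups__name="YYY"
query = Q(groups__name="XXX") & ~Q(groups__name="YYY")
results = MyModel.objects.filter(query)  # MyModel is a placeholder model
```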
It can also be tweaked very easily; for example, you might want to use groups.name rather than groups__name (I would). This modification could be made fairly trivially (allow . in the FIELD token, by modifying t_FIELD, and then replacing . with __ before constructing the Q object in p_expression_ID).
So, that satisfies simple querying; it also gives you a good starting point should you wish to make a more complex DSL.
I've faced exactly this problem: a large database which needs searching. I made some static reports and several fancy filters using Django (it makes this very easy), just like you have.
However, the power users were clamouring for more. I decided that there already was a DSL they all knew: SQL. The question was how to make it secure enough.
So I used Django permissions to give the power users permission to store SQL queries in a new table. I then made a view that let the not-quite-so-power users run those queries, and made the queries take optional parameters. The queries were run using Python's lower-level DB-API, which Django uses under the hood for its ORM anyway.
The real trick was opening a read-only database connection to run these queries, just to make sure that no updates could ever run. I made the connection read-only by creating a different database user with lower permissions and opening a dedicated connection as that user in the view.
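In Django terms, that boils down to a second entry in settings.DATABASES backed by a SELECT-only database user, plus a view helper along these lines (the "readonly" alias is whatever you choose to call it):

```python
from django.db import connections

def run_saved_query(sql, params=None):
    # Execute a stored power-user query on the read-only connection.
    # "readonly" must be configured in settings.DATABASES with a
    # database user that has been granted SELECT privileges only, so
    # even a malicious saved query cannot modify anything.
    with connections["readonly"].cursor() as cursor:
        cursor.execute(sql, params or [])
        return cursor.fetchall()
```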
TL;DR - SQL is the way to go!
Depending on the form of your data, the types of queries your users need to run, and how frequently your data is updated, an alternative to the pure SQL solution suggested by Nick Craig-Wood is to index your data in Solr and then run queries against it.
Solr is an added layer of complexity (configuration, data synchronization) but it is super-fast, can handle large datasets, and provides a (relatively) intuitive query language.
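If you go that route, the pysolr client keeps the Python side small (the core URL and field names below are made up):

```python
import pysolr

# Point the client at your Solr core; URL and fields are illustrative.
solr = pysolr.Solr("http://localhost:8983/solr/mycore", timeout=10)

# Solr's query syntax is close to what tech-savvy users expect:
results = solr.search('name:"XXX" AND NOT group:"YYY"', rows=20)
for doc in results:
    print(doc)
```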
You could actually write your own SQL-ish language using pyparsing. There is even a pretty verbose example you could extend.
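A minimal flavour of what that looks like, in the spirit of pyparsing's simpleSQL example (this toy grammar only handles a bare SELECT ... FROM ...):

```python
from pyparsing import CaselessKeyword, Group, Word, alphanums, alphas, delimitedList

SELECT = CaselessKeyword("select")
FROM = CaselessKeyword("from")
ident = Word(alphas, alphanums + "_")

# A toy grammar: SELECT col [, col ...] FROM table
stmt = SELECT + Group(delimitedList(ident))("columns") + FROM + ident("table")

parsed = stmt.parseString("SELECT name, age FROM users")
print(parsed["table"], list(parsed["columns"]))  # users ['name', 'age']
```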
It's been some time since I heard any discussion on this, and since I am in the decision-making process of choosing between Ruby and Python for a web project, here come the questions:
[1] Can current versions of Rails (Ruby) and Django (Python) query more than one database at a time?
[2] I also read on SO that "If your focus is building web-sites or web-applications go Ruby" (because it has the fully featured, web-focused Rails). But that was about 2 years ago. What's the state of the Python web framework Django today? Is it head-to-head with Rails now?
EDIT: [3] Don't know if I can ask this here, but it's really surprising how quickly the Stack Exchange sites load. Do SE sites still use the same technology mentioned here? If not, does anyone have an update?
EDIT: Since you are asking for specifics, consider a photo-sharing site (like Flickr or Picasa; I know that one uses PHP while the other uses Python). If it proves to be successful, it needs to scale enormously. I hope this is specific enough.
There's nothing in either of the languages that would prevent you from connecting to more than one database at a time. The real question is why would you want to?
The reason the Stack Overflow sites are so fast is not so much the choice of technology but the way it's applied. Database optimization techniques are largely independent of the platform involved; they rest on common-sense principles and proven methods of scaling.
Ruby on Rails offers a number of methods for connecting to multiple databases, though you might mean connecting to a system that's divided into shards, into multi-tenant partitions, or where different forms of data are stored on different databases. All of these approaches are supported, but they are quite different in implementation.
You should post a new question with an outline of the problem you're trying to solve if you want a specific answer.
Multi-database support exists in Django. In our Django project we have models pulling data from Postgres, MySQL, Oracle, and MS SQL Server (there are various issues depending on the database, but it all generally works). From what I've read, RoR supports multiple databases as well. Each framework comes with its own set of strengths and weaknesses which you have to evaluate against your particular needs and requirements. I don't think anyone can give you a (valid/useful) general answer without knowing the specifics of your situation.
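The Django side of that is just multiple entries in settings.DATABASES plus .using() to route a query to a particular connection (the alias, app, and model below are illustrative):

```python
# settings.DATABASES defines aliases such as "default", "oracle_db", etc.
from myapp.models import Customer  # placeholder app and model

# Route this particular query to the Oracle connection:
oracle_rows = Customer.objects.using("oracle_db").filter(active=True)
```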
I'm starting a web project that should likely be fine with SQLite. I have SQLObject on top of it, but I'm thinking long term here: if this project should ever require a more robust database (e.g. one able to handle high traffic), I will need to have a transition plan ready. My questions:
How easy is it to transition from one DB (SQLite) to another (MySQL, Firebird, or PostgreSQL) under SQLObject?
Does SQLObject provide any tools to make such a transition easier? Is it as simple as taking the objects I've defined and calling createTable?
What about having multiple SQLite databases instead? E.g. one per visitor group? Does SQLObject provide a mechanism for handling this scenario and if so, what is the mechanism to use?
Thanks,
Sean
3) is quite an interesting question. In general, SQLite is pretty useless for web-based stuff. It scales fairly well in size but terribly in concurrency, so if you are planning to hit it with several requests at the same time, you will be in trouble.
Now, your idea in part 3) of the question is to use multiple SQLite databases (e.g. one per user group, or even one per user). Unfortunately, SQLite itself will give you no help in this department, but it is possible. The one project I know of that has done this before is Divmod's Axiom, so I would certainly check that out.
Of course, it would probably be much easier to just use a good concurrent DB like the ones you mention (Firebird, PG, etc).
For completeness:
1 and 2) It should be straightforward without you actually writing much code. I find SQLObject a bit restrictive in this department and would strongly recommend SQLAlchemy instead; it is far more flexible, and if I were starting a new project today I would certainly use it over SQLObject. The transition won't be moving "Objects" anywhere; there is no magic involved, it will be transferring rows between tables in a database. As mentioned, you could do that by hand, but tooling might save you some time.
Your success with createTable() will depend on your existing table schema and data types; in other words, on how well SQLite maps to the database you choose and how SQLObject decides to map your data types.
The safest option may be to create the new database by hand. Then you'll have to deal with data migration, which may be as easy as instantiating two SQLObject database connections over the same table definitions.
Why not just start with the more full-featured database?
I'm not sure I understand the question.
The SQLObject documentation lists six kinds of connections available. Further, the database connection (or scheme) is specified in a connection string. Changing database connections from SQLite to MySQL is trivial. Just change the connection string.
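For example (paths and credentials are placeholders):

```python
from sqlobject import connectionForURI, sqlhub

# Development: file-backed SQLite.
sqlhub.processConnection = connectionForURI("sqlite:///var/data/app.db")

# Later, moving to MySQL is just a different URI; the model code is untouched.
# sqlhub.processConnection = connectionForURI("mysql://user:password@localhost/appdb")
```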
The documentation lists the different kinds of schemes that are supported.