Repository pattern in Python - python

I'm new to Python and I'm coming from the C# world.
Over there it seemed like the repository pattern was the way to go, but I am having trouble finding any tutorials on how to best do this in Python.
Edit: I understand that it can be implemented; I'm just wondering if there is any reason why I am finding close to nothing on how to go about doing this.
Thanks!

I wasn't immediately familiar with the "repository pattern", so I looked it up. It appears to be the idea of putting a more general API, like a dictionary-like key/value lookup, in front of a database or other data store. It seems that the idea is to add an additional layer of abstraction that allows multiple types of data sources (like both a relational database and a CSV file) to be accessed transparently via a common API.
Given this definition, I can think of no reason why this design pattern wouldn't be equally applicable in Python as in any other programming language.
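To make that concrete, here is a minimal sketch of what the pattern can look like in plain Python. All the class and method names here are hypothetical, invented for this illustration; the point is that client code depends only on the abstract interface, so the backing store can be swapped freely:

import abc
import sqlite3

class UserRepository(abc.ABC):
    @abc.abstractmethod
    def get(self, user_id):
        """Return the name stored for this id, or None."""

    @abc.abstractmethod
    def add(self, user_id, name):
        """Store a user."""

class InMemoryUserRepository(UserRepository):
    def __init__(self):
        self._users = {}

    def get(self, user_id):
        return self._users.get(user_id)

    def add(self, user_id, name):
        self._users[user_id] = name

class SqliteUserRepository(UserRepository):
    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")

    def get(self, user_id):
        row = self._conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None

    def add(self, user_id, name):
        self._conn.execute(
            "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", (user_id, name))

# Client code is identical for either backend:
for repo in (InMemoryUserRepository(), SqliteUserRepository()):
    repo.add(1, "Ada")
    print(repo.get(1))  # -> Ada

Since Python is duck-typed, the abstract base class is strictly optional, which may be part of why the pattern is written about less often than in C#.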

Related

Python: Gateway ORM for Models from third party REST APIs

For working with data from a database inside Python programs, we generally use object-relational mappers to translate database entries into Python objects we can work with, with SQLAlchemy and Django models probably being the most common and advanced ORMs.
Are there ORMs that do not connect to a database but to a third-party (JSON) REST API instead? I would like a framework that lets me deal with Python objects to perform CRUD operations against the API. It should have all the well-established standard functionalities of an ORM, including Unit of Work and Lazy Loading. Ideally, my Python code would be agnostic about whether the model is stored in a database or fetched from a third-party API.
It is hard for me to imagine that such a thing does not yet exist. But I am not able to find it. Maybe I don't know the right words to search for?
ORM frameworks are frameworks that connect to databases. What you are describing is the DAO pattern, not a framework. It is a common programming pattern in other languages such as Java.
The right words or searches would be:
Search for the DAO pattern, what to expect from it, and how to code it (a sketch follows this list).
Check a couple of examples of DAO patterns in Python, such as this one or this other one.
Analyze your specific problem. You might not need all the code other solutions offer, and you might be better off writing your own class adjusted to your needs.
Remember KISS and DRY.
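As a rough illustration of the DAO idea applied to a REST API (a sketch under assumptions, not a definitive implementation: the requests library is real, but the base URL and the /users endpoint are hypothetical):

import requests

class UserDao:
    """Hides the HTTP details behind plain-Python CRUD methods."""

    def __init__(self, base_url="https://api.example.com"):  # hypothetical API
        self.base_url = base_url

    def get(self, user_id):
        response = requests.get("%s/users/%s" % (self.base_url, user_id))
        response.raise_for_status()
        return response.json()  # a plain dict the rest of the code works with

    def create(self, data):
        response = requests.post("%s/users" % self.base_url, json=data)
        response.raise_for_status()
        return response.json()

    def delete(self, user_id):
        requests.delete("%s/users/%s" % (self.base_url, user_id)).raise_for_status()

Calling code only sees dicts (or domain objects, if you add a mapping step), so swapping the API for a database later means replacing one class.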
PS: Different languages use different paradigms; it is a common error to try to carry patterns and coding habits from one language to another. Something that is solved a certain way in e.g. Java might not be the best option in Python. Keep that in mind too.

The most appropriate way to use Neo4j from Python in 2015

I'm using the latest community Neo4j (2.2.0-M03) for storing my graphs. I'm interested in accessing it from Python. According to the official Neo4j documentation, there are several alternatives.
From what I have understood by checking the docs, playing around a bit, and reading this post, py2neo is the only one supporting Neo4j 2 (and labels). However, if I'd like to write and run specific algorithms on Neo4j, I should use Gremlin through Bulbs, which, however, does not seem to support Neo4j 2.
Now, I would like to use some custom algorithms not currently in Neo4j, like Spreading Activation.
Is writing algorithms directly in Neo4j in Java and running them from Python using cypher commands through py2neo the only alternative? Am I missing something?
Cheers
This is a very tough question; it seems you need design guidance, not a quick Neo4j answer. Depending on how you're using spreading activation, it might be better not to modify the server, but I can't tell because your use case is probably involved. Keep in mind that you can always use Neo4j as a graph store, and then put higher-level concepts like spreading activation in your application code, not in the server.
The question presumes, I think, that you want to put it in the server. So what are the options? Broadly, you could write a server plugin and extend the RESTful API (which wouldn't help you with py2neo). On the other hand, I don't think defining your own custom Cypher function is supported right now, so you can't modify the Cypher language itself and then use the py2neo bindings to exploit a fancy new Cypher function. Advice given elsewhere suggests you might want to consider an unmanaged extension to implement spreading activation. If you did this, once again, I don't see how py2neo would help you.
Short term, I think you should consider NOT modifying Neo4j itself, but rather putting your spreading activation in Python code that perhaps uses py2neo. Long term, if Neo4j comes up with a way of doing Cypher user-defined functions (UDFs), which I understand is on the development roadmap (maybe?), then that might be a better option, but I wouldn't recommend it without many more requirements and details.
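As a sketch of that short-term approach, here is what spreading activation kept entirely in application code might look like, assuming py2neo 2.x (where graph.cypher.execute exists; py2neo 3+ renamed this to graph.run). The id property, the decay factor, and the starting node are made up for the example:

from py2neo import Graph

graph = Graph()  # assumes a local Neo4j server at the default address

def spread_activation(start_id, steps=2, decay=0.5):
    # Very naive spreading activation, kept client-side: at each hop,
    # every active node passes `decay` times its energy to its neighbours.
    activation = {start_id: 1.0}
    for _ in range(steps):
        spread = dict(activation)
        for node_id, energy in activation.items():
            results = graph.cypher.execute(
                "MATCH (n {id: {id}})--(m) RETURN m.id AS neighbour",
                {"id": node_id})
            for record in results:
                neighbour = record[0]
                spread[neighbour] = spread.get(neighbour, 0.0) + energy * decay
        activation = spread
    return activation

print(spread_activation("node-1"))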

Building a DSL query language

I'm working on a project (written in Django) which has only a few entities, but many rows for each entity.
In my application I have several static "reports", written directly in plain SQL. The users can also search the database via a generic filter form. Since the target audience is really tech-savvy and at some point the filter doesn't fit their needs, I'm thinking about creating a query language for my database, like YQL or Jira's advanced search.
I found http://sourceforge.net/projects/littletable/ and http://www.quicksort.co.uk/DeeDoc.html, but it seems that they only operate on in-memory objects. Since the database can be too large to hold in memory, I would prefer that the query be translated into SQL (or better, a Django query) before doing the actual work.
Are there any libraries or best practices on how to do this?
Writing such a DSL is actually surprisingly easy with PLY, and, what ho, there's already an example available for doing just what you want, in Django. You see, Django has this fancy thing called a Q object which makes the Django querying side of things fairly easy.
At DjangoCon EU 2012, Matthieu Amiguet gave a session entitled Implementing Domain-specific Languages in Django Applications in which he went through the process, right down to implementing such a DSL as you desire. His slides, which include all you need, are available on his website. The final code (linked to from the last slide, anyway) is available at http://www.matthieuamiguet.ch/media/misc/djangocon2012/resources/compiler.html.
Reinout van Rees also produced some good comments on that session. (He normally does!) These cover a little of the missing context.
In the examples given there you see something very similar to YQL and JQL:
groups__name="XXX" AND NOT groups__name="YYY"
(modified > 1/4/2011 OR NOT state__name="OK") AND groups__name="XXX"
It can also be tweaked very easily; for example, you might want to use groups.name rather than groups__name (I would). This modification could be made fairly trivially (allow . in the FIELD token, by modifying t_FIELD, and then replacing . with __ before constructing the Q object in p_expression_ID).
So, that satisfies simple querying; it also gives you a good starting point should you wish to make a more complex DSL.
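To give a flavour of the PLY approach, here is a minimal sketch of my own (not Matthieu Amiguet's code) that turns strings like the examples above into Django Q objects; all token and rule names are illustrative:

import ply.lex as lex
import ply.yacc as yacc
from django.db.models import Q

tokens = ("FIELD", "VALUE", "AND", "OR", "NOT", "EQUALS")

t_EQUALS = r"="
t_ignore = " \t"

def t_AND(t):
    r"AND"
    return t

def t_OR(t):
    r"OR"
    return t

def t_NOT(t):
    r"NOT"
    return t

def t_FIELD(t):
    r"[a-z_][a-zA-Z0-9_]*"
    return t

def t_VALUE(t):
    r'"[^"]*"'
    t.value = t.value[1:-1]  # strip the surrounding quotes
    return t

def t_error(t):
    raise SyntaxError("Illegal character %r" % t.value[0])

precedence = (
    ("left", "OR"),
    ("left", "AND"),
    ("right", "NOT"),
)

def p_expression_binop(p):
    """expression : expression AND expression
                  | expression OR expression"""
    p[0] = (p[1] & p[3]) if p[2] == "AND" else (p[1] | p[3])

def p_expression_not(p):
    "expression : NOT expression"
    p[0] = ~p[2]

def p_expression_compare(p):
    "expression : FIELD EQUALS VALUE"
    p[0] = Q(**{p[1]: p[3]})  # e.g. Q(groups__name="XXX")

def p_error(p):
    raise SyntaxError("Parse error at %r" % (p,))

lexer = lex.lex()
parser = yacc.yacc()

q = parser.parse('groups__name="XXX" AND NOT groups__name="YYY"', lexer=lexer)
# q can now be passed to e.g. MyModel.objects.filter(q)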
I've faced exactly this problem - a large database which needs searching. I made some static reports and several fancy filters in Django (very easy with Django), just like you have.
However, the power users were clamouring for more. I decided that there already was a DSL that they all knew: SQL. The question was how to make it secure enough.
So I used Django permissions to give the power users permission to store SQL queries in a new table. I then made a view for the not-quite-so-power users to run these queries. I made the queries take optional parameters. They were run using Python's lower-level DB-API, which Django uses under the hood for its ORM anyway.
The real trick was opening a read-only database connection to run these queries, just to make sure that no updates were ever run. I made the read-only connection by creating a different database user with lower permissions and opening a dedicated connection for that user in the view.
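A rough sketch of that view-side plumbing, assuming a second "readonly" alias in settings.DATABASES whose database user has SELECT-only privileges (the alias name and the helper function are mine, not from the answer above):

from django.db import connections

def run_saved_query(sql, params=None):
    # Executes a stored power-user query on the read-only alias, so even
    # a crafted UPDATE or DELETE would be rejected by the database itself.
    with connections["readonly"].cursor() as cursor:
        cursor.execute(sql, params or [])
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]

# settings.DATABASES["readonly"] points at the same database,
# but with a user that has been granted SELECT only.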
TL;DR - SQL is the way to go!
Depending on the form of your data, the types of queries your users need to use, and the frequency that your data is updated, an alternative to the pure SQL solution suggested by Nick Craig-Wood is to index your data in Solr and then run queries against it.
Solr is an added layer of complexity (configuration, data synchronization) but it is super-fast, can handle large datasets, and provides a (relatively) intuitive query language.
You could write your own SQL-ish language using pyparsing, actually. There is even a pretty verbose example you could extend.
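For instance, here is a tiny pyparsing sketch (my own illustration, not the verbose example referred to above) that parses the kind of expressions shown earlier into a nested structure:

from pyparsing import (Group, Keyword, QuotedString, Word, alphanums, alphas,
                       infixNotation, opAssoc)

AND, OR, NOT = map(Keyword, ["AND", "OR", "NOT"])
field = Word(alphas + "_", alphanums + "_")
value = QuotedString('"')
comparison = Group(field + "=" + value)

query = infixNotation(comparison, [
    (NOT, 1, opAssoc.RIGHT),
    (AND, 2, opAssoc.LEFT),
    (OR, 2, opAssoc.LEFT),
])

# Returns a nested list structure that you can walk recursively to build
# Q objects, an SQL WHERE clause, or whatever else you need.
print(query.parseString('state = "OK" AND NOT groups__name = "YYY"'))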

Embedding Python code as a preprocessor PHP style

I'm going back over an old project where I added preprocessor functionality to Essence', and I realised that my previous solution of writing a domain-specific language and an associated lexer/parser was overkill.
Instead I just need to be able to embed dynamic language code into the file, isolate it at runtime, eval it, and insert the results. In other words, something very similar to the PHP model of inserting dynamic code into HTML. I'd rather not use PHP, as Python is much easier to distribute as part of a larger project (IronPython or Jython).
So the question goes, how best to implement something like the following:
<code>Python goes here</code>
Lots of essence <code>Python</code> prime code goes here
I don't want to have to alter the structure of the Essence' file (if I remove all the code blocks, everything left should still be syntactically correct). It needs to be able to insert text in place of a code block, like PHP.
Finally, security-wise I'm not bothered about code injection, as it would be the users themselves choosing the file to execute, although if there were security benefits to one model over another at no extra cost, that would obviously be good.
Cheers in advance
Your best bet is to use one of the already-made (and battle-tested) templating engines. The two big ones I've used are Mako and Cheetah. They allow you to embed code right in the page, and are mostly used as the View in an MVC architecture.
If you feel that using one of those engines is overkill for your project, here is a small tutorial on how to implement basic templates yourself. Keep in mind that the example will need to be modified to suit your particular project/needs.
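In the spirit of that do-it-yourself tutorial, here is a minimal sketch of the PHP-style model the question describes, using the <code> delimiters from the question; note that it only evaluates expressions, not statements:

import re

CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.DOTALL)

def render(text, namespace):
    # Replace each <code>...</code> block with the str() of evaluating its
    # contents as a Python expression; everything else passes through
    # untouched, so the surrounding file structure is preserved.
    def evaluate(match):
        return str(eval(match.group(1), namespace))
    return CODE_BLOCK.sub(evaluate, text)

print(render("Lots of essence <code>2 + 2</code> code goes here", {}))
# -> Lots of essence 4 code goes here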

Graph databases and RDF triplestores: storage of graph data in python

I need to develop a graph database in Python (I would enjoy it if anybody joined me in the development; I already have a bit of code, but I would gladly discuss it).
I did my research on the internet. In Java, Neo4j is a candidate, but I was not able to find anything about actual disk storage. In Python, there are many graph data models (see this pre-PEP proposal), but none of them satisfies my need to store and retrieve from disk.
I do know about triplestores, however. Triplestores are basically RDF databases, so a graph data model could be mapped to RDF and stored, but I am generally uneasy (mainly due to lack of experience) about this solution. One example is Sesame. The fact is that, in any case, you have to convert between the in-memory graph representation and the RDF representation, unless the client code wants to hack on the RDF document directly, which is mostly unlikely. It would be like handling DB tuples directly instead of creating an object.
What is the state of the art for storage and retrieval (a la DBMS) of graph data in Python at the moment? Would it make sense to start developing an implementation, hopefully with the help of someone interested in it, and in collaboration with the proposers of the Graph API PEP? Please note that this is going to be part of my job for the next months, so my contribution to this eventual project is pretty damn serious ;)
Edit: I also found directededge, but it appears to be a commercial product.
I have used both Jena, which is a Java framework, and AllegroGraph (Lisp, Java, Python bindings). Jena has sister projects for storing graph data and has been around a long, long time. AllegroGraph is quite good and has a free edition; I would suggest it because it is easy to install, free, fast, and you could be up and going in no time. The power you would get from learning a little RDF and SPARQL may very well be worth your while. If you know SQL already, then you are off to a great start. Being able to query your graph using SPARQL would yield some great benefits, serializing to RDF triples is easy, and some of the file formats are super easy (NT, for instance). I'll give an example. Let's say you have the following graph of node-edge-node ids:
1 --2--> 3
3 --4--> 5
these are already subject predicate object form so just slap some URI notation on it, load it in the triple store and query at-will via SPARQL. Here it is in NT format:
<http://mycompany.com#1> <http://mycompany.com#2> <http://mycompany.com#3> .
<http://mycompany.com#3> <http://mycompany.com#4> <http://mycompany.com#5> .
Now query for all nodes two hops from node 1:
SELECT ?node
WHERE {
  <http://mycompany.com#1> ?p1 ?o1 .
  ?o1 ?p2 ?node .
}
This would of course yield <http://mycompany.com#5>.
Another candidate would be Mulgara, written in pure Java. Since you seem more interested in Python, though, I think you should take a look at AllegroGraph first.
I think the solution really depends on exactly what you want to do with the graph once you have managed to store it on disk/in a database, and this is a little unclear in your question. However, a couple of things you might wish to consider are:
If you just want to persist the graph without using any of the features or properties you might expect from an RDBMS solution (such as ACID), then how about just pickling the objects into a flat file? Very rudimentary, but, like I say, it depends on exactly what you want to achieve (see the sketch after this list).
ZODB is an object database for Python (a spin-off from the Zope project, I think). I can't say I've had much experience with it in a high-performance environment, but, bar a few restrictions, it does allow you to store Python objects natively.
If you wish to pursue RDF, there is an RDF Alchemy project which might help to alleviate some of your concerns about converting from your graph to RDF structures; I think it has Sesame as part of its stack.
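For completeness, a sketch of the rudimentary pickle option from the first point above (the adjacency-list representation is just one arbitrary choice):

import pickle

graph = {1: [3], 3: [5]}  # a toy adjacency-list graph

# Persist the whole structure to disk in one shot...
with open("graph.pickle", "wb") as f:
    pickle.dump(graph, f)

# ...and load it back later.
with open("graph.pickle", "rb") as f:
    restored = pickle.load(f)

assert restored == graph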
There are some other persistence tools detailed on the Python site which may be of interest; however, I spent quite a while looking into this area last year, and ultimately I found there wasn't a native Python solution that met my requirements.
The most success I had was using MySQL with a custom ORM, and I posted a couple of relevant links in an answer to this question. Additionally, if you want to contribute to an RDBMS project, when I spoke to someone from Open Query about a graph storage engine for MySQL they seemed interested in getting active participation in their project.
Sorry I can't give a more definitive answer, but I don't think there is one... If you do start developing your own implementation, I'd be interested to keep up to date with how you get on.
Greetings from your Sirius Cybernetics Intelligent Agent!
Some useful links...
Programming the Semantic Web
SEMANTIC PROGRAMMING
RDFLib Python Library for RDF
Hmm, maybe you should take a look at CubicWeb
Regarding Neo4j, did you notice the existing Python bindings? As for the disk storage, take a look at this thread on the mailing list.
For graphdbs in Python, the Hypergraph Database Management System project was recently started on SourceForge by Maurice Ling.
Redland (http://librdf.org) is probably the solution you're looking for. It has Python bindings too.
RDFLib is a Python library that you can use. Using harschware's example:
Create a test.nt file like below:
<http://mycompany.com#1> <http://mycompany.com#2> <http://mycompany.com#3> .
<http://mycompany.com#3> <http://mycompany.com#4> <http://mycompany.com#5> .
To query for all nodes two hops from node 1 in RDFLib:
from rdflib import Graph

g = Graph()
g.parse("test.nt", format="nt")

qres = g.query(
    """SELECT ?node
       WHERE {
          <http://mycompany.com#1> ?p1 ?o1 .
          ?o1 ?p2 ?node .
       }"""
)

for row in qres:
    print(row.node)  # print the binding of the ?node variable
Should return the answer <http://mycompany.com#5>.
