Django==2.2.5
The example below contains two custom filters and two auxiliary functions.
It is a made-up example, not real code.
Two problems with this code:
When a project becomes big I forget what aux functions I have already written. Not to mention team programming. What is the solution here? To organize a separate module for functions that can be imported? And sort them alphabetically?
Some functions from here may be reused outside this package, and some may not. Say, the combine function seems to be reusable, while get_salted_str is definitely for this module only. I think that it is better to distinguish between functions that may be imported and those that may not. Is it better to use a leading underscore to mark functions that are not meant to be imported? Like this: _get_salted_str. This may ease the first problem a bit.
Does the Django style guide or any other Python style guide mention solutions to the two problems above?
My code example:
from django import template

register = template.Library()

def combine(str1, str2):
    return "{}_{}".format(str1, str2)

def get_salted_str(str):
    SALT = "slkdjghslkdjfghsldfghaaasd"
    return combine(str, SALT)

@register.filter
def get_salted_string(str):
    return combine(str, get_salted_str(str))

@register.filter
def get_salted_peppered_string(str):
    salted_str = get_salted_str(str)
    PEPPER = "1234128712908369735619346"
    return "{}_{}".format(PEPPER, salted_str)
When a project becomes big I forget what aux functions I have already written. Not to mention team programming. What is the solution here?
Good documentation and proper modularization.
To organize a separate module for functions that can be imported?
Technically, all functions (except of course nested ones) can be imported. Now I assume you meant: "for functions that are meant to be imported from other modules", but even then, it doesn't mean much - it often happens that a function primarily intended for "internal use" (helper function used within the same module) later becomes useful for other modules.
Also, the proper way to group functions is not based on whether they are for internal or public use (this is handled by prefixing 'internal use only' functions with a single leading underscore), but on how those functions are related.
NB: I use the term "function" because that's how you phrased your question, but this applies to all other names (classes etc).
And sort them alphabetically?
Bad idea IMHO: it doesn't make any sense from a functional POV, and it can cause issues when merging diverging branches.
Some functions from here may be reused outside this package, and some may not. Say, the combine function seems to be reusable, while "get_salted_str" is definitely for this module only. I think that it is better to distinguish between functions that may be imported and those that may not. Is it better to use a leading underscore to mark functions that are not meant to be imported? Like this: _get_salted_str. This may ease the first problem a bit.
Why would you prevent get_salted_str from being imported by another module?
'Protected' (single leading underscore) names are for implementation parts that the module's client code should neither touch nor even be aware of. This is called "encapsulation"; the goal is to allow implementation changes that won't break the client code.
In your example, get_salted_str() is a template filter, so it's obviously part of your package's public API.
OTOH, combine really looks like an implementation detail: the fact that some unrelated code in another package may need to combine two strings with the same separator seems mostly accidental, and if you expose combine as part of the module's API you can no longer freely change its implementation. This is typically an implementation function as far as I can tell from your example (and it's so trivial that it really doesn't warrant being exposed, as far as I'm concerned).
On a more general level: while avoiding duplication is a very laudable goal, you must be careful not to overdo it. Some duplication is actually "accidental": at some point in time, two totally unrelated parts of the code have a few lines in common, but for totally different reasons, and the forces that may lead to a change in one part of the code are totally unrelated to the other part. So before factoring out seemingly duplicated code, ask yourself whether this code is doing the same thing for the same reasons and whether changing it in one part should affect the other part too.
Does the Django style guide or any other Python style guide mention solutions to the two problems above?
This is nothing specific to Django, nor even to Python. Writing well-organized code relies on the same heuristics whatever the language: you want high cohesion (all functions, classes etc. in the same module should be related and provide solutions to the same problems) and low coupling (a module should depend on as few other modules as possible).
NB: I'm talking about "modules" here but the same rules hold for packages (a package is kind of a super-module) or classes (a class is a kind of mini-module too - except that you can have multiple instances of it).
Now it must be said that proper modularisation, like proper naming etc., IS hard. It takes time and experience (and a lot of reflection) to develop (no pun intended but...) a "feel" for it, and even then you often find yourself reorganizing things quite a bit during your project's lifetime. And, well, there will almost always be some messy area somewhere, because sometimes finding out where a given feature really belongs is a bit of a wild guess (hint: look for modules or packages named "util" or "utils" or "helpers"; those are usually where the dev grouped stuff that didn't clearly belong anywhere else).
There are a lot of ways to go about this, so here is the way I always handle this:
1. Reusable functions in a project
First and foremost: documentation. When working in a big team you definitely need to document reusable functions.
Second, packages. When creating a lot of auxiliary/helper functions, that might have a use outside the current module or app, it can be useful to bundle them all together. I often create a 'base' or 'utils' package in my Django project where I bundle all sorts of functions.
The django.contrib package is a pretty good example of all sorts of helper packages bundled into one.
My rule of thumb is, if I find that I reuse some function/piece of code, I move it to my utils package, and if it's related to something else in that package, I bundle them together. That makes it pretty easy to keep track of all the functions there are.
2. Private functions
Python doesn't really have private members, but the generally accepted way to 'mark' a member as private is to add a leading underscore, like _get_salted_str.
3. Style guide
With regard to auxiliary functions, I'm not aware of any style guide.
'Private' members : https://docs.python.org/3/tutorial/classes.html#private-variables
I'm developing a GUI application in Python that stores its documents in an XML-based format. The application is a mathematical model with several pre-defined components which can be dragged and dropped. I'd also like the user to be able to create custom components by writing a Python function inside an editor provided within the application. My issue is with storing these functions in the XML.
A function might look something like this:
def func(node, timestamp):
    return node.weight * timestamp.day + 4
These functions are wrapped in an object which provides a standard way of calling them (compared to the pre-defined components). If I was to create one from Python directly it would look like this:
parameter = ParameterFunction(func)
The function is then called by the model like this:
parameter.value(node=node, timestamp=timestamp)
The ParameterFunction object has to_xml and from_xml methods which need to serialise/deserialise the object to/from an XML representation.
My question is: how do I store the Python functions in an XML document?
One solution I have thought of so far is to store the function definition as a string, eval() or exec() it for use but keep the string, then store the string in a CDATA block in the XML. Are there any issues with this that I'm not seeing?
An alternative would be to store all of the Python code in a separate file, and have the XML reference just the function names. This could be nice as it could be edited easily in an external editor. In that case, what is the best way to import the code? I am envisaging fighting with the Python import path...
I'm aware there will be security concerns with running untrusted code, but I'm willing to make this tradeoff for the freedom it gives users.
The specific application I'm referring to is on github. I'm happy to provide more information if it's needed, but I've tried to keep it fairly generic here. https://github.com/snorfalorpagus/pywr/blob/120928eaacb9206701ceb9bc91a5d73740db1953/pywr/core.py#L396-L402
Nope, you have the easiest and best solution that I can think of. Just keep them as strings, as long as you're not worried about running untrusted code.
The way I'd deal with external python scripts containing tiny snippets like yours would be to treat them as plain text files and read them in as strings. This avoids all the problems with importing them. Just read them in and call exec on them, then the functions will exist in scope.
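A minimal sketch of the "keep the string, exec it, store it in the XML" approach might look like the following. SourceFunction is a hypothetical stand-in for the question's ParameterFunction (whose real interface is not shown here), and the element and attribute names are made up. Note that the standard library's ElementTree escapes special characters in text rather than emitting a CDATA block, which round-trips the source just as well.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace


class SourceFunction:
    """Hypothetical stand-in for ParameterFunction, built from source text."""

    def __init__(self, source, name="func"):
        self.source = source      # keep the text so it can be saved again
        self.name = name
        namespace = {}
        exec(source, namespace)   # runs untrusted code: the accepted trade-off
        self._func = namespace[name]

    def value(self, **kwargs):
        return self._func(**kwargs)

    def to_xml(self):
        element = ET.Element("function", {"name": self.name})
        element.text = self.source  # escaped on output, unescaped on parsing
        return element

    @classmethod
    def from_xml(cls, element):
        return cls(element.text, element.get("name"))


SOURCE = "def func(node, timestamp):\n    return node.weight * timestamp.day + 4\n"
xml_text = ET.tostring(SourceFunction(SOURCE).to_xml(), encoding="unicode")
restored = SourceFunction.from_xml(ET.fromstring(xml_text))
result = restored.value(node=SimpleNamespace(weight=2),
                        timestamp=SimpleNamespace(day=3))
```

The same constructor works for the external-file variant: read the file as text and pass its contents as source, which sidesteps the import path entirely.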
EDIT: I was going to add something on sandboxing Python code, but after a bit of research it seems this will not be an easy task; it would be easier to sandbox the entire program. Another, longer and harder, way to restrict the untrusted code would be to create your own tiny interpreter that only performs safe operations (e.g. mathematical operations, calling existing functions, etc.).
Reasoning: I'm trying to convert a large library from Scheme to Python
Are there any good strategies for doing this kind of conversion? Specifically cross-paradigm in this case since Python is more OO and Scheme is Functional.
Totally subjective so I'm making it community wiki
I would treat the original language implementation almost like a requirements specification, and write up a design based on it (most importantly including detailed interface definitions, both for the external interfaces and for those between modules within the library). Then I would implement from that design.
What I would most definitely NOT do is any kind of function-by-function translation.
Use the Scheme implementation as a way of generating test cases. I'd write a function that can call Scheme code and read the output, converting it back into Python.
That way, you can write test cases that look like this:
def test_f():
    assert_equal(library.f(42), reference_implementation('(f 42)'))
This doesn't help you translate the library, but it will give you pretty good confidence that what you have gives the right results.
Of course, depending on what the Scheme code does, it may not be quite as simple as this...
I would set up a bunch of whiteboards and write out the algorithms from the Scheme code. Then I would implement the algorithms in Python. Then, as @PaulHankin suggests, use the Scheme code as a way to write test cases to test the Python code.
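For illustration, reference_implementation could be sketched roughly as below. Everything specific here is an assumption: that a guile executable is on the PATH, that the original code lives in a file called library.scm, and that results are plain numbers or booleans. The run parameter exists only so the subprocess call can be swapped out in tests.

```python
import subprocess


def parse_scheme_value(text):
    """Convert a printed Scheme value to Python (integers, floats, booleans only)."""
    text = text.strip()
    if text in ("#t", "#f"):
        return text == "#t"
    try:
        return int(text)
    except ValueError:
        return float(text)


def reference_implementation(expr, library="library.scm", run=subprocess.run):
    # Assumption: `guile` is installed and library.scm defines the functions used.
    program = '(begin (load "{}") (display {}))'.format(library, expr)
    result = run(["guile", "-c", program],
                 capture_output=True, text=True, check=True)
    return parse_scheme_value(result.stdout)
```

Lists, symbols and rationals would need a richer converter, which is where the "may not be quite as simple" caveat applies.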
If you don't have time to do as the others have suggested and actually re-implement the functionality, there is no reason you CAN'T implement it in a strictly functional fashion.
Python supports the key features necessary to do functional programming, and you might find that your time was better spent doing other things, especially if absolute optimization is not required. On the other hand, you might find bug-hunting to be quite hard.
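As a small illustration of staying functional in Python, the sketch below ports two typical Scheme idioms, a fold over a mapped list and function composition. The function names are invented for the example.

```python
from functools import reduce


def compose(*funcs):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs, lambda x: x)


def sum_of_squares(numbers):
    # Roughly the Scheme: (fold + 0 (map (lambda (x) (* x x)) numbers))
    return reduce(lambda acc, x: acc + x, map(lambda x: x * x, numbers), 0)


increment_then_double = compose(lambda x: x * 2, lambda x: x + 1)
```

One thing to watch for in a direct port: CPython does not optimise tail calls, so Scheme loops written as deep recursion can hit Python's recursion limit and may need to become explicit loops or reduce calls.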
Write a Python interpreter in Scheme and directly translate your program to that :-) You can start with def:
(define-syntax def
  (syntax-rules ()
    ((def func-name rest ...)
     (define func-name (lambda rest ...)))))

;; test
(def sqr (x) (* x x))
(sqr 2) => 4
I'm writing a server that I expect to be run by many different people, not all of whom I will have direct contact with. The servers will communicate with each other in a cluster. Part of the server's functionality involves selecting a small subset of rows from a potentially very large table. The exact choice of what rows are selected will need some tuning, and it's important that it's possible for the person running the cluster (eg, myself) to update the selection criteria without getting each and every server administrator to deploy a new version of the server.
Simply writing the function in Python isn't really an option, since nobody is going to want to install a server that downloads and executes arbitrary Python code at runtime.
What I need are suggestions on the simplest way to implement a Domain Specific Language to achieve this goal. The language needs to be capable of simple expression evaluation, as well as querying table indexes and iterating through the returned rows. Ease of writing and reading the language is secondary to ease of implementing it. I'd also prefer not to have to write an entire query optimiser, so something that explicitly specifies what indexes to query would be ideal.
The interface that this will have to compile against will be similar in capabilities to what the App Engine datastore exports: You can query for sequential ranges on any index on the table (eg, less-than, greater-than, range and equality queries), then filter the returned row by any boolean expression. You can also concatenate multiple independent result sets together.
I realise this question sounds a lot like I'm asking for SQL. However, I don't want to require that the datastore backing this data be a relational database, and I don't want the overhead of trying to reimplement SQL myself. I'm also dealing with only a single table with a known schema. Finally, no joins will be required. Something much simpler would be far preferable.
Edit: Expanded description to clear up some misconceptions.
Building a DSL to be interpreted by Python.
Step 1. Build the run-time classes and objects. These classes will have all the cursor loops and SQL statements and all of that algorithmic processing tucked away in their methods. You'll make heavy use of the Command and Strategy design patterns to build these classes. Most things are a command, options and choices are plug-in strategies. Look at the design for Apache Ant's Task API -- it's a good example.
Step 2. Validate that this system of objects actually works. Be sure that the design is simple and complete. Your tests will construct the Command and Strategy objects, and then execute the top-level Command object. The Command objects will do the work.
At this point you're largely done. Your run-time is just a configuration of objects created from the above domain. [This isn't as easy as it sounds. It requires some care to define a set of classes that can be instantiated and then "talk among themselves" to do the work of your application.]
Note that what you'll have will require nothing more than declarations. What's wrong with procedural? Once you start to write a DSL with procedural elements, you find that you need more and more features until you've written Python with a different syntax. Not good.
Further, procedural language interpreters are simply hard to write. State of execution, and scope of references are simply hard to manage.
You can use native Python -- and stop worrying about "getting out of the sandbox". Indeed, that's how you'll unit test everything, using a short Python script to create your objects. Python will be the DSL.
["But wait", you say, "If I simply use Python as the DSL people can execute arbitrary things." Depends on what's on the PYTHONPATH, and sys.path. Look at the site module for ways to control what's available.]
A declarative DSL is simplest. It's entirely an exercise in representation. A block of Python that merely sets the values of some variables is nice. That's what Django uses.
You can use the ConfigParser as a language for representing your run-time configuration of objects.
You can use JSON or YAML as a language for representing your run-time configuration of objects. Ready-made parsers are totally available.
You can use XML, too. It's harder to design and parse, but it works fine. People love it. That's how Ant and Maven (and lots of other tools) use declarative syntax to describe procedures. I don't recommend it, because it's a wordy pain in the neck. I recommend simply using Python.
Or, you can go off the deep-end and invent your own syntax and write your own parser.
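Of the declarative options listed above, the JSON one can be sketched in a few lines. The spec format below (index, range, filter) is entirely made up for illustration, and a generator over an in-memory list stands in for the real index scan.

```python
import json
import operator

# Only these operators are available to a spec; anything else raises KeyError.
OPS = {"eq": operator.eq, "lt": operator.lt, "gt": operator.gt}

SPEC = json.loads("""
{
  "index": "age",
  "range": {"min": 18, "max": 65},
  "filter": [{"field": "country", "op": "eq", "value": "NZ"}]
}
""")


def run_query(spec, rows):
    low, high = spec["range"]["min"], spec["range"]["max"]
    # Stand-in for querying a sequential range on the named index.
    candidates = (r for r in rows if low <= r[spec["index"]] <= high)
    return [r for r in candidates
            if all(OPS[c["op"]](r[c["field"]], c["value"])
                   for c in spec["filter"])]


ROWS = [{"age": 30, "country": "NZ"},
        {"age": 70, "country": "NZ"},
        {"age": 40, "country": "AU"}]
selected = run_query(SPEC, ROWS)
```

Because the spec is pure data, updating the selection criteria means shipping a new JSON document rather than new code.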
I think we're going to need a bit more information here. Let me know if any of the following is based on incorrect assumptions.
First of all, as you pointed out yourself, there already exists a DSL for selecting rows from arbitrary tables-- it is called "SQL". Since you don't want to reinvent SQL, I'm assuming that you only need to query from a single table with a fixed format.
If this is the case, you probably don't need to implement a DSL (although that's certainly one way to go); it may be easier, if you are used to Object Orientation, to create a Filter object.
More specifically, a "Filter" collection that would hold one or more SelectionCriterion objects. You can implement these to inherit from one or more base classes representing types of selections (Range, LessThan, ExactMatch, Like, etc.) Once these base classes are in place, you can create column-specific inherited versions which are appropriate to that column. Finally, depending on the complexity of the queries you want to support, you'll want to implement some kind of connective glue to handle AND and OR and NOT linkages between the various criteria.
If you feel like it, you can create a simple GUI to load up the collection; I'd look at the filtering in Excel as a model, if you don't have anything else in mind.
Finally, it should be trivial to convert the contents of this Collection to the corresponding SQL, and pass that to the database.
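A rough sketch of such criterion objects follows. The class names mirror the ones mentioned above, but the matches/to_sql interface is an assumption, as is the use of ? placeholders (the style used by sqlite3); column names are interpolated directly and so must come from the known schema, never from user input.

```python
class ExactMatch:
    def __init__(self, column, value):
        self.column, self.value = column, value

    def matches(self, row):
        return row[self.column] == self.value

    def to_sql(self):
        return "{} = ?".format(self.column), [self.value]


class Range:
    def __init__(self, column, low, high):
        self.column, self.low, self.high = column, low, high

    def matches(self, row):
        return self.low <= row[self.column] <= self.high

    def to_sql(self):
        return "{} BETWEEN ? AND ?".format(self.column), [self.low, self.high]


class And:
    """Connective glue: every wrapped criterion must hold."""

    def __init__(self, *criteria):
        self.criteria = criteria

    def matches(self, row):
        return all(c.matches(row) for c in self.criteria)

    def to_sql(self):
        parts = [c.to_sql() for c in self.criteria]
        sql = "(" + " AND ".join(p[0] for p in parts) + ")"
        params = [value for p in parts for value in p[1]]
        return sql, params


adults_in_nz = And(Range("age", 18, 65), ExactMatch("country", "NZ"))
where_clause, params = adults_in_nz.to_sql()
```

The same objects can filter rows in memory via matches, so the datastore does not have to be relational; Or and Not would follow the pattern of And.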
However: if what you are after is simplicity, and your users understand SQL, you could simply ask them to type in the contents of a WHERE clause, and programmatically build up the rest of the query. From a security perspective, if your code has control over the columns selected and the FROM clause, and your database permissions are set properly, and you do some sanity checking on the string coming in from the users, this would be a relatively safe option.
"implement a Domain Specific Language"
"nobody is going to want to install a server that downloads and executes arbitrary Python code at runtime"
I want a DSL but I don't want Python to be that DSL. Okay. How will you execute this DSL? What runtime is acceptable if not Python?
What if I have a C program that happens to embed the Python interpreter? Is that acceptable?
And -- if Python is not an acceptable runtime -- why does this have a Python tag?
Why not create a language that, when "compiled", generates SQL or whatever query language your datastore requires?
You would be basically creating an abstraction over your persistence layer.
You mentioned Python. Why not use Python? If someone can "type in" an expression in your DSL, they can type in Python.
You'll need some rules on structure of the expression, but that's a lot easier than implementing something new.
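One way to impose "rules on the structure of the expression" is to parse it with the standard ast module and evaluate only a whitelist of node types. The sketch below assumes Python 3.8+ and rows represented as dicts; the chosen whitelist is illustrative, and this is not a hardened sandbox (it does nothing about resource exhaustion, for example).

```python
import ast
import operator

BINARY = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
COMPARE = {ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
           ast.GtE: operator.ge, ast.Eq: operator.eq}


def evaluate(expression, row):
    """Evaluate a restricted Python expression against one row (a dict)."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float, str):
            return node.value
        if isinstance(node, ast.Name):
            return row[node.id]  # names may only refer to columns of the row
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY:
            return BINARY[type(node.op)](walk(node.left), walk(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in COMPARE):
            return COMPARE[type(node.ops[0])](walk(node.left),
                                              walk(node.comparators[0]))
        if isinstance(node, ast.BoolOp):
            values = [walk(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        # Calls, attribute access, subscripts, lambdas etc. all end up here.
        raise ValueError("element not allowed: " + type(node).__name__)

    return walk(ast.parse(expression, mode="eval").body)
```

For example, evaluate("age * 2 > 30 and country == 'NZ'", row) works on a row with those two columns, while anything containing a function call is rejected before it runs.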
You said nobody is going to want to install a server that downloads and executes arbitrary code at runtime. However, that is exactly what your DSL will do (eventually) so there probably isn't that much of a difference. Unless you're doing something very specific with the data then I don't think a DSL will buy you that much and it will frustrate the users who are already versed in SQL. Don't underestimate the size of the task you'll be taking on.
To answer your question, however: you will need to come up with a grammar for your language, and something to parse the text and walk the tree, emitting code or calling an API that you've written (hence my comment that you're still going to have to ship some code).
There are plenty of educational texts on grammars for mathematical expressions you can refer to on the net; that part is fairly straightforward. You could use a parser generator tool like ANTLR or Yacc to help you generate the parser (or use a language like Lisp/Scheme and marry the two up). Coming up with a reasonable SQL grammar won't be easy. But google 'BNF SQL' and see what you come up with.
Best of luck.
It really sounds like SQL, but perhaps it's worth trying SQLite if you want to keep it simple?
It sounds like you want to create a grammar not a DSL. I'd look into ANTLR which will allow you to create a specific parser that will interpret text and translate to specific commands. ANTLR provides libraries for Python, SQL, Java, C++, C, C# etc.
Also, here is a fine example of an ANTLR calculation engine created in C#
A context-free grammar usually has a tree-like structure, and functional programs have a tree-like structure too. I don't claim the following would solve all of your problems, but it is a good step in that direction if you are sure that you don't want to use something like SQLite3.
from functools import partial

def select_keys(keys, from_):
    return ({k: fun(v, row) for k, (v, fun) in keys.items()}
            for row in from_)

def select_where(from_, where):
    return (row for row in from_
            if where(row))

def default_keys_transform(keys, transform=lambda v, row: row[v]):
    return {k: (k, transform) for k in keys}

def select(keys=None, from_=None, where=None):
    """
    SELECT v1 AS k1, 2*v2 AS k2 FROM table WHERE v1 = a AND v2 >= b OR v3 = c

    translates to

    select(dict(k1=(v1, lambda v1, r: r[v1]), k2=(v2, lambda v2, r: 2*r[v2])),
           from_=table,
           where=lambda r: r[v1] == a and r[v2] >= b or r[v3] == c)
    """
    assert from_ is not None
    # With no keys, pass the rows through unchanged.
    idfunc = lambda k, t: t
    select_k = idfunc if keys is None else select_keys
    if isinstance(keys, list):
        keys = default_keys_transform(keys)
    # With no where clause, keep every row.
    idfunc = lambda t, w: t
    select_w = idfunc if where is None else select_where
    return select_k(keys, select_w(from_, where))
How do you make sure that you are not giving users the ability to execute arbitrary code? This framework admits all possible functions. Well, you can write a wrapper over it for security that exposes a fixed list of acceptable function objects.
import operator

ALLOWED_FUNCS = [operator.mul, operator.add, ...]  # List of allowed funcs

def select_secure(keys=None, from_=None, where=None):
    if keys is not None and isinstance(keys, dict):
        for v, fun in keys.values():
            assert fun in ALLOWED_FUNCS
    if where is not None:
        assert_composition_of_allowed_funcs(where, ALLOWED_FUNCS)
    return select(keys=keys, from_=from_, where=where)
How do you write assert_composition_of_allowed_funcs? It is very difficult to do in Python but easy in Lisp. Let us assume that where is a nested tuple of functions to be evaluated in a Lisp-like format, i.e. where=(operator.add, (operator.getitem, row, v1), 2) or where=(operator.mul, (operator.add, (operator.getitem, row, v2), 2), 3).
This makes it possible to write an apply_lisp function that makes sure that the where function is only made up of ALLOWED_FUNCS or constants like float, int, str.
def apply_lisp(where, rowsym, rowval, ALLOWED_FUNCS):
    # The head of every tuple must be a whitelisted function.
    assert where[0] in ALLOWED_FUNCS
    # apply() no longer exists in Python 3, so call with unpacked arguments.
    return where[0](*[
        (apply_lisp(w, rowsym, rowval, ALLOWED_FUNCS)
         if isinstance(w, tuple)
         else rowval if w is rowsym
         else w if isinstance(w, (float, int, str))
         else None)
        for w in where[1:]])
As an aside, you will also need to check for exact types, because you do not want your types to be overridden by subclasses. So do not use isinstance; use type(w) in (float, int, str). Oh boy, we have run into:
Greenspun's Tenth Rule of Programming: any sufficiently complicated
C or Fortran program contains an ad hoc informally-specified
bug-ridden slow implementation of half of Common Lisp.
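The exact-type check mentioned in the aside above can be illustrated with a short sketch. SneakyStr and is_plain_constant are names invented for this example.

```python
class SneakyStr(str):
    """A str subclass whose behaviour the whitelist never reviewed."""

    def __eq__(self, other):
        return True  # claims to be equal to everything

    __hash__ = str.__hash__


def is_plain_constant(value):
    # Exact type comparison: subclasses (including bool for int) are rejected.
    return type(value) in (float, int, str)
```

isinstance(SneakyStr("x"), str) is true, so an isinstance-based check would let the overridden __eq__ through, whereas is_plain_constant rejects it.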