I have a utilities.py file for my python project. It contains only util functions, for example is_float(string), is_empty(file), etc.
Now I want to have a function is_valid(number), which has to:
read from a file, valid.txt, which contains all numbers which are valid, and load them onto a map/set.
check the map for the presence of number and return True or False.
This function is called often, and running time should be as small as possible. I don't want to read open and read valid.txt everytime the function is called. The only solution I have come up with is to use a global variable, valid_dict, which is loaded once from valid.txt when utilities.py is imported. The loading code is written as main in utilities.py.
My question is how do I do this without using a global variable, as it is considered bad practice? What is a good design pattern for doing such a task without using globals? Also note again that this is a util file, so there should ideally be no main as such, just functions.
The following is a simple example of a closure. The dictionary, cache, is encapsulated within the outer function (load_func), but remains in scope of the inner, even when it is returned. Notice that load_func returns the inner function as an object, it does not call it.
In utilities.py:
def _load_func(filename):
cache = {}
with open(filename) as fn:
for line in fn:
key, value = line.split()
cache[int(key)] = value
def inner(number):
return number in cache
return inner
is_valid = _load_func('valid.txt')
In __main__:
from utilities import is_valid # or something similar
if is_valid(42):
print(42, 'is valid')
else:
print(42, 'is not valid')
The dictionary (cache) creation could have been done using a dictionary comprehension, but I wanted you to concentrate on the closure.
The variable valid_dict would not be global but local to utilities.py. It would only become global if you did something like from utilities import *. Now that is considered bad practice when you're developing a package.
However, I have used a trick in cases like this that essentially requires a static variable: Add an argument valid_dict={} to is_valid(). This dictionary will be instantiated only once and each time the function is called the same dict is available in valid_dict.
def is_valid(number, valid_dict={}):
if not valid_dict:
# first call to is_valid: load valid.txt into valid_dict
# do your check
Do NOT assign to valid_dict in the if-clause but only modify it: e.g., by setting keys valid_dict[x] = y or using something like valid_dict.update(z).
(PS: Let me know if this is considered "dirty" or "un-pythonic".)
Related
I'm a bit confused about a code in the book "Learning Python", p. 539.
As far as I know assignments within a function are only in this local scope. So if I want to change a global one I first have to declare it global. But why does the following code change the builtin.open() to custom completely once called?
import builtins
def makeopen(id):
original = builtins.open
def custom(*pargs, **kargs):
print('Custom open call %r: ' % id, pargs, kargs)
return original(*pargs, **kargs)
builtins.open = custom
If I call makeopen('spam') and a F = open('text.txt') afterwards I get the custom call. So the builtin.open() has been changed in the whole script after the makeopen('spam'). Why?
And if I would make some more makeopen('xx') one builtin.open('text.txt') would print the custom call for every created makeopen. Why?
Comparing this code to
x = 99
def changing():
x = 88
changing()
print(x)
doesnt even help me. Isn't it the same but with an x instead of builtin.open()?
A variable is considered local if you assign to it anywhere in the function, unless you declare it global.
In your first piece of code, you never assign anything to builtins, so it's not considered local. You just change one of its attributes, open.
The rule is respected!
In your second piece of code, you assign something to x in x = 88, so it is considered local.
When you call makeopen, you replace the original, global open with custom. custom, when executed, prints its name and calls the original open.
If you call makeopen a second time, it will create a second, different custom function, and make the name builtins.open refer to it. When you call this function, it will print its name, then call original, which is what builtins.open referred to when it was created - and that is your first custom function, which will print its name and call the original open.
So, successive calls to makeopen create a chain of functions, and calling open will make each of them run and call its predecessor.
Can you run a for loop over the names of multiple subsets?
For instance, I now have subsets dfVC1 up until dfVC20 and I would like to do something like:
for x in range(20):
print(dfVC[x])
I get this doesn't work... but wonder if there is a way to do this.
I'm going to assume your 'subsets' in this case are variables, named dbVC0, dbVC1, etc. Then, your problem is that you want to print all of them by number, but since they're variables, you can't.
One way to solve this would be to change how the 'subsets' are declared. Instead of
dfVC0 = ...
dfVC1 = ...
you could make one dfVC variable that's a dict, that holds all the others at their proper indices.
dfVC = {}
dfVC[0] = ...
dfVC[1] = ...
which would then allow you to access the various dbVC subsets in the way you're currently trying to.
But changing such a large part of the program isn't always possible. What you might be able to do instead is to figure out which object the dfVCs are attached to, and grab them by string.
If they're in the local namespace (i.e. were declared in the same function as you're currently executing in), you can call the built-in locals() to get a dict that you can then try to find your key in:
for x in range(20):
sname = f'dfVC{x}'
print(locals()[sname])
globals() can be used similarly, if your 'subsets' are in the global scope (i.e. declared outside of the current function).
And if your dfVC variables are attached to a class or module (or something else that behaves like a namespace), you can retrieve them using the built-in getattr() function:
for x in range(20):
sname = f'dfVC{x}'
print(getattr(self, sname)) # replace self with whichever object has the dbVC attached to it
True or False
If a function is defined but never called, then Python automatically detects that and issues a warning
One of the issues with this is that functions in Python are first class objects. So their name can be reassigned. For example:
def myfunc():
pass
a = myfunc
myfunc = 42
a()
We also have closures, where a function is returned by another function and the original name goes out of scope.
Unfortunately it is also perfectly legal to define a function with the same name as an existing one. For example:
def myfunc(): # <<< This code is never called
pass
def myfunc():
pass
myfunc()
So any tracking must include the function's id, not just its name - although that won't help with closures, since the id could get reused. It also won't help if the __name__ attribute of the function is reassigned.
You could track function calls using a decorator. Here I have used the name and the id - the id on its own would not be readable.
import functools
globalDict = {}
def tracecall(f):
#functools.wraps(f)
def wrapper(*args, **kwargs):
global globalDict
key = "%s (%d)" % (f.__name__, id(f))
# Count the number of calls
if key in globalDict:
globalDict[key] += 1
else:
globalDict[key] = 1
return f(*args, **kwargs)
return wrapper
#tracecall
def myfunc1():
pass
myfunc1()
myfunc1()
#tracecall
def myfunc1():
pass
a = myfunc1
myfunc1 = 42
a()
print(globalDict)
Gives:
{'myfunc1 (4339565296)': 2, 'myfunc1 (4339565704)': 1}
But that only gives the functions that have been called, not those that have not!
So where to go from here? I hope you can see that the task is quite difficult given the dynamic nature of python. But I hope the decorator I show above could at least allow you to diagnose the way the code is used.
No it is not. Python is not detect this. If you want to detect which functions are called or not during the run time you can use global set in your program. Inside each function add function name to set. Later you can print your set content and check if the the function is called or not.
False. Ignoring the difficulty and overhead of doing this, there's no reason why it would be useful.
A function that is defined in a module (i.e. a Python file) but not called elsewhere in that module might be called from a different module, so that doesn't deserve a warning.
If Python were to analyse all modules that get run over the course of a program, and print a warning about functions that were not called, it may be that a function was not called because of the input in this particular run e.g. perhaps in a calculator program there is a "multiply" function but the user only asked to sum some numbers.
If Python were to analyse all modules that make up a program and note and print a warning about functions that could not possibly be called (this is impossible but stay with me here) then it would warn about functions that were intended for use in other programs. E.g. if you have two calculator programs, a simple one and an advanced one, maybe you have a central calc.py with utility functions, and then advanced functions like exp and log could not possibly be called when that's used as part of simple program, but that shouldn't cause a warning because they're needed for the advanced program.
I'm new to Python and using Anaconda (editor: Spyder) to write some simple functions. I've created a collection of 20 functions and saved them in separate .py files (file names are the same as function names).
For example
def func1(X)
Y=...
return Y
I have another function that takes as input a function name as string (one of those 20 functions), calls it, does some calculations and return the output.
def Main(String,X)
Z=...
W=String(Z)
V=...
return V
How can I choose the function based on string input?
More details:
The Main function calculates the Sobol Indices of a given function. I write the Main function. My colleagues write their own functions (each might be more than 500 lines of codes) and just want to use Main to get the Sobol indices. I will give Main to other people so I do NOT know what Main will get as a function in the future. I also do not want the user of Main to go through the trouble of making a dictionary.
Functions are objects in Python. This means you can store them in dictionaries. One approach is to dispatch the function calls by storing the names you wish to call as keys and the functions as values.
So for example:
import func1, func2
operation_dispatcher = {
"func1": getattr(func1, "func1"),
"func2": getattr(func2, "func2"),
}
def something_calling_funcs(func_name, param):
"""Calls func_name with param"""
func_to_call = operation_dispatcher.get(func_name, None)
if func_to_call:
func_to_call(param)
Now it might be possible to generate the dispatch table more automatically with something like __import__ but there might be a better design in this case (perhaps consider reorganizing your imports).
EDIT took me a minute to fully test this because I had to set up a few files, you can potentially do something like this if you have a lot of names to import and don't want to have to specify each one manually in the dictionary:
import importlib
func_names = ["func1", "func2"]
operation_dispatch = {
name : getattr(importlib.import_module(name), name)
for name in func_names}
#usage
result = operation_dispatch[function_name](param)
Note that this assumes that the function names and module names are the same. This uses importlib to import the module names from the strings provided in func_names here.
You'll just want to have a dictionary of functions, like this:
call = {
"func1": func1,
"functionOne": func1,
"func2": func2,
}
Note that you can have multiple keys for the same function if necessary and the name doesn't need to match the function exactly, as long as the user enters the right key.
Then you can call this function like this:
def Main(String,X)
Z=...
W=call[String](Z)
V=...
return V
Though I recommend catching an error when the user fails to enter a valid key.
def Main(String,X)
Z=...
try:
W=call[String](Z)
except KeyError:
raise(NameError, String + " is not a valid function key")
V=...
return V
I know this must be a trivial question, but I've tried many different ways, and searched quie a bit for a solution, but how do I create and reference subfunctions in the current module?
For example, I am writing a program to parse through a text file, and for each of the 300 different names in it, I want to assign to a category.
There are 300 of these, and I have a list of these structured to create a dict, so of the form lookup[key]=value (bonus question; any more efficient or sensible way to do this than a massive dict?).
I would like to keep all of this in the same module, but with the functions (dict initialisation, etc) at the
end of the file, so I dont have to scroll down 300 lines to see the code, i.e. as laid out as in the example below.
When I run it as below, I get the error 'initlookups is not defined'. When I structure is so that it is initialisation, then function definition, then function use, no problem.
I'm sure there must be an obvious way to initialise the functions and associated dict without keeping the code inline, but have tried quite a few so far without success. I can put it in an external module and import this, but would prefer not to for simplicity.
What should I be doing in terms of module structure? Is there any better way than using a dict to store this lookup table (It is 300 unique text keys mapping on to approx 10 categories?
Thanks,
Brendan
import ..... (initialisation code,etc )
initLookups() # **Should create the dict - How should this be referenced?**
print getlookup(KEY) # **How should this be referenced?**
def initLookups():
global lookup
lookup={}
lookup["A"]="AA"
lookup["B"]="BB"
(etc etc etc....)
def getlookup(value)
if name in lookup.keys():
getlookup=lookup[name]
else:
getlookup=""
return getlookup
A function needs to be defined before it can be called. If you want to have the code that needs to be executed at the top of the file, just define a main function and call it from the bottom:
import sys
def main(args):
pass
# All your other function definitions here
if __name__ == '__main__':
exit(main(sys.argv[1:]))
This way, whatever you reference in main will have been parsed and is hence known already. The reason for testing __name__ is that in this way the main method will only be run when the script is executed directly, not when it is imported by another file.
Side note: a dict with 300 keys is by no means massive, but you may want to either move the code that fills the dict to a separate module, or (perhaps more fancy) store the key/value pairs in a format like JSON and load it when the program starts.
Here's a more pythonic ways to do this. There aren't a lot of choices, BTW.
A function must be defined before it can be used. Period.
However, you don't have to strictly order all functions for the compiler's benefit. You merely have to put your execution of the functions last.
import # (initialisation code,etc )
def initLookups(): # Definitions must come before actual use
lookup={}
lookup["A"]="AA"
lookup["B"]="BB"
(etc etc etc....)
return lookup
# Any functions initLookups uses, can be define here.
# As long as they're findable in the same module.
if __name__ == "__main__": # Use comes last
lookup= initLookups()
print lookup.get("Key","")
Note that you don't need the getlookup function, it's a built-in feature of a dict, named get.
Also, "initialisation code" is suspicious. An import should not "do" anything. It should define functions and classes, but not actually provide any executable code. In the long run, executable code that is processed by an import can become a maintenance nightmare.
The most notable exception is a module-level Singleton object that gets created by default. Even then, be sure that the mystery object which makes a module work is clearly identified in the documentation.
If your lookup dict is unchanging, the simplest way is to just make it a module scope variable. ie:
lookup = {
'A' : 'AA',
'B' : 'BB',
...
}
If you may need to make changes, and later re-initialise it, you can do this in an initialisation function:
def initLookups():
global lookup
lookup = {
'A' : 'AA',
'B' : 'BB',
...
}
(Alternatively, lookup.update({'A':'AA', ...}) to change the dict in-place, affecting all callers with access to the old binding.)
However, if you've got these lookups in some standard format, it may be simpler simply to load it from a file and create the dictionary from that.
You can arrange your functions as you wish. The only rule about ordering is that the accessed variables must exist at the time the function is called - it's fine if the function has references to variables in the body that don't exist yet, so long as nothing actually tries to use that function. ie:
def foo():
print greeting, "World" # Note that greeting is not yet defined when foo() is created
greeting = "Hello"
foo() # Prints "Hello World"
But:
def foo():
print greeting, "World"
foo() # Gives an error - greeting not yet defined.
greeting = "Hello"
One further thing to note: your getlookup function is very inefficient. Using "if name in lookup.keys()" is actually getting a list of the keys from the dict, and then iterating over this list to find the item. This loses all the performance benefit the dict gives. Instead, "if name in lookup" would avoid this, or even better, use the fact that .get can be given a default to return if the key is not in the dictionary:
def getlookup(name)
return lookup.get(name, "")
I think that keeping the names in a flat text file, and loading them at runtime would be a good alternative. I try to stick to the lowest level of complexity possible with my data, starting with plain text and working up to a RDMS (I lifted this idea from The Pragmatic Programmer).
Dictionaries are very efficient in python. It's essentially what the whole language is built on. 300 items is well within the bounds of sane dict usage.
names.txt:
A = AAA
B = BBB
C = CCC
getname.py:
import sys
FILENAME = "names.txt"
def main(key):
pairs = (line.split("=") for line in open(FILENAME))
names = dict((x.strip(), y.strip()) for x,y in pairs)
return names.get(key, "Not found")
if __name__ == "__main__":
print main(sys.argv[-1])
If you really want to keep it all in one module for some reason, you could just stick a string at the top of the module. I think that a big swath of text is less distracting than a huge mess of dict initialization code (and easier to edit later):
import sys
LINES = """
A = AAA
B = BBB
C = CCC
D = DDD
E = EEE""".strip().splitlines()
PAIRS = (line.split("=") for line in LINES)
NAMES = dict((x.strip(), y.strip()) for x,y in PAIRS)
def main(key):
return NAMES.get(key, "Not found")
if __name__ == "__main__":
print main(sys.argv[-1])