New file for each instance creation - python

I have a class that needs to write to a file. My program creates multiple instances of this class, and I want to avoid write collisions. I tried to avoid them by using a static variable so that each instance gets a unique file name, i.e.:
class Foo:
    instance_count = 1

    @staticmethod
    def make():
        # instance_count is an int, so it must be converted before concatenation
        file_name = str(Foo.instance_count) + '-' + 'file.foo'
        Foo.instance_count += 1
        return Foo(file_name)

    def __init__(self, fname):
        self.fname = fname
This works to some extent, but it breaks when instances may be created in parallel. How can I make this more robust?
EDIT:
My use case has this class being created in my app, which is served by gunicorn. So I launch my app with gunicorn with, let's say, 10 workers, so I can't actually manage the communication between them.

You could make use of something like uuid instead, if a unique name is what you are after.
EDIT:
If you want readable but unique names, I would suggest guarding your increment statement above with a lock, so that only one process increments it at any point in time, and perhaps also making the file creation and the increment a single atomic operation.
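A minimal sketch of the uuid idea (the file-name scheme is my own illustration, not from the question):

```python
import uuid

class Foo:
    def __init__(self):
        # uuid4() is random; the chance of two gunicorn workers generating
        # the same name is negligible, so no cross-process coordination is needed.
        self.fname = uuid.uuid4().hex + '-file.foo'
```

The trade-off is that the names are unique but not sequential or human-friendly.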

Easy: just use another text file to keep the filenames and the number/code you need to identify them. You can use JSON, pickle, or just your own format.
In the __init__ method, you can read that file. Then, make a new file based on the information you get.
File example:
File1.txt,1
File2.txt,2
And the __init__ function:
def __init__(self):
    # grab the number recorded on the last line of the tracking file
    with open('counter.txt', 'r') as tracker:
        counter = int(tracker.read().strip().split('\n')[-1].split(',')[-1])
    with open("File%d.txt" % (counter + 1), "w") as new_file:
        pass  # do things
    # don't forget to record the new file's information in counter.txt

Really, what I was looking for was a way to avoid write contention. Then I realized: why not just use the logger? It might be a wrong assumption, but I would imagine the logging module takes care of locking files for writing. Plus, it flushes on every write, so it meets that requirement. As for speed, the overhead definitely does not affect me in this case.
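A rough sketch of that logging approach (the logger naming and directory handling are my own; note that stdlib logging is thread-safe within one process, so separate gunicorn workers still want per-process filenames as shown here):

```python
import logging
import os

def make_worker_logger(log_dir):
    """Return a logger that writes to a file unique to this process."""
    name = 'foo-%d' % os.getpid()  # one file per gunicorn worker
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.FileHandler(os.path.join(log_dir, name + '.log'))
        handler.setFormatter(logging.Formatter('%(asctime)s %(message)s'))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

FileHandler flushes after each record, which covers the flush-on-every-write requirement mentioned above.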
The other solution I found was to use the tempfile module. This creates a unique file for each instantiation of the class.
import tempfile as tf

class Foo:
    @staticmethod
    def make():
        # NamedTemporaryFile returns an open file object with a unique name,
        # not just a name string
        file_name = tf.NamedTemporaryFile('w+', suffix="foofile", dir="foo/dir")
        return Foo(file_name)

    def __init__(self, fname):
        self.fname = fname

Related

Using a classmethod to retrieve or load data on init

I have a time-consuming database lookup (downloads data from online) which I want to avoid doing constantly, so I would like to pickle the data if I don't already have it.
This data is being used by the class which has this classmethod.
Is this a 'proper' or expected use of a classmethod? I feel like I could fairly easily refactor it to be an instance method but it feels like it should be a classmethod due to what it's doing. Below is a mockup of the relevant parts of the class.
import os
import pickle

class Example:
    def __init__(self):
        self.records = self.get_records()

    @classmethod
    def get_records(cls):
        """
        If the records aren't already downloaded from the server,
        get them and add to a pickle file.
        Otherwise, just load the pickle file.
        """
        if not os.path.exists('records.pkl'):
            # Slow request
            records = get_from_server()
            with open('records.pkl', 'wb') as rec_file:
                pickle.dump(records, rec_file)
        else:
            with open('records.pkl', 'rb') as rec_file:
                records = pickle.load(rec_file)
        return records

    def use_records(self):
        for item in self.records:
            ...
Is there also an easy way to refactor this so that I can retrieve the data on request, even if the pickle file exists? Is that as simple as just adding another argument to the classmethod?
Thanks for any help.
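For what it's worth, the "extra argument" idea from the last paragraph could look like this (the refresh parameter name and the get_from_server stub are my own, not from the question):

```python
import os
import pickle

def get_from_server():
    # Stand-in for the question's slow download.
    return ['record-a', 'record-b']

class Example:
    def __init__(self, refresh=False):
        self.records = self.get_records(refresh=refresh)

    @classmethod
    def get_records(cls, refresh=False):
        """Load cached records unless a refresh is forced."""
        if refresh or not os.path.exists('records.pkl'):
            records = get_from_server()  # slow request
            with open('records.pkl', 'wb') as rec_file:
                pickle.dump(records, rec_file)
        else:
            with open('records.pkl', 'rb') as rec_file:
                records = pickle.load(rec_file)
        return records
```

Example(refresh=True) then bypasses the cache while still rewriting it for later runs.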

Python classes - run method when any other method is called

I am writing a Python app which will use a config file, so I am delegating the control of the config file to a dedicated module, configmanager, and within it a class, ConfigManager.
Whenever a method within ConfigManager that changes my config file in some way is run, I need to get the latest version of the file from disk. Of course, in the spirit of DRY, I should delegate the opening of the config file to its own function.
However, I feel as though explicitly calling a method to get and return the config file in each function that edits it is not very "clean".
Is there a recommended way in Python to run a method, and make a value available to other methods in a class, whenever and before a method is run in that class?
In other words:
I create ConfigManager.edit_config().
Whenever ConfigManager.edit_config() is called, another function ConfigManager.get_config_file() is run.
ConfigManager.get_config_file() makes a value available to the method ConfigManager.edit_config().
And ConfigManager.edit_config() now runs, having access to the value given by ConfigManager.get_config_file().
I expect to have many versions of edit_config() methods in ConfigManager, hence the desire to DRY my code.
Is there a recommended way of accomplishing something like this? Or should I just create a function to get the config file, and manually call it each time?
The natural way to have:
ConfigManager.get_config_file() makes a value available to the method
ConfigManager.edit_config().
is to have get_config_file() return that value.
Just call get_config_file() within edit_config().
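In other words (a minimal sketch; the config contents here are placeholders, not from the question):

```python
class ConfigManager:
    def get_config_file(self):
        # In the real class this would re-read the file from disk;
        # a literal dict stands in for it here.
        return {'theme': 'dark'}

    def edit_config(self, key, value):
        config = self.get_config_file()  # always fetch the latest version
        config[key] = value
        return config
```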
If there are going to be many versions of edit_config(), then a decorator might be the way to go:
def config_editor(func):
    def wrapped(self, *args, **kwargs):
        config_file = self.get_config_file()
        return func(self, config_file, *args, **kwargs)
    return wrapped

class ConfigManager:
    .
    .
    .
    @config_editor
    def edit_config1(self, config_file, arg1):
        ...

    @config_editor
    def edit_config2(self, config_file, arg1, arg2):
        ...

mgr = ConfigManager()
mgr.edit_config1(arg1)
I don't actually like this:
Firstly, the declaration of edit_config1 takes one more argument than the actual usage needs (because the decorator supplies the additional argument).
Secondly, it doesn't actually save all that much boilerplate over:
def edit_config3(self, arg1):
    config_file = self.get_config_file()
    ...
In conclusion, I don't think the decorators save enough repetition to be worth it.
Since you get something from disk, you open a file. So, you could use the class with Python's with statement.
You should look into context managers. With one, you can implement the functionality you want each time someone accesses the config file, through the __enter__ method, and (if needed) implement the clean-up for releasing the resource in the __exit__ method.
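A sketch of that context-manager idea (the file path and JSON format are assumptions on my part):

```python
import json
import os

class ConfigFile:
    """Reads the config on entry, writes it back on exit (a sketch)."""
    def __init__(self, path='config.json'):
        self.path = path

    def __enter__(self):
        # Always load the latest version from disk
        if os.path.exists(self.path):
            with open(self.path) as fh:
                self.data = json.load(fh)
        else:
            self.data = {}
        return self.data

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:  # persist only when the block succeeded
            with open(self.path, 'w') as fh:
                json.dump(self.data, fh)
        return False
```

Usage would be `with ConfigFile() as cfg: cfg['key'] = 'value'`, and every edit method gets a fresh read for free.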

Variable behaviour in python - making more efficient code

Trying to do some optimization here on a class. We're trying not to change the class definitions too much. In essence, we are instantiating ClassA N times, but one of the methods has a nasty file read.
for x in range(0, N):
    cl = ClassA()
    cl.dostuff(x)
The class looks like this:
class ClassA:
    def dostuff(self, x):
        # open nasty file here
        nastyfile = open()
        # do something else
We could bring that file read out of the class and put it before the loop, as the file will not change. But is there a way we can ensure that we only ever open the nasty file once across all instances of the class? I.e., so that on the first instantiation of the class it is defined for all future instances without having to be read in again. Is there a way to do this without really changing the structure of the existing code base too much?
One question relates to the interpreter: is Python smart enough to cache variables such as nastyfile so that we can leave the code as it is, or is the quick and dirty solution the following:
nastyfile = open()
for x in range(0, N):
    cl = ClassA()
    cl.dostuff(x)
Looking for a pythonic way to do this.
You could encapsulate opening the file in a classmethod.
class ClassA():
    @classmethod
    def open_nasty_file(cls):
        cls.nasty_file = open('file_path', 'file_mode')

    def do_stuff(self):
        if not hasattr(self, 'nasty_file'):
            self.open_nasty_file()
This approach relies on the fact that attribute look-ups will try finding the attribute on the class if not found on the instance.
You could put this check/instantiation in the __init__ function if you want it opened when the first instance is instantiated.
Note that this method will leave the file open, so it will need to be closed at some point.
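One way to handle that cleanup (a hedged sketch; the atexit registration and the path parameter are my own additions, not part of the answer above):

```python
import atexit

class ClassA:
    @classmethod
    def open_nasty_file(cls, path):
        cls.nasty_file = open(path)
        # also close it automatically when the interpreter exits
        atexit.register(cls.close_nasty_file)

    @classmethod
    def close_nasty_file(cls):
        f = getattr(cls, 'nasty_file', None)
        if f is not None and not f.closed:
            f.close()
```

The closed check makes close_nasty_file safe to call more than once (e.g. manually and then again at exit).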
You could have a class method that opens the file when the first instance asks for it. I've wrapped it in a lock so that it is thread safe.
import threading

class ClassA:
    _nasty_file = None
    _nasty_file_lock = threading.Lock()

    def dostuff(self, x):
        # open nasty file here
        nastyfile = self.get_nasty_file()
        # do something else

    @classmethod
    def get_nasty_file(cls):
        with cls._nasty_file_lock:
            if cls._nasty_file is None:
                with open('nastyfile') as fp:
                    cls._nasty_file = fp.read()
            return cls._nasty_file
Instances can access and modify class attributes by themselves. So you can just set up an attribute on the class and provide it with a default (None) value, and then check for that value before doing anything in dostuff. Example:
class A():
    nastyfileinfo = None

    def dostuff(self, x):
        if A.nastyfileinfo:
            print('nastyfileinfo already exists:', A.nastyfileinfo)
        if not A.nastyfileinfo:
            print('Adding nastyfileinfo')
            A.nastyfileinfo = 'This is really nasty'  ## open()
        print('>>>nastyfileinfo:', A.nastyfileinfo)
        ## Continue doing your other stuff involving x

for j in range(0, 10):
    A().dostuff(j)
nastyfileinfo is also considered an attribute of the instance, so you can reference it with instance.nastyfileinfo, however if you modify it there it will only update for that one specific instance, whereas if you modify it on the class, all other instances will be able to see it (provided they didn't change their personal/self reference to nastyfileinfo).
instants = []
for j in range(0, 10):
    instants.append(A())
for instance in instants:
    print(instance.nastyfileinfo)
instants[5].dostuff(5)
for instance in instants:
    print(instance.nastyfileinfo)

Class decorator to auto-update properties dictionary on disk?

I am working on a project where I have a number of custom classes to interface with a varied collection of data on a user's system. These classes only have properties as user-facing attributes. Some of these properties are decently resource intensive, so I want to only run the generation code once, and store the returned value on disk (cache it, that is) for faster retrieval on subsequent runs. As it stands, this is how I am accomplishing this:
import functools

def stored_property(func):
    """This ``decorator`` adds on-disk functionality to the `property`
    decorator. This decorator is also a Method Decorator.

    Each key property of a class is stored in a settings JSON file with
    a dictionary of property names and values (e.g. :class:`MyClass`
    stores its properties in `my_class.json`).
    """
    @property
    @functools.wraps(func)
    def func_wrapper(self):
        print('running decorator...')
        try:
            var = self.properties[func.__name__]
            if var:
                # property already written to disk
                return var
            else:
                # property written to disk as `null`
                return func(self)
        except AttributeError:
            # `self.properties` does not yet exist
            return func(self)
        except KeyError:
            # `self.properties` exists, but property is not a key
            return func(self)
    return func_wrapper
class MyClass(object):
    def __init__(self, wf):
        self.wf = wf
        self.properties = self._properties()

    def _properties(self):
        # get name of class in underscore format
        class_name = convert(self.__class__.__name__)
        # `self.wf` is a library object (used in Alfred workflows) for
        # interacting with data stored on disk
        properties = self.wf.stored_data(class_name)
        # if no file on disk, or one of the properties has a null value
        if properties is None or None in properties.values():
            # get names of all properties of this class
            propnames = [k for (k, v) in self.__class__.__dict__.items()
                         if isinstance(v, property)]
            properties = dict()
            for prop in propnames:
                # generate dictionary of property names and values
                properties[prop] = getattr(self, prop)
            # use the external library to save that dictionary to disk as JSON
            self.wf.store_data(class_name, properties,
                               serializer='json')
        # return either the data read from file, or data generated in situ
        return properties

    # this decorator ensures that the generating code is only run if necessary
    @stored_property
    def only_property(self):
        # some code to get data
        return 'this is my property'
This code works precisely as I need it, but it still forces me to manually add the _properties(self) method to each class wherein I need this functionality (currently, I have 3). What I want is a way to "insert" this functionality into any class I please. I think that a Class Decorator could get this job done, but try as I might, I can't quite figure out how to wrangle it.
For the sake of clarity (and in case a decorator is not the best way to get what I want), I will try to explain the overall functionality I am after. I want to write a class that contains some properties. The values of these properties are generated via code of varying complexity (in one instance, I'm searching for a certain app's pref file, then searching for 3 different preferences (any of which may or may not exist) and determining the best single result from those preferences). I want the body of the properties' code only to contain the algorithm for finding the data.

But I don't want to run that algorithmic code each time I access the property. Once I generate the value, I want to write it to disk and then simply read that on all subsequent calls. However, I don't want each value written to its own file; I want a dictionary of all the values of all the properties of a single class to be written to one file (so, in the example above, my_class.json would contain a JSON dictionary with one key-value pair).

When accessing a property directly, it should first check to see if it already exists in the dictionary on disk. If it does, simply read and return that value. If it exists but has a null value, try to run the generation code (i.e. the code actually written in the property method) and see if you can find it now (if not, the method will return None and that will once again be written to file). If the dictionary exists and that property is not a key (my current code doesn't really make this possible, but better safe than sorry), run the generation code and add the key-value pair. If the dictionary doesn't exist (i.e. on the first instantiation of the class), run all generation code for all properties and create the JSON file. Ideally, the code would be able to update one property in the JSON file without rerunning all of the generation code (i.e. running _properties() again).
I know this is a bit peculiar, but I need the speed, human-readable content, and elegant code all together. I would really rather not compromise on my goal. Hopefully, the description of what I want is clear enough. If not, let me know in a comment what doesn't make sense and I will try to clarify. But I do think that a Class Decorator could probably get me there (essentially by inserting the _properties() method into any class, running it on instantiation, and mapping its value to the properties attribute of the class).
Maybe I'm missing something, but it doesn't seem that your _properties method is specific to the properties that a given class has. I'd put it in a base class and have each of your classes with @stored_property methods subclass that. Then you don't need to duplicate the _properties method.
class PropertyBase(object):
    def __init__(self, wf):
        self.wf = wf
        self.properties = self._properties()

    def _properties(self):
        # As before...

class MyClass(PropertyBase):
    @stored_property
    def expensive_to_calculate(self):
        # Calculate it here
If for some reason you can't subclass PropertyBase directly (maybe you already need to have a different base class), you can probably use a mixin. Failing that, make _properties accept an instance/class and a workflow object and call it explicitly in __init__ for each class.
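The mixin variant might look like the following hedged sketch (FakeWorkflow is my stand-in for the Alfred workflow object, and the _properties body is a simplified version of the question's; in real code it would mirror the original exactly):

```python
class FakeWorkflow:
    """Stand-in for the Alfred workflow object (assumed dict-backed store)."""
    def __init__(self):
        self._store = {}
    def stored_data(self, name):
        return self._store.get(name)
    def store_data(self, name, data, serializer='json'):
        self._store[name] = data

class StoredPropertyMixin:
    """Supplies _properties() without dictating the rest of the class hierarchy."""
    def _properties(self):
        class_name = type(self).__name__.lower()
        props = self.wf.stored_data(class_name)
        if props is None or None in props.values():
            # collect every property defined on the class and evaluate it once
            propnames = [k for k, v in vars(type(self)).items()
                         if isinstance(v, property)]
            props = {p: getattr(self, p) for p in propnames}
            self.wf.store_data(class_name, props, serializer='json')
        return props

class MyClass(StoredPropertyMixin):
    def __init__(self, wf):
        self.wf = wf
        self.properties = self._properties()

    @property
    def only_property(self):
        return 'this is my property'
```

Because the mixin only contributes _properties(), MyClass is free to inherit from whatever base class it already needs.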

Module organization, inheritance, and #classmethods

I'm trying to write a class that works kind of like the builtins and some of the other "grown-up" Python stuff I've seen. My Pythonic education is a little spotty, classes-wise, and I'm worried I've got it all mixed up.
I'd like to create a class that serves as a kind of repository, containing a dictionary of unprocessed files (and their names), and a dictionary of processed files (and their names). I'd like to implement some other (sub?)classes that handle things like opening and processing the files. The file handling classes should be able to update the dictionaries in the main class. I'd also like to be able to directly call the various submodules without having to separately instantiate everything, e.g.:
import Pythia
p = Pythia()
p.FileManager.addFile("/path/to/some/file")
or even
Pythia.FileManager.addFile("/path/to/some/file")
I've been looking around at stuff about @classmethod and super and such, but I can't say I entirely understand it. I'm also beginning to suspect that I might have the whole chain of inheritance backwards: that what I think of as my main class should actually be the child class of the handling and processing classes. I'm also wondering whether this would all work better as a package, but that's a separate, very intimidating issue.
Here's my code so far:
#!/usr/bin/python
import re
import os

class Pythia(object):
    def __init__(self):
        self.raw_files = {}
        self.parsed_files = {}
        self.FileManger = FileManager()

    def listf(self,fname,f):
        if fname in self.raw_files.keys():
            _isRaw = "raw"
        elif fname in self.parsed_files.keys():
            _isRaw = "parsed"
        else:
            return "Error: invalid file"
        print "{} ({}):{}...".format(fname,_isRaw,f[:100])

    def listRaw(self,n=None):
        max = n or len(self.raw_files.items())
        for item in self.raw_files.items()[:max]:
            listf(item[0],item[1])

    def listParsed(self,n=None):
        max = n or len(self.parsed_files.items())
        for item in self.parsed_files.items()[:max]:
            listf(item[0],item[1])

class FileManager(Pythia):
    def __init__(self):
        pass

    def addFile(self,f,name=None,recurse=True,*args):
        if name:
            fname = name
        else:
            fname = ".".join(os.path.basename(f).split(".")[:-1])
        if os.path.exists(f):
            if not os.path.isdir(f):
                with open(f) as fil:
                    Pythia.raw_files[fname] = fil.read()
            else:
                print "{} seems to be a directory.".format(f)
                if recurse == False:
                    return "Stopping..."
                elif recurse == True:
                    print "Recursively navingating directory {}".format(f)
                    addFiles(dir,*args)
                else:
                    recurse = raw_input("Recursively navigate through directory {}? (Y/n)".format(f))
                    if recurse[0].lower() == "n":
                        return "Stopping..."
                    else:
                        addFiles(dir,*args)
        else:
            print "Error: file or directory not found at {}".format(f)

    def addFiles(self,directory=None,*args):
        if directory:
            self._recursivelyOpen(directory)

        def argHandler(arg):
            if isinstance(arg,str):
                self._recursivelyOpen(arg)
            elif isinstance(arg,tuple):
                self.addFile(arg[0],arg[1])
            else:
                print "Warning: {} is not a valid argument...skipping..."
                pass

        for arg in args:
            if not isinstance(arg,(str,dict)):
                if len(arg) > 2:
                    for subArg in arg:
                        argHandler(subArg)
                else:
                    argHandler(arg)
            elif isinstance(arg,dict):
                for item in arg.items():
                    argHandler(item)
            else:
                argHandler(arg)

    def _recursivelyOpen(self,f):
        if os.path.isdir(f):
            l = [os.path.join(f,x) for x in os.listdir(f) if x[0] != "."]
            for x in l:
                _recursivelyOpen(x)
        else:
            addFile(f)
First off: follow PEP8's guidelines. Module names, variable names, and function names should be lowercase_with_underscores; only class names should be CamelCase. Following your code is a little difficult otherwise. :)
You're muddying up OO concepts here: you have a parent class that contains an instance of a subclass.
Does a FileManager do mostly what a Pythia does, with some modifications or extensions? Given that the two only work together, I'd guess not.
I'm not quite sure what you ultimately want this to look like, but I don't think you need inheritance at all. FileManager can be its own class, self.file_manager on a Pythia instance can be an instance of FileManager, and then Pythia can delegate to it if necessary. That's not far from how you're using this code already.
Build small, independent pieces, then worry about how to plug them into each other.
Also, some bugs and style nits:
You call _recursivelyOpen(x) but forgot the self..
Single space after commas.
Watch out for max as a variable name: it's also the name of a builtin function.
Avoid type-checking (isinstance) if you can help it. It's extra-hard to follow your code when it does a dozen different things depending on argument types. Have very clear argument types, and create helper functions that accept different arguments if necessary.
You have Pythia.raw_files[fname] inside FileManager, but Pythia is a class, and it doesn't have a raw_files attribute anyway.
You check if recurse is True, then False, then... something else. When is it something else? Also, you should use is instead of == for testing against the builtin singletons like this.
There is a lot here and you are probably best to educate yourself some more.
For your intended usage:
import Pythia
p = Pythia()
p.file_manager.addFile("/path/to/some/file")
A class structure like this would work:
class FileManager(object):
    def __init__(self, parent):
        self.parent = parent

    def addFile(self, file):
        # Your code
        self.parent.raw_files[file] = file

    def addFiles(self, files):
        # Your code
        for file in files:
            self.parent.raw_files[file] = file

class Pythia(object):
    def __init__(self):
        self.raw_files = {}
        self.file_manager = FileManager(self)
However, there are a lot of options. You should write some client code first to work out what you want, then implement your classes/objects to match that. I don't tend to ever use inheritance in Python; it is not really required, due to Python's duck typing.
Also, if you want a method to be callable without instantiating the class, use staticmethod, not classmethod. For example:
class FileManager(object):
    @staticmethod
    def addFiles(files):
        pass
