I'm trying to write a class that works kind of like the builtins and some of the other "grown-up" Python stuff I've seen. My Pythonic education is a little spotty, classes-wise, and I'm worried I've got it all mixed up.
I'd like to create a class that serves as a kind of repository, containing a dictionary of unprocessed files (and their names), and a dictionary of processed files (and their names). I'd like to implement some other (sub?)classes that handle things like opening and processing the files. The file handling classes should be able to update the dictionaries in the main class. I'd also like to be able to directly call the various submodules without having to separately instantiate everything, e.g.:
import Pythia
p = Pythia()
p.FileManager.addFile("/path/to/some/file")
or even
Pythia.FileManager.addFile("/path/to/some/file")
I've been looking around at stuff about @classmethod and super and such, but I can't say I entirely understand it. I'm also beginning to suspect that I might have the whole chain of inheritance backwards--that what I think of as my main class should actually be the child class of the handling and processing classes. I'm also wondering whether this would all work better as a package, but that's a separate, very intimidating issue.
Here's my code so far:
#!/usr/bin/python
import re
import os

class Pythia(object):
    def __init__(self):
        self.raw_files = {}
        self.parsed_files = {}
        self.FileManger = FileManager()
    def listf(self,fname,f):
        if fname in self.raw_files.keys():
            _isRaw = "raw"
        elif fname in self.parsed_files.keys():
            _isRaw = "parsed"
        else:
            return "Error: invalid file"
        print "{} ({}):{}...".format(fname,_isRaw,f[:100])
    def listRaw(self,n=None):
        max = n or len(self.raw_files.items())
        for item in self.raw_files.items()[:max]:
            listf(item[0],item[1])
    def listParsed(self,n=None):
        max = n or len(self.parsed_files.items())
        for item in self.parsed_files.items()[:max]:
            listf(item[0],item[1])

class FileManager(Pythia):
    def __init__(self):
        pass
    def addFile(self,f,name=None,recurse=True,*args):
        if name:
            fname = name
        else:
            fname = ".".join(os.path.basename(f).split(".")[:-1])
        if os.path.exists(f):
            if not os.path.isdir(f):
                with open(f) as fil:
                    Pythia.raw_files[fname] = fil.read()
            else:
                print "{} seems to be a directory.".format(f)
                if recurse == False:
                    return "Stopping..."
                elif recurse == True:
                    print "Recursively navingating directory {}".format(f)
                    addFiles(dir,*args)
                else:
                    recurse = raw_input("Recursively navigate through directory {}? (Y/n)".format(f))
                    if recurse[0].lower() == "n":
                        return "Stopping..."
                    else:
                        addFiles(dir,*args)
        else:
            print "Error: file or directory not found at {}".format(f)
    def addFiles(self,directory=None,*args):
        if directory:
            self._recursivelyOpen(directory)
        def argHandler(arg):
            if isinstance(arg,str):
                self._recursivelyOpen(arg)
            elif isinstance(arg,tuple):
                self.addFile(arg[0],arg[1])
            else:
                print "Warning: {} is not a valid argument...skipping..."
                pass
        for arg in args:
            if not isinstance(arg,(str,dict)):
                if len(arg) > 2:
                    for subArg in arg:
                        argHandler(subArg)
                else:
                    argHandler(arg)
            elif isinstance(arg,dict):
                for item in arg.items():
                    argHandler(item)
            else:
                argHandler(arg)
    def _recursivelyOpen(self,f):
        if os.path.isdir(f):
            l = [os.path.join(f,x) for x in os.listdir(f) if x[0] != "."]
            for x in l:
                _recursivelyOpen(x)
        else:
            addFile(f)
First off: follow PEP8's guidelines. Module names, variable names, and function names should be lowercase_with_underscores; only class names should be CamelCase. Following your code is a little difficult otherwise. :)
You're muddying up OO concepts here: you have a parent class that contains an instance of a subclass.
Does a FileManager do mostly what a Pythia does, with some modifications or extensions? Given that the two only work together, I'd guess not.
I'm not quite sure what you ultimately want this to look like, but I don't think you need inheritance at all. FileManager can be its own class, self.file_manager on a Pythia instance can be an instance of FileManager, and then Pythia can delegate to it if necessary. That's not far from how you're using this code already.
Build small, independent pieces, then worry about how to plug them into each other.
Also, some bugs and style nits:
You call _recursivelyOpen(x) but forgot the self..
Single space after commas.
Watch out for max as a variable name: it's also the name of a builtin function.
Avoid type-checking (isinstance) if you can help it. It's extra-hard to follow your code when it does a dozen different things depending on argument types. Have very clear argument types, and create helper functions that accept different arguments if necessary.
You have Pythia.raw_files[fname] inside FileManager, but Pythia is a class, and it doesn't have a raw_files attribute anyway.
You check if recurse is True, then False, then... something else. When is it something else? Also, you should use is instead of == for testing against the builtin singletons like this.
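One way to make the three-way recurse flag explicit is to treat None as "ask the user" and compare with is. This is only a sketch: should_recurse and its ask parameter are hypothetical names, not part of the original code, and the prompt function is injected so the logic can be exercised without a terminal.

```python
def should_recurse(recurse=None, ask=input):
    """Resolve a tri-state flag: True, False, or None meaning "ask the user"."""
    if recurse is True:
        return True
    if recurse is False:
        return False
    # Anything else (None in this sketch) falls through to a prompt
    answer = ask("Recursively navigate through directory? (Y/n) ")
    return not answer.strip().lower().startswith("n")


print(should_recurse(True))                          # True
print(should_recurse(None, ask=lambda prompt: "n"))  # False
```

Using is also avoids a surprise with ==: the integer 1 compares equal to True, so recurse == True would accept 1 as well.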
There is a lot here and you are probably best to educate yourself some more.
For your intended usage:
import Pythia
p = Pythia()
p.file_manager.addFile("/path/to/some/file")
A class structure like this would work:
class FileManager(object):
    def __init__(self, parent):
        # Keep a reference to the owning Pythia so its dicts can be updated
        self.parent = parent

    def addFile(self, file):
        # Your code
        self.parent.raw_files[file] = file

    def addFiles(self, files):
        # Your code
        for file in files:
            self.parent.raw_files[file] = file

class Pythia(object):
    def __init__(self):
        self.raw_files = {}
        self.parsed_files = {}
        self.file_manager = FileManager(self)
However, there are a lot of options. You should write some client code first to work out what you want, then implement your classes and objects to match that. I rarely use inheritance in Python; it is not really required, thanks to Python's duck typing.
Also if you want a method to be called without instantiating the class use staticmethod, not classmethod. For example:
class FileManager(object):
    @staticmethod
    def addFiles(files):
        pass
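As a hedged illustration of how the two decorators differ: both can be called on the class itself without creating an instance, but a staticmethod receives nothing implicitly while a classmethod receives the class. The names base_name, add_file and the class-level raw_files dict are assumptions made for this sketch only.

```python
class FileManager(object):
    # Class-level storage shared by every caller (illustrative only)
    raw_files = {}

    @staticmethod
    def base_name(path):
        # No access to the class or an instance: just a namespaced function
        return path.rsplit("/", 1)[-1]

    @classmethod
    def add_file(cls, path):
        # Receives the class itself, so it can read and update class state
        cls.raw_files[cls.base_name(path)] = path
        return cls.raw_files


FileManager.add_file("/path/to/some/file")
print(FileManager.raw_files)  # {'file': '/path/to/some/file'}
```

If the method needs to touch shared state, as add_file does here, classmethod is the natural fit; staticmethod suits helpers that only use their arguments.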
So I have created a directory structure from scratch. I am pretty new to python. This is my code
class File:
    def __init__(self, name, contents = [], owner = "No owner currently defined"):
        self.name = name
        self.contents = contents
        self.owner = owner
    def CreateOwner(self, new_owner):
        self.owner = new_owner

class Directory(File):
    def __repr__(self):
        return f"Directory({self.name}, {self.contents})"

class PlainFile(File):
    def __init__(self, name, owner = "No owner currently defined"):
        self.name = name
        self.owner = owner
I have made this directory structure
root = Directory("root",
[PlainFile("boot.exe"),
Directory("home", [
Directory("Julius",
[PlainFile("dog.jpg"),
PlainFile("homework.txt")]),
Directory("Cameron", [PlainFile("Elephant.jpg")])])])
And I want to make a function that will recursively print the names of each directory with their subdirectories and files directly underneath them but indented to the right to show that they come from the certain directory. And so each time a new directory is opened from within a directory a new indent is made. I really don't know how to. I have tried for loops and while loops but can't get it to work, I don't even know how to make indentations for opening new directories. Please help :(
Welcome to the world of python and stackoverflow! Good news is that the solution to your problem is very straight-forward. First off, some quick but important comments on your working code:
contents: should probably only be an attribute of the Directory subclass. Also, mutable default arguments are bad practice.
CreateOwner: In python, the convention is to write snake_case instead of CamelCase for everything that isn't class names.
owner: The normal way of implementing a default unset field, even string ones, is usually None.
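To see why a mutable default is risky, here is a small sketch comparing a shared default list with the None sentinel. The function names append_shared and append_fresh are made up for this example.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, at definition time, and then reused
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # None is a sentinel: build a new list on every call that omits bucket
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared("a")
second = append_shared("b")
print(second)           # ['a', 'b']
print(first is second)  # True: both calls returned the same list

print(append_fresh("a"), append_fresh("b"))  # ['a'] ['b']
```

In the File class above, every Directory created without an explicit contents argument would share one list in the same way.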
Here's one way to solve your problem: Have a recursive function with 2 parameters: directory and indentation level. After printing the current Directory, check what you are going to do with each contents element with isinstance(). A Directory requires a recursive call with incremented indent, while a PlainFile is simply printed out. Correctly solved, you will get something like this:
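A minimal sketch of that recursive approach, using simplified stand-ins for the Directory and PlainFile classes. The tree_lines name, the four-space indent, and returning a list of lines instead of printing directly are choices made here, not part of the question.

```python
class PlainFile:
    def __init__(self, name):
        self.name = name


class Directory:
    def __init__(self, name, contents=None):
        self.name = name
        self.contents = contents if contents is not None else []


def tree_lines(directory, level=0):
    """Return one indented line per entry, depth-first."""
    lines = ["    " * level + directory.name]
    for entry in directory.contents:
        if isinstance(entry, Directory):
            # A directory recurses with one more level of indentation
            lines.extend(tree_lines(entry, level + 1))
        else:
            lines.append("    " * (level + 1) + entry.name)
    return lines


root = Directory("root", [
    PlainFile("boot.exe"),
    Directory("home", [
        Directory("Julius", [PlainFile("dog.jpg"), PlainFile("homework.txt")]),
        Directory("Cameron", [PlainFile("Elephant.jpg")]),
    ]),
])
print("\n".join(tree_lines(root)))
# root
#     boot.exe
#     home
#         Julius
#             dog.jpg
#             homework.txt
#         Cameron
#             Elephant.jpg
```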
Turning the function into a Directory method is something I'll leave as an exercise to the reader.
Hint:
One way of doing this is to have two separate methods: the one used from the outside is a shortcut that passes the current instance to a second method, which holds the main logic and isn't instance-bound. Bonus points for making the latter a classmethod.
Try now and see if it goes better. Comment here if you get stuck again. I have coded a solution which works, but trying yourself first is always better for understanding.
Hopefully you are comfortable with recursion. This is just some sample code for you to build on. For example, I have not used your repr overload; you can add that easily.
(On a different note: in your code you inherit from File for the Directory and PlainFile classes but do not call the parent constructor. With some cleanup there, you could simplify this code too.)
def rec_print(file_obj: File, level):
    try:
        contents = file_obj.contents
    except AttributeError:
        # PlainFile never sets .contents, so there is nothing to descend into
        return
    for obj in contents:
        indents = "\t" * level
        print(f"{indents}{obj.name}")
        rec_print(obj, level + 1)
I just found some test methods in a project which did not have the required "test_" prefix to ensure that they are actually run. It should be possible to avoid this with a bit of linting:
Find all TestCase assertion calls in the code base.
Look for a method with a name starting with "test_" in the call hierarchy.
If there is no such method, print an error message.
I'm wondering how to do the first two, which basically boil down to one problem: how do I find all calls to a specific method in my code base?
Grepping or other text searches won't do, because I need to introspect the results and find parent methods etc. until I either get to the test method or there are no more callers. I need to get a reference to the method to avoid matching methods which happen to have the same name as the ones I'm looking for.
There are 2 possible approaches here.
Static approach:
You could parse the code base using the ast module to identify all function calls and consistently store the origin and the target of each call. You would have to identify all class and function definitions to keep track of the current context of each call. The limitation is that when you call instance methods, there is no simple way to identify which class the method actually belongs to. The same applies if you use variables that refer to modules.
Here is a Visitor subclass that can read Python source files and build a dict {caller: callee}:
import ast
import collections
import os

class CallMapper(ast.NodeVisitor):
    def __init__(self):
        self.ctx = []    # stack of ('M'|'C'|'F', name) for the current scope
        self.funcs = []  # dotted names of every function definition seen
        self.calls = collections.defaultdict(set)  # caller -> set of callees

    def process(self, filename):
        # Drop the ".py" suffix to get the module name
        self.ctx = [('M', os.path.basename(filename)[:-3])]
        with open(filename) as source:
            tree = ast.parse(source.read(), filename)
        self.visit(tree)
        self.ctx.pop()

    def visit_ClassDef(self, node):
        print('ClassDef', node.name, node.lineno, self.ctx)
        self.ctx.append(('C', node.name))
        self.generic_visit(node)
        self.ctx.pop()

    def visit_FunctionDef(self, node):
        print('FunctionDef', node.name, node.lineno, self.ctx)
        self.ctx.append(('F', node.name))
        self.funcs.append('.'.join([elt[1] for elt in self.ctx]))
        self.generic_visit(node)
        self.ctx.pop()

    def visit_Call(self, node):
        print('Call', vars(node.func), node.lineno, self.ctx)
        try:
            id = node.func.id
        except AttributeError:
            # Attribute call such as obj.method(): the receiver's class is unknown
            id = '*.' + node.func.attr
        self.calls['.'.join([elt[1] for elt in self.ctx])].add(id)
        self.generic_visit(node)
Dynamic approach:
If you really want to identify which method is called when more than one could share the same name, you will have to use a dynamic approach. You would decorate individual functions or all methods of a class in order to count how many times they were called, and optionally where they were called from. Then you would run the tests and examine what actually happened.
Here is a function that decorates all methods of a class so that the number of calls is stored in a dictionary:
import inspect

def tracemethods(cls, track):
    def tracker(func, track):
        def inner(*args, **kwargs):
            # Count one more call under the method's qualified name
            if func.__qualname__ in track:
                track[func.__qualname__] += 1
            else:
                track[func.__qualname__] = 1
            return func(*args, **kwargs)
        inner.__doc__ = func.__doc__
        inner.__signature__ = inspect.signature(func)
        return inner
    for name, func in inspect.getmembers(cls, inspect.isfunction):
        setattr(cls, name, tracker(func, track))
You could tweak that code to browse the interpreter stack to identify the caller of each call, but it is not very easy: you only get the unqualified name of the caller function and will have to use the file name and line number to uniquely identify the caller.
Well, here's a start. You will use a couple of standard libraries:
import dis
import inspect
Suppose you're interested in this source code: myfolder/myfile.py
Then do this:
import myfolder.myfile

def some_func():
    ''

loads = {'LOAD_GLOBAL', 'LOAD_ATTR'}
name_to_member = dict(inspect.getmembers(myfolder.myfile))
for name, member in name_to_member.items():
    if type(member) == type(some_func):
        print(name)
        for ins in dis.get_instructions(member):
            if ins.opname in loads:
                print(name, ins.opname, ins.argval)
Other fun things to do: run dis.dis(member), or print out dis.code_info(member).
This will let you visit each function defined in the file,
and visit each executable statement to see if it might be a method call you care about.
Then it's up to you to Do The Right Thing with potential test methods.
Trying to do some optimization here on a class. We're trying not to change too much the class definitions. In essence we are instantiating a ClassA N times but one of the methods has a nasty file read.
for x in range(0, N):
    cl = ClassA()
    cl.dostuff(x)
The class looks like this:
class ClassA:
    def dostuff(self, x):
        #open nasty file here
        nastyfile = open()
        do something else
We could bring that file read out of the class and put in before the loop as the file will not change. But is there a way we can ensure that we only ever open the nasty file once for instances of the class. I.e. so for example on the first instantiate of the class it is defined for all future instances of the class without having to read in again. Is there a way to do this in the current form without really changing the structure too much of the existing code base.
One question relates to the interpreter - i.e. is python smart enough to cache variables just as nastyfile, so that we do as we are, or is the quick and dirty solution the following:
nastyfile = open()
for x in range(0, 1):
cl = ClassA()
cl.dostuff(x)
Looking for a pythonic way to do this.
You could encapsulate opening the file in a classmethod.
class ClassA():
    @classmethod
    def open_nasty_file(cls):
        # Stored on the class, so every instance shares the same handle
        cls.nasty_file = open('file_path', 'file_mode')

    def do_stuff(self):
        if not hasattr(self, 'nasty_file'):
            self.open_nasty_file()
This approach relies on the fact that attribute look-ups will try finding the attribute on the class if not found on the instance.
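The lookup fallback can be checked with a small stand-in that counts how often the expensive step runs. The Reader class, its loads counter, and the payload string are placeholders invented for this sketch; no real file is opened.

```python
class Reader:
    loads = 0  # counts how often the expensive step actually ran

    @classmethod
    def load(cls):
        cls.loads += 1
        cls.payload = "expensive contents"  # stand-in for reading the file

    def do_stuff(self):
        # hasattr checks the instance first, then falls back to the class
        if not hasattr(self, "payload"):
            self.load()
        return self.payload


results = [Reader().do_stuff() for _ in range(5)]
print(Reader.loads)  # 1
```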
You could put this check/instantiation in the __init__ function if you want it opened when the first instance is instantiated.
Note that this method will leave the file open, so it will need to be closed at some point.
You could have a class method that opens the file when the first instance asks for it. I've wrapped it in a lock so that it is thread safe.
import threading

class ClassA:
    _nasty_file = None
    _nasty_file_lock = threading.Lock()

    def dostuff(self, x):
        # open nasty file here
        nastyfile = self.get_nasty_file()
        # do something else

    @classmethod
    def get_nasty_file(cls):
        with cls._nasty_file_lock:
            # Only the first caller reads the file; later calls reuse the contents
            if cls._nasty_file is None:
                with open('nastyfile') as fp:
                    cls._nasty_file = fp.read()
            return cls._nasty_file
Instances can access and modify class attributes by themselves. So you can just set up an attribute on the class and provide it with a default (None) value, and then check for that value before doing anything in dostuff. Example:
class A():
    nastyfileinfo=None
    def dostuff(self,x):
        if A.nastyfileinfo: print('nastyfileinfo already exists:',A.nastyfileinfo)
        if not A.nastyfileinfo:
            print('Adding nastyfileinfo')
            A.nastyfileinfo='This is really nasty' ## open()
            print('>>>nastyfileinfo:',A.nastyfileinfo)
        ## Continue doing your other stuff involving x

for j in range(0,10):
    A().dostuff(j)
nastyfileinfo is also considered an attribute of the instance, so you can reference it with instance.nastyfileinfo, however if you modify it there it will only update for that one specific instance, whereas if you modify it on the class, all other instances will be able to see it (provided they didn't change their personal/self reference to nastyfileinfo).
instants=[]
for j in range(0,10):
    instants.append(A())
for instance in instants:
    print(instance.nastyfileinfo)
instants[5].dostuff(5)
for instance in instants:
    print(instance.nastyfileinfo)
I have a class that needs to write to a file. My program creates multiple of these classes and I want to avoid write collisions. I tried to avoid it by using a static variable so each class has a unique file name. ie:
class Foo:
    instance_count = 1

    @staticmethod
    def make():
        file_name = Foo.instance_count + '-' + 'file.foo'
        Foo.instance_count += 1
        Foo(file_name)

    def Foo(self, fname):
        self.fname = fname
This works to some extent but doesn't work in cases where the class may be created in parallel. How can I make this more robust?
EDIT:
My use case has this class being created in my app, which is served by gunicorn. So I launch my app with gunicorn, with lets say 10 workers, so I can't actually manage the communication between them.
You could make use of something like uuid instead if unique name is what you are after.
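For instance, a name built from uuid.uuid4() needs no shared counter, so separate worker processes do not have to coordinate. The unique_file_name helper and the "file.foo" suffix are only illustrative.

```python
import uuid


def unique_file_name(suffix="file.foo"):
    # uuid4 is random, so no cross-process coordination is required
    return "{}-{}".format(uuid.uuid4().hex, suffix)


print(unique_file_name())
```

The resulting names are unique for practical purposes but not ordered or human-friendly.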
EDIT:
If you would want readable but unique names, I would suggest you to look into guarding your increment statement above with a lock so that only one process is increasing it at any point of time and also perhaps make the file creation and the increment operation atomic.
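One possible way to get readable numbers without an explicit lock is to let the filesystem arbitrate: opening with O_CREAT | O_EXCL fails if the file already exists, so the check and the creation happen in a single step. This swaps the lock for exclusive creation, assumes all workers share a local directory, and uses a hypothetical claim_next_file helper.

```python
import os
import tempfile


def claim_next_file(directory, suffix="file.foo"):
    """Create and return the lowest-numbered file name not yet taken."""
    count = 1
    while True:
        path = os.path.join(directory, "{}-{}".format(count, suffix))
        try:
            # Fails with FileExistsError if another worker got there first
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            count += 1
            continue
        os.close(fd)
        return path


with tempfile.TemporaryDirectory() as workdir:
    names = [os.path.basename(claim_next_file(workdir)) for _ in range(3)]
print(names)  # ['1-file.foo', '2-file.foo', '3-file.foo']
```

The linear scan gets slower as files accumulate, so this suits modest numbers of files rather than very large ones.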
Easy, just use another text file to keep track of the filenames and the number or code you need to identify them. You can use JSON, pickle, or just your own format.
In the __init__ function, you can read from that file. Then, make a new file based on the information you get.
File example:
File1.txt,1
File2.txt,2
And the __init__ function:
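def __init__(self):
    # The last entry in counter.txt ends with the most recent number
    with open('counter.txt', 'r') as counter_file:
        counter = int(counter_file.read().strip().rsplit(',', 1)[-1])
    # Parentheses matter: "%d" % counter + 1 would try to add 1 to a string
    with open("File%d.txt" % (counter + 1), "w"):
        pass  # do things
    # don't forget to keep the information of your new file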
Really what I was looking for was a way to avoid write contention. Then I realized why not just use the logger. Might be a wrong assumption, but I would imagine the logger takes care of locking files for writing. Plus it flushes on every write, so it meets that requirement. As for speed, the overhead definitely does not affect me in this case.
The other solution I found was to use the tempfile module. This would create a unique file for each instantiation of the class.
import tempfile as tf

class Foo:
    @staticmethod
    def make():
        # NamedTemporaryFile returns an open file object; its path is in .name
        file_name = tf.NamedTemporaryFile('w+',suffix="foofile",dir="foo/dir")
        return Foo(file_name)

    def __init__(self, fname):
        self.fname = fname
I'm struggling a little understanding how to use classes effectively. I have written a program which I hope to count the number of occurrences of a phrase or word in a .txt file.
I'm not quite sure how to call the function properly, any help would be much appreciated.
Thanks.
class WordCounter:
    def The_Count(self):
        print "Counting words..."
        txt_doc = open("file")
        for line in txt_doc:
            if "word" in txt_doc:
                word_freq = word_freq + 1
            return word_freq
        print "Frequency of word: %s" % word_freq

WordCounter.The_Count
Using classes is a little different than what you have tried to do here. Think of it more in terms of preserving variables and state of objects in code. To accomplish your task, something more like the following would work:
class CountObject(object):
    """Instance of CountObject for counting occurrences of a word in a file"""
    def __init__(self, filename):
        self.filename = filename

    def getcount(self, word):
        count = 0
        # 'with' closes the file even if counting raises
        with open(self.filename, 'r') as infile:
            for line in infile:
                count += line.count(word)
        return count

mycounter = CountObject(r'C:\list.txt')
print "The occurrence of 'in' is %s" % mycounter.getcount('in')
First, just to agree on the names, a function inside a class is called a method of that class.
In your example, your method performs the action of counting occurrences of words, so to make it clearer, you could simply call your method count. Note also that in Python, it is a convention to have method names start with a lower case.
Also, it is good practice to use so-called new-style classes, which are simply classes that inherit from object.
Finally, in Python, a method needs to have at least one parameter, which is by convention called self and which should be an instance of the class.
So if we apply these changes, we get something like:
class WordCounter(object):
    def count(self):
        print "Counting words..."
        # Rest of your code
        # ...
Now that your class has a method, you first need to create an instance of your class before you can call that method on it. So, to create an instance of a class Foo in Python, you simply need to call Foo(). Once you have your instance, you can then call your method. Using your example
# Create an instance of your class and save it in a variable
my_word_counter = WordCounter()
# Call your method on the instance you have just created
my_word_counter.count()
Note that you don't need to pass in an argument for self because the Python interpreter will replace self with the value of my_word_counter for you, i.e. it calls WordCounter.count(my_word_counter).
A note on OO
As noted by others, your example is not a great use of classes in Python. OO classes aim at putting together behaviours (instance methods) along with the data they interact with (instance attributes). Your example being a simple one, there is no real internal data associated with your class. A good warning sign is the fact that you never use self inside your method.
For behaviour that is not tied to some particular data, Python gives you the flexibility to write module-level functions. Java, by contrast, forces you to put absolutely everything inside classes.
As suggested by others too, to make your example more OO, you could pass the filename as a param to __init__ and save it as self.filename. Probably even better would be to have your WordCounter expect a file-like object, so that it is not responsible for opening/closing the file itself. Something like:
class WordCounter(object):
    def __init__(self, txt_doc):
        self.txt_doc = txt_doc

    def count(self):
        print "Counting words..."
        for line in self.txt_doc:
            # Rest of your code
            # ...

with open(filename) as f:
    word_counter = WordCounter(f)
    word_counter.count()
Finally, if you want more details on classes in Python, a good source of information is always the official documentation.
you have several problems here; the code you posted isn't correct python. methods should take a reference to self as their first argument:
def The_Count(self):
you need to initialize word_freq for the case where there are no words to count:
word_freq = 0
as others have mentioned, you can call your function this way:
counter = WordCounter()
print(counter.The_Count())
It's not really idiomatic python to wrap these kinds of stateless functions in classes, as you might do in Java or something. I would separate this function into a module, and let the calling class handle the file I/O, etc.
To call a method in a class, first you have to create an instance of that class:
c = WordCounter()
Then you call the method on that instance:
c.The_Count()
However, in this case you don't really need classes; this can just be a top-level function. Classes are most useful when you want each object to have its own internal state.
For such a small program, using classes may not be necessary. You could simply define the function and then call it.
However, if you wanted to implement a class design you could use (after class definition):
if __name__ == "__main__":
    wc = WordCounter() #create instance
    wc.The_Count() #call method
The use of a class design would use better design principles while increasing the readability/flexibility of your code if you wanted to further expand the capabilities of the class later.
In this case, you'd have to change the code to this:
class WordCounter:
    def CountWords(self):
        # For functions inside classes, the first parameter must always be `self`
        # I'm sure there are exceptions to that rule, but none you should worry about
        # right now.
        print "Counting words..."
        txt_doc = open("file")
        word_freq = 0
        for line in txt_doc:
            if "word" in line: # I'm assuming you mean to use 'line' instead of 'txt_doc'
                word_freq += 1
        # count all the words first before returning it
        txt_doc.close() # Always close files after you open them.
        # (also, consider learning about the 'with' keyword)
        # Either print the frequency
        print "Frequency of word: %s" % word_freq
        # ...or return it.
        return word_freq
...then to call it, you would do....
>>> foo = WordCounter() # create an instance of the class
>>> foo.CountWords() # run the function
As other posters have noted, this is not the most effective uses of classes. It would be better if you made this into a top-level function, and changed it to this:
def CountWords(filename, word):
    with open(filename) as txt_doc:
        word_freq = 0
        for line in txt_doc:
            if word in line:
                word_freq += 1
    return word_freq
...and called it like this:
>>> output = CountWords("file.txt", "cat")
>>> print "Frequency of word: %s" % output
39
It would make a bit more sense to use a class if you had something like the below, where you have a bunch of variables and functions all related to one conceptual 'object':
class FileStatistics:
    def __init__(self, filename):
        self.filename = filename

    def CountWords(self, word):
        pass # Add code here...

    def CountNumberOfLetters(self):
        pass

    def AverageLineLength(self):
        pass

    # ...etc.
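To make the skeleton concrete, here is one hedged way the methods might be filled in, written for Python 3 with snake_case names. Reading the file once in __init__ and the exact definitions of each statistic are assumptions of this sketch, not something the answer specifies; count_words counts matching lines, like the CountWords function above.

```python
import os
import tempfile


class FileStatistics:
    def __init__(self, filename):
        # Read once; every statistic below works on the same cached lines
        with open(filename) as handle:
            self.lines = handle.read().splitlines()

    def count_words(self, word):
        # Number of lines that contain the word
        return sum(1 for line in self.lines if word in line)

    def count_letters(self):
        return sum(ch.isalpha() for line in self.lines for ch in line)

    def average_line_length(self):
        # Raises ZeroDivisionError for an empty file
        return sum(len(line) for line in self.lines) / len(self.lines)


# Throwaway sample file so the example runs anywhere
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the cat sat\nno dogs here\ncat nap\n")
stats = FileStatistics(tmp.name)
os.remove(tmp.name)
print(stats.count_words("cat"), stats.count_letters(), stats.average_line_length())
# 2 25 10.0
```

Here the shared state (the cached lines) is what justifies a class: each method reuses it instead of reopening the file.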