How can I create a folder system inside of my Python program?

Is there any way to actually create a standalone file system in a Python program? I know that you can use os.mkdir() and os.chdir(), but these write directly to your actual system instead of being stored in the program. I've tried several ways to do this, including:
if command == "md":
    newDir = input("")
    with open('directories.txt', 'a') as f:
        f.write(newDir)
Obviously, this doesn't work, but I was wondering if anyone has some ideas on how one might do this. (This is for a basic but hopefully semi-functional MS-DOS-style OS I'm working on.)

Not sure why writing to the file is a problem, but you could basically have a tree-type data structure to represent a file system. For example:
class File:
    def __init__(self, filename, directory=False):
        self.filename = filename
        self.directory = directory
        self.files = [] if directory else None
where self.files is a list of File objects that you can iterate through as a tree. This is obviously rudimentary and I'll leave a more detailed implementation up to you.
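To connect this back to the original goal, here is a minimal, hedged sketch of how an "md"-style command could work against that tree instead of the real disk. The names root and current are my own illustrations, not from the answer above, and it assumes the File class just shown:
# Hypothetical "md" command against the in-memory tree.
root = File("C:", directory=True)   # fake drive root, lives only in memory
current = root                      # the directory the fake shell is "in"

command = input("> ")
if command == "md":
    new_name = input("directory name: ")
    current.files.append(File(new_name, directory=True))  # stored in the program, not on disk

# A DOS-style "dir" listing of the current fake directory:
for f in current.files:
    print(f.filename, "<DIR>" if f.directory else "")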

Related

Why is os.scandir() as slow as os.listdir()?

I tried to optimize a file-browsing function written in Python on Windows by using os.scandir() instead of os.listdir(). However, the time remains unchanged at about two and a half minutes, and I can't tell why.
Below are the functions, original and altered:
os.listdir() version:
def browse(self, path, tree):
    # for each entry in the path
    for entry in os.listdir(path):
        entity_path = os.path.join(path, entry)
        # check whether git ignores it or not
        if self.git_ignore(entity_path) is False:
            # if it is a dir, create a new level in the tree
            if os.path.isdir(entity_path):
                tree[entry] = Folder(entry)
                self.browse(entity_path, tree[entry])
            # if it is a file, add it to the tree
            if os.path.isfile(entity_path):
                tree[entry] = File(entity_path)
os.scandir() version:
def browse(self, path, tree):
    # for each entry in the path
    for dirEntry in os.scandir(path):
        entry_path = dirEntry.name
        entity_path = dirEntry.path
        # check whether git ignores it or not
        if self.git_ignore(entity_path) is False:
            # if it is a dir, create a new level in the tree
            if dirEntry.is_dir(follow_symlinks=True):
                tree[entry_path] = Folder(entity_path)
                self.browse(entity_path, tree[entry_path])
            # if it is a file, add it to the tree
            if dirEntry.is_file(follow_symlinks=True):
                tree[entry_path] = File(entity_path)
In addition, here are the auxiliary functions used within this one:
def git_ignore(self, filepath):
    if '.git' in filepath:
        return True
    if '.ci' in filepath:
        return True
    if '.delivery' in filepath:
        return True
    child = subprocess.Popen(['git', 'check-ignore', str(filepath)],
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE)
    output = child.communicate()[0]
    status = child.wait()
    return status == 0
============================================================
class Folder(dict):
    def __init__(self, path):
        self.path = path
        self.categories = {}
============================================================
class File(object):
    def __init__(self, path):
        self.path = path
        self.filename, self.extension = os.path.splitext(self.path)
Does anyone have a solution for how I can make the function run faster? My assumption is that the extraction of the name and path at the beginning makes it run slower than it should. Is that correct?
Regarding your question:
os.walk seems to call stat() more times than necessary, which appears to be why it is slower than os.scandir().
In this case, I think the best way to boost performance would be to use parallel processing, which can improve the speed dramatically in some loops.
There are multiple posts about this issue. Here is one:
Parallel Processing in Python – A Practical Guide with Examples.
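To make that suggestion concrete, here is a rough, hedged sketch (the helper names are my own, not from the linked guide). Since most of the time in the posted browse() is likely spent waiting on one git check-ignore subprocess per entry, those checks can be farmed out to a thread pool:
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def git_ignored(filepath):
    """Same check as the question's git_ignore(), minus the hard-coded names."""
    child = subprocess.run(['git', 'check-ignore', str(filepath)],
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return child.returncode == 0

def scan_with_parallel_checks(path):
    """Run the git checks for one directory's entries in parallel."""
    entries = list(os.scandir(path))
    with ThreadPoolExecutor(max_workers=8) as pool:  # tune the worker count
        ignored = list(pool.map(git_ignored, (e.path for e in entries)))
    return [e for e, skip in zip(entries, ignored) if not skip]
Threads (rather than processes) are enough here because the loop is waiting on an external process, not burning Python-level CPU time.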
Nevertheless, I would like to share some thoughts about it.
I have also been wondering what the best uses of these three options (scandir, listdir, walk) are. There is not much documentation comparing their performance, so probably the best way is to test it yourself, as you did. Here are my conclusions:
Usage of os.listdir():
It doesn't seem to have advantages over os.scandir(), except that it is easier to understand. I still use it when I only need to list the files in a directory.
PROS:
Fast & Simple
CONS:
Too simple: it only lists the names of files and dirs in a directory, so you may need to combine it with other calls to get metadata about the files. If so, you are better off using os.scandir().
Usage of os.walk():
This is the most used function when we need to fetch all the items in a directory (and its subdirs).
PROS:
It's probably the easiest way to walk over all the items' paths and names.
CONS:
It seems to call stat() more times than necessary, which appears to be why it is slower than os.scandir().
Although it gives you the root parts of the file paths, it doesn't provide the extra metadata of os.scandir().
Usage of os.scandir():
It seems to have (almost) the best of both worlds: it gives you the speed of the simple os.listdir() with extra features that let you simplify your loops, since you can avoid calling exiftool or other metadata tools when you need extra information about the files.
PROS:
Fast: the same speed as os.listdir().
Very nice extra features.
CONS:
If you want to dive into subdirectories you need to write another function to scan over each subdir. That function is pretty simple (see the sketch below), but maybe it would be more Pythonic (I just mean more elegant syntax) to use os.walk in this case.
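For reference, here is a minimal sketch of such a recursive helper (my own illustration, not from the original post):
import os

def scantree(path):
    """Recursively yield DirEntry objects for a directory tree."""
    for entry in os.scandir(path):
        yield entry
        if entry.is_dir(follow_symlinks=False):
            yield from scantree(entry.path)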
So that's my view after reading a bit and using them. I'm happy to be corrected, so I can learn more about it.

Methods to avoid hard-coding file paths in Python

Working with scientific data, specifically climate data, I am constantly hard-coding paths to data directories in my Python code. Even if I were to write the most extensible code in the world, the hard-coded file paths prevent it from ever being truly portable. I also feel like having information about the file system of your machine coded into your programs could be a security issue.
What solutions are out there for handling the configuration of paths in Python to avoid having to code them out explicitly?
One solution relies on using configuration files.
You can store all your paths in a JSON file like so:
{
    "base_path": "/home/bob/base_folder",
    "low_temp_area_path": "/home/bob/base_folder/low_temp"
}
and then in your Python code, you can just do:
import json

with open("conf.json") as json_conf:
    CONF = json.load(json_conf)
and then you can use your paths (or any configuration variable you like) like so:
print("The base path is {}".format(CONF["base_path"]))
First off, it's always good practice to add a main guard to each file to test the classes or functions in it. Along with this, you should determine the directory the script lives in. This becomes incredibly important when running Python from a cron job or from a directory that is not the current working directory. No JSON files or environment variables are then needed, and you get interoperation across Mac, RHEL, and Debian distributions.
This is how you do it, and it will work on Windows as well (os.path.join picks the right separator for you, so you don't need to hand-write '\' instead of '/'):
if "__main__" == __name__:
workingDirectory = os.path.realpath(sys.argv[0])
As you can see, the working directory is computed whether you run the script with a full or a relative path, meaning it will work in a cron job automatically.
After that, if you want to work with data stored relative to that directory, use:
fileName = os.path.join(workingDirectory, './sub-folder-of-current-directory/filename.csv')
fp = open(fileName, 'r')
or, for a folder parallel to your project directory:
fileName = os.path.join(workingDirectory, '../folder-at-same-level-as-my-project/filename.csv')
fp = open(fileName, 'r')
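For what it's worth, the same idea reads a bit more cleanly with pathlib. This is a sketch of an alternative, not part of the original answer:
import sys
from pathlib import Path

# Directory containing the running script, resolved to an absolute path.
script_dir = Path(sys.argv[0]).resolve().parent

# A data file addressed relative to the script, regardless of the caller's cwd:
file_name = script_dir / "sub-folder-of-current-directory" / "filename.csv"
with file_name.open("r") as fp:
    first_line = fp.readline()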
I believe there are many ways around this, but here is what I would do:
Create a JSON config file with all the paths I need defined.
For even more portability, I'd have a default path where I look for this config file but also have a command line input to change it.
In my opinion, passing arguments from the command line would be the best solution. You should take a look at argparse. It lets you handle command-line arguments cleanly. For example:
myDataScript.py /home/userName/datasource1/
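A minimal sketch of that approach (the argument names here are just examples, not from the original answer):
import argparse

parser = argparse.ArgumentParser(description="Process climate data.")
parser.add_argument("data_dir", help="path to the data source directory")
parser.add_argument("--config", default="conf.json",
                    help="optional JSON config file with further paths")
args = parser.parse_args()

print("Reading data from", args.data_dir)
Invoked as in the example above, args.data_dir would hold '/home/userName/datasource1/'.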

Running Fortran executable within Python script

I am writing a Python script with the following objectives:
Starting from current working directory, change directory to child directory 'A'
Make slight adjustments to a fort.4 file
Run a Fortran binary file (the syntax of which is ../../../../ continuing until I hit the folder containing the binary); return to 2. until my particular objective is complete, then
Back out of child directory to parent, then enter another child directory and return to 2. until I have iterated through all the folders in question.
The code is coming along well. I am having to rely heavily upon Python's os module for the directory work. However, I have never had any experience a) making minor adjustments to a file using Python or b) running an executable. Could you give me some ideas on Python modules, direct me to a similar Stack Overflow question, or perhaps suggest ways that this can be accomplished? I understand this is a vague question, so please ask if you do not understand what I am asking and I will elaborate. Also, the changes I have to make to this fort.4 file are repetitive in nature; they all happen at the same position in the file.
Cheers
EDIT::
entire fort.4 file:
file_name
movie1.dat !name of a general file the binary reads
nbr_box ! line3-8 is general info
2
this_box
1
lrdf_bead
.true.
beadid1
C1 !this is the line I must change
beadid2
F4 !This is a second line I must change
lrdf_com
.false.
bin_width
0.04
rcut
7
So really, I need to change "C1" to "C2", for example. The changes are very insignificant to make, but I must emphasize that the main Fortran executable reads this fort.4, as well as the movie1.dat file that I have already created. Hope this helps.
OK, so there are a few important things here. First, we need to be able to manage our cwd; for that we will use the os module:
import os
Whenever a method operates on a folder, it is important to change into that directory and back to the parent afterwards. This can also be achieved with the os module:
def operateOnFolder(folder):
    os.chdir(folder)
    ...
    os.chdir("..")
Now we need to run some method for each directory, which comes down to this:
for k in os.listdir("."):
    if os.path.isdir(k):
        operateOnFolder(k)
Finally, in order to operate on a preexisting Fortran input file, we can use the built-in file operations.
fileSource = open("someFile.f", "r")
fileText = fileSource.read()
fileSource.close()

fileLines = fileText.split("\n")
# change a line in the file with -> fileLines[42] = "the 42nd line"
fileText = "\n".join(fileLines)

fileOutput = open("someFile.f", "w")
fileOutput.write(fileText)
fileOutput.close()
You can create and run your executable output.fx from source.f90:
import subprocess

subprocess.call(["gfortran", "-o", "output.fx", "source.f90"])  # compile
subprocess.call(["./output.fx"])                                # execute

How can I open multiple files (number of files unknown beforehand) using "with open" statement?

I specifically need to use the with open statement for opening the files, because I need to open a few hundred files together and merge them using a K-way merge. I understand that, ideally, I should have kept K low, but I did not foresee this problem.
Starting from scratch is not an option now as I have a deadline to meet. So at this point, I need very fast I/O that does not store the whole/huge portion of file in memory (because there are hundreds of files, each of ~10MB). I just need to read one line at a time for K-way merge. Reducing memory usage is my primary focus right now.
I learned that with open is the most efficient technique, but I cannot understand how to open all the files together in a single with open statement. Excuse my beginner ignorance!
Update: This problem was solved. It turns out the issue was not about how I was opening the files at all. I found out that the excessive memory usage was due to inefficient garbage collection. I did not use with open at all. I used the regular f=open() and f.close(). Garbage collection saved the day.
It's fairly easy to write your own context manager to handle this by using the built-in contextmanager function decorator to define "a factory function for with statement context managers", as the documentation puts it. For example:
from contextlib import contextmanager

@contextmanager
def multi_file_manager(files, mode='rt'):
    """ Open multiple files and make sure they all get closed. """
    files = [open(file, mode) for file in files]
    yield files
    for file in files:
        file.close()
if __name__ == '__main__':
    filenames = 'file1', 'file2', 'file3'

    with multi_file_manager(filenames) as files:
        a = files[0].readline()
        b = files[2].readline()
        ...
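One caveat worth adding: in the generator above, if the body of the with block raises an exception, execution never returns to the loop after the yield, so the files would not be closed. Wrapping the yield in try/finally guarantees cleanup; a minimal variant:
from contextlib import contextmanager

@contextmanager
def multi_file_manager(files, mode='rt'):
    """Open multiple files and guarantee they close, even on error."""
    files = [open(file, mode) for file in files]
    try:
        yield files
    finally:
        for file in files:
            file.close()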
If you don't know all the files ahead of time, it would be equally easy to create a context manager that supported adding them incrementally with the context. In the code below, a contextlib.ContextDecorator is used as the base class to simplify the implementation of a MultiFileManager class.
from contextlib import ContextDecorator

class MultiFileManager(ContextDecorator):
    def __init__(self, files=None):
        self.files = [] if files is None else files

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        for file in self.files:
            file.close()

    def __iadd__(self, other):
        """Add file to be closed when leaving context."""
        self.files.append(other)
        return self
if __name__ == '__main__':
    filenames = 'mfm_file1.txt', 'mfm_file2.txt', 'mfm_file3.txt'

    with MultiFileManager() as mfmgr:
        for count, filename in enumerate(filenames, start=1):
            file = open(filename, 'w')
            mfmgr += file  # Add file to be closed later.
            file.write(f'this is file {count}\n')
with open(...) as f:
    # do stuff
translates roughly to
f = open(...)
# do stuff
f.close()
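More precisely, the with statement also guarantees the close when "do stuff" raises, so the closer expansion is:
f = open(...)
try:
    # do stuff
    ...
finally:
    f.close()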
In your case, I wouldn't use the with open syntax. If you have a list of filenames, then do something like this:
filenames = os.listdir(file_directory)
open_files = [open(f) for f in filenames]  # a list comprehension, so every file opens now
# do stuff
for f in open_files:
    f.close()
If you really want to use the with open syntax, you can make your own context manager that accepts a list of filenames:
class MultipleFileManager(object):
    def __init__(self, files):
        self.files = files

    def __enter__(self):
        self.open_files = [open(f) for f in self.files]
        return self.open_files

    def __exit__(self, exc_type, exc_val, exc_tb):
        for f in self.open_files:
            f.close()
And then use it like this:
filenames = os.listdir(file_directory)
with MultipleFileManager(filenames) as files:
    for f in files:
        ...  # do stuff
The only advantage I see to using a context manager in this case is that you can't forget to close the files. But there is nothing wrong with closing them manually. And remember, the OS will reclaim its resources when your program exits anyway.
While not a solution for 2.7, I should note there is one good, correct solution for 3.3+, contextlib.ExitStack, which can be used to do this correctly (surprisingly difficult to get right when you roll your own) and nicely:
from contextlib import ExitStack

with open('source_dataset.txt') as src_file, ExitStack() as stack:
    files = [stack.enter_context(open(fname, 'w')) for fname in fname_list]
    # ... do stuff with src_file and the values in files ...
    # ... src_file and all elements in stack cleaned up on block exit ...
Importantly, if any of the opens fails, all of the opens that succeeded prior to that point will be cleaned up deterministically; most naive solutions end up failing to clean up in that case, relying on the garbage collector at best, and in cases like lock acquisition where there is no object to collect, failing to ever release the lock.
Posted here since this question was marked as the "original" for a duplicate that didn't specify Python version.
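Applied to the K-way merge in the question, ExitStack pairs naturally with heapq.merge, which pulls one line at a time from each file instead of loading them into memory. A sketch, assuming each input file is already sorted (as a K-way merge requires) and ends with a newline:
import heapq
from contextlib import ExitStack

def kway_merge(filenames, out_name):
    """Merge already-sorted text files line by line into one sorted output."""
    with ExitStack() as stack:
        files = [stack.enter_context(open(name)) for name in filenames]
        with open(out_name, 'w') as out:
            out.writelines(heapq.merge(*files))  # lazy: one line per file at a time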

OS X: Determine Trash location for a given path

Simply moving the file to ~/.Trash/ will not work: if the file is on an external drive, that would move the file onto the main system drive.
Also, there are other conditions, like files on external drives getting moved to /Volumes/.Trash/501/ (or whatever the current user's ID is).
Given a file or folder path, what is the correct way to determine the trash folder? I imagine the language is pretty irrelevant, but I intend to use Python
Based upon code from http://www.cocoadev.com/index.pl?MoveToTrash I have come up with the following:
def get_trash_path(input_file):
    path, file = os.path.split(input_file)
    if path.startswith("/Volumes/"):
        # /Volumes/driveName/.Trashes/<uid>
        s = path.split(os.path.sep)
        # s[2] is the drive name ([0] is empty, [1] is 'Volumes')
        trash_path = os.path.join("/Volumes", s[2], ".Trashes", str(os.getuid()))
        if not os.path.isdir(trash_path):
            raise IOError("Volume appears to be a network drive (%s could not be found)" % trash_path)
    else:
        trash_path = os.path.join(os.getenv("HOME"), ".Trash")
    return trash_path
Fairly basic, and there are a few things that have to be done separately, particularly checking whether the filename already exists in the trash (to avoid overwriting) and the actual move to the trash, but it seems to cover most cases (internal, external, and network drives).
Update: I wanted to trash a file in a Python script, so I re-implemented Dave Dribin's solution in Python:
from AppKit import NSURL
from ScriptingBridge import SBApplication

def trashPath(path):
    """Trashes a path using the Finder, via OS X's Scripting Bridge."""
    targetfile = NSURL.fileURLWithPath_(path)
    finder = SBApplication.applicationWithBundleIdentifier_("com.apple.Finder")
    items = finder.items().objectAtLocation_(targetfile)
    items.delete()
Usage is simple:
trashPath("/tmp/examplefile")
Alternatively, if you're on OS X 10.5, you could use Scripting Bridge to delete files via the Finder. I've done this in Ruby code here via RubyCocoa. The gist of it is:
url = NSURL.fileURLWithPath(path)
finder = SBApplication.applicationWithBundleIdentifier("com.apple.Finder")
item = finder.items.objectAtLocation(url)
item.delete
You could easily do something similar with PyObjC.
A better way is NSWorkspaceRecycleOperation, which is one of the operations you can use with -[NSWorkspace performFileOperation:source:destination:files:tag:]. The constant's name is another artifact of Cocoa's NeXT heritage; its function is to move the item to the Trash.
Since it's part of Cocoa, it should be available to both Python and Ruby.
In Python, without using the scripting bridge, you can do this:
from AppKit import NSWorkspace, NSWorkspaceRecycleOperation
source = "path holding files"
files = ["file1", "file2"]
ws = NSWorkspace.sharedWorkspace()
ws.performFileOperation_source_destination_files_tag_(NSWorkspaceRecycleOperation, source, "", files, None)
The File Manager API has a pair of functions called FSMoveObjectToTrashAsync and FSPathMoveObjectToTrashSync.
Not sure if that is exposed to Python or not.
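If those C APIs turn out to be awkward to reach from Python, one hedged fallback is to drive the Finder through the osascript command-line tool. This is my own suggestion, not something from the original answer:
import subprocess

def trash_via_finder(path):
    """Ask the Finder to trash a file, using AppleScript via osascript."""
    script = 'tell application "Finder" to delete POSIX file "%s"' % path
    subprocess.call(["osascript", "-e", script])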
Another one in Ruby:
Appscript.app('Finder').items[MacTypes::Alias.path(path)].delete
You will need the rb-appscript gem.
