I want to program a virtual file system in Windows with Python.
That is, a program in Python whose interface is actually an Explorer window. You can create and manipulate file-like objects, but instead of being created on the hard disk as regular files they are managed by my program and, say, stored remotely, or encrypted, or compressed, or versioned, or whatever else I can do with Python.
What is the easiest way to do that?
While perhaps not quite ripe yet (unfortunately I have no first-hand experience with it), pywinfuse looks exactly like what you're looking for.
Does it need to be Windows-native? There is at least one protocol which can be both browsed by Windows Explorer, and served by free Python libraries: FTP. Stick your program behind pyftpdlib and you're done.
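For a sense of how little glue that takes, here is a minimal sketch (mine, with placeholder credentials and the current directory as the served root) that puts a directory behind pyftpdlib so Explorer can browse it at ftp://127.0.0.1:2121; a genuinely virtual backend would, as far as I understand, plug in by subclassing pyftpdlib's AbstractedFS:

from pyftpdlib.authorizers import DummyAuthorizer
from pyftpdlib.handlers import FTPHandler
from pyftpdlib.servers import FTPServer

# Placeholder user/password; serve the current directory with full permissions.
authorizer = DummyAuthorizer()
authorizer.add_user("user", "12345", homedir=".", perm="elradfmw")

handler = FTPHandler
handler.authorizer = authorizer

# Point Explorer (or any FTP client) at ftp://127.0.0.1:2121
server = FTPServer(("127.0.0.1", 2121), handler)
server.serve_forever()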
Have a look at Dokan, a user-mode filesystem for Windows. There are Ruby and .NET bindings available (and Java bindings from a third party), and I don't think it would be difficult to write Python bindings either.
If you are trying to write a virtual file system (I may be misunderstanding you), I would look at a container file format. VHD is well documented, along with HDI and (embedded) OSQ. There are basically two things you need to do. First, decide on a file/container format. After that it is as simple as writing the API to manipulate that container. If you would like it to be manipulated over the internet, pick a transport protocol and write a service (which would emulate a file system driver) that listens on a certain port and manipulates the container using your API.
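To make that concrete, here is a toy sketch of the idea (mine, using a plain zip file as the container rather than VHD or HDI; the file name is a placeholder) with a small API wrapped around it:

import zipfile

class ZipContainer:
    """Minimal API over a zip-file container: list, read and add entries."""
    def __init__(self, path):
        self.path = path

    def list(self):
        with zipfile.ZipFile(self.path, "a") as z:   # "a" creates the container if missing
            return z.namelist()

    def read(self, name):
        with zipfile.ZipFile(self.path, "r") as z:
            return z.read(name)

    def write(self, name, data):
        with zipfile.ZipFile(self.path, "a") as z:
            z.writestr(name, data)

# Usage: container.zip is a placeholder path.
c = ZipContainer("container.zip")
c.write("docs/hello.txt", "hello world")
print(c.list())
print(c.read("docs/hello.txt"))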
You might be interested in PyFilesystem:
A filesystem abstraction layer for Python
PyFilesystem is an abstraction layer for filesystems. In the same way that Python's file-like objects provide a common way of accessing files, PyFilesystem provides a common way of accessing entire filesystems. You can write platform-independent code to work with local files, that also works with any of the supported filesystems (zip, ftp, S3 etc.).
What the description on the homepage does not advertise is that you can then expose this abstraction again as a filesystem, among others over SFTP, FTP (though that backend is currently defunct, probably fixable) and Dokan (ditto), as well as FUSE.
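As a quick illustration of the abstraction (a sketch against the PyFilesystem2 fs package; paths and contents are made up), the same calls work whether the backend is memory, a zip file or the local disk:

from fs.memoryfs import MemoryFS

# An entirely in-memory filesystem; swap in OSFS(".") or a ZipFS to change the backend.
home = MemoryFS()
home.makedirs("docs")
home.writetext("docs/hello.txt", "hello from a virtual filesystem")

for path in home.walk.files():
    print(path, "->", home.readtext(path))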
The full explanation of what I want to do and why would take a while. Basically, I want to use a private SSL connection in a publicly distributed application, and not hand out my private SSL keys, because that negates the purpose! That is, I want secure remote database operations which no one can see into, including the client.
My core question is: how can I make the Python ssl module use data held in memory containing the SSL PEM file contents, instead of file system paths to them?
The constructor for class SSLSocket calls load_verify_locations(ca_certs) and load_cert_chain(certfile, keyfile) which I can't trace into because they are .pyd files. In those black boxes, I presume those files are read into memory. How might I short circuit the process and pass the data directly? (perhaps swapping out the .pyd?...)
Other thoughts I had: I could use io.StringIO to create a virtual file and then pass the file-like object around. I've used that concept with classes that will take a file object rather than a path. Unfortunately, these classes aren't designed that way.
Or, maybe use a virtual file system / ram drive? That could be trouble though because I need this to be cross platform. Plus, that would probably negate what I'm trying to do if someone could access those paths from any external program...
I suppose I could keep them as real files, but "hide" them somewhere in the file system.
I can't be the first person to have this issue.
UPDATE
I found the source for the "black boxes"...
https://github.com/python/cpython/blob/master/Modules/_ssl.c
They work as expected. They just read the file contents from the paths, but you have to dig down into the C layer to get to this.
I can write in C, but I've never tried to recompile the underlying Python source. It looks like maybe I should follow the directions at https://devguide.python.org/ to pull the Python repo and make my changes there. I guess I can then submit my update to the Python community to see if they want to make a standardized feature like the one I'm describing... Lots of work ahead, it seems...
It took some effort, but I did, in fact, solve this in the manner I suggested. I revised the underlying code in the _ssl.c Python module / extension and rebuilt Python as a whole. After figuring out the process for building Python from source, I had to learn the details of how to pass variables between Python and C, and I needed to dig into the guts of OpenSSL (which the Python module wraps).
Fortunately, OpenSSL already has functions for this exact purpose, so it was just a matter of swapping out how Python passes file paths into the C layer, bypassing the file-reading step and using the CA/cert/key data directly instead.
For the moment, I only did this for Windows. Since I'm ultimately creating a cross platform program, I'll have to repeat the build process for the other platforms I'll support - so that's a hassle. Consider how badly you want this, if you are going to pursue it yourself...
Note that when I rebuilt Python, I didn't use that as my actual Python installation. I just kept it off to the side.
One thing that was really nice about this process was that after the rebuild, all I needed to do was drop the single new _ssl.pyd into my working directory. With that file in place, I could pass my cert data directly; if I removed it, I could pass the normal file paths instead. In other words, Python uses its normal ssl module unless the override .pyd is simply placed in the program's directory.
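As an aside, separate from the rebuild described above: if the only thing you need to keep in memory is the CA side, stock Python 3.4+ can already do that through SSLContext.load_verify_locations(cadata=...); it is load_cert_chain that still insists on file paths. A minimal sketch, with a placeholder PEM string:

import ssl

# Placeholder: in practice this string holds real PEM-encoded CA certificate(s).
ca_pem = "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"

context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
context.load_verify_locations(cadata=ca_pem)           # CA certs straight from memory
# context.load_cert_chain(certfile=..., keyfile=...)   # client cert/key: file paths only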
Background
We have a lot of data files stored on a network drive which we process in python. For performance reasons I typically copy the files to my local SSD when processing. My wish is to make this happen automatically, so whenever I try to open a file it will grab the remote version if it isn't stored locally, and ideally also keep some sort of timer to delete the files after some time. The files will practically never be changed so I do not require actual syncing capabilities.
Functionality
To sum up what I am looking for is functionality for:
Keeping a local cache of files/directories from a network drive, automatically retrieving the remote version when unavailable locally
Support for directory structure - that is, the files are stored in a directory structure on the remote server which should be duplicated locally for the requested files
Ideally keep some sort of timer to expire cached files
It wouldn't be too difficult for me to write a piece of code which does this myself, but when possible I prefer to rely on existing projects, as this typically gives a more versatile end result and also makes any of my own improvements easily available to other users.
Question
I have searched a bit for terms like python local file cache, file synchronization and the like, but what I have found mostly handles caching of function return values. I was a bit surprised, because I would imagine this is a quite general problem. My question is therefore: is there something I have overlooked, and more importantly, are there any technical terms describing this functionality which could help my search?
Thank you in advance,
Gregers Poulsen
-- Update --
Due to other proprietary software packages I am forced to use Windows, so the solution naturally must support this.
Have a look at fsspec remote caching, with a tutorial on the anaconda blog and the official documentation. Quoting the former:
In this article, we will present [fsspec]'s new ability to cache remote content, keeping a local copy for faster lookup after the initial read.
They give an example for how to use it:
import fsspec

of = fsspec.open("filecache://anaconda-public-datasets/iris/iris.csv", mode='rt',
                 cache_storage='/tmp/cache1',
                 target_protocol='s3', target_options={'anon': True})
with of as f:
    print(f.readline())
On the first call, the file will be downloaded, stored in the cache, and provided; on subsequent calls it will be read from the local copy instead.
I haven't used it yet, but I need it and it looks promising.
I am writing a program in Python and, although file handling is great on local storage, I was wondering if there is any way to store a variable or a text file on the internet and easily read from and write to it, just as I would handle files in Python?
Thanks in advance
Sure. You just need one of two things: either a cloud provider that offers some kind of storage that your OS can turn into networked file sharing, or a Python module that wraps up some kind of cloud storage in file-like objects.
For the first one, to give a silly but doable example, you could create an Amazon Web Services instance, configure a CIFS (smbd) or NFS (nfsd) server on it, set up the firewall so you can access it, then just mount the store. Alternatively, you could serve the files up over HTTP (or WebDAV) or FTP or even IMAP (email), and configure your local OS to treat the service as a virtual filesystem (e.g., one of the samples for FUSE lets you mount an FTP server as a local filesystem).
However, the second one is usually a better option. Most things in Python don't actually care whether they get a real file, they just want a file-like object, something that provides the same API as a real file object. In Python 3.x, there are explicit interfaces for this API in the io module (see the links in the glossary entry); in Python 2.x, it's a bit more vague and implicit (although in 2.7, you can still use the io module ABCs for your implementation). You can write anything you want, or you can search around PyPI for things people have already written. For example, ftputil can create file-like objects (and even fake local directories with the same interface as the os module) out of an FTP server.
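As a trivial sketch of that idea (mine, not taken from any particular library; the URL is a placeholder), you can wrap an HTTP download in io.BytesIO and hand the result to anything that expects a readable file object:

import io
import requests

def open_remote(url):
    """Fetch url and return its contents wrapped in a file-like object."""
    response = requests.get(url)
    response.raise_for_status()
    return io.BytesIO(response.content)

# Behaves like a regular binary file opened for reading.
with open_remote("https://example.com/data.txt") as f:
    print(f.readline())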
You can use Python Requests to send and receive data over the internet; some web app can then store the data and give it back to you. I doubt using open() to save/receive files on the internet is possible without Linux trickery (symlinks, or writing to a device under /dev).
I have two computers on a network. I'll call the computer I'm working on computer A and the remote computer B. My goal is to send a command to B, gather some information, transfer this information to computer A and use it in some meaningful way. My current method follows:
Establish a connection to B with paramiko.
Use paramiko to execute a remote command e.g. paramiko.exec_command('python file.py').
Write the information to a file with pickle, and use paramiko's SFTP to transfer the file to computer A.
Open this file and parse the information back into a usable form, like a class or dictionary.
This seems very ad hoc. My question is: is there a better way to transfer data between computers using Python? I know how to transfer files; this is a different question. I want to make an object on B and use it on A.
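For reference, the workflow above boils down to something like the following sketch (the hostname, username and the assumption that file.py writes pickle.dumps(result) to its stdout are all placeholders of mine); reading stdout directly even avoids the temporary file and the SFTP step:

import pickle
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("computer-b.example.com", username="me")  # placeholder host/user

# Run the remote script; file.py is assumed to write pickle.dumps(result)
# to stdout (e.g. via sys.stdout.buffer.write).
stdin, stdout, stderr = client.exec_command("python file.py")
result = pickle.loads(stdout.read())
client.close()

print(type(result), result)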
I would recommend taking a look at Twisted. It's the best Python framework I've encountered for custom client/server applications like the one you describe.
I have been doing something similar on a larger scale (with 15 clients). I use the pexpect module to essentially run ssh commands on the remote machine (computer B). Another module to look into would be the subprocess module.
We are developing Versile Python, which provides such object-level interaction; you may want to have a look. It includes an object-level framework for streams which could be applied to your use case (transferring file data).
This is specifically geared towards managing MP3 files, but it should easily work for any directory structure with a lot of files.
I want to find or write a daemon (preferably in Python) that will watch a folder with many subfolders that should all contain X number of MP3 files. Any time a file is added, updated or deleted, the daemon should reflect that in a database (preferably PostgreSQL). I am willing to accept that, if a file is simply moved, the respective rows are deleted and recreated anew, but updating existing rows would make me the happiest.
The Stack Overflow question Managing a large collection of music has a little of what I want.
I basically just want a database that I can then do whatever I want with. My most up-to-date database as of now is my iTunes.xml file, but I don't want to rely on that too much, as I don't always want to rely on iTunes for my music management. I see plenty of projects out there that do a little of what I want, but either in a format that I can't access or one that is just more complex than I want. If there is some media player out there that can watch a folder and update an easily accessible database, then I am all for it.
The reason I'm leaning towards writing my own is because it would be nice to choose my database and schema myself.
Another answer already suggested pyinotify for Linux, let me add watch_directory for Windows (a good discussion of the possibilities in Windows is here, the module's an example) and fsevents on the Mac (unfortunately I don't think there's a single cross-platform module offering a uniform interface to these various system-specific ways to get directory-change notification events).
Once you manage to get such events, updating an appropriate SQL database is simple!-)
If you use Linux, you can use PyInotify.
inotify can notify you about filesystem events when your program is running.
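A minimal pyinotify sketch (the watched path is a placeholder and the database calls are stubbed out as prints) that reacts to files being added, changed or removed anywhere under a directory tree:

import pyinotify

wm = pyinotify.WatchManager()
mask = (pyinotify.IN_CREATE | pyinotify.IN_DELETE |
        pyinotify.IN_MODIFY | pyinotify.IN_MOVED_TO)

class MusicHandler(pyinotify.ProcessEvent):
    def process_IN_CREATE(self, event):
        print("added:", event.pathname)      # INSERT a row for the new file here
    def process_IN_MOVED_TO(self, event):
        print("moved in:", event.pathname)   # treat like an add
    def process_IN_MODIFY(self, event):
        print("updated:", event.pathname)    # UPDATE the existing row here
    def process_IN_DELETE(self, event):
        print("removed:", event.pathname)    # DELETE the row here

notifier = pyinotify.Notifier(wm, MusicHandler())
wm.add_watch("/path/to/music", mask, rec=True, auto_add=True)
notifier.loop()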
IMO, the best media player that has these features is Winamp. It rescans the music folders every X minutes, which is enough for music (but of course a little less efficient than letting the operating system watch for changes).
But as you were asking for suggestions on writing your own, you could make use of pyinotify (Linux only). If you're running Windows, you can use the ReadDirectoryChangesW API call.