i need to handle a lot of files and i dont want to worry about their filenames etc.
So the idea is i sent a file Test.png to an external db/rest api etc. with python and i am getting back an uuid for this file.
The next time my script does need this file it calls this external services with uuid and does get back this file :)
It does sound like amazon s3 ;) But i need something running on linux (what i would prefer) or windows and it should be open source. I dont need something special and very easy to use (out of the box). Oh and no big data solution, the amount of data would be around 5 gb
My file processing language is Python 3.6
Does someone know a good solution here?
thanks
After days of search, i found something :) I want to share this:
https://minio.io/
Object Storage, can be ran inside docker etc. and has a client library for python, just perfekt for me :)
Related
I am using Python, and I want to analyze audio files from internet streaming media (for example Youtube, Soundcloud, etc.)
Is there a universal way to do so? There is a pre-loading for every music or video, there must be a way to access it? How?
I want to run this script on an external server, that might be relevant to the answer.
Thanks
All sound you "hear" on your pc has to run through your soundcard, Maybe somehow write a script that "records" the sounds runnning through the device. Maybe u can use Pymedia module?
I'm implementing a FUSE driver for Google Drive. The aim is to allow a user to mount her Google Drive/Docs account as a virtual filesystem. Full source at https://github.com/jforberg/drivefs. I use the fusepy bindings to integrate FUSE with Python, and Google's Document List API to access Drive.
My driver is complete to the degree that readdir(2), stat(2) and read(2) work as expected. In the filesystem, each file read translates to a HTTPS request which has a large overhead. I've managed to limit the overhead by forcing a larger buffer size for reads.
Now to my problem. File explorers like Thunar and Nautilus build thumbs and determine file types by reading the first part of each file (the first 4k bytes or so). But in my filessystem, reading from many files at once is a painful procedure, and getting a file listing in thunar takes a very long time compared with a simple ls (which only stat(2)s each file).
I need some way to tell file explorers that my filessystem does not play well with "mini-reads", or some way to identify these mini-reads and feed them made-up data to make them happy. Any help would be appreciated!
EDIT: The problem was not with HTTPS overhead, but with my handling of Google's native "doc" format. I added a line to make read(2) return an empty string when someone tries to read a native doc, and the file listing is now almost instantaneous.
This seems a mild limitation, as not even Google's official client program is able to edit native docs.
Here is pycloudfuse which is a similar attempt but for cloud files / openstack object storage which you might find useful bits in.
When writing this I can't say I noticed any problems with Thunar and Nautilus with the directory listings.
I don't think you can feed the file managers made up data - that is bound to lead to problems.
I like the option is to signal to the file explorer not to do thumbnails etc, but I don't think that is possible either.
I think the best option is to remind your users that drivefs is not a real filesystem, and to give a list of its limitations, and if it is anything like pycloudfuse there will be lots!
I am a postdoc and I just finished a cool little scientific application in Python and want to share it with the world. It's a really useful tool for genetecists.
I'd really like to let people run this program through a CGI form interface. Since I'm not a student anymore, I no longer have webspace with a tidy little cgi-bin subdirectory that's hooked up perfectly.
I wrote a simple CGI Python program a few years ago, and was trying to use this as a template.
Here is my quesion:
My program needs to create temporary files (when run from the command line it saves images to a given path).
I've read a couple tutorials on Apache, etc. and got lots of things running, but I can't figure out how to let my program write temporary files (I also don't know where these files would live, etc.). Any time I try to write to a file (in any manner) in my Python program, the CGI "crashes" and doesn't seem OK.
I am not extremely worried about security because the temporary files will only be outputs of the program (not the user input).
And while I'm asking (I'm assuming you're kind of a CGI ninja if you got this far and weren't bored), do you know my CGI program can take a file argument without making a temporary file?
My previous approach to this was to simply take a list of text as an argument:
try:
if item.file:
data = item.file.read()
if check:
Tools_file.main(["ExeName", "-d", "-w " + data])
else:
Tools_file.main(["ExeName", "-s", "-d", "-w " + data])
...
I'd like to do this the right way! Cheers in advance.
Stack overflowingly yours,
Oliver
Well, the "right" way is probably to re-work things using an existing web framework like Django. It's probably overkill in this case. Don't underestimate the security aspects here. They're probably more relevant than you think.
All that said, you probably want to use Python's temp file module from the standard library:
http://docs.python.org/library/tempfile.html
It'll generally write stuff out to /tmp/whatever if you're on unix. If your program is crashing only when run under apache (but runs fine when you execute it directly), check your permissions. Make sure your apache user has permission to write to wherever you've decided to store your temp files. Make sure the temp files are written with appropriate permissions (don't want to write a file that you can't read later on).
As Paul McMillan said, use tempfile:
temp, temp_filename = tempfile.mkstemp(text = True)
temp_output = os.fdopen(temp, 'w')
temp_output.write(something_or_other)
temp_output.close()
My personal opinion is that frameworks are a big time sink unless you really need the prebuilt functionality. CGI is far simpler and can probably work for your application, at least until it gets really popular.
I need to run the python program in the backend. To the script I have given one input file and the code is processing that file and creating new output file. Now if I change the input file content I don't want to run the code again. It should run in the back end continously and generate the output file. Please if someone knows the answer for this let me know.
thank you
Basically, you have to set up a so-called FileWatcher, i.e. some mechanism which looks out for changes in a file.
There are several techniques for watching file/directory changes in python. Have a look at this question: Monitoring contents of files/directories?. Another link is here, this is about directory changes but file changes are handled in a similar way. You could also google for "watch file changes python" in order to get a lot of answers :)
Note: If you're programming in windows, you should probably implement your program as windows service, look here for how to do that.
I want to program a virtual file system in Windows with Python.
That is, a program in Python whose interface is actually an "explorer windows". You can create & manipulate file-like objects but instead of being created in the hard disk as regular files they are managed by my program and, say, stored remotely, or encrypted or compressed or versioned, or whatever I can do with Python.
What is the easiest way to do that?
While perhaps not quite ripe yet (unfortunately I have no first-hand experience with it), pywinfuse looks exactly like what you're looking for.
Does it need to be Windows-native? There is at least one protocol which can be both browsed by Windows Explorer, and served by free Python libraries: FTP. Stick your program behind pyftpdlib and you're done.
Have a look at Dokan a User mode filesystem for Windows. There are Ruby, .NET (and Java by 3rd party) bindings available, and I don't think it'll be difficult to write python bindings either.
If you are trying to write a virtual file system (I may misunderstand you) - I would look at a container file format. VHD is well documented along with HDI and (embedded) OSQ. There are basically two things you need to do. One is you need to decide on a file/container format. After that it is as simple as writing the API to manipulate that container. If you would like it to be manipulated over the internet, pick a transport protocol then just write a service (would would emulate a file system driver) that listens on a certain port and manipulates this container using your API
You might be interested in PyFilesystem;
A filesystem abstraction layer for Python
PyFilesystem is an abstraction layer for filesystems. In the same way that Python's file-like objects provide a common way of accessing files, PyFilesystem provides a common way of accessing entire filesystems. You can write platform-independent code to work with local files, that also works with any of the supported filesystems (zip, ftp, S3 etc.).
What the description on the homepage does not advertise is that you can then expose this abstraction again as a filesystem, among others SFTP, FTP (though currently disfunct, probably fixable) and dokan (dito) as well as fuse.