So, I am a beginner in Python and have recently covered the basics of the language.
I now have a small project in mind: a script that converts files in bulk, specifically .ogg files into .mp4 or .mkv files. The goal is to convert WhatsApp audio files, which come in .ogg format, into something easier to work with.
I would like hints, suggestions, and guidance on how to do this. Which library could I use, and where can I learn more about file conversion in Python?
Not a Python user, but I will try to give you some direction.
There is a program called ffmpeg. It can be used as a command-line tool to convert almost any audio/video file to almost any format. If you decide to use it manually, download the binaries here, put your .ogg file in the /bin directory next to the ffmpeg.exe file, and execute (here is the screenshot for better understanding):
./ffmpeg.exe -i input.ogg output.mp4
or, to copy the streams without re-encoding:
./ffmpeg.exe -i input.ogg -c copy output.mp4
These are only basic commands; for more examples, check this answer.
But I would suggest simply using an existing Python wrapper for this tool; check this quick start guide, which has lots of examples in Python. This answer can also be helpful for more details.
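As a rough sketch of the bulk conversion, the basic command above can also be driven from Python with the standard subprocess module instead of a wrapper library. This assumes ffmpeg is installed and reachable on PATH; the function names `build_ffmpeg_command` and `convert_folder` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Return the argument list for `ffmpeg -i <src> <dst>`."""
    # Assumption: "ffmpeg" is on PATH; otherwise put the full path to ffmpeg.exe here.
    return ["ffmpeg", "-i", str(src), str(dst)]


def convert_folder(folder: str, target_ext: str = ".mp4") -> list[Path]:
    """Convert every .ogg file in `folder`, writing each output next to its input."""
    converted = []
    for src in sorted(Path(folder).glob("*.ogg")):
        dst = src.with_suffix(target_ext)
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(build_ffmpeg_command(src, dst), check=True)
        converted.append(dst)
    return converted
```

Calling `convert_folder("path/to/whatsapp/audio", ".mkv")` would then produce one .mkv file per .ogg file in that folder.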
I want to download the source code for Python packages using something like
pip download --no-binary=:all: $package==$version
This almost always results in a tar.gz file (at least on Linux), which is what I want. For NumPy version 1.13.1, however, I get a zip file instead. This inconsistency makes it somewhat harder to write automatic install scripts, so I would like to ask: is there any way to choose the format of the retrieved archive?
pip downloads what it finds on PyPI. For numpy it finds a .zip and not a .tar.*, so you have to know in advance which formats the package publishers provide.
You can ask the NumPy team (or, better yet, provide a patch) to publish a .tar.* in addition to the .zip.
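Since the format cannot be chosen, one possible workaround for an install script is to accept whichever archive arrives. The sketch below uses `shutil.unpack_archive` from the standard library, which picks the format from the file extension; the helper name `unpack_sdist` is hypothetical.

```python
import shutil
from pathlib import Path


def unpack_sdist(archive: str, dest: str) -> list[str]:
    """Unpack a .tar.gz or .zip source archive and return the top-level names."""
    # unpack_archive selects the unpacker from the file extension,
    # so the caller does not need to know which format pip downloaded.
    shutil.unpack_archive(archive, dest)
    return sorted(p.name for p in Path(dest).iterdir())
```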
I am working on a python script to generate release notes.
I am looking for a way to list all files present in a given changeset.
I am not interested in what changed, but the whole list of files.
For the moment I have thought of two possibilities:
(1) update the repository to a given changeset and get the list of files
(2) customize the hg log output via a template
(1) above is not very elegant, and I have not been able to implement (2).
Do you have some suggestions?
I think I found the answer myself:
hg manifest -r <changeset>
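For the release-notes script, the command above could be called from Python roughly as follows. This is a minimal sketch that assumes hg is on PATH and that the output is one path per line; `parse_manifest` and `files_in_changeset` are illustrative names.

```python
import subprocess


def parse_manifest(output: str) -> list[str]:
    """Split `hg manifest` output (one path per line) into a list of paths."""
    return [line for line in output.splitlines() if line]


def files_in_changeset(repo_dir: str, changeset: str) -> list[str]:
    """Run `hg manifest -r <changeset>` inside repo_dir and return the file list."""
    result = subprocess.run(
        ["hg", "manifest", "-r", changeset],
        cwd=repo_dir,          # run inside the repository
        capture_output=True,   # collect stdout instead of printing it
        text=True,
        check=True,            # raise if hg reports an error
    )
    return parse_manifest(result.stdout)
```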
We need to check the md5sum of our self-made Python packages, taking it from the resulting *.whl file. The problem is that the md5sum changes on every build, even if there are no changes in the source code. We have also tested this on third-party packages, e.g. django-celery, and see the same behavior.
So the questions are:
What differs if we don't change the source code?
Is it possible to get the same md5sum for the same python builds?
Update:
To illustrate the issue, here are two reports made from two django-celery builds.
The checksums of the build contents are exactly the same (4th column), but the checksums of the *.whl files themselves differ.
Links to the reports:
https://www.dropbox.com/s/0kkbhwd2fgopg67/django_celery-3.1.17-py2-none-any2.htm?dl=0
https://www.dropbox.com/s/vecrq587jjrjh2r/django_celery-3.1.17-py2-none-any1.htm?dl=0
Quoting the relevant PEP:
A wheel is a ZIP-format archive with a specially formatted file name and the .whl extension.
ZIP archives preserve the modification time of each file.
Wheel archives do not contain just source code, but also other files and directories that are generated on the fly when the archive is created. Therefore, even if you don't touch your Python source code, the wheel will still contain contents that have a different modification time.
One way to work around this problem is to unzip the wheel and compute the checksums of the contents.
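A minimal sketch of that workaround, reading the members straight from the archive rather than unzipping to disk. It hashes names and contents only, so timestamps stored in the ZIP metadata are ignored. SHA-256 is used here as an assumption; `hashlib.md5` would work the same way. The function name is hypothetical, and the digest will still differ if the generated files themselves differ between builds.

```python
import hashlib
import zipfile


def wheel_content_digest(wheel_path: str) -> str:
    """Hash the file names and file contents of a wheel, ignoring ZIP metadata."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(wheel_path) as wheel:
        # Sort names so the member order inside the archive does not matter.
        for name in sorted(wheel.namelist()):
            digest.update(name.encode("utf-8"))
            digest.update(wheel.read(name))
    return digest.hexdigest()
```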
I'm looking for a way to read specific files from a rar archive into memory. Specifically they are a collection of numbered image files (I'm writing a comic reader). While I can simply unrar these files and load them as needed (deleting them when done), I'd prefer to avoid that if possible.
That all said, I'd prefer a solution that's cross platform (Windows/Linux) if possible, but Linux is a must. Just as importantly, if you're going to point out a library to handle this for me, please understand that it must be free (as in beer) or OSS.
See the rarfile module:
http://grue.l-t.ee/~marko/src/rarfile/README.html
http://pypi.python.org/pypi/rarfile/
https://github.com/markokr/rarfile
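A rough sketch of how a comic reader might use rarfile to load single pages into memory. It assumes the third-party rarfile package is installed; the helper names `page_order`, `list_pages`, and `read_page` are made up for this example.

```python
import re


def page_order(name: str) -> list:
    """Sort key that places 'page10.jpg' after 'page2.jpg'."""
    # re.split with a capture group alternates text and digit runs,
    # so the digit runs can be compared as integers.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def list_pages(archive_path: str) -> list[str]:
    """Return the member names of the archive in reading order."""
    import rarfile  # third-party: pip install rarfile

    with rarfile.RarFile(archive_path) as archive:
        return sorted(archive.namelist(), key=page_order)


def read_page(archive_path: str, name: str) -> bytes:
    """Read one member of the archive into memory."""
    import rarfile  # third-party: pip install rarfile

    with rarfile.RarFile(archive_path) as archive:
        return archive.read(name)
```

The bytes returned by `read_page` can be handed to whatever image library the reader uses, so nothing has to be written to disk by the script itself.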
The real answer is that there isn't a library, and you can't make one. You can use rarfile, or you can use 7zip unRAR (which is less free than 7zip, but still free as in beer), but both approaches require an external executable. The license for RAR basically requires this, as while you can get source code for unRAR, you cannot modify it in any way, and turning it into a library would constitute illegal modification.
Also, solid RAR archives (the best compressed) can't be randomly accessed, so you have to unarchive the entire thing anyhow. WinRAR presents a UI that seems to avoid this, but really it's just unpacking and repacking the archive in the background.
The pyUnRAR2 library can extract files from RAR archives to memory (and disk if you want). It's available under the MIT license and simply wraps UnRAR.dll on Windows and unrar on Unix. Click "QuickTutorial" for usage examples.
On Windows, it is able to extract to memory (and not disk) with the (included) UnRAR.dll by setting a callback using RARSetCallback() and then calling RARProcessFile() with the RAR_TEST option instead of the RAR_EXTRACT option to avoid extracting any files to disk. The callback then watches for UCM_PROCESSDATA events to read the data. From the documentation for UCM_PROCESSDATA events: "Process unpacked data. It may be used to read a file while it is being extracted or tested without actual extracting file to disk."
On Unix, unrar can simply print the file to stdout, so the library just reads from a pipe connected to unrar's stdout. The unrar binary you need is the one that has the "p" for "Print file to stdout" command. Use "apt-get install unrar" to install it on Ubuntu.
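The Unix approach can be sketched directly with subprocess. This is not pyUnRAR2's actual code, only an illustration of reading from unrar's stdout. It assumes the unrar binary is on PATH, and the function names are hypothetical.

```python
import subprocess


def build_unrar_print_command(archive: str, member: str) -> list[str]:
    """Return the argument list that prints one archive member to stdout."""
    # "p" is the print-to-stdout command; "-inul" silences unrar's own messages
    # so that stdout contains only the file data.
    return ["unrar", "p", "-inul", archive, member]


def read_member_via_unrar(archive: str, member: str) -> bytes:
    """Return the raw bytes of `member` without extracting it to disk."""
    result = subprocess.run(
        build_unrar_print_command(archive, member),
        capture_output=True,
        check=True,  # raise if unrar exits with an error
    )
    return result.stdout
```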
It seems like the limitation that rarsoft imposes on derivative works is that you may not use the unrar source code to create a variation of the RAR COMPRESSION algorithm. From the context, it would appear that it's specifically allowing folks to use his code (modified or not) to decompress files, but you cannot use them if you intend to write your own compression code. Here is a direct quote from the license.txt file I just downloaded:
The UnRAR sources may be used in any software to handle RAR
archives without limitations free of charge, but cannot be used
to re-create the RAR compression algorithm, which is proprietary.
Distribution of modified UnRAR sources in separate form or as a
part of other software is permitted, provided that it is clearly
stated in the documentation and source comments that the code may
not be used to develop a RAR (WinRAR) compatible archiver.
Seeing as everyone seemed to just want something that would allow them to write a comic viewer capable of handling reading images from CBR (rar) files, I don't see why people think there's anything keeping them from using the provided source code.
RAR is a proprietary format; I don't think there are any public specs, so third-party tool and library support is poor to nonexistent.
You're much better off using ZIP; it's completely free, has an accurate public spec, the compression library is available everywhere (zlib is one of the most widely-deployed libraries in the world), and it's very easy to code for.
http://docs.python.org/library/zipfile.html
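For comparison, reading a single file from a ZIP archive into memory takes only a few lines with the standard zipfile module. In this small sketch the archive is built in memory purely for demonstration, and the member name `001.png` is made up.

```python
import io
import zipfile


def read_zip_member(archive, name: str) -> bytes:
    """Read one member of a ZIP archive into memory without extracting to disk."""
    # `archive` may be a file path or a file-like object.
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name)


# Build a tiny archive in memory to demonstrate the call.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("001.png", b"fake image bytes")

page = read_zip_member(buffer, "001.png")
```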
The free 7zip library is also able to handle RAR files.
Look at the Python "struct" module. You can then interpret the RAR file format directly in your Python program, allowing you to retrieve the content inside the RAR without depending on external software to do it for you.
EDIT: This is of course vanilla Python - there are alternatives which use third-party modules (as already posted).
EDIT 2: According to Wikipedia's article my answer would require you to have permission from the author.
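To give an idea of what parsing with struct looks like, here is a small sketch that unpacks only the 7-byte marker block at the start of an archive. The field layout (CRC, type, flags, size, all little-endian) is my understanding of the older RAR 4.x header format and should be treated as an assumption. Nothing here decompresses any data.

```python
import struct

# 7-byte signature found at the start of RAR 4.x (and older) archives.
RAR4_MARKER = b"Rar!\x1a\x07\x00"


def parse_marker_block(data: bytes) -> dict:
    """Unpack the marker block at the start of a RAR 4.x archive."""
    if data[:7] != RAR4_MARKER:
        raise ValueError("not a RAR 4.x archive")
    # "<HBHH": little-endian uint16 CRC, uint8 type, uint16 flags, uint16 size.
    head_crc, head_type, head_flags, head_size = struct.unpack("<HBHH", data[:7])
    return {
        "crc": head_crc,
        "type": head_type,
        "flags": head_flags,
        "size": head_size,
    }
```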