Is there a Python library to read metadata (camera model, time created, etc ...) from video files? The Perl equivalent is "exiftool." I checked pyexiv2, but it doesn't have video support like exiftool does. Thanks.
I have used hachoir-metadata successfully: http://pypi.python.org/pypi/hachoir-metadata
I have used PyExifTool, a wrapper for the command-line program exiftool. You can get the library here (I think this is the result of the related question in Sven's comment).
The neat thing about PyExifTool is that it also parses the metadata into a dictionary for you.
I used it on a list of file names from os.walk.
import exiftool

exiftool_executable = "<path to the exiftool executable>"
with exiftool.ExifTool(executable_=exiftool_executable) as et:
    metadata = et.get_metadata_batch(fileList)
I want to rename mp3 files on my mac using python before importing them to iTunes. So I need to change the "Title" of the file, not the file's name. As in, I want to change "Al-Fatihah" in the picture below to "new_title".
Most online resources and questions that I found suggest using either external libraries or os.stat(), which only gives info about the modification and creation of the file (second picture below), unless I'm misunderstanding something. I was wondering if there is a way to do this without having to download extra libraries, as I'm not always sure which libraries are safe.
Thanks!
If you don't use a library, you're gonna have to go in and manually edit the bytes yourself. The 'title' you're referring to is an ID3 tag, which is a standard defining which parts of the mp3 file contain data about the track.
In the case of ID3v1, the last 128 bytes of the file are reserved for metadata; within that block, the first 3 bytes are the "TAG" marker and bytes 4 through 33 (a 30-byte field) hold the title.
Manually writing bytes in Python is an absolute pain, so I strongly, strongly recommend using a library for this menial task. eyeD3 is a library that can do this for you. If you are not "sure which libraries are safe", why don't you have a look at the source code for these libraries to check that they're safe yourself?
If you really must absolutely edit them using only Python, you'd have to go about it like this. I'm pasting this answer from another question about manipulating bytes. It is not an exact solution, more a guideline of what manually editing the bytes would entail:
with open("filename.mp3", "r+b") as f:
    fourbytes = bytearray(f.read(4))  # read the first four bytes
    fourbytes[0] = fourbytes[1]       # whatever, manipulate your bytes here
    f.seek(0)
    f.write(bytes(fourbytes))
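Putting the ID3v1 layout described above into practice, here is a rough standard-library-only sketch (the function name and file handling are my own; it assumes the file is at least 128 bytes long, and it ignores ID3v2 tags, which live at the start of the file):

```python
def set_id3v1_title(path, title):
    """Overwrite the 30-byte ID3v1 title field of an MP3 in place."""
    with open(path, "r+b") as f:
        f.seek(-128, 2)              # ID3v1 lives in the last 128 bytes
        if f.read(3) != b"TAG":      # no ID3v1 tag yet: append a blank one
            f.seek(0, 2)
            f.write(b"TAG" + b"\x00" * 125)
        f.seek(-125, 2)              # start of the 30-byte title field
        f.write(title.encode("latin-1")[:30].ljust(30, b"\x00"))
```

Note that this is exactly the kind of fiddly, easy-to-get-wrong byte work that eyeD3 handles for you.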
What is the easiest way to create a HDF5-file of an SPSS-file by Python?
If you haven't already stumbled upon it, check out h5py. Iterating over SPSS's data with SPSS's addon python module, and placing the data in an h5py object should be all you need to do.
The Python code at this link
http://code.activestate.com/recipes/577811-python-reader-writer-for-spss-sav-files-linux-mac-/
reads and writes .sav files. It uses the free I/O modules produced by IBM SPSS.
I am searching for a way to write a simple Python program to perform an automatic edit on an audio file.
With PIL I wrote automatic picture resizing to a predefined size.
I would like to write the same for audio: automatic re-encoding to a predefined bitrate.
Similarly, I would like to write a Python program that can stretch an audio file and re-encode it.
Do I have to parse MP3s myself, or is there a library that can be used for this?
Rather than doing this natively in Python, I strongly recommend leaving the heavy lifting up to FFMPEG, by executing it from your script.
It can chop, encode, and decode just about anything you throw at it. You can find a list of common parameters here: http://howto-pages.org/ffmpeg/
This way, you can leave your Python program to figure out the logic of what you want to cut and where, and not spend a decade writing code to deal with all of the audio formats available.
If you don't like the idea of directly executing it, there is also a Python wrapper available for FFMPEG.
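To make the "execute FFMPEG from your script" approach concrete, here is a minimal sketch (the function names and file names are my own; it assumes ffmpeg is on your PATH):

```python
import subprocess

def build_reencode_cmd(src, dst, bitrate="128k"):
    # -y overwrites dst if it exists; -b:a sets the audio bitrate
    return ["ffmpeg", "-y", "-i", src, "-b:a", bitrate, dst]

def reencode(src, dst, bitrate="128k"):
    subprocess.run(build_reencode_cmd(src, dst, bitrate), check=True)
```

Building the command as a list (rather than a shell string) avoids quoting problems with file names that contain spaces.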
There is pydub. It's an easy to use library.
This has already been asked here, but I was looking for a solution that would work on Linux. Is tiffcp the only way?
Looks like ImageMagick can do it. The solution is essentially the same; call it from the command line.
Specifically, you want the -adjoin option (which is on by default). The command will look something like:
convert *.tiff my_combined_file.tiff
Haven't tried it, but there is pylibtiff, a Python wrapper for libtiff, the library on which tiffcp is implemented.
I know this is an old question, but convert has the drawback that it recompresses the images. You can use the python tifftools package to do this without recompressing the images: tifftools merge *.tiff combined_file.tiff.
Disclaimer: I am the author of the tifftools package.
I get a file via an HTTP upload and need to make sure it's a PDF file. The programming language is Python, but this should not matter.
I thought of the following solutions:
1. Check if the first bytes of the file are %PDF. This is not a strong check, but it prevents the user from accidentally uploading other files.
2. Use libmagic (the file command in bash uses it). This does exactly the same check as in (1).
3. Use a library to try to read the page count out of the file. If the library can read a page count, it should be a valid PDF file. Problem: I don't know a Python library that can do this.
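For reference, the header check in option (1) is a few lines of standard-library Python (this is only my sketch; note that some PDF readers are lenient and accept the %PDF- marker anywhere in the first 1024 bytes, which this strict version does not):

```python
def looks_like_pdf(data: bytes) -> bool:
    # Strict check: the upload must begin with the %PDF- magic marker.
    # This only proves the file claims to be a PDF, not that it parses.
    return data.startswith(b"%PDF-")
```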
Are there solutions using a library or another trick?
The current solution (as of 2023) is to use pypdf and catch exceptions (and possibly analyze reader.metadata):
from pypdf import PdfReader
from pypdf.errors import PdfReadError

# Write a file that is definitely not a PDF
with open("testfile.txt", "w") as f:
    f.write("hello world!")

try:
    PdfReader("testfile.txt")
except PdfReadError:
    print("invalid PDF file")
The two most commonly used PDF libraries for Python are:
pyPdf
ReportLab
Both are pure Python, so they should be easy to install as well as cross-platform.
With pyPdf it would probably be as simple as doing:
from pyPdf import PdfFileReader

doc = PdfFileReader(open("upload.pdf", "rb"))
This should be enough, but doc will now have documentInfo and numPages attributes if you want to do further checking.
As Carl answered, pdftotext is also a good solution, and would probably be faster on very large documents (especially ones with many cross-references). However, it might be a little slower on small PDFs due to the system overhead of forking a new process, etc.
In a project of mine I need to check the MIME type of some uploaded file. I simply use the file command like this:
from subprocess import Popen, PIPE

filetype = Popen(
    "/usr/bin/file -b --mime -",
    shell=True, stdout=PIPE, stdin=PIPE,
).communicate(file.read(1024))[0].strip()
You of course might want to move the actual command into a configuration file, as the command-line options also vary among operating systems (e.g. macOS).
If you just need to know whether it's a PDF and don't need to process it anyway, I think the file command is a faster solution than a library. Doing it by hand is of course also possible, but the file command gives you more flexibility if you want to check for different types.
If you're on a Linux or OS X box, you could use pdftotext (part of Xpdf, found here). If you pass a non-PDF to pdftotext, it will certainly bark at you, and you can use commands.getstatusoutput (subprocess.getstatusoutput on Python 3) to capture the output and parse it for these warnings.
If you're looking for a platform-independent solution, you might be able to make use of pyPdf.
Edit: It's not elegant, but it looks like pyPdf's PdfFileReader will throw an IOError(22) if you attempt to load a non-PDF.
I ran into the same problem but was not forced to use a programming language to manage this task. I tried pyPDF, but it was not efficient for me, as it hung indefinitely on some corrupted files.
However, I have found this tool useful so far:
https://sourceforge.net/projects/corruptedpdfinder/
Good luck with it.
Here is a solution using pdfminer.six, which can be installed with pip install pdfminer.six:

from pdfminer.high_level import extract_text

def is_pdf(path_to_file):
    try:
        extract_text(path_to_file)
        return True
    except Exception:
        return False
You can also use filetype (pip install filetype):

import filetype

def is_pdf(path_to_file):
    kind = filetype.guess(path_to_file)
    return kind is not None and kind.mime == 'application/pdf'

(filetype.guess returns None for unrecognized files, so guard against that before reading .mime.)
Neither of these solutions is ideal.
The problem with the filetype solution is that it doesn't tell you if the PDF itself is readable or not. It will tell you if the file is a PDF, but it could be a corrupt PDF.
The pdfminer solution should only return True if the PDF is actually readable. But it is a big library and seems like overkill for such a simple function.
I've started another thread here asking how to check if a file is a valid PDF without using a library (or using a smaller one).
By valid do you mean that it can be displayed by a PDF viewer, or that the text can be extracted? They are two very different things.
If you just want to check that it really is a PDF file that has been uploaded then the pyPDF solution, or something similar, will work.
If, however, you want to check that the text can be extracted then you have found a whole world of pain! Using pdftotext would be a simple solution that would work in a majority of cases but it is by no means 100% successful. We have found many examples of PDFs that pdftotext cannot extract from but Java libraries such as iText and PDFBox can.