I have this file "image.jp
and this .mp3 file:
"Green Day - When I Come Around [Official Music Video].mp3"
in the directory "test"
I have already successfully set tags as Author, Title, Album and etc using eyeD3 library.
and then I try to set the Cover Art.
I've tried two possibilities, but none of them worked:
First one: Mutagen:
from mutagen.mp3 import MP3
from mutagen.id3 import ID3, APIC, error
complete_file_path = "test\\"+"Green Day - When I Come Around [Official Music Video].mp3"
path_to_thumb_wf = "test\\"+"image.jpg"
audio = MP3(complete_file_path, ID3=ID3)
# add ID3 tag if it doesn't exist
try:
audio.add_tags()
except error:
pass
print(path_to_thumb_wf)
audio.tags.add(
APIC(
encoding=3, # 3 is for utf-8
mime='image/jpg', # image/jpeg or image/png
type=3, # 3 is for the cover image
desc=u'Cover',
data=open(path_to_thumb_wf, 'rb').read()
)
)
audio.save(v2_version=3)
And the solution using eyeD3
audiofile = eyed3.load(complete_file_path)
# read image into memory
imagedata = open(path_to_thumb_wf,"rb").read()
# append image to tags
audiofile.tag.images.set(3,imagedata,"image/jpeg", u"you can put a description here")
audiofile.tag.save()
I'm using python 3.5.2 on Windows 10. And i don't know if it could influence the result but i'll say anyway, the song has already a cover art that I'd like to change.
As explained in the ID3v2.3 section on APIC:
There may be several pictures attached to one file, each in their individual "APIC" frame, but only one with the same content descriptor. There may only be one picture with the picture type declared as picture type $01 and $02 respectively.
In v2.3, IIRC, "content descriptor" isn't actually documented anywhere, so different clients may do slightly different things here, but most tools will treat it either as the picture type plus description string, or as the entire header (text encoding, MIME type, picture type, and encoded description) as a binary blob. (And some tools just ignore it and allow you to store pictures with completely identical frame headers, but I don't think that's relevant with Mutagen.)
At any rate, this means you're probably just adding another Cover (front) picture, named 'Cover', rather than replacing any existing one.
You haven't explained how you're looking at the file. But I'm guessing you're trying to open it in Windows Media Player or iTunes or some other player, or view it in Windows Explorer (which I think just asks WMP to read the tag), or something like that?
Almost all such tools, when faced with multiple images, just show you the first one. (Some of them don't even distinguish on picture type, and show you the first image of any type, even if it's a 32x32 file icon…)
Some do have a way to view the other pictures, however. For example, in iTunes, if you Get Info or Properties on the track, then go to the Cover Art or similar tab (sorry for the vagueness, but the names have changed across versions), you can see all of the pictures in the tag.
At any rate, if you want to replace the APIC with a different one, you either need to exactly match the descriptor (and, again, that can mean different things to different libraries…), or, more simply, just delete the old one as well as adding the new one.
One more thing to watch out for: both iTunes and WMP cache cover art, and assume that it's never going to change once the file has been imported. And WMP also has various things that can override the image in the file, such as a properly-UUID'd folder cover art image in the same directory.
Related
I wanted change the all fonts in about 100 powerpoint files, without opening the files. There are several shape types in each slide and each might have a different font. I used python-pptx package and wrote the following code to change the fonts of all texts in a powerpoint presentation. Although it does not give any error, it does not work, and the fonts in the file are still whatever they were, for example Arial. I also added print(shape.text) to make sure that it has found all texts, and it seems that there is no issue there. Is it a bug? Or am I missing anything?
prs = Presentation('f10.pptx')
for i, slide in enumerate(prs.slides):
for shape in slide.shapes:
print (shape.has_text_frame)
if shape.has_text_frame:
print(shape.text)
for p in shape.text_frame.paragraphs:
for r in p.runs:
print(r.font.name)
r.font.name = 'Tahoma'
print(r.font.name)
prs.save('f10_tahoma.pptx')
Besides, it seems that the package does not work for utf-8 characters. I added a text-box on the last slide by adding:
text_frame = shape.text_frame
text_frame.clear() # not necessary for newly-created shape
p = text_frame.paragraphs[0]
run = p.add_run()
run.text = 'سلام '
font = run.font
font.name = 'Andalus'
font.size = Pt(18)
before saving the file to add a textbox with utf-8 characters. It adds it there, and when I check the font it shows that it is set to Andalus, but actually it is not Andalus.
With Aspose.Slides for Python via .NET, you can easily change all fonts for all texts in your presentations. The following code example shows you how to do this:
import aspose.slides as slides
with slides.Presentation('example.pptx') as presentation:
for slide in presentation.slides:
for shape in slide.shapes:
if isinstance(shape, slides.AutoShape):
for paragraph in shape.text_frame.paragraphs:
for portion in paragraph.portions:
portion.portion_format.latin_font = slides.FontData('Tahoma')
You can also evaluate Aspose.Slides Cloud SDK for Python for presentation manipulating. This REST-based API allows you to make 150 free API calls per month for API learning and presentation processing.
Aspose Slides Online Viewer can be used to view presentations without PowerPoint installed.
I work at Aspose.
What language is the text of the file? Run.font properties work fine for UTF-8, but there is a separate font for cursive scripts like Arabic. Access to that secondary font is not implemented in python-pptx unfortunately, but that could explain at least part of the behavior you're seeing.
For roman character text (like that we're using here), there are a couple things to check.
The font in question needs to be installed on the machine PowerPoint is running on when the document is opened. Otherwise PowerPoint will substitute a font.
The font (typeface) name used in the XML will not always exactly match what appears in the PowerPoint drop-down selection box. You need to give that name to python-pptx in the exact form it should appear in the XML. You may need to make an example file that works by hand, perhaps containing a single slide with a single textbox for simplicity, and then inspect the XML of that file to find the "spelling" used for that typeface by PowerPoint.
You could do that with code like this:
prs = Presentation("example.pptx")
shape = prs.slides[0].shapes[0]
print(shape._element.xml)
You should be able to locate the typeface name somewhere in an element like <p:rPr> or <p:defRPr>.
I am creating image that I would like to embed in the e-mail. I cannot figure out how to create image as binary and pass into MIMEImage. Below is the code I have and I have error when I try to read image object - the error is "AttributeError: 'NoneType' object has no attribute 'read'".
image=Image.new("RGBA",(300,400),(255,255,255))
image_base=ImageDraw.Draw(image)
emailed_password_pic=image_base.text((150,200),emailed_password,(0,0,0))
imgObj=emailed_password_pic.read()
msg=MIMEMultipart()
html="""<p>Please finish registration <br/><img src="cid:image.jpg"></p>"""
img_file='image.jpg'
msgText = MIMEText(html,'html')
msgImg=MIMEImage(imgObj)
msgImg.add_header('Content-ID',img_file)
msg.attach(msgImg)
msg.attach(msgText)
If you look at line 4 - I am trying to read image so that I can pass it into MIMEImage. Apparently, image needs to be read as binary. However, I don't know how to convert it to binary so that .read() can process it.
FOLLOW-UP
I edited code per suggestions from jsbueno - thank you very much!!!:
emailed_password=os.urandom(16)
image=Image.new("RGBA",(300,400),(255,255,255))
image_base=ImageDraw.Draw(image)
emailed_password_pic=image_base.text((150,200),emailed_password,(0,0,0))
stream_bytes=BytesIO()
image.save(stream_bytes,format='png')
stream_bytes.seek(0)
#in_memory_file=stream_bytes.getvalue()
#imgObj=in_memory_file.read()
imgObj=stream_bytes.read()
msg=MIMEMultipart()
sender='xxx#abc.com'
receiver='jjjj#gmail.com'
subject_header='Please use code provided in this e-mail to confirm your subscription.'
msg["To"]=receiver
msg["From"]=sender
msg["Subject"]=subject_header
html="""<p>Please finish registration by loging into your account and typing in code from this e-mail.<br/><img src="cid:image.png"></p>"""
img_file='image.png'
msgText=MIMEText(html,'html')
msgImg=MIMEImage(imgObj) #Is mistake here?
msgImg.add_header('Content-ID',img_file)
msg.attach(msgImg)
msg.attach(msgText)
smtpObj=smtplib.SMTP('smtp.mandrillapp.com', 587)
smtpObj.login(userName,userPassword)
smtpObj.sendmail(sender,receiver,msg.as_string())
I am not getting errors now but e-mail does not have image in it. I am confused about the way image gets attached and related to in html/email part. Any help is appreciated!
UPDATE:
This code actually works - I just had minor typo in the code on my PC.
There are a couple of conceptual errors there, both in using PIL and on what format an image should be in order to be incorporated into an e-mail.
In PIL: the ImageDraw class operates inplace, not like the Image class calls, which usually return a new image after each operation. In your code, it means that the call to image_base.text is actually changing the pixel data of the object that lies in your image variable. This call actually returns None and the code above should raise an error like "AttributeError: None object does not have attribute 'read'" on the following line.
Past that (that is, you should fetch the data from your image variable to attach it to the e-mail) comes the second issue: PIL, for obvious reasons, have images in an uncompressed, raw pixel data format in memory. When attaching images in e-mails we usually want images neatly packaged inside a file - PNG or JPG formats are usually better depending on the intent - let's just stay with .PNG. So, you have to create the file data using PIL, and them attach the file data (i.e. the data comprising a PNG file, including headers, metadata, and the actual pixel data in a compressed form). Otherwise you'd be putting in your e-mail a bunch of (uncompressed) pixel data that the receiving party would have no way to assemble back into an image (even if he would treat the data as pixels, raw pixel data does not contain the image shape so-)
You have two options: either generate the file-bytes in memory, or write them to an actual file in disk, and re-read that file for attaching. The second form is easier to follow. The first is both more efficient and "the right thing to do" - so let's keep it:
from io import BytesIO
# In Python 2.x:
# from StringIO import StringIO.StringIO as BytesIO
image=Image.new("RGBA",(300,400),(255,255,255))
image_base=ImageDraw.Draw(image)
# this actually modifies "image"
emailed_password_pic=image_base.text((150,200),emailed_password,(0,0,0))
stream = BytesIO()
image.save(stream, format="png")
stream.seek(0)
imgObj=stream.read()
...
(NB: I have not checked the part dealing with mail and mime proper in your code - if you are using it correctly, it should work now)
Change IPython working directory
Inserting image into IPython notebook markdown
Hi, I've read the two above links, and the second link seems most relevant. what the person describes - simply calling the subdirectory - doesn't work for me. For instance, I have an image 'gephi.png' in '/Graphs/gephi.png'
But when I write the following
from IPython.display import Image
path = "/Graphs/gephi.png"
i = Image(path)
i
no image pops up - Yup. No error. Just nothing pops up besides an empty square box image.
Clarification:
When I move the image to the regular director, the image pops up fine.
My only code change is path = "gephi.png"
IPython's Image display object takes three kinds of arguments
The first is raw image data (e.g. the results of open(filename).read():
with open("Graphs/graph.png") as f:
data = f.read()
Image(data=data)
The second model is to load an image from a filename. This is functionally the same as above, but IPython does the reading from the file:
Image(filename="Graphs/graph.png")
The third form is passing URLs. External URLs can be used, but relative URIs will serve files relative to the notebook's own directory:
Image(url="Graphs/graph.png")
Where this can get confusing is if you don't tell IPython which one of these you are specifying, and you just pass the one argument positionally:
Image("Graphs/graph.png")
IPython tries to guess what you mean in this case:
if it looks like a path and points to an existing file, use it as a filename
if it looks like a URL, use it as a URL
otherwise, fallback on embedding the string as raw png data
That #3 is the source of the most confusion. If you pass it a filename that doesn't exist,
you will get a broken image:
Image("/Graphs/graph.png")
Note that URLs to local files must be relative. Absolute URLs will generally be wrong:
Image(url="/Graphs/graph.png")
An example notebook illustrating these things.
In my work as a grad student, I capture microscope images and use python to save them as raw tif's. I would like to add metadata such as the name of the microscope I am using, the magnification level, and the imaging laser wavelength. These details are all important for how I post-process the images.
I should be able to do this with a tif, right? Since it has a header?
I was able to add to the info in a PIL image:
im.info['microscope'] = 'george'
but when I save and load that image, the info I added is gone.
I'm open to all suggestions. If I have too, I'll just save a separate .txt file with the metadata, but it would be really nice to have it embedded in the image.
Tifffile is one option for saving microscopy images with lots of metadata in python.
It doesn't have a lot of external documentation, but the docstings are great so you can get a lot of info just by typing help(tifffile) in python, or go look at the source code.
You can look at the TiffWriter.save function in the source code (line 750) for the different keyword arguments you can use to write metadata.
One is to use description, which accepts a string. It will show up as the tag "ImageDescription" when you read your image.
Another is to use the extratags argument, which accepts a list of tuples. That allows you to write any tag name that exist in TIFF.TAGS(). One of the easiest way is to write them as strings because then you don't have to specify length.
You can also write ImageJ metadata with ijmetadata, for which the acceptable types are listed in the source code here.
As an example, if you write the following:
import json
import numpy as np
import tifffile
im = np.random.randint(0, 255, size=(150, 100), dtype=np.uint8)
# Description
description = "This is my description"
# Extratags
metadata_tag = json.dumps({"ChannelIndex": 1, "Slice": 5})
extra_tags = [("MicroManagerMetadata", 's', 0, metadata_tag, True),
("ProcessingSoftware", 's', 0, "my_spaghetti_code", True)]
# ImageJ metadata. 'Info' tag needs to be a string
ijinfo = {"InitialPositionList": [{"Label": "Pos1"}, {"Label": "Pos3"}]}
ijmetadata = {"Info": json.dumps(ijinfo)}
# Write file
tifffile.imsave(
save_name,
im,
ijmetadata=ijmetadata,
description=description,
extratags=extra_tags,
)
You can see the following tags when you read the image:
frames = tifffile.TiffFile(save_name)
page = frames.pages[0]
print(page.tags["ImageDescription"].value)
Out: 'this is my description'
print(page.tags["MicroManagerMetadata"].value)
Out: {'ChannelIndex': 1, 'Slice': 5}
print(page.tags["ProcessingSoftware"].value)
Out: my_spaghetti_code
For internal use, try saving the metadata as JSON in the TIFF ImageDescription tag, e.g.
from __future__ import print_function, unicode_literals
import json
import numpy
import tifffile # http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html
data = numpy.arange(256).reshape((16, 16)).astype('u1')
metadata = dict(microscope='george', shape=data.shape, dtype=data.dtype.str)
print(data.shape, data.dtype, metadata['microscope'])
metadata = json.dumps(metadata)
tifffile.imsave('microscope.tif', data, description=metadata)
with tifffile.TiffFile('microscope.tif') as tif:
data = tif.asarray()
metadata = tif[0].image_description
metadata = json.loads(metadata.decode('utf-8'))
print(data.shape, data.dtype, metadata['microscope'])
Note that JSON uses unicode strings.
To be compatible with other microscopy software, consider saving OME-TIFF files, which store defined metadata as XML in the ImageDescription tag.
I should be able to do this with a tif, right? Since it has a header?
No.
First, your premise is wrong, but that's a red herring. TIFF does have a header, but it doesn't allow you to store arbitrary metadata in it.
But TIFF is a tagged file format, a series of chunks of different types, so the header isn't important here. And you can always create your own private chunk (any ID > 32767) and store anything you want there.
The problem is, nothing but your own code will have any idea what you stored there. So, what you probably want is to store EXIF or XMP or some other standardized format for extending TIFF with metadata. But even there, EXIF or whatever you choose isn't going to have a tag for "microscope", so ultimately you're going to end up having to store something like "microscope=george\nspam=eggs\n" in some string field, and then parse it back yourself.
But the real problem is that PIL/Pillow doesn't give you an easy way to store EXIF or XMP or anything else like that.
First, Image.info isn't for arbitrary extra data. At save time, it's generally ignored.
If you look at the PIL docs for TIFF, you'll see that it reads additional data into a special attribute, Image.tag, and can save data by passing a tiffinfo keyword argument to the Image.save method. But that additional data is a mapping from TIFF tag IDs to binary hunks of data. You can get the Exif tag IDs from the undocumented PIL.ExifTags.TAGS dict (or by looking them up online yourself), but that's as much support as PIL is going to give you.
Also, note that accessing tag and using tiffinfo in the first place requires a reasonably up-to-date version of Pillow; older versions, and classic PIL, didn't support it. (Ironically, they did have partial EXIF support for JPG files, which was never finished and has been stripped out…) Also, although it doesn't seem to be documented, if you built Pillow without libtiff it seems to ignore tiffinfo.
So ultimately, what you're probably going to want to do is:
Pick a metadata format you want.
Use a different library than PIL/Pillow to read and write that metadata. (For example, you can use GExiv2 or pyexif for EXIF.)
You could try setting tags in the tag property of a TIFF image. This is an ImageFileDirectory object. See TiffImagePlugin.py.
Or, if you have libtiff installed, you can use the subprocess module to call the tiffset command to set a field in the header after you have saved the file. There are online references of available tags.
According to this page:
If one needs more than 10 private tags or so, the TIFF specification suggests that, rather then using a large amount of private tags, one should instead allocate a single private tag, define it as datatype IFD, and use it to point to a socalled 'private IFD'. In that private IFD, one can next use whatever tags one wants. These private IFD tags do not need to be properly registered with Adobe, they live in a namespace of their own, private to the particular type of IFD.
Not sure if PIL supports this, though.
I need to find the difference between two PDF files. Does anybody know of any Python-related tool which has a feature that directly gives the diff of the two PDFs?
What do you mean by "difference"? A difference in the text of the PDF or some layout change (e.g. an embedded graphic was resized). The first is easy to detect, the second is almost impossible to get (PDF is an VERY complicated file format, that offers endless file formatting capabilities).
If you want to get the text diff, just run a pdf to text utility on the two PDFs and then use Python's built-in diff library to get the difference of the converted texts.
This question deals with pdf to text conversion in python: Python module for converting PDF to text.
The reliability of this method depends on the PDF Generators you are using. If you use e.g. Adobe Acrobat and some Ghostscript-based PDF-Creator to make two PDFs from the SAME word document, you might still get a diff although the source document was identical.
This is because there are dozens of ways to encode the information of the source document to a PDF and each converter uses a different approach. Often the pdf to text converter can't figure out the correct text flow, especially with complex layouts or tables.
I do not know your use case, but for regression tests of script which generates pdf using reportlab, I do diff pdfs by
Converting each page to an image using ghostsript
Diffing each page against page image of standard pdf, using PIL
e.g
im1 = Image.open(imagePath1)
im2 = Image.open(imagePath2)
imDiff = ImageChops.difference(im1, im2)
This works in my case for flagging any changes introduced due to code changes.
Met the same question on my encrypted pdf unittest, neither pdfminer nor pyPdf works well for me.
Here are two commands (pdftocairo, pdftotext) work perfect on my test. (Ubuntu Install: apt-get install poppler-utils)
You can get pdf content by:
from subprocess import Popen, PIPE
def get_formatted_content(pdf_content):
cmd = 'pdftocairo -pdf - -' # you can replace "pdftocairo -pdf" with "pdftotext" if you want to get diff info
ps = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE)
stdout, stderr = ps.communicate(input=pdf_content)
if ps.returncode != 0:
raise OSError(ps.returncode, cmd, stderr)
return stdout
Seems pdftocairo can redraw pdf files, pdftotext can extract all text.
And then you can compare two pdf files:
c1 = get_formatted_content(open('f1.pdf').read())
c2 = get_formatted_content(open('f2.pdf').read())
print(cmp(c1, c2)) # for binary compare
# import difflib
# print(list(difflib.unified_diff(c1, c2))) # for text compare
Even though this question is quite old, my guess is that I can contribute to the topic.
We have several applications generating tons of PDFs. One of these apps is written in Python and recently I wanted to write integration tests to check if the PDF generation was working correctly.
Testing PDF generation is HARD, because the specs for PDF files are very complicated and non-deterministic. Two PDFs, generated with the same exact input data, will generate different files, so direct file comparison is discarded.
The solution: we have to go with testing the way they look like (because THAT should be deterministic!).
In our case, the PDFs are being generated with the reportlab package, but this doesn't matter from the test perspective, we just need a filename or the PDF blob (bytes) from the generator. We also need an expectation file containing a "good" PDF to compare with the one coming from the generator.
The PDFs are converted to images and then compared. This can be done in multiple ways, but we decided to use ImageMagick, because it is extremely versatile and very mature, with bindings for almost every programming language out there. For Python 3, the bindings are offered by the Wand package.
The test looks something like the following. Specific details of our implementation were removed and the example was simplified:
import os
from unittest import TestCase
from wand.image import Image
from app.generators.pdf import PdfGenerator
DIR = os.path.dirname(__file__)
class PdfGeneratorTest(TestCase):
def test_generated_pdf_should_match_expectation(self):
# `pdf` is the blob of the generated PDF
# If using reportlab, this is what you get calling `getpdfdata()`
# on a Canvas instance, after all the drawing is complete
pdf = PdfGenerator().generate()
# PDFs are vectorial, so we need to set a resolution when
# converting to an image
actual_img = Image(blob=pdf, resolution=150)
filename = os.path.join(DIR, 'expected.pdf')
# Make sure to use the same resolution as above
with Image(filename=filename, resolution=150) as expected:
diff = actual.compare(expected, metric='root_mean_square')
self.assertLess(diff[1], 0.01)
The 0.01 is as low as we can tolerate small differences. Considering that diff[1] varies from 0 to 1 using the root_mean_square metric, we are here accepting a difference up to 1% on all channels, comparing with the sample expected file.
Check this out, it can be useful: pypdf