What is a good audio library for validating files in Python? - python

I'm already checking for content-type, size, and extension (Django (audio) File Validation), but I need a library to read the file and confirm that it is in fact what I hope it is (mp3 and mp4 mostly).
I've been here: http://wiki.python.org/moin/Audio/ but no luck. Been at this one for a while, am a bit lost in the woods. Relying on SO big time for this whole end of things...
Thanks in advance.
EDIT:
I'm already (in Django) using UploadedFile.content_type:
"The content-type header uploaded with the file (e.g. text/plain or application/pdf). Like any data supplied by the user, you shouldn't trust that the uploaded file is actually this type. You'll still need to validate that the file contains the content that the content-type header claims -- 'trust but verify.'"
So, I'm already reading the header. But how can I validate the actual content of the file?

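If just checking the header isn't good enough, I'd recommend using mutagen to load the file. It should throw an exception if the file is not what it claims to be (as far as I know, the generic mutagen.File() loader returns None instead when it cannot identify the type, so check for that as well).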
FYI, I do not think your approach is very scalable. Is it really necessary to read every byte of the file? What is your reason for not trusting the file header?

You can call the Unix file command from Python like this:
>>> import subprocess
>>> filename = 'Giant Steps.mp3'
>>> # os.system() would only return the exit status; check_output() captures the text
>>> subprocess.check_output(['file', filename]).decode().strip()
'Giant Steps.mp3: ISO Media, MPEG v4 system, iTunes AAC-LC'
** See the man pages for more details on the 'file' command if you want to go this route.
See this post for other options

Use sndhdr
It does a little more than content-type: it reads the file and gets its headers. Of course this is still not foolproof; using ffmpeg is probably then the only option.
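For what it's worth, here is a minimal sketch of the same "read the first bytes" idea for the two formats in the question, using only the standard library. It is not what sndhdr or mutagen do internally; the function name sniff_audio is made up, and the checks are deliberately simplified (only the first 12 bytes are inspected, so an MP3 with leading junk would be missed). Treat it as a cheap first filter, not as proof the file will play.

```python
import io


def sniff_audio(fileobj):
    """Return 'mp3', 'mp4', or None based on the first bytes of fileobj.

    Hypothetical helper: assumes fileobj is seekable and positioned at 0.
    """
    head = fileobj.read(12)
    fileobj.seek(0)  # rewind so the caller can still save or parse the upload

    # MP4/M4A: a 4-byte box size followed by the box type 'ftyp'
    if len(head) >= 12 and head[4:8] == b"ftyp":
        return "mp4"
    # MP3 that starts with an ID3v2 tag
    if head[:3] == b"ID3":
        return "mp3"
    # Bare MPEG frame: 11 sync bits set and the layer bits equal to Layer III
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE6) == 0xE2:
        return "mp3"
    return None
```

In a Django view you would pass the uploaded file object itself, e.g. sniff_audio(request.FILES['audio']) (the field name 'audio' is an assumption), and reject the upload when the result is None or does not match the extension.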

Related

Reading attributes of .msg file

I am trying to read a .msg file to get the sender, recipients, and title.
I'm making this script for my workplace where I'm only allowed to install default python libraries so I want to use the email module to do this.
On the python website I found some examples of using the email module. https://docs.python.org/3/library/email.examples.html
Near the end of the page it talks about getting the sender, subject and recipient. I've tried using this code like this:
# Import the email modules we'll need
from email import policy
from email.parser import BytesParser
with open('test_email.msg', 'rb') as fp:
    msg = BytesParser(policy=policy.default).parse(fp)
# Now the header items can be accessed as a dictionary, and any non-ASCII will
# be converted to unicode:
print('To:', msg['to'])
print('From:', msg['from'])
print('Subject:', msg['subject'])
This results in an output:
To: None
From: None
Subject: None
I checked the file test_email.msg, it is a valid email.
When I add a line of code
print(msg)
I get an output of a garbled email the same as if I opened the .msg file in notepad.
Can anybody suggest why the email module isn't finding the sender/recipient/subject correctly?
You are apparently attempting to read some sort of proprietary binary format. The Python email library does not support this; it only handles traditional (basically text) RFC822 / RFC5322 format.
To read Microsoft's OLE formats, you will need a third-party module, and some patience, voodoo, and luck.
Also, for the record, there is no unambiguous definition of .msg. Outlook uses this file extension for its files, but it is used on other files in other formats as well, including traditional RFC822 files.
(The second link attempts to link to the MS-OXMSG spec on MSDN; but Microsoft have in the past regarded URLs as some sort of depletable resource which runs out when you use it, so the link will probably stop working if enough people click on it.)
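To tell the two cases apart with only the standard library, one option is to look at the first eight bytes before handing the data to the email module. This is a rough sketch, not a .msg reader: the helper name read_headers is made up, and the only thing it knows about Outlook files is the well-known Compound File (OLE) signature. If the signature matches, you are in third-party-module territory as described above.

```python
from email import policy
from email.parser import BytesParser

# Signature at the start of every OLE Compound File (Outlook .msg included)
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def read_headers(raw):
    """Return a dict of to/from/subject for RFC 5322 bytes, or None for OLE data."""
    if raw.startswith(OLE_MAGIC):
        return None  # binary Outlook format: the email module cannot parse it
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    return {name: msg[name] for name in ("to", "from", "subject")}


sample = (b"From: alice@example.com\r\n"
          b"To: bob@example.com\r\n"
          b"Subject: Hello\r\n"
          b"\r\n"
          b"Body\r\n")
print(read_headers(sample))
```

For a real file, read it with open(path, 'rb').read() and pass the bytes in; a None result explains the "To: None" output in the question.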

Send PDF file path to client to download after conversion in WeasyPrint

In my Django app, I'm using WeasyPrint to convert html report to pdf. I need to send the converted file back to client so they can download it. But I don't see any code on WeasyPrint site where we can get the path of saved file or know in any way where the file was saved.
If I hard code the path, like, D:/Python/Workspace/report.pdf and try to open it via javascript, it simply says that the address was not understood.
What is a better way to approach this issue?
My code:
HTML(string=htmlContent).write_pdf(fileName,
    stylesheets=[CSS(filename='css/bootstrap.min.css')])
This is all the code related to WeasyPrint that generated PDF file.
You didn't even bother to post the relevant code, but anyway:
If you're using the Python API, you either specify the output file path when calling weasyprint.HTML().write_pdf() or get the PDF back as bytestring, as documented here - and then you can either manually save it to a file somewhere you can redirect your user to or just pass the bytestring to django's HttpResponse.
If you're using the commandline (which would be quite surprising from a Django app...), you have to specify the output path too...
IOW : I don't really understand your problem. FWIW, the whole documentation is here : http://weasyprint.readthedocs.io/en/latest/ - and there's a quite obvious link on the project's homepage (which is how I found it FWIW).
EDIT : now you posted your actual code: the answer is written in plain in the FineManual(tm):
Parameters: target – A filename, file-like object, or None
Returns:
The PDF as byte string if target is not provided or None, otherwise None
(the PDF is written to target.)
IOW, either you pass the filename for the PDF to be generated and then serve this file to the user, or you can just pass your Django HttpResponse as target, cf this example in Django's doc.

Upload image with an in-memory stream to input using Pillow + WebDriver?

I'm getting an image from a URL with Pillow, and creating a stream (BytesIO/StringIO).
r = requests.get("http://i.imgur.com/SH9lKxu.jpg")
stream = Image.open(BytesIO(r.content))
I want to upload this image using an <input type="file" /> with selenium WebDriver. I can do something like this to upload a file:
self.driver.find_element_by_xpath("//input[@type='file']").send_keys("PATH_TO_IMAGE")
I would like to know if it's possible to upload that image from a stream without having to mess with files / file paths... I'm trying to avoid filesystem read/write, and do it in-memory or at most with temporary files. I'm also wondering if that stream could be encoded to Base64, and then uploaded by passing the string to the send_keys function you can see above :$
PS: Hope you like the image :P
You seem to be asking multiple questions here.
First, how do you convert a JPEG without downloading it to a file? You're already doing that, so I don't know what you're asking here.
Next, "And do it in-memory or as much with temporary files." I don't know what this means, but you can do it with temporary files with the tempfile library in the stdlib, and you can do it in-memory too; both are easy.
Next, you want to know how to do a streaming upload with requests. The easy way to do that, as explained in Streaming Uploads, is to "simply provide a file-like object for your body". This can be a tempfile, but it can just as easily be a BytesIO. Since you're already using one in your question, I assume you know how to do this.
(As a side note, I'm not sure why you're using BytesIO(r.content) when requests already gives you a way to use a response object as a file-like object, and even to do it by streaming on demand instead of by waiting until the full content is available, but that isn't relevant here.)
If you want to upload it with selenium instead of requests… well then you do need a temporary file. The whole point of selenium is that it's scripting a web browser. You can't just type a bunch of bytes at your web browser in an upload form, you have to select a file on your filesystem. So selenium needs to fake you selecting a file on your filesystem. This is a perfect job for tempfile.NamedTemporaryFile.
Finally, "I'm also Wondering If that stream could be encoded to Base64".
Sure it can. Since you're just converting the image in-memory, you can just encode it with, e.g., base64.b64encode. Or, if you prefer, you can wrap your BytesIO in a codecs wrapper to base-64 it on the fly. But I'm not sure why you want to do that here.
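To make the tempfile and Base64 parts concrete, here is a small sketch using only the stdlib. The helper name bytes_to_temp_path and the stand-in bytes are made up; in your code the bytes would be r.content, and the returned path is what you would hand to send_keys.

```python
import base64
import os
import tempfile


def bytes_to_temp_path(data, suffix=".jpg"):
    """Write data to a named temporary file and return its path."""
    # delete=False keeps the file on disk after it is closed, so another
    # process (the browser driven by selenium) can open it by name
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
    return tmp.name


image_bytes = b"\xff\xd8\xff\xe0 stand-in for r.content"
path = bytes_to_temp_path(image_bytes)
# file_input.send_keys(path) would go here, where file_input is the
# <input type="file"> element found through the WebDriver

encoded = base64.b64encode(image_bytes)  # Base64 form, if something else needs it
os.remove(path)  # clean up once the upload has finished
```

Note that the Base64 string is only useful if the receiving side expects Base64; typing it into a file input will not upload anything.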

Python : How to email attached Unix file in DOS format

On a Unix server, I am using smtplib in python to send an email to myself ; the email also contains a unix file attachment. I use outlook client to view the email and when I open the file, it does not display correctly due to differences in Unix and DOS format.
Is there any way using smtplib to send the Unix file in DOS format?
I do not want to use unix2dos as I do not want to create/modify files on the filesystem.
Editing the question to include changes based on suggestions from senior members
Since I have been asked to modify the file, need to know if there is a simpler way to do that. I am not well versed with Python so please bear with me. I have tried a few variations of the following but none have worked. My requirement is that I do not want to write to the file system. I want to save the changes into a variable in memory.
import string
fo=open(filename,"r")
filecontent=fo.readlines()
for line in filecontent:
    line = string.replace(line,"\n","\r\m")
This is only a variation around the first comment on your question:
with open(filename, 'r') as f:
    content = f.read().replace('\n', '\r\n')
After that, the variable content holds the content of your file, with newlines replaced. In addition, using the with construct ensures your file is properly closed after reading.
Please note it is your responsibility to ensure that the file is "small enough" to hold in memory. If not sure, you could read line by line as you proposed yourself. That being said, I'm not quite sure I understand what went wrong with that approach in the first place...
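If you do go line by line, a sketch along these lines should work. The names to_dos and read_as_dos are made up for illustration. Two assumptions worth stating: the file is opened with newline="" so Python does not translate line endings behind your back, and lines that already end in \r\n are normalized first so they are not doubled into \r\r\n.

```python
def to_dos(text):
    """Return text with LF and CRLF line endings both written as CRLF."""
    # Collapse existing CRLF to LF first so the second replace cannot double it
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


def read_as_dos(path):
    """Read a text file line by line and return its content with CRLF endings."""
    # newline="" disables newline translation, so each line keeps its real ending
    with open(path, "r", newline="") as f:
        return "".join(to_dos(line) for line in f)


print(repr(to_dos("first\nsecond\r\nthird")))
```

Nothing is written to the filesystem; the converted text lives only in the returned string, which you can then attach to the message.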

How to handle unicode of an unknown encoding in Django?

I want to save some text to the database using the Django ORM wrappers. The problem is, this text is generated by scraping external websites and many times it seems they are listed with the wrong encoding. I would like to store the raw bytes so I can improve my encoding detection as time goes on without redoing the scrapes. But Django seems to want everything to be stored as unicode. Can I get around that somehow?
You can store the data encoded as base64, for example. Or try to analyze the HTTP headers; it may be simpler to get the proper encoding from there.
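A rough sketch of the base64 idea, independent of Django: the raw bytes survive untouched inside a plain ASCII string, so you can retry decoding later with a better guess. The function names pack and unpack and the list of candidate encodings are assumptions for illustration only.

```python
import base64


def pack(raw):
    """Encode raw bytes as ASCII text that any text column can hold."""
    return base64.b64encode(raw).decode("ascii")


def unpack(stored, encodings=("utf-8", "cp1252")):
    """Recover the original bytes and decode with the first encoding that works."""
    raw = base64.b64decode(stored)
    for enc in encodings:
        try:
            return raw, raw.decode(enc)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    return raw, None  # bytes are still intact even if no guess worked


scraped = "café".encode("cp1252")  # stand-in for bytes from a mislabeled page
stored = pack(scraped)             # this string is what goes into the text field
raw, text = unpack(stored)
```

Because the stored value round-trips to the exact original bytes, improving the encoding detection later only means changing unpack, not redoing the scrapes.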
Create a File with the data. Use a Django models.FileField to hold a reference to the file.
No, it does not involve a ton of I/O. If your file is small, it adds 2 or 3 I/Os (the directory read, the inode read and the data read).
