Convert and save string to binary file in Python - python

I'm using PyOBEX to exchange binary files (e.g. images etc.) between my computer (Windows 7) and my phone (Android). However, when I use get() to get a file from my phone, it arrives on my computer as a str. I tried using the chardet module to find out what encoding to use to decode it and eventually turn it into a binary file, but it returned None. type() says that it's a str.
The code is the following:
import bluetooth
import BTDeviceFinder
import PyOBEX.client
name = "myDevice"
address = BTDeviceFinder.find_by_name(name)
port = BTDeviceFinder.find_port(address)
client = PyOBEX.client.BrowserClient(address, port)
client.connect()
a, b = client.get("pic.jpg")
where a is the header (that comes with a file sent via OBEX) and b is the actual file object. b looks something like this: https://drive.google.com/file/d/0By0ywTLTjb3LaFJaM2hWVEdBakE/view?usp=sharing
The PyOBEX documentation or Python forums say nothing about what encoding is used with get().
Do you know how to turn this string into binary data that can be used with write() and then saved in the original file format (i.e. .jpg)?

In Python 2.7, a str holds raw bytes (this changes in Python 3, where binary data is a separate bytes type).
You simply need to write the data to a file opened in binary mode:
with open('file.jpg', 'wb') as handle:
    handle.write(data_string)
Here is a link to the python doc on open:
https://docs.python.org/2/library/functions.html#open
Note that the "b" represents binary.
Again, this is assuming Python 2.7
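For readers on Python 3, the same idea can be sketched as follows, assuming the payload is already a bytes object. The save_bytes helper and the sample payload are made up for illustration and are not part of PyOBEX.

```python
import os
import tempfile


def save_bytes(path, payload):
    """Write a bytes payload to disk unchanged (hypothetical helper)."""
    if not isinstance(payload, (bytes, bytearray)):
        raise TypeError("expected bytes, got %s" % type(payload).__name__)
    # "wb" opens the file in binary mode, so no newline translation happens.
    with open(path, "wb") as handle:
        handle.write(payload)


# Stand-in for the data returned by client.get(); it starts with JPEG magic bytes.
sample = b"\xff\xd8\xff\xe0" + bytes(range(256))
target = os.path.join(tempfile.mkdtemp(), "pic.jpg")
save_bytes(target, sample)

# Read the file back in binary mode to confirm nothing was altered.
with open(target, "rb") as handle:
    round_trip = handle.read()
```

If the payload arrives as a Python 3 str instead, it has to be encoded back to bytes first, and which encoding is right depends on how the library produced it.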

Related

zapier - python - pass bytes to output to be used for next action

I am trying to make an automation on Zapier with the flow like this:
1. Trigger: a webhook that receives a POST request. The body has a file key whose value is the base64 string of a certain PDF, so the type is str
2. Action: a Zapier Python Code step that retrieves the file from the webhook and decodes the base64 string to bytes to get the real content of the PDF into, say, a variable named file_bytes
3. Action: a Dropbox step that takes file_bytes from the previous step and uploads it to Dropbox
I coded the decoder myself (point 2) and tested that it worked well on my local system.
The problem is that Dropbox (point 3) only accepts binary data, while the Python step (point 2) cannot pass on any value that is not JSON serializable. This is a documented limitation of Zapier:
output A dictionary or list of dictionaries that will be the "return value" of this code. You can explicitly return early if you like. This must be JSON serializable!
...
The closest I could find among other questions on this site are these two, but they did not help:
Why am I getting a Runtime.MarshalError when using this code in Zapier?
Use Python to get image in Zapier
...
The code to decode base64 string to bytes is like so:
file_bytes = base64.b64decode(input_data['file'])
What I already did:
pass the file_bytes to output like so:
output = [{'file': input_data['file_bytes']}]
but it gave me This must be JSON serializable!
pass the file_bytes as string like so:
output = [{'file': str(input_data['file_bytes'])}]
it did upload to Dropbox, but the file content is corrupt (of course it is, duh).
pass the file_bytes as decoded string with latin-1 encoding:
output = [{'file': input_data['file_bytes'].decode('latin-1')}]
it did upload to Dropbox, and the PDF can be opened and even has the same number of pages as the original, but every page is blank (white, no content).
...
So, is this kind of thing really feasible on the Zapier platform? Or was I at a dead end from the very beginning?
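The three attempts above can be reproduced locally with a small sketch. The sample bytes are invented, and it is only an assumption that a downstream step re-encodes the string as UTF-8. The sketch shows what each conversion does to the data. It does not show what the Dropbox action accepts.

```python
import base64
import json

# "%PDF-" followed by three bytes above 0x7f, as found in compressed PDF streams.
original = bytes([0x25, 0x50, 0x44, 0x46, 0x2D, 0x80, 0xE2, 0xFF])

# str() on bytes gives the text of the literal ("b'...'"), not the content.
as_repr = str(original)

# latin-1 maps every byte to one code point, so decoding never fails...
as_latin1 = original.decode("latin-1")
# ...but the bytes only survive if the text is encoded back as latin-1.
reencoded_latin1 = as_latin1.encode("latin-1")
# Encoded as UTF-8 instead, every byte above 0x7f becomes two bytes.
reencoded_utf8 = as_latin1.encode("utf-8")

# A base64 string is plain ASCII, so it is JSON serializable and reversible.
as_base64 = base64.b64encode(original).decode("ascii")
payload = json.dumps([{"file": as_base64}])
restored = base64.b64decode(json.loads(payload)[0]["file"])
```

A PDF whose page structure is ASCII but whose compressed streams were re-encoded this way would plausibly open with the right page count and no visible content, which matches the symptom described.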

Open URL encoded filenames in Unix

I'm a python n00b. I have downloaded URL encoded file and I want to work with it on my unix system(Ubuntu 14).
When I try and run some operations on my file, the system says that the file doesn't exist. How do I change my filename to a unix recognizable format?
Some of the files I have downloaded have spaces in their names, so they would have to be written with a backslash followed by a space. Below is a snippet of my code:
link = "http://www.stephaniequinn.com/Music/Scheherezade%20Theme.mp3"
output = open(link.split('/')[-1],'wb')
output.write(site.read())
output.close()
shutil.copy(link.split('/')[-1], tmp_dir)
The "link" you have actually is a URL. URLs are special and are not allowed to contain certain characters, such as spaces. These special characters can still be represented, but in an encoded form. The translation from special characters to this encoded form happens via a certain rule set, often known as "URL encoding". If interested, have a read over here: http://en.wikipedia.org/wiki/Percent-encoding
The encoding operation can be inverted, which is called decoding. The tool set with which you downloaded the files you mentioned most probably did the decoding already, for you. In your link example, there is only one special character in the URL, "%20", and this encodes a space. Your download tool set probably decoded this, and saved the file to your file system with the actual space character in the file name. That is, most likely you have a file in the file system with the following basename:
Scheherezade Theme.mp3
So, when you want to open that file from within Python, and all you have is the link, you first need to get the decoded variant of it. Python can decode URL-encoded strings with built-in tools. This is what you need:
>>> import urllib.parse
>>> url = "http://www.stephaniequinn.com/Music/Scheherezade%20Theme.mp3"
>>> urllib.parse.unquote(url)
'http://www.stephaniequinn.com/Music/Scheherezade Theme.mp3'
>>>
This assumes that you are using Python 3, and that your link object is a unicode object (type str in Python 3).
Starting off with the decoded URL, you can derive the filename. Your link.split('/')[-1] method might work in many cases, but J.F. Sebastian's answer provides a more reliable method.
To extract a filename from an url:
#!/usr/bin/env python2
import os
import posixpath
import urllib
import urlparse
def url2filename(url):
    """Return basename corresponding to url.
    >>> url2filename('http://example.com/path/to/file?opt=1')
    'file'
    """
    urlpath = urlparse.urlsplit(url).path  # pylint: disable=E1103
    basename = posixpath.basename(urllib.unquote(urlpath))
    if os.path.basename(basename) != basename:
        raise ValueError  # refuse 'dir%5Cbasename.ext' on Windows
    return basename
Example:
>>> url2filename("http://www.stephaniequinn.com/Music/Scheherezade%20Theme.mp3")
'Scheherezade Theme.mp3'
You do not need to escape the space in the filename if you use it inside a Python script.
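The url2filename function above targets Python 2. A minimal Python 3 port, assuming only that urlparse and urllib.unquote are swapped for their urllib.parse equivalents, might look like this:

```python
import os
import posixpath
from urllib.parse import unquote, urlsplit


def url2filename(url):
    """Return the basename corresponding to url (Python 3 port)."""
    # Take only the path component, dropping query string and fragment.
    urlpath = urlsplit(url).path
    # Decode percent-escapes such as %20 before taking the last path segment.
    basename = posixpath.basename(unquote(urlpath))
    # Refuse names that still contain a separator for the local OS.
    if os.path.basename(basename) != basename:
        raise ValueError("refusing basename with a path separator: %r" % basename)
    return basename


filename = url2filename(
    "http://www.stephaniequinn.com/Music/Scheherezade%20Theme.mp3"
)
```

The resulting name can be passed straight to open() or shutil.copy(); shell-style escaping of the space is only needed on a command line.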
See complete code example on how to download a file using Python (with a progress report).

Map object has no len() in Python 3

I have this Python tool written by someone else to flash a certain microcontroller, but he has written this tool for Python 2.6 and I am using Python 3.3.
So, most of it I got ported, but this line is making problems:
data = map(lambda c: ord(c), file(args[0], 'rb').read())
The file function does not exist in Python 3 and has to be replaced with open. But then, a function which gets data as an argument causes an exception:
TypeError: object of type 'map' has no len()
But what I see so far in the documentation is, that map has to join iterable types to one big iterable, am I missing something?
What do I have to do to port this to Python 3?
In Python 3, map returns an iterator. If your function expects a list, the iterator has to be explicitly converted, like this:
data = list(map(...))
And we can do it simply, like this
with open(args[0], "rb") as input_file:
    data = list(input_file.read())
rb means read in binary mode, so read() returns a bytes object. In Python 3, iterating over bytes already yields integers, so the ord() call is no longer needed and we just have to convert the bytes to a list.
Quoting from the open's docs,
Python distinguishes between binary and text I/O. Files opened in
binary mode (including 'b' in the mode argument) return contents as
bytes objects without any decoding.
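To illustrate, here is a small self-contained sketch that writes a few bytes to a temporary file and reads them back as a list of integers. The file name firmware.bin and its content are placeholders, not part of the original tool.

```python
import os
import tempfile

# Create a small binary file to stand in for args[0].
path = os.path.join(tempfile.mkdtemp(), "firmware.bin")
with open(path, "wb") as output_file:
    output_file.write(b"\x00\x7fAB\xff")

# Binary mode returns bytes; no decoding is applied.
with open(path, "rb") as input_file:
    raw = input_file.read()

# Iterating over bytes yields ints, so list() replaces map(ord, ...).
data = list(raw)
```

Unlike a map object, the resulting list supports len() and indexing, which is what the receiving function expects.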

What config file format to use for user-friendly strings of arbitrary bytes?

So I made a short Python script to launch files in Windows with ambiguous extensions by examining their magic number/file signature first:
https://superuser.com/a/317927/13889
https://gist.github.com/1119561
I'd like to compile it to a .exe to make association easier (either using bbfreeze or rewriting in C), but I need some kind of user-friendly config file to specify the matching byte strings and program paths. Basically I want to put this information into a plain text file somehow:
magic_numbers = {
    # TINA
    'OBSS': r'%PROGRAMFILES(X86)%\DesignSoft\Tina 9 - TI\TINA.EXE',
    # PSpice
    '*version': r'%PROGRAMFILES(X86)%\Orcad\Capture\Capture.exe',
    'x100\x88\xce\xcf\xcfOrCAD ': '', #PSpice?
    # Protel
    'DProtel': r'%PROGRAMFILES(X86)%\Altium Designer S09 Viewer\dxp.exe',
    # Eagle
    '\x10\x80': r'%PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe',
    '\x10\x00': r'%PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe',
    '<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE eagle ': r'%PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe',
    # PADS Logic
    '\x00\xFE': r'C:\MentorGraphics\9.3PADS\SDD_HOME\Programs\powerlogic.exe',
}
(The hex bytes are just arbitrary bytes, not Unicode characters.)
I guess a .py file in this format works, but I have to leave it uncompiled and somehow still import it into the compiled file, and there's still a bunch of extraneous content like { and , to be confused by/screw up.
I looked at YAML, and it would be great except that it requires base64-encoding binary stuff first, which isn't really what I want. I'd prefer the config file to contain hex representations of the bytes. But also ASCII representations, if that's all the file signature is. And maybe also regexes. :D (In case the XML-based format can be written with different amounts of whitespace, for instance)
Any ideas?
You've already got your answer: YAML.
The data you posted up above is storing text representations of binary data; that will be fine for YAML, you just need to parse it properly. Usually you'd use something from the binascii module; in this case, likely the binascii.a2b_qp function.
magic_id_str = 'x100\x88\xce\xcf\xcfOrCAD '
magic_id = binascii.a2b_qp(magic_id_str)
To elucidate, I will use a unicode character as an easy way to paste binary data into the REPL (Python 2.7):
>>> a = 'Φ'
>>> a
'\xce\xa6'
>>> binascii.b2a_qp(a)
'=CE=A6'
>>> magic_text = yaml.load("""
... magic_string: '=CE=A6'
... """)
>>> magic_text
{'magic_string': '=CE=A6'}
>>> binascii.a2b_qp(magic_text['magic_string'])
'\xce\xa6'
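The REPL session above is Python 2.7. Under Python 3 the binascii functions take and return bytes, so a rough equivalent (leaving out the YAML step) would be:

```python
import binascii

# Encode two arbitrary non-ASCII bytes as quoted-printable text.
encoded = binascii.b2a_qp(b"\xce\xa6")

# Decode the quoted-printable form back into the raw bytes.
decoded = binascii.a2b_qp(encoded)

# Signatures that are plain ASCII pass through unchanged...
plain = binascii.a2b_qp(b"DProtel")

# ...and hex escapes can be mixed in where needed.
eagle = binascii.a2b_qp(b"=10=80")
```

This gives a config syntax where ASCII signatures stay readable and only the non-printable bytes are written as =XX hex pairs.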
I would suggest doing this a little differently. I would decouple these two settings from each other:
Magic number signature ===> mimetype
mimetype ==> program launcher
For the first part, I would use python-magic, a library that has bindings to libmagic. You can have python-magic use a custom magic file like this:
import magic
m = magic.Magic(magic_file='/path/to/magic.file')
Your users can specify a custom magic file mapping magic numbers to mimetypes. The syntax of magic files is documented. Here's an example showing the magic file for the TIFF format:
# Tag Image File Format, from Daniel Quinlan (quinlan@yggdrasil.com)
# The second word of TIFF files is the TIFF version number, 42, which has
# never changed. The TIFF specification recommends testing for it.
0 string MM\x00\x2a TIFF image data, big-endian
!:mime image/tiff
0 string II\x2a\x00 TIFF image data, little-endian
!:mime image/tiff
The second part then is pretty easy, since you only need to specify text data now. You could go with an INI or yaml format, as suggested by others, or you could even have just a simple tab-delimited file like this:
image/tiff C:\Program Files\imageviewer.exe
application/json C:\Program Files\notepad.exe
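Reading such a tab-delimited file takes only a few lines. The sketch below assumes one mimetype and one program path per line, separated by a single tab. The load_launchers name and the comment-line convention are hypothetical.

```python
import csv
import io

# Sample content; in practice this would come from open(path, newline="").
SAMPLE = (
    "image/tiff\tC:\\Program Files\\imageviewer.exe\n"
    "application/json\tC:\\Program Files\\notepad.exe\n"
)


def load_launchers(stream):
    """Return a dict mapping mimetype to program path (hypothetical helper)."""
    launchers = {}
    for row in csv.reader(stream, delimiter="\t"):
        # Skip blank lines, short lines, and lines starting with '#'.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        launchers[row[0].strip()] = row[1].strip()
    return launchers


launchers = load_launchers(io.StringIO(SAMPLE))
```

The mimetype reported by python-magic can then be looked up in the returned dict to find the program to launch.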
I've used several packages to build configuration files, including YAML. I recommend that you use ConfigParser or ConfigObj.
If you want a human-readable configuration file with comments, I strongly recommend ConfigObj.
ConfigObj
Brief ConfigObj tutorial
ConfigParser
Brief ConfigParser tutorial
Enjoy!
Example of ConfigObj
You can use ConfigObj to store them too. With this code:
import configobj
def createConfig(path):
    config = configobj.ConfigObj()
    config.filename = path
    config["Sony"] = {}
    config["Sony"]["product"] = "Sony PS3"
    config["Sony"]["accessories"] = ['controller', 'eye', 'memory stick']
    config["Sony"]["retail price"] = "$400"
    config["Sony"]["binary one"]= bin(173)
    config.write()
You get this file:
[Sony]
product = Sony PS3
accessories = controller, eye, memory stick
retail price = $400
binary one = 0b10101101
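A file like this can also be read back with the standard-library configparser module (the Python 3 name for ConfigParser), which avoids a third-party dependency. This is a sketch using the sample above as an inline string. Unlike ConfigObj, configparser returns every value as a string, so the list and the binary value are converted by hand.

```python
import configparser

SAMPLE = """\
[Sony]
product = Sony PS3
accessories = controller, eye, memory stick
retail price = $400
binary one = 0b10101101
"""

# Disable interpolation so characters like % in paths are taken literally.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)
section = parser["Sony"]

# Split the comma-separated value into a list ourselves.
accessories = [item.strip() for item in section["accessories"].split(",")]

# int() with base 2 accepts the 0b prefix written by bin().
binary_one = int(section["binary one"], 2)
```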

How to get string data from a python PIL image object?

I'm trying to send the data of a GIF screenshot of a desktop through a socket to a remote desktop (for desktop sharing), but I can't get the data as a string using PIL. I don't know how to convert PIL objects to a string. Here is my code (by the way, I know I could write to a file and then read the data back, but I think that is inefficient and there may be a better way):
from PIL import ImageGrab
import cStringIO
fakie = cStringIO.StringIO()
ImageGrab.grab().save(fakie, 'GIF')
data = fakie.getvalue()
fakie.close()
# This last bit of code is to see if the var data stored the right info in a str bc i need to send it through a socket
with open('C:\something\something\Desktop\image.gif', 'w') as f:
f.write(data)
The problem is that after the file is written the gif picture only displays the top 1/10 of the page (the gif file is messed up), and so i'm wondering if the problem lies within my computer or my code (i'm using vista on a VERY old computer, at least 6 years i think and i'm getting a new one soon). Any input is appreciated.
As @HYRY puts it, you must open the image file in "wb" mode instead of "w". Without the "b", Python opens the file in text mode. On Windows this means that whenever a 0x0a byte is written to the file, the OS writes a 0x0d 0x0a sequence instead, because it translates line endings to the Windows-native form.
In the "wb" mode, there is no translation, and your image file won't be corrupted.
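The effect can be imitated on any platform. The sketch below uses io.BytesIO (the Python 3 counterpart of cStringIO.StringIO for binary data) and a made-up payload in place of a real screenshot. The replace() call only emulates the text-mode translation described above and is not what the OS literally runs.

```python
import io

# Stand-in for GIF data: the header plus two 0x0a bytes inside the "image".
payload = b"GIF89a\x0a\x00\x0a\x00"

# An in-memory binary buffer keeps the bytes exactly as written.
buffer = io.BytesIO()
buffer.write(payload)
data = buffer.getvalue()
buffer.close()

# Emulate Windows text mode: every 0x0a becomes 0x0d 0x0a.
translated = data.replace(b"\n", b"\r\n")
```

Each translated byte shifts everything after it, so a decoder reading the file loses track of the image data partway through, which fits the partially rendered picture in the question.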
