Python: Output prefixed with b - python

I'm pretty new to Python so please bear with me here!
I've taken some code from ActiveState (and then butchered it around a bit) to open a DBF file and then output to CSV.
This worked perfectly well on Python 2.5 but I've now moved it to Python 3.3 and ran into a number of issues, most of which I've resolved.
The final issue I have is that in order to run the code, I've had to prefix some items with b (because I was getting TypeError: expected bytes, bytearray or buffer compatible object errors)
The code now works, and outputs correctly, except that every field is displayed as b'DATAHERE' (where DATAHERE is the actual data of course!)
So... does anyone know how I can stop it from outputting the b character? I can post code if required but it's fairly lengthy so I was hoping someone would be able to spot what I expect to be something simple that I've done wrong!
Thanks!

You are seeing the code output byte values; if you expected unicode strings instead, simply decode:
yourdata.decode('ascii')
where ascii should be replaced by the encoding your data uses.

Related

file.read(integer) python read characters

As part of a task I must write a program that copies a number with a number of unknown digits from a file. Python language.
I have created a function which retains into int several digits containing the number.
The function to read from a file:
file.read (1) This way I managed to read one character. I tried to replace the number in my variable file.read (myint) - but the reading does not work. The program ran all the way, but nothing was called.
I tried before to convert myint to int, float, long but to no avail.
Can anyone help?
Thanks .
I do not know where the discussion fits, I'd be happy to guide you.
sorry for my english..
my code part 1
my code part 2

pyserial Python2/3 string headache

I'm using pyserial to communicate with some sensors which use the Modbus protocol. In Python 2.7, this works perfectly:
import serial
c = serial.Serial('port/address') # connect to sensor
msg = "\xFE\x44\x00\x08\x02\x9F\x25" # 7 hex bytes(?)
c.write(msg) # send signal
r = c.read(7) # read 7 hex bytes (?).
In Python 3, this does not work. I know it's something to do with differences in how Python 2/3 handle binary vs. unicode strings. I've found numerous other threads suggesting the solution should be to simply prepend a b on my message (msg=b""\xFE\x44\x00\x08\x02\x9F\x25") to specify it as a binary string but this does not work for my case.
Any insights? What should I be sending in Python 3 so the sensor recieves the same signal? I'm at my wit's end...
I should add that I'm totally new to serial connections (well... 1 week old), and (despite reading around quite a bit) I struggle with understanding different character/string formats... Hence question marks in comments above. Please pitch answers appropriately :).
Thanks in advance!
write expects argument to be str, not bytes, so passing b"\xFE\x44\x00\x08\x02\x9F\x25" directly to it won't work. You need to convert bytes to str first: c.write(b"\xFE\x44\x00\x08\x02\x9F\x25".decode()) should work.
Solution
It turned out that specifying the input as a byte string (msg=b""\xFE\x44\x00\x08\x02\x9F\x25") did work. Initial error was from a typo in the msg string...
Secondary errors arose from how the outputs were handled - in Python 2 ord() had to be applied to the indexed output to return integers, in Python 3 integers can be extracted directly from the output by indexing (i.e. no ord() necessary).
Hope this might help someone in the future...

How do I make Python 3.4 c_char_array read strings as two bytes?

I'm using pypyodbc with Python 3.4 on Ubuntu 12.04.
I'm trying to get the column names, but something is a little wonky. What is coming back is just the first character as a byte, like this:
(Pdb) Cname.value
b'T'
The thing behind the scenes is a ctypes char array:
(Pdb) Cname
<ctypes.c_char_Array_1024 object at 0xb6a1ad1c>
But if I look at the raw value:
(Pdb) Cname.raw
b'T\x00Y\x00P\x00E\x00_\x00N\x00A\x00M\x00E\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
you can see that the value TYPE_NAME is separated by \x00.
So it appears to me that what's happening is something (ctypes?) is reading that first \x00 as the null terminator for the string instead of part of the characters.
What can I do to modify the way ctypes is being used so that it will read the entire string? Everything else seems to work fine, it's just the descriptions that are wonky.
Your string is encoded in UTF-16LE. You want to call something like Cname.raw.decode('utf_16le').rstrip('\x00'). That will return a Python string, which you can then do with as you please.

Working with unicode encoded Strings from Active Directory via python-ldap

I already came up with this problem, but after some testing I decided to create a new question with some more specific Infos:
I am reading user accounts with python-ldap (and Python 2.7) from our Active Directory. This does work well, but I have problems with special chars. They do look like UTF-8 encoded strings when printed on the console. The goal is to write them into a MySQL DB, but I don't get those strings into proper UTF-8 from the beginning.
Example (fullentries is my array with all the AD entries):
fullentries[23][1].decode('utf-8', 'ignore')
print fullentries[23][1].encode('utf-8', 'ignore')
print fullentries[23][1].encode('latin1', 'ignore')
print repr(fullentries[23][1])
A second test with a string inserted by hand as follows:
testentry = "M\xc3\xbcller"
testentry.decode('utf-8', 'ignore')
print testentry.encode('utf-8', 'ignore')
print testentry.encode('latin1', 'ignore')
print repr(testentry)
The output of the first example ist:
M\xc3\xbcller
M\xc3\xbcller
u'M\\xc3\\xbcller'
Edit: If I try to replace the double backslashes with .replace('\\\\','\\) the output remains the same.
The output of the second example:
Müller
M�ller
'M\xc3\xbcller'
Is there any way to get the AD output properly encoded? I already read a lot of documentation, but it all states that LDAPv3 gives you strictly UTF-8 encoded strings. Active Directory uses LDAPv3.
My older question this topic is here: Writing UTF-8 String to MySQL with Python
Edit: Added repr(s) infos
First, know that printing to a Windows console is often the step that garbles data, so for your tests, you should print repr(s) to see the precise bytes you have in your string.
You need to find out how the data from AD is encoded. Again, print repr(s) will let you see the content of the data.
UPDATED:
OK, it looks like you're getting strange strings somehow. There might be a way to get them better, but you can adapt in any case, though it isn't pretty:
u.decode('unicode_escape').encode('iso8859-1').decode('utf8')
You might want to look into whether you can get the data in a more natural format.

How do I handle Python unicode strings with null-bytes the 'right' way?

Question
It seems that PyWin32 is comfortable with giving null-terminated unicode strings as return values. I would like to deal with these strings the 'right' way.
Let's say I'm getting a string like: u'C:\\Users\\Guest\\MyFile.asy\x00\x00sy'. This appears to be a C-style null-terminated string hanging out in a Python unicode object. I want to trim this bad boy down to a regular ol' string of characters that I could, for example, display in a window title bar.
Is trimming the string off at the first null byte the right way to deal with it?
I didn't expect to get a return value like this, so I wonder if I'm missing something important about how Python, Win32, and unicode play together... or if this is just a PyWin32 bug.
Background
I'm using the Win32 file chooser function GetOpenFileNameW from the PyWin32 package. According to the documentation, this function returns a tuple containing the full filename path as a Python unicode object.
When I open the dialog with an existing path and filename set, I get a strange return value.
For example I had the default set to: C:\\Users\\Guest\\MyFileIsReallyReallyReallyAwesome.asy
In the dialog I changed the name to MyFile.asy and clicked save.
The full path part of the return value was: u'C:\Users\Guest\MyFile.asy\x00wesome.asy'`
I expected it to be: u'C:\\Users\\Guest\\MyFile.asy'
The function is returning a recycled buffer without trimming off the terminating bytes. Needless to say, the rest of my code wasn't set up for handling a C-style null-terminated string.
Demo Code
The following code demonstrates null-terminated string in return value from GetSaveFileNameW.
Directions: In the dialog change the filename to 'MyFile.asy' then click Save. Observe what is printed to the console. The output I get is u'C:\\Users\\Guest\\MyFile.asy\x00wesome.asy'.
import win32gui, win32con
if __name__ == "__main__":
initial_dir = 'C:\\Users\\Guest'
initial_file = 'MyFileIsReallyReallyReallyAwesome.asy'
filter_string = 'All Files\0*.*\0'
(filename, customfilter, flags) = \
win32gui.GetSaveFileNameW(InitialDir=initial_dir,
Flags=win32con.OFN_EXPLORER, File=initial_file,
DefExt='txt', Title="Save As", Filter=filter_string,
FilterIndex=0)
print repr(filename)
Note: If you don't shorten the filename enough (for example, if you try MyFileIsReally.asy) the string will be complete without a null byte.
Environment
Windows 7 Professional 64-bit (no service pack), Python 2.7.1, PyWin32 Build 216
UPDATE: PyWin32 Tracker Artifact
Based on the comments and answers I have received so far, this is likely a pywin32 bug so I filed a tracker artifact.
UPDATE 2: Fixed!
Mark Hammond reported in the tracker artifact that this is indeed a bug. A fix was checked in to rev f3fdaae5e93d, so hopefully that will make the next release.
I think Aleksi Torhamo's answer below is the best solution for versions of PyWin32 before the fix.
I'd say it's a bug. The right way to deal with it would probably be fixing pywin32, but in case you aren't feeling adventurous enough, just trim it.
You can get everything before the first '\x00' with filename.split('\x00', 1)[0].
This doesn't happen on the version of PyWin32/Windows/Python I tested; I don't get any nulls in the returned string even if it's very short. You might investigate if a newer version of one of the above fixes the bug.
ISTR that I had this issue some years ago, then I discovered that such Win32 filename-dialog-related functions return a sequence of 'filename1\0filename2\0...filenameN\0\0', while including possible garbage characters depending on the buffer that Windows allocated.
Now, you might prefer a list instead of the raw return value, but that would be a RFE, not a bug.
PS When I had this issue, I quite understood why one would expect GetOpenFileName to possibly return a list of filenames, while I couldn't imagine why GetSaveFileName would. Perhaps this is considered as API uniformity. Who am I to know, anyway?

Categories

Resources