I'm reading some strings from a memory buffer, written by a C program. I need to fetch them using Python and print them. However, when I encounter a string containing %llx, Python does not know how to parse it:
"unsupported format character 'l' (0x6c) at index 14"
I could use replace('%llx','%x'), but then it would not be a long long. Would Python handle this correctly in that case?
then it would not be a long long
Python (essentially) doesn't have any concept of a long long. If you're pulling long longs from C code, just use %x and be done with it -- you're not ever going to get values from the C code that are out of the long long range, the only issue that could arise is if you were trying to send them from Python code into C. Just use (with a new-style format string):
print('{0:x}'.format(your_int))
Tested on both Python v3.3.3 and v2.7.6 :
>>> print('%x' % 523433939134152323423597861958781271347434)
6023bedba8c47434c84785469b1724910ea
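If the buffer holds whole printf-style format strings rather than single values, one option is to drop the C length modifiers before handing the string to Python's % operator. Python ignores a single h, l or L there, but rejects doubled ones such as ll, which is what triggers the error above. The following is only a sketch: strip_c_length_modifiers is a hypothetical helper name, and the regex assumes plain flags/width/precision (no * widths).

```python
import re

# "%%" is matched first so a literal percent sign is left alone.
# Otherwise: flags/width/precision, then a C length modifier, then the conversion.
_SPEC = re.compile(
    r"%%|%([-+ #0]*\d*(?:\.\d+)?)(?:hh|h|ll|l|j|z|t|L)([diouxXeEfFgGcs])"
)

def strip_c_length_modifiers(fmt):
    """Remove C length modifiers (ll, hh, z, ...) that Python's % does not need."""
    def repl(match):
        if match.group(0) == "%%":
            return "%%"
        return "%" + match.group(1) + match.group(2)
    return _SPEC.sub(repl, fmt)

c_format = "addr=%016llx"                     # as it might come from the C side
py_format = strip_c_length_modifiers(c_format)
print(py_format % 0xdeadbeefcafe)
```

One thing to watch: C prints a negative value with %llx as its two's complement bit pattern, while Python prints a minus sign. If the value was read as signed, masking it with 0xFFFFFFFFFFFFFFFF first reproduces the C output.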
I'm using pyserial to communicate with some sensors which use the Modbus protocol. In Python 2.7, this works perfectly:
import serial
c = serial.Serial('port/address') # connect to sensor
msg = "\xFE\x44\x00\x08\x02\x9F\x25" # 7 hex bytes(?)
c.write(msg) # send signal
r = c.read(7) # read 7 hex bytes (?).
In Python 3, this does not work. I know it has something to do with differences in how Python 2 and 3 handle binary vs. unicode strings. I've found numerous other threads suggesting the solution should be to simply prepend a b to my message (msg = b"\xFE\x44\x00\x08\x02\x9F\x25") to specify it as a byte string, but this does not work in my case.
Any insights? What should I be sending in Python 3 so the sensor receives the same signal? I'm at my wit's end...
I should add that I'm totally new to serial connections (well... 1 week old), and (despite reading around quite a bit) I struggle with understanding different character/string formats... Hence question marks in comments above. Please pitch answers appropriately :).
Thanks in advance!
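In Python 3, write expects its argument to be bytes, not str, so the b"\xFE\x44\x00\x08\x02\x9F\x25" literal is the right thing to pass: c.write(b"\xFE\x44\x00\x08\x02\x9F\x25"). Calling .decode() on it first would not help, because 0xFE is not valid UTF-8 and the decode raises UnicodeDecodeError. If the bytes version still fails, check the literal itself for typos.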
Solution
It turned out that specifying the input as a byte string (msg = b"\xFE\x44\x00\x08\x02\x9F\x25") did work. The initial error was from a typo in the msg string...
Secondary errors arose from how the outputs were handled. In Python 2, ord() had to be applied to the indexed output to return integers; in Python 3, integers can be extracted directly from the output by indexing (i.e. no ord() necessary).
Hope this might help someone in the future...
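The indexing difference can be sketched without a sensor attached. The reply bytes below are invented for illustration and stand in for whatever c.read(7) returns; they are not a real sensor response, and combining bytes 3 and 4 into one number is likewise just an example.

```python
# Request from the question, written as a bytes literal so it works on Python 3.
msg = b"\xFE\x44\x00\x08\x02\x9F\x25"

# Stand-in for r = c.read(7); these bytes are made up for illustration.
reply = b"\xFE\x44\x02\x01\x90\x00\x00"

# Python 3: indexing bytes yields an int directly.
# Python 2: indexing yields a 1-character str, so ord(reply[3]) was needed.
# Wrapping the reply in bytearray gives ints on both versions.
data = bytearray(reply)
high, low = data[3], data[4]
value = (high << 8) | low   # combine two bytes, most significant first
print(value)
```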
I'm writing a Python C extension, and I need to convert a string into a Python object, for example the str "(1001,1.0,1)" into the tuple (1001, 1.0, 1).
Right now I'm using the PyRun_StringFlags function to get the object, but it is not fast enough for me. Is there any other way to do this?
I'm pretty new to Python so please bear with me here!
I've taken some code from ActiveState (and then butchered it around a bit) to open a DBF file and then output to CSV.
This worked perfectly well on Python 2.5 but I've now moved it to Python 3.3 and ran into a number of issues, most of which I've resolved.
The final issue I have is that, in order to run the code, I've had to prefix some items with b (because I was getting "TypeError: expected bytes, bytearray or buffer compatible object" errors).
The code now works, and outputs correctly, except that every field is displayed as b'DATAHERE' (where DATAHERE is the actual data of course!)
So... does anyone know how I can stop it from outputting the b character? I can post code if required but it's fairly lengthy so I was hoping someone would be able to spot what I expect to be something simple that I've done wrong!
Thanks!
You are seeing the code output byte values; if you expected unicode strings instead, simply decode:
yourdata.decode('ascii')
where ascii should be replaced by the encoding your data uses.
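As a rough sketch of where the decode step would go before writing CSV: the field values and the 'ascii' encoding below are assumptions for illustration, and the strip() is only there because fixed-width fields are often space-padded.

```python
import csv
import io

# Hypothetical raw field values as they might come out of the DBF reader.
raw_fields = [b"SMITH   ", b"42"]

# Decode each field to str before handing it to the csv module;
# replace "ascii" with whatever encoding the file actually uses.
decoded = [field.decode("ascii").strip() for field in raw_fields]

buffer = io.StringIO()          # stands in for the real output file
csv.writer(buffer).writerow(decoded)
output = buffer.getvalue()
print(repr(output))
```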
I have a file header which I am reading and planning on writing which contains information about the contents; version information, and other string values.
Writing to the file is not too difficult, it seems pretty straightforward:
outfile.write(struct.pack('<s', "myapp-0.0.1"))
However, when I try reading back the header from the file in another method:
header_version = struct.unpack('<s', infile.read(struct.calcsize('s')))
I have the following error thrown:
struct.error: unpack requires a string argument of length 2
How do I fix this error and what exactly is failing?
Writing to the file is not too difficult, it seems pretty straightforward:
Not quite as straightforward as you think. Try looking at what's in the file, or just printing out what you're writing:
>>> struct.pack('<s', 'myapp-0.0.1')
'm'
As the docs explain:
For the 's' format character, the count is interpreted as the size of the string, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. If a count is not given, it defaults to 1.
So, how do you deal with this?
Don't use struct if it's not what you want. The main reason to use struct is to interact with C code that dumps C struct objects directly to/from a buffer/file/socket/whatever, or a binary format spec written in a similar style (e.g. IP headers). It's not meant for general serialization of Python data. As Jon Clements points out in a comment, if all you want to store is a string, just write the string as-is. If you want to store something more complex, consider the json module; if you want something even more flexible and powerful, use pickle.
Use fixed-length strings. If part of your file format spec is that the name must always be 255 characters or less, just write '<255s'. Shorter strings will be padded, longer strings will be truncated (you might want to throw in a check for that to raise an exception instead of silently truncating).
Use some in-band or out-of-band means of passing along the length. The most common is a length prefix. (You may be able to use the 'p' format, a Pascal-style string with a one-byte length prefix, to help, but it really depends on the C layout/binary format you're trying to match; often you have to do something ugly like struct.pack('<h{}s'.format(len(name)), len(name), name).)
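A minimal sketch of the length-prefix option, written for Python 3. The two-byte little-endian prefix, the UTF-8 encoding and the helper names pack_string/unpack_string are assumptions for illustration, not part of any existing format.

```python
import struct

def pack_string(text):
    """Encode text and prefix it with its byte length as an unsigned 16-bit int."""
    data = text.encode("utf-8")
    return struct.pack("<H{}s".format(len(data)), len(data), data)

def unpack_string(buf, offset=0):
    """Read one length-prefixed string; return (text, offset of the next field)."""
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + struct.calcsize("<H")
    (data,) = struct.unpack_from("<{}s".format(length), buf, start)
    return data.decode("utf-8"), start + length

# A bare 's' means a 1-byte string, which is why only the first byte survives.
truncated = struct.pack("<s", b"myapp-0.0.1")

packed = pack_string("myapp-0.0.1")
version, next_offset = unpack_string(packed)
print(truncated, version, next_offset)
```

Returning the next offset makes it easy to read several such fields back to back from one buffer.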
As for why your code is failing, there are multiple reasons. First, read(11) isn't guaranteed to read 11 characters. If there's only 1 character in the file, that's all you'll get. Second, you're not actually calling read(11), you're calling read(1), because struct.calcsize('s') returns 1 (for reasons which should be obvious from the above). Third, either your code isn't exactly what you've shown above, or infile's file pointer isn't at the right place, because that code as written will successfully read in the string 'm' and unpack it as 'm'. (I'm assuming Python 2.x here; 3.x will have more problems, but you wouldn't have even gotten that far.)
For your specific use case ("file header… which contains information about the contents; version information, and other string values"), I'd just write the strings with newline terminators. (If the strings can have embedded newlines, you could backslash-escape them into \n, use C-style or RFC822-style continuations, quote them, etc.)
This has a number of advantages. For one thing, it makes the format trivially human-readable (and human-editable/-debuggable). And, while sometimes that comes with a space tradeoff, a single-character terminator is at least as efficient, possibly more so, than a length-prefix format would be. And, last but certainly not least, it means the code is dead-simple for both generating and parsing headers.
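To show how little code the newline-terminated approach needs, here is a sketch. write_header and read_header are hypothetical names, the field values are made up, and the code assumes no field contains an embedded newline.

```python
import io

def write_header(stream, fields):
    """Write each header field on its own line."""
    for value in fields:
        stream.write(str(value) + "\n")

def read_header(stream, count):
    """Read back `count` header fields as strings, without their terminators."""
    return [stream.readline().rstrip("\n") for _ in range(count)]

stream = io.StringIO()                       # stands in for the real file
write_header(stream, ["myapp-0.0.1", 3, "some title"])
stream.seek(0)
fields = read_header(stream, 3)
print(fields)
```

Everything comes back as a string, so numeric fields need an explicit int() on the reading side.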
In a later comment you clarify that you also want to write ints, but that doesn't change anything. An 'i' int value will take 4 bytes, but most apps write a lot of small numbers, which only take 1-2 bytes (+1 for a terminator/separator) if you write them as strings. And if you're not writing small numbers, a Python int can easily be too large to fit in a C int, in which case struct cannot represent it: on current Python versions struct.pack rejects the value with struct.error instead of writing it.
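The size comparison and the range limit are easy to check directly. This is a small sketch run against Python 3; the values 7 and 2 ** 40 are arbitrary examples.

```python
import struct

small = 7
as_binary = struct.pack("<i", small)             # fixed 4 bytes for an 'i'
as_text = (str(small) + "\n").encode("ascii")    # 1 digit + 1 terminator

too_big = 2 ** 40                                # does not fit in a 32-bit int
try:
    struct.pack("<i", too_big)
    rejected = False
except struct.error:
    # On Python 3, out-of-range values raise instead of being stored.
    rejected = True

print(len(as_binary), len(as_text), rejected)
```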
I'm trying to write a Python C extension that processes byte strings, and I have something basically working for Python 2.x and Python 3.x.
For the Python 2.x code, near the start of my function, I currently have a line:
if (!PyArg_ParseTuple(args, "s#:in_bytes", &src_ptr, &src_len))
...
I notice that the s# format specifier accepts both Unicode strings and byte strings. I really just want it to accept byte strings and reject Unicode. For Python 2.x, this might be "good enough"--the standard hashlib seems to do the same, accepting Unicode as well as byte strings. However, Python 3.x is meant to clean up the Unicode/byte string mess and not let the two be interchangeable.
So, I'm surprised to find that in Python 3.x, the s format specifiers for PyArg_ParseTuple() still seem to accept Unicode and provide a "default encoded string version" of the Unicode. This seems to go against the principles of Python 3.x, making the s format specifiers unusable in practice. Is my analysis correct, or am I missing something?
Looking at the implementation for hashlib for Python 3.x (e.g. see md5module.c, function MD5_update() and its use of GET_BUFFER_VIEW_OR_ERROUT() macro) I see that it avoids the s format specifiers, and just takes a generic object (O specifier) and then does various explicit type checks using the GET_BUFFER_VIEW_OR_ERROUT() macro. Is this what we have to do?
I agree with you -- it's one of several spots where the C API migration of Python 3 was clearly not designed as carefully and thoroughly as the Python coder-visible parts. I do also agree that probably the best workaround for now is focusing on "buffer views", per that macro -- until and unless something better gets designed into a future Python C API (don't hold your breath waiting for that to happen, though ;-).
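The effect of a buffer-view check can be illustrated from the Python side, since memoryview goes through the same buffer protocol. This is only a Python 3 analogue of the idea, not the C macro itself, and as_byte_view is a hypothetical helper name.

```python
def as_byte_view(obj):
    """Return a memoryview of obj; raises TypeError for objects without a buffer, such as str."""
    return memoryview(obj)

accepted = [as_byte_view(b"abc").tobytes(), as_byte_view(bytearray(b"abc")).tobytes()]

try:
    as_byte_view("abc")        # a Python 3 str does not expose a buffer
    str_rejected = False
except TypeError:
    str_rejected = True

print(accepted, str_rejected)
```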