Adding values to binary string - python

I would like to add hexvalues to a binary string so that I end up with a binary string that can be transmitted.
What I want is:
StringToAppend = "5ce7e615ff0000000000010202041f0140009e005d006404084c5ce82215ff1d02000000010202041f013b0097005c005e04777c" (I have this in unhexlified form) and I want to append it to a string such as StatusStr = chr(0).
How do I do this? This is what I have:
>>> not_macs_buffer = unhexlify("5ce7e615ff0000000000010202041f0140009e005d006404084c5ce82215ff1d02000000010202041f013b0097005c005e04777c")
>>> StatusStr = chr(0)
>>> for i in xrange(0, len(not_macs_buffer)):
...     StatusStr += chr(not_macs_buffer[i])
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: an integer is required
>>>

What are you transmitting the string to/from? Does it have to be in hex?
The issue seems to be that you are converting your hex string into a binary string, and then in your loop you are passing each one-character string to chr(). This fails because chr() only accepts an integer in the range 0-255, not a string.
To fix your problem, just change StatusStr +=chr(not_macs_buffer[i]) to this:
StatusStr += not_macs_buffer[i]
Of course, you could forgo the loop completely.
StatusStr = chr(0) + not_macs_buffer
And if you really did need to convert a list of integers to a string, you could use a list comprehension and then join the list.
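For completeness, here is a minimal sketch of both ideas. It assumes Python 3, where byte strings are a separate bytes type (the question uses Python 2, where str plays that role), and it uses a shortened sample value rather than the full hex string.

```python
from binascii import unhexlify

# Shortened sample value, used only for illustration.
payload = unhexlify("5ce7e615ff")

# Prepend a single null byte without any loop.
status = b"\x00" + payload

# Turning a list of integers (each 0-255) into a byte string.
# In Python 2 the equivalent would be "".join(chr(v) for v in byte_values).
byte_values = [0x5C, 0xE7, 0xE6]
from_ints = bytes(byte_values)

print(status)
print(from_ints)
```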
EDIT:
If you want to add the null value to your original hex string, you can do this:
StringToAppend = '5ce7e6' # ... snip the real value
StatusStr = hexlify(chr(0)) + StringToAppend
# or
StatusStr = hexlify('\x00') + StringToAppend
# or
StatusStr = '00' + StringToAppend

Thanks, all. What I actually ended up doing to get what I wanted is:
>>> not_macs_buffer = unhexlify("5ce7e615ff0000000000010202041f0140009e005d006404084c5ce82215ff1d02000000010202041f013b0097005c005e04777c")
>>> StatusStr = chr(0)
>>> for i in xrange(0, len(not_macs_buffer)):
...     StatusStr += chr(ord(not_macs_buffer[i]))

Related

Python equivalent of Ruby's Array#pack, how to pack unknown string length and bytes together

I am working my way through the book "Building Git", which walks through building Git in Ruby. I decided to write it in Python while still following along in the book.
The author uses Ruby's Array#pack to pack a Git tree object. Git uses a binary representation of the 40-character blob hash to reduce it to 20 bytes. In the author's words:
Putting everything together, this generates a string for each entry consisting of the mode 100644,
a space, the filename, a null byte, and then twenty bytes for the object ID. Ruby’s Array#pack
supports many more data encodings and is very useful for generating binary representations of
values. If you wanted to, you could implement all the maths for reading pairs of digits from
the object ID and turning each pair into a single byte, but Array#pack is so convenient that I
usually reach for that first.
He uses the following code to implement this:
def to_s
  entries = @entries.sort_by(&:name).map do |entry|
    ["#{ MODE } #{ entry.name }", entry.oid].pack(ENTRY_FORMAT)
  end
with ENTRY_FORMAT = "Z*H40" and MODE = "100644".
entry is an instance of a class with :name and :oid attributes, representing a file's name and its SHA-1 hash.
And the format "Z*H40" means the following:
Our usage here consists of two separate encoding instructions:
Z*: this encodes the first string, "#{ MODE } #{ entry.name }", as an arbitrary-length null-
padded string, that is, it represents the string as-is with a null byte appended to the end
H40: this encodes a string of forty hexadecimal digits, entry.oid, by packing each pair of
digits into a single byte as we saw in Section 2.3.3, “Trees on disk”
I have tried for many hours to replicate this in Python using struct.pack and various other methods, but either I am not getting the format correct or I am missing something very obvious. In any case, this is what I currently have:
def to_s(self):
    entries = sorted(self.entries, key=lambda x: x.name)
    entries = [f"{self.MODE} {entry.name}" + entry.oid.encode() for entry in entries]
    packed_entries = b"".join(pack("!Z*40s", entry) for entry in entries)
    return packed_entries
but obviously this will give a concat error from bytes() to str().
Traceback (most recent call last):
  File "jit.py", line 67, in <module>
    database.store(tree)
  File "/home/maslin/jit/pyJit/database.py", line 12, in store
    string = obj.to_s()
  File "/home/maslin/jit/pyJit/tree.py", line 40, in to_s
    entries = [f"{self.MODE} {entry.name}" + entry.oid.encode() for entry in entries]
  File "/home/maslin/jit/pyJit/tree.py", line 40, in <listcomp>
    entries = [f"{self.MODE} {entry.name}" + entry.oid.encode() for entry in entries]
TypeError: can only concatenate str (not "bytes") to str
So then I tried to keep everything as a string, and tried using struct.pack to format it for me, but it gave me a struct.error: bad char in struct format error.
def to_s(self):
    entries = sorted(self.entries, key=lambda x: x.name)
    entries = [f"{self.MODE} {entry.name}" + entry.oid for entry in entries]
    packed_entries = b"".join(pack("!Z*40s", entry) for entry in entries)
    return packed_entries
And the traceback:
Traceback (most recent call last):
  File "jit.py", line 67, in <module>
    database.store(tree)
  File "/home/maslin/jit/pyJit/database.py", line 12, in store
    string = obj.to_s()
  File "/home/maslin/jit/pyJit/tree.py", line 41, in to_s
    packed_entries = b"".join(pack("!Z*40s", entry) for entry in entries)
  File "/home/maslin/jit/pyJit/tree.py", line 41, in <genexpr>
    packed_entries = b"".join(pack("!Z*40s", entry) for entry in entries)
struct.error: bad char in struct format
How can I pack a string for each entry consisting of the mode 100644,
a space, the filename, a null byte, and then twenty bytes for the object ID?
The author notes above that this can be done by "implementing all the maths for reading pairs of digits from
the object ID and turning each pair into a single byte", so if your solution involves this method, that is also ok.
P.S. this question did not help me nor did this.
P.P.S. ChatGPT was no help as well
So, I had to look this up. The binary format is simple,
the mode as an ascii byte string,
an ascii space
the filename as a byte string,
a null byte
the sha digest in binary format.
So,
mode = b"100644"
Note, mode is a bytes object. You should probably just keep it as a bytes object, but if it is a string, you can just .encode() it; UTF-8 works fine since the value is entirely in the ASCII range.
Now, your filename is probably a string, e.g.:
filename = "foo.py"
Now, you didn't say exactly, but I presume your oid is the SHA-1 hexdigest, i.e. a length-40 string of the digest in hexadecimal. However, you should probably just work with the raw digest. Assuming you hashed the content like this:
>>> import hashlib
>>> sha = hashlib.sha1(b"print('hello, world')")
>>> sha.hexdigest()
'da8b53bb595a2bd0161f6470a4c3a82f6aa1dc9e'
>>> sha.digest()
b'\xda\x8bS\xbbYZ+\xd0\x16\x1fdp\xa4\xc3\xa8/j\xa1\xdc\x9e'
You want just the .digest() directly. You should probably keep the hash object around and get whatever you need from it, or you can convert back and forth. If you have the hexdigest, you can get the binary form using:
>>> oid = sha.hexdigest()
>>> oid
'da8b53bb595a2bd0161f6470a4c3a82f6aa1dc9e'
>>> int(oid, 16).to_bytes(20, "big")  # byteorder is required before Python 3.11
b'\xda\x8bS\xbbYZ+\xd0\x16\x1fdp\xa4\xc3\xa8/j\xa1\xdc\x9e'
But really, if you are only going to keep one form around, I'd keep the binary one. It seems more natural to me to go from the bytes to an int and then format that in hex:
>>> oid = sha.digest()
>>> oid
b'\xda\x8bS\xbbYZ+\xd0\x16\x1fdp\xa4\xc3\xa8/j\xa1\xdc\x9e'
>>> int.from_bytes(oid, "big")
1247667085693497210187506196029418989550863244446
>>> f"{int.from_bytes(oid, 'big'):040x}"  # pad to 40 digits so leading zeros survive
'da8b53bb595a2bd0161f6470a4c3a82f6aa1dc9e'
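As an aside, the round trip through int can be skipped entirely. The sketch below uses bytes.fromhex() and bytes.hex(), which convert directly between the two forms and keep leading zero bytes; the two-byte value at the end is a made-up sample that shows what an unpadded integer format would lose.

```python
import hashlib

sha = hashlib.sha1(b"print('hello, world')")
raw_oid = sha.digest()        # 20 raw bytes
hex_oid = sha.hexdigest()     # 40 hexadecimal characters

# Hex string -> raw bytes, and raw bytes -> hex string.
assert bytes.fromhex(hex_oid) == raw_oid
assert raw_oid.hex() == hex_oid

# Made-up sample with a leading zero byte.
sample = b"\x00\xff"
unpadded = f"{int.from_bytes(sample, 'big'):x}"   # the leading zeros are dropped
direct = sample.hex()                             # the leading zeros are kept
print(unpadded, direct)
```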
So, I'm going to assume you have:
>>> import hashlib
>>> mode = b"100644"
>>> filename = "foo.py"
>>> sha = hashlib.sha1(b"print('hello, world')")
>>> oid = sha.digest()
Now, there is no f-string-like interpolation for bytes-literals, but you can use the old-school % based formatting:
>>> entry = b"%s %s\x00%s" % (mode, filename.encode(), oid)
>>> entry
b'100644 foo.py\x00\xda\x8bS\xbbYZ+\xd0\x16\x1fdp\xa4\xc3\xa8/j\xa1\xdc\x9e'
Or since this is so simple, just concatenation:
>>> entry = mode + b" " + filename.encode() + b"\x00" + oid
>>> entry
b'100644 foo.py\x00\xda\x8bS\xbbYZ+\xd0\x16\x1fdp\xa4\xc3\xa8/j\xa1\xdc\x9e'
Now, you could use struct.pack here, but it's a bit unwieldy. There's no good way to add a space except as a single character. Also, you'd have to build the format string dynamically, since struct has no format for an arbitrary-length, null-terminated byte string. But you can use an f-string and len(filename.encode()) + 1, because the s format pads with null bytes when the value is shorter than the field. So it would need to be something like:
>>> import struct
>>> struct.pack(f">6sc{len(filename.encode())+1}s20s", mode, b" ", filename.encode(), oid)
b'100644 foo.py\x00\xda\x8bS\xbbYZ+\xd0\x16\x1fdp\xa4\xc3\xa8/j\xa1\xdc\x9e'
>>> struct.pack(f">6sc{len(filename.encode())+1}s20s", mode, b" ", filename.encode(), oid) == entry
True
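Putting the pieces together, here is a minimal sketch of how the whole tree body might be packed. The Entry class, the pack_entry and pack_tree names, and the two sample object IDs are assumptions made for illustration; they are not taken from the book. It assumes each oid is stored as a 40-character hex string, as in the question.

```python
from dataclasses import dataclass

MODE = b"100644"


@dataclass
class Entry:
    """Hypothetical stand-in for the question's entry objects."""
    name: str
    oid: str  # 40 hexadecimal characters


def pack_entry(entry: Entry) -> bytes:
    # mode, space, filename, null byte, then the 20 raw bytes of the object ID
    return MODE + b" " + entry.name.encode() + b"\x00" + bytes.fromhex(entry.oid)


def pack_tree(entries) -> bytes:
    # Entries are sorted by name, mirroring sort_by(&:name) in the Ruby code.
    ordered = sorted(entries, key=lambda e: e.name)
    return b"".join(pack_entry(e) for e in ordered)


# Sample entries; the object IDs are placeholder values.
entries = [
    Entry("world.txt", "cc628ccd10742baea8241c5924df992b5c019f71"),
    Entry("hello.txt", "ce013625030ba8dba906f756967f9e9ca394464a"),
]
packed = pack_tree(entries)
print(packed)
```

bytes.fromhex() does the job of Ruby's H40 directive here, and appending b"\x00" does the job of Z*.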

invalid literal for int() with base 10: '328.94'(while converting bytes to int())

This is my code:
import serial

print('Arduino is setting up')
# Setting up the Arduino board
arduinoSerialData = serial.Serial('com4', 9600)
while True:
    if arduinoSerialData.inWaiting() > 1:
        myData = arduinoSerialData.readline()
        myData = str(myData)
        myData = myData.replace("b'", '')
        myData = myData.replace("\\r\\n'", '')
        myData1=myData
        if myData1.find("a"):
            myData1= myData1.replace("a",str(0))
        if int(myData1)<100:
            print(myData)
This code reads the data from the ultrasonic sensor that is attached to the Arduino board and prints it. myData is initially bytes, so I convert it to a string, but I cannot seem to convert it to an int. When I run the above code, I get the error shown in the title. Does anyone know how to troubleshoot this? Thanks!
It seems that your bytes-to-string conversion is not correct. Why not try this:
1. Bytes-to-string conversion:
myData = myData.decode("utf-8")
2. Eliminating trailing newline characters:
myData = myData.strip("\r\n")
Make sure that the resulting string contains only numeric characters before converting it to int. You can do this check:
if myData1.isdigit() and int(myData1) < 100:
    <your code>
If your string contains a floating-point number, you can do this instead:
if myData1.replace(".", "", 1).isdigit() and int(float(myData1)) < 100:
If you give a string to int(), it needs to be an integer. If you instead have a non-integer, you can convert it with float() first, then use int() to turn that floating point value into an integer, as per the following transcript:
>>> print(int("328.94")) # Will not work.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '328.94'
>>> print(float("328.94")) # Convert string to float.
328.94
>>> print(int(float("328.94"))) # Convert string to float to int.
328
>>> print(int(float("328.94") + 0.5)) # Same but rounded.
329
That last one is an option if you want it rounded to the nearest integer, rather than truncated.
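The decode, strip, and float steps described above can be combined into one small helper. This is only a sketch: parse_reading is a hypothetical name, and it assumes the Arduino sends one ASCII number per line. It takes the raw bytes as an argument, so it can be tried without a serial port.

```python
from typing import Optional


def parse_reading(raw: bytes) -> Optional[float]:
    """Turn one raw serial line such as b'328.94\\r\\n' into a float, or None."""
    text = raw.decode("ascii", errors="replace").strip()
    try:
        return float(text)
    except ValueError:
        # Not a number (for example a partial or garbled line).
        return None


reading = parse_reading(b"328.94\r\n")
if reading is not None and reading < 100:
    print(reading)
```

In the question's loop, the value returned by readline() would be passed to this helper in place of the str() and replace() calls.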

How can I convert hexadecimal file data into ASCII?

I am writing a program with a Python GUI. When the program runs, it asks the user to open a file (TASK.txt, which contains hexadecimal values) in read mode.
I store the data from one line in a variable.
How can I convert that data into ASCII? I am new to Python. This is my code:
import binascii
import base64
from tkinter import *
from tkinter.filedialog import askopenfilename

def callback():
    with open(askopenfilename(),'r') as r:
        next(r)
        for x in r:
            z = str(x[1:-2])
            if len(z) % 2:
                z = '0' + 'x' + z
            print(binascii.unhexlify(z))

a = Button(text='select file', command=callback)
a.pack()
mainloop()
This is the error I am getting:
Exception in Tkinter callback
Traceback (most recent call last):
  File "D:\python sw\lib\tkinter\__init__.py", line 1699, in __call__
    return self.func(*args)
  File "C:\Users\LENOVO\Downloads\hex2.py", line 16, in callback
    print(binascii.unhexlify(z))
binascii.Error: Non-hexadecimal digit found
Just reread your question correctly, new answer:
Do not prefix with 0x since it does not work with unhexlify and won't even make the string-length even.
You need an even string length, since each pair of hex-digits represent one byte (being one character)
unhexlify returns a byte array, which can be decoded to a string using .decode()
As pointed out here you don't even need the import binascii and can convert hex-to-string with bytearray.fromhex("7061756c").decode()
list(map(lambda hx: bytearray.fromhex(hx).decode(),"H7061756c H7061756c61".replace("H","").split(" ")))
Returns ['paul', 'paula']
What I wrote before I thoroughly read your question
may still be of use
As PM 2Ring noted, unhexlify only works without prefixes like 0x.
Your hex strings are separated by spaces and are prefixed with H, which must be removed. You already did this, but I think this can be done in a nicer way:
import binascii
r = "H247314748F8 HA010001FD" # one line in your file
z_array = r.replace("H","").split(" ")
# this returns ['247314748F8','A010001FD']
# now we can apply unhexlify to all those strings:
unhexed = map(binascii.unhexlify, z_array)
# and print it.
print(list(unhexed))
This will throw binascii.Error: Odd-length string, because both sample values have an odd number of digits (11 and 9). Make sure you really want to unhexlify your data. As stated in the docs, you'll need an even number of hexadecimal characters, each pair representing a byte.
If you want to convert the hexadecimal numbers to decimal integers numbers instead, try this one:
list(map(lambda hx: int(hx,16),"H247314748F8 HA010001FD".replace("H","").split(" ")))
int(string, base) parses the string as a number written in the given base (16 for hexadecimal) and returns an ordinary integer, which prints in decimal.
** Off topic **
if len(z) % 2:
    z = '0' + 'x' + z
This leaves z with an odd length, since you added an even number of characters.
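A minimal sketch that ties these points together: strip the H prefix, pad odd-length values, and convert each token to bytes. hex_tokens_to_bytes is a hypothetical helper, and padding with a single leading "0" is an assumption about what an odd-length value is supposed to mean; check that against how the file was produced.

```python
def hex_tokens_to_bytes(line, marker="H"):
    """Convert a line like 'H7061756c H7061756c61' into a list of bytes objects."""
    chunks = []
    for token in line.split():
        # Drop the leading marker if present.
        digits = token[len(marker):] if token.startswith(marker) else token
        # Assumption: an odd number of digits means a leading zero was omitted.
        if len(digits) % 2:
            digits = "0" + digits
        chunks.append(bytes.fromhex(digits))
    return chunks


decoded = hex_tokens_to_bytes("H7061756c H7061756c61")
print([chunk.decode("ascii") for chunk in decoded])
```

The decode("ascii") step only works when the bytes really are ASCII text; for arbitrary binary values, keep the bytes objects as they are.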

Ignore newline character in binary file with Python?

I open my file like so :
f = open("filename.ext", "rb") # ensure binary reading with b
My first line of data looks like this (when using f.readline()):
'\x04\x00\x00\x00\x12\x00\x00\x00\x04\x00\x00\x00\xb4\x00\x00\x00\x01\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x18\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00:\x00\x00\x00;\x00\x00\x00<\x00\x00\x007\x00\x00\x008\x00\x00\x009\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00\t\x00\x00\x00\n'
Thing is, I want to read this data in fixed-size chunks (f.read(4)). While debugging, I realized that when it gets to the end of the first line, it still takes in the newline character \n, which is used as the first byte of the next int I read. I don't want to simply use .splitlines() because some data could have a \n inside and I don't want to corrupt it. I'm using Python 2.7.10, by the way. I also read that opening a binary file with the b parameter "takes care" of the newline/end-of-line characters; why is that not the case for me?
This is what happens in the console as the file's position is right before the newline character:
>>> d = f.read(4)
>>> d
'\n\x00\x00\x00'
>>> s = struct.unpack("i", d)
>>> s
(10,)
(Followed from discussion with OP in chat)
Seems like the file is in binary format and the newlines are just mis-interpreted values. This can happen when writing 10 to the file for example.
This doesn't mean that newline was intended, and it is probably not. You can just ignore it being printed as \n and just use it as data.
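To illustrate, here is a small sketch that builds three integers in memory and reads them back four bytes at a time. The "<i" format (little-endian 32-bit signed int) is an assumption chosen to match the sample data in the question, and io.BytesIO stands in for the real file.

```python
import io
import struct

# Three 32-bit integers; the value 10 is stored as b'\n\x00\x00\x00'.
data = struct.pack("<3i", 4, 10, 18)
print(b"\n" in data)  # the "newline" is simply the low byte of 10

f = io.BytesIO(data)  # stand-in for open("filename.ext", "rb")
values = []
while True:
    chunk = f.read(4)
    if len(chunk) < 4:
        break  # end of data (or a truncated final record)
    values.append(struct.unpack("<i", chunk)[0])

print(values)
```

Reading with f.read(4) throughout, and never with readline(), keeps every record aligned regardless of which byte values appear in the data.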
You should just be able to replace the bytes that indicate it is a newline.
>>> d = f.read(4).replace(b'\x0d\x0a', b'') #\r\n should be bytes b'\x0d\x0a'
>>> diff = 4 - len(d)
>>> while diff > 0: # You can probably make this more sophisticated
...     d += f.read(diff).replace(b'\x0d\x0a', b'') #\r\n should be bytes b'\x0d\x0a'
...     diff = 4 - len(d)
>>>
>>> s = struct.unpack("i", d)
This should give you an idea of how it will work. This approach could mess with your data's byte alignment.
If you really are seeing "\n" in your print of d then try .replace(b"\n", b"")

Python3 print in hex representation

I can find lots of threads that tell me how to convert values to and from hex. I do not want to convert anything. Rather, I want to print the bytes I already have in hex representation, e.g.
byteval = '\x60'.encode('ASCII')
print(byteval) # b'\x60'
Instead when I do this I get:
byteval = '\x60'.encode('ASCII')
print(byteval) # b'`'
Because ` is the ASCII character that my byte corresponds to.
To clarify: type(byteval) is bytes, not string.
>>> print("b'" + ''.join('\\x{:02x}'.format(x) for x in byteval) + "'")
b'\x60'
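The same expression can be wrapped in a small helper. hex_repr is a hypothetical name, and the sketch assumes Python 3; the separator argument to bytes.hex() needs Python 3.8 or later.

```python
def hex_repr(data: bytes) -> str:
    """Render every byte as a \\xNN escape, including printable ASCII bytes."""
    return "b'" + "".join(f"\\x{byte:02x}" for byte in data) + "'"


byteval = "\x60".encode("ASCII")
print(hex_repr(byteval))

# If the b'...' wrapper is not needed, bytes.hex() gives the bare digits.
print(b"\x60\x41".hex())
print(b"\x60\x41".hex(" "))
```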
See this:
hexify = lambda s: [hex(ord(i)) for i in list(str(s))]
And
print(hexify("abcde"))
# ['0x61', '0x62', '0x63', '0x64', '0x65']
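Another example. Note that in Python 3, str(byteval) is the text b'`', so the result also includes the codes for the b and the two quote characters: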
byteval='\x60'.encode('ASCII')
hexify = lambda s: [hex(ord(i)) for i in list(str(s))]
print(hexify(byteval))
# ['0x62', '0x27', '0x60', '0x27']
Taken from https://helloacm.com/one-line-python-lambda-function-to-hexify-a-string-data-converting-ascii-code-to-hexadecimal/
