Python: Convert data file format to string - python

I have a file that has the following output when file command is run:
#file test.bin
#test.bin : data
#file -i test.bin
#test.bin: application/octet-stream; charset=binary
I want to read the contents of this file and forward to a python library that accepts this read-data as a string.
file = open("test.bin", "rb")
readBytes = file.read() # python type : <class 'bytes'>
output = test.process(readBytes) # process expects a string
I have tried str(readBytes), however that did not work. I see that there are also unprintable strings in the file test.bin, as the output of strings test.bin produces far lesser output than the actual bytes present in the file.
Is there a way to convert the bytes read into strings? Or am I trying to achieve something that makes no sense at all?

Try to use Bitstring. It's good package for reading bits.
# import module
from bitstring import ConstBitStream
# read file
x = ConstBitStream(filename='file.bin')
# read 5 bits
output = x.read(5)
# convert to unsigned int
int_val = output.uint

Do you mean by?
output = test.process(readBytes.decode('latin1'))

Related

how to change byte type in pythone

i have a small problem and it caused me a lot of trubble. basicly i want to convert an immage to bytes than store string wersion of those bytes in an txt file and than read file contents and transform it into bytes and than into image. i've goten first part of this kinda ready (it works but it's made quickly and badly) but the conversion from string to byte gives me problem.
when i read image bytes it's something like this: b'GIF89aP\x00P\x00\xe3'
but when i read it from txt by 'rb' or just transform str to byte it gives me this: b'GIF89aP\\x00P\\x00\\xe3'
and with this i can't write it to an immage.
so i've tried to read and learn anything about this but i couldn't find anything that would help.
the code is here and i know it's really messy but i just need it to work
file = open('p.gif', 'rb')
image = file.read()
str_b = str(image)
leng = len(str_b)
print(leng)
str_b = str_b[:0] + str_b[0+2:]
leng =- 1
str_b = str_b[:leng]
print(image)
#a = open('bytearray', 'w+')
#a.write(str_b)
#a.close
a = open('bytearray', 'r')
a = a.read()
temp = a.encode('utf-8')
print(temp)
#b = open('check', 'w+')
#b.write(str(string))
#print(string)
image_result = open('decoded.jpg', 'wb') # create a writable image and write the decoding result
image_result.write(temp)
basicly my goal right now is to get bytes that look like this: b'GIF89aP\x00P\x00\xe3'
Please do not use eval like suggested above, eval has serious security vulnerabilities and will execute any python code you pass within it. You could accidentally read a text file that has code to reformat the disk and it will just execute, this is just an example but you get my point its bad practice and just results in more problems see https://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html if you want some examples on why eval is bad
anyways lets try to fix your code
instead of converting your byte array to string by wrapping it in the str() method I would suggest you use .decode() and .encode()
Fixed Code:
with open('p.gif', 'rb') as file:
image = file.read() # read file bytes
str_image = image.decode("utf-8") #using decode we changed the bytes to a string
with open('image.txt', 'w') as file:
file.write(str_image) # write image as string to a text file
with open('image.txt', 'r') as file
str_from_file = file.read() # read the text file and store the string
file_bytes = str_from_file.encode("utf-8") # encode the image str back to bytes
print(type(str_from_file)) #type is str
print(type(file_bytes)) # types is bytes
I hope this fixes your issue and also doesn't include vulnerabilties in what your building

Python gzip gives error and asks for bytes like input

I'm unable to get this to work
import gzip
content = "Lots of content here"
f = gzip.open('file.txt.gz', 'a', 9)
f.write(content)
f.close()
I get output:
============== RESTART: C:/Users/Public/Documents/Trial_vr06.py ==============
Traceback (most recent call last):
File "C:/Users/Public/Documents/Trial_vr06.py", line 4, in <module>
f.write(content)
File "C:\Users\SONID\AppData\Local\Programs\Python\Python36\lib\gzip.py", line 260, in write
data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
This was linked in an answer Python Gzip - Appending to file on the fly to Is it possible to compress program output within python to reduce output file's size?
I've tried integer data but no effect. What is the issue here
by default gzip streams are binary (Python 3: gzip.open() and modes).
No problem in Python 2, but Python 3 makes a difference between binary & text streams.
So either encode your string (or use b prefix if it's a literal like in your example, not always possible)
f.write(content.encode("ascii"))
or maybe better for text only: open the gzip stream as text:
f = gzip.open('file.txt.gz', 'at', 9)
note that append mode on a gzip file works is not that efficient (Python Gzip - Appending to file on the fly)
In order to compress your string, it must be a binary value. In order to do that simply put a "b" in front of your string. This will tell the interpreter to read this as the binary value and not the string value.
content = b"Lots of content here"
https://docs.python.org/3/library/gzip.html

How to read specific bytes from a binary MP3 file in Python?

I would learn to handle read and write binary data. I know that I can open a binary file with
f = open(myfile, mode='rb')
fb = f.read()
f.close()
return fb
How can I access and read the range $A7-$AC in a mp3 file with this structure:
Lame mp3 Tags
You should take a look at Python's struct library for help with extracting binary data.
import struct
mp3_filename = r"my_mp3_file.mp3"
with open(mp3_filename, 'rb') as f_mp3:
mp3 = f_mp3.read()
entry = mp3[0xA7:0xAC+1]
print struct.unpack("{}b".format(len(entry)), entry)
This would give you a list of integers such as:
(49, 0, 57, 0, 57, 0)
You pass a format string to tell Python how to intepret each of the bytes. In this example, they are all simply converted from bytes into integers. Each format specifier can have a repeat count, so for your example, the format string would be "6b". If you wanted to decode this as words, you would simply change the format specifier, there is a full table of options to help you: Struct format characters
To convert these to zeros, you would need to close the file and reopen it for writing. Build a new output as follows:
import struct
mp3_filename = r"my_mp3_file.mp3"
zeros = "\0\0\0\0\0\0"
with open(mp3_filename, 'rb') as f_mp3:
mp3 = f_mp3.read()
entry = mp3[0xA7:0xAC+1]
print struct.unpack("{}B".format(len(entry)), entry)
if entry != zeros:
print "non zero"
with open(mp3_filename, 'wb') as f_mp3:
f_mp3.write(mp3[:0xA7] + zeros + mp3[0xAD:])
FYI: There are ready made Python libraries that are able to extract tag information from MP3 files. Take a look at something like the id3reader package.

How to open and read a binary file in Python?

I have a binary file (link) that I would like to open and read contents of with Python. How are such binary files opened and read with Python? Any specific modules to use for such an operation.
The 'b' flag will get python to treat the file as a binary, so no modules are needed. Also you haven't provided a purpose for having python read a binary file with a question like that.
f = open('binaryfile', 'rb')
print(f.read())
Here is an Example:
with open('somefile.bin', 'rb') as f: #the second parameter "rb" is used only when reading binary files. Term "rb" stands for "read binary".
data = f.read() #we are assigning a variable which will read whatever in the file and it will be stored in the variable called data.
print(data)
Reading a file in python is trivial (as mentioned above); however, it turns out that if you want to read a binary file and decode it correctly you need to know how it was encoded in the first place.
I found a helpful example that provided some insight at https://www.devdungeon.com/content/working-binary-data-python,
# Binary to Text
binary_data = b'I am text.'
text = binary_data.decode('utf-8') #Trans form back into human-readable ASCII
print(text)
binary_data = bytes([65, 66, 67]) # ASCII values for A, B, C
text = binary_data.decode('utf-8')
print(text)
but I was still unable to decode some files that my work created because they used an unknown encoding method.
Once you know how it is encoded you can read the file bit by bit and perform the decoding with a function of three.

Python - Read text file to String

I have the following python sript, which double hashes a hex value:
import hashlib
linestring = open('block_header.txt', 'r').read()
header_hex = linestring.encode("hex") // Problem!!!
print header_hex
header_bin = header_hex.decode('hex')
hash = hashlib.sha256(hashlib.sha256(header_bin).digest()).digest()
hash.encode('hex_codec')
print hash[::-1].encode('hex_codec')
My text file "block_header.txt" (hex) looks like this:
0100000081cd02ab7e569e8bcd9317e2fe99f2de44d49ab2b8851ba4a308000000000000e320b6c2fffc8d750423db8b1eb942ae710e951ed797f7affc8892b0f1fc122bc7f5d74df2b9441a42a14695
Unfortunately, the result from printing the variable header_hex looks like this (not like the txt file):
303130303030303038316364303261623765353639653862636439333137653266653939663264653434643439616232623838353162613461333038303030303030303030303030653332306236633266666663386437353034323364623862316562393432616537313065393531656437393766376166666338383932623066316663313232626337663564373464663262393434316134326131343639350a
I think the problem is in this line:
header_hex = linestring.encode("hex")
If I remove the ".encode("hex")"-part, then I get the error
unhandled TypeError "Odd-length string"
Can anyone give me a hint what might be wrong?
Thank you a lot :)
You're doing too much encoding/decoding.
Like others mentioned, if your input data is hex, then it's a good idea to strip leading / trailing whitespace with strip().
Then, you can use decode('hex') to turn the hex ASCII into binary. After performing whatever hashing you want, you'll have the binary digest.
If you want to be able to "see" that digest, you can turn it back into hex with encode('hex').
The following code works on your input file with any kinds of whitespace added at the beginning or end.
import hashlib
def multi_sha256(data, iterations):
for i in xrange(iterations):
data = hashlib.sha256(data).digest()
return data
with open('block_header.txt', 'r') as f:
hdr = f.read().strip().decode('hex')
_hash = multi_sha256(hdr, 2)
# Print the hash (in hex)
print 'Hash (hex):', _hash.encode('hex')
# Save the hash to a hex file
open('block_header_hash.hex', 'w').write(_hash.encode('hex'))
# Save the hash to a binary file
open('block_header_hash.bin', 'wb').write(_hash)

Categories

Resources