Python - Read text file to String

I have the following Python script, which double-hashes a hex value:
import hashlib
linestring = open('block_header.txt', 'r').read()
header_hex = linestring.encode("hex")  # Problem!!!
print header_hex
header_bin = header_hex.decode('hex')
hash = hashlib.sha256(hashlib.sha256(header_bin).digest()).digest()
hash.encode('hex_codec')
print hash[::-1].encode('hex_codec')
My text file "block_header.txt" (hex) looks like this:
0100000081cd02ab7e569e8bcd9317e2fe99f2de44d49ab2b8851ba4a308000000000000e320b6c2fffc8d750423db8b1eb942ae710e951ed797f7affc8892b0f1fc122bc7f5d74df2b9441a42a14695
Unfortunately, the result from printing the variable header_hex looks like this (not like the txt file):
303130303030303038316364303261623765353639653862636439333137653266653939663264653434643439616232623838353162613461333038303030303030303030303030653332306236633266666663386437353034323364623862316562393432616537313065393531656437393766376166666338383932623066316663313232626337663564373464663262393434316134326131343639350a
I think the problem is in this line:
header_hex = linestring.encode("hex")
If I remove the .encode("hex") part, then I get the error:
unhandled TypeError "Odd-length string"
Can anyone give me a hint what might be wrong?
Thank you a lot :)

You're doing too much encoding/decoding.
Like others mentioned, if your input data is hex, then it's a good idea to strip leading / trailing whitespace with strip().
Then, you can use decode('hex') to turn the hex ASCII into binary. After performing whatever hashing you want, you'll have the binary digest.
If you want to be able to "see" that digest, you can turn it back into hex with encode('hex').
The following code works on your input file with any kinds of whitespace added at the beginning or end.
import hashlib

def multi_sha256(data, iterations):
    for i in xrange(iterations):
        data = hashlib.sha256(data).digest()
    return data

with open('block_header.txt', 'r') as f:
    hdr = f.read().strip().decode('hex')

_hash = multi_sha256(hdr, 2)

# Print the hash (in hex)
print 'Hash (hex):', _hash.encode('hex')

# Save the hash to a hex file
open('block_header_hash.hex', 'w').write(_hash.encode('hex'))

# Save the hash to a binary file
open('block_header_hash.bin', 'wb').write(_hash)
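For reference, a minimal Python 3 sketch of the same double hash (the 'hex' codec is gone in Python 3, so the conversion moves to bytes.fromhex() and bytes.hex()):

import hashlib

with open('block_header.txt') as f:
    header = bytes.fromhex(f.read().strip())   # hex text -> raw bytes

digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
print('Hash (hex):', digest.hex())             # raw bytes -> hex text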

Related

How to change byte type in Python

I have a small problem and it has caused me a lot of trouble. Basically, I want to convert an image to bytes, store the string version of those bytes in a txt file, then read the file contents and transform them back into bytes and then into an image. I've gotten the first part of this kind of ready (it works, but it's made quickly and badly), but the conversion from string to bytes gives me problems.
When I read the image bytes, it's something like this: b'GIF89aP\x00P\x00\xe3'
But when I read it from the txt file with 'rb' or just transform the str to bytes, it gives me this: b'GIF89aP\\x00P\\x00\\xe3'
And with this I can't write it to an image.
So I've tried to read and learn anything I could about this, but I couldn't find anything that would help.
The code is here, and I know it's really messy, but I just need it to work:
file = open('p.gif', 'rb')
image = file.read()
str_b = str(image)
leng = len(str_b)
print(leng)
str_b = str_b[:0] + str_b[0+2:]
leng =- 1
str_b = str_b[:leng]
print(image)
#a = open('bytearray', 'w+')
#a.write(str_b)
#a.close
a = open('bytearray', 'r')
a = a.read()
temp = a.encode('utf-8')
print(temp)
#b = open('check', 'w+')
#b.write(str(string))
#print(string)
image_result = open('decoded.jpg', 'wb') # create a writable image and write the decoding result
image_result.write(temp)
Basically, my goal right now is to get bytes that look like this: b'GIF89aP\x00P\x00\xe3'
Please do not use eval as suggested above. eval has serious security vulnerabilities and will execute any Python code you pass to it: you could accidentally read a text file containing code that reformats your disk, and it would just execute. That is only an example, but you get my point; it's bad practice and just results in more problems. See https://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html if you want some examples of why eval is bad.
Anyway, let's try to fix your code.
Instead of converting your bytes to a string by wrapping them in str(), I would suggest you use .decode() and .encode().
Fixed Code:
with open('p.gif', 'rb') as file:
    image = file.read()  # read file bytes

str_image = image.decode("utf-8")  # using decode we changed the bytes to a string

with open('image.txt', 'w') as file:
    file.write(str_image)  # write image as string to a text file

with open('image.txt', 'r') as file:
    str_from_file = file.read()  # read the text file and store the string

file_bytes = str_from_file.encode("utf-8")  # encode the image str back to bytes

print(type(str_from_file))  # type is str
print(type(file_bytes))  # type is bytes
I hope this fixes your issue and also doesn't introduce vulnerabilities into what you're building.
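A side note on the approach above: raw GIF/JPEG bytes are usually not valid UTF-8, so image.decode("utf-8") can fail on real image files. If the goal is simply to round-trip arbitrary image bytes through a text file, a base64 variation is safer; the following is a sketch of that alternative, not part of the answer above:

import base64

with open('p.gif', 'rb') as f:
    image = f.read()                                   # raw image bytes

with open('image.txt', 'w') as f:
    f.write(base64.b64encode(image).decode('ascii'))   # bytes -> base64 text

with open('image.txt', 'r') as f:
    restored = base64.b64decode(f.read())              # base64 text -> original bytes

assert restored == image                               # lossless round trip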

Python: Convert data file format to string

I have a file that produces the following output when the file command is run:
#file test.bin
#test.bin : data
#file -i test.bin
#test.bin: application/octet-stream; charset=binary
I want to read the contents of this file and forward them to a Python library that accepts the read data as a string.
file = open("test.bin", "rb")
readBytes = file.read() # python type : <class 'bytes'>
output = test.process(readBytes) # process expects a string
I have tried str(readBytes); however, that did not work. I see that there are also unprintable characters in the file test.bin, as the output of strings test.bin produces far less output than the actual bytes present in the file.
Is there a way to convert the bytes read into strings? Or am I trying to achieve something that makes no sense at all?
Try using Bitstring. It's a good package for reading bits.
# import module
from bitstring import ConstBitStream
# read file
x = ConstBitStream(filename='file.bin')
# read 5 bits
output = x.read(5)
# convert to unsigned int
int_val = output.uint
Do you mean this?
output = test.process(readBytes.decode('latin1'))
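If process() really does require a str, latin1 works here because it maps every byte value 0-255 to exactly one character, so the round trip is lossless. A small sketch of that idea (test.process is the asker's own library call and is assumed to exist, so it is left commented out):

with open("test.bin", "rb") as f:
    readBytes = f.read()

as_text = readBytes.decode("latin1")            # every byte value maps to one character
assert as_text.encode("latin1") == readBytes    # lossless round trip

# output = test.process(as_text)                # 'test' is the asker's library (assumed)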

Python get rid of bytes b' '

import save

string = ""
with open("image.jpg", "rb") as f:
    byte = f.read(1)
    while byte != b"":
        byte = f.read(1)
        print(byte)
I'm getting bytes like:
b'\x00'
How do I get rid of this b''?
Let's say I want to save the bytes to a list, and then save this list as the same image again. How do I proceed?
Thanks!
You can use the bytes.decode function if you really need to "get rid of b": http://docs.python.org/3.3/library/stdtypes.html#bytes.decode
But it seems from your code that you do not really need to do this, you really need to work with bytes.
The b"..." is just a python notation of byte strings, it's not really there, it only gets printed. Does it cause some real problems to you?
The b'' is only part of the string representation of the data that is written when you print it.
Using decode will not help you here because you only want the bytes, not the characters they represent. Slicing the string representation will help even less because then you are still left with a string of several useless characters ('\', 'x', and so on), not the original bytes.
There is no need to modify the string representation of the data, because the data is still there. Just use it instead of the string (i.e. don't use print). If you want to copy the data, you can simply do:
data = file1.read(...)
...
file2.write(data)
If you want to output the binary data directly from your program, use sys.stdout.buffer:
import sys
sys.stdout.buffer.write(data)
To operate on binary data you can use the array module.
Below you will find an iterator that operates on 4096-byte chunks of data instead of reading everything into memory at once.
import array

def bytesfromfile(f):
    while True:
        raw = array.array('B')
        raw.fromstring(f.read(4096))
        if not raw:
            break
        yield raw

with open("image.jpg", 'rb') as fd:
    for byte in bytesfromfile(fd):
        for b in byte:
            # do something with b
            pass
This is one way to get rid of the b'':
import sys
print(b)
If you want to save the bytes later it's more efficient to read the entire file in one go rather than building a list, like this:
with open('sample.jpg', mode='rb') as fh:
    content = fh.read()

with open('out.jpg', mode='wb') as out:
    out.write(content)
Here is one solution:
print(str(byte)[2:-1])

Python MD5 Hash comparison in Python 3.2

I am trying to validate two files downloaded from a server. The first contains data and the second file contains the MD5 hash checksum.
I created a function that returns a hexdigest from the data file like so:
import hashlib

def md5(fileName):
    """Compute md5 hash of the specified file"""
    try:
        fileHandle = open(fileName, "rb")
    except IOError:
        print("Unable to open the file in readmode:", fileName)
        return
    m5Hash = hashlib.md5()
    while True:
        data = fileHandle.read(8192)
        if not data:
            break
        m5Hash.update(data)
    fileHandle.close()
    return m5Hash.hexdigest()
I compare the files using the following:
file = "/Volumes/Mac/dataFile.tbz"
fileHash = md5(file)
hashFile = "/Volumes/Mac/hashFile.tbz.md5"
fileHandle = open(hashFile, "rb")
fileHandleData = fileHandle.read()
if fileHash == fileHandleData:
print ("Good")
else:
print ("Bad")
The file comparison fails, so I printed out both fileHash and fileHandleData and I get the following:
b'MD5 (hashFile.tbz) = b60d684ab4a2570253961c2c2ad7b14c\n'
b60d684ab4a2570253961c2c2ad7b14c
From the output above, the hash values are identical. Why does the hash comparison fail? I am new to Python and am using Python 3.2. Any suggestions?
Thanks.
The comparison fails for the same reason this is false:
a = "data"
b = b"blah (blah) - data"
print(a == b)
The format of that .md5 file is strange, but if it is always in that format, a simple way to test would be:
if fileHandleData.rstrip().endswith(fileHash.encode()):
Because you have fileHash as a (Unicode) string, you have to encode it to bytes to compare. You may want to specify an encoding rather than use the current default string encoding.
If that exact format is always expected, it would be more robust to use a regex to extract the hash value and possibly check the filename.
Or, more flexibly, you could test substring presence:
if fileHash.encode() in fileHandleData:
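For the regex idea mentioned above, a hedged sketch (assuming the checksum file always has the MD5 (name) = hash layout shown in the question, and reusing fileHash and fileHandleData from the code above):

import re

pattern = br"MD5 \((?P<name>[^)]+)\) = (?P<hash>[0-9a-fA-F]{32})"
match = re.search(pattern, fileHandleData)                # fileHandleData is bytes
if match and match.group('hash').decode('ascii') == fileHash:
    print("Good")
else:
    print("Bad")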
You are comparing a hash value to the contents of the fileHandle. You need to get rid of the MD5 (hashFile.tbz) = part as well as the trailing newline, so try:
if fileHash == fileHandleData.rsplit(' ', 1)[-1].rstrip():
    print("Good")
else:
    print("Bad")
Keep in mind that in Python 3 you cannot mix bytes and strings in rsplit() and rstrip(). Hence, as Fred Nurk correctly added, you also need to encode/decode fileHandleData / fileHash (a byte buffer or a (Unicode) string, respectively).
The hash values are identical, but the strings are not. You need to get the hex value of the digest, and you need to parse the hash out of the file. Once you have done those you can compare them for equality.
Try "fileHash.strip("\n")...then compare the two. That should fix the problem.

Python line file iteration and strange characters

I have a huge gzipped text file which I need to read, line by line. I go with the following:
import codecs
import gzip

for i, line in enumerate(codecs.getreader('utf-8')(gzip.open('file.gz'))):
    print i, line
At some point late in the file, the Python output diverges from the file. This is because lines are getting broken by weird special characters that Python thinks are newlines. When I open the file in vim, the lines are correct, but the suspect characters are formatted weirdly. Is there something I can do to fix this?
I've tried other codecs including utf-16, latin-1. I've also tried with no codec.
I looked at the file using od. Sure enough, there are \n characters where they shouldn't be. But the "wrong" ones are preceded by a weird character. I think there's some encoding here where some characters are two bytes, with the trailing byte looking like a \n if not viewed properly.
According to 'od -h file' the offending character is '1d1c'.
If I replace:
gzip.open('file.gz')
With:
os.popen('zcat file.gz')
It works fine (and is actually quite a bit faster). But I'd like to know where I'm going wrong.
Try again with no codec. The following reproduces your problem when using the codec, and the absence of the problem without it:
import gzip
import os
import codecs
data = gzip.open("file.gz", "wb")
data.write('foo\x1d\x1cbar\nbaz')
data.close()
print list(codecs.getreader('utf-8')(gzip.open('file.gz')))
print list(os.popen('zcat file.gz'))
print list(gzip.open('file.gz'))
Outputs:
[u'foo\x1d', u'\x1c', u'bar\n', u'baz']
['foo\x1d\x1cbar\n', 'baz']
['foo\x1d\x1cbar\n', 'baz']
I asked (in a comment) """Show us the output from print repr(weird_special_characters). When you open the file in vim, WHAT are correct? Please be more precise than "formatted weirdly".""" But nothing :-(
What file are you looking at with od? file.gz?? If you can see anything recognisable in there, it's not a gzip file! You're not seeing newlines, you're seeing binary bytes that contain 0x0A.
If the original file was utf-8 encoded, what was the point of trying it with other codecs?
Does "works OK with zcat" mean that you got recognisable data without a utf8 decode step??
I suggest that you simplify your code, and do it a step at a time ... see for example the accepted answer to this question. Try it again and please show the exact code that you ran, and use repr() when describing the results.
Update: It looks like DS has guessed what you were trying to explain about the \x1c and \x1d.
Here are some notes on WHY it happens like that:
In ASCII, only \r and \n are considered when line-breaking:
>>> import pprint
>>> text = ''.join('A' + chr(i) for i in range(32)) + 'BBB'
>>> print repr(text)
'A\x00A\x01A\x02A\x03A\x04A\x05A\x06A\x07A\x08A\tA\nA\x0bA\x0cA\rA\x0eA\x0fA\x10A\x11A\x12A\x13A\x14A\x15A\x16A\x17A\x18A\x19A\x1aA\x1bA\x1cA\x1dA\x1eA\x1fBBB'
>>> pprint.pprint(text.splitlines(True))
['A\x00A\x01A\x02A\x03A\x04A\x05A\x06A\x07A\x08A\tA\n', # line break
 'A\x0bA\x0cA\r', # line break
 'A\x0eA\x0fA\x10A\x11A\x12A\x13A\x14A\x15A\x16A\x17A\x18A\x19A\x1aA\x1bA\x1cA\x1dA\x1eA\x1fBBB']
>>>
However, in Unicode, the characters \x1c (FILE SEPARATOR), \x1d (GROUP SEPARATOR), and \x1e (RECORD SEPARATOR) also qualify as line endings:
>>> text = u''.join('A' + unichr(i) for i in range(32)) + u'BBB'
>>> print repr(text)
u'A\x00A\x01A\x02A\x03A\x04A\x05A\x06A\x07A\x08A\tA\nA\x0bA\x0cA\rA\x0eA\x0fA\x10A\x11A\x12A\x13A\x14A\x15A\x16A\x17A\x18A\x19A\x1aA\x1bA\x1cA\x1dA\x1eA\x1fBBB'
>>> pprint.pprint(text.splitlines(True))
[u'A\x00A\x01A\x02A\x03A\x04A\x05A\x06A\x07A\x08A\tA\n', # line break
u'A\x0bA\x0cA\r', # line break
u'A\x0eA\x0fA\x10A\x11A\x12A\x13A\x14A\x15A\x16A\x17A\x18A\x19A\x1aA\x1bA\x1c', # line break
u'A\x1d', # line break
u'A\x1e', # line break
u'A\x1fBBB']
>>>
This will happen whatever codec you use. You still need to work out what (if any) codec you need to use. You also need to work out whether the original file was really a text file and not a binary file. If it's a text file, you need to consider the meaning of the \x1c and \x1d in the file.
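A workaround that follows from this (a sketch, assuming the file really is UTF-8 text and that only \n should end a line): iterate the gzipped file in binary mode, where only \n terminates a line, and decode each line afterwards, so \x1c and \x1d stay inside the lines:

import gzip

for i, raw_line in enumerate(gzip.open('file.gz', 'rb')):
    line = raw_line.decode('utf-8')   # decode after the split on \n has already happened
    print i, line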
