How can I append bits to a file using Python?

I am writing a program that reads bits from one file and writes them to another. I found a library called bitstring that lets me manipulate bits as strings. It lets me read the bits, but I cannot work out how to write them back out. The input and output files have the same size, so byte alignment will not be a problem. This is part of my code:
import bitstring
file = bitstring.ConstBitStream(filename='paper.pdf')
print(file.length)
bits_to_read = 5000000
last_bits = 0
while file.pos < file.length-bits_to_read:
    bits = file.read(bits_to_read)
    str_bits = bitstring.BitArray(bits).bin
rest = file.length - file.pos
bits = file.read(rest)
str_bits = bitstring.BitArray(bits).bin
Kind regards.

So, I have found a solution. I appended the resulting bits to a single variable and then exported it. This is part of the code:
while file.pos < file.length-bits_to_read:
    bits = file.read(bits_to_read)
    str_bits = bitstring.BitArray(bits).bin
    encrypted_bits = ''.join(encrypt(str_bits, cipher))
    exported_str = exported_str + encrypted_bits
rest = file.length - file.pos
bits = file.read(rest)
str_bits = bitstring.BitArray(bits).bin
exported_str = exported_str + str_bits
exported_bits = bitstring.BitArray(bin=exported_str)
with open(output_name, 'wb') as f:
    f.write(exported_bits.tobytes())
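Accumulating every chunk in one string costs roughly eight characters of text per input byte, which adds up for large files. A minimal sketch of an alternative, using only the standard library rather than bitstring: convert each chunk to a bit string, transform it, and write it out immediately. The `transform` callback is a hypothetical stand-in for the `encrypt` step, and it is assumed to return a bit string of the same length as its input.

```python
import io

def bytes_to_bits(chunk):
    """Return the chunk as a string of '0'/'1' characters, 8 per byte."""
    return ''.join(format(byte, '08b') for byte in chunk)

def bits_to_bytes(bits):
    """Inverse of bytes_to_bits; len(bits) must be a multiple of 8."""
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    return int(bits, 2).to_bytes(len(bits) // 8, 'big') if bits else b''

def copy_bits(src, dst, transform=lambda bits: bits, chunk_size=64 * 1024):
    """Stream src to dst chunk by chunk, applying transform to each bit string.

    transform is a hypothetical placeholder for the per-chunk processing
    (for example an encryption step); it must preserve the length.
    """
    for chunk in iter(lambda: src.read(chunk_size), b''):
        dst.write(bits_to_bytes(transform(bytes_to_bits(chunk))))

# Usage with in-memory files; real code would use open(..., 'rb') and open(..., 'wb').
source = io.BytesIO(b'\x00\xffhello')
target = io.BytesIO()
copy_bits(source, target, chunk_size=3)
print(target.getvalue())
```

Because each chunk is written as soon as it is processed, memory use stays proportional to `chunk_size` rather than to the file size.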

Related

Sorting 16 and 32 bits instructions of STM in Python using endianness

[Update of my old post]
I am using Python to read a hex file containing STM32 code, convert it to a single binary string, and then rewrite it to a file with one instruction per line.
That last part is the problem: my code does not work properly. Some 32-bit instructions are interpreted as 16-bit instructions, so the instructions are not split correctly.
Here is the code I use:
import json
def function(filepath):
    STMfile = open(filepath) # Read the file containing the data
    STMfileLines = STMfile.readlines()
    data_STMfile = ""
    for STMfileLine in STMfileLines:
        data_STMfile += STMfileLine[12:-3] # Taking only the data of each line
    data_binary = ""
    for c in data_STMfile:
        data_binary += str(bin(int(c, 16))[2:].zfill(4)) # Converting each hex character in binary of 4 bits
    instruction_file = open("./instructions_file.txt", "w")
    i = 0
    while i < len(data_binary)-15:
        tmp = data_binary[i:i+32]
        if tmp[24:27] == '111' and tmp[27:29] != '00': # 32-bits condition
            instruction_file.write(tmp[24:] + tmp[16:24] + tmp[8:16] + tmp[:8] + '\n')
            i += 32
        else:
            tmp = data_binary[i:i+16]
            instruction_file.write(tmp[8:] + tmp[:8] + '\n')
            i += 16
if __name__ == '__main__':
    function("./clignoter_led.srec")
The file 'clignoter_led.srec' is :
S3250800000000400120650100081D0200081D0200081D0200081D0200081D0200080000000040
S325080000200000000000000000000000001D0200081D020008000000001D0200081D02000816
S325080000401D0200081D0200081D0200081D0200081D0200081D0200081D0200081D0200085A
S325080000601D0200081D0200081D0200081D0200081D0200081D0200081D0200081D0200083A
S325080000801D0200081D0200081D0200081D0200081D0200081D0200081D0200081D0200081A
S325080000a01D0200081D0200081D0200081D0200081D0200081D0200081D0200081D020008FA
S325080000c01D0200081D0200081D0200081D0200081D0200081D0200081D0200081D020008DA
S325080000e01D0200081D0200081D0200081D0200081D0200081D0200081D0200081D020008BA
S325080001001D0200081D0200081D0200081D0200081D0200081D0200081D0200081D02000899
S325080001201D02000810B5054C237833B9044B13B10448AFF300800123237010BD04000020C4
S32508000140000000006802000808B5034B1BB103490348AFF3008008BD0000000008000020A2
S325080001606802000823488546AFF3008022482349234A002302E0D458C4500433C4188C423E
S32508000180F9D3204A204C002301E013600432A242FBD300F045F800F01EF81C484FF0010178
S325080001a00268114301601A484FF48061026811430160704717484FF02001026811430160D8
S325080001c0704714486FF0200102681140016070470138FDD17047FFF7E0FFFFF7EBFF0E48E2
S325080001e00068FFF7F5FFFFF7ECFF0B480068FFF7EFFFF2E7004001200000002004000020A1
S325080002008802000804000020200000201C380240000002401400024000000020FEE70000A7
S3250800022070B500260C4D0D4C641BA410A64209D100F01AF800260A4D0A4C641BA410A642CE
S3250800024005D170BD55F8043B98470136EEE755F8043B98470136F2E7800200088002000887
S325080002608002000884020008F8B500BFF8BC08BC9E467047F8B500BFF8BC08BC9E4670475A
S3250800028049010008250100083C55050000000000000000000000000000000000000000003A
S325080002a0000000000000000000000000000000000000000000000000000000000000000030
S325080002c0000000000000000000000000000000000000000000000000000000000000000010
S325080002e00000000000000000000000000000000000000000000000000000000000000000F0
S325080003002846A4EB0E0440EA03409EE7C1F120078B4022FA07FC4CEA030C25FA07FA4FEA6D
S325080003201C49BAFBF9F820FA07F309FB18AA8D401FFA8CFE1D4300FA01F308FB0EF02C0CD3
S3250800034044EA0A44A04202FA01F20BD91CEB040408F1FF3A80F08880A04240F28580A8F1F3
S3250800036002086444241AB4FBF9F009FB104400FB0EFEADB245EA0444A64508D91CEB0404D7
S3250800038000F1FF356CD2A6456AD90238644440EA0840A0FB0295A4EB0E04AC42C846AE46A7
S325080003a056D353D0002E69D0B3EB080264EB0E0422FA01F304FA07F71F43CC40C6E90074D6
S325080003c0002147E70CFA02FCC2F1200125FA01F34FEA1C4720FA01F195400D43B3FBF7F172
S325080003e007FB11331FFA8CFE280C40EA034001FB0EF3834204FA02F408D91CEB000001F1D5
S70500000000FA
As explained in my old post, an instruction is 32 bits long if it starts with '111' not followed by '00'. So an instruction starting with '11100' is 16 bits, instructions starting with '11101', '11110', or '11111' are 32 bits, and every other prefix means 16 bits.
When I run this function with the above file, the resulting file (instructions_file.txt) contains these lines :
(198) 11110000000000001101001111111011
(199) 11110000000000001111100001000101
(200) 1111100000011110
(201) 11110000010011110100100000011100
Here, the third line (200) starts with '11111', so it should be a 32-bit instruction, and the fourth should then be 16 bits; yet my code wrote the third line as a 16-bit instruction.
I've verified every index; they are correct.
I've tried different hex files (downloaded directly from STM32CubeProgrammer), and they all have this problem.
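One possible explanation, offered as a sketch rather than a verified fix: Thumb-2 stores a 32-bit instruction as two consecutive little-endian halfwords, and the length is decided by the top five bits of the first halfword. In the byte stream those bits sit in the second byte, that is `tmp[8:13]`, whereas the condition in the code tests `tmp[24:29]`, which belongs to the second halfword. The helper below (the name `split_thumb` is hypothetical) byte-swaps each halfword before testing it. It assumes the input holds only instructions; vector tables and literal data, such as the addresses at the start of this file, would still be misread by any linear scan.

```python
def split_thumb(data_binary):
    """Split a '0'/'1' string of little-endian Thumb code into instructions.

    Each halfword is byte-swapped so its most significant bit comes first.
    A halfword starting with 11101, 11110 or 11111 opens a 32-bit instruction.
    """
    instructions = []
    i = 0
    while i + 16 <= len(data_binary):
        # Bytes arrive low byte first, so swap them to get the halfword value.
        first = data_binary[i + 8:i + 16] + data_binary[i:i + 8]
        is_32bit = first[:3] == '111' and first[3:5] != '00'
        if is_32bit and i + 32 <= len(data_binary):
            second = data_binary[i + 24:i + 32] + data_binary[i + 16:i + 24]
            instructions.append(first + second)
            i += 32
        else:
            instructions.append(first)
            i += 16
    return instructions

# Bytes FB D3 00 F0 45 F8 00 F0 1E F8 1C 48, taken from the record at 0x08000180.
sample_hex = "FBD300F045F800F01EF81C48"
sample_bits = ''.join(format(int(c, 16), '04b') for c in sample_hex)
for word in split_thumb(sample_bits):
    print(word)
```

On this sample the split comes out as one 16-bit instruction, two 32-bit instructions, and one 16-bit instruction, which matches the expectation described in the question.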

How to create a text file of size N kilobytes with repetitions of "Hello World"

I want to use Python to create a text file of size N kilobytes filled with repetitions of "Hello World", where N is specified in a config file located in a different directory from the repository. I can write "hello world" N times, where N is a number read from the config file, but I don't know how to target a file size. Here is the code I have written so far:
import ConfigParser
import webbrowser
configParser = ConfigParser.RawConfigParser()
configParser.read("/home/suryaveer/check.conf")
num = configParser.get('userinput-config', 'num')
num2 = int(num)
message = "hello world"
f = open('test.txt', 'w')
f.write(message*num2)
f.close()
A string of length 1 occupies 1 byte when encoded as UTF-8, as long as the character is ASCII.
That means that the size of "Hello World" in bytes is len("Hello World") = 11 bytes.
To get ~N kilobytes, you can run something like this:
# N is int
size_bytes = N * 1024
message = "hello world"
# using context manager, so no need to remember to close the file.
with open('test.txt', 'w') as f:
    repeat_amount = size_bytes // len(message)
    f.write(message * repeat_amount)
First get the size of your message, bearing in mind that strings in Python are objects, so sys.getsizeof(message) returns the size of the whole object rather than of the text alone. For ASCII text such as this message, subtracting the size of an empty string leaves the size of the text itself. Then work out how many times the message must be repeated to reach N KB:
import sys
N = 1024 # size of the output file in Kb
message = "hello world"
string_object_size = sys.getsizeof("")
single_message_size = sys.getsizeof(message) - string_object_size
reps = int((N)*1024/single_message_size)
f = open('test.txt', 'w')
f.write(message*reps)
f.close()
First, you have to be clear on the difference between the number of characters written and the number of bytes. In many encodings one character occupies more than 1 byte. In your example, if the phrase is in English ('Hello world') and the default encoding is UTF-8, the numbers will be the same, but with another language and a different character set they may differ.
...
with open('test.txt', 'wb') as f: # binary because we need to count bytes
    max_size = num2 * 1024 # I assume num2 is in KB
    msg_bytes = message.encode('utf-8')
    bytes_written = 0
    while bytes_written < max_size: # if you don't need to break the last phrase
        f.write(msg_bytes)
        bytes_written += len(msg_bytes)
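If the file has to be exactly N kilobytes, the last repetition can be cut short. A minimal sketch, assuming 1 KB means 1024 bytes and UTF-8 output; the function name `write_repeated` and the temporary output path are illustrative, not from the original code. Note that cutting at a byte boundary can split a multi-byte character if the message is not plain ASCII.

```python
import os
import tempfile

def write_repeated(path, message, size_kb, encoding='utf-8'):
    """Fill path with repetitions of message, truncated to exactly size_kb * 1024 bytes."""
    target = size_kb * 1024
    chunk = message.encode(encoding)
    if not chunk:
        raise ValueError("message must not be empty")
    full, remainder = divmod(target, len(chunk))
    with open(path, 'wb') as f:
        f.write(chunk * full)         # whole repetitions
        f.write(chunk[:remainder])    # partial last repetition, possibly empty
    return target

# Usage: write a 2 KB file into a temporary directory and check its size.
out_path = os.path.join(tempfile.mkdtemp(), 'test.txt')
write_repeated(out_path, 'Hello World', 2)
print(os.path.getsize(out_path))
```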

Convert file to binary code in Python

I am looking to convert a file to binary for a project, preferably using Python as I am most comfortable with it, though if walked-through, I could probably use another language.
Basically, I need this for a project I am working on where we want to store data using a DNA strand and thus need to store files in binary ('A's and 'T's = 0, 'G's and 'C's = 1)
Any idea how I could proceed? I did find that one could encode to base64 and then decode it, but that seems a bit inefficient, and the code that I have doesn't seem to work...
import base64
import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
print(file_path)
with open(file_path) as f:
    encoded = base64.b64encode(f.readlines())
    print(encoded)
Also, I already have a program to do that simply with text. Any tips on how to improve it would also be appreciated!
import binascii
t = bytearray(str(input("Texte?")), 'utf8')
h = binascii.hexlify(t)
b = bin(int(h, 16)).replace('b','')
#removing the 'b' from the '0b' prefix that bin() adds
g = b.replace('1','G').replace('0','A')
print(g)
For example, for text to DNA: I input 'test' and expect the DNA sequence that comes from its binary representation, 01110100011001010111001101110100 (I also print every intermediate conversion in the example so that it is easier to follow):
>>>Texte?test #Asks the text
>>>b'74657374' #converts to hex
>>>01110100011001010111001101110100 #converts to binary
>>>AGGGAGAAAGGAAGAGAGGGAAGGAGGGAGAA #converts 0 to A and 1 to G
So, thanks to @jonrshape and Sergey Vturin, I finally was able to achieve what I wanted!
My program asks for a file, turns it into binary, which then gives me its equivalent in "DNA code" using pairs of binary numbers (00 = A, 01 = T, 10 = G, 11 = C)
import binascii
from tkinter import filedialog
file_path = filedialog.askopenfilename()
x = ""
with open(file_path, 'rb') as f:
    for chunk in iter(lambda: f.read(32), b''):
        # decode() rather than str().replace("b", ""), which would also delete the hex digit 'b'
        x += binascii.hexlify(chunk).decode('ascii')
# zfill keeps the leading zero bits that bin() would otherwise drop
b = bin(int(x, 16))[2:].zfill(len(x) * 4)
g = [b[i:i+2] for i in range(0, len(b), 2)]
dna = ""
for i in g:
    if i == "00":
        dna += "A"
    elif i == "01":
        dna += "T"
    elif i == "10":
        dna += "G"
    elif i == "11":
        dna += "C"
print(x) #hexdump
print(b) #converted to binary
print(dna) #converted to "DNA"
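A minimal sketch of the same 2-bit mapping (00 = A, 01 = T, 10 = G, 11 = C) that works byte by byte, so leading zero bits are preserved without building one huge integer, and that also shows the reverse direction. The function names `bytes_to_dna` and `dna_to_bytes` are illustrative, and the code targets Python 3.

```python
BASES = "ATGC"  # index 0..3 corresponds to bit pairs 00, 01, 10, 11

def bytes_to_dna(data):
    """Encode each byte as four bases, most significant bit pair first."""
    return ''.join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )

def dna_to_bytes(dna):
    """Inverse of bytes_to_dna; len(dna) must be a multiple of 4."""
    if len(dna) % 4:
        raise ValueError("DNA length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

encoded = bytes_to_dna(b'test')
print(encoded)
print(dna_to_bytes(encoded))
```

For a file, the bytes returned by `open(file_path, 'rb').read()` can be passed to `bytes_to_dna` directly.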
Of course it is inefficient!
base64 is designed to store binary data as text, so the converted block is larger than the original.
By the way, what kind of efficiency do you want? Compactness?
If so, your second sample is much closer to what you want.
Also, in your task you lose information! Are you aware of this?
Here is a sample of how to store and restore.
It stores the data in an easy-to-understand hex-in-text format, just for the sake of a demo. If you want compactness, you can easily modify the code to store to a binary file; if you want a 00011001 view, that modification is easy too.
import math
#"make a long test string"
import numpy as np
s=''.join((str(x) for x in np.random.randint(4,size=33)))\
    .replace('0','A').replace('1','T').replace('2','G').replace('3','C')
def store_(s):
    size=len(s) #size will be padded to a multiple of 8, so remember the true value and store it with the data
    s2=s.replace('A','0').replace('T','0').replace('G','1').replace('C','1')\
        .ljust( int(math.ceil(size/8.)*8),'0') #add '0' to 8xInt to the right
    a=(hex( eval('0b'+s2[i*8:i*8+8]) )[2:].rjust(2,'0') for i in xrange(len(s2)/8))
    return ''.join(a),size
yourDataAsHexInText,sizeToStore=store_(s)
print yourDataAsHexInText,sizeToStore
def restore_(s,size=None):
    if size==None: size=len(s)/2
    a=( bin(eval('0x'+s[i*2:i*2+2]))[2:].rjust(8,'0') for i in xrange(len(s)/2))
    #you lose information, remember?, so it`s only A or G
    return (''.join(a).replace('1','G').replace('0','A') )[:size]
restore_(yourDataAsHexInText,sizeToStore)
print "so check it"
print s ,"(input)"
print store_(s)
print s.replace('C','G').replace('T','A') ,"to compare with information loss"
print restore_(*store_(s)),"restored"
print s.replace('C','G').replace('T','A') == restore_(*store_(s))
result in my test:
63c9308a00 33
so check it
AGCAATGCCGATGTTCATCGTATACTTTGACTA (input)
('63c9308a00', 33)
AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA to compare with information loss
AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA restored
True

Find previous byte that isn't equal to unwanted byte in Python

I'm programming a small script that is meant to open a binary file, find an often-changing binary blob, and copy just that blob to a new file.
Here's the layout of the binary file:
-JUNK (Unknown Size) (Unknown Contents)
-3-byte HEADER containing encoded size of blob
-PADDING (Unknown Size) (Every byte is FF in hex)
-Start of blob (72 bytes) (Unknown Contents)
-16 bytes that are ALWAYS the same
-End of blob (Size can be determined from subtracting (72+16) from value HEADER) (Unknown Contents)
-JUNK (Unknown Size) (Unknown Contents)
Here's the code I've written so far:
from sys import argv
import binascii
import base64
InputFileName = argv[1]
with open(InputFileName, 'rb') as InputFile:
    Constant16 = base64.b64decode("GIhTSuBask6y60iLI2VwIg==")
    Constant16Offset = InputFile.read().find(Constant16)
    InputFile.seek(Constant16Offset)
    InputFile.seek(-72,1)
    InputFile.seek(-1,1)
    FFTestVar = InputFile.read(1)
    while FFTestVar == b'\xFF':
        InputFile.seek(-2,1)
        FFTestVar = InputFile.read(1)
    InputFile.seek(-3,1)
    BlobSizeBin = InputFile.read(3)
    BlobSizeHex = binascii.b2a_hex(BlobSizeBin)
    BlobSizeDec = int(BlobSizeHex, 16)
    InputFile.seek(Constant16Offset)
    InputFile.seek(-72,1)
    Blob = InputFile.read(BlobSizeDec)
with open('output.bin', 'wb') as OutputFile:
    OutputFile.write(Blob)
Unfortunately, the while loop is SLOW. InputFile could be up to 24MB large, and the padding could be a huge chunk of that. Going through it one byte at a time is ridiculously slow.
I'm thinking that there's probably a better way of doing this, but an hour or two of Googling hasn't been helpful.
Thanks!
You can read the whole file into memory (you already do that):
data = InputFile.read()
You can then treat data like an ordinary string (it is not a unicode string but an array of bytes, which is unfortunately called str under Python 2.x). You need to keep track of the offset yourself, so we will create an offset variable. Every line that looks like InputFile.seek(xx) becomes offset = xx, and InputFile.seek(xx, 1) becomes offset += xx.
magic_number = base64.b64decode("GIhTSuBask6y60iLI2VwIg==")
offset = magic_number_offset = data.find(magic_number)
offset -= 72
Then, instead of the while loop, use the re module (you need to import it). The pattern is written as a bytes literal so that it also works on bytes data under Python 3:
pattern = re.compile(b"[^\xFF]\xFF*$")
offset = pattern.search(data, endpos=offset).start() + 1
And the rest of code is:
offset -= 3
blob_size_bin = data[offset:offset+3]
blob_size_hex = binascii.b2a_hex(blob_size_bin)
blob_size_dec = int(blob_size_hex, 16)
offset = magic_number_offset - 72
blob = data[offset:offset+blob_size_dec]
If the files are really big and the Python process consumes a lot of memory, you can use the mmap module instead of loading the whole file into memory.
If this solution is still slow, you can reverse your data (reversed_data = data[::-1]) and search for the pattern [^\xff].
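A simpler alternative to the regular expression, sketched here under the layout described in the question: `bytes.rstrip(b'\xff')` strips the padding in one call, so the header is the three bytes that remain at the end. The function name `extract_blob` is hypothetical, the header is assumed to be a big-endian integer (as the `b2a_hex` conversion implies), and the last header byte is assumed not to be FF, otherwise it would be stripped along with the padding.

```python
import base64

MAGIC = base64.b64decode("GIhTSuBask6y60iLI2VwIg==")  # the 16 constant bytes
BLOB_PREFIX_LEN = 72  # bytes between the start of the blob and the constant

def extract_blob(data):
    """Return the blob embedded in data, or raise ValueError if not found."""
    magic_offset = data.find(MAGIC)
    if magic_offset < BLOB_PREFIX_LEN:
        raise ValueError("constant not found or blob start out of range")
    blob_start = magic_offset - BLOB_PREFIX_LEN
    # Drop the FF padding that precedes the blob; the header is what is left at the end.
    before_padding = data[:blob_start].rstrip(b'\xff')
    if len(before_padding) < 3:
        raise ValueError("no room for the 3-byte header")
    blob_size = int.from_bytes(before_padding[-3:], 'big')
    return data[blob_start:blob_start + blob_size]

# Synthetic input: junk, header (size 100), padding, then a 100-byte blob and more junk.
blob = b'\x01' * 72 + MAGIC + b'\x02' * 12
sample = b'junk' + (100).to_bytes(3, 'big') + b'\xff' * 1000 + blob + b'tail'
print(extract_blob(sample) == blob)
```

The stripping runs in C inside the interpreter, so it avoids the per-byte `seek` and `read` calls that make the original loop slow.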

Why am I getting an IndexError: string index out of range?

I am running the following code on ubuntu 11.10, python 2.7.2+.
import urllib
import Image
import StringIO
source = '/home/cah/Downloads/evil2.gfx'
dataFile = open(source, 'rb').read()
slicedFile1 = StringIO.StringIO(dataFile[::5])
slicedFile2 = StringIO.StringIO(dataFile[1::5])
slicedFile3 = StringIO.StringIO(dataFile[2::5])
slicedFile4 = StringIO.StringIO(dataFile[3::5])
jpgimage1 = Image.open(slicedFile1)
jpgimage1.save('/home/cah/Documents/pychallenge12.1.jpg')
pngimage1 = Image.open(slicedFile2)
pngimage1.save('/home/cah/Documents/pychallenge12.2.png')
gifimage1 = Image.open(slicedFile3)
gifimage1.save('/home/cah/Documents/pychallenge12.3.gif')
pngimage2 = Image.open(slicedFile4)
pngimage2.save('/home/cah/Documents/pychallenge12.4.png')
In essence, I'm taking a .bin file that contains the bytes of several image files interleaved like 123451234512345..., grouping the bytes of each image together, and saving them. The problem is that I'm getting the following error:
  File "/usr/lib/python2.7/dist-packages/PIL/PngImagePlugin.py", line 96, in read
    len = i32(s)
  File "/usr/lib/python2.7/dist-packages/PIL/PngImagePlugin.py", line 44, in i32
    return ord(c[3]) + (ord(c[2])<<8) + (ord(c[1])<<16) + (ord(c[0])<<24)
IndexError: string index out of range
I found PngImagePlugin.py and looked at what it contained:
def i32(c):
    return ord(c[3]) + (ord(c[2])<<8) + (ord(c[1])<<16) + (ord(c[0])<<24) # line 44
"Fetch a new chunk. Returns header information."
if self.queue:
    cid, pos, len = self.queue[-1]
    del self.queue[-1]
    self.fp.seek(pos)
else:
    s = self.fp.read(8)
    cid = s[4:]
    pos = self.fp.tell()
    len = i32(s) # lines 88-96
I would try tinkering, but I'm afraid I'll break png and PIL, which were irksome to get working.
Thanks.
It would appear that len(s) < 4 at this stage:
len = i32(s)
which means that
s = self.fp.read(8)
returned fewer than 4 bytes.
The data in the fp you are passing probably doesn't make sense to the image decoder.
Double-check that you are slicing correctly.
Make sure that the string you are passing in is at least 4 bytes long.
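One way to double-check the slicing before involving the image decoder is to look at the first bytes of each slice. The sketch below relies only on well-known file signatures (JPEG starts with FF D8 FF, PNG with 89 50 4E 47 0D 0A 1A 0A, GIF with GIF87a or GIF89a). The helper names `guess_format` and `deinterleave` are hypothetical, and the five-way split is an assumption based on the 12345 pattern described in the question, which would also imply a fifth slice, `dataFile[4::5]`, that the question's code never reads. It is written for Python 3, where slicing bytes yields bytes.

```python
SIGNATURES = {
    b'\xff\xd8\xff': 'jpeg',
    b'\x89PNG\r\n\x1a\n': 'png',
    b'GIF87a': 'gif',
    b'GIF89a': 'gif',
}

def guess_format(data):
    """Return 'jpeg', 'png' or 'gif' based on the leading bytes, else None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None

def deinterleave(data, ways=5):
    """Split data into `ways` streams by taking every ways-th byte."""
    return [data[i::ways] for i in range(ways)]

# Synthetic check: interleave five fake 8-byte headers and split them again.
streams = [b'\xff\xd8\xff\xe0AAAA', b'\x89PNG\r\n\x1a\n', b'GIF89aBB',
           b'\x89PNG\r\n\x1a\n', b'\xff\xd8\xff\xe0CCCC']
mixed = bytes(b for group in zip(*streams) for b in group)
for index, part in enumerate(deinterleave(mixed)):
    print(index, guess_format(part), len(part))
```

A slice for which `guess_format` returns None, or whose detected format differs from the extension it is saved under, is a likely source of the decoder error.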
