Write a "string" as raw binary into a file Python


I'm building a series of test files from scratch. The data payload builder outputs a value of type string, and I'm struggling to get that string written to the file as raw bytes.
The payload builder only uses hex values, and simply adds a byte for each iteration.
The 'write' functions I have tried either fail when writing strings, or write the ASCII codes for the characters of the string rather than the byte the string represents.
I want to end up with a series of files, each with the same filename as its data payload (e.g. file ff.txt contains the byte 0xff).
def doMakeData(counter):
    dataPayload = "%X" %counter
    if len(dataPayload)%2==1:
        dataPayload = str('0') + str(dataPayload)
    fileName = path+str(dataPayload)+".txt"
    return dataPayload, fileName
def doFilenameMaker(counter):
    counter += 1
    return counter
def saveFile(dataPayload, fileName):
    # with open(fileName, "w") as text_file:
    #     text_file.write("%s"%dataPayload) #this just writes the ASCII for the string
    f = file(fileName, 'wb')
    dataPayload.write(f) #this also writes the ASCII for the string
    f.close()
    return
if __name__ == "__main__":
    path = "C:\Users\me\Desktop\output\\"
    counter = 0
    iterator = 100
    while counter < iterator:
        counter = doFilenameMaker(counter)
        dataPayload, fileName = doMakeData(counter)
        print type(dataPayload)
        saveFile(dataPayload, fileName)

To write just a byte, use chr(n) to get a one-byte string containing the integer n (this is Python 2; in Python 3 the equivalent is bytes([n])).
Your code can be simplified to:
import os
path = r'C:\Users\me\Desktop\output'
for counter in xrange(100):
    with open(os.path.join(path,'{:02x}.txt'.format(counter)),'wb') as f:
        f.write(chr(counter))
Note the use of a raw string for the path. Without it, a sequence such as '\r' or '\n' inside the path would be interpreted as a carriage return or linefeed.
f.write is the method that writes to a file, and chr(counter) generates the byte. Make sure to open the file in binary mode ('wb') as well.
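For readers on Python 3, where chr() returns text rather than a byte, the same idea might look like the sketch below. It assumes Python 3; the helper name write_byte_files and the temporary output directory are made up for illustration and stand in for the real output path.

```python
import os
import tempfile

def write_byte_files(directory, count):
    """Write one file per value in range(count); each file holds that single byte."""
    paths = []
    for counter in range(count):
        # File name is the two-digit lowercase hex form of the value, e.g. 'ff.txt'.
        file_path = os.path.join(directory, '{:02x}.txt'.format(counter))
        with open(file_path, 'wb') as f:
            # bytes([n]) builds a one-byte bytes object; chr(n) would be a str in Python 3.
            f.write(bytes([counter]))
        paths.append(file_path)
    return paths

# Demonstration in a throwaway directory (hypothetical stand-in for the real output path).
demo_dir = tempfile.mkdtemp()
demo_paths = write_byte_files(demo_dir, 256)
```

Opening with 'wb' matters here as well: in text mode Python 3 would refuse a bytes argument.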

dataPayload.write(f) # this fails with "AttributeError: 'str' object has no attribute 'write'"
Of course it does. You don't write to strings; you write to files:
f.write(dataPayload)
That is to say, write() is a method of file objects, not of string objects.
You got this right in the commented-out code just above it; I'm not sure why you switched it around here.
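Swapping the call around fixes the AttributeError, but f.write(dataPayload) would still write the hex digits as text. If the goal is the raw byte that the hex string spells out, one possible approach is to convert the string first. The sketch below uses binascii.unhexlify from the standard library; the helper name payload_to_bytes is an assumption made for this example.

```python
import binascii

def payload_to_bytes(hex_payload):
    """Convert a hex string such as '0A' or 'FF01' into the raw bytes it spells out."""
    # unhexlify needs an even number of hex digits, so pad a leading zero if required.
    if len(hex_payload) % 2 == 1:
        hex_payload = '0' + hex_payload
    return binascii.unhexlify(hex_payload)

# '%X' % 255 gives the string 'FF', as in the question's payload builder.
raw = payload_to_bytes('%X' % 255)
```

The result can then be passed to f.write() on a file opened in binary mode.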

Related

AttributeError: 'str' object has no attribute 'readlines'. Where did I go wrong in my code?

I am trying to generate the reverse complement for DNA sequences of multiple file types with a python script. Here is what I have written so far
import gzip
import re
############## Reverse Complement Function #################################
def rev_comp(dna):
    dna_upper = dna.upper() #Ensures all input is capitalized
    dna_rev = dna_upper[::-1] #Reverses the string
    conversion = {'A':'T','C':'G','G':'C','T':'A','Y':'R','R':'Y',\
                  'S':'S','W':'W','K':'M','M':'K','B':'V','V':'B',\
                  'D':'H','H':'D','N':'N','-':'-'}
    rev_comp = ''
    rc = open("Rev_Comp.fasta", 'w')
    for i in dna_rev:
        rev_comp += conversion[i]
    rc.write(str(rev_comp))
    print("Reverse complement file Rev_Comp.fasta written to directory")
x = input("Enter filename (with extension) of the DNA sequence: ")
if x.endswith(".gz"): #Condition for gzip files
    with gzip.open(x, 'rb') as f:
        file_content = f.read()
        new_file = open("unzipped.fasta", 'w')
        new_file.write(str(file_content))
        print("unzipped.fasta written to directory")
xread = x.readlines()
fast = ''
if x.endswith(".fasta"): #condition for fasta files
    for i in xread:
        if not i.startswith('>'):
            fast = fast + i.strip('\n')
if x.endswith(".fastq"): #condition for fastq files
    for i in range(1,len(xread),4):
        fast = fast + xread[i].strip('\n')
rev_comp(x)
And what I wind up with is
AttributeError: 'str' object has no attribute 'readlines'
when I try to run the script using a .fastq file. What exactly is going wrong here? I expect the script to write Rev_Comp.fasta, but it doesn't.

x is not a filehandle, just a file name. You need to do
with open(x) as xhandle:
    xread = xhandle.readlines()
The overall logic might be better if you don't read all lines into memory. Also, the .gz case ends up in vaguely undefined territory; do you need to set x to the name of the unzipped data at the end of the gz handling, or perhaps put the code after it into an else: branch?
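As a rough illustration of both suggestions (reading line by line, and giving the .gz case a defined path), the sketch below may help. It assumes Python 3, four-line FASTQ records as in the question, and file type detection by extension; the function name read_sequence and the demo file are hypothetical.

```python
import gzip
import os
import tempfile

def read_sequence(path):
    """Return the concatenated sequence lines of a FASTA or FASTQ file, gzipped or not."""
    # Pick the opener from the extension; 'rt' makes gzip yield str lines, not bytes.
    if path.endswith('.gz'):
        opener, inner_name = gzip.open, path[:-3]
    else:
        opener, inner_name = open, path
    parts = []
    with opener(path, 'rt') as handle:
        # Iterating over the handle reads one line at a time instead of the whole file.
        for index, line in enumerate(handle):
            if inner_name.endswith('.fastq'):
                # In a four-line FASTQ record the sequence is the second line.
                if index % 4 == 1:
                    parts.append(line.strip())
            elif not line.startswith('>'):
                # In FASTA every line that is not a '>' header is sequence.
                parts.append(line.strip())
    return ''.join(parts)

# Small demonstration with a temporary, made-up FASTA file.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.fasta')
with open(demo_path, 'w') as out:
    out.write('>seq1\nACGT\nTTGA\n')
demo_sequence = read_sequence(demo_path)
```

The returned string, rather than the file name, is what would then be passed to rev_comp().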

x is the input from the user, which is a string. You need to open a file to be able to call readlines on it.
According to your existing code:
x = input("Enter filename (with extension) of the DNA sequence: ") # x stores a string
file_x = open(x, 'r') # You must open a file ...
xread = file_x.readlines() # and call readlines on the file instance.
# Closing is not strictly required here, but remember to close the file when you're done; it is good practice.
file_x.close()
or use the file as a context manager, which closes it for you:
with open(x) as file_x:
    xread = file_x.readlines()

Python binary messes up some files

I have written two Python scripts. The first encodes a file as binary digits and stores them in a text file for later decryption. The second is meant to turn that text file back into the original, readable data.
script 1 (encrypt)
(use any .png image file as input, any .txt file as output):
u_input = input("What file to encrypt?")
file_store = input("Where do you want to store the binary?")
character = "" #Blank for now
encrypted = "" #Blank for now, stores the bytes before they are written
with open(u_input, 'rb') as f:
    while True:
        c = f.read(1)
        if not c:
            f.close()
            break
        encrypted = encrypted + str(bin(ord(c))[2:].zfill(8))
print("")
print(encrypted) # This line is not necessary, but I have included it to show that the encryption works
print("")
with open(file_store, 'wb') as f:
    f.write(bytes(encrypted, 'UTF-8'))
    f.close()
As far as I can tell, this works okay for text files (.txt)
I then have a second script (to decrypt the file)
Use the previously created .txt file as source, any .png file as dest:
u_input = input("Sourcefile:")
file_store = input("Decrypted output:")
character = ""
decoded_string = ""
with open(u_input, 'r') as f:
    while True:
        c = f.read(1)
        if not c:
            f.close()
            break
        character = character + c
        if len(character) % 8 == 0:
            decoded_string = decoded_string + chr(int(character, 2))
            character = ""
with open(file_store, 'wb') as f:
    f.write(bytes(decoded_string, 'UTF-8'))
    f.close()
print("SUCCESS!")
Which works partially. i.e. it writes the file. However, I cannot open it or edit it. When I compare my original file (img.png) with my second file (img2.png), I see characters have been replaced or line breaks not entered correctly. I can't view the file in any image viewing / editing program. I do not understand why.
Please could someone try to explain and provide a solution (albeit, partial)? Thanks in advance.
Note: I am aware that my use of "encryption" and "decryption" are not necessarily used correctly, but this is a personal project, so it doesn't matter to me

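It appears you're using Python 3, as you put a UTF-8 parameter on the bytes call. That's your problem: the input should be decoded to a byte string, but you're putting together a Unicode string instead, and the conversion isn't 1:1. Any character built from a value above 127 becomes two bytes when encoded as UTF-8, so the output no longer matches the original file. It's easy to fix.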
decoded_string = b""
# ...
decoded_string = decoded_string + bytes([int(character, 2)])
# ...
f.write(decoded_string)
For a version that works in both Python 2 and Python 3, one more small modification is needed. This actually measures faster for me in Python 3.5, so it should be the preferred method.
import struct
# ...
decoded_string = decoded_string + struct.pack('B', int(character, 2))
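To make the difference concrete, here is a small round-trip sketch, assuming Python 3. The helper names to_bit_string and from_bit_string are invented for this example; the last statement reproduces the str-plus-UTF-8 approach from the question for comparison.

```python
def to_bit_string(data):
    """Encode bytes as a str of '0'/'1' characters, eight per input byte."""
    # Iterating over bytes yields ints in Python 3, so format() can be applied directly.
    return ''.join(format(byte, '08b') for byte in data)

def from_bit_string(bits):
    """Decode a str of '0'/'1' characters back into the original bytes."""
    # Take the text eight characters at a time and turn each group into one byte value.
    values = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    return bytes(values)

original = bytes(range(256))          # every possible byte value once
restored = from_bit_string(to_bit_string(original))

# The approach from the question: build a str with chr() and encode it as UTF-8.
broken = ''.join(chr(value) for value in original).encode('UTF-8')
```

Here restored equals original, while broken is longer than original because each value from 128 to 255 takes two bytes in UTF-8. That is why a plain text file can survive the question's scripts while a PNG does not.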

python write umlauts into file

I have the following output, which I want to write into a file:
l = ["Bücher", "Hefte", "Mappen"]
I do it like this:
f = codecs.open("testfile.txt", "a", stdout_encoding)
f.write(l)
f.close()
In my text file I want to see ["Bücher", "Hefte", "Mappen"] instead of B\xc3\xbccher.
Is there any way to do this without looping over the list and decoding each item, for example by passing a parameter to write()?
Many thanks

First, make sure you use unicode strings by adding the "u" prefix to each string:
l = [u"Bücher", u"Hefte", u"Mappen"]
Then you can write or append to a file.
I recommend the io module, which is compatible with both Python 2 and Python 3:
import io
with io.open("testfile.txt", mode="a", encoding="UTF8") as fd:
    for line in l:
        fd.write(line + "\n")
To read your text file in one piece:
with io.open("testfile.txt", mode="r", encoding="UTF8") as fd:
    content = fd.read()
The resulting content is a Unicode string.
If you encode this string as UTF8, you'll get a byte string like this:
b"B\xc3\xbccher"
Edit: using writelines.
The writelines() method writes a sequence of strings to the file. The sequence can be any iterable that produces strings, typically a list of strings. There is no return value, and no line separators are added, so append them yourself:
# add new lines
lines = [line + "\n" for line in l]
with io.open("testfile.txt", mode="a", encoding="UTF8") as fd:
    fd.writelines(lines)
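If the file should contain the list exactly as it appears in source, brackets and quotes included, one alternative that avoids looping is to serialize the whole list in one call. The sketch below swaps in json.dumps for this; it is not part of the answer above. It assumes Python 3, and the temporary output path is hypothetical.

```python
import io
import json
import os
import tempfile

items = ["Bücher", "Hefte", "Mappen"]

# ensure_ascii=False keeps 'ü' as a real character instead of a \u00fc escape.
list_text = json.dumps(items, ensure_ascii=False)

# Hypothetical output location inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
with io.open(out_path, mode="a", encoding="UTF8") as fd:
    fd.write(list_text + "\n")

# Read the file back as text to check what was stored.
with io.open(out_path, mode="r", encoding="UTF8") as fd:
    written = fd.read()
```

A side benefit of this choice is that json.loads() can turn the stored line back into a list later.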

python image (.jpeg) to hex code

I work with a thermal printer that can print images, but it needs to receive the data in hex format. For this I need a Python function that reads an image and returns a value containing the image data in hex format.
I currently use this format to send hex data to the printer:
content = b"\x1B\x4E"
What is the simplest way to do this using Python 2.7?
All the best

I don't really know what you mean by "hex format", but if the printer needs the whole file as a sequence of bytes you can do:
with open("image.jpeg", "rb") as fp:
    img = fp.read()
If your printer expects the image in some other format (such as 8-bit values for every pixel), try the Pillow library; it has many image manipulation functions and handles a wide range of input and output formats.

How about this:
with open('something.jpeg', 'rb') as f:
    binValue = f.read(1)
    while len(binValue) != 0:
        hexVal = hex(ord(binValue))
        # Do something with the hex value
        binValue = f.read(1)
Or for a function, something like this:
import re
def imgToHex(file):
    string = ''
    with open(file, 'rb') as f:
        binValue = f.read(1)
        while len(binValue) != 0:
            hexVal = hex(ord(binValue))
            string += '\\' + hexVal
            binValue = f.read(1)
    string = re.sub('0x', 'x', string) # Replace '0x' with 'x' for your needs
    return string
Note: you do not necessarily need the re.sub step if you use struct.pack to write the bytes, but this will get the data into the format that you need.

Read in a jpg and make a string of hex values. Then reverse the procedure. Take a string of hex and write it out as a jpg file...
import binascii
with open('my_img.jpg', 'rb') as f:
    data = f.read()
print(data[:10])
im_hex = binascii.hexlify(data)
# check out the hex...
print(im_hex[:10])
# reversing the procedure
im_hex = binascii.a2b_hex(im_hex)
print(im_hex[:10])
# write it back out to a jpg file
with open('my_hex.jpg', 'wb') as image_file:
    image_file.write(im_hex)
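It may also help to see that a literal such as b"\x1B\x4E" is only a way of spelling bytes in source code, so data read from a file in 'rb' mode is already in that form. The sketch below assumes Python 3.5 or later, where bytes objects have hex() and fromhex(); on Python 2.7 the binascii calls above play the same role. The four sample bytes stand in for real image data.

```python
# The escape notation and the hex-digit notation describe the same two bytes.
command = b"\x1B\x4E"
same_command = bytes.fromhex("1B4E")

# Stand-in for image data; a real script would use open(path, 'rb').read().
image_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0])

hex_text = image_bytes.hex()            # str of hex digits, two per byte
round_trip = bytes.fromhex(hex_text)    # back to the original bytes
```

Whether the printer wants the raw bytes or the hex digits as text depends on its protocol, which the question does not specify.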

startswith TypeError in function

Here is the code:
def readFasta(filename):
    """ Reads a sequence in Fasta format """
    fp = open(filename, 'rb')
    header = ""
    seq = ""
    while True:
        line = fp.readline()
        if (line == ""):
            break
        if (line.startswith('>')):
            header = line[1:].strip()
        else:
            seq = fp.read().replace('\n','')
            seq = seq.replace('\r','') # for windows
            break
    fp.close()
    return (header, seq)
FASTAsequence = readFasta("MusChr01.fa")
The error I'm getting is:
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
But the first argument to startswith is supposed to be a string according to the docs... so what is going on?
I'm assuming I'm using at least Python 3 since I'm using the latest version of LiClipse.

It's because you're opening the file in bytes mode, and so you're calling bytes.startswith() and not str.startswith().
You need to do line.startswith(b'>'), which will make '>' a bytes literal.

If you need to keep opening the file in binary mode, replacing 'STR' with 'STR'.encode('utf-8') works for me.

I don't have your file to test with, but try opening it in text mode with a UTF-8 encoding:
fp = open(filename, 'r', encoding='utf-8')
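Putting the text-mode suggestion together, a version of the reader might look like the sketch below. It assumes Python 3 and a UTF-8 (or plain ASCII) file. Unlike the original, it reads line by line and stops at a second '>' header, which is a design choice of this example and not something the question specifies. The name read_fasta and the demo file are hypothetical.

```python
import os
import tempfile

def read_fasta(filename):
    """Read the first record of a FASTA file and return (header, sequence)."""
    header = None
    seq_parts = []
    # Text mode yields str lines, so str methods such as startswith('>') work.
    with open(filename, 'r', encoding='utf-8') as fp:
        for line in fp:
            if line.startswith('>'):
                if header is not None:
                    break  # a second header starts the next record; stop here
                header = line[1:].strip()
            else:
                # strip() drops '\n' and any '\r' left over from Windows line endings
                seq_parts.append(line.strip())
    return (header or "", "".join(seq_parts))

# Demonstration with a small made-up file that uses Windows line endings.
demo_fasta = os.path.join(tempfile.mkdtemp(), 'demo.fa')
with open(demo_fasta, 'w', encoding='utf-8') as out:
    out.write('>chr1 test\r\nACGT\r\nGGCC\r\n>chr2\r\nTTTT\r\n')
demo_header, demo_seq = read_fasta(demo_fasta)
```

Joining the collected parts once at the end also avoids building a very long string through repeated concatenation, which can matter for a whole chromosome.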
