I have a big binary file. How can I write (prepend) to the beginning of the file?
Ex:
file = 'binary_file'
string = 'bytes_string'
I expect to get a new file with the content bytes_string_binary_file.
The construct open("filename", "ab") only appends.
I'm using Python 3.3.1.
There is no way to prepend to a file. You must rewrite the file completely:
with open("oldfile", "rb") as old, open("newfile", "wb") as new:
new.write(string)
new.write(old.read())
If you want to avoid reading the whole file into memory, simply read it in chunks:
with open("oldfile", "rb") as old, open("newfile", "wb") as new:
for chunk in iter(lambda: old.read(1024), b""):
new.write(chunk)
Replace 1024 with a chunk size (the number of bytes read each time) that works best on your system.
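If you want the result to end up under the original file name, one option (my addition, not part of the answer above) is to write to a temporary file and then swap it into place with os.replace(), which is available from Python 3.3:

import os

string = b"bytes_string"  # the bytes to prepend (assumed from the question)

with open("binary_file", "rb") as old, open("binary_file.tmp", "wb") as new:
    new.write(string)
    for chunk in iter(lambda: old.read(1024), b""):
        new.write(chunk)

os.replace("binary_file.tmp", "binary_file")  # swap the new file in over the old one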
I'm trying to make a program that converts all words to uppercase.
a = open("file.txt",encoding='UTF-8')
for b in a:
c = b.rstrip()
print(c.upper())
a.close()
This is my code. It prints the uppercase text, but it doesn't save it back to 'file.txt'. I want to convert all the words in the file to uppercase. How can I solve this?
Here's how you can do it (provided you are working with a small file):
Open the file in read mode and store the uppercase text in a variable; then open another file handle in write mode and write the content into it.
with open('file.txt', 'r') as input:
    y = input.read().upper()
with open('file.txt', 'w') as out:
    out.write(y)
You can actually do this "in place" by reading and writing a character at a time.
with open("file.txt", "r") as f:
while (b := f.read(1)) != '':
f.write(b.upper())
This is safe because you are processing the file one character at a time and writing one character for every character read, and the only seek is back to the character that has just been read, so you never overwrite data before it has been read. The file object's underlying buffering and your system's disk cache mean this isn't as inefficient as it looks.
(This does make one assumption: that the encoded length of b is always the same as that of b.upper(). That holds for ASCII text, but not for every character (for example, 'ß'.upper() is 'SS'). If it doesn't hold for your data, you can read and write at least a line at a time, though not in place:
with open("input.txt") as inh, open("output.txt", "w") as outh:
for line in inh:
print(line.upper(), file=outh)
)
First read the text file into a string:
with open('file.txt', 'r') as file:
    data = file.read()
Then convert the data to uppercase:
data_revise = data.upper()
Finally, write the revised text out to a file:
with open('data/try.txt', 'w') as fout:
    fout.write(data_revise)
You can write all changes to a temporary file and replace the original after all the data has been processed. You can use either map() or a generator expression:
with open(r"C:\original.txt") as inp_f, open(r"C:\temp.txt", "w+") as out_f:
out_f.writelines(map(str.upper, inp_f))
with open(r"C:\original.txt") as inp_f, open(r"C:\temp.txt", "w+") as out_f:
out_f.writelines(s.upper() for s in inp_f)
To replace the original file, you can use shutil.move():
import shutil
...
shutil.move(r"C:\temp.txt", r"C:\original.txt")
I'm adding a string at the end of a binary file; the problem is that I don't know how to get that string back. I append the string to the binary file in ASCII format using this command:
f=open("file", "ab")
f.write("Testing string".encode('ascii'))
f.close()
The string length will be at most 40 characters; if it is shorter, it will be padded with zeros. But I don't know how to get the string back from the binary file, since it is at the end, or how to rewrite the file without the string. Thank you in advance.
Since you opened the file in append mode, you can't read from it like that.
You will need to reopen it in read mode, like so:
f = open("file", "rb")
fb = f.read()
f.close()
For future reference, an easier way to open files is like this:
with open("file", "rb") as f:
fb = f.read()
At which point you can use fb. Opened this way, the file closes itself automatically when the with block finishes.
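To actually recover the appended string and then strip it off the file, here is a sketch. It assumes the string really does occupy exactly the last 40 bytes and is padded with NUL bytes, as described in the question:

import os

FOOTER_LEN = 40  # the appended string is assumed to be padded to exactly 40 bytes

with open("file", "rb+") as f:
    f.seek(-FOOTER_LEN, os.SEEK_END)             # jump to the start of the appended string
    tail = f.read(FOOTER_LEN)
    text = tail.rstrip(b"\x00").decode("ascii")  # drop the zero padding
    f.seek(-FOOTER_LEN, os.SEEK_END)
    f.truncate()                                 # cut the string off the end of the file

print(text)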
I'm trying to replicate this Bash command in Python; it splits the gzipped file into parts of 50 MB each.
split -b 50m "file.dat.gz" "file.dat.gz.part-"
My attempt at the Python equivalent:
import gzip

infile_name = "file.dat.gz"
chunk = 50*1024*1024  # 50MB

with gzip.open(infile_name, 'rb') as infile:
    for n, raw_bytes in enumerate(iter(lambda: infile.read(chunk), b'')):
        print(n, chunk)
        with gzip.open('{}.part-{}'.format(infile_name[:-3], n), 'wb') as outfile:
            outfile.write(raw_bytes)
This produces files of about 15 MB each (gzipped); when I gunzip them, they are 50 MB each.
How do I split the gzipped file in Python so that the split-up files are 50 MB each before gunzipping?
I don't believe that split works the way you think it does. It doesn't split the gzip file into smaller gzip files. I.e. you can't call gunzip on the individual files it creates. It literally breaks up the data into smaller chunks and if you want to gunzip it, you have to concatenate all the chunks back together first. So, to emulate the actual behavior with Python, we'd do something like:
infile_name = "file.dat.gz"
chunk = 50*1024*1024 # 50MB
with open(infile_name, 'rb') as infile:
for n, raw_bytes in enumerate(iter(lambda: infile.read(chunk), b'')):
print(n, chunk)
with open('{}.part-{}'.format(infile_name[:-3], n), 'wb') as outfile:
outfile.write(raw_bytes)
In reality we'd read multiple smaller input chunks to make one output chunk to use less memory.
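For example, here is a sketch of the same raw split that reads the input 64 KB at a time while still producing 50 MB parts (the 64 KB read size is an arbitrary choice of mine):

infile_name = "file.dat.gz"
part_size = 50 * 1024 * 1024   # size of each raw output part
read_size = 64 * 1024          # read the input this many bytes at a time

with open(infile_name, 'rb') as infile:
    n = 0
    while True:
        remaining = part_size
        block = infile.read(min(read_size, remaining))
        if not block:
            break  # no data left, so don't create an empty part
        with open('{}.part-{}'.format(infile_name[:-3], n), 'wb') as outfile:
            while block:
                outfile.write(block)
                remaining -= len(block)
                if remaining == 0:
                    break
                block = infile.read(min(read_size, remaining))
        n += 1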
We might be able to break the file into smaller files that we can individually gunzip, and still hit our target size: using something like a BytesIO stream, we could gunzip the file and re-gzip it into that in-memory stream until it reached the target size, then write it out and start a new BytesIO stream.
With compressed data, you have to measure the size of the output, not the size of the input as we can't predict how well the data will compress.
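A rough sketch of that idea follows (my own code, not part of the original answer). The flush_part helper and the one-megabyte read size are my own choices, and each part will come out close to, rather than exactly, 50 MB because the compressor holds some data in its internal buffers until the member is closed:

import gzip
import io

infile_name = "file.dat.gz"   # as in the question
target = 50 * 1024 * 1024     # rough target size of each compressed part

def flush_part(buf, part):
    # write one finished in-memory gzip member out to disk
    with open('{}.part-{}.gz'.format(infile_name[:-3], part), 'wb') as f:
        f.write(buf.getvalue())

part = 0
buf = io.BytesIO()
out = gzip.GzipFile(fileobj=buf, mode='wb')  # re-compress into the in-memory buffer
wrote_any = False

with gzip.open(infile_name, 'rb') as infile:
    for block in iter(lambda: infile.read(1024 * 1024), b''):
        out.write(block)
        wrote_any = True
        if buf.tell() >= target:  # measure the compressed output, not the input
            out.close()           # finish this gzip member (leaves buf open)
            flush_part(buf, part)
            part += 1
            buf = io.BytesIO()
            out = gzip.GzipFile(fileobj=buf, mode='wb')
            wrote_any = False

out.close()
if wrote_any:
    flush_part(buf, part)  # whatever is left over

Each of the resulting .gz files can then be gunzipped on its own.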
Here's a solution for emulating something like the split -l (split on lines) command option that will allow you to open each individual file with gunzip.
import io
import os
import shutil
from xopen import xopen

def split(infile_name, num_lines):
    infile_name_fp = infile_name.split('/')[-1].split('.')[0]  # get first part of file name
    cur_dir = '/'.join(infile_name.split('/')[0:-1])
    out_dir = f'{cur_dir}/{infile_name_fp}_split'
    if os.path.exists(out_dir):
        shutil.rmtree(out_dir)
    os.makedirs(out_dir)  # create in same folder as the original .csv.gz file

    m = 0
    part = 0
    buf = io.StringIO()  # initialize buffer
    with xopen(infile_name, 'rt') as infile:
        for line in infile:
            if m < num_lines:  # fill up buffer
                buf.write(line)
                m += 1
            else:  # write buffer to file, then start a new buffer
                with xopen(f'{out_dir}/{infile_name_fp}.part-{str(part).zfill(5)}.csv.gz', mode='wt', compresslevel=6) as outfile:
                    outfile.write(buf.getvalue())
                part += 1
                buf = io.StringIO()  # flush buffer -> faster than seek(0); truncate(0)
                buf.write(line)  # keep the line that triggered the flush
                m = 1
    # write whatever is left in buffer to file
    with xopen(f'{out_dir}/{infile_name_fp}.part-{str(part).zfill(5)}.csv.gz', mode='wt', compresslevel=6) as outfile:
        outfile.write(buf.getvalue())
    buf.close()
Usage:
split('path/to/myfile.csv.gz', num_lines=100000)
Outputs a folder with split files at path/to/myfile_split.
Discussion: I've used xopen here for additional speed, but you may choose to use gzip.open if you want to stay with Python native packages. Performance-wise, I've benchmarked this to take about twice as long as a solution combining pigz and split. It's not bad, but could be better. The bottleneck is the for loop and the buffer, so maybe rewriting this to work asynchronously would be more performant.
I'm trying my hand at this Rosalind problem and am running into an issue. I believe everything in my code is correct, but it obviously isn't, as it's not running as intended. I want to delete the contents of the file and then write some text to it. The program writes the text that I want, but it doesn't first delete the initial contents.
def ini5(file):
    raw = open(file, "r+")
    raw2 = (raw.read()).split("\n")
    clean = raw2[1::2]
    raw.truncate()
    for line in clean:
        raw.write(line)
        print(line)
I've seen:
How to delete the contents of a file before writing into it in a python script?
But my problem still persists. What am I doing wrong?
truncate() truncates at the current position. Per its documentation, emphasis added:
Resize the stream to the given size in bytes (or the current position if size is not specified).
After a read(), the current position is the end of the file. If you want to truncate and rewrite with that same file handle, you need to perform a seek(0) to move back to the beginning.
Thus:
raw = open(file, "r+")
contents = raw.read().split("\n")
raw.seek(0) # <- This is the missing piece
raw.truncate()
raw.write('New contents\n')
(You could also have passed raw.truncate(0), but this would have left the pointer -- and thus the location for future writes -- at a position other than the start of the file, making your file sparse when you started writing to it at that position).
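A tiny demonstration of that pitfall, using a throwaway demo.txt file of my own:

with open("demo.txt", "w") as f:
    f.write("0123456789")

with open("demo.txt", "r+") as f:
    f.read()          # the position is now at the end (offset 10)
    f.truncate(0)     # the file is 0 bytes long, but the position is unchanged
    f.write("new")    # ...so this lands at offset 10

with open("demo.txt", "rb") as f:
    print(f.read())   # b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00new'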
If you want to completely overwrite the old data in the file, you should open the file in a different mode.
It should be:
raw = open(file, "w") # or "wb"
To resolve your problem, first read the file's contents:
with open(file, "r") as f:  # or "rb"
    file_data = f.read()
# And then:
raw = open(file, "w")
Then open it in write mode. This way you will not append your text to the old contents of the file; you'll write only your data to it.
Read about file modes here.
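Applied to the function from the question, a sketch of this read-then-rewrite pattern might look like the following (I've kept the same every-second-line filtering, and added back the newlines that split() removed):

def ini5(file):
    # read everything first
    with open(file, "r") as f:
        raw2 = f.read().split("\n")
    clean = raw2[1::2]

    # reopening in "w" mode truncates the file before anything is written
    with open(file, "w") as f:
        for line in clean:
            f.write(line + "\n")
            print(line)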
I want to read all data from a binary file without the last 4 bytes. How can we do it in Python?
This question is old, but for others who find this on Google, please note: doing f.read()[:-4] will read the whole file into memory, then slice it.
If you only need the first N bytes of a file, simply pass the size to the read() function as an argument:
with open(path_to_file, 'rb') as f:
first_four = f.read(4)
This avoids reading an entire potentially large file when you could lazily read the first N bytes and stop.
If you need the last N bytes of a file, you can seek to the last block of the file with os.lseek(). This is more complicated and left as an exercise for the reader. :-)
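For the question as actually asked (everything except the last 4 bytes) without slurping and slicing, one sketch is to look up the file size first and read exactly that much minus four; data.bin here is just a placeholder name:

import os

path_to_file = "data.bin"             # placeholder file name

size = os.path.getsize(path_to_file)  # total size in bytes
with open(path_to_file, "rb") as f:
    data = f.read(max(size - 4, 0))   # everything except the last 4 bytes

For a very large file you would read that amount in smaller chunks rather than with a single read() call.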
Read it as you normally would, then slice the result:
with open(path_to_file, 'rb') as f:
    data = f.read()[:-4]