How to concatenate many binary files into one file in Python? - python

I need to append many binary files into one binary file. All my binary files are saved in one folder:
file1.bin
file2.bin
...
For that I tried using this code:
import numpy as np
import glob
import os
Power_Result_File_Path ="/home/Deep_Learning_Based_Attack/Test.bin"
Folder_path =r'/home/Deep_Learning_Based_Attack/Test_Folder/'
os.chdir(Folder_path)
npfiles = glob.glob("*.bin")
loadedFiles = [np.load(bf) for bf in npfiles]
PowerArray = np.concatenate(loadedFiles, axis=0)
np.save(Power_Result_File_Path, PowerArray)
It gives me this error:
"Failed to interpret file %s as a pickle" % repr(file))
OSError: Failed to interpret file 'file.bin' as a pickle
My problem is how to concatenate the binary files; it is not about analysing every file independently.

Taking your question literally: brute-force raw data concatenation
files = ['my_file1', 'my_file2']
out_data = b''
for fn in files:
    with open(fn, 'rb') as fp:
        out_data += fp.read()
with open('the_concatenation_of_all', 'wb') as fp:
    fp.write(out_data)
Comment about your example
You seem to be interpreting the files as saved numpy arrays (i.e. saved via np.save()). The error, however, tells me that you didn't save those files via numpy (because it fails to decode them). np.load expects the .npy format and falls back to pickle for anything else, so if you point it at an arbitrary binary file the call will throw exactly this error.
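If the .bin files are raw binary dumps rather than .npy files, np.fromfile is the usual way to read them into numpy. A minimal sketch, assuming every file holds a flat array of a known element type (float32 here is a placeholder you must replace with the real dtype):
import glob
import numpy as np

bin_paths = sorted(glob.glob("*.bin"))
# Assumption: each .bin file is a raw dump of float32 values;
# change dtype to match however the files were actually written.
arrays = [np.fromfile(p, dtype=np.float32) for p in bin_paths]
PowerArray = np.concatenate(arrays, axis=0)
np.save("Test", PowerArray)  # np.save adds the .npy extension automatically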

import aiofiles

async def concatenate(files, output_file):
    for file in files:
        async with aiofiles.open(file, mode='rb') as f:
            contents = await f.read()
        if file == files[0]:
            write_mode = 'wb'  # overwrite the output file
        else:
            write_mode = 'ab'  # append to the end of the output file
        async with aiofiles.open(output_file, write_mode) as f:
            await f.write(contents)
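Since await only works inside a coroutine, the snippet above has to be driven by an event loop. A minimal usage sketch, assuming it is wrapped in a coroutine named concatenate as shown (the file names are placeholders):
import asyncio

files = ['file1.bin', 'file2.bin']  # placeholder input names
asyncio.run(concatenate(files, 'Test.bin'))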

Related

CSV Should Return Strings, Not Bytes Error

I am trying to read CSV files from a directory that is not in the same directory as my Python script.
Additionally, the CSV files are stored in ZIP archives that have the exact same names (the only difference being that one ends with .zip and the other with .csv).
Currently I am using Python's zipfile and csv libraries to open and get the data from the files; however, I am getting this error:
Traceback (most recent call last): File "write_pricing_data.py", line 13, in <module>
for row in reader:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
My code:
import os, csv
from zipfile import *

folder = r'D:/MarketData/forex'
localFiles = os.listdir(folder)

for file in localFiles:
    zipArchive = ZipFile(folder + '/' + file)
    with zipArchive.open(file[:-4] + '.csv') as csvFile:
        reader = csv.reader(csvFile, delimiter=',')
        for row in reader:
            print(row[0])
How can I resolve this error?
It's a bit of a kludge and I'm sure there's a better way (that just happens to elude me right now). If you don't have embedded newlines, then you can use:
import zipfile, csv

zf = zipfile.ZipFile('testing.csv.zip')
with zf.open('testing.csv', 'r') as fin:
    # Create a generator of decoded lines for input to csv.reader
    # (the csv module is only really happy with ASCII input anyway...)
    lines = (line.decode('ascii') for line in fin)
    for row in csv.reader(lines):
        print(row)
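A less kludgy alternative (a sketch, not part of the original answer) is to wrap the archive member in io.TextIOWrapper, which handles decoding and newlines, including embedded newlines; the utf-8 encoding here is an assumption:
import csv
import io
import zipfile

with zipfile.ZipFile('testing.csv.zip') as zf:
    with zf.open('testing.csv', 'r') as raw:
        # TextIOWrapper turns the binary member stream into a text stream,
        # so csv.reader receives the str lines it expects.
        text = io.TextIOWrapper(raw, encoding='utf-8', newline='')
        for row in csv.reader(text):
            print(row)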

Reading a gzipped CSV file in Python 3

I'm having problems reading from a gzipped csv file with the gzip and csv libs. Here's what I got:
import gzip
import csv
import json

f = gzip.open(filename)
csvobj = csv.reader(f, delimiter=',', quotechar="'")
for line in csvobj:
    ts = line[0]
    data_json = json.loads(line[1])
but this throws an exception:
File "C:\Users\yaronol\workspace\raw_data_from_s3\s3_data_parser.py", line 64, in download_from_S3
self.parse_dump_file(filename)
File "C:\Users\yaronol\workspace\raw_data_from_s3\s3_data_parser.py", line 30, in parse_dump_file
for line in csvobj:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
Gunzipping the file and opening that with csv works fine. I've also tried decoding the file text to convert from bytes to str...
What am I missing here?
The default mode for gzip.open is rb; if you wish to work with strs, you have to request text mode explicitly:
f = gzip.open(filename, mode="rt")
Off topic: it is good practice to wrap I/O operations in a with block:
with gzip.open(filename, mode="rt") as f:
You are opening the file in binary mode (which is the default for gzip).
Try instead:
import gzip
import csv
f = gzip.open(filename, mode='rt')
csvobj = csv.reader(f, delimiter=',', quotechar="'")
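Putting the pieces together for the question's loop, as a sketch (filename stands for whatever path you are reading):
import gzip
import csv
import json

with gzip.open(filename, mode='rt', newline='') as f:
    csvobj = csv.reader(f, delimiter=',', quotechar="'")
    for line in csvobj:
        ts = line[0]                      # first column: timestamp
        data_json = json.loads(line[1])   # second column: JSON payload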
A late addition: you can also use the datatable package in Python:
import datatable as dt
df = dt.fread(filename)
df.head()

Append binary file to another binary file

I want to append a more recently created binary file to a previously written binary file, essentially merging them. This is the sample code I am using:
with open("binary_file_1", "ab") as myfile:
myfile.write("binary_file_2")
Except the error I get is "TypeError: must be string or buffer, not file"
But that's exactly what I want to do: add one binary file to the end of an earlier-created binary file.
I did try adding "wb", as in myfile.write("binary_file_2", "wb"), but it didn't like that.
You need to actually open the second file and read its contents:
with open("binary_file_1", "ab") as myfile, open("binary_file_2", "rb") as file2:
myfile.write(file2.read())
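If the second file might be large, here is a sketch that copies it in chunks instead of reading it all into memory (the 1 MiB chunk size is an arbitrary choice):
with open("binary_file_1", "ab") as myfile, open("binary_file_2", "rb") as file2:
    while True:
        chunk = file2.read(1024 * 1024)  # read up to 1 MiB at a time
        if not chunk:
            break
        myfile.write(chunk)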
You can also use shutil.copyfileobj from the Python standard library's shutil module:
import os
import shutil

WDIR = os.getcwd()
fext = open("outputFile.bin", "wb")
for f in lstFiles:
    fo = open(os.path.join(WDIR, f), "rb")
    shutil.copyfileobj(fo, fext)
    fo.close()
fext.close()
First we open the outputFile.bin binary file for writing, then we loop over the list of input files in lstFiles and call shutil.copyfileobj(src, dst), where src and dst are file objects. To get a file object, just call open on the filename with the proper mode, "rb" for binary reading. Each file object we open must be closed, and the concatenated output file must be closed as well.
I hope it helps
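The same idea with with blocks (so every file is closed automatically) and glob to build the input list; this is only a sketch, and the file*.bin pattern is an assumption about how the inputs are named:
import glob
import shutil

# Build the input list first so the output file cannot end up in it.
lstFiles = sorted(glob.glob("file*.bin"))  # assumption: inputs are file1.bin, file2.bin, ...

with open("outputFile.bin", "wb") as fext:
    for name in lstFiles:
        with open(name, "rb") as fo:
            shutil.copyfileobj(fo, fext)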
import aiofiles

async def concatenate(files, output_file):
    for file in files:
        async with aiofiles.open(file, mode='rb') as f:
            contents = await f.read()
        if file == files[0]:
            write_mode = 'wb'  # overwrite the output file
        else:
            write_mode = 'ab'  # append to the end of the output file
        async with aiofiles.open(output_file, write_mode) as f:
            await f.write(contents)

Python - how to open a file that is not yet written to disk?

I am using a script to strip exif data from uploaded JPGs in Python, before writing them to disk. I'm using Flask, and the file comes in through the request:
file = request.files['file']
I then strip the exif data and save it:
f = open(file)
image = f.read()
f.close()
outputimage = stripExif(image)
f = ('output.jpg', 'w')
f.write(outputimage)
f.close()
f.save(os.path.join(app.config['IMAGE_FOLDER'], filename))
Open isn't working because it only takes a string as an argument, and if I try to just set f=file, it throws an error about tuple objects not having a write attribute. How can I pass the current file into this function before it is read?
file is a FileStorage, described in http://werkzeug.pocoo.org/docs/datastructures/#werkzeug.datastructures.FileStorage
As the doc says, stream represents the stream of data for this file, usually in the form of a pointer to a temporary file, and most functions are proxied to it.
You probably can do something like:
file = request.files['file']
image = file.read()
outputimage = stripExif(image)
f = open(os.path.join(app.config['IMAGE_FOLDER'], 'output.jpg'), 'wb')
f.write(outputimage)
f.close()
Try the io package, which has BufferedReader(), e.g.:
import io
f = io.BufferedReader(request.files['file'])
...
file = request.files['file']
image = stripExif(file.read())
file.close()
filename = 'whatever' # maybe you want to use request.files['file'].filename
dest_path = os.path.join(app.config['IMAGE_FOLDER'], filename)
with open(dest_path, 'wb') as f:
    f.write(image)

How do I automatically handle decompression when reading a file in Python?

I am writing some Python code that loops through a number of files and processes the first few hundred lines of each file. I would like to extend this code so that if any of the files in the list are compressed, it will automatically decompress while reading them, so that my code always receives the decompressed lines. Essentially my code currently looks like:
for f in files:
    handle = open(f)
    process_file_contents(handle)
Is there any function that can replace open in the above code so that if f is either plain text or gzip-compressed text (or bzip2, etc.), the function will always return a file handle to the decompressed contents of the file? (No seeking required, just sequential access.)
I had the same problem: I'd like my code to accept filenames and return a filehandle to be used with with, decompressing automatically where needed, etc.
In my case, I'm willing to trust the filename extensions and I only need to deal with gzip and maybe bzip files.
import gzip
import bz2

def open_by_suffix(filename):
    if filename.endswith('.gz'):
        return gzip.open(filename, 'rb')
    elif filename.endswith('.bz2'):
        return bz2.BZ2File(filename, 'r')
    else:
        return open(filename, 'r')
If we don't trust the filenames, we can compare the initial bytes of the file for magic strings (modified from https://stackoverflow.com/a/13044946/117714):
import gzip
import bz2

magic_dict = {
    b"\x1f\x8b\x08": (gzip.open, 'rb'),
    b"\x42\x5a\x68": (bz2.BZ2File, 'r'),
}
max_len = max(len(x) for x in magic_dict)

def open_by_magic(filename):
    with open(filename, 'rb') as f:  # read the signature bytes in binary mode
        file_start = f.read(max_len)
    for magic, (fn, flag) in magic_dict.items():
        if file_start.startswith(magic):
            return fn(filename, flag)
    return open(filename, 'r')
Usage:
# cat
for filename in filenames:
    with open_by_suffix(filename) as f:
        for line in f:
            print(line)
Your use-case would look like:
for f in files:
    with open_by_suffix(f) as handle:
        process_file_contents(handle)
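For purely sequential access there is also a standard-library shortcut not mentioned in the answers above: fileinput with the hook_compressed open hook transparently decompresses .gz and .bz2 files based on their extension. A minimal sketch (process_line is a hypothetical per-line handler; note that before Python 3.10 compressed files are yielded as bytes rather than str):
import fileinput

with fileinput.input(files=files, openhook=fileinput.hook_compressed) as handle:
    for line in handle:
        process_line(line)  # hypothetical per-line processing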
