I have three sound files, for example a.wav, b.wav and c.wav. I want to write them into a single file, for example all.xmv (the extension could be different too), and when I need to I want to extract one of them and play it (for example, play a.wav after extracting it from all.xmv).
How can I do this in Python? I have heard that Delphi has a function named blockwrite that does what I want. Is there a function in Python like Delphi's blockwrite, or how else can I write these files and play them back?
Would standard tar/zip files work for you?
http://docs.python.org/library/zipfile.html
http://docs.python.org/library/tarfile.html
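For example, a minimal sketch with zipfile (using the file names from the question; "all.xmv" is just an ordinary zip archive with a different extension):

import zipfile

# pack the sound files into one archive
with zipfile.ZipFile("all.xmv", "w") as archive:
    for name in ("a.wav", "b.wav", "c.wav"):
        archive.write(name)

# later: pull a single file back out (extracts a.wav into the current directory)
with zipfile.ZipFile("all.xmv", "r") as archive:
    archive.extract("a.wav")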
If the archive idea (which is, by the way, the best answer to your question) doesn't suit you, you can fuse the data from several files into one file, e.g. by writing consecutive blocks of binary data (thus creating an uncompressed archive!).
Let paths be a list of files that should be concatenated:
import io
import os

offsets = []          # the offsets that should be kept for later file navigation
last_offset = 0
fout = io.FileIO(out_path, 'w')   # out_path: where the combined file is written
for path in paths:
    f = io.FileIO(path)           # stream IO
    fout.write(f.read())
    f.close()
    last_offset += os.path.getsize(path)
    offsets.append(last_offset)   # end offset of this file inside the combined file
fout.close()
# Pseudo: write the offsets to separate file e.g. by pickling
# ...
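# for example (an illustrative sketch - the ".offsets" index file name is made up):
import pickle
with open(out_path + '.offsets', 'wb') as idx_file:
    pickle.dump(offsets, idx_file)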
# reading the data, given that the offsets[] list is available
file_ID = 10                          # e.g. you need to read the file at index 10 (0-based)
start = offsets[file_ID - 1] if file_ID > 0 else 0
read_size = offsets[file_ID] - start  # get the file size
f = io.FileIO(out_path)               # open the combined file, not the original path
f.seek(start)                         # seek to the required position
data = f.read(read_size)              # here we are!
f.close()
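Since each stored block is byte-for-byte the original .wav file, you can dump the extracted bytes back to disk and play them with whatever you like; a small sketch (the output name is made up, and winsound is Windows-only):

with open("a_extracted.wav", "wb") as out:
    out.write(data)

import winsound
winsound.PlaySound("a_extracted.wav", winsound.SND_FILENAME)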
I am working on a PDF file dedupe project and analyzed many libraries in Python which read files, generate a hash value, and then compare it with the next file to detect duplicates - similar to the logic below, or using Python's filecmp library. But the issue I found with this logic is that if a PDF is generated from a source DOCX (Save to PDF) more than once, those outputs are not considered duplicates - even though the content is exactly the same. Why does this happen? Is there any other way to read the content and then create a unique hash value based on the actual content?
import hashlib

def calculate_hash_val(path, blocks=65536):
    file = open(path, 'rb')
    hasher = hashlib.md5()
    data = file.read(blocks)
    while len(data) > 0:
        hasher.update(data)
        data = file.read(blocks)
    file.close()
    return hasher.hexdigest()
One of the things that happens is that metadata, including the time of creation, gets saved into the file. It is invisible in the PDF, but it will make the hash different.
Here is an explanation of how to find and strip out that data with at least one tool. I am sure that there are many others.
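If you want the hash to reflect only the page content rather than the raw bytes, one option is to hash the extracted text instead. A rough sketch (not the code above - it assumes the pypdf package is installed, and text extraction can still differ between PDF producers):

import hashlib
from pypdf import PdfReader

def calculate_content_hash(path):
    reader = PdfReader(path)
    hasher = hashlib.md5()
    for page in reader.pages:
        # hash only the extracted text, so metadata such as creation time is ignored
        hasher.update((page.extract_text() or "").encode("utf-8"))
    return hasher.hexdigest()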
I am new here and trying to solve one of my interesting questions about World of Tanks. I heard that every battle's data is saved on the client's disk in the Wargaming.net folder, and I want to do a batch of data analysis on our clan's battle performances.
It is said that these .dat files are a kind of JSON file, so I tried to read one with a couple of lines of Python code, but it failed.
import json
f = open('ex.dat', 'r', encoding='unicode_escape')
content = f.read()
a = json.loads(content)
print(type(a))
print(a)
f.close()
The code is very simple and it obviously fails. Well, could anyone tell me what these .dat files really are?
Added on Feb. 9th, 2022
After I tried another snippet via Jupyter Notebook, it seems like something can be read from the .dat files:
import struct
import numpy as np
import matplotlib.pyplot as plt
import io

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    fbuff = io.BufferedReader(f)
    N = len(fbuff.read())
    print('byte length: ', N)

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    data = struct.unpack('b' * N, f.read(1 * N))
The result is a tuple of values, but I have no idea how to deal with it now.
Here's how you can parse some parts of it.
import pickle
import zlib
file = '4402905758116487.dat'
cache_file = open(file, 'rb') # This can be improved to not keep the file opened.
# When converting pickled items from Python 2 to Python 3 you need to use the "bytes" or "latin1" encoding.
legacyBattleResultVersion, brAllDataRaw = pickle.load(cache_file, encoding='bytes', errors='ignore')
arenaUniqueID, brAccount, brVehicleRaw, brOtherDataRaw = brAllDataRaw
# The data stored inside the pickled file will be a compressed pickle again.
vehicle_data = pickle.loads(zlib.decompress(brVehicleRaw), encoding='latin1')
account_data = pickle.loads(zlib.decompress(brAccount), encoding='latin1')
brCommon, brPlayersInfo, brPlayersVehicle, brPlayersResult = pickle.loads(zlib.decompress(brOtherDataRaw), encoding='latin1')
# Lastly you can print all of these and see a lot of data inside.
The response contains a mixture of more binary files as well as some data captured from the replays.
This is not a complete solution but it's a decent start to parsing these files.
First you can look at the replay file itself in a text editor, but there is some encoded data at the beginning of the file that has to be cleaned out before it is readable. Then there is a ton of info that you have to read in and figure out, but it is the stats for each player in the game. Then comes the part that has to do with the actual replay - you don't need that stuff.
You can grab the player IDs and tank IDs from the WoT developer area API if you want.
After loading the pickle files like gabzo mentioned, you will see that it is simply a list of values, and without knowing what each value refers to, it's hard to make sense of it. The identifiers for the values can be extracted from your game installation:
import zipfile

WOT_PKG_PATH = "Your/Game/Path/res/packages/scripts.pkg"
BATTLE_RESULTS_PATH = "scripts/common/battle_results/"

archive = zipfile.ZipFile(WOT_PKG_PATH, 'r')
for file in archive.namelist():
    if file.startswith(BATTLE_RESULTS_PATH):
        archive.extract(file)
You can then decompile the Python files (e.g. with uncompyle6) and go through the code to see the identifiers for the values.
One thing to note is that the list of values for the main pickle objects (like brAccount from gabzo's code) always has a checksum as the first value. You can use this to check whether you have the right order and the correct identifiers for the values. The way these checksums are generated can be seen in the decompiled python files.
I have been tackling this problem for some time (albeit in Rust): https://github.com/dacite/wot-battle-results-parser/tree/main/datfile_parser.
I'm using Physionet's database for some tasks related to ECG signal analysis. I wanted to read .MAT files, extract the MLII readings in the file (located throughout row 1), adjust the signal to mV using "gain" and "base" (located in the .INFO file also supplied by Physionet), and finally print the signal values and its period.
I wanted to write a script that could do all of those things to all the files in one folder. Before this, I wrote one in which I could do everything mentioned above, and it worked nicely.
But the script that should manage all the .mat and .info files in my folder is giving me problems with the variables. I tried using the 'global' statement at the very beginning of my succession of IFs, but it kept giving a similar error message.
This is the code:
import os
import scipy.io as sio
import numpy as np
import re
import matplotlib.pyplot as plt
for file in os.listdir('C:blablablablabla\Multiple .mat files'):
    if file.endswith(".mat"):
        file_name=os.path.splitext(file)
        ext_txt=".txt"
        ext_info=".info"
        if file.endswith(".info"):
            f=open(file_name[0]+ext_info,'r')
            k=f.read()
            f.close()
            j=re.findall('\d+', k)
            Fs=j[9]
            gain=j[13]
            base=j[14]
        RawData=sio.loadmat(file)
        signalVectors=RawData['val']
        [a,b]=signalVectors.shape
        signalVectors_2=np.true_divide((signalVectors-gain),base)
        ecgSignal=signalVectors_2[1,1:]
        T=np.true_divide(np.linspace(1,b,num=b-1),Fs)
        txt_data=np.array([ecgSignal, T])
        txt_data=txt_data.T
        f=open(file_name[0]+ext_name,'w')
        np.savetxt(file_name[0]+ext_txt,txt_data,fmt=['%.8f','%.8f'])
        f.close()
The error message I get is:
> File "C:blablablablabla\Multiple .mat files\ecg_mat_multi.py", line 24, in <module>
signalVectors_2=np.true_divide((signalVectors-gain),base)
NameError: name 'gain' is not defined
The problem comes with the variables 'gain', 'base' and 'Fs'. I tried to define them as global variables, but that didn't make a difference. Can you help me fix this error, please?
Thanks a lot for your time and help.
EDIT 1: copied the error message below the script.
EDIT 2: Changed the post title and erased additional questions.
Use two loops and extract the info before processing the data files
for filepath in os.listdir('C:blablablablabla\Multiple .mat files'):
    if filepath.endswith(".info"):
        Fs, gain, base = get_info(filepath)
        break

for file in os.listdir('C:blablablablabla\Multiple .mat files'):
    if file.endswith(".mat"):
        file_name=os.path.splitext(file)
        ...
        RawData=sio.loadmat(file)
        signalVectors=RawData['val']
        ...
I was working off your first edit so I'll include this even though the question has been streamlined
# foo.info
Source: record mitdb/100 Start: [00:00:10.000]
val has 2 rows (signals) and 3600 columns (samples/signal)
Duration: 0:10
Sampling frequency: 360 Hz Sampling interval: 0.002777777778 sec
Row Signal Gain Base Units
1 MLII 200 1024 mV
2 V5 200 1024 mV
To convert from raw units to the physical units shown
above, subtract 'base' and divide by 'gain'.
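In code that conversion is simply the following (using the names from the question, and assuming gain and base have already been converted to numbers):

ecg_mV = (signalVectors - base) / gain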
I would also write a function that returns the info you want. Using a function to extract the info makes the code in your loop more readable and it makes it easier to test the extraction.
Since the file is well structured, you could probably iterate over the lines and extract the info by counting lines and using str.split and slices.
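For instance, a purely illustrative line-based sketch that assumes the exact layout of foo.info shown above (frequency on the fourth line, the MLII row on the sixth):

def get_info_by_lines(filepath):
    with open(filepath) as f:
        lines = f.readlines()
    Fs = int(lines[3].split(':')[1].split()[0])   # "Sampling frequency: 360 Hz ..."
    row = lines[5].split()                        # "1  MLII  200  1024  mV"
    gain, base = int(row[2]), int(row[3])
    return Fs, gain, base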
This function uses regex patterns to extract the info:
import re

# regex patterns
hz_pattern = r'frequency: (\d+) Hz'
mlii_pattern = r'MLII\t(\d+)\t(\d+)'

def get_info(filepath):
    with open(filepath) as f:
        info = f.read()
    match = re.search(hz_pattern, info)
    Fs = match.group(1)
    match = re.search(mlii_pattern, info)
    gain, base = match.groups()
    return map(int, (Fs, gain, base))
If there are multiple .info and .mat files in a directory, you want to ensure you extract the correct info for each data file. Since the .info file has the same name as the .mat file that it belongs to, sort the directory list by name, then group by name - this will ensure you are operating on the two files that are related to each other.
import itertools

def name(filename):
    name, extension = filename.split('.')
    return name

files = os.listdir('C:blablablablabla\Multiple .mat files')
files.sort(key=name)

for fname, _ in itertools.groupby(files, key=name):
    fname_info = fname + '.info'
    fname_data = fname + '.mat'
    Fs, gain, base = get_info(fname_info)
    # process datafile
I am trying to concatenate a list of wav files into a single continuous wav file. I have used the following snippet, but the output is not correct - it sounds almost as if the files are on top of each other. audio_files is a list of .wav filenames that play as expected on their own.
This is my current code:
outfile = "sounds.wav"
data= []
for wav_file in audio_files:
w = wave.open(wav_file, 'rb')
data.append( [w.getparams(), w.readframes(w.getnframes())] )
w.close()
output = wave.open(outfile, 'wb')
output.setparams(data[0][0])
output.writeframes(data[0][1])
output.writeframes(data[1][1])
output.close()
I am assuming that you were using this question as your reference. However, the code that you have taken only takes into account 2 wav files, because that is what the original question had asked about.
Although I am not sure if this will fix the problem of the sounds being on top of each other, you should iterate through each item in your data list:
output = wave.open(outfile, 'wb')
output.setparams(data[0][0])
for params, frames in data:
    output.writeframes(frames)
output.close()
to ensure you are putting in the frames from each file you have.
One thing to keep in mind is that the params for your new wav files might be specific to each file and you might want to check that getparams() is returning similar results for each file.
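For example, a quick sanity check over the data list from the question (getparams() returns (nchannels, sampwidth, framerate, nframes, comptype, compname); only the first three need to match for a straight concatenation):

reference = data[0][0][:3]   # (nchannels, sampwidth, framerate) of the first file
for params, _ in data[1:]:
    if params[:3] != reference:
        raise ValueError("wav parameters differ: %r vs %r" % (params[:3], reference))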
I have created a series of PDF documents (maps) using data driven pages in ESRI ArcMap 10. There is a page 1 and page 2 for each map, generated from separate *.mxd files. So I have one list of PDF documents containing page 1 for each map and one list of PDF documents containing page 2 for each map. For example: map1_001.pdf, map1_002.pdf, map1_003.pdf... map2_001.pdf, map2_002.pdf, map2_003.pdf... and so on.
I would like to append these maps, pages 1 and 2, together so that both page 1 and 2 are together in one PDF per map. For example: mapboth_001.pdf, mapboth_002.pdf, mapboth_003.pdf... (they don't have to go into a new pdf file (mapboth), it's fine to append them to map1)
For each map1_ *.pdf
Walk through the directory and append map2_ *.pdf where the numbers (where the * is) in the file name match
There must be a way to do it using python. Maybe with a combination of arcpy, os.walk or os.listdir, and pyPdf and a for loop?
for pdf in os.walk(datadirectory):
??
Any ideas? Thanks kindly for your help.
A PDF file is structured in a different way than a plain text file. Simply putting two PDF files together wouldn't work, as the file's structure and contents could be overwritten or become corrupt. You could certainly author your own, but that would take a fair amount of time, and intimate knowledge of how a PDF is internally structured.
That said, I would recommend that you look into pyPDF. It supports the merging feature that you're looking for.
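For example, a minimal sketch with pyPdf that joins page 1 and page 2 of one map (the file names are placeholders; the fuller script in the next answer adds the matching logic):

from pyPdf import PdfFileReader, PdfFileWriter

writer = PdfFileWriter()
for fname in ("map1_001.pdf", "map2_001.pdf"):          # placeholder names
    # keep the input files open until write(); pyPdf resolves page data lazily
    reader = PdfFileReader(open(fname, "rb"))
    for page_num in range(reader.getNumPages()):
        writer.addPage(reader.getPage(page_num))

with open("mapboth_001.pdf", "wb") as out:
    writer.write(out)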
This should properly find and collate all the files to be merged; it still needs the actual .pdf-merging code.
Edit: I have added pdf-writing code based on the pyPdf example code. It is not tested, but should (as nearly as I can tell) work properly.
Edit2: realized I had the map-numbering crossways; rejigged it to merge the right sets of maps.
import collections
import glob
import re

# probably need to install this module -
# pip install pyPdf
from pyPdf import PdfFileWriter, PdfFileReader

def group_matched_files(filespec, reg, keyFn, dataFn):
    res = collections.defaultdict(list)
    reg = re.compile(reg)
    for fname in glob.glob(filespec):
        data = reg.match(fname)
        if data is not None:
            res[keyFn(data)].append(dataFn(data))
    return res

def merge_pdfs(fnames, newname):
    print("Merging {} to {}".format(",".join(fnames), newname))
    # create new output pdf
    newpdf = PdfFileWriter()
    # keep the input files open until the output is written;
    # pyPdf resolves page contents lazily from the source streams
    open_files = []
    # for each file to merge
    for fname in fnames:
        inf = open(fname, "rb")
        open_files.append(inf)
        oldpdf = PdfFileReader(inf)
        # for each page in the file
        for pg in range(oldpdf.getNumPages()):
            # copy it to the output file
            newpdf.addPage(oldpdf.getPage(pg))
    # write finished output
    with open(newname, "wb") as outf:
        newpdf.write(outf)
    for inf in open_files:
        inf.close()

def main():
    matches = group_matched_files(
        "map*.pdf",
        r"map(\d+)_(\d+)\.pdf$",
        lambda d: "{}".format(d.group(2)),
        lambda d: "map{}_".format(d.group(1))
    )
    for map, pages in matches.iteritems():
        # pass a list (not a generator) so it can be joined for printing and still iterated
        merge_pdfs([page + map + '.pdf' for page in sorted(pages)],
                   "merged{}.pdf".format(map))

if __name__ == "__main__":
    main()
I don't have any test pdfs to try and combine but I tested with a cat command on text files.
You can try this out (I'm assuming unix based system): merge.py
import os, re

files = os.listdir("/home/user/directory_with_maps/")
files = [x for x in files if re.search("map1_", x)]

while len(files) > 0:
    current = files[0]
    search = re.search(r"_(\d+).pdf", current)
    if search:
        name = search.group(1)
        cmd = "gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=FULLMAP_%s.pdf %s map2_%s.pdf" % (name, current, name)
        os.system(cmd)
    files.remove(current)
Basically it grabs the list of map1_ files, then for each one assumes the matching map2_ file exists and merges the two by the number in the name. (I can see using a counter to do this and padding with 0's to get a similar effect.)
Test the gs command first though, I just grabbed it from http://hints.macworld.com/article.php?story=2003083122212228.
There are examples of how to do this on the pdfrw project page at googlecode:
http://code.google.com/p/pdfrw/wiki/ExampleTools
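For instance, a merge along the lines of pdfrw's cat example might look like this (untested sketch; file names are placeholders):

from pdfrw import PdfReader, PdfWriter

writer = PdfWriter()
for fname in ("map1_001.pdf", "map2_001.pdf"):   # placeholder names
    writer.addpages(PdfReader(fname).pages)
writer.write("mapboth_001.pdf")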