How to merge WMA files into MP3 (with header editing)? - python

I have some .wma files which I am trying to merge into a single one...
I started with Python, reading the files in bytes and writing them into a new one, just as I tried the cmd command copy /b file1.wma + file2.wma + else.wma total.wma
Both came up with the same result: the merged file was as large in bytes as the real total of my segments, but when I try to open it, it plays only the first segment in both length (time) and content - meaning that I have a 15 MB, 10-second voice :-))
I tried this with different .wma files, but each time the output matches the first file in length and content while matching the total of them in size.
My assumption is that somewhere in the .wma data frames (maybe in the file header) there is data about the length of the current file, so that after merging, when the player attempts to play the file, it reads that duration and stops once that time is reached, or something like that.
So I need to edit those data frames or the header (if they even exist) in a way that matches my final output, or simply make the player ignore them.
But I don't know whether this assumption is right, or how I can do that.
.wma file sample: https://github.com/Fsunroo/PowerPointVoiceExtract (media1.wma and media2.wma for example)
Note: there is no such problem with web applications that join songs (maybe they do edit the header??!)
Note 2: this is part of my code which extracts voice from a PowerPoint file.

I solved the problem by using moviepy.editor.
The corrected project is accessible at: https://github.com/Fsunroo/PowerPointVoiceExtract
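For reference, a minimal sketch of the moviepy approach (file names are examples, not necessarily the repository's actual paths); decoding the clips, concatenating them, and encoding the result once as MP3 sidesteps the WMA header problem entirely:
from moviepy.editor import AudioFileClip, concatenate_audioclips

# Load each WMA segment as an audio clip (moviepy decodes via ffmpeg),
# concatenate them, and encode the result as a single MP3.
clips = [AudioFileClip(name) for name in ("media1.wma", "media2.wma")]
merged = concatenate_audioclips(clips)
merged.write_audiofile("total.mp3")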

Related

Backblaze video duration issue in the merged large video file

I have a video file of roughly 100 MB, which I have split into 3 parts of 35 MB, 35 MB, and 30 MB.
Steps I have done:
I called start_large_file and got the fileId.
I successfully uploaded all the video parts using upload_part, providing the fileId, part_number, sha1, content length, and input_stream.
Finally, I called the finish_large_file API with the fileId and the sha1 array of all the parts. The API gave a successful response, with action equal to upload.
Now, when I hit the merged file's URL, the video duration is equal to that of part 1, but the size is equal to 100 MB.
So the issue is with the merged video's duration. The duration should be equal to that of all the parts combined.
Splitting a video file with FFmpeg will result in multiple shorter videos, each with its own header containing the length of the video, amongst other metadata. When the parts are recombined, the player will look at the header at the beginning of the file for the video length. It doesn't know that there is additional content after that first part.
If you're on Linux or a Mac, you can use the split command, like this:
split -b 35M my_video.mp4 video_parts_
This will result in three output files:
video_parts_aa - 35MB
video_parts_ab - 35MB
video_parts_ac - 30MB
These are the files you should upload (in order!). When they are recombined the result will be identical to the original file.
The easiest way to do this on Windows seems to be to obtain the split command via Cygwin.
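If installing Cygwin is not an option, the same byte-for-byte split can be sketched in Python (names and chunk size are examples); because nothing rewrites the headers, concatenating the parts in order reproduces the original file exactly:
# Split my_video.mp4 into raw 35 MB chunks, equivalent to `split -b 35M`.
CHUNK = 35 * 1024 * 1024

with open("my_video.mp4", "rb") as src:
    part = 0
    while True:
        data = src.read(CHUNK)
        if not data:
            break
        with open(f"video_parts_{part:02d}", "wb") as dst:
            dst.write(data)
        part += 1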

How to Decompress a TAR file into TXT (read a CEL file) in either Python or R

I was wondering if anyone knows how to decompress TAR files in R, and how to extract data from large numbers of GZ files? In addition, does anyone know how to read large amounts of data (around a hundred files) simultaneously while maintaining the integrity of the data files? At some point my computer can't handle the amount of data and begins to output scribbles.
As a novice programmer still learning, I was given an assignment to analyze and cross-reference data on similar genes found between different cell structures for a disease trait. I managed to access TXT dataset files and formatted them to be recognized by another program known as GSEA.
1.) I installed software known as "WinZip", which helped me decompress my TAR files into GZ files.
I stored these files in a newly created folder under "Downloads".
2.) I then tried to use R to access the files with this code:
>untar("file.tar", list=TRUE)
And it produced approximately 170 results (it converted the TAR into GZ files).
3.) When I tried to read one of the GZ files, it generated over a thousand lines of single alphanumeric characters unintelligible to me:
>989 ™šBx
>990 33BŸ™šC:LÍC\005€
>991 LÍB¬
>992 B«™šBꙚB™™šB¯
>993 B¡
>994 BŸ
>995 C\003
>996 BŽ™šBð™šB¦
>997 B(
>998 LÍAòffBó
>999 LÍBñ™šBó
>1000 €
> [ reached 'max' / getOption("max.print") -- omitted 64340 rows ]
Warning messages:
>1: In read.table("GSM2458563_Control_1_0.CEL.gz") :
line 1 appears to contain embedded nulls
>2: In read.table("GSM2458563_Control_1_0.CEL.gz") :
line 2 appears to contain embedded nulls
>3: In read.table("GSM2458563_Control_1_0.CEL.gz") :
line 3 appears to contain embedded nulls
>4: In read.table("GSM2458563_Control_1_0.CEL.gz") :
line 4 appears to contain embedded nulls
>5: In read.table("GSM2458563_Control_1_0.CEL.gz") :
line 5 appears to contain embedded nulls
>6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input
What I am trying to do is access all of these files simultaneously without overloading the computer, while maintaining the integrity of the data. Then I want to read the information properly so that it resembles some sort of data table (ideally, I was wondering whether a conversion from TAR to TXT would be possible so that GSEA could read and identify the data).
Does anyone know any programs compatible with Windows that could properly decompress and read such files, or any R commands that would help me generate or convert such data files?
Background Research
I've been working on this for around an hour - here are the results.
The file you are trying to open, GSM2458563_Control_1_0, is compressed inside a .gz file and contains a binary .CEL file, which is why it is unreadable as plain text.
Such files are published by the National Center for Biotechnology Information (NCBI).
There is Python 2 code to open them:
from Bio.Affy import CelFile
with open('GSM2458563_Control_1_0.CEL') as file:
    c = CelFile.read(file)
I've found documentation about Bio.Affy in version 1.74 of biopython.
Yet the current biopython README says:
"...Biopython 1.76 was our final release to support Python 2.7 and Python 3.5."
Nowadays Python 2 is deprecated, not to mention that the library mentioned above has evolved and changed considerably since then.
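For the unpacking step itself (as opposed to parsing the binary .CEL contents), Python 3's standard library is enough on Windows as well; a minimal sketch, using the asker's file name as an example:
import gzip
import shutil
import tarfile
from pathlib import Path

# Extract every member of the archive, then decompress each .gz member.
with tarfile.open("file.tar") as tar:
    tar.extractall("extracted")

for gz_path in Path("extracted").glob("*.gz"):
    # Streaming copy, so even large files never sit fully in memory.
    with gzip.open(gz_path, "rb") as src, open(gz_path.with_suffix(""), "wb") as dst:
        shutil.copyfileobj(src, dst)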
Solution
So I found another way around it, using R.
My specs:
Operating System: Windows, 64-bit
RStudio: Version 1.3.1073
R Version: R-4.0.2 for Windows
I pre-installed the dependencies mentioned below.
Use the GEOquery::getGEO function to fetch the file from NCBI GEO.
# Prerequisites
# Download and install Rtools from http://cran.r-project.org/bin/windows/Rtools/
# Install BiocManager
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("GEOquery")
library(GEOquery)
# Download and open the data
gse <- getGEO("GSM2458563", GSEMatrix = TRUE)
show(gse)
# ****** Data Table ******
# ID_REF VALUE
# 1 7892501 1.267832
# 2 7892502 3.254963
# 3 7892503 1.640587
# 4 7892504 7.198422
# 5 7892505 2.226013

Memory issues when row-wise appending 2 csv files

I have a larger CSV file (about 550 MB) and a smaller CSV file (about 5 MB), and I want to combine all the rows into one CSV file. They both have the same header (same order, values, and number of columns), and obviously the larger file has more rows. I'm using 32-bit Python (can't change it) and I'm having issues appending the CSVs. The top answer and the one after it seem to work here: How do I combine large csv files in python?. However, this takes an ungodly amount of time, and I am looking for ways to expedite the process. Also, when I stop running the code from the second answer (since it takes so long), the first row in the resulting CSV is always empty. I guess when you call pd.to_csv(..., mode='a', ...), it appends below the first row of the CSV. How do you ensure the first row is populated?
This is much simpler on the Linux command line, and it won't need to load the files into memory.
Use the tail command; -n +2 tells tail to start output at line 2, i.e. to skip the header row (depending on how a file is formatted, you may need a different offset):
tail -n +2 small.csv >> giant.csv
This should do the trick.
If you need to do it in Python, opening the destination in append mode and streaming the source line by line avoids loading either file into memory.
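A minimal sketch of that streaming approach (file names are examples):
# Append small.csv to giant.csv without loading either file into memory.
with open('small.csv', 'r', newline='') as src, open('giant.csv', 'a', newline='') as dst:
    next(src)          # skip the duplicate header row
    for line in src:
        dst.write(line)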

Analyse and cancel common frequency of wave file

I was working on audio processing but got stuck. I have a video file which I first converted to a .wav file. Actually, I need to extract only the vocal part. So far I have managed to remove the vocal part, leaving only the background sound. That means I now have two files: the main file and the music-only file, i.e. the karaoke file. Both files have the same sample rate. My plan is to compare the two files: wherever the main file and the karaoke file are exactly the same at the same instant, the difference will be zero, and in the process I can extract only the vocal parts. I am new to Octave and Matlab. I am attaching my work so far.
[wave,fs]=wavread('music.wav');
[wave1,fs1]=wavread('sound.wav');
t=0:1/fs:(length(wave)-1)/fs;
t1=0:1/fs1:(length(wave1)-1)/fs1;
for x=0:length(wave)
  if (wave{x}==wave1{x})
    wave2{x}=wave{x}-wave1{x};
  else
  endif
endfor
The for loop is showing an error.
EDIT: OK, the question I asked was not actually the question. What I want is to extract only the vocal part of an audio file.
In Matlab (or Octave) you can only index a matrix with parentheses, "()". Moreover, the start index in Matlab is 1, not 0 as in C or Java. So your code can be corrected to:
for x=1:length(wave)
  if (wave(x)==wave1(x))
    wave2(x)=wave(x)-wave1(x);
  end
end
You can also remove the for loop by using element-wise array operations:
wave2 = (wave - wave1).*(wave1==wave);
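For what it's worth, the same subtraction idea can be sketched in Python too; this assumes both files are 16-bit PCM, sample-aligned, and share the same rate and channel count (numpy and scipy are assumed to be available):
import numpy as np
from scipy.io import wavfile

rate_a, full_mix = wavfile.read('music.wav')   # mix: vocals + music
rate_b, karaoke = wavfile.read('sound.wav')    # instrumental only
assert rate_a == rate_b, 'sample rates must match'

# Subtract the instrumental from the mix; widen to int32 to avoid overflow.
n = min(len(full_mix), len(karaoke))
diff = full_mix[:n].astype(np.int32) - karaoke[:n].astype(np.int32)
vocals = np.clip(diff, -32768, 32767).astype(np.int16)
wavfile.write('vocals.wav', rate_a, vocals)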

Linux and python: Combining multiple wave files to one wave file

I am looking for a way to combine multiple wave files into one wave file using Python, and run it on Linux. I don't want to use any add-ons other than the default shell command line and the default Python modules.
For example, if I have a.wav and b.wav, I want to create a c.wav which starts with the content of a.wav, followed by b.wav.
I've found the wave module, with which I can open a wave file and write into a new file. Since I'm really new to this audio world, I still can't figure out how to do it. Below is my code:
import struct, wave
waveFileA = wave.open('./a.wav', 'r')
waveFileB = wave.open('./b.wav', 'r')
waveFileC = wave.open('./c.wav', 'w')
lengthA = waveFileA.getnframes()
for i in range(0, lengthA):
    waveFileC.writeframes(waveFileA.readframes(1))
lengthB = waveFileB.getnframes()
for i in range(0, lengthB):
    waveFileC.writeframes(waveFileB.readframes(1))
waveFileA.close()
waveFileB.close()
waveFileC.close()
When I run this code, I get this error:
wave.Error: # channels not specified
Can anyone help me, please?
You need to set the number of channels, sample width, and frame rate:
waveFileC.setnchannels(waveFileA.getnchannels())
waveFileC.setsampwidth(waveFileA.getsampwidth())
waveFileC.setframerate(waveFileA.getframerate())
If you want to handle a.wav and b.wav having different settings, you'll want to use something like pysox to convert them to the same settings, or for nchannels and sampwidth you may be able to tough through it yourself.
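Putting those calls together with the question's code, a minimal corrected sketch (assuming a.wav and b.wav share the same settings) could look like this:
import wave

# Assumes a.wav and b.wav have identical channels, sample width, and rate.
with wave.open('a.wav', 'rb') as wavA, wave.open('b.wav', 'rb') as wavB:
    with wave.open('c.wav', 'wb') as wavC:
        # Output parameters must be set before any frames are written.
        wavC.setnchannels(wavA.getnchannels())
        wavC.setsampwidth(wavA.getsampwidth())
        wavC.setframerate(wavA.getframerate())
        wavC.writeframes(wavA.readframes(wavA.getnframes()))
        wavC.writeframes(wavB.readframes(wavB.getnframes()))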
Looks like you need to call n=waveFileA.getnchannels() to find out how many channels the first input file uses, likewise for waveFileB, then you'll need to use waveFileC.setnchannels(n) to tell it how many channels to put in the outgoing file. I don't know how it will handle input files with different numbers of channels...
Here is the answer I was looking for:
How to join two wav files using python?
(look for the answer by Tom 10)
It's in another thread; someone already solved this problem.
