Word2Vec error when loading in GoogleNews data


I am following a tutorial here: https://towardsdatascience.com/multi-class-text-classification-model-comparison-and-selection-5eb066197568
I am at the part "Word2vec and Logistic Regression". I have downloaded the "GoogleNews-vectors-negative300.bin.gz" file and I am trying to apply it to my own text data. However, when I get to the following code:
%%time
from gensim.models import KeyedVectors
wv = KeyedVectors.load_word2vec_format("/data/users/USERS/File_path/classifier/GoogleNews_Embedding/GoogleNews-vectors-negative300.bin.gz", binary=True)
wv.init_sims(replace=True)
I run into the following error:
/data/users/msmith/env/lib64/python3.6/site-packages/smart_open/smart_open_lib.py:398: UserWarning: This function is deprecated, use smart_open.open instead. See the migration notes for details: https://github.com/RaRe-Technologies/smart_open/blob/master/README.rst#migrating-to-the-new-open-function
'See the migration notes for details: %s' % _MIGRATION_NOTES_URL
---------------------------------------------------------------------------
EOFError Traceback (most recent call last)
<timed exec> in <module>
~/env/lib64/python3.6/site-packages/gensim/models/keyedvectors.py in load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
1492 return _load_word2vec_format(
1493 cls, fname, fvocab=fvocab, binary=binary, encoding=encoding, unicode_errors=unicode_errors,
-> 1494 limit=limit, datatype=datatype)
1495
1496 def get_keras_embedding(self, train_embeddings=False):
~/env/lib64/python3.6/site-packages/gensim/models/utils_any2vec.py in _load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
383 with utils.ignore_deprecation_warning():
384 # TODO use frombuffer or something similar
--> 385 weights = fromstring(fin.read(binary_len), dtype=REAL).astype(datatype)
386 add_word(word, weights)
387 else:
/usr/lib64/python3.6/gzip.py in read(self, size)
274 import errno
275 raise OSError(errno.EBADF, "read() on write-only GzipFile object")
--> 276 return self._buffer.read(size)
277
278 def read1(self, size=-1):
/usr/lib64/python3.6/_compression.py in readinto(self, b)
66 def readinto(self, b):
67 with memoryview(b) as view, view.cast("B") as byte_view:
---> 68 data = self.read(len(byte_view))
69 byte_view[:len(data)] = data
70 return len(data)
/usr/lib64/python3.6/gzip.py in read(self, size)
480 break
481 if buf == b"":
--> 482 raise EOFError("Compressed file ended before the "
483 "end-of-stream marker was reached")
484
EOFError: Compressed file ended before the end-of-stream marker was reached
Any idea what's gone wrong or how to overcome this issue?
Thanks in advance!
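The final EOFError in the traceback is raised by Python's gzip module rather than by gensim itself, which usually points to a truncated or incomplete copy of the archive. A minimal sketch for checking that independently of gensim follows; the function name `gzip_is_complete` and the chunk size are illustrative choices, not part of any library.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1024 * 1024):
    """Return True if the whole gzip stream at `path` can be decompressed."""
    try:
        with gzip.open(path, "rb") as fin:
            # Read and discard the data chunk by chunk so memory use stays small.
            while fin.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ended early; OSError: not valid gzip or a bad checksum;
        # zlib.error: the compressed data itself is corrupted.
        return False
    return True
```

If this returns False for GoogleNews-vectors-negative300.bin.gz, downloading the file again and comparing its size with the size reported by the source is a reasonable next step.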

Related

NoBackendError while getting features of audio files using librosa

I'm using a Jupyter notebook to run this. The whole program is available at this link: https://github.com/MiteshPuthran/Speech-Emotion-Analyzer/blob/master/final_results_gender_test.ipynb
I tried using ffmpeg and tried another .wav file, but nothing seems to work. Please help.
This is the code:
df = pd.DataFrame(columns=['feature'])
bookmark=0
for index,y in enumerate(mylist):
    if mylist[index][6:-16]!='01' and mylist[index][6:-16]!='07' and mylist[index][6:-16]!='08' and mylist[index][:2]!='su' and mylist[index][:1]!='n' and mylist[index][:1]!='d':
        X, sample_rate = librosa.load('C:/Users/Admin/Desktop/pw-4/Speech-Emotion-Analyzer-master/Speech-Emotion-Analyzer-master/'+y, res_type='kaiser_fast',duration=2.5,sr=22050*2,offset=0.5)
        sample_rate = np.array(sample_rate)
        mfccs = np.mean(librosa.feature.mfcc(y=X,
                                            sr=sample_rate,
                                            n_mfcc=13),
                        axis=0)
        feature = mfccs
        #[float(i) for i in feature]
        #feature1=feature[:135]
        df.loc[bookmark] = [feature]
        bookmark=bookmark+1
And this is the error I'm getting:
RuntimeError Traceback (most recent call last)
File ~\AppData\Roaming\Python\Python39\site-packages\librosa\core\audio.py:155, in load(path, sr, mono, offset, duration, dtype, res_type)
153 else:
154 # Otherwise, create the soundfile object
--> 155 context = sf.SoundFile(path)
157 with context as sf_desc:
File ~\AppData\Roaming\Python\Python39\site-packages\soundfile.py:629, in SoundFile.__init__(self, file, mode, samplerate, channels, subtype, endian, format, closefd)
627 self._info = _create_info_struct(file, mode, samplerate, channels,
628 format, subtype, endian)
--> 629 self._file = self._open(file, mode_int, closefd)
630 if set(mode).issuperset('r+') and self.seekable():
631 # Move write position to 0 (like in Python file objects)
File ~\AppData\Roaming\Python\Python39\site-packages\soundfile.py:1183, in SoundFile._open(self, file, mode_int, closefd)
1182 raise TypeError("Invalid file: {0!r}".format(self.name))
-> 1183 _error_check(_snd.sf_error(file_ptr),
1184 "Error opening {0!r}: ".format(self.name))
1185 if mode_int == _snd.SFM_WRITE:
1186 # Due to a bug in libsndfile version <= 1.0.25, frames != 0
1187 # when opening a named pipe in SFM_WRITE mode.
1188 # See http://github.com/erikd/libsndfile/issues/77.
File ~\AppData\Roaming\Python\Python39\site-packages\soundfile.py:1357, in _error_check(err, prefix)
1356 err_str = _snd.sf_error_number(err)
-> 1357 raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'C:/Users/Admin/Desktop/pw-4/Speech-Emotion-Analyzer-master/Speech-Emotion-Analyzer-master/AudioRecorder.ipynb': File contains data in an unknown format.
During handling of the above exception, another exception occurred:
NoBackendError Traceback (most recent call last)
Input In [46], in <cell line: 3>()
3 for index,y in enumerate(mylist):
4 if mylist[index][6:-16]!='01' and mylist[index][6:-16]!='07' and mylist[index][6:-16]!='08' and mylist[index][:2]!='su' and mylist[index][:1]!='n' and mylist[index][:1]!='d':
----> 5 X, sample_rate = librosa.load('C:/Users/Admin/Desktop/pw-4/Speech-Emotion-Analyzer-master/Speech-Emotion-Analyzer-master/'+y, res_type='kaiser_fast',duration=2.5,sr=22050*2,offset=0.5)
6 sample_rate = np.array(sample_rate)
7 mfccs = np.mean(librosa.feature.mfcc(y=X,
8 sr=sample_rate,
9 n_mfcc=13),
10 axis=0)
File ~\AppData\Roaming\Python\Python39\site-packages\librosa\util\decorators.py:88, in deprecate_positional_args.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
86 extra_args = len(args) - len(all_args)
87 if extra_args <= 0:
---> 88 return f(*args, **kwargs)
90 # extra_args > 0
91 args_msg = [
92 "{}={}".format(name, arg)
93 for name, arg in zip(kwonly_args[:extra_args], args[-extra_args:])
94 ]
File ~\AppData\Roaming\Python\Python39\site-packages\librosa\core\audio.py:174, in load(path, sr, mono, offset, duration, dtype, res_type)
172 if isinstance(path, (str, pathlib.PurePath)):
173 warnings.warn("PySoundFile failed. Trying audioread instead.", stacklevel=2)
--> 174 y, sr_native = __audioread_load(path, offset, duration, dtype)
175 else:
176 raise (exc)
File ~\AppData\Roaming\Python\Python39\site-packages\librosa\core\audio.py:198, in __audioread_load(path, offset, duration, dtype)
192 """Load an audio buffer using audioread.
193
194 This loads one block at a time, and then concatenates the results.
195 """
197 y = []
--> 198 with audioread.audio_open(path) as input_file:
199 sr_native = input_file.samplerate
200 n_channels = input_file.channels
File ~\AppData\Roaming\Python\Python39\site-packages\audioread\__init__.py:116, in audio_open(path, backends)
113 pass
115 # All backends failed!
--> 116 raise NoBackendError()
NoBackendError:
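The RuntimeError above names AudioRecorder.ipynb, a notebook rather than an audio file, so `mylist` appears to contain every file in the folder and not only recordings. A minimal sketch of filtering the list before the loop follows; `audio_files_only` and its default extension tuple are hypothetical names chosen for illustration.

```python
import os


def audio_files_only(filenames, extensions=(".wav",)):
    """Keep only names whose extension (case-insensitive) is in `extensions`."""
    return [
        name for name in filenames
        if os.path.splitext(name)[1].lower() in extensions
    ]
```

Assuming `mylist` was built with os.listdir, it could be passed through `audio_files_only(mylist)` before the loop so that notebooks and other non-audio files are never handed to librosa.load.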

Error opening file because of an unknown format

I read some audio files, labeled them, and saved the path and emotion of each audio file in a CSV file. Now I want to read their paths from the file and open them, but I get this error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
File ~\.conda\envs\nhashemi\lib\site-packages\librosa\core\audio.py:155, in load(path, sr, mono, offset, duration, dtype, res_type)
153 else:
154 # Otherwise, create the soundfile object
--> 155 context = sf.SoundFile(path)
157 with context as sf_desc:
File ~\.conda\envs\nhashemi\lib\site-packages\soundfile.py:629, in SoundFile.__init__(self, file, mode, samplerate, channels, subtype, endian, format, closefd)
627 self._info = _create_info_struct(file, mode, samplerate, channels,
628 format, subtype, endian)
--> 629 self._file = self._open(file, mode_int, closefd)
630 if set(mode).issuperset('r+') and self.seekable():
631 # Move write position to 0 (like in Python file objects)
File ~\.conda\envs\nhashemi\lib\site-packages\soundfile.py:1183, in SoundFile._open(self, file, mode_int, closefd)
1182 raise TypeError("Invalid file: {0!r}".format(self.name))
-> 1183 _error_check(_snd.sf_error(file_ptr),
1184 "Error opening {0!r}: ".format(self.name))
1185 if mode_int == _snd.SFM_WRITE:
1186 # Due to a bug in libsndfile version <= 1.0.25, frames != 0
1187 # when opening a named pipe in SFM_WRITE mode.
1188 # See http://github.com/erikd/libsndfile/issues/77.
File ~\.conda\envs\nhashemi\lib\site-packages\soundfile.py:1357, in _error_check(err, prefix)
1356 err_str = _snd.sf_error_number(err)
-> 1357 raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'C:/Users/external_dipf/Documents/Dataset/CREMA/AudioWAV/1001_IEO_FEA_HI.wav': File contains data in an unknown format.
During handling of the above exception, another exception occurred:
NoBackendError Traceback (most recent call last)
Input In [553], in <cell line: 3>()
1 emotion='fear'
2 path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
----> 3 data, sampling_rate = librosa.load(path)
4 create_waveplot(data, sampling_rate, emotion)
5 create_spectrogram(data, sampling_rate, emotion)
File ~\.conda\envs\nhashemi\lib\site-packages\librosa\util\decorators.py:88, in deprecate_positional_args.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
86 extra_args = len(args) - len(all_args)
87 if extra_args <= 0:
---> 88 return f(*args, **kwargs)
90 # extra_args > 0
91 args_msg = [
92 "{}={}".format(name, arg)
93 for name, arg in zip(kwonly_args[:extra_args], args[-extra_args:])
94 ]
File ~\.conda\envs\nhashemi\lib\site-packages\librosa\core\audio.py:174, in load(path, sr, mono, offset, duration, dtype, res_type)
172 if isinstance(path, (str, pathlib.PurePath)):
173 warnings.warn("PySoundFile failed. Trying audioread instead.", stacklevel=2)
--> 174 y, sr_native = __audioread_load(path, offset, duration, dtype)
175 else:
176 raise (exc)
File ~\.conda\envs\nhashemi\lib\site-packages\librosa\core\audio.py:198, in __audioread_load(path, offset, duration, dtype)
192 """Load an audio buffer using audioread.
193
194 This loads one block at a time, and then concatenates the results.
195 """
197 y = []
--> 198 with audioread.audio_open(path) as input_file:
199 sr_native = input_file.samplerate
200 n_channels = input_file.channels
File ~\.conda\envs\nhashemi\lib\site-packages\audioread\__init__.py:116, in audio_open(path, backends)
113 pass
115 # All backends failed!
--> 116 raise NoBackendError()
NoBackendError:
Here is my code to label and specify the label (emotion) of each file
CREMA ="C:/Users/external_dipf/Documents/Dataset/CREMA/AudioWAV/"
crema_directory_list = os.listdir(CREMA)
file_emotion = []
file_path = []
for file in crema_directory_list:
    # storing file paths
    file_path.append(CREMA + file)
    # storing file emotions
    part=file.split('_')
    if part[2] == 'SAD':
        file_emotion.append('sad')
    elif part[2] == 'ANG':
        file_emotion.append('angry')
    elif part[2] == 'DIS':
        file_emotion.append('disgust')
    elif part[2] == 'FEA':
        file_emotion.append('fear')
    elif part[2] == 'HAP':
        file_emotion.append('happy')
    elif part[2] == 'NEU':
        file_emotion.append('neutral')
    else:
        file_emotion.append('Unknown')
# dataframe for emotion of files
emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])
# dataframe for path of files.
path_df = pd.DataFrame(file_path, columns=['Path'])
CREMA_df = pd.concat([emotion_df, path_df], axis=1)
CREMA_df.head()
Here is where I save them in a CSV file:
data_path = pd.concat([CREMA_df, RAVDESS_df, TESS_df, SAVEE_df], axis = 0)
data_path.to_csv("data_path.csv",index=False)
data_path.head()
and here I am trying to read the file. The error is related to the CREMA dataset.
emotion='fear'
path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)
I checked the path file, everything was correct. I can open other wav files. My librosa version is 0.9.1
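Since the path is correct but libsndfile reports "unknown format", one thing worth ruling out is that the file on disk is not really WAV data (for example, a placeholder or an interrupted download). A minimal sketch that inspects the first bytes of the file follows; `looks_like_wav` is a hypothetical helper, and it only recognises the common RIFF/WAVE layout, not variants such as RF64.

```python
def looks_like_wav(path):
    """Return True if the file starts with a RIFF/WAVE header."""
    with open(path, "rb") as fin:
        header = fin.read(12)
    # A canonical WAV file begins with b"RIFF", a 4-byte size field, then b"WAVE".
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"
```

If this returns False for 1001_IEO_FEA_HI.wav while returning True for files that load correctly, the problem is more likely in the file contents than in librosa.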

I can't seem to find a fix for the "ValueError: Unknown subheader signature" raised while reading sas file using pd.read_sas?

I am trying to load a sas7bdat file in Python using pd.read_sas(), but loading fails with the error below.
ValueError Traceback (most recent call last)
<ipython-input-148-64f915da8256> in <module>
----> 1 df_sas = pd.read_sas('input_sasfile.sas7bdat', format='sas7bdat')
~\.conda\envs\overloaded-new\lib\site-packages\pandas\io\sas\sasreader.py in read_sas(filepath_or_buffer, format, index, encoding, chunksize, iterator)
121
122 reader = SAS7BDATReader(
--> 123 filepath_or_buffer, index=index, encoding=encoding, chunksize=chunksize
124 )
125 else:
~\.conda\envs\overloaded-new\lib\site-packages\pandas\io\sas\sas7bdat.py in __init__(self, path_or_buf, index, convert_dates, blank_missing, chunksize, encoding, convert_text, convert_header_text)
144
145 self._get_properties()
--> 146 self._parse_metadata()
147
148 def column_data_lengths(self):
~\.conda\envs\overloaded-new\lib\site-packages\pandas\io\sas\sas7bdat.py in _parse_metadata(self)
349 self.close()
350 raise ValueError("Failed to read a meta data page from the SAS file.")
--> 351 done = self._process_page_meta()
352
353 def _process_page_meta(self):
~\.conda\envs\overloaded-new\lib\site-packages\pandas\io\sas\sas7bdat.py in _process_page_meta(self)
355 pt = [const.page_meta_type, const.page_amd_type] + const.page_mix_types
356 if self._current_page_type in pt:
--> 357 self._process_page_metadata()
358 is_data_page = self._current_page_type & const.page_data_type
359 is_mix_page = self._current_page_type in const.page_mix_types
~\.conda\envs\overloaded-new\lib\site-packages\pandas\io\sas\sas7bdat.py in _process_page_metadata(self)
388 subheader_signature = self._read_subheader_signature(pointer.offset)
389 subheader_index = self._get_subheader_index(
--> 390 subheader_signature, pointer.compression, pointer.ptype
391 )
392 self._process_subheader(subheader_index, pointer)
~\.conda\envs\overloaded-new\lib\site-packages\pandas\io\sas\sas7bdat.py in _get_subheader_index(self, signature, compression, ptype)
401 else:
402 self.close()
--> 403 raise ValueError("Unknown subheader signature")
404 return index
405
ValueError: Unknown subheader signature
I found a relevant GitHub issue (https://github.com/pandas-dev/pandas/issues/24794), but it was closed because the problem was resolved by updating pandas.
Any help is greatly appreciated.

How can I process OPUS format with Librosa?

I am trying to generate spectrograms using Librosa. With .wav files it worked fine, but after I changed the format to the OPUS audio codec and ran the same code, it gave me the error below.
X, sample_rate = librosa.load('TESS emotion datasets opus/OAF_Fear/OAF_beg_fear.opus', res_type='kaiser_fast', duration = 2.5, sr = 22050*2, offset = 0.5)
Error generated:
RuntimeError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/librosa/core/audio.py in load(path, sr, mono, offset, duration, dtype, res_type)
145 try:
--> 146 with sf.SoundFile(path) as sf_desc:
147 sr_native = sf_desc.samplerate
~/anaconda3/lib/python3.6/site-packages/soundfile.py in __init__(self, file, mode, samplerate, channels, subtype, endian, format, closefd)
628 format, subtype, endian)
--> 629 self._file = self._open(file, mode_int, closefd)
630 if set(mode).issuperset('r+') and self.seekable():
~/anaconda3/lib/python3.6/site-packages/soundfile.py in _open(self, file, mode_int, closefd)
1183 _error_check(_snd.sf_error(file_ptr),
-> 1184 "Error opening {0!r}: ".format(self.name))
1185 if mode_int == _snd.SFM_WRITE:
~/anaconda3/lib/python3.6/site-packages/soundfile.py in _error_check(err, prefix)
1356 err_str = _snd.sf_error_number(err)
-> 1357 raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
1358
RuntimeError: Error opening 'TESS emotion datasets opus/OAF_Fear/OAF_beg_fear.opus': File contains data in an unimplemented format.
During handling of the above exception, another exception occurred:
NoBackendError Traceback (most recent call last)
<ipython-input-39-1372f02f676e> in <module>()
----> 1 X, sample_rate = librosa.load('TESS emotion datasets opus/OAF_Fear/OAF_beg_fear.opus', res_type='kaiser_fast', duration = 2.5, sr = 22050*2, offset = 0.5)
~/anaconda3/lib/python3.6/site-packages/librosa/core/audio.py in load(path, sr, mono, offset, duration, dtype, res_type)
161 if isinstance(path, (str, pathlib.PurePath)):
162 warnings.warn("PySoundFile failed. Trying audioread instead.")
--> 163 y, sr_native = __audioread_load(path, offset, duration, dtype)
164 else:
165 raise (exc)
~/anaconda3/lib/python3.6/site-packages/librosa/core/audio.py in __audioread_load(path, offset, duration, dtype)
185
186 y = []
--> 187 with audioread.audio_open(path) as input_file:
188 sr_native = input_file.samplerate
189 n_channels = input_file.channels
~/anaconda3/lib/python3.6/site-packages/audioread/__init__.py in audio_open(path, backends)
114
115 # All backends failed!
--> 116 raise NoBackendError()
NoBackendError:
I tried installing ffmpeg and gstreamer, as suggested by some previous answers and the Librosa GitHub page, but that didn't solve the problem.
In contrast, this audio format works well when I run the same code in Google Colab.
What could be the reason for this error, and how can I solve it?
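One possible explanation for the difference between the two environments is that the decoder executable is installed but not visible on the PATH of the Python process running the notebook. A minimal sketch for checking this from inside the notebook follows; `find_decoders` is a hypothetical helper, and the executable names `ffmpeg` and `avconv` are an assumption about what the audioread fallback looks for.

```python
import shutil


def find_decoders(names=("ffmpeg", "avconv")):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}
```

If every value is None inside the notebook while `ffmpeg` runs from a terminal, the notebook kernel is probably using a different environment or PATH than the shell where ffmpeg was installed.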

OSError: [Errno 5] Input/output error when using Google Colaboratory

I was working just fine with Google Colaboratory when suddenly this error started to pop up each time I try to load any type of file. It first appeared when I was trying to read an HDF file; now nothing will open.
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-8-65c4e8d1c435> in <module>()
----> 1 eo=EOPatch.load('./20.Clean_Textural_features/eopatch_0')
2 eo
9 frames
/usr/local/lib/python3.6/dist-packages/eolearn/core/eodata.py in load(path, features, lazy_loading, filesystem)
530 path = '/'
531
--> 532 return load_eopatch(EOPatch(), filesystem, path, features=features, lazy_loading=lazy_loading)
533
534 def time_series(self, ref_date=None, scale_time=1):
/usr/local/lib/python3.6/dist-packages/eolearn/core/eodata_io.py in load_eopatch(eopatch, filesystem, patch_location, features, lazy_loading)
76 loading_data = executor.map(lambda loader: loader.load(), loading_data)
77
---> 78 for (ftype, fname, _), value in zip(features, loading_data):
79 eopatch[(ftype, fname)] = value
80
/usr/lib/python3.6/concurrent/futures/_base.py in result_iterator()
584 # Careful not to keep a reference to the popped future
585 if timeout is None:
--> 586 yield fs.pop().result()
587 else:
588 yield fs.pop().result(end_time - time.monotonic())
/usr/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
423 raise CancelledError()
424 elif self._state == FINISHED:
--> 425 return self.__get_result()
426
427 self._condition.wait(timeout)
/usr/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result
/usr/lib/python3.6/concurrent/futures/thread.py in run(self)
54
55 try:
---> 56 result = self.fn(*self.args, **self.kwargs)
57 except BaseException as exc:
58 self.future.set_exception(exc)
/usr/local/lib/python3.6/dist-packages/eolearn/core/eodata_io.py in <lambda>(loader)
74 if not lazy_loading:
75 with concurrent.futures.ThreadPoolExecutor() as executor:
---> 76 loading_data = executor.map(lambda loader: loader.load(), loading_data)
77
78 for (ftype, fname, _), value in zip(features, loading_data):
/usr/local/lib/python3.6/dist-packages/eolearn/core/eodata_io.py in load(self)
217 return self._decode(gzip_fp, self.path)
218
--> 219 return self._decode(file_handle, self.path)
220
221 def save(self, data, file_format, compress_level=0):
/usr/local/lib/python3.6/dist-packages/eolearn/core/eodata_io.py in _decode(file, path)
268
269 if FileFormat.NPY.extension() in path:
--> 270 return np.load(file)
271
272 raise ValueError('Unsupported data type.')
/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
434 _ZIP_SUFFIX = b'PK\x05\x06' # empty zip files start with this
435 N = len(format.MAGIC_PREFIX)
--> 436 magic = fid.read(N)
437 # If the file size is less than N, we need to make sure not
438 # to seek past the beginning of the file
OSError: [Errno 5] Input/output error
Also some notebooks won't open and this appears instead:
I looked around at similar posts here, but didn't understand anything. Therefore, any help would be highly appreciated.
PS: my files are in subfolders and not directly contained in 'My Drive'. I have also disabled all ad blockers and the problem persists.

I think the file/link has been used for downloading beyond its weekly limit. This answer may help you.
Some discussion about Google's policy on hosting data on drive.
The solution is to wait a couple of hours or days and try again.
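For short-lived failures, the "try again" advice can be automated. A minimal sketch follows, assuming the read is wrapped in a zero-argument callable; `load_with_retry`, `attempts`, and `delay_seconds` are hypothetical names, and this will not help if a quota stays exhausted for hours or days.

```python
import time


def load_with_retry(loader, attempts=3, delay_seconds=1.0):
    """Call `loader()` until it succeeds, sleeping between OSError failures."""
    for attempt in range(1, attempts + 1):
        try:
            return loader()
        except OSError:
            if attempt == attempts:
                raise  # Give up and surface the last error.
            time.sleep(delay_seconds * attempt)  # Wait a little longer each time.
```

For the question's case this could be called as `load_with_retry(lambda: EOPatch.load('./20.Clean_Textural_features/eopatch_0'))`.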
