Python nltk download and download_shell both freeze (hang) on punkt attempt - python

Using NLTK 2.0.4, installed for EPD's Python 2.7.3 (not Canopy), on Ubuntu 12.10. In the terminal I type:
In [96]: nltk.download_shell()
NLTK Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> d
Download which package (l=list; x=cancel)?
Identifier> punkt
Downloading package 'punkt' to /home/espears/nltk_data...
And then it freezes. The relevant punkt.zip file is written to the stated directory, but the download interface never returns control.
This example is with IPython, but I tried the same with the regular Python 2.7.3 interpreter and got the same result.
When I try to unzip the file directly with unzip, I get errors saying that the end-of-central-directory signature is not found in the file and that it cannot be unzipped. See below:
espears#computer ~/nltk_data/tokenizers $ unzip punkt.zip
Archive: punkt.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of punkt.zip or
punkt.zip.zip, and cannot find punkt.zip.ZIP, period.
This happens with both nltk.download() and nltk.download_shell() in the same way.
I can inspect the .zip file using du to see that initially its size grows from 0 MB to about 2.7 MB, so it is actually downloading something and the file is not empty. But it stops at 2.7 MB (which may or may not correspond to the expected full size of the file) and then the Python shell downloader freezes.

I had the same problem and downloaded the necessary items manually from the following link:
http://nltk.org/nltk_data/
Not the desired solution, but will work until this is fixed.
UPDATE:
I was actually able to run nltk.download() to install cmudict. Maybe this issue only affects certain packages?

I had the same problem with nltk 3.0.01b. I downloaded the "book" package and monitored the download from the task manager's network display while at the same time checking the size of the target folder (AppData\Roaming\nltk_data on my Windows 7 system). The network traffic ceased and the folder stopped growing at a size of 379 MB. But the Python shell was locked. The following was the last message displayed:
showing info http://nltk.github.com/nltk_data/
However, if you cancel out the Tk window that shows what download items are available, the nltk.download() command will terminate and the shell prompt will come back.

Most probably it is not stuck; it may still be downloading. The download runs at a much slower rate even if you have good internet connectivity. I kept checking the folder size in a while loop; it slowly kept increasing, and the download finally succeeded. It would probably have worked if you had waited. Unzipping might have failed because you tried to unzip before the entire file had downloaded.
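The folder-size check described above could be sketched roughly as follows. This is only an illustration, not the loop the answerer actually ran: the helper names dir_size_bytes and watch_growth, the polling interval, and the "unchanged for N checks" stopping rule are all assumptions.

```python
import os
import time


def dir_size_bytes(root):
    """Return the total size in bytes of all files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # A file may disappear or be locked mid-download; skip it.
                pass
    return total


def watch_growth(root, interval=5.0, stable_checks=3):
    """Print the size of root until it has stopped changing.

    Stops after the size was identical on `stable_checks` consecutive
    polls (a hypothetical stopping rule) and returns the final size.
    """
    last = -1
    unchanged = 0
    while unchanged < stable_checks:
        size = dir_size_bytes(root)
        print(f"{size / 1e6:.1f} MB")
        unchanged = unchanged + 1 if size == last else 0
        last = size
        time.sleep(interval)
    return last
```

Run it from a second terminal against the download directory, for example watch_growth(os.path.expanduser("~/nltk_data")). If the size keeps growing, the download is slow rather than hung.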

Related

EMFILE: Too many files open

I'm trying to set up an api on azure's web app service using bottle + anaconda packages.
I can't simply use a copy of the site-packages folder because numpy is involved. Instead, in addition to the site-packages folder, I must also give numpy access to the MKL binaries. So I copy the Anaconda\envs\{ENV_NAME}\Library\bin folder into the app and add it to %PATH%. That folder has fewer than 200 files in it, so I'm surprised to see the following error during the deployment:
2020-10-29T04:34:21.3218237Z ##[error]Error: EMFILE: too many open files, open 'D:\a\_temp\temp_web_package_058969368946595324\site-packages\statsmodels\tsa\arima\datasets\__init__.py'
Everything builds and runs as long as I don't add the bin folder to %PATH%.
No, I'm not close to my file size limit on the azure web app service. Has anyone run into this before?
This error happens because of the XDT Transform.
During an XDT Transform, all contents of the original package are transformed and then zipped up. This error is thrown if the deployment is significantly large.

Downloading zip folder/file from google drive from shared with me folder

I have been provided with access to a zip file/folder which is stored in my google drive and inside "Shared with me".
How can I download it to my laptop through terminal using "wget" or python or anything related.
The url for the whole folder within which it is contained goes like, https://drive.google.com/drive/folders/13cx4SBFLTX8CqIqjjec9-pcadGaJ0kNj
and the shareable link to the zip file is https://drive.google.com/open?id=1PMJEk3hT-_ziNhSPkU9BllLYASLzN7TL.
Since the files are 12GB in size in total, downloading them by clicking is quite tiresome when working with Jupyter notebook.
To download the whole folder:
!pip uninstall --yes gdown # After running this line, restart Colab runtime.
!pip install gdown -U --no-cache-dir
import gdown
url = r'https://drive.google.com/drive/folders/1sWD6urkwyZo8ZyZBJoJw40eKK0jDNEni'
gdown.download_folder(url)
You can check my answer here (Updated March 2018):
https://stackoverflow.com/a/49444877/4043524
There is one small difference in your case: the format of your URL is different from the one mentioned in the link given above.
But don't worry: you just have to copy the ID (the random-looking string after the "id=" key in the URL) and replace the FILEIDENTIFIER in the script with it.
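Copying the ID by hand is easy, but it can also be pulled out of the URL programmatically. A minimal sketch using only the standard library, assuming the link has the open?id=... form shown in the question (extract_file_id is a hypothetical helper name, not part of any Drive tooling):

```python
from urllib.parse import parse_qs, urlparse


def extract_file_id(share_url):
    """Return the value of the 'id' query parameter, or None if absent."""
    query = parse_qs(urlparse(share_url).query)
    ids = query.get("id")
    return ids[0] if ids else None


file_id = extract_file_id(
    "https://drive.google.com/open?id=1PMJEk3hT-_ziNhSPkU9BllLYASLzN7TL"
)
print(file_id)
```

The resulting string is what replaces FILEIDENTIFIER in the linked script. Folder links of the /drive/folders/... form carry no id parameter, so the helper returns None for them.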

AWS CLI for S3 - Errno 2 No such file or directory: u'

I'm trying to extract a few hundred GB of data from an S3 bucket to a Windows 10 computer's external hard drive and the command I'm using (in a .bat file with AWSCLI-64bit installed) is:
aws s3 sync s3://aws-extraction/ F:\Bowral-PE\ --delete --region ap-southeast-2 > "%SyncLogFile%"
For the most part it seems to be working because files seem to be downloading and directories seem to be getting created. So far 82GB of data has been downloaded, but every single line in the cmd verbose output shows:
download failed: s3://aws-extraction/Archive/somedirectory/ to
F:\Bowral-PE\Archive\somedirectory Errno 2 No such file or directory:
u'F:\Bowral-PE\Archive\somedirectory '
Any ideas why I'm getting this and how to stop it?
Please note that the AWSCLI seems to be using Python or something similar (a guess based on results when searching for the error), and I am a complete newbie when it comes to Python.
I hit a similar problem when using the MSI version of the latest awscli on my Windows 10 machine today:
C:>aws s3 sync s3://<my_bucket_name>/ <local_dir>
fatal error [Errno 2] No such file or directory
The above sync command works if I back up the bucket content to a Unix machine.
After some googling, it seems the "fatal error [Errno 2] No such file or directory" message comes from Python.
I realised that the Windows MSI version of awscli uses a self-contained Python (v2.7), which may be too old for Windows 10.
I then installed the latest Python (v3.6) and installed awscli with pip, following the instructions here.
After this, direct backup and restoration work as expected.
For me, I was getting this notification at the same time:
By clicking on it and allowing Python to access the folder I was trying to save it in, I was able to rerun the command and it succeeded.
I had the same issue, and after a lot of debugging, I enabled Windows long paths, and the issue went away.
Here's how I did it:
The registry key Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\LongPathsEnabled (Type: REG_DWORD) must exist and be set to 1. The key's value will be cached by the system (per process) after the first call to an affected Win32 file or directory function (see below for the list of functions). The registry key will not be reloaded during the lifetime of the process. In order for all apps on the system to recognize the value of the key, a reboot might be required because some processes may have started before the key was set.
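To check the current value of that key without opening the registry editor, something like the following could be used. It is a sketch, assuming Python 3 on Windows; long_paths_enabled is a hypothetical helper name, and on other platforms it simply returns None.

```python
import sys


def long_paths_enabled():
    """Return True/False for the LongPathsEnabled value, or None off Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard-library module

    key_path = r"SYSTEM\CurrentControlSet\Control\FileSystem"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, "LongPathsEnabled")
    except OSError:
        # The key or the value does not exist, so long paths are not enabled.
        return False
    return value == 1


print(long_paths_enabled())
```

This only reads the value; changing it still requires administrator rights and, as noted above, possibly a reboot.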

Error packaging program in py2app

I have a program written in python 3.3 that I'd like to be able to distribute without the need for users to install python or any additional modules. I was able to successfully package this program using cx_Freeze on Windows, but the same script on OS X produced an app that wouldn't launch.
I thought I might have better luck using py2app, but now I'm running into a strange problem. The program opens (it has a GUI built with tkinter) and runs flawlessly when built in Alias mode. When I attempt to construct a final build, however, I get the following message in Terminal:
error: No such file or directory: /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/setuptools-2.1-py3.3.egg/_markerlib/__init__.pyc
I navigated to that directory and found a .egg file that I'm unable to open or extract. I've tried reinstalling setuptools as well as Python itself, with no luck. Has anyone experienced this problem?
It looks like the problem is that your setuptools is somehow broken.
To open the egg file, I tried downloading a third party tool, which crashed, and renaming it as a .zip, which failed. If I just double click on it, I get the "choose default application" popup.
Double-clicking it relies on the extension to decide what app to launch.
The best way to check whether something is a valid zip file is to use the unzip tool from the command line. For example:
$ unzip -t setuptools.egg
This will check all of the zip headers, and check the CRC of all files in the archive, and report any errors. Or, if it's not a zip at all, it'll report one error right at the start.
You can also use the file command to do a quick check to see whether it's some well-known type of file. If file /path/to/setuptools-whatever.egg just says "data" instead of "Zip archive data", then it's probably corrupted beyond recognition.
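If the unzip tool is not available, roughly the same check can be done from Python with the standard zipfile module. A minimal sketch (check_zip is a hypothetical helper name, and the returned messages are made up for illustration):

```python
import zipfile


def check_zip(path):
    """Report whether path is a readable zip archive with intact members."""
    if not zipfile.is_zipfile(path):
        return "not a zip archive"
    with zipfile.ZipFile(path) as zf:
        # testzip() reads every member and returns the first bad name, if any.
        bad = zf.testzip()
    return "ok" if bad is None else f"corrupt member: {bad}"
```

For example, check_zip("/path/to/setuptools-whatever.egg") reports whether the egg can be read as a zip at all and, if so, whether any member fails its CRC check.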
Anyway, assuming your setuptools didn't come with your Python installation (if you're using a python.org binary installer, it didn't), the safest thing to do is uninstall it, then reinstall it cleanly.
The reason it's important to uninstall first is that the current version will, by default, not install a .egg archive, but will instead install a normal unzipped package and egg-info directory, meaning it may not overwrite the old, broken copy.
The documentation covers uninstalling. Just delete the setuptools .egg file, and anything else named setuptools*, from your site-packages (and anywhere else on your sys.path). If you have distribute there as well, kill that too. This will leave a few files sitting around in other places (notably easy_install-3.3 somewhere on your PATH), but they'll get overwritten properly by the installation, so that's OK.
To install, just follow the usual instructions to reinstall it:
$ wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | python
… or, if you don't have write access to site-packages:
$ wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | sudo python
If you use pip, you may want to reinstall it after reinstalling setuptools, and then pip install -U setuptools pip just to make sure you have the latest versions—and to verify that everything is now working.

How to convert MP3 to WAV in Python

If I have an MP3 file how can I convert it to a WAV file? (preferably, using a pure python approach)
I maintain an open source library, pydub, which can help you out with that.
from pydub import AudioSegment
sound = AudioSegment.from_mp3("/path/to/file.mp3")
sound.export("/output/path/file.wav", format="wav")
One caveat: it uses ffmpeg to handle audio format conversions (except for wav files, which python handles natively).
note: you probably shouldn't do this conversion on GAE :/ even if it did support ffmpeg. EC2 would be a good match for the job though
This is working for me:
import subprocess
subprocess.call(['ffmpeg', '-i', 'audio.mp3',
'audio.wav'])
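A slightly more defensive variant of the same call is sketched below. It assumes ffmpeg is expected on PATH; build_ffmpeg_command and convert are hypothetical helper names. It fails early with a clear message if the executable is missing and raises if ffmpeg exits with an error, instead of silently returning a non-zero code.

```python
import shutil
import subprocess


def build_ffmpeg_command(src, dst, ffmpeg="ffmpeg"):
    """Build the argument list for a plain 'ffmpeg -i src dst' conversion."""
    return [ffmpeg, "-i", src, dst]


def convert(src, dst, ffmpeg="ffmpeg"):
    """Convert src to dst with ffmpeg, raising if ffmpeg is missing or fails."""
    if shutil.which(ffmpeg) is None:
        raise FileNotFoundError(f"{ffmpeg!r} was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_ffmpeg_command(src, dst, ffmpeg), check=True)
```

Usage would be convert("audio.mp3", "audio.wav"); the output format is inferred by ffmpeg from the destination extension.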
I think I am the right person to answer this question because I am a student who tried hard to find the answer. This answer is aimed at Windows users; it may work on macOS too, but it fits Windows best.
Let's go through the answer in steps:
First, check for the pydub and ffmpeg packages. If your computer doesn't have them, install pydub from the command prompt:
pip install pydub
The next important thing is ffmpeg, which converts audio and video between formats. You should install it manually. Why, when pip can install packages? Because installing an ffmpeg package with pip does not add the ffmpeg executable to the system PATH, so the computer can't find it. So I suggest installing it manually; the steps follow.
STEP 1:
#Current link
Paste this first link into your browser:
https://www.gyan.dev/ffmpeg/builds/ffmpeg-release-essentials.zip
#For future readers
This link works now, but it might not in the future. In that case, go to:
https://www.gyan.dev/ffmpeg/builds/
On that page, go to the release builds and download the .zip archive, not the .7z one. Use this second link only if the first one no longer works.
STEP 2: After downloading the zip file from the first link in step 1, make a new folder on the C: drive: open My PC, then OS (C:), and create a new folder. Copy the downloaded zip file into this folder and extract it there. Then go into the extracted folder and copy the path of its "bin" folder from Properties.
STEP 3: This is the most important setup step, where you set the path. In the search bar, search for "Edit the system environment variables", then click "Environment Variables" at the bottom. The window has two sections: user variables and system variables. Find "Path" under system variables if you want ffmpeg to be usable by the whole system, and double-click it. In the window that appears, choose "New" and paste the path of the bin folder. Then click OK in every dialog and close them.
STEP 4: Check that ffmpeg is installed correctly. In a command prompt, type ffmpeg; it should print its version and usage information. This shows the installation is finished.
STEP 5: Download an mp3 file. If you have Python installed, open IDLE and click New File in the File menu; an editor window appears. One important point: put the mp3 file in the same folder as your Python code. For example, if I save the Python file on the Desktop, the mp3 file should also be on the Desktop. Now copy and paste the code I am using:
import subprocess
subprocess.call(['ffmpeg', '-i', 'ind.mp3','ind1.wav'])
Then click Run Module and you will get the converted file.
Thank you.
I hope this answer helps. If you want code and a method for converting speech to text, you can message me. I hope these 10 minutes save you hours.
https://www.youtube.com/watch?v=vBb_eYThfRQ
Use this video as a reference for the path configuration in step 3, but add the path to the system variables rather than the user variables so that the whole system can use ffmpeg. Apologies if my English is imperfect; I hope it is understandable.
Install the module pydub. This is an audio manipulation module for Python. This module can open many multimedia audio and video formats. You can install this module with pip.
pip install pydub
If you have not installed ffmpeg yet, install it. You can use your package manager to do that.
For Ubuntu / Debian Linux:
apt-get install ffmpeg
When ready, execute the below code:
from pydub import AudioSegment
# files
src = "transcript.mp3"
dst = "test.wav"
# convert mp3 to wav
sound = AudioSegment.from_mp3(src)
sound.export(dst, format="wav")
Check this link for details.
For those using Windows 7 and above:
Step 1: This link will help you install ffmpeg:
How to Install FFMPEG on Windows
Step 2: This code will help you convert multiple files from one format to another (any format supported by ffmpeg):
import os
import subprocess

# Raw strings need single backslashes; os.path.join adds the separators.
input_dir = r'C:\Path\To\Your\Input\Directory'
output_dir = r'C:\Path\To\Your\Output\Directory'
path_to_ffmpeg_exe = r'C:\Path\To\ffmpeg-2022-YY-MM-git-blabla-full_build\bin\ffmpeg.exe'

# Collect regular files only (skip subdirectories).
files_list = []
for path in os.listdir(input_dir):
    if os.path.isfile(os.path.join(input_dir, path)):
        files_list.append(path)

for file_nm in files_list:
    print(file_nm)
    # splitext keeps names with extra dots intact, e.g. "a.b.mp3" -> "a.b.wav"
    wav_name = os.path.splitext(file_nm)[0] + ".wav"
    subprocess.call([path_to_ffmpeg_exe, '-i', os.path.join(input_dir, file_nm), os.path.join(output_dir, wav_name)])
