NLTK panlex_lite giving me error


I'm trying to use NLTK for my NLP learning in Python.
A package called "panlex_lite" keeps giving me an error, so I tried the following:
import nltk
nltk.download('all', halt_on_error = False)
and it gives me the following error:
[nltk_data] | Downloading package panlex_lite to
[nltk_data] | /Users/Harshil/nltk_data...
[nltk_data] | Unzipping corpora/panlex_lite.zip.
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
nltk.download('all', halt_on_error = False)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 664, in download
for msg in self.incr_download(info_or_id, download_dir, force):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 543, in incr_download
for msg in self.incr_download(info.children, download_dir, force):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 529, in incr_download
for msg in self._download_list(info_or_id, download_dir, force):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 572, in _download_list
for msg in self.incr_download(item, download_dir, force):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 549, in incr_download
for msg in self._download_package(info, download_dir, force):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 638, in _download_package
for msg in _unzip_iter(filepath, zipdir, verbose=False):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 2039, in _unzip_iter
outfile.write(contents)
OSError: [Errno 22] Invalid argument
Is there any way to fix this? I've tried passing halt_on_error=False, but it still gives me the error.
Thanks.

Here's a "dirty" hack:
$ rm /Users/Harshil/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/Harshil/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as already installed.
>>> dler.download('all')
Also, try earthy:
pip install earthy
TL;DR:
import earthy
path_to_nltk_data = '/home/yourusername/nltk_data/'
earthy.download('all', path_to_nltk_data) # Excludes the third party (non-NLTK) packages.
To download panlex_lite exclusively:
import earthy
earthy.download('panlex_lite', path_to_nltk_data)
To download all third-party datasets not natively hosted in the nltk_data GitHub repository:
import earthy
earthy.download('third_party', path_to_nltk_data)
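If you would rather not patch the downloader's private _status_cache, another option is to download packages one at a time and skip panlex_lite. Judging from the traceback, the OSError is raised while unzipping and propagates straight out of download(), which would explain why halt_on_error=False does not help. This is a sketch that assumes NLTK's Downloader.packages() (items with an .id) and Downloader.download(id, download_dir=..., quiet=...). The helper names are hypothetical, and iterating over every package in the index is only roughly what 'all' covers.

```python
def ids_to_download(all_ids, skip=("panlex_lite",)):
    """Return package ids in their original order, minus the skipped ones."""
    skip = set(skip)
    return [pkg_id for pkg_id in all_ids if pkg_id not in skip]


def download_all_except(skip=("panlex_lite",), download_dir=None):
    """Download every package in the NLTK index except `skip`.

    Returns a list of (package id, exception) pairs for packages that failed.
    """
    import nltk  # imported lazily so ids_to_download works without NLTK

    dler = nltk.downloader.Downloader()
    all_ids = [pkg.id for pkg in dler.packages()]
    failed = []
    for pkg_id in ids_to_download(all_ids, skip):
        try:
            dler.download(pkg_id, download_dir=download_dir, quiet=True)
        except OSError as exc:  # e.g. the unzip failure shown above
            failed.append((pkg_id, exc))
    return failed
```

Usage: failed = download_all_except(), then inspect failed for anything else that broke.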

Related

Using WordNet with PyScript

I'm trying to use WordNet within PyScript, but I can't get WordNet to load properly.
At first I tried:
<py-env>
- nltk
</py-env>
<py-script>
import nltk
from nltk.corpus import wordnet as wn
</py-script>
This gave me a LookupError (resource_not_found), along with the message:
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('wordnet')
I then tried:
<py-script>
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn
</py-script>
which gave me this message in the console:
writing to py-3f0adca1-a38a-4161-c36f-7e6548260aa5 [nltk_data] Error loading wordnet: <urlopen error unknown url type:
[nltk_data] https> true
I looked at the answers to "Pyodide filesystem for NLTK resources: missing files" and tried to replicate their code:
from js import fetch
from pathlib import Path
import asyncio, os, sys, io, zipfile
response = await fetch('https://github.com/nltk/wordnet/archive/refs/heads/master.zip')
js_buffer = await response.arrayBuffer()
py_buffer = js_buffer.to_py() # this is a memoryview
stream = py_buffer.tobytes() # now we have a bytes object
d = Path("/nltk/wordnet")
d.mkdir(parents=True, exist_ok=True)
Path('/nltk/wordnet/master.zip').write_bytes(stream)
zipfile.ZipFile('/nltk/wordnet/master.zip').extractall(
path='/nltk/wordnet/'
)
This is the error message that I got:
APPENDING: True ==> py-2880055f-8922-cb23-34e4-db404fb1d7a4 --> PythonError: Traceback (most recent call last):
File "/lib/python3.10/asyncio/futures.py", line 201, in result
raise self._exception
File "/lib/python3.10/asyncio/tasks.py", line 232, in __step
result = coro.send(None)
File "/lib/python3.10/site-packages/_pyodide/_base.py", line 500, in eval_code_async
await CodeRunner(
File "/lib/python3.10/site-packages/_pyodide/_base.py", line 353, in run_async
await coroutine
File "<exec>", line 21, in <module>
File "/lib/python3.10/zipfile.py", line 1258, in __init__
self._RealGetContents()
File "/lib/python3.10/zipfile.py", line 1325, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
What am I doing wrong? Thanks!
UPDATE:
I tried installing the wn library from PyPI using
await micropip.install('https://files.pythonhosted.org/packages/ce/f1/53b07100f5c3d41fd33fc78ebb9e99d736b0460ced8acff94840311ffc60/wn-0.9.1-py3-none-any.whl')
But I get the error:
JsException(PythonError: Traceback (most recent call last):
File "/lib/python3.10/asyncio/futures.py", line 201, in result
raise self._exception
File "/lib/python3.10/asyncio/tasks.py", line 232, in __step
result = coro.send(None)
File "/lib/python3.10/site-packages/_pyodide/_base.py", line 500, in eval_code_async
await CodeRunner(
File "/lib/python3.10/site-packages/_pyodide/_base.py", line 353, in run_async
await coroutine
File "<exec>", line 14, in <module>
File "/lib/python3.10/site-packages/wn/__init__.py", line 47, in <module>
from wn._add import add, remove
File "/lib/python3.10/site-packages/wn/_add.py", line 21, in <module>
from wn.project import iterpackages
File "/lib/python3.10/site-packages/wn/project.py", line 12, in <module>
import lzma
File "/lib/python3.10/lzma.py", line 27, in <module>
from _lzma import *
ModuleNotFoundError: No module named '_lzma'
)
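zipfile.BadZipFile means the bytes written to master.zip are not a zip archive at all; a failed or redirected request that returns an HTML page or an empty body would produce exactly that, though the question doesn't show which case applies. A small stdlib sketch (extract_zip_bytes is a hypothetical helper) that checks the payload before extracting and reports what was actually received:

```python
import io
import zipfile
from pathlib import Path


def extract_zip_bytes(data: bytes, dest: str) -> list:
    """Extract an in-memory zip archive into `dest` and return its member names.

    Raises ValueError with a short preview of the payload when `data` is not a
    zip archive, which is more informative than zipfile.BadZipFile.
    """
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        raise ValueError(f"payload is not a zip archive; starts with {data[:80]!r}")
    buffer.seek(0)
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(buffer) as archive:
        archive.extractall(path=dest)
        return archive.namelist()
```

In the question's code you would call extract_zip_bytes(stream, "/nltk/wordnet/") instead of writing master.zip and opening it; checking response.status on the fetch result first is also worthwhile.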

pytube.exceptions.RegexMatchError: get_throttling_function_name: could not find match for multiple

I used to download songs the following way:
from pytube import YouTube
video = YouTube('https://www.youtube.com/watch?v=AWXvSBHB210')
video.streams.get_by_itag(251).download()
Since today, I get this error:
Traceback (most recent call last):
File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\__main__.py", line 170, in fmt_streams
extract.apply_signature(stream_manifest, self.vid_info, self.js)
File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\extract.py", line 409, in apply_signature
cipher = Cipher(js=js)
File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\cipher.py", line 43, in __init__
self.throttling_plan = get_throttling_plan(js)
File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\cipher.py", line 387, in get_throttling_plan
raw_code = get_throttling_function_code(js)
File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\cipher.py", line 293, in get_throttling_function_code
name = re.escape(get_throttling_function_name(js))
File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\cipher.py", line 278, in get_throttling_function_name
raise RegexMatchError(
pytube.exceptions.RegexMatchError: get_throttling_function_name: could not find match for multiple
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Me\Documents\YouTubeDownloader.py", line 3, in <module>
video.streams.get_by_itag(251).download()
File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\__main__.py", line 285, in streams
return StreamQuery(self.fmt_streams)
File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\__main__.py", line 177, in fmt_streams
extract.apply_signature(stream_manifest, self.vid_info, self.js)
File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\extract.py", line 409, in apply_signature
cipher = Cipher(js=js)
File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\cipher.py", line 43, in __init__
self.throttling_plan = get_throttling_plan(js)
File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\cipher.py", line 387, in get_throttling_plan
raw_code = get_throttling_function_code(js)
File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\cipher.py", line 293, in get_throttling_function_code
name = re.escape(get_throttling_function_name(js))
File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\cipher.py", line 278, in get_throttling_function_name
raise RegexMatchError(
pytube.exceptions.RegexMatchError: get_throttling_function_name: could not find match for multiple

This happens because YouTube changed something on its end. You now have to change the function_patterns variable inside get_throttling_function_name in pytube's cipher.py to the following:
r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&\s*'
r'\([a-z]\s*=\s*([a-zA-Z0-9$]{2,3})(\[\d+\])?\([a-z]\)'
And you also have to change line 288 to this:
nfunc=re.escape(function_match.group(1))),
You'll have to use this workaround until pytube officially releases a fix.
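Before patching cipher.py, you can sanity-check the pattern against a string shaped like the code it targets. The JavaScript fragment below is made up for illustration and is not real YouTube player code. Note that Python joins the two adjacent raw strings into one pattern, which is also why the merged one-line version given in a later answer is equivalent.

```python
import re

# The two raw strings from the answer; adjacent literals are concatenated.
N_FUNCTION_PATTERN = (
    r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&\s*'
    r'\([a-z]\s*=\s*([a-zA-Z0-9$]{2,3})(\[\d+\])?\([a-z]\)'
)

# Hypothetical fragment, NOT real player JS.
SAMPLE_JS = 'a.C&&(b=a.get("n"))&&(b=Xyz[0](b),a.set("n",b))'

match = re.search(N_FUNCTION_PATTERN, SAMPLE_JS)
if match:
    # group(1): the 2-3 character name; group(2): the optional [index] suffix.
    print(match.group(1), match.group(2))  # prints: Xyz [0]
```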

I had the same issue with pytube 11.0.0 and found that there is a regular-expression mismatch in the function_patterns of pytube's cipher.py:
function_patterns = [
r'a\.C&&\(b=a\.get\("n"\)\)&&\(b=([^(]+)\(b\),a\.set\("n",b\)\)}};',
]
pytube was updated yesterday to 11.0.1, which changes the pattern to:
function_patterns = [
r'a\.[A-Z]&&\(b=a\.get\("n"\)\)&&\(b=([^(]+)\(b\)',
]
With this update, downloading YouTube videos with pytube works again.
Update your pytube library with this command:
python3 -m pip install --upgrade pytube
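The difference between the two patterns is easy to see by running both on the same input: the 11.0.0 pattern hard-codes the property name `a.C` and the exact tail `,a.set("n",b))}};`, while the 11.0.1 pattern accepts any capital letter and stops after the call. The fragment below is hypothetical (a made-up example where the letter changed from C to D); the actual change on YouTube's side may have been different.

```python
import re

# Patterns quoted in the answer above (pytube 11.0.0 vs 11.0.1).
OLD_PATTERN = r'a\.C&&\(b=a\.get\("n"\)\)&&\(b=([^(]+)\(b\),a\.set\("n",b\)\)}};'
NEW_PATTERN = r'a\.[A-Z]&&\(b=a\.get\("n"\)\)&&\(b=([^(]+)\(b\)'

# Hypothetical player-code fragment, NOT real YouTube JS.
SAMPLE_JS = 'a.D&&(b=a.get("n"))&&(b=abc(b),a.set("n",b))}};'

for label, pattern in (("old", OLD_PATTERN), ("new", NEW_PATTERN)):
    m = re.search(pattern, SAMPLE_JS)
    print(label, m.group(1) if m else None)
# prints:
# old None
# new abc
```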

The updated regular expression from the answer above,
r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&\s*'
r'\([a-z]\s*=\s*([a-zA-Z0-9$]{2,3})(\[\d+\])?\([a-z]\)'
may be parsed incorrectly by PyCharm if you copy and paste it from the web. To fix this, try merging the two strings onto one line:
r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&\s*\([a-z]\s*=\s*([a-zA-Z0-9$]{2,3})(\[\d+\])?\([a-z]\)'
I found that this fixed the problem in pytube 12.0.0.

You can use yt-dlp: https://github.com/yt-dlp/yt-dlp
!pip install -U yt-dlp
Then, for your video (MP4, up to 1080p), use the following command (the leading ! is Jupyter/Colab notebook syntax; drop it in a regular shell):
!yt-dlp -f "bestvideo[height<=1080][ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" "https://www.youtube.com/watch?v=AWXvSBHB210"
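If you would rather stay in Python than shell out, yt-dlp can also be driven through its YoutubeDL class, where the "format" option takes the same selector string as -f. A sketch; the helper names are hypothetical, and merging separate video and audio streams requires ffmpeg to be installed.

```python
# Same format selector as the -f argument in the command above.
FORMAT_1080P_MP4 = (
    "bestvideo[height<=1080][ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best"
)


def build_options(fmt=FORMAT_1080P_MP4):
    """Options dict for yt_dlp.YoutubeDL; 'format' mirrors the CLI's -f."""
    return {"format": fmt}


def download(url, fmt=FORMAT_1080P_MP4):
    from yt_dlp import YoutubeDL  # imported lazily; requires `pip install -U yt-dlp`

    with YoutubeDL(build_options(fmt)) as ydl:
        ydl.download([url])
```

Usage: download("https://www.youtube.com/watch?v=AWXvSBHB210"). To mirror get_by_itag(251) from the question, pass fmt="251", since yt-dlp's YouTube format ids are the itags.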

Upgrade pytube
python3 -m pip install --upgrade pytube

Package python script with pandas using PEX

I have a simple Python script that depends on pandas. I need to package it with pex so it can run without installing its dependencies.
import sys
import csv
import argparse
import pandas as pd
class myLogic():
    def __init__(self):
        pass
    def loadData(self, data_file):
        return pd.read_csv(data_file, delimiter="|")
    # command line interaction interface
    def processInputArguments(self, args):
        parser = argparse.ArgumentParser(description="my logic")
        # transactions file name
        parser.add_argument('-td',
                            '--data',
                            type=str,
                            dest='data',
                            help='data file location'
                            )
        options = parser.parse_args(args)
        return vars(options)
    def main(self):
        options = self.processInputArguments(sys.argv[1:])
        data_file = options["data"]
        data = self.loadData(data_file)
        print(data.head())
if __name__ == '__main__':
    ml = myLogic()
    ml.main()
I am trying to use pex to do that, so I did the following:
pex pandas -e myprogram.myLogic:main -o test1.pex
But I am getting this error when running the generated pex file:
Traceback (most recent call last):
File ".bootstrap/_pex/pex.py", line 317, in execute
File ".bootstrap/_pex/pex.py", line 250, in _wrap_coverage
File ".bootstrap/_pex/pex.py", line 282, in _wrap_profiling
File ".bootstrap/_pex/pex.py", line 360, in _execute
File ".bootstrap/_pex/pex.py", line 418, in execute_entry
File ".bootstrap/_pex/pex.py", line 435, in execute_pkg_resources
File ".bootstrap/pkg_resources.py", line 2088, in load
ImportError: No module named myLogic
I also tried packaging with the -c switch (for a script) using the following command:
pex pandas -c myprogram.py -o test2.pex
But I also get an error:
Traceback (most recent call last):
File "/usr/local/bin/pex", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/dist-packages/pex/bin/pex.py", line 509, in main
pex_builder = build_pex(reqs, options, resolver_options_builder)
File "/usr/local/lib/python2.7/dist-packages/pex/bin/pex.py", line 486, in build_pex
pex_builder.set_script(options.script)
File "/usr/local/lib/python2.7/dist-packages/pex/pex_builder.py", line 214, in set_script
script, ', '.join(self._distributions)))
TypeError: sequence item 0: expected string, DistInfoDistribution found

So far, the only option that has worked for me is creating an interpreter with pex that includes pandas and then shipping it alongside the Python script. This can be done as follows:
pex pandas -o my_interpreter.pex
But this fails when the Python used to build the pex is a UCS4 build and the Python used to run it is a UCS2 build.
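On the first error: pex's -e option takes module:callable, so -e myprogram.myLogic:main asks for a module named myprogram.myLogic, which is consistent with "No module named myLogic"; and main there is an instance method, so it could not be called without an instance anyway. A hedged sketch of the same script restructured around a module-level main() that an entry point like myprogram:main could call (function names are my own choice). You would still need to get myprogram.py itself into the pex; how to do that depends on your pex version, so check pex --help for its sources option.

```python
# myprogram.py: module-level functions instead of a class, so that an entry
# point of the form "myprogram:main" has a plain callable to invoke.
import argparse
import sys


def parse_args(argv):
    parser = argparse.ArgumentParser(description="my logic")
    # transactions file name
    parser.add_argument("-td", "--data", type=str, dest="data",
                        help="data file location")
    return vars(parser.parse_args(argv))


def load_data(data_file):
    import pandas as pd  # lazy import keeps parse_args usable without pandas
    return pd.read_csv(data_file, delimiter="|")


def main(argv=None):
    options = parse_args(sys.argv[1:] if argv is None else argv)
    data = load_data(options["data"])
    print(data.head())
```

Keep an if __name__ == '__main__': main() guard at the bottom if you also want to run the file directly.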

unable to start carbon-graphite on ubuntu 12.04

I am working with Sensu and Graphite: the Sensu server sends data in JSON format to the Graphite server, which draws graphs of the various metrics and parameters.
I get the following error when trying to start carbon. I have pasted the error log and traceback below. I haven't been able to find a solution online, despite searching extensively.
I have enabled AMQP so that carbon reads the metrics from the RabbitMQ server.
Traceback (most recent call last):
File "./carbon-cache.py", line 30, in <module>
run_twistd_plugin(__file__)
File "/opt/graphite/lib/carbon/util.py", line 90, in run_twistd_plugin
config.parseOptions(twistd_options)
File "/usr/local/lib/python2.7/dist-packages/twisted/application/app.py", line 614, in parseOptions
usage.Options.parseOptions(self, options)
File "/usr/local/lib/python2.7/dist-packages/twisted/python/usage.py", line 266, in parseOptions
self.subOptions.parseOptions(rest)
File "/usr/local/lib/python2.7/dist-packages/twisted/python/usage.py", line 276, in parseOptions
self.postOptions()
File "/opt/graphite/lib/carbon/conf.py", line 188, in postOptions
program_settings = read_config(program, self)
File "/opt/graphite/lib/carbon/conf.py", line 497, in read_config
settings.readFrom(config, section)
File "/opt/graphite/lib/carbon/conf.py", line 137, in readFrom
value = parser.getboolean(section, key)
File "/usr/lib/python2.7/ConfigParser.py", line 370, in getboolean
raise ValueError, 'Not a boolean: %s' % v
ValueError: Not a boolean: False
ENABLE_AMQP = True
AMQP_VERBOSE = True
AMQP_HOST = 192.168.1.134
AMQP_PORT = 5671
AMQP_VHOST = /sensu
AMQP_USER = sensu
AMQP_PASSWORD = mypass
AMQP_EXCHANGE = metrics
AMQP_METRIC_NAME_IN_BODY = True
Any help would be appreciated.
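For reference, the traceback ends in ConfigParser.getboolean, which compares the lowercased value against 1/yes/true/on/0/no/false/off. Both True and False are in that set, so a rejected "False" suggests the raw value may contain something beyond the visible word; that is a guess, not a diagnosis. The minimal Python 3 sketch below (configparser is the renamed ConfigParser module; the [cache] section and the COMMENTED_FLAG key are hypothetical) shows one such case: by default an inline # comment stays part of the value.

```python
import configparser

# Hypothetical carbon.conf fragment for illustration only.
SAMPLE_CONF = """
[cache]
ENABLE_AMQP = True
AMQP_VERBOSE = false
COMMENTED_FLAG = False  # inline comments are not stripped by default
"""

cp = configparser.ConfigParser()
cp.optionxform = str  # keep UPPER_CASE keys exactly as written
cp.read_string(SAMPLE_CONF)

for key in ("ENABLE_AMQP", "AMQP_VERBOSE", "COMMENTED_FLAG"):
    try:
        print(key, cp.getboolean("cache", key))
    except ValueError as exc:
        # The whole string "False  # inline ..." is not a recognized boolean.
        print(key, "rejected:", exc)
```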

For the benefit of those coming from Google, the problem may actually be that you're missing the Python module txAMQP. You can install the module with:
pip install txamqp
or, on Ubuntu (tested on Ubuntu 14.04):
sudo apt-get install python-txamqp
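Because the traceback shows carbon running from /usr/local/lib/python2.7, it is worth confirming that txAMQP is importable by that same interpreter, not just by whichever python is first on your PATH. A small check that runs on both Python 2.7 and 3 (module_available is a hypothetical helper name):

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported by the interpreter running this."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if not module_available("txamqp"):
    print("txAMQP is missing for this interpreter; try: pip install txamqp")
```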

Error with pydub in python

I have successfully imported pydub. Here is my code:
from pydub import AudioSegment
song = AudioSegment.from_mp3("c:\mks.mp3")
first_ten_seconds = song[:10000]
song.export("d:\mks.mp3", format="mp3")
But it gives the following error:
python "C:\Users\mKs\Desktop\mks2.py"
Process started >>>
Traceback (most recent call last):
File "C:\Users\mKs\Desktop\mks2.py", line 2, in <module>
song=AudioSegment.from_mp3("c:\mks.mp3");
File "C:\Python27\lib\site-packages\pydub-0.5.2-py2.7.egg\pydub\audio_segment.py", line 194, in from_mp3
return cls.from_file(file, 'mp3')
File "C:\Python27\lib\site-packages\pydub-0.5.2-py2.7.egg\pydub\audio_segment.py", line 189, in from_file
return cls.from_wav(output)
File "C:\Python27\lib\site-packages\pydub-0.5.2-py2.7.egg\pydub\audio_segment.py", line 206, in from_wav
return cls(data=file)
File "C:\Python27\lib\site-packages\pydub-0.5.2-py2.7.egg\pydub\audio_segment.py", line 33, in __init__
raw = wave.open(StringIO(data), 'rb')
File "C:\Python27\lib\wave.py", line 498, in open
return Wave_read(f)
File "C:\Python27\lib\wave.py", line 163, in __init__
self.initfp(f)
File "C:\Python27\lib\wave.py", line 128, in initfp
self._file = Chunk(file, bigendian = 0)
File "C:\Python27\lib\chunk.py", line 63, in __init__
raise EOFError
EOFError
Any help would be appreciated.

The only issue I see with your code is the trailing ";" at the end of the last three lines. Please remove those and see whether you still get the error.
In addition, make sure you have ffmpeg (http://www.ffmpeg.org/) installed. It is required to support all non-WAV file formats.
ADDED:
I think you have broken module dependencies in your Python installation.
I tried the code you provided above with Python 2.7.2, and it worked fine for me:
>>> from pydub import AudioSegment
>>> song = AudioSegment.from_wav('goodbye.wav')
>>> first_ten_seconds = song[:10000]
>>> song.export('goodbye1.wav',format='wav')
<open file 'goodbye1.wav', mode 'wb+' at 0x10cf2b270>
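To rule out a missing ffmpeg before calling from_mp3, here is a Python 3 sketch (shutil.which does not exist in the Python 2.7 shown in the traceback). It relies on the AudioSegment.converter attribute that recent pydub versions use to locate the ffmpeg binary; the helper names are hypothetical. My reading of the traceback, which is only a guess, is that the EOFError comes from wave finding no data after the MP3-to-WAV conversion, which is what you would expect if the converter never ran.

```python
import shutil


def find_converter(candidates=("ffmpeg",)):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def configure_pydub():
    from pydub import AudioSegment  # lazy import; requires pydub

    path = find_converter()
    if path is None:
        raise RuntimeError("ffmpeg was not found on PATH; pydub needs it for MP3")
    # Point pydub at the binary explicitly instead of relying on its lookup.
    AudioSegment.converter = path
    return path
```

Call configure_pydub() once before AudioSegment.from_mp3(...).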

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.
