I am trying to analyze a Wikipedia dump file. I am using gensim.scripts, part of the gensim Python library, and running this command in Windows 10 cmd.exe:
python -m gensim.scripts.make_wiki enwiki-latest-pages-articles.xml.bz2 wiki_en_output
This gives me the error:
Microsoft Windows [Version 10.0.10586]
(c) 2015 Microsoft Corporation. All rights reserved.
2015-12-03 15:47:20,459 : INFO : running C:\Python27\lib\site-packages\gensim-0.12.3-py2.7-win32.egg\gensim\scripts\make_wiki.py enwiki-latest-pages-articles.xml.bz2 wiki_en_output
Traceback (most recent call last):
File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "C:\Python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Python27\lib\site-packages\gensim-0.12.3-py2.7-win32.egg\gensim\scripts\make_wiki.py", line 84, in <module>
wiki = WikiCorpus(inp, lemmatize=lemmatize) # takes about 9h on a macbook pro, for 3.5m articles (june 2011)
File "C:\Python27\lib\site-packages\gensim-0.12.3-py2.7-win32.egg\gensim\corpora\wikicorpus.py", line 270, in __init__
self.dictionary = Dictionary(self.get_texts())
File "C:\Python27\lib\site-packages\gensim-0.12.3-py2.7-win32.egg\gensim\corpora\dictionary.py", line 58, in __init__
self.add_documents(documents, prune_at=prune_at)
File "C:\Python27\lib\site-packages\gensim-0.12.3-py2.7-win32.egg\gensim\corpora\dictionary.py", line 119, in add_documents
for docno, document in enumerate(documents):
File "C:\Python27\lib\site-packages\gensim-0.12.3-py2.7-win32.egg\gensim\corpora\wikicorpus.py", line 290, in get_texts
texts = ((text, self.lemmatize, title, pageid) for title, text, pageid in extract_pages(bz2.BZ2File(self.fname), self.filter_namespaces))
IOError: [Errno 2] No such file or directory: 'enwiki-latest-pages-articles.xml.bz2'
Thoughts on what I should do to fix this?
I am on Windows 10, and gensim.scripts has been installed.
Pass the full path to the downloaded enwiki-latest-pages-articles.xml.bz2, or run the gensim script from the folder the file was downloaded to.
If you don't have that archive, you can find and download it from the Wikimedia dumps website.
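As a minimal sketch of the first suggestion, the helper below looks for the dump in the current directory and in any extra folders, and returns an absolute path that can then be passed to the script. The name `resolve_dump_path` and the `search_dirs` parameter are hypothetical, and the code assumes Python 3 (on Python 2.7, as in the question, the missing file surfaces as `IOError` instead of `FileNotFoundError`).

```python
import os


def resolve_dump_path(filename, search_dirs=None):
    """Return the absolute path to `filename`, or raise a descriptive error.

    `search_dirs` is a hypothetical list of extra folders to look in
    (for example, a downloads folder); the current directory is tried first.
    """
    candidates = [os.getcwd()] + list(search_dirs or [])
    for folder in candidates:
        # If `filename` is already absolute, os.path.join returns it unchanged.
        path = os.path.abspath(os.path.join(folder, filename))
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(
        "%r not found in: %s" % (filename, ", ".join(candidates))
    )
```

Checking the path this way before starting a multi-hour run makes the failure show up immediately, with the list of folders that were searched.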
I am using the news-please library, which I cloned from https://github.com/fhamborg/news-please.
I want to use it to get news articles from the Common Crawl news dataset.
I am running the commoncrawl.py file as instructed here.
I used the command below:
python -m newsplease.examples.commoncrawl
On executing this command I get the following errors:
my_local_download_dir_warc=./cc_download_warc/
my_local_download_dir_article=./cc_download_articles/
delete_warc_after_extraction=False
my_number_of_extraction_processes=1
INFO:newsplease.crawler.commoncrawl_crawler:executing: aws s3 ls --recursive s3://commoncrawl/crawl-data/CC-NEWS/ --no-sign-request > .tmpaws.txt && awk '{ print $4 }' .tmpaws.txt && rm .tmpaws.txt
INFO:newsplease.crawler.commoncrawl_crawler:found 2 files at commoncrawl.org
INFO:newsplease.crawler.commoncrawl_crawler:creating extraction process pool with 1 processes
INFO:newsplease.crawler.commoncrawl_extractor:found local file ./cc_download_warc/https%3A%2F%2Fcommoncrawl.s3.amazonaws.com%2F, not downloading again due to configuration
Traceback (most recent call last):
File "/home/prateek/.local/lib/python3.6/site-packages/warcio/recordloader.py", line 236, in _detect_type_load_headers
rec_headers = self.arc_parser.parse(stream, statusline)
File "/home/prateek/.local/lib/python3.6/site-packages/warcio/recordloader.py", line 312, in parse
raise StatusAndHeadersParserException(msg, parts)
warcio.statusandheaders.StatusAndHeadersParserException: Wrong # of headers, expected arc headers ['uri', 'ip-address', 'archive-date', 'content-type', 'length'], Found ['<?xml', 'version="1.0"', 'encoding="UTF-8"?>']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/prateek/.local/lib/python3.6/site-packages/newsplease/examples/commoncrawl.py", line 172, in <module>
main()
File "/home/prateek/.local/lib/python3.6/site-packages/newsplease/examples/commoncrawl.py", line 168, in main
continue_process=True)
File "/home/prateek/.local/lib/python3.6/site-packages/newsplease/crawler/commoncrawl_crawler.py", line 320, in crawl_from_commoncrawl
log_pathname_fully_extracted_warcs=__log_pathname_fully_extracted_warcs)
File "/home/prateek/.local/lib/python3.6/site-packages/newsplease/crawler/commoncrawl_crawler.py", line 230, in __start_commoncrawl_extractor
log_pathname_fully_extracted_warcs=__log_pathname_fully_extracted_warcs)
File "/home/prateek/.local/lib/python3.6/site-packages/newsplease/crawler/commoncrawl_extractor.py", line 338, in extract_from_commoncrawl
self.__run()
File "/home/prateek/.local/lib/python3.6/site-packages/newsplease/crawler/commoncrawl_extractor.py", line 292, in __run
self.__process_warc_gz_file(local_path_name)
File "/home/prateek/.local/lib/python3.6/site-packages/newsplease/crawler/commoncrawl_extractor.py", line 231, in __process_warc_gz_file
for record in ArchiveIterator(stream):
File "/home/prateek/.local/lib/python3.6/site-packages/warcio/archiveiterator.py", line 110, in _iterate_records
self.record = self._next_record(self.next_line)
File "/home/prateek/.local/lib/python3.6/site-packages/warcio/archiveiterator.py", line 262, in _next_record
self.check_digests)
File "/home/prateek/.local/lib/python3.6/site-packages/warcio/recordloader.py", line 88, in parse_record_stream
known_format))
File "/home/prateek/.local/lib/python3.6/site-packages/warcio/recordloader.py", line 243, in _detect_type_load_headers
raise ArchiveLoadFailed(msg + str(se.statusline))
warcio.exceptions.ArchiveLoadFailed: Unknown archive format, first line: ['<?xml', 'version="1.0"', 'encoding="UTF-8"?>']
What is the error here, and how can I resolve it?
https://github.com/fhamborg/news-please says to adapt the config section in
newsplease/examples/commoncrawl.py.
What does this mean?
I copied the configuration from this file and pasted it into
config.cfg, which is in the newsplease/config directory.
Is this what they instructed, or have I made a mistake here?
I am using Python 3.6, and it is the only Python installed on my machine.
This error is caused by the libraries that news-please uses. The mistake happens when every library is installed manually without paying attention to package versions. The required version of each library is given in the setup.py file; install exactly the versions listed there.
There may still be problems while executing setup.py,
so use this command:
python3 setup.py install
If you need to uninstall all previously installed versions of the packages first, run:
pip3 freeze --user | xargs pip3 uninstall -y
For more ways to do this, click here.
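To see which installed packages differ from the versions pinned in setup.py, a small check along these lines may help. It is only a sketch: `find_version_mismatches` is a hypothetical helper, the `pinned` mapping has to be filled in by hand from setup.py, and `importlib.metadata` requires Python 3.8 or newer (on Python 3.6, as in the question, the `importlib_metadata` backport would be needed instead).

```python
from importlib import metadata  # standard library in Python 3.8+


def find_version_mismatches(pinned, get_version=metadata.version):
    """Compare installed package versions against pinned ones.

    `pinned` maps package name -> expected version string, e.g. values
    copied by hand from setup.py. Returns {name: (expected, installed)},
    where `installed` is None if the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

An empty result means every pinned package is installed at the expected version; anything else lists the packages to reinstall.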
I am new to Python and am attempting to set up a monorepo using Pants as the build system. All was going well until I told Pants to use a Python 3 interpreter by setting the following in pants.ini:
[python-setup]
interpreter_constraints: ["CPython>=3.6.5"]
I have set up a local python_library, which is imported into a python_binary.
For an identical code base:
running ./pants run ... on Python 2.7 works perfectly
running ./pants run ... with the above interpreter_constraints fails with ModuleNotFoundError
The stack trace I am getting looks like this:
Traceback (most recent call last):
File "/home/daniel/PycharmProjects/analytics-platform/.pants.d/run/py/CPython-3.6.8/8c5d2b01b2fb9952ae4d6116e537a04edd7039e8/.bootstrap/pex/pex.py", line 397, in execute
exit_code = self._wrap_coverage(self._wrap_profiling, self._execute)
File "/home/daniel/PycharmProjects/analytics-platform/.pants.d/run/py/CPython-3.6.8/8c5d2b01b2fb9952ae4d6116e537a04edd7039e8/.bootstrap/pex/pex.py", line 329, in _wrap_coverage
return runner(*args)
File "/home/daniel/PycharmProjects/analytics-platform/.pants.d/run/py/CPython-3.6.8/8c5d2b01b2fb9952ae4d6116e537a04edd7039e8/.bootstrap/pex/pex.py", line 360, in _wrap_profiling
return runner(*args)
File "/home/daniel/PycharmProjects/analytics-platform/.pants.d/run/py/CPython-3.6.8/8c5d2b01b2fb9952ae4d6116e537a04edd7039e8/.bootstrap/pex/pex.py", line 442, in _execute
return self.execute_entry(self._pex_info.entry_point)
File "/home/daniel/PycharmProjects/analytics-platform/.pants.d/run/py/CPython-3.6.8/8c5d2b01b2fb9952ae4d6116e537a04edd7039e8/.bootstrap/pex/pex.py", line 540, in execute_entry
return runner(entry_point)
File "/home/daniel/PycharmProjects/analytics-platform/.pants.d/run/py/CPython-3.6.8/8c5d2b01b2fb9952ae4d6116e537a04edd7039e8/.bootstrap/pex/pex.py", line 547, in execute_module
runpy.run_module(module_name, run_name='__main__')
File "/usr/lib/python3.6/runpy.py", line 208, in run_module
return _run_code(code, {}, init_globals, run_name, mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/daniel/PycharmProjects/analytics-platform/.pants.d/pyprep/sources/b9e34cbd28ac4d65dc0c5e39ad35ba34da17a128/src/main.py", line 1, in <module>
from str_utils import normalize
ModuleNotFoundError: No module named 'str_utils'
Is there anything I need to add to the pants.ini file when using CPython>=3.6.5?
Please note - the code used works on python2 and python3 when executed manually
Why can't I install pip with pypy3 -m ensurepip? I unpacked PyPy from the official package and followed the instructions in the official docs, but the result is an error. The interpreter log is below.
Traceback (most recent call last):
File "D:\pypy3-v5.10.0-win32\lib-python\3\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "D:\pypy3-v5.10.0-win32\lib-python\3\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\pypy3-v5.10.0-win32\lib-python\3\ensurepip\__main__.py", line 4, in <module>
ensurepip._main()
File "D:\pypy3-v5.10.0-win32\lib-python\3\ensurepip\__init__.py", line 209, in _main
default_pip=args.default_pip,
File "D:\pypy3-v5.10.0-win32\lib-python\3\ensurepip\__init__.py", line 116, in bootstrap
_run_pip(args + [p[0] for p in _PROJECTS], additional_paths)
File "D:\pypy3-v5.10.0-win32\lib-python\3\ensurepip\__init__.py", line 40, in _run_pip
import pip
File "C:\Users\user\AppData\Local\Temp\tmp5zq6hqua\pip-9.0.1-py2.py3-none-any.whl\pip\__init__.py", line 21, in <module>
File "C:\Users\user\AppData\Local\Temp\tmp5zq6hqua\pip-9.0.1-py2.py3-none-any.whl\pip\_vendor\requests\__init__.py", line 62, in <module>
File "C:\Users\user\AppData\Local\Temp\tmp5zq6hqua\pip-9.0.1-py2.py3-none-any.whl\pip\_vendor\requests\packages\__init__.py", line 27, in <module>
File "C:\Users\user\AppData\Local\Temp\tmp5zq6hqua\pip-9.0.1-py2.py3-none-any.whl\pip\_vendor\requests\packages\urllib3\__init__.py", line 8, in <module>
File "C:\Users\user\AppData\Local\Temp\tmp5zq6hqua\pip-9.0.1-py2.py3-none-any.whl\pip\_vendor\requests\packages\urllib3\connectionpool.py", line 101, in <module>
AttributeError: module 'errno' has no attribute 'EWOULDBLOCK'
The errno module on pypy3 on Windows (which is beta) is indeed incomplete. This has been fixed after the 5.10.0 release and will be included in the 5.10.1 release.
We build nightly zip files off the latest HEAD, available here. It would be great if you could try out the latest windows version and let us know on IRC at #pypy, or on the pypy-dev mailing list, or by filing an issue on our bitbucket issue tracker whether it works for you, so that we will not need to do a 5.10.2 bug release fix after the current one.
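To check whether a given interpreter build still has the gap, a quick probe of the errno module is enough. The sketch below is an illustration only: `missing_errno_names` is a hypothetical helper, and the list of names to probe is an assumption apart from `EWOULDBLOCK`, which comes from the traceback above.

```python
import errno


def missing_errno_names(names, module=errno):
    """Return the entries of `names` that `module` does not define."""
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    # EWOULDBLOCK is the attribute from the traceback; EAGAIN is an
    # arbitrary second name added for comparison.
    print(missing_errno_names(["EWOULDBLOCK", "EAGAIN"]))
```

Running this with the pypy3 executable in question shows whether `EWOULDBLOCK` is still reported as missing before retrying `pypy3 -m ensurepip`.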
I just finished the first draft of a .ui file in the designer of pyqt, and I am wondering how I go from the .ui file to an exe to let someone test out my ui... I currently have a makefile that translates my .ui file into a .py file, but now I want to go from .py to .exe
Does anybody know how to do this? I have py2exe downloaded but not sure if this is what I want...
Please assume the people I want to test this don't have python downloaded and are using Windows (cross platform is better but windows will be used)
Thank you!
EDIT: When I run py2exe on my test.py (which was generated from test.ui),
I use
py -3.6 -m py2exe.build_exe test.py
and get
C:\Users\Chris\Desktop\makeExe>py -3.6 -m py2exe.build_exe test.py
Traceback (most recent call last):
File "C:\Users\Chris\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\Chris\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\Chris\AppData\Local\Programs\Python\Python36\lib\site-packages\py2exe\build_exe.py", line 145, in <module>
main()
File "C:\Users\Chris\AppData\Local\Programs\Python\Python36\lib\site-packages\py2exe\build_exe.py", line 141, in main
builder.analyze()
File "C:\Users\Chris\AppData\Local\Programs\Python\Python36\lib\site-packages\py2exe\runtime.py", line 160, in analyze
self.mf.import_hook(modname)
File "C:\Users\Chris\AppData\Local\Programs\Python\Python36\lib\site-packages\py2exe\mf3.py", line 120, in import_hook
module = self._gcd_import(name)
File "C:\Users\Chris\AppData\Local\Programs\Python\Python36\lib\site-packages\py2exe\mf3.py", line 274, in _gcd_import
return self._find_and_load(name)
File "C:\Users\Chris\AppData\Local\Programs\Python\Python36\lib\site-packages\py2exe\mf3.py", line 357, in _find_and_load
self._scan_code(module.__code__, module)
File "C:\Users\Chris\AppData\Local\Programs\Python\Python36\lib\site-packages\py2exe\mf3.py", line 388, in _scan_code
for what, args in self._scan_opcodes(code):
File "C:\Users\Chris\AppData\Local\Programs\Python\Python36\lib\site-packages\py2exe\mf3.py", line 417, in _scan_opcodes
yield "store", (names[oparg],)
IndexError: tuple index out of range
C:\Users\Chris\Desktop\makeExe>
py2exe didn't support signing, whereas PyInstaller has supported signing since version 1.4.
So PyInstaller is also a solution:
pip install pyinstaller
pyinstaller --onefile --windowed test.py
I have installed nltk 3.2.1 on my CentOS machine.
Now whenever I try to download any corpora/models of NLTK, it gives me below error:
Traceback (most recent call last):
File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/lib/python2.7/site-packages/nltk/downloader.py", line 2268, in <module>
halt_on_error=options.halt_on_error)
File "/usr/lib/python2.7/site-packages/nltk/downloader.py", line 664, in download
for msg in self.incr_download(info_or_id, download_dir, force):
File "/usr/lib/python2.7/site-packages/nltk/downloader.py", line 534, in incr_download
try: info = self._info_or_id(info_or_id)
File "/usr/lib/python2.7/site-packages/nltk/downloader.py", line 508, in _info_or_id
return self.info(info_or_id)
File "/usr/lib/python2.7/site-packages/nltk/downloader.py", line 875, in info
self._update_index()
File "/usr/lib/python2.7/site-packages/nltk/downloader.py", line 825, in _update_index
ElementTree.parse(compat.urlopen(self._url)).getroot())
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1182, in parse
tree.parse(source, parser)
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 656, in parse
parser.feed(data)
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1642, in feed
self._raiseerror(v)
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: syntax error: line 1, column 49
Note that I have tried all of the methods below to download NLTK data:
nltk.download()
nltk.download('all')
python -m nltk.downloader all
But I receive the same error with every method.
Does anybody have any idea why I am getting this error and how to download the NLTK data?
Any help would be appreciated!
Let's see: Your downloader opens the xml document that lists the available downloads, tries to parse it, and gets an error:
ElementTree.parse(compat.urlopen(self._url)).getroot())
Either (very unlikely) the nltk site is no longer compatible with Python 2.7, or you're not actually receiving the expected XML document because there's something wrong with your connection. Are you behind a proxy? If not, something else is probably wrong with your connection.
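One way to test the connection theory is to fetch the index yourself and look at what actually comes back. The sketch below is a rough diagnostic, not part of NLTK: `parses_as_xml` and `diagnose_index` are hypothetical helpers, the URL must be supplied by the caller, and the code assumes Python 3 (the question uses Python 2.7, where `urllib2.urlopen` would take the place of `urllib.request.urlopen`).

```python
import urllib.request
import xml.etree.ElementTree as ElementTree


def parses_as_xml(data):
    """Return True if `data` (bytes or str) is well-formed XML."""
    try:
        ElementTree.fromstring(data)
    except ElementTree.ParseError:
        return False
    return True


def diagnose_index(url, opener=urllib.request.urlopen):
    """Fetch `url` and report whether the body is XML, plus proxy settings."""
    with opener(url) as response:
        body = response.read()
    return {
        "is_xml": parses_as_xml(body),
        # The first bytes usually reveal an HTML login or error page.
        "first_bytes": body[:80],
        # Proxy settings picked up from the environment, if any.
        "proxies": urllib.request.getproxies(),
    }
```

If `is_xml` is false and `first_bytes` shows HTML, something between you and the server (a proxy or a captive portal, for instance) is replacing the index. In the NLTK versions I have seen, the index URL is available as `nltk.downloader.Downloader.DEFAULT_URL`; treat that as an assumption and check it against your installed version.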