NLTK internals.py error - python

I've been trying to set up hunpos on my Windows system but am experiencing some issues.
The error I get is:
File "C:\Users\a\Desktop\x.py", line 25, in <module>
ht = HunposTagger('english.model')
File "C:\Python27-32\lib\site-packages\nltk-2.0.1rc4-py2.7-win32.egg\nltk\tag\hunpos.py", line 84, in __init__
verbose=verbose)
File "C:\Python27-32\lib\site-packages\nltk-2.0.1rc4-py2.7-win32.egg\nltk\internals.py", line 526, in find_binary
url, verbose)
File "C:\Python27-32\lib\site-packages\nltk-2.0.1rc4-py2.7-win32.egg\nltk\internals.py", line 510, in find_file
raise LookupError('\n\n%s\n%s\n%s' % (div, msg, div))
LookupError: ===========================================================================
NLTK was unable to find the hunpos-tag file!
Use software specific configuration paramaters or set the HUNPOS environment variable.
Searched in:
- C:\Users\a\
- .
- /usr/bin
- /usr/local/bin
- /opt/local/bin
- /Applications/bin
- C:\Users\a/bin
- C:\Users\a/Applications/bin
I'm guessing there's a bug in nltk's internals.py but not sure how to fix it. I added os.getcwd() to hunpos_paths in hunpos.py but it doesn't help.
Does anyone know why this is happening?
Thanks

Do you have the file english.model? If you do, set the environment variable HUNPOS to the directory that contains it, and run python again. If you still get an error, check that the directory appears in the list of locations searched.

Did you compile hunpos-tag on your own? If not, and you downloaded the binary from Google Code or elsewhere, could it be that the executable is actually an .exe file (I have no idea whether Windows needs the .exe extension for a file to be executable), while hunpos.py calls find_binary() to locate hunpos-tag, not hunpos-tag.exe? I don't know how find_binary() works, but this may be the problem.

from nltk.tag.hunpos import HunposTagger
ht = HunposTagger('english.model', 'hunpos-1.0-win/hunpos-tag.exe')
ht.tag('What is the airspeed of an unladen swallow ?'.split())
ht.close()
You need to pass the paths to the hunpos files as arguments.

Related

Receiving "pytesseract not in your path" error on the exact same code that used to work fine

I wrote this code several months ago and wanted to go back through it to clean it up and add some new features. It's a simple tool I used to take a picture of my screen and extract writable text from it. I am on a different computer from the one I originally wrote the code on, but I installed every module via the PyCharm module manager. Even so, I keep getting this error when I run the code, although I have located the package in my path. Any help would be greatly appreciated.
I've looked up several different variations of my problem but they all seem to have different causes and fixes, of course, none of which work for me.
if c == 2:
    img = ImageGrab.grab(bbox=(x1-5, y1-5, x2+5, y2+5))  # bbox specifies the region (left, top, right, bottom)
    img_np = np.array(img)
    frame = cv2.cvtColor(img_np, cv2.COLOR_BGR2GRAY)
    c = 0
    x = 0
    string = str(pytesseract.image_to_string(frame)).lower()
    print(string)
This is the only section of the code that references pytesseract other than of course "import pytesseract". Hopefully I can get this code up and running again and the pytesseract module in general as it is integral to many of my scripts. Thanks in advance for your help.
File "C:\Users\dante\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 184, in run_tesseract
proc = subprocess.Popen(cmd_args, **subprocess_args())
File "C:\Users\dante\Anaconda3\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Users\dante\Anaconda3\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/dante/Desktop/DPC/processes/screen_to_text.py", line 29, in <module>
string = str(pytesseract.image_to_string(frame)).lower()
File "C:\Users\dante\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 309, in image_to_string
}[output_type]()
File "C:\Users\dante\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 308, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "C:\Users\dante\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 218, in run_and_get_output
run_tesseract(**kwargs)
File "C:\Users\dante\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 186, in run_tesseract
raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path
The problem was my lack of understanding of the module. pytesseract is not an OCR engine; it is simply a wrapper that lets users call Google's Tesseract OCR. This means that, in order to use this package, a user must have Google's OCR installed (I downloaded mine from here: https://sourceforge.net/projects/tesseract-ocr-alt/files/).
This does NOT, however, solve the full problem. The pytesseract package needs to know where the actual OCR program is located. On line 35 of the pytesseract.py script there is a line that tells pytesseract where to find the actual Tesseract program:
tesseract_cmd = 'tesseract'
If you are on Windows and you haven't manually added Tesseract to your path (if you don't know what that means, just follow the next steps), then you need to replace that line with the actual location of the Tesseract executable on your computer. Replacing that line with
tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
should allow you to run pytesseract, assuming you have installed everything correctly. It took me quite a bit longer than I would care to admit to find the blatantly obvious solution to this issue, but hopefully people with this problem in the future will resolve it faster than I did! Thanks and have a good day.
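Rather than editing pytesseract.py inside site-packages, the same setting can usually be applied from your own script, because the wrapper reads the module-level variable pytesseract.pytesseract.tesseract_cmd. The following is a minimal sketch, assuming Tesseract is either on PATH or in one of the candidate locations listed; the helper name find_tesseract and the candidate paths are illustrative and not part of pytesseract.

```python
import os
import shutil


def find_tesseract(name="tesseract", candidates=()):
    """Return a path to the Tesseract executable, or None if it is not found.

    Looks on PATH first, then falls back to the explicit candidate paths.
    """
    on_path = shutil.which(name)
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Hypothetical install locations; adjust to wherever Tesseract lives on your machine.
WINDOWS_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)

if __name__ == "__main__":
    exe = find_tesseract(candidates=WINDOWS_CANDIDATES)
    if exe is None:
        print("Tesseract not found; install it or extend WINDOWS_CANDIDATES")
    else:
        try:
            import pytesseract
        except ImportError:
            print("pytesseract is not installed")
        else:
            # Point the wrapper at the executable without touching site-packages.
            pytesseract.pytesseract.tesseract_cmd = exe
            print("Using", exe)
```

Setting the variable in your own code survives package upgrades, whereas an edit inside site-packages is lost whenever pytesseract is reinstalled.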

Tensorflow failed to create a newwriteablefile when retraining inception

I am following this tutorial: https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/?utm_campaign=chrome_series_machinelearning_063016&utm_source=gdev&utm_medium=yt-desc#4
I am running this part of the code:
python retrain.py \
--bottleneck_dir=bottlenecks \
--how_many_training_steps=500 \
--model_dir=inception \
--summaries_dir=training_summaries/basic \
--output_graph=retrained_graph.pb \
--output_labels=retrained_labels.txt \
--image_dir=flower_photos
Here is the error that I get after it finds the images, creates a bunch of bottlenecks, and runs the training steps.
Traceback (most recent call last):
File "retrain.py", line 1062, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "retrain.py", line 905, in main
f.write('\n'.join(image_lists.keys()) + '\n')
File "C:\Anaconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 101, in write
self._prewrite_check()
File "C:\Anaconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 87, in _prewrite_check
compat.as_bytes(self.__name), compat.as_bytes(self.__mode), status)
File "C:\Anaconda3\lib\contextlib.py", line 66, in __exit__
next(self.gen)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: Failed to create a NewWriteableFile: /tmp/output_labels.txt : The system cannot find the path specified.
You can find all my code here:
https://github.com/officialgupta/MachineLearningRecipes
Thanks
I have also run into similar errors. If I understand correctly, you need to set absolute paths for --output_graph and --output_labels.
For example:
--output_graph=/home/%your_home_user_name_folder%/Inception_retrained_graph.pb
--output_labels=/home/%your_home_user_name_folder%/Inception_retrained_labels.txt
I had the same issue. The only thing I had to do was reduce the length of the path.
For example:
C:\Users\test\lib\Workspace\DataScience\Bachelorarbeit\ba_test\src\saved_models\neural20201029-235456-a0.8775
instead of
C:\Users\test\lib\Workspace\DataScience\Bachelorarbeit\ba_test\src\saved_models\neural20201029-234822Arg-e1-b512-l1-n256-oadam-z0.005-r0-d0-a0.8803.
In a similar case, I got errors when I tried to execute this command:
writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
Then I found that output_path was empty. So you need to make sure an absolute path is provided.
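The answers so far point at three different path problems: a relative path, a path that is too long, and an empty path. A small diagnostic run before training can tell them apart. This is only a sketch using the standard library; the helper name check_output_path is hypothetical, and the 260-character threshold is the classic Windows MAX_PATH limit, which may not apply if long-path support is enabled.

```python
import os
import tempfile

WINDOWS_MAX_PATH = 260  # classic Windows limit; long-path support may lift it


def check_output_path(path):
    """Return a list of human-readable problems found with an output path."""
    if not path:
        return ["path is empty"]
    problems = []
    absolute = os.path.abspath(path)
    if not os.path.isabs(path):
        problems.append("path is relative; resolves to " + absolute)
    if not os.path.isdir(os.path.dirname(absolute)):
        problems.append("parent directory does not exist")
    if len(absolute) >= WINDOWS_MAX_PATH:
        problems.append("path is %d characters long" % len(absolute))
    return problems


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    for candidate in ("", "retrained_labels.txt",
                      os.path.join(base, "missing", "retrained_graph.pb")):
        print(repr(candidate), "->", check_output_path(candidate))
```

An empty result means none of the three problems was detected for that path; it does not guarantee that the training script will accept it.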
I might be late, but I am posting an answer in the hope that it is useful to anyone facing a similar issue.
Today I came across a similar issue while retraining the Inception model in TensorFlow and followed some steps to correct it.
There are two things we need to take care of.
1. Activate TensorFlow before using TensorFlow commands.
source ~/tensorflow/bin/activate
2. Use the full paths of the files mentioned in your terminal commands, as suggested by @Nikita Verbitskiy in their answer.
Try the command below on Windows 10 to solve the problem:
python -m retrain
--bottleneck_dir=bottlenecks
--how_many_training_steps=500
--model_dir=models
--summaries_dir=tf_files
--output_graph=retrained.pb
--output_labels=retrained_labels.txt
--architecture="mobilenet_0.50_224"
--image_dir=flower_photos
I faced the same problem and managed to fix it.
Just to clear some things up, my code runs without:
setting an absolute path
activating tensorflow
On the first run everything was saved as expected; on the second run I had the same issue as described.
For me it was enough not to set
--save_model_dir
but only setting
--output_labels
--output_graph

os.walk() not processing subdirectories when using UNC paths

I'm having trouble with os.walk() in Python 2.7.8 on Windows.
When I supply it with a 'normal' path such as "D:\Test\master" it works as expected. However, when I supply it with a UNC path such as "\\?\D:\Test\master", it reports the root directory as expected but does not drill down into the subdirectories, nor does it raise an exception.
My research: I read on the help page that os.walk() accepts a function argument to handle errors. By default this argument is None so no error is reported.
I passed a simple function to print the error and received the following for every directory.
def WalkError(Error):
    raise Exception(Error)
Stack trace:
Traceback (most recent call last):
File "Compare.py", line 988, in StartServer
for root, dirs, files in os.walk(ROOT_DIR,True,WalkError):
File "C:\Program Files (x86)\Python2.7.8\lib\os.py", line 296, in walk
for x in walk(new_path, topdown, onerror, followlinks):
File "C:\Program Files (x86)\Python2.7.8\lib\os.py", line 281, in walk
onerror(err)
File "Compare.py", line 62, in WalkError
raise Exception(Error)
Exception: [Error 123] The filename, directory name, or volume label syntax is incorrect: '\\\\?\\D:\\Test\\master\\localization/*.*'
Answer from the original author (originally posted as an edit to the question):
Instant update: In the process of inspecting \lib\os.py, I discovered the error stems from os.listdir(). I searched for the above error message in relation to os.listdir() and found this solution which worked for me.
It looks like if you're going to use UNC-style paths with the os module's functions, they need to be Unixised (have their \ converted to /). \\\\?\\D:\\Test\\master\\ becomes //?/D:/Test/master/ (note: you no longer need to escape the \, which is handy).
This runs counter to the UNC 'spec' so be aware if you're working with other modules which respect Microsoft's UNC implementation.
(Sorry for the self-solution, I was going to close the tab but felt there was knowledge here which couldn't be found elsewhere.)
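The two ideas in this thread, surfacing the errors that os.walk() silently swallows and converting backslashes to forward slashes, can be sketched as follows. This is written for Python 3 rather than the Python 2.7.8 used in the question, and the helper names unixise and walk_collecting_errors are made up for illustration. Whether the slash conversion is still needed on a given Python version is not something this sketch verifies.

```python
import os
import tempfile


def unixise(path):
    """Replace backslashes with forward slashes, as described in the answer above."""
    return path.replace("\\", "/")


def walk_collecting_errors(top):
    """Walk top and return (visited directories, errors os.walk would otherwise hide)."""
    errors = []
    # os.walk calls onerror with the OSError instance instead of raising it.
    visited = [root for root, dirs, files in os.walk(top, onerror=errors.append)]
    return visited, errors


if __name__ == "__main__":
    print(unixise("\\\\?\\D:\\Test\\master\\"))

    base = tempfile.mkdtemp()
    os.mkdir(os.path.join(base, "sub"))
    print(walk_collecting_errors(base))
    print(walk_collecting_errors(os.path.join(base, "missing")))
```

Collecting the errors in a list, instead of raising from the callback as WalkError does, lets the walk continue past a bad directory while still recording what went wrong.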

File does not exist error with 'w' mode

I am encountering odd behaviour from the file() builtin. I am using the unittest-xml-reporting Python package to generate results for my unit tests. Here are the lines that open a file for writing, a file which (obviously) does not exist:
report_file = file('%s%sTEST-%s.xml' % \
(test_runner.output, os.sep, suite), 'w')
(code is taken from the package's Github page)
However, I am given the following error:
...
File "/home/[...]/django-cms/.tox/pytest/local/lib/python2.7/site-packages/xmlrunner/__init__.py", line 240, in generate_reports
(test_runner.output, os.sep, suite), 'w')
IOError: [Errno 2] No such file or directory: './TEST-cms.tests.page.NoAdminPageTests.xml'
I found this weird because, as the Python docs state, if the w mode is used, the file should be created if it doesn't exist. Why is this happening and how can I fix this?
From man 2 open:
ENOENT O_CREAT is not set and the named file does not exist. Or, a
directory component in pathname does not exist or is a dangling
symbolic link.
Take your pick :)
In human terms:
your current working directory, ./, has been removed by the time this command is run,
./TEST-cms.tests.page.NoAdminPageTests.xml exists but is a symlink pointing to nowhere,
"w" in your open/file call is somehow messed up, e.g. if you redefined the file builtin.
file will create a file, but not a directory. You have to create the directory first, as seen here
It seems that the file was being created in a directory that had already been deleted (the path was given as ., and most probably the test directory had been deleted by that point).
I managed to fix this by supplying an absolute path to test_runner.output and the result files are successfully created now.
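The behaviour described in these answers is easy to reproduce: 'w' mode creates the file but never its parent directory. The sketch below is in Python 3, where open() replaces the file() builtin, and the helper name open_for_write is hypothetical. It resolves the path to an absolute one and creates the missing directory before opening.

```python
import os
import tempfile


def open_for_write(path):
    """Open path in 'w' mode, creating any missing parent directories first."""
    absolute = os.path.abspath(path)
    os.makedirs(os.path.dirname(absolute), exist_ok=True)
    return open(absolute, "w")


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    target = os.path.join(base, "reports", "TEST-example.xml")
    try:
        open(target, "w")  # the "reports" directory does not exist yet
    except IOError as exc:
        print("plain open failed with errno", exc.errno)
    with open_for_write(target) as handle:
        handle.write("<testsuite/>\n")
    print(os.path.isfile(target))
```

Resolving to an absolute path up front also protects against the working directory being deleted or changed later in the test run.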

IOError: [Errno 2] No such file - Paramiko put()

I'm uploading a file via SFTP using Paramiko with sftp.put(localFile, remoteFile). I make the necessary directory first if needed with
makeCommand = 'mkdir -p "' + remotePath + '"'
ssh.exec_command(makeCommand)
This works sometimes, but I'm occasionally getting the following error:
sftp.put(localFile, remoteFile)
File "build/bdist.macosx-10.8-intel/egg/paramiko/sftp_client.py", line 565, in put
File "build/bdist.macosx-10.8-intel/egg/paramiko/sftp_client.py", line 245, in open
File "build/bdist.macosx-10.8-intel/egg/paramiko/sftp_client.py", line 635, in _request
File "build/bdist.macosx-10.8-intel/egg/paramiko/sftp_client.py", line 682, in _read_response
File "build/bdist.macosx-10.8-intel/egg/paramiko/sftp_client.py", line 708, in _convert_status
IOError: [Errno 2] No such file
despite the fact that the local file definitely exists (and localFile is the correct path to it) and the remote path is made.
There is discussion here and here on a similar problem but none of the points raised there have helped me. My server supports the df -hi command.
Has anyone any advice on this or a possible solution?
EDIT
After suggestions below I tried changing the working directory with sftp.chdir(remoteDirectory) but this call produced the exact same error as above. So it seems this isn't just an upload issue. Any ideas?
It seems to be a remote folder permission problem. Although the remote folder was made before the file was uploaded, it appears the permissions on the folder were preventing an upload.
The problem is linked to this issue - if I set open permissions on the folder I'll be uploading to before I upload, the program can upload fine. Although for a permission issue I should be getting IOError: [Errno 13] Permission denied, since I made the changes I haven't encountered any errors.
I'm not sure if it's the response the server is giving Paramiko which is the issue, or a bug in Paramiko itself which is causing IOError: [Errno 2] No such file instead of a Errno 13, but this appears to have solved the problem.
The put method has a confirm parameter which is enabled by default, which will do a stat on the file after transfer.
In my case, the remote server I was transferring the file to immediately moved any transferred files to another location for processing, which caused the stat to fail. Setting the confirm parameter to False resolved this.
From the paramiko source sftp_client.py:
def put(self, localpath, remotepath, callback=None, confirm=True):
:param bool confirm:
whether to do a stat() on the file afterwards to confirm the file
size (since 1.7.7)
The IOError is local, so (for whatever reason) it seems that your local python cannot find localFile. Safety checking this before the call might help tracking down the problem:
if os.path.isfile(localFile):
    sftp.put(localFile, remoteFile)
else:
    raise IOError('Could not find localFile %s !!' % localFile)
If you're positive that localFile does exist, then this could just be a path problem - is localFile on an absolute or relative path? Either way, the if statement above will catch it.
EDIT
Tracing through the paramiko files shows that line 245 of sftp_client.py (the one throwing the exception) is actually
fr = self.file(remotepath, 'wb')
which is quite misleading as paramiko throws an IOError for a remote file! My best guess now is that remoteFile is either a missing directory or a directory you don't have access to.
Out of interest, can you list the remote dir
sftp.listdir(path=os.path.dirname(remoteFile))
to check that it's there (or maybe it's there and you can write to it)?
Are you sure the directory has been created and it is your remote working directory?
Paramiko has its own methods for creating new directories and navigating the remote file system. Consider using something like:
sftp.mkdir(remotedirectory)
sftp.chdir(remotedirectory)
sftp.put(localfile, remotefile)
I faced the same issue. It was a silly mistake.
Just use sftp.stat(your remote directory) to check that it's there.
Then use sftp.put(localfileabsolutepath, remotedir+filename).
It will work for sure.
Had the same issue. In my case it was a timing problem:
self.mkdir(remote_dir)
sftp.put(local_file, remote_file)
The mkdir() function, which had
ssh.exec_command(f"mkdir -p {remote_dir}")
in it, didn't finish fast enough.
Changing the original code to
self.mkdir(remote_dir)
sleep(0.01)
sftp.put(local_file, remote_file)
fixed it.
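A fixed sleep only narrows the race. An alternative, sketched here under the assumption that the remote side runs a POSIX shell, is to block until the mkdir command has actually exited before uploading. exec_command() returns as soon as the command is started, and the stdout object it returns exposes the channel whose recv_exit_status() waits for completion. The function name mkdir_and_put is made up; ssh and sftp stand for an already connected paramiko.SSHClient and SFTPClient.

```python
import posixpath
import shlex


def mkdir_and_put(ssh, sftp, local_file, remote_dir, remote_name, confirm=True):
    """Create remote_dir, wait for mkdir to finish, then upload local_file.

    ssh is expected to behave like paramiko.SSHClient and sftp like
    paramiko.SFTPClient; both are passed in, so nothing is imported here.
    """
    stdin, stdout, stderr = ssh.exec_command("mkdir -p " + shlex.quote(remote_dir))
    # exec_command returns immediately; block until the remote command exits.
    status = stdout.channel.recv_exit_status()
    if status != 0:
        raise IOError("mkdir -p failed with exit status %d" % status)
    remote_file = posixpath.join(remote_dir, remote_name)
    # confirm=False skips the stat() that put() performs after the transfer.
    sftp.put(local_file, remote_file, confirm=confirm)
    return remote_file
```

Passing confirm=False covers the case described earlier, where the server moves the uploaded file away before the follow-up stat() can find it.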
