I am working with the Python Tesseract package (pytesseract), with sample code like the following:
import pytesseract
from PIL import Image
tessdata_dir_config = "--tessdata-dir \"/opt/homebrew/Cellar/tesseract-lang/4.1.0/share/tessdata/\""
image = Image.open("dataset/test.jpeg")
text = pytesseract.image_to_string(image, lang = "chi-sim", config = tessdata_dir_config)
print(text)
And I received the following error message:
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file /opt/homebrew/Cellar/tesseract-lang/4.1.0/share/tessdata/chi-sim.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'chi-sim' Tesseract couldn't load any languages! Could not initialize tesseract.')
As I understand it, the error occurs when Tesseract tries to read the file chi-sim.traineddata (the Simplified Chinese model). Below are the attempts I have made to fix the problem.
My development environment is macOS on an M1 Mac, and I installed tesseract and tesseract-lang from Homebrew. I am fairly sure the path specified above is where the data files are located, since when I call
print(pytesseract.get_languages(config = ""))
I get a long list of languages printed, including chi-sim.
Further, if we just use English instead of Chinese, the following code can successfully recognize the English texts in an image:
text = pytesseract.image_to_string(image)
I've tried to set the TESSDATA_PREFIX environment variable in several ways, including:
Using the config parameter, as in the original code.
Adding a global environment variable in PyCharm.
Adding the following line to the code:
os.environ["TESSDATA_PREFIX"] = "tesseract/4.1.1/share/tessdata/"
Adding the following line to bash_profile in the terminal:
export TESSDATA_PREFIX=/opt/homebrew/Cellar/tesseract-lang/4.1.0/share/tessdata/
Unfortunately, none of these worked.
It seems as if my chi-sim.traineddata file is somehow broken, so I downloaded the trained data file directly from GitHub (https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata) using the "Download" button on the right, and placed it in both the tesseract-lang directory and the original tesseract directory (where eng.traineddata is located). I've tried both, but neither works.
Are there any potential solutions to this issue?
The code works for me on Linux if I use lang="chi_sim" (with _ instead of -), because the file downloaded from the server is named chi_sim.traineddata, also with _ rather than -.
If I rename the file to chi-sim.traineddata, then I can use lang="chi-sim" (with - instead of _).
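A quick way to catch this kind of naming mismatch is to check which .traineddata file actually exists before calling pytesseract. The helper below is only a sketch: resolve_lang is a hypothetical name, and it assumes Tesseract looks for <lang>.traineddata directly inside the tessdata directory, as the error message in the question suggests.

```python
import os


def resolve_lang(tessdata_dir, lang):
    """Return the spelling of `lang` whose .traineddata file exists.

    Tries the name as given, then with '-' and '_' swapped.
    Raises FileNotFoundError if no candidate file is present.
    """
    candidates = [lang, lang.replace("-", "_"), lang.replace("_", "-")]
    for name in candidates:
        if os.path.isfile(os.path.join(tessdata_dir, name + ".traineddata")):
            return name
    raise FileNotFoundError(
        "no .traineddata file for %r in %s" % (lang, tessdata_dir)
    )
```

The returned name can then be passed as the lang argument of pytesseract.image_to_string.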
cv2.imread always returns None.
I am using Python 2.7 and OpenCV 2.4.6 on 64-bit Windows 7.
Maybe it's some kind of bug or permissions issue because the exact same installation of python and cv2 packages in another computer works correctly. Here's the code:
im = cv2.imread("D:\testdata\some.tif",CV_LOAD_IMAGE_COLOR)
I downloaded OpenCV from http://www.lfd.uci.edu/~gohlke/pythonlibs/#opencv. Any clue would be appreciated.
First, make sure the path is valid and does not contain any single backslashes. Check the other answers, e.g. https://stackoverflow.com/a/26954461/463796.
If the path is fixed but the image is still not loading, it might indeed be an OpenCV bug that is not resolved yet, as of 2013. cv2.imread is not working properly under Win32 for me either.
In the meantime, use LoadImage, which should work fine.
im = cv2.cv.LoadImage("D:/testdata/some.tif", cv2.cv.CV_LOAD_IMAGE_COLOR)
In my case the problem was the spaces in the path. After I moved the images to a path with no spaces it worked.
Try changing the direction of the slashes
im = cv2.imread("D:/testdata/some.tif",CV_LOAD_IMAGE_COLOR)
or add r to the beginning of the string to make it a raw string
im = cv2.imread(r"D:\testdata\some.tif",CV_LOAD_IMAGE_COLOR)
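To see why the original path fails, here is a small sketch using only the "D:\testdata" part of the path from the question. In a normal string literal Python turns "\t" into a tab character, so the path handed to imread is not the one you typed.

```python
bad = "D:\testdata"    # "\t" is interpreted as a TAB character
raw = r"D:\testdata"   # raw string keeps the backslash literally
fwd = "D:/testdata"    # forward slashes need no escaping

print("\t" in bad)          # True: the path silently contains a tab
print("\t" in raw)          # False
print(len(bad), len(raw))   # 10 11: the tab replaced two characters
```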
I also ran into the same issue on Ubuntu 18.04.
cv2.imread(path)
I solved it by changing the path argument from a relative file path to an absolute file path.
Hope this is useful.
Just stumbled upon this one.
The solution is very simple but not intuitive.
If you use relative paths, you can use either '\' or '/', as in test\pic.jpg or test/pic.jpg respectively.
If you use absolute paths, you should only use '/', as in /.../test/pic.jpg for Unix or C:/.../test/pic.jpg for Windows.
To be on the safe side, just use for root, _, files in os.walk(<path>): in combination with abs_path = os.path.join(root, file). Calling imread afterwards, as in img = ocv.imread(abs_path), should then work.
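The os.walk suggestion above could be sketched roughly like this. collect_image_paths and its default extension list are my own illustrative choices, not part of OpenCV.

```python
import os


def collect_image_paths(root_dir, extensions=(".jpg", ".jpeg", ".png", ".tif")):
    """Walk `root_dir` and return absolute paths of files with the given extensions."""
    paths = []
    for root, _, files in os.walk(os.path.abspath(root_dir)):
        for name in files:
            # Compare in lower case so that e.g. "PHOTO.JPG" is also matched.
            if name.lower().endswith(extensions):
                paths.append(os.path.join(root, name))
    return sorted(paths)
```

Each returned path can then be passed to imread one by one.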
In case no one has mentioned it in this question, another workaround is to read the image with plt (matplotlib.pyplot) and then convert it to BGR format.
img=plt.imread(img_path)
print(img.shape)
img=img[...,::-1]
This has been mentioned in "cv2.imread does not read jpg files".
This took a long time to resolve. First make sure that the file is in the directory, and check its actual extension: even though Windows Explorer says the file type is "JPEG", the extension may actually be ".jpg". The first print statement is key to making sure that the file actually exists. I am a total beginner, so if the code sucks, so be it.
The code just imports a picture and displays it. If the code finds the file, True will be printed in the Python window.
import cv2
import sys
import numpy as np
import os
image_path = "C:/python27/test_image.jpg"
print(os.path.exists(image_path))  # True if the file exists
CV_LOAD_IMAGE_COLOR = 1  # flag 1 gives a colour image; use 0 for a grayscale one
img = cv2.imread(image_path, CV_LOAD_IMAGE_COLOR)
print(img.shape)
cv2.namedWindow('Display Window') ## create window for display
cv2.imshow('Display Window', img) ## Show image in the window
cv2.waitKey(0) ## Wait for keystroke
cv2.destroyAllWindows() ## Destroy all windows
I had a similar problem; renaming the image so that it used only English letters worked for me. Also, it didn't work with a numeric name (e.g. 1.jpg).
My OS is Windows 10. I noticed imread is very sensitive to the path. None of the recommendations about slashes worked for me, so here is how I solved the problem: I placed the file in the project folder and typed:
img = cv2.imread("MyImageName.jpg", 0)
So without any path or folder, just the file name. That worked for me.
Also try different files from different sources and of different formats
I spent some time on this only to find that the error was caused by a broken image file in my case. So please manually check your file to make sure it is valid and can be opened by common image viewers.
I had a similar issue; changing the direction of the slashes worked:
Change / to \
In my case, changing the file names to the Latin alphabet helped.
Instead of renaming all the files, I wrote a simple wrapper that renames a file to a random GUID before loading it and renames it back right after the load.
import logging
import os
import uuid

import cv2

logger = logging.getLogger(__name__)


def wrap_file_rename(my_path, function):
    # Temporarily rename the file to a random ASCII-only name, call
    # `function` on it, then restore the original name.
    directory = os.path.dirname(my_path)
    new_full_name = os.path.join(directory, str(uuid.uuid4()))
    os.rename(my_path, new_full_name)
    try:
        return function(new_full_name)
    except Exception as error:
        logger.error(error)  # use your logger here
    finally:
        # Runs whether or not the load succeeded.
        os.rename(new_full_name, my_path)


def my_image_read(my_path, param=None):
    return wrap_file_rename(
        my_path,
        lambda p: cv2.imread(p) if param is None else cv2.imread(p, param),
    )
Sometimes the file is corrupted. If it exists and cv2.imread returns None this may be the case.
Try opening the file from the file explorer and see if that works.
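Besides opening the file in a viewer, a rough programmatic check is to look at the first bytes of the file. The sketch below (looks_like_image is a hypothetical helper) only recognises the JPEG and PNG signatures and only inspects the header, so a file that passes can still be damaged further in.

```python
def looks_like_image(path):
    """Return 'jpeg', 'png', or None based on the first bytes of the file."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header.startswith(b"\xff\xd8\xff"):   # JPEG start-of-image marker
        return "jpeg"
    if header == b"\x89PNG\r\n\x1a\n":       # 8-byte PNG signature
        return "png"
    return None
```

If this returns None for a file named something.jpg, the file is probably not what its extension claims.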
I've run into this. Turns out the PIL module provides this functionality.
Similarly, numpy.imread and scipy.misc.imread both didn't exist until I installed PIL
In my configuration (win7 python2.7), that was done as follows:
cd /c/python27/scripts
easy_install PIL
So I found a Python script that I think would be extremely useful to me. It allegedly will sort photos into "blurry" or "not blurry" folders.
I'm very much a Python newb, but I managed to install Python 3.7, numpy, and OpenCV. I put the script in a folder with a bunch of .jpg images and run it by typing in the command prompt:
python C:\Users\myName\images\BlurDetection.py
When I run it though it just immediately returns:
Done. Processed 0 files into 0 blurred, and 0 ok.
No error messages or anything. It just doesn't do what it's supposed to do.
Here's the script.
#
# Sorts pictures in current directory into two subdirs, blurred and ok
#
import os
import shutil
import cv2
FOCUS_THRESHOLD = 80
BLURRED_DIR = 'blurred'
OK_DIR = 'ok'
blur_count = 0
files = [f for f in os.listdir('.') if f.endswith('.jpg')]
try:
    os.makedirs(BLURRED_DIR)
    os.makedirs(OK_DIR)
except:
    pass
for infile in files:
    print('Processing file %s ...' % (infile))
    cv_image = cv2.imread(infile)
    # Convert to grayscale
    gray = cv2.cvtColor(cv_image, cv2.COLOR_BGR2GRAY)
    # Compute the Laplacian of the image and then the focus
    # measure is simply the variance of the Laplacian
    variance_of_laplacian = cv2.Laplacian(gray, cv2.CV_64F).var()
    # If below threshold, it's blurry
    if variance_of_laplacian < FOCUS_THRESHOLD:
        shutil.move(infile, BLURRED_DIR)
        blur_count += 1
    else:
        shutil.move(infile, OK_DIR)
print('Done. Processed %d files into %d blurred, and %d ok.' % (len(files), blur_count, len(files)-blur_count))
Any thoughts why it might not be working or what is wrong? Please advise.
Thanks!!
Your script and photos are here:
C:\Users\myName\images
But that is not your working directory, i.e. Python looks for your photos in whatever this returns to you:
print(os.getcwd())
To make Python look for the files in the right folder, simply do (note the r prefix, so the backslashes are not treated as escape sequences):
os.chdir(r'C:\Users\myName\images')
Now it will be able to find the files.
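An alternative to changing the working directory is to build the file list from an explicit folder, for example the folder the script itself lives in. A minimal sketch (find_jpgs is a hypothetical helper, not part of the original script; unlike the original endswith('.jpg') check it also matches .JPG and .jpeg):

```python
from pathlib import Path


def find_jpgs(folder):
    """Return the .jpg/.jpeg files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"}
    )


# In a script, pass the script's own folder instead of relying on os.getcwd():
# files = find_jpgs(Path(__file__).resolve().parent)
```

The returned Path objects are absolute if the folder is, so they can be converted with str() and passed to cv2.imread regardless of where the script was started from.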
If .jpg images are already present in the directory, check whether the folders "blurred" and "ok" already exist there. The error could be because these two folders are already present. I ran this code and it worked for me, but when I re-ran it without deleting the blurred and ok folders, I got the same error.
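Whether or not leftover folders are the cause here, creating each folder independently with exist_ok=True makes re-runs safe, since one existing folder no longer prevents the other from being created. A small sketch (ensure_dirs is a hypothetical helper; the folder names are the ones from the script):

```python
import os


def ensure_dirs(base, names=("blurred", "ok")):
    """Create each output folder under `base` if it does not exist yet."""
    created = []
    for name in names:
        path = os.path.join(base, name)
        os.makedirs(path, exist_ok=True)  # no error if the folder is already there
        created.append(path)
    return created
```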
I am using a virtual environment, which I called cv. I am trying to read a .cr2 raw image into a numpy array using OpenCV.
Using:
import cv2
img = cv2.imread("raw.cr2")
print img
Returns:
None
Always.
I believe the problem is the path to raw.cr2, which apparently cannot be found. I have tried passing the absolute path of the file to imread. My file is in the home folder (~), where I run Python from. I know the path is the issue because if I run os.path.exists(path), it always returns False.
Lastly, I also tried reading raw.cr2 using scipy.misc:
img = scipy.misc.imread(path)
Returns:
IOError: cannot identify image file 'raw.cr2'
Don't know if you ever solved this. I recently encountered the same problem (using ArchLinux) and found it was a permissions issue. Had to chown the images I wanted to use. Silly me.
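Since imread just returns None without saying why, a quick check like the following can tell a missing file apart from a permissions problem. This is only a sketch: diagnose_path is a hypothetical helper, and it also expands a leading ~, which would otherwise be treated as a literal directory name.

```python
import os


def diagnose_path(path):
    """Report whether `path` is missing, unreadable, or fine for the current user."""
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.exists(full):
        return "missing: " + full
    if not os.access(full, os.R_OK):
        return "not readable: " + full
    return "ok: " + full
```

For example, print(diagnose_path("~/raw.cr2")) before calling imread.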
I'm bumping into an error that's driving me a little bit crazy, using the Python wrapper for Tesseract, which is a Python module called tesseract.
Here's the python code I am trying to run :
img = cv2.imread(image, 0)
api = tesseract.TessBaseAPI()
api.Init(".","eng",tesseract.OEM_DEFAULT)
api.SetPageSegMode(tesseract.PSM_AUTO)
tesseract.SetCvImage(img,api)
url = api.GetUTF8Text()
conf=api.MeanTextConf()
print('Extracted URL : ' + url)
api.End()
and this is what I get:
Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
I don't understand why it is doing this since I have the TESSDATA_PREFIX env variable correctly set to the correct path to my tesseract installation (with the trailing slash).
When I try to run Tesseract directly from powershell (I'm on windows 7 btw), by doing:
tesseract.exe .\data\test.tif -psm 7 out
it works like a charm !
Also when I call Tesseract with Popen in my python script it works fine but I don't like the idea of me not being able to grab the OCR'd text directly from stdout. Indeed, there seems to be no other choice than providing Tesseract with an output filename and then to fopen and read from that file. I feel it's going to be pretty awful to deal with temporary text files just to get the output of the OCR...
Help?
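The first parameter to api.Init should be your TESSDATA_PREFIX, i.e. the parent directory of your "tessdata" directory (as the error message says), rather than ".".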
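Judging from the error message in the question, where Init(".") led to a lookup of ./tessdata/eng.traineddata, this Tesseract version appends tessdata/<lang>.traineddata to whatever prefix it is given. Under that assumption, a small sketch to verify a prefix before calling Init (both function names are hypothetical):

```python
import os


def expected_traineddata(prefix, lang="eng"):
    """Path that would be looked up for `lang`, given the datapath prefix."""
    return os.path.join(prefix, "tessdata", lang + ".traineddata")


def check_prefix(prefix, lang="eng"):
    """Return (found, path) so a wrong prefix is easy to spot."""
    path = expected_traineddata(prefix, lang)
    return os.path.isfile(path), path
```

If check_prefix returns False for your prefix, the printed path shows where the file would have to be.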
Get the location of your tessdata folder by typing in the command prompt:
$ brew list tesseract
In my case:
/usr/local/Cellar/tesseract/3.05.01/bin/tesseract
/usr/local/Cellar/tesseract/3.05.01/include/tesseract/ (27 files)
/usr/local/Cellar/tesseract/3.05.01/lib/libtesseract.3.dylib
/usr/local/Cellar/tesseract/3.05.01/lib/pkgconfig/tesseract.pc
/usr/local/Cellar/tesseract/3.05.01/lib/ (2 other files)
/usr/local/Cellar/tesseract/3.05.01/share/man/ (11 files)
/usr/local/Cellar/tesseract/3.05.01/share/tessdata/ (28 files)
Now pass that tessdata directory via the config string:
tessdata_dir_config = r'--tessdata-dir "/usr/local/Cellar/tesseract/3.05.01/share/tessdata"'
txt= image_to_string(img,lang='eng',config=tessdata_dir_config)