I am new to Python and I am using Melissa Dell's layoutparser package to extract data from an image of a table (image not shown).
My code so far is:
pip install layoutparser[ocr]
import layoutparser as lp
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
import numpy as np
import cv2
from google.cloud.vision_v1 import types
import json
import re
from google.cloud import vision
pip show google-cloud-vision
ocr_agent = lp.GCVAgent.with_credential('mycredebtials.json',
languages = ['es'])
img = plt.imread(r'D:\pdfDispacher.do_Página_2.jpg')  # cv2.IMREAD_COLOR is an OpenCV flag, not a plt.imread argument
print(img)
plt.imshow(img)
res = ocr_agent.detect(img, return_response=True)
texts = ocr_agent.gather_text_annotations(res)
layout = ocr_agent.gather_full_text_annotation(res, agg_level=lp.GCVFeatureType.WORD)
lp.draw_box(img, layout)
lp.draw_text(img, layout, font_size=12, with_box_on_text=True,
text_box_width=1)
What I need is to have Python pick up all the rows and columns of the table and save them in CSV format, but I have not been able to get this done.
I would really appreciate it if anyone could help me with the next steps.
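One possible way to get from word-level boxes to a CSV is to group words whose vertical centres are close together into rows, then sort each row from left to right. The sketch below is not layoutparser's API: it works on plain (text, x, y) tuples, which are assumed to have been extracted from the `layout` object beforehand, and `row_tolerance` is a hypothetical parameter that would need tuning for the scanned page. It also treats every word as its own cell, so multi-word cells would need an extra merging step.

```python
import csv
import io


def words_to_rows(words, row_tolerance=10):
    """Group (text, x, y) tuples into rows, top to bottom and left to right.

    words: iterable of (text, x_center, y_center); hypothetical input format.
    row_tolerance: max vertical distance in pixels between a word and the
        first word of its row; an assumed value, tune it per image.
    """
    rows = []  # each entry: [reference_y, [(x, text), ...]]
    for text, x, y in sorted(words, key=lambda w: w[2]):
        if rows and abs(y - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append([y, [(x, text)]])
    # Sort the cells of each row by x so they come out left to right.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


def rows_to_csv(rows):
    """Serialize rows to CSV text; write the string to a file to save it."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up word positions standing in for real OCR output.
sample_words = [
    ("Total", 40, 102), ("Nombre", 40, 50), ("35", 200, 100),
    ("Edad", 200, 52),
]
table = words_to_rows(sample_words)
csv_text = rows_to_csv(table)
```

Grouping by the first word of each row keeps the logic simple, but it can split a row on a skewed scan; clustering on column x-positions would be a natural next step.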
In this code, I am trying to use a pre-loaded machine learning model and a pre-defined feature extraction function to perform image segmentation on multiple images that I have in my Train_images directory. My code runs all the way until the line "print(file)", and at that point, I am not sure what happens. This is the code I am using to go through a training set of tif image files.
import glob
import pickle
import cv2  # needed for cv2.imread and cv2.cvtColor below
from matplotlib import pyplot as plt
filename = "sandstone_rf_model"
loaded_model = pickle.load(open(filename, 'rb'))
path = "images/Train_images/*.tif"
for file in glob.glob(path):
    print(file)  # just stop here to see all file names printed
    img1 = cv2.imread(file)
    img = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    # Call the feature extraction function.
    X = feature_extraction(img)
    result = loaded_model.predict(X)
    segmented = result.reshape((img.shape))
    name = file.split("e_")
    plt.imsave('images/Segmented/', segmented, cmap='jet')
Edit:
Previously I got a ValueError, but I changed my last line of code to the following:
plt.imsave('images/segmented_sandstone/'+name[1], segmented, cmap='jet')
Now I receive the following KeyError:
File "/Users/zeeshanpatel/opt/anaconda3/envs/master/lib/python3.7/site-packages/PIL/Image.py", line 2123, in save
    save_handler = SAVE[format.upper()]
KeyError: 'TIF'
The issue occurs in the last line of the code. Please let me know how I can change it to fix this.
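Use the tifffile library and replace the last line with tifffile.imwrite (called imsave in older releases). tifffile writes the array values as they are and, as far as I know, does not take a cmap argument, so it is dropped here:
import tifffile
tifffile.imwrite('images/segmented_sandstone/'+name[1], segmented)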
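The KeyError itself appears to come from the '.tif' extension: plt.imsave seems to hand the file suffix to Pillow as the format name, and Pillow registers the format as 'TIFF', not 'TIF'. Assuming that is the cause, another workaround is to build the output name from the input file's base name and give it a suffix Pillow recognizes. The helper name `segmented_path`, the sample file name, and the choice of '.tiff' are assumptions for illustration.

```python
from pathlib import Path


def segmented_path(input_file, out_dir, suffix=".tiff"):
    """Return out_dir/<input file stem><suffix> for a given input image path."""
    return Path(out_dir) / (Path(input_file).stem + suffix)


# Hypothetical input name; the real names come from glob.glob(path).
out = segmented_path("images/Train_images/Sandstone_1.tif",
                     "images/segmented_sandstone")
# In the loop this could then be used as, for example:
# plt.imsave(str(out), segmented, cmap='jet')
```

Using the file stem also avoids relying on file.split("e_"), which breaks if a name contains "e_" more than once or not at all. Passing format='tiff' to plt.imsave may be an equivalent fix that keeps the original file name.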
I'm trying to collect all images in a zip file into a numpy array.
I always get this error:
OSError: Cannot seek back after getting firstbytes!
import urllib.request
import os
import zipfile
import scipy
import numpy as np
import pandas as pd
import glob
import imageio
from os.path import splitext
url = 'https://github.com/yoavram/Sign-Language/raw/master/Dataset.zip'
filename = '../data/sign-lang'
if not os.path.exists('../data'):
    os.mkdir('../data')
if not os.path.exists(filename):
    urllib.request.urlretrieve(url, filename)
zf = zipfile.ZipFile(filename)
for file in zf.namelist():
    basename, extension = splitext(file)
    if extension == '.jpg':
        with zf.open(file) as img_file:
            img = imageio.read(img_file)
Help?
I had a similar problem for the longest time.
The problem is that imageio.read() is being given a file-like object from the zip archive that it cannot rewind, as the error message suggests; passing the raw bytes avoids this.
To fix this, simply read the bytes from the file.
img = imageio.read(img_file.read())
Also, if you want numpy arrays you should use imageio.imread() instead of imageio.read().
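Here is a minimal sketch of the read-the-bytes approach, using only the standard library so it runs without the dataset. It builds a small in-memory zip with placeholder contents (not real images) and collects the raw bytes of every .jpg member. With real data, each bytes value could then be passed to imageio.imread() and the resulting arrays stacked. The function name `read_jpg_bytes` and the member names are made up.

```python
import io
import zipfile
from os.path import splitext


def read_jpg_bytes(zf):
    """Return {member name: raw bytes} for every .jpg member of an open ZipFile."""
    contents = {}
    for name in zf.namelist():
        if splitext(name)[1].lower() == ".jpg":
            with zf.open(name) as member:
                contents[name] = member.read()  # bytes, not a file object
    return contents


# Build a tiny in-memory archive with placeholder payloads (not real JPEG data).
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("Dataset/0/a.jpg", b"fake-jpeg-a")
    zf.writestr("Dataset/0/b.JPG", b"fake-jpeg-b")
    zf.writestr("Dataset/readme.txt", b"not an image")

with zipfile.ZipFile(archive) as zf:
    jpg_bytes = read_jpg_bytes(zf)
```

Lower-casing the extension also picks up files named .JPG, which the original `extension == '.jpg'` test would skip.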
I am currently trying to load a CSV file of data into Spyder and I just can't figure it out. My code below raises a ValueError stating "could not convert string to float:"
My Code:
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('magnet lab.csv',delimiter=',',skiprows=2)
kimberlite = np.array(data[:,0])
forcekimberlite = np.array(data[:,1])  # second column; data[:1] would select the first row instead
plt.scatter(kimberlite,forcekimberlite,s=5,c='red',marker='o')
plt.xlim(0,5)
plt.xlabel('distance from center of magnet to kimberlite')
plt.ylabel('Force')
plt.title('Kimberlite results')
plt.show()
Try this for loading the csv file.
data = np.genfromtxt('magnet lab.csv',delimiter=',')
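As far as I know, np.genfromtxt replaces fields it cannot parse with nan instead of raising, so it loads the file but can hide the offending cells. To see which cells trigger "could not convert string to float", a small standard-library check like the one below may help. The function name `find_non_numeric` and the sample data are invented for illustration; the layout of the real file is unknown.

```python
import csv
import io


def find_non_numeric(lines, delimiter=",", skiprows=0):
    """Return (line_number, column_index, value) for each cell float() rejects."""
    problems = []
    reader = csv.reader(lines, delimiter=delimiter)
    for line_number, row in enumerate(reader, start=1):
        if line_number <= skiprows:
            continue  # header lines, same meaning as skiprows in np.loadtxt
        for column_index, value in enumerate(row):
            try:
                float(value)
            except ValueError:
                problems.append((line_number, column_index, value))
    return problems


# Invented sample: two header lines, then data with an empty cell and a unit suffix.
sample = io.StringIO(
    "Magnet lab\n"
    "distance,force\n"
    "0.5,1.20\n"
    "1.0,\n"
    "1.5,0.40 N\n"
)
bad_cells = find_non_numeric(sample, skiprows=2)
```

For the real file, the same function could be called on an open file handle, for example `with open('magnet lab.csv', newline='') as f: print(find_non_numeric(f, skiprows=2))`.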
What I'm trying to do is fairly simple when we're dealing with a local file, but the problem comes when I try to do this with a remote URL.
Basically, I'm trying to create a PIL image object from a file pulled from a URL. Sure, I could always just fetch the URL and store it in a temp file, then open it into an image object, but that feels very inefficient.
Here's what I have:
Image.open(urlopen(url))
It flakes out complaining that seek() isn't available, so then I tried this:
Image.open(urlopen(url).read())
But that didn't work either. Is there a Better Way to do this, or is writing to a temporary file the accepted way of doing this sort of thing?
In Python 3 the StringIO and cStringIO modules are gone; use io.BytesIO instead:
from PIL import Image
import requests
from io import BytesIO
response = requests.get(url)
img = Image.open(BytesIO(response.content))
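The reason BytesIO helps: Image.open wants a file-like object it can seek in, whereas a network response is a forward-only stream and raw bytes are not a file object at all. The sketch below illustrates only that idea with the standard library; `ForwardOnlyStream` is a made-up stand-in for an HTTP response, and no image decoding is done.

```python
import io


class ForwardOnlyStream(io.RawIOBase):
    """Hypothetical stand-in for a network response: readable but not seekable."""

    def __init__(self, data):
        super().__init__()
        self._inner = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, buffer):
        # Copy the next chunk into the caller's buffer; 0 signals end of stream.
        chunk = self._inner.read(len(buffer))
        buffer[:len(chunk)] = chunk
        return len(chunk)


stream = ForwardOnlyStream(b"\x89PNG placeholder bytes")
buffered = io.BytesIO(stream.read())  # copy everything into memory

can_seek_stream = stream.seekable()
can_seek_buffer = buffered.seekable()
buffered.seek(0)
first_bytes = buffered.read(4)
```

The cost of this approach is that the whole payload is held in memory, which is usually acceptable for single images.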
Using a StringIO (Python 2 only):
import urllib, cStringIO
file = cStringIO.StringIO(urllib.urlopen(URL).read())
img = Image.open(file)
The following works for Python 3:
from PIL import Image
import requests
im = Image.open(requests.get(url, stream=True).raw)
References:
https://github.com/python-pillow/Pillow/pull/1151
https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst#280-2015-04-01
Using requests (Python 2):
from PIL import Image
import requests
from StringIO import StringIO
response = requests.get(url)
img = Image.open(StringIO(response.content))
Python 3
from urllib.request import urlopen
from PIL import Image
img = Image.open(urlopen(url))
img
Jupyter Notebook and IPython
import IPython
url = 'https://newevolutiondesigns.com/images/freebies/colorful-background-14.jpg'
IPython.display.Image(url, width = 250)
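This method can also be used in a for loop, but a bare IPython.display.Image(...) expression inside a loop is not rendered; pass it to IPython.display.display() so that each image is shown.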
Use BytesIO to turn the downloaded bytes into a file-like object (Python 3):
from io import BytesIO
from PIL import Image
import urllib.request
Image.open(BytesIO(urllib.request.urlopen(url).read()))
For those doing some sklearn/numpy post-processing (e.g. deep learning), you can wrap the PIL object with np.array(). This might save you from having to Google it like I did:
from PIL import Image
import requests
import numpy as np
from io import BytesIO  # response.content is bytes, so BytesIO is needed in Python 3
response = requests.get(url)
img = np.array(Image.open(BytesIO(response.content)))
The arguably recommended way to do image input/output these days is to use the dedicated package ImageIO. Image data can be read directly from a URL with one simple line of code:
from imageio import imread
image = imread('https://cdn.sstatic.net/Sites/stackoverflow/img/logo.png')
Many answers on this page predate the release of that package and therefore do not mention it. ImageIO started out as a component of the Scikit-Image toolkit. It supports a number of scientific formats on top of the ones provided by the popular image-processing library Pillow. It wraps it all in a clean API solely focused on image input/output. In fact, SciPy removed its own image reader/writer in favor of ImageIO.
Select the image in Chrome, right-click on it, click "Copy image address", and paste the address into a str variable (my_url) to read the image:
import shutil
import requests
my_url = 'https://www.washingtonian.com/wp-content/uploads/2017/06/6-30-17-goat-yoga-congressional-cemetery-1-994x559.jpg'
response = requests.get(my_url, stream=True)
with open('my_image.png', 'wb') as file:
    shutil.copyfileobj(response.raw, file)
del response
Then open it:
from PIL import Image
img = Image.open('my_image.png')
img.show()
Manually wrapping the response in BytesIO is no longer needed as of Pillow 2.8.0. Just use Image.open(response.raw).
Adding on top of Vinícius's comment:
You should pass stream=True, as noted at https://requests.readthedocs.io/en/master/user/quickstart/#raw-response-content
So:
img = Image.open(requests.get(url, stream=True).raw)
Use urllib.request.urlretrieve() and PIL.Image.open() to download and read image data:
import requests
import urllib.request
import PIL.Image  # a plain "import PIL" does not load the Image submodule
urllib.request.urlretrieve("https://i.imgur.com/ExdKOOz.png", "sample.png")
img = PIL.Image.open("sample.png")
img.show()
Alternatively, call requests.get(url) with the address of the file to download. Wrap the content of the response in io.BytesIO to get an in-memory, file-like object, then pass that object to PIL.Image.open() to load the image data:
import io
response = requests.get("https://i.imgur.com/ExdKOOz.png")
image_bytes = io.BytesIO(response.content)
img = PIL.Image.open(image_bytes)
img.show()
from PIL import Image
import cv2
import numpy as np
import requests
image = Image.open(requests.get("https://previews.123rf.com/images/darrenwhi/darrenwhi1310/darrenwhi131000024/24022179-photo-of-many-cars-with-one-a-different-color.jpg", stream=True).raw)
# image = image.resize((420, 250))
image_array = np.array(image)
image
To get the image directly as a numpy array without calling PIL yourself (matplotlib still relies on Pillow internally to decode JPEG files):
import requests, io
import matplotlib.pyplot as plt
response = requests.get(url).content
img = plt.imread(io.BytesIO(response), format='JPG')
plt.imshow(img)