I have searched for this multiple times and found no simple resources.
Let's say I have a single image URL and I want to search for all images that look the same as that image (say >98% similarity, for argument's sake).
I simply want to return the number of images that look the same as my original, that is all.
I have searched Google multiple times, but found no convenient way to do this. I would prefer to use Python, but if that is not possible, it would be perfect if I could just call the method from a Python program.
Is it possible? How?
You can run this kind of search through Google's image search engine, because Google provides its own APIs, such as
https://ajax.googleapis.com/ajax/services/search/images
I also found a related question on Stack Overflow; take a look at "python search with image google images".
This is a problem better suited to a web service. TinEye.com is a reverse image search engine, and I suspect that you can use curl to send an HTTP request to this web service, and use e.g. BeautifulSoup to extract the number of matches.
I am aware that this can't be done with a bash script alone, at least as far as I know (and I'm still learning). This is why I'm asking for help. What else do I need? Are there specific tools?
This is what I'd like to do:
Upload an image to https://www.google.com/searchbyimage/upload
Then find all the identical images
Download the one which has the greatest resolution
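The last step of that list can be sketched on its own, independently of how the matches are found. This is only a minimal illustration in Python: the `Candidate` type and its fields are hypothetical, and it assumes the width and height of each match are already known.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One identical-looking image found by the search (hypothetical shape)."""
    url: str
    width: int
    height: int


def best_candidate(candidates: Iterable[Candidate],
                   min_width: int = 0,
                   min_height: int = 0) -> Optional[Candidate]:
    """Return the match with the most pixels, or None if none beats the original."""
    # Keep only matches strictly larger than the image we already have.
    bigger = [c for c in candidates
              if c.width > min_width and c.height > min_height]
    return max(bigger, key=lambda c: c.width * c.height, default=None)
```

Passing the resolution of the low-quality original as `min_width` and `min_height` avoids downloading a file that is no better than the one already on disk.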
So far I've been able to upload an image to Searchbyimage through curl. This uploaded image then creates a very long token that is used to search similar images, with some supplementary keywords.
The uploaded image creates a link composed like so:
https://www.google.com/search?tbs=sbi:
After this is the awfully long token: AMhZZith3JfR2OzwmuyQjufBifvdFWNjMShRMypWIE2-g005QfYLeTATLhGHAWz8MLI-tbgHzZp-bREPlJbsNWhY7U4Z2_19bu0oHII6VJPIVVJSPANODqnrJXp6X5VKKoXHMLcBCmI9eIpxS_1EX9g9YJPFL2XFEfJqIApLX83erP5mlRM7rSiIF5Te_1RPNyVkp4IPZPBRtoOKGhpDw2xad-JZsqd2ai4F5sMvyO2A_18PMFKg21nTRH_1jVeOeUhz8U5zkL4lycIg3kafAYlNy8YwmjSFcmc2nZB_10t9MFyi2BnBmemDRp4DCACI0FVM6pLTIB8VCBpU9A
And it adds this at the end: &hl=fr.
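Pulling the token back out of such a link is plain URL parsing. The sketch below is in Python and assumes only the link shape described above (a `tbs` parameter whose value starts with `sbi:`); the function name is made up for illustration, and how the link is obtained from the upload response is left open.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_sbi_token(url: str) -> Optional[str]:
    """Return the token that follows 'tbs=sbi:' in a search URL, if present."""
    query = parse_qs(urlsplit(url).query)
    for value in query.get("tbs", []):
        if value.startswith("sbi:"):
            return value[len("sbi:"):]
    # No tbs=sbi:... parameter in this URL.
    return None
```

For example, `extract_sbi_token("https://www.google.com/search?tbs=sbi:TOKEN&hl=fr")` yields the part between `sbi:` and `&hl=fr`.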
Finally the image is searched, and I have the choice between clicking "similar images" or "all sizes" (it's "all sizes" I want, as "similar images" doesn't ensure the results will be identical). This adds some keywords from Google's analysis of the picture (here, a photograph of Émile Zola) and creates a second token:
The picture I searched here
https://www.google.com/search?safe=strict&hl=fr&
q=emile+zola&tbm=isch
&tbs=simg:
CAQSmQEJthA57uIOXdcajQELEKjU2AQaBggXCD0IQgwLELCMpwgaYgpgCAMSKLQZ9QH3BLMZ2A6xGdcO3w70Ad0OwjrEOqEuwzqiLsE67iSTLoM4oC4aMIk1iw7XQn7Wu55hLB2k-bnfW3_1yf24eA0N-w-baKvWkDj48J67yZZS-uQ-BgjCRQyAEDAsQjq7-CBoKCggIARIEnfZWUgw&sa=X&ved=0ahUKEwi965ashtrhAhWI3eAKHSmRCBwQ2A4IKygB
&biw=1920&bih=944
The resolution of the picture comes at the end. The idea is to recreate this second link, and then download the highest-resolution image among those Google has found. I have to get the token, but everything else can be derived from the picture file itself: the file is named after the picture, so its name can supply the keywords, and its resolution is easy to read. I'd like to turn this into a script to download higher-resolution versions of many paintings (over a thousand) that I only have in low quality. Ideally I'd use it quite often. So far I have found how to upload a picture with curl, and it gave me back a token, but an incomplete one. Beyond this, I am completely lost.
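Rebuilding that second link from its parts could look roughly like this in Python. It is a sketch under assumptions: the function names are hypothetical, the `simg` token still has to be obtained separately, and the `sa`, `ved`, `biw` and `bih` parameters from the example are left out on the unverified assumption that they are optional.

```python
import re
from pathlib import Path
from urllib.parse import urlencode


def keywords_from_filename(path: str) -> str:
    """Turn a file name such as 'Emile_Zola.jpg' into search keywords."""
    stem = Path(path).stem  # file name without directory or extension
    return re.sub(r"[_\-\s]+", " ", stem).strip().lower()


def build_all_sizes_url(keywords: str, simg_token: str, hl: str = "fr") -> str:
    """Assemble a link shaped like the 'all sizes' URL shown above."""
    params = {
        "safe": "strict",
        "hl": hl,
        "q": keywords,
        "tbm": "isch",
        "tbs": f"simg:{simg_token}",
    }
    # safe=":" keeps the colon in 'simg:' unescaped, as in the original link.
    return "https://www.google.com/search?" + urlencode(params, safe=":")
```

Spaces in the keywords are encoded as `+`, which matches the `q=emile+zola` form in the link above.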
In theory this doesn't seem impossible. The problem is that I'm too much of a newbie: I enjoy Linux and bash a lot so far, but I know very little. I have of course spent some hours googling beforehand, and nothing showed up that I knew how to use. There is nothing like it on GitHub either: there are a lot of scripts that search for similar images, but none that search for identical ones, and none that also compare the sizes of these images. There's also a Python API for reverse image searching, but it didn't seem able to search for identical images, and it seems tied to the Google API, which is problematic. All of this is probably hard for me only because I'm a beginner and don't know enough to build this script; on the other hand, maybe due to my lack of knowledge, it doesn't seem impossible at all, and I'm very willing to try, fail, try again, and learn. So here I am, to ask: how do I do that? Can it be done in bash only? If not, what else must I include? Or perhaps it cannot be done?
Lastly, I know there is a Google API for reverse image searching. That would be very useful if it weren't limited to a hundred image searches a day: if you want more, you have to pay. At 100 images a day, it would take me around eleven days to reverse search all the images I want in better quality; in the end, I'd be done just as fast by searching for them all by hand. Neither of these options seems to be a solution, yet this script doesn't seem impossible. It is only beyond my current abilities.
Thank you in advance if anyone has an idea!
PS: I can use Linux either through WSL or a virtual machine. Both work fine so far, with whatever command or package I've tried. WSL is much faster. And sorry for my English, I'm French!
Second PS: I've been asked to show what I had as code, but this doesn't get beyond this:
curl -i -F sch=sch -F encoded_image=@path/to/my/imagefile.jpg https://www.google.com/searchbyimage/upload
Which was a partial answer to my question I had found here:
How to use google search by image in curl
There are two fundamental ways to use the web programmatically:
via API: this is purpose built for computers to access web resources and always preferred. You follow strict rules and get well defined results back.
by crawling: this is when the computer pretends to be a user, emulating the clicking on links done in a browser. Basically curl, but over and over again with state stored in between, parameters generated correctly, encoding applied, etc.
As you say, there's an API available, so if it does what you want then it's the right way to go. The fact that it does what you want, but enforces limits, is a very useful sign that what you're trying to do has limits. Those limits will have been carefully set to incentivise you to work within them. Trying to crawl for the same results will likely breach either Google's terms of service or the limits of your sanity.
So if you really want to work around the API, then use a crawler library such as Python Scrapy. But note that the API limits might be a useful indication of how far you can expect to get without paying.
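To make "curl, but with state stored in between and parameters encoded correctly" concrete, here is a minimal standard-library sketch in Python. It is not Scrapy and not a full crawler; the function names, the user-agent string and the example.com URL are placeholders chosen for illustration.

```python
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import urlencode


def make_session() -> urllib.request.OpenerDirector:
    """Build an opener that remembers cookies between requests (the 'state')."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "example-crawler/0.1")]  # placeholder value
    return opener


def build_request(base_url: str, params: dict) -> urllib.request.Request:
    """Encode the query parameters properly instead of pasting strings together."""
    return urllib.request.Request(base_url + "?" + urlencode(params))


# Usage (performs a real network request, so it is left commented out):
# session = make_session()
# with session.open(build_request("https://example.com/search", {"q": "zola"}), timeout=10) as resp:
#     html = resp.read()
```

A library such as Scrapy adds scheduling, retries and throttling on top of this; none of it changes whether crawling a given site is permitted by its terms.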
I am looking for a way to do a reverse image search with Python. The input would be an image or an image URL, and as output I would like the number of times this image was found on the web, and maybe the URLs of the images found.
Is there an API I can use? Bing, Yahoo, Google? As far as I understand, it is not possible via Google… or is there a workaround?
In general I am interested in "measuring" how widespread an image is. If anyone has a suggestion, please tell me :)
Thanks!
I don't see this as possible (or very easy).
For a reverse image search, you would have to scan the entire internet, loading each page then checking if the image is there.
Personally, I would look into BeautifulSoup to read in the HTML data from Google's own reverse image search (or some page that implements Google's reverse image search).
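The "load a page, then check whether the image is there" step can be sketched as follows. This Python example uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra installs, and it makes a strong simplifying assumption: an image counts as "there" only if an `<img>` tag points at exactly the same URL, not if the pixels match. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the absolute URLs of all <img> tags on a page."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page's own URL.
                self.images.append(urljoin(self.page_url, src))


def page_contains_image(html: str, page_url: str, image_url: str) -> bool:
    """True if the page embeds an <img> whose resolved URL equals image_url."""
    collector = ImageCollector(page_url)
    collector.feed(html)
    return image_url in collector.images
```

Doing this for every page on the web is exactly what makes the task impractical without an existing search index.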
I need some image samples for machine learning training. I don't have enough resources right now, so I need to crawl some using a search engine. Google's API is no longer free, so I chose Bing.
I have tried pybing, but it does not seem to work now.
I don't know how to get the appid.
from py_bing_search import PyBingImageSearch

# image_filters is optional
bing_image = PyBingImageSearch('Your-Api-Key-Here', "x-box console", image_filters='Size:medium+Color:Monochrome')
first_fifty_result = bing_image.search(limit=50, format='json')  # results 1-50
print(first_fifty_result[0].media_url)
I am trying to rename the images in my massive pictures folder by running a Google image search on each image and naming it after the result shown next to "Best guess for this image: ". I understand that Google does have a Python API, but I am unsure whether it can be used in this way, or whether this is a reasonable project for someone of my limited experience.
https://developers.google.com/appengine/docs/python/images/usingimages#Uploading seems to be helpful but I'm not sure I understand what I need to be doing conceptually.
Another option is to use the drag-and-drop feature but I have not looked into that as much.
Thanks in advance for any guidance.
As far as I know, Google still doesn't offer a public API for its reverse image search service (i.e. you send a picture and get textual search results).
The most popular alternative that I know is TinEye ( http://www.tineye.com/ ). Here's a link to their RESTful API: http://services.tineye.com/TinEyeAPI
Similar to Reddit's r/pic sub-reddit, I want to aggregate media from various sources. Some sites use OEmbed specs to expose media on the page but not all sites do it. I was browsing through Reddit's source because essentially they 'scrape' links that users submit, retrieve images, videos etc. They create thumbnails which are then displayed along the link on their site. Now, I would like to do something similar and I looked at their code[1] and it seems that they have custom scrapers for each domain that they recognize and then they have a generic Scraper class that uses simple logic to get images from any domain (basically they retrieve the web-page, parse the html and then determine the largest image on the page which they then use to generate a thumbnail).
Since it's open source I can probably reuse the code for my application but unfortunately I have chosen Perl as this is a hobby project and I'm trying to learn Perl. Is there a Perl module which has similar functionality? If not, is there a Perl module that is similar to Python Imaging Library? It would be handy to determine the image sizes without actually downloading the whole image & thumbnail generation.
Thanks!
[1] https://github.com/reddit/reddit/blob/master/r2/r2/lib/scraper.py
Image::Size is the specialised module for determining image sizes from various formats. It should be enough to read the first 1000 octets or so from a resource, enough to cover the various image headers, into a buffer and operate on that. I have not tested this.
I do not know any general scraping module that has an API for HTTP range requests in order to avoid downloading the whole image resource, but it is easy to subclass WWW::Mechanize.
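The idea of reading only the header bytes can be illustrated independently of Perl. The sketch below is in Python, not a port of Image::Size: it handles only PNG and GIF (JPEG needs a segment scan and is omitted), the function names are hypothetical, and the `Range` header is a request the server may ignore by sending the full body anyway.

```python
import struct
import urllib.request
from typing import Optional, Tuple


def size_from_header(data: bytes) -> Optional[Tuple[int, int]]:
    """Read (width, height) from the first bytes of a PNG or GIF file."""
    # PNG: 8-byte signature, 4-byte chunk length, 'IHDR', then width and height
    # as big-endian 32-bit integers.
    if data[:8] == b"\x89PNG\r\n\x1a\n" and data[12:16] == b"IHDR" and len(data) >= 24:
        return struct.unpack(">II", data[16:24])
    # GIF: 6-byte signature, then width and height as little-endian 16-bit integers.
    if data[:6] in (b"GIF87a", b"GIF89a") and len(data) >= 10:
        return struct.unpack("<HH", data[6:10])
    return None


def ranged_request(url: str, nbytes: int = 1000) -> urllib.request.Request:
    """Build a request asking for only the first nbytes of the resource."""
    return urllib.request.Request(url, headers={"Range": f"bytes=0-{nbytes - 1}"})
```

Feeding the body of such a ranged response to `size_from_header` gives the dimensions without downloading the rest of the image, provided the server honours the range.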
Try PerlMagick; installation instructions are also listed there.