(I've tried looking but all of the other answers seem to be using urllib2)
I've just started trying to use requests, but I'm still not very clear on how to send or request something additional from the page. For example, I'll have
import requests
r = requests.get('http://google.com')
but I have no idea how to now, for example, do a google search using the search bar presented. I've read the quickstart guide but I'm not very familiar with HTML POST and the like, so it hasn't been very helpful.
Is there a clean and elegant way to do what I am asking?
Request Overview
The Google search request is a standard HTTP GET command. It includes a collection of parameters relevant to your queries. These parameters are included in the request URL as name=value pairs separated by ampersand (&) characters. Parameters include data like the search query and a unique CSE ID (cx) that identifies the CSE that is making the HTTP request. The WebSearch or Image Search service returns XML results in response to your HTTP requests.
First, you must get your CSE ID (the cx parameter) from the Control Panel of your Custom Search Engine.
Then see the official Google Developers site for Custom Search.
There are many examples like this:
http://www.google.com/search?
start=0
&num=10
&q=red+sox
&cr=countryCA
&lr=lang_fr
&client=google-csbe
&output=xml_no_dtd
&cx=00255077836266642015:u-scht7a-8i
The list of parameters you can use is explained there.
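If you go the Custom Search (XML) route, a minimal sketch with requests could look like the following; note that the cx value below is the placeholder from the example URL above, not a working ID:
import requests

# parameter names taken from the example URL above; replace cx with your own CSE ID
params = {
    'q': 'red sox',
    'start': 0,
    'num': 10,
    'client': 'google-csbe',
    'output': 'xml_no_dtd',
    'cx': '00255077836266642015:u-scht7a-8i',
}

r = requests.get('http://www.google.com/search', params=params)
print(r.url)          # the assembled request URL
print(r.text[:200])   # beginning of the XML response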
import requests
from bs4 import BeautifulSoup

headers_Get = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
}

def google(q):
    s = requests.Session()
    q = '+'.join(q.split())
    url = 'https://www.google.com/search?q=' + q + '&ie=utf-8&oe=utf-8'
    r = s.get(url, headers=headers_Get)

    soup = BeautifulSoup(r.text, "html.parser")
    output = []
    for searchWrapper in soup.find_all('h3', {'class': 'r'}):  # this line may change in future based on google's web page structure
        url = searchWrapper.find('a')["href"]
        text = searchWrapper.find('a').text.strip()
        result = {'text': text, 'url': url}
        output.append(result)

    return output
This returns a list of Google results in {'text': text, 'url': url} format. The top result URL would be google('search query')[0]['url'].
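For example, a quick usage check of the helper above:
# Print the top result returned by the google() helper defined above
results = google('stack overflow')
if results:
    print(results[0]['text'])
    print(results[0]['url'])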
input:
import requests

def googleSearch(query):
    with requests.session() as c:
        url = 'https://www.google.co.in'
        query = {'q': query}
        urllink = c.get(url, params=query)
        print(urllink.url)

googleSearch('Linkin Park')
output:
https://www.google.co.in/?q=Linkin+Park
The readable way to send a request with many query parameters would be to pass URL parameters as a dictionary:
params = {
    'q': 'minecraft',  # search query
    'gl': 'us',        # country where to search from
    'hl': 'en',        # language
}

requests.get('URL', params=params)
But in order to get the actual response (output/text/data) that you see in the browser, you need to send additional headers, most importantly a user-agent, which makes the request look like a visit from a "real" user rather than a script.
The reason your request might be blocked is that the default requests user agent is python-requests, and websites recognize that. Check what your user agent is.
You can read more about it in the blog post I wrote about how to reduce the chance of being blocked while web scraping.
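As a quick sanity check, you can print the user-agent string that requests sends by default (a minimal sketch):
import requests

# The default user-agent identifies the request as coming from a script
print(requests.utils.default_user_agent())  # e.g. 'python-requests/2.28.1'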
Pass user-agent:
headers = {
    'User-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
}

requests.get('URL', headers=headers)
Code and example in the online IDE:
from bs4 import BeautifulSoup
import requests, lxml

headers = {
    'User-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
}

params = {
    'q': 'minecraft',
    'gl': 'us',
    'hl': 'en',
}

html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')

for result in soup.select('.tF2Cxc'):
    title = result.select_one('.DKV0Md').text
    link = result.select_one('.yuRUbf a')['href']
    print(title, link, sep='\n')
Alternatively, you can achieve the same thing by using Google Organic API from SerpApi. It's a paid API with a free plan.
The difference is that you don't have to create it from scratch and maintain it.
Code to integrate:
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "tesla",
    "hl": "en",
    "gl": "us",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results["organic_results"]:
    print(result['title'])
    print(result['link'])
Disclaimer, I work for SerpApi.
In this code, bs4 is used to get all the h3 elements and print their text:
# Import the beautifulsoup
# and request libraries of python.
import requests
import bs4

# Make two strings with default google search URL
# 'https://google.com/search?q=' and
# our customized search keyword.
# Concatenate them
text = "c++ linear search program"
url = 'https://google.com/search?q=' + text

# Fetch the URL data using requests.get(url),
# store it in a variable, request_result.
request_result = requests.get(url)

# Creating soup from the fetched request
soup = bs4.BeautifulSoup(request_result.text, "html.parser")

# Collect all the h3 headings and print their text
headings = soup.find_all("h3")
for heading in headings:
    print(heading.get_text())
You can use webbrowser; I think it doesn't get easier than that:
import webbrowser
query = input('Enter your query: ')
webbrowser.open(f'https://google.com/search?q={query}')
I'm trying to fetch tabular content from a webpage using the requests module. After navigating to that webpage, when I manually type 0466425389 right next to Company number and hit the search button, the table is produced accordingly. However, when I mimic the same using requests, I get the following response.
<?xml version='1.0' encoding='UTF-8'?>
<partial-response><redirect url="/bc9/web/catalog"></redirect></partial-response>
I've tried with:
import requests
link = 'https://cri.nbb.be/bc9/web/catalog?execution=e1s1'
payload = {
    'javax.faces.partial.ajax': 'true',
    'javax.faces.source': 'page_searchForm:actions:0:button',
    'javax.faces.partial.execute': 'page_searchForm',
    'javax.faces.partial.render': 'page_searchForm page_listForm pageMessagesId',
    'page_searchForm:actions:0:button': 'page_searchForm:actions:0:button',
    'page_searchForm': 'page_searchForm',
    'page_searchForm:j_id3:generated_number_2_component': '0466425389',
    'page_searchForm:j_id3:generated_name_4_component': '',
    'page_searchForm:j_id3:generated_address_zipCode_6_component': '',
    'page_searchForm:j_id3_activeIndex': '0',
    'page_searchForm:j_id2_stateholder': 'panel_param_visible;',
    'page_searchForm:j_idt133_stateholder': 'panel_param_visible;',
    'javax.faces.ViewState': 'e1s1'
}
headers = {
    'Faces-Request': 'partial/ajax',
    'X-Requested-With': 'XMLHttpRequest',
    'Origin': 'https://cri.nbb.be',
    'Accept': 'application/xml, text/xml, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Host': 'cri.nbb.be',
    'Referer': 'https://cri.nbb.be/bc9/web/catalog?execution=e1s1'
}
with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'
    s.get(link)
    s.headers.update(headers)
    res = s.post(link, data=payload)
    print(res.text)
How can I fetch tabular content from that site using requests?
From looking at the "action" attribute on the search form, the form appears to generate a new JSESSIONID every time it is opened, and this seems to be a required attribute. I had some success by including this in the URL.
You don't need to explicitly set the headers other than the User-Agent.
I added some code: (a) to pull out the "action" attribute of the form using BeautifulSoup - you could do this with regex if you prefer, (b) to get the url from that redirection XML that you showed at the top of your question.
import re
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup
...
with requests.Session() as s:
    s.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"

    # GET to get search form
    req1 = s.get(link)

    # Get the form action
    soup = BeautifulSoup(req1.text, "lxml")
    form = soup.select_one("#page_searchForm")
    form_action = urljoin(link, form["action"])

    # POST the form
    req2 = s.post(form_action, data=payload)

    # Extract the target from the redirection xml response
    target = re.search('url="(.*?)"', req2.text).group(1)

    # Final GET to get the search result
    req3 = s.get(urljoin(link, target))

    # Parse and print (some of) the result
    soup = BeautifulSoup(req3.text, "lxml").body
    for detail in soup.select(".company-details tr"):
        columns = detail.select("td")
        if columns:
            print(f"{columns[0].text.strip()}: {columns[1].text.strip()}")
Result:
Company number: 0466.425.389
Name: A en B PARTNERS
Address: Quai de Willebroeck 37
: BE 1000 Bruxelles
Municipality code NIS: 21004 Bruxelles
Legal form: Cooperative company with limited liability
Legal situation: Normal situation
Activity code (NACE-BEL)
The activity code of the company is the statistical activity code in use on the date of consultation, given by the CBSO based on the main activity codes available at the Crossroads Bank for Enterprises and supplementary informations collected from the companies: 69201 - Accountants and fiscal advisors
I don't think requests alone can handle dynamic web pages like this one. I use helium and pandas to do the work.
import helium as he
import pandas as pd
url = 'https://cri.nbb.be/bc9/web/catalog?execution=e1s1'
driver = he.start_chrome(url)
he.write('0466425389', into='Company number')
he.click('Search')
he.wait_until(he.Button('New search').exists)
he.select(he.ComboBox('10'), '100')
he.wait_until(he.Button('New search').exists)
with open('download.html', 'w') as html:
    html.write(driver.page_source)
he.kill_browser()
df = pd.read_html('download.html')
df[2]
Output
I'm trying to get how many results a Google search has, using the bs4 library in Python, but it returns empty brackets.
Here is my code:
import requests
from bs4 import BeautifulSoup
url_page = 'https://www.google.com/search?q=covid&oq=covid&aqs=chrome.0.0i433l2j0i131i433j0i433j0i131i433l2j0j0i131i433j0i433j0i131i433.691j0j7&sourceid=chrome&ie=UTF-8'
page = requests.get(url_page).text
soup = BeautifulSoup(page, "lxml")
elTexto = soup.find_all(attrs ={'class': 'LHJvCe'})
print(elTexto)
I have an extension in Google that checks if the HTML class is correct, and it gives me what I'm looking for, so I guess that is not the problem... Maybe it is something related to the format of the 'text' I'm trying to get...
Thanks!
It is better to use the gsearch package to accomplish your task, rather than scraping the web page manually.
Google is not randomizing classes as baduker mentioned. They could change some class names over time but they're not randomizing them.
One of the reasons you get an empty result is that you haven't specified an HTTP user-agent header, so Google might block your request; sending headers helps to avoid that. You can check what your user-agent is. Headers will look like this:
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

requests.get('YOUR URL', headers=headers)
Also, you don't need to use find_all()/findAll() or select() since you're trying to get only one occurrence, not all of them. Use instead:
find('ELEMENT NAME', class_='CLASS NAME')
select_one('.CSS_SELECTORs')
select()/select_one() is usually faster.
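For instance, on a made-up HTML snippet (hypothetical markup, just to show the two lookups side by side):
from bs4 import BeautifulSoup

html = '<div id="result-stats">About 2 results<nobr> (0.15 seconds)</nobr></div>'
soup = BeautifulSoup(html, 'lxml')

# both return only the first match, so there is no list to index into
print(soup.find('div', id='result-stats').text)    # attribute-based lookup
print(soup.select_one('#result-stats nobr').text)  # CSS-selector lookup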
Code and example in the online IDE (note: the number of results will always differ. It just works this way.):
import requests, lxml
from bs4 import BeautifulSoup

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
    "q": "fus ro dah defenition",
    "gl": "us",
    "hl": "en"
}

response = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')

number_of_results = soup.select_one('#result-stats nobr').previous_sibling
print(number_of_results)
# About 104,000 results
Alternatively, you can achieve the same thing using the Google Organic Results API from SerpApi, except you don't need to figure out why certain things don't work; instead, you iterate over a structured JSON string and get the data you want.
Code:
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "fus ro dah defenition",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

result = results["search_information"]['total_results']
print(result)
# 104000
Disclaimer, I work for SerpApi.
I have written the following Python script to crawl and scrape headings of Google News search results within a specific date range. Though the script is working, it's showing the latest search results, not the ones from the specified date range.
E.g. Rather than showing results from 1 Jul 2015 - 7 Jul 2015, the script is showing results from May 2016 (current month)
import urllib.request
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

# get and read the URL
url = ("https://www.google.co.in/search?q=banking&num=100&safe=off&espv=2&biw=1920&bih=921&source=lnt&tbs=cdr%3A1%2Ccd_min%3A01%2F07%2F2015%2Ccd_max%3A07%2F07%2F2015&tbm=nws")
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
html = opener.open(url)
bsObj = BeautifulSoup(html.read(), "html5lib")

# extract all the h3 headings from the given page
items = bsObj.findAll("h3")
for item in items:
    itemA = item.a
    theHeading = itemA.text
    print(theHeading)
Can someone please guide me to the correct method of getting the desired results, sorted by dates?
Thanks in advance.
I did some tests and it seems the problem is coming from the User-Agent which is not detailed enough.
Try replacing this line:
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
with:
opener.addheaders = [('User-agent', "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:36.0) Gecko/20100101 Firefox/36.0")]
It worked for me.
Of course this User-Agent is just an example.
As Julien Salinas wrote, it's because there's no user-agent specified in your request headers.
For example, the default requests user-agent is python-requests, so Google blocks the request because it knows it's a bot rather than a "real" user visit, and you receive different HTML, with different selectors and elements, or some sort of error. Setting a user-agent fakes a real user visit by adding that information to the HTTP request headers.
Pass user-agent into request headers using requests library:
headers = {
    'User-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582'
}

response = requests.get('YOUR_URL', headers=headers)
Code and example in the online IDE:
from bs4 import BeautifulSoup
import requests, lxml

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
    "q": "best potato recipes",
    "hl": "en",
    "gl": "us",
    "tbm": "nws",
}

html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')

for result in soup.select('.WlydOe'):
    title = result.select_one('.nDgy9d').text
    link = result['href']
    print(title, link, sep='\n')
----------
'''
Call of Duty Vanguard (PS5) Beta Impressions – A Champion Hill To Die On
https://wccftech.com/call-of-duty-vanguard-ps5-beta-impressions-a-champion-hill-to-die-on/
Warzone players call for fan-favorite MW2 map to be added to Verdansk
https://charlieintel.com/warzone-players-call-for-fan-favorite-mw2-map-to-be-added-to-verdansk/114014/
'''
Alternatively, you can achieve the same thing by using Google News Results API from SerpApi. It's a paid API with a free plan.
The difference in your case is that you don't have to figure out why certain things don't work or bypass blocks from search engines, since that's already done for the end user; you only need to iterate over structured JSON and get the data you want.
Code to integrate:
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "Call of duty 360 no scope",
    "tbm": "nws",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for news_result in results["news_results"]:
    print(f"Title: {news_result['title']}\nLink: {news_result['link']}\n")
----------
'''
Call of Duty Vanguard (PS5) Beta Impressions – A Champion Hill To Die On
https://wccftech.com/call-of-duty-vanguard-ps5-beta-impressions-a-champion-hill-to-die-on/
Warzone players call for fan-favorite MW2 map to be added to Verdansk
https://charlieintel.com/warzone-players-call-for-fan-favorite-mw2-map-to-be-added-to-verdansk/114014/
'''
P.S - I wrote a bit more in-depth blog post about how to scrape Google News Results.
Disclaimer, I work for SerpApi.
import requests
from bs4 import BeautifulSoup

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en-US,en;q=0.5",
    "Connection": "keep-alive",
    "Host": "mcfbd.com",
    "Referer": "https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0"}

a = requests.session()
soup = BeautifulSoup(a.get("https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx").content)

payload = {"ctl00$ContentPlaceHolder1$txtSearchHouse": "",
           "ctl00$ContentPlaceHolder1$txtSearchSector": "",
           "ctl00$ContentPlaceHolder1$txtPropertyID": "",
           "ctl00$ContentPlaceHolder1$txtownername": "",
           "ctl00$ContentPlaceHolder1$ddlZone": "1",
           "ctl00$ContentPlaceHolder1$ddlSector": "2",
           "ctl00$ContentPlaceHolder1$ddlBlock": "2",
           "ctl00$ContentPlaceHolder1$btnFind": "Search",
           "__VIEWSTATE": soup.find('input', {'id': '__VIEWSTATE'})["value"],
           "__VIEWSTATEGENERATOR": "14039419",
           "__EVENTVALIDATION": soup.find("input", {"name": "__EVENTVALIDATION"})["value"],
           "__SCROLLPOSITIONX": "0",
           "__SCROLLPOSITIONY": "0"}

b = a.post("https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx", headers=headers, data=payload).text
print(b)
Above is my code for this website.
https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx
I checked firebug out and these are the values of the form data.
However, doing this:
b = requests.post("https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx",headers = headers,data = payload).text
print(b)
throws this error:
[ArgumentException]: Invalid postback or callback argument
Is my understanding of submitting forms via requests correct?
1. Open Firebug.
2. Submit the form.
3. Go to the NET tab.
4. On the NET tab, choose the Post tab.
5. Copy the form data, like the code above.
I've always wanted to know how to do this. I could use selenium but I thought I'd try something new and use requests
The error you are receiving is expected, because fields like __VIEWSTATE (and others as well) are not static or hardcoded. The proper way to do this is as follows:
Create a Requests Session object. Also, it is advisable to update it with headers containing USER-AGENT string -
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36"}
s = requests.session()
s.headers.update(headers)
Navigate to the specified url -
r = s.get(url)
Use BeautifulSoup4 to parse the html returned -
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.content, 'html5lib')
Populate formdata with the hardcoded values and dynamic values -
formdata = {
    '__VIEWSTATE': soup.find('input', attrs={'name': '__VIEWSTATE'})['value'],
    'field1': 'value1'
}
Then send the POST request using the session object itself -
s.post(url, data=formdata)
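Putting those steps together, a minimal sketch for the page in the question might look like this; the field names are copied from the question's payload, and the live form may require more of them:
import requests
from bs4 import BeautifulSoup

url = "https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx"

with requests.session() as s:
    s.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36"})

    # GET the page first so the dynamic hidden fields can be read
    r = s.get(url)
    soup = BeautifulSoup(r.content, 'html5lib')

    formdata = {
        # dynamic values scraped from the page
        "__VIEWSTATE": soup.find('input', attrs={'name': '__VIEWSTATE'})['value'],
        "__EVENTVALIDATION": soup.find('input', attrs={'name': '__EVENTVALIDATION'})['value'],
        # hardcoded values copied from the browser's form data in the question
        "ctl00$ContentPlaceHolder1$ddlZone": "1",
        "ctl00$ContentPlaceHolder1$ddlSector": "2",
        "ctl00$ContentPlaceHolder1$ddlBlock": "2",
        "ctl00$ContentPlaceHolder1$btnFind": "Search",
    }

    # POST the form with the same session so cookies are reused
    result = s.post(url, data=formdata)
    print(result.status_code)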
I was trying to learn web scraping and I am facing a freaky issue... My task is to search Google for news on a topic in a certain date range and count the number of results.
My simple code is:
import requests, bs4
payload = {'as_epq': 'James Clark', 'tbs':'cdr:1,cd_min:1/01/2015,cd_max:1/01/2015','tbm':'nws'}
r = requests.get("https://www.google.com/search", params=payload)
soup = bs4.BeautifulSoup(r.text)
elems = soup.select('#resultStats')
print(elems[0].getText())
And the result I get is
About 8,600 results
So apparently all works... apart from the fact that the result is wrong. If I open the URL in Firefox (I can obtain the complete URL with r.url)
https://www.google.com/search?tbm=nws&as_epq=James+Clark&tbs=cdr%3A1%2Ccd_min%3A1%2F01%2F2015%2Ccd_max%3A1%2F01%2F2015
I see that the results are actually only 2, and if I manually download the HTML file, open the page source and search for id="resultStats" I find that the number of results is indeed 2!
Can anybody help me to understand why searching for the same id tag in the saved HTML file and in the soup item lead to two different numerical results?
************** UPDATE
It seems that the problem is the custom date range that does not get processed correctly by requests.get. If I use the same URL with selenium I get the correct answer
from selenium import webdriver
driver = webdriver.Firefox()
driver.get(url)
content = driver.page_source
soup = bs4.BeautifulSoup(content)
elems = soup.select('#resultStats')
print(elems[0].getText())
And the answer is
2 results (0.09 seconds)
The problem is that this methodology seems to be more cumbersome because I need to open the page in Firefox...
There are a couple of things causing this issue. First, it wants the day and month parts of the date in 2 digits, and it is also expecting a user-agent string of some popular browser. The following code should work:
import requests, bs4

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"
}
payload = {'as_epq': 'James Clark', 'tbs': 'cdr:1,cd_min:01/01/2015,cd_max:01/01/2015', 'tbm': 'nws'}

r = requests.get("https://www.google.com/search", params=payload, headers=headers)
soup = bs4.BeautifulSoup(r.content, 'html5lib')
print(soup.find(id='resultStats').text)
To add to Vikas' answer, Google will also fail to use 'custom date range' for some user-agents. That is, for certain user-agents, Google will simply search for 'recent' results instead of your specified date range.
I haven't detected a clear pattern in which user-agents will break the custom date range. It seems that including a language is a factor.
Here are some examples of user-agents that break cdr:
Mozilla/5.0 (Windows; U; Windows NT 6.1; fr-FR) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27
Mozilla/4.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/5.0)
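A quick way to see this is to send the same custom-date-range query with two different user-agents and compare the resultStats text (a sketch; the exact counts depend on what Google returns):
import requests, bs4

payload = {'as_epq': 'James Clark',
           'tbs': 'cdr:1,cd_min:01/01/2015,cd_max:01/01/2015',
           'tbm': 'nws'}

user_agents = [
    # user-agent from Vikas' answer, which respects the date range
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36",
    # one of the user-agents above that ignores the date range
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; fr-FR) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27",
]

for ua in user_agents:
    r = requests.get("https://www.google.com/search", params=payload,
                     headers={"User-Agent": ua})
    soup = bs4.BeautifulSoup(r.content, 'html5lib')
    stats = soup.find(id='resultStats')
    print(ua[:40], '->', stats.text if stats else 'no resultStats found')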
There's no need for selenium; you're looking for this:
soup.select_one('#result-stats nobr').previous_sibling
# About 10,700,000 results
Code and full example in the online IDE:
from bs4 import BeautifulSoup
import requests, lxml

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
    "q": 'James Clark',  # query
    "hl": "en",          # lang
    "gl": "us",          # country to search from
    "tbm": "nws",        # news filter
}

html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')

# if used without the "nobr" selector and previous_sibling, it will return seconds as well: (0.41 seconds)
number_of_results = soup.select_one('#result-stats nobr').previous_sibling
print(number_of_results)
# About 10,700,000 results
Alternatively, you can achieve the same thing by using Google News Results API from SerpApi. It's a paid API with a free plan.
The difference in your case is that you don't have to find which selectors will get the work done, figure out why some of them don't return the data they should, bypass blocks from search engines, or maintain it over time.
Instead, you only need to iterate over structured JSON and get the data you want, fast.
Code to integrate for your case:
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": 'James Clark',
    "tbm": "nws",
    "gl": "us",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

number_of_results = results['search_information']['total_results']
print(number_of_results)
# 14300000
Disclaimer, I work for SerpApi.