I'm trying to use the googlesearch API in Python to get the top 10 results for several queries, and I'm encountering two issues:
Changing the country using the 'country' param (e.g. country='us') doesn't seem to have any effect on the results at all. I tried this with several countries.
I want to include the Ads results and can't find any way to do so.
If anyone knows how to do this with googlesearch or any other free API that would be great.
Thanks!
# coding: utf-8
from googlesearch import search
from urllib.parse import urlparse
import csv

keywords = [
    "best website builder",
]
countries = [
    "us",
    "il",
]

filename = 'google_results.csv'
with open(filename, 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    for country in countries:
        for keyword in keywords:
            print("Showing results for: '" + keyword + "'")
            writer.writerow([])
            writer.writerow([keyword])
            for url in search(keyword, lang='en', stop=10, country=country):
                print(urlparse(url).netloc)
                print(url)
                writer.writerow([urlparse(url).netloc, url])
Answer 1: Your country format is incorrect.
What the module does is build the URL for the request, using the following format:
url_search = "https://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s&btnG=Google+Search&tbs=%(tbs)s&safe=%(safe)s&cr=%(country)s"
When you give it a country, simply passing in us or il is not enough. The country parameter must be in the format countryXX, where XX is the two-letter country abbreviation. For example, France is FR, so country would be countryFR.
Even the source code says that this parameter is not always reliable:
:param str country: Country or region to focus the search on. Similar to
changing the TLD, but does not yield exactly the same results.
Only Google knows why...
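Given the URL template above, here is a quick illustration of how the corrected value ends up in the request. (build_url is a helper written for this answer, not part of the module; the template string is the one quoted from the module's source.)

```python
from urllib.parse import quote_plus

# URL template used internally by googlesearch (quoted above)
url_search = ("https://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s"
              "&btnG=Google+Search&tbs=%(tbs)s&safe=%(safe)s&cr=%(country)s")

def build_url(query, country_code):
    # Google expects cr=countryXX, where XX is the two-letter code
    return url_search % {
        "tld": "com",
        "lang": "en",
        "query": quote_plus(query),
        "tbs": "0",
        "safe": "off",
        "country": "country" + country_code.upper(),
    }

print(build_url("best website builder", "fr"))
# the cr parameter now reads cr=countryFR
```

So in the question's code, you would pass country='countryUS' / country='countryIL' instead of 'us' / 'il'.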
Answer 2: Ads are dynamically loaded using JavaScript. This library, on the other hand, only does static parsing; it does not execute any JavaScript. You will need to run Selenium or pyppeteer so that a real browser executes the JavaScript and renders the ads.
Unfortunately, the country targeting parameter is just a signal to Google, not a setting change. Google will not actually show you the results as they appear to an anonymous user in that country. So it's basically useless.
The APIs mentioned above will not fix this either as they only use US based IP addresses. (#Link can you confirm? I'd pay for your API if it wasn't only on US servers.)
So you're actually going to need to run this code from a server with an IP address in the country you're targeting, with the browser's language settings set to that country's language as well.
You won't be able to render the ads either, as they're rendered separately, slightly after the page itself. There is a huge industry trying to get this right, and anyone who has nailed it charges pretty high fees. The best place to start is an IP address in that country together with Selenium; requests won't cut it, and certainly not if you want ads.
Finally, Google is extremely aggressive with automated-search detection, because every automated search that shows an ad skews their advertiser numbers and actually costs advertisers money even if you don't click on the ads (due to a mechanism called Quality Score).
If your volume is low, a Selenium-based script with a private IP (as in, not an AWS or Azure data-center IP) in that country is your best bet.
And if you figure out how to do this at scale, you'll have people falling over themselves to get the solution.
Related
I am starting out in web scraping and am looking for a way to find the postal codes of a list of companies, using Python and web scraping.
For this I want to use the pandas library, since my file is in Excel format, together with the selenium library to search the internet for the postal codes corresponding to the companies.
For example, in column A there is company_1. The algorithm must search for "company_1" on the internet and return the corresponding postal code in column B of the Excel file. The difficulty is that I don't have a website to associate with each company.
Is that possible?
Thanks in advance
The easiest way to achieve what you need is geocoding - Python has a library called GeoPy for that purpose:
# importing geopy library
from geopy.geocoders import Nominatim
# calling the Nominatim tool
loc = Nominatim(user_agent="GetLoc")
# entering the location name
getLoc = loc.geocode("Wawel, Kraków", addressdetails=True)
# printing post code
print(getLoc.raw['address']['postcode'])
Web scraping is not a good solution here, because you would need to extract a specific piece of information without knowing anything about the sites' structures.
I am looking for a way in Python to input a phone number and get the caller's carrier. I am looking for a free and simple way. I have used TELNYX, but it returns 'CELLCO PARTNERSHIP DBA VERIZON' instead of simply 'Verizon', which does not work for me. I have tried Twilio as well and it has not worked for me. Has anyone found success doing this? Thanks in advance. Code for TELNYX:
import json
import requests

def getcarrier(number):
    url = 'https://api.telnyx.com/v1/phone_number/1' + number
    html = requests.get(url).text
    data = json.loads(html)
    data = data["carrier"]
    print(data["name"])
    global carrier
What I have done in the past is to isolate the number prefix and match it against the prefix database available HERE. I did this only for my own country (Bangladesh), so it was relatively easy code (just a series of if/else). To work for any number, I believe you'll need to consider the country code as well.
You can do it in two ways.
One: keep the data locally, stored as a CSV built from the Wikipedia page (scraping the page should be easy to do), and then use pandas or a similar CSV-handling package to use it as the database of your program.
Or, two: write a small program that scrapes the page on demand and finds the operator then.
Good luck.
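As a rough sketch of the prefix-matching approach: the table below contains a few illustrative entries only, not a real carrier database; in practice you would load the full mapping from the prefix list mentioned above (e.g. scraped into a CSV).

```python
# Illustrative prefix table - replace with data loaded from the
# Wikipedia prefix list (e.g. via a CSV and pandas).
PREFIXES = {
    "017": "Grameenphone",
    "018": "Robi",
    "019": "Banglalink",
}

def carrier_for(number, prefixes=PREFIXES):
    # Try the longest prefixes first so a specific entry is not
    # shadowed by a shorter, more general one.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if number.startswith(prefix):
            return prefixes[prefix]
    return None  # unknown prefix

print(carrier_for("01712345678"))  # Grameenphone
```

To handle numbers from any country, you would first strip and match the country code, then dispatch to that country's prefix table.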
def live_price(stock):
    string = (data.decode("utf-8"))
    conn.request("GET", f"/stock/{stock}/ohlc", headers=headers)
    print(Price)

live_price("QCOM")
I want to be able to type live_price("stockname") and then have the function output the data for that stock. If anyone can help, that would be great. All other variables mentioned are defined elsewhere in the code.
import yfinance

def live_price(stock):
    inst = yfinance.download(stock)
    # Most recent open price; .iloc avoids positional-indexing issues
    print(inst['Open'].iloc[-1])
When one has a hammer, everything looks like a nail. Or, in different words: the best solution to your problem is actually Google Sheets, as it has access to Google Finance live data (by far the best possible data source for live prices). If you'd later like to do any analysis in Python, you can just draw data from your Google Sheet, either locally with your preferred code editor or, even better, using Google Colaboratory.
I've been trying for hours using requests and urllib, and I'm lost; Google hasn't helped me either. Some tips, or really anything, would be useful. Thank you.
Goals: Post country code and phone numbers, then get mobile carrier etc.
Problem: Nothing is printed; the variable "name" comes out as None.
def do_ccc(self):  # Part of a bigger class
    """Phone Number Scan"""
    #prefix = input("Phone Number Prefix: ")
    #number = input("Phone Number: ")
    url = "https://freecarrierlookup.com/index.php"
    from bs4 import BeautifulSoup
    import urllib
    data = {'cc': "COUNTRY CODE",
            'phonenum': "PHONE NUMBER"}  # .encode('ascii')
    data = json.dump(data, sys.stdout)
    page = urllib.request.urlopen(url, data)
    soup = BeautifulSoup(page, 'html.parser')
    name = soup.find('div', attrs={'class': 'col-sm-6 col-md-8'})
    #^^^# Test (should print phone number)
    print(name)
As Zags pointed out, it is not a good idea to use a website in violation of its terms of service, especially when the site offers a cheap API.
But answering your original question:
You are using json.dump, which writes the JSON to sys.stdout and returns None, where you would need json.dumps; as a result, data ends up empty.
If you look at the page, you will see that the URL for POST requests is different: getcarrier.php instead of index.php.
You would also need to convert the str from json.dumps to bytes, and even then the site will reject your calls, since the website adds a hidden token to each submitted request to prevent automated scraping.
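To illustrate the bytes conversion, here is a sketch that only constructs the request without sending it; as noted, the site would still reject the actual call because of the hidden token, and the exact payload format the endpoint expects is an assumption here.

```python
import json
import urllib.request

data = {'cc': '1', 'phonenum': '5551234567'}

# json.dumps returns a str; urllib needs the POST body as bytes
body = json.dumps(data).encode('ascii')

req = urllib.request.Request(
    "https://freecarrierlookup.com/getcarrier.php",  # the POST endpoint
    data=body,
)
print(type(body))        # <class 'bytes'>
print(req.get_method())  # POST, because a data payload was supplied
```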
The problem is with what your code is trying to do. freecarrierlookup.com is just a website. In order to do what you want, you'd need to do web scraping, which is complicated, unmaintainable, and usually a violation of the site's terms of service.
What you need to do is find an API that provides the data you're looking for. A good API will usually have either sample code or a Python library that you can use to make requests.
There is a website that claims to predict the approximate salary of an individual on the basis of the following criteria, presented as individual drop-downs:
Age: 5 options
Education: 3 options
Sex: 3 options
Work Experience: 4 options
Nationality: 12 options
On clicking the Submit button, the website gives a bunch of text as output on a new page with an estimate of the salary in numerals.
So, there are technically 5*3*3*4*12 = 2160 data points. I want to collect them and arrange them in an Excel sheet. Then I would run a regression algorithm to guess the function this website uses. That is what I am aiming to achieve through this exercise; it is entirely for learning purposes, since I'm keen on learning these tools.
But I don't know how to go about it? Any relevant tutorial, documentation, guide would help! I am programming in python and I'd love to use it to achieve this task!
Thanks!
If you are uncomfortable asking them for the database, as roganjosh suggested :), use Selenium. Write a Python script that controls the WebDriver and repeatedly sends requests for all possible combinations. The script is pretty simple: just a nested loop for each type of parameter/drop-down.
If you are sure that the values of each type do not depend on each other, check what request is sent to the server. If it is simply URL-encoded, like age=...&sex=...&..., then Selenium is not needed: just generate such URLs for all possible combinations and call the server.
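A minimal sketch of generating all parameter combinations; the field names, option values, and base URL below are placeholders, since the real ones would have to be read from the website's form HTML.

```python
from itertools import product
from urllib.parse import urlencode

# Placeholder option values matching the counts in the question
options = {
    "age": [f"age{i}" for i in range(5)],
    "education": [f"edu{i}" for i in range(3)],
    "sex": [f"sex{i}" for i in range(3)],
    "experience": [f"exp{i}" for i in range(4)],
    "nationality": [f"nat{i}" for i in range(12)],
}

keys = list(options)
combos = [dict(zip(keys, values)) for values in product(*options.values())]
print(len(combos))  # 5 * 3 * 3 * 4 * 12 = 2160

# If the form submits a simple GET request, each combination maps to a URL:
base = "https://example.com/salary"  # placeholder
urls = [base + "?" + urlencode(c) for c in combos]
print(urls[0])
```

Each combination can then be submitted (via Selenium or plain HTTP) and the returned salary written to a row of the spreadsheet alongside its parameters.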