Working on this for a school project. I'm scraping IP addresses from the Wikipedia edit history, running them through the ipstack.com API to get latitude and longitude, and then trying to push the latitude and longitude to the OpenCage API. This last step is where I'm running into an issue. If I hard-code a latitude and longitude, it returns a city:
result = geocoder.opencage([latitude, longitude], key=key, method='reverse')
print(result.city)
But when I try to loop through a list of latitudes and longitudes I get an error:
TypeError: cannot convert the series to <class 'float'>
I suspect this has to do with the pandas Series type, but then again I might be going about it completely wrong. Any ideas?
from bs4 import BeautifulSoup
import requests
from urllib.request import urlopen
import pandas as pd
import re
from opencage.geocoder import OpenCageGeocode
import geocoder
response = requests.get("https://en.wikipedia.org/w/index.php?title=Gun_laws_in_New_Hampshire&action=history")
soup = BeautifulSoup(response.text, "lxml")
bdi_text = []
for bdi_tag in soup.find_all('bdi'):
    bdi_text.append(bdi_tag.text)
ip_addresses = []
for element in bdi_text:
    ip = re.findall(r'[0-9]+(?:\.[0-9]+){3}', element)
    if len(ip) > 0:
        ip_addresses.append(ip)
api_key = '?access_key={YOUR_API_ACCESS_KEY}'
resolved_ips = []
for ips in ip_addresses:
    api_call = requests.get('http://api.ipstack.com/' + ips[0] + api_key).json()
    resolved_ips.append(api_call)
ip_df = pd.DataFrame.from_records(resolved_ips)
ip_df = ip_df[['city','country_code','latitude','longitude']]
key = 'my_API_key'
latitude = ip_df['latitude']
longitude = ip_df['longitude']
result = []
print(len(latitude))
for latlong in range(0, len(latitude)):
    result = geocoder.opencage([latitude, longitude], key=key, method='reverse')
    print(result.city)
Your implementation is rough. I would do something like this:
def make_city(row):
    result = geocoder.opencage([float(row['latitude']),    # lat of target
                                float(row['longitude'])],  # long of target
                               key=key,                    # API key that I will keep to myself
                               method='reverse')
    print(result.city)

ip_df.apply(make_city, axis=1)
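If you want to keep each city rather than just print it, return it from the helper and let apply collect the values. A minimal sketch along the same lines, assuming the ip_df and key defined above (the resolved_city column name is just an example):

def make_city(row):
    # reverse-geocode one row's coordinates and return the city name
    result = geocoder.opencage([float(row['latitude']), float(row['longitude'])],
                               key=key, method='reverse')
    return result.city

ip_df['resolved_city'] = ip_df.apply(make_city, axis=1)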
I think it's getting confused by the type you are passing. I'm not sure of the exact structure of your data, but try this instead:
latitude = ip_df['latitude'].astype(float)
longitude = ip_df['longitude'].astype(float)
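Note that even after the cast, the loop in the question passes the entire Series to the geocoder on every iteration, which is exactly what the TypeError complains about. Iterating over the values pairwise avoids that; a minimal sketch, assuming the latitude/longitude Series and key from above:

for lat, lng in zip(latitude, longitude):
    # one scalar pair per call, mirroring the hard-coded example that works
    result = geocoder.opencage([lat, lng], key=key, method='reverse')
    print(result.city)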
I'm trying to make a list of tuples, with the first element being the download URL and the second being the file name from the URL string, using the code below:
import urllib
import requests
from bs4 import BeautifulSoup
import pandas as pd
import io
url = r"https://www.ers.usda.gov/data-products/livestock-meat-domestic-data"
my_bytes = urllib.request.urlopen(url)
my_bytes = my_bytes.read().decode("utf8")
parsed_html = BeautifulSoup(my_bytes, features = "lxml")
table_data = parsed_html.body.find('table', attrs = {'id':'data_table'})
download_url = "https://www.ers.usda.gov"
full_download_url = [tuple(download_url,i["href"]) for i in table_data.find_all('a')]
But I've been getting TypeError: must be str, not list all along, and I'm not sure how to fix this. Please help? Thanks!
This is what I needed:
import urllib
import requests
from bs4 import BeautifulSoup
import pandas as pd
import io
url = r"https://www.ers.usda.gov/data-products/livestock-meat-domestic-data"
my_bytes = urllib.request.urlopen(url)
my_bytes = my_bytes.read().decode("utf8")
parsed_html = BeautifulSoup(my_bytes, features = "lxml")
table_data = parsed_html.body.find('table', attrs = {'id':'data_table'})
download_url = "https://www.ers.usda.gov"
def convertTuple(tup):
    result = ''
    for item in tup:
        result = result + item
    return result
full_download_url = [convertTuple(tuple(download_url + i["href"])) for i in table_data.find_all('a')]
Thanks to Geeks for geeks and everyone trying to help :)
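For what it's worth, tuple(download_url + i["href"]) splits the concatenated string into a tuple of single characters and convertTuple just glues them back together, so the round trip is equivalent to plain string concatenation; a simpler sketch of the same result:

full_download_url = [download_url + i["href"] for i in table_data.find_all('a')]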
The problem is the call tuple(download_url, i["href"]): tuple() accepts a single iterable, not two separate arguments, so you can't build the pair that way.
If you write the pair as a tuple literal instead, it works as expected:
full_download_url = [(download_url + i["href"], i["href"].split("/")[-1]) for i in table_data.find_all('a')]
The first element is the full download URL and the second is the file name taken from the end of the href.
Python code:
import string
import urllib.request
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/players/'
initial = list(string.ascii_lowercase)
initial_url = [url + i for i in initial]
html_initial = [urllib.request.urlopen(i).read() for i in initial_url]
soup_initial = [BeautifulSoup(i, 'html.parser') for i in html_initial]
tags_initial = [i('a') for i in soup_initial]
print(tags_initial[0][50])
Results example:
<a href="/players/a/...">Shareef Abdur-Rahim</a>
From the example above, I want to extract the player name, 'Shareef Abdur-Rahim', but I want to do it for all of the tags_initial lists.
Does anyone have an idea?
Could you modify your post by adding your code so that we can help you better?
Maybe this could help you:
name = soup.findAll(YOUR_SELECTOR)[0].string
UPDATE
import re
import string
from bs4 import BeautifulSoup
from urllib.request import urlopen
url = 'https://www.basketball-reference.com/players/'
# Alphabet
initial = list(string.ascii_lowercase)
datas = []
# URLS
urls = [url + i for i in initial]
for url in urls:
    # Soup Object
    soup = BeautifulSoup(urlopen(url), 'html.parser')
    # Players link
    url_links = soup.findAll("a", href=re.compile("players"))
    for link in url_links:
        # Player name
        datas.append(link.string)
print("datas : ", datas)
Then, "datas" contains all the names of the players, but I advise you to do a little processing afterwards to remove some erroneous information like "..." or perhaps duplicates
There are probably better ways but I'd do it like this:
html = "a href=\"/teams/LAL/2021.html\">Los Angeles Lakers</a"
index = html.find("a href")
index = html.find(">", index) + 1
index_end = html.find("<", index)
print(html[index:index_end])
If you're using a scraper library it probably has a similar function built-in.
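With BeautifulSoup, which the question already uses, the equivalent is just reading the tag's string; a quick sketch:

from bs4 import BeautifulSoup

html = '<a href="/teams/LAL/2021.html">Los Angeles Lakers</a>'
tag = BeautifulSoup(html, 'html.parser').find('a')
print(tag.string)  # Los Angeles Lakers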
I am trying to create a function that filters JSON data pulled from the Google Places API. I want it to return the name of a business and its types values if the name contains the string "Body Shop" and the types are ['car_repair', 'point_of_interest', 'establishment']; otherwise I want it to reject the result. Here is my code so far. I have tried and tried and can't figure out a way to store the criteria so the search is easier.
import googlemaps
import pprint
import time
import urllib.request
API_KEY = 'YOUR_API_KEY'
lat, lng = 40.35003, -111.95206
#define our Client
gmaps = googlemaps.Client(key = API_KEY)
#Define our Search
places_result = gmaps.places_nearby(location= "40.35003,-111.95206", radius= 40000,open_now= False,type= ['car_repair','point_of_interest','establishment'])
#pprint.pprint(places_result['results'])
time.sleep(3)
places_result_2 = gmaps.places_nearby(page_token = places_result['next_page_token'])
pprint.pprint(places_result_2['results'])
types = place_details['result']['types']
name = place_details['result']['name']
def match(types, name):
    for val in types:
        'car_repair','point_of_interest','establishment' in val and "Body Shop" in name
        print(name, types)
Try this:
import googlemaps
import pprint
import time
import urllib.request
API_KEY = 'YOUR_API_KEY'
lat, lng = 40.35003, -111.95206
#define our Client
gmaps = googlemaps.Client(key = API_KEY)
#Define our Search
places_result = gmaps.places_nearby(location= "40.35003,-111.95206", radius= 40000,open_now= False,type= ['car_repair','point_of_interest','establishment'])
#heres how to retrieve the name of the first result
example_of_name = places_result['results'][0]['name']
print(example_of_name)
#gets places name and type for all the results
for place in places_result['results']:
    print("Name of Place:")
    print(place['name'])
    print("Type of the place:")
    print(place['types'], "\n")
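That prints everything the search returns; to get the filtering the question asks for, one option is a small predicate applied to each result dict. A sketch, assuming places_result from above (the helper name is mine):

def is_target_body_shop(place):
    # keep a result only if its name mentions "Body Shop"
    # and it carries all three wanted types
    wanted = {'car_repair', 'point_of_interest', 'establishment'}
    return 'Body Shop' in place.get('name', '') and wanted.issubset(place.get('types', []))

matches = [p for p in places_result['results'] if is_target_body_shop(p)]
for place in matches:
    print(place['name'], place['types'])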
I want to write a translation API using this site, which has many desirable features when dealing with sentences containing wildcards.
First I used F12 in Chrome to see what request URL is used to produce the result.
I checked that only salt and sign changed when I used different inputs.
So I looked at the JS source code to see how salt and sign were produced.
Then I used the Python library urllib to send the request and get the response. But the response translation was not the same as what the browser returns. For example,
Input :"what album was #head_entity# released on?"
Output_browser: "#head_entity#发布了什么专辑?"
Output_python:"发布的专辑是什么# head_entity?#"
which is clearly different.
This is the code for producing my result:
import urllib.request
import urllib.parse
import json
import time
import random
import hashlib
def translator(content):
    """arg: content"""
    url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
    data = {}
    u = 'fanyideskweb'
    d = content
    f = str(int(time.time()*1000) + random.randint(1, 10))
    c = 'rY0D^0\'nM0}g5Mm1z%1G4'
    sign = hashlib.md5((u + d + f + c).encode('utf-8')).hexdigest()
    data['i'] = content
    data['from'] = 'AUTO'
    data['to'] = 'AUTO'
    data['smartresult'] = 'dict'
    data['client'] = 'fanyideskweb'
    data['salt'] = f
    data['sign'] = sign
    data['doctype'] = 'json'
    data['version'] = '2.1'
    data['keyfrom'] = 'fanyi.web'
    data['action'] = 'FY_BY_CL1CKBUTTON'
    data['typoResult'] = 'true'
    data = urllib.parse.urlencode(data).encode('utf-8')
    request = urllib.request.Request(url=url, data=data, method='POST')
    response = urllib.request.urlopen(request)
    d = json.loads(response.read().decode('utf-8'))
    return d['translateResult'][0][0]['tgt']
translator('what album was #head_entity# released on?')
The only thing I think I changed from what the original page sends is the url argument in the code:
My_url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
Original_url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule', which gave me an error {"errorCode":50}.
I checked the header and data parameters one by one but still can't solve the problem. I have no idea why this happens. Any ideas?
Here is a snippet of the code:
import flickrapi
api_key = "xxxxxxxxxxxxxxxxxxxxx"
secret_api_key = "xxxxxxxxxx"
flickr = flickrapi.FlickrAPI(api_key, secret_api_key)
def obtainImages3():
    group_list = flickr.groups.search(api_key=api_key, text='Paris', per_page=10)
    for group in group_list[0]:
        group_images = flickr.groups.pools.getPhotos(api_key=api_key, group_id=group.attrib['nsid'], extras='geo, tags, url_s')
        for image in group_images[0]:
            url = image.attrib['url_s']
            tags = image.attrib['tags']
            if image.attrib['geo'] != 'null':
                photo_location = flickr.photos_geo_getLocation(photo_id=image.attrib['id'])
                lat = float(photo_location[0][0].attrib['latitude'])
                lon = float(photo_location[0][0].attrib['longitude'])
I want to get information about images if and only if they have a geo-tag attached. I tried to do this with the line if image.attrib['geo'] != 'null', but I don't think that works. Can anyone suggest a way I might be able to do it? Thanks in advance!
Replace your if image.attrib['geo'] != 'null' condition with a try/except block as below.
Since the geo attribute may simply be absent from image.attrib, you can check for the presence of the key by catching the KeyError:
try:
    image.attrib['geo']
    photo_location = flickr.photos_geo_getLocation(photo_id=image.attrib['id'])
    lat = float(photo_location[0][0].attrib['latitude'])
    lon = float(photo_location[0][0].attrib['longitude'])
except KeyError:
    pass
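An equivalent membership test avoids raising the exception at all; a small sketch, assuming the same image elements as above:

# only fetch the location when the photo actually carries geo data
if 'geo' in image.attrib:
    photo_location = flickr.photos_geo_getLocation(photo_id=image.attrib['id'])
    lat = float(photo_location[0][0].attrib['latitude'])
    lon = float(photo_location[0][0].attrib['longitude'])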