For a test program I'm crawling a webpage. I'd like to crawl all activities for specific IDs which are associated with the respective cities.
For example, my initial code:
RegionIDArray = {522: "London", 4745: "London", 2718: "London", 3487: "Tokio"}
I'm now wondering whether it's possible to group all IDs (keys) that belong to the same city, e.g. London, into a single key:
RegionIDArray = {522, 4745, 2718: "London"}
When I try this, I get no results.
My full code so far:
import requests
from bs4 import BeautifulSoup

RegionIDArray = {522: "London", 4745: "London", 2718: "London", 3487: "Tokio"}
already_printed = set()  # titles that have already been printed
for reg in RegionIDArray:
    r = requests.get("https://www.getyourguide.de/-l" + str(reg) + "/")
    soup = BeautifulSoup(r.content, "lxml")
    g_data = soup.find_all("span", {"class": "intro-title"})
    for item in g_data:
        POI_final = str(item.text)
        end_final = "POI: " + POI_final
        if end_final not in already_printed:
            print(end_final)
            already_printed.add(end_final)
Is there a smart way to do this? I'd appreciate any feedback.
You can do this in 2 steps:
Create a dictionary mapping locations to list of IDs.
Reverse this dictionary, taking care to ensure your keys are hashable.
The first step is optimally processed via collections.defaultdict.
For the second step, you can use either tuple or frozenset. I opt for the latter since it is not clear that ordering is relevant.
from collections import defaultdict
RegionIDArray = {522: "London", 4745: "London", 2718: "London", 3487: "Tokio"}
d = defaultdict(list)
for k, v in RegionIDArray.items():
    d[v].append(k)
res = {frozenset(v): k for k, v in d.items()}
print(res)
{frozenset({522, 2718, 4745}): 'London',
frozenset({3487}): 'Tokio'}
You can use itertools.groupby:
import itertools
RegionIDArray = {522: "London", 4745: "London", 2718: "London", 3487: "Tokio"}
new_results = {tuple(c for c, _ in b):a for a, b in itertools.groupby(sorted(RegionIDArray.items(), key=lambda x:x[-1]), key=lambda x:x[-1])}
Output (Python 3.7+, where dictionaries keep insertion order):
{(522, 4745, 2718): 'London', (3487,): 'Tokio'}
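The sorted() call in that one-liner matters, because itertools.groupby only merges items that sit next to each other. A small sketch of the difference, using a shortened, made-up list of (ID, city) pairs rather than the full dictionary:

```python
import itertools

# Shortened sample input: the two London entries are not adjacent.
pairs = [(522, "London"), (3487, "Tokio"), (4745, "London")]


def group_ids(items):
    # Collect the IDs of each consecutive run of equal cities.
    return [(city, [region_id for region_id, _ in run])
            for city, run in itertools.groupby(items, key=lambda p: p[1])]


# Without sorting, "London" shows up as two separate groups.
unsorted_groups = group_ids(pairs)

# Sorting by city first makes equal cities adjacent, so each city is grouped once.
sorted_groups = group_ids(sorted(pairs, key=lambda p: p[1]))

print(unsorted_groups)
print(sorted_groups)
```

Because sorted() is stable, the IDs inside each group keep their original relative order.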
What you can do is make a reverse lookup table from the values to all working keys, like so:
def reverse(ids):
    table = {}
    for key in ids:
        if ids[key] not in table:
            table[ids[key]] = []
        table[ids[key]].append(key)
    return table
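A minimal sketch of how such a table could drive the crawl, one city at a time. It restates the reverse function with dict.setdefault, and build_urls is a hypothetical helper that only assembles the URL pattern from the question. No request is sent here.

```python
def reverse(ids):
    """Map each value (city) to the list of keys (region IDs) pointing to it."""
    table = {}
    for key, city in ids.items():
        table.setdefault(city, []).append(key)
    return table


def build_urls(region_ids):
    # Hypothetical helper: the URL pattern is copied from the question.
    return ["https://www.getyourguide.de/-l" + str(reg) + "/" for reg in region_ids]


RegionIDArray = {522: "London", 4745: "London", 2718: "London", 3487: "Tokio"}
ids_by_city = reverse(RegionIDArray)

for city, region_ids in ids_by_city.items():
    # Each city is visited once, together with all of its region IDs.
    print(city, build_urls(region_ids))
```

Keeping the city as the key and the IDs as a list avoids the need for hashable keys such as tuple or frozenset, and the loop body can call requests.get on each URL as in the original code.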
This is probably a simple question, but my experience with for loops is very limited.
I was trying to adapt the solution on this page https://www.mediawiki.org/wiki/API:Geosearch to some simple examples that I have, but the result is not what I expected.
For example:
I have this simple data frame:
df= pd.DataFrame({'City':['Sesimbra','Ciudad Juárez','31100 Treviso','Ramada Portugal','Olhão'],
'Country':['Portugal','México','Itália','Portugal','Portugal']})
I created a list based on cities:
lista_cidades = list(df['City'])
and I would like to iterate over this list to get the coordinates (decimal, preferably).
So far I have tried this approach:
import requests
lng_dict = {}
lat_dict = {}
S = requests.Session()
URL = "https://en.wikipedia.org/w/api.php"
PARAMS = {
    "action": "query",
    "format": "json",
    "titles": [lista_cidades],
    "prop": "coordinates"
}
R = S.get(url=URL, params=PARAMS)
DATA = R.json()
PAGES = DATA['query']['pages']
for i in range(len(lista_cidades)):
    for k, v in PAGES.items():
        try:
            lat_dict[lista_cidades[i]] = str(v['coordinates'][0]['lat'])
            lng_dict[lista_cidades[i]] = str(v['coordinates'][0]['lon'])
        except:
            pass
but it looks like the code doesn't iterate over the list and always returns the same coordinate.
For example, when I inspect the dictionary of longitude coordinates, this is what I get:
lng_dict
{'Sesimbra': '-7.84166667',
'Ciudad Juárez': '-7.84166667',
'31100 Treviso': '-7.84166667',
'Ramada Portugal': '-7.84166667',
'Olhão': '-7.84166667'}
What should I do to solve this?
Thanks in advance
I think the query returns only one result: it takes only the last city from your list (in your case, the coordinates for "Olhão").
You can check this by logging the content of DATA.
I don't know the Wikipedia API well, but either your call is missing a parameter (the documentation should tell you) or you have to call the API once per city, like this:
import pandas as pd
import requests
df = pd.DataFrame({'City': ['Sesimbra', 'Ciudad Juárez', '31100 Treviso', 'Ramada Portugal', 'Olhão'],
'Country': ['Portugal', 'México', 'Itália', 'Portugal', 'Portugal']})
lista_cidades = list(df['City'])
lng_dict = {}
lat_dict = {}
S = requests.Session()
URL = "https://en.wikipedia.org/w/api.php"
for city in lista_cidades:
    PARAMS = {
        "action": "query",
        "format": "json",
        "titles": city,
        "prop": "coordinates"
    }
    R = S.get(url=URL, params=PARAMS)
    DATA = R.json()
    PAGES = DATA['query']['pages']
    for k, v in PAGES.items():
        try:
            lat_dict[city] = str(v['coordinates'][0]['lat'])
            lng_dict[city] = str(v['coordinates'][0]['lon'])
        except:
            pass
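As an alternative to one request per city, the MediaWiki API accepts several page titles in a single query when they are joined with "|". The sketch below only builds the parameters and parses a response. build_params and extract_coordinates are hypothetical helper names, and the sample response is hand-written with placeholder values just to exercise the parser. It is not real API output, and the real response may need extra handling (limits, continuation).

```python
def build_params(cities):
    # Several titles go into one request, separated by "|".
    return {
        "action": "query",
        "format": "json",
        "titles": "|".join(cities),
        "prop": "coordinates",
    }


def extract_coordinates(data):
    """Return {page title: (lat, lon)} for every page that carries coordinates."""
    coords = {}
    for page in data.get("query", {}).get("pages", {}).values():
        if "coordinates" in page:
            first = page["coordinates"][0]
            coords[page["title"]] = (first["lat"], first["lon"])
    return coords


# Hand-written placeholder response (assumption), shaped like DATA in the code above.
sample = {"query": {"pages": {
    "1": {"title": "Page A", "coordinates": [{"lat": 10.0, "lon": 20.0}]},
    "-1": {"title": "Page B", "missing": ""},
}}}

params = build_params(["Sesimbra", "Olhão"])
print(params["titles"])
print(extract_coordinates(sample))
```

The result is keyed by the page title from the response rather than by the input string, since the two can differ and pages without coordinates are simply left out.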
I have produced a couple of json files after scraping a few elements. The structure for each file is as follows:
us.json
{'Pres': 'Biden', 'Vice': 'Harris', 'Secretary': 'Blinken'}
uk.json
{'1st Min': 'Johnson', 'Queen':'Elizabeth', 'Prince': 'Charles'}
I'd like to know how I could change the structure of each dictionary inside the json file to get output like the following:
[
{"title": "Pres",
"name": "Biden"}
,
{"title": "Vice",
"name": "Harris"}
,
{"title": "Secretary",
"name": "Blinken"}
]
As far as I can tell (I'm a beginner and have only been studying for a few weeks), I first need to run a loop to open each file, then generate a list of dictionaries, and finally modify each dictionary to change its structure. This is what I have so far. It does NOT work, because it always overwrites the same keys.
import os
import json
list_of_dicts = []
for filename in os.listdir("DOCS/Countries Data"):
    with open(os.path.join("DOCS/Countries Data", filename), 'r', encoding='utf-8') as f:
        text = f.read()
        country_json = json.loads(text)
        list_of_dicts.append(country_json)
for country in list_of_dicts:
    newdict = country
    lastdict = {}
    for key in newdict:
        lastdict = {'Title': key}
    for value in newdict.values():
        lastdict['Name'] = value
    print(lastdict)
Extra bonus if you could also show me how to generate an ID number for each entry. Thank you very much.
This looks like a task for a list comprehension. I would do it the following way:
import json
us = '{"Pres": "Biden", "Vice": "Harris", "Secretary": "Blinken"}'
data = json.loads(us)
us2 = [{"title":k,"name":v} for k,v in data.items()]
us2json = json.dumps(us2)
print(us2json)
output
[{"title": "Pres", "name": "Biden"}, {"title": "Vice", "name": "Harris"}, {"title": "Secretary", "name": "Blinken"}]
data is a dict; .items() provides key-value pairs, which I unpack into k and v (see tuple unpacking).
You can do this easily by writing a simple function like the one below:
import uuid
def format_dict(data: dict):
    return [dict(title=title, name=name, id=str(uuid.uuid4())) for title, name in data.items()]
Here the items are split into separate objects, and uuid adds an identifier to each.
The full code can be modified like this:
import uuid
import os
import json
def format_dict(data: dict):
    return [dict(title=title, name=name, id=str(uuid.uuid4())) for title, name in data.items()]
list_of_dicts = []
for filename in os.listdir("DOCS/Countries Data"):
    with open(os.path.join("DOCS/Countries Data", filename), 'r', encoding='utf-8') as f:
        country_json = json.load(f)
        list_of_dicts.append(format_dict(country_json))
# list_of_dicts contains all file contents
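If a plain running number is enough as the ID, rather than a UUID, enumerate can supply it. This is a sketch under that assumption. format_dict_with_counter is a hypothetical name, and the numbering restarts for every dictionary unless a different start value is passed in.

```python
def format_dict_with_counter(data, start=1):
    # enumerate() yields a running counter alongside each key-value pair.
    return [
        {"id": i, "title": title, "name": name}
        for i, (title, name) in enumerate(data.items(), start=start)
    ]


us = {"Pres": "Biden", "Vice": "Harris", "Secretary": "Blinken"}
records = format_dict_with_counter(us)
print(records)
```

To keep IDs unique across several files, one option is to pass start=len(all_records) + 1 for each new file, where all_records is the combined list built so far.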
I have three lists emojiLink, emojiTitle, emojiDescription in my code below.
import requests
from bs4 import BeautifulSoup
import pandas as pd
r = requests.get("https://www.emojimeanings.net/list-smileys-people-whatsapp")
soup = BeautifulSoup(r.text, "lxml")
emojiLink = []
emojiTitle = []
emojiDescription = []
for tableRow in soup.find_all("tr", attrs={"class": "ugc_emoji_tr"}):
    for img in tableRow.findChildren("img"):
        emojiLink.append(img['src'])
for tableData in soup.find_all("td"):
    for boldTag in tableData.findChildren("b"):
        emojiTitle.append(boldTag.text)
for tableRow in soup.find_all("tr", attrs={"class": "ugc_emoji_tr"}):
    for tabledata in tableRow.findChildren("td"):
        if tabledata.has_attr("id"):
            k = tabledata.text.strip().split('\n')[-1]
            l = k.lstrip()
            emojiDescription.append(l)
I want to convert these lists into a JSON object that looks like this:
{{"link": "emojiLink[0]", "title": "emojiTitle[0]", "desc": "emojiDescription[0]"},{"link": "emojiLink[1]", "title": "emojiTitle[1]", "desc": "emojiDescription[1]"}..........} so on...
I can't figure out how to do this.
THANKS IN ADVANCE!!!
This returns an array of JSON objects based off of Chandella07's answer.
from bs4 import BeautifulSoup
import pandas as pd
import requests
import json
r = requests.get("https://www.emojimeanings.net/list-smileys-people-whatsapp")
soup = BeautifulSoup(r.text, "lxml")
emojiLinkList = []
emojiTitleList = []
emojiDescriptionList = []
jsonData = []
for tableRow in soup.find_all("tr", attrs={"class": "ugc_emoji_tr"}):
    for img in tableRow.findChildren("img"):
        emojiLinkList.append(img['src'])
for tableData in soup.find_all("td"):
    for boldTag in tableData.findChildren("b"):
        emojiTitleList.append(boldTag.text)
for tableRow in soup.find_all("tr", attrs={"class": "ugc_emoji_tr"}):
    for tabledata in tableRow.findChildren("td"):
        if tabledata.has_attr("id"):
            k = tabledata.text.strip().split('\n')[-1]
            l = k.lstrip()
            emojiDescriptionList.append(l)
for link, title, desc in zip(emojiLinkList, emojiTitleList, emojiDescriptionList):
    # "record" avoids shadowing the built-in name dict
    record = {"link": link, "title": title, "desc": desc}
    jsonData.append(record)
print(json.dumps(jsonData, indent=2))
Data Example:
{
"link": "https://www.emojimeanings.net/img/emojis/purse_1f45b.png",
"title": "Wallet",
"desc": "After the shopping trip, the money has run out or the wallet was forgotten at home. The accessory keeps loose money but also credit cards or make-up. Can refer to shopping or money and stand for femininity and everything girlish."
},
Access the elements of the lists one by one, put them into a dict, and append that dict to a list:
import json
# some example lists
em_link = ['a', 'b', 'c']
em_title = ['x', 'y', 'z']
em_desc = [1,2,3]
arr = []
for i, j, k in zip(em_link, em_title, em_desc):
    d = {}
    d.update({"link": i})
    d.update({"title": j})
    d.update({"desc": k})
    arr.append(d)
print(json.dumps(arr))
Output:
[{"link": "a", "title": "x", "desc": 1}, {"link": "b", "title": "y", "desc": 2}, {"link": "c", "title": "z", "desc": 3}]
There is something wrong with your dict format. {{...},{...}} is not a valid format, [{...},{...}] is valid.
Regarding the merging logic:
for i in zip([1,2,3], ["a", "b"], [8,9,10]):
    print(i)
... will output ...
(1, 'a', 8)
(2, 'b', 9)
Try something like this:
out = []
for i in zip(emojiLink, emojiTitle, emojiDescription):
    out.append({"link": i[0], ...})
You can use the json library to read/write in json format.
import json
with open('./smthn.json', 'w') as f:
    json.dump({"a": "dictionary"}, f)
https://devtut.github.io/python/json-module.html#storing-data-in-a-file
So, you want a list of dictionary records? If you're sure all of the lists are the same length, you can do:
import json
gather = []
for l, t, d in zip(emojiLink, emojiTitle, emojiDescription):
    gather.append({"link": l, "title": t, "desc": d})
# the with block closes the file once the data is written
with open("myrecord.json", "w") as f:
    json.dump(gather, f)
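If the lists might not be the same length, zip silently drops the extra items. One way to guard against that is an explicit length check before zipping. In this sketch, to_records is a hypothetical helper name and the input lists are made-up examples.

```python
import json


def to_records(links, titles, descriptions):
    # zip() stops at the shortest input, so compare the lengths first.
    if not (len(links) == len(titles) == len(descriptions)):
        raise ValueError(
            "lists differ in length: %d, %d, %d"
            % (len(links), len(titles), len(descriptions))
        )
    return [
        {"link": link, "title": title, "desc": desc}
        for link, title, desc in zip(links, titles, descriptions)
    ]


records = to_records(["a", "b"], ["x", "y"], ["one", "two"])
print(json.dumps(records))
```

On Python 3.10 or newer, zip(..., strict=True) raises a ValueError for mismatched lengths and could replace the manual check.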
I have an image like this
I'm trying to extract the form data like this:
{
"comments":"nil",
"namefirst":"Jhon",
"last":"Doe",
"mf":"",
"address 1": "PICADALLY LONDON",
"APT":"103",
"City": "London",
"State":"Nil",
"DOB": "",
"AGE": 43,
"Phone Number":"+4464343",
"email":"nil",
"date":"20-03-2012"
}
But I'm unable to extract it like that. I can get the box boundaries, but I've been stuck at this point for 5 days. Any help would be greatly appreciated.
My code:
items = []
lines = {}
for text in response.text_annotations[1:]:
    top_x_axis = text.bounding_poly.vertices[0].x
    top_y_axis = text.bounding_poly.vertices[0].y
    bottom_y_axis = text.bounding_poly.vertices[3].y
    if top_y_axis not in lines:
        lines[top_y_axis] = [(top_y_axis, bottom_y_axis), []]
    for s_top_y_axis, s_item in lines.items():
        if top_y_axis < s_item[0][1]:
            lines[s_top_y_axis][1].append((top_x_axis, text.description))
            break
for _, item in lines.items():
    if item[1]:
        words = sorted(item[1], key=lambda t: t[0])
        items.append((item[0], ' '.join([word for _, word in words]), words))
print(items)
Can anyone help me with this.
Thanks in advance
So I have a small data like this:
data = [
{"Name":"Arab","Code":"Zl"},
{"Name":"Korea","Code":"Bl"},
{"Name":"China","Code":"Bz"}
]
I want to plot a graph whose x-axis is: "Bl", "Bz", "Zl" (alphabetical order)
and the y-axis is: "Korea", "China", "Arab" (corresponding to the codenames).
I thought of:
new_data = {}
for dic in data:
    country_data = dic["Name"]
    code_data = dic["Code"]
    new_data[code_data] = country_data
code_data = []
for codes in new_data.keys():
    code_data.append(codes)
code_data.sort()
name_data = []
for code in code_data:
    name_data.append(new_data[code])
Is there a better way to do this?
Perhaps by not creating a new dictionary?
So here's the data:
data = [
{"Name":"Arab","Code":"Zl"},
{"Name":"Korea","Code":"Bl"},
{"Name":"China","Code":"Bz"}
]
To create a new sorted list:
new_list = sorted(data, key=lambda k: k['Code'])
If you don't want to get a new list:
data[:] = sorted(data, key=lambda k: k['Code'])
The result is (on Python 3.7+, where dictionaries keep insertion order):
[{'Name': 'Korea', 'Code': 'Bl'}, {'Name': 'China', 'Code': 'Bz'}, {'Name': 'Arab', 'Code': 'Zl'}]
I hope I could help you!
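For sorting the existing list without building a new one, list.sort is another option. A short sketch that also pulls out the two axis lists the question asks for, with operator.itemgetter standing in for the lambda:

```python
from operator import itemgetter

data = [
    {"Name": "Arab", "Code": "Zl"},
    {"Name": "Korea", "Code": "Bl"},
    {"Name": "China", "Code": "Bz"},
]

# list.sort() reorders the list in place and returns None.
data.sort(key=itemgetter("Code"))

# After sorting, both axis lists can be read off in matching order.
codes = [d["Code"] for d in data]
names = [d["Name"] for d in data]

print(codes)
print(names)
```

No intermediate dictionary is needed, and the original list object is reused.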
Better way to produce same results:
from operator import itemgetter
data = [
{"Name": "Arab", "Code": "Zl"},
{"Name": "Korea", "Code": "Bl"},
{"Name": "China", "Code": "Bz"}
]
sorted_data = ((d["Code"], d["Name"]) for d in sorted(data, key=itemgetter("Code")))
code_data, name_data = (list(item) for item in zip(*sorted_data))
print(code_data) # -> ['Bl', 'Bz', 'Zl']
print(name_data) # -> ['Korea', 'China', 'Arab']
Here's one way using operator.itemgetter and unpacking via zip:
from operator import itemgetter
_, data_sorted = zip(*sorted(enumerate(data), key=lambda x: x[1]['Code']))
codes, names = zip(*map(itemgetter('Code', 'Name'), data_sorted))
print(codes)
# ('Bl', 'Bz', 'Zl')
print(names)
# ('Korea', 'China', 'Arab')