Different Behavior for Parsing Two Similar Wikipedia Infoboxes

Different Behavior for Parsing Two Similar Wikipedia Infoboxes - python

I have two infoboxes that look exactly the same to me, but I'm getting different behavior in mwparserfromhell. In the first instance I'm getting what I expect - the entire infobox is captured as a template object. In the second instance parts of the infobox are extracted as separate templates. This is confusing since the infoboxes look very similar to me, and I was expecting that the entire infobox could be extracted in the second case.
This is the code I'm using:
mwparserfromhell.parse(text.strip().lower()).filter_templates()
Text 1 Input:
txt1 = """{{Infobox building
| name = 666 Fifth Avenue
| former_names = Tishman Building
| status = Complete
| image = 666 Fifth Avenue by David Shankbone.jpg
| image_size = 300px
| caption =
| location = 666 Fifth Avenue<br>[[Manhattan]], [[New York (state)|New York]] 10103
| coordinates = {{coord|40.760163|-73.976204|format=dms}}
| start_date =
| completion_date = 1957
| architect = [[Carson & Lundin]]
| owner = [[Brookfield Properties]]
| cost = $40 million
| floor_area = {{convert|1,463,892|sqft|m2|abbr=on}}
| top_floor =
| floor_count = 41
| references =
| map_type =
| building_type = Office
| antenna_spire =
| roof = {{convert|483|ft|m|abbr=on}}
| elevator_count = 24 (20 passenger, 4 freight)
| structural_engineer =
| main_contractor =
| opening = November 25, 1957
| developer = Tishman Realty and Construction
| management =
}}"""
Text 1 Output:
['{{infobox building\n| name = 666 fifth avenue\n| former_names = tishman building\n| status = complete\n| image = 666 fifth avenue by david shankbone.jpg\n| image_size = 300px\n| caption = \n| location = 666 fifth avenue<br>[[manhattan]], [[new york (state)|new york]] 10103\n| coordinates = {{coord|40.760163|-73.976204|format=dms}}\n| start_date = \n| completion_date = 1957\n| architect = [[carson & lundin]]\n| owner = [[brookfield properties]]\n| cost = $40 million\n| floor_area = {{convert|1,463,892|sqft|m2|abbr=on}}\n| top_floor = \n| floor_count = 41\n| references = \n| map_type = \n| building_type = office\n| antenna_spire = \n| roof = {{convert|483|ft|m|abbr=on}}\n| elevator_count = 24 (20 passenger, 4 freight)\n| structural_engineer = \n| main_contractor = \n| opening = november 25, 1957\n| developer = tishman realty and construction\n| management = \n}}',
'{{coord|40.760163|-73.976204|format=dms}}',
'{{convert|1,463,892|sqft|m2|abbr=on}}',
'{{convert|483|ft|m|abbr=on}}']
Text 2 Input:
txt2 = """{{Infobox building
| name = Central Park Tower
| alternate_names = Nordstrom Tower
| image = Central Park Tower April 2020.jpg
| caption = Central Park Tower on April 25, 2020
| location = 225 [[57th Street (Manhattan)|West 57th Street]]<br/>[[Manhattan]], [[New York City]], [[New York (state)|New York]], [[United States|U.S.]]
| coordinates = {{coord|40.7663|-73.9810|type:landmark_globe:earth_region:US-NY|display=inline,title}}
| status = Topped Out
| start_date = 2014
| est_completion = 2020<ref name=curbed>{{cite news |author=Amy Plitt |url=https://ny.curbed.com/2017/6/1/15714666/central-park-tower-offering-plan-approval-sales-launch |title=Central Park Tower is now one step closer to launching sales |date=June 1, 2017 |access-date=August 30, 2017 |work=Curbed}}</ref>
| building_type = [[Residential]], [[retail]]
| architectural_style = [[Modern architecture|Modern]]
| architectural = {{cvt|1550|ft|0}}
| floor_count = 131<ref>{{cite web |url=https://www.architecturaldigest.com/story/new-york-city-central-park-tower-worlds-tallest-residential-building </ref><ref>{{cite web |url=https://archpaper.com/2019/09/central-park-tower-tops-out/</ref> (98 habitable floors)<ref name="auto">{{Cite web |url=http://www.skyscrapercenter.com/building/central-park-tower/14269 |title=Central Park Tower - The Skyscraper Center |website=www.skyscrapercenter.com |access-date=October 10, 2018}}</ref>
| elevator_count = 11
| cost = $3 billion<ref name="Tase">{{cite news|url=https://commercialobserver.com/2019/04/all-in-good-tase-the-crisis-for-the-american-cohort-in-tel-aviv-is-essentially-over/|title=All in Good TASE: The Crisis for the American Cohort in Tel Aviv Is Essentially Over|date=April 4, 2019|work=Commercial Observer|last=Gourarie|first=Chava}}</ref>
| floor_area = {{convert|1,285,308|sqft|m2}}<ref name="auto" />
| architect = [[Adrian Smith + Gordon Gill Architecture]]
| structural_engineer = [[WSP Global]]
| main_contractor = [[Lendlease]]
| developer = [[Extell Development Company]]
}}"""
Text 2 Output:
['{{coord|40.7663|-73.9810|type:landmark_globe:earth_region:us-ny|display=inline,title}}',
'{{cite news |author=amy plitt |url=https://ny.curbed.com/2017/6/1/15714666/central-park-tower-offering-plan-approval-sales-launch |title=central park tower is now one step closer to launching sales |date=june 1, 2017 |access-date=august 30, 2017 |work=curbed}}',
'{{cvt|1550|ft|0}}',
'{{cite web |url=https://archpaper.com/2019/09/central-park-tower-tops-out/</ref> (98 habitable floors)<ref name="auto">{{cite web |url=http://www.skyscrapercenter.com/building/central-park-tower/14269 |title=central park tower - the skyscraper center |website=www.skyscrapercenter.com |access-date=october 10, 2018}}</ref>\n| elevator_count = 11\n| cost = $3 billion<ref name="tase">{{cite news|url=https://commercialobserver.com/2019/04/all-in-good-tase-the-crisis-for-the-american-cohort-in-tel-aviv-is-essentially-over/|title=all in good tase: the crisis for the american cohort in tel aviv is essentially over|date=april 4, 2019|work=commercial observer|last=gourarie|first=chava}}</ref>\n| floor_area = {{convert|1,285,308|sqft|m2}}<ref name="auto" />\n| architect = [[adrian smith + gordon gill architecture]]\n| structural_engineer = [[wsp global]]\n| main_contractor = [[lendlease]]\n| developer = [[extell development company]]\n}}',
'{{cite web |url=http://www.skyscrapercenter.com/building/central-park-tower/14269 |title=central park tower - the skyscraper center |website=www.skyscrapercenter.com |access-date=october 10, 2018}}',
'{{cite news|url=https://commercialobserver.com/2019/04/all-in-good-tase-the-crisis-for-the-american-cohort-in-tel-aviv-is-essentially-over/|title=all in good tase: the crisis for the american cohort in tel aviv is essentially over|date=april 4, 2019|work=commercial observer|last=gourarie|first=chava}}',
'{{convert|1,285,308|sqft|m2}}']

This is a known bug for mwparserfromhell. My workaround was to create an on-the-fly regex pattern that would remove the ref link but keep the rest of the text intact.
{{ this is text <ref>this is a ref link}}</ref>
to
{{ this is text }}
Here's the code:
def get_regex_str_pattern(reg_str):
""" Creates regex string to remove specific patterns that are embedded in larger strings
:param reg_str: String to tokenize
:return: Regex pattern
"""
return r'.*'.join([f"({re.escape(r.strip())})" for r in reg_str.split()])
def get_ref_clean_str(txt):
""" Removes badly formed ref strings from wiki text
:param txt: Wiki text
:return: Parser object
"""
clean_txt = txt
wiki_text = mwparserfromhell.parse(txt)
for r in wiki_text.filter_tags():
if str(r.tag) in ('ref', 'br'):
clean_txt = re.sub(get_regex_str_pattern(r), ' ', clean_txt)
return mwparserfromhell.parse(clean_txt)

Related

Find intersection of values in dictionary under a same key - python

I have a file with lines, each line is split on "|", I want to compare arguments 5 from each line and if intersect, then proceed. This gets to a second part:
first arguments1,2 are compared by dictionary, and if they are same AND
if arguments5,6 are overlapping,
then those lines get concatenated.
How to compare intersection of values under the same key? The code below works cross-key but not within same key:
from functools import reduce
reduce(set.intersection, (set(val) for val in query_dict.values()))
Here is an example of lines:
text1|text2|text3|text4 text 5| text 6| text 7 text 8|
text1|text2|text12|text4 text 5| text 6| text 7|
text9|text10|text3|text4 text 5| text 11| text 12 text 8|
The output should be:
text1|text2|text3;text12|text4;text5;text4;text5|text6;text7 text8;text6
In other words, only those lines that are matching by 1st,2nd arguments (cells equal) and if 5th,6th arguments are overlapping (intersection) are concatenated.
Here is input file:
Angela Darvill|19036321|School of Nursing, University of Salford, Peel House Eccles, Manchester M30 0NN, UK.|['GB','US']|['Salford', 'Eccles', 'Manchester']
Helen Stanley|19036320|Senior Lecturer, Institute of Nursing and Midwifery, University of Brighton, Westlain House, Village Way, Falmer, BN1 9PH Brighton, UK.|['US']|['Brighton', 'Brighton']
Angela Darvill|190323121|School of Nursing, University of Salford, Peel House Eccles, Manchester M30 0NN, UK.|['US']|['Brighton', 'Eccles', 'Manchester']
Helen Stanley|19576876320|Senior Lecturer, Institute of Nursing and Midwifery, University of Brighton, Westlain House, Village Way, Falmer, BN1 9PH Brighton, UK.|['US']|['Brighton', 'Brighton']
The output should look like:
Angela Darvill|19036321;190323121|...
Helen Stanley|19036320;19576876320|...
Angela Darvill gets stacked because two records share same name, same country and same city(-ies).

Based on your improved question :
import itertools
data = """\
Angela Darvill|19036321|School of Nursing, University of Salford, Peel House Eccles, Manchester M30 0NN, UK.|['GB','US']|['Salford', 'Eccles', 'Manchester']
Helen Stanley|19036320|Senior Lecturer, Institute of Nursing and Midwifery, University of Brighton, Westlain House, Village Way, Falmer, BN1 9PH Brighton, UK.|['US']|['Brighton', 'Brighton']
Angela Darvill|190323121|School of Nursing, University of Salford, Peel House Eccles, Manchester M30 0NN, UK.|['US']|['Brighton', 'Eccles', 'Manchester']
Helen Stanley|19576876320|Senior Lecturer, Institute of Nursing and Midwifery, University of Brighton, Westlain House, Village Way, Falmer, BN1 9PH Brighton, UK.|['US']|['Brighton', 'Brighton']
"""
lines = tuple(tuple(line.split('|')) for line in data.splitlines())
results = []
for line_a_index, line_a in enumerate(lines):
# we want to compare each line with each other, so we start at index+1
for line_b_index, line_b in enumerate(lines[line_a_index+1:], start=line_a_index+1):
assert len(line_a) >= 5, f"not enough cells ({len(line_a)}) in line {line_a_index}"
assert len(line_b) >= 5, f"not enough cells ({len(line_b)}) in line {line_b_index}"
assert all(isinstance(cell, str) for cell in line_a)
assert all(isinstance(cell, str) for cell in line_b)
columns0_are_equal = line_a[0] == line_b[0]
columns1_are_equal = line_a[1] == line_b[1]
columns3_are_overlap = set(line_a[3]).issubset(set(line_b[3])) or set(line_b[3]).issubset(set(line_a[3]))
columns4_are_overlap = set(line_a[4]).issubset(set(line_b[4])) or set(line_b[4]).issubset(set(line_a[4]))
print(f"between lines index={line_a_index} and index={line_b_index}, {columns0_are_equal=} {columns1_are_equal=} {columns3_are_overlap=} {columns4_are_overlap=}")
if (
columns0_are_equal
# and columns1_are_equal
and (columns3_are_overlap or columns4_are_overlap)
):
print("MATCH!")
results.append(
(line_a_index, line_b_index,) + tuple(
((cell_a or "") + (";" if (cell_a or cell_b) else "") + (cell_b or "")) if cell_a != cell_b
else cell_a
for cell_a, cell_b in itertools.zip_longest(line_a, line_b)
)
)
print("Fancy output :")
lines_to_display = set(itertools.chain.from_iterable((lines[result[0]], lines[result[1]], result[2:]) for result in results))
columns_widths = (max(len(str(index)) for result in results for index in (result[0], result[1])),) + tuple(
max(len(cell) for cell in column)
for column in zip(*lines_to_display)
)
for width in columns_widths:
print("-" * width, end="|")
print("")
for result in results:
for line_index, original_line in zip((result[0], result[1]), (lines[result[0]], lines[result[1]])):
for column_index, cell in zip(itertools.count(), (str(line_index),) + original_line):
if cell:
print(cell.ljust(columns_widths[column_index]), end='|')
print("", end='\n') # explicit newline
for column_index, cell in zip(itertools.count(), ("=",) + result[2:]):
if cell:
print(cell.ljust(columns_widths[column_index]), end='|')
print("", end='\n') # explicit newline
for width in columns_widths:
print("-" * width, end="|")
print("")
expected_outputs = """\
Angela Darvill|19036321;190323121|...
Helen Stanley|19036320;19576876320|...
""".splitlines()
for result, expected_output in itertools.zip_longest(results, expected_outputs):
actual_output = "|".join(result[2:])
assert actual_output.startswith(expected_output[:-3]) # minus the "..."
-|--------------|--------------------|---------------------------------------------------------------------------------------------------------------------------------------|------------------|------------------------------------------------------------------------|
0|Angela Darvill|19036321 |School of Nursing, University of Salford, Peel House Eccles, Manchester M30 0NN, UK. |['GB','US'] |['Salford', 'Eccles', 'Manchester'] |
2|Angela Darvill|190323121 |School of Nursing, University of Salford, Peel House Eccles, Manchester M30 0NN, UK. |['US'] |['Brighton', 'Eccles', 'Manchester'] |
=|Angela Darvill|19036321;190323121 |School of Nursing, University of Salford, Peel House Eccles, Manchester M30 0NN, UK. |['GB','US'];['US']|['Salford', 'Eccles', 'Manchester'];['Brighton', 'Eccles', 'Manchester']|
1|Helen Stanley |19036320 |Senior Lecturer, Institute of Nursing and Midwifery, University of Brighton, Westlain House, Village Way, Falmer, BN1 9PH Brighton, UK.|['US'] |['Brighton', 'Brighton'] |
3|Helen Stanley |19576876320 |Senior Lecturer, Institute of Nursing and Midwifery, University of Brighton, Westlain House, Village Way, Falmer, BN1 9PH Brighton, UK.|['US'] |['Brighton', 'Brighton'] |
=|Helen Stanley |19036320;19576876320|Senior Lecturer, Institute of Nursing and Midwifery, University of Brighton, Westlain House, Village Way, Falmer, BN1 9PH Brighton, UK.|['US'] |['Brighton', 'Brighton'] |
-|--------------|--------------------|---------------------------------------------------------------------------------------------------------------------------------------|------------------|------------------------------------------------------------------------|
You can see that the lines index 0 and 2 have been merged, same for the lines index 1 and 3.

from itertools import zip_longest
data = """\
text1|text2|text3|text4 text 5| text 6| text 7 text 8|
text1|text2|text12|text4 text 5| text 6| text 7| text9|text10|text3|text4 text 5| text 11| text 12 text 8|
"""
lines = tuple(line.split('|') for line in data.splitlines())
number_of_lines = len(lines)
print(f"number of lines : {number_of_lines}")
print(f"number of cells in line 1 : {len(lines[0])}")
print(f"number of cells in line 2 : {len(lines[1])}")
print(f"{lines[0]=}")
print(f"{lines[1]=}")
result = []
# we want to compare each line with each other :
for line_a_index, line_a in enumerate(lines):
for line_b_index, line_b in enumerate(lines[line_a_index+1:]):
assert len(line_a) >= 5, f"not enough cells ({len(line_a)}) in line {line_a_index}"
assert len(line_b) >= 5, f"not enough cells ({len(line_b)}) in line {line_b_index}"
assert all(isinstance(cell, str) for cell in line_a)
assert all(isinstance(cell, str) for cell in line_b)
if line_a[0] == line_b[0] and line_a[1] == line_b[1] and (
line_a[5] in line_b[5] or line_a[6] in line_b[6] # A in B
or line_b[5] in line_a[5] or line_b[6] in line_a[6] # B in A
):
result.append(tuple(
((cell_a or "") + (";" if (cell_a or cell_b) else "") + (cell_b or "")) if cell_a != cell_b else cell_a
for cell_a, cell_b in zip_longest(line_a[:5+1], line_b[:5+1]) # <-- here I truncated the lines
))
# I decided to have a fancy output, but I made some simplifying assumptions to make it simple
if len(result) > 1:
raise NotImplementedError
widths = tuple(max(len(a) if a is not None else 0, len(b) if b is not None else 0, len(c) if c is not None else 0)
for a, b, c in zip_longest(lines[0], lines[1], result[0]))
length = max(len(lines[0]), len(lines[1]), len(result[0]))
for line in (lines[0], lines[1], result[0]):
for index, cell in zip_longest(range(length), line):
if cell:
print(cell.ljust(widths[index]), end='|')
print("", end='\n') # explicit newline
original_expected_output = "text1|text2|text3;text12|text4;text5;text4;text5|text6;text7 text8;text6"
print(f"{original_expected_output} <-- expected")
lenormju_expected_output = "text1|text2|text3;text12|text4 text 5| text 6| text 7 text 8; text 7"
print(f"{lenormju_expected_output} <-- fixed")
output
number of lines : 2
number of cells in line 1 : 7
number of cells in line 2 : 13
lines[0]=['text1', 'text2', 'text3', 'text4 text 5', ' text 6', ' text 7 text 8', '']
lines[1]=['text1', 'text2', 'text12', 'text4 text 5', ' text 6', ' text 7', ' text9', 'text10', 'text3', 'text4 text 5', ' text 11', ' text 12 text 8', '']
text1|text2|text3 |text4 text 5| text 6| text 7 text 8 |
text1|text2|text12 |text4 text 5| text 6| text 7 | text9|text10|text3|text4 text 5| text 11| text 12 text 8|
text1|text2|text3;text12|text4 text 5| text 6| text 7 text 8; text 7|
text1|text2|text3;text12|text4;text5;text4;text5|text6;text7 text8;text6 <-- expected
text1|text2|text3;text12|text4 text 5| text 6| text 7 text 8; text 7 <-- fixed
EDIT:
from dataclasses import dataclass
from itertools import zip_longest
data = """\
text1|text2|text3|text4 text 5| text 6| text 7 text 8|
text1|text2|text12|text4 text 5| text 6| text 7| text9|text10|text3|text4 text 5| text 11| text 12 text 8|
"""
#dataclass
class Match: # a convenient way to store the solutions
line_a_index: int
line_b_index: int
line_result: tuple
lines = tuple(line.split('|') for line in data.splitlines())
results = []
for line_a_index, line_a in enumerate(lines):
for line_b_index, line_b in enumerate(lines[line_a_index+1:], line_a_index+1):
assert len(line_a) >= 5, f"not enough cells ({len(line_a)}) in line {line_a_index}"
assert len(line_b) >= 5, f"not enough cells ({len(line_b)}) in line {line_b_index}"
assert all(isinstance(cell, str) for cell in line_a)
assert all(isinstance(cell, str) for cell in line_b)
if line_a[0] == line_b[0] and line_a[1] == line_b[1] and (
line_a[5] in line_b[5] or line_a[6] in line_b[6] # A in B
or line_b[5] in line_a[5] or line_b[6] in line_a[6] # B in A
):
line_result = tuple(
((cell_a or "") + (";" if (cell_a or cell_b) else "") + (cell_b or "")) if cell_a != cell_b else cell_a
for cell_a, cell_b in zip_longest(line_a[:5+1], line_b[:5+1]) # <-- here I truncated the lines
)
results.append(Match(line_a_index=line_a_index, line_b_index=line_b_index, line_result=line_result))
# simple output of the solution
for result in results:
print(f"line n°{result.line_a_index} matches with n°{result.line_b_index} : {result.line_result}")
line n°0 matches with n°1 : ('text1', 'text2', 'text3;text12', 'text4 text 5', ' text 6', ' text 7 text 8; text 7')

How to create function to pass lat long in api call for get weather data

I try to get the data from pyOWM package using city name but in some cases because of city typo error
not getting data & it breaks the process.
I want to get the weather data using lat-long but don't know how to set function for it.
Df1:
-----
User City State Zip Lat Long
-----------------------------------------------------------------------------
A Kuala Lumpur Wilayah Persekutuan 50100 5.3288907 103.1344397
B Dublin County Dublin NA 50.2030506 14.5509842
C Oconomowoc NA NA 53.3640384 -6.1953066
D Mumbai Maharashtra 400067 19.2177166 72.9708833
E Mratin Stredocesky kraj 250 63 40.7560585 -5.6924778
.
.
.
----------------------------------
Code:
--------
import time
from tqdm.notebook import tqdm
import pyowm
from pyowm.utils import config
from pyowm.utils import timestamps
cities = Df1["City"].unique().tolist()
cities1 = cities [:5]
owm = pyowm.OWM('bee8db7d50a4b777bfbb9f47d9beb7d0')
mgr = owm.weather_manager()
'''
Step-1 Define list where save the data
'''
list_wind_Speed =[]
list_tempreture =[]
list_max_temp =[]
list_min_temp =[]
list_humidity =[]
list_pressure =[]
list_city = []
list_cloud=[]
list_status =[]
list_rain =[]
'''
Step-2 Fetch data
'''
j=0
for city in tqdm(cities1):
j=+1
if j < 60:
# one_call_obs = owm.weather_at_coords(52.5244, 13.4105).weather
# one_call_obs.current.humidity
observation = mgr.weather_at_place(city)
l = observation.weather
list_city.append(city)
list_wind_Speed.append(l.wind()['speed'])
list_tempreture.append(l.temperature('celsius')['temp'])
list_max_temp.append(l.temperature('celsius')['temp_max'])
list_min_temp.append(l.temperature('celsius')['temp_min'])
list_humidity.append(l.humidity)
list_pressure.append(l.pressure['press'])
list_cloud.append(l.clouds)
list_rain.append(l.rain)
else:
time.sleep(60)
j=0
'''
Step-3 Blank data frame and store data in that
'''
df2 = pd.DataFrame()
df2["City"] = list_city
df2["Temp"] = list_tempreture
df2["Max_Temp"] = list_max_temp
df2["Min_Temp"] = list_min_temp
df2["Cloud"] = list_cloud
df2["Humidity"] = list_humidity
df2["Pressure"] = list_pressure
df2["Status"] = list_status
df2["Rain"] = list_status
df2
From the above code, I get the result as below,
City | Temp |Max_Temp|Min_Temp|Cloud |Humidity|Pressure |Status | Rain
------------------------------------------------------------------------------------------
Kuala Lumpur|29.22 |30.00 |27.78 | 20 |70 |1007 | moderate rain | moderate rain
Dublin |23.12 |26.43 |22.34 | 15 |89 | 978 | cloudy | cloudy
...
Now because of some city typo error processes getting stop,
Looking for an alternate solution of it and try to get weather data from Lat-Long but don't know how to set function for pass lat & long column data.
Df1 = {'User':['A','B','C','D','E'],
'City':['Kuala Lumpur','Dublin','Oconomowoc','Mumbai','Mratin'],
'State':['Wilayah Persekutuan','County Dublin',NA,1'Maharashtra','Stredocesky kraj'],
'Zip': [50100,NA,NA,400067,250 63],
'Lat':[5.3288907,50.2030506,53.3640384,19.2177166,40.7560585],
'Long':[103.1344397,14.5509842,-6.1953066,72.9708833,-5.6924778]}
# Try to use this code to get wather data
# one_call_obs = owm.weather_at_coords(52.5244, 13.4105).weather
# one_call_obs.current.humidity
Expected Result
--------------
User | City | Lat | Long | Temp | Cloud | Humidity | Pressure | Rain | Status
-----------------------------------------------------------------------------

Catch the error if a city is not found, parse the lat/lon from the dataframe. Use that lat/lon to create a bounding box and use weather_at_places_in_bbox to get a list of observations in that area.
import time
from tqdm.notebook import tqdm
import pyowm
from pyowm.utils import config
from pyowm.utils import timestamps
import pandas as pd
from pyowm.commons.exceptions import NotFoundError, ParseAPIResponseError
df1 = pd.DataFrame({'City': ('Kuala Lumpur', 'Dublin', 'Oconomowoc', 'Mumbai', 'C airo', 'Mratin'),
'Lat': ('5.3288907', '50.2030506', '53.3640384', '19.2177166', '30.22', '40.7560585'),
'Long': ('103.1344397', '14.5509842', '-6.1953066', '72.9708833', '31', '-5.6924778')})
cities = df1["City"].unique().tolist()
owm = pyowm.OWM('bee8db7d50a4b777bfbb9f47d9beb7d0')
mgr = owm.weather_manager()
for city in cities:
try:
observation = mgr.weather_at_place(city)
# print(city, observation)
except NotFoundError:
# get city by lat/lon
lat_top = float(df1.loc[df1['City'] == city, 'Lat'])
lon_left = float(df1.loc[df1['City'] == city, 'Long'])
lat_bottom = lat_top - 0.3
lon_right = lon_left + 0.3
try:
observations = mgr.weather_at_places_in_bbox(lon_left, lat_bottom, lon_right, lat_top, zoom=5)
observation = observations[0]
except ParseAPIResponseError:
raise RuntimeError(f"Couldn't find {city} at lat: {lat_top} / lon: {lon_right}, try tweaking the bounding box")
weather = observation.weather
temp = weather.temperature('celsius')['temp']
print(f"The current temperature in {city} is {temp}")

BeautifulSoup + For loop Zip - How to Try/Except one of them

Got the following code:
from bs4 import BeautifulSoup
import requests
import re
# Source Sites
mimo = 'https://tienda.mimo.com.ar/mimo/junior/ropa-para-ninas.html'
cheeky = ''
grisino = ''
source = requests.get(mimo).text
soup = BeautifulSoup(source, 'lxml')
for name_product, old_price, special_price in zip(soup.select('h3.titprod'),
soup.select('span[id^="old-price"]'),
soup.select('span[id^="product-price"]')):
print(f'Name: {name_product.text.strip()} | Old price = {old_price.text.strip()} | Discounted price = {special_price.text.strip()}')
that outputs the information perfectly:
Name: TAPABOCAS | Old price = $ 295 | Discounted price = $ 236
Name: REMERA JR TOWN | Old price = $ 990 | Discounted price = $ 743
Name: CAMISOLA NENA DELFI | Old price = $ 2.300 | Discounted price = $ 1.725
Name: CAMISOLA JR TRAFUL | Old price = $ 1.550 | Discounted price = $ 1.163
Name: VESTIDO NENA DELFI | Old price = $ 2.990 | Discounted price = $ 2.243
Name: SAQUITO JR DESAGUJADO | Old price = $ 1.990 | Discounted price = $ 1.493
Name: JEGGING JR ENGOMADO | Old price = $ 1.990 | Discounted price = $ 1.493
but...sometimes the special_price loop won't find a discounted price..so i need to make a try/except, tried to "preprocess it"..but i do not know how make it work..
special_prices_with_defaults_added = []
for sp in soup.select('span[id^="product-price"]'):
try:
special_prices_with_defaults_added.append(sp.text.strip())
except:
special_prices_with_defaults_added.append("No default price available")
for name_product, old_price, special_price in zip(
soup.select('h3.titprod'), soup.select('span[id^="old-price"]'), special_prices_with_defaults_added):
print(f'Name: {name_product.text.strip()} | Old price = {old_price.text.strip()} | Discounted price = {special_prices_with_defaults_added}')
Wrongly output:
Name: TAPABOCAS | Old price = $ 295 | Discounted price = ['$\xa0236', '$\xa0743', '$\xa01.725', '$\xa01.163', '$\xa02.243', '$\xa01.493', '$\xa01.493', '$\xa02.925', '$\xa0668', '$\xa0713', '$\xa01.688', '$\xa01.268', '$\xa0593', '$\xa0743', '$\xa01.125', '$\xa03.300', '$\xa02.175', '$\xa0743', '$\xa01.493', '$\xa0863', '$\xa0668', '$\xa0792', '$\xa01.520', '$\xa01.760', '$\xa0696', '$\xa03.150', '$\xa03.520', '$\xa0712', '$\xa01.352', '$\xa01.112', '$\xa01.112', '$\xa01.192', '$\xa02.800', '$\xa02.720', '$\xa03.920', '$\xa01.920']
Name: REMERA JR TOWN | Old price = $ 990 | Discounted price = ['$\xa0236', '$\xa0743', '$\xa01.725', '$\xa01.163', '$\xa02.243', '$\xa01.493', '$\xa01.493', '$\xa02.925', '$\xa0668', '$\xa0713', '$\xa01.688', '$\xa01.268', '$\xa0593', '$\xa0743', '$\xa01.125', '$\xa03.300', '$\xa02.175', '$\xa0743', '$\xa01.493', '$\xa0863', '$\xa0668', '$\xa0792', '$\xa01.520', '$\xa01.760', '$\xa0696', '$\xa03.150', '$\xa03.520', '$\xa0712', '$\xa01.352', '$\xa01.112', '$\xa01.112', '$\xa01.192', '$\xa02.800', '$\xa02.720', '$\xa03.920', '$\xa01.920']

As #furas said... it was just a small fix on the for loop call.
for name_product, old_price, special_price in zip(
soup.select('h3.titprod'), soup.select('span[id^="old-price"]'), special_prices_with_defaults_added):
print(
f'Name: {name_product.text.strip()} | Old price = {old_price.text.strip()} | Discounted price = {special_price}')

How to split string into column names with python/panda?

Do you know how to solve this in python? I would like to have a dataframe with data arranged in the correct column.
Thanks in advance!
Here is an example of a string from a dataframe.
' Huidigefuncties Michael Jordan 2015 - present Director Marketing & Indirect Channels, Ricoh Nederland 2010 - present Basketball Center, Center for Business-Expertise Loopbaan Michael Jordan 2012 - 2015 Director Marketing & Business Development, Ricoh Opleiding Michael Jordan 1988 - 1992 Marketing , Harvard '
Preferred result
type from to function organization
current 2015 present Director Marketing & Indirect Channels Ricoh Nederland
current 2010 present Owner & Consultant Basketball Center
old 2012 2015 Director Marketing & Business Development Ricoh
school 1988 1992 Marketing Harvard
Current df
Name Data
Michael Jordan ' Huidigefuncties Michael Jordan 2015 - present Director Marketing & Indirect Channels, Ricoh Nederland 2010 - present Basketball Center, Center for Business-Expertise Loopbaan Michael Jordan 2012 - 2015 Director Marketing & Business Development, Ricoh Opleiding Michael Jordan 1988 - 1992 Marketing , Harvard '

Well, this is a solution that I did for this problem
import pandas as pd
beautiful_data = 'Huidigefuncties Michael Jordan 2015 - present Director Marketing & Indirect Channels, Ricoh Nederland 2010 - present Basketball Center, Center for Business-Expertise Loopbaan Michael Jordan 2012 - 2015 Director Marketing & Business Development, Ricoh Opleiding Michael Jordan 1988 - 1992 Marketing , Harvard'
main_dict = {'type':[], 'from':[], 'to':[], 'function':[], 'organization': []}
data = beautiful_data.split(' ')
i = 0
huidi_index = data.index('Huidigefuncties')
loopbaan_index = data.index('Loopbaan')
ople_index = data.index('Opleiding')
# print(data)
while i < len(data):
if data[i] == 'Huidigefuncties':
line = ' '.join(data[i + 1: loopbaan_index])
i = loopbaan_index
print(line)
type_data = 'current'
elif data[i] == 'Loopbaan':
line = ' '.join(data[i + 1: ople_index])
i = ople_index
print(line)
type_data = 'old'
elif data[i] == 'Opleiding':
line = ' '.join(data[i+1: ])
i = len(data)
print(line)
type_data = 'school'
else:
i += 1
data_line = line.split('-')
if len(data_line) == 2:
print(type_data)
main_dict['type'].append(type_data)
from_data = data_line[0].strip().split(' ')[-1]
print(from_data)
main_dict['from'].append(from_data)
to_data = data_line[1].strip().split(' ')[0]
print(to_data)
main_dict['to'].append(to_data)
function_data = ' '.join(data_line[1].strip().split(' ')[1:-1])[:-1]
print(function_data)
main_dict['function'].append(function_data)
organization_data = data_line[1].split(',')[-1].strip()
print(organization_data)
main_dict['organization'].append(organization_data)
elif len(data_line) > 2:
j = 0
while j < len(data_line):
register_data = data_line[j:j+2]
if len(register_data) > 1:
if len(register_data[0].split(' ')) > 1 and len(register_data[1].split(' ')) > 1:
if j == 0:
print(register_data)
print('----------')
print(type_data)
main_dict['type'].append(type_data)
from_data = register_data[0].strip().split(' ')[-1]
print(from_data)
main_dict['from'].append(from_data)
to_data = register_data[1].strip().split(' ')[0]
print(to_data)
main_dict['to'].append(to_data)
function_org = register_data[1].strip().split(',')
function_data = ' '.join(function_org[0].split(' ')[1:])
print(function_data)
main_dict['function'].append(function_data)
org_data = ' '.join(function_org[1].split(' ')[:-1]).strip()
print(org_data)
main_dict['organization'].append(org_data)
print('-----------')
else:
print('-----------')
print(register_data)
print(type_data)
main_dict['type'].append(type_data)
from_data = register_data[0].strip().split(' ')[-1]
print(from_data)
main_dict['from'].append(from_data)
to_data = register_data[1].strip().split(' ')[0]
print(to_data)
main_dict['to'].append(to_data)
function_org = register_data[1].strip().split(',')
function_data = ' '.join(function_org[0].split(' ')[1:])
print(function_data)
main_dict['function'].append(function_data)
org_data = ' '.join(function_org[1].split(' ')).strip()
print(org_data)
main_dict['organization'].append(org_data)
print('-----------')
j += 1
df = pd.DataFrame(main_dict)
Tested

regex to parse well-formated multi-line data dictionary

I am trying to read and parse a data dictionary for the Census Bureau's American Community Survey Public Use Microsample data release, as found here.
It is reasonably well formated, although with a few lapses where a few explanatory notes are inserted.
I think my preferred outcome is to either get a dataframe with one row per variable, and serialize all value labels for a given variable into one dictionary stored in a value dictionary field in the same row (although a hierarchical json-like format would not be bad, but more complicated.
I got the following code:
import pandas as pd
import re
import urllib2
data = urllib2.urlopen('http://www.census.gov/acs/www/Downloads/data_documentation/pums/DataDict/PUMSDataDict13.txt')
## replace newline characters so we can use dots and find everything until a double
## carriage return (replaced to ||) with a lookahead assertion.
data=data.replace('\n','|')
datadict=pd.DataFrame(re.findall("([A-Z]{2,8})\s{2,9}([0-9]{1})\s{2,6}\|\s{2,4}([A-Za-z\-\(\) ]{3,85})",data,re.MULTILINE),columns=['variable','width','description'])
datadict.head(5)
+----+----------+-------+------------------------------------------------+
| | variable | width | description |
+----+----------+-------+------------------------------------------------+
| 0 | RT | 1 | Record Type |
+----+----------+-------+------------------------------------------------+
| 1 | SERIALNO | 7 | Housing unit |
+----+----------+-------+------------------------------------------------+
| 2 | DIVISION | 1 | Division code |
+----+----------+-------+------------------------------------------------+
| 3 | PUMA | 5 | Public use microdata area code (PUMA) based on |
+----+----------+-------+------------------------------------------------+
| 4 | REGION | 1 | Region code |
+----+----------+-------+------------------------------------------------+
| 5 | ST | 2 | State Code |
+----+----------+-------+------------------------------------------------+
So far so good. The list of variables is there, along with the width in characters of each.
I can expand this and get additional lines (where the value labels live), like so:
datadict_exp=pd.DataFrame(
re.findall("([A-Z]{2,9})\s{2,9}([0-9]{1})\s{2,6}\|\s{4}([A-Za-z\-\(\)\;\<\> 0-9]{2,85})\|\s{11,15}([a-z0-9]{0,2})[ ]\.([A-Za-z/\-\(\) ]{2,120})",
data,re.MULTILINE))
datadict_exp.head(5)
+----+----------+-------+---------------------------------------------------+---------+--------------+
| id | variable | width | description | value_1 | label_1 |
+----+----------+-------+---------------------------------------------------+---------+--------------+
| 0 | DIVISION | 1 | Division code | 0 | Puerto Rico |
+----+----------+-------+---------------------------------------------------+---------+--------------+
| 1 | REGION | 1 | Region code | 1 | Northeast |
+----+----------+-------+---------------------------------------------------+---------+--------------+
| 2 | ST | 2 | State Code | 1 | Alabama/AL |
+----+----------+-------+---------------------------------------------------+---------+--------------+
| 3 | NP | 2 | Number of person records following this housin... | 0 | Vacant unit |
+----+----------+-------+---------------------------------------------------+---------+--------------+
| 4 | TYPE | 1 | Type of unit | 1 | Housing unit |
+----+----------+-------+---------------------------------------------------+---------+--------------+
So that gets the first value and associated label. My regex issue is here how to repeat the multi-line match starting with \s{11,15} and to the end--i.e. some variables have tons of unique values (ST or state code is followed by some 50 lines, denoting the value and label for each state).
I changed early on the carriage return in the source file with a pipe, thinking that I could then shamelessly rely on the dot to match everything until a double carriage return, indicating the end of that particular variable, and here is where I got stuck.
So--how to repeat a multi-line pattern an arbitrary number of times.
(A complication for later is that some variables are not fully enumerated in the dictionary, but are shown with valid ranges of values. NP for example [number of persons associated with the same household], is denoted with ``02..20` following a description. If I don't account for this, my parsing will miss such entries, of course.)

This isn't a regex, but I parsed PUMSDataDict2013.txt and PUMS_Data_Dictionary_2009-2013.txt (Census ACS 2013 documentation, FTP server) with this Python 3x script below. I used pandas.DataFrame.from_dict and pandas.concat to create a hierarchical dataframe, also below.
Python 3x function to parse PUMSDataDict2013.txt and PUMS_Data_Dictionary_2009-2013.txt:
import collections
import os
def parse_pumsdatadict(path:str) -> collections.OrderedDict:
r"""Parse ACS PUMS Data Dictionaries.
Args:
path (str): Path to downloaded data dictionary.
Returns:
ddict (collections.OrderedDict): Parsed data dictionary with original
key order preserved.
Raises:
FileNotFoundError: Raised if `path` does not exist.
Notes:
* Only some data dictionaries have been tested.[^urls]
* Values are all strings. No data types are inferred from the
original file.
* Example structure of returned `ddict`:
ddict['title'] = '2013 ACS PUMS DATA DICTIONARY'
ddict['date'] = 'August 7, 2015'
ddict['record_types']['HOUSING RECORD']['RT']\
['length'] = '1'
['description'] = 'Record Type'
['var_codes']['H'] = 'Housing Record or Group Quarters Unit'
ddict['record_types']['HOUSING RECORD'][...]
ddict['record_types']['PERSON RECORD'][...]
ddict['notes'] =
['Note for both Industry and Occupation lists...',
'* In cases where the SOC occupation code ends...',
...]
References:
[^urls]: http://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/
PUMSDataDict2013.txt
PUMS_Data_Dictionary_2009-2013.txt
"""
# Check arguments.
if not os.path.exists(path):
raise FileNotFoundError(
"Path does not exist:\n{path}".format(path=path))
# Parse data dictionary.
# Note:
# * Data dictionary keys and values are "codes for variables",
# using the ACS terminology,
# https://www.census.gov/programs-surveys/acs/technical-documentation/pums/documentation.html
# * The data dictionary is not all encoded in UTF-8. Replace encoding
# errors when found.
# * Catch instances of inconsistently formatted data.
ddict = collections.OrderedDict()
with open(path, encoding='utf-8', errors='replace') as fobj:
# Data dictionary name is line 1.
ddict['title'] = fobj.readline().strip()
# Data dictionary date is line 2.
ddict['date'] = fobj.readline().strip()
# Initialize flags to catch lines.
(catch_var_name, catch_var_desc,
catch_var_code, catch_var_note) = (None, )*4
var_name = None
var_name_last = 'PWGTP80' # Necessary for unformatted end-of-file notes.
for line in fobj:
# Replace tabs with 4 spaces
line = line.replace('\t', ' '*4).rstrip()
# Record type is section header 'HOUSING RECORD' or 'PERSON RECORD'.
if (line.strip() == 'HOUSING RECORD'
or line.strip() == 'PERSON RECORD'):
record_type = line.strip()
if 'record_types' not in ddict:
ddict['record_types'] = collections.OrderedDict()
ddict['record_types'][record_type] = collections.OrderedDict()
# A newline precedes a variable name.
# A newline follows the last variable code.
elif line == '':
# Example inconsistent format case:
# WGTP54 5
# Housing Weight replicate 54
#
# -9999..09999 .Integer weight of housing unit
if (catch_var_code
and 'var_codes' not in ddict['record_types'][record_type][var_name]):
pass
# Terminate the previous variable block and look for the next
# variable name, unless past last variable name.
else:
catch_var_code = False
catch_var_note = False
if var_name != var_name_last:
catch_var_name = True
# Variable name is 1 line with 0 space indent.
# Variable name is followed by variable description.
# Variable note is optional.
# Variable note is preceded by newline.
# Variable note is 1+ lines.
# Variable note is followed by newline.
elif (catch_var_name and not line.startswith(' ')
and var_name != var_name_last):
# Example: "Note: Public use microdata areas (PUMAs) ..."
if line.lower().startswith('note:'):
var_note = line.strip() # type(var_note) == str
if 'notes' not in ddict['record_types'][record_type][var_name]:
ddict['record_types'][record_type][var_name]['notes'] = list()
# Append a new note.
ddict['record_types'][record_type][var_name]['notes'].append(var_note)
catch_var_note = True
# Example: """
# Note: Public Use Microdata Areas (PUMAs) designate areas ...
# population. Use with ST for unique code. PUMA00 applies ...
# ...
# """
elif catch_var_note:
var_note = line.strip() # type(var_note) == str
if 'notes' not in ddict['record_types'][record_type][var_name]:
ddict['record_types'][record_type][var_name]['notes'] = list()
# Concatenate to most recent note.
ddict['record_types'][record_type][var_name]['notes'][-1] += ' '+var_note
# Example: "NWAB 1 (UNEDITED - See 'Employment Status Recode' (ESR))"
else:
# type(var_note) == list
(var_name, var_len, *var_note) = line.strip().split(maxsplit=2)
ddict['record_types'][record_type][var_name] = collections.OrderedDict()
ddict['record_types'][record_type][var_name]['length'] = var_len
# Append a new note if exists.
if len(var_note) > 0:
if 'notes' not in ddict['record_types'][record_type][var_name]:
ddict['record_types'][record_type][var_name]['notes'] = list()
ddict['record_types'][record_type][var_name]['notes'].append(var_note[0])
catch_var_name = False
catch_var_desc = True
var_desc_indent = None
# Variable description is 1+ lines with 1+ space indent.
# Variable description is followed by variable code(s).
# Variable code(s) is 1+ line with larger whitespace indent
# than variable description. Example:"""
# PUMA00 5
# Public use microdata area code (PUMA) based on Census 2000 definition for data
# collected prior to 2012. Use in combination with PUMA10.
# 00100..08200 .Public use microdata area codes
# 77777 .Combination of 01801, 01802, and 01905 in Louisiana
# -0009 .Code classification is Not Applicable because data
# .collected in 2012 or later
# """
# The last variable code is followed by a newline.
elif (catch_var_desc or catch_var_code) and line.startswith(' '):
indent = len(line) - len(line.lstrip())
# For line 1 of variable description.
if catch_var_desc and var_desc_indent is None:
var_desc_indent = indent
var_desc = line.strip()
ddict['record_types'][record_type][var_name]['description'] = var_desc
# For lines 2+ of variable description.
elif catch_var_desc and indent <= var_desc_indent:
var_desc = line.strip()
ddict['record_types'][record_type][var_name]['description'] += ' '+var_desc
# For lines 1+ of variable codes.
else:
catch_var_desc = False
catch_var_code = True
is_valid_code = None
if not line.strip().startswith('.'):
# Example case: "01 .One person record (one person in household or"
if ' .' in line:
(var_code, var_code_desc) = line.strip().split(
sep=' .', maxsplit=1)
is_valid_code = True
# Example inconsistent format case:"""
# bbbb. N/A (age less than 15 years; never married)
# """
elif '. ' in line:
(var_code, var_code_desc) = line.strip().split(
sep='. ', maxsplit=1)
is_valid_code = True
else:
raise AssertionError(
"Program error. Line unaccounted for:\n" +
"{line}".format(line=line))
if is_valid_code:
if 'var_codes' not in ddict['record_types'][record_type][var_name]:
ddict['record_types'][record_type][var_name]['var_codes'] = collections.OrderedDict()
ddict['record_types'][record_type][var_name]['var_codes'][var_code] = var_code_desc
# Example case: ".any person in group quarters)"
else:
var_code_desc = line.strip().lstrip('.')
ddict['record_types'][record_type][var_name]['var_codes'][var_code] += ' '+var_code_desc
# Example inconsistent format case:"""
# ADJHSG 7
# Adjustment factor for housing dollar amounts (6 implied decimal places)
# """
elif (catch_var_desc and
'description' not in ddict['record_types'][record_type][var_name]):
var_desc = line.strip()
ddict['record_types'][record_type][var_name]['description'] = var_desc
catch_var_desc = False
catch_var_code = True
# Example inconsistent format case:"""
# WGTP10 5
# Housing Weight replicate 10
# -9999..09999 .Integer weight of housing unit
# WGTP11 5
# Housing Weight replicate 11
# -9999..09999 .Integer weight of housing unit
# """
elif ((var_name == 'WGTP10' and 'WGTP11' in line)
or (var_name == 'YOEP12' and 'ANC' in line)):
# type(var_note) == list
(var_name, var_len, *var_note) = line.strip().split(maxsplit=2)
ddict['record_types'][record_type][var_name] = collections.OrderedDict()
ddict['record_types'][record_type][var_name]['length'] = var_len
if len(var_note) > 0:
if 'notes' not in ddict['record_types'][record_type][var_name]:
ddict['record_types'][record_type][var_name]['notes'] = list()
ddict['record_types'][record_type][var_name]['notes'].append(var_note[0])
catch_var_name = False
catch_var_desc = True
var_desc_indent = None
else:
if (catch_var_name, catch_var_desc,
catch_var_code, catch_var_note) != (False, )*4:
raise AssertionError(
"Program error. All flags to catch lines should be set " +
"to `False` by end-of-file.")
if var_name != var_name_last:
raise AssertionError(
"Program error. End-of-file notes should only be read "+
"after `var_name_last` has been processed.")
if 'notes' not in ddict:
ddict['notes'] = list()
ddict['notes'].append(line)
return ddict
Create the hierarchical dataframe (formatted below as Jupyter Notebook cells):
In [ ]:
import pandas as pd
ddict = parse_pumsdatadict(path=r'/path/to/PUMSDataDict2013.txt')
tmp = dict()
for record_type in ddict['record_types']:
tmp[record_type] = pd.DataFrame.from_dict(ddict['record_types'][record_type], orient='index')
df_ddict = pd.concat(tmp, names=['record_type', 'var_name'])
df_ddict.head()
Out[ ]:
# Click "Run code snippet" below to render the output from `df_ddict.head()`.
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th></th>
<th>length</th>
<th>description</th>
<th>var_codes</th>
<th>notes</th>
</tr>
<tr>
<th>record_type</th>
<th>var_name</th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th rowspan="5" valign="top">HOUSING RECORD</th>
<th>ACCESS</th>
<td>1</td>
<td>Access to the Internet</td>
<td>{'b': 'N/A (GQ)', '1': 'Yes, with subscription...</td>
<td>NaN</td>
</tr>
<tr>
<th>ACR</th>
<td>1</td>
<td>Lot size</td>
<td>{'b': 'N/A (GQ/not a one-family house or mobil...</td>
<td>NaN</td>
</tr>
<tr>
<th>ADJHSG</th>
<td>7</td>
<td>Adjustment factor for housing dollar amounts (...</td>
<td>{'1000000': '2013 factor (1.000000)'}</td>
<td>[Note: The value of ADJHSG inflation-adjusts r...</td>
</tr>
<tr>
<th>ADJINC</th>
<td>7</td>
<td>Adjustment factor for income and earnings doll...</td>
<td>{'1007549': '2013 factor (1.007549)'}</td>
<td>[Note: The value of ADJINC inflation-adjusts r...</td>
</tr>
<tr>
<th>AGS</th>
<td>1</td>
<td>Sales of Agriculture Products (Yearly sales)</td>
<td>{'b': 'N/A (GQ/vacant/not a one family house o...</td>
<td>[Note: no adjustment factor is applied to AGS.]</td>
</tr>
</tbody>
</table>

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Different Behavior for Parsing Two Similar Wikipedia Infoboxes - python

Related

Find intersection of values in dictionary under a same key - python

How to create function to pass lat long in api call for get weather data

BeautifulSoup + For loop Zip - How to Try/Except one of them

How to split string into column names with python/panda?

regex to parse well-formated multi-line data dictionary

Categories

Resources