How to decode this Ajax response in Python?

How do I decode the following response from this URL in Python? https://www.scorespro.com/livescore/ajax0.php
1599071734^^~~##Wed 02 Sep 21:35 GMT +03^^~~##2361498-1##194837##0##2020-09-02 17:00:00##76##1##18032##17842##Club Friendly##0##Real Sociedad##Villarreal##CLB##un##FG##0-2##0-2##2 HF##Friendly Games##1599066000######2##99######real-sociedad-vs-villarreal/02-09-2020##friendly-games##club-friendly##2##LEAGUE##2020##round-1##0####0##1599066240##############0##0 2325164-1##196097##0##2020-09-02 17:00:00##71##1##105187##104946##Canadian Premier League - Premier League##0##Valour FC (5)##HFX Wanderers FC (6)##PL##ca##CAN##0-2##0-2##2 HF##Canada##1599066000######2##99######valour-fc-vs-hfx-wanderers-fc/30-05-2020##canada##premier-league##1##LEAGUE##2020##round-1##1##canadian-premier-league##0##1599066540##############0##0 2338959-1##197065##0##2020-09-02 17:00:00##81##4##39942##41961##Regionalliga Nordost##0##Germania Halberstadt (18)##Optik Rathenow (20)##N/E##de##GER##0-2##0-0##2 HF##Germany##1599066000######2##99######germania-halberstadt-vs-optik-rathenow/02-09-2020##germany##regionalliga-nordost##5##LEAGUE##2020-2021##round-4##1####0##1599065940##############0##0 2338955-1##197065##0##2020-09-02 17:00:00##81##4##44124##56097##Regionalliga Nordost##0##Viktoria Berlin (2)##VSG Altglienicke (1)##N/E##de##GER##2-1##1-1##2 HF##Germany##1599066000######2##99######viktoria-berlin-vs-vsg-altglienicke/02-09-2020##germany##regionalliga-nordost##5##LEAGUE##2020-2021##round-4##1####0##1599065940##############0##0 2338958-1##197065##0##2020-09-02 17:00:00##78##4##13847##3034##Regionalliga Nordost##0##SV Babelsberg 03 (12)##Chemnitzer FC (15)##N/E##de##GER##2-2##1-1##2 HF##Germany##1599066000######2##99######sv-babelsberg-03-vs-chemnitzer-fc/02-09-2020##germany##regionalliga-nordost##5##LEAGUE##2020-2021##round-4##1####0##1599066120##############0##0 2338954-1##197065##0##2020-09-02 17:00:00##79##4##21173##37508##Regionalliga Nordost##0##Hertha Berlin II (5)##Berliner AK 07 (16)##N/E##de##GER##2-5##1-2##2 
HF##Germany##1599066000######2##99######hertha-berlin-ii-vs-berliner-ak-07/02-09-2020##germany##regionalliga-nordost##5##LEAGUE##2020-2021##round-4##1####0##1599066060##############0##0 2361307-1##197664##0##2020-09-02 17:00:00##81##1##24981##21152##Landspokal##0##Slagelse##Dalum##CUP##dk##DEN##1-1##0-0##2 HF##Denmark##1599066000######2##99######slagelse-vs-dalum/01-09-2020##denmark##fa-cup##6##PHASE##2020-2021##round-1##0####0##1599065940##2.20##2.87##3.75########0##0 2338953-1##197065##0##2020-09-02 17:00:00##80##4##41959##2993##Regionalliga Nordost##0##VfB Auerbach (7)##Energie Cottbus (19)##N/E##de##GER##2-4##1-2##2 HF##Germany##1599066000######2##99######vfb-auerbach-vs-energie-cottbus/02-09-2020##germany##regionalliga-nordost##5##LEAGUE##2020-2021##round-4##1####0##1599066000##############0##0 2307988-1##195163##0##2020-09-02 17:15:00##62##10##34050##49891##Serie A - First Stage##0##CD Olmedo (16)##Delfin SC (10)##SA1##ec##ECU##2-0##2-0##2 HF##Ecuador##1599066900######2##99######cd-olmedo-vs-delfin-sc/10-05-2020##ecuador##first-stage##1##LEAGUE##2020##round-10##1##serie-a##0##1599067080##2.45##2.55##3.25########1##1 2338956-1##197065##0##2020-09-02 17:30:00##HT##4##41960##50882##Regionalliga Nordost##0##Lokomotive Leipzig (13)##FSV 63 Luckenwalde (8)##N/E##de##GER##1-0##1-0##H/T##Germany##1599067800######2##99######lokomotive-leipzig-vs-fsv-63-luckenwalde/02-09-2020##germany##regionalliga-nordost##5##LEAGUE##2020-2021##round-4##1####0##0##############0##0 2367153-1##194837##0##2020-09-02 17:30:00##HT##1##18022##4189##Club Friendly##0##Real Betis##Almeria##CLB##un##FG##1-0##1-0##H/T##Friendly Games##1599067800######2##99######real-betis-vs-almeria/02-09-2020##friendly-games##club-friendly##2##LEAGUE##2020##round-1##0####0##0##1.53##6.00##3.50########0##0 2313051-1##195400##0##2020-09-02 17:30:00##48##15##43773##103469##1. 
Deild##0##Magni (12)##Afturelding (8)##D2##is##ISL##2-0##2-0##2 HF##Iceland##1599067800######2##99######magni-vs-afturelding/29-07-2020##iceland##1-deild##2##LEAGUE##2020##round-15##1####0##1599067920##############0##0 2366633-1##194837##0##2020-09-02 17:30:00##47##1##18052##28704##Club Friendly##0##Levante##Cartagena##CLB##un##FG##1-1##1-1##2 HF##Friendly Games##1599067800######2##99######levante-vs-cartagena/02-09-2020##friendly-games##club-friendly##2##LEAGUE##2020##round-1##0####0##1599067980##1.40##5.25##4.20########0##1 2313052-1##195400##0##2020-09-02 17:30:00##47##15##52987##30414##1. Deild##0##Vestri (7)##Thor Akureyri (5)##D2##is##ISL##1-0##1-0##2 HF##Iceland##1599067800######2##99######vestri-vs-thor-akureyri/29-07-2020##iceland##1-deild##2##LEAGUE##2020##round-15##1####0##1599067980##############0##0 2313056-1##195400##0##2020-09-02 17:30:00##47##15##26363##32547##1. Deild##0##IBV Vestmannaeyjar (3)##Leiknir R. (4)##D2##is##ISL##0-2##0-2##2 HF##Iceland##1599067800######2##99######ibv-vestmannaeyjar-vs-leiknir-r/01-08-2020##iceland##1-deild##2##LEAGUE##2020##round-15##1####0##1599067980##############0##0 2363441-1##194837##0##2020-09-02 18:00:00##36##1##21281##3220##Club Friendly##0##Benfica##SC Braga##CLB##un##FG##0-0##-##1 HF##Friendly Games##1599069600######2##99######benfica-vs-sc-braga/02-09-2020##friendly-games##club-friendly##2##LEAGUE##2020##round-1##0####0##1599069540##1.55##5.50##3.50########0##0 2289461-1##193678##0##2020-09-02 18:30:00##4##24##40009##40019##Premier League##0##Smouha SC (6)##El Entag El Harby (14)##PL##eg##EGY##0-0##-##1 HF##Egypt##1599071400######2##99######smouha-sc-vs-el-entag-el-harby/02-03-2020##egypt##premier-league##1##LEAGUE##2019-2020##round-24##1####0##1599071460##2.15##4.00##2.70########0##0 2211667-1##190057##0##2020-09-02 18:30:00##1##1##23376##19092##U21 Championship - Qualifying Group Stage##0##San Marino U21 (6)##Czech Republic U21 (1)##QR##eu##UEF##0-0##-##1 HF##Europe 
(UEFA)##1599071400######2##8######san-marino-u21-vs-czech-republic-u21/02-09-2020##uefa##qualifying-group-stage##8##PHASE##2021-hungary-slovenia##round-1##1##u21-championship##1##1599071640##29.00##1.01##21.00########0##0 ^^##a##1599071731##1599071658^^~~##5333322-1##197760##0-1##Set 2##2##40089##32214##Pavic/Soares B.##Granollers-P M./Zeballos H. (5)##US OPEN##us##ATP##3-6|2-2|-|-|-##ATP Doubles##US Open##1599068700####H##atp-doubles##us-open##2020######195167######0####13##R32##3########################Set2##-2##z##1##- 5333324-1##197760##0-1##Set 2##2##41811##51180##Bambridge L./McLachlan B.##Eubanks C./Mcdonald M. (wc)##US OPEN##us##ATP##3-6|2-5|-|-|-##ATP Doubles##US Open##1599067500####A##atp-doubles##us-open##2020######195167######0####16##R32##3########################Set2##-2##z##0##- 5333325-1##197760##1-1##Set 3##2##27573##39167##Chardy J./Martin F.##Harrison C./Harrison R. (wc)##US OPEN##us##ATP##7-5|65-77|0-0|-|-##ATP Doubles##US Open##1599066000####H##atp-doubles##us-open##2020######195167######0####25##R32##3########################Set3##-2##z##1##30-30 5333330-1##197760##1-0##Set 2##2##143786##21596##Gille S./Vliegen J.##Kubot L./Melo M. 
(2)##US OPEN##us##ATP##6-2|4-3|-|-|-##ATP Doubles##US Open##1599067500####H##atp-doubles##us-open##2020######195167######0####15##R32##3########################Set2##-2##z##1##30-15 5334089-1##197805##1-1##Set 3##2##145891##145017##Carlos Alcaraz (se)##Juan Pablo Ficovich####it##CHM##4-6|6-3|5-4|-|-##Challenger Men Singles##Cordenons (Italy)##1599063000####H##challenger-men-singles##cordenons##2020##es##ar##195223##ESP##ARG######28##R32##5########################Set3##10##z##1##- 5334294-1##197761##0-1##Set 2##2##46027##43093##Gerasimov E.##Thompson J.##US OPEN##us##ATP##1-6|3-5|-|-|-##ATP Singles##US Open##1599067500####H##atp-singles##us-open##2020##by##au##195166##BLR##AUS##0####15##R64##1######gerasimov-e##thompson-j################Set2##-2##z##1##40-30 5334313-1##197761##0-0##Set 1##2##58519##52671##Davidovich Fokina A.##Hurkacz H. (24)##US OPEN##us##ATP##0-0|-|-|-|-##ATP Singles##US Open##1599071700####A##atp-singles##us-open##2020##es##pl##195166##ESP##POL##0####0##R64##1######davidovich-fokina-a##hurkacz-h################Set1##-2##z##1##40-15 5334316-1##197761##0-0##Set 1##2##21374##41813##Djokovic N. (1)##Edmund K.##US OPEN##us##ATP##1-2|-|-|-|-##ATP Singles##US Open##1599070500####H##atp-singles##us-open##2020##rs##gb-eng##195166##SRB##ENG##0####3##R64##1######novak-djokovic##edmund-k################Set1##-2##z##1##- 5334317-1##197761##0-1##Set 2##2##143584##44241##Nakashima B. (wc)##Zverev A. (5)##US OPEN##us##ATP##5-7|4-3|-|-|-##ATP Singles##US Open##1599066600####A##atp-singles##us-open##2020##us##de##195166##USA##GER##0####19##R64##1######nakashima-b##alexander-zverev################Set2##-2##z##1##15-40 5334319-1##197761##0-0##Set 1##2##55929##38842##Harris Ll.##Goffin D. (7)##US OPEN##us##ATP##4-3|-|-|-|-##ATP Singles##US Open##1599069300####A##atp-singles##us-open##2020##za##be##195166##RSA##BEL##0####7##R64##1######harris-ll##david-goffin################Set1##-2##z##1##30-15 5334322-1##197761##0-0##Set 1##2##32375##38073##Mannarino A. 
(32)##Sock J. (pr)##US OPEN##us##ATP##1-2|-|-|-|-##ATP Singles##US Open##1599070800####H##atp-singles##us-open##2020##fr##us##195166##FRA##USA##0####3##R64##1######adrian-mannarino##jack-sock################Set1##-2##z##1##30-15 5334325-1##197761##2-0##Set 3##2##31424##42596##Kukushkin M.##Garin C. (13)##US OPEN##us##ATP##6-2|6-1|2-5|-|-##ATP Singles##US Open##1599065100####H##atp-singles##us-open##2020##kz##cl##195166##KAZ##CHI##0####22##R64##1######mikhail-kukushkin##garin-c################Set3##-2##z##1##40-15 5334328-1##197761##0-0##Set 1##2##51475##43105##Mmoh M. (wc)##Struff J-L. (28)##US OPEN##us##ATP##2-5|-|-|-|-##ATP Singles##US Open##1599070200####A##atp-singles##us-open##2020##us##de##195166##USA##GER##0####7##R64##1######mmoh-m##jan-lennard-struff################Set1##-2##z##1##- 5334329-1##197763##0-1##Set 2##2##4337##41349##Flipkens K.##Pegula J. (wc)##US##us##WTA##61-77|0-0|-|-|-##WTA Singles##US Open##1599068100####H##wta-singles##us-open##2020##be##us##195168##BEL##USA##0####13##R64##2######kirsten-flipkens##pegula-j################Set2##10##z##1##15-0 ^^##a##1599071731##0^^~~##5320202-1##197316##68-62##Q4##2##47955##47954##TBV Start Lublin##Polski Cukier Torun##PLK-RS##pl##POL##21-16|15-15|18-21|14-10| - |36-31##Poland##Energa Basket Liga##1599066000######poland##energa-basket-liga##2020-2021######197315######1####1.21##4.25##########4Qrt##1##z##0 ^^##a##1599071661##0^^~~##5333286-3##197776##1-0##Set 2##2##8410##8414##Spor Toto (1)##Ziraat Bankasi (2)##GS##tr##TUR##25-18|8-5|-|-|-##Turkey##Turkish Cup - Group Stage##1599069600######turkey##national-cup##2020-2021######197447######1####56##############2S##3##z##0 ^^##a##1599071674##1599071126^^~~##5302817-3##196725##28-21##2H##2##140460##41158##Molde W##Larvik W##RS##no##NOR##13-9##Norway##REMA 1000-ligaen - Women##1599066900######norway##postenligaen-women##2020-2021######196717######1####49##1.12##7.50##12.00########2H##2##z##0 
5303101-3##196762##21-13##2H##2##8559##3172##Sonderjyske##Skjern##RS##dk##DEN##16-10##Denmark##Handbold Liagen##1599067800######denmark##handball-league##2020-2021######196756######1####34##3.40##1.55##8.50########2H##1##z##0 5303102-3##196762##1-0##1H##2##3517##3516##Skanderborg##Arhus GF##RS##dk##DEN##1-0##Denmark##Handbold Liagen##1599071400######denmark##handball-league##2020-2021######196756######1##1H##1##1.35##4.50##9.50########1H##1##z##0 5304740-3##196776##25-16##2H##2##3587##6865##Kadetten Schaffhausen##Amicitia Zurich##RS##ch##SUI##10-9##Switzerland##NLA##1599066000######switzerland##nla##2020-2021######196774######1####41##1.11##8.00##12.00########2H##1##z##0 5304741-3##196776##12-11##2H##2##10782##10780##HC Kriens##Wacker Thun##RS##ch##SUI##11-10##Switzerland##NLA##1599067800######switzerland##nla##2020-2021######196774######1##H##23##1.50##3.20##8.00########2H##1##z##0 5304742-3##196776##22-18##2H##2##10786##10783##Pfadi Winterthur##Bern Muri##RS##ch##SUI##19-13##Switzerland##NLA##1599067800######switzerland##nla##2020-2021######196774######1##A##40##1.25##5.00##10.00########2H##1##z##0 5304743-3##196776##15-15##2H##2##10777##10784##St. Otmar St. 
Gallen##Suhr Aarau##RS##ch##SUI##14-15##Switzerland##NLA##1599067800######switzerland##nla##2020-2021######196774######1####30##2.00##2.15##7.50########2H##1##z##0 5304744-3##196776##7-7##1H##2##12340##10778##Endingen##1879 Basel##RS##ch##SUI##7-7##Switzerland##NLA##1599069600######switzerland##nla##2020-2021######196774######1####14##1.67##2.60##7.50########1H##1##z##0 5312581-3##197132##11-17##HT##2##41099##3282##Oroshazi##Pick Szeged##RS##hu##HUN##11-17##Hungary##Liga 1##1599068700######hungary##liga-1##2020-2021######197130######1####28##67.00##1.00##50.00########H/T##1##z##0 5334268-3##197814##2-3##1H##2##9591##9590##Fivers WAT Margareten##Alpla Hard##CUP##at##AUT##2-3##Austria##Super Cup - Cup##1599070800######austria##super-cup##2020-2021######197234######0##H##5##1.67##2.60##7.50########1H##5##z##0 ^^##a##1599071731##1599071728^^~~##5321546-1##197388##2-2##P3##2##22308##22300##HC CSKA Moscow##AK Bars Kazan##KHL-RS##ru##RUS##1-0|0-1|1-1| - | - ##Russia##KHL##1599064200######russia##khl##2020-2021######197387######1####hc-cska-moscow-vs-ak-bars-kazan/02-09-2020##1.67##2.25##########3Per##1##z##1 ^^##a##1599071289##0^^~~##^^##a##1597635989##0^^~~##^^##a##1599057728##0^^~~##^^##a##0##0^^~~##^^##a##1599042879##1590074504^^~~##^^##a##1599045851##0^^~~##^^##a##1599071265##0^^~~##45
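The response is not JSON. It looks like a custom delimited format. The sketch below splits it using separators read off the sample above. These are assumptions, not a documented format: "^^~~##" appears to separate blocks (first a Unix timestamp, then a date string, then what look like per-sport blocks), "^^##a##..." appears to close each block, each record seems to start with an ID such as 2361498-1, and fields are separated by "##" (empty fields show up as runs like "######"). The constant names and decode() are hypothetical.

```python
import re

# Separators inferred from the sample response (assumption, not documented).
BLOCK_SEP = "^^~~##"    # between blocks
TRAILER_SEP = "^^##"    # starts the "^^##a##<ts>##<ts>" trailer that closes a block
FIELD_SEP = "##"        # between fields of one record
# A record seems to start with an ID like "2361498-1" followed by "##".
RECORD_ID = re.compile(r"\d+-\d+##")
# Records within a block appear to be separated by whitespace before such an ID.
RECORD_START = re.compile(r"\s+(?=\d+-\d+##)")


def decode(raw):
    """Return a list of blocks; each block is a list of records (lists of fields)."""
    blocks = []
    for block in raw.strip().split(BLOCK_SEP):
        # drop the trailer, keeping only the records before it
        body = block.split(TRAILER_SEP)[0].strip()
        if not RECORD_ID.match(body):
            # header values (timestamp, date string), empty trailers, final counter
            continue
        records = [rec.split(FIELD_SEP) for rec in RECORD_START.split(body)]
        blocks.append(records)
    return blocks
```

To try it against the live URL (assuming it still returns this format), something like decode(requests.get("https://www.scorespro.com/livescore/ajax0.php").text) should work, using the third-party requests package. In the football block, indexes 10 and 11 of each record appear to hold the home and away teams; the meaning of the other positions has to be worked out from the data in the same way.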

Related

How to export a lot of routes to a shapefile from OSMnx

I have trip data that includes latitude and longitude. I want to simulate the shortest path for each trip and export the paths to a shapefile. Then I'll run a Line Density analysis to discover changes in the trips. I don't know how to export all the paths to a shapefile at once.
My sample data is below; you can save it as station.csv:
ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
A847FADBBC638E45,docked_bike,2020/4/26 17:45,2020/4/26 18:12,Eckhart Park,86,Lincoln Ave & Diversey Pkwy,152,41.8964,-87.661,41.9322,-87.6586,member
5405B80E996FF60D,docked_bike,2020/4/17 17:08,2020/4/17 17:17,Drake Ave & Fullerton Ave,503,Kosciuszko Park,499,41.9244,-87.7154,41.9306,-87.7238,member
5DD24A79A4E006F4,docked_bike,2020/4/1 17:54,2020/4/1 18:08,McClurg Ct & Erie St,142,Indiana Ave & Roosevelt Rd,255,41.8945,-87.6179,41.8679,-87.623,member
2A59BBDF5CDBA725,docked_bike,2020/4/7 12:50,2020/4/7 13:02,California Ave & Division St,216,Wood St & Augusta Blvd,657,41.903,-87.6975,41.8992,-87.6722,member
27AD306C119C6158,docked_bike,2020/4/18 10:22,2020/4/18 11:15,Rush St & Hubbard St,125,Sheridan Rd & Lawrence Ave,323,41.8902,-87.6262,41.9695,-87.6547,casual
356216E875132F61,docked_bike,2020/4/30 17:55,2020/4/30 18:01,Mies van der Rohe Way & Chicago Ave,173,Streeter Dr & Grand Ave,35,41.8969,-87.6217,41.8923,-87.612,member
A2759CB06A81F2BC,docked_bike,2020/4/2 14:47,2020/4/2 14:52,Streeter Dr & Grand Ave,35,Fairbanks St & Superior St,635,41.8923,-87.612,41.8957,-87.6201,member
FC8BC2E2D54F35ED,docked_bike,2020/4/7 12:22,2020/4/7 13:38,Ogden Ave & Roosevelt Rd,434,Western Ave & Congress Pkwy,382,41.8665,-87.6847,41.8747,-87.6864,casual
9EC5648678DE06E6,docked_bike,2020/4/15 10:30,2020/4/15 10:35,LaSalle Dr & Huron St,627,Larrabee St & Division St,359,41.8949,-87.6323,41.9035,-87.6434,casual
This is my code (it only plots the paths):
`
import networkx as nx
import osmnx as ox
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString

# ox.config and ox.get_nearest_node belong to older OSMnx releases;
# this code targets that API
ox.config(log_console=True, use_cache=True)

data = pd.read_csv("station.csv")
startlat = data['start_lat']
startlng = data['start_lng']
endlat = data['end_lat']
endlng = data['end_lng']

# get a drivable street graph for the city
G2 = ox.graph_from_place('Chicago, Illinois', network_type='drive')
# get nodes and edges as GeoDataFrames
nodes, edges = ox.graph_to_gdfs(G2, nodes=True, edges=True)

route_list = []
for i in range(len(startlng)):
    origin = (startlat[i], startlng[i])
    destination = (endlat[i], endlng[i])
    origin_node = ox.get_nearest_node(G2, origin)
    destination_node = ox.get_nearest_node(G2, destination)
    # weight='length' minimises street distance instead of the number of edges;
    # trips with no path between their endpoints are skipped
    try:
        route = nx.shortest_path(G2, origin_node, destination_node, weight='length')
    except nx.NetworkXNoPath:
        continue
    if len(route) < 2:
        continue  # origin and destination snapped to the same node
    route_nodes = nodes.loc[route]
    # create a geometry for the shortest path through its node points
    route_line = LineString(route_nodes.geometry.tolist())
    # create a one-row GeoDataFrame for this route
    route_geom = gpd.GeoDataFrame(geometry=[route_line], crs=edges.crs)
    route_list.append(route)

fig, ax = ox.plot_graph_routes(G2, route_list, node_size=0)
`
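One way to get every route into a single shapefile is to turn each node-ID route into a LineString, collect all of them in one GeoDataFrame, and write that once. A minimal sketch, assuming each graph node carries 'x' (longitude) and 'y' (latitude) attributes, as OSMnx graphs do; routes_to_gdf and the ride_id column are hypothetical names for illustration.

```python
import geopandas as gpd
import networkx as nx
from shapely.geometry import LineString


def routes_to_gdf(G, routes, ride_ids, crs="EPSG:4326"):
    """Build one GeoDataFrame with a LineString row per route.

    Assumes every node in G has 'x' (longitude) and 'y' (latitude) attributes.
    """
    rows, geoms = [], []
    for ride_id, route in zip(ride_ids, routes):
        if len(route) < 2:
            continue  # a LineString needs at least two points
        # look up the coordinates of each node along the route, in order
        coords = [(G.nodes[n]["x"], G.nodes[n]["y"]) for n in route]
        rows.append({"ride_id": ride_id})
        geoms.append(LineString(coords))
    return gpd.GeoDataFrame(rows, geometry=geoms, crs=crs)
```

In the loop above, append data['ride_id'][i] to a ride_ids list next to route_list.append(route); then routes_to_gdf(G2, route_list, ride_ids, crs=G2.graph["crs"]).to_file("routes.shp") writes all paths to one shapefile (writing files needs GeoPandas' I/O backend, such as Fiona or pyogrio). Joining node points gives straight segments between intersections; the edge geometries in edges follow curved streets more closely if that matters for the density analysis.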

TypeError: unorderable types: int() < str()

An error occurs when I apply the Giveme5W1H extractor (an open-source library on GitHub) to my JSON news dataset.
The error occurs in _evaluate_locations (environment_extractor.py) when it tries to run
raw_locations.sort(key=lambda x: x[1], reverse=True)
and the console shows
TypeError: unorderable types: int() < str()
My question is: does this mean something is wrong with my dataset format? If so, shouldn't the extractor treat each news item as one long string when it works on the corpus? I'm eagerly looking for a solution to this problem.
This is one item from my JSON news dataset:
{
"title": "Football: Van Dijk, Ronaldo and Messi shortlisted for FIFA award",
"body": "ROME: Liverpool centre-back Virgil van Dijk is on the shortlist to add FIFA's best player award to his UEFA Men's Player of the Year honour.The Dutch international denied Cristiano Ronaldo and Lionel Messi for the European title last week and the same trio are in the running for the FIFA accolade to be announced in Milan on September 23. Van Dijk starred in Liverpool's triumphant Champions League campaign.England full-back Lucy Bronze won UEFA's women's award and is on FIFA's shortlist with the United States' World Cup-winning duo Megan Rapinoe and Alex Morgan.Manchester City boss Pep Guardiola is up against Liverpool's Jurgen Klopp and Mauricio Pochettino of Tottenham for best men's coach.Phil Neville, who led England's women to a World Cup semi-final, is up for the women's coach award with the USA's Jill Ellis and Sarina Wiegman who guided European champions the Netherlands to the World Cup final. FIFA Best shortlistsMen's player:Cristiano Ronaldo (Juventus/Portugal), Lionel Messi (Barcelona/Argentina), Virgil van Dijk player:Lucy Bronze (Lyon/England), Alex Morgan (Orlando Pride/USA), Megan Rapinoe (Reign FC/USA)Men's coach:Pep Guardiola (Manchester City), Jurgen Klopp (Liverpool), Mauricio Pochettino (Tottenham)Women's coach:Jill Ellis (USA), Phil Neville (England), Sarina Wiegman (Netherlands)Women's goalkeeper:Christiane Endler (Paris St-Germain/Chile), Hedvig Lindahl (Wolfsburg/Sweden), Sari van Veenendaal (Atletico Madrid/Netherlands)Men's goalkeeper:Alisson (Liverpool/Brazil), Ederson (Manchester City/Brazil), Marc-Andre ter Stegen (Barcelona/Germany)Puskas award (for best goal):Lionel Messi (Barcelona v Real Betis), Juan Quintero (River Plate v Racing Club), Daniel Zsori (Debrecen v Ferencvaros)",
"published_at": "2019-09-02"
}
Code:
import json
import logging
import re

from Giveme5W1H.extractor.document import Document
from Giveme5W1H.extractor.extractor import MasterExtractor

with open("./Labeled.json", "r", encoding="utf-8") as json_file:
    data = json.load(json_file)

if __name__ == '__main__':
    # logger setup
    log = logging.getLogger('GiveMe5W')
    log.setLevel(logging.DEBUG)
    sh = logging.StreamHandler()
    sh.setLevel(logging.DEBUG)
    log.addHandler(sh)

    # giveme5w setup - with defaults
    extractor = MasterExtractor()

    for item in data[:1000]:
        title = str(item["title"])
        body = str(item["body"])
        # use the first line of the body as the lead
        head = re.search("^[^\n]*", body).group(0)
        published_at = str(item["published_at"])
        doc = extractor.parse(Document(title, head, body, published_at))
Instead of returning the extracted time and location, it gives me this error:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractor.py", line 20, in run
    extractor.process(document)
  File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractors/abs_extractor.py", line 41, in process
    self._evaluate_candidates(document)
  File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractors/environment_extractor.py", line 75, in _evaluate_candidates
    locations = self._evaluate_locations(document)
  File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractors/environment_extractor.py", line 224, in _evaluate_locations
    raw_locations.sort(key=lambda x: x[1], reverse=True)
TypeError: unorderable types: int() < str()
raw_locations is built in the same file, at line 219:
raw_locations.append([parts, location.raw['place_id'], location.point, bb, area, 0, 0, candidate, 0])
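Thus, the sort call orders the locations by their place_id, which is taken from the geocoder's raw result (location.raw). Python 3 cannot compare an int with a str, so the sort fails if some place_id values are strings and others are numbers. Please check whether your data includes both strings and numbers for place_id. If so, you need to convert all entries to one type.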
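The failure is easy to reproduce, and the "convert all entries to one type" advice amounts to normalising the sort key. A minimal sketch, assuming every place_id is either an int or a numeric string; sort_by_place_id is a hypothetical helper, not part of Giveme5W1H, and the rows are made-up stand-ins shaped like [parts, place_id, ...].

```python
def sort_by_place_id(raw_locations):
    """Sort rows by place_id (index 1), highest first, even if ids mix int and str.

    Assumes every place_id is an int or a numeric string such as "7".
    """
    raw_locations.sort(key=lambda x: int(x[1]), reverse=True)
    return raw_locations


if __name__ == "__main__":
    # Hypothetical rows shaped like [parts, place_id, ...]; one id is a string.
    rows = [["a", 42], ["b", "7"], ["c", 300]]
    try:
        rows.sort(key=lambda x: x[1], reverse=True)  # same key as line 224
    except TypeError as err:
        print("original sort fails:", err)
    sort_by_place_id(rows)
    print([row[1] for row in rows])  # [300, 42, '7']
```

Only the sort key is converted; the stored place_id values keep their original types.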

Calculating average of data set, with text mixed [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I am required to write a Python program that reads a file and calculates the average GDP for each country over the 10-year period.
Basically, my desired result is:
Australia: 1248467214849.1
Azerbaijan: 55506365440.0
Bangladesh: 139036345780.9
Brazil: 2057882976008.9
Brunei Darussalam: 14817756697.0
Burkina Faso: 10081729086.1
Cabo Verde: 1719693752.3
Cambodia: 13779735437.1
Chile: 229246627569.0
China: 7784747168448.6
Czech Republic: 207328405561.6
Dominica: 499171357.0
Egypt, Arab Rep.: 247614743339.3
France: 2702817149305.2
Germany: 3582562859622.3
Greece: 270091322197.4
Guam: 5115700000.0
India: 1726508317353.4
Iran, Islamic Rep.: 454617559842.3
Iraq: 169480789377.9
Japan: 5217301203153.5
Jordan: 29469864942.1
Kazakhstan: 168198946242.6
Kenya: 48807995178.8
Korea, Rep.: 1205755199135.1
Latvia: 28908355369.8
Lebanon: 40455121214.3
Lithuania: 42763449721.2
Madagascar: 9486935333.5
Malaysia: 274833978374.2
Mali: 11894695436.7
Mongolia: 9207583282.1
Mozambique: 12838623643.4
Myanmar: 50703575766.4
Nicaragua: 10212597587.4
Nigeria: 375494148527.7
Paraguay: 23250819867.6
Philippines: 231981575952.4
Qatar: 149455747118.1
Singapore: 257026873704.2
Spain: 1404296966483.9
Sweden: 519174481541.8
Tanzania: 36731725995.3
Tunisia: 44118349316.0
Turkmenistan: 29383204467.2
United Kingdom: 2736682446205.8
United States: 16108231800000.0
Vietnam: 144579453846.2
Zambia: 21393965950.9
Zimbabwe: 11907947332.3
and the provided text file looks like this:
853764622753
1055334825425
927168311000
1142876772659
1390557034408
1538194473087
1567178619062
1459597906913
1345383143356
1204616439828
Australia
33050343783
48852482960
44291490421
52902703376
65951627200
69684317719
74164435946
75244166773
53074370486
37847715736
Azerbaijan
79611888213
91631278239
102477791472
115279077465
128637938711
133355749482
149990451022
172885454931
195078665828
221415162446
Bangladesh
1397084381901
1695824517396
1667019605882
2208871646203
2616201578192
2465188674415
2472806919902
2455993200170
1803652649614
1796186586414
Brazil
12247694247
14393099069
10732366286
13707370737
18525319978
19048495519
18093829923
17098342541
12930394938
11400653732
Brunei Darussalam
6771277871
8369637065
8369175126
8979966766
10724063458
11166063467
11947176342
12377391463
10419303761
11693235542
Burkina Faso
1513934037
1789333749
1711817182
1664310770
1864824081
1751888562
1850951315
1858121723
1574288668
1617467436
Cabo Verde
8639235842
10351914093
10401851851
11242275199
12829541141
14038383450
15449630419
16777820333
18049954289
20016747754
Cambodia
173605968179
179638496279
172389498445
218537551220
252251992029
267122320057
278384332694
260990299051
242517905162
247027912574
Chile
3552182311653
4598206091384
5109953609257
6100620488868
7572553836875
8560547314679
9607224481533
10482372109962
11064666282626
11199145157649
China
189227050760
235718586901
206179982164
207477857919
227948349666
207376427021
209402444996
207818330724
186829940546
195305084919
Czech Republic
421375852
458190185
489074333
493824407
501025303
485997988
501979277
523666347
535095846
581484032
Dominica
130478960092
162818181818
188982374701
218888324505
236001858960
279372758362
288586231502
305529656458
332698041031
332791045964
Egypt, Arab Rep.
2663112510266
2923465651091
2693827452070
2646837111795
2862680142625
2681416108537
2808511203185
2849305322685
2433562015516
2465453975282
France
3439953462907
3752365607148
3418005001389
3417094562649
3757698281118
3543983909148
3752513503278
3890606893347
3375611100742
3477796274497
Germany
318497936901
354460802549
330000252153
299361576558
287797822093
245670666639
239862011450
237029579261
195541761243
192690813127
Greece
4375000000
4621000000
4781000000
4895000000
4928000000
5199000000
5337000000
5531000000
5697000000
5793000000
Guam
1201111768409
1186952757636
1323940295875
1656617073124
1823049927771
1827637859136
1856722121395
2035393459979
2089865410868
2263792499341
India
349881601459
406070949554
414059094949
487069570464
583500357530
598853401276
467414852231
434474616832
385874474399
418976679729
Iran, Islamic Rep.
88840050497
131613661510
111660855043
138516722650
185749664444
218000986223
234648370497
234648370497
179640210726
171489001692
Iraq
4515264514431
5037908465114
5231382674594
5700098114744
6157459594824
6203213121334
5155717056271
4848733415524
4383076298082
4940158776617
Japan
17110587447
21972004086
23820230000
26425379437
28840263380
30937277606
33593843662
35826925775
37517410282
38654727746
Jordan
104849886826
133441612247
115308661143
148047348241
192626507972
207998568866
236634552078
221415572820
184388432149
137278320084
Kazakhstan
31958195182
35895153328
37021512049
39999659234
41953433591
50412754822
55097343448
61445345999
63767539357
70529014778
Kenya
1122679154632
1002219052968
901934953365
1094499338703
1202463682634
1222807284485
1305604981272
1411333926201
1382764027114
1411245589977
Korea, Rep.
30901399261
35596016664
26169854045
23757368290
28223552825
28119996053
30314363219
31419072948
27009231911
27572698482
Latvia
24577114428
29227350570
35477118070
38419626628
40075674163
43868565282
46014226808
47833413749
49459296463
49598825982
Lebanon
39738180077
47850551149
37440673478
37120517694
43476878139
42847900766
46473646002
48545251796
41402022148
42738875963
Lithuania
7342923489
9413002921
8550363975
8729936136
9892702358
9919780071
10601690872
10673516673
9744243420
10001193420
Madagascar
193547824063
230813597938
202257586268
255016609233
297951960784
314443149443
323277158907
338061963396
296434003329
296535930381
Malaysia
8145694632
9750822511
10181021770
10678749467
12978107561
12442747897
13246412031
14388360064
13100058100
14034980334
Mali
4234999823
5623216449
4583850368
7189481824
10409797649
12292770631
12582122604
12226514722
11749620620
11183458131
Mongolia
9366742309
11494837053
10911698208
10154238250
13131168012
14534278446
16018848991
16961127046
14798439527
11014858592
Mozambique
20182477481
31862554102
36906181381
49540813342
59977326086
59937797559
60269734045
65446402659
59687373958
63225097051
Myanmar
7423377429
8496965842
8298695145
8758622329
9774316692
10532001130
10982972256
11880438824
12747741540
13230844687
Nicaragua
166451213396
208064753766
169481317540
369062464570
411743801712
460953836444
514966287207
568498937588
481066152889
404652720165
Nigeria
13794910634
18504130753
15929902138
20030528043
25099681461
24595319574
28965906502
30881166852
27282581336
27424071383
Paraguay
149359920006
174195135053
168334599538
199590775190
224143083707
250092093548
271836123724
284584522899
292774099014
304905406845
Philippines
79712087912
115270054945
97798351648
125122306346
167775274725
186833516484
198727747253
206224725275
164641483516
152451923077
Qatar
179981288567
192225881688
192408387762
236421782178
275599459374
289162118909
302510668904
308142766948
296840704102
296975678610
Singapore
1479341637011
1635015380108
1499099749931
1431616749640
1488067258325
1336018949806
1361854206549
1376910811041
1197789902774
1237255019654
Spain
487816328342
513965650650
429657033108
488377689565
563109663291
543880647757
578742001488
573817719109
497918109302
514459972806
Sweden
21501741757
27368386358
28573777052
31407908612
33878631649
39087748240
44333456245
48197218327
45628320606
47340071107
Tanzania
38908069299
44856586316
43454935940
44050929160
45810626509
45044112939
46251061734
47587913059
43156708809
42062549395
Tunisia
12664165103
19271523179
20214385965
22583157895
29233333333
35164210526
39197543860
43524210526
35799628571
36179885714
Turkmenistan
3074359743898
2890564338235
2382825985356
2441173394730
2619700404733
2662085168499
2739818680930
3022827781881
2885570309161
2647898654635
United Kingdom
14477635000000
14718582000000
14418739000000
14964372000000
15517926000000
16155255000000
16691517000000
17393103000000
18120714000000
18624475000000
United States
77414425532
99130304099
106014659770
115931749697
135539438560
155820001920
171222025117
186204652922
193241108710
205276172135
Vietnam
14056957976
17910858638
15328342304
20265556274
23460098340
25503370699
28045460442
27150630607
21154394546
21063989683
Zambia
5291950100
4415702800
8621573608
10141859710
12098450749
14242490252
15451768659
15891049236
16304667807
16619960402
Zimbabwe
So what I have thought of so far is:
to use an aggregation loop that checks whether the current line is a GDP value or the name of a country: when it reaches the name of a country it should calculate the average and print out the result, then it should reset the per-country aggregation variables and continue looping to aggregate the next country's GDP values.
And so to handle the mixed nature of the input file, I would either use the str.isnumeric() method or keep a counter to check when 10 GDP values have been read (since the next line would then be the name of the corresponding country).
for value in open("10year-gdp.txt"):
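The plan described above (check whether each line is numeric; on a country name, compute the average and reset) can be sketched as a function. It assumes, as in the file shown, that each country's ten values come immediately before its name; average_gdp and its years parameter are hypothetical names.

```python
def average_gdp(lines, years=10):
    """Return {country: mean GDP}, assuming each country's name follows its values."""
    averages = {}
    values = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.isnumeric():
            values.append(int(line))
        else:
            # a country name closes the block of values read so far
            if len(values) != years:
                raise ValueError(f"{line}: expected {years} values, got {len(values)}")
            averages[line] = sum(values) / years
            values = []
    return averages
```

Usage: with open("10year-gdp.txt") as f, loop over average_gdp(f).items() and print(f"{country}: {avg:.1f}") to get lines such as "Guam: 5115700000.0". Checking the count of values per country catches a misaligned file instead of silently averaging the wrong numbers.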
Something like this in Python 3 may work:
import statistics

with open('10year-gdp.txt') as f:
    items = []
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.isdigit():
            items.append(float(line))
        else:
            # a country name follows its values: print their mean, then reset
            print('{0}: {1}'.format(line, statistics.mean(items)))
            items = []
You can try this one too:
with open("10year-gdp.txt", "r") as infile:
    content = infile.readlines()
# split the file into blocks of 11 lines: 10 GDP values, then the country name
content = [content[i:i+11] for i in range(0, len(content), 11)]
results = [": ".join([c[10], str(sum(map(float, c[0:10])) / 10)]).replace("\n", "") for c in content]
for result in results:
    print(result)
Output:
Australia: 1248467214849.1
Azerbaijan: 55506365440.0
Bangladesh: 139036345780.9
Brazil: 2057882976008.9
Brunei Darussalam: 14817756697.0
Burkina Faso: 10081729086.1
Cabo Verde: 1719693752.3
Cambodia: 13779735437.1
Chile: 229246627569.0
China: 7784747168448.6
Czech Republic: 207328405561.6
Dominica: 499171357.0
Egypt, Arab Rep.: 247614743339.3
France: 2702817149305.2
Germany: 3582562859622.3
Greece: 270091322197.4
Guam: 5115700000.0
India: 1726508317353.4
Iran, Islamic Rep.: 454617559842.3
Iraq: 169480789377.9
Japan: 5217301203153.5
Jordan: 29469864942.1
Kazakhstan: 168198946242.6
Kenya: 48807995178.8
Korea, Rep.: 1205755199135.1
Latvia: 28908355369.8
Lebanon: 40455121214.3
Lithuania: 42763449721.2
Madagascar: 9486935333.5
Malaysia: 274833978374.2
Mali: 11894695436.7
Mongolia: 9207583282.1
Mozambique: 12838623643.4
Myanmar: 50703575766.4
Nicaragua: 10212597587.4
Nigeria: 375494148527.7
Paraguay: 23250819867.6
Philippines: 231981575952.4
Qatar: 149455747118.1
Singapore: 257026873704.2
Spain: 1404296966483.9
Sweden: 519174481541.8
Tanzania: 36731725995.3
Tunisia: 44118349316.0
Turkmenistan: 29383204467.2
United Kingdom: 2736682446205.8
United States: 16108231800000.0
Vietnam: 144579453846.2
Zambia: 21393965950.9
Zimbabwe: 11907947332.3
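The slicing approach above raises `IndexError` on `c[10]` if the file ends with an incomplete block (for instance, a stray blank line at the end), because the last slice then has fewer than 11 lines. A sketch of an alternative, assuming the same strict ten-values-then-name layout, drops blank lines first and groups the rest with the standard `zip(*[iter(seq)] * n)` idiom. The function name `averages_by_chunks` is hypothetical.

```python
def averages_by_chunks(lines, values_per_country=10):
    """Return (country, mean) pairs from lines laid out as N values followed by a name."""
    # Drop blank lines so they cannot shift the grouping
    cleaned = [line.strip() for line in lines if line.strip()]
    size = values_per_country + 1
    # zip() over the same iterator repeated `size` times yields consecutive tuples;
    # an incomplete final group is dropped instead of raising IndexError
    groups = zip(*[iter(cleaned)] * size)
    return [(g[-1], sum(map(float, g[:-1])) / values_per_country) for g in groups]
```

Whether silently dropping a partial trailing block is better than an error depends on how far the input file can be trusted.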
#!/usr/bin/env python
from statistics import mean
GDPGroup = []       # GDP values for the country currently being read
GDPDictionary = {}  # country name -> list of its GDP values (later replaced by their mean)
with open("10year-gdp.txt") as FileObject:
    lines = FileObject.readlines()
for line in lines:
    line = line.strip()
    if not line.isdigit():
        # A country name closes the group of values read before it
        GDPDictionary[line] = GDPGroup
        GDPGroup = []
    else:
        GDPGroup.append(float(line))
# Replace each list of values with its mean
for key in GDPDictionary:
    GDPDictionary[key] = mean(GDPDictionary[key])
print(GDPDictionary)
Prints out:
{'Guam': 5115700000.0, 'Lithuania': 42763449721.2, 'Azerbaijan': 55506365440.0, 'Bangladesh': 139036345780.9, 'Egypt, Arab Rep.': 247614743339.3, 'Burkina Faso': 10081729086.1, 'Chile': 229246627569.0, 'Mongolia': 9207583282.1, 'Nicaragua': 10212597587.4, 'Brazil': 2057882976008.9, 'Kenya': 48807995178.8, 'Dominica': 499171357.0, 'Japan': 5217301203153.5, 'India': 1726508317353.4, 'Cabo Verde': 1719693752.3, 'United States': 16108231800000.0, 'Greece': 270091322197.4, 'Myanmar': 50703575766.4, 'Madagascar': 9486935333.5, 'Tunisia': 44118349316.0, 'Mozambique': 12838623643.4, 'Cambodia': 13779735437.1, 'Iraq': 169480789377.9, 'Korea, Rep.': 1205755199135.1, 'Kazakhstan': 168198946242.6, 'Turkmenistan': 29383204467.2, 'Germany': 3582562859622.3, 'Iran, Islamic Rep.': 454617559842.3, 'France': 2702817149305.2, 'Paraguay': 23250819867.6, 'United Kingdom': 2736682446205.8, 'Malaysia': 274833978374.2, 'Philippines': 231981575952.4, 'Qatar': 149455747118.1, 'Lebanon': 40455121214.3, 'Jordan': 29469864942.1, 'Mali': 11894695436.7, 'Zambia': 21393965950.9, 'Australia': 1248467214849.1, 'Singapore': 257026873704.2, 'Zimbabwe': 11907947332.3, 'Sweden': 519174481541.8, 'Nigeria': 375494148527.7, 'China': 7784747168448.6, 'Tanzania': 36731725995.3, 'Czech Republic': 207328405561.6, 'Vietnam': 144579453846.2, 'Latvia': 28908355369.8, 'Spain': 1404296966483.9, 'Brunei Darussalam': 14817756697.0}
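This dictionary does not come out in alphabetical order like the listing from the previous answer. Dictionaries are only guaranteed to preserve insertion order from Python 3.7 onward, so this output suggests an older interpreter. Sorting the items gives a stable, alphabetical report on any version. A minimal sketch; the helper `format_report` is hypothetical and takes a dict such as `GDPDictionary`:

```python
def format_report(gdp_by_country):
    """Return 'Country: mean' lines sorted alphabetically by country name."""
    return ['{0}: {1}'.format(country, average)
            for country, average in sorted(gdp_by_country.items())]
```

For example, `print('\n'.join(format_report(GDPDictionary)))`.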

BeautifulSoup, extract a table (from a poorly designed site) and turn it into a CSV

I'm trying to extract this entire table. Any tips? I've tried the following code 8 different ways, to no avail. Thank you!
data = []
table = soup.find_all("tbody")
rows = table.find_all("tr")
for row in rows:
    cols = row.find_all("td")
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])
Code:
import requests
from bs4 import BeautifulSoup
html = requests.get('http://www.boxofficemojo.com/alltime/adjusted.htm').text
soup = BeautifulSoup(html, 'html.parser')
# Select the table by its cellspacing="1" attribute
table = soup.find('table', cellspacing='1')
with open('data.csv', 'w') as f:
    for row in table.find_all('tr'):
        # Join all text nodes in the row; the newlines between cells become '|'
        line = ''.join(row.find_all(string=True)).replace('\n', '|')
        print(line)
        f.write(line + '\n')
Output:
1|Gone with the Wind|MGM|$1,854,769,700|$198,676,459|1939^|
2|Star Wars|Fox|$1,635,137,900|$460,998,007|1977^|
3|The Sound of Music|Fox|$1,307,373,200|$158,671,368|1965|
4|E.T.: The Extra-Terrestrial|Uni.|$1,302,222,800|$435,110,554|1982^|
5|Titanic|Par.|$1,244,347,300|$659,363,944|1997^|
6|The Ten Commandments|Par.|$1,202,580,000|$65,500,000|1956|
7|Jaws|Uni.|$1,175,763,500|$260,000,000|1975|
8|Doctor Zhivago|MGM|$1,139,563,500|$111,721,910|1965|
9|The Exorcist|WB|$1,015,300,400|$232,906,145|1973^|
10|Snow White and the Seven Dwarfs|Dis.|$1,000,620,000|$184,925,486|1937^|
11|Star Wars: The Force Awakens|BV|$992,496,600|$936,662,225|2015|
12|101 Dalmatians|Dis.|$917,240,400|$144,880,014|1961^|
13|The Empire Strikes Back|Fox|$901,298,200|$290,475,067|1980^|
14|Ben-Hur|MGM|$899,640,000|$74,000,000|1959|
15|Avatar|Fox|$893,301,900|$760,507,625|2009^|
16|Return of the Jedi|Fox|$863,465,400|$309,306,177|1983^|
17|Jurassic Park|Uni.|$843,843,500|$402,453,882|1993^|
18|Star Wars: Episode I - The Phantom Menace|Fox|$829,064,800|$474,544,677|1999^|
19|The Lion King|BV|$818,364,200|$422,783,777|1994^|
20|The Sting|Uni.|$818,331,400|$156,000,000|1973|
21|Raiders of the Lost Ark|Par.|$812,675,900|$248,159,971|1981^|
22|The Graduate|AVCO|$785,595,300|$104,945,305|1967^|
23|Fantasia|Dis.|$762,339,100|$76,408,097|1941^|
24|Jurassic World|Uni.|$725,671,700|$652,270,625|2015|
25|The Godfather|Par.|$724,509,200|$134,966,411|1972^|
26|Forrest Gump|Par.|$721,682,300|$330,252,182|1994^|
27|Mary Poppins|Dis.|$717,709,100|$102,272,727|1964^|
28|Grease|Par.|$706,577,200|$188,755,690|1978^|
29|Marvel's The Avengers|BV|$705,769,500|$623,357,910|2012|
30|Thunderball|UA|$686,664,000|$63,595,658|1965|
31|The Dark Knight|WB|$683,575,000|$534,858,444|2008^|
32|The Jungle Book|Dis.|$676,381,600|$141,843,612|1967^|
33|Sleeping Beauty|Dis.|$667,166,200|$51,600,000|1959^|
34|Ghostbusters|Col.|$653,374,800|$242,212,467|1984^|
35|Shrek 2|DW|$652,247,500|$441,226,247|2004|
36|Butch Cassidy and the Sundance Kid|Fox|$647,721,100|$102,308,889|1969|
37|Love Story|Par.|$642,583,000|$106,397,186|1970|
38|Spider-Man|Sony|$637,870,000|$403,706,375|2002|
39|Independence Day|Fox|$635,888,300|$306,169,268|1996^|
40|Home Alone|Fox|$621,799,900|$285,761,243|1990|
41|Pinocchio|Dis.|$618,762,600|$84,254,167|1940^|
42|Cleopatra (1963)|Fox|$616,744,200|$57,777,778|1963|
43|Beverly Hills Cop|Par.|$616,437,200|$234,760,478|1984|
44|Star Wars: The Last Jedi|BV|$615,738,300|$615,738,279|2017|
45|Goldfinger|UA|$608,634,000|$51,081,062|1964|
46|Airport|Uni.|$606,901,600|$100,489,151|1970|
47|American Graffiti|Uni.|$603,257,100|$115,000,000|1973|
48|The Robe|Fox|$600,872,700|$36,000,000|1953|
49|Pirates of the Caribbean: Dead Man's Chest|BV|$593,288,400|$423,315,812|2006|
50|Around the World in 80 Days|UA|$593,169,200|$42,000,000|1956|
51|Bambi|RKO|$584,880,300|$102,247,150|1942^|
52|Blazing Saddles|WB|$580,539,700|$119,601,481|1974^|
53|Batman|WB|$577,923,400|$251,188,924|1989|
54|The Bells of St. Mary's|RKO|$576,000,000|$21,333,333|1945|
55|The Lord of the Rings: The Return of the King|NL|$565,852,400|$377,845,905|2003^|
56|Finding Nemo|BV|$565,364,200|$380,843,261|2003^|
57|The Towering Inferno|Fox|$563,428,600|$116,000,000|1974|
58|Rogue One: A Star Wars Story|BV|$554,854,100|$532,177,324|2016|
59|Cinderella (1950)|Dis.|$553,567,100|$93,141,149|1950^|
60|Spider-Man 2|Sony|$552,257,300|$373,585,825|2004|
61|My Fair Lady|WB|$550,800,000|$72,000,000|1964|
62|The Greatest Show on Earth|Par.|$550,800,000|$36,000,000|1952|
63|National Lampoon's Animal House|Uni.|$549,792,700|$141,600,000|1978^|
64|The Passion of the Christ|NM|$548,090,400|$370,782,930|2004^|
65|Star Wars: Episode III - Revenge of the Sith|Fox|$544,599,700|$380,270,577|2005^|
66|Back to the Future|Uni.|$542,085,000|$210,609,762|1985|
67|The Lord of the Rings: The Two Towers|NL|$529,918,100|$342,551,365|2002^|
68|The Dark Knight Rises|WB|$528,601,000|$448,139,099|2012|
69|The Sixth Sense|BV|$528,576,400|$293,506,292|1999|
70|Superman|WB|$526,547,600|$134,218,018|1978|
71|Tootsie|Col.|$522,378,200|$177,200,000|1982|
72|Smokey and the Bandit|Uni.|$521,726,300|$126,737,428|1977|
73|Beauty and the Beast (2017)|BV|$521,407,600|$504,014,165|2017|
74|Finding Dory|BV|$515,531,300|$486,295,561|2016|
75|West Side Story|MGM|$513,807,200|$43,656,822|1961|
76|Close Encounters of the Third Kind|Col.|$513,370,800|$135,189,114|1977^|
77|Harry Potter and the Sorcerer's Stone|WB|$513,281,200|$317,575,550|2001|
78|Lady and the Tramp|Dis.|$511,646,200|$93,602,326|1955^|
79|Lawrence of Arabia|Col.|$508,421,000|$44,824,144|1962^|
80|The Rocky Horror Picture Show|Fox|$505,537,300|$112,892,319|1975|
81|Rocky|UA|$505,267,000|$117,235,147|1976|
82|The Best Years of Our Lives|RKO|$504,900,000|$23,650,000|1946|
83|The Poseidon Adventure|Fox|$504,000,000|$84,563,118|1972|
84|The Lord of the Rings: The Fellowship of the Ring|NL|$503,057,400|$315,544,750|2001^|
85|Twister|WB|$502,037,000|$241,721,524|1996|
86|Men in Black|Sony|$501,381,100|$250,690,539|1997|
87|The Bridge on the River Kwai|Col.|$499,392,000|$27,200,000|1957|
88|Transformers: Revenge of the Fallen|P/DW|$494,810,500|$402,111,870|2009|
89|It's a Mad, Mad, Mad, Mad World|MGM|$494,576,300|$46,332,858|1963|
90|Swiss Family Robinson|Dis.|$493,957,400|$40,356,000|1960|
91|One Flew Over the Cuckoo's Nest|UA|$492,831,600|$108,981,275|1975|
92|M.A.S.H.|Fox|$492,821,000|$81,600,000|1970|
93|Indiana Jones and the Temple of Doom|Par.|$491,431,300|$179,870,271|1984|
94|Avengers: Age of Ultron|BV|$491,377,100|$459,005,868|2015|
95|Star Wars: Episode II - Attack of the Clones|Fox|$490,840,600|$310,676,740|2002^|
96|Toy Story 3|BV|$489,656,000|$415,004,880|2010|
97|Mrs. Doubtfire|Fox|$483,642,600|$219,195,243|1993|
98|Aladdin|BV|$481,420,700|$217,350,219|1992|
99|Ghost|Par.|$472,450,700|$217,631,306|1990|
100|The Hunger Games: Catching Fire|LGF|$469,232,400|$424,668,047|2013|
101|Duel in the Sun|Selz.|$468,367,300|$20,408,163|1946|
102|The Hunger Games|LGF|$466,924,700|$408,010,692|2012|
103|Pirates of the Caribbean: The Curse of the Black Pearl|BV|$464,956,900|$305,413,918|2003|
104|House of Wax|WB|$463,883,000|$23,750,000|1953|
105|Rear Window|Par.|$462,256,500|$36,764,313|1954^|
106|The Lost World: Jurassic Park|Uni.|$458,173,400|$229,086,679|1997|
107|Indiana Jones and the Last Crusade|Par.|$453,643,400|$197,171,806|1989|
108|Monsters, Inc.|BV|$453,061,600|$289,916,256|2001^|
109|Frozen|BV|$450,196,500|$400,738,009|2013|
110|Spider-Man 3|Sony|$449,033,200|$336,530,303|2007|
111|Iron Man 3|BV|$448,060,700|$409,013,994|2013|
112|Terminator 2: Judgment Day|TriS|$447,732,400|$205,881,154|1991^|
113|Sergeant York|WB|$441,770,900|$16,361,885|1941|
114|How the Grinch Stole Christmas|Uni.|$441,620,600|$260,044,825|2000|
115|Top Gun|Par.|$440,917,900|$179,800,601|1986^|
116|Harry Potter and the Deathly Hallows Part 2|WB|$440,547,300|$381,011,219|2011|
117|Toy Story 2|BV|$439,139,300|$245,852,179|1999^|
118|Shrek|DW|$434,128,000|$267,665,011|2001|
119|Shrek the Third|P/DW|$430,606,000|$322,719,944|2007|
120|Despicable Me 2|Uni.|$430,487,800|$368,061,265|2013|
121|Captain America: Civil War|BV|$429,213,000|$408,084,349|2016|
122|The Matrix Reloaded|WB|$428,668,600|$281,576,461|2003|
123|Transformers|P/DW|$425,970,900|$319,246,193|2007|
124|Crocodile Dundee|Par.|$424,138,600|$174,803,506|1986|
125|Wonder Woman|WB|$423,340,500|$412,563,408|2017|
126|The Four Horsemen of the Apocalypse|MPC|$421,530,600|$9,183,673|1921|
127|Saving Private Ryan|DW|$419,958,100|$216,540,909|1998|
128|Young Frankenstein|Fox|$419,041,900|$86,273,333|1974|
129|Peter Pan|Dis.|$418,824,000|$87,404,651|1953^|
130|Gremlins|WB|$417,526,300|$153,083,102|1984^|
131|Beauty and the Beast|BV|$416,438,900|$218,967,620|1991^|
132|The Chronicles of Narnia: The Lion, the Witch and the Wardrobe|BV|$414,717,600|$291,710,957|2005|
133|Harry Potter and the Goblet of Fire|WB|$414,709,000|$290,013,036|2005|
134|Pirates of the Caribbean: At World's End|BV|$412,860,400|$309,420,425|2007|
135|Harry Potter and the Chamber of Secrets|WB|$412,327,800|$261,988,482|2002|
136|The Fugitive|WB|$407,567,300|$183,875,760|1993|
137|The Caine Mutiny|Col.|$407,479,600|$21,750,000|1954|
138|Iron Man|Par.|$407,095,000|$318,412,101|2008|
139|Transformers: Dark of the Moon|P/DW|$406,315,000|$352,390,543|2011|
140|Meet the Fockers|Uni.|$405,508,300|$279,261,160|2004|
141|Indiana Jones and the Kingdom of the Crystal Skull|Par.|$405,430,100|$317,101,119|2008|
142|Toy Story|BV|$402,711,200|$191,796,233|1995^|
143|Dances with Wolves|Orion|$401,159,500|$184,208,848|1990|
144|An Officer and a Gentleman|Par.|$400,769,900|$129,795,554|1982|
145|Guardians of the Galaxy Vol. 2|BV|$399,848,900|$389,813,101|2017|
146|2001: A Space Odyssey|MGM|$397,829,200|$56,954,992|1968^|
147|Rain Man|MGM|$397,417,800|$172,825,435|1988|
148|The Secret Life of Pets|Uni.|$397,253,600|$368,384,330|2016|
149|Guess Who's Coming to Dinner|Col.|$397,099,200|$56,666,667|1967|
150|Inside Out|BV|$396,452,900|$356,461,711|2015|
151|American Sniper|WB|$395,474,400|$350,126,372|2014|
152|Kramer Vs. Kramer|Col.|$394,925,800|$106,260,000|1979|
153|Armageddon|BV|$394,560,300|$201,578,182|1998|
154|Psycho|Uni.|$391,680,100|$32,000,000|1960|
155|Rocky III|UA|$390,271,700|$125,049,125|1982^|
156|Harry Potter and the Order of the Phoenix|WB|$389,622,600|$292,004,738|2007|
157|Rambo: First Blood Part II|TriS|$388,961,600|$150,415,432|1985|
158|Batman Forever|WB|$388,369,100|$184,031,112|1995|
159|Deadpool|Fox|$388,249,600|$363,070,709|2016|
160|Pretty Woman|BV|$387,179,600|$178,406,268|1990|
161|Earthquake|Uni.|$386,952,300|$79,666,653|1974|
162|Alice in Wonderland (2010)|BV|$385,896,200|$334,191,110|2010|
163|The Incredibles|BV|$385,835,000|$261,441,092|2004|
164|Cast Away|Fox|$384,588,700|$233,632,142|2000|
165|Home Alone 2: Lost in New York|Fox|$384,179,200|$173,585,516|1992|
166|The Jungle Book (2016)|BV|$382,904,500|$364,001,123|2016|
167|Three Men and a Baby|BV|$382,840,700|$167,780,960|1987|
168|My Big Fat Greek Wedding|IFC|$380,230,800|$241,438,208|2002|
169|Guardians of the Galaxy|BV|$378,010,100|$333,176,600|2014|
170|Furious 7|Uni.|$376,598,400|$353,007,020|2015|
171|Mission: Impossible|Par.|$375,885,400|$180,981,856|1996|
172|The Hunger Games: Mockingjay - Part 1|LGF|$373,872,900|$337,135,885|2014|
173|Minions|Uni.|$373,756,800|$336,045,770|2015|
174|Saturday Night Fever|Par.|$372,751,500|$94,213,184|1977|
175|On Golden Pond|Uni.|$372,564,100|$119,285,432|1981|
176|Austin Powers: The Spy Who Shagged Me|NL|$372,332,300|$206,040,086|1999|
177|Harry Potter and the Half-Blood Prince|WB|$371,524,900|$301,959,197|2009|
178|Bruce Almighty|Uni.|$369,680,400|$242,829,261|2003|
179|Harry Potter and the Prisoner of Azkaban|WB|$368,886,800|$249,541,069|2004|
180|Funny Girl|Col.|$367,562,200|$52,223,306|1968^|
181|Mission: Impossible II|Par.|$366,876,200|$215,409,889|2000|
182|Rush Hour 2|NL|$366,817,700|$226,164,286|2001|
183|Apollo 13|Uni.|$365,894,000|$173,837,933|1995^|
184|Patton|Fox|$365,718,000|$61,749,765|1970|
185|Fatal Attraction|Par.|$364,269,300|$156,645,693|1987|
186|Zootopia|BV|$363,584,000|$341,268,248|2016|
187|Liar Liar|Uni.|$362,821,200|$181,410,615|1997|
188|Robin Hood: Prince of Thieves|WB|$360,863,200|$165,493,908|1991|
189|Beverly Hills Cop II|Par.|$360,778,800|$153,665,036|1987|
190|Iron Man 2|Par.|$360,772,100|$312,433,331|2010|
191|Up|BV|$360,533,300|$293,004,164|2009|
192|Batman Returns|WB|$360,191,600|$162,831,698|1992|
193|Signs|BV|$360,164,800|$227,966,634|2002|
194|Jumanji: Welcome to the Jungle|Sony|$358,036,900|$358,036,871|2017|
195|The Twilight Saga: Eclipse|Sum.|$357,823,200|$300,531,751|2010|
196|Superman II|WB|$357,246,300|$108,185,706|1981|
197|The Twilight Saga: New Moon|Sum.|$357,194,500|$296,623,634|2009|
198|What's Up, Doc?|WB|$356,400,000|$66,000,000|1972|
199|9 to 5|Fox|$352,493,200|$103,290,500|1980|
200|Batman v Superman: Dawn of Justice|WB|$351,232,600|$330,360,194|2016|
201|The Firm|Par.|$351,120,300|$158,348,367|1993|
202|Suicide Squad|WB|$350,483,800|$325,100,054|2016|
203|Who Framed Roger Rabbit|BV|$349,448,400|$156,452,370|1988|
204|Inception|WB|$348,133,400|$292,576,195|2010|
205|Skyfall|Sony|$347,389,600|$304,360,277|2012|
206|The Hobbit: An Unexpected Journey|WB (NL)|$347,313,400|$303,003,568|2012|
207|Porky's|Fox|$346,289,600|$111,289,673|1982^|
208|Air Force One|Sony|$345,835,200|$172,956,409|1997|
209|Stir Crazy|Col.|$345,700,400|$101,300,000|1980|
210|A Star Is Born (1976)|WB|$344,788,700|$80,000,000|1976|
211|There's Something About Mary|Fox|$344,053,800|$176,484,651|1998|
212|Spider-Man: Homecoming|Sony|$343,499,000|$334,201,140|2017|
213|Cars|BV|$342,088,800|$244,082,982|2006|
214|The Hangover|WB|$341,182,900|$277,322,503|2009|
215|Lethal Weapon 2|WB|$340,501,700|$147,253,986|1989|
216|Night at the Museum|Fox|$340,041,900|$250,863,268|2006|
217|Harry Potter and the Deathly Hallows Part 1|WB|$339,560,700|$295,983,305|2010|
218|I Am Legend|WB|$337,126,200|$256,393,010|2007|
219|Austin Powers in Goldmember|NL|$337,033,800|$213,307,889|2002|
220|War of the Worlds|Par.|$335,521,600|$234,280,354|2005|
221|It|WB (NL)|$335,148,900|$327,481,748|2017|
222|Every Which Way But Loose|WB|$334,232,400|$85,196,485|1978|
223|The Twilight Saga: Breaking Dawn Part 2|LG/S|$333,495,700|$292,324,737|2012|
224|The Love Bug|Dis.|$331,410,900|$51,264,000|1969|
225|The Twilight Saga: Breaking Dawn Part 1|Sum.|$329,680,800|$281,287,133|2011|
226|You Only Live Twice|UA|$329,598,600|$43,084,787|1967|
227|X-Men: The Last Stand|Fox|$328,465,300|$234,362,462|2006|
228|The Mummy Returns|Uni.|$327,657,500|$202,019,785|2001|
229|X2: X-Men United|Fox|$327,236,800|$214,949,694|2003|
230|Platoon|Orion|$325,302,500|$138,530,565|1986|
231|Rocky IV|UA|$324,855,400|$127,873,716|1985|
232|Pearl Harbor|BV|$322,017,800|$198,542,554|2001|
233|True Lies|Fox|$321,261,400|$146,282,411|1994|
234|Heaven Can Wait (1978)|Par.|$320,281,100|$81,640,278|1978|
235|Lethal Weapon 3|WB|$320,153,100|$144,731,527|1992|
236|Look Who's Talking|TriS|$319,854,500|$140,088,813|1989|
237|Gladiator|DW|$319,592,900|$187,705,427|2000|
238|Man of Steel|WB|$318,830,300|$291,045,518|2013|
239|Jaws 2|Uni.|$318,717,900|$81,766,007|1978^|
240|Star Trek|Par.|$317,150,800|$257,730,019|2009|
241|The Santa Clause|BV|$316,776,400|$144,833,357|1994|
242|The Amityville Horror|AIP|$316,113,900|$86,432,000|1979|
243|Thor: Ragnarok|BV|$314,143,200|$314,143,225|2017|
244|The Waterboy|BV|$314,053,600|$161,491,646|1998|
245|A Bug's Life|BV|$313,363,900|$162,798,565|1998|
246|A Few Good Men|Col.|$313,069,200|$141,340,178|1992|
247|The Odd Couple|Par.|$312,030,500|$44,527,234|1968|
248|Rocky II|UA|$311,542,700|$85,182,160|1979|
249|Jerry Maguire|Sony|$311,468,800|$153,952,592|1996|
250|The Perfect Storm|WB|$311,027,300|$182,618,434|2000|
251|King Kong|Uni.|$310,014,100|$218,080,025|2005|
252|The Matrix|WB|$309,879,100|$171,479,930|1999|
253|The Amazing Spider-Man|Sony|$309,163,500|$262,030,663|2012|
254|Tarzan|BV|$309,122,000|$171,091,819|1999|
255|Sister Act|BV|$308,813,300|$139,605,150|1992|
256|Hooper|WB|$306,000,000|$78,000,000|1978|
257|The Blind Side|WB|$305,701,600|$255,959,475|2009|
258|The Da Vinci Code|Sony|$304,882,700|$217,536,138|2006|
259|Monsters University|BV|$304,779,900|$268,492,764|2013|
260|All the President's Men|WB|$304,276,100|$70,600,000|1976|
261|What Women Want|Par.|$303,763,400|$182,811,707|2000|
262|The Bourne Ultimatum|Uni.|$303,515,200|$227,471,070|2007|
263|Gravity|WB|$302,369,300|$274,092,705|2013|
264|Honey, I Shrunk the Kids|BV|$302,279,100|$130,724,172|1989|
265|Terms of Endearment|Par.|$301,824,600|$108,423,489|1983|
266|Men in Black II|Sony|$300,868,300|$190,418,803|2002|
267|Star Trek: The Motion Picture|Par.|$300,849,700|$82,258,456|1979|
268|Wedding Crashers|NL|$299,683,200|$209,255,921|2005|
269|Despicable Me|Uni.|$299,217,100|$251,513,985|2010|
270|Pocahontas|BV|$298,782,100|$141,579,773|1995|
271|Arthur|WB|$298,725,900|$95,461,682|1981|
272|The Hunger Games: Mockingjay - Part 2|LGF|$297,446,700|$281,723,902|2015|
273|The LEGO Movie|WB|$296,654,200|$257,760,692|2014|
274|Batman Begins|WB|$295,860,600|$206,852,432|2005^|
275|Apocalypse Now|MGM|$295,789,400|$83,471,511|1979^|
276|Charlie and the Chocolate Factory|WB|$295,677,800|$206,459,076|2005|
277|Big Daddy|Sony|$295,422,100|$163,479,795|1999|
278|Ocean's Eleven|WB|$294,446,200|$183,417,150|2001|
279|Jurassic Park III|Uni.|$293,844,100|$181,171,875|2001|
280|Teenage Mutant Ninja Turtles|NL|$293,555,800|$135,265,915|1990|
281|Planet of the Apes (2001)|Fox|$291,948,200|$180,011,740|2001|
282|Alien|Fox|$291,755,600|$80,931,801|1979^|
283|Hancock|Sony|$291,441,100|$227,946,274|2008|
284|As Good as It Gets|Sony|$290,776,100|$148,478,011|1997|
285|The Hangover Part II|WB|$289,972,400|$254,464,305|2011|
286|Midnight Cowboy|UA|$289,525,900|$44,785,053|1969|
287|The Hobbit: The Desolation of Smaug|WB (NL)|$289,308,500|$258,366,855|2013|
288|The French Connection|Fox|$287,640,000|$51,700,000|1971|
289|The Flintstones|Uni.|$286,669,000|$130,531,208|1994|
290|Captain America: The Winter Soldier|BV|$286,373,800|$259,766,572|2014|
291|Coming to America|Par.|$286,238,000|$128,152,301|1988|
292|National Treasure: Book of Secrets|BV|$286,164,000|$219,964,115|2007|
293|WALL-E|BV|$286,150,300|$223,808,164|2008|
294|The Hobbit: The Battle of the Five Armies|WB (NL)|$285,304,300|$255,119,788|2014|
295|The Silence of the Lambs|Orion|$285,087,900|$130,742,922|1991|
296|The Karate Kid Part II|Col.|$284,812,500|$115,103,979|1986|
297|Airplane!|Par.|$284,796,800|$83,453,539|1980|
298|Alvin and the Chipmunks|Fox|$284,128,700|$217,326,974|2007|
299|Meet the Parents|Uni.|$282,676,300|$166,244,045|2000|
300|Ransom|BV|$282,366,800|$136,492,681|1996|
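The question asks for a CSV, but the answer writes `|`-joined text, which a CSV reader will not split into columns. A sketch using the standard `csv` module, assuming each `<tr>` holds its cells in `<td>` or `<th>` elements: the hypothetical helper `table_to_rows` reads cell text with BeautifulSoup's `find_all` and `get_text(strip=True)`, and `write_csv` lets `csv.writer` quote fields that contain commas, such as `$1,854,769,700` or `It's a Mad, Mad, Mad, Mad World`. `table_to_rows` expects a single element like the one returned by `soup.find(...)`; `soup.find_all("tbody")` in the question returns a list-like `ResultSet`, which is why calling `.find_all` on it fails.

```python
import csv

def table_to_rows(table):
    """Turn a BeautifulSoup <table> element into a list of lists of cell text."""
    rows = []
    for tr in table.find_all('tr'):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(['td', 'th'])]
        if cells:  # skip rows that contain no cells
            rows.append(cells)
    return rows

def write_csv(rows, f):
    """Write rows to a file-like object as CSV; fields containing commas are quoted."""
    writer = csv.writer(f)
    writer.writerows(rows)
```

With `table` from the answer above, this could be used as `with open('data.csv', 'w', newline='') as f: write_csv(table_to_rows(table), f)`; the `csv` documentation recommends opening the file with `newline=''`.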

BeautifulSoup - how to arrange data and write to txt?

I'm new to Python and have a simple problem. I am pulling some data from Yahoo Fantasy Baseball into a text file, but my code doesn't work properly:
from bs4 import BeautifulSoup
import urllib2
teams = ("http://baseball.fantasysports.yahoo.com/b1/2282/players?status=A&pos=B&cut_type=33&stat1=S_S_2015&myteam=0&sort=AR&sdir=1")
page = urllib2.urlopen(teams)
soup = BeautifulSoup(page, "html.parser")
players = soup.findAll('div', {'class':'ysf-player-name Nowrap Grid-u Relative Lh-xs Ta-start'})
playersLines = [span.get_text('\t',strip=True) for span in players]
with open('output.txt', 'w') as f:
    for line in playersLines:
        line = playersLines[0]
        output = line.encode('utf-8')
        f.write(output)
The output file contains only one player, repeated 25 times. Any ideas how to get a result like this?
Pedro Álvarez Pit - 1B,3B
Kevin Pillar Tor - OF
Melky Cabrera CWS - OF
etc
Try removing:
line = playersLines[0]
Also, append a newline character to each output so that every player is written on a separate line of output.txt:
from bs4 import BeautifulSoup
import urllib2
teams = ("http://baseball.fantasysports.yahoo.com/b1/2282/players?status=A&pos=B&cut_type=33&stat1=S_S_2015&myteam=0&sort=AR&sdir=1")
page = urllib2.urlopen(teams)
soup = BeautifulSoup(page, "html.parser")
players = soup.findAll('div', {'class':'ysf-player-name Nowrap Grid-u Relative Lh-xs Ta-start'})
playersLines = [span.get_text('\t',strip=True) for span in players]
with open('output.txt', 'w') as f:
    for line in playersLines:
        output = line.encode('utf-8')
        f.write(output+'\n')
Results:
Pedro Álvarez Pit - 1B,3B
Kevin Pillar Tor - OF
Melky Cabrera CWS - OF
Ryan Howard Phi - 1B
Michael A. Taylor Was - OF
Joe Mauer Min - 1B
Maikel Franco Phi - 3B
Joc Pederson LAD - OF
Yangervis Solarte SD - 1B,2B,3B
César Hernández Phi - 2B,3B,SS
Eddie Rosario Min - 2B,OF
Austin Jackson Sea - OF
Danny Espinosa Was - 1B,2B,3B,SS
Danny Valencia Oak - 1B,3B,OF
Freddy Galvis Phi - 3B,SS
Jimmy Paredes Bal - 2B,3B
Colby Rasmus Hou - OF
Luis Valbuena Hou - 1B,2B,3B
Chris Young NYY - OF
Kevin Kiermaier TB - OF
Steven Souza TB - OF
Jace Peterson Atl - 2B,3B
Juan Lagares NYM - OF
A.J. Pierzynski Atl - C
Khris Davis Mil - OF
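Both snippets above are Python 2: `urllib2` does not exist in Python 3 (its `urlopen` lives in `urllib.request`), and in Python 3 `line.encode('utf-8')` returns `bytes`, so `output+'\n'` would raise `TypeError`. A minimal Python 3 sketch of the writing step, assuming `playersLines` is the list of strings built above; the function name `write_player_lines` is hypothetical:

```python
def write_player_lines(player_lines, path):
    """Write one player per line to a UTF-8 text file (Python 3)."""
    # Opening the file with an explicit encoding replaces the manual .encode() call
    with open(path, 'w', encoding='utf-8') as f:
        for line in player_lines:
            f.write(line + '\n')  # newline so each player gets its own line
```

For example, `write_player_lines(playersLines, 'output.txt')`, with the page fetched through `urllib.request.urlopen(teams)` instead of `urllib2.urlopen(teams)`.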
