Cannot find containers with BeautifulSoup - python
I'm trying to make a very simple script which will scrape the top 50 sounds on SoundCloud, add them to a dictionary, then save them to a file. When I try to find all the items I get none back (as seen by a debug message I put in). I was wondering what I did wrong and if anyone could help me figure it out, thanks!
from bs4 import BeautifulSoup as Bs
import requests

website = "https://soundcloud.com/charts/top?genre=rock&country=all-countries"
session = requests.session()

def get_songs():
    songs = {}
    response = session.get(website)
    soup = Bs(response.text, "html.parser")
    print(soup.title.text)
    containers = soup.find_all("li", {"class": "chartTracks__item"})
    if len(containers) == 0:
        print("Could not find any containers")
    for element in containers:
        chart_track_div = element.div("chartTrack")
        details_div = chart_track_div.div("chartTrack__details")
        artist = details_div.div("chartTrack__username").text
        song_name = details_div.div("chartTrack__title").text
        songs[song_name] = artist
    return songs

def create_file(songs_dictionary):
    # Just printing out key&value for now
    for key, value in songs_dictionary:
        print("Song: " + key)
        print("Artist: " + value)

toSave = get_songs()
create_file(toSave)
This is what I get after I run it: http://prntscr.com/m78dfr
A few things that I needed to change.
First, it is a dynamic page: the li.chartTracks__item elements are built by JavaScript after the page loads, so they are not in the HTML that requests receives. If you want to grab that info into containers using soup.find_all("li", {"class": "chartTracks__item"}), you'd have to render the page first, either with Selenium or requests-html, and then call .find_all.
However, the data you are pulling is also present in the HTML source, just under different tags, so I went ahead and grabbed it from there.
Second, I don't know if this was exactly your intention, but you were saving the user name as the artist and the title as the song. Unfortunately, the songs on SoundCloud are titled in slightly different formats. If you want strictly the artist and the title, it'll require some filtering and reworking of the strings. I kept it as you had it, and you can choose what to do from there.
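As a rough illustration of the filtering mentioned above, here is a minimal sketch that splits a title on the first " - " and falls back to the uploader's name when there is no separator. The helper name split_artist_title and the assumption that the artist comes first are mine, not SoundCloud's; some uploads use "Title - Artist" instead, so treat this as a starting point only.

```python
def split_artist_title(raw_title, uploader):
    """Guess (artist, title) from a SoundCloud-style track title.

    Hypothetical helper: assumes an "Artist - Title" layout whenever a
    " - " separator is present, which is not true for every upload.
    """
    artist, sep, title = raw_title.partition(" - ")
    if not sep:
        # No separator: treat the whole string as the title and
        # fall back to the uploader's user name as the artist.
        return uploader.strip(), raw_title.strip()
    return artist.strip(), title.strip()


print(split_artist_title("Nickelback - Rockstar", "Roadrunner USA"))
print(split_artist_title("In The End", "LINKIN_PARK"))
```

Titles that put the song first (for example "Wonderwall - Oasis") would still come out swapped, so some manual rules or a lookup would be needed on top of this.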
Third, you didn't pass any parameters into your first function:
def get_songs():
    songs = {}
    response = session.get(website)
That does run, because website is a module-level name the function can read, but passing the URL in explicitly makes the function easier to reuse and test. So I changed that to:
def get_songs(website):
    songs = {}
    response = session.get(website)
Fourth, you can't iterate through the dictionary with for key, value in songs_dictionary:. Iterating over a dict yields only its keys, so Python tries to unpack each key string into two names and raises a ValueError (unless a key happens to be exactly 2 characters long). To do what you are trying, you have 2 options:
for key, value in songs_dictionary.items():
    print("Song: " + key)
    print("Artist: " + value)
or
for key in songs_dictionary:
    print("Song: " + key)
    print("Artist: " + songs_dictionary[key])
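To see the difference between the failing loop and the two working ones in isolation, here is a small self-contained sketch. The two dictionary entries are just sample data.

```python
songs = {"Zombie": "Bad Wolves", "Everytime": "boy pablo"}

# Iterating a dict directly yields keys only, so Python tries to
# unpack each key string (e.g. "Zombie") into (key, value) and fails.
error_message = None
try:
    for key, value in songs:
        pass
except ValueError as exc:
    error_message = str(exc)
    print("Unpacking failed:", error_message)

# .items() yields (key, value) tuples, which unpack cleanly.
pairs = [(key, value) for key, value in songs.items()]
print(pairs)
```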
I think that's all that I found, but full code here:
from bs4 import BeautifulSoup as Bs
import requests

website = "https://soundcloud.com/charts/top?genre=rock&country=all-countries"
session = requests.session()

def get_songs(website):
    songs = {}
    response = session.get(website)
    soup = Bs(response.text, "html.parser")
    print(soup.title.text)
    containers = soup.find_all("section", {"class": "sounds"})
    songs_ranks = containers[0].find_all('li')
    if len(songs_ranks) == 0:
        print("Could not find any containers")
    for element in songs_ranks:
        artist = element.find_all('a')[1].text
        song_name = element.find('a', {'itemprop':'url'}).text
        songs[song_name] = artist
    return songs

def create_file(songs_dictionary):
    # Just printing out key&value for now
    for key, value in songs_dictionary.items():
        print("Song: " + key)
        print("Artist: " + value)

toSave = get_songs(website)
create_file(toSave)
output:
Song: KING
Artist: XXXTENTACION
Song: Queen - Bohemian Rhapsody
Artist: rizky.rilos
Song: [Korean title, unrecoverable after mis-decoding] (Brit Rock Remix For [Korean text, unrecoverable]) - BTS
Artist: BTS
Song: XXXTENTACION - NUMB
Artist: conrad foxx
Song: In The End
Artist: LINKIN_PARK
Song: I Write Sins Not Tragedies
Artist: Panic! At The Disco
Song: Man Upon The Hill
Artist: Stars and Rabbit
Song: Nirvana - Smells like teen spirit
Artist: Rocio Araujo
Song: Nickelback - Rockstar
Artist: Roadrunner USA
Song: xxxtentacion - valentine
Artist: ó
Song: Zombie
Artist: Bad Wolves
Song: Marília Mendonça – Amante Não Tem Lar
Artist: Sertanejo Repost
Song: sleep thru ur alarms
Artist: Lontalius
Song: Angel With A Shotgun
Artist: NightCore
Song: Nightcore - My Demons
Artist: NightCore
Song: Armada - Harusnya Aku
Artist: DJCantik.com
Song: Dont Stop Me Now - Queen
Artist: Zinay Hernandez
Song: Sing To Me feat. Karen O
Artist: waltermartinmusic
Song: Everytime
Artist: boy pablo
Song: Tongue Tied - Grouplove
Artist: Atlantic Records
Song: For Beginners
Artist: M. Ward
Song: This Is Gospel
Artist: Panic! At The Disco
Song: Skillet - Hero
Artist: Warner Music Nashville
Song: Wonderwall - Oasis
Artist: Florian.N.
Song: High Hopes - Panic! At the disco
Artist: IrisDH
Song: Another One Bites The Dust (Remastered 2011)
Artist: Queen
Song: Panic! At The Disco - Bohemian Rhapsody (from Suicide Squad: The Album) (Audio)
Artist: Panic! At The Disco
Song: Killer Queen (Remastered 2011)
Artist: Queen
Song: Blue Bird-Naruto Shippuden 3rd Opening Theme
Artist: flaviogomes23
Song: Virzha-tentang rindu mp3
Artist: Arjuna Bilal
Song: Tipe-X - Mawar Hitam
Artist: Tora Loaadiing
Song: Lolot - Galungan Lan Kuningan
Artist: I Made Suwita
Song: Red Hot Chili peppers - Californication
Artist: arthyum
Song: Nickelback - How You Remind Me
Artist: Roadrunner USA
Song: 2004 Green Day "Boulevard of broken dreams" Vinyl rip
Artist: Collin Codeïne
Song: Zé Neto E Cristiano - Seu Polícia (DVD Zé Neto E Cristiano Ao Vivo Em São José Do Rio Preto)
Artist: Sertanejo universitario (2018)
Song: Pink Floyd - Wish You Were Here
Artist: Ulviyya Ali
Song: Apocalypse
Artist: Cigarettes After Sex
Song: Linkin Park - In The End
Artist: ALLMusic
Song: Come As You Are
Artist: Nirvana
Song: Avenged Sevenfold - Dear God
Artist: Malik Hamza Sajjad
Song: Kaleo - Way Down We Go
Artist: AminAshkan
Song: Ya Qurban, Khumariyaan, Coke Studio Season 11, Episode 7
Artist: CokeStudio
Song: IDOL (Korean classical music ver.)_2018MMA VER.
Artist: Atm Soo
Song: Gym Best Music For Workout vol 2
Artist: Gym Best MusicFor Workout
Song: Do I Wanna Know? - Arctic Monkeys
Artist: Teenage Kicks.
Song: Um44k - Nossa Música ♪♫
Artist: Portal do Rap
Song: Nanatsu No Taizai (The Seven Deadly Sins) Anime OST - Perfect Time (POWER SONG)
Artist: cobritsa
Song: Tipe X - Genit
Artist: Hilmie CintaSederhana
Song: Kodaline - All I want - Acoustic Performance
Artist: Andy Wells 1
Related
I can't print horizontally
I want to make a book catalog whose output prints horizontally and starts a new row after every 3 books. I understand that we can print horizontally by using end = "", but that only works for single-line items. Each of my entries has 3 lines (title, ISBN, price), so end = "" can't get it done. Below is my code:
line_format = "{:50s} \n{:6s} - {:13s} \n{:11s}"
books = db.get_books(kel)
for book in books:
    print((line_format.format(str(book.title), str(book.isbn), "Rp. {:,}".format(book.price).replace(",","."))))
What I got is:
Deaver - Never Game A/UK
9780008303778
Rp. 161.000
Poirot - DEATH ON THE NILE (Exp]
9780008328948
Rp. 28.000
Alchemist - 25th Anniv ed
9780062355300
Rp. 160.000
Finn- Woman in the Window [MTI]
9780062906137
Rp. 162.000
Mahurin- Blood & Honey
9780063041172
Rp. 62.000
What I want for the output is:
Deaver - Never Game      DEATH ON THE NILE (Exp]  Alchemist
9780008303778            9780008328948            9780062355300
Rp. 161.000              Rp. 28.000               Rp. 160.000
Woman in the Window      Blood & Honey
9780062906137            9780063041172
Rp. 162.000              Rp. 62.000
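One possible way to get that layout, sketched under assumptions: the books are available as plain (title, isbn, price) string tuples rather than the db objects from the question, and format_catalog, per_row and width are names made up for this example. The idea is to take the books three at a time and print each field as its own line across the group.

```python
def format_catalog(books, per_row=3, width=28):
    """Lay out (title, isbn, price) tuples in rows of `per_row` columns."""
    lines = []
    for start in range(0, len(books), per_row):
        group = books[start:start + per_row]
        # zip(*group) turns [(t1, i1, p1), (t2, i2, p2)] into
        # (t1, t2), (i1, i2), (p1, p2): one tuple per printed line.
        for field_values in zip(*group):
            line = "".join(value.ljust(width) for value in field_values)
            lines.append(line.rstrip())
        lines.append("")  # blank line between groups
    return "\n".join(lines).rstrip()


books = [
    ("Deaver - Never Game", "9780008303778", "Rp. 161.000"),
    ("DEATH ON THE NILE (Exp]", "9780008328948", "Rp. 28.000"),
    ("Alchemist", "9780062355300", "Rp. 160.000"),
    ("Woman in the Window", "9780062906137", "Rp. 162.000"),
    ("Blood & Honey", "9780063041172", "Rp. 62.000"),
]
print(format_catalog(books))
```

ljust pads but never truncates, so a title longer than width would push the later columns to the right. Slicing each value to width first would keep the columns aligned at the cost of cutting long titles.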
TypeError: unorderable types: int() < str()
An error occurs when I apply the 5W1H extractor (an open-source library on Git) to my JSON news dataset. It is raised in the evaluate_location file when it tries to run:
raw_locations.sort(key=lambda x: x[1], reverse=True)
The console then reports:
TypeError: unorderable types: int() < str()
My question is: does this mean something is wrong with my dataset format? If so, shouldn't the extractor treat each news item as one long string when it works on this corpus? I'm eagerly looking for a solution to this problem. This is one of the JSON news items:
{
    "title": "Football: Van Dijk, Ronaldo and Messi shortlisted for FIFA award",
    "body": "ROME: Liverpool centre-back Virgil van Dijk is on the shortlist to add FIFA's best player award to his UEFA Men's Player of the Year honour.The Dutch international denied Cristiano Ronaldo and Lionel Messi for the European title last week and the same trio are in the running for the FIFA accolade to be announced in Milan on September 23. Van Dijk starred in Liverpool's triumphant Champions League campaign.England full-back Lucy Bronze won UEFA's women's award and is on FIFA's shortlist with the United States' World Cup-winning duo Megan Rapinoe and Alex Morgan.Manchester City boss Pep Guardiola is up against Liverpool's Jurgen Klopp and Mauricio Pochettino of Tottenham for best men's coach.Phil Neville, who led England's women to a World Cup semi-final, is up for the women's coach award with the USA's Jill Ellis and Sarina Wiegman who guided European champions the Netherlands to the World Cup final. FIFA Best shortlistsMen's player:Cristiano Ronaldo (Juventus/Portugal), Lionel Messi (Barcelona/Argentina), Virgil van Dijk player:Lucy Bronze (Lyon/England), Alex Morgan (Orlando Pride/USA), Megan Rapinoe (Reign FC/USA)Men's coach:Pep Guardiola (Manchester City), Jurgen Klopp (Liverpool), Mauricio Pochettino (Tottenham)Women's coach:Jill Ellis (USA), Phil Neville (England), Sarina Wiegman (Netherlands)Women's goalkeeper:Christiane Endler (Paris St-Germain/Chile), Hedvig Lindahl (Wolfsburg/Sweden), Sari van Veenendaal (Atletico Madrid/Netherlands)Men's goalkeeper:Alisson (Liverpool/Brazil), Ederson (Manchester City/Brazil), Marc-Andre ter Stegen (Barcelona/Germany)Puskas award (for best goal):Lionel Messi (Barcelona v Real Betis), Juan Quintero (River Plate v Racing Club), Daniel Zsori (Debrecen v Ferencvaros)",
    "published_at": "2019-09-02",
}
Code:
json_file = open("./Labeled.json","r",encoding="utf-8")
data = json.load(json_file)
if __name__ == '__main__':
    # logger setup
    log = logging.getLogger('GiveMe5W')
    log.setLevel(logging.DEBUG)
    sh = logging.StreamHandler()
    sh.setLevel(logging.DEBUG)
    log.addHandler(sh)
    # giveme5w setup - with defaults
    extractor = MasterExtractor()
    Document()
    for i in range(0,1000):
        body = data[i]["body"]
        #print(body)
        #for line in body:
        #print(line[0:line.find('\n')])
        #head = re.sub("[^A-Z\d]", "", "")
        head = re.search("^[^\n]*", body).group(0)
        head = str(head)
        title = data[i]["title"]
        title = str(title)
        body = data[i]["body"]
        body = str(body)
        published_at = data[i]["published_at"]
        published_at = str(published_at)
        doc1 = Document(title,head,body,published_at)
        doc = extractor.parse(doc1)
Instead of returning the extracted time and location result, it gave me this error:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractor.py", line 20, in run
    extractor.process(document)
  File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractors/abs_extractor.py", line 41, in process
    self._evaluate_candidates(document)
  File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractors/environment_extractor.py", line 75, in _evaluate_candidates
    locations = self._evaluate_locations(document)
  File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractors/environment_extractor.py", line 224, in _evaluate_locations
    raw_locations.sort(key=lambda x: x[1], reverse=True)
TypeError: unorderable types: int() < str()
raw_locations is built in the same file, at line 219:
raw_locations.append([parts, location.raw['place_id'], location.point, bb, area, 0, 0, candidate, 0])
Thus, the sort function tries to sort the locations by their place_id (the element at index 1). Python 3 cannot compare an int with a str, so the sort fails as soon as both types appear. Please check whether your data includes both strings and numbers for place_id. If so, you need to convert all entries to one type.
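A minimal sketch of the "convert all entries to one type" idea, using made-up rows rather than real Giveme5W1H data. It assumes every place_id looks like an integer; if some are not numeric, converting everything with str() instead would at least make the sort run, though the order would then be lexicographic.

```python
# Made-up stand-ins for raw_locations entries; index 1 plays the role
# of place_id and deliberately mixes int and str values.
raw_locations = [
    ["Rome", 12345, None],
    ["Milan", "67890", None],
    ["Liverpool", 222, None],
]


def place_id_key(entry):
    # Normalise place_id to int so that all keys are comparable.
    return int(entry[1])


raw_locations.sort(key=place_id_key, reverse=True)
print([row[0] for row in raw_locations])
```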
posting more than one line of text on fb status [Python] [Facebook API]
I started this project in the school holidays as a way to keep practising and enhancing my Python knowledge. In short, the code is a Facebook bot that randomly generates NBA teams, players and positions, which should look like this when run:
Houston Rockets
[PG] Ryan Anderson
[PF] Michael Curry
[SF] Marcus Morris
[C] Bob Royer
[SF] Brian Heaney
I'm currently having trouble posting this to my Facebook page: instead of posting 1 team and 5 players/positions, the program only posts a single player, like this:
Ryan Anderson
Here is my code:
import os
import random
import facebook

token = "...."
fb = facebook.GraphAPI(access_token = token)
parent_dir = "../NBAbot"
os.chdir(parent_dir)
file_name = "nba_players.txt"

def random_position():
    """Random Team from list"""
    position = ['[PG]','[SG]','[SF]','[PF]','[C]',]
    random.shuffle(position)
    position = position.pop()
    return(position)

def random_team():
    """Random Team from list"""
    Team = ['Los Angeles Lakers','Golden State Warriors','Toronto Raptors','Boston Celtics','Cleveland Cavaliers','Houston Rockets','San Antonio Spurs','New York Knicks','Chicago Bulls','Minnesota Timberwolves','Philadelphia 76ers','Miami Heat','Milwaukee','Portland Trail Blazers','Dallas Mavericks','Phoenix Suns','Denver Nuggets','Utah Jazz','Indiana Pacers','Los Angeles Clippers','Washington Wizards','Brooklyn Nets','New Orleans Pelicans','Sacramento Kings','Atlanta Hawks','Detroit Pistons','Memphis Grizzlies','Charlotte Hornets','Orlando Magic']
    random.shuffle(Team)
    Team = Team.pop()
    return(Team)

def random_player(datafile):
    read_mode = "r"
    with open (datafile, read_mode) as read_file:
        the_line = read_file.readlines()
        return(random.choice(the_line))

def main():
    return(
        random_team(),
        random_position(),
        random_player(file_name),
        random_position(),
        random_player(file_name),
        random_position(),
        random_player(file_name),
        random_position(),
        random_player(file_name),
        random_position(),
        random_player(file_name))

fb.put_object(parent_object="me", connection_name='feed', message=main())
Any help is appreciated.
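One likely cause, offered as a guess rather than a confirmed diagnosis: main() returns a tuple of eleven separate values, while a status message is normally a single string. A minimal sketch of building one multi-line string instead is shown below; build_message and its arguments are hypothetical names, and the Facebook call itself is left out so the snippet runs on its own.

```python
def build_message(team, roster):
    """Join a team name and (position, player) pairs into one string."""
    lines = [team]
    for position, player in roster:
        # Lines read with readlines() keep their trailing newline,
        # so strip it before formatting.
        lines.append(f"{position} {player.strip()}")
    return "\n".join(lines)


message = build_message(
    "Houston Rockets",
    [("[PG]", "Ryan Anderson\n"), ("[PF]", "Michael Curry\n")],
)
print(message)
```

The resulting string could then be passed as the message argument in place of the tuple returned by main().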
BeautifulSoup, extract a table (from poorly designed site) and turn it into a CSV
I'm trying to extract this table in whole - any tips? I've tried the following code 8 different ways, to no avail. Thank you!
data = []
table = soup.find_all("tbody")
rows = table.find_all("tr")
for row in rows:
    cols = row.find_all("td")
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])
Code:
import requests
from bs4 import BeautifulSoup

html = requests.get('http://www.boxofficemojo.com/alltime/adjusted.htm').text
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', cellspacing='1')
f = open('data.csv','w')
for row in table.find_all('tr'):
    print(''.join(row.findAll(text=True)).replace('\n', '|'))
    f.write(''.join(row.findAll(text=True)).replace('\n', '|') + '\n')
f.close()
Output:
1|Gone with the Wind|MGM|$1,854,769,700|$198,676,459|1939^|
2|Star Wars|Fox|$1,635,137,900|$460,998,007|1977^|
3|The Sound of Music|Fox|$1,307,373,200|$158,671,368|1965|
4|E.T.: The Extra-Terrestrial|Uni.|$1,302,222,800|$435,110,554|1982^|
5|Titanic|Par.|$1,244,347,300|$659,363,944|1997^|
6|The Ten Commandments|Par.|$1,202,580,000|$65,500,000|1956|
7|Jaws|Uni.|$1,175,763,500|$260,000,000|1975|
8|Doctor Zhivago|MGM|$1,139,563,500|$111,721,910|1965|
9|The Exorcist|WB|$1,015,300,400|$232,906,145|1973^|
10|Snow White and the Seven Dwarfs|Dis.|$1,000,620,000|$184,925,486|1937^|
11|Star Wars: The Force Awakens|BV|$992,496,600|$936,662,225|2015|
12|101 Dalmatians|Dis.|$917,240,400|$144,880,014|1961^|
13|The Empire Strikes Back|Fox|$901,298,200|$290,475,067|1980^|
14|Ben-Hur|MGM|$899,640,000|$74,000,000|1959|
15|Avatar|Fox|$893,301,900|$760,507,625|2009^|
16|Return of the Jedi|Fox|$863,465,400|$309,306,177|1983^|
17|Jurassic Park|Uni.|$843,843,500|$402,453,882|1993^|
18|Star Wars: Episode I - The Phantom Menace|Fox|$829,064,800|$474,544,677|1999^|
19|The Lion King|BV|$818,364,200|$422,783,777|1994^|
20|The Sting|Uni.|$818,331,400|$156,000,000|1973|
21|Raiders of the Lost Ark|Par.|$812,675,900|$248,159,971|1981^|
22|The Graduate|AVCO|$785,595,300|$104,945,305|1967^|
23|Fantasia|Dis.|$762,339,100|$76,408,097|1941^|
24|Jurassic World|Uni.|$725,671,700|$652,270,625|2015|
25|The Godfather|Par.|$724,509,200|$134,966,411|1972^|
26|Forrest Gump|Par.|$721,682,300|$330,252,182|1994^|
27|Mary Poppins|Dis.|$717,709,100|$102,272,727|1964^|
28|Grease|Par.|$706,577,200|$188,755,690|1978^|
29|Marvel's The Avengers|BV|$705,769,500|$623,357,910|2012|
30|Thunderball|UA|$686,664,000|$63,595,658|1965|
31|The Dark Knight|WB|$683,575,000|$534,858,444|2008^|
32|The Jungle Book|Dis.|$676,381,600|$141,843,612|1967^|
33|Sleeping Beauty|Dis.|$667,166,200|$51,600,000|1959^|
34|Ghostbusters|Col.|$653,374,800|$242,212,467|1984^|
35|Shrek 2|DW|$652,247,500|$441,226,247|2004|
36|Butch Cassidy and the Sundance Kid|Fox|$647,721,100|$102,308,889|1969|
37|Love Story|Par.|$642,583,000|$106,397,186|1970|
38|Spider-Man|Sony|$637,870,000|$403,706,375|2002|
39|Independence Day|Fox|$635,888,300|$306,169,268|1996^|
40|Home Alone|Fox|$621,799,900|$285,761,243|1990|
41|Pinocchio|Dis.|$618,762,600|$84,254,167|1940^|
42|Cleopatra (1963)|Fox|$616,744,200|$57,777,778|1963|
43|Beverly Hills Cop|Par.|$616,437,200|$234,760,478|1984|
44|Star Wars: The Last Jedi|BV|$615,738,300|$615,738,279|2017|
45|Goldfinger|UA|$608,634,000|$51,081,062|1964|
46|Airport|Uni.|$606,901,600|$100,489,151|1970|
47|American Graffiti|Uni.|$603,257,100|$115,000,000|1973|
48|The Robe|Fox|$600,872,700|$36,000,000|1953|
49|Pirates of the Caribbean: Dead Man's Chest|BV|$593,288,400|$423,315,812|2006|
50|Around the World in 80 Days|UA|$593,169,200|$42,000,000|1956|
51|Bambi|RKO|$584,880,300|$102,247,150|1942^|
52|Blazing Saddles|WB|$580,539,700|$119,601,481|1974^|
53|Batman|WB|$577,923,400|$251,188,924|1989|
54|The Bells of St. Mary's|RKO|$576,000,000|$21,333,333|1945|
55|The Lord of the Rings: The Return of the King|NL|$565,852,400|$377,845,905|2003^|
56|Finding Nemo|BV|$565,364,200|$380,843,261|2003^|
57|The Towering Inferno|Fox|$563,428,600|$116,000,000|1974|
58|Rogue One: A Star Wars Story|BV|$554,854,100|$532,177,324|2016|
59|Cinderella (1950)|Dis.|$553,567,100|$93,141,149|1950^|
60|Spider-Man 2|Sony|$552,257,300|$373,585,825|2004|
61|My Fair Lady|WB|$550,800,000|$72,000,000|1964|
62|The Greatest Show on Earth|Par.|$550,800,000|$36,000,000|1952|
63|National Lampoon's Animal House|Uni.|$549,792,700|$141,600,000|1978^|
64|The Passion of the Christ|NM|$548,090,400|$370,782,930|2004^|
65|Star Wars: Episode III - Revenge of the Sith|Fox|$544,599,700|$380,270,577|2005^|
66|Back to the Future|Uni.|$542,085,000|$210,609,762|1985|
67|The Lord of the Rings: The Two Towers|NL|$529,918,100|$342,551,365|2002^|
68|The Dark Knight Rises|WB|$528,601,000|$448,139,099|2012|
69|The Sixth Sense|BV|$528,576,400|$293,506,292|1999|
70|Superman|WB|$526,547,600|$134,218,018|1978|
71|Tootsie|Col.|$522,378,200|$177,200,000|1982|
72|Smokey and the Bandit|Uni.|$521,726,300|$126,737,428|1977|
73|Beauty and the Beast (2017)|BV|$521,407,600|$504,014,165|2017|
74|Finding Dory|BV|$515,531,300|$486,295,561|2016|
75|West Side Story|MGM|$513,807,200|$43,656,822|1961|
76|Close Encounters of the Third Kind|Col.|$513,370,800|$135,189,114|1977^|
77|Harry Potter and the Sorcerer's Stone|WB|$513,281,200|$317,575,550|2001|
78|Lady and the Tramp|Dis.|$511,646,200|$93,602,326|1955^|
79|Lawrence of Arabia|Col.|$508,421,000|$44,824,144|1962^|
80|The Rocky Horror Picture Show|Fox|$505,537,300|$112,892,319|1975|
81|Rocky|UA|$505,267,000|$117,235,147|1976|
82|The Best Years of Our Lives|RKO|$504,900,000|$23,650,000|1946|
83|The Poseidon Adventure|Fox|$504,000,000|$84,563,118|1972|
84|The Lord of the Rings: The Fellowship of the Ring|NL|$503,057,400|$315,544,750|2001^|
85|Twister|WB|$502,037,000|$241,721,524|1996|
86|Men in Black|Sony|$501,381,100|$250,690,539|1997|
87|The Bridge on the River Kwai|Col.|$499,392,000|$27,200,000|1957|
88|Transformers: Revenge of the Fallen|P/DW|$494,810,500|$402,111,870|2009|
89|It's a Mad, Mad, Mad, Mad World|MGM|$494,576,300|$46,332,858|1963|
90|Swiss Family Robinson|Dis.|$493,957,400|$40,356,000|1960|
91|One Flew Over the Cuckoo's Nest|UA|$492,831,600|$108,981,275|1975|
92|M.A.S.H.|Fox|$492,821,000|$81,600,000|1970|
93|Indiana Jones and the Temple of Doom|Par.|$491,431,300|$179,870,271|1984|
94|Avengers: Age of Ultron|BV|$491,377,100|$459,005,868|2015|
95|Star Wars: Episode II - Attack of the Clones|Fox|$490,840,600|$310,676,740|2002^|
96|Toy Story 3|BV|$489,656,000|$415,004,880|2010|
97|Mrs. Doubtfire|Fox|$483,642,600|$219,195,243|1993|
98|Aladdin|BV|$481,420,700|$217,350,219|1992|
99|Ghost|Par.|$472,450,700|$217,631,306|1990|
100|The Hunger Games: Catching Fire|LGF|$469,232,400|$424,668,047|2013|
101|Duel in the Sun|Selz.|$468,367,300|$20,408,163|1946|
102|The Hunger Games|LGF|$466,924,700|$408,010,692|2012|
103|Pirates of the Caribbean: The Curse of the Black Pearl|BV|$464,956,900|$305,413,918|2003|
104|House of Wax|WB|$463,883,000|$23,750,000|1953|
105|Rear Window|Par.|$462,256,500|$36,764,313|1954^|
106|The Lost World: Jurassic Park|Uni.|$458,173,400|$229,086,679|1997|
107|Indiana Jones and the Last Crusade|Par.|$453,643,400|$197,171,806|1989|
108|Monsters, Inc.|BV|$453,061,600|$289,916,256|2001^|
109|Frozen|BV|$450,196,500|$400,738,009|2013|
110|Spider-Man 3|Sony|$449,033,200|$336,530,303|2007|
111|Iron Man 3|BV|$448,060,700|$409,013,994|2013|
112|Terminator 2: Judgment Day|TriS|$447,732,400|$205,881,154|1991^|
113|Sergeant York|WB|$441,770,900|$16,361,885|1941|
114|How the Grinch Stole Christmas|Uni.|$441,620,600|$260,044,825|2000|
115|Top Gun|Par.|$440,917,900|$179,800,601|1986^|
116|Harry Potter and the Deathly Hallows Part 2|WB|$440,547,300|$381,011,219|2011|
117|Toy Story 2|BV|$439,139,300|$245,852,179|1999^|
118|Shrek|DW|$434,128,000|$267,665,011|2001|
119|Shrek the Third|P/DW|$430,606,000|$322,719,944|2007|
120|Despicable Me 2|Uni.|$430,487,800|$368,061,265|2013|
121|Captain America: Civil War|BV|$429,213,000|$408,084,349|2016|
122|The Matrix Reloaded|WB|$428,668,600|$281,576,461|2003|
123|Transformers|P/DW|$425,970,900|$319,246,193|2007|
124|Crocodile Dundee|Par.|$424,138,600|$174,803,506|1986|
125|Wonder Woman|WB|$423,340,500|$412,563,408|2017|
126|The Four Horsemen of the Apocalypse|MPC|$421,530,600|$9,183,673|1921|
127|Saving Private Ryan|DW|$419,958,100|$216,540,909|1998|
128|Young Frankenstein|Fox|$419,041,900|$86,273,333|1974|
129|Peter Pan|Dis.|$418,824,000|$87,404,651|1953^|
130|Gremlins|WB|$417,526,300|$153,083,102|1984^|
131|Beauty and the Beast|BV|$416,438,900|$218,967,620|1991^|
132|The Chronicles of Narnia: The Lion, the Witch and the Wardrobe|BV|$414,717,600|$291,710,957|2005|
133|Harry Potter and the Goblet of Fire|WB|$414,709,000|$290,013,036|2005|
134|Pirates of the Caribbean: At World's End|BV|$412,860,400|$309,420,425|2007|
135|Harry Potter and the Chamber of Secrets|WB|$412,327,800|$261,988,482|2002|
136|The Fugitive|WB|$407,567,300|$183,875,760|1993|
137|The Caine Mutiny|Col.|$407,479,600|$21,750,000|1954|
138|Iron Man|Par.|$407,095,000|$318,412,101|2008|
139|Transformers: Dark of the Moon|P/DW|$406,315,000|$352,390,543|2011|
140|Meet the Fockers|Uni.|$405,508,300|$279,261,160|2004|
141|Indiana Jones and the Kingdom of the Crystal Skull|Par.|$405,430,100|$317,101,119|2008|
142|Toy Story|BV|$402,711,200|$191,796,233|1995^|
143|Dances with Wolves|Orion|$401,159,500|$184,208,848|1990|
144|An Officer and a Gentleman|Par.|$400,769,900|$129,795,554|1982|
145|Guardians of the Galaxy Vol. 2|BV|$399,848,900|$389,813,101|2017|
146|2001: A Space Odyssey|MGM|$397,829,200|$56,954,992|1968^|
147|Rain Man|MGM|$397,417,800|$172,825,435|1988|
148|The Secret Life of Pets|Uni.|$397,253,600|$368,384,330|2016|
149|Guess Who's Coming to Dinner|Col.|$397,099,200|$56,666,667|1967|
150|Inside Out|BV|$396,452,900|$356,461,711|2015|
151|American Sniper|WB|$395,474,400|$350,126,372|2014|
152|Kramer Vs. Kramer|Col.|$394,925,800|$106,260,000|1979|
153|Armageddon|BV|$394,560,300|$201,578,182|1998|
154|Psycho|Uni.|$391,680,100|$32,000,000|1960|
155|Rocky III|UA|$390,271,700|$125,049,125|1982^|
156|Harry Potter and the Order of the Phoenix|WB|$389,622,600|$292,004,738|2007|
157|Rambo: First Blood Part II|TriS|$388,961,600|$150,415,432|1985|
158|Batman Forever|WB|$388,369,100|$184,031,112|1995|
159|Deadpool|Fox|$388,249,600|$363,070,709|2016|
160|Pretty Woman|BV|$387,179,600|$178,406,268|1990|
161|Earthquake|Uni.|$386,952,300|$79,666,653|1974|
162|Alice in Wonderland (2010)|BV|$385,896,200|$334,191,110|2010|
163|The Incredibles|BV|$385,835,000|$261,441,092|2004|
164|Cast Away|Fox|$384,588,700|$233,632,142|2000|
165|Home Alone 2: Lost in New York|Fox|$384,179,200|$173,585,516|1992|
166|The Jungle Book (2016)|BV|$382,904,500|$364,001,123|2016|
167|Three Men and a Baby|BV|$382,840,700|$167,780,960|1987|
168|My Big Fat Greek Wedding|IFC|$380,230,800|$241,438,208|2002|
169|Guardians of the Galaxy|BV|$378,010,100|$333,176,600|2014|
170|Furious 7|Uni.|$376,598,400|$353,007,020|2015|
171|Mission: Impossible|Par.|$375,885,400|$180,981,856|1996|
172|The Hunger Games: Mockingjay - Part 1|LGF|$373,872,900|$337,135,885|2014|
173|Minions|Uni.|$373,756,800|$336,045,770|2015|
174|Saturday Night Fever|Par.|$372,751,500|$94,213,184|1977|
175|On Golden Pond|Uni.|$372,564,100|$119,285,432|1981|
176|Austin Powers: The Spy Who Shagged Me|NL|$372,332,300|$206,040,086|1999|
177|Harry Potter and the Half-Blood Prince|WB|$371,524,900|$301,959,197|2009|
178|Bruce Almighty|Uni.|$369,680,400|$242,829,261|2003|
179|Harry Potter and the Prisoner of Azkaban|WB|$368,886,800|$249,541,069|2004|
180|Funny Girl|Col.|$367,562,200|$52,223,306|1968^|
181|Mission: Impossible II|Par.|$366,876,200|$215,409,889|2000|
182|Rush Hour 2|NL|$366,817,700|$226,164,286|2001|
183|Apollo 13|Uni.|$365,894,000|$173,837,933|1995^|
184|Patton|Fox|$365,718,000|$61,749,765|1970|
185|Fatal Attraction|Par.|$364,269,300|$156,645,693|1987|
186|Zootopia|BV|$363,584,000|$341,268,248|2016|
187|Liar Liar|Uni.|$362,821,200|$181,410,615|1997|
188|Robin Hood: Prince of Thieves|WB|$360,863,200|$165,493,908|1991|
189|Beverly Hills Cop II|Par.|$360,778,800|$153,665,036|1987|
190|Iron Man 2|Par.|$360,772,100|$312,433,331|2010|
191|Up|BV|$360,533,300|$293,004,164|2009|
192|Batman Returns|WB|$360,191,600|$162,831,698|1992|
193|Signs|BV|$360,164,800|$227,966,634|2002|
194|Jumanji: Welcome to the Jungle|Sony|$358,036,900|$358,036,871|2017|
195|The Twilight Saga: Eclipse|Sum.|$357,823,200|$300,531,751|2010|
196|Superman II|WB|$357,246,300|$108,185,706|1981|
197|The Twilight Saga: New Moon|Sum.|$357,194,500|$296,623,634|2009|
198|What's Up, Doc?|WB|$356,400,000|$66,000,000|1972|
199|9 to 5|Fox|$352,493,200|$103,290,500|1980|
200|Batman v Superman: Dawn of Justice|WB|$351,232,600|$330,360,194|2016|
201|The Firm|Par.|$351,120,300|$158,348,367|1993|
202|Suicide Squad|WB|$350,483,800|$325,100,054|2016|
203|Who Framed Roger Rabbit|BV|$349,448,400|$156,452,370|1988|
204|Inception|WB|$348,133,400|$292,576,195|2010|
205|Skyfall|Sony|$347,389,600|$304,360,277|2012|
206|The Hobbit: An Unexpected Journey|WB (NL)|$347,313,400|$303,003,568|2012|
207|Porky's|Fox|$346,289,600|$111,289,673|1982^|
208|Air Force One|Sony|$345,835,200|$172,956,409|1997|
209|Stir Crazy|Col.|$345,700,400|$101,300,000|1980|
210|A Star Is Born (1976)|WB|$344,788,700|$80,000,000|1976|
211|There's Something About Mary|Fox|$344,053,800|$176,484,651|1998|
212|Spider-Man: Homecoming|Sony|$343,499,000|$334,201,140|2017|
213|Cars|BV|$342,088,800|$244,082,982|2006|
214|The Hangover|WB|$341,182,900|$277,322,503|2009|
215|Lethal Weapon 2|WB|$340,501,700|$147,253,986|1989|
216|Night at the Museum|Fox|$340,041,900|$250,863,268|2006|
217|Harry Potter and the Deathly Hallows Part 1|WB|$339,560,700|$295,983,305|2010|
218|I Am Legend|WB|$337,126,200|$256,393,010|2007|
219|Austin Powers in Goldmember|NL|$337,033,800|$213,307,889|2002|
220|War of the Worlds|Par.|$335,521,600|$234,280,354|2005|
221|It|WB (NL)|$335,148,900|$327,481,748|2017|
222|Every Which Way But Loose|WB|$334,232,400|$85,196,485|1978|
223|The Twilight Saga: Breaking Dawn Part 2|LG/S|$333,495,700|$292,324,737|2012|
224|The Love Bug|Dis.|$331,410,900|$51,264,000|1969|
225|The Twilight Saga: Breaking Dawn Part 1|Sum.|$329,680,800|$281,287,133|2011|
226|You Only Live Twice|UA|$329,598,600|$43,084,787|1967|
227|X-Men: The Last Stand|Fox|$328,465,300|$234,362,462|2006|
228|The Mummy Returns|Uni.|$327,657,500|$202,019,785|2001|
229|X2: X-Men United|Fox|$327,236,800|$214,949,694|2003|
230|Platoon|Orion|$325,302,500|$138,530,565|1986|
231|Rocky IV|UA|$324,855,400|$127,873,716|1985|
232|Pearl Harbor|BV|$322,017,800|$198,542,554|2001|
233|True Lies|Fox|$321,261,400|$146,282,411|1994|
234|Heaven Can Wait (1978)|Par.|$320,281,100|$81,640,278|1978|
235|Lethal Weapon 3|WB|$320,153,100|$144,731,527|1992|
236|Look Who's Talking|TriS|$319,854,500|$140,088,813|1989|
237|Gladiator|DW|$319,592,900|$187,705,427|2000|
238|Man of Steel|WB|$318,830,300|$291,045,518|2013|
239|Jaws 2|Uni.|$318,717,900|$81,766,007|1978^|
240|Star Trek|Par.|$317,150,800|$257,730,019|2009|
241|The Santa Clause|BV|$316,776,400|$144,833,357|1994|
242|The Amityville Horror|AIP|$316,113,900|$86,432,000|1979|
243|Thor: Ragnarok|BV|$314,143,200|$314,143,225|2017|
244|The Waterboy|BV|$314,053,600|$161,491,646|1998|
245|A Bug's Life|BV|$313,363,900|$162,798,565|1998|
246|A Few Good Men|Col.|$313,069,200|$141,340,178|1992|
247|The Odd Couple|Par.|$312,030,500|$44,527,234|1968|
248|Rocky II|UA|$311,542,700|$85,182,160|1979|
249|Jerry Maguire|Sony|$311,468,800|$153,952,592|1996|
250|The Perfect Storm|WB|$311,027,300|$182,618,434|2000|
251|King Kong|Uni.|$310,014,100|$218,080,025|2005|
252|The Matrix|WB|$309,879,100|$171,479,930|1999|
253|The Amazing Spider-Man|Sony|$309,163,500|$262,030,663|2012|
254|Tarzan|BV|$309,122,000|$171,091,819|1999|
255|Sister Act|BV|$308,813,300|$139,605,150|1992|
256|Hooper|WB|$306,000,000|$78,000,000|1978|
257|The Blind Side|WB|$305,701,600|$255,959,475|2009|
258|The Da Vinci Code|Sony|$304,882,700|$217,536,138|2006|
259|Monsters University|BV|$304,779,900|$268,492,764|2013|
260|All the President's Men|WB|$304,276,100|$70,600,000|1976|
261|What Women Want|Par.|$303,763,400|$182,811,707|2000|
262|The Bourne Ultimatum|Uni.|$303,515,200|$227,471,070|2007|
263|Gravity|WB|$302,369,300|$274,092,705|2013|
264|Honey, I Shrunk the Kids|BV|$302,279,100|$130,724,172|1989|
265|Terms of Endearment|Par.|$301,824,600|$108,423,489|1983|
266|Men in Black II|Sony|$300,868,300|$190,418,803|2002|
267|Star Trek: The Motion Picture|Par.|$300,849,700|$82,258,456|1979|
268|Wedding Crashers|NL|$299,683,200|$209,255,921|2005|
269|Despicable Me|Uni.|$299,217,100|$251,513,985|2010|
270|Pocahontas|BV|$298,782,100|$141,579,773|1995|
271|Arthur|WB|$298,725,900|$95,461,682|1981|
272|The Hunger Games: Mockingjay - Part 2|LGF|$297,446,700|$281,723,902|2015|
273|The LEGO Movie|WB|$296,654,200|$257,760,692|2014|
274|Batman Begins|WB|$295,860,600|$206,852,432|2005^|
275|Apocalypse Now|MGM|$295,789,400|$83,471,511|1979^|
276|Charlie and the Chocolate Factory|WB|$295,677,800|$206,459,076|2005|
277|Big Daddy|Sony|$295,422,100|$163,479,795|1999|
278|Ocean's Eleven|WB|$294,446,200|$183,417,150|2001|
279|Jurassic Park III|Uni.|$293,844,100|$181,171,875|2001|
280|Teenage Mutant Ninja Turtles|NL|$293,555,800|$135,265,915|1990|
281|Planet of the Apes (2001)|Fox|$291,948,200|$180,011,740|2001|
282|Alien|Fox|$291,755,600|$80,931,801|1979^|
283|Hancock|Sony|$291,441,100|$227,946,274|2008|
284|As Good as It Gets|Sony|$290,776,100|$148,478,011|1997|
285|The Hangover Part II|WB|$289,972,400|$254,464,305|2011|
286|Midnight Cowboy|UA|$289,525,900|$44,785,053|1969|
287|The Hobbit: The Desolation of Smaug|WB (NL)|$289,308,500|$258,366,855|2013|
288|The French Connection|Fox|$287,640,000|$51,700,000|1971|
289|The Flintstones|Uni.|$286,669,000|$130,531,208|1994|
290|Captain America: The Winter Soldier|BV|$286,373,800|$259,766,572|2014|
291|Coming to America|Par.|$286,238,000|$128,152,301|1988|
292|National Treasure: Book of Secrets|BV|$286,164,000|$219,964,115|2007|
293|WALL-E|BV|$286,150,300|$223,808,164|2008|
294|The Hobbit: The Battle of the Five Armies|WB (NL)|$285,304,300|$255,119,788|2014|
295|The Silence of the Lambs|Orion|$285,087,900|$130,742,922|1991|
296|The Karate Kid Part II|Col.|$284,812,500|$115,103,979|1986|
297|Airplane!|Par.|$284,796,800|$83,453,539|1980|
298|Alvin and the Chipmunks|Fox|$284,128,700|$217,326,974|2007|
299|Meet the Parents|Uni.|$282,676,300|$166,244,045|2000|
300|Ransom|BV|$282,366,800|$136,492,681|1996|
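The rows above are pipe-delimited rather than comma-separated, and many fields contain commas themselves ("$1,854,769,700", "Honey, I Shrunk the Kids"). If a true CSV file is wanted, one option is to let the csv module handle the quoting. The sketch below works on two rows copied from the output and writes to an in-memory buffer instead of data.csv; the variable names are illustrative.

```python
import csv
import io

# Two rows copied from the output above.
pipe_rows = [
    "1|Gone with the Wind|MGM|$1,854,769,700|$198,676,459|1939^|",
    "264|Honey, I Shrunk the Kids|BV|$302,279,100|$130,724,172|1989|",
]

buffer = io.StringIO()  # stand-in for open('data.csv', 'w', newline='')
writer = csv.writer(buffer)
for row in pipe_rows:
    # Drop the empty field left by the final '|' before splitting.
    fields = row.rstrip("|").split("|")
    writer.writerow(fields)

csv_text = buffer.getvalue()
print(csv_text)
```

csv.writer wraps any field containing a comma in double quotes, so spreadsheet tools and csv.reader will read each row back as six columns.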
HTML scraping using Python: top box office list from the IMDb website
URL: http://www.imdb.com/chart/?ref_=nv_ch_cht_2
I want to print the top box office list from the site above: each movie's rank, title, weekend, gross, and weeks, in chart order.
Example output:
Rank:1 title: godzilla weekend:$93.2M Gross:$93.2M Weeks: 1
Rank: 2 title: Neighbours
This is a simple way to extract those entities with BeautifulSoup:
from bs4 import BeautifulSoup
from urllib.request import urlopen

url = "http://www.imdb.com/chart/?ref_=nv_ch_cht_2"
html = urlopen(url).read()
page = BeautifulSoup(html, 'html.parser')
# Select the chart rows, which carry the class "odd" or "even".
rows = page.find_all("tr", {'class': ['odd', 'even']})
for tr in rows:
    # Print the title, weekend/gross, and weeks cells of the current row.
    for td in tr.find_all("td", {'class': ['titleColumn', 'weeksColumn', 'ratingColumn']}):
        print(td.get_text())
P.S. Arrange the output however you like.
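The loop in the answer above prints every cell on its own line. A minimal sketch of grouping the cells per row, so that each movie becomes one record, could look like the following. It runs on a small hand-written HTML snippet rather than the live page (the real IMDb markup is an assumption here and may differ), and it uses the standard library's html.parser instead of BeautifulSoup so that it has no dependencies. The class names come from the answer; `SAMPLE_HTML`, `WANTED`, and `ChartRowParser` are hypothetical names.

```python
from html.parser import HTMLParser

# Hand-written sample markup; the structure of the real page is an assumption.
SAMPLE_HTML = """
<table class="chart">
  <tr class="odd">
    <td class="titleColumn"><a href="/title/tt1661199">Cinderella</a></td>
    <td class="ratingColumn">$67.88M</td>
    <td class="ratingColumn">$67.88M</td>
    <td class="weeksColumn">1</td>
  </tr>
  <tr class="even">
    <td class="titleColumn"><a href="/title/tt2199571">Run All Night</a></td>
    <td class="ratingColumn">$11.01M</td>
    <td class="ratingColumn">$11.01M</td>
    <td class="weeksColumn">1</td>
  </tr>
</table>
"""

WANTED = {"titleColumn", "ratingColumn", "weeksColumn"}


class ChartRowParser(HTMLParser):
    """Collect the text of the wanted <td> cells, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the current wanted cell

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if tag == "tr" and classes & {"odd", "even"}:
            self._row = []
        elif tag == "td" and self._row is not None and classes & WANTED:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a wanted cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = ChartRowParser()
parser.feed(SAMPLE_HTML)
for row in parser.rows:
    print(" | ".join(row))
```

Each entry of `parser.rows` holds the title, weekend, gross, and weeks of one movie, in the order the cells appear in the row.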
There is no need to scrape anything. See the answer I gave here: "How to scrape data from imdb business page?"
The Python script below gives you:
1) The list of top box office movies from IMDb
2) The cast list for each of them

from lxml.html import parse


def imdb_bo(no_of_movies=5):
    bo_url = 'http://www.imdb.com/chart/'
    bo_page = parse(bo_url).getroot()
    bo_table = bo_page.cssselect('table.chart')
    # Number of movie rows available in the chart table.
    bo_total = len(bo_table[0][2])
    # Never ask for more movies than the chart contains.
    count = min(no_of_movies, bo_total)
    movies = {}
    for i in range(0, count):
        mo = {}
        mo['url'] = 'http://www.imdb.com' + bo_page.cssselect('td.titleColumn')[i][0].get('href')
        mo['title'] = bo_page.cssselect('td.titleColumn')[i][0].text_content().strip()
        mo['year'] = bo_page.cssselect('td.titleColumn')[i][1].text_content().strip(" ()")
        # Each row has two ratingColumn cells: weekend first, then gross.
        mo['weekend'] = bo_page.cssselect('td.ratingColumn')[i*2].text_content().strip()
        mo['gross'] = bo_page.cssselect('td.ratingColumn')[(i*2)+1][0].text_content().strip()
        mo['weeks'] = bo_page.cssselect('td.weeksColumn')[i].text_content().strip()
        # Fetch the movie's own page to read its cast table.
        m_page = parse(mo['url']).getroot()
        m_casttable = m_page.cssselect('table.cast_list')
        flag = 0
        mo['cast'] = []
        for cast in m_casttable[0]:
            # Skip the first row of the cast table.
            if flag == 0:
                flag = 1
            else:
                m_starname = cast[1][0][0].text_content().strip()
                mo['cast'].append(m_starname)
        movies[i] = mo
    return movies


if __name__ == '__main__':
    no_of_movies = input("Enter no. of Box office movies to display:")
    bo_movies = imdb_bo(int(no_of_movies))
    for k, v in bo_movies.items():
        print('#' + str(k+1) + ' ' + v['title'] + ' (' + v['year'] + ')')
        print('URL: ' + v['url'])
        print('Weekend: ' + v['weekend'])
        print('Gross: ' + v['gross'])
        print('Weeks: ' + v['weeks'])
        print('Cast: ' + ', '.join(v['cast']))
        print('\n')

Output (run in terminal):
parag#parag-innovate:~/python$ python imdb_bo_scraper.py
Enter no. of Box office movies to display:3
#1 Cinderella (2015)
URL: http://www.imdb.com/title/tt1661199?ref_=cht_bo_1
Weekend: $67.88M
Gross: $67.88M
Weeks: 1
Cast: Cate Blanchett, Lily James, Richard Madden, Helena Bonham Carter, Nonso Anozie, Stellan Skarsgård, Sophie McShera, Holliday Grainger, Derek Jacobi, Ben Chaplin, Hayley Atwell, Rob Brydon, Jana Perez, Alex Macqueen, Tom Edden

#2 Run All Night (2015)
URL: http://www.imdb.com/title/tt2199571?ref_=cht_bo_2
Weekend: $11.01M
Gross: $11.01M
Weeks: 1
Cast: Liam Neeson, Ed Harris, Joel Kinnaman, Boyd Holbrook, Bruce McGill, Genesis Rodriguez, Vincent D'Onofrio, Lois Smith, Common, Beau Knapp, Patricia Kalember, Daniel Stewart Sherman, James Martinez, Radivoje Bukvic, Tony Naumovski

#3 Kingsman: The Secret Service (2014)
URL: http://www.imdb.com/title/tt2802144?ref_=cht_bo_3
Weekend: $6.21M
Gross: $107.39M
Weeks: 5
Cast: Adrian Quinton, Colin Firth, Mark Strong, Jonno Davies, Jack Davenport, Alex Nikolov, Samantha Womack, Mark Hamill, Velibor Topic, Sofia Boutella, Samuel L. Jackson, Michael Caine, Taron Egerton, Geoff Bell, Jordan Long
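The question asks for one line per movie (rank, title, weekend, gross, weeks), while the script above prints a multi-line block. A small sketch of rendering the same kind of dictionary in the one-line layout follows. It assumes the dictionary shape returned by `imdb_bo` (integer keys starting at 0, with `title`, `weekend`, `gross`, and `weeks` entries); `format_movie`, `format_chart`, and `sample_movies` are hypothetical names, and the sample values are copied from the output above so that nothing is fetched from the network.

```python
def format_movie(rank, movie):
    """Render one movie dict as a single line; extra keys are ignored."""
    return (
        "Rank: {rank} title: {title} weekend: {weekend} "
        "Gross: {gross} Weeks: {weeks}"
    ).format(rank=rank, **movie)


def format_chart(movies):
    """Render all movies in chart order, one line each."""
    lines = []
    for index in sorted(movies):
        # Keys start at 0, ranks start at 1.
        lines.append(format_movie(index + 1, movies[index]))
    return "\n".join(lines)


# Sample data taken from the terminal output shown in the answer.
sample_movies = {
    0: {"title": "Cinderella", "year": "2015",
        "weekend": "$67.88M", "gross": "$67.88M", "weeks": "1"},
    1: {"title": "Run All Night", "year": "2015",
        "weekend": "$11.01M", "gross": "$11.01M", "weeks": "1"},
    2: {"title": "Kingsman: The Secret Service", "year": "2014",
        "weekend": "$6.21M", "gross": "$107.39M", "weeks": "5"},
}

print(format_chart(sample_movies))
```

With the real scraper, `format_chart(imdb_bo(3))` would be the equivalent call, provided the page still matches the selectors the script relies on.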