pdfplumber | Extract text from dynamic column layouts - python
Attempted Solution at bottom of post.
I have near-working code that extracts the sentence containing a phrase, even when that sentence spans multiple lines.
However, some pages have columns, so the output is incorrect: text from separate columns is wrongly merged into a single bad sentence.
This problem has been addressed in the following posts:
Solution 1
Solution 2
Question:
How do I "if-condition" whether there are columns?
Pages may not have columns,
Pages may have more than 2 columns.
Pages may also have headers and footers (that can be left out).
Example .pdf with dynamic text layout: PDF (pg. 2).
Jupyter Notebook:
# pip install PyPDF2
# pip install pdfplumber
# ---
import pdfplumber
# ---
def scrape_sentence(phrase, lines, index):
    # -- Gather sentence 'phrase' occurs in --
    sentence = lines[index]
    print("-- sentence --", sentence)
    print("len(lines)", len(lines))
    # Previous lines
    pre_i, flag = index, 0
    while flag == 0:
        pre_i -= 1
        if pre_i <= 0:
            break
        sentence = lines[pre_i] + sentence
        if '.' in lines[pre_i] or '!' in lines[pre_i] or '?' in lines[pre_i] or ' • ' in lines[pre_i]:
            flag == 1
    print("\n", sentence)
    # Following lines
    post_i, flag = index, 0
    while flag == 0:
        post_i += 1
        if post_i >= len(lines):
            break
        sentence = sentence + lines[post_i]
        if '.' in lines[post_i] or '!' in lines[post_i] or '?' in lines[post_i] or ' • ' in lines[pre_i]:
            flag == 1
    print("\n", sentence)
    # -- Extract --
    sentence = sentence.replace('!', '.')
    sentence = sentence.replace('?', '.')
    sentence = sentence.split('.')
    sentence = [s for s in sentence if phrase in s]
    print(sentence)
    sentence = sentence[0].replace('\n', '').strip()  # first occurrence
    print(sentence)
    return sentence
# ---
phrase = 'Gulf Petrochemical Industries Company'
with pdfplumber.open('GPIC_Sustainability_Report_2016-v9_(lr).pdf') as opened_pdf:
    for page in opened_pdf.pages:
        text = page.extract_text()
        if text is None:
            continue
        lines = text.split('\n')
        i = 0
        sentence = ''
        while i < len(lines):
            if phrase in lines[i]:
                sentence = scrape_sentence(phrase, lines, i)
            i += 1
Example Incorrect Output:
-- sentence -- being a major manufacturer within the kingdom of In 2012, Gulf Petrochemical Industries Company becomes part of
len(lines) 47
Company (GPIC)gulf petrochemical industries company (gpic) is a leading joint venture setup and owned by the government of the kingdom of bahrain, saudi basic industries corporation (sabic), kingdom of saudi arabia and petrochemical industries company (pic), kuwait. gpic was set up for the purposes of manufacturing fertilizers and petrochemicals. being a major manufacturer within the kingdom of In 2012, Gulf Petrochemical Industries Company becomes part of
Company (GPIC)gulf petrochemical industries company (gpic) is a leading joint venture setup and owned by the government of the kingdom of bahrain, saudi basic industries corporation (sabic), kingdom of saudi arabia and petrochemical industries company (pic), kuwait. gpic was set up for the purposes of manufacturing fertilizers and petrochemicals. being a major manufacturer within the kingdom of In 2012, Gulf Petrochemical Industries Company becomes part of the global transformation for a sustainable future by committing to bahrain, gpic is also a proactive stakeholder within the United Nations Global Compact’s ten principles in the realms the kingdom and the region with our activities being of Human Rights, Labour, Environment and Anti-Corruption. represented by natural gas purchases, empowering bahraini nationals through training & employment, utilisation of local contractors and suppliers, energy consumption and other financial, commercial, environmental and social activities that arise as a part of our core operations within the kingdom.GPIC becomes an organizational stakeholder of Global Reporting for the purpose of clarity throughout this report, Initiative ( GRI) in 2014. By supporting GRI, Organizational ‘gpic’, ’we’ ‘us’, and ‘our’ refer to the gulf Stakeholders (OS) like GPIC, demonstrate their commitment to transparency, accountability and sustainability to a worldwide petrochemical industries company; ‘sabic’ refers to network of multi-stakeholders.the saudi basic industries corporation; ‘pic’ refers to the petrochemical industries company, kuwait; ‘nogaholding’ refers to the oil and gas holding company, kingdom of bahrain; and ‘board’ refers to our board of directors represented by a group formed by nogaholding, sabic and pic.the oil and gas holding company (nogaholding) is GPIC is a Responsible Care Company certified for RC 14001 since July 2010. 
We are committed to the safe, ethical and the business and investment arm of noga (national environmentally sound management of the petrochemicals oil and gas authority) and steward of the bahrain and fertilizers we make and export. Stakeholders’ well-being is government’s investment in the bahrain petroleum always a key priority at GPIC.company (bapco), the bahrain national gas company (banagas), the bahrain national gas expansion company (bngec), the bahrain aviation fuelling company (bafco), the bahrain lube base oil company, the gulf petrochemical industries company (gpic), and tatweer petroleum.GPIC SuStaInabIlIty RePoRt 2016 01ii GPIC SuStaInabIlIty RePoRt 2016 GPIC SuStaInabIlIty RePoRt 2016 01
[' being a major manufacturer within the kingdom of In 2012, Gulf Petrochemical Industries Company becomes part of the global transformation for a sustainable future by committing to bahrain, gpic is also a proactive stakeholder within the United Nations Global Compact’s ten principles in the realms the kingdom and the region with our activities being of Human Rights, Labour, Environment and Anti-Corruption']
being a major manufacturer within the kingdom of In 2012, Gulf Petrochemical Industries Company becomes part of the global transformation for a sustainable future by committing to bahrain, gpic is also a proactive stakeholder within the United Nations Global Compact’s ten principles in the realms the kingdom and the region with our activities being of Human Rights, Labour, Environment and Anti-Corruption
...
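Separate from the column issue, the `flag == 1` lines in `scrape_sentence` compare rather than assign, so both `while` loops only stop at their `break`. The second check also reads `lines[pre_i]` where `lines[post_i]` looks intended, and the `pre_i <= 0` test never includes the first line. A minimal sketch with those points changed, assuming lines should be joined with a space; `ends_sentence` and `SENTENCE_MARKS` are names introduced here, not part of the original code:

```python
import re

# Markers the original code treats as sentence boundaries.
SENTENCE_MARKS = (".", "!", "?", " • ")


def ends_sentence(line):
    """True if the line contains any sentence-boundary marker."""
    return any(mark in line for mark in SENTENCE_MARKS)


def scrape_sentence(phrase, lines, index):
    """Return the first sentence fragment around lines[index] containing phrase."""
    # Walk back to (and include) the nearest earlier line holding a boundary.
    first = index
    while first > 0:
        first -= 1
        if ends_sentence(lines[first]):
            break
    # Walk forward to (and include) the nearest later line holding a boundary.
    last = index
    while last < len(lines) - 1:
        last += 1
        if ends_sentence(lines[last]):
            break
    # Join with spaces so words from adjacent lines do not fuse together.
    block = " ".join(line.strip() for line in lines[first:last + 1])
    for fragment in re.split(r"[.!?]", block):
        if phrase in fragment:
            return fragment.strip()
    return ""
```

This only repairs the loop logic. It does not address the column merging, because the lines it receives are already interleaved.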
Attempted Minimal Solution:
This splits every page into a left half and a right half, whether or not the page actually has two columns.
# pip install PyPDF2
# pip install pdfplumber
# ---
import pdfplumber
import decimal
# ---
with pdfplumber.open('GPIC_Sustainability_Report_2016-v9_(lr).pdf') as opened_pdf:
    for page in opened_pdf.pages:
        left = page.crop((0, 0, decimal.Decimal(0.5) * page.width, decimal.Decimal(0.9) * page.height))
        right = page.crop((decimal.Decimal(0.5) * page.width, 0, page.width, page.height))
        l_text = left.extract_text()
        r_text = right.extract_text()
        print("\n -- l_text --", l_text)
        print("\n -- r_text --", r_text)
        text = str(l_text) + " " + str(r_text)
Please let me know if there is anything else I should clarify.
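One way to approach the "are there columns?" check is to look for vertical gutters: horizontal ranges that no word on the page overlaps. The sketch below assumes words come from pdfplumber's `page.extract_words()` (dicts with `x0` and `x1` keys) and that a gap of about 15 points separates columns. The threshold and the helper names `find_gutters`, `column_boxes` and `extract_text_by_column` are assumptions made for illustration.

```python
def find_gutters(words, min_gap=15.0):
    """Return (left, right) x-ranges at least min_gap wide that no word overlaps."""
    spans = sorted((float(w["x0"]), float(w["x1"])) for w in words)
    gutters = []
    if not spans:
        return gutters
    covered_to = spans[0][1]  # right edge of the text seen so far
    for x0, x1 in spans[1:]:
        if x0 - covered_to >= min_gap:
            gutters.append((covered_to, x0))
        covered_to = max(covered_to, x1)
    return gutters


def column_boxes(words, bbox, min_gap=15.0):
    """Split bbox (x0, top, x1, bottom) into one crop box per detected column."""
    x0, top, x1, bottom = (float(v) for v in bbox)
    # Cut in the middle of each gutter, so no word straddles a boundary.
    cuts = [(left + right) / 2 for left, right in find_gutters(words, min_gap)]
    edges = [x0] + cuts + [x1]
    return [(a, top, b, bottom) for a, b in zip(edges, edges[1:])]


def extract_text_by_column(page, min_gap=15.0):
    """Read a pdfplumber page column by column, left to right."""
    words = page.extract_words()
    texts = []
    for box in column_boxes(words, page.bbox, min_gap):
        texts.append(page.crop(box).extract_text() or "")
    return "\n".join(texts)


# Usage (untested against the report in the question):
# with pdfplumber.open(path) as pdf:
#     for page in pdf.pages:
#         text = extract_text_by_column(page)
```

A page without columns yields a single box, so no explicit `if` is needed, and three or more columns yield one box each. A header or footer that spans the full width would hide every gutter, so it is probably worth cropping those off before calling `extract_words()`.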
This answer lets you extract the text in its intended reading order.
From the Towards Data Science article "PDF Text Extraction in Python":
Compared with PyPDF2, PDFMiner’s scope is much more limited, it really focuses only on extracting the text from the source information of a pdf file.
from io import StringIO
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
def convert_pdf_to_string(file_path):
    output_string = StringIO()
    with open(file_path, 'rb') as in_file:
        parser = PDFParser(in_file)
        doc = PDFDocument(parser)
        rsrcmgr = PDFResourceManager()
        device = TextConverter(rsrcmgr, output_string, laparams=LAParams())
        interpreter = PDFPageInterpreter(rsrcmgr, device)
        for page in PDFPage.create_pages(doc):
            interpreter.process_page(page)
    return(output_string.getvalue())
file_path = '' # !
text = convert_pdf_to_string(file_path)
print(text)
Cleansing can be applied thereafter.
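What that cleansing looks like depends on the document. As a rough sketch, assuming the extracted string separates paragraphs with blank lines and pages with a form feed character, and that a hyphen at a line end followed by a lowercase letter marks a word split across lines (`clean_extracted_text` is a hypothetical helper name):

```python
import re


def clean_extracted_text(text):
    """Return a list of paragraphs with line breaks and split words repaired."""
    paragraphs = []
    # Treat page breaks like paragraph breaks, then split on blank lines.
    for block in re.split(r"\n\s*\n", text.replace("\f", "\n\n")):
        # Re-join words hyphenated across a line break, e.g. "manu-\nfacturing".
        joined = re.sub(r"-\n(?=[a-z])", "", block)
        # Collapse remaining newlines and runs of spaces into single spaces.
        joined = re.sub(r"\s+", " ", joined).strip()
        if joined:
            paragraphs.append(joined)
    return paragraphs
```

The de-hyphenation rule is a heuristic: it would also merge a genuinely hyphenated compound that happens to break at a line end.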
Related
Python Regex Capturing Multiple Matches in separate observations
I am trying to create variables location; contract items; contract code; federal aid using regex on the following text: PAGE 1 BID OPENING DATE 07/25/18 FROM 0.2 MILES WEST OF ICE HOUSE 07/26/18 CONTRACT NUMBER 03-2F1304 ROAD TO 0.015 MILES WEST OF CONTRACT CODE 'A ' LOCATION 03-ED-50-39.5/48.7 DIVISION HIGHWAY ROAD 44 CONTRACT ITEMS INSTALL SANDTRAPS AND PULLOUTS FEDERAL AID ACNH-P050-(146)E PAGE 1 BID OPENING DATE 07/25/18 IN EL DORADO COUNTY AT VARIOUS 07/26/18 CONTRACT NUMBER 03-2H6804 LOCATIONS ALONG ROUTES 49 AND 193 CONTRACT CODE 'C ' LOCATION 03-ED-0999-VAR 13 CONTRACT ITEMS TREE REMOVAL FEDERAL AID NONE PAGE 1 BID OPENING DATE 07/25/18 IN LOS ANGELES, INGLEWOOD AND 07/26/18 CONTRACT NUMBER 07-296304 CULVER CITY, FROM I-105 TO PORT CONTRACT CODE 'B ' LOCATION 07-LA-405-R21.5/26.3 ROAD UNDERCROSSING 55 CONTRACT ITEMS ROADWAY SAFETY IMPROVEMENT FEDERAL AID ACIM-405-3(056)E This text is from one word file; I'll be looping my code on multiple doc files. In the text above are three location; contract items; contract code; federal aid pairs. But when I use regex to create variables, only the first instance of each pair is included. 
The code I have right now is: # imports import os import pandas as pd import re import docx2txt import textract import antiword all_bod = [] all_cn = [] all_location = [] all_fedaid = [] all_contractcode = [] all_contractitems = [] all_file = [] text = ' PAGE 1 BID OPENING DATE 07/25/18 FROM 0.2 MILES WEST OF ICE HOUSE 07/26/18 CONTRACT NUMBER 03-2F1304 ROAD TO 0.015 MILES WEST OF CONTRACT CODE 'A ' LOCATION 03-ED-50-39.5/48.7 DIVISION HIGHWAY ROAD 44 CONTRACT ITEMS INSTALL SANDTRAPS AND PULLOUTS FEDERAL AID ACNH-P050-(146)E PAGE 1 BID OPENING DATE 07/25/18 IN EL DORADO COUNTY AT VARIOUS 07/26/18 CONTRACT NUMBER 03-2H6804 LOCATIONS ALONG ROUTES 49 AND 193 CONTRACT CODE 'C ' LOCATION 03-ED-0999-VAR 13 CONTRACT ITEMS TREE REMOVAL FEDERAL AID NONE PAGE 1 BID OPENING DATE 07/25/18 IN LOS ANGELES, INGLEWOOD AND 07/26/18 CONTRACT NUMBER 07-296304 CULVER CITY, FROM I-105 TO PORT CONTRACT CODE 'B ' LOCATION 07-LA-405-R21.5/26.3 ROAD UNDERCROSSING 55 CONTRACT ITEMS ROADWAY SAFETY IMPROVEMENT FEDERAL AID ACIM-405-3(056)E' bod1 = re.search('BID OPENING DATE \s+ (\d+\/\d+\/\d+)', text) bod2 = re.search('BID OPENING DATE\n\n(\d+\/\d+\/\d+)', text) if not(bod1 is None): bod = bod1.group(1) elif not(bod2 is None): bod = bod2.group(1) else: bod = 'NA' all_bod.append(bod) # creating contract number cn1 = re.search('CONTRACT NUMBER\n+(.*)', text) cn2 = re.search('CONTRACT NUMBER\s+(.........)', text) if not(cn1 is None): cn = cn1.group(1) elif not(cn2 is None): cn = cn2.group(1) else: cn = 'NA' all_cn.append(cn) # location location1 = re.search('LOCATION \s+\S+', text) location2 = re.search('LOCATION \n+\S+', text) if not(location1 is None): location = location1.group(0) elif not(location2 is None): location = location2.group(0) else: location = 'NA' all_location.append(location) # federal aid fedaid = re.search('FEDERAL AID\s+\S+', text) fedaid = fedaid.group(0) all_fedaid.append(fedaid) # contract code contractcode = re.search('CONTRACT CODE\s+\S+', text) contractcode = 
contractcode.group(0) all_contractcode.append(contractcode) # contract items contractitems = re.search('\d+ CONTRACT ITEMS', text) contractitems = contractitems.group(0) all_contractitems.append(contractitems) This code parses the only first instance of these variables in the text. contract-number location contract-items contract-code federal-aid 03-2F1304 03-ED-50-39.5/48.7 44 A ACNH-P050-(146)E But, I am trying to figure out a way to get all possible instances in different observations. contract-number location contract-items contract-code federal-aid 03-2F1304 03-ED-50-39.5/48.7 44 A ACNH-P050-(146)E 03-2H6804 03-ED-0999-VAR 13 C NONE 07-296304 07-LA-405-R21.5/26.3 55 B ACIM-405-3(056)E The all_variables in the code are for looping over multiple word files - we can ignore that if we want :). Any leads would be super helpful. Thanks so much!
import re
import pandas as pd
data = []
df = pd.DataFrame()
regex_contract_number = r"(?:CONTRACT NUMBER\s+(?P<contract_number>\S+?)\s)"
regex_location = r"(?:LOCATION\s+(?P<location>\S+))"
regex_contract_items = r"(?:(?P<contract_items>\d+)\sCONTRACT ITEMS)"
regex_federal_aid = r"(?:FEDERAL AID\s+(?P<federal_aid>\S+?)\s)"
regex_contract_code = r"(?:CONTRACT CODE\s+\'(?P<contract_code>\S+?)\s)"
regexes = [regex_contract_number, regex_location, regex_contract_items, regex_federal_aid, regex_contract_code]
# Collect every match of each pattern, then add it as one column of the frame.
for regex in regexes:
    for match in re.finditer(regex, text):
        data.append(match.groupdict())
    df = pd.concat([df, pd.DataFrame(data)], axis=1)
    data = []
df
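An alternative sketch, not the method used in the answer: split the text into one chunk per record first, then search each chunk, so a field missing from one record cannot shift values between rows. It assumes every record starts with "PAGE <n> BID OPENING DATE" and Python 3.7 or later (needed to split on a zero-width match). `FIELDS` and `parse_records` are names introduced here, and the sample is the first two records from the question.

```python
import re

# One capture group per field; patterns adapted from the question's text.
FIELDS = {
    "contract_number": r"CONTRACT NUMBER\s+(\S+)",
    "location": r"LOCATION\s+(\S+)",
    "contract_items": r"(\d+)\s+CONTRACT ITEMS",
    "contract_code": r"CONTRACT CODE\s+'(\S+)",
    "federal_aid": r"FEDERAL AID\s+(\S+)",
}


def parse_records(text):
    """Return one dict per record, with 'NA' for any field not found."""
    records = []
    # The lookahead keeps the "PAGE ..." marker at the start of each chunk.
    for chunk in re.split(r"(?=PAGE \d+ BID OPENING DATE)", text):
        if "CONTRACT NUMBER" not in chunk:
            continue  # skip empty or header-only chunks
        row = {}
        for name, pattern in FIELDS.items():
            match = re.search(pattern, chunk)
            row[name] = match.group(1) if match else "NA"
        records.append(row)
    return records


text = (
    "PAGE 1 BID OPENING DATE 07/25/18 FROM 0.2 MILES WEST OF ICE HOUSE 07/26/18 "
    "CONTRACT NUMBER 03-2F1304 ROAD TO 0.015 MILES WEST OF CONTRACT CODE 'A ' "
    "LOCATION 03-ED-50-39.5/48.7 DIVISION HIGHWAY ROAD 44 CONTRACT ITEMS "
    "INSTALL SANDTRAPS AND PULLOUTS FEDERAL AID ACNH-P050-(146)E "
    "PAGE 1 BID OPENING DATE 07/25/18 IN EL DORADO COUNTY AT VARIOUS 07/26/18 "
    "CONTRACT NUMBER 03-2H6804 LOCATIONS ALONG ROUTES 49 AND 193 CONTRACT CODE 'C ' "
    "LOCATION 03-ED-0999-VAR 13 CONTRACT ITEMS TREE REMOVAL FEDERAL AID NONE"
)
records = parse_records(text)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(records)` to get one row per contract.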
Beautiful Soup Craigslist Scraping Pricing is the same
I am trying to scrape Craigslist using BeautifulSoup4. All data shows properly EXCEPT price. I can't seem to find the right tagging to loop through pricing instead of showing the same price for each post.
import requests
from bs4 import BeautifulSoup
source = requests.get('https://washingtondc.craigslist.org/search/nva/sss?query=5%20hp%20boat%20motor&sort=rel').text
soup = BeautifulSoup(source, 'lxml')
for summary in soup.find_all('p', class_='result-info'):
    pricing = soup.find('span', class_='result-price')
    price = pricing
    title = summary.a.text
    url = summary.a['href']
    print(title + '\n' + price.text + '\n' + url + '\n')
Left: HTML code from Craigslist, commented out is irrelevant (in my opinion) code. I want pricing to not loop the same number.
Right: Sublime SS of code.
Snippet of code running through terminal. Pricing is the same for each post.
Thank you
Your script is almost correct. You need to change the soup object for the price to summary import requests from bs4 import BeautifulSoup source = requests.get('https://washingtondc.craigslist.org/search/nva/sss?query=5%20hp%20boat%20motor&sort=rel').text soup = BeautifulSoup(source, 'lxml') for summary in soup.find_all('p', class_='result-info'): price = summary.find('span', class_='result-price') title = summary.a.text url = summary.a['href'] print(title + '\n' + price.text + '\n' + url + '\n') Output: Boat Water Tender - 10 Tri-Hull with Electric Trolling Motor $629 https://washingtondc.craigslist.org/nva/boa/d/haymarket-boat-water-tender-10-tri-hull/7160572264.html 1987 Boston Whaler Montauk 17 $25450 https://washingtondc.craigslist.org/nva/boa/d/alexandria-1987-boston-whaler-montauk-17/7163033134.html 1971 Westerly Warwick Sailboat $3900 https://washingtondc.craigslist.org/mld/boa/d/upper-marlboro-1971-westerly-warwick/7170495800.html Buy or Rent. DC Party Pontoon for Dock Parties or Cruises $15000 https://washingtondc.craigslist.org/doc/boa/d/washington-buy-or-rent-dc-party-pontoon/7157810378.html West Marine Zodiac Inflatable Boat SB285 With 5HP Gamefisher (Merc) $850 https://annapolis.craigslist.org/boa/d/annapolis-west-marine-zodiac-inflatable/7166031908.html 2012 AB aluminum/hypalon inflatable dinghy/2012 Yamaha 6hp four stroke $3400 https://annapolis.craigslist.org/bpo/d/annapolis-2012-ab-aluminum-hypalon/7157768911.html RHODES-18’ CENTERBOARD DAYSAILER $6500 https://annapolis.craigslist.org/boa/d/ocean-view-rhodes-18-centerboard/7148322078.html Mercury Outboard 7.5 HP $250 https://baltimore.craigslist.org/bpo/d/middle-river-mercury-outboard-75-hp/7167399866.html 8 hp yamaha 2 stroke $0 https://baltimore.craigslist.org/bpo/d/8-hp-yamaha-2-stroke/7154103281.html TRADE 38' BENETEAU IDYLLE 1150 $35000 https://baltimore.craigslist.org/boa/d/middle-river-trade-38-beneteau-idylle/7163761741.html 5-hp Top Tank Mercury $0 
https://baltimore.craigslist.org/bpo/d/5-hp-top-tank-mercury/7154102434.html 5-hp Top Tank Mercury $0 https://baltimore.craigslist.org/bpo/d/5-hp-top-tank-mercury/7154102744.html Wanted ur unwanted outboards $0 https://baltimore.craigslist.org/bpo/d/randallstown-wanted-ur-unwanted/7141349142.html Grumman Sport Boat $2250 https://baltimore.craigslist.org/boa/d/baldwin-grumman-sport-boat/7157186381.html 1996 Carver 355 Aft Cabin Motor Yacht $47000 https://baltimore.craigslist.org/boa/d/middle-river-1996-carver-355-aft-cabin/7156830617.html Lower unit, long shaft $50 https://baltimore.craigslist.org/bpo/d/catonsville-lower-unit-long-shaft/7155566763.html Lower unit, long shaft $50 https://baltimore.craigslist.org/bpo/d/catonsville-lower-unit-long-shaft/7155565771.html Lower unit, long shaft $50 https://baltimore.craigslist.org/bpo/d/catonsville-lower-unit-long-shaft/7155566035.html Lower unit, long shaft $50 https://baltimore.craigslist.org/bpo/d/catonsville-lower-unit-long-shaft/7155565301.html Cape Dory 25 Sailboat for sale or trade $6500 https://baltimore.craigslist.org/boa/d/reedville-cape-dory-25-sailboat-for/7149227778.html West Marine HP-V 350 $1200 https://baltimore.craigslist.org/boa/d/pasadena-west-marine-hp-350/7147285666.html
Is there a way to properly convert data from lists to a CSV file using BeautifulSoup?
I am trying to create a webscraper for a website. The problem is that after the collected data is stored in a list, I'm not able to write this to a csv file properly. I have been stuck for ages with this problem and hopefully someone has an idea about how to fix this one!
The loop to get the data from the web pages:
import csv
from htmlrequest import simple_get
from htmlrequest import BeautifulSoup
# Define variables
listData = ['Companies', 'Locations', 'Descriptions']
plus = 15
max = 30
count = 0
# while loop to repeat process till max is reached
while (count <= max):
    start = 'https://www.companiesintheuk.co.uk/find?q=Activities+of+sport+clubs&start=' + str(count) + '&s=h&t=SicCodeSearch&location=&sicCode=93120'
    raw_html = simple_get(start)
    soup = BeautifulSoup(raw_html, 'html.parser')
    for i, div in enumerate(soup.find_all('div', class_="search_result_title")):
        listData[0] = listData[0].strip() + div.text
    for i, div2 in enumerate(soup.find_all('div', class_="searchAddress")):
        listData[1] = listData[1].strip() + div2.text
    # This is extra information
    # for i, div3 in enumerate(soup.find_all('div', class_="searchSicCode")):
    #     listData[2] = listData[2].strip() + div3.text
    count = count + plus
output example if printed:
Companies (AMG) AGILITY MANAGEMENT GROUP LTD (KLA) LIONS/LIONESS FOOTBALL TEAMS WORLD CUP LTD (Dissolved) 1 SPORT ORGANISATION LIMITED 100UK LTD 1066 GYMNASTICS 1066 SPECIALS 10COACHING LIMITED 147 LOUNGE LTD 147 SNOOKER AND POOL CLUB (LEICESTER) LIMITED
Locations ENGLAND, BH8 9PS LONDON, EC2M 2PL ENGLAND, LS7 3JB ENGLAND, LE2 8FN UNITED KINGDOM, N18 2QX AVON, BS5 0JH UNITED KINGDOM, WC2H 9JQ UNITED KINGDOM, SE18 5SZ UNITED KINGDOM, EC1V 2NX
I've tried to get it into a CSV file by using this code but I can't figure out how to properly format my output! Any suggestions are welcome.
# writing to csv
with open('test.csv', 'w') as csvfile:
    write = csv.writer(csvfile, delimiter=',')
    write.writerow(['Name','Location'])
    write.writerow([listData[0],listData[1]])
print("Writing has been done!")
I want the code to be able to format it properly in the csv file to be able to import the two rows in a database.
This is the output when I write the data on 'test.csv' which will result into this when opened up
The expected outcome would be something like this!
I'm not sure how it is improperly formatted, but maybe you just need to replace with open('test.csv', 'w') with with open('test.csv', 'w+', newline='') I've combined your code (taking out htmlrequests for requests and bs4 modules and also not using listData, but instead creating my own lists. I've left your lists but they do nothing): import csv import bs4 import requests # Define variables listData = ['Companies', 'Locations', 'Descriptions'] company_list = [] locations_list = [] plus = 15 max = 30 count = 0 # while loop to repeat process till max is reached while count <= max: start = 'https://www.companiesintheuk.co.uk/find?q=Activities+of+sport+clubs&start={}&s=h&t=SicCodeSearch&location=&sicCode=93120'.format(count) res = requests.get(start) soup = bs4.BeautifulSoup(res.text, 'html.parser') for i, div in enumerate(soup.find_all('div', class_="search_result_title")): listData[0] = listData[0].strip() + div.text company_list.append(div.text.strip()) for i, div2 in enumerate(soup.find_all('div', class_="searchAddress")): listData[1] = listData[1].strip() + div2.text locations_list.append(div2.text.strip()) # This is extra information # for i, div3 in enumerate(soup.find_all('div', class_="searchSicCode")): # listData[2] = listData[2].strip() + div3.text count = count + plus if len(company_list) == len(locations_list): with open('test.csv', 'w+', newline='') as csvfile: writer = csv.writer(csvfile, delimiter=',') writer.writerow(['Name', 'Location']) for i in range(len(company_list)): writer.writerow([company_list[i], locations_list[i]]) Which generates a csv file like: Name,Location (AMG) AGILITY MANAGEMENT GROUP LTD,"UNITED KINGDOM, M6 6DE" "(KLA) LIONS/LIONESS FOOTBALL TEAMS WORLD CUP LTD (Dissolved)","ENGLAND, BD1 2PX" 0161 STUDIOS LTD,"UNITED KINGDOM, HD6 3AX" 1 CLICK SPORTS MANAGEMENT LIMITED,"ENGLAND, E10 5PW" 1 SPORT ORGANISATION LIMITED,"UNITED KINGDOM, CR2 6NF" 100UK LTD,"UNITED KINGDOM, BN14 9EJ" 1066 GYMNASTICS,"EAST SUSSEX, BN21 4PT" 1066 
SPECIALS,"EAST SUSSEX, TN40 1HE" 10COACHING LIMITED,"UNITED KINGDOM, SW6 6LR" 10IS ACADEMY LIMITED,"ENGLAND, PE15 9PS" "10TH MAN LIMITED (Dissolved)","GLASGOW, G3 6AN" 12 GAUGE EAST MANCHESTER COMMUNITY MMA LTD,"ENGLAND, OL9 8DQ" 121 MAKING WAVES LIMITED,"TYNE AND WEAR, NE30 1AR" 121 WAVES LTD,"TYNE AND WEAR, NE30 1AR" 1-2-KICK LTD,"ENGLAND, BH8 9PS" "147 HAVANA LIMITED (Liquidation)","LONDON, EC2M 2PL" 147 LOUNGE LTD,"ENGLAND, LS7 3JB" 147 SNOOKER AND POOL CLUB (LEICESTER) LIMITED,"ENGLAND, LE2 8FN" 1ACTIVE LTD,"UNITED KINGDOM, N18 2QX" 1ON1 KING LTD,"AVON, BS5 0JH" 1PUTT LTD,"UNITED KINGDOM, WC2H 9JQ" 1ST SPORTS LTD,"UNITED KINGDOM, SE18 5SZ" 2 BRO PRO EVENTS LTD,"UNITED KINGDOM, EC1V 2NX" 2 SPLASH SWIM SCHOOL LTD,"ENGLAND, B36 0EY" 2 STEPPERS C.I.C.,"SURREY, CR0 6BX" 2017 MOTO LIMITED,"UNITED KINGDOM, ME2 4NW" 2020 ARCHERY LTD,"LONDON, SE16 6SS" 21 LEISURE LIMITED,"LONDON, EC4M 7WS" 261 FEARLESS CLUB UNITED KINGDOM CIC,"LANCASHIRE, LA2 8RF" 2AIM4 LIMITED,"HERTFORDSHIRE, SG2 0JD" 2POINT4 FM LTD,"LONDON, NW10 8LW" 3 LIONS SCHOOL OF SPORT LTD,"BRISTOL, BS20 8BU" 3 PT LTD,"ANTRIM, BT40 2FB" 3 PUTT LIFE LTD,"UNITED KINGDOM, LU3 2DP" 3 THIRTY SEVEN LTD,"KENT, DA9 9RS" 3:30 SOCCER SCHOOL LTD,"UNITED KINGDOM, EH6 7JB" 30 MINUTE WORKOUT (LLANISHEN) LTD,"PONTYCLUN, CF72 9UA" 321 RELAX LTD,"MID GLAMORGAN, CF83 3HL" 360 MOTOR RACING CLUB LTD,"HALSTEAD, CO9 2ET" 3LIONSATHLETICS LIMITED,"ENGLAND, S3 8DB" 3S SWIM ROMFORD LTD,"UNITED KINGDOM, DA9 9DR" 3XL EVENT MANAGEMENT LIMITED,"KENT, BR3 4NW" 3XL MOTORSPORT MANAGEMENT LIMITED,"KENT, BR3 4NW" 4 CORNER FOOTBALL LTD,"BROMLEY, BR1 5DD" 4 PRO LTD,"UNITED KINGDOM, FY5 5HT" Which seems fine to me, but your post was very unclear about how you expected it to be formatted so I really have no idea
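The index-based loop in the answer can also be written with `zip` and `writerows`. A small sketch, using a hypothetical helper name `write_pairs` and two rows taken from the output above; it writes to any file-like object, so an in-memory buffer stands in for `test.csv` here:

```python
import csv
import io


def write_pairs(handle, companies, locations):
    """Write a header plus one (name, location) row per company."""
    writer = csv.writer(handle, delimiter=",")
    writer.writerow(["Name", "Location"])
    # zip pairs the two lists element by element and stops at the shorter one.
    writer.writerows(zip(companies, locations))


companies = ["(AMG) AGILITY MANAGEMENT GROUP LTD", "0161 STUDIOS LTD"]
locations = ["UNITED KINGDOM, M6 6DE", "UNITED KINGDOM, HD6 3AX"]

buffer = io.StringIO()
write_pairs(buffer, companies, locations)
csv_lines = buffer.getvalue().splitlines()
```

With a real file, open it as `open('test.csv', 'w', newline='')` and pass the handle in. The `newline=''` argument is what the `csv` module documentation recommends to avoid blank rows on Windows.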
posting more than one line of text on fb status [Python] [Facebook API]
I started this project in the school holidays as a way to keep practising and enhancing my python knowledge. To put it shortly the code is a facebook bot that randomly generates NBA teams, Players and positions which should look like this when run.
Houston Rockets
[PG] Ryan Anderson
[PF] Michael Curry
[SF] Marcus Morris
[C] Bob Royer
[SF] Brian Heaney
I'm currently having trouble when it comes to posting my code to my facebook page where instead of posting 1 team and 5 players/positions the programme will only post 1 single player like this
Ryan Anderson
Here is my code
import os
import random
import facebook
token = "...."
fb = facebook.GraphAPI(access_token = token)
parent_dir = "../NBAbot"
os.chdir(parent_dir)
file_name = "nba_players.txt"
def random_position():
    """Random Team from list"""
    position = ['[PG]','[SG]','[SF]','[PF]','[C]',]
    random.shuffle(position)
    position = position.pop()
    return(position)
def random_team():
    """Random Team from list"""
    Team = ['Los Angeles Lakers','Golden State Warriors','Toronto Raptors','Boston Celtics','Cleveland Cavaliers','Houston Rockets','San Antonio Spurs','New York Knicks','Chicago Bulls','Minnesota Timberwolves','Philadelphia 76ers','Miami Heat','Milwaukee','Portland Trail Blazers','Dallas Mavericks','Phoenix Suns','Denver Nuggets','Utah Jazz','Indiana Pacers','Los Angeles Clippers','Washington Wizards','Brooklyn Nets','New Orleans Pelicans','Sacramento Kings','Atlanta Hawks','Detroit Pistons','Memphis Grizzlies','Charlotte Hornets','Orlando Magic']
    random.shuffle(Team)
    Team = Team.pop()
    return(Team)
def random_player(datafile):
    read_mode = "r"
    with open (datafile, read_mode) as read_file:
        the_line = read_file.readlines()
    return(random.choice(the_line))
def main():
    return(
        random_team(),
        random_position(), random_player(file_name),
        random_position(), random_player(file_name),
        random_position(), random_player(file_name),
        random_position(), random_player(file_name),
        random_position(), random_player(file_name))
fb.put_object(parent_object="me", connection_name='feed', message=main())
any help is appreciated.
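One possible cause, offered as a guess and not a confirmed diagnosis: `main()` returns a tuple of eleven values, while `message` is presumably meant to be a single string. A sketch that builds one multi-line string first; `build_post` and `POSITIONS` are hypothetical names, and the `strip()` call is there because `readlines()` keeps the trailing newline on each player name:

```python
import random

POSITIONS = ["[PG]", "[SG]", "[SF]", "[PF]", "[C]"]


def build_post(team, players):
    """Return the team name followed by one '[POS] player' line per player."""
    lines = [team]
    for player in players:
        # Each player gets an independently chosen position, as in the original.
        lines.append(f"{random.choice(POSITIONS)} {player.strip()}")
    return "\n".join(lines)


post = build_post("Houston Rockets", ["Ryan Anderson\n", "Michael Curry\n"])

# Usage with the call from the question (not run here):
# fb.put_object(parent_object="me", connection_name="feed", message=post)
```

Because positions are drawn independently, the same position can appear twice, which matches the sample output in the question.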
BeautifulSoup, extract a table (from poorly designed site) and turn it into a CSV
I'm trying to extract this table in whole - any tips? I've tried the following code 8 different ways, with no avail. Thank you!
data = []
table = soup.find_all("tbody")
rows = table.find_all("tr")
for row in rows:
    cols = row.find_all("td")
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])
Code: import requests from bs4 import BeautifulSoup html = requests.get('http://www.boxofficemojo.com/alltime/adjusted.htm').text soup = BeautifulSoup(html, 'html.parser') table = soup.find('table', cellspacing='1') f = open('data.csv','w') for row in table.find_all('tr'): print(''.join(row.findAll(text=True)).replace('\n', '|')) f.write(''.join(row.findAll(text=True)).replace('\n', '|') + '\n') f.close() Output: 1|Gone with the Wind|MGM|$1,854,769,700|$198,676,459|1939^| 2|Star Wars|Fox|$1,635,137,900|$460,998,007|1977^| 3|The Sound of Music|Fox|$1,307,373,200|$158,671,368|1965| 4|E.T.: The Extra-Terrestrial|Uni.|$1,302,222,800|$435,110,554|1982^| 5|Titanic|Par.|$1,244,347,300|$659,363,944|1997^| 6|The Ten Commandments|Par.|$1,202,580,000|$65,500,000|1956| 7|Jaws|Uni.|$1,175,763,500|$260,000,000|1975| 8|Doctor Zhivago|MGM|$1,139,563,500|$111,721,910|1965| 9|The Exorcist|WB|$1,015,300,400|$232,906,145|1973^| 10|Snow White and the Seven Dwarfs|Dis.|$1,000,620,000|$184,925,486|1937^| 11|Star Wars: The Force Awakens|BV|$992,496,600|$936,662,225|2015| 12|101 Dalmatians|Dis.|$917,240,400|$144,880,014|1961^| 13|The Empire Strikes Back|Fox|$901,298,200|$290,475,067|1980^| 14|Ben-Hur|MGM|$899,640,000|$74,000,000|1959| 15|Avatar|Fox|$893,301,900|$760,507,625|2009^| 16|Return of the Jedi|Fox|$863,465,400|$309,306,177|1983^| 17|Jurassic Park|Uni.|$843,843,500|$402,453,882|1993^| 18|Star Wars: Episode I - The Phantom Menace|Fox|$829,064,800|$474,544,677|1999^| 19|The Lion King|BV|$818,364,200|$422,783,777|1994^| 20|The Sting|Uni.|$818,331,400|$156,000,000|1973| 21|Raiders of the Lost Ark|Par.|$812,675,900|$248,159,971|1981^| 22|The Graduate|AVCO|$785,595,300|$104,945,305|1967^| 23|Fantasia|Dis.|$762,339,100|$76,408,097|1941^| 24|Jurassic World|Uni.|$725,671,700|$652,270,625|2015| 25|The Godfather|Par.|$724,509,200|$134,966,411|1972^| 26|Forrest Gump|Par.|$721,682,300|$330,252,182|1994^| 27|Mary Poppins|Dis.|$717,709,100|$102,272,727|1964^| 
28|Grease|Par.|$706,577,200|$188,755,690|1978^| 29|Marvel's The Avengers|BV|$705,769,500|$623,357,910|2012| 30|Thunderball|UA|$686,664,000|$63,595,658|1965| 31|The Dark Knight|WB|$683,575,000|$534,858,444|2008^| 32|The Jungle Book|Dis.|$676,381,600|$141,843,612|1967^| 33|Sleeping Beauty|Dis.|$667,166,200|$51,600,000|1959^| 34|Ghostbusters|Col.|$653,374,800|$242,212,467|1984^| 35|Shrek 2|DW|$652,247,500|$441,226,247|2004| 36|Butch Cassidy and the Sundance Kid|Fox|$647,721,100|$102,308,889|1969| 37|Love Story|Par.|$642,583,000|$106,397,186|1970| 38|Spider-Man|Sony|$637,870,000|$403,706,375|2002| 39|Independence Day|Fox|$635,888,300|$306,169,268|1996^| 40|Home Alone|Fox|$621,799,900|$285,761,243|1990| 41|Pinocchio|Dis.|$618,762,600|$84,254,167|1940^| 42|Cleopatra (1963)|Fox|$616,744,200|$57,777,778|1963| 43|Beverly Hills Cop|Par.|$616,437,200|$234,760,478|1984| 44|Star Wars: The Last Jedi|BV|$615,738,300|$615,738,279|2017| 45|Goldfinger|UA|$608,634,000|$51,081,062|1964| 46|Airport|Uni.|$606,901,600|$100,489,151|1970| 47|American Graffiti|Uni.|$603,257,100|$115,000,000|1973| 48|The Robe|Fox|$600,872,700|$36,000,000|1953| 49|Pirates of the Caribbean: Dead Man's Chest|BV|$593,288,400|$423,315,812|2006| 50|Around the World in 80 Days|UA|$593,169,200|$42,000,000|1956| 51|Bambi|RKO|$584,880,300|$102,247,150|1942^| 52|Blazing Saddles|WB|$580,539,700|$119,601,481|1974^| 53|Batman|WB|$577,923,400|$251,188,924|1989| 54|The Bells of St. 
Mary's|RKO|$576,000,000|$21,333,333|1945|
55|The Lord of the Rings: The Return of the King|NL|$565,852,400|$377,845,905|2003^|
56|Finding Nemo|BV|$565,364,200|$380,843,261|2003^|
57|The Towering Inferno|Fox|$563,428,600|$116,000,000|1974|
58|Rogue One: A Star Wars Story|BV|$554,854,100|$532,177,324|2016|
59|Cinderella (1950)|Dis.|$553,567,100|$93,141,149|1950^|
60|Spider-Man 2|Sony|$552,257,300|$373,585,825|2004|
61|My Fair Lady|WB|$550,800,000|$72,000,000|1964|
62|The Greatest Show on Earth|Par.|$550,800,000|$36,000,000|1952|
63|National Lampoon's Animal House|Uni.|$549,792,700|$141,600,000|1978^|
64|The Passion of the Christ|NM|$548,090,400|$370,782,930|2004^|
65|Star Wars: Episode III - Revenge of the Sith|Fox|$544,599,700|$380,270,577|2005^|
66|Back to the Future|Uni.|$542,085,000|$210,609,762|1985|
67|The Lord of the Rings: The Two Towers|NL|$529,918,100|$342,551,365|2002^|
68|The Dark Knight Rises|WB|$528,601,000|$448,139,099|2012|
69|The Sixth Sense|BV|$528,576,400|$293,506,292|1999|
70|Superman|WB|$526,547,600|$134,218,018|1978|
71|Tootsie|Col.|$522,378,200|$177,200,000|1982|
72|Smokey and the Bandit|Uni.|$521,726,300|$126,737,428|1977|
73|Beauty and the Beast (2017)|BV|$521,407,600|$504,014,165|2017|
74|Finding Dory|BV|$515,531,300|$486,295,561|2016|
75|West Side Story|MGM|$513,807,200|$43,656,822|1961|
76|Close Encounters of the Third Kind|Col.|$513,370,800|$135,189,114|1977^|
77|Harry Potter and the Sorcerer's Stone|WB|$513,281,200|$317,575,550|2001|
78|Lady and the Tramp|Dis.|$511,646,200|$93,602,326|1955^|
79|Lawrence of Arabia|Col.|$508,421,000|$44,824,144|1962^|
80|The Rocky Horror Picture Show|Fox|$505,537,300|$112,892,319|1975|
81|Rocky|UA|$505,267,000|$117,235,147|1976|
82|The Best Years of Our Lives|RKO|$504,900,000|$23,650,000|1946|
83|The Poseidon Adventure|Fox|$504,000,000|$84,563,118|1972|
84|The Lord of the Rings: The Fellowship of the Ring|NL|$503,057,400|$315,544,750|2001^|
85|Twister|WB|$502,037,000|$241,721,524|1996|
86|Men in Black|Sony|$501,381,100|$250,690,539|1997|
87|The Bridge on the River Kwai|Col.|$499,392,000|$27,200,000|1957|
88|Transformers: Revenge of the Fallen|P/DW|$494,810,500|$402,111,870|2009|
89|It's a Mad, Mad, Mad, Mad World|MGM|$494,576,300|$46,332,858|1963|
90|Swiss Family Robinson|Dis.|$493,957,400|$40,356,000|1960|
91|One Flew Over the Cuckoo's Nest|UA|$492,831,600|$108,981,275|1975|
92|M.A.S.H.|Fox|$492,821,000|$81,600,000|1970|
93|Indiana Jones and the Temple of Doom|Par.|$491,431,300|$179,870,271|1984|
94|Avengers: Age of Ultron|BV|$491,377,100|$459,005,868|2015|
95|Star Wars: Episode II - Attack of the Clones|Fox|$490,840,600|$310,676,740|2002^|
96|Toy Story 3|BV|$489,656,000|$415,004,880|2010|
97|Mrs. Doubtfire|Fox|$483,642,600|$219,195,243|1993|
98|Aladdin|BV|$481,420,700|$217,350,219|1992|
99|Ghost|Par.|$472,450,700|$217,631,306|1990|
100|The Hunger Games: Catching Fire|LGF|$469,232,400|$424,668,047|2013|
101|Duel in the Sun|Selz.|$468,367,300|$20,408,163|1946|
102|The Hunger Games|LGF|$466,924,700|$408,010,692|2012|
103|Pirates of the Caribbean: The Curse of the Black Pearl|BV|$464,956,900|$305,413,918|2003|
104|House of Wax|WB|$463,883,000|$23,750,000|1953|
105|Rear Window|Par.|$462,256,500|$36,764,313|1954^|
106|The Lost World: Jurassic Park|Uni.|$458,173,400|$229,086,679|1997|
107|Indiana Jones and the Last Crusade|Par.|$453,643,400|$197,171,806|1989|
108|Monsters, Inc.|BV|$453,061,600|$289,916,256|2001^|
109|Frozen|BV|$450,196,500|$400,738,009|2013|
110|Spider-Man 3|Sony|$449,033,200|$336,530,303|2007|
111|Iron Man 3|BV|$448,060,700|$409,013,994|2013|
112|Terminator 2: Judgment Day|TriS|$447,732,400|$205,881,154|1991^|
113|Sergeant York|WB|$441,770,900|$16,361,885|1941|
114|How the Grinch Stole Christmas|Uni.|$441,620,600|$260,044,825|2000|
115|Top Gun|Par.|$440,917,900|$179,800,601|1986^|
116|Harry Potter and the Deathly Hallows Part 2|WB|$440,547,300|$381,011,219|2011|
117|Toy Story 2|BV|$439,139,300|$245,852,179|1999^|
118|Shrek|DW|$434,128,000|$267,665,011|2001|
119|Shrek the Third|P/DW|$430,606,000|$322,719,944|2007|
120|Despicable Me 2|Uni.|$430,487,800|$368,061,265|2013|
121|Captain America: Civil War|BV|$429,213,000|$408,084,349|2016|
122|The Matrix Reloaded|WB|$428,668,600|$281,576,461|2003|
123|Transformers|P/DW|$425,970,900|$319,246,193|2007|
124|Crocodile Dundee|Par.|$424,138,600|$174,803,506|1986|
125|Wonder Woman|WB|$423,340,500|$412,563,408|2017|
126|The Four Horsemen of the Apocalypse|MPC|$421,530,600|$9,183,673|1921|
127|Saving Private Ryan|DW|$419,958,100|$216,540,909|1998|
128|Young Frankenstein|Fox|$419,041,900|$86,273,333|1974|
129|Peter Pan|Dis.|$418,824,000|$87,404,651|1953^|
130|Gremlins|WB|$417,526,300|$153,083,102|1984^|
131|Beauty and the Beast|BV|$416,438,900|$218,967,620|1991^|
132|The Chronicles of Narnia: The Lion, the Witch and the Wardrobe|BV|$414,717,600|$291,710,957|2005|
133|Harry Potter and the Goblet of Fire|WB|$414,709,000|$290,013,036|2005|
134|Pirates of the Caribbean: At World's End|BV|$412,860,400|$309,420,425|2007|
135|Harry Potter and the Chamber of Secrets|WB|$412,327,800|$261,988,482|2002|
136|The Fugitive|WB|$407,567,300|$183,875,760|1993|
137|The Caine Mutiny|Col.|$407,479,600|$21,750,000|1954|
138|Iron Man|Par.|$407,095,000|$318,412,101|2008|
139|Transformers: Dark of the Moon|P/DW|$406,315,000|$352,390,543|2011|
140|Meet the Fockers|Uni.|$405,508,300|$279,261,160|2004|
141|Indiana Jones and the Kingdom of the Crystal Skull|Par.|$405,430,100|$317,101,119|2008|
142|Toy Story|BV|$402,711,200|$191,796,233|1995^|
143|Dances with Wolves|Orion|$401,159,500|$184,208,848|1990|
144|An Officer and a Gentleman|Par.|$400,769,900|$129,795,554|1982|
145|Guardians of the Galaxy Vol. 2|BV|$399,848,900|$389,813,101|2017|
146|2001: A Space Odyssey|MGM|$397,829,200|$56,954,992|1968^|
147|Rain Man|MGM|$397,417,800|$172,825,435|1988|
148|The Secret Life of Pets|Uni.|$397,253,600|$368,384,330|2016|
149|Guess Who's Coming to Dinner|Col.|$397,099,200|$56,666,667|1967|
150|Inside Out|BV|$396,452,900|$356,461,711|2015|
151|American Sniper|WB|$395,474,400|$350,126,372|2014|
152|Kramer Vs. Kramer|Col.|$394,925,800|$106,260,000|1979|
153|Armageddon|BV|$394,560,300|$201,578,182|1998|
154|Psycho|Uni.|$391,680,100|$32,000,000|1960|
155|Rocky III|UA|$390,271,700|$125,049,125|1982^|
156|Harry Potter and the Order of the Phoenix|WB|$389,622,600|$292,004,738|2007|
157|Rambo: First Blood Part II|TriS|$388,961,600|$150,415,432|1985|
158|Batman Forever|WB|$388,369,100|$184,031,112|1995|
159|Deadpool|Fox|$388,249,600|$363,070,709|2016|
160|Pretty Woman|BV|$387,179,600|$178,406,268|1990|
161|Earthquake|Uni.|$386,952,300|$79,666,653|1974|
162|Alice in Wonderland (2010)|BV|$385,896,200|$334,191,110|2010|
163|The Incredibles|BV|$385,835,000|$261,441,092|2004|
164|Cast Away|Fox|$384,588,700|$233,632,142|2000|
165|Home Alone 2: Lost in New York|Fox|$384,179,200|$173,585,516|1992|
166|The Jungle Book (2016)|BV|$382,904,500|$364,001,123|2016|
167|Three Men and a Baby|BV|$382,840,700|$167,780,960|1987|
168|My Big Fat Greek Wedding|IFC|$380,230,800|$241,438,208|2002|
169|Guardians of the Galaxy|BV|$378,010,100|$333,176,600|2014|
170|Furious 7|Uni.|$376,598,400|$353,007,020|2015|
171|Mission: Impossible|Par.|$375,885,400|$180,981,856|1996|
172|The Hunger Games: Mockingjay - Part 1|LGF|$373,872,900|$337,135,885|2014|
173|Minions|Uni.|$373,756,800|$336,045,770|2015|
174|Saturday Night Fever|Par.|$372,751,500|$94,213,184|1977|
175|On Golden Pond|Uni.|$372,564,100|$119,285,432|1981|
176|Austin Powers: The Spy Who Shagged Me|NL|$372,332,300|$206,040,086|1999|
177|Harry Potter and the Half-Blood Prince|WB|$371,524,900|$301,959,197|2009|
178|Bruce Almighty|Uni.|$369,680,400|$242,829,261|2003|
179|Harry Potter and the Prisoner of Azkaban|WB|$368,886,800|$249,541,069|2004|
180|Funny Girl|Col.|$367,562,200|$52,223,306|1968^|
181|Mission: Impossible II|Par.|$366,876,200|$215,409,889|2000|
182|Rush Hour 2|NL|$366,817,700|$226,164,286|2001|
183|Apollo 13|Uni.|$365,894,000|$173,837,933|1995^|
184|Patton|Fox|$365,718,000|$61,749,765|1970|
185|Fatal Attraction|Par.|$364,269,300|$156,645,693|1987|
186|Zootopia|BV|$363,584,000|$341,268,248|2016|
187|Liar Liar|Uni.|$362,821,200|$181,410,615|1997|
188|Robin Hood: Prince of Thieves|WB|$360,863,200|$165,493,908|1991|
189|Beverly Hills Cop II|Par.|$360,778,800|$153,665,036|1987|
190|Iron Man 2|Par.|$360,772,100|$312,433,331|2010|
191|Up|BV|$360,533,300|$293,004,164|2009|
192|Batman Returns|WB|$360,191,600|$162,831,698|1992|
193|Signs|BV|$360,164,800|$227,966,634|2002|
194|Jumanji: Welcome to the Jungle|Sony|$358,036,900|$358,036,871|2017|
195|The Twilight Saga: Eclipse|Sum.|$357,823,200|$300,531,751|2010|
196|Superman II|WB|$357,246,300|$108,185,706|1981|
197|The Twilight Saga: New Moon|Sum.|$357,194,500|$296,623,634|2009|
198|What's Up, Doc?|WB|$356,400,000|$66,000,000|1972|
199|9 to 5|Fox|$352,493,200|$103,290,500|1980|
200|Batman v Superman: Dawn of Justice|WB|$351,232,600|$330,360,194|2016|
201|The Firm|Par.|$351,120,300|$158,348,367|1993|
202|Suicide Squad|WB|$350,483,800|$325,100,054|2016|
203|Who Framed Roger Rabbit|BV|$349,448,400|$156,452,370|1988|
204|Inception|WB|$348,133,400|$292,576,195|2010|
205|Skyfall|Sony|$347,389,600|$304,360,277|2012|
206|The Hobbit: An Unexpected Journey|WB (NL)|$347,313,400|$303,003,568|2012|
207|Porky's|Fox|$346,289,600|$111,289,673|1982^|
208|Air Force One|Sony|$345,835,200|$172,956,409|1997|
209|Stir Crazy|Col.|$345,700,400|$101,300,000|1980|
210|A Star Is Born (1976)|WB|$344,788,700|$80,000,000|1976|
211|There's Something About Mary|Fox|$344,053,800|$176,484,651|1998|
212|Spider-Man: Homecoming|Sony|$343,499,000|$334,201,140|2017|
213|Cars|BV|$342,088,800|$244,082,982|2006|
214|The Hangover|WB|$341,182,900|$277,322,503|2009|
215|Lethal Weapon 2|WB|$340,501,700|$147,253,986|1989|
216|Night at the Museum|Fox|$340,041,900|$250,863,268|2006|
217|Harry Potter and the Deathly Hallows Part 1|WB|$339,560,700|$295,983,305|2010|
218|I Am Legend|WB|$337,126,200|$256,393,010|2007|
219|Austin Powers in Goldmember|NL|$337,033,800|$213,307,889|2002|
220|War of the Worlds|Par.|$335,521,600|$234,280,354|2005|
221|It|WB (NL)|$335,148,900|$327,481,748|2017|
222|Every Which Way But Loose|WB|$334,232,400|$85,196,485|1978|
223|The Twilight Saga: Breaking Dawn Part 2|LG/S|$333,495,700|$292,324,737|2012|
224|The Love Bug|Dis.|$331,410,900|$51,264,000|1969|
225|The Twilight Saga: Breaking Dawn Part 1|Sum.|$329,680,800|$281,287,133|2011|
226|You Only Live Twice|UA|$329,598,600|$43,084,787|1967|
227|X-Men: The Last Stand|Fox|$328,465,300|$234,362,462|2006|
228|The Mummy Returns|Uni.|$327,657,500|$202,019,785|2001|
229|X2: X-Men United|Fox|$327,236,800|$214,949,694|2003|
230|Platoon|Orion|$325,302,500|$138,530,565|1986|
231|Rocky IV|UA|$324,855,400|$127,873,716|1985|
232|Pearl Harbor|BV|$322,017,800|$198,542,554|2001|
233|True Lies|Fox|$321,261,400|$146,282,411|1994|
234|Heaven Can Wait (1978)|Par.|$320,281,100|$81,640,278|1978|
235|Lethal Weapon 3|WB|$320,153,100|$144,731,527|1992|
236|Look Who's Talking|TriS|$319,854,500|$140,088,813|1989|
237|Gladiator|DW|$319,592,900|$187,705,427|2000|
238|Man of Steel|WB|$318,830,300|$291,045,518|2013|
239|Jaws 2|Uni.|$318,717,900|$81,766,007|1978^|
240|Star Trek|Par.|$317,150,800|$257,730,019|2009|
241|The Santa Clause|BV|$316,776,400|$144,833,357|1994|
242|The Amityville Horror|AIP|$316,113,900|$86,432,000|1979|
243|Thor: Ragnarok|BV|$314,143,200|$314,143,225|2017|
244|The Waterboy|BV|$314,053,600|$161,491,646|1998|
245|A Bug's Life|BV|$313,363,900|$162,798,565|1998|
246|A Few Good Men|Col.|$313,069,200|$141,340,178|1992|
247|The Odd Couple|Par.|$312,030,500|$44,527,234|1968|
248|Rocky II|UA|$311,542,700|$85,182,160|1979|
249|Jerry Maguire|Sony|$311,468,800|$153,952,592|1996|
250|The Perfect Storm|WB|$311,027,300|$182,618,434|2000|
251|King Kong|Uni.|$310,014,100|$218,080,025|2005|
252|The Matrix|WB|$309,879,100|$171,479,930|1999|
253|The Amazing Spider-Man|Sony|$309,163,500|$262,030,663|2012|
254|Tarzan|BV|$309,122,000|$171,091,819|1999|
255|Sister Act|BV|$308,813,300|$139,605,150|1992|
256|Hooper|WB|$306,000,000|$78,000,000|1978|
257|The Blind Side|WB|$305,701,600|$255,959,475|2009|
258|The Da Vinci Code|Sony|$304,882,700|$217,536,138|2006|
259|Monsters University|BV|$304,779,900|$268,492,764|2013|
260|All the President's Men|WB|$304,276,100|$70,600,000|1976|
261|What Women Want|Par.|$303,763,400|$182,811,707|2000|
262|The Bourne Ultimatum|Uni.|$303,515,200|$227,471,070|2007|
263|Gravity|WB|$302,369,300|$274,092,705|2013|
264|Honey, I Shrunk the Kids|BV|$302,279,100|$130,724,172|1989|
265|Terms of Endearment|Par.|$301,824,600|$108,423,489|1983|
266|Men in Black II|Sony|$300,868,300|$190,418,803|2002|
267|Star Trek: The Motion Picture|Par.|$300,849,700|$82,258,456|1979|
268|Wedding Crashers|NL|$299,683,200|$209,255,921|2005|
269|Despicable Me|Uni.|$299,217,100|$251,513,985|2010|
270|Pocahontas|BV|$298,782,100|$141,579,773|1995|
271|Arthur|WB|$298,725,900|$95,461,682|1981|
272|The Hunger Games: Mockingjay - Part 2|LGF|$297,446,700|$281,723,902|2015|
273|The LEGO Movie|WB|$296,654,200|$257,760,692|2014|
274|Batman Begins|WB|$295,860,600|$206,852,432|2005^|
275|Apocalypse Now|MGM|$295,789,400|$83,471,511|1979^|
276|Charlie and the Chocolate Factory|WB|$295,677,800|$206,459,076|2005|
277|Big Daddy|Sony|$295,422,100|$163,479,795|1999|
278|Ocean's Eleven|WB|$294,446,200|$183,417,150|2001|
279|Jurassic Park III|Uni.|$293,844,100|$181,171,875|2001|
280|Teenage Mutant Ninja Turtles|NL|$293,555,800|$135,265,915|1990|
281|Planet of the Apes (2001)|Fox|$291,948,200|$180,011,740|2001|
282|Alien|Fox|$291,755,600|$80,931,801|1979^|
283|Hancock|Sony|$291,441,100|$227,946,274|2008|
284|As Good as It Gets|Sony|$290,776,100|$148,478,011|1997|
285|The Hangover Part II|WB|$289,972,400|$254,464,305|2011|
286|Midnight Cowboy|UA|$289,525,900|$44,785,053|1969|
287|The Hobbit: The Desolation of Smaug|WB (NL)|$289,308,500|$258,366,855|2013|
288|The French Connection|Fox|$287,640,000|$51,700,000|1971|
289|The Flintstones|Uni.|$286,669,000|$130,531,208|1994|
290|Captain America: The Winter Soldier|BV|$286,373,800|$259,766,572|2014|
291|Coming to America|Par.|$286,238,000|$128,152,301|1988|
292|National Treasure: Book of Secrets|BV|$286,164,000|$219,964,115|2007|
293|WALL-E|BV|$286,150,300|$223,808,164|2008|
294|The Hobbit: The Battle of the Five Armies|WB (NL)|$285,304,300|$255,119,788|2014|
295|The Silence of the Lambs|Orion|$285,087,900|$130,742,922|1991|
296|The Karate Kid Part II|Col.|$284,812,500|$115,103,979|1986|
297|Airplane!|Par.|$284,796,800|$83,453,539|1980|
298|Alvin and the Chipmunks|Fox|$284,128,700|$217,326,974|2007|
299|Meet the Parents|Uni.|$282,676,300|$166,244,045|2000|
300|Ransom|BV|$282,366,800|$136,492,681|1996|
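The rows above follow a fixed six-field pattern (a number, a title, a short studio code, two dollar amounts, and a four-digit year that is sometimes followed by `^`), so they can be recovered with a regular expression even when records run together or a title is wrapped across a line break. The following is a minimal sketch, not a definitive parser: the function name `parse_rows` and the dictionary keys (`rank`, `title`, `studio`, `amount_1`, `amount_2`, `year`, `flag`) are hypothetical labels chosen for illustration, since no column header is shown here, and the `^` marker is kept as an opaque boolean flag without assuming what it means.

```python
import re

# One record: rank|title|studio|$amount|$amount|year[^]|
# Key names below are illustrative labels, not taken from the source data.
ROW_PATTERN = re.compile(
    r"(\d+)\|([^|]+)\|([^|]+)\|\$([\d,]+)\|\$([\d,]+)\|(\d{4})(\^?)\|"
)


def parse_rows(raw):
    """Turn a flattened pipe-delimited dump into a list of dicts."""
    # Collapse line breaks and repeated spaces left over from text wrapping,
    # so a title split as "Men in \nBlack" becomes "Men in Black".
    flat = " ".join(raw.split())
    rows = []
    for rank, title, studio, amt1, amt2, year, flag in ROW_PATTERN.findall(flat):
        rows.append({
            "rank": int(rank),
            "title": title.strip(),
            "studio": studio.strip(),
            "amount_1": int(amt1.replace(",", "")),
            "amount_2": int(amt2.replace(",", "")),
            "year": int(year),
            "flag": flag == "^",  # opaque marker; meaning not assumed
        })
    return rows


# A short excerpt, including a leading partial record and a wrapped title.
sample = (
    "Mary's|RKO|$576,000,000|$21,333,333|1945| "
    "55|The Lord of the Rings: The Return of the King|NL|$565,852,400|$377,845,905|2003^| "
    "86|Men in \nBlack|Sony|$501,381,100|$250,690,539|1997| "
    "199|9 to 5|Fox|$352,493,200|$103,290,500|1980|"
)

rows = parse_rows(sample)
for row in rows:
    print(row["rank"], row["title"], row["year"])
```

A leading fragment that lacks its rank and title (such as the `Mary's|RKO|...` tail at the start of the excerpt) does not match the full pattern and is skipped, so it would need to be rejoined with its first half before parsing.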