Python: Read Content of Hidden HTML Table

On this webpage there is a "Show Study location" tab. When I click the tab, it shows the entire location list and changes the web address, which I used in this program. When I run the program to print out the entire location list, I get this result:
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('https://clinicaltrials.gov/ct2/show/study/NCT01718158?term=NCT01718158&rank=1&show_locs=Y#locn').read())
for row in soup('table')[5].findAll('tr'):
    tds = row('td')
    if len(tds)<2:
        continue
    print tds[0].string, tds[1].string #, '\n'.join(filter(unicode.strip, tds[1].strings))
Local Institution None
Local Institution None
Local Institution None
Local Institution None
Local Institution None
And so on, leaving the rest of the information out. I feel I am missing something here. My result should be:
United States, California
Va Long Beach Healthcare System
Long Beach, California, United States, 90822
United States, Georgia
Gastrointestinal Specialists Of Georgia Pc
Marietta, Georgia, United States, 30060
United States, New York
Weill Cornell Medical College
And so forth. I want to print out the entire location list.

The locations are in rows with just one table cell, but your if len(tds)<2 check skips those rows.
Instead, extract the text from all cells and skip only the rows that have no <td> cells at all:
for row in soup('table')[5].findAll('tr'):
    tds = row('td')
    if not tds:
        continue
    print u' '.join([cell.string for cell in tds if cell.string])
This produces:
United States, California
Va Long Beach Healthcare System
Long Beach, California, United States, 90822
United States, Georgia
Gastrointestinal Specialists Of Georgia Pc
Marietta, Georgia, United States, 30060
# ....
Local Institution
Taipei, Taiwan, 100
Local Institution
Taoyuan, Taiwan, 333
United Kingdom
Local Institution
London, Greater London, United Kingdom, SE5 9RS
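The answer's code is Python 2. A Python 3 sketch of the same row-walking logic, run on a small hypothetical HTML snippet rather than the live page, might look like the following. It uses get_text() instead of .string, because .string returns None whenever a cell holds more than one child node, which is a likely cause of the None values in the question's output. The function name table_lines and the sample markup are assumptions for illustration.

```python
from bs4 import BeautifulSoup

def table_lines(html, table_index=0):
    """Yield one line of text per <tr> that contains at least one non-empty <td>."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find_all("table")[table_index]
    for row in table.find_all("tr"):
        # get_text() never returns None, unlike .string on a cell with several children
        cells = [td.get_text(" ", strip=True) for td in row.find_all("td")]
        cells = [text for text in cells if text]
        if cells:  # rows made only of <th> headers (or empty cells) are skipped
            yield " ".join(cells)

# Hypothetical markup, shaped like the location rows in the question
sample_html = """
<table>
  <tr><th>Locations</th></tr>
  <tr><td>United States, California</td></tr>
  <tr><td>Va Long Beach Healthcare System<br>Long Beach, California, United States, 90822</td></tr>
</table>
"""

for line in table_lines(sample_html):
    print(line)
```

Against the real page you would pass the downloaded HTML and the table index from the answer, e.g. table_lines(html, 5); that index depends on the page layout and may have changed since.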

Related

Selecting random row from dataframe in python based on certain conditions

Hi, this is probably a very basic fix, but I am completely stuck and don't know enough about Python to figure it out myself. I made a dictionary of restaurants in my city and created a DataFrame from it. The whole program is just supposed to pick a random restaurant out of the DataFrame. However, I want it to be able to select random restaurants based on certain criteria. For instance, "Cuisine" is a column, and I want to select a random restaurant (row) whose cuisine is Mexican. I hope that makes sense, because I am very lost.
My code is below, but there is not much to it:
import pandas as pd
# Define a dictionary containing restaurant data
data = {'Restaurant':['August Henrys Burger Bar', 'Bridges & Bourbon', 'The Capital Grille', 'Chinatown Inn', 'Chipotle','Condado Tacos','Crafted North','Cristos Mediterranean Grille','Five Guys','Forbes Tavern','Freshii','Genoa Pizza & Bar','Giovannis Pizza & Pasta','Hello Bistro','Joe and Pie Cafe & Pizzeria','Las Velas','Mandarin Gourmet','McCormick and Schmick','Moes Southwest Grill','Nickys Thai Kitchen','Noodles & Company','The Original Oyster House','Pizza Parma','Primanti Bros','Siam Thai Restaurant','The Simple Greek','SlyFox Taphouse','SoFresh','Villa Reale Pizzeria & Restaurant','The Warren','The Yard'],
'Cuisine':['American', 'American', 'American','Asian', 'Mexican','Mexican','American','Mediterranean','American','American','American','Italian','Italian','American','Italian','Mexican','Asian','American','Mexican','Asian','American','American','Italian','American','Asian','Mediterranean','American','American','Italian','American','American'],
'Address':['946 Penn Avenue 412-765-3270', '930 Penn Avenue 412-586-4287', '301 Fifth Avenue 412-338-9100', '522 Third Avenue 412-261-1291', '211 Forbes Avenue 412-224-5586','971 Liberty Avenue 412-281-9111','Marriott City Center 412-471-4000','130 6th Street 412-261-6442','Three PPH Place 412-227-0206','310 Forbes Avenue 412-281-1999','501 Grant Street 412-430-0318','111 Market Street 412-281-6100','123 6th Street 412-281-7060','292 Forbes Avenue 412-434-0100','955 Liberty Avenue 412-738-0603','21 Market Square 412-251-0031','305 Wood Street 412-261-6151','301 Fifth Avenue 412-201-6992','210 Forbes Avenue 412-224-4422','903 Penn Avenue 412-471-8424','476 McMasters Way 412-562-2191','20 Market Square 412-566-7925','963 Liberty Avenue 412-577-7300','2 Market Square 412-261-1599','410 First Avenue 412-281-1122','4313 Market Street 412-261-4976','300 Liberty Avenue 412-586-7474','Five PPG Place Suite 100 412-586-7240','628 Smithfield Street 412-391-3963','245 7th Street 412-201-5888','100 Fifth Avenue 412-291-8182'],
'Operation':['Local', 'Local', 'Franchise', 'Local', 'Franchise','Franchise','Franchise','Local','Franchise','Local','Franchise','Local','Local','Franchise','Franchise','Local','Local','Franchise','Franchise','Local','Franchise','Franchise','Franchise','Franchise','Local','Local','Franchise','Local','Local','Local','Franchise']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
You can first filter by Cuisine and then use sample to pick a random row (the row returned changes from run to run):
df.loc[df.Cuisine=='Mexican'].sample(1)
Restaurant Cuisine Address Operation
18 Moes Southwest Grill Mexican 210 Forbes Avenue 412-224-4422 Franchise
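If you want to filter on more than one column at once, or get the same pick again while testing, a small helper along these lines could work. pick_random_restaurant is a hypothetical name, and the data is a four-row subset of the question's dictionary; random_state is the standard sample() argument for a repeatable draw.

```python
import pandas as pd

def pick_random_restaurant(df, seed=None, **criteria):
    """Return one random row whose columns equal every keyword given.

    Hypothetical helper: each keyword must be a column name, e.g. Cuisine='Mexican'.
    Returns None when nothing matches, because sample(n=1) raises on an empty frame.
    """
    mask = pd.Series(True, index=df.index)
    for column, value in criteria.items():
        mask &= df[column] == value
    matches = df.loc[mask]
    if matches.empty:
        return None
    # random_state=seed makes the draw repeatable; seed=None gives a new pick each run
    return matches.sample(n=1, random_state=seed).iloc[0]

# Four-row subset of the question's data
data = {
    'Restaurant': ['Chipotle', 'Las Velas', 'Moes Southwest Grill', 'Five Guys'],
    'Cuisine': ['Mexican', 'Mexican', 'Mexican', 'American'],
    'Operation': ['Franchise', 'Local', 'Franchise', 'Franchise'],
}
df = pd.DataFrame(data)

row = pick_random_restaurant(df, seed=0, Cuisine='Mexican', Operation='Local')
print(row['Restaurant'])  # only Las Velas matches both conditions
```

Calling it with just Cuisine='Mexican' picks among all three Mexican rows.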

Pandas : Create a new dataframe from 2 different dataframes using fuzzy matching [duplicate]

I have two data frames, each with a different number of rows. Below are a couple of rows from each data set:
df1 =
Company                              City       State  ZIP
FREDDIE LEES AMERICAN GOURMET SAUCE  St. Louis  MO     63101
CITYARCHRIVER 2015 FOUNDATION        St. Louis  MO     63102
GLAXOSMITHKLINE CONSUMER HEALTHCARE  St. Louis  MO     63102
LACKEY SHEET METAL                   St. Louis  MO     63102
and
df2 =
FDA Company                    FDA City    FDA State  FDA ZIP
LACKEY SHEET METAL             St. Louis   MO         63102
PRIMUS STERILIZER COMPANY LLC  Great Bend  KS         67530
HELGET GAS PRODUCTS INC        Omaha       NE         68127
ORTHOQUEST LLC                 La Vista    NE         68128
I joined them side by side using combined_data = pandas.concat([df1, df2], axis = 1). My next goal is to compare each string in df1['Company'] to each string in df2['FDA Company'] using several different matching functions from the fuzzywuzzy module, and return the best match and its score. I want to store that in a new column. For example, if I ran fuzz.ratio and fuzz.token_sort_ratio on LACKEY SHEET METAL in df1['Company'] against df2['FDA Company'], it would return that the best match was LACKEY SHEET METAL with a score of 100, and this would then be saved in a new column in combined_data. The results would look like:
combined_data =
Company City State ZIP FDA Company FDA City FDA State FDA ZIP fuzzy.token_sort_ratio match fuzzy.ratio match
FREDDIE LEES AMERICAN GOURMET SAUCE St. Louis MO 63101 LACKEY SHEET METAL St. Louis MO 63102 LACKEY SHEET METAL 100 LACKEY SHEET METAL 100
CITYARCHRIVER 2015 FOUNDATION St. Louis MO 63102 PRIMUS STERILIZER COMPANY LLC Great Bend KS 67530
GLAXOSMITHKLINE CONSUMER HEALTHCARE St. Louis MO 63102 HELGET GAS PRODUCTS INC Omaha NE 68127
LACKEY SHEET METAL St. Louis MO 63102 ORTHOQUEST LLC La Vista NE 68128
I tried doing
combined_data['name_ratio'] = combined_data.apply(lambda x: fuzz.ratio(x['Company'], x['FDA Company']), axis = 1)
but got an error because the columns have different lengths.
I am stumped. How can I accomplish this?
I couldn't tell exactly what you were doing, but this is how I would do it:
import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
Create a series of tuples to compare:
compare = pd.MultiIndex.from_product([df1['Company'],
df2['FDA Company']]).to_series()
Create a function that calculates the fuzzy metrics for one pair of names and returns them as a Series:
def metrics(tup):
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup)],
                     ['ratio', 'token'])
Apply metrics to the compare series:
compare.apply(metrics)
There are a bunch of ways to do this next part:
Get the closest match (per metric) to each row of df2
compare.apply(metrics).unstack().idxmax().unstack(0)
Get the closest match (per metric) to each row of df1
compare.apply(metrics).unstack(0).idxmax().unstack(0)
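Since the question asks for one best match and its score per df1 row, stored as new columns, a plain loop may be enough. The sketch below uses difflib from the standard library as a stand-in scorer so it runs without fuzzywuzzy; add_best_match and simple_ratio are hypothetical names, and simple_ratio's scores may differ slightly from fuzz.ratio.

```python
import difflib

import pandas as pd

def simple_ratio(a, b):
    """Similarity on a 0-100 scale using the standard library (stand-in scorer)."""
    return round(100 * difflib.SequenceMatcher(None, a, b).ratio())

def add_best_match(df1, df2, scorer=simple_ratio, left="Company",
                   right="FDA Company", label="ratio"):
    """Return a copy of df1 with the best-scoring df2 name and its score.

    Each df1 name is compared with every df2 name, so the frames can have
    different numbers of rows; no side-by-side concat is needed.
    """
    choices = df2[right].dropna().tolist()
    matches, scores = [], []
    for name in df1[left]:
        best = max(choices, key=lambda choice: scorer(name, choice))
        matches.append(best)
        scores.append(scorer(name, best))
    out = df1.copy()
    out[label + " match"] = matches
    out[label + " score"] = scores
    return out

df1 = pd.DataFrame({"Company": ["FREDDIE LEES AMERICAN GOURMET SAUCE",
                                "LACKEY SHEET METAL"]})
df2 = pd.DataFrame({"FDA Company": ["LACKEY SHEET METAL",
                                    "PRIMUS STERILIZER COMPANY LLC",
                                    "HELGET GAS PRODUCTS INC"]})
print(add_best_match(df1, df2))
```

With fuzzywuzzy installed, add_best_match(df1, df2, scorer=fuzz.token_sort_ratio, label="token_sort") would add a second pair of columns. fuzzywuzzy's process.extractOne(name, choices, scorer=...) performs the same best-match search and returns a (match, score) tuple; the explicit loop just makes the logic visible.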

Finding the Metropolitan Area or County for City, State

I have the following data in a data frame:
... location amount
... Jacksonville, FL 40
... Provo, UT 20
... Montgomery, AL 22
... Los Angeles, CA 34
My dataset only contains U.S. cities in the form of [city name, state code] and I have no ZIP codes.
I want to determine either the county or the metropolitan area of each city, in order to visualize my data with ggcounty (like here).
I looked on the U.S. Census Bureau website but couldn't find a table of city, state and county, or anything similar.
I would prefer to solve the problem in R only. Does anyone have an idea how to do this?
You can get the ZIP code and more detailed address information by geocoding the city and then reverse-geocoding the resulting coordinates:
library(ggmap)
revgeocode(as.numeric(geocode('Jacksonville, FL')))
Hope it helps.

How to use python beautifulsoup to get image description from html?

I did not find this answer anywhere else, so I am asking for your help:
I have Python code that accesses http://news.yahoo.com/rss/entertainment to get the titles and descriptions, but some of the description text is inside an image's alt attribute.
This is my code:
for child in body_tag.contents[0].channel.children:
    if (child.__class__ != NavigableString):
        if child.title != None :
            print "------title----------"
            print(child.title.contents[0].encode('ascii','ignore'))
            print "-----description-class------------"
            mchild=child.find_next("description").contents[0]
            print mchild.__class__
            print "-------description---------"
            print mchild.find_next("img")
            print(mchild.encode('ascii','ignore'))
            print "-------end---------"
This is part of the output:
------title----------
University of Connecticut revokes Cosby's honorary degree
-----description-class------------
<class 'bs4.element.NavigableString'>
-------description---------
None
To display it here, I replaced "<" and ">" with "(" and ")":
(p) (a href="http://news.yahoo.com/university-connecticut-revokes-cosbys-honorary-degree-155552959.html")
(img src="http://l.yimg.com/bt/api/res/1.2/cjgCZP4YBj7M6SmdpoGj.Q--/YXBwaWQ9eW5ld3NfbGVnbztmaT1maWxsO2g9ODY7cT03NTt3PTEzMA--/http://media.zenfs.com/en_us/News/ap_webfeeds/7b35f971ec59428491aef6308db4567e.jpg" width="130" height="86" alt="FILE - In this May 24, 2016 file photo, Bill Cosby departs the Montgomery County Courthouse after a preliminary hearing, in Norristown, Pa. A 72-year-old New Hampshire woman who says Bill Cosby raped her in 1965 has withdrawn her civil defamation lawsuit against the comedian after a federal judge had allowed the case to move forward. (AP Photo/Matt Rourke, File)" align="left" title="FILE - In this May 24, 2016 file photo, Bill Cosby departs the Montgomery County Courthouse after a preliminary hearing, in Norristown, Pa. A 72-year-old New Hampshire woman who says Bill Cosby raped her in 1965 has withdrawn her civil defamation lawsuit against the comedian after a federal judge had allowed the case to move forward. (AP Photo/Matt Rourke, File)" border="0" /((/a)STORRS, Conn. (AP) The University of Connecticut on Wednesday revoked an honorary degree awarded to Bill Cosby, saying he engaged in conduct "incongruent" with the university's values.(/p((br) clear="all"/)
-------end---------
How can I get the title inside the img tag:
title="FILE - In this May 24, 2016 file photo,
I tried find_next("img") and other methods, but I couldn't get it.
So you want all the text from each description plus the title from any img tag. The HTML inside a description is just a string (your output shows a NavigableString), so find_next("img") cannot see the tags in it. Instead, find all the description tags, turn each description.text into its own BeautifulSoup object, look for the img in that, and pull either its title or its alt attribute. To find the matching title, search for the title that precedes the description tag:
for desc in soup.find_all("description"):
    d = BeautifulSoup(desc.text, "lxml")
    img = d.find("img")
    print("Title = {}".format(desc.find_previous("title").text))
    img_text = img.get("title") or img.get("alt", "") if img else ""
    print("Decscription = {}\n".format(d.find(text=True) + img_text))
Which gives you:
Title = Entertainment News Headlines — Yahoo! News
Decscription = Get the latest entertainment news headlines from Yahoo! News. Find breaking entertainment news, including analysis and opinion on top entertainment stories.
Title = Spotify's Top 10 most streamed tracks
Decscription = The following list represents the most streamed tracks on Spotify, based on the number of people who shared it divided by the number who listened to it, from Monday, Oct. 20 to Sunday Oct. 26 via Facebook, Tumblr, Twitter and Spotify.FILE - In this Sept. 7, 2012 file photo, musician Robin Thicke performs during Macy's Passport presents Glamorama 2012 at The Orpheum Theatre in Los Angeles. Thicke's "Blurred Lines (feat. T.I. & Pharrell)" was the top streamed tracks on Spotify from Monday, June 10, to Sunday, June 16, 2013. (Photo by Matt Sayles/Invision/AP, File)
Title = Who will win at the Tony Awards? AP predicts
Decscription = NEW YORK (AP) — The great comedian W.C. Fields is credited with the line, "Never work with children or animals." He would have had trouble on Broadway this season.This theater image released by The O+M Company shows the cast during a performance of the musical "Kinky Boots." The Cyndi Lauper-scored "Kinky Boots," based on the 2005 British movie about a real-life shoe factory that struggles until it finds new life in fetish footwear, is nominated for 13 Tony Award nominations. The awards will be broadcast on CBS from Radio City Music Hall on June 9. (AP Photo/The O+M Company, Matthew Murphy)
Title = The top iPhone and iPad apps on App Store
Decscription = App Store Official Charts for the week ending November 3, 2014:
Title = Fairey: 'Vindicated' by dismissal of Detroit tagging case
Decscription = DETROIT (AP) — Graffiti artist Shepard Fairey says he feels "relieved and vindicated" now that a malicious destruction of property case in Detroit has been dismissed.
Title = FBI seeks Rockwell painting on 40th anniversary of its theft
Decscription = CHERRY HILL, N.J. (AP) — Federal authorities are seeking the public's help in recovering a 1919 Norman Rockwell painting on the 40th anniversary of its theft from a New Jersey home.
Title = APNewsBreak: Union has deal with 4th Atlantic City casino
Decscription = ATLANTIC CITY, N.J. (AP) — Atlantic City's main casino workers union reached agreement Thursday with four of the five casinos it had been targeting for a strike this weekend.Union members cheer as they discuss preparations for a strike against as many as five of the city's eight casinos in Atlantic City, N.J. on Wednesday June 29, 2016. Local 54 of the Unite-HERE union says it will go on strike Friday if it can't reach new contracts with three casinos owned by Caesars Entertainment (Bally's, Caesars and Harrah's) and two casinos owned by billionaire investor Carl Icahn (the Tropicana and the Trump Taj Mahal). About 6,500 of the union's nearly 10,000 workers are at the five hotels. (AP Photo/Wayne Parry)
Title = The Latest: APNewsBreak: Union has deal with 4th casino
Decscription = The Latest on contract negotiations with casinos (all times local): 4:35 p.m. Atlantic City's main casino workers union has reached agreement with the fourth of five casinos it had been targeting for a ...Union members cheer as they discuss preparations for a strike against as many as five of the city's eight casinos in Atlantic City, N.J. on Wednesday June 29, 2016. Local 54 of the Unite-HERE union says it will go on strike Friday if it can't reach new contracts with three casinos owned by Caesars Entertainment (Bally's, Caesars and Harrah's) and two casinos owned by billionaire investor Carl Icahn (the Tropicana and the Trump Taj Mahal). About 6,500 of the union's nearly 10,000 workers are at the five hotels. (AP Photo/Wayne Parry)
Title = CBS reporter traveling to 59 parks in a year
Decscription = NEW YORK (AP) — Conor Knighton didn't take the easy route when he proposed a "CBS Sunday Morning" story on the National Park Service's centennial.
Title = Here come the virtual reality Olympics ... for Samsung users
Decscription = NEW YORK (AP) — Athletes in Rio will compete to be the fastest sprinter and highest jumper at the Olympics this August. But there's another test underway as well: How well can virtual reality capture sporting events?This photo provided by NBC and HD Studio shows NBC's daytime and late night set for the Rio Olympics located on Copacabana Beach in Rio. NBC says it will provide 85 hours of virtual reality programming during the Rio Olympics in August, but only to users of Samsung Galaxy smartphones and the Samsung Gear VR headset. (HD Studio/Courtesy of NBC via AP)
Title = Oscars timetable for 2017 revealed
Decscription = Movie buffs, mark your calendars: your 2017 Oscars party will be on Sunday, February 26. The Academy of Motion Picture Arts and Sciences announced the timetable for the 89th Oscars on Thursday, one day after it announced that it had invited a record number of artists to join the body, the majority of them women and people of color.A view of the Oscars logo at the 88th Annual Academy Awards nominee luncheon on February 8, 2016 in Beverly Hills, California
Title = Queen marks deadly Somme centenary at Westminster Abbey
Decscription = LONDON (AP) — Queen Elizabeth II attended a service at Westminster Abbey on Thursday, the eve of the centenary of the Battle of the Somme, one of the deadliest chapters of World War I.
Title = Rob Wasserman, accomplished bass player, dead at 64
Decscription = NEW YORK (AP) — Rob Wasserman, a highly respected bass player and composer who performed and recorded with Lou Reed, Neil Young, Brian Wilson and many other musicians, has died. He was 64.
Title = Documents filed by some Prince claimants to become public
Decscription = CHASKA, Minn. (AP) — A Minnesota judge overseeing the legal proceedings about Prince's estate will allow documents filed by some claimants to become public.
Title = The Latest: Oprah Winfrey to appear at Essence Festival
Decscription = NEW ORLEANS (AP) — The Latest on the annual Essence Festival held over the July 4th holiday in New Orleans (all times local):FILE - In this Jan. 20, 2009, file photo, Mariah Carey performs at the Neighborhood Inaugural Ball in Washington. Music is at the heart of the annual Essence Festival in New Orleans, and this year is no different. Fans will get to hear from first-timers Mariah Carey, Puff Daddy and Jeremih as well as from festival veterans Charlie Wilson, Maxwell, New Edition, Tyrese and Lalah Hathaway - all of whom are scheduled to perform inside the Superdome Friday, July 1, 2016, through Sunday. (AP Photo/Alex Brandon, File)
Title = Brad Paisley: West Virginia floods shocking, heartbreaking
Decscription = CHARLESTON, W.Va. (AP) — Brad Paisley said he's shocked and heartbroken by the destruction from deadly flooding in his home state of West Virginia.Principal Mike Kelley walks through a hallway that is filled with slick mud at Herbert Hoover High School in Clendenin, W.Va., Monday, June 27, 2016. The first floor hallways and rooms of the school are caked in 3-5 inches mud, which was left by over six feet of flood water that swamped the building late last week. (Sam Owens/Charleston Gazette-Mail via AP)
Title = Chechen leader Kadyrov seeks apprentice on reality TV show
Decscription = MOSCOW (AP) — Another powerful, controversial man is taking to reality TV to find an assistant — not Donald Trump but the leader of Chechnya.FILE - In this Wednesday March 23, 2016 file photo, Chechen regional leader Ramzan Kadyrov addresses a rally marking the 13th anniversary of the adoption of the Constitution of Russian region of Chechnya, in the regional capital of Grozny, Russia. Russian state television on Thursday is to broadcast the opening episode of "Live - The Team," in which participants compete to become an assistant to leader of Chechnya Ramzan Kadyrov. (AP Photo/Musa Sadulayev, File)
Title = With an eye to Tuscany, Debi Mazar plots culinary future
Decscription = NEW YORK (AP) — Debi Mazar and her brood spend at least a month in Tuscany each year, but if the "Younger" actress had her way, the region would be a far more permanent fixture in her life.FILE - In this Wednesday, Jan. 6, 2016 file photo, Debi Mazar speaks during the "Younger" panel at the TV Land 2016 Winter TCA in Pasadena, Calif. After the success of her award-winning cooking show "Extra Virgin," Mazar's creative juices are still flowing, as the actress talks about the possibility of another show and more of her culinary dreams. (Photo by Richard Shotwell/Invision/AP)
Title = Wisecracking De Niro touts Catskills with NY governor
Decscription = BETHEL, N.Y. (AP) — Robert De Niro is conjuring the legacy — and the stand-up jokes — of comedians like Rodney Dangerfield, Henny Youngman and Milton Berle while praising the natural beauty of New York's Catskills region.
Title = Music Review: Sara Watkins branches out
Decscription = Sara Watkins, "Young in All the Wrong Ways" (New West Records)FILE - In this July 29, 2012 file photo, Sara Watkins performs at the Newport Folk Festival in Newport, R.I. Watkins describes her latest venture as “a breakup album with myself,” but it seems like there might have been someone else involved. The songs on her new album, “Young in All the Wrong Ways,” have bite to them. There is anger here, a jarring departure from Watkins’ previous work. A couple of the songs push into hard-edged rock, her voice straining against a jagged electric guitar. (AP Photo/Joe Giblin)
Title = Disney Animation's 'Wreck-It Ralph 2' set for March 2018
Decscription = LOS ANGELES (AP) — "Wreck-It Ralph" is headed back to the arcade, and theaters, in a sequel planned for release on March 9, 2018. Co-directors Rich Moore and Phil Johnston announced the sequel to the 2012 animated film Thursday morning on Facebook Live.FILE - In this Oct. 29, 2012 file photo, Director Rich Moore arrives at the world premiere of "Wreck-It Ralph" at El Capitan Theatre in Los Angeles. “Wreck-It Ralph” is headed back to the arcade, and theaters, in a sequel planned for release on March 9, 2018. Co-directors Rich Moore and Phil Johnston announced the sequel to the 2012 animated film Thursday, June 30, 2016 on Facebook Live. (Photo by Jordan Strauss/Invision/AP)
Title = Scarlett Johansson ranked Hollywood's top-grossing actress
Decscription = Scarlett Johansson has taken the crown as Hollywood's highest-grossing actress ever.FILE - In this April 21, 2015, file photo, Scarlett Johansson poses for photographers upon arrival at the premiere for the film 'The Avengers Age of Ultron' in London. Box Office Mojo has crowned Johansson as Hollywood's highest grossing actress on a list updated June 29, 2016.(Photo by Joel Ryan/Invision/AP, File)
Title = HLN's Nancy Grace leaving her legal show
Decscription = NEW YORK (AP) — Tough-talking former prosecutor Nancy Grace is leaving her prime-time show on the HLN network in October.FILE - In this Friday, Oct. 21, 2014, file photo, television host Nancy Grace arrives at the 7th annual GLSEN Respect Awards in Beverly Hills, Calif. Grace is leaving her prime-time show on the HLN network in October 2016. The CNN sister station said Grace told her staff on Thursday, June 30, 2016 that her show would be ending after 12 years. An HLN spokeswoman said the network had no immediate announcement on what program would go in its place. (AP Photo/Matt Sayles, File)
Title = Moviegoers to Hollywood: It better be good
Decscription = NEW YORK (AP) — As Hollywood girds for a low-key Fourth of July box office weekend and watches its summer season dip 15 percent below last year's, an even more worrisome trend has taken shape: Moviegoers are growing pickier.FILE - This image released by Warner Bros. Entertainment shows Alexander Skarsgard from "The Legend of Tarzan." For films that aren’t “the movie to see,” moviegoers are increasingly staying home. With word-of-mouth traveling at the speed of Twitter, quality has become a more vital currency. (Jonathan Olley/Warner Bros. Entertainment via AP, File)
Title = 8 rescued after Oklahoma City roller coaster gets stuck
Decscription = OKLAHOMA CITY (AP) — No one was injured when a roller coaster at an Oklahoma City amusement park stalled out and stranded eight people, including seven children.
Title = Smallest national park? Kosciuszko, forgotten son of liberty
Decscription = PHILADELPHIA (AP) — If the hip-hop Broadway smash "Hamilton" can reignite interest in the first U.S. treasury secretary, what will it take to drum up interest in another forgotten hero from America's fight for independence?FILE - In this April 1, 2013 file photo a statue of Poland's General Thaddeus Kosciuszko is enveloped in the early morning fog in Lafayette Park across from the White House in Washington. Kosciuszko was a military engineer from Poland, Kosciuszko came to Philadelphia in August 1776 to offer his services in the fight against the British. (AP Photo/Jacquelyn Martin, File)
Title = Pregnant Alanis Morissette posts nude underwater photo
Decscription = Alanis Morissette has posted a nude photo of herself sporting a large baby bump while floating underwater.FILE - In this Nov. 22, 2015, file photo, Souleye, left, and Alanis Morissette arrive at the American Music Awards in Los Angeles. Morissette posted a nude photo of herself sporting a large baby bump while floating underwater on Instagram on June 28, 2016. (Photo by Jordan Strauss/Invision/AP, File)
Title = New Orleans ready to 'party with a purpose' at Essence Fest
Decscription = NEW ORLEANS (AP) — Music has always been at the heart of the annual Essence Festival, now in its 22nd year, and this year will be no different.FILE - In this Jan. 20, 2009, file photo, Mariah Carey performs at the Neighborhood Inaugural Ball in Washington. Music is at the heart of the annual Essence Festival in New Orleans, and this year is no different. Fans will get to hear from first-timers Mariah Carey, Puff Daddy and Jeremih as well as from festival veterans Charlie Wilson, Maxwell, New Edition, Tyrese and Lalah Hathaway - all of whom are scheduled to perform inside the Superdome Friday, July 1, 2016, through Sunday. (AP Photo/Alex Brandon, File)
Title = Alvin Toffler, author of 'Future Shock,' dead at 87
Decscription = NEW YORK (AP) — Alvin Toffler, a guru of the post-industrial age whose million-selling "Future Shock" and other books anticipated the disruptions and transformations brought about by the rise of digital technology, has died. He was 87.
Title = Theater shows R-rated comedy trailer with "Finding Dory"
Decscription = CONCORD, Calif. (AP) — The owner of a California movie theater is apologizing after a trailer for an R-rated upcoming Seth Rogen comedy was shown ahead of a screening of Disney's "Finding Dory."FILE - This undated file image released by Disney shows the character Dory, voiced by Ellen DeGeneres, in a scene from "Finding Dory." In its second week, “Finding Dory” easily remained on top with an estimated $73.2 million, according to studio estimates Sunday, June 26, 2016. (Pixar/Disney via AP, File)
Title = Christie's to sell contents of Reagans' LA home
Decscription = NEW YORK (AP) — A two-day auction of the contents of Ronald and Nancy Reagan's ranch-style home in California will include everything from personal mementos from heads of state and friends to objects the couple took with them to the White House.This undated photo provided by Christie's shows a needlepoint cushion given to Ronald Reagan for his 70th birthday in 1981. The pillow, which will be sold by Christie's New York during a two-day auction of the contents of Ronald and Nancy Reagan's ranch-style home in California, has a pre-sale estimate of $1,000-1,500. Christie’s announced Thursday, June 30, 2016, highlights of the Sept. 21-22 sale in New York City. (Christie's via AP) MANDATORY CREDIT
Title = Asian actors too busy to fret over Hollywood 'white-washing'
Decscription = TOKYO (AP) — The film world of Asia, known for producing Akira Kurosawa, Satyajit Ray, Brillante Mendoza and other greats, is too busy making movies of its own to fret much about the debate slamming Hollywood — the casting of white people in roles written for Asians.FILE - In this Sept. 5, 2007, file photo, Japanese actress Kaori Momoi poses during the photo call for the movie "Sukiyaki Western Django" at the 64th Venice Film Festival, in Venice, Italy. The film world of Asia is too busy making movies of its own to fret much about the debate slamming Hollywood - the casting of white people in roles written for Asians. Momoi, who appeared in “Memoirs of a Geisha,” as well as Russian filmmaker Aleksandr Sokurov’s “The Sun,” suggested acting was ultimately about individual talent, not skin color or nationality. (AP Photo/Andrew Medichini, File)
Title = Film academy invites 683 new members to join
Decscription = LOS ANGELES (AP) — Six months after announcing intentions to double the number of female and minority members in its ranks by 2020, the Academy of Motion Picture Arts and Sciences has invited 683 new members to join the organization.FILE - In this March 2, 2014 file photo, an Oscar statue is displayed at the Oscars at the Dolby Theatre in Los Angeles. Six months after announcing intentions to double the number of female and minority members in its ranks, the Academy of Motion Picture Arts and Sciences has invited 683 new members to join the organization. The academy says its invitees are 46 percent female, 41 percent minority and represent 59 countries.(Photo by Matt Sayles/Invision/AP, File)
Title = Miss Teen USA pageant replaces swimsuits with athletic wear
Decscription = LAS VEGAS (AP) — The Miss Teen USA pageant is dropping the swimsuit portion of its competition.
Title = YouTube personality charged with making false police report
Decscription = LOS ANGELES (AP) — A gay YouTube personality who said he was assaulted outside a West Hollywood club has been charged with filing a false police report and faking his injuries.This Wednesday, June 29, 2016, photo released by Los Angeles County Sheriff's Department shows Calum McSwiggan. The London-native gay YouTube personality who said he was assaulted outside a West Hollywood club has been charged with filing a false police report and faking his injuries. (Los Angeles County Sheriff's Department via AP) MANDATORY CREDIT
Title = Jesus Christ film coming to virtual reality
Decscription = LOS ANGELES (AP) — The story of Jesus Christ is coming to virtual reality for the first time.This undated photo provided by Autumn VR Inc. and VRWERX, LLC, shows a production still from "Jesus VR - The Story of Christ." The story of Jesus Christ is coming to virtual reality for the first time. Autumn Productions and VRWerx announced plans Wednesday, June 29, 2016, to release the live-action film on all major VR platforms this Christmas. (Autumn VR Inc. and VRWERX, LLC via AP)
Title = The Latest: Celebrities record tribute to nightclub victims
Decscription = ORLANDO, Fla. (AP) — The Latest on the mass shooting at a gay Orlando nightclub that left 49 people dead (all times local):
Title = The Latest: Golfer Bubba Watson plans to help flood victims
Decscription = CHARLESTON, W.Va. (AP) — The Latest on flooding that has devastated parts of West Virginia (all times local):
Title = Twitter dominated by tongue-in-cheek #HeterosexualPrideDay
Decscription = What appears to be a tongue-in-cheek social media movement to mark June 29 as a day to celebrate heterosexual pride has become one of the day's top online trends.
Title = Miss Teen USA axes 'outdated' bikini competition
Decscription = One of America's top beauty pageants has axed its swimsuit competition, ditching bikinis for sportswear to fend off years of complaints that parading in a bikini is sexist and demeaning. The Miss Universe Organization, which operates the pageant, said from now on contestants would be judged on athletic wear, in addition to the evening wear and personality competitions. "Miss Teen USA's transition to athletic wear reads as less exploitative and more focused on the importance of physical fitness for its younger participants," it said.Miss Teen USA 2016 Katherine Haik (R) congratulates Miss District of Columbia USA 2016 Deshauna Barber during the 2016 Miss USA pageant at T-Mobile Arena on June 5, 2016 in Las Vegas, Nevada
Title = Kayne West, Adidas expand partnership for Yeezy line
Decscription = LOS ANGELES (AP) — Rapper Kanye West and Adidas are expanding their partnership that began almost two years ago with retail hubs for his Yeezy products and additional sportswear designs.FILE - In this Aug. 30, 2015, file photo, Kanye West accepts the video vanguard award at the MTV Video Music Awards at the Microsoft Theater in Los Angeles. West and Adidas are expanding their partnership that began almost two years ago with retail hubs for his Yeezy products and additional sportswear designs. The sportswear company announced the collaboration on Wednesday, June 29, 2016, and described it as the most significant partnership between a non-athlete and an athletic brand. (Photo by Matt Sayles/Invision/AP, File)
You cannot collect every title first and then every description, because not every title has a description, although every description belongs to a title.
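One way around this, sketched under assumptions: walk the title and description elements together in document order and attach each description to the most recent title. The code assumes you already have a list of `(kind, text)` pairs in page order. With BeautifulSoup you could build that list from `soup.find_all([...])` with a list of tag names, which returns matches in document order, but the actual tag names depend on the page and are not given here. The function name `pair_titles` and the sample data are illustrative only (Python 3).

```python
def pair_titles(items):
    """Pair each title with the description that follows it, if any.

    items: iterable of (kind, text) tuples in document order, where kind
    is 'title' or 'description'. Titles without a description get None.
    A description that appears before any title is ignored.
    """
    pairs = []
    for kind, text in items:
        if kind == 'title':
            # Start a new record; its description is not known yet.
            pairs.append([text, None])
        elif kind == 'description' and pairs:
            # A description always belongs to the most recent title.
            pairs[-1][1] = text
    return [tuple(pair) for pair in pairs]


# Made-up sample in page order; the middle title has no description.
items = [
    ('title', 'Jesus Christ film coming to virtual reality'),
    ('description', 'The story of Jesus Christ is coming to virtual reality...'),
    ('title', 'Example title without a description'),
    ('title', 'Kayne West, Adidas expand partnership for Yeezy line'),
    ('description', 'Rapper Kanye West and Adidas are expanding their partnership...'),
]
for title, description in pair_titles(items):
    print(title, '->', description)
```

Because titles drive the loop, a title with no description still produces a row, which is exactly the case that breaks the "all titles, then all descriptions" approach.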

Writing and saving a CSV file from scraped data using Python and BeautifulSoup4

I am trying to scrape data from the PGA.com website to get a table of all of the golf courses in the United States. In my CSV table I want to include the name of the golf course, address, ownership, website and phone number. With this data I would like to geocode the courses, place them on a map and keep a local copy on my computer.
I used Python and Beautiful Soup 4 to extract my data. I have gotten as far as extracting the data from the website, but I am having difficulty writing the part of the script that exports the data to a CSV file with the fields I need.
Attached below is my script. I need help writing code that transfers the extracted data into a CSV file and saves it to my desktop.
Here is my script:
import csv
import requests
from bs4 import BeautifulSoup

url = "http://www.pga.com/golf-courses/search?searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

g_data1 = soup.find_all("div", {"class": "views-field-nothing-1"})
g_data2 = soup.find_all("div", {"class": "views-field-nothing"})

for item in g_data1:
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-counter"})[0].text)
    except (IndexError, AttributeError):
        pass  # field missing for this course
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-course-type"})[0].text)
    except (IndexError, AttributeError):
        pass

for item in g_data2:
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-title"})[0].text)
    except (IndexError, AttributeError):
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-address"})[0].text)
    except (IndexError, AttributeError):
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text)
    except (IndexError, AttributeError):
        pass
This is what I currently get when I run the script. I want to take this data and turn it into a CSV table for geocoding later.
1801 Merrimac Trl
Williamsburg, Virginia 23185-5905
12551 Glades Rd
Boca Raton, Florida 33498-6830
Preserve Golf Club
13601 SW 115th Ave
Dunnellon, Florida 34432-5621
1000 Acres Ranch Resort
465 Warrensburg Rd
Stony Creek, New York 12878-1613
1757 Golf Club
45120 Waxpool Rd
Dulles, Virginia 20166-6923
27 Pines Golf Course
5611 Silverdale Rd
Sturgeon Bay, Wisconsin 54235-8308
3 Creek Ranch Golf Club
2625 S Park Loop Rd
Jackson, Wyoming 83001-9473
3 Lakes Golf Course
6700 Saltsburg Rd
Pittsburgh, Pennsylvania 15235-2130
3 Par At Four Points
8110 Aero Dr
San Diego, California 92123-1715
3 Parks Fairways
3841 N Florence Blvd
Florence, Arizona 85132
3-30 Golf & Country Club
101 Country Club Lane
Lowden, Iowa 52255
401 Par Golf
5715 Fayetteville Rd
Raleigh, North Carolina 27603-4525
93 Golf Ranch
406 E 200 S
Jerome, Idaho 83338-6731
A 1 Golf Center
1805 East Highway 30
Rockwall, Texas 75087
A H Blank Municipal Course
808 County Line Rd
Des Moines, Iowa 50320-6706
A-Bar-A Ranch Golf Course
Highway 230
Encampment, Wyoming 82325
A-Ga-Ming Golf Resort, Sundance
627 Ag A Ming Dr
Kewadin, Michigan 49648-9397
A-Ga-Ming Golf Resort, Torch
627 Ag A Ming Dr
Kewadin, Michigan 49648-9397
A. C. Read Golf Club, Bayou
Bldg 3495, Nas Pensacola
Pensacola, Florida 32508
A. C. Read Golf Club, Bayview
Bldg 3495, Nas Pensacola
Pensacola, Florida 32508
All you really need to do here is collect your output in a list and then use the csv module to export it. I'm not entirely clear on what you are getting out of views-field-nothing-1, but to focus on just views-field-nothing, you could do something like:
courses_list = []
for item in g_data2:
    try:
        name = item.contents[1].find_all("div", {"class": "views-field-title"})[0].text
    except (IndexError, AttributeError):
        name = ''
    try:
        address1 = item.contents[1].find_all("div", {"class": "views-field-address"})[0].text
    except (IndexError, AttributeError):
        address1 = ''
    try:
        address2 = item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text
    except (IndexError, AttributeError):
        address2 = ''
    # One row per course, with '' for any field that was missing
    course = [name, address1, address2]
    courses_list.append(course)
This will put the courses in a list; next you can write them to a CSV file like so:
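import csv

# Python 2: open in binary mode for the csv module.
# On Python 3 use: open('filename.csv', 'w', newline='')
with open('filename.csv', 'wb') as f:
    writer = csv.writer(f)
    for row in courses_list:
        writer.writerow(row)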
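The question also asks how to save the file to the desktop. A minimal Python 3 sketch, assuming the desktop is the `Desktop` folder in your home directory (typical on Windows and macOS, but not guaranteed, for example with a redirected or localized desktop folder). The function name `save_courses`, the header names and the file name `golf_courses.csv` are made up for illustration:

```python
import csv
import os


def save_courses(rows, path=None):
    """Write course rows to a CSV file with a header row and return the path.

    rows: iterable of [name, address1, address2] lists.
    path: defaults to golf_courses.csv in ~/Desktop (assumed to exist).
    """
    if path is None:
        path = os.path.join(os.path.expanduser('~'), 'Desktop', 'golf_courses.csv')
    # newline='' is what the csv docs recommend; without it, Windows gets
    # an extra blank line between rows.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'address1', 'address2'])
        writer.writerows(rows)
    return path
```

With the list built above, `save_courses(courses_list)` writes the file and returns its full path, so you can print it to confirm where the file ended up.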
First, collect all of your items in a list and write them to the file afterwards, in case there is an error while you are scraping. Instead of printing each value, just append it to a list.
Then you can write the list to a CSV file:
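import csv

# main_list is the list you appended your rows to while scraping.
# Python 2 mode; on Python 3 use open('filename.csv', 'w', newline='')
with open('filename.csv', 'wb') as f:
    csv_writer = csv.writer(f)
    for i in main_list:
        csv_writer.writerow(i)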
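Building on the advice to append instead of print, here is a Python 3 sketch that appends one dict per course rather than a list, splits the city-state-zip text into separate columns to make geocoding easier, and writes the rows with `csv.DictWriter`, whose `restval` fills in any missing field. The column names, the regular expression (based only on the sample output above, e.g. "Raleigh, North Carolina 27603-4525") and the helper names `make_record`/`write_records` are assumptions, not anything PGA.com defines. `io.StringIO` stands in for a real file so the result is easy to inspect.

```python
import csv
import io
import re

# Hypothetical column names; city, state and zip are split out for geocoding.
FIELDS = ['name', 'address', 'city', 'state', 'zip']

# Assumed shape of the city-state-zip text, based on the output above:
# "City, State 12345" or "City, State 12345-6789".
CITY_STATE_ZIP = re.compile(
    r'^\s*(?P<city>[^,]+),\s*(?P<state>[A-Za-z .]+?)\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$'
)


def make_record(name, address, city_state_zip):
    """Build one row dict; unparsable city-state-zip text goes into 'city'."""
    record = {'name': name, 'address': address}
    match = CITY_STATE_ZIP.match(city_state_zip)
    if match:
        record.update(match.groupdict())
    elif city_state_zip:
        record['city'] = city_state_zip.strip()
    return record


def write_records(records, out):
    """Write dicts as CSV rows; keys missing from a record are written as ''."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval='')
    writer.writeheader()
    writer.writerows(records)


records = [
    make_record('1757 Golf Club', '45120 Waxpool Rd', 'Dulles, Virginia 20166-6923'),
    make_record('401 Par Golf', '5715 Fayetteville Rd', 'Raleigh, North Carolina 27603-4525'),
]
buf = io.StringIO()
write_records(records, buf)
print(buf.getvalue())
```

In the scraping loop you would call `records.append(make_record(name, address1, address2))` and pass a file opened with `open(path, 'w', newline='')` instead of the `StringIO` buffer. Keeping city, state and ZIP in their own columns usually makes the later geocoding step simpler than parsing one combined string.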
