Hey guys, so I got as far as being able to add the <a> element to a list. The problem is I just want the href link to be added to the links_with_text list and not the entire <a> element. What am I doing wrong?
from bs4 import BeautifulSoup
import requests
URL = "https://news.ycombinator.com"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='hnmain')
articles = results.find_all(class_="title")
links_with_text = []
for article in articles:
    link = article.find('a', href=True)
    links_with_text.append(link)
print('\n'.join(map(str, links_with_text)))
This prints the list exactly how I want it to print, but I just want the href from every <a> element, not the entire element. Thank you.
To get all links from https://news.ycombinator.com, you can use the CSS selector 'a.storylink'.
For example:
from bs4 import BeautifulSoup
import requests
URL = "https://news.ycombinator.com"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
links_with_text = []
for a in soup.select('a.storylink'):    # <-- find all <a> with class="storylink"
    links_with_text.append(a['href'])   # <-- note the ['href']
print(*links_with_text, sep='\n')
Prints:
https://blog.mozilla.org/futurereleases/2020/06/18/introducing-firefox-private-network-vpns-official-product-the-mozilla-vpn/
https://mxb.dev/blog/the-return-of-the-90s-web/
https://github.blog/2020-06-18-introducing-github-super-linter-one-linter-to-rule-them-all/
https://www.sciencemag.org/news/2018/11/why-536-was-worst-year-be-alive
https://www.strongtowns.org/journal/2020/6/16/do-the-math-small-projects
https://devblogs.nvidia.com/announcing-cuda-on-windows-subsystem-for-linux-2/
https://lwn.net/SubscriberLink/822568/61d29096a4012e06/
https://imil.net/blog/posts/2020/fakecracker-netbsd-as-a-function-based-microvm/
https://jepsen.io/consistency
https://tumblr.beesbuzz.biz/post/621010836277837824/advice-to-young-web-developers
https://archive.org/search.php?query=subject%3A%22The+Navy+Electricity+and+Electronics+Training+Series%22&sort=publicdate
https://googleprojectzero.blogspot.com/2020/06/ff-sandbox-escape-cve-2020-12388.html?m=1
https://apnews.com/1da061ce00eb531291b143ace0eed1c9
https://support.apple.com/library/content/dam/edam/applecare/images/en_US/appleid/android-apple-music-account-payment-none.jpg
https://standpointmag.co.uk/issues/may-june-2020/the-healing-power-of-birdsong/
https://steveblank.com/2020/06/18/the-coming-chip-wars-of-the-21st-century/
https://www.videolan.org/security/sb-vlc3011.html
https://onesignal.com/careers/2023b71d-2f44-4934-a33c-647855816903
https://www.bbc.com/news/world-europe-53006790
https://github.com/efficient/HOPE
https://everytwoyears.org/
https://www.historytoday.com/archive/natural-histories/intelligence-earthworms
https://cr.yp.to/2005-590/powerpc-cwg.pdf
https://quantum.country/
http://www.crystallography.net/cod/
https://parkinsonsnewstoday.com/2020/06/17/tiny-magnetically-powered-implant-may-be-future-of-deep-brain-stimulation/
https://spark.apache.org/releases/spark-release-3-0-0.html
https://arxiv.org/abs/1712.09624
https://www.washingtonpost.com/technology/2020/06/18/data-privacy-law-sherrod-brown/
https://blog.chromium.org/2020/06/improving-chromiums-browser.html
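If you would rather keep the structure of the code from the question, the minimal change is to append the tag's href attribute instead of the tag itself. Here is a sketch against the question's own code; note that the class names Hacker News uses may have changed since this was written:

from bs4 import BeautifulSoup
import requests
URL = "https://news.ycombinator.com"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='hnmain')
articles = results.find_all(class_="title")
links_with_text = []
for article in articles:
    link = article.find('a', href=True)
    if link is not None:                      # guard in case a .title cell has no <a>
        links_with_text.append(link['href'])  # keep only the href string
print('\n'.join(links_with_text))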
I am attempting to scrape the following web page
https://www.betexplorer.com/tennis/wta-singles/dubai/siniakova-katerina-kvitova-petra/6ZCipZ9h/#ha
I am fine with scraping the player names, the date, and the score; however, I am running into trouble when trying to scrape the match odds of the different bookmakers (listed in the table).
Here is what I attempted
from bs4 import BeautifulSoup
import requests
r = requests.get('https://www.betexplorer.com/tennis/wta-singles/dubai/siniakova-katerina-kvitova-petra/6ZCipZ9h/')
soup = BeautifulSoup(r.text,'html.parser')
odds = soup.find_all('td', attrs={'class': 'table-main__detail-odds table-main__detail-odds--first'})
print(odds)
[]
As you can see, nothing is being found.
Any ideas on this?
Thanks
The class you seem to be looking for is table-main__odds as per the page source.
For example:
from bs4 import BeautifulSoup
import requests
r = requests.get('https://www.betexplorer.com/tennis/wta-singles/dubai/siniakova-katerina-kvitova-petra/6ZCipZ9h/')
soup = BeautifulSoup(r.text, 'html.parser')
odds = [x.attrs for x in soup.find_all('td', attrs={'class': 'table-main__odds'})]
print(odds)
Output:
[{u'class': [u'table-main__odds'],
  u'data-odd': u'3.46',
  u'data-odd-max': u'3.90'},
 {u'class': [u'table-main__odds', u'colored']},
 {u'class': [u'table-main__odds'],
  u'data-odd': u'3.58',
  u'data-odd-max': u'3.92'},
 {u'class': [u'table-main__odds', u'colored']}]
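If the goal is the numeric odds themselves rather than the full attribute dictionaries, a small follow-up sketch, assuming the data-odd / data-odd-max attributes shown in the output above are still present on the page:

from bs4 import BeautifulSoup
import requests
r = requests.get('https://www.betexplorer.com/tennis/wta-singles/dubai/siniakova-katerina-kvitova-petra/6ZCipZ9h/')
soup = BeautifulSoup(r.text, 'html.parser')
# keep only the odds cells that actually carry a data-odd attribute
odds = [td['data-odd'] for td in soup.find_all('td', class_='table-main__odds')
        if td.has_attr('data-odd')]
print(odds)  # e.g. ['3.46', '3.58', ...]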
Again I am having trouble scraping hrefs with BeautifulSoup. I have a list of pages that I am scraping, and I get the data, but I can't seem to get the hrefs even when I use code that works in other scripts.
So here is the code, and my data is below it:
import requests
from bs4 import BeautifulSoup
with open('states_names.csv', 'r') as reader:
    states = [state.strip().replace(' ', '-') for state in reader]
url = 'https://www.hauntedplaces.org/state/alabama'
for state in states:
    page = requests.get(url + state)
    soup = BeautifulSoup(page.text, 'html.parser')
    links = soup.findAll('div', class_='description')
    # When I try to add .get('href') I get a traceback error. Am I trying to scrape the href too early?
    h_page = soup.findAll('h3')
<h3>Gaines Ridge Dinner Club</h3>
<h3>Purifoy-Lipscomb House</h3>
<h3>Kate Shepard House Bed and Breakfast</h3>
<h3>Cedarhurst Mansion</h3>
<h3>Crybaby Bridge</h3>
<h3>Gaineswood Plantation</h3>
<h3>Mountain View Hospital</h3>
This works perfectly:
from bs4 import BeautifulSoup
import requests
url = 'https://www.hauntedplaces.org/state/Alabama'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
for link in soup.select('div.description a'):
    print(link['href'])
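To plug that selector back into the loop from the question, something like the following should work. This is a sketch assuming states_names.csv holds one state name per line, as in the question; note the base URL here ends at /state/ so the state name can be appended cleanly:

from bs4 import BeautifulSoup
import requests
with open('states_names.csv', 'r') as reader:
    states = [line.strip().replace(' ', '-') for line in reader]
base_url = 'https://www.hauntedplaces.org/state/'
for state in states:
    page = requests.get(base_url + state)
    soup = BeautifulSoup(page.text, 'lxml')
    for link in soup.select('div.description a'):
        print(link['href'])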
Try this:
soup = BeautifulSoup(page.content, 'html.parser')
list0 = []
possible_links = soup.find_all('a')
for link in possible_links:
    if link.has_attr('href'):
        print(link.attrs['href'])
        list0.append(link.attrs['href'])
print(list0)
I am currently crawling a web page (https://www.klook.com/city/30-kyoto/?p=1) using Python 3.4 and bs4 in order to collect the deeplinks of the respective activities.
I found that the links are located in the html source like this:
<a class="j_activity_item_link" href="/activity/1031-arashiyama-rickshaw-tour-kyoto/" class="j_activity_item_link" data-card-tags="{}" data-sold-out="false" data-price="40.0" data-city-id="30" data-id="1031" data-url-seo="arashiyama-rickshaw-tour-kyoto">
But after several trials, this href="/activity/1031-arashiyama-rickshaw-tour-kyoto/" never shows up.
Here is my logic so far:
import requests
from bs4 import BeautifulSoup
user_agent = {'User-agent': 'Chrome/43.0.2357'}
for page in range(1, 6):
    r = requests.get("https://www.klook.com/city/30-kyoto" + "/?p=" + str(page))
    soup = BeautifulSoup(r.content, "lxml")
    g_data = soup.find_all("a", {"class": "j_activity_item_link"})
    for item in g_data:
        Deeplink = item.find_all("a")
        for t in Deeplink:
            print(t.get("href"))
Output:
Process finished with exit code 0
Could you guys help me out? Any feedback is appreciated.
Your "error" of error code 0 simply indicates that everything went ok with your run. According to your example, your list g_data should contain all of the a tags that you are interested in. You should not need the second for loop to again iterate through and find nested a tags. As a debugging step, print the length of your lists to ensure that they are not empty. See the following:
import requests
from bs4 import BeautifulSoup
user_agent = {'User-agent': 'Chrome/43.0.2357'}
for page in range(1, 6):
    r = requests.get("https://www.klook.com/city/30-kyoto" + "/?p=" + str(page))
    soup = BeautifulSoup(r.content, "lxml")
    g_data = soup.find_all("a", {"class": "j_activity_item_link"})
    for item in g_data:
        print(item.get("href"))
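To make that debugging step concrete, a minimal check could look like this (same request and class name as in the question):

import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.klook.com/city/30-kyoto/?p=1")
print(r.status_code)   # anything other than 200 points at the request, not the parsing
soup = BeautifulSoup(r.content, "lxml")
g_data = soup.find_all("a", {"class": "j_activity_item_link"})
print(len(g_data))     # 0 means the class name is not in the HTML that was returned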
You can first find the number of pages of activities, and then use regex with BeautifulSoup:
import re
import urllib.request
from bs4 import BeautifulSoup as soup
data = soup(urllib.request.urlopen('https://www.klook.com/city/30-kyoto/?p=1').read(), 'lxml')
page_numbers = [i.text for i in data.find_all('a', {'class': 'p_num '})]
activities = {1: [i['href'] for i in data.find_all('a', {'href': re.compile("^/activity/")})]}
for page in page_numbers:
    data = soup(urllib.request.urlopen('https://www.klook.com/city/30-kyoto/?p={}'.format(page)).read(), 'lxml')
    activities[int(page)] = [i['href'] for i in data.find_all('a', {'href': re.compile("^/activity/")})]
Output:
{1: ['/activity/1079-one-day-kimono-rental-kyoto/', '/activity/1032-higashiyama-rickshaw-tour-kyoto/', '/activity/6128-kyoto-seaside-day-tour-osaka/', '/activity/1540-hankyu-1-day-tourist-pass-osaka/', '/activity/1777-icoca-ic-card-kyoto/', '/activity/1541-kix-airport-limousine-bus-transfer-kyoto/', '/activity/1753-randen-kyoto-bus-subway-1-day-pass-kyoto/', '/activity/3260-sagano-romantic-train-ticket-kyoto/', '/activity/793-japanese-lzakaya-cooking-course-kyoto/', '/activity/882-nishiki-market-teramachi-street-kyoto/', '/activity/792-morning-bento-cooking-course-kyoto/', '/activity/2918-sushi-class-experience-kyoto/', '/activity/6032-ninja-kyoto-restaurant-labyrinth-kyoto/', '/activity/5215-garden-ryokan-nanzenji-yachiyo-kyoto/', '/activity/1079-one-day-kimono-rental-kyoto/', '/activity/3260-sagano-romantic-train-ticket-kyoto/', '/activity/675-wifi-device-japan-kyoto/', '/activity/1031-arashiyama-rickshaw-tour-kyoto/', '/activity/657-day-trip-hiroshima-miyajima-kyoto/', '/activity/4774-4G-wifi-kyoto/', '/activity/2826-gionya-kimono-rental-kyoto/', '/activity/1464-kyoto-tower-admission-ticket-kyoto/', '/activity/2249-sagano-romantic-train-ticket-kyoto/', '/activity/1777-icoca-ic-card-kyoto/', '/activity/1541-kix-airport-limousine-bus-transfer-kyoto/', '/activity/1540-hankyu-1-day-tourist-pass-osaka/', '/activity/3532-wifi-device-japan-kyoto/', '/activity/1753-randen-kyoto-bus-subway-1-day-pass-kyoto/', '/activity/1319-4g-wifi-device-kyoto/', '/activity/1447-wi-ho-japan-wifi-device-kyoto/', '/activity/3826-wifi-device-japan-kyoto/', '/activity/2699-japan-wifi-device-taiwan-kyoto/', '/activity/3652-wifi-device-singapore-kyoto/', '/activity/1122-wi-ho-japan-wifi-device-kyoto/', '/activity/719-japan-docomo-sim-card-kyoto/', '/activity/6128-kyoto-seaside-day-tour-osaka/', '/activity/6241-nanzen-ji-fushimi-inari-taisha-sagano-romantic-train-day-tour/', '/activity/5137-guenpin-fugu-restaurant-kyoto/'], 2: ['/activity/1079-one-day-kimono-rental-kyoto/', '/activity/1032-higashiyama-rickshaw-tour-kyoto/', '/activity/6128-kyoto-seaside-day-tour-osaka/', '/activity/1540-hankyu-1-day-tourist-pass-osaka/', '/activity/1777-icoca-ic-card-kyoto/', '/activity/1541-kix-airport-limousine-bus-transfer-kyoto/', '/activity/1753-randen-kyoto-bus-subway-1-day-pass-kyoto/', '/activity/3260-sagano-romantic-train-ticket-kyoto/', '/activity/793-japanese-lzakaya-cooking-course-kyoto/', '/activity/882-nishiki-market-teramachi-street-kyoto/', '/activity/792-morning-bento-cooking-course-kyoto/', '/activity/2918-sushi-class-experience-kyoto/', '/activity/6032-ninja-kyoto-restaurant-labyrinth-kyoto/', '/activity/5215-garden-ryokan-nanzenji-yachiyo-kyoto/', '/activity/6543-arashiyama-golden-pavilion-temple-todaiji-kobe-mosaic-day-tour-kyoto/', '/activity/5198-nanzenji-junsei-restaurant-kyoto/', '/activity/7877-hanami-kimono-rental-kyoto/', '/activity/793-japanese-lzakaya-cooking-course-kyoto/', '/activity/9915-kyoto-osaka-sightseeing-pass-kyoto-japan/', '/activity/883-geisha-districts-tour-kyoto/', '/activity/1097-gion-kimono-experience-kyoto/', '/activity/6032-ninja-kyoto-restaurant-labyrinth-kyoto/', '/activity/792-morning-bento-cooking-course-kyoto/', '/activity/9272-4g-data-daijobu-sim-card-kyoto/', '/activity/871-sake-brewery-visit-fushimi-inari-shrine-kyoto/', '/activity/5979-tower-terrace-kyoto/', '/activity/632-kyoto-backstreet-cycling/', '/activity/646-kyoto-afternoon-exploration/', '/activity/640-kyoto-morning-sightseeing/', '/activity/872-arashiyama-bamboo-forest-half-day-tour-kyoto/', 
'/activity/5272-mukadeya-kyoto/', '/activity/6081-one-night-in-kyoto/', '/activity/2918-sushi-class-experience-kyoto/', '/activity/1032-higashiyama-rickshaw-tour-kyoto/', '/activity/5445-kimono-photo-shoot-kyoto/', '/activity/5215-garden-ryokan-nanzenji-yachiyo-kyoto/', '/activity/882-nishiki-market-teramachi-street-kyoto/', '/activity/7096-japan-prepaid-sim-card-kyoto/'], 3: ['/activity/1079-one-day-kimono-rental-kyoto/', '/activity/1032-higashiyama-rickshaw-tour-kyoto/', '/activity/6128-kyoto-seaside-day-tour-osaka/', '/activity/1540-hankyu-1-day-tourist-pass-osaka/', '/activity/1777-icoca-ic-card-kyoto/', '/activity/1541-kix-airport-limousine-bus-transfer-kyoto/', '/activity/1753-randen-kyoto-bus-subway-1-day-pass-kyoto/', '/activity/3260-sagano-romantic-train-ticket-kyoto/', '/activity/793-japanese-lzakaya-cooking-course-kyoto/', '/activity/882-nishiki-market-teramachi-street-kyoto/', '/activity/792-morning-bento-cooking-course-kyoto/', '/activity/2918-sushi-class-experience-kyoto/', '/activity/6032-ninja-kyoto-restaurant-labyrinth-kyoto/', '/activity/5215-garden-ryokan-nanzenji-yachiyo-kyoto/', '/activity/5271-itoh-dining-kyoto/', '/activity/9094-sagano-sightseeing-carriage-tour-kyoto/', '/activity/8192-japan-sim-card-taiwan-airport-pickup-kyoto/', '/activity/8420-south-korea-wifi-device-kyoto/', '/activity/8644-rock-climbing-at-kyoto-konpirayama-kyoto /', '/activity/9934-3g-4g-wifi-mnl-pick-up-delivery-for-japan-kyoto/', '/activity/8966-donburi-cooking-course-and-nishiki-market-tour-kyoto/', '/activity/9215-arashiyama-kyoto-food-drink-half-day-tour/']}
I'm trying to scrape all the URLs from the Amazon categories website (https://www.amazon.com/gp/site-directory/ref=nav_shopall_btn),
but I can only get the first URL of each category. For example, from "Amazon Video" I only get the first link ("All videos") and not the rest ("Fire TV", etc.).
That is my code:
from bs4 import BeautifulSoup
import requests
url = "https://www.amazon.es/gp/site-directory/ref=nav_shopall_btn"
amazon_link = requests.get(url)
html = BeautifulSoup(amazon_link.text, "html.parser")
categorias_amazon = html.find_all('div', {'class': 'popover-grouping'})
for i in range(len(categorias_amazon)):
    print("www.amazon.es" + categorias_amazon[i].a['href'])
I have tried with:
print("www.amazon.es" + categorias_amazon[i].find_all['a'])
but I get an error. I am looking to get the href attribute of every subcategory.
You can try this code:
from bs4 import BeautifulSoup
import requests
url = "https://www.amazon.es/gp/site-directory/ref=nav_shopall_btn"
amazon_link = requests.get(url)
html = BeautifulSoup(amazon_link.text, "html.parser")
# print(html)
categorias_amazon = html.find_all('div', {'class': 'popover-grouping'})
allurls = html.select("div.popover-grouping [href]")
values = [link['href'].strip() for link in allurls]
for value in values:
    print("www.amazon.es" + value)
It will print:
www.amazon.es/b?ie=UTF8&node=1748200031
www.amazon.es/gp/dmusic/mp3/player
www.amazon.es/b?ie=UTF8&node=2133385031
www.amazon.es/clouddrive/primephotos
www.amazon.es/clouddrive/home
www.amazon.es/clouddrive/home#download-section
www.amazon.es/clouddrive?_encoding=UTF8&sf=1
www.amazon.es/dp/B0186FET66
www.amazon.es/dp/B00QJDO0QC
www.amazon.es/dp/B00IOY524S
www.amazon.es/dp/B010EK1GOE
www.amazon.es/b?ie=UTF8&node=827234031
www.amazon.es/ebooks-kindle/b?ie=UTF8&node=827231031
www.amazon.es/gp/kindle/ku/sign-up/
www.amazon.es/b?ie=UTF8&node=8504981031
www.amazon.es/gp/digital/fiona/kcp-landing-page
www.amazon.eshttps://www.amazon.es:443/gp/redirect.html?location=https://leer.amazon.es/&token=CA091C61DBBA8A5C0F6E4A46ED30C059164DBC74&source=standards
www.amazon.es/gp/digital/fiona/manage
www.amazon.es/dp/B00ZDWLEEG
www.amazon.es/dp/B00IRKMZX0
www.amazon.es/dp/B01AHBC23E
www.amazon.es/b?ie=UTF8&node=827234031
www.amazon.es/mobile-apps/b?ie=UTF8&node=1661649031
www.amazon.es/b?ie=UTF8&node=1726755031
www.amazon.es/b?ie=UTF8&node=1748200031
www.amazon.es/ebooks-kindle/b?ie=UTF8&node=827231031
www.amazon.es/gp/digital/fiona/manage
www.amazon.es/b?ie=UTF8&node=10909716031
www.amazon.es/b?ie=UTF8&node=10909718031
www.amazon.es/b?ie=UTF8&node=10909719031
www.amazon.es/b?ie=UTF8&node=10909720031
www.amazon.es/b?ie=UTF8&node=10909721031
www.amazon.es/b?ie=UTF8&node=10909722031
www.amazon.es/b?ie=UTF8&node=8464150031
www.amazon.es/mobile-apps/b?ie=UTF8&node=1661649031
www.amazon.es/b?ie=UTF8&node=1726755031
www.amazon.es/b?ie=UTF8&node=4622953031
www.amazon.es/gp/feature.html?ie=UTF8&docId=1000658923
www.amazon.es/gp/mas/your-account/myapps
www.amazon.es/comprar-libros-espa%C3%B1ol/b?ie=UTF8&node=599364031
www.amazon.es/ebooks-kindle/b?ie=UTF8&node=827231031
www.amazon.es/gp/kindle/ku/sign-up/
www.amazon.es/Libros-en-ingl%C3%A9s/b?ie=UTF8&node=665418031
www.amazon.es/Libros-en-otros-idiomas/b?ie=UTF8&node=599367031
www.amazon.es/b?ie=UTF8&node=902621031
www.amazon.es/libros-texto/b?ie=UTF8&node=902673031
www.amazon.es/Blu-ray-DVD-peliculas-series-3D/b?ie=UTF8&node=599379031
www.amazon.es/series-tv-television-DVD-Blu-ray/b?ie=UTF8&node=665293031
www.amazon.es/Blu-ray-peliculas-series-3D/b?ie=UTF8&node=665303031
www.amazon.es/M%C3%BAsica/b?ie=UTF8&node=599373031
www.amazon.es/b?ie=UTF8&node=1748200031
www.amazon.es/musical-instruments/b?ie=UTF8&node=3628866031
www.amazon.es/fotografia-videocamaras/b?ie=UTF8&node=664660031
www.amazon.es/b?ie=UTF8&node=931491031
www.amazon.es/tv-video-home-cinema/b?ie=UTF8&node=664659031
www.amazon.es/b?ie=UTF8&node=664684031
www.amazon.es/gps-accesorios/b?ie=UTF8&node=664661031
www.amazon.es/musical-instruments/b?ie=UTF8&node=3628866031
www.amazon.es/accesorios/b?ie=UTF8&node=928455031
www.amazon.es/Inform%C3%A1tica/b?ie=UTF8&node=667049031
www.amazon.es/Electr%C3%B3nica/b?ie=UTF8&node=599370031
www.amazon.es/portatiles/b?ie=UTF8&node=938008031
www.amazon.es/tablets/b?ie=UTF8&node=938010031
www.amazon.es/ordenadores-sobremesa/b?ie=UTF8&node=937994031
www.amazon.es/componentes/b?ie=UTF8&node=937912031
www.amazon.es/b?ie=UTF8&node=2457643031
www.amazon.es/b?ie=UTF8&node=2457641031
www.amazon.es/Software/b?ie=UTF8&node=599376031
www.amazon.es/pc-videojuegos-accesorios-mac/b?ie=UTF8&node=665498031
www.amazon.es/Inform%C3%A1tica/b?ie=UTF8&node=667049031
www.amazon.es/material-oficina/b?ie=UTF8&node=4352791031
www.amazon.es/productos-papel-oficina/b?ie=UTF8&node=4352794031
www.amazon.es/boligrafos-lapices-utiles-escritura/b?ie=UTF8&node=4352788031
www.amazon.es/electronica-oficina/b?ie=UTF8&node=4352790031
www.amazon.es/oficina-papeleria/b?ie=UTF8&node=3628728031
www.amazon.es/videojuegos-accesorios-consolas/b?ie=UTF8&node=599382031
www.amazon.es/b?ie=UTF8&node=665290031
www.amazon.es/pc-videojuegos-accesorios-mac/b?ie=UTF8&node=665498031
www.amazon.es/b?ie=UTF8&node=8490963031
www.amazon.es/b?ie=UTF8&node=1381541031
www.amazon.es/Juguetes-y-juegos/b?ie=UTF8&node=599385031
www.amazon.es/bebe/b?ie=UTF8&node=1703495031
www.amazon.es/baby-reg/homepage
www.amazon.es/gp/family/signup
www.amazon.es/b?ie=UTF8&node=2181872031
www.amazon.es/b?ie=UTF8&node=3365351031
www.amazon.es/bano/b?ie=UTF8&node=3244779031
www.amazon.es/b?ie=UTF8&node=1354952031
www.amazon.es/iluminacion/b?ie=UTF8&node=3564289031
www.amazon.es/pequeno-electrodomestico/b?ie=UTF8&node=2165363031
www.amazon.es/aspiracion-limpieza-planchado/b?ie=UTF8&node=2165650031
www.amazon.es/almacenamiento-organizacion/b?ie=UTF8&node=3359926031
www.amazon.es/climatizacion-calefaccion/b?ie=UTF8&node=3605952031
www.amazon.es/Hogar/b?ie=UTF8&node=599391031
www.amazon.es/herramientas-electricas-mano/b?ie=UTF8&node=3049288031
www.amazon.es/Cortacespedes-Tractores-Jardineria/b?ie=UTF8&node=3249445031
www.amazon.es/instalacion-electrica/b?ie=UTF8&node=3049284031
www.amazon.es/accesorios-cocina-bano/b?ie=UTF8&node=3049286031
www.amazon.es/seguridad/b?ie=UTF8&node=3049292031
www.amazon.es/Bricolaje-Herramientas-Fontaneria-Ferreteria-Jardineria/b?ie=UTF8&node=2454133031
www.amazon.es/Categorias/b?ie=UTF8&node=6198073031
www.amazon.es/b?ie=UTF8&node=6348071031
www.amazon.es/Categorias/b?ie=UTF8&node=6198055031
www.amazon.es/b?ie=UTF8&node=12300685031
www.amazon.es/Salud-y-cuidado-personal/b?ie=UTF8&node=3677430031
www.amazon.es/Suscribete-Ahorra/b?ie=UTF8&node=9699700031
www.amazon.es/Amazon-Pantry/b?ie=UTF8&node=10547412031
www.amazon.es/moda-mujer/b?ie=UTF8&node=5517558031
www.amazon.es/moda-hombre/b?ie=UTF8&node=5517557031
www.amazon.es/moda-infantil/b?ie=UTF8&node=5518995031
www.amazon.es/bolsos-mujer/b?ie=UTF8&node=2007973031
www.amazon.es/joyeria/b?ie=UTF8&node=2454126031
www.amazon.es/relojes/b?ie=UTF8&node=599388031
www.amazon.es/equipaje/b?ie=UTF8&node=2454129031
www.amazon.es/gp/feature.html?ie=UTF8&docId=12464607031
www.amazon.es/b?ie=UTF8&node=8520792031
www.amazon.es/running/b?ie=UTF8&node=2928523031
www.amazon.es/fitness-ejercicio/b?ie=UTF8&node=2928495031
www.amazon.es/ciclismo/b?ie=UTF8&node=2928487031
www.amazon.es/tenis-padel/b?ie=UTF8&node=2985165031
www.amazon.es/golf/b?ie=UTF8&node=2928503031
www.amazon.es/deportes-equipo/b?ie=UTF8&node=2975183031
www.amazon.es/deportes-acuaticos/b?ie=UTF8&node=2928491031
www.amazon.es/deportes-invierno/b?ie=UTF8&node=2928493031
www.amazon.es/Tiendas-campa%C3%B1a-Sacos-dormir-Camping/b?ie=UTF8&node=2928471031
www.amazon.es/deportes-aire-libre/b?ie=UTF8&node=2454136031
www.amazon.es/ropa-calzado-deportivo/b?ie=UTF8&node=2975170031
www.amazon.es/calzado-deportivo/b?ie=UTF8&node=2928484031
www.amazon.es/electronica-dispositivos-el-deporte/b?ie=UTF8&node=2928496031
www.amazon.es/Coche-y-moto/b?ie=UTF8&node=1951051031
www.amazon.es/b?ie=UTF8&node=2566955031
www.amazon.es/gps-accesorios/b?ie=UTF8&node=664661031
www.amazon.es/Motos-accesorios-piezas/b?ie=UTF8&node=2425161031
www.amazon.es/industrial-cientfica/b?ie=UTF8&node=5866088031
www.amazon.es/b?ie=UTF8&node=6684191031
www.amazon.es/b?ie=UTF8&node=6684193031
www.amazon.es/b?ie=UTF8&node=6684192031
www.amazon.es/handmade/b?ie=UTF8&node=9699482031
www.amazon.es/b?ie=UTF8&node=10740508031
www.amazon.es/b?ie=UTF8&node=10740511031
www.amazon.es/b?ie=UTF8&node=10740559031
www.amazon.es/b?ie=UTF8&node=10740502031
www.amazon.es/b?ie=UTF8&node=10740505031
Hope this is what you were looking for.
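As an aside, the error in the question comes from calling find_all with square brackets; with parentheses and a nested loop over each grouping it works as well. A sketch of that variant:

from bs4 import BeautifulSoup
import requests
url = "https://www.amazon.es/gp/site-directory/ref=nav_shopall_btn"
html = BeautifulSoup(requests.get(url).text, "html.parser")
for grouping in html.find_all('div', {'class': 'popover-grouping'}):
    for a in grouping.find_all('a', href=True):   # find_all(...), not find_all[...]
        print("www.amazon.es" + a['href'])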
Do you want to scrap it or scrape it? If it's the latter, what about this?
from BeautifulSoup import BeautifulSoup
import urllib2
html_page = urllib2.urlopen("https://www.amazon.es/gp/site-directory/ref=nav_shopall_btn")
soup = BeautifulSoup(html_page)
for link in soup.findAll('a'):
    print link.get('href')
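That last snippet is written for Python 2 (BeautifulSoup 3 and urllib2). A rough Python 3 / bs4 equivalent, if that is what you are running, would be:

from bs4 import BeautifulSoup
from urllib.request import urlopen
html_page = urlopen("https://www.amazon.es/gp/site-directory/ref=nav_shopall_btn")
soup = BeautifulSoup(html_page, 'html.parser')
for link in soup.find_all('a', href=True):
    print(link['href'])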