I am scraping nykaa.com; the link is (https://www.nykaa.com/skin/moisturizers/serums-essence/c/8397?root=nav_3&page_no=1). There are 25 pages, and the data for each page loads dynamically. I am unable to find the source of the data. Moreover, when I scrape the data I only get the same 20 products repeated, so the list grows to 420 redundant entries.
import requests
from bs4 import BeautifulSoup
import unicodecsv as csv
urls = []
l1 = []
for page in range(1,5):
    result = requests.get("https://www.nykaa.com/skin/moisturizers/serums-essence/c/8397?root=nav_3&page_no=" + str(page))
    src = result.content
    soup = BeautifulSoup(src,'lxml')
    for div_tag in soup.find_all("div", class_ = "card-wrapper-container col-xs-12 col-sm-6 col-md-4"):
        for div1_tag in soup.find_all("div", class_ = "product-list-box card desktop-cart"):
            h2_tag = div1_tag.find("h2").find("span")
            price_tag = div1_tag.find("div", class_ = "price-info")
            l1 = [h2_tag.get_text(),price_tag.get_text()]
            urls.append(l1)
#print(urls)
with open('xyz.csv', 'wb') as myfile:
    wr = csv.writer(myfile)
    wr.writerows(urls)
The above code fetches a list of around 1200 product names and prices, of which only 30 to 40 are unique; the rest are duplicates. I want to fetch the data of all 25 pages without duplicates; there are 486 unique products in total. I also tried using selenium to click the next-page link, but that didn't work either.
This makes the same request the page itself makes (as seen in the browser's network tab) in a loop over all pages, including determining the number of pages. results is a list of lists that you can easily write to csv.
import requests, math, csv
page = '1'
def append_new_rows(data):
    for i in data:
        if 'name' in i:
            results.append([i['name'], i['final_price']])
with requests.Session() as s:
    r = s.get(f'https://www.nykaa.com/gludo/products/list?pro=false&filter_format=v2&app_version=null&client=react&root=nav_3&page_no={page}&category_id=8397').json()
    results_per_page = 20
    total_results = r['response']['total_found']
    num_pages = math.ceil(total_results/results_per_page)
    results = []
    append_new_rows(r['response']['products'])
    for page in range(2, num_pages + 1):
        r = s.get(f'https://www.nykaa.com/gludo/products/list?pro=false&filter_format=v2&app_version=null&client=react&root=nav_3&page_no={page}&category_id=8397').json()
        append_new_rows(r['response']['products'])
with open("data.csv", "w", encoding="utf-8-sig", newline='') as csv_file:
    w = csv.writer(csv_file, delimiter = ",", quoting=csv.QUOTE_MINIMAL)
    w.writerow(['Name','Price'])
    for row in results:
        w.writerow(row)
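The page-count arithmetic and the row collection can be checked offline. The following is a minimal sketch, not the site's API: the fake_pages payloads are made-up stand-ins that only mimic the response / products shape used above, and page_count and collect_unique_rows are hypothetical helpers. Dropping repeated (name, price) pairs is an assumption about what "unique" means in the question; distinct variants that share both values would be merged.

```python
import math


def page_count(total_found, per_page=20):
    """Number of pages needed to cover total_found results."""
    return math.ceil(total_found / per_page)


def collect_unique_rows(payloads):
    """Build [name, price] rows from parsed JSON payloads, skipping repeats."""
    seen = set()
    rows = []
    for payload in payloads:
        for product in payload['response']['products']:
            if 'name' not in product:
                continue  # same guard as in append_new_rows
            key = (product['name'], product['final_price'])
            if key not in seen:
                seen.add(key)
                rows.append(list(key))
    return rows


# Made-up payloads for illustration only; real responses carry more fields.
fake_pages = [
    {'response': {'total_found': 3, 'products': [
        {'name': 'Serum A', 'final_price': 100},
        {'name': 'Serum B', 'final_price': 200},
    ]}},
    {'response': {'total_found': 3, 'products': [
        {'name': 'Serum B', 'final_price': 200},  # repeated entry, dropped
        {'id': 'entry-without-name'},             # skipped: no 'name' key
        {'name': 'Serum C', 'final_price': 300},
    ]}},
]

rows = collect_unique_rows(fake_pages)
print(page_count(486))  # 486 products at 20 per page
print(rows)
```

With the figures from the question, 486 products at 20 per page gives ceil(24.3) = 25 pages, which matches the 25 pages the site shows.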
You can use selenium:
from bs4 import BeautifulSoup as soup
from selenium import webdriver
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://www.nykaa.com/skin/moisturizers/serums-essence/c/8397')
def get_products(_d):
    return [{'title':(lambda x:x if not x else x.text)(i.find('div', {'class':'m-content__product-list__title'})), 'price':(lambda x:x if not x else x.text)(i.find('span', {'class':'post-card__content-price-offer'}))} for i in _d.find_all('div', {'class':'card-wrapper-container col-xs-12 col-sm-6 col-md-4'})]
s = soup(d.page_source, 'html.parser')
r = [list(filter(None, get_products(s)))]
while 'disable-event' not in s.find('li', {'class':'next'}).attrs['class']:
    d.get(f"https://www.nykaa.com{s.find('li', {'class':'next'}).a['href']}")
    s = soup(d.page_source, 'html.parser')
    r.append(list(filter(None, get_products(s))))
Sample output (first three pages):
[[{'title': 'The Face Shop Calendula Essential Moisture Serum', 'price': '₹1320 '}, {'title': 'Palmers Cocoa Butter Formula Skin Perfecting Ultra Hydrating...', 'price': '₹970 '}, {'title': "Cheryl's Cosmeceuticals Clarifi Acne Anti Blemish Serum", 'price': '₹875 '}, {'title': 'Estee Lauder Advanced Night Repair Synchronized Recovery Com...', 'price': '₹1250 '}, {'title': 'Estee Lauder Advanced Night Repair Synchronized Recovery Com...', 'price': '₹1250 '}, {'title': 'Estee Lauder Advanced Night Repair Synchronized Recovery Com...', 'price': '₹3900 '}, {'title': 'Klairs Freshly Juiced Vitamin Drop', 'price': '₹1492 '}, {'title': 'Innisfree The Green Tea Seed Serum', 'price': '₹1950 '}, {'title': "Kiehl's Midnight Recovery Concentrate", 'price': '₹2100 '}, {'title': 'The Face Shop White Seed Brightening Serum', 'price': '₹1990 '}, {'title': 'Biotique Bio Dandelion Visibly Ageless Serum', 'price': '₹230 '}, {'title': None, 'price': None}, {'title': 'St.Botanica Vitamin C 20% + Vitamin E & Hyaluronic Acid Faci...', 'price': '₹1499 '}, {'title': 'Biotique Bio Coconut Whitening & Brightening Cream', 'price': '₹199 '}, {'title': 'Neutrogena Fine Fairness Brightening Serum', 'price': '₹849 '}, {'title': "Kiehl's Clearly Corrective Dark Spot Solution", 'price': '₹4300 '}, {'title': "Kiehl's Clearly Corrective Dark Spot Solution", 'price': '₹4300 '}, {'title': 'Lakme Absolute Perfect Radiance Skin Lightening Serum', 'price': '₹960 '}, {'title': 'St.Botanica Hyaluronic Acid + Vitamin C, E Facial Serum', 'price': '₹1499 '}, {'title': 'Jeva Vitamin C Serum with Hyaluronic Acid for Anti Aging and...', 'price': '₹350 '}, {'title': 'Lotus Professional Phyto-Rx Whitening & Brightening Serum', 'price': '₹595 '}], [{'title': 'The Face Shop Chia Seed Moisture Recharge Serum', 'price': '₹1890 '}, {'title': 'Lotus Herbals WhiteGlow Skin Whitening & Brightening Gel Cre...', 'price': '₹280 '}, {'title': 'Lakme 9 to 5 Naturale Aloe Aqua Gel', 'price': '₹200 '}, {'title': 'Estee Lauder 
Advanced Night Repair Synchronized Recovery Com...', 'price': '₹5900 '}, {'title': 'Mixify Unloc Skin Glow Serum', 'price': '₹499 '}, {'title': 'St.Botanica Retinol 2.5% + Vitamin E & Hyaluronic Acid Profe...', 'price': '₹1499 '}, {'title': 'LANEIGE Hydration Combo Set', 'price': '₹3000 '}, {'title': 'Biotique Bio Dandelion Ageless Visiblly Serum', 'price': '₹690 '}, {'title': 'The Moms Co. Natural Vita Rich Face Serum', 'price': '₹699 '}, {'title': "It's Skin Power 10 Formula VC Effector", 'price': '₹950 '}, {'title': "Kiehl's Powerful-Strength Line-Reducing Concentrate", 'price': '₹5100 '}, {'title': 'Olay Natural White Light Instant Glowing Fairness Skin Cream', 'price': '₹99 '}, {'title': 'Plum Green Tea Skin Clarifying Concentrate', 'price': '₹881 '}, {'title': 'Olay Total Effects 7 In One Anti-Ageing Smoothing Serum', 'price': '₹764 '}, {'title': 'Elizabeth Arden Ceramide Daily Youth Restoring Serum 60 Caps...', 'price': '₹5850 '}, {'title': None, 'price': None}, {'title': 'Olay Regenerist Advanced Anti-Ageing Micro-Sculpting Serum', 'price': '₹1699 '}, {'title': 'Lakme Absolute Argan Oil Radiance Overnight Oil-in-Serum', 'price': '₹945 '}, {'title': 'The Face Shop Mango Seed Silk Moisturizing Emulsion', 'price': '₹1890 '}, {'title': 'The Face Shop Calendula Essential Good to Glow Combo', 'price': '₹2557 '}, {'title': 'Garnier Skin Naturals Light Complete Serum Cream', 'price': '₹69 '}], [{'title': 'Clinique Moisture Surge Hydrating Supercharged Concentrate', 'price': '₹2550 '}, {'title': 'LANEIGE Sleeping Mask Combo', 'price': '₹3000 '}, {'title': 'Klairs Rich Moist Soothing Serum', 'price': '₹1492 '}, {'title': 'Estee Lauder Idealist Pore Minimizing Skin Refinisher', 'price': '₹5500 '}, {'title': 'O3+ Whitening & Brightening Serum', 'price': '₹1475 '}, {'title': 'Elizabeth Arden Ceramide Daily Youth Restoring Serum 90 Caps...', 'price': '₹6900 '}, {'title': 'Olay Natural White Light Instant Glowing Fairness Skin Cream', 'price': '₹189 '}, {'title': "L'Oreal 
Paris White Perfect Clinical Expert Anti-Spot Whiten...", 'price': '₹1480 '}, {'title': 'belif Travel Kit', 'price': '₹1499 '}, {'title': 'Forest Essentials Advanced Soundarya Serum With 24K Gold', 'price': '₹3975 '}, {'title': "L'Occitane Immortelle Reset Serum", 'price': '₹4500 '}, {'title': 'Lakme Absolute Skin Gloss Reflection Serum 30ml', 'price': '₹990 '}, {'title': 'Neutrogena Hydro Boost Emulsion', 'price': '₹999 '}, {'title': 'Innisfree Anti-Aging Set', 'price': '₹2350 '}, {'title': 'Clinique Fresh Pressed 7-Day System With Pure Vitamin C', 'price': '₹2400 '}, {'title': 'The Face Shop The Therapy Premier Serum', 'price': '₹2490 '}, {'title': 'The Body Shop Vitamin E Overnight Serum In Oil', 'price': '₹1695 '}, {'title': 'Jeva Vitamin C Serum with Hyaluronic Acid for Anti Aging and...', 'price': '₹525 '}, {'title': 'Olay Regenerist Micro Sculpting Cream and White Radiance Hyd...', 'price': '₹2698 '}, {'title': 'The Face Shop Yehwadam Pure Brightening Serum', 'price': '₹4350 '}]]
Related
I am trying to join two lists and export as csv, but when the csv is built, it's all messed up with spaces that I don't know the origin of and with the first line strangely duplicated as you can see in the attached image.
import csv
amazondata = [{'amzlink': 'https://www.amazon.com/dp/B084ZZ7VY3', 'asin': 'B084ZZ7VY3', 'url': 'https://www.amazon.com/s?k=712145360504&s=review-rank', 'title': '100% Non-GMO Berberine HCL Complex Supplement - Supports Gut, Heart, and Immune System Health- Harvested in The Himalayas, Helps Regulate Blood Sugar & Cholesterol, 100% Free of Additives & Allergens', 'price': '$14.95', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/81D1P4QqLfL._AC_SX425_.jpg', 'rank': 'Best Sellers Rank: #194,130 in Health & Household (See Top 100 in Health & Household)\n#6,896 in Blended Vitamin & Mineral Supplements', 'rating': '4.7 out of 5'}, {'amzlink': 'https://www.amazon.com/dp/B000NRTWS6', 'asin': 'B000NRTWS6', 'url': 'https://www.amazon.com/s?k=753950000698&s=review-rank'}, {'amzlink': 'https://www.amazon.com/dp/B07XM9P4C4', 'asin': 'B07XM9P4C4', 'url': 'https://www.amazon.com/s?k=753950005266&s=review-rank'}, {'amzlink': 'https://www.amazon.com/dp/B08KJ1VQJD', 'asin': 'B08KJ1VQJD', 'url': 'https://www.amazon.com/s?k=753950005242&s=review-rank'}, {'amzlink': 'https://www.amazon.com/dp/B005P0VD4W', 'asin': 'B005P0VD4W', 'url': 'https://www.amazon.com/s?k=043292560180&s=review-rank'}, {'amzlink': 'https://www.amazon.com/dp/B008FCJTES', 'asin': 'B008FCJTES', 'url': 'https://www.amazon.com/s?k=311845053213&s=review-rank'}]
amazonPage = [{'title': '100% Non-GMO Berberine HCL Complex Supplement - Supports Gut, Heart, and Immune System Health- Harvested in The Himalayas, Helps Regulate Blood Sugar & Cholesterol, 100% Free of Additives & Allergens', 'price': '$14.95', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/81D1P4QqLfL._AC_SX425_.jpg', 'rank': 'Best Sellers Rank: #194,130 in Health & Household (See Top 100 in Health & Household)\n#6,896 in Blended Vitamin & Mineral Supplements', 'rating': '4.7 out of 5'}, {'title': "Doctor's Best High Absorption CoQ10 with BioPerine, Vegan, Gluten Free, Naturally Fermented, Heart Health & Energy Production, 100 mg 60 Veggie Caps", 'price': '$14.24 ($0.24 / Count)', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/71sNP5u1N1S._AC_SX425_.jpg', 'rank': 'Rank not found', 'rating': 'Rating not found'}, {'title': "Doctor's Best CoQ10 Gummies 200 Mg, Coenzyme Q10 (Ubiquinone), Supports Heart Health, Boost Cellular Energy, Potent Antioxidant, 60 Ct (Packaging May Vary)", 'price': '$19.96 ($0.33 / Count)', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/71NIwP2V2uL._AC_SY450_.jpg', 'rank': 'Best Sellers Rank: #20,634 in Health & Household (See Top 100 in Health & Household)\n#73 in CoQ10 Nutritional Supplements', 'rating': '4.7 out of 5'}, {'title': 'CoQ10 300mg Doctors Best 30 Softgel', 'price': '$19.64', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/61DttvOf18L._AC_SX425_.jpg', 'rank': 'Manufacturer \u200f : \u200e Doctors Best', 'rating': 'Rating not found'}, {'title': 'Ultra Glandulars Ultra Raw Eye 60 Tab', 'price': '$19.65', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/71ASBPlEphL._AC_SX425_.jpg', 'rank':
'Best Sellers Rank: #286,790 in Health & Household (See Top 100 in Health & Household)\n#14,261 in Herbal Supplements', 'rating': '4.1 out of 5'}, {'title': 'Mason Natural Garlic Oil 500 mg Odorless Allium Sativum Supplement - Supports Healthy Circulatory Function, 100 Softgels', 'price': '$6.75', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/71eXVu3zxFL._AC_SX425_.jpg', 'rank': 'Best Sellers Rank: #346,287 in Health & Household (See Top 100 in Health & Household)\n#350 in Garlic Herbal Supplements', 'rating': '4.9 out of 5'}]
result = []
amazonPage.extend(amazondata)
for myDict in amazonPage:
    if myDict not in result:
        result.append(myDict)
print (result)
amazonPage[0].update(amazondata[0])
keys = amazonPage[0].keys()
print(keys)
with open('Test WF.csv', 'w', newline='', encoding="utf-8") as csvfile:
    dict_writer = csv.DictWriter(csvfile, keys)
    dict_writer.writeheader()
    dict_writer.writerows(result)
You're only merging the first dictionary in each list. You should merge them all.
result = [data | page for data, page in zip(amazondata, amazonPage)]
keys = result[0].keys()
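A small self-contained sketch of the same idea, assuming the two lists are in the same order, so that position i in one list describes the same product as position i in the other. The sample records are shortened stand-ins for the real data (the titles are abbreviated for illustration), `{**data, **page}` is used as an equivalent of `data | page` that also runs on Python versions before 3.9, and `io.StringIO` stands in for the output file.

```python
import csv
import io

# Shortened stand-ins for amazondata / amazonPage (titles abbreviated).
amazondata = [
    {'amzlink': 'https://www.amazon.com/dp/B084ZZ7VY3', 'asin': 'B084ZZ7VY3'},
    {'amzlink': 'https://www.amazon.com/dp/B000NRTWS6', 'asin': 'B000NRTWS6'},
]
amazonPage = [
    {'title': 'Berberine HCL Complex', 'price': '$14.95'},
    {'title': 'High Absorption CoQ10', 'price': '$14.24'},
]

# Merge the dictionaries pairwise, one merged row per product.
result = [{**data, **page} for data, page in zip(amazondata, amazonPage)]

# Collect every key in first-seen order so no column is missing from the header.
fieldnames = list(dict.fromkeys(key for row in result for key in row))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(result)
print(buffer.getvalue())
```

Building the header from all rows rather than only the first one is a defensive choice: if some merged rows carry keys that the first row lacks, `DictWriter` would otherwise raise a `ValueError` for the unexpected field.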
Hi, I have written a script which uses BeautifulSoup4 to extract a list of jobs as well as their details and associated application links. I have used a separate for loop for each value (Link/Title/Company etc.), as each piece of information is under a different class.
I have managed to write for loops that extract all of the data, but I am not sure how to pair the first result of the 1st for loop (Link) with the first result of the second for loop (Job Title), and so on.
So my output is currently:
(There are 50 jobs on the search)
First 50 lines : Links of the application
Second 50 lines : Names of each job title
etc etc.
import requests
import json
from bs4 import BeautifulSoup
URL = "https://remote.co/remote-jobs/developer/"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
jobs = soup.find_all('a', class_='card m-0 border-left-0 border-right-0 border-top-0 border-bottom')
titles = soup.find_all('span', class_='font-weight-bold larger')
date_added = soup.find_all('span', class_='float-right d-none d-md-inline text-secondary')
company = soup.find_all('p', class_='m-0 text-secondary')
remote = 'https://remote.co/'
job_list = []
for a in jobs:
    link = a['href']
    print(f'Apply here: {remote}{link}')
    job_list.append(link)
for b in titles:
    job_list.append(b.text)
for c in date_added:
    job_list.append(c.text)
for d in company:
    job_list.append(d.text)
Here's the code I have written. Can someone help me organise it so that the first chunk of text is:
Link to Apply
Job Title
Date the Job was Added
Name of Company and Working Hours
Here is a snippet of the HTML from the site
<div class="card bg-light mb-3 rounded-0">
<div class="card-body">
<div class="d-flex align-items-center mb-3">
<h2 class="text-uppercase mb-0 mr-2 raleway" style="-webkit-box-flex:0;flex-grow:0;">Remote Developer Jobs</h2><div style="background:#00a2e1;-webkit-box-flex:1;flex-grow:1;height:3px;"></div>
</div>
<div class="card bg-white m-0">
<div class="card-body p-0">
<p class="p-3 m-0 border-bottom">
<a href="/remote-jobs/" style="font-size:18px;">
<em>
See all Remote Jobs >
</em>
</a>
</p>
<a href="/job/staff-frontend-web-developer-24/" class="card m-0 border-left-0 border-right-0 border-top-0 border-bottom">
<div class="card border-0 p-3 job-card bg-white">
<div class="row no-gutters align-items-center">
<div class="col-lg-1 col-md-2 position-static d-none d-md-block pr-md-3">
<img src="data:image/svg+xml,%3Csvg%20xmlns='http://www.w3.org/2000/svg'%20viewBox='0%200%200%200'%3E%3C/svg%3E" alt="Routable" class="card-img" data-lazy-src="https://remoteco.s3.amazonaws.com/wp-content/uploads/2021/07/27194326/routable-150x150.png"/><noscript><img src="https://remoteco.s3.amazonaws.com/wp-content/uploads/2021/07/27194326/routable-150x150.png" alt="Routable" class="card-img"/></noscript>
</div>
<div class="col position-static">
<div class="card-body px-3 py-0 pl-md-0">
<p class="m-0"><span class="font-weight-bold larger">Staff Frontend Web Developer</span><span class="float-right d-none d-md-inline text-secondary"><small><date>1 day ago</date></small></span></p>
<p class="m-0 text-secondary">
Routable
| <span class="badge badge-success"><small>Full-time</small></span>
| <span class="badge badge-success"><small>International</small></span>
</p>
</div>
</div>
</div>
</div>
</a>
You can try the next example:
from bs4 import BeautifulSoup
import requests
page = requests.get('https://remote.co/remote-jobs/developer')
soup = BeautifulSoup(page.content,'lxml')
data = []
for e in soup.select('div.card-body.p-0 > a'):
    soup2 = BeautifulSoup(requests.get('https://remote.co'+e.get('href')).content,'lxml')
    d = {
        'title':soup2.h1.text,
        'job_name':soup2.select_one('div.job_description > p').text,
        'company':soup2.select_one('div.co_name > strong').text,
        'date':soup2.select_one('.date_sm time').text.replace('Posted:',''),
        'Link':'https://remote.co'+e.get('href')
    }
    data.append(d)
print(data)
Output:
[{'title': 'Principal Software Engineer at Wisetack', 'job_name': 'Principal Software Engineer', 'company': 'Wisetack', 'date': ' 2 hours ago', 'Link': 'https://remote.co/job/principal-software-engineer-26/'}, {'title': 'Staff Frontend Web Developer at Routable', 'job_name': 'Staff Frontend Web Developer', 'company': 'Routable', 'date': ' 1 day ago', 'Link': 'https://remote.co/job/staff-frontend-web-developer-24/'}, {'title': 'Developer Advocate at DeepSource', 'job_name': 'Developer Advocate', 'company': 'DeepSource', 'date': ' 2 days ago', 'Link': 'https://remote.co/job/developer-advocate-24/'}, {'title': 'Senior GCP DevOps Engineer at RXMG', 'job_name': 'Location:\xa0 US Locations Only; 100% Remote', 'company': 'RXMG', 'date': ' 3 days ago', 'Link': 'https://remote.co/job/senior-gcp-devops-engineer-23/'}, {'title': 'Growth Engineer, MarTech at Facet Wealth', 'job_name': 'Location:\xa0 US Locations Only; 100% Remote', 'company': 'Facet Wealth', 'date': ' 3 days ago', 'Link': 'https://remote.co/job/growth-engineer-martech-23/'}, {'title': 'DevOps Engineer at Oddball', 'job_name': 'Location:\xa0 US Locations Only; 100% Remote', 'company': 'Oddball', 'date': ' 3 days ago', 'Link': 'https://remote.co/job/devops-engineer-66/'}, {'title': 'DevOps Engineer at Paymentology', 'job_name': 'Location:\xa0 International, Anywhere; 100% remote', 'company': 'Paymentology', 'date': ' 4 days ago', 'Link': 'https://remote.co/job/devops-engineer-67/'}, {'title': 'Director, Core Technology Software Development at Andela', 'job_name': 'Title: Director, Core Technology Software Development', 'company': 'Andela', 'date': ' 4 days ago', 'Link': 'https://remote.co/job/director-core-technology-software-development-22/'}, {'title': 'Senior Developer – Net Core/C#/SQL (REMOTE or Local) at Cascade Financial Technology', 'job_name': 'Location:\xa0 US Locations Only; 100% Remote', 'company': 'Cascade Financial Technology', 'date': ' 4 days ago', 'Link': 
'https://remote.co/job/senior-developer-net-core-c-sql-remote-or-local-22/'}, {'title': 'Front End Android Developer at Cascade Financial Technology', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote', 'company': 'Cascade Financial Technology', 'date': ' 4 days ago', 'Link': 'https://remote.co/job/front-end-android-developer-22/'}, {'title': 'Senior Backend Engineer – Python at Doist', 'job_name': 'Senior Backend Engineer (Python)', 'company': 'Doist', 'date': ' 5 days ago', 'Link': 'https://remote.co/job/senior-backend-engineer-python-21/'}, {'title': "Front End Developer at Brad's Deals", 'job_name': 'Front End Developer', 'company': "Brad's Deals", 'date': ' 5 days ago', 'Link': 'https://remote.co/job/front-end-developer-21-2/'}, {'title': 'Director of Engineering at Farmgirl Flowers', 'job_name': 'Director of Engineering', 'company': 'Farmgirl Flowers', 'date': ' 5 days ago', 'Link': 'https://remote.co/job/director-of-engineering-21/'}, {'title': 'Software Engineer, Backend Identity at Affirm', 'job_name': 'Title: Software Engineer, Backend (Identity)', 'company': 'Affirm', 'date': ' 5 days ago', 'Link': 'https://remote.co/job/software-engineer-backend-identity-21/'}, {'title': 'Backend Developer (Node/Typescript) at CitizenShipper', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote', 'company': 'CitizenShipper', 'date': ' 6 days ago', 'Link': 'https://remote.co/job/backend-developer-node-typescript-20/'}, {'title': 'Fullstack Developer (TypeScript) at CitizenShipper', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote', 'company': 'CitizenShipper', 'date': ' 6 days ago', 'Link': 'https://remote.co/job/fullstack-developer-typescript-20/'}, {'title': 'Senior Software Engineer- Java at Method, Inc.', 'job_name': 'Location:\xa0 US Locations; 100% Remote', 'company': 'Method, Inc.', 'date': ' 6 days ago', 'Link': 'https://remote.co/job/senior-software-engineer-java-2/'}, {'title': 'Senior Software Engineer – Backend at 
Varsity Tutors', 'job_name': 'Title:\xa0Senior Software Engineer (Backend) – Golang', 'company': 'Varsity
Tutors', 'date': ' 6 days ago', 'Link': 'https://remote.co/job/senior-software-engineer-backend-20/'}, {'title': 'Backend Engineer, Growth Engineering at Stripe, Inc.', 'job_name': 'Backend Engineer, Growth Engineering', 'company':
'Stripe, Inc.', 'date': ' 6 days ago', 'Link': 'https://remote.co/job/backend-engineer-growth-engineering-20/'}, {'title': 'Game Developer at Voodoo', 'job_name': 'Game Developer', 'company': 'Voodoo', 'date': ' 6 days ago', 'Link': 'https://remote.co/job/game-developer-20/'}, {'title': 'Senior Ruby Engineer at Clearcover', 'job_name': 'Title: Sr. Ruby Engineer', 'company': 'Clearcover', 'date': ' 1 week ago', 'Link': 'https://remote.co/job/senior-ruby-engineer-18/'}, {'title': 'Ruby Engineer at Clearcover', 'job_name': 'Title: Ruby Engineer', 'company': 'Clearcover', 'date': ' 1 week ago', 'Link': 'https://remote.co/job/ruby-engineer-17/'}, {'title': 'DevOps Engineer at OCCRP', 'job_name': 'Location:\xa0 International, Anywhere; Freelance', 'company': 'OCCRP', 'date': ' 1 week ago', 'Link': 'https://remote.co/job/devops-engineer-65/'}, {'title': 'Python Developer at ScienceLogic', 'job_name': 'Title:\xa0Python Developer', 'company': 'ScienceLogic', 'date': ' 1 week ago', 'Link': 'https://remote.co/job/python-developer-16/'}, {'title': 'Senior Software Engineer – App Stores Backend at Canonical', 'job_name': 'Title:\xa0Senior Software Engineer – App Stores Backend (Remote)', 'company': 'Canonical', 'date': ' 1 week ago', 'Link': 'https://remote.co/job/senior-software-engineer-app-stores-backend-16/'}, {'title': 'Software Engineer, Backend – Machine Learning Platform at
Affirm', 'job_name': 'Software Engineer, Backend (Machine Learning Platform)', 'company': 'Affirm', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/software-engineer-backend-machine-learning-platform-14/'}, {'title': 'Senior
Engineering Manager, Billing at Webflow', 'job_name': 'Title: Senior Engineering Manager, Billing', 'company': 'Webflow', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/senior-engineering-manager-billing-14/'}, {'title': 'Senior Software Engineer, Anti-Tracking at Mozilla', 'job_name': 'Title: Senior Software Engineer, Anti-Tracking', 'company': 'Mozilla', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/senior-software-engineer-anti-tracking-14/'}, {'title': 'Director of Engineering at Conserv', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote', 'company': 'Conserv', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/director-of-engineering-14/'}, {'title': 'Lead Front End Developer- Email at Stitch Fix', 'job_name': 'Title:\xa0Lead Front End Developer- Email', 'company': 'Stitch Fix', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/lead-front-end-developer-email-13/'}, {'title': 'Technical Lead Growth Monetization, Frontend at HubSpot', 'job_name': 'Technical Lead Growth Monetization, Frontend (US/Remote)', 'company': 'HubSpot', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/technical-lead-growth-monetization-frontend-11/'}, {'title': 'Senior Software Engineer, Backend Debit+ at Affirm', 'job_name': 'Title:\xa0Senior Software Engineer, Backend\xa0(Debit+)', 'company': 'Affirm', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/senior-software-engineer-backend-debit-11/'}, {'title': 'C++ Graphics and Windowing System Software Engineer at Canonical', 'job_name': 'Title:\xa0C++ Graphics and Windowing System Software Engineer\xa0– Mir', 'company': 'Canonical', 'date': ' 2 weeks ago', 'Link': 'https://remote.co/job/c-graphics-and-windowing-system-software-engineer-9/'}, {'title': 'Senior Manager, Software Engineering at Myriad Genetics', 'job_name': 'Title:\xa0Senior Manager, Software Engineering', 'company': 'Myriad Genetics', 'date': ' 3 weeks ago', 'Link': 
'https://remote.co/job/senior-manager-software-engineering-8/'}, {'title': 'Senior Kernel Build Automation Engineer at Canonical', 'job_name': 'Title: Senior Kernel Build Automation Engineer ', 'company': 'Canonical', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-kernel-build-automation-engineer-8/'}, {'title': 'Engineering Manager – Full Stack at Betterment', 'job_name': 'Title: Engineering Manager – Full Stack', 'company': 'Betterment', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/engineering-manager-full-stack-7/'}, {'title': 'Principal Architect – Software Engineering at Citizens Bank', 'job_name': 'Principal Architect – Software Engineering', 'company': 'Citizens Bank', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/principal-architect-software-engineering-7/'}, {'title': 'Senior Software Engineer, Kubernetes Platform at Appboy', 'job_name': 'Title:\xa0Senior Software Engineer, Kubernetes Platform', 'company': 'Appboy', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-software-engineer-kubernetes-platform-7/'}, {'title': 'Senior React Native Developer at Toptal', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote; Freelance', 'company': 'Toptal', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-react-native-developer-11/'}, {'title': 'Senior Blockchain Developer at Toptal', 'job_name': 'Location: International, Anywhere; 100% Remote; Freelance', 'company': 'Toptal', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-blockchain-developer-5/'}, {'title': 'Front-End Developer at Toptal', 'job_name': 'Location: International, Anywhere; 100% Remote; Freelance', 'company': 'Toptal', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/front-end-developer-5-2/'}, {'title': 'Senior DevOps Engineer at Toptal', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote; Freelance', 'company': 'Toptal', 'date': ' 3 weeks ago', 'Link': 
'https://remote.co/job/senior-devops-engineer-11-2/'}, {'title': 'Senior React Developer at Toptal', 'job_name': 'Location: Anywhere, International;\xa0 Freelance;\xa0 100% Remote', 'company': 'Toptal', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-react-developer-5/'}, {'title': 'Full-Stack Developer at Toptal', 'job_name': 'Location: International, Anywhere; 100% Remote; Freelance', 'company': 'Toptal', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/full-stack-developer-5-2/'}, {'title': 'Senior Full Stack Developer: Long-term job – 100% remote at Proxify AB', 'job_name': 'Location:\xa0 International, Anywhere; 100% Remote; Freelance', 'company': 'Proxify AB', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-full-stack-developer-long-term-job-100-remote-6/'}, {'title': 'Software Engineer – Backend at 0x', 'job_name': 'Software Engineer – Backend (Campus)', 'company': '0x', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/software-engineer-backend-5-2/'}, {'title': 'Engineering Manager at Array.com', 'job_name': 'Engineering Manager', 'company': 'Array.com', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/engineering-manager-5-2/'}, {'title': 'Senior Software Engineer, Canvas Facilitation at MURAL.co', 'job_name': 'Senior Software Engineer, Canvas Facilitation', 'company': 'MURAL.co', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/senior-software-engineer-canvas-facilitation-5/'}, {'title': 'Backend Engineer at CareRev', 'job_name': 'Title:\xa0Backend Engineer', 'company': 'CareRev', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/backend-engineer-5-2/'}, {'title': 'Principal Software Engineer, Architect Cognitive Automation at Appian', 'job_name': 'Title:\xa0Principal Software Engineer/Architect (Cognitive Automation)', 'company': 'Appian', 'date': ' 3 weeks ago', 'Link': 'https://remote.co/job/principal-software-engineer-architect-cognitive-automation-5/'}]
Your lists contain a little unnecessary data at the moment. Can you provide an example of how the result is supposed to look in the end?
In the meantime, you can use zip() to iterate over all the lists at the same time:
jobs = soup.find_all('a', class_='card m-0 border-left-0 border-right-0 border-top-0 border-bottom')
titles = soup.find_all('span', class_='font-weight-bold larger')
dates_added = soup.find_all('span', class_='float-right d-none d-md-inline text-secondary')
companies = soup.find_all('p', class_='m-0 text-secondary')
for job, title, date_added, company in zip(jobs, titles, dates_added, companies):
    print(job, title, date_added, company)
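To show what the zip() pairing yields once the tags are reduced to text, here is a minimal sketch that uses plain strings in place of the BeautifulSoup results, so it runs without any network access. The list contents are stand-ins: the job paths, titles and dates follow the HTML snippet and the output shown in this thread, while the badge text for the first company is made up for illustration. BASE_URL and the dictionary keys are hypothetical names.

```python
from urllib.parse import urljoin

BASE_URL = 'https://remote.co/'

# Plain-string stand-ins for a['href'] and .text of the four result lists.
links = [
    '/job/principal-software-engineer-26/',
    '/job/staff-frontend-web-developer-24/',
]
titles = ['Principal Software Engineer', 'Staff Frontend Web Developer']
dates_added = ['2 hours ago', '1 day ago']
companies = [
    '\nWisetack\n| Full-time\n',  # badge text here is illustrative
    '\nRoutable\n| Full-time\n| International\n',
]

job_list = []
for link, title, date_added, company in zip(links, titles, dates_added, companies):
    job_list.append({
        # urljoin avoids the double slash that plain concatenation would give
        'link': urljoin(BASE_URL, link),
        'title': title,
        'date_added': date_added,
        # collapse the newlines and runs of spaces inside the <p> text
        'company': ' '.join(company.split()),
    })

for job in job_list:
    print(job)
```

Note that zip() stops at the shortest input, so if one selector matches fewer elements than the others, the trailing jobs are silently dropped and the remaining fields may be misaligned. Iterating over each job card and looking up its fields inside that card avoids this.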
I'm attempting to get data from the sidebar of Wikipedia's 'Current Events' page with the code below. At the moment this produces an array of objects, each with the values title and url.
I would also like to add a new value, headline, to the objects in the array, derived from the <h3> id or text content. This would result in each object having three values: headline, url and title. However, I'm unsure how to iterate through these.
Beautiful Soup Code
soup = BeautifulSoup(response, "html.parser").find('div', {'aria-labelledby': 'Ongoing_events'})
links = soup.findAll('a')
for item in links:
    title = item.text
    url = ("https://en.wikipedia.org"+item['href'])
    eo = CurrentEventsObject(title, url)
    eventsArray.append(eo)
Wikipedia Current Events List
<div class="mw-collapsible-content">
<h3><span class="mw-headline" id="Disasters">Disasters</span></h3>
<ul>
<li>Climate crisis</li>
<li>COVID-19 pandemic</li>
<li>2021–22 European windstorm season</li>
<li>2020–21 H5N8 outbreak</li>
<li>2021 Pacific typhoon season</li>
<li>Madagascar food crisis</li>
<li>Water crisis in Iran</li>
<li>Yemeni famine</li>
<li>2021 La Palma eruption</li>
</ul>
<h3><span class="mw-headline" id="Economic">Economic</span></h3>
<ul>
<li>2020–2021 global chip shortage</li>
<li>2021 global supply chain crisis</li>
<li>COVID-19 recession</li>
<li>Lebanese liquidity crisis</li>
<li>Pandora Papers leak</li>
<li>Sri Lankan economic and food crisis</li>
<li>Turkish currency and debt crisis</li>
<li>United Kingdom natural gas supplier crisis</li>
</ul>
<h3><span class="mw-headline" id="Politics">Politics</span></h3>
<ul>
<li>Belarus−European Union border crisis</li>
<li>Brazilian protests</li>
<li>Colombian tax reform protests</li>
<li>Eswatini protests</li>
<li>Haitian protests</li>
<li>Indian farmers' protests</li>
<li>Insulate Britain protests</li>
<li>Jersey dispute</li>
<li>Libyan peace process</li>
<li>Malaysian political crisis</li>
<li>Myanmar protests</li>
<li>Nicaraguan protests</li>
<li>Nigerian protests</li>
<li>Persian Gulf crisis</li>
<li>Peruvian crisis</li>
<li>Russian election protests</li>
<li>Solomon Islands unrest</li>
<li>Tigrayan peace process</li>
<li>Thai protests</li>
<li>Tunisian political crisis</li>
<li>United States racial unrest</li>
<li>Venezuelan presidential crisis</li>
</ul>
<div class="editlink noprint plainlinks"><a class="external text" href="https://en.wikipedia.org/w/index.php?title=Portal:Current_events/Sidebar&action=edit">edit section</a></div>
</div>
Note: Try to select your elements more specific to get all information in one process - Defining a list outside your loops will avoid from overwriting
Following steps will create a list of dicts, that for example could simply iterated or turned into a data frame.
#1 Select all <ul> that immediately follow an <h3> (adjacent siblings):
soup.select('h3 + ul')
#2 Select the <h3> and get its text:
e.find_previous_sibling('h3').text.strip()
#3 Select all <a> in the <ul> and iterate over the results while creating a list of dicts:
for a in e.select('a'):
    data.append({
        'headline': headline,
        'title': a['title'],
        'url': 'https://en.wikipedia.org' + a['href']
    })
Example
soup = BeautifulSoup(response, "html.parser").find('div', {'aria-labelledby': 'Ongoing_events'})
data = []
for e in soup.select('h3 + ul'):
    headline = e.find_previous_sibling('h3').text.strip()
    for a in e.select('a'):
        data.append({
            'headline': headline,
            'title': a['title'],
            'url': 'https://en.wikipedia.org' + a['href']
        })
data
Output
[{'headline': 'Disasters',
'title': 'Climate crisis',
'url': 'https://en.wikipedia.org/wiki/Climate_crisis'},
{'headline': 'Disasters',
'title': 'COVID-19 pandemic',
'url': 'https://en.wikipedia.org/wiki/COVID-19_pandemic'},
{'headline': 'Disasters',
'title': '2021–22 European windstorm season',
'url': 'https://en.wikipedia.org/wiki/2021%E2%80%9322_European_windstorm_season'},
{'headline': 'Disasters',
'title': '2020–21 H5N8 outbreak',
'url': 'https://en.wikipedia.org/wiki/2020%E2%80%9321_H5N8_outbreak'},
{'headline': 'Disasters',
'title': '2021 Pacific typhoon season',
'url': 'https://en.wikipedia.org/wiki/2021_Pacific_typhoon_season'},
{'headline': 'Disasters',
'title': '2021 Madagascar food crisis',
'url': 'https://en.wikipedia.org/wiki/2021_Madagascar_food_crisis'},
{'headline': 'Disasters',
'title': 'Water scarcity in Iran',
'url': 'https://en.wikipedia.org/wiki/Water_scarcity_in_Iran'},
{'headline': 'Disasters',
'title': 'Famine in Yemen (2016–present)',
'url': 'https://en.wikipedia.org/wiki/Famine_in_Yemen_(2016%E2%80%93present)'},
{'headline': 'Disasters',
'title': '2021 Cumbre Vieja volcanic eruption',
'url': 'https://en.wikipedia.org/wiki/2021_Cumbre_Vieja_volcanic_eruption'},
{'headline': 'Economic',
'title': '2020–2021 global chip shortage',
'url': 'https://en.wikipedia.org/wiki/2020%E2%80%932021_global_chip_shortage'},
{'headline': 'Economic',
'title': '2021 global supply chain crisis',
'url': 'https://en.wikipedia.org/wiki/2021_global_supply_chain_crisis'},
{'headline': 'Economic',
'title': 'COVID-19 recession',
'url': 'https://en.wikipedia.org/wiki/COVID-19_recession'},
{'headline': 'Economic',
'title': 'Lebanese liquidity crisis',
'url': 'https://en.wikipedia.org/wiki/Lebanese_liquidity_crisis'},
{'headline': 'Economic',
'title': 'Pandora Papers',
'url': 'https://en.wikipedia.org/wiki/Pandora_Papers'},
{'headline': 'Economic',
'title': '2021 Sri Lankan economic crisis',
'url': 'https://en.wikipedia.org/wiki/2021_Sri_Lankan_economic_crisis'},
{'headline': 'Economic',
'title': '2018–2021 Turkish currency and debt crisis',
'url': 'https://en.wikipedia.org/wiki/2018%E2%80%932021_Turkish_currency_and_debt_crisis'},
{'headline': 'Economic',
'title': '2021 United Kingdom natural gas supplier crisis',
'url': 'https://en.wikipedia.org/wiki/2021_United_Kingdom_natural_gas_supplier_crisis'},
{'headline': 'Politics',
'title': '2021 Belarus–European Union border crisis',
'url': 'https://en.wikipedia.org/wiki/2021_Belarus%E2%80%93European_Union_border_crisis'},
{'headline': 'Politics',
'title': '2021 Brazilian protests',
'url': 'https://en.wikipedia.org/wiki/2021_Brazilian_protests'},
{'headline': 'Politics',
'title': '2021 Colombian protests',
'url': 'https://en.wikipedia.org/wiki/2021_Colombian_protests'},
{'headline': 'Politics',
'title': '2021 Eswatini protests',
'url': 'https://en.wikipedia.org/wiki/2021_Eswatini_protests'},
{'headline': 'Politics',
'title': '2018–2021 Haitian protests',
'url': 'https://en.wikipedia.org/wiki/2018%E2%80%932021_Haitian_protests'},
{'headline': 'Politics',
'title': "2020–2021 Indian farmers' protest",
'url': 'https://en.wikipedia.org/wiki/2020%E2%80%932021_Indian_farmers%27_protest'},
{'headline': 'Politics',
'title': 'Insulate Britain protests',
'url': 'https://en.wikipedia.org/wiki/Insulate_Britain_protests'},
{'headline': 'Politics',
'title': '2021 Jersey dispute',
'url': 'https://en.wikipedia.org/wiki/2021_Jersey_dispute'},
{'headline': 'Politics',
'title': 'Libyan peace process',
'url': 'https://en.wikipedia.org/wiki/Libyan_peace_process'},
{'headline': 'Politics',
'title': '2020–21 Malaysian political crisis',
'url': 'https://en.wikipedia.org/wiki/2020%E2%80%9321_Malaysian_political_crisis'},
{'headline': 'Politics',
'title': '2021 Myanmar protests',
'url': 'https://en.wikipedia.org/wiki/2021_Myanmar_protests'},
{'headline': 'Politics',
'title': '2018–2021 Nicaraguan protests',
'url': 'https://en.wikipedia.org/wiki/2018%E2%80%932021_Nicaraguan_protests'},
{'headline': 'Politics',
'title': 'End SARS',
'url': 'https://en.wikipedia.org/wiki/End_SARS'},
{'headline': 'Politics',
'title': '2019–2021 Persian Gulf crisis',
'url': 'https://en.wikipedia.org/wiki/2019%E2%80%932021_Persian_Gulf_crisis'},
{'headline': 'Politics',
'title': '2017–present Peruvian political crisis',
'url': 'https://en.wikipedia.org/wiki/2017%E2%80%93present_Peruvian_political_crisis'},
{'headline': 'Politics',
'title': '2021 Russian election protests',
'url': 'https://en.wikipedia.org/wiki/2021_Russian_election_protests'},
{'headline': 'Politics',
'title': '2021 Solomon Islands unrest',
'url': 'https://en.wikipedia.org/wiki/2021_Solomon_Islands_unrest'},
{'headline': 'Politics',
'title': 'Tigrayan peace process',
'url': 'https://en.wikipedia.org/wiki/Tigrayan_peace_process'},
{'headline': 'Politics',
'title': '2020–2021 Thai protests',
'url': 'https://en.wikipedia.org/wiki/2020%E2%80%932021_Thai_protests'},
{'headline': 'Politics',
'title': '2021 Tunisian political crisis',
'url': 'https://en.wikipedia.org/wiki/2021_Tunisian_political_crisis'},
{'headline': 'Politics',
'title': '2020–2021 United States racial unrest',
'url': 'https://en.wikipedia.org/wiki/2020%E2%80%932021_United_States_racial_unrest'},
{'headline': 'Politics',
'title': 'Venezuelan presidential crisis',
'url': 'https://en.wikipedia.org/wiki/Venezuelan_presidential_crisis'}]
I want to scrape multiple Google scholar user profiles - publications, journals, citations etc. I have already written the python code for scraping a user profile given the url. Now, suppose I have 100 names and the corresponding urls in an excel file like this.
name link
Autor https://scholar.google.com/citations?user=cp-8uaAAAAAJ&hl=en
Dorn https://scholar.google.com/citations?user=w3Dri00AAAAJ&hl=en
Hanson https://scholar.google.com/citations?user=nMtHiQsAAAAJ&hl=en
Borjas https://scholar.google.com/citations?user=Patm-BEAAAAJ&hl=en
....
My question is: can I read the 'link' column of this file and write a for loop over the URLs, so that I can scrape each of these profiles and append the results to the same file? It seems a bit far-fetched, but I hope there is a way to do so. Thanks in advance!
You can use pandas.read_csv() to read a specific column from a CSV file (for an Excel workbook, pandas.read_excel() is used the same way). For example:
import pandas as pd
df = pd.read_csv('data.csv')
arr = []
link_col = df['link']
for i in link_col:
    arr.append(i)
print(arr)
This lets you extract only the link column and append each value to your list. If you'd like to learn more, you can refer to the pandas documentation.
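The loop over the link column and the write-back to the same table can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard-library csv module rather than pandas, the in-memory SAMPLE_CSV stands in for the real file, and scrape_profile is a hypothetical placeholder for the scraping function the asker already has.

```python
import csv
import io

# Stand-in for the real file; the column names follow the question.
SAMPLE_CSV = (
    "name,link\n"
    "Autor,https://scholar.google.com/citations?user=cp-8uaAAAAAJ&hl=en\n"
    "Dorn,https://scholar.google.com/citations?user=w3Dri00AAAAJ&hl=en\n"
)


def scrape_profile(url):
    """Hypothetical placeholder: replace with the real profile scraper."""
    return {"publications": "", "citations": ""}


def scrape_all(csv_text):
    """Read every row, scrape its link, and return rows with the new columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row.update(scrape_profile(row["link"]))
        rows.append(row)
    return rows


def write_rows(rows):
    """Write the enriched rows back out as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = scrape_all(SAMPLE_CSV)
output = write_rows(rows)
print(output)
```

To work on a real file, the io.StringIO objects would be replaced by open(...) calls on that file, reading it completely before writing it again.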
I hope it is not too advanced for you
1 Create a class for your pages
class Page:
    def __init__(self, name=None, link=None):
        self.name = name
        self.link = link
2 Create pages list
pages = []
3 Find rows locator, like:
rows = driver.find_elements_by_css_selector("your_selector")
The number of rows found must match the number of rows in your table. For example, if you have 20 items in the list, the number of rows will be 20.
4 Get each row value:
for row in rows:
    name = row.find_element_by_css_selector("here is a unique selector for each data field for name").text
    link = row.find_element_by_css_selector("here is a unique selector for each data field for link").text
5 Create a page object (still inside the loop):
page = Page(name=name, link=link)
6 Append each page to the list (also inside the loop):
pages.append(page)
Result
A list of page objects, where the first row is accessible as pages[0], the second as pages[1], and so on.
P.S
If you are having trouble with selectors, ask about them in separate questions.
I think I explained the concept to you, so you will be able to start.
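Steps 1 to 6 fit together as shown in this minimal sketch. To keep it runnable without a browser, the Selenium rows are replaced by plain (name, link) tuples in raw_rows, which is an assumption made purely for illustration; with Selenium, each pair would come from the element lookups in step 4. The class is written as a dataclass named Page, which is a stylistic choice and not something the answer requires.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    """One table row: an author name and the profile link."""
    name: Optional[str] = None
    link: Optional[str] = None


# Assumed sample data standing in for the values read from each row element.
raw_rows = [
    ("Autor", "https://scholar.google.com/citations?user=cp-8uaAAAAAJ&hl=en"),
    ("Dorn", "https://scholar.google.com/citations?user=w3Dri00AAAAJ&hl=en"),
]

pages: List[Page] = []
for name, link in raw_rows:
    # Steps 5 and 6 happen inside the loop: build the object, then store it.
    pages.append(Page(name=name, link=link))

print(pages[0].name, pages[0].link)
```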
To read data from an Excel file you can use pandas read_excel() method like so:
# https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html
authors_df = pd.read_excel("google_scholar_scrape_multiple_authors.xlsx", sheet_name="authors") # sheet_name is optional in this case
print(authors_df["author_link"])
'''
0 https://scholar.google.com/citations?hl=en&use...
1 https://scholar.google.com/citations?hl=en&use...
2 https://scholar.google.com/citations?hl=en&use...
3 https://scholar.google.com/citations?hl=en&use...
4 https://scholar.google.com/citations?hl=en&use...
Name: author_link, dtype: object
'''
print(authors_df)
'''
author_name author_link
0 Masatoshi Nei https://scholar.google.com/citations?hl=en&use...
1 Ulrich K. Laemmli https://scholar.google.com/citations?hl=en&use...
2 Gene Myers https://scholar.google.com/citations?hl=en&use...
3 Sudhir Kumar https://scholar.google.com/citations?hl=en&use...
4 Irving Weissman https://scholar.google.com/citations?hl=en&use...
'''
To scrape multiple authors, you can use a for loop to iterate over authors_df["author_link"] and extract the desired data using the beautifulsoup, lxml, and requests libraries.
Code and full example in the online IDE:
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import requests, lxml
import pandas as pd
# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582",
}
# https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html
authors_df = pd.read_excel("google_scholar_scrape_multiple_authors.xlsx", sheet_name="authors") # sheet_name is optional in this case
# to_list() returns a list of author links so we can iterate over them
for author_link in authors_df["author_link"].to_list():
    html = requests.get(author_link, headers=headers, timeout=30)
    soup = BeautifulSoup(html.text, "lxml")
    print(f"Currently extracting: {soup.select_one('#gsc_prf_in').text}")
    author_email = soup.select_one("#gsc_prf_ivh").text
    # the image src can be relative or already absolute; urljoin handles both
    author_image = urljoin("https://scholar.google.com", soup.select_one("#gsc_prf_pup-img")["src"])
    print(author_image, f"Author email: {author_email}", sep="\n")
    # iterating over container with all needed data by accessing the right CSS selector
    # have a look at SelectorGadget Chrome extension to easily grab CSS selectors
    for article in soup.select("#gsc_a_b .gsc_a_t"):
        article_title = article.select_one(".gsc_a_at").text
        article_link = f'https://scholar.google.com{article.select_one(".gsc_a_at")["href"]}'
        article_authors = article.select_one(".gsc_a_at+ .gs_gray").text
        article_publication = article.select_one(".gs_gray+ .gs_gray").text
        print(article_title, article_link, article_authors, article_publication, sep="\n")
    print("-" * 15)
# part of the output:
'''
Currently extracting: Masatoshi Nei
https://scholar.googleusercontent.com/citations?view_op=view_photo&user=VxOmZDgAAAAJ&citpid=3
Author email: Verified email at temple.edu
The neighbor-joining method: a new method for reconstructing phylogenetic trees.
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:u5HHmVD_uO8C
N Saitou, M Nei
Molecular biology and evolution 4 (4), 406-425, 1987
... other results
---------------
Currently extracting: Irving Weissman
https://scholar.google.com/citations/images/avatar_scholar_128.png
Author email: Verified email at stanford.edu
Stem cells, cancer, and cancer stem cells
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=Y66bJgUAAAAJ&citation_for_view=Y66bJgUAAAAJ:u5HHmVD_uO8C
T Reya, SJ Morrison, MF Clarke, IL Weissman
nature 414 (6859), 105-111, 2001
'''
Alternatively, you can achieve the same thing using the Google Scholar Author API from SerpApi. It's a paid API with a free plan.
Essentially, you only need to pick the data you want from the returned dictionary, without having to figure out which selectors to use, how to bypass blocks from Google, or how to scale the number of requests.
Example code to integrate:
import re
import pandas as pd
from serpapi import GoogleSearch
authors_df = pd.read_excel("google_scholar_scrape_multiple_authors.xlsx", sheet_name="authors") # sheet_name is optional in this case
for author in authors_df["author_link"].to_list():
    params = {
        "api_key": "YOUR_API_KEY",
        "engine": "google_scholar_author",
        "hl": "en",
        # using basic regular expression to grab user ID from the passed URL;
        # [^&]+ stops at the next query parameter, so "&hl=en" is not captured
        "author_id": re.search(r"user=([^&]+)", author).group(1) # -> VxOmZDgAAAAJ, unique author ID from the URL
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    print(f"Extracting data from: {results['author']['name']}\n"
          f"Author info: {results['author']}\n\n"
          f"Author articles:\n{results['articles']}\n")
# part of the output:
'''
Extracting data from: Masatoshi Nei
Author info: {'name': 'Masatoshi Nei', 'affiliations': 'Laura Carnell Professor of Biology, Temple University', 'email': 'Verified email at temple.edu', 'interests': [{'title': 'Evolution', 'link': 'https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:evolution', 'serpapi_link': 'https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolution'}, {'title': 'Evolutionary biology', 'link': 'https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:evolutionary_biology', 'serpapi_link': 'https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolutionary_biology'}, {'title': 'Molecular evolution', 'link': 'https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:molecular_evolution', 'serpapi_link': 'https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Amolecular_evolution'}, {'title': 'Population genetics', 'link': 'https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:population_genetics', 'serpapi_link': 'https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Apopulation_genetics'}, {'title': 'Phylogenetics', 'link': 'https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:phylogenetics', 'serpapi_link': 'https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aphylogenetics'}], 'thumbnail': 'https://scholar.googleusercontent.com/citations?view_op=view_photo&user=VxOmZDgAAAAJ&citpid=3'}
Author articles:
[{'title': 'The neighbor-joining method: a new method for reconstructing phylogenetic trees.', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:u5HHmVD_uO8C', 'citation_id': 'VxOmZDgAAAAJ:u5HHmVD_uO8C', 'authors': 'N Saitou, M Nei', 'publication': 'Molecular biology and evolution 4 (4), 406-425, 1987', 'cited_by': {'value': 64841, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=7672721046330422437,346314157833338191', 'serpapi_link': 'https://serpapi.com/search.json?cites=7672721046330422437%2C346314157833338191&engine=google_scholar&hl=en', 'cites_id': '7672721046330422437,346314157833338191'}, 'year': '1987'}, {'title': 'MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:Tyk-4Ss8FVUC', 'citation_id': 'VxOmZDgAAAAJ:Tyk-4Ss8FVUC', 'authors': 'K Tamura, D Peterson, N Peterson, G Stecher, M Nei, S Kumar', 'publication': 'Molecular biology and evolution 28 (10), 2731-2739, 2011', 'cited_by': {'value': 44316, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=5624029996178252455,5910675136328950108,13537318717249213469', 'serpapi_link': 'https://serpapi.com/search.json?cites=5624029996178252455%2C5910675136328950108%2C13537318717249213469&engine=google_scholar&hl=en', 'cites_id': '5624029996178252455,5910675136328950108,13537318717249213469'}, 'year': '2011'}, {'title': 'MEGA6: molecular evolutionary genetics analysis version 6.0', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:qmtmRrLr0tkC', 'citation_id': 'VxOmZDgAAAAJ:qmtmRrLr0tkC', 'authors': 'K Tamura, G Stecher, D Peterson, A Filipski, S Kumar', 'publication': 'Molecular biology and evolution 30 (12), 2725-2729, 2013', 'cited_by': {'value': 
40558, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=5258359186493639031', 'serpapi_link': 'https://serpapi.com/search.json?cites=5258359186493639031&engine=google_scholar&hl=en', 'cites_id': '5258359186493639031'}, 'year': '2013'}, {'title': 'MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:u-x6o8ySG0sC', 'citation_id': 'VxOmZDgAAAAJ:u-x6o8ySG0sC', 'authors': 'K Tamura, J Dudley, M Nei, S Kumar', 'publication': 'Molecular biology and evolution 24 (8), 1596-1599, 2007', 'cited_by': {'value': 34245, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=8480751610153565117', 'serpapi_link': 'https://serpapi.com/search.json?cites=8480751610153565117&engine=google_scholar&hl=en', 'cites_id': '8480751610153565117'}, 'year': '2007'}, {'title': 'Molecular evolutionary genetics', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:d1gkVwhDpl0C', 'citation_id': 'VxOmZDgAAAAJ:d1gkVwhDpl0C', 'authors': 'M Nei', 'publication': 'Columbia university press, 1987', 'cited_by': {'value': 20704, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=7660515423132980153', 'serpapi_link': 'https://serpapi.com/search.json?cites=7660515423132980153&engine=google_scholar&hl=en', 'cites_id': '7660515423132980153'}, 'year': '1987'}, {'title': 'MEGA2: molecular evolutionary genetics analysis software', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:IjCSPb-OGe4C', 'citation_id': 'VxOmZDgAAAAJ:IjCSPb-OGe4C', 'authors': 'S Kumar, K Tamura, IB Jakobsen, M Nei', 'publication': 'Bioinformatics 17 (12), 1244-1245, 2001', 'cited_by': {'value': 16078, 'link': 
'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=14171206204658643394,531426008085525562,5869149036159079676,8067244568899724142,12929609819447339488,15783386726452728786', 'serpapi_link': 'https://serpapi.com/search.json?cites=14171206204658643394%2C531426008085525562%2C5869149036159079676%2C8067244568899724142%2C12929609819447339488%2C15783386726452728786&engine=google_scholar&hl=en', 'cites_id': '14171206204658643394,531426008085525562,5869149036159079676,8067244568899724142,12929609819447339488,15783386726452728786'}, 'year': '2001'}, {'title': 'Estimation of average heterozygosity and genetic distance from a small number of individuals', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:2osOgNQ5qMEC', 'citation_id': 'VxOmZDgAAAAJ:2osOgNQ5qMEC', 'authors': 'M Nei', 'publication': 'Genetics 89 (3), 583-590, 1978', 'cited_by': {'value': 14504, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=11038674224870321151', 'serpapi_link': 'https://serpapi.com/search.json?cites=11038674224870321151&engine=google_scholar&hl=en', 'cites_id': '11038674224870321151'}, 'year': '1978'}, {'title': 'MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:9yKSN-GCB0IC', 'citation_id': 'VxOmZDgAAAAJ:9yKSN-GCB0IC', 'authors': 'S Kumar, K Tamura, M Nei', 'publication': 'Briefings in bioinformatics 5 (2), 150-163, 2004', 'cited_by': {'value': 14403, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=10013295782066828040,15148316572039251274', 'serpapi_link': 'https://serpapi.com/search.json?cites=10013295782066828040%2C15148316572039251274&engine=google_scholar&hl=en', 'cites_id': '10013295782066828040,15148316572039251274'}, 'year': '2004'}, {'title': 'Mathematical model for studying genetic variation in terms of 
restriction endonucleases', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:qjMakFHDy7sC', 'citation_id': 'VxOmZDgAAAAJ:qjMakFHDy7sC', 'authors': 'M Nei, WH Li', 'publication': 'Proceedings of the National Academy of Sciences 76 (10), 5269-5273, 1979', 'cited_by': {'value': 13619, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=5179626164554275201,1942230974501958280', 'serpapi_link': 'https://serpapi.com/search.json?cites=5179626164554275201%2C1942230974501958280&engine=google_scholar&hl=en', 'cites_id': '5179626164554275201,1942230974501958280'}, 'year': '1979'}, {'title': 'Genetic distance between populations', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:UeHWp8X0CEIC', 'citation_id': 'VxOmZDgAAAAJ:UeHWp8X0CEIC', 'authors': 'M Nei', 'publication': 'The American Naturalist 106 (949), 283-292, 1972', 'cited_by': {'value': 12980, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=4154924214026252226,7115074001272219295', 'serpapi_link': 'https://serpapi.com/search.json?cites=4154924214026252226%2C7115074001272219295&engine=google_scholar&hl=en', 'cites_id': '4154924214026252226,7115074001272219295'}, 'year': '1972'}, {'title': 'Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees.', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:Y0pCki6q_DkC', 'citation_id': 'VxOmZDgAAAAJ:Y0pCki6q_DkC', 'authors': 'K Tamura, M Nei', 'publication': 'Molecular biology and evolution 10 (3), 512-526, 1993', 'cited_by': {'value': 11093, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=13509507708085673250', 'serpapi_link': 'https://serpapi.com/search.json?cites=13509507708085673250&engine=google_scholar&hl=en', 'cites_id': '13509507708085673250'}, 
'year': '1993'}, {'title': 'Analysis of gene diversity in subdivided populations', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:zYLM7Y9cAGgC', 'citation_id': 'VxOmZDgAAAAJ:zYLM7Y9cAGgC', 'authors': 'M Nei', 'publication': 'Proceedings of the national academy of sciences 70 (12), 3321-3323, 1973', 'cited_by': {'value': 10714, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=11712109356391350421', 'serpapi_link': 'https://serpapi.com/search.json?cites=11712109356391350421&engine=google_scholar&hl=en', 'cites_id': '11712109356391350421'}, 'year': '1973'}, {'title': 'Molecular evolution and phylogenetics', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:YsMSGLbcyi4C', 'citation_id': 'VxOmZDgAAAAJ:YsMSGLbcyi4C', 'authors': 'M Nei, S Kumar', 'publication': 'Oxford University Press, USA, 2000', 'cited_by': {'value': 8795, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=703217195301701212,1351927951694611906', 'serpapi_link': 'https://serpapi.com/search.json?cites=703217195301701212%2C1351927951694611906&engine=google_scholar&hl=en', 'cites_id': '703217195301701212,1351927951694611906'}, 'year': '2000'}, {'title': 'Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions.', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:W7OEmFMy1HYC', 'citation_id': 'VxOmZDgAAAAJ:W7OEmFMy1HYC', 'authors': 'M Nei, T Gojobori', 'publication': 'Molecular biology and evolution 3 (5), 418-426, 1986', 'cited_by': {'value': 5279, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=12106160511321461626', 'serpapi_link': 'https://serpapi.com/search.json?cites=12106160511321461626&engine=google_scholar&hl=en', 'cites_id': '12106160511321461626'}, 'year': '1986'}, {'title': 'Prospects for 
inferring very large phylogenies by using the neighbor-joining method', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:0EnyYjriUFMC', 'citation_id': 'VxOmZDgAAAAJ:0EnyYjriUFMC', 'authors': 'K Tamura, M Nei, S Kumar', 'publication': 'Proceedings of the National Academy of Sciences 101 (30), 11030-11035, 2004', 'cited_by': {'value': 4882, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=9650987578903829104', 'serpapi_link': 'https://serpapi.com/search.json?cites=9650987578903829104&engine=google_scholar&hl=en', 'cites_id': '9650987578903829104'}, 'year': '2004'}, {'title': 'The bottleneck effect and genetic variability in populations', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:eQOLeE2rZwMC', 'citation_id': 'VxOmZDgAAAAJ:eQOLeE2rZwMC', 'authors': 'M Nei, T Maruyama, R Chakraborty', 'publication': 'Evolution, 1-10, 1975', 'cited_by': {'value': 3906, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=13149273985544466189', 'serpapi_link': 'https://serpapi.com/search.json?cites=13149273985544466189&engine=google_scholar&hl=en', 'cites_id': '13149273985544466189'}, 'year': '1975'}, {'title': 'Accuracy of estimated phylogenetic trees from molecular data', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:Se3iqnhoufwC', 'citation_id': 'VxOmZDgAAAAJ:Se3iqnhoufwC', 'authors': 'M Nei, F Tajima, Y Tateno', 'publication': 'Journal of molecular evolution 19 (2), 153-170, 1983', 'cited_by': {'value': 2877, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=10638180566709737898', 'serpapi_link': 'https://serpapi.com/search.json?cites=10638180566709737898&engine=google_scholar&hl=en', 'cites_id': '10638180566709737898'}, 'year': '1983'}, {'title': 'Molecular population genetics and evolution.', 'link': 
'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:WF5omc3nYNoC', 'citation_id': 'VxOmZDgAAAAJ:WF5omc3nYNoC', 'authors': 'M Nei', 'publication': 'Molecular population genetics and evolution., 1975', 'cited_by': {'value': 2795, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=7886550802885580479', 'serpapi_link': 'https://serpapi.com/search.json?cites=7886550802885580479&engine=google_scholar&hl=en', 'cites_id': '7886550802885580479'}, 'year': '1975'}, {'title': 'Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:roLk4NBRz8UC', 'citation_id': 'VxOmZDgAAAAJ:roLk4NBRz8UC', 'authors': 'AL Hughes, M Nei', 'publication': 'Nature 335 (6186), 167-170, 1988', 'cited_by': {'value': 2169, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=2966744676732650646', 'serpapi_link': 'https://serpapi.com/search.json?cites=2966744676732650646&engine=google_scholar&hl=en', 'cites_id': '2966744676732650646'}, 'year': '1988'}, {'title': 'Sampling variances of heterozygosity and genetic distance', 'link': 'https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VxOmZDgAAAAJ&citation_for_view=VxOmZDgAAAAJ:UebtZRa9Y70C', 'citation_id': 'VxOmZDgAAAAJ:UebtZRa9Y70C', 'authors': 'M Nei, AK Roychoudhury', 'publication': 'Genetics 76 (2), 379-390, 1974', 'cited_by': {'value': 1918, 'link': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=5978996318059495400', 'serpapi_link': 'https://serpapi.com/search.json?cites=5978996318059495400&engine=google_scholar&hl=en', 'cites_id': '5978996318059495400'}, 'year': '1974'}]
'''
Disclaimer, I work for SerpApi.
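Since the author_id is the value of the user query parameter in the profile URL, it can also be read with the standard library's URL parser, which does not depend on where the parameter sits in the query string. A small sketch; the helper name extract_author_id is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def extract_author_id(profile_url):
    """Return the value of the `user` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(profile_url).query)
    values = query.get("user")
    return values[0] if values else None


# `user` first, followed by another parameter (URL format from the question).
print(extract_author_id("https://scholar.google.com/citations?user=cp-8uaAAAAAJ&hl=en"))
# `user` after another parameter.
print(extract_author_id("https://scholar.google.com/citations?hl=en&user=VxOmZDgAAAAJ"))
```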
I am using the News API in my Django project. I can print the data in my terminal, but I can't display it through my news.html file. This could be an issue with how the data is passed to the HTML template.
from django.shortcuts import render
import requests
def news(request):
    url = ('https://newsapi.org/v2/top-headlines?'
           'sources=bbc-news&'
           'apiKey=647505e4506e425994ac0dc310221d04')
    response = requests.get(url)
    print(response.json())
    news = response.json()
    return render(request,'new/new.html',{'news':news})
base.html
<html>
<head>
<title></title>
</head>
<body>
{% block content %}
{% endblock %}
</body>
</html>
news.html
{% extends 'base.html' %}
{% block content %}
<h2>news API</h2>
{% if news %}
<p><strong>{{ news.title }}</strong><strong>{{ news.name}}</strong> public repositories.</p>
{% endif %}
{% endblock %}
Terminal and API Output
System check identified no issues (0 silenced).
November 28, 2018 - 12:31:07
Django version 2.1.3, using settings 'map.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
{'status': 'ok', 'totalResults': 10, 'articles': [{'source':
{'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News',
'title': 'Sri Lanka defence chief held over murders',
'description': "The country's top officer is in custody, accused of covering up illegal killings in the civil war.", 'url': 'http://www.bbc.co.uk/news/world-asia-46374111', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/1010/production/_104521140_26571c51-e151-41b9-85a3-d6e441f5262b.jpg', 'publishedAt': '2018-11-28T12:12:05Z', 'content': "Image copyright AFP Image caption Adm Wijeguneratne denies the charges Sri Lanka's top military officer has been remanded in custody, accused of covering up civil war-era murders. Chief of Defence Staff Ravindra Wijeguneratne appeared in court after warrants … [+288 chars]"}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': 'Flash-flooding causes chaos in Sydney', 'description': "Emergency crews respond to hundreds of calls on the city's wettest November day since 1984.", 'url': 'http://www.bbc.co.uk/news/world-australia-46366961', 'urlToImage': 'https://ichef.bbci.co.uk/images/ic/1024x576/p06t1d6h.jpg', 'publishedAt': '2018-11-28T11:58:49Z', 'content': 'Media caption People in vehicles were among those caught up in the floods Sydney has been deluged by the heaviest November rain it has experienced in decades, causing flash-flooding, traffic chaos and power cuts. 
Heavy rain fell throughout Wednesday, the city… [+2282 chars]'}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': "Rapist 'gets chance to see victim's child'", 'description': 'Sammy Woodhouse calls for a law change after rapist Arshid Hussain is given the chance to see his son.', 'url': 'http://www.bbc.co.uk/news/uk-england-south-yorkshire-46368991', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/12C94/production/_95184967_jessica.jpg', 'publishedAt': '2018-11-28T09:38:07Z', 'content': "Image caption Sammy Woodhouse's son was conceived when she was raped by Arshid Hussain A child exploitation scandal victim has called for a law change amid claims a man who raped her has been invited to play a role in her son's life. Arshid Hussain, who was j… [+2543 chars]"}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': 'China chemical plant explosion kills 22', 'description': 'Initial reports say a vehicle carrying chemicals exploded while waiting to enter the north China plant.', 'url': 'http://www.bbc.co.uk/news/world-asia-46369041', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/2E1A/production/_104520811_mediaitem104520808.jpg', 'publishedAt': '2018-11-28T08:03:12Z', 'content': 'Image copyright AFP Image caption A line of burnt out vehicles could be seen outside the chemical plant At least 22 people have died and 22 more were injured in a blast outside a chemical factory in northern China. 
A vehicle carrying chemicals exploded while … [+1252 chars]'}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': 'Thousands told to flee Australia bushfire', 'description': 'Queensland\'s fire danger warning has been raised to "catastrophic" for the first time.', 'url': 'http://www.bbc.co.uk/news/world-australia-46366964', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/8977/production/_104519153_1ccd493b-4500-4d8d-9e6c-f32ba036dd3e.jpg', 'publishedAt': '2018-11-28T07:01:41Z', 'content': 'Image copyright EPA Image caption More than 130 bushfires are burning across Queensland, officials say Thousands of Australians have been told to evacuate their homes as a powerful bushfire threatens properties in Queensland. It follows the raising of the sta… [+974 chars]'}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': "Chinese scientist defends 'gene-editing'", 'description': "He Jiankui shocked the world by claiming he had created the world's first genetically edited children.", 'url': 'http://www.bbc.co.uk/news/world-asia-china-46368731', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/7A23/production/_97176213_breaking_news_bigger.png', 'publishedAt': '2018-11-28T06:00:22Z', 'content': 'A Chinese scientist who claims to have created the world\'s first genetically edited babies has defended his work. Speaking at a genome summit in Hong Kong, He Jiankui, an associate professor at a Shenzhen university, said he was "proud" of his work. 
He said "… [+335 chars]'}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': 'Republican wins Mississippi Senate seat', 'description': "Cindy Hyde-Smith wins Mississippi's Senate race in a vote overshadowed by racial acrimony.", 'url': 'http://www.bbc.co.uk/news/world-us-canada-46361369', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/3A2B/production/_104519841_050855280.jpg', 'publishedAt': '2018-11-28T04:19:15Z', 'content': "Image copyright Reuters Image caption In her victory speech, Cindy Hyde-Smith promised to represent all Mississippians Republican Cindy Hyde-Smith has won Mississippi's racially charged Senate election, beating a challenge from the black Democrat, Mike Espy. … [+4327 chars]"}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': "Lion Air should 'improve safety culture'", 'description': 'Indonesian authorities release a preliminary report into a crash in October that killed 189 people.', 'url': 'http://www.bbc.co.uk/news/world-asia-46121127', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/1762F/production/_104519759_45e74e27-2dc6-45dc-bded-405c057702f5.jpg', 'publishedAt': '2018-11-28T04:10:45Z', 'content': "Image copyright Reuters Image caption The families of the victims visited the site of the crash to pay tribute Indonesian authorities have recommended that budget airline Lion Air improve its safety culture, in a preliminary report into last month's deadly cr… [+1725 chars]"}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': "Trump 'may cancel Putin talks over Ukraine'", 'description': '"I don\'t like the aggression," the US leader says after Russia seizes Ukrainian boats off Crimea.', 'url': 'http://www.bbc.co.uk/news/world-europe-46367191', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/0C77/production/_104519130_050842389.jpg', 'publishedAt': '2018-11-28T01:08:30Z', 'content': 'Image copyright 
AFP Image caption Some of the detained Ukrainians have appeared in court in Crimea US President Donald Trump says he may cancel a meeting with Russian President Vladimir Putin following a maritime clash between Russia and Ukraine. Mr Trump tol… [+4595 chars]'}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': 'Wandering dog home after 2,200-mile adventure', 'description': "Sinatra the husky was found in Florida 18 months after vanishing in New York. Here's how he got home.", 'url': 'http://www.bbc.co.uk/news/world-us-canada-46353240', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/D49E/production/_104503445_p06t0kn9.jpg', 'publishedAt': '2018-11-27T21:47:59Z', 'content': "Video Sinatra the husky was found in Florida 18 months after vanishing in New York. Here's the remarkable story of how he got home."}]}
[28/Nov/2018 12:31:12] "GET / HTTP/1.1" 200 155
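The data you get from that API doesn't have title or name as attributes at the top level. Rather, title belongs to each item of the articles element, which is itself a list, and name is nested one level deeper, under each article's source. So loop over news.articles and read both fields from each article: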
{% for article in news.articles %}
<p><strong>{{ article.title }}</strong><strong>{{ article.source.name }}</strong> public repositories.</p>
{% endfor %}
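To sanity-check the structure in Python before rendering (for example in the view or the Django shell), here is a minimal sketch of the same traversal. The sample dict is trimmed by hand to the shape of the response printed above, and the helper name summarize is hypothetical, not part of Django or the API.

```python
# Hand-trimmed sample with the same shape as the printed API response.
news = {
    "articles": [
        {
            "source": {"id": "bbc-news", "name": "BBC News"},
            "title": "Flash-flooding causes chaos in Sydney",
        },
        {
            "source": {"id": "bbc-news", "name": "BBC News"},
            "title": "China chemical plant explosion kills 22",
        },
    ]
}


def summarize(news):
    """Return (title, source name) pairs, mirroring the template loop.

    Hypothetical helper for illustration only.
    """
    return [
        (article["title"], article["source"]["name"])
        for article in news.get("articles", [])
    ]


pairs = summarize(news)
for title, source_name in pairs:
    print(f"{title} ({source_name})")
```

If this prints the pairs you expect, the same dict passed to the template as news should render with the loop above; if the list comes back empty, the response probably has no articles key, which is worth checking before debugging the template.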