I am importing news API in my Django project. I can print my data in my terminal however I can't print through my news.html file. This could be an issue related to importing the data in HTML.
from django.shortcuts import render
import requests
def news(request):
url = ('https://newsapi.org/v2/top-headlines?'
'sources=bbc-news&'
'apiKey=647505e4506e425994ac0dc310221d04')
response = requests.get(url)
print(response.json())
news = response.json()
return render(request,'new/new.html',{'news':news})
base.html
<html>
<head>
<title></title>
</head>
<body>
{% block content %}
{% endblock %}
</body>
</html>
news.html
{% extends 'base.html' %}
{% block content %}
<h2>news API</h2>
{% if news %}
<p><strong>{{ news.title }}</strong><strong>{{ news.name}}</strong> public repositories.</p>
{% endif %}
{% endblock %}
Terminal and API Output
System check identified no issues (0 silenced).
November 28, 2018 - 12:31:07
Django version 2.1.3, using settings 'map.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
{'status': 'ok', 'totalResults': 10, 'articles': [{'source':
{'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News',
'title': 'Sri Lanka defence chief held over murders',
'description': "The country's top officer is in custody, accused of covering up illegal killings in the civil war.", 'url': 'http://www.bbc.co.uk/news/world-asia-46374111', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/1010/production/_104521140_26571c51-e151-41b9-85a3-d6e441f5262b.jpg', 'publishedAt': '2018-11-28T12:12:05Z', 'content': "Image copyright AFP Image caption Adm Wijeguneratne denies the charges Sri Lanka's top military officer has been remanded in custody, accused of covering up civil war-era murders. Chief of Defence Staff Ravindra Wijeguneratne appeared in court after warrants … [+288 chars]"}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': 'Flash-flooding causes chaos in Sydney', 'description': "Emergency crews respond to hundreds of calls on the city's wettest November day since 1984.", 'url': 'http://www.bbc.co.uk/news/world-australia-46366961', 'urlToImage': 'https://ichef.bbci.co.uk/images/ic/1024x576/p06t1d6h.jpg', 'publishedAt': '2018-11-28T11:58:49Z', 'content': 'Media caption People in vehicles were among those caught up in the floods Sydney has been deluged by the heaviest November rain it has experienced in decades, causing flash-flooding, traffic chaos and power cuts. Heavy rain fell throughout Wednesday, the city… [+2282 chars]'}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': "Rapist 'gets chance to see victim's child'", 'description': 'Sammy Woodhouse calls for a law change after rapist Arshid Hussain is given the chance to see his son.', 'url': 'http://www.bbc.co.uk/news/uk-england-south-yorkshire-46368991', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/12C94/production/_95184967_jessica.jpg', 'publishedAt': '2018-11-28T09:38:07Z', 'content': "Image caption Sammy Woodhouse's son was conceived when she was raped by Arshid Hussain A child exploitation scandal victim has called for a law change amid claims a man who raped her has been invited to play a role in her son's life. Arshid Hussain, who was j… [+2543 chars]"}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': 'China chemical plant explosion kills 22', 'description': 'Initial reports say a vehicle carrying chemicals exploded while waiting to enter the north China plant.', 'url': 'http://www.bbc.co.uk/news/world-asia-46369041', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/2E1A/production/_104520811_mediaitem104520808.jpg', 'publishedAt': '2018-11-28T08:03:12Z', 'content': 'Image copyright AFP Image caption A line of burnt out vehicles could be seen outside the chemical plant At least 22 people have died and 22 more were injured in a blast outside a chemical factory in northern China. A vehicle carrying chemicals exploded while … [+1252 chars]'}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': 'Thousands told to flee Australia bushfire', 'description': 'Queensland\'s fire danger warning has been raised to "catastrophic" for the first time.', 'url': 'http://www.bbc.co.uk/news/world-australia-46366964', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/8977/production/_104519153_1ccd493b-4500-4d8d-9e6c-f32ba036dd3e.jpg', 'publishedAt': '2018-11-28T07:01:41Z', 'content': 'Image copyright EPA Image caption More than 130 bushfires are burning across Queensland, officials say Thousands of Australians have been told to evacuate their homes as a powerful bushfire threatens properties in Queensland. It follows the raising of the sta… [+974 chars]'}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': "Chinese scientist defends 'gene-editing'", 'description': "He Jiankui shocked the world by claiming he had created the world's first genetically edited children.", 'url': 'http://www.bbc.co.uk/news/world-asia-china-46368731', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/7A23/production/_97176213_breaking_news_bigger.png', 'publishedAt': '2018-11-28T06:00:22Z', 'content': 'A Chinese scientist who claims to have created the world\'s first genetically edited babies has defended his work. Speaking at a genome summit in Hong Kong, He Jiankui, an associate professor at a Shenzhen university, said he was "proud" of his work. He said "… [+335 chars]'}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': 'Republican wins Mississippi Senate seat', 'description': "Cindy Hyde-Smith wins Mississippi's Senate race in a vote overshadowed by racial acrimony.", 'url': 'http://www.bbc.co.uk/news/world-us-canada-46361369', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/3A2B/production/_104519841_050855280.jpg', 'publishedAt': '2018-11-28T04:19:15Z', 'content': "Image copyright Reuters Image caption In her victory speech, Cindy Hyde-Smith promised to represent all Mississippians Republican Cindy Hyde-Smith has won Mississippi's racially charged Senate election, beating a challenge from the black Democrat, Mike Espy. … [+4327 chars]"}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': "Lion Air should 'improve safety culture'", 'description': 'Indonesian authorities release a preliminary report into a crash in October that killed 189 people.', 'url': 'http://www.bbc.co.uk/news/world-asia-46121127', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/1762F/production/_104519759_45e74e27-2dc6-45dc-bded-405c057702f5.jpg', 'publishedAt': '2018-11-28T04:10:45Z', 'content': "Image copyright Reuters Image caption The families of the victims visited the site of the crash to pay tribute Indonesian authorities have recommended that budget airline Lion Air improve its safety culture, in a preliminary report into last month's deadly cr… [+1725 chars]"}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': "Trump 'may cancel Putin talks over Ukraine'", 'description': '"I don\'t like the aggression," the US leader says after Russia seizes Ukrainian boats off Crimea.', 'url': 'http://www.bbc.co.uk/news/world-europe-46367191', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/0C77/production/_104519130_050842389.jpg', 'publishedAt': '2018-11-28T01:08:30Z', 'content': 'Image copyright AFP Image caption Some of the detained Ukrainians have appeared in court in Crimea US President Donald Trump says he may cancel a meeting with Russian President Vladimir Putin following a maritime clash between Russia and Ukraine. Mr Trump tol… [+4595 chars]'}, {'source': {'id': 'bbc-news', 'name': 'BBC News'}, 'author': 'BBC News', 'title': 'Wandering dog home after 2,200-mile adventure', 'description': "Sinatra the husky was found in Florida 18 months after vanishing in New York. Here's how he got home.", 'url': 'http://www.bbc.co.uk/news/world-us-canada-46353240', 'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/D49E/production/_104503445_p06t0kn9.jpg', 'publishedAt': '2018-11-27T21:47:59Z', 'content': "Video Sinatra the husky was found in Florida 18 months after vanishing in New York. Here's the remarkable story of how he got home."}]}
[28/Nov/2018 12:31:12] "GET / HTTP/1.1" 200 155
The data you get from that API doesn't have title or name as attributes at the top level. Rather, they are inside the articles element, which itself is a list.
{% for article in news.articles %}
<p><strong>{{ article.title }}</strong><strong>{{ article.source.name }}</strong> public repositories.</p>
{% endif %}
Related
I am trying to join two lists and export as csv, but when the csv is built, it's all messed up with spaces that I don't know the origin of and with the first line strangely duplicated as you can see in the attached image.
import csv
amazondata = [{'amzlink': 'https://www.amazon.com/dp/B084ZZ7VY3', 'asin': 'B084ZZ7VY3', 'url': 'https://www.amazon.com/s?k=712145360504&s=review-rank', 'title': '100% Non-GMO Berberine HCL Complex Supplement - Supports Gut, Heart, and Immune System Health- Harvested in The Himalayas, Helps Regulate Blood Sugar & Cholesterol, 100% Free of Additives & Allergens', 'price': '$14.95', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/81D1P4QqLfL._AC_SX425_.jpg', 'rank': 'Best Sellers Rank: #194,130 in Health & Household (See Top 100 in Health & Household)\n#6,896 in Blended Vitamin & Mineral Supplements', 'rating': '4.7 out of 5'}, {'amzlink': 'https://www.amazon.com/dp/B000NRTWS6', 'asin': 'B000NRTWS6', 'url': 'https://www.amazon.com/s?k=753950000698&s=review-rank'}, {'amzlink': 'https://www.amazon.com/dp/B07XM9P4C4', 'asin': 'B07XM9P4C4', 'url': 'https://www.amazon.com/s?k=753950005266&s=review-rank'}, {'amzlink': 'https://www.amazon.com/dp/B08KJ1VQJD', 'asin': 'B08KJ1VQJD', 'url': 'https://www.amazon.com/s?k=753950005242&s=review-rank'}, {'amzlink': 'https://www.amazon.com/dp/B005P0VD4W', 'asin': 'B005P0VD4W', 'url': 'https://www.amazon.com/s?k=043292560180&s=review-rank'}, {'amzlink': 'https://www.amazon.com/dp/B008FCJTES', 'asin': 'B008FCJTES', 'url': 'https://www.amazon.com/s?k=311845053213&s=review-rank'}]
amazonPage = [{'title': '100% Non-GMO Berberine HCL Complex Supplement - Supports Gut, Heart, and Immune System Health- Harvested in The Himalayas, Helps Regulate Blood Sugar & Cholesterol, 100% Free of Additives & Allergens', 'price': '$14.95', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/81D1P4QqLfL._AC_SX425_.jpg', 'rank': 'Best Sellers Rank: #194,130 in Health & Household (See Top 100 in Health & Household)\n#6,896 in Blended Vitamin & Mineral Supplements', 'rating': '4.7 out of 5'}, {'title': "Doctor's Best High Absorption CoQ10 with BioPerine, Vegan, Gluten Free, Naturally Fermented, Heart Health & Energy Production, 100 mg 60 Veggie Caps", 'price': '$14.24 ($0.24 / Count)', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/71sNP5u1N1S._AC_SX425_.jpg', 'rank': 'Rank not found', 'rating': 'Rating not found'}, {'title': "Doctor's Best CoQ10 Gummies 200 Mg, Coenzyme Q10 (Ubiquinone), Supports Heart Health, Boost Cellular Energy, Potent Antioxidant, 60 Ct (Packaging May Vary)", 'price': '$19.96 ($0.33 / Count)', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/71NIwP2V2uL._AC_SY450_.jpg', 'rank': 'Best Sellers Rank: #20,634 in Health & Household (See Top 100 in Health & Household)\n#73 in CoQ10 Nutritional Supplements', 'rating': '4.7 out of 5'}, {'title': 'CoQ10 300mg Doctors Best 30 Softgel', 'price': '$19.64', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/61DttvOf18L._AC_SX425_.jpg', 'rank': 'Manufacturer \u200f : \u200e Doctors Best', 'rating': 'Rating not found'}, {'title': 'Ultra Glandulars Ultra Raw Eye 60 Tab', 'price': '$19.65', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/71ASBPlEphL._AC_SX425_.jpg', 'rank':
'Best Sellers Rank: #286,790 in Health & Household (See Top 100 in Health & Household)\n#14,261 in Herbal Supplements', 'rating': '4.1 out of 5'}, {'title': 'Mason Natural Garlic Oil 500 mg Odorless Allium Sativum Supplement - Supports Healthy Circulatory Function, 100 Softgels', 'price': '$6.75', 'image': 'https://m.media-amazon.com/images/W/IMAGERENDERING_521856-T2/images/I/71eXVu3zxFL._AC_SX425_.jpg', 'rank': 'Best Sellers Rank: #346,287 in Health & Household (See Top 100 in Health & Household)\n#350 in Garlic Herbal Supplements', 'rating': '4.9 out of 5'}]
result = []
amazonPage.extend(amazondata)
for myDict in amazonPage:
if myDict not in result:
result.append(myDict)
print (result)
amazonPage[0].update(amazondata[0])
keys = amazonPage[0].keys()
print(keys)
with open('Test WF.csv', 'w', newline='', encoding="utf-8") as csvfile:
dict_writer = csv.DictWriter(csvfile, keys)
dict_writer.writeheader()
dict_writer.writerows(result)
csv output image
You're only merging the first dictionary in each list. You should merge them all.
result = [data | page for data, page in zip(amazondata, amazonPage)]
keys = result[0].keys()
I'm attempting to get data from Wikipedias sidebar on the 'Current Events' page with the below. At the moment this produces an array of Objects each with value title and url.
I would also like to provide a new value to the objects in array headline derived from the <h3> id or text content. This would result in each object having three values: headline, url and title. However, I'm unsure how to iterate through these.
Beautiful Soup Code
soup = BeautifulSoup(response, "html.parser").find('div', {'aria-labelledby': 'Ongoing_events'})
links = soup.findAll('a')
for item in links:
title = item.text
url = ("https://en.wikipedia.org"+item['href'])
eo = CurrentEventsObject(title, url)
eventsArray.append(eo)
Wikipedia Current Events List
<div class="mw-collapsible-content">
<h3><span class="mw-headline" id="Disasters">Disasters</span</h3>
<ul>
<li>Climate crisis</li>
<li>COVID-19 pandemic</li>
<li>2021–22 European windstorm season</li>
<li>2020–21 H5N8 outbreak</li>
<li>2021 Pacific typhoon season</li>
<li>Madagascar food crisis</li>
<li>Water crisis in Iran</li>
<li>Yemeni famine</li>
<li>2021 La Palma eruption</li>
</ul>
<h3><span class="mw-headline" id="Economic">Economic</span></h3>
<ul>
<li>2020–2021 global chip shortage</li>
<li>2021 global supply chain crisis</li>
<li>COVID-19 recession</li>
<li>Lebanese liquidity crisis</li>
<li>Pandora Papers leak</li>
<li>Sri Lankan economic and food crisis</li>
<li>Turkish currency and debt crisis</li>
<li>United Kingdom natural gas supplier crisis</li>
</ul>
<h3><span class="mw-headline" id="Politics">Politics</span></h3>
<ul>
<li>Belarus−European Union border crisis</li>
<li>Brazilian protests</li>
<li>Colombian tax reform protests</li>
<li>Eswatini protests</li>
<li>Haitian protests</li>
<li>Indian farmers' protests</li>
<li>Insulate Britain protests</li>
<li>Jersey dispute</li>
<li>Libyan peace process</li>
<li>Malaysian political crisis</li>
<li>Myanmar protests</li>
<li>Nicaraguan protests</li>
<li>Nigerian protests</li>
<li>Persian Gulf crisis</li>
<li>Peruvian crisis</li>
<li>Russian election protests</li>
<li>Solomon Islands unrest</li>
<li>Tigrayan peace process</li>
<li>Thai protests</li>
<li>Tunisian political crisis</li>
<li>United States racial unrest</li>
<li>Venezuelan presidential crisis</li>
</ul>
<div class="editlink noprint plainlinks"><a class="external text" href="https://en.wikipedia.org/w/index.php?title=Portal:Current_events/Sidebar&action=edit">edit section</a></div>
</div>
Note: Try to select your elements more specific to get all information in one process - Defining a list outside your loops will avoid from overwriting
Following steps will create a list of dicts, that for example could simply iterated or turned into a data frame.
#1
Select all <ul> that are direct siblings of a <h3>
soup.select('h3 + ul')
#2 Select the <h3> and get its text:
e.find_previous_sibling('h3').text.strip()
#3 Select all <a> in the <ul> and iterat the results while creating a list of dicts:
for a in e.select('a'):
data.append({
'headline':headline,
'title': a['title'],
'url':'https://en.wikipedia.org'+a['href']
})
Example
soup = BeautifulSoup(response, "html.parser").find('div', {'aria-labelledby': 'Ongoing_events'})
data = []
for e in soup.select('h3 + ul'):
headline = e.find_previous_sibling('h3').text.strip()
for a in e.select('a'):
data.append({
'headline':headline,
'title': a['title'],
'url':'https://en.wikipedia.org'+a['href']
})
data
Output
[{'headline': 'Disasters',
'title': 'Climate crisis',
'url': 'https://en.wikipedia.org/wiki/Climate_crisis'},
{'headline': 'Disasters',
'title': 'COVID-19 pandemic',
'url': 'https://en.wikipedia.org/wiki/COVID-19_pandemic'},
{'headline': 'Disasters',
'title': '2021–22 European windstorm season',
'url': 'https://en.wikipedia.org/wiki/2021%E2%80%9322_European_windstorm_season'},
{'headline': 'Disasters',
'title': '2020–21 H5N8 outbreak',
'url': 'https://en.wikipedia.org/wiki/2020%E2%80%9321_H5N8_outbreak'},
{'headline': 'Disasters',
'title': '2021 Pacific typhoon season',
'url': 'https://en.wikipedia.org/wiki/2021_Pacific_typhoon_season'},
{'headline': 'Disasters',
'title': '2021 Madagascar food crisis',
'url': 'https://en.wikipedia.org/wiki/2021_Madagascar_food_crisis'},
{'headline': 'Disasters',
'title': 'Water scarcity in Iran',
'url': 'https://en.wikipedia.org/wiki/Water_scarcity_in_Iran'},
{'headline': 'Disasters',
'title': 'Famine in Yemen (2016–present)',
'url': 'https://en.wikipedia.org/wiki/Famine_in_Yemen_(2016%E2%80%93present)'},
{'headline': 'Disasters',
'title': '2021 Cumbre Vieja volcanic eruption',
'url': 'https://en.wikipedia.org/wiki/2021_Cumbre_Vieja_volcanic_eruption'},
{'headline': 'Economic',
'title': '2020–2021 global chip shortage',
'url': 'https://en.wikipedia.org/wiki/2020%E2%80%932021_global_chip_shortage'},
{'headline': 'Economic',
'title': '2021 global supply chain crisis',
'url': 'https://en.wikipedia.org/wiki/2021_global_supply_chain_crisis'},
{'headline': 'Economic',
'title': 'COVID-19 recession',
'url': 'https://en.wikipedia.org/wiki/COVID-19_recession'},
{'headline': 'Economic',
'title': 'Lebanese liquidity crisis',
'url': 'https://en.wikipedia.org/wiki/Lebanese_liquidity_crisis'},
{'headline': 'Economic',
'title': 'Pandora Papers',
'url': 'https://en.wikipedia.org/wiki/Pandora_Papers'},
{'headline': 'Economic',
'title': '2021 Sri Lankan economic crisis',
'url': 'https://en.wikipedia.org/wiki/2021_Sri_Lankan_economic_crisis'},
{'headline': 'Economic',
'title': '2018–2021 Turkish currency and debt crisis',
'url': 'https://en.wikipedia.org/wiki/2018%E2%80%932021_Turkish_currency_and_debt_crisis'},
{'headline': 'Economic',
'title': '2021 United Kingdom natural gas supplier crisis',
'url': 'https://en.wikipedia.org/wiki/2021_United_Kingdom_natural_gas_supplier_crisis'},
{'headline': 'Politics',
'title': '2021 Belarus–European Union border crisis',
'url': 'https://en.wikipedia.org/wiki/2021_Belarus%E2%80%93European_Union_border_crisis'},
{'headline': 'Politics',
'title': '2021 Brazilian protests',
'url': 'https://en.wikipedia.org/wiki/2021_Brazilian_protests'},
{'headline': 'Politics',
'title': '2021 Colombian protests',
'url': 'https://en.wikipedia.org/wiki/2021_Colombian_protests'},
{'headline': 'Politics',
'title': '2021 Eswatini protests',
'url': 'https://en.wikipedia.org/wiki/2021_Eswatini_protests'},
{'headline': 'Politics',
'title': '2018–2021 Haitian protests',
'url': 'https://en.wikipedia.org/wiki/2018%E2%80%932021_Haitian_protests'},
{'headline': 'Politics',
'title': "2020–2021 Indian farmers' protest",
'url': 'https://en.wikipedia.org/wiki/2020%E2%80%932021_Indian_farmers%27_protest'},
{'headline': 'Politics',
'title': 'Insulate Britain protests',
'url': 'https://en.wikipedia.org/wiki/Insulate_Britain_protests'},
{'headline': 'Politics',
'title': '2021 Jersey dispute',
'url': 'https://en.wikipedia.org/wiki/2021_Jersey_dispute'},
{'headline': 'Politics',
'title': 'Libyan peace process',
'url': 'https://en.wikipedia.org/wiki/Libyan_peace_process'},
{'headline': 'Politics',
'title': '2020–21 Malaysian political crisis',
'url': 'https://en.wikipedia.org/wiki/2020%E2%80%9321_Malaysian_political_crisis'},
{'headline': 'Politics',
'title': '2021 Myanmar protests',
'url': 'https://en.wikipedia.org/wiki/2021_Myanmar_protests'},
{'headline': 'Politics',
'title': '2018–2021 Nicaraguan protests',
'url': 'https://en.wikipedia.org/wiki/2018%E2%80%932021_Nicaraguan_protests'},
{'headline': 'Politics',
'title': 'End SARS',
'url': 'https://en.wikipedia.org/wiki/End_SARS'},
{'headline': 'Politics',
'title': '2019–2021 Persian Gulf crisis',
'url': 'https://en.wikipedia.org/wiki/2019%E2%80%932021_Persian_Gulf_crisis'},
{'headline': 'Politics',
'title': '2017–present Peruvian political crisis',
'url': 'https://en.wikipedia.org/wiki/2017%E2%80%93present_Peruvian_political_crisis'},
{'headline': 'Politics',
'title': '2021 Russian election protests',
'url': 'https://en.wikipedia.org/wiki/2021_Russian_election_protests'},
{'headline': 'Politics',
'title': '2021 Solomon Islands unrest',
'url': 'https://en.wikipedia.org/wiki/2021_Solomon_Islands_unrest'},
{'headline': 'Politics',
'title': 'Tigrayan peace process',
'url': 'https://en.wikipedia.org/wiki/Tigrayan_peace_process'},
{'headline': 'Politics',
'title': '2020–2021 Thai protests',
'url': 'https://en.wikipedia.org/wiki/2020%E2%80%932021_Thai_protests'},
{'headline': 'Politics',
'title': '2021 Tunisian political crisis',
'url': 'https://en.wikipedia.org/wiki/2021_Tunisian_political_crisis'},
{'headline': 'Politics',
'title': '2020–2021 United States racial unrest',
'url': 'https://en.wikipedia.org/wiki/2020%E2%80%932021_United_States_racial_unrest'},
{'headline': 'Politics',
'title': 'Venezuelan presidential crisis',
'url': 'https://en.wikipedia.org/wiki/Venezuelan_presidential_crisis'}]
I have a list that contains dictionaries of lists that contain dictionaries of lists which contain dictionaries of key/value pairs. It looks something like this:
super_data: [{'data': [{'attributes': {'company_name': 'Test Ltd.',
'created_at': '2011-03-31T22:24:37.000Z',
'description': 'Department: Testing Department\r\n'
'Salary: 100000\r\n'
'Bonus: 25000\r\n'
'Position Type: Full Time\r\n'
'Reference: 0\r\n'
'Company Confirmed: 0\r\n'
'Candidate Confirmed: 0\r\n'
'Signing: 0\r\n'
'Performance: -1\r\n'
'Car: -1\r\n'
'Stock: -1\r\n'
'Club: -1\r\n'
'Performance Details: 50%\r\n',
'from': '2003-01-02',
'name': 'Chief Testing Officer',
'present': True,
'to': None,
'updated_at': '2020-07-21T15:16:05.961Z'},
'id': '86735',
'relationships': {},
'type': 'positions'},
{'attributes': {'company_name': 'Test Ltd.',
'created_at': '2011-03-31T22:24:37.000Z',
'description': 'Salary: 60000\r\n',
'from': '2002-01-02',
'name': 'Chief Operating Officer',
'present': False,
'to': '2003-01-02',
'updated_at': '2020-07-07T21:03:00.213Z'},
'id': '8976543',
'relationships': {},
'type': 'positions'}],
'links': {'first': 'https://api.test.com/api/v5/contacts/65789/positions?offset=0',
'next': None,
'prev': None},
'meta': {'count': 2, 'offset': 0}},
{'data': [{'attributes': {'company_name': 'Tester',
'created_at': '2020-01-13T16:15:58.313Z',
'description': None,
'from': '2013-03-01',
'name': 'Chief Operating and Financial Officer',
'present': True,
'to': None,
'updated_at': '2020-05-21T21:45:53.183Z'},
'id': '354833',
'relationships': {},
'type': 'positions'},
{'attributes': {'company_name': 'Testy',
'created_at': '2020-01-13T16:15:58.470Z',
'description':
'from': '2008-02-01',
'name': 'Chief Financial Officer',
'present': False,
'to': '2013-03-01',
'updated_at': '2020-01-13T16:15:58.470Z'},
'id': '53435',
'relationships': {},
'type': 'positions'},
{'attributes': {'company_name': 'Test 3',
'created_at': '2020-01-13T16:15:58.627Z',
'description':
'from': '2000-01-01',
'name': 'Chief Financial Officer'
'Partners',
'present': False,
'to': '2007-01-01',
'updated_at': '2020-01-13T16:15:58.627Z'},
'id': '876534',
'relationships': {},
'type': 'positions'},
{'attributes': {'company_name': 'Test4',
'created_at': '2020-01-13T16:15:58.780Z',
'description':
'from': '1996-01-01',
'name': 'Chief Financial Officer',
'present': False,
'to': '2000-01-01',
'updated_at': '2020-01-13T16:15:58.780Z'},
'id': '65435',
'relationships': {},
'type': 'positions'},
{'attributes': {'company_name': 'Test5',
'created_at': '2020-01-13T16:15:59.291Z',
'description': None,
'from': '1992-01-01',
'name': 'Auditor',
'present': False,
'to': '1996-01-01',
'updated_at': '2020-01-13T16:15:59.291Z'},
'id': '4654654',
'relationships': {},
'type': 'positions'}],
'links': {'first': 'https://api.test.com/api/v5/contacts/45687/positions?offset=0',
'next': None,
'prev': None},
'meta': {'count': 5, 'offset': 0}}]
I'm trying to use this to place specific values in cells in a word template using docxtpl and jinja2 tags.
For example: I want to be able to have the first cell in a column contain the values for all of the 'company_name' keys and 'name' keys inside all of the 'attribute' sets of the first 'data' set, and then have the next ones in subsequent cells in a column.
So I'm trying to make it look like this:
Jobs
______________________________________________
Test Ltd. - Chief Testing Officer
Test Ltd. - Chief Operating Officer
______________________________________________
Tester - Chief Operating and Financial Officer
Testy - Chief Financial Officer
Test 3 - Chief Financial Officer
Test 4 - Chief Financial Officer
Test 5 - Auditor
In my doc template I have a table containing the following jinja2 tags to try create the above, but it's not quite working:
{% hm %} | {% for j in item.data %}v | {%tc endfor %}
{%tc for item |{{ j.attributes.company_name }}{{ j.attributes.name }}|
in super_data %} | {% endfor %} |
Maybe I need to convert the lists to dicts in python to get this to work? Not sure how to do that, if that's the case. Also, the number of dicts and lists is variable.
Would appreciate any advice on how to accomplish what I'm trying to do and working with this data set.. I'm pulling my hair out!
Sometimes it is easier to first do it in normal python then make a jinja template do the same thing. (Just make sure that your super_data is correctly formatted as it was missing some "description" values)
Python draft:
for section in super_data:
for listing in section["data"]:
print(listing["attributes"]["company_name"] + " - " + listing["attributes"]["name"])
print()
Draft result:
Test Ltd. - Chief Testing Officer
Test Ltd. - Chief Operating Officer
Tester - Chief Operating and Financial Officer
Testy - Chief Financial Officer
Test 3 - Chief Financial OfficerPartners
Test4 - Chief Financial Officer
Test5 - Auditor
Jinja2 template to print same thing as the draft: (make sure to pass named super_data argument to jinja rendering function)
<h1>Jobs</h1>
{% for section in super_data %}
{% for listing in section["data"] %}
<p>
{{ listing["attributes"]["company_name"] }} - {{ listing["attributes"]["name"] }}
</p>
{% endfor %}
<hr>
{% endfor %}
Rendered jinja2 template:
I am trying to use the scrape_linkedin package. I follow the section on the github page on how to set up the package/LinkedIn li_at key (which I paste here for clarity).
Getting LI_AT
Navigate to www.linkedin.com and log in
Open browser developer tools (Ctrl-Shift-I or right click -> inspect element)
Select the appropriate tab for your browser (Application on Chrome, Storage on Firefox)
Click the Cookies dropdown on the left-hand menu, and select the www.linkedin.com option
Find and copy the li_at value
Once I collect the li_at value from my LinkedIn, I run the following code:
from scrape_linkedin import ProfileScraper
with ProfileScraper(cookie='myVeryLong_li_at_Code_which_has_characters_like_AQEDAQNZwYQAC5_etc') as scraper:
profile = scraper.scrape(url='https://www.linkedin.com/in/justintrudeau/')
print(profile.to_dict())
I have two questions (I am originally an R user).
How can I input a list of profiles:
https://www.linkedin.com/in/justintrudeau/
https://www.linkedin.com/in/barackobama/
https://www.linkedin.com/in/williamhgates/
https://www.linkedin.com/in/wozniaksteve/
and scrape the profiles? (In R I would use the map function from the purrr package to apply the function to each of the LinkedIn profiles).
The output (from the original github page) is returned in a JSON style format. My second question is how I can convert this into a pandas data frame (i.e. it is returned similar to the following).
{'personal_info': {'name': 'Steve Wozniak', 'headline': 'Fellow at
Apple', 'company': None, 'school': None, 'location': 'San Francisco
Bay Area', 'summary': '', 'image': '', 'followers': '', 'email': None,
'phone': None, 'connected': None, 'websites': [],
'current_company_link': 'https://www.linkedin.com/company/sandisk/'},
'experiences': {'jobs': [{'title': 'Chief Scientist', 'company':
'Fusion-io', 'date_range': 'Jul 2014 – Present', 'location': 'Primary
Data', 'description': "I'm looking into future technologies applicable
to servers and storage, and helping this company, which I love, get
noticed and get a lead so that the world can discover the new amazing
technology they have developed. My role is principally a marketing one
at present but that will change over time.", 'li_company_url':
'https://www.linkedin.com/company/sandisk/'}, {'title': 'Fellow',
'company': 'Apple', 'date_range': 'Mar 1976 – Present', 'location': '1
Infinite Loop, Cupertino, CA 94015', 'description': 'Digital Design
engineer.', 'li_company_url': ''}, {'title': 'President & CTO',
'company': 'Wheels of Zeus', 'date_range': '2002 – 2005', 'location':
None, 'description': None, 'li_company_url':
'https://www.linkedin.com/company/wheels-of-zeus/'}, {'title':
'diagnostic programmer', 'company': 'TENET Inc.', 'date_range': '1970
– 1971', 'location': None, 'description': None, 'li_company_url':
''}], 'education': [{'name': 'University of California, Berkeley',
'degree': 'BS', 'grades': None, 'field_of_study': 'EE & CS',
'date_range': '1971 – 1986', 'activities': None}, {'name': 'University
of Colorado Boulder', 'degree': 'Honorary PhD.', 'grades': None,
'field_of_study': 'Electrical and Electronics Engineering',
'date_range': '1968 – 1969', 'activities': None}], 'volunteering':
[]}, 'skills': [], 'accomplishments': {'publications': [],
'certifications': [], 'patents': [], 'courses': [], 'projects': [],
'honors': [], 'test_scores': [], 'languages': [], 'organizations':
[]}, 'interests': ['Western Digital', 'University of Colorado
Boulder', 'Western Digital Data Center Solutions', 'NEW Homebrew
Computer Club', 'Wheels of Zeus', 'SanDisk®']}
Firstly, You can create a custom function to scrape data and use map function in Python to apply it over each profile link.
Secondly, to create a pandas dataframe using a dictionary, you can simply pass the dictionary to pd.DataFrame.
Thus to create a dataframe df, with dictionary dict, you can do like this:
df = pd.DataFrame(dict)
I'm new to the concept of generators and I'm struggling with how to apply my changes to the records within the generator object returned from the RISparser module.
I understand that a generator only reads a record at a time and doesn't actually store the data in memory but I'm having a tough time iterating over it effectively and applying my changes.
My changes will involve dropping records that have not got ['doi'] values that are contained within a list of DOIs [doi_match].
doi_match = ['10.1002/14651858.CD008259.pub2','10.1002/14651858.CD011552','10.1002/14651858.CD011990']
Generator object returned form RISparser contains the following information, this is just the first 2 records returned of a few 100. I want to iterate over it and compare the 'doi': key from the generator with the list of DOIs.
{'type_of_reference': 'JOUR', 'title': "The CoRe Outcomes in WomeN's health (CROWN) initiative: Journal editors invite researchers to develop core outcomes in women's health", 'secondary_title': 'Neurourology and Urodynamics', 'alternate_title1': 'Neurourol. Urodyn.', 'volume': '33', 'number': '8', 'start_page': '1176', 'end_page': '1177', 'year': '2014', 'doi': '10.1002/nau.22674', 'issn': '07332467 (ISSN)', 'authors': ['Khan, K.'], 'keywords': ['Bias (epidemiology)', 'Clinical trials', 'Consensus', 'Endpoint determination/standards', 'Evidence-based medicine', 'Guidelines', 'Research design/standards', 'Systematic reviews', 'Treatment outcome', 'consensus', 'editor', 'female', 'human', 'medical literature', 'Note', 'outcomes research', 'peer review', 'randomized controlled trial (topic)', 'systematic review (topic)', "women's health", 'outcome assessment', 'personnel', 'publication', 'Female', 'Humans', 'Outcome Assessment (Health Care)', 'Periodicals as Topic', 'Research Personnel', "Women's Health"], 'publisher': 'John Wiley and Sons Inc.', 'notes': ['Export Date: 14 July 2020', 'CODEN: NEURE'], 'type_of_work': 'Note', 'name_of_database': 'Scopus', 'custom2': '25270392', 'language': 'English', 'url': 'https://www.scopus.com/inward/record.uri?eid=2-s2.0-84908368202&doi=10.1002%2fnau.22674&partnerID=40&md5=b220702e005430b637ef9d80a94dadc4'}
{'type_of_reference': 'JOUR', 'title': "The CROWN initiative: Journal editors invite researchers to develop core outcomes in women's health", 'secondary_title': 'Gynecologic Oncology', 'alternate_title1': 'Gynecol. Oncol.', 'volume': '134', 'number': '3', 'start_page': '443', 'end_page': '444', 'year': '2014', 'doi': '10.1016/j.ygyno.2014.05.005', 'issn': '00908258 (ISSN)', 'authors': ['Karlan, B.Y.'], 'author_address': 'Gynecologic Oncology and Gynecologic Oncology Reports, India', 'keywords': ['clinical trial (topic)', 'decision making', 'Editorial', 'evidence based practice', 'female infertility', 'health care personnel', 'human', 'outcome assessment', 'outcomes research', 'peer review', 'practice guideline', 'premature labor', 'priority journal', 'publication', 'systematic review (topic)', "women's health", 'editorial', 'female', 'outcome assessment', 'personnel', 'publication', 'Female', 'Humans', 'Outcome Assessment (Health Care)', 'Periodicals as Topic', 'Research Personnel', "Women's Health"], 'publisher': 'Academic Press Inc.', 'notes': ['Export Date: 14 July 2020', 'CODEN: GYNOA', 'Correspondence Address: Karlan, B.Y.; Gynecologic Oncology and Gynecologic Oncology ReportsIndia'], 'type_of_work': 'Editorial', 'name_of_database': 'Scopus', 'custom2': '25199578', 'language': 'English', 'url': 'https://www.scopus.com/inward/record.uri?eid=2-s2.0-84908351159&doi=10.1016%2fj.ygyno.2014.05.005&partnerID=40&md5=ab5a4d26d52c12d081e38364b0c79678'}
I tried iterating over the generator and applying the changes. But the records that have matches are not being placed in the match list.
match = []
for entry in ris_records:
if entry['doi'] in doi_match:
match.append(entry)
else:
del entry
any advice on how to iterate over a generator correctly, thanks.