I'm trying to pull very specific elements from a dictionary of RSS data that was fetched using the feedparser library, then place that data into a new dictionary so I can use it later in Flask. I'm doing this because the original dictionary contains tons of metadata I don't need.
I have broken down the process into simple steps but keep getting hung up on creating the new dictionary! As it is below, it does create a dictionary object, but it's not comprehensive: it only contains a single article's title, URL and description, and the rest is absent.
I've tried switching to other RSS feeds and had the same result, so it would appear the problem is either the way I'm trying to do it, or there's something wrong with the structure of the list generated by feedparser.
Here's my code:
from html.parser import HTMLParser
import feedparser
def get_feed():
    url = "http://thefreethoughtproject.com/feed/"
    front_page = feedparser.parse(url)
    return front_page
feed = get_feed()
# make a dictionary to update with the vital information
posts = {}
for i in range(0, len(feed['entries'])):
    posts.update({
        'title': feed['entries'][i].title,
        'description': feed['entries'][i].summary,
        'url': feed['entries'][i].link,
    })
print(posts)
Ultimately, I'd like to have a dictionary like the following, except that it keeps going with more articles:
[{'Title': 'Trump Does Another Ridiculous Thing',
'Description': 'Witnesses looked on in awe as the Donald did this thing',
'Link': 'SomeNewsWebsite.com/Story12345'},
{...},
{...}]
Something tells me it's a simple mistake: perhaps the syntax is off, or I'm forgetting a small yet important detail.
Your code updates the same dict over and over again. Every iteration writes to the same 'title', 'description' and 'url' keys, so at the end of the loop you only have one dict, holding the last entry. Your example data shows that you actually want a list of dictionaries:
# make a list to hold the vital information
posts = []
for entry in feed['entries']:
    posts.append({
        'title': entry.title,
        'description': entry.summary,
        'url': entry.link,
    })
print(posts)
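The same loop can also be written as a list comprehension. The sketch below replaces the live feed with a hypothetical hard-coded `feed` dict (the titles and URLs are made up) so it runs offline. It uses `entry['title']` instead of `entry.title` so that plain dicts work; feedparser's entries are dict subclasses, so key access should work on them too.

```python
# Hypothetical stand-in for feedparser.parse(url); the data is invented.
feed = {
    'entries': [
        {'title': 'First story', 'summary': 'First summary', 'link': 'https://example.com/1'},
        {'title': 'Second story', 'summary': 'Second summary', 'link': 'https://example.com/2'},
    ]
}

# Build one small dict per entry and collect them all in a list.
posts = [
    {'title': entry['title'], 'description': entry['summary'], 'url': entry['link']}
    for entry in feed['entries']
]

print(len(posts))       # 2
print(posts[1]['url'])  # https://example.com/2
```

Because the result is an ordinary list of dicts, it is easy to loop over later, for example in a template.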
It seems the problem is that you are using a dict instead of a list. You keep updating the same keys of that dict, so each iteration overwrites the content added by the previous one.
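A tiny offline demonstration of that overwriting, using made-up rows instead of feed entries:

```python
rows = [{'title': 'A'}, {'title': 'B'}, {'title': 'C'}]

# dict.update writes into the same 'title' key every time,
# so only the last value survives.
merged = {}
for row in rows:
    merged.update(row)

# list.append keeps every dict as a separate element.
collected = []
for row in rows:
    collected.append(row)

print(merged)     # {'title': 'C'}
print(collected)  # [{'title': 'A'}, {'title': 'B'}, {'title': 'C'}]
```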
I think that the following code will solve your problem:
from html.parser import HTMLParser
import feedparser
def get_feed():
    url = "http://thefreethoughtproject.com/feed/"
    front_page = feedparser.parse(url)
    return front_page
feed = get_feed()
# make a list to hold the vital information
posts = [] # It should be a list
for i in range(0, len(feed['entries'])):
    posts.append({
        'title': feed['entries'][i].title,
        'description': feed['entries'][i].summary,
        'url': feed['entries'][i].link,
    })
print(posts)
As you can see, the code above defines the posts variable as a list. The loop then appends one dict per entry to this list, which gives you the data structure you want.
I hope this helps.
Related
Let's say my JSON looks like this
Under Post, the keys are constantly changing. If they were stable, I could retrieve the title by just doing this:
['payload']['references']['Post']['CONSTANT']['title']
But the ['CONSTANT'] key (for example ['4c708604012f']) changes whenever there are new posts, so I'm not sure how to retrieve the title.
Thanks for any help
Get all of the .keys() of the changing dictionary, and then use each key in a loop:
titles = []
for constant in json['payload']['references']['Post'].keys():
    titles.append(json['payload']['references']['Post'][constant]['title'])
Loop through all the elements of Post:
for post in var['payload']['references']['Post'].values():
    print(post['title'])
You can collect all of them in a list:
titles = [post['title'] for post in var['payload']['references']['Post'].values()]
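Here is a self-contained check of both approaches. The JSON is hypothetical: the key '4c708604012f' comes from the question, but the second key and both titles are made up. `.values()` yields the inner dicts directly. `.items()` yields (key, value) tuples, so it is the one to use when you also want the changing ID.

```python
import json

# Hypothetical payload; the post IDs play the role of the changing keys.
raw = '''{"payload": {"references": {"Post": {
    "4c708604012f": {"title": "First post"},
    "9a1b2c3d4e5f": {"title": "Second post"}
}}}}'''
data = json.loads(raw)

posts = data['payload']['references']['Post']

# .values() gives each post dict, whatever its key is.
titles = [post['title'] for post in posts.values()]

# .items() gives (key, value) pairs, useful if you also need the ID.
by_id = {post_id: post['title'] for post_id, post in posts.items()}

print(titles)  # ['First post', 'Second post']
```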
I am trying to make some (JSON) API calls to our Wi-Fi controller and obtain some info. When I store the JSON response in a dict, it somehow only sees a few keys, namely:
dict_keys(['totalCount', 'hasMore', 'firstIndex', 'list'])
and items:
dict_items([('totalCount', 32), ('hasMore', False), ('firstIndex', 0),
            ('list', [{'id': 'ehgfhafgf', 'name': 'fasdfsd xxxx'},
                      {'id': 'efasfsfas', 'name': 'zxcva'}])])
I removed a lot of items so it would make sense; otherwise it would be too much text.
As you can see, the dict recognizes the wrong variables as keys, because the keys I need are id and name. Is there a way to manually assign dict keys, or a trick to simulate this?
My piece of code:
import requests
# Function that retrieves all zones
def getZones():
    r = requests.get("url..", verify=False, cookies=cookie)
    print(type(r.json()))
    jsonResponse = r.json()
    print("items: \n")
    print(jsonResponse.items())
    print("\nkeys: \n")
    print(jsonResponse.keys())
    print(jsonResponse.get('id'))
    return r
I'm doing a lot of prints for debugging.
Your question would have been clearer if you had shown the actual JSON response.
However, it is clear from what you have posted that id and name are not top-level keys. They are keys of the dictionaries inside the list stored under the 'list' key, so you should get them from there:
for item in jsonResponse['list']:
    print(item['id'], item['name'])
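If what you want is a dict keyed by id (the "manually assign dict keys" part of the question), you can build one from that list. This is a sketch on a hypothetical response shaped like the printed dict_items above. The name 'zone-1' is made up, and the pagination fields are copied only for shape.

```python
# Hypothetical response with the same shape as the printed dict_items.
jsonResponse = {
    'totalCount': 2,
    'hasMore': False,
    'firstIndex': 0,
    'list': [
        {'id': 'ehgfhafgf', 'name': 'zone-1'},
        {'id': 'efasfsfas', 'name': 'zxcva'},
    ],
}

# New dict: each zone id maps to its name.
zones_by_id = {item['id']: item['name'] for item in jsonResponse['list']}

print(zones_by_id['efasfsfas'])  # zxcva
```

Note that 'hasMore' and 'firstIndex' look like paging fields, so a real controller may return the zones across several responses.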
I'm a Python novice, thanks for your patience.
I retrieved a web page, using the requests module. I used Beautiful Soup to harvest a few hundred href objects (links). I used uritools to create an array of full URLs for the target pages I want to download.
I don't want everybody who reads this note to bombard the web server with requests, so I'll show a hypothetical example that is realistic for just 2 hrefs. The array looks like this:
hrefs2 = ['http://ku.edu/pls/WP040?PT001F01=910&pf7331=11',
'http://ku.edu/pls/WP040?PT001F01=910&pf7331=12']
If I were typing these into 100s of lines of code, I understand what to do in order to retrieve each page:
from lxml import html
import requests
url = 'http://ku.edu/pls/WP040/'
payload = {'PT001F01': '910', 'pf7331': '11'}
r = requests.get(url, params=payload)
Then get the second page:
payload = {'PT001F01': '910', 'pf7331': '12'}
r = requests.get(url, params=payload)
And keep typing in payload objects. Not all of the hrefs I'm dealing with are sequential, and not all of the payloads differ only in the last integer.
I want to automate this and I don't see how to create the payloads from the hrefs2 array.
While fiddling with uritools, I found urisplit, which can give me the part I need to parse into a payload:
[urisplit(x)[3] for x in hrefs2]
['PT001F01=910&pf7331=11',
'PT001F01=910&pf7331=12']
Each one of those has to be turned into a payload object and I don't understand what to do.
I'm using Python3 and I used uritools because that appears to be the standards-compliant replacement of urltools.
I fell back on a shell script to get the pages with wget, which does work, but it is so un-Python-ish that I'm asking here what to do. I mean, this does work:
import subprocess
for i in hrefs2:
    subprocess.call(["wget", i])
You can pass the full url to requests.get() without splitting up the parameters.
>>> requests.get('http://ku.edu/pls/WP040?PT001F01=910&pf7331=12')
<Response [200]>
If for some reason you don't want to do that, you'll need to split up the parameters somehow. I'm sure there are better ways to do it, but the first thing that comes to mind is:
a = ['PT001F01=910&pf7331=11',
     'PT001F01=910&pf7331=12']
# list to store all url parameters after they're converted to dicts
urldata = []
# iterate over list of params
for param in a:
    data = {}
    # split the string into key value pairs
    for kv in param.split('&'):
        # split the pairs up
        b = kv.split('=')
        # first part is the key, second is the value
        data[b[0]] = b[1]
    # After converting every kv pair in the parameter, add the result to a list.
    urldata.append(data)
You could do this with less code but I wanted to be clear what was going on. I'm sure there is already a module somewhere out there that does this for you too.
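For reference, the standard library already covers this. urllib.parse.urlsplit separates the query string from the rest of the URL, and urllib.parse.parse_qsl turns the query into (key, value) pairs, decoding percent-escapes along the way. Here is a sketch using the two example hrefs from the question:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl

hrefs2 = ['http://ku.edu/pls/WP040?PT001F01=910&pf7331=11',
          'http://ku.edu/pls/WP040?PT001F01=910&pf7331=12']

payloads = []
for href in hrefs2:
    parts = urlsplit(href)
    # Rebuild the URL without its query string or fragment.
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))
    # Turn 'PT001F01=910&pf7331=11' into {'PT001F01': '910', 'pf7331': '11'}.
    payload = dict(parse_qsl(parts.query))
    payloads.append((base_url, payload))

print(payloads[0])
# ('http://ku.edu/pls/WP040', {'PT001F01': '910', 'pf7331': '11'})
```

Each pair can then be fetched with requests.get(base_url, params=payload). dict() keeps only the last value when a key repeats, so if a key can appear more than once, parse_qs (which returns lists of values) is the safer choice.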
So I'm trying to learn Python here, and would appreciate any help you guys could give me. I've written a bit of code that asks one of my favorite websites for some information, and the api call returns an answer in a dictionary. In this dictionary is a list. In that list is a dictionary. This seems crazy to me, but hell, I'm a newbie.
I'm trying to assign the answers to variables, but always get various error messages depending on how I write my {},[], or (). Regardless, I can't get it to work. How do I read this return? Thanks in advance.
{
"answer":
[{"widgets":16,
"widgets_available":16,
"widgets_missing":7,
"widget_flatprice":"156",
"widget_averages":15,
"widget_cost":125,
"widget_profit":"31",
"widget":"90.59"}],
"result":true
}
Edited because I put in the wrong sample code.
You need to show your code, but the de facto way of doing this is by using the requests module, like this:
import requests
url = 'http://www.example.com/api/v1/something'
r = requests.get(url)
data = r.json()  # converts the returned JSON into a Python dictionary
for item in data['answer']:
    print(item['widgets'])
Assuming that you are not using the requests library (see Burhan's answer), you would use the json module like so:
import json
# A triple-quoted string is needed because the JSON spans several lines
data = '''{"answer":
  [{"widgets":16,
    "widgets_available":16,
    "widgets_missing":7,
    "widget_flatprice":"156",
    "widget_averages":15,
    "widget_cost":125,
    "widget_profit":"31",
    "widget":"90.59"}],
  "result":true}'''
data = json.loads(data)
# Now you can use it as you wish
data['answer']  # and so on...
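To get from there to individual variables: "answer" holds a list containing one dict, so you index the list with [0] first and then use the dict keys. This is a sketch on the sample JSON from the question, pasted in as a string so no network call is made. Some values are quoted in the JSON, so they arrive as strings and need converting.

```python
import json

raw = '''{"answer": [{"widgets": 16, "widgets_available": 16, "widgets_missing": 7,
  "widget_flatprice": "156", "widget_averages": 15, "widget_cost": 125,
  "widget_profit": "31", "widget": "90.59"}], "result": true}'''

data = json.loads(raw)

first = data['answer'][0]                # the list holds a single dict
widgets = first['widgets']               # already an int
profit = float(first['widget_profit'])   # quoted in the JSON, so it is a str
ok = data['result']                      # JSON true becomes Python True

print(widgets, profit, ok)  # 16 31.0 True
```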
First, note that to access a dictionary value you use ["key"], not {}. See the Python documentation on dictionaries for the syntax.
Here is a step by step walkthrough on how to build and access a similar data structure:
First create the main dictionary:
t1 = {"a":0, "b":1}
You can access each element with:
t1["a"] # it'll return a 0
Now let's add the internal list:
t1["a"] = ["x",7,3.14]
and access it using:
t1["a"][2] # it'll return 3.14
Now creating the internal dictionary:
t1["a"][2] = {'w1':7,'w2':8,'w3':9}
And access:
t1["a"][2]['w3'] # it'll return 9
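Putting the three steps together, here is the final structure and the lookup, which takes one subscript per level of nesting:

```python
t1 = {"a": 0, "b": 1}
t1["a"] = ["x", 7, 3.14]                   # value under "a" is now a list
t1["a"][2] = {'w1': 7, 'w2': 8, 'w3': 9}   # third list item is now a dict

print(t1)
# {'a': ['x', 7, {'w1': 7, 'w2': 8, 'w3': 9}], 'b': 1}

print(t1["a"][2]["w3"])  # 9
```

The widget response has the same shape: data["answer"][0]["widgets"] is a dict lookup, then a list index, then another dict lookup.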
Hope this helps.
The title says it all. I have a dict (a very, very big dict), and it contains this:
'orderItems': {
    'entries': [{
        'links': {
            'order': {
                'href': 'https://api-latest.wdpro.xxxxx.com/booking-servicx/xxxxx/154301425212-3420290-4070919-6588782'
            }
So orderItems is a dict containing entries, which is a list; each item in that list contains links, and what I need is the href inside order.
I'm getting the list with: orderlink = json_response["orderItems"]["entries"]
but I'm not sure how to go through the list to find the href. Maybe with in?
Thanks.
To access elements in a list, you have to use numeric indexes, or process all of them.
The best approach is probably a for loop, which guarantees you iterate over every entry in the list:
hrefs = []
for entry in orderlink:
    hrefs.append(entry["links"]["order"]["href"])
That gives you a list containing only the desired URLs.
Assuming you have that JSON structure, I would use this code to solve your problem:
# Suppose that json_response is the whole dictionary
entry_list = json_response["orderItems"]["entries"]
# Now for each entry in the list, you need to get the "href" field
hrefs = []
for entry in entry_list:
    curr_href = entry["links"]["order"]["href"]
    hrefs.append(curr_href)
You need to pay attention to the dictionary structure to access the field correctly. Before using this code, please read about dictionaries in the Python 3 documentation.
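The same extraction fits in one list comprehension, and the in operator the question mentions works well as a guard if some entries might lack an order link. This is a sketch on a hypothetical, much smaller json_response. The URLs and the third entry without an order link are made up.

```python
# Hypothetical, trimmed-down version of the big response.
json_response = {
    'orderItems': {
        'entries': [
            {'links': {'order': {'href': 'https://api.example.com/order/1'}}},
            {'links': {'order': {'href': 'https://api.example.com/order/2'}}},
            {'links': {}},  # made-up entry with no order link
        ]
    }
}

# Keep only entries whose links dict actually contains an 'order' key.
hrefs = [
    entry['links']['order']['href']
    for entry in json_response['orderItems']['entries']
    if 'order' in entry.get('links', {})
]

print(hrefs)  # ['https://api.example.com/order/1', 'https://api.example.com/order/2']
```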