Getting started with json - python

I have never worked with JSON before. I am trying to fetch http://api.worldbank.org//topics?format=JSON and do things with the result, but I don't even know how to get started.
Following some manuals, I did this:
import urllib
import urllib2
import simplejson
urlb = 'http://api.worldbank.org/topics'
datab = urllib2.urlopen(urlb+'?'+ param)
resultb = simplejson.load(datab)
but I have no clue how to parse and work with it now. How do I list the individual items, count them, or filter them? Is there a simple tutorial or step-by-step guide that you can point me to, or any advice? I checked diveintopython, json's website and most of the obvious ones, but I am still struggling with it.
Thanks

Try printing resultb. It's just a Python list with dictionaries inside it. Treat it like you would any list.
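As a rough sketch of what "treat it like any list" can look like, the snippet below lists, counts, and filters the items. The sample data is made up, and its shape (a metadata dictionary followed by a list of topic dictionaries) is an assumption about the response, so print resultb first and adjust the keys to what you actually see.

```python
import json

# Made-up sample standing in for the API response. The shape (a metadata
# dictionary followed by a list of topic dictionaries) is an assumption.
raw = '''
[
  {"page": 1, "total": 3},
  [
    {"id": "1", "value": "Agriculture"},
    {"id": "2", "value": "Education"},
    {"id": "3", "value": "Energy"}
  ]
]
'''

resultb = json.loads(raw)
meta, topics = resultb  # unpack the two top-level items

names = [t["value"] for t in topics]                               # list the items
count = len(topics)                                                # count them
starts_with_e = [t for t in topics if t["value"].startswith("E")]  # filter them

print(names)
print(count)
print(starts_with_e)
```

The same list comprehensions work on the object returned by simplejson.load, since it produces ordinary lists and dictionaries as well.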

Related

How can I add a list item to my URL structure within a loop?

So I'm new to Python and am working on a simple program that will read a text file of protein names (PDB IDs) and create a URL to search a database (the PDB) for that protein and some associated data.
Unfortunately, as a newbie, I forgot to save my script, so I can't recall what I did to make my code work!
Below is my code so far:
import urllib
import urllib.parse
import urllib.request
import os
os.chdir("C:\\PythonProjects\\Samudrala Lab Projects")
protein_file = open("protein_list.txt","r")
protein_list = protein_file.read()
for item in protein_list:
    item = item[0:4]
    query_string =urlencode('customReportColumns','averageBFactor','resolution','experimentalTechnique','service=wsfile','format=csv')
    **final_URL = url + '?pdbid={}{}'.format(url, item, query_string)**
    print(final_URL)
The line of code I'm stuck on is starred.
The object "final_url" within the loop is missing some modification to indicate that I'd like the URL to search for the item as a pdbid. Can anyone give me a hint as to how I can tell the URL to plug in each item on the list as a PDBID?
I'm getting a type error indicating that it's not a valid non-string sequence or mapping object. Original post was edited to add this info.
Please let me know if this is an unclear question, or if you need any additional info.
Thanks!
How about something like this?
final_URL = "{}?pdbids={}{}".format(url, item, query_string)
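A fuller sketch may help here. The type error in the question comes from urlencode, which expects a mapping (or a sequence of two-element tuples) rather than separate strings. Also, protein_file.read() returns one string, so looping over it yields single characters; the sketch loops over lines instead. The endpoint in url and the sample IDs are hypothetical, since the question never shows them, and splitting the parameters into key/value pairs this way is a guess at what the service expects.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the question never shows the real value of `url`.
url = "https://example.org/customReport"

def build_url(pdb_id):
    """Return the report URL for one four-character PDB ID."""
    params = {
        "pdbids": pdb_id,
        "customReportColumns": "averageBFactor,resolution,experimentalTechnique",
        "service": "wsfile",
        "format": "csv",
    }
    # safe="," keeps the commas in the column list readable instead of %2C
    return url + "?" + urlencode(params, safe=",")

# Stand-in for the lines of protein_list.txt (made-up IDs with trailing text).
lines = ["1abc_A\n", "2xyz_B\n"]
for line in lines:
    print(build_url(line.strip()[:4]))
```

With a real file, iterating over the open file object (for line in protein_file) gives one line per pass in the same way.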

bibtex to html with pybtex, python 3

I want to take a file of one or more bibtex entries and output it as an html-formatted string. The specific style is not so important, but let's just say APA. Basically, I want the functionality of bibtex2html but with a Python API since I'm working in Django. A few people have asked similar questions here and here. I also found someone who provided a possible solution here.
The first issue I'm having is pretty basic, which is that I can't even get the above solutions to run. I keep getting errors similar to ModuleNotFoundError: No module named 'pybtex.database'; 'pybtex' is not a package. I definitely have pybtex installed and can make basic API calls in the shell no problem, but whenever I try to import pybtex.database.whatever or pybtex.plugin I keep getting ModuleNotFound errors. Is it maybe a python 2 vs python 3 thing? I'm using the latter.
The second issue is that I'm having trouble understanding the pybtex python API documentation. Specifically, from what I can tell it looks like the format_from_string and format_from_file calls are designed specifically for what I want to do, but I can't seem to get the syntax correct. Specifically, when I do
pybtex.format_from_file('foo.bib',style='html')
I get pybtex.plugin.PluginNotFound: plugin pybtex.style.formatting.html not found. I think I'm just not understanding how the call is supposed to work, and I can't find any examples of how to do it properly.
Here's a function I wrote for a similar use case--incorporating bibliographies into a website generated by Pelican.
from pybtex.plugin import find_plugin
from pybtex.database import parse_string
APA = find_plugin('pybtex.style.formatting', 'apa')()
HTML = find_plugin('pybtex.backends', 'html')()
def bib2html(bibliography, exclude_fields=None):
    exclude_fields = exclude_fields or []
    if exclude_fields:
        bibliography = parse_string(bibliography.to_string('bibtex'), 'bibtex')
        for entry in bibliography.entries.values():
            for ef in exclude_fields:
                if ef in entry.fields.__dict__['_dict']:
                    del entry.fields.__dict__['_dict'][ef]
    formattedBib = APA.format_bibliography(bibliography)
    return "<br>".join(entry.text.render(HTML) for entry in formattedBib)
Make sure you've installed the following:
pybtex==0.22.2
pybtex-apa-style==1.3

How to read and assign variables from an API return that's formatted as Dictionary-List-Dictionary?

So I'm trying to learn Python here, and would appreciate any help you guys could give me. I've written a bit of code that asks one of my favorite websites for some information, and the api call returns an answer in a dictionary. In this dictionary is a list. In that list is a dictionary. This seems crazy to me, but hell, I'm a newbie.
I'm trying to assign the answers to variables, but always get various error messages depending on how I write my {},[], or (). Regardless, I can't get it to work. How do I read this return? Thanks in advance.
{
"answer":
[{"widgets":16,
"widgets_available":16,
"widgets_missing":7,
"widget_flatprice":"156",
"widget_averages":15,
"widget_cost":125,
"widget_profit":"31",
"widget":"90.59"}],
"result":true
}
Edited because I put in the wrong sample code.
You need to show your code, but the de-facto way of doing this is by using the requests module, like this:
import requests
url = 'http://www.example.com/api/v1/something'
r = requests.get(url)
data = r.json() # converts the returned json into a Python dictionary
for item in data['answer']:
    print(item['widgets'])
Assuming that you are not using the requests library (see Burhan's answer), you would use the json module like so:
import json
# Triple quotes are needed because the string spans several lines
data = '''{"answer":
[{"widgets":16,
"widgets_available":16,
"widgets_missing":7,
"widget_flatprice":"156",
"widget_averages":15,
"widget_cost":125,
"widget_profit":"31",
"widget":"90.59"}],
"result":true}'''
data = json.loads(data)
# Now you can use it as you wish
data['answer'] # and so on...
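To get at the question's actual goal, assigning the returned values to variables, one possible sketch with the standard json module follows. It reuses the sample response from the question; the variable names and the conversion of the quoted numbers to int and float are illustrative assumptions, not something the API documents.

```python
import json

# Sample response copied from the question.
raw = '''{"answer": [{"widgets": 16, "widgets_available": 16, "widgets_missing": 7,
 "widget_flatprice": "156", "widget_averages": 15, "widget_cost": 125,
 "widget_profit": "31", "widget": "90.59"}], "result": true}'''

data = json.loads(raw)      # outer dictionary
record = data["answer"][0]  # first (and only) dictionary inside the list

widgets = record["widgets"]            # already an int
profit = int(record["widget_profit"])  # quoted in the JSON, so convert (assumes it is always numeric)
price = float(record["widget"])        # quoted decimal, converted to float
ok = data["result"]                    # JSON true becomes Python True

print(widgets, profit, price, ok)
```

The pattern is always the same: a key in square brackets for a dictionary, an integer index in square brackets for a list, chained from the outside in.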
First, note that you access a dictionary value with ["key"], not {}; see the Python dictionary syntax.
Here is a step by step walkthrough on how to build and access a similar data structure:
First create the main dictionary:
t1 = {"a":0, "b":1}
You can access each element with:
t1["a"] # it'll return a 0
Now let's add the internal list:
t1["a"] = ["x",7,3.14]
and access it using:
t1["a"][2] # it'll return 3.14
Now create the internal dictionary:
t1["a"][2] = {'w1':7,'w2':8,'w3':9}
And access it:
t1["a"][2]['w3'] # it'll return 9
Hope it helped you.

how to pass value from output to a string in python

I've been trying to make an application in Python, and I'm new to Python.
What I actually want to do is this: I want feedparser to read the values from a website's RSS feed, say reddit, then take that output as a string and pass the value further along in my code. My code right now:
import feedparser
import webbrowser
feed = feedparser.parse('http://www.reddit.com/.rss')
print feed['entries'][1]['title']
print feed['entries'][1]['link']
It is working right now: it parses the feed and I get the output I want. Now I want to take the "link" from print feed['entries'][1]['link'] and use it further in the code.
How can I do that? To be more specific, I want to open that URL in my browser.
I came up with something like this:
import feedparser
import webbrowser
feed = feedparser.parse('http://www.reddit.com/.rss')
print feed['entries'][1]['title']
print feed['entries'][1]['link']
mystring = 'feed['entries'][1]['link']'
webbrowser.open('mystring')
It is of course not working... Please Help... if you need to know anything else.. please let me know...
This is Reddit specific so it won't work on other RSS feeds but I thought this might help you.
from __future__ import print_function
import praw
r = praw.Reddit("my_cool_user_agent")
submissions = r.get_front_page()
for x in submissions:
    print("Title: {0} URL: {1} Permalink: {2}".format(x, x.url, x.permalink))
    print ("------------------------------------------------------------")
For Reddit there are 2 URLs that you might be interested in: the actual link that is submitted (the 'external' link... think imgur, etc) and the permalink to the Reddit post itself.
Instead of wrapping feed['entries'][1]['link'] in quotes, which turns it into a literal string, just pass the value itself to webbrowser.open.
Example -
webbrowser.open(feed['entries'][1]['link'])
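If you also want to keep the link around for later use, a small sketch along these lines might do. The helper name open_entry_link and the sample feed are made up for illustration, and the opener parameter exists only so the sketch can run without launching a browser or touching the network.

```python
import webbrowser

def open_entry_link(feed, index, opener=webbrowser.open):
    """Look up the link of one feed entry, hand it to opener, and return it."""
    link = feed["entries"][index]["link"]  # a plain string, usable like any other variable
    opener(link)
    return link

# Made-up stand-in for the parsed feed, so the sketch runs without network access.
sample_feed = {
    "entries": [
        {"title": "First post", "link": "https://example.com/first"},
        {"title": "Second post", "link": "https://example.com/second"},
    ]
}

opened = []  # records URLs instead of launching a browser
mystring = open_entry_link(sample_feed, 1, opener=opened.append)
print(mystring)
```

With the real feed from feedparser.parse, calling open_entry_link(feed, 1) would fall back to webbrowser.open and open the link in the default browser.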

How to do parsing in python?

I'm kind of new to Python, and I'm trying to find out how to do parsing in Python.
I've got a task: parse some data made of symbols unknown to me and put it into a DB. I guess I can create the DB and tables with the help of SQLAlchemy, but I have no idea how to do the parsing or what all the symbols below mean.
http://joxi.ru/YmEVXg6Iq3Q426
http://joxi.ru/E2pvG3NFxYgKrY
$$HDRPUBID 112701130020011127162536
H11127011300UNIQUEPONUMBER120011127
D11127011300UNIQUEPONUMBER100001112345678900000001
D21127011300UNIQUEPONUMBER1000011123456789AR000000001
D11127011300UNIQUEPONUMBER200002123456987X000000001
D21127011300UNIQUEPONUMBER200002123456987XIR000000000This item is inactive. 9781605600000
$$EOFPUBID 1127011300200111271625360000005
Thanks in advance to anyone who can give me some advice on where to start and how the parsing should work.
The best approach is to first figure out where each token begins and ends, and write a regular expression to capture these. The site RegexPal might help you design the regex.
As others suggest, take a look at some regex tutorials and at the help for the re module.
You're probably looking for something like this:
import re
# Field positions are 1-based and inclusive.
headerMapping = {'type': (1,5), 'pubid': (6,11), 'batchID': (12,21),
                 'batchDate': (22,29), 'batchTime': (30,35)}
# text is assumed to hold the whole data file as one string.
poaBatchHeaders = re.findall(r'\$\$HDR\d{30}', text)
parsedBatchHeaders = []
for poaHeader in poaBatchHeaders:
    batchHeaderDict = {}  # a fresh dict per header, so entries do not overwrite each other
    for key in headerMapping:
        start = headerMapping[key][0]-1
        end = headerMapping[key][1]
        batchHeaderDict.update({key: poaHeader[start:end]})
    parsedBatchHeaders.append(batchHeaderDict)
You then have a list of dicts, one per structure found (the POA Batch Header in this example), each holding the data for every attribute. I assume you have your data file in text, which is a string.
If you want to parse it further, you have to write a function for each attribute, for example one that parses the date.
def batchDate(batch):
    # Assumes an 8-character YYYYMMDD value, as the sample header suggests
    return (batch[0:4]+'-'+batch[4:6]+'-'+batch[6:8])
for header in parsedBatchHeaders:
    header.update({'batchDate': batchDate( header['batchDate'] )})
Remember, that's only an example, and I don't know the documentation for your data! I guess it works like that, but the rest is up to you.
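Since the records look fixed-width, plain string slicing may be enough for a single line without any regex. The sketch below applies the field positions from the answer to the header line from the question; those positions, and reading the date as YYYYMMDD, are guesses based on the sample rather than on any documentation of the format.

```python
from datetime import datetime

# Field positions are 1-based and inclusive, copied from headerMapping above.
# They are a guess based on the sample, not on the format's documentation.
HEADER_FIELDS = {
    "type": (1, 5),
    "pubid": (6, 11),
    "batchID": (12, 21),
    "batchDate": (22, 29),
    "batchTime": (30, 35),
}

def parse_fixed_width(line, fields):
    """Cut one fixed-width record into a dict of named, stripped fields."""
    return {name: line[start - 1:end].strip() for name, (start, end) in fields.items()}

sample = "$$HDRPUBID 112701130020011127162536"
header = parse_fixed_width(sample, HEADER_FIELDS)

# Assumption: the date field is YYYYMMDD.
batch_date = datetime.strptime(header["batchDate"], "%Y%m%d").date()

print(header)
print(batch_date.isoformat())
```

The H and D lines could be handled the same way with their own position tables once the field layout is known.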
