Python JSON dictionary key error

I'm trying to collect data from a JSON file using Python. I was able to access several chunks of text, but when I get to the third object in the JSON file I get a KeyError. The first three lines below work fine, but the last line raises the error.
import json
import urllib

response = urllib.urlopen("http://asn.desire2learn.com/resources/D2740436.json")
data = json.loads(response.read())
title = data["http://asn.desire2learn.com/resources/D2740436"]["http://purl.org/dc/elements/1.1/title"][0]["value"]
description = data["http://asn.desire2learn.com/resources/D2740436"]["http://purl.org/dc/terms/description"][0]["value"]
topics = data["http://asn.desire2learn.com/resources/D2740436"]["http://purl.org/gem/qualifiers/hasChild"]
topicDesc = data["http://asn.desire2learn.com/resources/S2743916"]
Here is the JSON file I'm using: http://s3.amazonaws.com/asnstaticd2l/data/rdf/D2742493.json. I went through all the braces and can't figure out why I'm getting this error. Does anyone know why?

topics = data["http://asn.desire2learn.com/resources/D2740436"]["http://purl.org/gem/qualifiers/hasChild"]
I don't see the key "http://asn.desire2learn.com/resources/D2740436" anywhere in your source file. You didn't include your stack trace, but my first thought would be a typo resulting in a bad key, giving you an error like:
KeyError: "http://asn.desire2learn.com/resources/D2740436"
which means that key does not exist in the data you are referencing.

The link in your code and your AWS link go to very different files. Open up the link in your code in a web browser, and you will find that it's much shorter than the file on AWS. It doesn't actually contain the key you're looking for.

You say that you are using the linked file, in which the key "http://asn.desire2learn.com/resources/S2743916" turns up once.
However, your code downloads a different file, one in which that key does not appear.
Fetch the file you actually linked, and the key lookup should work.
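As a quick sanity check, you can print the top-level keys of whatever file you actually downloaded before indexing into it. A minimal sketch (Python 2, matching the urllib usage in the question; the URL is the AWS file from the question):

import json
import urllib

# Download the AWS copy of the file.
response = urllib.urlopen("http://s3.amazonaws.com/asnstaticd2l/data/rdf/D2742493.json")
data = json.loads(response.read())

# List the top-level keys so you can see exactly what is available.
for key in data:
    print key

# dict.get() returns None instead of raising KeyError for a missing key.
topic = data.get("http://asn.desire2learn.com/resources/S2743916")
if topic is None:
    print "key not found in this file"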

python-libtorrent torrent_info method

I've been using python-libtorrent to check which pieces belong to which file in a torrent containing multiple files.
I'm using the code below to iterate over the torrent's files:
import libtorrent

info = libtorrent.torrent_info('~/.torrent')
for f in info.files():
    print f
But this prints <libtorrent.file_entry object at 0x7f0eda4fdcf0> and I don't know how to extract information from it.
I'm not aware of a torrent_info method that would return piece information for the various files. Any help is appreciated.
The API is documented here and here. Obviously the Python API can't always match the C++ one exactly, but generally the interface takes a file index and returns some property of that file.
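For example, here is a hedged sketch of reading per-file properties and mapping each file to its piece range. It assumes a python-libtorrent version where files() yields file_entry objects with path and size attributes (as the question's output suggests) and uses torrent_info.map_file(), which converts a byte offset within a file to a piece:

import libtorrent

info = libtorrent.torrent_info('~/.torrent')

for idx, f in enumerate(info.files()):
    # map_file(file_index, offset, size) returns a peer_request;
    # its .piece field is the piece containing that byte of the file.
    first_piece = info.map_file(idx, 0, 0).piece
    last_piece = info.map_file(idx, max(f.size - 1, 0), 0).piece
    print f.path, 'spans pieces', first_piece, 'through', last_piece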

GitHub API response with fewer files

I am new to the GitHub API.
I am writing a Python program (using requests) that should list all the changed/added files of a pull request in a given repository.
Using the API I am able to list all the pull requests and get their numbers. However, when I try to get the information about the files, the response does not contain all the files in the pull request.
pf = session.get(f'https://api.github.com/repos/{r}/pulls/{pull_num}/files')
pj = pf.json()
pprint.pprint(pj)
for i in range(len(pj)):
    print(pj[i]['filename'])
(I know there might be a prettier way, Python is not really my cup of coffee yet, but when I compare the pf.text with the output of this snippet, the result is identical.)
I know that there is a limit of 300 files, as mentioned in the documentation, but the problem occurs even when their total number is less than 300.
I created a test repo with a single pull request that adds files called file1, file2, ..., file222, and after I send the GET request, the response only contains the filenames:
file1, file10, file100, file101, file102, file103, file104, file105, file106, file107, file108, file109, file11, file110, file111, file112, file113, file114, file115, file116, file117, file118, file119, file12, file120, file121, file122, file123, file124, file125
Is there another limit that I don't know about? Or why would the response contain only those filenames? How do I get all of them?
I found a solution a while after I posted the question. The API sends only one page of entries (filenames) at a time, along with a link to the next page in the response header. The files from the question are the first page: the first few in alphabetical order.
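A minimal sketch of following that pagination with requests (r and pull_num are the repository and pull request number from the question; requests exposes the parsed Link header as Response.links):

import requests

session = requests.Session()
url = f'https://api.github.com/repos/{r}/pulls/{pull_num}/files?per_page=100'
filenames = []

while url:
    resp = session.get(url)
    filenames.extend(item['filename'] for item in resp.json())
    # requests parses the Link response header into resp.links;
    # there is no 'next' entry on the last page, ending the loop.
    url = resp.links.get('next', {}).get('url')

print(len(filenames), 'files')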

Python KeyError if logic or try logic

I'm trying to loop through some JSON data to export it to CSV. All goes well until I get to a portion of the data where I need certain field values, and these fields do not always exist beneath "tags".
I'm getting the error:
for alarm in tag["alarmst"]:
KeyError: 'alarmst'
I believe, from reading Built-in Exceptions, that this means the key/field just does not exist.
I read in Errors and Exceptions that I can put this logic in a try statement to say: if this key does not exist, don't give me the error; do something else instead, or move on to the next set of records beneath "tag" and just dump those fields (and the others specified) to the file.
I'm having trouble figuring out how to stop getting this error and to only call csv_file.writerow() with all the field values when "alarmst" exists.
Since I will be working with one file, and the processes that run before this Python process will have already extracted the "devs" and "tags" to their own CSV files, I cannot parse the data further to cut down on the nested for loops.
I'm not sure whether the issue with if tag["alarmst"] in tag: is due to the depth of the nested for loops, whether I need a try statement instead, or whether I'm just doing something else wrong, since I'm new to Python at this level of coding.
I'm running this on Windows 10, if that makes any difference, but I assume it doesn't.
Starting Code:
import json
import csv

with open('C:\\folder\\dev\\TagAlarms.txt', "r") as file:
    data = json.load(file)

with open('C:\\folder\\dev\\TagAlarms.csv', "w", newline='') as file:
    csv_file = csv.writer(file)
    for dev in data["devs"]:
        for tag in dev["tags"]:
            for alarm in tag["alarmst"]:
                csv_file.writerow([alarm['dateStatus'], alarm['dateStart'], alarm['status'], alarm['type']])
If Code:
import json
import csv

with open('C:\\folder\\dev\\TagAlarms.txt', "r") as file:
    data = json.load(file)

with open('C:\\folder\\dev\\TagAlarms.csv', "w", newline='') as file:
    csv_file = csv.writer(file)
    for dev in data["devs"]:
        for tag in dev["tags"]:
            for alarm in tag["alarmst"]:
                if tag["alarmst"] in tag:
                    csv_file.writerow([alarm['dateStatus'], alarm['dateStart'], alarm['status'], alarm['type']])
tag["alarmst"] is what throws the error. It means getting the value from tag associated with the key "alarmst" and there is no such key so it fails. if tag["alarmst"] in tag will throw the same error, and moreover you won't even reach that point if it's below for alarm in tag["alarmst"]:. What you want is:
if "alarmst" in tag:
for alarm in tag["alarmst"]:
But much nicer is:
for alarm in tag.get("alarmst", []):
get is similar to the usual square-bracket access, but the second argument is a default to return if the key is not found. So if "alarmst" is not in the dictionary, this is essentially:
for alarm in []:
which is just an empty loop that won't run at all.
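Putting it together, the write loop from the question with the get() fix applied would look like this (a sketch; the field names are taken from the question's code):

import json
import csv

with open('C:\\folder\\dev\\TagAlarms.txt', "r") as infile:
    data = json.load(infile)

with open('C:\\folder\\dev\\TagAlarms.csv', "w", newline='') as outfile:
    csv_file = csv.writer(outfile)
    for dev in data["devs"]:
        for tag in dev["tags"]:
            # get() falls back to an empty list when "alarmst" is absent,
            # so tags without alarms are skipped silently.
            for alarm in tag.get("alarmst", []):
                csv_file.writerow([alarm['dateStatus'], alarm['dateStart'],
                                   alarm['status'], alarm['type']])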

How do I load JSON into Couchbase Headless Server in Python?

I am trying to create a Python script that can take a JSON object and insert it into a headless Couchbase server. I have been able to connect to the server successfully and insert some data. I'd like to be able to specify the path of a JSON file and upsert its contents.
So far I have this:
from couchbase.bucket import Bucket
from couchbase.exceptions import CouchbaseError
import json

cb = Bucket('couchbase://XXX.XXX.XXX?password=XXXX')
print cb.server_nodes

# tempJson = json.loads(open("myData.json","r"))

try:
    result = cb.upsert('healthRec', {'record': 'bob'})
    # result = cb.upsert('healthRec', {'record': tempJson})
except CouchbaseError as e:
    print "Couldn't upsert", e
    raise

print(cb.get('healthRec').value)
I know that the first commented-out line that loads the JSON is incorrect, because json.loads() expects a string rather than a file object. Can anyone help?
Thanks!
Figured it out:
with open('myData.json', 'r') as f:
    data = json.load(f)

try:
    result = cb.upsert('healthRec', {'record': data})
except CouchbaseError as e:
    print "Couldn't upsert", e
    raise
I am looking into using cbdocloader, but this was my first step getting this to work. Thanks!
I know that you've found a solution that works for you in this instance but I thought I'd correct the issue that you experienced in your initial code snippet.
json.loads() takes a string as input and decodes the JSON string into a dictionary (or whatever custom object you use, based on the object_hook), which is why you were seeing the issue: you were passing it a file handle.
There is actually a method json.load() which works as expected, as you used in your eventual answer.
You would have been able to use it as follows (if you wanted something slightly less verbose than the with statement):
tempJson = json.load(open("myData.json","r"))
As Kirk mentioned though if you have a large number of json documents to insert then it might be worth taking a look at cbdocloader as it will handle all of this boilerplate code for you (with appropriate error handling and other functionality).
This readme covers the uses of cbdocloader and how to format your data correctly to allow it to load your documents into Couchbase Server.

Open URL stored in a csv file

I'm almost an absolute beginner in Python, but I have been asked to manage a difficult task. I have read many tutorials and found some very useful tips on this website, but I don't think this question has been asked before, at least not in the way I phrased it in the search engine.
I have managed to write some URLs to a CSV file. Now I would like to write a script able to open this file, open the URLs, and write their content into a dictionary. But I have failed: my script can print the addresses, but cannot process the file.
Interestingly, my script did not give the same error message each time. Here is the latest:
req.timeout = timeout
AttributeError: 'list' object has no attribute 'timeout'
So I think my script has several problems:
1. Is my method of opening the URLs the right one?
2. What is wrong in the way I build the dictionary?
Here is my attempt below. Thanks in advance to those who can help!
import csv
import urllib

dict = {}
test = csv.reader(open("read.csv","rb"))
for z in test:
    sock = urllib.urlopen(z)
    source = sock.read()
    dict[z] = source
    sock.close()
print dict
First, don't shadow built-ins. Rename your dictionary to something else, as dict is used to create new dictionaries.
Second, the csv reader yields a list for each line, containing all of that line's columns. Either reference the column explicitly with urllib.urlopen(z[0]) # first column in the line, or open the file with a normal open() and iterate through it.
Apart from that, it works for me.
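Putting both fixes together, a corrected sketch (Python 2, to match the question) might look like:

import csv
import urllib

pages = {}  # renamed so it no longer shadows the dict built-in
with open("read.csv", "rb") as f:
    for row in csv.reader(f):
        url = row[0]  # the csv reader yields a list; take the first column
        sock = urllib.urlopen(url)
        pages[url] = sock.read()
        sock.close()
print pages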
