Need help in Python for JSON data scraping - python

I am using the code below to scrape data from a website, but I get KeyError: 0.
Please tell me what is wrong with my code.
Original JSON response from the webpage:
https://www.demo.com/api/user_details/22
Response:
{"user_details":{"user_id":"22","username":"Test","user_email":"test#gmail.com"}}
I want to scrape the username, user_id and user_email.
What I have tried:
import json
import requests
import datetime
#data outputs to a CSV file in the current directory
csv_output = open("test.csv", "w")
end_page = 5;
#scan through pages 1 to end_page for data, 20 results per page
for page in range(1,end_page+1):
    r = requests.get('https://www.demo.com/api/user_details/' + str(page))
    data = r.json()
    for index in range(len(data["user_details"])):
        csv_output.write("\"%s\",%s\n" % (data["user_details"][index]["user_id"].encode('ascii', 'ignore'))),
        data["user_details"][index]["user_id"]
csv_output.close()

data["user_details"] is a dict, not a list. You get the error because you are trying to access its values with an integer index:
data["user_details"][index] ....
You can get the entries by accessing specific keys from the dict:
user_id = data["user_details"]['user_id']
username = data["user_details"]['username']
user_email = data["user_details"]['user_email']
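Combined with the CSV output from the question, that could look roughly like the sketch below. It parses the sample response from the question instead of calling the API, writes to an in-memory buffer rather than test.csv, and the helper name user_row is made up for illustration.

```python
import csv
import io
import json

# Sample response copied from the question; a real run would use r.json().
raw = '{"user_details":{"user_id":"22","username":"Test","user_email":"test#gmail.com"}}'
data = json.loads(raw)


def user_row(payload):
    """Return [user_id, username, user_email] from one API response."""
    details = payload["user_details"]  # a dict, so index it by key
    return [details["user_id"], details["username"], details["user_email"]]


# io.StringIO stands in for open("test.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(user_row(data))
print(buffer.getvalue())
```

Using csv.writer instead of hand-built format strings means quoting and escaping are handled by the library.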

Exactly what AKS answered, but I really recommend using a framework called Scrapy to create crawlers. Much easier. :)

{"user_details":{"user_id":"22","username":"Test","user_email":"test#gmail.com"}}
User details is a dictionary here. On the other hand, index is an integer coming from the range call. The first value would be 0. Your code tries to load data["user_details"][0]. But there is no key 0 in that dictionary.
To iterate over a dictionary, you can call the items method which would give you a tuple with (key, value) pair.
d = {"user_id":"22","username":"Test","user_email":"test#gmail.com"}
for k,v in d.items():
    print("Key: {}, Value: {}".format(k,v))

Related

how to get a value from a json text Python

import requests
import json
r = requests.get("https://api.investing.com/api/search/?t=Equities&q=amd") # i get json text from this api
data = json.loads(r.text)
if data['articles'][0]['exchange'] == 'Sydney': # the error is here KeyError: 'exchange'
    print('success')
else:
    print('fail')
I want to get the url '/equities/segue-resources-ltd' by checking whether 'exchange' is 'Sydney'. That value is stored in this part of the JSON text: {"id":948190,"url":"/equities/segue-resources-ltd","description":"Segue Resources Ltd","symbol":"AMD","exchange":"Sydney","flag":"AU","type":"Equities"}
If I'm understanding this correctly, the exchange identifier only appears in part of the JSON response. So, in order to get your result using the same data variable as in your question, we can do this:
result = [val["url"] for val in data["quotes"] if val["exchange"] == "Sydney"]
We are using a list comprehension here, where the loop is only going through data["quotes"] instead of the whole json response, and for each item in that json subset, we're returning the value for key == "url" where the exchange == "Sydney". Running the line above should get you:
['/equities/segue-resources-ltd']
As expected. If you aren't comfortable with list comprehensions, the more conventional loop-version of it looks like:
result = []
for val in data["quotes"]:
    if val["exchange"] == "Sydney":
        result.append(val["url"])
print(result)
KeyError: 'exchange' means that the dictionary data['articles'][0] did not have a key 'exchange'.
Depending on your use case, you may want to iterate over the whole list of articles:
for article in data['articles']:
    if 'exchange' in article and article['exchange'] == 'Sydney':
        ... # Your code here
If you only want to check the first article, then use data['articles'][0].get('exchange'). The dict.get() method will return None if the key is not present instead of throwing a KeyError.
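The two suggestions can be combined: loop over the whole list and use .get() so that records without an 'exchange' key are skipped instead of raising. This is only a sketch. The payload below is a hypothetical, trimmed-down stand-in for the API response (only the Sydney record comes from the question), and urls_for_exchange is an illustrative name.

```python
# Hypothetical, trimmed-down payload. Only the Sydney record comes from the
# question; the other entries are invented to show the missing-key case.
data = {
    "quotes": [
        {"id": 948190, "url": "/equities/segue-resources-ltd",
         "description": "Segue Resources Ltd", "symbol": "AMD",
         "exchange": "Sydney", "flag": "AU", "type": "Equities"},
        {"url": "/equities/example-without-exchange", "symbol": "AMD"},
    ],
    "articles": [{"title": "An article with no exchange key"}],
}


def urls_for_exchange(records, exchange):
    """Collect the url of every record whose 'exchange' matches."""
    # .get() returns None for records that lack the key, so no KeyError.
    return [rec["url"] for rec in records if rec.get("exchange") == exchange]


print(urls_for_exchange(data["quotes"], "Sydney"))
print(urls_for_exchange(data["articles"], "Sydney"))
```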

How can I append values from a JSON dictionary to a new list?

I have a .json file of all of my AWS target groups. This was created using aws elbv2 describe-target-groups. I want to extract every TargetGroupArn from this file and store it into a Python list.
With my current code, I get no output. I can confirm that the dictionary has data in it, but nothing is being appended to the list that I'm trying to create.
import json
from pprint import pprint
with open('target_groups.json') as f:
    data = json.load(f)
items = data['TargetGroups']
arn_list = []
for key, val in data.items():
    if key == 'TargetGroupArn':
        arn_list.append(val)
print(arn_list)
Expected results would be for arn_list to print out looking like this:
[arn:aws:elb:xxxxxxx:targetgroup1, arn:aws:elb:xxxxxxx:targetgroup2, arn:aws:elb:xxxxxxx:targetgroup3]
Change your code to this:
import json
from pprint import pprint
with open('target_groups.json') as f:
    data = json.load(f)
arn_list = []
if 'TargetGroups' in data:
    items = data['TargetGroups']
    for item in items:
        if 'TargetGroupArn' in item:
            arn_list.append(item['TargetGroupArn'])
    print(arn_list)
else:
    print('No data')
There are many ways to make this Python code more concise. However, I prefer a wordier style that is easier to read.
Also note that this code checks that each key exists, so it will not crash with a KeyError on missing data.
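For comparison, one of the more concise variants mentioned above might look like this. The sample data is invented to match the shape described in the question, with the placeholder ARNs from the expected output. A real run would load it with json.load as before.

```python
# Invented sample in the shape the question describes; the ARNs are the
# placeholders from the expected output, not real resources.
data = {
    "TargetGroups": [
        {"TargetGroupArn": "arn:aws:elb:xxxxxxx:targetgroup1"},
        {"TargetGroupArn": "arn:aws:elb:xxxxxxx:targetgroup2"},
        {"TargetGroupName": "entry-without-an-arn"},
    ]
}

# data.get(..., []) covers a missing 'TargetGroups' key; the 'in' test
# skips entries that have no 'TargetGroupArn'.
arn_list = [
    group["TargetGroupArn"]
    for group in data.get("TargetGroups", [])
    if "TargetGroupArn" in group
]
print(arn_list)
```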
It would be better if you could post the file you are trying to get data from, but this part:
for key, val in data.items():
    if key == 'TargetGroupArn':
        arn_list.append(val)
needs to loop over items instead. Assuming items is a list of dicts, one per target group:
for item in items:
    if 'TargetGroupArn' in item:
        arn_list.append(item['TargetGroupArn'])
You read data['TargetGroups'] into items, but you never actually used it.
Give it a shot.

Getting Keyerror when parsing JSON in Python

I have just made a program to parse some data from an API. The API returns data in JSON format. When I try to parse it, I get a KeyError:
url = json.loads(r.text)["url"]
KeyError: 'url'
This is the part of the code
url = json.loads(r.text)["url"]
I am trying to get the data in the plain field. Here is the output from the API:
{"updates":[{"id":"a6aa-8bd","description":"Bug fixes and enhancemets","version":"8.1.30","type":"firmware","url":"https://con-man.company.com/api/v1/file-732e844b","updated":"2017-07-25"}]}
You cannot access url directly because it is nested inside the updates list. You need to pass the list index first and then the key:
One-liner:
>>> json.loads(r.text)['updates'][0]['url']
'https://con-man.company.com/api/v1/file-732e844b'
Explicit:
>>> jobj = json.loads(r.text)
>>> jobj['updates'][0]['url']
'https://con-man.company.com/api/v1/file-732e844b'
try this,
url = json.loads(r.text)["updates"][0]["url"]
{
    "updates": [
        {
            "id":"a6aa-8bd",
            "description":"Bug fixes and enhancemets",
            "version":"8.1.30",
            "type":"firmware",
            "url":"https://con-man.company.com/api/v1/file-732e844b",
            "updated":"2017-07-25"
        }
    ]
}
Try to visualize your dict: it has only one key, "updates", whose value is a list, and inside that list is another dict.
So in your case:
_dict = json.loads(r.text) # read file and load dict
_list = _dict['updates'] # read list inside dict
_dict_1 = _list[0] # read list first value and load dict
url = _dict_1['url'] # read 'url' key from dict
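The same walk through the nesting can be wrapped in a small function that also copes with an empty or missing updates list. This is a sketch only: first_update_url is an illustrative name, and the text is the response copied from the question rather than a live r.text.

```python
import json

# Response text copied from the question.
text = (
    '{"updates":[{"id":"a6aa-8bd","description":"Bug fixes and enhancemets",'
    '"version":"8.1.30","type":"firmware",'
    '"url":"https://con-man.company.com/api/v1/file-732e844b",'
    '"updated":"2017-07-25"}]}'
)


def first_update_url(payload):
    """Return the url of the first update, or None if there is none."""
    updates = payload.get("updates", [])  # list of dicts (may be empty)
    if not updates:
        return None
    return updates[0].get("url")


print(first_update_url(json.loads(text)))
print(first_update_url({"updates": []}))
```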
I used this and works now for me.
json_object = json.loads(response.content.decode("utf-8"))['list'][0]['localPercentDynamicObjectsUsed']

KeyError: 0 when iterating json through python-nmap

I'm trying to parse out specific values from my dictionary. Having worked with dictionaries before, I was certain you could iterate through the results using integer indexes.
Below is an edited example of my nmap scan (using fake IPs). I'm trying to access the ipv4 values.
{'165.19.100.145': {'addresses': {'ipv4': '165.19.100.145'}}, '165.19.100.200': {'addresses': {'ipv4': '165.19.100.200'}}}
I'm trying to iterate through the dictionary like so:
#!/usr/bin/env python3
import nmap
import json
nm = nmap.PortScanner()
results = nm.scan(hosts='165.19.100.0/24', arguments='-sP')
results_json = json.dumps(results['scan'], indent=4, sort_keys=True, ensure_ascii=False)
json_data = json.loads(results_json)
scan_len = len(json_data)
for x in range(0, scan_len):
    ip_address = json_data[x]['addresses']['ipv4']
    print(ip_address)
When I run this script, I get a KeyError: 0. I have no idea why I might be getting this error. Wouldn't the 0 refer to the first 165.19.100.145? What am I doing wrong here?
Because you are iterating over a dictionary, not a list. A dictionary is indexed by its keys, not by position, and 0 is not one of the keys here.
Do this instead (the question uses Python 3, so items() and print() apply):
for key, value in json_data.items():
    print(json_data[key]['addresses']['ipv4'])
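Since only the nested values are needed here, iterating over .values() is a slightly shorter option. A minimal sketch, using the sample scan result from the question in place of a real nmap run:

```python
# Sample scan result copied from the question (fake IPs).
json_data = {
    "165.19.100.145": {"addresses": {"ipv4": "165.19.100.145"}},
    "165.19.100.200": {"addresses": {"ipv4": "165.19.100.200"}},
}

# The keys are host strings, not 0, 1, 2, ..., so iterate the dict itself.
ip_addresses = [host["addresses"]["ipv4"] for host in json_data.values()]
print(ip_addresses)
```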

Steam API: Type Error and turning JSON into CSV?

I'm trying to figure out why I keep getting a TypeError when pulling data from the Steam API. I want to build a dictionary and then turn it into a CSV file. The response is JSON, so my question is twofold: how do I create CSV data, and how do I get the JSON info I have into it? The idea of this method is to get a list of AppIDs so I can find their prices:
Code:
def steamlibrarypull(steamID, key):
    #Pulls out a CSV of Steam libraries
    steaminfo = {
        'key': key,
        'steamid': steamID,
        'format':'JSON',
        'include_appinfo':'1'
    }
    r = requests.get('http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/', params=steaminfo)
    d = json.loads(r.content)
    I = d['response']['games']
    B = {}
    for games in I:
        B[I['name']] = I['appid']
    return B
Traceback (most recent call last):
  File "steam.py", line 46, in <module>
    print steamlibrarypull(76561197960434622, key)
  File "steam.py", line 44, in steamlibrarypull
    B[I['name']] = I['appid']
TypeError: list indices must be integers, not str
You are not referencing the iterator properly.
def steamlibrarypull(steamID, key):
    #Pulls out a CSV of Steam libraries
    steaminfo = {
        'key': key,
        'steamid': steamID,
        'format':'JSON',
        'include_appinfo':'1'
    }
    r = requests.get('http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/', params=steaminfo)
    d = json.loads(r.content)
    I = d['response']['games']
    B = {}
    for games in I:
        B[games['name']] = games['appid']
    return B
That will return a dictionary of name:appid. You will then need to iterate through it and write it to a file.
with open('games.csv', 'w') as f:
    for key, value in B.items():
        f.write("%s,%s\r\n" % (key, value))
Your for loop is not doing what you want it to do. Use this:
for game in I:
    B[game['name']] = game['appid']
return B
I in this case (I'm assuming, since I don't have a Steam account) is a list containing a number of dicts, each with a 'name' field and an 'appid' field, among others. Your for loop is iterating over each of these dicts, and you want to store just those two fields in a new dict named B. However, in your code, I['name'] doesn't work, as I is a list, and can only be indexed by integers, hence the error. However, when iterating over this list of dicts, game['name'] will work just fine, because dicts are indexed by their keys.
As far as turning this data into a CSV, there are a number of questions on SO on this topic, so just use Google to search for them.
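As one possible starting point for the CSV part, the standard-library csv module can write the list of dicts directly. This is a sketch under assumptions: the game entries below are invented stand-ins for d['response']['games'], and an in-memory buffer replaces the output file.

```python
import csv
import io

# Invented entries in the shape the answers assume for d['response']['games'].
games = [
    {"appid": 10, "name": "Example Game"},
    {"appid": 20, "name": "Another, With Comma"},
]

# io.StringIO stands in for open('games.csv', 'w', newline='').
out = io.StringIO()
# extrasaction="ignore" drops any fields other than name and appid.
writer = csv.DictWriter(out, fieldnames=["name", "appid"], extrasaction="ignore")
writer.writeheader()
writer.writerows(games)
print(out.getvalue())
```

Unlike joining fields by hand with "%s,%s", the csv module quotes names that contain commas.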
