I have a list of countries and their capitals on a website. I scrape all the country names and their capitals from this list and want to write them to a JSON file like this:
[
{
"country": "Afghanistan",
"city": "Kabul"
},
{
"country": "Aland Islands",
"city": "Mariehamn"
}
]
Here's my code:
cells = soup.table('td')
count = 0
cities_list.write('[\n')
for cell in cells:
    if count == len(cells)-2:
        break
    else:
        cities_list.write(json.dumps({"country": "{}".format(cells[count].getText()),
                                      "city": "{}".format(cells[count].next_sibling.getText())},
                                     indent=2))
        count += 2
cities_list.write('\n]')
My problem is that the objects are not separated by commas:
[
{
"country": "Afghanistan",
"city": "Kabul"
}{
"country": "Aland Islands",
"city": "Mariehamn"
}
]
How can I separate my objects with commas? And is it possible to do this without writing '[\n' at the beginning and '\n]' at the end by hand?
Python has no way to know that you are writing a list of objects when you write them one at a time ... so just don't. Collect them in a list first and dump the whole list at once:
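import json

cells = soup.table('td')
cities = []
# Cells alternate country, capital: take every other cell, stopping before the last pair like the original loop
for cell in cells[:-2:2]:
    cities.append({"country": cell.getText(),
                   "city": cell.next_sibling.getText()})
json.dump(cities, cities_list, indent=2)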
Notice also how "{}".format(value) is just a clumsy way to write str(value) (or just value if value is already a string) and how json.dump lets you pass an open file handle to write to.
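A minimal, self-contained sketch of the build-then-dump approach. It assumes a hypothetical hard-coded list of (country, capital) pairs in place of the scraped table cells, and an io.StringIO in place of the open file cities_list:

```python
import io
import json

# Hypothetical sample data standing in for the scraped table cells
pairs = [("Afghanistan", "Kabul"), ("Aland Islands", "Mariehamn")]

# Build the complete list in memory first...
cities = [{"country": country, "city": city} for country, city in pairs]

# ...then serialize it once; json.dump writes the brackets and commas itself
out = io.StringIO()  # stands in for the open file handle cities_list
json.dump(cities, out, indent=2)
text = out.getvalue()
print(text)
```

Because the list is serialized in a single call, there is no need to write '[' or ']' by hand, and indent=2 reproduces the layout shown in the question.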
Related
I have some JSON data similar to this:
{
"people": [
{
"name": "billy",
"age": "12"
...
...
},
{
"name": "karl",
"age": "31"
...
...
},
...
...
]
}
At the moment I do this to get an entry from the people list:
wantedPerson = "karl"
for person in people:
    if person['name'] == wantedPerson:
        # I have the person's entry here
        break
Is there a better way of doing this? Something similar to how we can use .get('key')?
Thanks,
Chris
Assuming you load that JSON data with the standard library's json module, you're already fairly close to optimal. Perhaps you were looking for something like this:
from json import loads
text = '{"people": [{"name": "billy", "age": "12"}, {"name": "karl", "age": "31"}]}'
data = loads(text)
people = [p for p in data['people'] if p['name'] == 'karl']
If you frequently need to access this data, you might just do something like this:
all_people = {p['name']: p for p in data['people']}
print(all_people['karl'])
That is, all_people becomes a dictionary keyed by name, so you can look up any person quickly by name. This assumes, however, that there are no duplicate names in your data; if there are, the last person with a given name overwrites the earlier ones.
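A short sketch of both lookups, using the question's sample plus a hypothetical second "karl" to show what happens with duplicates. dict.get gives the .get('key') behaviour the question asks about, and collections.defaultdict(list) is one way (swapped in here, not from the answer) to keep every person sharing a name:

```python
import json
from collections import defaultdict

# Sample data from the question, plus a hypothetical duplicate "karl"
text = ('{"people": [{"name": "billy", "age": "12"}, '
        '{"name": "karl", "age": "31"}, {"name": "karl", "age": "45"}]}')
data = json.loads(text)

# Unique-name index: a later duplicate overwrites the earlier entry
all_people = {p['name']: p for p in data['people']}
karl = all_people.get('karl')        # dict.get returns None for a missing key
missing = all_people.get('nobody')

# Duplicate-safe index: every person with a given name is kept, in order
people_by_name = defaultdict(list)
for p in data['people']:
    people_by_name[p['name']].append(p)

print(karl, missing, people_by_name['karl'])
```

Building either index costs one pass over the list; after that each lookup is a dictionary access instead of a scan.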
First, there's nothing wrong with your current 'naive' approach: it's clear and efficient, since you can't find the value you're looking for without scanning the list anyway.
It seems that by "better" you mean "shorter", so if you want a one-liner, consider the following:
next((person for person in people if person['name'] == wantedPerson), None)
It returns the first person in the list with the required name, or None if no such person is found.
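Wrapped in a small helper (find_person is a hypothetical name, not from the answer), the one-liner can be checked against the question's data:

```python
def find_person(people, wanted, default=None):
    """Return the first dict whose 'name' equals wanted, else default."""
    # The generator stops at the first match, so the list is scanned at most once
    return next((person for person in people if person['name'] == wanted), default)

people = [{"name": "billy", "age": "12"}, {"name": "karl", "age": "31"}]
print(find_person(people, "karl"))     # {'name': 'karl', 'age': '31'}
print(find_person(people, "nobody"))   # None
```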
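Similarly, you can filter on each dictionary's values (note that this matches 'karl' in any field, not just name):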
ps = {
"people": [
{
"name": "billy",
"age": "12"
},
{
"name": "karl",
"age": "31"
},
]
}
print([x for x in ps['people'] if 'karl' in x.values()])
For possible alternatives or details, see e.g. the question "Get key by value in dictionary".
I am trying to use Python to extract pricePerUnit from JSON. There are many entries, and these are just two of them:
{
"terms": {
"OnDemand": {
"7Y9ZZ3FXWPC86CZY": {
"7Y9ZZ3FXWPC86CZY.JRTCKXETXF": {
"offerTermCode": "JRTCKXETXF",
"sku": "7Y9ZZ3FXWPC86CZY",
"effectiveDate": "2020-11-01T00:00:00Z",
"priceDimensions": {
"7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7": {
"rateCode": "7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7",
"description": "Processed translation request in AWS GovCloud (US)",
"beginRange": "0",
"endRange": "Inf",
"unit": "Character",
"pricePerUnit": {
"USD": "0.0000150000"
},
"appliesTo": []
}
},
"termAttributes": {}
}
},
"CQNY8UFVUNQQYYV4": {
"CQNY8UFVUNQQYYV4.JRTCKXETXF": {
"offerTermCode": "JRTCKXETXF",
"sku": "CQNY8UFVUNQQYYV4",
"effectiveDate": "2020-11-01T00:00:00Z",
"priceDimensions": {
"CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7": {
"rateCode": "CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7",
"description": "$0.000015 per Character for TextTranslationJob:TextTranslationJob in EU (London)",
"beginRange": "0",
"endRange": "Inf",
"unit": "Character",
"pricePerUnit": {
"USD": "0.0000150000"
},
"appliesTo": []
}
},
"termAttributes": {}
}
}
}
}
}
The issue I run into is that the keys (in this sample, 7Y9ZZ3FXWPC86CZY, CQNY8UFVUNQQYYV4.JRTCKXETXF, and CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7) are changing strings that I cannot just type out while parsing the dictionary.
I have Python code that works for the first level of these random keys:
import json

with open('index.json') as json_file:
    data = json.load(json_file)
json_keys = list(data['terms']['OnDemand'].keys())
# Get the region
for i in json_keys:
    print(data['terms']['OnDemand'][i])
However, this is tedious, as I would need to run the same code three times to get the other keys like 7Y9ZZ3FXWPC86CZY.JRTCKXETXF and 7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7, since the string changes with each JSON entry.
Is there a way to tell Python to automatically enter the next level of the JSON object, without having to collect all the keys, save them, and then iterate through them? With jq in bash I can do this quite easily with jq -r '.terms[][][]'.
If you are really sure that there is exactly one key-value pair on each level, you can try the following:
def descend(x, depth):
    # Step into the first (only) value of each nested dict, depth times
    for _ in range(depth):
        x = next(iter(x.values()))
    return x
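A usage sketch of descend on a trimmed copy of one SKU from the question (only the fields needed here are kept), skipping the random offer-term and rate-code keys to reach the price:

```python
def descend(x, depth):
    # Step into the first (assumed only) value of each nested dict, depth times
    for _ in range(depth):
        x = next(iter(x.values()))
    return x

# Trimmed copy of one SKU from the question's JSON
sku = {
    "7Y9ZZ3FXWPC86CZY.JRTCKXETXF": {
        "offerTermCode": "JRTCKXETXF",
        "priceDimensions": {
            "7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7": {
                "unit": "Character",
                "pricePerUnit": {"USD": "0.0000150000"},
            }
        },
    }
}

term = descend(sku, 1)                           # skip the random offer-term key
dimension = descend(term["priceDimensions"], 1)  # skip the random rate-code key
price = dimension["pricePerUnit"]["USD"]
print(price)  # 0.0000150000
```

If a level ever holds more than one key, descend silently picks the first one, which is why the answer stresses being sure there is exactly one.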
You can use dict.values() to iterate over the values of a dict. You can also use next(iter(dict.values())) to get the first (or only) value of a dict.
for demand in data['terms']['OnDemand'].values():
    next_level = next(iter(demand.values()))
    print(next_level)
If you expect a number of children other than one at the second level, you can just nest the for loops:
for demand in data['terms']['OnDemand'].values():
    for sub_demand in demand.values():
        print(sub_demand)
If you are interested in the keys too, you can use the dict.items() method to iterate over a dict's keys and values at the same time:
for demand_key, demand in data['terms']['OnDemand'].items():
    for sub_demand_key, sub_demand in demand.items():
        print(demand_key, sub_demand_key, sub_demand)
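Carrying the nested-loop idea all the way down gives every pricePerUnit without naming any random key. This is a sketch assuming the structure shown in the question; iter_prices is a hypothetical name, and the sample is trimmed to the fields it reads. The first three loops correspond to jq's .terms[][][]:

```python
def iter_prices(data):
    """Yield (sku, rate_code, usd_price) for every price dimension."""
    for term_type in data["terms"].values():      # .terms[]  e.g. the "OnDemand" dict
        for sku, offers in term_type.items():     # [] random SKU keys
            for offer in offers.values():         # [] random offer-term keys
                for dim in offer["priceDimensions"].values():  # random rate-code keys
                    yield sku, dim["rateCode"], dim["pricePerUnit"]["USD"]

# Trimmed version of the question's two entries
data = {
    "terms": {
        "OnDemand": {
            "7Y9ZZ3FXWPC86CZY": {
                "7Y9ZZ3FXWPC86CZY.JRTCKXETXF": {
                    "priceDimensions": {
                        "7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7": {
                            "rateCode": "7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7",
                            "pricePerUnit": {"USD": "0.0000150000"},
                        }
                    }
                }
            },
            "CQNY8UFVUNQQYYV4": {
                "CQNY8UFVUNQQYYV4.JRTCKXETXF": {
                    "priceDimensions": {
                        "CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7": {
                            "rateCode": "CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7",
                            "pricePerUnit": {"USD": "0.0000150000"},
                        }
                    }
                }
            },
        }
    }
}

prices = list(iter_prices(data))
for row in prices:
    print(row)
```

Unlike descend, this visits every key at every level, so it also works when an SKU has several offer terms or price dimensions.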
I'm trying to loop through a JSON file using Python and return the name of the object and associated modules for it.
Right now I can basically get the output I want by hardcoding the indexes. However, this obviously isn't the right way to do it, since the JSON file can vary in length.
Whenever I try to use a loop, I get errors like:
TypeError: string indices must be integers
My JSON file looks like this:
{
"name": "gaming_companies",
"columns": [{
"name": "publisher",
"type": "string",
"cleansing": ["clean_string"]
},
{
"name": "genre",
"type": "string",
"cleansing": ["match_genre", "clean_string"]
},
{
"name": "sales",
"type": "int",
"cleansing": []
}
]
}
My Python code which is 'working' looks like:
import json as js

def cleansing(games_json):
    print(games_json['columns'][0]['name'] + " - cleansing:")
    [print(i) for i in games_json['columns'][0]['cleansing']]
    print(games_json['columns'][1]['name'] + " - cleansing:")
    [print(i) for i in games_json['columns'][1]['cleansing']]
    print(games_json['columns'][2]['name'] + " - cleansing:")
    [print(i) for i in games_json['columns'][2]['cleansing']]

with open(r'C:\Desktop\gamefolder\jsonfiles\games.json') as input_json:
    games_json = js.load(input_json)
    cleansing(games_json)
The output I'm trying to return is:
publisher
cleansing:
clean_string
genre
cleansing:
match_genre
clean_string
sales
cleansing:
My attempt to loop through them like this:
for x in games_json:
    for y in games_json['columns'][x]:
        print(y)
Results in:
TypeError: list indices must be integers or slices, not str
games_json shows as a Dict.
Columns shows as a list of dictionaries.
Each object's cleansing attribute shows as a list.
I think this is where my problem is, but I'm not able to get over the hurdle.
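The problem with your attempt is that you are using a dictionary key as a list index.
Iterating over games_json (a dict) yields its keys, the strings 'name' and 'columns', so in for y in games_json['columns'][x]: you are indexing the list games_json['columns'] with a string, which raises the TypeError.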
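A short sketch reproducing the error with a trimmed copy of the question's data:

```python
games_json = {
    "name": "gaming_companies",
    "columns": [{"name": "publisher", "type": "string", "cleansing": ["clean_string"]}],
}

# Iterating over a dict yields its keys, which are plain strings
keys = [x for x in games_json]
print(keys)  # ['name', 'columns']

# Using one of those strings as an index into the "columns" list fails
try:
    games_json['columns']['name']
    error = None
except TypeError as exc:
    error = str(exc)
print(error)  # list indices must be integers or slices, not str
```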
You can learn more about how Python iterates over dicts and lists in the Python documentation.
As for your case, you can iterate over the columns list directly.
This code should work:
for item in games_json["columns"]:
    print(item["name"])
    print("cleansing:")
    print(item["cleansing"])
Output:
publisher
cleansing:
['clean_string']
genre
cleansing:
['match_genre', 'clean_string']
sales
cleansing:
[]
Since you want to iterate over the array's elements, this is one working solution:
import json

for x in games_json['columns']:
    print(x)
    print(x['name'])
x = """{
"name": "gaming_companies",
"columns": [{
"name": "publisher",
"type": "string",
"cleansing": ["clean_string"]
},
{
"name": "genre",
"type": "string",
"cleansing": ["match_genre", "clean_string"]
},
{
"name": "sales",
"type": "int",
"cleansing": []
}
]
}"""
x = json.loads(x)
for i in x['columns']:
    print(i['name'])
    print("cleansing:")
    for j in i["cleansing"]:
        print(j)
    print('\n')
Output
publisher
cleansing:
clean_string
genre
cleansing:
match_genre
clean_string
sales
cleansing:
with open(r'C:\Desktop\gamefolder\jsonfiles\games.json') as input_json:
    games_json = js.load(input_json)

for i in games_json['columns']:
    print(i['name'])
    print("cleansing:")
    for j in i["cleansing"]:
        print(j)
    print('\n')
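If you would rather get the lines back than print them directly (for example, to test them or write them to a file), a small variation collects them in a list. describe_cleansing is a hypothetical name, and the question's sample JSON is inlined as a string:

```python
import json

def describe_cleansing(games_json):
    """Return output lines: each column name, then its cleansing functions."""
    lines = []
    for column in games_json['columns']:      # works for any number of columns
        lines.append(column['name'])
        lines.append("cleansing:")
        lines.extend(column['cleansing'])     # an empty list adds nothing
    return lines

text = '''{"name": "gaming_companies", "columns": [
    {"name": "publisher", "type": "string", "cleansing": ["clean_string"]},
    {"name": "genre", "type": "string", "cleansing": ["match_genre", "clean_string"]},
    {"name": "sales", "type": "int", "cleansing": []}]}'''

lines = describe_cleansing(json.loads(text))
print("\n".join(lines))
```

Joined with newlines, this reproduces the output the question asks for.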
I am a beginner to Python and scripting, so I am unfamiliar with how JSON works, but this is the problem I have. I wrote a script that took the values of the "location" field from the JSON file I was reading and used the Google Maps API to find the country each location was in. However, some of these locations repeat, and I did not want to check the same location over and over, so I stored all the location values in a list and then converted the list into a set to get rid of duplicates.
My question is this: once I have retrieved the country data (I have it stored in a list), how can I add it to my original JSON file?
For instance, here are a few test lines from the JSON file I am reading:
{"login": "hi", "name": "hello", "location": "Seoul, South Korea"}
{"login": "hi", "name": "hello", "location": null}
{"login": "hi", "name": "hello", "location": "Berlin, Germany"}
{"login": "hi", "name": "hello", "location": "Pittsburgh, PA"}
{"login": "hi", "name": "hello", "location": "London"}
{"login": "hi", "name": "hello", "location": "Tokyo, Japan"}
import codecs
import json

locationList = []
countryList = []

# inputFile and gm (the Google Maps client) are defined elsewhere in the script
with codecs.open(inputFile, 'r', 'utf8') as input_file:
    for line in input_file:
        temp = json.loads(line)
        if temp['location'] is not None:  # some locations are null
            locationList.append(temp['location'])
locationList = list(set(locationList))
print(locationList)
# getting location data, storing it in the countryList variable
for uniqueLocation in locationList:
    geocodeResult = gm.geocode(uniqueLocation)[0]  # getting information about each location
    geoObject = geocodeResult['address_components']  # getting just the address components
    for item in geoObject:  # iterating over the components
        if item['types'][0] == 'country':  # if the first type of this item is country
            countryName = item['long_name']  # then retrieve long_name from this item
            countryList.append(countryName)
print(countryList)
Check out this question:
How to append data to a json file?
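One thing worth noting: once the locations go through a set, a plain list of countries no longer records which country belongs to which line. A minimal sketch that keeps that link in a dict and writes enriched records out as one JSON object per line, like the input. lookup_country is a hypothetical stub standing in for the Google Maps call, the duplicate Seoul line is added for illustration, and io.StringIO stands in for a new output file:

```python
import io
import json

calls = []

def lookup_country(location):
    # Hypothetical stand-in for the Google Maps geocoding call in the question
    calls.append(location)
    sample = {"Seoul, South Korea": "South Korea", "Berlin, Germany": "Germany"}
    return sample.get(location)

lines = [
    '{"login": "hi", "name": "hello", "location": "Seoul, South Korea"}',
    '{"login": "hi", "name": "hello", "location": null}',
    '{"login": "hi", "name": "hello", "location": "Berlin, Germany"}',
    '{"login": "hi", "name": "hello", "location": "Seoul, South Korea"}',
]
records = [json.loads(line) for line in lines]

# Look up each distinct, non-null location once and keep the location -> country link
unique_locations = {r["location"] for r in records if r["location"] is not None}
country_by_location = {loc: lookup_country(loc) for loc in unique_locations}

# Add a "country" field to every record; null locations get None (JSON null)
for r in records:
    r["country"] = country_by_location.get(r["location"])

# Write one JSON object per line, like the input file
out = io.StringIO()  # stands in for a new output file opened for writing
for r in records:
    out.write(json.dumps(r) + "\n")
print(out.getvalue())
```

Writing to a new file and then replacing the original is usually safer than editing the input file in place.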
I am using REST with a Python script to extract Name and StartTime from a response.
I can get the information, but I can't combine the data so that each driver's values are on the same line in the CSV. When I export them to CSV, each value goes on its own line.
There is probably a much better way to extract data from a JSON List.
for item in driverDetails['Query']['Results']:
    for data_item in item['XValues']:
        body.append(data_item)
        for key, value in data_item.items():
            #driver = {}
            #test = {}
            #startTime = {}
            if key == "Name":
                drivers.append(value)
            if key == "StartTime":
                drivers.append(value)
print(drivers)
Code to write to CSV:
with open(logFileName, 'a') as outcsv:
    # configure writer to write standard csv file
    writer = csv.writer(outcsv, delimiter=',', quotechar="'",
                        quoting=csv.QUOTE_MINIMAL, lineterminator='\n', skipinitialspace=True)
    for driver in drivers:
        writer.writerow(driver)
Here is a sample of the response:
"Query": {
"Results": [
{
"XValues": [
{
"ReportScopeStartTime": "2018-06-18T23:00:00Z"
},
{
"ReportScopeEndTime": "2018-06-25T22:59:59Z"
},
{
"ID": "1400"
},
{
"Name": " John Doe"
},
{
"StartTime": "2018-06-19T07:16:10Z"
},
],
},
"XValues": [
{
"ReportScopeStartTime": "2018-06-18T23:00:00Z"
},
{
"ReportScopeEndTime": "2018-06-25T22:59:59Z"
},
{
"ID": "1401"
},
{
"Name": " Jane Smith"
},
{
"StartTime": "2018-06-19T07:16:10Z"
},
],
},
My output in the CSV:
John Doe
2018-06-19T07:16:10Z
Jane Smith
2018-06-19T07:16:10Z
Desired Outcome:
John Doe, 2018-06-19T07:16:10Z
Jane Smith, 2018-06-19T07:16:10Z
Just use normal dictionary access to get the values:
for item in driverDetails['Query']['Results']:
    for data_item in item['XValues']:
        body.append(data_item)
        if "Name" in data_item:
            drivers.append(data_item["Name"])
        if "StartTime" in data_item:
            drivers.append(data_item["StartTime"])
print(drivers)
If you know the items will already have the required fields then you won't even need the in tests.
writer.writerow() expects a sequence of fields. You are calling it with a single string, and a string is itself a sequence, so it will be split into individual characters, one per field. You probably want to keep the name and start time together, so extract them as a tuple:
for item in driverDetails['Query']['Results']:
    name, start_time = "", ""
    for data_item in item['XValues']:
        body.append(data_item)
        if "Name" in data_item:
            name = data_item["Name"]
        if "StartTime" in data_item:
            start_time = data_item["StartTime"]
    drivers.append((name, start_time))
print(drivers)
Now, instead of a list of strings, drivers is a list of (name, start_time) tuples, one per result item; if an item has no name or no start time, that field is an empty string. Your code to write the CSV file should now produce one row per driver.
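A self-contained sketch of the difference, writing to an io.StringIO instead of the log file. to_csv is a hypothetical helper, the writer uses the csv defaults apart from lineterminator, and strip() is added to drop the leading space present in the sample names:

```python
import csv
import io

def to_csv(rows):
    """Write rows with csv.writer and return the resulting text."""
    out = io.StringIO()  # stands in for the log file opened in the question
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return out.getvalue()

# A bare string is itself a sequence, so each character becomes a field
wrong = to_csv(["John Doe"])

# (name, start_time) tuples give one row per driver
drivers = [(" John Doe", "2018-06-19T07:16:10Z"), (" Jane Smith", "2018-06-19T07:16:10Z")]
right = to_csv([(name.strip(), start_time) for name, start_time in drivers])
print(wrong)
print(right)
```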
If you want to get all or most of the values, try gathering them into a single dictionary; then you can pull out the fields you want:
for item in driverDetails['Query']['Results']:
    fields = {}
    for data_item in item['XValues']:
        body.append(data_item)
        fields.update(data_item)
    drivers.append((fields["ID"], fields["Name"], fields["StartTime"]))
print(drivers)
Once you have the fields in a single dictionary you could even build the tuple with a loop:
drivers.append(tuple(fields[f] for f in ("ID", "Name", "StartTime", "ReportScopeStartTime", "ReportScopeEndTime")))
I think you should list the fields you want explicitly just to ensure that new fields don't surprise you.
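Putting the explicit field list and the dictionary merge together, here is a sketch that uses fields.get(...) so a missing field becomes an empty string instead of raising KeyError. extract_rows and WANTED are hypothetical names, and the sample is trimmed from the question's response, with the second result's StartTime deliberately removed for illustration:

```python
WANTED = ("ID", "Name", "StartTime")  # list the columns explicitly

def extract_rows(driver_details, wanted=WANTED):
    """Merge each result's one-key XValues dicts, then pick the wanted fields."""
    rows = []
    for item in driver_details["Query"]["Results"]:
        fields = {}
        for data_item in item["XValues"]:
            fields.update(data_item)
        # .get(f, "") leaves a blank cell instead of raising KeyError
        rows.append(tuple(fields.get(f, "").strip() for f in wanted))
    return rows

driver_details = {
    "Query": {
        "Results": [
            {"XValues": [{"ID": "1400"}, {"Name": " John Doe"},
                         {"StartTime": "2018-06-19T07:16:10Z"}]},
            {"XValues": [{"ID": "1401"}, {"Name": " Jane Smith"}]},  # hypothetical: no StartTime
        ]
    }
}

rows = extract_rows(driver_details)
print(rows)
```

The resulting tuples can be passed straight to writer.writerow().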