I'm pulling data from an API and want to save only specific dicts and lists from the JSON response. The problem is that when I dump the data inside the loop, it creates very weird-looking data in the file, which isn't actually valid JSON.
r = requests.get(url, headers=header)
result = r.json()

with open('myfile.json', 'a+') as file:
    for log in result['logs']:
        hello = json.dump(log['log']['driver']['username'], file)
        hello = json.dump(log['log']['driver']['first_name'], file)
        hello = json.dump(log['log']['driver']['last_name'], file)
        for event in log['log']['events']:
            hello = json.dump(event['event']['id'], file)
            hello = json.dump(event['event']['start_time'], file)
            hello = json.dump(event['event']['type'], file)
            hello = json.dump(event['event']['location'], file)
The end goal here is to convert this data into a CSV. The only reason I'm saving it to a JSON file is so that I can load it later and convert it into a CSV. The API endpoint I'm targeting is Logs:
https://developer.keeptruckin.com/reference#get-logs
I think @GBrandt has the right idea as far as creating valid JSON output goes, but as I said in a comment, I don't think that JSON-to-JSON conversion step is really necessary, since you could just create the CSV file from the JSON you already have:
(Modified to also split start_time into two separate fields, as per your follow-on question.)
import csv

result = r.json()

with open('myfile.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    for log in result['logs']:
        username = log['log']['driver']['username']
        first_name = log['log']['driver']['first_name']
        last_name = log['log']['driver']['last_name']
        for event in log['log']['events']:
            id = event['event']['id']
            start_time = event['event']['start_time']
            date, time = start_time.split('T')  # Split start_time into two fields.
            _type = event['event']['type']  # Avoid using the name of a built-in.
            location = event['event']['location']
            if not location:
                location = "N/A"
            writer.writerow(
                (username, first_name, last_name, id, date, time, _type, location))
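If you also want a header row in the CSV, one writer.writerow call before the loops is enough. A minimal sketch (the column names are just illustrative):
import csv

with open('myfile.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    # Header row, written once before the per-log loops shown above.
    writer.writerow(
        ('username', 'first_name', 'last_name', 'id', 'date', 'time', 'type', 'location'))
    # ... the for log / for event loops from the snippet above go here, unchanged ...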
It looks like you're just dumping individual JSON strings into the file in an unstructured way.
json.dump will not magically create a JSON dict-like object and save it into the file. See:
json.dump(log['log']['driver']['username'], file)
What it actually does is stringify the driver's username and dump it straight into the file, so the file ends up holding just a string, not a JSON object (which I'm guessing is what you want). It is JSON, just not a really useful form of it.
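You can see the effect with an in-memory buffer (io.StringIO stands in for the real file here; the values come from the sample data):
import io
import json

buf = io.StringIO()
json.dump("demo_driver", buf)
json.dump("Demo", buf)
print(buf.getvalue())  # "demo_driver""Demo" -- two JSON strings run together, not one document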
What you're looking for is this:
r = requests.get(url, headers=header)
result = r.json()

with open('myfile.json', 'w+') as file:
    logs = []
    for log in result['logs']:
        logs.append({
            'username': log['log']['driver']['username'],
            'first_name': log['log']['driver']['first_name'],
            'last_name': log['log']['driver']['last_name'],
            # ...
            'events': [
                {
                    'id': event['event']['id'],
                    'start_time': event['event']['start_time'],
                    # ...
                } for event in log['log']['events']
            ]
        })
    json.dump(logs, file)
Also, I would recommend not using append mode on JSON files; a .json file is expected to hold a single JSON value (as far as I know).
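That's because json.load expects the file to contain exactly one top-level value; anything appended after the first dump makes the read fail:
import json

with open('myfile.json') as file:
    logs = json.load(file)  # fine for exactly one top-level value;
                            # raises JSONDecodeError ("Extra data") if more were appended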
How about the code below? (A sample JSON is loaded from a file instead of via an HTTP call, in order to have data to work with.)
Sample JSON taken from https://developer.keeptruckin.com/reference#get-logs
import json

with open('input.json', 'r') as f_in:
    data = json.load(f_in)

data_to_collect = []
logs = data['logs']

with open('output.json', 'w') as f_out:
    for log in logs:
        _log = log['log']
        data_to_collect.append({key: _log['driver'].get(key)
                                for key in ['username', 'first_name', 'last_name']})
        data_to_collect[-1]['events'] = []
        for event in _log['events']:
            data_to_collect[-1]['events'].append(
                {key: event['event'].get(key) for key in ['id', 'start_time', 'type', 'location']})
    json.dump(data_to_collect, f_out)
Output file:
[
    {
        "username": "demo_driver",
        "first_name": "Demo",
        "last_name": "Driver",
        "events": [
            {
                "start_time": "2016-10-16T07:00:00Z",
                "type": "driving",
                "id": 221,
                "location": "Mobile, AL"
            },
            {
                "start_time": "2016-10-16T09:00:00Z",
                "type": "sleeper",
                "id": 474,
                "location": null
            },
            {
                "start_time": "2016-10-16T11:00:00Z",
                "type": "driving",
                "id": 475,
                "location": null
            }
        ]
    }
]
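Since the stated end goal is a CSV, a minimal sketch that flattens output.json into rows (one row per event, driver fields repeated; null locations become "N/A" as in the earlier answer) might look like:
import csv
import json

with open('output.json') as f:
    collected = json.load(f)

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['username', 'first_name', 'last_name',
                     'id', 'start_time', 'type', 'location'])
    for driver in collected:
        for event in driver['events']:
            writer.writerow([driver['username'], driver['first_name'],
                             driver['last_name'], event['id'],
                             event['start_time'], event['type'],
                             event['location'] or 'N/A'])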
I have a JSON file with 10000 data entries like the ones below.
{
    "1": {
        "name": "0",
        "description": "",
        "image": ""
    },
    "2": {
        "name": "1",
        "description": "",
        "image": ""
    },
    ...
}
I need to write each entry in this object into its own file.
For example, the output of each file looks like this:
1.json
{
    "name": "",
    "description": "",
    "image": ""
}
I have the following code, but I'm not sure how to proceed from here. Can anyone help with this?
import json

with open('sample.json', 'r') as openfile:
    # Reading from the JSON file
    json_object = json.load(openfile)
You can use a for loop to iterate over all the fields in the outer object, and then create a new file for each inner object:
import json

with open('sample.json', 'r') as input_file:
    json_object = json.load(input_file)

for key, value in json_object.items():
    with open(f'{key}.json', 'w') as output_file:
        json.dump(value, output_file)
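If you'd rather not litter the working directory with 10000 files, a small variation writes them into a subfolder (the output/ name is just an example):
import json
from pathlib import Path

out_dir = Path('output')
out_dir.mkdir(exist_ok=True)  # create the folder if it doesn't exist

with open('sample.json', 'r') as input_file:
    json_object = json.load(input_file)

for key, value in json_object.items():
    with open(out_dir / f'{key}.json', 'w') as output_file:
        json.dump(value, output_file, indent=2)  # indent just for readability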
I'm exploring the Avro file format and am currently struggling to append data; I seem to overwrite the file on each run. I found an existing thread here saying I should not pass in a schema in order to append to an existing file without overwriting. Even my linter gives this clue: If the schema is not present, presume we're appending. However, if I try to declare the DataFileWriter as DataFileWriter(open("users.avro", "wb"), DatumWriter(), None), the code won't run.
Simply put: how do I append values to an existing Avro file without writing over the existing content?
schema = avro.schema.parse(open("user.avsc", "rb").read())

writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
print("start appending")
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 12, "favorite_color": "blue"})
writer.close()
print("write successful!")

# Read data from the avro file
with open('users.avro', 'rb') as f:
    reader = DataFileReader(f, DatumReader())
    users = [user for user in reader]
    reader.close()

print(f'Schema {schema}')
print(f'Users:\n {users}')
I'm not sure how to do it with the standard avro library, but if you use fastavro it can be done. See the example below:
from fastavro import parse_schema, writer, reader

schema = {
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_number", "type": ["int", "null"]},
        {"name": "favorite_color", "type": ["string", "null"]}
    ]
}
parsed_schema = parse_schema(schema)

records = [
    {"name": "Alyssa", "favorite_number": 256},
    {"name": "Ben", "favorite_number": 12, "favorite_color": "blue"},
]

# Write the initial 2 records
with open("users.avro", "wb") as fp:
    writer(fp, parsed_schema, records)

# Append a third record; 'a+b' lets fastavro extend the existing file
with open("users.avro", "a+b") as fp:
    writer(fp, parsed_schema, [{"name": "Chris", "favorite_number": 1}])

# Read all records back
with open("users.avro", "rb") as fp:
    for record in reader(fp):
        print(record)
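A quick sanity check that the append really extended the file rather than replacing it:
from fastavro import reader

with open("users.avro", "rb") as fp:
    records = list(reader(fp))

print(len(records))  # 3 -- the two initial records plus the appended one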
The solution of skipping the schema is correct, but only once the Avro file has been set up with the correct schema.
What your code had wrong was opening the file in wb mode instead of ab.
Passing None, or no schema argument at all, to DataFileWriter should not matter; the code should run either way.
This reproducible snippet initialises the file with the correct schema. It does not matter whether you use ab or wb mode here; just write an empty file with a schema and close it:
writer = DataFileWriter(open("reproducible.avro", "ab+"), DatumWriter(), schema)
writer.close()
Now, to write the actual records in append mode (so no re-reading of the file!), you can skip the schema as long as you stay in ab mode:
for i in range(3):
    writer = DataFileWriter(open("reproducible.avro", "ab+"), DatumWriter())
    writer.append(db_entry)
    writer.close()
Finally, read the entire file:
reader = DataFileReader(open("reproducible.avro", "rb"), DatumReader())
for data in reader:
    print(data)
reader.close()
Works for me on Windows, with Python 3.9.13 and avro library 1.11.1.
For a fully reproducible example, begin with:
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
import json

schema = {
    "type": "record",
    "name": "recordName",
    "fields": [
        {
            "name": "id",
            "type": "string"
        }
    ]
}
schema = avro.schema.parse(json.dumps(schema))

db_entry = {
    "id": "random_id"
}
I have a JSON file with 20 objects, each containing a resource parameter with an associated value. I would like to extract the value of resource for each object, and save that value as a line in a txt file.
The structure of the JSON is:
"objects": [
{"created": "2020-10-04", "name": "john", "resource": "api/john/",}
{"created": "2020-10-04", "name": "paul", "resource": "api/paul/",}
{"created": "2020-10-04", "name": "george", "resource": "api/george/",}
{"created": "2020-10-04", "name": "ringo", "resource": "api/ringo/",}
]
So far I have the following code; however, it can only get the resource value from one object, and it does not let me write it to a txt file.
with open(input_json) as json_file:
    data = json.load(json_file)

resource = data["objects"][1]["resource"]
values = resource.items()
k = {str(key): str(value) for key, value in values}

with open('resource-list.txt', 'w') as resource_file:
    resource_file.write(k)
You have to iterate over the list of objects:
txtout=""
with open(input_json) as json_file:
data = json.load(json_file)
objects = data["objects"]
for jobj in objects:
txtout = txtout + jobj["resource"] + "\n"
with open ('resource-list.txt', 'w') as resource_file:
resource_file.write(txtout)
Hi there, new Pythonista!
Well, the thing you missed here is the part where you iterate over your JSON object.
with open(input_json) as json_file:
    data = json.load(json_file)

resource = data["objects"][1]["resource"]  # right here you simply took the second object (which is position [1])
A decent fix would be:
with open(input_json) as json_file:
    data = json.load(json_file)

all_items = []  # let's keep all the resource values here
for item in data["objects"]:  # iterate over all the items
    all_items.append(item["resource"])  # push the necessary info

# let's concat every item into one string; since it's only 20 items, it won't make our buffer explode
to_write = "\n".join(all_items)

with open("resource-list.txt", "w") as f:
    f.write(to_write)
and we’re done!
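For what it's worth, once you know to iterate over data["objects"], the middle of that collapses into a list comprehension:
import json

with open(input_json) as json_file:
    data = json.load(json_file)

resources = [item["resource"] for item in data["objects"]]

with open("resource-list.txt", "w") as f:
    f.write("\n".join(resources) + "\n")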
I have a CSV file that is structured as below :
Store, Region, District, MallName, Location
1234,90,910,MallA,GMT
4567,87,902,MallB,EST
2468,90,811,MallC,PST
1357,87,902,MallD,CST
What I was able to accomplish with my iterative brow-beating was getting a format like so:
{
    "90": {
        "910": {
            "1234": {
                "name": "MallA",
                "location": "GMT"
            }
        },
        "811": {
            "2468": {
                "name": "MallC",
                "location": "PST"
            }
        }
    },
    "87": {
        "902": {
            "4567": {
                "name": "MallB",
                "location": "EST"
            },
            "1357": {
                "name": "MallD",
                "location": "CST"
            }
        }
    }
}
The code below is stripped down to match the sample data set I provided, but you get the idea as to what is happening. Again, it's very iterative and non-Pythonic, which I'm trying to move away from. (If anyone feels the defined procedures would be worthwhile to post, I can.)
#************
# Main()
#************
dictHierarchy = {}

with open(getFilePath(), 'r') as f:
    content = [line.strip('\n') for line in f.readlines()]

for data in content:
    data = data.split(",")
    myRegion = data[1]
    myDistrict = data[2]
    myName = data[3]
    myLocation = data[4]
    myStore = data[0]
    if myRegion in dictHierarchy:
        # check for district
        if myDistrict in dictHierarchy[myRegion]:
            # check for store
            dictHierarchy[myRegion][myDistrict].update({myStore: addStoreDetails(data)})
        else:
            # add district
            dictHierarchy[myRegion].update({myDistrict: addStore(data)})
    else:
        # add region
        dictHierarchy.update({myRegion: addDistrict(data)})

with open('hierarchy.json', 'w') as outfile:
    json.dump(dictHierarchy, outfile)
Obsessive compulsive me looked at the JSON output above and thought that to someone blindly opening the file it looks like a hodge-podge. What I was hoping to do for plain-text readability is group the data and throw it into JSON format as so:
{"Regions":[
{"Region":"90", "Districts":[
{"District":"910", "Stores":[
{"Store":"1234", "name":"MallA", "location":"GMT"}]},
{"District":"811", "Stores":[
{"Store":"2468", "name":"MallC", "location":"PST"}]}]},
{"Region":"87", "Districts":[
{"District":"902", "Stores":[
{"Store":"4567", "name":"MallB", "location":"EST"},
{"Store":"1357", "name":"MallD", "location":"CST"}]}]}]}
Long story short, I wasted quite some time today trying to sort out how to actually populate the data structure in Python and essentially ended up nowhere. Is there a clean, Pythonic way to achieve this? Is it even worth the effort?
I've added headers to your input like:
Store,Region,District,name,location
1234,90,910,MallA,GMT
4567,87,902,MallB,EST
2468,90,811,MallC,PST
1357,87,902,MallD,CST
then used Python's csv reader and groupby like this:
import csv
from itertools import groupby
from operator import itemgetter

with open('in.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    regions = []
    regions_dict = sorted(list(reader), key=itemgetter('Region'))
    for region_id, region_group in groupby(regions_dict, itemgetter('Region')):
        districts = []
        regions.append({'Region': region_id, 'Districts': districts})
        districts_dict = sorted(region_group, key=itemgetter('District'))
        for district_id, district_group in groupby(districts_dict, itemgetter('District')):
            districts.append({'District': district_id, 'Stores': list(district_group)})

print(regions)
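One wrinkle: list(district_group) keeps every CSV column (including Region and District) in each store dict, so the output is noisier than the target shape. A sketch that trims each row down to Store/name/location and writes the final JSON (assuming the same in.csv with the headers shown above):
import csv
import json
from itertools import groupby
from operator import itemgetter

with open('in.csv') as csvfile:
    # Sorting by the composite key means rows are grouped by Region,
    # and within each region already sorted by District.
    rows = sorted(csv.DictReader(csvfile), key=itemgetter('Region', 'District'))

regions = []
for region_id, region_group in groupby(rows, itemgetter('Region')):
    districts = []
    for district_id, district_group in groupby(region_group, itemgetter('District')):
        stores = [{'Store': row['Store'], 'name': row['name'], 'location': row['location']}
                  for row in district_group]
        districts.append({'District': district_id, 'Stores': stores})
    regions.append({'Region': region_id, 'Districts': districts})

with open('hierarchy.json', 'w') as outfile:
    json.dump({'Regions': regions}, outfile, indent=2)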
Can you give me some idea of how to handle this? The problem is this: suppose I receive the following JSON:
[{
    "pk": 1,
    "model": "store.book",
    "fields": {
        "name": "Mostly Harmless",
        "author": ["Douglas", "Adams"]
    }
}]
I save the data to a file and close the file; then the next time around (this is a cycle) I again receive JSON, for example the following:
[{
    "pk": 2,
    "model": "store.book",
    "fields": {
        "name": "Henry",
        "author": ["Hans"]
    }
}]
The second JSON must go into the same file as the first. Here comes the problem: how to do it? At this stage I do it by deleting the closing brackets and inserting commas. Is there a smarter, better way to do this job?
I am creating the JSON by serializing Django objects. I would be very grateful if you would share your ideas.
PS: It is important to use minimal memory. Assume that the file is around 50-60 GB and that I can hold about 1 GB in memory at most.
You would have to convert your data into JSON and store it in a file, then read it back from the file, append your new data to the object, and save it to the file again. Here is some code that might be useful for you.
Use the json module. Documentation is available at http://docs.python.org/2/library/json.html
The first time you write into the file, you can use something like:
>>> import json
>>> fileW = open("filename.txt","w")
>>> json1 = [{
... "pk": 1,
... "model": "store.book",
... "fields": {
... "name": "Mostly Harmless",
... "author": ["Douglas", "Adams"]
... }
... }]
>>> json.dump(json1, fileW)
>>> fileW.close()
The following code could be used in a loop to read from the file and add data to it.
>>> fileLoop = open("filename.txt","r+")
>>> jsonFromFile = json.load(fileLoop)
>>> jsonFromFile
[{u'pk': 1, u'model': u'store.book', u'fields': {u'name': u'Mostly Harmless', u'author': [u'Douglas', u'Adams']}}]
>>> newJson = [{
... "pk": 2,
... "model": "store.book",
... "fields": {
... "name": "Henry",
... "author": ["Hans"]
... }
... }]
>>> jsonFromFile.append(newJson[0])
>>> jsonFromFile
[{u'pk': 1, u'model': u'store.book', u'fields': {u'name': u'Mostly Harmless', u'author': [u'Douglas', u'Adams']}}, {'pk': 2, 'model': 'store.book', 'fields': {'name': 'Henry', 'author': ['Hans']}}]
>>> fileLoop.seek(0)  # rewind first, or the new JSON is written after the old one
>>> json.dump(jsonFromFile, fileLoop)
>>> fileLoop.close()
You do not need to parse the JSON because you are only storing it. The following (a) creates a file and (b) appends text to the file in each cycle.
from os.path import getsize

def init(filename):
    """
    Creates a new file and sets its content to "[]".
    """
    with open(filename, 'w') as f:
        f.write("[]")

def append(filename, text):
    """
    Appends a JSON to a file that has been initialised with `init`.
    """
    length = getsize(filename)  # Figure out the number of characters in the file
    with open(filename, 'r+') as f:
        f.seek(length - 1)  # Go to the end of the file, just before the closing bracket
        if length > 2:  # Insert a delimiter if this is not the first JSON
            f.write(",\n")
        f.write(text[1:-1])  # Append the JSON without its surrounding brackets
        f.write("]")  # Write a closing bracket
filename = "temp.txt"
init(filename)
while mycondition:
append(filename, getjson())
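To sanity-check the stitched file (only on a small test, since this reads it all back into memory), it should parse as one JSON array:
import json

with open(filename) as f:
    combined = json.load(f)  # raises ValueError if the bracket/comma stitching went wrong

print(len(combined))  # number of objects accumulated across the cycles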
If you did not have to save the JSON after each cycle, you could do the following:
jsons = []
while mycondition:
    jsons.append(getjson()[1:-1])

with open("temp.txt", "w") as f:
    f.write("[")
    f.write(",".join(jsons))
    f.write("]")
To avoid creating multigigabyte objects, you could store each object on a separate line. This requires you to dump each object without newlines used for formatting (JSON strings themselves may contain \n as the usual two-character escape):
import json

with open('output.txt', 'a') as file:  # open the file in append mode
    json.dump(obj, file,
              separators=(',', ':'))  # the most compact representation
    file.write("\n")
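Reading it back is then one json.loads per line, so memory stays bounded however large the file grows:
import json

with open('output.txt') as file:
    for line in file:
        obj = json.loads(line)  # one object at a time; process it here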