I have downloaded a huge JSON array file that needs to be split into smaller files, but I need the smaller files in the format below (a newline for each object in the array); the original JSON is also in this format:
[
{"a":"a1","b":"b1","c":"c1"},
{"a":"a2","b":"b2","c":"c2"},
{"a":"a3","b":"b3","c":"c3"}
]
I used json.dump, but it prints the smaller array on a single line, and the indent option also does not give me output in the above format.
Although I don't know what your original JSON looks like, you would basically want something like this:
import json

lines = []
for something in original_json:
    line = {something['a']: something['aa']}  # whatever you need to do to get your values
    lines.append(line)
    # alternatively, build the dict inline: lines.append({something['a']: something['aa'], ...})

with open('myfile.json', 'w') as f1:
    f1.write("[\n")
    # json.dumps writes each object as valid JSON (double quotes), and joining
    # with ",\n" avoids a trailing comma before the closing bracket
    f1.write(",\n".join(json.dumps(line) for line in lines))
    f1.write("\n]")
I am using Python and have a text file with results from a previous, more complex piece of code. It wrote to a file called 'results' structured as:
xml file name.xml
['chebi:28726', 'chebi:27466', 'chebi:27721', 'chebi:15532', 'chebi:15346']
xml file name.xml
['chebi:27868', 'chebi:27668', 'chebi:15471', 'chebi:15521', 'chebi:15346']
xml file name.xml
['chebi:28528', 'chebi:28325', 'chebi:10723', 'chebi:28493', 'chebi:15346']
etc...
My current code is:
file = open("results.txt", "r")
data = file.readlines()
for a in data:
    print(a)
The problem is that I want to grab specific elements from those lists, for example chebi:28528, and convert them from their current compound format into a different one. I have already written the code for that conversion, but I am stuck on the step before it: looping through the file and selecting each element from the lists.
If I do:
for a in data:
    for b in a:
it selects each individual character and not the entire word (chebi:28528).
Is there a way I can loop through the text file and grab just the specific ChEBI compounds so that I can then convert them into the different format I need? Python is treating the entire list of compounds as one element, and indexing within that element just gives a character rather than a compound.
Assuming your file is as above, it looks like you have lists stored as raw text. You can loop over their elements by converting each such line to a real Python list using ast or something similar.
You had the right idea, but you are actually looping over characters. How about this?
import ast

with open('results.txt', 'r') as f:
    data = f.readlines()

for line in data:
    if '[' not in line:
        continue  # skip the xml filename lines
    ls = ast.literal_eval(line)
    for word in ls:
        if 'chebi' in word:
            process_me(word)  # your existing conversion code goes here
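For instance, applied to one of the sample lines:
>>> import ast
>>> ast.literal_eval("['chebi:28726', 'chebi:27466', 'chebi:27721']")
['chebi:28726', 'chebi:27466', 'chebi:27721']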
I have a .txt file containing formatting elements such as \n for line breaks. I want to read it and then rewrite its data, up to a specific line, to a new .txt file. My code looks like this:
with open(filename) as f:
    content = f.readlines()
with open("lf.txt", "w") as file1:
    file1.write(str(content))
The output file lf.txt is produced, but it throws away the formatting of the input file. Is there a way to keep the formatting of the input file when rewriting it to a new file?
You converted content to a string, while it's really a list of strings (lines).
Use join to convert the lines back to a string:
file1.write(''.join(content))
join is a string method; in the example it is called on an empty string object, and the string it is called on is used as the separator between the joined pieces. Here we don't need any separator, because each line from readlines() already ends with its original \n.
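Since you also want to stop at a specific line, here is a minimal sketch of that, assuming a hypothetical marker line "STOP"; swap in whatever condition identifies your line:

with open(filename) as f:
    content = f.readlines()

with open("lf.txt", "w") as file1:
    for line in content:
        if line.strip() == "STOP":  # hypothetical marker; use your own test
            break
        file1.write(line)  # each line keeps its \n, so the formatting survives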
I'm trying to handle some JSON returned by a Python Requests call to an API, in Python, a language I'm still learning.
Here's the structure of the sample returned JSON data:
{"sports":[{"searchtype":"seasonal", "sports":["'baseball','football','softball','soccer','summer','warm'","'hockey','curling','luge','snowshoe','winter','cold'"]}]}
Currently, I'm parsing and writing output to a file like this:
output = response.json
results = output['sports'][0]['sports']
if results:
    with open(filename, "w") as fileout:
        fileout.write(pprint.pformat(results))
Giving me this as my file:
[u"'baseball','football','softball','soccer','summer','warm'",
"'hockey','curling','luge','snowshoe','winter','cold'"]
Since I'm basically creating double-quoted JSON arrays consisting of comma-separated strings, how can I manipulate the arrays to print only the comma-separated values I want? In this case, everything except the fifth column, which represents the seasons.
[u"'baseball','football','softball','soccer','warm'",
"'hockey','curling','luge','snowshoe','cold'"]
Ultimately, I'd like to strip away the unicode u prefix too, since I have no non-ASCII characters. I currently do this manually after the fact with a language I'm more familiar with (AWK). My desired output is really:
'baseball','football','softball','soccer','warm'
'hockey','curling','luge','snowshoe','cold'
Your results is actually a list of two strings. To get your desired output you can, for example, split each string on commas, drop the fifth field, and write each result on its own line; writing the plain strings instead of using pprint.pformat also gets rid of the u prefix:
if results:
    with open(filename, "w") as fileout:
        for line in results:
            fields = line.split(',')
            del fields[4]  # drop the fifth field (the season)
            fileout.write(','.join(fields) + '\n')
I am downloading JSON files from an API and use the following code to write the JSON. Each item in the loop gives me one JSON document. I need to save them all and then extract entities from the appended JSON file in a loop.
for item in style_ls:
    dat = get_json(api, item)
    specs_dict[item] = dat
    with open("specs_append.txt", "a") as myfile:
        json.dump(dat, myfile)
    print(item)

with open("specs_data.txt", "w") as myfile:
    json.dump(specs_dict, myfile)
I know that I cannot get valid JSON from specs_append.txt, but I can from specs_data.txt. I am writing the first file anyway because my program needs at least 3-4 days to complete, and there is a high chance my system may shut down in that time. So is there any way I can do this efficiently?
If not, is there any way I can extract the objects from the specs_append.txt <{JSON}{JSON}> format (which is not valid JSON)?
If not, should I write specs_dict to a txt file on every iteration of the loop, so that even if the program gets terminated I can restart from that point and still get valid JSON?
I suggest several possible solutions.
One solution is to write custom code to slurp in the input file. I would suggest putting a special line before each JSON object in the file, such as: ###
Then you could write code like this:
import json

SPECIAL_LINE = '###\n'  # marker line written before each JSON object

def json_get_objects(f):
    temp = ''
    line = next(f)  # pull first line
    assert line == SPECIAL_LINE
    for line in f:
        if line != SPECIAL_LINE:
            temp += line
        else:
            # found special marker; temp now contains a complete JSON object
            j = json.loads(temp)
            yield j
            temp = ''
    # after the loop is done, yield the last JSON object
    if temp:
        j = json.loads(temp)
        yield j

with open("specs_data.txt", "r") as f:
    for j in json_get_objects(f):
        pass  # do something with JSON object j
Two notes on this. First, I am simply appending to a string over and over; this used to be a very slow way to do it in Python, so if you are using a very old version of Python, don't do it this way unless your JSON objects are very small. Second, I wrote code to split the input and yield JSON objects one at a time, but you could also use a guaranteed-unique string, slurp in all the data with a single call to f.read(), and then split on that guaranteed-unique string using the str.split() method.
Another solution would be to write the whole file as a valid JSON list of valid JSON objects. Write the file like this:
{"mylist":[
# first JSON object, followed by a comma
# second JSON object, followed by a comma
# third JSON object
]}
This would require your file-appending code to open the file with write permission, seek to the final ] in the file, write a comma plus a newline followed by the new JSON object, and finally write ]} to close out the file again. If you do it this way, you can use json.loads() to slurp the whole thing in and get your list of JSON objects (under the "mylist" key).
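A rough sketch of that append step, with some assumptions: the file is UTF-8, always ends with ]} preceded by a newline, and append_object is a helper name I made up:

import json
import os

def append_object(path, obj):
    entry = json.dumps(obj).encode('utf-8')  # compact dump, no embedded newlines
    with open(path, 'rb+') as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            # empty file: write the opening wrapper and the first object
            f.write(b'{"mylist":[\n' + entry + b'\n]}')
        else:
            # back over the trailing b"\n]}" and splice in the new object
            f.seek(-3, os.SEEK_END)
            f.write(b',\n' + entry + b'\n]}')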
Finally, I suggest that maybe you should just use a database. Use SQLite or something and just throw the JSON strings into a table. If you choose this, I suggest using an ORM to make your life simple, rather than writing SQL commands by hand.
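If you'd rather see the idea without an ORM, here is a minimal raw-sqlite3 sketch (the table and column names are made up):

import json
import sqlite3

conn = sqlite3.connect('specs.db')
conn.execute('CREATE TABLE IF NOT EXISTS specs (item TEXT PRIMARY KEY, data TEXT)')

def save_item(item, dat):
    conn.execute('INSERT OR REPLACE INTO specs VALUES (?, ?)',
                 (item, json.dumps(dat)))
    conn.commit()  # commit per item, so a crash loses at most one record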
Personally, I favor the first suggestion: write in a special line like ###, then have custom code to split the input on those marks and then get the JSON objects.
EDIT: Okay, the first suggestion sort of assumed that the JSON was formatted for human readability, with a bunch of short lines:
{
"foo": 0,
"bar": 1,
"baz": 2
}
But yours is all run together as one big long line:
{"foo":0,"bar":1,"baz":2}
Here are three ways to fix this.
0) Write a newline before the ### and after it, like so:
###
{"foo":0,"bar":1,"baz":2}
###
{"foo":0,"bar":1,"baz":2}
Then each input line will alternately be ### or a complete JSON object.
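A minimal sketch of reading that alternating layout (assuming each JSON object really was dumped on a single line, as above):

import json

with open("specs_data.txt") as f:
    objects = [json.loads(line) for line in f if line.strip() != '###']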
1) As long as SPECIAL_LINE is completely unique (it never appears inside a string in the JSON) you can do this:
with open("specs_data.txt", "r") as f:
temp = f.read() # read entire file contents
lst = temp.split(SPECIAL_LINE)
json_objects = [json.loads(x) for x in lst]
for j in json_objects:
pass # do something with JSON object j
The .split() method splits the temp string into the JSON object strings for you.
2) If you are certain that each JSON object will never have a newline character inside it, you could simply write JSON objects to the file, one after another, putting a newline after each; then assume that each line is a JSON object:
import json

def json_get_objects(f):
    for line in f:
        if line.strip():
            yield json.loads(line)

with open("specs_data.txt", "r") as f:
    for j in json_get_objects(f):
        pass  # do something with JSON object j
I like the simplicity of option (2), but I like the reliability of option (0). If a newline ever got written in as part of a JSON object, option (0) would still work, but option (2) would raise an error.
Again, you can also simply use an actual database (SQLite) with an ORM and let the database worry about the details.
Good luck.
Append the JSON data to a dict on every loop iteration.
At the end, dump this dict as JSON and write it to a file.
To give you an idea of how to append data to a dict:
>>> d1 = {'suku':12}
>>> t1 = {'suku1':212}
>>> d1.update(t1)
>>> d1
{'suku1': 212, 'suku': 12}
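Putting that together with the names from your question (get_json, api, and style_ls are from your code), a minimal sketch:

import json

specs_dict = {}
for item in style_ls:
    dat = get_json(api, item)  # your API call
    specs_dict[item] = dat     # same effect as specs_dict.update({item: dat})

with open("specs_data.txt", "w") as myfile:
    json.dump(specs_dict, myfile)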
I've created a very simple piece of code to read in tweets in JSON format from text files, determine whether they contain an id and coordinates, and if so, write these attributes to a CSV file. This is the code:
f = csv.writer(open('GeotaggedTweets/ListOfTweets.csv', 'wb+'))
all_files = glob.glob('SampleTweets/*.txt')

for filename in all_files:
    with open(filename, 'r') as file:
        data = simplejson.load(file)
        if 'text' and 'coordinates' in data:
            f.writerow([data['id'], data['geo']['coordinates']])
I've been having some difficulties, but with the help of the excellent JSON Lint website I have realised my mistake: I have multiple JSON objects per file, and from what I read these need to be separated by commas and have square brackets added at the start and end of the file.
How can I achieve this? I've seen some examples online where each individual line is read and the brackets are added around the first and last lines, but since I load the whole file at once I'm not entirely sure how to do this.
You have a file that either contains too many newlines (in the JSON values themselves) or too few (no newlines between the tweets at all).
You can still repair this by using some creative re-stitching. The following generator function should do it:
import json

def read_objects(filename):
    decoder = json.JSONDecoder()
    with open(filename, 'r') as inputfile:
        try:
            line = next(inputfile).strip()
            while line:
                try:
                    obj, index = decoder.raw_decode(line)
                    yield obj
                    line = line[index:].lstrip()
                except ValueError:
                    # Assume we didn't have a complete object yet
                    line += next(inputfile).strip()
                if not line:
                    line = next(inputfile).strip()
        except StopIteration:
            # ran out of input; stop the generator cleanly (required on Python 3.7+)
            return
This should be able to read all your JSON objects in sequence:
for filename in all_files:
    for data in read_objects(filename):
        if 'text' in data and 'coordinates' in data:  # test each key explicitly
            f.writerow([data['id'], data['geo']['coordinates']])
It is otherwise fine to have multiple JSON strings written to one file, but you need to make sure the entries are clearly separated somehow. Writing JSON entries that contain no newlines, with a newline between each entry, for example, makes sure you can later read them back one by one and process them sequentially without this much hassle.
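A minimal sketch of that newline-delimited approach (the tweet dict is made-up example data):

import json

tweets = [{"id": 1, "geo": {"coordinates": [40.7, -74.0]}, "text": "hello"}]

with open('tweets.jsonl', 'w') as out:
    for tweet in tweets:
        out.write(json.dumps(tweet) + '\n')  # compact dump: no embedded newlines

with open('tweets.jsonl') as f:
    for line in f:
        data = json.loads(line)  # one complete object per line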