I am writing a parser to grab data from the GitHub API, and I would like to output it to a .js file in the following format:
//Ideal output
var LIST_DATA = [
    {
        "name": "Python",
        "type": "category"
    }]
However, I am having trouble writing the variable declaration var LIST_DATA into the output .js file: no matter what I do with the string, it ends up quoted, like "var LIST_DATA = ".
For example:
//Current Output
"var LIST_DATA = "[
{
"name": "Python",
"type": "category"
}]
My Python script:
var = "var LIST_DATA = "
with open('list_data.js', 'w') as outfile:
outfile.write(json.dumps(var, sort_keys = True))
I also tried using the strip method, as suggested in a StackOverflow answer, and got the same result:
var = "var LIST_DATA = "
with open('list_data.js', 'w') as outfile:
outfile.write(json.dumps(var.strip('"'), sort_keys = True))
I am assuming that whenever I dump the text into the .js file, the string gets written along with its double quotes... Is there a way around this?
Thanks.
If you pass a string to json.dumps, it will always be quoted. The first part (the variable declaration) is not JSON, so write it verbatim, then write the object using json.dumps:
var = "var LIST_DATA = "
my_dict = [
{
"name": "Python",
"type": "category"
}
]
with open('list_data.js', 'w') as outfile:
outfile.write(var)
# Write the JSON value here
outfile.write(json.dumps(my_dict))
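Equivalently (assuming Python 3.6+ for f-strings), the two writes can be combined into one, and an indent argument makes the generated file easier to read; the filename here matches the question:

```python
import json

# The data to embed in the generated .js file
my_dict = [
    {
        "name": "Python",
        "type": "category"
    }
]

with open("list_data.js", "w") as outfile:
    # The variable declaration is plain text; only the value is JSON
    outfile.write(f"var LIST_DATA = {json.dumps(my_dict, indent=4)};\n")
```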
Try
var = '''var LIST_DATA = [
{
    name: "Python",
    type: "category"
}]'''

with open("list_data.js", "w") as f:
    f.write(var)
The json library is not what you are looking for here.
Also, object keys in JavaScript do not need to be quoted unless they are not valid identifiers (for example, if they contain spaces).
I have a large file (about 3 GB) which contains what looks like JSON but isn't, because it lacks commas (,) between the "observations", i.e. the JSON objects (I have about 2 million of these objects in my data file).
For example, this is what I have:
{
    "_id": {
        "$id": "fh37fc3huc3"
    },
    "messageid": "4757724838492485088139042828",
    "attachments": [],
    "usernameid": "47284592942",
    "username": "Alex",
    "server": "475774810304151552",
    "text": "Must watch",
    "type": "462050823720009729",
    "datetime": "2018-08-05T21:20:20.486000+00:00",
    "type": {
        "$numberLong": "0"
    }
}
{
    "_id": {
        "$id": "23453532dwq"
    },
    "messageid": "232534",
    "attachments": [],
    "usernameid": "273342",
    "usernameid": "Alice",
    "server": "475774810304151552",
    "text": "https://www.youtube.com/",
    "type": "4620508237200097wd29",
    "datetime": "2018-08-05T21:20:11.803000+00:00",
    "type": {
        "$numberLong": "0"
    }
}
And this is what I want (the comma between "observations"):
{
    "_id": {
        "$id": "fh37fc3huc3"
    },
    "messageid": "4757724838492485088139042828",
    "attachments": [],
    "username": "Alex",
    "server": "475774810304151552",
    "type": {
        "$numberLong": "0"
    }
},
{
    "_id": {
        "$id": "23453532dwq"
    },
    "messageid": "232534",
    "attachments": [],
    "usernameid": "Alice",
    "server": "475774810304151552",
    "type": {
        "$numberLong": "0"
    }
}
This is what I tried but it doesn't give me a comma where I need it:
import re

with open('dataframe.txt', 'r') as input, open('out.txt', 'w') as output:
    output.write("[")
    for line in input:
        line = re.sub('', '},{', line)
        output.write(' '+line)
    output.write("]")
What can I do so that I can add a comma between each JSON object in my datafile?
This solution presupposes that none of the fields in the JSON contains either { or }.
If we assume that there is at least one blank line between JSON dictionaries, here is an idea: maintain a count of unclosed curly brackets ({) as unclosed_count, and when we meet an empty line while that count is zero, add the comma once.
Like this:
with open('test.json', 'r') as input_f, open('out.json', 'w') as output_f:
    output_f.write("[")
    unclosed_count = 0
    comma_after_zero_added = True
    for line in input_f:
        unclosed_count_change = line.count('{') - line.count('}')
        unclosed_count += unclosed_count_change
        if unclosed_count_change != 0:
            comma_after_zero_added = False
        if line.strip() == '' and unclosed_count == 0 and not comma_after_zero_added:
            output_f.write(",\n")
            comma_after_zero_added = True
        else:
            output_f.write(line)
    output_f.write("]")
Assuming sufficient memory, you can parse such a stream one object at a time using json.JSONDecoder.raw_decode directly, instead of using json.loads.
>>> import json
>>> decoder = json.JSONDecoder()
>>> x = '{"a": 1}\n{"b": 2}\n'  # hypothetical output of open("dataframe.txt").read()
>>> decoder.raw_decode(x)
({'a': 1}, 8)
>>> decoder.raw_decode(x, 9)
({'b': 2}, 17)
The output of raw_decode is a tuple containing the first JSON value decoded and the position in the string where the remaining data starts. (Note that json.loads just creates an instance of JSONDecoder, and calls the decode method, which just calls raw_decode and artificially raises an exception if the entire input isn't consumed by the first decoded value.)
A little extra work is involved: raw_decode can't start on whitespace, so you'll have to skip past any whitespace at the returned index to detect where the next value starts.
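A minimal sketch of that loop, using the returned index to skip whitespace rather than slicing the string each time:

```python
import json

def iter_json_values(text):
    """Yield each top-level JSON value in `text`, tolerating whitespace between values."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # raw_decode cannot start on whitespace, so advance past it first
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

print(list(iter_json_values('{"a": 1}\n{"b": 2}\n')))  # → [{'a': 1}, {'b': 2}]
```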
Another way to view your data is that you have multiple JSON records separated by whitespace. You can use the stdlib JSONDecoder to read each record, then strip whitespace and repeat until done. The decoder reads a record from a string and tells you how far it got; apply that iteratively until all the data is consumed. This is far less risky than making a bunch of assumptions about what data is contained in the JSON itself.
import json

def json_record_reader(filename):
    with open(filename, encoding="utf-8") as f:
        txt = f.read().lstrip()
    decoder = json.JSONDecoder()
    result = []
    while txt:
        data, pos = decoder.raw_decode(txt)
        result.append(data)
        txt = txt[pos:].lstrip()
    return result

print(json_record_reader("data.json"))
Considering the size of your file, a memory mapped text file may be the better option.
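Along those lines, if mapping the file is awkward, a chunked variant of the same raw_decode idea avoids holding all 3 GB in memory at once. This is a sketch under the assumption that every top-level record is an object ({...}), so a successful parse can never be the prefix of a longer record:

```python
import json

def json_records_stream(filename, chunk_size=1 << 20):
    """Yield JSON records one at a time without reading the whole file into memory.

    Assumes each top-level record is a JSON object, so a record that parses
    completely cannot be a prefix of a longer one (unlike bare numbers).
    """
    decoder = json.JSONDecoder()
    buf = ""
    with open(filename, encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            buf = (buf + chunk).lstrip()
            while buf:
                try:
                    obj, pos = decoder.raw_decode(buf)
                except json.JSONDecodeError:
                    break  # record is split across chunks; read more data
                yield obj
                buf = buf[pos:].lstrip()
            if not chunk:
                break  # EOF reached and buffer drained
```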
If you're sure that the only place you will find a blank line is between two dicts, then you can go ahead with your current idea, after fixing its execution: for every line, check whether it's empty. If it isn't, write it as-is; if it is, write a comma instead:
with open('dataframe.txt', 'r') as input_file, open('out.txt', 'w') as output_file:
    output_file.write("[")
    for line in input_file:
        if line.strip():
            output_file.write(line)
        else:
            output_file.write(",")
    output_file.write("]")
If you cannot guarantee that any blank line must be replaced by a comma, you need a different approach.
You want to replace a closing brace, followed by an empty line (or multiple whitespace characters), followed by an opening brace, with },{.
You can keep track of the previous two lines in addition to the current line, and if these are "}", "", and "{" in that order, then write a comma before writing the "{".
from collections import deque

with open('dataframe.txt', 'r') as input_file, open('out.txt', 'w') as output_file:
    last_two_lines = deque(maxlen=2)
    output_file.write("[")
    for line in input_file:
        line_s = line.strip()
        if line_s == "{" and list(last_two_lines) == ["}", ""]:
            output_file.write("," + line)
        else:
            output_file.write(line)
        last_two_lines.append(line_s)
    output_file.write("]")
Alternatively, if you want to stick with regex, then you could do
import re

with open('dataframe.txt') as input_file:
    file_contents = input_file.read()

repl_contents = re.sub(r'\}(\s+)\{', r'},\1{', file_contents)

with open('out.txt', 'w') as output_file:
    output_file.write(repl_contents)
Here, the regex r"\}(\s+)\{" matches the pattern we're looking for (\s+ matches one or more whitespace characters and captures them in group 1, which we then use in the replacement string as \1).
Note that you will need to read and run re.sub on the entire file, which will be slow.
I know that the title may not be very clear, so here's my problem below:
I'm pretty new to Python. I have a YAML file that contains many occurrences of this block:
x-amazon-apigateway-integration:
  responses:
    default:
      statusCode: "200"
  uri: addProfile_uri
  passthroughBehavior: when_no_match
  httpMethod: POST
  cacheNamespace: roq9wj
  cacheKeyParameters:
    - method.request.path.proxy
  type: aws_proxy

x-amazon-apigateway-integration:
  responses:
    default:
      statusCode: "200"
  uri: deleteProfile_uri
  passthroughBehavior: when_no_match
  httpMethod: POST
  cacheNamespace: roq9wj
  cacheKeyParameters:
    - method.request.path.proxy
  type: aws_proxy
And a JSON file that contains this:
[
    {
        "function_variable_uri_name": "addProfile_uri",
        "uri": "arn:aws:apigateway:XXXXXX"
    },
    {
        "function_variable_uri_name": "deleteProfile_uri",
        "uri": "arn:aws:apigateway:XXXXXX"
    },
    {
        "function_variable_uri_name": "getAllProfile_uri",
        "uri": "arn:aws:apigateway:XXXXXX"
    },
    {
        "function_variable_uri_name": "getProfile_uri",
        "uri": "arn:aws:apigateway:XXXXXX"
    },
    {
        "function_variable_uri_name": "updateProfile_uri",
        "uri": "arn:aws:apigateway:XXXXXX"
    }
]
So in my Python code, I tried to loop over the JSON file and extract both the uri and function_variable_uri_name values. The idea was to loop through the YAML file, search for every occurrence of function_variable_uri_name (for example deleteProfile_uri), and replace it with the uri value arn:aws:apigateway:XXXXXX.
import json

flambda = open('uri_var.json')
lambda_inputs = json.load(flambda)
fout = open("profile_modif.yaml", "wt")

with open('profile.yaml', 'r+') as file:
    for each in lambda_inputs:
        variable_uri_name = each['function_variable_uri_name']
        uri = each['uri']
        for line in file:
            fout.write(line.replace(variable_uri_name, uri))

file.close()
My Python code above only changes the first occurrence of variable_uri_name inside the YAML (addProfile_uri) and never changes deleteProfile_uri. I need the for-loop over the JSON file because I have many inputs to be treated.
Update: here is a simple print of both values of function_variable_uri_name and uri from my JSON file:
addProfile_uri
arn:aws:apigateway:XXXXXX
deleteProfile_uri
arn:aws:apigateway:XXXXXX
getAllProfile_uri
arn:aws:apigateway:XXXXXX
getProfile_uri
arn:aws:apigateway:XXXXXX
updateProfile_uri
arn:aws:apigateway:XXXXXX
Any solutions, please? Thank you all in advance.
Change how you're thinking about the work. You want to work once on each line, and perform multiple check and replace operations on each line. In other words, you want to iterate on the file first, then iterate on the replacement variables in a nested operation on the file operation:
import json

with open('uri_var.json') as flambda:
    lambda_inputs = json.load(flambda)

with open("profile_modif.yaml", "wt") as fout:
    with open('profile.yaml', 'r+') as file:
        for line in file:
            for item in lambda_inputs:
                line = line.replace(item['function_variable_uri_name'], item['uri'])
            fout.write(line)
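The effect of the nested loop can be demonstrated on a small in-memory example (the lines and ARN values here are made up for illustration):

```python
# Hypothetical replacement table, shaped like the uri_var.json entries
replacements = [
    {"function_variable_uri_name": "addProfile_uri", "uri": "arn:aws:apigateway:AAAA"},
    {"function_variable_uri_name": "deleteProfile_uri", "uri": "arn:aws:apigateway:BBBB"},
]

# Hypothetical YAML lines containing two different placeholders
lines = ["uri: addProfile_uri\n", "uri: deleteProfile_uri\n"]

out = []
for line in lines:
    # Apply every replacement to every line, not just the first one
    for item in replacements:
        line = line.replace(item["function_variable_uri_name"], item["uri"])
    out.append(line)

print(out)  # both placeholders are replaced, not just the first
```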
I searched for a long time, but I am not really familiar with Python and JSON and I can't find the answer to my problem.
Here is my Python script
import json
jsonFile = open("config.json", "r")
data = json.load(jsonFile)
data.format(friendly, teaching, leader, target)
print(data)
Here is the JSON file:
{
    "commend": {
        "friendly": {},
        "teaching": {},
        "leader": {}
    },
    "account": {
        "username": "",
        "password": "",
        "sharedSecret": ""
    },
    "proxy": {
        "enabled": false,
        "file": "proxies.txt",
        "switchProxyEveryXaccounts": 5
    },
    "type": "COMMEND",
    "method": "SERVER",
    "target": "https://steamcommunity.com/id/{}",
    "perChunk": 20,
    "betweenChunks": 300000,
    "cooldown": 28800000,
    "steamWebAPIKey": "{}",
    "disableUpdateCheck": false
}
I tried .format, but this method can't be used with a dictionary.
With your help I managed to find the answer. A big thank you for your speed and your help! Here is what I did:
import json

jsonFile = open("config.json", "r")
data = json.load(jsonFile)
data['commend']['friendly'] = nbfriendly
data['commend']['teaching'] = nbteaching
data['commend']['leader'] = nbleader
print(data)
A JSON file loads as a Python dictionary, so you can use dict methods on it. Here is the code:
import json

with open("config.json", "r") as json_file:
    data = json.load(json_file)

# Let's say you want to add the string "Hello, World!" to the "password" key
data["account"]["password"] += "Hello, World!"

# Or use this way to overwrite anything already written inside the key
data["account"]["password"] = "Hello, World!"

print(data)
You can add data by traversing through it like a dictionary:
data['key'] = value
Example:
dic["commend"]["friendly"]={'a':1}
I'm trying to add key-value pairs to an existing JSON file. I am able to add a key at the parent level; how do I add a value to the child items?
JSON file:
{
    "students": [
        {
            "name": "Hendrick"
        },
        {
            "name": "Mikey"
        }
    ]
}
Code:
import json

with open("input.json") as json_file:
    json_decoded = json.load(json_file)

json_decoded['country'] = 'UK'

with open("output.json", 'w') as json_file:
    for d in json_decoded[students]:
        json.dump(json_decoded, json_file)
Expected Results:
{
    "students": [
        {
            "name": "Hendrick",
            "country": "UK"
        },
        {
            "name": "Mikey",
            "country": "UK"
        }
    ]
}
You can do the following in order to manipulate the dict the way you want:
for s in json_decoded['students']:
    s['country'] = 'UK'
json_decoded['students'] is a list of dictionaries that you can simply iterate and update in a loop. Now you can dump the entire object:
with open("output.json", 'w') as json_file:
json.dump(json_decoded, json_file)
import json

with open("input.json", 'r') as json_file:
    json_decoded = json.load(json_file)

for element in json_decoded['students']:
    element['country'] = 'UK'

with open("output.json", 'w') as json_out_file:
    json.dump(json_decoded, json_out_file)
This opened the JSON file input.json, iterated through each of its elements, added a key named "country" with the value "UK" to each element, and wrote the modified JSON to a new file.
Edit:
Moved the writing of the output file inside the first with block. The issue with the earlier implementation was that json_decoded would not be instantiated if opening input.json failed, so writing the output would raise an exception: NameError: name 'json_decoded' is not defined
This gives [None, None] (dict.update returns None) but updates the dict in place:
a = {'students': [{'name': 'Hendrick'}, {'name': 'Mikey'}]}
[i.update({'country':'UK'}) for i in a['students']]
print(a)
I have an existing JSON file and am trying to add a string to it. But as soon as I write the JSON file, the newline characters disappear and the formatting changes.
Below is the code:
#!/usr/bin/python
import json

userinput = raw_input('Enter the name of a file you want to read: ')

with open(userinput) as json_data:
    s = json_data.read()
    data = json.loads(s)
print data['classes']
json_data.close()

class_add = raw_input('Enter the name of a class you want to add: ')

if class_add in data['classes']:
    print "Class %s already exists, doing nothing." % class_add
else:
    data['classes'].append(class_add)

print json.dumps(data)
print data['classes']

with open(userinput, 'w') as json_data:
    json_data.write(json.dumps(data))
json_data.close()
One more important thing here is the formatting of the JSON file. By default the file has the formatting below.
# cat test.json
{
"selinux_mode": "enforcing",
"cis_manages_auditd_service": true,
"classes": [ "basic", "admin", "lvm"]
}
#
But once we add the class it becomes the following.
# cat test.json
{"cis_manages_auditd_service": true, "classes": [ "basic", "admin", "lvm"], "selinux_mode": "enforcing"}
Is there any way that I can keep the JSON whitespace and newline characters as they are, without changing anything?
JSON doesn't need a particular layout, but for slightly less bad human readability you can supply indent=2:
import sys
import json

userinput = raw_input('Enter the name of a file you want to read: ')

with open(userinput) as json_data:
    data = json.load(json_data)
print(data['classes'])

class_add = raw_input('Enter the name of a class you want to add: ')

if class_add in data['classes']:
    print("Class {} already exists, doing nothing.".format(class_add))
else:
    data['classes'].append(class_add)

json.dump(data, sys.stdout, indent=2)
print(data['classes'])

with open(userinput, 'w') as json_data:
    json.dump(data, json_data, indent=2)
Please note that if you are using the with statement to open a file, you should not close it explicitly (that is done at the end of the block, whether there is an exception or not).
If you are writing to a file, instead of processing the JSON as string, you should also refrain from using json.dumps(data) and use json.dump(data, file_pointer) instead. The same holds for json.loads() and json.load().
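For instance, these two approaches produce identical characters, but json.dump streams them to the file without first building the whole string in memory (filenames here follow the question's example):

```python
import json

data = {"selinux_mode": "enforcing", "classes": ["basic", "admin", "lvm"]}

# String-based: build the full serialization in memory first
text = json.dumps(data, indent=2)

# File-based: json.dump writes the same characters directly to the file
with open("test.json", "w") as f:
    json.dump(data, f, indent=2)

with open("test.json") as f:
    assert f.read() == text  # identical output either way
```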
You can set the indent argument of json.dumps() to an integer representing the number of spaces to indent.
This will still modify the formatting of the output to something like this:
import json
s = '''{
    "selinux_mode": "enforcing",
    "cis_manages_auditd_service": true,
    "classes": [ "basic", "admin", "lvm"]
}'''
data = json.loads(s)
>>> print(json.dumps(data))
{"cis_manages_auditd_service": true, "classes": ["basic", "admin", "lvm"], "selinux_mode": "enforcing"}
>>> print(json.dumps(data, indent=4))
{
    "cis_manages_auditd_service": true,
    "classes": [
        "basic",
        "admin",
        "lvm"
    ],
    "selinux_mode": "enforcing"
}
The difference between the original and the updated version is that the classes list is now displayed across multiple lines.