Delete specific content in a json file - python

I have this json file :
{
"help": [
{
"title": "",
"date": "",
"link": ""
},
{
"title": "",
"date": "",
"link": ""
},
{
"title": "",
"date": "",
"link": ""
}
]
}
And I am currently struggling trying to delete each 'block' in the help list.
I eventually came up with this:
import json
with open('dest_file.json', 'w') as dest_file:
    with open('source.json', 'r') as source_file:
        for line in source_file:
            element = json.loads(line.strip())
            if 'help' in element:
                del element['help']
            dest_file.write(json.dumps(element))
So I was wondering how I could delete everything in the help list without deleting the help list itself.
ty

Yes, you can replace the list with an empty one:
if 'help' in element:
    element['help'] = []

Your script has a further issue, specifically with the line for line in source_file. Reading line by line hands json.loads a single line of the file instead of the complete dictionary object, and that raises several other JSON errors.
Try this:
import json
with open('dest_file.json', 'w') as dest_file:
    with open('source.json', 'r') as source_file:
        element = json.load(source_file)
        if "help" in element:
            element['help'] = []
        dest_file.write(json.dumps(element))
This works for the example shown above, but if you have multiple nested items, you need to iterate over each one separately and fix them individually.
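For the nested case, here is a minimal sketch, assuming the goal is to empty every list stored under the key help at any depth. The function name clear_key and the sample document are made up for illustration and are not from the question.

```python
import json

def clear_key(node, key="help"):
    """Recursively replace every list stored under `key` with an empty list."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and isinstance(value, list):
                # Reassigning an existing key is safe while iterating
                node[name] = []
            else:
                clear_key(value, key)
    elif isinstance(node, list):
        for item in node:
            clear_key(item, key)
    return node

# Hypothetical nested document, invented for this example
doc = {"help": [{"title": ""}], "section": {"help": [{"title": ""}], "other": 1}}
print(json.dumps(clear_key(doc)))
```

The function changes the structure in place and also returns it, so the result can be passed straight to json.dumps.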

Related

Json count unique values

I have this json file:
[
{
"#timestamp": "",
"userID": "",
"destinationUserName": "",
"message": ": 12,050",
"name": "Purge Client Events"
},
{
"#timestamp": "",
"userID": "",
"destinationUserName": "",
"message": "",
"name": ""
},
{
"#timestamp": "",
"userID": "",
"destinationUserName": "",
"message": "",
"name": ""
},
{
"#timestamp": "",
"userID": "",
"name": "",
"sourceUserName": "",
"deviceAction": ""
}
]
I am looking for a solution that loops over the whole file, counts the unique values of userID and prints that count.
I found different solutions, but none of them worked for me and I am completely stuck.
So far this is my code; it's just a formatter that converts the file into JSON format.
After that I tried to check the length of the file and loop over it, appending unique elements.
import json
to_queue = []
def structure_json():
    with open("file.json", "r+") as f:
        old = f.read()
        f.seek(0) # rewind
        # save to the old string after replace
        new = old.replace('}{', '},{')
        f.write(new)
        tmps = '[' + str(new) + ']'
        json_string = json.loads(tmps)
        for key in json_string:
            to_queue.append(key)
        f.close
    with open('update_2.json', 'w') as file:
        json.dump(json_string, file, indent=2)
        size=len(file["UserID"])
        uniqueNames = [];
        for i in range(0,size,1):
            if(file["UserID"] not in uniqueNames):
                uniqueNames.append(file["UserID"]);
        print(uniqueNames)
structure_json()
print(to_queue)
But I get the following error:
Traceback (most recent call last):
  File "format.py", line 24, in <module>
    structure_json()
  File "format.py", line 17, in structure_json
    size=len(file["UserID"])
TypeError: '_io.TextIOWrapper' object is not subscriptable
Please any help will be much appreciated. Thank you so much for any help, and if you need any more info just let me know
Open the file and load the content. Then you can iterate over the list of dicts and create a set of all values for the key userID. Note that if any item is missing the key, item.get('userID') yields None, which is added to the set and increases the count by one.
import json
with open('your_file.json') as f:
    data = json.load(f)
users = set(item.get('userID') for item in data)
print(len(users))
print(users)
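If records without the key should not count at all, one possible variation is to filter them out before building the set. This is only a sketch: the inline string raw stands in for the file contents and its values are made up.

```python
import json

# Inline sample standing in for the file contents (values are invented)
raw = '[{"userID": "a"}, {"userID": "b"}, {"userID": "a"}, {"name": "no id"}]'
data = json.loads(raw)

# Only look at records that actually carry the key, so None is never counted
users = {item["userID"] for item in data if "userID" in item}
print(len(users))
print(sorted(users))
```

Note that an empty string is still a value, so records with "userID": "" all contribute one entry to the set.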

Format a text file in python when it finds a specific word?

I am looking to format a text file from an api request output. So far my code looks like such:
import requests
url = 'http://URLhere.com'
headers = {'tokenname': 'tokenhash'}
response = requests.get(url, headers=headers,)
with open('newfile.txt', 'w') as outf:
    outf.write(response.text)
and this creates a text file but the output is on one line.
What I am trying to do is:
Have it start a new line every time the code reaches a certain word like "id","status", or "closed_at" but unfortunately I have not been able to figure this out.
Also I am trying to get a count of how many "id" there are in the file but I think due to the formatting, the script does not like it.
The output is as follows:
{
[
{
"id": 12345,
"status": "open or close",
"closed_at": null,
"created_at": "yyyy-mm-ddTHH:MM:SSZ",
"due_date": "yyyy-mm-dd",
"notes": null,
"port": [pnumber
],
"priority": 1,
"identifiers": [
"12345"
],
"last_seen_time": "yyyy-mm-ddThh:mm:ss.sssZ",
"scanner_score": 1.0,
"fix_id": 12345,
"scanner_vulnerabilities": [
{
"port": null,
"external_unique_id": "12345",
"open": false
}
],
"asset_id": 12345
This continues on one line with the same names but for different assets.
This code :
with open ('text.txt') as text_file :
    data = text_file.read ()
    print ('\n'.join (data.split (',')))
Gives this output :
"{[{"id":12345
"status":"open or close"
"closed_at":null
"created_at":"yyyy-mm-ddTHH:MM:SSZ"
"due_date":"yyyy-mm-dd"
"notes":null
"port":[pnumber]
"priority":1
"identifiers":["12345"]
"last_seen_time":"yyyy-mm-ddThh:mm:ss.msmsmsZ"
"scanner_score":1.0
"fix_id":12345
"scanner_vulnerabilities":[{"port":null
"external_unique_id":"12345"
"open":false}]
"asset_id":12345"
And then to write it to a new file :
output = data.split (',')
with open ('new.txt', 'w') as write_file :
    for line in output :
        write_file.write (line + '\n')
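Splitting on commas also breaks up any value that contains a comma. If the API response is valid JSON, the json module can do both the formatting and the counting. That is an assumption here, since the sample shown in the question is abbreviated and would not parse as it stands. The sketch below uses a made-up string text in place of response.text, and count_key is a hypothetical helper name.

```python
import json

def count_key(node, key="id"):
    """Count how many times `key` appears as a dictionary key at any depth."""
    if isinstance(node, dict):
        return sum((name == key) + count_key(value, key)
                   for name, value in node.items())
    if isinstance(node, list):
        return sum(count_key(item, key) for item in node)
    return 0

# Made-up stand-in for response.text; the real payload is assumed to be valid JSON
text = ('[{"id": 1, "status": "open", "fix_id": 7, '
        '"scanner_vulnerabilities": [{"port": null}]}, '
        '{"id": 2, "status": "closed"}]')
parsed = json.loads(text)

# One key per line; this string could be written out with outf.write(pretty)
pretty = json.dumps(parsed, indent=2)
print(pretty)
print(count_key(parsed))
```

Because the comparison is an exact key match, fix_id and external_unique_id are not counted as id.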

Reading json in python separated by newlines

I am trying to read some json with the following format. A simple pd.read_json() returns ValueError: Trailing data. Adding lines=True returns ValueError: Expected object or value. I've tried various combinations of readlines() and load()/loads() so far without success.
Any ideas how I could get this into a dataframe?
{
"content": "kdjfsfkjlffsdkj",
"source": {
"name": "jfkldsjf"
},
"title": "dsldkjfslj",
"url": "vkljfklgjkdlgj"
}
{
"content": "djlskgfdklgjkfgj",
"source": {
"name": "ldfjkdfjs"
},
"title": "lfsjdfklfldsjf",
"url": "lkjlfggdflkjgdlf"
}
The sample you have above isn't valid JSON. To be valid JSON, these objects need to be inside a JSON array ([]) and be comma-separated, as follows:
[{
"content": "kdjfsfkjlffsdkj",
"source": {
"name": "jfkldsjf"
},
"title": "dsldkjfslj",
"url": "vkljfklgjkdlgj"
},
{
"content": "djlskgfdklgjkfgj",
"source": {
"name": "ldfjkdfjs"
},
"title": "lfsjdfklfldsjf",
"url": "lkjlfggdflkjgdlf"
}]
I just tried on my machine. When formatted correctly, it works:
>>> pd.read_json('data.json')
            content                 source           title               url
0   kdjfsfkjlffsdkj   {'name': 'jfkldsjf'}      dsldkjfslj    vkljfklgjkdlgj
1  djlskgfdklgjkfgj  {'name': 'ldfjkdfjs'}  lfsjdfklfldsjf  lkjlfggdflkjgdlf
Here is another solution if you do not want to reformat your files.
Assuming your JSON is in a string called my_json and the objects are separated by a blank line, you could do:
import json
import pandas as pd
splitted = my_json.split('\n\n')
my_list = [json.loads(e) for e in splitted]
df = pd.DataFrame(my_list)
Thanks for the ideas internet. None quite solved the problem in the way I needed (I had lots of newline characters in the strings themselves which meant I couldn't split on them) but they helped point the way. In case anyone has a similar problem, this is what worked for me:
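import json
import pandas as pd

with open('path/to/original.json', 'r') as f:
    data = f.read()
# Split after each closing brace that ends a line, then put the brace back
data = data.split("}\n")
data = [d.strip() + "}" for d in data]
# Drop the lone "}" left over from the final split
data = list(filter(("}").__ne__, data))
data = [json.loads(d) for d in data]
with open('path/to/reformatted.json', 'w') as f:
    json.dump(data, f)
df = pd.read_json('path/to/reformatted.json')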
If you can use jq, the solution is simpler:
jq -s '.' path/to/original.json > path/to/reformatted.json
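Another option that does not depend on where the newlines fall is to let the standard library's json.JSONDecoder.raw_decode find the end of each object. This is a sketch rather than something taken from the answers above; parse_concatenated and the sample string are made up for illustration.

```python
import json

def parse_concatenated(text):
    """Parse a string holding several JSON values placed one after another."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # Returns the parsed value and the index just past it
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values

# Invented sample: two objects back to back, one with a newline inside a string
sample = ('{"title": "a\\nb", "source": {"name": "x"}}\n'
          '{"title": "c", "source": {"name": "y"}}\n')
records = parse_concatenated(sample)
print(len(records))
```

The resulting list of dicts can then be passed to pd.DataFrame(records), as in the earlier answer.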

json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 21641 (char 21640)

I am getting the error as described in the title when calling the function:
def read_file(file_name):
    """Return all the followers of a user."""
    f = open(file_name, 'r')
    data = []
    for line in f:
        data.append(json.loads(line.strip()))
    f.close()
    return data
Sample data looks like this:
"from": {"name": "Ronaldo Naz\u00e1rio", "id": "618977411494412"},
"message": "The king of football. Happy birthday, Pel\u00e9!",
"type": "photo", "shares": {"count": 2585}, "created_time": "2018-10-23T11:43:49+0000",
"link": "https://www.facebook.com/ronaldo/photos/a.661211307271022/2095750393817099/?type=3",
"status_type": "added_photos",
"reactions": {"data": [], "summary": {"total_count": 51779, "viewer_reaction": "NONE"}},
"comments": {"data": [{"created_time": "2018-10-23T11:51:57+0000", ... }]}
You are trying to parse each line of the file as JSON on its own, which is probably wrong. You should read the entire file and parse it as JSON at once, preferably using with so Python handles opening and closing the file, even if an exception occurs.
The entire thing can be condensed to 2 lines thanks to json.load accepting a file object and handling the reading of it on its own.
def read_file(file_name):
    with open(file_name) as f:
        return json.load(f)
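If the whole file still fails to parse, it can help to look at the text around the position the error reports. json.JSONDecodeError carries msg, pos, lineno and colno attributes; the helper below is a hypothetical sketch that uses them, and the broken string is invented for the example.

```python
import json

def load_with_context(text, width=20):
    """Parse JSON text; on failure, report where the parser stopped."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # Show a window of the input around the failing character
        start = max(err.pos - width, 0)
        snippet = text[start:err.pos + width]
        raise ValueError(
            f"{err.msg} at line {err.lineno} column {err.colno}: {snippet!r}"
        ) from err

# Invented input with a missing comma between two members
broken = '{"a": 1 "b": 2}'
try:
    load_with_context(broken)
except ValueError as exc:
    print(exc)
```

For a file, the same function can be called with the result of f.read().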

Unable to format the output of a link in python

I am trying to access weather API data. It returns one long, barely human-readable line. I am trying to replace every opening brace ({) with '{\n' so that the brace remains but is followed by a newline character, just to make the JSON more readable.
But it prints every character on a new line in the shell.
import urllib2
url2 = 'http://api.openweathermap.org/data/2.5/find?q=london,PK&units=metric'
data = urllib2.urlopen(url2)
s = data.read()
count = 0
s = s.replace('{',"{\n")
#s = ''.join(s)
for line in s:
    print line
    count = count + 1
print count
after join() the problem still persists.
The problematic output after this code is like this
Why don't you use the built-in capabilities of the json library that's standard in Python?
import urllib2
import json
url2 = 'http://api.openweathermap.org/data/2.5/find?q=london,PK&units=metric'
data = urllib2.urlopen(url2)
# read the contents in and parse the JSON.
jsonData = json.loads(data.read())
# print it out nicely formatted:
print json.dumps(jsonData, sort_keys=True, indent=4, separators=(',', ': '))
output:
{
"cod": "200",
"count": 1,
"list": [
{
"clouds": {
"all": 20
},
"coord": {
"lat": 38.7994,
"lon": -89.9603
},
"dt": 1442072098,
"id": 4237717,
"main": {
"humidity": 67,
"pressure": 1020,
"temp": 16.82,
"temp_max": 18.89,
"temp_min": 15
},
"name": "Edwardsville",
"sys": {
"country": "United States of America"
},
"weather": [
{
"description": "few clouds",
"icon": "02d",
"id": 801,
"main": "Clouds"
}
],
"wind": {
"deg": 350,
"speed": 4.6
}
}
],
"message": "accurate"
}
The issue is here:
for line in s:
    print line
At this point, it will print every character on a separate line - that's what print does (it adds a trailing newline to each print command), as shown by this code:
print 1
print
print 2
which outputs this:
1

2
You may be confused by the name line, but it's not a special variable name. You can change the word line to any valid variable name and it will work the same way.
A for loop iterates over an iterable. For a file, it yields each line; for a list, each element; and for a string, each character. Because you print inside the loop, each one is printed individually.
Are you expecting a non-string response from the API? If it gives a list like this:
["calls=10","message=hello"]
then your for loop will print each in turn. But if it's just a string, like "message=hello" it will print each character.
And the reason there is a blank newline after the {? Because the replace command is working fine.
s is just a string, so doing for x in s actually iterates over individual characters of s, not over its lines. I think you're confusing it with for line in f when f is a file object!
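The difference is easy to see side by side. This is a small sketch in Python 3 syntax (the code above is Python 2), with a made-up string in place of the API response:

```python
# Made-up stand-in for the API response after the replace() call
s = '{"a": 1}'.replace('{', '{\n')

chars = [c for c in s]   # iterating a string yields single characters
lines = s.splitlines()   # splitlines() yields whole lines

print(len(chars))
print(lines)
```

So looping over s.splitlines() instead of s prints one line per iteration.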
