Json count unique values - python

I have this json file:
[
{
"#timestamp": "",
"userID": "",
"destinationUserName": "",
"message": ": 12,050",
"name": "Purge Client Events"
},
{
"#timestamp": "",
"userID": "",
"destinationUserName": "",
"message": "",
"name": ""
},
{
"#timestamp": "",
"userID": "",
"destinationUserName": "",
"message": "",
"name": ""
},
{
"#timestamp": "",
"userID": "",
"name": "",
"sourceUserName": "",
"deviceAction": ""
}
]
I am looking for a solution that loops over the whole file, counts the unique values for userID, and prints that count.
I found several solutions, but none of them worked for me and I am completely stuck.
So far this is my code. It is just a formatter that converts the file into valid JSON.
After that I tried to check the length of the file and loop over it, appending unique elements.
import json
to_queue = []
def structure_json():
    with open("file.json", "r+") as f:
        old = f.read()
        f.seek(0) # rewind
        # save to the old string after replace
        new = old.replace('}{', '},{')
        f.write(new)
        tmps = '[' + str(new) + ']'
        json_string = json.loads(tmps)
        for key in json_string:
            to_queue.append(key)
        f.close
    with open('update_2.json', 'w') as file:
        json.dump(json_string, file, indent=2)
        size=len(file["UserID"])
        uniqueNames = [];
        for i in range(0,size,1):
            if(file["UserID"] not in uniqueNames):
                uniqueNames.append(file["UserID"]);
        print(uniqueNames)
structure_json()
print(to_queue)
But I get the following error:
Traceback (most recent call last):
File "format.py", line 24, in <module>
structure_json()
File "format.py", line 17, in structure_json
size=len(file["UserID"])
TypeError: '_io.TextIOWrapper' object is not subscriptable
Please any help will be much appreciated. Thank you so much for any help, and if you need any more info just let me know

Open the file and load the content. Then iterate over the list of dicts and create a set of all the values for the key userID. Note that item.get('userID') returns None for any record that is missing the key, so if at least one record lacks it, None is counted as one extra value (+1).
import json
with open('your_file.json') as f:
    data = json.load(f)
users = set(item.get('userID') for item in data)
print(len(users))
print(users)
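If records without a userID should not count as a user, one option is to skip them while building the set. This is only a minimal sketch: the helper name count_unique_users and the sample records are made up for illustration, and it assumes the file has already been loaded into a list of dicts.

```python
def count_unique_users(records, key="userID"):
    """Return the distinct values stored under `key`, skipping records that lack it."""
    return {item[key] for item in records if key in item}


# Hypothetical sample data standing in for the result of json.load(f)
records = [
    {"userID": "alice", "name": "Purge Client Events"},
    {"userID": "bob", "name": ""},
    {"userID": "alice", "name": ""},
    {"name": "record without a userID"},
]

users = count_unique_users(records)
print(len(users))  # 2
print(users)
```

Empty strings are still counted as a value here; filter them out as well if they mean "no user" in your data.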

Related

Python List Comprehension to Delete Dict in List

{
"Credentials": [
{
"realName": "Jimmy John",
"toolsOut": null,
"username": "291R"
},
{
"realName": "Grant Hanson",
"toolsOut": null,
"username": "98U9"
},
{
"realName": "Gill French",
"toolsOut": null,
"username": "F114"
}
]
}
I have a json file formatted as above and I am trying to have a function delete an entry based on the username input from a user. I want the file to then be overwritten with the entry removed.
I have tried this:
def removeUserFunc(SID):
    print(SID.get())
    with open('Credentials.json', 'r+') as json_file:
        data = json.load(json_file)
        data['Credentials'][:] = [item for item in data['Credentials'] if item['username'] != SID.get()]
        json_file.seek(0)
        json.dump(data, json_file, indent=3, sort_keys=True)
It partially works as everything in the Credentials section looks normal, but it appends a strange copied piece to the end that breaks the JSON. Say I was removing Grant and I ran this code, my JSON looks like below:
{
"Credentials": [
{
"realName": "Marcus Koga",
"toolsOut": null,
"username": "291F"
},
{
"realName": "Gill French",
"toolsOut": null,
"username": "F114"
}
]
} "realName": "Gill French",
"toolsOut": null,
"username": "F114"
}
]
}
I am relatively new to Python and editing JSONs as well.
You need to truncate the file after writing:
def removeUserFunc(SID):
    print(SID.get())
    with open('Credentials.json', 'r+') as json_file:
        data = json.load(json_file)
        data['Credentials'][:] = [item for item in data['Credentials'] if item['username'] != SID.get()]
        json_file.seek(0)
        json.dump(data, json_file, indent=3, sort_keys=True)
        json_file.truncate()
Alternatively, you can close the file and then reopen it in write-only mode, which clears the contents of the original file before the new contents are written:
def removeUserFunc(SID):
    # open file for reading
    with open('Credentials.json', 'r') as json_file:
        # read the data to variable:
        data = json.load(json_file)
    # open file for writing:
    with open('Credentials.json', 'w') as json_file:
        data['Credentials'][:] = [item for item in data['Credentials'] if item['username'] != SID.get()]
        json.dump(data, json_file, indent=3, sort_keys=True)
file.seek(0) just moves the file pointer but does not change the length of the file. So if you do not overwrite the entire contents of the file you will leave residual content. Use file.truncate() to set the file length to zero before writing.
json_file.seek(0)
json_file.truncate() # set file length to zero
json.dump(data,json_file,indent=3,sort_keys=True)
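The leftover-bytes effect can be checked on a throwaway file. The following is a rough sketch rather than a drop-in replacement: rewrite_json, drop_98u9 and the sample credentials are hypothetical names and data, and the file is created with tempfile so that nothing real is touched.

```python
import json
import os
import tempfile


def rewrite_json(path, transform):
    """Load JSON from `path`, apply `transform`, and overwrite the file in place."""
    with open(path, "r+") as fh:
        data = transform(json.load(fh))
        fh.seek(0)      # go back to the start of the file
        json.dump(data, fh, indent=3, sort_keys=True)
        fh.truncate()   # cut off what remains of the old, longer content
    return data


def drop_98u9(data):
    """Remove the made-up user 98U9 from the credentials list."""
    data["Credentials"] = [c for c in data["Credentials"] if c["username"] != "98U9"]
    return data


# Throwaway file with made-up credentials
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as fh:
    json.dump({"Credentials": [{"username": "291R"}, {"username": "98U9"}]}, fh, indent=3)

rewrite_json(path, drop_98u9)

with open(path) as fh:
    result = json.load(fh)  # raises JSONDecodeError if stale bytes were left behind
os.remove(path)
print(result)
```

Without the truncate() call, the second json.load would fail on the trailing remains of the original file, which is the symptom shown in the question.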

Delete specific content in a json file

I have this json file :
{
"help": [
{
"title": "",
"date": "",
"link": ""
},
{
"title": "",
"date": "",
"link": ""
},
{
"title": "",
"date": "",
"link": ""
}
]
}
And I am currently struggling trying to delete each 'block' in the help list.
I eventually came up with this:
import json
with open('dest_file.json', 'w') as dest_file:
    with open('source.json', 'r') as source_file:
        for line in source_file:
            element = json.loads(line.strip())
            if 'help' in element:
                del element['help']
            dest_file.write(json.dumps(element))
So I was wondering how could I delete each thing in the help list, without deleting the help list.
ty
Yes, you can replace the value with an empty list instead of deleting the key:
if 'help' in element:
    element['help'] = []
There is a further issue in the script, specifically the line "for line in source_file". Reading line by line gives you one line of text at a time, not the complete JSON object, so json.loads fails on each fragment with a parsing error. Load the whole file in one call instead.
Try this:
import json
with open('dest_file.json', 'w') as dest_file:
    with open('source.json', 'r') as source_file:
        element = json.load(source_file)
        if "help" in element:
            element['help'] = []
        dest_file.write(json.dumps(element))
This works for the above example shown but if you have multiple nested items, then you need to iterate over each separately and fix them individually.
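For the nested case, one possible approach is a small recursive walk over the parsed structure. This is only a sketch under the assumption that every list stored under the key "help" should be emptied wherever it appears; clear_lists and the sample document are invented for illustration.

```python
def clear_lists(node, key="help"):
    """Empty every list stored under `key`, at any nesting depth, in place."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and isinstance(value, list):
                node[name] = []          # keep the key, drop its items
            else:
                clear_lists(value, key)  # look deeper
    elif isinstance(node, list):
        for child in node:
            clear_lists(child, key)
    return node


# Hypothetical document with a nested "help" list
doc = {
    "help": [{"title": "a"}],
    "section": {"help": [{"title": "b"}], "other": [1, 2]},
}
print(clear_lists(doc))
# {'help': [], 'section': {'help': [], 'other': [1, 2]}}
```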

Format a text file in python when it finds a specific word?

I am looking to format a text file from an api request output. So far my code looks like such:
import requests
url = 'http://URLhere.com'
headers = {'tokenname': 'tokenhash'}
response = requests.get(url, headers=headers,)
with open('newfile.txt', 'w') as outf:
    outf.write(response.text)
and this creates a text file but the output is on one line.
What I am trying to do is:
Have it start a new line every time the code reaches a certain word like "id","status", or "closed_at" but unfortunately I have not been able to figure this out.
Also I am trying to get a count of how many "id" there are in the file but I think due to the formatting, the script does not like it.
The output is as follows:
{
[
{
"id": 12345,
"status": "open or close",
"closed_at": null,
"created_at": "yyyy-mm-ddTHH:MM:SSZ",
"due_date": "yyyy-mm-dd",
"notes": null,
"port": [pnumber
],
"priority": 1,
"identifiers": [
"12345"
],
"last_seen_time": "yyyy-mm-ddThh:mm:ss.sssZ",
"scanner_score": 1.0,
"fix_id": 12345,
"scanner_vulnerabilities": [
{
"port": null,
"external_unique_id": "12345",
"open": false
}
],
"asset_id": 12345
This continues on one line with the same names but for different assets.
This code:
with open('text.txt') as text_file:
    data = text_file.read()
print('\n'.join(data.split(',')))
Gives this output:
"{[{"id":12345
"status":"open or close"
"closed_at":null
"created_at":"yyyy-mm-ddTHH:MM:SSZ"
"due_date":"yyyy-mm-dd"
"notes":null
"port":[pnumber]
"priority":1
"identifiers":["12345"]
"last_seen_time":"yyyy-mm-ddThh:mm:ss.msmsmsZ"
"scanner_score":1.0
"fix_id":12345
"scanner_vulnerabilities":[{"port":null
"external_unique_id":"12345"
"open":false}]
"asset_id":12345"
And then to write it to a new file:
output = data.split(',')
with open('new.txt', 'w') as write_file:
    for line in output:
        write_file.write(line + '\n')
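Splitting on commas also breaks apart any value that happens to contain a comma. If the API response is valid JSON (an assumption, since the sample shown in the question is abbreviated), the json module can do both the line breaking and the counting. A minimal sketch with a made-up payload standing in for response.text:

```python
import json

# Made-up payload standing in for response.text; assumes the API returns valid JSON
raw = (
    '[{"id": 1, "status": "open", "closed_at": null},'
    ' {"id": 2, "status": "closed", "closed_at": "2020-01-01"}]'
)

records = json.loads(raw)
pretty = json.dumps(records, indent=2)               # one key per line
id_count = sum(1 for rec in records if "id" in rec)  # top-level "id" keys only

print(pretty)
print(id_count)  # 2
```

The pretty string can then be written out with outf.write(pretty) in place of the raw response text.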

json with array

I am trying to create a dict like this:
{
"Name": "string",
"Info": [
{
"MainID": "00000000-0000-0000-0000-000000000000",
"Values": [
{
"IndvidualID": "00000000-0000-0000-0000-000000000000",
"Result": "string"
},
{
"IndvidualID": "00000000-0000-0000-0000-000000000000",
"Result": "string"
}
]
}
]
}
Where the Values section has 100+ things inside of it. I put 2 as an example.
Unsure how to build this dynamically. Code I have attempted so far:
count = 0
with open('myTextFile.txt') as f:
    line = f.readline()
    line = line.rstrip()
    myValues[count]["IndvidualID"] = count
    myValues[count]["Result"] = line
    count = count +1
data = {"Name": "Demo",
        "Info": [{
            "MainID":TEST_SUITE_ID,
            "Values":myValues
        }}
This does not work, however: it fails with "KeyError: 0". It works if I do myValues[count] = count, but as soon as I add the extra layer, myValues[count]["IndvidualID"] = count, it breaks. I have seen some examples that set a list in there, but I need a list (Values) whose items each hold several things (ID and Result). I have tried a few things with no luck. Does anyone have a simple way to do this in Python?
Full traceback:
Traceback (most recent call last):
File "testExtractor.py", line 28, in <module>
myValues[count]["IndvidualID"] = count
KeyError: 0
If I add a few bits and pieces I can get this code to run:
count = 0
myValues = []
with open('myTextFile.txt') as f:
    for line in f:
        line = line.rstrip()
        d = {"IndvidualID":count, "Result":line}
        myValues.append(d)
        count = count + 1
TEST_SUITE_ID = 1
data = {"Name": "Demo",
        "Info": [{
            "MainID":TEST_SUITE_ID,
            "Values":myValues
        }]}
Output:
{'Name': 'Demo', 'Info': [{'MainID': 1,
'Values': [{'IndvidualID': 0, 'Result': 'apples'}, {'IndvidualID': 1, 'Result': 'another apples'}]
}]}
Note:
I have defined myValues as an empty list. I iterate over the file to read all lines and I create a new dict for each line, appending to myValues each time. Finally I create the whole data object with myValues embedded inside.
To get the above output I actually substituted: f = ['apples','another apples'] for the open file, but I'm sure an actual file would work.
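The same structure can also be built without a manual counter by using enumerate and a list comprehension. This is just an alternative sketch: build_payload is a hypothetical helper, and a plain list of strings stands in for the open file, as in the test above.

```python
import json


def build_payload(lines, main_id, name="Demo"):
    """Build the nested dict, one Values entry per input line."""
    values = [
        {"IndvidualID": index, "Result": line.rstrip()}
        for index, line in enumerate(lines)
    ]
    return {"Name": name, "Info": [{"MainID": main_id, "Values": values}]}


# A list of strings stands in for the lines of myTextFile.txt
payload = build_payload(["apples\n", "another apples\n"], main_id=1)
print(json.dumps(payload, indent=2))
```

Passing an open file object as lines works the same way, since iterating over a file yields its lines.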

TypeError: list indices must be integers or slices, not str - get from json

I am trying to get all latitudes and longitudes from this JSON.
Code:
import urllib.parse
import requests
raw_json = 'http://live.ksmobile.net/live/getreplayvideos?userid='
print()
userid = 735890904669618176
#userid = input('UserID: ')
url = raw_json + urllib.parse.urlencode({'userid=': userid}) + '&page_size=1000'
print(url)
json_data = requests.get(url).json()
print()
for coordinates in json_data['data']['video_info']:
    print(coordinates['lat'], coordinates['lnt'])
print()
Error:
/usr/bin/python3.6 /media/anon/3D0B8DD536C9574F/PythonProjects/getLocation/getCoordinates
http://live.ksmobile.net/live/getreplayvideos?userid=userid%3D=735890904669618176&page_size=1000
Traceback (most recent call last):
File "/media/anon/3D0B8DD536C9574F/PythonProjects/getLocation/getCoordinates", line 17, in <module>
for coordinates in json_data['data']['video_info']:
TypeError: list indices must be integers or slices, not str
Process finished with exit code 1
Where do I go wrong?
In advance, thanks for your help and time.
I just post some of the json to show what it looks like.
The json looks like this:
{
"status": "200",
"msg": "",
"data": {
"time": "1499275646",
"video_info": [
{
"vid": "14992026438883533757",
"watchnumber": "38",
"topicid": "0",
"topic": "",
"vtime": "1499202678",
"title": "happy 4th of july",
"userid": "735890904669618176",
"online": "0",
"addr": "",
"isaddr": "2",
"lnt": "-80.1282576",
"lat": "26.2810628",
"area": "A_US",
"countryCode": "US",
"chatSystem": "1",
},
Full json: https://pastebin.com/qJywTqa1
Your URL construction is incorrect. The URL you have built (as shown in the output of your script) is:
http://live.ksmobile.net/live/getreplayvideos?userid=userid%3D=735890904669618176&page_size=1000
Where you actually want this:
http://live.ksmobile.net/live/getreplayvideos?userid=735890904669618176&page_size=1000
So you were actually getting this JSON in your response:
{
"status": "200",
"msg": "",
"data": []
}
Here "data" is an empty list rather than a dict, so indexing it with the string 'video_info' raises the TypeError you were seeing.
Here is the corrected script:
import urllib.parse
import requests
raw_json = 'http://live.ksmobile.net/live/getreplayvideos?'
print()
userid = 735890904669618176
#userid = input('UserID: ')
url = raw_json + urllib.parse.urlencode({'userid': userid}) + '&page_size=1000'
print(url)
json_data = requests.get(url).json()
print()
for coordinates in json_data['data']['video_info']:
    print(coordinates['lat'], coordinates['lnt'])
print()
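To make this kind of mistake easier to spot, the whole query string can be handed to urlencode, and the response can be checked before it is indexed. The sketch below is hedged: build_url and extract_coordinates are hypothetical helpers, no request is sent, and it assumes (based on the response shown above) that the service returns "data" as an empty list when nothing matches.

```python
import urllib.parse

BASE_URL = 'http://live.ksmobile.net/live/getreplayvideos'


def build_url(userid, page_size=1000):
    """Let urlencode produce the whole query string, including the '=' and '&'."""
    query = urllib.parse.urlencode({'userid': userid, 'page_size': page_size})
    return BASE_URL + '?' + query


def extract_coordinates(payload):
    """Return (lat, lnt) pairs, or an empty list if 'data' is not a dict."""
    data = payload.get('data')
    if not isinstance(data, dict):
        return []  # e.g. "data": [] when the query matched nothing
    return [(video['lat'], video['lnt']) for video in data.get('video_info', [])]


print(build_url(735890904669618176))
print(extract_coordinates({'status': '200', 'msg': '', 'data': []}))
```

With a real response, extract_coordinates(requests.get(url).json()) would yield the pairs to print.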
According to the JSON you posted, the problem is in this statement:
print(coordinates['lat'], coordinates['lnt'])
Here coordinates is a list containing a single item, which is a dictionary, so the statement should be:
print(coordinates[0]['lat'], coordinates[0]['lnt'])
