Number plate detection JSON dataset - python

I am new to JSON. I am doing a project for Vehicle Number Plate Detection.
I have a dataset of the form:
{"content": "http://com.dataturks.a96-i23.open.s3.amazonaws.com/2c9fafb0646e9cf9016473f1a561002a/77d1f81a-bee6-487c-aff2-0efa31a9925c____bd7f7862-d727-11e7-ad30-e18a56154311.jpg.jpeg","annotation":[{"label":["number_plate"],"notes":"","points":[{"x":0.7220843672456576,"y":0.5879828326180258},{"x":0.8684863523573201,"y":0.6888412017167382}],"imageWidth":806,"imageHeight":466}],"extras":null},
{"content": "http://com.dataturks.a96-i23.open.s3.amazonaws.com/2c9fafb0646e9cf9016473f1a561002a/4eb236a3-6547-4103-b46f-3756d21128a9___06-Sanjay-Dutt.jpg.jpeg","annotation":[{"label":["number_plate"],"notes":"","points":[{"x":0.16194331983805668,"y":0.8507795100222717},{"x":0.582995951417004,"y":1}],"imageWidth":494,"imageHeight":449}],"extras":null},
There are in total 240 blocks of data.
I want to do two things with the above dataset.
Firstly,I need to download all the images from each block and secondly,need to get the values of "points" column to a text file.
I am getting problem while getting the values for the columns.
import json
jsonFile = open('Indian_Number_plates.json', 'r')
x = json.load(jsonFile)
for criteria in x['annotation']:
for key, value in criteria.iteritems():
print(key, 'is:', value)
print('')
I have written the above code to get all the values under the "annotation".
But,getting the following error
Traceback (most recent call last):
File "prac.py", line 13, in <module>
x = json.load(jsonFile)
File "C:\python364\Lib\json\__init__.py", line 299, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "C:\python364\Lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\python364\Lib\json\decoder.py", line 342, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 394 (char 393)
Please help me for getting the values for "points" column and also for downloading the images from the link in the "content" section.

i found this answer while searching. Essentially, you can read an object, catch the exception when JSON sees an unexpected object, and then seek/reparse and build a list of objects.
in Java, i'd just tell you to use Jackson and their SAX style streaming interface, as i've done that to read a list of objects formatted like this - if JSON in python has a streaming api, i'd use that instead of the exception handler workaround

the error comes because your file contains two records or more :
{"content": "http://com.dataturks.a96- } ..... {"content": .....
to solve this you should reformat your json so that all the records are contained in an array :
{ "data" : [ {"content": "http://com.dataturks.a96- .... },{"content":... }]}
to download the images, extract the image names and urls and use requests :
import requests
with open(image_name, 'wb') as handle:
response = requests.get(pic_url, stream=True)
if not response.ok:
print response
for block in response.iter_content(1024):
if not block:
break
handle.write(block)

Related

How to shuffle big JSON file?

I have a JSON file with 1 000 000 entries in it (Size: 405 Mb). It looks like that:
[
{
"orderkey": 1,
"name": "John",
"age": 23,
"email": "john#example.com"
},
{
"orderkey": 2,
"name": "Mark",
"age": 33,
"email": "mark#example.com"
},
...
]
The data is sorted by "orderkey", I need to shuffle data.
I tried to apply the following Python code. It worked for smaller JSON file, but did not work for my 405 MB one.
import json
import random
with open("sorted.json") as f:
data = json.load(f)
random.shuffle(data)
with open("sorted.json") as f:
json.dump(data, f, indent=2)
How to do it?
UPDATE:
Initially I got the following error:
~/Desktop/shuffleData$ python3 toShuffle.py
Traceback (most recent call last):
File "/home/andrei/Desktop/shuffleData/toShuffle.py", line 5, in <module>
data = json.load(f)
File "/usr/lib/python3.10/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.10/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 403646259 (char 403646258)
Figured out that the problem was that I had "}" in the end of JSON file. I had [{...},{...}]} that was not valid.
Removing "}" fixed the problem.
Figured out that the problem was that I had "}" in the end of JSON file. I had [{...},{...}]} that was not valid format.
Removing "}" in the end fixed the problem.
Provided python code works.
Well this should ideally work unless you have memory constraints.
import random
random.shuffle(data)
In case you are looking for another way and would like to benchmark which is faster for the huge set, you can use the sci-kit learn libraries shuffle function.
from sklearn.utils import shuffle
shuffled_data = shuffle(data)
print(shuffled_data)
Note: Additional package has to be installed called Scikit learn. (pip install -U scikit-learn)

Python errors when trying to read and query a JSON file

I am trying to write a Python function as part of my job to be able to check the existence of data in a JSON file which I can only get by downloading it from a website. I am the only resource here with any coding or scripting experience (HTML, CSS & SQL) so this has fallen to me to sort out. I have no experience thus far with Python.
I am not allowed to change the structure or format of the JSON file, the format of it is:
{
"naglowek": {
"dataGenerowaniaDanych": "20210514",
"liczbaTransformacji": "5000",
"schemat": "RRRRMMDDNNNNNNNNNNBBBBBBBBBBBBBBBBBBBBBBBBBB"
},
"skrotyPodatnikowCzynnych": [
"examplestring1",
"examplestring2",
"examplestring3",
"examplestring4",
],
"maski": [
"examplemask1",
"examplemask2",
"examplemask3",
"examplemask4"
]
}
I have tried numerous examples found online but none of them seem to work. From looking at various websites the Python code I have is:
import json
with open('20210514.json') as myfile:
data = json.load(myfile)
print(data)
keyVal = 'examplestring2'
if keyVal in data:
# Print the success message and the value of the key
print("Data is found in JSON data")
else:
# Print the message if the value does not exist
print("Data is not found in JSON data")
But I am getting these errors below, I am a complete newbie to Python so am having trouble deciphering them:
D:\PycharmProjects\venv\Scripts\python.exe D:/PycharmProjects/json_test.py
Traceback (most recent call last):
File "D:\PycharmProjects\json_test.py", line 4, in <module>
data = json.load(myfile)
File "C:\Users\xyz\AppData\Local\Programs\Python\Python39\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
File "C:\Users\xyz\AppData\Local\Programs\Python\Python39\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\xyz\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\xyz\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 12 column 5 (char 921)
Process finished with exit code 1
Any help would be massively appreciated!
{
"naglowek": {
"dataGenerowaniaDanych": "20210514",
"liczbaTransformacji": "5000",
"schemat": "RRRRMMDDNNNNNNNNNNBBBBBBBBBBBBBBBBBBBBBBBBBB"
},
"skrotyPodatnikowCzynnych": [
"examplestring1",
"examplestring2",
"examplestring3",
"examplestring4"
],
"maski": [
"examplemask1",
"examplemask2",
"examplemask3",
"examplemask4"
]
}
This should work. The problem here is that you have a comma at the end of a list which your parser can't handle. ECMAScript 5 introduced the ability to parse that. But apparently JSON in general doesn't support it (yet?). So, make sure to not have a comma at the end of a list.
For your if-else statement to be correct, you'd have to change it to something like this:
keyVal = 'examplestring2'
keyName = 'skrotyPodatnikowCzynnych'
if keyName in data.keys() and keyval in data[keyName]:
# Print the success message and the value of the key
print("Data is found in JSON data")
else:
# Print the message if the value does not exist
print("Data is not found in JSON data")
Remove the trailing comma. JSON specification does not allow a trailing comma
If you don't want to change the file structure then you have to do this:
import yaml
with open('20210514.json') as myfile:
data = yaml.load(myfile, Loader=yaml.FullLoader)
print(data)
You also need to install yaml first.
https://pyyaml.org/

How to get python script properly working to take input from .txt files and return the number of positives

I have a python script that is supposed to take a directory full of .txt files and determine if each .txt file return positive or negative for matching certain text statements inside the file itself like "known infection source". However, my script doesn't work and returns the following error message. Any help would be greatly appreciated!
Sample JSON file text
{
"detected_referrer_samples": [
{
"positives": 1,
"sha256": "325f928105efb4c227be1a83fb3d0634ec5903bdfce2c3580ad113fc0f15373c",
"total": 52
},
{
"positives": 20,
"sha256": "48d85943ea9cdd1e480d73556e94d8438c1b2a8a30238dff2c52dd7f5c047435",
"total": 53
}
],
"detected_urls": [],
"domain_siblings": [],
"resolutions": [],
"response_code": 1,
"verbose_msg": "Domain found in dataset",
"whois": null
}
Error
Traceback (most recent call last):
File "vt_reporter1.py", line 35, in <module>
print(vt_result_check(path))
File "vt_reporter1.py", line 20, in vt_result_check
vt_result |= any(sample['positives'] > 0 for sample_type in sample_types
File "vt_reporter1.py", line 21, in <genexpr>
for sample in vt_data.get(sample_type, []))
AttributeError: 'list' object has no attribute 'get'
Code
import os
import json
import csv
path=r'./output/'
csvpath='C:/Users/bwerner/Documents'
def vt_result_check(path):
vt_result = False
for filename in os.listdir(path):
with open(path + filename, 'r') as vt_result_file:
vt_data = json.load(vt_result_file)
# Look for any positive detected referrer samples
# Look for any positive detected communicating samples
# Look for any positive detected downloaded samples
# Look for any positive detected URLs
sample_types = ('detected_referrer_samples', 'detected_communicating_samples',
'detected_downloaded_samples', 'detected_urls')
vt_result |= any(sample['positives'] > 0 for sample_type in sample_types
for sample in vt_data.get(sample_type, []))
# Look for a Dr. Web category of known infection source
vt_result |= vt_data.get('Dr.Web category') == "known infection source"
# Look for a Forecepoint ThreatSeeker category of elevated exposure
# Look for a Forecepoint ThreatSeeker category of phishing and other frauds
# Look for a Forecepoint ThreatSeeker category of suspicious content
threats = ("elevated exposure", "phishing and other frauds", "suspicious content")
vt_result |= vt_data.get('Forcepoint ThreatSeeker category') in threats
return vt_result
if __name__ == '__main__':
print(vt_result_check(path))
with open(csvpath, 'w') as csvfile:
writer.writerow([vt_result_check(path)])
The error tells you everything you need to know about what is going wrong, which is that you cannot call the get() function on a list. In Python, the get() function can only be used with dictionaries, which are different from lists. Instead of using the get() function, call a specific index of the list and your program should work. For example:
for sample in list[10:11]
which returns the 11th element of the list.
Can you post the contents of the file, or some text that represents the content of the file that is being read in?
Here's some feedback based on what is seen in the code you posted:
vt_result_file contains valid JSON
And whatever it is reading, is being read into python as a List. We can determine this because of the error that you are receiving. Look at the last line of the error:
AttributeError: 'list' object has no attribute 'get'
It says that you are trying to access the "get" attribute on a "list". Looking at your code, we can see that you are calling "get" on "vt_data" three times:
Once as vt_data.get(sample_type, [])
Another time as vt_data.get('Dr.Web category')
And finally as vt_data.get('Forcepoint ThreatSeeker category')
Per the error message, your variable vt_data is a list and not a dictionary.
So you need to ask yourself:
Were you expecting vt_result_file to contain a dictionary? If so, open the file and examine what is contained there, and turn it into a dictionary.
Unfortunately, without seeing the contents of this file, it is hard to suggest what you need to change to fix this error.

Extract JSON Data in Python - Example Code Included

I am brand new to using JSON data and fairly new to Python. I am struggling with being able to parse the following JSON data in Python, in order to import the data into a SQL Server database. I already have a program that will import the parsed data into sql server using PYDOBC, however I can't for the life of me figure out how to correctly parse the JSON data into a Python dictionary.
I know there are a number of threads that address this issue, however I was unable to find any examples of the same JSON data structure. Any help would be greatly appreciated as I am completely stuck on this issue. Thank you SO! Below is a cut of the JSON data I am working with:
{
"data":
[
{
"name": "Mobile Application",
"url": "https://www.example-url.com",
"metric": "users",
"package": "example_pkg",
"country": "USA",
"data": [
[ 1396137600000, 5.76 ],
[ 1396224000000, 5.79 ],
[ 1396310400000, 6.72 ],
....
[ 1487376000000, 7.15 ]
]
}
],"as_of":"2017-01-22"}
Again, I apologize if this thread is repetitive, however as I mentioned above, I was not able to work out the logic from other threads as I am brand new to using JSON.
Thank you again for any help or advice in regard to this.
import json
with open("C:\\Pathyway\\7Park.json") as json_file:
data = json.load(json_file)
assert data["data"][0]["metric"] == "users"
The above code results with the following error:
Traceback (most recent call last):
File "JSONpy", line 10, in <module>
data = json.load(json_file)
File "C:\json\__init__.py", line 291, in load
**kw)
File "C:\json\__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "C:\json\decoder.py", line 367, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 7 column 1 (char 23549 - 146249)
Assuming the data you've described (less the ... ellipsis) is in a file called j.json, this code parses the JSON document into a Python object:
import json
with open("j.json") as json_file:
data = json.load(json_file)
assert data["data"][0]["metric"] == "users"
From your error message it seems possible that your file is not a single JSON document, but a sequence of JSON documents separated by newlines. If that is the case, then this code might be more helpful:
import json
with open("j.json") as json_file:
for line in json_file:
data = json.loads(line)
print (data["data"][0]["metric"])

Python -- get at JSON info that's written like XML

In Python, I usually do simple JSON with this sort of template:
url = "url"
file = urllib2.urlopen(url)
json = file.read()
parsed = json.loads(json)
and then get at the variables with calls like:
parsed[obj name][value name]
But, this works with JSON that's formatted roughly like:
{'object':{'index':'value', 'index':'value'}}
The JSON I just encountered is formatted like:
{'index':'value', 'index':'value'},{'index':'value', 'index':'value'}
so there are no names for me to reference the different blocks. Of course the blocks give different info, but have the same "keys" -- much like XML is usually formatted. Using my method above, how would I parse through this JSON?
The following is not a valid JSON.
{'index':'value', 'index':'value'},{'index':'value', 'index':'value'}
Where as
[{'index':'value', 'index':'value'},{'index':'value', 'index':'value'}] is a valid JSON.
and python trackback shows that
import json
string = "{'index':'value', 'index':'value'},{'index':'value', 'index':'value'}"
parsed = json.loads(string)
print parsed
Traceback (most recent call last):
File "/Users/tron/Desktop/test3.py", line 3, in <module>
parsed_json = json.loads(json_string)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 27 - line 1 column 54 (char 26 - 53)
[Finished in 0.0s with exit code 1]
where is if you do
json_string = '[{"a":"value", "b":"value"},{"a":"value", "b":"value"}]'
everything works fine.
If that is the case, you can refer to it as an array of Jsons. where json_string[0] is the first JSON string. json_string[1] is the second and so on.
Otherwise if you think this is going to be an issue that you "just have to deal with". Here is one option:
Think of the ways JSON can be malformed and write a simple class to account for them. In the case above, here is a hacky way you can deal with it.
import json
json_string = '{"a":"value", "b":"value"},{"a":"value", "b":"value"}'
def parseJson(string):
parsed_json = None
try:
parsed_json = json.loads(string)
print parsed_json
except ValueError, e:
print string, "didnt parse"
if "Extra data" in str(e.args):
newString = "["+string+"]"
print newString
return parseJson(newString)
You could add more if/else to deal with various things you run into. I have to admit, this is very hacky and I don't think you can ever account for every possible mutation.
Good luck
The result must be list of dict:
[{'index1':'value1', 'index2':'value2'},{'index1':'value1', 'index2':'value2'}]
thus you can reference it using numbers: item[1]['index1']

Categories

Resources