JSON Data Masking using PARANOID - python

I have just started with Python programming & working on https://pypi.org/project/PARANOID/ to mask the PII details such as first_name, last_name & email_address.
{
    "id": 324324,
    "first_name": "John",
    "last_name": "Smith",
    "email": "john.smith@abc.com"
}
When I run paranoid -i my.json -o output, every field in my JSON (id, first_name, last_name and email) gets masked, but I don't want to mask id. For that, -l has to be given an XPath into the JSON.
I have tried various XPath combinations, but it still masks every field in the file.
Please guide me.

This library doesn't look very attractive to me: there is no documentation at all, the code is very messy, and it seems to overcomplicate some things. The only reference to what it can do was here, and it only mentions using XPath to process XML, not JSON.
I then installed the library locally and verified that XPath doesn't apply to JSON. It turns out the library is a single module (one file) with a bunch of functions, so I investigated which of them are used when you process a JSON file. Only two are: jsonParse2 and maskGenerator. That made it feasible to reuse.
jsonParse2 doesn't make sense to me, since it parses the JSON file manually when there are easy tools for that, such as the json library. I will discard jsonParse2, as it was the main obstacle to filtering which keys get processed.
I will simply reuse maskGenerator in my solution and pass it only the values of the keys we're interested in.
CODE Solution
Create an input.json file containing your JSON in the same folder as the solution:
import json

import paranoid

# Keys whose values must be left unmasked.
list_not_to_mask = ["id"]

with open("input.json") as input_file:
    input_dict = json.load(input_file)
print(input_dict)

# Work on a copy so the masked values don't overwrite input_dict.
output_dict = dict(input_dict)
for key in input_dict:
    if key not in list_not_to_mask:
        output_dict[key] = paranoid.maskGenerator(str(input_dict[key]), is_json=True)
print(output_dict)

with open("output.json", "w") as output_file:
    json.dump(output_dict, output_file, ensure_ascii=False, indent=4)
OUTPUT
It will also create an output.json file.
The input we have is: {'id': 324324, 'first_name': 'John', 'last_name': 'Smith', 'email': 'john.smith@abc.com'}
The output we have is: {'id': 324324, 'first_name': 'Vxqt', 'last_name': 'Yiphq', 'email': 'vuxr.kcicc@muj.jbj'}
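If your JSON is nested, the same skip-list idea can be applied recursively. The following is only a sketch: fake_mask is a hypothetical stand-in I wrote so the example runs without the library (it is not paranoid's algorithm), and the sample record is made up. In real use you would presumably pass a small wrapper around paranoid.maskGenerator as the mask argument.

```python
import json
import random
import string


def fake_mask(value):
    """Hypothetical stand-in for a masking function: swaps letters and digits for random ones."""
    out = []
    for ch in str(value):
        if ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(random.choice(pool))
        elif ch.isdigit():
            out.append(random.choice(string.digits))
        else:
            out.append(ch)  # keep punctuation such as "." unchanged
    return "".join(out)


def mask_json(node, skip_keys, mask=fake_mask):
    """Return a masked copy of node, leaving values under skip_keys untouched."""
    if isinstance(node, dict):
        return {
            key: value if key in skip_keys else mask_json(value, skip_keys, mask)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [mask_json(item, skip_keys, mask) for item in node]
    return mask(node)  # leaf value; note that it comes back as a string


# Made-up nested record, for illustration only.
record = {
    "id": 324324,
    "first_name": "John",
    "aliases": ["Johnny"],
    "parent": {"id": 1, "last_name": "Smith"},
}
masked = mask_json(record, skip_keys={"id"})
print(json.dumps(masked, indent=4))
```

Because it builds a new structure, the original record is left untouched, and every "id" at any depth is skipped.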

Related

How to print attributes of a json file

I would appreciate some help: how could I print just the country from the info obtained via this API call? Thanks!
import requests
import json
url = "https://randomuser.me/api/"
data = requests.get(url).json()
print(data)
You should play with the JSON a little more to learn how to use it. If you don't have documentation, a helpful way to understand the structure is to go layer by layer, printing the keys with dict.keys() to see where to go next.
In this particular case it returns a dictionary with the following first-layer structure:
{
    "results": [ ... ],
    "info": { ... }
}
where results contains a single dictionary, so we can take
data['results'][0] to work with.
Inside it there is a 'location', and inside that a 'country'. Access them in that order to print the country:
print(data['results'][0]['location']['country'])
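If you want to automate the layer-by-layer exploration, something like the sketch below could help. The outline helper is my own name, and the sample dict is a trimmed, made-up stand-in for the API response (the real one has many more keys), so no network call is needed to try it.

```python
def outline(node, depth=0, max_depth=3):
    """Return indented lines naming the keys found at each layer of node."""
    if depth > max_depth:
        return []
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append("  " * depth + str(key))
            lines.extend(outline(value, depth + 1, max_depth))
    elif isinstance(node, list) and node:
        # Only the first element is inspected, assuming the items share a shape.
        lines.append("  " * depth + f"[list of {len(node)}]")
        lines.extend(outline(node[0], depth + 1, max_depth))
    return lines


# Trimmed, made-up sample shaped like the response described above.
sample = {"results": [{"location": {"country": "Norway"}}], "info": {"results": 1}}

print("\n".join(outline(sample)))
print(sample["results"][0]["location"]["country"])
```

With the real response you would call outline(data) instead of outline(sample).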

I am looking to create an API endpoint route that returns txt in a json format -Python

I'm new to developing and my question(s) involves creating an API endpoint in our route. The api will be used for a POST from a Vuetify UI. Data will come from our MongoDB. We will be getting a .txt file for our shell script but it will have to POST as a JSON. I think these are the steps for converting the text file:
1) create a list for the lines of the .txt
2) add each line to the list
3) join the list elements into a string
4) create a dictionary with the file/file content and convert it to JSON
This is my current code for the steps:
import json

### something.txt: an example of the shell script ###
f = open("something.txt")

# create a list to put the lines of the file in
file_output = []

# add each line of the file to the list
for line in f:
    file_output.append(line)

# mashes all of the list elements together into one string
fileoutput2 = ''.join(file_output)
print(fileoutput2)

# create a dict with file and file content and then convert to JSON
json_object = {"file": fileoutput2}
json_response = json.dumps(json_object)
print(json_response)
{"file": "Hello\n\nSomething\n\nGoodbye"}
I have the following baseline code, which I execute on my button press in the UI:
@bp_customer.route('/install-setup/<string:customer_id>', methods=['POST'])
def install_setup(customer_id):
    cust = Customer()
    customer = cust.get_customer(customer_id)

    ### example of a series of lines with newline character between them.
    script_string = "Beginning\nof\nscript\n"
    json_object = {"file": script_string}
    json_response = json.dumps(json_object)

    # get the install shell script content
    # replace the values (somebody has already done this)
    # attempt to return the below example json_response
    return make_response(jsonify(json_response), 200)
My current Vuetify button press code is here, so I just have to amend it to a POST and the new route once this is established:
onClickScript() {
  console.log("clicked");
  axios
    .get("https://sword-gc-eadsusl5rq-uc.a.run.app/install-setup/")
    .then((resp) => {
      console.log("resp: ", resp.data);
      this.scriptData = resp.data;
    });
},
I'm having a hard time combining these 2 concepts in the correct way. Any input as to whether I'm on the right path? Insight from anyone who's much more experienced than me?
You're on the right path, but needlessly complicating things a bit. For example, the first bit could be just:
import json

with open("something.txt") as f:
    json_response = json.dumps({'file': f.read()})
print(json_response)
And since you're looking to pass everything through jsonify anyway, even this would suffice:
with open("something.txt") as f:
    data = {'file': f.read()}
Where you can pass data directly through jsonify. The rest of it isn't sufficiently complete to offer any concrete comments, but the basic idea is OK.
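One reason to hand the dict over directly: assuming jsonify serializes whatever it is given, much as json.dumps does, a string that has already been through json.dumps would get encoded a second time. The sketch below shows that effect with the standard library only. It is an analogy, not Flask's actual code path.

```python
import json

data = {"file": "Beginning\nof\nscript\n"}

once = json.dumps(data)   # JSON text for an object
twice = json.dumps(once)  # JSON text for a *string* that merely contains JSON

# What a client would get back after parsing each variant:
print(type(json.loads(once)).__name__)
print(type(json.loads(twice)).__name__)
```

In the second case the client would have to parse the payload again to reach the "file" key, which is easy to miss on the UI side.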
If you have a working whole, you could go to https://codereview.stackexchange.com/ to ask for a review. Questions on StackOverflow should be limited to actual problems with getting something to work.

Does python provide a hook to customize json stringification based on key name?

I'm trying to write an efficient stringification routine for logging dicts, but want to redact certain values based on key names. I see that JSONDecoder provides the object_pairs_hook which provides key and value, but I don't see a corresponding hook for JSONEncoder - just 'default' which only provides value. In my case, the values are just other strings so can't base the processing on that alone. Is there something I missed?
For example, if I have a dict with:
{
"username": "Todd",
"role": "Editor",
"privateKey": "1234ad1234e434134"
}
I would want to log:
'{"username":"Todd","role":"Editor","privateKey":"**redacted**"}'
Any good tools in python to do this? Or should I just recursively iterate the (possibly nested) dict directly?
You can "reload" it using the object hook then dump it again.
def redact(o):
    if 'privateKey' in o:
        o['privateKey'] = '***redacted***'
    return o
>>> d
{'username': 'Todd', 'role': 'Editor', 'privateKey': '1234ad1234e434134', 'foo': ['bar', {'privateKey': 'baz'}]}
>>> json.dumps(json.loads(json.dumps(d), object_hook=redact))
'{"username": "Todd", "role": "Editor", "privateKey": "***redacted***", "foo": ["bar", {"privateKey": "***redacted***"}]}'
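If you would rather skip the dump/load round trip, the recursive walk mentioned in the question is also short. This is a sketch under my own naming (redact_copy and REDACTED_KEYS are not part of the json module). It builds a redacted copy, so the dict you are logging is not modified.

```python
import json

REDACTED_KEYS = {"privateKey"}  # assumed set of key names to hide


def redact_copy(node, keys=REDACTED_KEYS, marker="***redacted***"):
    """Return a copy of node with the values of the given keys replaced by marker."""
    if isinstance(node, dict):
        return {
            k: marker if k in keys else redact_copy(v, keys, marker)
            for k, v in node.items()
        }
    if isinstance(node, (list, tuple)):
        return [redact_copy(item, keys, marker) for item in node]
    return node  # strings, numbers, booleans, None pass through unchanged


d = {
    "username": "Todd",
    "role": "Editor",
    "privateKey": "1234ad1234e434134",
    "foo": ["bar", {"privateKey": "baz"}],
}
print(json.dumps(redact_copy(d)))
```

Unlike the object_hook version, this also works when the dict holds values that json.dumps cannot serialize on the first pass, since nothing is encoded until the final dump.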
The json library has two functions for this, dumps() and loads(). To convert a dict to a JSON string use dumps(). To go the other way use loads().
import json

your_dict = {
    "username": "Todd",
    "role": "Editor",
    "privateKey": "1234ad1234e434134"
}
string_of_your_json = json.dumps(your_dict)

Extract value from incorrectly parsed dict from json output

I am processing Kafka streams in a Python Flask server. I read the responses as JSON and need to extract the udid values from the stream. I read each response with request.json and save it in a dictionary. When I try to access the values, it fails. The dict contains the following values:
dict_items([('data', {'SDKVersion': '7.1.2', 'appVersion': '6.5.5', 'dateTime': '2019-08-05 15:01:28+0200', 'device': 'iPhone', 'id': '3971',....})])
Accessing the dict the normal way doesn't work, i.e. event_data['status'] gives an error. Perhaps it is because it's not a pure dict?
@app.route('/data/idApp/5710/event/start', methods=['POST'])
def give_greeting():
    print("Hola")
    event_data = request.json
    print(event_data.items())
    print(event_data['status'])
    #print(event_data['udid'])
    #print(event_data['Additional'])
    return 'Hello, {0}!'.format(event_data)
The values contained in event data are the following
dict_items([('data', {'SDKVersion': '7.1.2', 'appVersion': '6.5.5', 'dateTime': '2019-08-05 15:01:28+0200', 'device': 'iPhone', 'id': '3971',....})])
The expected result would be this result
print(event_data['status'])->start
print(event_data['udid'])->BAEB347B-9110-4CC8-BF99-FA4039C4599B
print(event_data['SDKVersion'])->7.1.2
etc
the output of
print(event_data.keys()) is dict_keys(['data'])
The data you are expecting is wrapped in an additional data property.
You only need to do one extra step to access this data.
data_dict = request.json
event_data = data_dict['data']
Now you should be able to access the information you want with
event_data['SDKVersion']
...
as you have described above.
As @jonrsharpe stated, this is not an issue with the parsing. The parsing either fails or succeeds, but you will never get a "broken" object (be it dict, list, ...) from parsing JSON.
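If some events might arrive without the wrapper or without a given field, dict.get avoids the KeyError. A small sketch with a made-up, trimmed payload (the real one has more fields), so it runs outside Flask:

```python
# Trimmed, made-up payload shaped like the one printed in the question.
payload = {"data": {"SDKVersion": "7.1.2", "device": "iPhone", "status": "start"}}

# Unwrap the outer 'data' layer, falling back to an empty dict if it is absent.
event_data = payload.get("data", {})

print(event_data.get("status"))
print(event_data.get("udid", "<missing>"))  # no KeyError when the field is absent
```

In the route itself, payload would be request.json.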

How to accept any key in python json dictionary?

Here's a simplified version of the JSON I am working with:
{
    "libraries": [
        {
            "library-1": {
                "file": {
                    "url": "foobar.com/.../library-1.bin"
                }
            }
        },
        {
            "library-2": {
                "application": {
                    "url": "barfoo.com/.../library-2.exe"
                }
            }
        }
    ]
}
Using json, I can json.loads() this file. I need to be able to find the 'url', download it, and save it to a local folder called libraries. In this case, I'd create two folders within libraries/, one called library-1, the other library-2. Within these folders would be whatever was downloaded from the url.
The issue, however, is being able to get to the url:
my_json = json.loads(...)  # get the json
for library in my_json['libraries']:
    file.download(library['file']['url'])  # doesn't access ['application']['url']
Since the JSON I am using has a variety of accessors, sometimes 'file', other times 'dll' etc., I can't use one specific dictionary key. How can I use multiple? Would there be a modular way to do this?
Edit: There are numerous accessors; 'file', 'application' and 'dll' are only some examples.
You can just iterate through each level of the dictionary and download the files if you find a url.
urls = []
for library in my_json['libraries']:
    for lib_name, lib_data in library.items():
        for module_name, module_data in lib_data.items():
            url = module_data.get('url')
            if url is not None:
                # create local directory with lib_name
                # download files from url to local directory
                urls.append(url)
# urls = ['foobar.com/.../library-1.bin', 'barfoo.com/.../library-2.exe']
This should work:
for library in my_json['libraries']:
    for value in library.values():
        # each entry is the inner dict, e.g. {'url': ...}, so pick out its 'url'
        for entry in value.values():
            file.download(entry['url'])
I would suggest doing it like this:
for library in my_json['libraries']:
    # note: popitem() removes the entries from my_json as it goes
    library_data = library.popitem()[1].popitem()[1]
    file.download(library_data['url'])
Try this
for library in my_json['libraries']:
    if 'file' in library:
        file.download(library['file']['url'])
    elif 'dll' in library:
        file.download(library['dll']['url'])
It just checks whether your dict (created by parsing the JSON) has a key named 'file'. If so, it uses the 'url' of the dict under the 'file' key. If not, it tries the same with the 'dll' key.
Edit: If you don't know the key to access the dict containing the url, try this.
for library in my_json['libraries']:
    for key in library:
        if 'url' in library[key]:
            file.download(library[key]['url'])
This iterates over all the keys in your library. Then, whichever key contains an 'url', downloads using that.
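If the nesting depth can vary as well as the key names, a recursive search for 'url' avoids hard-coding any level. This is a sketch only: find_urls is a name I made up, and the download step is left as a print because file.download comes from the question and I don't know its signature.

```python
def find_urls(node, path=()):
    """Yield (path, url) for every 'url' key found anywhere inside node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                yield path, value
            else:
                yield from find_urls(value, path + (key,))
    elif isinstance(node, list):
        for item in node:
            yield from find_urls(item, path)


my_json = {
    "libraries": [
        {"library-1": {"file": {"url": "foobar.com/.../library-1.bin"}}},
        {"library-2": {"application": {"url": "barfoo.com/.../library-2.exe"}}},
    ]
}

for path, url in find_urls(my_json["libraries"]):
    # path[0] is the library name, usable as the folder under libraries/
    print(path[0], url)
```

The path tuple records the keys walked to reach each url, so the first element gives you the folder name without knowing whether the accessor was 'file', 'application' or 'dll'.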
