Convert the output of openssl command to JSON - python

The output of the openssl command looks like this:
serial=CABCSDUMMY4A168847703FGH
notAfter=Oct 21 16:43:47 2024 GMT
subject= /C=US/ST=WA/L=Seattle/O=MyCo/OU=TME/CN=MyCo.example.com
How do I convert this string to JSON?
I tried these:
temp_txt_bytes = subprocess.check_output(["openssl", "x509", "-serial", "-enddate", "-subject", "-noout", "-in", pem_file_name])
temp_txt_strings = temp_txt_bytes.decode("utf-8")
test = json.loads(temp_txt_strings)  # json.dump and json.load also failing

You can split every line with "=" as a separator, put the two parts in an ordered dictionary, and then dump it to JSON:
my_list = "serial=CABCSDUMMY4A168847703FGH".split("=")
ordered_dict = OrderedDict()
ordered_dict[my_list[0]] = my_list[1]
print(json.dumps(ordered_dict))
The output will look like this:
{"serial": "CABCSDUMMY4A168847703FGH"}
You can do the same for every line; a full sketch follows below.
P.S. Don't forget to import json and OrderedDict (from the collections module).
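
Putting it together for the whole output, here's a minimal sketch (assuming the three-field openssl invocation from the question; the cert.pem path is a placeholder). Note the subject line contains extra '=' characters inside its value, so split only on the first one:
import json
import subprocess
from collections import OrderedDict

pem_file_name = "cert.pem"  # hypothetical path
temp_txt_bytes = subprocess.check_output(
    ["openssl", "x509", "-serial", "-enddate", "-subject", "-noout", "-in", pem_file_name]
)

ordered_dict = OrderedDict()
for line in temp_txt_bytes.decode("utf-8").splitlines():
    # split on the first '=' only; the subject value itself contains '='
    key, value = line.split("=", 1)
    ordered_dict[key.strip()] = value.strip()

print(json.dumps(ordered_dict))
# {"serial": "CABCSDUMMY4A168847703FGH", "notAfter": "Oct 21 16:43:47 2024 GMT", "subject": "/C=US/ST=WA/..."}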

Related

How to open and edit an incorrect json file?

Maybe this question is a bit dumb, but I actually don't know how to fix it. I have a JSON file that is in the wrong format: it has a b' before the first { and it also uses single quotes instead of double quotes, which is not an accepted format for JSON.
I know that I have to replace the single quotes with double quotes. I would use something like this:
json = file.decode('utf8').replace("'", '"')
But the problem is how can I replace the quotes in the file if I can't open it?
import json
f = open("data/qa_Automotive.json",'r')
file = json.load(f)
Opening the file gives me an error because it has single quotes and not double quotes:
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
How am I supposed to change the quotes in the json file if I can't open the file because it has the wrong format? (This is the json file btw: https://jmcauley.ucsd.edu/data/amazon/qa/qa_Appliances.json.gz)
The file isn't JSON (the filename is incorrect); instead, it's composed of valid Python literals. There's no reason to try to transform it to JSON. Don't do that; instead, just tell Python to parse it as-is.
#!/usr/bin/env python3
import ast, json
results = [ ast.literal_eval(line) for line in open('qa_Appliances.json') ]
print(json.dumps(results))
...properly gives you a list named results with all your lines in it, and (for demonstration purposes) dumps it to stdout as JSON.
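If you also want a real JSON file on disk, a small follow-up sketch (the qa_Appliances.ndjson output name is my own choice) re-encodes each Python-literal line as one JSON object per line:
import ast, json

with open('qa_Appliances.json') as src, open('qa_Appliances.ndjson', 'w') as dst:
    for line in src:
        dst.write(json.dumps(ast.literal_eval(line)) + '\n')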
There are multiple issues with that file:
it is ndjson (newline-delimited: each line is a single object)
it uses the wrong kind of quotes (single instead of double)
The first issue is easily solved: you can just load each line individually. The second issue is trickier.
I tried (as you did) to simply replace ' with ". But that is not enough, because there are ' characters inside the text (like "you're"). If you replace all of them, each one becomes a " and breaks its string.
You'll be left with something like "message": "you"re", which is invalid.
Since there are a lot of issues with that file, I suggest something a little dirty: the Python eval function.
eval simply evaluates a string as if it were Python code.
>>> eval("4 + 2")
6
Because the JSON format and Python's native dict type are really close (minus some differences: Python uses True/False where JSON uses true/false, Python uses None where JSON uses null, and maybe others I forgot), the structure with the curly and square brackets is the same. And here eval helps, because Python accepts both single and double quotes.
Using a line from your file:
>>> import json
>>>
>>> data = "{'questionType': 'yes/no', 'asin': 'B0002Z1GG0', 'answerTime': 'Dec 10, 2013', 'unixTime': 1386662400, 'question': 'would this work on my hotpoint model ctx14ayxkrwh serial hr749157 refrigerator do the drawers slide into slots ?', 'answerType': '?', 'answer': 'the drawers do fit into the slots.'}"
>>> parsed = eval(data)
>>> print(parsed)
{'questionType': 'yes/no', 'asin': 'B0002Z1GG0', 'answerTime': 'Dec 10, 2013', 'unixTime': 1386662400, 'question': 'would this work on my hotpoint model ctx14ayxkrwh serial hr749157 refrigerator do the drawers slide into slots ?', 'answerType': '?', 'answer': 'the drawers do fit into the slots.'}
>>> type(parsed)
<class 'dict'>
>>> print(json.dumps(parsed, indent=2))
{
"questionType": "yes/no",
"asin": "B0002Z1GG0",
"answerTime": "Dec 10, 2013",
"unixTime": 1386662400,
"question": "would this work on my hotpoint model ctx14ayxkrwh serial hr749157 refrigerator do the drawers slide into slots ?",
"answerType": "?",
"answer": "the drawers do fit into the slots."
}
I can do that on the whole file:
>>> data = open("<path to file>").readlines()
>>> parsed = [ eval(line) for line in data ]
>>>
>>> len(parsed)
9011
>>> parsed[0]
{'questionType': 'yes/no', 'asin': 'B00004U9JP', 'answerTime': 'Jun 27, 2014', 'unixTime': 1403852400, 'question': 'I have a 9 year old Badger 1 that needs replacing, will this Badger 1 install just like the original one?', 'answerType': '?', 'answer': 'I replaced my old one with this without a hitch.'}
>>> parsed[0]['questionType']
'yes/no'
BEWARE: you should never use eval on unsanitized data, as it can be used to breach your system. But if you use it in a controlled environment, you do you.
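If you want the same result without that risk, the standard library's ast.literal_eval parses Python literals only and refuses arbitrary code, so it works as a safer drop-in for the loop above:
import ast

with open("<path to file>") as f:
    parsed = [ast.literal_eval(line) for line in f]

print(len(parsed))                # 9011
print(parsed[0]['questionType'])  # 'yes/no'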

How to solve problem decoding from wrong json format

Hi everyone. I need help opening and reading this file.
Got this txt file - https://yadi.sk/i/1TH7_SYfLss0JQ
It is a dictionary
{"id0":"url0", "id1":"url1", ..., "idn":"urln"}
But it was written into a txt file using json.dump.
#This is how I dump the data into a txt
json.dump(after,open(os.path.join(os.getcwd(), 'before_log.txt'), 'a'))
So, the file structure is
{"id0":"url0", "id1":"url1", ..., "idn":"urln"}{"id2":"url2", "id3":"url3", ..., "id4":"url4"}{"id5":"url5", "id6":"url6", ..., "id7":"url7"}
And it is all one string...
I need to open it, check for repeated IDs, delete them, and save the result.
But I'm getting json.loads ValueError: Extra data.
I tried these:
How to read line-delimited JSON from large file (line by line)
Python json.loads shows ValueError: Extra data
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 190)
But I still get that error, just in a different place.
Right now I got as far as:
with open('111111111.txt', 'r') as log:
    before_log = log.read()

before_log = before_log.replace('}{', ', ').split(', ')

mu_dic = []
for i in before_log:
    mu_dic.append(i)
This eliminates the problem of several {}{}{} dictionaries/JSONs in a row.
Maybe there is a better way to do this?
P.S. This is how the file is made:
json.dump(after,open(os.path.join(os.getcwd(), 'before_log.txt'), 'a'))
Your file size is 9.5 MB, so it'll take you a while to open and debug it manually.
Using the head and tail tools (normally found in any GNU/Linux distribution), you'll see that:
# You can use Python as well to read chunks from your file
# and see what it looks like and what's causing the decode problem,
# but I prefer head & tail because they're ready to be used :-D
$> head -c 217 111111111.txt
{"1933252590737725178": "https://instagram.fiev2-1.fna.fbcdn.net/vp/094927bbfd432db6101521c180221485/5CC0EBDD/t51.2885-15/e35/46950935_320097112159700_7380137222718265154_n.jpg?_nc_ht=instagram.fiev2-1.fna.fbcdn.net",
$> tail -c 219 111111111.txt
, "1752899319051523723": "https://instagram.fiev2-1.fna.fbcdn.net/vp/a3f28e0a82a8772c6c64d4b0f264496a/5CCB7236/t51.2885-15/e35/30084016_2051123655168027_7324093741436764160_n.jpg?_nc_ht=instagram.fiev2-1.fna.fbcdn.net"}
$> head -c 294879 111111111.txt | tail -c 12
net"}{"19332
So the first guess is that your file is a malformed series of JSON data, and the best fix is to separate the }{ pairs with a \n for further manipulation.
So, here is an example of how you can solve your problem using Python:
import json

input_file = '111111111.txt'
output_file = 'new_file.txt'

data = ''

with open(input_file, mode='r', encoding='utf8') as f_file:
    # this with statement part can be replaced by
    # using sed under your OS like this example:
    # sed -i 's/}{/}\n{/g' 111111111.txt
    data = f_file.read()
    data = data.replace('}{', '}\n{')

seen, total_keys, to_write = set(), 0, {}

# split the lines of the in-memory data
for elm in data.split('\n'):
    # convert the line to a valid Python dict
    converted = json.loads(elm)
    # loop over the keys
    for key, value in converted.items():
        total_keys += 1
        # if the key has not been seen then add it for further manipulations
        # else ignore it
        if key not in seen:
            seen.add(key)
            to_write.update({key: value})

# write the dict's keys & values into a new file as JSON
with open(output_file, mode='a+', encoding='utf8') as out_file:
    out_file.write(json.dumps(to_write) + '\n')

print(
    'found duplicated key(s): {seen} from {total}'.format(
        seen=total_keys - len(seen),
        total=total_keys
    )
)
Output:
found duplicated key(s): 43836 from 45367
And finally, the output file will be a valid JSON file, with the duplicated keys (and their values) removed.
The basic differences between the file's structure and actual JSON are the missing commas between objects and the fact that the objects are not enclosed within [ and ]. So the same can be achieved with the snippet below:
import json

with open('json_file.txt') as f:
    # Read the complete file
    a = f.read()

# Convert into a single-line string
b = ''.join(a.splitlines())
# Add , after each object
b = b.replace("}", "},")
# Add opening and closing brackets and drop the trailing comma added in the previous step
b = '[' + b[:-1] + ']'
x = json.loads(b)
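Note that b.replace("}", "},") will also fire on any } that happens to occur inside a value. A sketch that avoids string surgery entirely, using the standard json.JSONDecoder.raw_decode to consume the concatenated {...}{...} stream one object at a time (file name reused from the snippet above):
import json

def iter_concatenated(text):
    # Yield each JSON object from a '{...}{...}{...}' stream.
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

with open('json_file.txt', encoding='utf8') as f:
    dicts = list(iter_concatenated(f.read().strip()))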

python from hex to shellcode format

I'm trying to convert a hex string to shellcode format.
For example: I have a file containing a hex string like aabbccddeeff11223344,
and I want to convert it through Python into exactly this format:
"\xaa\xbb\xcc\xdd\xee\xff\x11\x22\x33\x44", including the quotes "".
My code is:
with open("file","r") as f:
a = f.read()
b = "\\x".join(a[i:i+2] for i in range(0, len(a), 2))
print b
so my output is aa\xbb\xcc\xdd\xee\xff\x11\x22\x33\x44\x.
I understand I could do this via the sed command, but I wonder how to accomplish it through Python.
The binascii standard module will help here:
import binascii
print repr(binascii.unhexlify("aabbccddeeff11223344"))
Output:
>>> print repr(binascii.unhexlify("aabbccddeeff11223344"))
'\xaa\xbb\xcc\xdd\xee\xff\x11"3D'
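Note that repr shows printable bytes as literal characters, which is why the tail comes out as \x11"3D. If you need the \xNN form for every byte, plus the surrounding quotes, you can format the string yourself; a minimal sketch (works on Python 2 and 3):
hex_string = "aabbccddeeff11223344"

# render every byte as \xNN and wrap the result in double quotes
shellcode = '"' + ''.join('\\x' + hex_string[i:i + 2]
                          for i in range(0, len(hex_string), 2)) + '"'
print(shellcode)
# "\xaa\xbb\xcc\xdd\xee\xff\x11\x22\x33\x44"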

Zeroes appearing when reading file (where aren't any)

When reading a file (UTF-8 Unicode text, CSV) with Python on Linux, either with:
csv.reader()
file()
values of some columns get a zero as their first character (there are no zeroes in the input); others get a few zeroes, which are not visible when viewing the file in Geany or any other editor. For example:
Input
10016;9167DE1;Tom;Sawyer ;Street 22;2610;Wil;;378983561;tom#hotmail.com;1979-08-10 00:00:00.000;0;1;Wil;081208608;NULL;2;IZMH726;2010-08-30 15:02:55.777;2013-06-24 08:17:22.763;0;1;1;1;NULL
Output
10016;9167DE1;Tom;Sawyer ;Street 22;2610;Wil;;0378983561;tom#hotmail.com;1979-08-10 00:00:00.000;0;1;Wil;081208608;NULL;2;IZMH726;2010-08-30 15:02:55.777;2013-06-24 08:17:22.763;0;1;1;1;NULL
See 378983561 > 0378983561
Reading with:
f = file('/home/foo/data.csv', 'r')
data = f.read()
split_data = data.splitlines()
lines = list(line.split(';') for line in split_data)

print lines[51220][8]
>>> '0378983561'  # should have been '378983561' (reads like this in Geany etc.)
Same result with csv.reader().
Help me solve the mystery, what could be the cause of this? Could it be related to encoding/decoding?
The data you're getting is a string.
>>> lines[51220][8]
'0378983561'
If you want to use it as an integer, you should parse it.
>>> int(lines[51220][8])
378983561
If you want it back as a string (without the leading zero), convert it again.
>>> str(int(lines[51220][8]))
'378983561'
csv.reader treats all columns as strings. Conversion to the appropriate type is up to you, as in:
print int(lines[51220][8])
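For example, a short sketch with csv.reader on the ;-delimited file from the question (Python 3 syntax; column 8 is the phone-number field shown above):
import csv

with open('/home/foo/data.csv', newline='') as f:
    for row in csv.reader(f, delimiter=';'):
        phone = row[8]          # stays a string, so the leading zero survives
        as_number = int(phone)  # explicit conversion drops the leading zero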

How do I get this to encode properly?

I have an XML file with Russian text:
<p>все чашки имеют стандартный посадочный диаметр - 22,2 мм</p>
I use xml.etree.ElementTree to manipulate it in various ways (without ever touching the text content). Then, I use ElementTree.tostring:
info["table"] = ET.tostring(table, encoding="utf8") #table is an Element
Then I do some other stuff with this string, and finally write it to a file:
f = open(newname, "w")
output = page_template.format(**info)
f.write(output)
f.close()
I wind up with this in my file:
<p>\xd0\xb2\xd1\x81\xd0\xb5 \xd1\x87\xd0\xb0\xd1\x88\xd0\xba\xd0\xb8 \xd0\xb8\xd0\xbc\xd0\xb5\xd1\x8e\xd1\x82 \xd1\x81\xd1\x82\xd0\xb0\xd0\xbd\xd0\xb4\xd0\xb0\xd1\x80\xd1\x82\xd0\xbd\xd1\x8b\xd0\xb9 \xd0\xbf\xd0\xbe\xd1\x81\xd0\xb0\xd0\xb4\xd0\xbe\xd1\x87\xd0\xbd\xd1\x8b\xd0\xb9 \xd0\xb4\xd0\xb8\xd0\xb0\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80 - 22,2 \xd0\xbc\xd0\xbc</p>
How do I get it encoded properly?
You use
info["table"] = ET.tostring(table, encoding="utf8")
which returns bytes. Then later you apply that to a format string, which is a str (unicode); if you do that, you'll end up with a representation of the bytes object.
etree can return a unicode object instead if you use:
info["table"] = ET.tostring(table, encoding="unicode")
The problem is that ElementTree.tostring returns a bytes object, not an actual string. The answer is to decode it:
info["table"] = ET.tostring(table, encoding="utf8").decode("utf8")
Try this, with the output parameter being just the Russian string (a unicode string, not encoded to UTF-8 bytes):
import codecs

# output = u'все чашки имеют стандартный посадочный диаметр'
with codecs.open(newname, "w", "utf-16") as stream:  # or utf-8
    stream.write(output + u"\n")
