I have been trying to replace integer components of a dictionary with string values given in another dictionary. However, I am getting the following error:
Traceback (most recent call last):
File "<string>", line 11, in <module>
File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 14 (char 13)
The code is given below. I am not sure why I am getting this error.
import re
from json import loads, dumps

movable = {"movable": [0, 3, 6, 9], "fixed": [1, 4, 7, 10], "mixed": [2, 5, 8, 11]}
int_mapping = {0: "Ar", 1: "Ta", 2: "Ge", 3: "Ca", 4: "Le", 5: "Vi", 6: "Li", 7: "Sc", 8: "Sa", 9: "Ca", 10: "Aq", 11: "Pi"}

movable = dumps(movable)
for key in int_mapping.keys():
    movable = re.sub('(?<![0-9])' + str(key) + '(?![0-9])', int_mapping[key], movable)
movable = loads(movable)
I understand that this code can easily be written in a different way to get the desired output. However, I am interested to understand what I am doing wrong.
If you print what movable looks like right before calling json.loads, you'll see what the problem is:
for key in int_mapping.keys():
    movable = re.sub('(?<![0-9])' + str(key) + '(?![0-9])', int_mapping[key], movable)
print(movable)
outputs:
{"movable": [Ar, Ca, Li, Ca], "fixed": [Ta, Le, Sc, Aq], "mixed": [Ge, Vi, Sa, Pi]}
Those strings (Ar, Ca...) are not quoted, therefore it is not valid JSON.
If you choose to continue the way you're going, you must add the quotes:
movable = re.sub(
    '(?<![0-9])' + str(key) + '(?![0-9])',
    '"' + int_mapping[key] + '"',
    movable)
(notice the '"' + int_mapping[key] + '"')
Which produces:
{"movable": ["Ar", "Ca", "Li", "Ca"], "fixed": ["Ta", "Le", "Sc", "Aq"], "mixed": ["Ge", "Vi", "Sa", "Pi"]}
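Putting the pieces together, the corrected regex approach can be sketched end to end (dictionaries taken from the question; the lookarounds keep digits of 10 and 11 from being replaced piecemeal):

```python
import re
from json import loads, dumps

movable = {"movable": [0, 3, 6, 9], "fixed": [1, 4, 7, 10], "mixed": [2, 5, 8, 11]}
int_mapping = {0: "Ar", 1: "Ta", 2: "Ge", 3: "Ca", 4: "Le", 5: "Vi",
               6: "Li", 7: "Sc", 8: "Sa", 9: "Ca", 10: "Aq", 11: "Pi"}

text = dumps(movable)
for key, name in int_mapping.items():
    # quote the replacement so the result stays valid JSON
    text = re.sub(r'(?<![0-9])' + str(key) + r'(?![0-9])', '"' + name + '"', text)
movable = loads(text)
```

After the loop, loads succeeds because every substituted value is a quoted JSON string.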
This said... you are probably much better off just walking the movable values and substituting them with the values in int_mapping. Something like:
mapped_movable = {}
for key, val in movable.items():
    mapped_movable[key] = [int_mapping[v] for v in val]
print(mapped_movable)
You could use a dict comprehension and make the mapping replacements directly in Python:
...
movable = {
    k: [int_mapping[v] for v in values]
    for k, values in movable.items()
}
print(type(movable))
print(movable)
Out:
<class 'dict'>
{'mixed': ['Ge', 'Vi', 'Sa', 'Pi'], 'fixed': ['Ta', 'Le', 'Sc', 'Aq'], 'movable': ['Ar', 'Ca', 'Li', 'Ca']}
I'm writing JSON to a file using DataFrame.to_json() with the indent option:
df.to_json(path_or_buf=file_json, orient="records", lines=True, indent=2)
The important part here is indent=2; without it, everything works.
Then how do I read this file using DataFrame.read_json()?
I'm trying the code below, but it expects the file to be a JSON object per line, so the indentation messes things up:
df = pd.read_json(file_json, lines=True)
I didn't find any options in read_json to make it handle the indentation.
How else could I read this file created by to_json, possibly avoiding writing my own reader?
The combination of lines=True, orient='records', and indent=2 doesn't actually produce valid json.
lines=True is meant to create line-delimited json, but indent=2 adds extra lines. You can't have your delimiter be line breaks, AND have extra line breaks!
If you do just orient='records', and indent=2, then it does produce valid json.
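The clash can be demonstrated with the stdlib json module alone (a sketch; the sample records are made up):

```python
import json

records = [{"a": 1}, {"a": 2}]

# line-delimited JSON: one complete object per line, parseable line by line
ldj = "\n".join(json.dumps(r) for r in records)
assert [json.loads(line) for line in ldj.splitlines()] == records

# with indent=2 a single object spans several lines, so per-line parsing breaks
pretty = json.dumps(records[0], indent=2)
failed = False
try:
    json.loads(pretty.splitlines()[0])  # first line is just "{"
except json.JSONDecodeError:
    failed = True
assert failed
```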
The current read_json(lines=True) code can be found here:
def _combine_lines(self, lines) -> str:
    """
    Combines a list of JSON objects into one JSON object.
    """
    return (
        f'[{",".join([line for line in (line.strip() for line in lines) if line])}]'
    )
You can see that it expects to read the file line by line, which isn't possible when indent has been used.
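Since a file written with orient="records" and indent=2 (but no lines=True) is one valid JSON array, a plain read_json with matching orient reads it back fine. A minimal round-trip sketch (the sample frame is made up; StringIO stands in for a real file):

```python
import pandas as pd
from io import StringIO

df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

buf = StringIO()
df.to_json(buf, orient="records", indent=2)  # note: no lines=True

# the buffer holds a single valid JSON array, so no lines=True is
# needed on the reading side either, and the indentation is harmless
df2 = pd.read_json(StringIO(buf.getvalue()), orient="records")
assert df.equals(df2)
```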
The other answer is good, but it turned out it requires reading the entire file into memory. I ended up writing a simple lazy parser, included below. It requires removing the lines=True argument in df.to_json.
Usage is as follows:
for obj, pos, length in lazy_read_json('file.json'):
    print(obj['field'])  # access json object
It also yields pos - the start position of the object in the file - and length - the length of the object in the file; this allows some more functionality for me, like being able to index objects and load them into memory on demand.
The parser is below:
def lazy_read_json(filename: str):
    """
    :return generator returning (json_obj, pos, length)
    >>> test_objs = [{'a': 11, 'b': 22, 'c': {'abc': 'z', 'zzz': {}}}, \
                     {'a': 31, 'b': 42, 'c': [{'abc': 'z', 'zzz': {}}]}, \
                     {'a': 55, 'b': 66, 'c': [{'abc': 'z'}, {'z': 3}, {'y': 3}]}, \
                     {'a': 71, 'b': 62, 'c': 63}]
    >>> json_str = json.dumps(test_objs, indent=4, sort_keys=True)
    >>> _create_file("/tmp/test.json", [json_str])
    >>> g = lazy_read_json("/tmp/test.json")
    >>> next(g)
    ({'a': 11, 'b': 22, 'c': {'abc': 'z', 'zzz': {}}}, 120, 116)
    >>> next(g)
    ({'a': 31, 'b': 42, 'c': [{'abc': 'z', 'zzz': {}}]}, 274, 152)
    >>> next(g)
    ({'a': 55, 'b': 66, 'c': [{'abc': 'z'}, {'z': 3}, {'y': 3}]}, 505, 229)
    >>> next(g)
    ({'a': 71, 'b': 62, 'c': 63}, 567, 62)
    >>> next(g)
    Traceback (most recent call last):
    ...
    StopIteration
    """
    with open(filename) as fh:
        state = 0
        json_str = ''
        cb_depth = 0  # curly brace depth
        line = fh.readline()
        while line:
            if line[-1] == "\n":
                line = line[:-1]
            line_strip = line.strip()
            if state == 0 and line == '[':
                state = 1
                pos = fh.tell()
            elif state == 1 and line_strip == '{':
                state = 2
                json_str += line + "\n"
            elif state == 2:
                if len(line_strip) > 0 and line_strip[-1] == '{':  # count nested objects
                    cb_depth += 1
                json_str += line + "\n"
                if cb_depth == 0 and (line_strip == '},' or line_strip == '}'):
                    # end of parsing an object
                    if json_str[-2:] == ",\n":
                        json_str = json_str[:-2]  # remove trailing comma
                    state = 1
                    obj = json.loads(json_str)
                    yield obj, pos, len(json_str)
                    pos = fh.tell()
                    json_str = ""
                elif line_strip == '}' or line_strip == '},':
                    cb_depth -= 1
            line = fh.readline()
# this function is for doctest
def _create_file(filename, lines):
    # because doctest can't input newline characters :(
    f = open(filename, "w")
    for line in lines:
        f.write(line)
        f.write("\n")
    f.close()
I have a task for my college (I am a beginner) which asks me to validate a password using ASCII characters. I tried some simple code and it worked; however, it kept skipping my ASCII part.
Requirement list:
1.4 Call function to get a valid password OUT: password
1.4.1 Loop until password is valid
1.4.2 Ask the user to enter a password
1.4.3 Check that the first character is a capital letter (ASCII values 65 to 90)
1.4.4 Check that the last character is #, $ or % (ASCII values 35 to 37)
1.4.5 Return a valid password
U = [65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]
upCase = ''.join(chr(i) for i in U)
print(upCase) #Ensure it is working
def passVal(userPass):
    SpecialSym = ["#", "$", "%"]
    val = True
    # Common way to validate password VVV
    if len(userPass) < 8:
        print("Length of password should be at least 8")
        val = False
    if not any(char.isdigit() for char in userPass):
        print("Password should have at least one numeral")
        val = False
    # I tried the same with ASCII (and other methods too) but it seemed to be skipping this part VVV
    if not any(upCase for char in userPass):
        print("Password should have at least one uppercase letter")
        val = False
    if not any(char.islower() for char in userPass):
        print("Password should have at least one lowercase letter")
        val = False
    if not any(char in SpecialSym for char in userPass):
        print("Password should have at least one of the symbols $%#")
        val = False
    if val:
        return val
def password():
    if passVal(userPass):
        print("Password is valid")
    else:
        print("Invalid Password !!")

userPass = input("Pass: ")
password()
From Python 3.7 you can use str.isascii()...
>>> word = 'asciiString'
>>> word.isascii()
True
Otherwise you could use:
>>> all([ord(c) < 128 for c in word])
True
All ASCII characters have an ordinal (ord) value less than 128 (0 through 127): https://en.wikipedia.org/wiki/ASCII
So your logic will either be (3.7+):
if word.isascii():
    # string is ascii
    ...
Or:
if all([ord(c) < 128 for c in word]):
    # string is ascii
    ...
else:
    # string contains at least one non-ascii character
    ...
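Applied to the assignment's requirements 1.4.3 and 1.4.4, the ord-based range checks could look like this (a sketch; the function name is made up):

```python
def first_last_ok(pw: str) -> bool:
    """First char must be A-Z (ASCII 65-90), last must be #, $ or % (ASCII 35-37)."""
    return bool(pw) and 65 <= ord(pw[0]) <= 90 and 35 <= ord(pw[-1]) <= 37

# examples
assert first_last_ok("Secret1$")       # 'S' is 83, '$' is 36
assert not first_last_ok("secret1$")   # 's' is 115, outside 65-90
assert not first_last_ok("Secret1!")   # '!' is 33, outside 35-37
```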
I have a file where on each line I have text like this (representing cast of a film):
[{'cast_id': 23, 'character': "Roger 'Verbal' Kint", 'credit_id': '52fe4260c3a36847f8019af7', 'gender': 2, 'id': 1979, 'name': 'Kevin Spacey', 'order': 5, 'profile_path': '/x7wF050iuCASefLLG75s2uDPFUu.jpg'}, {'cast_id': 27, 'character': 'Edie's Finneran', 'credit_id': '52fe4260c3a36847f8019b07', 'gender': 1, 'id': 2179, 'name': 'Suzy Amis', 'order': 6, 'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]
I need to convert it in a valid json string, thus converting only the necessary single quotes to double quotes (e.g. the single quotes around word Verbal must not be converted, eventual apostrophes in the text also should not be converted).
I am using python 3.x. I need to find a regular expression which will convert only the right single quotes to double quotes, thus the whole text resulting in a valid json string. Any idea?
First of all, the line you gave as an example is not parsable! … 'Edie's Finneran' … contains a syntax error, no matter what.
Assuming that you have control over the input, you could simply use eval() to read in the file. (Although, in that case one would wonder why you can't produce valid JSON in the first place…)
>>> f = open('list.txt', 'r')
>>> s = f.read().strip()
>>> l = eval(s)
>>> import pprint
>>> pprint.pprint(l)
[{'cast_id': 23,
  'character': "Roger 'Verbal' Kint",
  ...
  'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]
>>> import json
>>> json.dumps(l)
'[{"cast_id": 23, "character": "Roger \'Verbal\' Kint", "credit_id": "52fe4260c3a36847f8019af7", "gender": 2, "id": 1979, "name": "Kevin Spacey", "order": 5, "profile_path": "/x7wF050iuCASefLLG75s2uDPFUu.jpg"}, {"cast_id": 27, "character": "Edie\'s Finneran", "credit_id": "52fe4260c3a36847f8019b07", "gender": 1, "id": 2179, "name": "Suzy Amis", "order": 6, "profile_path": "/b1pjkncyLuBtMUmqD1MztD2SG80.jpg"}]'
If you don't have control over the input, this is very dangerous, as it opens you up to code injection attacks.
I cannot emphasize enough that the best solution would be to produce valid JSON in the first place.
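For comparison, producing the line with json.dumps from the start yields valid JSON even when the text contains apostrophes (a sketch with made-up data):

```python
import json

cast = [{'cast_id': 27, 'character': "Edie's Finneran", 'name': 'Suzy Amis'}]
line = json.dumps(cast)  # apostrophes need no escaping inside double-quoted JSON strings
assert json.loads(line)[0]['character'] == "Edie's Finneran"
```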
If you do not have control over the JSON data, do not eval() it!
I created a simple JSON correction mechanism, as that is more secure:
def correctSingleQuoteJSON(s):
    rstr = ""
    escaped = False
    for c in s:
        if c == "'" and not escaped:
            c = '"'  # replace single with double quote
        elif c == "'" and escaped:
            rstr = rstr[:-1]  # remove escape character before single quotes
        elif c == '"':
            c = '\\' + c  # escape existing double quotes
        escaped = (c == "\\")  # check for an escape character
        rstr += c  # append the correct json
    return rstr
You can use the function in the following way:
import json
singleQuoteJson = "[{'cast_id': 23, 'character': 'Roger \\'Verbal\\' Kint', 'credit_id': '52fe4260c3a36847f8019af7', 'gender': 2, 'id': 1979, 'name': 'Kevin Spacey', 'order': 5, 'profile_path': '/x7wF050iuCASefLLG75s2uDPFUu.jpg'}, {'cast_id': 27, 'character': 'Edie\\'s Finneran', 'credit_id': '52fe4260c3a36847f8019b07', 'gender': 1, 'id': 2179, 'name': 'Suzy Amis', 'order': 6, 'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]"
correctJson = correctSingleQuoteJSON(singleQuoteJson)
print(json.loads(correctJson))
Here is the code to get the desired output:
import ast

def getJson(filepath):
    fr = open(filepath, 'r')
    lines = []
    for line in fr.readlines():
        line_split = line.split(",")
        set_line_split = []
        for i in line_split:
            i_split = i.split(":")
            i_set_split = []
            for split_i in i_split:
                set_split_i = ""
                rev = ""
                i = 0
                for ch in split_i:
                    if ch in ['\"', '\'']:
                        set_split_i += ch
                        i += 1
                        break
                    else:
                        set_split_i += ch
                        i += 1
                i_rev = (split_i[i:])[::-1]
                state = False
                for ch in i_rev:
                    if ch in ['\"', '\''] and state == False:
                        rev += ch
                        state = True
                    elif ch in ['\"', '\''] and state == True:
                        rev += ch + "\\"
                    else:
                        rev += ch
                i_rev = rev[::-1]
                set_split_i += i_rev
                i_set_split.append(set_split_i)
            set_line_split.append(":".join(i_set_split))
        line_modified = ",".join(set_line_split)
        lines.append(ast.literal_eval(str(line_modified)))
    return lines

lines = getJson('test.txt')
for i in lines:
    print(i)
Apart from eval() (mentioned in user3850's answer), you can use ast.literal_eval
This has been discussed in the thread: Using python's eval() vs. ast.literal_eval()?
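A minimal sketch of ast.literal_eval on one such line (it parses Python literals without executing arbitrary code, unlike eval):

```python
import ast

# a shortened, made-up line in the same Python-literal style as the question
line = """[{'cast_id': 23, 'character': "Roger 'Verbal' Kint", 'order': 5}]"""
data = ast.literal_eval(line)
assert data[0]['character'] == "Roger 'Verbal' Kint"
assert data[0]['order'] == 5
```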
You can also look at the following discussion threads from Kaggle competition which has data similar to the one mentioned by OP:
https://www.kaggle.com/c/tmdb-box-office-prediction/discussion/89313#latest-517927
https://www.kaggle.com/c/tmdb-box-office-prediction/discussion/80045#latest-518338
I'm new to Python and I'm trying a new problem but couldn't find the solution. I have a text file called replace.txt with contents like this:
81, 40.001, 49.9996, 49.9958
82, 41.1034, 54.5636, 49.9958
83, 44.2582, 58.1856, 49.9959
84, 48.7511, 59.9199, 49.9957
85, 53.4674, 59.3776, 49.9958
86, 57.4443, 56.6743, 49.9959
87, 59.7, 52.4234, 49.9958
Now I have one more file, actualdata.txt, which has a huge amount of data like the above. I want to replace lines in actualdata.txt by matching the first number: for example, search for '81' in actualdata.txt and replace that line with the line starting with '81' in replace.txt.
here the actualdata.txt looks like this:
--------lines above--------
81, 40.0 , 50.0 , 50.0
82, 41.102548189607, 54.564575695695, 50.0
83, 44.257790830341, 58.187003960661, 50.0
84, 48.751279796738, 59.921728571875, 50.0
85, 53.468166336575, 59.379329520912, 50.0
86, 57.445611860313, 56.675542227082, 50.0
87, 59.701750075154, 52.424055585018, 50.0
88, 59.725876387298, 47.674633684987, 50.0
89, 57.511209176153, 43.398353484768, 50.0
90, 53.558991157616, 40.654756186166, 50.0
91, 48.853051436724, 40.06599229952 , 50.0
92, 44.335578609695, 41.75898487363 , 50.0
93, 41.139049269956, 45.364964707822, 50.0
94, 4.9858306110506, 4.9976785333108, 50.0
95, 9.9716298556132, 4.9995886389273, 50.0
96, 4.9712790759448, 9.9984071508336, 50.0
97, 9.9421696473295, 10.002460334272, 50.0
98, 14.957223264745, 5.0022762348283, 50.0
99, 4.9568005100444, 15.000751982196, 50.0
------lines below----------
How can I do this? Please help me. I'm trying to use fileinput and replace but I'm not getting the output.
This is the sample code which I'm still improvising (it works fine for one line):
oldline = ' 82, 41.102548189607, 54.564575695695, 50.0'
newline = ' 81, 40.001, 49.9996, 49.9958'
for line in fileinput.input(inpfile, inplace=1):
    print line.replace(oldline, newline),
This is the code I wrote finally:
replacefile = open('temp.txt', 'r')
for line1 in replacefile:
    newline = line1.rstrip()
    rl = newline
    rl = rl.split()
    search = rl[0]
    with open(inpfile) as input:
        intable = False
        for line in input:
            fill = []
            if line.strip() == "*NODE":
                intable = True
            if line.strip() == "---------------------------------------------------------------":
                intable = False
            if intable:
                templine = (line.rstrip())
                tl = line.rstrip()
                tl = tl.split()
                if tl[0] == search:
                    oldline = templine
                    for line2 in fileinput.input(inpfile, inplace=1):
                        line2.replace(oldline, newline)
But I couldn't get the output; the contents of actualdata.txt are getting deleted. Help me with this.
The output I want is actualdata.txt changed like this:
-------lines above------
81, 40.001, 49.9996, 49.9958
82, 41.1034, 54.5636, 49.9958
83, 44.2582, 58.1856, 49.9959
84, 48.7511, 59.9199, 49.9957
85, 53.468166336575, 59.379329520912, 50.0
86, 57.445611860313, 56.675542227082, 50.0
87, 59.701750075154, 52.424055585018, 50.0
88, 59.725876387298, 47.674633684987, 50.0
89, 57.511209176153, 43.398353484768, 50.0
90, 53.558991157616, 40.654756186166, 50.0
-------lines below------
Use the fileinput module to replace lines in place:
import fileinput

def get_next_line(fptr):
    x = fptr.readline()
    if x != '':
        return x.strip(), x.strip().split()[0].strip()
    else:
        return '', ''

f = open("replace.txt", "r")
f_line, f_line_no = get_next_line(f)

for line in fileinput.input("actualdata.txt", inplace=True):
    if line.strip().split()[0].strip() == f_line_no:  # if line number matches
        print(f_line)                                 # write new line
        f_line, f_line_no = get_next_line(f)          # get next replacement line
    else:                                             # otherwise
        print(line.strip())                           # write original one
By the way, I am using Python 3. Make appropriate changes if you are using Python 2.
Is replace.txt also big? If not, you can load it into memory first, build a dictionary, and use it to replace lines in actualdata.txt.
Here's what I am doing:
First open replace.txt and build a dictionary. Since you are replacing lines by the first value of the line, we make that the dictionary key, and the value will be the line you want to replace it with. Like:
replacement_data = {
    '81': '81, 40.001, 49.9996, 49.9958',
    '82': '82, 41.1034, 54.5636, 49.9958',
    ...
    ...
}
Next we read the actualdata.txt file line by line and decide whether the first number of each line means it should be replaced. We split the line on ',', take the first field, and see if it is present in the replacement_data dictionary. If it is present, we replace the line; if not, we simply leave it as-is.
line = "83, 44.257790830341, 58.187003960661, 50.0"
first_char = line.split(',')[0].strip()  # first field is 83
# let's check whether to replace it or not
if first_char in replacement_data.keys():
    # if the key exists, we have to replace
    line = replacement_data[first_char]
print(line)  # so that it writes to file
Putting all pieces together:
import fileinput

inpfile = 'actualdata.txt'
replacement_file = 'replace.txt'
replacement_data = {}

with open(replacement_file) as f:
    for line in f:
        key = line.split(',')[0].strip()
        replacement_data[key] = line

for line in fileinput.input(inpfile, inplace=1):
    first_char = line.split(',')[0].strip()
    try:
        int(first_char)
        line = replacement_data[first_char]
        print(line, end='')
    except (ValueError, KeyError):
        print(line, end='')
        continue
It transforms the original file to:
--------lines above--------
81, 40.001, 49.9996, 49.9958
82, 41.1034, 54.5636, 49.9958
...
...
86, 57.4443, 56.6743, 49.9959
87, 59.7, 52.4234, 49.9958
88, 59.725876387298, 47.674633684987, 50.0
89, 57.511209176153, 43.398353484768, 50.0
...
99, 4.9568005100444, 15.000751982196, 50.0
------lines below----------