How to delete particular content from a file using Python

I have the following JSON file which contains extra content (the first and last lines), because of which I am unable to load it as JSON. I want to edit this file using Python so that only the content inside the {} braces remains and "source.value(" and ");" are removed.
source.value(
{Meli:1,jack:3,rustin:4}
);
import json

with open('check.json', 'rb') as g:
    b = json.load(g)
Traceback (most recent call last):
File "<input>", line 2, in <module>
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I'm not sure how best to generalize for your specific problem, since a lot could change depending on how these files are generated.
First off, the JSON format requires names to be in double quotes: {"Meli":1,"jack":3,"rustin":4}
Assuming that is fixed, we open the file, read its lines and only pass the second line to json. Remember Python indexes from 0.
import json

with open('test.json', 'r') as file:
    lines = file.readlines()
    data = json.loads(lines[1])
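If the wrapper is always literally source.value( ... );, a slightly more general sketch (an assumption on my part, and still requiring the names to be double-quoted as noted above) is to strip the wrapper from the whole file instead of relying on line positions:
import json

# Sketch: assumes the file always looks like  source.value( {...} );
with open('check.json', 'r') as g:
    raw = g.read().strip()
if raw.startswith('source.value(') and raw.endswith(');'):
    raw = raw[len('source.value('):-len(');')]   # keep only what is inside the parentheses
data = json.loads(raw)
print(data)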

Related

Unable to read json file with python

I am reading a JSON file with Python using the code below:
import json

Ums = json.load(open('commerceProduct.json'))
for um in Ums:
    des = um['description']
    if des == None:
        um['description'] = "Null"
    with open("sample.json", "w") as outfile:
        json.dump(um, outfile)
    break
It is giving me the following error:
Traceback (most recent call last):
File "test.py", line 2, in <module>
Ums = json.load(open('commerceProduct.json'))
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 5528 (char 5527)
When I check the JSON file, it looks fine.
The thing is, it has one object per line, with '\n' as the delimiter.
It is not corrupted, since I have imported the same file into Mongo.
Can someone please suggest what could be wrong with it?
Thanks.
Your JSON data is not in a valid format; a single mistake will trip up the Python parser. Run your JSON data through a validator and make sure it is correctly formatted.
The return _default_decoder.decode(s) line shows up in the traceback when the parser finds something wrong in your JSON.
Your code is valid and will work with a valid JSON document.
You have one JSON object per line? That's not a valid JSON file; it's newline-delimited JSON, so consider using the ndjson package to read it. It has the same API as the json package you are familiar with.
import ndjson
Ums = ndjson.load(open('commerceProduct.json'))
...
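If you would rather avoid the extra dependency, a minimal standard-library sketch of the same idea (assuming one JSON object per line, as described) is to parse each line separately:
import json

# Newline-delimited JSON: parse each non-empty line on its own.
with open('commerceProduct.json') as f:
    Ums = [json.loads(line) for line in f if line.strip()]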

`r+`, `a+`, `w+` for reading and writing a file in Python

I have a file named dict_file.json in the current folder containing an empty dict: {}
I want to open it so that I can both read and modify the content. That is, I will read the JSON file as a dict, modify that dict, and write it back to the JSON file.
I tried r and r+ below:
import json
f = open('dict_file.json', 'r')  # same output for r+
json.loads(f.read())
This prints
{}
When I tried w+:
f = open('dict_file.json', 'w+')
this first clears the file. Then,
json.loads(f.read())
gives this error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "F:\ProgramFiles\Python37\lib\json\__init__.py", line 296, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "F:\ProgramFiles\Python37\lib\json\__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "F:\ProgramFiles\Python37\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "F:\ProgramFiles\Python37\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
When I tried a+:
f = open('dict_file.json', 'a+')
json.loads(f.read())
gives this error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "F:\ProgramFiles\Python37\lib\json\__init__.py", line 296, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "F:\ProgramFiles\Python37\lib\json\__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "F:\ProgramFiles\Python37\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "F:\ProgramFiles\Python37\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Though, it does not clear the file.
So, I guess I should be using r+ for my use case. I also tried doing both the reading and the writing:
f = open('dict_file.json', 'r+')
a = json.loads(f.read())
a = {'key': 'value'}  # this involves complex logic instead of plain assignment
json.dump(a, f)
f.flush()
Now when I open the file, its contents are weird:
{}{'key':'value'}
whereas I want it to be:
{'key':'value'}
So I have a few questions:
Q1. Why do w+ and a+ give an error?
Q2. Why does the file contain {}{'key':'value'} instead of {'key':'value'}?
Q3. How do I correctly do the reading and writing (with or without a with block)?
PS: I am reading the file, then running a loop which computes a new dict and writes it to the file. The loop then sleeps for some time and repeats. That's why I felt flush() would be appropriate here: I open the file once outside the loop and only flush inside the loop, with no need to open the file on each iteration.
Why do w+ and a+ give an error?
Because the file pointer is at the end of the file, where there's no JSON object to read.
its contents are weird
It seems you're not resetting the file to an actual valid JSON object between your tests, so you've managed to clear the file, read nothing, maybe write some other object, clear it again, then append one or two objects to it, and so on.
Keep it simple. Start over:
filename = "data.json"
with open(filename) as f:
data = json.load(f)
data["foo"] = "bar"
with open(filename, "w") as f:
json.dump(data, f)
You have to think in terms of what the operations do to a file when they open it:
r and r+ open a file for reading. This means the contents of the file are intact and the file pointer is at the beginning, so the entire file is visible to the reader. After reading, the file pointer will be at the end, meaning that you're effectively appending at that point.
w and w+ open a file for writing. That means the contents of the file are truncated. There is nothing to read in, since the original contents are destroyed.
a and a+ open the file for appending. The contents of the file are unchanged; however, the file pointer is at the end, so a reader will see no data and raise an error (a short sketch of this follows the list).
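A small sketch of that pointer behaviour (assuming dict_file.json still contains {}): with a+ a read returns nothing until you seek back to the start.
import json

f = open('dict_file.json', 'a+')  # contents preserved, pointer at the end
print(repr(f.read()))             # '' -> nothing left to read from here
f.seek(0)                         # move the pointer back to the beginning
print(json.loads(f.read()))       # {} -> now the whole file is visible
f.close()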
The correct way to do this is to open the file twice, and use with blocks to do it:
with open('dict_file.json', 'r') as f:
    a = json.load(f)

# you don't need the file to be open here
a = {'key': 'value'}

with open('dict_file.json', 'w') as f:
    json.dump(a, f)
You might be tempted to open the file in read-write mode (r+), read it, rewind it so the pointer returns to the beginning, and then overwrite. There are two reasons not to do this:
Not every stream is rewindable.
If the contents of the file decrease in size, you will end up with trash at the end. You would need to truncate to the current pointer after writing, which is an unnecessary layer of complexity.
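For completeness, a hedged sketch of that r+ rewind approach (shown only to illustrate where truncate() is needed, not as a recommendation):
import json

with open('dict_file.json', 'r+') as f:
    a = json.load(f)       # pointer is now at the end of the file
    a = {'key': 'value'}   # replace/modify the data
    f.seek(0)              # rewind before overwriting
    json.dump(a, f)
    f.truncate()           # drop any leftover bytes if the new data is shorter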

How to pass a variable name directly as a dict value with json.loads in Python?

I'm making a script that writes another script; arguments are passed as JSON when calling the script from the terminal.
The script that needs to be written contains a dictionary.
One of the values in this dict is a variable name (not a string) called strategy.
My problem looks like this:
d = json.loads(sys.argv[2])
# d should look like this:
d = {
    "stopLossValue": 5,
    "strategy": strategy,
    "strategyTitle": "week5"
}
dic = """
parameterDict = {}
""".format(json.dumps(d, sort_keys=True, indent=4))
Running the script returns an error that disappears if I set the strategy value as a string.
Error:
Traceback (most recent call last):
File "updateCandleStrategy.py", line 11, in <module>
d = json.loads(sys.argv[2])
File "/usr/lib/python3.4/json/__init__.py", line 318, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.4/json/decoder.py", line 343, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.4/json/decoder.py", line 361, in raw_decode
raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 506 (char 505)
Is there a simple way to achieve my goal?
Thanks
When passing a JSON string from the command line, there's a good chance that one of the quotes or escape characters will be interpreted by the underlying shell.
So that's not a viable/reliable way to pass JSON strings. Pass a file containing the JSON data instead and read it:
with open(sys.argv[2]) as f:
    d = json.load(f)
Example from a Windows console, just printing the second argument:
S:\python>foo.py ff "d = {"s":12,"d":15}"
d = {s:12,d:15}
The quotes have been removed; you would need to double them.
On a Linux terminal, wrapping your argument in single quotes solves most situations, until you stumble on a value containing a single quote...
Instead of passing a dictionary, why not use getopt or argparse and build/parse a proper command line?
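For example, a minimal argparse sketch (the option names here are assumptions for illustration, not from the original script) avoids the quoting problem entirely:
import argparse

# Hypothetical option names, shown only to illustrate the idea.
parser = argparse.ArgumentParser()
parser.add_argument('--stop-loss-value', type=int, required=True)
parser.add_argument('--strategy', required=True)
parser.add_argument('--strategy-title', required=True)
args = parser.parse_args()

d = {
    "stopLossValue": args.stop_loss_value,
    "strategy": args.strategy,
    "strategyTitle": args.strategy_title,
}
The script would then be called as, e.g., python updateCandleStrategy.py --stop-loss-value 5 --strategy myStrategy --strategy-title week5.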

"json.decoder.JSONDecodeError: Extra data: line 1 column 5287" but there is nothing there

I have a series of .json files. Each file contains tweets based on a different keyword. Each line in every file is a json object. I read the files using the following code:
# Get tweets out of JSON file
tweetsFromJSON = []
with open(json_file) as f:
for line in f:
json_object = json.loads(line)
tweet_text = json_object["text"]
tweetsFromJSON.append(tweet_text)
For every JSON file I have this works flawlessly. But this particular file gives me the following error:
Traceback (most recent call last):
File "C:/Users/alexandros/Dropbox/Development/Sentiment Analysis/lda_analysis.py", line 119, in <module>
lda_analysis('precision_medicine.json', 'precision medicine')
File "C:/Users/alexandros/Dropbox/Development/Sentiment Analysis/lda_analysis.py", line 46, in lda_analysis
json_object = json.loads(line)
File "C:\Users\alexandros\AppData\Local\Programs\Python\Python35-32\lib\json\__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "C:\Users\alexandros\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 342, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 5287 (char 5286)
So I tried removing the first line to see what happens. The error persists, and again it's in the exact same position (line 1 column 5287 (char 5286)). I removed another line and it's the same. I'm racking my brain trying to figure out what's wrong. What am I missing?
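One way to narrow this down (a diagnostic sketch, not an answer from the original thread) is to ask the decoder where the first valid JSON value on each line ends and print whatever follows it:
import json

decoder = json.JSONDecoder()
with open('precision_medicine.json') as f:
    for lineno, line in enumerate(f, start=1):
        stripped = line.strip()
        if not stripped:
            continue
        # raw_decode parses one JSON value and reports the index where it stopped.
        obj, end = decoder.raw_decode(stripped)
        extra = stripped[end:].strip()
        if extra:
            # This is the "extra data" the exception is complaining about.
            print(lineno, repr(extra[:80]))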

Read filenames from a textfile in python (double backslash issue)

I am trying to read a list of files from a text file. I am using the following code to do that:
import os
import numpy as np

filelist = input("Please Enter the filelist: ")
flist = open(os.path.normpath(filelist), "r")
fname = []
for curline in flist:
    # check if it's a comment - do comment parsing in this if block
    if curline.startswith('#'):
        continue
    fname.append(os.path.normpath(curline))
flist.close()  # close the list file

# read the slave files 100MB at a time to generate stokes vectors
tmp = fname[0].rstrip()
t = np.fromfile(tmp, dtype='float', count=100*1000)
This works perfectly fine and I get the following array:
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin\n'
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin\n'
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\q_HH_slv3_08Oct2012.bin\n'
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\q_VV_slv3_08Oct2012.bin'
The problem is that the '\' character is escaped and there is a trailing '\n' in the strings. I used str.rstrip() to get rid of the '\n' - this works, but leaves the problem of the double backslashes.
I have used the following approaches to try to get rid of them:
Used codecs.unicode_escape_decode(), but I get this error:
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 56-57: malformed \N character escape. Clearly this is not the right approach, because I just want to decode the backslashes, not the rest of the string.
This does not work either: tmp = fname[0].rstrip().replace(r'\\','\\')
Is there no way to make readline() read a raw string?
UPDATE:
Basically I have a text file with 4 file names I would like to open and read data from in python. The text file contains:
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\i_HH_mst_08Oct2012.bin
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\i_HH_mst_08Oct2012.bin
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\q_HH_slv3_08Oct2012.bin
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\q_VV_slv3_08Oct2012.bin
I would like to open each file one by one and read 100MBs of data from them.
When I use this command: np.fromfile(flist[0], dtype='float', count=100) I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'H:\\Shaunak\\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin'
Update
Full Traceback:
Please Enter the filelist: H:/Shaunak/TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri/NEST_oregistration/Glacier_coreg_Cnv/filelist.txt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "G:\WinPython-32bit-3.3.2.3\python-3.3.2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 581, in runfile
execfile(filename, namespace)
File "G:\WinPython-32bit-3.3.2.3\python-3.3.2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 41, in execfile
exec(compile(open(filename).read(), filename, 'exec'), namespace)
File "H:/Shaunak/Programs/Arnab_glacier_vel/Stokes_generation_2.py", line 28, in <module>
t = np.fromfile(tmp,dtype='float',count=100*1000)
FileNotFoundError: [Errno 2] No such file or directory: 'H:\\Shaunak\\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin'
>>>
As @volcano stated, the double backslash is only Python's internal representation. If you print the string, it's gone; the same applies when you write it to a file: there will only be one '\'.
>>> string_with_double_backslash = "Here is a double backslash: \\"
>>> print(string_with_double_backslash)
Here is a double backslash: \
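In other words, the paths only need the trailing newline stripped; no backslash unescaping is required. A minimal sketch (assuming the list file is the filelist.txt quoted above):
import os

with open('filelist.txt') as flist:
    fname = [os.path.normpath(line.strip()) for line in flist
             if line.strip() and not line.startswith('#')]

print(fname[0])        # single backslashes, as stored in the file
print(repr(fname[0]))  # repr shows the doubled backslashes again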
Try this:
import codecs

a_escaped = 'attachment; filename="Nuovo Cinema Paradiso 1988 Director\\\'s Cut"'
a_unescaped = codecs.getdecoder("unicode_escape")(a_escaped)[0]
yielding:
'attachment; filename="Nuovo Cinema Paradiso 1988 Director\'s Cut"'
