Fast way to update json lines file in python [duplicate] - python

This question already has answers here:
Search and replace a line in a file in Python
(13 answers)
Closed 3 years ago.
I have a JSON Lines file like the one below:
{"id":0,"country":"fr"}
{"id":1,"country":"en"}
{"id":2,"country":"fr"}
{"id":3,"country":"fr"}
I have a list of codes, and I want to assign a code to each user by updating the file's lines.
The result should be the following:
{"id":0,"country":"fr", code:1}
{"id":1,"country":"en", code:2}
{"id":2,"country":"fr", code:3}
{"id":3,"country":"fr", code:4}
This is how I do it now:
import ujson
from tempfile import mkstemp

fh, abs_path = mkstemp()
with open(fh, 'w') as tmp_file:
    with open(shooting.segment_filename) as segment_filename:
        for line in segment_filename:
            enriched_line = ujson.loads(line)
            code = compute_code()
            if code:
                enriched_line["code"] = code
            tmp_file.write(ujson.dumps(enriched_line) + '\n')
My question is: is there a faster way to do this? Maybe via a Linux command launched through sarge, for example? Or any Pythonic way that avoids reading, rewriting and replacing the original file?
Thank you!

For performance you can skip the JSON serialization / deserialization step completely and just splice your code in before the closing brace. (Note that, like your expected output, this produces an unquoted key, which is not strictly valid JSON.)
So this should perform much better:
content = ""
with open("inp.txt", "r") as inp:
    for line in inp:
        # drop the newline and the closing brace, splice the code in, close the brace
        content += line.rstrip("\n")[:-1] + ", code:%s}\n" % compute_code()
with open("inp.txt", "w") as out:
    out.write(content)
EDIT:
If you don't want to load the whole file into memory you can do something like this.
with open("inp.txt", "r") as inp, open("out.txt", "w") as out:
    for line in inp:
        out.write(line.rstrip("\n")[:-1] + ", code:%s}\n" % compute_code())
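If the output is meant to replace the original file, the streaming approach can be combined with a temporary file and an atomic rename. This is a sketch, not the answer's own code: `compute_code()` is a hypothetical stand-in for the question's function, and the sample data mirrors the question's lines. Note the splice removes both the newline and the closing brace before re-appending them.

```python
import os
import tempfile

def compute_code():
    # hypothetical stand-in for the question's compute_code()
    return 1

# sample input (mirrors the question's data)
src = "inp.txt"
with open(src, "w") as f:
    f.write('{"id":0,"country":"fr"}\n{"id":1,"country":"en"}\n')

# stream into a temp file in the same directory, then swap it in
fd, tmp_path = tempfile.mkstemp(dir=".")
with os.fdopen(fd, "w") as out, open(src) as inp:
    for line in inp:
        # drop the newline and the closing brace, splice the code in, close the brace
        out.write(line.rstrip("\n")[:-1] + ", code:%s}\n" % compute_code())
os.replace(tmp_path, src)  # atomic on POSIX: readers see the old or new file, never a mix
```

Because `os.replace` is atomic on POSIX filesystems, another process reading the file never observes a half-written state.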

I do not know if this will satisfy you but here is some "cleaner" code:
import json

with open(shooting.segment_filename, "r") as f:
    data = [json.loads(line) for line in f]
for json_line in data:
    code = compute_code()
    if code:
        json_line["code"] = code
# Will overwrite the source file; you might want to point it at a bogus file to test first
with open(shooting.segment_filename, "w") as f:
    f.write("\n".join(json.dumps(elem) for elem in data))

Related

Reading Two Files and Writing To One File Using Python3

I'm currently using Python 3 on Ubuntu 18.04. I'm not a programmer by any means, and I'm not asking for a code review; however, I'm having an issue that I can't seem to resolve.
I have 1 text file named content.txt that I'm reading lines from.
I have 1 text file named standard.txt that I'm reading lines from.
I have 1 text file named outfile.txt that I'm writing to.
content = open("content.txt", "r").readlines()
standard = open("standard.txt", "r").readlines()
outfile = "outfile.txt"
outfile_set = set()
with open(outfile, "w") as f:
    for line in content:
        if line not in standard:
            outfile_set.add(line)
    f.writelines(sorted(outfile_set))
I'm not sure where to put the following line though. My for loop nesting may all be off:
f.write("\nNo New Content")
Any code examples to make this work would be most appreciated. Thank you.
If I understand correctly, you want to write outfile_set to the outfile if it is not empty, and otherwise write the string "\nNo New Content".
Replace the line
f.writelines(sorted(outfile_set))
with
if any(outfile_set):
    f.writelines(sorted(outfile_set))
else:
    f.write("\nNo New Content")
I'm assuming that you want to write "No new content" to the file if every line in content is in standard. So you might do something like:
with open(outfile, "w") as f:
    for line in content:
        if line not in standard:
            outfile_set.add(line)
    if len(outfile_set) > 0:
        f.writelines(sorted(outfile_set))
    else:
        f.write("\nNo New Content")
Your original code was almost there!
You can reduce your runtime a lot by using set/frozenset:
with open("content.txt", "r") as f:
    content = frozenset(f.readlines())   # only get distinct values from file
with open("standard.txt", "r") as f:
    standard = frozenset(f.readlines())  # only get distinct values from file
# only keep what's in content but not in standard
outfile_set = sorted(content - standard)  # set difference, no loops or tests needed
with open("outfile.txt", "w") as outfile:
    if outfile_set:
        outfile.writelines(outfile_set)
    else:
        outfile.write("\nNo New Content")
You can read more about it here:
an overview of the set operators (a Python 2 page, but still valid for Python 3; I can't find the same overview in the Python 3 documentation)
set difference
Demo:
# Create files
with open("content.txt", "w") as f:
    for n in map(str, range(1, 10)):  # use range(1, 10, 2) for no changes
        f.write(n + "\n")
with open("standard.txt", "w") as f:
    for n in map(str, range(1, 10, 2)):
        f.write(n + "\n")
# Process files:
with open("content.txt", "r") as f:
    content = set(f.readlines())
with open("standard.txt", "r") as f:
    standard = set(f.readlines())
# only keep what's in content but not in standard
outfile_set = sorted(content - standard)
with open("outfile.txt", "w") as outfile:
    if outfile_set:
        outfile.writelines(outfile_set)
    else:
        outfile.write("\nNo New Content")
with open("outfile.txt") as f:
    print(f.read())
Output:
2
4
6
8
or
No New Content

Inconsistent return values when using regex functions [duplicate]

This question already has answers here:
Why can't I call read() twice on an open file?
(7 answers)
Python : The second for loop is not running
(1 answer)
Closed 4 years ago.
My code is behaving strangely, and I have a feeling it has to do with the regular expressions I'm using.
I'm trying to determine the number of total words, number of unique words, and number of sentences in a text file.
Here is my code:
import sys
import re

file = open('sample.txt', 'r')

def word_count(file):
    words = []
    reg_ex = r"[A-Za-z0-9']+"
    p = re.compile(reg_ex)
    for l in file:
        for i in p.findall(l):
            words.append(i)
    return len(words), len(set(words))

def sentence_count(file):
    sentences = []
    reg_ex = r'[a-zA-Z0-9][.!?]'
    p = re.compile(reg_ex)
    for l in file:
        for i in p.findall(l):
            sentences.append(i)
    return sentences, len(sentences)

sentence, sentence_count = sentence_count(file)
word_count, unique_word_count = word_count(file)
print('Total word count: {}\n'.format(word_count) +
      'Unique words: {}\n'.format(unique_word_count) +
      'Sentences: {}'.format(sentence_count))
The output is the following:
Total word count: 0
Unique words: 0
Sentences: 5
What is really strange is that if I comment out the sentence_count() function, the word_count() function starts working and outputs the correct numbers.
Why is this inconsistency happening? If I comment out either function, one will output the correct value while the other will output 0's. Can someone help me such that both functions work?
The issue is that you can only iterate over an open file once. You need to either reopen or rewind the file to iterate over it again.
For example:
with open('sample.txt', 'r') as f:
    sentence, sentence_count = sentence_count(f)
with open('sample.txt', 'r') as f:
    word_count, unique_word_count = word_count(f)
Alternatively, f.seek(0) would rewind the file.
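A minimal sketch of the rewind behavior (the file name and content are made up for illustration): a second read on an exhausted file returns nothing until `seek(0)` resets the read position.

```python
# create a small sample file (hypothetical content, for illustration)
with open('sample.txt', 'w') as f:
    f.write('One sentence. Another sentence.\n')

with open('sample.txt', 'r') as f:
    first_pass = f.read()   # consumes the file; the position is now at the end
    assert f.read() == ''   # a second read returns nothing more
    f.seek(0)               # rewind to the beginning
    second_pass = f.read()  # the content is available again
```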
Make sure to open and close your file properly. One way you can do this is by saving all the text first.
with open('sample.txt', 'r') as f:
    file = f.read()
The with statement can be used to open and safely close the file handle. Since you would have extracted all the contents into file, you don't need the file open anymore.
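With the whole text in memory, both counts can be computed from the string without worrying about file position at all, since a string (or the list from `splitlines()`) can be iterated any number of times. A sketch with made-up sample text, reusing the question's two regexes:

```python
import re

# hypothetical sample text standing in for the contents of sample.txt
text = "One sentence. Another one! A third?\n"
lines = text.splitlines()  # a list, so it can be iterated repeatedly

word_re = re.compile(r"[A-Za-z0-9']+")       # the question's word pattern
sentence_re = re.compile(r'[a-zA-Z0-9][.!?]')  # the question's sentence pattern

words = [w for line in lines for w in word_re.findall(line)]
sentences = [s for line in lines for s in sentence_re.findall(line)]
print(len(words), len(set(words)), len(sentences))
```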

Concat/Append to the end of a specific line of text file in Python

I need to add some text to the end of a specific line in a text file. I'm currently trying to use a method similar to this:
entryList = [5,4,3,8]
dataFile = open("file.txt","a+")
for i in dataFile:
    for j in entryList:
        lines[i] = lines[i].strip()+entryList[j]+" "
        dataFile.write(lines[i])
I'd like to add the numbers immediately following the text.
Text file setup:
it is
earlier it was
before that it was
later it will be
You have mentioned a specific line in the question, but in your code you are writing to every line.
import fileinput

entryList = [5,4,3,8]
count = 0
for line in fileinput.FileInput('data.txt', inplace=1):
    # strip the newline first, otherwise the number lands on its own line
    print(line.rstrip('\n') + str(entryList[count]) + " ")
    count += 1
Reading from and then writing to the same file is not such a good idea. I suggest opening it as needed:
entryList = [5,4,3,8]
with open("file.txt", "r") as dataFile:
    # You may want to add '\n' at the end of the format string
    lines = ["{} {} ".format(s.strip(), i) for s, i in zip(dataFile, entryList)]
with open("file.txt", "w") as outFile:
    outFile.writelines(lines)
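One caveat worth knowing: `zip` stops at the shorter of its arguments, so if the file has more lines than entryList, the extra lines are silently dropped from the rewritten file. A small sketch of that behavior with hypothetical data (the lines mirror the question's text file plus one extra):

```python
entry_list = [5, 4, 3, 8]
lines = ["it is\n", "earlier it was\n", "before that it was\n",
         "later it will be\n", "an extra fifth line\n"]

# zip pairs elements until the shorter iterable runs out
paired = ["{} {} ".format(s.strip(), i) for s, i in zip(lines, entry_list)]
print(len(paired))  # only 4 pairs; the fifth line is dropped
```

If every line must survive, pad entryList to the file's length first or handle the leftover lines explicitly.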

Save each line as separate .txt file using Notepad++

I am using Notepad++ to restructure some data. Each .txt file has 99 lines. I am trying to run a python script to create 99 single-line files.
Here is the .py script I am currently running, which I found in a previous thread on the topic. I'm not sure why, but it isn't quite doing the job:
yourfile = open('filename.TXT', 'r')
counter = 0
magic = yourfile.readlines()
for i in magic:
    counter += 1
    newfile = open(('filename_' + str(counter) + '.TXT'), "w")
    newfile.write(i)
    newfile.close()
When I run this particular script, it simply creates a copy of the host file, and it still has 99 lines.
You may want to change the structure of your script a bit:
with open('filename.txt', 'r') as f:
    for i, line in enumerate(f):
        with open('filename_{}.txt'.format(i), 'w') as wf:
            wf.write(line)
In this format you have the benefit of relying on context managers to close your file handle, and you also don't have to read everything in separately; there is a better logical flow.
You can use the following piece of code to achieve that. It's commented, but feel free to ask.
# reading info from infile with 99 lines
infile = 'filename.txt'
# using a context handler to open infile and read the lines
with open(infile, 'r') as f:
    lines = f.readlines()
# initializing counter
counter = 0
# for each line, create a new file and write the line to it
for line in lines:
    # define outfile name
    outfile = 'filename_' + str(counter) + '.txt'
    # create outfile and write line
    with open(outfile, 'w') as g:
        g.write(line)
    # add +1 to counter
    counter += 1
magic = yourfile.readlines(99)
Please try removing the '99', like this:
magic = yourfile.readlines()
I tried it and got 99 files, each containing a single line.
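The reason removing the argument helps: the optional argument to readlines() is a size hint in bytes/characters, not a line count. Lines are read until the total size of what has been read so far meets or exceeds the hint, so readlines(99) stops after roughly 99 characters' worth of complete lines. A sketch with a made-up file:

```python
# write 10 lines of 5 characters each ("xxxx\n")
with open('hint_demo.txt', 'w') as f:
    f.writelines('xxxx\n' for _ in range(10))

with open('hint_demo.txt') as f:
    some = f.readlines(12)     # 12 is a size hint, NOT a number of lines
with open('hint_demo.txt') as f:
    all_lines = f.readlines()  # no hint: read every line

print(len(some), len(all_lines))
```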

Insert string at the beginning of each line

How can I insert a string at the beginning of each line in a text file, I have the following code:
f = open('./ampo.txt', 'r+')
with open('./ampo.txt') as infile:
    for line in infile:
        f.insert(0, 'EDF ')
f.close
I get the following error:
'file' object has no attribute 'insert'
Python comes with batteries included:
import fileinput
import sys

for line in fileinput.input(['./ampo.txt'], inplace=True):
    sys.stdout.write('EDF {l}'.format(l=line))
Unlike the solutions already posted, this also preserves file permissions.
You can't modify a file inplace like that. Files do not support insertion. You have to read it all in and then write it all out again.
You can do this line by line if you wish. But in that case you need to write to a temporary file and then replace the original. So, for small enough files, it is just simpler to do it in one go like this:
with open('./ampo.txt', 'r') as f:
    lines = f.readlines()
lines = ['EDF ' + line for line in lines]
with open('./ampo.txt', 'w') as f:
    f.writelines(lines)
Here's a solution where you write to a temporary file and move it into place. You might prefer this version if the file you are rewriting is very large, since it avoids keeping the contents of the file in memory, as versions that involve .read() or .readlines() will. In addition, if there is any error in reading or writing, your original file will be safe:
from shutil import move
from tempfile import NamedTemporaryFile

filename = './ampo.txt'
tmp = NamedTemporaryFile(delete=False)
with open(filename) as finput:
    with open(tmp.name, 'w') as ftmp:
        for line in finput:
            ftmp.write('EDF ' + line)
move(tmp.name, filename)
For a file not too big:
with open('./ampo.txt', 'r+') as f:
    x = f.read()
    f.seek(0, 0)
    f.writelines(('EDF ', x.replace('\n', '\nEDF ')))
    f.truncate()
Note that, in this particular case (the content grows), the f.truncate() is not strictly necessary: the new content is at least as long as the old, so nothing is left over past what we write. But closing a file does not shrink it. If the new content were shorter than the original, the tail of the old content would remain in the file after the write, and truncate() is exactly what cuts the file off at the current position. So it is safer to keep the call anyway.
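A small sketch of why truncate() matters when the new content is shorter (file name and contents are made up for illustration):

```python
# create a file with a long line
with open('demo.txt', 'w') as f:
    f.write('a long original line\n')

with open('demo.txt', 'r+') as f:
    f.write('short\n')           # overwrite from the start, WITHOUT truncating
with open('demo.txt') as f:
    leftover = f.read()          # the tail of the old content is still there

with open('demo.txt', 'r+') as f:
    f.write('short\n')
    f.truncate()                 # cut the file off at the current position
with open('demo.txt') as f:
    clean = f.read()

print(repr(leftover), repr(clean))
```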
For a big file, to avoid putting all of its content in RAM at once:
import os

def addsomething(filepath, ss):
    # build a temporary file name next to the original
    if filepath.rfind('.') > filepath.rfind(os.sep):
        a, _, c = filepath.rpartition('.')
        tempi = a + 'temp.' + c
    else:
        tempi = filepath + 'temp'
    with open(filepath, 'r') as f, open(tempi, 'w') as g:
        g.writelines(ss + line for line in f)
    os.remove(filepath)
    os.rename(tempi, filepath)

addsomething('./ampo.txt', 'WZE')
f = open('./ampo.txt', 'r')
lines = ['EDF ' + l for l in f.readlines()]
f.close()
f = open('./ampo.txt', 'w')
f.writelines(lines)
f.close()
