Python string matching in if else condition

I am currently trying to find lines that match a particular pattern. If a line matches the pattern, I want to write it to an output file.
Here are example lines from "in.txt":
in_file [0:2] declk
out_file [0:1] subclk
The script I currently have, with help from gilch:
#!/usr/bin/python
import re
with open("in.txt", "r+") as f:
    with open("out.txt", "w+") as fo:
        for line in f:
            if "\S*\s*[\d:\d]\s*\S*" in line:
                fo.write(line) #need to fix this line
Also, is it possible to expand each range so that every index gets its own line, like below?
Output in "out.txt":
in_file [0] declk
in_file [1] declk
in_file [2] declk
out_file [0] subclk
out_file [1] subclk

You'll need to import the re module to use regex.
import re
with open("out.txt", "w+") as fo:
    for line in f:
        if re.match(r"\S*\s*\[-?\d*:?-?\d*\]\s*\S*", line):
            fo.write(line)
Also, indentation is part of Python's syntax. The colon isn't enough.
This also assumes that f is already some iterable containing your lines. (The above code never assigns to it.)
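The question's `if "..." in line` never fires because `in` performs a literal substring test on the pattern text. A short illustration on one of the sample lines (the variable names are only for this demo):

```python
import re

line = "in_file [0:2] declk\n"

# The question's test: `in` checks whether the pattern TEXT occurs literally in the line.
question_pattern = r"\S*\s*[\d:\d]\s*\S*"
literal_hit = question_pattern in line

# re.match() treats the string as a regular expression and matches from the start.
answer_pattern = r"\S*\s*\[-?\d*:?-?\d*\]\s*\S*"
regex_hit = re.match(answer_pattern, line) is not None

# Unescaped [\d:\d] is a character class: it matches ONE digit or ":" character,
# not the bracketed text "[0:2]". That is why the answers escape the brackets.
first_class_char = re.search(r"[\d:\d]", line).group()

print(literal_hit, regex_hit, first_class_char)  # prints: False True 0
```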

Try this:
import re
with open("out.txt", "w+") as fo:
    for line in f:
        if re.match(r"\w+\s\[\d\:\d\]\s\w+",line):
            fo.write(line)
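Neither answer covers the second part of the question (one output line per index). A minimal sketch, assuming every line of interest has exactly the form `name [start:end] clock` with non-negative integer bounds; `expand_line` and `expand_file` are hypothetical helper names:

```python
import re

# Assumed line format: "<name> [<start>:<end>] <clock>", e.g. "in_file [0:2] declk".
LINE_RE = re.compile(r"^(\S+)\s*\[(\d+):(\d+)\]\s*(\S+)\s*$")

def expand_line(line):
    """Return one output line per index in the bracketed range, or [] if no match."""
    m = LINE_RE.match(line)
    if not m:
        return []
    name, clk = m.group(1), m.group(4)
    start, end = int(m.group(2)), int(m.group(3))
    step = 1 if end >= start else -1  # also handle descending ranges such as [3:0]
    return ["{} [{}] {}\n".format(name, i, clk) for i in range(start, end + step, step)]

def expand_file(in_path, out_path):
    """Write the expanded form of every matching line of in_path to out_path."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fout.writelines(expand_line(line))
```

With the sample input, `expand_file("in.txt", "out.txt")` would write the five lines shown in the question; lines that don't match the pattern are skipped.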

Related

Python sort and delete duplicates in list and use re.sub

I am totally new to Python.
I am trying to write the equivalent of this bash command: cat domains.txt |sort -u|sed 's/^*.//g' > domains2.txt
The file domains.txt contains a list of domains, with and without the wildcard prefix *., like:
*.example.com
example2.org
It has about 300k+ lines.
I wrote this code:
infile = "domains.txt"
outfile = "2"
outfile2 = "3"
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        line = line.replace('*.', "")
        fout.write(line)
with open('2', 'r') as r, open(outfile2, "w") as fout2 :
    for line in sorted(r):
        print(line, end='',file=fout2)
It strips *. as planned and sorts the list, but it doesn't remove duplicate lines.
I was advised to use re.sub instead of replace to make the pattern stricter (anchored at the beginning of the line, as in my sed command), but when I tried this:
import re
infile = "domains.txt"
outfile = "2"
outfile2 = "3"
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        newline = re.sub('^*.', '', line)
        fout.write(newline)
with open('2', 'r') as r, open(outfile2, "w") as fout2 :
    for line in sorted(r):
        print(line, end='',file=fout2)
it just fails with errors that I don't understand.
In regular expressions, *, . and similar characters are special. You need to escape them to match them literally.
>>> import re
>>> s = "*.example.com"
>>> re.sub(r'^\*\.', '', s)
'example.com'
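The answer fixes the regex but not the duplicates. A minimal sketch of the whole pipeline, assuming one domain per line; `clean_domains` and `clean_file` are illustrative names. Unlike `sort -u | sed`, it strips the prefix before de-duplicating, so `*.example.com` and `example.com` collapse into a single entry:

```python
import re

# Anchored at the start of the line; "\*\." matches the literal two characters "*.".
WILDCARD_PREFIX = re.compile(r"^\*\.")

def clean_domains(lines):
    """Strip a leading "*." from each line, drop duplicates, and return a sorted list."""
    stripped = {WILDCARD_PREFIX.sub("", line.strip()) for line in lines}
    stripped.discard("")  # ignore blank lines
    return sorted(stripped)

def clean_file(infile, outfile):
    with open(infile) as fin:
        domains = clean_domains(fin)
    with open(outfile, "w") as fout:
        fout.writelines(d + "\n" for d in domains)
```

The set keeps all unique domains in memory, which should be fine for a few hundred thousand lines. Python's sorted() compares strings by code point, so the order can differ from `sort` under a non-C locale.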

make list from text file and compare the lists

full.txt contains:
www.example.com/a.jpg
www.example.com/b.jpg
www.example.com/k.jpg
www.example.com/n.jpg
www.example.com/x.jpg
partial.txt contains:
a.jpg
k.jpg
Why doesn't the following code produce the desired result?
with open ('full.txt', 'r') as infile:
    lines_full=[line for line in infile]
with open ('partial.txt', 'r') as infile:
    lines_partial=[line for line in infile]
with open ('remaining.txt', 'w') as outfile:
    for element in lines_full:
        if element[16:21] not in lines_partial: #element[16:21] means like a.jpg
            outfile.write (element)
The desired remaining.txt should contain exactly those lines of full.txt whose file names are not in partial.txt:
www.example.com/b.jpg
www.example.com/n.jpg
www.example.com/x.jpg
You can use the os.path module:
from os import path
with open ('full.txt', 'r') as f:
    lines_full = f.read().splitlines()
with open ('partial.txt', 'r') as f:
    lines_partial = set(f.read().splitlines()) # create set for faster checking
lines_new = [x + '\n' for x in lines_full if path.split(x)[1] not in lines_partial]
with open('remaining.txt', 'w') as f:
    f.writelines(lines_new)
This code keeps the newline character at the end of each line, which means the entries will never exactly match "a.jpg" or "k.jpg":
with open ('partial.txt', 'r') as infile:
    lines_partial=[line for line in infile]
Change it to
with open ('partial.txt', 'r') as infile:
    lines_partial=[line.rstrip('\n') for line in infile]
to get rid of the newline characters (line.rstrip('\n') removes a trailing newline if there is one; line[:-1], "without the last character of the line", also works but chops a real character off a final line that doesn't end in a newline).
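Combining the newline fix with a file-name lookup that doesn't depend on a fixed slice like element[16:21], the question's script could look like this sketch. `remaining` and `write_remaining` are illustrative names, and treating everything after the last "/" as the file name is an assumption about the URL format:

```python
def remaining(full_lines, partial_lines):
    """Return the lines of full_lines whose file name is not listed in partial_lines."""
    # Strip newlines so "a.jpg\n" and "a.jpg" compare equal; a set makes lookups fast.
    excluded = {line.rstrip("\n") for line in partial_lines}
    result = []
    for line in full_lines:
        # The text after the last "/" is the file name, whatever its length.
        name = line.rstrip("\n").rsplit("/", 1)[-1]
        if name not in excluded:
            result.append(line)
    return result

def write_remaining(full_path, partial_path, out_path):
    with open(full_path) as f_full, open(partial_path) as f_partial:
        kept = remaining(f_full, f_partial)
    with open(out_path, "w") as fout:
        fout.writelines(kept)
```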

Change values in CSV or text style file

I'm having some problems with the following file.
Each line has the following content:
foobar 1234.569 7890.125 12356.789 -236.4569 236.9874 -569.9844
What I want to do is flip the sign of the last three numbers, whether they are positive or negative.
The output should be:
foobar 1234.569 7890.125 12356.789 236.4569 -236.9874 569.9844
Or even better:
foobar,1234.569,7890.125,12356.789,236.4569,-236.9874,569.9844
What is the easiest pythonic way to accomplish this?
At first I used csv.reader, but it turned out the file is not tab-separated; the fields are separated by a varying number (3-5) of spaces.
I've read the CSV module documentation and some examples and similar questions here, but my knowledge of Python isn't that good, and the CSV module seems pretty tough when you want to edit a value in a row.
I can import and edit this in Excel with no problem, but I want to do it in a Python script, since I have hundreds of these files. VBA in Excel is not an option.
Would it be better to just regex each line?
If so, can someone point me in a direction with an example?
You can use str.split() to split your white-space-separated lines into a row:
row = line.split()
then use csv.writer() to create your new file.
str.split() with no arguments, or None as the first argument, splits on arbitrary-width whitespace and ignores leading and trailing whitespace on the line:
>>> 'foobar 1234.569 7890.125 12356.789 -236.4569 236.9874 -569.9844\n'.split()
['foobar', '1234.569', '7890.125', '12356.789', '-236.4569', '236.9874', '-569.9844']
As a complete script:
import csv
# Python 3: open the CSV output in text mode with newline='' (Python 2 used 'wb')
with open(inputfilename, 'r') as infile, open(outputcsv, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for line in infile:
        row = line.split()
        inverted_nums = [-float(val) for val in row[-3:]]
        writer.writerow(row[:-3] + inverted_nums)
from operator import neg
with open('file.txt') as f:
    for line in f:
        line = line.rstrip().split()
        # list(...) is needed in Python 3, where map() returns a lazy iterator
        last3 = list(map(str, map(neg, map(float, line[-3:]))))
        print("{0},{1}".format(line[0], ','.join(line[1:-3] + last3)))
Produces:
>>>
foobar,1234.569,7890.125,12356.789,236.4569,-236.9874,569.9844
CSV outputting version:
import csv
from operator import neg
with open('file.txt') as f, open('ofile.txt', 'w', newline='') as o:
    writer = csv.writer(o)
    for line in f:
        line = line.rstrip().split()
        last3 = list(map(neg, map(float, line[-3:])))
        writer.writerow(line[:-3] + last3)
You could use genfromtxt:
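import numpy as np
# dtype=None infers a type per column; encoding=None reads the text column as str
a = np.genfromtxt('foo.csv', dtype=None, encoding=None)
with open('foo.csv', 'w') as f:
    # atleast_1d also handles a one-line file, which genfromtxt returns as a 0-d array
    for rec in np.atleast_1d(a):
        fields = list(rec.tolist())
        fields[-3:] = [-x for x in fields[-3:]]  # flip the sign of the last three numbers
        f.write(','.join(str(x) for x in fields) + '\n')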
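All of the answers above convert the numbers with float, which can change how they are printed (for example, trailing zeros are dropped and very small values may come out in exponent notation, such as 1e-05). If the values should stay exactly as written, here is a sketch that flips the sign textually instead, assuming each of the last three fields is a plain decimal number with an optional sign; `flip_sign`, `line_to_csv` and `convert_file` are illustrative names:

```python
import csv
import io

def flip_sign(token):
    """Negate a numeric token textually, keeping its digits exactly as written."""
    if token.startswith("-"):
        return token[1:]
    if token.startswith("+"):
        return "-" + token[1:]
    return "-" + token

def line_to_csv(line):
    """Split on runs of whitespace, flip the last three fields, return one CSV row."""
    row = line.split()
    row[-3:] = [flip_sign(tok) for tok in row[-3:]]
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()

def convert_file(in_path, out_path):
    with open(in_path) as fin, open(out_path, "w", newline="") as fout:
        for line in fin:
            if line.strip():  # skip blank lines
                fout.write(line_to_csv(line))
```

One caveat of the textual approach: a zero is written as "-0" rather than "0".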

Join the content of files to one file

I have two files and I want to join the content of them into one file side-by-side, i.e., line n of the output file should consist of line n of file 1 and line n of file 2. The files have the same number of lines.
What I have so far:
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        fout.write(f1+f2)
but it gives this error:
TypeError: unsupported operand type(s) for +: 'file' and 'file'
What am I doing wrong?
I'd try itertools.chain() and work line by line (you open your files with "r", so I assume you are not reading binary files):
from itertools import chain
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        for line in chain(f1, f2):
            fout.write(line)
It works as a generator, so memory use stays low, even for huge files.
Edit
New requirements, new sample:
from itertools import zip_longest  # izip_longest in Python 2
separator = " "
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        for line1, line2 in zip_longest(f1, f2, fillvalue=""):
            line1 = line1.rstrip("\n")
            line2 = line2.rstrip("\n")
            fout.write(line1 + separator + line2 + "\n")
I added a separator string which is put between the lines.
zip_longest (izip_longest in Python 2) also works if one file has more lines than the other; the fillvalue "" is then used for the missing line. Like chain, it works as a generator.
The rstrip("\n") calls are important: they remove each line's own newline, so both halves end up on one output line followed by exactly one "\n", even when one file runs out of lines.
You can do it with:
fout.write(f1.read())
fout.write(f2.read())
You are actually trying to concatenate two file objects, but you want to concatenate strings.
Read the file contents first with f.read. For example, this way:
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        fout.write(f1.read()+f2.read())
I would prefer to use shutil.copyfileobj. You can easily combine it with glob.glob to concatenate a bunch of files matching a pattern.
>>> import shutil
>>> infiles = ["test1.txt", "test2.txt"]
>>> with open("test.out","wb") as fout:
...     for fname in infiles:
...         with open(fname, "rb") as fin:
...             shutil.copyfileobj(fin, fout)
Combining with glob.glob
>>> import glob
>>> with open("test.out","wb") as fout:
...     for fname in sorted(glob.glob("test*.txt")):  # glob's order is arbitrary
...         with open(fname, "rb") as fin:
...             shutil.copyfileobj(fin, fout)
Better still, if you are on a system where you can use POSIX utilities, use them:
D:\temp>cat test1.txt test2.txt > test.out
If you are using Windows, you can issue the following from the command prompt:
D:\temp>copy/Y test1.txt+test2.txt test.out
test1.txt
test2.txt
1 file(s) copied.
Note
Based on your latest update:
"Yes it has the same number of lines and I want to join every line of one file with the other file"
with open("test.out", "w") as fout:  # text mode: the joined result is a str
    fout.write('\n'.join(''.join(map(str.strip, e))
                         for e in zip(*(open(fname) for fname in infiles))))
And on a POSIX system, you can do:
paste test1.txt test2.txt
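The zip(*(open(fname) ...)) one-liner doesn't explicitly close the files it opens. A sketch that closes every input through contextlib.ExitStack and, like `paste`, separates the columns with a tab by default; `paste_lines` and `paste_files` are illustrative names:

```python
from contextlib import ExitStack
from itertools import zip_longest

def paste_lines(line_iters, sep="\t"):
    """Yield line n of every iterable joined by sep; shorter inputs contribute ""."""
    for parts in zip_longest(*line_iters, fillvalue=""):
        yield sep.join(part.rstrip("\n") for part in parts) + "\n"

def paste_files(in_paths, out_path, sep="\t"):
    """Join the input files side by side into out_path, closing every file afterwards."""
    with ExitStack() as stack:
        files = [stack.enter_context(open(p)) for p in in_paths]
        with open(out_path, "w") as fout:
            fout.writelines(paste_lines(files, sep))
```

For the question, `paste_files(["test1.txt", "test2.txt"], "joinfile.txt", sep=" ")` would give the same layout as the zip_longest answer.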

Insert string at the beginning of each line

How can I insert a string at the beginning of each line in a text file? I have the following code:
f = open('./ampo.txt', 'r+')
with open('./ampo.txt') as infile:
    for line in infile:
        f.insert(0, 'EDF ')
f.close
I get the following error:
'file' object has no attribute 'insert'
Python comes with batteries included:
import fileinput
import sys
for line in fileinput.input(['./ampo.txt'], inplace=True):
    sys.stdout.write('EDF {l}'.format(l=line))
Unlike the solutions already posted, this also preserves file permissions.
You can't modify a file in place like that. Files do not support insertion. You have to read it all in and then write it all out again.
You can do this line by line if you wish. But in that case you need to write to a temporary file and then replace the original. So, for small enough files, it is just simpler to do it in one go like this:
with open('./ampo.txt', 'r') as f:
    lines = f.readlines()
lines = ['EDF '+line for line in lines]
with open('./ampo.txt', 'w') as f:
    f.writelines(lines)
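To see why files don't support insertion: writing at the start of an existing file overwrites the bytes already there instead of shifting them to the right. A small demonstration on a throwaway temporary file (`overwrite_demo` is just a name for the demo):

```python
import os
import tempfile

def overwrite_demo():
    """Write at offset 0 of an existing file and return the resulting contents."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as f:
            f.write("hello world\n")
        with open(path, "r+") as f:
            f.seek(0)
            f.write("EDF ")  # replaces "hell"; nothing is moved to make room
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)

print(repr(overwrite_demo()))  # prints: 'EDF o world\n'
```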
Here's a solution where you write to a temporary file and move it into place. You might prefer this version if the file you are rewriting is very large, since it avoids keeping the contents of the file in memory, as versions that involve .read() or .readlines() will. In addition, if there is any error in reading or writing, your original file will be safe:
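import os
from shutil import move
from tempfile import NamedTemporaryFile
filename = './ampo.txt'
# Create the temporary file in the same directory as the original and write through it
# directly (reopening tmp.name while it is still open fails on Windows).
tmpdir = os.path.dirname(os.path.abspath(filename))
with open(filename) as finput, NamedTemporaryFile('w', delete=False, dir=tmpdir) as ftmp:
    for line in finput:
        ftmp.write('EDF ' + line)
move(ftmp.name, filename)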
For a file not too big:
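with open('./ampo.txt', 'rb+') as f:
    x = f.read()
    f.seek(0, 0)
    # Binary mode, so work with bytes; splitlines(True) keeps each line ending and
    # avoids leaving a dangling b'EDF ' after a final newline
    f.writelines(b'EDF ' + line for line in x.splitlines(True))
    f.truncate()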
Note that in THIS case (the content only grows), f.truncate() is not strictly necessary: the new content overwrites every old byte and extends the file past its old end.
But closing a file does not write any EOF marker and does not shorten the file. So when the new content is shorter than the old content, the old trailing characters remain in the file after it.
f.truncate() cuts the file at the current position, which is why I prefer to keep it anyway.
For a big file, to avoid loading the whole file into RAM at once:
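import os

def addsomething(filepath, ss):
    # Build a temporary name next to the original, e.g. ./ampo.txt -> ./ampotemp.txt
    if filepath.rfind('.') > filepath.rfind(os.sep):
        a, _, c = filepath.rpartition('.')
        tempi = a + 'temp.' + c
    else:
        tempi = filepath + 'temp'
    prefix = ss.encode()  # the file is read in binary mode, so the prefix must be bytes
    with open(filepath, 'rb') as f, open(tempi, 'wb') as g:
        g.writelines(prefix + line for line in f)
    os.replace(tempi, filepath)  # does the job of os.remove() + os.rename() in one call

addsomething('./ampo.txt', 'WZE')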
f = open('./ampo.txt', 'r')
lines = list(map(lambda l: 'EDF ' + l, f.readlines()))
f.close()
f = open('./ampo.txt', 'w')
# Write the lines directly: map(f.write, ...) would do nothing in Python 3 because map() is lazy
f.writelines(lines)
f.close()
