I have 2 text files that I need to compare line by line.
I'm basically wanting to output either "matching" or "not matching" for each line depending on if it matches.
I've tried reading a few tutorials and using tools like diff and dircmp but can't seem to find a way to do this. I don't care if it's bash, perl, python, etc. Both files are 243 lines.
Is there a command available in Linux to do this?
Here's an example of what I'm looking for...
File 1
Test
Hello
Example
File 2
Test
What
Example
And I'd want to output this:
matching
not matching
matching
In perl:
#!/usr/bin/perl
use strict;
use File::Slurp;
my @file1 = read_file 'file1', { chomp => 1 };
my @file2 = read_file 'file2', { chomp => 1 };
foreach (@file1) {
    # Take the next line of file2 and compare it with the current line of file1.
    my $line = shift @file2;
    print $_ eq $line ? "matching\n" : "not matching\n";
}
What you are after is an awk script of the following form:
$ awk '(NR==FNR){a[FNR]=$0;next}
!(FNR in a) { print "file2 has more lines than file1"; exit 1 }
{ print (($0 == a[FNR]) ? "matching" : "not matching") }
END { if (NR-FNR > FNR) { print "file1 has more lines than file2"; exit 1 } }' file1 file2
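For comparison, the same logic can be sketched in Python. This is only an illustration, not a drop-in replacement: the function name compare_lines and the sentinel _MISSING are made up here, and itertools.zip_longest is used so that a difference in length is reported instead of being silently ignored.

```python
from itertools import zip_longest

# Sentinel meaning "this input has run out of lines"; it never equals a real line.
_MISSING = object()

def compare_lines(lines1, lines2):
    """Yield "matching" or "not matching" per line pair, then a note if one input is longer."""
    for a, b in zip_longest(lines1, lines2, fillvalue=_MISSING):
        if a is _MISSING:
            yield "file2 has more lines than file1"
            return
        if b is _MISSING:
            yield "file1 has more lines than file2"
            return
        # Drop the trailing newline so a missing final newline does not cause a mismatch.
        yield "matching" if a.rstrip("\n") == b.rstrip("\n") else "not matching"

# The example from the question, as in-memory lists.
file1 = ["Test", "Hello", "Example"]
file2 = ["Test", "What", "Example"]
for result in compare_lines(file1, file2):
    print(result)
```

To use it on real files, pass two open file objects instead of the lists, since iterating over a file yields its lines.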
The script below assumes that both of your files have the same number of lines (243). You will need to sort both files before running the script, i.e. sort file1.txt > file1.sorted.txt, and the same for the other file.
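#!/bin/bash
# Read one line from each file per iteration (fd 3 = first file, fd 4 = second file).
while IFS= read -r file1 <&3 && IFS= read -r file2 <&4
do
    # Quote the right-hand side so it is compared literally, not as a glob pattern.
    if [[ "$file1" == "$file2" ]]; then
        echo "matching" >> three.txt
    else
        echo "not matching" >> three.txt
    fi
done 3</path/to/file1.sorted.txt 4</path/to/file2.sorted.txt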
The above script will read each file line by line, comparing the input using the if statement. If the two strings are identical, it will write "matching" to three.txt else it will write "not matching" to the same file. The loop will go through each line.
You will have to sort the data within both files to make a comparison.
I've tested it with the following data:
one.sorted.txt
abc
cba
efg
gfe
xyz
zxy
two.sorted.txt
abc
cbd
efh
gfe
xyz
zmo
three.txt
matching
not matching
not matching
matching
matching
not matching
It's best to use dedicated Linux file comparison tools such as Meld or Vimdiff; they are pretty straightforward and very convenient.
You can run 'which meld' to check whether it is installed; if it is not found, install it with:
sudo apt-get install meld
In addition, here is a simple python script to get the results you asked for:
#!/usr/bin/env python3
with open('1.txt') as f1:
    lines1 = f1.readlines()
    lines1 = [line.rstrip() for line in lines1]
with open('2.txt') as f2:
    lines2 = f2.readlines()
    lines2 = [line.rstrip() for line in lines2]
# Compare only as many lines as the shorter file has.
for i in range(min(len(lines1), len(lines2))):
    print("matching" if lines1[i] == lines2[i] else "not matching")
Related
I have a python script which opens a file and reads it:
f= open("/file.dat", "r").read()
file.dat is a multiple lines file with quotes, spaces, new lines and special characters such as #,&,"
I would like to echo f into a new file named t.dat. I have tried:
cd= "echo \"{}\" >> t.dat".format(f)
os.system(cd)
which prints to the screen the file content until the "& config" which is in it and the error:
sh: 32: Config: not found
Tried the following as well with similar results:
cd= "$echo \"{}\" >> t.dat".format(f)
cd= "$echo {} >> t.dat".format(f)
What is the correct way to perform this?
Thank you!
Use shlex.quote()
import shlex
# f already holds the file's contents as a string, so quote it directly.
cd = r"printf '%s\n' {} >> t.dat".format(shlex.quote(f))
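If the goal is only to get the contents into t.dat, the shell can be skipped entirely, which sidesteps the quoting problem. A minimal sketch, assuming plain Python file I/O is acceptable here; the helper name append_text is made up for illustration:

```python
def append_text(path, text):
    """Append text to the file at path without involving a shell, so no quoting is needed."""
    with open(path, "a") as out:
        out.write(text)
        # Make sure the output ends with a newline, roughly as echo would.
        if not text.endswith("\n"):
            out.write("\n")
```

With the string from the question this would be called as append_text("t.dat", f). Characters such as &, # and " are written as-is because nothing interprets them.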
I need to count the occurrences of '|' in each line of a file and compare that count with a given inputcount; an exception should be thrown when the count is wrong.
Say if the inputcount=3 and the file has following content
s01|test|aaa|hh
S02|test|bbb
so3|test|ccc|oo
then an exception should be thrown when line 2 is processed, and processing of the file should stop.
I tried the awk command below to get the count for each line, but I was not sure how to compare it and throw the exception when it does not match
awk ' {print (split($0,a,"\|")-1) }' test.dat
Can anyone please help me with it?
You may use this awk:
awk -v inputcount=3 -F '\\|' 'NF && NF != inputcount+1 {exit 1}' file &&
echo "good" || echo "bad"
Details:
-F '\\|' sets | as input field separator
NF != inputcount+1 will return true if any line doesn't have inputcount pipe delimiters.
$ inputcount=3
$ awk -v c="$inputcount" 'gsub(/\|/,"&") != c{exit 1}' file
$ echo $?
1
As you also tagged the post with python I will write a python answer that could be a simple script.
The core is:
with open(filename) as f:
    for n, line in enumerate(f):
        if line.count("|") != 3:
            print(f"Not valid file at line {n + 1}")
Then you can add some boilerplate:
import fileinput
import sys
with fileinput.input() as f:
    for n, line in enumerate(f):
        if line.count("|") != 3:
            print(f"Not valid file at line {n + 1}")
            sys.exit(1)
And with fileinput you can accept almost any sort of input: see Piping to a python script from cmd shell
Maybe try
awk -F '[|]' -v cols="$inputcount" 'NF != cols+1 {
print FILENAME ":" FNR ":" $0 >"/dev/stderr"; exit 1 }' test.dat
The -F argument says to split on this delimiter; the number of resulting fields NF will be one more than there are delimiters, so we scream and die when that number is wrong.
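The "fields = delimiters + 1" reasoning can also be sketched in Python. The function name first_bad_line and its parameters are assumptions made for this illustration; the sample lines are the ones from the question.

```python
def first_bad_line(lines, expected_delims, delim="|"):
    """Return the 1-based number of the first line with the wrong delimiter count, or None."""
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delim)
        # Splitting on the delimiter yields one more field than there are delimiters.
        if len(fields) != expected_delims + 1:
            return lineno
    return None

sample = ["s01|test|aaa|hh", "S02|test|bbb", "so3|test|ccc|oo"]
print(first_bad_line(sample, 3))  # → 2
```

A caller could then raise an exception or call sys.exit(1) when the result is not None. Note that an empty line counts as a bad line in this sketch.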
I have thousands of text files on my disk.
I need to search them for selected words.
Currently, I use:
grep -Eri 'text1|text2|text3|textn' dir/ > results.txt
The result is saved to a file: results.txt
I would like the result to be saved to many files.
results_text1.txt, results_text2.txt, results_textn.txt
Maybe someone has encountered some kind of script eg in python?
One solution might be to use a bash for loop.
for word in text1 text2 text3 textn; do grep -Eri "$word" dir/ > "results_$word.txt"; done
You can run this directly from the command line.
By using a combination of "sed" and "xargs"
echo "text1,text2,text3,textn" | sed "s/,/\n/g" | xargs -I{} sh -c "grep -ir {} * > result_{}"
One way (using Perl because it's easier for regex and one-liner).
Sample data:
% mkdir dir dir/dir1 dir/dir2
% echo -e "text1\ntext2\nnope" > dir/file1.txt
% echo -e "nope\ntext3" > dir/dir1/file2.txt
% echo -e "nope\ntext2" > dir/dir2/file3.txt
Search:
% find dir -type f -exec perl -ne '/(text1|text2|text3|textn)/ or next;
$pat = $1; unless ($fh{$pat}) {
($fn = $1) =~ s/\W+/_/ag;
$fn = "results_$fn.txt";
open $fh{$pat}, ">>", $fn;
}
print { $fh{$pat} } "$ARGV:$_"' {} \;
Content of results_text1.txt:
dir/file1.txt:text1
Content of results_text2.txt:
dir/dir2/file3.txt:text2
dir/file1.txt:text2
Content of results_text3.txt:
dir/dir1/file2.txt:text3
Note:
you need to put the pattern inside parentheses to capture it. grep doesn't allow one to do this.
the captured pattern is then filtered (s/\W+/_/ag means to replace nonalphanumeric characters with underscore) to ensure it's safe as part of a filename.
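Since the question asks about Python, here is a rough sketch of the same idea. It is an illustration under several assumptions: the function name split_results is made up, the words are treated as literal text rather than regular expressions (unlike grep -E), a line containing several words is written to each matching result file, and the output directory is assumed to lie outside the tree being searched.

```python
import os
import re

def split_results(root, words, out_dir="."):
    """Write lines under root containing each word (case-insensitive) to results_<word>.txt."""
    patterns = {w: re.compile(re.escape(w), re.IGNORECASE) for w in words}
    outputs = {}
    try:
        for w in words:
            # Replace non-word characters so the word is safe inside a file name.
            safe = re.sub(r"\W+", "_", w)
            outputs[w] = open(os.path.join(out_dir, "results_" + safe + ".txt"), "w")
        for dirpath, _dirs, files in os.walk(root):
            for name in sorted(files):
                path = os.path.join(dirpath, name)
                # errors="replace" keeps the scan going on files that are not valid text.
                with open(path, errors="replace") as fh:
                    for line in fh:
                        text = line.rstrip("\n")
                        for w, pat in patterns.items():
                            if pat.search(text):
                                outputs[w].write(path + ":" + text + "\n")
    finally:
        for out in outputs.values():
            out.close()
```

For the layout in the question this might be called as split_results("dir", ["text1", "text2", "text3", "textn"]), which writes the result files into the current directory.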
My goal is to compare two sets of data, one from a text file and one from a directory, and then display in the console which entries were not found. For example:
ls: /var/patchbundle/rpms/:squid-2.6.STABLE21-7.el5_10.x86_64.rpm NOT FOUND!
ls: /var/patchbundle/rpms/:tzdata-2014j-1.el5.x86_64.rpm
ls: /var/patchbundle/rpms/:tzdata-java-2014j-1.el5.x86_64.rpm
ls: /var/patchbundle/rpms/:wireshark-1.0.15-7.el5_11.x86_64.rpm
ls: /var/patchbundle/rpms/:wireshark-gnome-1.0.15-7.el5_11.x86_64.rpm
ls: /var/patchbundle/rpms/:yum-updatesd-0.9-6.el5_10.noarch.rpm NOT FOUND
It must look like that. Here's my Python code.
import package, sys, os, subprocess
path = '/var/tools/tools/newrpms.txt'
newrpms = open(path, "r")
fds = newrpms.readline()
def checkrc(rc):
    if(rc != 0):
        sys.exit(rc)
cmd = package.Errata()
for i in newrpms:
    rc = cmd.execute("ls /var/patchbundle/rpms/ | grep %newrpms ")
    if ( != 0):
        cmd.logprint ("%s not found !" % i)
checkrc(rc)
sys.exit(0)
newrpms.close
Please see the shell script below. It works, but I want to use another language, which is why I'm trying Python.
retval=0
for i in $(cat /var/tools/tools/newrpms.txt)
do
    ls /var/patchbundle/rpms/ | grep $i
    if [ $? != 0 ]
    then
        echo "$i NOT FOUND!"
        retval=255
    fi
done
exit $retval
Please see my Python code. What is wrong with it? It does not behave the way the shell script does.
You don't say what the content of "newrpms.txt" is; you say the script is not executing how you want - but you don't say what it is doing; I don't know what package or package.Errata are, so I'm playing guess-the-problem; but lots of things are wrong.
if ( != 0): is a syntax error. If {empty space} is not equal to zero?
cmd.execute("ls /var/patchbundle/rpms/ | grep %newrpms ") is probably not doing what you want. You can't put a variable in a string in Python like that, and if you could newrpms is the file handle not the current line. That should probably be ...grep %s" % (i,)) ?
The control flow is doing:
Look in this folder, try to find files
Call checkrc()
Only quit with an error if the last file was not found
newrpms.close isn't doing anything, it would need to be newrpms.close() to call the close method.
You're writing shell-script-in-Python. How about:
import os, sys
retval = 0
for line in open('/var/tools/tools/newrpms.txt'):
    # Strip the trailing newline before building the full path.
    rpm_path = '/var/patchbundle/rpms/' + line.strip()
    if not os.path.exists(rpm_path):
        print(rpm_path, "NOT FOUND")
        retval = 255
    else:
        print(rpm_path)
sys.exit(retval)
Edited code slightly, and an explanation:
The code is almost a direct copy of the shell script into Python. It loops over every line in the text file, and calls line.strip() to get rid of the newline character at the end. It builds rpm_path which will be something like "/var/patchbundle/rpms/tzdata-2014j-1.el5.x86_64.rpm".
Then it uses os.path.exists() which tests if a file exists and returns True if it does, False if it does not, and uses that test to set the error value and print the results like the shell script prints them. This replaces the "ls ... | grep " part of your code for checking if a file exists.
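One difference worth noting: the original shell script used ls piped into grep, which matches a name anywhere inside a directory entry, whereas an existence test needs the exact file name. If the looser behaviour matters, a sketch along these lines may be closer. The helper name find_missing is hypothetical, and it does a plain substring test rather than grep's regular-expression match.

```python
import os

def find_missing(names, directory):
    """Return the names that are not a substring of any entry in directory."""
    entries = os.listdir(directory)
    return [name for name in names if not any(name in entry for entry in entries)]
```

A caller could print each returned name followed by "NOT FOUND" and exit with 255 when the list is non-empty, mirroring the shell script.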
I need to split the different pieces of code contained in one file into many files.
The file is apparently shared by AWK's creators on their homepage.
The file is also here for easy use.
My attempt to the problem
I can find where each piece of code is located with
awk '{ print $1 }'
However, I do not know how
to get the exact line numbers so that I can use them
to collect codes between the specific lines so that the first word of each line is ignored
to put these separate codes into new files which are named by the first word at the line
I am sure that the problem can be solved by AWK and with Python too. Perhaps, we need to use them together.
[edit] after the first answer
I get the following error when I try to execute it with awk
$awk awkcode.txt
awk: syntax error at source line 1
context is
>>> awkcode <<< .txt
awk: bailing out at source line 1
Did you try to:
Create a file unbundle.awk with the following content:
$1 != prev { close(prev); prev = $1 }
{ print substr($0, index($0, " ") + 1) >$1 }
Remove the following lines from the file awkcode.txt:
# unbundle - unpack a bundle into separate files
$1 != prev { close(prev); prev = $1 }
{ print substr($0, index($0, " ") + 1) >$1 }
Run the following command:
awk -f unbundle.awk awkcode.txt
Are you trying to unpack a file in that format? It's a kind of shell archive. For more information, see http://en.wikipedia.org/wiki/Shar
If you execute that program with awk, awk will create all those files. You don't need to write or rewrite much. You can simply run that awk program, and it should still work.
First, view the file in "plain" format. http://dpaste.com/12282/plain/
Second, save the plain version of the file as 'awkcode.shar'
Third, I think you need to use the following command.
awk -f awkcode.shar
If you want to replace it with a Python program, it would be something like this.
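import urllib.request

# Python 3 version; the original used urllib2 from Python 2.
data = urllib.request.urlopen("http://dpaste.com/12282/plain/")
currName, currFile = None, None
for raw in data:
    line = raw.decode("utf-8").rstrip("\n")
    # The first word names the target file; the rest is that file's content.
    fileName, _, text = line.partition(' ')
    if not fileName:
        continue  # skip blank lines, which carry no file name
    if fileName != currName:
        # A new name starts a new output file.
        if currFile is not None:
            currFile.close()
        currName = fileName
        currFile = open(currName, "w")
    currFile.write(text + "\n")
if currFile is not None:
    currFile.close()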
The bundle file awkcode.txt should not contain ANY blank lines. If a blank line is encountered, the awk program fails, because the code has no check to filter out blank lines. I found this out after several days of struggle.
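A blank-line check is easy to add when the unbundling is done in Python instead. The following is a sketch under stated assumptions: the function name unbundle and its out_dir parameter are made up, the bundle is trusted (file names are used as given), and the bundle is small enough that every output file can stay open until the end.

```python
import os

def unbundle(lines, out_dir="."):
    """Split "name text" lines into files named by the first word; return the names written."""
    handles = {}
    try:
        for line in lines:
            name, _, text = line.rstrip("\n").partition(" ")
            if not name:
                continue  # blank line or leading space: no file name, so skip it
            if name not in handles:
                handles[name] = open(os.path.join(out_dir, name), "w")
            handles[name].write(text + "\n")
    finally:
        for fh in handles.values():
            fh.close()
    return sorted(handles)
```

It can be called with an open file, for example unbundle(open("awkcode.txt")), since iterating over a file yields its lines.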