How to get consolidated count of delimiter occurrence - python

I have a requirement to fetch the count the occurrence of '|' in each line of a file then match the count with given inputcount, needs to throw exception when the count is wrong.
Say if the inputcount=3 and the file has following content
s01|test|aaa|hh
S02|test|bbb
so3|test|ccc|oo
then exception should get thrown on executing the line 2 and it should exit the file.
Tried below Awk command to fetch the count for each lines, but I was not sure how to compare and throw the exception, when it not matches
awk ' {print (split($0,a,"\|")-1) }' test.dat
Can anyone please help me with it?

You may use this awk:
awk -v inputcount=3 -F '\\|' 'NF && NF != inputcount+1 {exit 1}' file &&
echo "good" || echo "bad"
Details:
-F '\\|' sets | as input field separator
NF != inputcount+1 will return true if any line doesn't have inputcount pipe delimiters.

$ inputcount=3
$ awk -v c="$inputcount" 'gsub(/\|/,"&") != c{exit 1}' file
$ echo $?
1

As you also tagged the post with python I will write a python answer that could be a simple script.
The core is:
with open(filename) as f:
for n, line in enumerate(f):
if line.count("|") != 3:
print(f"Not valid file at line {n + 1}")
Than you can add some boilerplate:
import fileinput
import sys
with fileinput.input() as f:
for n, line in enumerate(f):
if line.count("|") != 3:
print(f"Not valid file at line {n + 1}")
sys.exit(1)
And with fileinput you can accept almost any sort of input: see Piping to a python script from cmd shell

Maybe try
awk -F '[|]' -v cols="$inputcount" 'NF != cols+1 {
print FILENAME ":" FNR ":" $0 >"/dev/stderr"; exit 1 }' test.dat
The -F argument says to split on this delimiter; the number of resulting fields NF will be one more than there are delimiters, so we scream and die when that number is wrong.

Related

Print Matching line and one line before from the matched line

looking forward to print Matching line in a file on Linux host and one line before from the matched line included into one line.
Below is just the content from the log file:
[2020/02/18 08:25:21.229198, 1] ../source3/lib/smbldap.c:1206(get_cached_ldap_connect)
Connection to LDAP server failed for the 1 try!
[2020/02/18 08:25:21.229221, 2] ../source3/passdb/pdb_ldap_util.c:287(smbldap_search_domain_info)
smbldap_search_domain_info: Problem during LDAPsearch: Timed out
What i have tried:
I have tried following with grep and sed which somehow works..
$ egrep -B 1 "failed|Timed" /var/log/samba/smbd.log.old |tr -d "\n" | sed "s/--/\n/g"
[2020/02/18 08:25:21.229198, 1] ../source3/lib/smbldap.c:1206(get_cached_ldap_connect) Connection to LDAP server failed for the 1 try!
[2020/02/18 08:25:21.229221, 2] ../source3/passdb/pdb_ldap_util.c:287(smbldap_search_domain_info) smbldap_search_domain_info: Problem during LDAPsearch: Timed out
This does not looks to be a cleaner solution, i'm looking forward some expert one lines, one liner is acceptable with awk, sed, grep or even python.
It can be done with awk alone:
awk ' /Timed|failed/ { print previous, $0; }; {previous = $0;}' /var/log/samba/smbd.log.old
This might work for you (GNU sed):
sed -n 'N;/\n.*\(failed\|Timed\)/s/\n//p;D' file
Turn off implicit printing. Append the next line. If the appended line contains failed or Timed, delete the newline and print the result. Delete the first line in the pattern space and repeat.
Could you please try following tac + awk solution:
tac Input_file | awk '/failed/{found=1;val=$0;next} found && NF{print $0,val;val=found=""}'
OR adding a non-one liner form of solution:
tac Input_file |
awk '
/failed/{
found=1
val=$0
next
}
found && NF{
print $0,val
val=found=""
}
'

Execute chained bash commands including multiple pipes and grep in Python3

I have to use the below bash command in a python script which includes multiple pip and grep commands.
grep name | cut -d':' -f2 | tr -d '"'| tr -d ','
I tried to do the same using subprocess module but didn't succeed.
Can anyone help me to run the above command in Python3 scripts?
I have to get the below output from a file file.txt.
Tom
Jack
file.txt contains:
"name": "Tom",
"Age": 10
"name": "Jack",
"Age": 15
Actually I want to know how can run the below bash command using Python.
cat file.txt | grep name | cut -d':' -f2 | tr -d '"'| tr -d ','
This works without having to use the subprocess library or any other os cmd related library, only Python.
my_file = open("./file.txt")
line = True
while line:
line = my_file.readline()
line_array = line.split()
try:
if line_array[0] == '"name":':
print(line_array[1].replace('"', '').replace(',', ''))
except IndexError:
pass
my_file.close()
If you not trying to parse a json file or any other structured file for which using a parser would be the best approach, just change your command into:
grep -oP '(?<="name":[[:blank:]]").*(?=",)' file.txt
You do not need any pipe at all.
This will give you the output:
Tom
Jack
Explanations:
-P activate perl regex for lookahead/lookbehind
-o just output the matching string not the whole line
Regex used: (?<="name":[[:blank:]]").*(?=",)
(?<="name":[[:blank:]]") Positive lookbehind: to force the constraint "name": followed by a blank char and then another double quote " the name followed by a double quote " extracted via (?=",) positive lookahead
demo: https://regex101.com/r/JvLCkO/1

Get a string in Shell/Python using sys.argv

I'm beginning with bash and I'm executing a script :
$ ./readtext.sh ./InputFiles/applications.txt
Here is my readtext.sh code :
#!/bin/bash
filename="$1"
counter=1
while IFS=: true; do
line=''
read -r line
if [ -z "$line" ]; then
break
fi
echo "$line"
python3 ./download.py \
-c ./credentials.json \
--blobs \
"$line"
done < "$filename"
I want to print the string ("./InputFiles/applications.txt") in a python file, I used sys.argv[1] but this line gives me -c. How can I get this string ? Thank you
It is easier for you to pass the parameter "$1" to the internal command python3.
If you don't want to do that, you can still get the external command line parameter with the trick of /proc, for example:
$ cat parent.sh
#!/usr/bin/bash
python3 child.py
$ cat child.py
import os
ext = os.popen("cat /proc/" + str(os.getppid()) + "/cmdline").read()
print(ext.split('\0')[2:-1])
$ ./parent.sh aaaa bbbb
['aaaa', 'bbbb']
Note:
the shebang line in parent.sh is important, or you should execute ./parent.sh with bash, else you will get no command line param in ps or /proc/$PPID/cmdline.
For the reason of [2:-1]: ext.split('\0') = ['bash', './parent.sh', 'aaaa', 'bbbb', ''], real parameter of ./parent.sh begins at 2, ends at -1.
Update: Thanks to the command of #cdarke that "/proc is not portable", I am not sure if this way of getting command line works more portable:
$ cat child.py
import os
ext = os.popen("ps " + str(os.getppid()) + " | awk ' { out = \"\"; for(i = 6; i <= NF; i++) out = out$i\" \" } END { print out } ' ").read()
print(ext.split(" ")[1 : -1])
which still have the same output.
This is the python file that you can use in ur case
import sys
file_name = sys.argv[1]
with open(file_name,"r") as f:
data = f.read().split("\n")
print("\n".join(data))
How to use sys.argv
How to use join method inside my python code

How do I convert this bash loop to python?

How would I do file reading loop in python? I'm trying to convert my bash script to python but have never written python before. FYI, the reason I am read reading the file after a successful command competition is to make sure it reads the most recent edit (say if the URLs were reordered).
Thanks!
#!/bin/bash
FILE=$1
declare -A SUCCESS=()
declare -A FAILED=()
for (( ;; )); do
# find a new link
cat "$FILE" > temp.txt
HASNEW=false
while read; do
[[ -z $REPLY || -n ${SUCCESS[$REPLY]} || -n ${FAILED[$REPLY]} ]] && continue
HASNEW=true
break
done < temp.txt
[[ $HASNEW = true ]] || break
# download
if axel --alternate --num-connections=6 "$REPLY"; then
echo
echo "Succeeded at $DATETIME downloading following link $REPLY"
echo "$DATETIME Finished: $REPLY" >> downloaded-links.txt
echo
SUCCESS[$REPLY]=.
else
echo
echo "Failed at $DATETIME to download following link $REPLY"
echo "$DATETIME Failed: $REPLY" >> failed-links.txt
FAILED[$REPLY]=.
fi
# refresh file
cat "$FILE" > temp.txt
while read; do
[[ -z ${SUCCESS[REPLY]} ]] && echo "$REPLY"
done < temp.txt > "$FILE"
done
This is what I've got so far which is working, and I can't figure out how to make it read the top line of the file after every successful execution of the axel line like the bash script does. I'm open to other options on the subprocess call such as threading, but I'm not sure how to make that work.
#!/usr/bin/env python
import subprocess
from optparse import OptionParser
# create command line variables
axel = "axel --alternate --num-connections=6 "
usage = "usage: %prog [options] ListFile.txt"
parser = OptionParser(usage=usage)
parser.add_option("-s", "--speed", dest="speed",
help="speed in bits per second i.e. 51200 is 50kps", metavar="speedkbps")
(opts, args) = parser.parse_args()
if args[0] is None:
print "No list file given\n"
parser.print_help()
exit(-1)
list_file_1 = args[0]
try:
opts.speed
except NoSpeed:
with open(list_file_1, 'r+') as f:
for line in f:
axel_call = axel + "--max-speed=" + opts.speed + " " + line
# print ("speed option set line send to subprocess is: " + axel_call)
subprocess.call(axel_call, shell=True)
else:
with open(list_file_1, 'r+') as f:
for line in f:
axel_call = axel + line
# print ("no speed option set line send to subprocess is:" + axel_call)
subprocess.call(axel_call, shell=True)
Fully Pythonic way to read a file is the following:
with open(...) as f:
for line in f:
<do something with line>
The with statement handles opening and closing the file, including if an exception is raised in the inner block. The for line in f treats the file object f as an iterable, which automatically uses buffered IO and memory management so you don't have to worry about large files.
There should be one -- and preferably only one -- obvious way to do it.
A demonstration of using a loop to read a file is shown at http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files

Unable to separate codes in one file to many files in AWK/Python

I need to put different codes in one file to many files.
The file is apparantly shared by AWK's creators at their homepage.
The file is also here for easy use.
My attempt to the problem
I can get the lines where each code locate by
awk '{ print $1 }'
However, I do no know how
to get the exact line numbers so that I can use them
to collect codes between the specific lines so that the first word of each line is ignored
to put these separate codes into new files which are named by the first word at the line
I am sure that the problem can be solved by AWK and with Python too. Perhaps, we need to use them together.
[edit] after the first answer
I get the following error when I try to execute it with awk
$awk awkcode.txt
awk: syntax error at source line 1
context is
>>> awkcode <<< .txt
awk: bailing out at source line 1
Did you try to:
Create a file unbundle.awk with the following content:
$1 != prev { close(prev); prev = $1 }
{ print substr($0, index($0, " ") + 1) >$1 }
Remove the following lines form the file awkcode.txt:
# unbundle - unpack a bundle into separate files
$1 != prev { close(prev); prev = $1 }
{ print substr($0, index($0, " ") + 1) >$1 }
Run the following command:
awk -f unbundle.awk awkcode.txt
Are you trying to unpack a file in that format? It's a kind of shell archive. For more information, see http://en.wikipedia.org/wiki/Shar
If you execute that program with awk, awk will create all those files. You don't need to write or rewrite much. You can simply run that awk program, and it should still work.
First, view the file in "plain" format. http://dpaste.com/12282/plain/
Second, save the plain version of the file as 'awkcode.shar'
Third, I think you need to use the following command.
awk -f awkcode.shar
If you want to replace it with a Python program, it would be something like this.
import urllib2, sys
data= urllib2.urlopen( "http://dpaste.com/12282/plain/" )
currName, currFile = None, sys.stdout
for line in data:
fileName, _, text= line.strip().partition(' ')
if fileName == currName:
currFile.write(line+"\n")
else:
if currFile is not None:
currFile.close()
currName= fileName
currFile= open( currName, "w" )
if currFile is not None:
currFile.close()
Awk file awkcode.txt should not contain ANY BLANK line. If any blank line is encountered, the awk program fails. There is no error check to filter out blank line in the code. This I could find out after several days of struggle.

Categories

Resources