Python counting the unique occurrences of a string in a file

I’m trying to count the unique IP addresses in an Apache log file using Python 3.3.1.
The thing is, I don’t think it is counting everything correctly.
Here is my code:
import argparse
import os
import sys
from collections import Counter
#
# This function counts the unique IP adresses in the logfile
#
def print_unique_ip(logfile):
    IPset = set()
    for line in logfile:
        head, sep, tail = line.partition(" ")
        if(len(head) > 1):
            IPset.update(head)
    print(len(IPset))
    return
#
# This is the main function of the program
#
def main():
    parser = argparse.ArgumentParser(description="An appache log file processor")
    parser.add_argument('-l', '--log-file', help='This is the log file to work on', required=True)
    parser.add_argument('-n', help='Displays the number of unique IP adresses', action='store_true')
    parser.add_argument('-t', help='Displays top T IP adresses', type=int)
    parser.add_argument('-v', help='Displays the number of visits of a IP adress')
    arguments = parser.parse_args()
    if(os.path.isfile(arguments.log_file)):
        logfile = open(arguments.log_file)
    else:
        print('The file <', arguments.log_file, '> does not exist')
        sys.exit
    if(arguments.n == True):
        print_unique_ip(logfile)
    if(arguments.t):
        print_top_n_ip(arguments.t, logfile)
    if(arguments.v):
        number_of_ocurrences(arguments.v, logfile)
    return
if __name__ == '__main__':
    main()
I have left out everything else.
When I run it I get
$ python3 assig4.py -l apache_short.log -n
12
But I know that there are more than 12 unique IPs in the file.
It doesn’t seem to be giving me the right result. What I am trying to do is read the file line by line; whenever I find an IP address I put it into a set, since a set only stores unique elements, and then I print the length of that set.

IPset.update(head)
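This is the bug: it will not do what you're expecting. set.update() treats its argument as an iterable and adds each element, so passing a string adds its individual characters. You want to add each IP to your set with add() instead. An example makes it clearest: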
>>> s1 = set()
>>> s2 = set()
>>> s1.add('11.22.33.44')
>>> s2.update('11.22.33.44')
>>> s1
set(['11.22.33.44'])
>>> s2
set(['1', '3', '2', '4', '.'])
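A minimal sketch of the counting step with the fix applied, assuming (as in the question) that the IP address is the first space-separated field of each line. The function name count_unique_ips and the sample log lines are made up for illustration; a Counter is used instead of a set so the same pass could also serve the "top N" and "visits per IP" options.

```python
from collections import Counter

def count_unique_ips(lines):
    """Return a Counter mapping each IP (first field of a line) to its hit count."""
    counts = Counter()
    for line in lines:
        # strip() drops the trailing newline; partition() splits on the first space only
        head, _sep, _tail = line.strip().partition(" ")
        if head:
            counts[head] += 1  # one entry per whole IP string, not per character
    return counts

# Hypothetical sample lines standing in for the Apache log file
sample = [
    '11.22.33.44 - - [10/Oct/2013:13:55:36] "GET / HTTP/1.1" 200 2326',
    '11.22.33.44 - - [10/Oct/2013:13:55:40] "GET /a HTTP/1.1" 200 512',
    '55.66.77.88 - - [10/Oct/2013:13:56:01] "GET / HTTP/1.1" 200 2326',
]
counts = count_unique_ips(sample)
print(len(counts))            # number of unique IPs
print(counts.most_common(1))  # most frequent IP with its count
```

With a real file, the open file object can be passed directly, since iterating over it yields lines.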

Related

Using Argparse for Dictionary

I want to read any one of the items from a list of videos. The video reading and display code is the following. This code is working perfectly fine.
import cv2
def VideoReading(vid):
    cap = cv2.VideoCapture(vid)
    while True:
        ret, frame = cap.read()
        cv2.imshow('Video', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()
Since I have a large number of videos and I call the code from the command line, typing the entire video name is cumbersome. So I created a dictionary. Here is an example with 2 entries:
{"Video1.mp4": 1, 'Video2.mp4': 2}
Now I'm using the following code to select the video by the value 1 or 2 rather than by the video name. The code is the following:
def Main():
    VideoFiles= ["Video1.mp4", "Video2.mp4"]
    VideoFilesIndicator = [1, 2]
    model_list = {}
    for i in range(len(VideoFiles)):
        model_list[VideoFiles[i]] = VideoFilesIndicator[i]
    print(model_list)
    def convertvalues(value):
        return model_list.get(value, value)
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-v", "--video", help = "add video file name of any format", type = convertvalues,\
                       choices = [1,2], default = 1)
    args =parser.parse_args()
    return VideoReading(args.video)
if __name__ == "__main__":
    Main()
Now when I run the code in cmd with "python VideoReading.py -v 2", it throws the following error.
error: argument -v/--video: invalid choice: '2' (choose from 1, 2)
I don't understand why I'm getting this error. I'm following this post to build my program.

The problem is that convertvalues returns '2' as a string: when the value is not found in model_list, it is returned as is (i.e. as a string), and argparse compares the converted value against choices, so the string '2' does not match the integer 2. Try with:
def convertvalues(value):
    return model_list[value] if value in model_list else int(value)
Also, as it is, your argument parser will always receive an integer in video in the end (either you passed an integer or convertvalues transformed a video file name into an integer). To get the actual file name again you can do something like
args = parser.parse_args()
video_file = VideoFiles[VideoFilesIndicator.index(args.video)]
return VideoReading(video_file)
My suggestion is based on trying to make the minimal amount of changes to the code. However, you may also consider more changes in the program, like flevinkelming suggests, if you don't feel comfortable with the final shape of the code.
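The sketch below illustrates the order argparse works in: the type callable runs first, and the converted value is then checked against choices. The mapping is a stand-in for the question's dictionary, and explicit argument lists are passed to parse_args so the example runs without a command line. The conversion is written with an explicit membership test because a default passed to dict.get() is evaluated eagerly, so int(value) would run (and fail) even for a known file name.

```python
import argparse

# Hypothetical mapping mirroring the question: file name -> number
model_list = {"Video1.mp4": 1, "Video2.mp4": 2}

def convertvalues(value):
    """Map a known file name to its number; otherwise parse the value as an int."""
    if value in model_list:
        return model_list[value]
    return int(value)  # a ValueError here is reported by argparse as a usage error

parser = argparse.ArgumentParser()
parser.add_argument("-v", "--video", type=convertvalues, choices=[1, 2], default=1)

# Explicit argument lists are used so the sketch runs without a command line
print(parser.parse_args(["-v", "2"]).video)           # 2
print(parser.parse_args(["-v", "Video1.mp4"]).video)  # 1
print(parser.parse_args([]).video)                    # 1 (the default)
```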

Your dictionary is backwards; you want to map a number to a file name, so that when you enter a number, a file name can be returned. There's no need to provide a default value from convertvalues, because you are using choices to limit the allowable inputs to the valid keys of the dict.
def main():
    video_files = ["Video1.mp4", "Video2.mp4"]
    model_list = dict(enumerate(video_files, start=1))
    print(model_list)
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-v", "--video",
                       help="add video file name of any format",
                       type=lambda str: model_list[int(str)],
                       choices=model_list.values())
    args = parser.parse_args()
    return VideoReading(args.video)

An alternative solution, with minimal code, and dynamic help output for users:
import argparse
def main():
    model = {
        1: "Video1.mp4",
        2: "Video2.mp4",
        3: "Video3.mp4"
    } # Add more if needed
    videos = ['{}({})'.format(v, str(k)) for k, v in model.items()]
    help_ = "Videos to choose from: {}".format(', '.join(videos))
    parser = argparse.ArgumentParser()
    parser.add_argument('-v', '--video', help=help_, type=int, default=1)
    args = parser.parse_args()
    return VideoReading(model[args.video])
if __name__ == '__main__':
    main()
python VideoReading.py -h:
usage: VideoReading.py [-h] [-v VIDEO]
optional arguments:
  -h, --help            show this help message and exit
  -v VIDEO, --video VIDEO
                        Videos to choose from: Video1.mp4(1), Video2.mp4(2),
                        Video3.mp4(3)
python VideoReading.py:
If you were printing the selection - Video1.mp4
python VideoReading.py -v 3:
If you were printing the selection - Video3.mp4

Python multiple user arguments to a list

I have no words to thank all of you for such great advice. Now everything has started to make sense. I apologize for my bad variable naming. It was only because I wanted to learn quickly, and I won't carry such practices into the final script, which I will post here with my own enhancements.
I want to go another step further by passing the values we've isolated (ip, port, and name) to a template. I tried but couldn't get it right, even though I feel close. The text I want to construct looks like this:
Host Address:<IP>:PORT:<1>
mode tcp
bind <IP>:<PORT> name <NAME>
I have tried this within the working script provided by rahul. (I've edited my original code to abide by Stack Exchange's regulations.) Please help out just this once as well. Many thanks in advance.
#!/usr/bin/python
import argparse
import re
import string
p = argparse.ArgumentParser()
p.add_argument("input", help="input the data in format ip:port:name", nargs='*')
args = p.parse_args()
kkk_list = args.input
def func_three(help):
    for i in help:
        print(i)
for kkk in kkk_list:
    bb = re.split(":|,", kkk)
    XXX=func_three(bb)
    for n in XXX:
        ip, port, name = n
        template ="""HOST Address:{0}:PORT:{1}
mode tcp
bind {0}:{1} name {2}"""
        sh = template.format(ip,port,name)
        print sh
Original post:
Beginner here. I wrote the below code and it doesn't get me anywhere.
#!/usr/bin/python
import argparse
import re
import string
p = argparse.ArgumentParser()
p.add_argument("INPUT")
args = p.parse_args()
KKK= args.INPUT
bb=re.split(":|,", KKK)
def func_three(help):
    for i in help:
        #print help
        return help
#func_three(bb[0:3])
YY = var1, var2, var3 = func_three(bb[0:3])
print YY
The way to run this script should be "script.py :", e.g. script.py 192.168.1.10:80:string 172.25.16.2:100:string
As you can see, if one argument is passed I have no problems. But when there are more arguments, I can't work out how to apply the regexes and get this done in a loop.
So to recap, this is how I want the output to look in order to proceed further.
192.168.1.10
80
name1
172.25.16.2
100
name2
If there are other, better ways to achieve this, please feel free to suggest them.

I would say what you are doing could be done more simply. If you want to split the input whenever a colon appears you could use:
#!/usr/bin/python
import sys
# sys.argv is the list of arguments you pass when you run the program
# but sys.argv[0] is the actual program name
# so you want to start at sys.argv[1]
for arg in sys.argv[1:]:
    listVar = arg.split(':')
    for i in listVar:
        print(i)
    # Optionally print a new line
    print('')

Please name your variables according to their context. You will need to use nargs='*' to accept multiple arguments. I have added the updated code below, which prints what you wanted.
#!/usr/bin/python
import argparse
import re
import string
p = argparse.ArgumentParser()
p.add_argument("input", help="input the data in format ip:port:name", nargs='*')
args = p.parse_args()
kkk_list = args.input # ['192.168.1.10:80:name1', '172.25.16.2:100:name3']
def func_three(help):
    for i in help:
        print(i)
for kkk in kkk_list:
    bb = re.split(":|,", kkk)
    func_three(bb)
    print('\n')
# This prints
# 192.168.1.10
# 80
# name1
# 172.25.16.2
# 100
# name3
Updated code for the new requirement:
#!/usr/bin/python
import argparse
import re
import string
p = argparse.ArgumentParser()
p.add_argument("input", help="input the data in format ip:port:name", nargs='*')
args = p.parse_args()
kkk_list = args.input # ['192.168.1.10:80:name1', '172.25.16.2:100:name3']
def printInFormat(ip, port, name):
    formattedText = '''HOST Address:{ip}:PORT:{port}
    mode tcp
    bind {ip}:{port} name {name}'''.format(ip=ip,
                                           port=port,
                                           name=name)
    # strip the indentation that the triple-quoted string picks up
    textWithoutExtraWhitespaces = '\n'.join([line.strip() for line in formattedText.splitlines()])
    # you can break above thing
    # text = ""
    # for line in formattedText.splitlines():
    #     text += line.strip()
    #     text += "\n"
    print(textWithoutExtraWhitespaces)
for kkk in kkk_list:
    ip, port, name = re.split(":|,", kkk)
    printInFormat(ip, port, name)
# HOST Address:192.168.1.10:PORT:80
# mode tcp
# bind 192.168.1.10:80 name name1
# HOST Address:172.25.16.2:PORT:100
# mode tcp
# bind 172.25.16.2:100 name name3

Bad variable names aside, if you want to use argparse (which I think is a good habit, even if it is somewhat more complex initially) you should use the nargs='+' option:
#!/usr/bin/env python
import argparse
import re
import string
p = argparse.ArgumentParser()
p.add_argument("INPUT", nargs='+')
args = p.parse_args()
KKK= args.INPUT
def func_three(help):
    for i in help:
        #print help
        return help
for kkk in KKK:
    bb=re.split(":|,", kkk)
    #func_three(bb[0:3])
    YY = var1, var2, var3 = func_three(bb[0:3])
    print YY

If you look at the documentation for argparse, you'll notice that there's an nargs argument you can pass to add_argument, which allows you to group more than one input.
For example:
p.add_argument('INPUT', nargs='+')
This requires at least one argument and gathers all arguments into a list.
Then you can go through each of your inputs like this:
args = p.parse_args()
for address in args.INPUT:
    ip, port, name = address.split(':')
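Putting the pieces together, here is a rough sketch that collects the arguments with nargs='+', splits each one into its three fields, and fills the template from the question. The names TEMPLATE and render are invented for this example, and an explicit list is handed to parse_args only so the sketch runs without a command line.

```python
import argparse

# Template text taken from the question; the constant name is illustrative
TEMPLATE = """HOST Address:{ip}:PORT:{port}
mode tcp
bind {ip}:{port} name {name}"""

def render(entries):
    """Turn 'ip:port:name' strings into formatted config blocks."""
    blocks = []
    for entry in entries:
        # Unpacking raises ValueError if an entry does not have exactly 3 fields
        ip, port, name = entry.split(":")
        blocks.append(TEMPLATE.format(ip=ip, port=port, name=name))
    return blocks

p = argparse.ArgumentParser()
p.add_argument("INPUT", nargs="+", help="entries in the form ip:port:name")
# An explicit list is passed for illustration; calling parse_args() with no
# arguments would read sys.argv instead.
args = p.parse_args(["192.168.1.10:80:name1", "172.25.16.2:100:name2"])
print("\n".join(render(args.INPUT)))
```

Unpacking into exactly three names is a deliberate choice here: a malformed entry fails loudly instead of silently producing a wrong block.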

Copy parameters into list

I am trying to copy parameters passed into a Python script to a file. Here are the parameters.
["0013","1","1","\"john.dow#gmail.com\"","1","P123-ND 10Q","10Q H??C"]
I understand that there is a buffer problem and I am getting bad data into my parameters. However, I do not have control over what is being passed in. I am trying to copy, starting at the 5th parameter, the parameters into a file.
f = open(in_file_name, 'w')
for x in range(5, len(arg_list)):
    f.write(arg_list[x] + '\n')
f.close()
The result of the file is below:
P123-ND 10Q
10Q H??C
Here is what it should be:
P123-ND
10Q
How can I not include the bad data? What is happening to the spaces between the valid information and the bad information?
As requested, here is the full program:
#!/bin/python
class Argument_Indices:
    PRINTER_INDEX = 0
    AREA_INDEX = 1
    LABEL_INDEX = 2
    EMAIL_INDEX = 3
    RUN_TYPE_INDEX = 4
import argparse
import json
import os
from subprocess import call
import sys
from time import strftime
def _handle_args():
    ''' Setup and run argparse '''
    parser = argparse.ArgumentParser(description='Set environment variables for and to call Program')
    parser.add_argument('time_to_run', default='NOW', choices=['NOW', 'EOP'], help='when to run the report')
    parser.add_argument('arguments', nargs='+', help='the remaining command line arguments')
    return parser.parse_args()
def _process_program(arg_list):
    time_stamp = strftime("%d_%b_%Y_%H_%M_%S")
    printer = arg_list[Argument_Indices.PRINTER_INDEX]
    area = arg_list[Argument_Indices.AREA_INDEX]
    label = arg_list[Argument_Indices.LABEL_INDEX]
    in_file_name = "/tmp/program{0}.inp".format(time_stamp)
    os.environ['INPUT_FILE'] = in_file_name
    f = open(in_file_name, 'w')
    for x in range(5, len(arg_list)):
        f.write(arg_list[x])
    f.close()
    call(['./Program.bin', printer, area, label])
    os.remove(in_file_name)
def main():
    ''' Main Function '''
    arg_list = None
    args = _handle_args()
    if len(args.arguments) < 1:
        print('Missing name of input file')
        return -1
    with open(args.arguments[0]) as input_file:
        arg_list = json.load(input_file)
    _process_program(arg_list)
    return 0
if __name__ == '__main__':
    if main() != 0:
        print('Program run failed')
    sys.exit()

For your exact case (where each parameter arrives followed by some spaces and extra data), keeping only the part before the first space would work:
received_param_list = ["0013","1","1","\"john.dow#gmail.com\"","1","P123-ND 10Q","10Q H??C"]
arg_list = [i.split(" ")[0] for i in received_param_list]
for x in range(5, len(arg_list)):
    print(arg_list[x])
Although there might be a simpler way.

Cannot output file: no file created

I'm brand new to Python, and am struggling to understand why my program will not print despite my best efforts to understand I/O and file handling.
The below code should take in a fastQ or fasta file (for DNA or protein sequences) and prune the sequences according to user-specified quality, then create a new file with the pruned sequences.
The trouble comes when I attempt to run the program from the command line:
python endtrim --min_q 35 --in_33 fQ.txt --out_33 fQ_out.txt
The program runs without incident (no errors or trace issues), but I don't see the file fQ_out.txt being created. Methinks the problem lies somewhere with argparse, since I don't get a help message when running:
python endtrim --help
Can someone please point me in the right direction?
from __future__ import division, print_function
import argparse
import collections
import sys
import re
from string import punctuation
from fastRead import *
ready2trim = ()
def parse_arguments():
    """Creates a bevvy of possible sort arguments from command line and
    binds them to their respective names"""
    parser = argparse.ArgumentParser("--h", "--help", description=__doc__, \
                                     formatter_class=argparse.\
                                     RawDescriptionHelpFormatter)
    options = parse_arguments()
    #quality argument
    parser.add_argument("--min_qual", action='store', default=30, \
                        dest='min_qual', help="""Lowest quality value
                        that can appear in the output""")
    #input arguments
    parser.add_argument("--in_33", action='store', default=sys.stdin, \
                        dest='in_33', nargs='?', help="""Input file in fastq format, using Phred+33 coding""")
    parser.add_argument("--in_64", action='store', default=sys.stdin, \
                        dest='in_64', nargs='?', help="""Input file in fastq format, using Phred+64 coding""")
    parser.add_argument("--in_fasta", action='store', default=sys.stdin, \
                        dest='in_fasta', nargs='?', help="""Input fasta format, requires concurrent --in_qual argument""")
    parser.add_argument("--in_qual", action='store', default=sys.stdin, \
                        dest='in_qual', nargs='?', help="""Input quality format, requires concurrent --in_fasta argument""")
    #output arguments
    parser.add_argument("--out_33", action='store', default=sys.stdout, \
                        dest='out_33', nargs='?', help="""Output file in fastq format,
                        using Phred+33 coding""")
    parser.add_argument("--out_64", action='store', default=sys.stdout, \
                        dest='out_64', nargs='?', help="""Output file in fastq format,
                        using Phred+33 coding""")
    parser.add_argument("--out_fasta", action='store', default=sys.stdout, \
                        dest='out_fasta', nargs='?', help="""Output fasta format,
                        """)
    parser.add_argument("--out_qual", action='store', default=False, \
                        dest='out_qual', nargs='?', help="""Output quality format,
                        """)
    args = parser.parse_args()
    return args
def incoming(args):
    """interprets argparse command and assigns appropriate format for
    incoming file"""
    if options.in_fasta and options.in_qual:
        #ready2trim is the input after being read by fastRead.py
        ready2trim = read_fasta_with_quality(open(options.in_fasta), \
                                             open(options.in_qual))
        return ready2trim
    elif options.in_33:
        ready2trim = read_fastq(open(options.in_33))
        #phredCode_in specifies the Phred coding of the input fastQ
        phredCode_in = 33
        return ready2trim
    elif options.in_64:
        ready2trim = read_fastq(open(options.in_64))
        phredCode_in = 64
        return ready2trim
    else: sys.stderr.write("ERR: insufficient input arguments")
def print_output(seqID, seq, comm, qual):
    """interprets argparse command and creates appropriate format for
    outgoing file"""
    #Printing a fastQ
    if options.out_33 or options.out_64:
        if options.out_33:
            #phredCode_out specifies the Phred coding of the output fastQ
            phredCode_out = 33
            if comm:
                #outputfh is the file handle of new output file
                with open(options.out_33,'a') as outputfh:
                    outputfh.write("#{}\n{}\n{}\n+".format(seqID, seq, comm))
            else:
                with open(options.out_33,'a') as outputfh:
                    outputfh.write("#{}\n{}\n+".format(seqID, seq))
        else:
            phredCode_out = 64
            if comm:
                #outputfh is the file handle of new output file
                with open(options.out_33,'a') as outputfh:
                    outputfh.write("#{}\n{}\n{}\n+".format(seqID, seq, comm))
            else:
                with open(options.out_33,'a') as outputfh:
                    outputfh.write("#{}\n{}\n+".format(seqID, seq))
        print(''.join(str(chr(q+phredCode_out)) for q in qual))
    #Print a fasta
    if options.out_fasta:
        outputfh = open(options.out_fasta, "a")
        if(comment == ''):
            output.write('>{}\n{}\n'.format(seqID, seq))
        else: output.write('>{} {}\n{}\n'.format(seqID, comm, seq))
    #Print a qual
    if options.out_qual:
        outputfh = open(options.out_qual, "a")
        if(comment == ''):
            output.write('>{}\n{}\n'.format(seqID, seq))
        else: output.write('>{} {}\n{}\n'.format(seqID, comm, seq))
def main(args):
    """Prints combined fastq sequence from separate fasta and quality
    files according to user-generated arguments """
    for (seqID, seq, comm, qual) in ready2trim:
        for q in qual:
            #i counts satisfactory bases to later print that number of
            i = 0
            if ord(q) - phredCode_in >= min_qual:
                i += 1
        print_output(seqID, seq[0:i], comm, qual[0:i])
    sys.stderr.write("ERR: sys.stdin is without sequence data")
if __name__ == "__main__" :
    sys.exit(main(sys.argv))

parse_arguments seems to be calling itself recursively, while it is not called from anywhere else in the program:
def parse_arguments():
    """Creates a bevvy of possible sort arguments from command line and
    binds them to their respective names"""
    parser = argparse.ArgumentParser("--h", "--help", description=__doc__, \
                                     formatter_class=argparse.\
                                     RawDescriptionHelpFormatter)
    options = parse_arguments()
Perhaps this options line should be in the main function, or at global scope?
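One possible arrangement, sketched under assumptions: parse_arguments only builds the parser and returns the parsed namespace, and main calls it exactly once. Only three of the question's options are reproduced, the description string is a placeholder, and the argv parameter is added here so the sketch can be run with an explicit list. Note that argparse adds -h/--help on its own, so they are not passed to the ArgumentParser constructor.

```python
import argparse

def parse_arguments(argv=None):
    """Build the parser, parse argv (sys.argv[1:] when argv is None), return the namespace."""
    parser = argparse.ArgumentParser(
        description="Trim low-quality ends from sequences",  # placeholder text
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument("--min_qual", type=int, default=30,
                        help="lowest quality value that can appear in the output")
    parser.add_argument("--in_33", help="input fastq file, Phred+33 coding")
    parser.add_argument("--out_33", help="output fastq file, Phred+33 coding")
    return parser.parse_args(argv)

def main(argv=None):
    options = parse_arguments(argv)  # called once, from main, not from itself
    # ... read options.in_33, trim, and write to options.out_33 here ...
    return options

# Explicit arguments for illustration; a script would call main() with no arguments
options = main(["--min_qual", "35", "--in_33", "fQ.txt", "--out_33", "fQ_out.txt"])
print(options.min_qual, options.in_33, options.out_33)
```

Passing the namespace around (or returning it, as here) avoids relying on a module-level options variable that may not exist yet when a function runs.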

Editing text file through command line argument in Python

I want to edit a text file by passing an integer via a command line argument in Python. However, my code is not working. Can someone point out where I am going wrong?
import sys, argparse
def main(argv=None):
    if argv is None:
        argv=sys.argv[1:]
    p = argparse.ArgumentParser(description="Editing omnetpp.ini")
    p.add_argument('arg1', action='store', default= 1, type=int, help="number of clients")
    args = p.parse_args(argv)
    n = args.arg1
    f = open('C:\\Users\Abcd\Desktop\Omnet\omnetpp.ini', 'a')
    for i in range(n):
        f.write('*.voipClient['+str(i)+'].udpApp['+str(i)+'].destAddresses = "voipGateway"\n')
        f.write('*.voipGateway.udpApp['+str(i)+'].destAddresses = "voipClient['+str(i)+']"\n')
    f.close()
If the integer 5 is passed as the command line argument, it should add the following lines to the text file, but this is not happening.
Output:
*.voipClient[0].udpApp[0].destAddresses = "voipGateway"
*.voipGateway.udpApp[0].destAddresses = "voipClient[0]"
*.voipClient[1].udpApp[1].destAddresses = "voipGateway"
*.voipGateway.udpApp[1].destAddresses = "voipClient[1]"
*.voipClient[2].udpApp[2].destAddresses = "voipGateway"
*.voipGateway.udpApp[2].destAddresses = "voipClient[2]"
*.voipClient[3].udpApp[3].destAddresses = "voipGateway"
*.voipGateway.udpApp[3].destAddresses = "voipClient[3]"
*.voipClient[4].udpApp[4].destAddresses = "voipGateway"
*.voipGateway.udpApp[4].destAddresses = "voipClient[4]"
I am following these steps:
Code is saved in test.py
From command line C:\Users\Abcd\Desktop>python test.py 5

Don't close the file in the loop: as soon as it is closed you cannot write to it anymore (in fact, a ValueError is raised if you try to write to a closed file object).
Instead, close it after the loop.
Also, to put each entry on its own line, end the string with the newline character \n (like pressing Enter).
f = open(r'C:\Users\Abcd\Desktop\Omnet\omnetpp.ini', 'a')
for i in range(n):
    f.write('*.voipClient['+str(i)+'].udpApp['+str(i)+'].destAddresses = "voipGateway"\n')
    f.write('*.voipGateway.udpApp['+str(i)+'].destAddresses = "voipClient['+str(i)+']"\n')
f.close()
EDIT
By the way, as Rostyslav Dzinko said in the comments, defining a function called main does not make it run by itself; nothing in your script ever calls it. Try something like this instead (see also this SO question):
if __name__ == '__main__':
    p = argparse.ArgumentParser(description="Editing omnetpp.ini")
    p.add_argument('arg1', action='store', default= 1, type=int, help="number of clients")
    args = p.parse_args()
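A sketch of how the whole script could look with both points applied. The helper build_lines and the --ini option are assumptions made for this example (the question hard-codes the path), and the with block takes care of closing the file once all lines are written.

```python
import argparse

def build_lines(n):
    """Return the two config lines for each of n clients."""
    lines = []
    for i in range(n):
        lines.append('*.voipClient[{0}].udpApp[{0}].destAddresses = "voipGateway"\n'.format(i))
        lines.append('*.voipGateway.udpApp[{0}].destAddresses = "voipClient[{0}]"\n'.format(i))
    return lines

def main(argv=None):
    p = argparse.ArgumentParser(description="Editing omnetpp.ini")
    p.add_argument('n', type=int, help="number of clients")
    # Hypothetical option; the question uses a fixed Windows path instead
    p.add_argument('--ini', default='omnetpp.ini', help="file to append to")
    args = p.parse_args(argv)
    # 'with' closes the file exactly once, after the last write
    with open(args.ini, 'a') as f:
        f.writelines(build_lines(args.n))

# In a script, run main() under the usual guard:
#     if __name__ == '__main__':
#         main()
print(''.join(build_lines(2)))
```

Running the script as python test.py 5 would then append ten lines to the file named by --ini.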
