Python Regex Either/Or within Line - python

I have searched a good deal but have not found a way to use a regex to split a line into two components. I am using Python 2.6 and want to capture the integer into a value variable and the text into a metric variable. The input is pulled from a subprocess command running netstat -s.
The match below only works for the top 6 lines, not the bottom ones where the string comes first. I tried an "or" alternation inside the parentheses, (?P<value>[0-9]+|\w+\s[0-9]+), but that did not work. I have been using this site, which is really helpful, but still no luck: https://regex101.com/r/yV5hA4/3#python
Any help, or suggestions for another method, would be appreciated.
Code:
for line in output.split("\n"):
    match = re.search(r"(?P<value>[0-9]+)\s(?P<metric>\w+.*)", line, re.I)
    if match:
        value, metric = match.group('value', 'metric')
        print "%s => " % value + metric
The text I am trying to match:
17277 DSACKs received
4 DSACKs for out of order packets received
2 connections reset due to unexpected SYN
10294 connections reset due to unexpected data
48589 connections reset due to early user close
294 connections aborted due to timeout
TCPDSACKIgnoredOld: 15371
TCPDSACKIgnoredNoUndo: 1554
TCPSpuriousRTOs: 2
TCPSackShifted: 6330903
TCPSackMerged: 1883219
TCPSackShiftFallback: 792316

I would just forget about using re here, and just do something like this:
for line in output.split("\n"):
    value = None
    metric = ""
    for word in line.split():
        if word.isdigit():
            value = int(word)
        else:
            metric = "{} {}".format(metric, word)
    # strip the leading space and any trailing colon from the metric name
    print "{} => {}".format(metric.strip(" :"), value)
One slight caveat is that any line that has two or more numbers in it will only report the last one, but that's no worse than how your current approach would deal with that case...
Edit: I missed that the OP is on Python 2.6, which does not support auto-numbered {} format fields, so this should work instead:
for line in output.split("\n"):
    value = None
    metric = ""
    for word in line.split():
        if word.isdigit():
            value = int(word)
        else:
            metric = metric + " " + word
    # strip the leading space and any trailing colon from the metric name
    print "%s => %s" % (metric.strip(" :"), str(value))
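Since the question asked specifically about an either/or regex, here is a minimal sketch of that approach, written in Python 3 syntax (the question targets Python 2.6, where the same pattern should apply). It assumes only the two line shapes shown in the question. The names LINE_RE and parse_line and the group names value1, metric1, metric2, and value2 are made up for illustration; separate names are needed because Python's re module does not allow the same group name twice in one pattern.

```python
import re

# Two alternatives inside one non-capturing group:
#   "<number> <text>"   e.g. "17277 DSACKs received"
#   "<name>: <number>"  e.g. "TCPSackShifted: 6330903"
LINE_RE = re.compile(
    r"^\s*(?:(?P<value1>\d+)\s+(?P<metric1>.+)"
    r"|(?P<metric2>\w+):\s*(?P<value2>\d+))\s*$"
)


def parse_line(line):
    """Return (metric, value) for a matching line, or None otherwise."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    if m.group("value1") is not None:
        # number-first form
        return m.group("metric1").strip(), int(m.group("value1"))
    # name-first form
    return m.group("metric2"), int(m.group("value2"))


for sample in ["    17277 DSACKs received", "TCPSackShifted: 6330903", "Tcp:"]:
    print(parse_line(sample))
```

Lines that fit neither shape, such as netstat section headers, come back as None and can simply be skipped.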

Related

how to replace (or delete) a part of string from txt file in python

I am very new to Python (and programming in general), and here is my issue. I would like to replace (or delete) part of a string in a txt file that contains hundreds or thousands of lines. Each line starts with the very same string, which I want to delete.
I have not found a method to delete it, so I tried to replace it with an empty string, but for some reason it doesn't work.
Here is what I have written:
file = "C:/Users/experimental/Desktop/testfile siera.txt"
siera_log = open(file)
text_to_replace = "Chart: Bar Backtest: NQU8-CME [CB] 1 Min #1 | Study: free dll = 0 |"
for each_line in siera_log:
    new_line = each_line.replace("text_to_replace", " ")
    print(new_line)
When I print it to check whether it worked, I can see that the lines are as they were before; no change was made.
Can anyone help me find out why?
each line starts with the very same string which i want to delete.
The problem is you're passing a string "text_to_replace" rather than the variable text_to_replace.
But, for this specific problem, you could just remove the first n characters from each line:
text_to_replace = "Chart: Bar Backtest: NQU8-CME [CB] 1 Min #1 | Study: free dll = 0 |"
n = len(text_to_replace)
for each_line in siera_log:
    new_line = each_line[n:]
    print(new_line)
If you quote a variable it becomes a string literal and won't be evaluated as a variable.
Change your line for replacement to:
new_line = each_line.replace(text_to_replace, " ")
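The two suggestions above can be combined into a slightly more defensive sketch: slice the prefix off only when the line really starts with it, so that an unexpected line is left untouched. The helper name strip_prefix and the sample log text are hypothetical, not part of the original question.

```python
def strip_prefix(line, prefix):
    """Return line without prefix if it starts with prefix, else unchanged."""
    if line.startswith(prefix):
        return line[len(prefix):]
    return line


prefix = "Chart: Bar Backtest: NQU8-CME [CB] 1 Min #1 | Study: free dll = 0 |"
sample = prefix + " some log text"  # hypothetical log line

print(strip_prefix(sample, prefix))
print(strip_prefix("a line without the prefix", prefix))
```

Unlike str.replace, this only touches the start of the line, so the same text appearing later in a line is preserved.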

Checking to see if a specific string is in a file txt

I'm trying to check whether a specific string is in a text file.
I have this file, which contains the following:
Active Internet connections
Proto Recv-Q Send-Q Local Address Foreign Address (state) rxbytes txbytes
tcp4 0 0 192.168.1.6.50860 72.21.91.29.http CLOSE_WAIT 892 691
tcp4 0 0 192.168.1.6.50858 www.v.dropbox.co.https ESTABLISHED 27671 7563
tcp4 0 0 192.168.1.6.50857 162.125.17.1.https ESTABLISHED 17581 3642
and here is my code:
char = ""
file = open("location")
for i, line in enumerate(file):
    addi = i + 1
    if line.strip() == char:
        print "MATCH FOUND on line " + str(addi)
print "finished"
For this to work, I have to paste the entire line into my char variable. For example, it works if I paste "Active Internet connections", but if I put "Internet", it goes straight to the print "finished" line. How would I fix this?
You need to look for contains (in) rather than equals (==). You can also use a list comprehension to get all the matches then print out the results:
char = "<search-string>"
with open("location") as file:
    results = [i for i, line in enumerate(file, 1) if char in line]
if results:
    # join() needs strings, so convert the line numbers first
    print "MATCHES FOUND on lines " + ', '.join(map(str, results))
print "finished"
If you need more complicated search rules, then you may want to look at the regex module re
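As a rough illustration of what the re route could look like, the sketch below collects the 1-based numbers of all lines matching a pattern. It is written in Python 3 syntax; the function name matching_line_numbers and the in-memory sample list (taken from the file shown in the question) are assumptions for the example, and an open file object could be passed in place of the list.

```python
import re


def matching_line_numbers(lines, pattern):
    """Return the 1-based numbers of the lines where pattern is found."""
    regex = re.compile(pattern)
    return [i for i, line in enumerate(lines, 1) if regex.search(line)]


sample = [
    "Active Internet connections",
    "tcp4 0 0 192.168.1.6.50860 72.21.91.29.http CLOSE_WAIT 892 691",
    "tcp4 0 0 192.168.1.6.50858 www.v.dropbox.co.https ESTABLISHED 27671 7563",
]

# Match either connection state as a whole word
print(matching_line_numbers(sample, r"\bESTABLISHED\b|\bCLOSE_WAIT\b"))
```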
Might want to try using with open() as for proper file-handling.
And using the in keyword will work better than == because you want a match if it contains your string.
Also, using str.format is more readable IMO than "stuff" + str(value)
find = "Active Internet connections"
with open('location') as f:
    for i, line in enumerate(f, 1):
        if find in line:
            print("Match found on line {}".format(i))
print("finished")
In Python, strings are sequences of characters. To check whether one string occurs inside another, you can use the in operator.
if char in line:
    # do something
As simple as char in line.
Example usage is that "hi" in "hit" will be True, and "hi" in "hello" will be False.
You are checking whether the whole line equals char, but you should instead check whether char is contained in the line, since the line holds more than just char:
for i, line in enumerate(file):
    line_index = i + 1
    if char in line:
        print "MATCH FOUND on line " + str(line_index)
print "finished"
Also, I would recommend not using char as a variable name. Try more explicit and less ambiguous names, like pattern_to_find.
Looking for a substring in Python is a very simple task. Python's methods find() and count() are very useful in this context.
# This is the string you're looking for
ip = "192.168.1.6.50860"
# You need to both open and read the file to get its content
file = open("/home/my/own/directory/here/file.txt").read()
def findLine(text, string):
    if string in text:
        # count the newlines before the match to get its line number
        return "MATCH FOUND on line {}".format(text[0:text.find(string)].count("\n") + 1)
    else:
        return "MATCH NOT FOUND"
print(findLine(file, ip)) # Prints "MATCH FOUND on line 3" (1-based indexing)
Try this:
search = "what you want to find goes here"
filename = "file to read"
with open(filename) as f:
    for i, line in enumerate(f, 1):
        if search in line:
            print "MATCH FOUND in line", i

wordcount: reducer python program throws ValueError

I get this error whenever I try running my reducer Python program on a Hadoop system. The mapper program runs perfectly, though. I have given it the same permissions as my mapper program. Is there a syntax error?
Traceback (most recent call last):
  File "reducer.py", line 13, in <module>
    word, count = line.split('\t', 1)
ValueError: need more than 1 value to unpack
#!/usr/bin/env python
import sys
# maps words to their counts
word2count = {}
# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    word, count = line.split('\t', 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        continue
    try:
        word2count[word] = word2count[word]+count
    except:
        word2count[word] = count
# write the tuples to stdout
# Note: they are unsorted
for word in word2count.keys():
    print '%s\t%s'% ( word, word2count[word] )
The error ValueError: need more than 1 value to unpack is thrown when you do a multi-assign with too few values on the right hand side. So it looks like line has no \t in it, so line.split('\t',1) results in a single value, causing something like word, count = ("foo",).
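One way to make the reducer tolerate such lines is to check for the separator before unpacking. The sketch below is written in Python 3 syntax and is not the asker's Hadoop job: count_words is a hypothetical helper that takes any iterable of lines (sys.stdin in a real streaming job) and uses str.partition, which always returns three parts, so a line without a tab can be detected and skipped instead of raising.

```python
from collections import defaultdict


def count_words(lines, sep="\t"):
    """Sum the counts per word, skipping lines that cannot be parsed."""
    counts = defaultdict(int)
    for line in lines:
        word, found, count = line.strip().partition(sep)
        if not found:
            # no separator on this line (blank or stray output): skip it
            continue
        try:
            n = int(count)
        except ValueError:
            # the count field is not an integer: skip it
            continue
        counts[word] += n
    return dict(counts)


sample = ["foo\t1\n", "bar\t2\n", "stray debug output\n", "foo\t3\n", "\n"]
for word, total in sorted(count_words(sample).items()):
    print("%s\t%s" % (word, total))
```

Silently skipping bad lines hides upstream problems, so logging them to stderr may be the better choice while debugging the mapper.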
I cannot answer in detail.
However, I solved the same issue by removing some extra print statements I had added to the mapper. Probably the extra printed lines reached the reducer on sys.stdin without a tab in them.
I know you have probably solved the issue by now.
I changed line.split('\t', 1) to line.split(' ', 1) and it worked.
It seems that the space is not clear, to be perfectly clear: It should be line.split('(one space here)', 1).

How can I read one line from a telnet response with Python?

I was surprised that I couldn't find this question on here.
I would like to extract one line from a telnet response and make it a variable (actually one number from that line). I can extract up to where I need using telnet.read_until(), but the whole beginning is still there. The printout shows different statuses of a machine.
The line I am trying to get is formatted like this:
CPU Utilization : 5 %
I really only need the number, but there are many ':' and '%' characters in the rest of the output. Can anyone help me extract this value? Thanks in advance!
Here is my code (this reads the whole output and prints):
import socket, telnetlib, time  # socket is needed for socket.timeout below
print ("Starting Client...")
host = input("Enter IP Address: ")
timeout = 120
print ("Connecting...")
try:
    session = telnetlib.Telnet(host, 23, timeout)
except socket.timeout:
    print ("socket timeout")
else:
    print("Sending Commands...")
    session.write("command".encode('ascii') + b"\r")
    print("Reading...")
    output = session.read_until(b"/r/n/r/n#>", timeout )
    session.close()
    print(output)
    print("Done")
Edit: some example of what an output could be:
Boot Version : 1.1.3 (release_82001975_C)
Post Version : 1.1.3 (release_82001753_E)
Product VPD Version : release_82001754_C
Product ID : 0x0076
Hardware Strapping : 0x004C
CPU Utilization : 5 %
Uptime : 185 days, 20 hours, 31 minutes, 29 seconds
Current Date/Time : Fri Apr 26 17:50:30 2013
As you say in the question:
I can extract up to where I need using telnet.read_until(), but the whole beginning is still there.
So you can get all of the lines up to and including the one you want into a variable output. The only thing you're missing is how to get just the last line in that output string, right?
That's easy: just split output into lines and take the last one:
output.splitlines()[-1]
Or just split off the last line:
output.rpartition('\n')[-1]
This doesn't change output, it's just an expression that computes a new value (the last line in output). So, just doing this, followed by print(output), won't do anything visibly useful.
Let's take a simpler example:
a = 3
a + 1
print(a)
That's obviously going to print 3. If you want to print 4, you need something like this:
a = 3
b = a + 1
print(b)
So, going back to the real example, what you want is probably something like this:
line = output.rpartition('\n')[-1]
print(line)
And now you'll see this:
CPU Utilization : 5 %
Of course, you still need something like Johnny's code to extract the number from the rest of the line:
numbers = [s for s in line.split() if s.isdigit()]
print(numbers)
Now you'll get this:
['5']
Notice that gives you a list of one string. If you want just the one string, you still have another step:
number = numbers[0]
print(number)
Which gives you:
5
And finally, number is still the string '5', not the integer 5. If you want that, replace that last bit with:
number = int(numbers[0])
print(number)
This will still print out 5, but now you have a variable you can actually use as a number:
print(number / 100.0) # convert percent to decimal
I'm depending on the fact that telnet defines end-of-line as \r\n, and any not-quite-telnet-compatible server that gets it wrong is almost certainly going to use either Windows-style (also \r\n) or Unix-style (just \n) line endings. So, splitting on \n will always get the last line, even for screwy servers. If you don't need to worry about that extra robustness, you can split on \r\n instead of \n.
There are other ways you could solve this. I would probably either use something like session.expect([r'CPU Utilization\s*: (\d+)\s*%']), or wrap the session as an iterator of lines (like a file) and then just write the standard itertools solution. But this seems to be simplest given what you already have.
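The same regular expression can also be applied to the output after it has been read, without touching the session at all. The sketch below assumes Python 3, where read_until() returns bytes, and uses a made-up helper name, cpu_utilization, and a shortened sample taken from the output in the question. It uses only the re module.

```python
import re

# Bytes pattern, because the telnet output is bytes in Python 3
CPU_RE = re.compile(rb"CPU Utilization\s*:\s*(\d+)\s*%")


def cpu_utilization(output):
    """Return the CPU utilization as an int, or None if the line is absent."""
    m = CPU_RE.search(output)
    if m is None:
        return None
    return int(m.group(1))


sample = (
    b"Hardware Strapping : 0x004C\r\n"
    b"CPU Utilization : 5 %\r\n"
    b"Uptime : 185 days, 20 hours, 31 minutes, 29 seconds\r\n"
)
print(cpu_utilization(sample))
```

Because the pattern is anchored on the "CPU Utilization" label, the other ':' and '%' characters in the output do not matter, and the line does not need to be the last one.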
As I understand the problem, you want to select 1 line out of a block of lines, but not necessarily the last line.
The line you're interested in always starts with "CPU Utilization"
This should work:
for line in output.splitlines():
    if 'CPU Utilization' in line:
        cpu_utilization = line.split()[-2]
If you want to get only numbers:
>>> output = "CPU Utilization : 5 %"
>>> [int(s) for s in output.split() if s.isdigit()]
[5]
>>> output = "CPU Utilization : 5 % % 4.44 : 1 : 2"
>>> [int(s) for s in output.split() if s.isdigit()]
[5, 1, 2]
EDIT:
for line in output.splitlines():
    print line # this will print every single line in a loop, so you can make:
    print [int(s) for s in line.split() if s.isdigit()]
In [27]: mystring= "% 5 %;%,;;;;;%"
In [28]: ''.join(c for c in mystring if c.isdigit())
Out[28]: '5'
faster way :
def find_digit(mystring):
    return filter(str.isdigit, mystring)
find_digit(mystring)
5

How to append two strings in Python?

I have done this operation millions of times, just using the + operator! I have no idea why it is not working this time, it is overwriting the first part of the string with the new one! I have a list of strings and just want to concatenate them in one single string! If I run the program from Eclipse it works, from the command-line it doesn't!
The list is:
["UNH+1+XYZ:08:2:1A+%CONVID%'&\r", "ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&\r", "DUM'&\r"]
I want to discard the first and the last elements, the code is:
ediMsg = ""
count = 1
print "extract_the_info, lineList ",lineList
print "extract_the_info, len(lineList) ",len(lineList)
while (count < (len(lineList)-1)):
    temp = ""
    # ediMsg = ediMsg+str(lineList[count])
    # print "Count "+str(count)+" ediMsg ",ediMsg
    print "line value : ",lineList[count]
    temp = lineList[count]
    ediMsg += " "+temp
    print "ediMsg : ",ediMsg
    count += 1
    print "count ",count
Look at the output:
extract_the_info, lineList ["UNH+1+XYZ:08:2:1A+%CONVID%'&\r", "ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&\r", "DUM'&\r"]
extract_the_info, len(lineList) 8
line value : ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&
ediMsg : ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&
count 2
line value : DUM'&
DUM'& : ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&
count 3
Why is it doing so!?
While the two answers are correct (use " ".join()), your problem (besides very ugly python code) is this:
Your strings end in "\r", which is a carriage return. Everything is fine, but when you print to the console, "\r" will make printing continue from the start of the same line, hence overwrite what was written on that line so far.
You should use the following and forget about this nightmare:
''.join(list_of_strings)
The problem is not with the concatenation of the strings (although that could use some cleaning up), but in your printing. The \r in your string has a special meaning and will overwrite previously printed strings.
Use repr(), as such:
...
print "line value : ", repr(lineList[count])
temp = lineList[count]
ediMsg += " "+temp
print "ediMsg : ", repr(ediMsg)
...
to print out your result; that will make sure any special characters don't mess up the output.
'\r' is the carriage return character. When you're printing out a string, a '\r' will cause the next characters to go at the start of the line.
Change this:
print "ediMsg : ",ediMsg
to:
print "ediMsg : ",repr(ediMsg)
and you will see the embedded \r values.
And while your code works, please change it to the one-liner:
ediMsg = ' '.join(lineList[1:-1])
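If the carriage returns are not wanted in the joined message at all, they can be stripped while joining. This is a minimal sketch in Python 3 syntax; the helper name join_segments is made up, and whether dropping the trailing "\r" is acceptable depends on what consumes the message, which the question does not say.

```python
def join_segments(segments):
    """Join all but the first and last segment, dropping trailing '\\r'."""
    return " ".join(s.rstrip("\r") for s in segments[1:-1])


line_list = [
    "UNH+1+XYZ:08:2:1A+%CONVID%'&\r",
    "ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&\r",
    "DUM'&\r",
    "UNT+8+1'\r",
]

edi_msg = join_segments(line_list)
# repr() makes any remaining control characters visible
print(repr(edi_msg))
```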
Your problem is printing, and it is not string manipulation. Try using '\n' as last char instead of '\r' in each string in:
lineList = [
    "UNH+1+TCCARQ:08:2:1A+%CONVID%'&\r",
    "ORG+1A+77499505:PARAF0103+++A+FR:EUR++11730788+1A'&\r",
    "DUM'&\r",
    "FPT+CC::::::::N'&\r",
    "CCD+CA:5132839000000027:0450'&\r",
    "CPY+++AF'&\r",
    "MON+712:1.00:EUR'&\r",
    "UNT+8+1'\r"
]
I just gave it a quick look. It seems your problem arises when you are printing the text. I haven't done such things for a long time, but probably you only get the last line when you print. If you check the actual variable, I'm sure you'll find that the value is correct.
By last line, I'm talking about the \r you got in the text strings.
