Print out lines that begin with two different string outputs? - python
I am trying to scan an input file and print out parts of lines that begin with a certain string. The text file is 10000+ lines, but I am only concerned with the beginning of each line, and more specifically the data within it. For clarification, here are two lines from the file that show what I mean.
inst "N69" "IOB",placed BIOB_X11Y0 R8 ,
inst "n0975" "SLICEX",placed CLEXL_X20Y5 SLICE_X32Y5 ,
Here is the code that I have gotten to so far:
searchfile = open("C:\PATH\TO\FILE.txt","r")
for line in searchfile:
    if "inst " in line:
        print line
searchfile.close()
Now this is great if I am looking for all lines that start with "inst", but I am specifically looking for lines that start with "inst "N"" or "inst "n"". From there, I wanted to extract just the string starting with N or n.
My idea was to first extract those lines (as shown above) to a new .txt file, then run another script to get only the portions of the lines that have N or n. In the example above, I am only concerned with N69 and n0975. Is there an easier method of doing this?
Yes, with the re module:
re.finditer(r'^inst\s+\"n(\d+)\"', the_whole_file, re.I | re.M)
This returns an iterator over all the matches. The re.I flag makes the match case-insensitive, and re.M makes ^ match at the start of every line rather than only at the start of the string.
For each match, call .group(1) to get the numbers you wanted.
Notice that you don't need to filter the file first with this method. You can run it on the whole file at once.
The output in your case will be:
69
0975
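A minimal sketch of the finditer approach, run on an in-memory string rather than a real file. The names the_whole_file, pattern and numbers are only illustrative, and the sample text is assumed to mirror the lines shown in the question.

```python
import re

# Sample text standing in for the contents of the input file (assumed layout).
the_whole_file = (
    'inst "N69" "IOB",placed BIOB_X11Y0 R8 ,\n'
    'some text\n'
    'inst "n0975" "SLICEX",placed CLEXL_X20Y5 SLICE_X32Y5 ,\n'
)

# re.I lets "n" match "N" as well; re.M lets ^ anchor at every line start.
pattern = re.compile(r'^inst\s+"n(\d+)"', re.I | re.M)

# group(1) is the run of digits captured after the leading n/N.
numbers = [m.group(1) for m in pattern.finditer(the_whole_file)]
print(numbers)
```

For a real file, the_whole_file would come from something like f.read() inside a with open(...) block.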
With re.search() function:
Sample file.txt content:
inst "N69" "IOB",placed BIOB_X11Y0 R8 ,
some text
inst "n0975" "SLICEX",placed CLEXL_X20Y5 SLICE_X32Y5 ,
text
another text
import re
with open('file.txt', 'r') as f:
    for l in f.read().splitlines():
        m = re.search(r'^inst "([Nn][^"]+)"', l)
        if m:
            print(m.group(1))
The output:
N69
n0975
Here is one solution:
with open('nfile.txt','r') as f:
    for line in f:
        if line.startswith('inst "n') or line.startswith('inst "N'):
            print(line.split()[1])
For each line in the file, the startswith part checks whether the line starts with one of your target patterns. If it does, the line is split on whitespace with split and the second component, which is the part with n or N (still wrapped in its quotes), is printed.
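As a hedged variation on the same idea, str.startswith also accepts a tuple of prefixes, and the surrounding quotes can be stripped from the extracted field. The function name extract_names and the sample list are hypothetical, introduced here only for illustration.

```python
def extract_names(lines):
    # Collect the first quoted field of every line starting with inst "n or inst "N.
    names = []
    for line in lines:
        # str.startswith accepts a tuple and returns True if any prefix matches.
        if line.startswith(('inst "n', 'inst "N')):
            # The second whitespace-separated field looks like "N69"; drop the quotes.
            names.append(line.split()[1].strip('"'))
    return names


# Sample lines taken from the question, plus one non-matching line.
sample = [
    'inst "N69" "IOB",placed BIOB_X11Y0 R8 ,\n',
    'some text\n',
    'inst "n0975" "SLICEX",placed CLEXL_X20Y5 SLICE_X32Y5 ,\n',
]
print(extract_names(sample))
```

Passing an open file object instead of the list works the same way, since both yield one line at a time.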
Related
'regular expression in <string>' requires string as left operand, not list
I am new to python and I don't seem to find why the second script does not work when using regular expressions.
Use case: I want to extract entries starting with "crypto map IPSEC xx ipsec-isakmp" from a Cisco running configuration file and print this line and the next 4. I have managed to print the lines after the match but not the matched line itself. My workaround for this is to print the text "crypto map IPSEC" statically first. The script will then print me the next 4 lines using "islice". As this is not perfect I wanted to use regular expression. This does not work at all.
from itertools import islice
import re

#This works
print('Crypto map configurations: \n')
with open('show_run.txt', 'r') as f:
    for line in f:
        if 'crypto map IPSEC' and 'ipsec-isakmp' in line:
            print('crypto map IPSEC')
            print(''.join(islice(f, 4)))
f.close()

# The following does not work.
# Here I would like to use regular expressions to fetch the lines
# with "crypto map IPSEC xx ipsec-isakmp"
#
'''
print('Crypto map configurations: \n')
with open('show_run.txt', 'r') as f:
    for line in f:
        pattern = r"crypto\smap\sIPSEC\s\d+\s.+"
        matched = re.findall(pattern, line)
        if str(matched) in line:
            print(str(matched))
            print(''.join(islice(f, 4)))
f.close()
'''
if 'crypto map IPSEC' and 'ipsec-isakmp' in line:
should be:
if 'crypto map IPSEC' in line and 'ipsec-isakmp' in line:
Another alternative (if the line looks like what you described in the question):
if line.startswith('crypto map IPSEC') and line.rstrip().endswith('ipsec-isakmp'):
    ...
The rstrip() is needed because a line read from a file still ends with its newline.
Also note that print(''.join(islice(f, 4))) reads the next four lines from f. It does not print the matched line itself, so print line separately if you want it in the output.
As for your question about regex: there is no need for a regex here (consider the previous parts of this answer), as it is typically slower than plain string methods and usually harder to maintain. That said, if this question is for learning, you can do:
import re
line = 'crypto map IPSEC 12345 ipsec-isakmp'
pattern = r'crypto map IPSEC (\d+) ipsec-isakmp'
matched = re.findall(pattern, line)
if matched:
    print(matched[0])
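To see why the original condition misbehaves, here is a small sketch. The sample line is invented for the demonstration and is not taken from a real configuration.

```python
# A made-up line that contains 'ipsec-isakmp' but not 'crypto map IPSEC'.
line = "interface GigabitEthernet0/1 ipsec-isakmp"

# Python parses this as: ('crypto map IPSEC') and ('ipsec-isakmp' in line).
# A non-empty string literal is always truthy, so only the second test counts.
buggy = bool('crypto map IPSEC' and 'ipsec-isakmp' in line)

# Here each substring gets its own membership test.
fixed = 'crypto map IPSEC' in line and 'ipsec-isakmp' in line

print(buggy, fixed)
```

The buggy form accepts the line even though it is not a crypto map entry, while the corrected form rejects it.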
I want to extract entries starting with "crypto map IPSEC xx ipsec-isakmp" from a Cisco running configuration file and print this line and the next 4.

Then you're making it much more complicated than it has to be:
for line in f:
    if line.startswith("crypto map IPSEC") and "ipsec-isakmp" in line:
        print(line.strip())
        for i in range(4):
            try:
                print(next(f).strip())
            except StopIteration:
                # we've reached the end of the file and there weren't 4 lines left
                # after the last "crypto map IPSEC" line
                break
NB: if you really insist on using regexps, replace the second line with
if re.match(r"^crypto map IPSEC \d+ ipsec-isakmp", line):
(assuming this is the correct pattern of course, which is hard to tell for sure without seeing your real data)
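The same "matched line plus the next four" logic can be sketched as a reusable generator built on itertools.islice. The function name crypto_map_blocks and the sample configuration are assumptions made up for this illustration; the pattern is the one suggested above and may need adjusting for real data.

```python
import io
import re
from itertools import islice

# Pattern taken from the answer above; re.match anchors it at the line start.
CRYPTO_MAP = re.compile(r"crypto map IPSEC \d+ ipsec-isakmp")


def crypto_map_blocks(lines, following=4):
    # Yield [matched line, next `following` lines] for every crypto map entry.
    lines = iter(lines)
    for line in lines:
        if CRYPTO_MAP.match(line):
            # islice pulls from the same iterator, so these lines are consumed
            # and the outer loop resumes after them. Fewer lines are returned
            # if the input ends early.
            tail = [nxt.rstrip() for nxt in islice(lines, following)]
            yield [line.rstrip()] + tail


# Invented sample standing in for show_run.txt.
sample = io.StringIO(
    "hostname r1\n"
    "crypto map IPSEC 10 ipsec-isakmp\n"
    " set peer 192.0.2.1\n"
    " set transform-set TS\n"
    " match address 101\n"
    " set pfs group2\n"
    "interface Gi0/0\n"
)

blocks = list(crypto_map_blocks(sample))
for block in blocks:
    print("\n".join(block))
```

Because islice simply stops at the end of the input, no explicit StopIteration handling is needed in this version.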
Concatenate lines with previous line based on number of letters in first column
New to coding and trying to figure out how to fix a broken csv file so that I can work with it properly.
The file has been exported from a case management system and contains fields for username, casenr, time spent, notes and date. The problem is that occasional notes have newlines in them, and when exporting the csv the tooling does not add quotation marks to mark the note as a single string within the field. See the example below:
user;case;hours;note;date;
tnn;123;4;solved problem;2017-11-27;
tnn;124;2;random comment;2017-11-27;
tnn;125;3;I am writing a comment
that contains new lines
without quotation marks;2017-11-28;
HJL;129;8;trying to concatenate lines to re form the broken csv;2017-11-29;
I would like to concatenate lines 3, 4 and 5 to show the following:
tnn;125;3;I am writing a comment that contains new lines without quotation marks;2017-11-28;
Since every line starts with a username (always 3 letters), I thought I would be able to iterate over the lines to find which lines do not start with a username and concatenate those with the previous line. It is not really working as expected though.
This is what I have got so far:
import re

with open('Rapp.txt', 'r') as f:
    for line in f:
        previous = line #keep current line in variable to join next line
        if not re.match(r'^[A-Za-z]{3}', line): #regex to match 3 letters
            print(previous.join(line))
The script shows no output and just finishes silently. Any thoughts?
I think I would go a slightly different way:
import re
all_the_data = ""
with open('Rapp.txt', 'r') as f:
    for line in f:
        # a complete record ends with a date followed by ";"
        if not re.search(r"\d{4}-\d{1,2}-\d{1,2};\n", line):
            line = re.sub(r"\n", "", line)
        all_the_data = "".join([all_the_data, line])
print(all_the_data)
There are several ways to do this, each with pros and cons, but I think this keeps it simple. Loop over the file as you have done, and if a line doesn't end in a date and a ;, take off its newline before appending it to all_the_data. That way you don't have to look back 'up' the file.
If you would rather use the logic of "starts with 3 letters and a ;" and looking back, this works:
import re
all_the_data = ""
with open('Rapp.txt', 'r') as f:
    for line in f:
        if not re.search(r"^[A-Za-z]{3};", line):
            all_the_data = re.sub(r"\n$", "", all_the_data)
        all_the_data = "".join([all_the_data, line])
print("results:")
print(all_the_data)
Pretty much what was asked for. The logic is that if the current line doesn't start with a username, the previous line's newline is removed from all_the_data.
If you need help playing with the regex itself, this site is great: http://regex101.com
The regex in your code matches every line in the file, so the if condition is never true and nothing prints.
with open('./Rapp.txt', 'r') as f:
    join_words = []
    for line in f:
        line = line.strip()
        # a ";" within the first four characters marks the start of a new record
        if len(line) > 3 and ";" in line[0:4] and len(join_words) > 0:
            print(' '.join(join_words))
            join_words = []
        join_words.append(line)
    print(' '.join(join_words))
I've tried not to use regex here to keep it a little clearer. That said, regex is a better option.
A simple way would be to use a generator that acts as a filter on the original file. That filter concatenates a line to the previous one if its 4th character is not a semicolon (;). The code could be:
import csv

def preprocess(fd):
    previous = next(fd)
    for line in fd:
        if line[3:4] == ';':  # slicing avoids an IndexError on short lines
            yield previous
            previous = line
        else:
            previous = previous.strip() + " " + line
    yield previous  # don't forget last line!
You could then use:
with open('test.txt') as fd:
    rd = csv.DictReader(preprocess(fd), delimiter=';')
    for row in rd:
        ...
The trick here is that the csv module only requires an object that returns a line each time the next function is applied to it, so a generator is appropriate. But this is only a workaround; the correct fix would be for the previous step to produce a valid CSV file directly.
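A self-contained sketch of the generator filter feeding csv.DictReader, using io.StringIO in place of a real file. The sample data follows the question, but the exact positions of the line breaks inside the note are an assumption, and the empty-input guard is an addition not present in the answer above.

```python
import csv
import io


def preprocess(fd):
    # Merge continuation lines into the record that precedes them.
    previous = next(fd, None)
    if previous is None:          # empty input: nothing to yield
        return
    for line in fd:
        if line[3:4] == ';':      # 3-letter username followed by ";": new record
            yield previous
            previous = line
        else:                     # continuation of the previous record's note
            previous = previous.rstrip('\n') + ' ' + line
    yield previous                # the last record is still pending


# Assumed shape of the broken export (break positions are a guess).
broken = io.StringIO(
    "user;case;hours;note;date;\n"
    "tnn;124;2;random comment;2017-11-27;\n"
    "tnn;125;3;I am writing a comment\n"
    "that contains new lines\n"
    "without quotation marks;2017-11-28;\n"
    "HJL;129;8;trying to concatenate lines;2017-11-29;\n"
)

rows = list(csv.DictReader(preprocess(broken), delimiter=';'))
for row in rows:
    print(row['user'], row['case'], row['note'])
```

Note that the trailing ; on every line produces one extra column with an empty name; it can simply be ignored.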
Python, Extracting 3 lines before and after a match
I am trying to figure out how to extract 3 lines before and after a matched word.
At the moment, my word is found. I wrote up some text to test my code. And, I figured out how to print three lines after my match.
But, I am having difficulty trying to figure out how to print three lines before the word, "secure".
Here is what I have so far:
from itertools import islice

with open("testdoc.txt", "r") as f:
    for line in f:
        if "secure" in line:
            print("".join(line))
            print ("".join(islice(f,3)))
Here is the text I created for testing:
----------------------------
This is a test to see
if i can extract information
using this code
I hope, I try,
maybe secure shell will save thee
Im adding extra lines to see my output
hoping that it comes out correctly
boy im tired, sleep is nice
until then, time will suffice
You need to buffer your lines so you can recall them. The simplest way is to just load all the lines into a list:
with open("testdoc.txt", "r") as f:
    lines = f.readlines()  # read all lines into a list
for index, line in enumerate(lines):  # enumerate the list and loop through it
    if "secure" in line:  # check if the current line has your substring
        print(line.rstrip())  # print the current line (trailing whitespace stripped)
        print("".join(lines[max(0,index-3):index]))  # print three lines preceding it
But if you need maximum storage efficiency, you can use a buffer to store the last 3 lines as you loop over the file line by line. A collections.deque is ideal for that.
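The deque-based buffer mentioned above could look roughly like this. The function name context and its parameters are hypothetical; the sketch assumes that a match inside the "following" lines does not need to be reported separately.

```python
from collections import deque
from itertools import islice


def context(lines, word, before=3, after=3):
    # Yield (preceding, matched, following) for each line containing word.
    lines = iter(lines)
    recent = deque(maxlen=before)   # automatically discards older lines
    for line in lines:
        if word in line:
            # Consume the following lines from the same iterator.
            following = list(islice(lines, after))
            yield list(recent), line, following
            # The consumed lines also become history for any later match.
            recent.extend([line] + following)
        else:
            recent.append(line)


# Small made-up sample; a file object opened with open() works the same way.
sample = ["one", "two", "three", "four", "maybe secure shell",
          "six", "seven", "eight", "nine"]

results = list(context(sample, "secure"))
for preceding, matched, following in results:
    print(preceding, matched, following)
```

Only the last three lines are kept in memory at any time, so this also works for files too large to load with readlines().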
I came up with this solution: keep the previous lines in a list, and drop the oldest one once the list holds more than 4 elements.
from itertools import islice

with open("testdoc.txt", "r") as f:
    linesBefore = list()
    for line in f:
        linesBefore.append(line.rstrip())
        if len(linesBefore) > 4:  # keep the current line plus at most 3 before it
            linesBefore.pop(0)
        if "secure" in line:
            for previous in linesBefore[:-1]:  # the (up to 3) lines before the match
                print(previous)
            print(line.rstrip())
            print("".join(islice(f, 3)))
reading data from multiple lines as a single item
I have a set of data from a file as such:
"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,\
00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,\
77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,\
00,2e,00,77,00,61,00,76,00,ff,00

"johnnyboy"="gotwastedatthehouse"

"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,\
00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,\
77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,\
00,2e,00,77,00,61,00,76,00,ff,00

[mattplayhouse\wherecanwego\tothepoolhall]
How can I read/reference the text per "johnnyboy"=splice(23) as a single line, as such:
"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,00,2e,00,77,00,61,00,76,00,ff,00
I am currently matching the regex based on splice(23): with a search as follows:
re_johnny = re.compile('splice')
with open("file.txt", 'r') as file:
    read = file.readlines()
    for line in read:
        if re_johnny.match(line):
            print(line)
I think I need to remove the backslashes and the spaces to merge the lines, but I am unfamiliar with how to do that without also picking up the blank lines or the lines that do not match my regex.
When trying the first solution attempt, my last row was pulled in inappropriately. Any assistance would be great.
Input file: fin
"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,\
00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,\
77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,\
00,2e,00,77,00,61,00,76,00,ff,00

"johnnyboy"="gotwastedatthehouse"

"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,\
00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,\
77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,\
00,2e,00,77,00,61,00,76,00,ff,00

[mattplayhouse\wherecanwego\tothepoolhall]
Adding to tigerhawk's suggestion you can try something like this:
Code:
import re
with open('fin', 'r') as f:
    for l in [''.join([b.strip('\\') for b in a.split()]) for a in f.read().split('\n\n')]:
        if 'splice' in l:
            print(l)
Output:
"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,00,2e,00,77,00,61,00,76,00,ff,00
"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,00,2e,00,77,00,61,00,76,00,ff,00
With regex you have multiplied your problems. Instead, keep it simple: If a line starts with ", it begins a record. Else, append it to the previous record. You can implement parsing for such a scheme in just a few lines in Python. And you don't need regex.
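The scheme described above can be sketched in a few lines. One adjustment is assumed here: a line is appended only while the record so far ends with a backslash, so that unrelated lines such as the [section] header are not pulled into the last record. The function name records and the shortened sample data are made up for the example.

```python
def records(lines):
    # Yield one string per logical record, joining backslash-continued lines.
    current = None                      # record being assembled, or None
    for raw in lines:
        line = raw.strip()
        if line.startswith('"'):        # a leading quote begins a new record
            if current is not None:
                yield current
            current = line
        elif current is not None and current.endswith('\\'):
            current = current[:-1] + line   # drop the "\" and append
        else:
            # Blank or unrelated line: close any open record.
            if current is not None:
                yield current
            current = None
    if current is not None:
        yield current


# Shortened, made-up version of the data from the question.
sample = r'''"johnnyboy"=splice(23):15,00,30,\
  00,5c,00,\
  00,2e,ff,00

"johnnyboy"="gotwastedatthehouse"

[mattplayhouse\wherecanwego\tothepoolhall]
'''.splitlines()

spliced = [rec for rec in records(sample) if 'splice' in rec]
print(spliced)
```

Filtering with 'splice' in rec afterwards keeps the joining logic separate from the selection logic.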
How to use regex to escape some info in python?
I have a text file that I read using Python. It starts with a web address, followed by other entries that start with (y) or (n). There may be a few blank lines between the lines. For example, the text file can look like this:
http://usatoday30.usatoday.com/money/industries/energy/2005-12-28-enron-participants_x.htm
(y) Lay, Kenneth
(y) Skilling, Jeffrey
(n) Howard, Kevin
(n) Krautz, Michael
I would like to get the names that start with (y) returned as a list. For this case the returned list would be:
result = ["Lay, Kenneth", "Skilling, Jeffrey"]
I read the data as follows:
poi_names_data = open("../final_project/poi_names.txt", "r")
for row in poi_names_data:
    print row, "\n"
How do I extract the right info from each row?
As suggested in the comments, you can use startswith to decide if you are going to process the row and use re.sub to remove (y), leading spaces and line breaks \n, after that it should give you the expected output:
import re
result = []
with open("test.txt") as text:
    for row in text:
        if row.startswith("(y)"):
            result.append(re.sub(r"\(y\)\s+|\n", "", row))

result
# ['Lay, Kenneth', 'Skilling, Jeffrey']
I'd recommend reading the file line by line and processing each line as you go. If the file is really big, this performs better and uses less memory.
import re
result = []
rx = re.compile(r'(?<=\(y\)).*')
with open('data.txt', 'r') as f:
    for line in f:
        match = rx.search(line)
        if match:
            result.append(match.group(0).strip())
print(result)
I get the following output from your sample data (assuming the data is stored in the file data.txt):
['Lay, Kenneth', 'Skilling, Jeffrey']
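For comparison, the same extraction can be done without the re module at all, using only startswith and slicing. The helper name names_with_flag and its flag parameter are hypothetical, and io.StringIO stands in for the real file.

```python
import io


def names_with_flag(lines, flag="(y)"):
    # Return the text after `flag` for every line that starts with it.
    result = []
    for row in lines:
        row = row.strip()               # drop the newline and stray spaces
        if row.startswith(flag):
            # Slice off the flag, then strip the space that follows it.
            result.append(row[len(flag):].strip())
    return result


# Sample data from the question, with a couple of blank lines mixed in.
sample = io.StringIO(
    "http://usatoday30.usatoday.com/money/industries/energy/"
    "2005-12-28-enron-participants_x.htm\n"
    "\n"
    "(y) Lay, Kenneth\n"
    "(y) Skilling, Jeffrey\n"
    "(n) Howard, Kevin\n"
    "\n"
    "(n) Krautz, Michael\n"
)

result = names_with_flag(sample)
print(result)
```

Passing flag="(n)" would return the other group of names with the same code.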