How to find a block comment string in Python

I want to parse block comments in Python. In order to do that, I need to find the comment sections in a .py file.
For example, my sample.py contains
'''
This is sample file for testing
This is sample file for development
'''
import os
from os import path
def somefun():
    return
I want to parse the comments section for which I am trying to do following in sampleparsing.py
tempfile = open('sample.py', 'r')
for line in tempfile:
    if(line.find(''''') != -1):
        dosomeoperation()
        break
Since if(line.find(''''') != -1): makes Python treat all the remaining lines as part of a string, how do I write this string so that it finds the comment markers?
I tried putting '\' (the escape character) in between, but I could not find a solution to this issue.
I need following two lines after parsing in sampleparsing.py:
This is sample file for testing
This is sample file for development

Try "'''" as your string instead. Right now Python thinks you are still writing a string.

Try this
tempfile = open('sample.py', 'r')
contents = tempfile.read()  # read() returns a string; readlines() returns a list, which has no split()
comments = contents.split( "'''" )
tempfile.close()
Assuming the file starts with a comment block, the comment blocks should now be the odd numbered indices, 1, 3, 5...
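A minimal sketch of that split-based approach, with the file contents inlined as a string so it is self-contained (a real script would read sample.py instead). Note it naively treats every ''' as a comment delimiter:

```python
# Hypothetical contents of sample.py, inlined for the sketch
source = """'''
This is sample file for testing
This is sample file for development
'''
import os
from os import path
"""

parts = source.split("'''")
comments = parts[1::2]  # odd-numbered chunks are the triple-quoted blocks
print(comments[0].strip())
```

This prints the two lines the question asks for.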

Related

Searching file for two strings with no duplicates, cut paste into file

I have a file of configs. I am trying to get my Python code to search for two different strings in a text file, copy them (cut would make my life so much easier), and paste them into a text file without duplicates. My code works for just one string, and every time I try to make it handle two, it will either not work or only find the lines containing both strings.
What am I doing wrong?
import sys

with open("ns-batch.bak.txt") as f:
    lines = f.readlines()

lines = [l for l in lines if "10.42.88.192" in l]

with open("Py_parse2.txt", "w") as f1:
    f1.writelines(lines)
Okay, here's my take on things.
Assuming that you are looking for certain strings within each line, and then want to "copy" those lines to another file to see in which lines those strings were found, this, for example, should work:
lines = list()
with open("ns-batch.bak.txt", "r") as orig_file:
    for line in orig_file:
        if ("12.32.45.1" in line) or ("27.82.1.0" in line):  #if "12.32.45.1" in line:
            lines.append(line)
with open("Py_parse2.txt", "x") as new_file:
    for line in lines:
        new_file.write(line)  # each line already ends with '\n'
Depending on how many strings you are looking for on each line, you can either add or remove in conditions on the if line of my example code (on that same line I also provided a commented-out version that only needs to find one string). The import sys statement does absolutely nothing in this case; the sys module is not needed for this work, so do not include that import. If you want to learn more about file I/O, check out this link ( https://docs.python.org/3/tutorial/inputoutput.html?highlight=write ) and go to section "7.2 Reading and Writing Files".
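If the list of strings keeps growing, any() keeps the condition readable instead of chaining or clauses. A small sketch with made-up addresses and inlined lines:

```python
# Hypothetical target strings; swap in the ones you actually need
targets = ("12.32.45.1", "27.82.1.0")

lines = [
    "host-a 12.32.45.1 up\n",
    "comment line\n",
    "host-b 27.82.1.0 down\n",
]

# Keep a line if it contains at least one of the targets
matched = [line for line in lines if any(t in line for t in targets)]
```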

Code for copying specific lines from multiple files to a single file (and removing part of the copied lines)

First of all, I am really new to this. I've been reading up on some tutorials over the past days, but now I've hit a wall with what I want to achieve.
To give you the long version: I have multiple files in a directory, all of which contain information in certain lines (23-26). Now, the code would have to find and open all files (naming pattern: *.tag) and then copy lines 23-26 to a new single file. (And add a new line after each new entry...). Optionally it would also remove a specific part from each line that I do not need:
C12b2
-> everything before C12b2 (or similar) would need to be removed.
Thus far I have managed to copy those lines from a single file to a new file, but the rest still eludes me: (no idea how formatting works here)
f = open('2.tag')
n = open('output.txt', 'w')
for i, text in enumerate(f):
    if i >= 23 and i < 27:
        n.write(text)
    else:
        pass
Could anyone give me some advice ? I do not need a complete code as an answer, however, good tutorials that don't skip explanations seem to be hard to come by.
You can look at the glob module: it gives a list of filenames that match the pattern you provide it. Please note this pattern is not a regex; it is a shell-style pattern (using shell-style wildcards).
Example of glob -
>>> import glob
>>> glob.glob('*.py')
['a.py', 'b.py', 'getpip.py']
You can then iterate over each of the file returned by the glob.glob() function.
For each file you can do that same thing you are doing right now.
Then when writing files, you can use str.find() to find the first instance of the string C12b2 and then use slicing to remove of the part you do not want.
As an example -
>>> s = "asdbcdasdC12b2jhfasdas"
>>> s[s.find("C12b2"):]
'C12b2jhfasdas'
You can do something similar for each of your lines. Please note, if the use case is that only some lines would have C12b2, then you need to first check whether that string is present in the line before doing the above slicing. Example -
if 'C12b2' in text:
    text = text[text.find("C12b2"):]
You can do above before writing the line into the output file.
Also, it would be good to look into the with statement; you can use it for opening files, so that it will automatically handle closing the file when you are done with the processing.
Without importing anything but os:
#!/usr/bin/env python3
import os
# set the directory, the outfile and the tag below
dr = "/path/to/directory"; out = "/path/to/newfile"; tag = ".tag"
for f in [f for f in os.listdir(dr) if f.endswith(tag)]:
    open(out, "a").write("".join(open(dr+"/"+f).readlines()[22:26])+"\n")
What it does
It does exactly as you describe, it:
collects a defined region of lines from all files (that is: of a defined extension) in a directory
pastes the sections into a new file, separated by a new line
Explanation
[f for f in os.listdir(dr) if f.endswith(".tag")]
lists all files of the specific extension in your directory,
[l for l in open(dr+"/"+f).readlines()[22:26]]
reads the selected lines of the file
open(out, "+a").write()
writes to the output file, creates it if it does not exist.
How to use
Copy the script into an empty file, save it as collect_lines.py
set in the head section the directory with your files, the path to the new file and the extension
run it with the command:
python3 /path/to/collect_lines.py
The verbose version, with explanation
If we "decompress" the code above, this is what happens:
#!/usr/bin/env python3
import os

#--- set the path to the directory, the new file and the tag below
dr = "/path/to/directory"; out = "/path/to/newfile"; tag = ".tag"
#---

files = os.listdir(dr)
for f in files:
    if f.endswith(tag):
        # read the file as a list of lines
        content = open(dr+"/"+f).readlines()
        # the first item in a list = index 0, so line 23 is index 22
        needed_lines = content[22:26]
        # convert the list to a string, add a new line
        string_topaste = "".join(needed_lines) + "\n"
        # add the lines to the new file, create the file if necessary
        open(out, "a").write(string_topaste)
Using the glob package you can get a list of all *.tag files:
import glob
# ['1.tag', '2.tag', 'foo.tag', 'bar.tag']
tag_files = glob.glob('*.tag')
If you open your file using the with statement, it is being closed automatically afterwards:
with open('file.tag') as in_file:
    # do something
Use readlines() to read your entire file into a list of lines, which can then be sliced:
lines = in_file.readlines()[22:26]
If you need to skip everything before a specific pattern, use str.split() to separate the string at the pattern and take the last part:
pattern = 'C12b2'
clean_lines = [line.split(pattern, 1)[-1] for line in lines]
Take a look at this example:
>>> lines = ['line 22', 'line 23', 'Foobar: C12b2 line 24']
>>> pattern = 'C12b2'
>>> [line.split(pattern, 1)[-1] for line in lines]
['line 22', 'line 23', ' line 24']
You can readlines() and writelines() using a and b as line bounds for the slice of lines to write:
with open('oldfile.txt', 'r') as old:
    lines = old.readlines()[a:b]
with open('newfile.txt', 'w') as new:
    new.writelines(lines)
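Putting the pieces above together, here is a hedged sketch: a helper that takes one file's lines, slices lines 23-26 (indices 22:26), and drops everything before the marker when it is present. The function name and sample data are made up for illustration; in the real script you would feed it the readlines() of each file returned by glob.glob('*.tag'):

```python
def extract(lines, pattern='C12b2', start=22, stop=26):
    """Slice out lines 23-26 and drop everything before pattern (if present)."""
    # split(pattern, 1)[-1] returns the line unchanged when pattern is absent
    return [line.split(pattern, 1)[-1] for line in lines[start:stop]]

# 22 filler lines, then the four lines of interest, then one extra line
sample = ['filler\n'] * 22 + [
    'junkC12b2 line 23\n',
    'C12b2 line 24\n',
    'line 25 without marker\n',
    'moreC12b2 line 26\n',
    'line 27, ignored\n',
]
result = extract(sample)
```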

Python failing to read lines properly

I'm supposed to open a file, read it line per line and display the lines out.
Here's the code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import re
in_path = "../vas_output/Glyph/20140623-FLYOUT_mins_cleaned.csv"
out_path = "../vas_gender/Glyph/"
csv_read_line = open(in_path, "rb").read().split("\n")
line_number = 0
for line in csv_read_line:
    line_number += 1
    print str(line_number) + line
Here's the contents of the input file:
12345^67890^abcedefg
random^test^subject
this^sucks^crap
And here's the result:
this^sucks^crapjectfg
Some weird combo of all three. In addition, the line_number output is missing. Printing the result of len(csv_read_line) outputs 1, for some reason, no matter how many lines are in the input file. Changing the split character from \n to ^ gives the expected output, though, so I'm assuming the problem is with the input file.
I'm using a Mac, and did both the python code and the input file (on Sublime Text) on the Mac itself.
Am I missing something?
You seem to be splitting on "\n" which isn't necessary, and could be incorrect depending on the line terminators used in the input file. Python includes functionality to iterate over the lines of a file one at a time. The advantages are that it will worry about processing line terminators in a portable way, as well as not requiring the entire file to be held in memory at once.
Further, note that you are opening the file in binary mode (the b character in your mode string) when you actually intend to read the file as text. This can cause problems similar to the one you are experiencing.
Also, you do not close the file when you are done with it. In this case that isn't a problem, but you should get in the habit of using with blocks when possible to make sure the file gets closed at the earliest possible time.
Try this:
with open(in_path, "r") as f:
    line_number = 0
    for line in f:
        line_number += 1
        print str(line_number) + line.rstrip('\r\n')
So your example just works for me.
But then, I just copied your text into a text editor on Linux and did it that way, so any carriage returns will have been wiped out.
Try this code though:
in_path = "input.txt"
with open(in_path, "rU") as inputFile:  # "U" enables universal newlines in Python 2
    for lineNumber, line in enumerate(inputFile):
        print lineNumber, line.strip()
It's a little cleaner, and the for line in file style deals with line breaks for you in a system independent way - Python's open has universal newline support.
I'd try the following Pythonic code:
#!/usr/bin/env python
in_path = "../vas_output/Glyph/20140623-FLYOUT_mins_cleaned.csv"
out_path = "../vas_gender/Glyph/"

with open(in_path, 'rb') as f:
    for i, line in enumerate(f):
        print(str(i) + line)
There are several improvements that can be made here to make it more idiomatic python.
import csv

in_path = "../vas_output/Glyph/20140623-FLYOUT_mins_cleaned.csv"
out_path = "../vas_gender/Glyph/"

# Let's open the file and make sure that it closes when we unindent
with open(in_path, "rb") as input_file:
    # Create a csv reader object that will parse the input for us
    reader = csv.reader(input_file, delimiter="^")
    # Enumerate over the rows (these will be lists of strings) and keep track
    # of the line number using python's built-in enumerate function
    for line_num, row in enumerate(reader):
        # You can process whatever you would like here. But for now we will
        # just print out what you were originally printing
        print str(line_num) + "^".join(row)

Write strings to another file

The Problem - Update:
I could get the script to print out, but had a hard time figuring out a way to send the stdout to a file instead of the screen. The script below worked for printing results to the screen. I posted the solution right after this code; scroll to the [ solution ] at the bottom.
First post:
I'm using Python 2.7.3. I am trying to extract the last words of a text file after the colon (:) and write them into another txt file. So far I am able to print the results on the screen and it works perfectly, but when I try to write the results to a new file it gives me str has no attribute write/writeline. Here is the code snippet:
# the txt file I'm trying to extract last words from and write strings into a file
#Hello:there:buddy
#How:areyou:doing
#I:amFine:thanks
#thats:good:I:guess
x = raw_input("Enter the full path + file name + file extension you wish to use: ")

def ripple(x):
    with open(x) as file:
        for line in file:
            for word in line.split():
                if ':' in word:
                    try:
                        print word.split(':')[-1]
                    except (IndexError):
                        pass

ripple(x)
The code above works perfectly when printing to the screen. However I have spent hours reading Python's documentation and can't seem to find a way to have the results written to a file. I know how to open a file and write to it with writeline, readline, etc, but it doesn't seem to work with strings.
Any suggestions on how to achieve this?
PS: I didn't add the code that caused the write error, because I figured this would be easier to look at.
End of First Post
The Solution - Update:
Managed to get python to extract and save it into another file with the code below.
The Code:
inputFile = open('c:/folder/Thefile.txt', 'r')
outputFile = open('c:/folder/ExtractedFile.txt', 'w')
tempStore = outputFile
for line in inputFile:
    for word in line.split():
        if ':' in word:
            splitting = word.split(':')[-1]
            tempStore.writelines(splitting + '\n')
            print splitting
inputFile.close()
outputFile.close()
Update:
Check out droogans' code over mine; it was more efficient.
Try this:
with open('workfile', 'w') as f:
    f.write(word.split(':')[-1] + '\n')
If you really want to use the print method, you can:
from __future__ import print_function
print("hi there", file=f)
according to Correct way to write line to file in Python. You should add the __future__ import if you are using Python 2; if you are using Python 3, it's already there.
I think your question is good, and when you're done, you should head over to code review and get your code looked at for other things I've noticed:
# the txt file I'm trying to extract last words from and write strings into a file
#Hello:there:buddy
#How:areyou:doing
#I:amFine:thanks
#thats:good:I:guess
First off, thanks for putting example file contents at the top of your question.
x = raw_input("Enter the full path + file name + file extension you wish to use: ")
I don't think this part is necessary. You can just create a better parameter name for ripple than x. I think file_loc is a pretty standard one.
def ripple(x):
    with open(x) as file:
With open, you are able to mark the operation happening to the file. I also like to name my file object according to its job. In other words, with open(file_loc, 'r') as r: reminds me that r.foo is going to be my file that is being read from.
for line in file:
    for word in line.split():
        if ':' in word:
First off, your for word in line.split() statement does nothing but put the "Hello:there:buddy" string into a list: ["Hello:there:buddy"]. A better idea would be to pass split an argument, which does more or less what you're trying to do here. For example, "Hello:there:buddy".split(":") would output ['Hello', 'there', 'buddy'], making your search for colons an accomplished task.
try:
    print word.split(':')[-1]
except (IndexError):
    pass
Another advantage is that you won't need to check for an IndexError, since you'll have, at least, an empty string, which when split, comes back as an empty string. In other words, it'll write nothing for that line.
ripple(x)
For ripple(x), you would instead call ripple('/home/user/sometext.txt').
So, try looking over this, and explore code review. There's a guy named Winston who does really awesome work with Python and self-described newbies. I always pick up new tricks from that guy.
Here is my take on it, re-written out:
import os  # for deriving the output file name

def ripple(file_loc='/typical/location/while/developing.txt'):
    outfile = "output.".join(os.path.basename(file_loc).split('.'))
    with open(outfile, 'w') as w:
        lines = open(file_loc, 'r').readlines()  # everything is one giant list
        # rstrip the original newlines so the join doesn't double-space
        w.write('\n'.join([line.rstrip('\n').split(':')[-1] for line in lines]))

ripple()
Try breaking this down, line by line, and changing things around. It's pretty condensed, but once you pick up comprehensions and using lists, it'll be more natural to read code this way.
You are trying to call .write() on a string object.
You either got your arguments mixed up (you'll need to call fileobject.write(yourdata), not yourdata.write(fileobject)) or you accidentally re-used the same variable for both your open destination file object and storing a string.
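In other words, the file object goes on the left of the dot. A minimal sketch (the filename is arbitrary):

```python
splitting = "thanks"  # a stand-in for the extracted word

with open("extracted.txt", "w") as outputFile:
    outputFile.write(splitting + "\n")  # fileobject.write(yourdata), not the reverse

# Read it back to confirm what was written
with open("extracted.txt") as check:
    contents = check.read()
```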

How to Find a String in a Text File And Replace Each Time With User Input in a Python Script?

I am new to python so excuse my ignorance.
Currently, I have a text file with some words marked as <<word>>.
My goal is to essentially build a script which runs through a text file with such marked words. Each time the script finds such a word, it would ask the user for what it wants to replace it with.
For example, if I had a text file:
Today was a <<feeling>> day.
The script would run through the text file so the output would be:
Running script...
feeling? great
Script finished.
And generate a text file which would say:
Today was a great day.
Advice?
Edit: Thanks for the great advice! I have made a script that works for the most part like I wanted. Just one thing: now I am working on making it so that if I have multiple variables with the same name (for instance, "I am <<feeling>>. Bob is also <<feeling>>."), the script would only prompt feeling? once and fill in all the variables with the same name.
Thanks so much for your help again.
import re

with open('in.txt') as infile:
    text = infile.read()

search = re.compile('<<([^>]*)>>')
text = search.sub(lambda m: raw_input(m.group(1) + '? '), text)

with open('out.txt', 'w') as outfile:
    outfile.write(text)
Basically the same solution as that offerred by #phihag, but in script form
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import argparse
import re

pattern = '<<([^>]*)>>'

def user_replace(match):
    return raw_input('%s? ' % match.group(1))

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('infile', type=argparse.FileType('r'))
    parser.add_argument('outfile', type=argparse.FileType('w'))
    args = parser.parse_args()
    matcher = re.compile(pattern)
    for line in args.infile:
        new_line = matcher.sub(user_replace, line)
        args.outfile.write(new_line)
    args.infile.close()
    args.outfile.close()

if __name__ == '__main__':
    main()
Usage: python script.py input.txt output.txt
Note that this script does not account for non-ascii file encoding.
To open a file and loop through it, use open() and iterate over the file object.
Use raw_input to get input from the user.
Now, put this together and update your question if you run into problems :-)
I understand you want advice on how to structure your script, right? Here's what I would do:
Read the file at once and close it (I personally don't like to have open file objects, especially if my filesystem is remote).
Use a regular expression (phihag has suggested one in his answer, so I won't repeat it) to match the pattern of your placeholders. Find all of your placeholders and store them in a dictionary as keys.
For each word in the dictionary, ask the user with raw_input (not just input). And store them as values in the dictionary.
When done, parse your text substituting any instance of a given placeholder (key) with the user word (value). This is also done with regex.
The reason for using a dictionary is that a given placeholder could occur more than once and you probably don't want to make the user repeat the entry over and over again...
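A sketch of that dictionary approach. Here ask is a stand-in for the prompt (e.g. lambda name: raw_input(name + '? ')), passed in as a parameter so the caching logic can be seen, and tested, on its own:

```python
import re

def fill_placeholders(text, ask):
    """Replace every <<name>> via ask(name), asking only once per name."""
    answers = {}
    def repl(match):
        name = match.group(1)
        if name not in answers:   # only prompt the first time this name appears
            answers[name] = ask(name)
        return answers[name]
    return re.sub(r'<<([^>]*)>>', repl, text)

filled = fill_placeholders("I am <<feeling>>. Bob is also <<feeling>>.",
                           lambda name: "happy")
```

Repeated placeholders with the same name all receive the first answer, which is exactly the behavior the asker's edit describes.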
Try something like this
lines = []
with open(myfile, "r") as infile:
    lines = infile.readlines()

outlines = []
for line in lines:
    index = line.find("<<")
    if index > 0:
        word = line[index+2:line.find(">>")]
        input = raw_input(word + "? ")
        outlines.append(line.replace("<<" + word + ">>", input))
    else:
        outlines.append(line)

with open(outfile, "w") as output:
    for line in outlines:
        output.write(line)
Disclaimer: I haven't actually run this, so it might not work, but it looks about right and is similar to something I've done in the past.
How it works:
It parses the file in as a list where each element is one line of the file.
It builds the output list of lines. It iterates through the lines in the input, checking if the string << exist. If it does, it rips out the word inside the << and >> brackets, using it as the question for a raw_input query. It takes the input from that query and replaces the value inside the arrows (and the arrows) with the input. It then appends this value to the list. If it didn't see the arrows it simply appended the line.
After running through all the lines, it writes them to the output file. You can make this whatever file you want.
Some issues:
As written, this will work for only one arrow statement per line. So if you had <<firstname>> <<lastname>> on the same line it would ignore the lastname portion. Fixing this wouldn't be too hard to implement - you could place a while loop using the index > 0 statement and holding the lines inside that if statement. Just remember to update the index again if you do that!
It iterates through the list three times. You could likely reduce this to two, but if you have a small text file this shouldn't be a huge problem.
It could be sensitive to encoding - I'm not entirely sure about that however. Worst case there you need to cast as a string.
Edit: Moved the +2 to fix the broken if statement.

Categories

Resources