(Edit: the script seems to work for others here who are trying to help. Is it because I'm running Python 2.7? I'm really at a loss...)
I have a raw text file of a book I am trying to tag with pages.
Say the text file is:
some words on this line,
1
DOCUMENT TITLE some more words here too.
2
DOCUMENT TITLE and finally still more words.
I am trying to use Python to modify the example text to read:
some words on this line,
</pg>
<pg n=2>some more words here too,
</pg>
<pg n=3>and finally still more words.
My strategy is to load the text file as a string, build search-for and replace-with strings corresponding to a list of numbers, replace all instances in the string, and write the result to a new file.
Here is the code I've written:
from sys import argv
script, input, output = argv
textin = open(input, 'r')
bookstring = textin.read()
textin.close()
pages = []
x = 1
while x < 400:
    pages.append(x)
    x = x + 1
pagedel = "DOCUMENT TITLE"
for i in pages:
    pgdel = "%d\n%s" % (i, pagedel)
    nplus = i + 1
    htmlpg = "</pg>\n<pg n=%d>" % nplus
    bookstring = bookstring.replace(pgdel, htmlpg)
textout = open(output, 'w')
textout.write(bookstring)
textout.close()
print("Updates to %s printed to %s" % (input, output))
The script runs without error, but it also makes no changes whatsoever to the input text. It simply reprints it character for character.
Does my mistake have to do with the hard return (\n)? Any help is greatly appreciated.
In Python, strings are immutable, so replace returns a new string with the replacements made instead of modifying the original string in place.
You must assign the result back:
bookstring = bookstring.replace(pgdel, htmlpg)
You've also forgotten to call close(). See how you have textin.close? You have to call it with parentheses, like open:
textin.close()
Your code works for me, but I'd add a few more tips:
input is a built-in function, so consider renaming that variable. Shadowing it works here, but any later call to input() in the same script would fail.
When running the script, don't forget to include the .txt extension:
$ python myscript.py file1.txt file2.txt
Make sure when testing your script to clear the contents of file2.
I hope these help!
Here's an entirely different approach that uses the re module (remember to import re for this to work):
import re
doctitle = False
newstr = ''
page = 1
for line in bookstring.splitlines():
    res = re.match(r'^\d+', line)
    if doctitle:
        newstr += '<pg n=' + str(page) + '>' + re.sub('^DOCUMENT TITLE ', '', line)
        doctitle = False
    elif res:
        doctitle = True
        page += 1
        newstr += '\n</pg>\n'
    else:
        newstr += line
print(newstr)
Since no one knows what's going on, it's worth a try.
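A minimal sketch of yet another option, under assumptions the question does not confirm: a page break is a number alone on its own line, followed by a line that starts with DOCUMENT TITLE, and the file may have Windows-style \r\n line endings (one plausible, unverified reason why a literal "%d\n%s" search could match nothing). Anchoring the number to the start of a line also keeps a search for page 1 from matching the tail of page 11, which the str.replace loop would do. tag_pages and PAGE_BREAK are hypothetical names; the code is written to run on both Python 2.7 and 3.

```python
import re

# Assumed page-break shape: a page number alone on its line, then a line that
# starts with the running header "DOCUMENT TITLE ". "\r?\n" accepts both Unix
# and Windows line endings; "^" with re.MULTILINE pins the number to the start
# of a line, so "11" is never treated as "1".
PAGE_BREAK = re.compile(r"^(\d+)[ \t]*\r?\nDOCUMENT TITLE ", re.MULTILINE)


def tag_pages(text):
    """Replace every page break with a closing and an opening pg tag."""
    def repl(match):
        next_page = int(match.group(1)) + 1
        return "</pg>\n<pg n=%d>" % next_page
    return PAGE_BREAK.sub(repl, text)


sample = ("some words on this line,\n"
          "1\n"
          "DOCUMENT TITLE some more words here too.\n"
          "2\n"
          "DOCUMENT TITLE and finally still more words.")
print(tag_pages(sample))
```

To check whether line endings are the issue in your own file, print(repr(bookstring[:200])) will show any \r characters explicitly.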
Related
I'm new to Python and relatively new to programming. I'm trying to replace part of a file path with a different file path. If possible, I'd like to avoid regex as I don't know it. If not, I understand.
I want the part of each path in the Python list that comes before the word PROGRAM to be replaced with the replaceWith variable.
How would you go about doing this?
Current Python list:
item1ToReplace1 = \\server\drive\BusinessFolder\PROGRAM\New\new.vb
item1ToReplace2 = \\server\drive\BusinessFolder\PROGRAM\old\old.vb
Variable to replace part of each path in the list:
replaceWith = 'C:\ProgramFiles\Microsoft\PROGRAM'
Desired results for the Python list:
item1ToReplace1 = C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
item1ToReplace2 = C:\ProgramFiles\Microsoft\PROGRAM\old\old.vb
Thank you for your help.
The following code does what you ask. Note that I changed your '\' to '\\': you need to account for the backslash in your code, since it is used as an escape character in Python.
import os
item1ToReplace1 = '\\server\\drive\\BusinessFolder\\PROGRAM\\New\\new.vb'
item1ToReplace2 = '\\server\\drive\\BusinessFolder\\PROGRAM\\old\\old.vb'
replaceWith = 'C:\\ProgramFiles\\Microsoft\\PROGRAM'
keyword = "PROGRAM\\"
def replacer(rp, s, kw):
    # Split once on the keyword and keep everything after it.
    ss = s.split(kw, 1)
    if len(ss) > 1:
        tail = ss[1]
        # On Windows, os.path.join uses "\" as the separator.
        return os.path.join(rp, tail)
    else:
        return ""
print(replacer(replaceWith, item1ToReplace1, keyword))
print(replacer(replaceWith, item1ToReplace2, keyword))
The code splits each path on your keyword and appends the part after it to the path you want.
If your keyword is not in the string, the result is an empty string.
Result:
C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
C:\ProgramFiles\Microsoft\PROGRAM\old\old.vb
One way would be:
item_ls = item1ToReplace1.split("\\")
idx = item_ls.index("PROGRAM")
result = ["C:", "ProgramFiles", "Microsoft"] + item_ls[idx:]
result = "\\".join(result)
Resulting in:
>>> item1ToReplace1 = r"\\server\drive\BusinessFolder\PROGRAM\New\new.vb"
... # the above
>>> result
'C:\\ProgramFiles\\Microsoft\\PROGRAM\\New\\new.vb'
Note the use of r"..." to avoid having to escape the escape characters (i.e. the \) in your input. Note also that split and join still need the backslash written as "\\", because their arguments are ordinary string literals (and the REPL shows the result's repr, which doubles each backslash).
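If you'd rather not split on backslashes by hand, the standard library's pathlib.PureWindowsPath can break a Windows path (including a UNC path like \\server\drive\...) into components on any operating system. A minimal sketch, assuming the keyword always appears as a whole folder name; replace_prefix is a hypothetical helper name:

```python
from pathlib import PureWindowsPath


def replace_prefix(path, new_prefix, keyword="PROGRAM"):
    """Replace everything up to and including `keyword` with `new_prefix`.

    Paths that do not contain `keyword` as a folder name are returned unchanged.
    """
    parts = PureWindowsPath(path).parts
    if keyword not in parts:
        return path
    idx = parts.index(keyword)
    # new_prefix already ends in PROGRAM, so keep only what follows the keyword.
    return str(PureWindowsPath(new_prefix, *parts[idx + 1:]))


replaceWith = r"C:\ProgramFiles\Microsoft\PROGRAM"
paths = [r"\\server\drive\BusinessFolder\PROGRAM\New\new.vb",
         r"\\server\drive\BusinessFolder\PROGRAM\old\old.vb"]
for p in paths:
    print(replace_prefix(p, replaceWith))
```

Because PureWindowsPath never touches the file system, this behaves the same on Windows, Linux, and macOS, unlike os.path.join, whose separator depends on the OS the script runs on.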
I am totally new to the Python world and am looking for some suggestions about my problem. I have three text files: an original text file, a text file with updates for the original, and a new text file to write the result to, without modifying the original. file1.txt looks like:
$ego_vel=x
$ped_vel=2
$mu=3
$ego_start_s=4
$ped_start_x=5
file2.txt looks like:
$ego_vel=5
$ped_vel=5
$mu=6
$to_decel=5
outputfile.txt should look like:
$ego_vel=5
$ped_vel=5
$mu=6
$ego_start_s=4
$ped_start_x=5
$to_decel=5
Here is the code I have tried so far:
import sys
import os
def update_testrun(filename1: str, filename2: str, filename3: str):
    testrun_path = os.path.join(sys.argv[1], filename1)
    list_of_testrun = []
    with open(testrun_path, "r") as reader1:
        for line in reader1.readlines():
            list_of_testrun.append(line)
    # print(list_of_testrun)
    design_path = os.path.join(sys.argv[3], filename2)
    list_of_design = []
    with open(design_path, "r") as reader2:
        for line in reader2.readlines():
            list_of_design.append(line)
    print(list_of_design)
    for i, x in enumerate(list_of_testrun):
        for test in list_of_design:
            if x[:9] == test[:9]:
                list_of_testrun[i] = test
                # list_of_updated_testrun=list_of_testrun
                break
    updated_testrun_path = os.path.join(sys.argv[5], filename3)
def main():
    update_testrun(sys.argv[2], sys.argv[4], sys.argv[6])
if __name__ == "__main__":
    main()
With this code I get the following output:
$ego_vel=5
$ped_vel=5
$mu=3
$ego_start_s=4
$ped_start_x=5
$to_decel=5
All the values are correct except the $mu value.
Can anyone tell me where I am going wrong, and is it possible to share a Python script for my task?
Looks like your problem comes from the if statement:
if x[:9] == test[:9]:
Here you're comparing the first 9 characters of each string. For the other keys this is fine, as the slice never goes past the '=' character, but for $mu it includes the value, so you're evaluating:
if '$mu=3' == '$mu=6':
This obviously evaluates to False, so the mu value is not updated.
You could shorten the slice to if x[:4] == test[:4]: as a quick fix, but that would make $ego_vel and $ego_start_s (and likewise $ped_vel and $ped_start_x) match each other. A better option is the .split() string method, which splits a string around a specific character, in your case '='. For example:
if x.split('=')[0] == test.split('=')[0]:
Would evaluate as:
if '$mu' == '$mu':
This is True, and it works for the other keys too, regardless of how long the text before the '=' sign is.
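Going a step further than the fix above, the whole task can be done by treating each line as a key/value pair, which also takes care of appending keys such as $to_decel that exist only in file2.txt. A minimal sketch, assuming every non-empty line has the form $key=value and Python 3.7+ (where dicts preserve insertion order); merge_settings and merge_files are hypothetical names:

```python
def merge_settings(original_lines, update_lines):
    """Merge "$key=value" lines.

    Values from update_lines override those in original_lines; keys that only
    appear in update_lines are appended at the end, in their original order.
    """
    merged = {}  # insertion-ordered in Python 3.7+
    for line in list(original_lines) + list(update_lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("=")
        merged[key] = value  # an existing key keeps its original position
    return ["%s=%s" % (key, value) for key, value in merged.items()]


def merge_files(original_path, update_path, output_path):
    """Write the merged lines to output_path; the input files are not modified."""
    with open(original_path) as f1, open(update_path) as f2:
        merged = merge_settings(f1, f2)
    with open(output_path, "w") as out:
        out.write("\n".join(merged) + "\n")
```

Usage would be merge_files("file1.txt", "file2.txt", "outputfile.txt"). Comparing whole keys (the text before '=') avoids the prefix-length problem entirely.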
I'm making a simple RPG, and I'm trying to add a feature where the user can type 'save' and their stats will be written to a text file named 'save.txt'.
Here is the code for the saving:
elif first_step == 'save':
    f = open("save.txt", "w")
    f.write(f'''{player1.name}
{player1.char_type}
{player1.life}
{player1.energy}
{player1.strength}
{player1.money}
{player1.weapon_lvl}
{player1.wakefulness}
{player1.days_left}
{player1.battle_count}''')
    f.close()
But, I also need the user to be able to load their saved stats. So they would enter 'load' and their stats will be updated.
I'm trying to read the txt file one line at a time and then the value of that line would become one of the variables for the player stats.
If I do this without first converting the result to a string, I get issues, such as some lines being skipped, as if Python were reading two lines as one.
So, I tried the following:
elif first_step == 'load':
    f = open("save.txt", 'r')
    player1.name_saved = f.readline()
    player1.name_saved2 = str(player1.name_saved)
    player1.name = player1.name_saved2
    # player_name is fine but when I print char_type I get 'wizard\n'
    player1.char_type_saved = f.readlines(1)
    player1.char_type_saved2 = str(player1.char_type_saved)
I tried the following to remove the brackets and the \n:
final_player1.char_type = player1.char_type_saved2.translate({
    ord(c): None for c in "[']\n"})
This deletes the brackets, but the \n is still there.
I've also tried:
final_player1.char_type = final_player1.char_type.replace("\n", "")
If anyone could help me with this I would greatly appreciate it.
Without seeing your code it is hard to tell what you are doing wrong. But my guess is that you are using the original string without realizing it.
Take a look at this example code: buf still has the newline, but replacedBuf doesn't.
buf = "a string\n"
replacedBuf = buf.replace("\n", "")
print(buf)
print(replacedBuf)
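One detail worth spelling out: f.readlines(1) returns a list, and str() of a list gives its repr, in which the newline appears as the two characters backslash and n rather than as a real newline. That is why neither translate() nor replace() with "\n" removes it. A minimal sketch of a save/load pair that avoids the problem by reading one line at a time with readline() and stripping only the trailing newline with rstrip("\n"); the Player class and the FIELDS subset are hypothetical stand-ins for your own player object:

```python
import os
import tempfile

# Hypothetical subset of the player's attributes, saved one per line.
FIELDS = ["name", "char_type", "life", "money"]


class Player:
    def __init__(self, name="", char_type="", life=0, money=0):
        self.name = name
        self.char_type = char_type
        self.life = life
        self.money = money


def save_player(player, path):
    with open(path, "w") as f:
        for field in FIELDS:
            f.write("%s\n" % getattr(player, field))


def load_player(player, path):
    with open(path) as f:
        for field in FIELDS:
            raw = f.readline().rstrip("\n")  # drop the trailing newline only
            # Convert back to the attribute's current type (str stays str, int -> int).
            setattr(player, field, type(getattr(player, field))(raw))


# Why the original attempt kept a backslash-n: str() of a list is its repr.
as_text = str(["wizard\n"])  # contains a literal backslash followed by n

path = os.path.join(tempfile.mkdtemp(), "save.txt")
save_player(Player("Ann", "wizard", 80, 15), path)
loaded = Player()
load_player(loaded, path)
print(loaded.name, loaded.char_type, loaded.life, loaded.money)
```

Converting each value back with the type of the attribute's current value is just one option; it keeps numeric stats as numbers after loading, which plain readline() would not.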
I want to remove something from the start and end of a string before writing it to a .txt file
I'm reading an incoming string from a serial port. I want to write the string to a .txt file, which I can do. I've tried using the rstrip() function (and also strip()) to remove the 'OK' at the end, but with no luck.
Ideally, I want the program to be dynamic so I can use it for other files. This is a problem, because the unwanted text at the start and end of the string might vary, so I can't look for specific characters or words to remove.
That said, all unwanted text at the start of the string will start with a '+', as in the example below (it might be possible to check whether the first line starts with a '+' and remove it if it does; this would be ideal).
def write2file():
    print "Listing local files ready for copying:"
    listFiles()
    print 'Enter name of file to copy:'
    name = raw_input()
    pastedFile = []
    tempStr = readAll('AT+URDFILE="' + name + '"') # Calls another function that reads the file locally and returns it as a string
    tempStr = tempStr.rstrip('OK') # This is where I try to remove the 'OK' I know is going to be at the end of the string
    pastedFile[:0] = tempStr # Putting the string into a list. Workaround for writing 128 bytes at a time. I know it's not pretty, but it works :-)
    print 'Enter path to file directory'
    path = raw_input()
    myFile = open(join(path, name), "w")
    while len(pastedFile):
        tempcheck = pastedFile[0:128]
        for val in tempcheck:
            myFile.write(val)
        del pastedFile[0:128]
    myFile.close()
I expect the .txt file to contain all the text from the local file, minus the OK at the end. When the program is run, it returns:
+URDFILE: "develop111.txt",606,"** ((content of local file)) OK
The 'OK' I wanted removed is still there.
The text "+URDFILE: "develop111.txt",606," is also unwanted in the final .txt file.
To summarize the problem:
How can I remove the unwanted text at the start and end of a string before writing it to a .txt file?
I assume that your URDFILE response always has the same pattern, +URDFILE: "filename",filesize,"filedata"\nOK, since it is an AT command. If so, it should be enough to use ','.join(tempStr.split(',')[2:])[:-3]
Working example:
>>> s = '+URDFILE: "filname",filesize,"filedata, more data"\nOK'
>>> ','.join(s.split(',')[2:])[:-3]
'"filedata, more data"'
or, to remove the quotes as well:
>>> ','.join(s.split(',')[2:])[1:-4]
'filedata, more data'
Can you try the following:
tempStr = '+URDFILE: "develop111.txt",606,"** ((content of local file)) OK'
tempStr = tempStr.strip()
if tempStr.startswith('+'):
tempStr = tempStr[1:]
if tempStr.endswith('OK'):
tempStr = tempStr[:-2]
print(tempStr)
Output:
URDFILE: "develop111.txt",606,"** ((content of local file))
If you want to select the required text then you can use regex for that. Can you try the following:
import re
tempStr = 'URDFILE: "develop111.txt",606,"** 01((content of local file)) OK'
tempStr = tempStr.strip()
if tempStr.startswith('+'):
tempStr = tempStr[1:]
if tempStr.endswith('OK'):
tempStr = tempStr[:-2]
# print(tempStr)
new_str = ''.join(re.findall(r'01(.+)', tempStr))
new_str = new_str.strip()
print(new_str)
Output:
((content of local file))
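If the response always follows the pattern assumed in the first answer (+URDFILE: "filename",size,"data" followed by OK), a single regular expression can capture just the data, even when it spans several lines or contains commas or quotes. A minimal sketch under that assumption; extract_payload and URDFILE_RE are hypothetical names, and the exact response format should be checked against your modem's AT command manual:

```python
import re

# Assumed response shape: +URDFILE: "<filename>",<size>,"<data>" then OK.
# re.DOTALL lets (.*) span newlines; the greedy (.*) backtracks only as far as
# the last quote before the trailing OK, so quotes inside the data are kept.
URDFILE_RE = re.compile(r'^\+URDFILE:\s*"([^"]*)",(\d+),"(.*)"\s*OK$', re.DOTALL)


def extract_payload(response):
    """Return the file data from a +URDFILE response, or None if it doesn't match."""
    match = URDFILE_RE.match(response.strip())
    if match is None:
        return None
    return match.group(3)


reply = '+URDFILE: "develop111.txt",11,"hello\nworld"\r\nOK\r\n'
print(extract_payload(reply))
```

Returning None for a non-matching response, rather than guessing which characters to cut, makes it easy to notice when the modem sends something unexpected. The size field (group 2) could additionally be compared with len() of the data as a sanity check.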
I have this code, which I want to open a specified file, count every while loop in it, and finally output the total number of while loops in that file. I decided to convert the input file to a dictionary and then loop over it, adding 1 to WHILE_ every time the word while followed by a space was seen, before finally printing WHILE_ at the end.
However, this did not seem to work, and I am at a loss as to why. Any help fixing this would be much appreciated.
This is the code I have at the moment:
WHILE_ = 0
INPUT_ = input("Enter file or directory: ")
OPEN_ = open(INPUT_)
READLINES_ = OPEN_.readlines()
STRING_ = (str(READLINES_))
STRIP_ = STRING_.strip()
input_str1 = STRIP_.lower()
dic = dict()
for w in input_str1.split():
    if w in dic.keys():
        dic[w] = dic[w]+1
    else:
        dic[w] = 1
DICT_ = (dic)
for LINE_ in DICT_:
    if ("while\\n',") in LINE_:
        WHILE_ += 1
    elif ('while\\n",') in LINE_:
        WHILE_ += 1
    elif ('while ') in LINE_:
        WHILE_ += 1
print ("while_loops {0:>12}".format((WHILE_)))
This is the input file I was working from:
'''A trivial test of metrics
Author: Angus McGurkinshaw
Date: May 7 2013
'''
def silly_function(blah):
    '''A silly docstring for a silly function'''
    def nested():
        pass
    print('Hello world', blah + 36 * 14)
    tot = 0 # This isn't a for statement
    for i in range(10):
        tot = tot + i
        if_im_done = false # Nor is this an if
    print(tot)
blah = 3
while blah > 0:
    silly_function(blah)
    blah -= 1
    while True:
        if blah < 1000:
            break
The output should be 2, but my code currently prints 0.
This is an incredibly bizarre design. You're calling readlines to get a list of strings, then calling str on that list, which will join the whole thing up into one big string with the quoted repr of each line joined by commas and surrounded by square brackets, then splitting the result on spaces. I have no idea why you'd ever do such a thing.
Your bizarre variable names, extra useless lines of code like DICT_ = (dic), etc. only serve to obfuscate things further.
But I can explain why it doesn't work. Try printing out DICT_ after you do all that silliness, and you'll see that the only keys that include while are while and 'while. Since neither of these match any of the patterns you're looking for, your count ends up as 0.
It's also worth noting that you only add 1 to WHILE_ even if there are multiple instances of the pattern, so your whole dict of counts is useless.
This will be a lot easier if you don't obfuscate your strings, try to recover them, and then try to match the incorrectly-recovered versions. Just do it directly.
While I'm at it, I'm also going to fix some other problems so that your code is readable, and simpler, and doesn't leak files, and so on. Here's a complete implementation of the logic you were trying to hack up by hand:
import collections
filename = input("Enter file: ")
counts = collections.Counter()
with open(filename) as f:
    for line in f:
        counts.update(line.strip().lower().split())
print('while_loops {0:>12}'.format(counts['while']))
When you run this on your sample input, you correctly get 2. And extending it to handle if and for is trivial and obvious.
However, note that there's a serious problem in your logic: anything that looks like a keyword but is in the middle of a comment or string will still get picked up. Without writing some kind of code to strip out comments and strings, there's no way around that, which means you're going to overcount if and for by 1. The obvious way of stripping (line.partition('#')[0], and similarly for quotes) won't work. First, it's perfectly valid to have a string before an if keyword, as in "foo" if x else "bar". Second, you can't handle multiline strings this way.
These problems, and others like them, are why you almost certainly want a real parser. If you're just trying to parse Python code, the ast module in the standard library is the obvious way to do this. If you want to write quick-and-dirty parsers for a variety of different languages, try pyparsing, which is very nice and comes with some great examples.
Here's a simple example:
import ast
filename = input("Enter file: ")
with open(filename) as f:
    tree = ast.parse(f.read())
while_loops = sum(1 for node in ast.walk(tree) if isinstance(node, ast.While))
print('while_loops {0:>12}'.format(while_loops))
Or, more flexibly:
import ast
import collections
filename = input("Enter file: ")
with open(filename) as f:
    tree = ast.parse(f.read())
counts = collections.Counter(type(node).__name__ for node in ast.walk(tree))
print('while_loops {0:>12}'.format(counts['While']))
print('for_loops {0:>14}'.format(counts['For']))
print('if_statements {0:>10}'.format(counts['If']))
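If you want a keyword count that ignores comments and strings without building a full syntax tree, the standard library's tokenize module is a middle ground (an alternative, not something the answer above uses): comments and string literals come out as their own token types, so a keyword inside them is never counted. A minimal sketch, assuming Python 3; note that it counts keywords, not statements, so the if in a conditional expression or the for in a comprehension is counted too, which ast avoids. count_keywords is a hypothetical name:

```python
import collections
import io
import tokenize

KEYWORDS = {"while", "for", "if"}


def count_keywords(source):
    """Count keyword tokens in Python source, skipping comments and strings."""
    counts = collections.Counter()
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        # Keywords are NAME tokens; comment text is a COMMENT token and string
        # literals are STRING tokens, so words inside them are never counted.
        if tok.type == tokenize.NAME and tok.string in KEYWORDS:
            counts[tok.string] += 1
    return counts


sample = '''"""while in a docstring"""
x = 3  # for in a comment
while x > 0:
    x -= 1
'''
counts = count_keywords(sample)
print('while_loops {0:>12}'.format(counts['while']))
```

Because identifiers like if_im_done are single NAME tokens, they do not match if, so this also avoids the substring false positives of plain string matching.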