minidom doesn't read \\n newline character at the end of line - python

I am using minidom parser to read the xml. The problem I am facing is that it is not reading end of line character when it is done reading the line. For example my xml file is something like :
<?xml version="1.0" ?><ItemGroup>
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">setlocal
C:\Tools\CMake2.8\bin\cmake.exe C:/tb/Source/../</Command>
</ItemGroup>
and my python code looks something like :
dom = xml.dom.minidom.parse(fileFullPath)
nodes = dom.getElementsByTagName('Command')
for j in range(len(nodes)):#{
    path = nodes[j].childNodes[0].nodeValue
    if nodeName == 'Command':#{
        pathList = path.split(' ')
        for i in range(len(pathList)):#{
            sPath = pathList[i]
            if sPath.find('\\n')!=-1:
                print 'sPath has \\n'
        #}
    #}
#}
(Please ignore/point out any indentation errors)
now even though setlocal and C:\Tools\CMake2.8\bin\cmake.exe have a newline character in between them in the xml file, my code is not able to read it and I don't know why. Can somebody help ?
update :
I am trying to split the <Command> into ['setlocal', 'C:\Tools\CMake2.8\bin\cmake.exe', 'C:/tb/Source/../']

Another possibility, considering line separators independently of
the particular OS, might be the following, using the in operator
and os.linesep. I also tried this code using '\n' (without escaping
the backslash) instead of os.linesep. Both versions worked.
(My shell didn't take running xml.dom.minidom.parse(...), therefore
there are some changes in the imports you may ignore.)
from xml.dom.minidom import parse
import os
dom = parse(fileFullPath)
nodes = dom.getElementsByTagName('Command')
for node in nodes:
    path = node.childNodes[0].nodeValue
    if node.nodeName == 'Command':
        for path in path.split(' '):
            if os.linesep in path:
                print r'Path contains \n or whatever your OS uses.'
I also left ' ' inside the split, as it seems that having setlocal in your list of paths
is not your aim.
EDIT:
After I noticed your comment stating that you actually want to have setlocal in your
list, I would also say that checking for \n is redundant, because splitting
by all whitespaces of course also considers line separators as whitespaces.
'a\nb'.split()
gives
['a', 'b']
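To see where the original check goes wrong, here is a minimal sketch that parses a trimmed copy of the question's XML from a string. Dropping the Condition attribute and using parseString instead of parse are assumptions made only to keep the example self-contained. It suggests that minidom does keep the line break: the check fails because '\\n' is a backslash followed by the letter n, not a newline. XML parsers also normalize line endings to '\n', so testing for '\n' is probably more reliable than os.linesep on Windows.

```python
from xml.dom.minidom import parseString

# Trimmed copy of the question's XML (attribute omitted for brevity).
xml_text = (
    '<?xml version="1.0" ?><ItemGroup>\n'
    '<Command>setlocal\n'
    'C:\\Tools\\CMake2.8\\bin\\cmake.exe C:/tb/Source/../</Command>\n'
    '</ItemGroup>'
)

dom = parseString(xml_text)
text = dom.getElementsByTagName('Command')[0].childNodes[0].nodeValue

has_real_newline = '\n' in text    # one character: a line feed
has_backslash_n = '\\n' in text    # two characters: a backslash, then 'n'

# split() with no argument splits on any run of whitespace,
# so the newline and the space are both treated as separators.
parts = text.split()

print(has_real_newline, has_backslash_n)
print(parts)
```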

Instead of splitting the text value on a space (' '), you want to split it on all white space and since these look like command lines, they should be split using a proper parser. You want to change:
pathList = path.split(' ')
for i in range(len(pathList)):#{
    sPath = pathList[i]
    if sPath.find('\\n')!=-1:
        print 'sPath has \\n'
To:
import shlex
pathList = shlex.split(path, posix=False)
This will give you:
['setlocal', 'C:\\Tools\\CMake2.8\\bin\\cmake.exe', 'C:/tb/Source/../']
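NOTE: If any of your paths contain spaces and are not properly quoted, they will be split incorrectly. E.g., 'C:\\Program Files' would be split to ['C:\\Program', 'Files'], whereas '"C:\\Program Files"' stays a single item. With posix=False the surrounding quotes are kept in the result, giving ['"C:\\Program Files"'], so strip them if you need the bare path.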
Also, your code could use a little cleaning because Python is not C,
Javascript, etc.
import xml.dom.minidom
import shlex
dom = xml.dom.minidom.parse(fileFullPath)
nodes = dom.getElementsByTagName('Command')
for node in nodes:
    path = node.childNodes[0].nodeValue
    pathList = shlex.split(path, posix=False)
    print pathList
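The shlex behaviour described above can be checked without an XML file. The following is a small sketch in Python 3 syntax. The command strings are typed in by hand, and the app.exe path and --flag argument are hypothetical. Note that in non-POSIX mode shlex keeps the surrounding quotes on a quoted item, so the sketch strips them afterwards.

```python
import shlex

# Text as it would come out of the <Command> node in the question.
command_text = 'setlocal\nC:\\Tools\\CMake2.8\\bin\\cmake.exe C:/tb/Source/../'
parts = shlex.split(command_text, posix=False)

# Hypothetical command lines with a space inside the path.
quoted = shlex.split('"C:\\Program Files\\app.exe" --flag', posix=False)
unquoted = shlex.split('C:\\Program Files\\app.exe --flag', posix=False)

# posix=False leaves the double quotes in place; remove them if unwanted.
stripped = [item.strip('"') for item in quoted]

print(parts)
print(quoted)
print(unquoted)
print(stripped)
```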

Related

How to enter a long and not-a-string data into the argument?

I have a problem with Python and need your help.
Take a look at this code:
import os
os.chdir('''C:\\Users\\Admin\\Desktop\\Automate_the_Boring_Stuff_onlimematerials_v.2\\automate_online-materials\\example.xlsx''')
The os.chdir() did not work because the directory I put in between the ''' and ''' is considered as raw string. Note that a line is no more than 125 characters, so I have to make a new line.
So how can I fix this?
You can split your statement into multiple lines by using the backslash \ to indicate that a statement is continued on the new line.
message = 'This message will not generate an error because \
it was split by using the backslash on your \
keyboard'
print(message)
Output
This message will not generate an error because it was split by using the backslash on your keyboard
Lines can be longer than 125 characters, but you should probably avoid that. You have a few solutions:
x = ('hi'
     'there')
# x is now the string "hithere"
os.chdir('hi'
         'there') # does a similar thing (os.chdir('hithere'))
You could also set up a variable:
root_path = "C:\\Users\\Admin\\Desktop"
filepath = "other\\directories" # why not just rename it though
os.chdir(os.path.join(root_path, filepath))
Do these work for you?
I'm also curious why you have to chdir there; if it's possible, you should just run the python script from that directory.
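Applied to the path from the question, the implicit-concatenation approach might look like the sketch below. It uses raw strings so the backslashes need no doubling, and ntpath (the Windows flavour of os.path) so the result is the same on any OS. One observation, offered as a guess rather than a diagnosis: the path ends in example.xlsx, which looks like a file, and os.chdir() expects a directory, so the sketch also extracts the containing folder.

```python
import ntpath  # Windows path rules on any OS; os.path is ntpath on Windows

# Adjacent string literals inside parentheses are joined at compile time,
# so the long path can span several short source lines.
target = (
    r'C:\Users\Admin\Desktop'
    r'\Automate_the_Boring_Stuff_onlimematerials_v.2'
    r'\automate_online-materials'
    r'\example.xlsx'
)

folder = ntpath.dirname(target)      # the directory part, suitable for os.chdir
file_name = ntpath.basename(target)  # the file part

print(folder)
print(file_name)
```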

Pyperclip not pasting new lines?

I'm trying to make a simple script which takes a list of names off the clipboard formatted as "Last, First", then pastes them back as "First Last". I'm using Python 3 and Pyperclip.
Here is the code:
import pyperclip
text = pyperclip.paste()
lines = text.split('\n')
for i in range(len(lines)):
    last, first = lines[i].split(', ')
    lines[i] = first + " " + last
text = '\n'.join(lines)
pyperclip.copy(text)
When I copy this to the clipboard:
Carter, Bob
Goodall, Jane
Then run the script, it produces: Bob CarterJane Goodall with the names just glued together and no new line. I'm not sure what's screwy.
Thanks for your help.
Apparently I need to use '\r\n' instead of just '\n'. I don't know exactly why this is but I found that answer on the internet and it worked.
To include newlines in your file, you need to explicitly pass them to the file methods.
On Unix platforms, strings passed into .write should end with \n. Likewise, each of
the strings in the sequence that is passed into .writelines should end in \n. On
Windows, the newline string is \r\n.
To program in a cross platform manner, the linesep string found in the os module
defines the correct newline string for the platform:
>>> import os
>>> os.linesep # Unix platform
'\n'
Source: Illustrated Guide to Python 3
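A likely explanation is that clipboard text on Windows uses '\r\n' line endings, so splitting on '\n' alone leaves a stray '\r' attached to each first name. The sketch below is one way to sidestep that. The helper name flip_names is hypothetical, and the pyperclip calls are left out so the logic can run on its own: str.splitlines() handles both kinds of line ending, and the join uses '\r\n' as the accepted fix suggests.

```python
def flip_names(text, newline='\r\n'):
    """Turn 'Last, First' lines into 'First Last' lines."""
    flipped = []
    for line in text.splitlines():   # handles '\n' and '\r\n' alike
        if not line.strip():
            continue                 # skip blank lines
        last, first = line.split(', ', 1)
        flipped.append(first.strip() + ' ' + last.strip())
    return newline.join(flipped)


result = flip_names('Carter, Bob\r\nGoodall, Jane')
print(repr(result))
```

In the original script this would sit between pyperclip.paste() and pyperclip.copy(). A line without ', ' raises ValueError, which may or may not be what you want.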

Python: A character that appears to be a space but isn't a space. What is it?

This is in Python 3.4.
I'm writing my first program for myself and I'm stuck at a certain part. I'm trying to rename all of my audio files according to its metadata in python. However, when I try to rename the files, it doesn't finish renaming it sometimes. I think it is due to an invalid character. The problem is that I don't know what character it is. To me, it looks like a space but it isn't.
My code is here:
from tinytag import TinyTag
import os
print("Type in the file directory of the songs you would like to rename and organize:")
directory = input()
os.chdir(directory)
file_list = os.listdir(directory)
for item in file_list:
    tag = TinyTag.get(item)
    title = str(tag.title)
    artist = str(tag.artist)
    if artist[-1] == "\t" or artist[-1] == " ":
        print("Found it!")
    new_title = artist + "-" + title + ".mp3"
    #os.rename(item, new_title)
    print(new_title)
This is the list that it outputs:
http://imgur.com/tfgBdMZ
It is supposed to output "Artist-Title.mp3" but sometimes it outputs "Artist -Title .mp3". When the space is there, python stops renaming it at that point. The first entry that does so is Arethra Franklin. It simply names the file Arethra Franklin rather than Arethra Franklin-Dr.Feelgood.mp3
Why does it do this? My main question is what is that space? I tried setting up a == boolean to see if it is a space (" ") but it isn't.
It ends the renaming by stopping once it hits that "space". It doesn't when it hits a normal space however.
It's possible that the filename has some unicode in it. If all you are looking for is a file renamer, you could use str.encode and handle the errors by replacing them with a ? character. As an example:
# -*- coding: utf-8 -*-
funky_name = u"best_song_ever_пустынных.mp3"
print (funky_name)
print (funky_name.encode('ascii','replace'))
This gives:
best_song_ever_пустынных.mp3
best_song_ever_?????????.mp3
As mentioned in the comments, you can use ord to find out what the "offending space" really is.
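A minimal sketch of that ord approach follows. The helper name describe_chars and the sample string are hypothetical. The no-break space is only one candidate for a character that looks like a space, and a padding character such as '\x00' is another, so the real tag may contain something else.

```python
import unicodedata


def describe_chars(text):
    """Return (repr, code point, Unicode name) for every character."""
    report = []
    for ch in text:
        # Control characters have no Unicode name, hence the default.
        name = unicodedata.name(ch, 'UNNAMED')
        report.append((repr(ch), ord(ch), name))
    return report


# Hypothetical artist value ending in a no-break space (U+00A0).
sample_artist = 'Some Artist\u00a0'
for entry in describe_chars(sample_artist[-2:]):
    print(entry)
```

Once the code point is known, the character can be removed with str.replace or str.strip before building the new file name.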

Python normalized pathname with a special case

consider the following example
#junk path ending with a test file
test = "C:\\test1/test2\test3.txt"
With import os and os.path.abspath I can normalize the pathname:
test_norm = os.path.abspath(test)
print test_norm
'C:\\test1\\test2\test3.txt'
If I split the pathname with os.path.split, I have the following problem:
os.path.split(test_norm)
('C:\\test1', 'test2\test3.txt')
instead of
C:\\test1\\test2 and test3.txt
This problem arises because a user typed a directory like the one in the example with input_raw. Can I avoid this with raw_input?
Easy: '\t' is a tab character. You need to use 'C:\\test1\\test2\\test3.txt' or r'C:\test1\test2\test3.txt'.
You didn't escape that final backslash, so Python thinks you want a tab character (\t), not a separator (\\). test = "C:\\test1/test2\test3.txt" should be test = "C:\\test1/test2\\test3.txt".
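The difference can be sketched as follows, using ntpath (the Windows flavour of os.path) so the result does not depend on the OS running it. On the last part of the question: raw_input returns the typed characters as they are, so a backslash entered by a user is not turned into a tab. Escape processing only applies to string literals in source code.

```python
import ntpath  # Windows path rules on any OS; os.path is ntpath on Windows

typo = "C:\\test1/test2\test3.txt"   # "\t" here is a single tab character
fixed = r"C:\test1/test2\test3.txt"  # raw string: every backslash is literal

has_tab = '\t' in typo
fixed_has_tab = '\t' in fixed

# normpath converts the forward slash, then split separates the file name.
head, tail = ntpath.split(ntpath.normpath(fixed))

print(has_tab, fixed_has_tab)
print(head)
print(tail)
```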

Python 3: How can I get os.getcwd() to play nice with re.sub()?

I am trying to replace some content in a file with the current working directory using python 3.3. I have:
def ReplaceInFile(filename, replaceRegEx, replaceWithRegEx):
    ''' Open a file and use a re.sub to replace content within it in place '''
    with fileinput.input(filename, inplace=True) as f:
        for line in f:
            line = re.sub(replaceRegEx, replaceWithRegEx, line)
            #sys.stdout.write (line)
            print(line, end='')
and I am using it like so:
ReplaceInFile(r'Path\To\File.iss', r'(#define RootDir\s+)(.+)', r'\g<1>' + os.getcwd())
Unfortunately for me, my path is C:\Tkbt\Launch, so the substitution that I get is:
#define RootDir C: kbt\Launch
i.e. it's interpreting \t as tab.
So it looks to me like I need to tell python to double escape everything from os.getcwd(). I thought maybe .decode('unicode_escape') might be the answer but it is not. Can anybody help me out?
I'm hoping there's a solution that isn't "find replace each '\' with '\\'".
You'll have to resort to .replace('\\', '\\\\') I am afraid, that's the only option you have to make this work.
Using encoding to unicode_escape then decode again from ASCII would have been nice, if it worked:
replacepattern = r'\g<1>' + os.getcwd().encode('unicode_escape').decode('ascii')
This does the right thing with paths:
>>> print(re.sub(r'(#define RootDir\s+)(.+)', r'\g<1>' + r'C:\Path\to\File.iss'.encode('unicode_escape').decode('ascii'), '#define Root
#define RootDir C:\Path\to\File.iss
but not with existing non-ASCII characters because re.sub() does not process \u or \x escapes.
Don't use re.escape() to escape special characters in a string, that escapes a little too much:
>>> print(re.sub(r'(#define RootDir\s+)(.+)', r'\g<1>' + re.escape(r'C:\Path\To\File.iss'), '#define RootDir foo/bar/baz'))
#define RootDir C\:\Path\To\File\.iss
note the \: there.
Only .replace() results in a working replacement pattern, including non-ASCII characters:
>>> print(re.sub(r'(#define RootDir\s+)(.+)', r'\g<1>' + 'C:\\Path\\To\\File-with-non-ASCII-\xef.iss'.replace('\\', '\\\\'), '#define Root
#define RootDir C:\Path\To\File-with-non-ASCII-ï.iss
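As a side note that is not part of the answer above: re.sub() also accepts a function as the replacement, and the string such a function returns is inserted as-is, without backslash escape processing. The sketch below compares that with the .replace() approach. The names PATTERN, with_replace and with_callable are hypothetical, and the path is typed in rather than taken from os.getcwd() so the example gives the same result anywhere.

```python
import re

PATTERN = r'(#define RootDir\s+)(.+)'


def with_replace(line, new_dir):
    # Double every backslash so re.sub() turns each pair back into one.
    return re.sub(PATTERN, r'\g<1>' + new_dir.replace('\\', '\\\\'), line)


def with_callable(line, new_dir):
    # The function's return value is used literally as the replacement.
    return re.sub(PATTERN, lambda m: m.group(1) + new_dir, line)


line = '#define RootDir foo/bar/baz'
via_replace = with_replace(line, r'C:\Tkbt\Launch')
via_callable = with_callable(line, r'C:\Tkbt\Launch')

print(via_replace)
print(via_callable)
```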
