Multiple Re.Subs Python - python

Sorry, but I can't figure this out from the Python documentation or any of the stuff I've found from Google.
So, I've been working on renaming files with code 99% from one of the awesome helpers here at StackOverflow.
I'm working on putting together a renaming script that (and this is what I got help with from someone here) works with the name (not the extension).
I'm sure I'll come up with more replacements, but my problem at the moment is that I can't figure out how to do more than one re.sub.
Current Code (Replaces dots with spaces):
import os, shutil, re
def rename_file (original_filename):
    name, extension = os.path.splitext(original_filename)
    #Remove Spare Dots
    modified_name = re.sub("\.", r" ", name)
    new_filename = modified_name + extension
    try:
        # moves files or directories (recursively)
        shutil.move(original_filename, new_filename)
    except shutil.Error:
        print ("Couldn't rename file %(original_filename)s!" % locals())
[rename_file(f) for f in os.listdir('.') if not f.startswith('.')]
Hoping to also
re.sub("C126", "Perception", name)
re.sub("Geo1", "Geography", name)
Also, it'd be awesome if I could have it capitalize the first letter of any word except "and|if"
I tried
modified_name = re.sub("\.", r" ", name) && re.sub(...
but that didn't work; neither did putting them on different lines. How do I do all the subs and stuff I want to do/make?

Just operate over the same string over and over again, replacing it each time:
name = re.sub(r"\.", r" ", name)
name = re.sub(r"C126", "Perception", name)
name = re.sub(r"Geo1", "Geography", name)
@DanielRoseman is right though: these are literal patterns that don't need regexes to be described, found, or replaced. You can use timeit to demonstrate that plain old replace() is preferable:
In [16]: timeit.timeit("test.replace('asdf','0000')",setup="test='asdfASDF1234'*10")
Out[16]: 1.0641241073608398
In [17]: timeit.timeit("re.sub(r'asdf','0000',test)",setup="import re; test='asdfASDF1234'*10")
Out[17]: 6.126996994018555
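If the list of literal replacements keeps growing, one way to keep it manageable is to loop over pairs with str.replace(). This is only a sketch: REPLACEMENTS and clean_name are names made up for illustration, and the pairs are applied in order, so an earlier replacement can affect what a later one sees.

```python
# Hypothetical table of (old, new) pairs, taken from the question's examples.
REPLACEMENTS = [
    (".", " "),
    ("C126", "Perception"),
    ("Geo1", "Geography"),
]

def clean_name(name):
    """Apply each literal replacement in turn, reassigning the result."""
    for old, new in REPLACEMENTS:
        name = name.replace(old, new)
    return name

print(clean_name("C126.Geo1.notes"))  # Perception Geography notes
```

Call this on the name part only (after os.path.splitext), otherwise the dot before the extension gets replaced as well.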

Well, each time you call re.sub(), it returns a new, changed string. So, if you want to keep modifying each new string, keep assigning the new strings to the same variable name. Essentially, don't think of yourself as modifying the same string over and over; instead, think of yourself as producing a new string each time.
Example: If you're using the string "lol.Geo1",
newString = re.sub("\.", r" ", originalString)
will return the string "lol Geo1", and assign it to newString. Now, if you want to change that new string, do your next substitution and it will return another string, which you can put to "newString" again -
newString = re.sub("Geo1", "Geography", newString)
Now, newString evaluates to "lol Geography". With each substitution, you are creating a new string, not the same one. That's why
modified_name = re.sub("\.", r" ", name) && re.sub(...
didn't work (besides && not being a valid operator in Python): "re.sub("\.", r" ", name)" will return one string, "re.sub(...)" will return another string, and so on, each of those strings having only its own substitution applied to the original string, like this:
modified_name = "lol Geo1" && "lol.Geography"...
So, to get it to work, follow the other poster's suggestions - just keep repeating the assignment with each substitution you want, assigning the substituted newString to itself, until you've finished all of your substitutions.
Hopefully, that's a clear explanation. Feel free to ask questions. :)
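To see this for yourself, here is a small sketch using the same "lol.Geo1" example. The variable names (original, step_one, step_two) are just for illustration.

```python
import re

original = "lol.Geo1"

# Each call returns a brand-new string; the input is never changed.
step_one = re.sub(r"\.", " ", original)
step_two = re.sub("Geo1", "Geography", step_one)

print(original)  # lol.Geo1
print(step_one)  # lol Geo1
print(step_two)  # lol Geography
```

Only step_two carries both substitutions, because it was built from step_one rather than from original.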

modified_name = re.sub("\.", r" ", name)
modified_name = re.sub("C126", "Perception", modified_name)
modified_name = re.sub("Geo1", "Geography", modified_name)
Pass the output of one as the input of the next.
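The question also asked about capitalizing the first letter of every word except "and" and "if", which none of the answers covers. A minimal sketch, assuming words are separated by single spaces (which they are once the dots have been replaced); SMALL_WORDS and capitalize_words are hypothetical names:

```python
# Words that should stay lowercase (taken from the question's "and|if").
SMALL_WORDS = {"and", "if"}

def capitalize_words(name):
    """Upper-case the first letter of each word, except the small words."""
    words = name.split(" ")
    return " ".join(
        word if word.lower() in SMALL_WORDS else word[:1].upper() + word[1:]
        for word in words
    )

print(capitalize_words("perception and geography"))  # Perception and Geography
```

This only touches the first character of each word, so the rest of the word keeps whatever case it already had.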

You could also nest one call inside the other, but this is considered less readable:
name = re.sub(r"Geo1", "Geography",
              re.sub(r"C126", "Perception",
                     re.sub(r"\.", r" ", name)
                     )
              )

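You can combine the patterns with a vertical bar, but note that every match is then replaced by the same string (here a space), so this only works when all patterns share one replacement: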
name = re.sub(r"\.|C126|Geo1", r" ", name)
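If you want a single pass with alternation but a different replacement per pattern, re.sub() also accepts a function as the replacement. A sketch of that idea follows; REPLACEMENTS, PATTERN and replace_all are names invented for this example, and it assumes the keys are literal strings rather than regexes.

```python
import re

# Hypothetical lookup table: literal text to find -> text to put in its place.
REPLACEMENTS = {".": " ", "C126": "Perception", "Geo1": "Geography"}

# Escape each key so that "." is matched literally, then join with "|".
PATTERN = re.compile("|".join(re.escape(key) for key in REPLACEMENTS))

def replace_all(name):
    """Replace every key found in name with its mapped value in one pass."""
    return PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], name)

print(replace_all("C126.Geo1"))  # Perception Geography
```

Unlike chained calls, the output of one replacement is never rescanned by another here, which may or may not be what you want.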

Related

How to remove whitespaces in a string except from between certain elements

I have a string similar to (the below one is simplified):
" word= {his or her} whatever "
I want to delete every whitespace except between {}, so that my modified string will be:
"word={his or her}whatever"
lstrip or rstrip doesn't work of course. If I delete all whitespaces the whitespaces between {} are deleted as well. I tried to look up solutions for limiting the replace function to certain areas but even if I found out it I haven't been able to implement it. There are some stuff with regex (I am not sure if they are relevant here) but I haven't been able to understand them.
EDIT: If I wanted to except the area between, say {} and "", that is:
if I wanted to turn this string:
" word= {his or her} and "his or her" whatever "
into this:
"word={his or her}and"his or her"whatever"
What would I change
re.sub(r'\s+(?![^{]*})', '', list_name) into?
Instead of going through re, you can use str.replace, which is much easier and less complex when you are working with strings, especially since multiple substitutions tend to end up as one big regex.
st =" word= {his or her} whatever "
st2=""" word= {his or her} and "his or her" whatever """
new = " ".join(st2.split())
new = new.replace("= ", "=").replace("} ", "}").replace('" ' , '"').replace(' "' , '"')
print(new)
Some outputs
Example 1 output
word={his or her}whatever
Example 2 output
word={his or her}and"his or her"whatever
You can use replace (note that this removes every space, including those between {}):
def remove(string):
    return string.replace(" ", "")
string = 'hell o whatever'
print(remove(string))  # Output: hellowhatever
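For the EDIT in the question (keeping whitespace inside both {} and ""), one possible regex approach is to match the protected regions first and put them back unchanged, while deleting any other run of whitespace. This is a sketch, not the only way to do it; strip_outside is a made-up name, and it assumes the braces and quotes are balanced and not nested.

```python
import re

# Either a protected region ({...} or "...") captured in group 1,
# or a run of whitespace outside such a region.
PROTECTED_OR_SPACE = re.compile(r'(\{[^}]*\}|"[^"]*")|\s+')

def strip_outside(text):
    """Remove whitespace everywhere except inside {...} and "..."."""
    # Group 1 is None for plain whitespace, so that match becomes ''.
    return PROTECTED_OR_SPACE.sub(lambda m: m.group(1) or "", text)

print(strip_outside(' word= {his or her} and "his or her" whatever '))
# word={his or her}and"his or her"whatever
```

Because the protected alternatives come first in the pattern, the regex consumes a whole {...} or "..." block before it ever gets a chance to match the spaces inside it.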

Python inserting spaces in string

Alright, I'm working on a little project for school, a 6-frame translator. I won't go into too much detail, I'll just describe what I wanted to add.
The normal output would be something like:
TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD
The important part of this string are the M and the _ (the start and stop codons, biology stuff). What I wanted to do was highlight these like so:
TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSD
Now here is where (for me) it gets tricky, I got my output to look like this (adding a space and a ' to highlight the start and stop). But it only does this once, for the first start and stop it finds. If there are any other M....._ combinations it won't highlight them.
Here is my current code, attempting to make it highlight more than once:
def start_stop(translation):
    index_2 = 0
    while True:
        if 'M' in translation[index_2::1]:
            index_1 = translation[index_2::1].find('M')
            index_2 = translation[index_1::1].find('_') + index_1
            new_translation = translation[:index_1] + " '" + \
                              translation[index_1:index_2 + 1] + "' " +\
                              translation[index_2 + 1:]
        else:
            break
    return new_translation
I really thought this would do it, guess not. So now I find myself being stuck.
If any of you are willing to try and help, here is a randomly generated string with more than one M....._ set:
'TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLYMPPARRLATKSRFLTPVISSG_DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI'
Thank you to anyone willing to help :)
Regular expressions are pretty handy here:
import re
sequence = "TTCP...."
highlighted = re.sub(r"(M\w*?_)", r" '\1' ", sequence)
# Output:
"TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLY 'MPPARRLATKSRFLTPVISSG_' DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI"
Regex explanation:
We look for an M followed by any number of "word characters" \w* then an _, using the ? to make it a non-greedy match (otherwise it would just make one group from the first M to the last _).
The replacement is the matched group (\1 indicates "first group", there's only one), but surrounded by spaces and quotes.
You only need a little string slicing; you don't need any external module.
Python strings have a method called 'index'; just use it. Note that this only highlights the first M..._ pair:
string_1='TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD'
before=string_1.index('M')
after=string_1[before:].index('_')
print('{} {} {}'.format(string_1[:before],string_1[before:before+after+1],string_1[before+after+1:]))
output:
TTCPTISPALGLAWS_DLGTLGF MSYSANTASGETLVSLYQLGLFEM_ VVSYGRTKYYLICP_LFHLSVGFVPSD
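If you would rather stay with find() and slicing but still highlight every M..._ pair, the loop needs to remember where the previous match ended and search from there. A sketch of that, with highlight_orfs as a hypothetical name; it assumes an M with no later _ should be left untouched.

```python
def highlight_orfs(translation):
    """Wrap every M..._ stretch in quotes, scanning left to right."""
    parts = []
    pos = 0  # where the previous match ended
    while True:
        start = translation.find('M', pos)
        if start == -1:
            break
        stop = translation.find('_', start)
        if stop == -1:
            break  # start codon without a stop: leave the rest as is
        parts.append(translation[pos:start])
        parts.append(" '" + translation[start:stop + 1] + "' ")
        pos = stop + 1
    parts.append(translation[pos:])
    return ''.join(parts)

print(highlight_orfs('AAMBB_CCMDD_EE'))  # AA 'MBB_' CC 'MDD_' EE
```

The second argument to find() is what the original loop was missing: searching a slice returns positions relative to that slice, not to the whole string.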

Python - Print Each Sentence On New Line

Per the subject, I'm trying to print each sentence in a string on a new line. With the current code and output shown below, what's the syntax to return "Correct Output" shown below?
Code
sentence = 'I am sorry Dave. I cannot let you do that.'
def format_sentence(sentence):
    sentenceSplit = sentence.split(".")
    for s in sentenceSplit:
        print s + "."
Output
I am sorry Dave.
I cannot let you do that.
.
None
Correct Output
I am sorry Dave.
I cannot let you do that.
You can do this :
def format_sentence(sentence) :
    sentenceSplit = filter(None, sentence.split("."))
    for s in sentenceSplit :
        print s.strip() + "."
There are some issues with your implementation. First, as Jarvis points out in his answer, if your delimiter is the first or last character in your string, or if two delimiter characters are right next to each other, an empty string will be inserted into your list. To fix this, you need to filter out the empty strings (filter(None, ...) drops every falsy item). Also, instead of using the + operator, use formatting instead.
def format_sentence(sentences):
    sentences_split = filter(None, sentences.split('.'))
    for s in sentences_split:
        print '{0}.'.format(s.strip())
You can split the string by ". " instead of ".", then print each line with an additional "." until the last one, which will have a "." already.
def format_sentence(sentence):
    sentenceSplit = sentence.split(". ")
    for s in sentenceSplit[:-1]:
        print s + "."
    print sentenceSplit[-1]
Try:
def format_sentence(sentence):
    print(sentence.replace('. ', '.\n'))
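Most of the answers above use the Python 2 print statement. For Python 3, here is a sketch that returns the sentences as a list and prints them afterwards, which also avoids the stray "None" from printing a function that returns nothing. split_sentences is a made-up name, and it assumes every sentence ends with a period.

```python
def split_sentences(text):
    """Return each sentence with its period restored, skipping empty pieces."""
    return [part.strip() + "." for part in text.split(".") if part.strip()]

sentence = 'I am sorry Dave. I cannot let you do that.'
for line in split_sentences(sentence):
    print(line)
```

Checking part.strip() in the condition drops both the empty string after the final period and any piece that is only whitespace.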

Python: Remove words from a given string

I'm quite new to programming (and this is my first post to stackoverflow), but I am finding this problem quite difficult. I am supposed to remove a given string, in this case "WUB", and replace it with a space. For example: song_decoder("WUBWUBAWUBWUBWUBBWUBC") would give the output: A B C. From other questions on this forum I was able to establish that I need to replace "WUB" and, to remove the extra whitespace, use a split/join. Here is my code:
def song_decoder(song):
    song.replace("WUB", " ")
    return " ".join(song.split())
I am not sure where I am going wrong with this, as I get the error that WUB should be replaced by 1 space: 'AWUBBWUBC' should equal 'A B C' after running the code. Any help or pointing me in the right direction would be appreciated.
You're close! str.replace() does not work "in-place"; it returns a new string that has had the requested replacement performed on it.
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
Do this instead:
def song_decoder(song):
    song = song.replace("WUB", " ")
    return " ".join(song.split())
For example:
In [14]: song_decoder("BWUBWUBFF")
Out[14]: 'B FF'
Strings are immutable in Python. So changing a string (like you try to do with the "replace" function) does not change your variable "song". It rather creates a new string which you immediately throw away by not assigning it to something. You could do
def song_decoder(song):
    result = song.replace("WUB", " ")  # replace "WUB" with " "
    result = result.split()            # split string at whitespaces producing a list
    result = " ".join(result)          # create string by concatenating list elements around " "s
    return result
or, to make it shorter (one could also call it less readable) you can
def song_decoder(song):
    return " ".join(song.replace("WUB", " ").split())
Do both steps in a single line.
def song_decoder(song):
    return ' '.join(song.replace('WUB',' ').split())
Result
In [95]: song_decoder("WUBWUBAWUBWUBWUBBWUBC")
Out[95]: 'A B C'
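For comparison, the same result can be had with a regex that treats a run of consecutive "WUB"s as one separator. This is just an alternative sketch (song_decoder_re is a hypothetical name); the replace/split/join version above is arguably simpler.

```python
import re

def song_decoder_re(song):
    """Collapse each run of 'WUB' into one space, then trim the ends."""
    return re.sub(r"(?:WUB)+", " ", song).strip()

print(song_decoder_re("WUBWUBAWUBWUBWUBBWUBC"))  # A B C
```

Here too the result of re.sub() has to be used (returned or assigned), since the original string is not modified.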

How to print spaces in Python?

In C++, \n is used, but what do I use in Python?
I don't want to have to use:
print (" ").
This doesn't seem very elegant.
Any help will be greatly appreciated!
Here's a short answer
x=' '
This will print one white space
print(x)
This will print 10 white spaces
print(10*x)
Print 10 whites spaces between Hello and World
print(f"Hello{x*10}World")
If you need to separate certain elements with spaces you could do something like
print "hello", "there"
Notice the comma between "hello" and "there".
If you want to print a new line (i.e. \n) you could just use print without any arguments.
A lone print will output a newline.
print
In 3.x print is a function, therefore:
print()
print("hello" + ' '*50 + "world")
Any of the following will work:
print 'Hello\nWorld'
print 'Hello'
print 'World'
Additionally, if you want to print a blank line (not make a new line), print or print() will work.
First and foremost, for newlines, the simplest thing to do is have separate print statements, like this:
print("Hello")
print("World.")
#the parentheses allow it to work in Python 2, or 3.
To have a line break, and still only one print statement, simply use the "\n" within, as follows:
print("Hello\nWorld.")
Below, I explain spaces, instead of line breaks...
I see a lot of people here using the + notation, which personally, I find ugly.
Example of what I find ugly:
x=' ';
print("Hello"+10*x+"world");
The example above is currently, as I type this, the top up-voted answer. The programmer is obviously coming into Python from PHP, as the ";" at the end of every line simply isn't needed. The only reason it doesn't throw an error in Python is that semicolons CAN be used in Python, but they should really only be used when you are placing two statements on one line, for aesthetic reasons. You shouldn't place them at the end of every line in Python, as it only increases file size.
Personally, I prefer to use %s notation. In Python 2.7, which I prefer, you don't need the parentheses, "(" and ")". However, you should include them anyway, so your script won't throw errors in Python 3.x and will run in either.
Let's say you wanted your space to be 8 spaces.
What I would do is the following, in Python 3.x:
print("Hello", "World.", sep=' '*8, end="\n")
# you don't need to specify end, if you don't want to, but I wanted you to know it was also an option
#if you wanted to have an 8 space prefix, and did not wish to use tabs for some reason, you could do the following.
print("%sHello World." % (' '*8))
The above method will work in Python 2.x as well, but you cannot add the "sep" and "end" arguments, those have to be done manually in Python < 3.
Therefore, to have an 8 space prefix, with a 4 space separator, the syntax which would work in Python 2, or 3 would be:
print("%sHello%sWorld." % (' '*8, ' '*4))
I hope this helps.
P.S. You also could do the following.
>>> prefix=' '*8
>>> sep=' '*2
>>> print("%sHello%sWorld." % (prefix, sep))
        Hello  World.
rjust() and ljust()
test_string = "HelloWorld"
test_string.rjust(20)
'          HelloWorld'
test_string.ljust(20)
'HelloWorld          '
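The same padding can also be written with format specifications, which is handy when the padded value sits inside a longer message. A small sketch; the variable names are only for illustration:

```python
test_string = "HelloWorld"

right = "{:>20}".format(test_string)   # same as test_string.rjust(20)
left = "{:<20}".format(test_string)    # same as test_string.ljust(20)
centered = f"{test_string:^20}"        # f-strings need Python 3.6+

print(repr(right))
print(repr(left))
print(repr(centered))
```

Using repr() when printing makes the surrounding spaces visible, since they are otherwise hard to see in a terminal.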
Space char is hexadecimal 0x20, decimal 32 and octal \040.
>>> SPACE = 0x20
>>> a = chr(SPACE)
>>> type(a)
<class 'str'>
>>> print(f"'{a}'")
' '
Try print
Example:
print "Hello World!"
print
print "Hi!"
Hope this works!:)
This is how to print whitespace in Python.
import string
string.whitespace
'\t\n\x0b\x0c\r '
For example:
print "hello world"
print "Hello%sworld"%' '
print "hello", "world"
print "Hello "+"world"
Sometimes pprint() in the pprint module works wonders, especially for dict variables.
Simply assign " " to a variable (say x), then when needed type
print(x, x, x, "Hello World", x)
or something like that.
Hope this is a little less complicated:)
To print any number of blank lines between printed text, use:
print("Hello" + '\n' * n + "World!")  # n = number of newline characters to insert
'\n' can be used to make whitespace; multiplied, it will make multiple blank lines.
In Python2 there's this.
def Space(j):
    i = 0
    while i<=j:
        print " ",
        i+=1
And to use it, the syntax would be:
Space(4);print("Hello world")
I haven't converted it to Python3 yet.
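A possible Python 3 take on the same idea is to build the spaces as a string instead of printing them one at a time. This is a sketch rather than a line-for-line conversion: spaces is a made-up name, and it returns exactly count spaces, whereas the loop above runs j+1 times.

```python
def spaces(count):
    """Return a string made of count space characters."""
    return " " * count

print(spaces(4) + "Hello world")
```

Returning the string rather than printing it means it can also be used inside format strings or joined with other text.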
A lot of users gave you answers, but you haven't marked any as an answer.
You add an empty line with print().
You can force a new line inside your string with '\n' like in print('This is one line\nAnd this is another'), therefore you can print 10 empty lines with print('\n'*10)
You can add 50 spaces inside a string by replicating a one-space string 50 times; you can do that with multiplication: 'Before' + ' '*50 + 'after 50 spaces!'
You can pad strings to the left or right, with spaces or a specific character, for that you can use .ljust() or .rjust() for example, you can have 'Hi' and 'Carmen' on new lines, padded with spaces to the left and justified to the right with 'Hi'.rjust(10) + '\n' + 'Carmen'.rjust(10)
I believe these should answer your question.
