Remove string tail from first occurrence of a symbol - python

I need to remove a comment (if it exists) from a string. Comments start with #.
A line may contain multiple # characters,
e.g. a "separator" line: ################################
Is there a better (one-liner) way to do it than this:
ipound = line.find('#')
if ipound >= 0:
    line = line[:ipound].rstrip()
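(rstrip is optional; it removes the white space before the comment.)
PS: I cannot avoid the if like this, because find returns -1 when there is no #, and the slice [:-1] then drops the last character: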
>>> line = "test"
>>> line = line[:line.find('#')]
>>> line
'tes'

This should remove the comment and return the line:
def remove_comment(line):
    return line.split('#')[0].rstrip()
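A similar one-liner can be written with str.partition, which splits only at the first # and never raises; if there is no #, the head is the whole line. A minimal sketch (the function name strip_comment is just for illustration):

```python
def strip_comment(line):
    # partition splits at the first '#'; without a '#', head is the whole line
    head, _sep, _tail = line.partition('#')
    return head.rstrip()


print(repr(strip_comment("x = 1  # set x")))                     # 'x = 1'
print(repr(strip_comment("no comment here")))                    # 'no comment here'
print(repr(strip_comment("################################")))   # ''
```

Unlike split('#'), partition does not build a list of every piece, which only matters for lines with many # characters.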

line.index("#") returns the position of the first occurrence of "#".
You can then use string slicing to get the part before it: line = line[:line.index("#")]. If there is no "#" in the line, index raises a ValueError, so instead use: line = line[:line.index("#")] if "#" in line else line
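The same idea can also be written in the "ask forgiveness" style, catching the ValueError that str.index raises instead of testing "#" in line first. A small sketch (before_hash is a hypothetical helper name):

```python
def before_hash(line):
    # str.index raises ValueError when '#' is missing, so fall back to the whole line
    try:
        return line[:line.index("#")]
    except ValueError:
        return line


print(before_hash("abc#def"))  # abc
print(before_hash("abc"))      # abc
```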

If you want to remove the end of the string, I would advise you to use the re library. You can do a lot of complicated stuff with one line of code.
Removing what is after # if there is a # is equivalent to keeping all of what is before # and everything if there is no #.
import re
string = 'hi#there'
new_string = re.findall('[^#]*', string)[0]
# new_string == 'hi'
string2 = 'hithere'
new_string_2 = re.findall('[^#]*', string2)[0]
# new_string_2 == 'hithere'
You did not specify whether your string can contain many #. With [^#]* the match stops at the first #; a greedy pattern such as .*# would instead run up to the last #:
string3 = 'hi#there#how are you ?'
new_string_3 = re.findall('.*#', string3)[0]
# new_string_3 == 'hi#there#'
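With the re module, re.sub is arguably a more direct tool than findall here: the leftmost match of #.* starts at the first #, so everything from there to the end of the line is replaced with nothing. Since . does not match a newline, each line of a multi-line string is handled separately. A sketch (strip_comment_re is a hypothetical name):

```python
import re


def strip_comment_re(line):
    # '#.*' matches from the first '#' to the end of the line; replace it with nothing
    return re.sub(r'#.*', '', line).rstrip()


print(strip_comment_re('hi#there#how are you ?'))  # hi
print(strip_comment_re('hithere'))                 # hithere
```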

Related

Python Code to Replace First Letter of String: String Index Error

Currently, I am working on parsing resumes to remove "-" only when it is used at the beginning of each line. I've tried identifying the first character of each string after the text has been split. Below is my code:
for line in text.split('\n'):
    if line[0] == "-":
        line[0] = line.replace('-', ' ')
line is a string. This is my way of thinking, but every time I run this I get the error IndexError: string index out of range. I'm unsure why, because since line is a string, its first element should be recognized. Thank you!
The IndexError occurs because some lines are empty, and an empty string has no index 0.
Then your replacement is wrong:
first, because it tries to assign to the first character of the line, but you cannot change a string in place because strings are immutable
second, because the replacement value is the whole line with every dash replaced by a space, not a single character
third, because line is lost at the next iteration. The original list of lines too, by the way.
If you want to remove the first character of a string, there is no need for replace: just slice the string (and don't risk removing other dashes elsewhere in the line).
A working solution is to test with startswith, build a new list of strings, then join them back:
text = """hello
-yes--
who are you"""
new_text = []
for line in text.splitlines():
    if line.startswith("-"):
        line = line[1:]
    new_text.append(line)
print("\n".join(new_text))
result:
hello
yes--
who are you
with more experience, you can pack this code into a list comprehension:
new_text = "\n".join([line[1:] if line.startswith("-") else line for line in text.splitlines()])
Finally, the regular expression module is also a nice alternative:
import re
print(re.sub("^-","",text,flags=re.MULTILINE))
This removes the dash on all lines starting with a dash. The MULTILINE flag tells the regex engine to treat ^ as the start of each line, not only the start of the whole string.
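The effect of the flag is easy to check on a string whose first line does not start with a dash. Note that flags should be passed by keyword, because the fourth positional parameter of re.sub is count, not flags:

```python
import re

text = "hello\n-yes--\n-who are you"

# Without MULTILINE, '^' only matches at the start of the whole string
print(re.sub("^-", "", text))
# With MULTILINE, '^' matches at the start of every line
print(re.sub("^-", "", text, flags=re.MULTILINE))
```

The first call leaves the text unchanged; the second removes the leading dash from the second and third lines.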
This could be due to empty lines. You could just check that the line is not empty before taking the index.
new_text = []
text = "-testing\nabc\n\n\nxyz"
for line in text.split("\n"):
    if line and line[0] == '-':
        line = line[1:]
    new_text.append(line)
print("\n".join(new_text))
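Another way to avoid the IndexError, besides startswith, is a one-character slice: line[:1] is '' for an empty line instead of raising, so no separate emptiness check is needed. A sketch of the same loop as a single expression:

```python
text = "-testing\nabc\n\n\nxyz"

# line[:1] returns '' for an empty line instead of raising IndexError
new_text = "\n".join(
    line[1:] if line[:1] == "-" else line
    for line in text.split("\n")
)
print(new_text)
```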

Stripping Hex code from a plain text file in Python [duplicate]

I have a string. How do I remove all text after a certain character? (In this case ...)
The text after the ... will change, so that's why I want to remove all characters after a certain one.
Split on your separator at most once, and take the first piece:
sep = '...'
stripped = text.split(sep, 1)[0]
You didn't say what should happen if the separator isn't present. Both this and Alex's solution will return the entire string in that case.
Assuming your separator is '...', but it can be any string.
text = 'some string... this part will be removed.'
head, sep, tail = text.partition('...')
>>> print(head)
some string
If the separator is not found, head will contain all of the original string.
The partition function was added in Python 2.5.
S.partition(sep) -> (head, sep, tail)
Searches for the separator sep in S, and returns the part before it,
the separator itself, and the part after it. If the separator is not
found, returns S and two empty strings.
If you want to remove everything after the last occurrence of separator in a string I find this works well:
<separator>.join(string_to_split.split(<separator>)[:-1])
For example, if string_to_split is a path like root/location/child/too_far.exe and you only want the folder path, you can use "/".join(string_to_split.split("/")[:-1]) and you'll get
root/location/child
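For the last occurrence there is also str.rpartition (the right-hand counterpart of partition) or str.rsplit with maxsplit=1; both split once from the right instead of splitting on every separator and joining the pieces back. They differ when the separator is missing: rpartition puts the whole string in the last element (so the head is '', like the split/join version), while rsplit(sep, 1)[0] returns the whole string:

```python
path = "root/location/child/too_far.exe"

head, sep, tail = path.rpartition("/")
print(head)                    # root/location/child
print(path.rsplit("/", 1)[0])  # root/location/child

# Behaviour when the separator is missing:
print(repr("too_far.exe".rpartition("/")[0]))  # ''
print(repr("too_far.exe".rsplit("/", 1)[0]))   # 'too_far.exe'
```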
Without a regular expression (which I assume is what you want):
def remafterellipsis(text):
    where_ellipsis = text.find('...')
    if where_ellipsis == -1:
        return text
    return text[:where_ellipsis + 3]
or, with a regular expression:
import re
def remwithre(text, there=re.compile(re.escape('...') + '.*')):
    return there.sub('', text)
import re
test = "This is a test...we should not be able to see this"
res = re.sub(r'\.\.\..*',"",test)
print(res)
Output: "This is a test"
The methods find and index return the character's position in a string. If you want to remove everything from that character onward, slice up to it:
mystring = "123⋯567"
mystring[ 0 : mystring.index("⋯")]
>> '123'
If you want to keep the character, add 1 to the character position.
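The two methods differ only when the character is missing: str.find returns -1, which as a slice end silently drops the last character, while str.index raises ValueError. A small sketch showing both cases:

```python
mystring = "123⋯567"
print(mystring[:mystring.find("⋯")])  # 123

missing = "1234567"
print(missing.find("⋯"))            # -1
print(missing[:missing.find("⋯")])  # 123456  (last character lost)
try:
    missing.index("⋯")
except ValueError as exc:
    # index reports the missing substring instead of returning -1
    print("index raised ValueError:", exc)
```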
From a file:
sep = '...'
with open("requirements.txt") as file_in:
    for line in file_in:
        # keep only the part before sep; rstrip drops the newline left when sep is absent
        res = line.split(sep, 1)[0].rstrip("\n")
        print(res)
This works for me in Python 3.7.
In my case I needed to remove everything after the dot in my string variable fees:
fees = "45.05"
split_string = fees.split(".", 1)
substring = split_string[0]
print(substring)
Yet another way to remove all characters after the last occurrence of a character in a string (assuming you want to remove all characters after the final '/'):
path = 'I/only/want/the/containing/directory/not/the/file.txt'
# also stop on an empty string, so a path without '/' does not raise IndexError
while path and path[-1] != '/':
    path = path[:-1]
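When the string really is a file path, the standard library already provides this: os.path.dirname, or the parent attribute of pathlib.PurePosixPath, returns the containing directory. Note that these drop the trailing '/', and os.path.dirname returns '' for a bare file name:

```python
import os.path
from pathlib import PurePosixPath

path = 'I/only/want/the/containing/directory/not/the/file.txt'

print(os.path.dirname(path))              # I/only/want/the/containing/directory/not/the
print(PurePosixPath(path).parent)         # I/only/want/the/containing/directory/not/the
print(repr(os.path.dirname('file.txt')))  # ''
```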
Another easy way using re would be:
import re
text = 'some string... this part will be removed.'
text = re.search(r'(\A.*)\.\.\..+', text, re.DOTALL | re.IGNORECASE).group(1)
# text == 'some string'

Regex: Capture a line when certain columns are equal to certain values

Let's say we have this data extract:
ID,from,to,type,duration
1,paris,berlin,member,12
2,berlin,paris,member,12
3,paris,madrid,non-member,10
I want to retrieve the line when from = paris, and type = member.
Which means in this example I have only:
1,paris,berlin,member,12
That is the only line that satisfies these rules. I am trying to do this with regex only. I am still learning, and I could only get this:
^.*(paris).*(member).*$
However, this will give me also the second line where paris is a destination.
The idea I guess is to:
Divide the line by commas.
Check if the second item is equal to 'paris'
Check if the fourth item is equal to 'member', or even check if there is 'member' in that line as there is no confusion with this part.
Any solution where I can use only regex?
Use [^,]* instead of .* to match a sequence of characters that doesn't include the comma separator. Use this for each field you want to skip when matching the line.
^[^,]*,paris,[^,]*,member,
Note that this is a very fragile mechanism compared to using the csv module, since it will break if any field contains a comma (the csv module understands quoting a field to protect the delimiter).
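A sketch of both approaches on the sample data, with one extra row added purely for illustration (row 4, whose "to" field is a quoted value containing a comma). The regex, applied with re.MULTILINE so that ^ matches at each line, finds only row 1 because the quoted comma shifts the fields it counts; csv.DictReader parses the quoting and finds rows 1 and 4:

```python
import csv
import io
import re

# Row 4 is a hypothetical extra row with a quoted comma in the "to" field
data = """ID,from,to,type,duration
1,paris,berlin,member,12
2,berlin,paris,member,12
3,paris,madrid,non-member,10
4,paris,"berlin, de",member,7"""

# [^,]* skips one field; re.MULTILINE makes ^ and $ work per line
pattern = re.compile(r"^[^,]*,paris,[^,]*,member,.*$", re.MULTILINE)
print(pattern.findall(data))  # ['1,paris,berlin,member,12']

# The csv module understands quoting, so row 4 is parsed correctly
rows = csv.DictReader(io.StringIO(data))
print([r["ID"] for r in rows if r["from"] == "paris" and r["type"] == "member"])  # ['1', '4']
```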
This should do it:
^.*,(paris),.*,(member),.*$
As many have pointed out, I would read this into a dictionary using csv. However, if you insist on using regex, this should work:
[0-9]+\,paris.*[^-]member.*
Try this:
import re
regex = r"\d+,paris,\w+,member,\d+"
# avoid naming the variable str, which shadows the built-in type
data = """ID,from,to,type,duration
1,paris,berlin,member,12
2,berlin,paris,member,12
3,paris,madrid,non-member,10"""
for line in data.split("\n"):
    if re.match(regex, line):
        print(line)
You can try this:
import re
s = """
ID,from,to,type,duration
1,paris,berlin,member,12
2,berlin,paris,member,12
3,paris,madrid,non-member,10
"""
final_data = re.findall(r'\d+,paris,\w+,member,\d+', s)
Output:
['1,paris,berlin,member,12']
However, note that the best solution is to read the file and use a dictionary:
import csv
with open('filename.csv', newline='') as f:
    l = list(csv.reader(f))
final_l = [dict(zip(l[0], i)) for i in l[1:]]
final_data = [','.join(i[b] for b in l[0]) for i in final_l if i['from'] == 'paris' and i['type'] == 'member']

Is there a specific way to input characters at the beginning of a new line?

Currently, I am trying to implement a way to insert ">" at the beginning of each separate line in a string.
An example would be:
String:
"Hello how are you!
Python is cool!"
Now that's all one big string, with a line break. But is there a function to detect when and where the line break is? As I stated above, I'd like to add a ">" at the beginning of each new line, like so:
String:
">Hello how are you!
>Python is cool!"
Note: The string isn't permanently set, so that's why I am having to work around this.
Hopefully that makes sense, and thanks for your help!
Just split the lines and concatenate:
lines = """Hello how are you!
Python is cool!"""
for line in lines.splitlines():
    if line:
        print(">" + line)
    else:
        print(line)
>Hello how are you!
>Python is cool!
To get a new string and keep the newlines, pass keepends=True:
new_s = "".join([">{}".format(line) if line.strip() else line
                 for line in lines.splitlines(True)])
print(new_s)
>Hello how are you!
>Python is cool!
str.splitlines([keepends])
Return a list of the lines in the string, breaking at line boundaries. This method uses the universal newlines approach to splitting lines. Line breaks are not included in the resulting list unless keepends is given and true.
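The standard library also has textwrap.indent, which adds a prefix to every line; by default it skips lines that consist only of whitespace, much like the line.strip() check above:

```python
import textwrap

s = "Hello how are you!\nPython is cool!\n\nBye"
# prefix every non-blank line with '>'; the empty line is left alone
print(textwrap.indent(s, ">"))
```

This prints ">Hello how are you!", ">Python is cool!", an empty line, and ">Bye".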
Use a regular expression to find runs of non-newline characters and insert a > character before each one:
new_string = re.sub(r'[^\n]+', r'>\g<0>', old_string)  # be sure to import re
This should behave like print, except that each line is prefixed with "> ":
def newprint(*args, **kwargs):
    to_print = " ".join([str(a) for a in args])
    print(">", "\n> ".join(to_print.splitlines()), **kwargs)

Python complex regex replace

I'm trying to write a simple VB6-to-C translator to help me port an open source game to the C language.
I want to be able to get "Npclist[NpcIndex]" from "With Npclist[NpcIndex]" using regex and replace it everywhere it has to be replaced. ("With" is used in VB6 as a kind of macro that adds Npclist[NpcIndex] wherever it is needed, until it finds "End With".)
Example:
With Npclist[NpcIndex]
.goTo(245) <-- it should be replaced with Npclist[NpcIndex].goTo(245)
End With
Is it possible to use regex to do the job?
I've tried using a function to perform another regex replace between the "With" and the "End With", but I can't know which text the "With" stands for (Npclist[NpcIndex]).
Thanks in advance
I personally wouldn't trust any single-regex solution to get it right the first time, nor feel like debugging it. Instead, I would parse the code line by line, cache any With expression, and use it to replace any . directly preceded by whitespace or by any type of opening bracket (add use-cases as needed):
(?<=[\s[({])\. - positive lookbehind for any character from the set + escaped literal dot
(?:(?<=[\s[({])|^)\. - use this non-capturing alternation if a to-be-replaced . can occur at the beginning of a line
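The difference between the two patterns shows up on a line whose first character is a dot: a lookbehind needs a preceding character, so only the second pattern, with its ^ alternative, replaces that leading dot. A quick check, using "hatla." as the cached With prefix from the example:

```python
import re

line = ".matla.tatla[.matla.other] = .matla.other2"

lookbehind_only = r"(?<=[\s[({])\."
with_line_start = r"(?:(?<=[\s[({])|^)\."

# The first pattern cannot match the dot at position 0: nothing precedes it
print(re.sub(lookbehind_only, "hatla.", line))
# .matla.tatla[hatla.matla.other] = hatla.matla.other2

# The second one also accepts the start of the line via ^
print(re.sub(with_line_start, "hatla.", line))
# hatla.matla.tatla[hatla.matla.other] = hatla.matla.other2
```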
import re
def convert_vb_to_c(vb_code_lines):
    c_code = []
    current_with = ""
    for line in vb_code_lines:
        if re.search(r'^\s*With\b', line) is not None:
            # strip leading whitespace before dropping the "With " keyword
            current_with = line.strip()[5:] + "."
            continue
        elif re.search(r'^\s*End With', line) is not None:
            current_with = "{error_outside_with_replacement}"
            continue
        # the ^ alternative also catches a dot at the very start of the line
        line = re.sub(r'(?:(?<=[\s[({])|^)\.', current_with, line)
        c_code.append(line)
    return "\n".join(c_code)
example = """
With Npclist[NpcIndex]
.goTo(245)
End With
With hatla
.matla.tatla[.matla.other] = .matla.other2
dont.mind.me(.do.mind.me)
.next()
End With
"""
# use file_object.read().splitlines() in real life
print(convert_vb_to_c(example.split("\n")))
You can pass a function to the sub method:
# just to give the idea of the regex
regex = re.compile(r'''With (.+)
(the-regex-for-the-VB-expression)+?
End With''')
def repl(match):
    beginning = match.group(1)  # Npclist[NpcIndex] in your example
    return '\n'.join(beginning + line for line in match.group(2).splitlines())
re.sub(regex, repl, the_string)
In repl you can obtain all the information about the matching from the match object, build whichever string you want and return it. The matched string will be replaced by the string you return.
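A concrete, runnable version of this idea, under the simplifying assumption that every line inside a With block starts with a dot (WITH_BLOCK and expand_with are hypothetical names). The body lines are wrapped in one outer group so that match.group(2) holds all of them:

```python
import re

# Assumes every line between "With ..." and "End With" starts with '.'
WITH_BLOCK = re.compile(r"^With (.+)\n((?:\..*\n)+?)End With$", re.MULTILINE)


def expand_with(match):
    prefix = match.group(1)             # e.g. Npclist[NpcIndex]
    body = match.group(2).splitlines()  # the lines between With and End With
    return "\n".join(prefix + line for line in body)


vb = "With Npclist[NpcIndex]\n.goTo(245)\n.say(1)\nEnd With"
print(WITH_BLOCK.sub(expand_with, vb))
# Npclist[NpcIndex].goTo(245)
# Npclist[NpcIndex].say(1)
```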
Note that you must be really careful when writing the regex above. In particular, using (.+) as I did matches the rest of the line up to (but excluding) the newline, which may or may not be what you want (but I don't know VB, and I have no idea which regex could go there instead to catch only what you want).
The same goes for (the-regex-for-the-VB-expression)+. I have no idea what code could be in those lines, so I leave the details of implementing it to you. Maybe taking the whole line is okay, but I wouldn't trust something this simple (probably expressions can span multiple lines, right?). Also keep in mind that a repeated group like (...)+? only keeps its last repetition in match.group(2), so wrap the repetition in an outer group if you need all the lines.
Also, doing everything in one big regular expression is, in general, error-prone and slow.
I'd strongly consider using regexes only to find With and End With, and using something else to do the replacements.
This may do what you need in Python 2.7. I'm assuming you want to strip out the With and End With, right? You don't need those in C.
>>> import re
>>> search_text = """
... With Np1clist[Npc1Index]
... .comeFrom(543)
... End With
...
... With Npc2list[Npc2Index]
... .goTo(245)
... End With"""
>>>
>>> def f(m):
... return '{0}{1}({2})'.format(m.group(1), m.group(2), m.group(3))
...
>>> regex = r'With\s+([^\s]*)\s*(\.[^(]+)\(([^)]+)\)[^\n]*\nEnd With'
>>> print(re.sub(regex, f, search_text))
Np1clist[Npc1Index].comeFrom(543)
Npc2list[Npc2Index].goTo(245)
