Python - Comma in string causes issue with strip - python

I have strings as tuples that I'm trying to remove quotation marks from. If there isn't a comma present in the string, then it works. But if there is a comma, then quotation marks still remain:
example = [('7-30-17','0x34','"Upload Complete"'),('7-31-17','0x35','"RCM","Interlock error"')]
example = [(x,y,(z.strip('"')))
for x,y,z in example]
The result is that quotation marks partially remain in the strings that had commas in them. The second tuple now reads RCM","Interlock error as opposed to RCM, Interlock error
('7-30-17','0x34','Upload Complete')
('7-31-17','0x35','RCM","Interlock error')
Any ideas what I'm doing wrong? Thanks!

You can use list comprehension to iterate the list items and similarly for the inner tuple items
>>> [tuple(s.replace('"','') for s in tup) for tup in example]
[('7-30-17', '0x34', 'Upload Complete'), ('7-31-17', '0x35', 'RCM,Interlock error')]

It seems like you're looking for the behaviour of replace(), rather than strip().
Try using replace('"', '') instead of strip('"'). strip only removes characters from the beginning and end of strings, while replace will take care of all occurrences.
Your example would be updated to look like this:
example = [('7-30-17','0x34','"Upload Complete"'),('7-31-17','0x35','"RCM","Interlock error"')]
example = [(x,y,(z.replace('"', '')))
for x,y,z in example]
example ends up with the following value:
[('7-30-17', '0x34', 'Upload Complete'), ('7-31-17', '0x35', 'RCM,Interlock error')]
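To see the difference in isolation, here is a minimal sketch applied to the one value that caused trouble. The variable names are illustrative and not taken from the question.

```python
# The problematic third field from the second tuple in the question.
value = '"RCM","Interlock error"'

# strip() only trims the given characters from both ends of the string.
stripped = value.strip('"')

# replace() substitutes every occurrence, including those in the middle.
replaced = value.replace('"', '')

print(stripped)  # RCM","Interlock error
print(replaced)  # RCM,Interlock error
```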

The problem is that strip only removes characters from the ends of a string.
You can use a regex to replace every ":
import re
example = [('7-30-17','0x34','"Upload Complete"'),('7-31-17','0x35','"RCM","Interlock error"')]
example = [(x,y,(re.sub('"','',z)))
for x,y,z in example]
print(example)
# [('7-30-17', '0x34', 'Upload Complete'), ('7-31-17', '0x35', 'RCM,Interlock error')]
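The question expected RCM, Interlock error with a space after the comma, which removing the quotes alone does not produce. One possible approach, assuming each third field is a comma-separated list of double-quoted values, is to parse it with the standard csv module and rejoin the parts. The helper name unquote_fields and its joiner parameter are hypothetical.

```python
import csv

def unquote_fields(field, joiner=', '):
    """Parse a string like '"RCM","Interlock error"' and rejoin its parts."""
    # csv.reader accepts any iterable of lines; a one-element list is enough.
    parts = next(csv.reader([field]))
    return joiner.join(parts)

example = [('7-30-17', '0x34', '"Upload Complete"'),
           ('7-31-17', '0x35', '"RCM","Interlock error"')]
cleaned = [(x, y, unquote_fields(z)) for x, y, z in example]
print(cleaned)
```

Unlike a plain replace, this keeps a comma that appears inside a quoted value, at the cost of an extra import.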

Related

generate a list of string elements without the quotation mark

I would like to generate a list of string elements, but without the quotation marks. This is how I am generating the list:
[f'test_{i+1}' for i in range(5)]
This yields the following result:
['test_1', 'test_2', 'test_3', 'test_4', 'test_5']
How do I remove the quotation marks? I tried as shown below but this gives me a syntax error.
[f'test_{i+1}' for i in range(5)].replace(''', '')
There are no quotation marks in your strings. The quotation marks you are trying to remove are part of the Python syntax. They are necessary to delimit your strings, so you cannot remove them.
P.S.: Python lists have no replace method. If you want to replace anything within each string, the following syntax will do:
a = '_'  # the character to be replaced (example value)
b = '-'  # the character to replace a with (example value)
[f'test_{i+1}'.replace(a, b) for i in range(5)]
If for some reason you cannot have quotes in the printed output, you can use
print(', '.join(['test_1', 'test_2', 'test_3', 'test_4', 'test_5']))
Note that this is joining all of the elements together into a single string.
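A short sketch of where the quotes come from, assuming the goal is only to control how the list looks when printed. The variable names are illustrative.

```python
items = [f'test_{i+1}' for i in range(5)]

# str() of a list uses repr() for each element, which adds the quotes.
as_list_text = str(items)

# Joining builds a single string from the elements, so no quotes appear.
joined = ', '.join(items)

# Brackets can be added back by hand if the list-like look is wanted.
bracketed = f'[{joined}]'

print(as_list_text)  # ['test_1', 'test_2', 'test_3', 'test_4', 'test_5']
print(joined)        # test_1, test_2, test_3, test_4, test_5
print(bracketed)     # [test_1, test_2, test_3, test_4, test_5]
```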

Remove brackets and number inside from string Python

I've seen a lot of examples on how to remove brackets from a string in Python, but I've not seen any that allow me to remove the brackets and a number inside of the brackets from that string.
For example, suppose I've got a string such as "abc[1]". How can I remove the "[1]" from the string to return just "abc"?
I've tried the following:
stringTest = "abc[1]"
stringTestWithoutBrackets = str(stringTest).strip('[]')
but this only outputs the string without the final bracket
abc[1
I've also tried with a wildcard option:
stringTest = "abc[1]"
stringTestWithoutBrackets = str(stringTest).strip('[\w+\]')
but this also outputs the string without the final bracket
abc[1
You could use regular expressions for that, but I think the easiest way would be to use split:
>>> stringTest = "abc[1][2][3]"
>>> stringTest.split('[', maxsplit=1)[0]
'abc'
You can use a regex, but you need the re module for that, because str.strip does not understand patterns:
import re
re.sub(r'\[\d+\]', '', stringTest)
If the [<number>] part is always at the end of the string you can also strip via:
stringTest.rstrip('[0123456789]')
Though the latter version might strip beyond the [ if the previous character is in the strip list too. For example in "abc1[5]" the "1" would be stripped as well.
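One way to avoid that over-stripping, sketched here under the assumption that only a single trailing [number] group should go, is to anchor the regex at the end of the string. The function name drop_trailing_index is hypothetical.

```python
import re

def drop_trailing_index(text):
    """Remove one trailing '[digits]' group, if present."""
    # The $ anchor restricts the match to the end of the string.
    return re.sub(r'\[\d+\]$', '', text)

print(drop_trailing_index('abc[1]'))     # abc
print(drop_trailing_index('abc1[5]'))    # abc1
# For comparison, rstrip treats its argument as a set of characters:
print('abc1[5]'.rstrip('[0123456789]'))  # abc
```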
Assuming your string has the format "text[number]" and you only want to keep the "text", then you could do:
stringTest = "abc[1]"
bracketBegin = stringTest.find('[')
stringTestWithoutBrackets = stringTest[:bracketBegin]
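Note that find returns -1 when there is no bracket at all, so the slice above would then drop the last character. A possible alternative, if inputs without brackets can occur, is str.partition. The function name text_before_bracket is hypothetical.

```python
def text_before_bracket(text):
    """Return everything before the first '[', or the whole string."""
    # partition() returns (head, separator, tail); when the separator is
    # missing, head is the whole string and the other two parts are empty.
    head, _sep, _tail = text.partition('[')
    return head

print(text_before_bracket('abc[1]'))  # abc
print(text_before_bracket('abc'))     # abc

# The find-based slice misbehaves when no bracket is present:
no_bracket = 'abc'
print(no_bracket[:no_bracket.find('[')])  # ab
```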

Python 3 split()

When I'm splitting a string "abac" I'm getting undesired results.
Example
print("abac".split("a"))
Why does it print:
['', 'b', 'c']
instead of
['b', 'c']
Can anyone explain this behavior and guide me on how to get my desired output?
Thanks in advance.
As @DeepSpace pointed out (referring to the docs):
If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']).
Therefore I'd suggest using a different delimiter, such as a comma. If this is the format you're stuck with, you could use the built-in filter() function as suggested in this answer; passing None as the function removes any empty strings.
sample = 'abac'
# In Python 3, filter() returns an iterator, so wrap it in list() to print it.
filtered_sample = list(filter(None, sample.split('a')))
print(filtered_sample)
# ['b', 'c']
When you split a string in Python, you keep everything between your delimiters (even when it's an empty string!)
For example, if you had a list of letters separated by commas:
>>> "a,b,c,d".split(',')
['a','b','c','d']
If your list had some missing values you might leave the space in between the commas blank:
>>> "a,b,,d".split(',')
['a','b','','d']
The start and end of the string act as delimiters themselves, so if you have a leading or trailing delimiter you will also get this "empty string" sliced out of your main string:
>>> "a,b,c,d,,".split(',')
['a','b','c','d','','']
>>> ",a,b,c,d".split(',')
['','a','b','c','d']
If you want to get rid of any empty strings in your output, you can use the filter function.
If instead you just want to get rid of this behavior near the edges of your main string, you can strip the delimiters off first:
>>> ",,a,b,c,d".strip(',')
'a,b,c,d'
>>> ",,a,b,c,d".strip(',').split(',')
['a','b','c','d']
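The examples above follow one rule: with an explicit separator, split returns exactly one more piece than there are separators in the string. A small sketch that checks this on the strings used above; the function name check_piece_count is hypothetical.

```python
def check_piece_count(text, sep):
    """Return True if split() gave one more piece than separators."""
    pieces = text.split(sep)
    # With an explicit separator, every occurrence produces a cut,
    # so leading, trailing, and doubled separators yield empty pieces.
    return len(pieces) == text.count(sep) + 1

for sample in ["a,b,c,d", "a,b,,d", "a,b,c,d,,", ",a,b,c,d"]:
    print(sample, sample.split(','), check_piece_count(sample, ','))
```

This does not apply to split() called without arguments, which groups runs of whitespace and drops empty strings.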
In your example, "a" is what's called a delimiter. It acts as a boundary between the characters before it and after it. So, when you call split, it gets the characters before "a" and after "a" and inserts it into the list. Since there's nothing in front of the first "a" in the string "abac", it returns an empty string and inserts it into the list.
split will return the characters between the delimiters you specify (or between an end of the string and a delimiter), even if there aren't any, in which case it will return an empty string. (See the documentation for more information.)
In this case, if you don't want any empty strings in the output, you can use filter to remove them:
list(filter(lambda s: len(s) > 0, "abac".split("a")))
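The same filtering can also be written as a list comprehension, which some readers find easier to scan than filter with a lambda. This is a sketch; the function name split_nonempty is hypothetical.

```python
def split_nonempty(text, sep):
    """Split text on sep and drop the empty pieces."""
    # An empty string is falsy, so "if piece" keeps only non-empty pieces.
    return [piece for piece in text.split(sep) if piece]

print("abac".split("a"))            # ['', 'b', 'c']
print(split_nonempty("abac", "a"))  # ['b', 'c']
```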

Last element in python list, created by splitting a string is empty

So I have a string which I need to parse. The string contains a number of words, separated by a hyphen (-). The string also ends with a hyphen.
For example one-two-three-.
Now, if I want to look at the words on their own, I split up the string to a list.
wordstring = "one-two-three-"
wordlist = wordstring.split('-')
for i in range(0, len(wordlist)):
    print(wordlist[i])
Output
one
two
three
#empty element
What I don't understand is, why in the resulting list, the final element is an empty string.
How can I omit this empty element?
Should I simply truncate the list or is there a better way to split the string?
You have an empty string because the split on the last - character produces an empty string on the RHS. You can strip all '-' characters from the string before splitting:
wordlist = wordstring.strip('-').split('-')
If the final character is always a -, you can omit it by using [:-1], which takes every character of the string except the last one.
Then, proceed to split it as you did:
wordlist = wordstring[:-1].split('-')
print(wordlist)
['one', 'two', 'three']
You can use regex to do this :
import re
wordlist = re.findall("[a-zA-Z]+(?=-)", wordstring)
Output :
['one', 'two', 'three']
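The lookahead in that pattern requires a hyphen after every word, so it silently drops the last word if the trailing hyphen is ever missing. A sketch of a variant that does not depend on it, assuming words may contain anything except hyphens; the variable names are illustrative.

```python
import re

with_trailing = "one-two-three-"
without_trailing = "one-two-three"

lookahead = r"[a-zA-Z]+(?=-)"  # letters that are followed by a hyphen
negated = r"[^-]+"             # any run of characters that are not hyphens

print(re.findall(lookahead, with_trailing))     # ['one', 'two', 'three']
print(re.findall(lookahead, without_trailing))  # ['one', 'two']
print(re.findall(negated, with_trailing))       # ['one', 'two', 'three']
print(re.findall(negated, without_trailing))    # ['one', 'two', 'three']
```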
You should use the string's built-in strip method before splitting your string. E.g.:
wordstring = "one-two-three-"
wordlist = wordstring.strip('-').split('-')
I believe .split() treats the final - as a separator between "three" and one more element, which is an empty string.
Are you open to removing the dash in wordstring before splitting it?
wordstring = "one-two-three-"
wordlist = wordstring[:-1].split('-')
print(wordlist)
OUT: ['one', 'two', 'three']
This is explained in the docs:
...
If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']).
...
If you know your strings will always end in '-', then just remove the last one by doing wordlist.pop().
If you need something more complicated you may want to learn about regular expressions.
Just for the variety of options:
wordlist = [x for x in wordstring.split('-') if x]
Note that the above also handles cases such as: wordstring = "one-two--three-" (double hyphen)
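A quick side-by-side sketch of the double-hyphen case, comparing the strip-then-split approach with the filtering comprehension. The variable names are illustrative.

```python
wordstring = "one-two--three-"

# strip() only removes hyphens at the ends; the doubled hyphen in the
# middle still produces an empty element.
stripped_then_split = wordstring.strip('-').split('-')

# Filtering after the split drops empty elements wherever they occur.
filtered = [x for x in wordstring.split('-') if x]

print(stripped_then_split)  # ['one', 'two', '', 'three']
print(filtered)             # ['one', 'two', 'three']
```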
First strip() then split()
wordstring = "one-two-three-"
x = wordstring.strip('-')
y = x.split('-')
for word in y:
    print(word)
Strip the string before splitting, passing the hyphen as the character to remove: strip('-'). This removes the trailing "-", so split no longer produces an empty last element.

Python - regex, blank element at the end of the list?

I have this code:
print(re.split(r"[\s\?\!\,\;]+", "Holy moly, feferoni!"))
which results in
['Holy', 'moly', 'feferoni', '']
How can I get rid of this last blank element, and what caused it?
If this is a dirty way to strip punctuation and spaces from a string, how else could I write it, still using regex?
Expanding on what @HamZa said in his comment, you would use re.findall and a negated character set:
>>> from re import findall
>>> findall(r"[^\s?!,;]+", "Holy moly, feferoni!")
['Holy', 'moly', 'feferoni']
>>>
You get the empty string as the last element of your list because the RegEx splits after the last !. It ends up giving you what's before the ! and what's after it, but after it there's simply nothing, i.e. an empty string! You might have the same problem in the middle of the string if you hadn't wisely added the + to your RegEx.
If you want to elegantly get rid of the possible empty string, use filter. In Python 3 it returns an iterator, so add a call to list if you can't work with an iterator:
filter(None, re.split(r"[\s?!,;]+", "Holy moly, feferoni!"))
This will result in:
['Holy', 'moly', 'feferoni']
What this does is remove every element that is not truthy. The filter function generally only returns elements that satisfy a requirement given as a function, but if you pass None it checks the truth value of each element itself. Because an empty string is falsy and every other string is truthy, it removes every empty string from the list.
Also note I removed the escaping of special characters in the character class, as it is simply not necessary and just makes the RegEx harder to read.
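Put together as a runnable sketch, with one extra input showing that a leading delimiter produces an empty string at the front in the same way. The variable names are illustrative.

```python
import re

pattern = r"[\s?!,;]+"
text = "Holy moly, feferoni!"

# The trailing "!" is a delimiter with nothing after it.
raw_parts = re.split(pattern, text)

# filter(None, ...) keeps truthy items; list() materialises the iterator.
words = list(filter(None, raw_parts))

# A delimiter at the start gives an empty first element instead.
leading = re.split(pattern, "!Holy moly")

print(raw_parts)  # ['Holy', 'moly', 'feferoni', '']
print(words)      # ['Holy', 'moly', 'feferoni']
print(leading)    # ['', 'Holy', 'moly']
```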
the first thing which comes to my mind is something like this:
>>> import re
>>> mystring = re.split(r"[\s\?\!\,\;]+", "Holy moly, feferoni!")
>>> mystring
['Holy', 'moly', 'feferoni', '']
>>> mystring.pop()  # removes and returns the last element
''
>>> print(mystring)
['Holy', 'moly', 'feferoni']
__import__('re').findall(r'[^\s?!,;]+', 'Holy moly, feferoni!')
