python string, delete character, count from right - python

I build strings from elements that come from many sources, and the number of elements varies each time the program runs. Below is a sample of what my program currently produces.
I want to count back from the right (something like [:-3]) and delete the last comma from the following string:
'{"SEignjExQfumwZRacPNHvq8UcsBjKWPERB":1.00000000,"SCaWymicaunRLAxNSTTRhVxLMAB9PaKBDK":2.80000000,"SGFHTxuRLttUShUjZyFMzs8NgC1JopSUK6":1.20000000,}'
so that the string looks like:
'{"SEignjExQfumwZRacPNHvq8UcsBjKWPERB":1.00000000,"SCaWymicaunRLAxNSTTRhVxLMAB9PaKBDK":2.80000000,"SGFHTxuRLttUShUjZyFMzs8NgC1JopSUK6":1.20000000}'
I just can't quite get there; any help is appreciated.

To remove the third last character from the string you can use:
string[:-3] + string[-2:]
>>> string = "hellothere"
>>> string[:-3] + string[-2:]
'hellothre'
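As a hedged generalization of the slice above: the pattern s[:-n] + s[-n+1:] works for n >= 2 but breaks for n = 1, because s[0:] is the whole string. A minimal sketch that computes the cut position from the length instead (remove_from_right is a hypothetical helper name, not something from the answers):

```python
def remove_from_right(text, offset):
    """Return text without the character `offset` positions from the end (1 = last)."""
    if not 1 <= offset <= len(text):
        raise ValueError("offset must be between 1 and len(text)")
    cut = len(text) - offset          # index of the character to drop
    return text[:cut] + text[cut + 1:]


print(remove_from_right("hellothere", 3))  # hellothre
print(remove_from_right("a,}", 2))         # a}
```

Working from len(text) avoids the special case where a negative slice bound becomes 0.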

I would use rsplit to split on the rightmost occurrence of a substring (maxsplit=1 limits the result to two pieces) and then join them with an empty string:
''.join(s.rsplit(',', 1))

a = '{"SEignjExQfumwZRacPNHvq8UcsBjKWPERB":1.00000000,"SCaWymicaunRLAxNSTTRhVxLMAB9PaKBDK":2.80000000,"SGFHTxuRLttUShUjZyFMzs8NgC1JopSUK6":1.20000000,}'
a[:len(a) - 2] + a[len(a) - 1:]
You could use other expressions inside the brackets; this just shows that any integer expression works as a slice bound.

You can use rfind to locate the last comma:
s = '{"SEignjExQfumwZRacPNHvq8UcsBjKWPERB":1.00000000,"SCaWymicaunRLAxNSTTRhVxLMAB9PaKBDK":2.80000000,"SGFHTxuRLttUShUjZyFMzs8NgC1JopSUK6":1.20000000,}'
idx = s.rfind(",")
s[:idx]+s[idx+1:]
which gives:
'{"SEignjExQfumwZRacPNHvq8UcsBjKWPERB":1.00000000,"SCaWymicaunRLAxNSTTRhVxLMAB9PaKBDK":2.80000000,"SGFHTxuRLttUShUjZyFMzs8NgC1JopSUK6":1.20000000}'
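One caveat with the rfind approach: rfind returns -1 when the character is absent, and s[:-1] + s[0:] would then produce a mangled string rather than an error. A small sketch with a guard, assuming you want the input returned unchanged in that case (drop_last is a hypothetical name):

```python
def drop_last(text, char=","):
    """Remove the last occurrence of char; return text unchanged if char is absent."""
    idx = text.rfind(char)
    if idx == -1:                      # rfind signals "not found" with -1
        return text
    return text[:idx] + text[idx + len(char):]


print(drop_last('{"a":1.0,"b":2.5,}'))  # {"a":1.0,"b":2.5}
print(drop_last("{}"))                   # {}
```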

Using regex:
>>> import re
>>> print(re.sub(r",(?=[^,]*$)", "", s))
{"SEignjExQfumwZRacPNHvq8UcsBjKWPERB":1.00000000,"SCaWymicaunRLAxNSTTRhVxLMAB9PaKBDK":2.80000000,"SGFHTxuRLttUShUjZyFMzs8NgC1JopSUK6":1.20000000}
This matches a ',' only when no other ',' appears between it and the end of the string, i.e. the last ',' in the string.
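To see how a lookahead of the form (?=[^,]*$) behaves on different inputs, here is a short sketch with made-up sample strings (LAST_COMMA and the samples are illustrative, not from the question):

```python
import re

# A comma followed only by non-comma characters up to the end of the string.
LAST_COMMA = re.compile(r",(?=[^,]*$)")

samples = ['{"a":1.5,"b":2.5,}', "no commas here", "x,y,z"]
results = [LAST_COMMA.sub("", sample) for sample in samples]

for before, after in zip(samples, results):
    print(before, "->", after)
```

Strings without a comma pass through untouched, since sub() simply finds nothing to replace.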

Related

How to partial split and take the first portion of string in Python?

I have a scenario where I want to split a string partially and keep only one portion of it.
The string could be like aloha_maui_d0_b0 or new_york_d9_b10. Note: what follows d is numeric and can be any length.
I want to strip everything before _d*, i.e. keep only _d0_b0 or _d9_b10.
I tried the code below, but it removes the split term as well:
print(("aloha_maui_d0_b0").split("_d"))
#Output is : ['aloha_maui', '0_b0']
#But Wanted : _d0_b0
Is there another way to get this portion? Do I need to use a regular expression?
How about just
stArr = "aloha_maui_d0_b0".split("_d")
st2 = '_d' + stArr[1]
This should do the trick if the string always has a '_d' in it
You can use index() to split in 2 parts:
s = 'aloha_maui_d0_b0'
idx = s.index('_d')
l = [s[:idx], s[idx:]]
# l = ['aloha_maui', '_d0_b0']
Edit: You can also use this if you have multiple _d in your string:
s = 'aloha_maui_d0_b0_d1_b1_d2_b2'
idxs = [n for n in range(len(s)) if n == 0 or s.find('_d', n) == n]
parts = [s[i:j] for i,j in zip(idxs, idxs[1:]+[None])]
# parts = ['aloha_maui', '_d0_b0', '_d1_b1', '_d2_b2']
I have two suggestions.
partition()
Use the method partition() to get a tuple containing the delimiter as one of the elements and use the + operator to get the String you want:
teste1 = 'aloha_maui_d0_b0'
partitiontest = teste1.partition('_d')
print(partitiontest)
print(partitiontest[1] + partitiontest[2])
Output:
('aloha_maui', '_d', '0_b0')
_d0_b0
The partition() method returns a tuple whose first element is what comes before the delimiter, the second is the delimiter itself, and the third is what comes after the delimiter.
The method only acts on the first occurrence of the delimiter in the string, so you can't use it to split into more than 3 parts without extra code. For that, my second suggestion would be better.
replace()
Use the method replace() to insert an extra character (or characters) right before your delimiter (_d) and use these as the delimiter on the split() method.
teste2 = 'new_york_d9_b10'
replacetest = teste2.replace('_d', '|_d')
print(replacetest)
splitlist = replacetest.split('|')
print(splitlist)
Output:
new_york|_d9_b10
['new_york', '_d9_b10']
Because it replaces every occurrence of _d in the string with |_d, it also works when you need more than two parts.
Problem?
One situation to be careful about is unwanted splits caused by _d appearing in more places than anticipated.
Following the apparent logic of your examples with city names and numbers, you might have something like this:
teste3 = 'rio_de_janeiro_d3_b32'
replacetest = teste3.replace('_d', '|_d')
print(replacetest)
splitlist = replacetest.split('|')
print(splitlist)
Output:
rio|_de_janeiro|_d3_b32
['rio', '_de_janeiro', '_d3_b32']
Assuming you always have the numerical on the end of the String and _d won't happen inside the numerical, rpartition() could be a solution:
rpartitiontest = teste3.rpartition('_d')
print(rpartitiontest)
print(rpartitiontest[1] + rpartitiontest[2])
Output:
('rio_de_janeiro', '_d', '3_b32')
_d3_b32
Since rpartition() starts the search on the String's end and only takes the first match to separate the terms into a tuple, you won't have to worry about the first term (city's name?) causing unexpected splits.
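A small sketch wrapping the rpartition() idea in a function. When the delimiter is missing, rpartition() returns ('', '', original), so the middle element can be used as a "found" flag. The name tail_from_last and the choice to return an empty string on a miss are assumptions for illustration:

```python
def tail_from_last(text, sep="_d"):
    """Return sep plus everything after its last occurrence, or '' if sep is absent."""
    head, found, tail = text.rpartition(sep)
    if not found:                 # rpartition gives ('', '', text) when sep is missing
        return ""
    return found + tail


print(tail_from_last("rio_de_janeiro_d3_b32"))  # _d3_b32
print(tail_from_last("aloha_maui_d0_b0"))       # _d0_b0
```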
Use regex's split and keep delimiters capability:
import re
patre = re.compile(r"(_d\d)")
#👆 👆
# note the surrounding parentheses - they're what drives "keep"
for line in """aloha_maui_d0_b0 new_york_d9_b10""".split():
    parts = patre.split(line)
    print("\n", line)
    print(parts)
    p1, p2 = parts[0], "".join(parts[1:])
    print(p1, p2)
output:
aloha_maui_d0_b0
['aloha_maui', '_d0', '_b0']
aloha_maui _d0_b0
new_york_d9_b10
['new_york', '_d9', '_b10']
new_york _d9_b10
credit due: https://stackoverflow.com/a/15668433
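Since the question says the part after d is always numeric, another option is to search for _d followed by at least one digit, which sidesteps names like rio_de_janeiro. This is a sketch under that assumption; SUFFIX and numeric_suffix are hypothetical names:

```python
import re

# "_d" must be followed by one or more digits; ".*" then takes the rest of the line.
SUFFIX = re.compile(r"_d\d+.*")


def numeric_suffix(text):
    """Return the part starting at the first '_d<digits>', or None if there is none."""
    match = SUFFIX.search(text)
    return match.group(0) if match else None


for name in ("aloha_maui_d0_b0", "new_york_d9_b10", "rio_de_janeiro_d3_b32"):
    print(name, "->", numeric_suffix(name))
```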

how to get second last and last value in a string after separator in python

In Python, how do you get the last and second-to-last elements of a string, split on a separator?
string "client_user_username_type_1234567"
expected output : "type_1234567"
Try this :
>>> s = "client_user_username_type_1234567"
>>> '_'.join(s.split('_')[-2:])
'type_1234567'
You can also use re.findall:
import re
s = "client_user_username_type_1234567"
result = re.findall(r'[a-zA-Z]+_\d+$', s)[0]
Output:
'type_1234567'
There's no single built-in function that does this for you; you have to combine what Python gives you. For that I present:
split slice and join
"_".join("one_two_three".split("_")[-2:])
In steps:
Split the string by the common separator, "_"
s.split("_")
Slice the list so that you get the last two elements by using a negative index
s.split("_")[-2:]
Now you have a list composed of the last two elements; merge that list back into a string using the original separator "_".
"_".join("one_two_three".split("_")[-2:])
That's pretty much it. Another way to investigate is through regex.
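The split, slice and join steps can be folded into one helper. This sketch uses rsplit() with a maxsplit so that only the right-hand end of the string is split; last_fields is a hypothetical name, and returning the whole string when there are fewer fields than requested is an assumption:

```python
def last_fields(text, count, sep="_"):
    """Return the last `count` sep-separated fields of text, re-joined with sep."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # rsplit(sep, count) yields at most count + 1 pieces; keep the last `count`.
    return sep.join(text.rsplit(sep, count)[-count:])


print(last_fields("client_user_username_type_1234567", 2))  # type_1234567
print(last_fields("one_two_three", 1))                      # three
```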

How to get sub string from a string in python using split or regex

I have a str in Python like the one below. I want to extract a substring from it.
table='abc_test_01'
number=table.split("_")[1]
I am getting test as a result.
What I want is everything after the first _.
The result I want is test_01. How can I achieve that?
Here is the code already given in several answers:
table='abc_test_01'
number=table.split("_",1)[1]
But this can fail when the separator does not occur in the string; you then get IndexError: list index out of range.
For example:
table='abctest01'
number=table.split("_",1)[1]
This raises IndexError because the string contains no underscore.
A more robust version is:
table.split("_",1)[-1]
Index -1 is safe here because maxsplit is 1: the list has either two elements (the last one is everything after the first underscore) or one element (the whole string, when there is no underscore).
Hope it helps :)
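To make the difference concrete, here is a sketch comparing split(sep, 1)[-1] with partition(sep)[2] on inputs with and without the separator. The two function names are made up for this illustration:

```python
def after_first(text, sep="_"):
    """Everything after the first sep; the whole string if sep is absent."""
    return text.split(sep, 1)[-1]


def after_first_strict(text, sep="_"):
    """Everything after the first sep; an empty string if sep is absent."""
    return text.partition(sep)[2]


print(after_first("abc_test_01"))               # test_01
print(after_first("abctest01"))                 # abctest01
print(after_first_strict("abc_test_01"))        # test_01
print(repr(after_first_strict("abctest01")))    # ''
```

Which fallback is right depends on whether a string without the separator should count as "all remainder" or "no remainder".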
To get the substring (all characters after the first occurrence of underscore):
number = table[table.index('_')+1:]
# Output: test_01
You could do it like:
import re
string = "abc_test_01"
rx = re.compile(r'[^_]*_(.+)')
match = rx.match(string).group(1)
print(match)
Or with normal string functions:
string = "abc_test_01"
match = '_'.join(string.split('_')[1:])
print(match)
Nobody has mentioned that split() accepts a maxsplit argument:
str.split(sep=None, maxsplit=-1)
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements).
So the solution is only:
table.split('_', 1)[1]
You can try this:
Edit: Thanks to @valtah's comment:
table = 'abc_test_01'
#final = "_".join(table.split("_")[1:])
final = table.split("_", 1)[1]
print(final)
Output:
test_01
The answer @valtah gave in the comments is also correct:
final = table.partition("_")[2]
print(final)
This outputs the same result.

Python - Parse strings with variable repeating substring

I am trying to do something which I thought would be simple (and probably is), however I am hitting a wall. I have a string that contains document numbers. In most cases the format is ######-#-### however in some cases, where the single digit should be, there are multiple single digits separated by a comma (i.e. ######-#,#,#-###). The number of single digits separated by a comma is variable. Below is an example:
For the string below:
('030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003')
I need to return:
['030421-1-001', '030421-2-001', '030421-1-002', '030421-1-002', '030421-2-002', '030421-3-002', '030421-1-003']
I have only gotten as far as returning the strings that match the ######-#-### pattern:
import re
p = re.compile(r'\d{6}-\d{1}-\d{3}')
m = p.findall('030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003')
print(m)
Thanks in advance for any help!
Matt
Perhaps something like this:
>>> import re
>>> s = '030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003'
>>> it = re.finditer(r'(\b\d{6}-)(\d(?:,\d)*)(-\d{3})\b', s)
>>> for m in it:
...     a, b, c = m.groups()
...     for x in b.split(','):
...         print(a + x + c)
...
030421-1-001
030421-2-001
030421-1-002
030421-1-002
030421-2-002
030421-3-002
030421-1-003
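Or using a list comprehension (re-create the iterator first, because the loop above has already exhausted it):
>>> it = re.finditer(r'(\b\d{6}-)(\d(?:,\d)*)(-\d{3})\b', s)
>>> [a+x+c for a, b, c in (m.groups() for m in it) for x in b.split(',')]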
['030421-1-001', '030421-2-001', '030421-1-002', '030421-1-002', '030421-2-002', '030421-3-002', '030421-1-003']
Use '\d{6}-\d(,\d)*-\d{3}'.
* means "as many as you want (0 included)".
It is applied to the previous element, here '(,\d)'.
I wouldn't use a single regular expression to try to parse this. Since it is essentially a list of strings, you might find it easier to replace the "&" with a comma globally in the string and then use split() to put the elements into a list.
Looping over the list lets you write a single function to parse and fix each item, push the result onto a new list, and then display the joined string.
string = string.replace('&', ',')
initialList = string.split(',')
newList = []
for item in initialList:
    # myfunction is a placeholder for your own per-item parsing; note that
    # splitting on ',' also separates the digits inside one document number
    newItem = myfunction(item.strip())
    newList.append(newItem)
newstring = ','.join(newList)
(\d{6}-)((?:\d,?)+)(-\d{3})
There are three capturing groups. The first and last parts are matched the straightforward way. The middle part is a digit optionally followed by a ',', repeated one or more times. A repeated capturing group would only keep its last repetition, so the inner group is made non-capturing with ?: and the outer group captures the whole run. What we're left with is the following result:
>>> p = re.compile(r'(\d{6}-)((?:\d,?)+)(-\d{3})')
>>> m = p.findall('030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003')
>>> m
[('030421-', '1,2', '-001'), ('030421-', '1', '-002'), ('030421-', '1,2,3', '-002'), ('030421-', '1', '-003')]
You'll have to manually process the 2nd term to split them up and join them, but a list comprehension should be able to do that.
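The list comprehension mentioned above could look like the following sketch, which expands each (prefix, digits, suffix) tuple from findall() into one document number per digit. The variable names are illustrative:

```python
import re

pattern = re.compile(r"(\d{6}-)((?:\d,?)+)(-\d{3})")
text = "030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003"

# For every match, emit prefix + digit + suffix once per comma-separated digit.
expanded = [
    prefix + digit + suffix
    for prefix, digits, suffix in pattern.findall(text)
    for digit in digits.split(",")
]

print(expanded)
```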

Ways to slice a string?

I have a string, example:
s = "this is a string, a"
Where a ',' (comma) will always be the 3rd to the last character, aka s[-3].
I am thinking of ways to remove the ',' but can only think of converting the string into a list, deleting it, and converting it back to a string. This, however, seems a bit too much for such a simple task.
How can I accomplish this in a simpler way?
Normally, you would just do:
s = s[:-3] + s[-2:]
The s[:-3] gives you a string up to, but not including, the comma you want removed ("this is a string") and the s[-2:] gives you another string starting one character beyond that comma (" a").
Then, joining the two strings together gives you what you were after ("this is a string a").
A couple of variants, using the "delete the last comma" rather than "delete third last character" are:
s[::-1].replace(",","",1)[::-1]
or
''.join(s.rsplit(",", 1))
But these are pretty ugly. Slightly better is:
a, _, b = s.rpartition(",")
s = a + b
This may be the best approach if you don't know the comma's position (except for last comma in string) and effectively need a "replace from right". However Anurag's answer is more pythonic for the "delete third last character".
Python strings are immutable. This means that you must create at least 1 new string in order to remove the comma, as opposed to editing the string in place in a language like C.
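A quick sketch of what immutability means in practice: item assignment on a str raises TypeError, while slicing builds a new string and leaves the original untouched. The variable names here are only for illustration:

```python
s = "this is a string, a"

try:
    s[-3] = ""                    # strings do not support item assignment
    in_place_worked = True
except TypeError as exc:
    in_place_worked = False
    print("in-place edit failed:", exc)

fixed = s[:-3] + s[-2:]           # builds a new string instead
print(fixed)                      # this is a string a
print(s)                          # this is a string, a
```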
To delete every ',' character in the text, you can try:
s = s.split(',')
# s is now ['this is a string', ' a']
s = "".join(s)
# s is now 'this is a string a'
Or in one line:
s0 = "".join(s.split(','))
The simplest way is to use the replace() method:
>>> s = 'this is a string, a'
>>> s = s.replace(',','')
>>> s
'this is a string a'
Here, replace() searches for the character ',' and replaces it with '' (the empty string).
Note that replace() replaces every ',' by default; to replace only the first n occurrences, pass a count as the third argument, e.g. s.replace(',', '', 1).
To slice a string of arbitrary length into equal-length slices of a chosen length, you could do:
def slicer(string, slice_length):
    # range replaces Python 2's xrange
    return [string[i:i + slice_length]
            for i in range(0, len(string), slice_length)]
If slice_length does not divide len(string) exactly, the last slice in the list holds the remainder.
