I have a string for example:
"112233445566778899"
How can I split it into the following pattern:
"\x11\x22\x33\x44\x55\x66\x77\x88\x99"
I can split the string with the following commands, but I couldn't find a way to prepend "\x" to each piece:
s = "112233445566778899"
[s[i:i + 2] for i in range(0, len(s), 2)]
Assuming your string always has an even length and every pair is the same character repeated, as in your example:
>>> string = "112233445566778899"
>>> joined = ''.join(r'\x{}'.format(s + s) for s in string[1::2])
>>> print(joined)
\x11\x22\x33\x44\x55\x66\x77\x88\x99
>>>
You can make the following edit to your code:
...
[r"\x"+s[i:i + 2] for i in range(0, len(s), 2)]
...
Notice that the resulting list is displayed with two backslashes per item:
['\\x11', '\\x22', '\\x33', '\\x44', '\\x55', '\\x66', '\\x77', '\\x88', '\\x99']
This is because Python's repr escapes the \ with the escape character \.
When you print one of the strings, you will notice that only a single \ appears:
>>> x = ['\\x11', '\\x22', '\\x33', '\\x44', '\\x55', '\\x66', '\\x77', '\\x88', '\\x99']
>>> print(x[0])
\x11
s = "112233445566778899"
a = [r'\x' + s[i:i + 2] for i in range(0, len(s), 2)]
print(''.join(a))
I think regular expressions would serve you best, because they can find doubled characters anywhere in the string.
>>> import re
>>> string = "112233445566778899"
>>> x = ''.join(r'\x{}'.format(m.group()) for m in re.finditer(r'(\w)\1', string))
>>> x
'\\x11\\x22\\x33\\x44\\x55\\x66\\x77\\x88\\x99'
>>> print(x)
\x11\x22\x33\x44\x55\x66\x77\x88\x99
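For reference, the chunk-and-join idea from the answers above can be wrapped in a small helper. This is only a sketch: the name to_hex_escapes is made up for illustration, and it assumes the input has an even number of characters. If the goal is the actual bytes rather than the literal text \x11\x22..., Python 3's built-in bytes.fromhex may be closer to what is needed.

```python
def to_hex_escapes(hex_digits):
    """Return literal text like \\x11\\x22 for "1122" (hypothetical helper)."""
    if len(hex_digits) % 2 != 0:
        raise ValueError("expected an even number of characters")
    # Walk the string two characters at a time.
    pairs = (hex_digits[i:i + 2] for i in range(0, len(hex_digits), 2))
    # "\\x" is a backslash followed by the letter x, not an escape sequence.
    return "".join("\\x" + pair for pair in pairs)


s = "112233445566778899"
escaped = to_hex_escapes(s)
print(escaped)

# Alternative: decode the hex digits into real bytes (Python 3 only).
raw = bytes.fromhex(s)
print(raw)
```

The helper returns text that merely looks like escape sequences (four characters per pair), whereas bytes.fromhex returns one byte per pair.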
Is there a simple way in Python to replace multiple characters with another?
For instance, I would like to change:
name1_22:3-3(+):Pos_bos
to
name1_22_3-3_+__Pos_bos
So basically replace all "(",")",":" with "_".
I only know how to do it with:
str.replace(":","_")
str.replace(")","_")
str.replace("(","_")
You could use re.sub to replace multiple characters with one pattern:
import re
s = 'name1_22:3-3(+):Pos_bos '
re.sub(r'[():]', '_', s)
Output
'name1_22_3-3_+__Pos_bos '
Use a translation table. In Python 2, maketrans is defined in the string module.
>>> import string
>>> table = string.maketrans("():", "___")
In Python 3, it is a str class method.
>>> table = str.maketrans("():", "___")
In both, the table is passed as the argument to str.translate.
>>> 'name1_22:3-3(+):Pos_bos'.translate(table)
'name1_22_3-3_+__Pos_bos'
In Python 3, you can also pass a single dict mapping input characters to output characters to maketrans:
table = str.maketrans({"(": "_", ")": "_", ":": "_"})
Sticking to your current approach of using replace():
s = "name1_22:3-3(+):Pos_bos"
for e in ((":", "_"), ("(", "_"), (")", "_")):
    s = s.replace(*e)
print(s)
OUTPUT:
name1_22_3-3_+__Pos_bos
EDIT: (for readability)
s = "name1_22:3-3(+):Pos_bos"
replaceList = [(":", "_"), ("(", "_"), (")", "_")]
for elem in replaceList:
    print(*elem)  # prints ": _", then "( _", then ") _" (one pair per iteration)
    s = s.replace(*elem)
print(s)
OR
repList = [':','(',')'] # list of all the chars to replace
rChar = '_' # the char to replace with
for elem in repList:
    s = s.replace(elem, rChar)
print(s)
Another possibility is to use a list comprehension combined with the ternary conditional operator, as follows:
text = 'name1_22:3-3(+):Pos_bos '
out = ''.join(['_' if i in ':)(' else i for i in text])
print(out) #name1_22_3-3_+__Pos_bos
Since the comprehension produces a list of single-character strings, I use ''.join to turn it back into a str.
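The translation-table approach from the answers above generalizes to any set of characters. A minimal sketch, assuming Python 3; the helper name replace_chars and its default arguments are invented for illustration:

```python
def replace_chars(text, chars="():", replacement="_"):
    """Replace every character found in `chars` with `replacement`."""
    # dict.fromkeys maps each character in `chars` to the same replacement;
    # str.maketrans accepts such a dict of single-character keys.
    table = str.maketrans(dict.fromkeys(chars, replacement))
    return text.translate(table)


result = replace_chars("name1_22:3-3(+):Pos_bos")
print(result)
```

Because translate makes a single pass over the string, this avoids chaining one replace() call per character.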
I need to split text before the second occurrence of the '-' character. What I have now is producing inconsistent results. I've tried various combinations of rsplit and read through and tried other solutions on SO, with no results.
Sample file name to split: 'some-sample-filename-to-split' returned in data.filename. In this case, I would only like to have 'some-sample' returned.
fname, extname = os.path.splitext(data.filename)
file_label = fname.rsplit('/',1)[-1]
file_label2 = file_label.rsplit('-',maxsplit=3)
print(file_label2,'\n','---------------','\n')
You can do something like this:
>>> a = "some-sample-filename-to-split"
>>> "-".join(a.split("-", 2)[:2])
'some-sample'
a.split("-", 2) splits the string at most twice, i.e. at the first two occurrences of -.
a.split("-", 2)[:2] gives the first 2 elements of the resulting list. Then simply join those 2 elements.
OR
You could use a regular expression: ^([\w]+-[\w]+)
>>> import re
>>> reg = r'^([\w]+-[\w]+)'
>>> re.match(reg, a).group()
'some-sample'
EDIT: As discussed in the comments, here is what you need:
def hyphen_split(a):
    if a.count("-") == 1:
        return a.split("-")[0]
    return "-".join(a.split("-", 2)[:2])
>>> hyphen_split("some-sample-filename-to-split")
'some-sample'
>>> hyphen_split("some-sample")
'some'
A generic form to split a string into halves on the nth occurrence of the separator would be:
def split(strng, sep, pos):
    strng = strng.split(sep)
    return sep.join(strng[:pos]), sep.join(strng[pos:])
If pos is negative it will count the occurrences from the end of string.
>>> strng = 'some-sample-filename-to-split'
>>> split(strng, '-', 3)
('some-sample-filename', 'to-split')
>>> split(strng, '-', -4)
('some', 'sample-filename-to-split')
>>> split(strng, '-', 1000)
('some-sample-filename-to-split', '')
>>> split(strng, '-', -1000)
('', 'some-sample-filename-to-split')
You can use str.index():
def hyphen_split(s):
    pos = s.index('-')
    try:
        return s[:s.index('-', pos + 1)]
    except ValueError:
        return s[:pos]
test:
>>> hyphen_split("some-sample-filename-to-split")
'some-sample'
>>> hyphen_split("some-sample")
'some'
You could use regular expressions:
import re
file_label = re.search('(.*?-.*?)-', fname).group(1)
When working with a DataFrame where the split is needed for every value in a column,
applying a lambda function is a better fit than a regex.
df['column_name'].apply(lambda x: "-".join(x.split('-',2)[:2]))
Here's a somewhat cryptic implementation avoiding the use of join():
from functools import reduce  # reduce is no longer a builtin in Python 3

def split(string, sep, n):
    """Split `string` at the `n`th occurrence of `sep` (n counts from 0)."""
    pos = reduce(lambda x, _: string.index(sep, x + 1), range(n + 1), -1)
    return string[:pos], string[pos + len(sep):]
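The answers above differ in what they return when the separator occurs fewer times than requested. A sketch that makes that choice explicit, using str.find in a loop; the name before_nth is hypothetical, the separator is assumed to be non-empty, and returning the whole string when there are too few separators is an assumption rather than the behaviour discussed in the comments:

```python
def before_nth(text, sep, n):
    """Return the part of `text` before the n-th occurrence of `sep` (n >= 1).

    If `sep` occurs fewer than n times, the whole text is returned.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    pos = -len(sep)
    for _ in range(n):
        # Continue searching just past the previous match.
        pos = text.find(sep, pos + len(sep))
        if pos == -1:
            return text
    return text[:pos]


label = before_nth("some-sample-filename-to-split", "-", 2)
print(label)
```

If the 'some' result for a single hyphen is preferred, the early return can be changed accordingly.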
I have the following string where I need to extract only the first digits from it.
string = '50.2000\xc2\xb0 E'
How do I extract 50.2000 from string?
If the number can be followed by any kind of character, try using a regex:
>>> import re
>>> r = re.compile(r'(\d+\.\d+)')
>>> r.match('50.2000\xc2\xb0 E').group(1)
'50.2000'
mystring = '50.2000\xc2\xb0 E'
print(mystring.split("\xc2", 1)[0])
Output
50.2000
If you just wanted to split the first digits, just slice the string:
start = 10  # start at index 10 (the 11th character)
print(mystring[start:])
Demo:
>>> my_string = 'abcasdkljf23u109842398470ujw{}{\\][\\['
>>> start = 10
>>> print(my_string[start:])
23u109842398470ujw{}{\][\[
You can split the string at the first \:
>>> s = r'50.2000\xc2\xb0 E'
>>> s.split('\\', 1)
['50.2000', 'xc2\\xb0 E']
You could solve this using a regular expression:
In [1]: import re
In [2]: string = '50.2000\xc2\xb0 E'
In [3]: m = re.match(r'^([0-9]+\.?[0-9]*)', string)
In [4]: m.group(0)
Out[4]: '50.2000'
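The regex answers above raise an AttributeError when the string does not start with a number, because match returns None. A small sketch that guards against that; the helper name leading_number and the exact pattern are illustrative choices, not taken from the answers:

```python
import re

# One or more digits, optionally followed by a dot and more digits.
_LEADING_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def leading_number(text):
    """Return the number at the start of `text` as a string, or None."""
    match = _LEADING_NUMBER.match(text)  # match() only looks at the start
    return match.group(0) if match else None


value = leading_number('50.2000\xc2\xb0 E')
print(value)
```

The result is still a string; pass it to float() if a numeric value is needed.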
I need to add a space after every 3 characters of a Python string but don't have many clues on how to do it.
The string:
345674655
The output that I need:
345 674 655
Any clues on how to achieve this?
Best Regards,
You just need a way to iterate over your string in chunks of 3.
>>> a = '345674655'
>>> [a[i:i+3] for i in range(0, len(a), 3)]
['345', '674', '655']
Then ' '.join the result.
>>> ' '.join([a[i:i+3] for i in range(0, len(a), 3)])
'345 674 655'
Note that:
>>> [''.join(x) for x in zip(*[iter(a)]*3)]
['345', '674', '655']
also works for partitioning the string. This will work for arbitrary iterables (not just strings), but truncates the string where the length isn't divisible by 3. To recover the behavior of the original, you can use itertools.izip_longest (itertools.zip_longest in py3k):
>>> import itertools
>>> [''.join(x) for x in itertools.izip_longest(*[iter(a)]*3, fillvalue=' ')]
['345', '674', '655']
Of course, you pay a little in terms of easy reading for the improved generalization in these latter answers ...
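To see the truncation mentioned above, here is a short sketch on a string whose length is not divisible by 3. It assumes Python 3 (itertools.zip_longest) and uses an empty fillvalue, so the last chunk is simply shorter instead of being padded with spaces:

```python
from itertools import zip_longest

a = "3456746"  # 7 characters, not a multiple of 3

# zip stops at the shortest input, so the trailing "6" is dropped.
truncated = ["".join(x) for x in zip(*[iter(a)] * 3)]

# zip_longest fills the missing positions; an empty string keeps the tail.
padded = ["".join(x) for x in zip_longest(*[iter(a)] * 3, fillvalue="")]

print(truncated)
print(padded)
```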
Best function, based on @mgilson's answer:
def litering_by_three(a):
    # replace ' ' with your separator character, like ","
    return ' '.join([a[i:i + 3] for i in range(0, len(a), 3)])
output example:
>>> x="500000"
>>> print(litering_by_three(x))
500 000
>>>
or, with "," as the separator:
>>> def litering_by_three(a):
...     # replace ',' with your separator character
...     return ','.join([a[i:i + 3] for i in range(0, len(a), 3)])
...
>>> print(litering_by_three(x))
500,000
>>>
A one-line solution would be
" ".join(splitAt(x,3))
However, Python has no built-in splitAt() function, so define one yourself:
def splitAt(w,n):
    for i in range(0,len(w),n):
        yield w[i:i+n]
How about reversing the string so the groups of 3 start from the units digit, then reversing again? The goal is to obtain "12 345".
n="12345"
" ".join([n[::-1][i:i+3] for i in range(0, len(n), 3)])[::-1]
Join with a space the concatenation of the first, second and third characters of each group of 3:
' '.join(a+b+c for a,b,c in zip(x[::3], x[1::3], x[2::3]))
Make sure the string length is divisible by 3; otherwise zip drops the trailing characters.
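The chunking variants above (grouping from the left, or from the right as for thousands) can be combined into one helper. This is a sketch only; the function name group_chars and its parameters are invented for illustration:

```python
def group_chars(text, size=3, sep=" ", from_right=False):
    """Insert `sep` between groups of `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if from_right:
        # Put any leftover characters in a short first group.
        offset = len(text) % size
        head = [text[:offset]] if offset else []
        chunks = head + [text[i:i + size] for i in range(offset, len(text), size)]
    else:
        chunks = [text[i:i + size] for i in range(0, len(text), size)]
    return sep.join(chunks)


print(group_chars("345674655"))
print(group_chars("12345", from_right=True))
```

Grouping from the right avoids the double reversal while giving the same "12 345" result.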
I have possible strings of prices like:
20.99, 20, 20.12
Sometimes the string could be sent to me wrongly by the user to something like this:
20.99.0, 20.0.0
These should be converted back to :
20.99, 20
So basically removing anything from the 2nd . if there is one.
Just to be clear, they would be alone, one at a time, so just one price in one string
Any nice one liner ideas?
For a one-liner, you can use .split() and .join():
>>> '.'.join('20.99.0'.split('.')[:2])
'20.99'
>>> '.'.join('20.99.1231.23'.split('.')[:2])
'20.99'
>>> '.'.join('20.99'.split('.')[:2])
'20.99'
>>> '.'.join('20'.split('.')[:2])
'20'
You could do something like this:
>>> s = '20.99.0, 20.0.0'
>>> s.split(',')
['20.99.0', ' 20.0.0']
>>> list(map(lambda x: x[:x.find('.',x.find('.')+1)], s.split(',')))
['20.99', ' 20.0']
Look at the inner expression of find. I am finding the first '.' and incrementing by 1 and then find the next '.' and leaving everything from that in the string slice operation.
Edit: Note that this solution will not discard everything from the second decimal point, but discard only the second point and keep additional digits. If you want to discard all digits, you could use e.g. @Blender's solution
It only qualifies as a one-liner if two instructions per line with a ; count, but here's what I came up with:
>>> x = "20.99.1234"
>>> s = x.split("."); x = s[0] + "." + "".join(s[1:])
>>> x
'20.991234'
It should be a little faster than scanning through the string multiple times, though. For a performance cost, you can do this:
>>> x = x.split(".")[0] + "." + "".join(x.split(".")[1:])
For a whole list:
>>> def numify(x):
...     s = x.split(".")
...     return float(s[0] + "." + "".join(s[1:]))
...
>>> x = ["123.4.56", "12.34", "12345.6.7.8.9"]
>>> [ numify(f) for f in x ]
[123.456, 12.34, 12345.6789]
>>> s = '20.99, 20, 20.99.23'
>>> ','.join(x if x.count('.') in [1,0] else x[:x.rfind('.')] for x in s.split(','))
'20.99, 20, 20.99'
If you are looking for a regex-based solution and your intended behaviour is to discard everything after the second . (decimal point), then:
>>> import re
>>> st = "20.99.123"
>>> string_decimal = re.findall(r'\d+\.\d+',st)
>>> float(''.join(string_decimal))
20.99
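For completeness, the split-and-join one-liner from the first answer can be wrapped in a function that also tolerates surrounding whitespace. A sketch under that assumption; the name truncate_price is hypothetical:

```python
def truncate_price(text):
    """Drop the second '.' and everything after it, if present."""
    parts = text.strip().split(".")
    # Keep the integer part and at most one fractional part.
    return ".".join(parts[:2])


for raw in ("20.99", "20", "20.99.0", " 20.0.0"):
    print(repr(raw), "->", truncate_price(raw))
```

Note that "20.0.0" becomes "20.0" rather than "20"; the two are equal once converted with float().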