How can I convert a string into an integer, removing every non-digit character along the way?
Example:
S = "--r10-". I want to get this: S = 10
This does not work:
S = "--10-"
int(S)
You can use filter(str.isdigit, s) to keep only those characters of s that are digits:
>>> s = "--10-"
>>> int(filter(str.isdigit, s))
10
Note that this might lead to unexpected results for strings that contain multiple numbers
>>> int(filter(str.isdigit, "12 abc 34"))
1234
or negative numbers
>>> int(filter(str.isdigit, "-10"))
10
Edit: To make this work for unicode objects instead of str objects, use
int(filter(unicode.isdigit, u"--10-"))
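The snippets above rely on Python 2, where filter applied to a str returns a str. In Python 3 filter returns an iterator, so int(filter(...)) raises a TypeError. A minimal sketch of the same idea for Python 3 follows; the helper name digits_to_int is made up here for illustration:

```python
def digits_to_int(s):
    """Keep only the digit characters of s and convert them to an int."""
    # In Python 3, filter() yields the matching characters lazily,
    # so they are joined back into a string before calling int().
    digits = ''.join(filter(str.isdigit, s))
    if not digits:
        # Assumption: a string without any digits is treated as an error.
        raise ValueError("no digits found in %r" % (s,))
    return int(digits)


print(digits_to_int("--r10-"))     # 10
print(digits_to_int("12 abc 34"))  # 1234
```

The same caveats apply as before: separate numbers are glued together and a leading minus sign is dropped.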
Remove all non-digits first, like this:
int(''.join(c for c in "abc123def456" if c.isdigit()))
You could just strip off - and r:
int("--r10-".strip('-r'))
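Use a regex replace to substitute the empty string "" for the unwanted characters, then cast the result to int. Note that \W (non-word characters) is not enough here, because letters such as r count as word characters; \D (non-digits) removes them as well.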
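As a rough sketch of that suggestion, assuming the goal is to drop everything that is not a digit (the function name strip_non_digits is only illustrative):

```python
import re


def strip_non_digits(s):
    """Delete every non-digit character from s and convert the rest to int."""
    # \D matches any character that is not a digit.
    return int(re.sub(r'\D', '', s))


print(strip_non_digits("--r10-"))        # 10
print(strip_non_digits("abc123def456"))  # 123456
```

If the string contains no digits at all, int('') raises a ValueError, so callers may want to check for that case.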
I prefer Sven Marnach's answer using filter and isdigit, but if you want you can use regular expressions:
>>> import re
>>> pat = re.compile(r'\d+') # '\d' means digit, '+' means one or more
>>> int(pat.search('--r10-').group(0))
10
If there are multiple integers in the string, it pulls the first one:
>>> int(pat.search('12 abc 34').group(0))
12
If you need to deal with negative numbers use this regex:
>>> pat = re.compile(r'\-{0,1}\d+') # '\-{0,1}' means zero or one dashes
>>> int(pat.search('negative: -8').group(0))
-8
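One thing the examples above do not cover is a string without any number: pat.search then returns None and calling .group(0) on it fails with an AttributeError. A small wrapper could guard against that; this is only a sketch, and first_int and its default parameter are names invented for this example:

```python
import re

# Optional minus sign followed by one or more digits.
_INT_PATTERN = re.compile(r'-?\d+')


def first_int(text, default=None):
    """Return the first integer found in text, or default if there is none."""
    match = _INT_PATTERN.search(text)
    if match is None:
        return default
    return int(match.group(0))


print(first_int('--r10-'))        # 10
print(first_int('negative: -8'))  # -8
print(first_int('no digits'))     # None
```

Be aware that with a sign-aware pattern like this, a string such as '--10-' yields -10, because the dash directly in front of the digits is read as a minus sign.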
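This is simple and does not require you to import any packages.
def _atoi(self, string):
    i = 0
    for c in string:
        if '0' <= c <= '9':  # skip anything that is not a digit
            # shift the running value one decimal place and add the new digit
            i = i * 10 + (ord(c) - ord('0'))
    return i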
I have a lot of strings like "01568460144" and "0005855048560".
I want to remove all zeros from the beginning. I tried this, which removes only one zero from the beginning, but I also have other strings with multiple zeros at the beginning.
re.sub(r'0','',number)
So my expected result for a string like "0005855048560" is "5855048560".
If the goal is to remove all leading zeroes from a string, skip the regex, and just call .lstrip('0') on the string. The *strip family of functions are a little weird when the argument isn't a single character, but for the purposes of stripping leading/trailing copies of a single character, they're perfect:
>>> s = '000123'
>>> s = s.lstrip('0')
>>> s
'123'
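One edge case is worth keeping in mind: lstrip('0') turns a string made up only of zeros into the empty string. If a lone '0' should survive instead (an assumption about the desired behaviour, not something stated in the question), a small wrapper might look like this; strip_leading_zeros is a name chosen here for illustration:

```python
def strip_leading_zeros(number):
    """Remove leading zeros, keeping a single '0' for all-zero input."""
    if not number:
        # Leave the empty string untouched.
        return number
    stripped = number.lstrip('0')
    # If everything was stripped, the input consisted only of zeros.
    return stripped or '0'


print(strip_leading_zeros('0005855048560'))  # 5855048560
print(strip_leading_zeros('00000'))          # 0
```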
>>> v = '0001111110'
>>>
>>> str(int(v))
'1111110'
>>>
>>> str(int('0005855048560'))
'5855048560'
If the string should contain only digits, you can either check it with isnumeric() first or use re.sub with a pattern that matches only digits:
import re
strings = [
    "01568460144",
    "0005855048560",
    "00test",
    "00000",
    "0"
]
for s1 in strings:
    if s1.isnumeric():
        print(f"'{s1.lstrip('0')}'")
    else:
        print(f"'{s1}'")
print("----------------------------")
for s2 in strings:
    res = re.sub(r"^0+(\d*)$", r"\1", s2)
    print(f"'{res}'")
Output
'1568460144'
'5855048560'
'00test'
''
''
----------------------------
'1568460144'
'5855048560'
'00test'
''
''
I am extracting data with a regular expression. I am using Python and have implemented this:
import re
exp = r"\bTimestamp\s+([0-9]+)\s+ID=(\w{32})0*\s+Dest_ID=(\w{32})0*\sASN_Received\s+(?!0000)[0-9A-F]{4}+"
rx = re.compile(exp)
m=rx.match("Timestamp 1549035123 ID=02141592cc0000000300000000000000 Dest_ID=00000000000000000000000000000000 Nbr_Received = ec30000000")
m.groups()
print(m.groups())
But it does not work correctly. I expect this result:
('1549033267', '02141592cc0000000500000000000000','00000000000000000000000000000000','ec30000000')
Then I want to convert the hexadecimal value to decimal by using this function:
def Convert_Decimal(nbr_hex):
    nbr_dec = nbr_hex[5] + nbr_hex[2:4] + nbr_hex[0:2]
    reversed = int(nbr_dec, 16)
    print(reversed)
As the final result I want to have:
('1549033267', '02141592cc0000000500000000000000','00000000000000000000000000000000','12524')
Hexadecimal values use the digits 0-9 and the letters A through to F (upper or lowercase), only, and in your case are of a fixed length, so [0-9a-fA-F]{32} suffices to match those values. You don't need to match trailing zeros, when you have a fixed-length value.
You really don't want to use \w here, you wouldn't want to match underscores, the rest of the English alphabet, or any other letter-like symbol in the Unicode standard (there are thousands).
Next, you are looking for ASN_Received, but your input string uses the text Nbr_Received = with whitespace around the = character. Account for that:
exp = (
    r'\bTimestamp\s+([0-9]+)\s+'
    r'ID=([0-9a-fA-F]{32})\s+'
    r'Dest_ID=([0-9a-fA-F]{32})\s+'
    r'Nbr_Received\s*=\s*([0-9a-fA-F]{4,})'
)
I broke the expression across multiple lines to be easier to follow. Note that I used {4,} for the last hexadecimal value, matching 4 or more digits. You can't use + and {n,m} patterns together, choose one or the other.
You then get:
>>> import re
>>> exp = (
... r'\bTimestamp\s+([0-9]+)\s+'
... r'ID=([0-9a-fA-F]{32})\s+'
... r'Dest_ID=([0-9a-fA-F]{32})\s+'
... r'Nbr_Received\s*=\s*([0-9a-fA-F]{4,})'
... )
>>> rx = re.compile(exp)
>>> m = rx.match("Timestamp 1549035123 ID=02141592cc0000000300000000000000 Dest_ID=00000000000000000000000000000000 Nbr_Received = ec30000000")
>>> print(m.groups())
('1549035123', '02141592cc0000000300000000000000', '00000000000000000000000000000000', 'ec30000000')
Also see this online demo at regex101, which explains each part of the pattern on the right-hand side.
I'd convert the last hexadecimal number via bytes.fromhex() and int.from_bytes() to an integer:
>>> m.group(4)
'ec30000000'
>>> bytes.fromhex(m.group(4))
b'\xec0\x00\x00\x00'
>>> int.from_bytes(bytes.fromhex(m.group(4)), 'little')
12524
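Putting the two steps together, the tuple asked for in the question could be produced roughly like this. It is a sketch under a few assumptions: parse_record is a made-up name, a line that does not match yields None, and the last value always has an even number of hex digits (bytes.fromhex raises ValueError otherwise):

```python
import re

# The pattern from above, compiled once at import time.
RECORD = re.compile(
    r'\bTimestamp\s+([0-9]+)\s+'
    r'ID=([0-9a-fA-F]{32})\s+'
    r'Dest_ID=([0-9a-fA-F]{32})\s+'
    r'Nbr_Received\s*=\s*([0-9a-fA-F]{4,})'
)


def parse_record(line):
    """Return (timestamp, id, dest_id, decimal count) or None if no match."""
    m = RECORD.match(line)
    if m is None:
        return None
    timestamp, id_, dest_id, nbr_hex = m.groups()
    # Interpret the hex digits as a little-endian byte sequence.
    nbr = int.from_bytes(bytes.fromhex(nbr_hex), 'little')
    # The question shows the converted value as a string in the tuple.
    return (timestamp, id_, dest_id, str(nbr))


line = ("Timestamp 1549035123 ID=02141592cc0000000300000000000000 "
        "Dest_ID=00000000000000000000000000000000 Nbr_Received = ec30000000")
print(parse_record(line))
```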
Try this:
>>> import re
>>> string = "Timestamp 1549035123 ID=02141592cc0000000300000000000000 Dest_ID=00000000000000000000000000000000 Nbr_Received = ec30000000"
>>> pat = r'Timestamp\s+(\d+)\s+ID=(\w+)\s+Dest_ID=(\d+)\s+Nbr_Received\s+?=\s+?(\w+)'
>>> re.findall(pat, string)
[('1549035123', '02141592cc0000000300000000000000', '00000000000000000000000000000000', 'ec30000000')]
I am trying to check that an input string contains a version number in the correct format:
vX.X.X
where X can be any number of numerical digits, e.g:
v1.32.12 or v0.2.2 or v1232.321.23
I have the following regular expression:
v([\d.][\d.])([\d])
This does not work.
Where is my error?
EDIT: I also require the string to have a maximum length of 20 characters. Is there a way to do this through regex, or is it best to just use regular Python len()?
Note that [\d.] matches any single character that is either a digit or a dot.
v(\d+)\.(\d+)\.\d+
Use \d+ to match one or more digit characters.
Example:
>>> import re
>>> s = ['v1.32.12', 'v0.2.2' , 'v1232.321.23', 'v1.2.434312543898765']
>>> [i for i in s if re.match(r'^(?!.{20})v(\d+)\.(\d+)\.\d+$', i)]
['v1.32.12', 'v0.2.2', 'v1232.321.23']
>>>
The negative lookahead (?!.{20}) at the start checks the string length before matching. If the string is at least 20 characters long, the match fails immediately without trying the rest of the pattern on that string.
@Avinash Raj: your answer is perfect except for one correction.
It would allow only 19 characters. A slight correction:
>>> import re
>>> s = ['v1.32.12', 'v0.2.2' , 'v1232.321.23', 'v1.2.434312543898765']
>>> [i for i in s if re.match(r'^(?!.{21})v(\d+)\.(\d+)\.\d+$', i)]
['v1.32.12', 'v0.2.2', 'v1232.321.23', 'v1.2.434312543898765']
>>>
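To the question in the EDIT: a plain len() check next to the regex also works and is arguably easier to read than a lookahead. A minimal sketch, where is_valid_version and max_length are names made up for this example:

```python
import re

# 'v' followed by three dot-separated groups of digits.
VERSION = re.compile(r'v\d+\.\d+\.\d+')


def is_valid_version(text, max_length=20):
    """True if text looks like vX.X.X and is at most max_length characters."""
    # fullmatch() requires the whole string to match, so no ^ or $ is needed.
    return len(text) <= max_length and VERSION.fullmatch(text) is not None


print(is_valid_version('v1.32.12'))               # True
print(is_valid_version('v1.2.434312543898765'))   # True, exactly 20 characters
print(is_valid_version('v1.2.4343125438987650'))  # False, 21 characters
```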
I have a spreadsheet with text values like A067, A002, A104. What is the most efficient way to convert them to integers? Right now I am doing the following:
str = 'A067'
str = str.replace('A','')
n = int(str)
print n
Depending on your data, the following might be suitable:
import string
print int('A067'.strip(string.ascii_letters))
Python's strip() method takes a string of characters to be removed from the start and end of a string. Passing string.ascii_letters removes any leading and trailing letters from the string.
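Since strip() only touches the two ends of the string, a letter in the middle is left in place and int() then fails. A short Python 3 illustration, with strip_letters_to_int as a hypothetical helper name:

```python
import string


def strip_letters_to_int(value):
    """Strip ASCII letters from both ends of value and convert the rest."""
    return int(value.strip(string.ascii_letters))


print(strip_letters_to_int('A067'))   # 67
print(strip_letters_to_int('A067B'))  # 67

try:
    strip_letters_to_int('A0B67')     # the inner 'B' is not at either end
except ValueError as exc:
    print('not convertible:', exc)
```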
If the only non-number part of the input will be the first letter, the fastest way will probably be to slice the string:
s = 'A067'
n = int(s[1:])
print n
If you believe that you will find more than one number per string though, the above regex answers will most likely be easier to work with.
You could use regular expressions to find numbers.
import re
s = 'A067'
s = re.findall(r'\d+', s) # This will find all numbers in the string
n = int(s[0]) # Take the first number. Note: this raises IndexError if no number was found; a simple check avoids that
print n
Here's some example output of findall with different strings
>>> a = re.findall(r'\d+', 'A067')
>>> a
['067']
>>> a = re.findall(r'\d+', 'A067 B67')
>>> a
['067', '67']
You can use the search method of a compiled regex from the re module.
import re
line = 'A067'  # example input
regex = re.compile(r"(?P<numbers>\d+)")
matcher = regex.search(line)
if matcher:
    numbers = int(matcher.groupdict()["numbers"])  # the digits captured by the named group
import string
str = 'A067'
print (int(str.strip(string.ascii_letters)))
I am trying to split a string such as: add(ten)sub(one) into add(ten) sub(one).
I can't figure out how to match the closing parenthesis. I have tried re.sub(r'\\)', '\\) ') and every variation of escaping the parentheses I can think of. It is hard to tell in this font, but I am trying to add a space between these commands so I can split the string into a list later.
There's no need to escape ) in the replacement string. ) has a special meaning only in the regex pattern, so it needs to be escaped there in order to match it literally, but in a normal string it can be used as is.
>>> strs = "add(ten)sub(one)"
>>> re.sub(r'\)(?=\S)',r') ', strs)
'add(ten) sub(one)'
As @StevenRumbalski pointed out in the comments, the above operation can be done more simply using str.replace and str.strip:
>>> strs.replace(')',') ').strip()
'add(ten) sub(one)'
d = ')'
my_str = 'add(ten)sub(one)'
result = [t+d for t in my_str.split(d) if len(t) > 0]
# result is now ['add(ten)', 'sub(one)']
Create a list of all substrings
import re
a = 'add(ten)sub(one)'
print [ b for b in re.findall('(.+?\(.+?\))', a) ]
Output:
['add(ten)', 'sub(one)']
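For Python 3, and assuming the commands never contain nested parentheses, a slightly stricter pattern can build the list in one step. This is just a sketch, and split_calls is an illustrative name:

```python
import re


def split_calls(text):
    """Return every name(...) chunk found in text as a list."""
    # One or more word characters, '(', anything except ')', then ')'.
    return re.findall(r'\w+\([^)]*\)', text)


print(split_calls('add(ten)sub(one)'))  # ['add(ten)', 'sub(one)']
```

Because the pattern starts at a word character, it gives the same list whether or not the commands are already separated by spaces.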