Python: finding "clear" numbers in a file [duplicate]

This question already has answers here:
How to match a whole word with a regular expression?
(4 answers)
Closed 2 years ago.
I want to find only the numbers in a text file, so I wrote this pattern:
r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"
but it also picks up numbers embedded in strings that contain other characters (e.g. my text file includes the string a278, and the pattern finds the number 278 there; I don't want that kind of match).
I want to find only "clear numbers", not numbers that are part of a string containing letters.

Consider using word boundaries:
https://www.regular-expressions.info/wordboundaries.html
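As a rough sketch of that idea: \b is enough for plain integers, but the pattern in the question may start with a sign or a dot, where \b does not help, so the version below uses lookarounds instead. The sample strings and the name NUMBER are made up for illustration, and the optional spaces after the sign in the original pattern are dropped here.

```python
import re

# Plain integers: \b on both sides rejects digits glued to letters.
print(re.findall(r"\b\d+\b", "a278 12"))  # ['12']

# Signed / decimal / exponent numbers: use lookarounds instead of \b.
NUMBER = re.compile(
    r"(?<![\w.])"   # not preceded by a letter, digit, underscore or dot
    r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"
    r"(?!\w)"       # not followed by a letter, digit or underscore
)

sample = "a278 12 -3.5 x9 1e3 7b"  # hypothetical input
found = NUMBER.findall(sample)
print(found)  # ['12', '-3.5', '1e3']
```

Whether a sign directly after a letter (as in a-3) should count as part of the number depends on your data, so adjust the lookbehind accordingly.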

You could solve this with a list comprehension, without any regex, as a simpler solution.
It would have helped if you had given us an idea of the kind of input data you're dealing with.
Either way, going by what you've stated, you want only standalone numbers to be detected, not numbers embedded in other strings.
case = "test123,#213 12"
output = [int(i) for i in case.split() if i.isdigit()]
output
Out[29]: [12]
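Note that isdigit() only accepts unsigned integers. If the file also holds signed or decimal values, one possible variation is to let float() decide, token by token. This is only a sketch: parse_number is a made-up helper and the input string is hypothetical. Be aware that float() also accepts tokens such as inf, nan and 1_000, which you may want to filter out.

```python
def parse_number(token):
    """Return the token as a float, or None if the whole token is not a number."""
    try:
        return float(token)
    except ValueError:
        return None

case = "test123,#213 12 -3.5 a278 1e3"  # hypothetical input
numbers = [n for n in map(parse_number, case.split()) if n is not None]
print(numbers)  # [12.0, -3.5, 1000.0]
```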

Related

Python list manipulation challenges [duplicate]

This question already has answers here:
Split a string only by first space in python [duplicate]
(4 answers)
Closed 1 year ago.
I have a list in python:
name = ['A.A.BCD', 'B.B.AAD', 'B.A.A.D']
I wish to discard everything before the second '.' and keep the rest. Below is what I have come up with.
[n.split('.')[2] for n in name]
Above is working for all except the last entry. Any way to do this:
Expected output: ['BCD', 'AAD', 'A.D']
Read the documentation for split() and you’ll find it has an optional parameter for the maximum number of splits - use this to get the last one to work:
[n.split('.',maxsplit=2)[2] for n in name]
See https://docs.python.org/3/library/stdtypes.html?highlight=split#str.split
A big disadvantage of doing this as a one-liner is that it fails with an IndexError if a string ever contains fewer than two '.' characters, so using a for loop can be more robust.
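A minimal sketch of the more robust loop, assuming you want a placeholder (None here) for entries with fewer than two dots. The helper name after_second_dot and the extra 'A.B' entry are made up for illustration.

```python
def after_second_dot(text, default=None):
    """Return everything after the second '.', or default if there is none."""
    parts = text.split('.', 2)  # at most two splits, so at most three parts
    if len(parts) < 3:
        return default
    return parts[2]

name = ['A.A.BCD', 'B.B.AAD', 'B.A.A.D', 'A.B']  # last entry added as an edge case
result = []
for n in name:
    result.append(after_second_dot(n))
print(result)  # ['BCD', 'AAD', 'A.D', None]
```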
name = ['A.A.BCD', 'B.B.AAD', 'B.A.A.D']
['.'.join(n.split('.')[2:]) for n in name]
result
['BCD', 'AAD', 'A.D']

Numeric pattern search in regular expression using Python [duplicate]

This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
Closed 2 years ago.
I have text as below-
my_text = "My telephone number is 408-555-1234"
on which I am searching for the pattern
re.findall(r'\d{3}-\d{1,}',my_text)
My intention was to search for a three-digit number followed by - and then one or more digits. Hence I was expecting the result to be ['408-555', '555-1234'].
However, the result I am getting is only ['408-555'].
Could anyone tell me what is wrong with my understanding here, and suggest a pattern that would serve my purpose?
You can use:
re.findall(r'(?=(\d{3}-\d+))', my_text)
output:
['408-555', '555-1234']
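Why this works, roughly: findall normally resumes searching after the end of the previous match, so once 408-555 is consumed, the digits 555 are no longer available to start a second match. A lookahead consumes nothing, so the search can restart at every position, and the capturing group inside it records what was seen. A small side-by-side sketch:

```python
import re

my_text = "My telephone number is 408-555-1234"

# Ordinary matching consumes "408-555"; what is left ("-1234") cannot match.
plain = re.findall(r'\d{3}-\d+', my_text)
print(plain)  # ['408-555']

# The lookahead is zero-width, so matches are allowed to overlap.
overlapping = re.findall(r'(?=(\d{3}-\d+))', my_text)
print(overlapping)  # ['408-555', '555-1234']
```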

Re.sub in python (remove last _) [duplicate]

This question already has answers here:
Remove Last instance of a character and rest of a string
(5 answers)
Closed 3 years ago.
I have a string such as:
string="lcl|NC_011588.1_cds_YP_002321424.1_1"
and I would like to keep only: "YP_002321424.1"
So I tried :
string=re.sub(".*_cds_","",string)
string=re.sub("_\d","",string)
But the first _ is removed too.
Does someone have an idea?
Note: the numbers can change (they are not fixed).
"Ordinary" split, as proposed in the other answer, is not enough,
because you also want to strip the trailing _1, so the part to capture
should end after a dot and digit.
Try the following pattern:
(?<=_cds_)\w+\.\d
For a working example see https://regex101.com/r/U2QsFH/1
Don't bother with regexes, a simple
string.split('_cds_')[1]
gets you most of the way (it still leaves the trailing _1 to strip)
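For comparison, here is a small sketch of both suggestions applied to the string from the question. The rsplit step is an addition, not part of either answer, and it assumes the unwanted suffix is always the last underscore-separated field.

```python
import re

string = "lcl|NC_011588.1_cds_YP_002321424.1_1"

# Regex answer: take what follows "_cds_" up to a dot plus one digit.
# (If the version could have several digits, \d+ would be the safer choice.)
match = re.search(r"(?<=_cds_)\w+\.\d", string)
via_regex = match.group(0) if match else None
print(via_regex)  # YP_002321424.1

# Split answer: split alone still carries the trailing "_1" ...
tail = string.split('_cds_')[1]
print(tail)  # YP_002321424.1_1

# ... so drop the last underscore-separated field as well.
via_split = tail.rsplit('_', 1)[0]
print(via_split)  # YP_002321424.1
```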

How to read '$1,234.56' as 1234.56 [duplicate]

This question already has answers here:
How do I convert a currency string to a floating point number in Python?
(10 answers)
Closed 7 years ago.
I've looked through the 'currency' threads, but they're all for going the other way. I'm reading financial data that comes in as $1,234.56 etc., with everything a string. I split the input line, and want to convert the value item to float for add/subtract (I'm not worried about roundoff error). Naturally, float() throws an error.
I could write a function to call as 'amount = float(num(value_string))', but wonder if there's a "dollar_string_to_float()" function in one of the 32,000 Python modules.
I think this question is slightly different from this question, but I'm not sure.
Anyway, the code from the aforementioned question just needs one function change, from Decimal to float, and the removal of the Decimal import.
As you requested, the code is in a dollar_string_to_float function:
>>> from re import sub
>>> def dollar_string_to_float(s):
...     # drop everything except digits and the decimal point
...     return float(sub(r'[^\d.]', '', s))
...
>>> money = '$1,234.56'
>>> print(dollar_string_to_float(money))
1234.56
Look into the regular expressions module. You can compile a pattern that matches your dollars/cents format and extract the floating-point number from it.
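One way that suggestion could look, as a sketch only. The pattern DOLLAR_RE and the function parse_dollars are made up for illustration, and the format is an assumption: an optional $, optional thousands separators, optionally exactly two decimal places, and no negative amounts.

```python
import re

# Assumed format: optional "$", digits with optional thousands commas,
# then optionally a dot and exactly two digits.
DOLLAR_RE = re.compile(r'^\$?(\d{1,3}(?:,\d{3})*|\d+)(\.\d{2})?$')

def parse_dollars(text):
    """Convert a string such as '$1,234.56' to a float; raise ValueError otherwise."""
    m = DOLLAR_RE.match(text.strip())
    if m is None:
        raise ValueError("not a dollar amount: %r" % text)
    whole = m.group(1).replace(',', '')
    cents = m.group(2) or ''
    return float(whole + cents)

print(parse_dollars('$1,234.56'))  # 1234.56
print(parse_dollars('1234.56'))    # 1234.56
```

Unlike stripping every non-digit character, this rejects malformed input such as '$12.3.4' instead of passing it on to float().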

Python - Remove first three chars' of string [duplicate]

This question already has answers here:
Are there limits to using string.lstrip() in python? [duplicate]
(3 answers)
Closed 8 years ago.
So I have a super long string composed of integers and I am trying to extract and remove the first three numbers in the string, and I have been using the lstrip method (the idea is kinda like pop) but sometimes it would remove more than three.
x="49008410..."
x.lstrip(x[0:3])
"8410..."
I was hoping it would just remove 490 and return 08410 but it's being stubborn -_- .
Also I am running Python 2.7 on Windows... And don't ask why the integers are strings. If that bothers you, just replace them with letters. Same thing! LOL
Instead of removing the first three characters, take everything from the fourth character onward. You can do that with slicing (the : operator).
x="49008410..."
x[3:]
>> "08410..."
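The reason lstrip removes "more than three" is that its argument is treated as a set of characters to strip, not as a prefix: it keeps removing leading characters as long as each one is in that set. A short sketch with a shortened sample string (the real one in the question is longer):

```python
x = "49008410"  # shortened sample; the original string is longer

# Same as x.lstrip("490"): strips any leading run of '4', '9' and '0',
# so the fourth character (another '0') goes too.
stripped = x.lstrip(x[0:3])

# Slicing drops exactly the first three characters.
sliced = x[3:]

# The removed part, if you want "pop"-like behaviour.
first_three = x[:3]

print(stripped)     # 8410
print(sliced)       # 08410
print(first_three)  # 490
```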
