Eliminate numbers from a sentence in Python - python

I am parsing an XML file and trying to build a dictionary for every context in it. The parsing works, and now I need to remove the stopwords, punctuation and numbers from the strings I get.
However, for some reason I can't get rid of the numbers. I have been debugging all night and hope someone can help me with it...
def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False
I have checked that the method 'is_number' works, but I don't know why numbers still get past the if statement:
if (words[headIndex + index] not in cachedStopWords) and ~isNumber:
Thanks in advance!

The problem is:
~isNumber
~ is the bitwise not operator. You want the not boolean operator:
>>> ~True
-2
>>> ~False
-1
>>> not True
False
>>> not False
True
The bitwise operator will lead to ~isNumber always being a truthy value (-1 or -2), and so your if statement is entered.
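As a minimal sketch of the fix, the filter below uses not together with a call to is_number. The names cached_stop_words and words, and their contents, are hypothetical stand-ins for the asker's data, which is not shown in the question.

```python
def is_number(s):
    """Return True if s can be parsed as a float."""
    try:
        float(s)
        return True
    except ValueError:
        return False


# Hypothetical stand-ins for the asker's stopword list and tokens.
cached_stop_words = {"the", "a", "of"}
words = ["the", "price", "of", "3", "apples", "is", "4.50"]

# `not` gives a real boolean negation, so numeric tokens are dropped.
kept = [w for w in words if w not in cached_stop_words and not is_number(w)]
print(kept)  # ['price', 'apples', 'is']
```

Note that float() also accepts strings such as "nan" and "inf", so this is_number treats those tokens as numbers too.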

Related

Why does the str.isdigit always output the opposite result

def is_valid_zip(zip_code):
    """Returns whether the input string is a valid (5 digit) zip code
    """
    if (len(zip_code) == 5) and (str.isnumeric == True):
        return True
    else:
        return False
First of all, it should be zip_code.isnumeric() == True, as that actually calls the isnumeric method on your string. Second, you should really be using str.isdigit().
str.isnumeric()
In Python, decimal characters (like 0, 1, 2...), digits (like subscripts and superscripts), and characters having the Unicode numeric value property (like fractions, Roman numerals, currency numerators) are all considered numeric characters. Therefore even the Japanese characters for 1, 2 and 3 would pass this check.
str.isdigit()
On the other hand isdigit() will only return True if all characters in a string are digits. If not, it returns False.
source: https://www.programiz.com/python-programming/methods/string/isdigit
A few points to discuss. Regarding your condition:
str.isnumeric == True
That thing on the left side is the function itself, not a call to the function giving a result, the latter would be some_string.isnumeric().
The chances of the function object being equal to true are somewhere between zero and a very, very small number :-)
It's also redundant to compare boolean values against boolean constants since the result of the comparison is just another boolean value. Where do you stop in that case? For example:
(((some_bool_value == True) == True) == True) != False ...
Another point, the code form if cond then return true else return false can be replaced with the much less verbose return cond.
And also keep in mind that isnumeric() allows other things than raw digits, like ¾. If you just want the digits, you're probably better off with another method. You may be tempted to instead use isdigit(), but even that allows other things than just what most would consider "normal" digits, such as allowing "90²10" as a postal code, presumably the much trendier part of Beverly Hills :-).
If you only wanted the raw digits 0-9 (which is probably the case with US postal codes like you seem to be targeting), neither isnumeric() nor isdigit() is really suitable.
An implementation of the function, taking all that into account, could be as follows:
def is_valid_zip(zip_code):
    if len(zip_code) != 5:
        return False
    return all([x in "1234567890" for x in zip_code])
It should be zip_code.isnumeric(), not str.isnumeric.
Also, why not use a regex:
import re
RE_ZIP_CODE = re.compile(r'^[0-9]{5}$')  # or r'^\d{5}$' for all digit characters
def is_valid_zip(zip_code):
    return RE_ZIP_CODE.search(zip_code) is not None
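One caveat with the pattern above: $ also matches just before a trailing newline, so "12345\n" would be accepted. A possible variant, assuming only the ASCII digits 0-9 should count, uses fullmatch together with the re.ASCII flag. The name ZIP_RE is just illustrative.

```python
import re

# re.ASCII restricts \d to 0-9; fullmatch requires the whole string to match,
# so a trailing newline is rejected.
ZIP_RE = re.compile(r"\d{5}", re.ASCII)


def is_valid_zip(zip_code):
    """Return True if zip_code is exactly five ASCII digits."""
    return ZIP_RE.fullmatch(zip_code) is not None


print(is_valid_zip("98909"))    # True
print(is_valid_zip("12345\n"))  # False
print(is_valid_zip("90²10"))    # False
```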
This should work
def is_valid_zip(zip_code):
    """Returns whether the input string is a valid (5 digit) zip code
    """
    if len(zip_code) == 5 and zip_code.isnumeric():
        return True
    else:
        return False
print(is_valid_zip("98909"))

How to convert string to Boolean in Python

I came across this question in an interview. I had to convert a string to a Boolean and return the Boolean value.
For example, s="3>1>5" => False
I tried using
ans= bool(s)
But I always get True, even though the expression in the example evaluates to False when it is not passed as a string.
You must be looking for eval(string)
>>> eval("3>1>5")
False
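Keep in mind that eval runs arbitrary code, so it is risky on input you do not control. If the strings are only chained comparisons of numbers, as in the example, a more restrictive sketch can be built on the ast module. The function name eval_comparison and the set of supported operators are my own assumptions, not part of the question; negative literals and variable names are deliberately rejected.

```python
import ast
import operator

# Comparison operators this sketch is willing to evaluate.
_OPS = {
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
}


def eval_comparison(text):
    """Evaluate a chained comparison of plain numbers, e.g. '3>1>5'."""
    node = ast.parse(text, mode="eval").body
    if not isinstance(node, ast.Compare):
        raise ValueError("expected a comparison expression")
    values = []
    for operand in [node.left] + node.comparators:
        # Only bare int/float literals are accepted as operands.
        if not (isinstance(operand, ast.Constant)
                and type(operand.value) in (int, float)):
            raise ValueError("only plain numbers are allowed")
        values.append(operand.value)
    for op in node.ops:
        if type(op) not in _OPS:
            raise ValueError("unsupported operator")
    # a > b > c means (a > b) and (b > c)
    return all(_OPS[type(op)](a, b)
               for op, a, b in zip(node.ops, values, values[1:]))


print(eval_comparison("3>1>5"))  # False
print(eval_comparison("1<2<3"))  # True
```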

If not... followed by function

I am stuck with this question. I am not too concerned about what each function does, but more importantly how does the IF statement work with functions. From my understanding, the IF.... or statements usually work with a condition, but for this scenario it only involves two functions without any conditions?
import shutil
import psutil  # third-party package
def disk_check_usage(disk):
    du = shutil.disk_usage(disk)
    free = du.free/du.total * 100
    return free > 20
def check_cpu_usage():
    usage = psutil.cpu_percent(1)
    return usage < 75
if not disk_check_usage("/") or not check_cpu_usage():
    print("ERROR!")
else:
    print("Everything is OK")
I want it to give an 'Error!' message when both conditions (free > 20 and usage < 75) are not True/Satisfied.
Edit: When I run the code, 'free' = 17, which gives 'False', and usage < 75, which gives 'True'. So my IF statement would mean 'If not False or not True:'. What does that mean, and how does the system decide whether to run the 'if' or the 'else' branch?
Any help will be appreciated!
There are some operations you can do with booleans.
In particular, (not a) or (not b) is equivalent to not (a and b): it is True when at least one of the two is False.
You just have to invert your two statements: if everything is OK, print OK, else print the error.
if disk_check_usage("/") and check_cpu_usage():
    print("Everything is OK")
else:
    print("Error")
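To check the equivalence for yourself, the short sketch below walks through all four combinations of two booleans. The variable names are only illustrative. It also shows how "not a or not b" (at least one check fails) differs from "not a and not b" (both checks fail).

```python
from itertools import product

rows = []
for a, b in product([True, False], repeat=2):
    either_fails = not a or not b   # same as: not (a and b)
    both_fail = not a and not b     # same as: not (a or b)
    assert either_fails == (not (a and b))
    assert both_fail == (not (a or b))
    rows.append((a, b, either_fails, both_fail))

for row in rows:
    print(row)
# (True, True, False, False)
# (True, False, True, False)
# (False, True, True, False)
# (False, False, True, True)
```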
From my understanding, the IF.... or statements usually work with a
condition, but for this scenario it only involves two functions
without any conditions?
There's always a condition. The code you posted is no exception.
What you call a "condition" is really just a boolean expression in a specific context. If the entire expression evaluates to True, you enter the body of the if-statement. Otherwise, if it evaluates to False, you don't.
Example
The following code iterates over the characters in a string, and only prints characters which are both alphabetic and uppercase:
string = "Hell2O]WoR3(Ld"
for char in string:
    if char.isalpha() and char.isupper():
        print(char)
Output:
H
O
W
R
L
>>>
str.isalpha returns a boolean - True if the string (in this case a string consisting of a single character) contains only alphabetic characters, False otherwise.
str.isupper returns a boolean - True if the string (again, just a single character in this case) contains only uppercase characters, False otherwise.
I picked these two string methods to simulate the two functions you have in your code, since they also return booleans.
Let's take the first character, "H":
>>> char = "H"
>>> char.isalpha()
True
>>> char.isupper()
True
>>> char.isalpha() and char.isupper()
True
>>>
You see, the entire boolean expression char.isalpha() and char.isupper() evaluates to True. In order for the entire expression to be evaluated, both functions need to be called - you can think of their return values effectively "replacing" their respective function calls. When char is "H", after calling the functions, the expression really looks like this:
True and True
Which evaluates to True, so we enter the body of the if-statement.
In your case, you used or instead of and. What I said about boolean expressions collapsing down into a single True or False value still applies, the only difference would be in short-circuiting, which isn't that relevant to your question I think.
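For completeness, here is a small sketch of what short-circuiting means with or. The functions disk_ok and cpu_ok are hypothetical stand-ins for the two checks in the question; they record each call so you can see which ones actually ran.

```python
calls = []


def disk_ok():
    calls.append("disk")
    return False  # pretend the disk check failed


def cpu_ok():
    calls.append("cpu")
    return True


# `not disk_ok()` is already True, so `or` never evaluates the right-hand side.
if not disk_ok() or not cpu_ok():
    result = "ERROR!"
else:
    result = "Everything is OK"

print(result, calls)  # ERROR! ['disk']
```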
Example
Here's something closer to what you have. Imagine a scenario where you want to monitor the temperature and radiation of something. If either the temperature OR the radiation levels are not nominal, we trigger an alarm.
def is_temperature_nominal(temp):
    return 70 <= temp <= 100
def is_radiation_nominal(rad):
    return rad < 200
if not is_temperature_nominal(250) or not is_radiation_nominal(50):
    print("Alarm triggered!")
else:
    print("Everything is good.")
Output:
Alarm triggered!
>>>
I picked the nominal ranges arbitrarily, and I picked the arguments 250 and 50 arbitrarily as well. With these two hardcoded values, the boolean expression would look like this after calling the two functions:
not False or not True
Which is another way of saying:
True or False
With an or, only one of the operands has to be True in order for the entire expression to evaluate to True - therefore the entire expression evaluates to True, and we enter the body of the if-statement, triggering the alarm (because only one of the values needs to be not nominal in order for the alarm to trigger. If both values are not nominal it would also trigger the alarm. The only way we enter the body of the else is if both values are nominal).
From my understanding, the IF.... or statements usually work with a
condition, but for this scenario it only involves two functions
without any conditions
if (as well as while) needs an expression that evaluates to a logical (type bool) value (either True or False), or can be typecast to a logical value (for example None is typecast to False).
In your case, the conditions (namely, comparisons) are done inside your functions (see the return statements), so that the functions already return logical values.
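As a rough illustration of that conversion to a logical value, the snippet below shows what bool() gives for a few arbitrarily chosen values. This is the same truth test that if and while apply to their expression.

```python
samples = [None, 0, "", [], "False", -1, [0], "3>1>5"]

# bool(x) applies the same truth test that `if x:` uses.
truthiness = [(repr(value), bool(value)) for value in samples]

for text, as_bool in truthiness:
    print(text, "->", as_bool)
# None -> False
# 0 -> False
# '' -> False
# [] -> False
# 'False' -> True
# -1 -> True
# [0] -> True
# '3>1>5' -> True
```

Any non-empty string is truthy, whatever it contains.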
When a function is called, it is executed and returns a value, which can be used in different ways: directly, or stored in a variable. Your code is exactly equivalent to the following, which just stores the results in variables instead of calling the functions inline in the condition:
disk_check = disk_check_usage("/")  # True or False
cpu_check = check_cpu_usage()  # True or False
if not disk_check or not cpu_check:  # Boolean combination
    print("ERROR!")
else:
    print("Everything is OK")
You have to use the 'and' operator instead of the 'or' you used.
import shutil
import psutil  # third-party package
def disk_check_usage(disk):
    du = shutil.disk_usage(disk)
    free = du.free/du.total * 100
    return free > 20
def check_cpu_usage():
    usage = psutil.cpu_percent(1)
    return usage < 75
if not disk_check_usage("/") and not check_cpu_usage():
    print("ERROR!")
else:
    print("Everything is OK")

How to convert a numerical reference in a string to an integer in Python?

I have a series of text files that include numerical references. I have word tokenized them and I would like to be able to identify where tokens are numbers and convert them to integer format.
mysent = ['i','am','10','today']
I am unsure how to proceed given the immutability of strings.
Please try
[item if not item.isdigit() else int(item) for item in mysent]
If you try to convert a string that is not a representation of an int to an int, you get a ValueError.
You can try to convert all the elements to int, and catch ValueErrors:
mysent = ['i','am','10','today']
for i in mysent:
    try:
        print(int(i))
    except ValueError:
        continue
OUTPUT:
10
If you want to directly modify the int inside mysent, you can use:
mysent = ['i','am','10','today']
for n, i in enumerate(mysent):
    try:
        mysent[n] = int(i)
    except ValueError:
        continue
print(mysent)
OUTPUT:
['i', 'am', 10, 'today']
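If you would rather build a new list than modify mysent in place, one possible sketch wraps the try/except in a small helper. The name to_int_if_possible is hypothetical and chosen just for illustration.

```python
def to_int_if_possible(token):
    """Return int(token) when it parses as an integer, else the token unchanged."""
    try:
        return int(token)
    except ValueError:
        return token


mysent = ['i', 'am', '10', 'today']
converted = [to_int_if_possible(t) for t in mysent]
print(converted)  # ['i', 'am', 10, 'today']
```

The original list is left untouched, and tokens such as "-3" are converted as well, since int() accepts a leading sign.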
.isdigit() IS NOT THE SAME AS try/except!!!!
It has been pointed out in the comments that .isdigit() may be more elegant and obvious. As stated in the Zen of Python, "There should be one-- and preferably only one --obvious way to do it."
From the official documentation, .isdigit() Return true if all characters in the string are digits and there is at least one character, false otherwise.
Meanwhile, the try/except block catches the ValueError raised by applying int to a non-numerical string.
They may look similar, but their behavior is really different:
def is_int(n):
    try:
        int(n)
        return True
    except ValueError:
        return False
EXAMPLES:
Positive integer:
n = "42"
print(is_int(n))    # --> True
print(n.isdigit())  # --> True
Positive float:
n = "3.14"
print(is_int(n))    # --> False
print(n.isdigit())  # --> False
Negative integer:
n = "-10"
print(is_int(n))    # --> True
print(n.isdigit())  # --> False
Unicode superscript digit (\u00B2 is the character ²):
n = "\u00B23455"
print(is_int(n))    # --> False
print(n.isdigit())  # --> True
These are only some examples, and you can probably already tell which one better suits your needs.
The discussion about which one should be used is exhausting and never-ending; you can have a look at this couple of interesting SO Q&As:
try/except comparison
input validation analysis

python - str containing whitespace characters and int(str)

I'm trying to write a really simple function. It should return True if given object is a digit (0-9), False otherwise. Here are examples of input and desired output:
is_digit("") => False
is_digit("7") => True
is_digit(" ") => False
is_digit("a") => False
is_digit("a5") => False
My code works for the above examples.
def is_digit(n):
    try:
        return int(n) in range(0, 10)
    except:
        return False
Trouble is, the function returns True for n = "1\n" when it should return False. So, a string like "1" should be converted to integer and is a digit, but a string like "1\n" should not, yet I don't know how to get around that. How can I account for string literals?
P.S. If my title is lame, advice on renaming it is welcome.
You don't need to define a custom function for this. There is a built-in string method for it, namely isdigit().
You can use it as "a5".isdigit() or "1\n".isdigit(). In both cases it will return False.
First you have to convert your literals to strings; then you can apply isdigit.
You cannot apply isdigit directly to a number. It will throw an error:
AttributeError: 'int' object has no attribute 'isdigit'
You have to convert your number to a string.
eg:
In [3]: str(0).isdigit()
Out[3]: True
or
In [1]: "0".isdigit()
Out[1]: True
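Two things are worth adding. The original function accepts "1\n" because int() strips surrounding whitespace before parsing. And isdigit() on its own returns True for multi-character strings such as "55" and for characters such as "²". A stricter sketch, assuming the input must be a one-character string in the range 0-9 (so the integer 7 is rejected, which may or may not be what you want), could look like this. The constant ASCII_DIGITS is just an illustrative name.

```python
ASCII_DIGITS = "0123456789"


def is_digit(n):
    """Return True only for a one-character string containing 0-9."""
    # The length check matters: the empty string is "in" every string.
    return isinstance(n, str) and len(n) == 1 and n in ASCII_DIGITS


print(is_digit("7"))    # True
print(is_digit("1\n"))  # False
print(is_digit(""))     # False
```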
