How to strip letters out of a string and compare values? - python

I have just learned Python for this project I am working on and I am having trouble comparing two values - I am using the Python xlwt and xlrd libraries and pulling values of cells from the documents. The problem is some of the values are in the format 'NP_000000000', 'IPI00000000.0', and '000000000' so I need to check which format the value is in and then strip the characters and decimal points off if necessary before comparing them.
I have tried using S1[:3] to get the value without alphabet characters, but I get a 'float is not subscriptable' error
Then I tried re.sub(r'[^\d.]+', '', S1), but I get a TypeError: expected string or buffer.
I assumed the value returned by sheet.cell(x, y).value would be a string, since it is alphanumeric, but it seems it is being returned as a float.
What is the best way to format these values and then compare them?

You are trying to get the numbers from the strings in the format shown? Like to get 2344 from NP_2344? If yes then use this
float(str(S1)[3:])
to get what you want. You can change float to int.

It sounds like the API you're using is returning different types depending on the content of the cells. You have two options.
You can convert everything to a string and then do what you're currently doing:
s = str(S1)
...
You can check the types of the input and act appropriately:
if isinstance(S1, basestring):  # basestring is Python 2; use str on Python 3
    # this is a string, strip off the prefix
    pass
elif isinstance(S1, float):
    # this is a float, just use it
    pass
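Putting the two options together, here is a rough sketch of a normalizer for the three formats in the question. The name normalize_id is made up for illustration, and it assumes that leading zeros and the trailing ".0" style suffix do not matter for the comparison. It is written for Python 3, so it uses str rather than basestring.

```python
import re

def normalize_id(value):
    """Reduce 'NP_000000123', 'IPI00000123.1', '000000123' or 123.0 to the int 123.

    Hypothetical helper: assumes leading zeros and any decimal suffix are
    irrelevant when comparing two cell values.
    """
    # Numeric cells arrive as floats such as 123.0; drop the ".0" first.
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    text = str(value)
    # Remove a trailing decimal part such as ".1" or ".0".
    text = re.sub(r'\.\d+$', '', text)
    # Keep only the digits, discarding prefixes such as "NP_" or "IPI".
    digits = re.sub(r'\D', '', text)
    return int(digits) if digits else None

print(normalize_id('NP_000000123') == normalize_id(123.0))
```

Two cells can then be compared with normalize_id(a) == normalize_id(b). If the prefix itself carries meaning, compare it separately instead of throwing it away.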

Related

Converting exponential number from string to float (encoding issue?) Python

I want to calculate with a value in my dataframe; however, the string contains an exponential number ('10⁻³'). Is this some kind of encoding issue? How can I convert this string into a float (e.g. 10e-3) so that I can perform calculations with this value?
(using Python 3.8.8)
First problem is to convert the Unicode symbols to something easier to work with.
import unidecode
simpler = unidecode.unidecode('10⁻³')
Now you can put an 'e' in front of any '-' or '+':
simpler = simpler.replace('-', 'e-').replace('+', 'e+')
Now you have a format you can give to float.
f = float(simpler)
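One thing to watch: float('10e-3') is 10 × 10⁻³ = 0.01, whereas 10⁻³ read as a power is 0.001. If the power is what you want, a sketch along these lines avoids the extra dependency. The names SUPERSCRIPTS and parse_power are made up for illustration, and it assumes the string is always a plain base followed by superscript digits.

```python
import re

# Map Unicode superscript digits and signs to their ASCII equivalents.
SUPERSCRIPTS = str.maketrans("⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻", "0123456789+-")

def parse_power(text):
    """Parse a string such as '10⁻³' as base ** exponent (hypothetical helper)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻]+)", text)
    if match is None:
        raise ValueError("not in base-plus-superscript form: %r" % text)
    base = float(match.group(1))
    # Translate the superscript run to ASCII, then read it as an integer.
    exponent = int(match.group(2).translate(SUPERSCRIPTS))
    return base ** exponent

print(parse_power("10⁻³"))
```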

Is there a way to set the default float presentation for Python "{:5.3e}".format(a) if a=None and is not a float?

Is there an easy way to set the default float presentation for Python's format command:
"{:5.3e}".format(a)
so that if the variable a has a value of None instead of a float, some default like 5 spaces might be printed?
The format string can include many fields and is given by a user.
The values in a are calculated internally.
I would suggest something like
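print("{:.3e}".format(a) if a is not None else " " * 5)
Test for None explicitly rather than relying on the truthiness of a, otherwise 0.0 would also be printed as blanks. You don't need the 5 in 5.3e: with .3e the output (for example 1.234e+00) is always at least 9 characters wide, so a minimum width of 5 has no effect. If you want to shift the whole text to the right, you could use {:>10.3e}.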
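Since the format string comes from the user and can have many fields, another option is to subclass string.Formatter and override format_field, so every None is replaced regardless of the format spec. This is only a sketch; the class name NoneFormatter and the placeholder argument are made up for illustration.

```python
import string

class NoneFormatter(string.Formatter):
    """Formatter that prints a placeholder wherever a field's value is None."""

    def __init__(self, placeholder=" " * 5):
        self.placeholder = placeholder

    def format_field(self, value, format_spec):
        # A float spec such as "5.3e" would raise on None, so short-circuit.
        if value is None:
            return self.placeholder
        return super().format_field(value, format_spec)

fmt = NoneFormatter()
print(repr(fmt.format("{0:5.3e}|{1:5.3e}", 1234.0, None)))  # '1.234e+03|     '
```

The user-supplied format string stays untouched; only the call changes from "...".format(...) to fmt.format("...", ...).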

Import string that looks like a list "[0448521958, +61439800915]" from JSON into Python and make it an actual list?

I am extracting a string out of a JSON document using python that is being sent by an app in development. This question is similar to some other questions, but I'm having trouble just using x = ast.literal_eval('[0448521958, +61439800915]') due to the plus sign.
I'm trying to get each phone number as a string in a python list x, but I'm just not sure how to do it. I'm getting this error:
raise ValueError('malformed string')
ValueError: malformed string
Your problem is not just the +.
The first number starts with 0, which Python 2 treats as an octal literal (Python 3 rejects a leading zero outright). Octal only allows the digits 0-7, but the number contains 8 and 9.
So your problems don't stop at the plus sign.
You can use a regex to remove the plus:
import ast
import re
fixed_string = re.sub(r'\+(\d+)', r'\1', '[0445521757, +61439800915]')
ast.literal_eval(fixed_string)
I don't know what you can do about the octal number problem, however.
I think the problem is that ast.literal_eval is trying to interpret the phone numbers as numbers instead of strings. Try this:
s = '[0448521958, +61439800915]'  # avoid naming a variable str, it shadows the built-in
s.strip('[]').split(', ')
Result:
['0448521958', '+61439800915']
Technically that string isn't valid JSON. If you want to ignore the +, you could strip it out of the file or string before you evaluate it. If you want to preserve it, you'll have to enclose the value with quotes.
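If the separators are not always exactly a comma followed by one space, a regex that picks out the numbers directly may be more forgiving. This is a small sketch and assumes each phone number is just an optional + followed by digits.

```python
import re

raw = '[0448521958, +61439800915]'
# Grab each run of digits, keeping an optional leading plus sign.
numbers = re.findall(r'\+?\d+', raw)
print(numbers)  # ['0448521958', '+61439800915']
```

The values stay strings throughout, so the leading zero and the plus sign are both preserved.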

Type inference of values contained in strings stored in a list

I am trying to figure out how to do some nice type inference on the columns of a CSV file.
Are there any libraries that might tell me, for example, that a column contains only integers?
All values are of course available in string format.
I will write my own tool if nothing of this sort already exists, but it seems weird to me that such a basic task does not have a library counterpart somewhere.
Why don't you do the straightforward approach?
if all values can be parsed as integers, the column is integers
otherwise, if all values can be parsed as doubles, the column is doubles
otherwise, the column is all strings
The reason why there is no library for this is probably because it's trivial to implement using the existing string to int and string to double conversion functions.
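The approach above can be sketched in a few lines of Python. The function name infer_column_type is made up for illustration. Note that int() and float() are fairly lenient (they accept surrounding whitespace, and float() accepts 'nan' and 'inf'), so tighten the checks if that matters for your data.

```python
def infer_column_type(values):
    """Return int, float or str depending on what every value parses as."""

    def all_parse(convert):
        # True only if convert() succeeds on every value in the column.
        try:
            for value in values:
                convert(value)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return int
    if all_parse(float):
        return float
    return str

print(infer_column_type(["1", "2", "-3"]))
print(infer_column_type(["1", "2.5"]))
print(infer_column_type(["1", "x"]))
```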
Regular expressions are good for that, in Python, you could use something like this:
import re
def str_is_num(s):
    # the optional minus sign must come after the ^ anchor, not before it
    number_pattern = re.compile(r"^-?\d+(\.\d+)?$")
    return number_pattern.match(s) is not None
To check whether a cell is a number, you can evaluate str_is_num(cell)

python trying to remove single apostrophe

I'm using a program called CityEngine which has a python element to it.
The problem: I've just called a function on an object and it returns a list of numbers, xyz. I split the xyz into their own names. I also call a function to retrieve a different attribute related to this object to replace the previously retrieved y value.
Now, when I print the y value, it contains numerical characters only, apart from the decimal point.
When I incorporate the y value into a new list, its value has single apostrophes around it.
For example, print(y) returns 5.0000000
If I build the list as position = [x, y, z], print(position) gives [0, '5.000000' , 0]. The program can't read the single apostrophes, so it ignores the value completely.
I've tried .remove("'","") and .strip() and nothing.
Any help would be appreciated.
Thanks.
That looks more as if the function were not returning a number but a string. So, in order to deal with it, you’ll have to convert the string using either int() or float().
In general, if you do a print(l) on some list of items, each item will be printed with the output of its __repr__ method. By convention, the __repr__ method of a string wraps the string in single apostrophes, whereas numbers do not get wrapped. This is to remove potential ambiguity. Hence, a print(l) which returned
[0, '5.00000', 0.1]
would be a list containing an int, a str and a float.
Convert it to float ... It is a string, so you need to do string to float conversion
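A quick illustration of the difference, using a made-up value for y. The apostrophes are not part of the string, they only show up when the list is printed, so there is nothing to strip:

```python
y = '5.000000'                 # what the function appears to return: a str

position = [0, y, 0]
print(position)                # [0, '5.000000', 0]

fixed = [0, float(y), 0]       # convert before building the list
print(fixed)                   # [0, 5.0, 0]
```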
