Split string of numbers with no whitespaces in python

I have a string of numbers with no whitespaces like this:
s = '12.2321.4310.85'
I know that the format for each number is F5.2 (I am reading the string from a FORTRAN code output)
I need to obtain the following list of numbers based on s:
[12.23,21.43,10.85]
How can I do this in python?
Thanks in advance for any help!

Slice the string into chunks of 5 characters. Convert each chunk to float.
>>> [float(s[i:i+5]) for i in range(0, len(s), 5)]
[12.23, 21.43, 10.85]

If you are sure of the format and the string will always arrive that way, stepping through it in increments of 5 should work:
s = '12.2321.4310.85'
output = []
for i in range(0, len(s), 5):
    output.append(float(s[i:i+5]))
print(output)
Output:
[12.23, 21.43, 10.85]
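
The slicing approach can be wrapped in a small helper that also rejects input whose length does not match the assumed field width. This is only a sketch: the name split_fixed_width and the length check are illustrative additions, not part of the answers above.

```python
def split_fixed_width(text, width=5):
    """Split text into consecutive width-character fields and convert each to float."""
    if len(text) % width != 0:
        # A ragged tail usually means the assumed field width is wrong.
        raise ValueError("length %d is not a multiple of %d" % (len(text), width))
    return [float(text[i:i + width]) for i in range(0, len(text), width)]


print(split_fixed_width('12.2321.4310.85'))
```

Because float() ignores surrounding whitespace, fields that are left-padded with blanks (such as ' 1.50') should also convert cleanly.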

I think the safest way is to rely on the decimal points. We know that every number contains exactly one decimal point followed by exactly two decimal digits, but we cannot be sure how many digits come before the point (the data might contain values like 1234.56 and 78.99, which would produce s = "1234.5678.99"). So we can extract the values one by one based on the position of each '.'.
s = '12.2321.4310.85'
def extractFloat(s):
    # Return the first number (everything up to two digits past the first '.')
    # together with the remainder of the string
    return float(s[:s.find('.')+3]), s[s.find('.')+3:]
l = []
while len(s) > 0:
    value, s = extractFloat(s)
    l.append(value)
print(l)
# Output:
# [12.23, 21.43, 10.85]
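
The same "one dot, two decimals" idea can also be expressed with a regular expression. This is a sketch under the assumption that every value is non-negative and has at least one digit before the dot; the helper name split_two_decimals is made up for illustration.

```python
import re


def split_two_decimals(text):
    """Return every 'digits.dd' token found in text as a float."""
    # \d+ matches the whole part, \.\d{2} the dot and exactly two decimals.
    return [float(token) for token in re.findall(r'\d+\.\d{2}', text)]


print(split_two_decimals('12.2321.4310.85'))
print(split_two_decimals('1234.5678.99'))
```

Unlike fixed-width slicing, this does not care how many digits precede each dot, but it would silently skip anything that does not fit the pattern (for example a minus sign).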

Related

Convert str into float but problems with multiple dots in numbers

I downloaded this datafile from the TCGA database but I am not sure how to process it in Python. After importing it with pd.read_csv, I wanted to convert the reads_per_million_miRNA_mapped column to floats, since the values are currently strings, but the conversion fails because of the dots:
ValueError: could not convert string to float: '1.024.089'
The txt file looks like this:
miRNA_ID read_count reads_per_million_miRNA_mapped
hsa-mir-1227 1 0.204818
hsa-mir-1228 5 1.024.089
hsa-mir-1229 12 2.457.814
So I was thinking to remove the dots, but then you get the problem of also removing dots that act like commas, like 0.204818.
EDIT:
I think the best solution would be to remove the dots, except when there are more than 3 digits after a dot (so 0.204818 would be an exception). Does anyone know how to do this?
Thanks!

Assuming all numbers will be floats (i.e. the last dot acts as a decimal point), you can get rid of all but the last dot and then cast into floats:
example = '1.024.089'
num = example.replace('.', '', example.count('.') - 1)
print(float(num))
Output:
1024.089
EDIT:
To check whether there are more than 3 numbers after the last/only dot you can do something like this:
i = num.index('.')
digits_after_dot = len(num[i+1:])
Example:
num = '12.12345'
i = num.index('.')
digits_after_dot = len(num[i+1:])
print(digits_after_dot)
Output:
5
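
Putting the first part of this answer into a reusable function might look like the sketch below. It assumes, as the answer does, that the last dot is always the decimal point; whether that holds for the actual TCGA file is not something the thread confirms. The name parse_dotted_number is hypothetical.

```python
def parse_dotted_number(text):
    """Treat the last dot as the decimal point and drop any earlier dots."""
    extra_dots = text.count('.') - 1
    if extra_dots > 0:
        # str.replace with a count removes only the first extra_dots dots.
        text = text.replace('.', '', extra_dots)
    return float(text)


print(parse_dotted_number('1.024.089'))
print(parse_dotted_number('0.204818'))
```

With pandas, a function like this could be applied to the string column via Series.apply before any numeric work is done.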

Convert a list of float to string in Python

I have a list of floats in Python and when I convert it into a string, I get the following
[1883.95, 1878.3299999999999, 1869.4300000000001, 1863.4000000000001]
These floats have 2 digits after the decimal point when I created them (I believe so),
Then I used
str(mylist)
How do I get a string with 2 digits after the decimal point?
======================
Let me be more specific, I want the end result to be a string and I want to keep the separators:
"[1883.95, 1878.33, 1869.43, 1863.40]"
I need to do some string operations afterwards. For example +="!\t!".
Inspired by @senshin, the following code works, for example, but I think there is a better way:
msg = "["
for x in mylist:
    msg += '{:.2f}'.format(x)+','
msg = msg[0:len(msg)-1]
msg+="]"
print(msg)

Use string formatting to get the desired number of decimal places.
>>> nums = [1883.95, 1878.3299999999999, 1869.4300000000001, 1863.4000000000001]
>>> ['{:.2f}'.format(x) for x in nums]
['1883.95', '1878.33', '1869.43', '1863.40']
The format string {:.2f} means "print a fixed-point number (f) with two places after the decimal point (.2)". str.format will automatically round the number correctly (assuming you entered the numbers with two decimal places in the first place, in which case the floating-point error won't be enough to mess with the rounding).

If you want to keep full precision, the syntactically simplest/clearest way seems to be
mylist = list(map(str, mylist))

list(map(lambda n: '%.2f' % n, [1883.95, 1878.3299999999999, 1869.4300000000001, 1863.4000000000001]))
map() invokes the callable passed as the first argument for each element of the list/iterable passed as the second argument. In Python 3 it returns a lazy iterator, so list() is needed to get the formatted strings.

Get rid of the ' marks:
>>> nums = [1883.95, 1878.3299999999999, 1869.4300000000001, 1863.4000000000001]
>>> '[{:s}]'.format(', '.join(['{:.2f}'.format(x) for x in nums]))
'[1883.95, 1878.33, 1869.43, 1863.40]'
['{:.2f}'.format(x) for x in nums] makes a list of strings, as in the accepted answer.
', '.join([list]) returns one string with ', ' inserted between the list elements.
'[{:s}]'.format(joined_string) adds the brackets.
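
The three steps above can be collected into one helper. This is a minimal sketch; the function name format_floats and the places parameter are illustrative and not from the answer.

```python
def format_floats(nums, places=2):
    """Render nums like a list literal, each value fixed to `places` decimals."""
    # The nested {} in the format spec is filled with `places`.
    body = ', '.join('{:.{}f}'.format(x, places) for x in nums)
    return '[' + body + ']'


nums = [1883.95, 1878.3299999999999, 1869.4300000000001, 1863.4000000000001]
print(format_floats(nums))
```

The result is an ordinary str, so further string operations such as += "!\t!" work on it directly.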

str([round(i, 2) for i in mylist])

Using numpy you may do:
np.array2string(np.asarray(mylist), precision=2, separator=', ')

Strings, ints and leading zeros

I need to record SerialNumber(s) on an object. We enter many objects. Most serial numbers are strings - the numbers aren't used numerically, just as unique identifiers - but they are often sequential. Further, leading zeros are important due to unique id status of serial number.
When doing data entry, it's nice to just enter the first "sequential" serial number (e.g. 000123) and then the number of items (e.g. 5) to get the desired output. That way we can enter data in bulk, as shown below:
Obj1.serial = 000123
Obj2.serial = 000124
Obj3.serial = 000125
Obj4.serial = 000126
Obj5.serial = 000127
The problem is that when you take the first number-as-string, turn it into an integer and increment it, you lose the leading zeros.
Not all serials are sequential - not all are even numbers (eg FDM-434\RRTASDVI908)
But those that are, I would like to automate entry.
In python, what is the most elegant way to check for leading zeros (*and, I guess, edge cases like 0009999) in a string before iterating, and then re-application of those zeros after increment?
I have a solution to this problem but it isn't elegant. In fact, it's the most boring and blunt alg possible.
Is there an elegant solution to this problem?
EDIT
To clarify the question, I want the serial to have the same number of digits after the increment.
So, in most cases, this will mean reapplying the same number of leading zeros. BUT in some edge cases the number of leading zeros will be decremented. eg: 009 -> 010; 0099 -> 0100

Try str.zfill():
>>> s = "000123"
>>> i = int(s)
>>> i
123
>>> n = 6
>>> str(i).zfill(n)
'000123'

I develop my comment here, Obj1.serial being a string:
Obj1.serial = "000123"
('%0'+str(len(Obj1.serial))+'d') % (1+int(Obj1.serial))
It's like @owen-s's answer, '%06d' % n: print the number and pad it with leading zeros.
Regarding '%d' % n, it's just one way of printing. From PEP3101:
In Python 3.0, the % operator is supplemented by a more powerful
string formatting method, format(). Support for the str.format()
method has been backported to Python 2.6.
So you may want to use format instead… Anyway, you have an integer at the right of the % sign, and it will replace the %d inside the left string.
'%06d' means print a minimum of 6 (6) digits (d) long, fill with 0 (0) if necessary.
As Obj1.serial is a string, you have to convert it to an integer before the increment: 1+int(Obj1.serial). And because the right side takes an integer, we can leave it like that.
Now, for the left part, as we can't hard code 6, we have to take the length of Obj1.serial. But this is an integer, so we have to convert it back to a string, and concatenate to the rest of the expression %0 6 d : '%0'+str(len(Obj1.serial))+'d'. Thus
('%0'+str(len(Obj1.serial))+'d') % (1+int(Obj1.serial))
Now, with format (format-specification):
'{0:06}'.format(n)
is replaced in the same way by
('{0:0'+str(len(Obj1.serial))+'}').format(1+int(Obj1.serial))

You could check the length of the string ahead of time, then use rjust to pad to the same length afterwards:
>>> s = "000123"
>>> len_s = len(s)
>>> i = int(s)
>>> i
123
>>> str(i).rjust(len_s, "0")
'000123'
You can check a serial number for all digits using:
if serial.isdigit():
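
Combining the digit check with width-preserving padding, bulk generation of sequential serials might be sketched as follows. The function name next_serials and the choice to raise ValueError for non-numeric serials are assumptions made for illustration.

```python
def next_serials(first, count):
    """Return `count` serials starting at `first`, keeping its digit width."""
    if not first.isdigit():
        # Serials such as 'FDM-434' cannot be incremented automatically.
        raise ValueError("serial is not purely numeric: %r" % first)
    width = len(first)
    start = int(first)
    # zfill pads back to the original width; it never truncates, so
    # '999' followed by '1000' simply grows by one digit.
    return [str(start + offset).zfill(width) for offset in range(count)]


print(next_serials('000123', 5))
print(next_serials('0099', 2))
```

Because the width is taken from the first serial, edge cases like 009 -> 010 lose one leading zero automatically.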

Python - Incrementing a binary sequence while maintaining the bit length

I am trying to increment a binary sequence in python while maintaining the bit length.
So far I am using this piece of code...
'{0:b}'.format(int('0100', 2) + 1)
This takes the binary string, converts it to an integer (int here; long in Python 2), adds one, then converts it back to a binary string. E.g., 01 -> 10.
However, if I input a number such as '0100', instead of incrementing it to '0101', my code
increments it to '101', so it is disregarding the first '0', and just incrementing '100'
to '101'.
Any help on how to make my code maintain the bit length will be greatly appreciated.
Thanks

str.format lets you specify the length as a parameter like this
>>> n = '0100'
>>> '{:0{}b}'.format(int(n, 2) + 1, len(n))
'0101'

That's because 5 is represented as '101' after conversion from int (or long in Python 2) to binary. To put the leading 0s back, use 0 as the fill character and pass the width of the original binary string when formatting.
In [35]: b='0100'
In [36]: '{0:0{1:}b}'.format(int(b, 2) + 1,len(b))
Out[36]: '0101'
In [37]: b='0010000'
In [38]: '{0:0{1:}b}'.format(int(b, 2) + 1,len(b))
Out[38]: '0010001'

This is probably best solved using format strings. Get the length of your input, construct a format string from it, and then use it to print the incremented number.
from __future__ import print_function
# Input here, as a string
s = "0101"
# Convert to a number
n = int(s, 2)
# Construct a format string
f = "0{}b".format(len(s))
# Format the incremented number; this is your output
t = format(n + 1, f)
print(t)
To hardcode to four binary places (left-padded by 0) you would use 04b, for five you would use 05b, etc. In the code above we just get the length of the input string.
Oh, and if you input a number like 1111 and add 1 you'll get 10000 since you need an extra bit to represent that. If you want to wrap around to 0000 do t = format(n + 1, f)[-len(s):].
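
Both behaviours described above (growing by one bit on overflow, or wrapping around) can be sketched in one function. It uses int rather than long, which assumes Python 3; the name increment_bits and the wrap flag are illustrative.

```python
def increment_bits(bits, wrap=False):
    """Add 1 to a binary string, keeping at least len(bits) characters."""
    width = len(bits)
    result = format(int(bits, 2) + 1, '0{}b'.format(width))
    # On overflow the result is one bit longer; optionally drop the carry.
    return result[-width:] if wrap else result


print(increment_bits('0100'))
print(increment_bits('1111'))
print(increment_bits('1111', wrap=True))
```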

Format a number containing a decimal point with leading zeroes

I want to format a number with a decimal point in it with leading zeros.
This
>>> '3.3'.zfill(5)
003.3
considers all the digits and even the decimal point. Is there a function in python that considers only the whole part?
I only need to format simple numbers with no more than five decimal places. Also, using %5f seems to consider trailing instead of leading zeros.

Is this what you are looking for?
>>> "%07.1f" % 2.11
'00002.1'
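So according to your comment, I can come up with this one (although not as elegant anymore). The decimals are taken from str(x) rather than from x % 1, because in Python 3 str(x % 1) exposes floating-point noise such as 0.10000000000000009:
>>> fmt = lambda x: "%04d" % x + str(x)[str(x).index('.'):]
>>> fmt(3.1)
'0003.1'
>>> fmt(3.158)
'0003.158'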

I like the new style of formatting.
loop = 2
pause = 2
print('Begin Loop {0}, {1:06.2f} Seconds Pause'.format(loop, pause))
>>>Begin Loop 2, 002.00 Seconds Pause
In {1:06.2f}:
1 is the place holder for variable pause
0 indicates to pad with leading zeros
6 total number of characters including the decimal point
2 the precision
f formats the value as a fixed-point number (an integer such as 2 is displayed as 2.00)

print('{0:07.3f}'.format(12.34))
This produces 7 characters in total, including the decimal point and 3 decimal places, i.e. "012.340".
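
Since the width in a format spec counts every character, padding only the whole part means adding the decimal point and the decimal places to the desired width. A small sketch of that arithmetic follows; pad_whole_part and its parameters are hypothetical names, and it assumes non-negative values (a minus sign would also count toward the width).

```python
def pad_whole_part(value, whole_width, places):
    """Zero-pad the whole part of value to whole_width digits, with `places` decimals."""
    # Total width = whole digits + the decimal point + decimal digits.
    total = whole_width + 1 + places
    return '{:0{}.{}f}'.format(value, total, places)


print(pad_whole_part(3.3, 4, 1))
print(pad_whole_part(12.34, 3, 3))
```

Note that this fixes the number of decimal places, so 3.3 with places=3 becomes 3.300 rather than keeping its original single decimal.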

Like this?
>>> '%#05.1f' % 3.3
'003.3'

Starting with a string as your example does, you could write a small function such as this to do what you want:
def zpad(val, n):
    bits = val.split('.')
    return "%s.%s" % (bits[0].zfill(n), bits[1])

>>> zpad('3.3', 5)
'00003.3'

With Python 3.6+ you can use the fstring method:
f'{3.3:.0f}'[-5:]
>>> '3'
f'{30000.3:.0f}'[-5:]
>>> '30000'
This method will eliminate the fractional component (consider only the whole part) and return up to 5 digits. Two caveats: First, if the whole part is larger than 5 digits, the most significant digits beyond 5 will be removed.
Second, if the fractional component is greater than 0.5, the function will round up.
f'{300000.51:.0f}'[-5:]
>>>'00001'
