I downloaded this data file from the TCGA database, but I am not sure how to process it in Python. After importing it with pd.read_csv, I wanted to convert the reads_per_million_miRNA_mapped column to floats, as the values are currently strings, but the conversion fails with the following error because of the dots:
ValueError: could not convert string to float: '1.024.089'
The txt file looks like this:
miRNA_ID read_count reads_per_million_miRNA_mapped
hsa-mir-1227 1 0.204818
hsa-mir-1228 5 1.024.089
hsa-mir-1229 12 2.457.814
So I was thinking of removing the dots, but then you get the problem of also removing dots that act as decimal separators, like the one in 0.204818.
EDIT:
I think the best solution for this would be to remove the dots, except if there are more than 3 digits behind a dot (so 0.204818 would be an exception). Does anyone know how to do this?
Thanks!
Assuming all numbers will be floats (i.e. the last dot acts as a decimal point), you can get rid of all but the last dot and then cast into floats:
example = '1.024.089'
num = example.replace('.', '', example.count('.') - 1)
print(float(num))
Output:
1024.089
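One thing worth checking before settling on this, judging only from the three sample rows in the question: reads_per_million_miRNA_mapped looks proportional to read_count (1 read gives 0.204818, so 5 reads would give roughly 1.02409). That suggests the first dot, not the last, may be the real decimal point in this file. Here is a minimal sketch of both interpretations side by side; the helper names keep_last_dot and keep_first_dot are made up for illustration:

```python
def keep_last_dot(text):
    """Treat the last dot as the decimal point (the approach shown above)."""
    return float(text.replace('.', '', text.count('.') - 1))


def keep_first_dot(text):
    """Treat the first dot as the decimal point and drop any later dots."""
    whole, dot, rest = text.partition('.')
    return float(whole + dot + rest.replace('.', ''))


# Compare both readings on the sample values from the question.
for raw in ['0.204818', '1.024.089', '2.457.814']:
    print(raw, keep_last_dot(raw), keep_first_dot(raw))
```

If the column was read in as strings, something like df['reads_per_million_miRNA_mapped'].map(keep_first_dot) should apply the chosen helper to every row. Which interpretation is right depends on how the file was produced, so it is worth checking a few rows against read_count first.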
EDIT:
To check whether there are more than 3 numbers after the last/only dot you can do something like this:
i = num.index('.')
digits_after_dot = len(num[i+1:])
Example:
num = '12.12345'
i = num.index('.')
digits_after_dot = len(num[i+1:])
print(digits_after_dot)
Output:
5
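Note that num.index('.') finds the first dot and raises ValueError when the string has no dot at all. A small sketch that always looks at the last dot and tolerates dot-free strings, assuming the input is a plain string of digits and dots (the function name is only illustrative):

```python
def digits_after_last_dot(text):
    """Count the characters that follow the last dot; 0 if there is no dot."""
    _, dot, tail = text.rpartition('.')
    return len(tail) if dot else 0


print(digits_after_last_dot('12.12345'))
print(digits_after_last_dot('1.024.089'))
print(digits_after_last_dot('42'))
```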
I'm shocked beyond belief at how difficult this is turning out to be. All I can find are suggestions to change the format of the column to 'int', but I need to keep the comma thousands separators, and changing the format to int gets rid of them. Then I can't find anything on how to add comma separators to an int column. Any ideas? There is really nothing for me to share beyond the above in terms of what I've tried.
Format your floats...in a string format?
my_string = '{:,.0f}'.format(my_number)
E.g.:
x = 1000.00
'{:,.0f}'.format(x)  # -> '1,000'
This gives you what you want: a string you can print with commas. The .0f sets the precision to 0 (the number of decimal places).
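The same format specification can be wrapped in a small helper so the number of decimals is adjustable. This is only a sketch; with_commas is a made-up name, and the result is always a string, so the column stops being numeric once it is formatted:

```python
def with_commas(number, decimals=0):
    """Return number as text with comma thousands separators."""
    return '{:,.{}f}'.format(number, decimals)


print(with_commas(1000.00))
print(with_commas(1234567.891, 2))
# Integers can also be formatted directly, without the float conversion.
print('{:,}'.format(1234567))
```

For a pandas column, something like df['col'].map(with_commas) should produce the formatted strings, where 'col' stands in for whatever the column is actually called.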
I wanted to try and grab a hex value in between a bunch of zeros and convert it to decimal. Here's a sample: '00000000002E3706400000'. So I only want to grab '2E37064' and disregard everything else around it. I know to use the int() function to convert it to decimal, but when I do, it includes the trailing zeros right after the actual hex value. Here's a sample of my code:
hex_val = '00000000002E3706400000'
dec_val = int(hex_val, 16)
print(dec_val)
And then here's the output:
50813862936576
The actual value I want is:
48459876
Is there an optimal way to accomplish this?
You can use the .strip() function to remove the leading and trailing zeroes (though removing the leading zeroes here isn't technically necessary):
int(hex_val.strip('0'), 16)
This outputs:
48459876
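One caveat with .strip('0'): it cannot tell padding from zeros that belong to the value, so a value that really ends in 0 would lose digits. If the amount of padding on the right is known in advance, slicing it off is safer. A sketch under that assumption; hex_field_to_int and its trailing_pad parameter are hypothetical, and '0002E0000' is a made-up input:

```python
def hex_field_to_int(padded, trailing_pad):
    """Convert a zero-padded hex field, dropping a known number of trailing zeros."""
    core = padded[:len(padded) - trailing_pad]
    return int(core, 16)  # leading zeros do not change the value


hex_val = '00000000002E3706400000'
print(hex_field_to_int(hex_val, 5))

# Made-up field whose real value is 2E0, followed by three padding zeros.
print(int('0002E0000'.strip('0'), 16))   # strip also eats the value's own zero
print(hex_field_to_int('0002E0000', 3))  # keeps it
```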
I can't figure out how to place the decimal point before the last two digits...
I tried '{0:.2f}'.format(a), but that gives '117085.00'.
This is what I have:
117085
55688
And I want:
1170.85
556.88
So I need a point before the last two digits.
I don't want new numbers, I only need to set the point.
Can someone help with this (easy, I think) problem? :/ I am really new to this.
In [33]: x = 117085
In [34]: x/100
Out[34]: 1170.85
The way that you are receiving your numbers, they are 100x the value you are trying to print. To format them the way you want, divide them by 100 before formatting them.
Additionally, the example you provide appears to be right-justified, meaning that the right side all lines up. If you want to accomplish that, you can use something like the below:
a = 117085
b = 55688
print('{0:>7.2f}'.format(a / 100))
print('{0:>7.2f}'.format(b / 100))
Output:
1170.85
 556.88
Edit: Converted the rjust(7) to the format string >7
Let's break down the format string that we're using above...
{0:>7.2f} # The whole string
{ } # Brackets to denote a processed value
0 # Take the first argument passed through the `format()` function
: # A delimiter to separate the identifier (in this case, 0) from the format notation
>7 # Right justify this element, with a width of 7
.2f # Format the input as a float, with 2 digits to the right of the decimal point
Python implicitly assumes a few things apparently, so here's a shorter alternative:
{:7.2f} # The whole string
{ } # Brackets to denote a processed value
: # A delimiter to separate the identifier (in this case, assumed 0) from the format notation
7 # justify this element (Right justification by default), with a width of 7
.2f # Format the input as a float, with 2 digits to the right of the decimal point
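Dividing by 100 goes through binary floating point, which is fine for display. If the values are something like money and exactness matters, one alternative is the standard decimal module. A sketch assuming the inputs are integers; cents_to_text is a made-up name:

```python
from decimal import Decimal


def cents_to_text(value, width=7):
    """Shift the decimal point two places left and right-align the result."""
    shifted = Decimal(value) / 100  # exact for integer input
    return '{:>{}.2f}'.format(shifted, width)


print(cents_to_text(117085))
print(cents_to_text(55688))
```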
I have a string of numbers with no whitespaces like this:
s = '12.2321.4310.85'
I know that the format for each number is F5.2 (I am reading the string from a FORTRAN code output)
I need to obtain the following list of numbers based on s:
[12.23,21.43,10.85]
How can I do this in python?
Thanks in advance for any help!
Slice the string into chunks of 5 characters. Convert each chunk to float.
>>> [float(s[i:i+5]) for i in range(0, len(s), 5)]
[12.23, 21.43, 10.85]
If you are really sure of the format, and the data will always be handed to you that way, then using a step of 5 in your loop might work:
s = '12.2321.4310.85'
output = []
for i in range(0, len(s), 5):
    output.append(float(s[i:i+5]))
print(output)
Output:
[12.23, 21.43, 10.85]
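Both fixed-width answers can be folded into one helper that takes the field width as a parameter and complains when the string does not divide evenly. This is a sketch that assumes every field is exactly width characters wide, as F5.2 output would be; parse_fixed_width is a made-up name. Since float() ignores surrounding whitespace, space-padded fields such as ' 1.23' also work:

```python
def parse_fixed_width(text, width=5):
    """Split text into width-character fields and convert each one to float."""
    if len(text) % width:
        raise ValueError('length of text is not a multiple of the field width')
    return [float(text[i:i + width]) for i in range(0, len(text), width)]


print(parse_fixed_width('12.2321.4310.85'))
print(parse_fixed_width(' 1.2321.43'))
```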
I think the safest way is to rely on the decimal points. We know that every number has exactly one decimal point and always two digits after it, but we are not sure how many digits come before the point (there might be values like 1234.56 and 78.99 in the data, which would produce s = "1234.5678.99"). So we can extract the values one by one based on the position of each point.
s = '12.2321.4310.85'
def extractFloat(s):
    # Extract the first float with 2 decimal places from the string;
    # return it together with the rest of the string
    end = s.find('.') + 3
    return float(s[:end]), s[end:]
l = []
while len(s) > 0:
    value, s = extractFloat(s)
    l.append(value)
print(l)
# Output:
# [12.23, 21.43, 10.85]
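The same idea (one point, exactly two digits after it) can also be written as a regular expression. This is a sketch rather than a drop-in replacement: the optional minus sign is an assumption about the data, the function name is made up, and any characters that do not match the pattern are silently skipped:

```python
import re


def extract_two_decimal_floats(text):
    """Find every number made of digits, a point, and exactly two decimals."""
    return [float(match) for match in re.findall(r'-?\d+\.\d{2}', text)]


print(extract_two_decimal_floats('12.2321.4310.85'))
print(extract_two_decimal_floats('1234.5678.99'))
```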
I'm using Python to write to a file where the formatting is very strict. I have 10 available spaces in each column, which cannot be exceeded.
I want to write as many decimals as I can, but if the number is negative, the minus sign must take priority over the last decimals. The period in the float also counts towards the number of available spaces. Numbers should be truncated on the right.
Example:
Let's say I want to print two numbers
a = 123.4567891011
b = 0.9876543210
Then I would want the result:
123.4567890.98765432
But if I now have the following:
a = -123.1111111111
b = 98765.432101234
c = 567
d = 0.1234
Then I'd want:
-123.1111198765.4321     567.0    0.1234
It would be nice to use exponential notation for high numbers, but that is not a necessity. I'm unable to find the answer. All I can find is how to fix the format to a number of significant digits, which really won't help me.
I've tried several variations of
f.write("{0:>10}{1:>10}".format(a, b))
but can't figure it out. Hope you see what I'm looking for.
Okay, so I found a way. I basically convert everything to strings and use:
f.write("{0:>10.10}{1:>10.10}".format(str(a),str(b)))
and so on..
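One thing to watch with the string-truncation approach: for very small or very large floats, str() switches to exponential notation, and cutting that to 10 characters can drop the exponent. A sketch of an alternative that tries several renderings and keeps the one closest to the original value that still fits. fit_float is a made-up name, and note that it rounds the last digit rather than truncating:

```python
def fit_float(value, width=10):
    """Right-align a float in width characters, keeping as much precision as fits."""
    value = float(value)
    # Candidates: shortest repr, fixed-point and exponential at every precision.
    candidates = [repr(value)]
    candidates += ['{:.{}f}'.format(value, d) for d in range(width, -1, -1)]
    candidates += ['{:.{}e}'.format(value, d) for d in range(width, -1, -1)]
    fitting = [text for text in candidates if len(text) <= width]
    if not fitting:
        raise ValueError('{!r} does not fit in {} characters'.format(value, width))
    # Pick the candidate closest to the original; ties go to the earliest one.
    best = min(fitting, key=lambda text: abs(float(text) - value))
    return best.rjust(width)


row = [-123.1111111111, 98765.432101234, 567, 0.1234]
print(''.join(fit_float(x) for x in row))
```

Because every candidate is checked against the width, the minus sign and the period are counted automatically, and exponential notation is only chosen when it represents the value better than fixed-point does.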