Converting string to float - python

I am trying to write a program that tallies the values in a file. For example, I am given a file with numbers like this
2222 (First line)
4444 (Second line)
1111 (Third line)
My program takes in the name of an input file (E.G. File.txt), and the column of numbers to tally. So for example, if my file.txt contains the number above and i need the sum of column 2, my function should be able to print out 7(2+4+1)
t1 = open(argv[1], "r")
number = argv[2]
k = 0
while True:
n = int(number)
t = t1.readline()
z = list(t)
if t == "":
break
k += float(z[n])
t1.close()
print k
This code works for the first column when I set it to 0, but it doesn't return a consistent result when I set it to 1 even though they should be the same answer.
Any thoughts?

A somewhat uglier implementation that demonstrates the cool-factor of zip:
def sum_col(filename, colnum):
with open(filename) as inf:
columns = zip(*[line.strip() for line in inf])
return sum([int(num) for num in list(columns)[colnum]])
zip(*iterable) flips from row-wise to columnwise, so:
iterable = ['aaa','bbb','ccc','ddd']
zip(*iterable) == ['abcd','abcd','abcd'] # kind of...
zip objects aren't subscriptable, so we need to cast as list before we subscript it (doing [colnum]). Alternatively we could do:
...
for _ in range(colnum-1):
next(columns) # skip the columns we don't need
return sum([int(num) for num in next(columns)])
Or just calculate all the sums and grab the sum that we need
...
col_sums = [sum(int(num) for num in column) for column in columns]
return col_sums[colnum]

Related

How to create multiple lists of data that can be looked through for specific specifications

sf = input('pick a number between 1-5: ')
string = ''
with open('dna.txt','r')as f:
count = 0
for line in f:
row = list(map(str, line.strip().split()))
Letter, number = row
n=int(number)
chunks = [row[x:x+2] for x in range(0, len(row),2)]
for x in chunks:
print(x)
if x >= sf:
string += x
After getting a user input, it needs to search through the created lists for values equal to or greater than the given value and be added to 'string'. in the text file the first column are all letters and the second are all numbers. Is it possible to have the number column become an int?
[This is what the data looks like]
[1]: https://i.stack.imgur.com/VM3aJ.png
Thank you in advance :)
If you want the second column to be an int, you don't want to be returning the data as a single string; a list of (str, int) tuples would allow you to keep track of both values with their correct types.
I think this is closer to what you're aiming for:
from typing import List, Tuple
def read_values_with_minimum(
path: str,
n: int
) -> List[Tuple[str, int]]:
with open(path) as f:
return [
(s, int(v))
for s, v in (
line.strip().split()
for line in f.readlines()
)
if int(v) >= n
]
if __name__ == '__main__':
sf = int(input('pick a number between 1-5: '))
assert 1 <= sf <= 5
print("\n".join(
map(str, read_values_with_minimum("dna.txt", sf))
))

Finding missing lines in file

I have a 7000+ lines .txt file, containing description and ordered path to image. Example:
abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png
abnormal /Users/alex/Documents/X-ray-classification/data/images/2.png
normal /Users/alex/Documents/X-ray-classification/data/images/3.png
normal /Users/alex/Documents/X-ray-classification/data/images/4.png
Some lines are missing. I want to somehow automate the search of missing lines. Intuitively i wrote:
f = open("data.txt", 'r')
lines = f.readlines()
num = 1
for line in lines:
if num in line:
continue
else:
print (line)
num+=1
But of course it didn't work, since lines are strings.
Is there any elegant way to sort this out? Using regex maybe?
Thanks in advance!
the following should hopefully work - it grabs the number out of the filename, sees if it's more than 1 higher than the previous number, and if so, works out all the 'in-between' numbers and prints them. Printing the number (and then reconstructing the filename later) is needed as line will never contain the names of missing files during iteration.
# Set this to the first number in the series -1
num = lastnum = 0
with open("data.txt", 'r') as f:
for line in f:
# Pick the digit out of the filename
num = int(''.join(x for x in line if x.isdigit()))
if num - lastnum > 1:
for i in range(lastnum+1, num):
print("Missing: {}.png".format(str(i)))
lastnum = num
The main advantage of working this way is that as long as your files are sorted in the list, it can handle starting at numbers other than 1, and also reports more than one missing number in the sequence.
You can try this:
lines = ["abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png","normal /Users/alex/Documents/X-ray-classification/data/images/3.png","normal /Users/alex/Documents/X-ray-classification/data/images/4.png"]
maxvalue = 4 # or any other maximum value
missing = []
i = 0
for num in range(1, maxvalue+1):
if str(num) not in lines[i]:
missing.append(num)
else:
i += 1
print(missing)
Or if you want to check for the line ending with XXX.png:
lines = ["abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png","normal /Users/alex/Documents/X-ray-classification/data/images/3.png","normal /Users/alex/Documents/X-ray-classification/data/images/4.png"]
maxvalue = 4 # or any other maximum value
missing = []
i = 0
for num in range(1, maxvalue+1):
if not lines[i].endswith(str(num) + ".png"):
missing.append(num)
else:
i += 1
print(missing)
Example: here

Python: read line after string is found

I have a file which contains blocks of lines that I would like to separate. Each block contains a number identifier in the block's header: "Block X" is the header line for the X-th block of lines. Like this:
Block X
#L E C A F X M N
11.2145 15 27 29.444444 7.6025229 1539742 29.419783
11.21451 13 28 24.607143 6.8247935 1596787 24.586264
...
Block Y
#L E C A F X M N
11.2145 15 27 29.444444 7.6025229 1539742 29.419783
11.21451 13 28 24.607143 6.8247935 1596787 24.586264
...
I can use "enumerate" to find the header line of the block as follows:
with open(filename,'r') as indata:
for num, line in enumerate(indata):
if 'Block X' in line:
startblock=num
print startblock
This will yield the line number of the first line of block #X.
However, my problem is identifying the last line of the block. To do that, I could find the next occurrence of a header line (i.e., the next block) and subtract a few numbers.
My question: how can I find the line number of a the next occurrence of a condition (i.e., right after a certain condition was met)?
I tried using enumerate again, this time indicating the starting value, like this:
with open(filename,'r') as indata:
for num, line in enumerate(indata,startblock):
if 'Block Y ' in line:
endscan=num
break
print endscan
That doesn't work, because it still begins reading the file from line 0, NOT from the line number "startblock". Instead, by starting the "enumerate" counter from a different number, the resulting value of the counter, in this case "endscan" is shifted from 0 by the amount "startblock".
Please, help! How can tell python to disregard the lines previous to "startblock"?
If you want the groups using Block as the delimiter for each section, you can use itertools.groupby:
from itertools import groupby
with open('test.txt') as f:
grp = groupby(f,key=lambda x: x.startswith("Block "))
for k,v in grp:
if k:
print(list(v) + list(next(grp, ("", ""))[1]))
Output:
['Block X\n', '#L E C A F X M N \n', '11.2145 15 27 29.444444 7.6025229 1539742 29.419783\n', '11.21451 13 28 24.607143 6.8247935 1596787 24.586264\n']
['Block Y\n', '#L E C A F X M N \n', '11.2145 15 27 29.444444 7.6025229 1539742 29.419783\n', '11.21451 13 28 24.607143 6.8247935 1596787 24.586264']
If Block can appear elsewhere but you want it only when followed by a space and a single char:
import re
with open('test.txt') as f:
r = re.compile("^Block \w$")
grp = groupby(f, key=lambda x: r.search(x))
for k, v in grp:
if k:
print(list(v) + list(next(grp, ("", ""))[1]))
You can use the .tell() and .seek() methods of file objects to move around. So for example:
with open(filename, 'r') as infile:
start = infile.tell()
end = 0
for line in infile:
if line.startswith('Block'):
end = infile.tell()
infile.seek(start)
# print all the bytes in the block
print infile.read(end - start)
# now go back to where we were so we iterate correctly
infile.seek(end)
# we finished a block, mark the start
start = end
If the difference between the header lines is uniform throughout the file, just use the distance to increase the indexing variable accordingly.
file1 = open('file_name','r')
lines = file1.readlines()
numlines = len(lines)
i=0
for line in file:
if line == 'specific header 1':
line_num1 = i
if line == 'specific header 2':
line_num2 = i
i+=1
diff = line_num2 - line_num1
Now that we know the difference between the line numbers we use for loops to acquire the data.
k=0
array = np.zeros([numlines, diff])
for i in range(numlines):
if k % diff == 0:
for j in range(diff):
array[i][j] = lines[i+j]
k+=1
% is the mod operator which returns 0 only when k is a multiple of the difference in line numbers between the two header lines in the file, which will only occur when the line corresponds to the a header line. Once the line is fixed we go on to the second for loop that fills the array so that we have a matrix that is numlines number of rows and a diff number of columns. The nonzeros rows will contain the data inbetween the header lines.
I have not tried this out, I am just writing off the top of my head. Hopefully it helps!

Convert string to integers

I'm just starting out with programming, and I know I'm still missing some of the basics, but I'm trying to work this out. I have a list of 3 and 4 digit numbers that I've brought in from a text file, and I'm trying to get a sum of these numbers. So far all I can get python to do is perform a sum of each individual number, so if the first number in the list is 427, it's printing 13, rather than adding 427 + 504 + 219, etc.
This is what I have:
myList = []
inFile = open('E:/GIS/GTECH 731/NYCElementarySchools.txt', 'r')
for row in inFile:
col = row.split('\t')
if col[1]=='BK':
myList = (col[3])
intList = [int(x) for x in myList]
print sum(intList)
Additionally, when i have it print length, it gives me a list of 3's and 4's, telling me the length of each number, not giving me the total count of numbers.
I must be missing something fundamental, but I don't know what it is! Any suggestions are appreciated! Thanks!
This:
myList = (col[3])
will set myList to a str, not a list, which would be a representation of a number. Thus:
intList = [int(x) for x in myList]
would convert the digits to numbers. You want int(myList) to convert the whole string to a number.
You can keep a running total (initialised to 0) and do total += int(myList) to total all of the numbers. Then after the loop you can print the result.
In your code:
col = row.split('\t')
if col[1]=='BK':
myList = (col[3])
intList = [int(x) for x in myList]
print sum(intList)
'col = row.split('\t')' makes a list which is divided by TAB.
If the line which is read from the file, looks like this:
# \t is TAB
SOMETHING\tBK\t1\t2\t3
The col structure is:
col[0] = SOMETHING
col[1] = BK
col[2] = 1
col[3] = 2
col[4] = 3
So, if you intend sum col[3] to col[...] then use col[3:] = list of col[3], col[4]
Thus, if you want to accumulate sum result you need another variable.
myList = []
inFile = open('E:/GIS/GTECH 731/NYCElementarySchools.txt', 'r')
sumList = []
for row in inFile:
row_total = 0
col = row.split('\t')
if col[1]=='BK':
intList = [int(x) for x in col[3:]]
row_sum = sum(intList)
# row_sum = map(lambda x: int(x), col[3:])
print 'row total: %d' % (row_sum)
sumList.append(row_sum)
print 'total: %d' % (sum(sumList))
Probably you want to use a slice to assign myList
myList = col[3:]
this will return a list everything from the 3rd column on.

Read in file contents to Arrays

I am a bit familiar with Python. I have a file with information that I need to read in a very specific way. Below is an example...
1
6
0.714285714286
0 0 1.00000000000
0 1 0.61356352337
...
-1 -1 0.00000000000
0 0 5.13787636499
0 1 0.97147643932
...
-1 -1 0.00000000000
0 0 5.13787636499
0 1 0.97147643932
...
-1 -1 0.00000000000
0 0 0 0 5.13787636499
0 0 0 1 0.97147643932
....
So every file will have this structure (tab delimited).
The first line must be read in as a variable as well as the second and third lines.
Next we have four blocks of code separated by a -1 -1 0.0000000000. Each block of code is 'n' lines long. The first two numbers represent the position/location that the 3rd number in the line is to be inserted in an array. Only the unique positions are listed (so, position 0 1 would be the same as 1 0 but that information would not be shown).
Note: The 4th block of code has a 4-index number.
What I need
The first 3 lines read in as unique variables
Each block of data read into an array using the first 2 (or 4 ) column of numbers as the array index and the 3rd column as the value being inserted into an array.
Only unique array elements shown. I need the mirrored position to be filled with the proper value as well (a 0 1 value should also appear in 1 0).
The last block would need to be inserted into a 4-dimensional array.
I rewrote the code. Now it's almost what you need. You only need fine tuning.
I decided to leave the old answer - perhaps it would be helpful too.
Because the new is feature-rich enough, and sometimes may not be clear to understand.
def the_function(filename):
"""
returns tuple of list of independent values and list of sparsed arrays as dicts
e.g. ( [1,2,0.5], [{(0.0):1,(0,1):2},...] )
on fail prints the reason and returns None:
e.g. 'failed on text.txt: invalid literal for int() with base 10: '0.0', line: 5'
"""
# open file and read content
try:
with open(filename, "r") as f:
data_txt = [line.split() for line in f]
# no such file
except IOError, e:
print 'fail on open ' + str(e)
# try to get the first 3 variables
try:
vars =[int(data_txt[0][0]), int(data_txt[1][0]), float(data_txt[2][0])]
except ValueError,e:
print 'failed on '+filename+': '+str(e)+', somewhere on lines 1-3'
return
# now get arrays
arrays =[dict()]
for lineidx, item in enumerate(data_txt[3:]):
try:
# for 2d array data
if len(item) == 3:
i, j = map(int, item[:2])
val = float(item[-1])
# check for 'block separator'
if (i,j,val) == (-1,-1,0.0):
# make new array
arrays.append(dict())
else:
# update last, existing
arrays[-1][(i,j)] = val
# almost the same for 4d array data
if len(item) == 5:
i, j, k, m = map(int, item[:4])
val = float(item[-1])
arrays[-1][(i,j,k,m)] = val
# if value is unparsable like '0.00' for int or 'text'
except ValueError,e:
print 'failed on '+filename+': '+str(e)+', line: '+str(lineidx+3)
return
return vars, arrays
As i anderstand what did you ask for..
# read data from file into list
parsed=[]
with open(filename, "r") as f:
for line in f:
# # you can exclude separator here with such code (uncomment) (1)
# # be careful one zero more, one zero less and it wouldn work
# if line == '-1 -1 0.00000000000':
# continue
parsed.append(line.split())
# a simpler version
with open(filename, "r") as f:
# # you can exclude separator here with such code (uncomment, replace) (2)
# parsed = [line.split() for line in f if line != '-1 -1 0.00000000000']
parsed = [line.split() for line in f]
# at this point 'parsed' is a list of lists of strings.
# [['1'],['6'],['0.714285714286'],['0', '0', '1.00000000000'],['0', '1', '0.61356352337'] .. ]
# ALT 1 -------------------------------
# we do know the len of each data block
# get the first 3 lines:
head = parsed[:3]
# get the body:
body = parsed[3:-2]
# get the last 2 lines:
tail = parsed[-2:]
# now you can do anything you want with your data
# but remember to convert str to int or float
# first3 as unique:
unique0 = int(head[0][0])
unique1 = int(head[1][0])
unique2 = float(head[2][0])
# cast body:
# check each item of body has 3 inner items
is_correct = all(map(lambda item: len(item)==3, body))
# parse str and cast
if is_correct:
for i, j, v in body:
# # you can exclude separator here (uncomment) (3)
# # * 1. is the same as float(1)
# if (i,j,v) == (0,0,1.):
# # here we skip iteration for line w/ '-1 -1 0.0...'
# # but you can place another code that will be executed
# # at the point where block-termination lines appear
# continue
some_body_cast_function(int(i), int(j), float(v))
else:
raise Exception('incorrect body')
# cast tail
# check each item of body has 5 inner items
is_correct = all(map(lambda item: len(item)==5, tail))
# parse str and cast
if is_correct:
for i, j, k, m, v in body: # 'l' is bad index, because similar to 1.
some_tail_cast_function(int(i), int(j), int(k), int(m), float(v))
else:
raise Exception('incorrect tail')
# ALT 2 -----------------------------------
# we do NOT know the len of each data block
# maybe we have some array?
array = dict() # your array may be other type
v1,v2,v2 = parsed[:3]
unique0 = int(v1[0])
unique1 = int(v2[0])
unique2 = float(v3[0])
for item in parsed[3:]:
if len(item) == 3:
i,j,v = item
i = int(i)
j = int(j)
v = float(v)
# # yo can exclude separator here (uncomment) (4)
# # * 1. is the same as float(1)
# # logic is the same as in 3rd variant
# if (i,j,v) == (0,0,1.):
# continue
# do your stuff
# for example,
array[(i,j)]=v
array[(j,i)]=v
elif len(item) ==5:
i, j, k, m, v = item
i = int(i)
j = int(j)
k = int(k)
m = int(m)
v = float(v)
# do your stuff
else:
raise Exception('unsupported') # or, maybe just 'pass'
To read lines from a file iteratively, you can use something like:
with open(filename, "r") as f:
var1 = int(f.next())
var2 = int(f.next())
var3 = float(f.next())
for line in f:
do some stuff particular to the line we are on...
Just create some data structures outside the loop, and fill them in the loop above. To split strings into elements, you can use:
>>> "spam ham".split()
['spam', 'ham']
I also think you want to take a look at the numpy library for array datastructures, and possible the SciPy library for analysis.

Categories

Resources