Hi everyone.
I would say this is the first task where I don't have a clear idea of where to start:
Create a text file (using an editor, not necessarily Python)
containing two tab-separated columns, with each column containing a
number. Then use Python to read through the file you've created. For
each line, multiply the first number by the second, and then sum
the results from all the lines. Ignore any line that doesn't contain
two numeric columns.
So far I have written a couple of lines, but I am not sure where to go next:
filename = 'path'

def sum_columns(filename):
    sum = 0
    multiply = 0
    with open(filename) as f:
Should I split each line into its two columns and build a list from them, or should I do something else?
Thank you in advance
Here is a short solution:
def sum_columns(filename):
    counter = 0
    with open(filename) as file:
        for line in file:
            try:
                a, b = [int(x) for x in line.split('\t')]
                counter += a * b
            except ValueError:
                continue
    return counter

file_name = 'myfile.txt'
print(sum_columns(file_name))
This is what several people (@martineau being the first) suggested in the comments (and also something I only just learned), so I decided to put it in an answer.
Basically, the loop iterates over each line and, for each line, builds a list of two integers. The list comprehension does the conversion: without it both values would still be strings, so you couldn't multiply them. The two values are unpacked in the same step, which is convenient because a single except clause is enough: the only error we reasonably expect is a ValueError, raised either because the line can't be unpacked into exactly two values or because a value can't be converted to an integer. The two numbers are then multiplied, the product is added to the counter, and after the loop the counter is returned.
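Both failure modes can be checked quickly in the interpreter (Python 3 assumed, tracebacks trimmed to the final line):
>>> a, b = [int(x) for x in "1\tx".split('\t')]
ValueError: invalid literal for int() with base 10: 'x'
>>> a, b = [int(x) for x in "7".split('\t')]
ValueError: not enough values to unpack (expected 2, got 1)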
You can do this in quite a few ways, given the exercise text. In my opinion, a good way would be something like this:
filename = 'path'

def sum_columns(filename):
    total = 0
    with open(filename) as f:
        all_lines = f.readlines()
    for line in all_lines:
        splitted = line.split("\t")
        total += int(splitted[0]) * int(splitted[1])
    return total
You'll get all the lines of the file in all_lines; then you can iterate through every line, split it on the tab, multiply the two values, and add the product to the total variable you initialized to 0, which you return at the end. As hinted by someone else, you could also read the file line by line without keeping every line in a list, but if the file is relatively small, you can go with my option.
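For completeness, a minimal sketch of that line-by-line variant (same behaviour, just without building the all_lines list first):
def sum_columns(filename):
    total = 0
    with open(filename) as f:
        for line in f:
            # split on the tab and accumulate the product of the two columns
            splitted = line.split("\t")
            total += int(splitted[0]) * int(splitted[1])
    return total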
If you have a file like this:
1 2
2 4
4 8
You can do the following:
from functools import reduce

def is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

filename = 'path'

def sum_columns(filename):
    with open(filename) as f:
        lines = f.readlines()
    return sum([
        reduce(lambda x, y: x * y, map(int, line.split("\t")))
        for line in lines
        if len(list(filter(is_int, line.split("\t")))) == 2
    ])
Explanation:
At the top I define a helper function that determines whether a string can be converted into an int or not. This will be used later to ignore lines that don't have 2 numbers. It's based on this answer:
def is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False
Then, we open the file and read all lines into a variable. This is not the most efficient approach, as the file can be processed line by line without storing the whole file, but for smaller files this is negligible.
with open(filename) as f:
    lines = f.readlines()
Next, is a single operation to perform your query, but let's break it down:
First, we iterate through all the lines:
for line in lines
Next, we only keep the lines that have exactly two numbers separated by tabs:
if len(list(filter(is_int, line.split("\t")))) == 2
Finally, we turn each number in the line into ints, and multiply them all together:
reduce(lambda x, y: x * y, map(int,line.split("\t")))
We then sum all of these and return the result.
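For instance, this is what the reduce call produces on a single sample line (a quick interactive check, assuming the two columns are 3 and 4):
>>> from functools import reduce
>>> reduce(lambda x, y: x * y, map(int, "3\t4".split("\t")))
12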
Performance consideration
If performance is a concern, you can achieve the same thing reading the contents line by line, instead of pulling the whole file into a variable. It is less elegant, but more efficient:
def sum_columns(filename):
    total = 0
    with open(filename) as f:
        for line in f:
            if len(list(filter(is_int, line.split("\t")))) != 2:
                continue
            total += reduce(lambda x, y: x * y, map(int, line.split("\t")))
    return total
(Note that you still need the import and the is_int helper from the example above.)
input.txt
1 3
2 6
3 7
7 12
8
script.py
with open('input.txt') as f:
    total = 0
    for line in f:
        numbers = line.split('\t')
        try:
            line_value = int(numbers[0]) * int(numbers[1])
        except IndexError:
            # the line doesn't contain two numbers
            continue
        except ValueError:
            # a value couldn't be converted to a number
            continue
        total += line_value

print(total)
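With the sample input.txt above, total ends up as 1*3 + 2*6 + 3*7 + 7*12 = 120; the lone "8" line is skipped by the IndexError handler, so print(total) shows 120.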
Related
I would like to write a script that reads a text file line by line and, based on the lines, populates an array if it finds a certain parameter. The idea is this:
Read line
if Condition 1
    # True
    nested if Condition 2
    ...
else Condition 1 is not true
    read next line
I can't get it to work, though. I'm using readline() to read the text line by line, but the main problem is that I never manage to make it read the next line. Can you help me? Below is an extract of my actual code:
col = 13  # colonne
rig = 300  # righe
a = [[None for x in range(col)] for y in range(rig)]
counter = 1

file = open('temp.txt', 'r')
files = file.readline()
for line in files:
    if 'bandEUTRA: 32' in line:
        if 'ca-BandwidthClassDL-EUTRA: a' in line:
            a[counter][5] = 'DLa'
            counter = counter + 1
        else:
            next(files)
    else:
        next(files)
print('\n'.join(map(str, a)))
Fixes for the code you asked about inline, and some other associated cleanup, with comments:
import sys  # only needed for the sys.stdout.writelines option shown at the end

col = 13  # colonne
rig = 300  # righe
a = [[None] * col for y in range(rig)]  # The innermost repeated element is immutable, so it
                                        # can use multiplication; just don't do it for the
                                        # outer list(s), see: https://stackoverflow.com/q/240178/364696
counter = 1

with open('temp.txt') as file:  # Use a with statement to get guaranteed file closure; 'r' is the implicit mode and can be omitted
    # Removed: files = file.readline()  # This makes no sense; files would be a single line from the file, but your original code treats it as the lines of the file
    # Replaced: for line in files:  # Since files was a single str, this iterated characters of the file
    for line in file:  # File objects are iterators of their own lines, so you can get the lines one by one this way
        if 'bandEUTRA: 32' in line and 'ca-BandwidthClassDL-EUTRA: a' in line:  # Perform both tests in a single if to minimize the arrow pattern
            a[counter][5] = 'DLa'
            counter += 1  # May as well not say "counter" twice and use +=
        # All next() code removed; next() advances an iterator and returns the next value,
        # but files was not an iterator, so it was nonsensical, and the new code uses a for
        # loop that advances the file for you, so it is unnecessary.
        # If the goal is to intentionally skip the next line under some conditions, you *could*
        # use next(file, None) to advance the iterator so the for loop will skip it, but
        # it's rare that a line *failing* a test means you don't want to look at the next line,
        # so you probably don't want it.

# This works:
print('\n'.join(map(str, a)))

# But it's even simpler to spell it as:
print(*a, sep="\n")

# which lets print do the work of stringifying and inserting the separator, avoiding
# the need to make a potentially huge string in memory; it *might* still do so (no documented
# guarantees), but if you want to avoid that possibility, you could do:
sys.stdout.writelines(map('{}\n'.format, a))

# which technically doesn't guarantee it, but definitely operates lazily, or:
for x in a:
    print(x)
# which is 100% guaranteed not to make any huge strings
You can do:
with open("filename.txt", "r") as f:
for line in f:
clean_line = line.rstrip('\r\n')
process_line(clean_line)
Edit:
for your application of populating an array, you could do something like this:
with open("filename.txt", "r") as f:
contains = ["text" in l for l in f]
This will give you a list whose length is the number of lines in filename.txt; each entry will be False for a line that doesn't contain "text" and True for a line that does.
Edit 2: To reflect @ShadowRanger's comments, I've changed my code to iterate over each line in the file without reading the whole thing in at once.
I am a beginner in Python, and I have a problem reading my numeric data file, which contains many lines. Each row in the input file contains a counter number, three float numbers, and finally a letter, all separated by spaces. It looks like this:
1 12344567.143 12345678.154 1234.123 w
2 23456789.231 23413456.342 4321.321 f
I want to assign each item in a line to a specific variable that I can use in later steps,
like this: "NO" = first item, "X" = second item, "Y" = third item, "code" = fourth item.
I am trying to write it as follows:
f1=open('t1.txt','r')
line: float
for line in f1:
print(line.split(', ',4))
f1=float
select(line.split('')
nob: object(1)=int(line[1, 1])
cnt = 0
print(nob)
cnt +=1
but I receive more errors each time I run the program. Can anyone help me?
The error is probably due to the wrong indentation: in Python indentation is part of the syntax. It would be helpful if you also included the error message in your question.
How about this:
all_first_numbers = []

with open('t1.txt', 'r') as f:
    for line in f:
        values = line.split()
        first_number = int(values[0])
        second_number = float(values[1])
        letter_code = values[4]

        # If you want to save all the first numbers in one array:
        all_first_numbers.append(first_number)
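If you want names closer to the ones in the question (NO, X, Y, code), here is a minimal sketch under the assumption that NO is the counter, X and Y are the first two floats, and code is the letter; the question names four items but the sample rows show five fields, so the third float is kept under an assumed name Z:
with open('t1.txt') as f:
    for line in f:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip malformed rows
        NO = int(fields[0])     # counter
        X = float(fields[1])    # first float
        Y = float(fields[2])    # second float
        Z = float(fields[3])    # third float (not named in the question)
        code = fields[4]        # letter code
        print(NO, X, Y, Z, code)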
Code:
with open("filename.txt" 'r') as f: #I'm not sure about reading it as r because I would be removing lines.
lines = f.readlines() #stores each line in the txt into 'lines'.
invalid_line_count = 0
for line in lines: #this iterates through each line of the txt file.
if line is invalid:
# something which removes the invalid lines.
invalid_line_count += 1
print("There were " + invalid_line_count + " amount of invalid lines.")
I have a text file like so:
1,2,3,0,0
2,3,0,1,0
0,0,0,1,2
1,0,3,0,0
3,2,1,0,0
The valid line structure is 5 values split by commas.
For a line to be valid, it must have a 1, 2, 3 and two 0's. It doesn't matter in what position these numbers are.
An example of a valid line is 1,2,3,0,0
An example of an invalid line is 1,0,3,0,0, as it does not contain a 2 and has 3 0's instead of 2.
I would like to be able to iterate through the text file and remove invalid lines.
and maybe a little message saying "There were x amount of invalid lines."
Or maybe as suggested:
As you read each line from the original file, test it for validity. If it passes, write it out to the new file. When you're finished, rename the original file to something else, then rename the new file to the original file.
I thought the csv module might help, so I read its documentation, but it didn't help me.
Any ideas?
You can't remove lines from a file, per se. Rather, you have to rewrite the file, including only the valid lines. Either close the file after you've read all the data and reopen it in mode "w", or write to a new file as you process the lines (which takes less memory in the short term).
Your main problem with detecting line validity seems to be handling the input. You want to convert the input text to a list of values; this is a skill you should get from learning your tools. The ones you need here are split to divide the line, and int to convert the values. For instance:
line_vals = line.split(',')
Now iterate through line_vals, and convert each to integer with int.
Validity: you need to count the quantity of each value you have in this list. You should be able to count things by value; if not, back up to your prior lessons and review basic logic and data flow. If you want the advanced method for this, use collections.Counter, which is a convenient type of dictionary that accumulates counts from any sequence.
Does that get you moving? If you're still lost, I recommend some time with a local tutor.
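To make that concrete, here is a minimal sketch of the validity test using collections.Counter (the is_valid name is mine, not from the question):
from collections import Counter

def is_valid(line):
    # A valid line has exactly the values 1, 2, 3 and two 0s, in any order.
    try:
        values = [int(v) for v in line.strip().split(',')]
    except ValueError:
        return False
    return Counter(values) == Counter([1, 2, 3, 0, 0])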
One of the possible correct approaches:
with open('filename.txt', 'r+') as f:  # opening file in read/write mode
    inv_lines_cnt = 0
    valid_list = [0, 0, 1, 2, 3]  # sorted list of valid values
    lines = f.read().splitlines()
    f.seek(0)
    f.truncate(0)  # truncating the initial file
    for l in lines:
        if sorted(map(int, l.split(','))) == valid_list:
            f.write(l + '\n')
        else:
            inv_lines_cnt += 1

print("There were {} amount of invalid lines.".format(inv_lines_cnt))
The output:
There were 2 amount of invalid lines.
The final filename.txt contents:
1,2,3,0,0
2,3,0,1,0
3,2,1,0,0
This is a mostly language-independent problem. What you would do is open another file for writing. As you read each line from the original file, test it for validity. If it passes, write it out to the new file. When you're finished, rename the original file to something else, then rename the new file to the original file.
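For example, a minimal sketch of that write-then-rename approach (the filenames and the is_valid helper are placeholders, not taken from the question):
import os

def is_valid(line):
    # The rule from the question: the values must be exactly 1, 2, 3 and two 0s.
    return sorted(line.strip().split(',')) == ['0', '0', '1', '2', '3']

invalid = 0
with open('data.txt') as src, open('data.txt.tmp', 'w') as dst:
    for line in src:
        if is_valid(line):
            dst.write(line)
        else:
            invalid += 1

os.replace('data.txt.tmp', 'data.txt')  # swap the filtered file in place of the original
print("There were {} invalid lines.".format(invalid))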
For a line to be valid, each line must have a 1, 2, 3 and two 0's. It doesn't matter in what position these numbers are.
CHUNK_SIZE = 65536

def _is_valid(line):
    """Check if a line is valid.

    A line is valid if it is of length 5 and contains '1', '2', '3',
    in any order, as well as '0', twice.

    :param list line: The line to check, as a list of values.
    :return: True if the line is valid, else False.
    :rtype: bool
    """
    if len(line) != 5:
        # If there aren't exactly five elements in the line, return False.
        return False
    if all(x in line for x in {"1", "2", "3"}) and line.count("0") == 2:
        # Builtin `all` checks if a condition (in this case `x in line`)
        # applies to all elements of a certain iterator.
        # `list.count` returns the number of times a specific
        # element appears in it. If "0" appears exactly twice in the line
        # and the `all` call returns True, the line is valid.
        return True
    # If the previous block doesn't execute, the line isn't valid.
    return False

def get_valid_lines(path):
    """Get the valid lines from a file.

    The valid lines will be written back to `path`.

    :param str path: The path to the file.
    :return: None
    :rtype: None
    """
    invalid_lines = 0
    contents = []
    valid_lines = []
    with open(path, "r") as f:
        # Open the `path` parameter in reading mode.
        while True:
            chunk = f.read(CHUNK_SIZE)
            # Read `CHUNK_SIZE` bytes (65536) from the file.
            if not chunk:
                # Reaching the end of the file, we get an EOF.
                break
            contents.append(chunk)
            # If the chunk is not empty, add it to the contents.
    contents = "".join(contents).split("\n")
    # `contents` holds chunks of size 65536. We need to join them
    # using `str.join`. We then split all of this by newlines, to get
    # each individual line.
    for line in contents:
        if not _is_valid(line=line.split(",")):
            # Split the line on commas so `_is_valid` receives a list of values.
            invalid_lines += 1
        else:
            valid_lines.append(line)
    print("Found {} invalid lines".format(invalid_lines))
    with open(path, "w") as f:
        for line in valid_lines:
            f.write(line)
            f.write("\n")
I'm splitting this up into two functions, one to check if a line is valid according to your rules, and a second one to manipulate a file. If you want to return the valid lines instead, just remove the second with statement and replace it with return valid_lines.
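Calling it is then a single line (the filename here is just an assumption):
get_valid_lines("filename.txt")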
I would like to delete a specific line and re-assign the line numbers, e.g.:
0,abc,def
1,ghi,jkl
2,mno,pqr
3,stu,vwx
What I want: if line 1 is the line that needs to be deleted, then the output should be:
0,abc,def
1,mno,pqr
2,stu,vwx
What I have done so far:
f = open(file, 'r')
lines = f.readlines()
f.close()
f = open(file, 'w')
for line in lines:
    if line.rsplit(',')[0] != 'line#':
        f.write(line)
f.close()
The lines above can delete a specific line#, but I don't know how to rewrite the line numbers before the first ','.
Here is a function that will do the job.
def removeLine(n, file):
    f = open(file, "r+")
    d = f.readlines()
    f.seek(0)
    for i in range(len(d)):
        if i > n:
            f.write(d[i].replace(d[i].split(",")[0], str(i - 1)))
        elif i != n:
            f.write(d[i])
    f.truncate()
    f.close()
Where the parameters n and file are the line you wish to delete and the filepath respectively.
This is assuming the line numbers are written in the line as implied by your example input.
If the number of the line is not included at the beginning of each line, as some other answers have assumed, simply remove the first if statement:
if i > n:
    f.write(d[i].replace(d[i].split(",")[0], str(i - 1)))
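With the full version above, deleting line 1 from the question's sample file would then be, for example (the filename is just a placeholder):
removeLine(1, 'data.txt')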
I noticed that your account wasn't created in the past few hours, so I figure that there's no harm in giving you the benefit of the doubt. You will really have more fun on StackOverflow if you spend the time to learn its culture.
I wrote a solution that fits your question's criteria on a file that's already written (you mentioned that you're opening a text file), so I assume it's a CSV.
I figured that I'd answer your question differently than the other solutions that implement the CSV reader library and use a temporary file.
import re

numline_csv = re.compile(r"\d,")

# substitute your actual file opening here
so_31195910 = """
0,abc,def
1,ghi,jkl
2,mno,pqr
3,stu,vwx
"""
so = so_31195910.splitlines()

# this could be an input or whatever you need
delete_line = 1

line_bank = []
for l in so:
    if l and not l.startswith(str(delete_line) + ','):
        print(l)
        l = re.split(numline_csv, l)
        line_bank.append(l[1])

so = []
for i, l in enumerate(line_bank):
    so.append("%s,%s" % (i, l))
And the output:
>>> so
['0,abc,def', '1,mno,pqr', '2,stu,vwx']
In order to get a line number for each line, you should use the enumerate built-in...
for line_index, line in enumerate(lines):
    # line_index is 0 for the first line, 1 for the 2nd line, etc.
In order to separate the first element of the string from the rest of the string, I suggest passing a value for maxsplit to the split method.
>>> '0,abc,def'.split(',')
['0', 'abc', 'def']
>>> '0,abc,def'.split(',',1)
['0', 'abc,def']
>>>
Once you have those two, it's just a matter of concatenating line_index to split(',',1)[1].
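Putting those two pieces together, a minimal sketch (the filename and the line_to_delete variable are placeholders, and it assumes every line starts with a number):
line_to_delete = 1  # the leading number of the line to remove

with open('data.txt') as f:
    kept = [l for l in f if int(l.split(',', 1)[0]) != line_to_delete]

with open('data.txt', 'w') as f:
    for line_index, line in enumerate(kept):
        # renumber: new index, then everything after the original first comma
        f.write("%d,%s" % (line_index, line.split(',', 1)[1]))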
I have a text file I wish to analyze. I'm trying to find every line that contains certain characters (ex: "#") and then print the line located 3 lines before it (ex: if line 5 contains "#", I would like to print line 2)
This is what I got so far:
file = open('new_file.txt', 'r')
a = list()
x = 0
for line in file:
    x = x + 1
    if '#' in line:
        a.append(x)
        continue
x = 0
for index, item in enumerate(a):
    for line in file:
        x = x + 1
        d = a[index]
        if x == d - 3:
            print line
            continue
It won't work (it prints nothing when I feed it a file that has lines containing "#"), any ideas?
First, you are going through the file multiple times without re-opening it for subsequent passes. That means all subsequent attempts to iterate the file will terminate immediately without reading anything.
Second, your indexing logic is a little convoluted. Assuming your files are not huge relative to your memory size, it is much easier to simply read the whole file into memory (as a list) and manipulate it there.
myfile = open('new_file.txt', 'r')
a = myfile.readlines()

for index, item in enumerate(a):
    if '#' in item and index - 3 >= 0:
        print a[index - 3].strip()
This has been tested on the following input:
PrintMe
PrintMe As Well
Foo
#Foo
Bar#
hello world will print
null
null
##
Ok, the issue is that you have already iterated completely through the file descriptor file in line 4 when you try again in line 11. So line 11 will make an empty loop. Maybe it would be a better idea to iterate the file only once and remember the last few lines...
file = open('new_file.txt', 'r')
a = ["", "", ""]
for line in file:
    if "#" in line:
        print(a[0], end="")
    a.append(line)
    a = a[1:]
For file IO it is usually most efficient, in both programmer time and runtime, to use regexes to match patterns, in combination with iterating through the lines of the file. Framed that way, your problem really isn't a problem.
import re

file = open('new_file.txt', 'r')
document = file.read()
lines = document.split("\n")

LinesOfInterest = []
for lineNumber, line in enumerate(lines):
    WhereItsAt = re.search(r'#', line)
    if lineNumber > 2 and WhereItsAt:
        LinesOfInterest.append(lineNumber - 3)

print LinesOfInterest

for lineNumber in LinesOfInterest:
    print(lines[lineNumber])
Lines of Interest is now a list of line numbers matching your criteria
I used
line1,0
line2,0
line3,0
#
line1,1
line2,1
line3,1
#
line1,2
line2,2
line3,2
#
line1,3
line2,3
line3,3
#
as input yielding
[0, 4, 8, 12]
line1,0
line1,1
line1,2
line1,3