for loops on text files - python

I'm writing a large program, and one of the small things it needs to do is go over a text file that is divided into lines.
I need it to start a new list of lines every time a line is empty. For example, if the text is: (each number is a new line)
1
2
3
4

5
6
3

1
2
It should build 3 different lists: [1,2,3,4], [5,6,3], [1,2].
This is my code so far (just getting started):
new_list=[]
my_list=[]
doc=open(filename, "r")
for line in doc:
    line=line.rstrip()
    if line !="":
        new_list.append(line)
return new_list

OK, this should work now:
initial_list, temp_list = [], []
for line in open(filename):
    if line.strip() == '':
        # blank line: close the current group and start a new one
        initial_list.append(temp_list)
        temp_list = []
    else:
        temp_list.append(line.strip())
if len(temp_list) > 0:
    initial_list.append(temp_list)
# drop empty groups produced by consecutive blank lines
final_list = [item for item in initial_list if len(item) > 0]
print(final_list)

You could do something like:
[x.split() for x in fileobject if x.strip()]
To get integers, you could use map (wrapped in list so that it also produces lists on Python 3):
[list(map(int, x.split())) for x in fileobject if x.strip()]
where fileobject is the object returned by open. This is probably best done in a context manager:
with open(filename) as fileobject:
    data_list = [list(map(int, x.split())) for x in fileobject if x.strip()]
Reading some of the comments on the other post, it seems that I also didn't understand your question properly. Here's my stab at correcting it:
with open(filename) as fileobject:
    current = []
    result = [current]
    for line in fileobject:
        if line.strip(): #Non-blank line -- Extend current working list.
            current.extend(map(int,line.split()))
        else: #blank line -- Start new list to work with
            current = []
            result.append(current)
Now your resulting list should be contained in result.
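As a possible alternative, the same grouping can be sketched with itertools.groupby from the standard library. The function name split_on_blank_lines and the in-memory sample list are made up for illustration; in practice the lines would come from the open file object.

```python
from itertools import groupby

def split_on_blank_lines(lines):
    """Group consecutive non-blank lines into lists of ints."""
    groups = []
    # groupby yields runs of adjacent lines that share the same key;
    # the key is True for non-blank lines and False for blank ones.
    for has_text, run in groupby(lines, key=lambda line: bool(line.strip())):
        if has_text:
            groups.append([int(line) for line in run])
    return groups

# Hypothetical sample standing in for the lines of the text file.
sample = ["1\n", "2\n", "3\n", "4\n", "\n", "5\n", "6\n", "3\n", "\n", "1\n", "2\n"]
print(split_on_blank_lines(sample))  # [[1, 2, 3, 4], [5, 6, 3], [1, 2]]
```

Because blank runs are simply skipped, several consecutive blank lines do not produce empty lists, so no clean-up pass is needed afterwards.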

Related

Splitting a special type of list data and saving data into two separate dataframe using condition in python

I want to separate list data into two parts based on a condition. If the value is less than "H1000", it should go into a first dataframe (output for list 1); if it is greater than or equal to "H1000", it should go into a second dataframe (output for list 2). The first column holds values that start with H followed by four digits.
Here is my Python code:
with open(fn) as f:
    text = f.read().strip()
print(text)
lines = [[(Path(fn.name), line_no + 1, col_no + 1, cell) for col_no, cell in enumerate(
    re.split('\t', l.strip())) if cell != ''] for line_no, l in enumerate(re.split(r'[\r\n]+', text))]
print(lines)
if (lines[:][:][3] == "H1000"):
    list1
    list2
I am not able to write the Python logic to divide the list data into two parts.
So basically you want to check whether the number after the H is below 1000 or not, right? If so, you can do it like this:
with open(fn) as f:
    text = f.read().strip()
print(text)
lines = [[(Path(fn.name), line_no + 1, col_no + 1, cell) for col_no, cell in enumerate(
    re.split('\t', l.strip())) if cell != ''] for line_no, l in enumerate(re.split(r'[\r\n]+', text))]
print(lines)
list1, list2 = [], []
for row in lines:
    # each entry is (path, line_no, col_no, cell); this assumes the file is
    # tab-separated, so the first cell is the code, e.g. "H0999"
    value = row[0][3]
    if value[1:].isdigit():
        if int(value[1:]) < 1000:
            list1.append(row)  # list 1
        else:
            list2.append(row)  # list 2
We simply take the numeric part of the "Hxxxx" code with a slice, convert it to an integer, and compare it with 1000.
with open(fn) as f:
    text = f.read().strip()
lines = text.split('\n')
list1 = []
list2 = []
for i in lines:
    if int(i.split(' ')[0].replace("H", "")) >= 1000:
        list2.append(i)
    else:
        list1.append(i)
print(list1)
print("***************************************")
print(list2)
I'm not sure exactly where the problem lies. Assuming you read the above text file line by line, you can simply compare the five-character code as a string (str.__lt__) to check your condition, e.g.
lines = """
H0002 Version 3
H0003 Date_generated 5-Aug-81
H0004 Reporting_period_end_date 09-Jun-99
H0005 State WAA
H0999 Tene_no/Combined_rept_no E79/38975
H1001 Tene_holder Magnetic Resources NL
""".strip().split("\n")
# Or
# with open(fn) as f: lines = f.readlines()
list_1, list_2 = [], []
for line in lines:
    # the codes are zero-padded, so string order matches numeric order
    if line[:5] < "H1000":
        list_1.append(line)
    else:
        list_2.append(line)
print(list_1, list_2, sep="\n")
# ['H0002 Version 3', 'H0003 Date_generated 5-Aug-81', 'H0004 Reporting_period_end_date 09-Jun-99', 'H0005 State WAA', 'H0999 Tene_no/Combined_rept_no E79/38975']
# ['H1001 Tene_holder Magnetic Resources NL']

Do not add "," after last field in CSV

f=open('students.csv', 'r')
a=f.readline()
length=len(a.split(","))
fw=open('output.csv', 'w')
lst = []
while a:
    lst.append(a)
    a=f.readline()
for counter in range(length):
    for item in lst:
        x = len(item.split(","))
        if x == length:
            x = item.split(",")
            #here i want if condition to check whether it is the last element of row and add","?
            fw.write(x[counter].split("\n")[0]+",")
            #elif the condition that it is the last element of each row to not add ","?
    fw.write("\n")
fw.close()
f.close()
join will be your friend here, if you cannot use the csv module:
for counter in range(length):
    fw.write(','.join(x[counter] for x in (item.split(',') for item in lst)))
    fw.write('\n')
But you should first strip the end of line characters:
a=f.readline().strip()
length=len(a.split(","))
fw=open('output.csv', 'w')
lst = []
while a:
    lst.append(a)
    a=f.readline().strip()
But your code is neither Pythonic nor efficient.
You split the same string in every iteration of counter, when you could have split it once at read time. Next, the Pythonic way to iterate over the lines of a text file is to iterate over the file object itself. And finally, with ensures that the files will be properly closed at the end of the block. Your code could become:
with open('students.csv', 'r') as f, open('output.csv', 'w') as fw:
    # split every line once, at read time
    lst = [a.strip().split(',') for a in f]
    length = len(lst[0])
    for counter in range(length):
        # join puts "," only between fields, never after the last one
        fw.write(','.join(x[counter] for x in lst))
        fw.write('\n')
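If the csv module is allowed after all, the same column-by-column rewrite could be sketched as below. This is only an illustration under assumptions: the function name transpose_csv and the sample data are invented, and in-memory StringIO objects stand in for students.csv and output.csv.

```python
import csv
import io

def transpose_csv(source, target):
    """Write the columns of `source` as the rows of `target`."""
    rows = list(csv.reader(source))
    writer = csv.writer(target, lineterminator="\n")
    # zip(*rows) groups the i-th field of every row, i.e. one column.
    # Note: zip stops at the shortest row, so ragged rows are truncated.
    writer.writerows(zip(*rows))

# Hypothetical sample standing in for students.csv.
source = io.StringIO("name,age\nann,20\nbob,22\n")
target = io.StringIO()
transpose_csv(source, target)
print(target.getvalue())
```

The writer places separators only between fields, so no trailing comma appears. With real files, the csv documentation recommends opening them with newline="".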

Extract from current position until end of file

I want to pull all data from a text file from a specified line number until the end of a file. This is how I've tried:
def extract_values(f):
    line_offset = []
    offset = 0
    last_line_of_heading = False
    if not last_line_of_heading:
        for line in f:
            line_offset.append(offset)
            offset += len(line)
            if whatever_condition:
                last_line_of_heading = True
    f.seek(0)
    # non-functioning pseudocode follows
    data = f[offset:] # read from current offset to end of file into this variable
There is actually a blank line between the header and the data I want, so ideally I could skip this also.
Do you know the line number in advance? If so,
def extract_values(f, line_number):
    # line_number: zero-based index of the first line to keep
    return f.readlines()[line_number:]
If not, and you need to determine the line number based on the content of the file itself,
def extract_values(f):
    lines = f.readlines()
    for line_number, line in enumerate(lines):
        if some_condition(line):
            return lines[line_number:]
    return []  # no line satisfied the condition
This will not be ideal if your files are enormous (since the lines of the file are loaded into memory); in that case, you might want to do it in two passes, only storing the file data on the second pass.
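For large files, one way to avoid holding everything in memory is a single lazy pass rather than two passes. The sketch below uses itertools.dropwhile; the predicate is_last_heading_line is a hypothetical stand-in for whatever identifies the end of the header, and the sample text is invented.

```python
import io
from itertools import dropwhile

def extract_values(f, is_last_heading_line):
    """Return an iterator over the data lines that follow the heading."""
    # Skip lines until the last heading line is reached...
    rest = dropwhile(lambda line: not is_last_heading_line(line), f)
    next(rest, None)  # ...then discard that heading line itself.
    # Skip the blank separator line(s) between heading and data.
    return dropwhile(lambda line: not line.strip(), rest)

# Hypothetical sample: two heading lines, a blank line, then data.
sample = io.StringIO("title\nunits\n\n1 2\n3 4\n")
data = list(extract_values(sample, lambda line: line.startswith("units")))
print(data)  # ['1 2\n', '3 4\n']
```

The returned iterator reads from the file on demand, so it has to be consumed while the file is still open.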
Your if clause is in the wrong position; it belongs inside the loop:
for line in f:
    if not last_line_of_heading:
Consider this code:
def extract_values(f):
    rows = []
    last_line_of_heading = False
    for line in f:
        if last_line_of_heading:
            rows.append(line)
        elif whatever_condition:
            last_line_of_heading = True
    # if you want a string instead of a list of lines
    # (each line already ends with its own newline):
    data = "".join(rows)
    return data
You can use enumerate:
f = open('your_file')
for i, x in enumerate(f):
    if i >= your_line:
        pass  # do your stuff
Here i holds the line number, starting from 0, and x holds the line.
Using a list comprehension:
[x for i, x in enumerate(f) if i >= your_line]
gives you a list of the lines from the specified line onward.
Using a dictionary comprehension:
{i: x for i, x in enumerate(f) if i >= your_line}
gives you the line number as key and the line as value, from the specified line number onward.
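A related option is itertools.islice, which skips the leading lines without testing an index on every iteration. This is a minimal sketch; the helper name lines_from and the sample text are made up for illustration.

```python
import io
from itertools import islice

def lines_from(f, first_line):
    """Return the lines of f starting at zero-based index first_line."""
    # islice consumes and discards the first `first_line` lines,
    # then yields everything up to the end of the file.
    return list(islice(f, first_line, None))

# Hypothetical sample standing in for an open text file.
sample = io.StringIO("a\nb\nc\nd\n")
print(lines_from(sample, 2))  # ['c\n', 'd\n']
```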
Try this small Python program, LastLines.py:
import sys
def main():
    firstLine = int(sys.argv[1])
    lines = sys.stdin.read().splitlines()[firstLine:]
    for curLine in lines:
        print(curLine)
if __name__ == "__main__":
    main()
Example input, test1.txt:
a
b
c
d
Example usage:
python LastLines.py 2 < test1.txt
Example output:
c
d
This program assumes that the first line in a file is the 0th line.

How to add specific lines from a file into List in Python?

I have an input file:
3
PPP
TTT
QPQ

TQT
QTT
PQP

QQQ
TXT
PRP
I want to read this file and group these cases into proper boards.
To read the count (number of boards) I have this code:
board = []
count =''
def readcount():
    fp = open("input.txt")
    for i, line in enumerate(fp):
        if i == 0:
            count = int(line)
            break
    fp.close()
But I don't have any idea how to parse blocks like this into a list:
TQT
QTT
PQP
I tried using
def readboard():
    fp = open('input.txt')
    for c in (1, count): # To Run loop to total no. of boards available
        for k in (c+1, c+3): #To group the boards into board[]
            board[c].append(fp.readlines)
But that is the wrong way. I know the basics of lists, but here I am not able to parse the file.
These boards are on lines 2 to 4, 6 to 8 and so on. How do I get them into lists?
I want to parse these into a count and boards so that I can process them further.
Please suggest.
I don't know if I understand your desired outcome. I think you want a list of lists.
Assuming that you want boards to be:
[[data,data,data],[data,data,data],[data,data,data]], then you would need to define how to parse your input file... specifically:
line 1 is the count number
data is entered per line
boards are separated by white space.
If that is the case, this should parse your files correctly:
import pprint
board = []
count = 0
currentBoard = 0
fp = open('input.txt')
for i,line in enumerate(fp.readlines()):
    if i == 0:
        count = int(line)  # the first line holds the number of boards
        board.append([])
    else:
        if len(line.strip()) == 0:  # blank line: start the next board
            currentBoard += 1
            board.append([])
        else: #this has board data
            board[currentBoard].append(line.strip())
fp.close()
pprint.pprint(board)
If my assumptions are wrong, then this can be modified to accommodate.
Personally, I would use a dictionary (or ordered dict) and get the count from len(board):
from collections import OrderedDict
import pprint
currentBoard = 0
board = {}  # or OrderedDict() if insertion order must be guaranteed on older Pythons
board[currentBoard] = []
fp = open('input.txt')
lines = fp.readlines()
fp.close()
for line in lines[1:]:  # skip the count on the first line
    if len(line.strip()) == 0:
        currentBoard += 1
        board[currentBoard] = []
    else:
        board[currentBoard].append(line.strip())
count = len(board)
print(count)
pprint.pprint(board)
If you just want to take specific line numbers and put them into a list:
line_nums = [3, 4, 5, 1]
fp = open('input.txt')
# the filter goes at the end of a comprehension; indices are zero-based
selected = [line for i, line in enumerate(fp) if i in line_nums]
fp.close()
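Another possible approach uses the count on the first line instead of the blank lines. The sketch below assumes that every board has the same number of rows; the function name read_boards is made up, and a StringIO object stands in for input.txt.

```python
import io

def read_boards(f):
    """Return (count, boards), assuming every board has the same height."""
    rows = [line.strip() for line in f if line.strip()]  # drop blank lines
    count = int(rows[0])  # first line: number of boards
    cells = rows[1:]
    if count <= 0 or not cells or len(cells) % count != 0:
        raise ValueError("row count does not match the board count")
    height = len(cells) // count
    # Slice the remaining rows into consecutive chunks of `height` rows.
    boards = [cells[i:i + height] for i in range(0, len(cells), height)]
    return count, boards

# Hypothetical sample mirroring the input file from the question.
sample = io.StringIO("3\nPPP\nTTT\nQPQ\n\nTQT\nQTT\nPQP\n\nQQQ\nTXT\nPRP\n")
count, boards = read_boards(sample)
print(count)
print(boards)
```

Since blank lines are dropped before chunking, this works whether or not the boards are separated by empty lines, but it fails loudly if the rows cannot be divided evenly among the boards.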

Too much data in a Python dictionary?

I have a text file with about 10,000 lines.
A typical line looks like this:
'1 2/1/2011 9:30,ZQZ,200.02,B,500'
If I run #1, I can iterate through the entire file, and i will count the total number of lines in the file. However, if I create a dictionary which records the data in each line as I iterate through the file (as in #2) I will get about half way through. I cannot figure out why this is happening. Is it possible that 10,000 lines of data is too large to contain within a dictionary? How can I determine this?
#1
TheFile = open(file_name)
TheFile.next()
i = 0
for l in TheFile:
    i += 1
    print i
#2
TheFile = open(file_name)
TheFile.next()
thedata = {}
i = 0
for l in TheFile:
    i += 1
    print i
    this_line = TheFile.next()
    the_info = this_line.split(',')
    the_ticker = the_info[1]
    #print type(the_info[1])
    #print this_line
    if the_ticker not in thedata.keys():
        thedata[the_ticker] = {}
    thedata[the_ticker]['trade'+ str(len(thedata[the_ticker]) + 1)] = {
        'the_trade_number':len(thedata[the_ticker]),
        'theTime':the_info[0],
        'thePrice':float(the_info[2]),
        'theTransaction':the_info[3],
        'theQuantity':int(the_info[4])}
The problem is #2 does not give me any errors, which is why I have trouble figuring out what the problem is
Your problem is right here in run #2:
for l in TheFile:
    i += 1
    print i
    this_line = TheFile.next()
l already has the current line, and then you get another line using TheFile.next(). I bet that if you change this_line = TheFile.next() to this_line = l, you'll get the results you expect.
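To illustrate the fix, here is a minimal Python 3 sketch that uses the loop variable directly. It departs from the original in a few labeled ways: the function name load_trades is invented, trades are stored in a list per ticker rather than under 'trade1', 'trade2' keys, every line is assumed to have exactly five comma-separated fields, and the sample data (including its header line) is made up.

```python
import io
from collections import defaultdict

def load_trades(f):
    """Group trades by ticker; f is an open text file or file-like object."""
    next(f, None)  # skip the first line, as the original code does
    trades = defaultdict(list)
    for line in f:  # `line` already is the current line; no extra next() call
        when, ticker, price, transaction, quantity = line.strip().split(",")
        trades[ticker].append({
            "theTime": when,
            "thePrice": float(price),
            "theTransaction": transaction,
            "theQuantity": int(quantity),
        })
    return trades

# Hypothetical sample: one line to skip followed by three trades.
sample = io.StringIO(
    "header\n"
    "1 2/1/2011 9:30,ZQZ,200.02,B,500\n"
    "2 2/1/2011 9:31,ZQZ,200.05,S,100\n"
    "3 2/1/2011 9:32,ABC,10.50,B,50\n"
)
trades = load_trades(sample)
print(len(trades["ZQZ"]), len(trades["ABC"]))  # 2 1
```

Every line is now processed exactly once, so the dictionary covers the whole file instead of every second line.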
