Python programming changing a specific value in file - python

I have a problem with concerning my code:
with open('Premier_League.txt', 'r+') as f:
data = [int(line.strip()) for line in f.readlines()] # [1, 2, 3]
f.seek(0)
i = int(input("Add your result! \n"))
data[i] += 1 # e.g. if i = 1, data now [1, 3, 3]
for line in data:
f.write(str(line) + "\n")
f.truncate()
print(data)
The code works that the file "Premier_League.txt" that contains for example:
1
2
3
where i=1
gets converted to and saved to already existing file (the previous info gets deleted)
1
3
3
My problem is that I want to chose a specific value in a matris (not only a vertical line) for example:
0 0 0 0
0 0 0 0
0 0 0 0
where i would like to change it to for example:
0 1 0 0
0 0 0 0
0 0 0 0
When I run this trough my program this appears:
ValueError: invalid literal for int() with base 10: '1 1 1 1'
So my question is: how do I change a specific value in a file that contains more than a vertical line of values?

The problem is you are not handling the increased number of dimensions properly. Try something like this;
with open('Premier_League.txt', 'r+') as f:
# Note this is now a 2D matrix (nested list)
data = [[int(value) for value in line.strip().split()] for line in f ]
f.seek(0)
# We must specify both a column and row
i = int(input("Add your result to column! \n"))
j = int(input("Add your result to row! \n"))
data[i][j] += 1 # Assign to the index of the column and row
# Parse out the data and write back to file
for line in data:
f.write(' '.join(map(str, line)) + "\n")
f.truncate()
print(data)
You could also use a generator expression to write to the file, for example;
# Parse out the data and write back to file
f.write('\n'.join((' '.join(map(str, line)) for line in data)))
instead of;
# Parse out the data and write back to file
for line in data:
f.write(' '.join(map(str, line)) + "\n")

First up, you are trying to parse the string '0 0 0 0' as an int, that's the error you are getting. To fix this, do:
data = [[int(ch) for ch in line.strip().split()] for line in f.readlines()]
This will create a 2D array, where the first index corresponds to the row, and the second index corresponds to the column. Then, you would probably want the user to give you two values, instead of a singular i since you are trying to edit in a 2D array.
Edit:
So your following code will look like this:
i = int(input("Add your result row: \n"))
j = int(input("Add your result column: \n"))
data[i][j] += 1
# For data = [[1,2,1], [2,3,2]], and user enters i = 1
# and j = 0, the new data will be [[1,2,1], [3,3,2]]

Related

Add new line after finding last string in a region

I have this input test.txt file with the output interleaved as #Expected in it (after finding the last line containing 1 1 1 1 within a *Title region
and this code in Python 3.6
index = 0
insert = False
currentTitle = ""
testfile = open("test.txt","r")
content = testfile.readlines()
finalContent = content
testfile.close()
# Should change the below line of code I guess to adapt
#titles = ["TitleX","TitleY","TitleZ"]
for line in content:
index = index + 1
for title in titles:
if line in title+"\n":
currentTitle = line
print (line)
if line == "1 1 1 1\n":
insert = True
if (insert == True) and (line != "1 1 1 1\n"):
finalContent.insert(index-1, currentTitle[:6] + "2" + currentTitle[6:])
insert = False
f = open("test.txt", "w")
finalContent = "".join(finalContent)
f.write(finalContent)
f.close()
Update:
Actual output with the answer provided
*Title Test
12125
124125
asdas 1 1 1 1
rthtr 1 1 1 1
asdasf 1 1 1 1
asfasf 1 1 1 1
blabla 1 1 1 1
#Expected "*Title Test2" here <-- it didn't add it
124124124
*Title Dunno
12125
124125
12763125 1 1 1 1
whatever 1 1 1 1
*Title Dunno2
#Expected "*Title Dunno2" here <-- This worked great
214142122
#and so on for thousands of them..
Also is there a way to overwrite this in the test.txt file?
Because you are already reading the entire file into memory anyway, it's easy to scan through the lines twice; once to find the last transition out of a region after each title, and once to write the modified data back to the same filename, overwriting the previous contents.
I'm introducing a dictionary variable transitions where the keys are the indices of the lines which have a transition, and the value for each is the text to add at that point.
transitions = dict()
in_region = False
reg_end = -1
current_title = None
with open("test.txt","r") as testfile:
content = testfile.readlines()
for idx, line in enumerate(content):
if line.startswith('*Title '):
# Commit last transition before this to dict, if any
if current_title:
transitions[reg_end] = current_title
# add suffix for printing
current_title = line.rstrip('\n') + '2\n'
elif line.strip().endswith(' 1 1 1 1'):
in_region = True
# This will be overwritten while we remain in the region
reg_end = idx
elif in_region:
in_region = False
if current_title:
transitions[reg_end] = current_title
with open("test.txt", "w") as output:
for idx, line in enumerate(content):
output.write(line)
if idx in transitions:
output.write(transitions[idx])
This kind of "remember the last time we saw something" loop is very common, but takes some time getting used to. Inside the loop, keep in mind that we are looping over all the lines, and remembering some things we saw during a previous iteration of this loop. (Forgetting the last thing you were supposed to remember when you are finally out of the loop is also a very common bug!)
The strip() before we look for 1 1 1 1 normalizes the input by removing any surrounding whitespace. You could do other kinds of normalizations, too; normalizing your data is another very common technique for simplifying your logic.
Demo: https://ideone.com/GzNUA5
try this, using itertools.zip_longest
from itertools import zip_longest
with open("test.txt","r") as f:
content = f.readlines()
results, title = [], ""
for i, j in zip_longest(content, content[1:]):
# extract title.
if i.startswith("*"):
title = i
results.append(i)
# compare value in i'th index with i+1'th (if mismatch add title)
if "1 1 1 1" in i and "1 1 1 1" not in j:
results.append(f'{title.strip()}2\n')
print("".join(results))

Print data between positions within a loop

I have one files.
File1 which has 3 columns. Data are tab separated
File1:
2 4 Apple
6 7 Samsung
Let's say if I run a loop of 10 iteration. If the iteration has value between column 1 and column 2 of File1, then print the corresponding 3rd column from File1, else print "0".
The columns may or may not be sorted, but 2nd column is always greater than 1st. Range of values in the two columns do not overlap between lines.
The output Result should look like this.
Result:
0
Apple
Apple
Apple
0
Samsung
Samsung
0
0
0
My program in python is here:
chr5_1 = [[]]
for line in file:
line = line.rstrip()
line = line.split("\t")
chr5_1.append([line[0],line[1],line[2]])
# Here I store all position information in chr5_1 list in list
chr5_1.pop(0)
for i in range (1,10):
for listo in chr5_1:
L1 = " ".join(str(x) for x in listo[:1])
L2 = " ".join(str(x) for x in listo[1:2])
L3 = " ".join(str(x) for x in listo[2:3])
if int(L1) <= i and int(L2) >= i:
print(L3)
break
else:
print ("0")
break
I am confused with loop iteration and it break point.
Try this:
chr5_1 = dict()
for line in file:
line = line.rstrip()
_from, _to, value = line.split("\t")
for i in range(int(_from), int(_to) + 1):
chr5_1[i] = value
for i in range (1, 10):
print chr5_1.get(i, "0")
I think this is a job for else:
position_information = []
with open('file1', 'rb') as f:
for line in f:
position_information.append(line.strip().split('\t'))
for i in range(1, 11):
for start, through, value in position_information:
if i >= int(start) and i <= int(through):
print value
# No need to continue searching for something to print on this line
break
else:
# We never found anything to print on this line, so print 0 instead
print 0
This gives the result you're looking for:
0
Apple
Apple
Apple
0
Samsung
Samsung
0
0
0
Setup:
import io
s = '''2 4 Apple
6 7 Samsung'''
# Python 2.x
f = io.BytesIO(s)
# Python 3.x
#f = io.StringIO(s)
If the lines of the file are not sorted by the first column:
import csv, operator
reader = csv.reader(f, delimiter = ' ', skipinitialspace = True)
f = list(reader)
f.sort(key = operator.itemgetter(0))
Read each line; do some math to figure out what to print and how many of them to print; print stuff; iterate
def print_stuff(thing, n):
while n > 0:
print(thing)
n -= 1
limit = 10
prev_end = 1
for line in f:
# if iterating over a file, separate the columns
begin, end, text = line.strip().split()
# if iterating over the sorted list of lines
#begin, end, text = line
begin, end = map(int, (begin, end))
# don't exceed the limit
begin = begin if begin < limit else limit
# how many zeros?
gap = begin - prev_end
print_stuff('0', gap)
if begin == limit:
break
# don't exceed the limit
end = end if end < limit else limit
# how many words?
span = (end - begin) + 1
print_stuff(text, span)
if end == limit:
break
prev_end = end
# any more zeros?
gap = limit - prev_end
print_stuff('0', gap)

How to find all instances of list values(ex: [1,2,3]) in a file at a specific index

I want to find out a list of elements in a file at a specific index.
For ex, below are the contents of the file "temp.txt"
line_0 1
line_1 2
line_2 3
line_3 4
line_4 1
line_5 1
line_6 2
line_7 1
line_8 2
line_9 3
line_10 4
Now, I need to find out the list of values [1,2,3] occurring in sequence at column 2 of each line in above file.
Output should look like below:
line_2 3
line_9 3
I have tried the below logic, but it some how not working ;(
inf = open("temp.txt", "rt")
count = 0
pos = 0
ListSeq = ["1","2","3"]
for line_no, line in enumerate(inf):
arr = line.split()
if len(arr) > 1:
if count == 1 :
pos = line_no
if ListSeq[count] == arr[1] :
count += 1
elif count > 0 :
inf.seek(pos)
line_no = pos
count = 0
else :
count = 0
if count >= 3 :
print(line)
count = 0
Can somebody help me in finding the issue with above code? or even a different logic which will give a correct output is also fine.
Your code is flawed. Most prominent bug: trying to seek in a text file using line number is never going to work: you have to use byte offset for that. Even if you did that, it would be wrong because you're iterating on the lines, so you shouldn't attempt to change file pointer while doing that.
My approach:
The idea is to "transpose" your file to work with vertical vectors, find the sequence in the 2nd vertical vector, and use the found index to extract data on the first vertical vector.
split lines to get text & number, zip the results to get 2 vectors: 1 of numbers 1 of text.
At this point, one list contains ["line_0","line_1",...] and the other one contains ["1","2","3","4",...]
Find the indexes of the sequence in the number list, and print the couple txt/number when found.
code:
with open("text.txt") as f:
sequence = ('1','2','3')
txt,nums = list(zip(*(l.split()[:2] for l in f))) # [:2] in case there are more columns
for i in range(len(nums)-len(sequence)+1):
if nums[i:i+len(sequence)]==sequence:
print("{} {}".format(txt[i+2],nums[i+2]))
result:
line_2 3
line_9 3
last for loop can be replaced by a list comprehension to generate the tuples:
result = [(txt[i+2],nums[i+2]) for i in range(len(nums)-len(sequence)) if nums[i:i+len(sequence)]==sequence ]
result:
[('line_2', '3'), ('line_9', '3')]
Generalizing for any sequence and any column.
sequence = ['1','2','3']
col = 1
with open(filename, 'r') as infile:
idx = 0
for _i, line in enumerate(infile):
if line.strip().split()[col] == sequence[idx]:
if idx == len(sequence)-1:
print(line)
idx = 0
else:
idx += 1
else:
idx = 0

Read in file contents to Arrays

I am a bit familiar with Python. I have a file with information that I need to read in a very specific way. Below is an example...
1
6
0.714285714286
0 0 1.00000000000
0 1 0.61356352337
...
-1 -1 0.00000000000
0 0 5.13787636499
0 1 0.97147643932
...
-1 -1 0.00000000000
0 0 5.13787636499
0 1 0.97147643932
...
-1 -1 0.00000000000
0 0 0 0 5.13787636499
0 0 0 1 0.97147643932
....
So every file will have this structure (tab delimited).
The first line must be read in as a variable as well as the second and third lines.
Next we have four blocks of code separated by a -1 -1 0.0000000000. Each block of code is 'n' lines long. The first two numbers represent the position/location that the 3rd number in the line is to be inserted in an array. Only the unique positions are listed (so, position 0 1 would be the same as 1 0 but that information would not be shown).
Note: The 4th block of code has a 4-index number.
What I need
The first 3 lines read in as unique variables
Each block of data read into an array using the first 2 (or 4 ) column of numbers as the array index and the 3rd column as the value being inserted into an array.
Only unique array elements shown. I need the mirrored position to be filled with the proper value as well (a 0 1 value should also appear in 1 0).
The last block would need to be inserted into a 4-dimensional array.
I rewrote the code. Now it's almost what you need. You only need fine tuning.
I decided to leave the old answer - perhaps it would be helpful too.
Because the new is feature-rich enough, and sometimes may not be clear to understand.
def the_function(filename):
"""
returns tuple of list of independent values and list of sparsed arrays as dicts
e.g. ( [1,2,0.5], [{(0.0):1,(0,1):2},...] )
on fail prints the reason and returns None:
e.g. 'failed on text.txt: invalid literal for int() with base 10: '0.0', line: 5'
"""
# open file and read content
try:
with open(filename, "r") as f:
data_txt = [line.split() for line in f]
# no such file
except IOError, e:
print 'fail on open ' + str(e)
# try to get the first 3 variables
try:
vars =[int(data_txt[0][0]), int(data_txt[1][0]), float(data_txt[2][0])]
except ValueError,e:
print 'failed on '+filename+': '+str(e)+', somewhere on lines 1-3'
return
# now get arrays
arrays =[dict()]
for lineidx, item in enumerate(data_txt[3:]):
try:
# for 2d array data
if len(item) == 3:
i, j = map(int, item[:2])
val = float(item[-1])
# check for 'block separator'
if (i,j,val) == (-1,-1,0.0):
# make new array
arrays.append(dict())
else:
# update last, existing
arrays[-1][(i,j)] = val
# almost the same for 4d array data
if len(item) == 5:
i, j, k, m = map(int, item[:4])
val = float(item[-1])
arrays[-1][(i,j,k,m)] = val
# if value is unparsable like '0.00' for int or 'text'
except ValueError,e:
print 'failed on '+filename+': '+str(e)+', line: '+str(lineidx+3)
return
return vars, arrays
As i anderstand what did you ask for..
# read data from file into list
parsed=[]
with open(filename, "r") as f:
for line in f:
# # you can exclude separator here with such code (uncomment) (1)
# # be careful one zero more, one zero less and it wouldn work
# if line == '-1 -1 0.00000000000':
# continue
parsed.append(line.split())
# a simpler version
with open(filename, "r") as f:
# # you can exclude separator here with such code (uncomment, replace) (2)
# parsed = [line.split() for line in f if line != '-1 -1 0.00000000000']
parsed = [line.split() for line in f]
# at this point 'parsed' is a list of lists of strings.
# [['1'],['6'],['0.714285714286'],['0', '0', '1.00000000000'],['0', '1', '0.61356352337'] .. ]
# ALT 1 -------------------------------
# we do know the len of each data block
# get the first 3 lines:
head = parsed[:3]
# get the body:
body = parsed[3:-2]
# get the last 2 lines:
tail = parsed[-2:]
# now you can do anything you want with your data
# but remember to convert str to int or float
# first3 as unique:
unique0 = int(head[0][0])
unique1 = int(head[1][0])
unique2 = float(head[2][0])
# cast body:
# check each item of body has 3 inner items
is_correct = all(map(lambda item: len(item)==3, body))
# parse str and cast
if is_correct:
for i, j, v in body:
# # you can exclude separator here (uncomment) (3)
# # * 1. is the same as float(1)
# if (i,j,v) == (0,0,1.):
# # here we skip iteration for line w/ '-1 -1 0.0...'
# # but you can place another code that will be executed
# # at the point where block-termination lines appear
# continue
some_body_cast_function(int(i), int(j), float(v))
else:
raise Exception('incorrect body')
# cast tail
# check each item of body has 5 inner items
is_correct = all(map(lambda item: len(item)==5, tail))
# parse str and cast
if is_correct:
for i, j, k, m, v in body: # 'l' is bad index, because similar to 1.
some_tail_cast_function(int(i), int(j), int(k), int(m), float(v))
else:
raise Exception('incorrect tail')
# ALT 2 -----------------------------------
# we do NOT know the len of each data block
# maybe we have some array?
array = dict() # your array may be other type
v1,v2,v2 = parsed[:3]
unique0 = int(v1[0])
unique1 = int(v2[0])
unique2 = float(v3[0])
for item in parsed[3:]:
if len(item) == 3:
i,j,v = item
i = int(i)
j = int(j)
v = float(v)
# # yo can exclude separator here (uncomment) (4)
# # * 1. is the same as float(1)
# # logic is the same as in 3rd variant
# if (i,j,v) == (0,0,1.):
# continue
# do your stuff
# for example,
array[(i,j)]=v
array[(j,i)]=v
elif len(item) ==5:
i, j, k, m, v = item
i = int(i)
j = int(j)
k = int(k)
m = int(m)
v = float(v)
# do your stuff
else:
raise Exception('unsupported') # or, maybe just 'pass'
To read lines from a file iteratively, you can use something like:
with open(filename, "r") as f:
var1 = int(f.next())
var2 = int(f.next())
var3 = float(f.next())
for line in f:
do some stuff particular to the line we are on...
Just create some data structures outside the loop, and fill them in the loop above. To split strings into elements, you can use:
>>> "spam ham".split()
['spam', 'ham']
I also think you want to take a look at the numpy library for array datastructures, and possible the SciPy library for analysis.

Help with an if else loop in python

Hi here is my problem. I have a program that calulcates the averages of data in columns.
Example
Bob
1
2
3
the output is
Bob
2
Some of the data has 'na's
So for Joe
Joe
NA
NA
NA
I want this output to be NA
so I wrote an if else loop
The problem is that it doesn't execute the second part of the loop and just prints out one NA. Any suggestions?
Here is my program:
with open('C://achip.txt', "rtU") as f:
columns = f.readline().strip().split(" ")
numRows = 0
sums = [0] * len(columns)
numRowsPerColumn = [0] * len(columns) # this figures out the number of columns
for line in f:
# Skip empty lines since I was getting that error before
if not line.strip():
continue
values = line.split(" ")
for i in xrange(len(values)):
try: # this is the whole strings to math numbers things
sums[i] += float(values[i])
numRowsPerColumn[i] += 1
except ValueError:
continue
with open('c://chipdone.txt', 'w') as ouf:
for i in xrange(len(columns)):
if numRowsPerColumn[i] ==0 :
print 'NA'
else:
print>>ouf, columns[i], sums[i] / numRowsPerColumn[i] # this is the average calculator
The file looks like so:
Joe Bob Sam
1 2 NA
2 4 NA
3 NA NA
1 1 NA
and final output is the names and the averages
Joe Bob Sam
1.5 1.5 NA
Ok I tried Roger's suggestion and now I have this error:
Traceback (most recent call last):
File "C:/avy14.py", line 5, in
for line in f:
ValueError: I/O operation on closed file
Here is this new code:
with open('C://achip.txt', "rtU") as f:
columns = f.readline().strip().split(" ")
sums = [0] * len(columns)
rows = 0
for line in f:
line = line.strip()
if not line:
continue
rows += 1
for col, v in enumerate(line.split()):
if sums[col] is not None:
if v == "NA":
sums[col] = None
else:
sums[col] += int(v)
with open("c:/chipdone.txt", "w") as out:
for name, sum in zip(columns, sums):
print >>out, name,
if sum is None:
print >>out, "NA"
else:
print >>out, sum / rows
with open("c:/achip.txt", "rU") as f:
columns = f.readline().strip().split()
sums = [0.0] * len(columns)
row_counts = [0] * len(columns)
for line in f:
line = line.strip()
if not line:
continue
for col, v in enumerate(line.split()):
if v != "NA":
sums[col] += int(v)
row_counts[col] += 1
with open("c:/chipdone.txt", "w") as out:
for name, sum, rows in zip(columns, sums, row_counts):
print >>out, name,
if rows == 0:
print >>out, "NA"
else:
print >>out, sum / rows
I'd also use the no-parameter version of split when getting the column names (it allows you to have multiple space separators).
Regarding your edit to include input/output sample, I kept your original format and my output would be:
Joe 1.75
Bob 2.33333333333
Sam NA
This format is 3 rows of (ColumnName, Avg) columns, but you can change the output if you want, of course. :)
Using numpy:
import numpy as np
with open('achip.txt') as f:
names=f.readline().split()
arr=np.genfromtxt(f)
print(arr)
# [[ 1. 2. NaN]
# [ 2. 4. NaN]
# [ 3. NaN NaN]
# [ 1. 1. NaN]]
print(names)
# ['Joe', 'Bob', 'Sam']
print(np.ma.mean(np.ma.masked_invalid(arr),axis=0))
# [1.75 2.33333333333 --]
Using your original code, I would add one loop and edit the print statement
with open(r'C:\achip.txt', "rtU") as f:
columns = f.readline().strip().split(" ")
numRows = 0
sums = [0] * len(columns)
numRowsPerColumn = [0] * len(columns) # this figures out the number of columns
for line in f:
# Skip empty lines since I was getting that error before
if not line.strip():
continue
values = line.split(" ")
### This removes any '' elements caused by having two spaces like
### in the last line of your example chip file above
for count, v in enumerate(values):
if v == '':
values.pop(count)
### (End of Addition)
for i in xrange(len(values)):
try: # this is the whole strings to math numbers things
sums[i] += float(values[i])
numRowsPerColumn[i] += 1
except ValueError:
continue
with open('c://chipdone.txt', 'w') as ouf:
for i in xrange(len(columns)):
if numRowsPerColumn[i] ==0 :
print>>ouf, columns[i], 'NA' #Just add the extra parts
else:
print>>ouf, columns[i], sums[i] / numRowsPerColumn[i]
This solution also gives the same result in Roger's format, not your intended format.
Solution below is cleaner and has fewer lines of code ...
import pandas as pd
# read the file into a DataFrame using read_csv
df = pd.read_csv('C://achip.txt', sep="\s+")
# compute the average of each column
avg = df.mean()
# save computed average to output file
avg.to_csv("c:/chipdone.txt")
They key to the simplicity of this solution is the way the input text file is read into a Dataframe. Pandas read_csv allows you to use regular expressions for specifying the sep/delimiter argument. In this case, we used the "\s+" regex pattern to take care of having one or more spaces between columns.
Once the data is in a dataframe, computing the average and saving to a file can all be done with straight forward pandas functions.

Categories

Resources