I searched everywhere, and although there are a couple of questions and answers about this error (an IndexError), I couldn't find a solution that fixes my problem.
I'm reading in from a file that contains letters and numbers and I'm populating my matrix depending on the values in that file.
ex: file
description of letters and numbers ...
table:
a b c d
a 1 2 5 6
b 5 6 3 4
c 3 2 1 4
d 2 4 6 8
Here's the code
matrix = [[0 for j in range(4)] for i in range(4)]
i = 0
j = 0
for line in file:
    for a in line:
        if is_number(a):
            matrix[i][j] = int(a)
            j += 1
    if matrix.count(0) < 2:  # since matrix already populated with zeroes. it shouldn't have
                             # many per program specifications, that's why I use this
                             # conditional to increment i and reset j back to 0
        i += 1
        j = 0
file.close()
I don't understand why I keep getting that error.
I see two possible ways you could end up with an IndexError in your code.
The first problem occurs because of the way you are iterating through the file that you're reading. Your code:
for line in file:
    for a in line:
        if is_number(a):
            # do stuff
reads each line of the file into the variable line. Then each character of that line is stored in the variable a, and you check whether it is a number. If any of the integers you are reading is greater than 9, you will see an IndexError, because each digit is counted as a separate number and you eventually run out of room in your pre-allocated matrix.
A possible fix would be to change the line:
for a in line:
to
for a in line.split():
which splits the line into a list of words (that is, a new entry for everything separated by whitespace). So "6 12 4 5" becomes ['6', '12', '4', '5'], and the 1 and 2 in 12 are no longer counted separately. The entries are still strings, so int(a) is still needed.
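To see the difference concretely, here is a small sketch using the same sample string. It uses str.isdigit as a stand-in for the question's is_number helper, which is an assumption since that helper is not shown.

```python
line = "6 12 4 5"

# Iterating over a string yields single characters, so "12" becomes "1" and "2".
chars = [c for c in line if c.isdigit()]

# split() yields whitespace-separated tokens, so "12" stays in one piece.
tokens = [t for t in line.split() if t.isdigit()]

print(chars)   # ['6', '1', '2', '4', '5']
print(tokens)  # ['6', '12', '4', '5']
```

Five values would be written into a row that has room for four in the first case, which is where the IndexError comes from.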
The second issue I see with your code is in the line:
if matrix.count(0) < 2:
If your input file ever contains a zero, it will cause this line to stay true for one iteration of the loop longer than you would like. A possible fix would be to change the line to:
if j == len(matrix[0]) - 1:
Try something like this:
with open("data1.txt") as f:
    next(f)  # skip the first line
    # use x.split()[1:] to drop the leading letter
    lis = [list(map(int, x.split()[1:])) for x in f]
print(lis)
output:
[[1, 2, 5, 6], [5, 6, 3, 4], [3, 2, 1, 4], [2, 4, 6, 8]]
If you know the input file already has the right matrix (line by line) layout you could use the following :
matrix = [row for row in ([int(a) for a in l.split() if is_number(a)] for l in file) if row]
If you cannot expect anything from the input layout, you could try:
import math
with open("test") as f:
    data = f.read()
# split() already treats newlines as whitespace
l = [x for x in data.split() if is_number(x)]
width = int(math.sqrt(len(l)))
print([[int(l[i + width * j]) for i in range(width)] for j in range(width)])
You're constructing a 4x4 matrix in the first line of code, but your data is a 6x6 matrix. When you try to store an element at index 4 in row 0, you get an IndexError.
The problem is here:
matrix = [[0 for j in range(4)] for i in range(4)]
Your matrix is 6x6, but your code only allocates a 4x4 matrix.
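One way to sidestep the size mismatch altogether is to let the data decide the dimensions instead of pre-allocating. The sketch below is only an illustration: read_labeled_matrix is a hypothetical helper, and it assumes every data row looks like the sample (a label followed by non-negative integers).

```python
def read_labeled_matrix(lines):
    """Build a matrix from rows shaped like 'a 1 2 5 6' (hypothetical helper)."""
    matrix = []
    for line in lines:
        fields = line.split()
        # Keep only rows whose fields after the label are all digits; this
        # skips the description, the "table:" line, and the header row.
        if len(fields) > 1 and all(f.isdigit() for f in fields[1:]):
            matrix.append([int(f) for f in fields[1:]])
    return matrix

sample = [
    "description of letters and numbers ...",
    "table:",
    "a b c d",
    "a 1 2 5 6",
    "b 5 6 3 4",
    "c 3 2 1 4",
    "d 2 4 6 8",
]
print(read_labeled_matrix(sample))
# [[1, 2, 5, 6], [5, 6, 3, 4], [3, 2, 1, 4], [2, 4, 6, 8]]
```

Because rows are appended as they are read, the same code works for a 4x4 or a 6x6 table without touching any index.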
I wrote a Python script that reads from a file, input.txt:
input.txt
2 //number of test cases
2 4 //testcase 1 ->'2' is size of 1st array and 4 is size of 2nd array
6 10 6 7 8 9 //testcase 1 -> 1st array is [6,10] and 2nd array is [6,7,8,9]
1 3 //testcase 2 ->'1' is size of 1st array and 3 is size of 2nd array
7 7 8 14 //testcase 2 -> 1st array is [7] and 2nd array is [7,8,14]
The first line of the file gives the number of test cases; in this example there are 2. Each test case takes two lines: the first gives the sizes of the 1st and 2nd arrays, and the second lists the elements of both arrays.
That is, in the example above, line 2 gives the sizes of the 1st and 2nd arrays for test case 1, and line 3 holds the two arrays with those sizes. Line 4 gives the sizes for test case 2, and line 5 holds its two arrays.
For each test case, I need to check whether each element of the 1st array is present in the 2nd one. I wrote the program below, but it only handles one test case (I'm checking the 2nd and 3rd lines manually with the checks i == 0 and i == 1).
from itertools import islice
def search(arr, element):
    for i in range(len(arr)):
        if int(arr[i]) == int(element):
            return "yes"
    return "no"
f = open("output.txt", "w")
with open("input.txt") as y_file:
    count = y_file.readline()
    if(count > 0):
        for i, line in enumerate(y_file):
            if(i == 0):
                num, size = line.split()
                split_list = [int(num), int(size)]
            if(i == 1):
                temp = iter(line.split())
                res = [list(islice(temp, 0, ele)) for ele in split_list]
                for i in range(len(res[0])):
                    result = search(res[1], res[0][i])
                    f.write("The result is : " + str(result) + "\n")
f.close()
Can anyone please help me with this?
The expected output looks like this:
The result is : yes
The result is : no
The result is : yes
I would read the file line by line and process each test case in turn.
# read a line and convert to int() for num_cases.
for i in range(0, 2 * num_cases, 2):
    # read a line for split_list
    # check length of split_list
    # read a line for list values all_nums
    # check length of all_nums to be equal to sum of split_list
    # split the list into arr1 and arr2.
    # You can use slicing notation.
    # For example:
    # arr[:5] are the first 5 elements of arr. i.e. indexes 0, 1, 2, 3, 4
    # arr[-5:] are the last 5 elements of arr. i.e. indexes len(arr) - 5, ... , len(arr) - 1
    # arr[5:] are elements 5 and on from arr. i.e. indexes 5, 6, ...
    for value in arr1:
        if value in arr2:
            # output
        else:
            # output
The function int(x) will raise ValueError when input is bad.
Checking the length of the lists allows you to make sure you get the expected number of values on each line.
If the file runs out of lines, readline() will return an empty string which results in a list with length 0 when you split and should fail the length checks.
You can use try/except to handle errors such as the ValueError raised by int().
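The outline above can be sketched as runnable code. This is only one possible reading of it: run_cases is a hypothetical helper, it takes the lines as a list rather than a file, and it assumes the lines contain only numbers (no trailing // annotations like those in the question's listing).

```python
def run_cases(lines):
    """Return 'yes'/'no' for each element of each first array (hypothetical helper)."""
    rows = iter(lines)
    results = []
    num_cases = int(next(rows))  # int() raises ValueError on bad input
    for _ in range(num_cases):
        sizes = [int(tok) for tok in next(rows).split()]
        values = [int(tok) for tok in next(rows).split()]
        # The length checks catch lines with too few or too many values.
        if len(sizes) != 2 or len(values) != sum(sizes):
            raise ValueError("unexpected number of values")
        arr1, arr2 = values[:sizes[0]], values[sizes[0]:]
        results.extend("yes" if v in arr2 else "no" for v in arr1)
    return results

sample = ["2", "2 4", "6 10 6 7 8 9", "1 3", "7 7 8 14"]
for answer in run_cases(sample):
    print("The result is : " + answer)
```

With the sample above this prints yes, no, yes, matching the expected output in the question. If the input runs out of lines early, next() raises StopIteration, which a caller could catch alongside ValueError.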
Try this
first_array=[1,2,3]
sec_array=[4,5,3]
print(any(i in sec_array for i in first_array)) # prints True
I have a tab-delimited text file with millions of index points, all of which are read in as strings. However, some index points could be missing. Here is an example of my text file:
1 0 4 0d 07:00:37.0400009155273
2 0 4 0d 07:00:37.0400009155273
3 0 4 0d 07:00:37.0400009155273
5 0 4 0d 07:00:37.0400009155273
7 0 4 0d 07:00:37.0400009155273
9 0 4 0d 07:00:37.0400009155273
Notice that rows 4, 6 and 8 are missing. My goal is to create a function that can parse through the text file, identify possible missing index points and return a list that has all the missing index points (if any) or return nothing.
I'm using Python 3.7 in Spyder IDE Windows10 os. I am relatively new to Python and Stackoverflow.
This is what I've got so far. It works for identifying one missing index but fails if there are several missing index points.
The error starts after the first else line. I'm not sure how to track the observed index in the doc (1, 2, 3, 5...) with the for loop's index (0, 1, 2, 3...) as missing index points compound over time.
Note, the first 4 rows of the text doc contain header info which I ignore during the parsing that's why data = f.readlines()[4:]
def check_sorted_file(fileName):
    missing_idx = []
    count = 1
    with open(fileName, 'r') as f:
        data = f.readlines()[4:]
        for x, line in enumerate(data):
            idx = int(line.split()[0])
            if idx == (count + x):
                pass
            else:
                missing_idx.append(count + x)
                count += 1
    if missing_idx != []:
        print('\nThe following indices are missing: ')
        print(*missing_idx, sep=", ")
    else:
        print('\nAll indices are accounted for. ')
    return missing_idx
...
Thanks for any and all help!
The other answers give you much better overall solutions. However, I want to guide your own attempt in the right direction so you can see how to change it to work:
def check_sorted_file(fileName):
    missing_idx = []
    last_index = 0
    with open(fileName, 'r') as f:
        data = f.readlines()[4:]
        for line in data:
            idx = int(line.split()[0])
            if idx == last_index+1:
                pass
            else:
                missing_idx.extend(list(range(last_index+1, idx)))
            last_index = idx
    if missing_idx:
        print('\nThe following indices are missing: ')
        print(*missing_idx, sep=", ")
    else:
        print('\nAll indices are accounted for. ')
    return missing_idx
So instead of needing to use enumerate we will use the incoming index as our guide of where we are at.
To solve multiple missing, we use range to get all the numbers between the last index and the current one, and extend our list with that new set of numbers.
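The range-and-extend idea can be tried without a file. In this sketch find_gaps is a hypothetical name, and the input is assumed to be an already sorted list of integers.

```python
def find_gaps(indices):
    """Return the integers missing between consecutive sorted indices."""
    missing = []
    last_index = 0
    for idx in indices:
        # range() is empty when idx == last_index + 1, so nothing is added
        # for consecutive values and no explicit branch is needed.
        missing.extend(range(last_index + 1, idx))
        last_index = idx
    return missing

print(find_gaps([1, 2, 3, 5, 7, 9]))  # [4, 6, 8]
print(find_gaps([1, 2, 6]))           # [3, 4, 5]
```

The second call shows a gap of several values being filled in one step, which is the case the original count-based version missed.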
You can do this with Python alone:
with open(filename) as f:
    rows = f.read().split('\n')[4:]
# skip blank rows, such as the empty string left after a trailing newline
indices = [int(row.split('\t')[0]) for row in rows if row.strip()]
missing_indices = [index
                   for index in range(1, max(indices) + 1)
                   if index not in indices]
This converts your data into a flat list of integers: each row is split on tabs, and since we only care about the indices, we keep the first element and ignore the others.
Then, since the indices are in running order starting from 1, we construct a range spanning the expected indices (1 up to the largest index seen), and get the values that exist in that range but not in the file. Using len(indices) as the upper bound would stop too early, because every missing index makes the list shorter than its largest value.
Assuming the indices are unique (which seems reasonable), we can also use DYZ's suggestion to use sets, which avoids scanning the whole list for every lookup:
missing_indices = set(range(1, max(indices) + 1)) - set(indices)
pandas works fine too:
import pandas as pd
df = pd.read_csv(filename, sep='\t', skiprows=4, header=None)
range_index = pd.RangeIndex(1, int(df.iloc[:, 0].max()) + 1)
print(range_index[~range_index.isin(df.iloc[:, 0])])
This creates a pandas DataFrame from your data, skipping the four header rows. Following the same principle as the other answer, it creates an index with all expected values and takes the subset of it that does not exist in the first column of the DataFrame.
Since you have a large number of rows, you might want to do this in a lazy fashion without making large lists or using in to test if every value is in a million line list. You can mix a few of the itertools to do this as an iterator and save the list for the end (if you even need it then).
Basically, you tee a map of the indexes into two iterators, advance one of them by one value with next(), then zip them together, checking the difference as you go:
from itertools import chain, tee
lines = ["1 0 4 0d 07:00:37.0400009155273",
"2 0 4 0d 07:00:37.0400009155273",
"3 0 4 0d 07:00:37.0400009155273",
"5 0 4 0d 07:00:37.0400009155273",
"7 0 4 0d 07:00:37.0400009155273",
"9 0 4 0d 07:00:37.0400009155273"
]
#two iterators going over indexes
i1, i2 = tee(map(lambda x: int(x.split()[0]), lines), 2)
# move one forward
next(i2)
# chain.from_iterable will be an iterator producing missing indexes:
list(chain.from_iterable(range(i+1, j) for i, j in zip(i1, i2) if j-i!=1))
Result:
[4, 6, 8]
Here's a compact, robust, set-based, core Python-only solution. Read the file, split each line into fields, convert the first field into an int, and build a set of actual indexes:
skip = 4  # Skip that many lines
with open(yourfile) as f:
    for _ in range(skip):
        next(f)
    actual = {int(line.split()[0]) for line in f}
Create a set of expected indexes and take set difference:
expected = set(range(min(actual), max(actual) + 1))
sorted(expected - actual)
#[4, 6, 8]
The solution works even when the indexes do not start at 1.
I am trying to insert a number into a sorted list of numbers. For some reason this small program just sits there consuming CPU power, and I have no idea why it's not working:
number = 5
lst = [4,5,6]
if all(x > number for x in lst):
    lst.insert(0,number)
elif all(x < number for x in lst):
    lst.append(number)
else:
    for i,v in enumerate(lst):
        if v>number:
            lst.insert(i-1,number)
print (lst)
expected output:
lst = [4,5,5,6]
Your for loop is inserting the number 5 into the middle of the list a theoretically infinite amount of times (or until you run out of whatever limited resource the list consumes, whichever happens first).
1) for i,v in enumerate(lst):
2)     if v>number:
3)         lst.insert(i-1,number)
On the first pass, line 1 starts the loop with v = 4 and i = 0. Line 2 finds v is not greater than number.
On the second pass, line 1 continues the loop with v = 5 and i = 1. Line 2 is also false.
Third pass, line 1: v = 6, i = 2. Line 2 finds a true statement and moves to line 3. Line 3 inserts the object referenced by number into position i - 1, inserting 5 into position 1 of the list.
At this point the list is:
lst = [4, *5*, **5**, 6]
The italicized 5 is the number you added to the list. The bolded 5 is where the current pointer is, i = 2. Notice that the 6 we just checked got moved forward with the insert.
Fourth pass: v = 6, i = 3. Line 2 finds a true statement and moves to line 3. Line 3 inserts the object referenced by number into position i - 1, inserting 5 into position 2 of the list.
At this point the list is:
lst = [4, 5, *5*, **5**, 6]
etc etc etc.
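A quick fix:
for i, v in enumerate(lst):
    if v > number:
        lst.insert(i, number)
        break
You're only inserting a single number, so break out of the loop once you have inserted it, since you're done. Note that inserting at i (directly before the first larger element) rather than at i-1 keeps the list sorted in general; for this input both positions give [4, 5, 5, 6].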
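As an alternative to the hand-written loop (a different technique from the fix above, not something the question asked for), the standard library's bisect module can do the insertion. A minimal sketch, assuming the list is already sorted:

```python
import bisect

number = 5
lst = [4, 5, 6]

# insort finds the position by binary search and performs exactly one
# insertion, so the list is never modified while it is being iterated.
bisect.insort(lst, number)
print(lst)  # [4, 5, 5, 6]
```

This also covers the "smaller than everything" and "larger than everything" cases, so the all(...) checks are not needed.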
I am doing some homework and I am stuck. I am supposed to write code in Python 3 that reads a text file that is like a bill with a number of amounts in it, and calculates the total amount.
I figured that a bill (simple example) will contain a quantity and then a price on each line, like:
2 10$
1 10 $
and so on
So I figured that I would create a list with all the numbers in it, multiply the first element by the second, then jump ahead in the list so that the third and fourth elements get multiplied, and so on, until there are no more numbers in my list. While this happens, I want the result of every multiplication to go into a new list called sums, where they will later be summed.
My code looks like this so far:
file = open('bill.txt')
s = file.read()
file.close()
numbers = []
garbage = []
for x in s.split():
    try:
        numbers.append(float(x))
    except ValueEror:
        garbage.append()
print(numbers)
for n in numbers:
    sums = []
    start = 0
    nxt = start + 1
    t = numbers[start]*numbers[nxt]
    if n <= len(numbers):
        start += 2
        nxt += 2
        summor.append(t)
    if n == len(numbers):
        print(sum(sums))
The problem in your code is likely that you keep resetting sums for every number in the list, so you keep forgetting the sums you previously collected. The same applies to start and nxt. If you move these definitions outside of the loop, and also change summor.append(t) to sums.append(t), it should almost work. Also note that the loop variable n does not iterate over the indexes of numbers but over the actual items (i.e. the numbers), so the check n == len(numbers) at the end is unlikely to do what you expect. Instead, you can put the print after the loop, because once you reach that point the loop is over and all numbers have been looked at.
A common idiom is to use zip to combine two elements with each other. Usually, you use zip on two different iterators or sequences, but you can also use zip on the same iterator for a single list:
>>> numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> it = iter(numbers)
>>> for a, b in zip(it, it):
...     print(a, b)
...
1 2
3 4
5 6
7 8
As you can see, this will group the numbers automatically while iterating through the list. So now you just need to collect those sums:
>>> sums = []
>>> it = iter(numbers)
>>> for a, b in zip(it, it):
...     sums.append(a * b)
...
>>> sum(sums)
100
And finally, you can shorten that a bit more using generator expressions:
>>> it = iter(numbers)
>>> sum(a * b for a, b in zip(it, it))
100
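The pairing idiom can be combined with the parsing step from the question. The sketch below is an illustration under assumptions: bill_total is a hypothetical helper, and it replaces every $ with a space before splitting, because float("10$") raises ValueError and such tokens would otherwise be dropped.

```python
def bill_total(text):
    """Sum quantity * price over a bill given as text (hypothetical helper)."""
    numbers = []
    for token in text.replace("$", " ").split():
        try:
            numbers.append(float(token))
        except ValueError:
            pass  # ignore anything that is not a number
    # Passing the same iterator to zip twice pairs up neighbouring items.
    it = iter(numbers)
    return sum(quantity * price for quantity, price in zip(it, it))

print(bill_total("2 10$\n1 10 $\n"))  # 30.0
```

If the bill contains an odd number of numeric tokens, zip silently drops the last one, so a length check may be worth adding for real input.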
This may be a typo from when you entered it, but on your 2nd line of data there is a space between the amount and the $, and no space on the line above. This will cause some problems when stripping out the $.
Try this:
total = 0
with open('data.txt', 'r') as f:
    lines = f.readlines()
amounts = []
garbage = []
for line in lines:
    try:
        multiples = line.split()
        multiplier = multiples[0]
        amount = multiples[1].strip('$\n')
        amounts.append(int(multiplier)*float(amount))
    except (IndexError, ValueError):
        # lines with a missing field or a non-numeric value end up here
        garbage.append(line)
total = sum(amounts)
print(total)
With some minor formatting on output or the text data you should be able to get the result you want. Here is the txt data I used:
2 10$
1 10$
3 20.00$
and the output is:
90.0
Also, you may want to get away from file = open('data.txt', 'r') which requires you to explicitly close your file. Use with open('data.txt', 'r') as f: and your file will implicitly be closed.
Firstly, I would say, quite a good attempt. If you don't want to be spoiled, you should not read the code I have written unless you try out the corrections/improvements yourself. (I am not a pro)
Well, as you said, every line contains 2 numbers that need to be multiplied, so you should start by reading line by line rather than individual numbers, as that might get troublesome.
file = open('bill.txt')
s = file.readlines()
file.close()
numbers = []
garbage = []
for line in s:
    try:
        # remove the $ sign for easy number extraction, then expect exactly two fields
        quantity, price = line.replace('$', '').split()
        numbers.append((int(quantity), float(price)))
    except ValueError:
        garbage.append(line)
print(numbers)
Later, when summing up the products of the quantities and the prices, you probably won't need a list for that; a single running total should be enough (named total here so that it does not shadow the built-in sum).
total = 0
for num, bill in numbers:
    total += num * bill
print("The total amount is", total)
Several things:
sums = [] is inside the loop for n in numbers. This means you're resetting the sums list every time. You should move this outside the loop.
The same applies to start = 0 and nxt = start + 1. These values now keep being reset to 0 and 1 respectively for each number in the list.
You are comparing if n <= len(numbers): but what is n? It's an actual value in your list of numbers, it doesn't represent the position in the list. You should replace this with while start < len(numbers): - as long as the starting position is still within the list we should keep going. This also means you can remove n from your code.
The code now becomes:
sums = []
start = 0
nxt = start + 1
while start < len(numbers):
    t = numbers[start]*numbers[nxt]
    start += 2
    nxt += 2
    sums.append(t)
print(sum(sums))
I have run into a very strange issue. I use a multi-dimensional list to store some data. The data are in a .txt file; every line has 9 digits, and there are 450 lines of data. I want to group every 9 lines of data (a 9x9 digit grid) into a sublist. I use the code below to store the data. My problem is that when I finish and print the multi-dimensional list, every line of data in the list is the same. Sorry for my poor description; maybe the code explains it better. Please tell me what's wrong with it. I use Python 2.7.5 on Windows, thanks.
# grid is a 3-dimension list, the first dimension is the index of 9x9 digit subgrid
# there are 50 9x9 digit subgrid in total.
grid = [[[0]*9]*9]*50
with open('E:\\sudoku.txt', 'r') as f:
    lines = f.readlines()
    for line_number, line in enumerate(lines, 1):
        # omit this line, it is not data
        if line_number % 10 == 1:
            continue
        else:
            for col, digit in enumerate(line[:-1]):
                if line_number % 10 == 0:
                    x = line_number / 10 - 1
                    y = 8
                else:
                    x = line_number / 10
                    y = line_number % 10 - 2
                grid[x][y][col] = int(digit)
                # I print all the digits in the list and compare them with the .txt file
                # it shows every single cell in grid are set correctly !!
                print 'x=%d, y=%d, z=%d, value=%d '% (x, y, col, grid[x][y][col])
# But strange thing happens here
# I only get same line of value, like this:
# [[[0, 0, 0, 0, 0, 8, 0, 0, 6], [0, 0, 0, 0, 0, 8, 0, 0, 6] ... all the same line
# and 000008006 happens to be the last line of data in the .txt file
# what happens to the rest of data ? It's like they are all overwritten by the last line
print grid
List multiplication does not clone (or create new) objects; it simply references the same (in your case mutable) object multiple times.
See here:
a = [[3]*3]*3
print(a)
a[0][1] = 1
print(a)
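grid = [[[0]*9]*9]*50 does not create 50 independent 9x9 grids. [0]*9 creates one list of nine zeros; multiplying that by 9 creates an outer list holding nine references to that same row; multiplying again by 50 creates a list holding 50 references to that same outer list. Only one row list ever exists.
List multiplication only creates new references to the same object. In other words, every row in grid refers to the same place in memory, so writing to one row changes all of them, and the last line written is what you see everywhere.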
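A common way around this is to build the nested lists with comprehensions, so that each row is created anew. A minimal sketch (the comprehension behaves the same way in Python 2.7 and Python 3):

```python
# Each comprehension level evaluates its inner expression again, so every
# row and every subgrid is a distinct list object.
grid = [[[0] * 9 for _ in range(9)] for _ in range(50)]

grid[0][0][0] = 7
print(grid[0][1][0])  # 0, the neighbouring row is untouched
print(grid[1][0][0])  # 0, the other subgrids are untouched

# For comparison, the multiplied version shares a single row everywhere.
shared = [[[0] * 9] * 9] * 50
shared[0][0][0] = 7
print(shared[49][8][0])  # 7
```

The innermost [0] * 9 is still fine, because integers are immutable and each cell is replaced by assignment rather than modified in place.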