I am trying to check the first character in each line from a separate data file. This is the loop that I am using, but for some reason I get an error that says string index out of range.
for line_no in length:
    line_being_checked = linecache.getline(file_path, line_no)
    print(line_being_checked[0])
From what I understand, length is the number of lines you want to check in the file.
You could do something like this:
for line in open("file.txt", "r").read().splitlines():
    print(line[0])
This way, you can be sure that you visit exactly the lines that exist in the file.
As for the error, you probably have an empty line: indexing an empty string with [0] raises "string index out of range", so check len(line) before indexing.
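A minimal sketch of that check, assuming the goal is simply to skip empty lines. The helper name first_characters is made up for illustration, and io.StringIO stands in for the real data file. Note also that linecache.getline returns an empty string for a line number past the end of the file, which triggers the same error.

```python
import io

def first_characters(lines):
    """Return the first character of every non-empty line."""
    result = []
    for line in lines:
        text = line.rstrip("\n")  # drop the trailing newline, if any
        if text:                  # skip empty lines instead of indexing them
            result.append(text[0])
    return result

# io.StringIO is a stand-in for a real file; open("file.txt") iterates the same way.
sample_file = io.StringIO("alpha\n\nbeta\ngamma\n")
print(first_characters(sample_file))  # ['a', 'b', 'g']
```

Any iterable of lines works here, so the same function accepts an open file object directly.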
In Codio, there is a challenge to take a file, search for the number of times a string appears in it, then print that number. I was able to get the result using some suggestions, but I am still unclear on a few things.
The main question is, at what point is the loop searching for the substring S? The count() syntax that I see everywhere involves using the string to be searched, followed by the dot operator, and then the function with the substring we want to find as the parameter. It would look something like: P.count(S)
What confuses me is that the function is using line in place of P. So does this mean the function is searching line for the substring? And if so, how does that work if line is simply the counter variable for the for loop? I just want to have a clearer understanding of how this function is working in this context to get me the correct count of times that substring S appears in file P.
import sys
P= sys.argv[1]
S= sys.argv[2]
# Your code goes here
f = open(P, 'r')
c = 0
for line in f.readlines():
    c += line.count(S)
print(c)
does this mean the function is searching "line" for the substring
Yes, that's exactly what it means. And the value of line changes in every loop iteration.
And if so, how does that work if "line" is simply the counter variable for the "for" loop
It's not. Python for loops don't have counters. line is the actual line of text.
for letter in ['A', 'B', 'C']:
    print(letter)
prints
A
B
C
Let's dissect the loop:
for line in f.readlines():
    c += line.count(S)
f is the file object for your open file.
readlines is a method that reads the whole file and returns a list of strings, each of which is one line of the file (including its trailing newline).
Thus, the statement for line in f.readlines(): iterates the variable line through the file contents; on each loop iteration, line will be the appropriate string value, the next line of the file.
Therefore, line.count(S) returns the number of non-overlapping times the target string S appears in that line of the file. The increment c += adds that to your running total.
Does that make things clear enough?
BTW, please learn to use descriptive variable names. One-letter names with mixed upper- and lower-case are a bad habit in the long run.
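The loop can be sketched with more descriptive names as follows. This is only an illustration, not the challenge's reference solution: count_occurrences is a made-up helper, and io.StringIO stands in for the file named on the command line.

```python
import io

def count_occurrences(lines, substring):
    """Sum the non-overlapping occurrences of substring over all lines."""
    total = 0
    for line in lines:            # line is the text of each line, not a counter
        total += line.count(substring)
    return total

# Stand-in for open(path); a file object yields its lines when iterated.
sample_file = io.StringIO("the cat sat\non the mat\nthe end\n")
print(count_occurrences(sample_file, "the"))  # 3
```

Iterating over the file object directly, rather than calling readlines(), avoids building the whole list of lines in memory first.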
I have a file which currently stores a string eeb39d3e-dd4f-11e8-acf7-a6389e8e7978
which I am trying to pass as a variable to my subprocess command.
My current code looks like this
with open(logfilnavn, 'r') as t:
    test = t.readlines()
    print(test)
But this prints ['eeb39d3e-dd4f-11e8-acf7-a6389e8e7978\n'] and I don't want the [' and \n'] parts to be passed into my command, so I'm trying to remove them by using replace.
with open(logfilnavn, 'r') as t:
    test = t.readlines()
    removestrings = test.replace('[', '').replace('[', '').replace('\\', '').replace("'", '').replace('n', '')
    print(removestrings)
I get the exception value shown below. How can I replace these characters with nothing and store the result as a string for my subprocess command?
'list' object has no attribute 'replace'
so how can I replace these with nothing and store them as a string for my subprocess command?
readlines() returns a list. Try print(test[0].strip())
You can read the whole file and split lines using str.splitlines:
test = t.read().splitlines()
Your test variable is a list, because readlines() returns a list of all lines read.
Since you said the file only contains this one line, you probably want to work on only the first line that you read. The brackets and quotes belong to the printed form of the list, not to the string, so only the trailing newline needs to be removed:
removestrings = test[0].rstrip('\n')
Where you went wrong...
file.readlines() in Python returns a list of the lines in the file (what many other languages call an array). Here you are treating that list as a string. You must first select a string inside it, then apply the string-only method to that string.
Even then, your replace chain is aimed at the wrong thing: the brackets, quotes and \n are how the Python interpreter displays the list so that a person can read it. Apart from the trailing newline character, they are not part of the string itself.
Further information...
Inside the program the value is a list object, not the text you see printed; the printed form exists only because raw memory contents would be unreadable to us. The example below works for any number of lines, but it only prints the first element, so you will need to change that if you want the others.
this may be useful...
You could perhaps make the variables globally available, so that other parts of the program can read them before they go out of scope.
more useless stuff
Scope means the region of the program in which a variable is accessible; once execution leaves that region, the interpreter can remove the variable from memory. Scope also keeps names local in larger programs: i is used in a lot of for loops, and without scopes every variable in the whole project would need a different name. Scopes nest, however: an inner scope can still see the names of the scope that encloses it. An easy way to picture this is as the branches of a tree: the tips of separate branches do not touch, and neither do their variables.
solution?
e.g:
with open(logfilnavn, 'r') as file:
    lines = file.readlines()  # creates a list
# a list comprehension that goes through each item and takes off the last character: \n - the newline character
# (this assumes every line, including the last one, ends with a newline)
# this will work with any number of lines
strippedLines = [line[:-1] for line in lines]
# or
strippedLines = [line.replace('\n', '') for line in lines]
# you can now print the string stored within the list
print(strippedLines[0])  # this prints the first element in the list
I hope this helped!
You get the error because readlines returns a list object. Since you mentioned in the comment that there is just one line in the file, it's better to use readline() instead:
line = ""  # so you can use it as a variable outside the `with` block
with open(logfilnavn, 'r') as t:
    line = t.readline().rstrip('\n')  # readline() keeps the trailing newline, so strip it
print(line)
# output,
eeb39d3e-dd4f-11e8-acf7-a6389e8e7978
readlines will return a list of lines, and you can't use replace with a list.
If you really want to use readlines, you should know that it doesn't remove the newline character from the end, you'll have to do it yourself.
lines = [line.rstrip('\n') for line in t.readlines()]
But even after removing the newline character from the end of each line, you'll still have a list of lines. From the question, it looks like you only have one line, so you can just access the first line with lines[0].
Or you can leave out readlines and just use read, which reads all of the contents of the file, and then call rstrip on the result.
contents = t.read().rstrip('\n')
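Putting the answers above together, a minimal sketch might look like this. The helper read_single_value is made up for illustration, io.StringIO stands in for the real log file, and the command list uses placeholder names ("some-tool", "--id") because the actual subprocess command is not shown in the question.

```python
import io

def read_single_value(file_obj):
    """Return the first line of file_obj without surrounding whitespace."""
    return file_obj.readline().strip()

# Stand-in for: with open(logfilnavn, 'r') as t: ...
sample_file = io.StringIO("eeb39d3e-dd4f-11e8-acf7-a6389e8e7978\n")
value = read_single_value(sample_file)
print(repr(value))  # 'eeb39d3e-dd4f-11e8-acf7-a6389e8e7978'

# Hypothetical argument list; "some-tool" and "--id" are placeholders only.
command = ["some-tool", "--id", value]
print(command)
```

Passing the value as its own list element, rather than formatting it into one command string, means no quoting or bracket stripping is needed.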
I have a very large file, which cannot be opened by any kind of text editor.
I need to check (1) whether the line starts with a specific string and (2) whether a number at a specific position (col 148, 3 digits) is smaller than a predefined number. If both hold, the complete line should be printed.
So I tried the following code, but it doesn't work.
fobj = open("test2.txt")
for line in fobj:
    if (line.startswith("ABS")) and (fp.seek(3, 148) < 400):
        print line.rstrip()
Can anyone help me?
To compare a number with a string you need to convert it:
int(fp.seek(3, 148)) < 400
You also have to check that the string contains only digits.
But seek() is not the function you are looking for: it moves the file position to a specific byte offset and does not read any text from the current line.
Look here: seek() function?
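If your number is always at the same position, you can slice the line instead. This treats 148 as a zero-based index, and three digits need a three-character slice:
int(line[148:151]) < 400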
Try it with regular expressions and string operations:
http://pymotw.com/2/re/
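The slicing approach can be sketched as a small generator. Everything here is an assumption made for illustration: the helper name matching_lines, the zero-based start index, and the short sample data, which puts the number at index 4 instead of 148 to keep the example readable.

```python
import io

def matching_lines(lines, prefix, start, width, limit):
    """Yield lines that start with prefix and whose fixed-width field is < limit."""
    for line in lines:
        if not line.startswith(prefix):
            continue
        field = line[start:start + width]  # zero-based slice of the numeric field
        try:
            number = int(field)
        except ValueError:                 # field missing or not numeric
            continue
        if number < limit:
            yield line.rstrip("\n")

# Stand-in for open("test2.txt"); the file is read one line at a time.
sample_file = io.StringIO("ABS 123 x\nABS 450 y\nXYZ 001 z\nABS abc\n")
print(list(matching_lines(sample_file, "ABS", 4, 3, 400)))  # ['ABS 123 x']
```

For the file in the question the call would be something like matching_lines(fobj, "ABS", 148, 3, 400), adjusted to 147 if column 148 is counted from one. Because lines are processed one at a time, the size of the file does not matter.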
Note: I was using the wrong source file for my data - once that was fixed, my issue was resolved. It turns out, there is no simple way to use int(..) on a string that is not an integer literal.
This is an example from the book "Machine Learning In Action", and I cannot quite figure out what is wrong. Here's some background:
from numpy import *

def file2matrix(filename):
    fr = open(filename)
    numberOfLines = len(fr.readlines())
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    fr = open(filename)
    index = 0
    for line in fr.readlines():
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1])) # Problem here.
        index += 1
    return returnMat,classLabelVector
The .txt file is as follows:
40920 8.326976 0.953952 largeDoses
14488 7.153469 1.673904 smallDoses
26052 1.441871 0.805124 didntLike
75136 13.147394 0.428964 didntLike
38344 1.669788 0.134296 didntLike
...
I am getting an error on the line classLabelVector.append(int(listFromLine[-1])) because, I believe, int(..) is trying to parse a string (i.e. "largeDoses") that is not an integer literal. Am I missing something?
I looked up the documentation for int(), but it only seems to parse numbers and integer literals:
http://docs.python.org/2/library/functions.html#int
Also, an excerpt from the book explains this section as follows:
Finally, you loop over all the lines in the file and strip off the return line character with line.strip(). Next, you split the line
into a list of elements delimited by the tab character: '\t'. You take
the first three elements and shove them into a row of your matrix, and
you use the Python feature of negative indexing to get the last item
from the list to put into classLabelVector. You have to explicitly
tell the interpreter that you’d like the integer version of the last
item in the list, or it will give you the string version. Usually,
you’d have to do this, but NumPy takes care of those details for you.
Strings like "largeDoses" cannot be converted to integers. In folder Ch02 of that code project there are two data files; use the second one, datingTestSet2.txt, instead of loading the first.
You can use ast.literal_eval and catch the ValueError it raises for a malformed string such as "largeDoses" (by the way, int('9.4') will also raise an exception).
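The same idea can be sketched with plain int() and a try/except. The helper parse_label is made up for illustration, and returning None for non-integer text is just one possible convention, not something the book prescribes.

```python
def parse_label(text):
    """Return text as an int, or None if it is not an integer literal."""
    try:
        return int(text)
    except ValueError:  # raised for "largeDoses", "9.4", "" and similar
        return None

print(parse_label("3"))           # 3
print(parse_label("largeDoses"))  # None
print(parse_label("9.4"))         # None
```

A check like this makes it obvious when the wrong data file has been loaded, because every label comes back as None.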
I can't seem to figure out how to use values given in a text file and import them into python to create a list. What I'm trying to accomplish here is to create a gameboard and then put numbers on it as a sample set. I have to use Quickdraw to accomplish this - I kind of know how to get the numbers on Quickdraw but I cannot seem to import the numbers from the text file. Previous assignments involved getting the user to input values or using an I/O redirection, this is a little different. Could anyone assist me on this?
It depends on the contents of the file you want to read and on the form of the list you want to get.
# assuming you have values each on separate line
values = []
for line in open('path-to-the-file'):
    values.append(line)
    # might want to implement stripping newlines and such in here
    # by using line.strip() or .rstrip()
# or perhaps more than one value in a line, with some separator
values = []
for line in open('path-to-the-file'):
    # e.g. ':' as a separator
    separator = ':'
    line = line.split(separator)
    for value in line:
        values.append(value)
# or all in one line with separators
values = open('path-to-the-file').read().split(separator)
# might want to use .strip() on this one too, before split method
It could be more accurate if we knew the input and output requirements.
Two steps here:
1. open the file
2. read the lines
This page might help you: http://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
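Those two steps might look like the sketch below. It assumes the file holds whitespace-separated integers, which the question does not actually state; read_numbers is a made-up helper and io.StringIO stands in for the real text file.

```python
import io

def read_numbers(lines):
    """Collect every whitespace-separated integer from the given lines."""
    numbers = []
    for line in lines:
        for token in line.split():  # split() with no argument also skips blank lines
            numbers.append(int(token))
    return numbers

# Stand-in for: with open("board.txt") as board_file: ...  (hypothetical file name)
sample_file = io.StringIO("1 2 3\n4 5 6\n\n7\n")
print(read_numbers(sample_file))  # [1, 2, 3, 4, 5, 6, 7]
```

If the values use another separator, such as commas, pass it to split, for example line.strip().split(",").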