Find line index in a text file (Python)

I want to get the index of the line that corresponds to a certain string (in this case, InChI=1S/C11etc..) in a text file (content), here is my code:
with open('compounds.dat', encoding='utf-8', errors='ignore') as f:
    content = f.readlines()
index = [x for x in range(len(content)) if "InChI=1S/C11H8O3/c1-6-5-9(13)10-7(11(6)14)3-2-4-8(10)12/h2-5" in content[x].lower()]
print(index)
However, I get empty brackets: []. But I am fairly sure that the line exists, because if I run this:
for line in f:
    if u"InChI=1S/C11H8O3/c1-6-5-9(13)10-7(11(6)14)3-2-4-8(10)12/h2-5" in line:
        l = line
I get the line I am interested in.

Expanding upon my comment, calling lower() will lowercase your target string, while your search string has upper case letters - there's no chance you'll match anything like that.
Additionally, you don't have to iterate over the range. for can directly iterate over the items in content. This will work.
search_str = "InChI=1S/C11H8O3/c1-6-5-9(13)10-7(11(6)14)3-2-4-8(10)12/h2-5"
lines = [x for x in content if search_str in x]
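If the goal is the line index rather than the line itself, enumerate gives both at once. A minimal sketch, using a small in-memory list in place of content (the surrounding sample lines are made up for illustration):

```python
# Hypothetical stand-in for content = f.readlines()
content = [
    "header line\n",
    "InChI=1S/C11H8O3/c1-6-5-9(13)10-7(11(6)14)3-2-4-8(10)12/h2-5\n",
    "another line\n",
]
search_str = "InChI=1S/C11H8O3/c1-6-5-9(13)10-7(11(6)14)3-2-4-8(10)12/h2-5"

# enumerate yields (index, line) pairs; keep the index when the line matches
indices = [i for i, line in enumerate(content) if search_str in line]
print(indices)
```

The indices are zero-based, so add 1 if you need the line number as a text editor would show it.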

Do not use .lower() in the code and it should work fine.

Related

What's the fastest way to convert a list into a python list?

Say I have a list like such:
1
2
3
abc
What's the fastest way to convert this into python list syntax?
[1,2,3,"abc"]
The way I currently do it is to use regex in a text editor. I was looking for a way where I can just throw in my list and have it convert immediately.
Read the file, split into lines, convert each line if numeric.
# Read the file
with open("filename.txt") as f:
    text = f.read()
# Split into lines
lines = text.splitlines()
# Convert if numeric
def parse(line):
    try:
        return int(line)
    except ValueError:
        return line
lines = [parse(line) for line in lines]
If it's in a text file like you mentioned in the comments then you can simply read it and split by a new line. For example, your code would be:
with open("FileName.txt", "r") as f:
    L = f.read().split("\n")
print(L)
Where L is the list.
Now, if you're looking for the values to be of the correct type instead of all being strings, then you can loop through each item in the list to check. For example, you could do the following:
for i in range(len(L)):
    if L[i].isnumeric():
        L[i] = int(L[i])
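One caveat with the isnumeric() check: it returns False for strings with a sign, so negative numbers would stay as strings. A small sketch of the try/except alternative applied to an in-memory list (the helper name convert and the sample values are assumptions for illustration):

```python
def convert(item):
    """Return int(item) when it parses as an integer, else the original string."""
    try:
        return int(item)
    except ValueError:
        return item

L = ["1", "2", "-3", "abc"]
converted = [convert(item) for item in L]
print(converted)

# isnumeric() rejects the minus sign, so "-3" would be left as a string
print("-3".isnumeric())
```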

Remove commas and newlines from text file in python

I have text file which looks like this:
ab initio
ab intestato
ab intra
a.C.
acanka, acance, acanek, acankach, acankami, acanką
Achab, Achaba, Achabem, Achabie, Achabowi
I would like to parse every comma-separated word into a list, so it would look like ['ab initio', 'ab intestato', 'ab intra', 'a.C.', 'acanka', ...]. Also note that some lines contain a single entry and do not end with a comma.
When I used
list1.append(line.strip()) it gave me string of every line instead of separate words. Can someone provide me some insight into this?
Full code below:
list1=[]
filepath="words.txt"
with open(filepath, encoding="utf8") as fp:
    line = fp.readline()
    while line:
        list1.append(line.strip(','))
        line = fp.readline()
Very close, but I think you want split instead of strip, and extend instead of append
You can also iterate directly over the lines with a for loop.
list1=[]
filepath="words.txt"
with open(filepath, encoding="utf8") as fp:
    for line in fp:
        list1.extend(line.strip().split(', '))
You can use your code to get down to "list of line"-content and apply:
cleaned = [ x for y in list1 for x in y.split(',')]
This takes every line you already parsed into your list and splits it at each comma to create a new, flat list.
sberry's all-in-one solution, which uses no intermediate list, is faster.
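Splitting on ', ' assumes every comma is followed by exactly one space. A slightly more forgiving sketch splits on the bare comma and strips each piece; io.StringIO stands in for the real file here, and the entry without a space after the comma is an invented variation to show the difference:

```python
import io

# Stand-in for words.txt (sample entries adapted from the question)
sample = io.StringIO(
    "ab initio\n"
    "a.C.\n"
    "acanka, acance,acanek\n"
)

words = []
for line in sample:
    # split on bare commas, trim whitespace, and skip empty pieces
    words.extend(part.strip() for part in line.split(",") if part.strip())
print(words)
```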

Breaking txt file into list of lists by character and by row

I am just learning to code and am trying to take an input txt file and break into a list (by row) where each row's characters are elements of that list. For example if the file is:
abcde
fghij
klmno
I would like to create
[['a','b','c','d','e'], ['f','g','h','i','j'],['k','l','m','n','o']]
I have tried this, but the results aren't what I am looking for.
file = open('alpha.txt', 'r')
lst = []
for line in file:
    lst.append(line.rstrip().split(','))
print(lst)
[['abcde', 'fghij', 'klmno']]
I also tried this, which is closer, but I don't know how to combine the two codes:
file = open('alpha.txt', 'r')
lst = []
for line in file:
    for c in line:
        lst.append(c)
print(lst)
['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o']
I tried to add the rstrip into the lst.append but it didn't work (or I didn't do it properly). Sorry - complete newbie here!
I should mention that I don't want newline characters included. Any help is much appreciated!
This is very simple. You have to use the list() constructor to make a string into its respective characters.
with open('alpha.txt', 'r') as file:
    # rstrip('\n') removes the newline without cutting a character off a last line that has none
    print([list(line.rstrip('\n')) for line in file])
(The with open construct is just an idiom, so you don't have to do all the handling with the file like closing it, which you forgot to do)
If you want to split a string into its characters you can just use list(s) (where s = 'asdf'):
file = open('alpha.txt', 'r')
lst = []
for line in file:
    lst.append(list(line.strip()))
print(lst)
You are appending each entry to your original list. You want to create a new list for each line in your input, append to that list, and then append that list to your master list. For example,
file = open('alpha.txt', 'r')
lst = []
for line in file:
    newLst = []
    for c in line.rstrip('\n'):  # drop the trailing newline
        newLst.append(c)
    lst.append(newLst)
print(lst)
Use a nested list comprehension. The outer loop iterates over the lines in the file and the inner loop over the characters of each line.
with open('alpha.txt') as f:
    out = [[char for char in line.strip()] for line in f]
req = [['a','b','c','d','e'], ['f','g','h','i','j'],['k','l','m','n','o']]
print(out == req)
prints
True

Python - replace a line by its column in file

Sorry for posting such an easy question, but I couldn't find an answer on Google.
I want my code to do something like this:
code:
lines = open("Bal.txt").write
lines[1] = new_value
lines.close()
P.S. I want to replace one line in a file with a new value.
xxx.dat before:
ddddddddddddddddd
EEEEEEEEEEEEEEEEE
fffffffffffffffff
with open('xxx.dat','r') as f:
    x=f.readlines()
x[1] = "QQQQQQQQQQQQQQQQQQQ\n"
with open('xxx.dat','w') as f:
    f.writelines(x)
xxx.dat after:
ddddddddddddddddd
QQQQQQQQQQQQQQQQQQQ
fffffffffffffffff
Note: f.read() returns a single string, whereas f.readlines() returns a list of lines, which lets you replace one entry in that list.
Including the \n newline character is important to separate x[1] from x[2] when you next read the file; without it you would end up with:
ddddddddddddddddd
QQQQQQQQQQQQQQQQQQQfffffffffffffffff
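The read, modify, write-back steps can be wrapped in a small function. This is only a sketch: replace_line is a hypothetical helper name, it appends the newline itself, and the demo runs on a temporary file so nothing real is overwritten.

```python
import os
import tempfile

def replace_line(path, index, new_text):
    """Replace the line at zero-based index with new_text (hypothetical helper)."""
    with open(path, "r") as f:
        lines = f.readlines()
    # Normalise the ending so the next line is not glued onto this one
    lines[index] = new_text.rstrip("\n") + "\n"
    with open(path, "w") as f:
        f.writelines(lines)

# Demo on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xxx.dat")
    with open(path, "w") as f:
        f.write("ddd\nEEE\nfff\n")
    replace_line(path, 1, "QQQ")
    with open(path) as f:
        result = f.read()
print(result)
```

An index past the end of the file raises IndexError, which is probably the behaviour you want rather than silently appending.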

Python reading file problems

highest_score = 0
g = open("grades_single.txt","r")
arrayList = []
for line in highest_score:
    if float(highest_score) > highest_score:
        arrayList.extend(line.split())
g.close()
print(highest_score)
Hello, I wondered if anyone could help me; I'm having problems here. I have to read in a file which contains 3 lines. The first line is of no use, and neither is the third. The second contains a list of letters, which I have to pull out (for instance all the As, all the Bs, all the Cs, all the way up to G); there are multiple occurrences of each letter. I have to be able to count how many of each there are in this program. I'm very new to this, so please bear with me if the code I wrote is wrong. I just wondered if anyone could point me in the right direction on how to pull out the letters on the second line and count them. I then have to apply a mathematical function to these letters, but I hope to work that out for myself.
Sample of the data:
GTSDF60000
ADCBCBBCADEBCCBADGAACDCCBEDCBACCFEABBCBBBCCEAABCBB
*
You do not read the contents of the file. To do so, use the .read() or .readlines() method on your opened file. .readlines() reads the lines of a file into a list, like so:
g = open("grades_single.txt","r")
filecontent = g.readlines()
since it is good practice to directly close your file after opening it and reading its contents, directly follow with:
g.close()
another option would be:
with open("grades_single.txt","r") as g:
    content = g.readlines()
the with-statement closes the file for you (so you don't need to call the .close() method this way).
Since you need the contents of the second line only you can choose that one directly:
content = g.readlines()[1]
.readlines() doesn't strip a line of its newline (which usually is \n), so you still have to do so:
content = g.readlines()[1].strip('\n')
The .count()-method lets you count items in a list or in a string. So you could do:
dct = {}
for item in content:
dct[item] = content.count(item)
this can be written more concisely with a dictionary comprehension (it does the same amount of work):
dct = {item:content.count(item) for item in content}
at last you can get the highest score and print it:
highest_score = max(dct.values())
print(highest_score)
.values() returns the values of a dictionary and max, well, returns the maximum value in a list.
Thus the code that does what you're looking for could be:
with open("grades_single.txt","r") as g:
    content = g.readlines()[1].strip('\n')
dct = {item:content.count(item) for item in content}
highest_score = max(dct.values())
print(highest_score)
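As an alternative to calling .count() once per character, the standard library's collections.Counter tallies every letter in a single pass. A short sketch, with the second line of the sample data pasted in directly instead of read from the file:

```python
from collections import Counter

# Second line of the sample data from the question
content = "ADCBCBBCADEBCCBADGAACDCCBEDCBACCFEABBCBBBCCEAABCBB"

counts = Counter(content)            # maps each letter to its number of occurrences
highest_score = max(counts.values())
print(counts["A"], counts["B"], highest_score)
```

Counter behaves like a dictionary, so counts["A"] or counts.items() work the same way as with dct above.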
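highest_score = 0
arrayList = []
with open("grades_single.txt") as f:
    # a file object cannot be indexed, so read the lines into a list first
    arrayList.extend(f.readlines()[1].strip())
print(arrayList)
This will show you the second line of that file, one character per list item. It extends arrayList, and then you can do whatever you want with that list.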
import re
# opens the file in read mode (and closes it automatically when done)
with open('my_file.txt', 'r') as opened_file:
    # Temporarily stores all lines of the file here.
    all_lines_list = []
    for line in opened_file.readlines():
        all_lines_list.append(line)
# This is the selected pattern.
# It basically means "match a single character from a to g"
# and ignores upper or lower case
pattern = re.compile(r'[a-g]', re.IGNORECASE)
# Which line i want to choose (assuming you only need one line chosen)
line_num_i_need = 2
# (1 is deducted since the first element in python has index 0)
matches = re.findall(pattern, all_lines_list[line_num_i_need-1])
print('\nMatches found:')
print(matches)
print('\nTotal matches:')
print(len(matches))
You might want to check regular expressions in case you need some more complex pattern.
To count the occurrences of each letter I used a dictionary instead of a list. With a dictionary, you can access each letter count later on.
d = {}
g = open("grades_single.txt", "r")
for i, line in enumerate(g):
    if i == 1:
        holder = list(line.strip())
g.close()
for letter in holder:
    d[letter] = holder.count(letter)
# Python 3: use items() and call format() on the string itself
for key, value in d.items():
    print("{},{}".format(key, value))
Outputs (on Python 3.7+, in order of first appearance)
A,9
D,5
C,15
B,15
E,4
G,1
F,1
One can treat the first line specially (and in this case ignore it) with next inside try: except StopIteration:. In this case, where you only want the second line, follow with another next instead of a for loop.
with open("grades_single.txt") as f:
    try:
        next(f) # discard 1st line
        line = next(f)
    except StopIteration:
        raise ValueError('file does not even have two lines')
    # now use line
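A related variation, if raising an exception is not needed: itertools.islice can skip the first line, and next accepts a default value to return when the file is too short. In this sketch second_line is a hypothetical helper and io.StringIO stands in for the opened file:

```python
import io
from itertools import islice

def second_line(f):
    """Return the second line of f, or None if there are fewer than two lines."""
    # islice(f, 1, 2) skips one line and yields at most one more
    return next(islice(f, 1, 2), None)

line = second_line(io.StringIO("GTSDF60000\nADCB\n*\n"))
missing = second_line(io.StringIO("only one line\n"))
print(repr(line), missing)
```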
