Return only returning 1st item - python

In my function below I'm trying to return the full text from multiple .txt files and append the results to a list. However, my function only returns the text from the first file in my directory. If I replace return with print, it logs all the results to the console, each on its own line, but I can't seem to append the results to a list. What am I doing wrong with the return statement?
Thanks.
import glob
import copy

file_paths = []
file_paths.extend(glob.glob("C:\Users\7812397\PycharmProjects\Sequence diagrams\*"))
matching_txt = [s for s in file_paths if ".txt" in s]
full_text = []

def fulltext():
    for file in matching_txt:
        f = open(file, "r")
        ftext = f.read()
        all_seqs = ftext.split("title ")
        return all_seqs

print fulltext()

You're putting return inside your loop, so the function exits on the first iteration. I think you want to append inside the loop and return after it. Alternatively, you could yield the values here.
for file in matching_txt:
    f = open(file, "r")
    ....
    full_text.append(all_seqs)
return full_text

You can convert your function to a generator, which is more efficient in terms of memory use:
def fulltext():
    for file_name in matching_txt:
        with open(file_name) as f:
            ftext = f.read()
            all_seqs = ftext.split("title ")
            yield all_seqs
Then convert the generator to a list if the files are not huge; otherwise you can simply loop over the result if you want to use the generator's items:
full_text = list(fulltext())
Some issues in your function:
First off, don't use Python built-in names as variable names (in this case file).
Secondly, when dealing with files (and other external connections) you are better off using a with statement, which will close the file at the end of the block automatically.
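Putting the two fixes together (accumulate inside the loop, return once after it, and use with so files are closed), a corrected version might look like this sketch; the glob pattern is a placeholder for your own directory:

```python
import glob

def fulltext(pattern):
    """Collect the 'title '-split sections from every matching .txt file."""
    full_text = []
    for file_name in glob.glob(pattern):
        if ".txt" not in file_name:
            continue
        with open(file_name, "r") as f:  # closed automatically at block end
            all_seqs = f.read().split("title ")
        full_text.append(all_seqs)  # accumulate; return only after the loop
    return full_text
```

Each element of the returned list is itself the list of sections from one file.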

Related

Python Question - How to extract text between {textblock}{/textblock} of a .txt file?

I want to extract the text between {textblock_content} and {/textblock_content}.
With the script below, only the 1st line of the introtext.txt file is extracted and written to a newly created text file. I don't know why the script does not also extract the other lines of introtext.txt.
f = open("introtext.txt")
r = open("textcontent.txt", "w")
for l in f.readlines():
    if "{textblock_content}" in l:
        pos_text_begin = l.find("{textblock_content}") + 19
        pos_text_end = l.find("{/textblock_content}")
        text = l[pos_text_begin:pos_text_end]
        r.write(text)
f.close()
r.close()
How to solve this problem?
Your code actually works fine, assuming you have the begin and end markers on the same line. But I think this is not what you dreamed of: you can't read multiple blocks from one line, and you can't read a block that starts and ends on different lines.
First of all, take a look at the object returned by the open function. You can use its read method to access the whole text. Also take a look at with statements, which help you work with files more easily and safely. To rewrite your code so it reads everything between {textblock_content} and {/textblock_content}, we could write something like this:
def get_all_tags_content(
    text: str,
    tag_begin: str = "{textblock_content}",
    tag_end: str = "{/textblock_content}",
) -> list[str]:
    useful_text = text
    ans = []
    # Heavy cycle, needs some optimization:
    # works in O(len(text) ** 2), we can do better
    while tag_begin in useful_text:
        useful_text = useful_text.split(tag_begin, 1)[1]
        if tag_end not in useful_text:
            break
        block_content, useful_text = useful_text.split(tag_end, 1)
        ans.append(block_content)
    return ans
with open("introtext.txt", "r") as f:
    with open("textcontent.txt", "w+") as r:
        r.write(str(get_all_tags_content(f.read())))
To make this function efficient enough to work with really big files, note that this implementation copies the remaining text every time a block appears; that is unnecessary and slows the program down. (Imagine millions of lines each containing {textblock_content}"hello world"{/textblock_content}: on every line we would copy the whole remaining text just to continue.) We can instead walk through the text with a plain loop and an index to avoid the copying. Try to solve it yourself.
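For comparison, here is one sketch of that index-based approach: str.find with a moving offset scans the text in a single pass without rebuilding the string (the function name is my own):

```python
def get_all_tags_content_fast(text,
                              tag_begin="{textblock_content}",
                              tag_end="{/textblock_content}"):
    """Collect every block between the tags without copying the remaining text."""
    ans = []
    pos = 0
    while True:
        start = text.find(tag_begin, pos)
        if start == -1:
            break
        start += len(tag_begin)
        end = text.find(tag_end, start)
        if end == -1:
            break
        ans.append(text[start:end])
        pos = end + len(tag_end)  # keep scanning after the closing tag
    return ans
```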
When you call readlines(), the file pointer reaches the end of the file. For further calls of the same method, the return value will be an empty list, so if you change your code to something like one of the snippets below it should work properly:
f = open("introtext.txt")
r = open("textcontent.txt", "w")
f_lines = f.readlines()
for l in f_lines:
    if "{textblock_content}" in l:
        pos_text_begin = l.find("{textblock_content}") + 19
        pos_text_end = l.find("{/textblock_content}")
        text = l[pos_text_begin:pos_text_end]
        r.write(text)
f.close()
r.close()
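The exhausted-pointer behavior described above is easy to see in isolation (the file here is just a throwaway example):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as fh:
    fh.write("line 1\nline 2\n")

f = open(path)
first = f.readlines()   # pointer is now at end-of-file
second = f.readlines()  # nothing left to read
f.close()
print(first)   # ['line 1\n', 'line 2\n']
print(second)  # []
```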
Also, you can implement it through with context manager like the below code snippet:
with open("textcontent.txt", "w") as r:
with open("introtext.txt") as f:
for line in f:
if "{textblock_content}" in l:
pos_text_begin = l.find("{textblock_content}") + 19
pos_text_end = l.find("{/textblock_content}")
text = l[pos_text_begin:pos_text_end]
r.write(text)

Read multiple files, search for string and store in a list

I am trying to search through a list of files, look for the word 'type' and the following word, then put them into a list with the file name. For example, this is what I am looking for:
File Name, Type
[1.txt, [a, b, c]]
[2.txt, [a, b]]
My current code returns a list for every type.
[1.txt, [a]]
[1.txt, [b]]
[1.txt, [c]]
[2.txt, [a]]
[2.txt, [b]]
Here is my code. I know my logic will return a single value into the list, but I'm not sure how to edit it so it will just be the file name with a list of types.
output = []
for file_name in find_files(d):
    with open(file_name, 'r') as f:
        for line in f:
            line = line.lower().strip()
            match = re.findall('type ([a-z]+)', line)
            if match:
                output.append([file_name, match])
Learn to perform your actions at the proper loop level.
In this case, you say that you want to accumulate all of the references into a single list, but then your code creates one output entry per reference rather than one per file. Change that focus:
with open(file_name, 'r') as f:
    ref_list = []
    for line in f:
        line = line.lower().strip()
        match = re.findall('type ([a-z]+)', line)
        if match:
            ref_list.extend(match)
    # Once you've been through the entire file,
    # THEN you add an entry for that file,
    # with the entire reference list
    output.append([file_name, ref_list])
You might find it useful to use a dict here instead:
output = {}
for file_name in find_files(d):
    with open(file_name, 'r') as f:
        output[file_name] = []
        for line in f:
            line = line.lower().strip()
            match = re.findall('type ([a-z]+)', line)
            if match:
                output[file_name].extend(match)
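The dict version can also be written with collections.defaultdict so the empty list is created on first use. A self-contained sketch (the function takes an explicit list of paths rather than the question's find_files helper):

```python
import re
from collections import defaultdict

def collect_types(file_names):
    """Map each file name to every word that follows 'type' in that file."""
    output = defaultdict(list)
    for file_name in file_names:
        with open(file_name, 'r') as f:
            for line in f:
                match = re.findall('type ([a-z]+)', line.lower().strip())
                output[file_name].extend(match)  # extend flattens per-line matches
    return dict(output)
```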

Naming lists based on the names of text files

I'm trying to write a function which takes word lists from text files and appends each word in the file to a list with the same name as the text file. For instance, using the text files Verbs.txt and Nouns.txt would result in all the words in Verbs.txt being in a verbs list and all the nouns in a nouns list. I'm trying to do it in a for loop:
def loadAllWords():
    fileList = ['Adjectives.txt', 'Adverbs.txt', 'Conjunctions.txt',
                'IntransitiveVerbs.txt', 'Leadin.txt', 'Nounmarkers.txt',
                'Nouns.txt', 'TransitiveVerbs.txt']
    for file in fileList:
        infile = open(file, 'r')
        word_type = file[:-4]
        word_list = [line for line in infile]
    return word_list
Of course, I could do it easily once for each text file:
def loadAllWords():
    infile = open("Adjectives.txt", "r")
    wordList = []
    wordList = [word for word in infile]
    return wordList
but I'd like my function to do it automatically with each one. Is there a way to do this, or should I just stick with a for loop for each file?
You should use a dict for that, like (untested):
results = {}
for file in file_list:
    infile = open(file, 'r')
    word_type = file[:-4]
    results[word_type] = [line for line in infile]
return results
Also, you don't need the list comprehension; you can just do:
results[word_type] = list(infile)
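A minimal self-contained version of that dict approach might look like this (the file names are just examples from the question):

```python
def load_all_words(file_list):
    """Return a dict keyed by file name minus '.txt', e.g. {'Nouns': [...]}."""
    results = {}
    for name in file_list:
        with open(name, 'r') as infile:
            results[name[:-4]] = [line.strip() for line in infile]
    return results
```

Then results['Nouns'] plays the role of the nouns list you wanted, without creating a separate variable per file.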
You can create new variables with custom names by manipulating the locals() dictionary, which is where local variables are stored. But it is hard to imagine any case where this would be a good idea. I strongly recommend Stephen Roach’s suggestion of using a dictionary, which will let you keep track of the lists more neatly. But if you really want to create local variables for each file, you can use a slight variation on his code:
results = {}
for file in file_list:
    with open(file, 'r') as infile:
        word_type = file[:-4]
        results[word_type] = list(infile)
# store each list in a local variable with that name
# (note: updating locals() is only reliable at module scope)
locals().update(results)

Python list can't delete first item

I'm trying to create a list of text files from a directory so I can extract key data from them. However, the list my function returns also contains a list of the file pathways as the first item. I've tried del full_text[0], which didn't work (nor did any other index), and also the remove function. Any ideas as to why this might be happening?
Thanks
import glob

file_paths = []
file_paths.extend(glob.glob("C:\Users\12342255\PycharmProjects\Sequence diagrams\*"))
matching_txt = [s for s in file_paths if ".txt" in s]
print matching_txt
full_text = []

def fulltext():
    for file in matching_txt:
        f = open(file, "r")
        ftext = f.read()
        all_seqs = ftext.split("title ")
        print all_seqs

full_text.append(fulltext())
print full_text
You can use slicing to get rid of the first element: full_text[1:]. This creates a copy of the list. Otherwise, you can call full_text.pop(0) and resume using full_text.
I see at least two ways to do so:
1) you can create a new list from the first position, e.g. newList = oldList[1:]
2) use the remove method: full_text.remove(full_text[0])
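Both removal options behave as described; a quick sketch with placeholder data:

```python
full_text = ["paths", "doc1", "doc2"]

# Option 1: slicing copies everything after the first element
trimmed = full_text[1:]

# Option 2: pop(0) mutates the list in place and returns the removed item
removed = full_text.pop(0)

print(trimmed)    # ['doc1', 'doc2']
print(removed)    # 'paths'
print(full_text)  # ['doc1', 'doc2']
```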

How to read last line of CSV file into a list that I can manipulate (python)

I wrote a small function that will return the last row of a csv file. I can't for the life of me figure out how to store that row into a list or array to use after the function has completed.
import os
import csv

hello = []

def getLastFile(filename):
    distance = 1024
    with open(filename, 'rb') as f:
        reader = csv.reader(f)
        # seek/tell/readlines belong on the file object, not the csv reader
        f.seek(0, os.SEEK_END)
        if f.tell() < distance:
            f.seek(0, os.SEEK_SET)
            lines = f.readlines()
            lastline = lines[-1]
        else:
            f.seek(-1 * distance, os.SEEK_END)
            lines = f.readlines()
            lastline = lines[-1]
        return lastline
I have tried defining an empty list at the top and appending lastline to it, but I found that was wrong. Right now the function returns a csv row, e.g. 'a','b','c'. How can I make the output of my function save this into a global list or array that I can then use? Thank you!
If you want the result of a function to be used elsewhere, you have to actually call the function. For instance:
def print_last_line(filename):
    print getLastFile(filename)
Additionally there's the problem of scope: anything you want to keep after the function returns must live in a scope that outlives it. For instance:
test = []
def use_last_line(filename):
    test.append(getLastFile(filename))  # This works: append mutates the global list without rebinding the name
def use_last_line(filename):
    test = []
    test.append(getLastFile(filename))  # This also runs, but test is local and is lost when the function returns
Specifically, for what I'm guessing you're trying to do above, you would want to actually call the function and assign the result to your hello array:
hello = getLastFile(filename)
You could open the CSV file as an array with numpy's genfromtxt and then slice the last row; or, if you know how many rows your file has, you can use the skip_header keyword to skip everything but the last row, i.e.:
import numpy as np
data = np.genfromtxt(filename, delimiter=",", skip_header=1024)
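As an alternative sketch using only the standard library, collections.deque with maxlen=1 streams the file and keeps just the final parsed row (this is not from the answers above, just one common idiom):

```python
import csv
from collections import deque

def last_csv_row(filename):
    """Stream the file through csv.reader, keeping only the final row."""
    with open(filename, newline='') as f:
        return deque(csv.reader(f), maxlen=1)[0]
```

The deque discards every row but the most recent one, so memory stays constant regardless of file size.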
