I'm trying to create a list of text files from a directory so I can extract key data from them. However, the list my function returns also contains a list of the file paths as its first item. I've tried del full_text[0] (and other indices), which didn't work, and also the remove method. Any ideas as to why this might be happening?
Thanks
import glob

file_paths = []
file_paths.extend(glob.glob(r"C:\Users\12342255\PycharmProjects\Sequence diagrams\*"))
matching_txt = [s for s in file_paths if ".txt" in s]
print matching_txt

full_text = []

def fulltext():
    for file in matching_txt:
        f = open(file, "r")
        ftext = f.read()
        all_seqs = ftext.split("title ")
        print all_seqs

full_text.append(fulltext())
print full_text
You can use slicing to get rid of the first element: full_text[1:]. This creates a copy of the list. Otherwise, you can call full_text.pop(0) and keep using full_text.
I see at least two ways to do so:
1) you can create a new list from the first position onward, e.g. newList = oldList[1:]
2) use the remove method: full_text.remove(full_text[0])
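To illustrate the two options side by side (the list contents here are made up; the first element stands in for the unwanted path list):

```python
# Hypothetical stand-in for full_text: the first element is the unwanted item.
full_text = ["unwanted first item", "seq one", "seq two"]

# Option 1: slicing builds a new list without the first element;
# the original list is left untouched.
trimmed = full_text[1:]

# Option 2: pop(0) removes the first element in place and returns it.
removed = full_text.pop(0)
```

Slicing is the safer choice if other code still needs the original list; pop(0) is simpler when the list can be modified in place.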
How do I tokenize my CSV into one list rather than a separate list per line?
with open('train.csv') as file_object:
    for trainline in file_object:
        tokens_train = sent_tokenize(trainline)
        print(tokens_train)
This is how I am getting the output:
['2.1 Separated of trains']
['Principle: The method to make the signal is different.']
['2.2 Context']
I want all of them in one list
['2.1 Separated of trains','Principle: The method to make the signal is different.','2.2 Context']
Since sent_tokenize() returns a list, you could simply extend a starting list each time.
alltokens = []
with open('train.csv') as file_object:
    for trainline in file_object:
        tokens_train = sent_tokenize(trainline)
        alltokens.extend(tokens_train)
print(alltokens)
Or with a list comprehension:
with open('train.csv') as file_object:
    alltokens = [token for trainline in file_object for token in sent_tokenize(trainline)]
print(alltokens)
Both solutions will work even if sent_tokenize() returns a list with more than one element.
Initialize an empty list:
    out = []
And inside the loop, extend it with each line's tokens (append would add each per-line list as a nested element rather than flattening it):
    out.extend(tokens_train)
Maybe you have to modify your tokenizer too.
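The difference between append and extend is the crux here: append adds the per-line token list as one nested element, while extend flattens its items into the result. A minimal sketch (sent_tokenize is stubbed out with a trivial splitter, since NLTK may not be installed):

```python
# Stub standing in for nltk's sent_tokenize, purely for illustration.
def sent_tokenize(text):
    return [s for s in text.split(". ") if s]

lines = ["2.1 Separated of trains",
         "Principle: The method to make the signal is different."]

nested, flat = [], []
for trainline in lines:
    tokens = sent_tokenize(trainline)
    nested.append(tokens)  # produces a list of lists
    flat.extend(tokens)    # produces one flat list, as the question asks for
```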
First off, I am aware this kind of question has already been posted, but none of those suggestions seemed to work for me.
I have a text file that looks like this:
"foo","bar","green","white"
and so on.
I want to import the entire content to a list like this:
txtfile = open("H:/Python/colors.txt", "r")
list1 = [txtfile.read()]
print (list1[0])
but Python does not recognize the words as separate list items, so list1[0] gives me the entire content and list1[1] does not exist.
Why does python do that and how do I avoid it?
You don't need to build the list separately; you can write:
with open("stopwords.txt", "r") as fd:
    var = fd.readline()
    temp = eval(var)
    list1 = [ele for ele in temp]
print(list1)
Explanation:
Read the line using readline into var, which is a string. eval then evaluates that string as a Python expression: the quoted, comma-separated values parse as a tuple of strings. The list comprehension then gives the required output. Be aware that eval will execute arbitrary Python code, so only use it on trusted input.
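A safer variant is ast.literal_eval from the standard library, which accepts only Python literals and parses the same kind of line into a tuple:

```python
import ast

# Sample line in the file's format (taken from the question).
line = '"foo","bar","green","white"'

# literal_eval parses the bare comma-separated string literals as a tuple.
list1 = list(ast.literal_eval(line))
```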
You can use:
list1 = txtfile.read().split(',')
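One caveat: split(',') keeps the surrounding quote characters from the file, so each item comes out as '"foo"' rather than 'foo'. If the bare words are wanted, the quotes can be stripped as well (a sketch, assuming the file content is exactly as shown in the question):

```python
# Sample content in the file's format (from the question).
content = '"foo","bar","green","white"'

with_quotes = content.split(',')
without_quotes = [item.strip('"') for item in with_quotes]
```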
Try this.
txtfile = open("H:/Python/colors.txt", "r")
list1 = txtfile.read().split(",")
print(list1[0])
Try using the code below:
file = open("H:/Python/colors.txt", "r")
for line in file:
    fields = line.split(",")
    print(fields[0], fields[1], fields[2], fields[3])
I just started to learn python. I need to store a csv file data into a list of tuples: tuples to represent the values on each row, list to store all the rows.
The function I have problem with is when I need to filter the list. Basically create a copy of the list with only the ones that met criteria. I have successfully appended all the tuples into a list, but when I need to append the tuples into a new list, it doesn't work.
def filterRecord():
    global filtered
    filtered = list()
    try:
        if int(elem[2]) >= int(command[-1]):  # the condition
            # if I print(elem) here, all results are correct
            filtered.append(tuple(elem))  # tuples do not add into the list
            # len(filtered) is 0
    except ValueError:
        pass

def main():
    infile = open('file.csv')
    L = list()
    for line in infile:
        parseLine()  # a function that stores each row into a tuple
    for line in stdin:
        command = line.split()  # process user input, multiple lines
        for elem in L:
            if command == 0:
                filterRecord()
If I run it, the program doesn't respond. If I force stop it, the traceback always points at for line in stdin.
Also, I am not allowed to use the csv module in this program.
I think you need to import sys and use for line in sys.stdin
You should use python's built-in library to parse csv files (unless this is something like a homework assignment): https://docs.python.org/2/library/csv.html.
You can then do something like:
import csv

with open('file.csv', 'r') as f:
    reader = csv.DictReader(f, delimiter=",")
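Building on that, the rows can be collected as tuples and filtered in one pass. The column names and threshold below are made up, since the real file layout isn't shown in the question, and io.StringIO stands in for the open file:

```python
import csv
import io

# io.StringIO stands in for open('file.csv'); the columns are hypothetical.
f = io.StringIO("name,year,score\nalice,2019,85\nbob,2020,42\n")

reader = csv.DictReader(f, delimiter=",")
rows = [tuple(row.values()) for row in reader]

# Keep rows whose third field is at least 50, mirroring the question's
# int(elem[2]) >= int(command[-1]) condition.
filtered = [r for r in rows if int(r[2]) >= 50]
```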
In my function below I'm trying to return the full text from multiple .txt files and append the results to a list; however, my function only returns the text from the first file in my directory. If I replace return with print, it logs all the results to the console, each on its own line, but I can't seem to append the results to a list. What am I doing wrong with the return statement?
Thanks.
import glob
import copy

file_paths = []
file_paths.extend(glob.glob(r"C:\Users\7812397\PycharmProjects\Sequence diagrams\*"))
matching_txt = [s for s in file_paths if ".txt" in s]
full_text = []

def fulltext():
    for file in matching_txt:
        f = open(file, "r")
        ftext = f.read()
        all_seqs = ftext.split("title ")
        return all_seqs

print fulltext()
You're putting return inside your loop, so the function exits after processing the first file. I think you want to do the following (alternatively, you could yield the values here):
for file in matching_txt:
    f = open(file, "r")
    ....
    full_text.append(all_seqs)
return full_text
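Putting the pieces together, a complete version of the question's function might look like this (with matching_txt passed as a parameter rather than read from a global, and a with statement so each file is closed automatically):

```python
import glob
import os
import tempfile

# Corrected version of the question's function: full_text is built up inside
# the loop and returned once, after the loop, so every file contributes.
def fulltext(matching_txt):
    full_text = []
    for file_name in matching_txt:
        with open(file_name) as f:  # closes the file automatically
            all_seqs = f.read().split("title ")
            full_text.append(all_seqs)
    return full_text

# Tiny demonstration with a throwaway file (directory and contents are made up).
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "demo.txt"), "w") as f:
    f.write("title first title second")

result = fulltext(glob.glob(os.path.join(tmp_dir, "*.txt")))
```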
You can convert your function to a generator which is more efficient in terms of memory use:
def fulltext():
    for file_name in matching_txt:
        with open(file_name) as f:
            ftext = f.read()
            all_seqs = ftext.split("title ")
            yield all_seqs
Then convert the generator to a list if the files are not huge; otherwise you can simply loop over the result to consume the generator's items:
full_text = list(fulltext())
Some issues in your function:
First off, don't use Python built-in names as variable names (in this case file).
Secondly, for dealing with files (and other external resources) you are better off using a with statement, which closes the file at the end of the block automatically.
I'm trying to analyze a two-column (color, number_of_occurrences) .tsv file that has a heading line, using a dictionary. I'm trying to skip the heading line in the most generic way possible (assume this means requiring the second column to be of int type). The following is the best I've come up with, but it seems like there has to be a better way:
filelist = []
color_dict = {}
with open('file1.tsv') as F:
    filelist = [line.strip('\n').split('\t') for line in F]
for item in filelist:
    try:  # attempt to add values to existing dictionary entry
        x = color_dict[item[0]]
        x += int(item[1])
        color_dict[item[0]] = x
    except (KeyError, ValueError):  # color not observed yet, or non-convertible string
        try:
            color_dict[item[0]] = int(item[1])
        except ValueError:  # if item[1] can't convert to int
            pass
Seems like there should be a better way to handle the trys and exceptions.
File excerpt by request:
color Observed
green 15
gold 20
green 35
Can't you just skip the first element of the list by slicing it as [1:], like this:
filelist = [line.strip('\n').split('\t') for line in F][1:]
Now, filelist won't contain the element for the first line at all, i.e., the heading line.
Or, as pointed in comment by #StevenRumbalski, you can simply do next(F, None) before your list comprehension to avoid making a copy of your list, after first element like this:
with open('file1.tsv') as F:
    next(F, None)
    filelist = [line.strip('\n').split('\t') for line in F]
Also, it would be better if you use a defaultdict here.
Use it like this:
from collections import defaultdict
color_dict = defaultdict(int)
And this way, you won't have to check for the existence of a key before operating on it. So, you can simply do:
color_dict[item[0]] += int(item[1])
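Combining the header skip and the defaultdict, the whole loop collapses to a few lines. Here io.StringIO stands in for the open file, using the excerpt from the question:

```python
from collections import defaultdict
import io

# Stand-in for open('file1.tsv'), built from the question's excerpt.
F = io.StringIO("color\tObserved\ngreen\t15\ngold\t20\ngreen\t35\n")

next(F, None)  # skip the heading line
color_dict = defaultdict(int)
for line in F:
    item = line.strip('\n').split('\t')
    color_dict[item[0]] += int(item[1])
```

With the header skipped up front, no try/except is needed at all for this data.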
I would use defaultdict in this case, because when each key is encountered for the first time it is not already in the mapping, so an entry is created automatically.
from collections import defaultdict

color_dict = defaultdict(int)
for item in filelist:
    color_dict[item[0]] += int(item[1])