What I have is a CSV file where the header is "keyword" and each cell under the header contains text, so that it looks like this:
Keyword
Lions Tigers Bears
Dog Cat
Fish
Shark Guppie
What I am trying to do is to parse each of the phrases in that list into individual words so that the end product looks like this:
Keyword
Lion
Tigers
Bear
Dog
Cat...
Right now, my code takes the CSV file and splits the list into individual parts but still does not create a uniform column.
datafile = open(b'C:\Users\j\Desktop\helloworld.csv', 'r')
data = []
for row in datafile:
    data.append(row.strip().split(","))
    white = row.split()
    print (white)
and my output looks like this:
['Keyword']
['Lion', 'Tigers']
['Dolphin', 'Bears', 'Zebra']
['Dog', 'Cat']
I know that a possible solution would involve the use of lineterminator = '\n' but I am not sure how to incorporate that into my code. Any help would be very much appreciated!
** EDITED -- the source CSV does not have commas separating the words within each phrase
Use extend instead of append on lists to add all items from a list to another one:
datafile = open(b'C:\Users\j\Desktop\helloworld.csv', 'r')
data = []
for row in datafile:
    data.extend(row.strip().split())
print(data)
To get rid of further whitespace around the individual entries, use
datafile = open(b'C:\Users\j\Desktop\helloworld.csv', 'r')
data = []
for row in datafile:
    data.extend(item.strip() for item in row.split())
print(data)
Also, to read files safely, you can make use of a with statement (you won't have to take care of closing your files anymore):
# Raw string (r'...') so that \U in the Windows path is not read as a Unicode escape
with open(r'C:\Users\j\Desktop\helloworld.csv', 'r') as datafile:
    data = []
    for row in datafile:
        data.extend(item.strip() for item in row.split())
print(data)
EDIT: After the OP's clarification, I removed the "," argument in split so that it splits on whitespace rather than on commas.
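To get the single uniform column the question asks for, the flattened list can be written back out with one word per row. This is only a minimal sketch: the helper name `split_keywords` is hypothetical, and an in-memory `io.StringIO` stands in for the real CSV file so the example runs anywhere. It also shows where the `lineterminator` option mentioned in the question fits in.

```python
import csv
import io

def split_keywords(lines):
    """Flatten whitespace-separated phrases into one list of words."""
    words = []
    for line in lines:
        # split() with no argument drops the trailing newline as well
        words.extend(line.split())
    return words

# In-memory stand-in for the CSV file from the question
source = io.StringIO("Keyword\nLions Tigers Bears\nDog Cat\nFish\nShark Guppie\n")
words = split_keywords(source)

# Write one word per row; lineterminator controls the row ending
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows([word] for word in words)
print(out.getvalue())
```

With a real file, `source` and `out` would be replaced by file objects returned by `open(...)`, with the output file opened using `newline=''`.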
You should be able to use this code to read your file. Replace file name with what you have. My file content is exactly what you posted above.
keyword = "Keyword"
with open("testing.txt") as file:
    data = file.read().replace("\n", " ").split(" ")
    for item in data:
        if item == keyword:
            print("%s" % keyword)
        else:
            print(" %s" % item)
Output:
Keyword
Lions
Tigers
Bears
Dog
Cat
Fish
Shark
Guppie
Keyword
Dog
Something
Else
Entirely
You just need to split the read:
with open("in.txt","r+") as f:
    data = f.read().split()
    f.seek(0) # go back to start of file
    f.write("\n".join(data)) # write new data to file
    f.truncate() # cut off leftover text if the new content is shorter
['Keyword', 'Lions', 'Tigers,', 'Bears', 'Dog', 'Cat', 'Fish', 'Shark', 'Guppie']
Related
So I understand how to manipulate a text file and move data in and out of the program, but I'm trying to take raw data in a text file and load it into an array that is originally empty. How would I approach this?
Assume my raw data contains 3 words, I want to place those words into a variable called Array. The raw data of the text file contains the 3 following words: ' Apple Banana Orange '. I would like it to load into the array as: Array = ["Apple", "Banana", "Orange"]. How would you approach this?
with open("C:\\Users\\NameList.txt","r") as f:
    Array = []
    nameList = f.readlines(Array)
I am aware the code is wrong, but I'm not sure how to fix it even after reading so much.
If your input test.txt is like below:
Apple Banana Orange
This is the solution you are looking for.
with open("test.txt","r") as f:
    text = f.readlines()
    Array = text[0].split()
In case you have more than 1 line, you can use this one:
with open("test.txt","r") as f:
    text = f.read().splitlines()
    Array = [i.split() for i in text]
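Note that the multi-line version above gives a list of lists (one inner list per line). If a single flat list across all lines is wanted instead, something like the following might do. It is a minimal sketch: the helper name `load_words` is hypothetical, and the second line of the sample data is made up for illustration.

```python
import io

def load_words(file):
    """Return every whitespace-separated word in the file as one flat list."""
    # str.split() with no argument splits on any run of whitespace,
    # including newlines, and ignores leading/trailing whitespace.
    return file.read().split()

# In-memory stand-in for the text file; the second line is a made-up example
sample = io.StringIO(" Apple Banana Orange \nMango Kiwi\n")
Array = load_words(sample)
print(Array)  # ['Apple', 'Banana', 'Orange', 'Mango', 'Kiwi']
```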
This will read all the lines in your file, one list entry per line (the output below assumes the file has one word per line):
with open("C:\\Users\\NameList.txt","r") as f:
    lines = f.read().splitlines()
Array = list()
for line in lines:
    Array.append(line)
print(Array)
for item in Array:
    if 'Apple' == item:
        print(item)
Output:
#first loop
['Apple', 'Banana', 'Orange']
#second loop
Apple
I'm making a program for shopping lists :)
The data will be like:
Amanda, Bananas, Apples, Oranges
Steve, Mushrooms, Pork, Spaghetti, Sauce
Dave, Onions, Eggs, Bread, Bacon
This is what I have so far
file = open(filename, "r")
readfile = file.read()
shlist = readfile.splitlines()
So I have created a list where each person's shopping is an item in the list.
Is it possible to split these into another list while still being items of a list themselves? I tried to add the following:
for shopping in shlist:
shopping.split(,)
But I am receiving an error.
Alternately I could just use the index of the commas to deduce the location & length of the items. I am not sure which would be best.
Well, you're getting an error because you meant to type .split(','), but that still won't solve your problem. A call to split() takes a string and generates a list of strings as a result. The result doesn't magically replace the original string.
The simplest solution is something like:
with open(filename, "r") as file:
    result = [line.split(',') for line in file]
If you need both the pre-split lines and the post-split lines:
with open(filename, "r") as file:
    lines = file.readlines()
    result = [line.split(',') for line in lines]
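One thing to watch with a plain split(','): with data like the sample in the question, each item keeps its leading space and the last item on each line keeps the trailing newline. A minimal sketch of one way to trim them, assuming a hypothetical helper `parse_shopping` and an in-memory stand-in for the file:

```python
import io

def parse_shopping(lines):
    """Split each comma-separated line into a list of trimmed items."""
    result = []
    for line in lines:
        line = line.strip()          # drop the trailing newline
        if not line:                 # skip blank lines
            continue
        result.append([item.strip() for item in line.split(",")])
    return result

# In-memory stand-in for the shopping list file from the question
sample = io.StringIO(
    "Amanda, Bananas, Apples, Oranges\n"
    "Steve, Mushrooms, Pork, Spaghetti, Sauce\n"
    "Dave, Onions, Eggs, Bread, Bacon\n"
)
shopping = parse_shopping(sample)
print(shopping[0])  # ['Amanda', 'Bananas', 'Apples', 'Oranges']
```

Each inner list then starts with the person's name, so `shopping[0][0]` is the name and `shopping[0][1:]` the items.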
I'm a beginner in python, and tried to find solution by googling. However, I couldn't find any solution that I wanted.
What I'm trying to do with python is pre-processing of data that finds keywords and get all rows that include keyword from a large csv file.
And somehow the nested loop goes through just once and then it doesn't go through on second loop.
The code shown below is a part of my code that finds keywords from the csv file and writes into text file.
def main():
    #Calling file (Directory should be changed)
    data_file = 'dataset.json'
    #Loading data.json file
    with open(data_file, 'r') as fp:
        data = json.load(fp)
    #Make the list for keys
    key_list = list(data.keys())
    #print(key_list)
    preprocess_txt = open("test_11.txt", "w+", -1, "utf-8")
    support_fact = 0
    for i, k in enumerate(key_list):
        count = 1
        #read csv, and split on "," the line
        with open("my_csvfile.csv", 'r', encoding = 'utf-8') as csvfile:
            reader = csv.reader(csvfile)
            #The number of q_id is 2
            #This is the part that the nested for loop doesn't work!!!!!!!!!!!!!!!!!!!!!!!!!!!!
            if len(data[k]['Qids']) == 2:
                print("Number 2")
                for m in range(len(data[k]['Qids'])):
                    print(len(data[k]['Qids']))
                    q_id = [data[k]['Qids'][m]]
                    print(q_id)
                    for row in reader: #--->This nested for loop doesn't work after going through one loop!!!!!
                        if all([x in row for x in q_id]):
                            print("YES!!!")
                            preprocess_txt.write("%d %s %s %s\n" % (count, row[0], row[1], row[2]))
                            count += 1
For the details of above code,
First, it extracts all keys from data.json file, and then put those keys into list(key_list).
Second, I used all([x in row for x in q_id]) method to check each row which contains a keyword(q_id).
However, as I commented in the code above, when data[k]['Qids'] has length 2, it prints YES!!! correctly on the first pass but does not print YES!!! on the second pass. That means it doesn't enter the for row in reader loop again, even though the csv file contains the keyword.
The figure of print is shown as below,
What did I do wrong..? or what should I add for the code to make it work..?
Can anybody help me out..?
Thanks for looking!
For sake of example, let's say I have a CSV file which looks like this:
foods.csv
beef,stew,apple,sauce
apple,pie,potato,salami
tomato,cherry,pie,bacon
And the following code, which is meant to simulate the structure of your current code:
def main():
    import csv
    keywords = ["apple", "pie"]
    with open("foods.csv", "r") as file:
        reader = csv.reader(file)
        for keyword in keywords:
            for row in reader:
                if keyword in row:
                    print(f"{keyword} was in {row}")
    print("Done")
main()
The desired result is that, for every keyword in my list of keywords, if that keyword exists in one of the lines in my CSV file, I will print a string to the screen - indicating in which row the keyword has occurred.
However, here is the actual output:
apple was in ['beef', 'stew', 'apple', 'sauce']
apple was in ['apple', 'pie', 'potato', 'salami']
Done
>>>
It was able to find both instances of the keyword apple in the file, but it didn't find pie! So, what gives?
The problem
The file handle (in your case csvfile) yields its contents once, and then they are consumed. Our reader object wraps around the file-handle and consumes its contents until they are exhausted, at which point there will be no rows left to read from the file (the internal file pointer has advanced to the end), and the inner for-loop will not execute a second time.
The solution
Either move the internal file pointer back to the beginning using seek after each iteration of the outer for-loop, or read the contents of the file once into a list or similar collection, and then iterate over the list instead:
Updated code:
def main():
    import csv
    keywords = ["apple", "pie"]
    with open("foods.csv", "r") as file:
        contents = list(csv.reader(file))
    for keyword in keywords:
        for row in contents:
            if keyword in row:
                print(f"{keyword} was in {row}")
    print("Done")
main()
New output:
apple was in ['beef', 'stew', 'apple', 'sauce']
apple was in ['apple', 'pie', 'potato', 'salami']
pie was in ['apple', 'pie', 'potato', 'salami']
pie was in ['tomato', 'cherry', 'pie', 'bacon']
Done
>>>
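For completeness, the other option mentioned above (rewinding with seek) could look roughly like this. It is a sketch rather than a drop-in fix: the helper name `find_keywords` is hypothetical, and an in-memory `io.StringIO` stands in for foods.csv so the example is self-contained. Reading into a list is usually simpler; rewinding is mainly of interest when the file is too large to keep in memory.

```python
import csv
import io

def find_keywords(file, keywords):
    """Return (keyword, row) pairs, rewinding the file for each keyword."""
    matches = []
    for keyword in keywords:
        file.seek(0)                  # move the file pointer back to the start
        for row in csv.reader(file):  # fresh reader over the rewound file
            if keyword in row:
                matches.append((keyword, row))
    return matches

# In-memory stand-in for foods.csv
sample = io.StringIO(
    "beef,stew,apple,sauce\n"
    "apple,pie,potato,salami\n"
    "tomato,cherry,pie,bacon\n",
    newline="",
)
for keyword, row in find_keywords(sample, ["apple", "pie"]):
    print(f"{keyword} was in {row}")
```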
I believe that your reader variable contains only the first line of your csv file, thus for row in reader executes only once.
Try:
with open("my_csvfile.csv", 'r', newline='', encoding = 'utf-8') as csvfile:
newline='' is the new argument introduced above. It has to come after the positional 'r' argument.
Reference: https://docs.python.org/3/library/csv.html#id3
Quote: "If csvfile is a file object, it should be opened with newline=''."
Example of data in txt file:
apple
orange
banana
lemon
pears
Code of filtering words with 5 letters without dictionary:
def numberofletters(n):
    file = open("words.txt","r")
    lines = file.readlines()
    file.close()
    for line in lines:
        if len(line) == 6:
            print(line)
    return
print("===================================================================")
print("This program can use for identify and print out all words in 5 letters from words.txt")
n = input("Please Press enter to start filtering words")
print("===================================================================")
numberofletters(n)
My question is: how do I create a dictionary whose keys are integers and whose values are the English words with that many letters, and then use the dictionary to identify and print out all the 5-letter words?
Imagine a huge list of words.
Sounds like a job for a defaultdict.
>>> from collections import defaultdict
>>> length2words = defaultdict(set)
>>>
>>> with open('file.txt') as f:
...     for word in f: # one word per line
...         word = word.strip()
...         length2words[len(word)].add(word)
...
>>> length2words[5]
set(['lemon', 'apple', 'pears'])
If you care about duplicates and insertion order, use a defaultdict(list) and append instead of add.
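The list-based variant mentioned above could be sketched as follows. The helper name `group_by_length` is hypothetical, and the sample list imitates what iterating over the words file from the question would yield (each word with its trailing newline).

```python
from collections import defaultdict

def group_by_length(words):
    """Map each word length to the list of words of that length, in input order."""
    groups = defaultdict(list)
    for word in words:
        word = word.strip()   # drop the trailing newline before measuring
        if word:              # ignore blank lines
            groups[len(word)].append(word)
    return groups

# Stand-in for the lines of words.txt from the question
sample = ["apple\n", "orange\n", "banana\n", "lemon\n", "pears\n"]
length2words = group_by_length(sample)
print(length2words[5])  # ['apple', 'lemon', 'pears']
```

Unlike the set version, this keeps duplicates and preserves the order in which the words appear in the file.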
You can write your for loop like this:
dicword = {}
for line in lines:
    word = line.strip()  # drop the trailing newline so the key is the real word length
    word_len = len(word)
    if word_len not in dicword:
        dicword[word_len] = [word]
    else:
        dicword[word_len].append(word)
Then you can get the 5-letter words by just doing dicword[5]
If I understood correctly, you need to filter your document and write the result to a file. For that you can write a CSV file with DictWriter (https://docs.python.org/2/library/csv.html).
DictWriter: Create an object which operates like a regular writer but maps dictionaries onto output rows.
BTW, you will be able to store and structure your document
def numberofletters(n):
    file = open("words.txt","r")
    lines = file.readlines()
    file.close()
    dicword = {}
    writer = csv.DictWriter(filename, fieldnames=fieldnames)
    writer.writeheader()
    for line in lines:
        if len(line) == 6:
            writer.writerow({'param_label': line, [...]})
    return
I hope that helps.
I have been trying to read/write values (lists) in a .txt file and use them later, but I can't find a function or anything else that lets me use these values as lists and not strings, since the readline function doesn't help.
Also, I don't want to use multiple text files to make up one list.
example:
v=[]
f = open("test.txt","r+",-1)
f.seek(0)
v.append(f.readline())
print(v)
in test.txt
cat, dog, dinosaur, elephant
cheese, hotdog, pizza, sushi
101, 23, 58, 23
I'm expecting to get the list v = [cat, dog, dinosaur, elephant] with each word at a separate index, but with this code (which is totally wrong) I get this instead:
v = ['cat,dog,dinosaur,elephant'], which is not what I want.
Sounds like you want to read it as comma separated values.
Try the following
import csv
with open('test.txt', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)
I believe that will put you on the right track. For more information about how the csv parser works, have a look at the docs
https://docs.python.org/3/library/csv.html
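Since the sample file has a space after each comma, the plain reader would return items such as ' dog' with a leading space. The csv module's skipinitialspace option handles that. A small sketch, using an in-memory `io.StringIO` as a stand-in for test.txt:

```python
import csv
import io

# In-memory stand-in for test.txt from the question
sample = io.StringIO(
    "cat, dog, dinosaur, elephant\n"
    "cheese, hotdog, pizza, sushi\n"
    "101, 23, 58, 23\n",
    newline="",
)

# skipinitialspace=True ignores whitespace immediately after each delimiter
rows = list(csv.reader(sample, skipinitialspace=True))
v = rows[0]
print(v)  # ['cat', 'dog', 'dinosaur', 'elephant']
```

Note that every value comes back as a string, so the third row is ['101', '23', '58', '23']; converting those to numbers would need an explicit int() call.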
To me, it looks like you're trying to read a file, and split it by ,.
This can be accomplished by
f = open("test.txt", "r+").read()
v = f.split(",")
print(v)
It should output
['cat', ' dog', ' dinosaur', ' elephant\ncheese', ...]
And so forth.