How to use lists from a text file in python? - python

I have been trying to read and write values (lists) in a .txt file and use them later, but I can't find a function that lets me use these values as lists rather than strings; readline alone doesn't help.
Also, I don't want to use multiple text files to make up one list.
example:
v=[]
f = open("test.txt","r+",-1)
f.seek(0)
v.append(f.readline())
print(v)
in test.txt
cat, dog, dinosaur, elephant
cheese, hotdog, pizza, sushi
101, 23, 58, 23
I'm expecting the list v = [cat, dog, dinosaur, elephant], with each word at a separate index, but with this code (which is totally wrong) I get this instead:
v = ['cat,dog,dinosaur,elephant'], which is not what I want.

Sounds like you want to read it as comma separated values.
Try the following
import csv
with open('test.txt', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)
I believe that will put you on the right track. For more information about how the csv parser works, have a look at the docs
https://docs.python.org/3/library/csv.html

To me, it looks like you're trying to read a file, and split it by ,.
This can be accomplished by
f = open("test.txt", "r+").read()
v = f.split(",")
print(v)
It should output
['cat', ' dog', ' dinosaur', ' elephant\ncheese', ...]
And so forth.
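Either approach can be adapted to return one clean list per line. The following is a minimal sketch, assuming the file layout shown in the question. `skipinitialspace=True` is a standard `csv.reader` option that drops the blank after each comma; the helper name `read_lists` and the temporary file are only there to make the example self-contained.

```python
import csv
import os
import tempfile

def read_lists(path):
    """Return one list of strings per line of a comma-separated text file."""
    with open(path, newline="") as f:
        # skipinitialspace drops the blank that follows each comma
        return [row for row in csv.reader(f, skipinitialspace=True)]

# Recreate the test.txt from the question in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "w", newline="") as f:
        f.write("cat, dog, dinosaur, elephant\n"
                "cheese, hotdog, pizza, sushi\n"
                "101, 23, 58, 23\n")
    rows = read_lists(path)

print(rows[0])                        # first line as a list of strings
numbers = [int(x) for x in rows[2]]   # values stay strings until converted
print(numbers)
```

Every value comes back as a string, so numeric lines such as the third one need an explicit conversion like the `int()` call above.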

Related

Nested for loop doesn't work in python while reading a same csv file

I'm a beginner in Python and tried to find a solution by googling, but I couldn't find one that fits my problem.
What I'm trying to do is pre-process data: find keywords and get all rows that include a keyword from a large csv file.
Somehow the nested loop runs through just once and then doesn't run on the second pass.
The code shown below is the part of my code that finds keywords in the csv file and writes to a text file.
import csv
import json

def main():
    #Calling file (Directory should be changed)
    data_file = 'dataset.json'
    #Loading data.json file
    with open(data_file, 'r') as fp:
        data = json.load(fp)
    #Make the list for keys
    key_list = list(data.keys())
    #print(key_list)
    preprocess_txt = open("test_11.txt", "w+", -1, "utf-8")
    support_fact = 0
    for i, k in enumerate(key_list):
        count = 1
        #read csv, and split on "," the line
        with open("my_csvfile.csv", 'r', encoding = 'utf-8') as csvfile:
            reader = csv.reader(csvfile)
            #The number of q_id is 2
            #This is the part that the nested for loop doesn't work!!!!!!!!!!!!!!!!!!!!!!!!!!!!
            if len(data[k]['Qids']) == 2:
                print("Number 2")
                for m in range(len(data[k]['Qids'])):
                    print(len(data[k]['Qids']))
                    q_id = [data[k]['Qids'][m]]
                    print(q_id)
                    for row in reader: #--->This nested for loop doesn't work after going through one loop!!!!!
                        if all([x in row for x in q_id]):
                            print("YES!!!")
                            preprocess_txt.write("%d %s %s %s\n" % (count, row[0], row[1], row[2]))
                            count += 1
Some details about the code above:
First, it extracts all keys from the data.json file and puts them into a list (key_list).
Second, I used all([x in row for x in q_id]) to check whether each row contains a keyword (q_id).
However, as I commented in the code, when data[k]['Qids'] has length 2, it prints YES!!! correctly on the first pass but not on the second, which means it doesn't enter the for row in reader loop even though the csv file contains the keyword.
[Figure: screenshot of the printed output]
What did I do wrong? What should I add to the code to make it work?
Can anybody help me out?
Thanks for looking!
For sake of example, let's say I have a CSV file which looks like this:
foods.csv
beef,stew,apple,sauce
apple,pie,potato,salami
tomato,cherry,pie,bacon
And the following code, which is meant to simulate the structure of your current code:
def main():
    import csv
    keywords = ["apple", "pie"]
    with open("foods.csv", "r") as file:
        reader = csv.reader(file)
        for keyword in keywords:
            for row in reader:
                if keyword in row:
                    print(f"{keyword} was in {row}")
    print("Done")
main()
The desired result is that, for every keyword in my list of keywords, if that keyword exists in one of the lines in my CSV file, I will print a string to the screen - indicating in which row the keyword has occurred.
However, here is the actual output:
apple was in ['beef', 'stew', 'apple', 'sauce']
apple was in ['apple', 'pie', 'potato', 'salami']
Done
>>>
It was able to find both instances of the keyword apple in the file, but it didn't find pie! So, what gives?
The problem
The file handle (in your case csvfile) yields its contents once, and then they are consumed. Our reader object wraps around the file-handle and consumes its contents until they are exhausted, at which point there will be no rows left to read from the file (the internal file pointer has advanced to the end), and the inner for-loop will not execute a second time.
The solution
Either move the internal file pointer back to the beginning using seek after each iteration of the outer for-loop, or read the contents of the file once into a list or similar collection, and then iterate over the list instead:
Updated code:
def main():
    import csv
    keywords = ["apple", "pie"]
    with open("foods.csv", "r") as file:
        contents = list(csv.reader(file))
    for keyword in keywords:
        for row in contents:
            if keyword in row:
                print(f"{keyword} was in {row}")
    print("Done")
main()
New output:
apple was in ['beef', 'stew', 'apple', 'sauce']
apple was in ['apple', 'pie', 'potato', 'salami']
pie was in ['apple', 'pie', 'potato', 'salami']
pie was in ['tomato', 'cherry', 'pie', 'bacon']
Done
>>>
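The other option, rewinding the file with seek before each pass, could look roughly like the sketch below. It assumes the same foods.csv contents as in the example; the function name `find_keywords` and the temporary file are illustrative only.

```python
import csv
import os
import tempfile

def find_keywords(path, keywords):
    """Return (keyword, row) pairs, rewinding the file for each keyword."""
    matches = []
    with open(path, "r", newline="") as file:
        for keyword in keywords:
            file.seek(0)               # move the file pointer back to the start
            reader = csv.reader(file)  # fresh reader over the rewound file
            for row in reader:
                if keyword in row:
                    matches.append((keyword, row))
    return matches

# Recreate foods.csv in a temporary directory so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foods.csv")
    with open(path, "w", newline="") as f:
        f.write("beef,stew,apple,sauce\n"
                "apple,pie,potato,salami\n"
                "tomato,cherry,pie,bacon\n")
    result = find_keywords(path, ["apple", "pie"])

for keyword, row in result:
    print(f"{keyword} was in {row}")
```

Rewinding re-reads the file from disk once per keyword, so for many keywords the list-based version is usually the better fit unless the file is too large to hold in memory.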
I believe that your reader variable contains only the first line of your csv file, so for row in reader executes only once.
Try:
with open("my_csvfile.csv", 'r', newline='', encoding='utf-8') as csvfile:
newline='' is the new argument introduced above; keyword arguments have to come after the positional 'r'.
Reference: https://docs.python.org/3/library/csv.html#id3
Quote: "If csvfile is a file object, it should be opened with newline=''."

Parsing a text file with line breaks in python

I have a text file with about 20 entries. They look like this:
~
England
Link: http://imgur.com/foobar.jpg
Capital: London
~
Iceland
Link: http://imgur.com/foobar2.jpg
Capital: Reykjavik
...
etc.
I would like to take these entries and turn them into a CSV.
There is a '~' separating each entry. I'm scratching my head trying to figure out how to go through the file line by line and create the CSV values for each country. Can anyone give me a clue on how to go about this?
Use the libraries luke :)
I'm assuming your data is well formatted. Most real world data isn't that way. So, here goes a solution.
>>> entries = content.split('~')  # content holds the text read from the file
>>> entries
['\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n', '\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n']
For writing the CSV, Python has standard library functions.
>>> import csv
>>> csvfile = open('foo.csv', 'w', newline='')
>>> fieldnames = ['Country', 'Link', 'Capital']
>>> writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
>>> for entry in entries:
...     cols = entry.strip().splitlines()
...     if not cols:  # skip empty chunks, e.g. the text before the first '~'
...         continue
...     writer.writerow({'Country': cols[0], 'Link': cols[1].split(': ')[1], 'Capital': cols[2].split(': ')[1]})
...
>>> csvfile.close()
If your data is more semi structured or badly formatted, consider using a library like PyParsing.
Edit:
Second column contains URLs, so we need to handle the splits well.
>>> cols[1]
'Link: http://imgur.com/foobar2.jpg'
>>> cols[1].split(':')[1]
' http'
>>> cols[1].split(': ')[1]
'http://imgur.com/foobar2.jpg'
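Put together, the whole conversion might look like the sketch below. It assumes every entry has the shape shown in the question (a country line followed by "Key: value" lines); `parse_entries` is a hypothetical helper name, and an in-memory buffer stands in for the output file so the example runs on its own.

```python
import csv
import io

SAMPLE = """~
England
Link: http://imgur.com/foobar.jpg
Capital: London
~
Iceland
Link: http://imgur.com/foobar2.jpg
Capital: Reykjavik
"""

def parse_entries(text):
    """Turn '~'-separated blocks into dicts with Country/Link/Capital keys."""
    records = []
    for block in text.split("~"):
        lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
        if not lines:          # the text before the first '~' is empty
            continue
        record = {"Country": lines[0]}
        for line in lines[1:]:
            # partition splits on the first ': ' only, so URLs stay intact
            key, _, value = line.partition(": ")
            record[key] = value
        records.append(record)
    return records

records = parse_entries(SAMPLE)

out = io.StringIO()  # stands in for open('foo.csv', 'w', newline='')
writer = csv.DictWriter(out, fieldnames=["Country", "Link", "Capital"])
writer.writeheader()
writer.writerows(records)
print(out.getvalue())
```

Using `partition(": ")` rather than `split(":")` avoids the URL problem described in the edit above, because only the first separator is considered.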
I would start by opening the file with the open() function:
f = open('NameOfFile.extensionType', 'r')
Where "r" is read mode, which is all that is needed to parse the file. To add to an existing file without overwriting it you would use "a" (append) instead; a "+" after the letter additionally opens the file for both reading and writing. Note that a file opened in "a+" mode starts positioned at the end, so a read loop would see nothing unless you call f.seek(0) first.
After that I would use a for loop like this:
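data = []
tmp = []
for line in f:
    line = line.strip() #Removes the trailing newline and surrounding whitespace
    if line == '~':
        if tmp: #Skip the empty group before the first separator
            data.append(tmp)
        tmp = []
    else:
        tmp.append(line)
if tmp: #Keep the last entry, which has no '~' after it
    data.append(tmp)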
Now you have all of the data stored in a list, but you could also reformat it as a class object using a slightly different algorithm.
I have never edited CSV files using python, but I believe you can use a loop like this to add the data:
f2 = open('CSVfileName.csv', 'w') #Can change "w" for other needs i.e "a+"
for entry in data:
    for subentry in entry:
        f2.write(str(subentry) + '\n') #Use '\n' to create a new line
From my knowledge of CSV that loop would create a single column of all of the data. At the end remember to close the files in order to save the changes:
f.close()
f2.close()
You could combine the two loops into one in order to save space, but for the sake of explanation I have not.

Create a column from a CSV list in Python 3

What I have is a CSV file where the header is "keyword" and each cell under the header contains text, so that it looks like this:
Keyword
Lions Tigers Bears
Dog Cat
Fish
Shark Guppie
What I am trying to do is to parse each of the phrases in that list into individual words so that the end product looks like this:
Keyword
Lion
Tigers
Bear
Dog
Cat...
Right now, my code takes the CSV file and splits the list into individual parts but still does not create a uniform column.
datafile = open(b'C:\Users\j\Desktop\helloworld.csv', 'r')
data = []
for row in datafile:
    data.append(row.strip().split(","))
    white = row.split()
    print (white)
and my output looks like this:
['Keyword']
['Lion', 'Tigers']
['Dolphin', 'Bears', 'Zebra']
['Dog', 'Cat']
I know that a possible solution would involve the use of lineterminator = '\n' but I am not sure how to incorporate that into my code. Any help would be very much appreciated!
** EDITED -- the source CSV does not have commas separating the words within each phrase
Use extend instead of append on lists to add all items from a list to another one:
datafile = open(b'C:\Users\j\Desktop\helloworld.csv', 'r')
data = []
for row in datafile:
    data.extend(row.strip().split())
print(data)
To get rid of further whitespace around the individual entries, use
datafile = open(b'C:\Users\j\Desktop\helloworld.csv', 'r')
data = []
for row in datafile:
    data.extend(item.strip() for item in row.split())
print(data)
Also, to read files safely, you can make use of a with statement (you won't have to take care of closing your files anymore):
with open(r'C:\Users\j\Desktop\helloworld.csv', 'r') as datafile:  # raw string, so '\U' is not read as an escape
    data = []
    for row in datafile:
        data.extend(item.strip() for item in row.split())
print(data)
EDIT: After OP clarification, I removed the "," argument in split to split on whitespace rather than on commas.
You should be able to use this code to read your file. Replace file name with what you have. My file content is exactly what you posted above.
keyword = "Keyword"
with open("testing.txt") as file:
    data = file.read().replace("\n", " ").split(" ")
for item in data:
    if item == keyword:
        print("%s" % keyword)
    else:
        print(" %s" % item)
Output:
Keyword
Lions
Tigers
Bears
Dog
Cat
Fish
Shark
Guppie
Keyword
Dog
Something
Else
Entirely
You just need to split what you read:
with open("in.txt","r+") as f:
    data = f.read().split()
    f.seek(0) # go back to start of file
    f.write("\n".join(data)) # write new data to file
    f.truncate() # drop any leftover bytes if the new content is shorter
['Keyword', 'Lions', 'Tigers,', 'Bears', 'Dog', 'Cat', 'Fish', 'Shark', 'Guppie']
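To end up with an actual single-column CSV rather than a printed list, the split words can be handed to `csv.writer`, one word per row. This is a sketch under the assumption that the first line is the header and the remaining lines are whitespace-separated phrases; the in-memory `source` string and `out` buffer stand in for the real input and output files.

```python
import csv
import io

# Stand-in for the contents of the source file from the question.
source = "Keyword\nLions Tigers Bears\nDog Cat\nFish\nShark Guppie\n"

lines = source.splitlines()
header, phrases = lines[0], lines[1:]

# Flatten every phrase into individual words.
words = [word for phrase in phrases for word in phrase.split()]

out = io.StringIO()  # stands in for open('out.csv', 'w', newline='')
writer = csv.writer(out, lineterminator="\n")
writer.writerow([header])
writer.writerows([word] for word in words)  # each row is a one-item list
print(out.getvalue())
```

The `lineterminator="\n"` argument mentioned in the question only controls how each row ends; the flattening step is what produces the uniform column.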

parse a csv file into a text file

I am a second year EE student.
I just started learning python for my project.
I intend to parse a csv file with a format like
3520005,"Toronto (Ont.)",C ,F,2503281,2481494,F,F,0.9,1040597,979330,630.1763,3972.4,1
2466023,"Montréal (Que.)",V ,F,1620693,1583590,T,F,2.3,787060,743204,365.1303,4438.7,2
5915022,"Vancouver (B.C.)",CY ,F,578041,545671,F,F,5.9,273804,253212,114.7133,5039.0,8
3519038,"Richmond Hill (Ont.)",T ,F,162704,132030,F,F,23.2,53028,51000,100.8917,1612.7,28
into a text file like the following
Toronto 2503281
Montreal 1620693
Vancouver 578041
I am extracting the 1st and 5th columns and saving them into a text file.
This is what I have so far.
import csv
file = open('raw.csv')
reader = csv.reader(file)
f = open('NicelyDone.text','w')
for line in reader:
    f.write("%s %s"%line[1],%line[5])
This is not working for me. I was able to extract the data from the csv file as line[1] and line[5] (I am able to print it out),
but I don't know how to write it to a .text file in the format I want.
Also, I have to process the first column, e.g. "Toronto (Ont.)" into "Toronto".
I am familiar with the function find(); I assume I could extract Toronto out of Toronto (Ont.) using "(" as the stopping character,
but based on my research I have no idea how to use it to return the string (Toronto).
Here are my questions:
What is the data format for line[1]?
If it is a string, how come f.write() does not work?
If it is not a string, how do I convert it to a string?
How do I extract the word Toronto out of Toronto (Ont.) as a string, using find() or other methods?
My thinking is that I could add the two strings together, like c = a + ' ' + b, which would give me the format I want.
Then I can use f.write() to write it into a file :)
Sorry if my questions sound too easy or stupid.
Thanks in advance
Zhen
All the data you get from csv.reader are strings.
There is a variety of solutions to this, but the simplest would be to split on ( and strip away any whitespace:
>>> a = 'Toronto (Ont.)'
>>> b = a.split('(')
>>> b
Out[16]: ['Toronto ', 'Ont.)']
>>> c = b[0]
>>> c
Out[18]: 'Toronto '
>>> c.strip()
Out[19]: 'Toronto'
or in one line:
>>> print('Toronto (Ont.)'.split('(')[0].strip())
Another option would have been to use regular expression (the re module).
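Since the question asks about find() specifically, here is a rough sketch of both alternatives. The function names `city_with_find` and `city_with_re` are made up for illustration; `str.find` and `re.sub` are the standard library calls.

```python
import re

def city_with_find(name):
    """Cut the string at the first '(' located with str.find."""
    pos = name.find("(")        # find returns -1 when there is no '('
    if pos == -1:
        return name.strip()
    return name[:pos].strip()

def city_with_re(name):
    """Same result with a regular expression."""
    # remove optional whitespace, the '(' and everything after it
    return re.sub(r"\s*\(.*$", "", name).strip()

print(city_with_find("Toronto (Ont.)"))
print(city_with_re("Richmond Hill (Ont.)"))
```

The `-1` check matters: slicing with `name[:-1]` would silently drop the last character of any name that has no parenthesis.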
The specific problem in your code lies here:
f.write("%s %s"%line[1],%line[5])
Using the % syntax to format your string, you have to provide either a single value or a tuple of values after the %. In your case this should be:
f.write("%s %s" % (line[1], line[5]))
Another way to do the exact same thing, is to use the format method.
f.write('{} {}'.format(line[1], line[5]))
This is a flexible way of formatting strings, and I recommend that you read about it in the docs.
Regarding your code, there are a couple of things you should consider.
Always remember to close your file handlers. If you use with open(...) as fp, this is taken care of for you.
with open('myfile.txt') as ifile:
    # Do stuff
# The file is closed here
Don't use built-in names as variable names. file is one of them (in Python 2), and by using it for something else (shadowing it), you may cause problems later on in your code.
To write your data, you can use csv.writer:
with open('myfile.txt', 'w', newline='') as ofile:
    writer = csv.writer(ofile)
    writer.writerow(['my', 'data'])
From Python 2.7 (and 3.1) onwards, you can combine multiple with statements in one statement:
with open('raw.csv') as ifile, open('NicelyDone.text','w') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile)
Combining this knowledge, your script can be rewritten to something like:
import csv
with open('raw.csv', newline='') as ifile, open('NicelyDone.text', 'w', newline='') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile, delimiter=' ')
    for row in reader:
        # row[4] is the number shown in the desired output (the question's line[5] is the next column)
        city, num = row[1].split('(')[0].strip(), row[4]
        writer.writerow([city, num])
I don't recall csv that well, so I don't know if it's a string or not. What error are you getting? In any case, assuming it is a string, your line should be:
f.write("%s %s\n" % (line[1], line[5]))
In other words, you need a set of parentheses around the values. Also, you should end the string with a newline so that each record lands on its own line.
A somewhat hackish but concise way to do this is: line[1].split("(")[0]
This will create a list that splits on the ( symbol, and then you extract the first element.

Fixed Length Text File using csv

I have a csv file that looks like this:
123456,456789,12345,123.45,123456
123456,456789,12345,123.45,123456
123456,456789,12345,123.45,123456
I am extremely new to Python programming but I'm learning and finding Python to be very useful. I basically want the output to look like this:
123456 456789 12345 123.45 123456
123456 456789 12345 123.45 123456
123456 456789 12345 123.45 123456
Basically, all fields should be right-justified with a fixed length. There is no header row in the csv file.
Here's the code I have tried so far and like I said, I'm very new to Python:
import csv
with open('test.csv') as csvfile:
spamreader = csv.reader(csvfile, delimiter=',')
for row in spamreader:
print(', '.join(row))
with open('test2.txt', 'wb') as f:
writer = csv.writer(f)
writer.writerows(f)
Any help would be greatly appreciated: Thank You in advance.
OK you have a mess of problems with your code:
Your indentation is all wrong. That's one of the basic concepts of python. Go search the web and read a little about it if you don't understand what I mean
the part that opens 'test2.txt' is inside the loop of spamreader, meaning it is re-opened and truncated for every row in 'test.csv'.
you are trying to write the file to itself with this line: writer.writerows(f) (remember? f is the file you are writing to...)
You are using a csv.writer to write lines to a txt file.
You want a spacing between each item but you're not doing that anywhere in your code
So to sum up all those problems, here's a fixed example, which is really not that far away from your code as it is:
import csv
res = []
# start a loop to collect the data
with open('test.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        line = '\t'.join(row) + '\r\n' # the \n is for linebreaks. \r is so notepad loves you too
        res.append(line)
# now, outside the loop, we can do this:
with open('test2.txt', 'w', newline='') as f: # text mode; newline='' keeps the explicit '\r\n' as written
    f.writelines(res)
EDIT
If you want to control the spacing you can use ljust function like this:
line = ''.ljust(2).join(row)
This will make sure there are 2 spaces between each item. space is the default, but if you want to specify what ljust will be using you can add a second parameter to it:
line = ''.ljust(5, '-').join(row)
then each line would look like this:
123456-----456789-----12345-----123.45-----123456
And thanks to Philippe T., who mentioned it in the comments.
2nd Edit
If you want different spacing after each column you need to predefine it. The best way would be to create a list of the same length as the number of columns in your csv file, with each item being the spacing that follows that column and the last one being the line ending (which is convenient because ''.join doesn't add that by itself), then zip it with your row. Say you want a tab after the first column, then two spaces between each of the other columns. Then your code would look like this:
spacing = ['\t', '  ', '  ', '  ', '\r\n']
# ... the same code from before ...
line = ''.join([j for i in zip(row, spacing) for j in i])
# ... rest of the code ...
The list comprehension loop is a bit convoluted, but think about it like this:
for i in zip(row, spacing): # the zip here equals ==> [(item1, '\t'), (item2, '  ') ...]
    for j in i: # now i == (item1, '\t')
        j # so j is just the items of each tuple
With the list comprehension, this outputs: [item1, '\t', item2, '  ', ... ]. You join that together and that's it.
Try this:
import csv
with open('data.csv', newline='') as fin, open('out.txt', 'w', newline='') as fout:
    data = csv.reader(fin,delimiter=',')
    resl = csv.writer(fout,delimiter='\t')
    resl.writerows(data)
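The answers above separate the fields with tabs or a fixed number of spaces; they do not pad each field to a fixed width. For right-justified, fixed-length columns as described in the question, `str.rjust` is one option. The sketch below assumes a width of 10 characters per field (`WIDTH` is an arbitrary choice, not something the question specifies) and uses an in-memory buffer in place of test.csv.

```python
import csv
import io

WIDTH = 10  # assumed column width; pick one that fits the longest field

# Stand-in for open('test.csv', newline='')
source = io.StringIO("123456,456789,12345,123.45,123456\n" * 3)

lines = []
for row in csv.reader(source):
    # rjust pads on the left, so every field ends up right-justified
    lines.append("".join(field.rjust(WIDTH) for field in row))

print("\n".join(lines))
```

Each output line is then exactly `WIDTH` times the number of columns long, and writing the lines to a file works the same way as in the earlier examples. A field longer than `WIDTH` is left unpadded rather than truncated, so the width has to be chosen with the data in mind.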
