How to split into a 2D array - python

I'm making a program for shopping lists :)
The data will be like:
Amanda, Bananas, Apples, Oranges
Steve, Mushrooms, Pork, Spaghetti, Sauce
Dave, Onions, Eggs, Bread, Bacon
This is what I have so far
file = open(filename, "r")
readfile = file.read()
shlist = readfile.splitlines()
So I have created a list where each person's shopping is an item in the list.
Is it possible to split these into another list while still being items of a list themselves? I tried to add the following:
for shopping in shlist:
    shopping.split(,)
But I am receiving an error.
Alternately I could just use the index of the commas to deduce the location & length of the items. I am not sure which would be best.

Well, you're getting an error because you meant to type .split(','), but that still won't solve your problem. A call to split() takes a string and generates a list of strings as a result. The result doesn't magically replace the string.
The simplest solution is something like:
with open(filename, "r") as file:
    result = [line.split(',') for line in file]
If you need both the pre-split lines and the post-split lines:
with open(filename, "r") as file:
    lines = file.readlines()
    result = [line.split(',') for line in lines]
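One caveat with the snippets above: iterating over a file keeps each line's trailing newline, and splitting on ',' keeps the space that follows each comma. Here is a minimal sketch of a cleaned-up version. The helper name split_rows is made up for illustration, and io.StringIO stands in for the real file.

```python
import io

def split_rows(lines):
    """Split each comma-separated line into a list of stripped fields."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:          # skip blank lines
            continue
        rows.append([field.strip() for field in line.split(',')])
    return rows

# io.StringIO stands in for open(filename) here
sample = io.StringIO(
    "Amanda, Bananas, Apples, Oranges\n"
    "Steve, Mushrooms, Pork, Spaghetti, Sauce\n"
    "Dave, Onions, Eggs, Bread, Bacon\n"
)
shopping = split_rows(sample)
print(shopping[0])     # first person's row
print(shopping[1][2])  # third field of the second row
```

With a 2D list like this, shopping[i][0] is the person's name and shopping[i][1:] are their items.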

Related

Remove commas and newlines from text file in python

I have text file which looks like this:
ab initio
ab intestato
ab intra
a.C.
acanka, acance, acanek, acankach, acankami, acanką
Achab, Achaba, Achabem, Achabie, Achabowi
I would like to parse every word separated by a comma into a list, so it would look like ['ab initio', 'ab intestato', 'ab intra','a.C.', 'acanka', ...]. Also mind the fact that there are words on new lines that do not end with commas.
When I used
list1.append(line.strip()) it gave me string of every line instead of separate words. Can someone provide me some insight into this?
Full code below:
list1=[]
filepath="words.txt"
with open(filepath, encoding="utf8") as fp:
    line = fp.readline()
    while line:
        list1.append(line.strip(','))
        line = fp.readline()
Very close, but I think you want split instead of strip, and extend instead of append.
You can also iterate directly over the lines with a for loop.
list1=[]
filepath="words.txt"
with open(filepath, encoding="utf8") as fp:
    for line in fp:
        list1.extend(line.strip().split(', '))
You can use your code to get down to a list of lines and then apply:
cleaned = [ x for y in list1 for x in y.split(',')]
This takes every line you stored in your list, splits it at each comma, and collects the pieces in a new flat list.
sberry's all-in-one solution, which uses no intermediate list, is faster.
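If the spacing after the commas is not consistent, or some lines end with a trailing comma, splitting on ', ' alone can leave stray spaces or empty strings. A small sketch that tolerates both, assuming that is acceptable for your data; read_words is a hypothetical helper and io.StringIO stands in for words.txt.

```python
import io

def read_words(lines):
    """Return one flat list of words from comma-separated lines."""
    words = []
    for line in lines:
        # split on commas, trim whitespace, and drop empty pieces
        words.extend(part.strip() for part in line.split(',') if part.strip())
    return words

# io.StringIO stands in for open("words.txt", encoding="utf8")
sample = io.StringIO(
    "ab initio\n"
    "ab intra\n"
    "a.C.\n"
    "acanka, acance,acanek,\n"
)
words = read_words(sample)
print(words)
```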

How to create a list from a text file in Python

I have a text file called "test", and I would like to create a list in Python and print it. I have the following code, but it does not print a list of words; it prints the whole document in one line.
file = open("test", 'r')
lines = file.readlines()
my_list = [line.split(' , ')for line in open ("test")]
print (my_list)
You could do
my_list = open("filename.txt").readlines()
When you do this:
file = open("test", 'r')
lines = file.readlines()
lines is a list of lines. If you want to get a list of words for each line, you can do:
list_word = []
for l in lines:
    list_word.append(l.split(" "))
I believe you are trying to achieve something like this:
data = [word.split(',') for word in open("test", 'r').readlines()]
It would also help if you were to specify what type of text file you are trying to read, as there are several modules (e.g. csv) that would produce the result in a much simpler way.
As pointed out, you may also want to strip the newline (this depends on what line ending you are using), and you'll get something like this:
data = [word.strip('\n').split(',') for word in open("test", 'r').readlines()]
This produces a list of lines with a list of words.
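For reference, the csv route mentioned above might look like this minimal sketch. It assumes the file is comma-separated with a space after each comma; the sample text is invented (the question does not show the file), and io.StringIO stands in for the real file.

```python
import csv
import io

# io.StringIO stands in for open("test", newline="")
sample = io.StringIO("alpha, beta, gamma\none, two\n")

# skipinitialspace=True drops the blank that follows each comma
reader = csv.reader(sample, skipinitialspace=True)
data = [row for row in reader]
print(data)
```

csv.reader also takes care of the line endings, so no explicit strip('\n') is needed.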

Reading inputs written in different lines from a file and storing each input inside a list

I'm trying to read a file named one.txt which contains the following:
hat
cow
Zu6
This is a sentence
and I'm trying to store each string written on each line inside a list. For example, my output list should contain the following elements:
['hat', 'cow', 'Zu6', 'This is a sentence']
Here's my approach for doing this:
def first(ss):
    f = open(ss, 'r')
    text = f.readline()
    f.close()
    lines = []
    li = [lines.append(line) for line in text]
    print li
first('D:\\abc\\1\\one.txt')
However, when I try to print li, here's what I get as the output:
[None, None, None, None]
What's wrong with my approach?
print(list(open("my_text.txt")))
is probably a pretty easy way to do it ...
Of course, people are going to complain about the file never being closed explicitly, so for the sake of good habits:
with open("my_text.txt") as f:
    print(list(f))
Alternatively:
f.readlines()
You might need to strip off some newline characters:
[line.strip() for line in f]
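As for what went wrong in the question: readline() returns only the first line ("hat\n"), iterating over that string yields its four characters, and list.append always returns None, hence the four Nones. A minimal Python 3 sketch of a working version follows; read_lines is a hypothetical name and the temporary file only exists to keep the example self-contained.

```python
import os
import tempfile

def read_lines(path):
    """Return the lines of a text file without their trailing newlines."""
    with open(path) as f:
        return [line.rstrip('\n') for line in f]

# write a throwaway file so the example runs on its own
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("hat\ncow\nZu6\nThis is a sentence\n")

li = read_lines(tmp.name)
os.remove(tmp.name)
print(li)

# reproducing the original problem: append returns None for every character
lines = []
appended = [lines.append(ch) for ch in "hat\n"]
print(appended)
```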

Create a column from a CSV list in Python 3

What I have is a CSV file where the header is "keyword" and each cell under the header contains text, so that it looks like this:
Keyword
Lions Tigers Bears
Dog Cat
Fish
Shark Guppie
What I am trying to do is to parse each of the phrases in that list into individual words so that the end product looks like this:
Keyword
Lion
Tigers
Bear
Dog
Cat...
Right now, my code takes the CSV file and splits the list into individual parts but still does not create a uniform column.
datafile = open(b'C:\Users\j\Desktop\helloworld.csv', 'r')
data = []
for row in datafile:
    data.append(row.strip().split(","))
    white = row.split()
    print (white)
and my output looks like this:
['Keyword']
['Lion', 'Tigers']
['Dolphin', 'Bears', 'Zebra']
['Dog', 'Cat']
I know that a possible solution would involve the use of lineterminator = '\n' but I am not sure how to incorporate that into my code. Any help would be very much appreciated!
** EDITED -- the source CSV does not have commas separating the words within each phrase
Use extend instead of append on lists to add all items from a list to another one:
datafile = open(r'C:\Users\j\Desktop\helloworld.csv', 'r')
data = []
for row in datafile:
    data.extend(row.strip().split())
print(data)
To get rid of further whitespace around the individual entries, use
datafile = open(r'C:\Users\j\Desktop\helloworld.csv', 'r')
data = []
for row in datafile:
    data.extend(item.strip() for item in row.split())
print(data)
Also, to read files safely, you can make use of a with statement (you won't have to take care of closing your files anymore):
with open(r'C:\Users\j\Desktop\helloworld.csv', 'r') as datafile:
    data = []
    for row in datafile:
        data.extend(item.strip() for item in row.split())
print(data)
EDIT: After OP clarification, I removed the "," argument in split to split on whitespace rather than on commas.
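To make the append/extend difference concrete, here is a tiny sketch using the rows from the question as a plain list (no file needed):

```python
rows = ["Lions Tigers Bears", "Dog Cat", "Fish"]

nested = []
flat = []
for row in rows:
    nested.append(row.split())  # adds one list per row
    flat.extend(row.split())    # adds each word individually

print(nested)
print(flat)
```

nested ends up as a list of lists, while flat is the single column of words the question asks for.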
You should be able to use this code to read your file. Replace file name with what you have. My file content is exactly what you posted above.
keyword = "Keyword"
with open("testing.txt") as file:
    data = file.read().replace("\n", " ").split(" ")
    for item in data:
        if item == keyword:
            print("%s" % keyword)
        else:
            print(" %s" % item)
Output:
Keyword
Lions
Tigers
Bears
Dog
Cat
Fish
Shark
Guppie
Keyword
Dog
Something
Else
Entirely
You just need to split the read:
with open("in.txt","r+") as f:
    data = f.read().split()
    f.seek(0) # go back to start of file
    f.write("\n".join(data)) # write new data to file
    f.truncate() # drop leftover bytes if the new content is shorter
['Keyword', 'Lions', 'Tigers,', 'Bears', 'Dog', 'Cat', 'Fish', 'Shark', 'Guppie']

Deleting certain line of text file in python

I have the following text file:
This is my text file
NUM,123
FRUIT
DRINK
FOOD,BACON
CAR
NUM,456
FRUIT
DRINK
FOOD,BURGER
CAR
NUM,789
FRUIT
DRINK
FOOD,SAUSAGE
CAR
NUM,012
FRUIT
DRINK
FOOD,MEATBALL
CAR
And I have the following list called 'wanted':
['123', '789']
What I'm trying to do is this: if the number after NUM is not in the list called 'wanted', then that line, along with the 4 lines below it, gets deleted. So the output file will look like:
This is my text file
NUM,123
FRUIT
DRINK
FOOD,BACON
CAR
NUM,789
FRUIT
DRINK
FOOD,SAUSAGE
CAR
My code so far is:
infile = open("inputfile.txt",'r')
data = infile.readlines()
for beginning_line, ube_line in enumerate(data):
    UNIT = data[beginning_line].split(',')[1]
    if UNIT not in wanted:
        del data_list[beginning_line:beginning_line+4]
You shouldn't modify a list while you are looping over it.
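To see why, here is a small sketch with made-up data: removing an element shifts everything after it one position to the left, so the loop silently steps over the next item.

```python
data = ['a', 'b', 'b', 'c']
for item in data:
    if item == 'b':
        data.remove(item)   # shifts the second 'b' into the slot just visited
print(data)                 # one 'b' is still there

# building a new list instead avoids the problem
original = ['a', 'b', 'b', 'c']
filtered = [item for item in original if item != 'b']
print(filtered)
```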
What you could try is to just advance the iterator on the file object when needed:
wanted = set(['123', '789'])
with open("inputfile.txt",'r') as infile, open("outfile.txt",'w') as outfile:
    for line in infile:
        if line.startswith('NUM,'):
            UNIT = line.strip().split(',')[1]
            if UNIT not in wanted:
                # skip the 4 lines that belong to this block
                for _ in range(4):
                    next(infile, None)
                continue
        outfile.write(line)
And use a set. It is faster for constantly checking the membership.
This approach doesn't make you read in the entire file at once to process it in a list form. It goes line by line, reading from the file, advancing, and writing to the new file. If you want, you can replace the outfile with a list that you are appending to.
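The same idea can be wrapped in a generator, which makes it easy to try without touching real files. This is only a sketch: drop_unwanted and its block_rest parameter are names invented here, and io.StringIO stands in for inputfile.txt.

```python
import io

def drop_unwanted(lines, wanted, block_rest=4):
    """Yield lines, skipping each NUM block whose number is not in wanted."""
    it = iter(lines)
    for line in it:
        if line.startswith('NUM,'):
            unit = line.strip().split(',')[1]
            if unit not in wanted:
                # consume the rest of the block; the None default
                # avoids StopIteration if the input ends early
                for _ in range(block_rest):
                    next(it, None)
                continue
        yield line

# io.StringIO stands in for open("inputfile.txt")
source = io.StringIO(
    "This is my text file\n"
    "NUM,123\nFRUIT\nDRINK\nFOOD,BACON\nCAR\n"
    "NUM,456\nFRUIT\nDRINK\nFOOD,BURGER\nCAR\n"
    "NUM,789\nFRUIT\nDRINK\nFOOD,SAUSAGE\nCAR\n"
)
kept = list(drop_unwanted(source, {'123', '789'}))
print(''.join(kept))
```

To write a file instead of building a list, pass the generator straight to outfile.writelines().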
There are some issues with the code; for instance, data_list isn't even defined (you presumably meant data). Deleting a slice from a list while you iterate over it also shifts the remaining elements, so the loop ends up skipping lines. Then you use both enumerate and direct index access on data; also, readlines is not needed.
I'd suggest avoiding keeping all lines in memory; it's not really needed here. Maybe try something like this (untested):
with open('infile.txt') as fin, open('outfile.txt', 'w') as fout:
    for line in fin:
        # strip() removes the newline so the number can match an entry in wanted
        if line.startswith('NUM,') and line.strip().split(',')[1] not in wanted:
            for _ in range(4):
                next(fin, None)
        else:
            fout.write(line)
import re
# data is the whole file read into one string
# find the lines that match NUM,XYZ
nums = re.compile('NUM,(?:' + '|'.join(['456','012']) + ")")
# match the NUM line plus the three lines after it
breaks = re.compile('(?:.*\n){4}')
keeper = ''
for line in nums.finditer(data):
    keeper += breaks.findall( data[line.start():] )[0]
result on the given string is
NUM,456
FRUIT
DRINK
FOOD,BURGER
NUM,012
FRUIT
DRINK
FOOD,MEATBALL
edit: deleting items while iterating is probably not a good idea, see: Remove items from a list while iterating
infile = open("inputfile.txt",'r')
data = infile.readlines()
SKIP_LINES = 4
skip_until = -1  # index of the last line to skip
result_data = []
for current_line, line in enumerate(data):
    if current_line <= skip_until:
        continue  # still inside a block that is being dropped
    try:
        key, num = line.strip().split(',')
    except ValueError:
        pass  # no single comma on this line: an ordinary line, keep it
    else:
        if key == 'NUM' and num not in wanted:
            skip_until = current_line + SKIP_LINES
            continue
    result_data.append(line)
... and result_data is what you want.
If you don't mind building a list, and if and only if your "NUM" lines come every fifth line, you may want to try:
keep = []
# lines must start at the first NUM line (drop the header line beforehand)
for (i, v) in enumerate(lines[::5]):
    (num, current) = v.strip().split(",")
    if current in wanted:
        keep.extend(lines[i*5:i*5+5])
Don't try to think of this in terms of building up a list and removing stuff from it while you loop over it. That way lies madness.
It is much easier to write the output file directly. Loop over lines of the input file, each time deciding whether to write it to the output or not.
Also, to avoid difficulties with the fact that not every line has a comma, try just using .partition instead to split up the lines. That will always return 3 items: when there is a comma, you get (before the first comma, the comma, after the comma); otherwise, you get (the whole thing, empty string, empty string). So no line raises an error: check that the first item is NUM, and look up the last item in wanted.
skip_counter = 0
for line in infile:
    # strip() removes the newline so the number can match an entry in wanted
    key, _, value = line.strip().partition(',')
    if key == 'NUM' and value not in wanted:
        skip_counter = 5  # this NUM line plus the 4 lines below it
    if skip_counter:
        skip_counter -= 1
    else:
        outfile.write(line)
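A quick sketch of what str.partition returns in each case, using strings shaped like the lines in the question:

```python
for text in ("NUM,123", "FRUIT", "a,b,c"):
    print(text.partition(','))

with_comma = "NUM,123".partition(',')
without_comma = "FRUIT".partition(',')
several = "a,b,c".partition(',')   # only the first comma is used
```

Because the result always has three items, unpacking it never fails, unlike unpacking the result of split(',').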
