Python - file contents to nested list


I have a file in tab delimited format with trailing newline characters, e.g.,
123 abc
456 def
789 ghi
I wish to write a function to convert the contents of the file into a nested list. To date I have tried:
def ls_platform_ann():
    keyword = []
    for line in open( "file", "r" ).readlines():
        for value in line.split():
            keyword.append(value)
and
def nested_list_input():
    nested_list = []
    for line in open("file", "r").readlines():
        for entry in line.strip().split():
            nested_list.append(entry)
        print nested_list
The former creates a nested list but includes \n and \t characters. The latter does not make a nested list but rather lots of equivalent lists without \n and \t characters.
Anyone help?
Regards,
S ;-)

You want the csv module.
import csv
source = "123\tabc\n456\tdef\n789\tghi"
lines = source.split("\n")
reader = csv.reader(lines, delimiter='\t')
print [word for word in [row for row in reader]]
Output:
[['123', 'abc'], ['456', 'def'], ['789', 'ghi']]
In the code above I've put the content of the file right in there for easy testing. If you're reading from a file on disk you can do this as well (which might be considered cleaner):
import csv
reader = csv.reader(open("source.csv"), delimiter='\t')
print [word for word in [row for row in reader]]
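
The snippets above use Python 2's print statement. As a rough Python 3 sketch of the same idea, the helper below (read_tsv_rows is a made-up name, not part of the csv module) turns any tab-delimited file object into a nested list. An in-memory io.StringIO stands in for the real file here, purely so the example is self-contained.

```python
import csv
import io


def read_tsv_rows(fileobj):
    """Return the rows of a tab-delimited file object as a list of lists."""
    return list(csv.reader(fileobj, delimiter="\t"))


# In-memory stand-in for the file (an assumption for this example); with a
# real file you would pass open("source.csv", newline="") from a with-block.
sample = io.StringIO("123\tabc\n456\tdef\n789\tghi\n")
rows = read_tsv_rows(sample)
print(rows)  # [['123', 'abc'], ['456', 'def'], ['789', 'ghi']]
```

The trailing newline at the end of the data does not produce an extra empty row.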

Another option that doesn't involve the csv module is:
data = [[item.strip() for item in line.rstrip('\r\n').split('\t')] for line in open('input.txt')]
As a multiple line statement it would look like this:
data = []
for line in open('input.txt'):
    items = line.rstrip('\r\n').split('\t')   # strip new-line characters and split on column delimiter
    items = [item.strip() for item in items]  # strip extra whitespace off data items
    data.append(items)
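
Why split on '\t' rather than calling split() with no argument? A small sketch, using a made-up line with an empty column and a value containing a space, shows the difference:

```python
# Hypothetical line: second column empty, third column contains a space.
line = "123\t\tabc def\n"

# No-argument split() breaks on any run of whitespace, so the empty
# column disappears and "abc def" is cut in two.
by_whitespace = line.split()

# Splitting on the tab keeps one item per column, including empty ones.
by_tab = line.rstrip("\r\n").split("\t")

print(by_whitespace)  # ['123', 'abc', 'def']
print(by_tab)         # ['123', '', 'abc def']
```

If the columns never contain spaces and are never empty, both approaches give the same result.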

First off, have a look at the csv module; it should handle the whitespace for you. You may also want to call strip() on value/entry.

Related

Remove commas and newlines from text file in python

I have text file which looks like this:
ab initio
ab intestato
ab intra
a.C.
acanka, acance, acanek, acankach, acankami, acanką
Achab, Achaba, Achabem, Achabie, Achabowi
I would like to parse every word separated by a comma into a list, so it would look like ['ab initio', 'ab intestato', 'ab intra','a.C.', 'acanka', ...]. Also mind the fact that there are words on new lines that do not end with commas.
When I used
list1.append(line.strip()) it gave me a string for every line instead of separate words. Can someone provide me some insight into this?
Full code below:
list1=[]
filepath="words.txt"
with open(filepath, encoding="utf8") as fp:
    line = fp.readline()
    while line:
        list1.append(line.strip(','))
        line = fp.readline()

Very close, but I think you want split instead of strip, and extend instead of append.
You can also iterate directly over the lines with a for loop.
list1=[]
filepath="words.txt"
with open(filepath, encoding="utf8") as fp:
    for line in fp:
        list1.extend(line.strip().split(', '))

You can use your code to get down to "list of line"-content and apply:
cleaned = [ x for y in list1 for x in y.split(',')]
This essentially takes everything you parsed into your list and splits it at each comma to create a new list.
sberry's all-in-one solution, which uses no intermediate list, is faster.
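
If the separators are not always exactly a comma followed by one space, a slightly more forgiving variant is to split on ',' alone and strip each piece. This is only a sketch: words_from_lines is a hypothetical helper, and the sample data (with a trailing comma and a blank line added on purpose) is an in-memory stand-in for words.txt.

```python
import io


def words_from_lines(lines):
    """Flatten comma-separated lines into one list of stripped words."""
    words = []
    for line in lines:
        for word in line.split(","):
            word = word.strip()
            if word:  # skip blanks left by trailing commas or empty lines
                words.append(word)
    return words


# Stand-in for the file; with a real file, pass the open file object.
sample = io.StringIO(
    "ab initio\n"
    "a.C.\n"
    "acanka, acance,acanek,\n"
    "\n"
    "Achab, Achaba\n"
)
result = words_from_lines(sample)
print(result)
# ['ab initio', 'a.C.', 'acanka', 'acance', 'acanek', 'Achab', 'Achaba']
```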

Breaking txt file into list of lists by character and by row

I am just learning to code and am trying to take an input txt file and break into a list (by row) where each row's characters are elements of that list. For example if the file is:
abcde
fghij
klmno
I would like to create
[['a','b','c','d','e'], ['f','g','h','i','j'],['k','l','m','n','o']]
I have tried this, but the results aren't what I am looking for.
file = open('alpha.txt', 'r')
lst = []
for line in file:
    lst.append(line.rstrip().split(','))
print(lst)
[['abcde', 'fghij', 'klmno']]
I also tried this, which is closer, but I don't know how to combine the two codes:
file = open('alpha.txt', 'r')
lst = []
for line in file:
    for c in line:
        lst.append(c)
print(lst)
['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o']
I tried to add the rstrip into the lst.append but it didn't work (or I didn't do it properly). Sorry - complete newbie here!
I should mention that I don't want newline characters included. Any help is much appreciated!

This is very simple. You have to use the list() constructor to make a string into its respective characters.
with open('alpha.txt', 'r') as file:
    # rstrip('\n') drops only the newline, so a last line without one keeps its final character
    print([list(line.rstrip('\n')) for line in file])
(The with open construct is just an idiom, so you don't have to do all the handling with the file like closing it, which you forgot to do)

If you want to split a string into its characters you can just use list(s) (where s = 'asdf'):
file = open('alpha.txt', 'r')
lst = []
for line in file:
    lst.append(list(line.strip()))
print(lst)

You are appending each entry to your original list. You want to create a new list for each line in your input, append to that list, and then append that list to your master list. For example,
file = open('alpha.txt', 'r')
lst = []
for line in file:
    newLst = []
    for c in line.rstrip('\n'):  # skip the trailing newline character
        newLst.append(c)
    lst.append(newLst)
print(lst)

Use a nested list comprehension. The outer loop iterates over the lines in the file and the inner loop over the characters in the string of each line.
with open('alpha.txt') as f:
    out = [[char for char in line.strip()] for line in f]
req = [['a','b','c','d','e'], ['f','g','h','i','j'],['k','l','m','n','o']]
print(out == req)
prints
True
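
One detail the approaches above handle differently is the line ending: strip() also removes leading and trailing spaces that might be real data, and the last line of a file may not end in a newline at all. A minimal sketch that removes only the newline, using an in-memory stand-in for alpha.txt whose last line deliberately has no newline:

```python
import io

# Stand-in for alpha.txt (an assumption); note the last line has no newline.
sample = io.StringIO("abcde\nfghij\nklmno")

# rstrip("\n") removes a trailing newline if present and nothing else.
rows = [list(line.rstrip("\n")) for line in sample]
print(rows)
```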

How to delete the last letter in the word if it is 'a' or 'o'?

I've got txt file with list of words, something like this:
adsorbowanie
adsorpcje
adular
adwena
adwent
adwentnio
adwentysta
adwentystka
adwersarz
adwokacjo
And I want to delete the last letter in every word, if that letter is "a" or "o".
I'm very new to this, so please explain this simply.

re.sub(r"[ao]$","",word)
This should do it for you.
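
To see what the pattern does, here is a short sketch that applies it to a few of the words from the question (the list is typed in by hand rather than read from the file):

```python
import re

words = ["adsorbowanie", "adwena", "adwent", "adwentnio", "adwokacjo"]

# [ao] matches a single 'a' or 'o'; $ anchors it to the end of the string.
trimmed = [re.sub(r"[ao]$", "", word) for word in words]
print(trimmed)
# ['adsorbowanie', 'adwen', 'adwent', 'adwentni', 'adwokacj']
```

Words ending in any other letter are returned unchanged.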

Try this:
import re
# Read the file, split the contents into a list of lines,
# removing line separators
with open('input.txt') as infile:
    lines = infile.read().splitlines()
# Remove any whitespace around the word.
# If you are certain the list doesn't contain whitespace
# around the word, you can leave this out...
# (this is called a "list comprehension", by the way)
lines = [line.strip() for line in lines]
# Remove letters if necessary, using regular expressions.
outlines = [re.sub('[ao]$', '', line) for line in lines]
# Join the output with '\n'; a file opened in text mode
# translates it to the platform's line separator on write
outdata = '\n'.join(outlines)
# Write the output to a file
with open('output.txt', 'w') as outfile:
    outfile.write(outdata)

First read the file and split the lines. After that cut off the last char if your condition is fulfilled and append the new string to a list containing the analysed and modified strings/lines:
#!/usr/bin/env python3
# coding: utf-8
# open file, read lines and store them in a list
with open('words.txt') as f:
    lines = f.read().splitlines()
# analyse lines read from file
new_lines = []
for s in lines:
    # analyse last char of string,
    # get rid of it if condition is fulfilled and append string to new list
    # (endswith is also safe for empty lines, where s[-1] would raise IndexError)
    s = s[:-1] if s.endswith(('a', 'o')) else s
    new_lines.append(s)
print(new_lines)

Strip outside quotes from text when writing to list in Python

I have a text file that looks like this when opened as file:
'element1', 'element2', 'element3', 'element4'
I then use a list comprehension to read this into a list
thelist = [line.strip() for line in open('file.txt', 'r')]
But the list looks like this with only one element at index 0
["'element1', 'element2', 'element3', 'element4'"]
Since the double quotes are appended to the ends of the first and last elements, python thinks it is a single element list. Is there a way to use the "strip()" inside the list comprehension to remove those outside double quotes?

You don't have a list, you have a string that looks like a Python sequence.
You could use ast.literal_eval() to interpret that as a Python literal; it'll be parsed as a tuple of strings:
import ast
thelist = [ast.literal_eval(line) for line in open('file.txt', 'r')]
or you could just split on the comma, then strip the spaces and quotes:
thelist = [[elem.strip(" '") for elem in line.split(',')]
           for line in open('file.txt', 'r')]
This only works if your quoted values don't themselves contain commas.
Either way you get a list of lists; you could flatten that list:
thelist = [elem for line in open('file.txt', 'r')
           for elem in ast.literal_eval(line)]
or just read the one line:
thelist = ast.literal_eval(next(open('file.txt', 'r')))
You could also use the csv module:
import csv
reader = csv.reader(
    open('file.txt', newline=''),  # on Python 2, open with 'rb' instead
    quotechar="'", skipinitialspace=True)
thelist = list(reader)
or for the first row only:
thelist = next(reader)
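
As a quick check of what the two main approaches return, the sketch below runs them on the line from the question, held in a string instead of being read from file.txt:

```python
import ast
import csv

line = "'element1', 'element2', 'element3', 'element4'\n"

# literal_eval parses the text as a Python literal: here a tuple of str.
as_tuple = ast.literal_eval(line.strip())

# csv.reader accepts any iterable of strings, so a one-item list works.
as_row = next(csv.reader([line], quotechar="'", skipinitialspace=True))

print(as_tuple)  # ('element1', 'element2', 'element3', 'element4')
print(as_row)    # ['element1', 'element2', 'element3', 'element4']
```

Note that literal_eval gives a tuple while csv gives a list; wrap the former in list() if you need to modify it.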

How to extract text, line by line from a txt file in python

I have a txt file like this :
audi lamborghini
ferrari
pagani
when I use this code :
with open("test.txt") as inp:
    data = set(inp.read().split())
this gives data as : ['pagani', 'lamborghini', 'ferrari', 'audi']
What I want, is to extract text from the txt file, line by line such the output data is
['audi lamborghini','ferrari','pagani']
How can this be done?

data = inp.read().splitlines()
You could do
data = inp.readlines()
or
data = list(inp)
but the latter two will leave newline characters on each line, which tends to be undesirable.
Note that since you care about order, putting your strings into any sort of set is not advisable - that destroys order.
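
A small sketch of the three options side by side, using io.StringIO as a stand-in for the opened file so the newline handling is visible:

```python
import io

text = "audi lamborghini\nferrari\npagani\n"

# Each call gets a fresh in-memory "file" holding the same text.
stripped = io.StringIO(text).read().splitlines()
raw = io.StringIO(text).readlines()
also_raw = list(io.StringIO(text))

print(stripped)  # ['audi lamborghini', 'ferrari', 'pagani']
print(raw)       # ['audi lamborghini\n', 'ferrari\n', 'pagani\n']
print(raw == also_raw)  # True
```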

Because file objects are iterable, you can just do:
with open("test.txt") as inp:
    data = list(inp) # or set(inp) if you really need a set
Alternatively, more verbose (with list comprehension you can remove trailing newlines here also):
with open("test.txt") as inp:
    data = inp.readlines()
or (not very Pythonic, but gives you even more control):
data = []
with open("test.txt") as inp:
    for line in inp:
        data.append(line)

You can try the readlines command which would return a list.
with open("test.txt") as inp:
    data = set(inp.readlines())
In the case of
data = set(inp.read().split())
you are first reading the whole file as one string (inp.read()) and then calling split() on it, which splits the string on whitespace.
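
To make the difference concrete, here is a sketch on a string version of the file. The repeated "ferrari" line is an addition for illustration, not part of the question's data. If duplicates really must be dropped, dict.fromkeys is one way to do it while keeping the original order (guaranteed from Python 3.7 on), which a set does not.

```python
# Sample text; the duplicate "ferrari" line is an assumption for the demo.
text = "audi lamborghini\nferrari\npagani\nferrari\n"

tokens = text.split()       # splits on spaces as well as newlines
lines = text.splitlines()   # splits on line boundaries only

# Drop duplicates but keep first-seen order, unlike set().
unique_lines = list(dict.fromkeys(lines))

print(tokens)        # ['audi', 'lamborghini', 'ferrari', 'pagani', 'ferrari']
print(lines)         # ['audi lamborghini', 'ferrari', 'pagani', 'ferrari']
print(unique_lines)  # ['audi lamborghini', 'ferrari', 'pagani']
```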
