Trouble with turning .txt file into lists


In a project I am doing I am storing the lists in a .txt file in order to save space in my actual code. I can turn each line into a separate list but I need to turn multiple lines into one list where each letter is a different element. I would appreciate any help I can get. The program I wrote is as follows:
lst = open('listvariables.txt', 'r')
data = lst.readlines()
for line in data:
    words = line.split()
    print(words)
Here is a part of the .txt file I am using:
; 1
wwwwwwwwwww
wwwwswwwwww
wwwsswwwssw
wwskssssssw
wssspkswssw
wwwskwwwssw
wwwsswggssw
wwwswwgwsww
wwsssssswww
wwssssswwww
wwwwwwwwwww
; 2
wwwwwwwww
wwwwwsgsw
wwwwskgsw
wwwsksssw
wwskpswww
wsksswwww
wggswwwww
wssswwwww
wwwwwwwww
If someone could make the program print out two lists that would be great.

You can load the whole file and turn it into a char-list like:
with open('listvariables.txt', 'r') as f:
    your_list = list(f.read())
Note that this list also contains the newline characters and the characters of the '; 1' header lines. I'm not sure why you want to do it, though. You can iterate over a string the same way you can iterate over a list. The only advantage is that a list is a mutable object, but you wouldn't want to do complex changes to it anyway.

If you want each character in a string to be an element of the final list you should use
myList = list(myString)

If I understand you correctly this should work:
with open('listvariables.txt', 'r') as my_file:
    my_list = [list(line) for line in my_file.read().splitlines()]
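If the goal is one list per level (the two lists asked for in the question), a minimal sketch could start a new list at every line that begins with ';' and add the characters of the following lines to it. The function name parse_levels and the shortened sample text are made up for illustration; with the real file you would pass in the result of f.read().

```python
def parse_levels(text):
    """Return one flat list of characters per ';'-headed block."""
    levels = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(';'):
            levels.append([])  # a header such as '; 1' starts a new list
        elif levels:
            levels[-1].extend(line)  # each character becomes its own element
    return levels


# Shortened stand-in for the contents of listvariables.txt
sample = "; 1\nwws\nwpk\n; 2\ngg\nww\n"
for level in parse_levels(sample):
    print(level)
```

Each printed list holds every character of one level, in reading order, without the header or the newlines.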

Related

Sort file by key

I am learning Python 3 and I'm having issues completing this task. I am given a file with one string per line. I have to sort its content by the string located between the first hyphen and the second hyphen and write the sorted content into a different file. This is what I tried so far, but nothing gets sorted:
def sort_keys(path, input, output):
    list = []
    with open(path+'\\'+input, 'r') as f:
        for line in f:
            if line.count('-') >= 1:
                list.append(line)
    sorted(list, key = lambda s: s.split("-")[1])
    with open(path + "\\"+ output, 'w') as o:
        for line in list:
            o.write(line)
sort_keys("C:\\Users\\Daniel\\Desktop", "sample.txt", "results.txt")
This is the input file: https://pastebin.com/j8r8fZP6
Question 1: What am I doing wrong with the sorting? I've used it to sort the words of a sentence by their last letter and it worked fine, but here I don't know what I am doing wrong.
Question 2: I feel that reading the content of the input file into a list, sorting the list and then writing that content out is not very efficient. What is the "pythonic" way of doing it?
Question 3: Do you know any good exercises to learn working with files + folders in Python 3?
Kind regards

Your sort key is fine. The problem is that sorted() returns a new list rather than altering the one provided, and your code discards that return value. It's also much easier to use a list comprehension to read the file:
def sort_keys(path, infile, outfile):
    with open(path+'\\'+infile, 'r') as f:
        inputlines = [line.strip() for line in f.readlines() if "-" in line]
    outputlines = sorted(inputlines, key=lambda s: s.split("-")[1])
    with open(path + "\\" + outfile, 'w') as o:
        for line in outputlines:
            o.write(line + "\n")
sort_keys("C:\\Users\\Daniel\\Desktop", "sample.txt", "results.txt")
I also changed a few variable names, for legibility's sake.
EDIT: I understand that the list could also be sorted in place (inputlines.sort(key=...)), however this way seems more readable to me.
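To make the difference concrete, here is a small sketch comparing sorted() with list.sort(). The sample strings are invented for illustration and are not taken from the pastebin file.

```python
lines = ["a-zeta-1", "b-alpha-2", "c-mid-3"]  # made-up sample data


def middle_field(s):
    # The text between the first and the second hyphen
    return s.split("-")[1]


# sorted() builds and returns a new list; 'lines' keeps its original order
result = sorted(lines, key=middle_field)

# list.sort() reorders the list it is called on and returns None
in_place = list(lines)
returned = in_place.sort(key=middle_field)

print(lines)
print(result)
print(in_place, returned)
```

Calling sorted(...) without assigning its result, as in the question, therefore has no visible effect.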

First, your data has a couple lines without hyphens. Is that a typo? Or do you need to deal with those lines? If it is NOT a typo and those lines are supposed to be part of the data, how should they be handled?
I'm going to assume those lines are typos and ignore them for now.
Second, do you need to return the whole line, with the lines sorted by the 2nd group of characters between the hyphens? If that's the case...
first, read in the file:
f = open('./text.txt', 'r')
There are a couple ways to go from here, but let's clean up the file contents a little and make a list object:
l = [i.replace("\n","") for i in f]
This will create a list l with all the newline characters removed. This particular way of creating the list is called a list comprehension. You can do the exact same thing with the following code:
l = []
for i in f:
    l.append(i.replace("\n",""))
Now let's create a dictionary with the key as the 2nd group and the value as the whole line. Keep in mind that lines sharing the same key overwrite each other in a dictionary, so only the last one survives. Again, there are some lines with no hyphens, so we are going to just skip those for now with a simple try/except block:
d = {}
for i in l:
    try:
        d[i.split("-")[1]] = i
    except IndexError:
        pass
Now, here things can get slightly tricky. It depends on how you want to approach the problem. Dictionaries in Python are not sorted by key (since Python 3.7 they keep insertion order), so there is not a really good way to simply sort the dictionary. ONE way (not necessarily the BEST way) is to create a sorted list of the dictionary keys:
s = sorted([k for k, v in d.items()])
Again, I used a list comprehension here, but you can rewrite that line to do the exact same thing here:
s = []
for k, v in d.items():
    s.append(k)
s = sorted(s)
Now, we can write the dictionary back to a file by iterating through the dictionary using the sorted list. To see what I mean, let's print out the dictionary one value at a time using the sorted list as the keys:
for i in s:
    print(d[i])
But instead of printing, we will now append the line to a file:
o = open('./out.txt', 'a')
for i in s:
    o.write(d[i] + "\n")
o.close()
Because the newline characters were stripped earlier, the + "\n" part is needed to put each line back on its own line. Note that 'a' appends to whatever out.txt already contains, so running the script twice duplicates the output. Since the file is opened once, before the loop, 'w' works just as well and starts from an empty file; only reopening the file with 'w' inside the loop would leave just the last item.
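One thing worth checking before choosing the dictionary route is whether keys can repeat. The sketch below uses invented sample lines to show that a dictionary keeps only one line per key, while sorted() with a key function keeps them all.

```python
lines = ["x-b-1", "y-a-2", "z-b-3", "no hyphen here"]  # made-up sample data


def middle_field(line):
    # The text between the first and the second hyphen
    return line.split("-")[1]


# Skip lines without a hyphen, as in the question
keyed = [ln for ln in lines if ln.count("-") >= 1]

# Stable sort: both lines with key 'b' survive, in their original order
by_sorted = sorted(keyed, key=middle_field)

# Dictionary route: a later line replaces an earlier one with the same key
d = {middle_field(ln): ln for ln in keyed}
by_dict = [d[k] for k in sorted(d)]

print(by_sorted)
print(by_dict)
```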

python store data from file to variables

I’m new to python and I’m taking my first steps to create some scripts.
I want to access a file and store the items into a list and access each item as its own variable.
The file is a .txt file formatted as shown below. It is not limited to 3 columns; it could have 4 or more.
textData,1
textData,2
textData,3,moreData
textData,4,moreData4
textData,5
I know how to read and append to a list and access individual items starting with [0] but when I do that I get textData, 1 for [0] and I only want textData on its own and 1 on its own and so on as I loop through the file.
Below is my start of this:
file = open('fileName','r')
list = []
for items in file:
    list.append(items)
print(list[0])
Thank you for taking the time to read and provide direction.

You need to split the lines:
my_list = []
for lines in file:
    my_list.append(lines.split(','))
print(my_list)

When you loop over a file object, you get each line as a str. You can then get each word by using the .split method, which returns a list of strs. As your items are comma-separated, we split on ','. We do not want to append this list to the overall list, but rather add all elements of the list to the overall list, hence the +=:
file = open('fileName', 'r')
mylist = []
for line in file:
    words = line.strip().split(',')  # strip() drops the trailing newline first
    mylist += words
print(mylist[0])
Also, avoid using list as a variable name, as this is the name of the built-in list type.
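If you would rather keep each row together, the csv module handles the splitting and the trailing newline for you. This is only a sketch: the StringIO object stands in for the opened file, and the names name, number and extra are made up for illustration.

```python
import csv
import io

# Stand-in for open('fileName'); the rows mirror the format in the question
sample = io.StringIO("textData,1\ntextData,3,moreData\ntextData,4,moreData4\n")

rows = [row for row in csv.reader(sample)]

# Star-unpacking copes with rows that have 2, 3 or more columns
for name, number, *extra in rows:
    print(name, number, extra)
```

Here rows is a list of lists, so rows[0][0] is 'textData' and rows[0][1] is '1'.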

How to create a list of string in nth position of every line in Python

What would be a pythonic way to create a list of (to illustrate with an example) the fifth string of every line of a text file, assuming it resembles something like this:
12, 27.i, 3, 6.7, Hello, 438
In this case, the script would add "Hello" (without quotes) to the list.
In other words (to generalize), with an input "input.txt", how could I get a list in python that takes the nth string (n being a defined number) of every line?
Many thanks in advance!

You could use the csv module to read the file, and store all items in the fifth column in a list:
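import csv
with open(my_file) as f:
    # skipinitialspace drops the blank that follows each comma in the sample line
    lst = [row[4] for row in csv.reader(f, skipinitialspace=True)]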

If it's a text file it can be as simple as:
with open(my_file, 'r') as f:
    mylist = [line.split(',')[4].strip() for line in f]  # adds the 5th element of each split line to mylist, without surrounding spaces

Since you mentioned that you are using a .txt file, you can try this:
f = open('filename.txt').readlines()
f = [i.strip('\n').split(",") for i in f]
new_f = [i[4] for i in f]

This may not be the most efficient solution, but you could also just hard code it e.g. create a variable equivalent to zero, add one to the variable for each word in the line, and append the word to a list when variable = 5. Then reset the variable equal to zero.
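To generalize to the nth string, a small helper along these lines might do. The name nth_column is made up, and skipping lines with fewer than n fields is an assumption about what you want for short lines.

```python
import csv
import io


def nth_column(lines, n):
    """Return the n-th (1-based) field of every line that has at least n fields."""
    reader = csv.reader(lines, skipinitialspace=True)
    return [row[n - 1] for row in reader if len(row) >= n]


# Stand-in for open('input.txt'); the second line is too short and is skipped
sample = io.StringIO("12, 27.i, 3, 6.7, Hello, 438\n1, 2\n")
fifth = nth_column(sample, 5)
print(fifth)
```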

Referring to a list of names using Python

I am new to Python, so please bear with me.
I can't get this little script to work properly:
genome = open('refT.txt','r')
datafile - a reference genome with a bunch (2 million) of contigs:
Contig_01
TGCAGGTAAAAAACTGTCACCTGCTGGT
Contig_02
TGCAGGTCTTCCCACTTTATGATCCCTTA
Contig_03
TGCAGTGTGTCACTGGCCAAGCCCAGCGC
Contig_04
TGCAGTGAGCAGACCCCAAAGGGAACCAT
Contig_05
TGCAGTAAGGGTAAGATTTGCTTGACCTA
The file is opened:
cont_list = open('dataT.txt','r')
a list of contigs that I want to extract from the dataset listed above:
Contig_01
Contig_02
Contig_03
Contig_05
My hopeless script:
for line in cont_list:
    if genome.readline() not in line:
        continue
    else:
        a=genome.readline()
        s=line+a
        data_out = open ('output.txt','a')
        data_out.write("%s" % s)
        data_out.close()
input('Press ENTER to exit')
The script successfully writes the first three contigs to the output file, but for some reason it doesn't seem able to skip "contig_04", which is not in the list, and move on to "Contig_05".
I might seem a lazy bastard for posting this, but I've spent all afternoon on this tiny bit of code -_-

I would first try to generate an iterable which gives you a tuple: (contig, genome):
def pair(file_obj):
    for line in file_obj:
        yield line, next(file_obj)
Now, I would use that to get the desired elements:
wanted = {'Contig_01', 'Contig_02', 'Contig_03', 'Contig_05'}
with open('filename') as fin:
    pairs = pair(fin)
    while wanted:
        p = next(pairs)
        name = p[0].strip()  # drop the trailing newline before comparing
        if name in wanted:
            # write to output file, store in a list, or dict, ...
            wanted.discard(name)  # sets have no forget(); discard() removes the item
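Here is a self-contained sketch of the same pairing idea that can be run without the data files. The StringIO object stands in for refT.txt with shortened sequences, and the names pairs and extract are made up for illustration.

```python
import io


def pairs(lines):
    """Yield (name, sequence) tuples from alternating name/sequence lines."""
    it = iter(lines)
    for name in it:
        seq = next(it, "")  # tolerate a trailing name with no sequence line
        yield name.strip(), seq.strip()


def extract(genome_lines, wanted):
    """Collect the sequences of the wanted contigs, stopping once all are found."""
    remaining = set(wanted)
    found = {}
    for name, seq in pairs(genome_lines):
        if name in remaining:
            found[name] = seq
            remaining.discard(name)
            if not remaining:
                break
    return found


# Stand-in for open('refT.txt') with shortened sequences
genome = io.StringIO("Contig_01\nTGCA\nContig_02\nGGTT\nContig_03\nAACC\n")
result = extract(genome, {"Contig_01", "Contig_03"})
print(result)
```

Looping with for instead of while/next() also avoids an exception when a wanted contig is missing from the genome file.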

I would recommend several things:
Try using with open(filename, 'r') as f instead of f = open(...)/f.close(). with will handle the closing for you. It also encourages you to handle all of your file IO in one place.
Try to read in all the contigs you want into a list or other structure. It is a pain to have many files open at once. Read all the lines at once and store them.
Here's some example code that might do what you're looking for
from itertools import zip_longest  # named izip_longest in Python 2
# Read in contigs from file and store in list
contigs = []
with open('dataT.txt', 'r') as contigfile:
    for line in contigfile:
        contigs.append(line.rstrip())  # rstrip() removes '\n' from EOL
# Read through genome file, open up an output file
with open('refT.txt', 'r') as genomefile, open('out.txt', 'w') as outfile:
    # Nifty way to step through fasta files 2 lines at a time
    for name, seq in zip_longest(*[genomefile]*2, fillvalue=''):
        # compare the contig name to your list of contigs
        if name.rstrip() in contigs:
            outfile.write(name)  # optional. remove if you only want the seq
            outfile.write(seq)

Here's a pretty compact approach to get the sequences you'd like.
def get_sequences(data_file, valid_contigs):
    sequences = []
    with open(data_file) as cont_list:
        for line in cont_list:
            if line.startswith(valid_contigs):
                sequence = next(cont_list).strip()  # Python 3; Python 2 used cont_list.next()
                sequences.append(sequence)
    return sequences
if __name__ == '__main__':
    valid_contigs = ('Contig_01', 'Contig_02', 'Contig_03', 'Contig_05')
    # refT.txt is the genome file from the question (names followed by sequences)
    sequences = get_sequences('refT.txt', valid_contigs)
    print(sequences)
This utilizes the ability of startswith() to accept a tuple as a parameter and check for any matches. If the line matches what you want (a desired contig), it will grab the next line and append it to sequences after stripping out the unwanted whitespace characters. Note that this is a prefix match, so 'Contig_01' would also match a name such as 'Contig_010'.
From there, writing the sequences grabbed to an output file is pretty straightforward.
Example output:
['TGCAGGTAAAAAACTGTCACCTGCTGGT',
'TGCAGGTCTTCCCACTTTATGATCCCTTA',
'TGCAGTGTGTCACTGGCCAAGCCCAGCGC',
'TGCAGTAAGGGTAAGATTTGCTTGACCTA']
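With 2 million contigs the prefix behaviour of startswith() can matter. A quick sketch, using invented contig names, of how it differs from an exact lookup in a set:

```python
names = ["Contig_01", "Contig_010", "Contig_02"]  # made-up names
prefixes = ("Contig_01", "Contig_02")

# startswith() with a tuple is true if ANY of the prefixes matches
prefix_hits = [n for n in names if n.startswith(prefixes)]

# An exact comparison against a set only accepts the full name
wanted = set(prefixes)
exact_hits = [n for n in names if n in wanted]

print(prefix_hits)
print(exact_hits)
```

When reading from a file, strip the trailing newline from each name before the exact comparison.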

Converting .txt file to list AND be able to index and print list line by line

I want to be able to read the file line by line and then when prompted (say user inputs 'background'), it returns lines 0:24 because those are the lines in the .txt that relate to his/her background.
def anaximander_background():
    f = open('Anaximander.txt', 'r')
    fList = []
    fList = f.readlines()
    fList = [item.strip('\n') for item in fList]
    print(fList[:20])
This code prints me the list like:
['ANAXIMANDER', '', 'Anaximander was born in Miletus in 611 or 610 BCE.', ...]
I've tried a lot of different ways (for, if, and while loops) and tried the csv import.
The closest I've gotten was being able to have a print out akin to:
[ANAXIMANDER]
[]
[info]
and so on, depending on how many objects I retrieve from fList.
I really want it to print like the example I just showed but without the list brackets ([ ]).
Definitely can clarify if necessary.

Either loop over the list, or use str.join():
for line in fList[:20]:
    print(line)
or
print('\n'.join(fList[:20]))
The first prints each element contained in the fList slice separately, the second joins the lines into a new string with \n newline characters between them before printing.

To print the first 20 lines from a file:
import sys
from itertools import islice
with open('Anaximander.txt') as file:
    sys.stdout.writelines(islice(file, 20))
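For the prompting part of the question, one possible sketch maps each keyword to a slice of the line list. The SECTIONS table and the function name section_text are hypothetical, and the sample lines stand in for the contents of Anaximander.txt; for the real file the entry would be slice(0, 24) as described in the question.

```python
# Hypothetical table: keyword typed by the user -> slice of lines to show
SECTIONS = {"background": slice(0, 3)}


def section_text(lines, topic):
    """Join the lines belonging to a topic into one printable string."""
    return "\n".join(lines[SECTIONS[topic]])


# Stand-in for the stripped lines read from Anaximander.txt
lines = [
    "ANAXIMANDER",
    "",
    "Anaximander was born in Miletus in 611 or 610 BCE.",
    "A line from a later section",
]
text = section_text(lines, "background")
print(text)
```

The printed text has one line per row and no list brackets.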
