Python: Too many values to unpack (dictionary)

I'm trying to add key-value pairs to a dictionary by pairing two and two lines from a text file. Why does this not work?
newdata = {}
os.chdir("//GOLLUM//tbg2//tbg2//forritGB")
f = open(filename)
for line1, line2 in f.readlines():
    newdata[line1] = line2
Edit: The error I get is
ValueError: too many values to unpack

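f.readlines() returns a list of lines, and the for loop tries to unpack each line (a string, which is a sequence of characters) into two variables. That only works if the line consists of exactly 2 characters, so it already fails on the first line. Use the file as an iterator instead: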
import os

newdata = {}
os.chdir("//GOLLUM//tbg2//tbg2//forritGB")
with open(filename) as f:
    for line1 in f:
        newdata[line1.strip()] = next(f, '').strip()
Here next() reads the next line from the file.
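To see what the second argument of next() does when the file has an odd number of lines, here is a minimal sketch. The pair_lines helper and the sample data are made up for illustration, and io.StringIO stands in for the open file:

```python
import io

def pair_lines(lines):
    """Pair consecutive lines into a dict; a missing last value becomes ''."""
    it = iter(lines)
    result = {}
    for key in it:
        # next() advances the same iterator the for loop is using,
        # so each pass through the loop consumes two lines.
        result[key.strip()] = next(it, '').strip()
    return result

# Hypothetical sample: five lines, so the last key has no value line.
sample = io.StringIO("a\n1\nb\n2\nc\n")
pairs = pair_lines(sample)
print(pairs)  # {'a': '1', 'b': '2', 'c': ''}
```

Without the '' default, next() would raise StopIteration on the unpaired last line.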
The alternative would be to use a pair-wise recipe:
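import os
from itertools import zip_longest  # named izip_longest in Python 2

def pairwise(iterable):
    # The same iterator is passed twice, so each tuple holds two consecutive items.
    return zip_longest(*([iter(iterable)] * 2), fillvalue='')

newdata = {}
os.chdir("//GOLLUM//tbg2//tbg2//forritGB")
with open(filename) as f:
    for line1, line2 in pairwise(f):
        newdata[line1.strip()] = line2.strip()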
Note the str.strip() calls, to remove any extra whitespace (including the newline at the end of each line).

newdata = {}
os.chdir("//GOLLUM//tbg2//tbg2//forritGB")
with open(filename) as f:
    for line1, line2 in zip(*[iter(f)]*2):
        newdata[line1] = line2
or
os.chdir("//GOLLUM//tbg2//tbg2//forritGB")
with open(filename) as f:
    newdata = dict(zip(*[iter(f)]*2))
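In case the zip(*[iter(f)]*2) idiom looks opaque, here is a small sketch of what it does, with io.StringIO standing in for the file and made-up sample data. Two things are worth noticing: plain zip() silently drops an unpaired last line, and the lines keep their trailing newline unless you strip them yourself.

```python
import io

# Hypothetical sample: five lines, so the last key has no value line.
sample = io.StringIO("k1\nv1\nk2\nv2\nk3\n")

it = iter(sample)
# [it] * 2 is a list holding the same iterator object twice, so zip()
# pulls two consecutive lines from it for every tuple it builds.
pairs = list(zip(*[it] * 2))

# Strip while building the dict, since each line still ends in '\n'.
newdata = {key.strip(): value.strip() for key, value in pairs}
print(newdata)  # {'k1': 'v1', 'k2': 'v2'}
```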

Related

Compare lines in two files efficiently in Python

I am trying to compare the two lines and capture the lines that match with each other. For example,
file1.txt contains
my
sure
file2.txt contains
my : 2
mine : 5
sure : 1
and I am trying to output
my : 2
sure : 1
I have the following code so far
inFile = "file1.txt"
dicts = "file2.txt"
with open(inFile) as f:
    content = f.readlines()
content = [x.strip() for x in content]
with open(dicts) as fd:
    inDict = fd.readlines()
inDict = [x.strip() for x in inDict]
ordered_dict = {}
for line in inDict:
    key = line.split(":")[0].strip()
    value = int(line.split(":")[1].strip())
    ordered_dict[key] = value
for (key, val) in ordered_dict.items():
    for entry in content:
        if entry == content:
            print(key, val)
        else:
            continue
However, this is very inefficient because it loops two times and iterates a lot. Therefore, this is not ideal when it comes to large files. How can I make this workable for large files?
You don't need nested loops. One loop to read in file2 and translate to a dict, and another loop to read file1 and look up the results.
inFile = "file1.txt"
dicts = "file2.txt"
ordered_dict = {}
with open(dicts) as fd:
    for line in fd:
        # strip() first, so the value does not keep the trailing newline
        a, b = line.strip().split(' : ')
        ordered_dict[a] = b
with open(inFile) as f:
    for line in f:
        line = line.strip()
        if line in ordered_dict:
            print(line, ":", ordered_dict[line])
The first loop can be written as a generator expression passed to dict().
with open(dicts) as fd:
    ordered_dict = dict(line.strip().split(' : ') for line in fd)
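Splitting on the exact string ' : ' raises an error on blank lines or on lines with different spacing around the colon. If your real file is less regular than the sample, something like the sketch below may be more forgiving. The load_counts helper is my own name, and io.StringIO stands in for the two files:

```python
import io

def load_counts(lines):
    """Build {key: value} from 'key : value' lines, skipping lines without a colon."""
    counts = {}
    for line in lines:
        key, sep, value = line.partition(':')
        if not sep:
            continue  # no colon on this line (for example a blank line)
        counts[key.strip()] = value.strip()
    return counts

# Hypothetical stand-ins for file2.txt and file1.txt.
file2 = io.StringIO("my : 2\nmine:5\n\nsure : 1\n")
file1 = io.StringIO("my\nsure\n")

counts = load_counts(file2)
matches = [(w, counts[w]) for w in map(str.strip, file1) if w in counts]
for word, value in matches:
    print(word, ":", value)
```

Each lookup in the dict takes constant time on average, so the whole job is one pass over each file.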
Here is a solution with one for loop:
inFile = "file1.txt"
dicts = "file2.txt"
ordered_dict = {}
with open(inFile) as f:
    # A set gives constant-time membership tests.
    content = set(map(str.strip, f))
with open(dicts) as fd:
    for dline in fd:
        key, val = dline.strip().split(" : ")
        if key in content:
            ordered_dict[key] = val

filtering lines based on the presence of 2 short sequences in python

I have a text file like this example:
example:
>chr9:128683-128744
GGATTTCTTCTTAGTTTGGATCCATTGCTGGTGAGCTAGTGGGATTTTTTGGGGGGTGTTA
>chr16:134222-134283
AGCTGGAAGCAGCGTGAATAAAACAGAATGGCCGGGACCTTAAAGGCTTTGCTTGGCCTGG
>chr16:134226-134287
GGAAGCAGCGTGGGAATCACAGAATGGACGGCCGATTAAAGGCTTTGCTTGGCCTGGATTT
>chr1:134723-134784
AAGTGATTCACCCTGCCTTTCCGACCTTCCCCAGAACAGAACACGTTGATCGTGGGCGATA
>chr16:135770-135831
GCCTGAGCAAAGGGCCTGCCCAGACAAGATTTTTTAATTGTTTAAAAACCGAATAAATGTT
This file is divided into parts, and every part has 2 rows. The 1st row starts with > (this row is called the ID) and the 2nd row is the sequence of letters.
I want to search for 2 short motifs (AATAAA and GGAC) in the sequence of letters, and if a sequence contains these motifs, I want to get the ID and the sequence of that part.
The point is that AATAAA should come first and GGAC after it. There is a distance between them, and this distance can be 2 letters or more.
expected output:
>chr16:134222-134283
AGCTGGAAGCAGCGTGAATAAAACAGAATGGCCGGGACCTTAAAGGCTTTGCTTGGCCTGG
I am trying to do that in python using the following command:
infile = open('infile.txt', 'r')
mot1 = 'AATAAA'
mot2 = 'GGAC'
new = []
for line in range(len(infile)):
    if not infile[line].startswith('>'):
        for match in pattern.finder(mot1) and pattern.finder(mot2):
            new.append(infile[line-1])
with open('outfile.txt', "w") as f:
    for item in new:
        f.write("%s\n" % item)
this code does not return what I want. do you know how to fix it?
You can group the ID with sequence, and then utilize re.findall:
import re
data = [i.strip('\n') for i in open('filename.txt')]
new_data = [[data[i], data[i+1]] for i in range(0, len(data), 2)]
final_result = [[a, b] for a, b in new_data if re.findall(r'AATAAA\w{2,}GGAC', b)]
Output:
[['>chr16:134222-134283', 'AGCTGGAAGCAGCGTGAATAAAACAGAATGGCCGGGACCTTAAAGGCTTTGCTTGGCCTGG']]
I'm not sure I understood the requirement that the distance can be 2 letters or more, or whether it has to be checked, but the following code gives the desired output for the sample. Note that it only tests that both motifs are present, not their order or spacing:
mot1 = 'AATAAA'
mot2 = 'GGAC'
with open('infile.txt', 'r') as inp:
    last_id = None
    for line in inp:
        line = line.strip()  # drop the newline so print() does not add a blank line
        if line.startswith('>'):
            last_id = line
        else:
            if mot1 in line and mot2 in line:
                print(last_id)
                print(line)
You can redirect the output to a file if you want.
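If the order and the minimum distance do need to be checked, plain str.find is enough and no regex is required. Below is a rough sketch. The has_ordered_motifs helper and its min_gap parameter are my own names, and I am assuming "distance" means the number of letters between the end of AATAAA and the start of GGAC:

```python
def has_ordered_motifs(seq, first='AATAAA', second='GGAC', min_gap=2):
    """Return True if `second` starts at least `min_gap` letters after `first` ends."""
    start = seq.find(first)
    if start == -1:
        return False
    # Using the earliest occurrence of `first` leaves the widest window
    # in which `second` may still appear.
    return seq.find(second, start + len(first) + min_gap) != -1

seq_hit = 'AGCTGGAAGCAGCGTGAATAAAACAGAATGGCCGGGACCTTAAAGGCTTTGCTTGGCCTGG'
seq_miss = 'GCCTGAGCAAAGGGCCTGCCCAGACAAGATTTTTTAATTGTTTAAAAACCGAATAAATGTT'

print(has_ordered_motifs(seq_hit))   # True
print(has_ordered_motifs(seq_miss))  # False
```

You would call it in place of the `mot1 in line and mot2 in line` test.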
You can use a regex and a dictionary comprehension:
import re
with open('test.txt', 'r') as f:
    lines = f.readlines()
data = dict(zip(lines[::2],lines[1::2]))
{k.strip(): v.strip() for k,v in data.items() if re.findall(r'AATAAA\w{2,}GGAC', v)}
Returns:
{'>chr16:134222-134283': 'AGCTGGAAGCAGCGTGAATAAAACAGAATGGCCGGGACCTTAAAGGCTTTGCTTGGCCTGG'}
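The dictionary version reads the whole file into memory first. For a large file, a streaming variant along the following lines might be preferable. This is only a sketch: matching_records is a name I made up, io.StringIO stands in for the file, and I restricted the gap to the letters A, C, G, T instead of \w:

```python
import io
import re

# AATAAA, then at least two bases, then GGAC.
MOTIFS = re.compile(r'AATAAA[ACGT]{2,}GGAC')

def matching_records(lines):
    """Yield (id, sequence) pairs whose sequence matches MOTIFS."""
    current_id = None
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            current_id = line
        elif line and MOTIFS.search(line):
            yield current_id, line

# Two records taken from the question's example.
sample = io.StringIO(
    ">chr9:128683-128744\n"
    "GGATTTCTTCTTAGTTTGGATCCATTGCTGGTGAGCTAGTGGGATTTTTTGGGGGGTGTTA\n"
    ">chr16:134222-134283\n"
    "AGCTGGAAGCAGCGTGAATAAAACAGAATGGCCGGGACCTTAAAGGCTTTGCTTGGCCTGG\n"
)
hits = list(matching_records(sample))
for record_id, seq in hits:
    print(record_id)
    print(seq)
```

Because it is a generator, only one line is held in memory at a time.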
You may slice the irrelevant part of the string if mot1 is found in it. Here's a way to do it:
with open('infile.txt', 'r') as infile:
    text = infile.read().splitlines()
mot1 = 'AATAAA'
mot2 = 'GGAC'
# Pair each ID line with the sequence line that follows it.
check = [(text[x], text[x+1]) for x in range(0, len(text) - 1, 2)]
# Look for mot2 only in the part of the sequence at least 2 letters past mot1.
result = [(x + '\n' + y) for (x, y) in check if mot1 in y and mot2 in y[(y.find(mot1)+len(mot1)+2):]]
with open('outfile.txt', "w") as f:
    for item in result:
        f.write("%s\n" % item)
If the file is not too big, you can read it at once, and use re.findall():
import re
with open("infile.txt") as finp:
    data=finp.read()
with open('outfile.txt', "w") as f:
    for item in re.findall(r">.+?[\r\n\f][AGTC]*?AATAAA[AGTC]{2,}GGAC[AGTC]*", data):
        f.write(item+"\n")
"""
+? and *? mean non-greedy matching;
>.+?[\r\n\f] matches a line starting with '>' and followed by any characters to the end of the line;
[AGTC]*?AATAAA matches any number of A,G,T,C characters, followed by the AATAAA pattern;
[AGTC]{2,} matches at least two or more characters of A,G,T,C;
GGAC matches the GGAC pattern;
[AGTC]* matches the empty string or any number of A,G,T,C characters.
"""

Python: How do I calculate the sum of numbers from a file?

How do I calculate the sum of numbers from a .txt file?
Data in file is formatted as:
7
8
14
18
16
8
23
...
I read the data from the file and assign every line's value to the 'line' variable, but I want to get something like: result = 7+8+14+...
f = open('data.txt', 'r') #LOOP AND READ DATA FROM THE FILE
for line in f:
    code
This is most compact code I can think of right now:
(updated to handle the n at the end, thanks, @JonClements!)
with open('file.txt', 'r') as fin:
    ans = sum(int(line) for line in fin if line.strip().isnumeric())
For the code structure you have, you can also go for this:
f = open('data.txt', 'r')
ans = 0
for line in f:
    try:
        ans += int(line.strip())
    except ValueError:
        pass
Edit:
Since the confusion with the 'n' has been cleared, the first example can be as simple as
with open('file.txt', 'r') as fin:
    ans = sum(int(line) for line in fin)
Or even this one-liner:
ans = sum(int(line) for line in open('file.txt', 'r'))
But the file is never explicitly closed this way, so it is not strongly recommended.
Keep it simple:
with open('data.txt', 'r') as f:
    result = sum(map(int, f))
int is mapped over each line from f, then sum() adds up the resulting integers.
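This works because int() ignores surrounding whitespace, including the newline at the end of each line. It does raise ValueError on a completely blank line, though, so if the file may contain those, a small filter helps. A minimal sketch, where sum_numbers is just an illustrative name and io.StringIO stands in for the file:

```python
import io

# int() tolerates leading/trailing whitespace such as the trailing newline.
assert int("14\n") == 14

def sum_numbers(lines):
    """Sum one integer per line, skipping lines that are blank."""
    return sum(int(line) for line in lines if line.strip())

# Hypothetical data with one blank line in the middle.
data = io.StringIO("7\n8\n14\n\n18\n")
total = sum_numbers(data)
print(total)  # 47
```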
file = open("data.txt", "r")
numbers = []
for line in file:
    numbers.append(int(line))
print(sum(numbers))
This basically just creates a list of numbers, where each line is a new entry in the list. Then it shows the sum of the list.
A simple solution, based on steven's and AChamp's suggestions; int() also takes care of the \n at the end of each line:
with open("abc.txt", "r") as f:
    print(sum(int(x) for x in f))
Here is a solution (consider all of the lines are numbers):
def calculate_number_in_file(file_path):
    with open(file_path, 'r') as f:
        return sum([int(number.strip()) for number in f.readlines()])
with open('data.txt', 'r') as f:
    data = f.readlines()
total = 0  # named total so the built-in sum() is not shadowed
for line in data:
    total += int(line.strip())
print(total)
On smartphone...
with open(filepath) as f:
    lines = f.readlines()
numbers = [int(line) for line in lines]
print(sum(numbers))

inverting the order of lines in a list

I'm having some difficulty with writing a program in Python. I would like the program to read lines between a set of characters, reverse the order of the lines and then write them into a new file. The input is:
AN10 G17 G21 G90
N20 '2014_12_08_Banding_Test_4
N30 M3 S1B
N40G00X0.000Y0.000Z17.000
N50 G00X0.001Y0.001Z17.000
N60 G01Z0.000F3900.0
N70 G01X0.251
N80 G01X149.999
N90 G01Y0.251
N100 G01X149.749
N110 G01X149.499Z-8.169
N120 G01X148.249Z-8.173
N130 G01X146.999Z-8.183
N140 G01X145.499Z-8.201
...
N3140 G01Y0.501
So far my code is:
with open('Source.nc') as infile, open('Output.nc', 'w') as outfile:
    copy = False
    strings_A = ("G01Y", ".251")
    strings_B = ("G01Y", ".501")
    content = infile.readlines()
    for lines in content:
        lines.splitlines(1)
        if all(x in lines for x in strings_A):
            copy = True
        elif all(x in lines for x in strings_B):
            copy = False
        elif copy:
            outfile.writelines(reversed(lines))
I think I am failing to understand something about the difference between lines and a multiline string. I would really appreciate some help here!
Thanks in advance, Arthur
A string has multiple lines if it contains newline characters \n.
You can think of a file as either one long string that contains newline characters:
s = infile.read()
Or you can treat it like a list of lines:
lines = infile.readlines()
If you have a multiline string you can split it into a list of lines:
lines = s.splitlines(False)
# which is basically a special form of:
lines = s.split('\n')
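The two calls are not quite identical, which is why I called splitlines a special form. A short illustration with a made-up two-line string:

```python
s = "N10 G17\nN20 M3\n"

without_ends = s.splitlines()   # line endings removed, no trailing empty item
with_ends = s.splitlines(True)  # keepends=True leaves the '\n' on each line
by_split = s.split('\n')        # a final '\n' produces a trailing empty string

print(without_ends)  # ['N10 G17', 'N20 M3']
print(with_ends)     # ['N10 G17\n', 'N20 M3\n']
print(by_split)      # ['N10 G17', 'N20 M3', '']
```

splitlines() also recognises other line endings such as '\r\n', which split('\n') does not.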
If you want to process a file line by line, all of the following methods are equivalent (in effect, if not in efficiency):
with open(filename, 'r') as f:
    s = f.read()
    # splitlines() drops the line endings; the two variants below keep them
    lines = s.splitlines()
    for line in lines:
        # do something
        pass
with open(filename, 'r') as f:
    lines = f.readlines()
    for line in lines:
        # do something
        pass
# this last option is the most pythonic one,
# it uses the fact that iterating over a file object yields its lines one by one
with open(filename, 'r') as f:
    for line in f:
        # do something
        pass
EDIT Now the solution of your problem:
with open('Source.nc') as infile, open('Output.nc', 'w') as outfile:
    copy = False
    strings_A = ("G01Y", ".251")
    strings_B = ("G01Y", ".501")
    target_lines = []
    for line in infile:
        if copy and all(x in line for x in strings_B):
            outfile.writelines(reversed(target_lines))
            break
        if copy:
            target_lines.append(line)
        if all(x in line for x in strings_A):
            copy = True
This will copy all lines between a line that matches all(x in line for x in strings_A) and a line that matches all(x in line for x in strings_B) into the outfile in reversed order. The identifying lines are NOT included in the output (I hope that was the intent).
The order of the if clauses is deliberate to achieve that.
Also be aware that the identification tests you use (all(x in line for x in strings_A)) work as a substring search, not a word match. Again, I don't know if that was your intent.
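To illustrate the substring point: a line can pass the test without containing the exact command you have in mind. The sketch below compares the loose test with a stricter one. The regex is only my guess at what an exact match for the start marker might look like, and the second sample line is invented:

```python
import re

strings_A = ("G01Y", ".251")

def loose_match(line):
    """Substring test, as used in the code above."""
    return all(x in line for x in strings_A)

# Hypothetical stricter test: the line must contain exactly G01Y0.251.
STRICT_A = re.compile(r'\bG01Y0\.251\b')

def strict_match(line):
    return STRICT_A.search(line) is not None

print(loose_match("N90 G01Y0.251"), strict_match("N90 G01Y0.251"))        # True True
print(loose_match("N500 G01Y10.2518"), strict_match("N500 G01Y10.2518"))  # True False
```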
EDIT2 In response to comment:
with open('Source.nc') as infile, open('Output.nc', 'w') as outfile:
    strings_A = ("G01Y", ".251")
    strings_B = ("G01Y", ".501")
    do_reverse = False
    lines_to_reverse = []
    for line in infile:
        if all(x in line for x in strings_B):
            do_reverse = False
            outfile.writelines(reversed(lines_to_reverse))
            lines_to_reverse = []  # do not write the same block again later
            outfile.write(line)
            continue
        if do_reverse:
            lines_to_reverse.append(line)
            continue
        else:
            outfile.write(line)
        if all(x in line for x in strings_A):
            do_reverse = True
            lines_to_reverse = []
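If you want to try the logic without touching real files, the same idea can be sketched as a function over a list of lines. reverse_between is a name I made up, and the sample lines are shortened stand-ins for the G-code. Unlike the file version above, this sketch also emits the collected lines (unreversed) if the end marker never shows up:

```python
def reverse_between(lines, start_marks, end_marks):
    """Copy lines, reversing the block strictly between the two marker lines."""
    out, held, holding = [], [], False
    for line in lines:
        if holding and all(m in line for m in end_marks):
            out.extend(reversed(held))
            out.append(line)
            held, holding = [], False
        elif holding:
            held.append(line)
        else:
            out.append(line)
            if all(m in line for m in start_marks):
                holding = True
    out.extend(held)  # no end marker seen: emit the held lines as they were
    return out

sample = [
    "N80 G01X149.999\n",
    "N90 G01Y0.251\n",
    "N100 G01X149.749\n",
    "N110 G01X149.499Z-8.169\n",
    "N3140 G01Y0.501\n",
    "N3150 M30\n",
]
result = reverse_between(sample, ("G01Y", ".251"), ("G01Y", ".501"))
print("".join(result), end="")
```

The marker lines stay where they are, and only N100 and N110 swap places.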

Change content of multiline file into a list

How can I parse through the following file, and turn each line to an element of a list (there is a whitespace at the beginning of each line) ? Unfortunately I've always sucked at regex :/ So turn this:
32.42.4.120', '32.42.4.127
32.42.5.128', '32.42.5.255
32.42.15.136', '32.42.15.143
32.58.129.0', '32.58.129.7
32.58.131.0', '32.58.131.63
46.7.0.0', '46.7.255.255
into a list :
('32.42.4.120', '32.42.4.127'),
('32.42.5.128', '32.42.5.255'),
('32.42.15.136', '32.42.15.143'),
('32.58.129.0', '32.58.129.7'),
('32.58.131.0', '32.58.131.63'),
How about this? (If I am wrong, at least let me know before down-voting)
>>> x = [tuple(line.strip().split("', '")) for line in open('file')]
>>> x
[('32.42.4.120', '32.42.4.127'), ('32.42.5.128', '32.42.5.255'), ('32.42.15.136', '32.42.15.143'), ('32.58.129.0', '32.58.129.7'), ('32.58.131.0', '32.58.131.63'), ('46.7.0.0', '46.7.255.255')]
No regex needed:
l = []
with open("name_file", "r") as f:
    for line in f:
        # split on the full separator so no stray quotes remain
        l.append(line.split("', '"))
If you want to remove the leading space and get tuples, you can do:
l = []
with open("name_file", "r") as f:
    for line in f:
        data = line.split("', '")
        l.append((data[0].strip(), data[1].strip()))
l = []
with open("test_data.txt") as f:
    for line in f:
        # strip() removes the leading space and the trailing newline, if any
        elems = line.strip().split("', '")
        l.append((elems[0], elems[1]))
print(l)
Output:
[('32.42.4.120', '32.42.4.127'), ('32.42.5.128', '32.42.5.255'), ('32.42.15.136', '32.42.15.143'), ('32.58.129.0', '32.58.129.7'), ('32.58.131.0', '32.58.131.63'), ('46.7.0.0', '46.7.255.255')]
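If you also want to be sure that every field really is an IP address, the standard ipaddress module can check that while parsing. A rough sketch, where parse_ranges is an illustrative name and io.StringIO stands in for the file:

```python
import io
import ipaddress

def parse_ranges(lines):
    """Turn lines like " 1.2.3.4', '1.2.3.9" into (start, end) tuples."""
    ranges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        start, end = line.split("', '")
        # ip_address() raises ValueError if a field is not a valid IP address
        ipaddress.ip_address(start)
        ipaddress.ip_address(end)
        ranges.append((start, end))
    return ranges

# Two lines from the question; the last one has no trailing newline.
sample = io.StringIO(" 32.42.4.120', '32.42.4.127\n 46.7.0.0', '46.7.255.255")
ranges = parse_ranges(sample)
print(ranges)
```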
