Simple way to "mix" respectively two lists in python? - python

I have the following issue:
I need to "mix" respectively two lists in python...
I have this:
names = open('contactos.txt')
numbers = open('numeros.txt')
names1 = []
numbers1= []
for line in numbers:
numberdata = line.strip()
numbers1.append(numberdata)
print numbers1
for line in names:
data = line.strip()
names1.append(data)
print names1
names.close()
numbers.close()
This prints abot 300 numbers first, and the respective 300 names later, what I need to do is to make a new file (txt) that prints the names and the numbers in one line, separated by a comma (,), like this:
Name1,64673635
Name2,63513635
Name3,67867635
Name4,12312635
Name5,78679635
Name6,63457635
Name7,68568635
..... and so on...
I hope you can help me do this, I've tried with "for"s but I'm not sure on how to do it if I'm iterating two lists at once, thank you :)

Utilize zip:
for num, name in zip(numbers, names):
print('{0}, {1}'.format(num, name))

zip will combine the two lists together, letting you write them to a file:
In [1]: l1 = ['one', 'two', 'three']
In [2]: l2 = [1, 2, 3]
In [3]: zip(l1, l2)
Out[3]: [('one', 1), ('two', 2), ('three', 3)]
However you can save yourself a bit of time. In your code, you are iterating over each file separately, creating a list from each. You could also iterate over both at the same time, creating your list in one sweep:
results = []
with open('contactos.txt') as c:
with open('numeros.txt') as n:
for line in c:
results.append([line.strip(), n.readline().strip()])
print results
This uses a with statement (context manager), that essentially handles the closing of files for you. This will iterate through contactos, reading a line from numeros and appending the pair to the list. You can even cut out the list step and write directly to your third file in the same loop:
with open('output.txt', 'w') as output:
with open('contactos.txt', 'r') as c:
with open('numeros.txt', 'r') as n:
for line in c:
output.write('{0}, {1}\n'.format(line.strip(), n.readline().strip()))

A "pythonic" way to mix two lists in this way is the zip function!
names = open('contactos.txt')
numbers = open('numeros.txt')
names1 = []
numbers1= []
for line in numbers:
numberdata = line.strip()
numbers1.append(numberdata)
for line in names:
data = line.strip()
names1.append(data)
names.close()
numbers.close()
for name, number in zip(names1, numbers1):
print '%s, %s' % (name number)
There are other and better ways to print formatted text (e.g. Yuushi's answer). I also like to use the with statement and list comprehensions, e.g.
with open('contactos.txt') as f:
names = [line.strip() for line in f]
with open('numeros.txt') as f:
numbers = [line.strip() for line in f]
for name, number in zip(names, numbers):
print '%s, %s' % (name, number)
Finally, I just want to comment on how you could do it without the zip function. I'm not sure what you want to do if there are a different number of numbers and names, but you can use a for loop like this for the last bit to access the values from both lists in a single for loop:
for i in range(len(numbers)):
print '%s, %s' % (names[i], numbers[i])
This code in particular will throw an exception if there are more names than numbers, so you would probably want to add some code to handle that.

Everything at once:
import itertools
with open('contactos.txt') as names, open('numeros.txt') as numbers:
for num, name in itertools.izip(numbers, names):
print '%s, %s' % (num.strip(), name.strip())
This reads the two files in parallel, rather than reading each file completely into memory one at a time.

Related

Remove duplicates from large list but remove both if it does exist?

So I have a text file like this
123
1234
123
1234
12345
123456
You can see 123 appears twice so both instances should be removed. but 12345 appears once so it stays. My text file is about 70,000 lines.
Here is what I came up with.
file = open("test.txt",'r')
lines = file.read().splitlines() #to ignore the '\n' and turn to list structure
for appId in lines:
if(lines.count(appId) > 1): #if element count is not unique remove both elements
lines.remove(appId) #first instance removed
lines.remove(appId) #second instance removed
writeFile = open("duplicatesRemoved.txt",'a') #output the left over unique elements to file
for element in lines:
writeFile.write(element + "\n")
When I run this I feel like my logic is correct, but I know for a fact the output is suppose to be around 950, but Im still getting 23000 elements in my output so a lot is not getting removed. Any ideas where the bug could reside?
Edit: I FORGOT TO MENTION. An element can only appear twice MAX.
Use Counter from built in collections:
In [1]: from collections import Counter
In [2]: a = [123, 1234, 123, 1234, 12345, 123456]
In [3]: a = Counter(a)
In [4]: a
Out[4]: Counter({123: 2, 1234: 2, 12345: 1, 123456: 1})
In [5]: a = [k for k, v in a.items() if v == 1]
In [6]: a
Out[6]: [12345, 123456]
For your particular problem I will do it like this:
from collections import defaultdict
out = defaultdict(int)
with open('input.txt') as f:
for line in f:
out[line.strip()] += 1
with open('out.txt', 'w') as f:
for k, v in out.items():
if v == 1: #here you use logic suitable for what you want
f.write(k + '\n')
Be careful about removing elements from a list while still iterating over that list. This changes the behavior of the list iterator, and can make it skip over elements, which may be part of your problem.
Instead, I suggest creating a filtered copy of the list using a list comprehension - instead of removing elements that appear more than twice, you would keep elements that appear less than that:
file = open("test.txt",'r')
lines = file.read().splitlines()
unique_lines = [line for line in lines if lines.count(line) <= 2] # if it appears twice or less
with open("duplicatesRemoved.txt", "w") as writefile:
writefile.writelines(unique_lines)
You could also easily modify this code to look for only one occurrence (if lines.count(line) == 1) or for more than two occurrences.
You can count all of the elements and store them in a dictionary:
dic = {a:lines.count(a) for a in lines}
Then remove all duplicated one from array:
for k in dic:
if dic[k]>1:
while k in lines:
lines.remove(k)
NOTE: The while loop here is becaues line.remove(k) removes first k value from array and it must be repeated till there's no k value in the array.
If the for loop is complicated, you can use the dictionary in another way to get rid of duplicated values:
lines = [k for k, v in dic.items() if v==1]

Python - How do i build a dictionary from a text file?

for the class data structures and algorithms at Tilburg University i got a question in an in class test:
build a dictionary from testfile.txt, with only unique values, where if a value appears again, it should be added to the total sum of that productclass.
the text file looked like this, it was not a .csv file:
apples,1
pears,15
oranges,777
apples,-4
oranges,222
pears,1
bananas,3
so apples will be -3 and the output would be {"apples": -3, "oranges": 999...}
in the exams i am not allowed to import any external packages besides the normal: pcinput, math, etc. i am also not allowed to use the internet.
I have no idea how to accomplish this, and this seems to be a big problem in my development of python skills, because this is a question that is not given in a 'dictionaries in python' video on youtube (would be to hard maybe), but also not given in a expert course because there this question would be to simple.
hope you guys can help!
enter code here
from collections import Counter
from sys import exit
from os.path import exists, isfile
##i did not finish it, but wat i wanted to achieve was build a list of the
strings and their belonging integers. then use the counter method to add
them together
## by splitting the string by marking the comma as the split point.
filename = input("filename voor input: ")
if not isfile(filename):
print(filename, "bestaat niet")
exit()
keys = []
values = []
with open(filename) as f:
xs = f.read().split()
for i in xs:
keys.append([i])
print(keys)
my_dict = {}
for i in range(len(xs)):
my_dict[xs[i]] = xs.count(xs[i])
print(my_dict)
word_and_integers_dict = dict(zip(keys, values))
print(word_and_integers_dict)
values2 = my_dict.split(",")
for j in values2:
print( value2 )
the output becomes is this:
[['schijndel,-3'], ['amsterdam,0'], ['tokyo,5'], ['tilburg,777'], ['zaandam,5']]
{'zaandam,5': 1, 'tilburg,777': 1, 'amsterdam,0': 1, 'tokyo,5': 1, 'schijndel,-3': 1}
{}
so i got the dictionary from it, but i did not separate the values.
the error message is this:
28 values2 = my_dict.split(",") <-- here was the error
29 for j in values2:
30 print( value2 )
AttributeError: 'dict' object has no attribute 'split'
I don't understand what your code is actually doing, I think you don't know what your variables are containing, but this is an easy problem to solve in Python. Split into a list, split each item again, and count:
>>> input = "apples,1 pears,15 oranges,777 apples,-4 oranges,222 pears,1 bananas,3"
>>> parts = input.split()
>>> parts
['apples,1', 'pears,15', 'oranges,777', 'apples,-4', 'oranges,222', 'pears,1', 'bananas,3']
Then split again. Behold the list comprehension. This is an idiomatic way to transform a list to another in python. Note that the numbers are strings, not ints yet.
>>> strings = [s.split(',') for s in strings]
>>> strings
[['apples', '1'], ['pears', '15'], ['oranges', '777'], ['apples', '-4'], ['oranges', '222'], ['pears', '1'], ['bananas', '3']]
Now you want to iterate over pairs, and sum all the same fruits. This calls for a dict:
>>> result = {}
>>> for fruit, countstr in pairs:
... if fruit not in result:
... result[fruit] = 0
... result[fruit] += int(countstr)
>>> result
{'pears': 16, 'apples': -3, 'oranges': 999, 'bananas': 3}
This pattern of adding an element if it doesn't exist comes up frequently. You should checkout defaultdict in the collections module. If you use that, you don't even need the if.
Let's walk through what you need to do to. First, check if the file exists and read the contents to a variable. Second, parse each line - you need to split the line on the comma, convert the number from a string to an integer, and then pass the values to a dictionary. In this case I would recommend using defaultdict from collections, but we can also do it with a standard dictionary.
from os.path import exists, isfile
from collections import defaultdict
filename = input("filename voor input: ")
if not isfile(filename):
print(filename, "bestaat niet")
exit()
# this reads the file to a list, removing newline characters
with open(filename) as f:
line_list = [x.strip() for x in f]
# create a dictionary
my_dict = {}
# update the value in the dictionary if it already exists,
# otherwise add it to the dictionary
for line in line_list:
k, v_str = line.split(',')
if k in my_dict:
my_dict[k] += int(v_str)
else:
my_dict[k] = int(v_str)
# print the dictionary
table_str = '{:<30}{}'
print(table_str.format('Item','Count'))
print('='*35)
for k,v in sorted(my_dict.item()):
print(table_str.format(k,v))

Creating a program that compares two lists

I am trying to create a program that checks whether items from one list are not in another. It keeps returning lines saying that x value is not in the list. Any suggestions? Sorry about my code, it's quite sloppy.
Searching Within an Array
Putting .txt files into arrays
with open('Barcodes', 'r') as f:
barcodes = [line.strip() for line in f]
with open('EAN Staging', 'r') as f:
EAN_staging = [line.strip() for line in f]
Arrays
list1 = barcodes
list2 = EAN_staging
Main Code
fixed = -1
for x in list1:
for variable in list1: # Moves along each variable in the list, in turn
if list1[fixed] in list2: # If the term is in the list, then
fixed = fixed + 1
location = list2.index(list1[fixed]) # Finds the term in the list
print ()
print ("Found", variable ,"at location", location) # Prints location of terms
Instead of lists, read the files as sets:
with open('Barcodes', 'r') as f:
barcodes = {line.strip() for line in f}
with open('EAN Staging', 'r') as f:
EAN_staging = {line.strip() for line in f}
Then all you need to do is to calculate the symmetric difference between them:
diff = barcodes - EAN_staging # or barcodes.difference(EAN_stagin)
An extracted example:
a = {1, 2, 3}
b = {3, 4, 5}
print(a - b)
>> {1, 2, 4, 5} # 1, 2 are in a but in b
Note that if you are operating with sets, information about how many times an element is present will be lost. If you care about situations when an element is present in barcodes 3 times, but only 2 times in EAN_staging, you should use Counter from collections.
Your code doesn't seem to quite answer your question. If all you want to do is see which elements aren't shared, I think sets are the way to go.
set1 = set(list1)
set2 = set(list2)
in_first_but_not_in_second = set1.difference(set2) # outputs a set
not_in_both = set1.symmetric_difference(set2) # outputs a set

Change the display of a list took from text file

I have this code wrote in Python:
with open ('textfile.txt') as f:
list=[]
for line in f:
line = line.split()
if line:
line = [int(i) for i in line]
list.append(line)
print(list)
This actually read integers from a text file and put them in a list.But it actually result as :
[[10,20,34]]
However,I would like it to display like:
10 20 34
How to do this? Thanks for your help!
You probably just want to add the items to the list, rather than appending them:
with open('textfile.txt') as f:
list = []
for line in f:
line = line.split()
if line:
list += [int(i) for i in line]
print " ".join([str(i) for i in list])
If you append a list to a list, you create a sub list:
a = [1]
a.append([2,3])
print a # [1, [2, 3]]
If you add it you get:
a = [1]
a += [2,3]
print a # [1, 2, 3]!
with open('textfile.txt') as f:
lines = [x.strip() for x in f.readlines()]
print(' '.join(lines))
With an input file 'textfiles.txt' that contains:
10
20
30
prints:
10 20 30
It sounds like you are trying to print a list of lists. The easiest way to do that is to iterate over it and print each list.
for line in list:
print " ".join(str(i) for i in line)
Also, I think list is a keyword in Python, so try to avoid naming your stuff that.
If you know that the file is not extremely long, if you want the list of integers, you can do it at once (two lines where one is the with open(.... And if you want to print it your way, you can convert the element to strings and join the result via ' '.join(... -- like this:
#!python3
# Load the content of the text file as one list of integers.
with open('textfile.txt') as f:
lst = [int(element) for element in f.read().split()]
# Print the formatted result.
print(' '.join(str(element) for element in lst))
Do not use the list identifier for your variables as it masks the name of the list type.

Python - Importing strings into a list, into another list :)

Basically I want to read strings from a text file, put them in lists three by three, and then put all those three by three lists into another list. Actually let me explain it better :)
Text file (just an example, I can structure it however I want):
party
sleep
study
--------
party
sleep
sleep
-----
study
sleep
party
---------
etc
From this, I want Python to create a list that looks like this:
List1 = [['party','sleep','study'],['party','sleep','sleep'],['study','sleep','party']etc]
But it's super hard. I was experimenting with something like:
test2 = open('test2.txt','r')
List=[]
for line in 'test2.txt':
a = test2.readline()
a = a.replace("\n","")
List.append(a)
print(List)
But this just does horrible horrible things. How to achieve this?
If you want to group the data in size of 3. Assumes your data in the text file is not grouped by any separator.
You need to read the file, sequentially and create a list. To group it you can use any of the known grouper algorithms
from itertools import izip, imap
with open("test.txt") as fin:
data = list(imap(list, izip(*[imap(str.strip, fin)]*3)))
pprint.pprint(data)
[['party', 'sleep', 'study'],
['party', 'sleep', 'sleep'],
['study', 'sleep', 'party']]
Steps of Execution
Create a Context Manager with the file object.
Strip each line. (Remove newline)
Using zip on the iterator list of size 3, ensures that the items are grouped as tuples of three items
Convert tuples to list
Convert the generator expression to a list.
Considering all are generator expressions, its done on a single iteration.
Instead, if your data is separated and grouped by a delimiter ------ you can use the itertools.groupby solution
from itertools import imap, groupby
class Key(object):
def __init__(self, sep):
self.sep = sep
self.count = 0
def __call__(self, line):
if line == self.sep: self.count += 1
return self.count
with open("test.txt") as fin:
data = [[e for e in v if "----------" not in e]
for k, v in groupby(imap(str.strip, fin), key = Key("----------"))]
pprint.pprint(data)
[['party', 'sleep', 'study'],
['party', 'sleep', 'sleep'],
['study', 'sleep', 'party']]
Steps of Execution
Create a Key Class, to increase a counter when ever the separator is encountered. The function call spits out the counter every-time its called apart from conditionally increasing it.
Create a Context Manager with the file object.
Strip each line. (Remove newline)
Group the data using itertools.groupby and using your custom key
Remove the separator from the grouped data and create a list of the groups.
You can try with this:
res = []
tmp = []
for i, line in enumerate(open('file.txt'), 1):
tmp.append(line.strip())
if i % 3 == 0:
res.append(tmp)
tmp = []
print(res)
I've assumed that you don't have the dashes.
Edit:
Here is an example for when you have dashes:
res = []
tmp = []
for i, line in enumerate(open('file.txt')):
if i % 4 == 0:
res.append(tmp)
tmp = []
continue
tmp.append(line.strip())
print(res)
First big problem:
for line in 'test2.txt':
gives you
't', 'e', 's', 't', '2', '.', 't', 'x', 't'
You need to loop through the file you open:
for line in test2:
Or, better:
with open("test2.txt", 'r') as f:
for line in f:
Next, you need to do one of two things:
If the line contains "-----", create a new sub-list (myList.append([]))
Otherwise, append the line to the last sub-list in your list (myList[-1].append(line))
Finally, your print at the end should not be so far indented; currently, it prints for every line, rather than just when the processing is complete.
List.append(a)
print(List)
Perhaps a better structure for your file would be:
party,sleep,study
party,sleep,sleep
...
Now each line is a sub-list:
for line in f:
myList.append(line.split(','))

Categories

Resources