Apply a 'for' loop word by word - Python

I have the following .txt file:
a.txt
which contains the following two lines of data:
300 100 500 250 150
34984 29220 43640 36410 7980
I need to write code that builds a dictionary producing the following result:
A 300
B 100
C 500
D 250
E 150
I have tried the code below, but I cannot separate the figures or select only the first line. Any ideas?
import string
mayusculas = string.ascii_uppercase
f = open("a.txt", "r")
for i, c in zip(mayusculas, f):
    print(i, c)
f.close()
Thank you all.

Preserving your code structure, split() is what you're looking for:
f = '''300 100 500 250 150
34984 29220 43640 36410 7980'''
for i, c in zip(mayusculas, (f.split('\n')[0]).split(' ')):
    print(i, c)
Explanation:
f.split('\n'): splits your string by newline, so you get a two-element list.
(f.split('\n')[0]).split(' '): takes the first element of that list and splits it by spaces, giving the five-element list with the five values you need, as stated in your example.
Output:
A 300
B 100
C 500
D 250
E 150

A few notes on your code:
1. Always open files using a context manager (the with statement)
2. You can read the file one line at a time with readline()
3. Use split() without any arguments to break up the line, so it handles tabs and multiple spaces properly.
Putting it all together, your code should look like this:
import string
mayusculas = string.ascii_uppercase
with open("clientes_pibpc.txt", "r") as f:
    for i, c in zip(mayusculas, f.readline().split()):
        print(i, c)

Just read the first line of the file and then split it, followed by zipping with the uppercase letters (this relies on string.ascii_uppercase, so import string first):
import string

with open('a.txt', 'r') as f:
    data = f.readline().split()

final_dict = dict(zip(string.ascii_uppercase, data))

>>> final_dict
{'A': '300', 'B': '100', 'C': '500', 'D': '250', 'E': '150'}
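Note that split() produces strings; if integer values are wanted in the dictionary, a dict comprehension with int() works too. A minimal sketch, using the first line of the sample data as a literal string:

```python
import string

# Hypothetical sample mirroring the first line of a.txt
line = "300 100 500 250 150"

# split() yields strings; int() converts them when numeric values are wanted
final_dict = {letter: int(value)
              for letter, value in zip(string.ascii_uppercase, line.split())}
```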


How to create 3 different lists from a txt file with 3 columns in Python?

In Python IDLE, I want to create 3 lists from the 3 columns of a text file containing the following data set:
1.23 2.01 3.15
52.02 958.02 52.02
15.23 59.45 65.78
75.01 25.26 55.26
65.10 98.23 58.45
I want the output like this:
a = [1.23, 52.02, 15.23, 75.01, 65.10]
b = [2.01, 958.02, 59.45, 25.26, 98.23]
c = [3.15, 52.02, 65.78, 55.26, 58.45]
Assuming the following input:
s = '1.23 2.01 3.15 52.02 958.02 52.02 15.23 59.45 65.78 75.01 25.26 55.26 65.10 98.23 58.45'
You can use itertools.zip_longest in combination with zip and iter:
from itertools import zip_longest
a,b,c = zip(*zip_longest(*[iter(map(float, s.split()))]*3))
If your input is a multiple of 3, you can use only zip:
a,b,c = zip(*zip(*[iter(map(float, s.split()))]*3))
output:
>>> a,b,c
((1.23, 52.02, 15.23, 75.01, 65.1),
(2.01, 958.02, 59.45, 25.26, 98.23),
(3.15, 52.02, 65.78, 55.26, 58.45))
NB: there is actually a recipe for this in the itertools documentation (search for "grouper").
How it works:
It creates something like zip(<iterator>, <iterator>, <iterator>), where each argument is a reference to the same iterator. So each time zip collects a value, it actually takes the next one from that single iterator. If the arguments were independent lists, the output would hold 3 times the input, but because the one iterator gets consumed, the total number of elements is conserved (modulo the multiplier).
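To make the shared-iterator behaviour concrete, here is a minimal sketch using a plain list, so the consumption is visible:

```python
values = [1.23, 2.01, 3.15, 52.02, 958.02, 52.02]
it = iter(values)

# [it] * 3 is three references to the SAME iterator, so each tuple that
# zip builds pulls three consecutive items from it
rows = list(zip(*[it] * 3))
```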
First, you could split the lines into a table with a list comprehension:
[line.split(' ') for line in open('s69498849.txt','r').read().split('\n')]
[['1.23', '2.01', '3.15'], ['52.02', '958.02', '52.02'], ['15.23', '59.45', '65.78'], ['75.01', '25.26', '55.26'], ['65.10', '98.23', '58.45']]
Then, just flatten the list
[i for o in [line.split(' ') for line in open('filename.txt','r').read().split('\n')] for i in o]
['1.23', '2.01', '3.15', '52.02', '958.02', '52.02', '15.23', '59.45', '65.78', '75.01', '25.26', '55.26', '65.10', '98.23', '58.45']
And then use slices with a step of 3 to pull out each column:
data[0::3]
['1.23', '52.02', '15.23', '75.01', '65.10']
All together now:
data = [i for o in [line.split(' ') for line in open('filename.txt','r').read().split('\n')] for i in o]
a = data[0::3]
b = data[1::3]
c = data[2::3]
Note: if you want to control or do something to each value, you could change
line.split(' ') to [line.split(' ')[0], line.split(' ')[1], line.split(' ')[2]]
and then do whatever you like to each element, such as converting it to a number with float().
First, you need to open the file and read its lines (a file object cannot be indexed directly, so use readlines()):
file = open("file name")
content = file.readlines()
then make lists for each line, splitting each one by the white space:
line1 = content[0].split()
line2 = content[1].split()
line3 = content[2].split()
line4 = content[3].split()
line5 = content[4].split()
lines = [line1, line2, line3, line4, line5]
and build each list using the corresponding element in each line (note that b and c must append to themselves, not to a):
a = []
for i in lines:
    a.append(i[0])
b = []
for i in lines:
    b.append(i[1])
c = []
for i in lines:
    c.append(i[2])
and done!
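For comparison, the whole column split can also be done in one step by transposing the rows with zip(*...); a sketch assuming the data from the question, inlined here as a literal string:

```python
text = """1.23 2.01 3.15
52.02 958.02 52.02
15.23 59.45 65.78
75.01 25.26 55.26
65.10 98.23 58.45"""

# parse each line into floats, then transpose rows into columns
rows = [[float(x) for x in line.split()] for line in text.splitlines()]
a, b, c = zip(*rows)  # zip(*rows) turns the 5x3 rows into 3 columns of 5
```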

How to separate different input formats from the same text file with Python

I'm new to programming and Python, and I'm looking for a way to distinguish between two input formats in the same input text file. For example, let's say I have an input file like so, where values are comma-separated:
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
Where the format is N followed by N lines of Data1, and M followed by M lines of Data2. I tried opening the file, reading it line by line and storing it into one single list, but I'm not sure how to go about producing 2 lists for Data1 and Data2, such that I would get:
Data1 = ["Washington,A,10", "New York,B,20", "Seattle,C,30", "Boston,B,20", "Atlanta,D,50"]
Data2 = ["New York,5", "Boston,10"]
My initial idea was to iterate through the list until I found an integer i, remove the integer from the list and continue for the next i iterations all while storing the subsequent values in a separate list, until I found the next integer and then repeat. However, this would destroy my initial list. Is there a better way to separate the two data formats in different lists?
You could use itertools.islice and a list comprehension:
from itertools import islice
string = """
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
"""
result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
          for parts in [string.split("\n")]
          for idx, line in enumerate(parts)
          if line.isdigit()]

print(result)
This yields
[['Washington,A,10', 'New York,B,20', 'Seattle,C,30', 'Boston,B,20', 'Atlanta,D,50'], ['New York,5', 'Boston,10']]
For a file, you need to change it to:
with open("testfile.txt", "r") as f:
    result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
              for parts in [f.read().split("\n")]
              for idx, line in enumerate(parts)
              if line.isdigit()]
    print(result)
You're definitely on the right track.
If you want to preserve the original list here, you don't actually have to remove integer i; you can just go on to the next item.
Code:
originalData = []
formattedData = []

with open("data.txt", "r") as f:
    f = list(f)
    originalData = f

i = 0
while i < len(f):  # Iterate through every line
    try:
        n = int(f[i])  # See if line can be cast to an integer
        originalData[i] = n  # Change string to int in original
        formattedData.append([])
        for j in range(n):
            i += 1
            item = f[i].replace('\n', '')
            originalData[i] = item  # Remove newline char in original
            formattedData[-1].append(item)
    except ValueError:
        print("File has incorrect format")
    i += 1

print(originalData)
print(formattedData)
The following code will produce a list results which is equal to [Data1, Data2].
The code assumes that the number of entries specified is exactly the amount that there is. That means that for a file like this, it will not work.
2
New York,5
Boston,10
Seattle,30
The code:
# get the data from the text file
with open('filename.txt', 'r') as file:
    lines = file.read().splitlines()

results = []
index = 0
while index < len(lines):
    # Find the start and end values.
    start = index + 1
    end = start + int(lines[index])
    # Everything from the start up to and excluding the end index gets added
    results.append(lines[start:end])
    # Update the index
    index = end
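An alternative sketch for the same count-prefixed format: treating the lines as a single iterator lets itertools.islice consume each block directly, with no index arithmetic (the sample data is inlined here as a string for illustration):

```python
from itertools import islice

text = """5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10"""

lines = iter(text.splitlines())
results = []
for count in lines:  # the loop only ever lands on the count lines...
    # ...because islice consumes the next `count` data lines in between
    results.append(list(islice(lines, int(count))))
```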

Trying to compare lines of 2 large files and keep lines that match, but no matches

I have a dictionary file for a speech recognition engine that I'm trying to reduce the size of. The dictionary contains 133k+ lines like so:
abella AH B EH L AH
abeln AE B IH L N
abelow AE B AH L OW
abels EY B AH L Z
abelson AE B IH L S AH N
abend AE B EH N D
abend(2) AH B EH N D
I'm trying to reduce it to only hold the most common words and names in the U.S. from a file with 15k+ lines like so:
configurations
poison
james
john
robert
When I run the following script it results in a blank file, as if there were no matches between the first token of each line in the dictionary and the lines of the common-words dataset. Are my files too big for this approach? What am I doing wrong?
import os

file_name = 'small_cmudict-en-us.dict'
f = open(file_name, 'w+')

with open('common_names_words.txt', 'r') as n:
    for line in n:
        line = line[:-1] #remove newline char
    with open('cmudict-en-us.dict', 'r') as d:
        for line2 in d:
            dict_entry = line2.split()
            #words with multiple pronunciations; abend, abend(2)
            if dict_entry[0][-3:] == '(':
                if dict_entry[0][:-3] in n:
                    f.write(line)
            if dict_entry[0] in n:
                f.write(line)

f.close
Thank you for your time.
You have a couple of problems. First, you iterate the entire file but don't save anything
for line in n:
    line = line[:-1] #remove newline char
Then you ask if the word you want is in the file that you've already exhausted with that loop
if dict_entry[0] in n:
As soon as you are in the business of checking containment, you should be thinking sets. They provide fast lookup of hashable objects like strings. You can also "normalize" the data by stripping off things like the (2) and deciding on a case to compare. Perhaps both files are already lower case, but I assumed the case can vary.
file_name = 'small_cmudict-en-us.dict'
with open(file_name, 'w+') as f:
    with open('common_names_words.txt', 'r') as n:
        common = set(line.strip().lower() for line in n)
    with open('cmudict-en-us.dict', 'r') as d:
        for line2 in d:
            # account for e.g., "abend" and "abend(2)"
            word = line2.split()[0].split('(')[0].strip().lower()
            if word in common:
                f.write(line2)
You could also compress that code a bit by using writelines and a generator that filters lines for you.
with open('cmudict-en-us.dict', 'r') as d:
    f.writelines(line for line in d
                 if line.split()[0].split('(')[0].strip().lower() in common)
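To see what the normalization step does, here is the key expression applied to two sample dictionary lines from the question:

```python
lines = ["abend AE B EH N D", "abend(2) AH B EH N D"]

# take the first token, drop any "(2)"-style suffix, and lowercase it,
# so both variants compare equal against the common-words set
words = [l.split()[0].split('(')[0].strip().lower() for l in lines]
```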

Dictionary value list to be printed on same line while writing it in a file

I am new to Python and trying to write my dictionary values to a file using Python 2.7. The value for each key in my dictionary D is a list with at least 2 items.
Dictionary has key as TERM_ID and
value has format [[DOC42, POS10, POS22], [DOC32, POS45]].
It means the TERM_ID (key) occurs in DOC42 at positions POS10 and POS22, and also in DOC32 at POS45.
So I have to write to a new file in the format: a new line for each TERM_ID
TERM_ID (tab) DOC42:POS10 (tab) 0:POS22 (tab) DOC32:POS45
The following code should help you understand exactly what I am trying to do.
for key,valuelist in D.items():
    #first value in each list is an ID
    docID = valuelist[0][0]
    for lst in valuelist:
        file.write('\t' + lst[0] + ':' + lst[1])
        lst.pop(0)
        lst.pop(0)
        for n in range(len(lst)):
            file.write('\t0:' + lst[0])
            lst.pop(0)
The output I get is :
TERM_ID (tab) DOC42:POS10 (tab) 0:POS22
DOC32:POS45
I tried using the newline tag as well as commas to continue writing on the same line in a number of places, but it did not work. I fail to understand how file write really works.
Any kind of inputs will be helpful. Thanks!
@Falko I could not find a way to attach the text file, hence here is my sample data:
879\t3\t1
162\t3\t1
405\t4\t1455
409\t5\t1
13\t6\t15
417\t6\t13
422\t57\t1
436\t4\t1
141\t8\t1
142\t4\t145
170\t8\t1
11\t4\t1
184\t4\t1
186\t8\t14
My sample running code is:
with open('sampledata.txt','r') as sample, open('result.txt','w') as file:
    d = {}
    #term = ''
    #docIndexLines = docIndex.readlines()
    #form a d with format [[doc a, pos 1, pos 2], [doc b, pos 3, pos 8]]
    for l in sample:
        tID = -1
        someLst = l.split('\\t')
        #if len(someLst) >= 2:
        tID = someLst[1]
        someLst.pop(1)
        #if term not in d:
        if not d.has_key(tID):
            d[tID] = [someLst]
        else:
            d[tID].append(someLst)
    #read the dictionary to generate result file
    docID = 0
    for key,valuelist in d.items():
        file.write(str(key))
        for lst in valuelist:
            file.write('\t' + lst[0] + ':' + lst[1])
            lst.pop(0)
            lst.pop(0)
            for n in range(len(lst)):
                file.write('\t0:' + lst[0])
                lst.pop(0)
My Output:
57 422:1
3 879:1
162:1
5 409:1
4 405:1455
436:1
142:145
11:1
184:1
6 13:15
417:13
8 141:1
170:1
186:14
Expected output:
57 422:1
3 879:1 162:1
5 409:1
4 405:1455 436:1 142:145 11:1 184:1
6 13:15 417:13
8 141:1 170:1 186:14
You probably don't get the result you're expecting because you didn't strip the newline characters \n while reading the input data. Try replacing
someLst = l.split('\\t')
with
someLst = l.strip().split('\\t')
To enforce the mentioned line breaks in your output file, add a
file.write('\n')
at the very end of your second outer for loop:
for key,valuelist in d.items():
    # ...
    file.write('\n')
Bottom line: write never adds a line break. If you do see one in your output file, it's in your data.
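A quick demonstration of that point (Python 3 syntax, using io.StringIO in place of a real file):

```python
import io

buf = io.StringIO()
buf.write('a')
buf.write('b')    # continues on the same line: write adds nothing
buf.write('\n')   # a line break appears only because we wrote one
buf.write('c')
```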

print lists in a file in a special format in python

I have a large list of lists like:
X = [['a','b','c','d','e','f'],['c','f','r'],['r','h','l','m'],['v'],['g','j']]
Each inner list is a sentence, and the members of these lists are the words of those sentences. I want to write this list to a file such that each sentence (inner list) is on a separate line, and each line has a number corresponding to the position of that inner list (sentence) in the large list. In the case above, I want the output to look like this:
1. a b c d e f
2. c f r
3. r h l m
4. v
5. g j
I need them to be written in this format to a text file. Can anyone suggest code for this in Python?
Thanks
with open('somefile.txt', 'w') as fp:
    for i, s in enumerate(X):
        print >>fp, '%d. %s' % (i + 1, ' '.join(s))
with open('file.txt', 'w') as f:
    i = 1
    for row in X:
        f.write('%d. %s\n' % (i, ' '.join(row)))  # '\n' needed, write adds no line break
        i += 1
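As a side note, the manual counter can be avoided with enumerate's start argument; a Python 3 sketch of the same idea:

```python
X = [['a', 'b', 'c', 'd', 'e', 'f'], ['c', 'f', 'r'],
     ['r', 'h', 'l', 'm'], ['v'], ['g', 'j']]

# enumerate(X, 1) numbers the sentences from 1, so no separate counter is needed
numbered = ['%d. %s' % (i, ' '.join(s)) for i, s in enumerate(X, 1)]
```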
