I have a file that maps letters to numbers, one pair per line, like this:
a 1
b 2
c 3
d 4
etc
I also have a second file that lists the letters together with the number to multiply each one by, like this:
a 3 b 5
c 6 d 2
So I want to get the value of a from the original file and multiply it by 3, then get the value of b and multiply it by 5, and so on.
I have made a dictionary from the original file, but I don't know how to retrieve a number from it to use in the multiplication. Python essentially needs to go through the multiplier file, see the a, get the corresponding value from the other file, and multiply it by 3.
d = {}
with open("numbers.txt") as numbers:
    for line in numbers:
        (key, val) = line.split()
        d[key] = int(val)
print(d)
d = {}
with open("numbers.txt") as numbers:
    for line in numbers:
        pairs = line.split()
        for i in range(0,len(pairs),2):
            d[pairs[i]] = int(pairs[i+1])
print(d)
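One way to finish this, as a minimal sketch: keep the first file's pairs in a dictionary and, for each letter/multiplier pair in the second file, look the letter up with values[letter] and multiply. The function names and the in-memory sample lines are assumptions made for illustration; with real files you would pass the open file objects instead of the lists.

```python
def read_values(lines):
    """Build {letter: number} from lines such as 'a 1'."""
    values = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, val = line.split()
        values[key] = int(val)
    return values


def multiply(values, multiplier_lines):
    """Return {letter: value * multiplier} for lines such as 'a 3 b 5'."""
    products = {}
    for line in multiplier_lines:
        pairs = line.split()
        # walk the line two tokens at a time: letter, then multiplier
        for i in range(0, len(pairs), 2):
            letter, times = pairs[i], int(pairs[i + 1])
            products[letter] = values[letter] * times  # dictionary lookup
    return products


# hypothetical stand-ins for the contents of the two files
values = read_values(["a 1", "b 2", "c 3", "d 4"])
products = multiply(values, ["a 3 b 5", "c 6 d 2"])
print(products)
```

A letter that appears in the multiplier file but not in the first file raises KeyError here; values.get(letter, 0) would be one way to tolerate that, if silently skipping is acceptable.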
I have two CSV files that I want to compare one looks like this:
"a" 1 6 3 1 8
"b" 15 6 12 5 6
"c" 7 4 1 4 8
"d" 14 8 12 11 4
"e" 1 8 7 13 12
"f" 2 5 4 13 9
"g" 8 6 9 3 3
"h" 5 12 8 2 3
"i" 5 9 2 11 11
"j" 1 9 2 4 9
So "a" has the numbers 1, 6, 3, 1, 8, and so on. The actual CSV file is thousands of lines long, so efficiency matters when writing the code.
The second CSV file looks like this:
4
15
7
9
2
I have written some code to import these CSV files into lists in Python.
with open('winningnumbers.csv', 'rb') as wn:
    reader = csv.reader(wn)
    winningnumbers = list(reader)
wn1 = winningnumbers[0]
wn2 = winningnumbers[1]
wn3 = winningnumbers[2]
wn4 = winningnumbers[3]
wn5 = winningnumbers[4]
print(winningnumbers)
with open('Entries#x.csv', 'rb') as en:
    readere = csv.reader(en)
    enl = list(readere)
How would I now cross-reference the number 4 (wn1 from the second CSV file) against the first CSV file, so that it returns that "b" has wn1 in it? I imported them as lists to see if I could figure out how to do it, but just ended up running in circles. I also tried using dict() but had no success.
If I understood you correctly, you want to find the index of the first entry (or of all entries) whose numbers are winning. If so, you can do it like this:
with open('winningnumbers.csv', 'rb') as wn:
    reader = csv.reader(wn)
    winningnumbers = list(reader)
with open('Entries#x.csv', 'rb') as en:
    readere = csv.reader(en)
    winning_number_index = -1 # Default value which we will print if nothing is found
    current_index = 0 # Initial index
    for line in readere: # Iterate over entries file
        all_numbers_match = True # Default value that will be set to False if any of the elements doesn't match with winningnumbers
        for i in range(len(line)):
            if line[i] != winningnumbers[i]: # If values of current line and winningnumbers with matching indexes are not equal
                all_numbers_match = False # Our default value is set to False
                break # Exit "for" without finishing
        if all_numbers_match == True: # If our default value is still True (which indicates that all numbers match)
            winning_number_index = current_index # Current index is written to winning_number_index
            break # Exit "for" without finishing
        else: # Not all numbers match
            current_index += 1
print(winning_number_index)
This will print the index of the first winning entry (if you want all the indexes, let me know in the comments).
Note: this is not the optimal code to solve your problem. It's just easier to understand and debug if you're not familiar with Python's more advanced features.
You should probably consider not abbreviating your variable names. entries_reader takes just a second more to write and 5 seconds less to understand than readere.
This variant is faster, shorter and more memory efficient, but may be harder to understand:
with open('winningnumbers.csv', 'rb') as wn:
    reader = csv.reader(wn)
    winningnumbers = list(reader)
with open('Entries#x.csv', 'rb') as en:
    readere = csv.reader(en)
    for line_index, line in enumerate(readere):
        if all((line[i] == winningnumbers[i] for i in xrange(len(line)))):
            winning_number_index = line_index
            break
    else:
        winning_number_index = -1
print(winning_number_index)
The features that might be unclear are probably enumerate(), all() and using else with for rather than with if. Let's go through them one by one.
To understand this usage of enumerate(), you first need to understand this syntax:
a, b = [1, 2]
Variables a and b are assigned the corresponding values from the list. In this case a will be 1 and b will be 2. Using this syntax we can write:
for a, b in [[1, 2], [2, 3], ['spam', 'eggs']]:
    # do something with a and b
In each iteration, a and b will be 1 and 2, then 2 and 3, then 'spam' and 'eggs', respectively.
Let's assume we have a list a = ['spam', 'eggs', 'potatoes']. enumerate(a) yields pairs of index and item, with the index starting at 0, like this "list": [(0, 'spam'), (1, 'eggs'), (2, 'potatoes')]. So, when we use it like this,
for line_index, line in enumerate(readere):
    # Do something with line_index and line
line_index will be 0, 1, 2, etc.
The all() function accepts an iterable (list, tuple, generator, etc.) and returns True if all the elements in it are true.
The generator expression passed to all() works like the list comprehension mylist = [line[i] == winningnumbers[i] for i in range(len(line))], which returns a list and is similar to the following:
mylist = []
for i in range(len(line)):
    mylist.append(line[i] == winningnumbers[i]) # a == b will return True if a is equal to b
So all() will return True only when all the numbers from the entry match the winning numbers.
Code in the else section of a for loop runs only when the loop was not interrupted by break, so in our situation it's good for setting a default index to return.
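To see the three features working together, here is a small self-contained sketch. The rows and winning numbers are made up for illustration and are not taken from the CSV files in the question.

```python
entries = [["1", "2"], ["4", "15"], ["7", "9"]]  # hypothetical entry rows
winning = ["4", "15"]                            # hypothetical winning numbers

# enumerate() pairs each item with its index, counting from 0
pairs = list(enumerate(["spam", "eggs", "potatoes"]))

for index, row in enumerate(entries):
    # all() is True only if every comparison in the generator is True
    if all(row[i] == winning[i] for i in range(len(row))):
        found = index
        break
else:
    # this branch runs only if the loop ended without hitting break
    found = -1

print(pairs)
print(found)
```

Changing winning to values that match no row makes the loop run to completion, so the else branch sets found to -1.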
Having duplicate numbers seems illogical, but if you want the count of matched numbers for each row regardless of position, make nums a set and sum the number of times a value from each row is in the set:
from itertools import islice, imap
import csv
with open("in.txt") as f, open("numbers.txt") as nums:
    # make a set of all winning nums
    nums = set(imap(str.rstrip, nums))
    r = csv.reader(f)
    # iterate over each row and sum how many matches we get
    for row in r:
        print("{} matched {}".format(row[0], sum(n in nums
                                                  for n in islice(row, 1, None))))
Which using your input will output:
a matched 0
b matched 1
c matched 2
d matched 1
e matched 0
f matched 2
g matched 0
h matched 1
i matched 1
j matched 2
This presumes your file is comma separated and your numbers file has one number per line.
If you actually want to know which numbers, if any, are present, then you need to iterate over the numbers and print each one that is in our set:
from itertools import islice, imap
import csv
with open("in.txt") as f, open("numbers.txt") as nums:
    nums = set(imap(str.rstrip, nums))
    r = csv.reader(f)
    for row in r:
        for n in islice(row, 1, None):
            if n in nums:
                print("{} is in row {}".format(n, row[0]))
        print("")
But again, I am not sure having duplicate numbers makes sense.
To group the rows based on how many matches, you can use a dict using the sum as the key and appending the first column value:
from itertools import islice, imap
import csv
from collections import defaultdict
with open("in.txt") as f, open("numbers.txt") as nums:
    # make a set of all winning nums
    nums = set(imap(str.rstrip, nums))
    r = csv.reader(f)
    results = defaultdict(list)
    # iterate over each row and sum how many matches we get
    for row in r:
        results[sum(n in nums for n in islice(row, 1, None))].append(row[0])
results:
defaultdict(<type 'list'>,
{0: ['a', 'e', 'g'], 1: ['b', 'd', 'h', 'i'],
2: ['c', 'f', 'j']})
The keys are the number of matches; the values are the ids of the rows that matched that many numbers.
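The snippets above rely on itertools.imap, which exists only in Python 2. For Python 3, the same grouping idea could be sketched as follows. The in-memory lines are hypothetical stand-ins for in.txt and numbers.txt, and the variable names are my own.

```python
import csv
from collections import defaultdict

# hypothetical comma-separated entries and winning numbers (one per line)
entry_lines = ["a,1,6,3", "b,15,6,12", "c,7,4,1"]
number_lines = ["4\n", "15\n", "7\n"]

# a set gives constant-time membership tests, which matters for long files
winning = {line.strip() for line in number_lines}

matches_by_count = defaultdict(list)
for row in csv.reader(entry_lines):  # csv.reader accepts any iterable of lines
    label, numbers = row[0], row[1:]
    count = sum(n in winning for n in numbers)  # each True counts as 1
    matches_by_count[count].append(label)

print(dict(matches_by_count))
```

The values are compared as strings, exactly as csv.reader yields them, so the winning numbers are stripped of their newline but not converted to int.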
abcd
1234.984
5.2 1.33 0.00
0.00 2.00 1.92
0.00 1.22 1.22
1 1 1
asdf
1.512 1.11 1.50
0 0 0
0 0 1.512
Suppose I have the above in a file named x (with no blank lines between the lines). I want to read each line and store each value in the line (the values are separated by multiple spaces) in a variable. Later I want to write each floating-point value divided by 2.12 at the same position in the file.
I was doing the following, but I think I am completely off. I am trying to read each line and use strip().split() to obtain each value, but I cannot get it to work.
f1=open("x","r")
f2=open("y","w")
for i, line in enumerate(f1):
    # for line 0 and 1, i wanted to print the lines as such
    for i in range(0,1):
        print >> f2, i
    # from lines 2 to 4 i tried reading each value in each line and store it in a,b,c and finally print
    for i in range(2,4):
        l=line.strip().split()
        a=l[0]
        b=l[1]
        c=l[2]
        print >> f2, a, b, c
    if i == 5:
        l=line.strip().split()
        # I want to store the value (of 1 + 1 + 1), don't know how
        t=l[0]
        print >> f2, t
    if i == 6:
        print >> f2, i
    for i in range(7,t): # not sure if i can use variable in range
        l=line.strip().split()
        a=l[0]
        b=l[1]
        c=l[2]
        print >> f2, a, b, c
Any help is appreciated.
It's difficult to understand exactly what you are trying to achieve, but if my guess is correct, you could do something like this (I cover only reading the input file):
all_lines = []
# first read all lines in the file (I assume the file is not too big to do it)
with open('data.csv', 'r') as f:
    all_lines = [l.rstrip() for l in f.readlines()]
# then process specific lines as you need.
# lines 2 to 4 (the end of a slice is exclusive, hence 5)
for l in all_lines[2:5]:
    a,b,c = map(float, l.split())
    print(a,b,c)
    # or save the values in other file
# a slice bound must be an int, so convert the float sum
t = int(sum(map(float,all_lines[5].split())))
print(t)
for l in all_lines[7:7+t]:
    a,b,c = map(float, l.split())
    print(a,b,c)
    # or save the values in other file
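The writing half of the question could be sketched along the same lines. This is only my reading of the layout: lines 0, 1, 5 and 6 are copied unchanged, and the two numeric blocks are divided by 2.12. The function names, the three-decimal formatting and the single-space separator are assumptions, so the original column alignment is not preserved.

```python
def scale_line(line, divisor=2.12):
    """Divide every whitespace-separated number on a line by divisor."""
    return " ".join("{:.3f}".format(float(tok) / divisor) for tok in line.split())


def rescale(lines, divisor=2.12):
    """Copy the header lines unchanged and scale the numeric blocks."""
    out = list(lines[:2])                                   # lines 0-1: copied as is
    out += [scale_line(l, divisor) for l in lines[2:5]]     # lines 2-4: scaled
    count = int(sum(float(tok) for tok in lines[5].split()))  # e.g. 1 + 1 + 1
    out += lines[5:7]                                       # count line and label line
    out += [scale_line(l, divisor) for l in lines[7:7 + count]]
    return out


sample = ["abcd", "1234.984", "5.2 1.33 0.00", "0.00 2.00 1.92",
          "0.00 1.22 1.22", "1 1 1", "asdf", "1.512 1.11 1.50",
          "0 0 0", "0 0 1.512"]
result = rescale(sample)
print("\n".join(result))
```

To write the result to a file, open it with open("y", "w") and call write("\n".join(result) + "\n") on it.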
I have a dictionary with each key containing multiple values (list):
one = [1,2,3]
two = [1,2,3]
three= [1,2,3]
It was obtained with the following line of code:
output_file.write('{0}\t{1}\n'.format(key,"\t".join(value)))
So my final printed output looks like this:
one 1 2 3
two 1 2 3
three 1 2 3
My goal now is to have the output looking like this instead:
one 1
one 2
one 3
two 1
two 2
…
Any tips?
You can use the key itself as part of the delimiter:
#key = "one"
#value = ['1','2','3']
print(key+'\t'+'\n{0}\t'.format(key).join(value))
output
one 1
one 2
one 3
You could do this with nested for-loops:
for key, value_list in my_dict.iteritems():
    for value in value_list:
        output_file.write("{}\t{}\n".format(key, value))
this may also work...
#key = "one"
#value = ['1','2','3']
print '\n'.join(map(lambda i: key+'\t'+str(i), value))
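The answers above are written for Python 2 (iteritems() and the print statement). A Python 3 sketch of the nested-loop approach might look like this; io.StringIO stands in for the real output file, and the sample dictionary is made up.

```python
import io

my_dict = {"one": [1, 2, 3], "two": [1, 2, 3]}  # hypothetical sample data

output_file = io.StringIO()  # stands in for open("out.txt", "w")
for key, value_list in my_dict.items():  # items() replaces Python 2's iteritems()
    for value in value_list:
        # format() converts the value to text, so ints work as well as strings
        output_file.write("{}\t{}\n".format(key, value))

text = output_file.getvalue()
print(text, end="")
```

Each key is repeated once per value, which gives the one-pair-per-line layout asked for.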
I am new to Python and am trying to write my dictionary values to a file using Python 2.7. Each value in my dictionary D is a list with at least 2 items.
The dictionary has TERM_ID as its key, and
each value has the format [[DOC42, POS10, POS22], [DOC32, POS45]].
This means the TERM_ID (key) occurs in DOC42 at positions POS10 and POS22, and also in DOC32 at POS45.
So I have to write to a new file in this format, with a new line for each TERM_ID:
TERM_ID (tab) DOC42:POS10 (tab) 0:POS22 (tab) DOC32:POS45
The following code should help you understand exactly what I am trying to do.
for key,valuelist in D.items():
    #first value in each list is an ID
    docID = valuelist[0][0]
    for lst in valuelist:
        file.write('\t' + lst[0] + ':' + lst[1])
        lst.pop(0)
        lst.pop(0)
        for n in range(len(lst)):
            file.write('\t0:' + lst[0])
            lst.pop(0)
The output I get is :
TERM_ID (tab) DOC42:POS10 (tab) 0:POS22
DOC32:POS45
I tried using the newline character as well as commas in a number of places to keep writing on the same line, but it did not work. I don't understand how file writing really works.
Any kind of inputs will be helpful. Thanks!
@Falko, I could not find a way to attach the text file, so here is my sample data:
879\t3\t1
162\t3\t1
405\t4\t1455
409\t5\t1
13\t6\t15
417\t6\t13
422\t57\t1
436\t4\t1
141\t8\t1
142\t4\t145
170\t8\t1
11\t4\t1
184\t4\t1
186\t8\t14
My sample running code is -
with open('sampledata.txt','r') as sample,open('result.txt','w') as file:
    d = {}
    #term= ''
    #docIndexLines = docIndex.readlines()
    #form a d with format [[doc a, pos 1, pos 2], [doc b, pos 3, pos 8]]
    for l in sample:
        tID = -1
        someLst = l.split('\\t')
        #if len(someLst) >= 2:
        tID = someLst[1]
        someLst.pop(1)
        #if term not in d:
        if not d.has_key(tID):
            d[tID] = [someLst]
        else:
            d[tID].append(someLst)
    #read the dictionary to generate result file
    docID = 0
    for key,valuelist in d.items():
        file.write(str(key))
        for lst in valuelist:
            file.write('\t' + lst[0] + ':' + lst[1])
            lst.pop(0)
            lst.pop(0)
            for n in range(len(lst)):
                file.write('\t0:' + lst[0])
                lst.pop(0)
My Output:
57 422:1
3 879:1
162:1
5 409:1
4 405:1455
436:1
142:145
11:1
184:1
6 13:15
417:13
8 141:1
170:1
186:14
Expected output:
57 422:1
3 879:1 162:1
5 409:1
4 405:1455 436:1 142:145 11:1 184:1
6 13:15 417:13
8 141:1 170:1 186:14
You probably don't get the result you're expecting because you didn't strip the newline characters \n while reading the input data. Try replacing
someLst = l.split('\\t')
with
someLst = l.strip().split('\\t')
To enforce the mentioned line breaks in your output file, add a
file.write('\n')
at the very end of your second outer for loop:
for key,valuelist in d.items():
    # ...
    file.write('\n')
Bottom line: write never adds a line break. If you do see one in your output file, it's in your data.
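An alternative that avoids stray line breaks altogether is to build each output line as one string and write it once. The sketch below assumes the fields are separated by real tab characters and that each row carries a single position, so it does not produce the 0:POS entries from the question. The function name build_lines and the sample rows are my own.

```python
def build_lines(rows):
    """Group doc/term/pos rows by term and format one output line per term."""
    postings = {}  # term -> list of "doc:pos" strings, in first-seen order
    for row in rows:
        # strip() removes the trailing newline before splitting
        doc, term, pos = row.strip().split("\t")
        postings.setdefault(term, []).append(doc + ":" + pos)
    # exactly one newline per term, added when the line is assembled
    return [term + "\t" + "\t".join(entries) + "\n"
            for term, entries in postings.items()]


# hypothetical rows in the same doc/term/position layout as the sample data
rows = ["879\t3\t1\n", "162\t3\t1\n", "409\t5\t1\n", "13\t6\t15\n", "417\t6\t13\n"]
lines = build_lines(rows)
print("".join(lines), end="")
```

With a real file, out.writelines(build_lines(sample)) would write the result, since writelines() also adds no line breaks of its own.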
I have this text (WordNet) file made up of numbers and words, for example:
"09807754 18 n 03 aristocrat 0 blue_blood 0 patrician"
I want to read in the first number as the name of a dictionary (or list) holding the words that follow. The layout never changes: it is always an 8-digit key followed by a two-digit number, a single letter and another two-digit number. This last two-digit number (03) tells how many words (three in this case) are associated with the 8-digit key.
My idea was to look at the 14th position in the string and use that number to run a loop that picks up all of the words associated with that key.
So I think it would go something like this:
with open('nouns.txt','r') as f:
    for line in f:
        words = range(14,15)
        numOfWords = int(words)
        while i =< numOfWords
            #here is where the problem arises,
            #i want to search for words after the spaces 3 (numOfWords) times
            #and put them into a dictionary(or list) associated with the key
            range(0,7) = {word(i+1), word(i+2)}
Technically, I am looking for whichever one of these makes more sense:
09807754 = { 'word1':aristocrat, 'word2':blue_blood , 'word3':patrician }
or
09807754 = ['aristocrat', 'blue_blood', 'patrician']
Obviously this doesn't run, but if anyone could give me any pointers it would be greatly appreciated.
>>> L = "09807754 18 n 03 aristocrat 0 blue_blood 0 patrician".split()
>>> L[0], L[4::2]
('09807754', ['aristocrat', 'blue_blood', 'patrician'])
>>> D = {}
>>> D.update({L[0]: L[4::2]})
>>> D
{'09807754': ['aristocrat', 'blue_blood', 'patrician']}
For the extra line in your comment, some extra logic is needed
>>> L = "09827177 18 n 03 aristocrat 0 blue_blood 0 patrician 0 013 # 09646208 n 0000".split()
>>> D.update({L[0]: L[4:4 + 2 * int(L[3]):2]})
>>> D
{'09807754': ['aristocrat', 'blue_blood', 'patrician'], '09827177': ['aristocrat', 'blue_blood', 'patrician']}
res = {}
with open('nouns.txt','r') as f:
    for line in f:
        splited = line.split()
        res[splited[0]] = [w for w in splited[4:] if not w.isdigit()]
Output:
{'09807754': ['aristocrat', 'blue_blood', 'patrician']}
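Note that filtering with isdigit() only works while nothing but words and numbers follow the fourth field. On the longer line shown earlier it would also keep tokens such as '#' and 'n'. Using the count field is more robust. The sketch below wraps that idea in a function (the name parse_synset_line is my own). As far as I know, WordNet's data file format stores this count as a two-digit hexadecimal number, so the sketch parses it in base 16. Treat that as an assumption and check it against your files.

```python
def parse_synset_line(line):
    """Return (offset, words) for one WordNet data line."""
    fields = line.split()
    offset = fields[0]
    # assumption: the word count is hexadecimal ("03" -> 3, "0a" -> 10)
    word_count = int(fields[3], 16)
    # words sit at positions 4, 6, 8, ... with one id field after each word
    words = fields[4:4 + 2 * word_count:2]
    return offset, words


line = "09827177 18 n 03 aristocrat 0 blue_blood 0 patrician 0 013 # 09646208 n 0000"
offset, words = parse_synset_line(line)
synsets = {offset: words}
print(synsets)
```

Calling parse_synset_line() on every line of nouns.txt and storing the result in one dictionary gives the key-to-list mapping from the question.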