I'm new to Python and programming, and I need some help with a Python script. There are two files, each containing email addresses (more than 5000 lines). The input file contains email addresses that I want to search for in the data file (which also contains email addresses). Then I want to print the output to a file or display it on the console. I searched for scripts and was able to modify one, but I'm not getting the desired results. Can you please help me?
dfile1 (50K lines)
yyy#aaa.com
xxx#aaa.com
zzz#aaa.com
ifile1 (10K lines)
ccc#aaa.com
vvv#aaa.com
xxx#aaa.com
zzz#aaa.com
Output file
xxx#aaa.com
zzz#aaa.com
datafile = 'C:\\Python27\\scripts\\dfile1.txt'
inputfile = 'C:\\Python27\\scripts\\ifile1.txt'
with open(inputfile, 'r') as f:
    names = f.readlines()
outputlist = []
with open(datafile, 'r') as fd:
    for line in fd:
        name = fd.readline()
        if name[1:-1] in names:
            outputlist.append(line)
        else:
            print "Nothing found"
print outputlist
New code:
with open(inputfile, 'r') as f:
    names = f.readlines()
outputlist = []
with open(datafile, 'r') as f:
    for line in f:
        name = f.readlines()
        if name in names:
            outputlist.append(line)
        else:
            print "Nothing found"
print outputlist
Maybe I'm missing something, but why not use a pair of sets?
#!/usr/local/cpython-3.3/bin/python
data_filename = 'dfile1.txt'
input_filename = 'ifile1.txt'
with open(input_filename, 'r') as input_file:
    input_addresses = set(email_address.rstrip() for email_address in input_file.readlines())
with open(data_filename, 'r') as data_file:
    data_addresses = set(email_address.rstrip() for email_address in data_file.readlines())
print(input_addresses.intersection(data_addresses))
mitan8 explains the problem with your code, but this is what I would do instead:
with open(inputfile, "r") as f:
    names = set(i.strip() for i in f)
with open(datafile, "r") as f:
    for name in f:
        if name.strip() in names:
            print(name.strip())
This avoids reading the larger datafile into memory.
If you want to write to an output file, you could do this for the second with statement:
with open(datafile, "r") as i, open(outputfile, "w") as o:
    for name in i:
        if name.strip() in names:
            o.write(name)
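Putting those pieces together, a minimal sketch might look like the following. The function name `find_matches` and the paths in the usage line are hypothetical; it assumes one address per line and that surrounding whitespace is not significant.

```python
def find_matches(input_path, data_path, output_path):
    """Write every address in data_path that also appears in input_path.

    Only the input file is held in memory (as a set); the data file is
    streamed line by line. Returns the number of matches written.
    """
    with open(input_path) as f:
        # Strip newlines/whitespace and skip blank lines.
        wanted = {line.strip() for line in f if line.strip()}

    count = 0
    with open(data_path) as data, open(output_path, "w") as out:
        for line in data:
            address = line.strip()
            if address in wanted:
                out.write(address + "\n")
                count += 1
    return count
```

Usage: `find_matches("ifile1.txt", "dfile1.txt", "output.txt")`. Set membership tests are fast on average, so this scales much better than searching a list for every line.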
Here's what I would do:
names = []
with open(inputfile) as f:
    for line in f:
        names.append(line.rstrip("\n"))
myEmails = set(names)
with open(datafile) as fd, open("emails.txt", "w") as output:
    for line in fd:
        c = line.rstrip("\n")
        if c in myEmails:
            print(c)  # for console
            output.write(c + "\n")  # for writing to file
I think your issue stems from the following:
name = fd.readline()
if name[1:-1] in names:
name[1:-1] slices each email address so that you skip the first and last characters. Skipping the last character (a newline '\n') might seem reasonable, but when you load the names from the "ifile"
with open(inputfile, 'r') as f:
    names = f.readlines()
you keep their newlines. So don't slice the names you read from the "dfile" at all, i.e.:
if name in names:
I think you can also remove name = fd.readline(), since the for loop already gives you the line. Calling readline() inside the loop reads a second line on every iteration, so half of the lines are never tested and the line you append is not the line you tested. Also, I think name[1:-1] should just be name, since you don't want to strip the first and last characters when searching. Finally, the with statement automatically closes the files it opens.
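To see why the extra readline() call loses data, here is a small self-contained demonstration. It uses `io.StringIO` as a stand-in for the real file (an assumption made only so it runs without files on disk), and the helper name `pair_lines` is hypothetical.

```python
import io


def pair_lines(text):
    """Return (loop_line, readline_line) pairs, mimicking the question's loop."""
    fd = io.StringIO(text)
    pairs = []
    for line in fd:
        # The for loop has already consumed one line; readline() consumes the next.
        name = fd.readline()
        pairs.append((line.rstrip("\n"), name.rstrip("\n")))
    return pairs


print(pair_lines("a\nb\nc\nd\n"))  # [('a', 'b'), ('c', 'd')]
```

With four lines the loop body runs only twice, and line and name never refer to the same address.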
PS: Here's how I'd do it:
with open("dfile1") as dfile, open("ifile") as ifile:
    lines = "\n".join(set(dfile.read().splitlines()) & set(ifile.read().splitlines()))
print(lines)
with open("ofile", "w") as ofile:
    ofile.write(lines)
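In the above solution, I'm taking the intersection (the elements present in both sets) of the lines of the two files to find the common lines. Note that sets are unordered, so the output will not necessarily follow the order of either file.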
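The distinction matters: `&` (or `set.intersection`) keeps only the elements present in both sets, while `|` (or `set.union`) keeps everything from either. A tiny example using the sample addresses from the question:

```python
data = {"yyy#aaa.com", "xxx#aaa.com", "zzz#aaa.com"}
wanted = {"ccc#aaa.com", "vvv#aaa.com", "xxx#aaa.com", "zzz#aaa.com"}

common = data & wanted      # intersection: in both sets
everything = data | wanted  # union: in either set

print(sorted(common))   # ['xxx#aaa.com', 'zzz#aaa.com']
print(len(everything))  # 5
```

Sorting is only there to make the printed order predictable, since sets have no order of their own.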
Related
I'm writing a program that asks for an input file, then asks for a new file name and copies the input file's text into the new file. Then I need to replace a few words in it and print how many of those words were replaced. This is my code so far, but since the with blocks close the newly created file, I have no idea how to open it again for reading, writing and counting. This is my awful code so far:
filename=input("Enter the text file name: ")
inputFile=open(filename, "r")
b=input("Enter the new text file name: ")
uusFail=open(b+".txt", "w+")
f=uusFail
with inputFile as input:
    with uusFail as output:
        for line in input:
            output.write(line)
lines = []
asendus = {'hello':'tere', 'Hello':'Tere'}
with uusFail as infile:
    for line in infile:
        for src, target in asendus.items():
            line = line.replace(src, target)
        lines.append(line)
with uusFail as outfile:
    for line in lines:
        outfile.write(line)
There are a lot of unnecessary loops in your code. When you read the file, you can treat its content as a whole, count the number of occurrences and replace them. Here is a modified version of your code:
infile = input('Enter file name: ')
outfile = input('Enter output file name: ')
with open(infile) as f:
    content = f.read()
asendus = {'hello':'tere', 'Hello':'Tere'}
my_count = 0
for src, target in asendus.items():
    my_count += content.count(src)  # count occurrences before replacing them
    content = content.replace(src, target)
with open(f'{outfile}.txt', 'w') as f:
    f.write(content)
print(my_count)
You need to reopen the file in the second block of code:
with open(b+".txt", "r") as infile:
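Putting that together, here is a minimal sketch of the whole copy, replace and count flow, where the new file is written in one with block and then reopened for reading in another. The function name `copy_and_replace` is hypothetical. Note that `str.count` and `str.replace` work on substrings, so a word like hello inside othello would also be counted and replaced.

```python
def copy_and_replace(src_path, dst_path, replacements):
    """Copy src_path to dst_path, then apply replacements to the copy.

    Returns the total number of replacements made.
    """
    # Step 1: copy the original text into the new file.
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.write(src.read())

    # Step 2: that file object is closed now, so reopen the copy for reading.
    with open(dst_path) as f:
        content = f.read()

    count = 0
    for old, new in replacements.items():
        count += content.count(old)
        content = content.replace(old, new)

    # Step 3: reopen once more in "w" mode to overwrite the copy.
    with open(dst_path, "w") as f:
        f.write(content)
    return count
```

Usage: `print(copy_and_replace("input.txt", "copy.txt", {'hello': 'tere', 'Hello': 'Tere'}))`.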
I've seen really complex answers on this website on how to edit a specific line in a file, but I was wondering if there is a simple way to do it.
I want to search for a name in a file, and on the line where I find that name, I want to add an integer to the end of the line (it is a score for a quiz). Or could you tell me how I can replace the entire line with new data?
I have tried a lot of code, but either no change is made or all of the data in the file gets deleted.
I tried this:
with open ('File.py', 'r') as class_file:
    for number, line in enumerate(class_file):
        if name in line:
            s=open('File.py', 'r').readlines()
            s[number]=str(data)
            class_file=open('File.py', 'w')
            class_file.writelines(new_score)
            class_file.close()
As well as this function:
def replace (file, line_number, add_score):
    s=open(file, 'w')
    new_data=line[line_number].replace(line, add_score)
    s.write(str(new_data))
    s.close()
As well as this:
def replace_score(file_name, line_num, text):
    new = open(file_name, 'r').readlines()
    new[line_num] = text
    adding_score= open(file_name, 'w')
    adding_score.writelines(new)
    adding_score.close()
But I still can't get it to work.
The last code works if I'm trying to replace the first line, but not the others.
You need to get the content of the file. Close the file. Modify the content and rewrite the file with the modified content. Try the following:
def replace_score(file_name, line_num, text):
    f = open(file_name, 'r')
    contents = f.readlines()
    f.close()
    contents[line_num] = text+"\n"
    f = open(file_name, "w")
    contents = "".join(contents)
    f.write(contents)
    f.close()

replace_score("file_path", 10, "replacing_text")
Tim Osadchiy's code above does work, but remember that line_num is a zero-based list index, so it is always one less than the actual line number. If you want line 9, pass 8, not 9. Also, don't forget to include the file's extension (for example .txt) in the file path.
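The question also asks how to append a score to the end of the line that contains a given name. A minimal sketch using the same read, modify, rewrite pattern, assuming one entry per line and that a space followed by the score is the desired format (both assumptions; `add_score` is a hypothetical name). Because it uses `name in line`, a name like bob would also match a line containing bobby.

```python
def add_score(file_name, name, score):
    """Append score to the first line containing name; return True if found."""
    with open(file_name) as f:
        lines = f.readlines()

    for index, line in enumerate(lines):
        if name in line:
            # Drop the old newline, append the score, then restore the newline.
            lines[index] = line.rstrip("\n") + " " + str(score) + "\n"
            break
    else:
        return False  # name not found; leave the file untouched

    with open(file_name, "w") as f:
        f.writelines(lines)
    return True
```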
I have a text file that looks like this:
people 0.508931508057 -0.280345656093 -0.0318199105583 -0.189979892892 0.00748802665945 -0.0570929853912 0.0688883067716 0.187604694632 0.114414087961 0.150298183734
well 0.634085165013 -0.130742033765 0.0833007355449 -0.304469830925 0.133714906135 -0.0221626440854 0.062845160898 0.0607120405012 0.0384326647526 -0.0102762686058
it
0.451455675985 -0.0309283486444 -0.233415252863 -0.0273732833795 -0.294310277236 0.324236481567 -0.084486587459 0.340305398253 -0.56250445207 0.00640281538272
but 0.776732251824 0.0216479978956 0.326422159918 0.0654654707123 0.235569019918 0.0792330670559 0.22189299375 0.194232853917 0.102964793215 0.00926554861178
could 0.505766726467 -0.304640132821 0.015043924871 -0.42831149929 0.13475950648 0.0275223466164 0.154347034425 0.443048319277 0.229038343902 -0.209763506494
think 0.734314690035 -0.15352368041 0.383964369466 -0.283262375383 0.000534210123265 0.0452656078196 0.0174349360274 -0.0210130687293 0.0247592836651 0.0930452272721
movie
0.444291696176 -0.110937149049 -0.259525377532 0.00986849685667 -0.311934727067 0.319610517473 -0.0644468651461 0.372562407 -0.572686043624 0.0262434708424
made 0.546164908581 -0.148512160184 0.301391306124 -0.553970562504 -0.0423941756245 -0.0789194920559 -0.0336542251386 0.00929984630184 -0.030340761377 -0.112650323493
way 0.751616772605 -0.345057880564 0.10091886809 -0.147689086912 -0.0721519520719 -0.246317313253 -0.00606560306655 0.0689594126233 0.0468387063595 -0.00900506150062
I want to keep in the file only the lines that contain both a word and a set of values on the same line.
How can I delete the rest of them?
The expected output is:
people 0.508931508057 -0.280345656093 -0.0318199105583 -0.189979892892 0.00748802665945 -0.0570929853912 0.0688883067716 0.187604694632 0.114414087961 0.150298183734
well 0.634085165013 -0.130742033765 0.0833007355449 -0.304469830925 0.133714906135 -0.0221626440854 0.062845160898 0.0607120405012 0.0384326647526 -0.0102762686058
but 0.776732251824 0.0216479978956 0.326422159918 0.0654654707123 0.235569019918 0.0792330670559 0.22189299375 0.194232853917 0.102964793215 0.00926554861178
could 0.505766726467 -0.304640132821 0.015043924871 -0.42831149929 0.13475950648 0.0275223466164 0.154347034425 0.443048319277 0.229038343902 -0.209763506494
think 0.734314690035 -0.15352368041 0.383964369466 -0.283262375383 0.000534210123265 0.0452656078196 0.0174349360274 -0.0210130687293 0.0247592836651 0.0930452272721
made 0.546164908581 -0.148512160184 0.301391306124 -0.553970562504 -0.0423941756245 -0.0789194920559 -0.0336542251386 0.00929984630184 -0.030340761377 -0.112650323493
way 0.751616772605 -0.345057880564 0.10091886809 -0.147689086912 -0.0721519520719 -0.246317313253 -0.00606560306655 0.0689594126233 0.0468387063595 -0.00900506150062
A few ways to solve the problem.
Read the file as CSV. If a row has 11 non-empty fields (a word plus its ten values) and the first field is a word, write it out:
import csv
with open('original.txt', 'r') as f, open('new.txt', 'w') as o:
    reader = csv.reader(f, delimiter=' ')
    writer = csv.writer(o, delimiter=' ', lineterminator='\n')
    for row in reader:
        row = [field for field in row if field]  # ignore empty fields from extra spaces
        if len(row) == 11 and row[0].isalpha():
            writer.writerow(row)
Read the file, write out those lines where the first item is a word and there are more than 2 columns:
with open('original.txt', 'r') as f, open('new.txt', 'w') as o:
    for line in f:
        if line.lstrip().split(' ')[0].isalpha() and len(line.split(' ')) > 2:
            o.write(line)
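A slightly stricter variant of the second approach checks that the first token is a word and that every remaining token parses as a number. This is a sketch that assumes whitespace-separated tokens; `is_word_with_values` and `filter_file` are hypothetical names.

```python
def is_word_with_values(line):
    """True if line is a word followed by one or more numeric values."""
    tokens = line.split()
    if len(tokens) < 2 or not tokens[0].isalpha():
        return False
    try:
        for token in tokens[1:]:
            float(token)  # raises ValueError for non-numeric tokens
    except ValueError:
        return False
    return True


def filter_file(src_path, dst_path):
    """Copy only the 'word followed by values' lines from src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if is_word_with_values(line):
                dst.write(line)
```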
You can implement this very nicely using the fileinput module:
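import fileinput
import sys
for line in fileinput.input(inplace=True):
    parts = line.split()
    # keep lines that start with a word and have at least one value after it
    if len(parts) > 1 and parts[0].isalpha():
        sys.stdout.write(line)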
Note that this modifies every file named on the command line in place, so their original contents are overwritten.
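If you would like to keep a copy of the original, `fileinput.input` also accepts a `backup` extension. A sketch, assuming Python 3 and that the file name is passed in explicitly rather than read from the command line; the function name `keep_word_lines_in_place` is hypothetical, and the "starts with a word and has at least one more field" test is an assumption about what counts as a valid line.

```python
import fileinput
import sys


def keep_word_lines_in_place(path):
    """Rewrite path in place, keeping lines that start with a word and have values.

    The original file is preserved as path + ".bak".
    """
    with fileinput.input(files=[path], inplace=True, backup=".bak") as lines:
        for line in lines:
            parts = line.split()
            if len(parts) > 1 and parts[0].isalpha():
                # While inplace=True, standard output is redirected into the file.
                sys.stdout.write(line)
```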
You can open the file and read it line by line. Each line that contains a letter can be copied to another file.
letters = "abcdefghijklmnopqrstuvwxyz"
letters += letters.upper()
f_in = open("myfile.txt", 'r')
f_out = open("newfile.txt", 'w')
for line in f_in:
    for letter in letters:
        if letter in line:
            f_out.write(line)
            break
f_in.close()
f_out.close()
Be careful: the example you have given contains only one long line, which has to be split into multiple lines.
Try this; I hope it gives you the desired result.
import re
with open("i1.txt", "r") as f:
    lines = f.readlines()
for items in lines:
    # keep "word values..." lines, or lines that start with numbers followed by a word
    if re.match(r"[a-zA-Z]+[\s\d]+", items.strip()) or re.match(r"[\d.\s]+[a-zA-Z]+[\s\d.]*", items.strip()):
        print(items.strip())
The following is the output:
people 0.508931508057 -0.280345656093 -0.0318199105583 -0.189979892892 0.00748802665945 -0.0570929853912 0.0688883067716 0.187604694632 0.114414087961 0.150298183734
well 0.634085165013 -0.130742033765 0.0833007355449 -0.304469830925 0.133714906135 -0.0221626440854 0.062845160898 0.0607120405012 0.0384326647526 -0.0102762686058
but 0.776732251824 0.0216479978956 0.326422159918 0.0654654707123 0.235569019918 0.0792330670559 0.22189299375 0.194232853917 0.102964793215 0.00926554861178
could 0.505766726467 -0.304640132821 0.015043924871 -0.42831149929 0.13475950648 0.0275223466164 0.154347034425 0.443048319277 0.229038343902 -0.209763506494
think 0.734314690035 -0.15352368041 0.383964369466 -0.283262375383 0.000534210123265 0.0452656078196 0.0174349360274 -0.0210130687293 0.0247592836651 0.0930452272721
made 0.546164908581 -0.148512160184 0.301391306124 -0.553970562504 -0.0423941756245 -0.0789194920559 -0.0336542251386 0.00929984630184 -0.030340761377 -0.112650323493
way 0.751616772605 -0.345057880564 0.10091886809 -0.147689086912 -0.0721519520719 -0.246317313253 -0.00606560306655 0.0689594126233 0.0468387063595 -0.00900506150062
0.34567 yes 0.9876 -0.00606560306655 0.0689594126233 0.0468387063595 -0.00900506150062
Input.txt File
12626232 : Bookmarks
1321121:
126262:
Here, 126262: can be any text or digits. Basically, I want to find lines whose last character is : (a colon) and delete those entire lines.
Output.txt File
12626232 : Bookmarks
My code:
def function_example():
    fn = 'input.txt'
    f = open(fn)
    output = []
    for line in f:
        if not ":" in line:
            output.append(line)
    f.close()
    f = open(fn, 'w')
    f.writelines(output)
    f.close()
Problem: when I match on :, it removes every line that contains a colon anywhere, but I only want to remove a line if it ends with :.
Any suggestions will be appreciated. Thanks.
I saw the following, but I'm not sure how to use it here:
a = "abc here we go:"
print a[:-1]
I believe with this you should be able to achieve what you want.
with open(fname) as f:
    lines = f.readlines()
for line in lines:
    if not line.strip().endswith(':'):
        print(line.rstrip('\n'))
Here fname is the variable pointing to the file location.
You were almost there with your function. You were checking if : appears anywhere in the line, when you need to check if the line ends with it:
def function_example():
    fn = 'input.txt'
    f = open(fn)
    output = []
    for line in f:
        if not line.strip().endswith(":"): # This is what you were missing
            output.append(line)
    f.close()
    f = open(fn, 'w')
    f.writelines(output)
    f.close()
You could also have done if not line.strip()[-1:] == ':':, but endswith() is better suited to your use case.
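A quick comparison of the checks on a few sample lines, including an empty one. `[-1:]` is the last character, whereas `[:-1]` is everything except the last character, and slicing avoids the `IndexError` that `[-1]` would raise on an empty string.

```python
samples = ["12626232 : Bookmarks\n", "1321121:\n", "126262: \n", "\n"]

by_endswith = [s.strip().endswith(":") for s in samples]
by_slice = [s.strip()[-1:] == ":" for s in samples]  # last character, safe on ""

print(by_endswith)  # [False, True, True, False]
print(by_slice)     # [False, True, True, False]
print("abc:"[:-1])  # abc  (drops the last character; it does not test it)
```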
Here is a compact way to do what you are doing above:
def function_example(infile, outfile, limiter=':'):
    ''' Filters out all lines in :infile: that end in :limiter:
    and writes the remaining lines to :outfile: '''
    with open(infile) as inp, open(outfile, 'w') as out:
        for line in inp:
            if not line.strip().endswith(limiter):
                out.write(line)
The with statement creates a context and automatically closes files when the block ends.
To check whether the last character is :, do the following:
if line.strip().endswith(':'):
    ...  # do something
You can use a regular expression:
import re
# Matches a ':' at the very end of the (stripped) line
regex = re.compile(r':$')
file_name = "path_to_file"
with open(file_name) as _file:
    lines = _file.readlines()
# Keep only the lines that do NOT end with ':'
new_lines = [line for line in lines if not regex.search(line.strip())]
with open(file_name, "w") as _file:
    _file.writelines(new_lines)
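Anchoring the pattern with `$` is what makes a regular expression test the end of the line. A sketch that filters a list of lines without stripping them first; the pattern `:\s*$` (a colon followed only by optional whitespace) is an assumption about what counts as "ending with a colon", and `drop_colon_lines` is a hypothetical name.

```python
import re

ENDS_WITH_COLON = re.compile(r":\s*$")


def drop_colon_lines(lines):
    """Return the lines that do not end with a colon (trailing whitespace ignored)."""
    return [line for line in lines if not ENDS_WITH_COLON.search(line)]


print(drop_colon_lines(["12626232 : Bookmarks\n", "1321121:\n", "126262:\n"]))
# ['12626232 : Bookmarks\n']
```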
I've tried to put together a solution from similar questions but have failed miserably. I just don't know enough about Python yet :(
I have an inputlist containing elements in a particular order, e.g. ["GRE", "KIN", "ERD", "KIN"]
I have a datafile containing the elements plus other data, e.g.:
"ERD","Data","Data"...
"KIN","Data","Data"...
"FAC","Data","Data"...
"GRE","Data","Data"...
I need to create an outputlist that contains the lines from the datafile in the order they appear in the inputlist.
The code below returns the outputlist in the order the lines appear in the datafile, which is not the intended behavior... :-\
with open(inputfile, 'r') as f:
    names = [line.strip() for line in f]
outputlist = []
with open(datafile, 'r') as f:
    for line in f:
        name = line.split(',')[0]
        if name[1:-1] in names:
            outputlist.append(line)
output = open(outputfile, 'w')
output.writelines(outputlist)
How can I have it return the list in the proper order? Thanks in advance for your help :-)
Edit
Thanks to Oscar, this is the solution I implemented:
datafile = 'C:\\testing\\bldglist.txt'
inputfile = 'C:\\testing\\inputlist.txt'
outputfile = "C:\\testing\\output.txt"
with open(inputfile, 'r') as f:
    inputlist = [line.strip() for line in f]
def outputList(inputlist, datafile, outputfile):
    d = {}
    with open(datafile, 'r') as f:
        for line in f:
            line = line.strip()
            key = line.split(',')[0]
            d[key] = line
    with open(outputfile, 'w') as f:
        f.write('"Abbrev","Xcoord","Ycoord"\n')
        for key in inputlist:
            f.write(d[key] + '\n')
outputList(inputlist, datafile, outputfile)
This is the easy solution. It reads the entire data file into memory as a dictionary that maps each line's first field to the line. It's then easy to write the lines in the right order.
If the file is very large (gigabytes) or you don't have a lot of memory, there are other ways. But they're not nearly as nice.
I haven't tested this.
import csv
data = {}
with open(datafile) as f:
    for line in csv.reader(f):
        data[line[0]] = line
with open(outputfile, "w") as f:
    f = csv.writer(f)
    for entry in inputlist:
        f.writerow(data[entry])
Assuming a data file with this format:
"ERD","Data","Data"...
"KIN","Data","Data"...
"FAC","Data","Data"...
"GRE","Data","Data"...
Try this solution:
def outputList(inputlist, datafile, outputfile):
    d = {}
    with open(datafile, 'r') as f:
        for line in f:
            line = line.lstrip()
            key = line.split(',')[0]
            d[key] = line
    with open(outputfile, 'w') as f:
        for key in inputlist:
            f.write(d[key])
Use it like this:
outputList(['"GRE"', '"KIN"', '"ERD"', '"KIN"'],
           '/path/to/datafile',
           '/path/to/outputfile')
It will write the output file with the expected order.
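The dictionary approaches above raise a `KeyError` if a name in the input list has no line in the data file. A sketch that uses the `csv` module so the quotes are handled for you, and that skips missing keys instead; the function name `ordered_rows` is hypothetical, and, as in the answers above, when a key appears more than once in the data the last occurrence wins.

```python
import csv
import io


def ordered_rows(inputlist, data_lines):
    """Return data rows (as lists of fields) in the order given by inputlist."""
    by_key = {}
    for row in csv.reader(data_lines):
        if row:                   # skip blank lines
            by_key[row[0]] = row  # csv.reader removes the surrounding quotes
    # Keys with no matching data line are silently skipped.
    return [by_key[key] for key in inputlist if key in by_key]


data = io.StringIO('"ERD","a","b"\n"KIN","c","d"\n"FAC","e","f"\n"GRE","g","h"\n')
for row in ordered_rows(["GRE", "KIN", "ERD", "KIN", "XYZ"], data):
    print(row)
```

Because csv.reader strips the quotes, the input list here uses plain names such as GRE rather than '"GRE"'.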
1) Create a list with the elements you wish to map to. In this case, ["GRE", "KIN", "ERD", "FAC"]
2) Read the file and map (using a dictionary of lists) the first elements.
3) Output to a file.
import csv
out_index = ["GRE", "KIN", "ERD", "FAC"]
d = {}
with open('/Users/andrew/bin/SO/abcd.txt', 'r') as fr:
    for e in csv.reader(fr):
        if e[0] not in d:
            d[e[0]] = []
        for ea in e[1:]:
            d[e[0]].append(ea)
for i in out_index:
    print(i + " :")
    for e in d[i]:
        print("  " + e)
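Step 3 in the list says to output to a file, while the code above prints to the console. A sketch of the same grouping that uses `collections.defaultdict(list)` in place of the explicit `if e[0] not in d` check and writes the result to a file; the function name `group_by_first_field` and its parameters are hypothetical.

```python
import csv
from collections import defaultdict


def group_by_first_field(data_path, out_path, order):
    """Group each row's remaining fields under its first field, written in `order`."""
    groups = defaultdict(list)  # missing keys start as an empty list
    with open(data_path, newline="") as fr:
        for row in csv.reader(fr):
            if row:
                groups[row[0]].extend(row[1:])

    with open(out_path, "w") as out:
        for key in order:
            out.write(key + " :\n")
            for value in groups[key]:
                out.write("  " + value + "\n")
```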
Given this example data:
"ERD","Data-a1","Data-a2"
"KIN","Data-b1","Data-b2"
"FAC","Data-c1","Data-c2"
"GRE","Data-d1","Data-d2"
"ERD","Data-a3","Data-a4"
"GRE","Data-d3","Data-d4"
Output:
GRE :
  Data-d1
  Data-d2
  Data-d3
  Data-d4
KIN :
  Data-b1
  Data-b2
ERD :
  Data-a1
  Data-a2
  Data-a3
  Data-a4
FAC :
  Data-c1
  Data-c2
Done!