Python - Writing to a new file from another file

I wish to have the first field (Username) from File1 and the second field (Password) from File2 written to a third file, which is created inside the function, but I am unable to do it. :(
The format of the files will always be the same, which is:
File 1:
Username:DOB:Firstname:Lastname:::
File2:
Lastname:Password
My current code:
def merge(f1, f2, f3):
    with open(f3, "a") as outputFile:
        with open(f1) as usernameFile:
            for line in usernameFile:
                line = line[:-3]
                username = line.split(':')
                outputFile.write(username[0])
        with open(f2) as passwordFile:
            for line in passwordFile:
                password = line.split(':')
                outputFile.write(password[1])

merge('file1.txt', 'file2.txt', 'output.txt')
I want the Username from File1 and the Password from File2 to write to File3 with the layout:
Username:Password
Username:Password
Username:Password
Any help would be appreciated. :)

If the files are identically sorted (i.e. the users appear in the same order in both files), iterate over both files at the same time rather than one after the other as in your example.
from itertools import izip  # Python 2; on Python 3 use the built-in zip

with open(f3, "a") as outputFile:
    for line_from_f1, line_from_f2 in izip(open(f1), open(f2)):
        username = line_from_f1.split(':')[0]
        password = line_from_f2.split(':')[1].strip()
        outputFile.write("%s:%s\n" % (username, password))
If the files are not identically sorted, first create a dictionary with keys lastname and values username from file1. Then create a second dictionary with keys lastname and values password from file2. Then iterate over the keys of either dict and print both values.

This is the minimum change you would need to make to your code for it to work:
def merge(f1, f2, f3):
    with open(f3, "a") as outputFile:
        with open(f1) as usernameFile:
            for line in usernameFile:
                username = line.split(':')[0]
                lastname = line.split(':')[3]
                outputFile.write(username)
                with open(f2) as passwordFile:
                    for line in passwordFile:
                        lN, password = line.strip().split(':')
                        if lN == lastname:
                            outputFile.write(':' + password + '\n')

merge('file1.txt', 'file2.txt', 'output.txt')
However, this method isn't very good because it reads the second file once for every line of the first. I would go ahead and make a dictionary for the second file, with the last name as the key. Dictionaries are very helpful in these situations. The dictionary can be made a priori as follows:
def makeDict(f2):
    dOut = {}
    with open(f2) as f:
        for l in f:
            dOut[l.split(':')[0]] = l.strip().split(':')[1]
    return dOut

def merge(f1, f2, f3):
    pwd = makeDict(f2)
    print pwd
    with open(f3, "a") as outputFile:
        with open(f1) as usernameFile:
            for line in usernameFile:
                if line.strip() == '':
                    continue
                username = line.split(':')[0]
                lastname = line.split(':')[3]
                if lastname in pwd:
                    outputFile.write(username + ':' + pwd[lastname] + '\n')

merge('f1.txt', 'f2.txt', 'f3.txt')
I just ran the following program using the files:
f1.txt
Username0:DOB:Firstname:Lastname0:::
Username1:DOB:Firstname:Lastname1:::
Username2:DOB:Firstname:Lastname2:::
Username3:DOB:Firstname:Lastname3:::
f2.txt
Lastname0:Password0
Lastname1:Password1
Lastname2:Password2
Lastname3:Password3
and got the output:
Username0:Password0
Username1:Password1
Username2:Password2
Username3:Password3
I did add the last line merge(...) and another line to skip blank lines in the input text, but otherwise everything should be fine. There won't be any output if the merge(...) function isn't called.

Abstract the data extraction from the file i/o, then you can re-use merge() with different extraction functions.
import itertools as it
from operator import itemgetter
from contextlib import contextmanager

def extract(foo):
    """Extract username and password, compose and return output string

    foo is a tuple or list
    returns str

    >>> len(foo) == 2
    True
    """
    username = itemgetter(0)
    password = itemgetter(1)
    formatstring = '{}:{}\n'
    item1, item2 = foo
    item1 = item1.strip().split(':')
    item2 = item2.strip().split(':')
    return formatstring.format(username(item1), password(item2))

@contextmanager
def files_iterator(files):
    """Yields an iterator that produces lines synchronously from each file

    Intended to be used with the contextlib.contextmanager decorator.
    yields an itertools.izip object
    files is a list or tuple of file paths - str
    """
    files = map(open, files)
    try:
        yield it.izip(*files)
    finally:
        for file in files:
            file.close()

def merge(in_files, out_file, extract):
    """Create a new file with data extracted from multiple files.

    Data is extracted from the same/equivalent line of each file:
    i.e. File1Line1, File2Line1, File3Line1
         File1Line2, File2Line2, File3Line2
    in_files --> list or tuple of str, file paths
    out_file --> str, filepath
    extract --> function that returns list or tuple of extracted data
    returns None
    """
    with files_iterator(in_files) as files, open(out_file, 'w') as out:
        out.writelines(map(extract, files))
        ## out.writelines(extract(lines) for lines in files)

merge(['file1.txt', 'file2.txt'], 'file3.txt', extract)
files_iterator is a with-statement context manager that iterates over multiple files in lockstep and ensures the files will be closed. A good place to start reading is Understanding Python's "with" statement.
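On Python 3, itertools.izip is gone (the built-in zip is already lazy), and contextlib.ExitStack makes the cleanup simpler. A rough Python 3 sketch of the same idea, under the same file-layout assumptions as above:

```python
from contextlib import ExitStack

def merge3(in_files, out_file, extract):
    # ExitStack closes every opened file even if an error occurs
    # mid-iteration; zip() pairs up the files' lines lazily.
    with ExitStack() as stack:
        files = [stack.enter_context(open(p)) for p in in_files]
        with open(out_file, 'w') as out:
            out.writelines(extract(lines) for lines in zip(*files))
```

The extract function is the same one defined above: it receives a tuple of corresponding lines, one per input file.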

I would recommend building two dictionaries to represent the data in each file, then write File3 based on that structure:
d1 = {}
with open("File1.txt", 'r') as f:
    for line in f:
        d1[line.split(':')[3]] = line.split(':')[0]

d2 = {}
with open("File2.txt", 'r') as f:
    for line in f:
        d2[line.split(':')[0]] = line.strip().split(':')[1]
This will give you two dictionaries that look like this:
d1 = {Lastname: Username}
d2 = {Lastname: Password}
To then write this to File 3, simply run through the keys of either dictionary:
with open("File3.txt", 'w') as f:
    for key in d1:
        f.write("{}:{}\n".format(d1[key], d2[key]))
Some things to note:
If the files don't have all the same values, you'll need to throw in some handling for that (let me know if this is the case and I can toss a few ideas your way).
This approach does not preserve any order the files were in.
The code assumes that all lines are of the same format. A more complicated file will need some code to handle "odd" lines.
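On the first point above: if the two files don't share exactly the same last names, dict key views support set operations, which makes the mismatches easy to flag. A small sketch with made-up data standing in for d1 (lastname to username) and d2 (lastname to password):

```python
# Hypothetical data shaped like the d1/d2 dictionaries built above.
d1 = {'Lastname0': 'Username0', 'Lastname1': 'Username1'}
d2 = {'Lastname0': 'Password0', 'Lastname2': 'Password2'}

# dict.keys() behaves like a set, so intersection/difference work directly:
common = d1.keys() & d2.keys()       # last names present in both files
no_password = d1.keys() - d2.keys()  # in File1 only
no_username = d2.keys() - d1.keys()  # in File2 only

lines = ["{}:{}".format(d1[k], d2[k]) for k in common]
```

Iterating over `common` instead of `d1` avoids the KeyError you would otherwise get on an unmatched last name.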

It's fine to avoid pandas if you have identically sorted rows in each file. But if it gets any more complicated than that, you should be using pandas for this. With pandas you can essentially do a join, so no matter how the rows are ordered in each file, this will work. It's also very concise.
import pandas as pd

df1 = pd.read_csv(f1, sep=':', header=None).iloc[:, [0, 3]]
df1.columns = ['username', 'lastname']
df2 = pd.read_csv(f2, sep=':', header=None)
df2.columns = ['lastname', 'password']
df3 = pd.merge(df1, df2)[['username', 'password']]
df3.to_csv(f3, header=False, index=False, sep=':')
Note that you will also have the option to do outer joins. This is useful, if for some reason, there are usernames without passwords or vice versa in your files.
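To illustrate the outer-join behaviour mentioned above, here is a toy sketch with invented frames: an outer join keeps rows that have no match on the other side, filling the missing fields with NaN.

```python
import pandas as pd

# Invented toy frames shaped like df1/df2 above.
df1 = pd.DataFrame({'lastname': ['a', 'b'], 'username': ['u1', 'u2']})
df2 = pd.DataFrame({'lastname': ['a', 'c'], 'password': ['p1', 'p3']})

# how='outer' keeps 'b' (no password) and 'c' (no username) as rows
# with NaN in the unmatched column.
df3 = pd.merge(df1, df2, on='lastname', how='outer')
```

With the default how='inner' only the 'a' row would survive; how='left' and how='right' keep one side or the other.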

This is pretty close. Be sure there is no blank line at the end of the input files, or add code to skip blank lines when you read.
#!/usr/bin/env python
"""
File 1:
Username:DOB:Firstname:Lastname:::
File2:
Lastname:Password
File3:
Username:Password
"""

def merge(f1, f2, f3):
    username_lastname = {}
    with open(f3, "a") as outputFile:
        with open(f1) as usernameFile:
            for line in usernameFile:
                user = line.strip().split(':')
                print user
                username_lastname[user[3]] = user[0]  # dict with Lastname as key, Username as value
            print username_lastname
        with open(f2) as passwordFile:
            for line in passwordFile:
                lastname_password = line.strip().split(':')
                print lastname_password
                password = lastname_password[1]
                username = username_lastname[lastname_password[0]]
                print username, password
                out_line = "%s:%s\n" % (username, password)
                outputFile.write(out_line)

merge('f1.txt', 'f2.txt', 'output.txt')
f1:
Username1:DOB:Firstname:Lastname1:::
Username2:DOB:Firstname:Lastname2:::
Username3:DOB:Firstname:Lastname3:::
f2:
Lastname1:Password1
Lastname2:Password2
Lastname3:Password3
f3:
Username1:Password1
Username2:Password2
Username3:Password3

Related

How to write all the items of a separate lists to a single file

I have made two lists from a database, one list for emails and another for passwords. I am trying to write the lists to a file using the following code, but only the last items of the lists are getting written to the file.
from app import db
from app import Users

filtered_users = []
all_users = Users.query.all()
filtered_users = all_users.copy()
# print(filtered_users)
for user in filtered_users:
    filtered_emails = []
    filtered_passwords = []
    filtered_emails.append(user.email)
    filtered_passwords.append(user.password)
    # print(filtered_emails, filtered_passwords)

with open("users.txt", "w") as f:
    for email in filtered_emails:
        for password in filtered_passwords:
            #print(email, password)
            print(email, password, file=f)
Your loop resets filtered_emails and filtered_passwords on each iteration; you should at least move those assignments out of the loop. And at write time you use nested loops (each email against each password), while each email should be paired with its own password.
You should do:
filtered_emails = []
filtered_passwords = []
all_users = Users.query.all()
filtered_users = all_users.copy()
# print(filtered_users)
for user in filtered_users:
    filtered_emails.append(user.email)
    filtered_passwords.append(user.password)
    # print(filtered_emails, filtered_passwords)

with open("users.txt", "w") as f:
    for i, email in enumerate(filtered_emails):
        #print(email, password)
        print(email, filtered_passwords[i], file=f)
But it would be simpler to write to the file directly:
all_users = Users.query.all()
filtered_users = all_users.copy()
# print(filtered_users)
with open("users.txt", "w") as f:
    for user in filtered_users:
        print(user.email, user.password, file=f)
You can try 'a' instead of 'w', because 'w' deletes all existing content and writes from the first line.
filtered_emails = ['xx#xx.com', 'yyy#yy.com', 'zzz#zz.com']
filtered_passwords = ['1', '2', '345']

with open("users.txt", "w") as f:
    for item in zip(filtered_emails, filtered_passwords):
        f.write(item[0] + ' ')
        f.write(item[1] + '\n')
This assumes all of the entries are strings. zip returns tuples pairing each email with its password. You can use itertools.izip_longest if you want to deal with lists of unequal length.
Output:
xx#xx.com 1
yyy#yy.com 2
zzz#zz.com 345
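To illustrate the unequal-length case mentioned above (zip_longest on Python 3, izip_longest on Python 2), with a made-up shorter password list:

```python
from itertools import zip_longest  # izip_longest on Python 2

emails = ['xx#xx.com', 'yyy#yy.com', 'zzz#zz.com']
passwords = ['1', '2']  # one password short

# zip() would silently drop the third email; zip_longest pads the
# shorter list with fillvalue instead.
pairs = list(zip_longest(emails, passwords, fillvalue='<missing>'))
```

The fillvalue marker is illustrative; pick whatever sentinel suits your data.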

How to edit specific line for all text files in a folder by python?

Below is my code for editing a text file.
Since Python can't edit a line and save it at the same time,
I first save the text file's contents into a list, then write the list out.
For example, if there are two text files called sample1.txt and sample2.txt in the same folder:
Sample1.txt
A for apple.
Second line.
Third line.
Sample2.txt
First line.
An apple a day.
Third line.
Execute python
import glob
import os

# search all text files which are in the same folder as the python script
path = os.path.dirname(os.path.abspath(__file__))
txtlist = glob.glob(path + '\*.txt')

for file in txtlist:
    fp1 = open(file, 'r+')
    strings = []  # create a list to store the content
    for line in fp1:
        if 'apple' in line:
            strings.append('banana\n')  # change the content and store it in the list
        else:
            strings.append(line)  # store the contents that were not changed
    fp2 = open(file, 'w+')  # rewrite the original text files
    for line in strings:
        fp2.write(line)
    fp1.close()
    fp2.close()
Sample1.txt
banana
Second line.
Third line.
Sample2.txt
First line.
banana
Third line.
That's how I edit a specific line in a text file.
My question is: is there any other method that can do the same thing, like using other functions or another data type rather than a list?
Thank you everyone.
Simplify it to this:
with open(fname) as f:
    content = f.readlines()
content = ['banana\n' if 'apple' in line else line for line in content]
and then write the value of content back to the file.
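The write-back step can be sketched as the full round trip, wrapped in a function (the name replace_apples is illustrative):

```python
def replace_apples(fname):
    # Read, rewrite the matching lines (keeping the '\n' so the file
    # stays line-per-line), then write everything back.
    with open(fname) as f:
        content = f.readlines()
    content = ['banana\n' if 'apple' in line else line for line in content]
    with open(fname, 'w') as f:
        f.writelines(content)
```

writelines adds no newlines of its own, which is why the replacement string keeps its '\n'.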
Instead of putting all the lines in a list and writing them back line by line, you can read the whole file into memory, replace, and write it back to the same file.
def replace_word(filename):
    with open(filename, 'r') as file:
        data = file.read()
    data = data.replace('word1', 'word2')
    with open(filename, 'w') as file:
        file.write(data)
Then you can loop through all of your files and apply this function
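That loop can be sketched as follows (the function name and parameters are illustrative):

```python
import glob
import os

def replace_word_in_folder(folder, old, new):
    # Apply the read/replace/write cycle above to every .txt file
    # in the given folder.
    for path in glob.glob(os.path.join(folder, '*.txt')):
        with open(path, 'r') as f:
            data = f.read()
        with open(path, 'w') as f:
            f.write(data.replace(old, new))
```

Note that str.replace substitutes every occurrence in the file, not just whole lines, which may or may not be what you want.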
The built-in fileinput module makes this quite simple:
import fileinput
import glob

with fileinput.input(files=glob.glob('*.txt'), inplace=True) as files:
    for line in files:
        if 'apple' in line:
            print('banana')
        else:
            print(line, end='')
fileinput redirects print into the active file.
import glob
import os

def replace_line(file_path, replace_table: dict) -> None:
    list_lines = []
    need_rewrite = False
    with open(file_path, 'r') as f:
        for line in f:
            flag_rewrite = False
            for key, new_val in replace_table.items():
                if key in line:
                    list_lines.append(new_val + '\n')
                    flag_rewrite = True
                    need_rewrite = True
                    break  # only replace the first matching word
            if not flag_rewrite:
                list_lines.append(line)
    if not need_rewrite:
        return
    with open(file_path, 'w') as f:
        f.writelines(list_lines)

if __name__ == '__main__':
    work_dir = os.path.dirname(os.path.abspath(__file__))
    txt_list = glob.glob(work_dir + '/*.txt')
    replace_dict = dict(apple='banana', orange='grape')
    for txt_path in txt_list:
        replace_line(txt_path, replace_dict)

How to join/incorporate split lines, replacing file names with data from the file, in the same string

So, as most of us might think, this looks like a duplicate, which it is not. What I'm trying to achieve: there is a master string like the one below with a couple of files mentioned in it; we need to open those files and check whether any other files are included in them, and if so, copy their contents into the line where we found that particular file name.
Master String:
Welcome
How are you
file.txt
everything alright
signature.txt
Thanks
file.txt
ABCDEFGHtele.txt
tele.txt
IJKL
signature.txt
SAK
Output:
Welcome
How are you
ABCD
EFGH
IJKL
everything alright
SAK
Thanks
for msplit in [stext.split('\n')]:
    for num, items in enumerate(stext, 1):
        if items.strip().startswith("here is") and items.strip().endswith(".txt"):
            gmsf = open(os.path.join(os.getcwd() + "\txt", items[8:]), "r")
            gmsfstr = gmsf.read()
            newline = items.replace(items, gmsfstr)
How do I join these replaced items back into the same string format?
Also, any idea on how to re-run the same function until there are no more ".txt" references? Once the join is done, there might be other ".txt" names inside a ".txt" file.
Thanks for your help in advance.
A recursive approach that works with any level of file name nesting:
from os import linesep

def get_text_from_file(file_path):
    with open(file_path) as f:
        text = f.read()
    return SAK_replace(text)

def SAK_replace(s):
    lines = s.splitlines()
    for index, l in enumerate(lines):
        if l.endswith('.txt'):
            lines[index] = get_text_from_file(l)
    return linesep.join(lines)
You can try:
s = """Welcome
How are you
here is file.txt
everything alright
here is signature.txt
Thanks"""

data = s.split("\n")
match = ['.txt']
all_matches = [s for s in data if any(xs in s for xs in match)]
for index, item in enumerate(data):
    if item in all_matches:
        data[index] = "XYZ"
data = "\n".join(data)
print data
Output:
Welcome
How are you
XYZ
everything alright
XYZ
Thanks
Added new requirement:
def file_obj(filename):
    with open(filename, "r") as fo:
        s = fo.read()
    data = s.split("\n")
    match = ['.txt']
    all_matches = [s for s in data if any(xs in s for xs in match)]
    for index, item in enumerate(data):
        if item in all_matches:
            file_obj(item)
            data[index] = "XYZ"
    data = "\n".join(data)
    print data

file_obj("first_filename")
We can create a temporary file object, keep the replaced lines in that temporary file object, and once every line is processed, replace the original file's content with the new content. The temporary file is deleted automatically once execution leaves the 'with' statement.
import tempfile
import re

file_pattern = re.compile(ur'(((\w+)\.txt))')
original_content_file_name = 'sample.txt'
"""
sample.txt should have this content.
Welcome
How are you
here is file.txt
everything alright
here is signature.txt
Thanks
"""
replaced_file_str = None

def replace_file_content():
    """
    Replace the file content using a temporary file object.
    """
    def read_content(file_name):
        # the matched file name is read and returned for replacing.
        content = ""
        with open(file_name) as fileObj:
            content = fileObj.read()
        return content

    # read the file and keep the replaced text in a temporary file object
    # (the tempfile object will be deleted automatically).
    with open(original_content_file_name, 'r') as file_obj, tempfile.NamedTemporaryFile() as tmp_file:
        for line in file_obj.readlines():
            if line.strip().startswith("here is") and line.strip().endswith(".txt"):
                file_path = re.search(file_pattern, line).group()
                line = read_content(file_path) + '\n'
            tmp_file.write(line)
        tmp_file.seek(0)
        # assign the replaced value to this variable
        replaced_file_str = tmp_file.read()
    # replace the original file with the new content
    with open(original_content_file_name, 'w+') as file_obj:
        file_obj.write(replaced_file_str)

replace_file_content()

Change log file with index

I have a file which contains a user:
Sep 15 04:34:31 li146-252 sshd[13326]: Failed password for invalid user ronda from 212.58.111.170 port 42579 ssh2
Trying to use the string index method to edit the user within the file. So far I am able to print the position of the user, but now I need to delete the old user and put in the new one.
newuser = 'PeterB'
with open('test.txt') as file:
    for line in file.readlines():
        lines = line.split()
        string = ' '.join(lines)
        print string.index('user') + 1
Do you want to update the file contents? If so, you can update the user name, but you will need to rewrite the file, or write to a second file (for safety):
keyword = 'user'
newuser = 'PeterB'

with open('test.txt') as infile, open('updated.txt', 'w') as outfile:
    for line in infile.readlines():
        words = line.split()
        try:
            index = words.index(keyword) + 1
            words[index] = newuser
            outfile.write('{}\n'.format(' '.join(words)))
        except (ValueError, IndexError):
            outfile.write(line)  # no keyword, or keyword at end of line
Note that this code assumes that each word in the output file is to be separated by a single space.
Also note that this code does not drop lines that do not contain the keyword in them (as do other solutions).
If you want to preserve the original whitespace, regular expressions are very handy, and the resulting code is comparatively simple:
import re

keyword = 'user'
newuser = 'PeterB'
pattern = re.compile(r'({}\s+)(\S+)'.format(keyword))

with open('test.txt') as infile, open('updated.txt', 'w') as outfile:
    for line in infile:
        outfile.write(pattern.sub(r'\1{}'.format(newuser), line))
If you want to change the names in your log, here is how.
new_file = []
with open('tmp.txt', 'r') as file:
    for line in file.readlines():  # read the lines
        line = line.split(' ')
        line[10] = 'vader'  # edit the name
        new_file.append(' '.join(line))  # store the changes in a variable

with open('tmp.txt', 'w') as file:  # write the new log to the file
    file.writelines(new_file)

Python search a file for text using input from another file

I'm new to Python and programming, and I need some help with a Python script. There are two files, each containing email addresses (more than 5000 lines). The input file contains email addresses that I want to search for in the data file (which also contains email addresses). Then I want to print the output to a file or display it on the console. I searched for scripts and was able to modify one, but I'm not getting the desired results. Can you please help me?
dfile1 (50K lines)
yyy#aaa.com
xxx#aaa.com
zzz#aaa.com
ifile1 (10K lines)
ccc#aaa.com
vvv#aaa.com
xxx#aaa.com
zzz#aaa.com
Output file
xxx#aaa.com
zzz#aaa.com
datafile = 'C:\\Python27\\scripts\\dfile1.txt'
inputfile = 'C:\\Python27\\scripts\\ifile1.txt'

with open(inputfile, 'r') as f:
    names = f.readlines()

outputlist = []
with open(datafile, 'r') as fd:
    for line in fd:
        name = fd.readline()
        if name[1:-1] in names:
            outputlist.append(line)
        else:
            print "Nothing found"
print outputlist
New Code
with open(inputfile, 'r') as f:
    names = f.readlines()

outputlist = []
with open(datafile, 'r') as f:
    for line in f:
        name = f.readlines()
        if name in names:
            outputlist.append(line)
        else:
            print "Nothing found"
print outputlist
Maybe I'm missing something, but why not use a pair of sets?
#!/usr/local/cpython-3.3/bin/python

data_filename = 'dfile1.txt'
input_filename = 'ifile1.txt'

with open(input_filename, 'r') as input_file:
    input_addresses = set(email_address.rstrip() for email_address in input_file.readlines())

with open(data_filename, 'r') as data_file:
    data_addresses = set(email_address.rstrip() for email_address in data_file.readlines())

print(input_addresses.intersection(data_addresses))
mitan8 points out the problem you have, but this is what I would do instead:
with open(inputfile, "r") as f:
    names = set(i.strip() for i in f)

with open(datafile, "r") as f:
    for name in f:
        if name.strip() in names:
            print name
This avoids reading the larger datafile into memory.
If you want to write to an output file, you could do this for the second with statement:
with open(datafile, "r") as i, open(outputfile, "w") as o:
    for name in i:
        if name.strip() in names:
            o.write(name)
Here's what I would do:
names = []
with open(inputfile) as f:
    for line in f:
        names.append(line.rstrip("\n"))
myEmails = set(names)

with open(datafile) as fd, open("emails.txt", "w") as output:
    for line in fd:
        c = line.rstrip("\n")
        if c in myEmails:
            print c                  # for console
            output.write(c + '\n')   # for writing to file
I think your issue stems from the following:
name = fd.readline()
if name[1:-1] in names:
name[1:-1] slices each email address so that you skip the first and last characters. While it might seem reasonable to skip the last character (a newline '\n'), when you load the names from the input file
with open(inputfile, 'r') as f:
    names = f.readlines()
you are including newlines there too. So, don't slice the names from the data file at all, i.e.
if name in names:
I think you can remove name = fd.readline(), since the for loop already reads one line per iteration; the extra readline() call consumes an additional line each time. Also, name[1:-1] should be name, since you don't want to strip the first and last characters when searching. with automatically closes the opened files.
PS: How I'd do it:
with open("dfile1") as dfile, open("ifile") as ifile:
    lines = "\n".join(set(dfile.read().splitlines()) & set(ifile.read().splitlines()))
print(lines)
with open("ofile", "w") as ofile:
    ofile.write(lines)
In the above solution, I'm basically taking the intersection (elements present in both sets) of the lines of both files to find the common lines.
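A minimal illustration of that set operation, using the sample addresses from the question:

```python
# '&' on sets keeps only the elements common to both operands
# (intersection); order is not preserved, as with any set.
dfile_lines = {'yyy#aaa.com', 'xxx#aaa.com', 'zzz#aaa.com'}
ifile_lines = {'ccc#aaa.com', 'vvv#aaa.com', 'xxx#aaa.com', 'zzz#aaa.com'}
common = dfile_lines & ifile_lines
```

If the original file order matters for the output, you would need to re-sort or filter the original list against this set.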

Categories

Resources