I'm new to Python and kind of pulling my hair out here. I've tried several things for a few hours with no luck.
I think it's fairly simple, hopefully. I'm trying to search for the names from file1 in file2: I strip the newline character from each name after it is read, then look for a match. If a match is found, I want to write the whole line from file2 to file3. If nothing is found, I want to write just the name to file3.
File1:
Abigail
Alexa
Jamie
File2:
Abigail,infoA,infoB,InfoC
John,infoA,infoB,InfoC
Jamie,infoA,infoB,InfoC
File3:
Abigail,infoA,infoB,InfoC
Alexa
Jamie,infoA,infoB,InfoC
Test Data file1:
abigail
anderson
jan
jane
jancith
larry
bob
bobbie
shirley
sharon
Test Data file2:
abigail,infoA,infoB,infoC
anderson,infoA,infoB,infoC
jan,infoA,infoB,infoC
jancith,infoA,infoB,infoC
larry,infoA,infoB,infoC
bob,infoA,infoB,infoC
bobbie,infoA,infoB,infoC
sharon,infoA,infoB,infoC
This version worked but only read and wrote the first instance.
import re
f1 = open("file1.txt", "r")
f2 = open("file2.txt", "r")
f3 = open("file3.txt", "w")
for nameinfo in f1:
    nameinfo = nameinfo.rstrip()
    for listinfo in f2:
        if re.search(nameinfo, listinfo):
            f3.write(listinfo)
        else:
            f3.write(nameinfo)
This version worked but it wrote the name (that had no match) over and over while looping between matches.
import re
f1 = open("file1.txt", "r")
f2 = open("file2.txt", "r")
f3 = open("file3.txt", "w")
list2 = f2.readlines()
for nameinfo in f1:
    nameinfo = nameinfo.rstrip()
    for listinfo in list2:
        if re.search(nameinfo, listinfo):
            f3.write(listinfo)
        else:
            f3.write(nameinfo)
Is it possible to use simple basic loop commands to achieve the desired results? Help with learning would be greatly appreciated. I see many examples that look incredibly complex or kind of hard to follow. I'm just starting out so simple basic methods would be best in learning the basics.
Your second solution keeps writing the unmatched name because it checks every line of file2.txt and writes to file3.txt for each line that doesn't match.
Instead, you can introduce a new variable to hold the value you want to write to file3.txt. Then write that value to the file only after the inner loop has finished.
Here is a working example:
import re
# note the .read().split('\n'): this creates a list with each line as an item in the list
f1 = open("file1.txt", "r").read().split('\n')
f2 = open("file2.txt", "r").read().split('\n')
f3 = open("file3.txt", "w")
for name in f1:
    # Edit: don't add an additional new line
    if name == '':
        continue
    f3_text = name
    for line in f2:
        # if we find a match, overwrite the name value in f3_text
        # EDIT 2: don't match on partial names
        # These are called f-strings if you haven't seen them before
        # EDIT 3: using a regex allows us to use the ^ character, which means start of line
        # That way ron doesn't match with Sharon
        if re.search(rf"^{name},", line):
            f3_text = line
    # at this point f3_text is just the name if we never
    # found a match, or the entire line if a match was found
    f3.write(f3_text + '\n')
f3.close()
Edit:
The additional new line appears because f1 actually contains four items, not three. file1.txt ends with a newline, so split('\n') leaves an empty string as the last item:
f1 = ['Abigail', 'Alexa', 'Jamie', '']
This means the outer for loop runs four times. On the last iteration f3_text = '', so an additional new line is written. I added a check to the for loop to account for this.
You can also write it in pure Python without using the regex module (if you don't want to learn its mini-language):
with open("file1.txt", "r") as f:
    names = f.readlines()
with open("file2.txt", "r") as f:
    lines = f.readlines()
names = [name.strip() for name in names]  # strip off the newline and other surrounding whitespace
with open("file3.txt", "w") as f:
    for name in names:
        to_write = name + '\n'
        for line in lines:
            # compare against the whole first field so "ron" doesn't match "sharon";
            # if we find a match, rewrite the 'to_write' variable and break the for loop
            if line.startswith(name + ','):
                to_write = line
                break
        f.write(to_write)
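Here is one more sketch in the same spirit, but it swaps in a different technique: a dictionary. It first maps the first comma-separated field of each file2 line to the whole line, so each name becomes an exact lookup instead of a scan through all of file2. This assumes the name is always the first field. The function names merge_names and merge_files are made up for this example.

```python
def merge_names(names, records):
    """Return one output line per name: its full record if found, else the name."""
    # Build a lookup table from the first comma-separated field to the full line.
    by_name = {}
    for record in records:
        record = record.rstrip("\n")
        if not record:
            continue  # skip blank lines, e.g. a trailing empty line
        key = record.split(",", 1)[0]
        by_name.setdefault(key, record)  # keep the first match, like `break` does

    result = []
    for name in names:
        name = name.strip()
        if not name:
            continue  # avoid writing an extra blank line
        # Exact dictionary lookup, so "ron" never matches "sharon".
        result.append(by_name.get(name, name))
    return result


def merge_files(names_path, records_path, out_path):
    """Hypothetical file wrapper around merge_names()."""
    with open(names_path) as f1, open(records_path) as f2:
        merged = merge_names(f1, f2)
    with open(out_path, "w") as f3:
        for line in merged:
            f3.write(line + "\n")
```

For example, merge_files("file1.txt", "file2.txt", "file3.txt") should produce the file3 shown in the question. Building the dictionary costs one pass over file2, after which each name lookup is constant time.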
I'm trying to extract specific lines from a 4.7 GB text file into another text file.
I'm pretty new to Python (3.7.1) and this was the best code I could come up with.
Here is a sample of what the text file looks like:
C00629618|N|TER|P|201701230300133512|15C|IND|DOE, JOHN A|PLEASANTVILLE|WA|00000|PRINCIPAL|DOUBLE NICKEL ADVISORS|01032017|40|H6CA34245|SA01251735122|1141239|||2012520171368850783
C00501197|N|M2|P|201702039042410893|15|IND|DOE, JANE|THE LODGE|GA|00000|UNUM|SVP, CORPORATE COMMUNICATIONS|01312017|230||PR1890575345050|1147350||P/R DEDUCTION ($115.00 BI-WEEKLY)|4020820171370029335
C00177436|N|M2|P|201702039042410893|15|IND|DOE, JOHN|RED ROOM|ME|00000|UNUM|SVP, DEPUTY GENERAL COUNSEL, BUSINESS|01312017|384||PR2260663445050|1147350||P/R DEDUCTION ($192.00 BI-WEEKLY)|4020820171370029336
C00177436|N|M2|P|201702039042410895|15|IND|PALMER, LAURA|TWIN PEAKS|WA|00000|UNUM|EVP, GLOBAL SERVICES|01312017|384||PR2283905245050|1147350||P/R DEDUCTION ($192.00 BI-WEEKLY)|4020820171370029342
C00501197|N|M2|P|201702039042410894|15|IND|COOPER, DALE|TWIN PEAKS|WA|00000|UNUM|SVP, CORP MKTG & PUBLIC RELAT.|01312017|384||PR2283904845050|1147350||P/R DEDUCTION ($192.00 BI-WEEKLY)|4020820171370029339
And this is the code I've written:
import re
with open("data.txt", 'r') as rf:
    for line in rf:
        field_match = re.match('^(.*):(.*)$', line)
        if field_match:
            (key) = field_match.groups()
            if key == "C00501197":
                print(rec.split('|'))
with open('extracted_data.txt', 'w') as wf:
    wf.write(line)
I need to extract full lines that contain the id C00501197 and then have the program write those extracted lines into another txt file, but as of now it's only extracting one line and that line doesn't begin with the id I want extracted.
Don't use a regex if you can avoid it. The csv module is a good choice, or you can use simple string manipulation:
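ans = []
with open('data.txt') as rf:
    for line in rf:
        line = line.strip()
        # include the separator so an ID that merely starts with C00501197 doesn't match
        if line.startswith("C00501197|"):
            ans.append(line)
with open('extracted_data.txt', 'w') as wf:
    for line in ans:
        wf.write(line + '\n')  # strip() removed the newline, so add it back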
Your output code was a bit busted as well - always wrote out the last line in the file, not the selected records.
You should use the built-in csv module that comes standard with Python. It can easily parse each line into a list of fields. Try something like this:
import csv
with open('data.txt', 'r', newline='') as file:
    my_reader = csv.reader(file, delimiter='|')
    for row in my_reader:
        if row[0] == 'C00501197':
            print(row)
This should output the lines you want. You can then do whatever you want to process them, and save them again.
You don't need a regex. Just split each line on the separator and check the field you're interested in:
found_lines = []
with open("data.txt", 'r') as rf:
    for line_file in rf:
        line = line_file.split("|")
        if line[0] == "C00501197":
            found_lines.append(line)
with open('extracted_data.txt', 'w') as wf:
    for found_line in found_lines:
        # the last field still ends with '\n', so joining restores the original line
        wf.write("|".join(found_line))
This should work.
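The input is 4.7 GB, so it may be worth writing each matching line as soon as it is read instead of collecting the matches in a list first. That keeps memory use flat however many records match. This is a minimal sketch that assumes the ID is always the first |-separated field; matching_lines and extract_lines are hypothetical helper names.

```python
def matching_lines(lines, target_id, sep="|"):
    """Yield the lines whose first field is exactly target_id."""
    for line in lines:
        # partition() splits only at the first separator, so the rest is untouched.
        if line.partition(sep)[0] == target_id:
            yield line


def extract_lines(src_path, dst_path, target_id="C00501197"):
    """Stream src_path into dst_path, keeping only matching lines; return the count."""
    count = 0
    with open(src_path, "r") as rf, open(dst_path, "w") as wf:
        for line in matching_lines(rf, target_id):
            wf.write(line)  # the line still ends with its original newline
            count += 1
    return count
```

Usage would be extract_lines("data.txt", "extracted_data.txt"). Comparing the whole first field avoids false matches where the ID appears elsewhere in a line or is a prefix of a longer ID.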
input.txt
hi all.i hope all are doing well?
please help with the solution.i tried all the possible solution?
Expected output:
['hi all', 'i hope all are doing well',
'please help with the solution', 'i tried all the possible solution']
For a quick solution, you can try the following code, but I would suggest looking into Python's re module. It's pretty handy.
import re
result = []
pattern = re.compile(r'[,.?]')
with open('input.txt') as f:
    for line in f:
        # split on commas, full stops and question marks; drop empty pieces
        parts = [p for p in pattern.split(line.strip()) if p]
        result.extend(parts)
print(result)
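Another simple technique for this problem is to use the split method. I have tried to keep it as simple as possible.
with open("input.txt", "r") as file:
    text = file.read()
# turn every sentence delimiter into a comma and remove the line breaks
text = text.replace(".", ",").replace("?", ",").replace("\n", "")
parts = text.split(",")
parts = [p for p in parts if p]  # drop the empty piece left after the final "?"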
I have a text file that looks like:
ABC
DEF
How can I read the file into a single-line string without newlines, in this case creating a string 'ABCDEF'?
For reading the file into a list of lines, but removing the trailing newline character from each line, see How to read a file without newlines?.
You could use:
with open('data.txt', 'r') as file:
    data = file.read().replace('\n', '')
Or, if the file content is guaranteed to be a single line:
with open('data.txt', 'r') as file:
    data = file.read().rstrip()
In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:
from pathlib import Path
txt = Path('data.txt').read_text()
and then you can use str.replace to remove the newlines:
txt = txt.replace('\n', '')
You can read from a file in one line:
text = open('very_Important.txt', 'r').read()
Please note that this does not close the file explicitly.
CPython will close the file as soon as the file object is garbage-collected, which, thanks to reference counting, happens right after this line runs.
Other Python implementations may not close it that promptly. To write portable code, it is better to use with or to close the file explicitly. Short is not always better. See https://stackoverflow.com/a/7396043/362951
To join all lines into a string and remove the newlines, I normally use:
with open('t.txt') as f:
    s = " ".join([l.rstrip("\n") for l in f])
with open("data.txt") as myfile:
    data = "".join(line.rstrip() for line in myfile)
join() concatenates an iterable of strings (here, a generator expression). rstrip() with no arguments trims whitespace, including newlines, from the end of each string.
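The two variants above differ slightly. rstrip() with no argument removes all trailing whitespace, including spaces and tabs at the end of a line. rstrip("\n") removes only newline characters. This small comparison uses a made-up sample string:

```python
# A line with trailing spaces before the newline (made-up sample data).
line = "ABC  \n"

no_arg = line.rstrip()       # strips the spaces and the newline
only_nl = line.rstrip("\n")  # strips only the newline, keeps the spaces

print(repr(no_arg))   # 'ABC'
print(repr(only_nl))  # 'ABC  '
```

Which one you want depends on whether trailing spaces in the file are meaningful.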
This can be done using the read() method:
text_as_string = open('Your_Text_File.txt', 'r').read()
Or, since the default mode is 'r' (read), simply use:
text_as_string = open('Your_Text_File.txt').read()
I'm surprised nobody mentioned splitlines() yet.
with open("data.txt", "r") as myfile:
    data = myfile.read().splitlines()
Variable data is now a list that looks like this when printed:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
Note there are no newlines (\n).
At that point, it sounds like you want to print back the lines to console, which you can achieve with a for loop:
for line in data:
    print(line)
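splitlines() is convenient here for a specific reason. When a file ends with a newline, as most text files do, split("\n") leaves an empty string as the last item, while splitlines() does not. This quick check uses a made-up string standing in for the file contents:

```python
# Stand-in for file.read() on a file that ends with a newline.
content = "ABC\nDEF\n"

by_split = content.split("\n")   # the trailing newline leaves an empty last item
by_lines = content.splitlines()  # no empty item at the end

print(by_split)  # ['ABC', 'DEF', '']
print(by_lines)  # ['ABC', 'DEF']

# Joining either list gives the single-line string from the question.
print("".join(by_lines))  # ABCDEF
```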
It's hard to tell exactly what you're after, but something like this should get you started:
with open("data.txt", "r") as myfile:
    data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])
I have fiddled around with this for a while and prefer to use read() in combination with rstrip(). Without rstrip("\n"), the string keeps the newline at the end of the file, which in most cases is not very useful.
with open("myfile.txt") as f:
    file_content = f.read().rstrip("\n")
print(file_content)
Here are four options to choose from:
with open("my_text_file.txt", "r") as file:
    data = file.read().replace("\n", "")
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().split("\n"))
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().splitlines())
or
with open("my_text_file.txt", "r") as file:
    data = "".join(line.rstrip("\n") for line in file)
You can compress this into two lines of code:
content = open('filepath','r').read().replace('\n',' ')
print(content)
If your file reads:
hello how are you?
who are you?
blank blank
the Python output is:
hello how are you? who are you? blank blank
You can also strip each line and concatenate into a final string.
with open("data.txt", "r") as myfile:
    lines = myfile.readlines()
data = ""
for line in lines:
    data = data + line.strip()
This would also work out just fine.
This is a one line, copy-pasteable solution that also closes the file object:
_ = open('data.txt', 'r'); data = _.read(); _.close()
f = open('data.txt', 'r')
string = ""
while True:
    line = f.readline()
    if not line:
        break
    string += line.rstrip('\n')  # drop the newline as each line is appended
f.close()
print(string)
Python 3: Google "list comprehension" if the square-bracket syntax is new to you.
with open('data.txt') as f:
    lines = [line.strip('\n') for line in f]
One-liners:
List: "".join([line.rstrip('\n') for line in open('file.txt')])
Generator: "".join((line.rstrip('\n') for line in open('file.txt')))
A list is generally faster than a generator but heavier on memory. A generator is slower but lighter on memory, because it produces one line at a time. With "".join() both work. In CPython, join() builds a list from its argument anyway, so the list version is usually slightly faster. Drop the "".join() call to get the list or the generator itself.
Note: these one-liners don't close the file explicitly. CPython closes it once the file object is garbage-collected, but other implementations may not.
Have you tried this?
x = "yourfilename.txt"
y = open(x, 'r').read()
print(y)
To remove line breaks in Python, you can use the replace method of a string.
This example removes all three kinds of line break (\r\n, \r and \n):
my_string = open('lala.json').read()
print(my_string)
my_string = my_string.replace("\r","").replace("\n","")
print(my_string)
Example file is:
{
"lala": "lulu",
"foo": "bar"
}
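The chained replace() calls handle Windows (\r\n), old Mac (\r) and Unix (\n) line endings alike. This small check uses a made-up string that mixes all three. splitlines() followed by join() gives the same result:

```python
# Made-up text mixing all three line-ending styles.
mixed = "a\r\nb\rc\nd"

via_replace = mixed.replace("\r", "").replace("\n", "")
via_splitlines = "".join(mixed.splitlines())

print(via_replace)     # abcd
print(via_splitlines)  # abcd
```

In Python 3, a file opened in text mode with the default newline=None already has \r\n and \r translated to \n while reading. For text read that way, replace("\n", "") alone is usually enough.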
You can try it in this repl:
https://repl.it/repls/AnnualJointHardware
I don't feel that anyone addressed the [ ] part of your question. When you read the lines into your variable, you ended up with a list, because the file had multiple lines before you replaced the \n with ''. If you have a variable x and display it with
x
or print(x)
or str(x)
you will see the entire list with the brackets. If you access a single element of the list instead,
x[0]
then the brackets are omitted. If you print() the element, you will see just the data and not the quotes either:
print(x[0])
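Here is the difference in concrete form. Displaying a list shows its repr(), brackets and quotes included. Calling print() on a single string element shows just the text. If the goal is one string, join the elements. The list x below is made-up sample data:

```python
x = ["ABC", "DEF"]  # made-up sample: a list of lines without newlines

print(x)           # ['ABC', 'DEF']  (the list's repr: brackets and quotes)
print(repr(x[0]))  # 'ABC'           (an element's repr keeps the quotes)
print(x[0])        # ABC             (print() shows the plain text)
print("".join(x))  # ABCDEF          (all elements as one string)
```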
Maybe you could try this? I use this in my programs.
with open('data.txt', 'r') as f:
    data = f.readlines()
for i in range(len(data)):
    data[i] = data[i].strip() + ' '
data = ''.join(data).strip()
Regular expression works too:
import re
with open("depression.txt") as f:
    l = re.split(' ', re.sub('\n', ' ', f.read()))[:-1]
print(l)
['I', 'feel', 'empty', 'and', 'dead', 'inside']
with open('data.txt', 'r') as file:
    data = [line.strip('\n') for line in file.readlines()]
data = ''.join(data)
from pathlib import Path
line_lst = Path("to/the/file.txt").read_text().splitlines()
I find this the best way to get all the lines of a file: splitlines() already strips the '\n', and it recognizes Windows, Mac and Unix line endings.
But if you nonetheless want to strip each line:
line_lst = [line.strip() for line in Path("to/the/file.txt").read_text().splitlines()]
strip() is just a useful example; you can process each line however you like.
And if, in the end, you just want the concatenated text:
txt = ''.join(Path("to/the/file.txt").read_text().splitlines())
This works:
Change your file to:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
Then:
file = open("file.txt")
line = file.read()
words = line.split()
This creates a list named words that equals:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
That got rid of the "\n". To answer the part about the brackets getting in your way, just do this:
for word in words:  # Assuming words is the list above
    print(word)  # Prints each word in the file on a different line
Or:
print(words[0] + ",", words[1])  # "+" joins the strings with no space in between;
                                 # the comma between the arguments prints a space
This returns:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
with open(player_name, 'r') as myfile:
    data = myfile.readline()
words = data.split(" ")
word = words[0]
This code reads the first line of the file and then uses split() to store its space-separated words in a list.
Then you can easily access any word, or even store it in a string.
You can also do the same thing with using a for loop.
with open("myfile.txt", "r") as file:
    lines = file.readlines()
text = ''  # string declaration
for i in range(len(lines)):
    text += lines[i].rstrip('\n') + ' '
print(text)
Try the following:
with open('data.txt', 'r') as myfile:
    data = myfile.read()
sentences = data.split('\\n')
for sentence in sentences:
    print(sentence)
Caution: It does not remove the \n. It is just for viewing the text as if there were no \n
From an input file, I'm supposed to extract only the first name of each student and then save the result in a new file called "student-firstname.txt". The output file should contain a list of first names (not including middle names). I was able to remove the last name, but I'm having trouble removing the middle initial. Any help or suggestions?
The student names in the file look something like this (last name, first name, and middle initial):
Martin, John
Smith, James W.
Brown, Ashley S.
My Python code is:
f = open("studentname.txt", 'r')
f2 = open("student-firstname.txt", 'w')
str = ''
for line in f.readlines():
    str = str + line
    line = line.strip()
    token = line.split(",")
    f2.write(token[1] + "\n")
f.close()
f2.close()
f = open("studentname.txt", 'r')
f2 = open("student-firstname.txt", 'w')
for line in f.readlines():
    token = line.split()
    f2.write(token[1] + "\n")
f.close()
f2.close()
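Split token[1] on whitespace and take the first piece. Use split() with no argument, because token[1] starts with a space and split(' ') would return an empty first item:
fname = token[1].split()[0]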
with open("studentname.txt") as f, open("student-firstname.txt", 'w') as fout:
    for line in f:
        firstname = line.split()[1]
        print(firstname, file=fout)
Note:
you could use a with statement to make sure that the files are always closed even in case of an exception. You might need contextlib.nested() on old Python versions
'r' is the default mode for files. You don't need to specify it explicitly
.readlines() reads all lines at once. You could iterate over the file line by line directly
To avoid hardcoding the filenames, you could use fileinput. Save the following as firstname.py:
#!/usr/bin/env python
import fileinput
for line in fileinput.input():
    firstname = line.split()[1]
    print(firstname)
Example: $ python firstname.py studentname.txt >student-firstname.txt
Check out regular expressions. Something like this will probably work:
>>> import re
>>> nameline = "Smith, James W."
>>> names = re.match(r"(\w+),\s+(\w+).*", nameline)
>>> if names:
...     print(names.groups())
('Smith', 'James')
Line 3 basically says: find a sequence of word characters as group 1, followed by a comma, some whitespace characters, and another sequence of word characters as group 2, followed by anything else in nameline.
f = open("file")
o = open("out", "w")
for line in f:
    # take the text after the comma, then its first whitespace-separated word
    o.write(line.rstrip().split(",")[1].strip().split()[0] + "\n")
f.close()
o.close()
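Here is a compact alternative that does not depend on how many spaces follow the comma. partition() splits the line at the first comma, and split() with no argument then ignores the leading space and separates the first name from any middle initial. This sketch assumes every line has the form "Last, First [Middle]"; first_name and write_first_names are hypothetical helper names.

```python
def first_name(line):
    """Return the first name from a line like 'Smith, James W.'."""
    _last, _comma, rest = line.partition(",")
    parts = rest.split()  # split() drops surrounding whitespace and the newline
    return parts[0] if parts else ""  # empty string for blank or malformed lines


def write_first_names(src="studentname.txt", dst="student-firstname.txt"):
    """Write one first name per line from src to dst, skipping blank lines."""
    with open(src) as f, open(dst, "w") as fout:
        for line in f:
            name = first_name(line)
            if name:
                fout.write(name + "\n")
```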