import re
re_for_identificate_1 = r""
with open("data_path/filename_1.txt","r+") as file:
    for line in file:
        #replace with a substring adding a space in the middle
        line = re.sub(re_for_identificate_1, " milesimo", line)
        #replace in txt with the fixed line
Example filename_1.txt :
unmilesimo primero
1001°
dosmilesimos quinto
2005°
tresmilesimos
3000°
nuevemilesimos doceavo
9012°
The output file that I need to obtain is this:
Rewritten input filename_1.txt
un milesimo primero
1001°
dos milesimos quinto
2005°
tres milesimos
3000°
nueve milesimos doceavo
9012°
What regex do I need, and what is the best way to write the fixed lines back to their original positions in the input file?
You can use file.seek(0) to go to the beginning of the file, then write the data and truncate the file. Like this:
import re

re_for_identificate_1 = "(?<!^)milesimo"
tmp = ""
with open("data.txt", "r+") as file:
    for line in file:
        line = re.sub(re_for_identificate_1, " milesimo", line)
        tmp += line
    # rewind, overwrite with the fixed text, and drop any leftover bytes
    file.seek(0)
    file.write(tmp)
    file.truncate()
The regex you want to use is "(?<!^)milesimo", which replaces every instance of "milesimo" with " milesimo" except at the beginning of a line.
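As a quick check of the pattern, here is a minimal sketch that applies it to a few of the sample lines in memory rather than to a file. The helper name split_milesimo is only illustrative, and the sketch assumes "milesimo" is the only word that needs to be split off.

```python
import re

# Negative lookbehind: match "milesimo" anywhere except at the very start of the string.
PATTERN = re.compile(r"(?<!^)milesimo")

def split_milesimo(line):
    """Insert a space before 'milesimo' unless the line starts with it (hypothetical helper)."""
    return PATTERN.sub(" milesimo", line)

samples = ["unmilesimo primero", "1001°", "tresmilesimos", "milesimo"]
fixed = [split_milesimo(s) for s in samples]
print(fixed)
```

Note that a line which already contains " milesimo" would gain a second space, so this is best run once on the unfixed file.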
I want to extract specific fields from a text file.
Here is the example text file:
https://drive.google.com/file/d/0BzQ6rtO2VN95d3NrTjktMExfNkU/view?usp=sharing
Kindly review it.
I am trying to extract the strings as:
"Name": "the name in front of it"
"Link": "the link in front of it"
For example, from the input file I expect to get output like this:
"Name":"JTLnet"
"Link":"http://jtlnet.com"
"Name":"Apache 1.3"
"Link":"http://httpd.apache.org/docs/1.3"
"Name":"Apache"
"Link":"http://httpd.apache.org/"
.
.
.
"Name":"directNIC"
"Link":"http://directnic.com"
If these fields appear anywhere in the file, they should be extracted to another file.
How can I achieve this sort of extraction? Please treat the file as a small part of a much bigger file.
Also, it is a text file, not JSON.
Since the text file is not properly formatted, regex is the only practical option. The snippet below works for the given sample file.
Keep in mind that it loads the entire file into memory.
import re, json

with open(r'filepath') as f:
    textCorpus = f.read()

# replace empty strings with non-empty ones so the regex matches easily
textCorpus = textCorpus.replace('""', '" "')
lstMatches = re.findall(r'"Name".+?"Link":".+?"', textCorpus)

# open in text append mode, since str values are written
with open(r'new_file.txt', 'a') as wf:
    for eachMatch in lstMatches:
        convJson = "{" + eachMatch + "}"
        json_data = json.loads(convJson)
        wf.write(json_data["Name"] + "\n")
        wf.write(json_data["Link"] + "\n")
Short solution using re.findall() and str.join():
import re

with open('test.txt', 'r') as fh:
    p = re.compile(r'(?:"Categories":[^,]+,)("Name":"[^"]+"),(?:[^,]+,)("Link":"[^"]+")')
    result = [pair for l in re.findall(p, fh.read()) for pair in l]

print('\n'.join(result))
The output (fragment):
"Name":"JTLnet"
"Link":"http://jtlnet.com"
"Name":"Apache 1.3"
"Link":"http://httpd.apache.org/docs/1.3"
"Name":"Apache"
"Link":"http://httpd.apache.org/"
"Name":"PHP"
....
Your file is incorrectly formatted JSON with an extraneous double quote, but that is enough to keep the json module from loading it. You are left with lower-level regex parsing.
Assumptions:
- the interesting part after "Name" or "Link" is:
  - separated from the identifier by a colon (:)
  - enclosed in double quotes (") with no embedded double quote
- the file is structured in lines
- Name and Link fields are always on one single line (no newline inside a field)
You can process your file line by line with a simple re.finditer on each line:
import re

rx = re.compile(r'(("Name":".*?")|("Link":".*?"))')
with open(inputfile) as fd:
    for line in fd:
        l = rx.finditer(line)
        for elt in l:
            print(elt.group(0))
If you want to write the data to another file, open it before the snippet above using with open(outputfile, "w") as fdout: and replace the print line with:
fdout.write(elt.group(0) + "\n")
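To make the line-by-line approach concrete, here is a small self-contained sketch that runs on an in-memory string. The sample line and the helper name extract_fields are made up for illustration (the real file layout may differ), and the pattern assumes field values never contain a double quote.

```python
import re

# Match only the "Name" and "Link" keys; the value is everything up to the next double quote.
RX = re.compile(r'"(Name|Link)":"([^"]*)"')

def extract_fields(text):
    """Return each Name/Link pair as a '"Key":"value"' string (hypothetical helper)."""
    return ['"%s":"%s"' % (m.group(1), m.group(2)) for m in RX.finditer(text)]

# Hypothetical sample line, loosely modeled on the expected output in the question.
sample = '{"Categories":"hosting","Name":"JTLnet","Icon":"x.png","Link":"http://jtlnet.com"}'
fields = extract_fields(sample)
print("\n".join(fields))
```

Because the function works on one string at a time, it can be called per line, so a very large file never has to be held in memory at once.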
def regexread():
    import re
    result = ''
    savefileagain = open('sliceeverfile3.txt','w')
    #text=open('emeverslicefile4.txt','r')
    text='09,11,14,34,44,10,11, 27886637, 0\n561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07, 19070109, 0\n560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06, 13063500, 0\n559, Tue,29,Jan,2013,'
    pattern='\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
    #with open('emeverslicefile4.txt') as text:
    f = re.findall(pattern,text)
    for item in f:
        print(item)
    savefileagain.write(item)
    #savefileagain.close()
The function as written parses the text and prints sets of seven numbers. I have three problems.
Firstly, reading from the 'read' file, which contains exactly the same text as text='09,...etc', raises "TypeError: expected string or buffer", which I cannot solve even after reading some of the related posts.
Secondly, when I try to write the results to the 'write' file, nothing is written.
Thirdly, I am not sure how to get the same output in the file that I get with the print statement: three lines of seven numbers each, which is the output that I want.
This should do the trick:
import re

filename = 'sliceeverfile3.txt'
pattern = r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
new_file = []

# Make sure file gets closed after being iterated
with open(filename, 'r') as f:
    # Read the file contents and generate a list with each line
    lines = f.readlines()

# Iterate each line
for line in lines:
    # Regex applied to each line
    match = re.search(pattern, line)
    if match:
        # Make sure to add \n to display correctly when we write it back
        new_line = match.group() + '\n'
        print(new_line)
        new_file.append(new_line)

with open(filename, 'w') as f:
    # go to start of file
    f.seek(0)
    # actually write the lines
    f.writelines(new_file)
You're sort of on the right track...
You'll want to iterate over the file (see "How to iterate over the file in python") and apply the regex to each line. That link should really answer all three of your questions once you realize you're trying to write 'item' outside of the loop that produces it.
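For illustration, here is a minimal sketch of the extraction step on its own, using part of the sample text from the question. re.findall needs a string, so a file would first have to be read with .read() (or processed line by line) rather than passed in as a file object. The name extract_draws is invented for this example.

```python
import re

# Seven two-digit numbers separated by commas.
PATTERN = re.compile(r"\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d")

def extract_draws(text):
    """Return every run of seven comma-separated two-digit numbers (hypothetical helper)."""
    return PATTERN.findall(text)

text = ("09,11,14,34,44,10,11, 27886637, 0\n"
        "561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07, 19070109, 0\n"
        "560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06, 13063500, 0\n")
draws = extract_draws(text)

# One match per line, which is the layout to pass to file.write().
output = "\n".join(draws) + "\n"
print(output, end="")
```

Writing output inside a with open(..., 'w') block would put one set of numbers on each line and close the file afterwards.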
I have a text file that looks like:
ABC
DEF
How can I read the file into a single-line string without newlines, in this case creating a string 'ABCDEF'?
For reading the file into a list of lines, but removing the trailing newline character from each line, see How to read a file without newlines?.
You could use:
with open('data.txt', 'r') as file:
    data = file.read().replace('\n', '')
Or, if the file content is guaranteed to be a single line:
with open('data.txt', 'r') as file:
    data = file.read().rstrip()
In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:
from pathlib import Path
txt = Path('data.txt').read_text()
and then you can use str.replace to remove the newlines:
txt = txt.replace('\n', '')
You can read from a file in one line:
text = open('very_Important.txt', 'r').read()
Please note that this does not close the file explicitly.
CPython will close the file when the file object is garbage collected.
Other Python implementations may not. To write portable code, it is better to use with or to close the file explicitly. Short is not always better. See https://stackoverflow.com/a/7396043/362951
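As a small illustration of the explicit-close point, the sketch below writes a throwaway file in a temporary directory (the file name and contents are made up), reads it back inside a with block, and then checks that the file object is closed once the block ends.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.txt")  # throwaway file for the demo

    with open(path, "w") as f:
        f.write("ABC\nDEF\n")

    # The with block closes the file on exit, even if read() raised an exception.
    with open(path) as f:
        data = f.read().replace("\n", "")

    closed_after_with = f.closed

print(data, closed_after_with)
```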
To join all lines into a string and remove newlines, I normally use:
with open('t.txt') as f:
    s = " ".join([l.rstrip("\n") for l in f])
with open("data.txt") as myfile:
    data = "".join(line.rstrip() for line in myfile)
join() will join a list of strings, and rstrip() with no arguments will trim whitespace, including newlines, from the end of strings.
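The difference between rstrip() and rstrip("\n") is easy to miss, so here is a short sketch of both. It uses io.StringIO as a stand-in for an open file, and the sample content (with trailing spaces on the second line) is invented for the demonstration.

```python
import io

# StringIO behaves like an open text file; the second line ends with two spaces.
fake_file = io.StringIO("ABC\nDEF  \n")

# rstrip() with no arguments removes all trailing whitespace, newlines included.
joined = "".join(line.rstrip() for line in fake_file)

fake_file.seek(0)  # rewind so the lines can be read again

# rstrip("\n") removes only the newline and keeps the trailing spaces.
only_newlines = "".join(line.rstrip("\n") for line in fake_file)

print(repr(joined), repr(only_newlines))
```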
This can be done using the read() method :
text_as_string = open('Your_Text_File.txt', 'r').read()
Or, since the default mode is 'r' (read), simply use:
text_as_string = open('Your_Text_File.txt').read()
I'm surprised nobody mentioned splitlines() yet.
with open("data.txt", "r") as myfile:
    data = myfile.read().splitlines()
Variable data is now a list that looks like this when printed:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
Note there are no newlines (\n).
At that point, it sounds like you want to print back the lines to console, which you can achieve with a for loop:
for line in data:
    print(line)
It's hard to tell exactly what you're after, but something like this should get you started:
with open("data.txt", "r") as myfile:
    data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])
I have fiddled around with this for a while and prefer to use read in combination with rstrip. Without rstrip("\n"), the string keeps the newline that ends the file, which in most cases is not very useful.
with open("myfile.txt") as f:
    file_content = f.read().rstrip("\n")
    print(file_content)
Here are four options to choose from:
with open("my_text_file.txt", "r") as file:
    data = file.read().replace("\n", "")
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().split("\n"))
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().splitlines())
or
with open("my_text_file.txt", "r") as file:
    # strip the newline from each line, otherwise join() keeps them
    data = "".join([line.rstrip("\n") for line in file])
You can compress this into two lines of code:
content = open('filepath','r').read().replace('\n',' ')
print(content)
If your file reads:
hello how are you?
who are you?
blank blank
the Python output is:
hello how are you? who are you? blank blank
You can also strip each line and concatenate the results into a final string.
with open("data.txt", "r") as myfile:
    data = ""
    lines = myfile.readlines()
    for line in lines:
        data = data + line.strip()
This would also work out just fine.
This is a one line, copy-pasteable solution that also closes the file object:
_ = open('data.txt', 'r'); data = _.read(); _.close()
f = open('data.txt','r')
string = ""
while True:
    line = f.readline()
    if not line:
        break
    string += line
f.close()
print(string)
Python 3: search for "list comprehension" if the square bracket syntax is new to you.
with open('data.txt') as f:
    lines = [line.strip('\n') for line in f]
One-liner:
List: "".join([line.rstrip('\n') for line in open('file.txt')])
Generator: "".join(line.rstrip('\n') for line in open('file.txt'))
A list is usually faster than a generator but heavier on memory; a generator is slower but lighter on memory, like iterating over the lines. For "".join(), I think both should work well. Remove the "".join() call to get the list or the generator itself.
Note: these one-liners do not close the file explicitly; that is probably not needed in a short script.
Have you tried this?
x = "yourfilename.txt"
y = open(x, 'r').read()
print(y)
To remove line breaks in Python you can use the replace method of a string.
This example removes all 3 types of line breaks:
my_string = open('lala.json').read()
print(my_string)
my_string = my_string.replace("\r","").replace("\n","")
print(my_string)
Example file is:
{
"lala": "lulu",
"foo": "bar"
}
You can try it using this repl:
https://repl.it/repls/AnnualJointHardware
I don't feel that anyone addressed the [ ] part of your question. When you read each line into your variable, there were multiple lines, so before you replaced the \n with '' you ended up creating a list. If you have a variable x and print it out with just
x
or print(x)
or str(x)
you will see the entire list with the brackets. If you access a single element of the list
x[0]
then the brackets are omitted. If you use the str() function you will see just the data, without the '' either.
str(x[0])
Maybe you could try this? I use this in my programs.
Data = open('data.txt', 'r')
data = Data.readlines()
for i in range(len(data)):
    data[i] = data[i].strip() + ' '
data = ''.join(data).strip()
Regular expressions work too:
import re
with open("depression.txt") as f:
    l = re.split(' ', re.sub('\n',' ', f.read()))[:-1]
print(l)
Output:
['I', 'feel', 'empty', 'and', 'dead', 'inside']
with open('data.txt', 'r') as file:
    data = [line.strip('\n') for line in file.readlines()]
    data = ''.join(data)
from pathlib import Path
line_lst = Path("to/the/file.txt").read_text().splitlines()
This is the best way to get all the lines of a file; the '\n' characters are already stripped by splitlines() (which recognizes Windows, Mac, and Unix line endings).
If you nonetheless want to strip each line:
line_lst = [line.strip() for line in Path("to/the/file.txt").read_text().splitlines()]
strip() is just an example; you can process each line as you please.
And if, in the end, you just want the concatenated text:
txt = ''.join(Path("to/the/file.txt").read_text().splitlines())
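To show what splitlines() does with different line endings, here is a tiny sketch on an in-memory string (the sample text is made up, so no file is needed):

```python
# One string mixing Windows (\r\n), old Mac (\r), and Unix (\n) line endings.
mixed = "ABC\r\nDEF\rGHI\n"

lines = mixed.splitlines()   # every line-ending style is recognized and removed
joined = "".join(lines)      # concatenate the lines with nothing between them

print(lines, joined)
```

By contrast, a plain replace('\n', '') on the same string would leave the '\r' characters in place.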
This works:
Change your file to:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
Then:
file = open("file.txt")
line = file.read()
words = line.split()
This creates a list named words that equals:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
That got rid of the "\n". To answer the part about the brackets getting in your way, just do this:
for word in words: # Assuming words is the list above
    print(word) # Prints each word in the file on a different line
Or:
print(words[0] + ",", words[1]) # The "+" joins the two strings with no space between them
# The comma between the arguments makes print insert a space
This prints:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
with open(player_name, 'r') as myfile:
    data = myfile.readline()
    words = data.split(" ")
    word = words[0]
This code reads the first line and then uses split to store the space-separated words of that line in a list.
Then you can easily access any word, or even store it in a string.
You can also do the same thing using a for loop.
file = open("myfile.txt", "r")
lines = file.readlines()
text = '' # string declaration
for i in range(len(lines)):
    text += lines[i].rstrip('\n') + ' '
print(text)
Try the following:
with open('data.txt', 'r') as myfile:
    data = myfile.read()
    sentences = data.split('\n')
    for sentence in sentences:
        print(sentence)
Caution: this does not remove the \n from data. It only prints the text line by line, as if there were no \n.
From an input file I'm supposed to extract only the first name of each student and then save the result in a new file called "student-firstname.txt". The output file should contain a list of first names (not including the middle name). I was able to remove the last name, but I'm having trouble removing the middle name. Any help or suggestions?
The student names in the file look something like this (last name, first name, and middle initial):
Martin, John
Smith, James W.
Brown, Ashley S.
My Python code is:
f=open("studentname.txt", 'r')
f2=open ("student-firstname.txt",'w')
str = ''
for line in f.readlines():
    str = str + line
    line=line.strip()
    token=line.split(",")
    f2.write(token[1]+"\n")
f.close()
f2.close()
f=open("studentname.txt", 'r')
f2=open ("student-firstname.txt",'w')
for line in f.readlines():
    token=line.split()
    f2.write(token[1]+"\n")
f.close()
f2.close()
Split token[1] with space.
fname = token[1].split(' ')[0]
with open("studentname.txt") as f, open("student-firstname.txt", 'w') as fout:
    for line in f:
        firstname = line.split()[1]
        fout.write(firstname + "\n")
Note:
- you could use a with statement to make sure that the files are always closed even in case of an exception. You might need contextlib.nested() on old Python versions
- 'r' is the default mode for files. You don't need to specify it explicitly
- .readlines() reads all lines at once. You could iterate over the file line by line directly
To avoid hardcoding the filenames you could use fileinput. Save it to firstname.py:
#!/usr/bin/env python
import fileinput

for line in fileinput.input():
    firstname = line.split()[1]
    print(firstname)
Example: $ python firstname.py studentname.txt >student-firstname.txt
Check out regular expressions. Something like this will probably work:
>>> import re
>>> nameline = "Smith, James W."
>>> names = re.match(r"(\w+),\s+(\w+).*", nameline)
>>> if names:
...     print(names.groups())
...
('Smith', 'James')
Line 3 basically says: find a sequence of word characters as group 1, followed by a comma, some whitespace characters, and another sequence of word characters as group 2, followed by anything else in nameline.
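As a rough sketch of how that regex could be wrapped up for the three sample names from the question (the function name first_name is invented here, and names containing hyphens or apostrophes would need a looser pattern than \w+):

```python
import re

# Group 1: last name, group 2: first name; anything after the first name is ignored.
NAME_RX = re.compile(r"(\w+),\s+(\w+)")

def first_name(line):
    """Return the first name from 'Last, First M.' or None if the line does not match."""
    m = NAME_RX.match(line)
    return m.group(2) if m else None

names = ["Martin, John", "Smith, James W.", "Brown, Ashley S."]
firsts = [first_name(n) for n in names]
print(firsts)
```

Returning None for lines that do not match lets the caller decide whether to skip or report them.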
f = open("file")
o = open("out","w")
for line in f:
    # take the part after the comma, then keep only its first word
    o.write(line.rstrip().split(",")[1].strip().split()[0]+"\n")
f.close()
o.close()