Binary text isn't found in a binary file - python

I opened a file in binary mode. I need to find a certain string inside this file and print the line after that. However, the string doesn't appear to be found in the text file. I looked into the text file manually, and the string is definitely found on one line.
I tried opening the file as a text file (not in binary mode) and searching for a plain, non-bytes string, but that gave an error, which the answer to another question resolved. That answer led to the current code below.
import os

line_number = 0  # the counter needs to exist before the loop
with open(os.path.join(directory, filename), 'rb') as read_obj:
    # print(read_obj.read())
    for line in read_obj:
        line_number += 1
        if b"PREPARED FOR" in line:
            break

print(line_number)

OK, so apparently .readlines() worked. I just had to read all the lines, loop through them, find the one that contained the string, take that index, and add one to get the next line.
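A minimal sketch of that approach might look like the following (the directory, file name, and search string are placeholders taken from the question):

import os

directory = "."          # placeholder
filename = "report.txt"  # placeholder

with open(os.path.join(directory, filename), 'rb') as read_obj:
    lines = read_obj.readlines()

for index, line in enumerate(lines):
    if b"PREPARED FOR" in line:
        if index + 1 < len(lines):    # make sure a next line exists
            print(lines[index + 1])   # the line right after the match
        break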

Python script not finding value in log file when the value is in the file

The code below is meant to find any xls or csv file used in a process. The .log file contains full paths with extensions and definitely contains multiple values with "xls" or "csv". However, Python can't find anything... Any idea? The weird thing is that when I copy the content of the log file, paste it into another Notepad file, and save it as a .log file, it works.
infile = r"C:\Users\me\Desktop\test.log"
important = []
keep_words = ["xls", "csv"]

with open(infile, 'r') as f:
    for line in f:
        for word in keep_words:
            if word in line:
                important.append(line)

print(important)
I was able to figure it out...encoding issue...
with io.open(infile,encoding='utf16') as f:
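Putting it together, a corrected version of the original loop might look like this (assuming the log really is UTF-16; adjust the encoding if yours differs):

import io

infile = r"C:\Users\me\Desktop\test.log"
important = []
keep_words = ["xls", "csv"]

# Open with the encoding the log was actually written in.
with io.open(infile, encoding='utf16') as f:
    for line in f:
        for word in keep_words:
            if word in line:
                important.append(line)

print(important)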
You must change the line
for line in f:
to
for line in f.readlines():
With your original code, Python was searching the opened file object itself rather than its content, that is, its lines (as a list, just like the readlines() method returns).
I hope I was able to help (sorry about my bad English).

Python - failing to read correctly the first line of a text file to a list

I'm having a problem understanding why my Python program does what it does when reading (first) lines from files and adding those lines to a list. For some reason the first line needs to be empty or it won't be read correctly, and even when the first line is empty, Python doesn't treat it as empty.
The thing is, I have two types of files:
First file is in the form:
text:more text
another text:and more
and the second file in the form:
text_file.txt
anothertext_file.txt
Both files are UTF-8 encoded text files. The first line that gets added to a list in my program is "text" for the first file and "text_file.txt" for the second, but any code that, for example, tries to say
if something == "text":
...
will not get executed even if the "something" is the same as the "text".
So I'm assuming that my problem is that somewhere in the machine code (or something), my computer writes some invisible code in the beginning of the text file and that makes the first line not what it is. Maybe? I have actually found a solution for the problem simply by adding an empty line and an if clause when reading the file line by line:
if not "." in line:
...
and in the other filetype:
if not ":" in line:
...
Those if clauses work and my program does what it's supposed to (as long as I always add an empty line to the beginning of the file), but I haven't been able to find the real reason why my program behaves this way. Also, I would rather not use this kind of workaround if there's an easier solution that doesn't involve editing all my files and adding if clauses to my code.
Would appreciate any help understanding what's happening here!
Edit: as you people have been asking for my code, here it is:
filelist = []
with open("filename.txt", "r", encoding="UTF-8") as f:
    for line in f:
        filelist.append(line.rstrip("\n"))
This does not work properly. Also I tried it like mxds said,
filelist = []
with open("filename.txt", "r", encoding="UTF-8") as f:
    lines = f.readlines()
    for line in lines:
        filelist.append(line.rstrip("\n"))
and this does not work either. The problem only affects the first character of the first line of the file.
Edit2:
It seems the problem is a byte order mark (BOM) at the beginning of my text files. After some quick googling I didn't find out how to remove it. I'm creating my files with plain Windows Notepad.
Final edit:
Apparently Notepad is not a real text editor. I guess I'll just swap over from Notepad to Notepad++ to avoid this problem. However, just in case I still have to handle my files in Notepad: if I open a text file in Notepad and add some text to it, will it add a BOM, or does that only happen when the file is created?
Looks like you've already done the legwork on this, but according to How to make Notepad to save text in UTF-8 without BOM?, the best answer is not to use Notepad (but Notepad++ is ok). :)
Alternatively, you can strip the BOM in Python with:
line = line.decode("utf-8-sig").encode("utf-8")
See https://docs.python.org/3/library/codecs.html:
To increase the reliability with which a UTF-8 encoding can be
detected, Microsoft invented a variant of UTF-8 (that Python 2.5 calls
"utf-8-sig") for its Notepad program: Before any of the Unicode
characters is written to the file, a UTF-8 encoded BOM (which looks
like this as a byte sequence: 0xef, 0xbb, 0xbf) is written.
...
On decoding utf-8-sig will skip those three bytes if they appear as the first three bytes in the file. In UTF-8, the use of the BOM is discouraged and should generally be avoided.
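In Python 3, where the file is opened in text mode anyway, it is usually simpler to let the codec strip the BOM at open time. Adapting the code from the question, that could look like:

filelist = []
# "utf-8-sig" also reads plain UTF-8; it just drops a leading BOM if one is present.
with open("filename.txt", "r", encoding="utf-8-sig") as f:
    for line in f:
        filelist.append(line.rstrip("\n"))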
A classic approach to reading text files in Python is:
with open(fname, 'r') as f:
    lines = f.readlines()
After which you can process the lines like this:
for line in lines:
    # do something with line...
As other comments have hinted, you may want to make sure this works first. It would help if you post your current code for review.
I just had a similar issue: readlines() reported invalid characters at the start of the first line, something like "" (the BOM bytes displayed in the wrong encoding). I tried every suggestion I could google, with no luck.
I came up with a simple trick: add a blank line as the first line of the text file, then skip that short first line when processing:
if len(lines[i]) > len(lines[0]):
    # do things
else:
    # skip this line
In my case len(lines[0]) was 4, and all the other lines were longer than 4.

Simple way to add text at the beginning of a script (file) in Python

I am a Python beginner and my next project is a program in which you enter the details of your program and then select the file (I'm using Tkinter), and then the program will format the details and write them to the start of the file.
I know that you'd have to 'rewrite' the file and that a temporary file is probably the way to go. I just want to know simple ways to add text to the beginning of a file.
Thanks.
To add text to the beginning of a file, you can (1) open the file for reading, (2) read the file, (3) open the file for writing and overwrite it with (your text + the original file text).
formatted_text_to_add = 'Sample text'

# Read the original contents (text mode, so plain str concatenation works).
with open('userfile', 'r') as f:
    filetext = f.read()

newfiletext = formatted_text_to_add + '\n' + filetext

# Overwrite the file with the new text followed by the original text.
with open('userfile', 'w') as f:
    f.write(newfiletext)
This requires two I/O operations and I'm tempted to look for a way to do it in one pass. However, prior answers to similar questions suggest that trying to write to the beginning or middle of a file in Python gets complicated quite quickly unless you bite the bullet and overwrite the original file with the new text.
If I understand what you're asking, I believe you're looking for what's called a project skeleton. This link handles it pretty well.
This probably won't solve your exact problem, as you will need to know in advance the exact number of bytes you'll be adding to the beginning of the file.
# Put some text in the file
f = open("tmp.txt", "w")
print("123456789", file=f)
f.close()
# Open the file in read/write mode
f = open("tmp.txt", "r+")
f.seek(0) # reposition the file pointer to the beginning of the file
f.write('abc') # use write to avoid writing new lines
f.close()
When you reposition the file pointer using seek, you can overwrite the bytes that are already stored at that position. You can't, however, "insert" text, pushing existing bytes ahead to make room for new data. When I said you would need to know the exact number of bytes,
I meant you would have to "leave room" for the text at the beginning of the file. Something like:
f = open("tmp.txt", "w")
f.write("\0\0\0456789")
f.close()
# Some time later...
f = open("tmp.txt", "r+")
f.seek(0)
f.write('123')
f.close()
For text files, this can work if you leave a "blank" line of, say, 50 spaces at the beginning of the file. Later, you can go back and overwrite up to 50 bytes (the newline being byte 51)
without overwriting following lines. Of course, you can leave multiple lines at the beginning. The point is that you can't grow or shrink your reserved block of lines to be overwritten. There's nothing special about the newline in a file, other than that it is treated specially by file methods like read and readline for splitting blocks of data into separate strings.
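For example, a rough sketch of that reserved-line idea (the 50-character width and the header text are just illustrative):

# Write the file with a reserved first line of 50 spaces.
with open("tmp.txt", "w") as f:
    f.write(" " * 50 + "\n")
    f.write("real data line\n")

# Some time later, overwrite the reserved line in place.
header = "PREPARED FOR ..."          # must fit within the 50 characters
with open("tmp.txt", "r+") as f:
    f.seek(0)
    f.write(header.ljust(50)[:50])   # pad/trim to exactly 50 characters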
To add one or more lines of text to the beginning of a file, without overwriting what's already present, you'll have to use the "read the old file, write to a new file" solution outlined in other answers.

editing a single .txt line in python 3.1

i have some data stored in a .txt file in this format:
----------|||||||||||||||||||||||||-----------|||||||||||
1029450386abcdefghijklmnopqrstuvwxy0293847719184756301943
1020414646canBeFollowedBySpaces 3292532113435532419963
don't ask...
I have many lines of this, and I need a way to add more digits to the end of a particular line.
I've written code to find the line I want, but I'm stumped as to how to add 11 characters to the end of it. I've looked around; this site has been helpful with some other issues I've run into, but I can't seem to find what I need for this.
It is important that the line retain its position in the file and its contents in their current order.
Using Python 3.1, how would you turn this:
1020414646canBeFollowedBySpaces 3292532113435532419963
into
1020414646canBeFollowedBySpaces 329253211343553241996301846372998
As a general principle, there's no shortcut to "inserting" new data in the middle of a text file. You will need to make a copy of the entire original file in a new file, modifying your desired line(s) of text on the way.
For example:
with open("input.txt") as infile:
with open("output.txt", "w") as outfile:
for s in infile:
s = s.rstrip() # remove trailing newline
if "target" in s:
s += "0123456789"
print(s, file=outfile)
os.rename("input.txt", "input.txt.original")
os.rename("output.txt", "input.txt")
Check out the fileinput module; it can do a sort of "in place" edit on files, though I believe temporary files are still involved in the internal process.
import fileinput

for line in fileinput.input('input.txt', inplace=1, backup='.orig'):
    if line.startswith('1020414646canBeFollowedBySpaces'):
        line = line.rstrip() + '01846372998' + '\n'
    print(line, end='')
The print now prints to the file instead of the console.
You might want to back up your original file before editing.
target_chain = b'1020414646canBeFollowedBySpaces 3292532113435532419963'
to_add = b'01846372998'

with open('zaza.txt', 'rb+') as f:
    ch = f.read()
    x = ch.find(target_chain)
    f.seek(x + len(target_chain), 0)     # jump to just after the target
    f.write(to_add)                      # write the new digits...
    f.write(ch[x + len(target_chain):])  # ...then re-write the shifted tail
With this method it is essential to open the file in binary mode ('b'), because of the way Python treats line endings in text mode (universal newlines, enabled by default); the literals must be bytes for the same reason.
The '+' in the mode allows writing as well as reading.
With this method, whatever comes before target_chain in the file remains untouched, and whatever comes after it is shifted forward. As Greg Hewgill said, there is no way to push bytes apart on a hard disk to insert new bytes in the middle.
Obviously, if the file is very big, reading its whole content into ch could consume too much memory, and the algorithm would then have to change: read line by line until the line containing target_chain, read the next line before writing the addition, and then keep alternating "read the next line, re-write the current one" until the end of the file, so that the content after the modified line is shifted progressively.
You see what I mean...
Copy the file, line by line, to another file. When you get to the line that needs the extra characters, add them before writing.
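One way to sketch that, using the standard tempfile module so the original file is only replaced once the copy has been fully written (the file name and the added digits are placeholders):

import os
import tempfile

to_add = "01846372998"

# Copy line by line into a temporary file in the same directory,
# then swap it in place of the original.
with open("input.txt") as infile, tempfile.NamedTemporaryFile(
        "w", dir=".", delete=False) as outfile:
    for line in infile:
        if line.startswith("1020414646canBeFollowedBySpaces"):
            line = line.rstrip("\n") + to_add + "\n"
        outfile.write(line)

os.replace(outfile.name, "input.txt")  # Python 3.3+; on 3.1, fall back to the two os.rename calls shown above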

question about splitting a large file

Hey I need to split a large file in python into smaller files that contain only specific lines. How do I do this?
You're probably going to want to do something like this:
big_file = open('big_file', 'r')
small_file1 = open('small_file1', 'w')
small_file2 = open('small_file2', 'w')

for line in big_file:
    if 'Charlie' in line: small_file1.write(line)
    if 'Mark' in line: small_file2.write(line)

big_file.close()
small_file1.close()
small_file2.close()
Opening a file for reading returns an object that allows you to iterate over the lines. You can then check each line (which is just a string of whatever that line contains) for whatever condition you want, then write it to the appropriate file that you opened for writing. It is worth noting that when you open a file with 'w' it will overwrite anything already written to that file. If you want to simply add to the end, you should open it with 'a', to append.
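For example, if the output file might already exist and you want to keep its contents, open it in append mode instead (the file name here is just the one from the snippet above):

# 'a' creates the file if needed and appends instead of truncating.
with open('small_file1', 'a') as out:
    out.write('one more matching line\n')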
Additionally, if you expect there to be some possibility of error in your reading/writing code, and want to make sure the files are closed, you can use:
with open('big_file', 'r') as big_file:
    <do stuff prone to error>
Do you mean breaking it down into subsections? Like if I had a file with chapter 1, chapter 2, and chapter 3, you want it to be broken down into separate files for each chapter?
The way I've done this is similar to Wilduck's response, but closes the input file as soon as it reads in the data and keeps all the lines read in.
data_file = open('large_file_name', 'r')
lines = data_file.readlines()
data_file.close()

outputFile = open('output_file_one', 'w')
for line in lines:
    if 'SomeName' in line:
        outputFile.write(line)
outputFile.close()
If you wanted to have more than one output file you could either add more loops or open more than one outputFile at a time.
I'd recommend using Wilduck's response, however, as it uses less memory with larger files, since it never holds the whole file in memory at once.
How big is the file, and does it need to be done in Python? If this is on Unix, would split/csplit/grep suffice?
First, open the big file for reading.
Second, open all the smaller file names for writing.
Third, iterate through every line. Every iteration, check to see what kind of line it is, then write it to that file.
More info on File I/O: http://docs.python.org/tutorial/inputoutput.html
