Writing multiple lines to a notepad file in Python

I am trying to write binary-encoded records to a notepad file, each separated by a newline. The gist of the code is as follows:
with open("filedir", "ab") as Afile:
    Afile.write(info + "\n")
However, the outputs are just being appended with no newlines separating them.

If you're writing to a binary file (like you say) and you want it to work properly on Windows (I'm assuming you're on Windows since you're talking about notepad), then you need to use the Windows line endings "\r\n". Given that you're trying to write line endings in the proper "encoding" I'd have to ask why you want to use binary mode, given that all it does is disable converting "\n" into "\r\n" on Windows.
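A minimal sketch of the fix described above, with explicit "\r\n" line endings in binary append mode (the filename and the info bytes here are placeholders for illustration):

```python
# In Python 3, binary mode requires bytes; in the Python 2 code from the
# question, plain strings were already bytes, so info + "\r\n" worked there.
info = b"some binary record"

with open("out.bin", "ab") as afile:
    afile.write(info + b"\r\n")  # explicit CRLF; binary mode never translates it

# Reading back in binary shows the CRLF was written verbatim.
with open("out.bin", "rb") as afile:
    data = afile.read()
```

Because binary mode does no translation, whatever terminator you write is exactly what lands in the file, so Notepad on Windows only sees line breaks if you write "\r\n" yourself.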

Related

How to disable universal newlines in Python 2.7 when using open()

I have a csv file that contains two different newline terminators (\n and \r\n). I want my Python script to use \r\n as the newline terminator and NOT \n. But the problem is that Python's universal newlines feature keeps normalizing everything to be \n when I open the file using open().
The strange thing is that it never used to normalize my newlines when I wrote this script, that's why I used Python 2.7 and it worked fine. But all of a sudden today it started normalizing everything and my script no longer works as needed.
How can I disable universal newlines when opening a file using open() (without opening in binary mode)?
You need to open the file in binary mode, as stated in the module documentation:
import csv

with open(csvfilename, 'rb') as fileobj:
    reader = csv.reader(fileobj)
From the csv.reader() documentation:
If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.
In binary mode no line separator translations take place.
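The advice above is for Python 2. In Python 3 the csv module instead wants text mode opened with newline='', which likewise disables newline translation so the reader can handle "\r\n" itself. A small sketch, assuming a throwaway filename:

```python
import csv

# newline='' turns off universal-newline translation, the Python 3
# equivalent of the Python 2 'rb' advice for the csv module.
with open("mixed.csv", "w", newline="") as f:
    f.write("a,b\r\nc,d\n")  # two rows with two different terminators

with open("mixed.csv", "r", newline="") as f:
    rows = list(csv.reader(f))  # csv handles both terminators itself

with open("mixed.csv", "r", newline="") as f:
    raw = f.readlines()  # terminators preserved, not normalized to '\n'
```

Here rows comes out as [['a', 'b'], ['c', 'd']] while the raw lines still carry their original "\r\n" and "\n" terminators.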

Python readline() fails when trying to read large (~ 13GB) csv file [duplicate]

This question already has answers here:
Unable to read huge (20GB) file from CPython
(2 answers)
Closed 9 years ago.
I'm a Python newbie and had a quick question regarding memory usage when reading large text files. I have a ~13GB csv I'm trying to read line-by-line, following the Python documentation and more experienced Python users' advice not to use readlines(), in order to avoid loading the entire file into memory.
When trying to read a line from the file I get the error below and am not sure what might be causing it. Besides this error, I also notice my PC's memory usage is excessively high. This was a little surprising since my understanding of the readline function is that it only loads a single line from the file at a time into memory.
For reference, I'm using Continuum Analytic's Anaconda distribution of Python 2.7 and PyScripter as my IDE for debugging and testing. Any help or insight is appreciated.
with open(r'C:\temp\datasets\a13GBfile.csv', 'r') as f:
    foo = f.readline()  # <-- Err: SystemError: ..\Objects\stringobject.c:3902 bad argument to internal function
UPDATE:
Thank you all for the quick, informative and very helpful feedback. I reviewed the referenced link, which describes exactly the problem I was having. After applying the documented 'rU' mode option I was able to read lines from the file as normal. I hadn't noticed this mode in the documentation link I was referencing initially, and neglected to look at the details of the open function first. Thanks again.
Unix text files end each line with \n.
Windows text files end each line with \r\n.
When you open a file in text mode, 'r', Python assumes it has the native line endings for your platform.
So, if you open a Unix text file on Windows, Python will look for \r\n sequences to split the lines. But there won't be any, so it'll treat your whole file as one giant 13-billion-character line. That readline() call then ends up trying to read the whole thing into memory.
The fix for this is to use universal newlines mode, by opening the file in mode rU. As explained in the docs for open:
supplying 'U' opens the file as a text file, but lines may be terminated by any of the following: the Unix end-of-line convention '\n', the Macintosh convention '\r', or the Windows convention '\r\n'.
So, instead of searching for \r\n sequences to split the lines, it looks for \r\n, or \n, or \r. And there are millions of \n. So, the problem is solved.
A different way to fix this is to use binary mode, 'rb'. In this mode, Python doesn't do any conversion at all, and assumes all lines end in \n, no matter what platform you're on.
On its own, this is pretty hacky—it means you'll end up with an extra \r on the end of every line in a Windows text file.
But it means you can pass the file on to a higher-level file reader like csv that wants binary files, so it can parse them the way it wants to. On top of magically solving this problem for you, a higher-level library will also probably make the rest of your code a lot simpler and more robust. For example, it might look something like this:
import csv

with open(r'C:\temp\datasets\a13GBfile.csv', 'rb') as f:
    for row in csv.reader(f):
        pass  # do stuff with the list of column values in row
Now each row is automatically split on commas, except that commas that are inside quotes or escaped in the appropriate way don't count, and so on, so all you need to deal with is a list of column values.
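A quick sketch of that quoting behavior, using an in-memory line (the sample address is made up for illustration):

```python
import csv
import io

# The comma inside the quoted field is data, not a separator.
line = 'Smith,"123 Main St, Apt 4",NY\n'
row = next(csv.reader(io.StringIO(line)))
# row -> ['Smith', '123 Main St, Apt 4', 'NY']
```

Splitting that line on commas by hand would give four fields; csv.reader correctly gives three.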

How to change automatically the type of the excel file from Tab space separated Text to xls file?

I have an Excel file whose extension is .xls but whose actual type is tab-separated text.
When I try to open the file in MS Excel, it warns me that the file format doesn't match the extension, so I have to confirm that I trust the file before I can read it.
But my real problem is that when I try to read my file by the xlrd library it gives me this message :
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record;
And so to resolve this problem, I go to Save as in MS Excel and I change the type manually to .xls.
But my boss insists that I do this in code. I have 3 choices: a shell script under Linux, a .bat file under Windows, or Python.
So, how can I change the type of the excel file from Tab space separated Text to xls file by Shell script (command line), .bat or Python?
mv file.{xls,csv}
It's a csv file, stop treating it as an excel file and things will work a lot better. :) There are nice csv manipulation tools available in most languages. Do you really need the excel library?
The real type of the file is dictated by the contents of the file, not the name of it. xlrd doesn't care about the name at all, it cares about the contents, so xlrd is not your problem, and it's not even relevant to your task.
I don't know what you mean by "tab space separated text". Are the values separated by '\t ' (a tab character followed by a space character)? Sometimes tabs and sometimes spaces?
If the separator is constant, just use Python's csv module. If the separator is whitespace and the data does not contain whitespace, then you can use Python's split() string method. If the separator varies and can appear in the data, then you will have to write something fancier to parse it.
In any case, once you have read the data, to write out a real .xls file, your best Python option is the xlwt module.
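A minimal sketch of the first half of that pipeline, assuming the separator is a plain tab; the filename and sample data are placeholders, and the xlwt step is only outlined in the comments:

```python
import csv

# Create a stand-in for the mislabeled "tab-separated .xls" file.
with open("report.xls.txt", "w", newline="") as f:
    f.write("name\tqty\nwidget\t3\n")

# Parse it with the csv module and a tab delimiter.
with open("report.xls.txt", newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))

# To produce a real .xls, you would then loop over rows with xlwt:
#   book = xlwt.Workbook(); sheet = book.add_sheet("data")
#   sheet.write(r, c, value) for each cell, then book.save("out.xls")
```

The parsed rows come back as lists of strings ([['name', 'qty'], ['widget', '3']] here), ready to be written cell by cell.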

Reading file in Python one line at a time

I do appreciate that this question has been asked a million times, but I can't figure out why, while attempting to read a .txt file line by line, I get the entire file read in one go.
This is my little snippet
num = 0
with open(inStream, "r") as f:
    for line in f:
        num += 1
        print line + " ..."
print num
Having a look at the open function, there isn't anything that suggests a second param to limit the reading, as that is just the "mode" to open the file.
So I can only guess there is some problem with my file, but it's a .txt file with one entry per line.
Any hint?
Without a little more information, it's hard to be absolutely sure… but most likely, your problem is inappropriate line endings.
For example, on a modern Mac OS X system, lines in text files end with '\n' newline characters. So, when you do for line in f:, Python breaks the text file on '\n' characters.
But on classic Mac OS 9, lines in text files ended with '\r' instead. If you have some ancient classic Mac text files lying around, and you give one to Python, it will go looking for '\n' characters and not find any, so it'll think the whole file is one giant line.
(Of course in real life, Windows is a problem more often than classic Mac OS, but I used this example because it's simpler.)
Python 2: Fortunately, Python has a feature called "universal newlines". For full details, see the link, but the short version is that adding "U" onto the end of the mode when opening a text file means Python will read any of the three standard line-ending conventions (and give them to your code as Unix-style '\n').
In other words, just change one line:
with open(inStream, "rU") as f:
Python 3: Universal newlines are part of the standard behavior; adding "U" has no effect and is deprecated.
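To see the Python 3 behavior concretely, here's a small sketch with a classic-Mac-style file written on the fly (the filename is a placeholder):

```python
# Write a file with bare '\r' line endings, classic Mac OS 9 style.
with open("oldmac.txt", "wb") as f:
    f.write(b"first\rsecond\rthird")

# Python 3 text mode applies universal newlines by default, so the
# '\r' terminators are recognized and translated to '\n' on read.
with open("oldmac.txt", "r") as f:
    lines = [line.rstrip("\n") for line in f]
# lines -> ['first', 'second', 'third']
```

Without universal newlines, that file would come back as one giant line, which is exactly the symptom in the question.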

How do I stop having carriage returns added to my file output in Python?

I'm trying to parse a BMP file, do some changes and then reassemble a new BMP using Python.
The carriage return seems to be a huge problem. When I open the bitmap file using Notepad++ and search for "\r", the character does not exist. I read the file in Python (readData = fileIn.read()) and try searching using readData.find('\r'); it returns -1. Searching for "\n" works fine. All is good for now.
When I try to write this exact same block of text into a new BMP using fileOut.write(readData) and I use Notepad++ to search for "\r", I am able to find it (twice, each corresponding to the preexisting "\n" characters).
Is there a way to write this block of data to a new BMP without "\r" being added automatically? I've tried applying .strip() and .replace('\r','') to the string before writing it to the new file.
You're probably opening the file as text (the default) when you want to open it as binary.
open("example.bmp", "rb") # to [r]ead as [b]inary
open("example.bmp", "wb") # to [w]rite as [b]inary
From the documentation:
The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability.
You are opening the file in text mode, while you need binary mode. Find more about open() here:
http://docs.python.org/library/functions.html
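A minimal sketch of the binary round-trip, with a made-up stand-in for the BMP data (real BMP parsing is beside the point here):

```python
# Binary data containing '\n' bytes, like the ones found in the question.
data = b"BM\x00\x01\nheader\nrest-of-bitmap"

with open("copy.bmp", "wb") as out:
    out.write(data)  # 'wb': no newline translation on write

with open("copy.bmp", "rb") as back:
    round_tripped = back.read()  # 'rb': no translation on read either

# round_tripped == data: no '\r' has been inserted before the '\n' bytes
```

In text mode on Windows, each of those '\n' bytes would have come back with a '\r' in front of it, which is exactly the corruption described in the question.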
