Using Python to append to a CSV file, I get data on every other row.
How do I fix this?
import csv
LL = [(1,2),(3,4)]
Fn = ("C:\Test.csv")
w = csv.writer(open(Fn,'a'), dialect='excel')
w.writerows(LL)
C:\test.csv when opened looks like this:
1,2
3,4
1,2
3,4
Appending is irrelevant to the problem; notice that the first two rows (those from the original file) are also double-spaced.
The real problem is that you have opened your file in text mode.
CSV is a binary format, believe it or not. The csv module writes the misleadingly named "lineterminator" (it should be "rowseparator") as \r\n, as expected, but then the Windows C runtime kicks in and replaces the \n with \r\n, so you end up with \r\r\n between rows. When you "open" the CSV file with Excel, it becomes confused.
Always open your CSV files in binary mode ('rb', 'wb', 'ab'), whether you are operating on Windows or not. That way, you will get the expected row separator (CR LF) even on *nix boxes, your code will be portable, and any linefeeds embedded in your data won't be changed into something else (on writing) or cause dramas (on input, provided of course they're quoted properly).
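For example, a minimal sketch of the fix applied to the code above (Python 2; the path is illustrative):
import csv
LL = [(1, 2), (3, 4)]
# Binary append mode: the csv module emits \r\n itself, and 'ab' stops
# the Windows C runtime from turning that into \r\r\n.
f = open(r'c:\data\test.csv', 'ab')
w = csv.writer(f, dialect='excel')
w.writerows(LL)
f.close()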
Other problems:
(1) Don't put your data in your root directory (C:\). Windows inherited a hierarchical file system from MS-DOS in the 1980s. Use it.
(2) If you must embed hard-wired filenames in your code, use raw strings r"c:\test.csv" ... if you had "c:\test.csv" the '\t' would be interpreted as a TAB character; similar problems with \r and \n
(3) The examples in the Python manual are aligned more towards brevity than robust code.
Don't do this:
w = csv.writer(open('foo.csv', 'wb'))
Do this:
f = open('foo.csv', 'wb')
w = csv.writer(f)
Then when you are finished, you have f available so that you can do f.close() to ensure that your file contents are flushed to disk. Even better: read up on the new with statement.
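For instance, a sketch of the same write using with (the filename and rows are placeholders):
import csv
rows = [(1, 2), (3, 4)]  # placeholder data
with open('foo.csv', 'wb') as f:  # Python 2: binary mode for csv
    w = csv.writer(f)
    w.writerows(rows)
# f is closed automatically here, even if an exception was raised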
I have encountered a similar problem when appending to an already-created CSV file while running on Windows.
As above, writing and appending in binary mode avoids the extra line being added to each row written or appended by the Python script. Therefore:
w = csv.writer(open(Fn, 'ab'), dialect='excel')
Related
I am trying to create a binary file (called textsnew) and then append two previously created binary files to it. When I print the resulting textsnew, it only shows the first file appended to it, not the second one. I do, however, see that the size of the new file (textsnew) is the sum of the two appended files. Maybe I'm opening it incorrectly? This is my code:
with open("/path/textsnew", "ab") as myfile, open("/path/names", "rb") as file2:
myfile.write(file2.read())
with open("/path/textsnew", "ab") as myfile, open("/path/namesthree", "rb") as file2:
myfile.write(file2.read())
This code is for reading the file:
import pickle
infile1 = open('/path/textsnew','rb')
names1 = pickle.load(infile1)
print (names1)
Open the new file and write its data.
Then, while the new file is still open (in append mode), open the second file, read its data and immediately write that data to the first file.
Then repeat the procedure for the third file.
Everything in binary, of course, although it will work just as well with text files; Linux/macOS/*nix don't really care.
Note also that a single pickle.load call returns only the first pickled object in a file, so printing textsnew that way will show only the first file's contents even though the file sizes add up.
This also assumes that reading the full file contents in one go is acceptable, as in your question; otherwise you would need a loop around the read/write parts (see the sketch after the code below).
with open('/path/textsnew', 'ab') as fpout:
    fpout.write(data)  # 'data' is the new file's own initial content (not defined in this snippet)
    with open('/path/names', 'rb') as fpin:
        fpout.write(fpin.read())
    with open('/path/namesthree', 'rb') as fpin:
        fpout.write(fpin.read())
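If the inputs might be too large to read in one go, a chunked variant (a sketch; the 64 KiB chunk size is an arbitrary choice):
with open('/path/textsnew', 'ab') as fpout:
    with open('/path/names', 'rb') as fpin:
        while True:
            chunk = fpin.read(64 * 1024)  # read up to 64 KiB at a time
            if not chunk:  # empty result means end of file
                break
            fpout.write(chunk)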
I'm reading a csv file with python. I am skipping the first row, which is simply descriptive metadata. This is what I'm doing:
f = open(in_file)
#skip the first row
next(f)
#...some data processing
This works fine, but when the first row contains a cell with an embedded newline character, for example:
some random cell
with a new line
the next(f) call returns everything up to and including this cell, ending with 'some random cell\n', and doesn't remove any further cells in the first row. Why is this happening, and how can I ensure the entire first row is skipped regardless of newline characters in the cells?
You are dealing with a very basic and general problem (that's why you were downvoted, I guess): in modern operating systems, files are not typed; their content is just a sequence of bytes, and the meaning of those bytes is given by the applications (the binary vs. text distinction is an ancient one that survives in Windows). This crucial and basic property of the operating system is masked by the desktop environment (Windows, GNOME, KDE, Finder, ...): I click on a ".csv" file and the desktop opens Calc (or Excel); I click on a ".exe" file and Windows launches the program. But that's just convention. At the OS level, the content of a file is just bytes, nothing more. There is a very good reason for that: typed files at the OS level would help you for one week, and you'd have to struggle against them for the rest of your life.
Back to your question: Python won't decide for you that your "xyz.csv" file should be opened with some specific care. It opens the file and lets you read it as bytes or characters, and you have to handle the content yourself. Luckily, Python comes with "batteries included" and provides the csv module to wrap your file:
import csv
with open(path, 'r', encoding='...', newline='') as f:  # set the encoding of the file, e.g. utf-8; newline='' lets csv handle embedded newlines
    reader = csv.reader(f)  # you may set the delimiter, quote char, etc.
    for row in reader:
        ...  # do what you want with each row
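To skip the metadata row, call next() on the reader rather than on the raw file object; the reader treats a quoted cell with an embedded newline as part of the same logical row. A sketch, assuming a UTF-8 file named in_file.csv:
import csv
with open('in_file.csv', 'r', encoding='utf-8', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skips the whole first logical row, embedded newlines included
    for row in reader:
        ...  # process the remaining data rows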
I'm re-writing an older script which generates a lot of temporary files for saving and exchanging information/data between functions. I want to keep them as variables, to avoid the overhead of generating files.
My problem: I encountered a function in which two files are merged on a binary level using this code:
with open(first_file, "ab") as file1, open(second_file, "rb") as file2:
file1.write(file2.read())
I would like to do the same, using strings and the '.join' function like this:
first_file = ''.join([first_file, second_file])
My question: is the .join function equivalent to 'read binary'? Or does the 'read binary' mode even apply to .join?
The data I'm working on is binary, so the simple 'read' command would potentially alter the contents.
So far I found this info in the official Python documentation:
Python on Windows makes a distinction between text and binary files;
the end-of-line characters in text files are automatically altered
slightly when data is read or written. This behind-the-scenes
modification to file data is fine for ASCII text files, but it’ll
corrupt binary data like that in JPEG or EXE files.
Making a small test:
a.txt contains 'Hello' and b.txt contains 'World'.
with open('a.txt', "ab") as file1, open('b.txt', "rb") as file2:
file1.write(file2.read())
Now a.txt contains 'HelloWorld'.
Checking with the other snippet, after changing a.txt back to 'Hello':
with open('a.txt', "rb") as file1, open('b.txt', "rb") as file2:
first_file = file1.read()
second_file = file2.read()
first_file = b''.join([first_file, second_file])
with open('a.txt', 'wb') as fp:
fp.write(first_file)
Now the content of a.txt is again 'HelloWorld', so the two methods are equivalent (with respect to the result at least).
Obviously, though, the first method is more compact.
Read-binary is somewhat similar to using r"somestring" to indicate raw strings - the underlying file is binary, you're just telling Python to skip trying to decode the binary data into ASCII or UTF-8 or what-have-you characters.
So, the mode doesn't really apply here.
Since join operates on strings, you'd need to open file A and read it into a string, then do the same for B, whereas the original code only needs to read B and seek to the end of file A to start writing. So you're not really getting much mileage out of str.join, and you're actually using more memory.
If you want to optimize, make a loop that reads B piece by piece (line by line for text, or in fixed-size chunks for binary) and writes each piece as it goes; that way you hold only one piece of B in memory at a time rather than dumping the whole file into it at once.
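The standard library already provides such a loop; a sketch using shutil.copyfileobj (the file names are placeholders):
import shutil
# Appends B to A by copying fixed-size chunks, never holding all of B in memory.
with open('file_a.bin', 'ab') as dst, open('file_b.bin', 'rb') as src:
    shutil.copyfileobj(src, dst)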
Here is the issue: I have recently switched from Windows to Ubuntu, and some of my Python scripts for analyzing data files give me errors that I am unsure how to address correctly.
The data files from my current instrumentation output something like this:
[Header]
Various information w.r.t the instrument etc.
[Data]
Status,Code,Temperature,Field, etc.........
0,0,300, 0.013, etc...
So basically, this snippet of code is meant to read the data file, skip everything from [Header] to [Data], and start reading the real data at the appropriate line regardless of how the header is arranged, since different instruments have different headers.
f = open('file.dat')
lines = f.readlines()
i = 0
while lines[i] != "[Data]\n":
    i += 1
i = i + 2
This code runs fine on Windows, but on Ubuntu the value of i always ends up as the total number of lines in the data file, so I know the issue is in the handling of the "[Data]\n" line. Thanks for any help.
If you open a file in the default text mode, \r\n is translated to \n when read on Windows. On Linux this translation doesn't happen, and your data file most likely has \r\n line endings (especially if it was created on Windows), so the comparison with "[Data]\n" never matches. Use universal newline mode instead:
open(filename, 'rU')
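Applied to the snippet above (a sketch; 'rU' is the Python 2 spelling, while Python 3's text mode applies universal newlines by default, so a plain open(filename) suffices there):
# Universal-newline mode normalizes '\r\n' (and bare '\r') to '\n' on read,
# so the comparison against "[Data]\n" works on both Windows and Ubuntu.
f = open('file.dat', 'rU')
lines = f.readlines()
f.close()
i = 0
while lines[i] != "[Data]\n":
    i += 1
i = i + 2  # skip the "[Data]" line and the column-header line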
How can I tell Python to open a CSV file, and merge all columns per line, into new lines in a new TXT file?
To explain:
I'm trying to download a bunch of member profiles from a website, for a research project. To do this, I want to write a list of all the URLs in a TXT file.
The URLs are akin to this: website.com-name-country-title-id.html
I have written a script that takes all these bits of information for each member and saves them in columns (name/country/title/id), in a CSV file, like this:
mark japan rookie married
john sweden expert single
suzy germany rookie married
etc...
Now I want to open this CSV and write a TXT file with lines like these:
www.website.com/mark-japan-rookie-married.html
www.website.com/john-sweden-expert-single.html
www.website.com/suzy-germany-rookie-married.html
etc...
Here's the code I have so far. As you can probably tell I barely know what I'm doing so help will be greatly appreciated!!!
import csv
x = "http://website.com/"
y = ".html"
csvFile = csv.DictReader(open("NameCountryTitleId.csv"))  # This file is stored on my computer
file = open("urls.txt", "wb")
for row in csvFile:
    strArgument = str(row['name']) + "-" + str(row['country']) + "-" + str(row['title']) + "-" + str(row['id'])
    try:
        file.write(x + strArgument + y)
    except:
        print(strArgument)
file.close()
I don't get any error messages after running this, but the TXT file is completely empty.
Rather than using a DictReader, use a regular reader to make it easier to join the row:
import csv
url_format = "http://website.com/{}.html"
csv_file = 'NameCountryTitleId.csv'
urls_file = 'urls.txt'
with open(csv_file, 'rb') as infh, open(urls_file, 'w') as outfh:
    reader = csv.reader(infh)
    for row in reader:
        url = url_format.format('-'.join(row))
        outfh.write(url + '\n')
The with statement ensures the files are closed properly when the block completes.
Further changes I made:
In Python 2, open CSV files in binary mode; the csv module handles line endings itself, because correctly quoted column data can contain embedded newlines.
Regular text files should still be opened in text mode, though.
When writing lines to a file, remember to add a newline character to separate the lines.
A string format (str.format()) is far more flexible than string concatenation.
str.join() lets you join a sequence of strings together with a separator.
It's actually quite simple: you are working with strings, yet the file you are opening to write to is opened in bytes mode, so every single write fails and the URL is printed to the screen instead. Try changing this line:
file = open("urls.txt", "wb")
to this:
file = open("urls.txt", "w")
EDIT:
I stand corrected. However, I would like to point out that in the absence of newlines or some other separator, how do you intend to use the URLs later on? If you put newlines between the URLs, they will be easy to recover.
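Putting both points together, a sketch of the corrected writing code (names follow the question's code, and the column names are assumed to match the CSV header):
import csv
x = "http://website.com/"
y = ".html"
# Text mode ('w'), since we write str objects; the trailing "\n" keeps
# one URL per line so the URLs are easy to recover later.
with open("NameCountryTitleId.csv") as csv_in, open("urls.txt", "w") as txt_out:
    for row in csv.DictReader(csv_in):
        strArgument = "-".join([row['name'], row['country'], row['title'], row['id']])
        txt_out.write(x + strArgument + y + "\n")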