Here is the issue: I have recently switched from Windows to Ubuntu, and some of my Python scripts for analyzing data files give me errors that I am unsure how to address correctly.
The data files from my current instrumentation output something like this:
[Header]
Various information w.r.t the instrument etc.
[Data]
Status,Code,Temperature,Field, etc.........
0,0,300, 0.013, etc...
So basically, this snippet of code is meant to read the data file, parse out all the information from [Header] to [Data], and start reading the real data at the appropriate lines regardless of how the header is arranged, since different instruments have different headers.
f = open('file.dat')
lines = f.readlines()
i = 0
while (lines[i] != "[Data]\n"):
    i += 1
i = i + 2
This code runs fine on Windows, but on Ubuntu the value of i always ends up as the total number of lines in the data file. So I know the issue is in the handling of the "[Data]\n" line. Thanks for any help.
If you open a file in the default text mode, on Windows \r\n is translated to \n when read. On Linux this doesn't happen. Your data file likely has \r\n line endings, especially if it was created on Windows. Use universal newline mode instead:
open(filename, 'rU')
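For example, a minimal sketch of the header scan using universal newlines (note: 'rU' is Python 2 era syntax; in Python 3 universal-newline translation is the default, so a plain open() is enough). The strip() also guards against stray trailing whitespace:
f = open('file.dat', 'rU')  # '\r\n' and '\r' are both normalized to '\n'
lines = f.readlines()
f.close()

i = 0
while lines[i].strip() != "[Data]":
    i += 1
i = i + 2  # skip the "[Data]" marker and the column-header line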
I'm looking to edit a Minecraft Windows 10 level.dat file in Python. I've tried using the packages nbt and pyanvil but get the error OSError: Not a gzipped file. If I print open("level.dat", "rb").read() I get a lot of nonsensical data. It seems like it needs to be decoded somehow, but I don't know what decoding it needs. How can I open (and ideally edit) one of these files?
To read the data, just do:
from nbt import nbt

nbtfile = nbt.NBTFile("level.dat", 'rb')
print(nbtfile)  # Here you should get a TAG_Compound('Data')
print(nbtfile["Data"].tag_info())  # Data came from the line above
for tag in nbtfile["Data"].tags:  # This loop will show us each entry
    print(tag.tag_info())
As for editing:
# Writing data (changing the difficulty value)
nbtfile["Data"]["Difficulty"].value = 2
print(nbtfile["Data"]["Difficulty"].tag_info())
nbtfile.write_file("level.dat")
EDIT:
It looks like Mojang doesn't use the same format for Java and Bedrock, as Bedrock's level.dat file is stored in little-endian format and uses non-compressed UTF-8.
As an alternative, Amulet-Nbt is supposed to be a Python library written in Cython for reading and editing NBT files (supposedly works with Bedrock too).
Nbtlib also seems to work, as long as you set byteorder="little" when loading the file.
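A rough sketch of what that might look like (nbtlib's exact API can differ between versions, so treat this as an assumption rather than the definitive call):
import nbtlib

# Bedrock's level.dat is little-endian, so override the default byte order
bedrock_data = nbtlib.load("level.dat", byteorder="little")
print(bedrock_data)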
Let me know if you need more help.
You'll have to give the path either relative to the current working directory
path/to/file.dat
Or you can use the absolute path to the file
C:/user/dir/path/to/file.dat
Read the data, replace the values, and then write it back:
# Read in the file
with open('file.dat', 'r') as file:
    filedata = file.read()

# Replace the target string (the two arguments are placeholders for your own old/new values)
filedata = filedata.replace('old value', 'new value')

# Write the file out again
with open('file.dat', 'w') as file:
    file.write(filedata)
(I apologize for my limited English, which may make the question less clear.)
I'm now using Python to write data sent from an Arduino into a CSV file. I want around 200 values in a group, one group per row, and every value in its own column. The data from my Arduino is in the format of a number followed by a comma (for example: 123,144,135,...), but in the CSV file each number gets split into separate columns (1,2,3 in different columns instead of 123 in one column), and when I open the file in a text editor the data looks like "1,2,3","1,4,4",...
I tried different delimiters like \t and space. \t looks fine when I view the file in Excel, but it still doesn't look right in a plain text editor (a tab between every two digits).
I also tried deleting the "," in the Arduino code, but that doesn't help either.
In the writerows() call I tried data, str(data), and str(data)+",", with not much difference.
I even changed the list-separator setting of my laptop from "," to "\t", but that doesn't help.
The arduino part:
Serial.print(value);
Serial.print(",");
The python part:
while True:
    try:
        ser_bytes = ser.readline()
        decoded_bytes = ser_bytes.decode('utf-8')
        print(decoded_bytes)
        #decoded_bytes = decoded_bytes.strip('|')
        with open("test_data.csv","a",newline="") as f:
            writer = csv.writer(f,delimiter=",")
            writer.writerows([str(decoded_bytes),])
I have searched a lot about the CSV format, but I still can't see why the code doesn't work.
Thank you for the help.
You're right, I think I didn't totally get what your question is, but here are some ideas. To get correct csv output, you have to change your code to something like this:
while True:
    try:
        ser_bytes = ser.readline()
        # decode the bytes and split the line you got from your Arduino into a list of values
        decoded_bytes = ser_bytes.decode('utf-8').strip().split(",")
        print(decoded_bytes)
        with open("test_data.csv", "a", newline="") as f:
            writer = csv.writer(f, delimiter=",")
            writer.writerow(decoded_bytes)
Like this you should get correct CSV output, and every line you get from the Arduino is written as one row in the file.
Some additional thoughts: since you're already getting a CSV-style line from your Arduino, you could write it to the file directly, without splitting and without the csv writer. The csv writer is actually a little overkill here, but it probably doesn't matter that much ;)
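If you want to try that direct approach, a minimal sketch (reusing the ser object from the question) could look like this:
while True:
    line = ser.readline().decode('utf-8').strip()
    if line:  # skip empty reads
        with open("test_data.csv", "a") as f:
            f.write(line + "\n")  # the line is already comma-separated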
I'm reading a csv file with python. I am skipping the first row, which is simply descriptive metadata. This is what I'm doing:
f = open(in_file)
#skip the first row
next(f)
#...some data processing
This works fine, but when the first row contains a cell with a newline character, for example:
some random cell
with a new line
the next(f) call returns all the cells up to and including this cell, but ends with 'some random cell\n' and doesn't remove any further cells in this first row. Why is this happening, and how can I ensure the entire first row is removed regardless of newline characters in the cells?
You are dealing with a very basic and general problem (that's why you were downvoted, I guess): in modern operating systems, files are not typed: their content is just a sequence of bytes, and the meaning of those bytes is given by the applications (binary vs. text is still an archaic distinction in Windows). This crucial and basic property of the operating system is masked by the desktop environment (Windows, Gnome, KDE, Finder, ...): I click on a ".csv" file and the desktop opens Calc (or Excel), I click on a ".exe" file and Windows launches the program, and so on. But that's just convention. At the OS level, the content of a file is just bytes, nothing more. There is a very good reason for that: typed files at the OS level would help you for one week, and you'd have to struggle against them for the rest of your life.
Back to your question: Python won't decide for you that your "xyz.csv" file should be opened with some specific care. It opens the file and lets you read it as bytes or characters, and you have to handle the content yourself. Luckily, Python comes with "batteries included" and provides the csv module to wrap your file:
import csv

with open(path, 'r', encoding='...') as f:  # set the encoding of the file, e.g. utf-8
    reader = csv.reader(f)  # you may set the delimiter, quote char, etc.
    for row in reader:
        ...  # do what you want with each row
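Applied to your case, a minimal sketch (the utf-8 encoding is an assumption; newline='' is what the csv docs recommend when handing a file to csv.reader):
import csv

with open(in_file, 'r', encoding='utf-8', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skips the whole first record, even if a cell contains newlines
    for row in reader:
        pass  # ...some data processing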
I'm running into a problem that I haven't seen anyone on StackOverflow encounter or even google for that matter.
My main goal is to be able to replace occurrences of a string in the file with another string. Is there a way to access all of the lines in the file?
The problem is that when I try to read in a large text file (1-2 gb) of text, python only reads a subset of it.
For example, I'll do a really simply command such as:
newfile = open("newfile.txt","w")
f = open("filename.txt","r")
for line in f:
    replaced = line.replace("string1", "string2")
    newfile.write(replaced)
And it only writes the first 382 MB of the original file. Has anyone encountered this problem before?
I tried a few different solutions, such as using:
import fileinput
import sys

for i, line in enumerate(fileinput.input("filename.txt", inplace=1)):
    sys.stdout.write(line.replace("string1", "string2"))
But it has the same effect. Nor does reading the file in chunks such as using
f.read(10000)
I've narrowed it down to most likely being a reading problem and not a writing problem, because it happens even when simply printing out lines. I know that there are more lines: when I open the file in a full text editor such as Vim, I can see what the last line should be, and it is not the last line that Python prints.
Can anyone offer any advice or things to try?
I'm currently using a 32-bit version of Windows XP with 3.25 GB of RAM, and running Python 2.7.
Try:
f = open("filename.txt", "rb")
On Windows, rb means open the file in binary mode. According to the docs, text mode vs. binary mode only has an impact on end-of-line characters. But, if I remember correctly, opening files in text mode on Windows also does something with EOF (hex 1A).
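One quick way to check whether an embedded Ctrl-Z (hex 1A) byte is involved is to count them in binary mode; this is only a diagnostic sketch (Python 2 style, as in the question), reading in chunks so a 1-2 GB file doesn't have to fit in memory:
count = 0
with open("filename.txt", "rb") as f:
    while True:
        chunk = f.read(1024 * 1024)
        if not chunk:
            break
        count += chunk.count("\x1a")
print(count)  # number of embedded EOF (Ctrl-Z) characters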
You can also specify the mode when using fileinput:
fileinput.input("filename.txt", inplace=1, mode="rb")
Are you sure the problem is with reading and not with writing out?
Do you close the file that is written to, either explicitly with newfile.close() or using the with construct?
Not closing the output file is often the source of such problems when buffering is going on somewhere. If that's the case in your setting too, closing should fix your initial solutions.
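For example, a sketch of the original loop with both files closed automatically (same file names as in the question):
with open("filename.txt", "r") as f, open("newfile.txt", "w") as newfile:
    for line in f:
        newfile.write(line.replace("string1", "string2"))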
If you use the file like this:
with open("filename.txt") as f:
    for line in f:
        newfile.write(line.replace("string1", "string2"))
It should only read into memory one line at a time, unless you keep a reference to that line in memory.
After each line is read, it will be up to Python's garbage collector to get rid of it. Give this a try and see if it works for you :)
Found the solution, thanks to Gareth Latty. Using an iterator:
def read_in_chunks(file, chunk_size=1000):
    while True:
        data = file.read(chunk_size)
        if not data: break
        yield data
This answer was posted as an edit to the question Python Does Not Read Entire Text File by the OP user1297872 under CC BY-SA 3.0.
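One possible way to combine that generator with the original replacement (only a sketch, in the question's Python 2 style; note that a match straddling a chunk boundary would be missed):
with open("filename.txt", "rb") as src, open("newfile.txt", "wb") as dst:
    for chunk in read_in_chunks(src, chunk_size=1024 * 1024):
        dst.write(chunk.replace("string1", "string2"))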
Using Python to append to a CSV file, I get data only on every other row.
How do I fix this?
import csv
LL = [(1,2),(3,4)]
Fn = ("C:\Test.csv")
w = csv.writer(open(Fn,'a'), dialect='excel')
w.writerows(LL)
C:\test.csv when opened looks like this:
1,2

3,4

1,2

3,4
Appending is irrelevant to the problem; notice that the first two rows (those from the original file) are also double-spaced.
The real problem is that you have opened your file in text mode.
CSV is a binary format, believe it or not. The csv module writes the misleadingly-named "lineterminator" (it should be called "rowseparator") as \r\n, as expected, but then the Windows C runtime kicks in and replaces the \n with \r\n, so that you end up with \r\r\n between rows. When you "open" the csv file with Excel, it becomes confused.
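You can confirm this by reading the file back in binary mode; a quick diagnostic sketch:
print(repr(open(Fn, 'rb').read()))
# expect something like '1,2\r\r\n3,4\r\r\n...' if the file was written in text mode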
Always open your CSV files in binary mode ('rb', 'wb', 'ab'), whether you are operating on Windows or not. That way, you will get the expected row separator (CR LF) even on *nix boxes, your code will be portable, and any line feeds embedded in your data won't be changed into something else (on writing) or cause dramas (on input, provided of course they're quoted properly).
Other problems:
(1) Don't put your data in your root directory (C:\). Windows inherited a hierarchical file system from MS-DOS in the 1980s. Use it.
(2) If you must embed hard-wired filenames in your code, use raw strings r"c:\test.csv" ... if you had "c:\test.csv" the '\t' would be interpreted as a TAB character; similar problems with \r and \n
(3) The examples in the Python manual are aligned more towards brevity than robust code.
Don't do this:
w = csv.writer(open('foo.csv', 'wb'))
Do this:
f = open('foo.csv', 'wb')
w = csv.writer(f)
Then when you are finished, you have f available so that you can do f.close() to ensure that your file contents are flushed to disk. Even better: read up on the new with statement.
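For instance, a small sketch using with (binary mode as recommended above, so this is Python 2 style):
import csv

with open('foo.csv', 'wb') as f:
    w = csv.writer(f, dialect='excel')
    w.writerows([(1, 2), (3, 4)])
# the file is flushed and closed as soon as the with block ends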
I have encountered a similar problem when appending to an already created CSV file while running on Windows.
As in this case, writing and appending in "binary" mode avoids adding an extra line to each row written or appended by the Python script. Therefore:
w = csv.writer(open(Fn,'ab'),dialect='excel')