Hi stackoverflow community,
Situation:
I'm trying to run this converter found here.
However, what I want is for it to read a list of file paths from a text file and convert them.
The reason is that these file paths are filtered manually, so I don't have to convert unnecessary files; there is a large number of unnecessary files in the folder.
How can I go about this? Thank you.
with open("file_path", 'r') as file_content:
    content = file_content.read()
content = content.split('\n')
You can read the file's data using the method above, then convert it into a list (or any other iterable) so that it can be used with a for loop. I used content = content.split('\n') to split the content on '\n' (every time you press the Enter key, a newline character '\n' is inserted); you can split on any other character as well.
for i in content:
    # the code you want to execute
Note: some useful links:
Split
File writing
File read and write
By looking at your situation, I guess this is what you want (to only convert certain files in a directory), in which case you don't need an extra '.txt' file to process:
import os

for f in os.listdir(path):  # path is the directory containing your files
    if f.startswith("Prelim") and f.endswith(".doc"):
        convert(os.path.join(path, f))  # join with the directory so the full path is passed
But if for some reason you want to stick with the ".txt" processing, this may help:
with open("list.txt") as f:
    lines = f.readlines()

for line in lines:
    convert(line.strip())  # strip() removes the trailing newline from each entry
When I read and print the CSV file I downloaded, I get the following results.
As you can see, the result is printed in a weird format.
If I want to print a specific column, here is the error message I get.
I believe the format of the winedata.csv file is wrong, because my code works for other CSV files. How do I convert my CSV file to the right format?
Your winedata.csv file is separated by semicolons rather than commas. Therefore, you need to pass the sep option to pd.read_csv as follows:
wine_data = pd.read_csv("winedata.csv", sep=";")
You will then be able to access your pH column as:
wine_data["pH"]
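If you are ever unsure which delimiter a file uses, the standard library's csv.Sniffer can guess it from a sample of the text (a minimal sketch; the sample line here is made up to resemble winedata.csv):

```python
import csv

# a made-up sample in the same shape as winedata.csv
sample = "fixed acidity;volatile acidity;pH\n7.4;0.7;3.51\n7.8;0.88;3.2\n"

# restrict the candidates to the delimiters you consider plausible
dialect = csv.Sniffer().sniff(sample, delimiters=";,")
print(dialect.delimiter)  # ';'
```

You can then pass dialect.delimiter as the sep argument to pd.read_csv.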
Hello everyone!
I am currently working on a project and I need to automate things in it using file handling. There is a file in '.dat' format and I want to extract the data from it. The data in it is in hex form, and I need these hex values to perform serial-port communication. I can open this .dat file in a tool called a hex editor and see the values, but the problem is that I don't want the complete data from the file; I need to extract it in segments. I tried to read it, but it reads the file completely, and I also got some garbage values in the output.
I will try to upload a screenshot of the hex editor and the values I want to extract from it, so please, can anybody help me out with this?
Open your .dat file in binary mode and access the data as you need.
Use the 'rb' mode in open() to read in binary mode.
with open('input.dat', 'rb') as f:
    data = f.read()  # the complete binary data is now available in 'data'

first_byte = data[0]   # access individual bytes by index
second_byte = data[1]
send_uart(data[:10])   # send the first 10 bytes
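If the file is large or you only need particular byte ranges, you can also seek to an offset and read just that segment instead of loading everything (a minimal sketch; the offset and length values are made-up examples):

```python
def read_segment(path, offset, length):
    """Read `length` bytes starting at byte `offset` of the file."""
    with open(path, 'rb') as f:
        f.seek(offset)        # jump to the segment's starting byte
        return f.read(length) # read only that many bytes

# e.g. grab 4 bytes starting at offset 0x10:
# segment = read_segment('input.dat', 0x10, 4)
```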
I am trying to parse different kinds of very large Excel files (.csv, .xlsx, .xls).
Working (.csv/.xlsx) flows
.csv is chunkable by using pandas.read_csv(file, chunksize=chunksize)
.xlsx is chunkable by unzipping it and parsing inner .xml files using lxml.etree.iterparse(zip_file.open('xl/worksheets/sheet1.xml')) and lxml.etree.iterparse(zip_file.open('xl/sharedStrings.xml')), performing additional operations afterwards.
Not working (.xls) flow
For .xls, I can't find any info on how to split the file into chunks!
Details: My file is a Django TemporaryUploadedFile. I get it from request.data['file'] on a PUT request.
I get the path of the file via request.data['file'].temporary_file_path(). It is '/tmp/tmpu73gux4m.upload'. (I'm not sure what the *****.upload file is. I guess it's some kind of HTTP file encoding.)
When I try to read it:
file.open()
content = file.read()
the content looks like a bytes string b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00...etc.
Question
Are there any means of encoding and parsing this bytes string?
Ideally, I would want to read .xls row by row without loading the whole file into RAM at once. Are there any means of doing it?
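As a side note, the bytes shown above (b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1') are the magic signature of the OLE2 / Compound File Binary format that legacy .xls files use, so you can at least confirm the file type before handing it to a parser (a minimal sketch):

```python
OLE2_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'

def looks_like_xls(first_bytes):
    """Return True if the data starts with the OLE2 compound-file signature."""
    return first_bytes.startswith(OLE2_MAGIC)

# with open(path, 'rb') as f:
#     print(looks_like_xls(f.read(8)))
```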
Whenever I try to open a .csv file with the Python command
fread = open('input.csv', 'r')
it always opens the file with spaces between every single character. I'm guessing something is wrong with the text file, because I can open other text files with the same command and they load correctly. Does anyone know why a text file would load like this in Python?
Thanks.
Update
Ok, I got it with the help of Jarret Hardie's post
this is the code that I used to convert the file to ascii
fread = open('input.csv', 'rb').read()
mytext = fread.decode('utf-16')
mytext = mytext.encode('ascii', 'ignore')
fwrite = open('input-ascii.csv', 'wb')
fwrite.write(mytext)
Thanks!
The post by recursive is probably right... the contents of the file are likely encoded with a multi-byte charset. If this is, in fact, the case, you can likely read the file in Python itself without having to convert it first outside of Python.
Try something like:
fread = open('input.csv', 'rb').read()
mytext = fread.decode('utf-16')
The 'b' flag ensures the file is read as binary data. You'll need to know (or guess) the original encoding... in this example, I've used utf-16, but YMMV. This will convert the file to unicode. If you truly have a file with multi-byte chars, I don't recommend converting it to ascii as you may end up losing a lot of the characters in the process.
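To illustrate, here is a round trip showing why a UTF-16 file read naively looks like characters separated by extra bytes (a minimal sketch with made-up content):

```python
raw = "ID|Name".encode('utf-16')  # simulate the file on disk: 2-byte BOM + 2 bytes per char

# decoding with the right codec recovers the original text
print(raw.decode('utf-16'))  # ID|Name

# past the BOM, every ASCII-range char is paired with a zero pad byte
print(raw[2:])
```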
EDIT: Thanks for uploading the file. There are two bytes at the front of the file which indicates that it does, indeed, use a wide charset. If you're curious, open the file in a hex editor as some have suggested... you'll see something in the text version like 'I.D.|.' (etc). The dot is the extra byte for each char.
The code snippet above seems to work on my machine with that file.
The file is encoded in some Unicode encoding, but you are reading it as ASCII. Try converting the file to ASCII before using it in Python.
Isn't CSV just a plain text file with values separated by commas?
Try opening it with a text editor to see if the file is correctly formed.
To read an encoded file, you can simply replace open with codecs.open:
import codecs
fread = codecs.open('input.csv', 'r', 'utf-16')
It never occurred to me, but as truppo said, it must be something wrong with the file.
Try to open the file in Excel/BrOffice Calc and save it as CSV again.
If the problem persists, try a subset of the data: the first 10 / last 10 / an intermediate 10 lines of the file.
Open the file in binary mode, 'rb'. Check it in a HEX Editor and check for null padding '00'. Open the file in something like Scintilla Text Editor to check the characters present in the file.
Here's the quick and easy way, especially if Python won't parse the input correctly:
sed 's/ \(.\)/\1/g'
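That sed expression deletes the extra byte (rendered here as a space) that precedes each character. The same cleanup can be done in Python with re.sub (a sketch, assuming the padding really does show up as spaces):

```python
import re

line = "I D , N a m e"  # made-up example of the space-padded text
cleaned = re.sub(r' (.)', r'\1', line)  # drop the space before each character
print(cleaned)  # ID,Name
```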