Hi stackoverflow community,
Situation:
I'm trying to run this converter found here.
However, what I want is for it to read a list of file paths from a text file and convert them.
The reason is that these file paths are filtered manually, so I don't have to convert unnecessary files; there is a large number of unnecessary files in the folder.
How can I go about this? Thank you.
with open("file_path", 'r') as file_content:
    content = file_content.read()
content = content.split('\n')
You can read the file's data using the method above, then convert it into a list (or any other iterable) so that it can be used with a for loop. I used content = content.split('\n') to split the content on '\n' (every time you press the Enter key, a newline character '\n' is inserted); you can split on any other character as well.
for i in content:
    # the code you want to execute
Note: some useful links:
Split
File writing
File read and write
By looking at your situation, I guess this is what you want (to only convert certain files in a directory), in which case you don't need an extra '.txt' file to process:
import os

for f in os.listdir(path):  # path is the directory containing your files
    if f.startswith("Prelim") and f.endswith(".doc"):
        convert(os.path.join(path, f))  # join with the directory so the full path is passed
But if for some reason you want to stick with the ".txt" processing, this may help:
with open("list.txt") as f:
    lines = f.readlines()

for line in lines:
    convert(line.strip())  # strip() removes the trailing newline from each entry
When I read and print the CSV file I downloaded, I get the following results.
As you can see, the result is printed in a weird format.
If I want to print a specific column, here is the error message I get.
I believe the format of the winedata.csv file is wrong, because my code works for other CSV files. How do I convert my CSV file to the right format?
Your winedata.csv file is separated by semicolons rather than commas. Therefore, you need to pass the sep option to pd.read_csv as follows:
wine_data = pd.read_csv("winedata.csv", sep=";")
You will then be able to access your pH column as:
wine_data["pH"]
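If you are ever unsure which delimiter a file uses, the standard library's csv.Sniffer can guess it from a sample of the text (a minimal sketch; the sample line here is made up to resemble winedata.csv):

```python
import csv

# a made-up sample in the same shape as winedata.csv
sample = "fixed acidity;volatile acidity;pH\n7.4;0.7;3.51\n7.8;0.88;3.2\n"

# restrict the candidates to the delimiters you consider plausible
dialect = csv.Sniffer().sniff(sample, delimiters=";,")
print(dialect.delimiter)  # ';'
```

You can then pass dialect.delimiter as the sep argument to pd.read_csv.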
Hello everyone!
I am currently working on a project and I need to automate things in it using file handling. There is a file in '.dat' format and I want to extract the data from it. The data in it is in hex form, and I need these hex values to perform serial-port communication. I can open this .dat file in a tool called a hex editor and see the values, but the problem is that I don't want the complete data from the file; I need to extract it in segments. I tried to read it, but it reads the file completely, and I also got some garbage values in the output.
I will try to upload a screenshot of the hex editor and the values I want to extract from it, so please, can anybody help me out with this?
Open your .dat file in binary mode and access the data as you need.
Use the 'rb' mode in open() to read in binary mode.
with open('input.dat', 'rb') as f:
    data = f.read()  # the complete binary data is now available in 'data'

first_byte = data[0]   # access individual bytes by index
second_byte = data[1]
send_uart(data[:10])   # send the first 10 bytes
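If the file is large or you only need particular byte ranges, you can also seek to an offset and read just that segment instead of loading everything (a minimal sketch; the offset and length values are made-up examples):

```python
def read_segment(path, offset, length):
    """Read `length` bytes starting at byte `offset` of the file."""
    with open(path, 'rb') as f:
        f.seek(offset)        # jump to the segment's starting byte
        return f.read(length) # read only that many bytes

# e.g. grab 4 bytes starting at offset 0x10:
# segment = read_segment('input.dat', 0x10, 4)
```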
I am trying to parse different kinds of very large Excel files (.csv, .xlsx, .xls).
Working (.csv/.xlsx) flows
.csv is chunkable by using pandas.read_csv(file, chunksize=chunksize)
.xlsx is chunkable by unzipping it and parsing inner .xml files using lxml.etree.iterparse(zip_file.open('xl/worksheets/sheet1.xml')) and lxml.etree.iterparse(zip_file.open('xl/sharedStrings.xml')), performing additional operations afterwards.
Not working (.xls) flow
For .xls, I can't find any info on how to split the file into chunks!
Details: My file is a Django TemporaryUploadedFile. I get it from request.data['file'] on a PUT request.
I get the path of the file via request.data['file'].temporary_file_path(). It is '/tmp/tmpu73gux4m.upload'. (I'm not sure what the *****.upload file is. I guess it's some kind of HTTP file encoding.)
When I try to read it:
file.open()
content = file.read()
the content looks like a bytes string b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00...etc.
Question
Are there any means of encoding and parsing this bytes string?
Ideally, I would want to read .xls row by row without loading the whole file into RAM at once. Are there any means of doing it?
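As a side note, the bytes shown above (b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1') are the magic signature of the OLE2 / Compound File Binary format that legacy .xls files use, so you can at least confirm the file type before handing it to a parser (a minimal sketch):

```python
OLE2_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'

def looks_like_xls(first_bytes):
    """Return True if the data starts with the OLE2 compound-file signature."""
    return first_bytes.startswith(OLE2_MAGIC)

# with open(path, 'rb') as f:
#     print(looks_like_xls(f.read(8)))
```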
Whenever I try to open a .csv file with the Python command
fread = open('input.csv', 'r')
it always opens the file with spaces between every single character. I'm guessing something is wrong with the text file, because I can open other text files with the same command and they load correctly. Does anyone know why a text file would load like this in Python?
Thanks.
Update
Ok, I got it with the help of Jarret Hardie's post
this is the code that I used to convert the file to ascii
fread = open('input.csv', 'rb').read()
mytext = fread.decode('utf-16')
mytext = mytext.encode('ascii', 'ignore')
fwrite = open('input-ascii.csv', 'wb')
fwrite.write(mytext)
Thanks!
The post by recursive is probably right... the contents of the file are likely encoded with a multi-byte charset. If this is, in fact, the case, you can likely read the file in Python itself without having to convert it first outside of Python.
Try something like:
fread = open('input.csv', 'rb').read()
mytext = fread.decode('utf-16')
The 'b' flag ensures the file is read as binary data. You'll need to know (or guess) the original encoding... in this example, I've used utf-16, but YMMV. This will convert the file to unicode. If you truly have a file with multi-byte chars, I don't recommend converting it to ascii as you may end up losing a lot of the characters in the process.
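To illustrate, here is a round trip showing why a UTF-16 file read naively looks like characters separated by extra bytes (a minimal sketch with made-up content):

```python
raw = "ID|Name".encode('utf-16')  # simulate the file on disk: 2-byte BOM + 2 bytes per char

# decoding with the right codec recovers the original text
print(raw.decode('utf-16'))  # ID|Name

# past the BOM, every ASCII-range char is paired with a zero pad byte
print(raw[2:])
```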
EDIT: Thanks for uploading the file. There are two bytes at the front of the file which indicates that it does, indeed, use a wide charset. If you're curious, open the file in a hex editor as some have suggested... you'll see something in the text version like 'I.D.|.' (etc). The dot is the extra byte for each char.
The code snippet above seems to work on my machine with that file.
The file is encoded in some Unicode encoding, but you are reading it as ASCII. Try converting the file to ASCII before using it in Python.
Isn't CSV just a plain text file with values separated by commas?
Try opening it with a text editor to see if the file is correctly formed.
To read an encoded file, you can simply replace open with codecs.open:
import codecs
fread = codecs.open('input.csv', 'r', 'utf-16')
It never occurred to me, but as truppo said, it must be something wrong with the file.
Try to open the file in Excel/BrOffice Calc and save it as CSV again.
If the problem persists, try a subset of the data: the first 10 / last 10 / an intermediate 10 lines of the file.
Open the file in binary mode, 'rb'. Check it in a HEX Editor and check for null padding '00'. Open the file in something like Scintilla Text Editor to check the characters present in the file.
Here's the quick and easy way, especially if Python won't parse the input correctly:
sed 's/ \(.\)/\1/g'
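That sed expression deletes the extra byte (rendered here as a space) that precedes each character. The same cleanup can be done in Python with re.sub (a sketch, assuming the padding really does show up as spaces):

```python
import re

line = "I D , N a m e"  # made-up example of the space-padded text
cleaned = re.sub(r' (.)', r'\1', line)  # drop the space before each character
print(cleaned)  # ID,Name
```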