Converting .csv file to .mat file without reading the csv - python

I have a Python project that produces its outputs as csv. These outputs can sometimes be as large as 15-16 GB. When I try to save the data with scipy, the RAM and CPU can't handle it and the program gets killed, so I need to convert the csv file to a mat file without reading the whole file into memory. Is there a way to do that?

Yes and no. You can't do anything with the file unless you read it, but you don't have to read it all at once. I don't know all the details, but the usual approach is to read just a few lines of the csv file at a time (fopen/fscanf style), process them however you like, save the partial result, and then repeat for the next few lines, again and again.
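In Python terms, a minimal sketch of that chunked approach: pandas can read the csv a fixed number of rows at a time, and since MATLAB's v7.3 .mat format is HDF5-based, each chunk can be appended to an HDF5 file without the full table ever being in memory. MATLAB can read such a file with h5read; a strict -v7.3 .mat file needs extra header metadata, so treat this as the chunking pattern rather than a drop-in .mat writer. The file names, chunk size, and dataset name below are assumptions:
# Sketch, not a drop-in .mat writer: stream a large csv into an HDF5
# dataset chunk by chunk. "output.csv", "output.h5", the dataset name
# "data", and the chunk size are all placeholders.
import h5py
import pandas as pd

CHUNK_ROWS = 100_000  # tune to available RAM

with h5py.File("output.h5", "w") as h5:
    dset = None
    for chunk in pd.read_csv("output.csv", chunksize=CHUNK_ROWS):
        values = chunk.to_numpy()  # assumes purely numeric columns
        if dset is None:
            # Size the resizable dataset to the first chunk.
            dset = h5.create_dataset(
                "data", data=values, maxshape=(None, values.shape[1])
            )
        else:
            # Grow along axis 0 and append the new rows.
            dset.resize(dset.shape[0] + values.shape[0], axis=0)
            dset[-values.shape[0]:] = values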

Related

Converting load cell data to .txt file

I am trying to gather data from my load cell that is hooked up to a Raspberry Pi 3B. I have the script reading data, but am trying to figure out how to export that data into a .txt file.
I have seen some info on how to create .txt files with text, but not much for integers. I need something better suited to my application.
The script takes x (not sure of the exact value) samples of data per second, so the amount of data can vary depending on how long I run the script. I want a text file to be created once I stop the script, with the data points recorded on separate lines like in the attached image. I found a way to do it with words/letters, but it wouldn't work with integers.
Let me know if there is anything I can share to make finding a solution easier. I appreciate all the input.
In Python, you can use "open" with the "w" mode to create a new file:
file = open("load_cell_data.txt", "w")
Pass the data in with file.write() and then close the file with file.close(). Note that write() only accepts strings, so convert each integer with str() or an f-string first; that is likely why integers "wouldn't work" for you.
Docs: https://docs.python.org/3/library/functions.html#open
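A minimal sketch of that (the readings list and file name are illustrative, not from your script):
# Sketch: one integer sample per line. "readings" stands in for
# whatever list your load-cell loop accumulates.
readings = [812, 807, 815]  # placeholder samples

with open("load_cell_data.txt", "w") as f:
    for value in readings:
        f.write(f"{value}\n")  # write() needs a string, hence the f-string
The with block also closes the file for you once the loop finishes.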

How to not read a csv file if it's being written to at that moment?

Is this something that can be done in Python or any other language? Is there a way to detect whether a csv file is being written to at that instantaneous moment?
So you want to update the CSV file atomically. Starting to write over the existing file, as you have realized, is not atomic and will get you into trouble.
The trick is to write the new data to a new temporary file and then move the temp file over the live file. The move operation is atomic (for practical purposes).
create-new-csv-data > new-data.csv
mv new-data.csv data.csv
For probably more info than you want to know about how atomic a mv really is, see for example https://unix.stackexchange.com/questions/322038/is-mv-atomic-on-my-fs.
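The same trick in Python, as a hedged sketch (the file name and sample rows are placeholders): write to a temporary file in the same directory, then swap it in with os.replace, which is atomic on POSIX systems.
# Sketch: readers of data.csv never see a half-written file.
import os
import tempfile

rows = ["a,b,c\n", "1,2,3\n"]  # placeholder CSV content

# The temp file must be on the same filesystem as the target,
# hence dir="." rather than the default system temp directory.
fd, tmp_path = tempfile.mkstemp(dir=".", suffix=".csv")
with os.fdopen(fd, "w") as tmp:
    tmp.writelines(rows)
os.replace(tmp_path, "data.csv")  # atomic rename over the live file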

Python Zipfile - is entire file unzipped to memory?

I have some code which I am using to open a large zip which contains some csv files and then parse them.
I am using the code below, but I am now wondering if I am actually unzipping the entire file into memory and then extracting the file contents to disk as well, after which I read the files in one by one.
def unzip_file(file_path):
    zip_ref = zipfile.ZipFile(file_path, 'r')
    extracted = zip_ref.namelist()
    zip_ref.extractall('/tmp/extracts')
    zip_ref.close()
    return extracted
Is this actually unzipping the files and their contents into memory and then extracting the files straight to disk? I use the extracted variable afterwards, as it contains the list of file names I need to process, but I don't also want to open each file into memory and then read it again.
Your concern is that you are wasting memory or being inefficient in how you read the files when extracting them. The answer to whether you are doing anything "wrong" is simply: no. Your code is correct; extractall streams each archive member to disk in chunks, and nothing is kept in memory after the function call finishes.
A few notes on what you can improve though.
Use a Context Manager to Automatically Close the File
ZipFile is also a context manager, and it is generally considered best practice to use one to make sure files are closed and cleaned up correctly. Instead of calling .close() manually, you could do the following:
with ZipFile(file_path, "r") as zip_ref:
    zip_ref.extractall("/tmp/extracts")
The file is then closed automatically when the with block exits, even if an exception is raised, so you don't have to worry about it staying open or lingering in memory.
Read Files without Extracting
Since you are extracting the files to a /tmp/ folder, I am guessing that you don't actually want to keep the files on disk. Perhaps all you want to do is read the data and do something with it.
You can read each file within the zip file without extracting them to disk.
with ZipFile('spam.zip') as myzip:
    with myzip.open('eggs.txt') as myfile:
        print(myfile.read())
This might be a better solution depending on what you want to achieve. You can see more in the Python docs.
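For your case (a zip full of csv files), a hedged sketch combining the two points above; the archive name is an assumption, and TextIOWrapper decodes the member stream lazily, so no file is ever fully loaded into memory:
import csv
import io
import zipfile

with zipfile.ZipFile("archive.zip") as zf:  # placeholder name
    for name in zf.namelist():
        if not name.endswith(".csv"):
            continue
        # zf.open() yields a binary stream; wrap it to decode text lazily.
        with zf.open(name) as raw:
            reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8"))
            for row in reader:
                ...  # process one row at a time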

Read/Write files on hdfs using Python

I am a newbie to Python. I want to read a file from HDFS (which I have achieved).
After reading the file I do some string operations, and I want to write the modified contents to an output file.
I read the file using subprocess (which took a lot of time), since open didn't work for me:
from subprocess import Popen, PIPE
cat = Popen(["hadoop", "fs", "-cat", "/user/hdfs/test-python/input/test_replace"], stdout=PIPE)
Now the question is how to write the modified contents to an output file.
Your help is highly appreciated
You can use a library for reading and writing to HDFS, like https://github.com/mtth/hdfs
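A hedged sketch with that library (HdfsCLI); the NameNode URL, user, output path, and the string operation are assumptions for illustration:
# Sketch: read, transform, and write back over WebHDFS.
from hdfs import InsecureClient

client = InsecureClient("http://namenode:9870", user="hdfs")  # placeholder URL/user

with client.read("/user/hdfs/test-python/input/test_replace",
                 encoding="utf-8") as reader:
    text = reader.read()

modified = text.replace("old", "new")  # stand-in for your string operations

with client.write("/user/hdfs/test-python/output/test_replace",
                  encoding="utf-8", overwrite=True) as writer:
    writer.write(modified)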

Tracking csv load in python

I would like to show the progress of processing a csv file.
I've searched and found this:
Tracking file load progress in Python
But this will make my life a bit harder, because I'll need to process the bytes read.
Another approach is to count the lines, but I wouldn't like to read the whole file just to count lines before starting to process it.
My idea is to get the file size (from the OS) and, as I'm processing the file, track the bytes processed (which should be the fastest approach).
Any other solution to show the progress?
I found file.tell() but haven't used it; it should give the current position in the file.
You could ball-park it, right? The csv is just a text file, and you can grab the file size from the os module. Then, from the first line you read in, you can estimate the average line size and hence the total number of lines in the file.
Clicking through your link, though, it appears that this is exactly the same suggestion :)
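A sketch of exactly that ball-park approach (the file name is a placeholder): take the total size from the os module and, while parsing, read the position of the underlying binary buffer. That position jumps in buffer-sized steps because of read-ahead, but it is close enough for a progress display.
import csv
import os

path = "data.csv"  # placeholder
total = os.path.getsize(path)

with open(path, newline="") as f:
    for row in csv.reader(f):
        ...  # process the row
        pct = 100 * f.buffer.tell() / total  # approximate, due to read-ahead
        print(f"\rprogress: {pct:5.1f}%", end="")
print()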
