I have a Python project that produces its outputs as CSV files. These outputs can sometimes be as large as 15-16 GB. When I try to save the data as a .mat file with SciPy, my RAM and CPU can't handle it and the program gets killed, so I need to convert the CSV file to a .mat file without reading the whole file into memory. Is there a way to do that?
Yes and no. You can't do anything with the file unless you read it, but you don't have to read it all at once. I don't know all the details, but usually you can open the file and read just a few lines at a time (the way you would with fopen and fscanf), process them however you like and save the partial result, then repeat with the next few lines again and again.
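For example, here is a minimal sketch of that idea in Python, assuming it is acceptable to end up with a series of partial .mat files (one per chunk); the file names, chunk size, and the "data" variable name are just placeholders:

import pandas as pd
from scipy.io import savemat

csv_path = "output.csv"      # placeholder input path
rows_per_chunk = 1000000     # tune so one chunk comfortably fits in RAM

# Read the CSV in fixed-size chunks and save each chunk as its own .mat
# file, so the full 15-16 GB never has to sit in memory at once.
# Assumes the CSV columns are numeric.
for i, chunk in enumerate(pd.read_csv(csv_path, chunksize=rows_per_chunk)):
    savemat("output_part%04d.mat" % i, {"data": chunk.to_numpy()})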
I am working on backing up 5,000+ files of SigmaPlot data. Right now I have to do it all manually by exporting to .csv and .txt files, and the tabs of data all have to be individually exported. Has anybody had any luck finding a way to process these files? I'd really like to use Python to write the backup files, but I'll take any help you have.
So, I'm extracting lots of data from an Oracle Rdb database on OpenVMS with a Python script.
From rough, hand-made profiling, 3/4 of the time is spent writing the data to the TXT file.
print >> outputFile, T00_EXTPLANTRA_STRUCTS.setStruct(parFormat).build(hyperContainer)
This is the specific line that prints out the data, which takes 3/4 of the execution time.
T00_EXTPLANTRA_STRUCTS.py is an external file containing the data structures (which .setStruct() defines), and hyperContainer is a Container from the "Construct" library that holds the data.
It takes almost five minutes to extract the whole file. I'd really like to learn if there is a way to write TXT data faster than this.
I already optimized the rest of the code, especially the DB transactions; it's just the writing operation that's taking a long time to execute.
The data to write looks like this, with 167,000 lines of this kind. (I hid the actual data with "X".)
XX;XXXXX;X;>;XXXXX;XXXX;XXXXXXXXXXXXXX ;XXX; ;XXX; ;
Many thanks to anyone who spends any time on this.
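Not knowing what setStruct()/build() do internally, one general technique worth trying is to batch the output: format many records into a list and write them in one call instead of printing line by line, which cuts per-call overhead. A rough sketch, where records and format_record are hypothetical stand-ins for your DB rows and formatting code:

lines = []
for record in records:                   # 'records' stands in for your DB rows
    lines.append(format_record(record))  # hypothetical formatting helper
    if len(lines) >= 10000:
        outputFile.write("\n".join(lines) + "\n")
        del lines[:]                     # reuse the list for the next batch
if lines:                                # flush whatever is left at the end
    outputFile.write("\n".join(lines) + "\n")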
I would like to show the progress of processing a csv file.
I've searched and found this:
Tracking file load progress in Python
But this will make my life a bit harder, because I'll need to process the bytes read.
Another approach is to count the lines, but I wouldn't like to read the whole file just to get the number of lines before starting to process it.
My idea is to get the file size (from the OS), and as I process the file, keep track of the bytes processed (that should be the fastest approach).
Any other solution to show the progress?
I found file.tell(), but I haven't used it yet. It should give the current position in the file.
You could ball-park it, right? The CSV is just a text file, and you can grab the file size from the os module. Then, from the first line you read in, you can calculate the size of each line and estimate the total number of lines in the file.
Clicking through your link, though, it appears that this is exactly the same suggestion :)
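For what it's worth, a rough sketch of that estimate (the file name is a placeholder, and it assumes the lines are of roughly similar length):

import csv
import os
import sys

path = "some.csv"  # placeholder file name
total_bytes = os.path.getsize(path)

f = open(path, "r")
first_line = f.readline()
# Estimate the total number of lines from the size of the first one.
est_total_lines = max(1, total_bytes // max(1, len(first_line)))
f.seek(0)

for line_no, row in enumerate(csv.reader(f), start=1):
    # ... process row here ...
    percent = min(100.0, 100.0 * line_no / est_total_lines)
    sys.stdout.write("\rProgress: ~%.1f%%" % percent)
f.close()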
I just installed Python.
I am trying to run this script:
import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
    print row
I am running on Windows.
Do I have to type each line individually into the Python shell, or can I save this code into a text file and then run it from the shell?
Where does some.csv have to be in order to run it? In the same C:\Python26 folder?
What is this code supposed to do?
Yes, you can create a file. The interactive shell is only for learning syntax, etc., and toying with ideas. It's not for writing programs.
Note that the script must have a .py extension, e.g., csvprint.py. To run it, you enter python csvprint.py. This will try to load csvprint.py from the current directory and run it.
The some.csv file has to be in the current working directory, which doesn't have to be (in fact, almost never should be) the Python folder. Usually this will be your home directory, or some kind of working area that you set up, like C:\work. It's entirely up to you, though.
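If you're not sure which directory that is, a quick check from the shell or a script:

import os
print(os.getcwd())  # the current working directory Python will look in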
Without knowing the csv module that well myself, I'm guessing it reads the comma-separated values from the file as rows (lists of strings) and prints each one out on the console.
One final note: The usual way to write such logic is to take the input from the command-line rather than hard-coding it. Like so:
import csv
import sys
reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    print row
And run it like so:
python csvprint.py some.csv
In this case you can put some.csv anywhere:
python csvprint.py C:\stuff\csvfiles\some.csv
When you have IDLE open, click File > New Window. (Or hit Ctrl + N)
This opens up a new window for you that's basically just a text editor with Python syntax highlighting. This is where you can write a program and save it. To execute it quickly, hit F5.
You can do both! To run the code from a text file (such as 'csvread.py', but the extension doesn't matter), type: python csvread.py at the command prompt. Make sure your PATH is set to include the Python installation directory.
"some.csv" needs to be in the current directory.
This code opens the file and wraps it in a csv.reader object; the loop then prints out each row of the CSV in order. Check the documentation for a more detailed example: http://docs.python.org/library/csv.html
Type the code into a *.py file, and then execute it.
I think the file should be in the directory you run the script from (the current working directory), which is usually the same folder as your *.py script.
This opens a file stored in comma separated value format and prints the contents of each row.
All import does is this: "Python code in one module gains access to the code in another module by the process of importing it. The import statement is the most common way of invoking the import machinery, but it is not the only way."
The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. There is no “CSV standard”, so the format is operationally defined by the many applications which read and write it. The lack of a standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.
The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats. All your code is doing is looping through that file and printing each row.
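As a small illustration of that last point (written in the same Python 2 style as the code above, with placeholder file names), reading an Excel-style CSV and writing it back out is just:

import csv

src = open("some.csv", "rb")
dst = open("copy.csv", "wb")
writer = csv.writer(dst, dialect="excel")      # "excel" is the default dialect
for row in csv.reader(src, dialect="excel"):   # each row comes back as a list of strings
    writer.writerow(row)
src.close()
dst.close()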