I have a processing step that uses pandas.DataFrame.to_csv() to create .csv files of my data, and I wish to execute this step several times, each time appending new data to the already existing file. (The column names of the output are the same for each run.)
Of course, one way to work around my problem would be to create a file for each run and then concatenate them, but I would like to do this without changing my scripts much.
Is it possible to append to an existing .csv file in pandas 0.14? (Unfortunately, I cannot upgrade my version.)
I was thinking I could do something with the 'mode' argument (http://pandas.pydata.org/pandas-docs/version/0.14.0/generated/pandas.DataFrame.to_csv.html?highlight=to_csv#pandas.DataFrame.to_csv), but I cannot seem to find the right way to use it.
Any suggestions?
Yes, you can use write mode 'a'. You may also need/want to pass header=False so the column names are not written again on each append.
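A minimal sketch of that (the file name and the toy frame here are just placeholders):

import os
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # stand-in for one run's output
out_path = "results.csv"                       # hypothetical output file

# Write the header only on the first run, then append rows on every later run.
write_header = not os.path.exists(out_path)
df.to_csv(out_path, mode="a", header=write_header, index=False)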
I'm a little unclear on why you don't want to do .read_csv() into df.append() into df.to_csv(), but that seems like an option too.
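That round trip would look something like this (new_df is a placeholder for the new run's rows; df.append existed in 0.14, and pd.concat is the equivalent that still works in current pandas):

import pandas as pd

new_df = pd.DataFrame({"a": [5], "b": [6]})  # placeholder for the new rows

existing = pd.read_csv("results.csv")        # load what is already on disk
combined = pd.concat([existing, new_df])     # or existing.append(new_df) in 0.14
combined.to_csv("results.csv", index=False)  # rewrite the whole file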
I love PyCharm, as it is very useful for my data analysis.
However, there is something I still can't figure out, and it's a big problem.
It is very useful when I start to save a lot of variables. But sometimes, especially when I want to run a piece of code using seaborn to create a new graph, all my variables disappear and I have to reload them again from scratch.
Do you know a way to keep the data stored and run only a piece of my code without getting this problem?
Thank you
I am not really sure what issue you are having, but from the sounds of it you just need to store a bunch of data and only use some of it.
If that's the case, I would save each set of data to its own file and then import the one you want to use at that time.
If you stick to the same data names then your program can just do something like:
import data_435 as data
# now your code can always access data.whatever_var
# even though you have 435 different data sets
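For completeness, here is what a hypothetical data_435.py could contain so the import above works (the CSV path and the variable name are made up):

# data_435.py -- wraps one data set behind a stable name
import pandas as pd

# Loaded once, the first time the module is imported; later imports
# in the same session reuse the cached module object.
whatever_var = pd.read_csv("data_435.csv")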
I have a Python script that makes use of Pandas to read an Excel XLSX file into a DataFrame, perform some calculations & grouping, and generate a new DataFrame with the information I desire. All is fine and dandy there.
I want to now put this into production and call this script from a larger class that manages a series of operations I want to do with the info (put it into a PDF, e-mail it, etc.) - how do I best 'store' and call this script?
Do I create a separate .py file and just place this script in a function that returns a DataFrame? Or do I make a class that initializes and contains the DataFrame and can be called upon?
I realize it's a bit of a broad question, but I just want to understand what best practice may be, or just what most people like to do.
EDIT: To make my question more specific and less subjective - are these acceptable ways of storing and calling a pandas script?
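For what it's worth, a minimal sketch of the first option you describe (a separate .py file exposing one function that returns the DataFrame); report.py, build_report, and the file names are all made up:

# report.py -- hypothetical module wrapping the pandas work
import pandas as pd

def build_report(xlsx_path):
    """Read the workbook, do the calculations and grouping, return a DataFrame."""
    df = pd.read_excel(xlsx_path)
    # ... calculations & grouping go here ...
    return df

# In the larger class that manages the pipeline:
# from report import build_report
# df = build_report("input.xlsx")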
I have a pretty complex Excel file that includes pivot tables and is about 70 MB in size, and what I need is to edit one single cell with a Python script. I'm trying openpyxl.
The problem is that it runs out of memory just from opening the file. Do you see any way around this?
You can try pandas.read_excel. It may be better optimized for your purpose (reading one cell from one sheet).
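A sketch of reading a single cell that way, using current pandas keyword names (older versions spelled some of these differently, e.g. sheetname/parse_cols); the file, sheet, and cell here are made up, and note this only reads the value, it won't write it back:

import pandas as pd

# Read just cell C5 of sheet "Data": skip 4 rows, take 1 row, keep column C only.
cell = pd.read_excel(
    "big_workbook.xlsx",
    sheet_name="Data",
    usecols="C",
    skiprows=4,
    nrows=1,
    header=None,
).iat[0, 0]
print(cell)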
I have a large (for my experience level, anyway) text file of astrophysical data and I'm trying to get a handle on python/pandas. As a noob to python, it's comin' along slowly. Here is a sample of the text file; it's a 145 MB file in total. When I try to read this in pandas I get confused, because I don't know whether to use pd.read_table('example.txt') or pd.read_csv('example.csv'). In either case I can't call on a specific column without ipython freaking out, such as here. I know I'm doing something absent-minded. Can anyone explain what that might be? I've done this same procedure with smaller files and it works great, but this one seems to be limiting its output, or just not working at all.
Thanks.
It looks like your columns are separated by varying amounts of whitespace, so you'll need to specify that as the separator. Try read_csv('example.csv', sep=r'\s+'). \s+ is the regular expression for "one or more whitespace characters". Also, you should remove the # character from the beginning of the first line, as it will otherwise be read as an extra column name and mess up the parsing.
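Something like this, assuming the # has been stripped from the header line (the column name at the end is hypothetical):

import pandas as pd

# sep=r'\s+' treats any run of spaces/tabs as a single delimiter.
df = pd.read_csv("example.csv", sep=r"\s+")
print(df.columns)         # the column names should now line up
print(df["some_column"])  # hypothetical column name from the header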
Question: How do you write data to the beginning of an already existing file, without overwriting what's already there and without reading the entire file into memory? (i.e. prepend)
Info:
I'm working on a project right now where the program frequently dumps data into a file. This file will very quickly balloon up to 3-4 GB, and I'm running this simulation on a computer with only 768 MB of RAM. Pulling all that data into RAM over and over will be a great pain and a huge waste of time. The simulation already takes long enough to run as it is.
The file is structured such that the number of dumps it makes is listed at the beginning with just a simple value, like 6. Each time the program makes a new dump I want that to be incremented, so now it's 7. The problem lies with the 10th, 100th, 1000th, and so on, dumps. The program will write the 10 just fine, but it overwrites the first character of the next line:
"9\n580,2995,2083,028\n..."
"10\n80,2995,2083,028\n..."
Obviously, the difference between 580 and 80 in this case is significant. I can't lose these values, so I need a way to add a little space in there so that I can write the new counter without losing data or having to pull the entire file up and rewrite it.
Basically, what I'm looking for is a kind of prepend function: something that adds data to the beginning of a file instead of the end.
Programmed in Python
~n
See the answers to this question:
How do I modify a text file in Python?
Summary: you can't do it without rewriting the file (this is due to how the operating system stores files, rather than a Python limitation)
It's not addressing your original question, but here are some possible workarounds:
Use SQLite (it's bundled with your Python)
Use a fancier database, either RDBMS or NoSQL
Just track the number of dumps in a different text file
The first couple of options are a little more work up front, but provide more flexibility. The last option is the easiest solution to your current problem.
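A minimal sketch of that last option (the counter file name is made up); the big dump file is never touched:

import os

def bump_dump_count(path="count.txt"):
    """Read the current dump count, add one, write it back, and return it."""
    n = 0
    if os.path.exists(path):
        with open(path) as f:
            n = int(f.read().strip())
    n += 1
    with open(path, "w") as f:
        f.write(str(n))
    return n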
You could quite easily create a new file, write the data you wish to prepend to that file, then copy the contents of the existing file, append it to the new one, and rename.
This would avoid having to read the whole file into memory at once, if that is the primary issue.
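A sketch of that approach (Python 3; the names are made up). shutil.copyfileobj streams in fixed-size chunks, so the 3-4 GB file never has to fit in RAM, though it is still read through once from disk:

import os
import shutil

def prepend(path, text):
    """Write text followed by the old contents of path, via a temp file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as out, open(path) as src:
        out.write(text)
        shutil.copyfileobj(src, out)  # copies in chunks, not all at once
    os.replace(tmp, path)             # atomic rename where the OS supports it

prepend("dump.txt", "extra line at the top\n")  # hypothetical usage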