Python: Date comparison works in Spyder but not in the console

I have written a little CSV parser based on pandas.
It works like a charm in Spyder 3.
Yesterday I tried to put it into production and run it with a .bat file, like:
python my_parser.py
In the console it doesn't work at all.
Pandas behaves differently: the read_csv method lost the "quotechar" keyword argument, for example.
Date comparisons in particular break all the time.
I read the dates with pandas as per
pd.read_csv(parse_dates=[col3, col5, col8])
Then I try a date calculation by subtracting pd.to_datetime('now').
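A minimal sketch of that pattern, with hypothetical file and column names (the asker's actual ones are not shown above):

import pandas as pd

# placeholder file and column names, for illustration only
df = pd.read_csv('data.csv', parse_dates=['col3', 'col5', 'col8'])

# quick way to see whether parsing actually produced datetime columns
print(df.dtypes)

# age of each record relative to now; this raises a TypeError
# if a column silently stayed a plain object/string column
age = pd.to_datetime('now') - df['col3']
print(age)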
I tested everything and, as said, in Spyder no error is thrown; it works and produces results as it should.
As soon as I start it in the console, it throws type errors.
Most often, one of the two dates is a mere string while the other stays a datetime, so the subtraction fails.
I could now rewrite the code and find a procedure that works in both Spyder and the console.
However, I would rather ask you guys here:
What could be a possible reason for Spyder and the console Python behaving completely differently from each other?
It's really annoying to debug code that does not throw any errors, so I would really like to understand the cause.

The problem was related to having several Python installations on my PC. After removing all of them and installing a single instance, it worked well. Thanks for the tip, Carlos Cordoba!
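For anyone hitting the same symptom: a quick, generic way to confirm whether two environments really run the same interpreter and pandas build is to print both from inside each of them:

import sys
import pandas as pd

# compare these between Spyder and the console; differing paths or
# versions point to two separate Python installations
print(sys.executable)
print(pd.__version__)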

Related

Migrating Python code away from Jupyter and over to PyCharm

I'd like to take advantage of a number of features in PyCharm, hence I am looking to port code over from my notebooks. I've installed everything but am now faced with issues such as:
The display function appears to fail, hence dataframe outputs (I used print) are not so nicely formatted. Is there an equivalent function?
I'd like to replicate the code cells of a Jupyter notebook. The Jupyter code is split over 9 cells in the one Jupyter file, and Shift+Enter is an easy way to check outputs and then move on. Now I've had to place all the code in the one project/Python file and have 1200 lines of code. Is there a way to section the code like it is in Jupyter? My VBA background envisions 9 routines and one additional calling routine to get the same result.
Each block of code imports data from SQL Server and some flat files, so there is some validation in between running them. I was hoping there was an alternative to manually selecting large chunks of code and executing them, and/or setting breakpoints every time it's run.
Any thoughts/links would be appreciated. I spent some $$ on a Udemy PyCharm course but it does not help me with this one.
Peter
The migration part is solved in this question: convert json ipython notebook(.ipynb) to .py file, but perhaps you already knew that.
The code-splitting part is harder. One reason why Jupyter is so widespread is the ability to split the output and run each cell separately. I would recommend Andrew's answer, though; see also the sketch below.
If you are using classes, put each class in a new file.
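One possible way to keep a cell-like workflow in an editor: several Python IDEs, PyCharm's scientific mode among them, treat "# %%" comment markers as cell boundaries that can be run one at a time. Availability depends on your edition and configuration, so treat this as a sketch of the idea rather than a guaranteed feature:

# %% Cell 1: import data (placeholder stands in for the SQL Server load)
import pandas as pd
df = pd.read_csv('extract.csv')

# %% Cell 2: validate the intermediate result before moving on,
# much like pressing Shift+Enter between notebook cells
print(df.shape)
print(df.head())

# %% Cell 3: further processing
df = df.dropna()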

Merge CSV into HDF5 in pandas leads to crash

I have about 700 CSV files. They are each typically a few MB and a few thousand rows, so the total folder is about 1 GB. I want to merge them into a single HDF5 file.
I first defined a function read_file(file) that reads a single file and parses it using pd.read_csv(). It then returns a dataframe.
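A minimal sketch of what such a helper might look like (the asker's actual parsing options are not shown, so the body here is a placeholder):

import pandas as pd

def read_file(file):
    # placeholder implementation; the real parser presumably passes
    # CSV-specific options (separators, dtypes, parse_dates, ...)
    return pd.read_csv(file)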
I then use this code to convert:
file_num = 1
for file in files:
    print(file + " Num: " + str(file_num) + " of: " + str(len(files)))
    file_num = file_num + 1
    in_pd = read_file(file)
    # append each parsed file to the same table in the HDF5 store
    in_pd.to_hdf('AllFlightLogs.h5', 'flights', mode='a', append=True)
And it works just fine for about 202 files, and then Python crashes with: Abort trap: 6
I don't know what this error means. I have also seen it pop up a window showing a stack error.
I have tried using complib='lzo' and that doesn't seem to make any difference. I have tried saving to a different HDF5 file every 100 reads, and that does change the exact number of files before the crash. But it still happens.
There doesn't seem to be anything special about that particular file. Is there any way to find out anything else about this particular error? I know that the crash happens when I try to call in_pd.to_hdf() (I added print statements before and after).
I am running on a Mac, and using pandas 0.16.2.
I upgraded PyTables to 3.2.1 and that seems to have fixed it. So it was not a problem with my code (which was driving me crazy) but a PyTables problem.
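To check which versions you are on before and after upgrading, both libraries expose a version string (a generic check, nothing specific to this bug):

import pandas as pd
import tables  # PyTables

print(pd.__version__)
print(tables.__version__)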
Adam's answer solved my problem on my iMac.
But as of 1 Sep 2015, while PyTables is available for Linux and OS X, it is still not available for Windows - I use the Anaconda distribution (very good in every other respect). Does anybody know why? Is there a specific reason for that?

Append error in Python

I am using JuliaBox to run python code in python 2.
My code is as follows:
l=[]
l.append(5)
And the following is the error I got:
type Array has no field append
But I have used append as given in the python documentation. https://docs.python.org/2.6/tutorial/datastructures.html
Where did I go wrong?
You are using Julia, not Python.
I don't think you are obviously doing anything wrong. I can reproduce your problem by clicking New on the JuliaBox.org landing page and selecting Python 2 in the Notebooks subsection of the menu. This creates a new notebook which you would expect to be running against the Python kernel, and it gives you some visual indications that it is running Python.
However
In fact, it is not running Python; it is running Julia. You can test this by, for instance, simply typing sin(0.3). This would fail in Python but gives you a result in Julia. Similarly, println("Hello world!").
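Another quick sanity check along the same lines; this is ordinary Python that any genuine CPython kernel will run, while a Julia kernel rejects it:

# prints the interpreter version under a real Python kernel;
# under a Julia kernel this is not valid code and errors out
import sys
print(sys.version)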
I'm not familiar with IJulia or JuliaBox, so I can't state categorically whether this is a bug, but it certainly feels like one and is unexpected and counterintuitive behaviour at best.
My final comment is to try a different interpreter - if you want something with a similar look and feel, you could always use IPython directly. As a bonus, you'll be able to use Python 3 instead of being stuck with 2.6!
EDIT
As highlighted by Matt B. in the comments, this is a known bug in IJulia.
Your Python code is perfectly valid. Try another interpreter.

PyCharm's console hangs with "big" objects

I'm using PyCharm, and I typically run parts of my scripts in its Python console for debugging purposes. However, when I have to run something on a "big" variable (one that consumes a lot of memory), the console becomes very slow.
Say df is a huge pandas dataframe; as soon as I type df. in the console, it won't react any more for 10-15 seconds. I can't tell whether this is pandas-specific, since the only "big" variables I use come from pandas.
I'm running the Community Edition 3.4.1, pandas 0.14, Python 2.7.3, on Mac OS X 10.9.4 (with 8 GB of RAM).
Size of df:
In[94]: df.values.nbytes + df.index.nbytes + df.columns.nbytes
Out[94]: 2229198184
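(That sum works out to roughly 2.2 GB.) As an aside, recent pandas versions can estimate a frame's footprint directly, which is handy when judging whether a variable is big enough to upset the console; this is a general pandas feature, not specific to PyCharm:

# total estimated memory usage in bytes; deep=True also counts
# the contents of object/string columns
print(df.memory_usage(deep=True).sum())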
I've been having the same problem and have yet to find a proper solution, but this is a workaround that I've been using, from PyCharm hanging for a long time in iPython console with big data.
Instead of typing df.columns, type d.columns and then go back and add the f. This at least prevents the hanging, although it renders the auto-completion a bit useless.
Have you tried changing the "Variable Loading Policy" to "On Demand"?

Python crashes in rare cases when running code - how to debug?

I have a problem that I have seriously spent months on now!
Essentially I am running code that needs to read from and save to HDF5 files. I am using h5py for this.
It's very hard to debug because the problem (whatever it is) only occurs in about 5% of the cases (each run takes several hours), and when it happens it crashes Python completely, so debugging with Python itself is impossible. Using simple logs it's also impossible to pinpoint the exact crashing situation - it appears to be very random, crashing at different points within the code, or with a lag.
I tried using OllyDbg to figure out what's happening and can safely conclude that it consistently crashes at the following location: http://i.imgur.com/c4X5W.png
It seems to be shortly after calling the Python-native PyObject_ClearWeakRefs, with an access violation error message. The weird thing is that the file is successfully written to. What would cause the access violation error? Or is that Python-internal (e.g. the stack?) and not related to the file (i.e. my code)?
Has anyone an idea what's happening here? If not, is there a smarter way of finding out what exactly is happening? Maybe some hidden Python logs or something I don't know about?
Thank you
PyObject_ClearWeakRefs is in the Python interpreter itself. But if it only happens in a small number of runs, it could be hardware-related. Things you could try (see also the sketch after this list):
Run your program on a different machine. If it doesn't crash there, it is probably a hardware issue.
Reinstall Python, in case the installed version has somehow become corrupted.
Run a memory test program.
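One more general option for crashes that kill the interpreter outright: on Python 3.3+, the standard-library faulthandler module dumps a Python-level traceback when a fatal signal such as an access violation arrives, which often narrows down where a crash in a native extension happened (a generic technique, not specific to h5py):

import faulthandler

# keep the log file object alive for the lifetime of the process
crash_log = open('crash.log', 'w')

# on a fatal signal (e.g. a segfault), write the current Python
# tracebacks to crash.log instead of dying silently
faulthandler.enable(file=crash_log)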
Thanks for all the answers. I ran two versions this time: one with a fresh Python install and my same program, another on my original computer/install but with all HDF5 read/write procedures replaced by numpy read/write procedures.
The program continued to crash on my second computer at odd times, but on my primary computer I had zero crashes with the changed code. I think it is thus safe to conclude that the problems were HDF5-related or, more specifically, h5py-related. It appears that more people have encountered issues with h5py in that respect. Given that any error in my application translates to potentially large financial losses, I decided to dump HDF5 completely in favor of other, stable solutions.
Use a try/except statement. This can be put into the program to stop it from crashing when erroneous data is entered (but see the caveat in the sketch below).
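A minimal sketch of that suggestion, with an important caveat: try/except only catches Python-level exceptions, so it would not help with the hard access-violation crash described above, which happens inside native code. File and dataset names here are placeholders:

import h5py

try:
    # wrap the risky read/write in the handler
    with h5py.File('data.h5', 'a') as f:
        f.create_dataset('results', data=[1, 2, 3])
except (OSError, ValueError) as err:
    # h5py raises ordinary Python exceptions for bad input or I/O
    # trouble, but a segfault in the C layer bypasses this entirely
    print('HDF5 operation failed:', err)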
