How to collaborate with teammate on code more efficiently? - python

I run into problems when collaborating with a teammate on a Python project: the same function call does not work on both of our machines. For example, to read an xlsx file, pd.read_excel(f'D:\\Financial\\Data\\TRD_Dalyr.xlsx') works on my computer but not on my teammate's, while pd.read_excel('.\Data\TRD_Dalyr.xlsx') works on his computer but not on mine. So every time he modifies the file and sends it back to me, I have to fix these paths again. Is there a more efficient way to deal with this kind of problem?
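One possible approach, sketched here under assumptions: the absolute path only exists on one machine, and the relative path depends on the directory the script is started from. Building the path from the location of the script itself avoids both. The helper name data_path and the layout (a Data folder next to the script) are assumptions for illustration, not something stated in the question.

```python
from pathlib import Path


def data_path(filename, base_dir=None):
    """Return the path of a file inside the project's Data folder.

    base_dir defaults to the folder containing this script, so the result
    does not depend on the drive letter or the current working directory.
    """
    if base_dir is None:
        base_dir = Path(__file__).resolve().parent
    return Path(base_dir) / "Data" / filename


# Hypothetical usage with pandas:
#   import pandas as pd
#   df = pd.read_excel(data_path("TRD_Dalyr.xlsx"))
```

If both teammates keep the same folder layout inside the project, the same line should then work unchanged on either computer.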

Related

How can I have cleaner code and do less copying/pasting of code (VBA. Python)

I find myself constantly copying code from previous sections of my project into other sections in VBA.
My entire code seems unnecessarily long because of this.
Over half my code is copied from one section and pasted into other sections. This is an issue when I realize there's a small error, because I have to go back and correct it in every place I copied and pasted it.
Is there any way to make my code shorter and to reference copied code only once in VBA, and also in Python?
I usually keep a module for my functions. They work like a toolbox for me: I call them from my other subs all the time. They save me time and make my code more readable, and debugging is much easier. I recommend the same to you.
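In Python the same idea could look roughly like the sketch below. The function names are made up for illustration; in a real project clean_name would live in its own module (for example a hypothetical toolbox.py) and be imported wherever it is needed.

```python
# Shared helper, written once. In a real project this could sit in a
# separate module (hypothetical name: toolbox.py) and be imported.
def clean_name(raw):
    """Collapse repeated whitespace and normalise capitalisation."""
    return " ".join(raw.split()).title()


# Two different parts of the program reuse the helper instead of
# each carrying a pasted copy of the same logic.
def format_employee(record):
    return clean_name(record["name"])


def format_customer(row):
    return clean_name(row[0])
```

A bug in the cleaning logic then only has to be fixed in one place.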

Generate a certificate for .exe created by pyinstaller

I wrote a script for my company that randomly selects employees for random drug tests. It works wonderfully, except when I gave it to the person who would use the program. She clicked on it and a message popped up asking if she trusts the program. After she clicked "Run anyway", AVG flagged it two more times before it would finally load. I read someone else's comment saying to make an exception for it in the antivirus. The problem is, I wrote another program that reads other scripts, reads/writes txt files, generates Excel spreadsheets and does many other things. I'm really close to releasing the final product to a few select companies as a trial, and this certificate thing is going to be an issue. I code for fun, so there's a lot of lingo that goes right by me. Can someone point me in the right direction where I can get some information on creating a trusted program?
It appears to be a whole long process to obtain a digital certification. You need one to be issued by a certification authority. Microsoft appears to have a docs page on it.
After you have the certification, you'd need to sign your .exe file after it's been created using a tool like SignTool. You may find more useful and detailed answers than I can provide you in this thread, as I actually only know quite little about this whole process and can only redirect you to those who know more. I'd suggest you look through what I have listed here before asking me any more, since I probably know about as much as you do past this point.
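As a rough sketch of the signing step, the helper below only assembles a SignTool command line; it does not run it. It assumes the certificate was exported as a .pfx file and that the certification authority provides an RFC 3161 timestamp server. All paths, the password and the URL are placeholders to be replaced with your own values.

```python
def signtool_command(exe_path, pfx_path, password, timestamp_url):
    """Build the argument list for signing exe_path with SignTool.

    /f and /p select the certificate file and its password, /fd sets the
    file digest algorithm, /tr and /td configure the timestamp server.
    """
    return [
        "signtool", "sign",
        "/f", pfx_path,
        "/p", password,
        "/fd", "SHA256",
        "/tr", timestamp_url,
        "/td", "SHA256",
        exe_path,
    ]


# Hypothetical usage on a Windows machine with the Windows SDK installed:
#   import subprocess
#   subprocess.run(signtool_command(exe, pfx, pw, url), check=True)
```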
If anyone else is having this problem, I stumbled on a solution that works for me.
I created an install wizard using Inno Setup. Before I could install the software (my drug test program), it got flagged, asking me if I trust the software. I clicked "Run anyway" and my antivirus flagged it two more times. After the program was installed, it was never flagged again. Since my main program will probably be used by 100-200 people, I'm completely fine having to do that procedure once. However, for a more "professional" result, it's probably worth investing in certificates.

How to save python process for debug?

In the PyCharm debugger we can pause a process. I have a program to debug that runs for a long time before it reaches the part I'm debugging.
The program can be modeled like this: GOOD_CODE -> CODE_TO_DEBUG.
I'm wondering if there is a way to:
run GOOD_CODE
save the process
edit the code in CODE_TO_DEBUG
restore the process with the edited CODE_TO_DEBUG
Is serialization a good way to do it, or is there some tool for that?
I'm working on OSX with PyCharm.
Thank you for your kind answers.
The classic method is to write a program that reproduces the conditions that lead into the buggy code, without taking a bunch of time -- say, read in the data from a file instead of generating it -- and then paste in the code you're trying to fix. If you get it fixed in the test wrapper, and it still doesn't work in the original program, you then "only" have to find the interaction with the rest of the program that's faulty (global variables, bad parameters passed, etc.)
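The "read the data from a file instead of generating it" idea could be sketched with pickle as below. This assumes the result of GOOD_CODE is an ordinary picklable object; load_or_compute and good_code are hypothetical names used only for the illustration.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(cache_file, compute):
    """Return the cached result if it exists, else compute and cache it."""
    cache_file = Path(cache_file)
    if cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    with cache_file.open("wb") as fh:
        pickle.dump(result, fh)
    return result


calls = []


def good_code():
    """Stand-in for the slow part of the program."""
    calls.append(1)
    return {"rows": [1, 2, 3]}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "state.pkl"
    first = load_or_compute(cache, good_code)   # runs good_code
    second = load_or_compute(cache, good_code)  # loads from the file
```

With the state on disk, CODE_TO_DEBUG can be edited and rerun repeatedly without waiting for the slow part again.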

Python crashes in rare cases when running code - how to debug?

I have a problem that I have seriously spent months on now!
Essentially I am running code that needs to read from and save to HDF5 files. I am using h5py for this.
It's very hard to debug because the problem (whatever it is) only occurs in about 5% of the cases (each run takes several hours), and when it occurs it crashes Python completely, so debugging with Python itself is impossible. Using simple logs it's also impossible to pinpoint the exact crashing situation - it appears to be very random, crashing at different points within the code, or with a lag.
I tried using OllyDbg to figure out what's happening and can safely conclude that it consistently crashes at the following location: http://i.imgur.com/c4X5W.png
It seems to be shortly after calling the Python native PyObject_ClearWeakRefs, with an access violation error message. The weird thing is that the file is successfully written to. What would cause the access violation error? Or is that Python internal (e.g. the stack?) and not file (i.e. my code) related?
Does anyone have an idea what's happening here? If not, is there a smarter way of finding out what exactly is happening? Maybe some hidden Python logs or something I don't know about?
Thank you
PyObject_ClearWeakRefs is in the python interpreter itself. But if it only happens in a small number of runs, it could be hardware related. Things you could try:
Run your program on a different machine. If it doesn't crash there, it is probably a hardware issue.
Reinstall python, in case the installed version has somehow become corrupted.
Run a memory test program.
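Something not mentioned in the answers here, but possibly useful for the "hidden Python logs" part of the question, is the standard-library faulthandler module (Python 3.3 and later). It writes the Python-level traceback to a file when the interpreter receives a fatal signal such as a segmentation fault. A minimal sketch, with the helper name and log file name chosen only for illustration:

```python
import faulthandler


def enable_crash_log(path="crash_traceback.log"):
    """Dump Python tracebacks of all threads to path on a fatal error.

    The returned file object must stay open for as long as the handler
    is enabled, because faulthandler writes to its file descriptor.
    """
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling enable_crash_log() once at program start should be enough; whether the resulting traceback points at the real cause of a crash inside a C extension is not guaranteed.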
Thanks for all the answers. I ran two versions this time, one with a new python install and my same program, another one on my original computer/install, but replacing all HDF5 read/write procedures with numpy read/write procedures.
The program continued to crash on my second computer at odd times, but on my primary computer I had zero crashes with the changed code. I think it is thus safe to conclude that the problems were HDF5 or more specifically h5py related. It appears that more people encountered issues with h5py in that respect. Given that any error in my application translates to potentially large financial losses I decided to dump HDF5 completely in favor of other stable solutions.
Use a try/except statement (Python's equivalent of try/catch). This can be put into the program to stop it from crashing when erroneous data is entered.
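A minimal sketch of that suggestion follows; safe_step is a hypothetical helper name. Note that this only catches Python exceptions. An access violation inside native code, as described in the question, terminates the interpreter and is not turned into an exception.

```python
import logging


def safe_step(func, *args):
    """Run func(*args); log any Python exception and return None instead."""
    try:
        return func(*args)
    except Exception:
        # logging.exception records the message together with the traceback.
        logging.exception("step %r failed", getattr(func, "__name__", func))
        return None
```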

writing large netCDF4 file with python?

I am trying to use the netCDF4 package with Python. I am ingesting close to 20 million records of data, 28 bytes each, and then I need to write the data to a netCDF4 file. Yesterday, I tried doing it all at once, and after an hour or so of execution, Python stopped running the code with the very helpful error message:
Killed.
Anyway, doing this with subsections of the data, it becomes apparent that somewhere between 2,560,000 records and 5,120,000 records, the code doesn't have enough memory and has to start swapping. Performance is, of course, greatly reduced. So two questions:
1) Does anyone know how to make this work more efficiently? One thing I am thinking of is to somehow put subsections of the data in incrementally, instead of doing it all at once. Does anyone know how to do that? 2) I presume the "Killed" message happened when memory finally ran out, but I don't know. Can anyone shed any light on this?
Thanks.
Addendum: netCDF4 provides an answer to this problem, which you can see in the answer I have given to my own question. So for the moment, I can move forward. But here's another question: The netCDF4 answer will not work with netCDF3, and netCDF3 is not gone by a long shot. Anyone know how to resolve this problem in the framework of netCDF3? Thanks again.
It's hard to tell what you are doing without seeing code, but you could try using the sync command to flush the data in memory to disk after some amount of data has been written to the file:
http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4.Dataset-class.html
There is a ready answer in netCDF4: declare the netCDF4 variable with a specified chunk size (the chunksizes argument of createVariable). I used 10000, and everything proceeded very nicely. As I indicated in the edit to my question, I would like to find a way to resolve this in netCDF3 also, since netCDF3 is far from dead.
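Putting the two suggestions together (a chunked variable plus a periodic sync), an incremental write could look roughly like the sketch below. It is not the code from the question, which was never shown: the variable name "value", the type "f8" and the single unlimited dimension are assumptions, and each batch is assumed to be a NumPy array of numbers.

```python
def batched(items, size):
    """Yield successive lists of at most size items from an iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def append_batches(path, batches, chunk=10000):
    """Write batches one at a time so only one batch is held in memory."""
    from netCDF4 import Dataset  # imported here; requires the netCDF4 package

    written = 0
    with Dataset(path, "w", format="NETCDF4") as ds:
        ds.createDimension("record", None)  # unlimited dimension
        var = ds.createVariable("value", "f8", ("record",),
                                chunksizes=(chunk,))
        for batch in batches:
            var[written:written + len(batch)] = batch
            written += len(batch)
            ds.sync()  # flush buffered data to disk
    return written
```

The batched helper is only there to show how a long stream of records might be cut into pieces before being handed to append_batches.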
