I'm currently working on a project for fun that involves calculating way too many numbers of the Fibonacci sequence. The thing is, I want it to go fast, and I also don't want to lose too much progress if I have to do an update or run into computer issues.
So I over-engineered it, and I'm now looking to implement it with multiprocessing. But everywhere I look, the only way to share memory in that case is to make a duplicate of it, and that takes about an hour (yeah, I have very big numbers).
Is there any way to share a dictionary with multiprocessing without making a copy of it?
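For reference, the closest thing I have found is a multiprocessing.Manager dict, roughly like the sketch below (a made-up minimal example, not my real code). But it only proxies every access through the manager process instead of truly sharing memory, and filling it from my existing dictionary would still copy everything:

from multiprocessing import Manager, Process

def worker(shared):
    # every read/write here is forwarded to the manager process over IPC
    shared[3] = shared[1] + shared[2]

if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict({1: 1, 2: 1})  # the initial data is copied in
        p = Process(target=worker, args=(shared,))
        p.start()
        p.join()
        print(dict(shared))  # {1: 1, 2: 1, 3: 2}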
Here is the code, currently with my last implementation using threading instead of multiprocessing. (Ignore the bad timekeeping; I'll fix it later.)
import threading
import time
def printProgressBar(iteration, total, prefix="", suffix="", decimals=1, length=100, fill="█", printEnd="\r"):
    """
    Call in a loop to create terminal progress bar
    #params:
        iteration - Required : current iteration (Int)
        total     - Required : total iterations (Int)
        prefix    - Optional : prefix string (Str)
        suffix    - Optional : suffix string (Str)
        decimals  - Optional : positive number of decimals in percent complete (Int)
        length    - Optional : character length of bar (Int)
        fill      - Optional : bar fill character (Str)
        printEnd  - Optional : end character (e.g. "\r", "\r\n") (Str)
    """
    percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
    filledLength = int(length * iteration // total)
    bar = fill * filledLength + "-" * (length - filledLength)
    print(
        f"\r{prefix} |{bar}| {percent}% {suffix} {time.strftime('%dd %H:%M:%S', time.gmtime(time.time() - start_time - 86400)).replace('31d', '0d')}",
        end=printEnd,
    )
    # Print New Line on Complete
    if iteration == total:
        print()
dict47 = {0: 0, 1: 1, 2: 1}
dict47[0] = 2  # dict47[0] tracks the highest n computed so far


def fibSequence(n):
    if n not in dict47:
        dict47[n] = fibSequence(n - 1) + fibSequence(n - 2)
        if n > dict47[0]:
            dict47[0] = n
        # drop old entries to bound memory, except those kept for backups
        if n > 5 and not ((n - 3) % (iterations // 100) == 0) and not ((n - 2) % (iterations // 100) == 0):
            dict47.pop(n - 3, None)
    return dict47[n]
def makeBackup(start, num, total):
    # back up two consecutive terms so the sequence can be resumed from them
    x = threading.Thread(target=writeBackup, args=(num,), daemon=False)
    y = threading.Thread(target=writeBackup, args=(num - 1,), daemon=False)
    x.start()
    y.start()
    x.join()
    y.join()
    time.sleep(1)
    print(
        f'{num/10000000}% done after {time.strftime("%dd %H:%M:%S", time.gmtime(time.time() - start - 86400)).replace("31d", "0d")}'
    )
    timings = open("times.txt", "a")
    timings.write(str(int(time.time() - start)) + "\n")
    timings.close()


def writeBackup(num):
    file = open(f".temp/fib{num}.txt", "a")
    file.write(str(num) + " : " + str(dict47[num]))
    file.close()
    dict47.pop(num, None)
def loadDict():
    from pathlib import Path

    maximum = 0
    for n in range(1, 100):
        if Path(f".temp/fib{n*10000000}.txt").is_file():
            maximum = n * 10000000
    print("Maximum number found:", maximum)
    if maximum != 0:
        file = open(f".temp/fib{maximum}.txt", "r")
        temp = "".join(file.readlines())
        dict47[maximum] = int(temp.lstrip(str(maximum) + " : "))
        file.close()
        file = open(f".temp/fib{maximum - 1}.txt", "r")
        temp = "".join(file.readlines())
        dict47[maximum - 1] = int(temp.lstrip(str(maximum - 1) + " : "))
        file.close()
        dict47[0] = maximum
        print("Dictionary loaded at ", maximum)
    else:
        print("No dictionary found, starting from scratch")
if __name__ == "__main__":
try:
timings = open("times.txt", "r")
lastTime = int(timings.readlines()[-1])
start_time = time.time() - lastTime
timings.close()
except:
start_time = time.time() + 86400
print("Start duration:", time.strftime("%dd %H:%M:%S", time.gmtime(time.time() - start_time)).replace("31d", "0d"))
try:
iterations = int(input("Enter the number of iterations: "))
except:
iterations = 1000000000
print(iterations, "iterations will be performed")
loadDict()
num = dict47[0]
while num < iterations:
if num == 2:
num += 248
else:
num += 250
fibSequence(num)
if num % 1000 == 0:
printProgressBar(num, iterations, prefix="Progress:", suffix="Complete", length=100)
if num % (iterations // 100) == 0:
save = threading.Thread(
target=makeBackup,
args=(start_time, num, iterations),
daemon=False,
)
save.start()
file = open("fib.txt", "a")
file.write(str(iterations) + " : " + str(fibSequence(iterations)) + "\n")
file.close()
try:
save.join()
except:
print("No save thread running, exiting...")
The script below works fine when I run it with F9 (run selection). When running it with F5 (run file) it still works (all operations complete fine), but it produces the following error:
File "", line 1, in runfile('S:/Fakultaet/MFZ/NWFZ/AGdeHoz/Philipp/Python/Analysis_script.py',wdir='S:/Fakultaet/MFZ/NWFZ/AGdeHoz/Philipp/Python')
File "c:\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py",
line 709, in runfile namespace.pop('file')
KeyError: 'file'
I would like to know what is happening and how to fix it.
I read something about a nesting problem, so I took out the part where I import my own function and also the part where I use it. The error still persists, so at least it has nothing to do with my own function.
from IPython import get_ipython
get_ipython().magic('%reset -f')

import time
tic = time.time()

import sys
import os
import pickle
import CSC_getdata_timestamps

General_folder = 'Data'  # where processed data will be saved
filename_extension = '_Sleep_auditory'  # file extension name
Database_name = 'mydata.obj'  # name of database
ExpDates = 'ExpDates.obj'  # name of object containing all experimental dates
root = r"S:\Fakultaet\MFZ\NWFZ\AGdeHoz\Philipp"  # root directory (data location)
AnalyType = "LFP"  # "SP" Local field potential or Spike analysis
spr = 30000  # sample rate
Cname = "CH"

file_md = open(r"S:\Fakultaet\MFZ\NWFZ\AGdeHoz\Philipp\Data\Database\\" + Database_name, "rb")
md = pickle.load(file_md)
file_ExpDates = open(r"S:\Fakultaet\MFZ\NWFZ\AGdeHoz\Philipp\Data\Database\\" + ExpDates, "rb")
ExpDates = pickle.load(file_ExpDates)

for mi, expdates in enumerate(ExpDates):
    day = expdates
    mouse = md[expdates]['mouse']
    files = md[expdates]['rectime']
    filesm = md[expdates]['FRAm'] + md[expdates]['REMm'] + md[expdates]['SWSm'] + md[expdates]['awakem']
    rect = md[expdates]['rect']
    filest = [md[expdates]['FRAt'], md[expdates]['REMt'], md[expdates]['SWSt'], md[expdates]['awaket']]
    channels = md[expdates]['channels']
    for f, files_m in enumerate(filesm):
        try:
            if len(files) == 1:
                recname = files
            else:
                tfiles = []
                tfilesm = []
                for i, file in enumerate(files):
                    tfiles.append(int(files[i][0:2]) * 60 + int(files[i][3:5]))
                tfilesm = int(files_m[f][0:2]) * 60 + int(files_m[f][3:5])
                recidx = [i for i in range(len(tfiles)) if tfiles[i] < tfilesm]
                recname = files[recidx[-1]]
            TS0 = (int(rect[0:2]) * 3600 + int(rect[3:5]) * 60 + int(rect[6:8])) * spr
            TS1 = TS0 + (int(filest[f][0][0:2]) * 3600 + int(filest[f][0][3:5]) * 60 + int(filest[f][0][6:8])) * spr
            TS2 = TS0 + TS1 + (int(filest[f][1][0:2]) * 3600 + int(filest[f][1][3:5]) * 60 + int(filest[f][1][6:8])) * spr
            TS = [TS1, TS2]
            savename = files_m[0:2] + "-" + files_m[3:5]
            if AnalyType == "LFP":
                if not os.path.exists(root + "\Data\Raw" + filename_extension + "\LFP\\" + day):
                    os.makedirs(root + "\Data\Raw" + filename_extension + "\LFP\\" + day)
                else:
                    print("Directory already exists.")
            elif AnalyType == "SP":
                if not os.path.exists(root + "\Data\Raw" + filename_extension + "\SP\\" + day):
                    os.makedirs(root + "\Data\Raw" + filename_extension + "\SP\\" + day)
                else:
                    print("Directory already exists.")
            else:
                sys.exit("Invalid type of analysis. Check variable AnalyType.")
            varlist, trseq, trlength, stlength, ntrials, datacscini, samplefreq, tscsc, tstrig = CSC_getdata_timestamps.CSC_getdata_timestamps(root, day, savename, recname, channels, AnalyType, TS)
            print(trseq)
        except Exception as e:
            print('Error: ' + str(e))
            print('mi = {}, f = {}'.format(mi, f))

toc = round(time.time() - tic, 2)
print("elapsed time:", toc, "seconds")
The whole script combines data from (1) a written database with (2) a dataframe that includes the specifications of a sound played during (3) a recording.
Forgive me if it is written very unpythonically; I just recently switched from Matlab.
The script is for an application in neuroscience.
(Spyder maintainer here) This error is fixed in our latest version (3.3.4). Please update by opening the Anaconda prompt and running there:
conda install spyder=3.3.4
A question.
Using multiprocessing is supposed to make code faster.
But after using the following framework, getting the outputs takes the same time as, or more than, the normal code.
import multiprocessing

def code():
    ...  # my code

if __name__ == '__main__':
    p = multiprocessing.Process(target=code)
    p.start()
    p.join()
Because my laptop has two processors, after running this code the program asks me to import the data twice.
The problem is the runtime, which does not make any sense this way: it takes just as long as the normal code without parallelism.
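Here is a minimal sketch (a made-up example, not my real code) of what I mean by importing twice; the module-level line runs again when the child process starts:

import multiprocessing

# module-level code is re-executed when the child re-imports this file
# under the spawn start method, so this prints twice on my machine
print("loading data...")

def code():
    print("worker running")

if __name__ == '__main__':
    p = multiprocessing.Process(target=code)
    p.start()
    p.join()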
import numpy as np
from scipy.integrate import odeint
from math import *
from scipy.integrate import quad
import pandas as pd
import os
import warnings
warnings.filterwarnings("ignore")
# you need to add the following 3 lines
from multiprocessing import Pool
from multiprocessing import Process
import multiprocessing
print("Model 4, Equation 11")
print("")
###################### STEP NUMBER #######################
N = int(input("PLEASE ENTER NUMBER OF STEP WALKS: ")) # Step walk by user
dec = int(input("NUMBER OF DECIMAL PLACES OF OUTPUTS (RECOMMENDED 10-15)? "))
print("")
print("PLEASE WAIT, METROPOLIS HASTINGS IS RUNNING ... ")
print("")
def FIT():
    ##########################################################
    # Markov chains: the "o" arrays hold the current (old) state,
    # the "n" arrays hold the proposed (new) state.
    od0o = np.zeros((N,))
    od0o[0] = 0.72
    od0n = np.zeros((N,))
    Mo = np.zeros((N,))
    Mo[0] = 0
    Mn = np.zeros((N,))
    co = np.zeros((N,))
    co[0] = 0.84
    cn = np.zeros((N,))
    bo = np.zeros((N,))
    bo[0] = 0.02
    bn = np.zeros((N,))
    H0o = np.zeros((N,))
    H0o[0] = 70
    H0n = np.zeros((N,))
    Orco = np.zeros((N,))
    Orco[0] = 0.0003
    Orcn = np.zeros((N,))
    temp = 1e10  # a big number
    ##########################################################
    CovCMB = [[3.182, 18.253, -1.429],
              [18.253, 11887.879, -193.808],
              [-1.429, -193.808, 4.556]]  # CMB DATA
    ##########################################################
    def OD_H(U, z, c, b, Orc):
        od, H = U
        Omegai = 3 * b * ((1 - od - 2*(Orc)**0.5) + (1 - od - 2*(Orc)**0.5)**2/(1 - 2*(Orc)**0.5))  # equation 10
        div1 = np.divide((c**2 - od), (1 + z), where=(1 + z) != 0)
        div2 = np.divide(H, (1 + z), where=(1 + z) != 0)
        dMdt = (div1)*(6*od + 6 - 9*(od/c**2) + Omegai)*(1 + c**2 + od*(1 - 3/c**2))**(-1)
        dHdt = (div2)*(6*od + 6 - 9*(od/c**2) + Omegai)*(1 + c**2 + od*(1 - 3/c**2))**(-1)
        return [dMdt, dHdt]

    def solution(H0, z1, z, od0, c, b, Orc):
        U = odeint(OD_H, [od0, H0], [z1, z], args=(c, b, Orc))[-1]
        od, H = U
        return H
    ##########################################################
    def DMCMB1(H0, z1, z, od0, c, b, Orc):
        dm = 1090 * 1/solution(H0, z1, z, od0, c, b, Orc)
        return dm

    def R1(H0, z1, z, od0, c, b, Orc):
        # r = sqrt(Om)*(70/299000)*rszstar(z, Om, Od)
        r = sqrt(1 - od0 - 2*(Orc)**0.5)*DMCMB1(H0, z1, z, od0, c, b, Orc)
        return r

    def lA1(H0, z1, z, od0, c, b, Orc):
        la = ((3.14*299000/H0)*DMCMB1(H0, z1, z, od0, c, b, Orc))/(153)
        return la

    def CMBMATRIX1(H0, z1, z, od0, c, b, Orc, M):
        hmCMB1 = [lA1(H0, z1, z, od0, c, b, Orc) - 301.57, R1(H0, z1, z, od0, c, b, Orc) - 1.7382 + M, 0.0222 - 0.02262]
        vmCMB1 = [[lA1(H0, z1, z, od0, c, b, Orc) - 301.57], [R1(H0, z1, z, od0, c, b, Orc) - 1.7382], [0.0222 - 0.02262]]
        fmCMB1 = np.dot(hmCMB1, CovCMB)
        smCMB1 = np.dot(fmCMB1, vmCMB1)[0]
        return smCMB1
    ######################################################
    def TOTAL(H0, od0, c, b, Orc, M):
        total = CMBMATRIX1(H0, 0, 1090, od0, c, b, Orc, M)
        return total
    ######################################################
    ################## MCMC - MH #########################
    highest = 0
    pat = 'C:/Users/21/Desktop/MHT/Models/outputs'
    file_path = os.path.join(pat, 'Model3.txt')
    file_path2 = os.path.join(pat, 'Model3min.txt')
    with open(file_path, 'w') as f:  # DATA WILL BE SAVED IN THIS FILE; PLEASE BE CAREFUL TO CHANGE THE NAME IN EACH RUN TO AVOID OVERWRITING.
        with open(file_path2, 'w') as d:
            for i in range(1, N):
                num = 0
                R = np.random.uniform(0, 1)
                while True:
                    num += 1
                    od0n[i] = od0o[i-1] + 0.001 * np.random.normal()
                    H0n[i] = H0o[i-1] + 0.01 * np.random.normal()
                    bn[i] = bo[i-1] + 0.001 * np.random.normal()
                    cn[i] = co[i-1] + 0.001 * np.random.normal()
                    Mn[i] = Mo[i-1] + 0.01 * np.random.normal()
                    Orcn[i] = Orco[i-1] + 0.00001 * np.random.normal()
                    L = np.exp(-0.5 * (TOTAL(H0n[i], od0n[i], cn[i], bn[i], Orcn[i], Mn[i]) - TOTAL(H0o[i-1], od0o[i-1], co[i-1], bo[i-1], Orco[i-1], Mo[i-1])))  # L = exp(-(x^2)/2)
                    LL = min(1, max(L, 0))
                    if LL > R:
                        od0o[i] = od0n[i]
                        H0o[i] = H0n[i]
                        bo[i] = bn[i]
                        co[i] = cn[i]
                        Mo[i] = Mn[i]
                        Orco[i] = Orcn[i]
                        chi = TOTAL(H0o[i], od0o[i], co[i], bo[i], Orco[i], Mo[i])
                    else:
                        od0o[i] = od0o[i-1]
                        H0o[i] = H0o[i-1]
                        bo[i] = bo[i-1]
                        co[i] = co[i-1]
                        Mo[i] = Mo[i-1]
                        Orco[i] = Orco[i-1]
                        chi = TOTAL(H0o[i], od0o[i], co[i], bo[i], Orco[i], Mo[i])
                    if (Mo[i] > 0 and 0 < bo[i] < 0.09 and Orco[i] > 0) or num > 100:  # constraining the values to stay in the positive area
                        highest = max(num, highest)
                        break
                f.write("{0}\t{1}\t{2}\t{3}\t{4}\t{5}\t{6}\t{7}\t{8}\t{9}\t{10}\t{11}\t{12}\n".format(round(chi, dec), ' ', round(H0o[i], dec), ' ', round(od0o[i], dec), ' ',
                        round(co[i], dec), ' ', round(bo[i], dec), ' ',
                        round(Orco[i], dec), ' ', round(Mo[i], dec)))
                if chi < temp:
                    temp = chi
                    aa = H0o[i]
                    bb = od0o[i]
                    cc = co[i]
                    dd = bo[i]
                    ee = Mo[i]
                    ff = Orco[i]
                    Om = 1 - 2*sqrt(Orco[i]) - od0o[i]
                    # minimum of chi and its parameters
                    d.write("{0}\t{1}\t{2}\t{3}\t{4}\t{5}\t{6}\t{7}\t{8}\t{9}\t{10}\t{11}\t{12}\t{13}\t{14}\n".format(round(temp, dec), "H =", round(aa, dec), "Orc =",
                            round(ff, dec), "OD =", round(bb, dec), "c =",
                            round(cc, dec), "b =", round(dd, dec),
                            "M =", round(ee, dec), "Om =", round(Om, dec)))
                    print(round(temp, dec), "H =", round(aa, dec), "Orc =", round(ff, dec), "OD =", round(bb, dec), "c =", round(cc, dec), "b =", round(dd, dec), "M =", round(ee, dec), "Om =", round(Om, dec))
    # print(highest)
    print("")
    # test = input("Press enter to exit...")
    # print(test)
if __name__ == '__main__':
    p = multiprocessing.Process(target=FIT)
    p.start()
    p.join()
I think you missed the main concept of multiprocessing. It does not run your code faster by itself; it just lets you run something in another process to bypass the GIL (https://wiki.python.org/moin/GlobalInterpreterLock).
It can be used to parallelize computation of some function for different input values, like this example from the docs:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
This computes f in different processes, with each of them returning a separate value.
You are creating only one process, and all the logic inside it is sequential; that's why there is no change in performance: all of your code still runs sequentially.
There are two different scenarios where you can use multiprocessing.
Totally independent functionalities: if you have functionalities that are totally independent and there is no requirement to execute them sequentially, you can execute them in parallel; that way, none of them has to wait for the others. A good analogy is reading the newspaper and having breakfast: there is no need to do them sequentially, so you can do them at the same time (see the sketch below).
Executing the same functionality for different inputs: if you are executing a functionality repeatedly for different inputs, you can run multiple instances of that functionality on several inputs at a time. For an analogy, think of a single ticket counter; then think of multiple ticket counters: same functionality, multiple instances. This is what the Pool example above does.
Find these scenarios in your code and try to parallelize those functionalities.
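For the first scenario, a minimal sketch (with made-up task functions) might look like this:

from multiprocessing import Process

def read_newspaper():
    print("reading the newspaper")

def have_breakfast():
    print("having breakfast")

if __name__ == '__main__':
    jobs = [Process(target=read_newspaper), Process(target=have_breakfast)]
    for job in jobs:
        job.start()  # both tasks begin without waiting for each other
    for job in jobs:
        job.join()  # wait for both to finish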
Hope it helped.