I have a list of strings separated by commas. Each string contains the navigation steps/actions of the same procedure performed by different users. I want to create coordinates for these steps/actions and store them for building a graph. Each unique step/action
will have one coordinate. My idea is to take the string with the most steps first and assign its actions coordinates from (1,0) to (n,0). The first string has y = 0, meaning all of its actions sit in one layer. When I check the steps/actions of the second string, any missing ones get coordinates from (1,1) to (n,1), and so on. Care has to be taken that if the first steps/actions of one string fall in the middle of a longer string, their coordinates should come after that point.
This may sound confusing, but in simple terms I want to create coordinates for the user flow of a website.
Assume the list:
A = ['A___O___B___C___D___E___F___G___H___I___J___K___L___M___N',
'A___O___B___C___E___D___F___G___H___I___J___K___L___M___N',
'A___B___C___D___E___F___G___H___I___J___K___L___M___N',
'A___B___C___E___D___F___G___H___I___J___K___L___M___N',
'A___Q___C___D___E___F___G___H___I___J___K___L___M___N',
'E___P___F___G___H___I___J___K___L___M___N']
I started the code below, but it is getting complicated. Any help is appreciated.
A1 = [i.split('___') for i in A]
# A1.sort(key=len, reverse=True)
A1 = sorted(A1, reverse=True)
if len(A1) > 1:
    Actions = {}
    horizontalVal = {}
    verticalVal = {}
    restActions = []
    for i in A1:
        for j in i[1:]:
            restActions.append(j)
    for i in range(len(A1)):
        if A1[i][0] not in restActions and A1[i][0] not in Actions.keys():
            Actions[A1[i][0]] = [i, 0]
            horizontalVal[A1[i][0]] = i
            verticalVal[A1[i][0]] = 0
    unmarkedActions = []
    for i in range(len(sortedLen)):
        currLen = sortedLen[i]
        for j in range(len(A1)):
            if len(A1[j]) == currLen:
                if j == 0:
                    for k in range(len(A1[j])):
                        currK = A1[j][k]
                        if currK not in Actions.keys():
                            Actions[currK] = [k, 0]
                            horizontalVal[currK] = k
                            verticalVal[currK] = 0
                else:
                    currHori = []
                    print(A1[j])
                    for k in range(len(A1[j])):
                        currK = A1[j][k]
                        # ... to be continued
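For illustration, a minimal sketch of one way to implement the idea described above: lay the longest path out on layer y = 0, and give any action that first appears in a later path a coordinate on that path's own layer, placed just after the previously seen action so it stays "after that" point. The helper name assign_coordinates is illustrative, and the layering rule (one new layer per path that introduces new actions) is an assumption about the intended behaviour.

def assign_coordinates(paths):
    """Sketch: assign an (x, y) coordinate to every unique action.

    The longest path is laid out on layer y = 0 with x = 1..n.
    Actions that first appear in a later path go on that path's own
    layer, with x placed just after the previous action of the same
    path so the ordering is preserved.
    """
    split_paths = sorted((p.split('___') for p in paths), key=len, reverse=True)
    coords = {}
    for layer, path in enumerate(split_paths):
        prev_x = 0
        for step in path:
            if step in coords:
                # already placed: remember its x so new neighbours land after it
                prev_x = coords[step][0]
            else:
                prev_x += 1
                coords[step] = (prev_x, layer)
    return coords

A = ['A___O___B___C___D___E___F___G___H___I___J___K___L___M___N',
     'A___O___B___C___E___D___F___G___H___I___J___K___L___M___N',
     'A___B___C___D___E___F___G___H___I___J___K___L___M___N',
     'A___B___C___E___D___F___G___H___I___J___K___L___M___N',
     'A___Q___C___D___E___F___G___H___I___J___K___L___M___N',
     'E___P___F___G___H___I___J___K___L___M___N']
print(assign_coordinates(A))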
I have a network stored as a list of two lists: the first list holds the origin nodes and the second list the destination nodes, so the two lists together tell you which origin has an edge to which destination.
So essentially I have this:
edge_index = [[0,1,2,3,5,6,5,9,10,11,12,12,13],[1,2,3,4,6,7,8,10,11,10,13,12,9]]
And I want to split this list structure into:
[[0,1,2,3,5,6,5],[9,10,11,12,12,13]]
[[1,2,3,4,6,7,8],[10,11,10,13,12,9]]
i.e. there is no link between 8 and 9, so it's a new subgraph.
I cannot use networkx because it does not seem to give me the right number of subgraphs (I know in advance how many networks there should be). So I wanted to split the list into subgraphs with a different method, and then see whether I get the same number as NetworkX or not.
I wrote this code:
edge_index = [[0,1,2,3,5,6,5],[1,2,3,4,6,7,8]]
origins_split = edge_index[0]
dest_split = edge_index[1]
master_list_of_all_graph_nodes = [0,1,2,3,4,5,6,7,8] ##for testing
list_of_graph_nodes = []
list_of_origin_edges = []
list_of_dest_edges = []
graph_nodes = []
graph_edge_origin = []
graph_edge_dest = []
targets_list = []

for o,d in zip(origins_split,dest_split): #change
    if o not in master_list_of_all_graph_nodes:
        if d not in master_list_of_all_graph_nodes:
            nodes = [o,d]
            origin = [o]
            dest = [d]
            graph_nodes.append(nodes)
            graph_edge_origin.append(origin)
            graph_edge_dest.append(dest)
        elif d in master_list_of_all_graph_nodes:
            for index,graph_node_list in enumerate(graph_nodes):
                if d in graph_node_list:
                    origin_list = graph_edge_origin[index]
                    origin_list.append(o)
                    dest_list.append(d)
                    master_list_of_all_graph_nodes.append(o)
    if d not in master_list_of_all_graph_nodes:
        if o in master_list_of_all_graph_nodes:
            for index,graph_node_list in enumerate(graph_nodes):
                if o in graph_node_list:
                    origin_list = graph_edge_origin[index]
                    origin_list.append(o)
                    dest_list.append(d)
                    master_list_of_all_graph_nodes.append(d)
    if o in master_list_of_all_graph_nodes:
        if d in master_list_of_all_graph_nodes:
            o_index = ''
            d_index = ''
            for index,graph_node_list in enumerate(graph_nodes):
                if d in graph_node_list:
                    d_index = index
                if o in graph_node_list:
                    o_index = index
            if o_index == d_index:
                graph_edge_origin[o_index].append(o)
                graph_edge_dest[d_index].append(d)
                master_list_of_all_graph_nodes.append(o)
                master_list_of_all_graph_nodes.append(d)
            else:
                o_list = graph_edge_origin[o_index]
                d_list = graph_edge_dest[d_index]
                node_o_list = node_list[o_index]
                node_d_list = node_list[d_index]
                new_node_list = node_o_list + node_d_list
                node_list.remove(node_o_list)
                node_list.remove(node_d_list)
                graph_edge_origin.remove(o_list)
                graph_edge_dest.remove(d_list)
                new_origin_list = o_list.append(o)
                new_dest_list = d_list.append(d)
                graph_nodes.append(new_node_list)
                graph_edge_dest.append(new_dest_list)
                graph_edge_origin.append(new_origin_list)
                master_list_of_all_graph_nodes.append(o)
                master_list_of_all_graph_nodes.append(d)

print(graph_nodes)
print(graph_edge_dest)
print(graph_edge_origin)
And I get the error:
graph_edge_origin[o_index].append(o)
TypeError: list indices must be integers or slices, not str
I was wondering if someone could show where I'm going wrong, but I also feel like I'm doing this really inefficiently, so if someone could demonstrate a better method I'd appreciate it. I can see other questions like this, but not one I can figure out how to apply here specifically.
In this line:
graph_edge_origin[o_index].append(o)
o_index is a string (probably the empty string, due to the for-loop not being entered).
In general either set a break-point on the line that is failing and inspect the variables in your debugger, or print out the variables before the failing line.
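As for the "better method" part: a minimal sketch of splitting the edge lists into connected components without networkx, using a small union-find (disjoint-set) helper; the function name split_subgraphs is just illustrative. Note that grouping strictly by connectivity gives three components for the sample data ({0..4}, {5..8}, {9..13}) rather than the two shown above, because there is also no edge between 4 and 5.

def split_subgraphs(edge_index):
    """Split [[origins], [destinations]] into one pair of lists per
    connected component (edges treated as undirected for grouping)."""
    origins, dests = edge_index
    parent = {}

    def find(n):
        parent.setdefault(n, n)
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    def union(a, b):
        parent[find(a)] = find(b)

    for o, d in zip(origins, dests):
        union(o, d)

    # group the original edges by the component of their origin node
    components = {}
    for o, d in zip(origins, dests):
        comp = components.setdefault(find(o), ([], []))
        comp[0].append(o)
        comp[1].append(d)

    return [list(pair) for pair in components.values()]

edge_index = [[0,1,2,3,5,6,5,9,10,11,12,12,13],
              [1,2,3,4,6,7,8,10,11,10,13,12,9]]
for comp in split_subgraphs(edge_index):
    print(comp)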
Good morning to all,
The objective is to create a series of new columns by inserting x and y into the df[f'sma_{x}_Vs_sma_{y}'] assignment.
The problem I'm having is that I'm only getting the last tuple value into the function, and therefore into the data frame, as you can see in the last image.
In the second part of the code there are 3 examples of how the tuple values must be plugged into the function. In the examples I use the first two tuples (10,11) and (10,12) and the last tuple (13,14).
Code:
a = list(combinations(range(10, 15),2))
print(a)

for index, tuple in enumerate(a):
    x = tuple[0]
    y = tuple[1]
    print(x, y)

df[f'sma_{x}_Vs_sma_{y}'] = np.where(ta.sma(df['close'], lenght = x) > ta.sma(df['close'], lenght = y),1,-1)
Code Examples:
Tuple (10,11)
df[f'sma_{10}_Vs_sma_{11}'] = np.where(ta.sma(df['close'], lenght = 10) > ta.sma(df['close'], lenght = 11),1,-1)
Tuple (10,12)
df[f'sma_{10}_Vs_sma_{12}'] = np.where(ta.sma(df['close'], lenght = 10) > ta.sma(df['close'], lenght = 12),1,-1)
Tuple (13,14)
df[f'sma_{13}_Vs_sma_{14}'] = np.where(ta.sma(df['close'], lenght = 13) > ta.sma(df['close'], lenght = 14),1,-1)
Error code (screenshot not included)
The next lines show the code that solves the issue. Although in hindsight it seems very easy, it took me some time to get to the answer.
Thanks to the people who commented on the issue.
a = list(combinations(range(5, 51), 2))
print(a)

for x, y in a:
    # note: the pandas_ta keyword is "length"
    df[f'hma_{x}_Vs_hma_{y}'] = np.where(ta.hma(df['close'], length=x) > ta.hma(df['close'], length=y), 1, -1)
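For completeness, a self-contained sketch of the same looping pattern, with a toy DataFrame and a plain rolling mean standing in for ta.sma so it runs without pandas_ta; the data and helper are made up purely for illustration:

import numpy as np
import pandas as pd
from itertools import combinations

# toy data standing in for the real close prices
df = pd.DataFrame({'close': np.random.default_rng(0).normal(100, 5, 200).cumsum()})

def sma(series, length):
    # plain rolling mean used here instead of ta.sma, purely for illustration
    return series.rolling(length).mean()

for x, y in combinations(range(10, 15), 2):
    # the assignment must sit inside the loop so every (x, y) pair creates a column
    df[f'sma_{x}_Vs_sma_{y}'] = np.where(sma(df['close'], x) > sma(df['close'], y), 1, -1)

print(df.filter(like='_Vs_').tail())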
There are two rasters, shown below. One consists of only four values [1,2,3,4]; the other consists of values between 800 and 2500. The problem is to go through all of the raster-1 regions and find the maximum value of raster-2 located inside each region or segment.
In theory it seems simple, but I can't find a way to implement it. I'm reading the scikit-image documentation and I'm getting more confused. In theory, it would be:
for i in raster1rows:
    for j in i:
        # where j is part of a closed patch, iterate through the identical
        # elements of raster-2 and find the maximum value.
There is another problem inherent to this question which I can't post as a different topic. As you can see, there are a lot of isolated pixels on raster-1, which could be interpreted as regions and produce a lot of additional maximums. To prevent this I used:
raster1 = raster1.astype(int)
raster1 = skimage.morphology.remove_small_objects(raster1, min_size=20, connectivity=2, in_place=True)
But it seems to have no effect on raster-1.
To remove the small objects I did:
array_aspect = sp.median_filter(array_aspect, size=10)
And it gave me good results.
To find the maximum elevation inside each closed part I did:
# %%% to flood-fill closed boundaries on the classified raster
p = 5
ind = 1
for i in rangerow:
    for j in rangecol:
        if array_aspect[i][j] in [0, 1, 2, 3, 4]:
            print("{}. row: {} col: {} is {} is floodfilled with {}, {} meters".format(ind, i, j, array_aspect[i][j], p, array_dem[i][j]))
            array_aspect = sk.flood_fill(array_aspect, (i, j), p, in_place=True, connectivity=2)
            p = p + 1
        else:
            pass
        ind = ind + 1

# %%% Finds the max elev inside each fill and returns an array-based [Y, X, (ELEV #in meters)]
p = 5
maxdems = {}
for i in rangerow:
    for j in rangecol:
        try:
            if bool(maxdems[array_aspect[i][j]]) == False or maxdems[array_aspect[i][j]][-1] < array_dem[i][j]:
                maxdems[array_aspect[i][j]] = [i, j, array_dem[i][j]]
            else:
                pass
        except:  # This is very diabolical, but yeah :))
            maxdems[array_aspect[i][j]] = [i, j, array_dem[i][j]]
print(maxdems)
I've got my desired results.
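As an alternative to the flood-fill loop, the per-region maximum can also be obtained by labelling raster-1's patches and letting scipy.ndimage do the reduction in one call. A minimal sketch with small made-up arrays standing in for the real rasters:

import numpy as np
from scipy import ndimage
from skimage import measure

# toy stand-ins for the real data: raster1 holds class values 1..4,
# raster2 holds elevations in the 800..2500 range
raster1 = np.array([[1, 1, 2, 2],
                    [1, 1, 2, 2],
                    [3, 3, 4, 4],
                    [3, 3, 4, 4]])
raster2 = np.array([[ 900, 1000, 1500, 1600],
                    [ 950, 1100, 1550, 1700],
                    [2000, 2100,  800,  850],
                    [2050, 2200,  820,  870]])

# skimage.measure.label connects neighbouring pixels that share the same value,
# so every contiguous patch of one class gets its own integer label
labels = measure.label(raster1, connectivity=2)
region_ids = np.unique(labels)

max_per_region = ndimage.maximum(raster2, labels=labels, index=region_ids)
max_positions = ndimage.maximum_position(raster2, labels=labels, index=region_ids)

for rid, mx, pos in zip(region_ids, max_per_region, max_positions):
    print(f"region {rid}: max elevation {mx} at (row, col) {pos}")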
I apologise for the terrible description and if this is a duplicate; I have no idea how to phrase this question. Let me explain what I am trying to do. I have a list consisting of 0s and 1s that is 3600 elements long (1 hour of time-series data). I used itertools.groupby() to get a list of (key, count) tuples for consecutive runs. I need (0,1) to be counted as (1,1) and summed with the flanking tuples.
so
[(1,8),(0,9),(1,5),(0,1),(1,3),(0,3)]
becomes
[(1,8),(0,9),(1,5),(1,1),(1,3),(0,3)]
which should become
[(1,8),(0,9),(1,9),(0,3)]
Right now, what I have is:
def counter(file):
    list1 = list(dict[file])  # make a list of the data currently being worked on
    graph = dict.fromkeys(list(range(0, 3601)))  # make a graphing dict, x = key, y = value
    for i in list(range(0, 3601)):
        graph[i] = 0  # set all the values / y from None to 0
    for i in list1:
        graph[i] += 1  # populate the values in the graphing dict
    x, y = zip(*graph.items())  # unpack graphing dict into lists, x = 0 to 3600 and y = time where it bit
    z = [(x[0], len(list(x[1]))) for x in itertools.groupby(y)]  # make a new list z where consecutive y is in format (value, count)
    z[:] = [list(i) for i in z]
    for i in z[:]:
        if i == [0, 1]:
            i[0] = 1
    return(z)
dict is a dictionary where the keys are filenames and the values are lists of numbers to be used in the function counter(), and this gives me something like this, but much longer:
[[1,8],[0,9],[1,5], [1,1], [1,3],[0,3]]
Edit:
Solved it with the help of a friend:
while (0,1) in z:
    idx = z.index((0,1))
    if idx == len(z)-1:
        break
    z[idx] = (1, 1 + z[idx-1][1] + z[idx+1][1])
    del z[idx+1]
    del z[idx-1]
Not sure what exactly it is that you need, but this is my best attempt at understanding it.
def do_stuff(original_input):
    new_original = []
    new_original.append(original_input[0])
    for el in original_input[1:]:
        if el == (0, 1):
            el = (1, 1)
        if el[0] != new_original[-1][0]:
            new_original.append(el)
        else:
            (a, b) = new_original[-1]
            new_original[-1] = (a, b + el[1])
    return new_original
# check
print (do_stuff([(1,8),(0,9),(1,5),(0,1),(1,3),(0,3)]))
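Another sketch of the same idea: substitute (0,1) with (1,1) and then re-run itertools.groupby over the run list so adjacent runs with equal keys merge back together. The helper name merge_short_gaps is just illustrative.

import itertools

def merge_short_gaps(runs):
    """Treat (0, 1) runs as (1, 1) and merge adjacent runs with equal keys."""
    fixed = [(1, n) if (k, n) == (0, 1) else (k, n) for k, n in runs]
    return [(k, sum(n for _, n in grp))
            for k, grp in itertools.groupby(fixed, key=lambda t: t[0])]

print(merge_short_gaps([(1, 8), (0, 9), (1, 5), (0, 1), (1, 3), (0, 3)]))
# expected: [(1, 8), (0, 9), (1, 9), (0, 3)]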
This probably leads to scipy/numpy, but right now I'm happy with any functionality, as I couldn't find anything in those packages. I have a matrix that contains data from a multivariate distribution (let's say 2 variables, for the fun of it). Is there any function to compute (higher) moments of that? All I could find were numpy.mean() and numpy.cov() :o
Thanks :)
/edit:
So, some more detail: I have multivariate data, that is, a matrix where rows are variables and columns are observations. Now I would like a simple way of computing the joint moments of that data, as defined in http://en.wikipedia.org/wiki/Central_moment#Multivariate_moments.
I'm pretty new to Python/scipy, so I'm not sure I'd be the best person to code this one up, especially for the n-variable case (note that the Wikipedia definition is for n=2), and I kind of expected there to be some out-of-the-box thing to use, as I thought this would be a standard problem.
/edit2:
Just for the future, in case someone wants to do something similar: the following code (which is still under review) should give the sample equivalent of the raw moments E(X^2), E(Y^2), etc. It only works for two variables right now, but it should be extendable if one feels the need. If you see mistakes or unclean/un-Pythonic code, feel free to comment.
from numpy import *

# this function should return something like
# moments[0] = 1
# moments[1] = mean(X), mean(Y)
# moments[2] = 1/n*X'X, 1/n*X'Y, 1/n*Y'Y
# moments[3] = mean(X'X'X), mean(X'X'Y), mean(X'Y'Y), mean(Y'Y'Y)
# etc.
def getRawMoments(data, moment, axis=0):
    a = moment
    if axis == 0:
        n = data.shape[1]
        X = matrix(data[0, :]).reshape((n, 1))
        Y = matrix(data[1, :]).reshape((n, 1))
    else:
        n = data.shape[0]
        X = matrix(data[:, 0]).reshape((n, 1))
        Y = matrix(data[:, 1]).reshape((n, 1))

    Z = hstack((X, Y))
    iota = ones((1, n))
    moments = {}
    moments[0] = 1

    # first, generate a huge matrix containing all x-y combinations
    # for every power combination k, l such that k + l = i, for all 0 <= i <= a
    for i in arange(1, a):
        if i == 2:
            moments[i] = moments[i-1]*Z
        # if odd, postmultiply with Z.T
        elif i % 2 == 1:
            moments[i] = kron(moments[i-1], Z.T)
        # else (even), postmultiply with Z
        elif i % 2 == 0:
            temp = moments[i-1]
            temp2 = temp[:, 0:n]*Z
            temp3 = temp[:, n:2*n]*Z
            moments[i] = hstack((temp2, temp3))

    # since we now have many duplicated moments,
    # such as x**2*y and x*y*x, filter out non-distinct elements
    momentsDistinct = {}
    momentsDistinct[0] = 1
    for i in arange(1, a):
        if i % 2 == 0:
            data = 1/n*moments[i]
        elif i == 1:
            temp = moments[i]
            temp2 = temp[:, 0:n]*iota.T
            data = 1/n*hstack((temp2))
        else:
            temp = moments[i]
            temp2 = temp[:, 0:n]*iota.T
            temp3 = temp[:, n:2*n]*iota.T
            data = 1/n*hstack((temp2, temp3))
        momentsDistinct[i] = unique(data.flat)

    return momentsDistinct
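For reference, a small sketch that computes an arbitrary joint sample moment (raw or central) directly for any number of variables, following the Wikipedia definition linked above; the function name and interface are made up for illustration:

import numpy as np

def joint_moment(data, powers, central=True):
    """Sample joint moment of data (variables in rows, observations in columns).

    powers: one exponent per variable, e.g. (2, 1) gives E[(X-mx)**2 * (Y-my)]
    when central=True, or E[X**2 * Y] when central=False.
    """
    data = np.asarray(data, dtype=float)
    if central:
        data = data - data.mean(axis=1, keepdims=True)
    # raise each variable (row) to its power and average the product over observations
    prod = np.prod(data ** np.asarray(powers)[:, None], axis=0)
    return prod.mean()

# usage: the (1, 1) central co-moment should match the sample covariance (1/n normalisation)
rng = np.random.default_rng(0)
xy = rng.normal(size=(2, 1000))
print(joint_moment(xy, (1, 1)))        # joint central moment E[(X-mx)(Y-my)]
print(np.cov(xy, bias=True)[0, 1])     # same value from numpy.cov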