G'day. I am new to coding and Python.
My goal is to write code that, whenever the elements of y reach the next 0, replaces every value from the previous 0 up to n (the last value before that next zero) with n. For the input below, the output should look like this:
y = [0,1,2,3,4,5,6,7,8,0,1,0,1,2,3,4,5,6,7]
# I am iterating over two inputs: y_1 is y[1:] with 0 appended at the end.
y_1 = [1,2,3,4,5,6,7,8,0,1,0,1,2,3,4,5,6,7,0]
expected output:
x = [8, 8, 8, 8, 8, 8, 8, 8, 8, 1, 1, 7, 7, 7, 7, 7, 7, 7, 7]
The problem, I believe, is that the while loop does not loop again after [0,1,2,3,4,5,6,7,8] is deleted from the list, as written in the code below (which, logically, I would expect it to do):
y = [0,1,2,3,4,5,6,7,8,0,1,0,1,2,3,4,5,6,7]
y_1 = [1,2,3,4,5,6,7,8,0,1,0,1,2,3,4,5,6,7,0]
x = []
while len(y):
    for i, j in zip(y, y_1):
        if i > j:
            z = i
            for k in range(z+1):
                x.append(y[i])
            del y[0:z+1]
            del y_1[0:z+1]
        elif i == j:
            z = 0
            x.append(z)
            del y[z]
            del y_1[z]
Any suggestion would be greatly appreciated :)
I don't know why you use del and while, because you should get the expected result with:
y = [0,1,2,3,4,5,6,7,8,0,1,0,1,2,3,4,5,6,7]
y_1 = y[1:] + [0]
x = []
for i, j in zip(y, y_1):
    if i > j:
        z = i
        for k in range(z+1):
            x.append(y[i])
    elif i == j:
        z = 0
        x.append(z)
print(x)
In Python you shouldn't delete elements from a list while a for loop is iterating over it: deleting an element shifts the remaining elements, so the loop skips items and can give unexpected results.
If you really want to run it inside while len(y), you should rather build a new list containing the elements you want to keep. Alternatively, duplicate the list (y_duplicated = y.copy()), delete from the duplicate inside the for loop, and after the loop replace the original with y = y_duplicated.
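As an alternative that avoids both del and the shifted copy y_1, here is a minimal sketch that builds a new list in one pass. The function name fill_runs_with_last is made up for illustration, and it assumes that a "run" ends right before the next 0 or at the end of the list:

```python
def fill_runs_with_last(values):
    """Replace every run (ending just before the next 0) with the run's last value."""
    result = []
    run_length = 0
    for index, value in enumerate(values):
        run_length += 1
        is_last = index == len(values) - 1
        # A run ends at the end of the list or right before the next 0.
        if is_last or values[index + 1] == 0:
            result.extend([value] * run_length)
            run_length = 0
    return result


y = [0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7]
x = fill_runs_with_last(y)
print(x)
```

Because the original list is never modified, there is nothing to go wrong with deleting during iteration, and the appended value is the run's actual last element rather than y[i].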
I need to iterate sequentially from a start position until I either find a target value or reach a maximum number of elements.
For example:
ds = [1,2,3,4,5,6,7,8,9]
tmp = 3 # start (variable)
max = 5 # maximize (variable)
target = 8
Expected output: [4,5,6,7,8]
Sorry, my English is not good.
A very simple approach is to slice the list concatenated with itself.
From a memory point of view, however, it is certainly not the best solution.
# ds = [1,2,3,4,5,6,7,8,9]
start = 4
length = 7
res = (ds + ds)[start:start+length]
# [5, 6, 7, 8, 9, 1, 2]
There is a built-in way to do this.
new_data = i[starting_index : ending_index]
You can leave a number blank if you want to get the rest of the list. Like:
>>> i = [0,8,9,4]
>>> i[1:]
[8, 9, 4]
See this solution; I used a for loop to reach your target:
ds = [1,2,3,4,5,6,7,8,9]
tmp = 3 # start (variable)
max = 5 # maximize (variable)
target=8
i=0 # iteration index into the list
for x in ds:
    if ds[i]<tmp: # we have not reached the start point yet
        i=i+1
    else:
        print(ds[i])
        i=i+1
        if i == target: # the target has been reached
            break
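The question is a bit ambiguous, so here is a hedged sketch under one reading of it: start at index tmp, collect elements, and stop as soon as the target value is found or the maximum count is reached. The function name take_until and its parameter names are made up for illustration:

```python
def take_until(ds, start, limit, target):
    """Collect items from index `start` until `target` is seen or `limit` items are taken."""
    out = []
    for item in ds[start:]:
        out.append(item)
        if item == target or len(out) == limit:
            break
    return out


ds = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(take_until(ds, start=3, limit=5, target=8))  # [4, 5, 6, 7, 8]
print(take_until(ds, start=3, limit=5, target=6))  # [4, 5, 6]
```

With the values from the question both stop conditions happen to trigger on the same element, which is why the second call is included to show the target stopping the loop early.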
Try:
>>> ds
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> (ds * ((5+6)//len(ds) + 1))[5:5+6]
[6, 7, 8, 9, 1, 2]
Here 5 is your starting position and 5+6 is your end position. You need to repeat the whole data set enough times to contain the end position, hence the factor:
((5+6)//len(ds) + 1)
If your starting position falls in the second or a later repetition (in your case, if it is > 8), you can move it back by subtracting:
(startPosition//len(ds)) * len(ds)
from both the start position and the end position.
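If copying the list several times is a concern, a wrap-around window can also be sketched with modular indexing. This is a swapped-in technique, not what the answers above do; the function name wrapped_slice is made up for illustration, and it assumes ds is non-empty:

```python
def wrapped_slice(ds, start, length):
    """Return `length` items beginning at index `start`, wrapping around the end of ds."""
    n = len(ds)  # assumed to be > 0
    return [ds[(start + offset) % n] for offset in range(length)]


ds = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(wrapped_slice(ds, 5, 6))   # [6, 7, 8, 9, 1, 2]
print(wrapped_slice(ds, 4, 7))   # [5, 6, 7, 8, 9, 1, 2]
print(wrapped_slice(ds, 14, 3))  # [6, 7, 8]
```

The modulo takes care of a start position beyond the end of the list, so no separate subtraction step is needed.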
Context
I have not yet found a generalized question about how to create an "asymmetric"* multi-dimensional array in Python. *By asymmetric I mean that some elements, which are arrays themselves, vary in length.
attempt 0
second_dim = [1,2]
second_dim[0] = [1,5]
second_dim[1] = [1,5,9]
list = [ [ [ [ '' for i in range(2) ] for j in range(len(second_dim[i])) ] for k in range(5) ] for l in range(6) ]
Note that this yields error: undefined name i
Question
How would one construct such^ an array in python using parameters?
^Suppose:
dimension 0 has length a = 2
the first element of dimensions 0 has length b = 2
the second element of dimensions 0 has length c = 3
dimension 2 has length 5 dimension 4 has length d = 5
dimension 3 has length 5 dimension 4 has length e = 6
I am not really sure what you meant by the example, but this might help.
If you define your desired dimensions in a list like this:
dims = [
[4, 5],
[5, 6, 2],
]
You can use a very simple recursive function:
def make_dims(dims, fill_func=int):
    if isinstance(dims, int):
        return [fill_func() for _ in range(dims)]
    return [make_dims(x, fill_func) for x in dims]
l = make_dims(dims)
to create the list l, which has "dimensions" like this:
len(l) == len(dims) == 2
len(l[0]) == len(dims[0]) == 2
len(l[0][0]) == dims[0][0] == 4
len(l[0][1]) == dims[0][1] == 5
len(l[1]) == len(dims[1]) == 3
len(l[1][0]) == dims[1][0] == 5
len(l[1][1]) == dims[1][1] == 6
len(l[1][2]) == dims[1][2] == 2
and all the last level elements being whatever is returned by the fill_func() producer.
For your example second_dim = [[1, 5], [1, 5, 9]] calling make_dims(second_dim) just works right out of the box.
However, if you are sure you don't need the flexibility with more dimensions that this solution provides, and you insist on using a list comprehension, your "attempt 0" isn't too far off. You just need to keep better track of which values each level is iterating over:
[[[0 for _ in range(y)] for y in x] for x in second_dim]
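As a quick check of the recursive approach, the sketch below repeats make_dims so it runs on its own and applies it to second_dim from the question. Using fill_func=str to get empty strings (as in "attempt 0") is an assumption about what the asker wanted; the names nested and shape are only for illustration:

```python
def make_dims(dims, fill_func=int):
    # An int is a leaf: build a flat list of that many filler values.
    if isinstance(dims, int):
        return [fill_func() for _ in range(dims)]
    # Otherwise recurse into each sub-specification.
    return [make_dims(x, fill_func) for x in dims]


second_dim = [[1, 5], [1, 5, 9]]
nested = make_dims(second_dim, fill_func=str)  # str() returns ''

# Recover the innermost lengths to confirm the ragged shape.
shape = [[len(inner) for inner in outer] for outer in nested]
print(shape)         # [[1, 5], [1, 5, 9]]
print(nested[0][0])  # ['']
```

Each innermost list is created separately, so changing one of them does not affect the others.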
I have a 2D numpy array with dimension (690L, 15L).
I need to compute a column-wise mean on this dataset, only for some particular columns, but with a condition: a row should be included if and only if the element of that row at a specific column satisfies a condition. Let me make this clearer with some code.
f = open("data.data")
dataset = np.loadtxt(fname = f, delimiter = ',')
I have a list filled with the indexes of the columns for which I need the mean (and variance):
index_catego = [0, 3, 4, 5, 7, 8, 10, 11]
The condition is dataset[i, 14] == 1.
As output I want a 1D array of length len(index_catego), where each element is the mean of the corresponding column:
output = [mean_of_index_0, mean_of_index_3, ..., mean_of_index_11]
I have only recently started using Python, but I am sure there is a cool way of doing this with np.where, a mask, np.mean or something else.
I have already implemented a solution, but it is not elegant and I am not sure it is correct.
import numpy as np
index_catego = [0, 3, 4, 5, 7, 8, 10, 11]
matrix_mean_positive = []
matrix_variance_positive = []
matrix_mean_negative = []
matrix_variance_negative = []
n_positive = 0
n_negative = 0
sum_positive = np.empty(len(index_catego))
sum_negative = np.empty(len(index_catego))
for i in range(dataset.shape[0]):
    if dataset[i, 14] == 0:
        n_positive = n_positive + 1
        j = 0
        for k in index_catego:
            sum_positive[j] = sum_positive[j] + dataset[i, k]
            j = j + 1
    else:
        n_negative = n_negative + 1
        j = 0
        for k in index_catego:
            sum_negative[j] = sum_negative[j] + dataset[i, k]
            j = j + 1
for item in np.nditer(sum_positive):
    matrix_mean_positive.append(item / n_positive)
for item in np.nditer(sum_negative):
    matrix_mean_negative.append(item / n_negative)
print(matrix_mean_positive)
print(matrix_mean_negative)
If you want to try your solution, here is some example data:
1,22.08,11.46,2,4,4,1.585,0,0,0,1,2,100,1213,0
0,22.67,7,2,8,4,0.165,0,0,0,0,2,160,1,0
0,29.58,1.75,1,4,4,1.25,0,0,0,1,2,280,1,0
0,21.67,11.5,1,5,3,0,1,1,11,1,2,0,1,1
1,20.17,8.17,2,6,4,1.96,1,1,14,0,2,60,159,1
0,15.83,0.585,2,8,8,1.5,1,1,2,0,2,100,1,1
1,17.42,6.5,2,3,4,0.125,0,0,0,0,2,60,101,0
Thanks for your help.
UPDATE 1:
I tried with this
output_positive = dataset[:, index_catego][dataset[:, 14] == 0]
mean_p = output_positive.mean(axis = 0)
print(mean_p)
output_negative = dataset[:, index_catego][dataset[:, 14] == 1]
mean_n = output_negative.mean(axis = 0)
print(mean_n)
but the means computed by the first solution (the not-so-cool one) and by the second solution (the cool one-liner) are all different.
I checked what dataset[:, index_catego][dataset[:, 14] == 0] and dataset[:, index_catego][dataset[:, 14] == 1] select, and both seem correct (right dimensions and right elements).
UPDATE 2:
OK, the first solution is wrong because, for example, the first column contains only 0 and 1, yet its computed mean is > 1. I do not know where I went wrong. It seems that the positive class is correct (or at least plausible), while the negative class is not even plausible.
So, is the second solution correct? Is there a better way of doing it?
UPDATE 3:
I think I found the problem with the first solution: I am using Jupyter Notebook, and sometimes (not every time) when I rerun the cell containing the first solution, the elements in matrix_mean_positive and matrix_mean_negative are doubled. If someone knows why, could you point it out to me?
Now both solutions return the same means.
Do Kernel -> Restart in Jupyter Notebook to clear the memory before rerunning.
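There may also be a second cause, besides stale notebook state, for the inconsistent results of the first solution. np.empty returns uninitialized memory, so accumulating into sum_positive and sum_negative starts from arbitrary values; np.zeros avoids that. The sketch below uses the seven sample rows from the question and compares the masked one-liner with a loop that starts from zeros. The names sample, mask, mean_label1, mean_label0 and loop_mean_label1 are made up for illustration:

```python
import io

import numpy as np

# The seven sample rows posted in the question.
sample = """\
1,22.08,11.46,2,4,4,1.585,0,0,0,1,2,100,1213,0
0,22.67,7,2,8,4,0.165,0,0,0,0,2,160,1,0
0,29.58,1.75,1,4,4,1.25,0,0,0,1,2,280,1,0
0,21.67,11.5,1,5,3,0,1,1,11,1,2,0,1,1
1,20.17,8.17,2,6,4,1.96,1,1,14,0,2,60,159,1
0,15.83,0.585,2,8,8,1.5,1,1,2,0,2,100,1,1
1,17.42,6.5,2,3,4,0.125,0,0,0,0,2,60,101,0
"""
dataset = np.loadtxt(io.StringIO(sample), delimiter=",")
index_catego = [0, 3, 4, 5, 7, 8, 10, 11]

# Boolean mask over rows: True where column 14 equals 1.
mask = dataset[:, 14] == 1

# Select the rows first, then the columns, then reduce along axis 0.
selected = dataset[mask][:, index_catego]
mean_label1 = selected.mean(axis=0)
var_label1 = selected.var(axis=0)
mean_label0 = dataset[~mask][:, index_catego].mean(axis=0)

# Loop version for comparison; note np.zeros, not np.empty.
sums = np.zeros(len(index_catego))
count = 0
for row in dataset:
    if row[14] == 1:
        sums += row[index_catego]
        count += 1
loop_mean_label1 = sums / count

print(mean_label1)
print(loop_mean_label1)
```

On this sample the two approaches agree, which suggests the masked version from UPDATE 1 is a reasonable way to do it.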
This is a simple program, but I am finding it difficult to understand how it actually works.
I have a database of 3-tuples.
import matplotlib.pyplot as plt
queries = {}
rewrites = {}
urls = {}
for line in open("data.tsv"):
    q, r, u = line.strip().split("\t")
    queries.setdefault(q,0)
    queries[q] += 1
    rewrites.setdefault(r,0)
    rewrites[r] += 1
    urls.setdefault(u,0)
    urls[u] += 1
sQueries = []
sQueries = [x for x in rewrites.values()]
sQueries.sort()
x = range(len(sQueries))
line, = plt.plot(x, sQueries, '-' ,linewidth=2)
plt.show()
This is the whole program.
Now,
queries.setdefault(q,0)
This sets the value to 0 if the key q is not found.
queries[q] += 1
This increments the value stored under key q by 1 (the key is guaranteed to exist after setdefault).
We do the same for every element of the tuples.
Then,
sQueries = [x for x in rewrites.values()]
Here we store the values from the dictionary rewrites in the list sQueries.
x = range(len(sQueries))
I do not understand what this command does. Can anyone please explain?
len(sQueries)
gives the number of elements in your list sQueries
x = range(len(sQueries))
will create x containing the integers 0, 1, ... up to (but not including) the length of sQueries (a list in Python 2, a lazy range object in Python 3)
This:
sQueries = []
sQueries = [x for x in rewrites.values()]
sQueries.sort()
is an obtuse way of writing
sQueries = rewrites.values()
sQueries = sorted(sQueries)
in other words, sort the values of the rewrites dictionary. If, for the sake of argument, sQueries == [2, 3, 7, 9], then len(sQueries) == 4 and range(4) == [0, 1, 2, 3] (in Python 3, list(range(4)) == [0, 1, 2, 3]).
So, now you're plotting (0,2), (1,3), (2,7), (3,9), which doesn't seem very useful to me. It seems more likely that you would want the keys of rewrites on the x-axis, which would be the distinct values of r that you read from the TSV file.
length = len(sQueries) # this is length of sQueries
r = range(length) # this one means from 0 to length-1
so
x = range(len(sQueries)) # means x is from 0 to sQueries length - 1
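To see all of this without the plot, here is a small sketch with a made-up list of rewrite strings standing in for the r column of data.tsv. The sample values and the name pairs are assumptions for illustration only:

```python
# Hypothetical stand-in for the r column of data.tsv.
sample_rewrites = ["a", "b", "a", "c", "a", "b"]

rewrites = {}
for r in sample_rewrites:
    rewrites.setdefault(r, 0)  # insert the key with 0 only if it is missing
    rewrites[r] += 1           # then count this occurrence

sQueries = sorted(rewrites.values())  # the counts, smallest first
x = range(len(sQueries))              # positions 0 .. len(sQueries) - 1

# These are the (x, y) points that plt.plot(x, sQueries) would draw.
pairs = list(zip(x, sQueries))
print(rewrites)  # {'a': 3, 'b': 2, 'c': 1}
print(pairs)     # [(0, 1), (1, 2), (2, 3)]
```

So x only numbers the sorted counts; it carries no information about which key each count belongs to.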