I have a list and want to add each element to the one before it (index - 1), for the whole list:
list = [-2, -2, -1, 1, -1, 1, 3, 5, 6, -2, -1, 0, -2, -1, -2, 2]
Expected output:
new_list =[-2,-4,-3, 0, 0, 0, 4, 8, 11, 4, -3, -1, -2, -3, -3, 0]
new_list[0] = 0+ list[0] = 0+ (-2) = -2
new_list[1] = list[0] + list[1] = (-2) + (-2) = -4
new_list[2] = list[1] + list[2] = (-2)+ (-1) = -3
new_list[3] = list[2] + list[3] = (-1)+ (1) = 0
Basically new_list[index] = list[index -1] + list[index]
list1 = [-2, -2, -1, 1, -1, 1, 3, 5, 6, -2, -1, 0, -2, -1, -2, 2]
new_list=[list1[0]]
for i in range(len(list1)-1):
    value=list1[i]+list1[i+1]
    new_list.append(value)
print(new_list)
Output:[-2,-4,-3, 0, 0, 0, 4, 8, 11, 4, -3, -1, -2, -3, -3, 0]
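The same pairing can also be written without index arithmetic. The following is a minimal sketch rather than part of the answer above; add_previous is a hypothetical helper name, and zip is used to pair each element with its predecessor:

```python
def add_previous(values):
    """Return [values[0], values[0] + values[1], values[1] + values[2], ...]."""
    if not values:
        return []
    # zip(values, values[1:]) yields (previous, current) pairs
    return [values[0]] + [prev + cur for prev, cur in zip(values, values[1:])]


list1 = [-2, -2, -1, 1, -1, 1, 3, 5, 6, -2, -1, 0, -2, -1, -2, 2]
print(add_previous(list1))
```

The first element is kept as-is because it has no predecessor, which matches the "0 + list[0]" rule in the question.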
You have to iterate over the list and add each element to the one before it, like so:
values = [-2, -2, -1, 1, -1, 1, 3, 5, 6, -2, -1, 0, -2, -1, -2, 2]
new_list = [values[0]] # We just take the first element of the list, because we don't add anything
for number, element in enumerate(values[1:]):
    # enumerate starts at 0 while the slice starts at index 1, so values[number] is the previous element
    new_list.append(element + values[number])
Or a more Pythonic way:
new_list = [values[0]] + [element + values[number] for number, element in enumerate(values[1:])]
If I understand your requirement correctly, you can do this quite easily with pandas. For example:
import pandas as pd
# Create a pandas Series of values
s = pd.Series([-2, -2, -1, 1, -1, 1, 3, 5, 6, -2, -1, 0, -2, -1, -2, 2])
# Add the current value in the series to the 'shifted' (previous) value.
output = s.add(s.shift(1), fill_value=0).tolist()
# Display the output.
print(output)
Output:
[-2.0, -4.0, -3.0, 0.0, 0.0, 0.0, 4.0, 8.0, 11.0, 4.0, -3.0, -1.0, -2.0, -3.0, -3.0, 0.0]
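If integer output is preferred, one possible variant is sketched below. It assumes a pandas version in which Series.shift accepts fill_value (0.24 or later), so the shifted-in slot is filled with 0 instead of NaN and the integer dtype is kept:

```python
import pandas as pd

s = pd.Series([-2, -2, -1, 1, -1, 1, 3, 5, 6, -2, -1, 0, -2, -1, -2, 2])
# shift(1, fill_value=0) moves every value one position down and puts 0 in front
output = (s + s.shift(1, fill_value=0)).tolist()
print(output)
```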
>>> list = [-2, -2, -1, 1, -1, 1, 3, 5, 6, -2, -1, 0, -2, -1, -2, 2]
>>> list_length = len(list)
>>> result_list = [list[0]]
>>> for i in range(list_length):
...     if not (i+1) == list_length:
...         result_list.append(list[i] + list[i+1])
...
>>> result_list
[-2, -4, -3, 0, 0, 0, 4, 8, 11, 4, -3, -1, -2, -3, -3, 0]
The above solves your question.
I need to check whether the number of identical consecutive values stays within a certain threshold, e.g. at most two identical consecutive numbers.
pd.Series(data=[-1, -1, 2, -2, 2, -2, 1, 1]) # True
pd.Series(data=[-1, -1, -1, 2, 2, -2, 1, 1]) # False
Further checks:
Only the numbers +1 and -1 are allowed to repeat consecutively, and at most twice in a row.
pd.Series(data=[-1, 1, -2, 2, -2, 2, -1, 1]) # True
pd.Series(data=[1, 1, -2, 2, -2, 2, -1, 1]) # True
pd.Series(data=[-1, -1, 2, 2, -2, 1, 1, -2]) # False
pd.Series(data=[-1, 1, -2, -2, 1, -1, 2, -2]) # False
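You can use the shift method along with Boolean masks to achieve this. The idea is to compare each element with the previous one and return False if any value occurs three or more times in a row, or if a repeated value is anything other than +1 or -1.
Here's an example implementation:
import pandas as pd

def check_consecutive(series):
    same_as_prev = series == series.shift()                    # equals its predecessor
    run_of_three = same_as_prev & (series == series.shift(2))  # three or more in a row
    allowed = series.isin([1, -1])                             # only +1 and -1 may repeat
    return not (run_of_three.any() or (same_as_prev & ~allowed).any())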
print(check_consecutive(pd.Series(data=[-1, -1, 2, -2, 2, -2, 1, 1]))) # True
print(check_consecutive(pd.Series(data=[-1, -1, -1, 2, 2, -2, 1, 1]))) # False
print(check_consecutive(pd.Series(data=[-1, 1, -2, 2, -2, 2, -1, 1]))) # True
print(check_consecutive(pd.Series(data=[1, 1, -2, 2, -2, 2, -1, 1]))) # True
print(check_consecutive(pd.Series(data=[-1, -1, 2, 2, -2, 1, 1, -2]))) # False
print(check_consecutive(pd.Series(data=[-1, 1, -2, -2, 1, -1, 2, -2]))) # False
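For comparison, the same rule can be sketched without pandas by measuring run lengths directly. This is an illustrative alternative, not the method used above; runs_ok and its parameters max_run and repeatable are hypothetical names, and the rule encoded is one reading of the examples in the question (no run longer than two, and only +1 or -1 may repeat):

```python
from itertools import groupby


def runs_ok(values, max_run=2, repeatable=(1, -1)):
    """Return True if no run exceeds max_run and only `repeatable` values repeat."""
    for value, group in groupby(values):
        run_length = sum(1 for _ in group)
        if run_length > max_run:
            return False
        if run_length > 1 and value not in repeatable:
            return False
    return True


print(runs_ok([-1, -1, 2, -2, 2, -2, 1, 1]))   # True
print(runs_ok([-1, -1, -1, 2, 2, -2, 1, 1]))   # False
```

groupby collapses each stretch of equal neighbours into one group, so every run is inspected exactly once.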
I have the following example array:
[[ 3, 5, 6],
[ 4, -1, -1],
[ 5, 7, -1],
[ 1, 6, -1],
[ 1, 0, 6],
[ 3, 4, 8],
[ 2, 3, 5],
[ 2, -1, -1],
[ 0, 4, 5],
[ 0, 5, -1]]
I am trying to:
merge two of the elements into one (in this case elements idx 2 and 4)
delete duplicates in the merged element (if any)
move any (-1) to the end of the element
pad the rest of the elements with (-1) to preserve the rectangular shape of the array
move merged element to position 0, like in the example below:
[[ 5, 7, 1, 0, 6, -1],
[ 3, 5, 6, -1, -1, -1],
[ 4, -1, -1, -1, -1, -1],
[ 1, 6, -1, -1, -1, -1],
[ 3, 4, 8, -1, -1, -1],
[ 2, 3, 5, -1, -1, -1],
[ 2, -1, -1, -1, -1, -1],
[ 0, 4, 5, -1, -1, -1],
[ 0, 5, -1, -1, -1, -1]]
Please suggest a possible solution.
I'm not sure why people are saying your question is unclear (it's quite clear IMO), but here is a function that can merge any of the two elements like you specified:
import pprint
def move_n1_to_end(row): #function that moves all -1's to the end
    n1_count = 0
    for i in row:
        if i==-1:
            n1_count += 1
    new_row = [e for e in row if e != -1]
    new_row.extend([-1] * n1_count)
    return new_row

def merge_move(arr, idx1, idx2):
    merged_list = [*dict.fromkeys(arr[idx1] + arr[idx2])] #merge and remove duplicates
    merged_list = move_n1_to_end(merged_list) #move -1's to the end
    #removes the two lists that were merged
    arr = [arr[i] for i in range(len(arr)) if i not in [idx1, idx2]]
    #make a copy of the merged list (essentially moves it to the front of the new matrix)
    new_arr = [merged_list[:]]
    #find the maximum length of all the rows
    max_length = max(len(e) for e in arr + new_arr)
    for i in range(len(arr)): #for each index of the original matrix...
        #set row to the i'th row
        row = arr[i]
        #move all -1's to the end
        row = move_n1_to_end(row)
        #pad the row with -1's to make the matrix rectangular and append it to the new matrix
        new_arr.append(row + [-1] * (max_length-len(row)))
    return new_arr #return the resulting matrix

test_matrix = [[ 3, 5, 6],
[ 4, -1, -1],
[ 5, 7, -1],
[ 1, 6, -1],
[ 1, 0, 6],
[ 3, 4, 8],
[ 2, 3, 5],
[ 2, -1, -1],
[ 0, 4, 5],
[ 0, 5, -1]]
pprint.pprint(merge_move(test_matrix, 2, 4))
"""
Output:
[[5, 7, 1, 0, 6, -1],
[3, 5, 6, -1, -1, -1],
[4, -1, -1, -1, -1, -1],
[1, 6, -1, -1, -1, -1],
[3, 4, 8, -1, -1, -1],
[2, 3, 5, -1, -1, -1],
[2, -1, -1, -1, -1, -1],
[0, 4, 5, -1, -1, -1],
[0, 5, -1, -1, -1, -1]]
"""
I was checking out Simhash module ( https://github.com/leonsim/simhash ).
I presume that Simhash("String").distance(Simhash("Another string")) is the Hamming distance between the two strings. However, I am not sure I fully understand the get_features(string) method, as shown in (https://leons.im/posts/a-python-implementation-of-simhash-algorithm/).
import re

def get_features(s):
    width = 2
    s = s.lower()
    s = re.sub(r'[^\w]+', '', s)
    return [s[i:i + width] for i in range(max(len(s) - width + 1, 1))]
Now, when I try to compute distance between "aaaa" and "aaas" using the width 2, it gives out the distance as 0.
from simhash import Simhash
Simhash(get_features("aaas")).distance(Simhash(get_features("aaaa")))
I am not sure what I am missing here.
Dig into code
In your case the width is the key parameter of get_features(): it determines how the string is split into features. Here get_features() outputs:
['aa', 'aa', 'aa']
['aa', 'aa', 'as']
Simhash then treats these lists as unweighted features (each feature gets the default weight of 1) and outputs:
86f24ba207a4912
86f24ba207a4912
They are the same!
The reason lies in the simhash algorithm itself. Let's look at the code:
def build_by_features(self, features):
    """
    `features` might be a list of unweighted tokens (a weight of 1
    will be assumed), a list of (token, weight) tuples or
    a token -> weight dict.
    """
    v = [0] * self.f
    masks = [1 << i for i in range(self.f)]
    if isinstance(features, dict):
        features = features.items()
    for f in features:
        if isinstance(f, basestring):
            h = self.hashfunc(f.encode('utf-8'))
            w = 1
        else:
            assert isinstance(f, collections.Iterable)
            h = self.hashfunc(f[0].encode('utf-8'))
            w = f[1]
        for i in range(self.f):
            v[i] += w if h & masks[i] else -w
    ans = 0
    for i in range(self.f):
        if v[i] >= 0:
            ans |= masks[i]
    self.value = ans
from: leonsim/simhash
The calculation process can be divided into 4 steps:
1) hash each split word (feature), to transform the string into a binary number;
2) weight them;
3) assemble the weighted bits together;
4) convert the assembled numbers into bits and output them as the value.
Now, in your case, step 3 outputs:
[-3, 3, -3, -3, 3, -3, -3, -3, 3, -3, -3, 3, -3, -3, 3, -3, -3, 3, -3, 3, 3, 3, 3, -3, -3, -3, -3, -3, -3, 3, -3, -3, -3, 3, -3, 3, 3, 3, -3, 3, -3, -3, 3, -3, -3, 3, -3, -3, 3, 3, 3, 3, -3, 3, 3, -3, -3, -3, -3, 3, -3, -3, -3, -3]
[-1, 3, -3, -1, 3, -3, -3, -1, 3, -3, -3, 1, -1, -1, 1, -3, -3, 3, -1, 3, 1, 3, 1, -3, -1, -3, -3, -1, -1, 3, -1, -1, -1, 3, -1, 1, 3, 1, -1, 1, -3, -3, 1, -1, -3, 3, -3, -1, 1, 3, 3, 3, -3, 3, 3, -3, -1, -1, -1, 1, -3, -3, -3, -1]
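And after step 4, the two produce the same value: only the sign of each entry matters, and because 'aa' appears twice while 'as' appears only once, every entry in the second list keeps the sign it has in the first.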
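To see the voting effect in isolation, here is a minimal self-contained sketch of the four steps. It is not the library's implementation: toy_simhash and hamming are hypothetical names, the width parameter on get_features is added for illustration, and the low 64 bits of an MD5 digest are simply an assumed stand-in hash.

```python
import hashlib
import re


def get_features(s, width=2):
    s = re.sub(r'[^\w]+', '', s.lower())
    return [s[i:i + width] for i in range(max(len(s) - width + 1, 1))]


def toy_simhash(features, bits=64):
    """Sign-voting fingerprint over unweighted features (weight 1 each)."""
    v = [0] * bits
    for feature in features:
        # Stand-in hash (an assumption of this sketch): MD5 digest as an integer
        h = int(hashlib.md5(feature.encode('utf-8')).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if h & (1 << i) else -1
    # A bit is set when the votes for it are non-negative
    return sum(1 << i for i in range(bits) if v[i] >= 0)


def hamming(a, b):
    return bin(a ^ b).count('1')


same = hamming(toy_simhash(get_features('aaaa')), toy_simhash(get_features('aaas')))
print(same)  # 0: 'aa' outvotes 'as' two to one in every bit position
```

The result does not depend on the hash function chosen: with features ['aa', 'aa', 'as'] each vote total is plus or minus 2 from 'aa' plus or minus 1 from 'as', so its sign always follows 'aa'.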
Other parameter
If you change the width from 2 to 3 or 4, Simhash(get_features()) will generally give different values for the two strings. A width of 1 would not help here, because 'a' still outvotes 's' three to one.
Your case shows a limitation of simhash on short texts.
I have an array a in Python, let's say a=np.array([3, 4]), and would like to define an ndarray (or something like that) of type [-3:3, -4:4], in other words, a collection x of real numbers x[-3,-4], x[-3,-3],...,x[3,4], the i'th coordinate ranging over integers between -a[i] and a[i]. If the array length is given (2 in this example), I could use
np.mgrid[-a[0]:a[0]:1.0,-a[1]:a[1]:1.0][0].
But what should I do if the length of a is unknown?
You could generate a list of ranges with
[np.arange(-x,x+1) for x in a]
I'd have to play around with mgrid, or another function in index_tricks, to figure out how to use it. I may have to make it a tuple or pass it with a *.
mgrid wants slices, so this would replicate your first call
In [60]: np.mgrid[[slice(-x,x+1) for x in [3,4]]]
Out[60]:
array([[[-3, -3, -3, -3, -3, -3, -3, -3, -3],
[-2, -2, -2, -2, -2, -2, -2, -2, -2],
[-1, -1, -1, -1, -1, -1, -1, -1, -1],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 1, 1, 1, 1, 1, 1, 1, 1, 1],
[ 2, 2, 2, 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3, 3, 3, 3]],
[[-4, -3, -2, -1, 0, 1, 2, 3, 4],
[-4, -3, -2, -1, 0, 1, 2, 3, 4],
[-4, -3, -2, -1, 0, 1, 2, 3, 4],
[-4, -3, -2, -1, 0, 1, 2, 3, 4],
[-4, -3, -2, -1, 0, 1, 2, 3, 4],
[-4, -3, -2, -1, 0, 1, 2, 3, 4],
[-4, -3, -2, -1, 0, 1, 2, 3, 4]]])
which of course can be generalized to use a.
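One way that generalization might look, as a sketch: build the slices from a and pass them to mgrid as a tuple. The offset-based indexing into x at the end is an assumption about how the grid would be used, not something stated in the question.

```python
import numpy as np

a = np.array([3, 4])  # works for any length of a
# One slice per axis; int() keeps the slice bounds plain Python integers
grid = np.mgrid[tuple(slice(-int(n), int(n) + 1) for n in a)]
print(grid.shape)  # (2, 7, 9)

# Storage for x[-3, -4] ... x[3, 4]; shift indices by a to address it
x = np.zeros(grid.shape[1:])
i, j = -3, 4
x[i + a[0], j + a[1]] = 1.0  # plays the role of x[-3, 4]
```

Here grid[k] holds the k'th coordinate of every grid point, and x has one entry per point.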
My initial arange approach works with meshgrid (producing a list of arrays):
In [71]: np.meshgrid(*[np.arange(-x,x+1) for x in [3,4]],indexing='ij')
Out[71]:
[array([[-3, -3, -3, -3, -3, -3, -3, -3, -3],
[-2, -2, -2, -2, -2, -2, -2, -2, -2],
...
[-4, -3, -2, -1, 0, 1, 2, 3, 4],
[-4, -3, -2, -1, 0, 1, 2, 3, 4],
[-4, -3, -2, -1, 0, 1, 2, 3, 4]])]
I need to find the indices of both the zero and nonzero elements of an array.
Put another way, I want to find the complementary indices from numpy.nonzero().
The way that I know to do this is as follows:
indices_zero = numpy.nonzero(array == 0)
indices_nonzero = numpy.nonzero(array != 0)
This however means searching the array twice, which for large arrays is not efficient. Is there an efficient way to do this using numpy?
Assuming you can build the index range with numpy.arange(len(array)), just compute and store the logical (boolean) indices:
bindices_zero = (array == 0)
then when you actually need the integer indices you can do
indices_zero = numpy.arange(len(array))[bindices_zero]
or
indices_nonzero = numpy.arange(len(array))[~bindices_zero]
You can use boolean indexing:
In [82]: a = np.random.randint(-5, 5, 100)
In [83]: a
Out[83]:
array([-2, -1, 4, -3, 1, -2, 2, -1, 2, -1, -3, 3, -3, -4, 1, 2, 1,
3, 3, 0, 1, -3, -4, 3, -5, -1, 3, 2, 3, 0, -5, 4, 3, -5,
-3, 1, -1, 0, -4, 0, 1, -5, -5, -1, 3, -2, -5, -5, 1, 0, -1,
1, 1, -1, -2, -2, 1, 1, -4, -4, 1, -3, -3, -5, 3, 0, -5, -2,
-2, 4, 1, -4, -5, -1, 3, -3, 2, 4, -4, 4, 2, -2, -4, 3, 4,
-2, -4, 2, -4, -1, 0, -3, -1, 2, 3, 1, 1, 2, 1, 4])
In [84]: mask = a != 0
In [85]: a[mask]
Out[85]:
array([-2, -1, 4, -3, 1, -2, 2, -1, 2, -1, -3, 3, -3, -4, 1, 2, 1,
3, 3, 1, -3, -4, 3, -5, -1, 3, 2, 3, -5, 4, 3, -5, -3, 1,
-1, -4, 1, -5, -5, -1, 3, -2, -5, -5, 1, -1, 1, 1, -1, -2, -2,
1, 1, -4, -4, 1, -3, -3, -5, 3, -5, -2, -2, 4, 1, -4, -5, -1,
3, -3, 2, 4, -4, 4, 2, -2, -4, 3, 4, -2, -4, 2, -4, -1, -3,
-1, 2, 3, 1, 1, 2, 1, 4])
In [86]: a[~mask]
Out[86]: array([0, 0, 0, 0, 0, 0, 0])
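If integer indices are still needed, one option is to compute the boolean mask once and derive both index sets from it with np.flatnonzero, so the comparison against the array itself happens only once. This is a sketch for a 1-D array, and the sample data is made up:

```python
import numpy as np

array = np.array([3, 0, -1, 0, 5])
mask = array == 0                         # the only comparison against the data
indices_zero = np.flatnonzero(mask)       # positions where mask is True
indices_nonzero = np.flatnonzero(~mask)   # positions where mask is False
print(indices_zero, indices_nonzero)      # [1 3] [0 2 4]
```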
I'm not sure about a built-in numpy method for accomplishing this, but you could use an old-fashioned for loop, I believe. Something like:
indices_zero = []
indices_nonzero = []
for index in range(len(array)):
    if array[index] == 0:
        indices_zero.append(index)
    else:
        indices_nonzero.append(index)
Something like this should accomplish what you want, by only looping once.