Numpy, 1:M joins on Arrays - python

I was wondering if there is a way to join NumPy arrays.
Example:
array1 = [[1,c,d], [2,a,b], [3, e,f]]
array2 = [[2,g,g,t], [1,alpha, beta, gamma], [1,t,y,u], [3,dog, cat, fish]]
I need to join these arrays, but the NumPy documentation says that if the records are not unique, the functions will fail or return unknown results.
Does anyone have a sample that does a 1:M join instead of a 1:1 join on NumPy arrays? Also, I know my examples aren't in the proper NumPy format; they are just meant to give a general idea.

What you are trying to achieve looks more like building a new nested list from your two input arrays.
Treating them as lists:
list1 = [[1,'c','d'], [2,'a','b'], [3, 'e','f']]
list2 = [[2,'g','g','t'], [1,'alpha', 'beta', 'gamma'], [1,'t','y','u'], [3,'dog', 'cat', 'fish']]
You can build your desired result doing:
result = [i+j[1:] for i in list1 for j in list2 if i[0]==j[0]]
Which will look like this:
[[1, 'c', 'd', 'alpha', 'beta', 'gamma'],
[1, 'c', 'd', 't', 'y', 'u'],
[2, 'a', 'b', 'g', 'g', 't'],
[3, 'e', 'f', 'dog', 'cat', 'fish']]
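The comprehension above compares every row of list1 with every row of list2. For larger inputs, one possible alternative is to index list2 by its key first. This is only a minimal sketch, assuming the key is always the first element of each row; rows_by_key and joined are illustrative names, not part of the original answer:

```python
from collections import defaultdict

list1 = [[1, 'c', 'd'], [2, 'a', 'b'], [3, 'e', 'f']]
list2 = [[2, 'g', 'g', 't'], [1, 'alpha', 'beta', 'gamma'],
         [1, 't', 'y', 'u'], [3, 'dog', 'cat', 'fish']]

# Group the "many" side by its key (assumed to be element 0 of each row).
rows_by_key = defaultdict(list)
for row in list2:
    rows_by_key[row[0]].append(row[1:])

# Emit one joined row per match; keys without a match produce no rows.
joined = [left + right
          for left in list1
          for right in rows_by_key.get(left[0], [])]
print(joined)
```

Each row of list2 is visited once while building the index, so the work grows with the number of rows plus the number of matches rather than with the product of the two list lengths.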


Python indirect list indexing [duplicate]

In Python I have a list of elements aList and a list of indices myIndices. Is there any way I can retrieve all at once those items in aList having as indices the values in myIndices?
Example:
>>> aList = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> myIndices = [0, 3, 4]
>>> aList.A_FUNCTION(myIndices)
['a', 'd', 'e']
I don't know any method to do it. But you could use a list comprehension:
>>> [aList[i] for i in myIndices]
Definitely use a list comprehension, but here is a function that does it (list has no method for this). This is arguably a misuse of itemgetter; I have posted it just for the sake of knowledge.
>>> from operator import itemgetter
>>> a_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> my_indices = [0, 3, 4]
>>> itemgetter(*my_indices)(a_list)
('a', 'd', 'e')
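One reason to prefer the list comprehension may be that itemgetter changes its return type with the number of indices. A small sketch of that behaviour; the variable names several, single and empty_ok are only illustrative:

```python
from operator import itemgetter

a_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# Two or more indices: the result is a tuple.
several = itemgetter(0, 3, 4)(a_list)

# Exactly one index: the result is the bare item, not a 1-tuple.
single = itemgetter(*[3])(a_list)

# No indices at all: itemgetter() itself raises TypeError.
try:
    itemgetter(*[])(a_list)
    empty_ok = True
except TypeError:
    empty_ok = False

print(several, single, empty_ok)
```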
Indexing by lists can be done in numpy. Convert your base list to a numpy array and then apply another list as an index:
>>> from numpy import array
>>> array(aList)[myIndices]
array(['a', 'd', 'e'],
dtype='|S1')
If you need, convert back to a list at the end:
>>> from numpy import array
>>> a = array(aList)[myIndices]
>>> list(a)
['a', 'd', 'e']
In some cases this solution can be more convenient than list comprehension.
You could use map (wrap it in list() on Python 3, where map returns an iterator)
list(map(aList.__getitem__, myIndices))
or operator.itemgetter
import operator
f = operator.itemgetter(*myIndices)
f(aList)
If you do not require a list with simultaneous access to all elements, but just wish to use all the items in the sub-list iteratively (or pass them to something that will), it's more efficient to use a generator expression rather than a list comprehension:
(aList[i] for i in myIndices)
Alternatively, you could go with functional approach using map and a lambda function.
>>> list(map(lambda i: aList[i], myIndices))
['a', 'd', 'e']
I wasn't happy with these solutions, so I created a Flexlist class that simply extends the list class, and allows for flexible indexing by integer, slice or index-list:
class Flexlist(list):
    def __getitem__(self, keys):
        if isinstance(keys, (int, slice)): return list.__getitem__(self, keys)
        return [self[k] for k in keys]
Then, for your example, you could use it with:
aList = Flexlist(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
myIndices = [0, 3, 4]
vals = aList[myIndices]
print(vals) # ['a', 'd', 'e']

list splicing variable assignments automation

Imagine that you have a list of strings.
lst = ['a','b17','c','dz','e','ff','e3','e66']
You want to separate those strings into individual variables:
a = lst[:7]
b = lst[7:14]
c = lst[14:21]
I'm wondering if there is a Pythonic way of handling this instead of spending time typing out every single list slice.
You can use a generator expression to produce the slices and unpack them to your desired variables:
a, b, c = (lst[i:i+7] for i in range(0, 21, 7))
As written, this only covers the first 21 items, and using range(0, len(lst), 7) instead would raise a "too many values to unpack" error if the list has more than 21 items, so it's better to use a list comprehension and keep the result as a list of slices instead of individual variables:
[lst[i:i+7] for i in range(0, len(lst), 7)]
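The same idea can be wrapped in a small helper. This is a sketch under the assumption that a shorter final chunk is acceptable; the name chunks and the chunk size of 3 are chosen only for illustration:

```python
def chunks(seq, size):
    """Split seq into consecutive slices of length size; the last may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [seq[i:i + size] for i in range(0, len(seq), size)]

lst = ['a', 'b17', 'c', 'dz', 'e', 'ff', 'e3', 'e66']
parts = chunks(lst, 3)

# Star-unpacking collects any extra chunks instead of raising ValueError.
first, *rest = parts
print(first, rest)
```

If individual names are still wanted, the starred target keeps the unpacking from failing when the list is longer than expected.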
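Try this method:
def f(lst, n):
    l = []
    range_ = list(range(0, len(lst), n))
    # Pair each start index with the next one; the slice starting at the
    # last index (lst[range_[-1]:]) has no partner and is not included.
    for x, y in zip(range_, range_[1:]):
        l.append(lst[x:y])
    return l
print(f(lst, 7))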
Output with lst as:
lst = ['a','b17','c','dz','e','ff','e3','e66']*5
Is:
[['a', 'b17', 'c', 'dz', 'e', 'ff', 'e3'], ['e66', 'a', 'b17', 'c', 'dz', 'e', 'ff'], ['e3', 'e66', 'a', 'b17', 'c', 'dz', 'e'], ['ff', 'e3', 'e66', 'a', 'b17', 'c', 'dz'], ['e', 'ff', 'e3', 'e66', 'a', 'b17', 'c']]

Indexing failure/odd behaviour with array

I have some code that is intended to convert a 3-dimensional list to an array. Technically it works in that I get a 3-dimensional array, but indexing only works when I don't iterate across one of the dimensions, and doesn't work if I do.
Indexing works here:
listTempAllDays = []
for j in listGPSDays:
    listTempDay = []
    for i in listGPSDays[0]:
        arrayDay = np.array(i)
        listTempDay.append(arrayDay)
    arrayTemp = np.array(listTempDay)
    listTempAllDays.append(arrayTemp)
arrayGPSDays = np.array(listTempAllDays)
print(arrayGPSDays[0,0,0])
It doesn't work here:
listTempAllDays = []
for j in listGPSDays:
    listTempDay = []
    for i in j:
        arrayDay = np.array(i)
        listTempDay.append(arrayDay)
    arrayTemp = np.array(listTempDay)
    listTempAllDays.append(arrayTemp)
arrayGPSDays = np.array(listTempAllDays)
print(arrayGPSDays[0,0,0])
The difference between the two pieces of code is in the inner for loop. The first piece of code also works for all elements in listGPSDays (e.g. for i in listGPSDays[1]: etc...).
The code runs in the second case if I remove the final print call or change the final line to print(arrayGPSDays[0][0,0]).
In both cases checking the type at all levels returns <class 'numpy.ndarray'>.
I would like this array indexing to work, if possible - what am I missing?
The following is provided as example data:
Anonymised results from print(arrayGPSDays[0:2,0:2,0:2]), generated using the first piece of code (so that the indexing works! - but also resulting in arrayGPSDays[0] being the same as arrayGPSDays[1]):
[[['1' '2']
['3' '4']]
[['1' '2']
['3' '4']]]
numpy's array constructor can handle arbitrarily dimensioned iterables. The only stipulation is that they can't be jagged (i.e. each "row" in each dimension must have the same length).
Here's an example:
In [1]: list_3d = [[['a', 'b', 'c'], ['d', 'e', 'f']], [['g', 'h', 'i'], ['j', 'k', 'l']]]
In [2]: import numpy as np
In [3]: np.array(list_3d)
Out[3]:
array([[['a', 'b', 'c'],
['d', 'e', 'f']],
[['g', 'h', 'i'],
['j', 'k', 'l']]], dtype='<U1')
In [4]: array_3d = np.array(list_3d)
In [5]: array_3d[0,0,0]
Out[5]: 'a'
In [6]: array_3d.shape
Out[6]: (2, 2, 3)
If the array is jagged, numpy will "squash" down to the dimension where the jagged-ness happens. Since that explanation is clear as mud, an example might help:
In [20]: jagged_3d = [ [['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h'], ['i', 'j']] ]
In [21]: jagged_arr = np.array(jagged_3d)
In [22]: jagged_arr.shape
Out[22]: (2,)
In [23]: jagged_arr
Out[23]:
array([list([['a', 'b'], ['c', 'd']]),
list([['e', 'f'], ['g', 'h'], ['i', 'j']])], dtype=object)
The reason the constructor isn't giving you a 3-dimensional array out of the box is that your data is jagged. numpy does not support jagged arrays, because each numpy array has a well-defined shape giving the length of each dimension. If the items in a given dimension have different lengths, that abstraction falls apart, so numpy falls back to a lower-dimensional array of Python objects as shown above (recent NumPy versions raise a ValueError instead unless dtype=object is passed).
HTH.
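To find out whether a nested list is jagged before handing it to np.array, a plain-Python check along these lines may help. It is only a sketch: nested_shape is a hypothetical helper written for this illustration, not a NumPy function, and it assumes the data is built from Python lists only:

```python
def nested_shape(obj):
    """Return the shape of a nested list as a tuple, or None if it is jagged."""
    if not isinstance(obj, list):
        return ()          # a leaf (e.g. a string or number) adds no dimension
    child_shapes = [nested_shape(item) for item in obj]
    if any(shape is None for shape in child_shapes):
        return None        # jagged somewhere further down
    if len(set(child_shapes)) > 1:
        return None        # siblings disagree, so it is jagged at this level
    inner = child_shapes[0] if child_shapes else ()
    return (len(obj),) + inner

list_3d = [[['a', 'b', 'c'], ['d', 'e', 'f']],
           [['g', 'h', 'i'], ['j', 'k', 'l']]]
jagged_3d = [[['a', 'b'], ['c', 'd']],
             [['e', 'f'], ['g', 'h'], ['i', 'j']]]

print(nested_shape(list_3d))    # regular: a full shape tuple
print(nested_shape(jagged_3d))  # jagged: None
```

Running this on listGPSDays would show whether each day holds the same number of entries, which is the likely cause of the indexing failure.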
So Isaac, it seems your code misunderstands how the for loop works.
In your for statement, j is an ITEM of the list listGPSDays (I assume it is a list), not the item's INDEX, and you don't need to build a range over the list yourself; Python iterates over the items for you. Try:
for j in listGPSDays:
instead of
for j in range(len(listGPSDays)):
Also, since j is then already the inner list, change this line of code from:
for i in listGPSDays[j]:
to:
for i in j:
I think it will solve your problem, hope it works!

Is this correct use of flatten?

I am attempting to flatten a list using:
wd = ['this' , 'is']
np.asarray(list(map(lambda x : list(x) , wd))).flatten()
which returns:
array([['t', 'h', 'i', 's'], ['i', 's']], dtype=object)
when I'm expecting a char array: ['t','h','i','s','i','s']
Is this correct use of flatten?
No, this isn't a correct use for numpy.ndarray.flatten.
Two-dimensional NumPy arrays have to be rectangular or they will be cast to object arrays (or it will throw an exception). With object arrays flatten won't work correctly (because it won't flatten the "objects") and rectangular is impossible because your words have different lengths.
When dealing with strings (or arrays of strings), NumPy won't split them into characters at all, neither when you create the array nor when you try to "flatten" it:
>>> import numpy as np
>>> np.array(['fla', 'tten'])
array(['fla', 'tten'], dtype='<U4')
>>> np.array(['fla', 'tten']).flatten()
array(['fla', 'tten'], dtype='<U4')
Fortunately you can simply use "normal" Python features to flatten iterables, just to mention one example:
>>> wd = ['this' , 'is']
>>> [element for sequence in wd for element in sequence]
['t', 'h', 'i', 's', 'i', 's']
You might want to have a look at the following Q+A for more solutions and explanations:
Making a flat list out of list of lists in Python
Flatten (an irregular) list of lists
With just a list iteration:
[u for i in np.asarray(list(map(lambda x : list(x) , wd))) for u in i]
gives you this:
['t', 'h', 'i', 's', 'i', 's']
Although, as the comments say, you can just use ''.join() for your specific example, this has the advantage of working for numpy arrays and lists of lists:
test = np.array(range(10)).reshape(2,-1)
[u for i in test for u in i]
returns a flat list:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
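For completeness, the ''.join() shortcut mentioned above might look like the following minimal sketch. It only applies when every element of the input is a string; chars is just an illustrative name:

```python
wd = ['this', 'is']

# Concatenate the words, then split the result into single characters.
chars = list(''.join(wd))
print(chars)
```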
In[8]: from itertools import chain
In[9]: list(chain.from_iterable(['this' , 'is']))
Out[9]: ['t', 'h', 'i', 's', 'i', 's']

Python merging sublist

I've got the following list :
[['a','b','c'],['d','e'],['f','g','h','i','j']]
I would like a list like this :
['abc','de','fghij']
How is it possible?
[Edit] : in fact, my list could have strings and numbers,
l = [[1,2,3],[4,5,6], [7], [8,'a']]
and would be :
l = [123,456, 7, 8a]
thx to all,
You can apply the ''.join method to each sublist.
This can be done using either the map function or a list comprehension.
The map function applies the function passed as its first argument to every element of an iterable (on Python 3, wrap it in list() to get a list back):
initial = [['a', 'b', 'c'], ['d', 'e'], ['f', 'g', 'h', 'i', 'j']]
result = list(map(''.join, initial))
Alternatively, use a list comprehension:
initial = [['a', 'b', 'c'], ['d', 'e'], ['f', 'g', 'h', 'i', 'j']]
result = [''.join(sublist) for sublist in initial]
Try
>>> L = [['a','b','c'],['d','e'],['f','g','h','i','j']]
>>> [''.join(x) for x in L]
['abc', 'de', 'fghij']
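The answers above assume every element is a string; ''.join raises a TypeError on the numbers from the edited question. A possible adjustment, as a sketch only, is to convert each element with str first. Note that the results are then strings ('123'), not integers; merged is an illustrative name:

```python
l = [[1, 2, 3], [4, 5, 6], [7], [8, 'a']]

# Convert every element to str before joining, so ints and strings can mix.
merged = [''.join(map(str, sub)) for sub in l]
print(merged)
```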
