numpy, merge two arrays of different shapes - python

For two arrays a and b,
a = np.array([[1],[2],[3],[4]])
b = np.array(['a', 'b', 'c', 'd'])
I want to generate the following array
c = np.array([[1, 'a'], [2, 'b'], [3, 'c'], [4, 'd']])
Is there a way to do this efficiently?

You need:
import numpy as np
a = np.array([[1],[2],[3],[4]])
b = np.array(['a', 'b', 'c', 'd'])
print(np.array(list(zip(np.concatenate(a), b))))
Output (np.array converts everything to a common string dtype, so the numbers become strings):
[['1' 'a']
 ['2' 'b']
 ['3' 'c']
 ['4' 'd']]
Alternate Solution
print(np.stack((np.concatenate(a), b), axis=1))
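Both of these give a string array, so the numbers come back as '1', '2', and so on. If you need to keep them as integers, one option is an object array, which stores ordinary Python objects but gives up NumPy's fast numeric operations. This is a sketch that goes beyond the original answers:

```python
import numpy as np

a = np.array([[1], [2], [3], [4]])
b = np.array(['a', 'b', 'c', 'd'])

# Cast both inputs to dtype=object so NumPy does not promote the ints to strings.
# np.column_stack turns the 1-D b into a (4, 1) column and joins it to a's (4, 1) column.
c = np.column_stack((a.astype(object), b.astype(object)))
print(c)
```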

Solution
>>> import numpy as np
>>> a = np.array([[1],[2],[3],[4]])
>>> b = np.array(['a', 'b', 'c', 'd'])
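# a has shape (4, 1), so each a[i] is a one-element array rather than a number, and the result is an object array
# (NumPy 1.24 and later raise a ValueError for this ragged input unless dtype=object is passed)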
>>> np.array([[a[i],b[i]] for i in range(a.shape[0])])
array([[array([1]), 'a'],
[array([2]), 'b'],
[array([3]), 'c'],
[array([4]), 'd']], dtype=object)
# You want this
>>> np.array([[a[i][0],b[i]] for i in range(a.shape[0])])
array([['1', 'a'],
['2', 'b'],
['3', 'c'],
['4', 'd']], dtype='<U11')
>>>
Note:
You may want to reshape your 'a' array.
>>> a.shape
(4, 1)
>>> a
array([[1],
[2],
[3],
[4]])
Reshaping it to 1-D makes it easier to work with next time:
>>> a.reshape(4)
array([1, 2, 3, 4])

You can do:
c = np.vstack((a.flatten(), b)).T
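A structured array is another option if each column should keep its own type and be accessible by name. This sketch is not from the original answers, and the field names num and char are made up for illustration. The result is 1-D with shape (4,), and each element is a record with two fields:

```python
import numpy as np

a = np.array([[1], [2], [3], [4]])
b = np.array(['a', 'b', 'c', 'd'])

# Hypothetical field names: 'num' holds 64-bit ints, 'char' holds 1-character strings
c = np.empty(len(b), dtype=[('num', 'i8'), ('char', 'U1')])
c['num'] = a.ravel()   # flatten the (4, 1) column to shape (4,)
c['char'] = b
print(c)
```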

Related

transform 1D array of lists to 2D array

For example
import pandas as pd
d1 = pd.Series(['a b', 'c d'])
t1 = d1.str.split()
a1 = t1.values
where a1 would be
array([list(['a', 'b']), list(['c', 'd'])], dtype=object)
How can I transform it to
array([['a', 'b'],
['c', 'd']], dtype='<U1')
Use np.stack on t1:
In [186]: np.stack(t1)
Out[186]:
array([['a', 'b'],
['c', 'd']], dtype='<U1')
Or call np.array on t1.tolist():
In [187]: np.array(t1.tolist())
Out[187]:
array([['a', 'b'],
['c', 'd']], dtype='<U1')
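You can try this without pandas by building the same kind of 1-D object array of lists by hand. The sketch assumes every inner list has the same length. If the lengths differ, np.stack raises a ValueError:

```python
import numpy as np

# Recreate a1: a 1-D object array whose elements are Python lists
a1 = np.empty(2, dtype=object)
a1[0] = ['a', 'b']
a1[1] = ['c', 'd']

# np.stack iterates over a1, converts each list to an array, and stacks them as rows
result = np.stack(a1)
print(result)
print(result.dtype)
```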

Look up values in an array

Suppose I have two datasets
DS1
ArrayCol
[1,2,3,4]
[1,2,3]
DS2
Key Name
1 A
2 B
3 C
4 D
How can I look up the values in the array and map them to "Name", so that I get another dataset like the following?
DS3
ColNew
[A,B,C,D]
[A,B,C]
Thanks. It's in Databricks, so any method is fine: Python, SQL, Scala...
You can try this:
ds1 = [[1, 2, 3, 4], [1, 2, 3]]
ds2 = {1: 'A', 2: 'B', 3: 'C', 4: 'D'}
new_data = [[ds2[cell] for cell in col] for col in ds1]
print(new_data)
output:
[['A', 'B', 'C', 'D'], ['A', 'B', 'C']]
Hope that helps. :)
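If some keys in DS1 might be missing from DS2, ds2[cell] raises a KeyError. A variant using dict.get returns a placeholder instead. The extra row [2, 5] and the None placeholder here are illustrative assumptions and do not come from the question:

```python
ds1 = [[1, 2, 3, 4], [1, 2, 3], [2, 5]]   # [2, 5] is a hypothetical row with an unknown key
ds2 = {1: 'A', 2: 'B', 3: 'C', 4: 'D'}

# dict.get returns the fallback (None here) instead of raising KeyError for unknown keys
new_data = [[ds2.get(cell, None) for cell in row] for row in ds1]
print(new_data)
```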
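Let's assume your datasets are in text files, with ds2.txt tab-separated. Then you can do something like this, making use of a dict:
with open("ds1.txt") as f1, open("ds2.txt") as f2:
    f = f1.readlines()
    g = f2.readlines()
u = dict(item.rstrip().split("\t") for item in g)  # key -> name, both as strings
for i in f:
    i = i.rstrip().strip('][').split(',')  # "[1,2,3]" -> ['1', '2', '3']
    print([u[col.strip()] for col in i])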
Output
['A', 'B', 'C', 'D']
['A', 'B', 'C']
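The same idea can be written with the csv module. To keep this sketch self-contained, io.StringIO stands in for the two files. The file layout is an assumption: a tab-separated Key/Name file with a header row, and one bracketed list per line in ds1:

```python
import csv
import io

# Stand-ins for the hypothetical files ds2.txt and ds1.txt
ds2_file = io.StringIO("Key\tName\n1\tA\n2\tB\n3\tC\n4\tD\n")
ds1_file = io.StringIO("[1,2,3,4]\n[1,2,3]\n")

reader = csv.reader(ds2_file, delimiter="\t")
next(reader)  # skip the header row
lookup = {key: name for key, name in reader}

results = []
for line in ds1_file:
    # "[1,2,3,4]" -> ["1", "2", "3", "4"]; keys stay strings to match the lookup dict
    keys = line.strip().strip("[]").split(",")
    results.append([lookup[k.strip()] for k in keys])

for row in results:
    print(row)
```

With real files, replace the StringIO objects with open("ds2.txt", newline="") and open("ds1.txt").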

How to generate a transpose-like matrix without using any built-in functions and without using loops?

It is not exactly a matrix transpose. I'm using Python with NumPy and trying matrix transformations, but I can't do it without loops. Is there a solution that uses only matrix operations or vectorized functions?
For example, turn this:
[image: input matrix]
Into this:
[image: desired output matrix]
Looks like you want to rotate this 180 degrees then transpose. How about:
>>> import numpy as np
>>> x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> x
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> np.rot90(x, 2).T
array([[9, 6, 3],
       [8, 5, 2],
       [7, 4, 1]])
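You can get the same result with slicing and the .T attribute alone, without calling any functions on the data. Reversing both axes with [::-1, ::-1] is the 180-degree rotation. Unlike the fancy-indexing approaches, slicing returns a view rather than a copy. A quick check, redefining x so the snippet runs on its own:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# [::-1, ::-1] reverses rows and columns (a 180-degree rotation); .T then transposes
y = x[::-1, ::-1].T
print(y)
```

This also works for non-square arrays, since neither slicing nor .T cares about the shape.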
Here is a way that only uses indexing:
>>> import numpy as np
>>> a = np.array(['abcdefghi']).view('U1').reshape(3, 3)
>>> a
array([['a', 'b', 'c'],
['d', 'e', 'f'],
['g', 'h', 'i']], dtype='<U1')
>>>
>>> a[[2,1,0],[[2],[1],[0]]]
array([['i', 'f', 'c'],
['h', 'e', 'b'],
['g', 'd', 'a']], dtype='<U1')
If you do not want to hardcode the indices you'll have to use some kind of builtin. Either Python builtins:
>>> a[list(reversed(range(3))), list(zip(reversed(range(3))))]
array([['i', 'f', 'c'],
['h', 'e', 'b'],
['g', 'd', 'a']], dtype='<U1')
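or numpy (wrap the index list in tuple(), since NumPy 1.23 and later no longer treat a plain list of index arrays as a multidimensional index):
>>> a[tuple(np.ogrid[2:-1:-1, 2:-1:-1][::-1])]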
array([['i', 'f', 'c'],
['h', 'e', 'b'],
['g', 'd', 'a']], dtype='<U1')
Note that all these methods make a copy (a non-lazy transpose), so the resulting array is C-contiguous, whereas a.T is only a view.
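You can generalize the hard-coded index lists to any 2-D shape with np.arange. In this sketch, anti_transpose is a hypothetical helper name. It builds a reversed row index and a reversed column index of shape (n_cols, 1), then lets broadcasting combine them, so non-square arrays work too:

```python
import numpy as np

def anti_transpose(a):
    """Mirror a 2-D array across its anti-diagonal using fancy indexing (returns a copy)."""
    n_rows, n_cols = a.shape
    rows = np.arange(n_rows)[::-1]            # shape (n_rows,):   n_rows-1, ..., 0
    cols = np.arange(n_cols)[::-1][:, None]   # shape (n_cols, 1): n_cols-1, ..., 0
    # Broadcasting gives result[i, j] = a[n_rows-1-j, n_cols-1-i], shape (n_cols, n_rows)
    return a[rows, cols]

x = np.arange(6).reshape(2, 3)
print(anti_transpose(x))
```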

Appending to a list of lists sequentially

I have two lists of lists:
my_list = [[1,2,3,4], [5,6,7,8]]
my_list2 = [['a', 'b', 'c'], ['d', 'e', 'f']]
I want my output to look like this:
my_list = [[1,2,3,4,'a','b','c'], [5,6,7,8,'d','e','f']]
I wrote the following code to do this but I end up getting more lists in my result.
my_list = map(list, (zip(my_list, my_list2)))
this produces the result as:
[[[1, 2, 3, 4], ['a', 'b', 'c']], [[5, 6, 7, 8], ['d', 'e', 'f']]]
Is there a way I can remove the redundant lists?
Thanks
Using zip is the right approach. You just need to concatenate the two lists in each tuple that zip produces.
>>> my_list = [[1,2,3,4], [5,6,7,8]]
>>> my_list2 = [['a', 'b', 'c'], ['d', 'e', 'f']]
>>> [x+y for x,y in zip(my_list, my_list2)]
[[1, 2, 3, 4, 'a', 'b', 'c'], [5, 6, 7, 8, 'd', 'e', 'f']]
You can use zip in a list comprehension:
my_list = [[1,2,3,4], [5,6,7,8]]
my_list2 = [['a', 'b', 'c'], ['d', 'e', 'f']]
new_list = [i+b for i, b in zip(my_list, my_list2)]
As an alternative, you can also use map with sum and a lambda function to achieve this, although the list comprehension approach in the other answers is better. In Python 3, map returns an iterator, so wrap it in list():
>>> list(map(lambda x: sum(x, []), zip(my_list, my_list2)))
[[1, 2, 3, 4, 'a', 'b', 'c'], [5, 6, 7, 8, 'd', 'e', 'f']]
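The answers above build a new list. If you want to append to the existing inner lists of my_list in place instead, as the title suggests, here is a sketch using list.extend:

```python
my_list = [[1, 2, 3, 4], [5, 6, 7, 8]]
my_list2 = [['a', 'b', 'c'], ['d', 'e', 'f']]

# Pair each inner list with its counterpart and extend it in place
for row, extra in zip(my_list, my_list2):
    row.extend(extra)

print(my_list)
```

zip stops at the shorter of the two lists, so any row without a partner is left unchanged.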

How do I keep the index of duplicate elements unchanged?

Here is an input list:
['a', 'b', 'b', 'c', 'c', 'd']
The output I expect should be:
[[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd']]
I tried using map():
>>> list(map(lambda pair: [pair[0], pair[1]], enumerate(['a', 'b', 'b', 'c', 'c', 'd'])))
[[0, 'a'], [1, 'b'], [2, 'b'], [3, 'c'], [4, 'c'], [5, 'd']]
How can I get the expected result?
EDIT: This is not a sorted list; the index should increase only when a new element is encountered.
>>> import itertools
>>> seq = ['a', 'b', 'b', 'c', 'c', 'd']
>>> [[i, c] for i, (k, g) in enumerate(itertools.groupby(seq)) for c in g]
[[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd']]
import itertools
[
    [i, x]
    for i, (value, group) in enumerate(itertools.groupby(['a', 'b', 'b', 'c', 'c', 'd']))
    for x in group
]
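Note that groupby numbers consecutive runs, not distinct values, so a value that reappears later gets a new index. This short comparison uses a made-up input to show how that differs from numbering by first appearance:

```python
import itertools

seq = ['a', 'b', 'b', 'a']  # hypothetical input where 'a' reappears after a gap

# groupby: a new index for every run of equal adjacent items
by_run = [[i, x] for i, (_, group) in enumerate(itertools.groupby(seq)) for x in group]

# dict: a new index only the first time an item is seen
first_seen = {}
by_first = [[first_seen.setdefault(x, len(first_seen)), x] for x in seq]

print(by_run)
print(by_first)
```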
It sounds like you want to rank the terms based on a lexicographical ordering.
input = ['a', 'b', 'b', 'c', 'c', 'd']
mapping = { v:i for (i, v) in enumerate(sorted(set(input))) }
[ [mapping[v], v] for v in input ]
Note that this works for unsorted inputs as well.
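Here the index is each item's rank in sorted order, not its order of appearance. A quick check with a made-up unsorted input, using the name items to avoid shadowing the built-in input:

```python
items = ['c', 'a', 'c', 'b']  # hypothetical unsorted input

# Rank each distinct value by its position in sorted order
mapping = {v: i for i, v in enumerate(sorted(set(items)))}
ranked = [[mapping[v], v] for v in items]
print(ranked)
```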
If, as your amendment suggests, you want to number items based on order of first appearance, a different approach is in order. The following is short and sweet, albeit offensively hacky:
[ [d.setdefault(v, len(d)), v] for d in [{}] for v in input ]
When the list is sorted, use groupby (see jamylak's answer); when it is not, just iterate over the list and check whether you've seen this letter already:
a = ['a', 'b', 'b', 'c', 'c', 'd']
result = []
d = {}   # maps each letter to the index it got on first appearance
n = 0    # next unused index
for k in a:
    if k not in d:
        d[k] = n
        n += 1
    result.append([d[k], k])
This takes only O(n) time, since each dict membership test and insertion is O(1) on average.
Example of usage for an unsorted list, with the input ['a', 'b', 'b', 'c', 'c', 'd', 'a']:
[[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd'], [0, 'a']]
As you can see, the items stay in the same order as in the input list. Sorting the list first would add O(n log n) time.
