Numpy / Flatten a list - python

I have created this list of strings:
list1 = [['20']*3,['35']*2,['40']*4,['10']*2,['15']*3]
result :
[['20', '20', '20'], ['35', '35'], ['40', '40', '40', '40'], ['10', '10'], ['15', '15', '15']]
I can convert it into a single list using list comprehension
charlist = [x for sublist in list1 for x in sublist]
print(charlist)
['20', '20', '20', '35', '35', '40', '40', '40', '40', '10', '10', '15', '15', '15']
I was wondering how to do that with numpy
listNP=np.array(list1)
gives as output :
array([list(['20', '20', '20']), list(['35', '35']),
list(['40', '40', '40', '40']), list(['10', '10']),
list(['15', '15', '15'])], dtype=object)
However, listNP.flatten() returns the same array unchanged. I probably missed a step when converting the list into a NumPy array.

You can bypass all the extra operations and use np.repeat:
>>> np.repeat(['20', '35', '40', '10', '15'], [3, 2, 4, 2, 3])
array(['20', '20', '20', '35', '35', '40', '40', '40', '40',
'10', '10', '15', '15', '15'], dtype='<U2')
If you need dtype=object, make the first argument into an array first:
arr1 = np.array(['20', '35', '40', '10', '15'], dtype=object)
np.repeat(arr1, [3, 2, 4, 2, 3])
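If the nested list already exists, the values and repeat counts can be derived from it instead of being typed by hand. A minimal sketch, assuming every sublist is non-empty and holds copies of a single value (the names values and counts are illustrative, not from the question):

```python
import numpy as np

list1 = [['20'] * 3, ['35'] * 2, ['40'] * 4, ['10'] * 2, ['15'] * 3]

# Assumption: each sublist is non-empty and repeats one value.
values = [sub[0] for sub in list1]    # one representative per sublist
counts = [len(sub) for sub in list1]  # how many times each value repeats

flat = np.repeat(values, counts)
print(flat)
```

If the sublists can contain mixed values, this shortcut no longer applies and a real flattening step is needed.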

Use hstack()
import numpy as np
list1 = [['20']*3,['35']*2,['40']*4,['10']*2,['15']*3]
flatlist = np.hstack(list1)
print(flatlist)
['20' '20' '20' '35' '35' '40' '40' '40' '40' '10' '10' '15' '15' '15']
When I tried to construct your listNP with np.array as you do in the question, I got a warning about jagged arrays and having to use dtype=object, but letting hstack construct the result directly doesn't trigger a warning (thanks @Michael Delgado in the comments).
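For comparison, two other ways to get the same flat result without building the jagged array first. This is a sketch rather than a recommendation: np.concatenate joins the 1-D sublists along their only axis, and itertools.chain flattens in plain Python before NumPy sees the data.

```python
import numpy as np
from itertools import chain

list1 = [['20'] * 3, ['35'] * 2, ['40'] * 4, ['10'] * 2, ['15'] * 3]

# Join the sublists directly; each one is treated as a 1-D array.
flat_concat = np.concatenate(list1)

# Flatten in pure Python first, then convert once.
flat_chain = np.array(list(chain.from_iterable(list1)))

print(flat_concat)
print(flat_chain)
```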

Related

How to split existing list into smaller, separate lists (without using 'groupby')?

I have a list with 64 values, that I want to split into 8 smaller lists. This is the function I used to make the values.
def listMaker(l):
    for i in range(10):
        l.append(f"0{i}") # Pads single-digit numbers with a leading 0 ('01') so every entry has the same length
    for i in range(10, 64):
        l.append(f"{i}") # Appends the remaining numbers up to 63 (for index 0-63)
I want to go from:
['1','2','3','4']
To something like [['1','2'], ['3','4']]
So that it can be referenced like print(l[val1][val2])
You need to iterate over the main list, saving the values into an intermediate list; when it reaches the expected size, save it and start a new one.
def listMaker(values, size):
    result, tmp = [], []
    for value in values:
        if len(tmp) == size:
            result.append(tmp)
            tmp = []
        tmp.append(value)
    if tmp: # add last bucket
        result.append(tmp)
    return result
print(listMaker(range(10), 4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(listMaker(range(20), 6))
# [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17], [18, 19]]
A list of sublists could be created with a list comprehension:
n = 8 #size of sublist
sublists = [l[x*n:x*n+n] for x in range(0, 8)]
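The comprehension above hard-codes 8 sublists. A slightly more general sketch steps through the list by the chunk size, so the number of sublists follows from the input length (the helper name chunk is hypothetical):

```python
def chunk(seq, n):
    """Split seq into consecutive slices of length n; the last may be shorter."""
    return [seq[i:i + n] for i in range(0, len(seq), n)]

# Same 64 zero-padded strings that the question's listMaker builds.
l = [f"{i:02d}" for i in range(64)]
grid = chunk(l, 8)

print(grid[1][2])  # second row, third column
```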
You can use the "magic" zip-iter trick:
def listMaker(l):
    for i in range(10):
        l.append(f"0{i}") # Pads single-digit numbers with a leading 0 ('01') so every entry has the same length
    for i in range(10, 64):
        l.append(f"{i}") # Appends the remaining numbers up to 63 (for index 0-63)
l = []
listMaker(l)
list2 = [list(subl) for subl in zip(*[iter(l)]*8)]
list2
# Out[133]:
# [['00', '01', '02', '03', '04', '05', '06', '07'],
# ['08', '09', '10', '11', '12', '13', '14', '15'],
# ['16', '17', '18', '19', '20', '21', '22', '23'],
# ['24', '25', '26', '27', '28', '29', '30', '31'],
# ['32', '33', '34', '35', '36', '37', '38', '39'],
# ['40', '41', '42', '43', '44', '45', '46', '47'],
# ['48', '49', '50', '51', '52', '53', '54', '55'],
# ['56', '57', '58', '59', '60', '61', '62', '63']]
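One caveat with the zip-iter trick: zip stops as soon as any of its iterators is exhausted, so leftover elements are silently dropped when the length is not a multiple of the group size. The sketch below shows this on a 10-element list and one possible workaround with itertools.zip_longest; filtering out None assumes the data itself never contains None.

```python
from itertools import zip_longest

items = list(range(10))

# zip stops at the first exhausted iterator, so the leftover 8 and 9 are dropped.
strict = [list(group) for group in zip(*[iter(items)] * 4)]

# zip_longest pads the last group with None; the padding is filtered out here.
# Assumption: None never appears as a real value in items.
padded = [
    [x for x in group if x is not None]
    for group in zip_longest(*[iter(items)] * 4)
]

print(strict)
print(padded)
```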

sorted() unable to sort list of chars [duplicate]

This question already has answers here:
How to sort a list of strings numerically?
(14 answers)
Closed 3 years ago.
I want to sort a list containing integers as chars. e.g:
l = ['1', '10', '11', '12', '16', '17', '2', '24', '26', '27', '28', '30', '32', '34', '35', '36', '43', '45', '47', '49', '50', '6', '9']
print(sorted(l))
is returning:
['1', '10', '11', '12', '16', '17', '2', '24', '26', '27', '28', '30', '32', '34', '35', '36', '43', '45', '47', '49', '50', '6', '9']
Why is sorted() acting unusually?
Sorted is acting exactly as it should.
These are strings, not integers, so sorted sorts first by the first character, then by the second character.
If we want to sort ['1', '2', '12'], we get ['1', '12', '2']:
1
12
2
sorted first sorts by the first column, then by the second column.
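To sort the strings by their numeric value while keeping them as strings, sorted accepts a key function. A minimal sketch, assuming every element can be parsed by int (the list is a shortened version of the one in the question):

```python
l = ['1', '10', '11', '2', '24', '6', '9']

as_text = sorted(l)              # character-by-character comparison
as_numbers = sorted(l, key=int)  # compare by integer value, keep the strings

print(as_text)
print(as_numbers)
```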

Create array using lists (consisting of lists) but without flattening inner lists - python

I am trying to create an array from two lists, one of which has a list for each element. In the first case I manage to do what I want using np.column_stack, but in the second case, although my initial lists look similar in structure, my list of lists enters the array flattened (which is not what I need).
I am attaching two examples to reproduce this. In the first case I get an array where each row has a string as its first element and a list as its second, while in the second case I get 4 columns (the list is flattened) for no obvious reason.
Example 1
temp_list_column1=['St. Raphael',
'Goppingen',
'HSG Wetzlar',
'Huttenberg',
'Kiel',
'Stuttgart',
'Izvidac',
'Viborg W',
'Silkeborg-Voel W',
'Bjerringbro W',
'Lyngby W',
'Most W',
'Ostrava W',
'Presov W',
'Slavia Prague W',
'Dicken',
'Elbflorenz',
'Lubeck-Schwartau',
'HK Ogre/Miandum',
'Stal Mielec',
'MKS Perla Lublin W',
'Koscierzyna W',
'CS Madeira W',
'CSM Focsani',
'CSM Bucuresti',
'Constanta',
'Iasi',
'Suceava',
'Timisoara',
'Saratov',
'Alisa Ufa W',
'Pozarevac',
'Nove Zamky',
'Aranas',
'Ricoh',
'H 65 Hoor W',
'Lugi W',
'Strands W']
temp_list_column2=[['32', '16', '16'],
['32', '16', '16'],
['27', '13', '14'],
['23', '9', '14'],
['29', '14', '15'],
['24', '17', '7'],
['30', '15', '15'],
['26', '12', '14'],
['27', '13', '14'],
['26'],
['18', '9', '9'],
['34', '15', '19'],
['30', '13', '17'],
['31', '13', '18'],
['27', '10', '17'],
['28', '14', '14'],
['24', '14', '10'],
['28', '12', '16'],
['28', '9', '19'],
['22', '13', '9'],
['30', '14', '16'],
['22', '14', '8'],
['17', '8', '9'],
['26'],
['41', '21', '20'],
['36', '18', '18'],
['10'],
['25', '12', '13'],
['27', '16', '11'],
['31', '15', '16'],
['25', '15', '10'],
['24', '8', '16'],
['28', '14', '14'],
['24', '13', '11'],
['26', '14', '12'],
['33', '17', '16'],
['26', '12', '14'],
['17', '12', '5']]
import numpy as np
temp_array = np.column_stack((temp_list_column1,temp_list_column2))
output
array([['St. Raphael', ['32', '16', '16']],
['Goppingen', ['32', '16', '16']],
['HSG Wetzlar', ['27', '13', '14']],
['Huttenberg', ['23', '9', '14']],
['Kiel', ['29', '14', '15']],
['Stuttgart', ['24', '17', '7']],
['Izvidac', ['30', '15', '15']],
['Viborg W', ['26', '12', '14']],
['Silkeborg-Voel W', ['27', '13', '14']],
['Bjerringbro W', ['26']],
['Lyngby W', ['18', '9', '9']],
['Most W', ['34', '15', '19']],
['Ostrava W', ['30', '13', '17']],
['Presov W', ['31', '13', '18']],
['Slavia Prague W', ['27', '10', '17']],
['Dicken', ['28', '14', '14']],
['Elbflorenz', ['24', '14', '10']],
['Lubeck-Schwartau', ['28', '12', '16']],
['HK Ogre/Miandum', ['28', '9', '19']],
['Stal Mielec', ['22', '13', '9']],
['MKS Perla Lublin W', ['30', '14', '16']],
['Koscierzyna W', ['22', '14', '8']],
['CS Madeira W', ['17', '8', '9']],
['CSM Focsani', ['26']],
['CSM Bucuresti', ['41', '21', '20']],
['Constanta', ['36', '18', '18']],
['Iasi', ['10']],
['Suceava', ['25', '12', '13']],
['Timisoara', ['27', '16', '11']],
['Saratov', ['31', '15', '16']],
['Alisa Ufa W', ['25', '15', '10']],
['Pozarevac', ['24', '8', '16']],
['Nove Zamky', ['28', '14', '14']],
['Aranas', ['24', '13', '11']],
['Ricoh', ['26', '14', '12']],
['H 65 Hoor W', ['33', '17', '16']],
['Lugi W', ['26', '12', '14']],
['Strands W', ['17', '12', '5']]], dtype=object)
Example 2
temp_list_column1b=['Benidorm',
'Alpla Hard',
'Dubrava',
'Frydek-Mistek',
'Karvina',
'Koprivnice',
'Nove Veseli',
'Vardar',
'Meble Elblag Wojcik',
'Zaglebie',
'Benfica',
'Barros W',
'Juvelis W',
'Assomada W',
'UOR No.2 Moscow',
'Izhevsk W',
'Stavropol W',
'Din. Volgograd W',
'Zvenigorod W',
'Adyif W',
'Crvena zvezda',
'Ribnica',
'Slovan',
'Jeruzalem Ormoz',
'Karlskrona',
'Torslanda W']
temp_list_column2b=[['28', '14', '14'],
['27', '12', '15'],
['24', '13', '11'],
['24', '14', '10'],
['28', '17', '11'],
['30', '16', '14'],
['26', '15', '11'],
['38', '18', '20'],
['24', '13', '11'],
['33', '15', '18'],
['24', '10', '14'],
['18', '11', '7'],
['22', '9', '13'],
['25', '12', '13'],
['19', '11', '8'],
['24', '10', '14'],
['21', '9', '12'],
['18', '10', '8'],
['31', '17', '14'],
['29', '15', '14'],
['26', '14', '12'],
['29', '12', '17'],
['25', '11', '14'],
['33', '19', '14'],
['32', '14', '18'],
['19', '12', '7']]
import numpy as np
temp_arrayb = np.column_stack((temp_list_column1b,temp_list_column2b))
output
array([['Benidorm', '28', '14', '14'],
['Alpla Hard', '27', '12', '15'],
['Dubrava', '24', '13', '11'],
['Frydek-Mistek', '24', '14', '10'],
['Karvina', '28', '17', '11'],
['Koprivnice', '30', '16', '14'],
['Nove Veseli', '26', '15', '11'],
['Vardar', '38', '18', '20'],
['Meble Elblag Wojcik', '24', '13', '11'],
['Zaglebie', '33', '15', '18'],
['Benfica', '24', '10', '14'],
['Barros W', '18', '11', '7'],
['Juvelis W', '22', '9', '13'],
['Assomada W', '25', '12', '13'],
['UOR No.2 Moscow', '19', '11', '8'],
['Izhevsk W', '24', '10', '14'],
['Stavropol W', '21', '9', '12'],
['Din. Volgograd W', '18', '10', '8'],
['Zvenigorod W', '31', '17', '14'],
['Adyif W', '29', '15', '14'],
['Crvena zvezda', '26', '14', '12'],
['Ribnica', '29', '12', '17'],
['Slovan', '25', '11', '14'],
['Jeruzalem Ormoz', '33', '19', '14'],
['Karlskrona', '32', '14', '18'],
['Torslanda W', '19', '12', '7']],
dtype='<U19')
In the first case the shape is (38, 2), while in the second it is (26, 4) (I am interested in the number of columns only). Am I missing something obvious?
Your problem here seems to be that your first list of lists is jagged, while your second is rectangular.
Look at the difference in how NumPy converts the following two lists into arrays (which, as @hpaulj points out, is exactly what happens when you pass them to column_stack):
In [1]: b1 = [
...: [1,2,3],
...: [2,3,4],
...: [3,4,5],
...: [4,5,6]]
In [2]: np.array(b1)
Out[2]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])
In [3]: b2 = [
...: [1,2,3],
...: [2,3],
...: [3]]
In [4]: np.array(b2)
Out[4]: array([list([1, 2, 3]), list([2, 3]), list([3])], dtype=object)
Thus, when column stacking your example lists, in the first case you have a 1D array of lists that gets converted into a single column, whereas in the second case you have a 2D matrix of numbers that has 3 columns.
You should probably just not even be using Numpy's column_stack in this case, just zip the two lists together. If you want a numpy array as your final result, just np.array(list(zip(list_a, list_b)))
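Depending on the NumPy version, passing mixed string/list pairs straight to np.array may warn or fail unless dtype=object is given. A sketch that sidesteps shape inference by filling a pre-allocated object array; the sample data is a shortened excerpt of the question's lists, and the variable names are illustrative:

```python
import numpy as np

names = ['Kiel', 'Iasi', 'Stuttgart']
scores = [['29', '14', '15'], ['10'], ['24', '17', '7']]

# Pre-allocate an (n, 2) object array so NumPy never inspects the inner lists.
pairs = np.empty((len(names), 2), dtype=object)
pairs[:, 0] = names
for row, score_list in enumerate(scores):
    pairs[row, 1] = score_list  # store the list itself as one cell

print(pairs.shape)
print(pairs[1])
```

This gives two columns whether or not the inner lists all have the same length.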
EDIT: In retrospect, your data structure sounds more like what's typically referred to as a DataFrame, rather than a matrix which is what Numpy is trying to give you.
import pandas as pd
data = pd.DataFrame()
data['name'] = temp_list_column1
data['numbers'] = temp_list_column2
# Or
data = pd.DataFrame(list(zip(temp_list_column1, temp_list_column2)), columns=['name', 'numbers'])
Which gives you a data structure that looks like:
name numbers
0 John [1, 2, 3]
1 James [2, 3, 4]
2 Peter [3, 4, 5]
3 Paul [4, 5, 6]
Diagnosis
The issue seems to be that in the second example all the sublists have 3 elements, while in the first example some sublists have length 1, e.g. ['Bjerringbro W', ['26']]; the list ['26'] has only one element.
In the second case, np.column_stack apparently does not keep lists as cell elements. Why you would want lists as cell elements is a separate discussion that I will not go into here. Here is the solution.
Special Case Solution
I assume you don't mind using pandas
import pandas as pd
series_1 = pd.Series(temp_list_column1b).to_frame(name='col1') # name it whatever you want
series_2 = pd.Series(temp_list_column2b).to_frame(name='col2') # name it whatever you want
df = pd.concat([series_1, series_2], axis=1)
# print(df) # view in pandas form
# print(df.values) # to see how it looks like as a numpy array
# print(df.values.shape) # to see what the shape is in terms of numpy
Generalized Solution
Assuming you have a list of such columns which is called "list_of_cols". Then:
import pandas as pd
'''
list_of_cols: all the lists you want to combine
'''
df = pd.concat([pd.Series(temp_col).to_frame() for temp_col in list_of_cols], axis=1)
I hope this helps!

Python - Convert each integer to string in list and add a comma

I am new to python. I need to create a list of integers from 1 to 70, but for each integer I want to make it a string and a comma after it and store it in another list.
Ex:
for i in range(1, 71):
    list_of_ints.append(i)
{ Some code
}
it should then be something like this
columns = ['1','2','3','4'.......'70']
Use [str(i) for i in range(1, 71)]. This gives you the list of str(i) for all i in range(1, 71). The function str(i) returns i as a str value instead of as an int
Seems like you want something like this,
>>> l = []
>>> for i in range(1,71):
...     l.append(str(i))
...
>>> l
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70']
new_list = [str(x) for x in range(1, 71)]
Using a list comprehension to achieve the same result.
You can use map to help you here:
>>> list_of_ints = range(1, 71)
>>> list_of_ints = list(map(str, list_of_ints))
>>> print(list_of_ints)
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70']
>>>
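If the goal is a single comma-separated string rather than a list of strings, str.join can combine the converted values. A short sketch; the name joined is illustrative, and whether a joined string is actually wanted is an assumption based on the question's mention of commas:

```python
# Convert each integer to a string, as in the answers above.
columns = [str(i) for i in range(1, 71)]

# Join the strings with a comma between neighbours (no trailing comma).
joined = ",".join(columns)

print(columns[:4])
print(joined[:10])
```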

How to determine how many unique values are in a dictionary?

I have a dictionary containing lists that look something like this:
['33', '34', '34', '34', '35', '35', '36', '36', '38']
['34', '37', '38']
['33', '33', '35', '35', '38', '38']
I'm trying to get the number of unique values for each of these lists automatically (i.e. the third list would have a value of 3).
How should I do this?
print(len(set([1,1,2,2,3,3])))
Is that what you are looking for?
Sets are like lists, except that they contain only unique elements and have no order.
Build a set and then get its length:
print(len(set(yourlist)))
len(set(x)) is the number of unique elements in collection x.
So for that dictionary:
d
Out[16]:
{'a': ['33', '34', '34', '34', '35', '35', '36', '36', '38'],
'b': ['34', '37', '38'],
'c': ['33', '33', '35', '35', '38', '38']}
You'd have
for k,v in d.items():
    print('{}: {}'.format(k,len(set(v))))
a: 5
c: 3
b: 3
python 3 syntax.
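To keep the counts instead of only printing them, the same idea fits in a dict comprehension. A minimal sketch using the dictionary shown above (the name unique_counts is illustrative):

```python
d = {
    'a': ['33', '34', '34', '34', '35', '35', '36', '36', '38'],
    'b': ['34', '37', '38'],
    'c': ['33', '33', '35', '35', '38', '38'],
}

# Map each key to the number of distinct values in its list.
unique_counts = {key: len(set(values)) for key, values in d.items()}

print(unique_counts)
```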
