Python, Rearranging a numpy array by column 0 value, signed integers - python

I've got a folder with a dataset which is poorly sorted, and I'd like to rearrange the information I'm pulling from it as I read it. So I'm wondering: is there an easy way to sort the following input:
[['-10' '10']
['-10' '20']
['-15' '10']
['-15' '20']
['-5' '10']
['-5' '20']
['0' '10']
['0' '20']
['10' '10']
['10' '20']
['15' '10']
['15' '20']
['5' '10']
['5' '20']]
into the following output:
[['-15' '10']
['-15' '20']
['-10' '10']
['-10' '20']
['-5' '10']
['-5' '20']
['0' '10']
['0' '20']
['5' '10']
['5' '20']
['10' '10']
['10' '20']
['15' '10']
['15' '20']]

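How about using a pandas DataFrame? The values are strings, so pass a key that converts them to int (pandas 1.1+); otherwise '-10' sorts before '-15'.
import pandas as pd
data = [['5', '10'], ['4', '20']]
# 'by' takes the column label(s) to sort on; key converts each of those columns to int first
dataframe = pd.DataFrame(data).sort_values(by=[0, 1], key=lambda col: col.astype(int))
print(dataframe)
#Output:
#    0   1
# 1  4  20
# 0  5  10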

I'm afraid you'll need to cast your str values to int to get the desired sort order. Then you just want to sort a list by multiple attributes. If you want str values in the output too, you'll also need to cast them back.
import operator
a = [['-10', '10'],
['-10', '20'],
['-15', '10'],
['-15', '20'],
['-5', '10'],
['-5', '20'],
['0', '10'],
['0', '20'],
['10', '10'],
['10', '20'],
['15', '10'],
['15', '20'],
['5', '10'],
['5', '20']]
print(a)
b = [[int(e[0]), int(e[1])] for e in a] # to int
b = sorted(b, key=operator.itemgetter(0, 1)) # sort
b = [[str(e[0]), str(e[1])] for e in b] # to str
print(b)
Output:
[['-10', '10'], ['-10', '20'], ['-15', '10'], ['-15', '20'], ['-5', '10'], ['-5', '20'], ['0', '10'], ['0', '20'], ['10', '10'], ['10', '20'], ['15', '10'], ['15', '20'], ['5', '10'], ['5', '20']]
[['-15', '10'], ['-15', '20'], ['-10', '10'], ['-10', '20'], ['-5', '10'], ['-5', '20'], ['0', '10'], ['0', '20'], ['5', '10'], ['5', '20'], ['10', '10'], ['10', '20'], ['15', '10'], ['15', '20']]
Hope that helps!
EDIT: Or just use a lambda expression as the key in sorted(); this sorts numerically but keeps the original strings, so no conversion back is needed:
c = sorted(a, key = lambda x: (int(x[0]), int(x[1])))
print(c)
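Since the question asks about a NumPy array, here is a minimal sketch that stays in NumPy. It assumes the data is a string array shaped like the input above (called `arr` here for illustration): it computes the row order from an integer copy with `np.lexsort` and then indexes the original string array, so nothing has to be cast back to str.

```python
import numpy as np

# String array shaped like the question's input (the name `arr` is illustrative)
arr = np.array([['-10', '10'], ['-10', '20'], ['-15', '10'], ['-15', '20'],
                ['-5', '10'], ['-5', '20'], ['0', '10'], ['0', '20'],
                ['10', '10'], ['10', '20'], ['15', '10'], ['15', '20'],
                ['5', '10'], ['5', '20']])

nums = arr.astype(int)  # numeric copy, used only to compute the order

# np.lexsort treats the LAST key as the primary key, so column 0 goes last
order = np.lexsort((nums[:, 1], nums[:, 0]))

sorted_arr = arr[order]  # reorder the original string rows
print(sorted_arr)
```

If only column 0 matters, `arr[np.argsort(nums[:, 0], kind='stable')]` gives the same result for this input, because a stable sort keeps the existing 10/20 order inside each group.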

Related

How can I extract a column and create a vector out of them?

mat = [['1', '2', '3', '4', '5'],
['6', '7', '8', '9', '10'],
['11', '12', '13', '14', '15']]
Suppose I have this vector of vectors (a list of lists).
Say I need to extract the 2nd column of each row, convert the values to binary, and then create a vector of them.
Is it possible to do this without using NumPy?
Use zip to transpose the list, loop over the columns with enumerate, keep only the column with index 1, and convert each value with bin().
mat = [['1', '2', '3', '4', '5'],
['6', '7', '8', '9', '10'],
['11', '12', '13', '14', '15']]
vec = [[bin(int(r)) for r in row] for idx, row in enumerate(zip(*mat)) if idx == 1][0]
print(vec) # ['0b10', '0b111', '0b1100']
Yes. This is achievable with the following code:
mat = [['1', '2', '3', '4', '5'],
       ['6', '7', '8', '9', '10'],
       ['11', '12', '13', '14', '15']]
def decimalToBinary(n):
    # bin() returns e.g. '0b1100'; strip the '0b' prefix
    return bin(n).replace("0b", "")
new_vect = []
for m in mat:
    value = int(m[1])  # 2nd column of this row, as an int
    new_vect.append(decimalToBinary(value))
print(new_vect)
Hope this is what you expected. Output:
['10', '111', '1100']
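A shorter variant skips both zip and the helper function: index column 1 directly in a list comprehension and let the built-in format() produce the binary digits without the '0b' prefix. A sketch, assuming the column you want is index 1 as in the question (`col` is an illustrative name):

```python
mat = [['1', '2', '3', '4', '5'],
       ['6', '7', '8', '9', '10'],
       ['11', '12', '13', '14', '15']]

col = 1  # index of the column to extract (the 2nd column)

# format(n, 'b') returns the binary digits of n without the '0b' prefix
vec = [format(int(row[col]), 'b') for row in mat]
print(vec)  # ['10', '111', '1100']
```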

How to unpack values from a file

If I have an input file that reads:
0->54:15
1->41:12
2->35:6
3->42:10
4->34:7
5->58:5
6->55:12
7->39:6
8->36:12
9->38:15
10->53:13
11->56:12
12->51:5
13->48:8
14->60:14
15->46:12
16->57:6
17->52:9
18->40:11
Actually, this is an adjacency list. I want my code to read the file and take the values as u=0, v=54, w=15 and then go on with my plan. How can I do this? Thank you in advance for taking the time to read and answer this.
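Using .split would be good.
For each line in the file (you can get the lines by using the open() function), split it first on the arrow and then on the colon.
with open("file.txt") as f:  # "file.txt" stands in for your file's path
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        split_line = line.strip().split("->")  # Split by the arrow first: ['0', '54:15']
        split_line = [split_line[0]] + split_line[1].split(":")  # wrap u in a list before concatenating
        u, v, w = split_line  # Note u, v, and w are strings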
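Since the file is an adjacency list, you will probably want integers grouped per node rather than loose strings. A minimal sketch, assuming every non-empty line has the form u->v:w; it reads from a list of lines so it runs on its own (pass the file object from open() instead in real code), and `parse_edges` and `adjacency` are illustrative names:

```python
from collections import defaultdict

def parse_edges(lines):
    """Yield (u, v, w) integer triples from lines like '0->54:15'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at the end of the file
        u, rest = line.split("->")  # '0', '54:15'
        v, w = rest.split(":")      # '54', '15'
        yield int(u), int(v), int(w)

lines = ["0->54:15", "1->41:12", "2->35:6", ""]

# Map each node u to a list of (neighbour, weight) pairs
adjacency = defaultdict(list)
for u, v, w in parse_edges(lines):
    adjacency[u].append((v, w))

print(dict(adjacency))  # {0: [(54, 15)], 1: [(41, 12)], 2: [(35, 6)]}
```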
I would recommend using JSON format so you can use the json module in Python to parse the file into variables easily.
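As a sketch of that suggestion, assume the file were rewritten as a JSON array of [u, v, w] triples (this layout is an assumption, not a standard graph format). json.loads then returns integers directly; the string below stands in for the file contents, and with a real file you would call json.load(f) on the opened file:

```python
import json

# Hypothetical JSON version of the first few lines of the file
text = '[[0, 54, 15], [1, 41, 12], [2, 35, 6]]'

edges = json.loads(text)  # list of lists of ints, no manual splitting
for u, v, w in edges:
    print(u, v, w)
```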
If you had a single string:
import re
s = \
'''0->54:15
1->41:12
2->35:6
3->42:10
4->34:7
5->58:5
6->55:12
7->39:6
8->36:12
9->38:15
10->53:13
11->56:12
12->51:5
13->48:8
14->60:14
15->46:12
16->57:6
17->52:9
18->40:11'''
s = s.split('\n')
output = [re.split('->|:', x) for x in s]
output
[['0', '54', '15'], ['1', '41', '12'], ['2', '35', '6'], ['3', '42', '10'], ['4', '34', '7'], ['5', '58', '5'], ['6', '55', '12'], ['7', '39', '6'], ['8', '36', '12'], ['9', '38', '15'], ['10', '53', '13'], ['11', '56', '12'], ['12', '51', '5'], ['13', '48', '8'], ['14', '60', '14'], ['15', '46', '12'], ['16', '57', '6'], ['17', '52', '9'], ['18', '40', '11']]
If you want a dictionary:
d = {x[0]:[x[1],x[2]] for x in output}
d
{'0': ['54', '15'], '1': ['41', '12'], '2': ['35', '6'], '3': ['42', '10'], '4': ['34', '7'], '5': ['58', '5'], '6': ['55', '12'], '7': ['39', '6'], '8': ['36', '12'], '9': ['38', '15'], '10': ['53', '13'], '11': ['56', '12'], '12': ['51', '5'], '13': ['48', '8'], '14': ['60', '14'], '15': ['46', '12'], '16': ['57', '6'], '17': ['52', '9'], '18': ['40', '11']}
If you want a dataframe:
import pandas as pd
df = pd.DataFrame(output, columns=['u','v','w'])
df
u v w
0 0 54 15
1 1 41 12
2 2 35 6
3 3 42 10
4 4 34 7
5 5 58 5
6 6 55 12
7 7 39 6
8 8 36 12
9 9 38 15
10 10 53 13
11 11 56 12
12 12 51 5
13 13 48 8
14 14 60 14
15 15 46 12
16 16 57 6
17 17 52 9
18 18 40 11
Here is how you can use re.split() to split strings with multiple delimiters:
from re import split
with open('file.txt', 'r') as f:
    l = f.read().splitlines()
# split on any of ( - > ) : and drop the empty string left between '-' and '>'
lst = [list(filter(None, split(r'[(\-\>):]', s))) for s in l]
print(lst)
Output:
[['0', '54', '15'],
['1', '41', '12'],
['2', '35', '6'],
['3', '42', '10'],
['4', '34', '7'],
['5', '58', '5'],
['6', '55', '12'],
['7', '39', '6'],
['8', '36', '12'],
['9', '38', '15'],
['10', '53', '13'],
['11', '56', '12'],
['12', '51', '5'],
['13', '48', '8'],
['14', '60', '14'],
['15', '46', '12'],
['16', '57', '6'],
['17', '52', '9'],
['18', '40', '11']]
Breaking it down:
This: lst = [list(filter(None, split(r'[(\-\>):]', s))) for s in l] is the equivalent of:
lst = []  # The main list
for s in l:  # For every line in the list of lines
    uvw = split(r'[(\-\>):]', s)  # uvw = a list of the numbers
    uvw = list(filter(None, uvw))  # There is an empty string in the list, so filter it out
    lst.append(uvw)  # Add the list to the main list
I'm going to challenge the way that you're getting the input file in the first place: if you have any control over how you get this input, I'd encourage you to change its format. (If not, maybe this answer will help people who have a similar issue in the future).
There is typically little reason to "roll your own" serialization and deserialization like this - it's reinventing the wheel, given that most modern languages have built-in libraries to do this already. Rather, if at all possible, you should use a standard serialization and deserialization mechanism like Python pickle or a JSON serializer (or even a CSV, so that you can use a CSV parser).
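For example, if the same data were stored as CSV with a header row u,v,w (an assumed layout), the standard csv module does all the parsing. Here io.StringIO stands in for the opened file so the sketch runs on its own; the file name in the comment is hypothetical:

```python
import csv
import io

# Stand-in for open('edges.csv', newline=''); the file name is hypothetical
f = io.StringIO("u,v,w\n0,54,15\n1,41,12\n2,35,6\n")

edges = []
for row in csv.DictReader(f):  # each row is a dict keyed by the header names
    edges.append((int(row["u"]), int(row["v"]), int(row["w"])))

print(edges)  # [(0, 54, 15), (1, 41, 12), (2, 35, 6)]
```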

How to create new lists in python

I want to create new lists from one list. This is the example list I am working on:
matrixlist = [['Matrix', '1'], ['1', '4', '6'], ['5', '2', '9'], ['Matrix', '2'], ['2', '6'], ['1', '3'], ['8', '6'], ['Matrix', '3'], ['5', '6', '7', '9'], ['1', '4', '2', '3'], ['8', '7', '3', '5'], ['9', '4', '5', '3'], ['Matrix', '4'], ['7', '8'], ['4', '6'], ['2', '3']]
I split them like this with a for loop:
matrix1 = [['1', '4', '6'], ['5', '2', '9']]
matrix2 = [['2', '6'], ['1', '3'], ['8', '6']]
matrix3 = [['5', '6', '7', '9'], ['1', '4', '2', '3'], ['8', '7', '3', '5'], ['9', '4', '5', '3']]
matrix4 = [['7', '8'], ['4', '6'], ['2', '3']]
But I want to give the long list to the program and have it create the lists and append the relevant elements to each one, e.g. the Matrix 1 elements into the matrix1 list.
Edit: I can't use any advanced built-in functions. I can only use simple ones (like append, pop, reverse, range) and my own functions.
You can use itertools.groupby:
from itertools import groupby
matrixlist = [['Matrix', '1'], ['1', '4', '6'], ['5', '2', '9'], ['Matrix', '2'], ['2', '6'], ['1', '3'], ['8', '6'], ['Matrix', '3'], ['5', '6', '7', '9'], ['1', '4', '2', '3'], ['8', '7', '3', '5'], ['9', '4', '5', '3'], ['Matrix', '4'], ['7', '8'], ['4', '6'], ['2', '3']]
result = [list(b) for a, b in groupby(matrixlist, key=lambda x:x[0] == 'Matrix') if not a]
Output:
[[['1', '4', '6'], ['5', '2', '9']],
[['2', '6'], ['1', '3'], ['8', '6']],
[['5', '6', '7', '9'], ['1', '4', '2', '3'], ['8', '7', '3', '5'], ['9', '4', '5', '3']],
[['7', '8'], ['4', '6'], ['2', '3']]]
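You can do it using a list comprehension like below. Appending len(matrixlist) as a final end index keeps the last matrix, which zip(indx, indx[1:]) alone would drop:
indx = [i for i, mat in enumerate(matrixlist) if mat[0] == 'Matrix']
ends = indx[1:] + [len(matrixlist)]  # end index of each matrix's rows
matrixes = {matrixlist[i][1]: matrixlist[i + 1:j] for i, j in zip(indx, ends)}
# access matrix with its id
matrixes["1"]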
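Since the edit rules out itertools and other advanced built-ins, here is a sketch that uses only a for loop, append and indexing. It assumes the list always starts with a ['Matrix', n] header; `split_matrices` is an illustrative name:

```python
matrixlist = [['Matrix', '1'], ['1', '4', '6'], ['5', '2', '9'],
              ['Matrix', '2'], ['2', '6'], ['1', '3'], ['8', '6'],
              ['Matrix', '3'], ['5', '6', '7', '9'], ['1', '4', '2', '3'],
              ['8', '7', '3', '5'], ['9', '4', '5', '3'],
              ['Matrix', '4'], ['7', '8'], ['4', '6'], ['2', '3']]

def split_matrices(items):
    """Group the rows that follow each ['Matrix', n] header, using only a loop and append."""
    matrices = []  # one inner list per matrix, in order of appearance
    for row in items:
        if row[0] == 'Matrix':
            matrices.append([])  # a header row starts a new, empty matrix
        else:
            matrices[-1].append(row)  # any other row belongs to the latest matrix
    return matrices

# Unpacking into four names assumes the input has exactly four matrices
matrix1, matrix2, matrix3, matrix4 = split_matrices(matrixlist)
print(matrix1)  # [['1', '4', '6'], ['5', '2', '9']]
```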

Removing all non digits from a list

After reading from a file, I have a list of lists containing not only digits but also other characters, which I would like to get rid of.
I've tried using the re.sub function, but this doesn't seem to work:
import re
Poly_id= [['0', '[4', '8', '18', '20', '5', '0', '4]'], ['1', '[13', '16',
'6', '11', '13]'], ['2', '[3', '1', '10', '9', '2', '15', '3]'], ['3',
'[13', '12', '16', '13]'], ['4', '[13', '11', '17', '14', '7', '13]']]
for x in Poly_id:
    [re.sub(r'\W', '', ch) for ch in x]
This doesn't seem to change a thing in this list.
I would like to have a list with only numbers as elements so that I could convert them into integers
I guess technically '[4' is non-numeric, so you can drop such entries with something like this:
Poly_id = [[char for char in _list if str.isnumeric(char)] for _list in Poly_id]
Output:
['0', '8', '18', '20', '5', '0']
['1', '16', '6', '11']
['2', '1', '10', '9', '2', '15']
['3', '12', '16']
['4', '11', '17', '14', '7']
If you just want to remove the non-numeric characters and not the complete entry, then you can do this:
Poly_id = [[''.join(char for char in substring if str.isnumeric(char)) for substring in _list] for _list in Poly_id]
Output:
['0', '4', '8', '18', '20', '5', '0', '4']
['1', '13', '16', '6', '11', '13']
['2', '3', '1', '10', '9', '2', '15', '3']
['3', '13', '12', '16', '13']
['4', '13', '11', '17', '14', '7', '13']
Here is a solution if you want to get rid of the '[' in '[4' but keep the '4':
res = [[re.sub(r'\W', '', st) for st in inlist] for inlist in Poly_id]
res is:
[
['0', '4', '8', '18', '20', '5', '0', '4'],
['1', '13', '16', '6', '11', '13'],
['2', '3', '1', '10', '9', '2', '15', '3'],
['3', '13', '12', '16', '13'],
['4', '13', '11', '17', '14', '7', '13']
]
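The loop in the question has no visible effect because the list comprehension builds a new list that is immediately thrown away; re.sub never changes a string in place (Python strings are immutable). A minimal sketch that keeps the result and also does the integer conversion the question asks for, using \D (any non-digit) instead of \W; note that \D would also strip a minus sign, so this assumes non-negative numbers:

```python
import re

Poly_id = [['0', '[4', '8', '18', '20', '5', '0', '4]'],
           ['1', '[13', '16', '6', '11', '13]'],
           ['2', '[3', '1', '10', '9', '2', '15', '3]'],
           ['3', '[13', '12', '16', '13]'],
           ['4', '[13', '11', '17', '14', '7', '13]']]

# Strip every non-digit character, then convert what is left to int
Poly_int = [[int(re.sub(r'\D', '', s)) for s in row] for row in Poly_id]
print(Poly_int[0])  # [0, 4, 8, 18, 20, 5, 0, 4]
```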
You can use the itertools module to flatten a list of lists:
import itertools
list_of_lists = [[1, 2], [3, 4]]
print(list(itertools.chain(*list_of_lists)))
Output: [1, 2, 3, 4]
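itertools.chain.from_iterable does the same job without unpacking the outer list with *, consuming it lazily instead. Combined with a conversion step, it turns cleaned nested lists into one flat list of integers (the data here is a small illustrative sample):

```python
from itertools import chain

cleaned = [['0', '4', '8'], ['1', '13', '16']]  # illustrative sample

flat = [int(x) for x in chain.from_iterable(cleaned)]
print(flat)  # [0, 4, 8, 1, 13, 16]
```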

Python: sort list except first line

I have this list:
[['Nom', 'Francais', 'Anglais', 'Maths'], ['Catherine', '9', '17', '9'], ['Karim', '12', '15', '11'], ['Rachel', '15', '15', '14'], ['Roger', '12', '14', '12'], ['Gabriel', '7', '13', '8'], ['Francois', '14', '8', '15'], ['Henri', '10', '12', '13'], ['Stephane', '18', '12', '8'], ['Karine', '9', '10', '10'], ['Marie', '10', '10', '10'], ['Claire', '15', '9', '12'], ['Marine', '12', '9', '12']]
I want to sort it by name (in other words, by alphabetical order of the [0] element of each inner list), but I don't want the first list (['Nom', 'Francais', 'Anglais', 'Maths']) to be sorted with the others. How can I do that?
Thanks a lot!
You can use slice assignment:
>>> from pprint import pprint # just to have a nice display
>>> data = [['Nom', 'Francais', 'Anglais', 'Maths'], ['Catherine', '9', '17', '9'], ['Karim', '12', '15', '11'], ['Rachel', '15', '15', '14'], ['Roger', '12', '14', '12'], ['Gabriel', '7', '13', '8'], ['Francois', '14', '8', '15'], ['Henri', '10', '12', '13'], ['Stephane', '18', '12', '8'], ['Karine', '9', '10', '10'], ['Marie', '10', '10', '10'], ['Claire', '15', '9', '12'], ['Marine', '12', '9', '12']]
>>> pprint(data)
[['Nom', 'Francais', 'Anglais', 'Maths'],
['Catherine', '9', '17', '9'],
['Karim', '12', '15', '11'],
['Rachel', '15', '15', '14'],
['Roger', '12', '14', '12'],
['Gabriel', '7', '13', '8'],
['Francois', '14', '8', '15'],
['Henri', '10', '12', '13'],
['Stephane', '18', '12', '8'],
['Karine', '9', '10', '10'],
['Marie', '10', '10', '10'],
['Claire', '15', '9', '12'],
['Marine', '12', '9', '12']]
>>> data[1:] = sorted(data[1:])
>>> pprint(data)
[['Nom', 'Francais', 'Anglais', 'Maths'],
['Catherine', '9', '17', '9'],
['Claire', '15', '9', '12'],
['Francois', '14', '8', '15'],
['Gabriel', '7', '13', '8'],
['Henri', '10', '12', '13'],
['Karim', '12', '15', '11'],
['Karine', '9', '10', '10'],
['Marie', '10', '10', '10'],
['Marine', '12', '9', '12'],
['Rachel', '15', '15', '14'],
['Roger', '12', '14', '12'],
['Stephane', '18', '12', '8']]
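sorted(data[1:]) compares whole rows, which here amounts to sorting by name because the names are unique. To make the intent explicit, or to sort by another column, pass a key; the score columns are strings, so convert them with int(), otherwise '15' would sort before '9'. A sketch reusing a few rows from the question:

```python
data = [['Nom', 'Francais', 'Anglais', 'Maths'],
        ['Catherine', '9', '17', '9'],
        ['Karim', '12', '15', '11'],
        ['Rachel', '15', '15', '14'],
        ['Gabriel', '7', '13', '8']]

# Sort by name only (column 0), leaving the header row in place
data[1:] = sorted(data[1:], key=lambda row: row[0])

# Sort by Maths score (column 3), numerically and highest first
by_maths = data[:1] + sorted(data[1:], key=lambda row: int(row[3]), reverse=True)
print(by_maths)
```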
Personally, I'd do something like this. But it assumes you're semi-comfortable with Pandas. This gives you a lot more flexibility to do more with the data.
import pandas as pd
nl = [['Nom', 'Francais', 'Anglais', 'Maths'], ['Catherine', '9', '17', '9'], ['Karim', '12', '15', '11'], ['Rachel', '15', '15', '14'], ['Roger', '12', '14', '12'], ['Gabriel', '7', '13', '8'], ['Francois', '14', '8', '15'], ['Henri', '10', '12', '13'], ['Stephane', '18', '12', '8'], ['Karine', '9', '10', '10'], ['Marie', '10', '10', '10'], ['Claire', '15', '9', '12'], ['Marine', '12', '9', '12']]
df = pd.DataFrame(columns = nl[0])
for l, c in zip(nl[0], range(4)):
    df[l] = [r[c] for r in nl[1:]]
df.sort_values(by = 'Nom', inplace = True)
df.reset_index(drop = True, inplace = True)
which yields:
Nom Francais Anglais Maths
0 Catherine 9 17 9
1 Claire 15 9 12
2 Francois 14 8 15
3 Gabriel 7 13 8
4 Henri 10 12 13
5 Karim 12 15 11
6 Karine 9 10 10
7 Marie 10 10 10
8 Marine 12 9 12
9 Rachel 15 15 14
10 Roger 12 14 12
11 Stephane 18 12 8
and then if you need a .csv per your most recent comment, it's simply:
df.to_csv('/directory/my_filename.csv', index = False)
