Python sorting numbers in a multicolumn file [duplicate] - python

I have a file with 4 column data, and I want to prepare a final output file which is sorted by the first column. The data file (rough.dat) looks like:
1 2 4 9
11 2 3 5
6 5 7 4
100 6 1 2
The code I am using to sort by the first column is:
with open('rough.dat', 'r') as f:
    lines = [line.split() for line in f]
a = sorted(lines, key=lambda x: x[0])
print(a)
The result I am getting is strange, and I think I'm doing something silly!
[['1', '2', '4', '9'], ['100', '6', '1', '2'], ['11', '2', '3', '5'], ['6', '5', '7', '4']]
As you can see, the first column is not sorted in ascending numerical order. Instead, the numbers starting with '1' take priority: 100 comes before 11, and 11 comes before 6!

Strings are compared lexicographically (dictionary order):
>>> '100' < '6'
True
>>> int('100') < int('6')
False
Converting the first item to int in the key function will give you what you want:
a = sorted(lines, key=lambda x: int(x[0]))
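Since the question asks for a sorted output file, here is a minimal sketch of the whole round trip under the fix above. The data is held in a string (ROUGH is a stand-in for the contents of rough.dat) so the example runs on its own, and the output file name in the final comment is hypothetical.

```python
# Stand-in for the contents of rough.dat from the question.
ROUGH = """1 2 4 9
11 2 3 5
6 5 7 4
100 6 1 2
"""

# Split each non-empty line into its columns (still strings at this point).
rows = [line.split() for line in ROUGH.splitlines() if line.strip()]

# Sort by the numeric value of the first column, not by its text.
rows.sort(key=lambda row: int(row[0]))

# Rebuild the text, one space-separated row per line.
output = "\n".join(" ".join(row) for row in rows)
print(output)

# To save the result, write `output` to a file of your choice, e.g.
# with open('sorted.dat', 'w') as f: f.write(output + "\n")
```

The columns stay as strings throughout; only the sort key is converted, so the rows are written back exactly as they were read.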

You are sorting your numbers as text because they are strings, not integers. As a more NumPy-oriented approach, you can use np.loadtxt to load your data and then reorder the rows by their first column. Note that array.sort(axis=1) would not do this: it sorts the values inside each row independently.
import numpy as np
array = np.loadtxt('rough.dat')
array = array[array[:, 0].argsort()]  # reorder rows by the first column
print(array)
[[  1.   2.   4.   9.]
 [  6.   5.   7.   4.]
 [ 11.   2.   3.   5.]
 [100.   6.   1.   2.]]

Related

How can i select values from 2d array and add to another 2d array python

I want to take each value from the lists in a 2D array and put it into its own separate 2D array for later use.
I have this code:
for i in portalsAll:
    for x in i:
        tpinx.append(x.split(" ")[0])
        tpiny.append(x.split(" ")[1])
        tpoutx.append(x.split(" ")[2])
        tpouty.append(x.split(" ")[3])
    tpIn_x.append(tpinx)
    tpIn_y.append(tpiny)
    tpOut_x.append(tpoutx)
    tpOut_y.append(tpouty)
This is the 2D array I wish to take the values from:
[['0 0 1 2', '0 2 2 0', '2 2 1 0'], ['1 0 2 0', '8 0 3 0', '0 0 9 0']]
As you can see, there are spaces between the values. I cannot delete them because I read this data from a file, which is why I split each string to remove the spaces.
However, this code does not work. For tpIn_x, as an example, it gives
[['0', '0', '2', '1', '8', '0'], ['0', '0', '2', '1', '8', '0']]
which is a 2D array consisting of two repeated lists.
My ideal output is
[['0', '0', '2'], ['1', '8', '0']]
where each inner list only holds the data from the corresponding list in the original 2D array. By the way, the inner lists of the 2D array are not always of size 3, so I cannot set a maximum list size.
How can I fix this? Any help is gratefully accepted.
for i in portalsAll:
    tpIn_x.append([x.split()[0] for x in i])
    tpIn_y.append([x.split()[1] for x in i])
    tpOut_x.append([x.split()[2] for x in i])
    tpOut_y.append([x.split()[3] for x in i])
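The repeated lists in the question come from appending the same list object (tpinx and so on) on every pass of the outer loop without ever resetting it, so both entries of tpIn_x point at one ever-growing list. Building a fresh inner list per group avoids that. Below is a self-contained sketch of the same idea; the names portals_all, split_rows and column are illustrative, not from the original code.

```python
portals_all = [['0 0 1 2', '0 2 2 0', '2 2 1 0'],
               ['1 0 2 0', '8 0 3 0', '0 0 9 0']]

# Split every entry once, keeping the nesting of the original 2D list.
split_rows = [[entry.split() for entry in group] for group in portals_all]


def column(k):
    """Return field k of every entry, with a new inner list per group."""
    return [[fields[k] for fields in group] for group in split_rows]


tp_in_x, tp_in_y, tp_out_x, tp_out_y = (column(k) for k in range(4))

print(tp_in_x)  # [['0', '0', '2'], ['1', '8', '0']]
```

Because each inner list is created inside the comprehension, the groups can have any length.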

getting index of a multiple line string

I'm trying to read a digit from a multi-line string where each digit's value is the same as its index. Here's my attempt:
table='''012
345
678'''
print (table[4])
If I execute the above, I get an output of 3 instead of 4.
I am trying to get the number i with print(table[i]).
What is the simplest way of getting the number corresponding to table[i] without using a list? I have to use while loops later to replace values in the table, and using lists would be very troublesome. Thanks.
Your string contains newline characters, and the one at index 3 is why table[4] is '3'. (Inside a Python string literal a line break is always a single \n, even if the source file uses Windows \r\n line endings.) You can clean your text by removing them:
table='''012
345
678'''
print (table[4]) #3 - because [3] == \n
print(table.replace("\n","")[4]) # 4
You can view all characters in your "table" like so:
print(repr(table))
# print the ordinal value of each character, and the character itself if it is printable
for c in table:
    print(ord(c), c if ord(c)>31 else "")
Output:
'012\n345\n678'
48 0
49 1
50 2
10
51 3
52 4
53 5
10
54 6
55 7
56 8
On a side note: if your table does not change, you might want to build a lookup dict so you don't have to strip the newlines from your string every time:
table='''012
345
678'''
indexes = dict( enumerate(table.replace("\n","")))
print(indexes)
Output:
{0: '0', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6', 7: '7', 8: '8'}
so you can do indexes[3] to get the string '3'.
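If the table has to stay a multi-line string and its cells need replacing later, another option is to translate the logical index into the real string index. This is only a sketch under the assumption that every row has the same width (3 here); raw_index, get_cell and set_cell are hypothetical helper names.

```python
table = '''012
345
678'''

WIDTH = 3  # characters per row; an assumption that matches the example table


def raw_index(i, width=WIDTH):
    # Every completed row adds one "\n" in front of logical position i.
    return i + i // width


def get_cell(table, i):
    return table[raw_index(i)]


def set_cell(table, i, value):
    # Strings are immutable, so build a new string around the changed cell.
    k = raw_index(i)
    return table[:k] + value + table[k + 1:]


print(get_cell(table, 4))  # 4
table = set_cell(table, 4, 'X')
print(table)
```

set_cell returns a new string rather than modifying the old one, so the result has to be assigned back to table inside any loop that replaces values.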

How to replace certain columns in a file of uneven dimensions

I'm trying to edit some data in a txt file, but the file is written such that there are some rows with more columns than others. Example:
1 0.0 0.
2 0.25 0.
3 0.50 0. 13 1 0.2 0.
14 2.625 0.
15 2.800 0. 20 1 0.2
21 4.05 0.
22 4.2 0. 24 1 0.2
25 4.75 0.
26 4.90
27 5.05
28 5.15
29 5.25
As can be seen, there are sections with multiple spaces, and some rows have 7 columns instead of 3.
I want to take each value from the second column (0.0, 0.25, etc) and from the sixth column (0.2, 0.2, etc) and perform basic multiplication and division on each. So for example, in the second row, I want to take 0.25 and multiply it by 25.4.
I tried to read and break the file into a list
g = open("myfile.txt","r+")
lines = g.read().split(' ')
while('' in lines):
    lines.remove('')
This gives the output
['1', '0.0', '0.\n', '2', '0.25', '0.\n', '3', '0.50', '0.', '13', '1', '0.2',
'0.\n', '14', '2.625', '0.\n', '15', '2.800', '0.', '20', '1', '0.2\n', '21',
'4.05', '0.\n', '22', '4.2', '0.', '24', '1', '0.2\n', '25', '4.75', '0.\n', '26',
'4.90\n', '27', '5.05\n', '28', '5.15\n', '29', '5.25\n\n']
(The second \n at the end is because there are empty rows to space each section of this data). I then tried to use a loop and counter to define where each item in the list is in the table:
counter = 0
for i in lines:
    if '\n' in lines[i]:
        counter = 0
    elif counter == 1 or counter == 5:
        lines[i] = float(lines[i])*25.4
    counter += 1
From this, I end up with the error:
TypeError: list indices must be integers or slices, not str
Any ideas on what I could do that would work, and potentially be more elegant?
A possible solution to your problem, if I understood it correctly:
Use the with open() as f syntax to make sure your file is closed once the block ends.
lines = []
with open('myfile.txt', 'r') as f:
    for line in f:
        # split() with no argument also discards the newline and repeated spaces
        lines.append(line.split())
To multiply the values you want in place, convert them to float first and skip rows that do not have the column:
for line in lines:
    for col in (1, 5):  # second and sixth columns
        if len(line) > col:
            line[col] = float(line[col]) * 25.4
Hope it helps.
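The TypeError in the question arises because for i in lines yields the list's elements (strings), not positions, so lines[i] ends up indexing a list with a string. Working row by row sidesteps the counter entirely. The sketch below runs on a few sample rows held in a string; SAMPLE, FACTOR and scale_line are illustrative names, and the rows are rejoined with single spaces, so the original column alignment is not preserved.

```python
# A few rows from the question, standing in for the contents of myfile.txt.
SAMPLE = """1 0.0 0.
2 0.25 0.
3 0.50 0. 13 1 0.2 0.
26 4.90
"""

FACTOR = 25.4


def scale_line(line, columns=(1, 5), factor=FACTOR):
    """Multiply the given columns of one row, skipping columns the row lacks."""
    fields = line.split()
    for col in columns:
        if col < len(fields):
            fields[col] = str(float(fields[col]) * factor)
    return " ".join(fields)


converted = [scale_line(line) for line in SAMPLE.splitlines()]
for row in converted:
    print(row)
```

Short rows such as "26 4.90" pass through with only their second column scaled, because the length check skips the missing sixth column.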

How do I Store input data into multiple matrices in Python?

I have a text file with multiple matrices like this:
4 5 1
4 1 5
1 2 3
[space]
4 8 9
7 5 6
7 4 5
[space]
2 1 3
5 8 9
4 5 6
I want to read this input file in python and store it in multiple matrices like:
matrixA = [...] # first matrix
matrixB = [...] # second matrix
so on. I know how to read external files in python but don't know how to divide this input file in multiple matrices, how can I do this?
Thank you
You can write code like this:
all_matrices = []  # hold matrixA, matrixB, ...
matrix = []  # hold current matrix
with open('file.txt', 'r') as f:
    for line in f:
        values = line.split()
        if values:  # line contains numbers
            matrix.append(values)
        elif matrix:  # a blank line ends the current matrix
            all_matrices.append(matrix)
            matrix = []
if matrix:  # the last matrix has no blank line after it
    all_matrices.append(matrix)
# do whatever you want with all_matrices ...
I am sure the algorithm could be optimized somewhere, but the answer I found is quite simple:
file = open('matrix_list.txt').read() #Open the File
matrix_list = file.split("\n\n") #Split the file in a list of Matrices
for i, m in enumerate(matrix_list):
    matrix_list[i]=m.split("\n") #Split the row of each matrix
    for j, r in enumerate(matrix_list[i]):
        matrix_list[i][j] = r.split() #Split the value of each row
This will result in the following format:
[[['4', '5', '1'], ['4', '1', '5'], ['1', '2', '3']], [['4', '8', '9'], ['7', '5', '6'], ['7', '4', '5']], [['2', '1', '3'], ['5', '8', '9'], ['4', '5', '6']]]
Example on how to use the list:
print(matrix_list) #prints all matrices
print(matrix_list[0]) #prints the first matrix
print(matrix_list[0][1]) #prints the second row of the first matrix
print(matrix_list[0][1][2]) #prints the value from the second row and third column of the first matrix
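Both answers leave the values as strings. If numbers are needed, the splitting and the conversion can be combined in one nested comprehension. This is a sketch that assumes the matrices are separated by exactly one empty line; TEXT stands in for the file contents, and matrix_a, matrix_b and matrix_c are illustrative names.

```python
# Stand-in for the contents of the input file from the question.
TEXT = """4 5 1
4 1 5
1 2 3

4 8 9
7 5 6
7 4 5

2 1 3
5 8 9
4 5 6
"""

# Blocks are separated by one empty line; rows by newlines; values by spaces.
matrices = [
    [[int(value) for value in row.split()] for row in block.splitlines()]
    for block in TEXT.strip().split("\n\n")
]

matrix_a, matrix_b, matrix_c = matrices
print(matrix_a)  # [[4, 5, 1], [4, 1, 5], [1, 2, 3]]
```

The strip() call removes the trailing newline so that the last block does not end with an empty row.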

Comparing two numbers lists with each other in Python

I have a data frame (possibly a list):
A = ['01', '20', '02', '25', '26']
B = ['10', '13', '14', '64', '32']
I would like to compare list A with list B in the following way:
Strings in the left column are compared with strings in the right column. Strings that share the same boundary digit are combined, and one copy of that digit is removed during (or after) merging. Why was the string '010' removed? Because each digit can occur only once.
You can perform a couple of string slicing operations and then merge on the common digit.
a
    A
0  01
1  20
2  02
3  25
4  26
b
    B
0  10
1  13
2  14
3  64
4  32
a['x'] = a.A.str[-1]
b['x'] = b.B.str[0]
b['B'] = b.B.str[1:]
m = a.merge(b)
You could also do this in a single line with assign, without disrupting the original dataframes:
m = a.assign(x=a.A.str[-1]).merge(b.assign(x=b.B.str[0], B=b.B.str[1:]))
For uniques, you'll need to convert to set and check its length.
v = (m['A'] + m['B'])
v.str.len() == v.apply(set).str.len()
0 False
1 True
2 True
3 True
dtype: bool
v[v.str.len() == v.apply(set).str.len()].tolist()
['013', '014', '264']
Something you should be aware of is that you're actually passing integers, not strings. In Python 2, A = [01, 20, 02, 25, 26] is the same as A = [1, 20, 2, 25, 26], and Python 3 rejects integer literals with a leading zero as a syntax error. If you know that you will always be working with integers <= 99, this won't be an issue. Otherwise, you should use strings instead of integers, like A = ['01', '20', '02', '25', '26']. So the first thing you should do is convert the lists to lists of strings. If you know all of the integers will be <= 99, you can do so like this:
A = ['%02d' % i for i in A]
B = ['%02d' % i for i in B]
(You could also give these different names if you want to preserve the integer lists.) The solution is then:
final = []
for i in A:
    for j in B:
        if i[-1] == j[0]:
            final.append(i + j[1:])
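The loop above still keeps '010', which the question wants removed because a digit repeats. A self-contained sketch that adds the uniqueness check from the pandas answer to the plain-Python loop could look like this; merge_on_boundary is a hypothetical helper name.

```python
A = ['01', '20', '02', '25', '26']
B = ['10', '13', '14', '64', '32']


def merge_on_boundary(left, right):
    """Join strings whose boundary digits match, keeping results with no repeated digit."""
    merged = []
    for a in left:
        for b in right:
            if a[-1] == b[0]:
                candidate = a + b[1:]  # drop one copy of the shared digit
                if len(set(candidate)) == len(candidate):
                    merged.append(candidate)
    return merged


final = merge_on_boundary(A, B)
print(final)  # ['013', '014', '264']
```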
