I have a file say, file1.txt which looks something like below.
27,28,29,30,1,0.67
31,32,33,34,1,0.84
35,36,37,38,1,0.45
39,40,41,42,1,0.82
43,44,45,46,1,0.92
43,44,45,46,1,0.92
51,52,53,54,2,0.28
55,56,57,58,2,0.77
59,60,61,62,2,0.39
63,64,65,66,2,0.41
75,76,77,78,3,0.51
90,91,92,93,3,0.97
Where the last column is the fitness and the 2nd last column is the class.
Now I read this file like :
rule_file_name = 'file1.txt'
rule_fp = open(rule_file_name)
list1 = []
for line in rule_fp.readlines():
list1.append(line.replace("\n","").split(","))
Then a default dictionary was created to ensure the rows are separated according to the classes.
from collections import defaultdict
classes = defaultdict(list)
for _list in list1:
classes[_list[-2]].append(_list)
Then they are paired up within each class using the below logic.
from random import sample, seed
seed(1)
for key, _list in classes.items():
_list=sorted(_list,key=itemgetter(-1),reverse=True)
length = len(_list)
middle_index = length // 2
first_half = _list[:middle_index]
second_half = _list[middle_index:]
result=[]
result=list(zip(first_half,second_half))
Later using the 2 rows of the pair, a 3rd row is being created using the below logic:
ans=[[random.choice(choices) for choices in zip(*item)] for item in result]
So if there were initially 12 rows in the file1, that will now form 6 pairs and hence 6 new rows will be created. I simply want to append those newly created rows to the file1 using below logic:
list1.append(ans)
print(ans)
with open(f"output.txt", 'w') as out:
new_rules = [list(map(str, i)) for i in list1]
for item in new_rules:
out.write("{}\n".format(",".join(item)))
#out.write("{}\n".format(item))
But now my output.txt looks like:
27,28,29,30,1,0.67
31,32,33,34,1,0.84
35,36,37,38,1,0.45
39,40,41,42,1,0.82
43,44,45,46,1,0.92
43,44,45,46,1,0.92
51,52,53,54,2,0.28
55,56,57,58,2,0.77
59,60,61,62,2,0.39
63,64,65,66,2,0.41
75,76,77,78,3,0.51
90,91,92,93,3,0.97
['43', '44', '41', '46', '1', '0.82'],['27', '28', '45', '46', '1', '0.92'],['35', '36', '33', '38', '1', '0.84']
['55', '60', '57', '58', '2', '0.77'],['51', '64', '53', '66', '2', '0.28']
['75', '91', '77', '93', '3', '0.51']
But my desired outcome is:
27,28,29,30,1,0.67
31,32,33,34,1,0.84
35,36,37,38,1,0.45
39,40,41,42,1,0.82
43,44,45,46,1,0.92
43,44,45,46,1,0.92
51,52,53,54,2,0.28
55,56,57,58,2,0.77
59,60,61,62,2,0.39
63,64,65,66,2,0.41
75,76,77,78,3,0.51
90,91,92,93,3,0.97
43,44,41,46,1,0.82
27,28,45,46,1,0.92
35,36,33,38,1,0.84
55,60,57,58,2,0.77
51,64,53,66,2,0.28
75,91,77,93,3,0.51
I would use numpy, it is flexible and compact.
import numpy as np
fin = 'file1.txt'
col1, col2, col3, col4, jclass, fitness = np.loadtxt(fin, unpack=True, delimiter=',')
rows = np.column_stack((col1, col2, col3, col4, jclass, fitness))
print(rows[0])
print(rows[-1])
print(fitness)
Then apply your logic to the rows array
I have a list as follows:
listt = ['34','56,67','45,56,67','45']
I would like to get a list of single values.
this is my code:
new_list=[]
for element in listt:
if ',' in element:
subl=element.split(',')
new_list = new_list + subl
else:
new_list.append(element)
result:
['34', '56', '67', '45', '56', '67', '45']
Is there actually a way to do this with a comprehension list? (i.e. one liner).
It looks like too much code for such a tiny thing.
thanks.
spam = ['34','56,67','45,56,67','45']
eggs = [num for item in spam for num in item.split(',')]
print(eggs)
output
['34', '56', '67', '45', '56', '67', '45']
listt = ['34','56,67','45,56,67','45']
print(','.join(listt).split(','))
Prints:
['34', '56', '67', '45', '56', '67', '45']
I have a list of strings with the following pattern
my_list = ['/path/to/my/data/S1B_IW_GRDH_1SDV_20190610T030906_20190610T030931_016628_01F4BE_6B99_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190523T030954_20190523T031019_027349_0315A8_999E_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190511T030953_20190511T031018_027174_03102E_402F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190628T030956_20190628T031021_027874_032595_0B1F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190604T030955_20190604T031020_027524_031B16_BD33_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190622T030907_20190622T030932_016803_01F9F1_D6E9_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190505T030904_20190505T030929_016103_01E4AD_17B5_VV.tif']
I want to sort my list in chronological order using the time information that is present on each string (20190610,.....). The problem is that at the begining of each string I have the pattern S1A or S1B which makes that using a simple mylist.sort() does not work directly.
Looking in others posts I have seen that the solution would be to use the key argument with some kind of pattern.
My question is, how to start the sorting at a specific position of each string in my list. In my case I want to start sorting at position 35 right after _1SDV_
I have seen some options like
from operator import itemgetter
my_list.sort(key = itemgetter(35))
or
my_list.sort(key = lambda x: x[35])
Copying #schwobaseggl's solution from the comments, the following solution should work.
my_list.sort(key = lambda x: x[35:])
Example:
>>> my_list = ['91', '82', '73', '64', '55', '46', '37', '28', '19']
>>> my_list.sort()
>>> my_list
['19', '28', '37', '46', '55', '64', '73', '82', '91']
>>> my_list.sort(key = lambda x: x[1:]) # sorting after first position
>>> my_list
['91', '82', '73', '64', '55', '46', '37', '28', '19']
Using regex:
import regex as re
my_list = ['/path/to/my/data/S1B_IW_GRDH_1SDV_20190610T030906_20190610T030931_016628_01F4BE_6B99_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190523T030954_20190523T031019_027349_0315A8_999E_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190511T030953_20190511T031018_027174_03102E_402F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190628T030956_20190628T031021_027874_032595_0B1F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190604T030955_20190604T031020_027524_031B16_BD33_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190622T030907_20190622T030932_016803_01F9F1_D6E9_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190505T030904_20190505T030929_016103_01E4AD_17B5_VV.tif']
my_list.sort(key=lambda x: re.findall("\d{8}", x)[0])
print(my_list)
Output:
['/path/to/my/data/S1B_IW_GRDH_1SDV_20190505T030904_20190505T030929_016103_01E4AD_17B5_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190511T030953_20190511T031018_027174_03102E_402F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190523T030954_20190523T031019_027349_0315A8_999E_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190604T030955_20190604T031020_027524_031B16_BD33_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190610T030906_20190610T030931_016628_01F4BE_6B99_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190622T030907_20190622T030932_016803_01F9F1_D6E9_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190628T030956_20190628T031021_027874_032595_0B1F_VV.tif']
I am trying to separate a column that I read from a .csv file into a multidimensional array. So, if the first column is read into a single array and looks like this:
t = ['90-0066', '24', '33', '34', '91-0495', '22', '33', '92-6676', '23', '32']
How do I write the code in python for every value like '90-0066' the following numbers are put into an array until the next - value? So I would like the array to look like:
t = [['24', '33', '34'], ['22', '33'], ['23', '32']]
Thanks!
You can use itertools.groupby in a list comprehension:
from itertools import groupby
t = [list(g) for k, g in groupby(t, key=str.isdigit) if k]
t becomes:
[['24', '33', '34'], ['22', '33'], ['23', '32']]
If the numbers are possibly floating points, you can use regex instead:
import re
t = [list(g) for k, g in groupby(t, key=lambda s: bool(re.match(r'\d+(?:\.\d+)?$', s)) if k]
Or zip longest with two list comprehensions:
>>> from itertools import zip_longest
>>> l=[i for i,v in enumerate(t) if not v.isdigit()]
>>> [t[x+1:y] for x,y in zip_longest(l,l[1:])]
[['24', '33', '34'], ['22', '33'], ['23', '32']]
>>>
This question already has answers here:
How do I convert all strings in a list of lists to integers?
(15 answers)
Closed 5 years ago.
I have data in python with nested lists, a part of which looks like:
data = [['214', '205', '0', '14', '710', '1813494849', '0'], ['214', '204', '0', '30', '710', '1813494856', '0'], ['214', '204', '0', '34', '710', '1813494863', '0'], ['213', '204', '0', '35', '710', '1813494870', '0'], ['213', '203', '0', '35', '710', '1813494877', '0']]
While converting the data using couple methods:
1.
new_data_list = [[int(x) for x in list] for list in data]
list=[]
for i in range(0,len(data)):
list.append([])
for j in range(0,len(data[i])):
b=int(data[i][j])
list[i].append(b)
I am getting the error which says:
> ValueError: invalid literal for int() with base 10: ''
There are no non-numeric data in my list of data. But there might be something with the header, like an empty header treated as non-numeric value as I have created a list of data from csv.
I wanted to know an efficient way which can convert each element of the list to int while keeping the data multi list.
Call int for each item in each nested list:
new_list = [[int(x) for x in lst] for lst in nested]
Or using map:
new_list = [list(map(int, lst)) for lst in nested]
A concise one is:
x = ['123', '12', '5']
r = list(map(int, x))
print(r)
>[123, 12, 5]