How to read 1st column from csv and separate into multidimensional array - python

I am trying to separate a column that I read from a .csv file into a multidimensional array. So, if the first column is read into a single array and looks like this:
t = ['90-0066', '24', '33', '34', '91-0495', '22', '33', '92-6676', '23', '32']
How do I write Python code so that, after each value like '90-0066', the numbers that follow are collected into their own list until the next hyphenated value? I would like the result to look like:
t = [['24', '33', '34'], ['22', '33'], ['23', '32']]
Thanks!

You can use itertools.groupby in a list comprehension:
from itertools import groupby
t = [list(g) for k, g in groupby(t, key=str.isdigit) if k]
t becomes:
[['24', '33', '34'], ['22', '33'], ['23', '32']]
If the numbers are possibly floating points, you can use regex instead:
import re
t = [list(g) for k, g in groupby(t, key=lambda s: bool(re.match(r'\d+(?:\.\d+)?$', s)) if k]
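For completeness, here is a minimal sketch of the whole path from CSV to grouped lists. The sample data and the helper names first_column and group_numeric_runs are made up for illustration, and io.StringIO stands in for the real .csv file:

```python
import csv
import io
from itertools import groupby

# Hypothetical CSV content; only the first column matters here.
sample = "90-0066,a\n24,b\n33,c\n34,d\n91-0495,e\n22,f\n33,g\n92-6676,h\n23,i\n32,j\n"


def first_column(fileobj):
    """Return the first field of every non-empty CSV row."""
    return [row[0] for row in csv.reader(fileobj) if row]


def group_numeric_runs(values):
    """Collect consecutive all-digit strings into sublists, dropping the separators."""
    return [list(g) for is_num, g in groupby(values, key=str.isdigit) if is_num]


t = first_column(io.StringIO(sample))
groups = group_numeric_runs(t)
print(groups)
```

With a real file you would pass open('yourfile.csv', newline='') instead of the StringIO object.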

Or zip longest with two list comprehensions:
>>> from itertools import zip_longest
>>> l=[i for i,v in enumerate(t) if not v.isdigit()]
>>> [t[x+1:y] for x,y in zip_longest(l,l[1:])]
[['24', '33', '34'], ['22', '33'], ['23', '32']]

Related

Writing list and list of list inside the same file in Python

I have a file say, file1.txt which looks something like below.
27,28,29,30,1,0.67
31,32,33,34,1,0.84
35,36,37,38,1,0.45
39,40,41,42,1,0.82
43,44,45,46,1,0.92
43,44,45,46,1,0.92
51,52,53,54,2,0.28
55,56,57,58,2,0.77
59,60,61,62,2,0.39
63,64,65,66,2,0.41
75,76,77,78,3,0.51
90,91,92,93,3,0.97
Where the last column is the fitness and the 2nd last column is the class.
Now I read this file like :
rule_file_name = 'file1.txt'
rule_fp = open(rule_file_name)
list1 = []
for line in rule_fp.readlines():
    list1.append(line.replace("\n","").split(","))
Then a default dictionary was created to ensure the rows are separated according to the classes.
from collections import defaultdict
classes = defaultdict(list)
for _list in list1:
    classes[_list[-2]].append(_list)
Then they are paired up within each class using the below logic.
import random
from operator import itemgetter
random.seed(1)
for key, _list in classes.items():
    # sort each class by fitness (last column), highest first
    _list = sorted(_list, key=itemgetter(-1), reverse=True)
    length = len(_list)
    middle_index = length // 2
    first_half = _list[:middle_index]
    second_half = _list[middle_index:]
    result = list(zip(first_half, second_half))
Later using the 2 rows of the pair, a 3rd row is being created using the below logic:
ans=[[random.choice(choices) for choices in zip(*item)] for item in result]
So if there were initially 12 rows in the file1, that will now form 6 pairs and hence 6 new rows will be created. I simply want to append those newly created rows to the file1 using below logic:
list1.append(ans)
print(ans)
with open("output.txt", 'w') as out:
    new_rules = [list(map(str, i)) for i in list1]
    for item in new_rules:
        out.write("{}\n".format(",".join(item)))
        #out.write("{}\n".format(item))
But now my output.txt looks like:
27,28,29,30,1,0.67
31,32,33,34,1,0.84
35,36,37,38,1,0.45
39,40,41,42,1,0.82
43,44,45,46,1,0.92
43,44,45,46,1,0.92
51,52,53,54,2,0.28
55,56,57,58,2,0.77
59,60,61,62,2,0.39
63,64,65,66,2,0.41
75,76,77,78,3,0.51
90,91,92,93,3,0.97
['43', '44', '41', '46', '1', '0.82'],['27', '28', '45', '46', '1', '0.92'],['35', '36', '33', '38', '1', '0.84']
['55', '60', '57', '58', '2', '0.77'],['51', '64', '53', '66', '2', '0.28']
['75', '91', '77', '93', '3', '0.51']
But my desired outcome is:
27,28,29,30,1,0.67
31,32,33,34,1,0.84
35,36,37,38,1,0.45
39,40,41,42,1,0.82
43,44,45,46,1,0.92
43,44,45,46,1,0.92
51,52,53,54,2,0.28
55,56,57,58,2,0.77
59,60,61,62,2,0.39
63,64,65,66,2,0.41
75,76,77,78,3,0.51
90,91,92,93,3,0.97
43,44,41,46,1,0.82
27,28,45,46,1,0.92
35,36,33,38,1,0.84
55,60,57,58,2,0.77
51,64,53,66,2,0.28
75,91,77,93,3,0.51
I would use numpy; it is flexible and compact.
import numpy as np
fin = 'file1.txt'
col1, col2, col3, col4, jclass, fitness = np.loadtxt(fin, unpack=True, delimiter=',')
rows = np.column_stack((col1, col2, col3, col4, jclass, fitness))
print(rows[0])
print(rows[-1])
print(fitness)
Then apply your logic to the rows array
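As for why the new rows show up as bracketed lists: list1.append(ans) adds ans as one nested element, so the str() of each inner list is what gets written. Below is a minimal sketch of one possible fix, assuming the new rows should be added individually with extend. The sample rows are a shortened copy of file1.txt, io.StringIO stands in for output.txt, and sorting by float(fitness) is my own choice rather than something from the question:

```python
import io
import random
from collections import defaultdict

random.seed(1)

# Shortened stand-in for file1.txt (two rows per class).
lines = ["27,28,29,30,1,0.67", "31,32,33,34,1,0.84",
         "51,52,53,54,2,0.28", "55,56,57,58,2,0.77"]
list1 = [line.split(",") for line in lines]

# Group rows by class (second-to-last column).
classes = defaultdict(list)
for row in list1:
    classes[row[-2]].append(row)

new_rows = []
for key, rows in classes.items():
    rows = sorted(rows, key=lambda r: float(r[-1]), reverse=True)
    middle = len(rows) // 2
    pairs = zip(rows[:middle], rows[middle:])
    # extend adds each new row on its own; append would nest the whole list
    new_rows.extend([random.choice(choices) for choices in zip(*pair)]
                    for pair in pairs)

all_rows = list1 + new_rows

out = io.StringIO()  # stands in for open("output.txt", "w")
for row in all_rows:
    out.write(",".join(row) + "\n")
print(out.getvalue())
```

Every element of all_rows is then a flat list of strings, so ",".join(row) produces plain comma-separated lines.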

Sort list of strings by position

I have a list of strings with the following pattern
my_list = ['/path/to/my/data/S1B_IW_GRDH_1SDV_20190610T030906_20190610T030931_016628_01F4BE_6B99_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190523T030954_20190523T031019_027349_0315A8_999E_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190511T030953_20190511T031018_027174_03102E_402F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190628T030956_20190628T031021_027874_032595_0B1F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190604T030955_20190604T031020_027524_031B16_BD33_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190622T030907_20190622T030932_016803_01F9F1_D6E9_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190505T030904_20190505T030929_016103_01E4AD_17B5_VV.tif']
I want to sort my list in chronological order using the time information present in each string (20190610, ...). The problem is that each file name begins with the pattern S1A or S1B, so a simple my_list.sort() does not work directly.
Looking at other posts, I have seen that the solution would be to use the key argument with some kind of pattern.
My question is, how to start the sorting at a specific position of each string in my list. In my case I want to start sorting at position 35 right after _1SDV_
I have seen some options like
from operator import itemgetter
my_list.sort(key = itemgetter(35))
or
my_list.sort(key = lambda x: x[35])
Copying @schwobaseggl's solution from the comments, the following should work.
my_list.sort(key = lambda x: x[35:])
Example:
>>> my_list = ['91', '82', '73', '64', '55', '46', '37', '28', '19']
>>> my_list.sort()
>>> my_list
['19', '28', '37', '46', '55', '64', '73', '82', '91']
>>> my_list.sort(key = lambda x: x[1:]) # sorting after first position
>>> my_list
['91', '82', '73', '64', '55', '46', '37', '28', '19']
Using regex:
import re
my_list = ['/path/to/my/data/S1B_IW_GRDH_1SDV_20190610T030906_20190610T030931_016628_01F4BE_6B99_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190523T030954_20190523T031019_027349_0315A8_999E_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190511T030953_20190511T031018_027174_03102E_402F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190628T030956_20190628T031021_027874_032595_0B1F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190604T030955_20190604T031020_027524_031B16_BD33_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190622T030907_20190622T030932_016803_01F9F1_D6E9_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190505T030904_20190505T030929_016103_01E4AD_17B5_VV.tif']
my_list.sort(key=lambda x: re.findall(r"\d{8}", x)[0])
print(my_list)
Output:
['/path/to/my/data/S1B_IW_GRDH_1SDV_20190505T030904_20190505T030929_016103_01E4AD_17B5_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190511T030953_20190511T031018_027174_03102E_402F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190523T030954_20190523T031019_027349_0315A8_999E_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190604T030955_20190604T031020_027524_031B16_BD33_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190610T030906_20190610T030931_016628_01F4BE_6B99_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190622T030907_20190622T030932_016803_01F9F1_D6E9_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190628T030956_20190628T031021_027874_032595_0B1F_VV.tif']
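If the directory part of the path can vary in length, a fixed slice position will no longer line up. A minimal sketch of an alternative, assuming every file name keeps the underscore-separated layout shown above with the start timestamp as the fifth field; the helper name acquisition_start is made up for illustration:

```python
import os


def acquisition_start(path):
    """Return the start timestamp (fifth underscore-separated field of the file name)."""
    return os.path.basename(path).split("_")[4]


paths = [
    '/path/to/my/data/S1B_IW_GRDH_1SDV_20190610T030906_20190610T030931_016628_01F4BE_6B99_VV.tif',
    '/path/to/my/data/S1A_IW_GRDH_1SDV_20190523T030954_20190523T031019_027349_0315A8_999E_VV.tif',
    '/path/to/my/data/S1B_IW_GRDH_1SDV_20190505T030904_20190505T030929_016103_01E4AD_17B5_VV.tif',
]

# The timestamps are zero-padded, so plain string comparison is chronological.
paths.sort(key=acquisition_start)
starts = [acquisition_start(p) for p in paths]
print(starts)
```

Unlike the 8-digit regex key, this also uses the time of day, so two acquisitions on the same date still sort correctly.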

combine two lists in a list of dictionaries into one

I want to combine the list of ages of the groups which are having a repeated name...
My code:
dic1 = {'g1': ['45', '35', '56', '65'], 'g2': ['67', '76'], 'g3':['8', '96']}
dic2 = {'g1': ['akshay', 'swapnil', 'parth','juhi'], 'g2': ['megha', 'varun'], 'g3': ['gaurav', 'parth']}
for key2,name_list in dic2.items():
    for name in name_list:
        if name=='parth':
            for key1,age_list in dic1.items():
                if key1==key2:
                    print(age_list)
The output is:
['45', '35', '56', '65']
['8', '96']
I want the output as:
['45', '35', '56', '65', '8', '96']
Can someone help me with this?
There's a more Pythonic way than that: you need to chain the lists. Also, there is no need for so many loops; a one-liner should do.
dic1 = {'g1': ['45', '35', '56', '65'], 'g2': ['67', '76'], 'g3':['8', '96']}
dic2 = {'g1': ['akshay', 'swapnil', 'parth','juhi'], 'g2': ['megha', 'varun'], 'g3': ['gaurav', 'parth']}
import itertools
result = list(itertools.chain.from_iterable(dic1[k] for k,v in dic2.items() if 'parth' in v))
>>> result
['45', '35', '56', '65', '8', '96']
A variant without itertools would be:
result = [x for k,v in dic2.items() if 'parth' in v for x in dic1[k]]
With a dict of sets instead of a dict of lists:
dic2 = {'g1': {'akshay', 'swapnil', 'parth','juhi'}, 'g2': {'megha', 'varun'}, 'g3': {'gaurav', 'parth'}}
Together these turn your O(N**3) algorithm into an O(N) one, because an in lookup is O(N) in a list but O(1) on average in a set.
If you have a missing key, just replace dic1[k] by dic1.get(k,[]) or even dic1.get(k) or [].
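Putting the set lookup and the missing-key handling together, a minimal sketch could look like this. The function name ages_for and the extra group 'g4' are made up for illustration:

```python
def ages_for(name, groups, ages):
    """Collect the ages of every group that contains name, skipping groups without ages."""
    return [age
            for key, members in groups.items()
            if name in members              # O(1) on average when members is a set
            for age in ages.get(key, [])]   # a missing key contributes nothing


dic1 = {'g1': ['45', '35', '56', '65'], 'g2': ['67', '76'], 'g3': ['8', '96']}
dic2 = {'g1': {'akshay', 'swapnil', 'parth', 'juhi'},
        'g2': {'megha', 'varun'},
        'g3': {'gaurav', 'parth'},
        'g4': {'parth'}}  # hypothetical group with no entry in dic1

result = ages_for('parth', dic2, dic1)
print(result)
```

The result follows the insertion order of dic2, which dictionaries preserve in Python 3.7 and later.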
You could either use itertools as mentioned in other answers, or just simplify your own code.
There is no need for a three-level nested for loop. Because dictionary keys are unique,
you can eliminate the innermost for loop like so:
output_list = []
for key, name_list in dic2.items():
    if "parth" in name_list:
        output_list += dic1[key]
print(output_list)
Whenever a matching group is found, its age list is added to output_list with a simple +=.
Though the above code is easier to understand, I recommend using itertools.

Python Lists Unsolved [duplicate]

This question already has answers here:
Add SUM of values of two LISTS into new LIST
(22 answers)
Closed 6 years ago.
I'm pretty new to Python, although I have learned most of the basics. I need to read from a csv file (which works so far) and append the data from it into lists (which also works). The part I am unsure about is taking two of these lists, adding them together, then dividing by 120 and multiplying by 100.
For example, the first score in list1 is 55 and in list2 it is 51. I want to combine these into a new list so that the first value is 106, and then divide and multiply each value; there are 7 numbers in each list.
import csv
list1 = []
list2 = []
with open("scores.csv") as f:
    reader = csv.reader(f)
    for row in reader:
        list1.append(row[1])
        list2.append(row[2])
print(list1)
print(list2)
OUTPUT
['55', '25', '40', '21', '52', '42', '19']
['51', '36', '50', '39', '53', '33', '40']
EXPECTED OUTPUT (WANTED OUTPUT)
['106', '61', '90', '60', '105', '75', '59']
Each value then needs to be divided by 120 and multiplied by 100.
Check out zip.
for a, b in zip(list1, list2):
    pass  # .... do stuff
So for your case, maybe:
output = [((int(a)+int(b))/120)*100 for a, b in zip(list1, list2)]
Make a new list that takes your desired calculations into account.
>>> list1 = ['55', '25', '40', '21', '52', '42', '19']
>>> list2 = ['51', '36', '50', '39', '53', '33', '40']
>>> result = [(int(x)+int(y))/1.2 for x,y in zip(list1, list2)]
>>> result
[88.33333333333334, 50.833333333333336, 75.0, 50.0, 87.5, 62.5, 49.16666666666667]
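To tie this back to the csv reading step, here is a minimal end-to-end sketch. It assumes the scores sit in columns 1 and 2 (as in the question's row[1] and row[2]) with a name in column 0; the names and the two sample rows are made up, and io.StringIO stands in for scores.csv:

```python
import csv
import io

# Hypothetical stand-in for scores.csv: name, first score, second score.
sample = "alice,55,51\nbob,25,36\n"

totals = []
percentages = []
for row in csv.reader(io.StringIO(sample)):
    total = int(row[1]) + int(row[2])   # csv fields are strings, so convert first
    totals.append(total)
    percentages.append(round(total / 120 * 100, 1))

print(totals)       # [106, 61]
print(percentages)  # [88.3, 50.8]
```

Converting to int while reading avoids carrying strings around and having to convert them again inside every later calculation.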

Python: Maximum of lists of 2 or more elements in a tuple using key

Let's say I have a tuple of lists like:
g = (['20', '10'], ['10', '74'])
I want the max of two based on the first value in each list like
max(g, key = ???the g[0] of each list.something that im clueless what to provide)
And answer is ['20', '10']
Is that possible? What should the key be here?
According to above answer.
Another eg:
g = (['42', '50'], ['30', '4'])
ans: max(g, key=??) = ['42', '50']
PS: By max I mean numerical maximum.
Just pass in a callable that gets the first element of each item. Using operator.itemgetter() is easiest:
from operator import itemgetter
max(g, key=itemgetter(0))
but if you have to compare integer values instead of lexicographically ordered strings, a lambda might be better:
max(g, key=lambda k: int(k[0]))
Which one you need depends on what you expect the maximum to be for strings containing digits of differing length. Is '4' smaller or larger than '30'?
Demo:
>>> g = (['42', '50'], ['30', '4'])
>>> from operator import itemgetter
>>> max(g, key=itemgetter(0))
['42', '50']
>>> g = (['20', '10'], ['10', '74'])
>>> max(g, key=itemgetter(0))
['20', '10']
or showing the difference between itemgetter() and a lambda with int():
>>> max((['30', '10'], ['4', '10']), key=lambda k: int(k[0]))
['30', '10']
>>> max((['30', '10'], ['4', '10']), key=itemgetter(0))
['4', '10']
You can use lambda to specify which item should be used for comparison:
>>> g = (['20', '10'], ['10', '74'])
>>> max(g, key = lambda x:int(x[0])) #use int() for conversion
['20', '10']
>>> g = (['42', '50'], ['30', '4'])
>>> max(g, key = lambda x:int(x[0]))
['42', '50']
You can also use operator.itemgetter, but in this case it will not work as intended, because the items are strings and would be compared lexicographically.
If by 'max' you mean lexicographic max:
>>> max(['0','5','10','100'])
'5'
>>> min(['0','5','10','100'])
'0'
Then you can just use max with no key function at all:
>>> max((['20', '10'], ['10', '74']))
['20', '10']
>>> max((['42', '50'], ['30', '4']))
['42', '50']
If you mean numerical max, use a lambda:
>>> max((['0','10'],['5','100'],['100','1000']))
['5', '100']
>>> max((['0','10'],['5','100'],['100','1000']),key=lambda l:int(l[0]))
['100', '1000']
If you store numbers as numbers:
g = ([20, 10], [10, 74])
max(g)
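A minimal sketch of that idea, converting the strings once up front so that max needs no key at all. The variable names are made up for illustration:

```python
g = (['20', '10'], ['10', '74'])

# Convert every string to int once; lists of ints then compare numerically,
# first element first, with later elements only used to break ties.
as_numbers = tuple([int(s) for s in pair] for pair in g)
best = max(as_numbers)
print(best)  # [20, 10]

# The difference only shows up when the digit counts differ:
h = (['30', '10'], ['4', '10'])
lexicographic = max(h, key=lambda pair: pair[0])   # compares '30' with '4' as text
numeric = max(h, key=lambda pair: int(pair[0]))    # compares 30 with 4 as numbers
print(lexicographic, numeric)
```

Note that max without a key compares the whole list, so the second element decides between two lists with the same first value.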
