Sort list of strings by position - python

I have a list of strings with the following pattern:
my_list = ['/path/to/my/data/S1B_IW_GRDH_1SDV_20190610T030906_20190610T030931_016628_01F4BE_6B99_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190523T030954_20190523T031019_027349_0315A8_999E_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190511T030953_20190511T031018_027174_03102E_402F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190628T030956_20190628T031021_027874_032595_0B1F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190604T030955_20190604T031020_027524_031B16_BD33_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190622T030907_20190622T030932_016803_01F9F1_D6E9_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190505T030904_20190505T030929_016103_01E4AD_17B5_VV.tif']
I want to sort my list in chronological order using the time information present in each string (20190610, ...). The problem is that each file name begins with the pattern S1A or S1B, so a simple my_list.sort() does not work directly.
Looking at other posts, I have seen that the solution would be to use the key argument with some kind of pattern.
My question is: how do I start the sorting at a specific position in each string of my list? In my case, I want to start sorting at position 35, right after _1SDV_.
I have seen some options like the following, but each of them compares only the single character at that position:
from operator import itemgetter
my_list.sort(key=itemgetter(35))
or
my_list.sort(key=lambda x: x[35])

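Copying @schwobaseggl's solution from the comments, the following should work. The key is a slice, so it compares the whole rest of the string rather than a single character. Indices are 0-based, so in these paths the date starts at index 34 (position 35 when counting from 1):
my_list.sort(key=lambda x: x[34:])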
Example:
>>> my_list = ['91', '82', '73', '64', '55', '46', '37', '28', '19']
>>> my_list.sort()
>>> my_list
['19', '28', '37', '46', '55', '64', '73', '82', '91']
>>> my_list.sort(key=lambda x: x[1:])  # sort on everything after the first character
>>> my_list
['91', '82', '73', '64', '55', '46', '37', '28', '19']
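A fixed slice index only works while every path has exactly the same directory prefix. A minimal sketch that avoids counting characters, assuming every file name follows the underscore-separated pattern shown in the question: take the file name with os.path.basename and use its fifth field (index 4), which is the start timestamp. The extra directories below are hypothetical, chosen only to show prefixes of different lengths.

```python
import os

# Hypothetical paths: the directory prefixes differ in length on purpose,
# which would break a fixed slice such as x[34:]
my_list = [
    '/path/to/my/data/S1B_IW_GRDH_1SDV_20190610T030906_20190610T030931_016628_01F4BE_6B99_VV.tif',
    '/another/much/longer/folder/S1A_IW_GRDH_1SDV_20190523T030954_20190523T031019_027349_0315A8_999E_VV.tif',
    'S1B_IW_GRDH_1SDV_20190505T030904_20190505T030929_016103_01E4AD_17B5_VV.tif',
]

def start_stamp(path):
    # "S1B_IW_GRDH_1SDV_20190610T030906_..._VV.tif" split on "_" gives
    # ["S1B", "IW", "GRDH", "1SDV", "20190610T030906", ...]; index 4 is the start stamp
    return os.path.basename(path).split("_")[4]

my_list.sort(key=start_stamp)
print([start_stamp(p) for p in my_list])
# ['20190505T030904', '20190523T030954', '20190610T030906']
```

Because the stamps are fixed-width YYYYMMDDTHHMMSS strings, comparing them as text gives chronological order.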

Using regex:
import re  # the standard-library module is enough; the third-party regex package also works
my_list = ['/path/to/my/data/S1B_IW_GRDH_1SDV_20190610T030906_20190610T030931_016628_01F4BE_6B99_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190523T030954_20190523T031019_027349_0315A8_999E_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190511T030953_20190511T031018_027174_03102E_402F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190628T030956_20190628T031021_027874_032595_0B1F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190604T030955_20190604T031020_027524_031B16_BD33_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190622T030907_20190622T030932_016803_01F9F1_D6E9_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190505T030904_20190505T030929_016103_01E4AD_17B5_VV.tif']
my_list.sort(key=lambda x: re.findall(r"\d{8}", x)[0])
print(my_list)
Output:
['/path/to/my/data/S1B_IW_GRDH_1SDV_20190505T030904_20190505T030929_016103_01E4AD_17B5_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190511T030953_20190511T031018_027174_03102E_402F_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190523T030954_20190523T031019_027349_0315A8_999E_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190604T030955_20190604T031020_027524_031B16_BD33_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190610T030906_20190610T030931_016628_01F4BE_6B99_VV.tif',
'/path/to/my/data/S1B_IW_GRDH_1SDV_20190622T030907_20190622T030932_016803_01F9F1_D6E9_VV.tif',
'/path/to/my/data/S1A_IW_GRDH_1SDV_20190628T030956_20190628T031021_027874_032595_0B1F_VV.tif']
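The 8-digit key above compares dates only, so two scenes from the same day keep their original relative order. A sketch that sorts on the full start time instead, assuming the first YYYYMMDDTHHMMSS stamp in each path is the acquisition start (true for the names in the question), and parsing it with datetime.strptime:

```python
import re
from datetime import datetime

my_list = [
    '/path/to/my/data/S1B_IW_GRDH_1SDV_20190610T030906_20190610T030931_016628_01F4BE_6B99_VV.tif',
    '/path/to/my/data/S1A_IW_GRDH_1SDV_20190523T030954_20190523T031019_027349_0315A8_999E_VV.tif',
    '/path/to/my/data/S1B_IW_GRDH_1SDV_20190505T030904_20190505T030929_016103_01E4AD_17B5_VV.tif',
]

# First "YYYYMMDDTHHMMSS" stamp in the path; in these names that is the start time
STAMP = re.compile(r"\d{8}T\d{6}")

def start_time(path):
    match = STAMP.search(path)
    if match is None:
        # fail loudly instead of sorting a malformed name somewhere arbitrary
        raise ValueError(f"no timestamp found in {path!r}")
    return datetime.strptime(match.group(), "%Y%m%dT%H%M%S")

my_list.sort(key=start_time)
for path in my_list:
    print(start_time(path))
```

Returning datetime objects rather than strings also makes the key reusable for other work, such as computing the gap between acquisitions.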

Related

Get all the CSV strings of a list as single elements in a new list with a list comprehension?

I have a list as follows:
listt = ['34','56,67','45,56,67','45']
I would like to get a list of single values.
This is my code:
new_list = []
for element in listt:
    if ',' in element:
        subl = element.split(',')
        new_list = new_list + subl
    else:
        new_list.append(element)
Result:
['34', '56', '67', '45', '56', '67', '45']
Is there a way to do this with a list comprehension (i.e. a one-liner)?
It looks like too much code for such a tiny thing.
Thanks.
spam = ['34','56,67','45,56,67','45']
eggs = [num for item in spam for num in item.split(',')]
print(eggs)
Output:
['34', '56', '67', '45', '56', '67', '45']
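The for clauses of a nested comprehension run left to right, exactly like nested loops, so the one-liner above is shorthand for the loop below. Note that '34'.split(',') already returns ['34'], which is why the `if ',' in element` special case from the question is not needed:

```python
spam = ['34', '56,67', '45,56,67', '45']

eggs = []
for item in spam:                # first "for" clause of the comprehension
    for num in item.split(','):  # second "for" clause
        eggs.append(num)         # the expression at the front of the comprehension

print(eggs)  # ['34', '56', '67', '45', '56', '67', '45']
```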
listt = ['34','56,67','45,56,67','45']
print(','.join(listt).split(','))
Prints:
['34', '56', '67', '45', '56', '67', '45']

How to read 1st column from csv and separate into multidimensional array

I am trying to separate a column that I read from a .csv file into a multidimensional array. So, if the first column is read into a single array and looks like this:
t = ['90-0066', '24', '33', '34', '91-0495', '22', '33', '92-6676', '23', '32']
How do I write Python code so that, for every value like '90-0066', the numbers that follow are put into an array until the next value containing a '-'? So I would like the array to look like:
t = [['24', '33', '34'], ['22', '33'], ['23', '32']]
Thanks!
You can use itertools.groupby in a list comprehension:
from itertools import groupby
t = [list(g) for k, g in groupby(t, key=str.isdigit) if k]
t becomes:
[['24', '33', '34'], ['22', '33'], ['23', '32']]
If the numbers may be floating-point values, you can use a regex instead:
import re
t = [list(g) for k, g in groupby(t, key=lambda s: bool(re.match(r'\d+(?:\.\d+)?$', s)) if k]
Or use zip_longest with two list comprehensions:
>>> from itertools import zip_longest
>>> l=[i for i,v in enumerate(t) if not v.isdigit()]
>>> [t[x+1:y] for x,y in zip_longest(l,l[1:])]
[['24', '33', '34'], ['22', '33'], ['23', '32']]
>>>
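For readers less familiar with groupby, the same grouping can be written as a plain loop. This sketch assumes the list starts with an ID such as '90-0066' (otherwise groups[-1] would fail on the first number); unlike the groupby version, an ID followed directly by another ID produces an empty group.

```python
t = ['90-0066', '24', '33', '34', '91-0495', '22', '33', '92-6676', '23', '32']

groups = []
for value in t:
    if value.isdigit():
        # a plain number goes into the group opened by the latest ID
        groups[-1].append(value)
    else:
        # an ID such as '90-0066' opens a new, empty group
        groups.append([])

print(groups)  # [['24', '33', '34'], ['22', '33'], ['23', '32']]
```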

Python Lists Unsolved [duplicate]

This question already has answers here:
Add SUM of values of two LISTS into new LIST
Closed 6 years ago.
I'm pretty new to Python, although I have learned most of the basics. I need to read from a CSV file (which works so far) and then append the data from it into lists (which is also working). The part I am unsure about is combining two of these lists, then dividing by 120 and multiplying by 100.
For example, the first score in list1 is 55 and in list2 it is 51. I want to add these together into a new list, so its first element is 106, and then divide by 120 and multiply by 100 for each element, as there are 7 different numbers in each list.
import csv

list1 = []
list2 = []
with open("scores.csv") as f:
    reader = csv.reader(f)
    for row in reader:
        list1.append(row[1])
        list2.append(row[2])
print(list1)
print(list2)
OUTPUT
['55', '25', '40', '21', '52', '42', '19']
['51', '36', '50', '39', '53', '33', '40']
EXPECTED OUTPUT (WANTED OUTPUT)
['106', '61', '90', '60', '105', '75', '59']
Each of these then needs to be divided by 120 and multiplied by 100.
Check out zip.
for a, b in zip(list1, list2):
    # .... do stuff
So for you, maybe:
output = [((int(a)+int(b))/120)*100 for a, b in zip(list1, list2)]
Make a new list that takes your desired calculations into account.
>>> list1 = ['55', '25', '40', '21', '52', '42', '19']
>>> list2 = ['51', '36', '50', '39', '53', '33', '40']
>>> result = [(int(x)+int(y))/1.2 for x,y in zip(list1, list2)]
>>> result
[88.33333333333334, 50.833333333333336, 75.0, 50.0, 87.5, 62.5, 49.16666666666667]
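If the long float tails are unwanted, a small sketch that keeps the question's "divide by 120, multiply by 100" steps explicit and rounds each result to two decimal places. The int() calls matter because csv.reader yields strings, and adding two strings concatenates them instead of summing:

```python
list1 = ['55', '25', '40', '21', '52', '42', '19']
list2 = ['51', '36', '50', '39', '53', '33', '40']

# int() is needed because csv.reader yields strings: '55' + '51' would be '5551'
percentages = [round((int(a) + int(b)) / 120 * 100, 2) for a, b in zip(list1, list2)]
print(percentages)  # [88.33, 50.83, 75.0, 50.0, 87.5, 62.5, 49.17]
```

round() only changes the stored value; if you just want tidy output, formatting at print time (for example f"{p:.2f}") keeps full precision for later calculations.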

Split a list of strings by comma

I want to convert
['60,78', '70,77', '80,74', '90,75', '100,74', '110,75']
into
['60', '78', '70', '77'.. etc]
I thought I could use
for word in lines:
    word = word.split(",")
    newlist.append(word)
return newlist
but this produces this instead:
[['60', '78'], ['70', '77'], ['80', '74'], ['90', '75'], ['100', '74'], ['110', '75']]
Can anyone please offer a solution?
You need to use list.extend instead of list.append.
newlist = []
for word in lines:
    word = word.split(",")
    newlist.extend(word)  # <----
return newlist
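The difference between the two methods is easy to see on a single split: append adds its argument as one new element (here, a whole list), while extend adds each item of its argument individually.

```python
parts = '60,78'.split(',')   # str.split returns a list: ['60', '78']

appended = []
appended.append(parts)       # append adds the list itself as ONE element
extended = []
extended.extend(parts)       # extend adds each element of the list

print(appended)  # [['60', '78']]
print(extended)  # ['60', '78']
```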
Or, using list comprehension:
>>> lst = ['60,78', '70,77', '80,74', '90,75', '100,74', '110,75']
>>> [x for xs in lst for x in xs.split(',')]
['60', '78', '70', '77', '80', '74', '90', '75', '100', '74', '110', '75']
str.split actually returns a list.
Return a list of the words in the string, using sep as the delimiter string.
Since you are appending the returned list to newlist, you are getting a list of lists. Instead, use the list.extend method, like this:
for word in lines:
    newlist.extend(word.split(","))
But you can simply use a nested list comprehension, like this:
>>> data = ['60,78', '70,77', '80,74', '90,75', '100,74', '110,75']
>>> [item for items in data for item in items.split(",")]
['60', '78', '70', '77', '80', '74', '90', '75', '100', '74', '110', '75']
Using itertools.chain:
from itertools import chain
print(list(chain.from_iterable(ele.split(",") for ele in l)))
['60', '78', '70', '77', '80', '74', '90', '75', '100', '74', '110', '75']
The more items you have to flatten, the more chain's small efficiency advantage shows:
In [1]: l= ["1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20" for _ in range(100000)]
In [2]: from itertools import chain
In [3]: l= ["1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30" for _ in range(10000)]
In [4]: timeit (list(chain.from_iterable(ele.split(",") for ele in l)))
100 loops, best of 3: 17.7 ms per loop
In [5]: timeit [item for items in l for item in items.split(",")]
10 loops, best of 3: 20.9 ms per loop
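The timings above use IPython's %timeit magic. To repeat the comparison in a plain script, a sketch with the standard timeit module follows; it rebuilds the same shape of data (10000 strings of 30 numbers) and first checks that both approaches produce identical lists. The absolute numbers will depend on your machine and Python version, so no particular result is claimed here.

```python
import timeit
from itertools import chain

# Same shape of data as the IPython session: 10000 strings of 30 comma-separated numbers
row = ",".join(str(i) for i in range(1, 31))
l = [row for _ in range(10000)]

def with_chain():
    return list(chain.from_iterable(ele.split(",") for ele in l))

def with_comprehension():
    return [item for items in l for item in items.split(",")]

# Both approaches must produce the same flat list
assert with_chain() == with_comprehension()

for fn in (with_chain, with_comprehension):
    # best of 3 repeats of 10 calls each, reported per call
    best = min(timeit.repeat(fn, number=10, repeat=3))
    print(f"{fn.__name__}: {best / 10 * 1000:.1f} ms per call")
```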
I think this was the easiest way (thanks to a friend who helped with this):
lst = ['60,78', '70,77', '80,74', '90,75', '100,74', '110,75']  # avoid shadowing the built-in list
for word in lst:
    chapter, number = word.split(',')  # unpack the two comma-separated values
    print(chapter, number)

Random data generator matching a regex in Python

In Python, I am looking for code which I can use to create random data matching any regex. For example, if the regex is
\d{1,100}
I want to have a list of random numbers with a random length between 1 and 100 (equally distributed)
There are some 'regex inverters' available (see here) which compute ALL possible matches; that is not what I want, and it is extremely impractical. The example above has more than 10^100 possible matches, which can never be stored in a list. I just need a function that returns one match at random.
Maybe there is a package already available which can be used to accomplish this? I need a function that creates a matching string for ANY regex, not just the given one, since I may have 100 different regexes. I cannot code a generator for each of them myself; I want the function to extract the pattern and return a matching string.
If the expressions you match do not have any "advanced" features, like look-ahead or look-behind, then you can parse them yourself and build a proper generator.
Treat each part of the regex as a function returning something (e.g., between 1 and 100 digits) and glue them together at the top:
import random
from string import digits, ascii_uppercase, ascii_letters

def joiner(*items):
    # actually should return a lambda like the other functions
    return ''.join(item() for item in items)

def roll(item, n1, n2=None):
    # repeat item() a random number of times between n1 and n2 (inclusive)
    n2 = n2 or n1
    return lambda: ''.join(item() for _ in range(random.randint(n1, n2)))

def rand(collection):
    # pick one random element of collection
    return lambda: random.choice(collection)

# this is a generator for /\d{1,10}:[A-Z]{5}/
print(joiner(roll(rand(digits), 1, 10),
             rand(':'),
             roll(rand(ascii_uppercase), 5)))

# [A-C]{2}\d{2,20}#\w{10,1000}  (\w approximated by ASCII letters only)
print(joiner(roll(rand('ABC'), 2),
             roll(rand(digits), 2, 20),
             rand('#'),
             roll(rand(ascii_letters), 10, 1000)))
Parsing the regex itself would be a separate question, so this solution is not universal, but it may be sufficient.
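Whatever generator you end up with, it is cheap to check it against the real regex engine. A minimal sketch, repeating the helpers from this answer (written for Python 3) so it runs on its own, and testing many samples with re.fullmatch, which requires the whole string to match rather than just a prefix:

```python
import random
import re
from string import ascii_uppercase, digits

def joiner(*items):
    return ''.join(item() for item in items)

def roll(item, n1, n2=None):
    n2 = n2 or n1
    return lambda: ''.join(item() for _ in range(random.randint(n1, n2)))

def rand(collection):
    return lambda: random.choice(collection)

pattern = re.compile(r'\d{1,10}:[A-Z]{5}')

random.seed(0)  # fixed seed only to make the check repeatable
for _ in range(1000):
    sample = joiner(roll(rand(digits), 1, 10), rand(':'), roll(rand(ascii_uppercase), 5))
    # fullmatch fails if the generator produced extra or missing characters
    assert pattern.fullmatch(sample), sample

print("1000 samples matched", pattern.pattern)
```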
Two Python libraries can do this: sre-yield and Hypothesis.
sre-yield
sre-yield will generate all values matching a given regular expression. It uses SRE, Python's default regular expression engine.
For example,
import sre_yield
list(sre_yield.AllStrings('[a-z]oo$'))
['aoo', 'boo', 'coo', 'doo', 'eoo', 'foo', 'goo', 'hoo', 'ioo', 'joo', 'koo', 'loo', 'moo', 'noo', 'ooo', 'poo', 'qoo', 'roo', 'soo', 'too', 'uoo', 'voo', 'woo', 'xoo', 'yoo', 'zoo']
For decimal numbers,
list(sre_yield.AllStrings(r'\d{1,2}'))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99']
Hypothesis
The unit test library Hypothesis will generate random matching examples. It is also built using SRE.
from hypothesis import strategies as st
g = st.from_regex(r'^[A-Z][a-z]{4}$')
g.example()
with output such as:
'Gssov', 'Lmsud', 'Ixnoy'
For decimal numbers,
d = st.from_regex(r'^[0-9]{1,2}$')
will output one- or two-digit decimal numbers such as 65, 7, and 67, although not evenly distributed. Using \d yielded unprintable strings, probably because in Python 3 \d matches any Unicode decimal digit, not only 0-9.
Note: use begin and end anchors (or pass fullmatch=True to from_regex in recent Hypothesis versions) to prevent extraneous characters.
From this answer
You could try calling this Perl module from Python:
https://metacpan.org/module/String::Random
