I want to get the borders of runs of equal values in a list using Python.
For example, I have this list:
a = [1,1,1,1,4,4,4,6,6,6,6,6,1,1,1]
I want code that returns the data borders, for example:
a = [1,1,1,1,4,4,4,6,6,6,6,6,1,1,1]
     ^       ^     ^         ^
b = get_border_index(a)
print(b)
Output:
[0,4,7,12]
How can I implement a get_border_index(lst: list) -> list function?
The scalable answer, which also works for very long lists or arrays, is to use np.diff. For large inputs you should avoid a Python-level for loop, because vectorized NumPy operations are much faster.
import numpy as np
a = [1,1,1,1,4,4,4,6,6,6,6,6,1,1,1]
a = np.array(a)
# d[i] = a[i+1] - a[i], which is nonzero wherever there is a step
d = np.diff(a)
# boolean array marking where the steps are
is_step = d != 0
# indices of the steps; np.where returns a tuple with one array per dimension
ics = np.where(is_step)
# take the first dimension and shift by one, since you want
# the index of the element to the right of each step
ics_shift = ics[0] + 1
# index 0 always starts the first run, so prepend it to get the full list
ics_list = [0] + ics_shift.tolist()
print(ics_list)
# [0, 4, 7, 12]
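If NumPy is not available, the same step-detection idea can be sketched in pure Python by pairing each element with its right neighbour via zip, which plays the role of np.diff. This is an illustrative sketch, not part of the answer above; for very large arrays the vectorized NumPy version is likely faster.

```python
def get_border_index(lst):
    """Return the start index of each run of equal values (pure-Python sketch)."""
    if not lst:
        return []
    # zip(lst, lst[1:]) pairs each element with its right neighbour, like np.diff;
    # a step at pair i means a new run starts at index i + 1
    steps = [i + 1 for i, (prev, cur) in enumerate(zip(lst, lst[1:])) if cur != prev]
    # Index 0 always starts the first run
    return [0] + steps

print(get_border_index([1, 1, 1, 1, 4, 4, 4, 6, 6, 6, 6, 6, 1, 1, 1]))
# [0, 4, 7, 12]
```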
You can use a for loop with enumerate:
def get_border_index(a):
    last_value = None
    result = []
    for i, v in enumerate(a):
        if v != last_value:
            last_value = v
            result.append(i)
    return result
a = [1,1,1,1,4,4,4,6,6,6,6,6,1,1,1]
b = get_border_index(a)
print(b)
Output:
[0, 4, 7, 12]
This code checks whether each element of a differs from the element before it and, if so, appends that element's index to result. Because last_value starts as None, index 0 is always included (unless the list itself starts with None).
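Another option, sketched here as an alternative rather than a replacement, is itertools.groupby from the standard library, which yields one group per run of consecutive equal values. Unlike a None sentinel, it also handles lists whose first element is None.

```python
from itertools import groupby

def get_border_index(lst):
    """Return the start index of each run of equal values using itertools.groupby."""
    result = []
    index = 0
    # groupby yields one (value, group) pair per run of consecutive equal values
    for _, group in groupby(lst):
        result.append(index)
        # Advance past this run by counting its elements
        index += sum(1 for _ in group)
    return result

print(get_border_index([1, 1, 1, 1, 4, 4, 4, 6, 6, 6, 6, 6, 1, 1, 1]))
# [0, 4, 7, 12]
```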
Today I'm requesting help with a Python script that I'm writing. I'm using the csv module to parse a large document with about 1,100 rows, and from each row the script pulls a Case_ID, a unique number that no other row has. For example:
['10215', '10216', '10277', '10278', '10279', '10280', '10281', '10282', '10292', '10293',
'10295', '10296', '10297', '10298', '10299', '10300', '10301', '10302', '10303', '10304',
'10305', '10306', '10307', '10308', '10309', '10310', '10311', '10312', '10313', '10314',
'10315', '10316', '10317', '10318', '10319', '10320', '10321', '10322', '10323', '10324',
'10325', '10326', '10344', '10399', '10400', '10401', '10402', '10403', '10404', '10405',
'10406', '10415', '10416', '10417', '10418', '10430', '10448', '10492', '10493', '10494',
'10495', '10574', '10575', '10576', '10577', '10578', '10579', '10580', '10581', '10582',
'10583', '10584', '10585', '10586', '10587', '10588', '10589', '10590', '10591', '10592',
'10593', '10594', '10595', '10596', '10597', '10598', '10599', '10600', '10601', '10602',
'10603', '10604', '10605', '10606', '10607', '10608', '10609', '10610', '10611', '10612',
'10613', '10614', '10615', '10616', '10617', '10618', '10619', '10620', '10621', '10622',
'10623', '10624', '10625', '10626', '10627', '10628', '10629', '10630', '10631', '10632',
'10633', '10634', '10635', '10636', '10637', '10638', '10639', '10640', '10641', '10642',
'10643', '10644', '10645', '10646', '10647', '10648', '10649', '10650', '10651', '10652',
'10653', '10654', '10655', '10656', '10657', '10658', '10659', '10707', '10708', '10709',
'10710', '10792', '10793', '10794', '10795', '10908', '10936', '10937', '10938', '10939',
'11108', '11109', '11110', '11111', '11112', '11113', '11114', '11115', '11116', '11117',
'11118', '11119', '11120', '11121', '11122', '11123', '11124', '11125', '11126', '11127',
'11128', '11129', '11130', '11131', '11132', '11133', '11134', '11135', '11136', '11137',
'11138', '11139', '11140', '11141', '11142', '11143', '11144', '11145', '11146', '11147',
'11148', '11149', '11150', '11151', '11152', '11153', '11154', '11155', '11194', '11195',
'11196', '11197', '11198', '11199', '11200', '11201', '11202', '11203', '11204', '11205',
'11206', '11207', '11208', '11209', '11210', '11211', '11212', '11213', '11214', '11215',
'11216', '11217', '11218', '11219', '11220', '11221', '11222', '11223', '11224', '11225',
'11226', '11227', '11228', '11229', '11230', '11231', '11232', '11233', '11234', '11235',
'10101', '10102', '10800', '11236']
As you can see, this list is quite an eyeful, so I'd like to add a small function to my script that condenses each run of sequential IDs into hyphenated bookends, for example 10,277 - 10,282.
Thanks to all for any help included! Have a great day.
Doable. Let's see if this can be done with pandas.
import pandas as pd
data = ['10215', '10216', '10277', ...]
# Load data as series.
s = pd.Series(data)
# Find all consecutive rows with a difference of one
# and bin them into groups using `cumsum`.
v = s.astype(int).diff().bfill().ne(1).cumsum()
# Use `groupby` and `apply` to condense the consecutive numbers into ranges.
# This is only done if the group size is >1.
ranges = (
    s.groupby(v).apply(
        lambda x: '-'.join(x.values[[0, -1]]) if len(x) > 1 else x.item()).tolist())
print(ranges)
['10215-10216',
'10277-10282',
'10292-10293',
'10295-10326',
'10344',
'10399-10406',
'10415-10418',
'10430',
'10448',
'10492-10495',
'10574-10659',
'10707-10710',
'10792-10795',
'10908',
'10936-10939',
'11108-11155',
'11194-11235',
'10101-10102',
'10800',
'11236']
For this to work, consecutive IDs must appear next to each other in ascending order; sorting the data first guarantees that.
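If the IDs are not already in order, one way to sort them first is sorted() with key=int, assuming every ID is a numeric string as in the question. A plain string sort compares character by character, so it would put '10000' before '9999'.

```python
ids = ['10800', '10216', '10101', '10215', '10102']
# Sort numerically rather than lexicographically
ids_sorted = sorted(ids, key=int)
print(ids_sorted)
# ['10101', '10102', '10215', '10216', '10800']
```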
You can just use a simple loop here with the following logic:
1. Create a list to store the ranges (ranges).
2. Iterate over the values in your list (l).
3. If ranges is empty, append a list with the first value in l to ranges.
4. Otherwise, if the difference between the current and previous value is 1, append the current value to the last list in ranges.
5. Otherwise, append a list with the current value to ranges.
Code:
l = ['10215', '10216', '10277', '10278', '10279', '10280', ...]
ranges = []
for x in l:
    if not ranges:
        ranges.append([x])
    elif int(x)-prev_x == 1:
        ranges[-1].append(x)
    else:
        ranges.append([x])
    prev_x = int(x)
Now you can compute the final ranges by joining the first and last element of each list in ranges with a hyphen (when the list has at least 2 elements).
final_ranges = ["-".join([r[0], r[-1]] if len(r) > 1 else r) for r in ranges]
print(final_ranges)
#['10215-10216',
# '10277-10282',
# '10292-10293',
# '10295-10326',
# '10344',
# '10399-10406',
# '10415-10418',
# '10430',
# '10448',
# '10492-10495',
# '10574-10659',
# '10707-10710',
# '10792-10795',
# '10908',
# '10936-10939',
# '11108-11155',
# '11194-11235',
# '10101-10102',
# '10800',
# '11236']
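This also assumes consecutive IDs appear next to each other in ascending order (for example, sorted data). You could simplify the code by combining items 3 and 5, since both start a new range.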
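Combining items 3 and 5 might look like the following minimal sketch. The function name group_consecutive is hypothetical; it compares each value with the last stored value instead of tracking prev_x separately.

```python
def group_consecutive(ids):
    """Group string IDs into lists of consecutive integers (hypothetical helper)."""
    ranges = []
    for x in ids:
        # Item 4: extend the current range if x follows the previous value
        if ranges and int(x) - int(ranges[-1][-1]) == 1:
            ranges[-1].append(x)
        else:
            # Items 3 and 5 merged: start a new range
            ranges.append([x])
    return ranges

ids = ['10215', '10216', '10277', '10278', '10279', '10344']
ranges = group_consecutive(ids)
print(["-".join([r[0], r[-1]] if len(r) > 1 else r) for r in ranges])
# ['10215-10216', '10277-10279', '10344']
```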
For purely educational purposes (this is much less efficient than the loop above, because each step builds new lists), here's the same thing using map and reduce:
from functools import reduce
def myreducer(ranges, x):
    if not ranges:
        return [[x]]
    elif int(x) - int(ranges[-1][-1]) == 1:
        return ranges[:-1] + [ranges[-1] + [x]]
    else:
        return ranges + [[x]]
# map is lazy in Python 3, so wrap it in list() to get a list back
final_ranges = list(map(
    lambda r: "-".join([r[0], r[-1]] if len(r) > 1 else r),
    reduce(myreducer, l, [])
))
There is also the pynumparser package:
import pynumparser
pynumparser.NumberSequence().encode([1, 2, 3, 5, 6, 7, 8, 10])
# result: '1-3,5-8,10'
pynumparser.NumberSequence().parse('1-3,5-8,10')
# result: (1, 2, 3, 5, 6, 7, 8, 10)
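None of the snippets above produce the exact style the question sketches ("10,277 - 10,282", with thousands separators and spaces around the hyphen). A minimal standard-library sketch, assuming the IDs are numeric strings and runs are ascending, uses itertools.groupby with a value-minus-position key; format_ranges is a hypothetical name.

```python
from itertools import groupby

def format_ranges(ids):
    """Condense runs of consecutive string IDs into 'first - last' strings."""
    result = []
    # Within a run of consecutive integers, value - position is constant,
    # so it works as a groupby key that changes only when a run breaks
    for _, run in groupby(enumerate(ids), key=lambda pair: int(pair[1]) - pair[0]):
        values = [int(v) for _, v in run]
        if len(values) > 1:
            # Thousands separators, as in the question's "10,277 - 10,282"
            result.append(f"{values[0]:,} - {values[-1]:,}")
        else:
            result.append(f"{values[0]:,}")
    return result

print(format_ranges(['10215', '10216', '10277', '10278', '10279',
                     '10280', '10281', '10282', '10344']))
# ['10,215 - 10,216', '10,277 - 10,282', '10,344']
```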
This is what I get:
TypeError: 'float' object is unsubscriptable
This is what I did:
import numpy as N
import itertools
# I created two lists containing large amounts of numbers, e.g. 3.465
lx = [3.625, 4.625, ...]
ly = [41.435, 42.435, ...]  # the lists are not the same size!
xy = list(itertools.product(lx, ly))  # create a nice "table" of my lists
# itertools gives me something like:
print(xy)
# [(3.625, 41.435), (3.625, 42.435), (... , ..), ... ]
print(xy[0][0])
print(xy[0][1])  # that works just fine; I can access the various values of the tuple in the list
# down here is where the error occurs
# I try to access certain points in lon/lat with values from xy, using `b` and `v` as indices in the loop; lon/lat are read earlier in the script
b = -1
v = 1
for l in xy:
    b += 1
    idx = N.where(lon==l[b][b])[0][0]
    idy = N.where(lat==l[b][v])[0][0]
lon/lat are read earlier in the script. I am working with a netCDF file, and these are the longitude/latitude values read into lon/lat.
They are arrays built with numpy.
Where is the mistake?
I tried to convert b and v with int() to integers, but that did not help.
N.where uses each value from xy to look up a specific point on a grid that I then want to work with. If you need more code or some plots, please let me know.
Your problem is that when you loop over xy, each value of l is a single element of your xy list, one of the tuples. The value of l in the first iteration of the loop is (3.625, 41.435), the second is (3.625, 42.435), and so on.
On the first iteration b is 0, so l[b] gives you 3.625. When you then do l[b][b], you try to get the first element of 3.625, but that is a float, so it cannot be indexed. That gives you the TypeError.
To put it another way, in the first iteration of the loop, l is the same as xy[0], so l[0] is the same as xy[0][0]. In the second iteration, l is the same as xy[1], so l[0] is the same as xy[1][0]. In the third iteration, l is equivalent to xy[2], and so on. So in the first iteration, l[0][0] is the same as xy[0][0][0], but there is no such thing so you get an error.
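A small self-contained reproduction may make this concrete. It uses two short sample lists based on the values shown in the question (the real lists are longer) and leaves out the lon/lat arrays, since only the indexing matters here.

```python
from itertools import product

# Short sample lists based on the values in the question
lx = [3.625, 4.625]
ly = [41.435, 42.435]
xy = list(product(lx, ly))

b = 0          # the value of b during the first loop iteration
l = xy[0]      # what the loop variable l holds during the first iteration
print(l)       # (3.625, 41.435)
print(l[b])    # 3.625, a plain float

error = None
try:
    l[b][b]    # indexing into a float
except TypeError as exc:
    error = exc
print(error)   # Python 3 reports: 'float' object is not subscriptable
```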
To get the first and second values of the tuple, using the indexing approach you could just do:
x = l[0]
y = l[1]
Or, in your case:
for l in xy:
    idx = N.where(lon==l[0])[0][0]
    idy = N.where(lat==l[1])[0][0]
However, the simplest solution would be to use what is called "tuple unpacking":
for x, y in xy:
    idx = N.where(lon==x)[0][0]
    idy = N.where(lat==y)[0][0]
This is equivalent to:
for l in xy:
    x, y = l
    idx = N.where(lon==x)[0][0]
    idy = N.where(lat==y)[0][0]
which in turn is equivalent to:
for l in xy:
    x = l[0]
    y = l[1]
    idx = N.where(lon==x)[0][0]
    idy = N.where(lat==y)[0][0]