Extract certain elements from a list - python

I have no clue about Python and started to use it on some files. I managed to find out how to do all the things that I need, except for 2 things.
1st
>>>line = ['0', '1', '2', '3', '4', '5', '6']
>>>#prints all elements of line as expected
>>>print string.join(line)
0 1 2 3 4 5 6
>>>#prints the first two elements as expected
>>>print string.join(line[0:2])
0 1
>>>#expected to print the first, second, fourth and sixth element;
>>>#Raises an exception instead
>>>print string.join(line[0:2:4:6])
SyntaxError: invalid syntax
I want this to work similar to awk '{ print $1 $2 $5 $7 }'. How can I accomplish this?
2nd
How can I delete the last character of the line? There is an extra ' that I don't need.

The join here is just a way to get a nice string to print or store as the result (with a comma as separator; in the OP's example the separator would have been the default of string.join, a single space).
line = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print ','.join (line[0:2])
A,B
print ','.join (line[i] for i in [0,1,2,4,5,6])
A,B,C,E,F,G
What you are doing in both cases is extracting a sublist from the initial list. The first one uses a slice, the second one uses a generator expression (a list comprehension without the square brackets). As others said, you could also have accessed the elements one by one; the syntaxes above are merely shorthand for:
print ','.join ([line[0], line[1]])
A,B
print ','.join ([line[0], line[1], line[2], line[4], line[5], line[6]])
A,B,C,E,F,G
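For readers on Python 3, where print is a function and string.join no longer exists, the same selections might be sketched as below. operator.itemgetter is a standard-library alternative that this answer does not mention, and the variable names are only illustrative.

```python
from operator import itemgetter

line = ['A', 'B', 'C', 'D', 'E', 'F', 'G']

# Slice: a contiguous run of elements.
first_two = ','.join(line[0:2])

# Generator expression: any indices, in any order.
picked = ','.join(line[i] for i in [0, 1, 2, 4, 5, 6])

# itemgetter with several indices returns a tuple of the selected items,
# which is close in spirit to awk '{ print $1 $2 $5 $7 }'.
awk_like = ' '.join(itemgetter(0, 1, 4, 6)(line))

print(first_two)  # A,B
print(picked)     # A,B,C,E,F,G
print(awk_like)   # A B E G
```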
I believe a short tutorial on list slices could be helpful:
l[x:y] is a 'slice' of list l. It contains all elements from position x (included) up to position y (excluded). Positions start at 0. If y is past the end of the list or missing, the slice runs to the end of the list. Negative numbers count from the end of the list. You can also use a third parameter, as in l[x:y:step], to skip items at a regular interval (only every step-th item is taken).
Some examples:
l = range(1, 100) # create a list of 99 integers from 1 to 99
l[:] # resulting slice is a copy of the list
l[0:] # another way to get a copy of the list
l[0:99] # as we know the number of items, we could also do that
l[0:0] # a new empty list (remember y is excluded)
l[0:1] # a new list that contains only the first item of the old list
l[0:2] # a new list that contains only the first two items of the old list
l[0:-1] # a new list that contains all the items of the old list, except the last
l[0:len(l)-1] # same as above but less clear
l[0:-2] # a new list that contains all the items of the old list, except the last two
l[0:len(l)-2] # same as above but less clear
l[1:-1] # a new list with first and last item of the original list removed
l[-2:] # a list that contains the last two items of the original list
l[0::2] # odd numbers
l[1::2] # even numbers
l[2::3] # multiples of 3
If the rules for picking items are more complex, you'll use a list comprehension (or generator expression) instead of a slice, but that's another subject. That's what I use in my second join example.
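The slicing rules above can be checked directly. This is a small sketch for Python 3, where range() is lazy and has to be wrapped in list() to get the same list as in the examples.

```python
# In Python 3, range() is lazy, so wrap it in list() to get a real list.
l = list(range(1, 100))  # 99 integers, from 1 to 99

assert l[:] == l and l[:] is not l   # a copy, not the same object
assert l[0:0] == []                  # the end position is excluded
assert l[0:2] == [1, 2]              # first two items
assert l[0:-1] == l[0:len(l) - 1]    # everything except the last item
assert l[-2:] == [98, 99]            # last two items
assert l[0:1000] == l                # an end past the list is clipped
assert l[0::2][:3] == [1, 3, 5]      # every second item, starting at index 0
assert l[2::3][:3] == [3, 6, 9]      # every third item, starting at index 2
```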

You don't want to use join for that. If you just want to print some bits of a list, then specify the ones you want directly:
print '%s %s %s %s' % (line[0], line[1], line[4], line[6])

Assuming that the line variable should contain a line of cells, separated by commas...
You can use map for that:
line = "0,1,2,3,4,5,6"
cells = line.split(",")
indices=[0,1,4,6]
selected_elements = map( lambda i: cells[i], indices )
print ",".join(selected_elements)
The map function calls the lambda once for each index in the list argument and collects the results. (Reorder the indices to your liking.)
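A Python 3 sketch of the same idea, assuming a line with seven comma-separated cells. In Python 3, map returns a lazy iterator rather than a list, which join accepts as is. Using cells.__getitem__ instead of the lambda is just one possible spelling.

```python
line = "0,1,2,3,4,5,6"
cells = line.split(",")
indices = [0, 1, 4, 6]

# cells.__getitem__ performs the same lookup as lambda i: cells[i].
selected = map(cells.__getitem__, indices)

# join accepts any iterable of strings, including a map object.
result = ",".join(selected)
print(result)  # 0,1,4,6
```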

You could use the following list comprehension:
indices = [0,1,4,6]
Ipadd = string.join([line[i] for i in xrange(len(line)) if i in indices])
Note: you could also use:
Ipadd = string.join([line[i] for i in indices])
but this gives the same result as the first version only if indices is sorted and contains no repetitions.
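To make the difference between the two comprehensions concrete, here is a Python 3 sketch with a deliberately unsorted index list that contains a repeat (the index values are hypothetical).

```python
line = ['0', '1', '2', '3', '4', '5', '6']
indices = [6, 0, 0, 4]  # hypothetical: unsorted, with a repeat

# Filtering positions: the result follows the order of line, each item once.
by_position = ' '.join(line[i] for i in range(len(line)) if i in indices)

# Iterating the indices: the result follows the order of indices, repeats kept.
by_index = ' '.join(line[i] for i in indices)

print(by_position)  # 0 4 6
print(by_index)     # 6 0 0 4
```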

Answer to the second question:
If your string is contained in myLine, just do:
myLine = myLine[:-1]
to remove the last character.
Or you could also use rstrip():
myLine = myLine.rstrip("'")
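The two options behave differently when the trailing character is missing or repeated. A short sketch with made-up sample strings:

```python
with_quote = "abc'"
no_quote = "abc"
many_quotes = "abc'''"

# Slicing always drops exactly one character, whatever it is.
sliced_with = with_quote[:-1]
sliced_without = no_quote[:-1]
print(sliced_with)     # abc
print(sliced_without)  # ab

# rstrip("'") drops every trailing apostrophe and nothing else.
stripped = [s.rstrip("'") for s in (with_quote, no_quote, many_quotes)]
print(stripped)        # ['abc', 'abc', 'abc']
```

So the slice is the right tool when the last character is always unwanted, and rstrip is safer when the apostrophe may or may not be there.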

>>> token = ':'
>>> s = '1:2:3:4:5:6:7:8:9:10'
>>> sp = s.split(token)
>>> token.join(filter(bool, map(lambda i: i in [0,2,4,6] and sp[i] or False, range(len(sp)))))
'1:3:5:7'

l = []
l.extend(line[0:2])
l.append(line[4]) # fifth field
l.append(line[6]) # seventh field
string.join(l)
Alternatively
"{l[0]} {l[1]} {l[4]} {l[6]}".format(l=line)
Please see PEP 3101 and stop using the % operator for string formatting.
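As a sketch of what PEP 3101 style formatting looks like on the question's list: the f-string form is an addition here and assumes Python 3.6 or later.

```python
line = ['0', '1', '2', '3', '4', '5', '6']

# str.format can index into an argument inside the replacement field.
with_format = "{l[0]} {l[1]} {l[4]} {l[6]}".format(l=line)

# f-strings (Python 3.6+) evaluate the expressions in place.
with_fstring = f"{line[0]} {line[1]} {line[4]} {line[6]}"

print(with_format)   # 0 1 4 6
print(with_fstring)  # 0 1 4 6
```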

Related

Returning max of string after comparison with other sub-strings - Python

I have a list that looks like this:
json_file_list = ['349148424_20180312071059_20190402142033.json','349148424_20180312071059_20190405142033.json','360758678_20180529121334_20190402142033.json']
and an empty list:
list2 = []
What I want to do is compare the characters up until the second underscore '_', and if they are the same I only want to append the max of the full string, to the new list. In the case above, the first 2 entries are duplicates (until second underscore) so I want to base the max off the numbers after the second underscore. So the final list2 would have only 2 entries and not 3
I tried this:
for row in json_file_list:
    if row[:24] == row[:24]:
        list2.append(max(row))
    else:
        list2.append(row)
but that is just returning:
['s', 's', 's']
Final output should be:
['349148424_20180312071059_20190405142033.json','360758678_20180529121334_20190402142033.json']
Any ideas? I also realize this code is brittle with the way I am slicing it (what happens if the string gets longer/shorter?), so I need to come up with a better way to do that. Maybe base it off the second underscore instead. The strings will always end with '.json'
I'd use a dictionary to do this:
from collections import defaultdict
d = defaultdict(list)
for x in json_file_list:
    d[tuple(x.split("_")[:2])].append(x)
new_list = [max(x) for x in d.values()]
new_list
Output:
['349148424_20180312071059_20190405142033.json',
'360758678_20180529121334_20190402142033.json']
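A variant of the same idea that keeps only the current maximum per key instead of collecting every name first. This is a sketch, assuming Python 3.7+ (so the dict preserves insertion order) and assuming the trailing timestamps always have the same width, so that string comparison matches numeric comparison. The name best is hypothetical.

```python
json_file_list = [
    '349148424_20180312071059_20190402142033.json',
    '349148424_20180312071059_20190405142033.json',
    '360758678_20180529121334_20190402142033.json',
]

best = {}
for name in json_file_list:
    # Key: the two fields before the second underscore.
    key = tuple(name.split("_", 2)[:2])
    # Names in one group share the prefix, so the comparison is decided
    # by the trailing timestamp.
    if key not in best or name > best[key]:
        best[key] = name

list2 = list(best.values())
print(list2)
```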
The if statement in this snippet:
for row in json_file_list:
    if row[:24] == row[:24]:
        list2.append(max(row))
    else:
        list2.append(row)
always resolves to True. Think about it: how could row[:24] be different from itself? Since the condition is always True, the loop always runs list2.append(max(row)), and max of a string returns its highest-sorting character, which is s in each of your strings (from .json). That's why you're getting an output of ['s', 's', 's'].
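Both points can be confirmed on one of the file names from the question:

```python
row = '349148424_20180312071059_20190402142033.json'

# Comparing a value with itself is always True.
same = row[:24] == row[:24]
print(same)          # True

# max() on a string compares characters, not file names.
biggest = max(row)
print(biggest)       # s

# The three highest-sorting characters in the string.
top_three = sorted(set(row))[-3:]
print(top_three)     # ['n', 'o', 's']
```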
Maybe I'm understanding your request incorrectly, but couldn't you just append all the elements of the row to a list and then remove duplicates?
for row in json_file_list:
    for elem in row:
        list2.append(elem)
list2 = sorted(list(set(list2)))
I suppose you can slice out the part you want to compare and use the built-in set to remove the duplicates:
set([x[:24] for x in json_file_list])
set(['360758678_20180529121334', '349148424_20180312071059'])
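It would be a simple matter of joining the remaining text later on. Note that the loop below appends the suffix of the first file to every prefix, so it rebuilds file names but does not pick the maximum within each group: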
list2=[]
for unique in set([x[:24] for x in json_file_list]):
    list2.append(unique + json_file_list[0][24:])
list2
['360758678_20180529121334_20190402142033.json',
'349148424_20180312071059_20190402142033.json']

What's the use of [-1] and [0] here?

I would like to know what the use of [-1] and [0] is here. I also tried [1] in the first split and it still works the same.
symbols = ["Wiki/ADBE.4", "Wiki/ALGN.4"]
clean_symbols = []
for symbol in symbols:
    symbol = symbol.split("Wiki/")[-1].split(".4")[0]
    print(symbol)
    clean_symbols.append(symbol)
print(clean_symbols)
Thanks!
It's just indexing in lists. Let's look at how it works:
>>> symbol = "Wiki/ADBE.4" # this happens in the for loop
>>> symbol.split("Wiki/")
['', 'ADBE.4']
We have got two items in a list, created by split. Lists are indexed from 0, so 1 is "second item" and -1 is "the last item". In this case, this is the same item, so it works for both 1 and -1. But it really works that way only because you have a list with two items:
>>> symbol.split("Wiki/")[-1]
'ADBE.4'
>>> symbol.split("Wiki/")[1]
'ADBE.4'
If you had more, it would not be the same result:
>>> x = ['first', 'second', 'third']
>>> x[-1]
'third'
>>> x[1]
'second'
And then the same thing happens for the new string we got. A list and then an index picking the first item:
>>> symbol.split("Wiki/")[-1].split(".4")
['ADBE', '']
>>> symbol.split("Wiki/")[-1].split(".4")[0]
'ADBE'
And that's all the magic.
split creates a list. The rest is just list indexing. Negative index numbers count from the end, so [-1] is the last element of the list created by the first split. The next [0] index means the first element of the list created by the second split (just like it does in almost all languages).
Since [-1] and [1] work the same way, it probably means that your list has exactly 2 elements, so its last (-1) element is the same as its second ([1]).
In the first iteration, the first split returns a list of which we want the last element, hence [-1]:
symbol.split("Wiki/") returns ['', 'ADBE.4']
symbol.split("Wiki/")[-1] returns 'ADBE.4'
The second split then returns a list of which we need the first element, hence [0]:
'ADBE.4'.split('.4') returns ['ADBE','']
'ADBE.4'.split('.4')[0] returns 'ADBE'
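The whole chain can be stepped through in a few lines. The last string is a made-up input, included only to show a case where [1] and [-1] stop being interchangeable.

```python
symbol = "Wiki/ADBE.4"

parts = symbol.split("Wiki/")
print(parts)                  # ['', 'ADBE.4']
print(parts[-1] == parts[1])  # True, because the list has exactly two items

ticker = parts[-1].split(".4")[0]
print(ticker)                 # ADBE

# Hypothetical input where the separator occurs twice: now [1] and [-1] differ.
odd = "Wiki/ADBE.4/Wiki/ALGN.4".split("Wiki/")
print(odd)                    # ['', 'ADBE.4/', 'ALGN.4']
```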

Slices index python

Why when I run
>>> lista = [1,2,3,4,5]
>>> newl = [8,10]
>>> lista[1:4] = newl
[1,8,10,5]
the replaced values are the ones at indexes 1 through 3, but when I run
>>> lista[2:2] = newl
[1,2,8,10,3,4,5]
A new index is created to save newl.
To understand slicing, you need to understand this.
Let's say
hi = "Hello"
The slice hi[1:2] contains "e". It starts at the second character and ends before the third. hi[2:2] contains nothing, because it starts at the third character and ends before the third character.
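Assigning to a slice replaces whatever that slice contains. Strings are immutable, so this only works on a mutable sequence such as a list of the characters. If you do:
letters = list(hi)
letters[1:3] = list("abcd")
then "abcd" replaces "el", and letters holds the characters of "Habcdlo". Assigning to an empty slice such as letters[2:2] replaces nothing, so the new items are simply inserted at that position.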
Slice indexes are start-inclusive and end-exclusive.
mylist[1:4] contains the elements at indexes 1, 2, and 3.
From http://docs.python.org/2/library/stdtypes.html:
The slice of s from i to j is defined as the sequence of items with index k such that i <= k < j.
So if you get mylist[2:2] you are retrieving elements for which 2 <= k < 2 (no elements).
However, the list slicing syntax is clever enough to let you assign into that space, and insert elements into that position. If you run
mylist[2:2] = [5,6,7]
then you are inserting elements into the space before index 2, which currently holds no elements.
In the first case you tell Python to replace 3 specific elements in lista with the 2 elements from newl.
In the second case you reinitialize lista, then select lista[2:2] for substitution. That is an empty list ([]), more precisely the empty slice just before the 3rd element of the list (whose index is 2), so you replace this empty slice with the two values from newl.
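Both cases from the question, written out as a runnable sketch (the names replaced and inserted are only illustrative):

```python
newl = [8, 10]

# Case 1: the slice [1:4] covers indexes 1, 2 and 3,
# so three items are replaced by two.
replaced = [1, 2, 3, 4, 5]
replaced[1:4] = newl
print(replaced)       # [1, 8, 10, 5]

# Case 2: the slice [2:2] is empty, so nothing is removed
# and the items of newl are inserted before index 2.
inserted = [1, 2, 3, 4, 5]
print(inserted[2:2])  # []
inserted[2:2] = newl
print(inserted)       # [1, 2, 8, 10, 3, 4, 5]
```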

How to iterate through a list of strings with different lengths?

Suppose I have
lists = ["ABC","AC","CCCC","BC"]
I want a new list whose items are grouped by position: for each position, take the character at that position from every string in lists (position 0 of "ABC" is "A") and join those characters into one string.
position = ["AACB","BCCC","CC","C"]
I tried:
for i in range(0,4):
    want = [stuff[i] for stuff in lists]
and I get
IndexError: string index out of range
Which makes sense because all the strings are different size. Can anyone help?
I think you might want this:
import itertools
lists = ["ABC","AC","CCCC","BC"]
position = map(''.join,itertools.izip_longest(*lists, fillvalue=''))
and you get:
['AACB', 'BCCC', 'CC', 'C']
edit: now with the new example...
You can use this list comprehension:
>>> lists = ["ABC","AC","CCCC","BC"]
>>> [''.join([s[i:i+1] for s in lists]) for i, el in enumerate(lists)]
['AACB', 'BCCC', 'CC', 'C']
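Using the slice notation prevents index errors on non-existing elements: s[i:i+1] is an empty string when i is past the end of s. Note that enumerate(lists) yields one i per string, which matches the longest string's length (4) only by coincidence here; for other inputs, iterate over range(max(len(s) for s in lists)) instead.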
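For Python 3, where izip_longest was renamed to zip_longest and map returns an iterator, both approaches might look like this. The slice version here loops up to the length of the longest string rather than the number of strings.

```python
from itertools import zip_longest  # named izip_longest in Python 2

lists = ["ABC", "AC", "CCCC", "BC"]

# zip_longest pads the shorter strings with '' so every position lines up.
position = [''.join(chars) for chars in zip_longest(*lists, fillvalue='')]
print(position)  # ['AACB', 'BCCC', 'CC', 'C']

# Slice-based version: s[i:i + 1] is '' when i is past the end of s.
longest = max(len(s) for s in lists)
by_slice = [''.join(s[i:i + 1] for s in lists) for i in range(longest)]
print(by_slice)  # ['AACB', 'BCCC', 'CC', 'C']
```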

extract from a list of lists

How can I extract elements in a list of lists and create another one in python. So, I want to get from this:
all_list = [['1 2 3 4','2 3 4 5'],['2 4 4 5', '3 4 5 5' ]]
a new list like this:
list_of_lists = [[('3','4'),('4','5')], [('4','5'),('5','5')]]
Following is what I did, and it doesn't work.
for i in xrange(len(all_lists)):
    newlist=[]
    for l in all_lists[i]:
        mylist = l.split()
        score1 = float(mylist[2])
        score2 = mylist[3]
        temp_list = (score1, score2)
        newlist.append(temp_list)
    list_of_lists.append(newlist)
Please help. Many thanks in advance.
You could use a nested list comprehension. (This assumes you want the last two "scores" out of each string):
[[tuple(l.split()[-2:]) for l in sublist] for sublist in all_list]
It could work almost as-is if you filled in the value for mylist -- right now it's undefined.
Hint: use the split function on the strings to break them up into their components, and you can fill mylist with the result.
Hint 2: Make sure that newlist is set back to an empty list at some point.
Adding to eruciform's answer.
First remark: you don't need to generate the indices for the all_list list. You can just iterate over it directly:
for sublist in all_list:
    for item in sublist:
        # magic stuff
Second remark: you can make your string splitting much more succinct by slicing the list:
values = item.split()[-2:] # select last two numbers.
Reducing it further with map or a list comprehension, you can convert all the items to float on the fly:
# make a list from the last two elements of the split list.
values = [float(n) for n in item.split()[-2:]]
And tuplify the resulting list with the tuple built-in:
values = tuple([float(n) for n in item.split()[-2:]])
In the end you can collapse it all to one big list comprehension as sdolan shows.
Of course you can manually index into the results as well and create a tuple, but usually it's more verbose, and harder to change.
I took some liberties with your variable names; values would be temp_list in your example.
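Putting the remarks together, a complete Python 3 sketch could look like the following. It keeps the values as strings, as in the desired output of the question; the names group and nested are only illustrative.

```python
all_list = [['1 2 3 4', '2 3 4 5'], ['2 4 4 5', '3 4 5 5']]

# Explicit loops: a fresh inner list is created for every group.
list_of_lists = []
for group in all_list:
    newlist = []
    for item in group:
        # Keep the last two whitespace-separated fields as a tuple.
        newlist.append(tuple(item.split()[-2:]))
    list_of_lists.append(newlist)

# The same thing as one nested list comprehension.
nested = [[tuple(item.split()[-2:]) for item in group] for group in all_list]

print(list_of_lists)
# [[('3', '4'), ('4', '5')], [('4', '5'), ('5', '5')]]
```

If numbers are wanted instead of strings, wrapping each field in float() inside the tuple is enough.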
