Membership test results different with list and csv.reader - python

I get different results if I check for membership with a list than with a csv.reader object.
The below uses the unittest module.
csv.reader test for membership
with open("file.tab", 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    self.assertTrue(['1', '2', '3', '4'] in reader)
    self.assertTrue(['2', '3', '4', '5'] in reader)
    self.assertTrue(['3', '4', '5', '6'] in reader)
list test for membership
with open("file.tab", 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    reader = [record for record in reader]
    self.assertTrue(['1', '2', '3', '4'] in reader)
    self.assertTrue(['2', '3', '4', '5'] in reader)
    self.assertTrue(['3', '4', '5', '6'] in reader)
I know that file.tab contains entries for the three records I'm testing for, but the third assert comes up "False is not true" when using csv.reader and passes when using a list.
csv.reader is a generator; the docs don't explicitly say so, but since I can exhaust it, I think that means it's a generator. My thinking was that this might be the reason, but the following test prints nothing but True:
x = xrange(5)
for m in range(5):
    for n in range(5):
        print m in x
        print n in x
Which makes me think that there are no problems testing for membership with a generator.
Why does the third assert statement evaluate differently when I use a csv.reader than when I use a list?

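You had some bad luck there: xrange isn't actually a generator (or any kind of iterator) but a special sequence-like type of its own. It produces its values lazily, which can fool you into thinking it's a generator, yet every membership test on it starts a fresh iteration from the beginning.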
>>> x = xrange(10)
>>> 5 in x
True
>>> 5 in x
True
but
>>> it = iter(range(10))
>>> 5 in it
True
>>> 5 in it
False
So your logic was right: the reader instance can be exhausted, but the list can't, which is why membership tests can return different answers depending on the contents. Note, though, that membership tests short-circuit, so they don't exhaust the iterator when the result is positive; they stop right after the matching element:
>>> it = iter(range(10))
>>> 3 in it
True
>>> next(it)
4
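The underlying difference can be checked directly. Below is a minimal Python 3 sketch (io.StringIO stands in for file.tab, and the two sample rows are assumed): an iterator such as a csv.reader returns itself from iter(), so each in test resumes where the previous one stopped, while a list hands out a fresh iterator every time.

```python
import csv
import io

# Assumed sample data standing in for file.tab.
text = "1\t2\t3\t4\n2\t3\t4\t5\n"
reader = csv.reader(io.StringIO(text), delimiter="\t")
rows = [["1", "2", "3", "4"], ["2", "3", "4", "5"]]

# An iterator is its own iterator, so `in` resumes from its current position.
print(iter(reader) is reader)  # True
# A list creates a new iterator on each iter() call, so `in` always starts over.
print(iter(rows) is rows)      # False

target = ["1", "2", "3", "4"]
# The first test consumes row 1, so the second test only sees row 2.
reader_results = [target in reader, target in reader]
list_results = [target in rows, target in rows]
print(reader_results, list_results)  # [True, False] [True, True]
```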

Yes: csv.reader returns an iterator (strictly speaking not a generator, but it behaves the same way here), and the in operator keeps consuming rows until it finds the value, as DSM demonstrated.
In your CSV file, the rows are in a different order than in your tests. Your tests will pass if you change the order:
>>> def fake_reader():
...     yield ['1', '2', '3', '4']
...     yield ['2', '3', '4', '5']
...     yield ['3', '4', '5', '6']
>>> reader = fake_reader()
>>> ['1', '2', '3', '4'] in reader
True
>>> ['2', '3', '4', '5'] in reader
True
>>> ['3', '4', '5', '6'] in reader
True
And it fails if the order is different:
>>> def fake_reader():
...     yield ['1', '2', '3', '4']
...     yield ['3', '4', '5', '6'] # changed order
...     yield ['2', '3', '4', '5']
>>> reader = fake_reader()
>>> ['1', '2', '3', '4'] in reader # reads one row
True
>>> ['2', '3', '4', '5'] in reader # reads two rows!
True
>>> ['3', '4', '5', '6'] in reader # there are no more rows to read
False
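If the tests should not depend on row order, one option is to read the file once into a container that supports repeated lookups. A minimal Python 3 sketch, assuming io.StringIO as a stand-in for file.tab with the rows deliberately out of order; rows become tuples so they can be stored in a set:

```python
import csv
import io

# Assumed stand-in for file.tab, with rows in a different order from the tests.
data = "1\t2\t3\t4\n3\t4\t5\t6\n2\t3\t4\t5\n"

with io.StringIO(data) as f:
    # Read every row exactly once; tuples are hashable, so they can live in a set.
    rows = {tuple(row) for row in csv.reader(f, delimiter="\t")}

# Membership is now order-independent and can be repeated any number of times.
checks = [
    ("1", "2", "3", "4") in rows,
    ("2", "3", "4", "5") in rows,
    ("3", "4", "5", "6") in rows,
]
print(checks)  # [True, True, True]
```

A plain list (as in the question's second test) works just as well; the set only makes each lookup faster on large files and drops duplicate rows.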

Related

Why does my list have extra double brackets?

arr_list = []
arr = ['5', '6', '2', '4', '+']
arr_list.append([''.join(arr[0:4])])
print(arr_list)
Output: [['5624']]
Why does the output have 2 sets of square brackets? I only want one.
Thanks in advance.
arr_list = []
arr = ['5', '6', '2', '4', '+']
arr_list.append(''.join(arr[0:4]))
print(arr_list)
Simply remove the brackets around the argument to append() on the third line.
Use:
arr_list.append(''.join(arr[0:4]))
You're appending a list to arr_list.
Use this instead:
arr_list.append(''.join(arr[0:4]))
Remove the brackets around what you are appending:
>>> arr_list = []
>>> arr = ['5', '6', '2', '4', '+']
>>> arr_list.append(''.join(arr[0:4]))
>>> arr_list
['5624']
Or you could use list.extend rather than list.append:
>>> arr_list = []
>>> arr = ['5', '6', '2', '4', '+']
>>> arr_list.extend([''.join(arr[0:4])])
>>> arr_list
['5624']
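To make the difference between the answers concrete, here is a small sketch comparing append and extend on the same joined string. It also shows a pitfall: passing the bare string to extend() adds its characters one by one, because a string is itself iterable.

```python
arr = ['5', '6', '2', '4', '+']
joined = ''.join(arr[0:4])  # '5624', a single string

appended_list = []
appended_list.append([joined])   # adds one element: the inner list ['5624']

plain_list = []
plain_list.append(joined)        # adds one element: the string '5624'

extended_list = []
extended_list.extend([joined])   # adds each element of the list: just '5624'

chars = []
chars.extend(joined)             # a string is iterable, so this adds '5', '6', '2', '4'

print(appended_list, plain_list, extended_list, chars)
# [['5624']] ['5624'] ['5624'] ['5', '6', '2', '4']
```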
Use
arr_list.append(''.join(arr[0:4]))
or
print(''.join(arr[0:4]))

How to split a list every nth item

I am trying to split a list every 5th item, then delete the next two items ('nan'). I have attempted to use List[:5], but that does not seem to work in a loop. The desired output is: [['1','2','3','4','5'],['1','2','3','4','5'],['1','2','3','4','5'],['1','2','3','4','5']]
List = ['1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan']
for i in List:
    # split first 5 items
    # delete next two items
    pass
# Desired output:
# [['1','2','3','4','5'],['1','2','3','4','5'],['1','2','3','4','5'],['1','2','3','4','5']]
There are lots of ways to do this. I recommend stepping by 7 and then slicing out the first 5 items.
data = ['1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan']
# Step by 7 and keep the first 5
chunks = [data[i:i+5] for i in range(0, len(data), 7)]
print(*chunks, sep='\n')
Output:
['1', '2', '3', '4', '5']
['1', '2', '3', '4', '5']
['1', '2', '3', '4', '5']
['1', '2', '3', '4', '5']
Reference: Split a python list into other “sublists”...
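The step-by-7 slice generalizes to any "keep some, skip some" pattern. A minimal sketch; the helper name take_skip and its keep/skip parameters are hypothetical, and it assumes the input follows a strict keep/skip rhythm:

```python
def take_skip(seq, keep=5, skip=2):
    """Return chunks of `keep` items, dropping the `skip` items after each chunk.

    Assumes the input follows a strict keep/skip pattern; a trailing partial
    chunk (fewer than `keep` items) is returned as-is.
    """
    step = keep + skip
    return [seq[i:i + keep] for i in range(0, len(seq), step)]

data = ['1', '2', '3', '4', '5', 'nan', 'nan'] * 4
print(take_skip(data))
# [['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5']]
```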
WARNING: make sure the list follows the pattern you described, where every 5 items are followed by 2 'nan' items.
This loop repeatedly appends the first 5 items as a list and then deletes the first 7 items, until the list is empty.
lst = ['1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan']
output = []
while lst:
    output.append(lst[:5])  # keep the first 5 items
    del lst[:7]             # drop those 5 plus the 2 'nan' items
print(output) # [['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5']]
List=['1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan']
new_list = list()
for k in range(len(List)//7):
    new_list.append(List[k*7:k*7+5])
# handle a trailing partial block if the length is not a multiple of 7
tail = List[len(List)//7*7:]
if tail:
    new_list.append(tail[:5])
A straightforward solution in case the list doesn't follow the pattern you mentioned, but you still want to split the sequence at the 'nan' values:
result, temp = [], []
for item in lst:
    if item != 'nan':
        temp.append(item)
    elif temp:
        result.append(temp)
        temp = []
if temp:  # keep the last group if the list doesn't end with 'nan'
    result.append(temp)
Using itertools.groupby would also support chunks of different lengths:
from itertools import groupby
[list(v) for k, v in groupby(List, key='nan'.__ne__) if k]
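To illustrate the "chunks of different lengths" point, here is the same groupby one-liner run on an assumed, irregular input:

```python
from itertools import groupby

# Assumed irregular input: runs of different lengths, separators of different lengths.
data = ['1', '2', '3', 'nan', 'nan', '4', '5', 'nan', '6', '7', '8', '9']

# The key is True for real items and False for 'nan', so groupby yields
# alternating runs; keeping only the True runs drops every 'nan' separator.
chunks = [list(group) for is_item, group in groupby(data, key='nan'.__ne__) if is_item]
print(chunks)  # [['1', '2', '3'], ['4', '5'], ['6', '7', '8', '9']]
```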
There is probably a more Pythonic way to do the same thing, but:
result = []
while len(List) >= 5:
    result.append(List[:5])  # take the next 5 items
    del List[:5]
    del List[:2]             # drop the two 'nan' items
This results in: [['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5']]
mainlist = []
sublist = []
count = 0
for i in List:
    if i != "nan":
        # collect the next item into the current chunk
        sublist.append(i)
        count += 1
        if count == 5:
            # chunk is complete; the following 'nan' items are skipped by the check above
            mainlist.append(sublist)
            count = 0
            sublist = []
In general, numpy.split can split an array at any list of indices you give it. Reference:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.split.html
And the code:
import numpy as np
lst = ['1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan','1','2','3','4','5','nan','nan']
ind=np.ravel([[i*7+5, (i+1)*7] for i in range(len(lst)//7)])
lst2=np.split(lst, ind)[:-1:2]
print(lst2)
Outputs:
[array(['1', '2', '3', '4', '5'], dtype='<U3'), array(['1', '2', '3', '4', '5'], dtype='<U3'), array(['1', '2', '3', '4', '5'], dtype='<U3'), array(['1', '2', '3', '4', '5'], dtype='<U3')]
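When the length is an exact multiple of 7, a reshape can be simpler than computing split indices. A minimal sketch under that assumption (otherwise reshape raises ValueError); tolist() converts the result back to plain Python lists of str:

```python
import numpy as np

lst = ['1', '2', '3', '4', '5', 'nan', 'nan'] * 4

# Assumes len(lst) is a multiple of 7; otherwise reshape raises ValueError.
rows = np.array(lst).reshape(-1, 7)  # one row per 5-items-plus-2-nan block
chunks = rows[:, :5].tolist()        # keep the first 5 columns, back to plain lists
print(chunks)
# [['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5']]
```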
I like the slicing answers.
Here are my 2 cents:
# changed var name away from var type
myList = ['1','2','3','4','5','nan','nan','1','2','3','4','10','nan','nan','1','2','3','4','15','nan','nan','1','2','3','4','20','nan','nan']
newList = []  # declare new list of lists to create
addItem = []  # declare temp list
myIndex = 0   # declare temp counting variable
for i in myList:
    myIndex += 1
    if myIndex == 6:
        pass  # 6th item of each block is 'nan': skip it
    elif myIndex == 7:  # add sub list to new list and reset variables
        if len(addItem) > 0:
            newList.append(list(addItem))
        addItem = []
        myIndex = 0
    else:
        addItem.append(i)
# output
print(newList)

Remove duplicate items from list

I tried following this post, but it doesn't seem to work for me.
I tried this code:
for bresult in response.css(LIST_SELECTOR):
    NAME_SELECTOR = 'h2 a ::attr(href)'
    yield {
        'name': bresult.css(NAME_SELECTOR).extract_first(),
    }
    b_result_list.append(bresult.css(NAME_SELECTOR).extract_first())
#set b_result_list to SET to remove dups, then change back to LIST
set(b_result_list)
list(set(b_result_list))
for brl in b_result_list:
    print("brl: {}".format(brl))
This prints out:
brl: https://facebook.site.com/users/login
brl: https://facebook.site.com/users
brl: https://facebook.site.com/users/login
When I just need:
brl: https://facebook.site.com/users/login
brl: https://facebook.site.com/users
What am I doing wrong here?
Thank you!
You are discarding the result when you need to save it. set() and list() build new objects rather than changing b_result_list in place, so b_result_list never actually changes and you are just iterating over the original list. Instead, save the result of the set operation:
b_result_list = list(set(b_result_list))
(Note that sets do not preserve order.)
If you want to maintain order and uniqueify, you can do:
>>> li
['1', '1', '2', '2', '3', '3', '3', '3', '1', '1', '4', '5', '4', '6', '6']
>>> seen=set()
>>> [e for e in li if not (e in seen or seen.add(e))]
['1', '2', '3', '4', '5', '6']
Or, you can use the keys of an OrderedDict:
>>> from collections import OrderedDict
>>> list(OrderedDict([(k, None) for k in li]).keys())
['1', '2', '3', '4', '5', '6']
But a set alone may substantially change the order of the original list (the exact order you get is arbitrary):
>>> list(set(li))
['1', '3', '2', '5', '4', '6']
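On Python 3.7 and later, plain dicts are guaranteed to preserve insertion order, so dict.fromkeys gives an order-preserving de-duplication without OrderedDict. A minimal sketch on the same list; for the question's code this would be b_result_list = list(dict.fromkeys(b_result_list)):

```python
li = ['1', '1', '2', '2', '3', '3', '3', '3', '1', '1', '4', '5', '4', '6', '6']

# Since Python 3.7, dicts preserve insertion order, so the keys come out
# in first-seen order with duplicates dropped.
unique = list(dict.fromkeys(li))
print(unique)  # ['1', '2', '3', '4', '5', '6']
```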

Python: some elements of the list were skipped while removing them in a for loop

I am wondering why some elements weren't removed by this program. Could someone provide pointers?
Program:
t = ['1', '2', '2', '2', '2', '2', '2', '2', '2', '2', '7', '8', '9', '10']
print len(t)
for x in t:
    if x == '2':
        print x
        t.remove(x)
    else:
        print 'hello: '+str(x)
print t
Output on my system:
14
hello: 1
2
2
2
2
2
hello: 8
hello: 9
hello: 10
['1', '2', '2', '2', '2', '7', '8', '9', '10']
I am using Python 2.6.2.
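Never alter the sequence you're iterating over. The for loop walks the list by index, and each t.remove() shifts the remaining elements one position to the left, so the element right after each removed '2' is skipped (which is also why '7' never gets a "hello").
@cjonhson318's list comprehension will work fine, or, less efficiently but more closely akin to your code, you can just loop over a copy of the list while you alter the list itself: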
for x in list(t):
    if x == '2':
        print x
        t.remove(x)
    else:
        print 'hello: '+str(x)
As you can see, the only change from your code is looping over list(t) (a copy of t's initial value) rather than over t itself. This modest change lets you alter t within the loop to your heart's content.
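If copying the list is undesirable, another in-place approach (a different technique from the answer above) is to walk the indices from the end, so each deletion only shifts elements that have already been visited. A minimal Python 3 sketch; note that the "hello" lines come out in reverse order:

```python
t = ['1', '2', '2', '2', '2', '2', '2', '2', '2', '2', '7', '8', '9', '10']

# Iterating backwards means a deletion at index i only moves elements at
# positions > i, which the loop has already processed, so nothing is skipped.
for i in reversed(range(len(t))):
    if t[i] == '2':
        del t[i]
    else:
        print('hello: ' + t[i])

print(t)  # ['1', '7', '8', '9', '10']
```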
Say something like:
t = [ i for i in t if i != '2' ]
for item in t:
    print "Hello "+item
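The list comprehension above rebinds t to a new list. If other code holds a reference to the original list, assigning to the full slice replaces its contents in place instead. A minimal Python 3 sketch; the name alias is only there for illustration:

```python
t = ['1', '2', '2', '2', '2', '2', '2', '2', '2', '2', '7', '8', '9', '10']
alias = t  # another name for the same list object

# Assigning to the full slice replaces the contents of the existing list,
# so every other reference (like `alias`) sees the filtered result too.
t[:] = [item for item in t if item != '2']

for item in t:
    print('Hello ' + item)

print(alias)  # ['1', '7', '8', '9', '10']
```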
An alternative would be to go functional:
from operator import ne
from functools import partial
t = ['1', '2', '2', '2', '2', '2', '2', '2', '2', '2', '7', '8', '9', '10']
for n in filter(partial(ne, '2'), t):
    print('hello {}'.format(n))
Use the filter function to get the items other than '2' without modifying t.
If partial and operator.ne are not to your liking, you could use a lambda:
for n in filter(lambda x: x != '2', t):
    print('hello {}'.format(n))

How could I refresh a list once an item has been removed from a list within a list in Python

This is quite complicated, but I would like to be able to refresh a larger list once an item has been taken out of a mini list within the bigger list.
listA = ['1','2','3','4','5','6','6','8','9','5','3','7']
I used the code below to split it into lists of three:
split = [listA[i:(i+3)] for i in range(0, len(listA), 3)]
print(split)
# [['1','2','3'],['4','5','6'],['6','8','9'],['5','3','7']]
split = [['1','2','3'],['4','5','6'],['6','8','9'],['5','3','7']]
If I delete '3' from the first list, split will now be:
del split[0][-1]
split = [['1','2'],['4','5','6'],['6','8','9'],['5','3','7']]
After '3' has been deleted, I would like to be able to refresh the list so that it looks like:
split = [['1','2','4'],['5','6','6'],['8','9','5'],['3','7']]
Thanks in advance.
Not sure how big this list is getting, but you would need to flatten it and recalculate it:
>>> listA = ['1','2','3','4','5','6','6','8','9','5','3','7']
>>> split = [listA[i:(i+3)] for i in range(0, len(listA), 3)]
>>> split
[['1', '2', '3'], ['4', '5', '6'], ['6', '8', '9'], ['5', '3', '7']]
>>> del split[0][-1]
>>> split
[['1', '2'], ['4', '5', '6'], ['6', '8', '9'], ['5', '3', '7']]
>>> listA = sum(split, []) # <- flatten split list back to 1 level
>>> listA
['1', '2', '4', '5', '6', '6', '8', '9', '5', '3', '7']
>>> split = [listA[i:(i+3)] for i in range(0, len(listA), 3)]
>>> split
[['1', '2', '4'], ['5', '6', '6'], ['8', '9', '5'], ['3', '7']]
Just recreate the single list from your nested lists, then re-split.
You can join the lists, assuming they are only one level deep, with something like:
rejoined = [element for sublist in split for element in sublist]
There are no doubt fancier ways, or single-liners that use itertools or some other library, but don't overthink it. If you're only talking about a few hundred or even a few thousand items this solution is quite good enough.
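For reference, one of those itertools variants: itertools.chain.from_iterable flattens one level of nesting in linear time (sum(split, []) copies the accumulated list on every addition). A minimal sketch; the helper name resplit is hypothetical:

```python
from itertools import chain

def resplit(nested, size=3):
    """Flatten one level of nesting, then cut the result into chunks of `size`."""
    flat = list(chain.from_iterable(nested))
    return [flat[i:i + size] for i in range(0, len(flat), size)]

split = [['1', '2'], ['4', '5', '6'], ['6', '8', '9'], ['5', '3', '7']]
print(resplit(split))
# [['1', '2', '4'], ['5', '6', '6'], ['8', '9', '5'], ['3', '7']]
```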
I need this for turning over cards in the deck in a solitaire game.
You can deal your cards using itertools.groupby() with a good key function:
import itertools

def group_key(x, n=3, flag=[0], counter=itertools.count(0)):
    if next(counter) % n == 0:
        flag[0] = flag[0] ^ 1
    return flag[0]
^ is the bitwise XOR operator; here it toggles the flag between 0 and 1. The flag is stored as an element of a list because a mutable default argument keeps its value between calls, so the key function remembers its state from one call to the next.
Example:
>>> deck = ['1', '2', '3', '4', '5', '6', '6', '8', '9', '5', '3', '7']
>>> for k,g in itertools.groupby(deck, key=group_key):
...     print(list(g))
['1', '2', '3']
['4', '5', '6']
['6', '8', '9']
['5', '3', '7']
Now let's say you've used cards '9' and '8', so your new deck looks like:
>>> deck = ['1', '2', '3', '4', '5', '6', '6', '5', '3', '7']
>>> for k,g in itertools.groupby(deck, key=group_key):
...     print(list(g))
['1', '2', '3']
['4', '5', '6']
['6', '5', '3']
['7']
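Because group_key keeps its counter and flag in mutable default arguments, that state carries over between decks, so the grouping stays aligned only while every previous deck had a length that is a multiple of n. A stateless variant (a swapped-in technique, not the answer's own) derives the group from each card's position via enumerate(). The helper name deal is hypothetical:

```python
from itertools import groupby

def deal(deck, n=3):
    """Group the deck into runs of n cards using each card's position.

    The key is computed from enumerate(), so no state survives between calls.
    """
    return [[card for _, card in group]
            for _, group in groupby(enumerate(deck), key=lambda pair: pair[0] // n)]

print(deal(['1', '2', '3', '4', '5', '6', '6', '8', '9', '5', '3', '7']))
# [['1', '2', '3'], ['4', '5', '6'], ['6', '8', '9'], ['5', '3', '7']]
print(deal(['1', '2', '3', '4', '5', '6', '6', '5', '3', '7']))
# [['1', '2', '3'], ['4', '5', '6'], ['6', '5', '3'], ['7']]
```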
Build an object that contains a list and tracks when the list is altered (probably by controlling writes to it), then have the object do its own split every time the data changes and save the split list to a member of the object.
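A minimal sketch of that idea, assuming all changes go through the object's methods. The class name Deck and the method remove_at are hypothetical. Since the split view is always rebuilt from the flat list with a fixed group size, split[group][position] maps to flat index group * size + position:

```python
class Deck:
    """Hypothetical wrapper that keeps a flat list and its split view in sync."""

    def __init__(self, cards, size=3):
        self._cards = list(cards)  # private copy; only changed through methods
        self.size = size
        self.split = []
        self._resplit()

    def _resplit(self):
        # Rebuild the grouped view from the flat list after every change.
        self.split = [self._cards[i:i + self.size]
                      for i in range(0, len(self._cards), self.size)]

    def remove_at(self, group, position):
        """Remove the card shown at split[group][position] and re-split the deck."""
        del self._cards[group * self.size + position]
        self._resplit()


deck = Deck(['1', '2', '3', '4', '5', '6', '6', '8', '9', '5', '3', '7'])
deck.remove_at(0, 2)  # the same card as del split[0][-1] in the question
print(deck.split)
# [['1', '2', '4'], ['5', '6', '6'], ['8', '9', '5'], ['3', '7']]
```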
