I have the following code:
paths = [['E', 'D', 'A', 'B'], ['E', 'D', 'A', 'C', 'B'], ['E', 'D', 'B'], ['E', 'D', 'C', 'B'], ['E', 'B'], ['E', 'C', 'B']]
Now, the lists inside a list represent node paths from start to end which were made using Networkx, however that is some background information. My question is more specific.
I am trying to derive the lists that only have every letter from A-E, aka it would return only the list:
paths_desired = [['E', 'D', 'A', 'C', 'B']]
If I were to have another path:
paths = [['E', 'D', 'A', 'B'], ['E', 'D', 'A', 'C', 'B'], ['D', 'B', 'A','C','E'], ['A', 'D', 'C', 'B']]
It would return:
paths_desired = [['E', 'D', 'A', 'C', 'B'],['D', 'B', 'A', 'C', 'E']]
My idea is a for loop that iterates through each list:
letters = ['A', 'B', 'C', 'D', 'E']
desired_paths = []
for i in paths:
    counter = 0
    for j in letters:
        if j in i:
            counter = counter + 1
    if counter == 5:
        desired_paths.append(i)
print(desired_paths)
This works, however, I want to make the loop more specific, meaning I want only lists that have the following order: ['E','D','A','C','B'], even if all the letters are present in a different list, within the paths list.
Additionally, is there a way I can upgrade my for loop, so that I wouldn't count, rather check if the letters are in there, and not more than 1 of each letter? Meaning no multiple Es, no multiple D, etc.
You can use a set and .issubset() like this:
def pathways(letters, paths):
    ret = []
    letters = set(letters)
    for path in paths:
        if letters.issubset(path):
            ret.append(path)
    return ret

letters = ['A', 'B', 'C', 'D', 'E']
paths = [['E', 'D', 'A', 'B'], ['E', 'D', 'A', 'C', 'B'],
         ['D', 'B', 'A', 'C', 'E'], ['A', 'D', 'C', 'B']]
print(pathways(letters, paths))  # => [['E', 'D', 'A', 'C', 'B'], ['D', 'B', 'A', 'C', 'E']]
Also, as a comment by ShadowRanger pointed out, the pathways() function could be shortened using filter(). Like this:
def pathways(letters, paths):
    return list(filter(set(letters).issubset, paths))

letters = ['A', 'B', 'C', 'D', 'E']
paths = [['E', 'D', 'A', 'B'], ['E', 'D', 'A', 'C', 'B'],
         ['D', 'B', 'A', 'C', 'E'], ['A', 'D', 'C', 'B']]
print(pathways(letters, paths))
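The question also asks about two stricter checks that issubset() alone does not cover: rejecting paths with repeated letters, and keeping only paths in one exact order. Here is a minimal sketch of both. The function names exact_letter_paths and paths_in_order are hypothetical (they are not from the original post), and the last path in the sample data was added only to show a duplicate being rejected.

```python
def exact_letter_paths(letters, paths):
    """Keep paths that contain every letter exactly once and nothing else."""
    wanted = set(letters)
    return [
        path for path in paths
        # same length as its set -> no duplicates; equal sets -> same letters
        if len(path) == len(set(path)) and set(path) == wanted
    ]


def paths_in_order(target, paths):
    """Keep only paths that match the target order exactly."""
    return [path for path in paths if path == target]


letters = ['A', 'B', 'C', 'D', 'E']
paths = [['E', 'D', 'A', 'B'], ['E', 'D', 'A', 'C', 'B'],
         ['D', 'B', 'A', 'C', 'E'], ['E', 'E', 'D', 'A', 'C', 'B']]

print(exact_letter_paths(letters, paths))
# [['E', 'D', 'A', 'C', 'B'], ['D', 'B', 'A', 'C', 'E']]
print(paths_in_order(['E', 'D', 'A', 'C', 'B'], paths))
# [['E', 'D', 'A', 'C', 'B']]
```

Comparing len(path) with len(set(path)) avoids counting each letter separately: any repeated letter makes the set shorter than the list.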
I am using a for loop to remove items one at a time and build a new list (feature_combination) containing the different combinations.
feature_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
feature_combination = []
for i in range(7):
    feature_list.pop()
    feature_combination.append(feature_list)
feature_combination
The ideal output should be:
[['A', 'B', 'C', 'D', 'E', 'F'],['A', 'B', 'C', 'D', 'E'],['A', 'B', 'C', 'D'],['A', 'B', 'C'],['A', 'B'],['A'], []]
But the current output is:
[[], [], [], [], [], [], []]
When I print the progress step by step:
feature_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
feature_combination = []
for i in range(7):
    feature_list.pop()
    print(feature_list)
I get the following results:
['A', 'B', 'C', 'D', 'E', 'F']
['A', 'B', 'C', 'D', 'E']
['A', 'B', 'C', 'D']
['A', 'B', 'C']
['A', 'B']
['A']
[]
So why can't I append these results to an empty list? What is the problem?
It's because when you call feature_combination.append(feature_list), you are appending a reference to feature_list, not the actual value of feature_list. Since feature_list is empty at the end of the for loop, all of the references to it are empty as well.
You can fix it by changing feature_combination.append(feature_list) to feature_combination.append(feature_list.copy()), which makes a copy of the list to store.
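A small sketch of that fix, reusing the variable names from the question. The final check with `is` is only there to illustrate that each stored list is a separate object from feature_list.

```python
feature_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
feature_combination = []
for _ in range(7):
    feature_list.pop()                               # removes the last element in place
    feature_combination.append(feature_list.copy())  # store a snapshot, not the same object

print(feature_combination)
# None of the stored snapshots is the original list object
print(any(item is feature_list for item in feature_combination))  # False
```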
First of all, pop() with no argument removes the last element, so the loop in the question does not need an index. That said, I find pop() unnecessary here; you could use slicing instead, which creates a new list on every iteration.
Below is an example of how you could accomplish your goal. This code produces your desired output.
feature_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
feature_combination = []
for i in range(7):
    feature_list = feature_list[:-1]
    feature_combination.append(feature_list)
print(feature_combination)
Output:
[['A', 'B', 'C', 'D', 'E', 'F'], ['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'C', 'D'], ['A', 'B', 'C'], ['A', 'B'], ['A'], []]
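Because every slice is a new list, the same result can also be built without modifying feature_list at all. A minimal sketch of that variation:

```python
feature_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G']

# feature_list[:n] builds a new list holding the first n items,
# so n = 6, 5, ..., 0 yields progressively shorter prefixes.
feature_combination = [feature_list[:n] for n in range(len(feature_list) - 1, -1, -1)]

print(feature_combination)
```

Here feature_list keeps all seven items afterwards, which may or may not be what you want.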
A Python variable is a symbolic name that is a reference or pointer to an object. Once an object is assigned to a variable, you can refer to the object by that name, but the data itself is still contained within the object.
This happens because feature_list points to one specific object, which keeps changing as you pop from it. You are basically creating a list that contains [object, object, object, ...], all pointing to the same feature_list object. As you keep popping, every entry in the collecting list shows the updated state of that one object.
Here is how you can test this happening -
feature_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
feature_combination = []
for i in range(7):
    feature_list.pop()
    feature_combination.append(feature_list)
    print('iteration', i)
    print(feature_combination)  # Print the primary list after each iteration
iteration 0
[['A', 'B', 'C', 'D', 'E', 'F']]
iteration 1
[['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'C', 'D', 'E']]
iteration 2
[['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D']]
iteration 3
[['A', 'B', 'C'], ['A', 'B', 'C'], ['A', 'B', 'C'], ['A', 'B', 'C']]
iteration 4
[['A', 'B'], ['A', 'B'], ['A', 'B'], ['A', 'B'], ['A', 'B']]
iteration 5
[['A'], ['A'], ['A'], ['A'], ['A'], ['A']]
iteration 6
[[], [], [], [], [], [], []]
Notice that after each iteration, every entry in the main list changes, because all of them refer to the same sublist that was just popped.
A fix
A fix is to use a slice to get and store a copy.
feature_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
feature_combination = []
for i in range(7):
    feature_list.pop()
    print(feature_list)
    feature_combination.append(feature_list[:])  # <----
feature_combination
[['A', 'B', 'C', 'D', 'E', 'F'],
['A', 'B', 'C', 'D', 'E'],
['A', 'B', 'C', 'D'],
['A', 'B', 'C'],
['A', 'B'],
['A'],
[]]
I am trying to find the best way to group 'rows' with similar IDs.
My best guess:
np.array([test[test[:,0] == ID] for ID in List_IDs])
result: array of arrays of arrays
[ array([['ID_1', 'col1','col2',...,'coln'],
['ID_1', 'col1','col2',...,'coln'],...,
['ID_1', 'col1','col2',...,'coln']],dtype='|S32')
array([['ID_2', 'col1','col2',...,'coln'],
['ID_2', 'col1','col2',...,'coln'],...,
['ID_2', 'col1','col2',...,'coln']],dtype='|S32')
....
array([['ID_k', 'col1','col2',...,'coln'],
['ID_k', 'col1','col2',...,'coln'],...,
['ID_k', 'col1','col2',...,'coln']],dtype='|S32')]
Can anyone suggest something more efficient?
Reminder: the test array is huge, and the rows are not ordered.
I am assuming List_IDs is a list of all unique IDs from the first column. With that assumption, here's a NumPy-based solution:
# Sort the input array test by its first column, which holds the IDs
test_sorted = test[test[:,0].argsort()]
# Convert the string IDs to numeric IDs
_,numeric_ID = np.unique(test_sorted[:,0],return_inverse=True)
# Get the indices where shifts (IDs change) occur
_,cut_idx = np.unique(numeric_ID,return_index=True)
# Use the indices to split the input array into sub-arrays with common IDs
out = np.split(test_sorted,cut_idx)[1:]
Sample run -
In [305]: test
Out[305]:
array([['A', 'A', 'B', 'E', 'A'],
['B', 'E', 'A', 'E', 'B'],
['C', 'D', 'D', 'A', 'C'],
['B', 'D', 'A', 'C', 'A'],
['B', 'A', 'E', 'A', 'E'],
['C', 'D', 'C', 'E', 'D']],
dtype='|S32')
In [306]: test_sorted
Out[306]:
array([['A', 'A', 'B', 'E', 'A'],
['B', 'E', 'A', 'E', 'B'],
['B', 'D', 'A', 'C', 'A'],
['B', 'A', 'E', 'A', 'E'],
['C', 'D', 'D', 'A', 'C'],
['C', 'D', 'C', 'E', 'D']],
dtype='|S32')
In [307]: out
Out[307]:
[array([['A', 'A', 'B', 'E', 'A']],
dtype='|S32'), array([['B', 'E', 'A', 'E', 'B'],
['B', 'D', 'A', 'C', 'A'],
['B', 'A', 'E', 'A', 'E']],
dtype='|S32'), array([['C', 'D', 'D', 'A', 'C'],
['C', 'D', 'C', 'E', 'D']],
dtype='|S32')]
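If sorted output is not required, the same grouping can also be done in one pass with a dictionary. This is a different technique from the NumPy answer above, sketched here on plain Python lists rather than an array. The helper name group_rows_by_id is hypothetical, and the sketch makes no claim about how its speed compares with the NumPy version on a huge array.

```python
from collections import defaultdict


def group_rows_by_id(rows):
    """Group rows by their first element in a single pass."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)  # row[0] is assumed to be the ID
    return dict(groups)


rows = [['A', 'A', 'B', 'E', 'A'],
        ['B', 'E', 'A', 'E', 'B'],
        ['C', 'D', 'D', 'A', 'C'],
        ['B', 'D', 'A', 'C', 'A'],
        ['B', 'A', 'E', 'A', 'E'],
        ['C', 'D', 'C', 'E', 'D']]

groups = group_rows_by_id(rows)
print(len(groups['B']))  # 3
```

Within each group the rows keep the order in which they appeared in the input.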
I have an issue and I can't for the life of me get anything to return past ()
exam_solution = ['B', 'D', 'A', 'A', 'C', 'A', 'B', 'A', 'C', 'D', 'B', 'C',\
'D', 'A', 'D', 'C', 'C', 'B', 'D', 'A']
student_answers = ['B', 'D', 'B', 'A', 'C', 'A', 'A', 'A', 'C', 'D', 'B', 'C',\
'D', 'B', 'D', 'C', 'C', 'B', 'D', 'A']
I need to compare the 2 lists and append the differences into questions_missed = []
I haven't found anything remotely close to working. Any help would be appreciated
Edit: this is in Python. I've been stressing out over it all day.
Use a Python list comprehension to find the differences:
print([(index, i, j) for index, (i, j) in enumerate(zip(exam_solution, student_answers)) if i != j])
[(2, 'A', 'B'), (6, 'B', 'A'), (13, 'A', 'B')]
You can modify this solution to fit your needs:
exam_solution = ['B', 'D', 'A', 'A', 'C', 'A', 'B', 'A', 'C', 'D', 'B', 'C', 'D', 'A', 'D', 'C', 'C', 'B', 'D', 'A']
student_answers = ['B', 'D', 'B', 'A', 'C', 'A', 'A', 'A', 'C', 'D', 'B', 'C', 'D', 'B', 'D', 'C', 'C', 'B', 'D', 'A']
results = []
correct = 0
incorrect = 0
index = 0
while index < len(student_answers):
    if student_answers[index] == exam_solution[index]:
        results.append(True)
        correct += 1
    else:
        results.append(False)
        incorrect += 1
    index += 1
# correct and incorrect are ints, so convert them before concatenating
print("You answered " + str(correct) + " questions correctly and " + str(incorrect) + " questions incorrectly.")
Using list comprehensions:
[x for i, x in enumerate(exam_solution) if exam_solution[i] != student_answers[i] ]
['A', 'B', 'A']
Assuming you want an output in common English like this -
Question 3 A != B
Question 7 B != A
Question 14 A != B
You could try -
exam_solution = ['B', 'D', 'A', 'A', 'C', 'A', 'B', 'A', 'C', 'D', 'B', 'C',
                 'D', 'A', 'D', 'C', 'C', 'B', 'D', 'A']
student_answers = ['B', 'D', 'B', 'A', 'C', 'A', 'A', 'A', 'C', 'D', 'B', 'C',
                   'D', 'B', 'D', 'C', 'C', 'B', 'D', 'A']
questions_missed = []
count = 0
for answer in exam_solution:
    if answer != student_answers[count]:
        questions_missed.append(count)
    count = count + 1
for question in questions_missed:
    # question is a 0-based index, so add 1 for the printed question number
    print("Question {0} {1} != {2}".format(question + 1,
          exam_solution[question], student_answers[question]))
Using the KISS design principle, that's how I'd do it:
exam_solution = ['B', 'D', 'A', 'A', 'C', 'A', 'B', 'A', 'C', 'D', 'B', 'C',\
'D', 'A', 'D', 'C', 'C', 'B', 'D', 'A']
student_answers = ['B', 'D', 'B', 'A', 'C', 'A', 'A', 'A', 'C', 'D', 'B', 'C',\
'D', 'B', 'D', 'C', 'C', 'B', 'D', 'A']
questions_missed = []
for index in range(len(exam_solution)):
    # this assumes exam_solution and student_answers have the same size!
    if exam_solution[index] != student_answers[index]:
        questions_missed.append(index)
print(questions_missed)
And the output is:
[2, 6, 13]
Maybe you can use the zip function:
L = [(a, b) for a, b in zip(exam_solution, student_answers) if a != b]
print(L)
The output is:
[('A', 'B'), ('B', 'A'), ('A', 'B')]
Solution (use set):
>>> def result(solution, answers):
...     return set(str(n)+s for n, s in enumerate(solution)) - \
...            set(str(n)+r for n, r in enumerate(answers))
...
>>> result(exam_solution, student_answers)
set(['6B', '13A', '2A'])
>>>
The result is the set of wrong responses, each written as the question index followed by the expected answer (you can convert it to a list with list(result(exam_solution, student_answers))).
I have following test code.
a = ['a', 'b', 'c', 'd', 'e']
c = a * 3
b = a
but b in c returns False. b is a sub sequence of c and the list c contains b. So why is it returning false?
Thanks in advance.
b in c
Does not work because b looks like:
['a', 'b', 'c', 'd', 'e']
and c looks like:
['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e']
In other words, b is not an element of the sequence. Instead, b is a subsequence. If you were to construct c as follows:
c = [a, a, a]
Then c would look like:
[['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']]
And "b in c" would return True.
Hope this helps.
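If what you want is to test whether b occurs as a contiguous run inside c, you can compare slices of c against b. A minimal sketch; the helper name contains_sublist is hypothetical, not a built-in:

```python
def contains_sublist(haystack, needle):
    """Return True if needle appears as a contiguous run inside haystack."""
    n = len(needle)
    # compare every window of length n in haystack against needle
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


a = ['a', 'b', 'c', 'd', 'e']
c = a * 3
b = a

print(b in c)                  # False: no single element of c equals the list b
print(contains_sublist(c, b))  # True: c starts with the items of b
```

The in operator on a list only compares the left operand against each individual element, which is why the slice comparison is needed here.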
If you had this code:
a = ['a', 'b', 'c', 'd', 'e']
c = [a] * 3
b = a
when you type b in c you would get True.
In this case
c = [a] * 3 (with [ ] around a)
would return:
[['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']]