I have an array consisting of labels, but each label has been broken down into individual characters. For example, these are the first two elements of the array:
array([['1', '.', ' ', 'I', 'd', 'e', 'n', 't', 'i', 'f', 'y', 'i', 'n',
'g', ',', ' ', 'A', 's', 's', 'e', 's', 's', 'i', 'n', 'g', ' ',
'a', 'n', 'd', ' ', 'I', 'm', 'p', 'r', 'o', 'v', 'i', 'n', 'g',
' ', 'C', 'a', 'r', 'e', '', ''],
['9', '.', ' ', 'N', 'o', 'n', '-', 'P', 'h', 'a', 'r', 'm', 'a',
'c', 'o', 'l', 'o', 'g', 'i', 'c', 'a', 'l', ' ', 'I', 'n', 't',
'e', 'r', 'v', 'e', 'n', 't', 'i', 'o', 'n', 's', '', '', '',
'', ''], ...
I would like it to be formatted as such:
array(['1. Identifying, Assessing and Improving Care',
'9. Non-Pharmacological Interventions', ...
I want to iterate through the array and concatenate each label's characters so the output is as shown above.
Any help in achieving this would be much appreciated :) Many thanks!
import numpy as np
k=np.array([['1', '.', ' ', 'I', 'd', 'e', 'n', 't', 'i', 'f', 'y', 'i', 'n',
'g', ',', ' ', 'A', 's', 's', 'e', 's', 's', 'i', 'n', 'g', ' ',
'a', 'n', 'd', ' ', 'I', 'm', 'p', 'r', 'o', 'v', 'i', 'n', 'g',
' ', 'C', 'a', 'r', 'e', '', ''],
['9', '.', ' ', 'N', 'o', 'n', '-', 'P', 'h', 'a', 'r', 'm', 'a',
'c', 'o', 'l', 'o', 'g', 'i', 'c', 'a', 'l', ' ', 'I', 'n', 't',
'e', 'r', 'v', 'e', 'n', 't', 'i', 'o', 'n', 's', '', '', '',
'', '']])
for x in k:
    print(''.join(x))
#output
1. Identifying, Assessing and Improving Care
9. Non-Pharmacological Interventions
Using a list comprehension:
[''.join(x) for x in k]
#output
['1. Identifying, Assessing and Improving Care',
'9. Non-Pharmacological Interventions']
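If you need the result back as a NumPy array rather than a plain list (to match the array([...]) output in the question), you can wrap the comprehension in np.array; a minimal sketch:
labels = np.array([''.join(x) for x in k])
print(labels)
#output
['1. Identifying, Assessing and Improving Care'
 '9. Non-Pharmacological Interventions']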
Considering the array as a list of lists, you could join all characters by looping through the list:
r = [['1', '.', ' ', 'I', 'd', 'e', 'n', 't', 'i', 'f', 'y', 'i', 'n',
'g', ',', ' ', 'A', 's', 's', 'e', 's', 's', 'i', 'n', 'g', ' ',
'a', 'n', 'd', ' ', 'I', 'm', 'p', 'r', 'o', 'v', 'i', 'n', 'g',
' ', 'C', 'a', 'r', 'e', '', ''],
['9', '.', ' ', 'N', 'o', 'n', '-', 'P', 'h', 'a', 'r', 'm', 'a',
'c', 'o', 'l', 'o', 'g', 'i', 'c', 'a', 'l', ' ', 'I', 'n', 't',
'e', 'r', 'v', 'e', 'n', 't', 'i', 'o', 'n', 's', '', '', '',
'', '']]
t = ["".join(i) for i in r]
print(t)
Output:
['1. Identifying, Assessing and Improving Care',
'9. Non-Pharmacological Interventions']
array = [['1', '.', ' ', 'I', 'd', 'e', 'n', 't', 'i', 'f', 'y', 'i', 'n',
'g', ',', ' ', 'A', 's', 's', 'e', 's', 's', 'i', 'n', 'g', ' ',
'a', 'n', 'd', ' ', 'I', 'm', 'p', 'r', 'o', 'v', 'i', 'n', 'g',
' ', 'C', 'a', 'r', 'e', '', ''],
['9', '.', ' ', 'N', 'o', 'n', '-', 'P', 'h', 'a', 'r', 'm', 'a',
'c', 'o', 'l', 'o', 'g', 'i', 'c', 'a', 'l', ' ', 'I', 'n', 't',
'e', 'r', 'v', 'e', 'n', 't', 'i', 'o', 'n', 's', '', '', '',
'', '']]
# array(['1. Identifying, Assessing and Improving Care',
# '9. Non-Pharmacological Interventions', ...
array = [''.join(i) for i in array]
print(array) #['1. Identifying, Assessing and Improving Care', '9. Non-Pharmacological Interventions']
Assuming from array([...]) that you are using NumPy, here's a solution:
import numpy as np
a = np.array([['1', '.', ' ', 'I', 'd', 'e', 'n', 't', 'i', 'f', 'y', 'i', 'n',
'g', ',', ' ', 'A', 's', 's', 'e', 's', 's', 'i', 'n', 'g', ' ',
'a', 'n', 'd', ' ', 'I', 'm', 'p', 'r', 'o', 'v', 'i', 'n', 'g',
' ', 'C', 'a', 'r', 'e', '', ''],
['9', '.', ' ', 'N', 'o', 'n', '-', 'P', 'h', 'a', 'r', 'm', 'a',
'c', 'o', 'l', 'o', 'g', 'i', 'c', 'a', 'l', ' ', 'I', 'n', 't',
'e', 'r', 'v', 'e', 'n', 't', 'i', 'o', 'n', 's', '', '', '',
'', '']])
b = np.empty(a.shape[0], dtype=object)
for i, x in enumerate(a):
    b[i] = ''.join(x)
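Alternatively, if you prefer to stay entirely within NumPy, np.apply_along_axis can apply the join row-wise; a sketch of the same idea (not necessarily faster than the loop):
c = np.apply_along_axis(''.join, 1, a)
print(c)
#output
['1. Identifying, Assessing and Improving Care'
 '9. Non-Pharmacological Interventions']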
If you loop over each row of your array, you can then use str.join to get what you are looking for.
Something like:
arr = [['1', '.', ' ', 'I', ...], ...]
output = list()
for x in arr:
    output.append(''.join(x))
output
>>>
['1. Identifying, Assessing and Improving Care', ...]
I have a list of items from PDF text extraction that looks like this:
['performed three times. Data represent the mean±SEM of threeindependent experiments. *P<0.05, **P<0.005, ***P<0.001.', 'B','O-GlcNAc', 'AMPK', 'pAMPK', 'SREBP-1', 'ACC', 'FAS', 'actin', 'C','T', 'L', 'D', 'O', 'N', 'T', 'M', 'G', 'C', 'T', 'L', 'D', 'O', 'N','T', 'M', 'G', 'HaCaT HeLa', 'O', '-G', 'lN', 'A', 'c', 'le', 'v','e', 'l', '(A', '.U', ')', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L','0.0', '2.5', '1.5', '1.0', '0.5', 'H', 'a', 'C', 'a', 'T', 'D', 'O','N', 'H', 'a', 'C', 'a', 'T', 'T', 'M', 'G', '2.0', '**', '***', 'S','R', 'E', 'B', 'P', '-1', 'le', 'v', 'e', 'l', 'H', 'a', 'C', 'a','T', 'C', 'T', 'L', '0.0', '1.5', '1.0', '0.5', 'H', 'a', 'C', 'a','T', 'D', 'O', 'N', 'H', 'a', 'C', 'a', 'T', 'T', 'M', 'G', '2.0','F', 'A', 'S', '(A', '.U', ')', 'H', 'a', 'C', 'a', 'T', 'C', 'T','L', '0.0', '1.5', '1.0', '0.5', 'H', 'a', 'C', 'a', 'T', 'D', 'O','N', 'H', 'a', 'C', 'a', 'T', 'T', 'M', 'G', '2.0', 'A', 'C', 'C','(A', '.U', ')', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', '0.0', '2.5','1.5', '1.0', '0.5', 'H', 'a', 'C', 'a', 'T', 'D', 'O', 'N', 'H', 'a','C', 'a', 'T', 'T', 'M', 'G', '2.0', 'O', '-G', 'lc', 'N', 'A', 'c','le', 'v', 'e', 'l', '(A', '.U', ')', 'H', 'e', 'L', 'a', 'C', 'T','L', '0.0', '1.5', '1.0', '0.5', 'H', 'e', 'L', 'a', 'D', 'O', 'N','H', 'e', 'L', 'a', 'T', 'M', 'G', '2.0', '***', '***', 'S', 'R', 'E','B', 'P', '-1', 'le', 'v', 'e', 'l', 'H', 'e', 'L', 'a', 'C', 'T','L', '0.0', '1.5', '1.0', '0.5', 'H', 'e', 'L', 'a', 'D', 'O', 'N','H', 'e', 'L', 'a', 'T', 'M', 'G', '2.0', '***', 'F', 'A', 'S', '(A','.U', ')', 'H', 'e', 'L', 'a', 'C', 'T', 'L', '0.0', '1.5', '1.0','0.5', 'H', 'e', 'L', 'a', 'D', 'O', 'N', 'H', 'e', 'L', 'a', 'T','M', 'G', '2.0', '***', 'A', 'C', 'C', '(A', '.U', ')', 'H', 'e', 'L','a', 'C', 'T', 'L', '0.0', '1.5', '1.0', '0.5', 'H', 'e', 'L', 'a','D', 'O', 'N', 'H', 'e', 'L', 'a', 'T', 'M', 'G', '2.0', '***', '***','***', '***', '***', '***', '***', '*** ***', '***', 'O-GlcNAc','AMPK', 'pAMPK', 'SREBP-1', 'ACC', 'FAS', '�-actin', 'O', '-G', 'lN','A', 'c', 'le', 'v', 'e', 'l', '(A', '.U', ')', 'C', 'T', 'L', 'H','a', 'C', 'a', 'T', '0.0', '1.5', '1.0', '0.5', 'Q', 'u', 'e', 'r','H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', 'H', 'e', 'L', 'a', '2.0','Q', 'u', 'e', 'r', 'H', 'e', 'L', 'a', 'p', 'A', 'M', 'P', 'K', '(A','.U', ')', 'C', 'T', 'L', 'H', 'a', 'C', 'a', 'T', '0.0', '1.5','1.0', '0.5', 'Q', 'u', 'e', 'r', 'H', 'a', 'C', 'a', 'T', 'C', 'T','L', 'H', 'e', 'L', 'a', '2.5 ***', 'Q', 'u', 'e', 'r', 'H', 'e', 'L','a', 'S', 'R', 'E', 'B', 'P', '-1', 'le', 'v', 'e', 'l', 'C', 'T','L', 'H', 'a', 'C', 'a', 'T', '0.0', '1.5', '1.0', '0.5', 'Q', 'u','e', 'r', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', 'H', 'e', 'L', 'a','**', 'Q', 'u', 'e', 'r', 'H', 'e', 'L', 'a', 'A', 'C', 'C', '(A','.U', ')', 'C', 'T', 'L', 'H', 'a', 'C', 'a', 'T', '0.0', '1.5','1.0', '0.5', 'Q', 'u', 'e', 'r', 'H', 'a', 'C', 'a', 'T', 'C', 'T','L', 'H', 'e', 'L', 'a', '2.0', '***', 'Q', 'u', 'e', 'r', 'H', 'e','L', 'a', 'F', 'A', 'S', '(A', '.U', ')', 'C', 'T', 'L', 'H', 'a','C', 'a', 'T', '0.0', '1.5', '1.0', '0.5', 'Q', 'u', 'e', 'r', 'H','a', 'C', 'a', 'T', 'C', 'T', 'L', 'H', 'e', 'L', 'a', '***', 'Q','u', 'e', 'r', 'H', 'e', 'L', 'a', '2.0', '***', '*** ***', '*','******* *******', 'HaCaT HeLa', 'CTL Quer CTL Quer', 'A', 'Fig. 4.Quercetin regulates SREBP1 and its target proteins']
In this list, I would like to remove all groups of adjacent elements (length of group > N) for which no element has length > M.
A pseudo code would be:
for item in list:
    if len(item) <= M:
        buffer.append(item_index)
        active = True
    if len(item) > M and active == True:
        active = False
        if len(buffer) > N:
            list.replace_at_index(buffer_by_index, '')
        buffer.clear()
Thanks for helping
Here is how you can use the built-in enumerate function to iterate through the elements of a list alongside each element's index:
lst = ['performed three times. Data represent the mean±SEM of threeindependent experiments. *P<0.05, **P<0.005, ***P<0.001.', 'B','O-GlcNAc', 'AMPK', 'pAMPK', 'SREBP-1', 'ACC', 'FAS', 'actin', 'C','T', 'L', 'D', 'O', 'N', 'T', 'M', 'G', 'C', 'T', 'L', 'D', 'O', 'N','T', 'M', 'G', 'HaCaT HeLa', 'O', '-G', 'lN', 'A', 'c', 'le', 'v','e', 'l', '(A', '.U', ')', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L','0.0', '2.5', '1.5', '1.0', '0.5', 'H', 'a', 'C', 'a', 'T', 'D', 'O','N', 'H', 'a', 'C', 'a', 'T', 'T', 'M', 'G', '2.0', '**', '***', 'S','R', 'E', 'B', 'P', '-1', 'le', 'v', 'e', 'l', 'H', 'a', 'C', 'a','T', 'C', 'T', 'L', '0.0', '1.5', '1.0', '0.5', 'H', 'a', 'C', 'a','T', 'D', 'O', 'N', 'H', 'a', 'C', 'a', 'T', 'T', 'M', 'G', '2.0','F', 'A', 'S', '(A', '.U', ')', 'H', 'a', 'C', 'a', 'T', 'C', 'T','L', '0.0', '1.5', '1.0', '0.5', 'H', 'a', 'C', 'a', 'T', 'D', 'O','N', 'H', 'a', 'C', 'a', 'T', 'T', 'M', 'G', '2.0', 'A', 'C', 'C','(A', '.U', ')', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', '0.0', '2.5','1.5', '1.0', '0.5', 'H', 'a', 'C', 'a', 'T', 'D', 'O', 'N', 'H', 'a','C', 'a', 'T', 'T', 'M', 'G', '2.0', 'O', '-G', 'lc', 'N', 'A', 'c','le', 'v', 'e', 'l', '(A', '.U', ')', 'H', 'e', 'L', 'a', 'C', 'T','L', '0.0', '1.5', '1.0', '0.5', 'H', 'e', 'L', 'a', 'D', 'O', 'N','H', 'e', 'L', 'a', 'T', 'M', 'G', '2.0', '***', '***', 'S', 'R', 'E','B', 'P', '-1', 'le', 'v', 'e', 'l', 'H', 'e', 'L', 'a', 'C', 'T','L', '0.0', '1.5', '1.0', '0.5', 'H', 'e', 'L', 'a', 'D', 'O', 'N','H', 'e', 'L', 'a', 'T', 'M', 'G', '2.0', '***', 'F', 'A', 'S', '(A','.U', ')', 'H', 'e', 'L', 'a', 'C', 'T', 'L', '0.0', '1.5', '1.0','0.5', 'H', 'e', 'L', 'a', 'D', 'O', 'N', 'H', 'e', 'L', 'a', 'T','M', 'G', '2.0', '***', 'A', 'C', 'C', '(A', '.U', ')', 'H', 'e', 'L','a', 'C', 'T', 'L', '0.0', '1.5', '1.0', '0.5', 'H', 'e', 'L', 'a','D', 'O', 'N', 'H', 'e', 'L', 'a', 'T', 'M', 'G', '2.0', '***', '***','***', '***', '***', '***', '***', '*** ***', '***', 'O-GlcNAc','AMPK', 'pAMPK', 'SREBP-1', 'ACC', 'FAS', '�-actin', 'O', '-G', 'lN','A', 'c', 'le', 'v', 'e', 'l', '(A', '.U', ')', 'C', 'T', 'L', 'H','a', 'C', 'a', 'T', '0.0', '1.5', '1.0', '0.5', 'Q', 'u', 'e', 'r','H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', 'H', 'e', 'L', 'a', '2.0','Q', 'u', 'e', 'r', 'H', 'e', 'L', 'a', 'p', 'A', 'M', 'P', 'K', '(A','.U', ')', 'C', 'T', 'L', 'H', 'a', 'C', 'a', 'T', '0.0', '1.5','1.0', '0.5', 'Q', 'u', 'e', 'r', 'H', 'a', 'C', 'a', 'T', 'C', 'T','L', 'H', 'e', 'L', 'a', '2.5 ***', 'Q', 'u', 'e', 'r', 'H', 'e', 'L','a', 'S', 'R', 'E', 'B', 'P', '-1', 'le', 'v', 'e', 'l', 'C', 'T','L', 'H', 'a', 'C', 'a', 'T', '0.0', '1.5', '1.0', '0.5', 'Q', 'u','e', 'r', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', 'H', 'e', 'L', 'a','**', 'Q', 'u', 'e', 'r', 'H', 'e', 'L', 'a', 'A', 'C', 'C', '(A','.U', ')', 'C', 'T', 'L', 'H', 'a', 'C', 'a', 'T', '0.0', '1.5','1.0', '0.5', 'Q', 'u', 'e', 'r', 'H', 'a', 'C', 'a', 'T', 'C', 'T','L', 'H', 'e', 'L', 'a', '2.0', '***', 'Q', 'u', 'e', 'r', 'H', 'e','L', 'a', 'F', 'A', 'S', '(A', '.U', ')', 'C', 'T', 'L', 'H', 'a','C', 'a', 'T', '0.0', '1.5', '1.0', '0.5', 'Q', 'u', 'e', 'r', 'H','a', 'C', 'a', 'T', 'C', 'T', 'L', 'H', 'e', 'L', 'a', '***', 'Q','u', 'e', 'r', 'H', 'e', 'L', 'a', '2.0', '***', '*** ***', '*','******* *******', 'HaCaT HeLa', 'CTL Quer CTL Quer', 'A', 'Fig. 4.Quercetin regulates SREBP1 and its target proteins']
N = 3
M = 5
buffer = []
for i, v in enumerate(lst):
    if len(v) <= M:
        # collect the indices of a run of short items
        buffer.append(i)
    else:
        if len(buffer) > N:
            # the run is long enough to drop: blank out its items
            for j in buffer:
                lst[j] = None
        buffer.clear()
# flush a run of short items that reaches the end of the list
if len(buffer) > N:
    for j in buffer:
        lst[j] = None
# filter(None, ...) drops the None markers (and any empty strings)
print(list(filter(None, lst)))
Output:
['performed three times. Data represent the mean±SEM of threeindependent experiments. *P<0.05, **P<0.005, ***P<0.001.', 'B', 'O-GlcNAc', 'AMPK', 'pAMPK', 'SREBP-1', 'HaCaT HeLa', '*** ***', '***', 'O-GlcNAc', 'AMPK', 'pAMPK', 'SREBP-1', 'ACC', 'FAS', '�-actin', '2.5 ***', '*** ***', '*', '******* *******', 'HaCaT HeLa', 'CTL Quer CTL Quer', 'A', 'Fig. 4.Quercetin regulates SREBP\xad1 and its target proteins']
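A more compact alternative uses itertools.groupby to find the runs directly; a sketch assuming the same semantics (drop any run of more than N consecutive items whose length is at most M), applied to the original, unmodified list, since the loop above replaces items in lst with None:
from itertools import groupby

# original_lst: the list from the question, before any items were set to None
result = []
for is_short, group in groupby(original_lst, key=lambda s: len(s) <= M):
    group = list(group)
    # keep the group unless it is a run of short items longer than N
    if not (is_short and len(group) > N):
        result.extend(group)
print(result)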
It is unclear exactly what is desired. Below is some code of what I believe is being asked for. Hope this helps. If this is off... let me know.
Pseudo algorithm:
Given a list of strings.
Identify the len of each string using the class cntseq.
Identify sequences of adjacent strings having the same len.
For each sequence, compare the length of the sequence against M to decide which rows to keep.
x = ['performed three times. Data represent the mean±SEM of three independent experiments. P<0.05, P<0.005, P<0.001.', 'B', 'O-GlcNAc', 'AMPK', 'pAMPK', 'SREBP-1', 'ACC', 'FAS', 'actin', 'C', 'T', 'L', 'D', 'O', 'N', 'T', 'M', 'G', 'C', 'T', 'L', 'D', 'O', 'N', 'T', 'M', 'G', 'HaCaT HeLa', 'O', '-G', 'lN', 'A', 'c', 'le', 'v', 'e', 'l', '(A', '.U', ')', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', '0.0', '2.5', '1.5', '1.0', '0.5', 'H', 'a', 'C', 'a', 'T', 'D', 'O', 'N', 'H', 'a', 'C', 'a', 'T', 'T', 'M', 'G', '2.0', '', '', 'S', 'R', 'E', 'B', 'P', '-1', 'le', 'v', 'e', 'l', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', '0.0', '1.5', '1.0', '0.5', 'H', 'a', 'C', 'a', 'T', 'D', 'O', 'N', 'H', 'a', 'C', 'a', 'T', 'T', 'M', 'G', '2.0', 'F', 'A', 'S', '(A', '.U', ')', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', '0.0', '1.5', '1.0', '0.5', 'H', 'a', 'C', 'a', 'T', 'D', 'O', 'N', 'H', 'a', 'C', 'a', 'T', 'T', 'M', 'G', '2.0', 'A', 'C', 'C', '(A', '.U', ')', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', '0.0', '2.5', '1.5', '1.0', '0.5', 'H', 'a', 'C', 'a', 'T', 'D', 'O', 'N', 'H', 'a', 'C', 'a', 'T', 'T', 'M', 'G', '2.0', 'O', '-G', 'lc', 'N', 'A', 'c', 'le', 'v', 'e', 'l', '(A', '.U', ')', 'H', 'e', 'L', 'a', 'C', 'T', 'L', '0.0', '1.5', '1.0', '0.5', 'H', 'e', 'L', 'a', 'D', 'O', 'N', 'H', 'e', 'L', 'a', 'T', 'M', 'G', '2.0', '', '', 'S', 'R', 'E', 'B', 'P', '-1', 'le', 'v', 'e', 'l', 'H', 'e', 'L', 'a', 'C', 'T', 'L', '0.0', '1.5', '1.0', '0.5', 'H', 'e', 'L', 'a', 'D', 'O', 'N', 'H', 'e', 'L', 'a', 'T', 'M', 'G', '2.0', '', 'F', 'A', 'S', '(A', '.U', ')', 'H', 'e', 'L', 'a', 'C', 'T', 'L', '0.0', '1.5', '1.0', '0.5', 'H', 'e', 'L', 'a', 'D', 'O', 'N', 'H', 'e', 'L', 'a', 'T', 'M', 'G', '2.0', '', 'A', 'C', 'C', '(A', '.U', ')', 'H', 'e', 'L', 'a', 'C', 'T', 'L', '0.0', '1.5', '1.0', '0.5', 'H', 'e', 'L', 'a', 'D', 'O', 'N', 'H', 'e', 'L', 'a', 'T', 'M', 'G', '2.0', '', '', '', '', '', '', '', '* ', '', 'O-GlcNAc', 'AMPK', 'pAMPK', 'SREBP-1', 'ACC', 'FAS', '�-actin', 'O', '-G', 'lN', 'A', 'c', 'le', 'v', 'e', 'l', '(A', '.U', ')', 'C', 'T', 'L', 'H', 'a', 'C', 'a', 'T', '0.0', '1.5', '1.0', '0.5', 'Q', 'u', 'e', 'r', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', 'H', 'e', 'L', 'a', '2.0', 'Q', 'u', 'e', 'r', 'H', 'e', 'L', 'a', 'p', 'A', 'M', 'P', 'K', '(A', '.U', ')', 'C', 'T', 'L', 'H', 'a', 'C', 'a', 'T', '0.0', '1.5', '1.0', '0.5', 'Q', 'u', 'e', 'r', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', 'H', 'e', 'L', 'a', '2.5 ', 'Q', 'u', 'e', 'r', 'H', 'e', 'L', 'a', 'S', 'R', 'E', 'B', 'P', '-1', 'le', 'v', 'e', 'l', 'C', 'T', 'L', 'H', 'a', 'C', 'a', 'T', '0.0', '1.5', '1.0', '0.5', 'Q', 'u', 'e', 'r', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', 'H', 'e', 'L', 'a', '', 'Q', 'u', 'e', 'r', 'H', 'e', 'L', 'a', 'A', 'C', 'C', '(A', '.U', ')', 'C', 'T', 'L', 'H', 'a', 'C', 'a', 'T', '0.0', '1.5', '1.0', '0.5', 'Q', 'u', 'e', 'r', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', 'H', 'e', 'L', 'a', '2.0', '', 'Q', 'u', 'e', 'r', 'H', 'e', 'L', 'a', 'F', 'A', 'S', '(A', '.U', ')', 'C', 'T', 'L', 'H', 'a', 'C', 'a', 'T', '0.0', '1.5', '1.0', '0.5', 'Q', 'u', 'e', 'r', 'H', 'a', 'C', 'a', 'T', 'C', 'T', 'L', 'H', 'e', 'L', 'a', '', 'Q', 'u', 'e', 'r', 'H', 'e', 'L', 'a', '2.0', '', '* ', '', '***** *******', 'HaCaT HeLa', 'CTL Quer CTL Quer', 'A', 'Fig. 4. Quercetin regulates SREBP\xad1 and its target proteins']
import pandas as pd

df = pd.DataFrame(x, columns=['v'])
df['len'] = df.v.apply(len)
N = 2
M = 2
class cntseq(object):
    '''
    Define a class to track sequences across the column
    '''
    def __init__(self, **kwargs):
        self.prevLen = -1
        self.cnt = 0
        self.start = None

    def countN(self, r):
        if r.len == self.prevLen:
            # if an adjacent sequence is found, mark its start with "len.start"
            self.cnt += 1
            if self.start is None:
                self.start = int(r.name) - 1
            return '%d.%d' % (r.len, self.start)
        else:
            # non-adjacent sequence found, mark "None.None"
            self.prevLen = r.len
            self.cnt = 0
            self.start = None
            return 'None.None'
# Identify sequences of adjacent lengths.
cs = cntseq()
df['seq'] = df.apply(lambda r : cs.countN(r),axis=1)
print("\nOriginal DF info")
print(df.describe())
print("\nOriginal DF ")
print(df.head())
# Compute lookup of duplicate information
df2 = pd.DataFrame(df.groupby('seq').seq.count())
df2.columns = ['M']
df2 = df2.reset_index()
# drop the 'None.None' marker row using its computed index
n = df2[df2.seq == 'None.None'].index[0]
df2 = df2.drop(n, axis=0)
print("\nLookup DF, seq has count and index for start of 'adjacent elements'")
print(df2.head())
# Compute final DF without duplicates
df3 = df[df.seq.isin(list(df2[df2.M > M].seq))].head()
print("\nFinal DF without duplicates")
print(df3)
print("\nFinal DF info")
print(df3.describe())
Output:
Original DF info
len
count 578.000000
mean 1.730104
std 5.284275
min 0.000000
25% 1.000000
50% 1.000000
75% 1.000000
max 110.000000
Original DF
v len seq
0 performed three times. Data represent the mean... 110 None.None
1 B 1 None.None
2 O-GlcNAc 8 None.None
3 AMPK 4 None.None
4 pAMPK 5 None.None
Lookup DF, seq has count and index for start of 'adjacent elements'
seq M
0 0.221 1
1 0.325 6
2 0.70 1
3 1.111 2
4 1.116 8
Final DF without duplicates
v len seq
10 T 1 1.9
11 L 1 1.9
12 D 1 1.9
13 O 1 1.9
14 N 1 1.9
Final DF info
len
count 5.0
mean 1.0
std 0.0
min 1.0
25% 1.0
50% 1.0
75% 1.0
max 1.0
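For reference, the question's rule (remove any run of more than N consecutive items in which no item is longer than M) can also be expressed directly in pandas; a sketch using the shift/cumsum run-labelling idiom, which is an assumption about the intended semantics:
import pandas as pd

s = pd.Series(x)
short = s.str.len() <= M
# a new run starts wherever the short/long flag changes value
run_id = (short != short.shift()).cumsum()
# length of the run each element belongs to
run_len = short.groupby(run_id).transform('size')
kept = s[~(short & (run_len > N))].tolist()
print(kept)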
I am designing a program that will encode messages imported from a CSV file. It will do this by converting each character to its ASCII value, adding 2 to it and then converting it back to a character.
My current problem is that while my code will encode each character in each string, the messages are no longer joined together.
Any help would be appreciated.
My code:
#importing csv file and allowing it to be read from
import csv
ifile = open("messages.csv","rb")
reader= csv.reader(ifile)
#creating lists
plain_text=[]
plain_ascii=[]
encrypted_ascii=[]
encrypted_text=[]
latitude=[]
longitude=[]
#appending csv data to separate lists
for row in reader:
    latitude.append(row[0])
    longitude.append(row[1])
    plain_text.append(row[2])
#encoding messages
encrypted_text=[[chr(ord(ch)+2) for ch in string] for string in plain_text]
print plain_text
print encrypted_text
ifile.close()
The current output:
['A famous Scottish victory in the First War of Scottish Independence - bit.ly/1yIAb8Q', "How high is Scotland's tallest mountain? - bit.ly/1q3Rj6D", 'What is the traditional instrument most often linked with Scotland? - http://#bit.ly/1lNdrk3', "A prickly problem Scotland's national symbol - bit.ly/1q3REpQ", 'Name the largest city in Scotland - bit.ly/T4OEuU']
[['C', '"', 'h', 'c', 'o', 'q', 'w', 'u', '"', 'U', 'e', 'q', 'v', 'v', 'k', 'u', 'j', '"', 'x', 'k', 'e', 'v', 'q', 't', '{', '"', 'k', 'p', '"', 'v', 'j', 'g', '"', 'H', 'k', 't', 'u', 'v', '"', 'Y', 'c', 't', '"', 'q', 'h', '"', 'U', 'e', 'q', 'v', 'v', 'k', 'u', 'j', '"', 'K', 'p', 'f', 'g', 'r', 'g', 'p', 'f', 'g', 'p', 'e', 'g', '"', '/', '"', 'd', 'k', 'v', '0', 'n', '{', '1', '3', '{', 'K', 'C', 'd', ':', 'S'], ['J', 'q', 'y', '"', 'j', 'k', 'i', 'j', '"', 'k', 'u', '"', 'U', 'e', 'q', 'v', 'n', 'c', 'p', 'f', ')', 'u', '"', 'v', 'c', 'n', 'n', 'g', 'u', 'v', '"', 'o', 'q', 'w', 'p', 'v', 'c', 'k', 'p', 'A', '"', '/', '"', 'd', 'k', 'v', '0', 'n', '{', '1', '3', 's', '5', 'T', 'l', '8', 'F'], ['Y', 'j', 'c', 'v', '"', 'k', 'u', '"', 'v', 'j', 'g', '"', 'v', 't', 'c', 'f', 'k', 'v', 'k', 'q', 'p', 'c', 'n', '"', 'k', 'p', 'u', 'v', 't', 'w', 'o', 'g', 'p', 'v', '"', 'o', 'q', 'u', 'v', '"', 'q', 'h', 'v', 'g', 'p', '"', 'n', 'k', 'p', 'm', 'g', 'f', '"', 'y', 'k', 'v', 'j', '"', 'U', 'e', 'q', 'v', 'n', 'c', 'p', 'f', 'A', '"', '/', '"', 'j', 'v', 'v', 'r', '<', '1', '1', 'd', 'k', 'v', '0', 'n', '{', '1', '3', 'n', 'P', 'f', 't', 'm', '5'], ['C', '"', 'r', 't', 'k', 'e', 'm', 'n', '{', '"', 'r', 't', 'q', 'd', 'n', 'g', 'o', '"', 'U', 'e', 'q', 'v', 'n', 'c', 'p', 'f', ')', 'u', '"', 'p', 'c', 'v', 'k', 'q', 'p', 'c', 'n', '"', 'u', '{', 'o', 'd', 'q', 'n', '"', '/', '"', 'd', 'k', 'v', '0', 'n', '{', '1', '3', 's', '5', 'T', 'G', 'r', 'S'], ['P', 'c', 'o', 'g', '"', 'v', 'j', 'g', '"', 'n', 'c', 't', 'i', 'g', 'u', 'v', '"', 'e', 'k', 'v', '{', '"', 'k', 'p', '"', 'U', 'e', 'q', 'v', 'n', 'c', 'p', 'f', '"', '/', '"', 'd', 'k', 'v', '0', 'n', '{', '1', 'V', '6', 'Q', 'G', 'w', 'W']]
You need to join the inner lists. Use str.join inside the comprehension instead of building a list of characters:
encrypted_text = [''.join(chr(ord(ch) + 2) for ch in string) for string in plain_text]
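For example, with one of the messages from the question (decoding is the same comprehension with -2):
plain_text = ['Name the largest city in Scotland - bit.ly/T4OEuU']
encrypted_text = [''.join(chr(ord(ch) + 2) for ch in s) for s in plain_text]
print(encrypted_text)  # ['Pcog"vjg"nctiguv"ekv{"kp"Ueqvncpf"/"dkv0n{1V6QGwW']
decrypted_text = [''.join(chr(ord(ch) - 2) for ch in s) for s in encrypted_text]
print(decrypted_text)  # ['Name the largest city in Scotland - bit.ly/T4OEuU']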