Slow NetworkX graph creation - python

I have to create a graph from a document-term matrix loaded into a pandas DataFrame: nodes are terms, and each edge stores the number of documents in which the two terms appear together.
The code works, but it is really, really slow.
edges = []
edges_attrs = {}
columns = list(dtm.columns)
for key in dtm.columns:
    for key1 in columns:
        # skip the same node
        if key == key1:
            continue
        df = dtm.loc[(dtm[key] != 0) & (dtm[key1] != 0), [key, key1]]
        docs = df.shape[0]
        edges.append((key, key1))
        edges_attrs[(key, key1)] = {'docs': docs}
    # no duplicate edges: (u, v) == (v, u)
    columns.remove(key)
graph.add_edges_from(edges)
nx.set_edge_attributes(graph, edges_attrs)
For a dtm with 2k terms (columns), it takes more than 3 hours, which seems far too long for that size.
Any hints on how to speed it up?

Don't use for loops. Learn about inner and outer joins in databases. An introductory course in SQL would cover these concepts. Applying them to a pandas dataframe is then pretty straightforward:
#!/usr/bin/env python
"""
https://stackoverflow.com/q/62406586/2912349
"""
import numpy as np
import pandas as pd
# simulate some data
x = pd.DataFrame(np.random.normal(0, 1, (4,4)), index=['a', 'b', 'c', 'd'], columns=['e', 'f', 'g', 'h'])
x[:] = x > 0
# e f g h
# a False False True False
# b False False False True
# c True True True True
# d False True True True
sparse = pd.DataFrame(x[x > 0].stack().index.tolist(), columns=['Documents', 'Terms'])
# Documents Terms
# 0 a g
# 1 b h
# 2 c e
# 3 c f
# 4 c g
# 5 c h
# 6 d f
# 7 d g
# 8 d h
cooccurrences = pd.merge(sparse, sparse, how='inner', on='Documents')
# Documents Terms_x Terms_y
# 0 a g g
# 1 b h h
# 2 c e e
# 3 c e f
# 4 c e g
# 5 c e h
# 6 c f e
# 7 c f f
# 8 c f g
# 9 c f h
# 10 c g e
# 11 c g f
# 12 c g g
# 13 c g h
# 14 c h e
# 15 c h f
# 16 c h g
# 17 c h h
# 18 d f f
# 19 d f g
# 20 d f h
# 21 d g f
# 22 d g g
# 23 d g h
# 24 d h f
# 25 d h g
# 26 d h h
# remove self-loops, and keep only one of each symmetric pair (u, v) / (v, u)
valid = cooccurrences['Terms_x'] > cooccurrences['Terms_y']
valid_cooccurrences = cooccurrences[valid]
# Documents Terms_x Terms_y
# 6 c f e
# 10 c g e
# 11 c g f
# 14 c h e
# 15 c h f
# 16 c h g
# 21 d g f
# 24 d h f
# 25 d h g
counts = valid_cooccurrences.groupby(['Terms_x', 'Terms_y']).count()
# Documents
# Terms_x Terms_y
# f e 1
# g e 1
# f 2
# h e 1
# f 2
# g 2
documents = valid_cooccurrences.groupby(['Terms_x', 'Terms_y']).aggregate(lambda x : set(x))
# Documents
# Terms_x Terms_y
# f e {c}
# g e {c}
# f {d, c}
# h e {c}
# f {d, c}
# g {d, c}
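To finish the job, the per-pair counts can be fed straight into NetworkX. A minimal sketch of that last step (the toy counts frame is rebuilt by hand here so the block runs on its own; it matches the groupby output above):

```python
import networkx as nx
import pandas as pd

# rebuild the toy per-pair counts from the answer by hand,
# with the same index/column names as the groupby result
counts = pd.DataFrame(
    {'Documents': [1, 1, 2, 1, 2, 2]},
    index=pd.MultiIndex.from_tuples(
        [('f', 'e'), ('g', 'e'), ('g', 'f'),
         ('h', 'e'), ('h', 'f'), ('h', 'g')],
        names=['Terms_x', 'Terms_y']),
)

# each MultiIndex entry is a (term, term) pair; the value is the edge weight
graph = nx.Graph()
graph.add_edges_from(
    (u, v, {'docs': n}) for (u, v), n in counts['Documents'].items()
)

print(graph.number_of_edges())  # 6
print(graph['h']['f'])          # {'docs': 2}
```

This replaces both `add_edges_from` and `set_edge_attributes` from the question with a single call, since NetworkX accepts (u, v, attr_dict) triples directly.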


Different slices give different inequalities for same elements

import numpy as np
a = np.array([.4], dtype='float32')
b = np.array([.4, .6])
print(a > b)
print(a > b[0], a > b[1])
print(a[0] > b[0], a[0] > b[1])
[ True False]
[False] [False]
True False
What's the deal? Yes, b.dtype == 'float64', but so are its slices b[0] & b[1], and a remains 'float32'.
Note: I'm asking why this occurs, not how to circumvent it, which I know (e.g. cast both to 'float64').
As I've noted in another answer, type casting in numpy is pretty complicated, and this is the root cause of the behaviour you are seeing. The documents linked in that answer make it clear that scalars (/0d arrays) and 1d arrays differ in type conversions, since the latter aren't considered value by value.
You already know the first half of the problem: type conversion happens differently across your cases:
>>> (a + b).dtype
dtype('float64')
>>> (a + b[0]).dtype
dtype('float32')
>>> (a[0] + b[0]).dtype
dtype('float64')
There's also a helper called numpy.result_type() that can tell you the same information without having to perform the binary operation:
>>> np.result_type(a, b)
dtype('float64')
>>> np.result_type(a, b[0])
dtype('float32')
>>> np.result_type(a[0], b[0])
dtype('float64')
I believe we can understand what's happening in your example if we consider the type conversion tables:
>>> from numpy.testing import print_coercion_tables
>>> print_coercion_tables()
can cast
[...]
In these tables, ValueError is '!', OverflowError is '@', TypeError is '#'
scalar + scalar
+ ? b h i l q p B H I L Q P e f d g F D G S U V O M m
? ? b h i l q l B H I L Q L e f d g F D G # # # O ! m
b b b h i l q l h i l d d d e f d g F D G # # # O ! m
h h h h i l q l h i l d d d f f d g F D G # # # O ! m
i i i i i l q l i i l d d d d d d g D D G # # # O ! m
l l l l l l q l l l l d d d d d d g D D G # # # O ! m
q q q q q q q q q q q d d d d d d g D D G # # # O ! m
p l l l l l q l l l l d d d d d d g D D G # # # O ! m
B B h h i l q l B H I L Q L e f d g F D G # # # O ! m
H H i i i l q l H H I L Q L f f d g F D G # # # O ! m
I I l l l l q l I I I L Q L d d d g D D G # # # O ! m
L L d d d d d d L L L L Q L d d d g D D G # # # O ! m
Q Q d d d d d d Q Q Q Q Q Q d d d g D D G # # # O ! m
P L d d d d d d L L L L Q L d d d g D D G # # # O ! m
e e e f d d d d e f d d d d e f d g F D G # # # O ! #
f f f f d d d d f f d d d d f f d g F D G # # # O ! #
d d d d d d d d d d d d d d d d d g D D G # # # O ! #
g g g g g g g g g g g g g g g g g g G G G # # # O ! #
F F F F D D D D F F D D D D F F D G F D G # # # O ! #
D D D D D D D D D D D D D D D D D G D D G # # # O ! #
G G G G G G G G G G G G G G G G G G G G G # # # O ! #
S # # # # # # # # # # # # # # # # # # # # # # # O ! #
U # # # # # # # # # # # # # # # # # # # # # # # O ! #
V # # # # # # # # # # # # # # # # # # # # # # # O ! #
O O O O O O O O O O O O O O O O O O O O O O O O O ! #
M ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
m m m m m m m m m m m m m m # # # # # # # # # # # ! m
scalar + neg scalar
[...]
array + scalar
+ ? b h i l q p B H I L Q P e f d g F D G S U V O M m
? ? b h i l q l B H I L Q L e f d g F D G # # # O ! m
b b b b b b b b b b b b b b e f d g F D G # # # O ! m
h h h h h h h h h h h h h h f f d g F D G # # # O ! m
i i i i i i i i i i i i i i d d d g D D G # # # O ! m
l l l l l l l l l l l l l l d d d g D D G # # # O ! m
q q q q q q q q q q q q q q d d d g D D G # # # O ! m
p l l l l l l l l l l l l l d d d g D D G # # # O ! m
B B B B B B B B B B B B B B e f d g F D G # # # O ! m
H H H H H H H H H H H H H H f f d g F D G # # # O ! m
I I I I I I I I I I I I I I d d d g D D G # # # O ! m
L L L L L L L L L L L L L L d d d g D D G # # # O ! m
Q Q Q Q Q Q Q Q Q Q Q Q Q Q d d d g D D G # # # O ! m
P L L L L L L L L L L L L L d d d g D D G # # # O ! m
e e e e e e e e e e e e e e e e e e F F F # # # O ! #
f f f f f f f f f f f f f f f f f f F F F # # # O ! #
d d d d d d d d d d d d d d d d d d D D D # # # O ! #
g g g g g g g g g g g g g g g g g g G G G # # # O ! #
F F F F F F F F F F F F F F F F F F F F F # # # O ! #
D D D D D D D D D D D D D D D D D D D D D # # # O ! #
G G G G G G G G G G G G G G G G G G G G G # # # O ! #
S # # # # # # # # # # # # # # # # # # # # # # # O ! #
U # # # # # # # # # # # # # # # # # # # # # # # O ! #
V # # # # # # # # # # # # # # # # # # # # # # # O ! #
O O O O O O O O O O O O O O O O O O O O O O O O O ! #
M ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
m m m m m m m m m m m m m m # # # # # # # # # # # ! m
[...]
The above is part of the current promotion tables for value-based promotion. It denotes how differing types contribute to a result type when pairing two numpy objects of a given kind (see the first column and first row for the specific types). The types are to be understood according to the single-character dtype specifications (below "One-character strings"), in particular np.dtype('f') corresponds to np.float32 (f for C-style float) and np.dtype('d') (d for C-style double) to np.float64 (see also np.typename('f') and the same for 'd').
Note two entries in particular in the above tables:
scalar f + scalar d --> d
array f + scalar d --> f
Now let's look at your cases. The premise is that you have an 'f' array a and a 'd' array b. The fact that a only has a single element doesn't matter: it's a 1d array with length 1 rather than a 0d array.
When you do a > b you are comparing two arrays, this is not denoted in the above tables. I'm not sure what the behaviour is here; my guess is that a gets broadcast to b's shape and then its type is cast to 'd'. The reason I think this is that np.can_cast(a, np.float64) is True and np.can_cast(b, np.float32) is False. But this is just a guess, a lot of this machinery in numpy is not intuitive to me.
When you do a > b[0] you are comparing a 'f' array to a 'd' scalar, so according to the above you get a 'f' array. That's what (a + b[0]).dtype told us. (When you use a > b[0] you don't see the conversion step, because the result is always a bool.)
When you do a[0] > b[0] you are comparing a 'f' scalar to a 'd' scalar, so according to the above you get a 'd' scalar. That's what (a[0] + b[0]).dtype told us.
So I believe this is all consistent with the quirks of type conversion in numpy. While it might seem like an unfortunate corner case for the value 0.4 in single versus double precision, the behaviour goes deeper, and it serves as a big red warning to be very careful when mixing different dtypes.
The safest course of action is to convert your types yourself in order to control what happens in your code. Especially since there's discussion about reconsidering some aspects of type promotion.
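A minimal illustration of that advice, using the arrays from the question: casting a to float64 up front makes all three comparisons agree, because float32(0.4) rounds to a value slightly above float64(0.4).

```python
import numpy as np

a = np.array([.4], dtype='float32')
b = np.array([.4, .6])  # float64 by default

# cast explicitly so every comparison happens in float64;
# float32(0.4) is ~0.40000000596, which is > float64(0.4)
a64 = a.astype(np.float64)

print(a64 > b)                        # [ True False]
print(a64 > b[0], a64 > b[1])         # [ True] [False]
print(a64[0] > b[0], a64[0] > b[1])   # True False
```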
As a side note (for now), there's a work-in-progress NEP 50 created in May 2021 that explains how confusing type promotion can be when scalars are involved, and plans to simplify some of the rules eventually. Since this also involves breaking changes, its implementation in NumPy proper won't happen overnight.

How to leave only one defined sub-string in a string in Python

Say I have one of the strings:
"a b c d e f f g" || "a b c f d e f g"
And I want there to be only one occurrence of a substring (f in this instance) throughout the string so that it is somewhat sanitized.
The result of each string would be:
"a b c d e f g" || "a b c d e f g"
An example of the use would be:
str = "a b c d e f g g g g g h i j k l"
str.leaveOne("g")
#// a b c d e f g h i j k l
If it doesn't matter which instance you leave, you can use str.replace, which takes a parameter signifying the number of replacements you want to perform:
def leave_one_last(source, to_remove):
    return source.replace(to_remove, '', source.count(to_remove) - 1)
This will leave the last occurrence.
We can modify it to leave the first occurrence by reversing the string twice:
def leave_one_first(source, to_remove):
    return source[::-1].replace(to_remove, '', source.count(to_remove) - 1)[::-1]
However, that is ugly, not to mention inefficient. A more elegant way might be to take the substring that ends with the first occurrence of the character to find, replace occurrences of it in the rest, and finally concatenate them together:
def leave_one_first_v2(source, to_remove):
    first_index = source.index(to_remove) + 1
    return source[:first_index] + source[first_index:].replace(to_remove, '')
If we try this:
string = "a b c d e f g g g g g h i j k l g"
print(leave_one_last(string, 'g'))
print(leave_one_first(string, 'g'))
print(leave_one_first_v2(string, 'g'))
Output:
a b c d e f h i j k l g
a b c d e f g h i j k l
a b c d e f g h i j k l
If you don't want to keep spaces, then you should use a version based on split:
def leave_one_split(source, to_remove):
    chars = source.split()
    first_index = chars.index(to_remove) + 1
    return ' '.join(chars[:first_index] + [char for char in chars[first_index:] if char != to_remove])
string = "a b c d e f g g g g g h i j k l g"
print(leave_one_split(string, 'g'))
Output:
'a b c d e f g h i j k l'
If I understand correctly, you can just use a regex and re.sub to look for groups of two or more of your letter with or without a space and replace it by a single instance:
import re
def leaveOne(s, char):
    return re.sub(r'((%s\s?)){2,}' % char, r'\1', s)
leaveOne("a b c d e f g g g h i j k l", 'g')
# 'a b c d e f g h i j k l'
leaveOne("a b c d e f ggg h i j k l", 'g')
# 'a b c d e f g h i j k l'
leaveOne("a b c d e f g h i j k l", 'g')
# 'a b c d e f g h i j k l'
EDIT
If the goal is to get rid of all occurrences of the letter except one, you can still use a regex with a lookahead to select all letters followed by the same:
import re
def leaveOne(s, char):
    return re.sub(r'(%s)\s?(?=.*?\1)' % char, '', s)
print(leaveOne("a b c d e f g g g h i j k l g", 'g'))
# 'a b c d e f h i j k l g'
print(leaveOne("a b c d e f ggg h i j k l gg g", 'g'))
# 'a b c d e f h i j k l g'
print(leaveOne("a b c d e f g h i j k l", 'g'))
# 'a b c d e f g h i j k l'
This should even work with more complicated patterns like:
leaveOne("a b c ffff d e ff g", 'ff')
# 'a b c d e ff g'
Given the string:
mystr = 'defghhabbbczasdvakfafj'
cache = {}
seq = 0
for i in mystr:
    if i not in cache:
        cache[i] = seq
        print(cache[i])
    seq += 1
mylist = []
Here I have sorted the dictionary by value:
for key, value in sorted(cache.items(), key=lambda x: x[1]):
    mylist.append(key)
print("".join(mylist))
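For what it's worth, the same first-occurrence de-duplication can be written more compactly: dict keys preserve insertion order in Python 3.7+, so the cache-and-sort steps collapse into one line (a sketch, not the answer above):

```python
mystr = 'defghhabbbczasdvakfafj'

# dict.fromkeys keeps the first occurrence of each character, in order
deduped = ''.join(dict.fromkeys(mystr))
print(deduped)  # defghabczsvkj
```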

Python Duplicating values when appending to array in class

This is my first StackOverflow post, so please tell me if I have done anything wrong!
I am trying to make a card game in python and have been told that using a class based system would be best.
While putting all the cards into the deck, the values seem to get duplicated onto board.cards as well.
#definitions and imports
import random
class Card:
    suit = ""
    base = ""
class Hand:
    cards = []
    poweri = 0
    powerii = 0
class Stack:
    cards = []
#instantiates classes
deck = Stack()
board = Stack()
player = Hand()
dealer = Hand()
#creates ordered empty deck
def newdeck(obj):
    for x in ["2","3","4","5","6","7","8","9","A","B","C","D","E"]:
        for y in ["C","D","H","S"]:
            card = Card()
            card.base = x
            card.suit = y
            obj.cards.append(card)
#shuffles deck
def shuffle():
    random.shuffle(deck.cards)
newdeck(deck)
#disabled to make debug easier
#shuffle()
#prints entire deck
print("\nDeck")
for i in range(len(deck.cards)):
    print(deck.cards[i].base, deck.cards[i].suit)
print(len(deck.cards))
#prints entire board
print("\nBoard")
for i in range(len(board.cards)):
    print(board.cards[i].base, board.cards[i].suit)
The program returns this:
Deck
2 C
2 D
2 H
2 S
3 C
3 D
3 H
3 S
4 C
4 D
4 H
4 S
5 C
5 D
5 H
5 S
6 C
6 D
6 H
6 S
7 C
7 D
7 H
7 S
8 C
8 D
8 H
8 S
9 C
9 D
9 H
9 S
A C
A D
A H
A S
B C
B D
B H
B S
C C
C D
C H
C S
D C
D D
D H
D S
E C
E D
E H
E S
52
Board
2 C
2 D
2 H
2 S
3 C
3 D
3 H
3 S
4 C
4 D
4 H
4 S
5 C
5 D
5 H
5 S
6 C
6 D
6 H
6 S
7 C
7 D
7 H
7 S
8 C
8 D
8 H
8 S
9 C
9 D
9 H
9 S
A C
A D
A H
A S
B C
B D
B H
B S
C C
C D
C H
C S
D C
D D
D H
D S
E C
E D
E H
E S
Process returned 0 (0x0) execution time : 0.314 s
Press any key to continue . . .
Board should be empty???
Regards,
Alex
Stack.cards is a mutable class attribute. That means all instances of the class share a reference to the same list. (The same applies to Hand.cards.)
You probably want each object to have its own data. To get that, create cards inside the __init__() method:
class Stack:
    def __init__(self):
        self.cards = []
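The sharing is easy to demonstrate in isolation (a minimal sketch, separate from the asker's code; the class names here are made up):

```python
class SharedStack:
    cards = []  # class attribute: one list shared by every instance

a, b = SharedStack(), SharedStack()
a.cards.append('2C')
print(b.cards)  # ['2C'] -- b sees a's append, because the list is shared

class FixedStack:
    def __init__(self):
        self.cards = []  # instance attribute: a fresh list per instance

c, d = FixedStack(), FixedStack()
c.cards.append('2C')
print(d.cards)  # [] -- d has its own, still-empty list
```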

regex pattern won't return in python script

Why does the first snippet return digits, but the latter does not? I have tried more complicated expressions without success. The expressions I use are valid according to pythex.org, but do not work in the script.
((\d{6}-){7}\d{6}) is one such expression. I've tested it against this string: 123138-507716-007469-173316-534644-033330-675057-093280
import re
pattern = re.compile(r'(\d{1})')
load_file = open('demo.txt', 'r')
search_file = load_file.read()
result = pattern.findall(search_file)
print(result)
==============
import re
pattern = re.compile(r'(\d{6})')
load_file = open('demo.txt', 'r')
search_file = load_file.read()
result = pattern.findall(search_file)
print(result)
When I put the string into a variable and then search the variable it works just fine. This should work as is. But it doesn't help if I want to read a text file. I've tried to read each line of the file and that seems to be where the script breaks down.
import re
pattern = re.compile(r'((\d{6}-){7})')
#pattern = re.compile('(\d{6})')
#load_file = open('demo.txt', 'r')
#search_file = load_file.read()
test_string = '123138-507716-007469-173316-534644-033330-675057-093280'
result = pattern.findall(test_string)
print(result)
=========
printout,
Search File:
ÿþB i t L o c k e r D r i v e E n c r y p t i o n R e c o v e r y K e y
T h e r e c o v e r y k e y i s u s e d t o r e c o v e r t h e d a t a o n a B i t L o c k e r p r o t e c t e d d r i v e .
T o v e r i f y t h a t t h i s i s t h e c o r r e c t r e c o v e r y k e y c o m p a r e t h e i d e n t i f i c a t i o n w i t h w h a t i s p r e s e n t e d o n t h e r e c o v e r y s c r e e n .
R e c o v e r y k e y i d e n t i f i c a t i o n : f f s d f a - f s d f - s f
F u l l r e c o v e r y k e y i d e n t i f i c a t i o n : 8 8 8 8 8 8 8 8 - 8 8 8 8 - 8 8 8 8 - 8 8 8 8 - 8 8 8 8 8 8 8 8 8 8 8
B i t L o c k e r R e c o v e r y K e y :
1 1 1 1 1 1 - 1 1 1 1 1 1 - 1 1 1 1 1 1 - 1 1 1 1 1 1 - 1 1 1 1 1 1 - 1 1 1 1 1 1 - 1 1 1 1 1 1 - 1 1 1 1 1 1
6 6 6 6 6 6
Search Results:
[]
Process finished with exit code 0
================
This is where I ended up. It finds the string just fine and without the commas.
import re
pattern = re.compile(r'(\w{6}-\w{6}-\w{6}-\w{6}-\w{6}-\w{6}-\w{6}-\w{6})')
load_file = open('demo3.txt', 'r')
for line in load_file:
    print(pattern.findall(line))
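As for why read() found nothing in the first place: the ÿþ prefix and the spaces between every character in the printout are classic signs of a UTF-16 file being read with the wrong encoding, so each digit is separated by NUL bytes and \d{6} can never match. A sketch of that hypothesis (the file name and contents below are made up for the demonstration):

```python
import re

# write a made-up UTF-16 sample resembling the asker's BitLocker dump
with open('demo_utf16.txt', 'w', encoding='utf-16') as f:
    f.write('BitLocker Recovery Key:\n'
            '123138-507716-007469-173316-534644-033330-675057-093280\n')

# reading with the right encoding restores plain characters,
# so the original digit-based pattern matches again
with open('demo_utf16.txt', 'r', encoding='utf-16') as f:
    text = f.read()

keys = [m.group(0) for m in re.finditer(r'\d{6}(?:-\d{6}){7}', text)]
print(keys)  # ['123138-507716-007469-173316-534644-033330-675057-093280']
```

The \w-based workaround happens to match because \w also matches the stray characters, but fixing the encoding addresses the root cause.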

Is there anyway to copy the data from a single column in a Python pandas dataframe into a string or list for further processing?

I am trying to iterate over a pandas DataFrame column by column. While it is easy to get Python to print out a whole column, I cannot work out how to turn that column of data into a list or string so I can actually use it (in this case, by concatenating the data and copying it into a FASTA file). My code is below. Any suggestions would be greatly appreciated.
import sys
import string
import shlex
import numpy as np
import pandas as pd
SNP_df = pd.read_csv('SNPs.txt',sep='\t',index_col = None ,header = None, nrows = 101)
output = open('100 SNPs.fa','a')
i = 1
for i in SNP_df[i]:
    data = SNP_df[i]
    data = shlex.shlex(data, posix=True)
    data.whitespace += "\n"
    data.whitespace_split = True
    data = list(data)
    for j in data:
        if j == 0:
            output.write(("\n>%s\n") % (str(data(j))))
        else:
            output.write(data(j))
Here are the first few lines of my data file:
POSITION REF AR_DM1005 AR_DM1015 AR_DM1050 AR_DM1056 AR_DM1088 AR_KB635 AR_KB652 AR_KB754 AR_KB819 AR_KB820 AR_KB827 AR_KB945 AR_MSH126 AR_MSH51 PP_BdA1134-13 PP_BdA1137-10 PP_DM1038 PP_DM1049 PP_DM1054 PP_DM1065 PP_DM1081 PP_DM1084 PP_JR83 ST_JR138 ST_JR158 ST_JR209 ST_JR72 ST_JR84 ST_JR91 ST_MSH177 ST_MSH217 CH_JR198 CH_JR20 CH_JR272 CH_JR356 CH_JR377 CH_KB888 CH_MSH202 TL_MA1959 TL_MSH130 TL_SCI12-2 TL_SPE123_2-3 TL_SPE123_5-1 TL_SPE123_6-3 TL_SPE123_7-1 TL_SPE123_8-1 CU_SPE123_1-2 CU_SPE123_4-1 Dmir_SP138
55 C T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T C
380 G G A A G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G A G G G G G G G G G
391 A A G A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
402 G A A A A G A A A A A A A A A G A A A A A A A A A A A A A A A A A A A G A A A G A A A A G A A A A G
422 A C C C C C C C C C C C C C C A A C C C C C C C C C C C C C C C C C C A C C C A C C C C A C C C C A
564 G G G G G G G G G G G G G G G G G G G G G G G G A A G G G G G G A G G G G G G G G G G G G G G G G G
Just use numpy! You can convert a Series (a single DataFrame column) into a 1D numpy array easily (note the indexing is data[j], not a call data(j)):
import numpy as np
for i in SNP_df:
    data = SNP_df[i]
    data = np.array(data)
    for j in data:
        if j == 0:
            output.write(("\n>%s\n") % (str(data[j])))
        else:
            output.write(data[j])
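pandas can also do the conversion itself, without going through np.array: Series.tolist() and Series.to_numpy() are built in. A sketch on toy data (the column name is borrowed from the example; the values are made up):

```python
import pandas as pd

# stand-in for one column of SNP_df
s = pd.Series(['T', 'G', 'A', 'A', 'C', 'G'], name='AR_DM1005')

arr = s.to_numpy()     # 1D numpy array
lst = s.tolist()       # plain Python list
joined = ''.join(lst)  # concatenate the bases into one string
print(joined)          # TGAACG
```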
Using your example data. Note that due to copy&paste the tabs became whitespace (hence sep='\s+' instead of '\t'), and I have set the first row of the data as the column names (so not using header=None). Concatenating one column into a string can then be done with join:
In [20]: from StringIO import StringIO
In [21]: data = """\
....: POSITION REF AR_DM1005 AR_DM1015 AR_DM1050 AR_DM1056 AR_DM1088 AR_KB635 AR_KB652 AR_KB754 AR_KB819 AR_KB820 AR_KB827 AR_KB945 AR_MSH126 AR_MSH51 PP_BdA1134-13 PP_BdA1137-10 PP_DM1038 PP_DM1049 PP_DM1054 PP_DM1065 PP_DM1081 PP_DM1084 PP_JR83 ST_JR138 ST_JR158 ST_JR209 ST_JR72 ST_JR84 ST_JR91 ST_MSH177 ST_MSH217 CH_JR198 CH_JR20 CH_JR272 CH_JR356 CH_JR377 CH_KB888 CH_MSH202 TL_MA1959 TL_MSH130 TL_SCI12-2 TL_SPE123_2-3 TL_SPE123_5-1 TL_SPE123_6-3 TL_SPE123_7-1 TL_SPE123_8-1 CU_SPE123_1-2 CU_SPE123_4-1 Dmir_SP138
....: 55 C T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T C
....: 380 G G A A G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G A G G G G G G G G G
....: 391 A A G A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
....: 402 G A A A A G A A A A A A A A A G A A A A A A A A A A A A A A A A A A A G A A A G A A A A G A A A A G
....: 422 A C C C C C C C C C C C C C C A A C C C C C C C C C C C C C C C C C C A C C C A C C C C A C C C C A
....: 564 G G G G G G G G G G G G G G G G G G G G G G G G A A G G G G G G A G G G G G G G G G G G G G G G G G
....: """
In [22]: import pandas as pd
In [23]: SNP_df = pd.read_csv(StringIO(data), sep='\s+', index_col=None, nrows=101)
In [24]: SNP_df['AR_DM1005']
Out[24]:
0 T
1 G
2 A
3 A
4 C
5 G
Name: AR_DM1005, dtype: object
In [25]: ''.join(SNP_df['AR_DM1005'])
Out[25]: 'TGAACG'
