String not in string - python

So I have two strings
a = "abc" and b = "ab"
Now I am trying to find the characters in a which are not present in b
The code that I have is :
for element in t:
    if element not in s:
        print(element)
This is giving some error for large strings. I have not looked into that error yet, but I was wondering whether another way to do the same thing would be something like:
if a not in b:
    # further code to identify the element that is not in string b
The piece of code above gives me False when I run it, I don't know how to identify the element which is not present in the second string.
How do I go about this?

This is the sort of thing that a set is really good for:
>>> a = "abc"
>>> b = "abd"
>>> set(a).difference(b)
{'c'}
This gives you items in a that aren't in b. If you want the items that only appear in one or the other, you can use symmetric_difference:
>>> a = "abc"
>>> b = "abd"
>>> set(a).symmetric_difference(b)
{'c', 'd'}
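The same two operations are also available as set operators. A minimal sketch; note that, unlike the methods (which accept any iterable), the operators require both operands to be sets, so b has to be converted too:

```python
a = "abc"
b = "abd"

# Characters in a but not in b (same result as set(a).difference(b))
only_in_a = set(a) - set(b)

# Characters in exactly one of the two strings (same as symmetric_difference)
in_one_only = set(a) ^ set(b)

print(only_in_a)            # {'c'}
print(sorted(in_one_only))  # ['c', 'd']
```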
Note that your code should work too given proper inputs:
>>> for element in a:
...     if element not in b:
...         print(element)
...
c
However, if you're dealing with large sequences, this is much less efficient (each "not in b" check scans all of b, so the loop does roughly len(a) × len(b) comparisons), and it's more code to write, so I don't really recommend it.
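If you need the result in the original order of a (or want to keep repeated characters), a middle ground is to build a set from b once and then loop; each membership test is then an average constant-time hash lookup instead of a scan of b. A sketch; the function name chars_not_in is just illustrative:

```python
def chars_not_in(a, b):
    """Return the characters of a that do not occur in b, in their original order."""
    b_chars = set(b)  # build the lookup set once: O(len(b))
    # each "not in" on a set is an average O(1) hash lookup
    return [ch for ch in a if ch not in b_chars]

print(chars_not_in("abc", "ab"))     # ['c']
print(chars_not_in("abcabc", "ab"))  # ['c', 'c']
```

Unlike set(a).difference(b), this keeps duplicates and order, at the cost of building a list.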

Related

How do you convert a list of strings to separate strings in Python 3?

Suppose you have a list of strings such as:
l = ['ACGAAAG', 'CAGAAGC', 'ACCTGTT']
How do you convert it to:
O = 'ACGAAAG'
P = 'CAGAAGC'
Q = 'ACCTGTT'
Can you do this without knowing the number of items in a list? You have to store them as variables.
(The variables don't matter.)
Welcome to SE!
Structure Known
If you know the structure of the list, then you can simply unpack it:
O, P, Q = my_list
Structure Unknown
Iterate over your list using a for loop, and do your work on each string inside the loop. Below, I am simply printing each one:
for element in l:
    print(element)
Good luck!
If you don't know the number of items beforehand, a list is the right structure to keep the items in.
You can, though, cut off the first few known items and leave the unknown tail as a list:
a, b, *rest = ["ay", "bee", "see", "what", "remains"]
print("%r, %r, rest is %r" % (a, b, rest))
a,b,c = my_list
This will work as long as the number of elements in the list is equal to the number of variables you want to unpack. It actually works with any iterable: tuple, list, set, etc. (though a set has no guaranteed order).
If the list is longer, you can always access the first 3 elements, if that is what you want:
a = my_list[0]
b = my_list[1]
c = my_list[2]
or in one line
a, b, c = my_list[0], my_list[1], my_list[2]
Even better, with slice notation you can get a sublist with just the first 3 elements:
a, b, c = my_list[:3]
Those work as long as the list has at least 3 elements (or at least as many as the number of variables you want).
You can also use the extended unpacking notation:
a, b, c, *the_rest = my_list
the_rest will be a list with everything in the list other than the first 3 elements; again, the list needs at least 3 elements.
And that pretty much covers all the ways to extract a certain number of items.
Now, depending on what you are going to do with those, you may be better off with a regular loop:
for item in my_list:
    # do something with the current item, like printing it
    print(item)
In each iteration, item takes the value of one element in the list, so you can do what you need one item at a time.
If what you want is to take 3 items at a time in each iteration, there are several ways to do it,
for example:
for i in range(3, len(my_list) + 1, 3):
    a, b, c = my_list[i-3:i]
    print(a, b, c)
There are more fun constructs, like:
it = [iter(my_list)] * 3
for a, b, c in zip(*it):
    print(a, b, c)
and others using the itertools module.
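The zip(*it) trick stops at the shortest group, so leftover items are silently dropped when the length isn't a multiple of 3. A sketch using itertools.zip_longest, which pads the last group instead (the fill value None and the sample list are arbitrary choices for illustration):

```python
from itertools import zip_longest

my_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# Three references to the *same* iterator, so each tuple pulls
# three consecutive items; missing slots in the last tuple become None.
it = [iter(my_list)] * 3
groups = list(zip_longest(*it, fillvalue=None))

for a, b, c in groups:
    print(a, b, c)
# a b c
# d e f
# g None None
```

On Python 3.12 and later, itertools.batched(my_list, 3) also yields tuples of 3, but gives a shorter final tuple instead of padding.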
But you said something interesting, "so that every term is assigned to a variable", and that is the wrong approach: you don't want an unknown number of variables running around, because that gets messy very fast. Work with the list instead. If you want to do some work with each element, there are plenty of ways of doing it, like a list comprehension:
my_new_list = [ some_fun(x) for x in my_list ]
or the old way:
my_new_list = []
for x in my_list:
    my_new_list.append(some_fun(x))
or, if you need to work with more than 1 item at a time, combine that with some of the above.
I do not know if your use case requires the strings to be stored in different variables. It usually is a bad idea.
But if you do need it, then you can use the exec builtin, which takes the string representation of a Python statement and executes it.
list_of_strings = ['ACGAAAG', 'CAGAAGC', 'ACCTGTT']
# Dynamically generate variable names equivalent to the column names in an Excel sheet
# (A, B, C, ..., Z, AA, AB, ..., AAA, ...); in this specific case:
variable_names = ['A', 'B', 'C']
for vn, st in zip(variable_names, list_of_strings):
    # {!r} inserts repr(st), so the string is quoted correctly even if it contains quotes
    exec('{} = {!r}'.format(vn, st))
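Test it out: print(A, B, C) will output the three strings, and you can use A, B and C as variables in the rest of the program. (This works at module level; inside a function, exec cannot reliably create new local variables.)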
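If the point of the names is just to look the strings up later, a dict keyed by those names gives the same access without exec. A sketch that also generates the Excel-style names the comment above describes; the helper name excel_column_name is hypothetical, and it treats the names as bijective base-26 (A..Z, then AA, AB, ...):

```python
from string import ascii_uppercase

def excel_column_name(n):
    """Return the Excel-style column name for a 0-based index: 0 -> A, 25 -> Z, 26 -> AA."""
    name = ""
    n += 1  # bijective base-26 is easiest to compute from a 1-based index
    while n > 0:
        n, rem = divmod(n - 1, 26)
        name = ascii_uppercase[rem] + name
    return name

list_of_strings = ['ACGAAAG', 'CAGAAGC', 'ACCTGTT']

# Store each string under its generated name instead of creating real variables.
named = {excel_column_name(i): s for i, s in enumerate(list_of_strings)}
print(named['A'], named['B'], named['C'])  # ACGAAAG CAGAAGC ACCTGTT
```

This works the same inside a function and with any number of strings, which is the usual reason to prefer a dict over dynamically created variables.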

Python disable list string "breaking"

Is there a way to stop list() from "breaking" a string into characters? For example:
>>> a = "foo"
>>> b = list()
>>> b.append(list(a))
>>> b
[['f', 'o', 'o']]
Is there a way to have a list inside of a list with string that is not "broken", for example [["foo"],["bar"]]?
Very easy:
>>> a = "foo"
>>> b = list()
>>> b.append([a])
>>> b
[['foo']]
Do this:
>>> a = "foo"
>>> b = list()
>>> b.append([a])
>>> b
[['foo']]
The reason this happens is that the list function works by taking each element of the sequence you pass it and putting them in a list. A string in Python is a sequence, the elements of the sequence are the individual characters.
Having this abstract concept of a "sequence" means that a lot of Python functions can work on multiple data types, as long as they accept a sequence. Once you get used to this idea, hopefully you'll start finding this concept more useful than surprising.
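A short sketch of the difference between passing a string to list() and putting it inside a list literal, plus one way to build the nested structure [["foo"], ["bar"]] from the question (the words list is just sample data):

```python
a = "foo"

chars = list(a)                    # list() iterates the string: ['f', 'o', 'o']
wrapped = [a]                      # a list literal holding the whole string: ['foo']
from_tuple = list(("foo", "bar"))  # iterating a tuple yields whole strings: ['foo', 'bar']

# The nested structure from the question: wrap each string in its own list.
words = ["foo", "bar"]
nested = [[w] for w in words]      # [['foo'], ['bar']]

print(chars, wrapped, from_tuple, nested)
```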
You sound like you want to break on word boundaries instead of on each letter.
Try something like
a = "foo bar"
b = list()
b.append(a.split(' ')) # [['foo', 'bar']]
Example with a regex (to support multiple consecutive spaces):
import re
a = "foo   bar"
b = list()
b.append(re.split(r'\s+', a))  # [['foo', 'bar']]
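For plain whitespace, str.split() with no argument already splits on runs of spaces, tabs and newlines and ignores leading and trailing whitespace, so the regex isn't strictly needed. A sketch comparing the three calls on a deliberately messy sample string:

```python
import re

a = "  foo   bar\tbaz "

by_single_space = a.split(' ')   # empty strings appear between consecutive spaces
by_whitespace = a.split()        # ['foo', 'bar', 'baz']
by_regex = re.split(r'\s+', a)   # ['', 'foo', 'bar', 'baz', '']: edges leave empty strings

print(by_single_space)
print(by_whitespace)
print(by_regex)
```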

program to extract every alternate letters from a string in python?

Python programs are often short and concise, and what usually requires a bunch of lines in other programming languages (that I know of) can be accomplished in a line or two in Python.
One such program I am trying to write extracts every other letter from a string.
I have this working code, but I'm wondering whether a more concise way is possible:
>>> s
'abcdefg'
>>> b = ""
>>> for i in range(len(s)):
...     if (i%2)==0:
...         b+=s[i]
...
>>> b
'aceg'
>>>
>>> 'abcdefg'[::2]
'aceg'
Use slice notation (see "Explain Python's slice notation"):
>>> 'abcdefg'[::2]
'aceg'
>>>
The format for slice notation is [start:stop:step]. So, [::2] is telling Python to step through the string by 2's (which will return every other character).
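A few variations of [start:stop:step] on the same string, to show how each part changes the result:

```python
s = 'abcdefg'

every_other = s[::2]     # aceg: every 2nd char from the start
odd_indices = s[1::2]    # bdf: every 2nd char starting at index 1
bounded = s[1:6:2]       # bdf: indices 1, 3, 5 (stop index 6 is excluded)
every_third = s[::3]     # adg: every 3rd char
backwards = s[::-1]      # gfedcba: a negative step walks backwards

print(every_other, odd_indices, bounded, every_third, backwards)
```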
The right way to do this is to just slice the string, as in the other answers.
But if you want a more concise way to write your code, which will work for similar problems that aren't as simple as slicing, there are two tricks: comprehensions, and the enumerate function.
First, this loop:
for i in range(len(foo)):
    value = foo[i]
    ...  # do something with value and i
… can be written as:
for i, value in enumerate(foo):
    ...  # do something with value and i
So, in your case:
for i, c in enumerate(s):
    if (i%2)==0:
        b+=c
Next, any loop that starts with an empty object, goes through an iterable (string, list, iterator, etc.), and puts values into a new iterable, possibly running the values through an if filter or an expression that transforms them, can be turned into a comprehension very easily.
While Python has comprehensions for lists, sets, dicts, and iterators, it doesn't have comprehensions for strings—but str.join solves that.
So, putting it together:
b = "".join(c for i, c in enumerate(s) if i%2 == 0)
Not nearly as concise or readable as b = s[::2]… but a lot better than what you started with—and the same idea works when you want to do more complicated things, like if i%2 and i%3 (which doesn't map to any obvious slice), or doubling each letter with c*2 (which could be done by zipping together two slices, but that's not immediately obvious), etc.
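A sketch of the two "more complicated" cases mentioned above, using the same join-plus-generator pattern on the question's sample string:

```python
s = 'abcdefg'

# Keep characters whose index is divisible by neither 2 nor 3
# (no single slice expresses this); here that is indices 1 and 5.
neither = "".join(c for i, c in enumerate(s) if i % 2 and i % 3)

# Transform instead of filter: double every letter.
doubled = "".join(c * 2 for c in s)

print(neither)  # bf
print(doubled)  # aabbccddeeffgg
```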
Here is another example both for list and string:
sentence = "The quick brown fox jumped over the lazy dog."
sentence[::2]
Here we are saying: Take the entire string from the beginning to the end and return every 2nd character.
Would return the following:
'Teqikbonfxjme vrtelz o.'
You can do the same for a list:
colors = ["red", "orange", "yellow", "green", "blue"]
colors[1:4]
would return:
['orange', 'yellow', 'green']
The way I read a slice such as colors[1:4]:
Start at index 1 (remember the starting position is index 0) and stop BEFORE index 4.
You could try using slice and join:
>>> k = list(s)
>>> "".join(k[::2])
'aceg'
Practically, slicing is the best way to go. However, there are also ways you could improve your existing code, not by making it shorter, but by making it more Pythonic:
>>> s
'abcdefg'
>>> b = []
>>> for index, value in enumerate(s):
...     if index % 2 == 0:
...         b.append(value)
...
>>> b = "".join(b)
or even better:
>>> b = "".join(value for index, value in enumerate(s) if index % 2 == 0)
This can be easily extended to more complicated conditions:
>>> b = "".join(value for index, value in enumerate(s) if index % 2 == index % 3 == 0)

Get the first character of the first string in a list?

How would I get the first character from the first string in a list in Python?
It seems that I could use mylist[0][1:] but that does not give me the first character.
>>> mylist = []
>>> mylist.append("asdf")
>>> mylist.append("jkl;")
>>> mylist[0][1:]
'sdf'
You almost had it right. The simplest way is
mylist[0][0] # get the first character from the first item in the list
but
mylist[0][:1] # get up to the first character in the first item in the list
would also work.
You want to end after the first character (character zero), not start after the first character (character zero), which is what the code in your question means.
Get the first character of a bare python string:
>>> mystring = "hello"
>>> print(mystring[0])
h
>>> print(mystring[:1])
h
>>> print(mystring[3])
l
>>> print(mystring[-1])
o
>>> print(mystring[2:3])
l
>>> print(mystring[2:4])
ll
Get the first character from a string in the first position of a python list:
>>> myarray = []
>>> myarray.append("blah")
>>> myarray[0][:1]
'b'
>>> myarray[0][-1]
'h'
>>> myarray[0][1:3]
'la'
NumPy operations are very different from Python list operations.
Python has list slicing, indexing and subsetting. Numpy has masking, slicing, subsetting, indexing.
These two videos cleared things up for me.
"Losing your Loops, Fast Numerical Computing with NumPy" by PyCon 2015:
https://youtu.be/EEUXKG97YRw?t=22m22s
"NumPy Beginner | SciPy 2016 Tutorial" by Alexandre Chabot LeClerc:
https://youtu.be/gtejJ3RCddE?t=1h24m54s
Indexing in Python starts from 0. You wrote [1:], which would not return the first character in any case: it returns the rest of the string (everything except the first character).
If you have the following structure:
mylist = ['base', 'sample', 'test']
and want to get the first char of the first string (item):
mylist[0][0]  # 'b'
If you want all the first chars:
[x[0] for x in mylist]  # ['b', 's', 't']
If you have a text:
text = 'base sample test'
text.split()[0][0]  # 'b'
Try mylist[0][0]. This should return the first character.
If your list includes non-strings, e.g. mylist = [0, [1, 's'], 'string'], then the answers on here would not necessarily work. In that case, using next() to find the first string by checking for them via isinstance() would do the trick.
next(e for e in mylist if isinstance(e, str))[:1]
Note that ''[:1] returns '' while ''[0] raises IndexError, so depending on the use case, either could be useful.
The above results in StopIteration if there are no strings in mylist. In that case, one possible implementation is to set the default value to None and take the first character only if a string was found.
first = next((e for e in mylist if isinstance(e, str)), None)
first_char = first[0] if first else None

python sort strings with digits at the end

What is the easiest way to sort a list of strings with digits at the end, where some have 3 digits and some have 4?
>>> data = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> data.sort()
>>> print(data)
['asdf111', 'asdf123', 'asdf1234', 'asdf124']
It should put the 1234 one at the end. Is there an easy way to do this?
is there an easy way to do this?
Yes
You can use the natsort module.
>>> from natsort import natsorted
>>> natsorted(['asdf123', 'asdf1234', 'asdf111', 'asdf124'])
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
Full disclosure, I am the package's author.
is there an easy way to do this?
No
It's entirely unclear what the real rules are. "Some have 3 digits and some have 4" isn't a very precise or complete specification. All your examples show 4 letters in front of the digits. Is this always true?
import re
key_pat = re.compile(r"^(\D+)(\d+)$")
def key(item):
    m = key_pat.match(item)
    return m.group(1), int(m.group(2))
That key function might do what you want. Or it might be too complex. Or maybe the pattern is really r"^(.*)(\d{3,4})$" or maybe the rules are even more obscure.
>>> data= ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> data.sort( key=key )
>>> data
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
What you're probably describing is called a Natural Sort, or a Human Sort. If you're using Python, you can borrow from Ned's implementation.
The algorithm for a natural sort is approximately as follows:
- Split each value into alphabetical "chunks" and numerical "chunks"
- Sort by the first chunk of each value
  - If the chunk is alphabetical, sort it as usual
  - If the chunk is numerical, sort by the numerical value represented
- Take the values that have the same first chunk and sort them by the second chunk
- And so on
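The steps above can be sketched as a key function. This is a minimal version under two assumptions: digit runs compare as integers, and text chunks compare case-insensitively (the lower() call; drop it for case-sensitive order). The name natural_key is illustrative:

```python
import re

def natural_key(text):
    """Split text into alternating text and number chunks for natural sorting.

    re.split with a capturing group always puts the digit runs at odd
    positions, so every key alternates str, int, str, ... and Python
    never has to compare a str with an int.
    """
    parts = re.split(r'(\d+)', text)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

data = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
print(sorted(data, key=natural_key))
# ['asdf111', 'asdf123', 'asdf124', 'asdf1234']
```

Because list keys compare element by element, this is exactly "sort by the first chunk, then the second, and so on".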
l = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
l.sort(key=lambda x: int(x[4:]))  # assumes a 4-character prefix; cmp= no longer exists in Python 3
You need a key function. You're willing to specify 3 or 4 digits at the end and I have a feeling that you want them to compare numerically.
sorted(list_, key=lambda s: (s[:-4], int(s[-4:])) if s[-4] in '0123456789' else (s[:-3], int(s[-3:])))
Without the lambda and conditional expression that's
def key(s):
    if s[-4] in '0123456789':
        return (s[:-4], int(s[-4:]))
    else:
        return (s[:-3], int(s[-3:]))
sorted(list_, key=key)
This just takes advantage of the fact that tuples sort by the first element, then the second. So because the key function is called to get a value to compare, the elements will now be compared like the tuples returned by the key function. For example, 'asdfbad123' will compare to 'asd7890' as ('asdfbad', 123) compares to ('asd', 7890). If the last 3 characters of a string aren't in fact digits, you'll get a ValueError which is perfectly appropriate given the fact that you passed it data that doesn't fit the specs it was designed for.
The issue is that the sorting here is alphabetical, since they are strings: strings are compared character by character, and the first differing character decides the order.
>>> 'a1234' < 'a124'  # positionally, '3' is less than '4'
True
>>>
You will need to do numeric sorting to get the desired output.
>>> x = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> y = [ int(t[4:]) for t in x]
>>> z = sorted(y)
>>> z
[111, 123, 124, 1234]
>>> l = ['asdf'+str(t) for t in z]
>>> l
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
>>>
L.sort(key=lambda s:int(''.join(filter(str.isdigit,s[-4:]))))
Rather than splitting each line myself, I ask Python to do it for me with re.findall():
import re
import sys
def SortKey(line):
    result = []
    for part in re.findall(r'\D+|\d+', line):
        try:
            result.append(int(part, 10))
        except (TypeError, ValueError):
            result.append(part)
    # note: in Python 3, comparing a line that starts with digits against one that starts with letters raises TypeError (int vs str)
    return result
print(''.join(sorted(sys.stdin.readlines(), key=SortKey)), end='')
