pyteomics: calculate mass from a file containing the sequence of several proteins

pyteomics: calculate mass from a file containing the sequence of several proteins - python

I want to calculate the mass for every protein of my file.
My code so far:
from pyteomics import mass
with open('file.txt') as f:
for line in f:
mass.calculate_mass(line)
When I replace the mass.calculate_mass with print(line) all the lines are correctly printed. But with mass.calculate_mass(line) come several error messages:
Traceback (most recent call last):
File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/parser.py", line 275, in parse
n, body, c = re.match(_modX_sequence, sequence).groups() AttributeError: 'NoneType' object has no attribute 'groups'
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 304, in __init__
self._from_sequence(args[0], aa_comp) File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 200, in _from_sequence
show_unmodified_termini=True) File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/parser.py", line 277, in parse
raise PyteomicsError('Not a valid modX sequence: ' + sequence) pyteomics.auxiliary.PyteomicsError: Pyteomics error, message: "Not a valid modX sequence: 'MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY'\n"
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 307, in __init__
self._from_formula(args[0], mass_data) File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 205, in _from_formula
raise PyteomicsError('Invalid formula: ' + formula) pyteomics.auxiliary.PyteomicsError: Pyteomics error, message: "Invalid formula: 'MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY'\n"
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/home/michaela/calculatemass.py", line 5, in <module>
mass.calculate_mass(line) File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 499, in calculate_mass
else Composition(*args, **kwargs)) File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 312, in __init__
'formula'.format(args[0])) pyteomics.auxiliary.PyteomicsError: Pyteomics error, message: 'Could not create a Composition object from string: "\'MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY\'\n": not a valid sequence or formula'
my file looks like this:
'MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY'
I also tried it with
sequence='MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY'
There are no empty lines in my file.
If I try the same in the shell, it works:
mass.calculate_mass('MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY')
49589.2790365072
I also tried mass.calculate_mass(str(line)) but that did not help.
Do you know what I am doing wrong?

It looks like your file contains quotes ('). These are interpreted as part of the sequence and break the parser.
If you just put sequences in your file without any additional characters, it should work fine:
MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY
Note that the end-of-line characters don't cause any problems, although that is not documented.

I think you have an issue with "EOL" char ('\n'). Try:
mass.calculate_mass(line.strip())
Does it work?
line.strip() removes leading/trailing white spaces from your line. See strip() for more information.

It worked after changing the file to:
MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY
(without sequence = or quotes)
and by changing the code to:
from pyteomics import mass
with open('file.txt') as f:
for line in f:
print(mass.calculate_mass(line))

Related

With Python 3 and lxml, how to extract the Version number from a SOAP WSDL?

When I test with a subset of the WSDL file, with Name Spaces omitted from the file and code, it works fine.
# for reference, these are the final lines from the WSDL
#
# <wsdl:service name="Shopping">
# <wsdl:documentation>
# <Version>1027</Version>
# </wsdl:documentation>
# <wsdl:port binding="ns:ShoppingBinding" name="Shopping">
# <wsdlsoap:address location="http://open.api.ebay.com/shopping"/>
# </wsdl:port>
# </wsdl:service>
#</wsdl:definitions>
from lxml import etree
wsdl = etree.parse('http://developer.ebay.com/webservices/latest/ShoppingService.wsdl')
print(wsdl.findtext('wsdl:.//Version')) # wish this would print 1027
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 "/Users/matecsaj/Google Drive/Projects/collectibles/eBay/figure-it3.py"
Traceback (most recent call last):
File "src/lxml/_elementpath.py", line 79, in lxml._elementpath.xpath_tokenizer (src/lxml/_elementpath.c:2414)
KeyError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/matecsaj/Google Drive/Projects/collectibles/eBay/figure-it3.py", line 14, in <module>
print(wsdl.findtext('wsdl:.//Version')) # wish this would print 1027
File "src/lxml/etree.pyx", line 2230, in lxml.etree._ElementTree.findtext (src/lxml/etree.c:69049)
File "src/lxml/etree.pyx", line 1552, in lxml.etree._Element.findtext (src/lxml/etree.c:60629)
File "src/lxml/_elementpath.py", line 329, in lxml._elementpath.findtext (src/lxml/_elementpath.c:10089)
File "src/lxml/_elementpath.py", line 311, in lxml._elementpath.find (src/lxml/_elementpath.c:9610)
File "src/lxml/_elementpath.py", line 300, in lxml._elementpath.iterfind (src/lxml/_elementpath.c:9282)
File "src/lxml/_elementpath.py", line 277, in lxml._elementpath._build_path_iterator (src/lxml/_elementpath.c:8675)
File "src/lxml/_elementpath.py", line 82, in xpath_tokenizer (src/lxml/_elementpath.c:2542)
SyntaxError: prefix 'wsdl' not found in prefix map
Process finished with exit code 1

The xml has namespaces defined in it, hence to access the element you need to define the link of the namespace. Please see if the code below helps:
wsdlLink = "http://schemas.xmlsoap.org/wsdl/"
wsdl = etree.parse('http://developer.ebay.com/webservices/latest/ShoppingService.wsdl')
print(wsdl.findtext('{'+wsdlLink+'}//Version'))

With credit to the kind folks that commented, here is a modified solution that does print the Version number. All I could get working was the wildcard search. Also, the iterator skipped the Version element, so I had to get at it from its parent element. Good enough.
from lxml import etree
wsdlLink = "http://schemas.xmlsoap.org/wsdl/"
wsdl = etree.parse('http://developer.ebay.com/webservices/latest/ShoppingService.wsdl')
for element in wsdl.iter('{'+wsdlLink+'}*'):
if 'documentation' in element.tag:
for child in element:
print(child.text)

KeyError when assigning ''praw.Reddit'' to variable

I could successfully connect to reddit's servers with oauth2 some time ago, but when running my script just now, I get a KeyError followed by a NoSectionError. Code is below followed by exceptions, (The code has been reduced to its essentials).
import praw
# Configuration
APP_UA = 'useragent'
...
...
...
r = praw.Reddit(APP_UA)
Error message:
Traceback (most recent call last):
File "D:\Directory\Python\lib\configparser.py", line 843, in items
d.update(self._sections[section])
KeyError: 'useragent'
A NoSectionError occurred when handling the above exception.
"During handling of the above exception, another exception occurred:"
'Traceback (most recent call last):
File "D:\Directory\Python\Projects\myprj for Reddit, globaloffensive\oddshotcrawler.py", line 19, in <module>
r = praw.Reddit(APP_UA)
File "D:\Directory\Python\lib\site-packages\praw\reddit.py", line 84, in __init__
**config_settings)
File "D:\Directory\Python\lib\site-packages\praw\config.py", line 47, in __init__
raw = dict(Config.CONFIG.items(site_name), **settings)
File "D:\Directory\Python\lib\configparser.py", line 846, in items
raise NoSectionError(section)
configparser.NoSectionError: No section: 'useragent'
[Finished in 0.2s]

Try giving it a user_agent kwarg.
r = praw.Reddit(useragent=APP_UA)

Pyglet file name "Resource not found"

I'm trying to open a music track and add it to a queue for a player in Pyglet.
def QueueAudio(self):
self.musicpath=filedialog.askopenfilename()
print(self.musicpath)
Player.queue(pyglet.resource.media(r"self.musicpath"))
The musicpath variable works fine as the print statement prints the filename. The error comes when the player tries to queue the track. Error below.
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Python33\lib\site-packages\pyglet\resource.py", line 605, in media
location = self._index[name]
KeyError: 'self.musicpath'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python33\lib\tkinter\__init__.py", line 1475, in __call__
return self.func(*args)
File "C:\Users\Rob\Google Drive\Coursework\Part 2\music player tests\test5.py", line 99, in QueueAudio
self.playerpath=pyglet.resource.media(r"self.musicpath")
File "C:\Python33\lib\site-packages\pyglet\resource.py", line 615, in media
raise ResourceNotFoundException(name)
pyglet.resource.ResourceNotFoundException: Resource "self.musicpath" was not found on the path. Ensure that the filename has the correct captialisation.
Does anyone know why this is, and what may fix it?

This line looks to be the culprit:
self.playerpath=pyglet.resource.media(r"self.musicpath")
You are passing a string "self.musicpath" to a function, when you should be passing in the contents of a variable named self.musicpath. You need to call it like this:
self.playerpath = pyglet.resource.media(self.musicpath)

Python re.sub not working correctly

trying to get a 'sed replace' function in python working
What I have now
def pysed(filename,search,replace):
for line in fileinput.input(filename,inplace=True):
sys.stdout.write(re.sub(r'{0}','{1}',line.format(search,replace)))
calling the function...
pysed('/dev/shm/FOOD_DES.txt','Butter','NEVER')
File /dev/shm/FOOD_DES.txt contains the following....
~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01002~^~0100~^~Butter, whipped, with salt~^~BUTTER,WHIPPED,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01003~^~0100~^~Butter oil, anhydrous~^~BUTTER OIL,ANHYDROUS~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01004~^~0100~^~Cheese, blue~^~CHEESE,BLUE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01005~^~0100~^~Cheese, brick~^~CHEESE,BRICK~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01006~^~0100~^~Cheese, brie~^~CHEESE,BRIE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01007~^~0100~^~Cheese, camembert~^~CHEESE,CAMEMBERT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01008~^~0100~^~Cheese, caraway~^~CHEESE,CARAWAY~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
~01009~^~0100~^~Cheese, cheddar~^~CHEESE,CHEDDAR~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01010~^~0100~^~Cheese, cheshire~^~CHEESE,CHESHIRE~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
When running this however, I am getting the following error. Any thoughts, ideas?
pysed('/dev/shm/FOOD_DES.txt','Butter','NEVER')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in pysed
File "/usr/lib64/python2.6/re.py", line 151, in sub
return _compile(pattern, 0).sub(repl, string, count)
File "/usr/lib64/python2.6/re.py", line 245, in _compile
raise error, v # invalid expression

You wrote:
sys.stdout.write(re.sub(r'{0}','{1}',line.format(search,replace)))
This is rather confusing. Is this what you meant to write?
sys.stdout.write(re.sub('{0}'.format(search), '{0}'.format(replace), line))
That would of course be equivalent to:
sys.stdout.write(re.sub(search, replace, line))

where is the call to encode the string or force the string to need to be encoded in this file?

I know this may seem rude or mean or unpolite, but I need some help to try to figure out why I cant call window.loadPvmFile("f:\games#DD.ATC3.Root\common\models\a300\amu\dummy.pvm") exactly like that as a string. Instead of doing that, it gives me a traceback error:
Traceback (most recent call last):
File "F:\Python Apps\pvmViewer_v1_1.py", line 415, in <module>
window.loadPvmFile("f:\games\#DD.ATC3.Root\common\models\a300\amu\dummy.pvm")
File "F:\Python Apps\pvmViewer_v1_1.py", line 392, in loadPvmFile
file1 = open(path, "rb")
IOError: [Errno 22] invalid mode ('rb') or filename:
'f:\\games\\#DD.ATC3.Root\\common\\models\x07300\x07mu\\dummy.pvm'
Also notice, that in the traceback error, the file path is different. When I try a path that has no letters in it except for the drive letter and filename, it throws this error:
Traceback (most recent call last):
File "F:\Python Apps\pvmViewer_v1_1.py", line 416, in <module>
loadPvmFile('f:\0\0\dummy.pvm')
File "F:\Python Apps\pvmViewer_v1_1.py", line 393, in loadPvmFile
file1 = open(path, "r")
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
I have searched for the place that the encode function is called or where the argument is encoded and cant find it. Flat out, I am out of ideas, frustrated and I have nowhere else to go. The source code can be found here: PVM VIEWER
Also note that you will not be able to run this code and load a pvm file and that I am using portable python 2.7.3! Thanks for everyone's time and effort!

\a and \0 are escape sequences. Use r'' (or R'') around the string to mark it as a raw string.
window.loadPvmFile(r"f:\games#DD.ATC3.Root\common\models\a300\amu\dummy.pvm")

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

pyteomics: calculate mass from a file containing the sequence of several proteins - python

I think you have an issue with "EOL" char ('\n'). Try: mass.calculate_mass(line.strip()) Does it work? line.strip() removes leading/trailing white spaces from your line. See strip() for more information.

Related

With Python 3 and lxml, how to extract the Version number from a SOAP WSDL?

KeyError when assigning ''praw.Reddit'' to variable

Pyglet file name "Resource not found"

Python re.sub not working correctly

where is the call to encode the string or force the string to need to be encoded in this file?

Categories

Resources