Python re.sub not working correctly

Python re.sub not working correctly - python

trying to get a 'sed replace' function in python working
What I have now
def pysed(filename,search,replace):
for line in fileinput.input(filename,inplace=True):
sys.stdout.write(re.sub(r'{0}','{1}',line.format(search,replace)))
calling the function...
pysed('/dev/shm/FOOD_DES.txt','Butter','NEVER')
File /dev/shm/FOOD_DES.txt contains the following....
~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01002~^~0100~^~Butter, whipped, with salt~^~BUTTER,WHIPPED,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01003~^~0100~^~Butter oil, anhydrous~^~BUTTER OIL,ANHYDROUS~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01004~^~0100~^~Cheese, blue~^~CHEESE,BLUE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01005~^~0100~^~Cheese, brick~^~CHEESE,BRICK~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01006~^~0100~^~Cheese, brie~^~CHEESE,BRIE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01007~^~0100~^~Cheese, camembert~^~CHEESE,CAMEMBERT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01008~^~0100~^~Cheese, caraway~^~CHEESE,CARAWAY~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
~01009~^~0100~^~Cheese, cheddar~^~CHEESE,CHEDDAR~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01010~^~0100~^~Cheese, cheshire~^~CHEESE,CHESHIRE~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
When running this however, I am getting the following error. Any thoughts, ideas?
pysed('/dev/shm/FOOD_DES.txt','Butter','NEVER')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in pysed
File "/usr/lib64/python2.6/re.py", line 151, in sub
return _compile(pattern, 0).sub(repl, string, count)
File "/usr/lib64/python2.6/re.py", line 245, in _compile
raise error, v # invalid expression

You wrote:
sys.stdout.write(re.sub(r'{0}','{1}',line.format(search,replace)))
This is rather confusing. Is this what you meant to write?
sys.stdout.write(re.sub('{0}'.format(search), '{0}'.format(replace), line))
That would of course be equivalent to:
sys.stdout.write(re.sub(search, replace, line))

Related

pyteomics: calculate mass from a file containing the sequence of several proteins

I want to calculate the mass for every protein of my file.
My code so far:
from pyteomics import mass
with open('file.txt') as f:
for line in f:
mass.calculate_mass(line)
When I replace the mass.calculate_mass with print(line) all the lines are correctly printed. But with mass.calculate_mass(line) come several error messages:
Traceback (most recent call last):
File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/parser.py", line 275, in parse
n, body, c = re.match(_modX_sequence, sequence).groups() AttributeError: 'NoneType' object has no attribute 'groups'
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 304, in __init__
self._from_sequence(args[0], aa_comp) File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 200, in _from_sequence
show_unmodified_termini=True) File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/parser.py", line 277, in parse
raise PyteomicsError('Not a valid modX sequence: ' + sequence) pyteomics.auxiliary.PyteomicsError: Pyteomics error, message: "Not a valid modX sequence: 'MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY'\n"
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 307, in __init__
self._from_formula(args[0], mass_data) File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 205, in _from_formula
raise PyteomicsError('Invalid formula: ' + formula) pyteomics.auxiliary.PyteomicsError: Pyteomics error, message: "Invalid formula: 'MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY'\n"
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/home/michaela/calculatemass.py", line 5, in <module>
mass.calculate_mass(line) File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 499, in calculate_mass
else Composition(*args, **kwargs)) File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 312, in __init__
'formula'.format(args[0])) pyteomics.auxiliary.PyteomicsError: Pyteomics error, message: 'Could not create a Composition object from string: "\'MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY\'\n": not a valid sequence or formula'
my file looks like this:
'MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY'
I also tried it with
sequence='MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY'
There are no empty lines in my file.
If I try the same in the shell, it works:
mass.calculate_mass('MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY')
49589.2790365072
I also tried mass.calculate_mass(str(line)) but that did not help.
Do you know what I am doing wrong?

It looks like your file contains quotes ('). These are interpreted as part of the sequence and break the parser.
If you just put sequences in your file without any additional characters, it should work fine:
MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY
Note that the end-of-line characters don't cause any problems, although that is not documented.

I think you have an issue with "EOL" char ('\n'). Try:
mass.calculate_mass(line.strip())
Does it work?
line.strip() removes leading/trailing white spaces from your line. See strip() for more information.

It worked after changing the file to:
MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY
(without sequence = or quotes)
and by changing the code to:
from pyteomics import mass
with open('file.txt') as f:
for line in f:
print(mass.calculate_mass(line))

While calling simplify in sympy getting error?

When my python code tried to use simplify it shows following error. This problem showed after i run separate code file of pyparsing(Which execute successfully). The same code is working fine before.
Edit:
>>> expression="a+b+z"
>>> t=simplify(expression)
ast.py:4: SyntaxWarning: invalid pattern (**) passed to Regex
operator = pp.Regex("**").setName("operator")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\sympy\simplify\simplify.py", line 507, in simplify
expr = sympify(expr)
File "C:\Python27\lib\site-packages\sympy\core\sympify.py", line 308, in sympify
from sympy.parsing.sympy_parser import (parse_expr, TokenError,
File "C:\Python27\lib\site-packages\sympy\parsing\sympy_parser.py", line 11, in <module>
import ast
File "ast.py", line 4, in <module>
operator = pp.Regex("**").setName("operator")
File "C:\Python27\lib\site-packages\pyparsing.py", line 1920, in __init__
self.re = re.compile(self.pattern, self.flags)
File "C:\Python27\Lib\re.py", line 190, in compile
return _compile(pattern, flags)
File "C:\Python27\Lib\re.py", line 244, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
Please suggest?

You have a local file, ast.py, which is getting imported in place of Python's built-in ast module. You should remove or rename this file to avoid the name conflict, as this can cause other modules to not work correctly.
Additionally, your local module contains the following line, which is causing an exception on import:
operator = pp.Regex("**").setName("operator")
** is not a valid regular expression. In a regular expression, * means "0 or more repetitions of the preceding expression", which doesn't make sense at the beginning of an expression because there is "nothing to repeat" (as the error message says).

lxml xpath filter with OR condition not working

The langs object is a lxml object produced by parsing this file: http://www.loc.gov/standards/codelists/languages.xml
This xpath works:
langs.node.xpath("//lang:language[lang:name='English']", namespaces={'lang':'info:lc/xmlns/codelist-v1'})[0].findtext('lang:name', namespaces={'lang': 'info:lc/xmlns/codelist-v1'})
When I add an additional condition of | lang:code = 'English' like so:
langs.node.xpath("//lang:language[lang:name='English' | lang:code='English']", namespaces={'lang':'info:lc/xmlns/codelist-v1'})[0].findtext('lang:name', namespaces={'lang': 'info:lc/xmlns/codelist-v1'})
I get the error:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "lxml.etree.pyx", line 1509, in lxml.etree._Element.xpath (src/lxml /lxml.etree.c:50717)
File "xpath.pxi", line 318, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:145969)
File "xpath.pxi", line 238, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:144977)
File "xpath.pxi", line 223, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:144785)
XPathEvalError: Invalid type

The correct syntax is or, not |:
nsmap={'lang': 'info:lc/xmlns/codelist-v1'}
langs.node.xpath("//lang:language[lang:name='English' or lang:code='English']",
namespaces=nsmap)[0].findtext('lang:name', namespaces=nsmap)

10 more minutes of googling and I had the answer:
I had the change | to or
langs.node.xpath("//lang:language[lang:name='English' or lang:code='English']", namespaces={'lang':'info:lc/xmlns/codelist-v1'})[0].findtext('lang:name', namespaces={'lang': 'info:lc/xmlns/codelist-v1'})

TypeError: not all arguments converted during string formatting error python

I have customized tempest code for our REST API's, but when i run the scripts, using nosetests at the beginning it gives some weird type error as mentioned below, though test cases in the script passes
Traceback (most recent call last):
File "/usr/lib64/python2.6/logging/__init__.py",line 776, in emit msg = self.format(record)
File "/usr/lib64/python2.6/logging/__init__.py", line 654, in format return fmt.format(record)
File "/home/rmcbuild/repository/rmc-test/tempest/openstack/common/log.py", line 516, in format return logging.Formatter.format(self, record)
File "/usr/lib64/python2.6/logging/__init__.py", line 436, in format record.message = record.getMessage()
File "/usr/lib64/python2.6/logging/__init__.py", line 306, in getMessage msg = msg % self.args
TypeError: not all arguments converted during string formatting
has anyone come across this type of error, would appreciate if someone helps me out of it.
Thanks,

This happens when using %s to print a string:
>>> x = 'aj8uppal'
>>> print 'Hello, %s %s' %(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: not enough arguments for format string
>>> print 'Hello, ' %(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting
>>>
If you put one too many %ss, it gives a TypeError: not enough arguments for format string, but if you put one to few, it gives TypeError: not all arguments converted during string formatting.
What is likely is that you have put a variable(s) in parentheses at the end, but the number of percent signs does not match. Please post your code if you want us to help you further :)

I received this error when I typed a '&s' by mistake rather than a '%s'

What's wrong in python script?

What's wrong in python script?
Code:
import os
import shutil
import getpass
os.mkdir("C:\\dtmp")
shutil.copy("C:\\path\\to\\bb-freeze-script.py","C:\\dtmp")
os.chdir("C:\\dtmp")
shutil.copy("C:\\path\\to\\main.py","C:\\dtmp")
os.system("python bb-freeze-script.py main.py")
os.mkdir("C:\\Program Files\\Directories v0.6")
os.chdir("C:\\")
shutil.rmtree("C:\\dtmp")
print getpass.getuser()
Error:
Traceback (most recent call last):
File "bb-freeze-script.py", line 8, in <module>
load_entry_point('bbfreeze==0.97.3', 'console_scripts', 'bb-freeze')()
File "C:\Python27\lib\site-packages\bbfreeze-0.97.3-py2.7-win32.egg\bbfreeze\__init__.py", line 24, in main
f.addScript(x)
File "C:\Python27\lib\site-packages\bbfreeze-0.97.3-py2.7-win32.egg\bbfreeze\freezer.py", line 410, in addScript
s = self.mf.run_script(path)
File "C:\Python27\lib\site-packages\bbfreeze-0.97.3-py2.7-win32.egg\bbfreeze\modulegraph\modulegraph.py", line 241, in run_script
co = compile(file(pathname, READ_MODE).read()+'\n', pathname, 'exec')
File "C:\dtmp\main.py", line 14
^
IndentationError: expected an indented block
Operating system -- Windows XP

Here's a quick walkthrough on how to read tracebacks. It's pretty easy.
Looking through your code, all of it is calling Python builtin modules. It's safe to say they're not causing the error, so the only thing left is the os.system call. Sure enough, you're calling python through said call (why would you not just import the module you want to call?).
The traceback confirms that the error is in the other Python you are calling:
Traceback (most recent call last):
File "bb-freeze-script.py", line 8, in <module>
load_entry_point('bbfreeze==0.97.3', 'console_scripts', 'bb-freeze')()
Next, read through the lines of the transcript to burrow through the call stack and find out exactly where the error occurred.
File "C:\Python27\lib\site-packages\bbfreeze-0.97.3-py2.7-win32.egg\bbfreeze\__init__.py", line 24, in main
f.addScript(x)
File "C:\Python27\lib\site-packages\bbfreeze-0.97.3-py2.7-win32.egg\bbfreeze\freezer.py", line 410, in addScript
s = self.mf.run_script(path)
File "C:\Python27\lib\site-packages\bbfreeze-0.97.3-py2.7-win32.egg\bbfreeze\modulegraph\modulegraph.py", line 241, in run_script
co = compile(file(pathname, READ_MODE).read()+'\n', pathname, 'exec')
until you get to
File "C:\dtmp\main.py", line 14
IndentationError: expected an indented block
There you go, the error is in line 14 of main.py, where you should have had an indent but didn't.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python re.sub not working correctly - python

You wrote: sys.stdout.write(re.sub(r'{0}','{1}',line.format(search,replace))) This is rather confusing. Is this what you meant to write? sys.stdout.write(re.sub('{0}'.format(search), '{0}'.format(replace), line)) That would of course be equivalent to: sys.stdout.write(re.sub(search, replace, line))

Related

pyteomics: calculate mass from a file containing the sequence of several proteins

While calling simplify in sympy getting error?

lxml xpath filter with OR condition not working

TypeError: not all arguments converted during string formatting error python

What's wrong in python script?

Categories

Resources