Split lines by a character or whitespace in Python

I'm trying to split the lines in a data file I'm working with. This was originally someone else's code, and I'm just trying to fix it. It splits on a semicolon, but I realized it also needs to split on excess whitespace. I've narrowed my problem down to the expression on line 28. I tried some suggestions from other users, but when I use a regex I get an "invalid literal for int()" error. This is confusing because it works if I don't use the regex. Any suggestions? Thanks.
EDIT: Edited for full code link.

No, .split with no arguments is the only form that splits on any whitespace.
Use a regex like this:
re.split(r'[\s;]+', text)
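A minimal sketch of how that pattern behaves, using a made-up sample line (the asker's data file is not shown). One possible cause of the "invalid literal for int()" error, stated here as an assumption, is that re.split returns empty strings when the line starts or ends with a delimiter, and int('') raises ValueError:

```python
import re

# Hypothetical sample line: leading space, mixed ';' and whitespace, trailing newline.
text = " 12; 7   42;\n"

# Split on any run of whitespace and/or semicolons.
raw_fields = re.split(r'[\s;]+', text)
# Delimiters at the very start and end leave empty strings in the result.
print(raw_fields)  # ['', '12', '7', '42', '']

# Drop the empty strings before converting, since int('') raises ValueError.
numbers = [int(field) for field in raw_fields if field]
print(numbers)  # [12, 7, 42]
```

Calling text.strip() before splitting would remove the empty strings at both ends as well.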

Related

Get the string within brackets and remove useless string in Python

I have a string like this: '0x69313430303239377678(i1400297vx)'. I only want the value i1400297vx and nothing else.
Is there a simple way to do this, for example with the strip method, or am I forced to use a regex, which I'm not good at?
Could someone kindly help me?
This works, using split and strip:
'0x69313430303239377678(i1400297vx)'.split('(')[1].strip(')')
but a regex would be more readable!
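For comparison, here is a sketch of the regex version next to the split/strip one. The variable names are only for illustration:

```python
import re

s = '0x69313430303239377678(i1400297vx)'

# split/strip version from the answer above
via_split = s.split('(')[1].strip(')')

# Regex version: capture everything between '(' and the next ')'.
match = re.search(r'\(([^)]*)\)', s)
via_regex = match.group(1) if match else None

print(via_split)  # i1400297vx
print(via_regex)  # i1400297vx
```

The regex version returns None instead of raising IndexError when the string has no parentheses.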

How to convert variable to a regex string?

I am working in Python. I am looping through a large group of strings, and I want to check whether each one appears in a second list of strings.
for line in dictionary:
    line = line.replace('\r\n','').replace('\n','')
    for each in complex8list:
        txt = re.compile(.*line.*)
        if re.search(each, txt):
I need to be able to check if the string with anything before it, and anything after it is in the second list.
What is the correct syntax to do this?
If line isn't a regex, you don't even need regex for this.
if line in each:
If line is a regex, then you don't need to do anything since a leading .* is implied with re.search and a trailing .* is unnecessary.
if re.search(line, each):
BTW you seem to have the arguments to re.search backwards.
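A small sketch of both options, with made-up sample data standing in for the asker's lists. re.escape is added here as an assumption, on the premise that line should be matched literally:

```python
import re

# Hypothetical sample data; the real lists are not shown in the question.
line = "cat"
complex8list = ["concatenate", "dog", "a.c"]

# Plain substring test: no regex needed.
plain_hits = [each for each in complex8list if line in each]

# Regex test: the pattern comes first, then the string to search.
# re.escape makes any special characters in `line` match literally.
regex_hits = [each for each in complex8list if re.search(re.escape(line), each)]

print(plain_hits)  # ['concatenate']
print(regex_hits)  # ['concatenate']
```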

Regular expression to match a string when reading a file

I have files which contain information in the following manner
2458813.92557 10 20 30 #00FA0040000000010100005AB9000FFE86000F3596109000000703000100001000000000000036404E000000004000000020000000000032*
All of this is on a single line. I am only interested in the portions in bold (the leading number, and the string from # through *). I have the following regular expression to get what I want:
^(\d{7}\.\d{5}).*#([\dA-Z]+)\*
The regex works fine, but when I use it in Python it does not include the # and the * in the second bold string. I am using re.match(r'^(\d{7}\.\d{5}).*#([\dA-Z]+\*)') in Python. I would love to know why this happens and what the solution would be.
Thanks
The grouping was wrong: anything outside the parentheses is matched but not captured. Use the regex below.
^(\d{7}\.\d{5}).*(#[\dA-Z]+\*)
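A quick sketch of the difference, using a shortened sample line in the same shape as the one in the question (the shortening is only for readability):

```python
import re

# Shortened, hypothetical sample line.
line = "2458813.92557 10 20 30 #00FA0040*"

# Original grouping: '#' and '*' sit outside the second group.
old = re.match(r'^(\d{7}\.\d{5}).*#([\dA-Z]+)\*', line)
# Corrected grouping: '#' and '*' are inside the second group.
new = re.match(r'^(\d{7}\.\d{5}).*(#[\dA-Z]+\*)', line)

print(old.groups())  # ('2458813.92557', '00FA0040')
print(new.groups())  # ('2458813.92557', '#00FA0040*')
```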

Python regex: find words and emoticons

I want to find matches between a tweet and a list of strings containing words, phrases, and emoticons. Here is my code:
words = [':)','and i','sleeping','... :)','! <3','facebook']
regex = re.compile(r'\b%s\b|(:\(|:\))+' % '\\b|\\b'.join(words), flags=re.IGNORECASE)
I keep receiving this error:
error: unbalanced parenthesis
Apparently there is something wrong with the code and it cannot match emoticons. Any idea how to fix it?
I tried the below and it stopped throwing the error:
words = [':\)','and i','sleeping','... :\)','! <3','facebook']
The re module has a function escape that takes care of correct escaping of words, so you could just use
words = map(re.escape, [':)','and i','sleeping','... :)','! <3','facebook'])
Note that word boundaries might not work as you expect when used with words that don't start or end with actual word characters.
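A minimal sketch of building the pattern with re.escape. The sample tweet is made up, and two choices here are assumptions rather than part of the answers: the \b anchors are dropped entirely, and the alternatives are sorted longest first so that '... :)' is tried before ':)'.

```python
import re

words = [':)', 'and i', 'sleeping', '... :)', '! <3', 'facebook']

# Longest alternatives first, so '... :)' is preferred over the shorter ':)'.
alternatives = sorted(words, key=len, reverse=True)

# re.escape makes '(' , ')' and '.' match literally.
pattern = re.compile('|'.join(re.escape(w) for w in alternatives),
                     flags=re.IGNORECASE)

tweet = "Sleeping on Facebook again ... :)"  # hypothetical tweet
matches = pattern.findall(tweet)
print(matches)  # ['Sleeping', 'Facebook', '... :)']
```

Without \b, a word such as 'sleeping' would also match inside a longer word. If that matters, the boundaries could be added back around the word-like entries only.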
While words has all the necessary formatting, re treats ( and ) as special characters. You need to write \( or \) so that they are interpreted as the literal characters (ASCII 40 and 41) instead. Since you didn't understand what @Nicarus was saying, you need to use this:
words = [':\)','and i','sleeping','... :\)','! <3','facebook']
Note: I'm only spelling it out because this doesn't seem like a school assignment, for all the people who might want to criticize this. Also, look at the documentation before going to Stack Overflow. This explains everything.

re.sub issue when using group with \number

I'm trying to use a regexp with re.sub to rearrange some text.
Let's say it's an almost-CSV file that I have to clean up to make it fully CSV.
I replaced all \n with \t by doing:
t = t.replace("\n", "\t")
... and it works just fine. After that, I need to turn some \t back into \n, one for each of my CSV lines. For that I use this expression:
t = re.sub("\t(\d*?);", "\n\1;", t, re.U)
The problem is that it works, but only partially. The \n are added properly, but instead of being followed by my matching group, they are followed by a ^A (according to Vim).
I tried my regexp with re.findall and it works just fine, so what could be wrong?
My CSV lines are supposed to end up like this:
number;text;text;...;...;\n
Thanks for your help !
Your \1 is interpreted by Python as the character with code 1 (which Vim shows as ^A) before re.sub ever sees it.
Try using \\1 or a raw string, r"\n\1;".
Like Scharron said, always always always use raw-string (r'') notation with regexes. Get into that habit and then you won't have to debug weird issues like this.
r'\n\1;'
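A short sketch of the difference, on a made-up sample string in the shape the question describes. One further detail, separate from the answers above: the fourth positional parameter of re.sub is count, not flags, so this sketch passes the flag by keyword.

```python
import re

# Hypothetical sample: three CSV records joined by tabs.
t = "1;a;b;\t2;c;d;\t3;e;f;"
pattern = r"\t(\d*?);"  # raw string: the regex engine interprets \t and \d

# Non-raw replacement: Python turns "\1" into chr(1) before re.sub sees it.
broken = re.sub(pattern, "\n\1;", t)
# Raw replacement: re.sub receives a backslash and '1', i.e. a group reference.
fixed = re.sub(pattern, r"\n\1;", t, flags=re.UNICODE)

print(repr(broken))  # '1;a;b;\n\x01;c;d;\n\x01;e;f;'
print(repr(fixed))   # '1;a;b;\n2;c;d;\n3;e;f;'
```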
