How do you split a string at a specific point? - python

I am new to Python and want to split what I have read in from a text file into two specific parts. Below is an example of what could be read in:
f = ['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]
What I want to end up with, so that the second part of the program can run, is:
words = ['Cats','like','dogs','as','much','cats.']
numbers = [1,2,3,4,5,4,3,2,6]
I have tried using:
words,numbers = f.split("][")
However, this removes the ][ brackets from the two new variables, which means the second part of my program, which recreates the original text, does not work.
Thanks.

I assume f is a string like
f = "['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"
Then we can find the index of ][ and add one to get the position between the two brackets:
i = f.index('][')
a, b = f[:i+1], f[i+1:]
print(a)
print(b)
output:
['Cats','like','dogs','as','much','cats.']
[1,2,3,4,5,4,3,2,6]

Another alternative, if you still want to use split():
f = "['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"
d = "]["
parts = f.split(d)
print(parts[0] + d[0])
print(d[1] + parts[1])
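A variant along the same lines, sketched with str.partition instead of split(): partition returns the text before the separator, the separator itself, and the text after it, so each bracket can be re-attached from the separator without splitting twice. This assumes the first ][ is the boundary; if ][ is missing, sep is an empty string and sep[0] raises IndexError.

```python
f = "['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"

# partition splits at the first '][' and also returns the separator
head, sep, tail = f.partition('][')
words = head + sep[0]    # re-attach the closing ']'
numbers = sep[1] + tail  # re-attach the opening '['

print(words)    # ['Cats','like','dogs','as','much','cats.']
print(numbers)  # [1,2,3,4,5,4,3,2,6]
```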

If you can make your file look something like this:
[["Cats","like","dogs","as","much","cats."],[1,2,3,4,5,4,3,2,6]]
then you could simply use Python's json module to do this for you. Note that the JSON format requires double quotes rather than single.
import json
f = '[["Cats","like","dogs","as","much","cats."],[1,2,3,4,5,4,3,2,6]]'
a, b = json.loads(f)
print(a)
print(b)
Documentation for the json library can be found here: https://docs.python.org/3/library/json.html
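If you control the program that writes the file, a minimal sketch of producing that format is to let json.dumps write both lists as one JSON array (it emits the double quotes for you) and json.loads read them back. The variable names here are only illustrative.

```python
import json

words = ['Cats', 'like', 'dogs', 'as', 'much', 'cats.']
numbers = [1, 2, 3, 4, 5, 4, 3, 2, 6]

# Writing side: serialize both lists as a single JSON array
text = json.dumps([words, numbers])
print(text)  # [["Cats", "like", "dogs", "as", "much", "cats."], [1, 2, 3, 4, 5, 4, 3, 2, 6]]

# Reading side: parse the text back into two real Python lists
a, b = json.loads(text)
print(a)
print(b)
```

In a real program you would typically use json.dump and json.load with an open file object instead of the in-memory string round trip shown here.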

An alternative to Patrick's answer using regular expressions:
import re
data = "f = ['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"
pattern = r'f = (?P<words>\[.*?\])(?P<numbers>\[.*?\])'
match = re.match(pattern, data)
words = match.group('words')
numbers = match.group('numbers')
print(words)
print(numbers)
Output
['Cats','like','dogs','as','much','cats.']
[1,2,3,4,5,4,3,2,6]

If I understand correctly, you have a text file that contains ['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6] and you just need to split that string at the transition between the brackets. You can do this with the str.index() method and string slicing. See my console output below:
>>> f = open('./catsdogs12.txt', 'r')
>>> text = f.read().rstrip('\n') # Read file without trailing newline (\n)
>>> text
"['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"
>>> bracket_index = text.index('][') # Get index of transition between brackets
>>> bracket_index
41
>>> words = text[:bracket_index + 1] # Slice from beginning of string
>>> words
"['Cats','like','dogs','as','much','cats.']"
>>> numbers = text[bracket_index + 1:] # Slice from middle of string
>>> numbers
'[1,2,3,4,5,4,3,2,6]'
Note that this leaves you with Python strings that only look like lists. If you need the data as native Python objects (i.e. so that you can actually use them as lists), you'll need to parse them, for example with some combination of s[1:-1].split(',') on both strings and map(int, ...) on the numbers list to convert the numbers from strings to integers (the words will also still carry their quote characters).
Hope this helps!
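Rather than splitting on commas by hand, one option (an alternative, not what the answer above uses) is ast.literal_eval from the standard library, which evaluates a string containing a Python literal such as a list, and accepts nothing else. A sketch, assuming the file contents have already been read into text:

```python
import ast

text = "['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"

# Find the boundary between the two bracketed parts with str.index
i = text.index('][')
# literal_eval parses each half into a real list (quoted items become str, digits become int)
words = ast.literal_eval(text[:i + 1])
numbers = ast.literal_eval(text[i + 1:])

print(words)                    # ['Cats', 'like', 'dogs', 'as', 'much', 'cats.']
print(numbers[0] + numbers[1])  # 3
```

Unlike eval, literal_eval will not run arbitrary expressions, so it is a safer choice for text read from a file.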

Another thing you can do is first replace ][ with ]-[ and then split or partition on -. I'd suggest split, since partition also returns the delimiter, which we don't need. This only works if the text itself contains no - characters.
SPLIT
f = "['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"
f = f.replace('][',']-[')
a,b = f.split('-')
Output
>>> print(a)
['Cats','like','dogs','as','much','cats.']
>>> print(b)
[1,2,3,4,5,4,3,2,6]
PARTITION
f = "['Cats','like','dogs','as','much','cats.'][1,2,3,4,5,4,3,2,6]"
f = f.replace('][',']-[')
a,b,c = f.partition('-')
Output
>>> print(a)
['Cats','like','dogs','as','much','cats.']
>>> print(c)
[1,2,3,4,5,4,3,2,6]

Related

How to convert string to list?

I have a list:
sample = ["['P001']"]
How to convert it into:
sample = [["P001"]]
I tried [[x] for x in sample] but the output is [["[P001]"]]
Edit: I'm sorry, I made a mistake in the question I asked. "P001" is actually a string.
If you want the output to be [['P001']], you can do this:
>>> sample = ["[P001]"]
>>> [[sample[0].strip('[]')]]
[['P001']]
But if you want the output to be [[P001]], I don't know if that's possible. P001 is not a valid Python literal, and if a variable named P001 has been defined beforehand, the modified list will hold the value of P001, not the name itself. For example:
>>> P001 = 'something'
>>> sample = ["[P001]"]
>>> [eval(sample[0])]
[['something']]
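For the question as edited, where the element is the quoted text "['P001']", a sketch using ast.literal_eval (a standard-library alternative to eval that only accepts Python literals, so it never looks up variables such as P001):

```python
import ast

sample = ["['P001']"]
# Each element is the text of a list literal; literal_eval turns it into a real list
result = [ast.literal_eval(s) for s in sample]
print(result)  # [['P001']]
```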
This is a list of strings, and each string follows a pattern like "[P001]". You need to loop over the outer list and run a match on each element to get the inner value, in this case 'P001'. Then you can append the value as you like.
import re
s = ["[P001]", "[S002]"]
result = []
for i in s:
    r = re.match(r"\[(.*?)\]", i)
    if r:
        result.append([r.group(1)])
print(result)
[['P001'], ['S002']]
It can also be done with just a regular expression that matches runs of word (alphanumeric) characters:
import re
sample = ["[P001,P002]","[P003,P004]"]
sample = [re.findall(r"\w+", sublist) for sublist in sample]

How to delete everything from string up to the specific character in Python

I want to extract only the date from the following string. Here is the variable:
file = '62-201809.csv'
I used rsplit to get rid of the .csv extension like this:
splitf = file.rsplit('.', 1)[0]
I got 62-201809, so that's okay, but now I need to get rid of everything up to '-' and store only 201809 in a variable. How do I do it?
Try using:
>>> file = '62-201809.csv'
>>> file.split('-', 1)[1].split('.')[0]
'201809'
>>>
Or use regex:
>>> import re
>>> file = '62-201809.csv'
>>> re.search(r'-(\d+)', file).group(1)
'201809'
>>>
If you only want to use split, you can do this:
filen = '62-201809.csv'
number = filen.split('.')[0]
number2 = number.split('-')[1]
print(number2)
This first strips the extension, then keeps only the part after the hyphen, 201809.
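Another standard-library option, sketched here: os.path.splitext removes the extension, and str.partition keeps everything after the first hyphen. This assumes names always have the form prefix-date.ext; if there is no hyphen, partition returns an empty string as the last part.

```python
import os.path

filename = '62-201809.csv'
stem, ext = os.path.splitext(filename)  # ('62-201809', '.csv')
date = stem.partition('-')[2]           # text after the first '-'
print(date)  # 201809
```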

Python: Split between two characters

Let's say I have a ton of HTML with no newlines. I want to get each element into a list.
input = "<head><title>Example Title</title></head>"
a_list = ["<head>", "<title>Example Title</title>", "</head>"]
Something like that: splitting between each ><.
But in Python, I don't know of a way to do that. I can only split at that string, which removes it from the output. I want to keep it and split between the two characters.
How can this be done?
Edit: Preferably, this would be done without adding the characters back in to the ends of each list item.
# initial input
a = "<head><title>Example Title</title></head>"
# split list
b = a.split('><')
# remove the extra character from the first and last elements,
# because the split only removes the >< pairs between elements
b[0] = b[0][1:]
b[-1] = b[-1][:-1]
# initialize new list
a_list = []
# fill new list with each element re-wrapped in < >
for item in b:
    a_list.append('<{}>'.format(item))
This produces the desired list in Python 2.7.2, and it works in Python 3 as well.
You can try this:
import re
a = "<head><title>Example Title</title></head>"
data = re.split("><", a)
new_data = [data[0] + ">"] + ["<" + i + ">" for i in data[1:-1]] + ["<" + data[-1]]
Output:
['<head>', '<title>Example Title</title>', '</head>']
The shortest approach uses the re.findall() function; here it is on an extended example:
import re
# extended html string
s = "<head><title>Example Title</title></head><body>hello, <b>Python</b></body>"
result = re.findall(r'(<[^>]+>[^<>]+</[^>]+>|<[^>]+>)', s)
print(result)
The output:
['<head>', '<title>Example Title</title>', '</head>', '<body>', '<b>Python</b>', '</body>']
Based on the answers by other people, I made this.
It isn't as clean as I had wanted, but it seems to work. I had originally wanted to avoid re-adding the characters after the split.
Here, I got rid of one extra argument by combining the two characters into a single string. Anyway:
def split_between(string, chars):
    if len(chars) != 2:
        raise ValueError("Argument chars must contain two characters.")
    result_list = [chars[1] + line + chars[0] for line in string.split(chars)]
    result_list[0] = result_list[0][1:]
    result_list[-1] = result_list[-1][:-1]
    return result_list
Credit goes to @cforeman and @Ajax1234.
Or, as a one-liner:
html = "<head><title>Example Title</title></head>"
print(['<' + elem if elem[0] != '<' else elem for elem in [elem + '>' if elem[-1] != '>' else elem for elem in html.split('><')]])
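Regarding the edit (not re-adding the characters), one possible approach, assuming Python 3.7 or newer where re.split can split on empty matches, is to split at the zero-width position between > and < using a lookbehind and a lookahead. Nothing is consumed, so the brackets stay on both pieces.

```python
import re

html = "<head><title>Example Title</title></head>"
# (?<=>) means "preceded by >", (?=<) means "followed by <"; the match itself is empty
parts = re.split(r'(?<=>)(?=<)', html)
print(parts)  # ['<head>', '<title>Example Title</title>', '</head>']

# Text between tags stays attached, because it only splits where > is directly followed by <
extended = "<head><title>Example Title</title></head><body>hello, <b>Python</b></body>"
extended_parts = re.split(r'(?<=>)(?=<)', extended)
print(extended_parts)
```

On the extended string this keeps '<body>hello, <b>Python</b>' as one piece, which differs from the findall answer; whether that is acceptable depends on what you need.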

How can I strip the first 14 characters in an list element using python?

I have a txt file in which I need to find a specific line, which is working, but from that line I need to strip the first 14 characters, and the part of the list element I am interested in is dynamically generated at run time. The scenario: I ran a script whose output is saved in output.txt, and now I am parsing it. Here is what I have tried:
load_profile = open('output.txt', "r")
read_it = load_profile.read()
myLines = []
for line in read_it.splitlines():
    if line.find("./testSuites/") > -1:
        myLines.append(line)
print(myLines)
which gives the output:
['*** Passed :) at ./testSuites/TS1/2013/06/17/15.58.12.744_14']
I need to parse only the ./testSuites/TS1/2013/06/17/15.58.12.744_14 part; 2013 and the rest of the string are dynamically generated.
Could you please guide me on the best way to achieve this?
Thanks in advance
Urmi
Use slicing:
>>> strs = 'Passed :) at ./testSuites/TS1/2013/06/17/15.58.12.744_14'
>>> strs[13:]
'./testSuites/TS1/2013/06/17/15.58.12.744_14'
Update: use lis[0] to access the string inside that list.
>>> lis = ['*** Passed :) at ./testSuites/TS1/2013/06/17/15.58.12.744_14']
>>> strs = lis[0]
>>> strs[17:] # I think you need 17 here
'./testSuites/TS1/2013/06/17/15.58.12.744_14'
You are asking how to strip the first 14 characters, but what if your strings don't always have that format in the future? Try splitting each line into whitespace-separated substrings and then just keep the substring containing "./testSuites/".
with open('output.txt', "r") as load_profile:
    read_it = load_profile.read()
myLines = []
for line in read_it.splitlines():
    for splt in line.split():
        if "./testSuites/" in splt:
            myLines.append(splt)
print(myLines)
Here's how it works:
>>> pg = "Hello world, how you doing?\nFoo bar!"
>>> print(pg)
Hello world, how you doing?
Foo bar!
>>> lines = pg.splitlines()
>>> lines
['Hello world, how you doing?', 'Foo bar!']
>>> for line in lines:
...     for splt in line.split():
...         if "Foo" in splt:
...             print(splt)
...
Foo
>>>
Of course, if you do have strict requirements on the format of these lines, you could just use string slicing (strs[13:], as Ashwini says), or you could split the line and take line.split()[-1] (the last element of the split line).
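A middle ground between a fixed slice like strs[17:] and splitting on whitespace, sketched as a hypothetical helper (the name extract_suite_path is only illustrative): locate "./testSuites/" with str.find and slice from that position, so the length of the prefix no longer matters.

```python
def extract_suite_path(line, marker="./testSuites/"):
    """Return the part of line starting at marker, or None if marker is absent."""
    pos = line.find(marker)  # -1 when the marker does not occur
    return line[pos:] if pos != -1 else None

line = '*** Passed :) at ./testSuites/TS1/2013/06/17/15.58.12.744_14'
print(extract_suite_path(line))  # ./testSuites/TS1/2013/06/17/15.58.12.744_14
```

Unlike the whitespace split, this keeps everything after the marker, so it assumes the path runs to the end of the line.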

appending regex matches to a dictionary

I have a file in which there is the following info:
dogs_3351.txt:34.13559322033898
cats_1875.txt:23.25581395348837
cats_2231.txt:22.087912087912088
elephants_3535.txt:37.092592592592595
fish_1407.txt:24.132530120481928
fish_2078.txt:23.470588235294116
fish_2041.txt:23.564705882352943
fish_666.txt:23.17241379310345
fish_840.txt:21.77173913043478
I'm looking for a way to match the colon and append whatever appears afterwards (the numbers) to a dictionary the keys of which are the name of the animals in the beginning of each line.
Actually, regular expressions are unnecessary, provided that your data is well formatted and contains no surprises.
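Assuming that data is a variable containing the string you listed above:
dict(item.split(":") for item in data.split())
Note that this uses the whole file name (for example dogs_3351.txt) as the key rather than just the animal name, and the values stay strings.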
t = """
dogs_3351.txt:34.13559322033898
cats_1875.txt:23.25581395348837
cats_2231.txt:22.087912087912088
elephants_3535.txt:37.092592592592595
fish_1407.txt:24.132530120481928
fish_2078.txt:23.470588235294116
fish_2041.txt:23.564705882352943
fish_666.txt:23.17241379310345
fish_840.txt:21.77173913043478
"""
import re
d = {}
for p, q in re.findall(r'^(.+?)_.+?:(.+)', t, re.M):
    d.setdefault(p, []).append(q)
print(d)
Why not use Python's str.find method to locate the index of the colon? You can then use that index to slice the string.
>>> x='dogs_3351.txt:34.13559322033898'
>>> key_index = x.find(':')
>>> key = x[:key_index]
>>> key
'dogs_3351.txt'
>>> value = x[key_index+1:]
>>> value
'34.13559322033898'
>>>
Read each line of the file as text and process the lines individually as above.
Without regex and using defaultdict:
from collections import defaultdict
data = """dogs_3351.txt:34.13559322033898
cats_1875.txt:23.25581395348837
cats_2231.txt:22.087912087912088
elephants_3535.txt:37.092592592592595
fish_1407.txt:24.132530120481928
fish_2078.txt:23.470588235294116
fish_2041.txt:23.564705882352943
fish_666.txt:23.17241379310345
fish_840.txt:21.77173913043478"""
dictionary = defaultdict(list)
for line in data.splitlines():
    animal = line.split('_')[0]
    number = line.split(':')[-1]
    dictionary[animal].append(number)
Just make sure your data is well formatted.
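A sketch combining the ideas above, assuming every non-blank line has the form name_id.txt:value: str.partition splits each line at the colon, the animal name is the text before the first underscore, and the values are converted with float so they can be used as numbers. The helper name scores_by_animal is hypothetical.

```python
from collections import defaultdict

def scores_by_animal(lines):
    """Group the numbers after ':' by the animal name before the first '_'."""
    result = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, value = line.partition(':')
        animal = name.split('_', 1)[0]
        result[animal].append(float(value))
    return dict(result)

# A few sample lines from the question
data = """dogs_3351.txt:34.13559322033898
cats_1875.txt:23.25581395348837
cats_2231.txt:22.087912087912088
fish_1407.txt:24.132530120481928"""
scores = scores_by_animal(data.splitlines())
print(sorted(scores))  # ['cats', 'dogs', 'fish']
```

Because a text file object yields its lines when iterated, an open file could be passed to the helper directly instead of data.splitlines().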
