Split variable into multiple variables in python - python

In python, I have a variable with 2 lines of stuff which I want to split into 2 different variables. I also want it to do it for any number of lines, for example I can change the number of lines and it automatically splits it for how many lines there are. How would I go about doing this?

If you're sometimes unsure about how many lines there are, you'd be forced to dynamically create variable names. Don't do this. This is almost always the wrong approach to take, and a better way exists.
Instead, use the list that .split() produces directly, and process the data inside of that.
If you need to further split each line in your string into two parts, you can use nested lists. Here is a simple example of the method I think you should use instead:
# your input string
string = "first line\nsecond line\nthird line\nfourth line"
# use a list to store your data, and process it instead.
data = []
for line in string.split('\n'):
    # split each line into two parts.
    first_part, second_part = line[:-4], line[-4:]
    data.append([first_part, second_part])
# data:
#
# [['first ', 'line'],
#  ['second ', 'line'],
#  ['third ', 'line'],
#  ['fourth ', 'line']]
print(data)
You can access each part of data using a certain index. For example, if you wanted to process the first line of string, you could use data[0] which yields ['first ', 'line'].
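If the pieces need names rather than positions, a dictionary is a common alternative to generating variable names. The following is only a minimal sketch: the input text, the name lines_by_name, and the "line_N" key format are assumptions made for illustration.

```python
# Hypothetical input; any number of lines works the same way.
text = "first line\nsecond line\nthird line"

# splitlines() handles both \n and \r\n line endings.
lines = text.splitlines()

# When the number of lines is known in advance, plain unpacking is enough.
first, second, third = lines

# When it is not known, keep the data in a container and look items up by key.
# The "line_N" key format is an assumption made for this example.
lines_by_name = {f"line_{i}": line for i, line in enumerate(lines, start=1)}

print(lines_by_name["line_2"])  # second line
```

Unpacking raises ValueError when the number of names and lines differ, which is why the dictionary (or the plain list) is the safer choice for input of unknown length.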

Something like the following:
>>> a="""this is a
multiline
string"""
>>> for eachline in a.split('\n'): print(eachline)
this is a
multiline
string

When you split the string, you get a list:
>>> split_list=a.split('\n')
>>> split_list
['this is a', 'multiline', 'string']
Then you can access individual items by index:
>>> var1=split_list[0]
>>> var1
'this is a'
and so on.
Alternatively, you can run a for loop to get each item of the list:
for eachline in a.split('\n'): print(eachline)

Related

Tabs \n in list for python

I have a simple script in Python and want it to return the values one per line.
The separators are # and \n.
SCRIPT
output = ['192.168.0.1 #SRVNET\n192.168.0.254 #SRVDATA']
output = output[0].split('#')
output.split('\n')
OUTPUT
AttributeError: 'list' object has no attribute 'split'
After you split the first time, output is a list which doesn't support .split.
If splitting on two different items, you can use a regular expression with re.split:
>>> import re
>>> output = ['192.168.0.1 #SRVNET\n192.168.0.254 #SRVDADOS']
>>> re.split(r'\n|\s*#\s*',output[0]) # newline or comment (removing leading/trailing ws)
['192.168.0.1', 'SRVNET', '192.168.0.254', 'SRVDADOS']
You may want to group the IP with a comment as well, for example:
>>> [re.split(r'\s*#\s*',line) for line in output[0].splitlines()]
[['192.168.0.1', 'SRVNET'], ['192.168.0.254', 'SRVDADOS']]
The output of the line:
output = output[0].split('#')
is actually a list. ".split" always returns a list. In your case the output looks like this:
['192.168.0.1 ', 'SRVNET\n192.168.0.254 ', 'SRVDATA']
And as the error rightly points out, a list does not support ".split", so it cannot be split that way.
So if you want to further split the items of the list wherever "\n" is encountered, iterate through the list and call split on each string, like this:
output=['192.168.0.1 ', 'SRVNET\n192.168.0.254 ', 'SRVDATA']
for i in output:
    if "\n" in i:
        print("yes")
        output_1=i.split("\n")
This will give the "output_1" as:
['SRVNET', '192.168.0.254 ']
If you don't want to use re, then you need to apply split("\n") to each element of output[0].split("#"), then concatenate the results together again. One way to do that is
result = [y for x in output[0].split("#") for y in x.split("\n")]
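The comprehension above keeps the spaces that surrounded "#". As a rough sketch, assuming the same output value as in the question, the pieces can be stripped while flattening, or grouped per line without re; the names flat and pairs are invented for this example.

```python
output = ['192.168.0.1 #SRVNET\n192.168.0.254 #SRVDATA']

# Flatten: split on "#", then on "\n", stripping stray spaces from each piece.
flat = [y.strip() for x in output[0].split("#") for y in x.split("\n")]
print(flat)   # ['192.168.0.1', 'SRVNET', '192.168.0.254', 'SRVDATA']

# Grouped: one (address, label) pair per input line.
pairs = [tuple(part.strip() for part in line.split("#", 1))
         for line in output[0].splitlines()]
print(pairs)  # [('192.168.0.1', 'SRVNET'), ('192.168.0.254', 'SRVDATA')]
```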

Splitting up input based on separators and storing the values

so I am new to Python. I was wondering how I could take something like
"James-Dean-Winchester"
or
"James:Dean:Winchester"
or simply
"James Dean Winchester"
and have python be able to see which format is which, split the input based on the format and then store it in variables to be modified later on. Could I somehow store the splitting characters (":","-"," ") in an array then call the array on the text that I am wishing to split or is there an easier way of doing it?
Update: I should have added that there will only ever be one type of separator.
You could define a function that performs the split and returns the separator in addition to the separated array:
def multiSepSplit(string,separators=["-",":"," "]):
    return max([(string.split(sep),sep) for sep in separators],key=lambda s:len(s[0]))
multiSepSplit("James-Dean-Winchester")
# (['James', 'Dean', 'Winchester'], '-')
multiSepSplit("James Dean Winchester")
# (['James', 'Dean', 'Winchester'], ' ')
multiSepSplit("James:Dean:Winchester")
# (['James', 'Dean', 'Winchester'], ':')
How it works is by performing all the splits using a list comprehension on the separators and taking the one with the maximum number of elements in the resulting array.
Each entry in the list is actually a tuple with the resulting array s[0] and the separator that was used s[1].
If you do not know which delimiter is in play for each string, you need to write some logic for this.
One suggestion is to maintain a list of potential delimiters (sorted by preference / popularity) and test whether they occur in your string more than once.
Below is an example.
delimiters = list('-: ')
test_list = ['James-Dean-Winchester', 'April:May:June',
             'John Abraham Smith', 'Joe:Ambiguous-Connor']
def get_delimiter(x, delim):
    for sep in delim:
        if x.count(sep) > 1:
            return sep
    else:
        # for/else: reached only when no delimiter occurred more than once
        return None
result = [get_delimiter(i, delimiters) for i in test_list]
# ['-', ':', ' ', None]
You can then link test_list with result via zip, i.e. by iterating indices in each list sequentially.
You can split a string by a delimiter using, for example, 'mystr1-mystr2-mystr3'.split('-').
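Putting the two remarks together, a possible sketch pairs each string with its detected delimiter via zip and splits only when one was found. The name parts and the choice to keep an undetected string whole are assumptions for illustration.

```python
delimiters = list('-: ')
test_list = ['James-Dean-Winchester', 'April:May:June',
             'John Abraham Smith', 'Joe:Ambiguous-Connor']

def get_delimiter(x, delim):
    """Return the first candidate that occurs more than once, else None."""
    for sep in delim:
        if x.count(sep) > 1:
            return sep
    return None

result = [get_delimiter(i, delimiters) for i in test_list]

# Pair each string with its delimiter; leave it whole when none was detected.
parts = [text.split(sep) if sep is not None else [text]
         for text, sep in zip(test_list, result)]
print(parts)
```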

Trying to remove list of sentences from text, only removes first character

I have made the following class
class SentenceReducer():
    def getRidOfSentences(self, line, listSentences):
        for i in listSentences:
            print(i)
            return line.replace(i, '')
    strings = 'This is a'
    def stripSentences(self, aTranscript):
        result = [self.getRidOfSentences(line, self.strings) for line in aTranScript]
        return(result)
It should basically take a dataframe and then check, line by line, whether the line contains a sentence from listSentences (one in this example).
However when I create a new class
newClass = SentenceReducer()
And run the script with the following data
aTranScript = [ 'This is a test', 'This is not a test']
new_df = newClass.stripSentences(aTranScript)
It deletes the 'T' in my original data. But it should replace the whole sentence ('This is a'). Also if I add the print(i) it prints T.
Any thoughts on what goes wrong here?
Inside getRidOfSentences, the variable listSentences has the value 'This is a', which is a string.
Iterating over a string gives the individual characters:
>>> strings = 'This is a'
>>> for x in strings:
... print(x)
T
h
i
s
i
s
a
You want to put this string in a list, so that iterating over that list gives you the whole string, not its individual characters:
>>> strings = ['This is a']
>>> for x in strings:
... print(x)
This is a
Another problem: The return inside the for loop means that the function exits at the end of the first iteration, that's why you only see T, but not h, i, s and so on.
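First, aTranscript and aTranScript are not the same variable (notice the capital S in the latter). The method's parameter is aTranscript, but the comprehension reads aTranScript, which only works here because a global with that name happens to exist.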
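Second, strings is a class attribute, so inside a method it has to be reached as self.strings or SentenceReducer.strings (which stripSentences already does); listSentences is only the parameter name inside getRidOfSentences.
Third, strings is a single string, not a list, so the loop in getRidOfSentences walks over its characters.
And last, getRidOfSentences returns from inside the loop, so only the first item is ever replaced.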
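Taken together, the points raised in the answers suggest a version along the following lines. This is a sketch, not the asker's final code: it keeps the original names, stores the sentences in a list, moves the return out of the loop, and uses the parameter name consistently.

```python
class SentenceReducer:
    # A list of sentences, so iterating yields whole sentences, not characters.
    strings = ['This is a']

    def getRidOfSentences(self, line, listSentences):
        # Apply every replacement before returning.
        for sentence in listSentences:
            line = line.replace(sentence, '')
        return line

    def stripSentences(self, aTranscript):
        # Use the parameter name consistently instead of a global.
        return [self.getRidOfSentences(line, self.strings) for line in aTranscript]


aTranScript = ['This is a test', 'This is not a test']
new_df = SentenceReducer().stripSentences(aTranScript)
print(new_df)  # [' test', 'This is not a test']
```

The second line is unchanged because 'This is not a test' does not contain the exact text 'This is a'.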

Pythonically remove the first X words of a string

How do I do it Pythonically?
I know how to delete the first word, but now I need to remove three.
Note that words can be delimited by any amount of whitespace, not just a single space (although I could enforce a single space if it must be so).
[Update] I mean any X words; I don't know what they are.
I am considering looping and repeatedly removing the first word, joining together again, rinsing and repeating.
s = "this is my long sentence"
print(' '.join(s.split()[3:]))
This will print
"long sentence"
Which I think is what you need. Calling split() with no argument treats any run of whitespace as a single delimiter, so multiple spaces or tabs between words are handled as well.
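Try:
import re
print(re.sub(r"\w+\s*", "", "a sentence is cool", count=3))
Prints cool. The \s* removes the whitespace after each word as well; without it, the three spaces between the removed words would be left at the front of the result.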
If you need to drop the first X characters rather than the first X words, plain slicing is enough:
In [7]: s = 'Hello, this is long string'
In [8]: s = s[3:]
In [9]: s
Out[9]: 'lo, this is long string'
Replace the 3 on line In [8] with your X. Note that slicing counts characters, not words.
You can use the split function to do this. Essentially, it splits the string up into individual (space separated, by default) words. These words are stored in a list and then from that list, you can access the words you want, just like you would with a normal list of other data types. Using the desired words you can then join the list to form a string.
For example:
text = 'This is a bunch of words'
string_list = text.split()
# The words are now stored in a list that looks like:
# ['This', 'is', 'a', 'bunch', 'of', 'words']
new_string_list = string_list[3:]
# the list is now: ['bunch', 'of', 'words']
new_string = ' '.join(new_string_list)
# you now have the string 'bunch of words'
You can also do this in fewer lines, if desired (not sure if this is pythonic though):
text = 'this is a bunch of words'
new_string = ' '.join(text.split()[3:])
print(new_string)
# output would be 'bunch of words'
You can use split:
>>> x = 3 # number of words to remove from beginning
>>> s = 'word1 word2 word3 word4'
>>> s = " ".join(s.split()) # remove multiple spacing
>>> s = s.split(" ", x)[x] # split and keep elements after index x
>>> s
'word4'
This will handle multiple spaces as well.
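The split-based approaches can be wrapped in a small helper. The sketch below uses str.split with sep=None and a maxsplit of x, which skips the separate step that collapses spacing; the function name remove_first_words and the decision to return an empty string for short inputs are assumptions for illustration.

```python
def remove_first_words(text, x):
    """Drop the first x whitespace-delimited words from text.

    Hypothetical helper; returns '' when text has x words or fewer.
    """
    # With sep=None, split() treats any run of whitespace as one delimiter
    # and leaves the remainder after the x-th split untouched.
    parts = text.split(None, x)
    return parts[x] if len(parts) > x else ''

print(remove_first_words("this   is\tmy long  sentence", 3))  # long  sentence
print(repr(remove_first_words("only two", 3)))                # ''
```

Unlike the join-based variants, this leaves the spacing inside the remaining text as it was.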

How to split a line but keep a variable in the line unsplit in python

I have a line that I want to split into three parts:
line4 = 'http://www.example.org/lexicon#'+synset_offset+' http://www.monnetproject.eu/lemon#gloss '+gloss+''
The variable gloss contains full sentences, which I don't want to be split. How do I stop this from happening?
The final 3 split parts should be:
'http://www.example.org/lexicon#'+synset_offset+'
http://www.monnetproject.eu/lemon#gloss
'+gloss+''
after running triple = line4.split()
I'm struggling to understand, but why not just create a list to start with:
line4 = [
    'http://www.example.org/lexicon#' + synset_offset,
    'http://www.monnetproject.eu/lemon#gloss',
    gloss
]
Simplified example - instead of joining them all together, then splitting them out again, just join them properly in the first place:
a = 'hello'
b = 'world'
c = 'i have spaces in me'
d = ' '.join((a,b,c)) # <- correct way
# hello world i have spaces in me
print(' '.join(d.split(' ', 2))) # take joined, split out again making sure not to split `c`, then join back again!?
If they all begin with "http" you could split them using http as the delimiter; otherwise you could do it in two steps.
First extract the first URL from the string by splitting on the first space, then split what remains on the next space:
firstSplit = line4.split(' ', 1)
firstString = firstSplit.pop(0)  # pop the first url
rest = ' '.join(firstSplit)  # join the rest
secondString, thirdString = rest.split(' ', 1)  # split the remaining two
>>> synset_offset = "foobar"
>>> gloss = "This is a full sentence."
>>> line4 = 'http://www.example.org/lexicon#'+synset_offset+' http://www.monnetproject.eu/lemon#gloss '+gloss
>>> line4.split(maxsplit=2)
['http://www.example.org/lexicon#foobar', 'http://www.monnetproject.eu/lemon#gloss', 'This is a full sentence.']
Not sure what you're trying to do here. If in general you're looking to avoid splitting a keyword, you should do:
>>> line[:line.index(keyword)].split() + [keyword] + line[line.index(keyword)+len(keyword):].split()
If the gloss (or whatever keyword part) of the string is the end part, the final slice is just an empty string '', and ''.split() returns an empty list, so nothing extra is appended.
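The same idea can be written with str.partition, which avoids calling index three times and does not raise when the keyword is missing. This is a sketch under assumptions: the helper name split_around is invented, the sample values come from the session above, and only the first occurrence of the keyword is protected.

```python
def split_around(line, keyword):
    """Split line on whitespace but keep keyword as a single item.

    Hypothetical helper; only the first occurrence of keyword is protected.
    """
    before, found, after = line.partition(keyword)
    if not found:
        # keyword is absent: fall back to an ordinary split.
        return line.split()
    return before.split() + [keyword] + after.split()

gloss = "This is a full sentence."
line = ("http://www.example.org/lexicon#foobar "
        "http://www.monnetproject.eu/lemon#gloss " + gloss)
print(split_around(line, gloss))
```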
