I have a list that is read from a text file that outputs:
['/Users/myname/Documents/test1.txt\n', '/Users/myname/Documents/test2.txt\n', '/Users/myname/Documents/test3.txt\n']
I want to remove the \n from each element, but .split() does not work on lists, only on strings (which is annoying, as this is a list of strings).
How do I remove the \n from each element so I can get the following output:
['/Users/myname/Documents/test1.txt', '/Users/myname/Documents/test2.txt', '/Users/myname/Documents/test3.txt']
old_list = [x.strip() for x in old_list]
old_list refers to the list you want to remove the \n from.
Or if you want something more readable:
for x in range(len(old_list)):
    old_list[x] = old_list[x].strip()
This does the same thing without a list comprehension.
The strip() method removes all leading and trailing whitespace, including \n.
But if you don't want to remove whitespace from the start and end of each string, you can do:
old_list = [x.replace("\n", "") for x in old_list]
or
for x in range(len(old_list)):
    old_list[x] = old_list[x].replace("\n", "")
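If you only want to drop the trailing newline and keep any other spaces around the text, str.rstrip("\n") is a middle ground between the two options. Here is a minimal comparison. The sample list, with one line that has leading and trailing spaces, is made up for illustration:

```python
# Hypothetical sample data: the first line has spaces that should be kept
old_list = ['  indented line \n', 'plain line\n', 'last line without newline']

stripped = [x.strip() for x in old_list]             # removes all surrounding whitespace
replaced = [x.replace("\n", "") for x in old_list]   # removes every \n, even mid-string
rstripped = [x.rstrip("\n") for x in old_list]       # removes only trailing \n characters

print(stripped)   # ['indented line', 'plain line', 'last line without newline']
print(replaced)   # ['  indented line ', 'plain line', 'last line without newline']
print(rstripped)  # ['  indented line ', 'plain line', 'last line without newline']
```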
Use strip(), but keep in mind that the list comprehension creates a new list instead of modifying the original one, so you need to reassign it if required:
a = ['/Users/myname/Documents/test1.txt\n', '/Users/myname/Documents/test2.txt\n', '/Users/myname/Documents/test3.txt\n']
a = [path.strip() for path in a]
print(a)
Give this code a try:
lst = ['/Users/myname/Documents/test1.txt\n', '/Users/myname/Documents/test2.txt\n', '/Users/myname/Documents/test3.txt\n']
for n, element in enumerate(lst):
    element = element.replace('\n', '')
    lst[n] = element
print(lst)
Use:
[i.strip() for i in lines]
in case you don't mind losing the spaces and tabs at the beginning and end of the lines.
You can read the whole file and split lines using str.splitlines:
temp = file.read().splitlines()
If you still have problems, see this question, which is where I got the answer from:
How to read a file without newlines?
answered Sep 8 '12 at 11:57 Bakuriu
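The same idea works with a with block, which closes the file automatically. To keep this sketch self-contained, io.StringIO stands in for a real file opened with open(path). The file contents are hypothetical:

```python
import io

# io.StringIO stands in for open("paths.txt") here; it behaves like a text file object
fake_file = io.StringIO(
    "/Users/myname/Documents/test1.txt\n"
    "/Users/myname/Documents/test2.txt\n"
    "/Users/myname/Documents/test3.txt\n"
)

with fake_file as f:
    # read() returns one string; splitlines() splits it without keeping the line endings
    paths = f.read().splitlines()

print(paths)
# ['/Users/myname/Documents/test1.txt', '/Users/myname/Documents/test2.txt', '/Users/myname/Documents/test3.txt']
```

Unlike iterating over the file line by line, splitlines() also does not produce an empty last element when the file ends with a newline.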
There are many ways to achieve your result.
Method 1: using split() method
l = ['/Users/myname/Documents/test1.txt\n', '/Users/myname/Documents/test2.txt\n', '/Users/myname/Documents/test3.txt\n']
result = [i.split('\n')[0] for i in l]
print(result) # ['/Users/myname/Documents/test1.txt', '/Users/myname/Documents/test2.txt', '/Users/myname/Documents/test3.txt']
Method 2: using strip() method that removes leading and trailing whitespace
l = ['/Users/myname/Documents/test1.txt\n', '/Users/myname/Documents/test2.txt\n', '/Users/myname/Documents/test3.txt\n']
result = [i.strip() for i in l]
print(result) # ['/Users/myname/Documents/test1.txt', '/Users/myname/Documents/test2.txt', '/Users/myname/Documents/test3.txt']
Method 3: using rstrip() method that removes trailing whitespace
l = ['/Users/myname/Documents/test1.txt\n', '/Users/myname/Documents/test2.txt\n', '/Users/myname/Documents/test3.txt\n']
result = [i.rstrip() for i in l]
print(result) # ['/Users/myname/Documents/test1.txt', '/Users/myname/Documents/test2.txt', '/Users/myname/Documents/test3.txt']
Method 4: using the replace() method
l = ['/Users/myname/Documents/test1.txt\n', '/Users/myname/Documents/test2.txt\n', '/Users/myname/Documents/test3.txt\n']
result = [i.replace('\n', '') for i in l]
print(result) # ['/Users/myname/Documents/test1.txt', '/Users/myname/Documents/test2.txt', '/Users/myname/Documents/test3.txt']
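These four methods give different results when a Windows-style \r\n line ending survives reading, for example when the file is opened with newline=''. Here is a small comparison on a made-up line:

```python
# Hypothetical line that still carries a Windows "\r\n" ending
line = '/Users/myname/Documents/test1.txt\r\n'

print(repr(line.split('\n')[0]))     # '/Users/myname/Documents/test1.txt\r'
print(repr(line.strip()))            # '/Users/myname/Documents/test1.txt'
print(repr(line.rstrip()))           # '/Users/myname/Documents/test1.txt'
print(repr(line.replace('\n', '')))  # '/Users/myname/Documents/test1.txt\r'
```

Only strip() and rstrip() also remove the leftover \r, because they treat it as whitespace.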
Here is another way to do it with lambda:
cleannewline = lambda somelist: list(map(lambda element: element.strip(), somelist))
Then you can just call it as:
cleannewline(yourlist)
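In Python 3, map() returns a lazy iterator rather than a list. If you need an actual list, for example to print or index it, wrap the result in list(). PEP 8 also recommends a def over assigning a lambda to a name. Here is a sketch. clean_newlines is a hypothetical name:

```python
def clean_newlines(somelist):
    """Return a new list with surrounding whitespace stripped from each string."""
    return [element.strip() for element in somelist]

lazy = map(str.strip, ['a\n', 'b\n'])   # in Python 3 this is a map iterator, not a list
print(list(lazy))                       # ['a', 'b']
print(clean_newlines(['a\n', 'b\n']))   # ['a', 'b']
```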
I have very messy data. I am trying to remove the elements that contain only letters or words, and keep the elements that have alphanumeric or numeric values. I tried .isalpha(), but it's not working. How do I remove them?
lista = ['A8817-2938-228','12421','12323-12928-A','12323-12928',
'-','A','YDDEWE','hello','world','testing_purpose','testing purpose',
'A8232-2938-228','N7261-8271']
lista
Tried:
[i.isalnum() for i in lista] # gives boolean, but opposite of what I need.
Output:
['A8817-2938-228','12421','12323-12928-A','12323-12928','-','A8232-2938-228','N7261-8271']
Thanks!
You can add conditional checks in list comprehensions, so this is what you want:
new_list = [i for i in lista if not i.isalnum()]
print(new_list)
Output:
['A8817-2938-228', '12323-12928-A', '12323-12928', '-', 'testing_purpose', 'testing purpose', 'A8232-2938-228', 'N7261-8271']
Note that isalnum() returns False if the string contains spaces or underscores, which is why 'testing_purpose' and 'testing purpose' are kept above. One option is to remove those characters before checking (you also need to use isalpha() instead of isalnum()):
new_list_2 = [i for i in lista if not i.replace(" ", "").replace("_", "").isalpha()]
print(new_list_2)
Output:
['A8817-2938-228', '12421', '12323-12928-A', '12323-12928', '-', 'A8232-2938-228', 'N7261-8271']
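You can express the same filter with a regular expression. The sketch below treats an element as a "word" when it consists only of letters, underscores and spaces, and drops those elements. It assumes ASCII data: unlike isalpha(), the class [A-Za-z] does not match accented or other non-ASCII letters.

```python
import re

lista = ['A8817-2938-228', '12421', '12323-12928-A', '12323-12928',
         '-', 'A', 'YDDEWE', 'hello', 'world', 'testing_purpose', 'testing purpose',
         'A8232-2938-228', 'N7261-8271']

# Elements made up only of ASCII letters, underscores and spaces count as "words"
word_only = re.compile(r'[A-Za-z_ ]+')

# fullmatch() succeeds only if the whole string matches the pattern
new_list_3 = [i for i in lista if not word_only.fullmatch(i)]
print(new_list_3)
# ['A8817-2938-228', '12421', '12323-12928-A', '12323-12928', '-', 'A8232-2938-228', 'N7261-8271']
```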
It seems you can just test whether at least one character is a digit, or whether the element equals '-':
res = [i for i in lista if any(ch.isdigit() for ch in i) or i == '-']
print(res)
['A8817-2938-228', '12421', '12323-12928-A', '12323-12928',
'-', 'A8232-2938-228', 'N7261-8271']
What type is the data in your list?
You can try to do this:
[str(i).isalnum() for i in lista]
I have this list with part of speech tags and their specifics: ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']. As you can see there are no spaces between the characters, so it can be seen as one word.
Now I need a new list with only the part of speech tags, like this ['VNW', 'WW', 'LID'].
I tried removing the brackets and everything in them with a regex like this pattern = re.compile(r'(.*)').
I also tried to match only the capital letters, but I can't get it right. Suggestions?
A regular expression is not needed in this case. Split on '(' and take only the first part:
>>> 'VNW(pers,pron,nomin,red,2v,ev)'.split('(')
['VNW', 'pers,pron,nomin,red,2v,ev)']
>>> 'VNW(pers,pron,nomin,red,2v,ev)'.split('(')[0]
'VNW'
>>> xs = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)',
... 'LID(bep,stan,rest)']
>>> [x.split('(')[0] for x in xs]
['VNW', 'WW', 'LID']
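str.partition() is a close alternative to split('(')[0]. It always returns a 3-tuple (before, separator, after) and splits at the first '(' only, so it never does more splitting than needed. In the sketch below, 'ADJ' is a hypothetical tag with no brackets, added to show what happens when '(' is missing:

```python
xs = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)', 'ADJ']

# If '(' is missing, partition() puts the whole string in the first slot
tags = [x.partition('(')[0] for x in xs]
print(tags)  # ['VNW', 'WW', 'LID', 'ADJ']
```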
Some of the possible solutions are:
Removing the brackets using a loop
l = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
for i in range(len(l)):
    # cut out everything from '(' up to and including ')'
    i1, i2 = l[i].find('('), l[i].find(')')
    l[i] = l[i][:i1] + l[i][i2+1:]
print(l)
Using a regex
import re
pattern = r'\([^)]*\)'  # '(' followed by any non-')' characters, then ')'
l = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
for i in range(len(l)):
    l[i] = re.sub(pattern, '', l[i])
print(l)
Output: ['VNW', 'WW', 'LID']
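The question also mentions matching only the capital letters. re.match() anchors at the start of the string, so the pattern [A-Z]+ picks up the leading run of uppercase letters. The sketch below assumes every tag starts with at least one uppercase letter. Strings that don't are skipped, because re.match() returns None for them:

```python
import re

l = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']

tags = []
for x in l:
    m = re.match(r'[A-Z]+', x)   # match only at the beginning of the string
    if m:                        # skip strings that do not start with a capital letter
        tags.append(m.group())
print(tags)  # ['VNW', 'WW', 'LID']
```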
Short solution using str.find() function:
l = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
result = [i[:i.find('(')] for i in l]
result contents:
['VNW', 'WW', 'LID']
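The str.find() approach has one caveat. find() returns -1 when '(' is absent, so the slice silently drops the last character instead of keeping the whole string. In this sketch, 'ADJ' is a hypothetical tag without brackets:

```python
l = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'ADJ']

# Unguarded: 'ADJ'.find('(') is -1, so 'ADJ'[:-1] silently drops the last character
unguarded = [i[:i.find('(')] for i in l]
print(unguarded)  # ['VNW', 'WW', 'AD']

# Guarded: only slice when '(' is actually present
guarded = [i[:i.find('(')] if '(' in i else i for i in l]
print(guarded)    # ['VNW', 'WW', 'ADJ']
```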
For example:
In [102]: s=['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
In [103]: [x.split('(', 1)[0] for x in s]
Out[103]: ['VNW', 'WW', 'LID']
I am trying to use line.strip() and line.split() to get an element out of a file, but this always gives me a list of string, does line.split() always return a string? how can I just get a list of elements instead of a list of 'elements'?
myfile = open('myfile.txt','r')
for line in myfile:
    line_strip = line.strip()
    myline = line_strip.split(' ')
    print(myline)
So my code gives me ['hello','hi'], but I want to get a list out of the file that looks like [hello,hi].
My file looks like this:
[2.856,9.678,6.001] 6 Mary
[8.923,3.125,0.588] 7 Louis
[7.122,9.023,4,421] 16 Ariel
So when I try:
list = []
list.append((mylist[0][0],mylist[0][1]))
I actually want list = [(2.856,9.678),(8.923,3.123),(7.122,9.023)],
but it seems that mylist[0][0] refers to the '[' in my file.
my_string = 'hello'
my_list = list(my_string) # ['h', 'e', 'l', 'l', 'o']
my_new_string = ''.join(my_list) # 'hello'
I think you are looking for this:
>>> data = ['1', '2', '3']
>>> print("[{}]".format(", ".join(data)))
[1, 2, 3]
To address your question, though
this always gives me a list of string,
Right. That is what str.split() does.
does line.split() always return a string?
Assuming type(line) == str, then no, it returns a list of string elements from the split line.
how can I just get a list of elements instead of a list of 'elements'?
Your "elements" are strings. The ' marks are only part of Python's repr() of a str. They are not part of the data itself.
For example...
print('4') # 4
print(repr('4')) # '4'
line = "1,2,3"
data = line.split(",")
print(data) # ['1', '2', '3']
You can convert the elements to a different data type as needed:
print([float(x) for x in data]) # [1.0, 2.0, 3.0]
For what you posted, use a regex:
>>> s="[2.856,9.678,6.001] 6 Mary"
>>> import re
>>> [float(e) for e in re.search(r'\[([^\]]+)',s).group(1).split(',')]
[2.856, 9.678, 6.001]
For all the lines you posted (and this would be similar to a file) you might do:
>>> txt="""\
... [2.856,9.678,6.001] 6 Mary
... [8.923,3.125,0.588] 7 Louis
... [7.122,9.023,4,421] 16 Ariel"""
>>> for line in txt.splitlines():
...     print([float(e) for e in re.search(r'\[([^\]]+)', line).group(1).split(',')])
...
[2.856, 9.678, 6.001]
[8.923, 3.125, 0.588]
[7.122, 9.023, 4.0, 421.0]
You would need to add error handling to that (for instance, if the match fails), but this is the core of what you are looking for.
BTW: Don't use list as a variable name. You will shadow the built-in list type and get confusing errors later on...
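To get the list of tuples the question asks for, you can combine the same regex with minimal error handling. This is a sketch. It iterates over the three sample lines from the question, where real code would iterate over the open file. pairs is a hypothetical name, chosen to avoid shadowing list:

```python
import re

# The three sample lines from the question; in practice, iterate over the open file instead
lines = [
    "[2.856,9.678,6.001] 6 Mary",
    "[8.923,3.125,0.588] 7 Louis",
    "[7.122,9.023,4,421] 16 Ariel",
]

pairs = []
for line in lines:
    match = re.search(r'\[([^\]]+)\]', line)
    if match is None:          # skip lines that have no [...] part
        continue
    numbers = [float(e) for e in match.group(1).split(',')]
    pairs.append((numbers[0], numbers[1]))   # keep only the first two values

print(pairs)  # [(2.856, 9.678), (8.923, 3.125), (7.122, 9.023)]
```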
line.split() returns a list of strings.
For example:
my_string = 'hello hi'
my_string.split(' ') is equal to ['hello', 'hi']
To put a list of strings like ['hello', 'hi'] back together, use join().
For example, ' '.join(['hello', 'hi']) is equal to 'hello hi'. The ' ' specifies to put a space between all the elements in the list that you are joining.
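When called with no argument, str.split() splits on any run of whitespace and ignores leading and trailing whitespace, including the newline. The separate strip() call in the question's code then becomes unnecessary. Here is a comparison on a made-up line with extra spaces:

```python
line = '  hello   hi \n'

print(line.split(' '))          # ['', '', 'hello', '', '', 'hi', '\n']
print(line.split())             # ['hello', 'hi']
print(' '.join(line.split()))   # hello hi
```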
These commands:
l = ["1\n2"]
print(l)
print:
['1\n2']
I want to print:
['1
2']
Is this possible when the list is generated outside of the print() call?
A first attempt:
l = ["1\n2"]
print(repr(l).replace('\\n', '\n'))
The solution above doesn't work in tricky cases. For example, if the string is "1\\n2" (a literal backslash followed by n), the replace still turns it into a newline, but it shouldn't. Here is how to fix it:
import re
l = ["1\n2"]
print(re.sub(r'\\n|(\\.)', lambda match: match.group(1) or '\n', repr(l)))
Only if you print the element itself (or each element), not the whole list:
>>> a = ['1\n2']
>>> a
['1\n2']
>>> print(a)
['1\n2']
>>> print(a[0])
1
2
When you print the whole list, Python prints the list's representation, which shows each element using repr(), so newlines appear as \n. Newlines inside an element are printed as actual line breaks only when you print that element itself.
You could use this if you have more than one element:
>>> test = ['1\n2', '3', '4\n5']
>>> print('[{0}]'.format(','.join(test)))
[1
2,3,4
5]
Try this:
s = ["1\n2"]
print("['{}']".format(s[0]))
=> ['1
2']
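The format() approach above handles only s[0]. It can be generalized to any number of elements by joining the quoted elements with ", ". This is a sketch. show_with_newlines is a hypothetical helper, and it does not escape quotes or backslashes inside the strings:

```python
def show_with_newlines(items):
    """Format a list of strings like a list literal, but with real newlines."""
    # Each element is wrapped in single quotes as-is; no escaping is done
    return "[" + ", ".join("'{}'".format(s) for s in items) + "]"

print(show_with_newlines(["1\n2"]))
# ['1
# 2']
print(show_with_newlines(["1\n2", "3"]))
# ['1
# 2', '3']
```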