myArray = []
textFile = open("file.txt")
lines = textFile.readlines()
for line in lines:
    myArray.append(line.split(" "))
print(myArray)
This code outputs
[['a\n'], ['b\n'], ['c\n'], ['d']]
What would I need to do to make it output
a, b, c, d
You're appending a list to your result on every iteration, because split returns a list. Moreover, passing " " as the separator isn't the best choice: it doesn't strip the linefeed or carriage return, and consecutive spaces produce empty elements.
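As a quick illustration of that difference, here is a minimal sketch. The sample string is made up for the demonstration and is not taken from the file in the question:

```python
# Hypothetical sample line: two spaces between the words, trailing newline.
line = "a  b\n"

with_space = line.split(" ")  # explicit single-space separator
no_arg = line.split()         # no argument: split on any run of whitespace

print(with_space)  # ['a', '', 'b\n']
print(no_arg)      # ['a', 'b']
```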
You could do this with a list comprehension, calling split without an argument (so the \n goes away along with any other whitespace):
with open("file.txt") as lines:
    myArray = [x for line in lines for x in line.split()]
(Note the with block, which closes the file as soon as the block exits, and the double loop, which "flattens" the list of lists into a single list, so a line can hold more than one element.)
then, either you print the representation of the array
print (myArray)
to get:
['a', 'b', 'c', 'd']
or you generate a joined string using comma+space
print(", ".join(myArray))
result:
a, b, c, d
You could do the following:
myArray = [[v.strip() for v in x] for x in myArray]
This strips the leading and trailing whitespace (including the newlines) from every string.
If you do not want each string to be in its own list, you could then do:
myArray = [v[0] for v in myArray]
Then, to print, use print(', '.join(myArray)).
It seems you should be using strip() (which trims whitespace) rather than split() (which generates a list of chunks of the string):
myArray.append(line.strip())
Then 'print(myArray)' will generate:
['a', 'b', 'c', 'd']
To print 'a, b, c, d' you can use join():
print(', '.join(myArray))
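Putting those pieces together, a minimal sketch might look like the following. It uses io.StringIO as a stand-in for open("file.txt"), and the contents are assumed to match the four lines in the question:

```python
import io

# Stand-in for open("file.txt"); contents assumed from the question.
text_file = io.StringIO("a\nb\nc\nd")

my_array = []
for line in text_file:
    my_array.append(line.strip())  # strip() removes the trailing newline

joined = ", ".join(my_array)
print(my_array)  # ['a', 'b', 'c', 'd']
print(joined)    # a, b, c, d
```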
You can try something like:
import re
arr = [['a\n'], ['b\n'], ['c\n'], ['d']]
arr = ( ", ".join( repr(e) for e in arr ))
arr = arr.replace('\\n', '')
new = re.sub(r'[^a-zA-Z0-9,]+', '', arr)
print(new)
Result:
a,b,c,d
For a given string s='ab12dc3e6' I want to put the letter runs (such as 'ab') and the digit runs (such as '12') into two different lists. That is, the output I am trying to achieve is temp1=['ab','dc','e'] and temp2=['12','3','6'].
I am not able to do so with the following code. Can someone suggest an efficient way to do it?
S = "ab12dc3e6"
temp=list(S)
x=''
temp1=[]
temp2=[]
for i in range(len(temp)):
    while i<len(temp) and (temp[i] and temp[i+1]).isdigit():
        x+=temp[i]
        i+=1
    temp1.append(x)
    if not temp[i].isdigit():
        break
You can also solve this without any imports:
S = "ab12dc3e6"
def get_adjacent_by_func(content, func):
    """Returns a list of runs of adjacent elements from content that fulfill func(...)"""
    result = [[]]
    for c in content:
        if func(c):
            # add to last inner list
            result[-1].append(c)
        elif result[-1]: # last inner list is filled
            # add new inner list
            result.append([])
    # return only non-empty inner lists
    return [''.join(r) for r in result if r]
print(get_adjacent_by_func(S, str.isalpha))
print(get_adjacent_by_func(S, str.isdigit))
Output:
['ab', 'dc', 'e']
['12', '3', '6']
You can use a regex that groups letters and digits, then append the matches to the lists:
import re
S = "ab12dc3e6"
pattern = re.compile(r"([a-zA-Z]*)(\d*)")
temp1 = []
temp2 = []
for match in pattern.finditer(S):
    # extract words
    # don't append an empty match
    if match.group(1):
        temp1.append(match.group(1))
        print(match.group(1))
    # extract numbers
    # don't append an empty match
    if match.group(2):
        temp2.append(match.group(2))
        print(match.group(2))
print(temp1)
print(temp2)
Your code never handles the letters (isalpha), and you also run into an IndexError on
while i<len(temp) and (temp[i] and temp[i+1]).isdigit():
for i == len(temp)-1.
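To make both problems concrete, here is a small sketch using the string from the question. The variable names only_second_checked and raised are introduced here purely for illustration:

```python
temp = list("ab12dc3e6")

# `x and y` evaluates to y when x is truthy, and any non-empty string is
# truthy, so only the second character is actually tested by isdigit().
only_second_checked = ("a" and "1").isdigit()
print(only_second_checked)  # True, although "a" is not a digit

# At the last index, temp[i + 1] is out of range.
i = len(temp) - 1
try:
    temp[i + 1]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```

Testing each character separately, for example temp[i].isdigit() and temp[i+1].isdigit(), fixes the first problem but not the out-of-range index.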
You can use itertools.takewhile and the correct string methods of str.isdigit and str.isalpha to filter your string down:
from itertools import takewhile, cycle
S = "ab12dc3e6"
# switch between the two test methods
c = cycle([str.isalpha, str.isdigit])
r = {}
# note: this loops forever if S contains a character that is neither a letter nor a digit
while S:
    what = next(c) # get next method to use
    k = ''.join(takewhile(what, S))
    S = S[len(k):]
    r.setdefault(what.__name__, []).append(k)
print(r)
Output:
{'isalpha': ['ab', 'dc', 'e'],
'isdigit': ['12', '3', '6']}
This essentially creates a dictionary where each separate list is stored under the name of the function that matched it.
To get the lists, use r["isalpha"] or r["isdigit"].
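As a different technique from the one above, itertools.groupby can also split the string into runs. This is only a sketch: it assumes that every character that is not a digit belongs in the letters list, which holds for the example string but may not hold for other input.

```python
from itertools import groupby

S = "ab12dc3e6"
letters, digits = [], []

# groupby yields one group per run of consecutive characters sharing a key.
for is_digit, run in groupby(S, key=str.isdigit):
    chunk = "".join(run)
    (digits if is_digit else letters).append(chunk)

print(letters)  # ['ab', 'dc', 'e']
print(digits)   # ['12', '3', '6']
```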
I am trying to slice the string "abcdeeefghij" so that, whatever the input, the output is a list in which no element contains a repeated letter.
In this case: [abcde, e, efghij].
Another example: for the input "aaabcdefghiii", the expected output is [a, a, abcdefghi, i, i].
Also, to find the length of the longest element in the list, I tried the logic below:
max_str = max(len(sub_strings[0]),len(sub_strings[1]),len(sub_strings[2]))
print(max_str) #output - 6
This yields 6, but I presume this logic is not generic. Can someone suggest a generic way to print the length of the longest string?
Here is how:
s = "abcdeeefghij"
l = ['']
for c in s: # For each character in s
    if c in l[-1]: # If the character is already in the last string in l
        l.append('') # Start a new string in l
    l[-1] += c # Add the character to the last string, whether new or old
print(l)
Output:
['abcde', 'e', 'efghij']
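For the second part of the question (a generic way to get the length of the longest string), one possible sketch uses the built-in max with key=len. The list is copied from the output above, and the names longest and max_len are chosen here for illustration:

```python
sub_strings = ['abcde', 'e', 'efghij']

# max with key=len returns the longest element, however many there are.
longest = max(sub_strings, key=len)
max_len = len(longest)

print(longest)  # efghij
print(max_len)  # 6

# If the list might be empty, a default avoids a ValueError.
safe_len = max(map(len, sub_strings), default=0)
```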
Use a regular expression:
import re
rx = re.compile(r'(\w)\1+')
strings = ['abcdeeefghij', 'aaabcdefghiii']
lst = [[part for part in rx.split(item) if part] for item in strings]
print(lst)
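Which yields the following (note that each run of repeated characters collapses into a single element, so this differs from the output requested in the question):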
[['abcd', 'e', 'fghij'], ['a', 'bcdefgh', 'i']]
You would loop over the characters in the input and start a new string whenever the character already appears in the last string of the output list; in either case, the character is then appended to the last string.
input_ = "aaabcdefghiii"
output = []
for char in input_:
    if not output or char in output[-1]:
        output.append("")
    output[-1] += char
print(output)
To avoid repeating a letter within a list element, you can greedily track which letters are already in the current element. Once you detect a repeated letter, append the current element to your answer and start a new one.
from collections import defaultdict
s = input()
ans = []
d = defaultdict(int)
cur = ""
for i in s:
    if d[i]:
        ans.append(cur)
        cur = i # start again since there is a repetition
        d = defaultdict(int)
        d[i] = 1
    else:
        cur += i # append to cur since no repetition yet
        d[i] = 1
if cur: # handling the last part
    ans.append(cur)
print(ans)
An input of aaabcdefghiii produces ['a', 'a', 'abcdefghi', 'i', 'i'] as expected.
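The same greedy idea can be sketched as a reusable function that tracks the seen letters in a set instead of a defaultdict. The function name split_no_repeats is hypothetical, chosen here for illustration:

```python
def split_no_repeats(s):
    """Greedily split s so that no part contains a repeated character."""
    parts = []
    seen = set()
    current = ""
    for ch in s:
        if ch in seen:
            # Repetition detected: close the current part and start over.
            parts.append(current)
            current = ""
            seen = set()
        current += ch
        seen.add(ch)
    if current:  # keep the final part
        parts.append(current)
    return parts

print(split_no_repeats("aaabcdefghiii"))  # ['a', 'a', 'abcdefghi', 'i', 'i']
print(split_no_repeats("abcdeeefghij"))   # ['abcde', 'e', 'efghij']
```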
I am trying to use line.strip() and line.split() to get an element out of a file, but this always gives me a list of string, does line.split() always return a string? how can I just get a list of elements instead of a list of 'elements'?
myfile = open('myfile.txt','r')
for line in myfile:
    line_strip = line.strip()
    myline = line_strip.split(' ')
    print(myline)
So my code gives me ['hello','hi']
I want the list I get from the file to look like [hello,hi]
[2.856,9.678,6.001] 6 Mary
[8.923,3.125,0.588] 7 Louis
[7.122,9.023,4,421] 16 Ariel
so when I try
list = []
list.append((mylist[0][0],mylist[0][1]))
I actually want list = [(2.856,9.678),(8.923,3.125),(7.122,9.023)]
but it seems mylist[0][0] refers to the '[' character in my file
my_string = 'hello'
my_list = list(my_string) # ['h', 'e', 'l', 'l', 'o']
my_new_string = ''.join(my_list) # 'hello'
I think you are looking for this
>>> print("[{}]".format(", ".join(data)))
[1, 2, 3]
To address your question, though
this always gives me a list of string,
Right. As str.split() should do.
does line.split() always return a string?
Assuming type(line) == str, then no, it returns a list of string elements from the split line.
how can I just get a list of elements instead of a list of 'elements'?
Your "elements" are strings. The ' marks are only Python's repr of a str type.
For example...
print('4') # 4
print(repr('4')) # '4'
line = "1,2,3"
data = line.split(",")
print(data) # ['1', '2', '3']
You can cast to a different data-type as you wish
print([float(x) for x in data]) # [1.0, 2.0, 3.0]
For what you posted, use a regex:
>>> s="[2.856,9.678,6.001] 6 Mary"
>>> import re
>>> [float(e) for e in re.search(r'\[([^\]]+)',s).group(1).split(',')]
[2.856, 9.678, 6.001]
For all the lines you posted (and this would be similar to a file) you might do:
>>> txt="""\
... [2.856,9.678,6.001] 6 Mary
... [8.923,3.125,0.588] 7 Louis
... [7.122,9.023,4,421] 16 Ariel"""
>>> for line in txt.splitlines():
...     print([float(e) for e in re.search(r'\[([^\]]+)',line).group(1).split(',')])
...
[2.856, 9.678, 6.001]
[8.923, 3.125, 0.588]
[7.122, 9.023, 4.0, 421.0]
You would need to add error handling to that (if the match fails, for instance), but this is the core of what you are looking for.
BTW: Don't use list as a variable name. You will shadow the built-in list type and get confusing errors in the future.
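To get the pairs the question asks for, the first two values of each bracketed list as a tuple, here is one possible sketch. It swaps the regex for str.partition plus ast.literal_eval, and it assumes every line starts with a well-formed bracketed list, as in the sample data:

```python
import ast

txt = """\
[2.856,9.678,6.001] 6 Mary
[8.923,3.125,0.588] 7 Louis
[7.122,9.023,4,421] 16 Ariel"""

pairs = []
for line in txt.splitlines():
    # Everything up to the first ']' is the bracketed list.
    bracket, _, rest = line.partition("]")
    values = ast.literal_eval(bracket + "]")  # parse the text as a Python list
    pairs.append((values[0], values[1]))

print(pairs)  # [(2.856, 9.678), (8.923, 3.125), (7.122, 9.023)]
```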
line.split() returns a list of strings.
For example:
my_string = 'hello hi'
my_string.split(' ') is equal to ['hello', 'hi']
To put a list of strings, like ['hello', 'hi'], back together, use join.
For example, ' '.join(['hello', 'hi']) is equal to 'hello hi'. The ' ' specifies to put a space between all the elements in the list that you are joining.
Assuming a Python list "mylist" contains:
mylist = [u'a',u'b',u'c']
I would like a string that contains all of the elements in the list while preserving the single quotes, like this (notice that there are no brackets, but parentheses instead):
result = "('a','b','c')"
I tried using ",".join(mylist), but it gives me "a,b,c" and drops the single quotes.
You were quite close, this is how I would've done it:
result = "('%s')" % "','".join(mylist)
What about this:
>>> mylist = [u'a',u'b',u'c']
>>> str(tuple(map(str, mylist)))
"('a', 'b', 'c')"
Try this:
result = "({})".format(",".join(["'{}'".format(char) for char in mylist]))
>>> l = [u'a', u'b', u'c']
>>> str(tuple([str(e) for e in l]))
"('a', 'b', 'c')"
Calling str on each element e of the list l turns the Unicode string into a plain byte string (in Python 2), which drops the u prefix. Next, calling tuple on the result of the list comprehension replaces the square brackets with parentheses. Finally, calling str on that tuple returns its string representation: the elements in single quotes, enclosed in parentheses.
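One caveat with the tuple-based approaches is worth a short sketch. It assumes Python 3, where strings carry no u prefix, and compares them with the join-based approach on a one-element list; the name single is introduced here for illustration:

```python
mylist = ['a', 'b', 'c']

joined = "('%s')" % "','".join(mylist)
via_tuple = str(tuple(mylist))
print(joined)     # ('a','b','c')
print(via_tuple)  # ('a', 'b', 'c')   (note the spaces after the commas)

# With a single element, the tuple representation gains a trailing comma.
single = ['a']
joined_single = "('%s')" % "','".join(single)
tuple_single = str(tuple(single))
print(joined_single)  # ('a')
print(tuple_single)   # ('a',)
```

If the exact spacing or the one-element case matters, the join-based version gives more control over the result.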
What about repr()?
>>> repr(tuple(mylist))
"(u'a', u'b', u'c')"
Here is another variation:
mylist = [u'a',u'b',u'c']
result = "\"{0}\"".format(tuple(mylist))
print(result)
Output:
"('a', 'b', 'c')"
File abc's content:
a
b
c
The code is
data_fh = open("abc")
str = data_fh.read()
arr = str.split("\n")
print(len(arr))
data_fh.seek(0)
arr = data_fh.read().splitlines()
print(len(arr))
but the output is:
4
3
so why is that?
Because .splitlines() does not produce an extra empty string for a terminal line break, while .split('\n') returns an empty string after the final \n:
>>> 'last\n'.split('\n')
['last', '']
>>> 'last\n'.splitlines()
['last']
This is explicitly mentioned in the str.splitlines() documentation:
Unlike split() when a delimiter string sep is given, this method returns an empty list for the empty string, and a terminal line break does not result in an extra line.
If there is no trailing newline, the output is identical:
>>> 'last'.split('\n')
['last']
>>> 'last'.splitlines()
['last']
In other words, str.split() doesn't add anything; it is str.splitlines() that drops the empty string a trailing newline would otherwise produce.
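The other case mentioned in the documentation quote, the empty string, can be checked the same way. The sketch below also shows one possible way to make split('\n') agree with splitlines() for the sample content, by stripping trailing newlines first; the variable names are chosen here for illustration:

```python
empty_split = ''.split('\n')
empty_splitlines = ''.splitlines()
print(empty_split)       # ['']
print(empty_splitlines)  # []

s = 'a\nb\nc\n'
# Stripping the trailing newline first removes the extra empty string.
stripped_split = s.rstrip('\n').split('\n')
print(stripped_split)    # ['a', 'b', 'c']
```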
You probably have a trailing newline:
>>> s = 'a\nb\nc\n' # <-- notice the \n at the end
>>>
>>> s.split('\n')
['a', 'b', 'c', '']
>>>
>>> s.splitlines()
['a', 'b', 'c']
Notice that split() leaves an empty string at the end whereas splitlines() does not.
As an aside, you shouldn't use str as a variable name, since that name already belongs to the built-in string type.