String concatenation produces incorrect output in Python? - python

I have this code:
filenames=["file1","FILE2","file3","fiLe4"]
def alignfilenames():
    #build a string that can be used to add labels to the R variables.
    #format goal: suffixes=c(".fileA",".fileB")
    filestring='suffixes=c(".'
    for filename in filenames:
        filestring=filestring+str(filename)+'",".'
    print filestring[:-3]
    #now delete the extra characters
    filestring=filestring[-1:-4]
    filestring=filestring+')'
    print "New String"
    print str(filestring)
alignfilenames()
I'm trying to get the string variable to look like this format: suffixes=c(".fileA",".fileB".....) but adding on the final parenthesis is not working. When I run this code as is, I get:
suffixes=c(".file1",".FILE2",".file3",".fiLe4"
New String
)
Any idea what's going on or how to fix it?

Does this do what you want?
>>> filenames=["file1","FILE2","file3","fiLe4"]
>>> c = "suffixes=c(%s)" % (",".join('".%s"' %f for f in filenames))
>>> c
'suffixes=c(".file1",".FILE2",".file3",".fiLe4")'
Using str.join is a much better way to put a common delimiter between the items of a list. It removes the need to check for the last item before adding the delimiter or, as in your case, to strip off the last one added.
Also, you may want to look into list comprehensions.

It looks like you might be trying to use Python to write an R script, which can be a quick solution if you don't know how to do it in R. But in this case the R-only solution is actually rather simple (note that tolower also lowercases the names; leave it out to keep the original case):
R> filenames= c("file1","FILE2","file3","fiLe4")
R> suffixes <- paste(".", tolower(filenames), sep="")
R> suffixes
[1] ".file1" ".file2" ".file3" ".file4"
R>

What's going on is that this slice returns an empty string:
filestring=filestring[-1:-4]
The start index (-1, the last character) comes after the end index (-4), and with the default step of 1 there is nothing in between. Try the following in the interpreter:
>>> a = "hello world"
>>> a[-1:-4]
''
To remove the three characters before the last one and keep the last character, you would instead write
filestring=filestring[:-4]+filestring[-1:]
But I think what you actually wanted was just to drop the last three characters:
filestring=filestring[:-3]
The better solution is to use the join method of strings, as sberry2A suggested.
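To make the difference concrete, here is a minimal sketch that rebuilds the string from the question and compares the empty slice, the three-character trim, and the join-based version. The variable names empty, trimmed and joined are only for illustration.

```python
filenames = ["file1", "FILE2", "file3", "fiLe4"]

# Loop-built string from the question, including the trailing '",".'
filestring = 'suffixes=c(".'
for filename in filenames:
    filestring = filestring + filename + '",".'

# Start index (-1) lies after end index (-4), so the slice is empty.
empty = filestring[-1:-4]

# Drop the trailing ',".' (three characters) and close the parenthesis.
trimmed = filestring[:-3] + ')'

# join-based version: there is no trailing delimiter to clean up.
joined = 'suffixes=c(%s)' % ",".join('".%s"' % f for f in filenames)

print(repr(empty))
print(trimmed)
print(joined)
```

Both trimmed and joined should come out as the same string, which is a quick way to check the slice arithmetic.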

Related

How can I delete comma at the end of the output in Python?

I am trying to sort a word's letters alphabetically in Python, but there is a comma at the end of the output. (I tried the ''.sort() command; it worked well, but there are square brackets at the beginning and at the end of the output.) The input and the output must look like this:
word
'd','o','r','w'
This is my code:
alphabet='AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
word=str(input())
for i in alphabet:
    for j in word:
        if i==j:
            print("'{}',".format(i),end='')
And this is my output:
word
'd','o','r','w',
Python strings have a join() function:
ls = ['a','b','c']
print(",".join(ls)) # prints "a,b,c"
Python also has what is called a 'list comprehension', that you can use like so:
alphabet='AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
word=str(input())
matches = [l for l in word if l in alphabet]
print(",".join(sorted(matches)))
All the list comprehension does is put l in the list if it is in alphabet. All the candidate ls are taken from the word variable.
sorted is a function that will do a simple sort (though more complex sorts are possible).
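The snippet above prints the letters without the single quotes shown in the question. A minimal sketch that also quotes each letter and orders the letters by their position in alphabet (as the original nested loops do) might look like this; quoted_sorted_letters is a hypothetical helper name, not something from the question.

```python
alphabet = 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

def quoted_sorted_letters(word):
    # Hypothetical helper: keep only characters found in alphabet,
    # order them by their position in alphabet, then quote each one.
    matches = [l for l in word if l in alphabet]
    ordered = sorted(matches, key=alphabet.index)
    return ",".join("'{}'".format(l) for l in ordered)

print(quoted_sorted_letters("word"))
```

Using key=alphabet.index is one design choice among several; it keeps the 'AaBb...' ordering, whereas plain sorted() would put all uppercase letters before lowercase ones.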
Finally, here are a few other fun options that all result in "a,b,c,d":
"a,b,c,d,"[:-1] # string slice
"a,b,c,d,".strip(",") # str.strip
You can store the matches in a list and then print it at the end:
alphabet='AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
word=str(input())
matches = []
for i in alphabet:
    for j in word:
        if i==j:
            matches.append("'{i}'".format(i=i))
#now that matches has all our matches
print(",".join(matches)) # join it
or, as others have mentioned:
print(",".join(sorted(word)))
You want to use the str.join() method.
alphabet='AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
','.join(alphabet)
There's really no need to do anything to turn the string into a list; join will iterate over it quite happily. Tested on Python 2.7 and 3.6.
Doing it yourself
The trick is in the algorithm you use.
You want to add a comma and a space after each field except the last. But it is hard to know which field is the last until it is too late.
It would be much easier if you could make the first field the special case, as this is much easier to predict.
Therefore transform the algorithm to: add a comma and a space before each field except the first. This produces the same output, but is a much simpler algorithm.
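As a rough sketch of that transformed algorithm (join_fields and its sep parameter are hypothetical names chosen for illustration), the separator is emitted before every field except the first:

```python
def join_fields(fields, sep=", "):
    # Hypothetical helper: put sep before every field except the first.
    out = ""
    first = True
    for field in fields:
        if not first:
            out += sep
        out += field
        first = False
    return out

print(join_fields(["a", "b", "c"]))
```

The first-field flag is known before the loop starts, which is why this version needs no look-ahead and no trimming afterwards.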
Use a library
Using a library is always preferable (unless you are doing it just for the practice).
Python has the join method. See the other answers.

Why is the split() returning list objects that are empty? [duplicate]

I have the following file names that exhibit this pattern:
000014_L_20111007T084734-20111008T023142.txt
000014_U_20111007T084734-20111008T023142.txt
...
I want to extract the middle two time stamp parts after the second underscore '_' and before '.txt'. So I used the following Python regex string split:
time_info = re.split('^[0-9]+_[LU]_|-|\.txt$', f)
But this gives me two extra empty strings in the returned list:
time_info=['', '20111007T084734', '20111008T023142', '']
How do I get only the two timestamps? I.e., I want:
time_info=['20111007T084734', '20111008T023142']
I'm no Python expert but maybe you could just remove the empty strings from your list?
str_list = re.split('^[0-9]+_[LU]_|-|\.txt$', f)
time_info = filter(None, str_list)
Don't use re.split(); use the groups() method of regex Match/SRE_Match objects.
>>> f = '000014_L_20111007T084734-20111008T023142.txt'
>>> time_info = re.search(r'[LU]_(\w+)-(\w+)\.', f).groups()
>>> time_info
('20111007T084734', '20111008T023142')
You can even name the capturing groups and retrieve them in a dict, though you use groupdict() rather than groups() for that. (The regex pattern for such a case would be something like r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.')
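A minimal sketch of the named-group variant, using the group names groupA and groupB from the pattern above; the guard against a failed match is an addition for safety, not part of the original answer.

```python
import re

f = '000014_L_20111007T084734-20111008T023142.txt'

# Same pattern as above, but with named capturing groups.
match = re.search(r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.', f)

# re.search returns None when nothing matches, so guard before groupdict().
time_info = match.groupdict() if match else {}

print(time_info)
```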
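If the timestamps are always after the second _ then you can use str.split and str.strip (note that strip(".txt") removes any of the characters ., t and x from both ends rather than the literal suffix, which happens to be safe for these names):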
>>> strs = "000014_L_20111007T084734-20111008T023142.txt"
>>> strs.strip(".txt").split("_",2)[-1].split("-")
['20111007T084734', '20111008T023142']
Since this came up on Google, and for completeness: try using re.findall as an alternative!
This does require a little rethinking, but it still returns a list of matches, as split does. That makes it a nice drop-in replacement in some existing code, and it gets rid of the unwanted text. Pair it with lookaheads and/or lookbehinds and you get very similar behavior.
Yes, this is a bit of a "you're asking the wrong question" answer and doesn't use re.split(). It does solve the underlying issue: your list of matches suddenly has zero-length strings in it, and you don't want that.
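The answer gives no code, so here is one possible sketch of the re.findall approach. The timestamp shape (8 digits, a literal T, 6 digits) is an assumption read off the example filenames, and the second pattern is just one way to add the lookbehind and lookahead mentioned above.

```python
import re

f = '000014_L_20111007T084734-20111008T023142.txt'

# Assumed timestamp shape: 8 digits, a literal T, then 6 digits.
time_info = re.findall(r'\d{8}T\d{6}', f)

# Variant with lookarounds: the timestamp must follow '_' or '-'
# and must be followed by '-' or '.'.
time_info_anchored = re.findall(r'(?<=[_-])\d{8}T\d{6}(?=[-.])', f)

print(time_info)
print(time_info_anchored)
```

Because findall returns only what the pattern matches, there are no empty strings to filter out afterwards.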
>>> f='000014_L_20111007T084734-20111008T023142.txt'
>>> f[9:-4].split('-')
['20111007T084734', '20111008T023142']
or, somewhat more general:
>>> f[f.rfind('_')+1:-4].split('-')
['20111007T084734', '20111008T023142']

Python split a string at an underscore

How do I split a string at the second underscore in Python, so that I get something like this?
name = "this_is_my_name_and_its_cool"
Split name so that I get ["this_is", "my_name_and_its_cool"].
The following statement will split name into a list of strings:
a=name.split("_")
You can combine whichever strings you want using join, in this case the first two words and then the rest:
b="_".join(a[:2])
c="_".join(a[2:])
Maybe you can write a small function that takes as an argument the number of words (n) after which you want to split:
def func(name, n):
    a=name.split("_")
    b="_".join(a[:n])
    c="_".join(a[n:])
    return [b,c]
Assume that you have a string with multiple instances of the same delimiter and you want to split at the nth delimiter, ignoring the others.
Here's a solution using just split and join, without complicated regular expressions. This might be a bit easier to adapt to other delimiters and particularly other values of n.
def split_at(s, c, n):
    words = s.split(c)
    return c.join(words[:n]), c.join(words[n:])
Example:
>>> split_at('this_is_my_name_and_its_cool', '_', 2)
('this_is', 'my_name_and_its_cool')
I think you're trying to split the string at the second underscore. If so, you could use the findall function.
>>> import re
>>> s = "this_is_my_name_and_its_cool"
>>> re.findall(r'^[^_]*_[^_]*|[^_].*$', s)
['this_is', 'my_name_and_its_cool']
>>> [i for i in re.findall(r'^[^_]*_[^_]*|(?!_).*$', s) if i]
['this_is', 'my_name_and_its_cool']
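Try this. The pattern captures everything up to the second underscore, so re.split returns an empty string first, followed by the two parts:
print re.split(r"(^[^_]+_[^_]+)_","this_is_my_name_and_its_cool")
# ['', 'this_is', 'my_name_and_its_cool']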
Here's a quick & dirty way to do it:
s = 'this_is_my_name_and_its_cool'
i = s.find('_'); i = s.find('_', i+1)
print [s[:i], s[i+1:]]
output
['this_is', 'my_name_and_its_cool']
You could generalize this approach to split on the nth separator by putting the find() into a loop.
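One way that generalization could look, as a sketch: split_at_nth is a hypothetical name, and returning the unsplit string when there are fewer than n separators is an arbitrary choice made here, not something the answer specifies.

```python
def split_at_nth(s, sep, n):
    # Hypothetical helper: split s at the nth occurrence of sep (n >= 1).
    if n < 1:
        raise ValueError("n must be at least 1")
    pos = 0
    for _ in range(n):
        i = s.find(sep, pos)
        if i == -1:
            # Fewer than n separators: return the string unsplit.
            return [s]
        pos = i + len(sep)
    return [s[:i], s[pos:]]

print(split_at_nth('this_is_my_name_and_its_cool', '_', 2))
```

Each pass resumes the search just past the previous hit, so the loop does the same thing as the two chained find() calls above when n is 2.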

How to get the same required string in a better and shorter way

s = 'myName.Country.myHeight'
required = s.split('.')[0]+'.'+s.split('.')[1]
print required
myName.Country
How can I get the same 'required' string in a better and shorter way?
Use str.rpartition like this
s = 'myName.Country.myHeight'
print s.rpartition(".")[0]
# myName.Country
rpartition returns a three-element tuple:
the 1st element is the string before the separator,
then the separator itself,
and then the string after the separator.
So, in our case,
s = 'myName.Country.myHeight'
print s.rpartition(".")
# ('myName.Country', '.', 'myHeight')
And we have picked only the first element.
Note: If you want to do it from the left, instead of doing it from the right, we have a sister function called str.partition.
You have a few options.
1. print s.rsplit('.',1)[0]
2. print s[:s.rfind('.')]
3. print s.rpartition('.')[0]
Well, that seems just fine to me... But here are a few other ways I can think of:
required = ".".join(s.split(".")[0:2]) # only one split
# using regular expressions
import re
required = re.sub(r"\.[^.]+$", "", s)
The regex strips everything from the last dot onward, so it only works if there are no dots in the last part you want to split off.

How do I strip a string given a list of unwanted characters? Python

Is there a way to pass in a list instead of a char to str.strip() in python? I have been doing it this way:
unwanted = [c for c in '!##$%^&*(FGHJKmn']
s = 'FFFFoFob*&%ar**^'
for u in unwanted:
    s = s.strip(u)
print s
Desired output (this output is correct, but there should be a more elegant way than how I'm coding it above):
oFob*&%ar
Strip and friends take a string representing a set of characters, so you can skip the loop:
>>> s = 'FFFFoFob*&%ar**^'
>>> s.strip('!##$%^&*(FGHJKmn')
'oFob*&%ar'
(The downside of this is that things like fn.rstrip(".png") seem to work for many filenames, but don't really work: rstrip removes any trailing run of the characters ., p, n and g, not the literal suffix.)
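A small sketch of that pitfall, with made-up filenames. remove_suffix is a hypothetical helper written for this illustration; it only removes the suffix when the name really ends with it.

```python
def remove_suffix(name, suffix):
    # Hypothetical helper: drop suffix only if name really ends with it.
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name

# rstrip treats ".png" as the character set {'.', 'p', 'n', 'g'}.
looks_fine = "logo.png".rstrip(".png")    # stops at 'o', so it appears to work
too_much = "spring.png".rstrip(".png")    # also eats the trailing 'ng' of 'spring'
safe = remove_suffix("spring.png", ".png")

print(looks_fine)
print(too_much)
print(safe)
```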
Since you are not looking to delete characters from the middle, you can just use strip:
>>> 'FFFFoFob*&%ar**^'.strip('!##$%^&*(FGHJKmn')
'oFob*&%ar'
Otherwise, use str.translate() (the two-argument form shown here is Python 2):
>>> 'FFFFoFob*&%ar**^'.translate(None, '!##$%^&*(FGHJKmn')
'oobar'
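In Python 3, str.translate takes a single translation table, so the call above does not work as written. A rough equivalent, assuming the same unwanted characters, builds a deletion table with str.maketrans; the names table and cleaned are only for illustration.

```python
unwanted = '!##$%^&*(FGHJKmn'
s = 'FFFFoFob*&%ar**^'

# The third argument of str.maketrans lists characters to delete.
table = str.maketrans('', '', unwanted)
cleaned = s.translate(table)

print(cleaned)
```

Unlike strip, this removes the unwanted characters everywhere in the string, including the middle.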
