I'm new to Python and relatively new to programming. I'm trying to replace part of a file path with a different file path. If possible, I'd like to avoid regex as I don't know it. If not, I understand.
For each item in the Python list, I want everything up to and including the word PROGRAM to be replaced with the value of the 'replaceWith' variable.
How would you go about doing this?
Current Python List []
item1ToReplace1 = \\server\drive\BusinessFolder\PROGRAM\New\new.vb
item1ToReplace2 = \\server\drive\BusinessFolder\PROGRAM\old\old.vb
Variable to replace part of the Python list path
replaceWith = 'C:\ProgramFiles\Microsoft\PROGRAM'
Desired results for Python List []:
item1ToReplace1 = C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
item1ToReplace2 = C:\ProgramFiles\Microsoft\PROGRAM\old\old.vb
Thank you for your help.
The following code does what you ask. Note that I changed each '\' in your paths to '\\': the backslash is the escape character in Python string literals, so you need to account for it in your code, either by doubling it or by using a raw string.
import os

item1ToReplace1 = '\\server\\drive\\BusinessFolder\\PROGRAM\\New\\new.vb'
item1ToReplace2 = '\\server\\drive\\BusinessFolder\\PROGRAM\\old\\old.vb'
replaceWith = r'C:\ProgramFiles\Microsoft\PROGRAM'  # raw string: backslashes stay literal
keyword = "PROGRAM\\"

def replacer(rp, s, kw):
    # Split once on the keyword; ss[1] is everything after it.
    ss = s.split(kw, 1)
    if len(ss) > 1:
        tail = ss[1]
        # os.path.join uses the platform separator ('\' on Windows).
        return os.path.join(rp, tail)
    else:
        return ""

print(replacer(replaceWith, item1ToReplace1, keyword))
print(replacer(replaceWith, item1ToReplace2, keyword))
The code splits the path once on your keyword and appends the part after the keyword to the replacement path.
If your keyword is not in the string, the result is an empty string.
Result:
C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
C:\ProgramFiles\Microsoft\PROGRAM\old\old.vb
One way would be:
item_ls = item1ToReplace1.split("\\")
idx = item_ls.index("PROGRAM")
result = ["C:", "ProgramFiles", "Microsoft"] + item_ls[idx:]
result = "\\".join(result)
Resulting in:
>>> item1ToReplace1 = r"\\server\drive\BusinessFolder\PROGRAM\New\new.vb"
... # the above
>>> print(result)
C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
Note the use of r"..." (a raw string) so that the backslashes in your input do not have to be escaped. In the ordinary string literals passed to split and join, the backslash must still be escaped as a double backslash.
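Another regex-free option is to let the standard library split the path into components. The following is only a sketch: the helper name replace_prefix and its marker parameter are made up for illustration, and it assumes the folder name PROGRAM appears as a whole path component. PureWindowsPath is used so the backslash handling is the same on any operating system.

```python
from pathlib import PureWindowsPath

def replace_prefix(path, replace_with, marker="PROGRAM"):
    """Return replace_with joined with whatever follows `marker` in path.

    Hypothetical helper for illustration; returns None if marker is absent.
    """
    parts = PureWindowsPath(path).parts
    if marker not in parts:
        return None  # marker folder not found; the caller decides what to do
    tail = parts[parts.index(marker) + 1:]  # components after the marker
    return str(PureWindowsPath(replace_with).joinpath(*tail))

paths = [
    r"\\server\drive\BusinessFolder\PROGRAM\New\new.vb",
    r"\\server\drive\BusinessFolder\PROGRAM\old\old.vb",
]
replace_with = r"C:\ProgramFiles\Microsoft\PROGRAM"

new_paths = [replace_prefix(p, replace_with) for p in paths]
for p in new_paths:
    print(p)
```

Because the comparison is done per path component, a folder such as PROGRAM_OLD would not be mistaken for PROGRAM, which a plain substring split could do.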
I have a list of lists of sequences, and a corresponding list of lists of names.
testSequences = [
['aaaa', 'cccc'],
['tt', 'gg'],
['AAAAAAA', 'CCCCCC', 'TTTTTT', 'GGGGGG']]
testNames = [
['>xx_oneFish |xzx', '>xx_twoFish |zzx'],
['>xx_redFish |zxx', '>xx_blueFish |zxx'],
['>xx_oneFish |xzx', '>xx_twoFish |xzx', '>xz_redFish |xxx', '>zx_blueFish |xzz']]
I also have a list of all the identifying parts of the names:
taxonNames = ['oneFish', 'twoFish', 'redFish', 'blueFish']
I am trying to produce a new list, where each item in the list will correspond to one of the "identifying parts of the names", and the string will be made up of all the sequences for that name.
If a name and sequence does not appear in one of the lists in the lists (i.e. no redFish or blueFish in the first list of testNames) I want to add in a string of hyphens the same length as the sequences in that list. This would give me this output:
['aaaa--AAAAAAA', 'cccc--CCCCCC', '----ttTTTTTT', '----ggGGGGGG']
I have this piece of code to do this.
complete = [''] * len(taxonNames)
for i in range(len(testSequences)):
    for j in range(len(taxonNames)):
        sequenceLength = len(testSequences[i][0])
        for k in range(len(testSequences[i])):
            if taxonNames[j] in testNames[i][k]:
                complete[j].join(testSequences[i][k])
            if taxonNames[j] not in testNames[i][k]:
                hyphenString = "-" * sequenceLength
                complete[j].join(hyphenString)
print(complete)
"complete" should give my final output as explained above, but it comes out looking like this:
['', '', '', '']
How can I fix my code to give me the correct answer?
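The immediate reason you get empty strings is that complete[j].join(...) does not modify complete[j]: strings are immutable, join returns a new string, and your code throws that result away. (join is also not concatenation; it inserts the string between the items of its argument.) Beyond that, the main issue with your code, which makes it very hard to understand, is that you're not really leveraging the language features that make Python so strong.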
Here's a solution to your problem that works:
test_sequences = [
['aaaa', 'cccc'],
['tt', 'gg'],
['AAAAAAA', 'CCCCCC', 'TTTTTT', 'GGGGGG']]
test_names = [
['>xx_oneFish |xzx', '>xx_twoFish |zzx'],
['>xx_redFish |zxx', '>xx_blueFish |zxx'],
['>xx_oneFish |xzx', '>xx_twoFish |xzx', '>xz_redFish |xxx', '>zx_blueFish |xzz']]
taxon_names = ['oneFish', 'twoFish', 'redFish', 'blueFish']
def get_seqs(taxon_name, sequences_list, names_list):
    for seqs, names in zip(sequences_list, names_list):
        found_seq = None
        for seq, name in zip(seqs, names):
            if taxon_name in name:
                found_seq = seq
                break
        yield found_seq if found_seq else '-' * len(seqs[0])

result = [''.join(get_seqs(taxon_name, test_sequences, test_names))
          for taxon_name in taxon_names]
print(result)
The generator get_seqs pairs up lists from test_sequences and test_names and for each pair, tries to find the sequence (seq) for the name (name) that matches and yields it, or yields a string of the right number of hyphens for that list of sequences.
The generator (a function that yields multiple values) has code that quite literally follows the explanation above.
The result is then simply a matter of, for each taxon_name, getting all the resulting sequences that match in order and joining them together into a string, which is the result = ... line.
You could make it work with list indexing loops and string concatenation, but this is not a PHP question, now is it? :)
Note: for brevity, you could just access the global test_sequences and test_names instead of passing them in as parameters, but I think that would come back to haunt you if you were to actually use this code. Also, I think it makes semantic sense to change the order of names and sequences in the entire example, but I didn't to avoid further deviating from your example.
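For reference, here is a minimal sketch of the loop-and-concatenation variant mentioned above, using the data from the question. The essential difference from the original loop is that the new string is assigned back with +=, because strings are immutable and join() on its own changes nothing. The variable names filler and piece are just illustrative.

```python
test_sequences = [
    ['aaaa', 'cccc'],
    ['tt', 'gg'],
    ['AAAAAAA', 'CCCCCC', 'TTTTTT', 'GGGGGG']]
test_names = [
    ['>xx_oneFish |xzx', '>xx_twoFish |zzx'],
    ['>xx_redFish |zxx', '>xx_blueFish |zxx'],
    ['>xx_oneFish |xzx', '>xx_twoFish |xzx', '>xz_redFish |xxx', '>zx_blueFish |xzz']]
taxon_names = ['oneFish', 'twoFish', 'redFish', 'blueFish']

complete = [''] * len(taxon_names)
for seqs, names in zip(test_sequences, test_names):
    # Hyphens as long as the first sequence of this group.
    filler = '-' * len(seqs[0])
    for j, taxon in enumerate(taxon_names):
        # First sequence whose name contains the taxon, else the filler.
        piece = next((seq for seq, name in zip(seqs, names) if taxon in name), filler)
        complete[j] += piece  # += rebinds complete[j] to the longer string

print(complete)
```

With the question's data this gives one string per taxon, each built from three pieces in list order.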
Here is a solution that may do what you want. It begins, not with your data structures from this post, but with the three example files from your previous post (which you used to build this post's data structures).
The only thing I couldn't figure out was how many hyphens to use for a missing sequence from a file.
differentNames = ['oneFish', 'twoFish', 'redFish', 'blueFish']
files = ['f1.txt', 'f2.txt', 'f3.txt']
data = [[] for _ in range(len(differentNames))]
final = []
for file in files:
    d = dict()
    with open(file, 'r') as fin:
        for line in fin:
            line = line.rstrip()
            if line.startswith('>'):  # for ex., >xx_oneFish |xxx
                underscore = line.index('_')
                space = line.index(' ')
                key = line[underscore+1:space]
            else:
                d[key] = line
    for i, key in enumerate(differentNames):
        data[i].append(d.get(key, '-' * 4))
for array in data:
    final.append(''.join(array))
print(final)
Prints:
['AAAAAAAaaaa----', 'CCCCCCcccc----', 'TTTTTT----tt', 'GGGGGG----gg']
Let's say I have a ton of HTML with no newlines. I want to get each element into a list.
input = "<head><title>Example Title</title></head>"
a_list = ["<head>", "<title>Example Title</title>", "</head>"]
Something like that: splitting between each > and <.
But in Python, I don't know of a way to do that. I can only split on the string '><', which removes it from the output. I want to keep those characters and split between the two angle brackets.
How can this be done?
Edit: Preferably, this would be done without adding the characters back in to the ends of each list item.
# initial input
a = "<head><title>Example Title</title></head>"
# split list
b = a.split('><')
# remove extra character from first and last elements
# because the split only removes >< pairs.
b[0] = b[0][1:]
b[-1] = b[-1][:-1]
# initialize new list
a_list = []
# fill new list with formatted elements
for i in range(len(b)):
    a_list.append('<{}>'.format(b[i]))
This produces the desired list in Python 2.7.2, and it should work in Python 3 as well.
You can try this:
import re
a = "<head><title>Example Title</title></head>"
data = re.split("><", a)
new_data = [data[0]+">"]+["<" + i+">" for i in data[1:-1]] + ["<"+data[-1]]
Output:
['<head>', '<title>Example Title</title>', '</head>']
The shortest approach uses the re.findall() function, shown here on an extended example:
import re

# extended html string
s = "<head><title>Example Title</title></head><body>hello, <b>Python</b></body>"
result = re.findall(r'(<[^>]+>[^<>]+</[^>]+>|<[^>]+>)', s)
print(result)
The output (note that text which is not the sole content of an element, such as "hello, ", is not captured):
['<head>', '<title>Example Title</title>', '</head>', '<body>', '<b>Python</b>', '</body>']
Based on the answers by other people, I made this.
It isn't as clean as I had wanted, but it seems to work. I had originally wanted to not re-add the characters after split.
Here, I got rid of one extra argument by combining the two characters into a string. Anyways,
def split_between(string, chars):
    # Compare lengths with !=; "is not" tests identity, not equality.
    if len(chars) != 2:
        raise IndexError("Argument chars must contain two characters.")
    result_list = [chars[1] + line + chars[0] for line in string.split(chars)]
    result_list[0] = result_list[0][1:]
    result_list[-1] = result_list[-1][:-1]
    return result_list
Credit goes to @cforeman and @Ajax1234.
Or even simpler, this:
input = "<head><title>Example Title</title></head>"
print(['<'+elem if elem[0]!='<' else elem for elem in [elem+'>' if elem[-1]!='>' else elem for elem in input.split('><') ]])
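If the goal is to split between > and < without re-adding any characters afterwards, two further options may be worth a look. This is a sketch, not a general HTML parser. The first uses a zero-width regex split, which relies on re.split accepting empty matches (Python 3.7 or newer). The second avoids regex by inserting a newline as a temporary separator, which assumes, as the question states, that the input contains no newlines.

```python
import re

html = "<head><title>Example Title</title></head>"

# Zero-width split point: preceded by '>' and followed by '<'.
# Nothing is consumed, so no characters need to be added back.
# Splitting on empty matches needs Python 3.7 or newer.
parts_regex = re.split(r"(?<=>)(?=<)", html)

# Regex-free variant: mark each boundary with a newline, then split on it.
# Assumes the input itself contains no newline characters.
parts_plain = html.replace("><", ">\n<").split("\n")

print(parts_regex)
print(parts_plain)
```

Both variants only look at adjacent >< pairs, so like the other answers they keep an element such as the title together with its text.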
I am trying to match rows from two lists on their first column (col[0]) and write selected columns from the matching rows to a new file:
with open('~/gf_out.txt', 'w') as w:
    w.write('\t'.join(headers) + '\n')
    for i in d1:  # list1
        for j in d2:  # list2
            if i[0] == j[0]:
                out = ((j[0:10]),i[1],i[2],j[11],j[12])
                # print out
                w.write('\t'.join(out) + '\n')
TypeError: sequence item 0: expected string, list found
If out is changed to
out = (str(j[0:10]),i[1],i[2],j[11],j[12])
the final output has [ ] around the first 10 columns. How can this be fixed?
ANALYSIS
Your problem is right where the error message (certainly) told you, and it is just what it described ... once you're comfortable enough with Python to interpret the description.
out = ((j[0:10]),i[1],i[2],j[11],j[12])
w.write('\t'.join(out) + '\n')
join operates on a sequence of strings. You gave it a sequence, but its first element is the list j[0:10] (the extra parentheses around it do not change that).
REMEDY
You have nested lists, so you need nested joins.
sep = '\t' # separator
out_0 = sep.join(j[0:10])
out_line = sep.join([out_0, i[1], i[2], j[11], j[12]])
w.write(out_line + '\n')
Yes, you can recombine this to a single-line write; I broke it down to make the logic clear.
If this doesn't match your needs, then please provide the required MCVE to clarify the problems.
What exactly do you want it to do? j[0:10] is a list, so if you convert it to a string with str(), it will have square brackets. If you want those elements to be joined by tabs as well, you need to either do that explicitly or concatenate it with the other list instead of embedding it.
out = ('\t'.join(j[0:10]),i[1],i[2],j[11],j[12])
or
out = j[0:10] + [i[1],i[2],j[11],j[12]]
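To see the flattening approach end to end, here is a small self-contained sketch. The rows in d1 and d2 are invented sample data (13 columns in d2 so that j[11] and j[12] exist), and the lines are collected in a list rather than written to a file so the result is easy to inspect.

```python
# Invented sample rows; in the question these come from two input files.
d1 = [['g1', 'x1', 'y1'], ['g2', 'x2', 'y2']]
d2 = [['g1'] + ['c{}'.format(n) for n in range(1, 13)],
      ['g3'] + ['k{}'.format(n) for n in range(1, 13)]]

lines = []
for i in d1:
    for j in d2:
        if i[0] == j[0]:
            # List concatenation keeps everything flat: a list of strings,
            # which is exactly what '\t'.join() expects.
            out = j[0:10] + [i[1], i[2], j[11], j[12]]
            lines.append('\t'.join(out))

for line in lines:
    print(line)
```

Only the row with key g1 appears in both lists, so a single tab-separated line with 14 fields is produced.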
I'd like to make the 'pyparsing' parsing result come out as a dictionary without needing to post-process it. For this, I need to define my own key strings. The following is the best I could come up with that produces the desired results.
Line to parse:
%ADD22C,0.35X*%
Code:
import pyparsing as pyp
floatnum = pyp.Regex(r'([\d\.]+)')
comma = pyp.Literal(',').suppress()
cmd_app_def = pyp.Literal('AD').setParseAction(pyp.replaceWith('aperture-definition'))
cmd_app_def_opt_circ = pyp.Group(pyp.Literal('C') +
comma).setParseAction(pyp.replaceWith('circle'))
circular_apperture = pyp.Group(cmd_app_def_opt_circ +
pyp.Group(pyp.Empty().setParseAction(pyp.replaceWith('diameter')) + floatnum) +
pyp.Literal('X').suppress())
<the grammar for the entire line>
The result is:
['aperture-definition', '20', ['circle', ['diameter', '0.35']]]
What I consider a hack here is
pyp.Empty().setParseAction(pyp.replaceWith('diameter'))
which always matches and is empty, but then I assign my desired key name to it.
Is this the best way to do this? Am I abusing pyparsing to do something it's not meant to do?
If you want to name your floatnum as "diameter", you can use named results:
cmd_app_def_opt_circ = pyp.Group(pyp.Literal('C') +
comma)("circle")
circular_apperture = pyp.Group(cmd_app_def_opt_circ +
pyp.Group(floatnum)("diameter") +
pyp.Literal('X').suppress())
In this way, every time the parser encounters floatnum in the circular_apperture context, the result is named diameter. Also, as shown above, you can name circle in the same fashion. Does this work for you?
See comments in the posted code.
import pyparsing as pyp
comma = pyp.Literal(',').suppress()
# use parse actions to do type conversion at parse time, so that results fields
# can immediately be used as ints or floats, without additional int() or float()
# calls
floatnum = pyp.Regex(r'([\d\.]+)').setParseAction(lambda t: float(t[0]))
integer = pyp.Word(pyp.nums).setParseAction(lambda t: int(t[0]))
# define the command keyword - I assume there will be other commands too, they
# should follow this general pattern (define the command keyword, then all the
# options, then define the overall command)
aperture_defn_command_keyword = pyp.Literal('AD')
# define a results name for the matched integer - I don't know what this
# option is, wasn't in your original post
d_option = 'D' + integer.setResultsName('D')
# shortcut for defining a results name is to use the expression as a
# callable, and pass the results name as the argument (I find this much
# cleaner and keeps the grammar definition from getting messy with lots
# of calls to setResultsName)
circular_aperture_defn = 'C' + comma + floatnum('diameter') + 'X'
# define the overall command
aperture_defn_command = aperture_defn_command_keyword("command") + d_option + pyp.Optional(circular_aperture_defn)
# use searchString to skip over '%'s and '*'s, gives us a ParseResults object
test = "%ADD22C,0.35X*%"
appData = aperture_defn_command.searchString(test)[0]
# ParseResults can be accessed directly just like a dict
print(appData['command'])
print(appData['D'])
print(appData['diameter'])
# or if you prefer attribute-style access to results names
print(appData.command)
print(appData.D)
print(appData.diameter)
# convert ParseResults to an actual Python dict, removes all unnamed tokens
print(appData.asDict())
# dump() prints out the parsed tokens as a list, then all named results
print(appData.dump())
Prints:
AD
22
0.35
AD
22
0.35
{'diameter': 0.34999999999999998, 'command': 'AD', 'D': 22}
['AD', 'D', 22, 'C', 0.34999999999999998, 'X']
- D: 22
- command: AD
- diameter: 0.35
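For comparison only, the same idea of naming pieces of the match so that a dictionary falls out directly can be sketched with named groups in the standard re module. This is not pyparsing and not a replacement for the grammar above; the pattern, the function name parse_aperture, and the assumption that the line always has the exact shape %ADD<int>C,<float>X*% are all illustrative.

```python
import re

# Named groups play the role of pyparsing results names.
# The pattern assumes the exact shape %ADD<int>C,<float>X*% (an assumption).
LINE_PATTERN = re.compile(
    r"%(?P<command>AD)"
    r"D(?P<D>\d+)"
    r"C,(?P<diameter>[\d.]+)X"
    r"\*%"
)

def parse_aperture(line):
    """Hypothetical helper: return a dict of named fields, or None."""
    match = LINE_PATTERN.fullmatch(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert types after matching; pyparsing does this with parse actions.
    fields["D"] = int(fields["D"])
    fields["diameter"] = float(fields["diameter"])
    return fields

parsed = parse_aperture("%ADD22C,0.35X*%")
print(parsed)
```

A single regular expression is fine for one fixed line shape, but a grammar with many commands and optional parts is where pyparsing's composable expressions and results names become the better fit.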