changing file name and extension using loop - python

I need to change the name and extension of a series of files. The names are currently 'tmax.##.txt', but I need it to be 'tmax_##.txt'. Then, I want to change the .txt extension to .asc. I've tried the below code and the first loop works as expected to produce 'tmax_01'. The second loop runs, but produces unexpected results, 't'.
list_raw = 'tmax.01.txt', 'tmax.02.txt', 'tmax.03.txt'
for i in list_raw:
    list_conv = i.replace('.','_')
for i in list_conv:
    list_final = i.replace('_txt','.asc')
Any suggestions?

You are just assigning new values to a variable in each iteration of the loop. What you want to do is create a new list from the modified elements of an existing list, which is best done with a list comprehension:
list_raw = ['tmax.01.txt', 'tmax.02.txt', 'tmax.03.txt']
list_final = [i.replace(".", "_").replace("_txt", ".asc") for i in list_raw]
Note that you can do this, as in my example, in one step - there is no reason to iterate over the list twice, and produce an intermediate list, which is inefficient.
You could also do i.replace(".", "_", 1) to only replace the first ., and avoid having to do the awkward hack with the file extension. However, I would personally use i[:-4].replace(".", "_") + ".asc" - that is, cut off the existing extension with a slice, replace the .s, and then add the new extension.
If the extensions are likely to vary in length, you may want to look into the os.path module, as suggested by sotapme.
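The variants mentioned in this answer can be compared side by side. This is only a minimal sketch: the names by_count, by_slice and by_splitext are made up for illustration, and the inputs are the three file names from the question.

```python
import os.path

list_raw = ['tmax.01.txt', 'tmax.02.txt', 'tmax.03.txt']

# Variant 1: replace only the first '.', then swap the extension.
by_count = [name.replace('.', '_', 1).replace('.txt', '.asc') for name in list_raw]

# Variant 2: slice off the 4-character '.txt' suffix, fix the stem, add '.asc'.
by_slice = [name[:-4].replace('.', '_') + '.asc' for name in list_raw]

# Variant 3: let os.path.splitext locate the extension, whatever its length.
by_splitext = [os.path.splitext(name)[0].replace('.', '_') + '.asc' for name in list_raw]

print(by_count)
print(by_slice)
print(by_splitext)
```

All three give the same result for these inputs. Only the splitext variant keeps working if the extension is not exactly four characters long.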

Because you're talking of files it may be worth using os.path as it's likely that the next part of your code will be to manipulate these or other files. (just guessing)
os.path.splitext('afile.txt')[0] + '.asc'
Gives
'afile.asc'

In the first loop:
for i in list_raw:
    list_conv = i.replace('.','_')
list_conv is rebound on every iteration, so after the loop it holds a single str: the last element of the list with the replacement applied.
Then in your second loop:
for i in list_conv:
    list_final = i.replace('_txt','.asc')
you are iterating over that string, which gives you one character at a time, so list_final ends up holding the last character with the replacement applied.
Since the last character in tmax_03_txt is t, that is why you got t.
If you want to do the replacement on each element of the list, you can use a list comprehension and chain the method calls:
>>> list_raw = ['tmax.01.txt', 'tmax.02.txt', 'tmax.03.txt']
>>> [elem.replace('.', '_').replace('_txt', '.asc') for elem in list_raw]
['tmax_01.asc', 'tmax_02.asc', 'tmax_03.asc']
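To see where the stray t comes from, the two loops from the question can be traced directly. This sketch reuses the question's variable names and adds only the print calls.

```python
list_raw = ('tmax.01.txt', 'tmax.02.txt', 'tmax.03.txt')

# First loop: list_conv is rebound each time, so only the last result survives.
for i in list_raw:
    list_conv = i.replace('.', '_')
print(repr(list_conv))   # a single string, not a list

# Second loop: iterating over a string yields one character at a time.
for i in list_conv:
    list_final = i.replace('_txt', '.asc')
print(repr(list_final))  # the last character of list_conv
```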

Alternately you could use the string method rsplit.
list_raw = ['tmax.01.txt', 'tmax.02.txt', 'tmax.03.txt']
list_final = [filename.rsplit('.',1)[0] + '.ext' for filename in list_raw]
Where ext is the new extension. The 1 in rsplit() indicates that only the rightmost '.' will act as split point.
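Note that rsplit on its own leaves the first dot in place. Combining it with the separator change asked for in the question might look like the sketch below, where new_ext and stem are illustrative names.

```python
list_raw = ['tmax.01.txt', 'tmax.02.txt', 'tmax.03.txt']
new_ext = '.asc'  # illustrative name for the replacement extension

list_final = []
for filename in list_raw:
    # Split once from the right: 'tmax.01.txt' -> ['tmax.01', 'txt']
    stem = filename.rsplit('.', 1)[0]
    # Replace the remaining dots in the stem and attach the new extension.
    list_final.append(stem.replace('.', '_') + new_ext)

print(list_final)
```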

Related

Editing String Objects in a List in Python

I have read in data from a basic txt file. The data is time and date in the form "DD/HHMM" (meteorological date and time data). I have read this data into a list: time[]. It prints out as you would imagine, like so: ['15/1056', '15/0956', '15/0856', .........]. Is there a way to alter the list so that it ends up holding just the time, with the date and the forward slash removed, like so: ['1056', '0956', '0856',.........]? I have already tried list.split, but I don't think that's how it works. Thanks.
I'm still learning myself and I haven't touched Python in some time, but here is my solution if you need one:
myList = ['15/1056', '15/0956', '15/0856']
newList = []
for x in myList:
    # x.split("/") returns e.g. ["15", "1056"]; keep whatever is at index 1
    newList.append(x.split("/")[1])
print(newList) # for verification
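The same loop can also be written as a list comprehension. A minimal sketch, assuming every entry contains a slash (an entry without one would raise IndexError); times and times_only are illustrative names.

```python
times = ['15/1056', '15/0956', '15/0856']

# Split each "DD/HHMM" entry once at the slash and keep the time part.
times_only = [entry.split('/', 1)[1] for entry in times]

print(times_only)
```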

Get selected node names into a list or tuple in Nuke with Python

I am trying to obtain a list of the names of selected nodes with Python in Nuke.
I have tried:
for s in nuke.selectedNodes():
    n = s['name'].value()
    print n
This gives me the names of the selected nodes, but as separate strings.
There is nothing I can do to them that will combine each string. If I
have three Merges selected, in the Nuke script editor I get:
Result: Merge3
Merge2
Merge1
If I wrap the last variable n in brackets, I get:
Result: ['Merge3']
['Merge2']
['Merge1']
That's how I know they are separate strings. I found one other way to
return selected nodes. I used:
s = nuke.tcl("selected_nodes")
print s
I get odd names back like node3a7c000, but these names work in anything
that calls a node, like nuke.toNode() and they are all on one line. I
tried to force these results into a list or a tuple, like so:
s = nuke.tcl("selected_nodes")
print s
Result: node3a7c000 node3a7c400 node3a7c800
s = nuke.tcl("selected_nodes")
s2 = s.replace(" ","', '")
s3 = "(" + "'" + s2 + "'" + ")"
print s3
Result: ('node3a7c000', 'node3a7c400', 'node3a7c800')
My result looks to have the standard construct of a tuple, but if I try
to call the first value from the tuple, I get a parentheses back. This
is as if my created tuple is still a string.
Is there anything I can do to gather a list or tuple of selected nodes
names? I'm not sure what I am doing wrong and it seems that my last
solution should have worked.
As you iterate over each node, you'll want to add its name to a list ([]), and then return that. For instance:
names = []
for s in nuke.selectedNodes():
    n = s['name'].value()
    names.append(n)
print names
This will give you:
# Result: ['Merge3', 'Merge2', 'Merge1']
If you're familiar with list comprehensions, you can also use one to make names in one line:
names = [s['name'].value() for s in nuke.selectedNodes()]
nodename = list()
for node in nuke.selectedNodes():
    nodename.append(node.name())
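As for the Tcl attempt in the question: gluing quotes and parentheses onto a string only produces a longer string that looks like a tuple, which is why indexing it returns a parenthesis. Splitting on whitespace gives a real list. The sketch below uses a hard-coded string as a stand-in for the value returned by nuke.tcl("selected_nodes"), so it runs outside Nuke; node_ids and node_tuple are illustrative names.

```python
# Stand-in for the string returned by nuke.tcl("selected_nodes") in the question.
s = "node3a7c000 node3a7c400 node3a7c800"

# The question's approach: the result is still one string, so s3[0] is a character.
s3 = "(" + "'" + s.replace(" ", "', '") + "'" + ")"
print(type(s3).__name__, s3[0])

# str.split() with no arguments splits on whitespace and returns a list.
node_ids = s.split()
node_tuple = tuple(node_ids)
print(node_ids[0])
print(node_tuple)
```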

Creating a list of dictionaries

I have code that generates a list of 28 dictionaries. It cycles thru 28 files and links data points from each file in the appropriate dictionary. In order to make my code more flexible I wanted to use:
tegDics = [dict() for x in range(len(files))]
But when I run the code the first 27 dictionaries are blank and only the last, tegDics[27], has data. Below is the code including the clumsy, yet functional, code I'm having to use that generates the dictionaries:
x=0
import os
files=os.listdir("DirPath")
os.chdir("DirPath")
tegDics = [{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}] # THIS WORKS!!!
#tegDics = [dict() for x in range(len(files))] - THIS WON'T WORK!!!
allRads=[]
while x<len(tegDics): # now builds dictionaries
    for line in open(files[x]):
        z=line.split('\t')
        allRads.append(z[2])
        tegDics[x][z[2]]=z[4] # pairs catNo with locNo
    x+=1
Does anybody know why the more elegant code doesn't work?
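Since you're using x within the list comprehension, it will no longer be zero by the time you reach the while loop. In Python 2 the loop variable of a list comprehension leaks into the enclosing scope, so x is left at len(files)-1, and the while loop only ever fills the last dictionary. (Python 3 gives comprehensions their own scope, so x would not be overwritten there.) I suggest changing the variable you use to something else. It's traditional to use a single underscore for a value you don't care about.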
tegDics = [dict() for _ in range(len(files))]
It could be useful to eliminate your use of x entirely. It's customary in Python to iterate directly over the objects in a sequence, rather than using a counter variable. You might do something like:
for tegDic in tegDics:
    pass # do stuff with tegDic here
It's slightly trickier in your case, though, since you want to iterate through tegDics and files at the same time. You can use zip to do that.
import os
files=os.listdir("DirPath")
os.chdir("DirPath")
tegDics = [dict() for _ in range(len(files))]
allRads=[]
for file, tegDic in zip(files,tegDics):
    for line in open(file):
        z=line.split('\t')
        allRads.append(z[2])
        tegDic[z[2]]=z[4] # pairs catNo with locNo
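A shorter spelling is sometimes suggested:
tegDics = [{}]*len(files)
but be careful: this repeats a reference to one single dictionary len(files) times, so every file would write into the same dict. Multiplying a list is only safe when the repeated element is immutable; for separate dictionaries, keep the list comprehension.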
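One caveat about multiplying a list that holds a dictionary is worth checking before relying on it. The sketch below compares it with the list comprehension; shared, separate and the 'catNo'/'locNo' strings are illustrative stand-ins, not the asker's data.

```python
n = 3

shared = [{}] * n                      # n references to ONE dictionary
separate = [dict() for _ in range(n)]  # n distinct dictionaries

# Write one entry through index 0 of each list.
shared[0]['catNo'] = 'locNo'
separate[0]['catNo'] = 'locNo'

print(shared)     # the entry shows up at every index
print(separate)   # the entry shows up only at index 0
print(shared[0] is shared[1], separate[0] is separate[1])
```

With the multiplied list, data from every file would land in the same dictionary, so the list comprehension is the form to use here.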

use slice in for loop to build a list

I would like to build up a list using a for loop and am trying to use a slice notation. My desired output would be a list with the structure:
known_result[i] = (record.query_id, (align.title, align.title,align.title....))
However I am having trouble getting the slice operator to work:
knowns = "output.xml"
i=0
for record in NCBIXML.parse(open(knowns)):
    known_results[i] = record.query_id
    known_results[i][1] = (align.title for align in record.alignment)
    i+=1
which results in:
list assignment index out of range.
I am iterating through a series of sequences using BioPython's NCBIXML module but the problem is adding to the list. Does anyone have an idea on how to build up the desired list either by changing the use of the slice or through another method?
thanks zach cp
(crossposted at Biostar)
You cannot assign a value to a list at an index that doesn't exist. The way to add an element (at the end of the list, which is the common use case) is to use the .append method of the list.
In your case, the lines
known_results[i] = record.query_id
known_results[i][1] = (align.title for align in record.alignment)
Should probably be changed to
element=(record.query_id, tuple(align.title for align in record.alignment))
known_results.append(element)
Warning: The code above is untested, so might contain bugs. But the idea behind it should work.
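Since that snippet is untested, here is a self-contained sketch of the same append pattern. It does not use Biopython: Record and Align are hypothetical namedtuple stand-ins for the parsed objects, with attribute names copied from the question, and the sample values are invented.

```python
from collections import namedtuple

# Hypothetical stand-ins for the objects yielded by NCBIXML.parse().
Align = namedtuple('Align', ['title'])
Record = namedtuple('Record', ['query_id', 'alignment'])

records = [
    Record('q1', [Align('hit A'), Align('hit B')]),
    Record('q2', [Align('hit C')]),
]

known_results = []
for record in records:
    # Materialise the titles as a tuple; a bare generator would be stored unevaluated.
    titles = tuple(align.title for align in record.alignment)
    # append grows the list, so no index has to exist beforehand.
    known_results.append((record.query_id, titles))

print(known_results)
```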
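Use:
known_results = []
for record in NCBIXML.parse(open(knowns)):
    # append a mutable [query_id, titles] pair; indexing an empty list or assigning into a tuple would raise
    known_results.append([record.query_id, None])
    known_results[-1][1] = tuple(align.title for align in record.alignment)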
If I understand you correctly, you want to associate each record.query_id with one or more matching align.title values. I assume your query_ids are unique and each one relates to some titles. If so, I would suggest a dictionary instead of a list.
A dictionary maps a key (e.g. record.query_id) to a value (e.g. a list of align.title).
catalog = {}
for record in NCBIXML.parse(open(knowns)):
    catalog[record.query_id] = [align.title for align in record.alignment]
To access this catalog you could either iterate through it:
for query_id in catalog:
    print(catalog[query_id]) # prints the title list for the current key
or you could access an entry directly if you know what you're looking for:
query_id = 'XYZ_Whatever' # placeholder for a real query ID
print(catalog[query_id])

Python: problem with list append

Here is my code -
cumulative_nodes_found_list = []
cumulative_nodes_found_total_list = []
no_of_runs = 10
count = 0
while count < no_of_runs:
    #My program code
    print('cumulative_nodes_found_list - ' + str(cumulative_nodes_found_list))
    cumulative_nodes_found_total_list.insert(count,cumulative_nodes_found_list)
    print('cumulative_nodes_found_total_list - ' + str(cumulative_nodes_found_total_list))
    count = count + 1
Here is a part of the output -
#count = 0
cumulative_nodes_found_list - [0.0, 0.4693999, 0.6482, 0.6927999999, 0.7208999999, 0.7561999999, 0.783399999, 0.813999999, 0.8300999999, 0.8498, 0.8621999999]
cumulative_nodes_found_total_list - [[0.0, 0.4693999, 0.6482, 0.6927999999, 0.7208999999, 0.7561999999, 0.783399999, 0.813999999, 0.8300999999, 0.8498, 0.8621999999]]
#count = 1
cumulative_nodes_found_list - [0.0, 0.55979999999999996, 0.66220000000000001, 0.69479999999999997, 0.72040000000000004, 0.75380000000000003, 0.77629999999999999, 0.79679999999999995, 0.82979999999999998, 0.84850000000000003, 0.85760000000000003]
cumulative_nodes_found_total_list -[[0.0, 0.55979999999999996, 0.66220000000000001, 0.69479999999999997, 0.72040000000000004, 0.75380000000000003, 0.77629999999999999, 0.79679999999999995, 0.82979999999999998, 0.84850000000000003, 0.85760000000000003],
[0.0, 0.55979999999999996, 0.66220000000000001, 0.69479999999999997, 0.72040000000000004, 0.75380000000000003, 0.77629999999999999, 0.79679999999999995, 0.82979999999999998, 0.84850000000000003, 0.85760000000000003]]
As each new item is appended, the old items are replaced by the new item. This trend continues.
Can anyone tell me why this is happening? I have tried using 'append' in place of insert but got the same output. However, when I use 'extend' I get the correct values, but I need the inner items to be lists, which I don't get with extend.
You need to rebind cumulative_nodes_found_list at the beginning of the loop, instead of just clearing it.
This is psychic debugging at its best, since you're effectively asking "what is wrong with my code, which I'm not going to show to you".
All I can do is assume.
I'm assuming you're re-using the array objects in memory.
In other words, you do something like this:
list1.insert(0, list2)
list2.clear()
list2.append(10)
list2.append(15)
list1.insert(0, list2)
Since list2 is the same list object the whole time, and you're adding a reference to that object rather than a copy of it, later changes make it appear that the entry you inserted earlier changed too.
In other words, the result of the code above is going to be:
[[10, 15], [10, 15]]
regardless of what was in the list before you added it the first time.
Try assigning the changing list a new, empty, object each time you enter the loop body and see if that fixes anything.
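The guess above can be checked with a short, self-contained sketch. The starting values [1, 2] and the names total and current are invented for illustration, and del list2[:] is used as an in-place clear that also works on older Python versions.

```python
list1 = []
list2 = [1, 2]

list1.insert(0, list2)
del list2[:]            # empties the SAME list object in place
list2.append(10)
list2.append(15)
list1.insert(0, list2)
print(list1)                 # both entries show the new values
print(list1[0] is list1[1])  # they are one and the same object

# Rebinding to a new list before refilling keeps the earlier entry intact.
total = []
current = [1, 2]
total.append(current)
current = []            # a NEW object; total[0] is untouched
current.extend([10, 15])
total.append(current)
print(total)
```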
You are adding a reference to cumulative_nodes_found_list to the cumulative_nodes_found_total_list, but it's the same reference each time. Move this line into the loop body:
cumulative_nodes_found_list = []
Lists are mutable objects. You're mutating cumulative_nodes_found_list inside your code, so the object added to your total list in the previous run is also mutated, because they are the same object.
Either make a copy to insert in the total:
for count in range(no_of_runs):
    # ...
    cumulative_nodes_found_total_list.append(list(cumulative_nodes_found_list))
... or reset the list on each iteration:
for count in range(no_of_runs):
    cumulative_nodes_found_list = [] # creates a NEW list for this iteration
    # ...
    cumulative_nodes_found_total_list.append(cumulative_nodes_found_list)
I believe the problem is in the rest of your program code.
The items in cumulative_nodes_found_list are being replaced in place each time through the loop.
I assume you're doing something like this:
while count < no_of_runs:
    cumulative_nodes_found_list.clear()
    #fill up the list with values using whatever program logic you have
    cumulative_nodes_found_list.append(1.1)
    cumulative_nodes_found_list.append(2.1)
    print('cumulative_nodes_found_list - ' + str(cumulative_nodes_found_list))
    cumulative_nodes_found_total_list.insert(count,cumulative_nodes_found_list)
    print('cumulative_nodes_found_total_list - ' + str(cumulative_nodes_found_total_list))
    count = count + 1
If this is, in fact, what you're doing, then instead of using clear() to empty the list, create a new one,
i.e. replace cumulative_nodes_found_list.clear() with
cumulative_nodes_found_list = []
My guess is that you are not assigning cumulative_nodes_found_list to a new list each time, but updating its contents instead. So each time around the loop you are adding the same list reference to the total list. Since the reference within the total list is the same object, when you update this list the next time around the loop, it also changes what you hoped were the previous iteration's values.
If you want to append to a list, use mylist.append(item) instead.
Also, if you iterate a fixed number of times it's better to use a for loop:
for i in range(no_of_runs):
    pass # do stuff
The idea is that range(no_of_runs) generates the values 0, 1, 2, ..., 9 for no_of_runs = 10 (the stop value itself is excluded), and the loop then iterates over them.
Edit: this doesn't solve the problem. Other answers in this thread do, however. It's just a comment on style.
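A quick check of what range produces; note that the stop value is excluded, so the last value for no_of_runs = 10 is 9. The name indices is illustrative.

```python
no_of_runs = 10

# list() makes the values visible; in Python 3, range is a lazy sequence.
indices = list(range(no_of_runs))
print(indices)
print(len(indices), indices[-1])
```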
This method worked for me. Just like you, I was trying to append/insert a list into another list.
cumulative_nodes_found_total_list.insert(count,cumulative_nodes_found_list)
But the old values were being overwritten by the new values. So instead I tried this -
cumulative_nodes_found_total_list.insert(count,cumulative_nodes_found_list[:])
"Assignment statements in Python do not copy objects, they create
bindings between a target and an object."
Use copy.deepcopy (or copy.copy for a shallow copy) from the copy module.
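The difference between plain assignment, a shallow copy and a deep copy can be seen in a small sketch. The nested list run and its values are invented for illustration.

```python
import copy

run = [[0.0, 0.5], [0.7]]

alias = run                   # same object, no copy made
shallow = copy.copy(run)      # new outer list, inner lists still shared
deep = copy.deepcopy(run)     # new outer list and new inner lists

run.append([0.9])             # change to the outer list
run[0][0] = 99.0              # change to an inner list

print(alias)     # sees both changes
print(shallow)   # sees only the inner-list change
print(deep)      # sees neither change
```

For a flat list of floats like the one in the question, a shallow copy (list(x), x[:] or copy.copy(x)) is enough. deepcopy only matters when the elements are themselves mutable.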
