Python: How to find duplicate folder names and rename them?

I'm running into some difficulties with Python.
I have code that I'm using in conjunction with ArcGIS. It parses filenames, looks them up in a database to return the corresponding unique ID, and renames the folder with this unique ID.
It has been working great so far, but I need to handle some exceptions, like when the unique ID already exists within the directory, and when the action has already been completed on the directory. The unique ID consists only of digits, so I've been trying:
elif re.findall('[0-9]', fn):
    Roll = string.join(string, "1")
    print (Roll)
    os.rename(os.path.join(basedir, fn),
              os.path.join(basedir, Roll))
which returns all folders with a unique ID. I just can't figure out how to get a count of the number of times a specific folder name occurs in the directory.

I suspect you're making this way harder on yourself than you need to, but answering your immediate question:
import os

folder_name_to_create = 'whatever'
if os.path.exists(folder_name_to_create):
    folder_name_to_create += '_1'
If you are getting name collisions, I suspect you need to look at your "unique" naming algorithm, but maybe I'm misunderstanding what you mean by that.
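A minimal sketch of how the suffix idea above could be extended so that it keeps working when the "_1" name is taken as well. The function name unique_folder_name and the underscore-plus-counter naming scheme are assumptions for illustration, not something from the question.

```python
import os


def unique_folder_name(basedir, name):
    """Return name, or name plus a numeric suffix, that is unused in basedir."""
    candidate = name
    suffix = 1
    # Keep trying name_1, name_2, ... until nothing with that name exists.
    while os.path.exists(os.path.join(basedir, candidate)):
        candidate = "{}_{}".format(name, suffix)
        suffix += 1
    return candidate
```

The result can then be passed to os.rename as the target name, for example os.rename(os.path.join(basedir, fn), os.path.join(basedir, unique_folder_name(basedir, new_id))), where new_id stands for the ID returned by the database.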

Before you add each name to a set, check whether it's already in the set; if it is, you have a duplicate.

One way to do it might be the following: Create a dictionary whose keys are your folder names, and the value associated with each key is an integer, the number of occurrences of that name. Each time you process a folder, update the dictionary's keys/values appropriately. After you've added all the folder names to the dictionary, check all the count values, and any time the count is > 1 you know you have a duplicate.
Or, if you need to detect duplicates as you go, just check whether the key already exists. In that case you don't really need the value at all, and you can use a set or list instead of a dict.
You could use collections.Counter to help you in this. You can see an example usage in this question. It shouldn't be too difficult to adapt that example to your needs.
Hope this helps.
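As a rough sketch of the collections.Counter suggestion, assuming the unique IDs have already been looked up and collected into a list (the ids list and the find_duplicates helper are made up for illustration):

```python
from collections import Counter


def find_duplicates(names):
    """Return a dict mapping each name that occurs more than once to its count."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical IDs returned by the database for five folders.
ids = ["1001", "1002", "1001", "1003", "1001"]
duplicates = find_duplicates(ids)
print(duplicates)
```

Counter does the same bookkeeping as the hand-written dictionary of counts described above, so only the filtering step is left to write.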

Related

How to name subsets of a dataframe inside a loop

I'm having trouble naming the subsets I create inside a loop. I want to give each one the first five letters of the condition (or even just the iteration number) as a name, but I haven't figured out how to.
Here's my code:
list_mun = list(ensud21.NOM_MUN.unique())
for mun in list_mun:
    name = ensud21[ensud21['NOM_MUN'] == mun]
list_mun is a list with the unique values that a column of my dataframe can take. Inside the for loop I wrote name where I want the name I described above. I am unable to give each dataframe a different name. Thank you!
You shouldn't try to set variable names dynamically. Use a container, a dictionary is perfect here:
list_mun = list(ensud21.NOM_MUN.unique())
out_dic = {}
for mun in list_mun:
    # here we set "mun" as key
    out_dic[mun] = ensud21[ensud21['NOM_MUN'] == mun]
Then access a subset with:
out_dic[the_mun_you_want]

Python: can a variable take two different strings?

Am I being dumb here?
I have a list called cons=['gps-ops', 'beidou'], and in my code there are different names for the same thing, i.e. 'gps-ops' = 'GPS' and 'beidou' = 'BDS'.
Some parts of the code (class) take 'gps-ops' and some parts take 'GPS'. At the moment I have been using if and elif statements in different sections of the code.
Is there a way to say 'gps-ops' is also 'GPS' and vice versa, depending on how the user inputs the string throughout the code?
You could put every word for each valid name in a set like {'gps-ops', 'GPS'} and then go through your list in a for loop and check if that word is in the set:
cons = ['gps-ops', 'beidou']
gps_set = {'gps-ops', 'GPS'}
for name in cons:
    if name in gps_set:
        # do something if true
        pass
Read the Python docs on sets; they're quite an amazing data structure.
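A related sketch, in case the goal is to accept either spelling everywhere: map every accepted name to one canonical name with a dictionary and normalize the input once. This is a different technique from the set check above, and the ALIASES table and canonical function are assumptions for illustration.

```python
# Map every accepted spelling to one canonical name.
ALIASES = {
    "gps-ops": "GPS",
    "GPS": "GPS",
    "beidou": "BDS",
    "BDS": "BDS",
}


def canonical(name):
    """Return the canonical constellation name, or raise ValueError."""
    try:
        return ALIASES[name]
    except KeyError:
        raise ValueError("unknown constellation: {!r}".format(name))


cons = ["gps-ops", "beidou"]
normalized = [canonical(name) for name in cons]
print(normalized)
```

After normalizing at the point where the user input enters the class, the rest of the code only ever has to deal with one name per constellation, which should remove most of the if/elif branches.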

How to search thousands of objects in S3 with Boto3 knowing just a partial key name (not prefix)?

I have an S3 bucket with roughly 10,000 objects in it. All objects share a similar naming pattern, except for the very end of the name. For example, a file has this pattern:
"Order_RandomValuesHere_UniqueIDHere.txt"
So, all files start with the word Order, followed by random data, and END with an ID value which I want to search on. I am trying to create a Python Lambda script that will take an ID as input parameter, and then search in S3 for the specific object that has this ID value in the name of the object, and pull back data from that object.
objectlist = s3.list_objects_v2(Bucket="My-Bucket", Prefix="OrderNumber")
The above works great to grab everything, assuming I have less than 1000 results. Then, I iterate through the keys looking for my ID in the Key, since the prefix parameter doesn't narrow down my list:
for obj in objectlist['Contents']:
    if myId in obj['Key']:
        print("We have found our Order file! Its key is: " + obj['Key'])
However, the problem is that the list_objects_v2 method (and the original list_objects) only returns the first 1000 results. How can I have my Lambda function query against the entire 10,000+ results, looking for the one file I know is there (but whose full file name or unique prefix I don't know)?
Any suggestions are greatly appreciated! I know there is probably a different way to think about this problem but I'm running into mental roadblocks here...
Thanks!
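One possible approach, sketched under assumptions: boto3 clients provide a paginator for list_objects_v2 that follows the continuation tokens, so every page of up to 1000 keys can be scanned in turn. The helper name find_keys_containing and its parameters are made up for illustration. Note that S3 only filters by prefix on the server side, so this still lists every object under the prefix and does the substring match locally.

```python
def find_keys_containing(s3_client, bucket, prefix, needle):
    """Yield every key under prefix whose name contains needle."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page that has no objects.
        for obj in page.get("Contents", []):
            if needle in obj["Key"]:
                yield obj["Key"]


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   keys = list(find_keys_containing(s3, "My-Bucket", "Order", myId))
```

Because the function is a generator, a caller that only needs the first match can use next(find_keys_containing(...), None) and stop requesting further pages once it is found.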

Finding files with a name pattern

I need to find out whether a file whose name matches a specific pattern exists in the current directory. I used the following code for this purpose.
import glob

H1 = []
for record_name in my_list:
    file_name = 'RSN' + '_' + record_name[0:5] + '*' + record_name[-8:]
    # search for the pattern that was just built, not the raw record name
    H1 += glob.glob(file_name)
I used the above method because in some cases there are differences between the record_name I have and the real name of the file in the current directory. For example, the true name of one of my files is "RSN20148_BB40204628_KRPHHZ", while I have "20148_40204628_KRPHHZ" in my_list. Please note that the second one does not have the "RSN" and "BB" terms.
The above procedure works, but the problem is that it takes a lot of time. Is there any way to reduce the time?
Please note that I cannot use os.listdir() to get the names of all files, because the order of files in my_list is important to me.
If the record names are unique, you could create a dictionary with all the record names as keys, each set to False (use collections.OrderedDict on Python older than 3.7; from 3.7 on, plain dicts preserve insertion order).
Then use threading with os.path.exists(path) to set each key to True or False depending on whether that record exists. Dictionary lookups are O(1) on average, so combined with threading this might give you a performance boost.
A last note - This is all theoretical and you would have to implement/optimise yourself to see if it gives you a performance boost at all or adds unnecessary overhead.
Cheers!
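A different sketch from the threading idea above, offered only as something to benchmark: read the directory once and match each pattern against that in-memory list with fnmatch, iterating over my_list so its order is preserved. The function name find_records is hypothetical, and the pattern is built exactly as in the question.

```python
import fnmatch
import os


def find_records(my_list, directory="."):
    """Return matching file names, grouped in the order of my_list."""
    # Read the directory once instead of once per pattern.
    names = os.listdir(directory)
    matches = []
    for record_name in my_list:
        # Same pattern construction as in the question.
        pattern = 'RSN' + '_' + record_name[0:5] + '*' + record_name[-8:]
        matches += sorted(fnmatch.filter(names, pattern))
    return matches
```

Whether this is faster depends on how many files and records there are, so it would need to be timed against the glob version.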

Get key from a value(list) in a dic

Is there a way to retrieve a key from its value if the value is a list?
def add(self, name, number):
    if name in self.bok.values():
        print('The name already exists in the telefonbok.')
    else:
        self.bok.update({number: []})
        self.bok[number].append(name)
        print(self.bok)
This works if I only have one element in the list:
self.bok.keys()[self.bok.values().index(my value i want to get the corresponding key)]
But if I insert more elements, it gives me an error saying the value isn't in the list.
In case you're wondering, I'm creating a telephone book using a class and a dictionary, so I'm supposed to give an alias to the number and also be able to change the number for one name, and the alias should also get the new number. I would appreciate any help; sorry if I'm unclear.
If you find yourself wondering "how do I look up a key by its value?" it usually means that your dictionary is going the wrong way, or at least that you should be keeping two dictionaries. This is especially true if you notice yourself ensuring that the values are unique by hand!
At the moment, your conditional is never true (unless self.bok.values() is updated some other way), because name is (presumably) a string whereas self.bok.values() looks like it's a list of lists. If names should only appear once in the telephone book, that's a good hint that you should have a dictionary going the opposite direction.
Assuming you also need the number-to-name lookup, what I would do is add another dictionary to your class, and update them both whenever you add a new name/number pair.
import collections  # defaultdict is a very nice object

# elsewhere, presumably in the __init__ method
self.name_to_number = {}
self.number_to_names = collections.defaultdict(list)

def add(self, name, number):
    if name in self.name_to_number:
        print('The name already exists in the telefonbok.')
    else:
        self.name_to_number[name] = number
        self.number_to_names[number].append(name)
If you're dead set on doing it the hard way for whatever reason, the findByValue method in Óscar López's answer is what you need.
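Putting the two-dictionary idea into a complete class might look like the sketch below. The class name Telefonbok, the return values of add, and the change_number method are assumptions for illustration; change_number treats all names that share a number as aliases and moves them together, which is one reading of what the question asks for.

```python
import collections


class Telefonbok(object):
    def __init__(self):
        self.name_to_number = {}
        self.number_to_names = collections.defaultdict(list)

    def add(self, name, number):
        """Add a name for a number; return False if the name is already used."""
        if name in self.name_to_number:
            print('The name already exists in the telefonbok.')
            return False
        self.name_to_number[name] = number
        self.number_to_names[number].append(name)
        return True

    def change_number(self, name, new_number):
        """Move every name sharing this name's number to new_number."""
        old_number = self.name_to_number[name]
        for alias in self.number_to_names.pop(old_number):
            self.name_to_number[alias] = new_number
            self.number_to_names[new_number].append(alias)
```

With both dictionaries kept in sync, looking up a number by name is self.name_to_number[name] and looking up the names for a number is self.number_to_names[number], so no search through the values is needed.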
