I'm a newbie at Python. I'm trying to write Python code to rename multiple files with different names, depending on different match cases. Here's my code:
for i, file in enumerate(os.listdir(inputpath)):
    if match(".*716262_2.*$", file):
        dstgco = "Test1" + "DateHere" + ".xls"
        gnupgCommandOp = gnupgCommand(os.rename(os.path.join(inputpath, file), os.path.join(inputpath, dstgco)))
        returnCode = call(gnupgCommandOp)
    if match(".*270811_2.*$", file):
        dstgmo = "Test2" + "DateHere" + ".xls"
        gnupgCommandOp = gnupgCommand(os.rename(os.path.join(inputpath, file), os.path.join(inputpath, dstgmo)))
        returnCode = call(gnupgCommandOp)
Currently only one file is getting renamed, the Test2 DateHere one, and I get a "'str' object is not callable" error. My requirement is to rename the files present at the location depending on the different match cases. Am I writing the for loop or the if statements incorrectly?
Things I have tried:
- used an incremental count
- used glob
- used only os.listdir and not enumerate
It seems like it matches the first statement and then breaks on the next retrieval; maybe I wrote the if statements wrong.
I can't debug this, since I'm calling this code from an internal tool using a .bat file.
Can someone please help me out? I know only a single gnupgCommandOp should be used. Is my syntax wrong, and what would be a better way to achieve this?
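For clarity, this is roughly the renaming behaviour I'm after (an untested sketch; the gnupg call is omitted here, and inputpath comes from my existing script):

import os
import re

# Map each filename pattern to the destination name prefix it should get
patterns = {
    r".*716262_2.*$": "Test1",
    r".*270811_2.*$": "Test2",
}

for file in os.listdir(inputpath):
    for pattern, prefix in patterns.items():
        if re.match(pattern, file):
            dst = prefix + "DateHere" + ".xls"
            os.rename(os.path.join(inputpath, file), os.path.join(inputpath, dst))
            break  # only the first matching pattern should apply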
I am trying to read sav files using pyreadstat in Python, but in some rare scenarios I get a UnicodeDecodeError because a string variable contains special characters.
To handle this, I think that instead of loading the entire variable set I will load only the variables that do not produce this error.
Below is the pseudo-code I have. It is not very efficient, since I check each item of the list for errors using try and except.
import pyreadstat

# Read only the metadata to get information about the variables
df, meta = pyreadstat.read_sav('Test.sav', metadataonly=True)
variables = meta.column_names  # all variable names
result = []
for var in variables:
    print(var)
    try:
        df, meta = pyreadstat.read_sav('Test.sav', usecols=[str(var)])
        # No error means we can keep this variable
        result.append(var)
    except:
        pass
# Finally, load the sav file with only the non-error variables
df, meta = pyreadstat.read_sav('Test.sav', usecols=result)
For a sav file with 1000+ variables it takes a long time to process this.
I was thinking there might be a way to use a divide and conquer approach to do it faster. Below is my suggested approach (a rough sketch follows the steps), but I am not very good at implementing recursion algorithms. Could someone please help me with pseudocode? It would be very helpful.
1. Take the list and try to read the sav file.
2. If there is no error, the output can be stored in result and then we read the sav file.
3. If there is an error, split the list into 2 parts and run these again...
4. Step 3 needs to run again until we have lists that do not give any error.
Using the second approach, 90% of my sav files will get loaded on the first pass itself, hence I think recursion is a good method.
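In rough Python, the recursion I have in mind looks something like this (an untested sketch, assuming read_sav raises on any problematic column):

import pyreadstat

def readable_columns(filename, cols):
    # Recursively find the subset of cols that read_sav can load
    if not cols:
        return []
    try:
        pyreadstat.read_sav(filename, usecols=cols)
        return cols  # the whole chunk reads fine
    except Exception:
        if len(cols) == 1:
            return []  # a single column that still fails gets dropped
        mid = len(cols) // 2
        return (readable_columns(filename, cols[:mid]) +
                readable_columns(filename, cols[mid:]))

df, meta = pyreadstat.read_sav('Test.sav', metadataonly=True)
good = readable_columns('Test.sav', meta.column_names)
df, meta = pyreadstat.read_sav('Test.sav', usecols=good)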
You can try to reproduce the issue with the sav file here.
For this specific case I would suggest a different approach: you can pass an "encoding" argument to pyreadstat.read_sav to set the encoding manually. If you don't know which one it is, you can iterate over the list of encodings here: https://gist.github.com/hakre/4188459 to find out which one makes sense. For example:
import pyreadstat as p

# here codes is a list with all the encodings from the link mentioned before
for c in codes:
    try:
        df, meta = p.read_sav("Test.sav", encoding=c)
        print(c)
        print(df.head())
    except:
        pass
I did, and there were a few that may potentially make sense, assuming the string is in a non-Latin alphabet. However, the most promising one is not in the list: encoding="UTF8" (the list contains UTF-8, with a dash, and that fails). Using UTF8 (no dash) I get this:
నేను గతంలో వాడిన బ
which according to Google Translate means "I used to come b" in Telugu. Not sure if that fully makes sense, but it's a way forward.
The advantage of this approach is that if you find the right encoding, you will not be losing data, and reading the data will be fast. The disadvantage is that you may not find the right encoding.
In case you do not find the right encoding, you would still be reading the problematic columns very fast, and you can discard them later in pandas by inspecting which character columns do not contain Latin characters (see the sketch below). This will be much faster than the algorithm you were suggesting.
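For that last step, something like this sketch could flag the suspect columns (the ASCII test here is just one crude way to define "non-Latin"):

import pandas as pd

def non_latin_columns(df):
    # Flag string columns whose values contain characters outside the ASCII range
    cols = []
    for col in df.select_dtypes(include="object").columns:
        values = df[col].dropna().astype(str)
        if values.str.contains(r"[^\x00-\x7F]").any():
            cols.append(col)
    return cols

df = df.drop(columns=non_latin_columns(df))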
Is there an easy way to extract a list of all variables with a start attribute from a Modelica model? The ultimate goal is to run a simulation until it reaches steady state, then run a Python script that compares the values of the start attributes against the steady-state values, so that I can identify start values that were chosen badly.
In the Dymola Python interface I could not find such functionality. Another approach could be to generate the modelDescription.xml and parse it; I assume the information is available somewhere in there, but I also need help getting started with that approach.
Similar to this answer, you can easily extract that info from the modelDescription.xml inside an FMU with FMPy.
Here is a small runnable example:
from fmpy import read_model_description
from fmpy.util import download_test_file
from pprint import pprint

# Download a test FMU and read its model description
fmu_filename = 'CoupledClutches.fmu'
download_test_file('2.0', 'CoSimulation', 'MapleSim', '2016.2', 'CoupledClutches', fmu_filename)
model_description = read_model_description(fmu_filename)

# Keep only the local variables that define a start value
start_vars = [v for v in model_description.modelVariables if v.start and v.causality == 'local']
pprint(start_vars)
The files dsin.txt and dsfinal.txt might help you with this. They have the same structure, with values at the start and at the end of the simulation; by renaming dsfinal.txt to dsin.txt you can start your simulation from the (e.g. steady-state) values you computed in a previous run.
It might be worthwhile to work with these two files if you already have in mind using such values for running other simulations.
They give you information about solvers/simulation settings that you won't find in the .mat result files (if that is of any interest for your case).
However, if it is only a comparison between start and final values of variables that are present in the result files anyway, a better choice might be to use Python and a library to read the result.mat file (DyMat, modelicares, etc.). It is then a matter of comparing the start and end values of the signals of interest.
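For example, with DyMat (one of the libraries mentioned; an untested sketch, the file name is illustrative):

import DyMat

res = DyMat.DyMatFile("result.mat")
for name in res.names():
    values = res.data(name)
    print(name, values[0], values[-1])  # first vs. last value of each signal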
After some trial and error, I came up with this Python code snippet to get that information from modelDescription.xml:
import xml.etree.ElementTree as ET

root = ET.parse('modelDescription.xml').getroot()
for ScalarVariable in root.findall('ModelVariables/ScalarVariable'):
    varStart = ScalarVariable.find('*[@start]')
    if varStart is not None:
        name = ScalarVariable.get('name')
        value = varStart.get('start')
        print(f"{name} = {value};")
To generate the modelDescription.xml file, run Dymola translation with the flag
Advanced.FMI.GenerateModelDescriptionInterface2 = true;
The Python standard library has several modules for processing XML:
https://docs.python.org/3/library/xml.html
This snippet uses ElementTree.
This is just a first step, not sure if I missed something basic.
So, I have a function ABC in a certain .py file that returns a list. And in ANOTHER .py FILE, I have another function where I'm supposed to write into an empty file (that function will return that new file). I want to write the list obtained with function ABC into that new empty file. How am I supposed to do that?
Note: sorry for not posting any code, but I really have no idea how to do this; besides, I've found nothing in other questions similar to this one.
https://www.learnpython.org/en/Functions has a very good section on how to use functions, and the following sections on classes, objects and modules are worth reviewing.
Write a file:
Python 2.7 : Write to file instantly
Passing args vs list:
Advantages of using *args in python instead of passing a list as a parameter
That gives you a starting point for learning Python and solving your immediate problem.
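As a minimal sketch of the two-file setup (the file and function names below are made up, since none were posted):

# abc_module.py (hypothetical name for the file that defines ABC)
def ABC():
    return ["first", "second", "third"]

# writer.py
from abc_module import ABC

def write_list(path):
    items = ABC()  # get the list from the other module
    with open(path, "w") as f:  # creates (or overwrites) the file
        for item in items:
            f.write(f"{item}\n")
    return path

write_list("output.txt")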
I am new to Python/GDAL and am running into a perhaps trivial issue. This may stem from the fact that I don't really understand how to use GDAL properly in Python, or from something careless, but even though I think I am following the help doc, I keep getting a syntax error when trying to use "gdalbuildvrt".
What I want to do is take several geotagged 1-band binary rasters (the amount varies per set, call it N; all values are either 0 or 1) of different sizes (each raster in a set mostly overlaps the others), and "stack" them on top of each other so that they are aligned properly according to their coordinate information. I want this "stack" simply so I can sum the values and produce a 'total' tiff whose extent matches the combined extent of all the original rasters (not just the overlap region). The resulting tiff would have values ranging from 0 to N, representing the total number of "hits" each pixel location received over the course of the N rasters.
I was led to gdalbuildvrt [http://www.gdal.org/gdalbuildvrt.html] and after reading about it, it seemed that by using the -separate keyword I would be able to achieve what I need. However, each time I try to run my program, I get a syntax error. The following shows two of the several different ways I tried calling gdalbuildvrt:
gdalbuildvrt -separate -input_file_list stack.vrt inputlist.txt
gdalbuildvrt -separate stack.vrt inclassfiles
Here inputlist.txt is a text file with the path to a tif on every line, just as the help doc specifies, and inclassfiles is a Python list of the pathnames. Every single time, no matter which way I call it, I get a syntax error on the first word after the keywords (i.e. 'inputlist' in inputlist.txt, or 'stack' in stack.vrt).
Could someone please shed some light on what I might be doing wrong? Alternatively, does anyone know how else I could use python to get what I need?
Thanks so much.
gdalbuildvrt is a GDAL command line utility. From your example it's a bit unclear how you actually run it, but when running it from within Python you should execute it as a subprocess.
Also, in your first line you have the .vrt and the .txt in the wrong order: the text file containing the input files should follow directly after -input_file_list.
From within Python you can call gdalbuildvrt like:
import os
os.system('gdalbuildvrt -separate -input_file_list inputlist.txt stack.vrt')
Note that the command is provided as a string. Using a Python list with the files can be done with something like:
os.system('gdalbuildvrt -separate stack.vrt %s' % ' '.join(data))
The ' '.join(data) part converts the list to a string with a space between the items.
Depending on how your GDAL is built, it's sometimes possible to use wildcards as well:
os.system('gdalbuildvrt -separate stack.vrt *.tif')
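A sketch using the subprocess module (mentioned above) avoids building the command string by hand:

import subprocess

# data is the Python list of input raster paths
cmd = ['gdalbuildvrt', '-separate', 'stack.vrt'] + data
subprocess.call(cmd)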
I'm pretty new to Python (and the xlrd module), so my code is probably not nearly as compact as it could be. I'm just using it to analyse some data, so it's more important that I understand what I'm doing than that the code is as compact as possible (though I do hope to improve, so feel free to give me advice on the coding itself, provided you manage to explain it to a 'newbie' :p).
That being said, here's my issue:
Context
I have an xlsx file with data on errors that people made when translating a text. The first column contains a code for the error relative to the text (conceptual errors), the second column contains a code for the translator that made the error. I want to create a dictionary in which the keys are the conceptual error codes, and the values are lists of the different translators that made that conceptual error.
A short fragment from the xlsx (to give you an idea of the codes in the two columns):
1722_Z1_CF5 1722_HT_EV_Z1_F1
1722_Z1_CF1 1722_PE_AL_Z1_F1
1722_Z1_CF9 1722_PE_EVC_Z1_F1
1722_Z1_CF5 1722_PE_LH_Z1_F1
As you can see, the conceptual error '1722_Z1_CF5' has been made by 2 different people ('1722_HT_EV_Z1_F1' and '1722_PE_LH_Z1_F1'). The dictionary for this fragment would look something like:
1722_Z1_CF5: 1722_HT_EV_Z1_F1, 1722_PE_LH_Z1_F1
1722_Z1_CF1: 1722_PE_AL_Z1_F1
1722_Z1_CF9: 1722_PE_EVC_Z1_F1
Code
The code below is what I tried to do to create the dictionary.
def TranslatorsPerError(sheet):
    TotalConceptualErrors(sheet)
    TranslatorsPerError = {}
    for row_index in range(sheet.nrows):
        if sheet.cell(row_index, 0).value in ConceptualErrors and sheet.cell(row_index, 0).value not in TranslatorsPerError:
            TranslatorsPerError[str(sheet.cell(row_index, 0).value)] = [str(sheet.cell(row_index, 1).value)]
        if sheet.cell(row_index, 0).value in ConceptualErrors and sheet.cell(row_index, 0).value in TranslatorsPerError:
            TranslatorsPerError[str(sheet.cell(row_index, 0).value)].append(str(sheet.cell(row_index, 1).value))
    return TranslatorsPerError
'TotalConceptualErrors' is a function I created that returns a list ('ConceptualErrors') of the conceptual error codes from the first column, without duplicates (it also filters out some other information that was present in the first column, which is why I needed to call it first).
Problem
The problem is that this function keeps giving me an error: TypeError: argument of type 'Book' is not iterable
I know that problems with iterables can sometimes be solved by casting certain things to a different type, but I'm not sure how to solve this one. I tried using str() on different elements, but that didn't solve the problem. Maybe it has something to do with my code, maybe with the nature of dictionaries or xlrd... (looking at the type 'Book', my guess would be the latter).
Any help or feedback on how to fix this would be greatly appreciated. If you need extra information to understand what's going on or what I'm looking for, please ask.
Where is ConceptualErrors being set?
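Assuming TotalConceptualErrors returns that list, a sketch that captures the return value and builds the same dictionary (setdefault replaces the two if branches):

def TranslatorsPerError(sheet):
    # Capture the returned list instead of discarding it
    ConceptualErrors = TotalConceptualErrors(sheet)
    translators = {}
    for row_index in range(sheet.nrows):
        error_code = str(sheet.cell(row_index, 0).value)
        translator = str(sheet.cell(row_index, 1).value)
        if error_code in ConceptualErrors:
            # setdefault creates the list the first time a code appears
            translators.setdefault(error_code, []).append(translator)
    return translators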