I have variables of all the months and is better to change a month one time at the start than all the script. Not focus on the counts, only in understanding of python.
Is it possible?
My variable is uqFEB (and I have uqJAN, uqMAR...).
uqFEB = xarray.DataArray: lat: 721, lon: 1440
month = 'FEB'
The expectative is to write somenthing like this:
x = uq+{'month'}+*2
or
x = uq'month'*2
And python substitute and understand:
x= uqFEB*2
How can I write this syntax? It's possible?
Thank you so much!
As mentioned in my comment (and others) you want a dictionary. Any time you are thinking of dynamically referencing a variable in your code, think "Dictionary" instead:
month_dict={"Jan":1, "Feb":2, "Mar":3, "Apr":4, "May":5, "Jun":6, "Jul":7, "Aug":8, "Sept":9, "Oct":10, "Nov":11, "Dec":12}
x=month_dict["Feb"]**2
print(x)
4
Related
This seems like it should be very simple but am not sure the proper syntax in Python. To streamline my code I want a while loop (or for loop if better) to cycle through 9 datasets and use the counter to call each file out using the counter as a way to call on correct file.
I would like to use the "i" variable within the while loop so that for each file with sequential names I can get the average of 2 arrays, the max-min of this delta, and the max-min of another array.
Example code of what I am trying to do but the avg(i) and calling out temp(i) in loop does not seem proper. Thank you very much for any help and I will continue to look for solutions but am unsure how to best phrase this to search for them.
temp1 = pd.read_excel("/content/113VW.xlsx")
temp2 = pd.read_excel("/content/113W6.xlsx")
..-> temp9
i=1
while i<=9
avg(i) =np.mean(np.array([temp(i)['CC_H='],temp(i)['CC_V=']]),axis=0)
Delta(i)=(np.max(avg(i)))-(np.min(avg(i)))
deltaT(i)=(np.max(temp(i)['temperature='])-np.min(temp(i)['temperature=']))
i+= 1
EG: The slow method would be repeating code this for each file
avg1 =np.mean(np.array([temp1['CC_H='],temp1['CC_V=']]),axis=0)
Delta1=(np.max(avg1))-(np.min(avg1))
deltaT1=(np.max(temp1['temperature='])-np.min(temp1['temperature=']))
avg2 =np.mean(np.array([temp2['CC_H='],temp2['CC_V=']]),axis=0)
Delta2=(np.max(avg2))-(np.min(avg2))
deltaT2=(np.max(temp2['temperature='])-np.min(temp2['temperature=']))
......
Think of things in terms of lists.
temps = []
for name in ('113VW','113W6',...):
temps.append( pd.read_excel(f"/content/{name}.xlsx") )
avg = []
Delta = []
deltaT = []
for data in temps:
avg.append(np.mean(np.array([data['CC_H='],data['CC_V=']]),axis=0)
Delta.append(np.max(avg[-1]))-(np.min(avg[-1]))
deltaT.append((np.max(data['temperature='])-np.min(data['temperature=']))
You could just do your computations inside the first loop, if you don't need the dataframes after that point.
The way that I would tackle this problem would be to create a list of filenames, and then iterate through them to do the necessary calculations as per the following:
import pandas as pd
# Place the files to read into this list
files_to_read = ["/content/113VW.xlsx", "/content/113W6.xlsx"]
results = []
for i, filename in enumerate(files_to_read):
temp = pd.read_excel(filename)
avg_val =np.mean(np.array([temp(i)['CC_H='],temp['CC_V=']]),axis=0)
Delta=(np.max(avg_val))-(np.min(avg_val))
deltaT=(np.max(temp['temperature='])-np.min(temp['temperature=']))
results.append({"avg":avg_val, "Delta":Delta, "deltaT":deltaT})
# Create a dataframe to show the results
df = pd.DataFrame(results)
print(df)
I have included the enumerate feature to grab the index (or i) should you want to access it for anything, or include it in the results. For example, you could change the the results.append line to something like this:
results.append({"index":i, "Filename":filename, "avg":avg_val, "Delta":Delta, "deltaT":deltaT})
Not sure if I understood the question correctly. But if you want to read the files inside a loop using indexes (i variable), you can create a list to hold the contents of the excel files instead of using 9 different variables.
something like
files = []
files.append(pd.read_excel("/content/113VW.xlsx"))
files.append(pd.read_excel("/content/113W6.xlsx"))
...
then use the index variable to iterate over the list
i=1
while i<=9
avg(i) = np.mean(np.array([files[i]['CC_H='],files[i]['CC_V=']]),axis=0)
...
i+=1
P.S.: I am not a Pandas/NumPy expert, so you may have to adapt the code to your needs
I'm trying parse variables and the intervals in which they live in from a file. For example, an input file might contain something like this:
Constants
x1 = 1;
Variables
x45 in [-45,5844];
x63 in [0,456];
x41 in [45, 1.e8]; #Where 1.e8 stands for 10^8
All I want to do is to stock every couple (variable, interval) in a dictionary. If it's a constant, the interval would be [constant, constant]. I first imagined I had to use the built-in function findall to search through the whole file all the lines of the type "x"random_number" in "random_interval" or "x"random_number = random_number" but I don't know how to get and stock the "x" and the intervals after I find all the lines I wanted.
Also, whenever there is a "1.e8" in an interval, I want to replace it by a "10^8" before stocking it in the dictionary.
Any clue ?
Thanks for helping me to solve my problem
Honestly I don't exactly know how to begin. All I did is this :
from sys import argv
import re
script, filename = argv
#The file which contains the variables is filename, so that I can use as a command
in a terminal : "python myscript.py filename.txt"
file_data = open(filename, 'r')
Dict = {} #The dictionnary in which I want to store the variables and the intervals
txt = file_data.read()
#Here I thought I might use this : re.findall(.*"in".*, txt) or something like that
#but It will get me all the lines of the type "blabla in bloblo".
#I want also blabla and bloblo so that I can put them into my dictionnary like this :
#Dict[blabla] = bloblo
#In the dictionnary it will be for example Dict = {x45 : [-10, 456]}
file_data.close()
To answer Akilan,
if the file is the following :
Constants
x1 = 5;
Variables
x56 in [3,12];
x78 in [1,4];
It should return the following :
{"x1" : [5,5], "x56" : [3,12], "x78" : [1,4]}
I hope I can be as clear as possible.
I have an excel file with 400 subjects for a study and for each one of them I have their age, their sex and 40 more columns of biological variables.
Es: CODE0001; (age)20; M\F; Biovalue1; BioValue 2 ..... Biovalue 40.
My goal is to analyze these data with the 1-way Anova because I think it's the best option I have. I'm trying do it (even using this guide https://www.marsja.se/four-ways-to-conduct-one-way-anovas-using-python/ ) but there's always a problem with the code.
So: how can I set up my data in order to be able to use the code for example from that website?
I've already done Dataset.mean() and Dataset.std() for all the data, but I can't use for example the value "Mean Age" because it seems like Jupyter only reads it as a string and not a value.
I'm in a deep state of confusion, so all kind of help will be super appreciated!!!
Thank you in advance
I'm sorry but I didn't understand. I'm relatively new to python so maybe i couldn't explain myself properly.
I need to do an Anova analysis:
First I did this:
AnalisiISAD.mean()
2) Then I made a list from that:
MeanList = [......]
3) Then i proceded with the anova script
AnalisiI.boxplot('MeanList', by='AgeT0', figsize=(12,8))
ctrl = Analisi['MeanList'][Analisi == 'ctrl']
grps = pd.unique(Analisi.group.values)
d_data = {grp:Analisi['MeanList'][Analisi.group ==grp] for grp in grps}
k = len(pd.unique(Analisi.group))
N = len(Analisi.values)
n = Analisi.groupby('AgeT0').size()[0]
but this error occurs: KeyError: 'Column not found: MeanList'
Does this mean I have to create a new column in the excel file? How do I do that?
When using df.mean() or df.std(), try changing the data to pd.Series first and run it.
I'm not sure what to call what I'm looking for; so if I failed to find this question else where, I apologize. In short, I am writing python code that will interface directly with the Linux kernel. Its easy to get the required values from include header files and write them in to my source:
IFA_UNSPEC = 0
IFA_ADDRESS = 1
IFA_LOCAL = 2
IFA_LABEL = 3
IFA_BROADCAST = 4
IFA_ANYCAST = 5
IFA_CACHEINFO = 6
IFA_MULTICAST = 7
Its easy to use these values when constructing structs to send to the kernel. However, they are of almost no help to resolve the values in the responses from the kernel.
If I put the values in to dict I would have to scan all the values in the dict to look up keys for each item in each struct from the kernel I presume. There must be a simpler, more efficient way.
How would you do it? (feel free to retitle the question if its way off)
If you want to use two dicts, you can try this to create the inverted dict:
b = {v: k for k, v in a.iteritems()}
Your solution leaves a lot of work do the repeated person creating the file. That is a source for error (you actually have to write each name three times). If you have a file where you need to update those from time to time (like, when new kernel releases come out), you are destined to include an error sooner or later. Actually, that was just a long way of saying, your solution violates DRY.
I would change your solution to something like this:
IFA_UNSPEC = 0
IFA_ADDRESS = 1
IFA_LOCAL = 2
IFA_LABEL = 3
IFA_BROADCAST = 4
IFA_ANYCAST = 5
IFA_CACHEINFO = 6
IFA_MULTICAST = 7
__IFA_MAX = 8
values = {globals()[x]:x for x in dir() if x.startswith('IFA_') or x.startswith('__IFA_')}
This was the values dict is generated automatically. You might want to (or have to) change the condition in the if statement there, according to whatever else is in that file. Maybe something like the following. That version would take away the need to list prefixes in the if statement, but it would fail if you had other stuff in the file.
values = {globals()[x]:x for x in dir() if not x.endswith('__')}
You could of course do something more sophisticated there, e.g. check for accidentally repeated values.
What I ended up doing is leaving the constant values in the module and creating a dict. The module is ip_addr.py (the values are from linux/if_addr.h) so when constructing structs to send to the kernel I can use if_addr.IFA_LABEL and resolves responses with if_addr.values[2]. I'm hoping this is the most straight forward so when I have to look at this again in a year+ its easy to understand :p
IFA_UNSPEC = 0
IFA_ADDRESS = 1
IFA_LOCAL = 2
IFA_LABEL = 3
IFA_BROADCAST = 4
IFA_ANYCAST = 5
IFA_CACHEINFO = 6
IFA_MULTICAST = 7
__IFA_MAX = 8
values = {
IFA_UNSPEC : 'IFA_UNSPEC',
IFA_ADDRESS : 'IFA_ADDRESS',
IFA_LOCAL : 'IFA_LOCAL',
IFA_LABEL : 'IFA_LABEL',
IFA_BROADCAST : 'IFA_BROADCAST',
IFA_ANYCAST : 'IFA_ANYCAST',
IFA_CACHEINFO : 'IFA_CACHEINFO',
IFA_MULTICAST : 'IFA_MULTICAST',
__IFA_MAX : '__IFA_MAX'
}
I'm rewriting some code from Ruby to Python. The code is for a Perceptron, listed in section 8.2.6 of Clever Algorithms: Nature-Inspired Programming Recipes. I've never used Ruby before and I don't understand this part:
def test_weights(weights, domain, num_inputs)
correct = 0
domain.each do |pattern|
input_vector = Array.new(num_inputs) {|k| pattern[k].to_f}
output = get_output(weights, input_vector)
correct += 1 if output.round == pattern.last
end
return correct
end
Some explanation: num_inputs is an integer (2 in my case), and domain is a list of arrays: [[1,0,1], [0,0,0], etc.]
I don't understand this line:
input_vector = Array.new(num_inputs) {|k| pattern[k].to_f}
It creates an array with 2 values, every values |k| stores pattern[k].to_f, but what is pattern[k].to_f?
Try this:
input_vector = [float(pattern[i]) for i in range(num_inputs)]
pattern[k].to_f
converts pattern[k] to a float.
I'm not a Ruby expert, but I think it would be something like this in Python:
def test_weights(weights, domain, num_inputs):
correct = 0
for pattern in domain:
output = get_output(weights, pattern[:num_inputs])
if round(output) == pattern[-1]:
correct += 1
return correct
There is plenty of scope for optimising this: if num_inputs is always one less then the length of the lists in domain then you may not need that parameter at all.
Be careful about doing line by line translations from one language to another: that tends not to give good results no matter what languages are involved.
Edit: since you said you don't think you need to convert to float you can just slice the required number of elements from the domain value. I've updated my code accordingly.