In existing code, I used the following to perform an operation on a DataFrame column.
df.loc[:, ['test1']] = m/(df.loc[:, ['rh']]*d1)
Here, both m and d1 are scalars; 'test1' and 'rh' are column names.
Is this the right way, or best practice, to perform math operations on the DataFrame?
Yes, what you have there is fine. If you're looking for ways to improve it, a couple of suggestions:
When accessing entire columns (as you're doing here), you can be more concise by skipping .loc and just writing df["test1"] and df["rh"].
You could alternatively use the .apply() method, which is useful in the more general case of performing an arbitrary operation (anything you can implement in a function) on a DataFrame column. Here it would look like

df["test1"] = df["rh"].apply(lambda rh: m / (rh * d1))

though it is almost certainly unnecessary for this simple case.
My problem is conceptual rather than practical:
What are the reasons numpy uses row-based data instead of column-based data?
I know that row-based data can be accessed faster by the CPU this way, which increases performance; column-based data, on the other hand, would be more "mathematically correct".
Performance alone would justify the convention, but I wanted to know whether there are other reasons it is used. (I am aware that this convention is not unique to numpy but is used in general, so I suppose another reason is that it follows the convention of other libraries too.)
Note that I asked this question on the numpy github already, but I wanted to see if I can reach different people with different knowledge here.
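To make the layout question concrete, here is a small sketch (the array contents are arbitrary) showing how the two memory orders differ; the strides report how many bytes separate consecutive elements along each axis:

import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)  # C (row-major) order, numpy's default
f = np.asfortranarray(a)                        # same values, Fortran (column-major) order

print(a.strides)  # (24, 8): elements of a row sit next to each other in memory
print(f.strides)  # (8, 16): elements of a column sit next to each other in memory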
I want to write a for loop over a DataFrame, but I couldn't find the right syntax for this situation. Below is an overview of what I want to implement.
In detail, I want to make a new column named df that is calculated from the values of column f_adj.
[Image: expected result in Excel]
How do I fix this code?
df1['df'] = 0
for i in range(1, len(df1) - 1):
    df1['df'[i]] = df1['f_adj'[i+1]] - df1['f_adj'[i-1]]
Thank you in advance.
You should use .iloc or .loc in your code: 'df'[i] indexes into the string 'df' (giving a character), not into the DataFrame.
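A corrected sketch, assuming df1 has the default integer index (the sample data is made up):

import pandas as pd

df1 = pd.DataFrame({"f_adj": [1.0, 3.0, 6.0, 10.0, 15.0]})  # hypothetical data

df1['df'] = 0.0
for i in range(1, len(df1) - 1):
    # .loc[row, column] addresses a single cell
    df1.loc[i, 'df'] = df1.loc[i + 1, 'f_adj'] - df1.loc[i - 1, 'f_adj']

# Equivalent vectorized form, using shift() instead of an explicit loop
df1['df'] = (df1['f_adj'].shift(-1) - df1['f_adj'].shift(1)).fillna(0.0)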
Here is the operation that I want to perform:
for sample in samples:
    if sample['something'] is None:
        sample['something'] = 'something_else'
I want to write this in a more elegant and Pythonic way. It would be great if some experts in Python could help. samples is a list of dictionaries.
for sample in (elt for elt in samples if elt["something"] is None):
    sample["something"] = "something_else"
Note the use of a generator expression to avoid building another in-memory list, though this might be unnecessary.
I also would not use the explicit None check, unless empty collections, strings, or 0 should be treated as meaningful values to keep -- often they aren't, with the possible exception of zero. if not foo reads better, IMO, than if foo is None. If this is your case too, you could do just:
for sample in samples:
    sample["something"] = sample["something"] or "something_else"
Then again, I probably wouldn't bother; the original would be good enough for me, and as suggested in the comments, the or shortcut could read as a tad hacky (in a bad way) to some readers.
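To illustrate that caveat with a made-up example: the or form also replaces falsy values such as 0 and the empty string, which the explicit None check would leave alone:

samples = [{"something": None}, {"something": 0}, {"something": "kept"}]

for sample in samples:
    sample["something"] = sample["something"] or "something_else"

print(samples)
# [{'something': 'something_else'}, {'something': 'something_else'}, {'something': 'kept'}]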
This is most likely a dumb question, but being a beginner in Python/NumPy I will ask it anyway. I have come across a lot of posts on how to normalize an array/matrix in numpy, but I am not sure about the WHY. Why/when does an array/matrix need to be normalized? When is it used?
Normalization can have multiple meanings in different contexts. My question belongs to the field of data analytics/data science. What does normalization mean in this context? Or, more specifically, in what situations should I normalize an array?
The second part of this question is: what are the different methods of normalization, and can they be used interchangeably in all situations?
The third and final part: can normalization be used for arrays of any dimension?
Links to any reference material (for beginners) will be appreciated.
Consider trying to cluster objects with two numerical attributes A and B. Both are equally important. Attribute A can range from 0 to 1000 and attribute B can range from 0 to 5.
If you did not normalize A and B, you would end up with attribute A completely overpowering attribute B when applying any standard distance metric.
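A minimal sketch of that effect, with made-up numbers and simple rescaling by each attribute's known range:

import numpy as np

# Two hypothetical objects: attribute A ranges over [0, 1000], attribute B over [0, 5]
X = np.array([[900.0, 1.0],
              [905.0, 5.0]])

# Raw Euclidean distance is dominated by A, even though B differs far more in relative terms
print(np.linalg.norm(X[0] - X[1]))  # ~6.4

# After rescaling each column to [0, 1], B's difference carries real weight
X_norm = X / np.array([1000.0, 5.0])
print(np.linalg.norm(X_norm[0] - X_norm[1]))  # ~0.8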
I have a big file in some format. I want to substitute some items in the string with others; the string is long (like a big XML file, but without formatting).
I know where they are, and I could locate each with a regular expression, but I wonder which method is best: easiest, and ideally the most efficient.
str.format and % already search the string for parameter placeholders internally. Since they're implemented in C, you're not going to beat their performance with Python code, even if your search-and-replace workload is somewhat simpler. See "Faster alternatives to numpy.argmax/argmin which is slow" for a glance at C-to-Python relative performance.
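A small sketch of what that looks like, with hypothetical placeholder names:

# %-style formatting with named placeholders
template = "<root><name>%(name)s</name><id>%(id)s</id></root>"
print(template % {"name": "Alice", "id": "42"})

# str.format does the same with {} placeholders
template = "<root><name>{name}</name><id>{id}</id></root>"
print(template.format(name="Alice", id="42"))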