I want to multiply two columns in different dataframes - python

I have a dataframe named data1 with a column named 'E-E11' and another dataframe named Volx with a column named 'EVOL'. I want to multiply the two columns, but it doesn't work: I get KeyError: 'E-E11'. Each column contains 332924 values.
I used this:
Volx = pd.read_csv('BCCdir1VOL.csv') #already floats in dataframe
Volx.drop(Volx.columns[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], inplace=True, axis=1) # have one column in my data frame
data1 = pd.read_csv('abaqusBCC1Dir.csv') #already floats in dataframe
data1.drop(data1.columns[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15]], inplace=True, axis=1) # have one column in my data frame
def getPower(data1, Multiplicationx, numOfCol):
    for i in range(numOfCol):
        Volx = 'EVOL' % (i+1)
        E11x = 'E-E11' % (i+1)
        Multiplicationx = 'E11x_V' % (i+1)
        data1[Multiplicationx] = data1[E11x]*Volx[Volx]
data1[Multiplicationx] = data1['E-E11']*Volx['EVOL']
Instead of getting a new column Multiplicationx that holds the product of the two columns, I get KeyError: 'E-E11'. Can anyone help?

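It's kind of hard to tell what's going on, but I don't understand 'EVOL' % (i+1): the string contains no format placeholder, so the % operator raises a TypeError instead of appending the number.
Try building the column names with f-strings, and keep them in separate variables so the Volx dataframe is not overwritten by a string:
vol_col = f'EVOL{i+1}'
e11_col = f'E-E11{i+1}'
product_col = f'E11x_V{i+1}'
data1[product_col] = data1[e11_col] * Volx[vol_col]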
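If the KeyError persists even with the literal name 'E-E11', the header in the CSV may differ slightly from what is typed, for example through leading or trailing spaces. The following is only a minimal sketch under that assumption: the two small frames are made-up stand-ins for the real CSV files, and nothing in the question confirms that this is the actual cause.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files; note the stray spaces in the headers.
data1 = pd.DataFrame({"  E-E11": [0.1, 0.2, 0.3]})
Volx = pd.DataFrame({" EVOL": [2.0, 4.0, 6.0]})

# Show the raw headers exactly as pandas sees them.
print(list(data1.columns), list(Volx.columns))

# Strip whitespace from the headers so the expected names actually exist.
data1.columns = data1.columns.str.strip()
Volx.columns = Volx.columns.str.strip()

# Row-wise product; pandas aligns the two Series on their index.
data1["E11x_V"] = data1["E-E11"] * Volx["EVOL"]
print(data1)
```

Printing the column lists first is usually the quickest way to see which name pandas actually expects.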

Related

How to automate the process to select the clusters using the labels

I'm new to Python and I'm analyzing some data. At the moment I find the clusters with a very manual process. First I get the labels from the fitted model:
labels = optics_model.labels_[optics_model.ordering_]
Then I use np.argwhere to find the indices that carry a given label:
cluster_0 = np.argwhere(labels == 0)
Then I print this data, clean it up using another site, and use it to select the rows of this cluster from the dataframe:
index_0 = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
cluster_0 = df.iloc[index_0]
Can someone help me automate this process?
After some searching and testing I made it work by adding the labels to the dataframe as a new column:
df_copy = df.assign(labels=labels)
Then I found the highest cluster label:
max_label = 0
for i in range(len(labels)):
    if max_label < labels[i]:
        max_label = labels[i]
Then I made the necessary number of empty dataframes (labels run from 0 to max_label inclusive, hence the + 1):
cluster = {}
for i in range(max_label + 1):
    cluster[i] = pd.DataFrame()
Then I just copied the rows I wanted from the dataframe:
for i in range(max_label + 1):
    cluster[i] = df_copy.loc[df_copy['labels'] == i]
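The loops above can also be expressed with groupby. This is a minimal sketch, assuming that labels is aligned row for row with df and that noise points carry the label -1 (as scikit-learn's OPTICS does); the toy frame and label array are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: five points, two clusters and one noise point.
df = pd.DataFrame({"x": [0.0, 0.1, 5.0, 5.1, 9.9]})
labels = np.array([0, 0, 1, 1, -1])

# Attach the labels, then split the frame into one sub-frame per cluster.
df_copy = df.assign(labels=labels)
clusters = {
    label: group.drop(columns="labels")
    for label, group in df_copy.groupby("labels")
    if label != -1  # skip noise
}

for label, frame in clusters.items():
    print(label, frame.index.tolist())
```

One thing to check in the original code: labels_[ordering_] lists the labels in cluster order rather than row order, so for the assignment to line up with df the unpermuted optics_model.labels_ is probably the array to use.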

adding rows based on values of other rows

I have a list (in a dataframe) that looks like this:
oddnum = [1, 3, 5, 7, 9, 11, 23]
I want to create a new list that looks like this:
newlist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 23]
I want to test whether the distance between two consecutive numbers is 2 (if oddnum[index+1]-oddnum[index] == 2).
If the distance is 2, I want to add the number that follows oddnum[index] (that is, oddnum[index] + 1) to the new list.
If the distance is greater than 2, the list should stay as it is.
I keep getting a key error because (I think) the loop runs past the end of the list, so [index+1] no longer exists once [index] reaches the last element. How do I do this?
To get past the error, you can wrap the lookup in a try/except block. Here's my code:
oddnum = [1, 3, 5, 7, 9, 11, 23]
res = []  # The new list
for i in range(len(oddnum)):
    res.append(oddnum[i])  # Always keep the current value
    try:  # oddnum[i+1] does not exist for the last element
        if oddnum[i] + 2 == oddnum[i+1]:
            res.append(oddnum[i] + 1)  # Fill in the missing number
    except IndexError:
        pass  # End of a plain list (a pandas Series would raise KeyError instead)
print(res)
oddnum = [1, 3, 5, 7, 9, 11, 23]
new_list = []
for pos, num in enumerate(oddnum):
    new_list.append(num)
    try:
        if num-oddnum[pos+1] in [2, -2]:
            new_list.append(num+1)
    except IndexError:
        pass
print(new_list)
Use try/except to catch the exception raised at the end of the list and ignore it.
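The out-of-range lookup can also be avoided altogether by pairing each element with its successor. A minimal sketch, assuming a plain Python list; the function name fill_gaps is made up for this example:

```python
def fill_gaps(numbers):
    """Insert n + 1 between consecutive values that are exactly 2 apart."""
    result = []
    # zip stops at the shorter sequence, so no index ever runs off the end.
    for current, following in zip(numbers, numbers[1:]):
        result.append(current)
        if following - current == 2:
            result.append(current + 1)
    if numbers:
        result.append(numbers[-1])  # the last element has no successor
    return result


oddnum = [1, 3, 5, 7, 9, 11, 23]
print(fill_gaps(oddnum))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 23]
```

Because no exception is swallowed, unrelated errors inside the loop stay visible.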

How do I get index of a specific value (in second dataframe) based on the same value in first dataframe

I have two dataframes, df_ts and df_cmexport. I am trying to get the index of each placement ID in df_cmexport for the placements in df_ts.
For an illustration of the data, see the linked Excel file.
Once I have the indices of those placement IDs as a list, I will iterate through them using for j in list_pe_ts_1: to read a value at index j, like so: df_cmexport['p_start_year'][j].
For some reason my code below returns an empty list: print(list_pe_ts_1) prints [].
I think something is wrong with list_pe_ts_1 = df_cmexport.index[df_cmexport['Placement ID'] == pid_1].tolist(), as this returns an empty list of length 0.
I even tried list_pe_ts_1 = df_cmexport.loc[df_cmexport.isin([pid_1]).any(axis=1)].index, but it still gives an empty list.
Help is always appreciated :) Cheers to you all #stackoverflow
for i in range(0, len(df_ts)):
    pid_1 = df_ts['PLACEMENT ID'][i]
    print('for pid ', pid_1)
    list_pe_ts_1 = df_cmexport.index[df_cmexport['Placement ID'] == pid_1].tolist()
    print('len of list',len(list_pe_ts_1))
    ts_p_start_year_for_pid = df_ts['p_start_year'][i]
    ts_p_start_month_for_pid = df_ts['p_start_month'][i]
    ts_p_start_day_for_pid = df_ts['p_start_date'][i]
    print('\np_start_full_date_ts for :', pid_1, 'y:', ts_p_start_year_for_pid, 'm:', ts_p_start_month_for_pid,
          'd:', ts_p_start_day_for_pid)
    # j=list_pe_ts
    print(list_pe_ts_1)
    for j in list_pe_ts_1:
        # print(j)
        export_p_start_year_for_pid = df_cmexport['p_start_year'][j]
        export_p_start_month_for_pid = df_cmexport['p_start_month'][j]
        export_p_start_day_for_pid = df_cmexport['p_start_date'][j]
        print('\np_start_full_date_export for ', pid_1, "at row(", j, ") :", export_p_start_year_for_pid,
              export_p_start_month_for_pid, export_p_start_day_for_pid)
        if (ts_p_start_year_for_pid == export_p_start_year_for_pid) and (
                ts_p_start_month_for_pid == export_p_start_month_for_pid) and (
                ts_p_start_day_for_pid == export_p_start_day_for_pid):
            pids_p_1.add(pid_1)
            # print('pass',pids_p_1)
            # print(export_p_end_year_for_pid)
        else:
            pids_f_1.add(pid_1)
            # print("mismatch in placement end date for pid ", pids)
            # print("pids list ",pids)
            # print('fail',pids_f_1)
With the snippet below you can get a list of the matching index values from the second dataframe.
import pandas as pd
df_ts = pd.DataFrame(data = {'index in df':[0,1,2,3,4,5,6,7,8,9,10,11,12],
                             "pid":[1,1,2,2,3,3,3,4,6,8,8,9,9],
                             })
df_cmexport = pd.DataFrame(data = {'index in df':[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
                                   "pid":[1,1,1,2,3,3,3,3,3,4,4,4,5,5,6,7,8,8,9,9,9],
                                   })
Create a new dataframe by merging the two. Because both frames have a column called 'index in df', pandas adds the suffixes _x (left) and _y (right):
result = pd.merge(df_ts, df_cmexport, left_on=["pid"], right_on=["pid"], how='left', indicator=True, sort=True)
Then take the unique values of the "index in df_y" column:
index_list = result["index in df_y"].unique()
The result:
index_list
Out[9]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 16, 17, 18, 19,
20], dtype=int64)
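A frequent reason for an equality filter returning nothing is that the two ID columns do not hold the same kind of value, for example integers in one frame and strings (possibly padded with spaces) in the other. Whether that applies to the data in the question is an assumption; the sketch below uses invented IDs purely to show how such a mismatch can be detected and normalised.

```python
import pandas as pd

# Hypothetical data: one frame stores the IDs as integers, the other as padded strings.
df_ts = pd.DataFrame({"PLACEMENT ID": [101, 102, 104]})
df_cmexport = pd.DataFrame({"Placement ID": ["101 ", "103", " 102", "101"]})

# Comparing the raw columns finds nothing, because 101 is not equal to "101 ".
unmatched = df_cmexport.index[df_cmexport["Placement ID"] == 101].tolist()
print(unmatched)

# Normalise both key columns to stripped strings before comparing.
ts_ids = df_ts["PLACEMENT ID"].astype(str).str.strip()
export_ids = df_cmexport["Placement ID"].astype(str).str.strip()

# Map each placement ID from df_ts to the matching row indices in df_cmexport.
matches = {pid: export_ids.index[export_ids == pid].tolist() for pid in ts_ids}
print(matches)
```

Checking df_ts.dtypes and df_cmexport.dtypes on the real frames should show quickly whether the key columns differ in type.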

How to add dictionary values in python numpy array

I'm trying to create a NumPy .npz file whose keys are positions and metadata. Its contents should look like this:
#sample output
['positions', 'metadata'] #data keys when I print file_name.keys()
{'num_pos': 10, 'keypoints': [[4, 5, 6, 10, 11, 12], [1, 2, 3, 13, 14, 15]]} #values of metadata in dictionary when I print file_name['metadata']
I want the same output as above. Below is the Python code I use to produce the npz file.
#code sample
positions = [] # filling this list already works and its values are saved in the npz file, so that step is skipped here; the problem is the metadata key below
metadata = {
    'num_pos': 10,
    'keypoints': [[4, 5, 6, 10, 11, 12], [1, 2, 3, 13, 14, 15]]
}
positions = np.array(positions).astype(np.float32)
np.savez_compressed('file_name.npz', positions=positions, metadata=metadata)
With the above code I get an npz file that contains the values of positions but not the values of metadata. When I print file_name.keys(), the output is ['positions', 'metadata'], which is correct, but when I print file_name['metadata'] I get the following error:
ValueError: unsupported pickle protocol: 3
Any suggestions would be appreciated.
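A dictionary passed to np.savez_compressed is stored as a pickled object array, and pickle protocol 3 exists only in Python 3, so this error typically appears when a file written under Python 3 is read under Python 2. One way to sidestep pickling, sketched here under the assumption that the metadata only holds JSON-compatible values (numbers, strings, lists, dicts), is to store it as a JSON string. The placeholder positions array and the temporary path are made up for the example.

```python
import json
import os
import tempfile

import numpy as np

positions = np.zeros((10, 3), dtype=np.float32)  # placeholder data
metadata = {
    "num_pos": 10,
    "keypoints": [[4, 5, 6, 10, 11, 12], [1, 2, 3, 13, 14, 15]],
}

path = os.path.join(tempfile.mkdtemp(), "file_name.npz")

# Store the dictionary as a JSON string so that no pickling is involved.
np.savez_compressed(path, positions=positions, metadata=json.dumps(metadata))

with np.load(path) as data:
    keys = list(data.files)
    # The string comes back as a 0-d array; .item() extracts the Python str.
    loaded_metadata = json.loads(data["metadata"].item())

print(keys)
print(loaded_metadata)
```

If the dictionary has to stay pickled, reading the file with the same major Python version that wrote it, and with np.load(path, allow_pickle=True) followed by data["metadata"].item(), is the usual alternative.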

Printing top n distinct values of a list

I want to print the top 10 distinct elements from a list:
top=10
test=[1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
for i in range(0,top):
    if test[i]==1:
        top=top+1
    else:
        print(test[i])
It is printing:
2,3,4,5,6,7,8
I am expecting:
2,3,4,5,6,7,8,9,10,11
What am I missing?
Using numpy
import numpy as np
top=10
test=[1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
test=np.unique(np.array(test))
test[test!=1][:top]
Output
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Your code only executes the loop 10 times, because range(0, top) is evaluated once, before the loop starts; increasing top inside the loop has no effect. The first 3 iterations are spent skipping the 1s, so only the following 7 values (2 to 8) are printed, which is exactly what happened here.
If you want to print the top 10 distinct values, I recommend doing this:
# The unique helper is adapted from [remove duplicates in list](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists)
def unique(l):
    return sorted(set(l))  # sorted, because a set has no guaranteed order
def print_top_unique(List, top):
    ulist = unique(List)
    for i in range(0, top):
        print(ulist[i])
print_top_unique([1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], 10)
My solution:
test = [1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
uniqueList = sorted(set(test)) # creates a sorted list of unique values [1,2,3,4,5,6,7,8,9,10,11,12,13]
for num in range(0,11):
    if uniqueList[num] != 1: # skips one, since you wanted to start with two
        print(uniqueList[num])
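If the original order of the list matters, the duplicates can be removed without sorting. A minimal sketch, assuming that "top n" means the first n distinct values in list order; the helper name top_distinct and its skip parameter are made up for this example:

```python
from itertools import islice


def top_distinct(values, n, skip=()):
    """Return the first n distinct values in original order, ignoring anything in skip."""
    # dict.fromkeys keeps only the first occurrence of each key and preserves insertion order.
    distinct = dict.fromkeys(v for v in values if v not in skip)
    return list(islice(distinct, n))


test = [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
print(top_distinct(test, 10, skip={1}))  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Slicing with islice also means the function returns fewer items, rather than raising an IndexError, when the list holds fewer than n distinct values.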
