I have a dataframe named data1 with a column named 'E-E11' and another dataframe named Volx with a column 'EVOL'. I want to multiply the two columns, but it doesn't work: I get a KeyError 'E-E11'. All of the columns contain 332924 values.
I used this:
Volx = pd.read_csv('BCCdir1VOL.csv') #already floats in dataframe
Volx.drop(Volx.columns[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], inplace=True, axis=1) # have one column in my data frame
data1 = pd.read_csv('abaqusBCC1Dir.csv') #already floats in dataframe
data1.drop(data1.columns[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15]], inplace=True, axis=1) # have one column in my data frame
def getPower(data1, Multiplicationx, numOfCol):
    for i in range(numOfCol):
        Volx = 'EVOL' % (i+1)
        E11x = 'E-E11' % (i+1)
        Multiplicationx = 'E11x_V' % (i+1)
        data1[Multiplicationx] = data1[E11x]*Volx[Volx]
data1[Multiplicationx] = data1['E-E11']*Volx['EVOL']
Instead of getting a new column Multiplicationx that holds the product of the two columns, I get KeyError 'E-E11'. Can you please help me?
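It's kind of hard to tell what's going on, but I don't understand 'EVOL' % (i+1): the string has no format specifier such as %d, so that expression raises a TypeError instead of building a column name.
Try f-strings, and keep the column name in a variable that is separate from the Volx dataframe:
vol_col = f'EVOL{i+1}'
E11x = f'E-E11{i+1}'
Multiplicationx = f'E11x_V{i+1}'
data1[Multiplicationx] = data1[E11x] * Volx[vol_col]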
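The KeyError itself suggests that data1 has no column spelled exactly 'E-E11'. A common cause, and only an assumption here since the CSV files are not shown, is stray whitespace in the header row. A minimal sketch with made-up values (the leading space in the column name is hypothetical):

```python
import pandas as pd

# Hypothetical stand-ins for the two single-column frames in the question.
data1 = pd.DataFrame({' E-E11': [0.5, 1.5, 2.0]})  # note the leading space
Volx = pd.DataFrame({'EVOL': [2.0, 4.0, 6.0]})

# Inspect the real column names before indexing; repr() makes whitespace visible.
print([repr(name) for name in data1.columns])

# Strip surrounding whitespace so the names match what the code expects.
data1.columns = data1.columns.str.strip()
Volx.columns = Volx.columns.str.strip()

# Row-wise product; both frames share the same default integer index.
data1['E11x_V'] = data1['E-E11'] * Volx['EVOL']
print(data1)
```

If the printed names already match, the cause lies elsewhere, for example in which columns the drop calls leave behind.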
I'm new to Python and I'm analyzing some data. I'm using an extremely manual process to find the clusters. First I get the labels from the fitted model:
labels = optics_model.labels_[optics_model.ordering_]
Then I use np.argwhere to find the indices that have a given label:
cluster_0 = np.argwhere(labels == 0)
Then I print these indices, clean them up on another site, and use them to select the rows of the dataframe that belong to this cluster:
index_0 = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
cluster_0 = df.iloc[index_0]
Can someone help me automate this process?
After some looking and testing I made it work by adding a column with the labels to the dataframe:
df_copy = df.assign(labels=labels)
Then I found the largest cluster label:
max_label = max(labels)  # OPTICS marks noise as -1 and numbers clusters 0, 1, 2, ...
Then I made one empty dataframe per cluster (labels run from 0 to max_label, so there are max_label + 1 clusters):
cluster = {}
for i in range(max_label + 1):
    cluster[i] = pd.DataFrame()
Then I just copy the rows I want from the dataframe:
for i in range(max_label + 1):
    cluster[i] = df_copy.loc[df_copy['labels'] == i]
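The same split can probably be done in one pass with groupby. This is a sketch rather than the poster's method: the small dataframe and label list are made up, and it assumes labels is aligned row for row with df and that -1 marks noise points.

```python
import pandas as pd

# Hypothetical data: five rows and one cluster label per row (-1 marks noise).
df = pd.DataFrame({'x': [0.1, 0.2, 5.0, 5.1, 9.9]})
labels = [0, 0, 1, 1, -1]

df_copy = df.assign(labels=labels)

# groupby yields one (label, sub-dataframe) pair per distinct label,
# so there is no need to count the clusters first.
cluster = {
    label: group
    for label, group in df_copy.groupby('labels')
    if label != -1  # leave the noise points out
}

for label, group in cluster.items():
    print(label, group.index.tolist())
```

Each value in cluster keeps the original row index, so the rows can still be traced back to df.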
I have a list (in a dataframe) that looks like this:
oddnum = [1, 3, 5, 7, 9, 11, 23]
I want to create a new list that looks like this:
newlist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 23]
I want to test whether the distance between two consecutive numbers is 2 (if oddnum[index+1] - oddnum[index] == 2).
If the distance is 2, I want to add the number that follows oddnum[index] (that is, oddnum[index] + 1) to the new list.
If the distance is greater than 2, keep the list as is.
I keep getting a KeyError because (I think) [index+1] no longer exists once the loop reaches the end of the list. How do I do this?
To get past the error, one option is to use try and except. Here's my code:
oddnum = [1, 3, 5, 7, 9, 11, 23]
res = []  # The new list
for i in range(len(oddnum)):
    res.append(oddnum[i])  # Always append the current value
    try:  # oddnum[i+1] does not exist for the last element
        if oddnum[i] + 2 == oddnum[i+1]:
            res.append(oddnum[i]+1)  # Appends if the condition is met
    except IndexError:  # A plain list raises IndexError; a pandas Series would raise KeyError
        pass
print(res)
oddnum = [1, 3, 5, 7, 9, 11, 23]
new_list = []
for pos, num in enumerate(oddnum):
    new_list.append(num)
    try:
        if num-oddnum[pos+1] in [2, -2]:
            new_list.append(num+1)
    except IndexError:  # raised at the last element, where oddnum[pos+1] does not exist
        pass
print(new_list)
Use try/except to catch the exception raised at the last element and ignore it.
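As an alternative to catching the exception, the out-of-range lookup can be avoided altogether by pairing each element with its successor. A minimal sketch (the function name fill_gaps_of_two is made up for illustration):

```python
def fill_gaps_of_two(nums):
    """Return a copy of nums with n + 1 inserted wherever the next value is n + 2."""
    result = []
    # zip stops at the shorter sequence, so the last element is never paired
    # with a non-existent successor.
    for current, following in zip(nums, nums[1:]):
        result.append(current)
        if following - current == 2:
            result.append(current + 1)
    if nums:
        result.append(nums[-1])  # the last element has no successor to compare with
    return result


oddnum = [1, 3, 5, 7, 9, 11, 23]
print(fill_gaps_of_two(oddnum))
```

If the values live in a dataframe column, converting it with .tolist() first keeps the indexing positional.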
I have two dataframes, df_ts and df_cmexport. I am trying to get the index of each placement ID in df_cmexport for the placements in df_ts.
For an illustration of what I mean, see the linked Excel file.
Once I have the indices of those placement IDs as a list, I iterate through them with for j in list_pe_ts_1: to get values at index j, for example df_cmexport['p_start_year'][j].
For some reason my code below produces an empty list: print(list_pe_ts_1) returns [].
I think something is wrong with list_pe_ts_1 = df_cmexport.index[df_cmexport['Placement ID'] == pid_1].tolist(), as this returns an empty list of length 0.
I even tried list_pe_ts_1 = df_cmexport.loc[df_cmexport.isin([pid_1]).any(axis=1)].index, but it still gives an empty list.
Help is always appreciated :) Cheers to you all #stackoverflow
for i in range(0, len(df_ts)):
    pid_1 = df_ts['PLACEMENT ID'][i]
    print('for pid ', pid_1)
    list_pe_ts_1 = df_cmexport.index[df_cmexport['Placement ID'] == pid_1].tolist()
    print('len of list',len(list_pe_ts_1))
    ts_p_start_year_for_pid = df_ts['p_start_year'][i]
    ts_p_start_month_for_pid = df_ts['p_start_month'][i]
    ts_p_start_day_for_pid = df_ts['p_start_date'][i]
    print('\np_start_full_date_ts for :', pid_1, 'y:', ts_p_start_year_for_pid, 'm:', ts_p_start_month_for_pid,
          'd:', ts_p_start_day_for_pid)
    # j=list_pe_ts
    print(list_pe_ts_1)
    for j in list_pe_ts_1:
        # print(j)
        export_p_start_year_for_pid = df_cmexport['p_start_year'][j]
        export_p_start_month_for_pid = df_cmexport['p_start_month'][j]
        export_p_start_day_for_pid = df_cmexport['p_start_date'][j]
        print('\np_start_full_date_export for ', pid_1, "at row(", j, ") :", export_p_start_year_for_pid,
              export_p_start_month_for_pid, export_p_start_day_for_pid)
        if (ts_p_start_year_for_pid == export_p_start_year_for_pid) and (
                ts_p_start_month_for_pid == export_p_start_month_for_pid) and (
                ts_p_start_day_for_pid == export_p_start_day_for_pid):
            pids_p_1.add(pid_1)
            # print('pass',pids_p_1)
            # print(export_p_end_year_for_pid)
        else:
            pids_f_1.add(pid_1)
            # print("mismatch in placement end date for pid ", pids)
            # print("pids list ",pids)
            # print('fail',pids_f_1)
With the snippet below you can get a list of the matching index values from the second dataframe.
import pandas as pd
df_ts = pd.DataFrame(data = {'index in df':[0,1,2,3,4,5,6,7,8,9,10,11,12],
"pid":[1,1,2,2,3,3,3,4,6,8,8,9,9],
})
df_cmexport = pd.DataFrame(data = {'index in df':[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
"pid":[1,1,1,2,3,3,3,3,3,4,4,4,5,5,6,7,8,8,9,9,9],
})
Create a new dataframe by merging the two:
result = pd.merge(df_ts, df_cmexport, left_on=["pid"], right_on=["pid"], how='left', indicator='True', sort=True)
Then identify the unique values in the "index in df_y" column:
index_list = result["index in df_y"].unique()
The result you get:
index_list
Out[9]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 16, 17, 18, 19,
20], dtype=int64)
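A comparison that always comes back empty is often a sign that the two ID columns do not hold the same kind of value, for example integers in one frame and strings (possibly with stray spaces) in the other. That is only a guess about the data in the question, which is not shown. A sketch under that assumption, with made-up IDs and a hypothetical helper normalize_ids:

```python
import pandas as pd

# Hypothetical frames: integer IDs on one side, string IDs with spaces on the other.
df_ts = pd.DataFrame({'PLACEMENT ID': [101, 102]})
df_cmexport = pd.DataFrame({'Placement ID': ['101', ' 102 ', '103', '101']})

# Checking the dtypes first shows whether the columns can match at all.
print(df_ts['PLACEMENT ID'].dtype, df_cmexport['Placement ID'].dtype)


def normalize_ids(series):
    """Convert IDs to stripped strings so both frames use one representation."""
    return series.astype(str).str.strip()


ts_ids = normalize_ids(df_ts['PLACEMENT ID'])
export_ids = normalize_ids(df_cmexport['Placement ID'])

# Index labels of the df_cmexport rows whose ID also appears in df_ts.
matching_index = df_cmexport.index[export_ids.isin(ts_ids)].tolist()
print(matching_index)
```

If the dtypes already agree, printing a few raw values with repr() is a quick way to look for other differences.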
I'm trying to create an npz file whose data keys are positions and metadata. The output should look like this:
#sample output
['positions', 'metadata'] #data keys when I print file_name.keys()
{'num_pos': 10, 'keypoints': [[4, 5, 6, 10, 11, 12], [1, 2, 3, 13, 14, 15]]} #values of metadata in dictionary when I print file_name['metadata']
I want the same output as above. Below is my Python code to produce the required npz file.
#code sample
positions = []  # this step works and the values are saved in the npz file, so I'm skipping it; my problem is with the metadata key below
metadata = {
    'num_pos': 10,
    'keypoints': [[4, 5, 6, 10, 11, 12], [1, 2, 3, 13, 14, 15]]
}
positions = np.array(positions).astype(np.float32)
np.savez_compressed('file_name.npz', positions=positions, metadata=metadata)
With the code above I get an npz file that contains the values of positions but not the values of metadata. When I print file_name.keys() the output is ['positions', 'metadata'], which is fine, but when I print file_name['metadata'] I get the following error:
ValueError: unsupported pickle protocol: 3
Looking for valuable suggestions
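One possible workaround, offered as a sketch and not as a confirmed fix: NumPy stores a dict inside an npz file as a pickled object array, and an "unsupported pickle protocol" error typically means the file is being read by an older Python than the one that wrote it. Assuming metadata only holds JSON-serializable values, it can be saved as a JSON string so that no pickling is involved. The temporary path and the placeholder positions array are hypothetical.

```python
import json
import os
import tempfile

import numpy as np

positions = np.zeros((10, 3), dtype=np.float32)  # placeholder for the real data
metadata = {
    'num_pos': 10,
    'keypoints': [[4, 5, 6, 10, 11, 12], [1, 2, 3, 13, 14, 15]],
}

path = os.path.join(tempfile.mkdtemp(), 'file_name.npz')

# A JSON string is saved as a plain unicode array, not as a pickled object.
np.savez_compressed(path, positions=positions, metadata=json.dumps(metadata))

with np.load(path) as archive:
    archive_keys = list(archive.files)
    # .item() turns the 0-d string array back into a Python str for json.loads.
    loaded_metadata = json.loads(archive['metadata'].item())

print(archive_keys)
print(loaded_metadata)
```

The reader then needs json.loads instead of direct dictionary access, which is the trade-off for avoiding pickle.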
I want to print the top 10 distinct elements from a list:
top=10
test=[1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
for i in range(0,top):
    if test[i]==1:
        top=top+1
    else:
        print(test[i])
It is printing:
2,3,4,5,6,7,8
I am expecting:
2,3,4,5,6,7,8,9,10,11
What am I missing?
Using numpy
import numpy as np
top=10
test=[1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
test=np.unique(np.array(test))
test[test!=1][:top]
Output
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Your code evaluates range(0, top) only once, so the loop runs exactly 10 times no matter how top changes afterwards. The first 3 iterations are spent skipping the 1s, so only the following 7 values are printed, which is exactly what happened here.
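The fixed iteration count can be checked directly. A small sketch (the iterations counter is added only for illustration):

```python
top = 10
iterations = 0
for i in range(0, top):  # range(0, 10) is built once, before the first iteration
    top = top + 1        # rebinding top does not change the existing range object
    iterations += 1
print(iterations, top)
```

The loop body runs 10 times even though top ends up at 20.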
If you want to print the top 10 distinct values, I recommend doing this:
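# The code of unique is adapted from [remove duplicates in list](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists)
def unique(l):
    # dict.fromkeys keeps the first occurrence of each value in order; a plain set does not guarantee order
    return list(dict.fromkeys(l))
def print_top_unique(values, top):
    ulist = unique(values)
    # min() guards against lists with fewer than top distinct values
    for i in range(0, min(top, len(ulist))):
        print(ulist[i])
print_top_unique([1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], 10)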
My solution:
test = [1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
uniqueList = sorted(set(test))  # sorted list of unique values [1,2,3,4,5,6,7,8,9,10,11,12,13]
for num in range(0,11):
    if uniqueList[num] != 1:  # skips one, since you wanted to start with two
        print(uniqueList[num])
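The answers above either rely on sorted input or hard-code the number of loop iterations. A sketch that keeps the original order, skips chosen values, and stops after a given count could look like this (top_distinct and its skip parameter are names made up for illustration):

```python
from itertools import islice


def top_distinct(values, count, skip=()):
    """Return the first `count` distinct items of `values`, ignoring anything in `skip`."""
    seen = set(skip)  # treat skipped values as already seen

    def generate():
        for value in values:
            if value not in seen:
                seen.add(value)
                yield value

    # islice stops as soon as `count` items have been produced.
    return list(islice(generate(), count))


test = [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
print(top_distinct(test, 10, skip={1}))
```

Because the generator is consumed lazily, the rest of the list is not scanned once enough values have been found.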