Some background: the MB column holds only one of two values (M or B), while the INDENT column holds integers. The numbers don't necessarily follow a pattern, but when they increase, they increase by exactly one. They can decrease by any amount. The rows are sorted in a specific order.
The goal here is to drop rows whose INDENT value is higher than that of a preceding row that contains a "B" in the MB column. Dropping should stop only once a row's INDENT value is equal to or less than that of the row that contains the "B". Below is a chart demonstrating what rows should be dropped.
Sample data:
import pandas as pd
d = {'INDENT': {'0': 0, '1': 1, '2': 1, '3': 2, '4': 3, '5': 3, '6': 4, '7': 2, '8': 3}, 'MB': {'0': 'M', '1': 'B', '2': 'M', '3': 'B', '4': 'B', '5': 'M', '6': 'M', '7': 'B', '8': 'M'}}
df = pd.DataFrame(d)
Code:
My current code has a problem: I can't drop the rows found by the inner for loop, since it isn't using iterrows. I am aware of dropping based on a conditional expression, but I am unsure how to nest this correctly.
for index, row in df.iterrows():
    for row in range(index-1,0,-1):
        if df.loc[row].at["INDENT"] <= df.loc[index].at["INDENT"]-1:
            if df.loc[row].at["MB"]=="B":
                df.drop(df.index[index], inplace=True)
                break
            else:
                break
Edit 1:
This problem can be represented graphically. This is effectively scanning a hierarchy for an attribute and deleting anything below it. The example I provided is bad since all rows that need to be dropped are simply indent 3 or higher but this can happen at any indent level.
Edit 2: We are going to cheat on this problem a bit. I won't have to generate an edge graph from scratch since I have the prerequisite data to do this. I have an updated table and sample data.
Updated Sample Data
import pandas as pd
d = {
'INDENT': {'0': 0, '1': 1, '2': 1, '3': 2, '4': 3, '5': 3, '6': 4, '7': 2, '8': 3},
'MB': {'0': 'M', '1': 'B', '2': 'M', '3': 'B', '4': 'B', '5': 'M', '6': 'M', '7': 'B', '8': 'M'},
'a': {'0': -1, '1': 5000, '2': 5000, '3': 5322, '4': 5449, '5': 5449, '6': 5621, '7': 5322, '8': 4666},
'c': {'0': 5000, '1': 5222, '2': 5322, '3': 5449, '4': 5923, '5': 5621, '6': 5109, '7': 4666, '8': 5219}
}
df = pd.DataFrame(d)
Updated Code
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
d = {
'INDENT': {'0': 0, '1': 1, '2': 1, '3': 2, '4': 3, '5': 3, '6': 4, '7': 2, '8': 3},
'MB': {'0': 'M', '1': 'B', '2': 'M', '3': 'B', '4': 'B', '5': 'M', '6': 'M', '7': 'B', '8': 'M'},
'a': {'0': -1, '1': 5000, '2': 5000, '3': 5322, '4': 5449, '5': 5449, '6': 5621, '7': 5322, '8': 4666},
'c': {'0': 5000, '1': 5222, '2': 5322, '3': 5449, '4': 5923, '5': 5621, '6': 5109, '7': 4666, '8': 5219}
}
df = pd.DataFrame(d)
G = nx.Graph()
G = nx.from_pandas_edgelist(df, 'a', 'c', create_using=nx.DiGraph())
T = nx.dfs_tree(G, source=-1).reverse()
print([x for x in T])
nx.draw(G, with_labels=True)
plt.show()
I am unsure how to use the edges from here to identify the rows that need to be dropped from the dataframe.
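For what it's worth, the rule as described can also be applied without a graph, in a single pass over the rows in their existing order. The following is only a minimal sketch under that assumption (rows already sorted as in the sample data); `mark_rows_below_b` is a hypothetical helper name, not part of pandas or networkx.

```python
def mark_rows_below_b(indents, flags):
    """Return one boolean per row: True if the row sits below a 'B' row."""
    drop = []
    b_indent = None  # indent of the 'B' row whose sub-rows are being skipped
    for indent, flag in zip(indents, flags):
        if b_indent is not None and indent > b_indent:
            # Still deeper than the active 'B' row, so this row is dropped.
            drop.append(True)
            continue
        # This row is kept; it starts a new skipped block only if it is a 'B' row.
        b_indent = indent if flag == "B" else None
        drop.append(False)
    return drop


# INDENT and MB values copied from the sample data, in row order.
indents = [0, 1, 1, 2, 3, 3, 4, 2, 3]
flags = ["M", "B", "M", "B", "B", "M", "M", "B", "M"]

mask = mark_rows_below_b(indents, flags)
kept = [i for i, dropped in enumerate(mask) if not dropped]
print(kept)  # [0, 1, 2, 3, 7]
```

With the DataFrame from the question, the same mask could be built from df["INDENT"] and df["MB"], and the kept rows selected with df[[not m for m in mask]].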
Not an answer, but too long for a comment:
import pandas as pd
d = {'INDENT': {'0': 0, '1': 1, '2': 1, '3': 2, '4': 3, '5': 3, '6': 4, '7': 2, '8': 3}, 'MB': {'0': 'M', '1': 'B', '2': 'M', '3': 'B', '4': 'B', '5': 'M', '6': 'M', '7': 'B', '8': 'M'}}
df = pd.DataFrame(d)
df['i'] = df['INDENT']+1
df = df.reset_index()
df = df.merge(df[['INDENT', 'index', 'MB']].rename(columns={'INDENT':'target', 'index':'ix', 'MB': 'MBt'}), left_on=['i'], right_on=['target'], how='left')
import networkx as nx
G = nx.Graph()
G = nx.from_pandas_edgelist(df, 'index', 'ix', create_using=nx.DiGraph())
T = nx.dfs_tree(G, source='0').reverse()
print([x for x in T])
nx.draw(G, with_labels=True)
This demonstrates the problem. What you actually want is to apply graph theory, and the networkx library can help you with that. A first step is to construct the connections between the nodes, as I did in the example above. From there you can try to apply logic that filters out the edges you don't want.
I'm not sure I fully understand the question you're asking, but this is my attempt. It selects a row only if Index >= indent and MB == 'B'. In this case I'm selecting the subset we want instead of dropping the subset we don't.
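One possible way to turn the edges into a list of rows to drop is to collect every node reachable from a "B" node. The sketch below is only an illustration and makes some assumptions: it uses the a (parent) and c (child) columns from the question's Edit 2 as plain lists, and it walks the edges with a small hand-written `descendants` helper so that it runs without extra libraries. On a networkx DiGraph, nx.descendants performs the same kind of traversal.

```python
from collections import defaultdict

# Parent ('a'), child ('c') and MB values copied from the question's Edit 2.
parents = [-1, 5000, 5000, 5322, 5449, 5449, 5621, 5322, 4666]
children = [5000, 5222, 5322, 5449, 5923, 5621, 5109, 4666, 5219]
flags = ["M", "B", "M", "B", "B", "M", "M", "B", "M"]

# Directed adjacency list: parent -> list of direct children.
adjacency = defaultdict(list)
for parent, child in zip(parents, children):
    adjacency[parent].append(child)


def descendants(node):
    """Collect every node reachable from `node`, excluding `node` itself."""
    found = set()
    stack = list(adjacency.get(node, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(adjacency.get(current, []))
    return found


# Everything below any 'B' node is marked for removal.
to_drop = set()
for child, flag in zip(children, flags):
    if flag == "B":
        to_drop |= descendants(child)

kept_rows = [i for i, child in enumerate(children) if child not in to_drop]
print(kept_rows)  # [0, 1, 2, 3, 7]
```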
import numpy as np
import pandas as pd
x=np.transpose(np.array([[0,1,2,3,4,5,6,7,8],[0,1,1,2,3,3,4,2,3],['M','B','M','B','B','M','M','B','M']]))
df=pd.DataFrame(x,columns=['Index','indent','MB'])
df1=df[(df['Index']>=df['indent']) & (df['MB']=='B')]
print(df1)
I have a dict that I want to iterate through to find all values that contain a given key. My output would be a separate dict that, for each key, contains the numbers from every matching value, without the key itself.
dict_in = {'6': ['2,9,8,10'], '1': ['3,5,8,9,10,12'], '4': ['2,5,7,8,9'], '2': ['3,4,7,6,13'], '12': ['1,7,5,9'], '3': ['9,11,10,1,2,13'], '10': ['1,3,6,11'], '5': ['4,1,7,11,12'], '13': ['2,3'], '8': ['1,6,4,11'], '7': ['5,2,4,9,12'], '11': ['3,5,10,8'], '9': ['12,1,3,6,4,7']}
so the output would be like this:
{'6':['3,4,7,13,1,3,11,1,4,11,12,1,3,4,7'] , '4':['3,4,6,13,1,11,12,1,6,11,12,12,1,3,6'],'13': ['4,7,6,9,11,10,1']}
I am a beginner and I do not even know where to start. Would it be easier to convert it to a list of lists?
There is one extra challenge in your problem that I will leave for you, or some other good Samaritan, to solve; this is just a nudge in the right direction. The value for each key is actually a list holding a single string. If the values were actual integers, the problem would not be too complicated. Also note that the expected output you wrote does not match your own requirement: you missed a few values.
Assuming integers instead of strings, I can show you one approach that you can hopefully understand as a beginner:
dict_in = {'6': [2,9,8,10], '1': [3,5,8,9,10,12], '4': [2,5,7,8,9], '2': [3,4,7,6,13], '12': [1,7,5,9], '3': [9,11,10,1,2,13], '10': [1,3,6,11], '5': [4,1,7,11,12], '13': [2,3], '8' :[1,6,4,11], '7': [5,2,4,9,12], '11': [3,5,10,8], '9': [12,1,3,6,4,7]}
dict_out = {}
for key in dict_in:
    if key == "6" or key == "4" or key == "13":
        # Look at every list; keep the ones that contain the current key.
        for k,v in dict_in.items():
            for y in v:
                # Copy each number from a matching list, except the key itself.
                if int(key) in v and y != int(key):
                    dict_out.setdefault(key, []).append(y)
Output:
{'6': [3, 4, 7, 13, 1, 3, 11, 1, 4, 11, 12, 1, 3, 4, 7], '4': [3, 7, 6, 13, 1, 7, 11, 12, 1, 6, 11, 5, 2, 9, 12, 12, 1, 3, 6, 7], '13': [3, 4, 7, 6, 9, 11, 10, 1, 2]}
Last note: I have no clue why you wanted the only keys left to be 6, 4 and 13.
In any case, do not consider this as a full answer.
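The remaining string-to-integer step could look something like the following. This is a minimal sketch, assuming every value is a list of comma-separated number strings as in the question; only three of the question's entries are reproduced here to keep it short, and `parsed` is just an illustrative name.

```python
# Subset of the question's dict_in: each value is a list holding one string.
dict_in = {'6': ['2,9,8,10'], '4': ['2,5,7,8,9'], '13': ['2,3']}

# Split every string on commas and convert the pieces to integers.
parsed = {
    key: [int(number) for text in values for number in text.split(",")]
    for key, values in dict_in.items()
}
print(parsed)  # {'6': [2, 9, 8, 10], '4': [2, 5, 7, 8, 9], '13': [2, 3]}
```

The resulting dict has the same shape as the integer version used above, so the same loop can be applied to it.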
Not sure what you mean, so I just created something that covers most of it. Change the condition to whatever you need.
dict_in = {'6': ['2,9,8,10'], '1': ['3,5,8,9,10,12'], '4': ['2,5,7,8,9'], '2': ['3,4,7,6,13'], '12': ['1,7,5,9'], '3': ['9,11,10,1,2,13'], '10': ['1,3,6,11'], '5': ['4,1,7,11,12'], '13': ['2,3'], '8' :['1,6,4,11'], '7': ['5,2,4,9,12'], '11': ['3,5,10,8'], '9': ['12,1,3,6,4,7']}
dict_out = {}
for key, values in dict_in.items():
    for item in values:
        if True:  # your condition
            if key in dict_out.keys():
                dict_out[key].append(item)
            else:
                dict_out[key] = [item]
print(dict_out)
I think the title is correct. If not, I apologize.
I have aList defined as
[24, 19, 18, 15, 15, 23, 18, 15, 18, 15]
and aDict defined as
{'1': 18, '2': 76, '3': 0, '4': 13, '5': 4, '6': 30, '7': 25, '8': 21}
and a masterDict defined (initialized with 0s) as
{'1': 0, '2': 0, '3': 0, '4': 0, '5': 0, '6': 0, '7': 0, '8': 0}
How can I check whether each element in aList matches a value in aDict and, if it does, increment the corresponding key in masterDict by 1?
The code I'm currently using is
for x in aList:
    for k, v in aDict.iteritems():
        if x == v:
            masterDict[k] = +1
However, this is returning a masterDict that looks like this
{'1': 1, '2': 0, '3': 0, '4': 0, '5': 0, '6': 0, '7': 0, '8': 0}
aList contains three occurrences of the element 18 and it matches a value in aDict. I'm looking to increment the corresponding key in masterDict three times. However, it's only incrementing one time.
The output I'm looking to produce is
{'1': 3, '2': 0, '3': 0, '4': 0, '5': 0, '6': 0, '7': 0, '8': 0}
It is because of a typo in your code: it should be masterDict[k] += 1
instead of masterDict[k] = +1, which assigns the value +1 on every match rather than incrementing.
After the change, the output is: {'1': 3, '2': 0, '3': 0, '4': 0, '5': 0, '6': 0, '7': 0, '8': 0}
You can just use for k in dct and, if the value for that key matches the item from lst, increase mstr[k] by 1:
mstr = {'1': 0, '2': 0, '3': 0, '4': 0, '5': 0, '6': 0, '7': 0, '8': 0}
dct = {'1': 18, '2': 76, '3': 0, '4': 13, '5': 4, '6': 30, '7': 25, '8': 21}
lst = [24, 19, 18, 15, 15, 23, 18, 15, 18, 15]
for i in lst:
    for k in dct:
        if dct[k] == i:
            mstr[k] += 1
print(mstr)
# {'1': 3, '2': 0, '3': 0, '4': 0, '5': 0, '6': 0, '7': 0, '8': 0}
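As an alternative to the nested loops, the occurrences could be counted once and then looked up per key. This is only a sketch of that idea using collections.Counter from the standard library; it rebuilds masterDict from scratch rather than updating an existing one, which is an assumption about what is wanted.

```python
from collections import Counter

aList = [24, 19, 18, 15, 15, 23, 18, 15, 18, 15]
aDict = {'1': 18, '2': 76, '3': 0, '4': 13, '5': 4, '6': 30, '7': 25, '8': 21}

# Count how often each number appears in aList (missing numbers count as 0).
occurrences = Counter(aList)

# For every key, look up how often its value occurred.
masterDict = {key: occurrences[value] for key, value in aDict.items()}
print(masterDict)
# {'1': 3, '2': 0, '3': 0, '4': 0, '5': 0, '6': 0, '7': 0, '8': 0}
```

This walks aList once instead of once per key, which may matter for longer lists.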
Let's say I have a dictionary like this:
{'1': 2, '0': 0, '3': 4, '2': 4, '5': 1, '4': 1, '7': 0, '6': 0, '9': 0, '8': 0}
I want to remove all items with a value of zero,
so that it comes out like this:
{'1': 2, '3': 4, '2': 4, '5': 1, '4': 1}
Use a dict-comprehension:
In [94]: dic={'1': 2, '0': 0, '3': 4, '2': 4, '5': 1, '4': 1, '7': 0, '6': 0, '9': 0, '8': 0}
In [95]: {x:y for x,y in dic.items() if y!=0}
Out[95]: {'1': 2, '2': 4, '3': 4, '4': 1, '5': 1}
Use a dictionary comprehension:
{k: v for k, v in d.items() if v}
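The two comprehensions above are not quite equivalent: `if y != 0` removes only zeros, while `if v` removes every falsy value. A small sketch of the difference, using made-up sample data (the keys and values below are hypothetical, not from the question):

```python
# Hypothetical data mixing zeros with other falsy values.
d = {'a': 0, 'b': 2, 'c': None, 'd': '', 'e': 0.0}

# Explicit comparison: only values equal to zero are removed.
only_nonzero = {k: v for k, v in d.items() if v != 0}

# Truthiness test: None and the empty string are removed as well.
only_truthy = {k: v for k, v in d.items() if v}

print(only_nonzero)  # {'b': 2, 'c': None, 'd': ''}
print(only_truthy)   # {'b': 2}
```

For a dictionary that holds only integers, as in the question, both give the same result.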
I have this dictionary:
d= {'1': { '2': 1, '3': 0, '4': 0, '5': 1, '6': 29 }
,'2': {'1': 13, '3': 1, '4': 0, '5': 21, '6': 0 }
,'3': {'1': 0, '2': 0, '4': 1, '5': 0, '6': 1 }
,'4': {'1': 1, '2': 17, '3': 1, '5': 2, '6': 0 }
,'5': {'1': 39, '2': 1, '3': 0, '4': 0, '6': 14 }
,'6': {'1': 0, '2': 0, '3': 43, '4': 1, '5': 0 }
}
I want to write a function that returns the column where all the values are <2 (less than 2).
So far I have turned the dictionary into a list and then tried a lot of things that didn't work. I know that the answer is column number 4.
This was my latest attempt:
def findFirstRead(overlaps):
    e= [[d[str(i)].get(str(j), '-') for j in range(1, 7)] for i in range(1, 7)]
    nested_list = e
    for i in map(itemgetter(x),nested_list):
        if i<2:
            return x+1
        else:
            continue
...and it was very wrong
The following set and list comprehensions list the columns whose maximum value is below 2 (an entry missing from a row counts as -1):
columns = {c for row in d.values() for c in row}
[c for c in columns if max(v.get(c, -1) for v in d.values()) < 2]
This returns ['4'].
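Since the question asks for a function, the same check could be wrapped up roughly as follows. This is a sketch, not a definitive implementation: `columns_below` and its `limit` parameter are hypothetical names, and entries missing from a row are simply skipped.

```python
d = {'1': {'2': 1, '3': 0, '4': 0, '5': 1, '6': 29},
     '2': {'1': 13, '3': 1, '4': 0, '5': 21, '6': 0},
     '3': {'1': 0, '2': 0, '4': 1, '5': 0, '6': 1},
     '4': {'1': 1, '2': 17, '3': 1, '5': 2, '6': 0},
     '5': {'1': 39, '2': 1, '3': 0, '4': 0, '6': 14},
     '6': {'1': 0, '2': 0, '3': 43, '4': 1, '5': 0}}


def columns_below(table, limit):
    """Return the sorted column keys whose values are all below `limit`."""
    # Every key that appears in any inner dict is treated as a column.
    columns = {column for row in table.values() for column in row}
    return sorted(
        column for column in columns
        # Rows that have no entry for this column are ignored.
        if all(row[column] < limit for row in table.values() if column in row)
    )


print(columns_below(d, 2))  # ['4']
```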