Iterating over two pandas lists - python

I'm working on cleaning up some of my code to make it a bit more Pythonic, and I'm wondering if the code below could be written more cleanly with something like an itertools or pandas method. The code works, but I'd like to remove the double for-loop and consolidate it for performance reasons.
Ultimately, I'm working with a list of indices that index into a pandas column.
def foo(dataset):
    api_reshaped = pd.DataFrame(columns=['foo', 'bar'])
    k = 0
    for index, _ in dataset.iterrows():
        for key in dataset.iloc[index][0][0]:
            api_reshaped.loc[k, 'foo'] = key
            api_reshaped.loc[k, 'bar'] = dataset.iloc[index][0][0][key]
            k += 1
    return api_reshaped
Below is the expected input/output from this function:
foo_input = pd.DataFrame({
    'batch_data': [{'foo_query': [{'bar_query': 'data'}]}],
    'query_spell': ['foo']
})
print(foo(foo_input))
# expected_output = pd.DataFrame({
#     'foo': 'foo_query',
#     'bar': [{'bar_query': 'data'}]
# })
Many thanks!

You can use a list comprehension with transpose:
# your input data
foo_input = pd.DataFrame({
    'batch_data': [{'foo_query': [{'bar_query': 'data'}]}],
    'query_spell': ['foo']
})
# use a list comprehension with transpose
df = pd.DataFrame([item for item in foo_input['batch_data']]).T.reset_index()
# rename your columns
df.columns = ['Foo', 'Bar']
         Foo                      Bar
0  foo_query  [{'bar_query': 'data'}]
You can use applymap with a lambda function if you want to remove the wrapping list and keep just the dict.
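A minimal sketch of that idea, starting from the reshaped frame above (the isinstance/length guard is my own addition so non-list cells pass through untouched; on pandas 2.1+ the same method is also available as DataFrame.map):

```python
import pandas as pd

df = pd.DataFrame({'Foo': ['foo_query'], 'Bar': [[{'bar_query': 'data'}]]})
# unwrap single-element lists so each cell holds the dict itself
df = df.applymap(lambda x: x[0] if isinstance(x, list) and len(x) == 1 else x)
print(df.loc[0, 'Bar'])  # {'bar_query': 'data'}
```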

Related

Python np.where, variable as array index, tuple

I want to search for a value in a 2D array and get the value of its corresponding "pair".
In this example I want to search for 'd' and get 14.
I tried with np.where with no success and ended up with this clumsy code; does anyone have a smarter solution?
import numpy as np

ar = [[11,'a'],[12,'b'],[13,'c'],[14,'d']]
arr = np.array(ar)
x = np.where(arr == 'd')
print(x)
print("x[0]:" + str(x[0]))
print("x[1]:" + str(x[1]))
a = str(x[0]).replace("[", "")
a = a.replace("]", "")
a = int(a)
print(a)
b = str(x[1]).replace("[", "")
b = b.replace("]", "")
b = int(b) - 1
print(b)
print(ar[a][b])
# got 14
So you want to lookup a key and get a value?
It feels like you need to use dict!
>>> ar=[[11,'a'],[12,'b'],[13,'c'],[14,'d']]
>>> d = dict([(k,v) for v,k in ar])
>>> d
{'a': 11, 'b': 12, 'c': 13, 'd': 14}
>>> d['d']
14
Use a dict; simple and straightforward:
dct = {k:v for v,k in ar}
dct['d']
If you are hell bent on using np.where, then you can use this:
import numpy as np
ar = np.array([[11,'a'],[12,'b'],[13,'c'],[14,'d']])
i = np.where(ar[:,1] == 'd')[0][0]
result = ar[i, 0]
I didn't know about np.where! Its docstring mentions using nonzero directly, so here's a snippet that uses it to print the rows that match your requirement. Note I add extra rows, including another 'd', to show it works in the general case where multiple rows match the condition:
ar=[[11,'a'],[12,'b'],[13,'c'],[14,'d'],[15,'e'],[16,'d']]
arr = np.array(ar)
rows = arr[(arr=='d').nonzero()[0], :]
# array([['14', 'd'],
# ['16', 'd']], dtype='<U21')
This works because nonzero (or where) returns a tuple of row/column indexes of the matches. So we just use the first entry in the tuple (an array of row indexes) to index the array row-wise and ask NumPy for all columns (:). This makes the code a bit fragile if you move to 3D or higher dimensions, so beware.
This assumes you really do intend to use NumPy! A dict is better for many reasons.
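To make that index tuple concrete, here's what nonzero returns for the original four-row array (note the array is all strings, since NumPy upcasts the mixed list):

```python
import numpy as np

arr = np.array([[11, 'a'], [12, 'b'], [13, 'c'], [14, 'd']])
rows, cols = (arr == 'd').nonzero()  # (row indexes, column indexes) of matches
print(rows, cols)        # [3] [1]
print(arr[rows[0], 0])   # '14' -- a string, not an int
```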

remove empty dataframe from list and drop corresponding name in second list

I have two lists, where the first one is a list of strings called names, generated from the names of the corresponding csv files.
names = ['ID1','ID2','ID3']
I have loaded the csv files into individual pandas dataframes and then done some preprocessing which leaves me with a list of lists, where each element is the data of each dataframe:
dfs = [['car','fast','blue'],[],['red','bike','slow']]
As you can see it can happen that after preprocessing a dataframe could be empty, which leads to an empty list in dfs.
I would like to remove that element from the list and return its index. So far I have tried this, but I get no index when printing k.
k = [i for i,x in enumerate(dfs) if not x]
The reason I need this index is, so I can then look at removing the corresponding index element in list names.
The end results would look a bit like this:
names = ['ID1','ID3']
dfs = [['car','fast','blue'],['red','bike','slow']]
This way I can then save each individual dataframe as a csv file:
for df, name in zip(dfs, names):
    df.to_csv(name + '_.csv', index=False)
EDIT: I MADE A MISTAKE: The list of lists called dfs needs changing from [''] to []
You can use the built-in any() function:
k = [i for i, x in enumerate(dfs) if not any(x)]
The reason your
k = [i for i, x in enumerate(dfs) if not x]
doesn't work is that, regardless of what is in a list, as long as the list is not empty, its truthy value is True.
The any() function takes an iterable and returns whether any of its elements is truthy. If there is no such element, it returns False. The truthy value of an empty string, '', is False.
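The difference is easy to check directly:

```python
# a non-empty list is truthy even when its only element is falsy
assert bool(['']) is True
assert any(['']) is False  # no truthy element inside
# an empty list is falsy either way
assert bool([]) is False
assert any([]) is False
```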
EDIT: The question got edited, here is my updated answer:
You can try creating new lists:
names = ['ID1','ID2','ID3']
dfs = [['car','fast','blue'],[],['red','bike','slow']]
new_names = list()
new_dfs = list()
for i, x in enumerate(dfs):
    if x:
        new_names.append(names[i])
        new_dfs.append(x)
print(new_names)
print(new_dfs)
Output:
['ID1', 'ID3']
[['car', 'fast', 'blue'], ['red', 'bike', 'slow']]
If it doesn't work, try adding a print(x) to the loop to see what is going on:
names = ['ID1','ID2','ID3']
dfs = [['car','fast','blue'],[],['red','bike','slow']]
new_names = list()
new_dfs = list()
for i, x in enumerate(dfs):
    print(x)
    if x:
        new_names.append(names[i])
        new_dfs.append(x)
Since you are already using enumerate, you do not have to loop again.
Hope this solves your problem:
names = ['ID1', 'ID2', 'ID3']
dfs = [['car', 'fast', 'blue'], [''], ['red', 'bike', 'slow']]
for index, i in enumerate(dfs):
    if len(i) == 1 and '' in i:
        # deleting while iterating skips the next element,
        # which is acceptable here since only one match is expected
        del dfs[index]
        del names[index]
print(names)
print(dfs)
# Output
# ['ID1', 'ID3']
# [['car', 'fast', 'blue'], ['red', 'bike', 'slow']]
I think the issue is because of [''].
l = ['']
len(l)
gives 1 as output. Hence,
not l
gives False.
If you are sure it will be [''] only, then try
dfs = [['car','fast','blue'],[''],['red','bike','slow']]
k = [i for i,x in enumerate(dfs) if len(x)==1 and x[0]=='']
This gives [1] as output.
Or you can try with any(x).
Looking at the data presented, I would do the following:
Step 1: Check if the list has any values. If it does, if df will be True.
Step 2: Once you have the list, create a dataframe and write to csv.
The code is as shown below:
import pandas as pd

names = ['ID1','ID2','ID3']
dfs = [['car','fast','blue'],[],['red','bike','slow']]
dfx = {names[i]: df for i, df in enumerate(dfs) if df}
for name, val in dfx.items():
    df = pd.DataFrame({name: val})
    df.to_csv(name + '_.csv', index=False)

Apply function per group of values of a key in list of dicts

Let's suppose I have this:
my_list = [{'id': '1', 'value': '1'},
           {'id': '1', 'value': '8'},
           {'id': '2', 'value': '2'},
           {'id': '2', 'value': '3'},
           {'id': '2', 'value': '5'},
           ]
and I want to apply a function (e.g. shuffle) to each group of values separately, grouped by the key id.
So I would like to have this for example:
my_list = [{'id': '1', 'value': '1'},
           {'id': '1', 'value': '8'},
           {'id': '2', 'value': '3'},
           {'id': '2', 'value': '5'},
           {'id': '2', 'value': '2'},
           ]
Therefore I do not want anything to change between the different groups of values (e.g. id=1, 2, etc.), only within each one separately.
If your list is already sorted by 'id', use groupby directly; otherwise sort it by 'id' first and then use groupby:
from itertools import groupby
import random
my_list = [{'id': '1', 'value': '1'},
           {'id': '1', 'value': '8'},
           {'id': '2', 'value': '2'},
           {'id': '2', 'value': '3'},
           {'id': '2', 'value': '5'}]
res = []
for k, g in groupby(my_list, lambda x: x['id']):
    lst = list(g)
    random.shuffle(lst)
    res += lst
print(res)
# one possible output (the order within each id group is random):
# [{'id': '1', 'value': '1'},
#  {'id': '1', 'value': '8'},
#  {'id': '2', 'value': '3'},
#  {'id': '2', 'value': '5'},
#  {'id': '2', 'value': '2'}]
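Note that groupby only groups consecutive items, so if the list might not already be grouped by 'id', sort it with the same key first (a small sketch with deliberately unsorted input):

```python
from itertools import groupby
import random

my_list = [{'id': '2', 'value': '2'},
           {'id': '1', 'value': '1'},
           {'id': '2', 'value': '3'}]
# groupby needs equal keys to be adjacent, so sort first
my_list.sort(key=lambda d: d['id'])
res = []
for _, g in groupby(my_list, key=lambda d: d['id']):
    group = list(g)
    random.shuffle(group)  # shuffle within the group only
    res += group
```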

creating a dictionary using a generator with format strings

I would like to create a data frame from a dictionary by looping over a list of string column names, rather than slicing the dataframe directly. For instance
df = pd.DataFrame(np.random.randn(100,7), columns=list('ABCDEFG'))
list_of_cols = ['A','B','C']
dictslice = {'%s': df['%s'] % (elt for elt in list_of_cols), 'Z': np.ones(len(df))}
But I cannot use a format string outside of a string, so I am not sure how to proceed. I do not want a solution like
df[[list_of_cols]]
since I want to add more vectors to dictslice that may not necessarily be in df.
Can anyone help?
EDIT
I am a fool, it works with this:
dictslice = {'%s' % elt : df[elt] for elt in list_of_cols}
but this does not work:
dictslice = {'%s' % elt : df[elt] for elt in list_of_cols, 'Z': np.ones(len(df))}
This seems like something that can be done with simple variable access.
What's wrong with this:
df = pd.DataFrame(np.random.randn(100,7), columns=list('ABCDEFG'))
list_of_cols = ['A','B','C']
dictslice = dict([(elt, df[elt]) for elt in list_of_cols] + [('Z', np.ones(len(df)))])
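On Python 3.5+ the same merge can also be written with a dict comprehension plus ** unpacking (PEP 448), which is close to what the EDIT was reaching for:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 7), columns=list('ABCDEFG'))
list_of_cols = ['A', 'B', 'C']
# merge the comprehension with the extra key in one literal
dictslice = {**{elt: df[elt] for elt in list_of_cols}, 'Z': np.ones(len(df))}
```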

How do I get next element from list after search string match in python

Hi friends, I have a list in which I'm searching for a string, and along with the matched string I want to get the next element of the list. Below is sample code:
>>> contents = ['apple','fruit','vegi','leafy']
>>> info = [data for data in contents if 'fruit' in data]
>>> print(info)
['fruit']
I want to have output as
fruit
vegi
What about:
def find_adjacents(value, items):
    i = items.index(value)
    return items[i:i+2]
You'll get a ValueError exception for free if the value is not in items :)
I might think of itertools...
>>> import itertools
>>> contents = ['apple','fruit','vegi','leafy']
>>> icontents = iter(contents)
>>> iterable = itertools.dropwhile(lambda x: 'fruit' not in x, icontents)
>>> next(iterable)
'fruit'
>>> next(iterable)
'vegi'
Note that if you really know that you have an exact match (e.g. 'fruit' == data instead of 'fruit' in data), this becomes easier:
>>> ix = contents.index('fruit')
>>> contents[ix: ix+2]
['fruit', 'vegi']
In both of these cases, you'll need to specify what should happen if no matching element is found.
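One way to handle the no-match case with the itertools version is next()'s default argument:

```python
import itertools

contents = ['apple', 'fruit', 'vegi', 'leafy']
it = itertools.dropwhile(lambda x: 'fruit' not in x, contents)
match = next(it, None)     # None if nothing matched
follower = next(it, None)  # None if the match was the last element
print(match, follower)     # fruit vegi
```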
One way to do that is to iterate over the list zipped with itself.
Calling zip(contents, contents[1:]) allows the data variable to take on these values during the loop:
('apple', 'fruit')
('fruit', 'vegi')
('vegi', 'leafy')
in that order. Thus, when "fruit" is matched, data has the value ('fruit', 'vegi').
Consider this program:
contents = ['apple','fruit','vegi','leafy']
info = [data for data in zip(contents,contents[1:]) if 'fruit' == data[0]]
print(info)
We compare "fruit" to data[0], which will match when data is ('fruit', 'vegi').
This straightforward imperative approach worked for me:
contents = ['apple', 'fruit', 'vegi', 'leafy']
result = '<no match or no successor>'
search_term = 'fruit'
for i in range(len(contents) - 1):
    if contents[i] == search_term:
        result = contents[i + 1]
print(result)
Note that you don't specify what the behavior should be for 1) not finding the search term, or 2) finding a match at the end of the list.
