How to pivot/cross-tab data in Python 3? - python

What is the best solution to pivot/cross-tab tables in Python 3? Is there a built-in function that will do this? Ideally, I'm looking for a Python 3 solution that does not have external dependencies. For example, given a nested list:
nl = [["apples", 2, "New York"],
["peaches", 6, "New York"],
["apples", 6, "New York"],
["peaches", 1, "Vermont"]]
I would like to be able to rearrange rowed data and groupby fields:
apples peaches
New York 2 6
Vermont 6 1
The above is a trivial example, but is there a solution that would be easier than using itertools.groupby every time a pivot is desired? Ideally, the solution would allow rowed data to be pivoted on any column. I was debating about using pandas, but it is an external library and only has limited Python 3 support.

Here is some simple code. Providing row/column/grand totals is left as an exercise for the reader.
class CrossTab(object):
    """Build a cross-tabulation (pivot) of (row_key, col_key) -> summed value.

    Values added for the same (row_key, col_key) pair are summed.  Row and
    column headings are reported in sorted order; cells with no data yield
    the `missing` sentinel.
    """

    def __init__(
        self,
        missing=0,  # what to return for an empty cell.
                    # Alternatives: '', 0.0, None, 'NULL'
    ):
        self.missing = missing
        self.col_key_set = set()   # every column key ever seen
        self.cell_dict = {}        # row_key -> {col_key: accumulated value}
        self.headings_OK = False   # True while cached headings are current

    def add_item(self, row_key, col_key, value):
        """Accumulate `value` into the cell at (row_key, col_key)."""
        self.col_key_set.add(col_key)
        row = self.cell_dict.setdefault(row_key, {})
        if col_key in row:
            row[col_key] += value
        else:
            # First value for this cell is stored as-is (works for any
            # type supporting +=, not just numbers).
            row[col_key] = value
        # Bug fix: items added after headings were computed must invalidate
        # the cache, otherwise get_col_headings()/generate_row_info() would
        # silently ignore new rows/columns.
        self.headings_OK = False

    def _process_headings(self):
        """(Re)compute sorted row/column headings if they are stale."""
        if self.headings_OK:
            return
        self.row_headings = sorted(self.cell_dict)
        self.col_headings = sorted(self.col_key_set)
        self.headings_OK = True

    def get_col_headings(self):
        """Return the sorted list of column headings."""
        self._process_headings()
        return self.col_headings

    def generate_row_info(self):
        """Yield (row_heading, [cell values in column-heading order])."""
        self._process_headings()
        for row_key in self.row_headings:
            row_dict = self.cell_dict[row_key]
            row_vals = [
                row_dict.get(col_key, self.missing)
                for col_key in self.col_headings
            ]
            yield row_key, row_vals
if __name__ == "__main__":
    # Demo: pivot [fruit, quantity, state] rows into state x fruit cells.
    rows = [
        ["apples", 2, "New York"],
        ["peaches", 6, "New York"],
        ["apples", 6, "New York"],
        ["peaches", 1, "Vermont"],
    ]
    ctab = CrossTab(missing='uh-oh')
    for fruit, quantity, state in rows:
        ctab.add_item(row_key=state, col_key=fruit, value=quantity)
    print()
    print('Column headings:', ctab.get_col_headings())
    for row_heading, row_values in ctab.generate_row_info():
        print(repr(row_heading), row_values)
Output:
Column headings: ['apples', 'peaches']
'New York' [8, 6]
'Vermont' ['uh-oh', 1]
See also this answer.
And this one, which I'd forgotten about.

itertools.groupby was exactly made for this problem. You will be hard-pressed to find something better, especially within the standard library.

Related

Recursive relations search between 2 columns in a table [Using python list / Dict]

I am trying to optimize a solution that I created to find recursive relations between 2 columns in a table. I need to find all accIDs for a bssID and recursively find all the bssIDs for those accIDs and so on till I find all the related bssIDs.
bssIDs
accIDs
ABC
4424
ABC
56424
ABC
2383
A100BC
2383
A100BC
4943
A100BC
4880
A100BC
6325
A100BC
4424
XYZ
123
The below solution works for an initial table of 100K rows but the below solution runs for >16 hours for a dataset of 20 million rows. I am trying to use dicts instead of list but I am unable to change the dict while iterating over the same as I am with a list.
import time
# Hard-coded sample relation, stored in both directions:
# accIds maps account-id -> bss-ids it appears in,
# bssIds maps bss-id -> account-ids it contains.
accIds = {4880: ['A100BC'], 6325: ['A100BC'], 2383: ['A100BC','ABC'],4424: ['A100BC','ABC'], 4943: ['A100BC'], 56424: ['ABC'],123: ['XYZ']}
bssIds = {'ABC': [4424,56424,2383], 'A100BC': [2383,4943,4880,6325,4424], 'XYZ':[123]}

def findBIDs(aID):
    # All bssIds one account id belongs to.
    return accIds[aID]

def findAIDs(bID):
    # All account ids one bssId contains.
    return bssIds[bID]

def getList(Ids):
    # Keys view of a mapping.
    return Ids.keys()

def checkList(inputList, value):
    # Membership test; O(n) on a list, which dominates the cost at scale.
    return (value in inputList)

def addToList(inputList, value):
    # list.append returns None; the return value is never used by callers.
    return inputList.append(value)

def removeFromList(inputList, value):
    # Helper not called anywhere in this snippet.
    return inputList.remove(value)

aIDlist = list(getList(accIds))
bIDlist = list(getList(bssIds))
bRelations = {}       # time_ns timestamp -> one connected component of bss ids
runningList = list()  # every bss id already placed in some component
for x in bIDlist:
    if not checkList(runningList,x):
        aList = list()
        bList = list()
        addToList(bList, x)
        # NOTE(review): bList is deliberately extended while the `for y` loop
        # iterates over it, so bss ids discovered mid-loop are processed in
        # the same pass.  The nesting below was reconstructed from a
        # whitespace-mangled paste -- confirm the aList sweep runs once per y.
        for y in bList:
            for c in findAIDs(y):
                if not checkList(aList, c):
                    addToList(aList, c)
            for z in aList:
                for a in findBIDs(z):
                    if not checkList(bList, a):
                        addToList(bList, a)
        # Keyed by wall-clock nanoseconds, so keys differ on every run.
        bRelations.update({time.time_ns(): bList})
        runningList.extend(bList)
print(bRelations)
Output : {1652374114032173632: ['ABC', 'A100BC'], 1652374114032180888: ['XYZ']}
Please suggest if there is a way to update a dict while iterating over it or If we can apply a recursive solution for the same.
This is the fastest I could think of:
# bss <-> account adjacency stored as frozensets so set arithmetic is cheap.
accIds = {4880: frozenset(['A100BC']), 6325: frozenset(['A100BC']), 2383: frozenset(['A100BC','ABC']),4424: frozenset(['A100BC','ABC']), 4943: frozenset(['A100BC']), 56424: frozenset(['ABC']),123: frozenset(['XYZ'])}
bssIds = {'ABC': frozenset([4424,56424,2383]), 'A100BC': frozenset([2383,4943,4880,6325,4424]), 'XYZ':frozenset([123])}

def search_bssid(bssId):
    """Return every bssId transitively connected to `bssId` by alternately
    following bss -> account and account -> bss links."""
    seen_accs = set()
    seen_bsses = {bssId}
    pending_accs = []
    pending_bsses = [bssId]
    while pending_bsses:
        current_bss = pending_bsses.pop()
        # Accounts under this bss that have not been visited yet.
        fresh_accs = bssIds[current_bss] - seen_accs
        seen_accs.update(fresh_accs)
        pending_accs.extend(fresh_accs)
        # Drain the account frontier before taking the next bss.
        while pending_accs:
            current_acc = pending_accs.pop()
            fresh_bsses = accIds[current_acc] - seen_bsses
            seen_bsses.update(fresh_bsses)
            pending_bsses.extend(fresh_bsses)
    return seen_bsses
print(search_bssid("ABC"))

How to loop through values in JSON and assign to another dictionary

I am developing a Python/Django web app. I am trying to parse JSON into a python dictionary, read the values in the dictionary, and assign the values to another dictionary if certain conditions are met.
JSON is structured like this:
{content: {cars: [0, 1, 2]}, other_stuff: []}
Each car has multiple attributes:
0: {"make", "model", "power"...}
Each attribute has three variables:
make: {"value": "Toyota", "unit": "", "user_edited": "false"}
I am trying to assign the values in the JSON to other dictionaries; car_0, car_1 and car_2. In this case the JSON response is otherwise identical considering each car, but the 'make' of the first car is changed to 'Nissan', and I'm trying to then change the make of the car_0 also to 'Nissan'. I'm parsing JSON in the following way:
# NOTE(review): the three car_x dicts are reported to alias the SAME nested
# attribute dicts (see the initialization code further down this page), which
# would explain why editing local_cars[0] appears to change the others --
# confirm against the actual initialization.
local_cars = [car_0, car_1, car_2] # Dictionaries which are already initialized.
print(local_cars[0] is local_cars[1]) # Prints: false
print(local_cars[0]['make']['value']) # Prints: Toyota (yes)
print(local_cars[1]['make']['value']) # Prints: Toyota (yes)
print(local_cars[2]['make']['value']) # Prints: Toyota (yes)
counter = 0
if request.method == 'POST':
    payload = json.loads(request.body)
    if bool(payload):
        print(len(local_cars)) # Prints: 3
        print(counter, payload['cars'][0]['make']['value']) # Prints: Nissan (yes)
        print(counter, payload['cars'][1]['make']['value']) # Prints: Toyota (yes)
        print(counter, payload['cars'][2]['make']['value']) # Prints: Toyota (yes)
        print(counter, local_cars[0]['make']['value']) # Prints: Toyota (yes)
        print(counter, local_cars[1]['make']['value']) # Prints: Toyota (yes)
        print(counter, local_cars[2]['make']['value']) # Prints: Toyota (yes)
        for target_car in payload['cars']: # Loop through all three cars in payload
            print(local_cars[0] is local_cars[1]) # false
            for attr in target_car.items(): # Loop through all key:dict pairs of a single car
                attribute_key = attr[0] # Key (eg. 'make')
                vars_dict = attr[1] # Dictionary of variables ('value': 'xx', 'unit': 'yy', 'user_edited': 'zz')
                if vars_dict['user_edited'] == 'true':
                    # Writes through the nested dict; if the cars share inner
                    # dicts, car_1/car_2 change here too.
                    local_cars[counter][attribute_key]['user_edited'] = 'true'
                    local_cars[counter][attribute_key]['value'] = vars_dict['value']
            print(counter, local_cars[counter]['make']['value']) # Prints: 0, Toyota (yes), 1, Nissan (no!), 2, Nissan (no!)
            counter = counter + 1
What I don't understand is why the other cars, local_cars[1] and local_cars[2] are affected in anyway in this loop. As it can be seen, for some reason their 'make' is changed to 'Nissan' even though it was 'Toyota' in the request body. This seems to happen in the first round of 'for target_car in payload['cars'].
Abandoning the loop/counter and focusing on one car does not make any difference:
for target_car in payload['cars']: --> target_car = payload['cars'][0]:
...
local_cars[0][attribute_key]['user_edited'] = 'true'
local_cars[0][attribute_key]['value'] = vars_dict['value']
What am I doing wrong? How can the car_1 and car_2 be affected even if I change the only part of the code where any values in those dictionaries are edited to affect only on the local_cars[0]?
UPDATED
Received the correct answer for this. Before the part of code originally posted, I initialized the car_0, car_1 and car_2 dictionaries.
What I did before was:
# Original (buggy) initialization: ONE shared `attribute` dict is built and
# then update()-ed into all three cars, so car_0/car_1/car_2 end up holding
# references to the very same per-attribute inner dicts -- the aliasing
# behind the cross-car mutation described above.
default_car = model_to_dict(Car.objects.first())
car_0 = {}
car_1 = {}
car_2 = {}
attribute = {}
i = 0
for key, value in default_car.items():
    if i > 1:  # skips the first two model fields -- presumably id/meta, confirm
        attribute[key] = {"value": value, "unit": units.get(key), "user_edited": "false"}
    i = i + 1
# dict.update copies key -> value *references*; the inner dicts stay shared.
car_0.update(attribute)
car_1.update(attribute)
car_2.update(attribute)
local_cars = [car_0, car_1, car_2]
...
Apparently it was the problem that all car_x had a connection to attribute-dictionary. I solved the problem by editing the car_x initialization to the following:
# Fixed initialization: a separate attribute_N dict (and thus separate inner
# dicts) per car, so edits to one car no longer leak into the others.
default_car = model_to_dict(Car.objects.first())
car_0 = {}
car_1 = {}
car_2 = {}
attribute_0 = {}
attribute_1 = {}
attribute_2 = {}
i = 0
for key, value in default_car.items():
    if i > 1:  # same field-skipping as the original version
        attribute_0[key] = {"value": value, "unit": units.get(key), "user_edited": "false"}
        attribute_1[key] = {"value": value, "unit": units.get(key), "user_edited": "false"}
        attribute_2[key] = {"value": value, "unit": units.get(key), "user_edited": "false"}
    i = i + 1
car_0.update(attribute_0)
car_1.update(attribute_1)
car_2.update(attribute_2)
local_cars = [car_0, car_1, car_2]
...
I think you are probably failing to take copies of car_0 etc. Don't forget that python assignment is purely name-binding.
x = car_0
y = car_0
print( x['make']['value'] ) # 'Toyota'
print( y['make']['value'] ) # 'Toyota'
print( x is y ) # True. Both names refer to the same object
x['make']['value'] = 'foo'
print( y['make']['value'] ) # 'foo'
Should have been y = car_0.copy() or even y = copy.deepcopy(car_0) — note that dicts have no .deepcopy() method; deep copies come from the standard-library copy module.
I don't fully follow your code, but if you are still unsure then do some is testing to find out which entities are bound to the same object (and shouldn't be).

Generate strings using translations of several characters, mapped to several others

I'm facing quite a tricky problem in my python code. I looked around and was not able to find anyone with a similar problem.
I'd like to generate strings translating some characters into several, different ones.
I'd like that original characters, meant to be replaced (translated), to be replaced by several different ones.
What I'm looking to do is something like this :
text = "hi there"
translations = {"i":["b", "c"], "r":["e","f"]}
result = magicfunctionHere(text,translations)
print(result)
> [
"hb there",
"hc there",
"hi theee",
"hi thefe",
"hb theee",
"hb thefe",
"hc theee",
"hc thefe"
]
The result contains any combination of the original text with 'i' and 'r' replaced respectively by 'b' and 'c', and 'e' and 'f'.
I don't see how to do that, using itertools and functions like permutations, product etc...
I hope I'm clear enough, it is quite a specific problem !
Thank you for your help !
def magicfunction(ret, text, alphabet_location, translations):
    """Recursive worker.

    Pops one index from `alphabet_location`, substitutes every candidate in
    translations[text[index]] at that position, and recurses on the remaining
    indices; fully substituted strings are appended to `ret`.
    `alphabet_location` is restored before returning so the caller's loop is
    unaffected; `ret` is mutated in place and also returned for convenience.
    """
    if len(alphabet_location) == 0:
        ret.append(text)
        return ret
    index = alphabet_location.pop()
    for w in translations[text[index]]:
        ret = magicfunction(ret, text[:index] + w + text[index + 1:], alphabet_location, translations)
    alphabet_location.append(index)  # restore for the caller
    return ret

def magicfunctionHere(text, translations):
    """Return every string obtained by replacing the first occurrence of each
    translation key with one of its candidates, excluding the unmodified
    original text.

    Fixes over the previous version:
    * the caller's `translations` dict is no longer mutated (candidates are
      extended in a local copy);
    * keys absent from `text` are skipped -- str.find returning -1 used to
      make the *last* character of the text get replaced instead.
    """
    local_translations = {}
    alphabet_location = []
    for key, candidates in translations.items():
        position = text.find(key)
        if position == -1:
            continue  # key not present: ignore instead of misusing index -1
        alphabet_location.append(position)
        # Include the original character as the last option so the unchanged
        # text is generated last and can be dropped below.
        local_translations[key] = list(candidates) + [key]
    ret = magicfunction([], text, alphabet_location, local_translations)
    ret.pop()  # drop the all-original combination (always generated last)
    return ret

text = "hi there"
translations = {"i":["b", "c"], "r":["e","f"]}
result = magicfunctionHere(text,translations)
print(result)
One crude way to go would be to use a Nested Loop Constructin 2 steps (Functions) as depicted in the Snippet below:
def rearrange_characters(str_text, dict_translations):
    """Return every string produced by one single substitution: for each
    translation key present in the text, replace its first occurrence with
    each candidate.  Duplicates are dropped, insertion order is kept."""
    variants = []
    for original, replacements in dict_translations.items():
        if original not in str_text:
            continue
        for candidate in replacements:
            variant = str_text.replace(original, candidate, 1)
            if variant not in variants:
                variants.append(variant)
    return variants

def get_rearranged_characters(str_text, dict_translations):
    """Apply rearrange_characters twice (one- and two-substitution variants)
    and return the de-duplicated results as a set."""
    first_pass = rearrange_characters(str_text, dict_translations)
    joined = ','.join(first_pass)
    for variant in first_pass:
        joined = "{},{}".format(joined, ','.join(rearrange_characters(variant, dict_translations)))
    return set(joined.split(sep=","))
text = "hi there"
translations = {"i": ["b", "c"], "r":["e","f"]}
result = get_rearranged_characters(text, translations)
print(result)
## YIELDS: {
'hb theee',
'hc thefe',
'hc there',
'hi thefe',
'hb thefe',
'hi theee',
'hc theee',
'hb there'
}
See also: https://eval.in/960803
Another equally convoluted approach would be to use a single function with nested loops like so:
def process_char_replacement(str_text, dict_translations):
    """Single-function variant: build every one-substitution variant of
    `str_text`, then every substitution of each of those variants, and return
    all results combined into a set."""
    first_round = []
    for key, options in dict_translations.items():
        if key not in str_text:
            continue
        for option in options:
            candidate = str_text.replace(key, option, 1)
            if candidate not in first_round:
                first_round.append(candidate)
    collected = ','.join(first_round)
    for intermediate in first_round:
        second_round = []
        for key, options in dict_translations.items():
            if key not in intermediate:
                continue
            for option in options:
                candidate = intermediate.replace(key, option, 1)
                if candidate not in second_round:
                    second_round.append(candidate)
        collected = "{},{}".format(collected, ','.join(second_round))
    return set(collected.split(sep=","))
text = "hi there"
translations = {"i": ["b", "c"], "r":["e","f"]}
result = process_char_replacement(text, translations)
print(result)
## YIELDS: {
'hb theee',
'hc thefe',
'hc there',
'hi thefe',
'hb thefe',
'hi theee',
'hc theee',
'hb there'
}
Refer to: https://eval.in/961602

Create multiple list with different variables

I would like to create a bunch of empty lists with names such as:
author1_count = []
author2_count = []
...
...
and so on...but a priori I do not know how many lists I need to generate.
Answers to question similar this one suggest to create a dictionary as in (How to create multiple (but individual) empty lists in Python?) or an array of lists. However, I wish to append values to the list as in:
def search_list(alist, aname):
    """Look for `aname` in `alist`.

    Returns (1, one-based index of the first occurrence) when found,
    (0, 0) otherwise -- exactly what the original linear scan produced,
    since list.index always reports the first match.
    """
    if aname in alist:
        return 1, alist.index(aname) + 1
    return 0, 0
cehw_list = ["Ford, Eric", "Mustang, Jason", "BMW, James", "Mercedes, Megan"]
author_list = []
for author in authors:  # `authors` is defined elsewhere in the asker's code
    # NOTE(review): on Python 3 str.encode returns bytes, so author_list holds
    # bytes while cehw entries are str -- the == below would never match;
    # confirm which Python version this was written for.
    this_author = author.encode('ascii', 'ignore')
    author_list.append(this_author)
# Find if the author is in the authorlist
for cehw in cehw_list:
    if cehw == cehw_list[0]:
        count0, position0 = search_list(author_list, cehw)
        author1_count.append(count0)  # author1_count initialized elsewhere
    elif cehw == cehw_list[1]:
        count1, position1 = search_list(author_list, cehw)
        author2_count.append(count1)
...
...
Any idea how to create such distinct lists. Is there an elegant way to do this?
Dictionaries! You only need to be more specific when appending values, e.g.
author_lists = {}
for i in range(3):
author_lists['an'+str(i)] = []
author_lists
{'an0': [], 'an1': [], 'an2': []}
author_lists['an0'].append('foo')
author_lists
{'an0': ['foo'], 'an1': [], 'an2': []}
You should be able to use a dictionary still.
# Accumulate per-author counts keyed by the cehw entry itself, instead of
# maintaining separate author1_count/author2_count/... variables.
data = {}
for cehw in cehw_list:
    count0, position0 = search_list(author_list, cehw)
    # Or whatever property on cehw that has the unique identifier
    if cehw in data:
        data[cehw].append(count0)
    else:
        data[cehw] = [count0]

load parameters from a file in Python

I am writing a Python class to model a process and I want to initialized the parameters from a file, say 'input.dat'. The format of the input file looks like this.
'input.dat' file:
Z0: 0 0
k: 0.1
g: 1
Delta: 20
t_end: 300
The code I wrote is the following. It works but appears redundant and inflexible. Is there a better way to do the job? Such as a loop to do readline() and then match the keyword?
def load(self, filename="input.dat"):
    """Initialize model parameters from `filename`.

    Expected layout (fixed order, one parameter per line):
        Z0: x y   -> self.z0 (list of two floats)
        k: v      -> self.k
        g: v      -> self.g
        Delta: v  -> self.D
        t_end: v  -> self.T

    Fixes: `len(s) is 3` compared integers with `is` -- an identity test
    that only happens to work via CPython's small-int caching; the file is
    now closed deterministically via a `with` block.
    """
    with open(filename) as fh:
        s = fh.readline().split()
        if len(s) == 3:
            self.z0 = [float(s[1]), float(s[2])]  # initial state
        s = fh.readline().split()
        if len(s) == 2:
            self.k = float(s[1])  # kappa
        s = fh.readline().split()
        if len(s) == 2:
            self.g = float(s[1])
        s = fh.readline().split()
        if len(s) == 2:
            self.D = float(s[1])  # Delta
        s = fh.readline().split()
        if len(s) == 2:
            self.T = float(s[1])  # end time
Assuming the params are coming from a safe place (made by you or users, not the internet), just make the parameters file a Python file, params.py:
Z0 = (0, 0)
k = 0.1
g = 1
Delta = 20
t_end = 300
Then in your code all you need is:
import params
fancy_calculation(10, k=params.k, delta=params.Delta)
The beauty of this is two-fold: 1) simplicity, and 2) you can use the power of Python in your parameter descriptions -- particularly useful here, for example:
k = 0.1
Delta = 20
g = 3 * k + Delta
Alternatively, you could use Python's built-in JSON or ConfigParser .INI parser modules.
If you are open to some other kind of file where you can keep your parameters, I would suggest you to use a YAML file.
The Python library is PyYAML. This is how you can easily use it with Python.
For a better introduction, look at this Wikipedia article: http://en.wikipedia.org/wiki/YAML.
The benefit is you can read the parameter values as lists or maps.
You would love it!
Try the following:
def load(self, filename="input.dat"):
    """Read 'name: value(s)' lines and set the mapped attribute as float(s).

    `d` maps file labels to attribute names; lines whose value contains a
    space become lists of floats, otherwise a single float.

    Fixes: on Python 3, map() returns a lazy iterator, so the original
    stored a map object instead of a list of floats; the file is also now
    closed via a `with` block.
    """
    d = {"Z0": "z0", "k": "k", "g": "g", "Delta": "D", "t_end": "T"}
    with open(filename) as fh:
        for line in fh:
            name, value = line.split(":")
            value = value.strip()
            if " " in value:
                value = [float(part) for part in value.split()]
            else:
                value = float(value)
            setattr(self, d[name], value)
Proof that it works:
>>> class A(object): pass
...
>>> a = A()
>>> load(a)
>>> a.__dict__
{'k': 0.10000000000000001, 'z0': [0.0, 0.0], 'D': 20.0, 'g': 1.0, 'T': 300.0}
As others have mentioned, in Python you can create object attributes dynamically "on the fly". That means you could do something like the following to create Params objects as they're read-in. I've tried to make the code as data-driven as possible, so relatively flexible.
# maps file label -> [attribute name, type of each data field...]
label_attr_map = {
    "Z0:": ["z0", float, float],
    "k:": ["k", float],
    "g:": ["g", float],
    "Delta:": ["D", float],
    "t_end:": ["T", float],
}

class Params(object):
    """Parameter bundle built by parsing a 'label: value(s)' text file.

    Each line's label selects the attribute name and per-field types from
    label_attr_map; single values are stored as scalars, multiple values
    as lists.
    """
    def __init__(self, input_file_name):
        with open(input_file_name, 'r') as input_file:
            for line in input_file:
                row = line.split()
                if not row:
                    continue  # robustness fix: blank lines used to raise IndexError
                label = row[0]
                data = row[1:]  # rest of row is the data list
                attr = label_attr_map[label][0]
                datatypes = label_attr_map[label][1:]
                values = [datatypes[i](data[i]) for i in range(len(data))]
                self.__dict__[attr] = values if len(values) > 1 else values[0]
# Demo: parse input.dat and show the resulting attributes.
# (Converted from Python 2 `print` statements -- syntax errors under the
# Python 3 this page is about -- to print() function calls.)
params = Params('input.dat')
print('params.z0:', params.z0)
print('params.k:', params.k)
print('params.g:', params.g)
print('params.D:', params.D)
print('params.T:', params.T)
Output:
params.z0: [0.0, 0.0]
params.k: 0.1
params.g: 1.0
params.D: 20.0
params.T: 300.0
Perhaps this might give you what you need:
def load(self, filename='input.dat'):
    """Set one attribute per 'name: value(s)' line of `filename`.

    Values stay as strings (no conversion, matching the original intent of
    this minimal version); a one-value line sets a scalar string, a
    two-value line sets a list of strings.

    Fixes an off-by-one: the original used s[1]/s[2] as the name/value
    pair, which raises IndexError on two-token lines and would use a
    *value* token as the attribute name on three-token lines.  The trailing
    ':' is stripped so the attribute is usable as obj.name.
    """
    with open(filename) as fh:
        for line in fh:
            s = line.split()
            if len(s) == 2:
                setattr(self, s[0].rstrip(':'), s[1])
            elif len(s) == 3:
                setattr(self, s[0].rstrip(':'), s[1:])
I also didn't include any error checking, but setattr is very handy.
Something like this:
def load(self, filename="input.dat"):
    """Parse 'name: value(s)' lines, converting every value to float.

    argmap gives the expected field count for multi-field variables
    (anything absent defaults to 1); namemap renames file labels to the
    attribute names used on the object.

    Fixes a NameError: the dict is defined as `argmap` but was read back
    as `varmap`.  Also skips blank lines, which used to raise IndexError
    on s[0].
    """
    # maps names to number of fields they need
    # (only necessary for variables with more than 1 field)
    argmap = dict(Z0=2)
    # maps config-file names to their attribute names on the object
    # (no entry needed when both names are the same)
    namemap = dict(Z0="z0", Delta="D", t_end="T")
    with open(filename) as FILE:
        for line in FILE:
            s = line.split()
            if not s:
                continue  # ignore blank lines
            var = s[0].rstrip(":")
            try:
                val = [float(x) for x in s[1:]]
            except ValueError:
                continue  # skip lines whose values are not numeric
            if len(val) == argmap.get(var, 1):
                if len(val) == 1:
                    val = val[0]
                setattr(self, namemap.get(var, var), val)
Python objects have a built-in __dict__ member. You can modify it, and then refer to properties as obj.key.
class Data(object):
    """Expose the parameters of a 'label: value(s)' file as attributes.

    Single numbers become floats; multi-number lines become lists of
    floats.  Attributes are installed via the instance __dict__, so each
    file label appears verbatim as an attribute name.
    """
    def __init__(self, path='infile.dat'):
        with open(path, 'r') as fo:
            for line in fo.readlines():
                if len(line) < 2:
                    continue  # ignore blank-ish lines
                label, rest = (piece.strip(' :\n') for piece in line.split(' ', 1))
                values = [float(token) for token in rest.split()]
                # Store single values unwrapped, lists otherwise.
                self.__dict__[label] = values[0] if len(values) == 1 else values
# Demo: load the file and show a few attributes.
# (Converted from Python 2 `print` statements to Python 3 print() calls.)
obj = Data('infile.dat')
print(obj.g)
print(obj.Delta)
print(obj.Z0)
At the end of this, we print out a few of the keys. Here's the output of those.
1.0
20.0
[0.0, 0.0]
For consistency, you can remove the line marked "optional" in my code, and have all objects in lists -- regardless of how many elements they have. That will make using them quite a bit easier, because you never have to worry about obj.g[0] returning an error.
Here's another one
def splitstrip(s):
    """Return the field between the first and second ':' of `s`, stripped
    of surrounding whitespace (e.g. 'k: 0.1' -> '0.1')."""
    fields = s.split(':')
    return fields[1].strip()
# Positional parser: assumes exactly the five known lines in fixed order,
# and that `a` (the object being configured) already exists in scope.
with open('input.dat','r') as f:
    # first line "Z0: x y" -> list of two floats
    a.z0 = [float(x) for x in splitstrip(f.readline()).split(' ')]
    # remaining lines, one scalar each, in order: k, g, Delta, t_end
    a.k, a.g, a.D, a.T = tuple([float(splitstrip(x)) for x in f.read().rstrip().split('\n')])
;)

Categories

Resources