Mapping data from Excel with Python

I am reading data from an xls spreadsheet with xlrd. First, I gather the index for the column that contains the data that I need (may not always be in the same column in every instance):
amr_list, pssr_list, inservice_list = [], [], []
for i in range(sh.ncols):
    for j in range(sh.nrows):
        if 'amrprojectnumber' in sh.cell_value(j,i).lower():
            amr_list.append(sh.cell_value(j,i))
        if 'pssrnumber' in sh.cell_value(j,i).lower():
            pssr_list.append(sh.cell_value(j,i))
        if 'inservicedate' in sh.cell_value(j,i).lower():
            inservice_list.append(sh.cell_value(j,i))
Now I have three lists, which I need to use for writing data to a new workbook. The values in a row are related. So the index of an item in one list corresponds to the same index of the items in the other lists.
The amr_list has repeating string values. For example:
['4006BA','4006BA','4007AC','4007AC','4007AC']
The pssr_list always shares the same value as the amr_list but with additional info:
['4006BA(1)','4006BA(2)','4007AC(1)','4007AC(2)','4007AC(3)']
Finally, the inservice_list may or may not contain a variable date (as read from excel):
[40780.0, '', 40749.0, 40764.0, '']
This is the result I want from the data:
amr = { '4006BA':[('4006BA(1)',40780.0),('4006BA(2)','')], '4007AC':[('4007AC(1)',40749.0),('4007AC(2)',40764.0),('4007AC(3)','')] }
But I am having a hard time figuring out an easy way to get there. Thanks in advance.

Maybe this can help:
A = ['4006BA','4006BA','4007AC','4007AC','4007AC']
B = ['4006BA(1)','4006BA(2)','4007AC(1)','4007AC(2)','4007AC(3)']
C = [40780.0, '', 40749.0, 40764.0, '']
result = dict()
for item in range(len(A)):
    key = A[item]
    # create an empty list the first time a key is seen
    result.setdefault(key, [])
    result[key].append((B[item], C[item]))
print(result)
This will print you the data in the format you are looking for.

Look into itertools.groupby and
zip(amr_list, pssr_list, inservice_list)
For your case:
dict((x, list(a[1:] for a in y)) for x, y in
     itertools.groupby(zip(amr_list, pssr_list, inservice_list), lambda z: z[0]))
Note that this assumes your input is sorted by amr_list.
Another approach would be:
combined = {}
for k, v in zip(amr_list, zip(pssr_list, inservice_list)):
    combined.setdefault(k, []).append(v)
This does not require your input to be sorted.
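To make the sorting caveat concrete, here is a small sketch comparing the two approaches on input where the same key appears in two separate runs. The sample lists are made up for illustration and are not taken from the question.

```python
import itertools

# Hypothetical unsorted input: '4006BA' appears in two separate runs.
amr_list = ['4006BA', '4007AC', '4006BA']
pssr_list = ['4006BA(1)', '4007AC(1)', '4006BA(2)']
inservice_list = [40780.0, 40749.0, '']

rows = list(zip(amr_list, pssr_list, inservice_list))

# groupby only merges adjacent rows, so the second '4006BA' run
# replaces the first one when the pairs are fed to dict().
grouped = dict(
    (key, [row[1:] for row in group])
    for key, group in itertools.groupby(rows, lambda row: row[0])
)

# setdefault collects every row for a key regardless of position.
combined = {}
for key, pssr, date in rows:
    combined.setdefault(key, []).append((pssr, date))

print(grouped)
print(combined)
```

Sorting the rows by key first, for example with sorted(rows, key=lambda row: row[0]), should make the groupby version agree with the setdefault version. Sorting on the key alone also avoids comparing the float dates with the empty strings.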

Related

Combining lists within a nested list, if lists contain the same element?

I have nested list that has a structure similar to this, except it's obviously much longer:
mylist = [ ["Bob", "12-01 2:30"], ["Sal", "12-01 5:23"], ["Jill", "12-02 1:28"] ]
My goal is to create another nested lists that combines all elements that have the same date. So, the following output is desired:
newlist = [ [["Bob", "12-01 2:30"], ["Sal", "12-01 5:23"]], [["Jill", "12-02 1:28"]] ]
Above, all items with the date 12-01, regardless of time, are combined, and all elements of 12-02 are combined.
I've sincerely been researching how to do this for the past 1 hour and can't find anything. Furthermore, I'm a beginner at programming, so I'm not skilled enough to try to create my own solution. So, please don't think that I haven't attempted to do research or put any effort into trying this problem myself. I'll add a few links as examples of my research below:
Collect every pair of elements from a list into tuples in Python
Create a list of tuples with adjacent list elements if a condition is true
How do I concatenate two lists in Python?
Concatenating two lists of Strings element wise in Python without Nested for loops
Zip two lists together based on matching date in string
How to merge lists into a list of tuples?
Use a dict, or an OrderedDict if the order matters, to group the data by date.
from collections import defaultdict # defaultdict(list) works like {}.setdefault(key, []) and is very convenient
mylist = [["Bob", "12-01 2:30"], ["Sal", "12-01 5:23"], ["Jill", "12-02 1:28"]]
record_dict = defaultdict(list)
# then iterate over the list and group the entries by date.
for data in mylist:
    _, time = data
    date_time, _ = time.split(" ")
    record_dict[date_time].append(data)
res_list = list(record_dict.values())
print(res_list)
output:
[[['Bob', '12-01 2:30'], ['Sal', '12-01 5:23']], [['Jill', '12-02 1:28']]]
This is a pure list-based alternative to the accepted dictionary-based solution. It also makes it easy to sort the whole list: first by date, then by hour, then by name.
from itertools import groupby
mylist = [["Bob", "12-01 2:30"], ["Sal", "12-01 5:23"], ["Jill", "12-02 1:28"]]
newlist = [dt.split() + [name] for (name, dt) in mylist]
newlist.sort() # can be removed if the initial data is already sorted by date
newlist = [list(group) for (date, group) in groupby(newlist, lambda item:item[0])]
# result:
# [[['12-01','2:30','Bob'], ['12-01','5:23','Sal']], [['12-02','1:28','Jill']]]
If you really want the same item format as the initial list, it requires a double iteration:
newlist = [[[name, date + ' ' + time] for (date, time, name) in group]
           for (date, group) in groupby(newlist, lambda item:item[0])]
# result:
# [[['Bob', '12-01 2:30'], ['Sal', '12-01 5:23']], [['Jill', '12-02 1:28']]]
If you don't mind going heavy on your memory usage, you can try using a dictionary. You can use the date as the key and make a list of values.
all_items = {}
for line in mylist:
    x, y = line
    date, time = y.split()
    try:
        all_items[date].append(line)
    except KeyError:
        # first item for this date: start a new list
        all_items[date] = [line]
Then, you can create a new list using the sorted date for keys.
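A minimal sketch of that last step, assuming the dates are zero-padded "MM-DD" strings so that plain string sorting puts them in calendar order. The input order is shuffled here on purpose, and setdefault is used in place of try/except only for brevity.

```python
# Sample data from the question, deliberately out of date order.
mylist = [["Jill", "12-02 1:28"], ["Bob", "12-01 2:30"], ["Sal", "12-01 5:23"]]

# Group the items by date.
all_items = {}
for line in mylist:
    name, timestamp = line
    date, time = timestamp.split()
    all_items.setdefault(date, []).append(line)

# Walk the keys in sorted order to build the nested list.
newlist = [all_items[date] for date in sorted(all_items)]
print(newlist)
```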
If all of the elements with the same date are consecutive, you can use itertools.groupby:
[list(group) for _, group in groupby(data, lambda value: ...)]
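As a sketch of how the key function could look for the data in the question: the helper date_of below is a hypothetical name, and it assumes every item has the form [name, "date time"]. Note that groupby yields (key, group) pairs, so the group has to be unpacked before it is turned into a list.

```python
from itertools import groupby

mylist = [["Bob", "12-01 2:30"], ["Sal", "12-01 5:23"], ["Jill", "12-02 1:28"]]

def date_of(item):
    """Return the date part ('12-01') of a [name, 'date time'] item."""
    return item[1].split()[0]

# Items with the same date must already be adjacent for this to work.
newlist = [list(group) for _, group in groupby(mylist, key=date_of)]
print(newlist)
```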

How do I retrieve the variable name for each dict in a list of dicts?

I may be missing something fundamental here but consider the following:
graph=nx.read_graphml('path here...')
dDict=dict(nx.degree_centrality(graph)) #create dict
lDict=dict(nx.load_centrality(graph))
new_lists=[dDict,lDict]
for i in new_lists:
    print(...) # how to get the variable name, i.e. dDict
How do I iterate through the list of dicts so that printing gives me the name of the variable each dict is bound to, i.e. I want to be able to get back 'dDict' and 'lDict'? I do not want a quick hack such as
dDict['name'] = 'dDict'
Any ideas..?
EDIT: The reason I want to do this is so that I can append these centrality measures to a dataframe with a new column name, i.e.:
for idx in range(len(new_lists)):
    for i in range(len(df)):
        rowIndex = df.index[i]
        df.loc[rowIndex, idx] = new_lists[idx][rowIndex] # instead of idx, how do I dynamically name the column after the list item's variable name?
You can iterate over globals() and get the variable name of the object that matches the content you are looking for.
More info on https://docs.python.org/3/library/functions.html?highlight=globals#globals
However, this is a rather cumbersome trick and you should not rely on it. Instead, redesign your software so you don't have to look up variable names in the first place.
def get_var_name_of(x):
    # returns the first global name whose value compares equal to x
    return [k for k, v in globals().items() if v == x][0]
dDict = {1:'asd'}
lDict = {2:'qwe'}
new_list=[dDict,lDict]
for d in new_list:
    print(get_var_name_of(d))
dDict
lDict
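One possible way to follow the advice about redesigning is to keep each dict next to an explicit label, so the label never has to be recovered from a variable name. The sketch below uses made-up centrality values and hypothetical label strings; in the question the dicts would come from nx.degree_centrality(graph) and nx.load_centrality(graph).

```python
# Stand-in values for the two centrality dicts (node -> score).
degree = {"a": 0.5, "b": 1.0}
load = {"a": 0.0, "b": 1.0}

# Keep the label next to the data instead of looking up a variable name.
measures = {"degree_centrality": degree, "load_centrality": load}

# Build one row per node with one column per measure.
table = {}
for column_name, values in measures.items():
    for node, score in values.items():
        table.setdefault(node, {})[column_name] = score

print(table)
```

With pandas, the same labels could serve as column names, for example by assigning pd.Series(values) to df[column_name] inside the outer loop. That part is an assumption about the asker's dataframe and is not shown here.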

Compare dictionary with a variable

I am trying to compare a dictionary value with a variable, but for some reason I can't get the part that I want out of the dictionary.
The dictionary is the output of an HTML table.
This is the code that I use to parse the HTML table into a dictionary:
import csv
with open('output.csv') as fd:
    rd = csv.DictReader(fd, skipinitialspace=True)
    for row in rd:
        lista = { k: row[k] for k in row if k in ['Name', 'Clan Days']}
This is the output:
{'Name': 'SirFulgeruL2k19', 'Clan Days': '140'}
{'Name': 'Darius', 'Clan Days': '127'}
How do I compare, for example, the Clan Days value from the first dictionary with a value that I set in a variable and, if they match, get the name as a string so I can use it later in another line?
Assuming you first read the data into a list of dictionaries:
data = [{ k: row[k] for k in row if k in ['Name', 'Clan Days']}
        for row in rd]
You may use next() to search for the first dictionary in data matching the Clan Days value, defaulting to None if no entry matches your search query:
desired_clan_days = '140'
clan_name = next((entry["Name"] for entry in data
                  if entry["Clan Days"] == desired_clan_days), None)
next() returns only the first match. If you need all of the matches, use a list comprehension:
clan_names = [entry["Name"] for entry in data
              if entry["Clan Days"] == desired_clan_days]
Note that this kind of search requires you to, in the worst case (entry not found), loop through all the entries in data. If this kind of search is the primary use case of this data structure, consider re-designing it to better fit the problem - e.g. having clan_days value as a key with a list of clan names:
data = {
    "140": ["SirFulgeruL2k19"],
    "127": ["Darius"]
}
In that state, getting a match would be a constant-time operation and as easy as data[desired_clan_days]. defaultdict(list) is something that would help you make that transformation.
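A minimal sketch of that transformation, assuming the rows have already been read into a list of dictionaries as above. The name names_by_days is a hypothetical choice.

```python
from collections import defaultdict

# Rows as produced by the DictReader loop in the question.
data = [
    {"Name": "SirFulgeruL2k19", "Clan Days": "140"},
    {"Name": "Darius", "Clan Days": "127"},
]

# Index the names by their Clan Days value.
names_by_days = defaultdict(list)
for entry in data:
    names_by_days[entry["Clan Days"]].append(entry["Name"])

desired_clan_days = "140"
# .get avoids creating an empty list for a value that was never seen.
print(names_by_days.get(desired_clan_days, []))
```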
Not really sure what exactly you want, but if it's just comparing a dictionary value to a variable and getting the Name part if they match, you would get something like this:
>>> entry = {'Name': 'SirFulgeruL2k19', 'Clan Days': '140'}
>>> target = 140
>>> if int(entry['Clan Days']) == target:
...     name = entry['Name']
...
>>> name
'SirFulgeruL2k19'
Edit: I read your post too quickly. Considering it's all the rows from an HTML table, this code is too simple. Use alecxe's answer :)

Short way to add dictionary item to a set only when its string value isn't empty?

I currently have this piece of code:
name_set = set()
reader = [{'name':'value1'}, {'name':''}, {'name':'value2'}]
for row in reader:
    name = row.get('name', None)
    if name:
        name_set.add(name)
print(name_set)
In the real code the reader is a DictReader, but I use a list with dicts to represent this.
Note that if name: is meant to check for:
an empty string present in the dictionary (thus ""),
not for the case where the key does not exist in the dictionary.
I think this code is easy to read, but I'm wondering if there is a shorter way, as it takes 6 lines to simply extract values from dicts and save them in a set.
Your existing code is fine.
But since you asked for a "short" way, you could just use set comprehensions/arithmetic:
>>> reader = [{'name':'value1'}, {'name':''}, {'name':'value2'}]
>>> {d['name'] for d in reader} - {''}
{'value1', 'value2'}
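If a row might lack the 'name' key altogether, d['name'] would raise a KeyError. A hedged variant, using a made-up row without the key, is to go through .get and subtract None as well:

```python
# The {'other': 'x'} row is a hypothetical example of a missing key.
reader = [{'name': 'value1'}, {'name': ''}, {'other': 'x'}, {'name': 'value2'}]

# row.get('name') yields None for a missing key; subtracting {'', None}
# drops both the empty string and the missing-key marker.
name_set = {row.get('name') for row in reader} - {'', None}
print(name_set)
```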

Remove duplicate data from an array in python

I have this array of data
data = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]
I remove the duplicate data with list(set(data)), which gives me
data = [20001202.05, 20001202.50, 20001215.75, 20021215.75]
But I would like to remove the duplicate data, based on the numbers before the "period"; for instance, if there is 20001202.05 and 20001202.50, I want to keep one of them in my array.
As you don't care about the order of the items you keep, you could do:
>>> list({int(d): d for d in data}.values())
[20001202.5, 20001215.75, 20021215.75]
If you would like to keep the lowest item, I can't think of a one-liner.
Here is a basic example for anybody who would like to add a condition on the key or value to keep.
seen = set()
result = []
for item in sorted(data):
    key = int(item) # or whatever condition
    if key not in seen:
        result.append(item)
        seen.add(key)
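The same idea can be sketched without sorting the input, keeping the smallest value per key while preserving the order in which keys first appear. The function name dedupe_lowest and its key parameter are hypothetical, and the ordering relies on dicts preserving insertion order (Python 3.7+).

```python
def dedupe_lowest(values, key=int):
    """Keep the smallest value for each key, in first-seen key order."""
    lowest = {}
    for value in values:
        k = key(value)
        # Store the value if the key is new or this value is smaller.
        if k not in lowest or value < lowest[k]:
            lowest[k] = value
    return list(lowest.values())

data = [20001202.50, 20001202.05, 20001215.75, 20021215.75]
print(dedupe_lowest(data))
```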
More generally, with Python 3.7+, because dictionaries maintain insertion order, you can do this even when order matters:
data = {d:None for d in data}.keys()
However for OP's original problem, OP wants to de-dup based on the integer value, not the raw number, so see the top voted answer. But generically, this will work to remove true duplicates.
data1 = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]
ls = []
for i in data1:
    # keep only the first occurrence of each exact value
    if i not in ls:
        ls.append(i)
print(ls)
