Python - list of tuples from file - python

I have completed some rather intensive calculations, and I was not able to save my results with pickle (recursion depth exceeded), so I was forced to print all the data and save it in a text file.
Is there an easy way to convert my list of tuples in text back into, well, a list of tuples in Python? The output looks like this:
[(10, 5), (11, 6), (12, 5), (14, 5), (103360, 7), (16, 6), (102725, 7), (17, 6), (18, 5), (19, 9), (20, 6), ...(it continues for 60MB)]

You can use ast.literal_eval():
>>> import ast
>>> s = '[(10, 5), (11, 6), (12, 5), (14, 5)]'
>>> res = ast.literal_eval(s)
>>> res
[(10, 5), (11, 6), (12, 5), (14, 5)]
>>> res[0]
(10, 5)
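If the printed list sits in a text file, the same call can be applied to the file's contents. The following is a minimal sketch, assuming the file holds exactly one Python-literal list; `load_tuples` and the temporary demo file are illustrative names that do not come from the question. For a file as large as 60MB, parsing this way may take noticeable time and memory.

```python
import ast
import os
import tempfile


def load_tuples(path):
    """Read a text file holding a printed list of tuples and parse it safely."""
    with open(path) as fh:
        return ast.literal_eval(fh.read().strip())


# Demo: a small temporary file stands in for the real dump (hypothetical data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("[(10, 5), (11, 6), (12, 5), (14, 5)]\n")
    demo_path = tmp.name

pairs = load_tuples(demo_path)
os.remove(demo_path)
print(pairs[0])
```

Unlike eval, ast.literal_eval only accepts literals, so a stray expression in the file raises an error instead of being executed.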

string = "[(10, 5), (11, 6), (12, 5), (14, 5), (103360, 7), (16, 6), (102725, 7), (17, 6), (18, 5), (19, 9), (20, 6)]" # Read it from the file however you want
values = []
for t in string[1:-1].replace("),", ");").split("; "):
    values.append(tuple(map(int, t[1:-1].split(", "))))
First I remove the opening and closing square brackets with [1:-1]. Then I replace ), with ); so that I can split on ; without splitting on the commas inside the tuples, since those are not preceded by a ). Inside the loop I use [1:-1] again, this time to remove the parentheses, and split on the commas. The map call converts the numeric strings to ints, and each result is appended as a tuple.

Related

List of lists of tuples, sum element-wise

I have a list of lists of tuples. Each inner list contains 3 tuples, of 2 elements each:
[
[(3, 5), (4, 5), (4, 5)],
[(7, 13), (9, 13), (10, 13)],
[(5, 7), (6, 7), (7, 7)]
]
I need to get a single list of 3 tuples, summing all these elements "vertically", like this:
(3, 5), (4, 5), (4, 5)
+ + + + + +
(7, 13), (9, 13), (10, 13)
+ + + + + +
(5, 7), (6, 7), (7, 7)
|| || ||
[(15, 25), (19, 25), (21, 25)]
so, for example, the second tuple in the result list is given by the sums of the second tuples in the initial list
(4+9+6, 5+13+7) = (19, 25)
I'm trying with list/tuple comprehensions, but I'm getting a little lost with this.
You can use zip and sum for something a little longer, but without the heavyweight dependency on numpy if you aren't already using it (x is the list of lists from the question).
>>> [tuple(sum(v) for v in zip(*t)) for t in zip(*x)]
[(15, 25), (19, 25), (21, 25)]
The outer zip pairs the corresponding tuples together; the inner zip pairs corresponding elements of those tuples together for addition.
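To make the two levels of zip easier to follow, here is a small sketch that spells out the intermediate values for the second column. The names `data`, `columns`, `paired`, and `second` are illustrative only.

```python
data = [
    [(3, 5), (4, 5), (4, 5)],
    [(7, 13), (9, 13), (10, 13)],
    [(5, 7), (6, 7), (7, 7)],
]

# Outer zip: group the tuples that share a position across the inner lists.
columns = list(zip(*data))
# columns[1] is ((4, 5), (9, 13), (6, 7))

# Inner zip: within one group, line up first elements and second elements.
paired = list(zip(*columns[1]))
# paired is [(4, 9, 6), (5, 13, 7)]

# Summing each aligned group gives one result tuple.
second = tuple(sum(v) for v in paired)

# The one-liner does the same for every column.
result = [tuple(sum(v) for v in zip(*col)) for col in columns]
print(second)
print(result)
```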
You could do this pretty easily with numpy. Use sum on axis 0.
import numpy as np
l = [
[(3, 5), (4, 5), (4, 5)],
[(7, 13), (9, 13), (10, 13)],
[(5, 7), (6, 7), (7, 7)]
]
[tuple(row) for row in np.sum(l, axis=0).tolist()]  # .tolist() yields plain Python ints
Output
[(15, 25), (19, 25), (21, 25)]
You could do this in pure Python:
lst = [
    [(3, 5), (4, 5), (4, 5)],
    [(7, 13), (9, 13), (10, 13)],
    [(5, 7), (6, 7), (7, 7)]
]
# Transpose: collect the tuples that share a position into one list
lst2 = []
for a in range(len(lst[0])):
    l = []
    for i in range(len(lst)):
        l.append(lst[i][a])
    lst2.append(l)
# Sum each group element by element
output = []
for a in lst2:
    t = [0 for _ in range(len(lst[0][0]))]
    for i in range(len(a)):
        for z in range(len(a[i])):
            t[z] += a[i][z]
    output.append(tuple(t))
print(output)
It also works if you change the list, for example to tuples of three elements.
Output:
IN:
lst = [
[(3, 5), (4, 5), (4, 5)],
[(7, 13), (9, 13), (10, 13)],
[(5, 7), (6, 7), (7, 7)]
]
OUT:
[(15, 25), (19, 25), (21, 25)]
IN:
lst = [
[(3, 5,2), (4, 5,3), (4, 5,1)],
[(7, 13,1), (9, 13,3), (10, 13,3)],
[(5, 7,6), (6, 7,3), (7, 7,7)]
]
OUT:
[(15, 25, 9), (19, 25, 9), (21, 25, 11)]
data = [
[(3, 5), (4, 5), (4, 5)],
[(7, 13), (9, 13), (10, 13)],
[(5, 7), (6, 7), (7, 7)]
]
result = [tuple(sum(x) for x in zip(*t)) for t in zip(*data)]
print(result)
This is a one-liner; I don't think you can get more Pythonic than this.

Specifying a color to a data value in a tuple and plotting into a graph in Python

I'm importing data from a .json file, where I transformed the dictionary into a list of tuples. These tuples represent the data as a timestamp and a value marked at that specified timestamp, such as this example:
participant_1 = [(1, 8), (2, 2), (3, 2), (4, 1), (5, 3), (6, 5), (7, 6), (8, 6), (9, 8), (10, 9), (11, 9), (12, 9), (13, 3), (14, 3), (15, 4), (16, 5), (17, 6), (18, 6), (19, 7), (20, 8), (21, 8), (22, 9), (23, 9), (24, 9), (25, 9), (26, 9), (27, 9)]
participant_2 = [(1, 5), (2, 5), (3, 1), (4, 3), (5, 4), (6, 5), (7, 5), (8, 7), (9, 8), (10, 9), (11, 10), (12, 10), (13, 10), (14, 10), (15, 10), (16, 10), (17, 10), (18, 0), (19, 0), (20, 0), (21, 0), (22, 0), (23, 0), (24, 0), (25, 0), (26, 0), (27, 0)]
I'll have multiple lists (of multiple participants) where the timestamp (first value of the tuple) will not change but the second (marked value) will. What I want to do is plot a graph where I can compare the marked values (therefore, the x-axis will be the time and the y-axis the marked values).
The way I want to compare the data is by horizontal bars where a different color would represent the marked value. These values range from 0 - 10. Thus, for each of these values, I would like to assign a color. In this way, there would be multiple horizontal bars, for each participant, and for each marked value, a different color (so that I can see the differences between the marked values of participants).
I do not wish for multiple bars for each participant - more like a stacked graph where the marked value would be one color, and those change according to the timestamp. In this way, I would be able to compare the marked values of the participants in a timeframe. I have an example from a paper:
Example
However, I couldn't find any way to do this yet.
Thanks.
You could convert each list to a dataframe, using the timestamp as the index. Concatenating these dataframes as columns of one combined dataframe gives a table that can be shown as a heatmap.
Here is some example code:
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
participant_1 = [(1, 8), (2, 2), (3, 2), (4, 1), (5, 3), (6, 5), (7, 6), (8, 6), (9, 8), (10, 9), (11, 9), (12, 9), (13, 3), (14, 3), (15, 4), (16, 5), (17, 6), (18, 6), (19, 7), (20, 8), (21, 8), (22, 9), (23, 9), (24, 9), (25, 9), (26, 9), (27, 9)]
participant_2 = [(1, 5), (2, 5), (3, 1), (4, 3), (5, 4), (6, 5), (7, 5), (8, 7), (9, 8), (10, 9), (11, 10), (12, 10), (13, 10), (14, 10), (15, 10), (16, 10), (17, 10), (18, 0), (19, 0), (20, 0), (21, 0), (22, 0), (23, 0), (24, 0), (25, 0), (26, 0), (27, 0)]
participants = [participant_1, participant_2]
names = ['participant 1', 'participant 2']
# One column per participant, indexed by timestamp
full_df = pd.concat([pd.DataFrame(particip_data, columns=['timestamp', name]).set_index('timestamp')
                     for name, particip_data in zip(names, participants)],
                    axis=1)
fig, ax = plt.subplots(figsize=(15, 3))
# Discrete colormap with 11 colors, one per marked value 0..10
cmap = plt.get_cmap('turbo', 11)
sns.heatmap(ax=ax, data=full_df.T, annot=True,
            cmap=cmap, vmin=-0.5, vmax=10.5, cbar_kws={'ticks': np.arange(11), 'pad': 0.02})
ax.tick_params(labelrotation=0)
plt.tight_layout()
plt.show()
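To see the shape of the table that the heatmap draws, without pandas, here is a rough pure-Python sketch. It uses a shortened copy of the sample data, and the names `series`, `timestamps`, and `grid` are illustrative assumptions rather than part of the answer above.

```python
# Shortened sample data (first four timestamps only), for illustration.
participant_1 = [(1, 8), (2, 2), (3, 2), (4, 1)]
participant_2 = [(1, 5), (2, 5), (3, 1), (4, 3)]
series = {"participant 1": participant_1, "participant 2": participant_2}

# Shared x-axis: every timestamp seen in any participant's data.
timestamps = sorted({t for data in series.values() for t, _ in data})

# One row per participant, one cell per timestamp (None where a value is missing).
grid = {}
for name, data in series.items():
    lookup = dict(data)  # timestamp -> marked value
    grid[name] = [lookup.get(t) for t in timestamps]

for name, row in grid.items():
    print(name, row)
```

Each row corresponds to one horizontal bar in the plot, and each cell is the value that gets mapped to a color.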

Format items in a list automatically

How would I correctly format a list of items without having to manually do it?
xy_coords = [(15, 5),
(9, 0),
(3, 5),
(13, 7),
(21, 1),
(19, 22),
(22, 2),
(11, 11),
(10, 21),
(24, 2),
(19, 19)]
First of all, the variable xy_coords is not a list. You will get an error if you run that line.
Ignoring that, and assuming xy_coords is a string, it will look like this:
xy_coords = '[(15, 5) (9, 0) (3, 5) (13, 7) (21, 1) (19, 22) (22, 2) (11, 11) (10, 21) (24, 2) (19, 19)]'
(note the single quotes above, which make xy_coords a string)
Now, to add the commas between each tuple, you can do this:
new = ''
a = xy_coords.split(') ')
for k in a[:-1]:
    new += (k + '), ')
new += a[-1]
print(new)
OUTPUT
[(15, 5), (9, 0), (3, 5), (13, 7), (21, 1), (19, 22), (22, 2), (11, 11), (10, 21), (24, 2), (19, 19)]
If xy_coords is a string, I would use a regular expression to find every (x, y) pair and then convert the strings to integers.
This can be done with the re.findall function. The pattern can be minimal and match only the two coordinates. Because the pattern uses groups, re.findall returns a list of tuples of strings, which you then convert to int.
For instance:
import re
xy_coords = '[(15, 5) (9, 0) (3, 5) (13, 7) (21, 1) (19, 22) (22, 2) (11, 11) (10, 21) (24, 2) (19, 19)]'
xy_coords = [
    tuple(map(int, coord))
    for coord in re.findall(r"(\d+),\s*(\d+)", xy_coords)
]
print(xy_coords)
The result is a list of int tuples:
[(15, 5), (9, 0), (3, 5), (13, 7), (21, 1), (19, 22), (22, 2), (11, 11), (10, 21), (24, 2), (19, 19)]

Build 2 lists in one go while reading from file, pythonically

I'm reading a big file with hundreds of thousands of number pairs representing the edges of a graph. I want to build 2 lists as I go: one with the forward edges and one with the reversed.
Currently I'm doing an explicit for loop, because I need to do some pre-processing on the lines I read. However, I'm wondering if there is a more pythonic approach to building those lists, like list comprehensions, etc.
But, as I have 2 lists, I don't see a way to populate them using comprehensions without reading the file twice.
My code right now is:
with open('SCC.txt') as data:
    for line in data:
        line = line.rstrip()
        if line:
            edge_list.append((int(line.rstrip().split()[0]), int(line.rstrip().split()[1])))
            reversed_edge_list.append((int(line.rstrip().split()[1]), int(line.rstrip().split()[0])))
I would keep your logic, as it is already the Pythonic approach; just don't split/rstrip the same line multiple times:
with open('SCC.txt') as data:
    for line in data:
        spl = line.split()
        if spl:
            i, j = map(int, spl)
            edge_list.append((i, j))
            reversed_edge_list.append((j, i))
Calling rstrip again after you have already called it is redundant, and even more so when you are splitting, since split() with no arguments already discards the surrounding whitespace. Splitting just once saves a lot of unnecessary work.
You can also use csv.reader to read the data and filter out empty rows, provided the values are delimited by a single space:
from csv import reader
with open('SCC.txt') as data:
    edge_list, reversed_edge_list = [], []
    for i, j in filter(None, reader(data, delimiter=" ")):
        i, j = int(i), int(j)
        edge_list.append((i, j))
        reversed_edge_list.append((j, i))
Or, if the values are delimited by multiple whitespace characters, you can use map(str.split, data):
for i, j in filter(None, map(str.split, data)):
    i, j = int(i), int(j)
Whatever you choose will be faster than going over the data twice or splitting the same lines multiple times.
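The single-split loop can also be wrapped in a small function, which makes it easy to try on sample lines before pointing it at the real file. This is only a sketch: `read_edges` and the sample lines are hypothetical, and it assumes every non-blank line holds exactly two integers.

```python
def read_edges(lines):
    """Return (edge_list, reversed_edge_list) from whitespace-separated integer pairs."""
    edge_list, reversed_edge_list = [], []
    for line in lines:
        fields = line.split()  # split once; also drops surrounding whitespace
        if not fields:         # skip blank lines
            continue
        i, j = map(int, fields)
        edge_list.append((i, j))
        reversed_edge_list.append((j, i))
    return edge_list, reversed_edge_list


# Sample lines standing in for the contents of the file.
sample = ["1 2\n", "\n", "3 4\n", "  5   6  \n"]
forward, backward = read_edges(sample)
print(forward)
print(backward)
```

Because an open file object is itself an iterable of lines, it can be passed to the function directly inside a with block.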
You can't create two lists in one comprehension, so instead of doing the same operations twice for the two lists, one viable option is to build one of them and then create the second by reversing each entry in the first. That way you don't iterate over the file twice.
To that end, you could create the first list, edge_list, with a comprehension (not sure why you called rstrip again on it):
edge_list = [tuple(map(int, line.split())) for line in data if line.strip()]
And now go through each entry and reverse it with [::-1] to create its reversed sibling, reverse_edge_list.
Using mock data for edge_list:
edge_list = [(1, 2), (3, 4), (5, 6)]
Reversing it could look like this:
reverse_edge_list = [t[::-1] for t in edge_list]
Which now looks like:
reverse_edge_list
[(2, 1), (4, 3), (6, 5)]
Maybe not clearer, but shorter:
with open('SCC.txt') as data:
    process_line = lambda line, r: (int(line.rstrip().split()[r]), int(line.rstrip().split()[1-r]))
    edge_list, reversed_edge_list = map(list, zip(*[(process_line(line, 0), process_line(line, 1))
                                                    for line in data
                                                    if line.rstrip()]))
Here comes a solution
A test file:
In[19]: f = ["{} {}".format(i,j) for i,j in zip(range(10), range(10, 20))]
In[20]: f
Out[20]:
['0 10',
'1 11',
'2 12',
'3 13',
'4 14',
'5 15',
'6 16',
'7 17',
'8 18',
'9 19']
One liner using comprehension, zip and map:
In[27]: l, l2 = map(list,zip(*[(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f]))
In[28]: l
Out[28]:
[(0, 10),
(1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15),
(6, 16),
(7, 17),
(8, 18),
(9, 19)]
In[29]: l2
Out[29]:
[(10, 0),
(11, 1),
(12, 2),
(13, 3),
(14, 4),
(15, 5),
(16, 6),
(17, 7),
(18, 8),
(19, 9)]
Explaining: with [(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f] we build a list in which each item pairs an edge tuple with its reversed form:
In[24]: [(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f]
Out[24]:
[((0, 10), (10, 0)),
((1, 11), (11, 1)),
((2, 12), (12, 2)),
((3, 13), (13, 3)),
((4, 14), (14, 4)),
((5, 15), (15, 5)),
((6, 16), (16, 6)),
((7, 17), (17, 7)),
((8, 18), (18, 8)),
((9, 19), (19, 9))]
Applying zip to the unpacked list separates the pairs again, so we get 2 tuples: the first contains the original edge tuples and the second contains the reversed ones:
In[25]: list(zip(*[(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f]))
Out[25]:
[((0, 10),
(1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15),
(6, 16),
(7, 17),
(8, 18),
(9, 19)),
((10, 0),
(11, 1),
(12, 2),
(13, 3),
(14, 4),
(15, 5),
(16, 6),
(17, 7),
(18, 8),
(19, 9))]
Almost there: we just use map to transform those tuples into lists.
EDIT:
As @PadraicCunningham asked, to filter out empty lines just add an if x to the comprehension: [ ... for x in f if x]
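As a sketch of the filtered variant, the version below parses each line only once and skips blank lines. It tests x.strip() rather than x, on the assumption that lines read from a real file still carry their trailing newline; the sample list `f` and the name `parsed` are illustrative.

```python
# Sample lines, including an empty one and one that is only a newline.
f = ["0 10", "", "1 11", "\n", "2 12"]

# Parse each non-blank line once into a tuple of ints.
parsed = (tuple(map(int, x.split())) for x in f if x.strip())

# Pair every tuple with its reverse, then unzip into two lists.
l, l2 = map(list, zip(*((p, p[::-1]) for p in parsed)))
print(l)
print(l2)
```

Note that the unpacking into l and l2 fails if no lines survive the filter, because zip then yields nothing to unpack.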

Python: comparison of two dict lists

Here is what I want to achieve:
I have got two lists of dictionaries. All the dictionaries have the following structure:
dictionary = {'name':'MyName', 'state':'MyState'}
I would like to go through all the elements of both lists and compare the states of the entries with the same name. Here is the best way that I can imagine:
for d in list1:
    name = d['name']
    for d2 in list2:
        if d2['name'] == name:
            if d['state'] != d2['state']:
                # Do something
While I think that this approach would work, I wonder whether there is a more efficient and/or elegant way to perform this operation. Thank you for your ideas!
Have a look at product from itertools:
import itertools
xs = range(1,10)
ys = range(11,20)
zs = itertools.product(xs,ys)
list(zs)
[(1, 11), (1, 12), (1, 13), (1, 14), (1, 15), (1, 16), (1, 17), (1, 18), (1, 19), (2, 11), (2, 12), (2, 13), (2, 14), (2, 15), (2, 16), (2, 17), (2, 18), (2, 19), (3, 11), (3, 12), (3, 13), (3, 14), (3, 15), (3, 16), (3, 17), (3, 18), (3, 19), (4, 11), (4, 12), (4, 13), (4, 14), (4, 15), (4, 16), (4, 17), (4, 18), (4, 19), (5, 11), (5, 12), (5, 13), (5, 14), (5, 15), (5, 16), (5, 17), (5, 18), (5, 19), (6, 11), (6, 12), (6, 13), (6, 14), (6, 15), (6, 16), (6, 17), (6, 18), (6, 19), (7, 11), (7, 12), (7, 13), (7, 14), (7, 15), (7, 16), (7, 17), (7, 18), (7, 19), (8, 11), (8, 12), (8, 13), (8, 14), (8, 15), (8, 16), (8, 17), (8, 18), (8, 19), (9, 11), (9, 12), (9, 13), (9, 14), (9, 15), (9, 16), (9, 17), (9, 18), (9, 19)]
A couple of other things:
When you are only representing two things, it is common to use a tuple (or even a named tuple), so have a think about why they are dicts to begin with. You might have a great reason :)
[('name','state'),('name','state'),('name','state')...]
Another approach would be to compare elements directly; for example, you could check the intersection of setA (list of dicts 1) and setB (list of dicts 2):
>>> listA = [('fred','A'), ('bob','B'), ('mary', 'D'), ('eve', 'E')]
>>> listB = [('fred','X'), ('clive', 'C'), ('mary', 'D'), ('ben','B')]
# your listA and listB could be sets to begin with
>>> set.intersection(set(listA),set(listB))
{('mary', 'D')}
This approach, however, does not allow for duplicates.
The most elegant way I can think of is a list comprehension.
[[do_something() for d1 in list1 if d1["name"] == d2["name"] and d1["state"] != d2["state"]] for d2 in list2]
But that's kind of the same code.
You can also make your sample code a bit more elegant by reducing it a bit:
for d in list1:
    for d2 in list2:
        if d2['name'] == d['name'] and d['state'] != d2['state']:
            # Do something
The other answers are functional (they deliver the correct answer), but won't perform well for large lists because they use nested iteration -- for lists of length N, the number of steps they use grows like N^2. This isn't a concern if the lists are small; but if the lists are big, the number of iterations would explode.
An alternate approach that keeps time complexity linear with N goes like this (being pretty verbose):
##
## sample data
data = list()
data.append( [
dict(name='a', state='0'),
dict(name='b', state='1'),
dict(name='c', state='3'),
dict(name='d', state='5'),
dict(name='e', state='7'),
dict(name='f', state='10'),
dict(name='g', state='11'),
dict(name='h', state='13'),
dict(name='i', state='14'),
dict(name='l', state='19'),
])
data.append( [
dict(name='a', state='0'),
dict(name='b', state='1'),
dict(name='c', state='4'),
dict(name='d', state='6'),
dict(name='e', state='8'),
dict(name='f', state='10'),
dict(name='g', state='12'),
dict(name='j', state='16'),
dict(name='k', state='17'),
dict(name='m', state='20'),
])
##
## coalesce lists to a single flat dict for searching
dCombined = {}
for d in data:
    ## update (rather than reassign) so entries from earlier lists are kept
    dCombined.update({ i['name'] : i['state'] for i in d })
##
## to record mismatches
names = []
##
## iterate over lists -- individually / not nested
for d in data:
    for i in d:
        if i['name'] in dCombined and i['state'] != dCombined[i['name']]:
            names.append(i['name'])
##
## see result
print(names)
Caveats:
The OP didn't say if there could be repeated names within a list; that would change this approach a bit.
Depending on the details of "do something" you might record something other than just the names: you could store references to or copies of the individual dict objects, or whatever "do something" requires.
The trade-off for this approach is that it requires more memory than the previous answers: the lookup dict holds one entry per distinct name, so the extra memory is O(N), plus the list of recorded mismatches.
Notes:
This approach also works when you have more than 2 lists to compare -- e.g. if there were 5 lists, my alternative is still O(N) in time and memory, while the previous answers would be O(N^5) in time!
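For the two-list case in the question, the same idea can be written more compactly. This is a minimal sketch, assuming names are unique within each list; the sample lists and the names `state_by_name` and `mismatches` are illustrative.

```python
list1 = [
    {'name': 'a', 'state': '0'},
    {'name': 'c', 'state': '3'},
    {'name': 'h', 'state': '13'},
]
list2 = [
    {'name': 'a', 'state': '0'},
    {'name': 'c', 'state': '4'},
    {'name': 'j', 'state': '16'},
]

# Index the second list by name once: one pass over list2.
state_by_name = {d['name']: d['state'] for d in list2}

# One pass over list1, with a constant-time dict lookup per entry.
mismatches = [
    (d['name'], d['state'], state_by_name[d['name']])
    for d in list1
    if d['name'] in state_by_name and d['state'] != state_by_name[d['name']]
]
print(mismatches)
```

Each mismatch records the name together with both states, which is usually enough to drive whatever "do something" needs; names that appear in only one list are ignored.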
