The data I load with the code below ends up in the following format:
new_list = [['1', '100', 'A', 'B,A'], ['2', '200', 'A', 'T'],
['3', '200', 'H', 'A,C'], ['4', '300', 'W', 'T'],
['5', '400', 'I', 'BABA,ABB'], ['6', '500', 'Q', 'LP,AL']]
What I want to achieve is sorting the last column alphabetically, changing the list to:
new_list = [['1', '100', 'A', 'A,B'], ['2', '200', 'A', 'T'],
['3', '200', 'H', 'A,C'], ['4', '300', 'W', 'T'],
['5', '400', 'I', 'ABB,BABA'], ['6', '500', 'Q', 'AL,LP']]
However, I don't know how to sort only a specified index in this list.
Should I split the last column on ','?
Sample data:
# Data
# I
# don't
# need
1 100 982 A B,A 41
2 200 982 A T 42
3 200 982 H C 43
4 300 982 W T 43
5 400 982 I BABA,ABB 44
6 500 982 Q LP,AL 44
Loading the data:
filename = 'test.txt'
new_list = []
readFile = open(filename, 'r')
lines = readFile.readlines()
for line in lines:
    if not line[0].startswith('#'):
        linewords = line.split()
        new_list.append([linewords[0],
                         linewords[1],
                         linewords[3],
                         linewords[4]])
Split it on ",", then sort, then join the list:
new_list.append([linewords[0],
                 linewords[1],
                 linewords[3],
                 ",".join(sorted(linewords[4].split(",")))])
First split, then sort, then join. There may be more than one blank space, so you can use a regex split.
import re
p = re.compile(' +')
for line in lines:
    if line.startswith('#'):
        continue
    linewords = p.split(line)
    lastword = linewords[4].split(',')
    lastword.sort()
    new_list.append([linewords[0], linewords[1], linewords[3], ','.join(lastword)])
Try:
def sort_last(inner_list):
    last = inner_list[-1].split(',')
    last.sort()
    last = ','.join(last)
    return inner_list[:-1] + [last]

new_list = [sort_last(l) for l in new_list]
Or, as a one-liner:
[x[:-1] + [','.join(sorted(x[-1].split(',')))] for x in new_list]
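Either form, run on the question's original new_list, should produce the desired output:

result = [x[:-1] + [','.join(sorted(x[-1].split(',')))] for x in new_list]
print(result)
# [['1', '100', 'A', 'A,B'], ['2', '200', 'A', 'T'],
#  ['3', '200', 'H', 'A,C'], ['4', '300', 'W', 'T'],
#  ['5', '400', 'I', 'ABB,BABA'], ['6', '500', 'Q', 'AL,LP']]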
Try this:
list2 = []
for level1 in new_list:
    _temp = []
    for level2 in level1:
        if "," in level2:
            _temp.append(",".join(sorted(level2.split(","))))
        else:
            _temp.append(level2)
    list2.append(_temp)
print(list2)
Related
I have my data in txt file.
1 B F 2019-03-10
1 C G 2019-03-11
1 B H 2019-03-10
1 C I 2019-03-10
1 B J 2019-03-10
2 A K 2019-03-10
1 D L 2019-03-10
2 D M 2019-03-10
2 E N 2019-03-11
1 E O 2019-03-10
What I need to do is to split the data according to the first column.
So all rows with the number 1 in the first column go to one list (or dictionary or whatever), and all rows with the number 2 in the first column go to another list, or whatever. This is sample data; in the original data we do not know how many different numbers are in the first column.
What I have to do next is to sort the data for each key (in my case for the numbers 1 and 2) by date and time. I could do that with data.txt, but not with the dictionary.
with open("data.txt") as file:
reader = csv.reader(file, delimiter="\t")
data=sorted(reader, key=itemgetter(0))
lines = sorted(data, key=itemgetter(3))
lines
OUTPUT:
[['1', 'B', 'F', '2019-03-10'],
['2', 'D', 'M', '2019-03-10'],
['1', 'B', 'H', '2019-03-10'],
['1', 'C', 'I', '2019-03-10'],
['1', 'B', 'J', '2019-03-10'],
['1', 'D', 'L', '2019-03-10'],
['2', 'A', 'K', '2019-03-10'],
['1', 'E', 'O', '2019-03-10'],
['1', 'C', 'G', '2019-03-11'],
['2', 'E', 'N', '2019-03-11']]
So what I need is to group the data by the number in the first column as well as to sort it by the date and time. Could anyone please help me combine these two pieces of code somehow? I am not sure if I have to use a dictionary; maybe there is another way to do it.
You can sort the corresponding list for each key after splitting the data according to the first column:
def sort_by_time(key_items):
    return sorted(key_items, key=itemgetter(3))

d = {k: sort_by_time(v) for k, v in d.items()}
If d has separate elements for time and for date, then you can sort by several columns:
sorted(key_items, key=itemgetter(2, 3))
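The snippet above assumes d already maps each first-column value to its list of rows. A minimal sketch for building such a dictionary with collections.defaultdict (my addition, not part of the original answer; it assumes the same tab-delimited data.txt) could look like this:

import csv
from collections import defaultdict
from operator import itemgetter

d = defaultdict(list)
with open("data.txt") as f:
    for row in csv.reader(f, delimiter="\t"):
        d[row[0]].append(row)   # group rows by the first column

# sort each group by the date column
d = {k: sorted(v, key=itemgetter(3)) for k, v in d.items()}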
itertools.groupby can help build the lists:
from operator import itemgetter
from itertools import groupby
from pprint import pprint
# Read all the data splitting on whitespace
with open('data.txt') as f:
    data = [line.split() for line in f]
# Sort by indicated columns
data.sort(key=itemgetter(0,3,4))
# Build a dictionary keyed on the first column
# Note: data must be pre-sorted by the groupby key for groupby to work correctly.
d = {group:list(items) for group,items in groupby(data,key=itemgetter(0))}
pprint(d)
Output:
{'1': [['1', 'B', 'F', '2019-03-10', '16:13:38.935'],
['1', 'B', 'H', '2019-03-10', '16:13:59.045'],
['1', 'C', 'I', '2019-03-10', '16:14:07.561'],
['1', 'B', 'J', '2019-03-10', '16:14:35.371'],
['1', 'D', 'L', '2019-03-10', '16:14:40.854'],
['1', 'E', 'O', '2019-03-10', '16:15:05.878'],
['1', 'C', 'G', '2019-03-11', '16:14:39.999']],
'2': [['2', 'D', 'M', '2019-03-10', '16:13:58.641'],
['2', 'A', 'K', '2019-03-10', '16:14:43.224'],
['2', 'E', 'N', '2019-03-11', '16:15:01.807']]}
Say I have two lists
[['1', '2', '1', '3', '1', '3'], ['A', 'G', 'T', 'T', 'T', 'G']]
In this case each index matches the number on the left with the letter on the right, so 1 : A, 2 : G, and so on. I want to know if AT LEAST one number on the left changes its mapping. So if 1 : A changes to 1 : T, I would have True returned.
You can create a dictionary:
s = [['1', '2', '1', '3', '1', '3'], ['A', 'G', 'T', 'T', 'T', 'G']]
new_s = {b: a for a, b in zip(*s)}   # map each letter to the last number paired with it
final_vals = [a for a, b in new_s.items() if any(d == b for c, d in new_s.items() if c != a)]   # letters whose number also appears with another letter
Output:
['A', 'T']
Actually perform the assignments in a dictionary, and stop whenever one changes an existing entry:
def check_overwrite(keys, values):
    d = {}
    for k, v in zip(keys, values):
        if d.setdefault(k, v) != v:
            return True
    return False

print(check_overwrite(['1', '2', '1', '3', '1', '3'], ['A', 'G', 'T', 'T', 'T', 'G']))  # True
If you want to know not only whether something changed but also what changed, this (stolen from above) should help:
>>> numbers = ['1', '2', '1', '3', '1', '3']
>>> letters = ['A', 'G', 'T', 'T', 'T', 'G']
>>> def check_overwrite(keys, values):
...     d = {}
...     overlap = {}
...     for k, v in zip(keys, values):
...         if d.setdefault(k, v) != v:
...             overlap[k] = v
...     return overlap
...
>>> check_overwrite(numbers, letters)
{'1': 'T', '3': 'G'}
I want to rearrange a file alphabetically. I also want the number to be printed next to its letter.
e.g.:
a 4
c 5
e 6
f 2
Here is my code:
f = open("1.txt","r")
r = f.read()
print(r)
r=r.split()
line=sorted(r)
for row in line:
print(line)
and here are the results I'm getting:
f 2
c 5
e 6
a 4
['2', '4', '5', '6', 'a', 'c', 'e', 'f']
['2', '4', '5', '6', 'a', 'c', 'e', 'f']
['2', '4', '5', '6', 'a', 'c', 'e', 'f']
['2', '4', '5', '6', 'a', 'c', 'e', 'f']
['2', '4', '5', '6', 'a', 'c', 'e', 'f']
['2', '4', '5', '6', 'a', 'c', 'e', 'f']
['2', '4', '5', '6', 'a', 'c', 'e', 'f']
['2', '4', '5', '6', 'a', 'c', 'e', 'f']
>>>
To get the pairs in sublists, map str.split on the file object and call sorted on that:
with open("in.txt") as f:
    print(sorted(map(str.split, f)))
in.txt:
e 6
c 5
f 2
a 4
Output:
[['a', '4'], ['c', '5'], ['e', '6'], ['f', '2']]
To sort a file alphabetically, just getting the lines, you would simply call sorted on the file object:
with open("test.txt") as f:
    print(sorted(f))
If you want to format the output:
with open("test.txt") as f:
    for sub in sorted(map(str.split, f)):
        print("letter = {}, num = {}".format(*sub))
letter = a, num = 4
letter = c, num = 5
letter = e, num = 6
letter = f, num = 2
Also, the reason you see ['2', '4', '5', '6', 'a', 'c', 'e', 'f'] is that calling split on .read() puts all the data into a single list, since split with no argument splits on any whitespace. When comparing strings lexicographically, digits are considered lower than letters, so 2 < a. Also beware when comparing string digits against each other: '11' > '100' is True because strings are compared character by character, and since '1' is considered greater than '0', '100' would appear before '11' in your sorted list. When comparing digits, cast to int.
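A quick illustration of the difference (plain standard-library behaviour):

nums = ['11', '100', '2']
print(sorted(nums))            # ['100', '11', '2'] - compared character by character
print(sorted(nums, key=int))   # ['2', '11', '100'] - compared numerically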
If you want to have a max of three scores per user, always keeping the most recent, you can use a deque with a maxlen of 3 and, after the initial sort, pickle the dict.
from csv import reader
from collections import deque, OrderedDict
import pickle
name, new_score = "foo",100
with open("test.txt") as f:
    d = OrderedDict((name, deque(map(int, rest), maxlen=3)) for name, *rest in reader(sorted(f)))
print(d)
d[name].append(new_score)
print(d)
with open("my_data.pkl","wb") as out:
pickle.dump(d, out)
with open("my_data.pkl","rb") as out:
print(pickle.load(out))
test.txt:
homer,2
foo,1,2,3
bar,4,5,6
Output:
OrderedDict([('bar', deque([4, 5, 6], maxlen=3)), ('foo', deque([1, 2, 3], maxlen=3)), ('homer', deque([2], maxlen=3))])
OrderedDict([('bar', deque([4, 5, 6], maxlen=3)), ('foo', deque([2, 3, 100], maxlen=3)), ('homer', deque([2], maxlen=3))])
OrderedDict([('bar', deque([4, 5, 6], maxlen=3)), ('foo', deque([2, 3, 100], maxlen=3)), ('homer', deque([2], maxlen=3))])
Once sorted, you just need to load to get the dict and dump after you have written.
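On later runs the flow is just load, update, dump. A small sketch under the same assumptions (my_data.pkl produced by the code above):

import pickle

with open("my_data.pkl", "rb") as f:
    d = pickle.load(f)        # get the dict of deques back

d["foo"].append(50)           # maxlen=3 keeps only the three most recent scores

with open("my_data.pkl", "wb") as f:
    pickle.dump(d, f)         # write the updated dict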
You need to use readlines() instead of read() to get each line of the file as a separate element of the list. Then a simple sort of the list will work.
f = open('1.txt', 'r')
# Use readlines instead of read to get a list of lines
lines = f.readlines()
print(''.join(lines))
# Now sort the list (default is by first letter)
lines.sort()
print(''.join(lines))
Alternatively, you could force the split function to use the end-of-line char '\n' instead of the default, which is all whitespace. But now you will need to join the list back with the newline char ('\n') instead of an empty string.
f = open('1.txt', 'r')
lines = f.read()
lines = lines.split('\n')
print('\n'.join(lines))
# Now sort the list (default is by first letter)
lines.sort()
print('\n'.join(lines))
I have lists like:
['a', '2', 'b', '1', 'c', '4']
['d', '5', 'e', '7', 'f', '4', 'g', '6']
And I want to make a dictionary with the letters as keys and the numbers as values. I mean:
{'a': 2, 'b': 1, 'c': 4, 'd':5, 'e':7, 'f':4, 'g':6}
You can try:
>>> l = ['a', '2', 'b', '1', 'c', '4']
>>> it = iter(l)
>>> dict(zip(it, it))
{'a': '2', 'c': '4', 'b': '1'}
First you create an iterator out of the list. Then, by zipping the iterator with itself, you take pairs of values from the list. Finally, with dict you transform these tuples into the dictionary you want.
If you also want to do the string to number conversion, then use:
{x: int(y) for x, y in zip(it, it)}
EDIT
If you don't want to use zip then:
{x: int(next(it)) for x in it}
l = ['a', '2', 'b', '1', 'c', '4']
d = {k:v for k,v in zip(l[::2], l[1::2])}
Or if you want the numbers to be actual numbers:
l = ['a', '2', 'b', '1', 'c', '4']
d = {k:int(v) for k,v in zip(l[::2], l[1::2])}
Use float(v) instead of int(v) if the numbers have the potential to be floating-point values instead of whole numbers.
Without using any built-in functions:
l = ['a', '2', 'b', '1', 'c', '4']
d = {}
l1 = l[::2]
l2 = l[1::2]
idx = 0
while True:
    try:
        d[l1[idx]] = l2[idx]
        idx += 1
    except IndexError:
        break
You can split the list into two, one containing the letters (the keys) and the other containing the numbers (the values), with
key_list = old_list[::2]
value_list = old_list[1::2]
Then you can loop over the two lists at once with zip and make the dictionary, as in the sketch below.
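A minimal sketch of that loop (names are illustrative):

old_list = ['a', '2', 'b', '1', 'c', '4']
key_list = old_list[::2]
value_list = old_list[1::2]

d = {}
for k, v in zip(key_list, value_list):
    d[k] = int(v)   # cast to int if you want actual numbers

print(d)   # {'a': 2, 'b': 1, 'c': 4}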
I have a text file which has only one column. What I need is to split the only column into a few columns. For example, assume my file consists of:
10
20
30
40
50
e
1467
1608
1733
1767
1878
e
1787
1353
1024
693
423
I need it to become as below:
10 1467 1787
20 1608 1353
30 1733 1024
40 1767 693
50 1878 423
I was just wondering if you could help me do it with a Python script. In addition, if I can do it by writing some commands in the OS X terminal, please let me know.
Here is an example of what is possible with list comprehensions and the itertools module.
>>> from itertools import dropwhile, izip, takewhile
>>> l = ['1', '2', 'X', '3', '4', 'X', '5', '6']
>>> splitter = 'X'
>>> fun = lambda e: e != splitter
>>> begin = [e for e in takewhile(fun, l)]
>>> end = [e for e in dropwhile(fun, l)][1:]
>>> begin, end
(['1', '2'], ['3', '4', 'X', '5', '6'])
>>> mid = [e for e in takewhile(fun, end)]
>>> end = [e for e in dropwhile(fun, end)][1:]
>>> begin, mid, end
(['1', '2'], ['3', '4'], ['5', '6'])
>>> [e for e in izip(begin, mid, end)]
[('1', '3', '5'), ('2', '4', '6')]
Of course, if the original list has a variable length, it is necessary to do this work in a loop.
I recommend testing this kind of statement in a BPython interpreter so you can easily try interactive examples.
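For a list of arbitrary length, a loop-based sketch (my addition, not from the original answer) that splits on the delimiter and then zips the pieces could be:

l = ['1', '2', 'X', '3', '4', 'X', '5', '6']
splitter = 'X'

groups = [[]]
for e in l:
    if e == splitter:
        groups.append([])    # start a new group at each delimiter
    else:
        groups[-1].append(e)

print(list(zip(*groups)))    # [('1', '3', '5'), ('2', '4', '6')]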
You can split the content of an entire file into a list using:
def read_data(filename):
    with open(filename) as f:
        return f.read().split()
Running data = read_data('test.txt') using a test.txt that contains:
10
20
30
e
11
21
31
e
12
22
32
Will result in:
data = ['10', '20', '30', 'e', '11', '21', '31', 'e', '12', '22', '32']
NOTE: test.txt can be formatted with spaces, tabs and newlines in any way as split() will handle them correctly!
The data should really be in a 2D array that does not contain the 'e' entries. This can be done using the following:
def list_to_grid(data):
    ret = []
    line = []
    for entry in data:
        if entry == 'e':
            if len(line) != 0:
                ret.append(line)
                line = []
        else:
            line.append(int(entry))
    if len(line) != 0:
        ret.append(line)
    return ret
NOTE: I'm sure there's a more Pythonic way of doing this, but it works.
Running data = list_to_grid(read_data('test.txt')) on the test.txt file will result in:
data = [[10, 20, 30], [11, 21, 31], [12, 22, 32]]
What you are doing is transposing the 2D array; that is, the entry at data[i][j] ends up at position data[j][i]. Now the data can be transposed to get the desired sequence:
def transpose(data):
    ret = []
    for i in range(0, len(data[0])):   # one output row per column of the input
        ret.append([data[j][i] for j in range(0, len(data))])
    return ret
Which for tdata = transpose(data) gives:
data = [[10, 20, 30], [11, 21, 31], [12, 22, 32]]
tdata = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
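As a side note, the same transposition can be done in one line with zip, assuming all rows have equal length:

tdata = [list(row) for row in zip(*data)]   # zip(*data) pairs up the i-th elements of every row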
Now print it out:
def print_data(data):
    for line in data:
        print(' '.join([str(x) for x in line]))
Using print_data(tdata) results in:
10 11 12
20 21 22
30 31 32
Which is what you wanted.
Note: Modified to reflect changed data formats
This is based on your (new) sample data, using 'e' as the group delimiter. The basic idea is to iterate over the lines in the file, grouping as it goes and starting a new group whenever the delimiter is seen.
# testdata contains:
10
20
30
40
50
e
1467
1608
1733
1767
1878
e
1787
1353
1024
693
423
DELIMITER = 'e'
groups = []
this_group = []
for l in open('testdata', 'r'):
    l = l.strip()
    if l == DELIMITER and this_group:
        groups.append(this_group)
        this_group = []
    else:
        this_group.append(l)

if this_group:
    groups.append(this_group)

for t in zip(*groups):
    print(' '.join(t))
10 1467 1787
20 1608 1353
30 1733 1024
40 1767 693
50 1878 423