Sort data by date and time in Python

I have my data in txt file.
1 B F 2019-03-10
1 C G 2019-03-11
1 B H 2019-03-10
1 C I 2019-03-10
1 B J 2019-03-10
2 A K 2019-03-10
1 D L 2019-03-10
2 D M 2019-03-10
2 E N 2019-03-11
1 E O 2019-03-10
What I need to do is split the data according to the first column.
So all rows with the number 1 in the first column go into one list (or dictionary, or whatever) and all rows with the number 2 in the first column go into another. This is sample data; in the original data we do not know how many different numbers appear in the first column.
What I have to do next is sort the data for each key (in my case, for the numbers 1 and 2) by date and time. I could do that with data.txt, but not with the dictionary.
import csv
from operator import itemgetter

with open("data.txt") as file:
    reader = csv.reader(file, delimiter="\t")
    data = sorted(reader, key=itemgetter(0))
lines = sorted(data, key=itemgetter(3))
lines
OUTPUT:
[['1', 'B', 'F', '2019-03-10'],
['2', 'D', 'M', '2019-03-10'],
['1', 'B', 'H', '2019-03-10'],
['1', 'C', 'I', '2019-03-10'],
['1', 'B', 'J', '2019-03-10'],
['1', 'D', 'L', '2019-03-10'],
['2', 'A', 'K', '2019-03-10'],
['1', 'E', 'O', '2019-03-10'],
['1', 'C', 'G', '2019-03-11'],
['2', 'E', 'N', '2019-03-11']]
So what I need is to group the data by the number in the first column as well as to sort it by date and time. Could anyone please help me combine these two pieces of code? I am not sure whether I have to use a dictionary; maybe there is another way to do it.

You can sort the corresponding list for each key after splitting the data according to the first column:
def sort_by_time(key_items):
    return sorted(key_items, key=itemgetter(3))

d = {k: sort_by_time(v) for k, v in d.items()}
If each row has separate columns for date and time (date at index 3 and time at index 4, as in the output below), you can sort by several columns at once:
sorted(key_items, key=itemgetter(3, 4))
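For completeness, here is a minimal, self-contained sketch of the whole combine step, using a few of the rows above as inline sample data (reading them from data.txt instead would work the same way):

```python
from collections import defaultdict
from operator import itemgetter

rows = [
    ['1', 'B', 'F', '2019-03-10'],
    ['2', 'D', 'M', '2019-03-10'],
    ['1', 'C', 'G', '2019-03-11'],
    ['2', 'E', 'N', '2019-03-11'],
    ['1', 'B', 'H', '2019-03-10'],
]

# Group the rows by their first column
d = defaultdict(list)
for row in rows:
    d[row[0]].append(row)

# Sort each group by the date column (index 3)
d = {k: sorted(v, key=itemgetter(3)) for k, v in d.items()}
```

Unlike the itertools.groupby approach below, this grouping does not require the input to be sorted first.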

itertools.groupby can help build the lists:
from operator import itemgetter
from itertools import groupby
from pprint import pprint

# Read all the data, splitting on whitespace
with open('data.txt') as f:
    data = [line.split() for line in f]

# Sort by the indicated columns
data.sort(key=itemgetter(0, 3, 4))

# Build a dictionary keyed on the first column.
# Note: data must be pre-sorted by the groupby key for groupby to work correctly.
d = {group: list(items) for group, items in groupby(data, key=itemgetter(0))}
pprint(d)
Output:
{'1': [['1', 'B', 'F', '2019-03-10', '16:13:38.935'],
       ['1', 'B', 'H', '2019-03-10', '16:13:59.045'],
       ['1', 'C', 'I', '2019-03-10', '16:14:07.561'],
       ['1', 'B', 'J', '2019-03-10', '16:14:35.371'],
       ['1', 'D', 'L', '2019-03-10', '16:14:40.854'],
       ['1', 'E', 'O', '2019-03-10', '16:15:05.878'],
       ['1', 'C', 'G', '2019-03-11', '16:14:39.999']],
 '2': [['2', 'D', 'M', '2019-03-10', '16:13:58.641'],
       ['2', 'A', 'K', '2019-03-10', '16:14:43.224'],
       ['2', 'E', 'N', '2019-03-11', '16:15:01.807']]}
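Incidentally, sorting on itemgetter(3, 4) works here only because ISO dates and zero-padded times compare correctly as plain strings. If the timestamps were in some other format, one sketch (assuming the five-column layout shown above) would parse them with datetime.strptime and sort on real datetime objects:

```python
from datetime import datetime

def ts_key(row):
    # Combine the date (index 3) and time (index 4) columns into one datetime
    return datetime.strptime(row[3] + ' ' + row[4], '%Y-%m-%d %H:%M:%S.%f')

rows = [
    ['1', 'B', 'J', '2019-03-10', '16:14:35.371'],
    ['1', 'B', 'F', '2019-03-10', '16:13:38.935'],
    ['1', 'C', 'G', '2019-03-11', '16:14:39.999'],
]
rows.sort(key=ts_key)  # chronological order: F, J, G
```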

Related

Appending columns using loc pandas dataframe

I am working with a dataframe that I have created with the below code:
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'playerlookup': ['B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
                   'score': ['10', '9', '8', '7', '6', '5', '4', '3']})
I want to add a new column called "scorelookup" to this dataframe that for each row, takes the value in the 'playerlookup' column, searches for it in the 'player' column and then returns the score in a new column. For example, the value in the "scorelookup" column in the first row of the dataframe would be '9' because that was the score for player 'B'. In instances where the value in the 'playerlookup' column isn't contained within the 'player' column (for example the last row of the table which has a value of 'I' in the 'playerlookup' column), the value in that column would be blank.
I have tried using code like:
df['playerlookup'].apply(lambda n: df.loc[df['player'] == n, 'score'])
but have been unsuccessful.
Any help massively appreciated!
I hope this is the result you are looking for:
import pandas as pd

df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'playerlookup': ['B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
                   'score': ['10', '9', '8', '7', '6', '5', '4', '3']})

d1 = df[["playerlookup"]].copy()
d2 = df[["player", "score"]].copy()
d1.rename({'playerlookup': 'player'}, axis='columns', inplace=True)
df["scorelookup"] = d1.merge(d2, on='player', how='left')["score"]
The output:
player playerlookup score scorelookup
0 A B 10 9
1 B C 9 8
2 C D 8 7
3 D E 7 6
4 E F 6 5
5 F G 5 4
6 G H 4 3
7 H I 3 NaN
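As an aside, the same left-lookup can be written more compactly with Series.map, building a player-to-score mapping from the frame itself. This is just a sketch of an alternative, not a correction of the merge approach:

```python
import pandas as pd

df = pd.DataFrame({'player': ['A', 'B', 'C'],
                   'playerlookup': ['B', 'C', 'I'],
                   'score': ['10', '9', '8']})

# Map each lookup value through a player -> score Series;
# values missing from 'player' (like 'I') become NaN automatically.
df['scorelookup'] = df['playerlookup'].map(df.set_index('player')['score'])
```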

How to create a list of lists of all the n'th characters from a list of strings?

I have a list that contains multiple strings created from a FASTA format file.
The list is like this:
data = ['ATCCAGCT', 'GGGCAACT', 'ATGGATCT', 'AAGCAACC', 'TTGGAACT', 'ATGCCATT', 'ATGGCACT']
I want to get the characters at the first index of all the strings in the list and transfer them to another list and I do it like this:
list1 = []
x = 0
z = 0
while x < len(data):
    list1.append(data[x][z])
    x += 1
Now that I have the first index, how do I do that for every index of all the strings? Assuming they are all the same length.
Assuming they all have the same length, you can zip the strings.
The first string in the result contains all the first characters, the second all the second characters, and so on.
>>> res = ["".join(el) for el in zip(*data)]
>>> res
['AGAATAA', 'TGTATTT', 'CGGGGGG', 'CCGCGCG', 'AAAAACC', 'GATAAAA', 'CCCCCTC', 'TTTCTTT']
If all your strings are of the same length, you can use zip() to achieve this:
>>> data = ['ATCCAGCT', 'GGGCAACT', 'ATGGATCT', 'AAGCAACC', 'TTGGAACT', 'ATGCCATT', 'ATGGCACT']
>>> my_lists = list(zip(*data))  # list() is needed in Python 3, where zip returns an iterator
>>> my_lists[0]  # chars from `0`th index of each string
('A', 'G', 'A', 'A', 'T', 'A', 'A')
>>> my_lists[1]  # chars from `1`st index of each string
('T', 'G', 'T', 'A', 'T', 'T', 'T')
# ... so on
If you want each of these lists stored in separate variables, then you can also unpack these like:
a, b, c, d, e, f, g, h = zip(*data)
# where:
# a = ('A', 'G', 'A', 'A', 'T', 'A', 'A') ## chars from `0`th index
# b = ('T', 'G', 'T', 'A', 'T', 'T', 'T') ## chars from `1`st index
In case your strings are of different length, you can use itertools.zip_longest() in Python 3 (or itertools.izip_longest() in Python 2) as:
>>> from itertools import zip_longest # In Python 3
# OR, from itertools import izip_longest # In Python 2
>>> my_list = ['abc', 'de', 'fghi', 'j']
>>> list(zip_longest(*my_list, fillvalue=''))
[('a', 'd', 'f', 'j'), ('b', 'e', 'g', ''), ('c', '', 'h', ''), ('', '', 'i', '')]
Skipping the fillvalue param in the above example fills the empty elements with None instead:
[('a', 'd', 'f', 'j'), ('b', 'e', 'g', None), ('c', None, 'h', None), (None, None, 'i', None)]
All you really need is a for loop that loops for every item in the list. This is how I would do it:
data = ['ATCCAGCT', 'GGGCAACT', 'ATGGATCT', 'AAGCAACC', 'TTGGAACT', 'ATGCCATT', 'ATGGCACT']
first_index = []
for i in data:
    first_index.append(i[0])
print(first_index)
This outputs a list which looks like this:
['A', 'G', 'A', 'A', 'T', 'A', 'A']
So, I've taken your question to mean getting the first letter of each string in the data
(ATCCAGCT).
The way I would do it is:
data = ['.', '..', '...']  # your values
list1 = []                 # empty list
for x in data:             # for each entry
    list1.append(x[0])     # add the first letter
Use a list comprehension. If you want to use a loop, then deconstruct this to the loop form. You went to a lot of indirect effort in your posted code.
first = [seq[0] for seq in data]
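If you do want this for every position at once, the comprehension above extends naturally; a sketch, assuming all strings have equal length:

```python
data = ['ATCCAGCT', 'GGGCAACT', 'ATGGATCT']

# One inner list per character position: columns[i] holds the i-th
# character of every string in data
columns = [[seq[i] for seq in data] for i in range(len(data[0]))]
```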
data = ['ATCCAGCT', 'GGGCAACT', 'ATGGATCT', 'AAGCAACC', 'TTGGAACT', 'ATGCCATT', 'ATGGCACT']
mydict = {}
for position in range(len(data[0])):
    mydict[str(position)] = []
for seq in data:
    for position, nucleotide in enumerate(seq):
        mydict[str(position)].append(nucleotide)
for position in mydict.keys():
    print(position, mydict[position], "".join(mydict[position]))
Output:
0 ['A', 'G', 'A', 'A', 'T', 'A', 'A'] AGAATAA
1 ['T', 'G', 'T', 'A', 'T', 'T', 'T'] TGTATTT
2 ['C', 'G', 'G', 'G', 'G', 'G', 'G'] CGGGGGG
3 ['C', 'C', 'G', 'C', 'G', 'C', 'G'] CCGCGCG
4 ['A', 'A', 'A', 'A', 'A', 'C', 'C'] AAAAACC
5 ['G', 'A', 'T', 'A', 'A', 'A', 'A'] GATAAAA
6 ['C', 'C', 'C', 'C', 'C', 'T', 'C'] CCCCCTC
7 ['T', 'T', 'T', 'C', 'T', 'T', 'T'] TTTCTTT

How do I check whether a variable was changed?

Say I have two lists
[['1', '2', '1', '3', '1', '3'], ['A', 'G', 'T', 'T', 'T', 'G']]
In this case each index matches the number on the left with the letter on the right, so 1 : A, 2 : G, and so on. I want to know whether AT LEAST one number changes its mapping. So if 1 : A changes to 1 : T, True should be returned.
You can create a dictionary:
s = [['1', '2', '1', '3', '1', '3'], ['A', 'G', 'T', 'T', 'T', 'G']]
new_s = {b:a for a, b in zip(*s)}
final_vals = [a for a, b in new_s.items() if any(d == b for c, d in new_s.items() if c != a)]
Output:
['A', 'T']
Actually perform the assignments into a dictionary, and stop whenever one changes an existing entry.
def check_overwrite(keys, values):
    d = {}
    for k, v in zip(keys, values):
        if d.setdefault(k, v) != v:
            return True
    return False

print(check_overwrite(['1', '2', '1', '3', '1', '3'], ['A', 'G', 'T', 'T', 'T', 'G']))
If you want to know not only whether something changed but what changed, this (adapted from the above) should help:
>>> numbers = ['1', '2', '1', '3', '1', '3']
>>> letters = ['A', 'G', 'T', 'T', 'T', 'G']
>>> def check_overwrite(keys, values):
...     d = {}
...     overlap = {}
...     for k, v in zip(keys, values):
...         if d.setdefault(k, v) != v:
...             overlap[k] = v
...     return overlap
...
>>> check_overwrite(numbers, letters)
{'1': 'T', '3': 'G'}

How to remove floats from a list of lists?

For example:
[['D', 'D', '-', '1', '.', '0'],['+', '2', '.', '0', 'D', 'D'],['D', 'D', 'D']]
This is:
D D -1.0
+2.0 D D
D D D
I want to extract the values, put them in different variables, and know the line and column where each value was (so I can put a symbol that corresponds to the old value).
D D x
y D D
D D D
[['D', 'D', '-1.0'],['+2.0', 'D', 'D'],['D', 'D', 'D']]
Don't create a list of lists of single characters. Take the lines directly from your file and split them with the help of regular expressions:
import re

maze = []
for line in arq:
    maze.append(re.findall(r'[-+][0-9.]+|\S', line))
import itertools

merged = list(itertools.chain(*list2d))
print([x for x in merged if not (x.isdigit() or x in '-+.')])
Use re.findall. The pattern [-+]?\d*\.\d+|\d+ is used to extract float values from a string.
import re
list2d = [['D', 'D', '-', '1', '.', '0'],['+', '2', '.', '0', 'D', 'D'],['D', 'D', 'D']]
lists = []
for l in list2d:
    s = ''.join(l)
    matches = re.findall(r"D|[-+]?\d*\.\d+|\d+", s)
    lists.append(matches)
print(lists)
# Output
[['D', 'D', '-1.0'], ['+2.0', 'D', 'D'], ['D', 'D', 'D']]
I'm not sure if this is what you want; you could add more information to your description.
import csv

csv_file = open("maze.txt")
csv_reader = csv.reader(csv_file)
maze = []
for line in csv_reader:
    for char in line:
        maze.append(char.split())
print(maze)
# Output
[['D', 'D', '-1.0'], ['+2.0', 'D', 'D'], ['D', 'D', 'D']]
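None of the answers above records where the numeric values were, which the question also asks for. A sketch of that part follows; the 'x' placeholder and the merged-token layout from the desired output are assumptions:

```python
list2d = [['D', 'D', '-1.0'], ['+2.0', 'D', 'D'], ['D', 'D', 'D']]

values = []  # (row, col, value) for every numeric cell
for r, row in enumerate(list2d):
    for c, cell in enumerate(row):
        try:
            values.append((r, c, float(cell)))
            row[c] = 'x'  # hypothetical placeholder for the extracted value
        except ValueError:
            pass  # non-numeric cells such as 'D' are left untouched
```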

How to count the number of times a certain pattern in a sublist occurs within a list and then append that count to the sublist?

The challenge is that I want to count the number of times a certain pattern of items occurs in a sub-list at certain indices.
For example, I'd like to count the number of times a unique pattern occurs at index 0 and index 1. 'a' and 'z' occur three times below at indices 0 and 1, while '1' and '2' occur two times at indices 0 and 1. I'm only concerned with the pair at indices 0 and 1; I'd like to know the count of each unique pair and then append that count back to the sub-list.
List = [['a','z','g','g','g'],['a','z','d','d','d'],['a','z','z','z','d'],['1','2','f','f','f'],['1','2','3','f','f'],['1','1','g','g','g']]
Desired_List = [['a','z','g','g','g',3],['a','z','d','d','d',3],['a','z','z','z','d',3],['1','2','f','f','f',2],['1','2','3','f','f',2],['1','1','g','g','g',1]]
Currently, my attempt is this:
from collections import Counter

l1 = Counter(map(lambda x: x[0] + "|" + x[1], List))
Deduped_Original_List = map(lambda x: Counter.keys().split("|"), l1)
Counts = map(lambda x: Counter.values(), l1)
for ele_a, ele_b in zip(Deduped_Original_List, Counts):
    ele_a.append(ele_b)
This clearly doesn't work because in the process I lose index 2,3, and 4.
You can use a list comprehension with collections.Counter:
from collections import Counter

lst = [['a','z','g','g','g'],['a','z','d','d','d'],['a','z','z','z','d'],['1','2','f','f','f'],['1','2','3','f','f'],['1','1','g','g','g']]
cnt = Counter([tuple(l[:2]) for l in lst])
lst_output = [l + [cnt[tuple(l[:2])]] for l in lst]
print(lst_output)
Output:
[['a', 'z', 'g', 'g', 'g', 3], ['a', 'z', 'd', 'd', 'd', 3], ['a', 'z', 'z', 'z', 'd', 3], ['1', '2', 'f', 'f', 'f', 2], ['1', '2', '3', 'f', 'f', 2], ['1', '1', 'g', 'g', 'g', 1]]
>>> import collections
>>> List = [['a','z','g','g','g'],['a','z','d','d','d'],['a','z','z','z','d'],['1','2','f','f','f'],['1','2','3','f','f'],['1','1','g','g','g']]
>>> patterns = ['az', '12']
>>> answer = collections.defaultdict(int)
>>> for subl in List:
...     for pattern in patterns:
...         if all(a == b for a, b in zip(subl, pattern)):
...             answer[pattern] += 1
...             break
...
>>> for i, subl in enumerate(List):
...     if ''.join(subl[:2]) in answer:
...         List[i].append(answer[''.join(subl[:2])])
...
>>> List
[['a', 'z', 'g', 'g', 'g', 3], ['a', 'z', 'd', 'd', 'd', 3], ['a', 'z', 'z', 'z', 'd', 3], ['1', '2', 'f', 'f', 'f', 2], ['1', '2', '3', 'f', 'f', 2], ['1', '1', 'g', 'g', 'g']]
>>>
I like the Counter approach of YS-L. Here is another approach:
>>> List = [['a','z','g','g','g'], ['a','z','d','d','d'], ['a','z','z','z','d'],['1','2','f','f','f'], ['1','2','3','f','f'], ['1','1','g','g','g']]
>>> d = {}
>>> for i in List:
...     key = i[0] + i[1]
...     if not d.get(key, None): d[key] = 1
...     else: d[key] += 1
...
>>> Desired_List = [li + [d[li[0] + li[1]]] for li in List]
>>> Desired_List
[['a', 'z', 'g', 'g', 'g', 3], ['a', 'z', 'd', 'd', 'd', 3], ['a', 'z', 'z', 'z', 'd', 3], ['1', '2', 'f', 'f', 'f', 2], ['1', '2', '3', 'f', 'f', 2], ['1', '1', 'g', 'g', 'g', 1]]
