Print a certain number of tuples from an unnamed list - python

So my exercise was to print the 10 most common words in a text file.
Assume I have already opened the file and created a dictionary that maps each separate word to its count.
Normally, I do this:
li = list()
for key, value in d.items():
    tpl = (value, key)
    li.append(tpl)
li = sorted(li, reverse=True)
print('Ten most common words:')
for count, word in li[:10]:
    print(word, count)
But my professor gave me a single line of code that can replace almost all of those lines:
print(sorted([(value, key) for key, value in d.items()], reverse=True))
However, I can't find a way to print only 10 tuples: since the list has no name, I can't use a for loop to print it. Can you help me out?

Separate the list creation from the print():
li = sorted([(value, key) for key, value in d.items()], reverse=True)
Now you can iterate through li:
for item in li[:10]:
    print(item)
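As an aside, the result of sorted() can also be sliced directly, without giving it a name. A minimal sketch, assuming d maps each word to its count as in the question; top_words and top_words_counter are hypothetical helper names, and collections.Counter.most_common is a standard-library alternative. The two break ties differently: most_common keeps equal counts in insertion order, while the reversed tuple sort orders tied words in reverse alphabetical order.

```python
from collections import Counter


def top_words(d, n=10):
    """Return the n (count, word) pairs with the highest counts.

    Assumes d maps each word to its count.
    """
    # The sorted() result is sliced directly; it never needs a name.
    return sorted(((count, word) for word, count in d.items()), reverse=True)[:n]


def top_words_counter(d, n=10):
    # Counter.most_common returns (word, count) pairs, highest count first.
    return Counter(d).most_common(n)


counts = {'the': 5, 'a': 3, 'cat': 1, 'sat': 2}
for count, word in top_words(counts, n=2):
    print(word, count)
```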

Related

How to sort a Python dictionary by a substring contained in the keys, according to the order set in a list?

I'm very new to Python and I'm stuck on a task. First I read a file containing a number of FASTA sequences, with their names, into a dictionary, then managed to select only the ones I want, based on substrings in the keys that are defined in the list "flu_genes".
Now I'm trying to reorder the items in this dictionary based on the order of the substrings in the list "flu_genes". I'm completely stuck: I found a way of reordering based on the order of keys in a list, but that doesn't fit my case, as the order is defined not by the keys themselves but by a substring within the keys.
I should also add that in this case the substring is at the end, in the format "_GENE"; however, it could be in the middle of the string in the same format, perhaps "GENE", so I'd rather not rely on code that finds the substring at the end of the string.
I hope this is clear enough and thanks in advance for any help!
"full_genome.fasta"
>A/influenza/1/1_NA
atgcg
>A/influenza/1/1_NP
ctgat
>A/influenza/1/1_FluB
agcta
>A/influenza/1/1_HA
tgcat
>A/influenza/1/1_FluC
agagt
>A/influenza/1/1_M
tatag
consensus = {}
flu_genes = ['_HA', '_NP', '_NA', '_M']
with open("full_genome.fasta", 'r') as myseq:
    for line in myseq:
        line = line.rstrip()
        if line.startswith('>'):
            key = line[1:]
        else:
            if key in consensus:
                consensus[key] += line
            else:
                consensus[key] = line
flu_fas = {key : val for key, val in consensus.items() if any(ele in key for ele in flu_genes)}
print("Dictionary after removal of keys : " + str(flu_fas))
>>>Dictionary after removal of keys : {'>A/influenza/1/1_NA': 'atgcg', '>A/influenza/1/1_NP': 'ctgat', '>A/influenza/1/1_HA': 'tgcat', '>A/influenza/1/1_M': 'tatag'}
#reordering by keys order (not going to work!) as in: https://try2explore.com/questions/12586065
reordered_dict = {k: flu_fas[k] for k in flu_genes}
A dictionary isn't sorted, but since Python 3.7 it is guaranteed to remember insertion order (in CPython 3.6 this was an implementation detail), and you're not going to change anything later, so you can do what you're doing.
The problem, of course, is that you're not working with the actual keys. So let's set up a list of the keys and sort it according to your criteria. Then you can do what you did before, except using the actual keys.
flu_genes = ['_HA', '_NP', '_NA', '_M']
def get_gene_index(k):
    for index, gene in enumerate(flu_genes):
        if k.endswith(gene):
            return index
    raise ValueError('I thought you removed those already')
reordered_keys = sorted(flu_fas.keys(), key=get_gene_index)
reordered_dict = {k: flu_fas[k] for k in reordered_keys}
for k, v in reordered_dict.items():
    print(k, v)
A/influenza/1/1_HA tgcat
A/influenza/1/1_NP ctgat
A/influenza/1/1_NA atgcg
A/influenza/1/1_M tatag
Normally I'd avoid scanning all of flu_genes for every key, but I'm assuming the number of records in the data file is much larger than the number of flu_genes, which makes that scan essentially a constant factor.
This may or may not be the best data structure for your application, but I'll leave that to code review.
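The question mentions that the tag might also appear in the middle of the key. A minimal variant, assuming a plain substring test is acceptable, replaces endswith with in; get_gene_index_anywhere is a hypothetical name. Substring matching is looser: '_M' would also match a tag such as '_MP', so the first matching entry in flu_genes wins.

```python
flu_genes = ['_HA', '_NP', '_NA', '_M']


def get_gene_index_anywhere(k):
    """Return the position in flu_genes of the first gene tag found anywhere in k."""
    for index, gene in enumerate(flu_genes):
        # 'in' looks for the tag anywhere in the key, not only at the end
        if gene in k:
            return index
    raise ValueError(f'no known gene tag in {k!r}')


flu_fas = {
    'A/influenza/1/1_NA': 'atgcg',
    'A/influenza/1/1_NP': 'ctgat',
    'A/influenza/1/1_HA': 'tgcat',
    'A/influenza/1/1_M': 'tatag',
}
reordered_dict = {k: flu_fas[k] for k in sorted(flu_fas, key=get_gene_index_anywhere)}
for k, v in reordered_dict.items():
    print(k, v)
```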
It's because you are trying to reorder it using dictionary keys that don't exist. Your keys are
['>A/influenza/1/1_NA', '>A/influenza/1/1_NP', '>A/influenza/1/1_HA', '>A/influenza/1/1_M']
which don't match the list
['_HA', '_NP', '_NA', '_M']
You first need to transform them so that they match. Since we know the tag is at the end of the string and starts with an underscore, we can split at underscores and take the last part.
consensus = {}
flu_genes = ['_HA', '_NP', '_NA', '_M']
with open("full_genome.fasta", 'r') as myseq:
    for line in myseq:
        line = line.rstrip()
        if line.startswith('>'):
            sequence = line  # full header, e.g. '>A/influenza/1/1_HA'
            gene = line.split('_')[-1]  # text after the last underscore, e.g. 'HA'
            key = f"_{gene}"  # '_HA', matching the entries in flu_genes
            consensus[key] = {'sequence': sequence, 'data': ''}
        else:
            # append, so sequences spread over several lines are kept whole
            consensus[key]['data'] += line
flu_fas = {key : val for key, val in consensus.items() if any(ele in key for ele in flu_genes)}
print("Dictionary after removal of keys : " + str(flu_fas))
reordered_dict = {k: flu_fas[k] for k in flu_genes}
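The split-at-the-last-underscore idea can also be used purely as a sort key, so the full headers stay as the dictionary keys. A minimal sketch, assuming the tag always follows the last underscore (which, as the question notes, may not always hold); gene_rank and gene_tag are hypothetical names. Building the rank dictionary once also means each lookup is a single dict access instead of a scan of flu_genes.

```python
flu_genes = ['_HA', '_NP', '_NA', '_M']
# Map each gene tag to its position once, so each lookup is a dict access
gene_rank = {gene: rank for rank, gene in enumerate(flu_genes)}


def gene_tag(header):
    # Assumes the tag is the text after the last underscore, e.g. '..._HA' -> '_HA'
    return '_' + header.rsplit('_', 1)[-1]


headers = ['A/influenza/1/1_NA', 'A/influenza/1/1_NP', 'A/influenza/1/1_FluB',
           'A/influenza/1/1_HA', 'A/influenza/1/1_M']
# Keep only headers with a known tag, then sort them by that tag's rank
kept = [h for h in headers if gene_tag(h) in gene_rank]
ordered = sorted(kept, key=lambda h: gene_rank[gene_tag(h)])
print(ordered)
```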

summing dictionary values from reading another file

My assignment is to:
Read the protein sequence from FILE A and calculate the molecular weight of this protein using the dictionary created above.
So far, I have the code below:
import pprint
my_dict= {'A':'089Da', 'R':'174Da','N':'132Da','D':'133Da','B':'133Da','C':'121Da','Q':'146Da','E':'147Da',
'Z':'147Da','G':'075Da','H':'155Da','I':'131Da','L':'131Da','K':'146Da','M':'149Da',
'F':'165Da','P':'115Da','S':'105Da','T':'119Da','W':'204Da','Y':'181Da','V':'117Da'}
new = sorted(my_dict.items(), key=lambda x: x[1])
print("AA", " ", "MW")
for key, value in new:
    print(key, " ", value)
with open('lysozyme.fasta', 'r') as content:
    fasta = content.read()
for my_dict in fasta:
The top part of the code creates my dictionary. The task is to open the file, read the sequence (e.g. 'MWAAAA'), and then sum the values associated with those letters using the dictionary I created. I'm not sure how to proceed after the for loop. Do I use an append function? I'd appreciate any advice, thanks!
After reading your file, you can go through it character by character:
for char in fasta:
    print(char)
output:
M
W
A
A
A
A
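Then use each character as a key to retrieve its value from your dict. Since the values are strings such as '089Da', strip the trailing 'Da' and convert to int before adding (and set summ = 0 before the loop):
summ += int(my_dict[char][:-2])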
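A minimal end-to-end sketch, assuming the file holds a single FASTA record (a '>' header line followed by sequence lines) and that every residue letter appears in the dictionary; molecular_weight is a hypothetical helper. It skips the header line and newline characters, which would otherwise raise a KeyError, and converts each '...Da' string to an integer before adding.

```python
# Masses copied from the question's dictionary, stored as strings such as '089Da'
my_dict = {'A': '089Da', 'R': '174Da', 'N': '132Da', 'D': '133Da', 'B': '133Da',
           'C': '121Da', 'Q': '146Da', 'E': '147Da', 'Z': '147Da', 'G': '075Da',
           'H': '155Da', 'I': '131Da', 'L': '131Da', 'K': '146Da', 'M': '149Da',
           'F': '165Da', 'P': '115Da', 'S': '105Da', 'T': '119Da', 'W': '204Da',
           'Y': '181Da', 'V': '117Da'}


def molecular_weight(fasta_text, masses):
    """Sum the per-residue masses of a single-record FASTA string."""
    total = 0
    for line in fasta_text.splitlines():
        if line.startswith('>'):
            continue  # skip the header line
        for char in line.strip():
            # '089Da'[:-2] -> '089' -> 89
            total += int(masses[char][:-2])
    return total


print(molecular_weight('>sp|example\nMWAAAA\n', my_dict))
# With the real file:
# with open('lysozyme.fasta') as content:
#     print(molecular_weight(content.read(), my_dict))
```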

.get() returning letters rather than strings

I have a list of dictionaries that is structured as such:
json_data = [{'a':10,'text':"Salam"},{'a':4,'text':"Hello Friend"}]
I have been able to iterate through the list and extract the key 'text' from each dictionary:
json1_text = [[[value] for value in json_data[index].get('text')] for
              index in range(len(json_data))]
However, the new json1_text list does not contain the sentences returned from the dictionaries, but rather each individual letter:
json1_text[0]
Returns:
[['S'],['a'],['l'],['a'],['m']]
How can I get the whole sentence, e.g. "Hello Friend", instead of each individual letter stored in its own list?
Thanks in advance!
json1_text = [v for i in json_data for k,v in i.items() if isinstance(v,str)]
print(json1_text)
Result:
['Salam', 'Hello Friend']
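The letters come from iterating over the string that .get('text') returns. Selecting the 'text' value directly avoids that, and unlike the isinstance filter it will not pick up other string-valued keys (for dictionaries without 'text', .get returns None). A minimal sketch:

```python
json_data = [{'a': 10, 'text': "Salam"}, {'a': 4, 'text': "Hello Friend"}]

# d.get('text') already returns the whole string; looping over that string
# is what split it into letters in the question's code.
json1_text = [d.get('text') for d in json_data]
print(json1_text)
```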

How to extract from dictionaries to only print certain variables python

I have a TSV file like this:
Name School Course
Nicole UVA Biology
Jenna GWU CS
The code below is how I read the TSV file into dictionaries:
import csv
data = csv.reader(open('data.tsv'), delimiter='\t')
fields = next(data)  # header row: Name, School, Course
for row in data:
    item = dict(zip(fields, row))
    print(item)
So now I get dictionaries like these:
{'Name':'Nicole.', 'School':'UVA.','Course':'Biology'}
{'Name':'Jenna.', 'School':'GWU','Course':'CS'}
{'Name':'Shan', 'School':'Columbia','Course':'Astronomy'}
{'Name':'BILL', 'School':'UMD.','Course':'Algebra'}
I only want to print the Name and the Course from each dictionary. How would I go about this?
I want to add code so that I'm only printing
{'Name':'Jenna.','Course':'CS'}
{'Name':'Shan','Course':'Astronomy'}
{'Name':'BILL','Course':'Algebra'}
Please guide. Thank You
The easiest way is to remove the item['School'] entry before printing:
for row in data:
    item = dict(zip(fields, row))
    del item['School']
    print(item)
But this only works if you know exactly what the dictionary looks like: it has no other entries that you don't want, and it already has a School entry. I would instead recommend building a new dictionary out of the old one, keeping only the Name and Course entries:
for row in data:
    item = dict(zip(fields, row))
    item = {k: v for k, v in item.items() if k in ('Name', 'Course')}
    print(item)
Maybe use a DictReader in the first place, and rebuild each row dict keeping only the keys in a predefined set:
import csv
keep = {"Name", "Course"}
data = csv.DictReader(open('data.tsv'), delimiter='\t')
for row in data:
    row = {k: v for k, v in row.items() if k in keep}
    print(row)
result:
{'Course': 'Biology', 'Name': 'Nicole'}
{'Course': 'CS', 'Name': 'Jenna'}
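If the goal is to output the selected columns rather than dictionaries, the standard library's csv.DictWriter can do the filtering itself via extrasaction='ignore'. A minimal sketch, assuming tab-separated input with a header row; print_columns is a hypothetical name, and io.StringIO stands in for open('data.tsv', newline='').

```python
import csv
import io
import sys


def print_columns(tsv_file, columns, out=sys.stdout):
    """Write only the given columns of a tab-separated file, keeping the header."""
    reader = csv.DictReader(tsv_file, delimiter='\t')
    # extrasaction='ignore' makes DictWriter silently drop keys not in fieldnames
    writer = csv.DictWriter(out, fieldnames=columns, delimiter='\t',
                            extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


sample = io.StringIO('Name\tSchool\tCourse\nNicole\tUVA\tBiology\nJenna\tGWU\tCS\n')
print_columns(sample, ['Name', 'Course'])
```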
Based on the answer to "filter items in a python dictionary where keys contain a specific string":
print({k: v for k, v in item.items() if "Name" in k or "Course" in k})
You're better off using a library designed for these kinds of tasks (Pandas). A dictionary is great for storing key-value pairs, but it looks like you have spreadsheet-like tabular data, so you should choose a storage type that better reflects the data at hand. You could simply do the following:
import pandas as pd
df = pd.read_csv('data.tsv', sep='\t')
print(df[['Name', 'Course']])
You'll find that as you start doing more complicated tasks, it's better to use well-written libraries than to kludge something together.
Replace your print(item) line with the line below:
print(dict(filter(lambda e: e[0] != 'School', item.items())))
OUTPUT:
{'Name':'Jenna.','Course':'CS'}
{'Name':'Shan','Course':'Astronomy'}
{'Name':'BILL','Course':'Algebra'}

Python: create dict from list and auto-gen/increment the keys (list is the actual key values)?

I've searched pretty hard and can't find a question that pertains exactly to what I want to do.
I have a file called "words" with about 1000 lines of words, sorted A-Z:
10th
1st
2nd
3rd
4th
5th
6th
7th
8th
9th
a
AAA
AAAS
Aarhus
Aaron
AAU
ABA
Ababa
aback
abacus
abalone
abandon
abase
abash
abate
abater
abbas
abbe
abbey
abbot
Abbott
abbreviate
abc
abdicate
abdomen
abdominal
abduct
Abe
abed
Abel
Abelian
I am trying to load this file into a dictionary where the words are the values and the keys are auto-generated, auto-incremented integers, one per word,
e.g. {0: '10th', 1: '1st', 2: '2nd', ...}
Below is the code I've hobbled together so far. It sort of works, but it only shows the last entry in the file as the single key-value pair in the dict:
f3data = open('words')
mydict = {}
for line in f3data:
    print(line.strip())
    cmyline = line.split()
    key = +1
    mydict[key] = cmyline
print(mydict)
key = +1
+1 is the same thing as 1. I assume you meant key += 1 (with key set to 0 before the loop). I also can't see a reason to split each line when there's only one item per line.
However, there's really no reason to do the looping yourself.
with open('words') as f3data:
    mydict = dict(enumerate(line.strip() for line in f3data))
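For comparison, the question's loop works once the counter is initialized before the loop and incremented with +=. A minimal sketch; load_words is a hypothetical name, and it accepts any iterable of lines (an open file works) so it can be tried without the words file. The enumerate version does the same bookkeeping in one line.

```python
def load_words(lines):
    """Map 0, 1, 2, ... to each stripped line, using an explicit counter."""
    mydict = {}
    key = 0  # the counter must exist before the loop can increment it
    for line in lines:
        mydict[key] = line.strip()
        key += 1  # 'key = +1' would just assign 1 every time
    return mydict


print(load_words(['10th\n', '1st\n', '2nd\n']))
```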
dict(enumerate(x.rstrip() for x in f3data))
But your error is key = +1; it should be key += 1.
f3data = open('words')
print(f3data.readlines())
The use of zero-based numeric keys in a dict is very suspicious. Consider whether a simple list would suffice.
Here is an example using a list comprehension:
>>> mylist = [word.strip() for word in open('/usr/share/dict/words')]
>>> mylist[1]
'A'
>>> mylist[10]
"Aaron's"
>>> mylist[100]
"Addie's"
>>> mylist[1000]
"Armand's"
>>> mylist[10000]
"Loyd's"
I use str.strip() to remove whitespace and newlines, which are present in /usr/share/dict/words. This may not be necessary with your data.
However, if you really need a dictionary, Python's enumerate() built-in function is your friend here, and you can pass the output directly into the dict() function to create it:
>>> mydict = dict(enumerate(word.strip() for word in open('/usr/share/dict/words')))
>>> mydict[1]
'A'
>>> mydict[10]
"Aaron's"
>>> mydict[100]
"Addie's"
>>> mydict[1000]
"Armand's"
>>> mydict[10000]
"Loyd's"
With keys that dense, you don't want a dict, you want a list.
with open('words') as fp:
    data = list(map(str.strip, fp))
But if you really can't live without a dict:
with open('words') as fp:
    data = dict(enumerate(X.strip() for X in fp))
{index: x.strip() for index, x in enumerate(open('filename.txt'))}
This code uses a dictionary comprehension and the enumerate built-in, which takes an input sequence (in this case, the file object, which yields each line when iterated through) and returns an index along with the item. Then, a dictionary is built up with the index and text.
One question: why not just use a list if all of your keys are integers?
Finally, your original code should be:
f3data = open('words')
mydict = {}
for index, line in enumerate(f3data):
    cmyline = line.strip()
    mydict[index] = cmyline
print(mydict)
Putting the words in a dict makes no sense. If you're using numbers as keys you should be using a list.
with open('words.txt', 'r') as f:
    lines = f.readlines()
words = {}
for n, line in enumerate(lines):
    words[n] = line.strip()
print(words)
