Python divide each string by the total lenght of string

Python divide each string by the total lenght of string - python

Thank you for your help and patience.
I am new to python and am attempting to calculate the number of times a particular atomic symbol appears divided by the total number of atoms. So that the function accepts a list of strings as argument and returns a list containing the fraction of 'C', 'H', 'O' and 'N'. But I keep on getting one result instead of getting all for each of my atoms. My attempt is below:
Atoms = ['N', 'C', 'C', 'O', 'H', 'H', 'C', 'H', 'H', 'H', 'H', 'O', 'H']
def count_atoms (atoms):
for a in atoms:
total = atoms.count(a)/len(atoms)
return total
Then
faa = count_atoms(atoms)
print(faa)
However I only get one result which is 0.07692307692307693. I was supposed to get a list starting with [0.23076923076923078,..etc], but I don't know how to. I was supposed to calculate the fraction of 'C', 'H', 'O' and 'N' atomic symbols in the molecule using a for loop and a return statement. :( Please help, it will be appreciated.

#ganderson comment explains the problem. as to alternative implementation here is one using collections.Counter
from collections import Counter
atoms = ['N', 'C', 'C', 'O', 'H', 'H', 'C', 'H', 'H', 'H', 'H', 'O', 'H']
def count_atoms(atoms):
num = len(atoms)
return {atom:count/num for atom, count in Counter(atoms).items()}
print(count_atoms(atoms))

Well you return the variable total at your first loop. Why don't you use a list to store your values? Like this:
atoms = ['N', 'C', 'C', 'O', 'H', 'H', 'C', 'H', 'H', 'H', 'H', 'O', 'H'] #python is case sensitive!
def count_atoms (atoms):
return_list = [] #empty list
for a in atoms:
total = atoms.count(a)/len(atoms)
return_list.append(total) #we add a new item
return return_list #we return everything and leave the function

It would be better to return a dictionary so you know which element the fraction corresponds to:
>>> fractions = {element: Atoms.count(element)/len(Atoms) for element in Atoms}
>>> fractions
{'N': 0.07692307692307693, 'C': 0.23076923076923078, 'O': 0.15384615384615385, 'H': 0.5384615384615384}
You can, then, even lookup the fraction for a particular element like:
>>> fractions['N']
0.07692307692307693
However, if you must use a for loop and a return statement, then answer from #not_a_bot_no_really_82353 would be the right one.

A simple one liner should do
[atoms.count(a)/float(len(atoms)) for a in set(atoms)]
Or better create a dictionary using comprehension
{a:atoms.count(a)/float(len(atoms)) for a in set(atoms)}
Output
{'C': 0.23076923076923078,
'H': 0.5384615384615384,
'N': 0.07692307692307693,
'O': 0.15384615384615385}
If you still want to use the for loop. I would suggest to go for map which would be a lot cleaner
atoms = ['N', 'C', 'C', 'O', 'H', 'H', 'C', 'H', 'H', 'H', 'H', 'O', 'H']
def count_atoms (a):
total = atoms.count(a)/float(len(atoms))
return total
map(count_atoms,atoms)

Related

Python script to generate a word with specific structure and letter combinations

I want to write a really short script that will help me generate a random/nonsense word with the following qualities:
-Has 8 letters
-First letter is "A"
-Second and Fourth letters are random letters
-Fifth letter is a vowel
-Sixth and Seventh letters are random letters and are the same
-Eighth letter is a vowel that's not "a"
This is what I have tried so far (using all the info I could find and understand online)
firsts = 'A'
seconds = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
thirds = ['a', 'e', 'i', 'o', 'u', 'y']
fourths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
fifths = ['a', 'e', 'i', 'o', 'u', 'y']
sixths = sevenths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
eighths = ['e', 'i', 'o', 'u', 'y']
print [''.join(first, second, third, fourth, fifth)
for first in firsts
for second in seconds
for third in thirds
for fourth in fourths
for fifth in fifths
for sixth in sixths
for seventh in sevenths
for eighth in eighths]
However it keeps showing a SyntaxError: invalid syntax after the for and now I have absolutely no idea how to make this work. If possible please look into this for me, thank you so much!

So the magic function you need to know about to pick a random letter is random.choice. You can pass a list into this function and it will give you a random element from that list. It also works with strings because strings are basically a list of chars. Also to make your life easier, use string module. string.ascii_lowercase returns all the letters from a to z in a string so you don't have to type it out. Lastly, you don't use loops to join strings together. Keep it simple. You can just add them together.
import string
from random import choice
first = 'A'
second = choice(string.ascii_lowercase)
third = choice(string.ascii_lowercase)
fourth = choice(string.ascii_lowercase)
fifth = choice("aeiou")
sixthSeventh = choice(string.ascii_lowercase)
eighth = choice("eiou")
word = first + second + third + fourth + fifth + sixthSeventh + sixthSeventh + eighth
print(word)

Try this:
import random
sixth=random.choice(sixths)
s='A'+random.choice(seconds)+random.choice(thirds)+random.choice(fourths)+random.choice(fifths)+sixth+sixth+random.choice(eighths)
print(s)
Output:
Awixonno
Ahiwojjy
etc

There are several things to consider. First, the str.join() method takes in an iterable (e.g. a list), not a bunch of individual elements. Doing
''.join([first, second, third, fourth, fifth])
fixes the program in this respect. If you are using Python 3, print() is a function, and so you should add parentheses around the entire list comprehension.
With the syntax out of the way, let's get to a more interesting problem: Your program constructs every (82255680 !) possible word. This takes a long time and memory. What you want is probably to just pick one. You can of course do this by first constructing all, then picking one at random. It's far cheaper though to pick one letter from each of firsts, seconds, etc. at random and then collecting these. All together then:
import random
firsts = ['A']
seconds = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
thirds = ['a', 'e', 'i', 'o', 'u', 'y']
fourths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
fifths = ['a', 'e', 'i', 'o', 'u', 'y']
sixths = sevenths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
eighths = ['e', 'i', 'o', 'u', 'y']
result = ''.join([
random.choice(firsts),
random.choice(seconds),
random.choice(thirds),
random.choice(fourths),
random.choice(fifths),
random.choice(sixths),
random.choice(sevenths),
random.choice(eighths),
])
print(result)
To improve the code from here, try to:
Find a way to generate the "data" in a neater way than writing it out explicitly. As an example:
import string
seconds = list(string.ascii_lowercase) # you don't even need list()!
Instead of having a separate variable firsts, seconds, etc., collect these into a single variable, e.g. a single list containing each original list as a single str with all characters included.

This will implement what you describe. You can make the code neater by putting the choices into an overall list rather than have several different variables, but you will have to explicitly deal with the fact that the sixth and seventh letters are the same; they will not be guaranteed to be the same simply because there are the same choices available for each of them.
The list choices_list could contain sub-lists per your original code, but as you are choosing single characters it will work equally with strings when using random.choice and this also makes the code a bit neater.
import random
choices_list = [
'A',
'abcdefghijklmnopqrstuvwxyz',
'aeiouy',
'abcdefghijklmnopqrstuvwxyz',
'aeiouy',
'abcdefghijklmnopqrstuvwxyz',
'eiouy'
]
letters = [random.choice(choices) for choices in choices_list]
word = ''.join(letters[:6] + letters[5:]) # here the 6th letter gets repeated
print(word)
Some example outputs:
Alaeovve
Aievellu
Ategiwwo
Aeuzykko

Here's the syntax fix:
print(["".join([first, second, third])
for first in firsts
for second in seconds
for third in thirds])
This method might take up a lot of memory.

Unique elements of sublists depending on specific value in sublist

I an trying to select unique datasets from a very large quite inconsistent list.
My Dataset RawData consists of string-items of different length.
Some items occure many times, for example: ['a','b','x','15/30']
The key to compare the item is always the last string: for example '15/30'
The goal is: Get a list: UniqueData with items that occure only once. (i want to keep the order)
Dataset:
RawData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['a','x','c','15/30'],['i','j','k','l','m','n','o','p','20/60'],['x','b','c','15/30']]
My desired solution Dataset:
UniqueData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['i','j','k','l','m','n','o','p','20/60']]
I tried many possible solutions for instance:
for index, elem in enumerate(RawData): and appending to a new list if.....
for element in list does not work, because the items are not exactly the same.
Can you help me finding a solution to my problem?
Thanks!

The best way to remove duplicates is to add them into a set. Add the last element into a set as to keep track of all the unique values. When the value you want to add is already present in the set unique do nothing if not present add the value to set unique and append the lst to result list here it's new.
Try this.
new=[]
unique=set()
for lst in RawData:
if lst[-1] not in unique:
unique.add(lst[-1])
new.append(lst)
print(new)
#[['a', 'b', 'x', '15/30'],
['d', 'e', 'f', 'g', 'h', '20/30'],
['w', 'x', 'y', 'z', '10/10'],
['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60']]

You could set up a new array for unique data and to track the items you have seen so far. Then as you loop through the data if you have not seen the last element in that list before then append it to unique data and add it to the seen list.
RawData = [['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'], ['w', 'x', 'y', 'z', '10/10'],
['a', 'x', 'c', '15/30'], ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60'], ['x', 'b', 'c', '15/30']]
seen = []
UniqueData = []
for data in RawData:
if data[-1] not in seen:
UniqueData.append(data)
seen.append(data[-1])
print(UniqueData)
OUTPUT
[['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'], ['w', 'x', 'y', 'z', '10/10'], ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60']]

RawData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['a','x','c','15/30'],['i','j','k','l','m','n','o','p','20/60'],['x','b','c','15/30']]
seen = []
seen_indices = []
for _,i in enumerate(RawData):
# _ -> index
# i -> individual lists
if i[-1] not in seen:
seen.append(i[-1])
else:
seen_indices.append(_)
for index in sorted(seen_indices, reverse=True):
del RawData[index]
print (RawData)

Using a set to filter out entries for which the key has already been seen is the most efficient way to go.
Here's a one liner example using a list comprehension with internal side effects:
UniqueData = [rd for seen in [set()] for rd in RawData if not(rd[-1] in seen or seen.add(rd[-1])) ]

check if two lists differ for an element

I have two lists:
list1=['h', 'e', 'n', 'o', 'p']
list2=['e', 'h', 'c', 'n', 'p', 'o']
I want my function diff1 to return true if these two lists differ for exactly one element
in this case diff1 return True because list2 has a 'c'
I can assume list2 has always exactly one more element than list1
thanks you for any help you can provide

You could use the symmetric difference of sets:
symmetric_difference(other)
set ^ other
Return a new set with elements
in either the set or other but not both.
list1=['h', 'e', 'n', 'o', 'p']
list2=['e', 'h', 'c', 'n', 'p', 'o']
sym_diff = set(list1).symmetric_difference(list2)
print(sym_diff)
# {'c'}
And you just need to check if this difference contains one item:
one_different = len(sym_diff) == 1
print(one_different)
# True

How to convert a numpy.ndarray type into a list?

I want to read a matfile in python and then export the data in a database. in order to do this I need to have the data type as list in python. I wrote the code below:
import scipy.io as si
import csv
a = si.loadmat('matfilename')
b = a['variable']
list1=b.tolist()
The variable has 1 row and 15 columns. when I print list1, I get the answer below: (It is indeed a list, but a list that contains only one element. It means when I call list1[0], I get the same result.):
[[array(['A'],
dtype='<U13'), array(['B'],
dtype='<U14'), array(['C'],
dtype='<U6'), array(['D'],
dtype='<U4'), array(['E'],
dtype='<U10'), array(['F'],
dtype='<U13'), array(['G'],
dtype='<U11'), array(['H'],
dtype='<U9'), array(['I'],
dtype='<U16'), array(['J'],
dtype='<U18'), array(['K'],
dtype='<U16'), array(['L'],
dtype='<U16'), array(['M'],
dtype='<U16'), array(['N'],
dtype='<U14'), array(['O'],
dtype='<U13')]]
While the form that I expect is:
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
Does anyone know what the problem is?

To my experience, that is just like MATLAB files are structured, only nested arrays.
You can create the list yourself:
>>> [x[0][0] for x in list1[0]]
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']

Sorting a list using an alphabet string

I'm trying to sort a list containing only lower case letters by using the string :
alphabet = "abcdefghijklmnopqrstuvwxyz".
that is without using sort, and with O(n) complexity only.
I got here:
def sort_char_list(lst):
alphabet = "abcdefghijklmnopqrstuvwxyz"
new_list = []
length = len(lst)
for i in range(length):
new_list.insert(alphabet.index(lst[i]),lst[i])
print (new_list)
return new_list
for this input :
m = list("emabrgtjh")
I get this:
['e']
['e', 'm']
['a', 'e', 'm']
['a', 'b', 'e', 'm']
['a', 'b', 'e', 'm', 'r']
['a', 'b', 'e', 'm', 'r', 'g']
['a', 'b', 'e', 'm', 'r', 'g', 't']
['a', 'b', 'e', 'm', 'r', 'g', 't', 'j']
['a', 'b', 'e', 'm', 'r', 'g', 't', 'h', 'j']
['a', 'b', 'e', 'm', 'r', 'g', 't', 'h', 'j']
looks like something goes wrong along the way, and I can't seem to understand why.. if anyone can please enlighten me that would be great.

You are looking for a bucket sort. Here:
def sort_char_list(lst):
alphabet = "abcdefghijklmnopqrstuvwxyz"
# Here, create the 26 buckets
new_list = [''] * len(alphabet)
for letter in lst:
# This is the bucket index
# You could use `ord(letter) - ord('a')` in this specific case, but it is not mandatory
index = alphabet.index(letter)
new_list[index] += letter
# Assemble the buckets
return ''.join(new_list)
As for complexity, since alphabet is a pre-defined fixed-size string, searching a letter in it is requires at most 26 operations, which qualifies as O(1). The overall complexity is therefore O(n)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python divide each string by the total lenght of string - python

Related

Python script to generate a word with specific structure and letter combinations

Unique elements of sublists depending on specific value in sublist

check if two lists differ for an element

How to convert a numpy.ndarray type into a list?

Sorting a list using an alphabet string

Categories

Resources