Python: Unify multiple lists into one

Could you help me with the following challenge I am currently facing:
I have multiple lists, each of which contains multiple strings. Each string has the following format:
"ID-Type" - where ID is a number and type is a common Python type. One such example can be found here:
["1-int", "2-double", "1-string", "5-list", "5-int"],
["3-string", "1-int", "1-double", "5-double", "5-string"]
Before calculating further, I now want to preprocess these lists to unify them in the following way:
Count how often each type appears in each list
Generate a new list, combining both results
Create a mapping from each initial list to that new list
As an example
In the above lists, we have the following types:
List 1: 2 int, 1 double, 1 string, 1 list
List 2: 2 string, 2 double, 1 int
The resulting table should now contain:
2 int, 2 double, 2 string, 1 list (in order to be able to contain both lists), like this:
[
"int_1-int",
"int_2-int",
"double_1-double",
"double_2-double",
"string_1-string",
"string_2-string",
"list_1-list"
]
And lastly, in order to map input to output, the idea is to have a corresponding dictionary to map this transformation, e.g., for list_1:
{
"1-int": "int_1-int",
"2-double": "double_1-double",
"1-string": "string_1-string",
"5-list": "list_1-list",
"5-int": "int_2-int"
}
I want to avoid doing this with nested loops and multiple iterations - are there any libraries, or is there maybe a smart vectorized solution, to address this challenge?

Just add them:
Example:
['it'] + ['was'] + ['annoying']
You should read the Python tutorial to learn basic info like this.
Just another method....
import itertools
ab = itertools.chain(['it'], ['was'], ['annoying'])
list(ab)


In general, this approach doesn't really make sense unless you specifically need to have the items in the resulting list and dict in this exact format. But here's how you can do it:
def process_type_list(type_list):
    mapping = dict()
    for i in type_list:
        i_type = i.split('-')[1]
        n_occur = 1
        map_val = f'{i_type}_{n_occur}-{i_type}'
        while map_val in mapping.values():
            n_occur += 1
            map_val = f'{i_type}_{n_occur}-{i_type}'
        mapping[i] = map_val
    return mapping

l1 = ["1-int", "2-double", "1-string", "5-list", "5-int"]
l2 = ["3-string", "1-int", "1-double", "5-double", "5-string"]
l1_mapping = process_type_list(l1)
l2_mapping = process_type_list(l2)
Additionally, Python does not have a double type; C doubles are implemented as Python floats (or decimal.Decimal if you need fine control over the precision).
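If the combined target list is needed as well (the question asks for it, but the answer above only builds the per-list mappings), collections.Counter can do the counting without the nested while loop. A minimal sketch assuming the "ID-Type" format from the question; the helper names type_counts, combined_slots and build_mapping are illustrative, not from the original post:
from collections import Counter

def type_counts(type_list):
    # "1-int" -> "int"; count how often each type appears in one list
    return Counter(item.split('-', 1)[1] for item in type_list)

def combined_slots(*type_lists):
    # keep the maximum count of each type over all lists
    combined = Counter()
    for tl in type_lists:
        combined |= type_counts(tl)  # Counter union = element-wise maximum
    return [f'{t}_{i}-{t}' for t, n in combined.items() for i in range(1, n + 1)]

def build_mapping(type_list):
    # map every original entry to the next free slot of its type
    seen = Counter()
    mapping = {}
    for item in type_list:
        t = item.split('-', 1)[1]
        seen[t] += 1
        mapping[item] = f'{t}_{seen[t]}-{t}'
    return mapping

l1 = ["1-int", "2-double", "1-string", "5-list", "5-int"]
l2 = ["3-string", "1-int", "1-double", "5-double", "5-string"]
print(combined_slots(l1, l2))  # ['int_1-int', 'int_2-int', 'double_1-double', ...]
print(build_mapping(l1))       # {'1-int': 'int_1-int', '2-double': 'double_1-double', ...}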

I am pretty sure that this is what you want to do:
To make a joint list:
['any item'] + ['any item 2']
If you want to turn the list into a dictionary:
dict(zip(['key 1', 'key 2'], ['value 1', 'value 2']))
Another method of joining 2 lists:
a = ['list item', 'another list item']
a.extend(['another list item', 'another list item'])

Related

Given two lists of words, return them together as a dictionary

Hey (sorry for my bad English), I am going to try to make my question clearer. I have a function, let's say create_username_dict(name_list, username_list), which takes in two lists: name_list with people's names, and the other list with usernames made out of those names. What I want to do is take those two lists and convert them into a dictionary that maps them together,
like this:
>>> name_list = ["Ola Nordmann", "Kari Olsen", "Roger Jensen"]
>>> username_list = ["alejon", "carli", "hanri"]
>>> create_username_dict(name_list, username_list)
{
"Ola Nordmann": "alejon",
"Kari Olsen": "carli",
"Roger Jensen": "hanri"
}
I have tried looking around for how to combine two different lists into one dictionary, but can't seem to find the right solution.
If both lists are in matching order, i.e. the i-th element of one list corresponds to the i-th element of the other, then you can use this:
D = dict(zip(name_list, username_list))
Use zip to pair the lists.
d = {key: value for key,value in zip(name_list, username_list)}
print(d)
Output:
{'Ola Nordmann': 'alejon', 'Kari Olsen': 'carli', 'Roger Jensen': 'hanri'}
Assuming both lists are the same length and map one-to-one:
name_list = ["Ola Nordmann", "Kari Olsen", "Roger Jensen"]
username_list = ["alejon", "carli", "hanri"]
result_stackoverflow = dict()
for index, name in enumerate(name_list):
    result_stackoverflow[name] = username_list[index]
print(result_stackoverflow)
>>> {'Ola Nordmann': 'alejon', 'Kari Olsen': 'carli', 'Roger Jensen': 'hanri'}
The answer by #alex does the same thing but is maybe too compact for a beginner, so this is the verbose version.
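Both answers assume the two lists are the same length; if a length mismatch should raise an error instead of being silently truncated, zip accepts strict=True on Python 3.10 and later (a small illustrative addition, not part of the original answers):
name_list = ["Ola Nordmann", "Kari Olsen", "Roger Jensen"]
username_list = ["alejon", "carli", "hanri"]
# Raises ValueError if the two lists differ in length (Python 3.10+)
D = dict(zip(name_list, username_list, strict=True))
print(D)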

How to split based on string matching?

I have two lists, one that contains the user input and the other one that contains the mapping.
The user input looks like this:
The mapping looks like this:
I am trying to split the strings in the user input list. Sometimes they enter one record as CO109CO45, but in reality these are two codes that don't belong together. They need to be separated with a comma or a space, like CO109,CO45.
There are many examples with the same behavior, and I was thinking of using the mapping list to match and split. Is this something that can be done? What do you suggest? Thanks in advance for your help!
Use a combination of look-ahead and look-behind regexes in the split.
import pandas as pd

df = pd.DataFrame({'RCode': ['CO109', 'CO109CO109']})
print(df)
        RCode
0       CO109
1  CO109CO109
df.RCode.str.split(r'(?<=\d)(?=\D)')
0           [CO109]
1    [CO109, CO109]
Name: RCode, dtype: object
You can try with regex:
import pandas as pd

l = ['CO2740CO96', 'CO12', 'CO973', 'CO870CO397', 'CO584', 'CO134CO42CO685']
df = pd.DataFrame({'code': l})
df.code = df.code.str.findall(r'[A-Za-z]+\d+')
print(df)
Output:
                   code
0        [CO2740, CO96]
1                [CO12]
2               [CO973]
3        [CO870, CO397]
4               [CO584]
5  [CO134, CO42, CO685]
I usually use something like this, for an input original_list:
output_list = [
    [
        ('CO' + target).strip(' ,')
        for target in item.split('CO')
        if target  # skip the empty string produced before the leading 'CO'
    ]
    for item in original_list
]
There are probably more efficient ways of doing it, but you don't need the overhead of dataframes / pandas, or the hard-to-read aspects of regexes.
If you have a manageable number of prefixes ("CO", "PR", etc.), you can set up a recursive function splitting on each of them, or you can use .find() with the full codes.
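A non-recursive sketch of that prefix idea, assuming the set of prefixes is known in advance (the prefixes and the sample input below are illustrative, not taken from the question's data):
import re

prefixes = ['CO', 'PR']  # hypothetical prefix list; adjust to the real mapping
pattern = re.compile('(?:' + '|'.join(map(re.escape, prefixes)) + r')\d+')

user_input = ['CO2740CO96', 'CO12', 'PR5CO42']
split_codes = [pattern.findall(item) for item in user_input]
print(split_codes)  # [['CO2740', 'CO96'], ['CO12'], ['PR5', 'CO42']]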

Iterate over a list of strings in python

I am trying to set up a data set that checks how often several different names are mentioned in a list of articles. So for each article, I want to know how often nameA, nameB and so forth are mentioned. However, I am having trouble iterating over the list.
My code is the following:
for element in list_of_names:
    for i in list_of_articles:
        list_of_namecounts = len(re.findall(element, i))
list_of_names = a string with several names [nameA nameB nameC]
list_of_articles = a list with 40,000 strings that are articles
Example of an article in list_of_articles (a str): "Amsterdam - de financiële ..."
The error I get is: expected string or buffer
I thought that when iterating over the list of strings, the re.findall command should work on lists like this, but I am also fairly new to Python. Any idea how to solve my issue here?
Thank you!
If your list is ['apple', 'apple', 'banana'] and you want the result: number of apple = 2, then:
from collections import Counter
list_count = Counter(list_of_articles)
for element in list_of_names:
    list_of_namecounts = list_count[element]
And assuming list_of_namecounts is meant to be a list:
list_of_namecounts = []
for element in list_of_names:
    list_of_namecounts.append(list_count[element])
See the collections.Counter documentation for more details.
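If the goal from the question is how often each name occurs inside every article (rather than how often identical whole articles repeat), a plain loop over both lists with re.findall does it. A minimal sketch with illustrative data:
import re

list_of_names = ['nameA', 'nameB', 'nameC']                      # illustrative values
list_of_articles = ['nameA was here with nameB', 'nameA again, nameA']

# counts[i][name] = number of mentions of that name in article i
counts = [
    {name: len(re.findall(re.escape(name), article)) for name in list_of_names}
    for article in list_of_articles
]
print(counts)
# [{'nameA': 1, 'nameB': 1, 'nameC': 0}, {'nameA': 2, 'nameB': 0, 'nameC': 0}]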

Regular expressions matching words which contain the pattern but also the pattern plus something else

I have the following problem:
list1=['xyz','xyz2','other_randoms']
list2=['xyz']
I need to find which elements of list2 are in list1. In actual fact, the elements of list1 correspond to a numerical value which I need to obtain and then change. The problem is that 'xyz2' contains 'xyz' and therefore also matches a regular expression for 'xyz'.
My code so far (where 'data' is a Python dictionary and 'specie_name_and_initial_values' is a list of lists, where each sublist contains two elements: the first being the species name and the second being a numerical value that goes with it):
all_keys = list(data.keys())
for i in range(len(all_keys)):
    if all_keys[i] != 'Time':
        #print all_keys[i]
        pattern = re.compile(all_keys[i])
        for j in range(len(specie_name_and_initial_values)):
            print re.findall(pattern, specie_name_and_initial_values[j][0])
Variations of the regular expression I have tried include:
pattern = re.compile('^'+all_keys[i]+'$')
pattern = re.compile('^'+all_keys[i])
pattern = re.compile(all_keys[i]+'$')
And I've also tried using 'in' as a qualifier (i.e. within a for loop)
Any help would be greatly appreciated. Thanks
Ciaran
----------EDIT------------
To clarify: my current code is below. It is used within a class/method-like structure.
def calculate_relative_data_based_on_initial_values(self, copasi_file, xlsx_data_file, data_type='fold_change', time='seconds'):
    copasi_tool = MineParamEstTools()
    data = pandas.io.excel.read_excel(xlsx_data_file, header=0)
    # uses custom class and method to get the list of lists from a file
    specie_name_and_initial_values = copasi_tool.get_copasi_initial_values(copasi_file)
    if time == 'minutes':
        data['Time'] = data['Time'] * 60
    elif time == 'hour':
        data['Time'] = data['Time'] * 3600
    elif time == 'seconds':
        print 'Time is already in seconds.'
    else:
        print 'Not a valid time unit'
    all_keys = list(data.keys())
    species = []
    for i in range(len(specie_name_and_initial_values)):
        species.append(specie_name_and_initial_values[i][0])
    for i in range(len(all_keys)):
        for j in range(len(specie_name_and_initial_values)):
            if all_keys[i] in species[j]:
                print all_keys[i]
The table returned from pandas is accessed like a dictionary. I need to go to my data table, extract the headers (i.e. the all_keys bit), then look up the name of the header in the specie_name_and_initial_values variable and obtain the corresponding value (the second element within the specie_name_and_initial_value variable). After this, I multiply all values of my data table by the value obtained for each of the matched elements.
I'm most likely overcomplicating this. Do you have a better solution?
thanks
----------edit 2 ---------------
Okay, below are my variables
all_keys = set([u'Cyp26_G_R1', u'Cyp26_G_rep1', u'Time'])
species = set(['[Cyp26_R1R2_RARa]', '[Cyp26_SRC3_1]', '[18-OH-RA]', '[p38_a]', '[Cyp26_G_rep1]', '[Cyp26]', '[Cyp26_G_a]', '[SRC3_p]', '[mRARa]', '[np38_a]', '[mRARa_a]', '[RARa_pp_TFIIH]', '[RARa]', '[Cyp26_G_L2]', '[atRA]', '[atRA_c]', '[SRC3]', '[RARa_Ser369p]', '[p38]', '[Cyp26_mRNA]', '[Cyp26_G_L]', '[TFIIH]', '[Cyp26_SRC3_2]', '[Cyp26_G_R1R2]', '[MSK1]', '[MSK1_a]', '[Cyp26_G]', '[Basal_Kinases]', '[Cyp26_R1_RARa]', '[4-OH-RA]', '[Cyp26_G_rep2]', '[Cyp26_Chromatin]', '[Cyp26_G_R1]', '[RXR]', '[SMRT]'])
You don't need a regex to find common elements; set.intersection will find all elements in list2 that are also in list1:
list1=['xyz','xyz2','other_randoms']
list2=['xyz']
print(set(list2).intersection(list1))
set(['xyz'])
Also, if you wanted to compare 'xyz' to 'xyz2' you would use == rather than in, and then it would correctly return False.
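For illustration (not part of the original answer), the difference between the two checks:
>>> 'xyz' == 'xyz2'
False
>>> 'xyz' in 'xyz2'   # substring containment, hence the false positives
True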
You can also rewrite your own code a lot more succinctly:
for key in data:
    if key != 'Time':
        pattern = re.compile(key)
        for name, _ in specie_name_and_initial_values:
            print re.findall(pattern, name)
Based on your edit, you have somehow managed to turn lists into strings; one option is to strip the []:
all_keys = set([u'Cyp26_G_R1', u'Cyp26_G_rep1', u'Time'])
specie_name_and_initial_values = set(['[Cyp26_R1R2_RARa]', '[Cyp26_SRC3_1]', '[18-OH-RA]', '[p38_a]', '[Cyp26_G_rep1]', '[Cyp26]', '[Cyp26_G_a]', '[SRC3_p]', '[mRARa]', '[np38_a]', '[mRARa_a]', '[RARa_pp_TFIIH]', '[RARa]', '[Cyp26_G_L2]', '[atRA]', '[atRA_c]', '[SRC3]', '[RARa_Ser369p]', '[p38]', '[Cyp26_mRNA]', '[Cyp26_G_L]', '[TFIIH]', '[Cyp26_SRC3_2]', '[Cyp26_G_R1R2]', '[MSK1]', '[MSK1_a]', '[Cyp26_G]', '[Basal_Kinases]', '[Cyp26_R1_RARa]', '[4-OH-RA]', '[Cyp26_G_rep2]', '[Cyp26_Chromatin]', '[Cyp26_G_R1]', '[RXR]', '[SMRT]'])
specie_name_and_initial_values = set(s.strip("[]") for s in specie_name_and_initial_values)
print(all_keys.intersection(specie_name_and_initial_values))
Which outputs:
set([u'Cyp26_G_R1', u'Cyp26_G_rep1'])
FYI, if you had lists inside the set you would have gotten an error as lists are mutable so are not hashable.
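To connect this back to the scaling step described in the edit: a minimal sketch, assuming data is the pandas DataFrame from read_excel and specie_name_and_initial_values is a list of [name, value] pairs with bracketed names (the sample values and the multiplication are illustrative assumptions based on the question's description):
import pandas as pd

# Illustrative data in the shapes described in the question
data = pd.DataFrame({'Time': [0, 60],
                     'Cyp26_G_R1': [2.0, 4.0],
                     'Cyp26_G_rep1': [10.0, 20.0]})
specie_name_and_initial_values = [['[Cyp26_G_R1]', 0.5], ['[Cyp26_G_rep1]', 2.0]]

# strip the brackets so the names match the column headers exactly
initial_values = {name.strip('[]'): value
                  for name, value in specie_name_and_initial_values}

for key in data.columns:
    if key != 'Time' and key in initial_values:
        data[key] = data[key] * initial_values[key]

print(data)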

Remove specific characters from list python

I am fairly new to Python. I have a list as follows:
sorted_x = [('pvg-cu2', 50.349189), ('hkg-pccw', 135.14921), ('syd-ipc', 163.441705), ('sjc-inap', 165.722676)]
I am trying to write a regex which will remove everything after the '-' and before the ',', i.e. I need the same list to look as below:
[('pvg', 50.349189), ('hkg', 135.14921), ('syd', 163.441705), ('sjc', 165.722676)]
I have written a regex as follows:
for i in range(len(sorted_x)):
    title_search = re.search(r'^\(\'(.*)-(.*)\', (.*)\)$', str(sorted_x[i]), re.IGNORECASE)
    if title_search:
        title = title_search.group(1)
        time = title_search.group(3)
But this requires me to create two new lists and I don't want to change my original list.
Can you please suggest a simple way so that I can modify my original list without creating a new list?
result = [(a.split('-', 1)[0], b) for a, b in sorted_x]
Example:
>>> sorted_x = [('pvg-cu2', 50.349189), ('hkg-pccw', 135.14921), ('syd-ipc', 163.441705), ('sjc-inap', 165.722676)]
>>> [(a.split('-', 1)[0], b) for a, b in sorted_x]
[('pvg', 50.349189000000003), ('hkg', 135.14921000000001), ('syd', 163.44170500000001), ('sjc', 165.72267600000001)]
