My question is whether there is a way to tie the numbers in the first column to the numbers in the second column. I want to read and sort the second-column values, as I do with the sorted_resistances list, and then, after sorting, replace each value with the first-column number that was assigned to it.
For context, the code reads the list from a file, which is why it is written this way.
1 30000
2 30511
3 30052
4 30033
5 30077
6 30055
7 30086
8 30044
9 30088
10 30019
11 30310
12 30121
13 30132
with open("file.txt") as file_in:
    list_of_resistances = []
    for line in file_in:
        list_of_resistances.append(int(line.split()[1]))
sorted_resistances = sorted(list_of_resistances)
If you want to keep the correlation between the values in the two columns, you can keep all of the values from each line in a tuple (or list). Then sort the list of tuples on a specific element by passing a lambda function as the key parameter of sorted(), telling it to use the second element of each tuple as the sort value. Note that the values stay strings here, so they are compared as strings; that gives the right order for this data only because every value has the same number of digits.
In this example, I used pprint.pprint to make the output of the lists easier to read.
from pprint import pprint
with open("file.txt") as file_in:
    list_of_resistances = []
    for line in file_in:
        list_of_resistances.append(tuple(line.strip().split(' ')))
print("Unsorted values:")
pprint(list_of_resistances)
sorted_resistances = sorted(list_of_resistances, key=lambda x: x[1])
print("\nSorted values:")
pprint(sorted_resistances)
print("\nSorted keys from column 1:")
pprint([x[0] for x in sorted_resistances])
Output:
Unsorted values:
[('1', '30000'),
('2', '30511'),
('3', '30052'),
('4', '30033'),
('5', '30077'),
('6', '30055'),
('7', '30086'),
('8', '30044'),
('9', '30088'),
('10', '30019'),
('11', '30310'),
('12', '30121'),
('13', '30132')]
Sorted values:
[('1', '30000'),
('10', '30019'),
('4', '30033'),
('8', '30044'),
('3', '30052'),
('6', '30055'),
('5', '30077'),
('7', '30086'),
('9', '30088'),
('12', '30121'),
('13', '30132'),
('11', '30310'),
('2', '30511')]
Sorted keys from column 1:
['1', '10', '4', '8', '3', '6', '5', '7', '9', '12', '13', '11', '2']
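If the resistances could ever have different numbers of digits, comparing them as strings would give the wrong order. A minimal sketch of the same idea with both columns converted to int while reading is below. The read_pairs helper and the in-memory lines list are illustrative assumptions standing in for the file, not part of the original code.

```python
# Sample rows standing in for the lines of file.txt (same data as in the question).
lines = [
    "1 30000", "2 30511", "3 30052", "4 30033", "5 30077", "6 30055", "7 30086",
    "8 30044", "9 30088", "10 30019", "11 30310", "12 30121", "13 30132",
]


def read_pairs(lines):
    """Return (first_column, second_column) tuples with both values as int."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pairs.append((int(parts[0]), int(parts[1])))
    return pairs


pairs = read_pairs(lines)
# Sort on the second element; the first element travels along with it.
sorted_pairs = sorted(pairs, key=lambda pair: pair[1])
sorted_ids = [pair[0] for pair in sorted_pairs]
print(sorted_ids)
```

With a real file you would pass the open file object instead of the list, since iterating over it also yields one line at a time.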
Related
I have the following problem. I would like to convert a dataframe into a list of tuples grouped by a category. See the simple code below:
import pandas as pd
data = {'product_id': ['5', '7', '8', '5', '30'], 'id_customer': ['1', '1', '1', '3', '3']}
df = pd.DataFrame.from_dict(data)
#desired output is:
result = [('5', '7', '8'), ('5', '30')]
How can I do this, please? This question did not help me: Convert pandas dataframe into a list of unique tuple
Use GroupBy.agg (or GroupBy.apply) with tuple, like:
print (df.groupby('id_customer', sort=False)['product_id'].agg(tuple).tolist())
print (df.groupby('id_customer', sort=False)['product_id'].apply(tuple).tolist())
print (list(df.groupby('id_customer', sort=False)['product_id'].agg(tuple)))
print (list(df.groupby('id_customer', sort=False)['product_id'].apply(tuple)))
[('5', '7', '8'), ('5', '30')]
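For intuition about what grouping with sort=False and then collecting each group into a tuple does, here is a rough plain-Python equivalent. It is only a sketch of the idea, not how pandas implements it, and it assumes Python 3.7+ so that the dict keeps first-seen order.

```python
# Same sample data as in the question, as plain lists instead of a DataFrame.
product_ids = ['5', '7', '8', '5', '30']
customer_ids = ['1', '1', '1', '3', '3']

groups = {}  # customer id -> list of product ids, in first-seen order
for customer, product in zip(customer_ids, product_ids):
    groups.setdefault(customer, []).append(product)

result = [tuple(products) for products in groups.values()]
print(result)  # [('5', '7', '8'), ('5', '30')]
```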
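Iterate over the groupby object and build a tuple from each group (the groups come out ordered by id_customer, because sort defaults to True):
>>> [tuple(v) for _, v in df.groupby('id_customer')['product_id']]
[('5', '7', '8'), ('5', '30')]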
I am using Python to read data from a file and assign the strings in one column to the integers in another column. This is what I have so far:
import csv
flight_source_graph = {}
with open('flightinfo.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader, None)  # skip the first row
    for row in reader:
        flight_source = row[0]  # 'ABC','XYZ','TWR'
        flight_dep_time = row[2]  # '0','10','7'
        # departure information
        if flight_source in flight_source_graph:
            flight_source_graph[flight_source].append(flight_dep_time)
        else:
            flight_source_graph[flight_source] = [flight_dep_time]
Output:
{'ABC': ['0', '10', '7'], 'XYZ': ['4','7','10'], 'TWR': ['9','11','15','24']}
Now that I have this data showing what's assigned to what, how would I go about transforming it to create a list that looks like this:
[('ABC', '0'),('ABC','10'),('ABC','7')]
#Where each of the values in the parentheses () would signify a node
This is one way to combine the keys with the list values:
if __name__ == '__main__':
    flight_info = {'ABC': ['0', '10', '7'], 'XYZ': ['4', '7', '10'], 'TWR': ['9', '11', '15', '24']}
    flight_graph = []
    for k, v in flight_info.items():
        for i in v:
            flight_graph.append((k, i))
    print(flight_graph)
    # [('ABC', '0'), ('ABC', '10'), ('ABC', '7'), ('XYZ', '4'), ('XYZ', '7'), ('XYZ', '10'), ('TWR', '9'), ('TWR', '11'), ('TWR', '15'), ('TWR', '24')]
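The same nested loop can also be written as a single list comprehension, if you prefer that style. This is just an equivalent sketch; the variable names source, times and dep_time are my own.

```python
flight_info = {'ABC': ['0', '10', '7'], 'XYZ': ['4', '7', '10'], 'TWR': ['9', '11', '15', '24']}

# Outer loop over the dict items, inner loop over each list of departure times.
flight_graph = [
    (source, dep_time)
    for source, times in flight_info.items()
    for dep_time in times
]
print(flight_graph[:3])  # [('ABC', '0'), ('ABC', '10'), ('ABC', '7')]
```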
I searched for how to sort a Python dictionary by value and found various answers online. I tried a few of them and finally used the sorted function.
I have simplified the example to make it clear.
I have a dictionary, say:
temp_dict = {'1': '40', '0': '109', '3': '37', '2': '42', '5': '26', '4': '45', '7': '109', '6': '42'}
Now, to sort it by value, I did the following (using the operator module):
sorted_temp_dict = sorted(temp_dict.items(), key=operator.itemgetter(1))
The result I'm getting is (the result is a list of tuples, which is fine for me):
[('0', '109'), ('7', '109'), ('5', '26'), ('3', '37'), ('1', '40'), ('2', '42'), ('6', '42'), ('4', '45')]
The issue is that, as you can see, the first two elements of the result are not in sorted position. The rest of the elements are sorted perfectly by value.
I can't find the mistake here. Any help would be great. Thanks.
Those are sorted. They are strings, and are sorted lexicographically: '1' is before '2', etc.
If you want to sort by numeric value, you'll need to convert to ints in the key function. For example:
sorted(temp_dict.items(), key=lambda x: int(x[1]))
They are sorted; the issue is that the elements are strings, hence:
'109' < '26' # this is true, as they are string
Try converting them to int in the key argument. You can use a lambda such as:
>>> sorted_temp_dict = sorted(temp_dict.items(), key=lambda x: int(x[1]))
>>> sorted_temp_dict
[('5', '26'), ('3', '37'), ('1', '40'), ('6', '42'), ('2', '42'), ('4', '45'), ('7', '109'), ('0', '109')]
The problem is trying to sort with values that are str, and not int. If you first convert the values into int and then sort, it will work.
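To see the difference side by side, here is a small sketch that sorts the same dictionary once with string keys and once with int keys. It assumes Python 3.7+, where the dict keeps insertion order, so the order of tied values can differ from the output shown above.

```python
temp_dict = {'1': '40', '0': '109', '3': '37', '2': '42',
             '5': '26', '4': '45', '7': '109', '6': '42'}

# String comparison goes character by character, so '1...' sorts before '2...'.
print('109' < '26')  # True

by_string = sorted(temp_dict.items(), key=lambda item: item[1])
by_number = sorted(temp_dict.items(), key=lambda item: int(item[1]))

print([value for _, value in by_string])
print([value for _, value in by_number])
```

Only the key function changes; the items themselves remain strings in both results.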
I am using an OCR algorithm (Tesseract-based) that has difficulty recognizing certain characters. I have partially solved that by creating my own "post-processing hash table" of character pairs. For example, since the text is just numbers, I have figured out that if there is a Q character in the text, it should be a 9 instead.
However, I have a more serious problem with the 6 and 8 characters, since both of them are recognized as B. Since I know what I am looking for (when I am translating the image to text) and the strings are fairly short (6~8 digits), I thought I would create strings with all possible combinations of 6 and 8 and compare each one of them to the one I am looking for.
So for example, I have the following string recognized by the OCR:
L0B7B0B5
So each B here can be 6 or 8.
Now I want to generate a list like the below:
L0878085
L0878065
L0876085
L0876065
.
.
So it's a kind of binary table with 3 digits, and in this case there are 8 options. But the number of B characters in the string can be other than 3 (it can be any number).
I have tried to use the Python itertools module with something like this:
list(itertools.product(*["86"] * 3))
Which will provide the following result:
[('8', '8', '8'), ('8', '8', '6'), ('8', '6', '8'), ('8', '6', '6'), ('6', '8', '8'), ('6', '8', '6'), ('6', '6', '8'), ('6', '6', '6')]
which I assume I can then later use to swap the B characters. However, for some reason I can't make itertools work in my environment. I assume it has something to do with the fact that I am using Jython and not pure Python.
I would be happy to hear any other ideas on how to complete this task. Maybe there is a simpler solution I didn't think of?
itertools.product accepts a repeat keyword that you can use:
In [92]: from itertools import product
In [93]: word = "L0B7B0B5"
In [94]: subs = product("68", repeat=word.count("B"))
In [95]: list(subs)
Out[95]:
[('6', '6', '6'),
('6', '6', '8'),
('6', '8', '6'),
('6', '8', '8'),
('8', '6', '6'),
('8', '6', '8'),
('8', '8', '6'),
('8', '8', '8')]
Then one fairly concise method to make the substitutions is to do a reduction operation with the string replace method:
In [97]: subs = product("68", repeat=word.count("B"))
In [98]: [reduce(lambda s, c: s.replace('B', c, 1), sub, word) for sub in subs]
Out[98]:
['L0676065',
'L0676085',
'L0678065',
'L0678085',
'L0876065',
'L0876085',
'L0878065',
'L0878085']
Another method, using a couple more functions from itertools:
In [90]: from itertools import chain, izip_longest
In [91]: subs = product("68", repeat=word.count("B"))
In [92]: [''.join(chain(*izip_longest(word.split('B'), sub, fillvalue=''))) for sub in subs]
Out[92]:
['L0676065',
'L0676085',
'L0678065',
'L0678085',
'L0876065',
'L0876085',
'L0878065',
'L0878085']
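The session above is Python 2 (reduce is a builtin there and izip_longest lives in itertools). For anyone on Python 3, a rough port of both methods could look like the sketch below: reduce moves to functools and izip_longest is called zip_longest. The function names expand_with_replace and expand_with_split and their placeholder and choices parameters are my own, not from the answer.

```python
from functools import reduce
from itertools import chain, product, zip_longest


def expand_with_replace(word, placeholder="B", choices="68"):
    """Replace each placeholder in turn, left to right, for every combination."""
    subs = product(choices, repeat=word.count(placeholder))
    return [
        reduce(lambda s, c: s.replace(placeholder, c, 1), sub, word)
        for sub in subs
    ]


def expand_with_split(word, placeholder="B", choices="68"):
    """Split on the placeholder and interleave the pieces with each combination."""
    pieces = word.split(placeholder)
    subs = product(choices, repeat=len(pieces) - 1)
    return [
        "".join(chain(*zip_longest(pieces, sub, fillvalue="")))
        for sub in subs
    ]


candidates = expand_with_replace("L0B7B0B5")
print(candidates)
print("L0878065" in candidates)  # True
```

Note that the replace-based version relies on the placeholder not being one of the substituted characters, which holds for B versus 6 and 8.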
Here is a simple recursive function for generating your strings (this is pseudo-code):
permut(char[] original, char[] buff, int i) {
    if (i < original.length) {
        if (original[i] == 'B') {
            buff[i] = '6'
            permut(original, buff, i+1)
            buff[i] = '8'
            permut(original, buff, i+1)
        }
        else if (original[i] == 'Q') {
            buff[i] = '9'
            permut(original, buff, i+1)
        }
        else {
            buff[i] = original[i]
            permut(original, buff, i+1)
        }
    }
    else {
        store buff[]
    }
}
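Since the question is about Python (Jython), here is one possible translation of that recursion into plain Python with no itertools. Treat it as a sketch: the SUBSTITUTIONS table and the expand name are my own, and the table simply encodes the B and Q rules described in the question.

```python
# Ambiguous OCR characters and the digits each one may stand for.
SUBSTITUTIONS = {"B": "68", "Q": "9"}


def expand(original, i=0, buff=None, results=None):
    """Recursively build every candidate string for an OCR result."""
    if buff is None:
        buff = [""] * len(original)
    if results is None:
        results = []
    if i == len(original):
        # Every position is filled in: store a copy of the buffer as a string.
        results.append("".join(buff))
        return results
    # Characters without a rule map to themselves (a single candidate).
    for candidate in SUBSTITUTIONS.get(original[i], original[i]):
        buff[i] = candidate
        expand(original, i + 1, buff, results)
    return results


print(expand("L0B7B0B5"))
```

The recursion depth equals the string length, so for strings of 6 to 8 characters it stays far below Python's recursion limit.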
print activities
activities = sorted(activities,key = lambda item:item[1])
print activities
Here activities is a list of tuples of the form (start_number, finish_number). As I understand it, the output of the above code should be the list sorted in increasing order of finish_number. When I tried the code in the shell I got the following output. I am not sure why the second list is not sorted in increasing order of finish_number. Please help me understand this.
[('1', '4'), ('3', '5'), ('0', '6'), ('5', '7'), ('3', '9'), ('5', '9'), ('6', '10'), ('8', '11'), ('8', '12'), ('2', '14'), ('12', '16')]
[('6', '10'), ('8', '11'), ('8', '12'), ('2', '14'), ('12', '16'), ('1', '4'), ('3', '5'), ('0', '6'), ('5', '7'), ('3', '9'), ('5', '9')]
You are sorting strings instead of integers: as strings, '10' is "smaller" than '4'. To sort on integers, change it to this:
activities = sorted(activities, key=lambda item: int(item[1]))
print activities
Results in:
[('1', '4'), ('3', '5'), ('0', '6'), ('5', '7'), ('3', '9'), ('5', '9'), ('6', '10'), ('8', '11'), ('8', '12'), ('2', '14'), ('12', '16')]
Your items are being compared as strings, not as numbers. Thus, since the 1 character comes before 4 lexicographically, it makes sense that 10 comes before 4.
You need to cast the value to an int first:
activities = sorted(activities,key = lambda item:int(item[1]))
You are sorting strings, not numbers. Strings get sorted character by character.
So, for example '40' is greater than '100' because character 4 is larger than 1.
You can fix this on the fly by simply casting the item as an integer.
activities = sorted(activities,key = lambda item: int(item[1]))
It's because you're not storing the number as a number, but as a string. The string '10' comes before the string '2'. Try:
activities = sorted(activities, key=lambda i: int(i[1]))
Look for a BROADER solution to your problem: convert your data from str to int immediately on input, work with it as int (otherwise you'll continually bump into little problems like this), and format your data as str for output.
This principle applies generally, e.g. when working with non-ASCII string data, do UTF-8 -> unicode -> UTF-8; don't try to manipulate undecoded text.
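As a rough illustration of that convert-on-input approach, using the data from the question: the raw_activities name is my own, and the output format at the end is only an example.

```python
# Raw string tuples, as they might arrive from input (data from the question).
raw_activities = [('6', '10'), ('8', '11'), ('8', '12'), ('2', '14'), ('12', '16'),
                  ('1', '4'), ('3', '5'), ('0', '6'), ('5', '7'), ('3', '9'), ('5', '9')]

# 1. Convert to int once, at the boundary.
activities = [(int(start), int(finish)) for start, finish in raw_activities]

# 2. Work with ints; no conversion is needed in the key function any more.
activities.sort(key=lambda item: item[1])

# 3. Format back to text only when producing output.
for start, finish in activities:
    print("{} {}".format(start, finish))
```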