How can I sort this list numerically? - python

How do I sort this list numerically?
sa = ['3 :mat', '20 :zap', '20 :jhon', '5 :dave', '14 :maya' ]
print(sorted(sa))
This shows
['14 :maya', '20 :jhon', '20 :zap', '3 :mat', '5 :dave']

You can do it like this, since your numbers are part of the string:
sorted(sa, key = lambda x: int(x.split(' ')[0]))

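You can do something like the below, which joins all the digits found in each string into one integer and sorts by that value (so it assumes the only digits in a string are the leading number).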
sa.sort(key=lambda x: int(''.join(filter(str.isdigit, x))))
print(sa)

Using regex:
import re
sorted(sa, key=lambda x: int(re.findall(r'\d+', x)[0]))
['3 :mat', '5 :dave', '14 :maya', '20 :zap', '20 :jhon']
Using the natsort module (a third-party package):
from natsort import natsorted
natsorted(sa)
['3 :mat', '5 :dave', '14 :maya', '20 :jhon', '20 :zap']
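The regex answer and natsort return the two '20' entries in different orders. A key that looks only at the number leaves ties in their original order (Python's sort is stable), whereas natsort also takes the text after the number into account. If that second behaviour is wanted without the third-party package, a tuple key is one option. This is only a sketch: it assumes every entry has the form '<number> :<name>', and the helper name number_then_name is made up for illustration.

```python
sa = ['3 :mat', '20 :zap', '20 :jhon', '5 :dave', '14 :maya']

def number_then_name(entry):
    # Assumes the '<number> :<name>' layout of the sample data.
    number, _, name = entry.partition(' :')
    # Sort by the integer first, then alphabetically by the name.
    return int(number), name

result = sorted(sa, key=number_then_name)
print(result)
```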

Related

Setting X-Tick Labels on Transposed Line Plot

I'm trying to properly label my line plot and set the x-tick labels but have been unsuccessful.
Here is what I've tried so far:
plt.xticks(ticks = ... ,labels =...)
AND
labels = ['8 pcw', '12 pcw', '13 pcw', '16 pcw', '17 pcw', '19 pcw', '21 pcw',
'24 pcw', '35 pcw', '37 pcw', '4 mos', '1 yrs', '2 yrs', '3 yrs',
'4 yrs', '8 yrs', '11 yrs', '13 yrs', '18 yrs', '19 yrs', '21 yrs',
'23 yrs', '30 yrs', '36 yrs', '37 yrs', '40 yrs']
ax.set_xticks(labels)
The code that I've used to transpose this dataframe into a line graph is this:
mean_df.transpose().plot.line(figsize = (25, 10))
plt.xlabel("Age")
plt.ylabel("Raw RPKM")
plt.title("BTRC Expression in V1C")
The dataframe I'm using (mean_df) contains columns that are already named with their respective label (8 pcw, 12 pcw, ... 36yrs, 40yrs) so I would have thought that it would have pulled them automatically from there. However, it looks like matplotlib automatically removes the x-ticks and displays only 5 values for the x-ticks. How can I get it to display all 24 values instead?
I keep getting the following two errors when I try the methods listed above:
Failed to convert value(s) to axis units:
OR
ValueError: The number of FixedLocator locations (n), usually from a
call to set_ticks, does not match the number of ticklabels (n)
Here is an image of my plot:

Python regex find single digit if no digits before it

I have a list of strings and I want to use regex to get a single digit if there are no digits before it.
import re
strings = ['5.8 GHz', '5 GHz']
for s in strings:
    print(re.findall(r'\d\s[GM]?Hz', s))
# output
['8 GHz']
['5 GHz']
# desired output
['5 GHz']
I want it to just return '5 GHz', the first string shouldn't have any matches. How can I modify my pattern to get the desired output?
As per my comment, it seems that you can use:
(?<!\d\.)\d+\s[GM]?Hz\b
This matches:
(?<!\d\.) - A negative lookbehind asserting that the position is not immediately preceded by a digit and a literal dot.
\d+ - One or more digits, matching the integer part of the frequency.
\s - A single whitespace character.
[GM]?Hz - An optional uppercase G or M followed by "Hz".
\b - A word boundary.
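For reference, here is a small sketch of that pattern applied with re.findall to the list from the question. The variable names pattern and matches are only illustrative.

```python
import re

# Pattern from the answer above: an integer frequency that is not the
# fractional part of a one-digit decimal such as "5.8".
pattern = re.compile(r'(?<!\d\.)\d+\s[GM]?Hz\b')

strings = ['5.8 GHz', '5 GHz']
matches = [pattern.findall(s) for s in strings]
print(matches)

# Caveat: the lookbehind only inspects the two characters right before
# the match, so a fraction with two or more digits still leaks a match.
leaky = pattern.findall('15.85 GHz')
print(leaky)
```

The first print shows an empty list for '5.8 GHz' and ['5 GHz'] for '5 GHz'. The second shows that '15.85 GHz' still yields '5 GHz', so the pattern is only safe when the fractional part has a single digit, as in the question.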
>>> import re
>>> strings = ['5.8 GHz', '5 GHz']
>>>
>>> for s in strings:
...     match = re.match(r'^[^0-9]*([0-9] [GM]Hz)', s)
...     if match:
...         print(match.group(1))
...
5 GHz
Updated Answer
import re
a = ['5.8 GHz', '5 GHz', '8 GHz', '1.2', '1.2 Some Random String', '1 Some String', '1 MHz of frequency', '2 Some String in Between MHz']
res = []
for fr in a:
    if re.match(r'^[0-9](?=.[^0-9])(\s)[GM]Hz$', fr):
        res.append(fr)
print(res)
Output:
['5 GHz', '8 GHz']
My two cents:
selected_strings = list(filter(
    lambda x: re.findall(r'(?:^|\s+)\d+\s+(?:G|M)Hz', x),
    strings
))
With strings set to ['2 GHz', '5.8 GHz', ' 5 GHz', '3.4 MHz', '3 MHz', '1 MHz of Frequency'], selected_strings is:
['2 GHz', ' 5 GHz', '3 MHz', '1 MHz of Frequency']

Given a list of days, months, and hours in python, how to find the minimum?

Suppose I have a list with the following elements:
times = ['1 day ago', '1 day ago', '1 day ago', '1 day ago', '1 day ago', '7 days ago', '5 months ago', '27 days ago', '7 days ago', '7 days ago', '1 month ago', '1 month ago', '7 days ago', '7 days ago', '7 days ago', '7 days ago', '27 days ago', '1 month ago', '6 hours ago', '22 hours ago', '20 hours ago', '15 hours ago', '1 day ago', '4 days ago', '10 days ago', '8 days ago', '6 days ago', '7 days ago', '8 hours ago', '14 days ago', '14 days ago', '22 days ago', '2 months ago', '2 months ago', '2 months ago']
I am wondering how I can find the entry corresponding to the shortest duration. I have an idea to use a loop over days, months, etc., but this feels very inefficient. Does anyone have any ideas? Thanks!
You can convert to datetime.timedelta which can be used with min:
from datetime import timedelta
def convert(s):
    n, unit, __ = s.split()
    n = int(n)
    if unit.startswith('month'):  # assuming "1 month" means 30 days
        n *= 30
        unit = 'days'
    if not unit.endswith('s'):
        unit += 's'
    return timedelta(**{unit: n})
Then convert the strings and take the minimum:
deltas = [convert(s) for s in times]
min(deltas)
Or use this method as a key to min:
min(times, key=convert)
I would redefine the key parameter of the min built-in function:
def value(t):
    x = t.split()
    number = int(x[0])
    number *= (1 if x[1].startswith("hour") else
               24 if x[1].startswith("day") else
               24*30)
    return number
result = min(times, key=value)
Here I suppose that a month lasts 30 days (this is not always the case).
There you go:
times = ['1 day ago', '1 day ago', '1 day ago', '1 day ago', '1 day ago', '7 days ago', '5 months ago', '27 days ago', '7 days ago', '7 days ago', '1 month ago', '1 month ago', '7 days ago', '7 days ago', '7 days ago', '7 days ago', '27 days ago', '1 month ago', '6 hours ago', '22 hours ago', '20 hours ago', '15 hours ago', '1 day ago', '4 days ago', '10 days ago', '8 days ago', '6 days ago', '7 days ago', '8 hours ago', '14 days ago', '14 days ago', '22 days ago', '2 months ago', '2 months ago', '2 months ago']
def evaluate(time):
    if 'hour' in time:
        return int(time.split(' ')[0])
    if 'day' in time:
        return int(time.split(' ')[0]) * 24
    if 'month' in time:
        return int(time.split(' ')[0]) * 24 * 30
values = [evaluate(time) for time in times]
minValue = min(values)
minIndex = values.index(minValue)
print(minIndex)
print(times[minIndex])
First you have to parse your input and convert it to something your program can work with:
for t in times:
    num, unit, _ = t.split()
    num = int(num)  # here you have an integer
Then you can use a dictionary to convert your values to seconds (in this example) so that they all share the same unit of measure.
units = {"second": 1, "minute": 60, "hour": 3600, ... }
So you can extract your unit:
if unit.endswith("s"):  # remove the plural `s`
    unit = unit[:-1]
converted_unit = units[unit]
seconds_ago = converted_unit * num
And there you have it: a single number that you can compare with the others. Then it is just a matter of finding the minimum.
Enjoy!
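Putting the pieces of this dictionary-based approach together could look roughly like the sketch below. The "day" and "month" entries are not in the answer's truncated dictionary and are assumptions here (a month is treated as 30 days, as in the other answers); the names SECONDS_PER_UNIT and seconds_ago are likewise only illustrative, and a shortened sample list is used.

```python
# Seconds per unit. The "day" and "month" values are assumptions;
# a month is taken to be 30 days.
SECONDS_PER_UNIT = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "month": 30 * 86400,
}

def seconds_ago(text):
    # Expects strings shaped like '7 days ago'.
    num, unit, _ = text.split()
    if unit.endswith("s"):  # drop the plural 's'
        unit = unit[:-1]
    return int(num) * SECONDS_PER_UNIT[unit]

times = ['1 day ago', '5 months ago', '6 hours ago', '22 hours ago', '27 days ago']

# Using the converter as a key keeps the original string as the result.
shortest = min(times, key=seconds_ago)
print(shortest)
```

Passing the converter as key to min returns the original string rather than the number of seconds, which avoids a separate index lookup.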

Match and append inside a list

So, I am working on a project, and I have the following list :
a = ['2 co',' 2 tr',' 2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']
I want to write code that checks whether the first character of each string is also the first character of another string, and if so adds those strings to a new list.
I know how to do it, but only for two strings. Here, I want it to select all the strings that start with the same character and group them by a given number of original strings. For example, I want to build groups of 3 strings (taken from the original list) covering all the possible combinations of strings that start with the same character.
Also, I want each combination to appear only once, rather than once for every ordering of the same strings.
The expected result in that case (i.e. when I want groups of 3 strings and with a = ['2 co',' 2 tr',' 2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']) is:
['2 co, 2 tr, 2 pi', '2 co, 2 tr, 2 ca', '2 pi, 2 ca, 2 tr', '2 pi, 2 ca, 2 co', '3 co, 3 ca, 3 pi']
You can see that I don't have '2 tr, 2 co, 2 pi' here, because I already have '2 co, 2 tr, 2 pi'.
And when I want to group by sublists of 4, the expected output is:
['2 co, 2 tr, 2 pi, 2 ca']
I managed to do it, but only when grouping by subsets of two, and it gives all the combinations, including the ones with the same strings in a different order. Here it is:
a = ['2 co',' 2 tr',' 2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']
result = []
for i in range(len(a)):
    for j in a[:i]+a[i+1:]:
        if a[i][0] == j[0]:
            result.append(j)
print(result)
Thanks for your help !
You can use itertools.groupby and itertools.combinations for that task:
import itertools as it
import operator as op
groups = it.groupby(sorted(a), key=op.itemgetter(0))
result = [', '.join(c) for g in groups for c in it.combinations(g[1], 3)]
Note that if the order of elements should only depend on the first character you might want to add another key=op.itemgetter(0) to the sorted function. If the data is already presorted such that "similar" items (with the same first character) are next to each other then you can drop the sorted altogether.
Details
it.groupby puts the data into groups, based on their first character (due to key=op.itemgetter(0), which selects the first item, i.e. the first character, from each string). Expanding groups, it looks like this:
[('2', ['2 co', '2 tr', '2 pi', '2 ca']),
('3', ['3 co', '3 ca', '3 pi']),
('6', ['6 tr', '6 pi']),
('7', ['7 ca', '7 pi']),
('8', ['8 tr'])]
Then for each of the groups it.combinations(..., 3) computes all possible combinations of length 3 and joins each one into a string in the list comprehension (for groups with fewer than 3 members no combinations are possible):
['2 co, 2 tr, 2 pi',
'2 co, 2 tr, 2 ca',
'2 co, 2 pi, 2 ca',
'2 tr, 2 pi, 2 ca',
'3 co, 3 ca, 3 pi']
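One thing to watch for: the list in the question contains entries with a leading space (' 2 tr', ' 2 pi'), so grouping on the first character as written would put them in a separate ' ' group. The sketch below strips whitespace first and sorts on the first character only, which keeps the original order within each group because Python's sort is stable. The function name grouped_combinations and its size parameter are hypothetical, not part of the answer above.

```python
import itertools as it
import operator as op

def grouped_combinations(items, size):
    # Strip stray whitespace so ' 2 tr' and '2 co' share the key '2'.
    cleaned = [s.strip() for s in items]
    first_char = op.itemgetter(0)
    # Stable sort on the first character only, then group on it.
    groups = it.groupby(sorted(cleaned, key=first_char), key=first_char)
    # Each group is consumed by combinations() before groupby advances.
    return [', '.join(c) for _, g in groups for c in it.combinations(g, size)]

a = ['2 co', ' 2 tr', ' 2 pi', '2 ca', '3 co', '3 ca', '3 pi',
     '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']

triples = grouped_combinations(a, 3)
quads = grouped_combinations(a, 4)
print(triples)
print(quads)
```

With size 3 this reproduces the five combinations listed above, and with size 4 only the single '2' group is large enough to produce a result.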

How to sort nested dictionary inside a list?

data = [{'USA': [{'accommodations': '2 BR ','Price': 1245},
{'accommodations': '5 BR ','Price': 1045}]},
{'Dubai': [{'accommodations': '2 BR | Sleeps 6','Price': 966},
{'accommodations': '5 BR | Sleeps 6','Price': 800}]}]
I want to sort the above data on the basis of Price.
I know I have to do something like this, but I am confused by the nested dictionaries and lists.
sorted(data, key=lambda k: k["Price"])
Also, I want only the first (i.e. minimum) entry of each sorted list.
Expected output:
data = [{'USA': {'accommodations': '5 BR ','Price': 1045}},
{'Dubai':{'accommodations': '5 BR | Sleeps 6','Price': 800}}]
Because the data is fairly deeply nested, you could use a dict comprehension (which keeps only the lowest-priced accommodation for each country) inside a list comprehension (which iterates through all the countries):
>>> [{k: min(v, key=lambda x: x["Price"]) for k, v in item.items()} for item in data]
[{'USA': {'accommodations': '5 BR ', 'Price': 1045}}, {'Dubai': {'accommodations': '5 BR | Sleeps 6', 'Price': 800}}]
Resources:
[Python]: PEP 274 -- Dict Comprehensions
[Python]: List Comprehensions
[Python]: min(iterable, *[, key, default])
This is one way using list comprehensions. Since your dictionaries contain one item each, we extract the first key and first value. Alternatively, you could use next(iter(d.keys())) / next(iter(d.values())).
from operator import itemgetter
# sort increasing by price
res = [{list(item.keys())[0]: sorted(list(item.values())[0], key=itemgetter('Price'))}
       for item in data]
# get lowest price
res = [{list(item.keys())[0]: min(list(item.values())[0], key=itemgetter('Price'))}
       for item in data]
print(res)
[{'USA': {'accommodations': '5 BR ', 'Price': 1045}},
{'Dubai': {'accommodations': '5 BR | Sleeps 6', 'Price': 800}}]
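The next(iter(...)) alternative mentioned above could be sketched as follows. It uses next(iter(item.items())) to pull out the key and value together, which is a small variation on the keys()/values() calls named in the answer, and it assumes each dictionary in data holds exactly one country. The names cheapest and rooms are only illustrative.

```python
from operator import itemgetter

data = [{'USA': [{'accommodations': '2 BR ', 'Price': 1245},
                 {'accommodations': '5 BR ', 'Price': 1045}]},
        {'Dubai': [{'accommodations': '2 BR | Sleeps 6', 'Price': 966},
                   {'accommodations': '5 BR | Sleeps 6', 'Price': 800}]}]

cheapest = []
for item in data:
    # Assumption: one country per dictionary, so the first item is the only one.
    country, rooms = next(iter(item.items()))
    # Keep only the accommodation with the lowest price.
    cheapest.append({country: min(rooms, key=itemgetter('Price'))})

print(cheapest)
```

This avoids building the intermediate lists that list(item.keys())[0] and list(item.values())[0] create.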
You can use min:
data = [{'USA': [{'accommodations': '2 BR ','Price': 1245},
{'accommodations': '5 BR ','Price': 1045}]},
{'Dubai': [{'accommodations': '2 BR | Sleeps 6','Price': 966},
{'accommodations': '5 BR | Sleeps 6','Price': 800}]}]
final_data = [{a:min(b, key=lambda x:x['Price']) for a, b in i.items()} for i in data]
Output:
[{'USA': {'accommodations': '5 BR ', 'Price': 1045}}, {'Dubai': {'accommodations': '5 BR | Sleeps 6', 'Price': 800}}]
You can try this:
filteredData = []
for records in data:
    for country, accommodations in records.items():
        accommodations = sorted(accommodations, key=lambda k: k["Price"])
        filteredData.append({country: accommodations[0]})
