How do I sort this list numerically?
sa = ['3 :mat', '20 :zap', '20 :jhon', '5 :dave', '14 :maya' ]
print(sorted(sa))
This prints the list in string (lexicographic) order:
['14 :maya', '20 :jhon', '20 :zap', '3 :mat', '5 :dave']
Since your numbers are part of the string, you can extract them in a key function:
sorted(sa, key = lambda x: int(x.split(' ')[0]))
You can do something like the below, which will use the numbers in the string and sort them.
sa.sort(key=lambda x: int(''.join(filter(str.isdigit, x))))
print(sa)
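One caveat with joining every digit in the string: if the name part ever contains a digit, that digit becomes part of the number. Here is a minimal sketch of the difference, assuming every entry looks like '<number> :<name>'. The helper names leading_number and all_digits and the list sample are made up for illustration.

```python
def leading_number(entry):
    """Return the integer before the first whitespace of an entry like '20 :zap'."""
    return int(entry.split(maxsplit=1)[0])


def all_digits(entry):
    """Join every digit in the entry, as the filter(str.isdigit, ...) key does."""
    return int(''.join(filter(str.isdigit, entry)))


# Hypothetical data: one of the names contains a digit.
sample = ['3 :mat2', '20 :zap', '5 :dave']

by_leading = sorted(sample, key=leading_number)  # keys: 3, 20, 5
by_digits = sorted(sample, key=all_digits)       # keys: 32, 20, 5

print(by_leading)
print(by_digits)
```

With the data from the question both keys give the same order, so this only matters if the names can contain digits.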
Using a regular expression:
import re
sorted(sa, key=lambda x: int(re.findall(r'\d+', x)[0]))
['3 :mat', '5 :dave', '14 :maya', '20 :zap', '20 :jhon']
Using module natsort
from natsort import natsorted
natsorted(sa)
['3 :mat', '5 :dave', '14 :maya', '20 :jhon', '20 :zap']
Suppose I have a list with the following elements:
times = ['1 day ago', '1 day ago', '1 day ago', '1 day ago', '1 day ago', '7 days ago', '5 months ago', '27 days ago', '7 days ago', '7 days ago', '1 month ago', '1 month ago', '7 days ago', '7 days ago', '7 days ago', '7 days ago', '27 days ago', '1 month ago', '6 hours ago', '22 hours ago', '20 hours ago', '15 hours ago', '1 day ago', '4 days ago', '10 days ago', '8 days ago', '6 days ago', '7 days ago', '8 hours ago', '14 days ago', '14 days ago', '22 days ago', '2 months ago', '2 months ago', '2 months ago']
I am wondering how I can find the entry corresponding to the shortest duration. I have an idea to loop over days, months, etc., but this feels very inefficient. Does anyone have any ideas? Thanks!
You can convert to datetime.timedelta which can be used with min:
from datetime import timedelta
def convert(s):
    n, unit, __ = s.split()
    n = int(n)
    if unit.startswith('month'):  # assuming "1 month" means 30 days
        n *= 30
        unit = 'days'
    if not unit.endswith('s'):
        unit += 's'
    return timedelta(**{unit: n})
Then convert the strings and take the minimum:
deltas = [convert(s) for s in times]
min(deltas)
Or use this method as a key to min:
min(times, key=convert)
I would redefine the key parameter of the min built-in function:
def value(t):
    x = t.split()
    number = int(x[0])
    number *= (1 if x[1].startswith("hour") else
               24 if x[1].startswith("day") else
               24*30)
    return number
result = min(times, key=value)
Here I suppose that a month lasts 30 days (this is not always the case).
There you go:
times = ['1 day ago', '1 day ago', '1 day ago', '1 day ago', '1 day ago', '7 days ago', '5 months ago', '27 days ago', '7 days ago', '7 days ago', '1 month ago', '1 month ago', '7 days ago', '7 days ago', '7 days ago', '7 days ago', '27 days ago', '1 month ago', '6 hours ago', '22 hours ago', '20 hours ago', '15 hours ago', '1 day ago', '4 days ago', '10 days ago', '8 days ago', '6 days ago', '7 days ago', '8 hours ago', '14 days ago', '14 days ago', '22 days ago', '2 months ago', '2 months ago', '2 months ago']
def evaluate(time):
    if 'hour' in time:
        return int(time.split(' ')[0])
    if 'day' in time:
        return int(time.split(' ')[0]) * 24
    if 'month' in time:
        return int(time.split(' ')[0]) * 24 * 30

values = [evaluate(time) for time in times]
minValue = min(values)
minIndex = values.index(minValue)
print(minIndex)
print(times[minIndex])
First you have to parse your input and convert it to something your program can work with:
for t in times:
    num, unit, _ = t.split()
    num = int(num)  # here you have an integer
Then you can use a dictionary to convert your values to seconds (in this example) so that everything has the same unit of measure.
units = {"second": 1, "minute": 60, "hour": 3600, ... }
So you can extract your unit:
if unit.endswith("s"):  # remove the plural `s`
    unit = unit[:-1]
converted_unit = units[unit]
seconds_ago = converted_unit * num
And here you have it: a single number that you can compare with the others. Then it is just a matter of finding the minimum.
Enjoy!
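The steps above can be put together as a key function for min. This is only a sketch: the table UNITS and the function seconds_ago are names chosen here, and the month (30 days) and year (365 days) lengths are assumptions, not exact values.

```python
# Seconds per unit. ASSUMPTION: a month is 30 days and a year is 365 days.
UNITS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}


def seconds_ago(text):
    """Convert a string such as '7 days ago' to a number of seconds."""
    num, unit, _ = text.split()
    unit = unit.rstrip("s")  # 'days' -> 'day', 'hours' -> 'hour'
    return int(num) * UNITS[unit]


# A shortened version of the list from the question.
sample_times = ['1 day ago', '5 months ago', '6 hours ago', '22 hours ago']

shortest = min(sample_times, key=seconds_ago)
print(shortest)  # 6 hours ago
```

Because the original strings are passed to min and the conversion is only used as the key, the result is the entry itself rather than its value in seconds.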
I have a pandas data frame with 26 columns. I need to create a bar plot based on the unique values of a column, in a particular order. I have managed to extract the unique values of the column into an array. Now I want to sort that array in a particular order. Is there any way to do this?
NOTE:
I would prefer not to disturb the index of the dataframe, based on this column.
my code
e= df['emp_length'].dropna().unique()
e = np.sort(e)
sns.countplot(x='emp_length',order=e,data=df)
The array e is ordered as below
array(['1 year', '10+ years', '2 years', '3 years', '4 years', '5 years',
'6 years', '7 years', '8 years', '9 years', '< 1 year'],
dtype=object)
However, I want the array to be ordered as below
array(['< 1 year','1 year', '2 years', '3 years', '4 years', '5 years',
'6 years', '7 years', '8 years', '9 years', '10+ years'],
dtype=object)
natsorted gets you close to what you need, but the order then still has to be adjusted by moving the last value to the front:
a = np.array(['1 year', '10+ years', '2 years', '3 years', '4 years', '5 years',
'6 years', '7 years', '8 years', '9 years', '< 1 year'])
from natsort import natsorted
b = natsorted(a)
print (b[-1:] + b[:-1])
['< 1 year', '1 year', '2 years', '3 years',
'4 years', '5 years', '6 years', '7 years',
'8 years', '9 years', '10+ years']
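If you would rather not depend on natsort, a custom sort key is another option. The sketch below assumes every label contains exactly one number, optionally preceded by '<' or followed by '+'. The function name emp_length_key is made up for illustration.

```python
import re


def emp_length_key(label):
    """Sort key for labels such as '< 1 year', '3 years' or '10+ years'."""
    years = int(re.search(r'\d+', label).group())
    # -1 for '< N', +1 for 'N+', 0 otherwise,
    # so '< 1 year' sorts just before '1 year'.
    if label.lstrip().startswith('<'):
        modifier = -1
    elif '+' in label:
        modifier = 1
    else:
        modifier = 0
    return (years, modifier)


labels = ['1 year', '10+ years', '2 years', '3 years', '4 years', '5 years',
          '6 years', '7 years', '8 years', '9 years', '< 1 year']

ordered = sorted(labels, key=emp_length_key)
print(ordered)
```

The resulting list can be passed as order= to sns.countplot in the same way as e in the question, which leaves the index of the data frame untouched.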
So, I am working on a project, and I have the following list :
a = ['2 co',' 2 tr',' 2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']
I want to write code that checks whether the first character of each string is also the first character of another string, and if so adds those strings to a new list.
I know how to do it, but only for two strings. Here, I want to select all the strings that start with the same character and group them by a given number of original strings. For example, I want to build groups of 3 strings (taken from the original list) covering all the possible combinations of strings that start with the same character.
Also, I want each set of strings to appear only once: the result should not contain the same strings again in a different order.
The expected result in that case (i.e when i want strings of 3 substrings and with a = ['2 co',' 2 tr',' 2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']) is:
['2 co, 2 tr, 2 pi', '2 co, 2 tr, 2 ca', '2 pi, 2 ca, 2 tr', '2 pi, 2 ca, 2 co', '3 co, 3 ca, 3 pi']
You can see that '2 tr, 2 co, 2 pi' is not included, because '2 co, 2 tr, 2 pi' is already there.
And when I want groups of 4, the expected output is
['2 co, 2 tr, 2 pi, 2 ca']
I managed to do it, but only for groups of two, and it returns every combination, including those with the same strings in a different order. Here it is:
a = ['2 co',' 2 tr',' 2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']
result = []
for i in range(len(a)):
    for j in a[:i]+a[i+1:]:
        if a[i][0] == j[0]:
            result.append(j)
print(result)
Thanks for your help !
You can use itertools.groupby and itertools.combinations for that task:
import itertools as it
import operator as op
groups = it.groupby(sorted(a), key=op.itemgetter(0))
result = [', '.join(c) for g in groups for c in it.combinations(g[1], 3)]
Note that if the order of elements should only depend on the first character, you might want to add another key=op.itemgetter(0) to the sorted call. If the data is already presorted such that "similar" items (with the same first character) are next to each other, then you can drop the sorted call altogether.
Details
it.groupby puts the data into groups, based on their first character (due to key=op.itemgetter(0), which selects the first item, i.e. the first character, from each string). Expanding groups, it looks like this:
[('2', ['2 co', '2 tr', '2 pi', '2 ca']),
('3', ['3 co', '3 ca', '3 pi']),
('6', ['6 tr', '6 pi']),
('7', ['7 ca', '7 pi']),
('8', ['8 tr'])]
Then, for each of the groups, it.combinations(..., 3) computes all possible combinations of length 3, and the list comprehension joins each one into a string (for groups with fewer than 3 members no combinations are possible):
['2 co, 2 tr, 2 pi',
'2 co, 2 tr, 2 ca',
'2 co, 2 pi, 2 ca',
'2 tr, 2 pi, 2 ca',
'3 co, 3 ca, 3 pi']
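The list in the question contains entries with a leading space (' 2 tr', ' 2 pi'), which would otherwise end up in a group of their own under the key ' '. Here is a sketch that strips the strings first and takes the group size as a parameter. The function name same_prefix_combinations is made up for illustration, and grouping by the first character assumes single-digit prefixes.

```python
import itertools as it
import operator as op


def same_prefix_combinations(items, size):
    """Join every combination of `size` items sharing the same first character."""
    first_char = op.itemgetter(0)
    cleaned = [s.strip() for s in items]
    # Sort by the first character only; the sort is stable, so items keep
    # their original relative order inside each group.
    cleaned.sort(key=first_char)
    return [', '.join(combo)
            for _, group in it.groupby(cleaned, key=first_char)
            for combo in it.combinations(group, size)]


a = ['2 co', ' 2 tr', ' 2 pi', '2 ca', '3 co', '3 ca', '3 pi',
     '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']

threes = same_prefix_combinations(a, 3)
fours = same_prefix_combinations(a, 4)
print(threes)
print(fours)
```

Since it.combinations never yields the same set of elements in a different order, each group of strings appears only once in the result.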
I am new to Python and am looking into scraping HTML using the Python BeautifulSoup library.
I need to fetch the date field value as day and date, and the precip field value together with its measuring unit.
Python code
dates=[]
Precip=[]
for row in right_table.findAll("tr"):
    cells = row.findAll('td')
    th_cells=row.findAll('th') #To store second column data
    if len(cells)==5:
        Precip.append(cells[1].find(text=True))
        dates.append(th_cells[0].find(text=True))
print(dates)
print(Precip)
Code Output
['Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ']
['0 ', '0 ', '0 ', '1 ', '3 ', '3 ', '13 ', '0 ', '0 ', '0 ', '0 ', '0 ', '\xa0', '1 ', '3 ', '0 ', '1 ', '4 ', '2 ', '9 ', '2 ', '0 ', '1 ', '0 ', '0 ', '0 ', '0 ', '0 ', '1 ', '2 ']
Required Output
['Wed 11/1','Thur 11/2'.......]
['0mm','0mm'....]
Below is the HTML which i am trying to parse
HTML
<class 'list'>: ['\n', <thead>
<tr>
<th>Date</th>
<th>Hi/Lo</th>
<th>Precip</th>
<th>Snow</th>
<th>Forecast</th>
<th>Avg. HI / LO</th>
</tr>
</thead>, '\n', <tbody>
<tr class="pre">
<th scope="row">Wed <time>11/1</time></th>
<td>25°/20°</td>
<td>0 <span class="small">mm</span></td>
<td>0 <span class="small">CM</span></td>
<td> </td>
<td>28°/18°</td>
</tr>
<tr class="pre">
<th scope="row">Thu <time>11/2</time></th>
<td>28°/19°</td>
<td>0 <span class="small">mm</span></td>
<td>0 <span class="small">CM</span></td>
<td> </td>
<td>27°/18°</td>
</tr>
I'd use .text instead of .find(text=True). With .find(text=True) you only get the first text node of the cell, so the content of subtags such as <time> and <span> is never fetched.
from bs4 import BeautifulSoup
import requests
html = requests.get("https://www.accuweather.com/en/in/bengaluru/204108/month/204108?view=table").text
soup = BeautifulSoup(html, 'html.parser')
right_table = soup.find("tbody")
dates=[]
Precip=[]
for row in right_table.findAll("tr"):
    cells = row.findAll('td')
    th_cells=row.findAll('th') #To store second column data
    if len(cells)==5:
        Precip.append(cells[1].text)
        dates.append(th_cells[0].text)
print(dates)
print(Precip)
This produces the desired output:
['Wed 11/1', 'Thu 11/2', 'Fri 11/3', 'Sat 11/4', 'Sun 11/5', 'Mon 11/6', 'Tue 11/7', 'Wed 11/8', 'Thu 11/9', 'Fri 11/10', 'Sat 11/11', 'Sun 11/12', 'Mon 11/13', 'Tue 11/14', 'Wed 11/15', 'Thu 11/16', 'Fri 11/17', 'Sat 11/18', 'Sun 11/19', 'Mon 11/20', 'Tue 11/21', 'Wed 11/22', 'Thu 11/23', 'Fri 11/24', 'Sat 11/25', 'Sun 11/26', 'Mon 11/27', 'Tue 11/28', 'Wed 11/29', 'Thu 11/30']
['0 mm', '0 mm', '0 mm', '1 mm', '3 mm', '3 mm', '13 mm', '0 mm', '0 mm', '0 mm', '0 mm', '0 mm', '\xa0', '1 mm', '3 mm', '0 mm', '1 mm', '4 mm', '2 mm', '9 mm', '2 mm', '0 mm', '1 mm', '0 mm', '0 mm', '0 mm', '0 mm', '0 mm', '1 mm', '2 mm']
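The required output in the question has no space before the unit ('0mm'), and one of the cells contains only a non-breaking space ('\xa0'). Here is a small post-processing sketch in plain Python, run on a shortened copy of the lists above. The helper names squeeze and normalize are made up for illustration.

```python
def squeeze(text):
    """Remove all whitespace, including non-breaking spaces (U+00A0)."""
    return ''.join(text.split())


def normalize(text):
    """Collapse runs of whitespace to one space and trim both ends."""
    return ' '.join(text.split())


# Shortened sample of the scraped values shown above.
dates_raw = ['Wed 11/1', 'Thu 11/2 ', 'Mon 11/13']
precip_raw = ['0 mm', '13 mm', '\xa0']

dates_clean = [normalize(d) for d in dates_raw]
precip_clean = [squeeze(p) for p in precip_raw]

print(dates_clean)
print(precip_clean)
```

str.split() without arguments treats the non-breaking space as whitespace, so the empty cell becomes an empty string. You can then filter it out or replace it with a placeholder, whichever suits the data better.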