Remove leading zeros in forecast period string


I need to format forecast period columns so that I can later merge with another data frame.
Columns of my data frame are:
current_cols = [
'01+11',
'02+10',
'03+09',
'04+08',
'05+07',
'06+06',
'07+05',
'08+04',
'09+03',
'10+02',
'11+01'
]
desired_out = [
'1+11',
'2+10',
'3+9',
'4+8',
'5+7',
'6+6',
'7+5',
'8+4',
'9+3',
'10+2',
'11+1'
]
Originally, I tried splitting each string with split('+'), applying lstrip('0') to each part, and then recombining the parts with + in between.
Is there a better approach? I'm having trouble joining the parts back together with + in between. Help would be much appreciated.

You can use re module for the task:
import re
pat = re.compile(r"\b0+")
out = [pat.sub(r"", s) for s in current_cols]
print(out)
Prints:
[
"1+11",
"2+10",
"3+9",
"4+8",
"5+7",
"6+6",
"7+5",
"8+4",
"9+3",
"10+2",
"11+1",
]
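One edge case worth noting with this pattern: a field made up only of zeros is removed entirely. The sample data has no such value, so this is a minimal sketch under the assumption that one could occur. The lookahead variant and the name pat_keep_zero are illustrative and not part of the answer above.

```python
import re

current_cols = ['01+11', '02+10', '03+09', '10+02', '11+01']

# Pattern from the answer: strips every run of zeros at a word boundary.
pat = re.compile(r"\b0+")
out = [pat.sub("", s) for s in current_cols]
print(out)

# Hypothetical variant: only strip zeros that are followed by another digit,
# so a field that is all zeros keeps a single "0".
pat_keep_zero = re.compile(r"\b0+(?=\d)")
print(pat.sub("", "00+12"))            # the all-zero field disappears
print(pat_keep_zero.sub("", "00+12"))  # one zero is kept
```

If an all-zero period can never appear in the column names, the original pattern is sufficient.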

current_cols = ['01+11','02+10','03+09','04+08','05+07','06+06','07+05','08+04','09+03','10+02','11+01']
desired_out = []
for item in current_cols:
    # drop a leading zero from the first number
    if item[0] == "0":
        item = item[1:]
    # drop a leading zero from the second number
    if "+0" in item:
        item = item.replace('+0', '+')
    desired_out.append(item)

You can do it with nested comprehensions, conversion to int(), and formatting using an f-string:
current_cols = [
'01+11',
'02+10',
'03+09',
'04+08',
'05+07',
'06+06',
'07+05',
'08+04',
'09+03',
'10+02',
'11+01'
]
desired_out = [
f'{int(a)}+{int(b)}' for (a, b) in [
e.split('+') for e in current_cols
]
]
The code above sets desired_out to:
['1+11', '2+10', '3+9', '4+8', '5+7', '6+6', '7+5', '8+4', '9+3', '10+2', '11+1']
This method implements your original idea: split each element using the + sign as the separator, strip the leading zeros from each part (done by the int() conversion inside the f-string), and join the parts back together with a + sign in between (also done by the f-string).
The inner comprehension walks the list and splits each element on the + sign. The outer comprehension converts each part of each pair with int() to get rid of the leading zeros.
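Since the goal is to rename columns before a merge, the same int()-based idea can be wrapped in a small helper that builds an old-to-new mapping. This is only a sketch: normalize_period and rename_map are hypothetical names, and the DataFrame in the final comment is assumed rather than taken from the question.

```python
current_cols = ['01+11', '02+10', '03+09', '10+02', '11+01']

def normalize_period(label):
    """Drop leading zeros from both halves of an 'MM+NN' label."""
    a, b = label.split('+')
    return f'{int(a)}+{int(b)}'

# Map each old column name to its cleaned-up version.
rename_map = {col: normalize_period(col) for col in current_cols}
print(rename_map)

# With pandas, a mapping like this can be passed to DataFrame.rename:
#   df = df.rename(columns=rename_map)   # df is a hypothetical DataFrame
```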

We want a bunch of map operations here to do the following:
split each element of current_cols on "+":
map(lambda s: s.split("+"), current_cols)
lstrip the "0" out of each element of the resulting lists:
map(lambda l: (x.lstrip("0") for x in l), ...)
join the resulting values on "+":
map("+".join, ...)
Then, we list out the elements of these map operations:
list(
map("+".join,
map(lambda l: (x.lstrip('0') for x in l),
map(lambda s: s.split('+'), current_cols)
)
)
)
which gives:
['1+11',
'2+10',
'3+9',
'4+8',
'5+7',
'6+6',
'7+5',
'8+4',
'9+3',
'10+2',
'11+1']

Related

Python - appending to a list in for loop after if statement

I have a list of list elements that I'm modifying. After that modification I want to put them into a list that has the same structure as the original one:
list_data_type = [
[['void1']], [['uint8']], [['uint8'], ['uint32']], [['void2']], [['void3']], [['void4']], [['void5']]
]
First I check for elements that contain more than one item; in this case that is the element at index 2. Then I convert it to a string, strip the brackets [] and quote marks, and convert it back to a list. Then I do the same for the other elements. After the conversion I want to build a new list from those elements, without the unnecessary symbols. My desired output looks like this:
list_data_converted = [
['void1'], ['uint8'], ['uint8', 'uint32'], ['void2'], ['void3'], ['void4'], ['void5']
]
Conversion works and I can print out the elements, but I have a problem appending them to a list. My code saves only the last value from the original list:
def Convert(string):
    li = list(string.split(" "))
    return li
for element in list_data_type:
    if type(element) == list:
        print("element is a:", element, type(element))
        if len(element) > 1:
            id_of_el = list_data_type.index(element)
            el_str = str(element).replace('[', '').replace("'", '').replace("'", '').replace(']', '').replace(',', '')
            el_con = Convert(el_str)
        elif len(element) <= 1:
            elements_w_1_el = element
            list_el = []
            for i in range(len(elements_w_1_el)):
                el_str_2 = str(element).replace('[', '').replace("'", '').replace("'", '').replace(']', '').replace(',', '')
                list_el.append(elements_w_1_el[i])
And my output, instead of looking like list_data_converted, has only one element: ['void5']. How do I fix that?

Converting a list to a string to flatten it is a very... cumbersome approach.
Try simple list-comprehension:
list_data_type = [[v[0] for v in l] for l in list_data_type]
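As for why the original loop keeps only the last value: list_el = [] is executed inside the loop, so the list is re-created on every pass. A minimal sketch of the same loop with the result list created once beforehand (the inner comprehension mirrors the one above):

```python
list_data_type = [
    [['void1']], [['uint8']], [['uint8'], ['uint32']], [['void2']],
    [['void3']], [['void4']], [['void5']]
]

# Create the result list once, before the loop, so appends accumulate.
list_data_converted = []
for element in list_data_type:
    # Each inner item is a one-element list; keep just its value.
    list_data_converted.append([inner[0] for inner in element])

print(list_data_converted)
```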

Casting the list to a string, replacing characters, and then converting the string back into a list is probably a bad way to achieve what you want.
Try this:
def flatten(lst):
    # an empty list is already flat
    if lst == []:
        return lst
    # first item is itself a list: flatten it, then the rest
    if isinstance(lst[0], list):
        return flatten(lst[0]) + flatten(lst[1:])
    # first item is a plain value: keep it, then flatten the rest
    return lst[:1] + flatten(lst[1:])
list_data_converted = [flatten(element) for element in list_data_type]
This flattens every item inside list_data_type and keeps its values in a single list. It should work for any depth of nesting.
print(list_data_converted) would give the following:
[
['void1'], ['uint8'], ['uint8', 'uint32'], ['void2'], ['void3'], ['void4'], ['void5']
]
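A quick check of the "any depth" claim, together with an alternative sketch. The recursive version above recurses once per list item, so a very long list could hit Python's recursion limit. The generator below, with the hypothetical name iter_flatten, recurses only on nesting depth. It is a variant, not the code from this answer.

```python
def iter_flatten(lst):
    """Yield the non-list items of an arbitrarily nested list, in order."""
    for item in lst:
        if isinstance(item, list):
            # descend into the nested list and yield its items
            yield from iter_flatten(item)
        else:
            yield item

# Made-up input with uneven nesting depth, for illustration only.
nested = [[['void1']], [['uint8'], [['uint32']]], [[[['void2']]]]]
flat = [list(iter_flatten(element)) for element in nested]
print(flat)
```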

Get sum of integers from list of strings

alist = [["Chanel-1000, Dior-2000, Prada-500"],
["Chloe-200,Givenchy-400,LV-600"], ["Bag-1,Bagg-2,Baggg-3"]]
alist_min = [
min(map(str.strip, x[0].split(',')),
key=lambda i: int(str.strip(i).split('-')[-1])) for x in alist
]
print(alist_min)
Given this script, how can I get the sum of the integers in alist_min? For the result ['Prada-500', 'Chloe-200', 'Bag-1'], the summation should give:
#total: 701

You can use sum() and list comprehension with split() function:
sum([int(x.split('-')[1]) for x in alist_min])
Full code:
alist = [["Chanel-1000, Dior-2000, Prada-500"],
["Chloe-200,Givenchy-400,LV-600"], ["Bag-1,Bagg-2,Baggg-3"]]
alist_min = [
min(map(str.strip, x[0].split(',')),
key=lambda i: int(str.strip(i).split('-')[-1])) for x in alist
]
print(alist_min)
print(sum([int(x.split('-')[1]) for x in alist_min]))
Output:
['Prada-500', 'Chloe-200', 'Bag-1']
701
Explanation:
Use split() to split each string in alist_min at the - character into two parts; the second part holds the number.
Convert that part to an int.
Use the above logic in a list comprehension to generate a list of numbers.
Use sum() to take the sum of this list.
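The steps above parse each string twice, once to find the minimum and once to sum. As an alternative sketch, each entry can be parsed into a (name, price) tuple once and reused. The helper name parse and the variable cheapest are illustrative and not from the answer.

```python
alist = [["Chanel-1000, Dior-2000, Prada-500"],
         ["Chloe-200,Givenchy-400,LV-600"], ["Bag-1,Bagg-2,Baggg-3"]]

def parse(entry):
    """Turn 'Name-123' into ('Name', 123)."""
    name, price = entry.strip().rsplit('-', 1)
    return name, int(price)

# Cheapest (name, price) pair per row, then the sum of those prices.
cheapest = [min(map(parse, row[0].split(',')), key=lambda p: p[1])
            for row in alist]
total = sum(price for _, price in cheapest)
print(cheapest)
print(total)
```

Using rsplit('-', 1) splits on the last hyphen only, which would also cope with a name that itself contains a hyphen.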

You can use a regular expression, along with map and sum:
import re
sum(map(int, map(lambda x: re.findall(r'\d+', x)[0], alist_min)))
#output: 701

How to merge first part of a list value prior to a character, based on the values after the character in python

I have a scenario where I need to combine the element values of a list in Python, based on the values that come after a specific character in each element. For example:
I have the input list below:
[('245|CALENDAR_DATE-DATE'), ('129|AREA-VARCHAR'),('450|DIVISION-VARCHAR'),('678|CALENDAR_DATE-DATE'),('298|DIVISION-VARCHAR')]
I have to get the output list below:
[('245,678|CALENDAR_DATE-DATE'), ('129|AREA-VARCHAR'),('450,298|DIVISION-VARCHAR')]
So if two elements in the list have the same value after the pipe (|), I have to combine the values before the pipe (|) as comma-separated values.
The combined value should go into whichever of the two elements comes first.
Thanks in advance.

Try groupby:
from itertools import groupby
l = [('245|CALENDAR_DATE-DATE'), ('129|AREA-VARCHAR'),('450|DIVISION-VARCHAR'),('678|CALENDAR_DATE-DATE'),('298|DIVISION-VARCHAR')]
print([','.join([x.split('|')[0] for x in v]) + '|' + i for i, v in groupby(sorted(l, key=lambda x: x.split('|')[1]), lambda x: x.split('|')[1])])
Output:
['129|AREA-VARCHAR', '245,678|CALENDAR_DATE-DATE', '450,298|DIVISION-VARCHAR']

First, keep the numbers that belong to the same string together. Use a dict for this: the key is the string, the value is a list of the numbers.
# values is assumed to be the input list from the question
occurences = {}
for value in values:
    content = value.split("|")
    occurences[content[1]] = occurences.get(content[1], []) + [content[0]]
print(occurences)  # {'CALENDAR_DATE-DATE': ['245', '678'], 'AREA-VARCHAR': ['129'],
                   #  'DIVISION-VARCHAR': ['450', '298']}
Then just join each pair using your formatting:
result = [','.join(v) + '|' + k for k, v in occurences.items()]
print(result)  # ['245,678|CALENDAR_DATE-DATE','129|AREA-VARCHAR','450,298|DIVISION-VARCHAR']
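On Python 3.7 or later, dicts preserve insertion order, so grouping this way keeps each merged element at the position of its first occurrence, which is what the question asks for. A self-contained sketch of the same idea using dict.setdefault; the names grouped, number and label are illustrative.

```python
values = ['245|CALENDAR_DATE-DATE', '129|AREA-VARCHAR', '450|DIVISION-VARCHAR',
          '678|CALENDAR_DATE-DATE', '298|DIVISION-VARCHAR']

grouped = {}
for value in values:
    number, label = value.split('|', 1)
    # setdefault creates the list the first time a label is seen
    grouped.setdefault(label, []).append(number)

result = [','.join(nums) + '|' + label for label, nums in grouped.items()]
print(result)
```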

How to search through an array containing strings, and create a new array with only integers

If I have an array that contains only strings, but some of them are numbers, how would I search through the array, determine which strings are actually numbers, and add those numbers to a new array? An example of the array is as follows: [ "Chris" , "90" , "Dave" , "76" ]
I have tried using a for loop to call isdigit() on each item in turn and, if it returns True, add that item to the new array.
scores = []
for i in range(len(name_and_score_split)):
    if name_and_score_split[i].isdigit() == True:
        scores.append(name_and_score_split[i])
When the above code is run, it tells me the list data type does not have the "isdigit" function.
Edit: I've found that my problem is that the list is actually a list of lists.

Use a list comprehension, and take advantage of Python's for-each style of iteration rather than iterating over indices:
lst = ["Chris" , "90" , "Dave" , "76"]
scores = [x for x in lst if x.isdigit()]
# ['90', '76']
Alternately, filter your list:
scores = list(filter(lambda x: x.isdigit(), lst))

Assuming what you're trying is for integers, you can do something like:
# Adapted from an existing float-based check, with float replaced by int.
def is_number(s):
    try:
        int(s)
        return True
    except ValueError:
        return False
Then you can do:
[x for x in name_and_score_split if is_number(x)]
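The two checks are not interchangeable, which may matter depending on the data. A small sketch comparing them on made-up sample strings (the samples list is illustrative, not from the question):

```python
def is_number(s):
    """Return True if int() can parse the string."""
    try:
        int(s)
        return True
    except ValueError:
        return False

# Made-up samples: a plain number, a negative, a padded number, a name.
samples = ["90", "-5", " 42 ", "Dave"]

by_isdigit = [s for s in samples if s.isdigit()]
by_int = [s for s in samples if is_number(s)]
print(by_isdigit)
print(by_int)
```

isdigit() only accepts strings made entirely of digit characters, while int() also accepts a sign and surrounding whitespace.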

If you want list of int:
s = ["Chris", "90", "Dave", "76"]
e = [int(i) for i in s if i.isdigit()]
print(e)
# OUTPUT: [90, 76]
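The question's edit says the data is actually a list of lists, which is why isdigit() was being called on a list. A minimal sketch for that case, assuming one level of nesting; the exact shape of name_and_score_split is not shown in the question, so the sample below is a guess.

```python
from itertools import chain

# Assumed shape, based on the question's edit: a list of lists of strings.
name_and_score_split = [["Chris", "90"], ["Dave", "76"]]

# Flatten one level, then keep the digit-only strings as ints.
scores = [int(x) for x in chain.from_iterable(name_and_score_split)
          if x.isdigit()]
print(scores)
```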

How to iteratively split a string using backward combinations?

I have a list of strings that look like this:
['C04.123.123.123', 'C03.456.456.456', 'C05.789.789.789']
I'm trying to split each string so I get different backward combinations of splits on the period delimiter. Basically, if I only take the example of the first string, I want to get:
['C04.123.123.123', 'C04.123.123', 'C04.123', 'C04']
How can I achieve this? I've tried looking into itertools.combinations and the standard split features but no luck.

One-line, easy to understand (was less easy to tune :)), using str.rsplit with maxsplit gradually increasing up to the number of dots:
lst = ['C04.123.123.123', 'C03.456.456.456', 'C05.789.789.789']
result = [x.rsplit(".",i)[0] for x in lst for i in range(x.count(".")+1) ]
result:
['C04.123.123.123',
'C04.123.123',
'C04.123',
'C04',
'C03.456.456.456',
'C03.456.456',
'C03.456',
'C03',
'C05.789.789.789',
'C05.789.789',
'C05.789',
'C05']
The only thing that annoys me is that it calls split a lot just to keep the first element. Too bad there isn't a built-in lazy split function we could call next on.
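One way to avoid the repeated splitting is a small generator that walks backwards through the string with str.rfind and yields slices. This is a sketch with a hypothetical helper name, prefixes, not a built-in.

```python
def prefixes(s, sep="."):
    """Yield s, then s with one trailing sep-delimited part removed at a time."""
    yield s
    end = s.rfind(sep)
    while end != -1:
        yield s[:end]
        # look for the next separator to the left of the current cut
        end = s.rfind(sep, 0, end)

lst = ['C04.123.123.123', 'C03.456.456.456', 'C05.789.789.789']
result = [p for x in lst for p in prefixes(x)]
print(result)
```

Because it is a generator, next() can be called on prefixes(x) to get one prefix at a time.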

You can use a list comprehension:
d = ['C04.123.123.123', 'C03.456.456.456', 'C05.789.789.789']
new_d = [a+('.' if i else '')+'.'.join(i) for a, *c in map(lambda x: x.split('.'), d)
         for i in [c[:h] for h in range(len(c)+1)][::-1]]
Output:
['C04.123.123.123', 'C04.123.123', 'C04.123', 'C04', 'C03.456.456.456', 'C03.456.456', 'C03.456', 'C03', 'C05.789.789.789', 'C05.789.789', 'C05.789', 'C05']

start_list = ['C04.123.123.123', 'C03.456.456.456', 'C05.789.789.789']
final_list = []
for item in start_list:
    broke_up = item.split('.')
    temp = []
    full_item = []
    # build up the prefixes one section at a time, shortest first
    for sect in broke_up:
        temp.append(sect)
        full_item.append(".".join(temp))
    final_list.extend(full_item)
print(final_list)
Alternatively, you can use final_list.append(full_item) to keep separate lists for each string in the original list.

Try this:
from itertools import accumulate
list(accumulate(s.split('.'), lambda a, b: a + '.' + b))[::-1]

You can use itertools.accumulate:
from itertools import accumulate
s = 'C04.123.123.123'
# define the incremental step
append = lambda s, e: s + '.' + e
result = list(accumulate(s.split('.'), append))[::-1]
