Is it possible to remove the strings and just have a flat list of values?
data = [
"50,bird,corn,105.4,"
"75,cat,meat,10.3,"
"100,dog,eggs,1000.5,"
]
I would like it to look like this:
data = [
50,'bird','corn',105.4,
75,'cat','meat',10.3,
100,'dog','eggs',1000.5,
]
out = []
for x in data:
    for e in x.split(","):
        out.append(e)
What does this do? It splits each element (x) of data on commas, takes each of the resulting tokens (e), and appends it to the list out.
new_data = []
for i in data:
    new_data.extend(i.split(','))
new_data
Do note that there might be issues (for example, each string ends with a comma with nothing after it, so split generates an empty '' string as the last element for that string in the new list).
If you want to specifically convert the numbers to ints and floats, maybe there is a more elegant way, but this will work (it also removes empty cells if you have excess commas):
new_data = []
for i in data:
    strings = i.split(',')
    for s in strings:
        if len(s) > 0:
            try:
                num = int(s)
            except ValueError:
                try:
                    num = float(s)
                except ValueError:
                    num = s
            new_data.append(num)
new_data
Split each string (this gives you a list of the segments between the commas in each string):
str.split(",")
and then add the resulting lists together.
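The two steps above can be sketched as follows. This is only an illustration: it assumes data holds three separate strings (with commas between the list items), and the names pieces and flat are made up for the example.

```python
# Assumed input: separate strings, with commas between the list items.
data = [
    "50,bird,corn,105.4,",
    "75,cat,meat,10.3,",
    "100,dog,eggs,1000.5,",
]

# Step 1: split each string into its own list of tokens.
pieces = [row.split(",") for row in data]

# Step 2: add the lists together into one flat list.
flat = sum(pieces, [])

print(flat)
```

Each row still contributes one empty string because of its trailing comma, so those would need filtering out afterwards. For long inputs, itertools.chain.from_iterable(pieces) does the same concatenation without repeatedly copying lists.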
Because each string in the list has a trailing comma, you can simply put it back together as a single string and split it again on commas. In order to get actual numeric items in the resulting list, you could do this:
import re
data = [
"50,bird,corn,105.4,"
"75,cat,meat,10.3,"
"100,dog,eggs,1000.5,"
]
numeric = re.compile(r"-?\d+(\.\d*)?$")  # optional sign, digits, optional fractional part
data = [eval(s) if numeric.match(s) else s for s in "".join(data).split(",")][:-1]
data # [50, 'bird', 'corn', 105.4, 75, 'cat', 'meat', 10.3, 100, 'dog', 'eggs', 1000.5]
I need to format forecast period columns so that I can later merge with another data frame.
Columns of my data frame are:
current_cols = [
'01+11',
'02+10',
'03+09',
'04+08',
'05+07',
'06+06',
'07+05',
'08+04',
'09+03',
'10+02',
'11+01'
]
desired_out = [
'1+11',
'2+10',
'3+9',
'4+8',
'5+7',
'6+6',
'7+5',
'8+4',
'9+3',
'10+2',
'11+1'
]
Originally, I tried splitting each entry with split('+') and applying lstrip('0') to each piece, then recombining the pieces of each tuple with a + in between.
Is there a better approach? I'm having trouble combining elements in tuples back together, with + in between. Help would be much appreciated.
You can use the re module for this:
import re
pat = re.compile(r"\b0+")
out = [pat.sub(r"", s) for s in current_cols]
print(out)
Prints:
[
"1+11",
"2+10",
"3+9",
"4+8",
"5+7",
"6+6",
"7+5",
"8+4",
"9+3",
"10+2",
"11+1",
]
current_cols =['01+11','02+10','03+09','04+08','05+07','06+06','07+05','08+04','09+03','10+02','11+01']
desired_out = []
for item in current_cols:
    if item[0] == "0":
        item = item[1:]
    if "+0" in item:
        item = item.replace('+0', '+')
    desired_out.append(item)
You can do it with nested comprehensions, conversion to int(), and formatting using an f-string:
current_cols = [
'01+11',
'02+10',
'03+09',
'04+08',
'05+07',
'06+06',
'07+05',
'08+04',
'09+03',
'10+02',
'11+01'
]
desired_out = [
f'{int(a)}+{int(b)}' for (a, b) in [
e.split('+') for e in current_cols
]
]
The code above will set desired_out with:
['1+11', '2+10', '3+9', '4+8', '5+7', '6+6', '7+5', '8+4', '9+3', '10+2', '11+1']
This method implements your original idea: split each element using the + sign as the separator, strip the leading zeros from each element of the pair (done by the int() conversion inside the f-string), and combine them back with a + sign in between (also using the f-string).
The inner comprehension is just walking each element of the list, and splitting them by the + sign. The outer comprehension converts each element of each pair to int() to get rid of the leading zeros.
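For readers who find the nested comprehension dense, the same steps can be unrolled into a small function. This is only an illustrative sketch; the helper name strip_leading_zeros and the shortened input list are hypothetical.

```python
def strip_leading_zeros(label):
    """Turn a label such as '03+09' into '3+9'."""
    left, right = label.split('+')        # '03+09' -> '03', '09'
    return f'{int(left)}+{int(right)}'    # int() drops the leading zeros


current_cols = ['01+11', '06+06', '10+02']
desired_out = [strip_leading_zeros(c) for c in current_cols]
print(desired_out)  # ['1+11', '6+6', '10+2']
```

One difference from the lstrip('0') approach: int() turns '00' into 0, whereas '00'.lstrip('0') leaves an empty string.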
We want a bunch of map operations here to do the following:
split each element of current_cols on "+":
map(lambda s: s.split("+"), current_cols)
lstrip the "0" out of each element of the resulting lists:
map(lambda l: (x.lstrip("0") for x in l), ...)
join the resulting values on "+":
map("+".join, ...)
Then, we list out the elements of these map operations:
list(
map("+".join,
map(lambda l: (x.lstrip('0') for x in l),
map(lambda s: s.split('+'), current_cols)
)
)
)
which gives:
['1+11',
'2+10',
'3+9',
'4+8',
'5+7',
'6+6',
'7+5',
'8+4',
'9+3',
'10+2',
'11+1']
I have the string below, and I want to extract the lists, dicts, and variables from it.
How can I split this string into a specific format?
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
import re
m1 = re.findall (r'(?=.*,)(.*?=\[.+?\],?)',s)
for i in m1:
    print('m1:', i)
Only the first result is correct.
Does anyone know how to do this?
m1: list_c=[1,2],
m1: a=3,b=1.3,c=abch,list_a=[1,2],
Split on '=' instead; then you can work with each variable name and its value.
You still need to handle type casting for the values (a regex, split, or try/except with casting may help).
Also, as others have commented, using a dict may be easier to handle.
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
al = s.split('=')
var_l = [al[0]]
value_l = []
for a in al[1:-1]:
    var_l.append(a.split(',')[-1])
    value_l.append(','.join(a.split(',')[:-1]))
value_l.append(al[-1])
output = dict(zip(var_l, value_l))
print(output)
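The type casting mentioned above could be handled along these lines. This is a sketch rather than a complete solution: the helper cast_value is hypothetical, and it relies on ast.literal_eval, which only understands valid Python literals, so values such as abch or {a:2,b:3} are left as strings.

```python
import ast


def cast_value(text):
    """Try to read text as a Python literal; fall back to the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


# Hypothetical raw pairs, shaped like the dict built by splitting on '='.
raw = {'list_c': '[1,2]', 'a': '3', 'b': '1.3', 'c': 'abch', 'dict_a': '{a:2,b:3}'}
typed = {name: cast_value(value) for name, value in raw.items()}
print(typed)
```

The dictionary value stays a string because its keys are bare names rather than quoted strings; turning it into a real dict would need its own parsing step.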
You may have better luck if you more or less explicitly describe the right-hand side expressions: numbers, lists, dictionaries, and identifiers:
re.findall(r"([^=,]+)=" # LHS (no commas) and assignment operator
+r"([+-]?\d+(?:\.\d+)?|" # Numbers
+r"[+-]?\d+\.|" # More numbers
+r"\[[^]]+\]|" # Lists
+r"{[^}]+}|" # Dictionaries
+r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
s)
# [('list_c', '[1,2]'), ('a', '3'), ('b', '1.3'), ('c', 'abch'),
# ('list_a', '[1,2]'), ('dict_a', '{a:2,b:3}')]
The answer looks like this:
import re
from pprint import pprint
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1],Save,Record,dict_a={a:2,b:3}'
m1 = re.findall(r"([^=]+)=" # LHS and assignment operator
+r"([+-]?\d+(?:\.\d+)?|" # Numbers
+r"[+-]?\d+\.|" # More numbers
+r"\[[^]]+\]|" # Lists
+r"{[^}]+}|" # Dictionaries
+r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
s)
temp_d = {}
for i, j in m1:
    temp = i.strip(',').split(',')
    if len(temp) > 1:
        for k in temp[:-1]:
            temp_d[k] = ''
        temp_d[temp[-1]] = j
    else:
        temp_d[temp[0]] = j
pprint(temp_d)
The output is:
{'Record': '',
'Save': '',
'a': '3',
'b': '1.3',
'c': 'abch',
'dict_a': '{a:2,b:3}',
'list_a': '[1]',
'list_c': '[1,2]'}
Instead of picking out the types, you can start by capturing the identifiers. Here's a regex that captures all the identifiers in the string (for lowercase only, but see note):
regex = re.compile(r'([a-z]|_)+=')
#note if you want all valid variable names: r'([a-z]|[A-Z]|[0-9]|_)+'
cases = [x.group() for x in re.finditer(regex, s)]
This gives a list of all the identifiers in the string:
['list_c=', 'a=', 'b=', 'c=', 'list_a=', 'dict_a=']
We can now define a function that chops up s step by step, using the
above list to partition the string:
def chop(mystr, mylist):
    temp = mystr.partition(mylist[0])[2]
    cut = temp.find(mylist[1])  # strip leading bits
    return temp[cut:], mylist[1:]

mystr = s[:]
temp = [mystr]
mylist = cases[:]
while len(mylist) > 1:
    mystr, mylist = chop(mystr, mylist)
    temp.append(mystr)
This (convoluted) slicing operation gives this list of strings:
['list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'list_a=[1,2],dict_a={a:2,b:3}',
'dict_a={a:2,b:3}']
Now cut off the ends using each successive entry:
result = []
for x in range(len(temp) - 1):
    cut = temp[x].find(temp[x+1]) - 1  # -1 to remove commas
    result.append(temp[x][:cut])
result.append(temp.pop()) #get the last item
Now we have the full list:
['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']
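Each element is now easy to parse into a key:value pair. (Running the elements through exec would only work where the right-hand side is a valid Python expression; c=abch, for instance, would raise a NameError unless abch is defined.)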
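As a rough sketch of that last parsing step, each entry can be split once on its first '=' to build a dict. The variable name pairs is made up for the example, and the values are left as strings.

```python
result = ['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']

# Split each entry once, on the first '=', into a [name, value] pair.
pairs = dict(entry.split('=', 1) for entry in result)
print(pairs)
```

Limiting the split to one occurrence keeps any later '=' characters inside the value intact.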
If I have an array that contains only strings, but some of them are numbers, how would I search through the array, determine which strings are actually numbers, and add those numbers to a new array? An example of the array is as follows: [ "Chris" , "90" , "Dave" , "76" ]
I have tried using a for loop to consecutively use isdigit() on each index, and if it is true to add that item to the new array.
scores = []
for i in range(len(name_and_score_split)):
    if name_and_score_split[i].isdigit() == True:
        scores.append(name_and_score_split[i])
When the above code is run, it tells me that the list data type does not have an "isdigit" function.
Edit: I've found that my problem is that the list is actually a list of lists.
Use a list comprehension, and iterate over the items directly with Python's for loop rather than iterating over indices:
lst = ["Chris" , "90" , "Dave" , "76"]
scores = [x for x in lst if x.isdigit()]
# ['90', '76']
Alternately, filter your list:
scores = list(filter(lambda x: x.isdigit(), lst))
Assuming what you want is integers, you can do something like:
# Taken from elsewhere (source link missing), with float changed to int.
def is_number(s):
    try:
        int(s)
        return True
    except ValueError:
        return False
Then you can do
[x for x in name_and_score_split if is_number(x)]
If you want a list of ints:
s = ["Chris", "90", "Dave", "76"]
e = [int(i) for i in s if i.isdigit()]
print(e)
# OUTPUT: [90, 76]
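Given the question's edit that the data is actually a list of lists, the same idea can be applied with one extra loop level. This is a sketch under the assumption that the nesting is exactly one level deep; the sample data is hypothetical.

```python
# Hypothetical nested input, as described in the question's edit.
name_and_score_split = [["Chris", "90"], ["Dave", "76"]]

# Walk every inner list, keep only the digit strings, and convert them.
scores = [
    int(item)
    for pair in name_and_score_split
    for item in pair
    if item.isdigit()
]
print(scores)  # [90, 76]
```

Note that isdigit() rejects strings such as "-5" or "7.5", so negative or decimal scores would need a try/except conversion instead.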
I'm trying to read in a text file and format it correctly in a numpy array.
The Input.txt file contains:
point_load, 3, -300
point_load, 6.5, 500
point_moment, 6.5, 3000
I want to produce this array:
point_load = [3, -300, 6.5, 500]
My code is:
a = []
for line in open("Input.txt"):
    li = line.strip()
    if li.startswith("point_load"):
        a.append(li.split(","))
#np.flatten(a)
My code prints:
[['point_load', ' 3', ' -300'], ['point_load', ' 6.5', ' 500']]
Any help would be appreciated. Thank you.
Change this line :
a.append(li.split(","))
to this:
a.append(li.split(",")[1:])
li.split(",")
returns a list, so you appended a list, obtaining a nested list.
You wanted to append individual elements to the list a, namely the 2nd and 3rd, i.e. those with indices 1 and 2. So instead of
a.append(li.split(","))
use
temp = li.split(",")
second = temp[1]
third = temp[2]
a.append(float(second))
a.append(float(third))
Note the use of the float() function, since the .split() method returns a list of strings.
(For the last .append(), int() may be more appropriate than float().)
To end up with a list of numbers instead of strings, I recommend the following:
a = []
for line in open("Input.txt"):
    li = line.strip()
    if li.startswith("point_load"):
        l = li.split(',')
        for num in l[1:]:
            try:
                num = float(num.strip())
                a.append(num)
            except ValueError:
                print('num was not a number')
The difference here is the list slice, which takes everything from the second comma-separated element onward (more here: understanding-pythons-slice-notation)
l[1:]
Also, stripping then converting the strings to floats (since you have decimals)
num = float(num.strip())
Resulting array:
a = [3.0, -300.0, 6.5, 500.0]
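The same logic can be wrapped in a small function so that it works for any label, not just point_load. This is a sketch only: the function name read_values is hypothetical, and an in-memory list of lines stands in for Input.txt so the example runs on its own.

```python
def read_values(lines, label):
    """Collect the numeric fields of every line whose first field is label."""
    values = []
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if fields[0] != label:
            continue  # skip other labels and blank lines
        values.extend(float(f) for f in fields[1:])
    return values


# Stand-in for the contents of Input.txt.
sample = [
    "point_load, 3, -300\n",
    "point_load, 6.5, 500\n",
    "point_moment, 6.5, 3000\n",
]
print(read_values(sample, "point_load"))  # [3.0, -300.0, 6.5, 500.0]
```

Because a file object also yields its lines when iterated, an opened Input.txt could be passed in place of sample, and the returned list could then be handed to numpy.array if an array is needed.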
If I have a list of strings:
first = []
last = []
my_list = [' abc 1..23',' bcd 34..405','cda 407..4032']
how would I append the numbers on either side of the .. to their corresponding lists, to get:
first = [1,34,407]
last = [23,405,4032]
I wouldn't mind strings either, because I can convert to int later:
first = ['1','34','407']
last = ['23','405','4032']
Use re.search to match the numbers on either side of the .. and capture them in two separate groups:
import re
first = []
last = []
for s in my_list:
    match = re.search(r'(\d+)\.\.(\d+)', s)
    first.append(match.group(1))
    last.append(match.group(2))
I'd use a regular expression:
import re
num_range = re.compile(r'(\d+)\.\.(\d+)')
first = []
last = []
my_list = [' abc 1..23',' bcd 34..405','cda 407..4032']
for entry in my_list:
    match = num_range.search(entry)
    if match is not None:
        f, l = match.groups()
        first.append(int(f))
        last.append(int(l))
This outputs integers:
>>> first
[1, 34, 407]
>>> last
[23, 405, 4032]
One more solution.
for string in my_list:
    numbers = string.split(" ")[-1]
    first_num, last_num = numbers.split("..")
    first.append(first_num)
    last.append(last_num)
It will raise a ValueError if the last space-separated token of any string in my_list does not contain exactly one "..".
In fact, this is a good thing if you want to be sure that values were really obtained from every string, and that they all came after the last space. You can even add a try/except block to do something when the string being processed is in an unexpected format.
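A minimal sketch of that try/except idea is shown below. The extra 'broken entry' item and the skipped list are hypothetical additions for illustration; what to do with malformed strings depends on the use case.

```python
my_list = [' abc 1..23', ' bcd 34..405', 'cda 407..4032', 'broken entry']

first, last, skipped = [], [], []
for string in my_list:
    numbers = string.split(" ")[-1]
    try:
        # Unpacking fails unless there is exactly one ".." in the token.
        first_num, last_num = numbers.split("..")
    except ValueError:
        skipped.append(string)  # remember the malformed entry and move on
        continue
    first.append(first_num)
    last.append(last_num)

print(first, last, skipped)
```

Collecting the rejected strings rather than silently dropping them makes it easy to check afterwards whether every entry was parsed.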
first=[(i.split()[1]).split("..")[0] for i in my_list]
second=[(i.split()[1]).split("..")[1] for i in my_list]