Filtering using a str array - python

I am trying to filter an ASCII list (which contains ASCII and other characters) by using an array that I have created. I am trying to remove any integer string within the list.
import pandas as pd
with open('ASCII.txt') as f:
    data = f.read().replace('\t', ',')
    print(data, file=open('my_file.csv', 'w'))
df = list(data)
test = ['0','1','2','3','4','5','6','7','8','9']
for x in df:
    try:
        df = int(df)
        for i in range(0,9):
            while any(test) in df:
                df.remove('i')
                print(df)
    except:
        continue
print(df)
This is what I currently have; however, it does not work and outputs:
['3', '3', ',', '0', '4', '1', ',', '2', '1', ',', '!', ',', '\n', '3', '4', ',', '0', '4', ...]

Your while condition for numbers is broken.
any checks if at least one element in the passed iterable is truthy, i.e. not an empty string in your case.
test = ['0','1','2','3','4','5','6','7','8','9']
while any(test) in df: # Condition always evaluates to False
df.remove('i') # Only removes the character 'i' from df
So any(test) evaluates to True, and you are then checking whether True is in df, which it isn't, so the whole condition evaluates to False.
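A quick check shows the difference. This is a minimal sketch: the short df below is an assumed sample modeled on the output in the question, and the generator expression on the last line is one way to write what the original condition was probably meant to test.

```python
test = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
# Assumed sample data, modeled on the output shown in the question
df = list("33,041,21,!")

broken = any(test)                     # True: every element is a non-empty string
broken_check = any(test) in df         # False: the boolean True is not an element of df
intended = any(d in df for d in test)  # True: checks each digit string individually

print(broken, broken_check, intended)  # True False True
```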
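The next error is that df.remove('i') removes the letter 'i' from your list, not the digit stored in the variable i. This can be fixed by converting the integer to a string. Use range(10), since range stops before its end value and range(0,9) would miss 9:
for i in range(10):
    # Convert the integer to str
    while str(i) in df:
        # Remove one occurrence of str(i) from df
        df.remove(str(i))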
Instead of using range, you can iterate directly over the elements of the test list:
df = list(data)
test = ['0','1','2','3','4','5','6','7','8','9']
for num in test:
    # Loop as long as num appears in df
    while num in df:
        df.remove(num)  # removes the first element equal to num
The inner while loop is needed to remove every occurrence of the current num in df, because remove only removes the first occurrence of a value.
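The following sketch (with a made-up sample list) shows why the inner while loop is needed: a single remove call only deletes the first match.

```python
df = ['1', 'a', '1', 'b', '1']

df.remove('1')       # removes only the first '1'
after_one = df[:]    # snapshot: ['a', '1', 'b', '1']

while '1' in df:     # keep removing until no '1' is left
    df.remove('1')

print(after_one)  # ['a', '1', 'b', '1']
print(df)         # ['a', 'b']
```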
Alternatively, you can check whether each element of df is a digit using the str method isdigit. Because you modify the list in place, you need to iterate over a copy; otherwise elements get skipped as df shrinks during the loop:
# Use slice to create a copy of df
for el in df[:]:
    if el.isdigit():
        df.remove(el)
Because this loop visits every element of df, you don't need an inner loop to remove repeated occurrences of el.
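To see the side effect the copy avoids, compare iterating over df directly with iterating over df[:]. The sample list is made up for illustration. The last part shows a list comprehension, an alternative not used in the answer above, which builds a new list in one pass instead of removing elements one by one.

```python
# Iterating over the list that is being modified skips elements
df = ['1', '2', 'a', '3']
for el in df:
    if el.isdigit():
        df.remove(el)
skipped = df   # ['2', 'a']: after '1' is removed, '2' shifts into the slot just visited

# Iterating over a copy visits every original element
df = ['1', '2', 'a', '3']
for el in df[:]:
    if el.isdigit():
        df.remove(el)
copied = df    # ['a']

# Alternative: build a new list without the digit strings
filtered = [el for el in ['1', '2', 'a', '3'] if not el.isdigit()]

print(skipped, copied, filtered)  # ['2', 'a'] ['a'] ['a']
```

Note that str.isdigit also returns True for some non-ASCII characters, such as '²'.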

Related

How to find the matching pattern for an input list and then replace the found pattern with the proper pattern conversion using Python

Note that the final two digits of this pattern (for example, the 48 in FBXASC048) are meant to be the ASCII code of a digit (0-9).
Input example list: ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']
Result example: ['0009Car', '5002Toy', '2004Human']
What is the proper way to search for any of these patterns in an input list:
num_ascii = ['FBXASC048', 'FBXASC049', 'FBXASC050', 'FBXASC051', 'FBXASC052', 'FBXASC053', 'FBXASC054', 'FBXASC055', 'FBXASC056', 'FBXASC057']
and then replace each pattern found with the corresponding item in conv_list (not a random one), because each element in the pattern list maps to exactly one element in conv_list:
conv_list = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
This is the solution I have in mind. It has two parts:
1st part: find the ASCII codes [48, 49, 50, 51, 52, 53, 54, 55, 56, 57] and replace them with the matching decimal digit (0-9). This gives a new list, called input_modi_list, in which the ASCII codes are replaced by digits.
2nd part: a second pass that uses the replace function to remove the fixed prefix 'FBXASC0':
new_list3 = []
for x in input_modi_list:
    y = x.replace('FBXASC0', '')
    new_list3.append(y)
So new_list3 will hold the combined result of the two parts above.
I don't know if there is a simpler or better solution, maybe using regex.
Also note that I have no idea how to replace the ASCII codes with digits for a list of items.
I think this should do the trick:
import re
input_list = ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']
pattern = re.compile(r'FBXASC(\d{3,3})')  # raw string so '\d' reaches the regex engine intact
def decode(match):
    return chr(int(match.group(1)))
result = [re.sub(pattern, decode, item) for item in input_list]
print(result)  # ['0009Car', '5002Toy', '2004Human']
Now, there is some explanation due:
1- The pattern object is a regular expression that matches any part of a string consisting of 'FBXASC' followed by 3 digits (0-9). (\d means a digit, and {3,3} means it must occur at least 3 and at most 3 times, i.e. exactly 3 times.) The parentheses around \d{3,3} form a capturing group, so the three matched digits are stored for later use (explained in the next part).
2- The decode function receives a match object, uses .group(1) to extract the first captured group (in our case, the three digits matched by \d{3,3}), then uses the int function to parse the string into an integer (for example, converting '048' to 48), and finally uses the chr function to find the character with that code (for example, chr(48) returns '0' and chr(65) returns 'A').
3- The final part applies re.sub to every element of the list, replacing each occurrence of the pattern you described (FBXASC followed by three digits) with its corresponding character.
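The three steps can be traced one at a time on a single item. This is a small illustrative sketch using the same pattern; the intermediate variable names exist only for display.

```python
import re

pattern = re.compile(r'FBXASC(\d{3,3})')
match = pattern.search('FBXASC053002Toy')

whole = match.group(0)   # 'FBXASC053': the entire matched text
digits = match.group(1)  # '053': the captured three digits
code = int(digits)       # 53
char = chr(code)         # '5'

print(whole, digits, code, char)  # FBXASC053 053 53 5
```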
You can see that this solution is not limited to your specific examples. Any three-digit code works, since chr accepts any Unicode code point (codes 0-127 are the ASCII range).
If you do want to limit it to the 48-57 range, you can modify the decode function:
def decode(match):
    ascii_code = int(match.group(1))
    if 48 <= ascii_code <= 57:
        return chr(ascii_code)
    else:
        return match.group(0)  # return the whole matched text unchanged
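With the restricted decode, codes outside 48-57 are left alone. In this sketch, 'FBXASC065Apple' is a hypothetical input added to show the unchanged case.

```python
import re

pattern = re.compile(r'FBXASC(\d{3,3})')

def decode(match):
    ascii_code = int(match.group(1))
    if 48 <= ascii_code <= 57:
        return chr(ascii_code)
    return match.group(0)  # leave the matched text unchanged

items = ['FBXASC048009Car', 'FBXASC065Apple']  # second item is hypothetical
result = [pattern.sub(decode, item) for item in items]
print(result)  # ['0009Car', 'FBXASC065Apple']
```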
This is how I would do it.
Make the regex pattern by joining the strings with |:
>>> num_ascii = ['FBXASC048', 'FBXASC049', 'FBXASC050', 'FBXASC051', 'FBXASC052', 'FBXASC053', 'FBXASC054', 'FBXASC055', 'FBXASC056', 'FBXASC057']
>>> conv_list = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
>>> regex_pattern = '|'.join(num_ascii)
>>> regex_pattern
'FBXASC048|FBXASC049|FBXASC050|FBXASC051|FBXASC052|FBXASC053|FBXASC054|FBXASC055|FBXASC056|FBXASC057'
Make a look-up dictionary by zipping the two lists:
>>> conv_table = dict(zip(num_ascii, conv_list))
>>> conv_table
{'FBXASC048': '0', 'FBXASC049': '1', 'FBXASC050': '2', 'FBXASC051': '3', 'FBXASC052': '4', 'FBXASC053': '5', 'FBXASC054': '6', 'FBXASC055': '7', 'FBXASC056': '8', 'FBXASC057': '9'}
Iterate over the data and replace the matched string with the corresponding digit:
>>> import re
>>> result = []
>>> for item in ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']:
...     m = re.match(regex_pattern, item)
...     matched_string = m[0]
...     digit = conv_table[matched_string]
...     print(f'replacing {matched_string} with {digit}')
...     result.append(item.replace(matched_string, digit))
...
replacing FBXASC048 with 0
replacing FBXASC053 with 5
replacing FBXASC050 with 2
>>> result
['0009Car', '5002Toy', '2004Human']
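One caveat with the loop above: re.match returns None when an item does not start with any of the patterns, and m[0] then raises a TypeError. A sketch that builds the same conv_table and passes a replacement function to re.sub leaves such items untouched. Note the behavioral difference: re.sub replaces matches anywhere in the string, not only at the start. The item 'NoCodeHere' is hypothetical, and re.escape is added only as a precaution (these prefixes contain no special characters).

```python
import re

num_ascii = [f'FBXASC0{code}' for code in range(48, 58)]  # 'FBXASC048' ... 'FBXASC057'
conv_list = [str(digit) for digit in range(10)]           # '0' ... '9'
conv_table = dict(zip(num_ascii, conv_list))
regex = re.compile('|'.join(map(re.escape, num_ascii)))

def to_digit(match):
    # Look up the matched prefix in the conversion table
    return conv_table[match.group(0)]

items = ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human', 'NoCodeHere']
result = [regex.sub(to_digit, item) for item in items]
print(result)  # ['0009Car', '5002Toy', '2004Human', 'NoCodeHere']
```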

Python list insert with index

I have an empty Python list and a for loop that inserts elements at given indices, but the indices are chosen at random. I tried a simple example with randomly selected indices; it works for some indices but not for others. Below is a simple example of what I want to do:
a=[]
a.insert(2, '2')
a.insert(5, '5')
a.insert(0, '0')
a.insert(3, '3')
a.insert(1, '1')
a.insert(4, '4')
The output of this is a = ['0','1','2','5','4','3']
It's correct for the first three ('0', '1', '2') but wrong for the last three ('5', '4', '3').
How can I control inserting into an empty list at random indices?
list.insert(i, e) inserts the element e before index i. If i is greater than or equal to the current length, e is simply appended at the end, so for an empty list e becomes the first element no matter which index you pass.
Map out the operations in your head using this information:
a = [] # []
a.insert(2, '2') # ['2']
a.insert(5, '5') # ['2', '5']
a.insert(0, '0') # ['0', '2', '5']
a.insert(3, '3') # ['0', '2', '5', '3']
a.insert(1, '1') # ['0', '1', '2', '5', '3']
a.insert(4, '4') # ['0', '1', '2', '5', '4', '3']
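The clamping can be checked directly. This sketch uses arbitrary out-of-range indices; an out-of-range negative index is likewise clamped, to the front of the list.

```python
a = []
a.insert(2, '2')     # index past the end: appended -> ['2']
a.insert(100, 'x')   # still past the end: appended -> ['2', 'x']
a.insert(-100, 'y')  # far negative index: inserted at the front -> ['y', '2', 'x']
print(a)  # ['y', '2', 'x']
```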
Keep in mind that lists are not fixed-size arrays. A list has no predefined size; it grows and shrinks as you append or pop elements.
For what you want to do, you can create a list of placeholders and use index assignment to set the values.
If you know the target size (e.g. it is 6):
a = [None] * 6
a[2] = '2'
# ...
If you only know the maximum possible index, it would have to be done like this:
a = [None] * (max_index+1)
a[2] = '2'
# ...
a = [e for e in a if e is not None] # get rid of the Nones, but maintain ordering
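Applied to the (index, value) pairs from the question, the placeholder approach might look like this. Computing max_index from the pairs is an assumption made for the sketch; in practice it may come from elsewhere.

```python
# (index, value) pairs in the order they arrive, taken from the question's example
pairs = [(2, '2'), (5, '5'), (0, '0'), (3, '3'), (1, '1'), (4, '4')]

max_index = max(index for index, _ in pairs)
a = [None] * (max_index + 1)  # one slot per possible index
for index, value in pairs:
    a[index] = value          # assignment overwrites a slot instead of shifting

a = [e for e in a if e is not None]  # drop unused slots, keep ordering
print(a)  # ['0', '1', '2', '3', '4', '5']
```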
If you do not know the maximum possible index, or it is very large, a list is the wrong data structure, and you could use a dict, as pointed out and shown in the other answers.
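A minimal dict-based sketch, assuming the indices are unique: store each value under its index, then read the values back in index order. The pairs are again the question's example.

```python
pairs = [(2, '2'), (5, '5'), (0, '0'), (3, '3'), (1, '1'), (4, '4')]

slots = {}
for index, value in pairs:
    slots[index] = value  # no size limit and no shifting

a = [slots[i] for i in sorted(slots)]
print(a)  # ['0', '1', '2', '3', '4', '5']
```

Gaps in the indices (for example 0, 7, 1000) cost nothing here, which is where a dict beats a pre-sized list.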
If your values are unique, could you use them as keys in a dictionary, with the dict values being the index?
a={}
nums = [1,4,5,7,8,3]
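for num in nums:
    a[str(num)] = num  # the value (as a string) is the key, its index is the dict value
sorted(a, key=a.get)  # keys ordered by their dict values: ['1', '3', '4', '5', '7', '8']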
If you have an empty list and you try to insert an element at index 2, that element ends up in the first position, because list.insert appends when the index is beyond the end of the list. (Python's list is a dynamic array, not a linked list.)
You could resolve your problem by creating a list pre-filled with None values:
# Create a list with 10 slots, so indices 0-9 can be assigned
a = [None] * 10
# Place 99 at index 3 (assignment keeps the length at 10; a.insert(3, 99) would shift the Nones and grow the list)
a[3] = 99
Another way to do that is using NumPy:
import numpy as np
# Create an object array with 10 elements (np.empty fills object arrays with None)
a = np.empty(10, dtype=object)
# Place 99 at index 3; np.insert would instead return a new, longer array and leave a unchanged
a[3] = 99
print(a)
# [None None None 99 None None None None None None]

Remove wildcard string from list

I have a list which is a large recurring dataset with headers of the form:
array = ['header = 1','0','1','2',...,'header = 1','1','2','3',...,'header = 2','1','2','3']
The header string can vary between individual datasets, but the size of the individual datasets does not.
I would like to remove all of the headers so that I am left with:
array = ['0','1','2',...,'1','2','3',...,'1','2','3']
If the header string does not vary, then I can remove them with:
lookup = array[0]
while True:
    try:
        array.remove(lookup)
    except ValueError:
        break
However, if the header strings do change, then they are not caught, and I am left with:
array = ['0','1','2',...,'1','2','3',...,'header = 2','1','2','3']
Is there a way to remove every element that contains the substring "header", regardless of what else is in the string?
It is best to use a list comprehension with a condition instead of repeatedly removing elements. Also, use startswith instead of comparing against a fixed lookup string.
>>> array = ['header = 1','0','1','2','header = 1','1','2','3','header = 2','1','2','3']
>>> [x for x in array if not x.startswith("header")]
['0', '1', '2', '1', '2', '3', '1', '2', '3']
Note that this creates a new list instead of modifying the existing one, but it should be considerably faster: each single remove is O(n), whereas the comprehension makes one O(n) pass over the whole list.
If you do not know what the header string is, you can still determine it from the first element:
>>> lookup = array[0].split()[0] # use first part before space
>>> [x for x in array if not x.startswith(lookup)]
['0', '1', '2', '1', '2', '3', '1', '2', '3']
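If you need the original list object updated in place (for example because other names refer to it), slice assignment replaces its contents with the filtered result. Checking "header" not in x is a variation that matches the substring anywhere in the element rather than only at the start; both variations are additions for illustration.

```python
array = ['header = 1', '0', '1', '2', 'header = 1', '1', '2', '3', 'header = 2', '1', '2', '3']
alias = array  # another reference to the same list

array[:] = [x for x in array if "header" not in x]  # replace contents in place
print(array)           # ['0', '1', '2', '1', '2', '3', '1', '2', '3']
print(alias is array)  # True: the same list object was modified
```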
Using the str.find() method (which returns -1 when the substring is absent), you can determine whether the word "header" is contained in the first list item, and use that to decide whether to remove that item.
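A minimal sketch of the find() idea, assuming the goal is to drop every header item rather than only the first one:

```python
array = ['header = 1', '0', '1', '2', 'header = 1', '1', '2', '3', 'header = 2', '1', '2', '3']

# Keep only the items in which "header" does not occur (find returns -1)
result = [x for x in array if x.find("header") == -1]
print(result)  # ['0', '1', '2', '1', '2', '3', '1', '2', '3']
```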

Python string split, handling single quotes

I am trying to split a string by ",".
The split method works as expected for the following example1:
example1 = "1,'aaa',337.5,17195,.02,0,0,'yes','abc'"
example1.split(",")
Result: ['1', "'aaa'", '337.5', '17195', '.02', '0', '0', "'yes'", "'abc'"]
But here I have a scenario where there are commas within the single quotes, which I do not want to split on.
example2 = "1,'aaa',337.5,17195,.02,0,0,'yes','abc, def, xyz'"
example2.split(",")
Result: ['1', "'aaa'", '337.5', '17195', '.02', '0', '0', "'yes'", "'abc", ' def', " xyz'"]
But I am trying to get this result instead:
['1', "'aaa'", '337.5', '17195', '.02', '0', '0', "'yes'", "'abc, def, xyz'"]
How can I achieve this with string split function?
You should first try to use built-ins or the standard library to read in your data as a list, for instance directly from a CSV file via the csv module.
If your string comes from a source you cannot control, wrapping it in opening and closing square brackets gives a valid list literal, so you can use ast.literal_eval:
from ast import literal_eval
example2 = "1,'aaa',337.5,17195,.02,0,0,'yes','abc, def, xyz'"
res = literal_eval(f'[{example2}]')
# [1, 'aaa', 337.5, 17195, 0.02, 0, 0, 'yes', 'abc, def, xyz']
This converts numeric data to integers / floats as appropriate. If you would like to keep them as strings, as per @JonClements' comment, you can pass the string to csv.reader instead:
import csv
res = next(csv.reader([example2], quotechar="'"))
# ['1', 'aaa', '337.5', '17195', '.02', '0', '0', 'yes', 'abc, def, xyz']
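The same csv.reader call also handles whole files line by line. In this sketch, io.StringIO stands in for an opened file, and the second data line is made up for illustration.

```python
import csv
import io

# io.StringIO stands in for a real file opened with open(path, newline='')
data = io.StringIO(
    "1,'aaa',337.5,17195,.02,0,0,'yes','abc, def, xyz'\n"
    "2,'bbb',1.0,5,.5,1,1,'no','x, y'\n"
)
rows = list(csv.reader(data, quotechar="'"))
print(rows[0])  # ['1', 'aaa', '337.5', '17195', '.02', '0', '0', 'yes', 'abc, def, xyz']
print(rows[1])  # ['2', 'bbb', '1.0', '5', '.5', '1', '1', 'no', 'x, y']
```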
Assuming that you want to keep the single quotes around the elements ("'aaa'" rather than 'aaa', as in your expected output), here's how you can do it with a function:
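def spl(st, ch):
    res = []          # finished fields
    temp = []         # characters of the field being built
    in_quote = False
    for x in st:
        if x == "'":
            in_quote = not in_quote  # toggle on every single quote
        if not in_quote and x == ch:
            # separator outside quotes: close the current field
            res.append("".join(temp))
            temp = []
        else:
            temp.append(x)  # keep everything else, including the quotes
    res.append("".join(temp))  # the last field has no trailing separator
    return res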
example2 = "1,'aaa',337.5,17195,.02,0,0,'yes','abc, def, xyz'"
print(spl(example2, ','))
Output:
['1', "'aaa'", '337.5', '17195', '.02', '0', '0', "'yes'", "'abc, def, xyz'"]

How do I remove hyphens from a nested list?

In the nested list:
x = [['0', '-', '3', '2'], ['-', '0', '-', '1', '3']]
How do I remove the hyphens?
x = x.replace("-", "")
gives me AttributeError: 'list' object has no attribute 'replace', and
print x.remove("-")
gives me ValueError: list.remove(x): x not in list.
x is a list of lists. replace() substitutes one substring for another within a string, but what you want is to remove items from the inner lists. remove() removes only the first occurrence of an item, so repeat it until none is left. A simple approach:
for l in x:
    while "-" in l:
        l.remove("-")
For more advanced solutions, see the following: Remove all occurrences of a value from a Python list
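For comparison, a nested list comprehension builds the cleaned lists in one pass instead of calling remove repeatedly. This is an alternative sketch, not the approach above; slice assignment (inner[:] = ...) keeps the same inner list objects if they must be modified in place.

```python
x = [['0', '-', '3', '2'], ['-', '0', '-', '1', '3']]

# New nested list without any '-' items
cleaned = [[item for item in inner if item != '-'] for inner in x]
print(cleaned)  # [['0', '3', '2'], ['0', '1', '3']]

# In-place variant: keep the same inner list objects
for inner in x:
    inner[:] = [item for item in inner if item != '-']
print(x)  # [['0', '3', '2'], ['0', '1', '3']]
```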
