I have a small issue with the following Python code.
I have a list, stored as a string, which contains the following elements. It could also be empty, with no contents.
As you can see, the text This is my first stock I bought 01.27.2019 is encoded, and I have to decode it to remove the b''. When I perform the split operation on mylist, I get '' as the first item in the list, and I am not sure why.
mylist = "$17$b'This is my first stock I bought 01.27.2019'"
The fields in mylist are separated by a $ rather than a ,
tmp_list = mylist.split('$')
print (tmp_list) # ['', '17', "b'This is my first stock I bought 01.27.2019'"] ---> Not sure why I have the '' as the first item in the tmp_list
tmp_iter = iter(tmp_list)
res['myinfo']= '{' + '},{'.join(f'{n},{s}' for n, s in zip(tmp_iter, tmp_iter)) + '}'
I want my res['myinfo'] to be "{{17,This is my first stock I bought 01.27.2019}, ...many more {,}}".
At times, res['myinfo'] could just be "{}", if mylist is empty.
I am not sure how to fix my code; any help would be appreciated.
First of all,
mylist = "$17$b'This is my first stock I bought 01.27.2019'"
should be
mylist = "17$b'This is my first stock I bought 01.27.2019'"
just as @takendarkk said:
That's probably because the first character in the string is your
separator ($). What comes before that? Nothing. – takendarkk
As for decoding your strings: assuming they all look like "b'some string'" and each is preceded by a number, you have to decode every odd-indexed element of your list. A very straightforward solution could be:
def decodeString(stringToDecode):
    decodedList = []
    for i in range(2, len(stringToDecode)-1): #every character except the first 2 and the last one
        decodedList.append(stringToDecode[i])
    decodedString = ''.join(decodedList)
    return decodedString
for i in range(len(tmp_list)):
    if i % 2 == 1: #every odd indexed element
        decodedString = decodeString(tmp_list[i])
        tmp_list[i] = decodedString
And also it seems like you need 1 more layer of '{}' here:
res['myinfo']= '{' + '},{'.join(f'{n},{s}' for n, s in zip(tmp_iter, tmp_iter)) + '}'
This way your res['myinfo'] is {17,This is my first stock I bought 01.27.2019},{18,This is my second stock I bought 01.28.2019}, but if you want it to be '{{17,This is my first stock I bought 01.27.2019},{18,This is my second stock I bought 01.28.2019}}', as you said, you need:
res['myinfo']= '{' + '{' + '},{'.join(f'{n},{s}' for n, s in zip(tmp_iter, tmp_iter)) + '}' + '}'
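Putting the pieces together, here is a minimal sketch, assuming every text field arrives as the printed form of a bytes object (b'...'), that fields alternate number, text, and that the text itself never contains a $. The helper name build_myinfo is made up for illustration, and ast.literal_eval is swapped in for the manual character slicing above.

```python
import ast

def build_myinfo(raw):
    """Turn "$17$b'text'$18$b'more'" into "{{17,text},{18,more}}"."""
    # Drop a leading/trailing separator so split() yields no empty fields.
    raw = raw.strip('$')
    if not raw:
        return '{}'
    fields = raw.split('$')
    pairs = []
    # Walk the fields two at a time: (number, text).
    for number, text in zip(fields[0::2], fields[1::2]):
        if text.startswith(("b'", 'b"')):
            # Parse the bytes literal safely, then decode it to str.
            text = ast.literal_eval(text).decode('utf-8')
        pairs.append('{' + number + ',' + text + '}')
    return '{' + ','.join(pairs) + '}'

print(build_myinfo("$17$b'This is my first stock I bought 01.27.2019'"))
print(build_myinfo(""))
```

The empty input falls through to "{}", which covers the case where the list has no contents.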
I am new to Python.
I have a list whose entries use "::" as a separator, and it looks like this:
1::Erin Burkovich (2000)::Drama
2::Assassins (1995)::Thriller
I want to split them by "::", extract the year from the name, and add it to the end of the line. Each movie has its own index.
The desired list looks like this:
1::Erin Burkovich:Drama::2000
2::Assassins:Thriller:1995
I have the code below:
for i in movies:
    movie_id,movie_title,movie_genre=i.split("::")
    movie_year=((movie_title.split(" "))[-1]).replace("(","").replace(")","")
    movies.insert(-1, movie_year)
But it doesn't work at all.
Any help?
Thanks in advance.
You're modifying the list while iterating over it: every item you insert is another item for the loop to visit, so the loop never works through the original data as intended (and an inserted year such as '2000' has no "::" to split on, so the three-way unpacking fails).
You should create a new list with the result.
Also, you can extract the year in a much easier way:
movie_year = re.findall(r'\d+', '(2000)')[0]
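A minimal sketch of the "build a new list" approach, assuming every entry has the form id::Title (year)::Genre and that "::" is the wanted separator throughout (the sample output in the question mixes ":" and "::"). The function name reformat_movie is made up for illustration.

```python
import re

def reformat_movie(line):
    """'1::Erin Burkovich (2000)::Drama' -> '1::Erin Burkovich::Drama::2000'."""
    movie_id, title, genre = line.split("::")
    # Capture the title text and the four-digit year in trailing parentheses.
    match = re.fullmatch(r"(.*?)\s*\((\d{4})\)", title)
    if match is None:
        return line  # no year found: leave the entry unchanged
    name, year = match.groups()
    return "::".join([movie_id, name, genre, year])

movies = ["1::Erin Burkovich (2000)::Drama", "2::Assassins (1995)::Thriller"]
# Build a new list instead of inserting into the one being iterated.
new_movies = [reformat_movie(m) for m in movies]
print(new_movies)
```

Because the original list is never modified inside the loop, the iteration visits exactly the original entries.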
Instead of splitting, you can use re.findall to grab all alphanumeric characters, including whitespace, and then regroup:
import re
s = ['1::Erin Burkovich (2000)::Drama', '2::Assassins (1995)::Thriller']
new_data = [re.sub(r'\s(?=:)', '', "{}::{}:{}:{}".format(movie_id, name, genre, year)) for movie_id, name, year, genre in [re.findall(r'[a-zA-Z0-9\s]+', i) for i in s]]
Output:
['1::Erin Burkovich:Drama:2000', '2::Assassins:Thriller:1995']
Another (probably less elegant) way:
for i in movies:
    split_list = i.split("::")
    movie_id = split_list[0]
    movie_title = split_list[1].split('(')
    movie_genre = split_list[2]
    print(movie_id + '::' + movie_title[0].strip() + "::" + movie_genre + "::" + movie_title[1].strip(')'))
For Python 3.6, check this out:
a="""1::Erin Burkovich (2000)::Drama
2::Assassins (1995)::Thriller"""
a=a.split("\n")
c=[]
for b in range(len(a)):
    g=[]
    d=a[b].split("::")
    e=d[1].split(" (")[1].split(")")[0]
    f=d[1].split(" (")[0]
    g.append(d[0])
    g.append(f)
    g.append(d[2])
    g.append(e)
    h="::".join(g)
    c.append(h)
print("\n".join(c))
Output:
1::Erin Burkovich::Drama::2000
2::Assassins::Thriller::1995
There are several issues:
split returns a list, not a tuple; unpacking it into three names only works while every entry has exactly three fields
the movie year split is fine, but you aren't removing the year from the original title
inserting into the movies list while looping over it is not a good idea; you need to replace the list element instead
I've rewritten the code based on what you wanted; I hope it helps.
movies=["1::Erin Burkovich (2000)::Drama", "2::Assassins (1995)::Thriller"]
for i in range(len(movies)):
    movie_details=movies[i].split("::")
    print(movie_details)
    movie_id=movie_details[0]
    movie_title=movie_details[1]
    movie_genre=movie_details[2]
    movie_title_parts=movie_title.split(" ")
    movie_year=((movie_title_parts[-1]).replace("(","").replace(")",""))
    del movie_title_parts[-1]
    movie_title=" ".join(movie_title_parts)
    print(movie_title+", "+movie_year)
    movies[i]=movie_id+"::"+movie_title+"::"+movie_genre+"::"+movie_year
I am using the following code to bring back prices from an ecommerce website:
response.css('div.price.regularPrice::text').extract()
but getting the following result:
'\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
'\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t',
I do not want the slashes and letters, only the number 5. How do I get this?
First you can use strip() to remove the surrounding whitespace: tabs "\t", carriage returns "\r" and newlines "\n".
data = ['\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
'\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t']
data = [item.strip() for item in data]
and you get
['Dhs 5.00', '']
Next you can use if to skip empty elements
data = [item for item in data if item]
and you get
['Dhs 5.00']
If item always has the same structure Dhs XXX.00
then you can use slicing [4:-3] to remove "Dhs " and ".00"
data = [item[4:-3] for item in data]
and you get
['5']
So now you only have to get the first element, data[0], to get "5".
If you need to, you can convert the string "5" to the integer 5 using int()
result = int(data[0])
You can even put it all in one line
data = ['\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
'\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t']
data = [item.strip()[4:-3] for item in data if item.strip()]
result = int(data[0])
If you always need only the first element from the list, then you can write it as
data = ['\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
'\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t']
result = int( data[0].strip()[4:-3] )
Use a regex to fetch only the numbers.
The \d+ regular expression should do the trick.
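Note that \d+ on its own would return both "5" and "00" for "Dhs 5.00". A minimal sketch of the regex route, assuming there is at most one price per string; the helper name extract_prices is made up for illustration.

```python
import re

def extract_prices(items):
    """Return the first number found in each string as a float."""
    prices = []
    for item in items:
        # \d+(?:\.\d+)? matches digits with an optional decimal part.
        match = re.search(r"\d+(?:\.\d+)?", item)
        if match:  # whitespace-only strings have no match and are skipped
            prices.append(float(match.group()))
    return prices

data = ['\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
        '\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t']
prices = extract_prices(data)
print(prices)          # [5.0]
print(int(prices[0]))  # 5
```

Unlike the fixed slice [4:-3], this does not depend on the currency prefix or the number of decimals staying the same.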
I have been trying various solutions all yesterday, before I hung it up and went to bed. After coming back today and taking another look at it... I still cannot understand what is wrong with my regex statement.
I am trying to search my inventory based on a simple name and return an item's index and the amount of that item that I have.
For instance, in my inventory, instead of knife I could have bloody_knife[9] at index 0, and the script should return 9 and 0 for the query knife.
The code:
import re
inventory = ["knife", "bottle[1]", "rope", "flashlight"]
def search_inventory(item):
    numbered_item = '.*' + item + '\[([0-9]*)\].*'
    print(numbered_item) #create regex statement
    regex_num_item = re.compile(numbered_item)
    print(regex_num_item) #compiled regex statement
    for x in item:
        match1 = regex_num_item.match(x) #regex match....
        print(match1) #seems to be producing nothing.
        if match1: #since it produces nothing the code fails.
            num_item = match1.group()
            count = match1.group(1)
            print(count)
            index = inventory.index(num_item)
        else: #eventually this part will expand to include "item not in inventory"
            print("code is wrong")
    return count, index
num_of_item, item_index = search_inventory("knife")
print(num_of_item)
print(item_index)
The output:
.*knife\[([0-9]*)\].*
re.compile('.*knife.*\\[([0-9]*)\\].*')
None
code is wrong
One thing that I cannot make sense of is what happens when Python takes the pattern in my numbered_item variable and uses it in the re.compile() function. Why is it adding additional escapes when I already have the necessary [] escaped?
Has anyone run into something like this before?
Your issue is here:
for x in item:
That loop means "for every character in your item, knife". So your regex was running on k, then n, and so on, which is of course not what you want. If you want to see it, add a print(x):
for x in item:
    print(x) #add this line
    match1 = regex_num_item.match(x) #regex match....
    print(match1) #seems to be producing nothing.
You'll see that it prints each letter of the item. That's what you're matching against in match1 = regex_num_item.match(x), so obviously it won't work.
You want to iterate over the inventory.
So you want:
for x in inventory: #meaning, for every item in inventory
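For reference, a minimal sketch of the whole function with that loop fixed. Two things here are assumptions rather than part of the question: an entry without a bracketed number is treated as a count of 1, and the search term is passed through re.escape in case it contains regex metacharacters.

```python
import re

inventory = ["bloody_knife[9]", "bottle[1]", "rope", "flashlight"]

def search_inventory(item):
    """Return (count, index) of the first entry containing item, else None."""
    # re.escape guards against regex metacharacters in the search term.
    pattern = re.compile(r".*" + re.escape(item) + r"(?:\[([0-9]+)\])?$")
    for index, entry in enumerate(inventory):  # iterate the inventory
        match = pattern.match(entry)
        if match:
            # Assumption: an entry without [n] means a count of 1.
            count = int(match.group(1)) if match.group(1) else 1
            return count, index
    return None

print(search_inventory("knife"))  # (9, 0)
print(search_inventory("rope"))   # (1, 2)
print(search_inventory("sword"))  # None
```

Using enumerate gives the index directly, so there is no need for a second lookup with inventory.index().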
Is the index important to you? If not, you can change the inventory into a dictionary, and then you don't have to use a regex:
inventory = {'knife':8, 'bottle':1, 'rope':1, 'flashlight':0, 'bloody_knife':1}
And then, if you wanted to find every item that has the word knife and how many you have of it:
for item in inventory:
    if "knife" in item:
        itemcount = inventory[item] #in a dictionary, we get the 'value' of the key this way
        print("Item Name: " + item + ", Count: " + str(itemcount))
Output:
Item Name: bloody_knife, Count: 1
Item Name: knife, Count: 8
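On the question's side point about re.compile() seeming to add escapes: nothing is being added. Printing the compiled pattern shows the pattern string in its repr form, where each single backslash is written as \\, whereas print(numbered_item) shows the characters themselves. A small check, reusing the names from the question:

```python
import re

numbered_item = '.*' + 'knife' + r'\[([0-9]*)\].*'
regex_num_item = re.compile(numbered_item)

print(numbered_item)         # the actual characters of the pattern
print(repr(numbered_item))   # same string, shown as a Python literal
print(regex_num_item)        # re.compile(...) uses the repr form too

# The compiled object still holds the original, unmodified pattern.
assert regex_num_item.pattern == numbered_item
# Each backslash is a single character in the real string.
assert numbered_item.count('\\') == 2
```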
I have a file at /location/data.txt. In this file I have entries like:
aaa:xxx:abc.com:1857:xxx1:rel5t2:y
ifa:yyy:xyz.com:1858:yyy1:rel5t2:y
I want to look up 'aaa' from my code. Whether I type aaa in upper or lower case as input, my Python code should report that aaa is a valid item.
But I want one exception: if I give the input with a -mc suffix (aaa-mc), in either lower or upper case, the code should ignore the -mc.
Below are my code and the output I am getting now.
def pITEMName():
    global ITEMList,fITEMList
    pITEMList = []
    fITEMList = []
    ITEMList = str(raw_input('Enter pipe separated list of ITEMS : ')).upper().strip()
    items = ITEMList.split("|")
    count = len(items)
    print 'Total Distint ITEM Count : ', count
    pipelst = [i.split('-mc')[0] for i in ITEMList.split('|')]
    filepath = '/location/data.txt'
    f = open(filepath, 'r')
    for lns in f:
        split_pipe = lns.split(':', 1)
        if split_pipe[0] in pipelst:
            index = pipelst.index(split_pipe[0])
            pITEMList=split_pipe[0]+"|"
            fITEMList.append(pITEMList)
            del pipelst[index]
    for lns in pipelst:
        print bcolors.red + lns,' is wrong ITEM Name' + bcolors.ENDC
    f.close()
When I execute above code it prompts me like :
Enter pipe separated list of ITEMS :
And if I provide the list like :
Enter pipe separated list of ITEMS : aaa-mc|ifa
it gives me the result as :
Total Distint item Count : 2
AAA-MC is wrong item Name
items Belonging to other :
Other center :
item Count From Other center = 0
items Belonging to Current Centers :
Active items in US1 :
^IFA$
Active items in US2 :
^AAA$
Ignored item Count From Current center = 0
You Have Entered itemList belonging to this center as: ^IFA$|^AAA$
Active item Count : 2
Do You Want To Continue [YES|Y|NO|N] :
As you can see in the result above, aaa is counted as valid (Active item Count : 2) because it is present in the /location/data.txt file, but it is also reported as "AAA-MC is wrong item Name" (second line of the result above). I want '-mc' or '-MC' to be ignored for any item, whether or not it is present in /location/data.txt.
Please let me know what's wrong with my code.
The issue you're having is that your code expects the "-mc" suffix to appear in lowercase, but you're calling the upper() method on the input string, resulting in text that is all upper case. You need to change one of those so that they match (it doesn't really matter which one).
Either replace the upper() call with lower(), or replace the string "-mc" with "-MC", and your code should work better (I'm not certain I understand all of it, so there may be other issues).
The way you are constructing ITEMList is by reading in a string, converting it to upper case (with upper()), and stripping leading and trailing whitespace. Therefore, something like 'aaa-mc' is converted to 'AAA-MC'. You're later splitting this uppercase string on the token '-mc', which it cannot possibly contain, so the suffix is never removed.
I'd recommend either replacing upper() with lower() when you read your string in, or doing a hard replace on both forms of '-mc', so instead of
i.split('-mc')[0]
try using
i.replace('-mc','').replace('-MC','')
in your list comprehension.
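A variant that normalizes the case first and then drops the suffix only when it is at the end of the name (replace() would also remove a "-mc" in the middle). This is a minimal sketch; the helper name normalize_item and the sample input are made up for illustration.

```python
def normalize_item(name):
    """Upper-case an item name and drop one trailing '-MC', if present."""
    name = name.strip().upper()
    if name.endswith('-MC'):
        name = name[:-len('-MC')]
    return name

raw = 'aaa-mc|IFA|bbb-MC'
pipelst = [normalize_item(i) for i in raw.split('|')]
print(pipelst)  # ['AAA', 'IFA', 'BBB']
```

Since the sample lines in data.txt are lowercase, the first field of each line would presumably need the same case treatment before it is compared against pipelst.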
This little snippet of code is my attempt to pull multiple unique values out of rows in a CSV. The CSV header looks something like this:
descr1, fee part1, fee part2, descr2, fee part1, fee part2,
The descr columns each hold many unique names. I want to take these unique fee names and make a new header out of them. To do this, I decided to start by getting all the different names from the descr columns, so that when I start pulling data from the actual rows I can check whether a row has a fee amount or one of the fee names I need. There are probably a lot of things wrong with this code, but I am a beginner. I really just want to know why my first if statement is never triggered when the l in fin equals a comma. I know it must at some point, as it writes a comma to my row string. Thanks!
row = ''
header = ''
columnames = ''
cc = ''
#fout = open(","w")
fin = open ("raw data.csv","rb")
for l in fin:
    if ',' == l:
        if 'start of cust data' not in row:
            if 'descr' in row:
                columnames = columnames + ' ' + row
                row = ''
            else:
                pass
        else:
            pass
    else:
        row = row+l
    print(columnames)
print(columnames)
When you iterate over a file, you get lines, not characters -- and they have the newline character, \n, at the end. Your if ',' == l: statement will never succeed because even if you had a line with only a single comma in it, the value of l would be ",\n".
I suggest using the csv module: you'll get much better results than trying to do this by hand like you're doing.
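A minimal sketch of what that could look like, assuming the first row is the header and that the fee names live in the columns whose header starts with "descr". The sample data below is invented to stand in for "raw data.csv".

```python
import csv
import io

# Hypothetical sample standing in for "raw data.csv".
sample = io.StringIO(
    "descr1,fee part1,fee part2,descr2,fee part1,fee part2\n"
    "late fee,1.00,2.00,admin fee,3.00,4.00\n"
    "admin fee,5.00,6.00,late fee,7.00,8.00\n"
)

reader = csv.reader(sample)
header = next(reader)
# Positions of the columns whose header starts with "descr".
descr_cols = [i for i, name in enumerate(header) if name.strip().startswith("descr")]

fee_names = []
for row in reader:            # each row is already a list of fields
    for i in descr_cols:
        name = row[i].strip()
        if name and name not in fee_names:
            fee_names.append(name)

print(fee_names)  # ['late fee', 'admin fee']
```

With a real file, the StringIO object would be replaced by open("raw data.csv", newline="") in text mode, which is how the csv module expects files to be opened in Python 3.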