Trying to parse list in loops - python

I have a basic list as follows
data = "ffff,999,John Doe, Sam Adams"
mydata = data.split(',')
I want to check whether the 4th field is not null. If it is set, assign the 4th field to a variable; otherwise assign the 3rd field.
I have the following code
if mydata[3] is not None:
    name = mydata[3]
elif mydata[2] is not None:
    name = mydata[2]
The first part works, but if I set data to
data = "ffff,999,John Doe,"
The code doesn't do anything. What am I doing wrong?
Thanks

Since you split on ',' and data has a trailing comma, the last item in lst is an empty string, not None, so the first check still passes and name is set to '':
>>> data = "ffff,999,John Doe,"
>>>
>>> data.split(',')
['ffff', '999', 'John Doe', '']
>>> lst = data.split(',')
>>>
>>> lst[3] is not None
True
This is how split() behaves; see the Python 2.7 documentation for str.split().
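A minimal sketch of one way to handle this, assuming an empty or whitespace-only field should count as missing. It is written for Python 3 and reuses the variable names from the question:

```python
data = "ffff,999,John Doe,"
# Strip each field so " Sam Adams" becomes "Sam Adams" and "" stays empty.
mydata = [field.strip() for field in data.split(',')]

# An empty string is falsy, so test truthiness rather than "is not None".
name = mydata[3] if mydata[3] else mydata[2]
print(name)  # John Doe
```

With the original data ("ffff,999,John Doe, Sam Adams") the same code picks the 4th field instead.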

Related

Is it possible to change cell values using a dictionary in a Pandas DataFrame by iterating over the list in each cell

UPDATED
In a Pandas DataFrame I have a column whose cells contain a list like the one below:
df_lost['Article']
out[6]:
37774 186-2, 185-3, 185-2
37850 358-1, 358-4
37927
38266 111-2
38409 111-2
38508
38519 185-1
41161 185-4, 357-1
42948 185-1
Name: Article, dtype: object
For each entry such as '182-2', '111-2', etc., I have a dictionary like
aDict = {'111-2': 'Text-1', '358-1': 'Text-2', ...}
Is it possible to iterate over the list in each cell and replace every value with the matching value from the dictionary?
Expected result:
37774 ['Text 1, Text 2, Text -5']
....
I have tried to use the map function
df['Article'] = df['Article'].map(aDict)
but it doesn't work with a list in a cell. As a temporary solution, I created the dictionary
aDict = {'186-2, 185-3, 185-2': 'Test - 1, test -2, test -3', ...}
This works, but the number of combinations is extremely large.
You need to split the string at the comma delimiters, and then look up each element in the dictionary. You also have to index the list to get the string out of the first element, and wrap the result string back into a list.
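def convert_string(string_list, mapping):
    # The cell holds a one-element list; split its string on the delimiter.
    items = string_list[0].split(', ')
    # Look up each item, keeping it unchanged when it has no mapping.
    new_items = [mapping.get(i, i) for i in items]
    return [', '.join(new_items)]

df['Article'] = df['Article'].map(lambda cell: convert_string(cell, aDict))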
I would use a regex and str.replace here:
aDict = {'111-2': 'Text1', '358-1': 'Text 2'}
import re
pattern = '|'.join(map(re.escape, aDict))
df['Article'] = df['Article'].str.replace(pattern, lambda m: aDict[m.group()], regex=True)
NB. If the dictionary keys can overlap (ab/abc), then they should be sorted by decreasing length to generate the pattern.
Output:
Article
37774 186-2, 185-3, 185-2
37850 Text 2, 358-4
37927
38266 Text1
38409 Text1
38508
38519 185-1
41161 185-4, 357-1
42948 185-1
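The note about overlapping keys can be sketched as follows. This uses plain re on a single string rather than pandas, and the key '111-21' is a made-up example added only to create an overlap with '111-2'; treat it as an illustration, not as part of the original data.

```python
import re

# Hypothetical mapping: '111-2' is a prefix of '111-21', so the keys overlap.
a_dict = {'111-2': 'Text-1', '111-21': 'Text-9', '358-1': 'Text-2'}

# Put the longest keys first so the alternation tries '111-21' before '111-2'.
keys = sorted(a_dict, key=len, reverse=True)
pattern = re.compile('|'.join(map(re.escape, keys)))

def replace_codes(cell):
    # Replace every matched key with its dictionary value.
    return pattern.sub(lambda m: a_dict[m.group()], cell)

result = replace_codes('111-21, 358-1, 185-2')
print(result)  # Text-9, Text-2, 185-2
```

Without the sort, a pattern that lists '111-2' first would turn '111-21' into 'Text-11'.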

Excel cell into list in Python

So I have an Excel column which contains Python lists.
The problem is that when I loop through it in Python, the cells are read as str. Trying to split a cell produces items such as:
list = ["['Gdynia',", "'(2262011)']"]
list[0] = "['Gdynia',"
list[1] = "'(2262011)']"
I only want the city name, e.g. 'Gdynia' or 'Tczew'. Any idea how I can do that?
You can split the string at a chosen character; ' works well for your example.
Then you get a list of strings and can choose the part you need.
s = "['Gdynia', '(2262011)']"
str_parts = s.split("'")  # ['[', 'Gdynia', ', ', '(2262011)', ']']
city = str_parts[1]  # 'Gdynia'
Solution with re:
import re
data = ["['Gdynia', '(2262011)']",
        "['Tczew', '(2214011)']",
        "['Zory', '(2479011)']"]
r = re.compile("'(.*?)'")
print(*[r.search(s).group(1) for s in data], sep='\n')
Output
Gdynia
Tczew
Zory
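Since each cell holds the text of a Python list literal, another option is ast.literal_eval from the standard library, which parses the string back into a real list. A sketch, assuming every cell is a well-formed literal (the sample values are modeled on the question):

```python
import ast

# Sample cell contents as they would be read from the spreadsheet (strings).
cells = ["['Gdynia', '(2262011)']", "['Tczew', '(2214011)']"]

# literal_eval safely evaluates each literal; the city is the first element.
cities = [ast.literal_eval(cell)[0] for cell in cells]
print(cities)  # ['Gdynia', 'Tczew']
```

literal_eval raises an exception on malformed input, so cells that are not valid literals would need separate handling.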

How do I remove everything after a certain character in a value in a dictionary for all dictionaries in a group of dictionaries?

My goal is to remove all characters after a certain character in a value from a set of dictionaries.
I have imported a CSV file from my local machine and printed using the following code:
import csv
# Raw string so the backslashes in the Windows path are not treated as escapes.
with open(r'C:\Users\xxxxx\Desktop\Aug_raw_Page.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print row
I get a set of dictionaries that look like:
{'Pageviews_Aug':'145', 'URL':'http://www.domain.com/#fbid=12345'}
For any dictionary that includes a value with #fbid, I am trying to remove #fbid and any characters that come after it, for all dictionaries where this is true.
I have tried:
for key, value in row.items():
    if key == 'URL' and '#' in value or 'fbid' in value:
        value.split('#')[0]
    print row
Didn't work.
Don't think rsplit will work as it removes only whitespace.
Fastest way I thought about is using rsplit()
out = text.rsplit('#fbid')[0]
Okay, so I'm guessing your problem isn't in removing the text that comes after the # but in getting to that string.
What is 'row'?
I'm guessing it's a dictionary with a single 'URL' key, am I wrong?
for key, value in row.items():
    if key == 'URL' and '#fbid' in value:
        print value.split('#')[0]
I don't quite get the whole format of your data.
If you want to edit a single variable in your dictionary, you don't have to iterate through all the items:
if 'URL' in row.keys():
    if '#fbid' in row['URL']:
        row['URL'] = row['URL'].rsplit('#fbid')[0]
That should work.
But I really think you should copy an example of your whole data (three items would suffice)
Use a regular expression:
>>> import re
>>> value = 'http://www.domain.com/#fbid=12345'
>>> re.sub(ur'#fbid.*','',value)
'http://www.domain.com/'
>>> value = 'http://www.domain.com/'
>>> re.sub(ur'#fbid.*','',value)
'http://www.domain.com/'
For your code, you could do something like this to get the answer in the same format as before:
import csv
import re
with open(r'C:\Users\xxxxx\Desktop\Aug_raw_Page.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        row['URL'] = re.sub(ur'#fbid.*', '', row['URL'])
        print row
Given your sample code, it looks like it doesn't work because you never save the result of value.split('#')[0]. Do something like
for key, value in row.items():
    if key == 'URL' and ('#' in value or 'fbid' in value):
        new_value = value.split('#')[0]  # <-- here save the result of split in new_value
        row[key] = new_value  # <-- here update the dict row
print row  # instead of printing each time, print once at the end of the operation
This can be simplified to
if '#fbid' in row['URL']:
    row['URL'] = row['URL'].split('#fbid')[0]
because it only checks one key.
Example:
>>> row={'Pageviews_Aug':'145', 'URL':'http://www.domain.com/#fbid=12345'}
>>> if "#fbid" in row["URL"]:
...     row["URL"] = row['URL'].split("#fbid")[0]
...
>>> row
{'Pageviews_Aug': '145', 'URL': 'http://www.domain.com/'}
>>>
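For reference, the same idea can be sketched end to end in Python 3 with str.partition, which returns the text before the first occurrence of the separator and leaves the string untouched when the separator is absent. The in-memory CSV below is a made-up stand-in for the real file, and the second row is an invented example of a URL without #fbid.

```python
import csv
import io

# Hypothetical in-memory CSV standing in for Aug_raw_Page.csv.
sample = io.StringIO(
    "Pageviews_Aug,URL\n"
    "145,http://www.domain.com/#fbid=12345\n"
    "90,http://www.domain.com/about\n"
)

rows = []
for row in csv.DictReader(sample):
    # Keep only the part of the URL before '#fbid', if it is present.
    row['URL'] = row['URL'].partition('#fbid')[0]
    rows.append(row)

for row in rows:
    print(row['Pageviews_Aug'], row['URL'])
```

Because partition never fails on a missing separator, no membership check is needed before the assignment.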

why is my python code returning text:'my string' instead of just my string?

My code snippet looks like this:
for current_row in range(worksheet.nrows):
    fname_text = worksheet.row(current_row)[0]
    lname_text = worksheet.row(current_row)[1]
    cmt = worksheet.row(current_row)[2]
    print (fname_text, lname_text, cmt)
This prints:
text:'firstname' text:'lastname' text:'the cmt line'
I want it to print just:
firstname lastname the cmt line
What do I need to change to make this happen?
That's what Cell objects look like:
>>> sheet.row(0)
[text:u'RED', text:u'RED', empty:'']
>>> sheet.row(0)[0]
text:u'RED'
>>> type(sheet.row(0)[0])
<class 'xlrd.sheet.Cell'>
You can get at the wrapped values in a few ways:
>>> sheet.row(0)[0].value
u'RED'
>>> sheet.row_values(0)
[u'RED', u'RED', '']
and remember you can access cells without going via row:
>>> sheet.cell(0,0).value
u'RED'

capturing the usernames after List: tag

I am trying to create a list named "userlist" with all the usernames listed beside "List:".
My idea is to find the line containing "List:", split it on ",", and put the pieces in a list.
However, I am not able to capture the line. Any input on how this can be achieved?
output=""" alias: tech.sw.host
name: tech.sw.host
email: tech.sw.host
email2: tech.sw.amss
type: email list
look_elsewhere: /usr/local/mailing-lists/tech.sw.host
text: List tech SW team
list_supervisor: <username>
List: username1,username2,username3,username4,
: username5
Members: User1,User2,
: User3,User4,
: User5 """
#print output
userlist = []
for line in output:
    if "List" in line:
        print line
If it were me, I'd parse the entire input so as to have easy access to every field:
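import collections
import StringIO

inFile = StringIO.StringIO(output)
d = collections.defaultdict(list)
for line in inFile:
    line = line.partition(':')
    # A continuation line has an empty label, so reuse the previous key.
    key = line[0].strip() or key
    d[key] += [part.strip() for part in line[2].split(',')]
print d['List']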
Using regex, str.translate and str.split:
>>> import re
>>> from string import whitespace
>>> strs = re.search(r'List:(.*)(\s\S*\w+):', output, re.DOTALL).group(1)
>>> strs.translate(None, ':'+whitespace).split(',')
['username1', 'username2', 'username3', 'username4', 'username5']
You can also create a dict here, which will allow you to access any attribute:
def func(lis):
    return ''.join(lis).translate(None, ':'+whitespace)

# Pass re.DOTALL as flags; the third positional argument of re.split is maxsplit.
lis = [x.split() for x in re.split(r'(?<=\w):', output.strip(), flags=re.DOTALL)]
dic = {}
for x, y in zip(lis[:-1], lis[1:-1]):
    dic[x[-1]] = func(y[:-1]).split(',')
dic[lis[-2][-1]] = func(lis[-1]).split(',')
print dic['List']
print dic['Members']
print dic['alias']
Output:
['username1', 'username2', 'username3', 'username4', 'username5']
['User1', 'User2', 'User3', 'User4', 'User5']
['tech.sw.host']
Try this:
for line in output.split("\n"):
    if "List" in line:
        print line
When Python is asked to treat a string like a collection, it'll treat each character in that string as a member of that collection (as opposed to each line, which is what you're trying to accomplish).
You can tell this by printing each line:
>>> for line in output:
...     print line
...
a
l
i
a
s
:
t
e
...
By the way, there are far better ways of handling this. I'd recommend taking a look at Python's built-in RegEx library: http://docs.python.org/2/library/re.html
Try using strip() to remove the whitespace and line breaks before doing the check:
if 'List:' == line.strip()[:5]:
This should capture the line you need; then you can extract the usernames using split(','):
usernames = [i.strip() for i in line.strip()[5:].split(',') if i.strip()]
Here are my two solutions, which are essentially the same, but the first is easier to understand.
import re
output = """ ... """
# First solution: join continuation lines, then look for List
# Join lines such as username5 with the previous line
#   List: username1,username2,username3,username4,
#   : username5
# becomes
#   List: username1,username2,username3,username4,username5
lines = re.sub(r',\s*:\s*', ',', output)
for line in lines.splitlines():
    label, values = [token.strip() for token in line.split(':')]
    if label == 'List':
        userlist = [user.strip() for user in values.split(',')]
        print 'Users:', ', '.join(userlist)
# Second solution, same logic as above
# Different means
tokens, = [line for line in re.sub(r',\s*:\s*', ',', output).splitlines()
           if 'List:' in line]
label, values = [token.strip() for token in tokens.split(':')]
userlist = [user.strip() for user in values.split(',')]
print 'Users:', ', '.join(userlist)
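The continuation-line handling used in the answers above can also be sketched in Python 3 without regular expressions. The shortened output string below is a trimmed copy of the question's data, and the sketch assumes every line contains a colon and that a line with an empty label continues the previous field.

```python
# Trimmed sample of the question's data; lines starting with ':' are continuations.
output = """alias: tech.sw.host
type: email list
List: username1,username2,username3,username4,
: username5
Members: User1,User2,
: User3,User4,
: User5"""

fields = {}
key = None
for line in output.splitlines():
    label, _, rest = line.partition(':')
    # An empty label means this line continues the previous field.
    key = label.strip() or key
    # Drop the empty piece left behind by a trailing comma.
    values = [v.strip() for v in rest.split(',') if v.strip()]
    fields.setdefault(key, []).extend(values)

userlist = fields['List']
print(userlist)
```

Keeping every field in one dictionary means the Members entries are available through the same lookup, for example fields['Members'].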
