Removed the default content in nested expression - python

I am using the pyparsing module and its nestedExpr function.
I want to supply a custom delimiter, instead of the default whitespace-delimited content, through the content argument of nestedExpr.
If I have a text such as the following
text = "{{Infobox | birth_date = {{birth date and age|mf=yes|1981|1|31}}| birth_place = ((Memphis, Tennessee|Memphis)), ((Tennessee)), U.S.| instrument = ((Beatboxing)), guitar, keyboards, vocalsprint expr.parse| genre = ((Pop music|Pop)), ((contemporary R&B|R&B))| occupation = Actor, businessman, record producer, singer| years_active = 1992–present| label = ((Jive Records|Jive)), ((RCA Records|RCA)), ((Zomba Group of Companies|Zomba))| website = {{URL|xyz.com|Official website}} }}"
When I give nestedExpr('{{','}}').parseString(text) I need the output as the following list:
['Infobox | birth_date =' ,['birth date and age|mf=yes|1981|1|31'],'| birth_place = ((Memphis, Tennessee|Memphis)), ((Tennessee)), U.S.| instrument = ((Beatboxing)), guitar, keyboards, vocalsprint expr.parse| genre = ((Pop music|Pop)), ((contemporary R&B|R&B))| occupation = Actor, businessman, record producer, singer| years_active = 1992–present| label = ((Jive Records|Jive)), ((RCA Records|RCA)), ((Zomba Group of Companies|Zomba))| website =',[ 'URL|xyz.com|Official website' ]]
How can I use ',' or '|' as the delimiter instead of whitespace? I tried passing those characters, but it didn't work.
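One dependency-free way to get the nested list shown in the question is a small hand-written scanner. This is a sketch of the idea only, not pyparsing's nestedExpr: parse_nested and its parameters are hypothetical names, and it keeps everything between the markers as one stripped string instead of splitting on whitespace. Within pyparsing itself, the content argument takes a parser expression rather than a delimiter string, so passing something like CharsNotIn('{}') may be worth trying, though that is untested here.

```python
def parse_nested(text, opener="{{", closer="}}"):
    """Return nested lists holding the text found between opener/closer pairs.

    Hypothetical helper for illustration; text between markers is kept
    as a single stripped string rather than split on whitespace.
    """
    root = []
    stack = [root]   # stack[-1] is the list currently being filled
    buf = []         # characters collected since the last marker

    def flush():
        # Store the pending text, if any, in the current list.
        chunk = "".join(buf).strip()
        if chunk:
            stack[-1].append(chunk)
        buf.clear()

    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            flush()
            child = []
            stack[-1].append(child)
            stack.append(child)
            i += len(opener)
        elif text.startswith(closer, i):
            if len(stack) == 1:
                raise ValueError("unbalanced closer at index %d" % i)
            flush()
            stack.pop()
            i += len(closer)
        else:
            buf.append(text[i])
            i += 1
    if len(stack) != 1:
        raise ValueError("unbalanced opener")
    flush()
    return root


sample = ("{{Infobox | birth_date = {{birth date and age|mf=yes|1981|1|31}}"
          "| website = {{URL|xyz.com|Official website}} }}")
result = parse_nested(sample)[0]   # contents of the outermost {{ ... }}
print(result)
```

Each string in the result can then be split on '|' or ',' with str.split if finer-grained pieces are needed.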

Related

Extract data from text file using Python (or any language)

I have a text file that looks like:
First Name Bob
Last name Smith
Phone 555-555-5555
Email bob#bob.com
Date of Birth 11/02/1986
Preferred Method of Contact Text Message
Desired Appointment Date 04/29
Desired Appointment Time 10am
City Pittsburgh
Location State
IP Address x.x.x.x
User-Agent (Browser/OS) Apple Safari 14.0.3 / OS X
Referrer http://www.example.com
First Name john
Last name Smith
Phone 555-555-4444
Email john#gmail.com
Date of Birth 03/02/1955
Preferred Method of Contact Text Message
Desired Appointment Date 05/22
Desired Appointment Time 9am
City Pittsburgh
Location State
IP Address x.x.x.x
User-Agent (Browser/OS) Apple Safari 14.0.3 / OS X
Referrer http://www.example.com
.... and so on
I need to extract each entry to a csv file, so the data should look like: first name, last name, phone, email, etc. I don't even know where to start on something like this.
First of all, you'll need to open the text file in read mode.
I'd suggest using a context manager, like so:
with open('path/to/your/file.txt', 'r') as file:
    for line in file:
        # do something with the line (it is a string)
        print(line.rstrip())
As for managing the data, you could build an intermediate structure, for example a dictionary or a list of dictionaries, and then write that out as a CSV file with the csv module.
You could, for example, start a new entry whenever there is a blank line, maybe like this:
with open('Downloads/test.txt', 'r') as f:
    my_list = list()  # this will be the final list
    entry = dict()  # this contains each user info as a dict
    for line in f.readlines():
        if line.strip() == "":  # if line is empty start a new dict
            my_list.append(entry)  # and append the old one to the list
            entry = dict()
        else:  # otherwise split the line and create new dict
            line_items = line.split(r' ')
            print(line_items)
            entry[line_items[0]] = line_items[1]
print(my_list)
This code won't work as-is, because your text is not formatted consistently: you need a reliable way to split each line into its "title" and its "content" (such as "First Name" and "Bob"). I suggest looking at regular expressions, or fixing the text file so that its spacing is more consistent.
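One consistent way to make that split, sketched here under two assumptions: the set of field names is known up front, and every record begins with a "First Name" line. The helper names parse_records and to_csv are hypothetical; the field list is taken from the sample data in the question.

```python
import csv
import io

# Field names copied from the sample data; adjust to match the real file.
FIELDS = ["First Name", "Last name", "Phone", "Email", "Date of Birth",
          "Preferred Method of Contact", "Desired Appointment Date",
          "Desired Appointment Time", "City", "Location", "IP Address",
          "User-Agent (Browser/OS)", "Referrer"]


def parse_records(lines, fields=FIELDS):
    """Group lines into dicts, starting a new dict at each fields[0] line."""
    # Check longer names first so a short name cannot shadow a longer one.
    by_length = sorted(fields, key=len, reverse=True)
    records, current = [], {}
    for raw in lines:
        line = raw.strip()
        name = next((f for f in by_length if line.startswith(f)), None)
        if name is None:
            continue  # blank or unrecognized line
        if name == fields[0] and current:
            records.append(current)
            current = {}
        current[name] = line[len(name):].strip()
    if current:
        records.append(current)
    return records


def to_csv(records, fields=FIELDS):
    """Render the records as CSV text; missing fields become empty cells."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, restval="")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

To use it on a real file, pass the open file object to parse_records and write the string returned by to_csv to a file opened with newline=''.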
Assuming the data resides in a:
a="""
First Name Bob
Last name Smith
Phone 555-555-5555
Email bob#bob.com
Date of Birth 11/02/1986
Preferred Method of Contact Text Message
Desired Appointment Date 04/29
Desired Appointment Time 10am
City Pittsburgh
Location State
IP Address x.x.x.x
User-Agent (Browser/OS) Apple Safari 14.0.3 / OS X
Referrer http://www.example.com
First Name john
Last name Smith
Phone 555-555-4444
Email john#gmail.com
Date of Birth 03/02/1955
Preferred Method of Contact Text Message
Desired Appointment Date 05/22
Desired Appointment Time 9am
City Pittsburgh
Location State
IP Address x.x.x.x
User-Agent (Browser/OS) Apple Safari 14.0.3 / OS X
Referrer http://www.example.com
"""
line_sep = "\n"  # CHANGE ME ACCORDING TO DATA
fields = ["First Name", "Last name", "Phone",
          "Email", "Date of Birth", "Preferred Method of Contact",
          "Desired Appointment Date", "Desired Appointment Time",
          "City", "Location", "IP Address", "User-Agent", "Referrer"]
records = a.split(line_sep * 2)
all_records = []
for record in records:
    splitted_record = record.split(line_sep)
    one_record = {}
    csv_record = []
    for f in fields:
        found = False
        for one_field in splitted_record:
            if one_field.startswith(f):
                data = one_field[len(f):].strip()
                one_record[f] = data
                csv_record.append(data)
                found = True
        if not found:
            csv_record.append("")
    all_records.append(";".join(csv_record))
one_record will hold the record as a dictionary, and csv_record will hold it as a list of fields (ordered as in the fields variable).
Edited to add: ignore this answer, the code from Koko Jumbo looks infinitely more sensible and actually gives you a CSV file at the end of it! It was a fun exercise though :)
Just to expand on fcagnola's code a bit.
If it's a quick and dirty one-off, and you know that the data will be consistently presented, the following should work to create a list of dictionaries with the correct key/value pairing. Each line is processed by splitting the line and comparing the line number (reset to 0 with each new dict) against an array of values that represent where the boundary between key and value falls.
For example, "First Name Bob" becomes ["First","Name","Bob"]. The function has been told that linenumber= 0 so it checks entries[linenumber] to get the value "2", which it uses to join the key name (items 0 & 1) and then join the data (items 2 onwards). The end result is ["First Name", "Bob"] which is then added to the dictionary.
class Extract:
    def extractEntry(self, linedata, lineindex):
        # Hardcoded list! The quick and dirty part.
        # This is specific to the example data provided. The entries
        # represent the index to be used when splitting the string
        # between the key and the data
        entries = (2, 2, 1, 1, 3, 4, 3, 3, 1, 1, 2, 2, 1)
        return self.createNewEntry(linedata, entries[lineindex])

    def createNewEntry(self, linedata, dataindex):
        list_data = linedata.split()
        key = " ".join(list_data[:dataindex])
        data = " ".join(list_data[dataindex:])
        return [key, data]

with open('test.txt', 'r') as f:
    my_list = list()  # this will be the final list
    entry = dict()  # this contains each user info as a dict
    extr = Extract()  # class for splitting the entries into key/value
    x = 0
    for line in f.readlines():
        if line.strip() == "":  # if line is empty start a new dict
            my_list.append(entry)  # and append the old one to the list
            entry = dict()
            x = 0
        else:  # otherwise split the line and create new dict
            extracted_data = extr.extractEntry(line, x)
            entry[extracted_data[0]] = extracted_data[1]
            x += 1
    my_list.append(entry)
print(my_list)

Django: Select object contains all keywords

Here is what the database looks like:
id | Post | tag
1 | Post(1) | 'a'
2 | Post(1) | 'b'
3 | Post(2) | 'a'
4 | Post(3) | 'b'
And here is the code of the model:
class PostMention(models.Model):
    tag = models.CharField(max_length=200)
    post = models.ForeignKey(Post, on_delete=models.CASCADE)
Here is the search code:
def findPostTag(tag):
    keywords = tag.split(' ')
    keyQs = [Q(tag=x) for x in keywords]
    keyQ = keyQs.pop()
    for i in keyQs:
        keyQ &= i
    a = PostMention.objects.filter(keyQ).order_by('-id')
    if not a:
        a = []
    return a
(this code does not work correctly)
I extract the tags and save each one as its own row in the database. Now I want a search function where the user can enter more than one keyword at a time, such as 'a b', and get back 'Post(1)'. I looked for similar questions, but they all seem to be about matching several keywords within a single row. Using Q(tag='a') & Q(tag='b') searches for a tag that equals both 'a' and 'b' at once (as far as I can tell), which is not what I want (and obviously returns nothing). Is there a way to do this? Thanks.
For cases like this, Django provides ManyToManyField. To make it work, define your models like this:
class Tags(models.Model):
    # max_length is required by CharField; 200 mirrors the question's model
    tag = models.CharField(max_length=200, unique=True, verbose_name='Tags')

class Post(models.Model):  # your model
    title = models.CharField(max_length=200, verbose_name='Title')
    post_tags = models.ManyToManyField(Tags, verbose_name='Choose your tags')
That way you can choose many tags for each post.
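If the one-row-per-tag table from the question is kept instead, the posts that carry every keyword can be found by grouping the rows per post and counting the distinct matching tags. The sketch below shows that logic with the standard-library sqlite3 module rather than Django; the table name post_mention, the column names, and the helper posts_with_all_tags are all assumptions made for illustration. In the ORM, chaining one .filter(postmention__tag=...) call per keyword on Post should express a similar query, though that is untested here.

```python
import sqlite3


def posts_with_all_tags(conn, keywords):
    """Return ids of posts that have a row for every keyword given."""
    tags = sorted(set(keywords.split()))
    if not tags:
        return []
    placeholders = ",".join("?" * len(tags))
    sql = (
        "SELECT post_id FROM post_mention "
        f"WHERE tag IN ({placeholders}) "
        "GROUP BY post_id "
        "HAVING COUNT(DISTINCT tag) = ? "   # every keyword must be present
        "ORDER BY post_id"
    )
    return [row[0] for row in conn.execute(sql, (*tags, len(tags)))]


# Rebuild the example table from the question in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_mention (id INTEGER PRIMARY KEY, post_id INTEGER, tag TEXT)"
)
conn.executemany(
    "INSERT INTO post_mention (post_id, tag) VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "a"), (3, "b")],
)

print(posts_with_all_tags(conn, "a b"))
```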

Split and save text string on scrapy

I need to extract a substring from a string. This is the exact source text:
Article published on: Tutorial
I want to delete "Article published on:" and leave only
Tutorial
so that I can save it.
I tried:
category = items[1]
category.split('Article published on:','')
and with
for p in articles:
    bodytext = p.xpath('.//text()').extract()
    joined_text = ''
    # loop in categories
    for each_text in text:
        stripped_text = each_text.strip()
        if stripped_text:
            # all the categories together
            joined_text += ' ' + stripped_text
    joined_text = joined_text.split('Article published on:','')
    items.append(joined_text)
if not is_phrase:
    title = items[0]
    category = items[1]
    print('title = ', title)
    print('category = ', category)
but this doesn't work. What am I missing?
The error with this code:
TypeError: 'str' object cannot be interpreted as an integer
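It seems that you meant to use replace instead of split. The second argument of split is maxsplit, which must be an integer, so passing '' raises the TypeError you see. You also need to assign the result, since strings are immutable:
category = category.replace('Article published on:', '').strip()
split also works if you split on the colon and take the second part:
category = category.split(':')[1].strip()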
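A small sketch comparing the options on the sample text, with surrounding whitespace trimmed in each case. The helper strip_prefix is a hypothetical name; it only removes the label when the text actually starts with it, which may be safer if some items lack the label.

```python
PREFIX = "Article published on:"


def strip_prefix(text, prefix=PREFIX):
    """Drop a leading label and surrounding whitespace, if present."""
    text = text.strip()
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text.strip()


raw = "Article published on: Tutorial"

via_replace = raw.replace(PREFIX, "").strip()   # removes the label anywhere
via_split = raw.split(":", 1)[1].strip()        # splits at the first colon only
via_helper = strip_prefix(raw)                  # removes the label only at the start

print(via_replace, via_split, via_helper)
```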

Refactoring Django query

I wrote some instructions in order to extract data from my database.
I have two values; a city name and a keyword, which are attributes of Address and Museum:
class Museum(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=200)
    address = models.ForeignKey(Address)
    description = models.CharField(max_length=200)

class Address(models.Model):
    id = models.AutoField(primary_key=True)
    streetAddress = models.CharField(max_length=200)
    city = models.CharField(max_length=200)
Now I am receiving two optional parameters: a city and a keyword. I want to filter museums by city (exact match) AND by keyword (partial match in name OR description).
This is what I ended up writing:
if city is not None and keyword is None:
    city_data = Address.objects.all().filter(city=city)
    museum_list = Museum.objects.all().filter(address__in=city_data)
elif city is None and keyword is not None:
    museum_list = Museum.objects.all().filter(
        Q(name__contains=keyword) | Q(description__contains=keyword)
    )
elif city is not None and keyword is not None:
    city_data = Address.objects.all().filter(city=city)
    museum_list = Museum.objects.all().filter(
        Q(address__in=city_data) & (
            Q(name__contains=keyword) | Q(description__contains=keyword)
        )
    )
else:
    museum_list = Museum.objects.all()
I don't like this code, because I am accounting for all possible combinations. How can I use Django filtering to improve such code to something like:
results = Museum.objects.all()
if city not null
    results = results.filterByAddress_City
if keyword not null
    results = results.filterByKeywordLikeNameOrLikeDescription
Thanks.
Queries are composable, so you can pretty much do exactly what you state in your pseudocode.
from django.db.models import Q

results = Museum.objects.all()
if city:
    results = results.filter(address__city=city)
if keyword:
    results = results.filter(Q(name__contains=keyword) | Q(description__contains=keyword))

change a key to a dictionary in google app engine - solved using a different approach

I'm having trouble changing a key value to a dictionary value
def get(self):
    # Get all the Subjects
    subjects = ndb.gql('SELECT name,order FROM Subject ORDER BY order ASC')
    values = {'subjects': subjects}
    # Get all the Contents
    for subject in subjects:
        contents = ndb.gql('SELECT * FROM Content WHERE ANCESTOR IS :1 ORDER BY order ASC', subject.key)
        values[subject.name] = contents  # ***HERE is the issue***
Rather than getting a dictionary
value = {key:value}
I'm trying to get
value = {{key:value}:value}
Thanks in advance for any suggestions!
EDIT:
When I try
values['subject':subject.name] = contents
I get the error
TypeError: unhashable type
Solved: with a different approach:
def get(self):
    # Get all the Subjects
    subjects = ndb.gql('SELECT name,order FROM Subject ORDER BY order ASC')
    values = {'subjects': subjects}
    # Get all the Contents
    values['contents'] = []
    for subject in subjects:
        # Formatting HTML output
        subjectAll = subject.name + ' ' + subject.order
        contents = ndb.gql('SELECT name,order FROM Content WHERE ANCESTOR IS :1 ORDER BY order ASC', subject.key)
        values['contents'].append(subjectAll)
        for content in contents:
            # Formatting HTML output
            contentAll = content.name + ' ' + content.order
            values['contents'].append(contentAll)
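For background on the "unhashable type" error: a dict such as {key: value} cannot itself be used as a dictionary key, because dicts are mutable and therefore unhashable. Two common substitutes are a tuple key or a nested dict. The sketch below uses plain Python with made-up stand-in data instead of ndb queries; the names subjects, by_tuple, and nested are illustrative only.

```python
# Hypothetical stand-in data; in the question these come from ndb queries.
subjects = {"Math": ["Algebra", "Geometry"], "Art": ["Drawing"]}

# 1) A dict used as a key fails, because dicts are unhashable.
try:
    bad = {{"subject": "Math"}: ["Algebra"]}
except TypeError as exc:
    error_message = str(exc)

# 2) A tuple of (label, name) is hashable and works as a composite key.
by_tuple = {("subject", name): contents for name, contents in subjects.items()}

# 3) A nested dict keeps the label as an outer level instead.
nested = {"subject": dict(subjects)}

print(error_message)
print(by_tuple[("subject", "Math")])
print(nested["subject"]["Art"])
```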
