Find and replace the parameter of MediaWiki template - python

I am writing a small script to replace the value of a parameter in a MediaWiki template. A template can be written in two forms:
First (inline):
{{Infobox|name = ABC |work = ABC |year = 1021 }}
Second (non-inline):
{{Infobox
|name = ABC
|work = ABC
|year = 1021
}}
Now I want to replace the name with CBA:
{{Infobox
|name = CBA
|work = ABC
|year = 1021
}}
I have three variables in the Python script:
param = sheet.cell_value(i + 1, 1)
value = sheet.cell_value(i + 1, 2)
template = sheet.cell_value(i + 1, 3)
Here template = Infobox, param = name, value = CBA.
I searched on Google and found that this can be done with a regex. Assume the template content is stored in the text variable. How do I find and replace the value?
Please keep in mind that the template may be in either form (inline or non-inline), and that the replacement must not touch other parameters that happen to have the same value.

I don't know if this helps:
msg = re.sub(r"^(.*name\s*=\s*)[A-Za-z0-9]+(.*)$", r"\1CBA\2", msg, flags=re.S)
Explanation:
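The code keeps the text before and after the value of name (the two capture groups, \1 and \2) and replaces only the value itself with CBA. Note that [A-Za-z0-9]+ matches alphanumeric values only.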
Here is my test-case:
import re
pattern = r"^(.*name\s*=\s*)[A-Za-z0-9]+(.*)$"
msg = '{{Infobox|name = ABC |work = ABC |year = 1021 }}'
print(msg)
# Triple quotes keep the real line breaks of the non-inline form
msg_long = '''{{Infobox
|name = ABC
|work = ABC
|year = 1021
}}'''
msg = re.sub(pattern, r"\1CBA\2", msg, flags=re.S)
print(msg)
print(msg_long)
msg_long = re.sub(pattern, r"\1CBA\2", msg_long, flags=re.S)
print(msg_long)
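The pattern above matches any text ending in name, so it would also hit a parameter such as surname, and it ignores which template it is in. A stricter variant could look like the sketch below. The function name replace_param is hypothetical, and the sketch assumes that the parameter value contains no nested templates or piped links (no {, } or | characters) and that the first | follows the template name directly.

```python
import re

def replace_param(text, template, param, value):
    """Replace the value of `param` inside the first {{template ...}} call."""
    pattern = re.compile("".join([
        r"(\{\{\s*", re.escape(template), r"\s*(?=\|)",  # "{{Infobox" followed by a pipe
        r"[^{}]*?",                                      # earlier parameters, same template only
        r"\|\s*", re.escape(param), r"\s*=[ \t]*)",      # "|name = " (group 1 ends here)
        r"[^|{}]*?",                                     # the old value
        r"(\s*(?:\||\}\}))",                             # next pipe or closing braces (group 2)
    ]))
    # A function replacement avoids backslash surprises if `value` contains "\".
    return pattern.sub(lambda m: m.group(1) + value + m.group(2), text, count=1)

inline = "{{Infobox|name = ABC |work = ABC |year = 1021 }}"
multiline = "{{Infobox\n|name = ABC\n|work = ABC\n|year = 1021\n}}"
print(replace_param(inline, "Infobox", "name", "CBA"))
print(replace_param(multiline, "Infobox", "name", "CBA"))
```

Because the parameter name must follow a pipe and be followed by =, the work parameter keeps its ABC value in both forms. For templates with nested content, a real wikitext parser is likely a better fit than a regex.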

Related

icontains unable to search certain words from a field that uses richtext field - Django

I made a search function in my project where a user can enter a query, and a number of fields are searched before a response with the filtered data is sent back. As usual, I am using icontains in the view to query my model. I copied some words directly from a field that uses ckeditor into the search bar to see if it works. What I noticed is that it is unable to match certain words that are in bold.
In the image, for example, if I search for the words arbitration agreement, no data is returned, but as you can see the words exist in the field. This is happening with all the bold words.
Please help me solve this, as I cannot understand why it is happening. Below is the view that handles the search. The field uses ckeditor.
views.py
def search_citation(request):
    q = request.data.get('q')
    print(f'{q}')
    if q is None:
        q = ""
    if len(q) > 78 or len(q) < 1:
        return Response({"message":'not appropriate'}, status=status.HTTP_200_OK)
    try:
        judge_name = Civil.objects.filter(judge_name__icontains = q)
        case_no = Civil.objects.filter(case_no__icontains = q)
        party_name = Civil.objects.filter(party_name__icontains = q)
        advocate_petitioner = Civil.objects.filter(advocate_petitioner__icontains = q)
        advocate_respondent = Civil.objects.filter(advocate_respondent__icontains = q)
        judgements = Civil.objects.filter(judgements__icontains = q)
        institution_name = Civil.objects.filter(institution_name__icontains = q)
        title = Civil.objects.filter(title__icontains = q)
        sub_law_type = Civil.objects.filter(sub_law_type__icontains = q)
        law_category = Civil.objects.filter(law_category__icontains = q)
        q_final = judge_name | case_no | party_name | advocate_petitioner | advocate_respondent | title | sub_law_type | law_category | judgements | institution_name
        q_serial = Initial_Detail_Serial(q_final, many = True)
        return Response(q_serial.data, status= status.HTTP_200_OK)
    except Exception as e:
        print(e)
        return Response({"error":str(e)}, status=status.HTTP_400_BAD_REQUEST)
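One possible cause, stated as an assumption because the stored value is not shown: a rich-text editor saves HTML, so a bold phrase may be stored with tags or entities between the words (for example `<strong>arbitration</strong>&nbsp;<strong>agreement</strong>`), and the plain phrase is then not a substring of the stored text. The standard-library sketch below shows the effect and one way to derive searchable plain text. The helper name to_plain_text and the sample markup are hypothetical.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""
    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities such as &nbsp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def to_plain_text(rich_html):
    parser = _TextExtractor()
    parser.feed(rich_html)
    parser.close()
    text = "".join(parser.parts).replace("\xa0", " ")  # non-breaking space to plain space
    return re.sub(r"\s+", " ", text).strip()            # collapse runs of whitespace

stored = "<p>The <strong>arbitration</strong>&nbsp;<strong>agreement</strong> is valid.</p>"
print("arbitration agreement" in stored)                 # substring check on raw HTML
print("arbitration agreement" in to_plain_text(stored))  # substring check on plain text
```

If this is the cause, one option is to save the plain-text version in a separate model field whenever the record is saved and run icontains against that field instead of the HTML one.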

Django: Select object contains all keywords

Here is what the table looks like:
id | Post | tag
1 | Post(1) | 'a'
2 | Post(1) | 'b'
3 | Post(2) | 'a'
4 | Post(3) | 'b'
And here is the code of the model:
class PostMention(models.Model):
    tag = models.CharField(max_length=200)
    post = models.ForeignKey(Post,on_delete=models.CASCADE)
Here is the search code:
def findPostTag(tag):
    keywords=tag.split(' ')
    keyQs = [Q(tag=x) for x in keywords]
    keyQ = keyQs.pop()
    for i in keyQs:
        keyQ &= i
    a = PostMention.objects.filter(keyQ).order_by('-id')
    if not a:
        a=[]
    return a
(this code does not work correctly)
I extract the tags and save each one as a separate row in the database. Now I want a search function where the user can enter more than one keyword at a time, like 'a b', and get back 'Post(1)'. I looked at similar questions, but they all seem to be about matching several keywords within a single row, for example with Q(tag='a') & Q(tag='b'). As I understand it, that looks for a row whose tag equals both 'a' and 'b' at once, which is not what I want (and obviously returns no result). Is there a way to solve this? Thanks.
For cases like this, Django provides ManyToManyField. To make it work, define the models like this:
class Tags(models.Model):
    tag = models.CharField(max_length=200, unique=True, verbose_name='Tags')
class Post(models.Model): #your model
    title = models.CharField(max_length=200, verbose_name='Title')
    post_tags = models.ManyToManyField(Tags, verbose_name='Choose your tags')
This way you can choose many tags for each post.
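With the original one-row-per-tag layout, the usual idea is to keep the rows whose tag is one of the keywords, group them by post, and keep only the posts that matched every keyword. The plain-Python sketch below shows that logic on the sample rows from the question; the function name posts_with_all_tags is hypothetical. In the ORM this would roughly correspond to filtering with tag__in, grouping with values('post'), annotating with Count('tag', distinct=True) and filtering on the count, which is an untested assumption here.

```python
from collections import defaultdict

def posts_with_all_tags(rows, query):
    """rows: iterable of (post_id, tag); returns ids of posts carrying every keyword."""
    keywords = set(query.split())
    matched = defaultdict(set)
    for post_id, tag in rows:
        if tag in keywords:             # same role as filter(tag__in=keywords)
            matched[post_id].add(tag)   # group by post, collect distinct tags
    # keep posts whose distinct matching tags cover all keywords
    return sorted(p for p, tags in matched.items() if tags == keywords)

rows = [(1, "a"), (1, "b"), (2, "a"), (3, "b")]
print(posts_with_all_tags(rows, "a b"))
```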

Split a paragraph into lines

I have this paragraph in a variable:
"Information about this scan : abc version : 5.2.5 pqr version : 201 403061815 hello kdshfldfs;dfkfjljcsdlc sljc lsjclsj csjclks cscjsld"
I want to extract the values of 'abc version' and 'pqr version'.
How can I achieve that?
You can do it as follows:
string = "Your Paragraph String"
string = string.split()
abc_version = string[string.index('abc')+3] #This will give 5.2.5
and then either
pqr_version = string[string.index('pqr')+3] #This will give 201
or
pqr_version = ' '.join(string[string.index('pqr')+3:string.index('pqr')+5]) #This will give 201 403061815
Please specify where your pqr version string starts and ends.
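A regex-based alternative is sketched below. It assumes that the abc version is a single whitespace-free token and that the pqr version is a run of space-separated digit groups; both are guesses based on the sample text, and the variable names are illustrative.

```python
import re

text = ("Information about this scan : abc version : 5.2.5 "
        "pqr version : 201 403061815 hello kdshfldfs;dfkfjljcsdlc")

# First non-space token after "abc version :"
abc_match = re.search(r"abc version\s*:\s*(\S+)", text)
# One or more digit groups separated by single spaces after "pqr version :"
pqr_match = re.search(r"pqr version\s*:\s*(\d+(?: \d+)*)", text)

abc_version = abc_match.group(1) if abc_match else None
pqr_version = pqr_match.group(1) if pqr_match else None
print(abc_version)
print(pqr_version)
```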

Removed the default content in nested expression

I am using the pyparsing module and its nestedExpr function.
I want to specify a delimiter for the content argument of nestedExpr instead of the default whitespace-delimited content.
Suppose I have text such as the following:
text = "{{Infobox | birth_date = {{birth date and age|mf=yes|1981|1|31}}| birth_place = ((Memphis, Tennessee|Memphis)), ((Tennessee)), U.S.| instrument = ((Beatboxing)), guitar, keyboards, vocalsprint expr.parse| genre = ((Pop music|Pop)), ((contemporary R&B|R&B))| occupation = Actor, businessman, record producer, singer| years_active = 1992–present| label = ((Jive Records|Jive)), ((RCA Records|RCA)), ((Zomba Group of Companies|Zomba))| website = {{URL|xyz.com|Official website}} }}"
When I call nestedExpr('{{','}}').parseString(text), I need the output to be the following list:
['Infobox | birth_date =' ,['birth date and age|mf=yes|1981|1|31'],'| birth_place = ((Memphis, Tennessee|Memphis)), ((Tennessee)), U.S.| instrument = ((Beatboxing)), guitar, keyboards, vocalsprint expr.parse| genre = ((Pop music|Pop)), ((contemporary R&B|R&B))| occupation = Actor, businessman, record producer, singer| years_active = 1992–present| label = ((Jive Records|Jive)), ((RCA Records|RCA)), ((Zomba Group of Companies|Zomba))| website =',[ 'URL|xyz.com|Official website' ]]
How can I use ',' or '|' as the delimiter instead of whitespace? I tried passing those characters, but it didn't work.
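If keeping the text between the braces as whole strings is the main goal, a small hand-written parser can produce that list shape without pyparsing. The sketch below is a plain-Python alternative, not a nestedExpr configuration, and the function name parse_nested is hypothetical. It assumes balanced {{ and }} pairs and strips surrounding whitespace from each chunk.

```python
def parse_nested(text, opener="{{", closer="}}"):
    """Return nested lists; text between delimiters is kept as whole strings."""
    root = []
    stack = [root]   # stack[-1] is the list currently being filled
    buf = []         # characters of the chunk being read

    def flush():
        chunk = "".join(buf).strip()
        if chunk:
            stack[-1].append(chunk)
        buf.clear()

    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            flush()
            child = []
            stack[-1].append(child)
            stack.append(child)
            i += len(opener)
        elif text.startswith(closer, i):
            if len(stack) == 1:
                raise ValueError("unbalanced closer at position %d" % i)
            flush()
            stack.pop()
            i += len(closer)
        else:
            buf.append(text[i])
            i += 1
    if len(stack) != 1:
        raise ValueError("unbalanced opener")
    flush()
    return root

sample = "{{Infobox | birth_date = {{birth date and age|mf=yes|1981|1|31}}| website = {{URL|xyz.com|Official website}} }}"
print(parse_nested(sample))
```

Each string can then be split on '|' or ',' with str.split as a second step, which keeps the delimiter choice separate from the nesting logic.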

Python - sending unicode characters (prefixed with \u) in an HTTP POST request

I'm writing a program that fetches and edits articles on Wikipedia, and I'm having trouble handling Unicode characters that show up as \u escapes. I've tried .encode("utf8"), but it doesn't do the trick. How can I properly encode these values to POST them to Wikipedia? See this edit for my problem.
Here is some code:
To get the page:
url = "http://en.wikipedia.org/w/api.php?action=query&format=json&titles="+urllib.quote(name)+"&prop=revisions&rvprop=content"
articleContent = ClientCookie.urlopen(url).read().split('"*":"')[1].split('"}')[0].replace("\\n", "\n").decode("utf-8")
Before I POST the page:
data = dict([(key, value.encode('utf8')) for key, value in data.iteritems()])
data["text"] = data["text"].replace("\\", "")
editInfo = urllib2.Request("http://en.wikipedia.org/w/api.php", urllib.urlencode(data))
You are downloading JSON data without decoding it. Use the json library for that:
import json
articleContent = ClientCookie.urlopen(url)
data = json.load(articleContent)
JSON-encoded data looks a lot like Python and uses \u escaping as well, but it is in fact a subset of JavaScript.
The data variable now holds a deeply nested data structure. Judging by the string splitting, you wanted this piece:
articleContent = data['query']['pages'].values()[0]['revisions'][0]['*']
Now articleContent is an actual unicode() instance; it is the revision text of the page you were looking for:
>>> print u'\n'.join(data['query']['pages'].values()[0]['revisions'][0]['*'].splitlines()[:20])
{{For|the game|100 Bullets (video game)}}
{{GOCEeffort}}
{{italic title}}
{{Supercbbox <!--Wikipedia:WikiProject Comics-->
| title =100 Bullets
| image =100Bullets vol1.jpg
| caption = Cover to ''100 Bullets'' vol. 1 "First Shot, Last Call". Cover art by Dave Johnson.
| schedule = Monthly
| format =
|complete=y
|Crime = y
| publisher = [[Vertigo (DC Comics)|Vertigo]]
| date = August [[1999 in comics|1999]] – April [[2009 in comics|2009]]
| issues = 100
| main_char_team = [[Agent Graves]] <br/> [[Mr. Shepherd]] <br/> The Minutemen <br/> [[List of characters in 100 Bullets#Dizzy Cordova (also known as "The Girl")|Dizzy Cordova]] <br/> [[List of characters in 100 Bullets#Loop Hughes (also known as "The Boy")|Loop Hughes]]
| writers = [[Brian Azzarello]]
| artists = [[Eduardo Risso]]<br>Dave Johnson
| pencillers =
| inkers =
| colorists = Grant Goleash<br>[[Patricia Mulvihill]]
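The difference between splitting the raw response and decoding it can be seen in a small self-contained sketch. It is written for Python 3, whereas the code above is Python 2, and it uses a made-up response string with a hypothetical page id instead of a live API call.

```python
import json

# Hypothetical, shortened API response; "123" is a made-up page id.
raw = '{"query": {"pages": {"123": {"revisions": [{"*": "caf\\u00e9 \\u2013 na\\u00efve\\nline two"}]}}}}'

# String splitting leaves the JSON escapes in place as literal backslash sequences.
naive = raw.split('"*": "')[1].split('"}')[0]

# Decoding the JSON turns \uXXXX and \n escapes into real characters.
data = json.loads(raw)
page = next(iter(data["query"]["pages"].values()))
article = page["revisions"][0]["*"]

print(repr(naive))
print(repr(article))
```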
