Trying to make a local file dictionary and search system

I am trying to make a local file searcher that finds files by their tags and also by their names. I have no idea how to build the search system, and the Python dictionary and tag-based searching confuse me.
files = {'samplefile1.txt', 'samplefile2.txt'}
# Field names are string keys; 'tags' holds a list of tag strings.
filesDictionary = {
    'samplefile1.txt': {
        'fileName': 'coolFile1',
        'fileDiscription': 'cool disc.',
        'isMP3File': False,
        'isMP4File': False,
        'isTxtFile': True,
        'isArchived': False,
        'tags': ['sample1', 'favorited'],
    },
    'samplefile2.txt': {
        'fileName': 'coolFile2',
        'fileDiscription': 'cool disc2',
        'isMP3File': False,
        'isMP4File': False,
        'isTxtFile': True,
        'isArchived': True,
        'tags': ['sample2'],
    },
}
So with the code above, a search function should return only samplefile1.txt when searching for 'sample1' or 'favorited', and samplefile2.txt when searching for 'sample2'.
(Also, fileName is the name I was talking about in this question, not the file name on the PC.)
(Also, any idea how to automate adding entries to this files dictionary using a GUI? Something like how you would post to Twitter, with checkboxes and text boxes.)

Create a dictionary where each tag is a key and the value is the list of filenames that carry that tag.
Since you want to search by tag, having the tags as keys makes each lookup constant time on average.
searchDict = {
'sample1': ['samplefile1.txt'],
'favorited': ['samplefile1.txt'],
'sample2': ['samplefile2.txt']
}
Then, given a tag, you can get the filenames immediately:
searchDict['sample1'] # returns ['samplefile1.txt']
You can then use those filenames as keys into your main dictionary, filesDictionary:
for filename in searchDict['sample1']:
    print(filesDictionary[filename])
which will print (wrapped here for readability)
{
    'fileName': 'coolFile1',
    'fileDiscription': 'cool disc.',
    'isMP3File': False,
    'isMP4File': False,
    'isTxtFile': True,
    'isArchived': False,
    'tags': ['sample1', 'favorited']
}
To create searchDict, iterate once over your database of files, collecting each file's tags and associating them with its filename. That pass can be costly if your database is big, but once it is done each search is a constant-time lookup on average.
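The one-pass construction described above could be sketched roughly as follows. This is only an illustration: the helper names build_tag_index and search_by_name are made up here, and the sample data is a trimmed copy of the question's dictionary. Searching by the fileName field is shown as a plain linear scan, which is an assumption about what "search by name" should mean.

```python
from collections import defaultdict

# Sample data shaped like the question's dictionary (trimmed to the relevant fields).
files_dictionary = {
    'samplefile1.txt': {'fileName': 'coolFile1', 'tags': ['sample1', 'favorited']},
    'samplefile2.txt': {'fileName': 'coolFile2', 'tags': ['sample2']},
}

def build_tag_index(files):
    """Map each tag to the list of filenames that carry it (one pass over the data)."""
    index = defaultdict(list)
    for filename, info in files.items():
        for tag in info.get('tags', []):
            index[tag].append(filename)
    return dict(index)

def search_by_name(files, name):
    """Linear scan for entries whose 'fileName' field equals the given name."""
    return [fn for fn, info in files.items() if info.get('fileName') == name]

search_dict = build_tag_index(files_dictionary)
print(search_dict.get('favorited', []))   # .get avoids a KeyError for unknown tags
print(search_by_name(files_dictionary, 'coolFile2'))
```

Whenever a file is added or its tags change, the index has to be rebuilt or updated alongside the main dictionary, otherwise the two drift apart.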

Related

Extracting a value in python with specific JSON array

I'm new to Python. How would I get the value out of the appid key-value pair in the JSON below?
{
"Datadog":[
"host:i-068fee2324438213477be9a4"
],
"Amazon Web Services":[
"availability-zone:us-east-1a",
"aws:cloudformation:logical-id:ec2instance01",
"aws:cloudformation:stack-id:arn:aws:cloudformation:us-east-1:353245",
"appid:42928482474dh28424a",
"name:devinstance",
"region:us-east-1",
"security-group:sg-022442414d8a",
"security-group:sg-0691af18875ad9d0b",
"security-group:sg-022442414d8a",
"security-group:sg-022442414d8a"
]
}

What you have there is a dictionary. You can access its values like this:
nameOfYourDict["nameOfYourKey"]
For example, if your dict is named data and you want to access Datadog:
data["Datadog"]

Start by getting the AWS pairs into their own variable:
aws_pairs = data["Amazon Web Services"]
Then loop over the pairs until you find one with the correct prefix:
appid_pair = None
for pair in aws_pairs:
    if pair.startswith("appid:"):
        appid_pair = pair
        break
appid_value = None
if appid_pair:
    appid_value = appid_pair.split(":", 1)[1]
print(appid_value)
This can be condensed into a single next() call:
aws_pairs = data["Amazon Web Services"]
appid_value = next(
    (
        pair.split(":", 1)[1]
        for pair in aws_pairs
        if pair.startswith("appid:")
    ),
    None
)
print(appid_value)

It's not really a JSON thing. You have a dictionary of lists, so extract the relevant list and then search it for the item you're looking for:
x = {
"Datadog":[
"host:i-068fee2324438213477be9a4"
],
"Amazon Web Services":[
"availability-zone:us-east-1a",
"aws:cloudformation:logical-id:ec2instance01",
"aws:cloudformation:stack-id:arn:aws:cloudformation:us-east-1:353245",
"appid:42928482474dh28424a",
"name:devinstance",
"region:us-east-1",
"security-group:sg-022442414d8a",
"security-group:sg-0691af18875ad9d0b",
"security-group:sg-022442414d8a",
"security-group:sg-022442414d8a"
]
}
aws = x["Amazon Web Services"]
for string in aws:
    name, value = string.split(":", 1)
    if name == "appid":
        print(value)
Gives:
42928482474dh28424a

The most efficient approach I can think of is to check whether each tag (under the "Amazon Web Services" key) starts with a specified prefix, which here is the tag name followed by a colon.
Note that you can also use str.startswith; here I compare a slice of the string against the prefix instead, which has the same effect.
data = {
"Datadog": [
"host:i-068fee2324438213477be9a4"
],
"Amazon Web Services": [
"availability-zone:us-east-1a",
"aws:cloudformation:logical-id:ec2instance01",
"aws:cloudformation:stack-id:arn:aws:cloudformation:us-east-1:353245",
"appid:42928482474dh28424a",
"name:devinstance",
"region:us-east-1",
"security-group:sg-022442414d8a",
"security-group:sg-0691af18875ad9d0b",
"security-group:sg-022442414d8a",
"security-group:sg-022442414d8a"
]
}['Amazon Web Services']
target_tag = 'appid:'
len_tag_name = len(target_tag)
for tag in data:
    if tag[:len_tag_name] == target_tag:
        app_id = tag[len_tag_name:]
        break
else:  # no `break` statement encountered, hence app_id not found
    app_id = None
assert app_id == '42928482474dh28424a'  # passes
And finally, here is a one-liner version of the above, using next() to take the first match from a generator expression. This only works if you know for sure that an appid tag exists; otherwise next() raises StopIteration.
app_id = next(tag[len_tag_name:] for tag in data if tag[:len_tag_name] == target_tag)
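If more than one tag is needed, it may be simpler to parse the whole list once instead of scanning it per lookup. The sketch below is one way to do that, not something taken from the answers above: parse_tags is a made-up helper, the tag list is a trimmed copy of the question's data, and values are kept in lists because names such as security-group repeat.

```python
from collections import defaultdict

# Trimmed copy of the tag list from the question.
aws_tags = [
    "availability-zone:us-east-1a",
    "aws:cloudformation:logical-id:ec2instance01",
    "appid:42928482474dh28424a",
    "name:devinstance",
    "security-group:sg-022442414d8a",
    "security-group:sg-0691af18875ad9d0b",
]

def parse_tags(tags):
    """Group 'name:value' strings by name, splitting on the first colon only."""
    parsed = defaultdict(list)
    for tag in tags:
        name, sep, value = tag.partition(":")
        if sep:  # skip entries that contain no colon at all
            parsed[name].append(value)
    return dict(parsed)

tag_values = parse_tags(aws_tags)
app_ids = tag_values.get("appid", [])
app_id = app_ids[0] if app_ids else None  # None when no appid tag is present
print(app_id)
```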

Dynamically create list/dict during for loop for conversion to JSON

I am trying to build a list/dict that will be converted to JSON later on. I am trying to write the code that builds and populates the multiple levels of the JSON format I ultimately need. I am having an issue wrapping my head around this. Thank you for the help.
What I ultimately need -> Populate this list/dict:
dataset_permission_json = []
with this format:
{
"projects":[
{
"project":"test-project-1",
"datasets":[
{
"dataset":"testing1",
"permissions":[
{
"role":"READER",
"google_group":"testing1#test.com"
}
]
},
{
"dataset":"testing2",
"permissions":[
{
"role":"OWNER",
"google_group":"testing2#test.com"
}
]
},
{
"dataset":"testing3",
"permissions":[
{
"role":"READER",
"google_group":"testing3#test.com"
}
]
},
{
"dataset":"testing4",
"permissions":[
{
"role":"WRITER",
"google_group":"testing4#test.com"
}
]
}
]
}
]
}
I have multiple for loops that successfully print out the information I am pulling from an external API, but I need to be able to enter that data into the list/dict. The dynamic values I am trying to input are:
'project', e.g. test-project-1
'dataset', e.g. testing1
'role', e.g. READER
'google_group', e.g. testing1#test.com
I have tried things like:
dataset_permission_json.update({'project': project})
but cannot figure out how not to overwrite the data during the multiple for loops.
for project in projects:
    print(project)  ## Need to add this variable to 'projects'
    for bq_group in bq_groups:
        delegated_credentials = credentials.create_delegated(bq_group)
        http_auth = delegated_credentials.authorize(Http())
        list_datasets_in_project = bigquery_service.datasets().list(projectId=project).execute()
        datasets = list_datasets_in_project.get('datasets', [])
        for dataset in datasets:
            print(dataset['datasetReference']['datasetId'])  ## Add the dataset to 'datasets' under the project
            get_dataset_permissions_result = bigquery_service.datasets().get(projectId=project, datasetId=dataset['datasetReference']['datasetId']).execute()
            dataset_permissions = get_dataset_permissions_result.get('access', [])
            ### ADD THE NEXT LEVEL 'permissions' level here?
            for dataset_permission in dataset_permissions:
                if 'groupByEmail' in dataset_permission:
                    if bq_group in dataset_permission['groupByEmail']:
                        print(dataset['datasetReference']['datasetId'], dataset_permission['groupByEmail'])  ## Add to each dataset
I appreciate the help.
EDIT: Updated Progress
Ok I have created the nested structure that I was looking for using StackOverflow
Things are great except for the last part. I am trying to append the role & group to each 'permission' nest, but after everything runs the data is only appended to the last 'permission' nest in the JSON structure. It seems like it is overwriting itself during the for loop. Thoughts?
Updated for loop:
for project in projects:
    for bq_group in bq_groups:
        delegated_credentials = credentials.create_delegated(bq_group)
        http_auth = delegated_credentials.authorize(Http())
        list_datasets_in_project = bigquery_service.datasets().list(projectId=project).execute()
        datasets = list_datasets_in_project.get('datasets', [])
        for dataset in datasets:
            get_dataset_permissions_result = bigquery_service.datasets().get(projectId=project, datasetId=dataset['datasetReference']['datasetId']).execute()
            dataset_permissions = get_dataset_permissions_result.get('access', [])
            for dataset_permission in dataset_permissions:
                if 'groupByEmail' in dataset_permission:
                    if bq_group in dataset_permission['groupByEmail']:
                        dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions']
                        permission = {'group': dataset_permission['groupByEmail'],'role': dataset_permission['role']}
                        dataset_permission_json['permissions'] = permission
UPDATE: Solved.
dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions']
permission = {'group': dataset_permission['groupByEmail'],'role': dataset_permission['role']}
dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions'] = permission
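For readers who need the list-based layout from the top of the question (with "permissions" as a list), one possible approach is to group into plain dicts first and append each permission rather than assign it, then convert to lists at the end. This is a sketch under assumptions, not the asker's final code: the records list is a hypothetical stand-in for the values produced by the API loops, writers#test.com is an invented group, and build_permission_tree is a made-up name.

```python
import json

# Hypothetical flat records standing in for what the API loops yield:
# (project, dataset, role, google_group)
records = [
    ("test-project-1", "testing1", "READER", "testing1#test.com"),
    ("test-project-1", "testing2", "OWNER", "testing2#test.com"),
    ("test-project-1", "testing1", "WRITER", "writers#test.com"),
]

def build_permission_tree(rows):
    """Group rows by project and dataset, appending (not assigning) each permission."""
    # Intermediate dict-of-dicts so repeated projects/datasets are found, not overwritten.
    grouped = {}
    for project, dataset, role, group in rows:
        datasets = grouped.setdefault(project, {})
        permissions = datasets.setdefault(dataset, [])
        permissions.append({"role": role, "google_group": group})
    # Convert the lookup dicts into the list-based layout from the question.
    return {
        "projects": [
            {
                "project": project,
                "datasets": [
                    {"dataset": dataset, "permissions": perms}
                    for dataset, perms in datasets.items()
                ],
            }
            for project, datasets in grouped.items()
        ]
    }

dataset_permission_json = build_permission_tree(records)
print(json.dumps(dataset_permission_json, indent=2))
```

Inside the real loops, the same setdefault(...).append(...) pair would go where the permission dict is built, so that each matching group adds an entry instead of replacing the previous one.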

Analysis of fields in nested document elasticsearch

I am using a document with nested structure in it where the content is analysed in spite of my telling it "not analysed". The document is defined as follows:
class SearchDocument(es.DocType):
    # Verblijfsobject specific data
    gebruiksdoel_omschrijving = es.String(index='not_analyzed')
    oppervlakte = es.Integer()
    bouwblok = es.String(index='not_analyzed')
    gebruik = es.String(index='not_analyzed')
    panden = es.String(index='not_analyzed')
    sbi_codes = es.Nested({
        'properties': {
            'sbi_code': es.String(index='not_analyzed'),
            'hcat': es.String(index='not_analyzed'),
            'scat': es.String(index='not_analyzed'),
            'hoofdcategorie': es.String(fields={'raw': es.String(index='not_analyzed')}),
            'subcategorie': es.String(fields={'raw': es.String(index='not_analyzed')}),
            'sub_sub_categorie': es.String(fields={'raw': es.String(index='not_analyzed')}),
            'bedrijfsnaam': es.String(fields={'raw': es.String(index='not_analyzed')}),
            'vestigingsnummer': es.String(index='not_analyzed')
        }
    })
As is clear, it says "not analysed" in the document for most fields. This works OK for the "regular fields". The problem is in the nested structure. There the hoofdcategorie and other fields are indexed for their separate words instead of the unanalysed version.
The structure is filled with the following data:
[
{
"sbi_code": "74103",
"sub_sub_categorie": "Interieur- en ruimtelijk ontwerp",
"vestigingsnummer": "000000002216",
"bedrijfsnaam": "Flippie Tests",
"subcategorie": "design",
"scat": "22279_12_22254_11",
"hoofdcategorie": "zakelijke dienstverlening",
"hcat": "22279_12"
},
{
"sbi_code": "9003",
"sub_sub_categorie": "Schrijven en overige scheppende kunsten",
"vestigingsnummer": "000000002216",
"bedrijfsnaam": "Flippie Tests",
"subcategorie": "kunst",
"scat": "22281_12_22259_11",
"hoofdcategorie": "cultuur, sport, recreatie",
"hcat": "22281_12"
}
]
Now when I retrieve aggregates, hoofdcategorie has been split into three separate words ("cultuur", "sport", "recreatie"). This is not what I want, but as far as I know I have specified it correctly using the "not analysed" setting.
Does anyone have any ideas?
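One thing worth checking, offered as an assumption rather than a confirmed diagnosis: in the mapping above, hoofdcategorie itself is declared as a plain String, and only its raw sub-field is not_analyzed. An aggregation on sbi_codes.hoofdcategorie would therefore see the analysed tokens. A minimal sketch of a request body that aggregates on the raw sub-field instead is shown below as a plain dict; the aggregation names sbi and per_hoofdcategorie are made up for illustration.

```python
import json

# Request body for a nested terms aggregation on the not_analyzed sub-field.
# "sbi" and "per_hoofdcategorie" are hypothetical aggregation names.
aggregation_body = {
    "size": 0,  # only the aggregation buckets are wanted, not the hits
    "aggs": {
        "sbi": {
            "nested": {"path": "sbi_codes"},
            "aggs": {
                "per_hoofdcategorie": {
                    "terms": {"field": "sbi_codes.hoofdcategorie.raw"}
                }
            },
        }
    },
}

print(json.dumps(aggregation_body, indent=2))
```

If the buckets are still split after switching to the raw field, the index may have been created before the mapping was applied, in which case the mapping stored in the index is what needs inspecting.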

Find and Replace text within headers with Win32COM

I'd like to find some words in the headers of a Word document and replace them with other words. I've done this in the body of the document with the following code, and it works fine.
import win32com.client
wdFindContinue = 1
wdReplaceAll = 2
app = win32com.client.DispatchEx("Word.Application")
app.Visible = 1
app.DisplayAlerts = 0
app.Documents.Open(document_path)
FromTo = {"<#TITLE#>":"My title", "<#DATE#>":"Today"}
for From in FromTo.keys():
    app.Selection.Find.Execute(From, False, False, False, False, False, True, wdFindContinue, False, FromTo[From], wdReplaceAll)
The problem is that this code doesn't work for headers and footers. I've also tried this:
app.ActiveDocument.Sections(1).Headers(win32com.client.constants.wdHeaderFooterPrimary).Range.Select
app.Selection.Find.Execute(From, False, False, False, False, False, True, wdFindContinue, False, FromTo[From], wdReplaceAll)
But it doesn't work any better (even though I don't get any error message).
Does anyone have an idea how to do that? One more piece of information: I also have an image inserted in the headers; I don't know whether that matters.

You must activate the header/footer pane after opening the document.
The statements below are Visual Basic; adapt the syntax for Python.
ActiveDocument.ActiveWindow.Panes(1).View.SeekView = wdSeekCurrentPageHeader
for the header and
ActiveDocument.ActiveWindow.Panes(1).View.SeekView = wdSeekCurrentPageFooter
for the footer.
Then run the search/replace.
To switch the pane back to the main document, use
ActiveDocument.ActiveWindow.Panes(1).View.SeekView = wdSeekMainDocument
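Translated to Python, the steps above might look roughly like the sketch below. It has not been run against Word here, so treat it as a starting point. The function names are made up, app is assumed to be the Word.Application object from the question with the document already open, and the numeric constants are the WdSeekView values (main document 0, current page header 9, current page footer 10) together with the wdFindContinue and wdReplaceAll values already used in the question.

```python
# Word enumeration values (WdSeekView, WdFindWrap, WdReplace).
WD_SEEK_MAIN_DOCUMENT = 0
WD_SEEK_CURRENT_PAGE_HEADER = 9
WD_SEEK_CURRENT_PAGE_FOOTER = 10
WD_FIND_CONTINUE = 1
WD_REPLACE_ALL = 2

def replace_in_pane(app, seek_view, replacements):
    """Switch the active pane to seek_view, replace every key with its value, switch back."""
    view = app.ActiveDocument.ActiveWindow.Panes(1).View
    view.SeekView = seek_view
    try:
        for old, new in replacements.items():
            # Same positional arguments as the Find.Execute call in the question.
            app.Selection.Find.Execute(
                old, False, False, False, False, False,
                True, WD_FIND_CONTINUE, False, new, WD_REPLACE_ALL,
            )
    finally:
        # Always return to the main document, even if a replacement fails.
        view.SeekView = WD_SEEK_MAIN_DOCUMENT

def replace_in_headers_and_footers(app, replacements):
    """Hypothetical helper: run the replacement in the current page's header, then footer."""
    replace_in_pane(app, WD_SEEK_CURRENT_PAGE_HEADER, replacements)
    replace_in_pane(app, WD_SEEK_CURRENT_PAGE_FOOTER, replacements)
```

With the question's variables this would be called as replace_in_headers_and_footers(app, FromTo) after app.Documents.Open(document_path).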

MongoDB - Upsert with increment

I am trying to run the following query:
data = {
'user_id':1,
'text':'Lorem ipsum',
'$inc':{'count':1},
'$set':{'updated':datetime.now()},
}
self.db.collection('collection').update({'user_id':1}, data, upsert=True)
but the two '$' queries cause it to fail. Is it possible to do this within one statement?

First of all, when you ask a question like this it's very helpful to add information on why it's failing (e.g. copy the error).
Your query fails because you're mixing $ operators with plain replacement fields in the same update document. You should use the $set operator for the user_id and text fields as well (although the user_id part of the update is redundant in this example).
So convert this mongo shell command to a pymongo query:
db.test.update({user_id:1},
{$set:{text:"Lorem ipsum", updated:new Date()}, $inc:{count:1}},
true,
false)
I've removed the user_id in the update because that isn't necessary. If the document exists this value will already be 1. If it doesn't exist the upsert will copy the query part of your update into the new document.
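In pymongo, the shell command above might be written along these lines. This is a sketch only: build_update and upsert_text are made-up helper names, and collection is assumed to be a pymongo Collection object. It uses update_one with upsert=True, which is the pymongo call for updating a single document.

```python
from datetime import datetime

def build_update(text, now=None):
    """Build an update document that uses only $ operators (no bare fields)."""
    return {
        "$set": {"text": text, "updated": now or datetime.now()},
        "$inc": {"count": 1},
    }

def upsert_text(collection, user_id, text):
    """Upsert via update_one; `collection` is assumed to be a pymongo Collection."""
    return collection.update_one(
        {"user_id": user_id},  # on insert, these filter fields are copied into the new doc
        build_update(text),
        upsert=True,
    )

print(build_update("Lorem ipsum", now=datetime(2020, 1, 1)))
```

Keeping the update document in its own function makes it easy to inspect without a running database.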

If you're trying to do the following:
If the doc doesn't exist, insert a new doc.
If it exists, then only increment one field.
Then you can use a combo of $setOnInsert and $inc. If the song exists then $setOnInsert won't do anything and $inc will increase the value of "listened". If the song doesn't exist, then it will create a new doc with the fields "songId" and "songName". Then $inc will create the field and set the value to be 1.
let songsSchema = new mongoose.Schema({
    songId: String,
    songName: String,
    listened: Number
})
let Song = mongoose.model('Song', songsSchema);
let saveSong = (song) => {
    return Song.updateOne(
        {songId: song.songId},
        {
            $inc: {listened: 1},
            $setOnInsert: {
                songId: song.songId,
                songName: song.songName,
            }
        },
        {upsert: true}
    )
    .then((savedSong) => {
        return savedSong;
    })
    .catch((err) => {
        console.log('ERROR SAVING SONG IN DB', err);
    })
}
