"a"/"append" only once to yaml file - python

I'm trying to use yaml.dump to send some data from a .csv file to a .yml file!
Everything works and I can send the data successfully, BUT the script is meant to be run many times... and this appends the same data to the .yml file every time I run the script.
My code in Python:
import os
import pathlib
import pandas
import yaml

# Read the whole csv file with the pandas library.
df = pandas.read_csv('Keywords.csv', sep=';')

for index, row in df.iterrows():  # Iterate over the csv file.
    pack_name = row.NAME  # Name of the pack
    print(pack_name)

    def dumpFunction():
        with open(f'packs/{pack_name}/config.yml', 'a') as outfile:  # HERE I USE APPEND
            yaml.dump_all(
                df.loc[df['NAME'] == pack_name].to_dict(orient='records'),  # Send keywords to the right pack.
                outfile,
                sort_keys=False,
                indent=4
            )

    if pack_name and os.path.exists(f'packs/{pack_name}'):  # Check if the pack is available; row.NAME is the pack name.
        dumpFunction()
    else:
        pathlib.Path(f"packs/{pack_name}").mkdir()  # If the pack does not exist, make a new one.
        dumpFunction()
        print(f'{pack_name} was made!')
My .csv file (shortened) - semicolon separated:
NAME;KEYWORDS
.NET;apm, .net, language agent
.NET core;apm, .net
.NET MVC Web API;apm, .net
ActiveRecord;apm, ruby
Acts_as_solr;apm, ruby
My .yml file after I run the script 3 times:
NAME: .NET
KEYWORDS: apm, .net, language agent
NAME: .NET
KEYWORDS: apm, .net, language agent
NAME: .NET
KEYWORDS: apm, .net, language agent
I only want it once in the .yml file, like this, even if I run the script 10 times:
NAME: .NET
KEYWORDS: apm, .net, language agent

Try this one:
I didn't have enough data in the CSV to test it, but it should work.
I referenced this answer: Python3 - write back to same stream in yaml
import os
import pathlib
import pandas
import yaml

df = pandas.read_csv('Keywords.csv', sep=';')

for index, row in df.iterrows():  # Iterate over the csv file.
    pack_name = row.NAME  # Name of the pack
    print(pack_name)

    def dumpFunction():
        # 'w' truncates the file on open, so every run rewrites the pack's
        # config from the CSV instead of appending duplicates.
        with open(f'packs/{pack_name}/config.yml', 'w') as outfile:
            yaml.dump_all(
                df.loc[df['NAME'] == pack_name].to_dict(orient='records'),  # Send keywords to the right pack.
                outfile,
                sort_keys=False,
                indent=4
            )

    if pack_name and os.path.exists(f'packs/{pack_name}'):  # Check if the pack is available; row.NAME is the pack name.
        dumpFunction()
    else:
        pathlib.Path(f"packs/{pack_name}").mkdir()  # If the pack does not exist, make a new one.
        dumpFunction()
        print(f'{pack_name} was made!')
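If the config files can also contain documents that did not come from the CSV, a blind overwrite would lose them; in that case a read-merge-rewrite pass (the pattern the referenced answer uses) is safer. A minimal sketch, assuming PyYAML and the same packs/<name>/config.yml layout; dump_unique is a hypothetical helper name:
import yaml

def dump_unique(path, new_docs):
    """Rewrite a YAML file so each document appears only once."""
    try:
        with open(path) as f:
            old_docs = list(yaml.safe_load_all(f))
    except FileNotFoundError:
        old_docs = []
    # Keep old documents that are not in the new batch, then append the batch.
    merged = [doc for doc in old_docs if doc not in new_docs] + new_docs
    with open(path, 'w') as f:
        yaml.dump_all(merged, f, sort_keys=False, indent=4)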

Related

Download Public Github repository using Python function

I have a CSV file with one column. This column consists of around 100 GitHub public repo addresses (for example, NCIP/c3pr-docs).
I want to know if there is any way to download all of these 100 public repos onto my computer using Python.
I don't want to use any command on the terminal; I need a function for it.
I use a very simple piece of code to access the user and repo. Here it is:
import csv
import requests

# replace the name with your actual csv file name
file_name = "dataset.csv"
f = open(file_name)
csv_file = csv.reader(f)
second_column = []  # empty list to store second column values
for line in csv_file:
    if line[1] == "Java":
        second_column.append(line[0])
        print(line[0])  # index 1 for second column
So by doing this I read the CSV file and get access to the users and repos.
I need a piece of code to help me download all of these repos.
Try this:
import requests

def download(user_and_repo, branch):
    URL = f"https://github.com/{user_and_repo}/archive/{branch}.tar.gz"
    response = requests.get(URL)
    open(f"{user_and_repo.split('/')[1]}.tar.gz", "wb").write(response.content)

download("AmazingRise/hugo-theme-diary", "main")
Tested under Python 3.9.
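To connect this to the question's CSV loop, you could call download() for each address collected in second_column. A sketch, assuming each repo's default branch is main (older repos often use master):
for user_and_repo in second_column:
    try:
        download(user_and_repo, "main")
    except Exception as exc:  # e.g. a network failure
        print(f"Failed to download {user_and_repo}: {exc}")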

How to make a Python executable file from a Jupyter notebook that reads the paths of files to be read in the main program from a configuration file?

I am trying to make an executable (.exe) file of my Jupyter notebook code. The code basically reads a bunch of files from folder A and folder B, finds the difference between the files in the folders, and makes a csv of the results.
Where do I go about looking for how to set up a configuration file which the executable reads to get the paths of the input folders (containing all the files) that need to be compared? This configuration file can be either a json or a text file that the user edits, adding the directory in which the two folders with the files are located.
In my code, I read the folders from my own path, with directory_A = r"C:\Users\Bilal\Python\Task1\OlderVersionFiles\" and directory_B = r"C:\Users\Bilal\Python\Task1\NewVersionFiles\".
I know how to convert a Jupyter notebook to a Python executable thanks to: Is it possible to generate an executable (.exe) of a jupyter-notebook?
This creates a build folder with a lot of files and an application file that does not do anything in my case.
How can I make it so that clicking the executable creates the Record.csv file that my code generates when I run it through Jupyter, using the static paths in the Python file that refer to the folders stored on my system?
How can I have an application file that reads paths from a configuration file and outputs a csv with differences between the folders?
My code for finding the difference is as follows:
import os
import csv
import pandas as pd
import io
import re

dir_A_dict = dict()
directory_A = "C:\\Users\\Bilal\\Python\\Task1\\OlderVersionFiles\\"
dir_A_files = [os.path.join(directory_A, x) for x in os.listdir(directory_A) if '.csv' in str(x)]

dir_B_dict = dict()
directory_B = "C:\\Users\\Bilal\\Python\\Task1\\NewVersionFiles\\"
dir_B_files = [os.path.join(directory_B, x) for x in os.listdir(directory_B) if '.csv' in str(x)]

for file_ in dir_A_files:
    with open(file_, 'r') as f:
        reader = csv.reader(f)
        header = next(reader)
        for line in reader:
            if ''.join(line) not in dir_A_dict.keys():
                dir_A_dict[''.join(line)] = {
                    "record": line,
                    "file_name": os.path.basename(file_),
                    "folder": "OlderVersion",
                    "row": reader.line_num
                }

for file_ in dir_B_files:
    with open(file_, 'r') as f:
        reader = csv.reader(f)
        header = next(reader)
        for line in reader:
            if ''.join(line) not in dir_B_dict.keys():
                dir_B_dict[''.join(line)] = {
                    "record": line,
                    "file_name": os.path.basename(file_),
                    "folder": "NewVersion",
                    "row": reader.line_num
                }

aset = set()
for v in dir_A_dict.values():
    aset.add(tuple(v['record']))

bset = set()
for v in dir_B_dict.values():
    bset.add(tuple(v['record']))

in_a_not_b = aset - bset
in_b_not_a = bset - aset
diff = in_a_not_b.union(in_b_not_a)

record_ = []
for val in diff:
    file_ = ''.join(val)
    record_.append(file_)

# Writing dictionary values to a text file
with open("Report2.txt", 'w') as f:
    for i in range(len(record_)):
        if record_[i] not in dir_A_dict.keys():
            f.write('%s\n' % ', '.join(str(x) for x in dir_B_dict[record_[i]].values()))
        else:
            f.write('%s\n' % ', '.join(str(x) for x in dir_A_dict[record_[i]].values()))

# regular expression to capture contents of balanced brackets
location_regex = re.compile(r'\[([^\[\]]+)\]')

with open("C:\\Users\\Bilal\\Report2.txt", 'r') as fi:
    # replace brackets with quotes, pipe into a file-like object
    fo = io.StringIO()
    fo.writelines(str(re.sub(location_regex, r'"\1"', line)) for line in fi)
    # rewind file to the beginning
    fo.seek(0)

# read the transformed CSV into a data frame
df = pd.read_csv(fo)
df.columns = ['Record', 'Filename', 'Folder', 'Row']
# print(df)
df.to_csv('Records2Arranged.csv')
No need to make your life harder trying to translate the .ipynb, as it's just Python code with some additional data.
To create an executable from the Python code you can use py2exe to make a single exe, or use Cython with the embed option. I personally find the latter easier to work with, but it is harder to just roll in.
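For the configuration-file half of the question, here is a minimal sketch of reading the two folder paths from a user-editable JSON file; the file name config.json and the keys directory_A/directory_B are assumptions, not from the thread:
import json
import os
import sys

# Look for config.json next to the executable so the user can edit it in place.
exe_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
with open(os.path.join(exe_dir, "config.json")) as f:
    config = json.load(f)

directory_A = config["directory_A"]  # e.g. the OlderVersionFiles folder
directory_B = config["directory_B"]  # e.g. the NewVersionFiles folder
The rest of the diff code can then use these two variables instead of the hard-coded paths.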

Dynamically create CSV files for each section of a config.ini file

I am working on automating status reports using Python and the Jira API with a config.ini file.
My goal is to have a config file where users can supply a JQL statement, essentially just specifying a filter in Jira that shows the tickets that should be in the report, along with a column set. The different column sets represent different sets of fields.
Example config file:
Here is the code I have so far. Right now it breaks the report into sections based on the column set: general, risk, etc. I think it would be better to have the script create a new csv for each section it identifies in the configuration file: Section 1, Section 2, etc. One of the problems I am running into with the current approach is that if both sections have the same column set, the first section gets overwritten.
for sectionName in userConfig_file.sections():  ## sectionName = Section 1 or Section 2
    optionName = userConfig_file.options(sectionName)
    #for optionName in userConfig_file.options(sectionName):  ## optionName = columnSet or jql
    #    valueName = userConfig_file.get(sectionName, optionName)
    #valueName = userConfig_file.get(sectionName, optionName)
    if optionName[0] == "columnset":
        #print(userConfig_file.get(sectionName, optionName[0]))
        #print(userConfig_file.get(sectionName, optionName[2]))
        if userConfig_file.get(sectionName, optionName[0]) == "general":
            for i in generalFields:
                generalHeaders.append(generalFields[i])
            with open(generalSection, 'w') as f:
                csvwriter = csv.writer(f)
                csvwriter.writerow(generalHeaders)
            for results in jira.search_issues(userConfig_file.get(sectionName, optionName[2])):
                generalSectionData = []
                with open(generalSection, 'a') as f:
                    csvwriter = csv.writer(f)
                    generalSectionData.append(results)  # issue key
                    generalSectionData.append(results.fields.status)
                    generalSectionData.append(results.fields.created)
                    generalSectionData.append(results.fields.updated)
                    generalSectionData.append(results.fields.priority)
                    generalSectionData.append(results.fields.summary)
                    generalSectionData.append(results.fields.issuetype)
                    csvwriter.writerow(generalSectionData)
        elif userConfig_file.get(sectionName, optionName[0]) == "risk":
            for i in riskFields:
                riskHeaders.append(riskFields[i])
            with open(riskSection, 'w') as f:
                csvwriter = csv.writer(f)
                csvwriter.writerow(riskHeaders)
            for results in jira.search_issues(userConfig_file.get(sectionName, optionName[2])):
                riskSectionData = []
                with open(riskSection, 'a') as f:
                    csvwriter = csv.writer(f)
                    riskSectionData.append(results)  # issue key
                    riskSectionData.append(results.fields.status)
                    riskSectionData.append(results.fields.created)
                    riskSectionData.append(results.fields.updated)
                    riskSectionData.append(results.fields.priority)
                    riskSectionData.append(results.fields.summary)
                    riskSectionData.append(results.fields.issuetype)
                    csvwriter.writerow(riskSectionData)
I wanted to get some ideas on how to have the script dynamically create a separate csv file for each section. A user should be able to specify any number of sections in the config file; for example, 6 sections should create 6 csv files, instead of the files being keyed on the columnset provided in the config.
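One hedged sketch of that idea (not from the thread; the field names and the jira object come from the question's code, and column_sets is a hypothetical mapping) keys the output file on the section name, so two sections with the same column set no longer collide:
import csv

# Hypothetical mapping from a columnset name to the Jira fields to export.
column_sets = {
    "general": ["status", "created", "updated", "priority", "summary", "issuetype"],
    "risk": ["status", "created", "updated", "priority", "summary", "issuetype"],
}

for sectionName in userConfig_file.sections():
    column_set = userConfig_file.get(sectionName, "columnset")
    jql = userConfig_file.get(sectionName, "jql")
    fields = column_sets[column_set]
    # One file per section, e.g. "Section 1.csv", so sections never overwrite each other.
    with open(f"{sectionName}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["key"] + fields)
        for issue in jira.search_issues(jql):
            writer.writerow([issue.key] + [getattr(issue.fields, name) for name in fields])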

How to combine YAML files in python?

I got some Kubernetes YAML files which I need to combine.
For that, I tried using Python.
The second file, sample.yaml, should be merged to the first file, source.yaml.
The source.yaml file has one section sample:, where the complete sample.yaml should be added.
I tried using the below code:
# pip install pyyaml
import yaml

def yaml_loader(filepath):
    # Loads a yaml file
    with open(filepath, 'r') as file_descriptor:
        data = yaml.load(file_descriptor, Loader=yaml.FullLoader)
    return data

def yaml_dump(filepath, data):
    with open(filepath, "w") as file_descriptor:
        yaml.dump(data, file_descriptor)

if __name__ == "__main__":
    file_path1 = "source.yaml"
    data1 = yaml_loader(file_path1)
    file_path2 = "sample.yaml"
    with open(file_path2, 'r') as file2:
        sample_yaml = file2.read()
    data1['data']['sample'] = sample_yaml
    yaml_dump("temp.yml", data1)
This creates a new file, temp.yml, but instead of line breaks it saves literal \n strings:
How to fix this?
Your original YAML may have issues. If you use VS Code, format your YAML file: click [Spaces] at the bottom of the VS Code window
and select "Convert Indentation to Spaces".
Also, you can check whether the YAML module has an indentation property that can be configured when loading the file.
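The literal \n characters appear because the sample file is inserted as one big Python string, which PyYAML serializes as a single quoted scalar with escaped newlines. A hedged sketch of two alternatives, reusing the question's yaml_loader/yaml_dump helpers:
import yaml

# Option 1: parse sample.yaml into data before nesting it, so the dump
# writes real YAML nodes instead of an escaped string.
data1 = yaml_loader("source.yaml")
data1['data']['sample'] = yaml_loader("sample.yaml")
yaml_dump("temp.yml", data1)

# Option 2: keep the value as a string but emit it as a literal block
# scalar (|); note that default_style applies to every scalar in the dump.
with open("sample.yaml") as f:
    data1['data']['sample'] = f.read()
with open("temp2.yml", "w") as f:
    yaml.dump(data1, f, default_style='|')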

CSV new-line character seen in unquoted field error

The following code worked until today, when I imported a file from a Windows machine and got this error:
new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
import csv

class CSV:
    def __init__(self, file=None):
        self.file = file

    def read_file(self):
        data = []
        file_read = csv.reader(self.file)
        for row in file_read:
            data.append(row)
        return data

    def get_row_count(self):
        return len(self.read_file())

    def get_column_count(self):
        new_data = self.read_file()
        return len(new_data[0])

    def get_data(self, rows=1):
        data = self.read_file()
        return data[:rows]
How can I fix this issue?
def upload_configurator(request, id=None):
    """
    A view that allows the user to configure the uploaded CSV.
    """
    upload = Upload.objects.get(id=id)
    csvobject = CSV(upload.filepath)
    upload.num_records = csvobject.get_row_count()
    upload.num_columns = csvobject.get_column_count()
    upload.save()
    form = ConfiguratorForm()
    row_count = csvobject.get_row_count()
    column_count = csvobject.get_column_count()
    first_row = csvobject.get_data(rows=1)
    first_two_rows = csvobject.get_data(rows=5)
It would be good to see the csv file itself, but this might work for you. Give it a try: replace
file_read = csv.reader(self.file)
with:
file_read = csv.reader(self.file, dialect=csv.excel_tab)
Or, open a file with universal newline mode and pass it to csv.reader, like:
reader = csv.reader(open(self.file, 'rU'), dialect=csv.excel_tab)
Or, use splitlines(), like this:
def read_file(self):
    with open(self.file, 'r') as f:
        data = [row for row in csv.reader(f.read().splitlines())]
    return data
I realize this is an old post, but I ran into the same problem and didn't see the correct answer, so I will give it a try.
Python Error:
_csv.Error: new-line character seen in unquoted field
Caused by trying to read Macintosh (pre-OS X formatted) CSV files. These are text files that use CR for the end of line. If using MS Office, make sure you select either plain CSV format or CSV (MS-DOS). Do not use CSV (Macintosh) as the save-as type.
My preferred EOL version would be LF (Unix/Linux/Apple), but I don't think MS Office provides the option to save in this format.
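If you cannot control how the file was saved, normalizing the CR endings in Python before parsing is a reasonable workaround. A minimal sketch (newline='' stops Python from translating line endings itself):
import csv

def read_mac_csv(path):
    # Read raw text, convert CRLF and bare CR line endings to LF, then parse.
    with open(path, 'r', newline='') as f:
        text = f.read().replace('\r\n', '\n').replace('\r', '\n')
    return list(csv.reader(text.splitlines()))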
For Mac OS X, save your CSV file in "Windows Comma Separated (.csv)" format.
If this happens to you on a Mac (as it did to me):
Save the file as CSV (MS-DOS Comma-Separated)
Run the following script:
import csv

with open(csv_filename, 'rU') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print(', '.join(row))
Try running dos2unix on your Windows-imported files first.
This is an error that I faced. I had saved the .csv file in Mac OS X.
While saving, save it as "Windows Comma Separated Values (.csv)", which resolved the issue.
This worked for me on OSX.
# allows a string to be treated as a file
from io import StringIO
# library to map strange (accented) characters down to plain ASCII
from unidecode import unidecode
import csv

# cleanse an input file with Windows formatting into a plain string
with open(filename, 'rb') as fID:
    uncleansedBytes = fID.read()

# decode the file using the correct encoding scheme
# (probably this old windows one)
uncleansedText = uncleansedBytes.decode('Windows-1252')
# replace carriage-returns with new-lines
cleansedText = uncleansedText.replace('\r', '\n')
# map any other non-ASCII characters to ASCII
asciiText = unidecode(cleansedText)

# read each line of the csv file and store as an array of dicts,
# use first line as field names for each dict.
reader = csv.DictReader(StringIO(asciiText))
for line_entry in reader:
    # do something with your read data
    pass
I know this has been answered for quite some time, but it did not solve my problem. I am using DictReader and StringIO for my csv reading due to some other complications. I was able to solve the problem more simply by replacing the line endings explicitly:
import csv
import io
import urllib.request

with urllib.request.urlopen(q) as response:  # q is the URL being fetched
    raw_data = response.read()
    encoding = response.info().get_content_charset('utf8')
    data = raw_data.decode(encoding)
    if '\r\n' not in data:
        # probably a windows delimited thing... try to update it
        data = data.replace('\r', '\r\n')
reader = csv.DictReader(io.StringIO(data))
This might not be reasonable for enormous CSV files, but it worked well for my use case.
Alternative and fast solution: I faced the same error. I reopened the "weird" csv file in Gnumeric on my Lubuntu machine and exported it as a csv file. This corrected the issue.
