I'm new to Python and couldn't find a close enough answer to figure this out. I'm trying to generate a single JSON file whose keys are the names of the current directory's .txt files and whose values are lists of those files' lines.
For example, node1.txt contains:
foo
bar
and node2.txt contains:
test1
test2
The output should look like:
{
    "node1": [
        "foo",
        "bar"
    ],
    "node2": [
        "test1",
        "test2"
    ]
}
Use the pathlib and json modules and a simple loop:
import pathlib
import json

data = {}
# Match every .txt file in the current directory.
for node in pathlib.Path('.').glob('*.txt'):
    with open(node, 'r') as fp:
        data[node.stem] = [line.strip() for line in fp]
print(json.dumps(data, indent=4))
Output:
{
    "node1": [
        "foo",
        "bar"
    ],
    "node2": [
        "test1",
        "test2"
    ]
}
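If you want to write the result to a JSON file rather than just print it, a minimal sketch (the filename "nodes.json" is just an example, not from the question):

```python
import json
import pathlib

# The collected data from the loop above, shown here inline.
data = {"node1": ["foo", "bar"], "node2": ["test1", "test2"]}

# json.dumps produces the string; write_text saves it in one step.
pathlib.Path("nodes.json").write_text(json.dumps(data, indent=4))
print(pathlib.Path("nodes.json").read_text())
```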
Related
I have a JSON file that has an array of objects like this:
{
    "array": [
        {
            "foo1": "bar",
            "spam1": "eggs"
        },
        {
            "foo2": "bar",
            "spam2": "eggs"
        },
        {
            "foo3": "bar",
            "spam3": "eggs"
        }
    ]
}
What I'm trying to do in Python is read a JSON file, remove an element from an array, and then write the contents back to the file. I expect the file to be exactly the same, just without that element, but the problem is that when I write the contents back, they are corrupted in a weird way.
When I run this code:
import json

CONTENTS = {
    "array": [
        {
            "foo1": "bar",
            "spam1": "eggs"
        },
        {
            "foo2": "bar",
            "spam2": "eggs"
        },
        {
            "foo3": "bar",
            "spam3": "eggs"
        }
    ]
}

# Write that object to file
with open("file.json", "w") as file:
    json.dump(CONTENTS, file, indent=2)

# You can check here to see the file
input()

# Modify the file
with open("file.json", "r+") as file:
    contents = json.load(file)
    file.seek(0)
    print(contents)
    del contents["array"][-1]  # Delete the last object of the array
    print(contents)
    json.dump(contents, file, indent=2)
After the second open, the file is exactly like this:
{
  "array": [
    {
      "foo1": "bar",
      "spam1": "eggs"
    },
    {
      "foo2": "bar",
      "spam2": "eggs"
    }
  ]
}{
      "foo3": "bar",
      "spam3": "eggs"
    }
  ]
}
As I said, I was expecting the file to be the same, just without the last object of the array, but instead it is... wrong.
Am I actually doing something wrong? I had no problem changing an object's field or appending an object to that same array in the same with block or with the same file descriptor.
My questions are: What am I doing wrong? Is the problem the fact that I read AND write to the file? How can I fix it, besides doing this:
with open("file.json", "r+") as file:
    contents = json.load(file)
    del contents["array"][-1]  # Delete the last object of the array

with open("file.json", "w") as file:
    json.dump(contents, file, indent=2)
After you overwrite the file, you need to truncate it to remove the leftover JSON at the end.
with open("file.json", "r+") as file:
    contents = json.load(file)
    file.seek(0)
    print(contents)
    del contents["array"][-1]  # Delete the last object of the array
    print(contents)
    json.dump(contents, file, indent=2)
    file.truncate()
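The underlying behavior is easy to reproduce outside of JSON: seeking to the start and writing fewer bytes than the file already holds leaves the old tail in place. A minimal demonstration (filename is just an example):

```python
# Writing shorter content over a longer file leaves the old tail behind.
with open("demo.txt", "w") as f:
    f.write("ABCDEFGH")

with open("demo.txt", "r+") as f:
    f.seek(0)
    f.write("xyz")        # no truncate(): bytes 3..7 are untouched

with open("demo.txt") as f:
    print(f.read())       # xyzDEFGH
```

Calling f.truncate() right after f.write("xyz") would cut the file down to just "xyz".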
I've been trying to figure out a way to store proxy data in JSON form. I know the easier way is to take each proxy from the text box, save it to a file, and then load the information back from the file, but I want groups that work with different types of IPs. For example, one group would use proxy IPs from a certain provider and another group would use IPs from a different one, so I need to store the IPs in their respective groups, which is why I think I need a JSON file with each set of proxies in its own JSON array. What I'm having trouble with is adding the IPs to the JSON array: I'm looping over a transfer file containing the IPs and trying to add each one to the array. This is what I've tried so far:
def save_proxy():
    proxy = pooled_data_default.get('1.0', 'end-2c')
    transfer_file = open('proxies.txt', 'w')
    transfer_file.write(proxy)
    transfer_file.close()

    transfer_file1 = open('proxies.txt', 'r')
    try:
        with open('proxy_groups.txt', 'r+') as file:
            proxy_group = json.load(file)
    except:
        proxy_group = []

    total = []
    for line in transfer_file1:
        line = transfer_file1.readline().strip()
        total.append(line)

    proxy_group.append({
        'group_name': pool_info.get(),
        'proxy': [{
            'proxy': total,
        }]
    })

    with open('proxy_groups.txt', 'w') as outfile:
        json.dump(proxy_group, outfile, indent=4)
This doesn't work, but it was my attempt at taking each line from the file and adding it to the JSON array dynamically. Any help is appreciated.
EDIT: this is what is being output:
[
    {
        "group_name": "Defualt",
        "proxy": [
            {
                "proxy": [
                    "asdf",
                    ""
                ]
            }
        ]
    }
]
This was the input:
wdsa
asdf
sfs
It seems that it is only selecting the middle one of the three. I thought printing the list would work, but it still shows only the middle line and then a blank entry at the end.
An example of my data: the input to the text box may be
wkenwwins:1000:username:password
uwhsuh:1000:username:password
2ewswsd:1000:username:password
gfrfccv:1000:username:password
The selected group I may want to save this to could be called 'Default'. I select Default, and clicking save should add these inputs to the separate txt file called 'proxies.txt', which it does. From the text file I then want to loop through each line and append it to the JSON data, which it doesn't do. Here is what I expect the JSON data to look like:
[
    {
        "group_name": "Defualt",
        "proxy": [
            {
                "proxy": [
                    "ewswsd:1000:username:password",
                    "wkenwwins:1000:username:password",
                    "uwhsuh:1000:username:password"
                ]
            }
        ]
    }
]
So then, if I made 2 groups, the JSON data txt file should look like this:
[
    {
        "group_name": "Defualt",
        "proxy": [
            {
                "proxy": [
                    "ewswsd:1000:username:password",
                    "wkenwwins:1000:username:password",
                    "uwhsuh:1000:username:password"
                ]
            }
        ]
    }
]
[
    {
        "group_name": "Test",
        "proxy": [
            {
                "proxy": [
                    "ewswsd:1000:username:password",
                    "wkenwwins:1000:username:password",
                    "uwhsuh:1000:username:password"
                ]
            }
        ]
    }
]
This is so I can access each group by calling only the group name.
You can simplify save_proxy() as below:
def save_proxy():
    proxy = pooled_data_default.get('1.0', 'end-1c')

    # save the proxies to file
    with open('proxies.txt', 'w') as transfer_file:
        transfer_file.write(proxy)

    # load the proxy_groups if the file exists and holds valid JSON
    try:
        with open('proxy_groups.txt', 'r') as file:
            proxy_group = json.load(file)
    except (FileNotFoundError, json.JSONDecodeError):
        proxy_group = []

    proxy_group.append({
        'group_name': pool_info.get(),
        'proxy': proxy.splitlines()
    })

    with open('proxy_groups.txt', 'w') as outfile:
        json.dump(proxy_group, outfile, indent=4)
The output file proxy_groups.txt would look like below:
[
    {
        "group_name": "default",
        "proxy": [
            "wkenwwins:1000:username:password",
            "uwhsuh:1000:username:password"
        ]
    }
]
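For reference, the original loop captured only the middle line because it iterated over the file while also calling readline() on it, which consumes two lines per loop pass. A minimal reproduction of that bug (using the three-line input from the question):

```python
# Reproduce the bug: iterating over a file while also calling readline()
# on the same file object consumes two lines per pass.
with open("demo_proxies.txt", "w") as f:
    f.write("wdsa\nasdf\nsfs\n")

total = []
with open("demo_proxies.txt") as f:
    for line in f:                       # reads "wdsa", then "sfs"
        line = f.readline().strip()      # reads "asdf", then "" at EOF
        total.append(line)

print(total)  # ['asdf', '']
```

That is exactly the observed output: the middle line plus a blank entry. Using proxy.splitlines(), as in the simplified save_proxy() above, avoids the double read entirely.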
I'm trying to parse more than 100 JSON files, but I do not need all the info. I only need the first set of 'coordinates'; the CSV already has the URL and URL type printed, but I cannot print the first set of coordinates.
This is a section of the JSON file:
{
    "type": "featureCollection",
    "features": [
        {
            "type": "feature",
            "geometry": {
                "type": "multilinestring",
                "coordinates": [
                    [
                        [
                            148.9395348,
                            -21.3292286
                        ],
                        [
                            148.93963,
                            -21.33001
                        ],
                        [
                            148.93969,
                            -21.3303
                        ]
                    ]
                ]
            },
            "properties": {
                "url": "www.thiswebpageisfake.com",
                "url_type": "fake"
            },
            "information": {
                "timestamp": "10/10/19"
            }
        }
    ]
}
I'm using Python 2.7. I tried creating an array for the coordinates, but I get a type error.
import os
import csv
import json
import sys
reload(sys)

file_path = 'C:\\Users\\user\\Desktop\\Python\\json'
dirs = os.listdir(file_path)
file_out = 'C:\\Users\\user\\output.csv'

f = csv.writer(open(file_out, "wb+"))
f.writerow(['url', 'url_type', 'lat', 'long'])

for file in dirs:
    json_dict = json.loads(open(os.path.join(file_path, file)).read())
    print file

for key in json_dict['features']:
    for key1 in key:
        description = key['properties']['description']
        if description is None:
            description = 'null'
        array = ()
        array = (key['geometry']['type']['coordinates'])
        f.writerow([file,
                    key['properties']['url'],
                    key['properties']['url_type'],
                    array[1]
                    ])
print 'completed'
Firstly, it looks like your second loop is supposed to be nested inside the first; otherwise you ignore every JSON file except the last and only end up processing one file.
Secondly, your array should be defined as array = key['geometry']['coordinates'], since 'coordinates' is not contained in 'type'.
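Putting both fixes together, here is a sketch in Python 3 syntax with an inline stand-in for one of the files; note that the [longitude, latitude] point ordering is an assumption based on GeoJSON conventions, not stated in the question:

```python
import json

# Abbreviated stand-in for one of the question's JSON files.
raw = """
{
  "features": [{
    "geometry": {
      "type": "multilinestring",
      "coordinates": [[[148.9395348, -21.3292286], [148.93963, -21.33001]]]
    },
    "properties": {"url": "www.thiswebpageisfake.com", "url_type": "fake"}
  }]
}
"""
json_dict = json.loads(raw)

rows = []
for feature in json_dict["features"]:
    # 'coordinates' sits directly under 'geometry', and for a
    # multilinestring the first point is coordinates[0][0].
    lon, lat = feature["geometry"]["coordinates"][0][0]
    props = feature["properties"]
    rows.append([props["url"], props["url_type"], lat, lon])

print(rows)
# [['www.thiswebpageisfake.com', 'fake', -21.3292286, 148.9395348]]
```

In the real script, this loop body would sit inside the per-file loop, with each row passed to f.writerow().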
I have a JSON file which I'm parsing, with the generated output in output.txt. At the moment, after output.txt is generated, I read it line by line, split each line, and delete the first two columns:
("\t".join(line.split()[2:]) + "\n")
How can I get the same result from the for loop shared below?
Expected output: project_name + File_Names.
script.py
import json

x = json.load(open('data.json'))
for sub_dict in x['changed']:
    print('project_name', sub_dict['project_name'])
    for entry in sub_dict['added_commits']:
        print(entry['File_Names'])
data.json
{
    "changed": [
        {
            "prev_revision": "a09936ea19ddc9f69ed00a7929ea81234af82b95",
            "added_commits": [
                {
                    "File_Names": [
                        "115\t0\t1/src/hello.cpp",
                        "116\t0\t1/src/hell1o.cpp"
                    ]
                }
            ],
            "project_name": "android/hello"
        },
        {
            "prev_revision": "a09936ea19ddc9f69ed00a7929ea81234af82b95",
            "added_commits": [
                {
                    "File_Names": [
                        "41\t1\t1/src/hello1.cpp"
                    ]
                }
            ],
            "project_name": "android/helloworld"
        }
    ]
}
output.txt
115 0 1/src/hello.cpp
116 0 1/src/hell1o.cpp
41 1 1/src/hello1.cpp
expected output.txt
android/hello/src/hello.cpp
android/hello/src/hell1o.cpp
android/helloworld/src/hello1.cpp
This will do the trick
import json
import re

with open('data.json') as f:
    x = json.load(f)

for sub_dict in x['changed']:
    proj = sub_dict['project_name']
    for entry in sub_dict['added_commits']:
        for name in entry['File_Names']:
            n = re.findall(r'(?:\s*\d+\s*\d+\s*\d+)(\/.*)', name)[0]
            print(proj + n)
Note the use of with to open the file, which will also close it afterwards.
I used a regex to make this more robust; it will match anything of the form numbers numbers numbers/stuff_to_match.
You can iterate through the sub-lists like this:
for d in x['changed']:
    for c in d['added_commits']:
        for f in c['File_Names']:
            print(d['project_name'] + f.split('\t')[2][1:])
This outputs:
android/hello/src/hello.cpp
android/hello/src/hell1o.cpp
android/helloworld/src/hello1.cpp
At the moment I'm working with a large set of JSON files of the following form:
File00, at time T1:
{
    "AAA": {
        "BBB": {
            "000": "value0"
        },
        "CCC": {
            "111": "value1",
            "222": "value2",
            "333": "value3"
        },
        "DDD": {
            "444": "value4"
        }
    }
}
Now I have a new input for the sub-field "DDD", and I'd like to replace its contents wholesale with the following:
"DDD": {
    "666": "value6",
    "007": "value13"
}
Accordingly the file would be changed to:
File00, at time T2:
{
    "AAA": {
        "BBB": {
            "000": "value0"
        },
        "CCC": {
            "111": "value1",
            "222": "value2",
            "333": "value3"
        },
        "DDD": {
            "666": "value6",
            "007": "value13"
        }
    }
}
In the situation I'm confronted with, there are many files similar to File00, so I'm trying to write a script that can process all the files in a particular directory, identify the JSON field DDD, and replace its contents with something new.
How can I do this in Python?
Here are the steps I took for each file:
Read the json
Convert it to a Python dict
Edit the Python dict
Convert it back to json
Write it to the file
Repeat
Here is my code:
import json

# list of files
fileList = ["file1.json", "file2.json", "file3.json"]

for jsonFile in fileList:
    # Open and read the file, then convert json to a Python dictionary
    with open(jsonFile, "r") as f:
        jsonData = json.loads(f.read())

    # Edit the Python dictionary
    jsonData["AAA"]["DDD"] = {"666": "value6", "007": "value13"}

    # Convert Python dictionary to json and write to the file
    with open(jsonFile, "w") as f:
        json.dump(jsonData, f)
Also, I got code for iterating through a directory from here. You probably want something like this:
import os

directory = os.fsencode(directory_in_str)
fileList = []
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".json"):
        fileList.append(filename)
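Combining the two snippets, a minimal end-to-end sketch (it builds its own sample directory with tempfile purely for demonstration; in practice directory_in_str would point at your real folder):

```python
import json
import os
import tempfile

# Build a throwaway directory holding one File00-style JSON file.
directory_in_str = tempfile.mkdtemp()
sample = {"AAA": {"BBB": {"000": "value0"}, "DDD": {"444": "value4"}}}
with open(os.path.join(directory_in_str, "file00.json"), "w") as f:
    json.dump(sample, f)

# Visit every .json file in the directory and rewrite its DDD field.
for filename in os.listdir(directory_in_str):
    if not filename.endswith(".json"):
        continue
    path = os.path.join(directory_in_str, filename)
    with open(path) as f:
        data = json.load(f)
    data["AAA"]["DDD"] = {"666": "value6", "007": "value13"}
    with open(path, "w") as f:
        json.dump(data, f, indent=4)

with open(os.path.join(directory_in_str, "file00.json")) as f:
    result = json.load(f)["AAA"]["DDD"]
print(result)  # {'666': 'value6', '007': 'value13'}
```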