CSV dropping some columns when added to ArcGIS Pro - Python

In the first photo, I have all the columns I want in an Excel CSV file (the example I will be talking about is the Project Scope column). When I add the CSV into ArcGIS Pro (second photo), ArcGIS Pro drops the Project Scope column for some reason; it should be between the Project Phase column and the Construction Finish column. The CSV was generated by a Python script that used pandas, so that may be a lead.
Has anyone encountered this before? Advice would be appreciated!
I tried creating a Python script with pandas that would create a dictionary to bracket the columns (A:O), which include Project Scope, but this was to no avail and my issue remained. Here is the code I have:
import arcpy
import pandas as pd
import csv
# project plan to csv: read columns A:O from the Excel workbook
df = pd.read_excel(r'original.xlsx', usecols="A:O")
# convert to csv (index=False so pandas does not add its row index as an extra column)
df.to_csv(r'new.csv', index=False)
df = pd.read_csv(r'new.csv')
# generate key field
df["unique_id"] = df["City"] + "<>" + df["Project Status"] + '-' + df["Project Phase"]
df.to_csv(r'new.csv', index=False)
# isolate the unique id column (16th column, index 15) in the new and old files
# open in read mode: mode 'w' would truncate the file before it is read
with open(r'new.csv', 'r', newline='') as infile:
    reader = csv.reader(infile)
    newlist = [rows[15] for rows in reader]
with open(r'old.csv', 'r', newline='') as infile:
    reader = csv.reader(infile)
    oldlist = [rows[15] for rows in reader]
I also tried deleting some unnecessary columns in the CSV before adding it into ArcGIS Pro. That made the Project Scope field show up, but other important columns were dropped instead.

The Project Scope values contain line breaks. I bet that is causing the problem. Open the actual CSV file you are trying to load in a text editor and look at those rows.
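If embedded line breaks are indeed the cause, one option is to replace them before writing the CSV. pandas wraps such values in quotes, which is valid CSV, but a reader that treats every line break as the end of a record will still misalign the columns. A minimal sketch, assuming the line breaks sit in text cells of the DataFrame; `strip_line_breaks`, the sample rows, and `new_clean.csv` are hypothetical names for illustration, not part of the original script.

```python
import pandas as pd

def strip_line_breaks(df: pd.DataFrame) -> pd.DataFrame:
    """Replace runs of CR/LF characters inside text cells with a single space."""
    # regex=True applies the pattern inside each string cell; non-string cells are left unchanged
    return df.replace(r"[\r\n]+", " ", regex=True)

# Hypothetical sample data standing in for the Excel sheet
df = pd.DataFrame({
    "Project Phase": ["Design", "Build"],
    "Project Scope": ["Repave road\nAdd sidewalk", "New bridge"],
    "Construction Finish": ["2024-05-01", "2025-01-15"],
})
clean = strip_line_breaks(df)
clean.to_csv("new_clean.csv", index=False)  # index=False: no extra index column
print(clean.loc[0, "Project Scope"])  # Repave road Add sidewalk
```

In the original script, the same call would go between `pd.read_excel(...)` and the first `to_csv(...)`.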

Related

Python script to check CSV columns for empty cells, to be used with multiple Excel files

I've done research and can't find anything that solves my issue. I need a Python script that reads CSV files from a folder path. The script needs to check for empty cells within a column and then display a popup notifying users of the empty cells. Anything helps!!
Use the pandas library:
pip install pandas
You can import each CSV file as a DataFrame and check each cell with loops.
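A sketch of the folder-based check the question asks for, using pandas' `isna()` instead of explicit loops. `find_empty_cells` is a hypothetical helper that reads every `*.csv` in a folder and reports, per file and column, the 0-based row positions of empty cells. Note that by default pandas also treats strings such as `NA` or `null` as missing. It only builds a report; showing a popup depends on the client, which the question does not name.

```python
import tempfile
from pathlib import Path

import pandas as pd

def find_empty_cells(folder):
    """Return {file name: {column: [row positions with empty cells]}} for CSVs in folder."""
    report = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(csv_path)
        missing = df.isna()  # True wherever a cell is empty (read as NaN)
        columns = {
            col: missing[missing[col]].index.tolist()
            for col in df.columns
            if missing[col].any()
        }
        if columns:
            report[csv_path.name] = columns
    return report

# Demo with a throwaway folder holding the same data as test.csv, plus a header row
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "test.csv").write_text("id,fld_1,fld_2\n1,,test\n2,dog,cat\n3,foo,\n")
    result = find_empty_cells(folder)
print(result)  # {'test.csv': {'fld_1': [0], 'fld_2': [2]}}
```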
A simple example using the Python csv module:
cat test.csv
1,,test
2,dog,cat
3,foo,

import csv
with open('test.csv') as csv_file:
    empty_list = []
    c_reader = csv.DictReader(csv_file, fieldnames=["id", "fld_1", "fld_2"])
    for row in c_reader:
        row_dict = {row["id"]: item for item in row if not row[item]}
        if row_dict:
            empty_list.append(row_dict)

empty_list
[{'1': 'fld_1'}, {'3': 'fld_2'}]
This example assumes that there is at least one column that always has a value and is the equivalent of a primary key. You have not mentioned what client you will be running this in, so it is not possible at this time to come up with a code example that presents this to the user for action.
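One caveat with the dict comprehension above: because every entry uses `row["id"]` as its key, a row with several empty cells only reports the last one. A minimal variant, assuming the same three-column layout and that the first field acts as the key, collects all empty fields per row. `empty_fields_by_id` is a hypothetical name, and `io.StringIO` stands in for `open('test.csv', newline='')` so the example runs on its own.

```python
import csv
import io

def empty_fields_by_id(csv_file, fieldnames):
    """Map the first field of each row to a list of that row's empty fields."""
    result = {}
    key_field = fieldnames[0]  # assumed to always hold a value, like a primary key
    for row in csv.DictReader(csv_file, fieldnames=fieldnames):
        empty = [name for name in fieldnames if not row[name]]
        if empty:
            result[row[key_field]] = empty
    return result

# Same rows as test.csv, plus a fourth row with two empty cells
sample = io.StringIO("1,,test\n2,dog,cat\n3,foo,\n4,,\n")
empty_fields = empty_fields_by_id(sample, ["id", "fld_1", "fld_2"])
print(empty_fields)  # {'1': ['fld_1'], '3': ['fld_2'], '4': ['fld_1', 'fld_2']}
```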
Hello, I think it's quite easy to solve with pandas:
import numpy as np
import pandas as pd
df = pd.read_csv('<path>')
df.isnull().sum()  # number of empty cells in each column
np.where(pd.isnull(df))  # row and column indexes of empty cells -> from https://stackoverflow.com/questions/27159189/find-empty-or-nan-entry-in-pandas-dataframe
Alternatively, you can read the file and check it line by line for empty cells.

Pandas read_csv incorrectly naming columns

I am trying to import a Leukemia gene expression data set found at https://www.kaggle.com/brunogrisci/leukemia-gene-expression-cumida. This data set has a lot of columns (22285), and the columns imported towards the end have incorrect names. For example, the last column is named AFFX-r2-P1-cre-3_at, but it is actually called 217005_at in the CSV file. The image below shows my Jupyter notebook cells. I am not sure why it is being formatted this way. Any help would be greatly appreciated.
Evidently the CSV file itself has column names that start with 'AFFX-r2-P1', so it's not a pandas issue. Using the built-in csv module shows:
import csv
from pathlib import Path
data_file = Path('../../../Downloads/Leukemia_GSE9476.csv')
with open(data_file, 'rt') as lines:
    csv_file = csv.reader(lines)
    fields = next(csv_file)
#
[
    (field_number, field)
    for field_number, field in enumerate(fields)
    if field.startswith('AFFX-r2-P1')
]
The output is:
[(22277, 'AFFX-r2-P1-cre-3_at'), (22278, 'AFFX-r2-P1-cre-5_at')]
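To confirm that pandas keeps the header as written, one can compare the raw header from the `csv` module with the column names pandas produces. A minimal sketch: `header_mismatches` is a hypothetical helper, and the sample text is a made-up miniature, not the Kaggle file. As far as I know, pandas only rewrites header names in special cases such as duplicate names (`a`, `a.1`) or empty ones (`Unnamed: N`), which this check would surface.

```python
import csv
import io

import pandas as pd

def header_mismatches(text):
    """Return (position, csv_name, pandas_name) for every header cell pandas renamed."""
    raw_header = next(csv.reader(io.StringIO(text)))
    # nrows=0 parses only the header line, which stays cheap even for ~22,000 columns
    pandas_header = list(pd.read_csv(io.StringIO(text), nrows=0).columns)
    return [
        (i, raw, parsed)
        for i, (raw, parsed) in enumerate(zip(raw_header, pandas_header))
        if raw != parsed
    ]

# Made-up miniature of a gene-expression file with the AFFX probes at the end
sample = "samples,type,217005_at,AFFX-r2-P1-cre-3_at,AFFX-r2-P1-cre-5_at\nS1,AML,1.0,2.0,3.0\n"
print(header_mismatches(sample))  # []
```

For the real file, `header_mismatches(data_file.read_text())` would run the same check; an empty list means the names shown in the notebook are the names in the file.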

Grab values from a separate CSV file and replace the values of columns in a pipe-delimited file

Trying to whip this out in Python. Long story short, I have a CSV file that contains column data I need to inject into another file that is pipe-delimited. My understanding is that Python can't replace values in a file in place, so I have to rewrite the whole file with the new values.
data file (csv):
value1,value2,iwantthisvalue3
source file (txt, | delimited):
value1|value2|iwanttoreplacethisvalue3|value4|value5|etc
fixed file (txt, | delimited):
samevalue1|samevalue2|replacedvalue3|value4|value5|etc
I can't figure out how to accomplish this. This is my latest attempt (broken code):
import re
import csv
result = []
row = []
with open(r"C:\data\generatedfixed.csv", "r") as data_file:
    for line in data_file:
        fields = line.split(',')
        result.append(fields[2])
with open(r"C:\data\data.txt", "r") as source_file, open(r"C:\data\data_fixed.txt", "w") as fixed_file:
    for line in source_file:
        fields = line.split('|')
        n = 0
        for value in result:
            fields[2] = result[n]
            n = n + 1
        row.append(line)
    for value in row:
        fixed_file.write(row)
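A working version of the same idea with the `csv` module, as a minimal sketch. It assumes both files have no header, have the same number of rows in the same order (rows are paired by position, and `zip` stops at the shorter file), and that the replacement value is the third field. `replace_third_column` is a hypothetical name; `io.StringIO` objects stand in for the real files.

```python
import csv
import io

def replace_third_column(data_file, source_file, fixed_file):
    """Copy source_file to fixed_file, replacing field 3 of each row with
    field 3 of the row at the same position in data_file."""
    new_values = [row[2] for row in csv.reader(data_file)]
    reader = csv.reader(source_file, delimiter="|")
    writer = csv.writer(fixed_file, delimiter="|", lineterminator="\n")
    for fields, value in zip(reader, new_values):
        fields[2] = value  # overwrite only the third field
        writer.writerow(fields)

data = io.StringIO("value1,value2,iwantthisvalue3\n")
source = io.StringIO("value1|value2|iwanttoreplacethisvalue3|value4|value5\n")
fixed = io.StringIO()
replace_third_column(data, source, fixed)
print(fixed.getvalue(), end="")  # value1|value2|iwantthisvalue3|value4|value5
```

`csv.reader` strips the line ending, which `line.split(',')` does not, so the copied value does not carry a stray newline. With real files, pass `open(r"C:\data\generatedfixed.csv", newline="")`, `open(r"C:\data\data.txt", newline="")` and `open(r"C:\data\data_fixed.txt", "w", newline="")`.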
I would highly suggest you use the pandas package here; it makes handling tabular data very easy and would help you a lot in this case. Once you have installed pandas, import it with:
import pandas as pd
To read the files, simply use:
data_file = pd.read_csv(r"C:\data\generatedfixed.csv")
source_file = pd.read_csv(r"C:\data\data.txt", delimiter="|")
After that, manipulating these two files is easy. I'm not exactly sure how many values or which ones you want to replace, but if both files have a header row with these column names and the "iwantthisvalue3" and "iwanttoreplacethisvalue3" columns have the same number of rows, then this should do the trick:
source_file['iwanttoreplacethisvalue3'] = data_file['iwantthisvalue3']
Now all you need to do is save the DataFrame (the table we just updated) to a file. Since you want to save it to a .txt file with "|" as the delimiter, this is the line to do that (you can customize how it is saved in a lot of ways):
source_file.to_csv(r"C:\data\data_fixed.txt", sep='|', index=False)
Let me know if everything works and this helped you. I would also encourage you to read up on pandas (or watch some videos) if you're planning to work with tabular data; it is an awesome library with great documentation and functionality.
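If the files have no header row (the sample lines in the question look like data rather than column names), the column-name version above will not find those columns. A minimal variant under that assumption reads both files with `header=None`, so columns are addressed by position (2 is the third column), and `dtype=str` keeps values exactly as written. The inline sample strings stand in for the real files.

```python
import io

import pandas as pd

data_text = "value1,value2,iwantthisvalue3\n"
source_text = "value1|value2|iwanttoreplacethisvalue3|value4|value5\n"

# header=None: every line is data, and columns are labelled 0, 1, 2, ...
data_df = pd.read_csv(io.StringIO(data_text), header=None, dtype=str)
source_df = pd.read_csv(io.StringIO(source_text), sep="|", header=None, dtype=str)

# rows are aligned by their position (both frames use the default RangeIndex)
source_df[2] = data_df[2]

# with no path argument, to_csv returns the text instead of writing a file
fixed_text = source_df.to_csv(sep="|", header=False, index=False)
print(fixed_text)
```

For the real files, replace the `io.StringIO(...)` arguments with the paths and pass the output path as the first argument to `to_csv`.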

How to update columns of a CSV file if row exists, else how to append to same CSV, using temporary file

I am stuck trying to build a database using a CSV file.
I am using stock market ticker symbols as input, and I am able to generate a link to each company's website from its symbol.
I would like to save that database to a CSV file named BiotechDatabase.csv.
[Image: the database layout]
Every time I input a new symbol in Python, I would like to check the first column of the CSV file to see if the symbol exists. If it does, I need to overwrite the Web column to make sure it is updated.
If the symbol does not exist, a row needs to be appended containing the symbol and the Web value.
Since I need to add more columns in the future, I need to use DictWriter, as some columns might have missing information and need to be skipped.
I have been able to update information for a symbol if the symbol is in the database using the code below:
from csv import DictWriter
from tempfile import NamedTemporaryFile
import shutil
import csv
# Replace the symbol below with any stock symbol I want to get the website for
symbol = 'PAVM'
# running web(symbol) generates the website I need for PAVM, http://www.pavmed.com, which I convert to a string below
web(symbol)
filename = 'BiotechDatabase.csv'
tempfile = NamedTemporaryFile('w', newline='', delete=False)
fields = ['symbol', 'Web']
# I was able to replace any symbol row using the code below:
with open(filename, 'r', newline='') as csvfile, tempfile:
    reader = csv.DictReader(csvfile, fieldnames=fields)
    writer = csv.DictWriter(tempfile, fieldnames=fields)
    for row in reader:
        if row['symbol'] == symbol:
            print('adding row', row['symbol'])
            row['symbol'], row['Web'] = symbol, str(web(symbol))
        row = {'symbol': row['symbol'], 'Web': row['Web']}
        writer.writerow(row)
shutil.move(tempfile.name, filename)
If the symbol I entered in Python doesn't exist however in the CSV file, how can I append a new row in the CSV file at the bottom of the list, without messing with the header, and while still using a temporary file?
Since the tempfile I defined above uses mode 'w', do I need to create another temporary file that allows mode 'a' in order to append rows?
You can simplify your code dramatically using the pandas Python library.
Note: I do not know what the raw data looks like, so you might need to do some tweaking to get it to work; feel free to ask me more about the solution in the comments.
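import pandas as pd
symbol = 'PAVM'
web(symbol)
filename = 'BiotechDatabase.csv'
fields = ['symbol', 'Web']
# Read the csv, naming the columns after fields (header=0 replaces the existing header line)
# and using the symbol column as the index; assumes the file has exactly these two columns
df = pd.read_csv(filename, names=fields, header=0, index_col='symbol')
# .loc updates the row if the symbol exists, otherwise it appends a new row
df.loc[symbol, 'Web'] = web(symbol)
# Saving back to filename overwrites it - be careful!
df.to_csv(filename)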
There might be faster ways to do this, but this one is very elegant.
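To answer the original question with the `csv` module and a temporary file: mode `'w'` is enough, because the whole file is rewritten anyway. The loop just needs to remember whether the symbol was seen and write one extra row after the loop if it was not. A minimal sketch, assuming the file starts with a header row (`symbol,Web` plus any future columns); `upsert_symbol` is a hypothetical name, the website is passed in as a stand-in for the question's `web(symbol)`, and the `.example` URLs are placeholders.

```python
import csv
import os
import shutil
from tempfile import NamedTemporaryFile, TemporaryDirectory

def upsert_symbol(filename, symbol, website):
    """Update the Web value for symbol, or append a new row if symbol is absent."""
    found = False
    folder = os.path.dirname(os.path.abspath(filename))
    # Create the temp file next to the target so shutil.move is a simple rename
    with open(filename, "r", newline="") as csvfile, \
            NamedTemporaryFile("w", newline="", delete=False, dir=folder, suffix=".csv") as tmp:
        reader = csv.DictReader(csvfile)  # reads the header line as field names
        writer = csv.DictWriter(tmp, fieldnames=reader.fieldnames)
        writer.writeheader()  # copy the header unchanged
        for row in reader:
            if row["symbol"] == symbol:
                row["Web"] = website
                found = True
            writer.writerow(row)
        if not found:
            # columns not given here are written as empty strings (DictWriter's restval)
            writer.writerow({"symbol": symbol, "Web": website})
    shutil.move(tmp.name, filename)

with TemporaryDirectory() as folder:
    path = os.path.join(folder, "BiotechDatabase.csv")
    with open(path, "w", newline="") as f:
        f.write("symbol,Web\nPAVM,http://old.example\nABC,http://abc.example\n")
    upsert_symbol(path, "PAVM", "http://www.pavmed.com")  # existing symbol: updated
    upsert_symbol(path, "XYZ", "http://xyz.example")      # new symbol: appended
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
print(rows)
```

Because the writer uses the header it read, columns added to the file later are carried through, and new rows simply leave them empty.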

Writing terminal output to a CSV with separate columns in Python

My code is as follows:
import csv
with open('Remarks_Drug.csv', newline='', encoding='utf-8') as myFile:
    reader = csv.reader(myFile)
    for row in reader:
        product = row[0].lower()
        filename = row[1]
        product_patterns = ', '.join([i.split("+")[0].strip() for i in product.split(",")])
        print(product_patterns, filename)
which outputs the following (where "film-coated tablet" should be one column and the filename should be another column):
film-coated tablet RECD outcome AUBAGIO IAIN-21 AoR.txt
solution for injection 093 Acceptance NO Safety profil.txt
I want to write this to a CSV file with product_patterns in one column and filename in another. I wrote the code below, but only the last row gets appended. Can anyone please help me with the looping here? The code I wrote is:
with open('drug_output.csv', 'a') as csvfile:
    fieldnames = ['product_patterns', 'filename']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow({'product_patterns': product_patterns, 'filename': filename})
Depending on the environment you can use, it might be more practical to use more dedicated tools to solve your problem.
The pandas package in particular seems useful in your case.
Then you can load the csv using:
import pandas as pd
df=pd.read_csv(file_path)
After doing the necessary manipulations, you can save it again with:
df.to_csv(file_path, index=False)  # index=False avoids writing the row index as an extra column
This will save you a lot of issues that typically occur when parsing line by line, and it should also increase performance a bit. Pandas is a pretty good package to learn anyway if you need to do some data manipulation.
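For the loop problem itself, the usual fix with the `csv` module is to open the output file once, before the loop, and call `writerow` inside the loop. In the question the writer runs once after the loop, so only the last values of `product_patterns` and `filename` are written. A minimal sketch under that reading of the code; `write_patterns` is a hypothetical helper, the two sample rows are made up to mirror the printed output, and `io.StringIO` stands in for the real files.

```python
import csv
import io

def write_patterns(in_file, out_file):
    """Write one (product_patterns, filename) row per input row."""
    writer = csv.DictWriter(out_file, fieldnames=["product_patterns", "filename"])
    writer.writeheader()
    for row in csv.reader(in_file):
        product = row[0].lower()
        filename = row[1]
        product_patterns = ", ".join(i.split("+")[0].strip() for i in product.split(","))
        # writerow is inside the loop, so every input row produces an output row
        writer.writerow({"product_patterns": product_patterns, "filename": filename})

sample = io.StringIO(
    "Film-coated tablet,RECD outcome AUBAGIO IAIN-21 AoR.txt\n"
    "Solution for injection,093 Acceptance NO Safety profil.txt\n"
)
out = io.StringIO()
write_patterns(sample, out)
print(out.getvalue())
```

With real files, open `Remarks_Drug.csv` for reading and `drug_output.csv` with mode `'w'` and `newline=''` in `with` blocks, then pass both file objects to `write_patterns`.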
