I am trying to parse a CSV file and extract a few columns from it.
| ID | Code | Phase | FBB | AM | Development status | AN REMARKS | stem | year | IN -NAME | IN Year | Company |
|----|------|-------|-----|----|--------------------|------------|------|------|----------|---------|---------|
| L2106538 | Rs124 | 4 | | | Unknown | | -pre- | 1982 | Domoedne | 1982 | XYZ |
I would like to group the columns and upload each group to a different model. For example, the first three columns would go to one model, the next two to another, the first column together with columns 6 and 7 to a third, and so on.
I also need to keep the header of the file and store the data as key-value pairs, so that I know which column should go to which field in a model.
This is what I have so far.
import csv

def group_header_value(file):
    # DictReader keeps the header and yields each row as a key-value mapping.
    reader = csv.DictReader(open(file, 'r'))
    all_result = []
    for row in reader:
        print(row)
        all_result.append(row)
    return all_result
def group_by_models(all_results):
    MD = range(1, 3)  # to get the required cols
    for every_row in all_results:
        contents = [(every_row[i] for i in MD)]
        print(contents)
def handle(self, *args, **options):
    database = options.get('database')
    filename = options.get('filename')
    all_results = group_header_value(filename)
    print('grouped_bymodel', group_by_models(all_results))
This is what I get when I try to print the contents:
grouped_bymodel [<generator object <genexpr> at 0x7f9f5382e0f0>]
[<generator object <genexpr> at 0x7f9f5382e0a0>]
[<generator object <genexpr> at 0x7f9f5382e0f0>]
Is there a different approach to extracting particular columns with DictReader? How else can I extract the required columns using DictReader? Thanks.
(every_row[i] for i in MD) is a generator expression. The syntax for a generator expression is (mostly) the same as that for a list comprehension, except that a generator expression is enclosed by parentheses, (...), while a list comprehension uses brackets, [...].
[(every_row[i] for i in MD)] is a list containing one element, the generator expression.
To fix your code with minimal changes, remove the parentheses:
def group_by_models(all_results):
    MD = range(1, 3)  # to get the required cols
    for every_row in all_results:
        contents = [every_row[i] for i in MD]
        print(contents)
You could also make group_by_models more reusable by making MD a parameter:
def group_by_models(all_results, MD=range(3)):
    for every_row in all_results:
        contents = [every_row[i] for i in MD]
        print(contents)
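One caveat, going beyond the fix above: csv.DictReader yields each row as a dict keyed by the header names, not by position, so integer indices like every_row[1] will raise a KeyError on its rows. A minimal sketch (with a hypothetical in-memory CSV, not the question's file) that translates column positions into header names via reader.fieldnames:

```python
import csv
import io

def group_by_models(all_results, fieldnames, positions=range(3)):
    # Translate column positions into header names, then pick those
    # keys out of each row dict.
    keys = [fieldnames[i] for i in positions]
    return [{k: row[k] for k in keys} for row in all_results]

# Hypothetical usage with an in-memory CSV standing in for the real file:
sample = io.StringIO("ID,Code,Phase\nL2106538,Rs124,4\n")
reader = csv.DictReader(sample)
rows = list(reader)
print(group_by_models(rows, reader.fieldnames, range(2)))
# → [{'ID': 'L2106538', 'Code': 'Rs124'}]
```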
Related
I have a .dat data file that contains 3 columns with values for certain quantities, in the form:
apr.dat
| Mass density | Pressure | Energy density |
|:---- |:------:| -----:|
|2.700000e-02 |1.549166e-11|2.700000e-02 |
|2.807784e-02 |1.650004e-11|2.807784e-02 |
|2.919872e-02 |1.757406e-11|2.919872e-02 |
|3.036433e-02 |1.871798e-11|3.036433e-02 |
|3.157648e-02 |1.993637e-11|3.157648e-02 |
|3.283702e-02 |2.123406e-11|3.283702e-02 |
|3.414788e-02 |2.261622e-11|3.414788e-02 |
...
I just want to use the second and third columns of data (without the header). I was able to open the file using
data = open(r"C:\Users\Ramos\PycharmProjects\pythonProject\apr.dat")
print(data.read())
And then I tried to turn it into a list with the following code:
import numpy as np

data = open(r"C:\Users\Ramos\PycharmProjects\pythonProject\apr.dat")
data2 = np.shape(data)
print(data2[1])
But when I tried to put the numbers from columns 2 and 3 into a list, it gave an error. Is there an easier way to do this?
Thanks for any help.
I think there is no need to use numpy at this point.
import csv

with open(r"C:\Users\Ramos\PycharmProjects\pythonProject\apr.dat", 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    next(reader)  # skip the header line
    array = [[float(row[1]), float(row[2])] for row in reader]  # skip column 0 (1st col)
EDIT: if you want separate lists for x and y:
x, y = list(zip(*array))
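If you do end up wanting numpy after all, np.loadtxt can select the columns directly; this is a sketch against a small in-memory sample (assumed tab-separated with one header line), not the real apr.dat:

```python
import io
import numpy as np

# Hypothetical sample mimicking apr.dat: one header line, tab-separated.
sample = io.StringIO(
    "Mass density\tPressure\tEnergy density\n"
    "2.700000e-02\t1.549166e-11\t2.700000e-02\n"
    "2.807784e-02\t1.650004e-11\t2.807784e-02\n"
)
# usecols picks only columns 1 and 2; skiprows drops the header;
# unpack=True returns one array per column.
pressure, energy = np.loadtxt(sample, usecols=(1, 2), skiprows=1, unpack=True)
```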
I want to fill the 'references' column in df_out with the 'ID' from df_sp whenever the corresponding 'my_ID' from df_sp is contained in df_jira's 'reference_ids'.
import pandas as pd
d_sp = {'ID': [1,2,3,4], 'my_ID': ["my_123", "my_234", "my_345", "my_456"], 'references':["","","2",""]}
df_sp = pd.DataFrame(data=d_sp)
d_jira = {'my_ID': ["my_124", "my_235", "my_346"], 'reference_ids': ["my_123, my_234", "", "my_345"]}
df_jira = pd.DataFrame(data=d_jira)
df_new = df_jira[~df_jira["my_ID"].isin(df_sp["my_ID"])].copy()
df_out = pd.DataFrame(columns=df_sp.columns)
needed_cols = list(set(df_sp.columns).intersection(df_new.columns))
for column in needed_cols:
df_out[column] = df_new[column]
df_out['Related elements_my'] = df_jira['reference_ids']
Desired output df_out:
| ID | my_ID | references |
|----|-------|------------|
| | my_124| 1, 2 |
| | my_235| |
| | my_346| 3 |
What I have tried so far is a list comprehension, but I only managed to get the reference_ids copied from a helper column into my 'references' column with this:
for row, entry in df_out.iterrows():
    cpl_ids = [x for x in entry['Related elements_my'].split(', ')
               if any(vh_id == x for vh_id in df_cpl_list['my-ID'])]
    df_out.at[row, 'Related elements'] = ', '.join(cpl_ids)
I cannot wrap my head around how to get the specific 'ID's for the matches of any(), or whether this is even the right approach, since I need all the matches, not just whether there is any match.
Any hints are appreciated!
I work with Python 3.9.4 on Windows (adding this in case Python 3.10 has another solution).
Backstory: I am moving data from Jira to MS SharePoint lists. (Therefore, the 'ID' does not equal the actual index in the dataframe, but is rather assigned by SharePoint upon insertion into the list; hence it is empty after running for the new entries.)
ref_df = df_sp[["ID", "my_ID"]].set_index("my_ID")
df_out.references = df_out["Related elements_my"].apply(
    lambda x: ",".join(
        list(map(lambda y: "" if y == "" else str(ref_df.loc[y.strip()].ID),
                 x.split(",")))
    )
)
df_out[["ID", "my_ID", "references"]]
output:
ID my_ID references
0 NaN my_124 1,2
1 NaN my_235
2 NaN my_346 3
What is map?
map applies a function to every element of an iterable, much like [func(i) for i in lst], but in a way that can be faster.
You can read more about it here: https://realpython.com/python-map-function/
Here, our function is: lambda y: "" if y == "" else str(ref_df.loc[y.strip()].ID)
So if y (y.strip() is there just to remove spaces) is empty, it maps to an empty string ("" if y == ""), as for my_235.
Otherwise, it locates y in ref_df and gets the corresponding ID, i.e. it maps each my_ID to its ID.
Hope this is helpful :)
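A tiny standalone illustration of the same map pattern, with hypothetical values rather than the question's dataframes:

```python
# Map each comma-separated key to a looked-up value, passing "" through.
ids = {"my_123": 1, "my_234": 2, "my_345": 3}
raw = "my_123, my_234"
result = ",".join(map(lambda y: "" if y == "" else str(ids[y.strip()]),
                      raw.split(",")))
print(result)  # → 1,2
```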
I have been provided with a .csv file, which has data on covid19. It is in the form of:
| district | country | date1 | date2 | date3 | etc |
|----------|-----------|--------|---------|---------|-----|
| victoria | australia | 1 case | 3 cases | 7 cases | etc |
It is a fairly large file, with 263 rows of countries/districts, and 150 columns of dates.
The program needs to be able to take in an input district, country, and date and print out the number of COVID cases in that location as of that date. (print the value of a specified row and column of a CSV file)
We have been instructed not to use the csv module or the pandas module. I am having trouble understanding where to start. I will add my attempted solutions to this question as I go along. I am not looking for a complete solution, but any ideas that I could try would be appreciated.
This is what I finally did to solve it. It works perfectly. For reference, the data file I am using is: https://portland-my.sharepoint.com/:x:/g/personal/msharma8-c_ad_cityu_edu_hk/ES7eUlPURzxOqTmRLmcxVEMBtemkKQzLcKD6U6SlbX2-_Q?e=tc5aJF
# for the purpose of this answer I preset the country, province, and date
country = 'Australia'
province = 'New South Wales'
date = '3/10/2020'

with open('covid19.csv', 'r') as f:
    final_list = []
    list0 = f.readline().split(',')  # header row: district, country, dates
    for line in f:
        if line.split(',')[0] == province:
            final_list = line.split(',')
    dict1 = dict(zip(list0, final_list))

print(dict1[date])
I will use the same logic to finish the solution.
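The same logic can be wrapped into a small reusable function; this sketch assumes the layout from the question (province in column 0) and runs against an in-memory sample instead of covid19.csv:

```python
import io

def cases_at(f, province, date):
    # The header row tells us which column each date lives in.
    header = f.readline().rstrip('\n').split(',')
    for line in f:
        row = line.rstrip('\n').split(',')
        if row[0] == province:
            return dict(zip(header, row))[date]
    return None

sample = io.StringIO(
    "district,country,3/10/2020,3/11/2020\n"
    "Victoria,Australia,1 case,3 cases\n"
    "New South Wales,Australia,5 cases,8 cases\n"
)
print(cases_at(sample, 'New South Wales', '3/11/2020'))  # → 8 cases
```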
This question already has answers here:
Printing Lists as Tabular Data
(20 answers)
Closed 3 years ago.
I want to make a table in python
+----------------------------------+--------------------------+
| name | rank |
+----------------------------------+--------------------------+
| {} | [] |
+----------------------------------+--------------------------+
| {} | [] |
+----------------------------------+--------------------------+
But the problem is that I first want to load a text file containing domain names, then make a GET request to each domain one by one, and then print each website name and status code in a table that is perfectly aligned, like the one above. I have written some code but have failed to display the output in that aligned table format.
Here is my code
import requests

f = open('sub.txt', 'r')
for i in f:
    try:
        x = requests.get('http://' + i.strip())
        code = str(x.status_code)
        # Now here I want to display `code` and `i` variables in table format
    except:
        pass
In the above code I want to display the code and i variables in the table format shown above.
Thank you
You can achieve this using the center() method of str. It returns a new string padded with spaces (or a specified fill character) so that the original string is centered.
Example,
f = ['AAA', 'BBBBB', 'CCCCCC']
codes = [401, 402, 105]
col_width = 40

print("+" + "-" * col_width + "+" + "-" * col_width + "+")
print("|" + "Name".center(col_width) + "|" + "Rank".center(col_width) + "|")
print("+" + "-" * col_width + "+" + "-" * col_width + "+")
for i in range(len(f)):
    _f = f[i]
    code = str(codes[i])
    print("|" + _f.center(col_width) + "|" + code.center(col_width) + "|")
    print("+" + "-" * col_width + "+" + "-" * col_width + "+")
Output
+----------------------------------------+----------------------------------------+
|                  Name                  |                  Rank                  |
+----------------------------------------+----------------------------------------+
|                  AAA                   |                  401                   |
+----------------------------------------+----------------------------------------+
|                 BBBBB                  |                  402                   |
+----------------------------------------+----------------------------------------+
|                 CCCCCC                 |                  105                   |
+----------------------------------------+----------------------------------------+
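The same layout can also be produced with f-string width specifiers instead of manual center() calls; a sketch with the same hypothetical data:

```python
rows = [('AAA', 401), ('BBBBB', 402), ('CCCCCC', 105)]
w = 40
sep = f"+{'-' * w}+{'-' * w}+"  # horizontal border

print(sep)
print(f"|{'Name':^{w}}|{'Rank':^{w}}|")  # ^ centers within width w
print(sep)
for name, code in rows:
    print(f"|{name:^{w}}|{code:^{w}}|")
    print(sep)
```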
Hi, I'm totally new to Python but am hoping someone can show me the ropes.
I have a CSV reference table containing over 1,000 rows with unique Find values. Example of the reference table:
| Find | Replace |
|---------------|---------|
| D2-D32-dog | Brown |
| CJ-E4-cat | Yellow |
| MG3-K454-bird | Red |
I need to do a find-and-replace on text in another CSV file. Example of the column in the other file that I need to find and replace (over 2,000 rows):
| Pets |
|--------------------------------------|
| D2-D32-dog |
| CJ-E4-cat, D2-D32-dog |
| MG3-K454-bird, D2-D32-dog, CJ-E4-cat |
| T2- M45 Pig |
| CJ-E4-cat, D2-D32-dog |
What I need is for Python to find and replace, returning the following, and if there is no matching reference, to return the original value:
| Expected output |
|--------------------|
| Brown |
| Yellow, Brown |
| Red, Brown, Yellow |
| T2- M45 Pig |
| Yellow, Brown |
Thanking you in advance.
FYI - I don't have any programming experience; I usually use Excel, but was told that Python would be able to achieve this. So I have given it a go in the hope of achieving the above, but it's returning an invalid syntax error...
import pandas as pd

dfRef1 = pd.read_csv(r'C:\Users\Downloads\Lookup.csv')
# File of Find and Replace Table
df = pd.read_csv(r'C:\Users\Downloads\Data.csv')
# File that contains text I want to replace

dfCol = df['Pets'].tolist()
# converting Pets column to list from Data.csv file

for x in dfCol:
    Split = str(x).split(',')
    # asking python to look at each element within row to find and replace
    newlist = []
    for index, refRow in dfRef1.iteritems():
        newRow = []
        for i in Split:
            if i == refRow['Find']:
                newRow.append(refRow['Replace']
            else
                newRow.append(refRow['Find'])
        newlist.append(newRow)
newlist
# if match found replace, else return original text
# When run, the code is returning - SyntaxError: invalid syntax
# I've also noticed that the dfRef1 dtype: object
Am I even on the right track? Any advice is greatly appreciated.
I understand the concept of Excel VLOOKUP; however, because a cell can contain multiple lookup items that all need replacing within the same cell, I'm unable to do this in Excel.
Thanks again.
You can save the Excel file as CSV to make your life easier, then strip the file so it contains only the table, without any unnecessary information.
Load the CSV files into Python with pandas:
import pandas as pd

df_table1 = pd.read_csv("file/path/filename.csv")
df_table2 = pd.read_csv("file/path/other_filename.csv")
df_table1[['wanted_to_be_replaced_col_name']] = df_table2[['wanted_col_to_copy']]
For further information and more complex assignments, visit the pandas documentation at https://pandas.pydata.org/
Hint: for a large number of columns, check the iloc function.
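For the actual multi-item find-and-replace the question asks about, here is a minimal sketch (column names and values assumed from the question, with small in-memory frames standing in for Lookup.csv and Data.csv). It builds a dict from the reference table and maps each comma-separated item, keeping the original value when there is no match:

```python
import pandas as pd

# Hypothetical stand-ins for Lookup.csv and Data.csv.
dfRef1 = pd.DataFrame({'Find': ['D2-D32-dog', 'CJ-E4-cat', 'MG3-K454-bird'],
                       'Replace': ['Brown', 'Yellow', 'Red']})
df = pd.DataFrame({'Pets': ['D2-D32-dog',
                            'CJ-E4-cat, D2-D32-dog',
                            'T2- M45 Pig']})

# One dict lookup per item is much faster than scanning 1000 rows each time.
lookup = dict(zip(dfRef1['Find'], dfRef1['Replace']))

def replace_items(cell):
    # Split the cell on commas, replace each known item, keep the rest.
    items = [s.strip() for s in str(cell).split(',')]
    return ', '.join(lookup.get(item, item) for item in items)

df['Pets'] = df['Pets'].apply(replace_items)
print(df['Pets'].tolist())  # → ['Brown', 'Yellow, Brown', 'T2- M45 Pig']
```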