How to create function to import CSV? - python

I'd like to create a function in Python that imports CSV files from GitHub when given just the file name.
I tried the following code, but no DataFrame is created. Can anyone give me a hand? Many thanks.
import pandas as pd
def in_csv(file_name):
    file = 'https://raw.githubusercontent.com/USER/file_name.csv'
    file_name = pd.read_csv(file, header = 0)
in_csv('csv_name')

There are many ways to read a CSV, but here is one that gives you a pandas.DataFrame. The function has to return the DataFrame, and the file name has to be interpolated into the URL:
import pandas as pd
def in_csv(file_name):
    file_path = f'https://raw.githubusercontent.com/USER/{file_name}'
    df = pd.read_csv(file_path, header = 0)
    return df
df = in_csv('csv_name')
print(df.head())

Thanks @Alian and @Aaron
Just add '.csv' after {file_name} and it works perfectly.
import pandas as pd
def in_csv(file_name):
    file_path = f'https://raw.githubusercontent.com/USER/{file_name}.csv'
    df = pd.read_csv(file_path, header = 0)
    return df
df = in_csv('csv_name')

To do this, use an f-string (available since Python 3.6) rather than a regular string. Change the first line of the function to:
file = f'https://raw.githubusercontent.com/USER/{file_name}.csv'
The f'{}' syntax substitutes the value of the variable or expression inside the braces. In your original string, file_name was just literal text, so the URL never changed.
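A minimal sketch of the f-string approach, with the URL building split into its own helper so it can be checked without touching the network. The helper name csv_url and the BASE_URL constant are illustrative assumptions; USER is the placeholder from the question, not a real account.

```python
import pandas as pd

# Placeholder base URL taken from the question; replace USER with a real path.
BASE_URL = "https://raw.githubusercontent.com/USER"


def csv_url(file_name, base_url=BASE_URL):
    """Build the raw-file URL for file_name (given without the .csv suffix)."""
    return f"{base_url}/{file_name}.csv"


def in_csv(file_name, base_url=BASE_URL):
    """Download the CSV and return it as a DataFrame (needs network access)."""
    return pd.read_csv(csv_url(file_name, base_url), header=0)


# Only the URL is built here, so nothing is downloaded.
print(csv_url("csv_name"))
# → https://raw.githubusercontent.com/USER/csv_name.csv
```

Calling df = in_csv('csv_name') then returns the DataFrame, which the caller must assign to a variable.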

Related

How to select the Column from csv file from the folder?

I am trying to select "column 3" from each of my files and then combine the columns into one file. The issue is that the combined columns are not in the same order as the files in the folder. For example, I have three files in the folder: "First", "Second" and "Third". The code below always reads the "Second" file before the "First" file. Can anyone help me?
import glob
import pandas as pd
import numpy as np
from tqdm import tqdm
extension = 'dat'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
df = pd.DataFrame(np.nan, index = np.arange(1394521), columns = ["velocity-magnitude"])
for i,f in tqdm(enumerate(all_filenames)):
    reader = pd.read_csv(f, sep=r"\s+")
    col = reader.iloc[:,[3]]
    frames = [df,col]
    df = pd.concat(frames, axis=1,join="outer")
df.to_csv('combined.dat', mode='a', header = False, index = False)
glob.glob gets its results from the operating system's directory listing (os.listdir or os.scandir, depending on the Python version), which explains the arbitrary order of the files. If you want a specific order, you have to apply it yourself, e.g. using sorted(glob.glob('*.{}'.format(extension))).
Thanks NYC Coder, yes, the sorted function is the solution to my problem.
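A minimal sketch of the sorted() fix, run against a temporary directory so it does not depend on any real data. The helper name sorted_files and the empty test files are assumptions made for illustration only.

```python
import glob
import os
import tempfile


def sorted_files(directory, extension="dat"):
    """Return the matching paths in lexicographic order instead of OS order."""
    pattern = os.path.join(directory, f"*.{extension}")
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Create the files in a deliberately shuffled order.
    for name in ("Second.dat", "First.dat", "Third.dat"):
        open(os.path.join(tmp, name), "w").close()
    names = [os.path.basename(p) for p in sorted_files(tmp)]

print(names)
# → ['First.dat', 'Second.dat', 'Third.dat']
```

Plain sorted() compares strings character by character, so numbered files such as file10.dat sort before file2.dat. In that case pass a key function to sorted() that extracts the number.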

Python, how to add a new column in excel

I have the file below (file1.xlsx) as input. In total it has 32 columns and almost 2,500 rows; as an example, I show only 5 columns in the screenshot.
I want to edit the same file with Python and get this output (file1.xlsx):
Note that I am adding one column named short, whose values are the part of the Name (A) column up to the first dot, in the same Excel file.
Please help.
Regards,
Kawaljeet
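Here is what you need. Note the second .str[0]: it takes the first piece of every row, whereas a plain [0] would select the first row of the column.
import pandas as pd
file_name = "file1.xlsx"
df = pd.read_excel(file_name) # Read the Excel file as a DataFrame
df['short'] = df['Name'].str.split(".").str[0] # Text before the first dot, per row
df.to_excel("file1.xlsx", index=False) # index=False avoids writing an extra index column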
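A minimal sketch of taking the text before the first dot for every row, using the .str accessor twice. It runs on an in-memory DataFrame rather than an Excel file, and the sample values in the Name column are made up for illustration.

```python
import pandas as pd

# Hypothetical sample values; the real data comes from file1.xlsx.
df = pd.DataFrame({"Name": ["host1.example.com", "db2.internal", "standalone"]})

# .str.split(".") yields a list per row; .str[0] picks the first item of each list.
df["short"] = df["Name"].str.split(".").str[0]

print(df["short"].tolist())
# → ['host1', 'db2', 'standalone']
```

A value without any dot is returned unchanged, because splitting it yields a one-element list.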
Hello guys, I solved the problem with the code below:
import pandas as pd
import os
def add_column():
    file_name = "cmdb_inuse.xlsx"
    os.chmod(file_name, 0o777)
    df = pd.read_excel(file_name) # Read the Excel file as a DataFrame
    df['short'] = [x.split(".")[0] for x in df['Name']]
    df.to_excel("cmdb_inuse.xlsx", index=False)

Convert list of multiple strings into a Python data frame

I have a list of string values that I read from a text document with splitlines, which yields something like this:
X = ["NAME|Contact|Education","SMITH|12345|Graduate","NITA|11111|Diploma"]
I have tried this:
for i in X:
    textnew = i.split("|")
    data[x] = textnew
I want to make a DataFrame out of this:
Name Contact Education
SMITH 12345 Graduate
NITA 11111 Diploma
You can read it directly from your file by specifying a sep argument to pd.read_csv.
df = pd.read_csv("/path/to/file", sep='|')
Or, if you wish to convert it from the list of strings instead:
data = [row.split('|') for row in X]
headers = data.pop(0) # Pop the first element since it's header
df = pd.DataFrame(data, columns=headers)
You almost had it, but don't use data as a dictionary (assigning by key with data[x] = textnew). Append to a list instead:
X = ["NAME|Contact|Education","SMITH|12345|Graduate","NITA|11111|Diploma"]
df = []
for i in X:
    df.append(i.split("|"))
print(df)
# [['NAME', 'Contact', 'Education'], ['SMITH', '12345', 'Graduate'], ['NITA', '11111', 'Diploma']]
Depending on what further transformations you need, pandas might be overkill for this kind of task.
Here is a solution to your problem:
import pandas as pd
X = ["NAME|Contact|Education","SMITH|12345|Graduate","NITA|11111|Diploma"]
data = []
for i in X:
    data.append(i.split("|"))
headers = data.pop(0) # Remove the header row first so it is not treated as data
df = pd.DataFrame(data, columns=headers)
In your situation, you can skip reading the file with readlines and let pandas take care of loading it.
As mentioned above, the solution is a standard read_csv:
import os
import pandas as pd
path = "/tmp"
filepath = "file.xls"
filename = os.path.join(path,filepath)
df = pd.read_csv(filename, sep='|')
print(df.head())
Another approach (for situations where you have no access to the file, or you have to deal with a list of strings) is to wrap the list of strings as an in-memory text file and then load it normally using pandas:
import pandas as pd
from io import StringIO
X = ["NAME|Contact|Education", "SMITH|12345|Graduate", "NITA|11111|Diploma"]
# Wrap the string list as a file of new line
DATA = StringIO("\n".join(X))
# Load as a pandas dataframe
df = pd.read_csv(DATA, delimiter="|")
Here is the result:
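A minimal sketch comparing the two approaches from the answers on the sample list from the question. The function names frame_from_split and frame_from_read_csv are assumptions introduced here. The practical difference is the type of the values: splitting by hand keeps every value as a string, while read_csv infers numeric types.

```python
from io import StringIO

import pandas as pd

X = ["NAME|Contact|Education", "SMITH|12345|Graduate", "NITA|11111|Diploma"]


def frame_from_split(lines, sep="|"):
    """Split each line by hand; every value stays a string."""
    rows = [line.split(sep) for line in lines]
    header, body = rows[0], rows[1:]
    return pd.DataFrame(body, columns=header)


def frame_from_read_csv(lines, sep="|"):
    """Wrap the lines as an in-memory file and let read_csv infer the types."""
    return pd.read_csv(StringIO("\n".join(lines)), sep=sep)


df_split = frame_from_split(X)
df_csv = frame_from_read_csv(X)

print(df_split["Contact"].tolist())  # strings: ['12345', '11111']
print(df_csv["Contact"].tolist())    # integers: [12345, 11111]
```

If identifiers such as phone numbers must keep leading zeros, the string behaviour is usually what you want. With read_csv the same effect is available by passing dtype=str.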

How to read and query headers of txt using Pandas?

I am writing a script to read a txt file using pandas.
I need to query particular headers.
Reading the Excel file works, but I cannot read the txt file.
import pandas as pd
# df = pd.read_excel('All.xlsx','Sheet1',dtype={'num1':str},index=False)  # works
df = pd.read_csv('read.txt',dtype={'PHONE_NUMBER_1':str})  # doesn't work
array = ['A','C']
a = df['NAME'].isin(array)
b = df[a]
print(b)
Try this syntax. You are not using the correct key in dtype:
df=pd.read_csv('read.txt',dtype={'BRAND_NAME_1':str})
You can try this (names must be a list of column names, not a bare string):
import pandas as pd
df = pd.read_table("input.txt", sep=" ", names=['BRAND_NAME_1'], dtype={'BRAND_NAME_1':str})
You can read the txt file first and then use astype on the column.
Read the file:
df = pd.read_csv('file.txt', names = ['PHONE_NUMBER_1', 'BRAND_NAME_1'])
names: the list of column names
Assign the type:
df['PHONE_NUMBER_1'] = df['PHONE_NUMBER_1'].astype(str)
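A minimal sketch of reading delimited text with a string dtype and then filtering by header, as the question attempts. The sample text, its comma delimiter, and the helper name load_and_filter are assumptions, since the original read.txt is not shown.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for read.txt; the real file layout is unknown.
SAMPLE = "NAME,PHONE_NUMBER_1\nA,0123456\nB,0987654\nC,0555000\n"


def load_and_filter(text, wanted):
    """Read the text as CSV, keep phone numbers as strings, filter on NAME."""
    df = pd.read_csv(StringIO(text), dtype={"PHONE_NUMBER_1": str})
    return df[df["NAME"].isin(wanted)]


result = load_and_filter(SAMPLE, ["A", "C"])
print(result)
```

Passing dtype at read time preserves leading zeros. Converting afterwards with astype(str) cannot restore zeros that were already dropped when the column was parsed as integers. If the real file uses another delimiter, pass it through the sep argument.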

Quoting in CSV from Pandas

I am working on a project where I need to manipulate certain text files and write them back out as text files. A sample file looks like
As you can see, the headers are quoted, like "A". When I use the following code
import pandas as pd
df = pd.read_csv("Test doc.txt",sep =";")
df.to_csv("Output.txt",sep=";",index = None)
I get the output as
Now the headers are like A; the double quotes are gone. How do I write the file in exactly the same format as before?
I also tried
df.to_csv("Output.txt",sep=";",index = None, header = ["'A'","'B'","'C'"])
But this gives me
Now the header is 'A' but still not in the original format.
If I try
df.to_csv("Output.txt",sep=";",index = None, header = ['"A"','"B"','"C"'])
Now it looks like
import csv
df.to_csv("Output.txt",sep=";", index=None, quoting=csv.QUOTE_NONNUMERIC)
Alternatively, change the default quote character so that the double quotes in the header names are written literally:
df.to_csv("Output.txt", sep=';', index=None, quotechar="'", header=['"A"','"B"','"C"'])
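A minimal sketch of the csv.QUOTE_NONNUMERIC option, writing to a string instead of a file so the result can be inspected directly. The DataFrame contents are made up, since the original sample file is not shown.

```python
import csv

import pandas as pd

# Hypothetical data; column names follow the "A", "B", "C" headers in the question.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": ["x", "y"]})

# With no path given, to_csv returns the CSV text as a string.
out = df.to_csv(sep=";", index=False, quoting=csv.QUOTE_NONNUMERIC)

lines = out.splitlines()
print(lines[0])  # "A";"B";"C"
print(lines[1])  # 1;3;"x"
```

QUOTE_NONNUMERIC quotes every non-numeric field, so string values in the data rows are quoted as well as the headers. If only the header row should be quoted, the quotechar approach above is the closer match.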
