Problem: If you are working with millions of records in a CSV, a single large file is difficult to handle.
Solution: You can split the file into multiple smaller files according to the number of records you want in each file. Python makes this easy and fast.
For example, save this code in a file named testsplit.py:
import sys

# If no file location is provided, show an error message and exit
if len(sys.argv) < 2:
    print('Provide File Location')
    sys.exit()

fil = sys.argv[1]
lines = open(fil, 'r').readlines()

# Store the header row and remove it from the list
header = lines.pop(0)

file = 1
# Number of records to be written in each new file
records_per_file = 50

# Take one chunk of records at a time and write it to a new file
for j in range(0, len(lines), records_per_file):
    write_file = lines[j:j + records_per_file]
    # Add the header at the start of each new file
    write_file.insert(0, header)
    open(fil + str(file) + '.csv', 'w+').writelines(write_file)
    file += 1
You can run this file on the command line using the following command:
python testsplit.py test.csv
If test.csv is the name of the file you want to split, the split files are named test.csv1.csv, test.csv2.csv, and so on.
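The same splitting logic can also be wrapped in a function so it is easier to reuse and verify. The sketch below uses a hypothetical helper name, split_csv, and a small in-memory sample file with records_per_file lowered to 2 so the split is easy to inspect; these names and values are illustrative, not part of the original script.

```python
import os
import tempfile

def split_csv(path, records_per_file=50):
    """Split a CSV into numbered files, repeating the header in each."""
    lines = open(path, 'r').readlines()
    header = lines.pop(0)
    outputs = []
    # Take one chunk of records at a time, numbering files from 1
    for n, j in enumerate(range(0, len(lines), records_per_file), start=1):
        chunk = [header] + lines[j:j + records_per_file]
        out = path + str(n) + '.csv'
        open(out, 'w').writelines(chunk)
        outputs.append(out)
    return outputs

# Usage: split a 5-record sample into files of 2 records each
tmpdir = tempfile.mkdtemp()
sample = os.path.join(tmpdir, 'test.csv')
with open(sample, 'w') as f:
    f.write('id,name\n')
    f.writelines('%d,row%d\n' % (i, i) for i in range(5))

parts = split_csv(sample, records_per_file=2)
print([os.path.basename(p) for p in parts])
# test.csv1.csv, test.csv2.csv, test.csv3.csv
```

Five records split into chunks of two produce three files, and each file begins with the original header line.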