Thursday, June 3, 2010

Converting Unicode to ASCII / Breaking Up Large Files

Here are a couple of quick Windows tips that are useful when you are dealing with large data files.


Unicode to ASCII


Many popular Windows programs save their results as a Unicode-encoded file (on Windows this usually means UTF-16). Here is a simple way to convert such a file to ASCII.

At a command prompt, enter the following command:
TYPE [unicodefile] > [asciifile]
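
For cases where you want explicit control over characters that have no ASCII equivalent, the same conversion can be sketched in Python. This is only an illustration, not what TYPE does internally: it assumes the input is UTF-16 with a byte-order mark, and the function name utf16_to_ascii is made up for the example.

```python
def utf16_to_ascii(src_path, dst_path):
    """Copy a UTF-16 text file to an ASCII file.

    Characters with no ASCII equivalent are written as '?'.
    Assumes the source file starts with a byte-order mark.
    """
    # The "utf-16" codec reads the byte-order mark to choose the byte order.
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="ascii", errors="replace", newline="") as dst:
        for line in src:
            dst.write(line)
```

Reading line by line keeps memory use flat, which matters for the large files this post is about.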

Split large files


Here is a short command file that breaks a large file up into multiple smaller ones. It works best with data files where the row order is not important.


The process works by dealing the rows of the large file out to the smaller ones in turn. For example, if you split a file into two smaller ones, the command puts the first line into file 1, the second line into file 2, the third into file 1, the fourth into file 2, and so on.
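
The dealing-out described above can be sketched in a few lines of Python. This is an in-memory illustration of the idea only; the function name deal_lines and the sample rows are hypothetical.

```python
def deal_lines(lines, parts):
    """Distribute lines in turn into `parts` piles, numbered 1..parts."""
    piles = {n: [] for n in range(1, parts + 1)}
    for index, line in enumerate(lines):
        # index counts from 0, so line 1 lands in pile 1, line 2 in pile 2, ...
        piles[index % parts + 1].append(line)
    return piles

piles = deal_lines(["a", "b", "c", "d", "e"], 2)
print(piles)  # {1: ['a', 'c', 'e'], 2: ['b', 'd']}
```

Because the rows are interleaved rather than cut into consecutive blocks, each smaller file ends up with a roughly equal share of the rows from every part of the original.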


Open Notepad and type in the following:
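@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
set /a x=0
rem x counts the lines already written; each line goes to file number (x mod count) + 1.
for /f "tokens=* delims=" %%i in (%1) do (set /a n=x %% %2 + 1 & set /a x+=1 & echo %%i >> !n!_%1)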
Save the file as splitfile.cmd.


The command syntax is:
splitfile.cmd [large file],[number of smaller files]

For example:

splitfile lotsofdata.txt, 5


will take the data in lotsofdata.txt and put it into smaller files named:
1_lotsofdata.txt
2_lotsofdata.txt
3_lotsofdata.txt
4_lotsofdata.txt
5_lotsofdata.txt
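
After a split it can be worth checking that no rows were lost. The Python sketch below counts the lines in the original and in each piece. It assumes the pieces sit in the current directory and follow the naming shown above, and the function names count_lines and check_split are made up for the example. Note that FOR /F skips blank lines, so the totals can legitimately differ if the source file contains any.

```python
def count_lines(path):
    """Count the lines in a file without loading it all into memory."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)

def check_split(original, parts):
    """Return (lines in original, [lines in piece 1, ..., lines in piece `parts`])."""
    piece_counts = [count_lines(f"{n}_{original}") for n in range(1, parts + 1)]
    return count_lines(original), piece_counts
```

For the example above, check_split("lotsofdata.txt", 5) returns the line count of lotsofdata.txt together with a list of five counts, which should add up to the first number if the file has no blank lines.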