Thursday, June 3, 2010

Converting Unicode to ASCII / Breaking Up Large Files

Here are a couple of quick Windows tips that are useful when you are dealing with large data files.


Unicode to ASCII


Many popular software programs generate results as a Unicode-encoded file. Here is a simple way to convert that file to ASCII.

At a command prompt enter the following command:
TYPE [unicodefile] > [asciifile]

Split large files


Here is a short command file to break up a large file into multiple smaller ones. This works best with data files, where the row order is not important.


This process works by shuffling the rows in the large file into the smaller ones. For example, if you split a file into two smaller ones, the command puts the first line into file 1, the second line into file 2, the third into file 1, the fourth into file2, etc.


Open notepad and type in the following:
@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
for /f "tokens=* delims=" %%i in (%1) do (set /a x+=1 & set /a n=!x!%% %2 & echo %%i >> !n!_%1)
Save the file as splitfile.cmd.


The command sytax is:
splitfile.cmd [large file],[number of smaller files]

For example:

splitfile lotsofdata.txt, 5


will take the data in lotsofdata.txt and put it into smaller files named:
1_lotsofdata.txt
2_lotsofdata.txt
3_lotsofdata.txt
4_lotsofdata.txt
5_lotsofdata.txt

Wednesday, May 5, 2010

IBM EAS Relationship Resolution Tip - Eliminate Pipeline Transport Deadlocks

(This tip is based on a system running IBM EAS Relationship Resolution version 4.2 on a Windows server using SQL Server 2005 for the RDBMS.)

I usually set up EAS with the pipelines running as a Windows service, using database tables for the input and output transports. The one issue with this configuration, though, is that when a deadlock occurs on the input transport table, the pipeline service will stop. To eliminate the deadlocks, execute the following SQL command:

sp_indexoption '[input transport table]','DisAllowPageLocks',1

where [input transport table]is the SQL server table containing the input UMF.

Normally the pipeline services do page locking on the input transport table when they mark records for processing, and when they delete the records after processing. If two services are trying to lock the same set of pages, a deadlock situation may occur. This command forces the pipeline services to use row level locking, rather than page locking. Since no two pipelines will process the same rows, no deadlock occurs, and the pipeline services continue running.

Wednesday, March 24, 2010

IBM EAS Relationship Resolution Tip - Clear Data, Not Configuration

There will be times when you will be testing EAS setup and rules for your project and will want to clear all the subject data out of the system while retaining the current configuration.  Rather than truncating individual tables, you can use SUIT to clear out the database.

Obligatory Warning
This command will clear out all subject data in your system, with no "Are you sure?" warning.  Unless you only have a handful of identities that you can submit again, it would be in your best interest to back up the RELRES database before executing.

The SUIT command is located in [EAS install directory]\suit\sql.  The command to delete identity data is:

suit -t [dbms] -s [server] -d [database] -u [username] -p [password] [RELRES schema] internal-clean -auto

where [dbms] is one of the following: 
  • oracle
  • mssql2000
  • mssql2005
  • db2
  • zdb2

and [RELRES schema] is either RELRES.NO_HIST or RELRES, depending on your initial installation.

The rest of the parameters should be self-explanatory.  You can see the full set of parameters by running SUIT with no parameters.