Concatenating text files

Jamie Thomson has a post on concatenating multiple text files (using the ForEach Loop) and subsequently loading this concatenated file in one go. You can find the post here:

http://blogs.conchango.com/jamiethomson/archive/2006/06/22/4116.aspx

I used to do this with a small batch file. The batch file is called concat.bat and takes two parameters: %1 for the extension of the files to concatenate (e.g. txt) and %2 for the folder in which the files are stored.

The body of the batch file is as follows:
set I=
for /F %%f in ('dir "%2"*.%1 /B /O:N /A:-D') do set I=%%f
if not defined I goto EXIT
copy "%2"*.%1 "%2"\backup
copy "%2"*.%1 "%2"%I:~0,5%concat.tmp
if exist "%2"%I:~0,5%concat.tmp del "%2"*.%1 > NUL
ren "%2"*concat.tmp *.%1
:EXIT

Example of the usage of the batch file:
The batch file is called with %1 = txt and %2 = c:\temp\ (the trailing backslash is needed because the script appends the file mask directly to %2). Say there are three files in c:\temp called test1.txt, test2.txt, and test3.txt with create dates 2006/01/01, 2006/01/02, and 2006/01/03 respectively. The result is one text file, named after test1.txt, containing the content of all three files.

Explanation of each line of code:

Set I=
clears the variable I, so that it is undefined unless the loop on the next line assigns it a value.

for /F %%f in ('dir "%2"*.%1 /B /O:N /A:-D') do set I=%%f
iterates over all files in the folder (c:\temp in the example) with extension .txt, ordered descending by create date. In each iteration, "do set I=%%f" assigns the file name to variable I, so after the loop I contains the last file name in the list. Because the list is sorted in descending order, this is the file that was created first. Note that /O:N as written sorts by name in ascending order; the dir switches for descending create date are /O:-D /T:C.
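
As a rough sketch of this step on its own, the loop below lists the files newest first and keeps the last name it sees. It assumes the folder argument ends with a backslash (for example c:\temp\), and it uses the /O:-D and /T:C switches of dir, which are not in the original script; the echo line is only there for illustration.

```bat
@echo off
rem Sketch: first argument = extension, second argument = folder with trailing backslash.
set I=
rem /O:-D sorts newest first, /T:C uses the creation time, /A:-D skips folders.
for /F %%f in ('dir "%2"*.%1 /B /O:-D /T:C /A:-D') do set I=%%f
rem After the loop, I holds the last name listed, which is the oldest file.
if defined I echo Oldest file: %I%
```

Like the original loop, this splits on spaces, so it assumes file names without spaces.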

if not defined I goto EXIT
checks whether I has a value. If not, there were no .txt files in the folder and the batch ends.

copy "%2"*.%1 "%2"\backup
copies the three source files to a backup folder. This is done for auditing purposes.

copy "%2"*.%1 "%2"%I:~0,5%concat.tmp
concatenates all .txt files into one file whose name is built from the first five characters of the file name stored in I. So, a file called test1concat.tmp is created, containing the concatenation of the three files.
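
The concatenation relies on copy combining every file matched by a wildcard source when the destination is a single file name. A minimal sketch of that behaviour follows; the paths and the name all.tmp are hypothetical, and the /B switch is an addition that is not in the original script. It makes copy treat the files as binary, so no end-of-file character is appended to the result.

```bat
@echo off
rem Sketch with hypothetical paths: merge every .txt file in c:\temp\ into one file.
rem The destination has a different extension so it is not picked up as a source.
copy /B "c:\temp\*.txt" "c:\temp\all.tmp"
rem Alternative form that names the source files explicitly and so fixes their order.
copy /B "c:\temp\test1.txt"+"c:\temp\test2.txt"+"c:\temp\test3.txt" "c:\temp\all.tmp"
```

The second form is useful when the order of the content matters, since the wildcard form does not let you choose it.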

if exist "%2"%I:~0,5%concat.tmp del "%2"*.%1 > NUL
checks that the .tmp file has been created and, if so, deletes the three source files, so the inbox is emptied.

ren "%2"*concat.tmp *.%1
renames the .tmp file back to the .txt extension. Because the target mask *.%1 only replaces the extension, the resulting file is test1concat.txt.

:EXIT
is the label the script jumps to when there are no files to process; the batch file ends here.

Currently it only works if all file names have the same length, 5 characters (="testX") in the case of the example. Of course this can be improved, e.g. by searching for the first dot and taking the part of the file name to its left.
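
One possible way to drop the fixed length, sketched here as an untested variation on the script above, is the ~n modifier on the loop variable, which expands to the file name without its extension. Note that it takes everything before the last dot rather than the first. As before, the folder argument is assumed to end with a backslash.

```bat
@echo off
rem Sketch: first argument = extension, second argument = folder with trailing backslash.
set I=
rem The ~n modifier expands the loop variable to the file name without its extension.
for /F %%f in ('dir "%2"*.%1 /B /O:N /A:-D') do set I=%%~nf
if not defined I goto EXIT
rem Use the variable I directly instead of a fixed five-character substring.
copy "%2"*.%1 "%2"%I%concat.tmp
:EXIT
```

The remaining lines of the script (backup, delete, rename) would stay as they are, with the substring expression replaced in the same way.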

I have no idea if this approach is better with regard to performance. I haven't tested it on thousands of files and have not done a comparison to the approach used in the post by Jamie. On the other hand, this approach will also work nicely in DTS, where there is no ForEach Loop.

Note:
The "for /F" functionality is not available in all Windows versions. It is available in Windows Server 2003 and Windows XP.

Comments:

J said...

Hi, looking for how to concat the contents (all .txt files or .doc files) of a folder into a single .txt or .doc file, without specifying filenames. In MS-DOS!

Thanks, and keep up the good work,
J
