There are still mainframes, VMS machines, and AS/400s around us, plugged in and running. The aim of this project is to make it easy to read their data files stored in EBCDIC format and give you the possibility to do with them whatever you want.
cobol2j is a Java library written to read original EBCDIC data files that come from COBOL or RPG systems. It also reads files already converted from EBCDIC to ASCII. The advantage of cobol2j is that it reads data types such as numeric fields (signed and unsigned), packed decimals, packed dates, and text fields. These fields come from the 1970s, when people saved work and paper. Paper? Yes, data came to the computers of the 70s on paper, and it was reasonable to pack two digits into one byte in BCD format to save space.
My first cobol2j application was a programmatic interface to another system. Another application presented AS/400 data on a web page. cobol2j gives you mainframe data as Java Beans. From there you can create a PDF or an Excel file, generate SQL statements to import the data into a database, or simply connect to a database with JDBC and push in everything you read with cobol2j.
There are many code pages in both the EBCDIC and the ASCII world, and you should know the source and destination code pages for your data. This point is optional when you use the native EBCDIC-to-ASCII export facilities (the ones that come with EBCDIC machines).
Reading an EBCDIC text-only file requires just EBCDIC-to-ASCII conversion tables and knowledge of the code pages. It is a relatively easy operation and is even available on some web pages as an online converter. The cobol2j challenge is to read flat data files with a record structure (very often more than one record structure in one file!): text, numeric, date, packed decimal, zoned, signed and unsigned fields, etc. This requires you to know the data structure, because such source files are binary files and a simple text conversion will not do the job here. Knowledge of the data structure is required even if your in-house EBCDIC-to-ASCII conversion is already done. Simply knowing the data structure is enough; no special file is needed.
Fresh, untouched binary data without conversion errors is the most welcome input for cobol2j. It only requires choosing the right EBCDIC-to-ASCII conversion table and defining the data structure.
You may have a file that has already been preprocessed with some external tool. There are also environments where COBOL or RPG data is processed in ASCII format; Baby/36 is an example. In that case the first part of the job is already done, and the only thing you need to do is define the data structure of the file.
Some data files may have conversion errors. It is quite common to have a file converted from EBCDIC to ASCII as if it were a text file only. cobol2j provides some auto-correction facilities and lets you read some bad files with 0% data loss. Here are two cases in which input files are in error:
- The file was converted with the wrong table. It can still be read by cobol2j, but text fields will contain some bad characters.
- The file was converted as if it were a text-only file. It can still be read by cobol2j. Text fields will be fine, and non-packed signed and unsigned numeric fields are corrected by an automatic failover mechanism. Packed decimal and date fields, however, will be corrupted; there is no way to recover the original values unless you know which conversion tables were applied, and failover is not supported for them yet.
If your EBCDIC file includes characters that look like US-ASCII or Latin-1 characters on the original machine, then the EBCDIC 037 -> ISO 8859-1 table is ready for you. If you use another character set, please donate your conversion table to the project.
A conversion table is a Java class that extends net.sf.cobol2j.AbstractConversionTBL and implements only one method:
public byte convert(byte data);
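For example, here is a minimal sketch of such a class. The class name and the identity mapping are placeholders; a real table maps each of the 256 EBCDIC byte values to its ASCII counterpart, and this sketch assumes AbstractConversionTBL requires nothing beyond convert, as described above.

```java
package test;

import net.sf.cobol2j.AbstractConversionTBL;

/**
 * Sketch of a custom EBCDIC-to-ASCII conversion table.
 * The identity mapping below is a placeholder -- fill TABLE
 * with the 256 target bytes of your actual code page.
 */
public class MyConversionTBL extends AbstractConversionTBL {

    private static final byte[] TABLE = new byte[256];

    static {
        for (int i = 0; i < 256; i++) {
            TABLE[i] = (byte) i; // placeholder: identity mapping
        }
        // For a real code page you would set entries like:
        // TABLE[0xC1] = (byte) 'A'; // EBCDIC 'A' -> ASCII 'A'
    }

    public byte convert(byte data) {
        return TABLE[data & 0xFF]; // mask so negative bytes index 0..255
    }
}
```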
Whatever you do with EBCDIC data files, you need to know their structure. The definition has to be another Java class, one that extends the net.sf.cobol2j.FileFormat class.
The FileFormat class lets you define the following field types:
Field Type | COBOL example | Java type | cobol2j field definition | Remarks |
---|---|---|---|---|
Text | X(11) | java.lang.String | X | just text |
Numeric (decimal) | S9(5)V99 COMP-3 | java.math.BigDecimal | 9 | can be packed or not |
Date | 9(8) | java.lang.String | D | many possible patterns, so parse it yourself to get a java.util.Date |
Binary (two's complement number) | 9(8) BINARY | java.math.BigInteger | B | simple conversion to BigInteger; no sign or decimal point support yet |
Hex dump | any | java.lang.String | H | lets you view the data as a hexadecimal text dump |
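As a rough illustration of a programmatic definition, here is a sketch in the test package. Every method name on FieldFormat, RecordFormat, and FileFormat below (the setters, addField, addRecordFormat) is a guess, so check the cobol2j javadoc for the real API:

```java
package test;

import net.sf.cobol2j.FieldFormat;
import net.sf.cobol2j.FileFormat;
import net.sf.cobol2j.RecordFormat;

/**
 * Hypothetical sketch of a file format definition.
 * All method names here are guesses -- verify against the javadoc.
 */
public class YourFileFormat extends FileFormat {

    public YourFileFormat() {
        RecordFormat rec = new RecordFormat();

        FieldFormat name = new FieldFormat();
        name.setName("CUSTNAME"); // text field, PIC X(11)
        name.setType("X");
        name.setSize(11);
        rec.addField(name);

        FieldFormat amount = new FieldFormat();
        amount.setName("AMOUNT"); // packed decimal, PIC S9(5)V99 COMP-3
        amount.setType("9");
        amount.setSize(7);
        amount.setDecimal(2);
        rec.addField(amount);

        addRecordFormat(rec);
    }
}
```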
If you do not feel like programming and want to see your data immediately, you can hex dump it with:
For Unix:
java -cp commons-logging.jar:cobol2j.jar net.sf.cobol2j.RecordSet EBCDIC|ASCII recLen recSeparatorLen < ebcdic.file > asciiHexDump.file
For Windows:
java -cp commons-logging.jar;cobol2j.jar net.sf.cobol2j.RecordSet EBCDIC|ASCII recLen recSeparatorLen < ebcdic.file > asciiHexDump.file
where commons-logging.jar comes from Apache Jakarta Commons Logging, recLen is the length of your record, recSeparatorLen should be 0, and the first argument is either EBCDIC or ASCII depending on your input.
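For example, with a record length of 350 bytes (a made-up value; use your own record length) and no record separator:
java -cp commons-logging.jar:cobol2j.jar net.sf.cobol2j.RecordSet EBCDIC 350 0 < ebcdic.file > asciiHexDump.file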
cobol2j now has a load-and-save feature that lets you keep the file format definition in an XML file; an example is YourFileFormat.xc2j. You need two more libraries to run this example: xercesImpl.jar from http://xerces.apache.org/xerces2-j/ and poi-2.5.1-final-20040804.jar from http://jakarta.apache.org/poi/. The command is: java -cp commons-logging.jar:cobol2j.jar:xercesImpl.jar:poi-2.5.1-final-20040804.jar test.Cobol2ExcelTest2 YourFileFormat.xc2j < cobol.file > sheet.xls
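A program in the spirit of test.Cobol2ExcelTest2 might look roughly like the sketch below. The cobol2j calls (loading the format from the xc2j file, iterating records) are guesses and must be checked against the real test class; only the POI 2.5 spreadsheet calls are standard:

```java
package test;

import java.io.FileInputStream;
import java.util.Iterator;
import java.util.List;

import net.sf.cobol2j.FileFormat;
import net.sf.cobol2j.RecordSet;

import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

/**
 * Hypothetical sketch of a cobol2j-to-Excel bridge.
 * The cobol2j loading and iteration API shown here is a guess.
 */
public class Cobol2ExcelSketch {

    public static void main(String[] args) throws Exception {
        // Hypothetical: load the format definition from the xc2j file.
        FileFormat format = FileFormat.load(new FileInputStream(args[0]));
        RecordSet records = new RecordSet(System.in, format);

        HSSFWorkbook wb = new HSSFWorkbook();
        HSSFSheet sheet = wb.createSheet("data");

        short rowNum = 0;
        while (records.hasNext()) {                // hypothetical iteration API
            List fields = records.getNextRecord(); // hypothetical accessor
            HSSFRow row = sheet.createRow(rowNum++);
            short col = 0;
            for (Iterator it = fields.iterator(); it.hasNext();) {
                // Fields arrive as String/BigDecimal/BigInteger (see table above).
                row.createCell(col++).setCellValue(String.valueOf(it.next()));
            }
        }
        wb.write(System.out); // matches the "> sheet.xls" redirection above
        System.out.flush();
    }
}
```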
cobol2j has supported "OCCURS" and "DEPENDING ON" since release 1.2. The file structure definition has been extended to support them both in the XML file and programmatically. RecordFormat and FieldsGroup are both derived from FieldsList, which can hold both FieldFormat and FieldsGroup entries; the latter can be nested recursively. See the XML schema drawing, where boxes represent the tag structure with attributes. You can also use FileFormat.xsd directly.
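As a purely illustrative sketch, a nested definition could look like the following. The element and attribute names are inferred from the class names just mentioned, not taken from FileFormat.xsd, so check the real schema before relying on them:

```xml
<!-- Hypothetical xc2j sketch; verify all names against FileFormat.xsd -->
<FileFormat>
  <RecordFormat>
    <FieldFormat Name="CUSTNO" Type="9" Size="5"/>
    <!-- A group repeated 12 times, like OCCURS 12 in COBOL -->
    <FieldsGroup Occurs="12">
      <FieldFormat Name="MONTH-AMOUNT" Type="9" Size="7" Decimal="2"/>
    </FieldsGroup>
  </RecordFormat>
</FileFormat>
```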
cobol2j has been tested and is used in production to process a series of files from mainframes and from the AS/400, but the test coverage is still too small to rule out field-decoding problems.
If you get exceptions while processing, the fastest way forward is to define the questionable fields as hex (type H). You can then analyse the field content and choose the right field type.
I am aware that one or two field types need a better implementation, especially those with external signs, and the signed binary type.
If some of your fields are not decoded properly by cobol2j, please send me your example data and I will implement support for it.