What is ODBC?

Document type: Tech doc  
Associated product(s): Product name: Batch Version: 4.0 + Implementation: StandaloneOS: Windows
Views: 31944  |  Created: 4 years agoLast updated: 4 years ago

<< More documents like this

What is ODBC?

Summary

The purpose of this document is to answer some frequently asked questions regarding ODBC, particularly in regard to its use within Batch.

What is ODBC?

How does it work?

Who provides ODBC drivers?

What is a registered ODBC source?

Can We Batch Data That Is On A Unix/AS400/Mainframe Box?

What Happens When You Clean A Database?

What Happens During Automatic Cleaning?

Why Can't Batch Open All Tables?

Resuming An Automatic Run.

What Is A Unique Record Identifier?

What Is The Intermediate Database?

Where Is The Intermediate Database Stored?

How Big Is The Intermediate Database?

How Can We Delete It?

Why Do We Use An Intermediate Database Instead Of Updating As We Go?

What Is DAO And Why Do We Use It?

What Happens During Interactive Cleaning?

What Happens When You Commit?

Why Did Commit Say "There Are No Records Matching The Specified Filter"?

What Happens When A Commit Fails?

Solution

What Is ODBC?

Open Database Connectivity (ODBC) is a standard that allows programs to read from almost all of the databases currently available.

This means that Batch can read database formats like Microsoft Access, Paradox, dBase and FoxPro; spreadsheets like Microsoft Excel; and connect to external database servers such as Oracle, Informix or Microsoft SQL Server.

Or For A Technical Definition…

“Open Database Connectivity (ODBC) technology provides a common interface for accessing heterogeneous SQL databases. ODBC is based on Structured Query Language (SQL) as a standard for accessing data. This interface provides maximum interoperability: a single application can access different SQL Database Management Systems (DBMS) through a common set of code. This enables a developer to build and distribute a client/server application without targeting a specific DBMS. Database drivers are then added to link the application to the user's choice of DBMS.”

How Does It Work?

Rather than having to know about each type of database, an application just talks to an ODBC driver that understands that format.

This means that, once the ODBC driver for a particular type of database is provided, any program can read databases in that format. If a new database format needs to be worked with, then all that is required is to install the relevant driver.
The figure above shows how ODBC fits in between an application and the database it is accessing.

Who Provides ODBC Drivers?

When installing Batch, if the “Database Connectivity” option is left checked, then ODBC drivers will be installed for the following database formats: Microsoft Access, dBase, Excel, FoxPro, Oracle, SQL Server and Paradox.

If you need drivers that are not included then they should be available from the supplier or manufacturer of the database: check on their web site, as they are often available free-of-charge.

Tech Note: Batch will not display the new database types unless it has additional information about the driver, stored in qaworld.ini. However, it is still possible to use the new driver by using it to establish a Registered ODBC Source.

Batch does not need ODBC to operate, but does require DAO. The “System Components” installation option installs all the basics Batch needs; it must be run on every machine you want to run Batch from.

What Is A Registered ODBC Source?

A Registered ODBC Source is a way of storing all the information needed to make an ODBC connection: it can then be referred to by a single name – the data source name (DSN).

The application can then establish a connection without needing to know where or what type of database it is connecting to, making it easy for administrators to move or update the source.

You would use a Registered ODBC Source when you need to connect to an external database server – it is the only way of doing so – or if you frequently connect to a particular database.

To examine ODBC settings on your machine, go to the Control Panel and open the ODBC icon. The figure above shows what it looks like on various platforms.

The ODBC Data Source Administrator that starts up displays all the drivers installed on your machine (under the Drivers tab) along with their version, connections for all who use your machine (under the System DSN tab) and connections set up solely for you (under the User DSN tab).

To create a new DSN click on the Add button and use the wizard that appears to set up all the settings required. Click on Configure to modify an existing DSN.

Can We Batch Data That Is On A Unix/AS400/Mainframe Box?

Yes. All you need is the appropriate ODBC driver (such as the SQL Server driver provided with QAS Batch) and to set up your server as a Registered ODBC Source. You might need to provide a username and password in order to make the connection.

What Happens When You Clean A Database?

The following section explains what database reading and writing goes on during a Batch run, and what affect this has on Batch cleaning.


The figure above shows a simplified picture of how the cleaning process work.

What Happens During Automatic Cleaning?

If you set up a session by selecting a database and driver then Batch automatically makes an ODBC connection to the database on your behalf, using those settings.

If, instead, you use a Registered ODBC Source then the connection information has already been set up on your machine, so Batch just needs to open the link.

Tech Note: Text files (.csv, .dat, .txt) are not accessed using ODBC, as QAS have a more comprehensive system for working with them. This, by the way, is why input file filtering is not available for text files, as only ODBC provides it.

Records are read from the database, one at a time, cleaned and then written to the intermediate database.

Therefore, the user performing the automatic run requires write access to the session file and intermediate database, and read access to the source database.

Why Can't Batch Open All Tables?

Batch reads databases through ODBC, which does not support as rich a set of allowed table and field names as does, say, Microsoft Access. It is recommended that spaces, commas, quotes and other difficult characters should be excluded from the names of any tables or fields you wish to process.

Resuming An Automatic Run.

The Batch session goes into a suspended state if the user chooses to suspend or if there is a system crash. Normally the run can be resumed at the last record read, but this is only possible if the database has a unique record identifier.

What Is A Unique Record Identifier?

A unique record identifier (UID), or row identifier, allows individual records in a database to be directly referenced. Without it there is no random access to records, no way to refer to a particular record. This also means that Batch cannot update data in place when it comes to the Commit stage.

When such a database is selected a warning is given in the Setup Wizard, the Suspend button is renamed Stop, and Commit will only write out the cleaned addresses to a text file.

What Is The Intermediate Database?

The intermediate database (IDB) is where the original input fields, reference fields and cleaned (output) address fields are kept. It is updated during Automatic and Interactive cleaning and Commit operations.

Where Is The Intermediate Database Stored?

Initially, all intermediate databases are stored in the same place as the session files.

The location of a session's IDB and all future IDB's can be changed when configuring a session: it is shown at the bottom of the "Enter Filename" page of the wizard: modify it by clicking on the “Change…” button.

How Big Is The Intermediate Database?

As the IDB holds original and cleaned address fields as well as the reference fields, it can be as large as twice the size of the source database. However, it will frequently be smaller as it does not include any fields or tables that are not part of the Batch run.

How Can We Delete It?

The IDB is automatically deleted when a session is deleted. It is also deleted when a fresh Batch run is started on an existing session, or when a fully cleaned session is Reconfigured and then saved.

The last one is a useful way of reusing a session file – say, if the same file is cleaned periodically – without the intermediate database hanging around.

Why Do We Use An Intermediate Database Instead Of Updating As We Go?

Writing data back to the user's database it is a one-way process: it cannot be undone or rolled back. As Batch frequently deals with customer and contact databases of immense value to the user, integrity and security are very important.

We wish to support the following functionality:

a. Allow the user to view the cleaned records before they are written.

b. Inform the user of what has been done to the record (match codes).

c. Provide the ability to Interactively clean the addresses.

d. Allow cleaned records to be selectively written back to the original database.

In order to provide all this, it is therefore necessary to hold the original and cleaned addresses in an intermediate database.

What Is DAO And Why Do We Use It?

It was decided to store the IDB as an Access database rather than plain text as it allows random access reads, updates, writes, record filtering and multi-user access.

Data Access Objects (DAO) is way of connecting to databases, designed primarily for Microsoft Access files. It was chosen because it offers superior performance to ODBC for our purposes.

What Happens During Interactive Cleaning?

Interactive displays the original and cleaned address fields to the user, along with the match code (what Batch has done and how much confidence it has in the cleaned record), all of which is read from the intermediate database.

If the user corrects the address or updates it's status then changes will be written back to the IDB.

Note, therefore, that every Interactive user must have full write access to the intermediate database, as well as the session file, in order to work successfully.

What Happens When You Commit?

The act of writing the cleaned address records back to the source database is called a Commit (taken from database terminology). It is a one-way updating procedure.

As mentioned in paragraph 0, Resuming An Automatic Run, the source database can only be updated in place if it has a unique field identifier. If it does not then the cleaned address records can be written out to a text file instead.

Once a Commit filter has been chosen, the cleaned records that match it will be written back to the source database. When Batch writes data back, it simply overwrites the contents of each output field, then marks the record as committed in the IDB.

As time has passed between when the source was originally read and the commit, the following situations can arise:

a. Modification: if changes are made to fields in the source database after the record is read by Batch, then they will be lost if Batch writes to that field as it commits.

b. Deletion: if a record has been deleted then Batch will be unable to write the address back: it will keep track of write failures, which will be displayed in the commit summary.

c. Write failure: if the record cannot be updated for any other reason then this is classed as a serious error, which will be reported individually in the summary along with the reason. Fifty of these errors can occur before the Commit process terminates.


Write failure could occur because the record is locked – if somebody is editing it in the source database – or if the field is read-only to the user running Batch, or if the database is offline or down.

Note that once a record successfully commits, it no longer appears in any of the filters. This also means that problem records can easily be traced by running the Viewer with the filter that was just used to Commit.

Why Did Commit Say "There Are No Records Matching The Specified Filter"?

As mentioned above, when you Commit, each record that is successfully updated is then marked as committed and is excluded from all the filters. This means that if you immediately reapply the same Commit filter then there should no longer be any record that matches it.

If some further Automatic or Interactive processing is performed then records may once again match the selected filter.

What Happens When A Commit Fails?

If over fifty serious commit failures occur, or if Batch cannot proceed, then Batch will unmark all the previously committed records in the IDB, so the whole Commit operation can be restarted when the issues are resolved.

Was this document helpful?

What can we do to improve this information?