What is ODBC?
Associated product(s): Product name: Batch
Views: 38564 | Created: 4 years ago | Last updated: 4 years ago
The purpose of this document is to answer some frequently asked questions regarding ODBC, particularly in regard to its use within Batch.
Open Database Connectivity (ODBC) is a standard that allows programs to read from almost all of the databases currently available.
This means that Batch can read database formats like Microsoft Access, Paradox, dBase and FoxPro; spreadsheets like Microsoft Excel; and connect to external database servers such as Oracle, Informix or Microsoft SQL Server.
Or For A Technical Definition…
“Open Database Connectivity (ODBC) technology provides a common interface for accessing heterogeneous SQL databases. ODBC is based on Structured Query Language (SQL) as a standard for accessing data. This interface provides maximum interoperability: a single application can access different SQL Database Management Systems (DBMS) through a common set of code. This enables a developer to build and distribute a client/server application without targeting a specific DBMS. Database drivers are then added to link the application to the user's choice of DBMS.”
Rather than having to know about each type of database, an application just talks to an ODBC driver that understands that format.
This means that, once the ODBC driver for a particular type of database is provided, any program can read databases in that format. If a new database format needs to be worked with, then all that is required is to install the relevant driver.
The figure above shows how ODBC fits in between an application and the database it is accessing.
When installing Batch, if the “Database Connectivity” option is left checked, then ODBC drivers will be installed for the following database formats: Microsoft Access, dBase, Excel, FoxPro, Oracle, SQL Server and Paradox.
If you need drivers that are not included then they should be available from the supplier or manufacturer of the database: check on their web site, as they are often available free-of-charge.
Tech Note: Batch will not display the new database types unless it has additional information about the driver, stored in qaworld.ini. However, it is still possible to use the new driver by using it to establish a Registered ODBC Source.
Batch does not need ODBC to operate, but does require DAO. The “System Components” installation option installs all the basics Batch needs; it must be run on every machine you want to run Batch from.
A Registered ODBC Source is a way of storing all the information needed to make an ODBC connection: it can then be referred to by a single name – the data source name (DSN).
The application can then establish a connection without needing to know where or what type of database it is connecting to, making it easy for administrators to move or update the source.
You would use a Registered ODBC Source when you need to connect to an external database server – it is the only way of doing so – or if you frequently connect to a particular database.
To examine ODBC settings on your machine, go to the Control Panel and open the ODBC icon. The figure above shows what it looks like on various platforms.
The ODBC Data Source Administrator that starts up displays all the drivers installed on your machine (under the Drivers tab) along with their version, connections for all who use your machine (under the System DSN tab) and connections set up solely for you (under the User DSN tab).
To create a new DSN click on the Add button and use the wizard that appears to set up all the settings required. Click on Configure to modify an existing DSN.
Yes. All you need is the appropriate ODBC driver (such as the SQL Server driver provided with QAS Batch) and to set up your server as a Registered ODBC Source. You might need to provide a username and password in order to make the connection.
The following section explains what database reading and writing goes on during a Batch run, and what affect this has on Batch cleaning.
The figure above shows a simplified picture of how the cleaning process work.
If you set up a session by selecting a database and driver then Batch automatically makes an ODBC connection to the database on your behalf, using those settings.
If, instead, you use a Registered ODBC Source then the connection information has already been set up on your machine, so Batch just needs to open the link.
Tech Note: Text files (.csv, .dat, .txt) are not accessed using ODBC, as QAS have a more comprehensive system for working with them. This, by the way, is why input file filtering is not available for text files, as only ODBC provides it.
Records are read from the database, one at a time, cleaned and then written to the intermediate database.
Therefore, the user performing the automatic run requires write access to the session file and intermediate database, and read access to the source database.
Batch reads databases through ODBC, which does not support as rich a set of allowed table and field names as does, say, Microsoft Access. It is recommended that spaces, commas, quotes and other difficult characters should be excluded from the names of any tables or fields you wish to process.
The Batch session goes into a suspended state if the user chooses to suspend or if there is a system crash. Normally the run can be resumed at the last record read, but this is only possible if the database has a unique record identifier.
A unique record identifier (UID), or row identifier, allows individual records in a database to be directly referenced. Without it there is no random access to records, no way to refer to a particular record. This also means that Batch cannot update data in place when it comes to the Commit stage.
When such a database is selected a warning is given in the Setup Wizard, the Suspend button is renamed Stop, and Commit will only write out the cleaned addresses to a text file.
The intermediate database (IDB) is where the original input fields, reference fields and cleaned (output) address fields are kept. It is updated during Automatic and Interactive cleaning and Commit operations.
Initially, all intermediate databases are stored in the same place as the session files.
The location of a session's IDB and all future IDB's can be changed when configuring a session: it is shown at the bottom of the "Enter Filename" page of the wizard: modify it by clicking on the “Change…” button.
As the IDB holds original and cleaned address fields as well as the reference fields, it can be as large as twice the size of the source database. However, it will frequently be smaller as it does not include any fields or tables that are not part of the Batch run.
The IDB is automatically deleted when a session is deleted. It is also deleted when a fresh Batch run is started on an existing session, or when a fully cleaned session is Reconfigured and then saved.
The last one is a useful way of reusing a session file – say, if the same file is cleaned periodically – without the intermediate database hanging around.
Writing data back to the user's database it is a one-way process: it cannot be undone or rolled back. As Batch frequently deals with customer and contact databases of immense value to the user, integrity and security are very important.
We wish to support the following functionality:
a. Allow the user to view the cleaned records before they are written.
b. Inform the user of what has been done to the record (match codes).
c. Provide the ability to Interactively clean the addresses.
d. Allow cleaned records to be selectively written back to the original database.
In order to provide all this, it is therefore necessary to hold the original and cleaned addresses in an intermediate database.
It was decided to store the IDB as an Access database rather than plain text as it allows random access reads, updates, writes, record filtering and multi-user access.
Data Access Objects (DAO) is way of connecting to databases, designed primarily for Microsoft Access files. It was chosen because it offers superior performance to ODBC for our purposes.
Interactive displays the original and cleaned address fields to the user, along with the match code (what Batch has done and how much confidence it has in the cleaned record), all of which is read from the intermediate database.
If the user corrects the address or updates it's status then changes will be written back to the IDB.
Note, therefore, that every Interactive user must have full write access to the intermediate database, as well as the session file, in order to work successfully.
The act of writing the cleaned address records back to the source database is called a Commit (taken from database terminology). It is a one-way updating procedure.
As mentioned in paragraph 0, Resuming An Automatic Run, the source database can only be updated in place if it has a unique field identifier. If it does not then the cleaned address records can be written out to a text file instead.
Once a Commit filter has been chosen, the cleaned records that match it will be written back to the source database. When Batch writes data back, it simply overwrites the contents of each output field, then marks the record as committed in the IDB.
As time has passed between when the source was originally read and the commit, the following situations can arise:
a. Modification: if changes are made to fields in the source database after the record is read by Batch, then they will be lost if Batch writes to that field as it commits.
b. Deletion: if a record has been deleted then Batch will be unable to write the address back: it will keep track of write failures, which will be displayed in the commit summary.
c. Write failure: if the record cannot be updated for any other reason then this is classed as a serious error, which will be reported individually in the summary along with the reason. Fifty of these errors can occur before the Commit process terminates.
Write failure could occur because the record is locked – if somebody is editing it in the source database – or if the field is read-only to the user running Batch, or if the database is offline or down.
Note that once a record successfully commits, it no longer appears in any of the filters. This also means that problem records can easily be traced by running the Viewer with the filter that was just used to Commit.
As mentioned above, when you Commit, each record that is successfully updated is then marked as committed and is excluded from all the filters. This means that if you immediately reapply the same Commit filter then there should no longer be any record that matches it.
If some further Automatic or Interactive processing is performed then records may once again match the selected filter.
If over fifty serious commit failures occur, or if Batch cannot proceed, then Batch will unmark all the previously committed records in the IDB, so the whole Commit operation can be restarted when the issues are resolved.