17. Working with Simple Oracle Document Access (SODA)

Oracle Database Simple Oracle Document Access (SODA) allows documents to be inserted, queried, and retrieved from Oracle Database using a set of NoSQL-style python-oracledb methods. Documents are generally JSON data, but they can be any data at all (including video, images, sounds, or other binary content). Documents can be fetched from the database by key lookup or by using query-by-example (QBE) pattern-matching. You can also use SODA APIs to access existing JSON-Relational Duality Views.

Note

SODA is only supported in python-oracledb Thick mode. See Enabling python-oracledb Thick mode.
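A minimal sketch of failing early, with a clear message, when SODA is attempted from a Thin mode connection. It assumes Thick mode is enabled by calling oracledb.init_oracle_client() before any connection is opened, and it relies on the Connection.thin attribute. The helper name get_soda_database() is hypothetical.

```python
def get_soda_database(connection):
    """Return the SODA database for a connection, or fail clearly in Thin mode.

    get_soda_database() is a hypothetical helper, not part of python-oracledb.
    """
    # Connection.thin is True for Thin mode connections, where SODA is unavailable
    if connection.thin:
        raise RuntimeError(
            "SODA requires python-oracledb Thick mode: call "
            "oracledb.init_oracle_client() before opening connections"
        )
    return connection.getSodaDatabase()
```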

SODA uses a SQL schema to store documents, but you do not need to know SQL or how the documents are stored. However, access through SQL does allow use of advanced Oracle Database functionality such as analytics for reporting.
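For example, the documents of a collection can be read with plain SQL. The sketch below assumes the collection was created with default metadata, so that the table name is the collection name and the document content is stored in a column named JSON_DOCUMENT; check SodaCollection.metadata if your collection was created differently. The helpers build_name_query() and fetch_names() are hypothetical.

```python
def build_name_query(table_name, content_column="JSON_DOCUMENT"):
    """Build SQL that extracts the 'name' field from every document.

    The table name is quoted because SODA creates the table with the exact
    (possibly lowercase) collection name. JSON_DOCUMENT is an assumed default
    column name. Only pass trusted identifiers: they are interpolated into
    the SQL text, not bound.
    """
    return (f"select json_value(t.{content_column}, '$.name') "
            f'from "{table_name}" t')


def fetch_names(connection, table_name):
    """Run the query on an open connection and return the extracted names."""
    with connection.cursor() as cursor:
        cursor.execute(build_name_query(table_name))
        return [row[0] for row in cursor]
```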

Oracle SODA implementations are also available in Node.js, Java, PL/SQL, Oracle Call Interface, and through REST.

For general information on SODA, see the SODA home page and the Oracle Database Introduction to Simple Oracle Document Access (SODA) manual.

For specific requirements, see the python-oracledb SODA requirements.

Python-oracledb uses the following objects for SODA:

SODA Database Object: The top-level object for python-oracledb SODA operations. It is acquired from an Oracle Database connection with Connection.getSodaDatabase(). A ‘SODA database’ is an abstraction that allows access to the SODA collections it contains, which in turn allow access to the documents in those collections. A SODA database is analogous to an Oracle Database user or schema, a collection is analogous to a table, and a document is analogous to a table row with one column for a unique document key, a column for the document content, and other columns for various document attributes.
SODA Collection Object: Represents a collection of SODA documents. By default, collections allow JSON documents to be stored. This is recommended for most SODA users. However, optional metadata can set various details about a collection, such as its database storage, whether it should track document versions and timestamps, how such components are generated, and what document types are supported. By default, the name of the Oracle Database table storing a collection is the same as the collection name. Note: do not use SQL to drop the database table, since SODA metadata would not be correctly removed. Use the SodaCollection.drop() method instead.
SODA Document Object: Represents a document. Typically the document content will be JSON. The document has properties including the content, a key, timestamps, and the media type. By default, document keys are automatically generated. See SODA Document objects for the forms of SodaDoc.
SODA Document Cursor: A cursor object representing the result of the SodaOperation.getCursor() method from a SodaCollection.find() operation. It can be iterated over to access each SodaDoc.
SODA Operation Object: An internal object used with SodaCollection.find() to perform read and write operations on documents. Chained methods set properties on a SodaOperation object, which a terminal method then uses to find, count, replace, or remove documents. Applications should not access this object directly.
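The following sketch walks the whole object chain: a SODA database from the connection, a collection from the SODA database, an operation from SodaCollection.find(), a document cursor from SodaOperation.getCursor(), and finally each SodaDoc. The function name list_document_keys() is hypothetical; SodaDatabase.openCollection() returns None when the collection does not exist.

```python
def list_document_keys(connection, collection_name):
    """Return (key, media type) pairs for every document in a collection."""
    soda = connection.getSodaDatabase()            # SODA database object
    collection = soda.openCollection(collection_name)
    if collection is None:                         # no such collection
        return []
    cursor = collection.find().getCursor()         # SODA document cursor
    try:
        # each item yielded by the cursor is a SodaDoc
        return [(doc.key, doc.mediaType) for doc in cursor]
    finally:
        cursor.close()                             # release the cursor promptly
```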

17.1. SODA Examples

Creating and adding documents to a collection can be done as follows:

soda = connection.getSodaDatabase()

# create a new SODA collection; this will open an existing collection, if
# the name is already in use
collection = soda.createCollection("mycollection")

# insert a document into the collection; for the common case of a JSON
# document, the content can be a simple Python dictionary which will
# internally be converted to a JSON document
content = {'name': 'Matilda', 'address': {'city': 'Melbourne'}}
returned_doc = collection.insertOneAndGet(content)
key = returned_doc.key
print('The key of the new SODA document is:', key)

By default, a system-generated key is created when documents are inserted. With a known key, you can retrieve a document:

# this will return a dictionary (as was inserted in the previous code)
content = collection.find().key(key).getOne().getContent()
print(content)
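SodaOperation.getOne() returns None when no document matches, so calling getContent() directly on its result fails for an unknown key. A small guard, shown here as a hypothetical helper:

```python
def get_content_by_key(collection, key):
    """Return the content of the document with the given key, or None."""
    # getOne() returns None when no document has this key
    doc = collection.find().key(key).getOne()
    return None if doc is None else doc.getContent()
```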

You can also search for documents using query-by-example syntax:

# Find all documents with names like 'Ma%'
print("Names matching 'Ma%'")
qbe = {'name': {'$like': 'Ma%'}}
for doc in collection.find().filter(qbe).getDocuments():
    content = doc.getContent()
    print(content["name"])
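Top-level fields in a QBE are combined with an implicit AND, and dotted paths such as "address.city" reach into nested objects. The sketch below (hypothetical helper names) counts matches with the terminal count() method, and fetches one page of results with skip() and limit(). Without an explicit ordering in the QBE, the order of documents across pages is not guaranteed.

```python
def count_matching(collection, city, name_prefix):
    """Count documents in the given city whose name starts with name_prefix."""
    qbe = {
        "address.city": city,                  # dotted path into a nested object
        "name": {"$like": name_prefix + "%"},  # pattern match, as in the example above
    }
    # count() is a terminal method: it returns a number, not documents
    return collection.find().filter(qbe).count()


def get_page(collection, qbe, page_number, page_size):
    """Return the contents of one page of matching documents (pages start at 0)."""
    docs = (collection.find()
            .filter(qbe)
            .skip(page_number * page_size)  # skip documents on earlier pages
            .limit(page_size)               # return at most one page
            .getDocuments())
    return [doc.getContent() for doc in docs]
```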

See the samples directory for runnable SODA examples.

17.2. Using the SODA Metadata Cache

SODA metadata can be cached to improve the performance of SodaDatabase.createCollection() and SodaDatabase.openCollection() by reducing round-trips to the database. Caching is available with Oracle Client 21.3 or later, and with Oracle Client 19 from release update 19.11 onwards.
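One way to check whether the client libraries in use qualify is to inspect oracledb.clientversion(), which is available only in Thick mode. This sketch assumes its tuple starts with the major version and the release update number (for example, 19 and 11 for Oracle Client 19.11); the function name is hypothetical.

```python
def supports_soda_metadata_cache(client_version):
    """Return True if an Oracle Client version tuple supports the SODA metadata cache.

    client_version is assumed to look like the tuple returned by
    oracledb.clientversion(), e.g. (19, 11, 0, 0, 0).
    """
    major, minor = client_version[:2]
    if major == 19:
        return minor >= 11                 # 19.11 and later 19 release updates
    return (major, minor) >= (21, 3)       # 21.3 or any later version
```

In an application this might be called as supports_soda_metadata_cache(oracledb.clientversion()) before deciding whether to pass soda_metadata_cache=True.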

The metadata cache can be turned on when creating a connection pool with oracledb.create_pool(). Each pool has its own cache:

# Create the connection pool
pool = oracledb.create_pool(user="hr", password=userpwd,
                             dsn="dbhost.example.com/orclpdb",
                             soda_metadata_cache=True)
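A sketch of using such a pool: each call acquires a connection, opens (or creates) the collection, inserts a document, and commits before the connection is returned to the pool. With soda_metadata_cache=True, the repeated createCollection() calls can be served from the pool's cache. The helper name insert_document() is hypothetical.

```python
def insert_document(pool, collection_name, content):
    """Insert one JSON document using a pooled connection and return its key."""
    with pool.acquire() as connection:
        soda = connection.getSodaDatabase()
        # opens the collection if it exists; without explicit metadata this
        # call can use the pool's SODA metadata cache
        collection = soda.createCollection(collection_name)
        doc = collection.insertOneAndGet(content)
        connection.commit()  # commit before the connection goes back to the pool
        return doc.key
```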

The cache is not available for standalone connections. Applications using these should retain and reuse the collection returned from createCollection() or openCollection() wherever possible, instead of making repeated calls to those methods.
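A minimal sketch of retaining collections on a standalone connection, so that createCollection() is called only once per collection name. CollectionRegistry is a hypothetical helper, not part of python-oracledb.

```python
class CollectionRegistry:
    """Remember collections opened on one standalone connection for reuse."""

    def __init__(self, connection):
        self._soda = connection.getSodaDatabase()
        self._collections = {}

    def get(self, name):
        """Return the named collection, calling createCollection() only on first use."""
        collection = self._collections.get(name)
        if collection is None:
            # createCollection() opens the collection if it already exists
            collection = self._soda.createCollection(name)
            self._collections[name] = collection
        return collection
```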

The cache is not used by createCollection() when metadata is passed explicitly. In this case, instead of calling only createCollection() and relying on it to open an existing collection, like:

mymetadata = { . . . }

# open an existing collection, or create a new collection
collection = soda.createCollection("mycollection", mymetadata)

collection.insertOne(mycontent)

you will find it more efficient to use logic similar to:

collection = soda.openCollection("mycollection")
if collection is None:
    mymetadata = { . . . }
    collection = soda.createCollection("mycollection", mymetadata)
collection.insertOne(mycontent)

If collection metadata is changed externally, the cache can become invalid. If this happens, clear the cache by calling ConnectionPool.reconfigure() with soda_metadata_cache set to False, or by setting the attribute ConnectionPool.soda_metadata_cache to False. To re-enable the cache, call reconfigure() again with soda_metadata_cache set to True, or set the attribute back to True.
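The clear-and-re-enable sequence with reconfigure() can be wrapped in a small hypothetical helper:

```python
def reset_soda_metadata_cache(pool):
    """Discard possibly stale SODA metadata for a connection pool."""
    # disabling the cache clears it
    pool.reconfigure(soda_metadata_cache=False)
    # re-enable caching; entries are cached again as collections are opened
    pool.reconfigure(soda_metadata_cache=True)
```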

17.3. Committing SODA Work

The general recommendation for SODA applications is to turn on autocommit globally:

connection.autocommit = True

If your SODA document write operations are mostly independent of each other, this removes the overhead of application transaction management and the need for explicit Connection.commit() calls.

When deciding how to commit transactions, consider your transactional consistency and performance requirements. If you are using individual SODA calls to insert or update a large number of documents, you should turn autocommit off and issue a single, explicit commit() after all documents have been processed. Also consider using SodaCollection.insertMany() or SodaCollection.insertManyAndGet(), which have performance benefits.
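A sketch of a bulk load under these recommendations: autocommit is turned off for the duration of the load, documents are sent in batches with insertMany(), and a single commit covers everything. The function name and the default batch size of 1000 are hypothetical choices, not recommendations from the python-oracledb documentation.

```python
def bulk_insert(connection, collection, contents, batch_size=1000):
    """Insert a list of documents in batches with insertMany(), committing once."""
    previous = connection.autocommit
    connection.autocommit = False  # no commit after each insertMany() call
    try:
        for start in range(0, len(contents), batch_size):
            collection.insertMany(contents[start:start + batch_size])
        connection.commit()        # one commit covering every batch
    except Exception:
        connection.rollback()      # do not leave a partial load pending
        raise
    finally:
        connection.autocommit = previous
```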

If you are not autocommitting, and one of the SODA operations in your transaction fails, then previous uncommitted operations will not be rolled back. Your application should explicitly roll back the transaction with Connection.rollback() to prevent any later commits from committing a partial transaction.
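For example, moving a document between two collections involves an insert and a removal that should succeed or fail together. This hypothetical helper turns autocommit off, commits only after both operations succeed, and otherwise rolls back. Assuming default key settings, the copied document receives a new system-generated key in the target collection.

```python
def move_document(connection, source, target, key):
    """Copy a document into target and remove it from source in one transaction."""
    previous = connection.autocommit
    connection.autocommit = False
    try:
        doc = source.find().key(key).getOne()
        if doc is None:
            raise KeyError(key)
        target.insertOne(doc.getContent())
        source.find().key(key).remove()
        connection.commit()  # both changes become visible together
    except Exception:
        # a failed SODA call does not undo earlier uncommitted work
        connection.rollback()
        raise
    finally:
        connection.autocommit = previous
```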

Note:

SODA DDL operations do not commit an open transaction the way that SQL always does for DDL statements.
When autocommit is True, most SODA methods will issue a commit before successful return.
SODA provides optimistic locking. See SodaOperation.version().
When mixing SODA and relational access, any commit or rollback on the connection will affect all work.
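A sketch of optimistic locking with SodaOperation.version(): the document is read, modified locally, and written back with replaceOne() only if its version is still the one that was read. It assumes the collection tracks document versions, as default collections do; replaceOne() returns False when no matching document was replaced, for example because another session changed it first. The function name and the count field are hypothetical.

```python
def increment_counter(collection, key):
    """Increment a 'count' field unless the document changed since it was read.

    Returns True if the document was replaced, False otherwise.
    """
    doc = collection.find().key(key).getOne()
    if doc is None:
        raise KeyError(key)
    content = doc.getContent()
    content["count"] = content.get("count", 0) + 1
    # version() restricts replaceOne() to the version that was read
    return collection.find().key(key).version(doc.version).replaceOne(content)
```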