14. Using Vector Data

Oracle Database 23ai introduced a new data type VECTOR for artificial intelligence and machine learning search operations. The vector data type is a homogeneous array of 8-bit signed integers, 32-bit floating-point numbers, or 64-bit floating-point numbers. With the vector data type, you can define the number of dimensions for the data and the storage format for each dimension value in the vector.

To create a table with three columns for vector data, for example:

CREATE TABLE vector_table (
    v32 vector(3, float32),
    v64 vector(3, float64),
    v8  vector(3, int8)
)

In this example, each column can store vector data of three dimensions where each dimension value is of the specified storage format. This example is used in subsequent sections.

14.1. Inserting Vectors

With python-oracledb, vector data can be inserted using Python arrays (array.array()). To use Python arrays, import the array module in your code.

Python arrays (array.array()) of float (32-bit), double (64-bit), or int8_t (8-bit signed integer) are used as bind values when inserting vector columns. For example:

vector_data_32 = array.array("f", [1.625, 1.5, 1.0]) # 32-bit float
vector_data_64 = array.array("d", [11.25, 11.75, 11.5]) # 64-bit float
vector_data_8 = array.array("b", [1, 2, 3]) # 8-bit signed integer

cursor.execute(
    "insert into vector_table (v32, v64, v8) values (:1, :2, :3)",
    [vector_data_32, vector_data_64, vector_data_8],
)

See vector.py for a runnable example.

If you are using python-oracledb Thick mode with older versions of Oracle Client libraries than 23ai, see this section.

14.2. Fetching Vectors

With python-oracledb, vector columns are fetched as Python arrays (array.array()). For example:

cursor.execute("select * from vector_table")
    for row in cursor:
        print(row)

This prints an output such as:

(array("f", [1.625, 1.5, 1.0]), array("d", [11.25, 11.75, 11.5]), array("b", [1, 2, 3]))

The FetchInfo object that is returned as part of the fetched metadata contains attributes FetchInfo.vector_dimensions and FetchInfo.vector_format which return the number of dimensions of the vector column and the storage format of each dimension value in the vector column respectively.

You can convert the vector data fetched from a connection to a Python list by using the following output type handler:

def output_type_handler(cursor, metadata):
    if metadata.type_code is oracledb.DB_TYPE_VECTOR:
        return cursor.var(metadata.type_code, arraysize=cursor.arraysize,
                          outconverter=list)
connection.outputtypehandler = output_type_handler
cursor.execute("select * from vector_table")
for row in cursor:
    print(row)

For each vector column, the database will now return a Python list representation of each row’s value.

If you are using python-oracledb Thick mode with older versions of Oracle Client libraries than 23ai, see below.

14.3. Using python-oracledb Thick Mode with Older Versions of Oracle Client Libraries

If you are using python-oracledb Thick mode with older versions of Oracle Client libraries than 23ai, then you must use strings when inserting vectors. For example:

vector_data_32 = "[1.625, 1.5, 1.0]"
vector_data_64 = "[11.25, 11.75, 11.5]"
vector_data_8 = "[1, 2, 3]"

cursor.execute(
    "insert into vector_table (v32, v64, v8) values (:1, :2, :3)",
    [vector_data_32, vector_data_64, vector_data_8],
)

The vector columns are fetched as Python lists. For example:

cursor.execute("select * from vector_table")
for row in cursor:
    print(row)

See vector_string.py for a runnable example.

14.4. Using NumPy

Vector data can be used with Python’s NumPy package types. To use NumPy’s ndarray type, install NumPy, for example with pip install numpy, and import the module in your code.

14.4.1. Inserting Vectors with NumPy

To insert vectors, you must convert NumPy ndarray types to array types. This conversion can be done by using an input type handler. For example:

def numpy_converter_in(value):
    if value.dtype == numpy.float64:
        dtype = "d"
    elif value.dtype == numpy.float32:
        dtype = "f"
    else:
        dtype = "b"
    return array.array(dtype, value)

def input_type_handler(cursor, value, arraysize):
    if isinstance(value, numpy.ndarray):
        return cursor.var(
            oracledb.DB_TYPE_VECTOR,
            arraysize=arraysize,
            inconverter=numpy_converter_in,
        )

Using it in an INSERT statement:

vector_data_32 = numpy.array([1.625, 1.5, 1.0])
vector_data_64 = numpy.array([11.25, 11.75, 11.5])
vector_data_8 = numpy.array([1, 2, 3])

connection.inputtypehandler = input_type_handler

cursor.execute(
    "insert into vector_table (v32, v64, v8) values (:1, :2, :3)",
    [vector_data_32, vector_data_64, vector_data_8],
)

14.4.2. Fetching Vectors with NumPy

To fetch vector data as an ndarray type, you can convert the array type to an ndarray type by using an output type handler. For example:

def numpy_converter_out(value):
    if value.typecode == "b":
        dtype = numpy.int8
    elif value.typecode == "f":
        dtype = numpy.float32
    else:
        dtype = numpy.float64
    return numpy.array(value, copy=False, dtype=dtype)

def output_type_handler(cursor, metadata):
    if metadata.type_code is oracledb.DB_TYPE_VECTOR:
        return cursor.var(
            metadata.type_code,
            arraysize=cursor.arraysize,
            outconverter=numpy_converter_out,
        )

Using it in a query:

connection.outputtypehandler = output_type_handler

cursor.execute("select * from vector_table")
    for row in cursor:
        print(row)

This prints an output such as:

(array([1.625, 1.5, 1.0], dtype=float32), array([11.25, 11.75, 11.5], dtype=float64), array([1, 2, 3], dtype=int8))

See vector_numpy.py for a runnable example.