14. Using VECTOR Data

Oracle Database 23ai introduced a new data type VECTOR for artificial intelligence and machine learning search operations. The VECTOR data type is a homogeneous array of 8-bit signed integers, 8-bit unsigned integers, 32-bit floating-point numbers, or 64-bit floating-point numbers.

With the VECTOR data type, you can define the number of dimensions for the data and the storage format for each dimension value in the vector. The possible storage formats include:

  • int8 for 8-bit signed integers

  • binary for 8-bit unsigned integers

  • float32 for 32-bit floating-point numbers

  • float64 for 64-bit floating point numbers

Vectors can also be defined with an arbitrary number of dimensions and formats. This allows you to specify vectors of different dimensions with the various storage formats mentioned above. For example:

CREATE TABLE vector_table (
    vec_data vector
)

14.1. Using FLOAT32, FLOAT64, and INT8 Vectors

To create a table with three columns for vector data:

CREATE TABLE vector_table (
    v32 vector(3, float32),
    v64 vector(3, float64),
    v8  vector(3, int8)
)

In this example, each column can store vector data of three dimensions where each dimension value is of the specified format.

14.1.1. Inserting FLOAT32, FLOAT64, and INT8 Vectors

With python-oracledb, vector data can be inserted using Python array.array() objects. Python arrays of type float (32-bit), double (64-bit), or int8_t (8-bit signed integer) are used as bind values when inserting vector columns. For example:

import array

vector_data_32 = array.array("f", [1.625, 1.5, 1.0]) # 32-bit float
vector_data_64 = array.array("d", [11.25, 11.75, 11.5]) # 64-bit float
vector_data_8 = array.array("b", [1, 2, 3]) # 8-bit signed integer

cursor.execute(
    "insert into vector_table (v32, v64, v8) values (:1, :2, :3)",
    [vector_data_32, vector_data_64, vector_data_8]
)

14.1.2. Fetching FLOAT32, FLOAT64, and INT8 Vectors

With python-oracledb, vector columns of int8, float32, and float64 format are fetched as Python array.array() types. For example:

cursor.execute("select * from vector_table")
    for row in cursor:
        print(row)

This prints an output such as:

(array("f", [1.625, 1.5, 1.0]), array("d", [11.25, 11.75, 11.5]), array("b", [1, 2, 3]))

The FetchInfo object that is returned as part of the fetched metadata contains attributes FetchInfo.vector_dimensions and FetchInfo.vector_format which return the number of dimensions of the vector column and the format of each dimension value in the vector column respectively.

You can convert the vector data fetched from array.array() to a Python list by using the following output type handler:

def output_type_handler(cursor, metadata):
    if metadata.type_code is oracledb.DB_TYPE_VECTOR:
        return cursor.var(metadata.type_code, arraysize=cursor.arraysize,
                          outconverter=list)
connection.outputtypehandler = output_type_handler
cursor.execute("select * from vector_table")
for row in cursor:
    print(row)

For each vector column, the database will now return a Python list representation of each row’s value as shown below:

([1.625, 1.5, 1.0], [11.25, 11.75, 11.5], [1, 2, 3])

See Inserting Vectors with NumPy for an example of using an input type handler.

If you are using python-oracledb Thick mode with older versions of Oracle Client libraries than 23ai, see this section.

See vector.py for a runnable example.

14.2. Using BINARY Vectors

In addition to Int8, Float32, and Float64 formats, you can also use a Binary format to define vectors. The binary format represents each dimension value as a binary value (0 or 1). Binary vectors require less memory storage. For example, a 16 dimensional vector with binary format requires only 2 bytes of storage while a 16 dimensional vector with int8 format requires 16 bytes of storage.

Binary vectors are represented as 8-bit unsigned integers. For the binary format, you must define the number of dimensions as a multiple of 8.

To create a table with one column for vector data:

CREATE TABLE vector_binary_table (
    vb vector(24, binary)
)

In this example, the vb column can store vector data of 24 dimensions where each dimension value is represented as a single bit. Note that the number of dimensions 24 is a multiple of 8.

If you specify a vector dimension that is not a multiple of 8, then you will get ORA-51813.

14.2.1. Inserting BINARY Vectors

Python arrays of type uint8_t (8-bit unsigned integer) are used as bind values when inserting vector columns. The length of unit8_t arrays must be equal to the number of dimensions divided by 8. For example, if the number of dimensions for a vector column is 24, then the length of the array must be 3. The values in unint8_t arrays can range from 0 to 255. For example:

import array

vector_data_vb = array.array("B", [180, 150, 100]) # 8-bit unsigned integer

cursor.execute(
    "insert into vector_binary_table values (:1)",
    [vector_data_vb]
)

14.2.2. Fetching BINARY Vectors

With python-oracledb, vector columns of binary format are fetched as Python array.array() types. For example:

cursor.execute("select * from vector_binary_table")
    for row in cursor:
        print(row)

This prints an output such as:

(array("B", [180, 150, 100]))

The FetchInfo object that is returned as part of the fetched metadata contains attributes FetchInfo.vector_dimensions and FetchInfo.vector_format which return the number of dimensions of the vector column and the format of each dimension value in the vector column respectively.

You can convert the vector data fetched from a connection to a Python list by using this output type handler. For each vector column, the database will now return a Python list representation of each row’s value.

If you are using python-oracledb Thick mode with older versions of Oracle Client libraries than 23ai, see this section.

14.3. Using python-oracledb Thick Mode with Older Versions of Oracle Client Libraries

If you are using python-oracledb Thick mode with versions of Oracle Client libraries older than 23ai, then you must use strings when inserting vectors. The vector columns are fetched as Python lists.

14.3.1. Inserting Vectors with Older Oracle Client Versions

To insert vectors of int8, float32, float64, and unit8 format when using Oracle Client versions older than 23ai, you must use strings as shown below:

vector_data_32 = "[1.625, 1.5, 1.0]"
vector_data_64 = "[11.25, 11.75, 11.5]"
vector_data_8 = "[1, 2, 3]"
vector_data_vb = "[180, 150, 100]"

cursor.execute(
    "insert into vector_table (v32, v64, v8, vb) values (:1, :2, :3, :4)",
    [vector_data_32, vector_data_64, vector_data_8, vector_data_vb]
)

14.3.2. Fetching Vectors with Older Oracle Client Versions

With Oracle Client versions older than 23ai, the vector columns are fetched as Python lists. For example:

cursor.execute("select * from vector_table")
for row in cursor:
    print(row)

This prints an output such as:

([1.625, 1.5, 1.0], [11.25, 11.75, 11.5], [1, 2, 3], [180, 150, 100])

See vector_string.py for a runnable example.

14.4. Using NumPy

Vector data can be used with Python’s NumPy package types. To use NumPy’s ndarray type, install NumPy, for example with pip install numpy, and import the module in your code.

14.4.1. Inserting Vectors with NumPy

To insert vectors, you must convert NumPy ndarray types to array types. This conversion can be done by using an input type handler. For example:

def numpy_converter_in(value):
    if value.dtype == numpy.float64:
        dtype = "d"
    elif value.dtype == numpy.float32:
        dtype = "f"
    elif value.dtype == numpy.uint8:
        dtype = "B"
    else:
        dtype = "b"
    return array.array(dtype, value)

def input_type_handler(cursor, value, arraysize):
    if isinstance(value, numpy.ndarray):
        return cursor.var(
            oracledb.DB_TYPE_VECTOR,
            arraysize=arraysize,
            inconverter=numpy_converter_in,
        )

Using it in an INSERT statement:

vector_data_32 = numpy.array([1.625, 1.5, 1.0])
vector_data_64 = numpy.array([11.25, 11.75, 11.5])
vector_data_8 = numpy.array([1, 2, 3])
vector_data_vb = numpy.array([180, 150, 100])

connection.inputtypehandler = input_type_handler

cursor.execute(
    "insert into vector_table (v32, v64, v8, vb) values (:1, :2, :3, :4)",
    [vector_data_32, vector_data_64, vector_data_8, vector_data_vb],
)

14.4.2. Fetching Vectors with NumPy

To fetch vector data as an ndarray type, you can convert the array type to an ndarray type by using an output type handler. For example:

def numpy_converter_out(value):
    if value.typecode == "b":
        dtype = numpy.int8
    elif value.typecode == "f":
        dtype = numpy.float32
    elif value.typecode == "B":
        dtype = numpy.uint8
    else:
        dtype = numpy.float64
    return numpy.array(value, copy=False, dtype=dtype)

def output_type_handler(cursor, metadata):
    if metadata.type_code is oracledb.DB_TYPE_VECTOR:
        return cursor.var(
            metadata.type_code,
            arraysize=cursor.arraysize,
            outconverter=numpy_converter_out,
        )

Using it to query the columns:

connection.outputtypehandler = output_type_handler

cursor.execute("select * from vector_table")
    for row in cursor:
        print(row)

This prints an output such as:

(array([1.625, 1.5, 1.0], dtype=float32), array([11.25, 11.75, 11.5], dtype=float64), array([1, 2, 3], dtype=int8), array([180, 150, 100], dtype=uint8))

See vector_numpy.py for a runnable example.