How to Query Bigtable rows by min, max, or between min and max on column values

 We have a request to query by min, max, or between min and max on column values, where the values are integers or doubles. The Google Cloud API Range class appears to handle the String data type. Comparing Strings is different from comparing numbers: numerically, 99 < 1000, while as Strings, "99" > "1000" when using String's compareTo() method.

      Therefore, one solution for us is to store the integer or double value as a fixed-length String, such as storing the String "00001234.56" for the double value 1234.56. That way, we can apply the Range class methods startClosed(), startOpen(), endClosed(), or endOpen() to filter the queried rows.
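As an illustration of that workaround, here is a minimal sketch (the class name and the chosen format are hypothetical; it assumes non-negative values with at most 8 integer and 2 fraction digits):

```java
import java.util.Locale;

public class FixedWidthEncoder {

    // Zero-pad to a fixed width ("%011.2f" -> 8 integer digits, '.', 2 fraction
    // digits) so that lexicographic String order matches numeric order.
    static String encode(double value) {
        if (value < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        return String.format(Locale.ROOT, "%011.2f", value);
    }

    public static void main(String[] args) {
        System.out.println(encode(1234.56)); // 00001234.56
        // As numbers 99 < 1000; the padded strings now compare the same way.
        System.out.println(encode(99).compareTo(encode(1000)) < 0); // true
    }
}
```

The padded strings can then be passed to startClosed()/endClosed() and friends, since the Java client's byte-range filters also accept String bounds.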

      I feel there might be another way to query Bigtable rows with a custom Range.BoundType, such as implementing a Double- or Integer-comparable class for Range<C extends Comparable>. Then, when we query by a min or max double value on a column, we might pass a custom value range filter,

such as replacing the Google Cloud ValueRangeFilter with a CustomDoubleValueFilter from https://cloud.google.com/bigtable/docs/using-filters#value-range:

Filter filter = CustomDoubleValueFilter.startClosed("123.45").endClosed("1000");
readFilter(projectId, instanceId, tableId, filter);

     If that is possible, is there any sample code for that?

3 ACCEPTED SOLUTIONS

You are correct about the limitations of string comparison for numerical values. Bigtable treats all values as byte arrays and compares them lexicographically, so a naive encoding of a number will not sort in numeric order. While the idea of a custom comparator is appealing, the Google Cloud Bigtable client library does not support plugging custom comparator classes into filters. Instead, you can encode numeric values as byte arrays whose lexicographic order matches their numeric order and use the built-in filter methods. Here's how you can implement this using Java:

 
import com.google.cloud.bigtable.data.v2.models.Filters;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.protobuf.ByteString;
import java.nio.ByteBuffer;

public class BigtableNumericQuery {

    public static Query createNumericRangeQuery(
        String tableName, String columnFamily, String columnQualifier,
        double start, double end, boolean startInclusive, boolean endInclusive
    ) {
        ByteString startKey = toByteString(start);
        ByteString endKey = toByteString(end);

        // Build the value-range bound, honoring open/closed endpoints. (The
        // client exposes startClosed/startOpen and endClosed/endOpen, not a
        // boolean-inclusive start()/end() pair.)
        Filters.ValueRangeFilter range = Filters.FILTERS.value().range();
        range = startInclusive ? range.startClosed(startKey) : range.startOpen(startKey);
        range = endInclusive ? range.endClosed(endKey) : range.endOpen(endKey);

        // Restrict the scan to the target column, then apply the range.
        Filters.Filter filter = Filters.FILTERS.chain()
            .filter(Filters.FILTERS.family().exactMatch(columnFamily))
            .filter(Filters.FILTERS.qualifier().exactMatch(columnQualifier))
            .filter(range);

        return Query.create(tableName).filter(filter);
    }

    // Big-endian IEEE 754 encoding; preserves numeric order for non-negative doubles.
    private static ByteString toByteString(double value) {
        return ByteString.copyFrom(ByteBuffer.allocate(8).putDouble(value).array());
    }
}

Key Points:

  • Byte Array Conversion with toByteString: The toByteString helper function encapsulates the logic for converting a double to a ByteString, making the main function more readable.
  • Built-in Filter with Chain: This leverages Bigtable's value().range() filter, chained with family and qualifier filters so the range applies only to the target column.
  • Inclusivity Control: The inclusion of startInclusive and endInclusive provides fine-grained control over the range boundaries, allowing for flexible specification of inclusive or exclusive bounds.
  • Adaptability: This function is easily adaptable to other numeric types like integers or longs by modifying the toByteString function for appropriate type conversions.

Additional Considerations:

  • Precision and Sign (Floating-Point): Be cautious with rounding when converting floating-point numbers, and note that the big-endian encoding above preserves numeric order only for non-negative doubles; negative values need an order-preserving transform before byte-wise comparison.
  • Indexing for Performance: Bigtable has no built-in secondary indexes, so for large tables, maintaining your own index table keyed by the encoded numeric value can significantly improve range-query performance.
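One caveat with the raw big-endian encoding used above: it preserves numeric order only for non-negative doubles. A common order-preserving trick (used by several byte-ordered stores; the class name here is hypothetical) flips the sign bit for non-negative values and every bit for negative ones, so the resulting bytes sort in true numeric order:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class DoubleOrderKey {

    // Map a double to a 64-bit key whose unsigned order equals the doubles'
    // numeric order: non-negative values get the sign bit set (placing them
    // above all negatives); negative values have every bit flipped (reversing
    // their magnitude order).
    static long orderPreservingBits(double value) {
        long bits = Double.doubleToLongBits(value);
        return bits >= 0 ? bits ^ Long.MIN_VALUE : ~bits;
    }

    // Big-endian bytes of the key sort lexicographically in numeric order.
    static byte[] encode(double value) {
        return ByteBuffer.allocate(8).putLong(orderPreservingBits(value)).array();
    }

    public static void main(String[] args) {
        // Negative, zero, and positive values now sort correctly byte-wise.
        System.out.println(Arrays.compareUnsigned(encode(-2.5), encode(-1.0)) < 0); // true
        System.out.println(Arrays.compareUnsigned(encode(-1.0), encode(0.0)) < 0);  // true
        System.out.println(Arrays.compareUnsigned(encode(0.0), encode(99.5)) < 0);  // true
    }
}
```

If negative values are in play, the range-filter bounds would be built from these keys instead of the raw putDouble bytes, provided the same transform is applied when the cells are written.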


On the Google Cloud reference page https://cloud.google.com/java/docs/reference/google-cloud-bigtable/latest/com.google.cloud.bigtable...., we can see that a Range can be bounded by a non-String, non-ByteString type, such as

startClosed(T start), startOpen(T start), endClosed(T end), endOpen(T end), or of(T startClosed, T endOpen)

In the above methods, "T" is a generic type parameter that is not limited to String or ByteString. Therefore, can we pass "Double" as the range data type "T" without converting to ByteString? Here, "Double" might be a custom range data type.


The Bigtable API documentation does indeed mention generic methods like startClosed(T start), startOpen(T start), endClosed(T start), and endOpen(T start). However, it's important to understand that the type parameter T must be compatible with Bigtable's underlying byte-oriented data model.

Despite the generic appearance in the Java client library, all data in Bigtable must ultimately be converted to ByteString. This is because Bigtable stores and processes data as byte arrays. While client libraries provide generic handlers for convenience, they don't change this fundamental requirement.

The Java client library doesn't support passing Java primitives like Double directly into these range methods. They must first be converted into a supported format (ByteString).

Even though Java uses generics for flexibility, all non-byte types must be serialized into bytes (ByteString) before Bigtable operations. Bigtable doesn't understand Java types; it only understands bytes.

To use Double values in range queries, you need to serialize them into ByteString. Here's how to do it using ByteBuffer:

 
// Serialize the doubles as 8-byte big-endian IEEE 754 values.
Double startValue = 123.45;
Double endValue = 678.90;

ByteString startKey = ByteString.copyFrom(ByteBuffer.allocate(8).putDouble(startValue).array());
ByteString endKey = ByteString.copyFrom(ByteBuffer.allocate(8).putDouble(endValue).array());

// Inclusive range [startValue, endValue] on the cell value.
Filters.Filter filter = Filters.FILTERS.value().range()
    .startClosed(startKey)
    .endClosed(endKey);

This approach converts the doubles into the byte format that Bigtable requires. Note that this big-endian encoding preserves numeric order only for non-negative doubles; negative values need an order-preserving transform before they can be range-filtered this way.



I think there should be another "Key Point": when a column's double value is inserted into a Bigtable table, it has to be encoded with the same toByteString(double value) function, so that the byte-wise comparison performed by the ValueFilter range() lines up with the numeric order. Is that true?
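The premise here, that the write side and the read filter must use one and the same encoding, can be sketched without the Bigtable client at all (class name is hypothetical; assumes non-negative values for the ordering check):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class DoubleBytes {

    // Write side: encode the double exactly as the read filter's bounds are encoded.
    static byte[] encode(double value) {
        return ByteBuffer.allocate(8).putDouble(value).array();
    }

    // Read side: recover the double from the stored cell bytes.
    static double decode(byte[] stored) {
        return ByteBuffer.wrap(stored).getDouble();
    }

    public static void main(String[] args) {
        // Round trip: what was written is exactly what the filter compares against.
        System.out.println(decode(encode(1234.56))); // 1234.56
        // Byte-wise unsigned comparison matches numeric order for non-negative values.
        System.out.println(Arrays.compareUnsigned(encode(99), encode(1000)) < 0); // true
    }
}
```

If the stored cells were written with any other representation (for example, as text), the byte-wise range comparison would not line up with numeric order.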