Skip to content
  • Narayanan Iyer's avatar
    ce30c514
    [#345][#503] Implement `CREATE FUNCTION`/`DROP FUNCTION` including support for... · ce30c514
    Narayanan Iyer authored and Jon Badiali's avatar Jon Badiali committed
    [#345][#503] Implement `CREATE FUNCTION`/`DROP FUNCTION` including support for parameter and return value type specifications in user-defined functions
    
    Previously, Octo did not support parameter or return value type specifications for user-defined (or internally defined) functions. This was due to the fact that user-defined M functions were to be called inline in SQL queries, which amounted to these function calls being literally embedded in physical plans with no Octo/SQL level validation.
    
    This commit revises this behavior so that Octo can track and enforce type constraints on user-defined function parameters and return values. This was accomplished by implementing the previously absent `CREATE FUNCTION` syntax.
    
    As part of this implementation, inline M function calls have been disallowed. This changes the previous behavior such that queries like `SELECT $$^USERFUNC FROM names;` will no longer work. If a user attempts such a call, they will now see a new `ERR_M_CALL` (formerly `ERR_ROCTO_M_CALL`) error message and their query will not be executed.
    
    Instead, users must now map a SQL function definition to an M function via the `CREATE FUNCTION` interface. Without such a mapping, no M function can be called in either Octo or Rocto. This holds for both intrinsic and extrinsic functions. Consequently, a number of hard-coded function entries in `postgres-seed.zwr` were removed and replaced by `CREATE FUNCTION` statements in `postgres-seed.sql`.
    
    This implementation entailed revisions to a number of modules, including the lexer, the parser, and both the logical and physical planning stages, as described below.
    
    The changes to `lexer.l` included:
    + New `FUNCTION` and `RETURNS` tokens
    + Support for `%` as a leading character to M label and routine names
    + Case sensitivity for M extrinsic functions
    + Support for intrinsic functions (via a new `INTRINSIC_FUNCTION` parser token)
    
    The changes to the parser span `parser.y` and `drop.y`:
    + Added parser rules for `CREATE FUNCTION` and `DROP FUNCTION`, (i.e. `drop_function_statement` and `function_definition`)
    + Added a new `m_function` parser rule that handles both `INTRINSIC_FUNCTION` and `EXTRINSIC_FUNCTION` tokens
    + Modified the `generic_function_call` rule to maintain the unqualified function name for use when loading the binary representation of the `CREATE FUNCTION` parse tree
    + Disallowed inline M function calls for both `INTRINSIC_FUNCTION` and `EXTRINSIC_FUNCTION`
    
    To support the new functionality, a number of struct and type changes were made in `octo_types.h`:
    + Renamed the `table_STATEMENT` `SqlStatementType` to `create_table_STATEMENT` and a new `create_function_STATEMENT` type
    + Renamed `SqlDropStatement` renamed to `SqlDropTableStatement` and added a new `SqlDropFunctionStatement` definition along with `drop_function_STATEMENT` type
    + Added `SqlFunction` and `SqlParameterTypeList` structure definitions
    + Added `base_function_name` member to SqlFunctionCall for storing function name specified by the user in addition to the fully qualified M function name
    + Corrected comment for `SqlColumnList` to account for previously unaccounted for types of `SqlStatements`
    
    `run_query.c` was updated to handle both the `create_function_STATEMENT` and `drop_function_STATEMENT` cases by copying and modifying the code for the `create_table_STATEMENT` and `drop_table_STATEMENT` cases. The relevant modifications were to change the nodes used for storing the various representations of the function information and add corresponding modules to perform this storage:
    + Added new `emit_create_function.c` module to create SQL text string representation of `CREATE FUNCTION` statements
    + Revised `qualify_function_name.c` to treat SQL standard and user-defined functions the same
    + Removed code for handling inline M functions from `qualify_function_name.c`, as they are no longer allowed
    + Added logic for compression of `SqlFunction` statements to `compress_statement.c`
    + Revised `find_table.c` and `cleanup_tables.c` to use new LVN structure for storing, retrieving, and disposing of loaded schemas (i.e. `^%ydboctoloadedschemas("tables")`)
    + Added new `store_function_in_pg_proc` function for persisting function definitions in the catalog and matching `delete_function_from_pg_proc` for cleanup
    + Revised `drop_table_from_local_cache.c` to more generic `drop_schema_from_local_cache.c`
    + Added `CREATE FUNCTION`-related cases to `decompress_statement.c` and added check for new case of `data_type_STATEMENT`s in `compress_statement.c` and `decompress_statement.c`
    
    The module that actually implements the type checking functionality described by issue #345 is `populate_data_type.c`. Now, this module unpacks the various components of a function call and checks them against a stored function registered by a previously issued `CREATE FUNCTION` statement. To support this:
    + Added a new `find_function.c` module was added for looking binary function representations, which is now called in the `function_call_STATEMENT` case in `populate_data_type`
    + Added a new `get_sqlvaluetype_from_sqldatatype` function for converting `SqlDataType`s to `SqlValueType`s.
    + Added a new `ERR_FUNCTION_PARAMETER_TYPE_MISMATCH` error message
    
    To test these changes, `test_create_function.bats.in` and `test_drop_function.bats.in` were created, with corresponding fixtures and outrefs. Also, `TERR021` was added to `test_errors.bats.in` to validate the new error cases introduced.
    
    Among the new tests was is `TCF010`, which validates handling of various function call return types. This revealed an issue with the handling of `BOOLEAN` return types, wherein functions that returned `t`/`f` or `true`/`false` would be treated as always false in the surrounding query. This was due to a lack of string conversion performed on the function's return value. To correct this, a number of changes were required:
    + The `SqlFunctionCall` structure and logical plan for `LP_FUNCTION_CALL`s were modified to track the return type for the relevant function, as specified by a preceding `CREATE FUNCTION` call
    + Added logic to `lp_generate_where.c` to add an additional node (of `LP_ACTION_TYPE` `LP_VALUE`) to `LP_FUNCTION_CALL` plans for holding the return type
    + Updated `lp_emit_plan`, `generate_physical_plan.c`, `lp_verify_structure`, and `lp_replace_derived_table_references` to account for the new logical plan node introduced by the above
    + Revised the `LP_FUNCTION_CALL` case in `tmpl_print_expression.ctemplate` to extract the return type for the function and, if it is a `BOOLEAN` type, wrap the return value with the `$$String2Boolean` M function
    
    Additionally, a number of tests were updated to account for the new `CREATE FUNCTION` syntax and removal of inline M functions:
    + `test_cancel_request.bats.in`
    + `test_coerce_type.bats.in`
    + `test_composite_key.bats.in`
    + `test_group_by.bats.in`
    + `test_select_columns.bats.in`
    + `test_where.bats.in`
    
    Note particularly that `TSC16` was revised to use a function that supports `VARCHAR` arguments in order to prevent type mismatch error that is caught with these changes. Also, a redundant query was removed from the `TSC16.sql` fixture.
    
    Similarly, a number of outrefs were also revised:
    + `T0023.ref`
    + `TCT007.ref`
    + `TCT008.ref`
    + `TCT010.ref`
    + `TCT011.ref`
    + `TEO03.ref`
    + `TERR010.ref`
    + `TJC003.ref`
    + `TML010.ref`
    + `TML011.ref`
    + `TMU02.ref`
    + `TOB06.ref`
    + `TOB10.ref`
    + `TSC10.ref`
    + `TSC12.ref`
    + `TSC13.ref`
    + `TSC16.ref`
    + `TSQ004.ref`
    + `TSQ006.ref`
    + `TSQ009.ref`
    + `TSSCQ20.ref`
    + `TWO14.ref`
    + `TWO17.ref`
    + `TWO18.ref`
    + `TWO19.ref`
    
    Also, Octo's grammar documentation has been updated with new entries for `CREATE FUNCTION` and `DROP FUNCTION`.
    
    Miscellaneous changes:
    + Fixed erroneous lexer rules (acceptance of `$` at start of `IDENTIFIER` and of `%` in middle of M names)
    + Added TIMEOUT_10_SEC macro to `run_query.c`
    + Revised `ERR_ROCTO_INVALID_NUMBER` to more generic `ERR_INVALID_NUMBER`
    + Replaced `postgres-seed.*` files with `octo-seed.*` files and updated various references accordingly
    + Updated docs with explanation on how to get type names for function parameters and return types
    + Revised `ERR_INVALID_INPUT_SYNTAX_BOOL` message
    ce30c514
    [#345][#503] Implement `CREATE FUNCTION`/`DROP FUNCTION` including support for...
    Narayanan Iyer authored and Jon Badiali's avatar Jon Badiali committed
    [#345][#503] Implement `CREATE FUNCTION`/`DROP FUNCTION` including support for parameter and return value type specifications in user-defined functions
    
    Previously, Octo did not support parameter or return value type specifications for user-defined (or internally defined) functions. This was due to the fact that user-defined M functions were to be called inline in SQL queries, which amounted to these function calls being literally embedded in physical plans with no Octo/SQL level validation.
    
    This commit revises this behavior so that Octo can track and enforce type constraints on user-defined function parameters and return values. This was accomplished by implementing the previously absent `CREATE FUNCTION` syntax.
    
    As part of this implementation, inline M function calls have been disallowed. This changes the previous behavior such that queries like `SELECT $$^USERFUNC FROM names;` will no longer work. If a user attempts such a call, they will now see a new `ERR_M_CALL` (formerly `ERR_ROCTO_M_CALL`) error message and their query will not be executed.
    
    Instead, users must now map a SQL function definition to an M function via the `CREATE FUNCTION` interface. Without such a mapping, no M function can be called in either Octo or Rocto. This holds for both intrinsic and extrinsic functions. Consequently, a number of hard-coded function entries in `postgres-seed.zwr` were removed and replaced by `CREATE FUNCTION` statements in `postgres-seed.sql`.
    
    This implementation entailed revisions to a number of modules, including the lexer, the parser, and both the logical and physical planning stages, as described below.
    
    The changes to `lexer.l` included:
    + New `FUNCTION` and `RETURNS` tokens
    + Support for `%` as a leading character to M label and routine names
    + Case sensitivity for M extrinsic functions
    + Support for intrinsic functions (via a new `INTRINSIC_FUNCTION` parser token)
    
    The changes to the parser span `parser.y` and `drop.y`:
    + Added parser rules for `CREATE FUNCTION` and `DROP FUNCTION`, (i.e. `drop_function_statement` and `function_definition`)
    + Added a new `m_function` parser rule that handles both `INTRINSIC_FUNCTION` and `EXTRINSIC_FUNCTION` tokens
    + Modified the `generic_function_call` rule to maintain the unqualified function name for use when loading the binary representation of the `CREATE FUNCTION` parse tree
    + Disallowed inline M function calls for both `INTRINSIC_FUNCTION` and `EXTRINSIC_FUNCTION`
    
    To support the new functionality, a number of struct and type changes were made in `octo_types.h`:
    + Renamed the `table_STATEMENT` `SqlStatementType` to `create_table_STATEMENT` and a new `create_function_STATEMENT` type
    + Renamed `SqlDropStatement` renamed to `SqlDropTableStatement` and added a new `SqlDropFunctionStatement` definition along with `drop_function_STATEMENT` type
    + Added `SqlFunction` and `SqlParameterTypeList` structure definitions
    + Added `base_function_name` member to SqlFunctionCall for storing function name specified by the user in addition to the fully qualified M function name
    + Corrected comment for `SqlColumnList` to account for previously unaccounted for types of `SqlStatements`
    
    `run_query.c` was updated to handle both the `create_function_STATEMENT` and `drop_function_STATEMENT` cases by copying and modifying the code for the `create_table_STATEMENT` and `drop_table_STATEMENT` cases. The relevant modifications were to change the nodes used for storing the various representations of the function information and add corresponding modules to perform this storage:
    + Added new `emit_create_function.c` module to create SQL text string representation of `CREATE FUNCTION` statements
    + Revised `qualify_function_name.c` to treat SQL standard and user-defined functions the same
    + Removed code for handling inline M functions from `qualify_function_name.c`, as they are no longer allowed
    + Added logic for compression of `SqlFunction` statements to `compress_statement.c`
    + Revised `find_table.c` and `cleanup_tables.c` to use new LVN structure for storing, retrieving, and disposing of loaded schemas (i.e. `^%ydboctoloadedschemas("tables")`)
    + Added new `store_function_in_pg_proc` function for persisting function definitions in the catalog and matching `delete_function_from_pg_proc` for cleanup
    + Revised `drop_table_from_local_cache.c` to more generic `drop_schema_from_local_cache.c`
    + Added `CREATE FUNCTION`-related cases to `decompress_statement.c` and added check for new case of `data_type_STATEMENT`s in `compress_statement.c` and `decompress_statement.c`
    
    The module that actually implements the type checking functionality described by issue #345 is `populate_data_type.c`. Now, this module unpacks the various components of a function call and checks them against a stored function registered by a previously issued `CREATE FUNCTION` statement. To support this:
    + Added a new `find_function.c` module was added for looking binary function representations, which is now called in the `function_call_STATEMENT` case in `populate_data_type`
    + Added a new `get_sqlvaluetype_from_sqldatatype` function for converting `SqlDataType`s to `SqlValueType`s.
    + Added a new `ERR_FUNCTION_PARAMETER_TYPE_MISMATCH` error message
    
    To test these changes, `test_create_function.bats.in` and `test_drop_function.bats.in` were created, with corresponding fixtures and outrefs. Also, `TERR021` was added to `test_errors.bats.in` to validate the new error cases introduced.
    
    Among the new tests was is `TCF010`, which validates handling of various function call return types. This revealed an issue with the handling of `BOOLEAN` return types, wherein functions that returned `t`/`f` or `true`/`false` would be treated as always false in the surrounding query. This was due to a lack of string conversion performed on the function's return value. To correct this, a number of changes were required:
    + The `SqlFunctionCall` structure and logical plan for `LP_FUNCTION_CALL`s were modified to track the return type for the relevant function, as specified by a preceding `CREATE FUNCTION` call
    + Added logic to `lp_generate_where.c` to add an additional node (of `LP_ACTION_TYPE` `LP_VALUE`) to `LP_FUNCTION_CALL` plans for holding the return type
    + Updated `lp_emit_plan`, `generate_physical_plan.c`, `lp_verify_structure`, and `lp_replace_derived_table_references` to account for the new logical plan node introduced by the above
    + Revised the `LP_FUNCTION_CALL` case in `tmpl_print_expression.ctemplate` to extract the return type for the function and, if it is a `BOOLEAN` type, wrap the return value with the `$$String2Boolean` M function
    
    Additionally, a number of tests were updated to account for the new `CREATE FUNCTION` syntax and removal of inline M functions:
    + `test_cancel_request.bats.in`
    + `test_coerce_type.bats.in`
    + `test_composite_key.bats.in`
    + `test_group_by.bats.in`
    + `test_select_columns.bats.in`
    + `test_where.bats.in`
    
    Note particularly that `TSC16` was revised to use a function that supports `VARCHAR` arguments in order to prevent type mismatch error that is caught with these changes. Also, a redundant query was removed from the `TSC16.sql` fixture.
    
    Similarly, a number of outrefs were also revised:
    + `T0023.ref`
    + `TCT007.ref`
    + `TCT008.ref`
    + `TCT010.ref`
    + `TCT011.ref`
    + `TEO03.ref`
    + `TERR010.ref`
    + `TJC003.ref`
    + `TML010.ref`
    + `TML011.ref`
    + `TMU02.ref`
    + `TOB06.ref`
    + `TOB10.ref`
    + `TSC10.ref`
    + `TSC12.ref`
    + `TSC13.ref`
    + `TSC16.ref`
    + `TSQ004.ref`
    + `TSQ006.ref`
    + `TSQ009.ref`
    + `TSSCQ20.ref`
    + `TWO14.ref`
    + `TWO17.ref`
    + `TWO18.ref`
    + `TWO19.ref`
    
    Also, Octo's grammar documentation has been updated with new entries for `CREATE FUNCTION` and `DROP FUNCTION`.
    
    Miscellaneous changes:
    + Fixed erroneous lexer rules (acceptance of `$` at start of `IDENTIFIER` and of `%` in middle of M names)
    + Added TIMEOUT_10_SEC macro to `run_query.c`
    + Revised `ERR_ROCTO_INVALID_NUMBER` to more generic `ERR_INVALID_NUMBER`
    + Replaced `postgres-seed.*` files with `octo-seed.*` files and updated various references accordingly
    + Updated docs with explanation on how to get type names for function parameters and return types
    + Revised `ERR_INVALID_INPUT_SYNTAX_BOOL` message
Loading