Extend ClickHouse::Client to support statement logging

When implementing ClickHouse-based features it will be important to expose the executed SQL statements in the logs so we can monitor and optionally debug performance issues.

Currently, the ClickHouse::Client accepts a raw SQL string which will be sent over HTTP to the ClickHouse server. Example:

ClickHouse::Client.select('select * from issues', :main)

For logging, a subscriber can listen to the sql.click_house events to get the executed SQL queries:

ActiveSupport::Notifications.subscribe('sql.click_house') do |_, _, _, _, data| 
  puts data[:query] # send query to the logging system
end

Concern: the logged SQL statements might contain customer-specific sensitive parameters.

Securing SQL queries

The proposed solution would address two problems:

Reduce the likelihood of SQL injections by having a well-defined place to do the sanitization (placeholders hash)
Ability to generate the normalized version of the query where placeholders are masked. For PostgreSQL this is done by the pg_query library. Unfortunately, it's not working for ClickHouse.

Let's introduce a ClickHouse::Query object which can represent a SQL query where placeholder interpolation is delayed until the last point before sending the data over to ClickHouse.

class ClickHouse::Query
  def initialize(query:, placeholders:)
  end

  def to_sql
    query % placeholders
  end

  def to_redacted_sql
    query # no placeholder replacement or use incrementing $x values
  end
end

Example invocation:

query = "SELECT * FROM issues WHERE id > %{id} AND title LIKE %{title}"

# Always escape the values
placeholders = {
  id: Integer(1),
  title: quote('foo%')
}

query = ClickHouse::Query.new(query: query, placeholders: placeholders)
ClickHouse::Client.select(query, :main)

The same behavior could be built into the ClickHouse::QueryBuilder class which would return a ClickHouse::Query object.

Edited Aug 10, 2023 by Adam Hegyi