Extend ClickHouse::Client to support statement logging
When implementing ClickHouse-based features it will be important to expose the executed SQL statements in the logs so we can monitor and optionally debug performance issues.
Currently, the ClickHouse::Client
accepts a raw SQL string which will be sent over HTTP to the ClickHouse server. Example:
ClickHouse::Client.select('select * from issues', :main)
For logging, a subscriber can listen to the sql.click_house
events to get the executed SQL
queries:
ActiveSupport::Notifications.subscribe('sql.click_house') do |_, _, _, _, data|
puts data[:query] # send query to the logging system
end
Concern: the logged SQL statements might contain customer-specific sensitive parameters.
Securing SQL queries
The proposed solution would address two problems:
- Reduce the likelihood of SQL injections by having a well-defined place to do the sanitization (
placeholders
hash) - Ability to generate the normalized version of the query where placeholders are masked. For PostgreSQL this is done by the
pg_query
library. Unfortunately, it's not working for ClickHouse.
Let's introduce a ClickHouse::Query
object which can represent a SQL query where placeholder interpolation is delayed until the last point before sending the data over to ClickHouse.
class ClickHouse::Query
def initialize(query:, placeholders:)
end
def to_sql
query % placeholders
end
def to_redacted_sql
query # no placeholder replacement or use incrementing $x values
end
end
Example invocation:
query = "SELECT * FROM issues WHERE id > %{id} AND title LIKE %{title}"
# Always escape the values
placeholders = {
id: Integer(1),
title: quote('foo%')
}
query = ClickHouse::Query.new(query: query, placeholders: placeholders)
ClickHouse::Client.select(query, :main)
The same behavior could be built into the ClickHouse::QueryBuilder
class which would return a ClickHouse::Query
object.