Parsing SQLs

Varsha Gupta
In Data Analysis, SQLs are inevitable. Right business rules, the right business transformation logic, and, thereby, the right business decision depends on efficient SQLs. Validating and processing large complex SQLs with the aid of SQL parsing tools or libraries must go hand in hand with SQLs and insight automation.
For software to parse SQLs, what would be the most efficient and cost-effective approach? Writing custom-built parsers using regex? Using commercial SQL parsers? Using open-source SQL parsers?
- Custom-built parsers take a higher amount of time to roll out. It has a higher maintenance cost. And it will not cover most of the SQL scenarios. Every time a new scenario comes up, the developer will end up coding again.
- Commercial parsers are usually expensive. Most of the commercial library-software are not available for any modifications based on custom requirements. That impacts the flexibility of the product.
What is left are open-source SQL parsers viz. Apache Calcite, ANTLR, JSqlParser.
- Apache Calcite is used in popular databases and query engines viz. Apache Hive, Drill, Flink, Phoenix. Calcite requires more computing resources and coding effort to incorporate into software and run in production.
- ANTLR is widely used to build tools and frameworks. However, ANTLR also requires higher coding effort. The computing resource requirement is less compared to that of Calcite.
- JSqlParser is an SQL statement parser, it can parse multiple SQL dialects viz. MySQL, Postgres, Oracle. For E.g. it supports Oracle’s JOIN syntax using (+), PostgreSQLs cast syntax using ::, relational operators like != and so on. The results can be accessed in a structured way.
The grammar can be modified to support other SQL dialects such as that of hive. JSqlParser is lightweight and can
- Validate SQL Queries
- Check Semantics and Syntax
- Auto-map columns to respective entity/table in all SQL statements.
- Extract SQL details from DDL and DML statements.
JSqlParser is compatible with java and allows one to extend/customize the library to handle complex SQLs. It parses an SQL statement and translates it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor Pattern. JSqlParser is built using JavaCC. The core JavaCC grammar for SQL has been taken from Guido Draheim’s site and has been changed to produce a hierarchy of Java classes.
JSqlParser completely supports the following features
- Query parser, validator, and optimizer.
- Support of multiple standard SQL and aggregate functions.
- SELECT, FROM (including JOIN syntax), WHERE, GROUP BY (including GROUPING SETS), aggregate functions (including COUNT(DISTINCT …) and FILTER), HAVING, ORDER BY (including NULLS FIRST/LAST), set operations (INTERSECT, MINUS), sub-queries (including correlated sub-queries), windowed aggregates, LIMIT (syntax as Postgres);
- DELETE, INSERT and INSERT-INTO statements.
- CREATE Table statement, add Primary key, SET the index.
JSqlParser can be extended to support some specific scenarios.
Example1:
In a SELECT query which has UNION operator, combine same fields and show that field in one column. By default, JSqlParser will show it as two or more columns – depending on the number of UNION operator.
SELECT AccountNumber FROM Customer_Table
UNION
SELECT AccountNumber FROM Billing_Table
ORDER BY AccountNumber;
Expected output field: AccountNumber
JSQL parser default output: AccountNumber, AccountNumber
Example2:
In the Input SELECT statement, the new column can be added using JSqlParser for a single select expression. Multi-SELECT expression is not supported viz. query with UNION is not supported
Select select = (Select) CCJSqlParserUtil.parse(” SELECT City FROM Customers”);
SelectUtils.addExpression(select, new Column(” State “));
In the above, a New column State is added. But same cannot achieve for query that has UNION.
SELECT City FROM Customers UNION SELECT City FROM Suppliers ORDER BY City;
In both the above examples, extension of JSqlParser is required.
To summarize, JSqlParser is an effective lightweight SQL parsing that covers a wide range of SQL cases. It supports extension and can be useful in ones’ development effort.
Relevant Blogs
xfactrs Launch at the Subscription Insider Show
Ravin Checker and team has launched xfactrs - the most awaited decision intelligence platform for subscription businesses. The event took place at the Subscription Insider Show 22. Here is the transcript from Ravin's presentation. .... Drip. Drip. Drip. That is the...
Are the subscription businesses flying blind?
Kiran MohanThat might be too harsh a question, but a relevant one given what we see in the industry. So, how is your subscription business doing? Are you making the right decisions to grow your business? Chances are might not, optimally. This is because the current...
Potential Revenue Leakage Issues that can Occur Even Before the Sale
Kiran Mohan, Marketing AssuranceSubscription models offer favorable opportunities for both the providers and customers alike, yet it can be a nightmare owing to its inherent complexity. The nature of subscription businesses is that solutions are often customized to...