Parsing SQLs

Varsha Gupta
In Data Analysis, SQLs are inevitable. Right business rules, the right business transformation logic, and, thereby, the right business decision depends on efficient SQLs. Validating and processing large complex SQLs with the aid of SQL parsing tools or libraries must go hand in hand with SQLs and insight automation.
For software to parse SQLs, what would be the most efficient and cost-effective approach? Writing custom-built parsers using regex? Using commercial SQL parsers? Using open-source SQL parsers?
- Custom-built parsers take a higher amount of time to roll out. It has a higher maintenance cost. And it will not cover most of the SQL scenarios. Every time a new scenario comes up, the developer will end up coding again.
- Commercial parsers are usually expensive. Most of the commercial library-software are not available for any modifications based on custom requirements. That impacts the flexibility of the product.
What is left are open-source SQL parsers viz. Apache Calcite, ANTLR, JSqlParser.
- Apache Calcite is used in popular databases and query engines viz. Apache Hive, Drill, Flink, Phoenix. Calcite requires more computing resources and coding effort to incorporate into software and run in production.
- ANTLR is widely used to build tools and frameworks. However, ANTLR also requires higher coding effort. The computing resource requirement is less compared to that of Calcite.
- JSqlParser is an SQL statement parser, it can parse multiple SQL dialects viz. MySQL, Postgres, Oracle. For E.g. it supports Oracle’s JOIN syntax using (+), PostgreSQLs cast syntax using ::, relational operators like != and so on. The results can be accessed in a structured way.
The grammar can be modified to support other SQL dialects such as that of hive. JSqlParser is lightweight and can
- Validate SQL Queries
- Check Semantics and Syntax
- Auto-map columns to respective entity/table in all SQL statements.
- Extract SQL details from DDL and DML statements.
JSqlParser is compatible with java and allows one to extend/customize the library to handle complex SQLs. It parses an SQL statement and translates it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor Pattern. JSqlParser is built using JavaCC. The core JavaCC grammar for SQL has been taken from Guido Draheim’s site and has been changed to produce a hierarchy of Java classes.
JSqlParser completely supports the following features
- Query parser, validator, and optimizer.
- Support of multiple standard SQL and aggregate functions.
- SELECT, FROM (including JOIN syntax), WHERE, GROUP BY (including GROUPING SETS), aggregate functions (including COUNT(DISTINCT …) and FILTER), HAVING, ORDER BY (including NULLS FIRST/LAST), set operations (INTERSECT, MINUS), sub-queries (including correlated sub-queries), windowed aggregates, LIMIT (syntax as Postgres);
- DELETE, INSERT and INSERT-INTO statements.
- CREATE Table statement, add Primary key, SET the index.
JSqlParser can be extended to support some specific scenarios.
Example1:
In a SELECT query which has UNION operator, combine same fields and show that field in one column. By default, JSqlParser will show it as two or more columns – depending on the number of UNION operator.
SELECT AccountNumber FROM Customer_Table
UNION
SELECT AccountNumber FROM Billing_Table
ORDER BY AccountNumber;
Expected output field: AccountNumber
JSQL parser default output: AccountNumber, AccountNumber
Example2:
In the Input SELECT statement, the new column can be added using JSqlParser for a single select expression. Multi-SELECT expression is not supported viz. query with UNION is not supported
Select select = (Select) CCJSqlParserUtil.parse(” SELECT City FROM Customers”);
SelectUtils.addExpression(select, new Column(” State “));
In the above, a New column State is added. But same cannot achieve for query that has UNION.
SELECT City FROM Customers UNION SELECT City FROM Suppliers ORDER BY City;
In both the above examples, extension of JSqlParser is required.
To summarize, JSqlParser is an effective lightweight SQL parsing that covers a wide range of SQL cases. It supports extension and can be useful in ones’ development effort.
Relevant Blogs
Analytics Tools Vs Revenue Assurance Platform
Kiran MohanWe have an analytics tool, why do I need a revenue assurance platform? More or less, that’s what we hear from many conversations we have. I don’t blame them as we are comparing apples vs oranges. The concept of revenue assurance is new to the subscription...
Impact of Technology Silos on Revenue Leakage
Kiran MohanThe current subscription landscape relies on various technology systems such as CRM, sales force automation, billing systems, financial systems, and payment gateways to manage their operations and drive revenue growth. However, a significant challenge that...
xfactrs Launch at the Subscription Insider Show
Ravin Checker and team has launched xfactrs - the most awaited decision intelligence platform for subscription businesses. The event took place at the Subscription Insider Show 22. Here is the transcript from Ravin's presentation. .... Drip. Drip. Drip. That is the...