File size: 14,125 Bytes
f56a29b | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 | # CJS Module Lexer
[![Build Status][travis-image]][travis-url]
A [very fast](#benchmarks) JS CommonJS module syntax lexer used to detect the most likely list of named exports of a CommonJS module.
Outputs the list of named exports (`exports.name = ...`) and possible module reexports (`module.exports = require('...')`), including the common transpiler variations of these cases.
Forked from https://github.com/guybedford/es-module-lexer.
_Comprehensively handles the JS language grammar while remaining small and fast. - ~90ms per MB of JS cold and ~15ms per MB of JS warm, [see benchmarks](#benchmarks) for more info._
### Project Status
This project is used in Node.js core for detecting the named exports available when importing a CJS module into ESM, and is maintained for this purpose.
PRs will be accepted and upstreamed for parser bugs, performance improvements or new syntax support only.
_Detection patterns for this project are **frozen**_. This is because adding any new export detection patterns would result in fragmented backwards-compatibility. Specifically, it would be very difficult to figure out why an ES module named export for CommonJS might work in newer Node.js versions but not older versions. This problem would only be discovered downstream of module authors, with the fix for module authors being to then have to understand which patterns in this project provide full backwards-compatibily. Rather, by fully freezing the detected patterns, if it works in any Node.js version it will work in any other. Build tools can also reliably treat the supported syntax for this project as a part of their output target for ensuring syntax support.
### Usage
```
npm install cjs-module-lexer
```
For use in CommonJS:
```js
const { parse } = require('cjs-module-lexer');
// `init` return a promise for parity with the ESM API, but you do not have to call it
const { exports, reexports } = parse(`
// named exports detection
module.exports.a = 'a';
(function () {
exports.b = 'b';
})();
Object.defineProperty(exports, 'c', { value: 'c' });
/* exports.d = 'not detected'; */
// reexports detection
if (maybe) module.exports = require('./dep1.js');
if (another) module.exports = require('./dep2.js');
// literal exports assignments
module.exports = { a, b: c, d, 'e': f }
// __esModule detection
Object.defineProperty(module.exports, '__esModule', { value: true })
`);
// exports === ['a', 'b', 'c', '__esModule']
// reexports === ['./dep1.js', './dep2.js']
```
When using the ESM version, Wasm is supported instead:
```js
import { parse, init } from 'cjs-module-lexer';
// init() needs to be called and waited upon, or use initSync() to compile
// Wasm blockingly and synchronously.
await init();
const { exports, reexports } = parse(source);
```
The Wasm build is around 1.5x faster and without a cold start.
### Grammar
CommonJS exports matches are run against the source token stream.
The token grammar is:
```
IDENTIFIER: As defined by ECMA-262, without support for identifier `\` escapes, filtered to remove strict reserved words:
"implements", "interface", "let", "package", "private", "protected", "public", "static", "yield", "enum"
STRING_LITERAL: A `"` or `'` bounded ECMA-262 string literal.
MODULE_EXPORTS: `module` `.` `exports`
EXPORTS_IDENTIFIER: MODULE_EXPORTS_IDENTIFIER | `exports`
EXPORTS_DOT_ASSIGN: EXPORTS_IDENTIFIER `.` IDENTIFIER `=`
EXPORTS_LITERAL_COMPUTED_ASSIGN: EXPORTS_IDENTIFIER `[` STRING_LITERAL `]` `=`
EXPORTS_LITERAL_PROP: (IDENTIFIER (`:` IDENTIFIER)?) | (STRING_LITERAL `:` IDENTIFIER)
EXPORTS_SPREAD: `...` (IDENTIFIER | REQUIRE)
EXPORTS_MEMBER: EXPORTS_DOT_ASSIGN | EXPORTS_LITERAL_COMPUTED_ASSIGN
EXPORTS_DEFINE: `Object` `.` `defineProperty `(` EXPORTS_IDENFITIER `,` STRING_LITERAL
EXPORTS_DEFINE_VALUE: EXPORTS_DEFINE `, {`
(`enumerable: true,`)?
(
`value:` |
`get` (`: function` IDENTIFIER? )? `() {` return IDENTIFIER (`.` IDENTIFIER | `[` STRING_LITERAL `]`)? `;`? `}` `,`?
)
`})`
EXPORTS_LITERAL: MODULE_EXPORTS `=` `{` (EXPORTS_LITERAL_PROP | EXPORTS_SPREAD) `,`)+ `}`
REQUIRE: `require` `(` STRING_LITERAL `)`
EXPORTS_ASSIGN: (`var` | `const` | `let`) IDENTIFIER `=` (`_interopRequireWildcard (`)? REQUIRE
MODULE_EXPORTS_ASSIGN: MODULE_EXPORTS `=` REQUIRE
EXPORT_STAR: (`__export` | `__exportStar`) `(` REQUIRE
EXPORT_STAR_LIB: `Object.keys(` IDENTIFIER$1 `).forEach(function (` IDENTIFIER$2 `) {`
(
(
`if (` IDENTIFIER$2 `===` ( `'default'` | `"default"` ) `||` IDENTIFIER$2 `===` ( '__esModule' | `"__esModule"` ) `) return` `;`?
(
(`if (Object` `.prototype`? `.hasOwnProperty.call(` IDENTIFIER `, ` IDENTIFIER$2 `)) return` `;`?)?
(`if (` IDENTIFIER$2 `in` EXPORTS_IDENTIFIER `&&` EXPORTS_IDENTIFIER `[` IDENTIFIER$2 `] ===` IDENTIFIER$1 `[` IDENTIFIER$2 `]) return` `;`)?
)?
) |
`if (` IDENTIFIER$2 `!==` ( `'default'` | `"default"` ) (`&& !` (`Object` `.prototype`? `.hasOwnProperty.call(` IDENTIFIER `, ` IDENTIFIER$2 `)` | IDENTIFIER `.hasOwnProperty(` IDENTIFIER$2 `)`))? `)`
)
(
EXPORTS_IDENTIFIER `[` IDENTIFIER$2 `] =` IDENTIFIER$1 `[` IDENTIFIER$2 `]` `;`? |
`Object.defineProperty(` EXPORTS_IDENTIFIER `, ` IDENTIFIER$2 `, { enumerable: true, get` (`: function` IDENTIFIER? )? `() { return ` IDENTIFIER$1 `[` IDENTIFIER$2 `]` `;`? `}` `,`? `})` `;`?
)
`})`
```
Spacing between tokens is taken to be any ECMA-262 whitespace, ECMA-262 block comment or ECMA-262 line comment.
* The returned export names are taken to be the combination of:
1. All `IDENTIFIER` and `STRING_LITERAL` slots for `EXPORTS_MEMBER` and `EXPORTS_LITERAL` matches.
2. The first `STRING_LITERAL` slot for all `EXPORTS_DEFINE_VALUE` matches where that same string is not an `EXPORTS_DEFINE` match that is not also an `EXPORTS_DEFINE_VALUE` match.
* The reexport specifiers are taken to be the combination of:
1. The `REQUIRE` matches of the last matched of either `MODULE_EXPORTS_ASSIGN` or `EXPORTS_LITERAL`.
2. All _top-level_ `EXPORT_STAR` `REQUIRE` matches and `EXPORTS_ASSIGN` matches whose `IDENTIFIER` also matches the first `IDENTIFIER` in `EXPORT_STAR_LIB`.
### Parsing Examples
#### Named Exports Parsing
The basic matching rules for named exports are `exports.name`, `exports['name']` or `Object.defineProperty(exports, 'name', ...)`. This matching is done without scope analysis and regardless of the expression position:
```js
// DETECTS EXPORTS: a, b
(function (exports) {
exports.a = 'a';
exports['b'] = 'b';
})(exports);
```
Because there is no scope analysis, the above detection may overclassify:
```js
// DETECTS EXPORTS: a, b, c
(function (exports, Object) {
exports.a = 'a';
exports['b'] = 'b';
if (false)
exports.c = 'c';
})(NOT_EXPORTS, NOT_OBJECT);
```
It will in turn underclassify in cases where the identifiers are renamed:
```js
// DETECTS: NO EXPORTS
(function (e) {
e.a = 'a';
e['b'] = 'b';
})(exports);
```
#### Getter Exports Parsing
`Object.defineProperty` is detected for specifically value and getter forms returning an identifier or member expression:
```js
// DETECTS: a, b, c, d, __esModule
Object.defineProperty(exports, 'a', {
enumerable: true,
get: function () {
return q.p;
}
});
Object.defineProperty(exports, 'b', {
enumerable: true,
get: function () {
return q['p'];
}
});
Object.defineProperty(exports, 'c', {
enumerable: true,
get () {
return b;
}
});
Object.defineProperty(exports, 'd', { value: 'd' });
Object.defineProperty(exports, '__esModule', { value: true });
```
Value properties are also detected specifically:
```js
Object.defineProperty(exports, 'a', {
value: 'no problem'
});
```
To avoid matching getters that have side effects, any getter for an export name that does not support the forms above will
opt-out of the getter matching:
```js
// DETECTS: NO EXPORTS
Object.defineProperty(exports, 'a', {
get () {
return 'nope';
}
});
if (false) {
Object.defineProperty(module.exports, 'a', {
get () {
return dynamic();
}
})
}
```
Alternative object definition structures or getter function bodies are not detected:
```js
// DETECTS: NO EXPORTS
Object.defineProperty(exports, 'a', {
enumerable: false,
get () {
return p;
}
});
Object.defineProperty(exports, 'b', {
configurable: true,
get () {
return p;
}
});
Object.defineProperty(exports, 'c', {
get: () => p
});
Object.defineProperty(exports, 'd', {
enumerable: true,
get: function () {
return dynamic();
}
});
Object.defineProperty(exports, 'e', {
enumerable: true,
get () {
return 'str';
}
});
```
`Object.defineProperties` is also not supported.
#### Exports Object Assignment
A best-effort is made to detect `module.exports` object assignments, but because this is not a full parser, arbitrary expressions are not handled in the
object parsing process.
Simple object definitions are supported:
```js
// DETECTS EXPORTS: a, b, c
module.exports = {
a,
'b': b,
c: c,
...d
};
```
Object properties that are not identifiers or string expressions will bail out of the object detection, while spreads are ignored:
```js
// DETECTS EXPORTS: a, b
module.exports = {
a,
...d,
b: require('c'),
c: "not detected since require('c') above bails the object detection"
}
```
`Object.defineProperties` is not currently supported either.
#### module.exports reexport assignment
Any `module.exports = require('mod')` assignment is detected as a reexport, but only the last one is returned:
```js
// DETECTS REEXPORTS: c
module.exports = require('a');
(module => module.exports = require('b'))(NOT_MODULE);
if (false) module.exports = require('c');
```
This is to avoid over-classification in Webpack bundles with externals which include `module.exports = require('external')` in their source for every external dependency.
In exports object assignment, any spread of `require()` are detected as multiple separate reexports:
```js
// DETECTS REEXPORTS: a, b
module.exports = require('ignored');
module.exports = {
...require('a'),
...require('b')
};
```
#### Transpiler Re-exports
For named exports, transpiler output works well with the rules described above.
But for star re-exports, special care is taken to support common patterns of transpiler outputs from Babel and TypeScript as well as bundlers like RollupJS.
These reexport and star reexport patterns are restricted to only be detected at the top-level as provided by the direct output of these tools.
For example, `export * from 'external'` is output by Babel as:
```js
"use strict";
exports.__esModule = true;
var _external = require("external");
Object.keys(_external).forEach(function (key) {
if (key === "default" || key === "__esModule") return;
exports[key] = _external[key];
});
```
Where the `var _external = require("external")` is specifically detected as well as the `Object.keys(_external)` statement, down to the exact
for of that entire expression including minor variations of the output. The `_external` and `key` identifiers are carefully matched in this
detection.
Similarly for TypeScript, `export * from 'external'` is output as:
```js
"use strict";
function __export(m) {
for (var p in m) if (!exports.hasOwnProperty(p)) exports[p] = m[p];
}
Object.defineProperty(exports, "__esModule", { value: true });
__export(require("external"));
```
Where the `__export(require("external"))` statement is explicitly detected as a reexport, including variations `tslib.__export` and `__exportStar`.
### Environment Support
Node.js 10+, and [all browsers with Web Assembly support](https://caniuse.com/#feat=wasm).
### JS Grammar Support
* Token state parses all line comments, block comments, strings, template strings, blocks, parens and punctuators.
* Division operator / regex token ambiguity is handled via backtracking checks against punctuator prefixes, including closing brace or paren backtracking.
* Always correctly parses valid JS source, but may parse invalid JS source without errors.
### Benchmarks
Benchmarks can be run with `npm run bench`.
Current results:
JS Build:
```
Module load time
> 4ms
Cold Run, All Samples
test/samples/*.js (3635 KiB)
> 299ms
Warm Runs (average of 25 runs)
test/samples/angular.js (1410 KiB)
> 13.96ms
test/samples/angular.min.js (303 KiB)
> 4.72ms
test/samples/d3.js (553 KiB)
> 6.76ms
test/samples/d3.min.js (250 KiB)
> 4ms
test/samples/magic-string.js (34 KiB)
> 0.64ms
test/samples/magic-string.min.js (20 KiB)
> 0ms
test/samples/rollup.js (698 KiB)
> 8.48ms
test/samples/rollup.min.js (367 KiB)
> 5.36ms
Warm Runs, All Samples (average of 25 runs)
test/samples/*.js (3635 KiB)
> 40.28ms
```
Wasm Build:
```
Module load time
> 10ms
Cold Run, All Samples
test/samples/*.js (3635 KiB)
> 43ms
Warm Runs (average of 25 runs)
test/samples/angular.js (1410 KiB)
> 9.32ms
test/samples/angular.min.js (303 KiB)
> 3.16ms
test/samples/d3.js (553 KiB)
> 5ms
test/samples/d3.min.js (250 KiB)
> 2.32ms
test/samples/magic-string.js (34 KiB)
> 0.16ms
test/samples/magic-string.min.js (20 KiB)
> 0ms
test/samples/rollup.js (698 KiB)
> 6.28ms
test/samples/rollup.min.js (367 KiB)
> 3.6ms
Warm Runs, All Samples (average of 25 runs)
test/samples/*.js (3635 KiB)
> 27.76ms
```
### Wasm Build Steps
To build download the WASI SDK from https://github.com/WebAssembly/wasi-sdk/releases.
The Makefile assumes the existence of "wasi-sdk-11.0" and "wabt" (optional) as sibling folders to this project.
The build through the Makefile is then run via `make lib/lexer.wasm`, which can also be triggered via `npm run build-wasm` to create `dist/lexer.js`.
On Windows it may be preferable to use the Linux subsystem.
After the Web Assembly build, the CJS build can be triggered via `npm run build`.
Optimization passes are run with [Binaryen](https://github.com/WebAssembly/binaryen) prior to publish to reduce the Web Assembly footprint.
### License
MIT
[travis-url]: https://travis-ci.org/guybedford/es-module-lexer
[travis-image]: https://travis-ci.org/guybedford/es-module-lexer.svg?branch=master
|