On JavaScript's Weirdness
“JavaScript sucks because '0' == 0!” – literally everyone ever
Sure, that part of JavaScript sucks, but every JS setup these days contains a linter that yells at you for code like that.
Instead, I want to talk about some of the weirder quirks of JavaScript — those that are much more insidious than that — the kind of stuff you wouldn't find on r/ProgrammerHumor or a JS tutorial.
All of them can occur in any JavaScript/ECMAScript environment (browser, Node.js, etc.), with or without use strict enabled. (If you're working on legacy projects without strict mode, you should run. And if you don't know where to: Stack Auth is hiring.)
#1. eval is worse than you think
How silly it would be to think that these two are the same:
function a(s) {
  eval("console.log(s)");
}
a("hello"); // prints "hello"

function b(s) {
  const evalButRenamed = eval;
  evalButRenamed("console.log(s)");
}
b("hello"); // Uncaught ReferenceError: s is not defined
The difference is that the former has access to variables in the current scope, whereas the renamed version can only access the global scope.
Why? Turns out that ECMAScript's definition for function calls has a hardcoded special case, which runs a slightly different algorithm when the function invoked is called eval:

I can't stress enough how insane it is to have this hack in the specification for every single function call! It goes without saying that any half-decent JS engine will optimize it, so there is no direct performance penalty, but it certainly makes build tools & engines more complicated. (As an example, this means that (0, eval)(...) differs from eval(...), so minifiers must consider this when removing seemingly dead code. Scary!)
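To see the difference without renaming anything, here's a minimal sketch (the variable names are made up for illustration) contrasting direct eval with the comma-operator form:

```javascript
globalThis.fromGlobal = "global value";

function direct() {
  const local = "local value";
  return eval("local"); // direct eval: sees the local variable
}

function indirect() {
  const local = "local value";
  const result = (0, eval)("fromGlobal"); // indirect eval: global scope only
  // (0, eval)("local") would throw a ReferenceError here
  return result;
}

console.log(direct());   // prints "local value"
console.log(indirect()); // prints "global value"
```

The `(0, eval)` expression evaluates to the same function object, but because the callee isn't literally named eval, the special case in the spec doesn't apply.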
#2. JS loops pretend their variables are captured by value
Yes, the title makes no sense, but you'll see what I mean in just a second. Let's start with an example:
for (let i = 0; i < 3; i++) {
  setTimeout(() => console.log(i));
}
// prints "0 1 2" — as expected

let i = 0;
for (i = 0; i < 3; i++) {
  setTimeout(() => console.log(i));
}
// prints "3 3 3" — what?
Why does it matter where the variable is defined? It's the same variable either way, right?
When you capture variables in a lambda/arrow function, there are two options: by value (copying) or by reference (keeping a pointer to the variable). Some languages, like C++, let you pick:
// C++ code below:
// capture by value
int byValue = 0;
auto func1 = [byValue] { std::cout << byValue << std::endl; };
byValue = 1;
func1();
// prints 0, because the variable's value is copied
// capture by reference
int byReference = 0;
auto func2 = [&byReference] { std::cout << byReference << std::endl; };
byReference = 1;
func2();
// prints 1, because the variable is captured by reference
That said, many high-level languages (JS, C#, Python, …) capture variables by reference:
let byReference = 0;
const func = () => console.log(byReference);
byReference = 1;
func();
// prints 1
More often than not this is what you want, but it's particularly undesirable in loops, where you often need to do something with the iterator variable in a callback function:
// C# code below:
var callbacks = new List<Action>();
for (int i = 0; i < 3; i++) {
    callbacks.Add(() => Console.WriteLine(i));
}
foreach (var cb in callbacks) cb();
// prints "3 3 3" — probably not what you wanted
As a "fix", the ECMAScript standard hacks for-loop variables to have a different behavior, but only if they're defined in the loop header:
for (let i = 0; i < 3; i++) {
  setTimeout(() => {
    console.log(i);
  }, 1000 * i);
}
// prints "0 1 2"

// but it doesn't work if we factor out the loop variable:
let i = 0;
for (i = 0; i < 3; i++) {
  setTimeout(() => {
    console.log(i);
  }, 1000 * i);
}
// prints "3 3 3"
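If you do need a variable declared outside the loop header, the usual workaround is to copy it into a fresh block-scoped binding on each iteration. Sketched here with an array of closures instead of setTimeout, so the output is deterministic:

```javascript
const callbacks = [];
let i = 0;
for (i = 0; i < 3; i++) {
  const captured = i; // a brand-new binding is created each iteration
  callbacks.push(() => captured);
}
console.log(callbacks.map(f => f()).join(" ")); // prints "0 1 2"
```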
I posted about this on Twitter, and a bunch of you told me that this "makes sense" if you understand how for-loops & closures are defined in terms of scope in the ECMAScript standard. That's true, although it's weird in the sense that it really doesn't fit most people's intuition. More precisely, if you want to unroll a for-loop in JavaScript, this would be the spec-compliant way to do it:
// intuitive way to unroll a for-loop (WRONG in JS)
let i = 0;
while (i < 3) {
  // ... for-loop body ...
  i++;
}

// spec-compliant way to unroll a for-loop
let _iteratorVariable = 0;
while (_iteratorVariable < 3) {
  let i = _iteratorVariable;
  // ... for-loop body ...
  i++;
  _iteratorVariable = i;
}
That said, the fact that nearly no one talks about it is a testament to how useful those "hacks" can sometimes be. (TypeScript's type system has plenty of "useful" hacks like these, and I think that's part of why it's so popular despite its complexity — some day I should write a post about that.)
#3. That falsy object
Common knowledge is that there are 8 falsy values in JavaScript: false, +0, -0, NaN, "", null, undefined, and 0n.
Oops, I lied. There's actually a ninth one, and it's an object:
console.log(document.all); // prints HTMLAllCollection [<html>, <head>, ...]
console.log(Boolean(document.all)); // prints false
I almost didn't include this one in this post, because it only affects browsers. But it turns out that it's actually specified in the ECMAScript standard, not in the DOM standard (where you'd usually see browser-specific stuff), so I left it in:

Why? Because on old versions of Internet Explorer, document.getElementById was not available and instead there was a property called document.all, so a lot of code was written like this:
if (document.all) { // IE-specific
  // do something with document.all
} else { // every other browser
  // do something with document.getElementById
}
To be compatible with IE, other browsers then went on to implement document.all too. However, it's much slower than document.getElementById, so those browsers decided that document.all should be falsy, in order to make code like the above take the fast path. Don't we love IE?
#4. Graphemes & string iteration
It's relatively well-known that strings in JavaScript are UTF-16 encoded, which means that there are low- and high-surrogates. Essentially, it means that some characters take up two UTF-16 code units:
const japanese = "𠮷";
console.log(japanese.length); // prints 2
console.log(japanese.charCodeAt(0)); // prints 55362
console.log(japanese.charCodeAt(1)); // prints 57271
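For surrogate pairs alone there's a simple escape hatch: codePointAt and spread/iteration operate on whole code points. A quick sketch, using an escape so the code point is explicit:

```javascript
const japanese = "\u{20BB7}"; // 𠮷 — one code point, two UTF-16 code units
console.log(japanese.length);         // prints 2
console.log(japanese.codePointAt(0)); // prints 134071 (0x20BB7)
console.log([...japanese].length);    // prints 1 — spread iterates code points
```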
Surrogates always come in pairs, never more. So, sensibly, if you have n characters, then String.prototype.length will always be between n and 2n, depending on how many surrogate pairs there are.
But then what's the output of this?
const family = "👨👩👧👦👨👩👧👦"; // two family emojis
console.log(family.length); // prints 23
If you know Unicode well, you'll know that surrogates don't tell the whole story — some characters (particularly emojis) consist of multiple Unicode code points (each of which may be a single UTF-16 code unit, or a surrogate pair).
Now, what if we want to iterate over them?
const family = "👨👩👧👦👨👩👧👦";
let count = 0;
for (const char of family) {
  count++;
}
console.log(count); // prints 15
A different number? Clearly something is off here.
Okay, whatever, the new Intl APIs exist for this purpose and they fix this mess. Right?
const family = "👨👩👧👦👨👩👧👦";
const chars = new Intl.Segmenter().segment(family);
console.log([...chars].length); // prints 1
Still not 2!
Essentially, there are four sensible notions of "string length", and JavaScript mixes them all:
- 23, the number of UTF-16 code units (most string functions, such as .length, .split, etc.)
- 15, the number of Unicode code points (when iterating over strings with for...of)
- 2, the number of display characters (may differ based on your browser's emoji support)
- 1, the number of extended grapheme clusters (Intl.Segmenter)
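Three of the four counts can be reproduced in code (display width depends on the renderer). Here the string is built explicitly from escapes, so the zero-width joiners (U+200D), invisible in source code, are easy to see:

```javascript
const person = ["\u{1F468}", "\u{1F469}", "\u{1F467}", "\u{1F466}"]; // man, woman, girl, boy
const family = person.join("\u200D");           // one family emoji (11 code units)
const twoFamilies = family + "\u200D" + family; // joined by yet another ZWJ

console.log(twoFamilies.length);      // prints 23 (UTF-16 code units)
console.log([...twoFamilies].length); // prints 15 (Unicode code points)

const segments = [...new Intl.Segmenter().segment(twoFamilies)];
console.log(segments.length);         // prints 1 (extended grapheme clusters)
```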
If we paste the string above into a Unicode analyzer, it will make more sense:
UTF-16:  0xD83D 0xDC68 · 0x200D · 0xD83D 0xDC69 · 0x200D · 0xD83D 0xDC67 · 0x200D · 0xD83D 0xDC66 · 0x200D · (the same 11 code units again for the second family)
         └── Man ────┘   ZWJ     └── Woman ──┘   ZWJ     └── Girl ───┘   ZWJ     └── Boy ────┘   ZWJ
Unicode: Man · ZWJ · Woman · ZWJ · Girl · ZWJ · Boy · ZWJ · Man · ZWJ · Woman · ZWJ · Girl · ZWJ · Boy
Display: Family · Family   (the ZWJ between Boy and Man doesn't merge anything visually)
Intl:    one extended grapheme cluster
Essentially, each Unicode code point is exactly one or two UTF-16 code units. Every browser/font has its own rules on how to merge them into display characters, and the extended grapheme cluster algorithm tries to approximate that, but isn't perfect.
If you're curious, Henri Sivonen wrote this excellent blog post on what other languages do, but sadly no solution is perfect because internationalization is a fundamentally hard problem. Although, I guess you can always just get rid of Unicode altogether.
#5. Sparse arrays
You can just repeat commas in arrays to make some of the elements undefined:
const sparse = [1, , , 4];
console.log(sparse[0], sparse[1], sparse[2], sparse[3]); // prints 1 undefined undefined 4
Or not?
const sparse = [1, , , 4];
sparse.forEach(e => console.log(e)); // prints 1 4 — doesn't print undefined
Let's compare it to a normal array:
const dense = [undefined, undefined];
const sparse = [,,];
console.log(dense.length); // prints 2
console.log(sparse.length); // prints 2
console.log(dense); // prints [undefined, undefined]
console.log(sparse); // prints [empty × 2]
console.log(dense.map(x => 123)); // prints [123, 123]
console.log(sparse.map(x => 123)); // prints [empty × 2]
This is called a "sparse array". The easiest way to understand what's going on is using Object.entries:
console.log(Object.entries([1, undefined, undefined, 4]));
// prints [
// ['0', 1],
// ['1', undefined],
// ['2', undefined],
// ['3', 4]
// ]
console.log(Object.entries([1, , , 4]));
// prints [
// ['0', 1],
// ['3', 4]
// ]
JavaScript arrays are really just objects, and array elements are just properties on that object. In a sparse array, some of those properties are missing entirely, which completely messes up a lot of the built-in array methods.
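Since a hole is a missing property rather than an undefined value, you can detect holes directly with the in operator:

```javascript
const sparse = [1, , , 4];
const dense = [1, undefined, undefined, 4];

console.log(1 in sparse); // prints false — index 1 doesn't exist as a property
console.log(1 in dense);  // prints true — the property exists, its value is undefined
console.log(Object.hasOwn(sparse, 2)); // prints false
```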
That said, you probably shouldn't use sparse arrays at all. Unfortunately, the Array constructor creates sparse arrays by default, leading to very unnatural code:
const sparse = new Array(4);
console.log(sparse); // prints [empty × 4]
// this one doesn't work either:
const stillNotDense = new Array(4).map(x => 123);
console.log(stillNotDense); // prints [empty × 4]
// but you need to do this:
const dense = new Array(4).fill(undefined).map(x => 123);
console.log(dense); // prints [123, 123, 123, 123]
// or you could write this:
const alsoDense = Array.from({ length: 4 }, () => 123);
console.log(alsoDense); // prints [123, 123, 123, 123]
If that doesn't convince you, sparse arrays also have absolutely atrocious performance. Just don't use them in your code, ever, and you'll be fine.
#6. Weird ASI quirks
What will this code print? (Hint: It's not 2 1 4 3.)
function f1(a, b, c, d) {
  [a, b] = [b, a]
  [c, d] = [d, c]
  console.log(a, b, c, d)
}
f1(1, 2, 3, 4)
Spoiler
The result is 4 3 3 4.
The fact that I'm omitting semicolons is a good hint at what's going on. There's a fairly complicated algorithm called Automatic Semicolon Insertion (ASI) that uses a bunch of heuristics to guess where semicolons are supposed to go.
[a, b] = [b, a]
[c, d] = [d, c]

// no semicolon is inserted, so ASI interprets this as:
[a, b] = [b, a][c, d] = [d, c]
//             ^ ^
//             | comma operator
//             array access

// assignment is right-associative, so with a = 1, b = 2, c = 3, d = 4
// this evaluates as:
[b, a][(3, 4)] = [4, 3]  // sets index 4 on the temporary array [b, a]
[a, b] = [4, 3]          // the assignment's value then destructures into a and b
The exact mechanics of ASI are out of scope for this post, but in essence: when the parser runs into a syntax error, and there's a newline right before the offending token, it inserts a semicolon there. Hence, if there's no syntax error, it usually won't insert one.
From the perspective of the ECMAScript standardization committee, this rule is quite restrictive. Adding new syntax to the language means that old syntax errors may no longer be syntax errors, but because ASI relies on syntax errors occurring at specific places, every new piece of syntax could potentially break old code. For this reason, there are special so-called restricted productions in the language, which always insert a semicolon if there's a newline, even if the code would be syntactically correct otherwise.
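The best-known restricted production is return: a newline right after it always triggers a semicolon, even though the code would parse fine without one:

```javascript
function f() {
  return      // ASI inserts a semicolon here, ending the statement
    42        // this becomes an unreachable expression statement
}
console.log(f()); // prints undefined, not 42
```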
Et cetera
Here is a list of odd behaviors that I didn't have enough space to write about:
- Anything that has to do with == and !=
- Anything that has to do with type coercion
- Anything that has to do with this
- NaN is not equal to anything, not even itself
- +0 vs. -0
- Anything that has to do with floating-point precision, or otherwise stuff that's covered by IEEE 754
- typeof null is "object"
- Anything that uses non-strict mode or var
- Returning primitive values from constructors
- Prototype pollution
- Array.prototype.sort converting numbers to strings by default
- ...
If you know any quirks I haven't listed here, I'd love if you could let me know on Twitter or Bluesky. And if you haven't yet, check out our blog post on OAuth!