Wilcox Development Solutions Blog

Software Analysis with GToolkit: Finding the Impact of Removing a Library

December 21, 2024

GToolkit provides AST analysis tools for Javascript & others! Let’s see how much we really use a dependency!

The problem with any package used in a Node.js project is knowing where, and exactly how much, it is used. Once we know that, we can estimate the effort and come up with a deprecation plan. Luckily, GToolkit comes to the rescue!

The Desire to Understand Dependencies

Years ago, one of the tools I always added to Javascript-based projects was lodash: the utility belt for Javascript. Sadly, lodash hasn’t exactly been maintained well: it has picked up a number of security vulnerabilities, and its last release was four years ago.

The question: How frequently is it used in our project? From there we can understand how easy it might be to remove.

The project in question is a React codebase built with JS using ES Module syntax (import BLAH from "blah" / export blah). All the source lives in a folder named src.

(Pointing GToolkit to only our source files is important: we don’t want it scanning node_modules.)

GToolkit Approach for Software Analysis

GToolkit provides AST based parsers for a handful of languages, including Javascript, JSX, and Typescript.


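"Build an importer and force every file through the JSX parser"
importer := GtJSImporter new.
importer parserClasses: {JSXParser}.

"Point at the src folder only, so node_modules is never scanned"
rootDirectory := '/Users/rwilcox/PROJECT_FOLDER/src' asFileReference.
importer import: rootDirectory.
resultingModel := importer model.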

The JSX parser is more complete, I’ve found, than the default JSParser, so I force everything to be parsed as JSX. That’s fine: JSX is a superset of Javascript anyway.

Now, interestingly enough, modern React codebases contain a bunch of files and imports that are technically Javascript but that we don’t want (test files, Storybook files), as well as files that are not Javascript at all but that the parser still wants to evaluate as Javascript: CSS, image files, JSON.

So we construct a list of files that excludes all these unwanted items.

codeFiles := resultingModel allFiles.
sourceFiles := codeFiles reject: [:cFile | |suppliers|
    "One boolean per unwanted pattern: true when the file name contains it"
    suppliers := OrderedCollection new.
    suppliers add: (cFile name includesSubstring: '.stories.').
    suppliers add: (cFile name includesSubstring: '.spec').
    suppliers add: (cFile name includesSubstring: '.css').
    suppliers add: (cFile name includesSubstring: '.jpg').
    suppliers add: (cFile name includesSubstring: '.svg').
    suppliers add: (cFile name includesSubstring: '.png').
    suppliers add: (cFile name includesSubstring: '.json').
    suppliers add: (cFile name includesSubstring: '.scss').
    "Reject the file if any pattern matched"
    suppliers anySatisfy: [:includes | includes].
].

This snippet is some clever filtering code: we build a collection of booleans, one per unwanted pattern. Then we check whether any of them is true, which means the current file should not be included in our sourceFiles list.
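The same filter can also be written by keeping the unwanted name fragments in a list and asking whether any of them occurs in the file name. This is only a sketch, not the code used for the analysis: the names excludedFragments, isExcluded, sampleNames, and keptNames are made up for illustration, and plain strings stand in for the importer's file objects so that it runs in any playground.

```smalltalk
"Hypothetical names; plain strings stand in for the importer's file objects."
excludedFragments := #('.stories.' '.spec' '.css' '.jpg' '.svg' '.png' '.json' '.scss').

"Answers true when the name contains any of the unwanted fragments."
isExcluded := [:fileName |
    excludedFragments anySatisfy: [:fragment | fileName includesSubstring: fragment]].

sampleNames := #('Button.jsx' 'Button.stories.jsx' 'Button.spec.js' 'logo.svg' 'api.js').

"Only Button.jsx and api.js survive the filter."
keptNames := sampleNames reject: isExcluded.
```

Against the real model, the reject: block would hand each file's name to the predicate, along the lines of codeFiles reject: [:cFile | isExcluded value: cFile name]. Adding a new pattern then means adding one string to the list.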

sourceFiles now contains all our files, minus specs, Storybook stories, and asset files.

Examining the imports

As much as I like using DuckDB from GToolkit, a simple in-memory structure is enough to examine the usages of lodash in our repo.


packages := Dictionary new.
packageInformation := Dictionary new.

packageInformation at: 'usedFunctions' put: (Dictionary new).
packages at: '"lodash"' put: packageInformation.
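The nested structure is easier to follow with a tiny standalone example. The sketch below uses invented function and file names (and the hypothetical names samplePackages, sampleInfo, and recordUsage) to show the shape being built: each imported function name maps to the collection of files that import it. It uses at:ifAbsentPut: as a compact way to create the file list the first time a function is seen; treat it as an illustration of the data shape rather than as the analysis code itself.

```smalltalk
"Same shape as above; the function and file names are invented for illustration."
samplePackages := Dictionary new.
sampleInfo := Dictionary new.
sampleInfo at: 'usedFunctions' put: Dictionary new.
samplePackages at: '"lodash"' put: sampleInfo.

"Record that a file imports a function, creating the file list on first sight."
recordUsage := [:functionName :fileName |
    ((sampleInfo at: 'usedFunctions')
        at: functionName
        ifAbsentPut: [OrderedCollection new]) add: fileName].

recordUsage value: 'debounce' value: 'src/Search.jsx'.
recordUsage value: 'debounce' value: 'src/Editor.jsx'.
recordUsage value: 'isEmpty' value: 'src/Form.jsx'.

"Look up every file that imports debounce."
debounceFiles := ((samplePackages at: '"lodash"') at: 'usedFunctions') at: 'debounce'.
```

Note that the dictionary key is the string "lodash" with the double quotes included. Judging by how the package name is looked up later, the name read from the AST seems to keep the quotes from the source file, so a codebase that writes its imports with single quotes would presumably need a differently quoted key.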

Now, our main code:


sourceFiles do: [:currentFile |
  "Keep only the import declarations found in this file"
  imports := currentFile script findASTNode items select: [:currentItem | currentItem isKindOf: JSImportDeclarationNode].

  importedModules := imports collect: [:currentImportDeclaration | |packageName importedItems|
    packageName := currentImportDeclaration from value.
    "Start empty so a bare import (the fourth case below) records nothing instead of failing on nil"
    importedItems := OrderedCollection new.

    "imports can be structured as follows:
      import * as blah from 'package name'
      import blah from 'package name'
      import { Thing } from 'package name'
      import 'package name'

      The third case is the most useful: if we run across the first or second case we mark it, but it's left for later human analysis.
      We'll ignore the fourth case, where the size of the imports array is 0.
      (it may be imported CSS, or something, but it's an exercise for the reader anyway.)
    "

    (currentImportDeclaration imports size > 0) ifTrue: [
      correctImport := currentImportDeclaration imports first.
      "I don't understand why imports are a list where the second item is nil, but here we are"
      (correctImport respondsTo: #specifiers) ifTrue: [
        importedItems := correctImport specifiers collect: [:currentJSImportSpecifierNode |
          currentJSImportSpecifierNode binding name value
        ].
      ] ifFalse: [
        importedItems := OrderedCollection with: '* - TBI'.
      ].
    ].

    "we have the package name and what it imports. See if we should record that information"
    packages at: packageName ifPresent: [:modulePackageInformation | | moduleFunctions |
      moduleFunctions := modulePackageInformation at: 'usedFunctions'.

      importedItems do: [:currentUsage |
        moduleFunctions
          at: currentUsage
          ifPresent: [:usagesArray | usagesArray add: currentFile]
          ifAbsentPut: [OrderedCollection with: currentFile].
      ].
    ].
  ].
].

If we examine packages now we may see something like this:

[Figure: usages of lodash in a codebase]

From this diagram we can see that cleaning up lodash is likely not much of an effort: 10 different functions are used, and we can quickly examine You Might Not Need Lodash to see if there are easy replacements.
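To turn that view into numbers, the dictionary of used functions can be reduced to pairs of function name and file count, sorted so the most widely imported functions come first. This is a minimal sketch with invented sample data; in the real session usedFunctions would presumably be (packages at: '"lodash"') at: 'usedFunctions', and the names usageCounts and rankedUsage are hypothetical.

```smalltalk
"Invented sample data standing in for the dictionary built by the main loop."
usedFunctions := Dictionary new.
usedFunctions at: 'debounce' put: (OrderedCollection withAll: #('src/Search.jsx' 'src/Editor.jsx' 'src/Map.jsx')).
usedFunctions at: 'isEmpty' put: (OrderedCollection withAll: #('src/Form.jsx')).
usedFunctions at: 'groupBy' put: (OrderedCollection withAll: #('src/Report.jsx' 'src/Table.jsx')).

"One association per function: name -> number of files importing it."
usageCounts := usedFunctions associations collect: [:each | each key -> each value size].

"Most widely imported first, as a rough proxy for where the removal work is."
rankedUsage := usageCounts sorted: [:a :b | a value >= b value].

"Print one line per function to the Transcript."
rankedUsage do: [:each |
    Transcript show: each key; show: ': '; show: each value printString; cr].
```

The file count is only a first approximation of effort: a function imported in one file may still be called many times there.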

Conclusion

Using an AST-based tool lets us do more than a simple grep -R would: we can easily drill down to exactly which functions are used, even if a prettifier has put them on multiple lines. Maybe we also want to see how many times those imported functions are called: that should be relatively easy too!

Smalltalk and the software engineering tools built into GToolkit make it relatively easy to ask questions like this and get answers that help us better understand, or plan for, our work as software developers!



Written by Ryan Wilcox, Chief Developer, Wilcox Development Solutions... and other things