Compiling Haskell to compact Javascript programs

UHC allows for the generation of relatively compact Javascript programs from Haskell. With relatively I mean that UHC can prune unnecessary code at the Core level before generating Javascript but then still redundant code from the runtime system remains, as well as the use of lengthy identifiers. This of course can be fixed, but currently not by UHC. Let’s look at a small Hello World example and see what UHCcan do to obtain compact code.

The hello world example Hello.hsused runs in a browser, popping up an alert:

module Hello where

import Language.UHC.JS.Prelude
import Language.UHC.JS.Assorted

main = alert "Hi"

The UHC specific Javascript library UHC JavaScript librariesfor interacting with the Javascript runtime environment is required, so to get it running execute in a shell:

> git clone git://github.com/UU-ComputerScience/uhc-js.git uhcjs	# read only access
> uhc --import-path=uhcjs/uhc-js/src -tjs Hello.hs

This will create Hello.js and Hello.html; Hello.html loads both Hello.js and library modules, omitting most scripttags for brevity:

<!DOCTYPE html><html><head><title>Hello</title>
<script type="text/javascript" src="/usr/local/lib/uhc-1.1.4/lib/js/libEH-RTS.mjs"></script>
<script type="text/javascript" src="/usr/local/lib/uhc-1.1.4/lib/pkg/uhcbase-1.1.4/uhc-1.1.4/js/plain/UHC/UHC_Base.mjs"></script>
...
<script type="text/javascript" src="/usr/local/lib/uhc-1.1.4/lib/pkg/uhcbase-1.1.4/uhc-1.1.4/js/plain/UHC/UHC_Run.mjs"></script>
<script type="text/javascript" src="/usr/local/lib/uhc-1.1.4/lib/pkg/base-3.0.0.0/uhc-1.1.4/js/plain/Prelude.mjs"></script>
<script type="text/javascript" src="uhcjs/uhc-js/src/Language/UHC/JS/Language_UHC_JS_Types.mjs"></script>
...
<script type="text/javascript" src="uhcjs/uhc-js/src/Language/UHC/JS/Language_UHC_JS_Assorted.mjs"></script>
<script type="text/javascript" src="Hello.js"></script>
</head>
<body>
</body>
</html>

Opening Hello.htmlin a browser then pops up an alert box.

The problem with the resulting Hello.html is that it loads too much code; running a word count reveals that almost 2MB will be loaded!
This might be ok for locally running the html file, but now for network based access.

Luckily the -O optimization flag for UHCallows to specify in which compiler stage linking will take place:

> uhc --import-path=uhcjs/uhc-js/src -tjs -O,2 Hello.hs

With the -O flag both the amount of optimization may be specified (e.g. classical -O2) as well as the scope of it, the 2 behind the comma indicating that optimizations should be done on the whole program on the Core level (instead of just per module, being the default). Currently not many optimizations are in place in UHC but this mechanism links all imported modules on the Core level, only pulling in required code, thus automatically minimizing its size. The size of Hello.js now is almost 60KB, of which the major part is the runtime system. No other modules need to be loaded, as shown by the corresponding Hello.html:

<!DOCTYPE html><html><head><title>Hello</title>
<script type="text/javascript" src="Hello.js"></script>
</head>
<body>
</body>
</html>
This form of linking only has meaning for a program actually having a main because main acts as the root from which to start pulling in required code.

In addition to main also the foreign exports declarations of all linked modules are used as a root.

A Haskell FFI calling convention for Javascript

Haskell’s Foreign Function Interface (FFI) predefined calling conventions do not match well with Javascript’s object oriented features. In particular selecting a field of an object using dot notation (like o.f) and using an object as an array using brackets (like o[i]) do not have a natural counterpart in Haskell or the default calling conventions supported by the FFI interface. So, here are some examples of how Javascript is accessed in UHC via its jscript calling convention:

data Document
foreign import jscript "document"   document      :: IO Document
foreign import jscript "%1.write()" documentWrite :: Document -> JSString -> IO ()
foreign import jscript alert :: JSString -> IO ()

From within a browser the document representation can be accessed via the global variable document, the foreign entity "document" translates to a reference to this variable. The type of the document is defined as an opaque type, it can thus only manipulated via Javascript. Writing a string to the document is done by invoking the method write on a document. The foreign entity "%1.write()" specifies that from all arguments the first one is used as the receiver of the method write. The parenthesis () specify that a call has to be made, passing all arguments except those referred to explicitly by means of %<nr>, where <nr> >= 1 refers to argument <nr>. If an entity is omitted as in alert it defaults to "<functionname>()" where <functionname> is the name of the foreign function being defined.

Function documentWrite does not accept a String but a JSString instead, defined to be the platform dependent representation of Strings, converted to and from String with corresponding conversion functions.

type JSString = PackedString
stringToJSString :: String -> JSString
jsStringToString :: JSString -> String

stringToJSString forces its argument to be fully evaluated and then converts it to a Javascript String.

There is choice whether to put document in the IO monad or not, depending whether this global object itself will ever be assigned a new value or not. Not being a Javascript DOM wizard wrapping in IO seems to be the safest bet.

Given these functions a minimal Hello World web program thus is:

main = alert $ stringToJSString "Hi there!"

As this would pop up an alert box, an alternative Hi is the following program which writes to the document instead:

main = do d <- document
          documentWrite d $ stringToJSString "Hi there!"

Actually, the usual Hello would have worked as well because it is implemented as writing to the document:

main = putStr "Hi there!"

To show the usefulness of array like access as part of we do a bit of rudimentary DOM programming:

foreign import jscript "%1.getElementsByName()" documentGetElementsByName :: Document -> JSString -> IO (NodeList Node)

data NodeList x

foreign import jscript "%1.length" nodeListLength :: NodeList Node -> Int
foreign import jscript "%1[%2]"    nodeListItem   :: NodeList Node -> Int -> IO Node

data Node

foreign import jscript "%1.innerHTML" elementInnerHTML :: Node -> JSString
foreign import jscript "%1.tagName"   elementTagName   :: Node -> JSString

A NodeList is not an array, but behaves like an array: we can ask for its length and retrieve an element by index. It is not an array itself, so modelling it as such in Haskell would be incorrect. However, by allowing import entities to use Javascript array notation we circumvent this limitation and the Javascript array interface can still be used easily.

Finally, this minimal interface to DOM can be used to retrieve and print info about an element in an html document:

main = do d <- document
          nl <- documentGetElementsByName d (stringToJSString "myHeader")
          print (nodeListLength nl)
          n <- nodeListItem nl 0
          print $ jsStringToString $ elementTagName n
          print $ jsStringToString $ elementInnerHTML n

Given the presence of

Head says hello!

with the name "myHeader" in the document where the program is run, it will produce the following as part of the document:

1 "H1" "Head says hello!"

Haskell to Javascript backend

My first blog ever, and for a Haskell oriented blog a Javascript flavored topic seemed to be a good start :-). I intend to spend time on my UHC adventures: internals, problems, solutions, open questions, etc etc. As I have been working on a Javascript backend for UHC it thus fits quite well here.

I started making a Javascript backend after ICFP 2010. A couple of people I spoke to at the ICFP (and already before) had expressed it would be a good idea to do so. There seem to be various attempts to do something functional with Javascript, either as a Haskell compiler backend (YHC), or as a library (Functional Javascript), or as a Haskell interpreter written in Javascript (haskellinjavascript). Regretfully, none of these seem to be either alive or mature. Perhaps there is more.

For this entry, I’ll explain the interpreter for which code is generated, and how it interacts with Javascript. To make it into a full Haskell to Javascript compiler more than that is required, but I’ll go into the issues and todos in a followup blog entry.

Javascript interpreter

Functional programming in Haskell (obviously) is about functions, lazy applications of those functions, and forcing evaluation when we are interested in the actual computed value of applications. So those are the three ingredients modeled by objects in Javascript. A function Fun object is constructed given a Javascript function fun, and can be applied to arbitrary Javascript values:


function Fun( fun ) { ...
}

Fun.prototype = {
    applyN : function ( args ) ...
    needsNrArgs : function() ...
}

The main difference between strict and lazy functional languages is that the delay of a an actual computation must be represented explicitly, usually this is done by remembering the not yet applied function and its arguments (a closure, thunk). Here a Javascript apply object is used, in two variations, one for undersaturated function applications still lacking a known number of arguments (AppLT), and one for the applications of which we do not know under-, over-, or exactly right saturation (App):

AppLT.prototype = {
    applyN : function ( args ) ...
    needsNrArgs : function() ...
}

function AppLT( fun, args ) { ...
}

App.prototype = {
    applyN : function ( args ) ...
}

function App( fun, args ) { ...
}

The last ingredient is a function eval, necessary to force evaluation of an application:

function eval( x ) ...

So, let’s look at these in more detail, beginning with arbitrary, lazy, application. A design choice is to be able to arbitrarily mix Javascript values and interpreter values like evaluated and not yet evaluated applications. In order to distinguish these, the interpreter maintained values have a field eOrV, short for “evaluator or value”, which either holds a Javascript function to evaluate a not yet evaluated application, or the resulting value of this computation:


function App( fun, args ) {
    this.eOrV = function() {
        var x = ( fun.applyN( args ) ) ;
        this.eOrV = x ;
        return x ;
    }
}

The above constructor for an application takes a function and its arguments. The function fun can be a Fun, AppLT, or another App, the arguments args are represented by a Javascript Array holding arbitrary values. The App construction piggybacks on Javascript closures by building a parameterless Javascript function to force evaluation. This function is put in the eOrV field, which itself is overwritten when invoked: long live untyped interpreted languages :-)! An App thus represents closures; forcing an App to evaluate (to WHNF) is done by the following (yet incorrect version of) eval:


// Incorrect version of eval:
function eval( x ) {
    if ( typeof x.eOrV == 'function' ) {
        x = x.eOrV() ;
    } else if ( x.eOrV ) {
        x = x.eOrV ;
    }
    return x ;
}

This not yet correct version of eval (two reasons why..) inspects the eOrV field. If a function, it simply invokes it, if not, it returns the value. Internally, applyN is used to actually (i.e. strictly) do the application. Each of the objects used by the interpreter knows how to deal with this. For an App we first need to evaluate the App closure (i.e. compute the delayed application), then apply it directly to the newly given arguments. However, we do not do the last part directly as this may lead us into too deep recursion, in particular tail recursion! Instead a new anonymous Javascript object is returned holding as its only field the function which will do this, thus allowing to return and free some of the Javascript stack:


App.prototype = {
    applyN : function ( args ) {
        var fun = eval(this) ;
        return {
            eOrV : function() {
                return fun.applyN( args ) ;
            } } ;
    }
}

It now has become the responsibility of the caller of applyN to continue with the evaluation, in our case the eval function. The eval function has to repeatedly test whether still progress can be made, the correct version is as follows:

function eval( x ) {
    while ( x && x.eOrV ) {
        if ( typeof x.eOrV == 'function' ) {
            x = x.eOrV() ;
        } else {
            x = x.eOrV ;
        }
    }
    return x ;
}

Additionally it also checks whether x and x.eOrV are defined before actually using them. Plain Javascript values pass unmodified through eval, thus allowing interpreter and Javascript values to coexist.

When applyN is invoked on an App it actually does not much more than delegate the real work to Fun and AppLT, which both deal with application by consuming the right amount of arguments to achieve a saturated function call. A Fun knows how many arguments it requires, this can be extracted from Javascript function objects:


function Fun( fun ) {
    this.needs = fun.length ;
    this.fun = fun ;
}

When applyN is invoked on a Fun with too few arguments, an AppLT is constructed, thus remembering the partial unsaturated application. When given exactly enough it just calls the function, and when given more arguments than required, it slices off the right amount of arguments for calling the function, and then continues in the same way as App did by returning a Javascript continuation object for the remainder of the application.

Fun.prototype = {
    applyN : function ( args ) {
        if ( args.length < this.needs ) {
            return new AppLT( this, args ) ;
        } else if ( args.length == this.needs ) {
            var x = this.fun.apply( null, args ) ;
            return x ;
        } else {
            var fun = eval( this.fun.apply( null, args.slice( 0, this.needs ) ) ) ;
            var remargs = args.slice( this.needs ) ;
            return {
                eOrV : function() {
                    return fun.applyN( remargs ) ;
                } } ;
        }
    } ,
    needsNrArgs : function() {
        return this.needs ;
    } ,
}

Finally, undersaturated applications are encoded with AppLT objects. Its implementation resembles App and Fun, so the code is just here for completeness:

AppLT.prototype = {
    applyN : function ( args ) {
        var needs = this.needsNrArgs() ;
        if ( args.length < needs ) {
            return new AppLT( this, args ) ;
        } else if ( args.length == needs ) {
            return this.fun.applyN( this.args.concat( args ) ) ;
        } else {
            var fun = eval( this.applyN( args.slice( 0, needs ) ) ) ;
            return {
                eOrV : function() {
                    return fun.applyN( args.slice( needs ) ) ;
                } } ;
        }
    } ,
    needsNrArgs : function() {
        return this.fun.needsNrArgs() - this.args.length ;
    } ,
}
function AppLT( fun, args ) {
    this.fun = fun ;
    this.args = args ;
}

This is it! We can now do some real Haskell programming, although it is still manual labor.

Using the interpreter

As an example, a version of the primes sieve is used:


-- Haskell version
module Sieve where

notMultiple x y = not ((y `div` x) * x == y)
sieve (h:t) = h : sieve (filter (notMultiple h) t)

main :: IO ()
main = putStrLn (show (last (take 500 (sieve [2..]))))

Without a Prelude all functions have to be manually encoded, for example with the aid of helper function fun multiplication is defined as follows:

function fun(f) { return new Fun(f) ; }

var mul = fun( function(a,b) {
    return eval(a) * eval(b) ;
} ) ;

Multiplication is a primitive and requires its operands to be evaluated.

For manipulating lazy lists a couple of additional helper functions come in handy:

function app1(f,a  ) { return new App(f,[a  ]) ; }
function app2(f,a,b) { return new App(f,[a,b]) ; }

function eval1(f,a  ) { return eval( f.applyN([a  ]) ) ; }
function eval2(f,a,b) { return eval( f.applyN([a,b]) ) ; }

app1 (and variants) construct lazy application nodes, eval1 (and variants) apply arguments and enforce evaluation.

Lists are encoded as arrays, with a tag in front:


function cons(x,y) { return [0,x,y]   ; }
var nil = [1] ;
function head(l)   { return l[1]      ; }
function tail(l)   { return l[2]      ; }
function isNil(x)  { return x[0] == 1 ; }

The above functions already assume that their arguments are already evaluated. With these functions filter can now be implemented:

var filter = fun( function(a,b) {
	var list = eval(b) ;
	var test = eval1( a, head(list) ) ;
	if ( test ) {
		return cons( head(list), app2( filter, a, tail(list) ) ) ;
	} else {
		return app2( filter, a, tail(list) ) ;
	}
} ) ;

The equivalent of the infinite lazy list [a..] is the function from:

var from = fun( function(a) {
    return cons( a, app1( from, app2( add, a, 1 ) ) ) ;
} ) ;

Other function definitions are ‘just like that’, i.e. predictably follow the same ideas. We then end with the equivalent of sieve and its application:


var sieve = fun( function(a) {
    var list = eval(a) ;
    return cons( head(list)
               , app1( sieve
                     , app2( filter
                           , app1( notMultiple2, head(list) )
                           , tail(list)
               )     )     ) ;
} ) ;

var mainSieve = app2( take, 500, app1( sieve, app1( from, 2 ) ) ) ;

Finally, we just show the last element:

function show( x ) {
    var x = eval(x) ;
    document.write( eval(x) ) ;
}

show( app1( last, mainSieve ) ) ;

So, is this is all there is to functional programming in Javascript? Regretfully not, as a Haskell compiler needs to deal with foreign function interfaces in general, libraries, deployment, IO interfacing, etc etc. But that is for the next blog entry…

In the meantime the source code for this entry can be found on git@github.com:atzedijkstra/javascript-runtime-for-UHC.git

Follow

Get every new post delivered to your Inbox.