Friday, March 31, 2017

Substituting impure version specifiers in node2nix generated package compositions

In a number of previous blog posts, I have described node2nix, a tool that can be used to automatically integrate NPM packages into the Nix packages ecosystem. The biggest challenge in making this integration possible is the fact that NPM does dependency management in addition to build management -- NPM's dependency management properties conflict with Nix's purity principles.

Dealing with a conflicting dependency manager is quite simple from a conceptual perspective -- you must substitute it by a custom implementation that uses Nix to obtain all required dependencies. The remaining responsibilities (such as build management) are left untouched and still have to be carried out by the guest package manager.

Although conceptually simple, implementing such a substitution approach is much more difficult than expected. For example, in my previous blog posts I have described the following techniques:

  • Extracting dependencies. In addition to the package we intend to deploy with Nix, we must also include all its dependencies and transitive dependencies in the generation process.
  • Computing output hashes. In order to make package deployments deterministic, Nix requires that the output hashes of downloads are known in advance. As a result, we must examine all dependencies and compute their corresponding SHA256 output hashes. Some NPM projects have thousands of transitive dependencies that need to be analyzed.
  • Snapshotting versions. Nix uses SHA256 hash codes (derived from all inputs to build a package) to address specific variants or versions of packages whereas version specifiers in NPM package.json configurations are nominal -- they permit version ranges and references to external artifacts (such as Git repositories and external URLs).

    For example, a version range of >= 1.0.3 might resolve to version 1.0.3 today and to version 1.0.4 tomorrow. Translating a version range to a Nix package with a hash code identifier breaks the ability for Nix to guarantee that a package with a specific hash code yields a (nearly) bit identical build.

    To ensure reproducibility, we must snapshot the resolved version of these nominal dependency version specifiers (such as a version range) at generation time and generate the corresponding Nix expression for the resulting snapshot.
  • Simulating shared and private dependencies. In NPM projects, dependencies of a package are stored in the node_modules/ sub folder of the package. Each dependency can have private dependencies by putting them in their corresponding node_modules/ sub folder. Sharing dependencies is also possible by placing the corresponding dependency in any of the parent node_modules/ sub folders.

    Moreover, although this is not explicitly advertised as such, NPM implicitly supports cyclic dependencies and is able to cope with them because it will refuse to install a dependency in a node_modules/ sub folder if any parent folder already provides it.

    When generating Nix expressions, we must replicate the exact same behaviour when it comes to private and shared dependencies. This is particularly important to cope with cyclic dependencies -- the Nix package manager does not allow them and we have to break any potential cycles at generation time.
  • Simulating "flat module" installations. In NPM versions older than 3.0, every dependency was installed privately by default unless a shared dependency exists that fits within the required version range.

    In newer NPM versions, this strategy has been reversed -- every dependency will be shared as much as possible until a conflict has been encountered. This means that we have to move dependencies as high up in the node_modules/ folder hierarchy as possible, which is an imperative operation -- in Nix this is a problem, because packages cannot be changed after they have been built.

    To cope with flattening, we must compute the implications of flattening the dependency structure in advance at generation time.

With the above techniques, it is possible to construct a node_modules/ directory structure that matches the structure NPM would normally compose with a high degree of accuracy.
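
To illustrate, a node_modules/ hierarchy (with hypothetical package names) in which one dependency is shared and another one is kept private could look as follows:

node_modules/
    pkg-a/                 <-- shared: available to the app and to pkg-b
    pkg-b/
        node_modules/
            pkg-c/         <-- private dependency of pkg-b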

Impure version specifiers


Even if it were possible to reproduce the node_modules/ directory hierarchy with 100% accuracy, another problem remains -- some version specifiers always trigger network communication regardless of whether the dependencies have been provided or not, such as:

[
  { "node2nix": "latest" }
, { "nijs": "git+https://github.com/svanderburg/nijs.git#master" }
, { "prom2cb": "github:svanderburg/prom2cb" }
]

When referring to tags or Git branches, NPM is unable to determine to which version a package resolves. As a consequence, it attempts to retrieve the corresponding packages to investigate them, even when a compatible version already exists in the node_modules/ directory hierarchy.

While performing package builds, Nix takes various precautions to prevent side effects from influencing builds, including network connections. As a result, an NPM package deployment will still fail, despite the fact that a compatible dependency has already been provided.

In the package builder Nix expression provided by node2nix, I used to substitute these version specifiers in the package.json configuration files by a wildcard: '*'. Wildcards used to work fine for old Node.js 4.x/NPM 2.x installations, but with NPM 3.x flat module installations they became another big source of problems -- in order to make flat module installations work, NPM needs to know to which version a package resolves to determine whether it can be shared on a higher level in the node_modules/ folder hierarchy or not. Wildcards prevent NPM from making these comparisons, and as a result, some package deployments that used to work with older versions of NPM now fail.
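
To illustrate the old strategy, the impure version specifiers shown earlier would (roughly) end up being rewritten to:

[
  { "node2nix": "*" }
, { "nijs": "*" }
, { "prom2cb": "*" }
]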

Pinpointing version specifiers


In the latest node2nix I have solved these issues by implementing a different substitution strategy -- instead of substituting impure version specifiers by wildcards, I pinpoint all the dependencies to the exact version numbers to which these dependencies resolve. Internally, NPM addresses all dependencies by their names and version numbers only (this also has a number of weird implications, because it disregards the origins of these dependencies, but I will not go into detail on that).
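
For example, the impure version specifiers shown earlier might be substituted by something like the following (the exact version numbers shown here are hypothetical -- they correspond to whatever the specifiers resolve to at generation time):

[
  { "node2nix": "1.1.1" }
, { "nijs": "0.0.25" }
, { "prom2cb": "1.0.0" }
]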

I got the inspiration for this pinpointing strategy from the yarn package manager (an alternative to NPM developed by Facebook) -- when deploying a project with yarn, yarn pinpoints the installed dependencies in a so-called yarn.lock file so that package deployments become reproducible when a system is deployed for a second time.

The pinpointing strategy always prevents NPM from consulting external resources (under the condition that our substitute dependency manager has provided the package first) and it always provides version numbers for any dependency, so that NPM can perform flat module installations. As a result, the accuracy of node2nix with newer versions of NPM has improved quite a bit.

Availability


The pinpointing strategy is part of the latest node2nix that can be obtained from the NPM registry or the Nixpkgs repository.

One month ago, I gave a talk about node2nix at FOSDEM 2017, summarizing the techniques discussed in my blog posts written so far. For convenience, I have embedded the slides into this web page:

Tuesday, March 14, 2017

Reconstructing Disnix deployment configurations

In two earlier blog posts, I have described Dynamic Disnix, an experimental framework enabling self-adaptive redeployment on top of Disnix. The purpose of this framework is to redeploy a service-oriented system whenever the conditions of the environment change, so that the system can still meet its functional and non-functional requirements.

An important category of events that change the environment are machines that crash and disappear from the network -- when a disappearing machine used to host a crucial service, a system can no longer meet its functional requirements. Fortunately, Dynamic Disnix is capable of automatically responding to such events by deploying the missing components elsewhere.

Although Dynamic Disnix supports the recovery of missing services, there is one particular kind of failure I did not take into account. In addition to the potentially crashing target machines that host the services of which a service-oriented system consists, the coordinator machine that initiates the deployment process and stores the deployment state could also disappear. When the deployment state gets lost, it is no longer possible to reliably update the system.

In this blog post, I will describe a new addition to the Disnix toolset that can be used to cope with these kinds of failures by reconstructing a coordinator machine's deployment configuration from the meta data stored on the target machines.

The Disnix upgrade workflow


As explained in earlier blog posts, Disnix requires three kinds of deployment models to carry out a deployment process: a services model capturing the components of which a system consists, an infrastructure model describing the available target machines and their properties, and a distribution model mapping services in the services model to target machines in the infrastructure model. By writing instances of these three models and running the following command-line instruction:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Disnix will carry out all activities necessary to deploy the system: building the services and their intra-dependencies from source code, distributing the services and their intra-dependencies, and activating all services in the right order.

When changing any of the models and running the same command-line instruction again, Disnix attempts to upgrade the system by only rebuilding the aspects that have changed, deactivating the obsolete services and activating the new ones.

Disnix (as well as other Nix-related tools) attempts to optimize a redeployment process by only executing the steps that are required to reach the new deployment state. In Disnix, the building and distribution steps are optimized due to the fact that every package is stored in isolation in the Nix store, in which each package has a unique filename with a hash prefix, such as:

/nix/store/acv1y1zf7w0i6jx02kfa6gxyn2kfwj3l-firefox-48.0.2

As explained in a number of earlier blog posts, the hash prefix (acv1y1zf7w0i6jx02kfa6gxyn2kfwj3l...) is derived from all inputs used to build the package, including its source code, build script, and the libraries that it links to. That, for example, means that if we upgrade a system and none of the inputs of Firefox change, we get an identical hash, and if such a package build already exists, we do not have to build it or transfer it from an external site again.

The building step in Disnix produces a so-called low-level manifest file that is used by tools executing the remaining deployment activities:

<?xml version="1.0"?>
<manifest version="1">
  <distribution>
    <mapping>
      <profile>/nix/store/aiawhpk5irpjqj25kh6ah6pqfvaifm53-test1</profile>
      <target>test1</target>
    </mapping>
  </distribution>
  <activation>
    <mapping>
      <dependsOn>
        <dependency>
          <target>test1</target>
          <container>process</container>
          <key>d500194f55ce2096487c6d2cf69fd94a0d9b1340361ea76fb8b289c39cdc202d</key>
        </dependency>
      </dependsOn>
      <name>nginx</name>
      <service>/nix/store/aa5hn5n1pg2qbb7i8skr6vkgpnsjhlns-nginx-wrapper</service>
      <target>test1</target>
      <container>wrapper</container>
      <type>wrapper</type>
      <key>da8c3879ccf1b0ae34a952f36b0630d47211d7f9d185a8f2362fa001652a9753</key>
    </mapping>
  </activation>
  <targets>
    <target>
      <properties>
        <hostname>test1</hostname>
      </properties>
      <containers>
        <mongo-database/>
        <process/>
        <wrapper/>
      </containers>
      <system>x86_64-linux</system>
      <numOfCores>1</numOfCores>
      <clientInterface>disnix-ssh-client</clientInterface>
      <targetProperty>hostname</targetProperty>
    </target>
  </targets>
</manifest>

The above manifest file contains the following kinds of information:

  • The distribution element section maps Nix profiles (containing references to all packages implementing the services deployed to the machine) to target machines in the network. This information is used by the distribution step to transfer packages from the coordinator machine to a target machine.
  • The activation element section contains elements specifying which service to activate on which machine in the network including other properties relevant to the activation, such as the type plugin that needs to be invoked that takes care of the activation process. This information is used by the activation step.
  • The targets section contains properties of the machines in the network and is used by all tools that carry out remote deployment steps.
  • There is also an optional snapshots section (not shown in the code fragment above) that contains the properties of services whose state needs to be snapshotted, transferred and restored in case their location changes.

When a Disnix (re)deployment process successfully completes, Disnix stores the above manifest as a Disnix coordinator Nix profile on the coordinator machine for future reference, with the purpose of optimizing the subsequent upgrade step -- when redeploying a system, Disnix compares the generated manifest with the previously deployed instance and only deactivates services that have become obsolete and activates services that are new, making upgrades more efficient than fresh installations.

Unfortunately, when the coordinator machine storing the manifests gets lost, the deployment manifests get lost as well. As a result, a system can no longer be reliably upgraded -- without deactivating obsolete services, newly deployed services may conflict with services that are already running on the target machines, preventing the system from working properly.

Reconstructible manifests


Recently, I have modified Disnix in such a way that the deployment manifests on the coordinator machine can be reconstructed. Each Nix profile that Disnix distributes to a target machine includes a so-called profile manifest file, e.g. /nix/store/aiawhpk5irpjqj25kh6ah6pqfvaifm53-test1/manifest. Previously, this file only contained the Nix store paths to the deployed services and was primarily used by the disnix-query tool to display the installed set of services per machine.

In the latest Disnix, I have changed the format of the profile manifest file to contain all required meta data so that the activation mappings can be reconstructed on the coordinator machine:

stafftracker
/nix/store/mi7dn2wvwvpgdj7h8xpvyb04d1nycriy-stafftracker-wrapper
process
process
d500194f55ce2096487c6d2cf69fd94a0d9b1340361ea76fb8b289c39cdc202d
false
[{ target = "test2"; container = "process"; _key = "4827dfcde5497466b5d218edcd3326327a4174f2b23fd3c9956e664e2386a080"; } { target = "test2"; container = "process"; _key = "b629e50900fe8637c4d3ddf8e37fc5420f2f08a9ecd476648274da63f9e1ebcc"; } { target = "test1"; container = "process"; _key = "d85ba27c57ba626fa63be2520fee356570626674c5635435d9768cf7da943aa3"; }]

The above code fragment shows a portion of the profile manifest. It has a line-oriented structure in which every 7 lines represent the properties of a deployed service. The first line denotes the name of the service, the second line the Nix store path, the third line the Dysnomia container, the fourth line the Dysnomia type, the fifth line the hash code derived from all properties, the sixth line whether the attached state must be managed by Disnix, and the seventh line an encoding of the inter-dependencies.

The other portions of the deployment manifest can be reconstructed as follows: the distribution section can be derived by querying the Nix store paths of the installed profiles on the target machines, the snapshots section by checking which services have been marked as stateful and the targets section can be directly derived from a provided infrastructure model.

With the augmented data in the profile manifests on the target machines, I have developed a tool named disnix-reconstruct that can reconstruct a deployment manifest from all the meta data the manifests on the target machines provide.

I can now, for example, delete all the deployment manifest generations on the coordinator machine:

$ rm /nix/var/nix/profiles/per-user/sander/disnix-coordinator/*

and reconstruct the latest deployment manifest, by running:

$ disnix-reconstruct infrastructure.nix

The above command resolves the full paths to the Nix profiles on the target machines, then downloads their intra-dependency closures to the coordinator machine, reconstructs the deployment manifest from the profile manifests and finally installs the generated deployment manifest.

If the above command succeeds, then we can reliably upgrade a system again with the usual command-line instruction:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Extending the self-adaptive deployment framework


In addition to reconstructing deployment manifests that have gone missing, disnix-reconstruct offers another benefit -- the self-adaptive redeployment framework described in the two earlier blog posts is capable of responding to various kinds of events, including redeploying services to other machines when a machine crashes and disappears from the network.

However, when a machine disappears from the network and reappears at a later point in time, Disnix no longer knows about its configuration -- when such a machine reappears in the network, this lack of knowledge could have disastrous results.

Fortunately, by adding disnix-reconstruct to the framework we can solve this issue:


As shown in the above diagram, whenever a change in the infrastructure is detected, we reconstruct the deployment manifest so that Disnix knows which services are deployed to it. Then when the system is being redeployed, the services on the reappearing machines can also be upgraded or undeployed completely, if needed.

The automatic reconstruction feature can be used by providing the --reconstruct parameter to the self adapt tool:

$ dydisnix-self-adapt -s services.nix -i infrastructure.nix -q qos.nix \
  --reconstruct

Conclusion


In this blog post, I have described the latest addition to Disnix: disnix-reconstruct, a tool that can be used to reconstruct the deployment manifest on the coordinator machine from the meta data stored on the target machines. With this addition, we can still upgrade systems if the coordinator machine gets lost.

Furthermore, we can use this addition in the self-adaptive deployment framework to deal with reappearing machines that already have services deployed to them.

Finally, besides developing disnix-reconstruct, I have reached another stable point. As a result, I have decided to release Disnix 0.7. Consult the Disnix homepage for more information.

Sunday, February 12, 2017

MVC lessons in Titanium/Alloy

A while ago, I ported the simple-xmpp library from the Node.js ecosystem to Appcelerator Titanium to enrich our company's product line with chat functionality. In addition, I have created a bare bones example app that exposes most of the library's features.



Although I am not doing that much front-end development these days, nor consider myself to be a Titanium-guru, I have observed that it is quite challenging to keep your app's code and organization clean.

In this blog post, I will report on my development experiences and describe the architecture that I have derived for the example chat application.

The Model-View-Controller (MVC) architectural pattern


Keeping the code of an end-user application sane is not unique to mobile applications or a specific framework, such as Titanium -- it basically applies to any system with a graphical user interface including desktop applications and web applications.

When diving into the literature or just searching the Internet, you will most likely stumble upon a very common "solution" -- the Model-View-Controller (MVC) architectural pattern, which can be used as a means to keep your system structured. It is a generically applicable pattern implemented by many kinds of libraries and frameworks for all kinds of domains, including the mobile application space.

The idea behind this pattern is that a system is separated into three distinct concerns: the model, the view and the controller. The meanings of these concerns are somewhat ambiguously defined. For example, the design patterns book written by the Gang of Four (Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides) says:

MVC consists of three kinds of objects. The Model is the application object, the View is its screen presentation, and the Controller defines the way the user interface reacts to user input.

The "problem" I have with the above explanation is that it is a bit difficult to grasp the meaning of an "application object". Moreover, the definition of the controller object used in the explanation above, states that it only has a relation with a user interface (a.k.a. the view) while I could also think of many scenarios in which external events are involved without invoking the user interface. I have no idea how to categorize these kinds of interactions by looking at the above description.

The paper that the book cites: "A Cookbook for Using the Model-View-Controller User Interface Paradigm in Smalltalk-80" (written by: Glenn E. Krasner and Stephen T. Pope) provides more detailed definitions. For example, it defines the model as:

The model of an application is the domain-specific software simulation or implementation of the application's central structure.

I particularly find the term "domain-specific" important -- it suggests that a model should encapsulate what matters to the problem domain, without any obfuscations of things not related to it, for example, user interface components.

The paper defines the view as follows:

In this metaphor, views deal with everything graphical: they request data from their model, and display the data

The above definition suggests that views are everything about presentation of objects belonging to the model.

Finally, the paper defines controllers as follows:

Controllers contain the interface between their associated models and views and the input devices (e.g., keyboard, pointing device, time)

In contrast to the design patterns book's definition of a controller, this definition also suggests that a controller has a relationship with the model. Moreover, it does not say anything about interactions with a physical user. Instead, it refers to input devices.

Although the paper provides more detailed definitions, it still remains difficult to draw a hard line from my perspective. For example, what is the scope of MVC? Should it apply to an entire system, or can it also be applied to components of which a system consists?

For example, in an earlier blog post about some of my experiences with web development, I described a simple MVC-based library that manages the layouts of web applications. The model basically encapsulates the structure of a web application from an abstract point of view, but it only applies to a specific sub concern, not the system as a whole.

Despite its unclarities and ambiguities, I still think MVC makes sense, for the following reasons:

  • View and controller code clutters the model with obfuscations making it much harder to read and maintain.
  • There are multiple ways to present an object visually. With a clear separation between a model and view this becomes much more flexible.
  • In general, more compact modules (in terms of lines of code) are in many ways better than having many lines of code in one module (for example, for readability and maintainability). Separation of concerns stimulates reduction of the size of modules.

The Titanium and Alloy frameworks


As explained earlier, I have implemented the chat example app using the Titanium and Alloy frameworks.

Titanium is a framework targeting multiple mobile app platforms (e.g. Android, iOS, Windows and mobile web applications) using JavaScript as an implementation language providing a unified API with minor platform differences. In contrast to platforms such as Java, Titanium is not a write once, run anywhere approach, but a code reuse approach -- according to their information between 60 and 90% of the code can be reused among target platforms.

Moreover, the organization of Titanium's API makes a clear distinction between UI and non-UI components, but it does not force anyone to strictly follow an MVC-like organization while implementing an application.

Alloy is a declarative MVC-framework that wraps around Titanium. To cite the Alloy documentation:

Alloy utilizes the model-view-controller (MVC) pattern, which separates the application into three different components:

  • Models provide the business logic, containing the rules, data and state of the application.
  • Views provide the GUI components to the user, either presenting data or allowing the user to interact with the model data.
  • Controllers provide the glue between the model and view components in the form of application logic.

(As may be noticed, the above description introduces yet another slightly different interpretation of the MVC architectural pattern.)

The Alloy framework uses a number of very specific technologies to realize an MVC organization:

  • For the models, it uses the backbone.js framework's model instances to organize the application's data. The framework supports automatic data binding to view components.
  • Views use an XML data encoding capturing the static structure of the view. Moreover, the style of each view is captured in TSS stylesheet (having many similarities with CSS).
  • The controllers are CommonJS modules using JavaScript as an implementation language.

Furthermore, the directory structure of an Alloy application also reflects separation of concerns -- each unit of an application stores each concern in a separate directory and file. For example, in the chat app, we can implement each concern of the contacts screen by providing the following files:

./app/views/contacts.xml
./app/controllers/contacts.js
./app/styles/contacts.tss

The above files reflect each concern of the contacts screen, such as the view, the controller and the style.

In addition to defining models, views, styles and controllers on unit-level, the app unit captures general properties applying to the app as a whole.

Organizing the example chat app


Despite the fact that the Alloy framework facilitates separation of concerns to some degree, I still observed that keeping the app's code structure sane remains difficult.

Constructing views


An immediate improvement of Alloy over plain Titanium is that view code in XML is much easier to read than constructing UI components in JavaScript -- the nesting of XML elements reflects the structure of the UI. Furthermore, the style of the UI elements can be separated from the layout, improving the readability even further.

For example, the following snippet shows the structure of the login screen:

<Alloy>
    <Window class="container">
        <ScrollView>
            <View>
                <Label>Web socket URL</Label>
                <TextField id="url" hintText="ws://localhost:5280/websocket/" />
            </View>
            <View>
                <Label>Username</Label>
                <TextField id="username" hintText="sander" />
            </View>
            <View>
                 <Label>Domain name</Label>
                 <TextField id="domain" hintText="localhost" />
            </View>
            <View>
                 <Label>Resource</Label>
                 <TextField id="resource" hintText="" />
            </View>
            <View>
                  <Label>Password</Label>
                  <TextField id="password" passwordMask="true" hintText="" />
            </View>
            <Button onClick="doConnect">Connect</Button>
        </ScrollView>
    </Window>
</Alloy>

As may be observed, by reading the above code fragment, it becomes quite obvious that we have a window with a scroll view inside. Inside the scroll view, we have multiple views containing a label and text field pair, allowing users to provide their login credentials.

Although implementing most screens in XML is quite straightforward, as their structures are quite static, I have noticed that Alloy's technologies are not particularly useful for dynamically composing screen structures, such as the contacts overview that displays a row for each contact -- the structure of this table changes whenever a new contact gets added or an existing contact gets removed.

To dynamically compose a screen, I still need to write JavaScript code in the screen's controller. Furthermore, UI elements composed in JavaScript do not take the style settings of the corresponding TSS file into account. As a result, we need to manually provide styling properties while composing the dynamic screen elements.

To keep the controller's code structured and avoid code repetition, I have encapsulated the construction of table rows into functions.
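
For example, a row construction function could look roughly like the sketch below (a simplified, hypothetical example -- the function name, properties and styling values are illustrative and not taken verbatim from the app):

function createContactRow(contact) {
    /* Styling must be provided manually, because the TSS stylesheet does not
       apply to UI components that are composed in JavaScript */
    var row = Ti.UI.createTableViewRow({
        height: 50
    });

    var nameLabel = Ti.UI.createLabel({
        text: contact.name,
        left: 10,
        color: "#000"
    });

    row.add(nameLabel);
    return row;
}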

Notifying views for changes


Another practical issue I ran into is updating the UI components when something changes, such as receiving a text message or an updated status of a contact. An update to a backbone model automatically updates the attached view components, but for anything that is not backbone-based (such as XMPP's internal roster object) this will not work.

I ended up implementing my own custom non-backbone based data model, with my own implementation of the Observer design pattern -- each object in the data model inherits from the Observable prototype providing an infrastructure for observers to register and unregister themselves for notifications. Each view registers itself as an observer to the corresponding model object to update themselves.
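
To sketch the idea (a minimal, hypothetical outline, not the exact code from the app), such an Observable prototype could look something like this:

function Observable() {
    this.observers = [];
}

/* Allows a view to register itself for update notifications */
Observable.prototype.registerObserver = function(observer) {
    this.observers.push(observer);
};

/* Allows a view to unregister itself when it is no longer interested */
Observable.prototype.unregisterObserver = function(observer) {
    var index = this.observers.indexOf(observer);

    if(index !== -1) {
        this.observers.splice(index, 1);
    }
};

/* Notifies all registered observers that the model object has changed */
Observable.prototype.notifyObservers = function() {
    this.observers.forEach(function(observer) {
        observer.update();
    });
};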

The app's architecture


In the end, this is the architecture of the example chat app that I came up with:


The UML diagram shows the following aspects:

  • All classes can be divided into four concerns: controllers, views, models, and utility classes. The observer infrastructure, for example, does in my opinion not belong to any of the MVC categories, because it is cross-cutting.
  • The XMPPEventHandler is considered to be a controller. Despite not being triggered by human actions, I still classify it as such. The event handler's only responsibility is to update the corresponding model objects once an event has been received from the XMPP server, such as a chat message.
  • All model objects inherit from a custom-made Observable prototype so that views can register and unregister themselves for update notifications.
  • Views extract information from the model objects to display. Furthermore, each view has its own controller responding to user input, such as button clicks.

Lessons learned


In addition to porting an XMPP library from the Node.js ecosystem to Titanium, I have also observed some recurring challenges when implementing the test application and keeping it structured. Despite the fact that the Alloy framework is MVC-based, it does not guarantee that your application's organization remains structured.

From my experiences, I have learned the following lessons:

  • The roles of each concern in MVC are not well defined, so you need to give your own interpretation to them. For example, I would consider any controller to be an object responding to external events, regardless of whether they have been triggered by humans or external systems. By following this interpretation, I ended up implementing the XMPP event handler as a controller.
  • Similarly for the models -- the purpose of backbone.js models is mostly to organize data, but a model is more than just data -- from my perspective, the model encapsulates domain knowledge. This also means that non-backbone objects belong to this domain. The same thing applies to non-data objects, such as functions doing computations.
  • You always have to look at your structure from an aesthetic point of view. Does it make sense? Is it readable? Can it be simplified?
  • Finally, do not rely on a framework or API to solve all your problems -- study the underlying concepts and remain critical, as a framework does not always guarantee that your organization will be right.

    Within the scope of Titanium/Alloy, the problem is that models only make sense if you use backbone models. Using XML markup+TSS for views only makes sense if your screen structure is static. The most logical outcome is to put all missing pieces that do not classify themselves into these categories into a controller, but that is probably the most likely reason why your code becomes a mess.

As a final note, the lessons learned do not apply to mobile applications or Titanium/Alloy only -- you will find similar challenges in other domains such as web applications and desktop applications.

Sunday, January 29, 2017

Some programming patterns for multi-process programming in POSIX applications

It has been a while since I wrote my last programming-related blog post. In this blog post, I am going to elaborate about some of my experiences developing multi-process POSIX applications.

From my perspective, processes are an interesting operating systems concept, in particular in UNIX/POSIX-like operating systems, such as Linux.

The IEEE Std 1003.1 POSIX standard defines a "live process" as:

An address space with one or more threads executing within that address space, and the required system resources for those threads.

Within the boundaries of many (relatively simple) applications, process creation and management is typically not required. Nonetheless, decomposing a system into sub processes can be quite useful for a variety of reasons, such as:

  • Improving a system's responsiveness by running slow/long-running tasks in the background, concurrently with other tasks. This is particularly useful to retain the ability to respond to user events, such as mouse clicks or keyboard input, while data is being processed, or to handle multiple clients connecting to a server application at the same time.
  • Increased protection. If an incorrectly implemented or malicious task crashes during execution, it neither tears down the entire system nor affects the state of any other processes.
  • More security. Child processes can be run under more restrictive user privileges, making it more difficult for them to do any harm, such as accessing privacy-sensitive filesystem areas.
  • Portability. In a child process, we can invoke an external executable implemented in a different programming language.
  • Scalability and performance. The execution of a collection of tasks can be parallelized by means of processes and their executions can be divided over multiple CPU cores by the operating system, potentially increasing the execution speed of a program.

Although multi-process applications may provide a number of compelling benefits, programming such applications using the C programming language and the POSIX API is IMO not always straightforward -- I have found myself frequently repeating numerous patterns over and over again.

To alleviate the burden of repetition, I have identified a number of patterns, derived abstractions from them and constructed a library package, named libprocreact, providing these abstractions. The APIs that libprocreact provides are loosely inspired by reactive programming.

Programming patterns


When programming multi-process applications, there are many housekeeping tasks that need to be performed in order to properly organize them. Below, I have described a number of recurring patterns I typically implement:

Forking


The first and most prominent house keeping task is process creation. The key ingredient in creating processes is the fork() system call, as shown in the code example below:

#include <stdio.h>
#include <unistd.h>

pid_t pid = fork();

if(pid == -1)
    printf("The child process cannot be forked!\n");
else if(pid == 0)
{
    printf("Code executed by the child process\n");
    printf("It runs in the background!\n");
    _exit(0);
}
else
{
    printf("Code executed by the parent process!\n");
    printf("The pid of the child process is: %d\n", pid);
}

Forking is a relatively simple concept -- after successfully invoking the fork() function call (i.e. the return value is not -1), a child process gets created that appears as a nearly identical clone of the parent process. For example, their memory contents and file descriptors are identical.

Furthermore, a forked child process will be executed immediately in parallel to the parent process. Since a parent and child process are almost identical, we can use the return value of fork() to make a distinction between them -- in the parent, the fork() function call returns the PID of the child process so that it can monitor its status and interact with it. In the child process, fork() returns 0.

Although creating a clone of a parent process may sound very expensive, many POSIX-compliant operating systems have optimized this process by using a Copy-On-Write (COW) memory model -- instead of copying a parent process' memory, the memory between the parent and child processes is shared. The operating system maintains a table of shared and private memory pages for each process. When a process attempts to write to a shared memory page, then the corresponding memory page is copied and marked as private to the process.

Executing tasks


After forking a child process, we typically want it to execute a task so that we can use some of the process' positive traits to our advantage. In the if(pid == 0) { ... } block (shown in the previous example), we can put the code that the child process must execute.

For example, we can execute a long running task without blocking the parent process, such as doing an expensive linear search over an array of strings:

#include <string.h>

char *strings[] = { "foo", "bar", "baz", ..., NULL };
char *search = "baz";

...
else if(pid == 0)
{
    unsigned int i;
    
    for(i = 0; strings[i] != NULL; i++)
    {
        if(strcmp(strings[i], search) == 0)
        {
            printf("%s has been found!\n", search);
            _exit(0);
        }
    }

    printf("%s cannot be found!\n", search);
    _exit(1);
}

(As may be observed in the example above, the string array is allocated by the parent process, but since the child process manifests itself as a duplicate on spawning, it has a reference to a "logical copy" as well).

We can also change the user permissions of the child process (for example, to restrict a task from having super-user permissions that might do potential harm):

#include <sys/types.h>
#include <unistd.h>

...
else if(pid == 0)
{
    if(setgid(100) == 0 && setuid(1000) == 0)
    {
        /* Execute some code with restrictive user permissions */
        ...
    }
    else
    {
        printf("Cannot change user permissions!\n");
        _exit(1);
    }
}

or invoking external (pre-built) executables, such as the cat command to stream the contents of a file to the standard output:

...
else if(pid == 0)
{
    char *const args[] = { "cat", "bigtextfile.txt", NULL };
    execvp(args[0], args);
    _exit(1);
}

Waiting


In addition to forking and carrying out tasks by child processes, the parent process must also take notice of a child process' status at some point. This is actually an obligation for certain events, for example, when a child process terminates -- a terminated child process remains a zombie until the parent process takes notice of it.

Taking notice of a process' status can be done by invoking a wait function, such as:

#include <sys/types.h>
#include <wait.h>

pid_t pid;
int wstatus;

/* Fork and execute */

pid_t ret_pid = waitpid(pid, &wstatus, 0);

The above function call specifically waits for a process with a given PID to change state, and captures its wait status in the wstatus variable.

As a sidenote: besides waiting for a specific child process' state to change, it is also possible to wait for any process in a process group to terminate (e.g. by invoking wait()). Furthermore, the wait function invocation blocks the parent process' execution by default, until a child process' state changes. We can also pass the WNOHANG flag to waitpid() to prevent it from blocking.
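
For example, a non-blocking polling loop could be sketched as follows (a minimal fragment reusing the pid and wstatus variables from the example above; real code would do useful work between the polls):

while(waitpid(pid, &wstatus, WNOHANG) == 0)
{
    /* The child process has not changed state yet -- do something else here */
}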

After a wait function invocation completes, we must interpret the return value and wait status:

if(ret_pid == -1)
    printf("Cannot obtain wait status of PID: %d\n", pid);
else if(!WIFEXITED(wstatus))
    printf("The process terminated abnormally!\n");
else if(WEXITSTATUS(wstatus) != 0)
    printf("The process execution failed!\n");
else
    printf("The process has completed its tasks successfully!\n");

In the above code fragment, we check for the following properties:

  • Whether the wait status could be obtained. Sometimes this may not be possible, for example, if a process with a given PID does not exist or when it is beyond the parent process' control.
  • Whether a process has been terminated abnormally or not. For example, abnormal termination happens when a process runs into a segmentation fault. In such cases, it may happen that a process still returns a zero exit status, incorrectly indicating that everything has succeeded.
  • Whether a process has succeeded its tasks or not. By convention, a zero exit status indicates success, while any non-zero exit status indicates failure.

Output buffering


Sometimes, it may also be desired to propagate data back to the parent, for example, after the completion of a data transformation task. Since processes operate in their own private address space, we can no longer rely on shared memory, but we must use some means to transfer the data.

One of the possible means is using a pipe, as shown in the following code fragment:

int pipefd[2];

if(pipe(pipefd) == 0)
{
    pid_t pid = fork();

    if(pid == -1)
        fprintf(stderr, "Cannot fork process!\n");
    else if(pid == 0)
    {
        char *const args[] = { "sort", "words.txt", NULL };

        close(pipefd[0]); /* Close read-end of pipe */
        dup2(pipefd[1], 1); /* Attach write-end to the stdout */
        execvp(args[0], args);
        _exit(0);
    }
    else
    {
        close(pipefd[1]); /* Close write-end of pipe */
        /* Read from pipefd[0] */
        close(pipefd[0]);
    }
}
else
    fprintf(stderr, "Cannot construct a pipe!\n");

Before forking a child process, we construct a pipe consisting of two file descriptors -- a read and write end. In the child process, we close the read-end of the pipe (since it is not needed), and we write data to the write-end. In the parent, we read from the read-end and we discard the unneeded write-end.

When retrieving data from a pipe, we may want to capture its output in a data structure (such as a string, string array or struct). Capturing and transforming data to a specific structure is often not very straightforward, for example:

#define BUFFER_SIZE 1024

ssize_t bytes_read;
char *captured_string = NULL;
unsigned int captured_string_size = 0;
char buffer[BUFFER_SIZE];

while((bytes_read = read(pipefd[0], buffer, BUFFER_SIZE)) > 0)
{

    captured_string = (char*)realloc(captured_string, captured_string_size + bytes_read);
    memcpy(captured_string + captured_string_size, buffer, bytes_read);
    captured_string_size += bytes_read;
}

/* Add NUL-termination */
captured_string = (char*)realloc(captured_string, captured_string_size + 1);
captured_string[captured_string_size] = '\0';

The purpose of the above code fragment is to read data from a pipe and construct a NUL-terminated string. It repeatedly reads chunks of data (of one kilobyte each) from the read-end of the pipe, dynamically extends the size of the buffer collecting the output, appends each chunk to that buffer, and finally appends a NUL character to the result.

As may be noticed, there are many concerns that we have to take care of and the resulting code is not trivial at all.

Orchestrating collections of processes


In addition to implementing the above patterns for a single child process running in the background, we may also want to apply them to collections of processes running concurrently.

Orchestrating collections of processes introduces many additional challenges beyond those described in the previous sections. For example, in order to read from the processes' pipes without blocking their executions (which could happen if any of their buffers gets full), we have to multiplex the housekeeping operations in such a way that they read from each pipe breadth-first.

Multiplexing any of the previously shown patterns makes developing multi-process applications even more difficult.

A functional programming discipline


Because housekeeping tasks require us to supply many lines of boilerplate code, multi-process applications tend to become quite messy without proper organization. For example, if I took the sorting example (shown earlier) and extended it to capture the output of the process invocation into a string, I might end up writing:

#define BUFFER_SIZE 1024

int pipefd[2];

if(pipe(pipefd) == 0)
{
    pid_t pid = fork();

    if(pid == -1)
        fprintf(stderr, "Cannot fork process!\n");
    else if(pid == 0)
    {
        char *const args[] = { "sort", "words.txt", NULL };

        close(pipefd[0]); /* Close read-end of pipe */
        dup2(pipefd[1], 1); /* Attach write-end to the stdout */
        execvp(args[0], args);
        _exit(0);
    }
    else
    {
        ssize_t bytes_read;
        char *captured_string = NULL;
        unsigned int captured_string_size = 0;
        char buffer[BUFFER_SIZE];

        close(pipefd[1]); /* Close write-end of pipe */
        
        while((bytes_read = read(pipefd[0], buffer, BUFFER_SIZE)) > 0)
        {

            captured_string = (char*)realloc(captured_string, captured_string_size + bytes_read);
            memcpy(captured_string + captured_string_size, buffer, bytes_read);
            captured_string_size += bytes_read;
        }

        /* Add NUL-termination */
        captured_string = (char*)realloc(captured_string, captured_string_size + 1);
        captured_string[captured_string_size] = '\0';

        close(pipefd[0]); /* Close read-end of pipe */
    }
}
else
    fprintf(stderr, "Cannot construct a pipe!\n");

I do not expect anyone to study the above code fragment in detail, but just by looking at its structure and length, you can clearly see that this is not a very appealing way to construct applications.

In contrast, the same task can be implemented in only one line of bash shell code:

captured_string=$(sort words.txt)

When I look at the previous code fragment from an abstract point of view, it encapsulates an implementation of a specific concern (i.e. its primary task, such as sorting a file), and a number of general housekeeping concerns, such as constructing a pipe, waiting, output buffering, and closing file descriptors.

One possible way to structure the code in a better way is by function decomposition -- we can separate the specific and general concerns into functions and we can put the general house keeping aspects into a (reusable) library.

Consider the following synchronous function definition and invocation example that checks whether a given string exists in an array of strings by doing a linear search:

#include <stdio.h>
#include <string.h>

#define TRUE 1
#define FALSE 0

int array_contains_string(const char **strings, const char *search)
{
    unsigned int i;
    
    for(i = 0; strings[i] != NULL; i++)
    {
        if(strcmp(strings[i], search) == 0)
            return TRUE;
    }

    return FALSE;
}

int main(int argc, char *argv[])
{
    char *strings[] = { "foo", "bar", "baz", NULL };
    char *search  = "baz";
    int result = array_contains_string(strings, search);

    if(result)
        printf("The array does contain the string: %s\n", search);
    else
        printf("The array does not contain the string: %s\n", search);
        
    return 0;
}

Since searching (in particular linear searching) may take some time, we may want to change the function into an asynchronous function that runs its primary task (the searching) in a child process. I can concisely express this primary concern in one single function:

#include <stdio.h>
#include <unistd.h>

pid_t array_contains_string_async(const char **strings, const char *search)
{
    pid_t pid = fork();

    if(pid == 0)
    {
        unsigned int i;
    
        for(i = 0; strings[i] != NULL; i++)
        {
            if(strcmp(strings[i], search) == 0)
                _exit(0);
        }

        _exit(1);
    }

    return pid;
}

As may be noticed, the above function executes the same linear search procedure shown in the previous code fragment, with the following differences:

  • The function forks a child process and carries out the search operation in the child process.
  • Instead of returning a boolean value, it exits the child process with an exit status. By convention, a zero exit status indicates success while a non-zero exit status indicates failure.

We can capture the wait and exit status checking in a general utility function (procreact_wait_for_boolean()) that interprets the exit status of a child process as a boolean value (meaning that when a process exits with exit status 0 it returns TRUE and for any non-zero exit status, it returns FALSE):

#include <procreact_pid.h>

int main(int argc, char *argv[])
{
    char *strings[] = { "foo", "bar", "baz", NULL };
    char *search  = "baz";

    ProcReact_Status status;
    int result = procreact_wait_for_boolean(array_contains_string_async(strings, search), &status);

    if(status == PROCREACT_STATUS_OK)
    {
        if(result)
            printf("The array does contain the string: %s\n", search);
        else
            printf("The array does not contain the string: %s\n", search);
    }
    else
        fprintf(stderr, "The process terminated abnormally!\n");

    return 0;
}

As may be observed, by separating concerns and putting common operations into a library, we can accomplish the same result as the synchronous code fragment example, with relatively little overhead of boilerplate code.

Managing arbitrary output


The previously shown abstractions work well for functions returning a byte or boolean value, or for void-functions. However, it may also be desirable to implement asynchronous functions returning more complex data, such as strings or arrays of strings, for example:

#include <stdlib.h>
#include <string.h>

char *say_hello_to(const char *name)
{
    char *result = (char*)malloc(strlen(name) + 7 + 1);
    sprintf(result, "Hello %s!", name);
    return result;
}

The above function composes a string that greets a person with a given name. As explained earlier, implementing an asynchronous variant of the above function requires extra facilities to propagate the result back to the parent process, such as constructing a pipe, forcing us to do more housekeeping work.

In the previous examples, we were able to separate a task's primary concern into a function returning a PID and a function waiting for and interpreting the exit status by using the PID reference. To manage complex data, we need to memorize more than just the PID -- we also need the pipe's file descriptors, the buffered data, and the end result.

In some ways, a PID reference resembles another software abstraction -- a future in reactive programming or a promise in JavaScript. A future/promise is an object encapsulating a return value that will be provided at some point in the future.

We can encapsulate the entire housekeeping procedure for transferring and returning complex data in a ProcReact_Future struct:

#include <procreact_future.h>

ProcReact_Future say_hello_to_async(const char *name)
{
    ProcReact_Future future = procreact_initialize_future(procreact_create_string_type());

    if(future.pid == 0)
    {
        dprintf(future.fd, "Hello %s!", name);
        _exit(0);
    }

    return future;
}

The above code fragment looks similar to the synchronous function definition shown in the previous example, with the following differences:

  • By constructing a ProcReact_Future struct, we no longer have to fork a child process and construct a pipe ourselves.
  • The string composition step is carried out by the forked child process.
  • Instead of returning a heap-allocated string, we write the resulting string to the write-end of the pipe provided by the future struct and we terminate the process by invoking the exit function call.

The procreact_initialize_future() function takes a parameter: a type, which is responsible for reading the output from the pipe and converting it into a representation of choice -- in this case a NUL-terminated string.

We can collect the return value of the function in the parent process by invoking the procreact_future_get() function:

ProcReact_Status status;
ProcReact_Future future = say_hello_to_async(name);
char *result = procreact_future_get(&future, &status);

if(status == PROCREACT_STATUS_OK && result != NULL)
    printf("%s\n", result);
else
    fprintf(stderr, "Some error occured!\n");

The procreact_future_get() function (that looks similar to a Future's .get() method or Promise's .then() method) takes care of reading from the pipe, buffering the output, converting the output to a string, waiting for the child process to terminate and closing the obsolete file descriptors.

Furthermore, analogous to a Future or Promise, when the retrieval function gets invoked for a second time, it will return its cached value instead of reading from the pipe again.

Orchestrating collections of processes


With concerns well separated, orchestration of collections of processes also becomes easier. For example, we may want to execute multiple invocations of the following function in parallel:

ProcReact_Future return_count_async(unsigned int count)
{
    ProcReact_Future future = procreact_initialize_future(procreact_create_string_type());

    if(future.pid == 0)
    {
        dprintf(future.fd, "%u", count);
        _exit(0);
    }

    return future;
}

The purpose of the function shown above is to simply return a string representation of a given numeric counter value.

As with the abstraction facilities shown previously (such as ProcReact_Future), we can also create similar abstractions for orchestrating collections of processes, including processes whose output needs to be captured:

ProcReact_FutureIterator iterator = procreact_initialize_future_iterator(has_next_count_process,
  next_count_process,
  complete_count_process,
  &data);

The above function invocation, procreact_initialize_future_iterator(), configures a ProcReact_FutureIterator struct. It takes the following parameters:

  • A pointer to a function that indicates whether there is a next element in the collection.
  • A pointer to a function that invokes the next process in the collection.
  • A pointer to a function that gets invoked when a process completes.
  • A void-pointer referring to an arbitrary data structure that gets passed to all functions above.

If I would like to invoke this function 5 times to, say, count from 1 to 5, I can encapsulate the properties of this iteration process in the following data structure:

typedef struct
{
    unsigned int index;
    unsigned int amount;
    int success;
    char **results;
    unsigned int results_length;
}
IteratorData;

and compose the following instance from it:

IteratorData data = { 0, 5, TRUE, NULL, 0 };

The above struct maintains an index value indicating which element is currently being processed, the amount value holds the number of iterations that need to be executed, success is a boolean status flag indicating whether the iterations have all succeeded or not, and the results variable corresponds to an array of strings capturing the output of each function invocation.

The following function can be used to check whether we have completed the iteration or not:

static int has_next_count_process(void *data)
{
    IteratorData *iterator_data = (IteratorData*)data;
    return iterator_data->index < iterator_data->amount;
}

The following function executes each successive iteration step:

static ProcReact_Future next_count_process(void *data)
{
    IteratorData *iterator_data = (IteratorData*)data;
    ProcReact_Future future = return_count_async(iterator_data->index + 1);
    iterator_data->index++;
    return future;
}

The above function invokes return_count_async() with the next counter value as a parameter and then increments the index.

The following function gets invoked when a process' execution finishes:

static void complete_count_process(void *data, ProcReact_Future *future, ProcReact_Status status)
{
    IteratorData *iterator_data = (IteratorData*)data;

    if(status == PROCREACT_STATUS_OK && future->result != NULL)
    {
        iterator_data->results = (char**)realloc(iterator_data->results, (iterator_data->results_length + 1) * sizeof(char*));
        iterator_data->results[iterator_data->results_length] = future->result;
        iterator_data->results_length++;
    }
    else
        iterator_data->success = FALSE;
}

The above function checks the status of the function invocation and captures the results that each function returns. When a process completes successfully (i.e. it does not terminate abnormally and provides a non-NULL result), it appends the result to the results array. In case of a failure, it sets the overall status flag of the iterator to FALSE.

With all iteration aspects abstracted away into a ProcReact_FutureIterator struct, we can execute the following function to do all iteration steps in parallel:

procreact_fork_in_parallel_buffer_and_wait(&iterator);

We can also limit the number of processes that are allowed to run concurrently to a specific value, e.g. 2:

procreact_fork_buffer_and_wait_in_parallel_limit(&iterator, 2);

After executing all iterations, we can consult the data struct to figure out whether the iterations have succeeded and what their results are.
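For completeness, the following fragment sketches how these pieces could be tied together. It is a sketch based on the snippets shown earlier in this post, not a literal example taken from the library:

IteratorData data = { 0, 5, TRUE, NULL, 0 };
unsigned int i;

/* Configure the iterator with the callback functions and data struct defined above */
ProcReact_FutureIterator iterator = procreact_initialize_future_iterator(has_next_count_process,
    next_count_process,
    complete_count_process,
    &data);

/* Execute all iteration steps in parallel and wait for all of them to complete */
procreact_fork_in_parallel_buffer_and_wait(&iterator);

/* Consult the data struct to determine the overall status and the captured results */
if(data.success)
{
    for(i = 0; i < data.results_length; i++)
        printf("%s\n", data.results[i]);
}
else
    fprintf(stderr, "One of the processes has failed!\n");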

Asynchronously orchestrating collections of processes


The previous two collection examples are executed synchronously. This means that while the execution of each function that retrieves an element is done asynchronously, the overall iteration task blocks the parent process until it completes, which is not always desirable.

A possible solution to make iterations asynchronous is to fork another process and iterate over the collection in the child process, but this introduces another challenge when the collected data needs to be returned to the parent.

Instead of performing the entire iteration as a whole, we can also control each iteration step ourselves. For example, the following function invocation executes a single iteration step that composes a ProcReact_Future struct:

if(procreact_spawn_next_future(&iterator))
    printf("Spawned a process and we have more of them!\n");
else
    printf("All processes have been spawned\n");

We can also run each buffer iteration step ourselves and integrate that function call into a program's main loop:

while(TRUE)
{
    unsigned int running_processes = procreact_buffer(&iterator);

    if(running_processes == 0)
    {
        /* This indicates that there are no running processes anymore */
        /* You could do something with the end result here */
    }

    /* Do other stuff in the main loop */
}

The above code fragment allows us to evaluate processes' statuses and buffer their outputs without blocking the main loop.
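Both primitives can also be combined. The fragment below is a sketch (under the assumption that procreact_spawn_next_future() returns FALSE once the whole collection has been spawned, as the earlier example suggests) that spawns processes and buffers their output from the same non-blocking loop:

int spawning = TRUE;

while(TRUE)
{
    unsigned int running_processes;

    /* Spawn the next process, as long as the collection has not been exhausted */
    if(spawning)
        spawning = procreact_spawn_next_future(&iterator);

    /* Evaluate the processes' statuses and buffer their output without blocking */
    running_processes = procreact_buffer(&iterator);

    if(!spawning && running_processes == 0)
        break; /* Everything has been spawned and all processes have completed */

    /* Do other stuff in the main loop */
}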

Summary


In this blog post, I have described a number of common housekeeping tasks that I typically need to implement when developing multi-process applications, as well as an API that abstracts over these tasks.

To summarize, when we intend to execute a task or a collection of tasks, we can use one of the following data structures, depending on whether we need to evaluate a single value or a collection of values, and whether the type of the value is simple (e.g. a boolean, byte, or void) or complex (e.g. strings, arrays of strings, etc.):

              One                                 Many
Simple        pid_t                               ProcReact_PidIterator
Complex       ProcReact_Future<ProcReact_Type>    ProcReact_FutureIterator

Processing collections of tasks for each type can be done either synchronously or asynchronously, by picking the appropriate utility functions:

              Synchronous                                          Asynchronous
Simple        procreact_fork_in_parallel_and_wait()                procreact_register_signal_handler()
              procreact_fork_and_wait_in_parallel_limit()          procreact_spawn_next_pid()
                                                                   procreact_complete_all_finished_processes()
Complex       procreact_fork_in_parallel_buffer_and_wait()         procreact_spawn_next_future()
              procreact_fork_buffer_and_wait_in_parallel_limit()   procreact_buffer()

Motivation: Disnix


After reading this blog post, you may wonder why I have developed these abstractions. My main motivation is to organize Disnix's codebase in a better way.

Disnix is not only a distributed deployment tool -- all deployment activities that it carries out (package management operations, state management operations, and communications) are delegated to external processes, for exactly the same reasons mentioned in the introduction.

Before I derived the abstractions described in this blog post, all process coordination in Disnix was hand coded -- Disnix's code was cluttered with boilerplate doing output buffering, concurrency limiting, waiting, and resource allocation. As a result, some areas of the code were quite hard to read and difficult to extend. Moreover, it was also quite hard to ensure that the code was free from serious issues, such as potential buffer overflows.

After restructuring the code to use these abstractions (between Git revisions 595c1ec and 4ba5e3d), I have managed to significantly reduce the amount of code. For example, the methods.c module (an RPC interface for all deployment operations that Disnix can execute remotely) previously consisted of 1976 LOC. After refactoring, it consists of only 488 LOC.

Moreover, the disnix-build utility's main module (build.c) has been reduced from 250 LOC to 153 LOC. Each of the 13 command-line utilities that execute a deployment task has at least 100 fewer lines of code.

In addition to reducing the size of many modules, I have also accomplished the following:

  • Better separation of concerns. I have managed to clearly separate all asynchronous functions spawning processes into three separate libraries (libpkgmgmt for package management, libstatemgmt for state management and libinterface for communication), considerably simplifying the toolset's architecture.
  • Performance improvements. Some tools (disnix-capture-infra and disnix-query) were still executing their tasks sequentially and were difficult to optimize. With these new abstractions, parallelizing their tasks became quite simple.
  • Better error reporting. In older versions of Disnix, it was difficult to relate error messages to target machines. With the abstractions described in this blog post, implementing a translation process became much easier. As a result, all tools can now clearly report the origins of an error.

Discussion


Although the abstractions described in this blog post have allowed me to structure Disnix's code in a much better way, they are not a solution for everything.

For example, the ProcReact_Future abstraction buffers the process' output, which is quite inadequate for processes transferring large quantities of data. Moreover, each function invocation implies a fork. On a very large scale this could become quite expensive, as it takes time to set up a child process and memory to allocate its copy-on-write (COW) page tables. In theory, we could reuse spawned processes among function invocations to optimize this, but no such optimization has been implemented yet.

Furthermore, processes are not the only solution for implementing concurrent applications. When frequent interaction is required between parent and child, it may be better to use threads as their memory is shared (at the same time, this also has disadvantages). A major disadvantage of processes (compared to threads) is that all communication data needs to be serialized, transferred and deserialized, introducing quite a bit of communication overhead.

Apart from processes and threads, it is also possible to use a single-threaded event loop, in which the programmer has the obligation to divide bigger tasks into smaller ones. This model is prominently used in Node.js, for example. The advantage of this approach is that each additional concurrent task does not require the allocation of additional resources. This works well, for example, for applications that are mostly I/O-bound (I/O is typically thousands of times slower than a CPU).

The disadvantage of such an application organization is that it becomes the programmer's obligation to ensure that the main thread never blocks and that the code remains structured properly. Moreover, "context switching" between single-threaded tasks is more expensive than a context switch at CPU level, making it quite inadequate for computationally intensive applications. Finally, the operating system cannot automatically divide tasks over multiple CPU cores.

Availability


libprocreact is part of the latest development version of Disnix. Additionally, I have isolated the library code and created a separate GitHub repository for it.

Although libprocreact can be used independently from Disnix, I currently do not have any plans to make it a fully fledged general purpose library. At the moment, it is not supposed to be used for any other use cases than Disnix's.

Friday, December 30, 2016

Sixth annual blog reflection

Today, it is my blog's sixth anniversary. As usual, this is a good opportunity to reflect on last year's writings.

Disnix


Similar to last year, many of this year's blog posts have been covering Disnix-related aspects. In January, I released version 0.5 of Disnix, wrapping up most of the new functionality I have developed in the last months of 2015.

After releasing Disnix 0.5, I have been working on revising its architecture to provide the concept of containers, so that it became possible to deploy databases to multiple DBMS instances hosted on the same machine. Also, the notational conventions used in the Disnix infrastructure models have become much more structured.

In addition to the container concept, I modified Disnix in such a way that it can treat deployed services as containers, so that they become deployment targets for newly deployed services. Previously, it was required to deploy containers by other means first. With these new changes, complex Disnix deployments can now fully manage themselves.

Furthermore, I made some small adjustments so that Disnix can be used as a remote package deployer and I have extended the prototype Dynamic Disnix framework with new features to support state deployment and automatic port assignments to services. Most of these new features have become part of Disnix 0.6, released in June 2016.

node2nix


Another major accomplishment is my Nix/NPM integration work. I have (again!) reengineered my npm2nix fork to support NPM flat module installations.

Moreover, I rebranded my npm2nix fork into node2nix, integrated it into Nixpkgs (replacing the old npm2nix generated set of packages), and extended it with support for simulating NPM global package installations so that Node.js-related build tools, such as Grunt, can be conveniently used within the Nix ecosystem.

Nix/NixOS


Similar to previous years, I also did some general Nix/NixOS work -- I modified Dysnomia (which used to be a companion tool for Disnix) to integrate it with NixOS, so that it can also do state deployment at the system level.

I also did some Nix promotion work. In March, I was invited to give a talk about Nix/NixOS at a Ruby-related conference. In this talk, I elaborated about Nix's declarative deployment properties.

Furthermore, I wrote a blog post explaining Nix's push and pull distribution mechanisms.

Miscellaneous stuff


Besides Nix/deployment-related work, I have also written a blog post about integrating Node.js-style callback and Promise/A-based invocation patterns in JavaScript, and one about a porting strategy for providing XMPP messaging functionality to mobile apps using the Titanium framework.

Finally, in my Christmas break, I have been playing around with an old gaming project and wrote about my experiences.

Blog posts


Like all my previous reflections, I will publish the top 10 of my most frequently read blog posts. Surprisingly enough, not much has changed compared to last year:

  1. On Nix and GNU Guix. Still remains my most popular blog post since 2012 and it appears that it is still attracting quite a few visitors.
  2. An evaluation and comparison of Snappy Ubuntu. Still remains my second most popular blog post attracting almost as many visitors as the previous blog post.
  3. Managing private Nix packages outside the Nixpkgs tree. This blog post is a practical hands-on tutorial targeting Nix beginners. It has now moved to third place and seems to be quite frequently consulted.
  4. Setting up a multi-user Nix installation on non-NixOS systems. This blog post also remains quite popular since 2014 and demonstrates that this area in the Nix user manual is still open for improvement.
  5. An alternative explanation of the Nix package manager. Has slightly dropped in popularity, but is still frequently read.
  6. Yet another blog post about Object Oriented Programming and JavaScript. This JavaScript-related blog post still remains incredibly popular. It now seems that I have become (sort of) an authority in explaining the prototype inheritance concept.
  7. Asynchronous programming with JavaScript. Another JavaScript-related blog post that remains popular, although I have no idea why.
  8. Composing FHS-compatible chroot environments with Nix (or deploying Steam in NixOS). Popular, but gradually dropping in popularity.
  9. On NixOps, Disnix, service deployment and infrastructure deployment. This is the only blog post that was not in last year's top 10. I am actually quite happy to find out that there is an increasing amount of people taking interest in both NixOps and Disnix.
  10. Using Nix while doing development. An explanation blog post that is still popular but gradually dropping in popularity.

Conclusion


I am still not out of ideas, so please stay tuned, because there will be more next year! The only thing I would like to say is:

HAPPY NEW YEAR!!!!!!!

Creating a total conversion for Duke3D

It has been a while since I wrote my last blog post. Currently, I am enjoying a small Christmas break and I have spent a bit of my time playing around with a few of my old (pre-blog historical) projects.


Since the year 2016 is almost over, and because it has been roughly 20 years since the video game Duke Nukem 3D was released (1996), there was one abandoned project that particularly caught my interest.

Although the game was a very famous (or perhaps notorious) first-person shooter known for its highly interactive environments, bad humour and non-linear levels, what I found particularly interesting is that it was also highly customizable -- it came with a level editor (BUILD) allowing anyone to create their own custom maps, an art editor (EDITART) allowing people to create their own textures and sprites, and a scripting language (*.CON files) making it possible to modify or extend many aspects of the game, such as the behaviour of enemies, weapons, objects and the levels per episode.

Because of its hackability and the relative simplicity of the game engine and mechanics, these customization features were quite frequently used. Aside from the many user-created maps circulating around the internet, there were also many total conversions available completely altering the game's behaviour. Some total conversions I found really interesting were Ages in Time (turning the game into a medieval setting), Platoon (a Vietnam war setting) and Wolf2Duke (a cross-over between Wolfenstein 3D and Duke Nukem 3D).

In 2003, the source code of Duke Nukem 3D was released under the GPLv2 (earlier in 2000, the source code of the BUILD engine that empowers Duke3D as well as several other games had already been released under Ken Silverman's BUILD license).

With the availability of the source code of both the game and the underlying engine, people have been making adjustments to make the game work on other operating systems than DOS (such as Windows and Linux) and enhancing its functionality to make it work better on modern systems.

For example, the most advanced source port of Duke Nukem 3D, EDuke32, complements the game engine with an OpenGL renderer with dynamic shading support. As a result, one of EDuke32's distinguishing features is that it supports the fan-made High Resolution Pack, enhancing the graphical quality of the textures and monsters.

While still a teenager at high school, I also spent a considerable amount of my spare time playing with these customization features. First, I started creating a couple of custom maps. Later, I moved on to other kinds of customizations, such as adjusting the monsters' behaviour to make them more dangerous.

I have combined all these modifications into my own total conversion consisting of two modified episodes containing 22 user maps in total. In this blog post, I will describe some of my development experiences.

Creating custom maps


Most of my work on the total conversion was spent on creating my own custom maps. Maps could be constructed with the BUILD editor included with the game. For example, the following command-line instruction allows us to open the first map of the single player episode of my total conversion:

> BUILD DDSL1.MAP

Starting the editor brings you a 2D view of the map shown from above:


Essentially, BUILD engine maps consist of 2D areas called sectors, denoted by areas surrounded by a white border, as shown in the picture above. It is also possible to create sub sectors (denoted by areas surrounded by a red border), to give sub areas in an existing sector different properties. I, for example, used them a lot for creating shadows.

Besides sectors, objects (such as monsters, items or weapons) are represented as sprites denoted by purple (or cyan) colored ovals with "sticks" attached to them.

By pressing the enter key on the numeric keypad, you could switch to 3D mode to see what the map would actually look like:


In 3D mode, there were many additional properties you could configure, such as the heights of the floor and ceilings, and the appearance of objects. For example, by pointing the crosshair to a wall, ceiling, floor, or sprite and pressing the 'V' key you could alter a texture or sprite's appearance:


Something that particularly puzzled me in the early beginning was constructing doors, such as garage doors that move up or down. I still remember very well that I had been studying the documentation for days, without any luck.

Eventually, by inspecting/disassembling other user maps I learned how the mechanics really worked -- the trick was to create a dedicated sector for the door and annotating the sector with a lotag and hitag value (that could be configured by pressing the 'T' key):


Lotag and hitag values (as denoted by the grey label in the picture above) were just metadata properties to the editor, but the game engine interpreted lotag values as functionality and hitag values as a means to group objects. A sector lotag value of 20 corresponds to a function that moves the ceiling up to match the height of the neighbouring ceiling once the player touches it. Any other object with the same hitag value would be triggered as well (that is why many doors in a map typically have unique hitag values, unless they are controlled by a key).

Besides configuring a sector to be a door, we also typically want to hear a sound effect when a player opens or closes it. Sound effects could be configured by placing a special-purpose sprite (MUSICANDSFX) (visible in the editor but invisible in the game), and annotating the sprite with a lotag (through Alt + 'T') corresponding to the number of the audio sample that should be played.

Additionally, we had to lower the door sector's ceiling in 3D mode in such a way that it touches the ground, as a door is typically supposed to be closed by default. For example, if we "disassemble" a door by moving the ceiling up a bit in 3D mode, this is what we will see:


(As may be observed in the image above, the 'M'-sprite is the special-purpose sprite that configures the sound effect).

In addition to garage doors, there were many other kinds of interactivity patterns that you could implement, such as rotating doors, elevators, moving vehicles (through moving sectors), shading effects, and objects that spawn if some action is triggered, all by annotating sectors and special-purpose sprites with lotags and hitags.

Despite its richness of features, the BUILD engine version used in Duke Nukem 3D also had a number of big limitations. For example, while 3D mode gave the player the illusion of observing a 3D world, most of its aspects were not truly 3D. For instance, it was impossible to truly look up or down because of the limitations of the renderer. Nonetheless, the game still gave the player the illusion that this was possible by simply stretching textures and adjusting the positions of the sprites a bit. (As a sidenote: after the source code of the BUILD engine was released, this issue was solved in the Polymost OpenGL renderer).

Moreover, in 2D mode it was also possible to stack sectors on top of each other. However, in 3D mode you could only observe one of them at a time, preventing players from having a true room-over-room experience.

In some maps, a room-over-room experience was faked by "teleporting" the player from one sector to another. For example, when moving from an underwater area to the surface, the player was technically "teleported" between two separate sectors.


The above picture shows an area in my first map, in which I created a swimming pool. The swimming pool's surface is the rectangular sub sector on the right surrounded by a red border, while the underwater area is the rectangular sector on the left. By placing a sector effector sprite (the S-sprite shown below) in both sectors with a lotag value of 7 and equal hitags, the teleportation could be controlled:


A similar fake experience was used for lifts -- when moving a lift up or down, the player was also technically teleported from one sector to another.

Another interesting feature is that it was also possible to study and modify the maps that were included with the game. For example, the following command-line instruction opens the 9th level of the 3rd episode:

> BUILD E3L9.MAP

(As a sidenote, in case you wonder why this works: E3L9.MAP does not exist as a separate file, but is extracted from the game data's group file, DUKE3D.GRP.)

Although I created nearly all my maps from scratch, the E3L9.MAP map, a football stadium in which you had to defeat the cycloid emperor boss, was a bit of a disappointment to me, because it was IMO too small and felt quite incomplete. As a result, I created an extended version of the map, adding many additional features, such as a stand and locker rooms:


Hacking the CON scripts


After creating a couple of maps and playing some total conversions, I also became interested in studying the CON scripts to see what kinds of modifications I could make. By skimming over them and playing around a bit, I made some interesting discoveries.

One of the things I learned in the BUILD editor is that you can create palette swapped sprites and textures. In the game, palette swaps were typically used to create color illuminated rooms or to change a monster's behaviour. A prominent example of such a monster is a blue colored lizard trooper that could be changed into a red colored lizard trooper. The latter was slightly stronger and had the ability to teleport.

Besides lizard troopers, it was also possible to create palette swaps of other enemies, such as pig cops, but I noticed that their behaviour did not change at all:


I wanted red colored pig cops to be more menacing than the blue ones. With a few small modifications to the GAME.CON file, I changed the behaviour of the red colored pig cops to shoot rockets (while keeping the blue colored pig cop's original behaviour intact), by changing the lines:

ifcanshoottarget
{
    sound PIG_ATTACK
    shoot SHOTGUN
    shoot SHOTGUN
    shoot SHOTGUN
    shoot SHOTGUN
    shoot SHOTGUN
}

into:

ifcanshoottarget
{
    ifspritepal 21
    {
        sound RPG_SHOOT
        shoot RPG
        shoot RPG
        shoot RPG
        shoot RPG
        shoot RPG
    }
    else
    {
        sound PIG_ATTACK
        shoot SHOTGUN
        shoot SHOTGUN
        shoot SHOTGUN
        shoot SHOTGUN
        shoot SHOTGUN
    }
}

To make a red colored pig cop drop an RPG instead of a shotgun when it gets killed, I changed:
ifrnd 16
    spawn SHIELD
else
    state drop_shotgun

into:
ifrnd 16
    spawn SHIELD
else
    ifspritepal 21
        state drop_rpg
    else
        state drop_shotgun

Similarly, I modified the code of the lizard men in such a way that they would shoot with an expander if they were made red, or with a freezer if they were made blue.

In the game, there was also a miniature/weaker version of the first episode's end boss that behaved in a similar way. Also, when a miniature boss got defeated, the game did not end.

While studying the CON files, I discovered that miniature versions of the second and third bosses also seemed to exist, but they were not used anywhere in the game. The only thing that prevented them from being used was that their strength levels were set to 1, causing them to die almost instantly.

I made them much stronger by taking the following lines:

ifspritepal 0
    ai AIBOSS2RUNENEMY
else
{
    strength 1
    sound BOS2_ATTACK ai AIBOSS2SHOOTENEMY
}

and changing the strength value into something much higher than 1, e.g. 2000. One of the places where I used a miniature boss 2 is in my space station map:


Although these hacked mini bosses seemed to work, I observed that while fighting them, they would quite often hit themselves with their own rockets, making it incredibly easy to beat them. :-)

Hacking on Wolf2Duke


Apart from my own, one of the total conversions I found quite interesting was Wolf2Duke, creating a crossover between Duke Nukem 3D and Wolfenstein 3D.

Although I liked the idea, I was a bit disappointed about its game experience, because it did not really feel like being in a Wolfenstein 3D game at all. For example, most levels did not use any Wolfenstein 3D textures. Moreover, the enemies also behaved in odd ways and their attacks were extremely powerful.

After studying the total conversion's artifacts, such as the EDITART files, I noticed that there were many textures and sprites included from the original Wolfenstein 3D, but not used anywhere in the conversion. It even included images for items such as med kits, dogfood, and ammo clips, but no functionality was implemented to make them work.

I modified Wolf2Duke's CON scripts to add some of this missing functionality. I added the following code snippet to make the dog food item grant extra health (20 HP) to the player:

useractor notenemy DOGFOOD
  fall
  ifmove RESPAWN_ACTOR_FLAG
    state respawnit
  else
    ifp pshrunk nullop
    else
      ifp palive
        ifpdistl RETRIEVEDISTANCE
          ifcount 6
            ifphealthl MAXPLAYERHEALTH
              ifcanseetarget
      {
        addphealth 20
        quote 125
        ifspawnedby DOGFOOD
          state wolfgetcode
        else
          state wolfquikget
      }
enda


Moreover, I added some code to make the ammo clip work. When collecting a Wolfenstein 3D ammo clip, it would randomly grant the player pistol or chaingun ammo. How nice is that?

Besides hacking on the CON scripts, I also created two levels using my modified Wolf2Duke features, in which I tried to create an experience that felt like the original game. Both levels were included as secret levels in my total conversion.

Reflection


I started the work on my own total conversion somewhere in 1997. It took me almost four years to complete it -- I finished my last map somewhere at the end of 2000.

Around 2005 (a few years after the open source release of Duke Nukem 3D and the release of EDuke32 with the High Resolution Pack (HRP)), I made some adjustments to the maps to make them look better when the engine's newer features were used.

Overall, I consider it to be a very interesting learning experience -- I learned a lot about the game engine, the game mechanics, and about game development in general. As a matter of fact, I liked tweaking and constructing my own maps much more than actually playing the game. :-)

With only limited resources, such as a slow dial-up internet connection and limited proficiency in English, I managed to figure out everything on my own. In total, I have created two episodes with 11 levels each. The maps were quite diverse -- for example, I have created city areas, space stations, landscapes, theme parks, factory buildings, and offices.


In my most ambitious map (shown in the screenshot above), I attempted to model the entire neighbourhood where I grew up. It took me nearly 7 months to complete the map and it became so big that it was pushing the engine to its limits. At some point, I had to make tradeoffs, such as sacrificing shadows, to prevent the game engine from hitting its maximum sector limit of 8192.

What I also learned is that game development is quite challenging and time consuming -- even without doing any real programming work and with a game engine already at my disposal, it still takes a lot of time to construct something coherent, such as levels, to test them and to tweak/optimize them to make them "feel right". To me, it is no surprise that many modern blockbuster games take years to construct.

In addition to Duke Nukem 3D, I also created one user map for Shadow Warrior. Although it was using an improved version of the same game engine (BUILD), its game mechanics were completely different.

Furthermore, I made some attempts to construct maps for Unreal and Half-Life 1, but nothing successful came out of it. Due to a lack of time and the huge amount of complexity I had to bridge, I lost interest and shifted my software development focus.