Oxidation is a process of adding oxygen to a chemical compound. Examples include burning and rusting. This experiment concerns the Rusting of a compound called Squeekboard: a derivative of Eekboard, originally containing high quantities of C, and reacting eagerly with GObject, GTK, and the X windowing system.
The goal of the ongoing experiment is to measure properties of Rust and the consequences of its application in real-world conditions. Due to safety and time concerns, the widely popular approach of Rewrite it in Rust (RiiR) was dismissed in favor of a gradual oxidation process.
The questions tested were:
- does replacing C code with Rust reduce overall compound size?
- does using Rust instead of C reduce the incidence of uncontrollable reactions with memory (e.g. segfaults)?
- how can existing compounds rich in C be oxidized?
- can compounds with Rust in them be manufactured using existing industrial processes?
The experiment relies entirely on Squeekboard as the subject. It was chosen due to the need to redesign it for a new process (X.org to Wayland), and due to being relatively easy to separate.
Because Rust is an element belonging to the programming language group, this analysis ignores all other constituents of Squeekboard. Squeekboard’s programming languages are almost exclusively Rust and C, with some shell and Meson impurities, which are subsequently ignored, as replacing them with Rust is not expected to yield useful results.
Quantities of programming languages are measured in Significant Lines of Code (SLOC) as determined by the cloc tool.
The measure of memory reactions is the sum of anecdotal crashes, and filed memory-related issues in the bug tracker.
Unfortunately, some factors could not be rigorously accounted for due to the Squeekboard compound being under active development. For that reason, all conclusions relating measurements to macroscopic properties carry a significant margin of error.
Squeekboard was separated from Eekboard when it contained about 15 thousand SLOC of C. At the time of this analysis, it contained 5567 SLOC, of which 3526 are C and 2041 are Rust, down from 14862 SLOC of C before the oxidation: a reduction in total size of 62.5%.
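The arithmetic behind the 62.5% figure can be checked with a short sketch. The function name is hypothetical; the inputs are the SLOC counts quoted above.

```rust
/// Percentage by which `after` is smaller than `before`.
/// Hypothetical helper, used only to re-check the figures in the text.
fn percent_reduction(before: u32, after: u32) -> f64 {
    (f64::from(before) - f64::from(after)) / f64::from(before) * 100.0
}
```

Calling percent_reduction(14862, 3526 + 2041) compares the original C-only size with the current C plus Rust total, and gives a value that rounds to 62.5.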
Figure 1: Lines of code by language. This figure approximates the timeline of the changes within the Squeekboard compound. Commits were ordered according to git log. Because several changes could happen in parallel, there are artifacts in the form of spikes and troughs resulting from one set of changes being displayed, then seemingly reversed (parallel set of changes displayed), ending with the first applied again (merge commit).
The general trend of size change is steeply downwards, owing chiefly to the removal of unnecessary DBus complexes, custom types, and the simplification of objects. A large drop in the middle of the figure stands out, corresponding to commit c7d5e8d, which replaced custom styling with GTK calls. Two less pronounced drops in C size happened before that event: 4bf4500 and 6f5f497, cleaning up previously unconnected pieces. A later commit (521796a) removes a quantity of Rust by replacing an included copy of bitflags, relying instead on one provided externally.
With the most recent changes, however, the trend flattens out. Considering that there are very few unnecessary components left within Squeekboard, and the future need to make it exhibit new properties and behaviours, it can be expected that the trend will soon reverse.
The oxidation timeline contains three main phases: the introduction of Rust, which held at several hundred lines of code for a significant number of commits, followed by a phase of rapid increase (starting with the inclusion of bitflags in a6ee303), and a final flat phase of no increase. The general trend of increasing Rust content seems to be matched by an opposite but stronger trend of decreasing C content, resulting in an overall decrease.
Figure 2: Increase in Rust content versus total size, by commit. The red line is a least squares fit with slope equal to 0.53.
Figure 3: Increase in Rust content versus total size, by commit, zoomed in.
In order to determine whether adding Rust lines correlates with a change in total size, a correlation coefficient was calculated. The resulting value equals 0.196, suggesting a weak positive correlation: on a short time scale, changes in the amount of Rust correlate only slightly with changes in total size.
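A correlation value of this kind can be computed as sketched below. This assumes the Pearson coefficient was used, which the text does not state, and the function name is hypothetical. Here xs would hold the change in Rust lines per commit and ys the change in total lines per commit.

```rust
/// Pearson correlation coefficient of two equally long samples.
/// Returns None if the lengths differ, fewer than two points are
/// given, or either sample has zero variance.
fn pearson(xs: &[f64], ys: &[f64]) -> Option<f64> {
    if xs.len() != ys.len() || xs.len() < 2 {
        return None;
    }
    let n = xs.len() as f64;
    let mean_x = xs.iter().sum::<f64>() / n;
    let mean_y = ys.iter().sum::<f64>() / n;

    // Accumulate the covariance and both variances (unnormalized;
    // the common factor 1/n cancels in the final ratio).
    let mut cov = 0.0;
    let mut var_x = 0.0;
    let mut var_y = 0.0;
    for (x, y) in xs.iter().zip(ys.iter()) {
        let dx = x - mean_x;
        let dy = y - mean_y;
        cov += dx * dy;
        var_x += dx * dx;
        var_y += dy * dy;
    }
    if var_x == 0.0 || var_y == 0.0 {
        return None;
    }
    Some(cov / (var_x * var_y).sqrt())
}
```

A result near 1 or -1 indicates a strong linear relationship, and a result near 0, such as the 0.196 reported here, indicates a weak one.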
An additional least squares fit was calculated to determine how adding Rust relates to total code changes. The slope obtained equals 0.53, meaning that for each added line of Rust code, the total size changes by about 0.53 lines.
At a glance, it means that C is not being converted to Rust much at all, which is inconsistent with the large scale trend. Unfortunately, this kind of analysis is only concerned with single commits, and therefore only confirms that C to Rust conversion does not happen on the scale of single commits.
The 0.53 value could mean that adding a line of Rust removes 0.47 lines of C on average; however, it’s difficult to verify due to the bimodal distribution of commits: most are concerned exclusively with either C or Rust.
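The step from a slope of 0.53 to "0.47 lines of C removed" can be sketched as follows. It assumes that total size is the sum of Rust and C lines only, as in the rest of this analysis. Both function names are hypothetical.

```rust
/// Slope b of the ordinary least squares line y = a + b * x.
/// Returns None for mismatched lengths, fewer than two points,
/// or when all x values are equal.
fn ols_slope(xs: &[f64], ys: &[f64]) -> Option<f64> {
    if xs.len() != ys.len() || xs.len() < 2 {
        return None;
    }
    let n = xs.len() as f64;
    let mean_x = xs.iter().sum::<f64>() / n;
    let mean_y = ys.iter().sum::<f64>() / n;
    let mut cov = 0.0;
    let mut var_x = 0.0;
    for (x, y) in xs.iter().zip(ys.iter()) {
        cov += (x - mean_x) * (y - mean_y);
        var_x += (x - mean_x) * (x - mean_x);
    }
    if var_x == 0.0 {
        None
    } else {
        Some(cov / var_x)
    }
}

/// With total = rust + c, a slope of total change against Rust change
/// implies an average C change of (slope - 1) per added Rust line.
fn implied_c_change_per_rust_line(slope: f64) -> f64 {
    slope - 1.0
}
```

For a slope of 0.53, implied_c_change_per_rust_line returns about -0.47. This is an average over all commits, so it says nothing about whether any individual commit actually traded C for Rust.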
Reactions with memory
The remaining question regarding the properties of a mixed C and Rust compound concerns its reactions with memory. Rust is regarded as a highly controllable element, as opposed to C. In the process of creating the converted version of Squeekboard, it has proven to be true so far. Before the conversion, spontaneous disintegration was a regular part of the development process, occurring at unpredictable intervals across most development activities. As the conversion progressed, such incidents became less common; instead, when mistakes are made, Squeekboard simply cannot be built.
However, such protection is not perfect. There is still danger of making undetected mistakes at the C/Rust interface, including the risk of leaking memory through it. Considering that Squeekboard still relies on C bases, careful memory interactions will always be necessary in some places, especially in Rust FFI parts that receive calls from or call C.
Analysis of memory interaction anomalies with tools like memcheck, heaptrack, and massif did not reveal any obvious issues caused by the Rust contents.
The process of removing C groups and replacing them with Rust is best done in small chunks. Since the conversion itself must be done in a single step, it’s generally advisable to break down any conversion into a series of smaller ones whenever possible.
The first step for any conversion is identifying the property to be bestowed upon the compound: e.g. easier interaction with JSON, or less time spent changing some internal part while developing. Following that, we will get an early idea of what kind of work needs to be done. Such work will generally involve several objects. In Squeekboard, the best strategy seemed to be converting each object separately in a change on its own, slowly approaching the core needed change.
The general advice for such conversion is: pass trivial structures like Rectangle and Point by value, and pass strings as CStrings if the C side needs to read their contents. Finally, pass complicated Rust/C objects as pointers. C objects can be represented in Rust as struct MyCObbj(*const c_void);, whereas Rust objects must be kept boxed from creation: Box::into_raw(boxobj) gives a pointer to an instance of struct rustobj;. Such objects still need to be freed manually using Box::from_raw(rawobj), but allow more freedom as more objects are converted.
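The passing conventions above can be illustrated with a minimal sketch. All type and function names here are hypothetical and not taken from Squeekboard. The #[repr(C)] attribute is an assumption not mentioned in the text, but it is needed for a struct to have a C-compatible layout when passed by value. In a real library, each exported function would also carry a no_mangle attribute so that C can find it by name; it is left out here to keep the sketch independent of the Rust edition.

```rust
use std::ffi::{c_void, CString};
use std::os::raw::c_char;

/// Trivial structure: C-compatible layout, passed by value.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

pub extern "C" fn point_offset(p: Point, dx: f64, dy: f64) -> Point {
    Point { x: p.x + dx, y: p.y + dy }
}

/// Opaque handle to an object owned by the C side.
/// Rust only stores and passes the pointer, never dereferences it.
#[derive(Clone, Copy)]
pub struct MyCObj(pub *const c_void);

/// A Rust object that C only ever sees through a pointer.
pub struct RustObj {
    name: CString,
}

/// Box the object at creation and hand ownership to the caller.
pub extern "C" fn rustobj_new() -> *mut RustObj {
    let name = CString::new("squeek").expect("literal has no NUL bytes");
    Box::into_raw(Box::new(RustObj { name }))
}

/// Let C read the string. The pointer stays valid until rustobj_free.
/// Safety: `obj` must come from rustobj_new and not be freed yet.
pub unsafe extern "C" fn rustobj_name(obj: *const RustObj) -> *const c_char {
    unsafe { (*obj).name.as_ptr() }
}

/// Turn the pointer back into a Box so that Rust drops it.
/// Safety: must be called at most once per object.
pub unsafe extern "C" fn rustobj_free(obj: *mut RustObj) {
    if !obj.is_null() {
        drop(unsafe { Box::from_raw(obj) });
    }
}
```

Forgetting to call rustobj_free leaks the object, and calling it twice is undefined behaviour, which is the kind of mistake at the C/Rust interface that the compiler cannot catch.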
In order to build Squeekboard in significant quantities, the Meson build system is used. While it’s perfectly suited for pure C compounds, integrating Rust poses new challenges.
Initial additions of Rust did not require many changes: adding rust to the project specification, the relevant .rs files, and linking the results together. That approach had a large shortcoming: ready-made Rust parts are not contained in single files, and often use a complicated synthesis process, orchestrated by Cargo.
While Cargo is also a build system, and Meson’s counterpart, it really works much better as a package manager, while Meson is light years ahead of it as a build system. For this reason, the Squeekboard process team decided not to switch to Cargo wholesale, but rather use a Cargo process as subservient to the orchestrating Meson one.
In order to achieve this, Cargo configuration and some glue have been added to the project, and the result of the Cargo process (librs) has been carefully and statically linked to the rest of the results as a custom_target. That procedure required Meson 0.51, which complicated matters a bit.
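A setup of this shape might look roughly like the Meson fragment below. It is a sketch under assumptions, not Squeekboard's actual build definition: the wrapper script name, the target name, and the source path are hypothetical, and the Cargo side is assumed to build a static library (crate-type "staticlib"). Accepting a custom_target output in link_with appears to be what Meson 0.51 introduced, which would explain the version requirement; treat that as an assumption too.

```meson
# Sketch only: script name, target name and source path are hypothetical.
project('squeekboard', 'c', meson_version: '>=0.51.0')

# Wrapper script assumed to run `cargo build` and copy the resulting
# static library to the path passed as its first argument.
cargo_build = find_program('cargo_build.sh')

rslibs = custom_target(
  'rslibs',
  output: 'librs.a',
  command: [cargo_build, '@OUTPUT@'],
  build_by_default: true,
  build_always_stale: true,  # rerun every build; Cargo tracks its own inputs
  console: true,             # show Cargo's output while building
)

executable(
  'squeekboard',
  'src/main.c',
  link_with: rslibs,  # link the Cargo-built static library into the C program
)
```

Marking the target as always stale hands dependency tracking over to Cargo, at the cost of invoking it on every build.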
Ultimately, even Cargo tests are useable through Meson, although they are not linked to the C components, and therefore cannot test C interactions (this area has not been explored yet).
As Squeekboard is further integrated into the Debian process, it has been important to test its building in a Debian environment. The first snag was hit with Meson 0.51, which was not available in Buster. After packaging Meson, the packaging of Squeekboard itself posed some issues related to Debian versions of Rust pieces not necessarily corresponding to crates.io (Cargo repository) ones. This was resolved by removing the Cargo.lock as part of the build process, although it is clearly not a perfect solution.
The conversion of C to Rust caused Squeekboard’s structure to change considerably, with some costs and benefits. The most important benefit was replacing the bloated XML receptor with a YAML one based on serde, saving on the order of a thousand lines, while improving validation. Another benefit is the usage of typed hash tables and arrays, reducing the possibility of errors.
One big downside of conversion is the need to add glue parts between C and Rust for every conversion. While they are strictly Rust, their structure is often exactly like C, operating on raw pointers, with the added overhead of converting to Boxes and managing their locations, negating many of the benefits of Rust. Thankfully, as oxidation progresses, and the objects they deal with receive no more attention from C, they either mature to idiomatic Rust, or get removed entirely.
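The contrast between idiomatic code and its glue can be sketched as follows. The Layout type and all function names are hypothetical, not Squeekboard's own. Export attributes are omitted, as the point is only the shape of the code.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Idiomatic Rust: owned data, borrowed arguments, no raw pointers.
pub struct Layout {
    keys: Vec<String>,
}

impl Layout {
    pub fn new() -> Layout {
        Layout { keys: Vec::new() }
    }
    pub fn add_key(&mut self, label: &str) {
        self.keys.push(label.to_owned());
    }
    pub fn key_count(&self) -> usize {
        self.keys.len()
    }
}

// Glue: every method needs a C-shaped wrapper that checks raw
// pointers by hand and converts them to Rust types first.

pub extern "C" fn layout_new() -> *mut Layout {
    Box::into_raw(Box::new(Layout::new()))
}

/// Safety: `layout` must come from layout_new, and `label` must
/// point to a NUL-terminated string.
pub unsafe extern "C" fn layout_add_key(layout: *mut Layout, label: *const c_char) -> bool {
    if layout.is_null() || label.is_null() {
        return false;
    }
    let c_label = unsafe { CStr::from_ptr(label) };
    match c_label.to_str() {
        Ok(label) => {
            unsafe { (*layout).add_key(label) };
            true
        }
        Err(_) => false, // not valid UTF-8
    }
}

/// Safety: `layout` must be null or come from layout_new.
pub unsafe extern "C" fn layout_key_count(layout: *const Layout) -> usize {
    if layout.is_null() {
        0
    } else {
        unsafe { (*layout).key_count() }
    }
}

/// Safety: must be called at most once per layout.
pub unsafe extern "C" fn layout_free(layout: *mut Layout) {
    if !layout.is_null() {
        drop(unsafe { Box::from_raw(layout) });
    }
}
```

Once no C code calls the wrappers any more, they can be deleted and only the impl block remains.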
Experiences interfacing with external parts have been mostly positive: Wayland interfaces can be created using the same rules as internal ones, and calling Wayland functions directly in Rust is easier than connecting through C-based types. GTK usage in the recently finished popover experiment has been more mixed, degrading to “C-in-Rust” for some holes in GVariant support, while still allowing data to be manipulated more easily.
As Squeekboard oxidation progressed, its size was greatly reduced (by 62.5%). However, it’s difficult to attribute those changes to oxidation alone, as many unused and unnecessary pieces have been removed or replaced. At the same time, additional properties have been added, muddling the picture even more.
On the scale of a single commit, no reduction of total size as a function of Rust additions was found either. This result is also quite uncertain, given the low correlation coefficient of 0.196.
Anecdotal evidence from the development process itself suggests that having more Rust makes memory errors more obvious and easier to remedy.
The success of the oxidation experiment itself proves that coexistence of C and Rust is possible, and achievable with some build process changes, even while adhering to Debian-like processes.
The authors of this paper would like to acknowledge the #debian-rust channel and the patient Debian wizards at Purism for process help, the Veusz project, Python, and LibreOffice for data analysis, and the Rust, Meson, and Cargo projects for providing tools necessary for the experiment.
The post Oxidizing Squeekboard appeared first on Purism.