s.im.pl:
serialization guide

If you need to develop applications that make use of existing data (e.g. RSS), communicate over the network (e.g. a multiplayer game), or capture structured data from the web (e.g. Wikipedia, the ACM Portal, or IMDB), S.IM.PL is the place to start.

The Support for Information-Mapping in Programming Languages project develops cross-platform support for writing software that manipulates complex information. It does so by enhancing imperative programming languages and developing complementary declarative mark-up languages. S.IM.PL serialization provides fine-grained control of object↔serialization binding through the Data Binding Annotation Language (DBAL), which is embedded directly in source code. DBAL enables the straightforward definition of strongly-typed objects that match serialized structures, for example existing XML formats. Translation scopes group sets of object↔serialization binding definitions, and one scope can inherit from another. The present system supports (compressed) XML for serialization, while future work will develop alternate translation schemes, such as type-length-value and JSON.

In the present implementation, developers define the semantics of object marshalling by embedding DBAL annotations in Java source code. We refer to languages that support DBAL annotations as source languages. Java is currently the only supported source language; C# is planned. From a source language, one can generate equivalent object definitions, and the XML that drives translation, for target languages. Target languages do not support DBAL annotations, so their object↔serialization bindings must be transferred from a source language (or specified by hand). Currently, we support Objective-C as a target language, enabling development of iPhone and Macintosh client applications that use OODSS to talk to a Java server. We plan to support other target languages, such as C++.
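
To illustrate the general idea of annotation-driven binding, the following Java sketch uses reflection to turn annotated fields into XML. The annotations XmlAttr and XmlChild, the Channel and Item classes, and the toXml helper are hypothetical names invented for this example. They are not DBAL syntax or the S.IM.PL API; see the DBAL guide for the real annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Locale;

// Hypothetical annotations for this sketch only; they are NOT DBAL syntax.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface XmlAttr {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface XmlChild {}

// Strongly-typed objects shaped like a fragment of an RSS-style document.
class Item {
    @XmlAttr String link;
}

class Channel {
    @XmlAttr String title;
    @XmlChild Item item;
    String cache; // not annotated, so never serialized
}

class BindingSketch {
    // Serializes annotated fields; the lower-cased class name becomes the tag.
    static String toXml(Object obj) {
        Class<?> type = obj.getClass();
        String tag = type.getSimpleName().toLowerCase(Locale.ROOT);
        StringBuilder attrs = new StringBuilder();
        StringBuilder children = new StringBuilder();
        try {
            for (Field field : type.getDeclaredFields()) {
                boolean isAttr = field.isAnnotationPresent(XmlAttr.class);
                boolean isChild = field.isAnnotationPresent(XmlChild.class);
                if (!isAttr && !isChild) continue; // unannotated: skip
                field.setAccessible(true);
                Object value = field.get(obj);
                if (value == null) continue; // omit unset fields
                if (isAttr) {
                    attrs.append(' ').append(field.getName()).append("=\"")
                         .append(escape(value.toString())).append('"');
                } else {
                    children.append(toXml(value)); // nested object becomes a child element
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        if (children.length() == 0) return "<" + tag + attrs + "/>";
        return "<" + tag + attrs + ">" + children + "</" + tag + ">";
    }

    // Escapes the characters that are unsafe inside a double-quoted attribute.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Item item = new Item();
        item.link = "http://example.org/a?x=1&y=2";
        Channel channel = new Channel();
        channel.title = "News";
        channel.item = item;
        channel.cache = "ignored";
        // prints <channel title="News"><item link="http://example.org/a?x=1&amp;y=2"/></channel>
        System.out.println(toXml(channel));
    }
}
```

The point of the sketch is that the binding lives next to the field declarations, so the class definition and its serialized form cannot drift apart. A production framework would also handle unmarshalling, collections, and inherited fields, which this sketch omits.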

We recommend that new S.IM.PL Serialization developers begin with the tutorials, then peruse the DBAL guide and API.

performance

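S.IM.PL Serialization performance was measured with the bindmark XML binding benchmarks. It was the fastest framework at marshalling medium and large documents and at unmarshalling large documents, and second only to JiBX on the remaining timings and in runtime size.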

                       marshalling time (nanoseconds)         unmarshalling time (nanoseconds)       runtime
framework              small        medium       large        small        medium       large        size (KB)
                       (1000 runs)  (100 runs)   (10 runs)    (1000 runs)  (100 runs)   (10 runs)
s.im.pl serialization  39,004       1,528,674    7,151,383    289,966      5,815,660    35,446,572   176
JiBX*                  35,943       1,719,212    15,627,181   70,420       5,683,769    39,249,876   141
JAXB 2.0               65,749       2,223,438    10,567,599   362,868      14,889,646   87,146,075   3,800
XStream                413,465      18,373,301   173,198,604  719,290      28,818,238   211,358,650  368
Castor                 1,183,407    11,831,280   55,922,610   1,813,563    22,926,830   162,330,464  3,000

test file              small        medium       large
XML file size (KB)     1.42         123          1,003
XML file lines         55           1,817        8,567
XML file depth         2            20           45

*JiBX uses a binding compile step to optimize Java byte code for performance; this step is excluded from the benchmark.
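
The table does not state how each figure was derived from its runs. One common approach, assumed here and not confirmed for bindmark, is to time a batch of repeated runs with System.nanoTime and report the mean per run. The TimingSketch class, its meanNanos helper, and the warm-up count below are hypothetical and only illustrate that approach.

```java
import java.util.function.Supplier;

class TimingSketch {
    // Results are stored here so the JIT cannot discard the timed work as dead code.
    static volatile Object sink;

    // Runs the task warmupRuns times untimed, then returns the mean
    // wall-clock nanoseconds per run over the timed runs.
    static long meanNanos(Supplier<?> task, int warmupRuns, int runs) {
        for (int i = 0; i < warmupRuns; i++) {
            sink = task.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink = task.get();
        }
        return (System.nanoTime() - start) / runs;
    }

    public static void main(String[] args) {
        // Stand-in workload: build a small XML string. A real benchmark
        // would call a framework's marshalling method here instead.
        Supplier<String> marshal = () -> {
            StringBuilder xml = new StringBuilder("<items>");
            for (int i = 0; i < 50; i++) {
                xml.append("<item id=\"").append(i).append("\"/>");
            }
            return xml.append("</items>").toString();
        };
        long mean = meanNanos(marshal, 1000, 1000);
        System.out.println("mean ns per run: " + mean);
    }
}
```

Warm-up runs matter on the JVM because early iterations execute in the interpreter, so timings taken without them mostly measure compilation rather than the steady-state cost of the code under test.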

publications
Shahzad, N., S.IM.PL Serialization: Type System Scopes Encapsulate Cross-Language, Multi-Format Information Binding, Texas A&M University Masters Thesis, 2011.
Kerne, A., Toups Dugas, P. O., Dworaczyk, B., Khandelwal, M., A Concise XML Binding Framework Facilitates Practical Object-Oriented Document Engineering, Proceedings of ACM Symposium on Document Engineering, Sao Paulo, Brazil, 16-19 September 2008.
Kerne, A., Toups Dugas, P. O., Dworaczyk, B., Khandelwal, M., Expressive, Efficient, Embedded, and Component-based XML-Java Data Binding Framework, Interface Ecology Lab Technical Report 08-06.
an interface ecology lab production