summer of code internships 2014
Connecting sensation, media, computing, networks, ontology, methodologies, cultures.

Summer of Code at the Interface Ecology Lab!

The Interface Ecology Lab is a supportive community of developers dedicated to human-centered computing, object-oriented design, and open source software. We are engaged in cutting-edge research that visualizes and embodies social and semantic information, with multi-touch interaction, for use in games, creativity support tools, interactive art, and other digital media. Google Summer of Code (GSoC) Internships enable students located anywhere to collaborate with us in developing open source software.

BigSemantics is a language and architecture for defining, extracting, manipulating, exchanging, and presenting semantic data (metadata) in desktop, mobile, and cloud applications. With BigSemantics, developers author application-independent, reusable code blocks in the meta-metadata language, called wrappers, to specify data models, extraction rules, presentation rules, and semantic actions for metadata. The architecture allows you to easily obtain structured semantic data from many template-based web sites, and to conveniently operate on semantic data through native classes and objects. It comes with a large and growing repository of wrappers, supporting a wide range of commonly used web sites and services, such as weather forecasts, Google Books, Amazon Products, games, movies, blog posts, RSS, tweets, and pinboards. We are working on supporting more web sites and information types. Check out the BigSemantics Wiki for more explanations, scenarios, and tutorials. Also check out our example web application, the Metadata In-Context Expander (MICE), which uses BigSemantics to present expandable semantic data.

The foundation of our software is S.IM.PL (Support for Information Mapping in Programming Languages), the best-in-class open source framework for developing distributed programs across programming languages that share and store complex data structures. It's great for writing programs that connect iOS and WPF clients to Java servers.

This summer's projects are centered on BigSemantics, but you may want to use S.IM.PL when working on some of them. Our proposed projects, M1 through M5, are described below.

Applicants should author a meta-metadata wrapper. Fork the BigSemanticsWrapperRepository to author your new wrappers, and submit your work through a GitHub pull request. See the BigSemantics Wiki for instructions on setting up a development environment and tutorials on authoring wrappers.

IRC channel: irc://

M1: Content-based Extraction for BigSemantics

Mentor: Yin Qu

This project will enhance BigSemantics with functionality for handling complex semantics that involve multiple, varying types, determined by the page content. To support extraction and presentation of such semantics, several enhancements to BigSemantics core features are needed:

I. Wrapper selection based on page elements. BigSemantics uses wrappers to specify how semantics should be extracted and presented. Each wrapper corresponds not to a single web page but to a family of web pages that contain semantics of a certain category, or type, such as a book, a movie, or a technology product. Currently, for an input web page, BigSemantics automatically selects the most appropriate wrapper in order to extract and present semantics of that type. This selection is based on URL patterns, suffixes, and MIME types.
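The current selection step can be pictured with a minimal sketch, written in Python for brevity. It is not BigSemantics code: the `Wrapper` class, the `select_wrapper` function, the precedence order (URL pattern, then suffix, then MIME type), and the example patterns are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class Wrapper:
    """Hypothetical stand-in for the selection info carried by a wrapper."""
    name: str
    url_pattern: Optional[str] = None  # regular expression matched against the URL
    suffix: Optional[str] = None       # file suffix, such as "pdf"
    mime_type: Optional[str] = None    # MIME type, such as "image/jpeg"


def select_wrapper(url: str, mime_type: Optional[str],
                   wrappers: List[Wrapper], default: str = "document") -> str:
    """Return the name of the first matching wrapper.

    Assumed precedence: URL pattern first, then suffix, then MIME type.
    """
    path = urlparse(url).path.lower()
    for wrapper in wrappers:
        if wrapper.url_pattern and re.match(wrapper.url_pattern, url):
            return wrapper.name
    for wrapper in wrappers:
        if wrapper.suffix and path.endswith("." + wrapper.suffix):
            return wrapper.name
    for wrapper in wrappers:
        if wrapper.mime_type and wrapper.mime_type == mime_type:
            return wrapper.name
    return default


# Illustrative repository; the names and patterns are made up for this sketch.
REPOSITORY = [
    Wrapper("amazon_product", url_pattern=r"https?://www\.amazon\.com/.+/dp/\w+"),
    Wrapper("pdf", suffix="pdf"),
    Wrapper("image", mime_type="image/jpeg"),
]
```

Note that nothing in this sketch looks at the page itself: two pages with the same URL shape always receive the same wrapper.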

However, we have found that in many cases, determining the real type of a web page requires inspecting its content, such as looking for a specific element with certain values. In this subaim, you will enhance the BigSemantics selection mechanism so that it can determine or refine the type of a web page by inspecting specific elements and their values.
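One possible shape for such content-based refinement is sketched below in Python. Everything here is hypothetical: `ContentRule`, `refine_type`, and the type names are invented for illustration, and the page is assumed to be well-formed XML so that the standard library parser (which supports only a subset of XPath) can read it. Real HTML would need an HTML parser first.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ContentRule:
    """Hypothetical rule: if an element matches, refine the page type."""
    xpath: str                    # element(s) to look for
    expected_text: Optional[str]  # required text value; None means presence is enough
    refined_type: str             # type to use when the rule matches


def refine_type(page_xml: str, initial_type: str, rules: List[ContentRule]) -> str:
    """Return the refined type of the first matching rule, else the initial type."""
    root = ET.fromstring(page_xml)
    for rule in rules:
        for element in root.findall(rule.xpath):
            text = (element.text or "").strip()
            if rule.expected_text is None or text == rule.expected_text:
                return rule.refined_type
    return initial_type


# A tiny well-formed page standing in for an article about an actor.
PAGE = """
<html><body>
  <table class="infobox">
    <tr><th>Occupation</th><td>Actor</td></tr>
  </table>
</body></html>
"""

RULES = [ContentRule(".//table[@class='infobox']/tr/td", "Actor", "movie_star")]

refined = refine_type(PAGE, "wikipedia_article", RULES)
```

Here the type chosen from the URL alone (`wikipedia_article`) is refined to `movie_star` because the page contains the expected element and value; a page without that element keeps its initial type.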

II. Extraction of mixins. In other cases, we need to combine semantics of different types in one metadata object dynamically, based on the content of the page, such as a book for sale (which involves product semantics), or an image with EXIF information. This can be done through mixins. BigSemantics supports mixins in metadata objects; however, we need to enhance the extraction module so that it can extract mixin objects. The type of each mixin object will again be determined by inspecting the page content.

The two enhancements need to work together to address complex semantics. For example, for a Wikipedia page describing a movie star, we want to extract movie star semantics based on the page content, and mix in Wikipedia-specific semantics, such as links to related Wikipedia articles.
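To make the mixin idea concrete, here is a minimal data-structure sketch in Python. It does not reflect the actual BigSemantics metadata classes: `Metadata`, `add_mixin`, `find`, and the field names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Metadata:
    """Hypothetical metadata object: one primary type plus optional mixins."""
    type_name: str
    fields: Dict[str, str] = field(default_factory=dict)
    mixins: List["Metadata"] = field(default_factory=list)

    def add_mixin(self, other: "Metadata") -> None:
        """Attach another metadata object, of a different type, to this one."""
        self.mixins.append(other)

    def find(self, type_name: str) -> Optional["Metadata"]:
        """Return this object or the first mixin with the given type, if any."""
        if self.type_name == type_name:
            return self
        for mixin in self.mixins:
            if mixin.type_name == type_name:
                return mixin
        return None


# Primary semantics chosen from the page content, with site-specific semantics mixed in.
star = Metadata("movie_star", {"name": "Example Actor"})
star.add_mixin(Metadata("wikipedia_article", {"related": "Example_Film"}))
```

An application interested only in Wikipedia-specific links can call `star.find("wikipedia_article")` without knowing that the primary type is `movie_star`.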


  • Experience with Java.
  • Familiarity with HTML, DOM, and XPath.


  • Test cases for the two enhancements. They need to be specific. Use them to drive development.
  • Design document describing how wrapper selection and mixin extraction should work, written with help from the mentor.
  • Enhanced wrapper selection mechanism that inspects page elements.
    • Logic that selects page elements by XPath and optionally matches their content against a predefined value.
    • Logic that changes the meta-metadata type for a page when necessary.
  • Enhanced extraction module that handles mixins.
    • Logic for dynamic types, which can be reused from the previous deliverable.
    • Logic for adding newly extracted metadata objects as mixins.

M2: BigSemantics: Curation of Popular Culture

Mentor: Nic Lupfer

This project develops popular perspectives in web semantics in order to involve the general public, especially young people, in new experiences of collecting, presenting, and sharing information. Curation is the process of developing a conceptual focus, and selecting and connecting significant information that expresses the chosen concepts. Each student assigned to this project will identify a cultural area significant to her or him, and work on the curation of semantic information in this area. Examples of potential popular culture areas include fashion, sports, music, entertainment, skateboarding, DIY, crafts, design, and dance.

The goal of this semantics curation is to make information in these popular culture areas available to people in new ways, to catalyze new expressions, uses, and communities involving the information. Curating semantics involves identifying cultural areas of information on the web that matter to particular groups of people, defining data structures to best represent this information, identifying template-driven web sites that publish data that can be used to instantiate the data structures, and defining XPath and regular expressions that extract information from the web sites. This involves using the Interface Ecology Lab's open source BigSemantics toolkit and MICE.
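The combination of XPath and regular expressions can be illustrated with a short Python sketch. This is not the meta-metadata wrapper syntax: the `extract` function, the rule format, the sample page, and the field names are all invented for illustration, and the page is assumed to be well-formed XML so the standard library parser can read it.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Hypothetical rule format: field name -> (XPath, optional regex with one capture group).
FieldRules = Dict[str, Tuple[str, Optional[str]]]


def extract(page_xml: str, rules: FieldRules) -> Dict[str, str]:
    """Select each field's element by XPath, then narrow its text with a regex."""
    root = ET.fromstring(page_xml)
    result: Dict[str, str] = {}
    for name, (xpath, pattern) in rules.items():
        element = root.find(xpath)
        if element is None:
            continue  # field not present on this page
        text = "".join(element.itertext()).strip()
        if pattern is not None:
            match = re.search(pattern, text)
            if match is None:
                continue  # element found, but text did not have the expected shape
            text = match.group(1)
        result[name] = text
    return result


# A made-up product page from a template-driven skateboarding site.
PAGE = """
<html><body>
  <h1 class="title">Skate Deck 8.0</h1>
  <span class="price">Price: $54.95 USD</span>
</body></html>
"""

RULES: FieldRules = {
    "title": (".//h1[@class='title']", None),
    "price": (".//span[@class='price']", r"\$(\d+\.\d{2})"),
}

product = extract(PAGE, RULES)
```

The XPath locates the element that the site's template always places in the same spot, and the regular expression strips the surrounding text ("Price:", "USD") so that only the value is kept.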

The project involves using social media to coalesce user communities around new visual semantic presentations in the area of popular culture that the student curates.

More than one student can work on this project, assuming that each works on a different area of popular culture. These areas and their associated students can fruitfully intersect.


  • Develop reusable semantic data models for areas of popular culture, using Meta-Metadata.
  • Develop selectors and wrappers for specific web sites.
  • Build fun tutorial applications / demos in DHTML incorporating MICE and popular culture.
  • Build alternative dynamic metadata visualizations using D3.
  • Produce clear documentation in the BigSemantics Wiki.

M3: Rich Placemarks

Mentor: Andruid Kerne

We need methods to facilitate presenting rich heterogeneous metadata on maps. Rich metadata consists of descriptions and relational associative links derived from web pages. Examples include a movie with its actors and their other movies, or a historical site with its significant events and artifacts, along with other related events, sites, and artifacts.

The Interface Ecology Lab's open source BigSemantics project has developed tools for wrappers that represent the data models for rich metadata, extraction rules for particular web sites and other sources, and hints for presentation. A repository of wrappers covers many popular web sites. BigSemantics also already includes components for presenting this rich metadata in JavaScript, Java, and C# / WinRT. MICE is an example.

This project will develop tools and examples for representing rich metadata instances, each associated with a GPS location, on maps, to be known as rich placemarks. The instances of metadata for places that go on the map could be curated in advance, or collected ad hoc by drag and drop. This will be accomplished by integrating BigSemantics, MICE, and OpenStreetMap (and perhaps Google Maps).
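As one way to picture a rich placemark, the Python sketch below pairs a metadata instance with a location by wrapping it in a GeoJSON Feature. GeoJSON is used here only as an illustrative interchange format that many map libraries can read; the project does not prescribe it. The `to_placemark` function, the metadata fields, and the coordinates are made up for this example.

```python
import json
from typing import Any, Dict


def to_placemark(metadata: Dict[str, Any], latitude: float, longitude: float) -> Dict[str, Any]:
    """Wrap a metadata instance and its GPS location in a GeoJSON Feature."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must be between -90 and 90 degrees")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": metadata,
    }


# Made-up historical site with linked events and artifacts (illustrative values only).
site = {
    "type": "historical_site",
    "title": "Example Historic Site",
    "events": [{"title": "Example Event", "year": 1900}],
    "artifacts": [{"title": "Example Artifact"}],
}

placemark = to_placemark(site, latitude=30.0, longitude=-96.0)
serialized = json.dumps(placemark)
```

Keeping the full metadata instance, including its linked events and artifacts, in `properties` is what would make the placemark "rich": the map marker can expand into the same nested information that MICE presents.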


  • Extend and develop reusable semantic data models for places, using Meta-Metadata.
  • Develop selectors and wrappers for specific web sites that present mappable locations.
  • Develop an integration of OpenStreetMap and MICE, which enables interaction with rich place-oriented metadata on a map.
  • Build fun tutorial / demo applications in DHTML incorporating MICE with popular culture, such as mashups involving TripAdvisor, Expedia, and Wikipedia.
  • Produce clear documentation in the BigSemantics and MICE Wikis.

The developer needs experience with HTML5, JSON, and interaction design. Source code is organized with Git.

M4: Building Dynamic Exploratory Browsing Interfaces

Mentor: Yin Qu

In this project, you will design and implement new interfaces for visualizing and interacting with web semantics within web applications and web pages, which we call dynamic exploratory browsing interfaces (DEBIs). The goal of DEBIs is to change how people interact with and interpret the vast amount of information accessible in today’s Internet age. An initial example is MICE, which supports browsing in one context and reduces switching windows or tabs.

The deliverables of this project are centered around creating new DEBIs according to the paradigm of MICE. New DEBIs will use the same underlying open-source frameworks, BigSemantics and BigSemanticsJavaScript. BigSemantics provides rich metadata for web pages. BigSemanticsJavaScript provides functions for retrieving, deserializing, iterating across, displaying, and interacting with metadata. In addition, new DEBIs will build upon the open source visualization framework, D3, and will be incorporated as a part of BigSemanticsJavaScript. Along with designing and building new DEBIs, this project involves iterating on the architecture of the BigSemanticsJavaScript framework, including developing a DEBI testing platform.

By the end of the summer, the developer working on this project will have created two new, innovative HTML5 exploratory browsing interfaces and a testing platform for the BigSemanticsJavaScript framework.


  • DEBI 1
    • Implement a new DEBI from a provided design
    • Validate interface with different types of metadata
  • Code Consolidation
    • Reconcile the code between the existing DEBI, MICE, and the newly created interface
    • Maximize the amount of shared code
    • Test
  • Testing Platform
    • Build a testing platform to test all DEBIs for errors when the shared code base changes
  • DEBI 2
    • Design, implement, and validate a new DEBI

M5: BigSemantics C# Generics Compiler

Mentor: Andrew Webb

BigSemantics provides a compiler that translates wrappers, written in the meta-metadata language, into classes in programming languages such as Java and C#, so that applications can conveniently access and use semantic information as native objects.

The meta-metadata language supports generics. For example, the type search has a generic type parameter for its search results. Subtypes can specialize that generic type parameter to specify a search for a particular type (Search<ResultType>), such as a search for Google Books or Wikipedia articles.

Currently, the meta-metadata compiler that generates C# code doesn’t handle generics correctly. Specifically, it doesn’t handle covariance, which means recognizing that Search<GoogleBook> is a subtype of Search<Book>. In C#, covariance is handled through interfaces instead of concrete classes. Therefore, to deal with covariance, the compiler needs to generate not only the concrete class, but also the interface containing covariance information.
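The interface-plus-class pattern can be sketched by analogy in Python, where a `Protocol` with a covariant type variable plays the role that an interface with an `out` type parameter plays in C#. This is only an analogy, not compiler output: the names `SearchView`, `Search`, `Book`, `GoogleBook`, and `titles` are invented for illustration, and in Python the variance is checked by static type checkers, not at runtime.

```python
from dataclasses import dataclass
from typing import Generic, List, Protocol, Sequence, TypeVar


@dataclass
class Book:
    title: str


@dataclass
class GoogleBook(Book):
    volume_id: str = ""


T = TypeVar("T")
T_co = TypeVar("T_co", covariant=True)


class SearchView(Protocol[T_co]):
    """Read-only interface: T_co appears only in return positions, so it can be covariant."""

    def results(self) -> Sequence[T_co]: ...


class Search(Generic[T]):
    """Concrete class: invariant in T, because add() also accepts T as an input."""

    def __init__(self, results: List[T]) -> None:
        self._results = list(results)

    def results(self) -> Sequence[T]:
        return tuple(self._results)

    def add(self, result: T) -> None:
        self._results.append(result)


def titles(search: SearchView[Book]) -> List[str]:
    """Accepts any search whose results are books, including more specific ones."""
    return [book.title for book in search.results()]


google_books: Search[GoogleBook] = Search([GoogleBook("Example Title", "vol-1")])
found = titles(google_books)  # a type checker accepts this through the covariant interface
```

The concrete class cannot itself be covariant because it exposes `add`, which takes the type parameter as an input. Splitting out a read-only interface is what lets a search over a subtype be used where a search over the supertype is expected, and that split is what the generated C# code needs to reproduce.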

In this project, you will develop this feature of generating C# generics and covariance interfaces from meta-metadata wrappers. You will learn how type systems work in Java and C#, and how to manipulate them programmatically. You will work with types and code generation, skills that are useful in many large-scale projects.


  • Design document on what the compiler should generate, and how.
  • Develop the testing basis for the required feature.
  • Implement the required feature.
  • Integrate the developed feature with the current meta-metadata compiler, and refactor its structure.

Application Template

Please use the following format and guidelines when preparing your application.

General Information and Resume

Submit your latest resume. In addition, list the following information in your application:

Logistical Information

Short Bio

Describe your previous experience in software or hardware development, and what skills you bring to the table. Tell us what brings you to the open-source development community. Explain how open source fits with your other pursuits in life, academic or otherwise.


2 to 3 paragraphs describing your motivation for working on the project you have chosen. What you expect to gain from working on the project, why you are interested, and what you believe to be the expected benefits to the open-source community.

Deliverables and Timeline

In roughly a page of text, break down your selected project into a set of milestones or deliverables and set realistic timetables for accomplishing these deliverables. Include a Gantt chart showing your planned progress over the summer. Explain the deliverables that are "must-haves", and expand on future work that you may or may not accomplish, time permitting. Keep in mind that some things may take longer than you originally anticipated, so be sure to include contingency plans for such circumstances. This is the heart of your proposal.


Write a short paragraph detailing how you plan to stay in communication with your project mentor and ensure that your deliverables are on time. Will you have weekly Skype meetings? Is email your preferred method of communication? How do you plan to keep us informed of both progress and problems?

Code Sample

For each project, please share with us a programming project or pieces of programming work that you have done before. For example, a GitHub or Google Code project that you have done for a class, or links to patches you have submitted to open source projects. The project or code sample will be used solely to evaluate your programming skills.

For projects M1-M5, you should also submit a meta-metadata wrapper that you have authored, along with an example URL from which it extracts metadata. You should author the wrapper by forking the BigSemanticsWrapperRepository project, and submit your wrapper through a GitHub pull request. If you need help, contact us on IRC or via email.


Anything else that you want to let us know about yourself, about the project, or any other information you think would help your application should be included as well. This includes other projects you may have worked on, code samples you are particularly proud of, academic publications, your personal blog, software development experience, education, etc. Remember, you are trying to sell yourself as the right person for the job! Anything that would help convince us of that should be in your application, including its associated web pages.