The Interface Ecology Lab is a supportive community of developers dedicated to human-centered computing, object-oriented design, and open source software. We are engaged in cutting-edge research that visualizes and embodies social and semantic information, with multi-touch interaction, for use in games, creativity support tools, interactive art, and other digital media. Google Summer of Code (GSoC) Internships enable students located anywhere to collaborate with us in developing open source software.
BigSemantics is a language and architecture for defining, extracting, manipulating, exchanging, and presenting semantic data (metadata) in desktop, mobile, and cloud applications. With BigSemantics, developers author application-independent, reusable code blocks, called wrappers, in the meta-metadata language to specify data models, extraction rules, presentation rules, and semantic actions for metadata. The architecture allows you to easily obtain structured semantic data from many template-based web sites, and conveniently operate on semantic data through native classes and objects. It comes with a large and growing repository of wrappers, supporting a wide range of commonly used web sites and services, such as weather forecasts, Google Books, Amazon Products, games, movies, blog posts, RSS, tweets, and pinboards. We are working on supporting more web sites and information types. Check out the BigSemantics Wiki for more explanations, scenarios, and tutorials. Also check out our example web application, the Metadata In-Context Expander (MICE), which uses BigSemantics to present expandable semantic data.
The foundation of our software is S.IM.PL (Support for Information Mapping in Programming Languages), the best-in-class open source framework for developing distributed programs across programming languages that share and store complex data structures. It's great for writing programs that connect iOS and WPF clients to Java servers.
This summer's projects are centered on BigSemantics, but you may want to use S.IM.PL when you are working on some of them. Our proposed projects include:
Applicants should author a meta-metadata wrapper. Fork the BigSemanticsWrapperRepository to author your new wrappers, and submit your work through a GitHub pull request. See the BigSemantics Wiki for instructions on setting up a development environment and tutorials on authoring wrappers.
IRC channel: irc://irc.freenode.net/ecologylab.
Mentor: Yin Qu
This project will enhance BigSemantics with functionality for handling complex semantics that involve multiple types, which vary with the page content. To support extraction and presentation of such semantics, several enhancements to BigSemantics core features are needed:
I. Wrapper selection based on page elements. BigSemantics uses wrappers to specify how semantics should be extracted and presented. Each wrapper corresponds not to a single page but to a family of web pages that contain semantics of a certain category, or type, such as a book, a movie, or a technology product. Currently, for an input web page, BigSemantics automatically selects the most appropriate wrapper in order to extract and present semantics of that type. This selection is based on URL pattern, suffix, and MIME type.
However, experience has shown that in many cases, determining the real type of a web page requires inspecting its content, such as looking for a specific element with certain values. In this subaim, you will enhance BigSemantics’ selection mechanism so that it can determine or refine the type of a web page by inspecting specific elements and their values.
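The idea of content-based refinement can be illustrated with a minimal sketch. This is not the BigSemantics selection code: the `Wrapper` class, its `element_test` field, and `select_wrapper` are hypothetical names invented for illustration, and the page is assumed to be well-formed XHTML so that the Python standard library can parse it.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Wrapper:
    name: str
    url_pattern: str  # regular expression matched against the page URL
    # Hypothetical content test: (element path, expected text).
    element_test: Optional[Tuple[str, str]] = None


def content_matches(root: ET.Element, wrapper: Wrapper) -> bool:
    """Return True if the wrapper has an element test and the page passes it."""
    if wrapper.element_test is None:
        return False
    path, expected = wrapper.element_test
    node = root.find(path)
    return node is not None and (node.text or "").strip() == expected


def select_wrapper(url: str, page: str, wrappers: List[Wrapper]) -> Optional[str]:
    """Select by URL pattern first, then refine by inspecting page elements."""
    root = ET.fromstring(page)
    candidates = [w for w in wrappers if re.search(w.url_pattern, url)]
    # A wrapper whose content test passes is the most specific match.
    for w in candidates:
        if content_matches(root, w):
            return w.name
    # Otherwise fall back to a wrapper selected by URL alone.
    for w in candidates:
        if w.element_test is None:
            return w.name
    return None
```

Here two wrappers may share one URL pattern, and the element test decides between them. A real implementation would also need an HTML parser that tolerates malformed markup, and would keep the existing suffix and MIME type checks.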
II. Extraction of mixins. In other cases, we need to combine semantics of different types in one metadata object dynamically, based on the content of the page, such as a book for sale (which involves product semantics), or an image with EXIF information. This can be done through mixins. BigSemantics already supports mixins in metadata objects, but the extraction module must be enhanced to extract mixin objects. The type of each mixin object will again be determined by inspecting the page content.
The two enhancements need to work together to address complex semantics. For example, for a Wikipedia page describing a movie star, we want to extract movie star semantics based on the page content, and mix in Wikipedia-specific semantics, such as links to related Wikipedia articles.
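As a rough sketch of what mixin extraction could look like, consider the book-for-sale case. The `Metadata` class, the `MIXIN_RULES` table, and the element paths below are all assumptions made for illustration. They do not reflect the actual BigSemantics data model or wrapper syntax.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Metadata:
    type_name: str
    fields: Dict[str, str] = field(default_factory=dict)
    mixins: List["Metadata"] = field(default_factory=list)


# Hypothetical mixin rules: mixin type -> {field name: element path}.
MIXIN_RULES = {
    "product": {"price": ".//span[@class='price']"},
    "exif": {"camera": ".//span[@class='camera']"},
}


def extract_mixins(root: ET.Element, metadata: Metadata) -> Metadata:
    """Attach a mixin for every rule set that finds content on the page."""
    for type_name, paths in MIXIN_RULES.items():
        values = {}
        for name, path in paths.items():
            node = root.find(path)
            if node is not None and node.text:
                values[name] = node.text.strip()
        # The mixin type is decided by the page content: no values, no mixin.
        if values:
            metadata.mixins.append(Metadata(type_name, values))
    return metadata
```

A page that carries a price element yields a book object with a product mixin, while the same book wrapper applied to a page without one yields no mixins.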
Mentor: Nic Lupfer
This project develops popular perspectives in web semantics in order to involve the general public, especially young people, in new experiences of collecting, presenting, and sharing information. Curation is the process of developing a conceptual focus, and selecting and connecting significant information that expresses the chosen concepts. Each REU student assigned to this project will identify a cultural area significant to her / him, and work on the curation of semantic information in this area. Examples of potential popular culture areas include fashion, sports, music, entertainment, skateboarding, DIY, crafts, design, and dance.
The goal of this semantics curation is to make information in these popular culture areas available to people in new ways, to catalyze new expressions, uses, and communities involving the information. Curating semantics involves identifying cultural areas of information on the web that matter to particular groups of people, defining data structures to best represent this information, identifying template-driven web sites that publish data that can be used to instantiate the data structures, and defining XPath and regular expressions that extract information from the web sites. This involves using the Interface Ecology Lab's open source BigSemantics toolkit and MICE.
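To give a feel for the extraction step, here is a minimal sketch that combines a path expression with a regular expression. It is written in plain Python rather than the meta-metadata language, and the page markup, field names, and `extract_field` helper are hypothetical. Note that the standard library's ElementTree supports only a subset of XPath and requires well-formed markup.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple


def extract_field(root: ET.Element, xpath: str,
                  pattern: Optional[str] = None) -> Optional[str]:
    """Locate an element by path, then optionally narrow its text with a regex."""
    node = root.find(xpath)
    if node is None or node.text is None:
        return None
    text = node.text.strip()
    if pattern is None:
        return text
    match = re.search(pattern, text)
    # The first capture group holds the value of interest.
    return match.group(1) if match else None


def extract_record(page: str,
                   rules: Dict[str, Tuple[str, Optional[str]]]) -> Dict[str, Optional[str]]:
    """Apply one (path, regex) rule per field to a template-driven page."""
    root = ET.fromstring(page)
    return {name: extract_field(root, xpath, pattern)
            for name, (xpath, pattern) in rules.items()}
```

For a skateboard review page, for instance, the rules might map `title` to a heading element and `price` to a paragraph whose text is narrowed to the numeric amount by a regular expression.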
The project involves using social media to coalesce user communities around new visual semantic presentations in the area of popular culture that the student curates.
More than one student can work on this project, assuming that each works on a different area of popular culture. These areas and their associated students can fruitfully intersect.
Mentor: Andruid Kerne
We need methods for presenting rich, heterogeneous metadata on maps. Rich metadata consists of descriptions and relational, associative links derived from web pages. Examples include a movie, its actors, and their other movies; or a historical site, its significant events and artifacts, and other related events, sites, and artifacts.
This project will develop tools and examples for representing rich metadata instances, each associated with a GPS location, on maps. We call these rich placemarks. The metadata instances for places on the map could be curated in advance, or collected ad hoc by drag and drop. This will be accomplished by integrating BigSemantics, MICE, and OpenStreetMap (and perhaps Google Maps).
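One possible representation of a rich placemark, offered only as a sketch, is a GeoJSON Feature whose `properties` hold the metadata instance. The choice of GeoJSON, the function names, and the metadata field names are assumptions, not a decided design. GeoJSON stores coordinates in longitude, latitude order.

```python
import json
from typing import Any, Dict, List


def make_placemark(lat: float, lon: float,
                   metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a metadata instance as a GeoJSON Feature at a GPS location."""
    return {
        "type": "Feature",
        # GeoJSON positions are [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": metadata,
    }


def make_layer(placemarks: List[Dict[str, Any]]) -> str:
    """Serialize placemarks as a FeatureCollection for a map library to load."""
    return json.dumps({"type": "FeatureCollection", "features": placemarks})


# Illustrative data only: a hypothetical site with linked events.
site = {
    "type": "historical_site",
    "title": "Example Historic Site",
    "events": [{"title": "Example Event", "location": "http://example.org/event"}],
}
layer = make_layer([make_placemark(30.0, -96.0, site)])
```

Keeping the linked events and artifacts inside `properties` lets the map client expand them in place, in the manner of MICE, when a placemark is opened.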
The developer needs experience with HTML5, JSON, and interaction design. Source code is organized with Git.
Mentor: Yin Qu
In this project, you will design and implement new interfaces for visualizing and interacting with web semantics within web applications and web pages, which we call dynamic exploratory browsing interfaces (DEBIs). The goal of DEBIs is to change how people interact with and interpret the vast amount of information accessible in today’s Internet age. An initial example is MICE, which supports browsing in one context and reduces switching windows or tabs.
Mentor: Andrew Webb
BigSemantics provides a compiler that translates wrappers, written in the meta-metadata language, into classes in programming languages such as Java and C#, so that applications can conveniently access and use semantic information as native objects.
The meta-metadata language supports generics. For example, the type search has a generic type parameter for its search results. Subtypes can specialize that generic type parameter to specify a search for a particular type (Search<ResultType>), such as a search for Google Books or Wikipedia articles.
Currently, the meta-metadata compiler that generates C# code doesn’t handle generics correctly. Specifically, it doesn’t handle covariance, that is, recognizing that Search<GoogleBook> can be used where Search<Book> is expected. In C#, generic variance can be declared only on interfaces and delegates, not on concrete classes. Therefore, to deal with covariance, the compiler needs to generate not only the concrete class, but also an interface that carries the covariance information.
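To make the target concrete, the following sketch emits C# source text for a generic type together with a covariant interface. It is a toy generator written in Python, not the meta-metadata compiler, and the shape of the emitted code (the `I` prefix, the `IEnumerable` property, the explicit interface implementation) is one plausible design assumed for illustration.

```python
def generate_csharp(type_name: str, type_param: str, field_name: str) -> str:
    """Emit a covariant interface plus a concrete class that implements it."""
    interface = f"I{type_name}"
    lines = [
        "using System.Collections.Generic;",
        "",
        # 'out' marks the type parameter as covariant; it may then appear
        # only in output positions, such as a read-only property.
        f"public interface {interface}<out {type_param}>",
        "{",
        f"    IEnumerable<{type_param}> {field_name} {{ get; }}",
        "}",
        "",
        f"public class {type_name}<{type_param}> : {interface}<{type_param}>",
        "{",
        f"    public List<{type_param}> {field_name} {{ get; set; }}",
        "",
        # Explicit implementation exposes the list through the covariant view.
        f"    IEnumerable<{type_param}> {interface}<{type_param}>.{field_name}",
        "    {",
        f"        get {{ return {field_name}; }}",
        "    }",
        "}",
    ]
    return "\n".join(lines)


source = generate_csharp("Search", "TResult", "Results")
```

With code of this shape, and assuming GoogleBook derives from Book, a Search<GoogleBook> instance can be assigned to a variable of type ISearch<Book>, even though it cannot be assigned to Search<Book>.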
In this project, you will develop this feature of generating C# generics and covariance interfaces from meta-metadata wrappers. You will learn how type systems work in Java and C#, and how to manipulate them programmatically. You will work with types and code generation, skills that are useful in many large-scale projects.
Please use the following format and guidelines when preparing your application:
Describe your previous experience in software or hardware development, and what skills you bring to the table. Tell us what brings you to the open-source development community. Explain how open source fits with your other pursuits in life, academic or otherwise.
Write two to three paragraphs describing your motivation for working on the project you have chosen: what you expect to gain from working on the project, why you are interested, and what you believe to be the expected benefits to the open-source community.
In roughly a page of text, break down your selected project into a set of milestones or deliverables and set realistic timetables for accomplishing these deliverables. Include a Gantt chart showing your planned progress over the summer. Explain which deliverables are "must-haves", and describe future work that you may or may not accomplish, time permitting. Keep in mind that some things may take longer than you originally anticipated, so be sure to include contingency plans for such circumstances. This is the heart of your proposal.
Write a short paragraph detailing how you plan to stay in communication with your project mentor and ensure that your deliverables are on time. Will you have weekly Skype meetings? Is email your preferred method of communication? How do you plan to keep us informed of both progress and problems?
For each project you apply to, please share with us a programming project or pieces of programming work that you have done before. For example, a GitHub or Google Code project that you did for a class, or links to patches you submitted to open source projects. The project or code sample will be used solely to evaluate your programming skills.
For projects M1-M5, you should also submit a meta-metadata wrapper that you have authored, along with an example URL from which it extracts metadata. You should author the wrapper by forking the BigSemanticsWrapperRepository project, and submit your wrapper through a GitHub pull request. If you need help, contact us on IRC or via email.
Anything else that you want to let us know about yourself, about the project, or any other information you think would help your application should be included as well. This includes other projects you may have worked on, code samples you are particularly proud of, academic publications, your personal blog, software development experience, education, etc. Remember, you are trying to sell yourself as the right person for the job! Anything that would help convince us of that should be in your application, including its associated web pages.