This project proposes a loss function that improves on the baseline approach of applying image super-resolution to each frame independently. Building on the SRGAN proposed in "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network", we propose a loss function that feeds the GAN a bundle of temporally neighbouring frames rather than a single frame.
The model takes a sliding window of several consecutive frames as the input to the generator network and outputs a bundle of consecutive SR frames. We propose a perceptual loss function that accounts for both single-frame super-resolution quality and temporal continuity, and the experiments evaluate the quality of videos super-resolved with this loss function. More details can be found on GitHub.
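As a rough illustration of a loss that scores a bundle of frames for both per-frame quality and temporal continuity, here is a minimal sketch. It is not the loss used in the project: plain pixel-wise MSE stands in for the perceptual term, frames are flattened lists of floats, and the names `bundle_loss` and `temporal_weight` are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally sized flat frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bundle_loss(sr_frames, hr_frames, temporal_weight=0.5):
    """Score a bundle of super-resolved frames against ground truth.

    Assumption: pixel MSE replaces the perceptual (feature-space) term,
    and temporal continuity is measured by comparing frame-to-frame
    differences of the output with those of the ground truth.
    """
    n = len(sr_frames)
    # Per-frame term: average reconstruction error over the bundle.
    content = sum(mse(s, h) for s, h in zip(sr_frames, hr_frames)) / n

    # Temporal term: the change between neighbouring output frames
    # should match the change between neighbouring target frames.
    temporal = 0.0
    for t in range(1, n):
        sr_diff = [c - p for c, p in zip(sr_frames[t], sr_frames[t - 1])]
        hr_diff = [c - p for c, p in zip(hr_frames[t], hr_frames[t - 1])]
        temporal += mse(sr_diff, hr_diff)
    if n > 1:
        temporal /= n - 1

    return content + temporal_weight * temporal
```

With this formulation, two outputs with the same per-frame error are ranked differently if one of them flickers between frames, which is the behaviour a per-frame baseline cannot penalize.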
The goal of this project is to make it possible to take lab notes in the midst of an experiment, when stopping to do so can be difficult or inconvenient (e.g. you want to record a description of the texture of stucco you are mixing, but your hands are gloved and covered in sticky plaster). To solve this problem, we created an augmented-reality-based video recording solution for taking video notes during experiments.
We used several 3D UI interaction techniques, such as video cropping area selection, walking steering travel among multiple videos, and wayfinding to other experiment desks. More details can be found on GitHub or in the slides.
PR2-GOGR (PR2-Based Grasping and Object Geometry Reconstruction) is a COMSE6731 Humanoid Robots Final Project. In this project, we present an approach to reconstructing the geometric structure of a given object from a series of depth views using Willow Garage's PR2. We first use the depth camera to localize the target object to be scanned, then have the PR2 grasp it using a predefined grasp planning algorithm.
Then, a set of color images and depth images is captured and aligned to reconstruct the point cloud of the target object. The report also gives simulated experiment results showing the reconstructed point cloud. More details can be found on GitHub or in the slides.
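To illustrate how a single depth image turns into points, here is a minimal sketch that back-projects depth pixels through a pinhole camera model. This is an assumption for illustration, not the project's pipeline: the function name `depth_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and aligning several views into one cloud is not shown.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image into 3D points in the camera frame.

    depth  : 2D list of depth values (same unit as the output points);
             values <= 0 are treated as missing measurements.
    fx, fy : focal lengths in pixels (assumed pinhole intrinsics).
    cx, cy : principal point in pixels.
    """
    points = []
    for v, row in enumerate(depth):        # v = pixel row
        for u, z in enumerate(row):        # u = pixel column
            if z <= 0:
                continue                   # skip invalid depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Each view yields a cloud in its own camera frame; merging views would additionally require the camera pose for every capture.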
Unsupervised Image-to-Image Translation Networks (UNIT), proposed by NVIDIA, is a general framework for unsupervised image-to-image translation. The network can learn to translate an image from one domain to another without any paired images between the two domains. In this project, I add a region of interest (ROI) constraint to UNIT to improve its ability to translate images between summer and winter on a specific dataset, "Cab Ride Norway : Trondheim - Bodø Nordland Line" (summer, winter). The experimental results show an improvement in both training speed and image translation quality between the summer and winter railway trips.
We propose a new assumption called ROI-specific shared-latent: cycle consistency is enforced for the region of interest in the two image domains, and that region of the two images is mapped to the same latent space. More details can be found on GitHub or in the slides.
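One simple way to restrict cycle consistency to a region of interest is to mask the reconstruction error. The sketch below is only an illustration under that assumption, not the project's loss: images are flattened lists of floats, the L1 distance is chosen for simplicity, and `roi_cycle_loss` is a hypothetical name.

```python
def roi_cycle_loss(original, reconstructed, roi_mask):
    """Mean absolute error between an image and its cycle reconstruction,
    computed only over pixels where roi_mask is truthy.

    Assumption: 'reconstructed' is the image after translating to the
    other domain and back; how that is produced is outside this sketch.
    """
    total = 0.0
    count = 0
    for o, r, m in zip(original, reconstructed, roi_mask):
        if m:
            total += abs(o - r)
            count += 1
    # An empty ROI contributes no loss.
    return total / count if count else 0.0
```

Errors outside the mask do not affect the value, so the constraint concentrates on the chosen region.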
We propose a novel 360° scene representation Parallax360 for converting real scenes into stereoscopic 3D virtual reality content with head-motion parallax. Our image-based scene representation enables efficient synthesis of novel views with six degrees-of-freedom (6-DoF) by fusing motion fields at two scales: (1) disparity motion fields carry implicit depth information and are robustly estimated from multiple laterally displaced auxiliary viewpoints, and (2) pairwise motion fields enable real-time flow-based blending, which improves the visual fidelity of results by minimizing ghosting and view transition artifacts.
Based on our scene representation, we present an end-to-end system that captures real scenes with a robotic camera arm, processes the recorded data, and finally renders the scene in a head-mounted display in real time (more than 40 Hz).
Our approach is the first to support head-motion parallax when viewing real 360° scenes. We demonstrate compelling results that illustrate the enhanced visual experience – and hence sense of immersion – achieved with our approach compared to widely-used stereoscopic panoramas.
The capturing device uses a GoPro Hero 5 Session as the camera and several stepper motors to adjust the robotic arm's pose. I used a CNC USB Controller as the control system, which can also be replaced by an Arduino Mega CNC Shield. In the video above, the head-motion parallax is clearly visible in the VR scene rendered with our algorithm, Light Field Compression.
Panoramas have always been the most popular representation of virtual reality scenes. However, neither panoramas nor stereo panoramas can represent the depth information of a scene well. Hence, I designed a special image representation of the VR scene based on optical flow, together with a complete set of solutions from capturing to rendering. Here is a demo video on how to capture images using our solution.
In the video above, I combined the NEQ6 with a Logitech C920 webcam to capture images from every possible direction. The NEQ6 is highly precise and mainly used for astronomical observation. In our hardware solution, it controls the position and orientation of the camera.
avaChat offers a wholly new chatting experience. Users can define their 3D avatars in the profile settings. When talking with others, they see their own and their friends' avatars on the screen, and these avatars help express the chatters' moods through body language and facial expressions.
In addition, avaChat saves users' chat history in a distinctive way that helps them keep precious memories. avaChat can also be built as a cross-platform plugin for use with popular chat apps. An AR chatting mode is provided as well, showing the avatar standing in the real scene.
avaChat_Holo is the extension of avaChat to Microsoft HoloLens. It allows you to communicate with others in your real-world surroundings. With its holographic UI, users can feel immersed in the chat and interact actively with the person they are talking to.
This is my graduate project, done during my Master's at Tsinghua University. I designed and developed a series of algorithms and applications to store and present the light field, and also built a set of devices for capturing it. The light field is a kind of container that encapsulates all the information about light from every position and direction, so it is massive and redundant. My work focuses on reducing the redundancy of the traditional light field and designing a compact representation.
My research comprises three main tasks: (a) a compact representation of the light field; (b) a real-time method to render it; (c) a set of devices to capture the light field so that it can be stored and rendered. The video below demonstrates the whole process of capturing the light field, converting it into the compressed version, and rendering it under user interaction:
The algorithm for compressing the light field is mainly inspired by motion compensation, a common method for storing video files efficiently. The light field can be regarded as a video in a higher-dimensional space, so similar methods can be adopted.
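For readers unfamiliar with motion compensation, the sketch below shows its core step, block matching: for a block in the current image, find the offset into a reference image that predicts it best, so that only the offset and a small residual need to be stored. This is a generic textbook illustration, not the project's compression algorithm; the function names and the sum-of-absolute-differences cost are assumptions. By the analogy above, the two images could be consecutive video frames or neighbouring light-field views.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a size x size block of `ref`
    at (rx, ry) and a block of `cur` at (cx, cy). Images are 2D lists."""
    return sum(
        abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
        for j in range(size)
        for i in range(size)
    )


def best_motion_vector(ref, cur, cx, cy, size, search):
    """Exhaustively search offsets within +/- `search` pixels and return
    ((dx, dy), cost) for the reference block that best predicts the
    block of `cur` whose top-left corner is (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(ref, cur, cx, cy, cx, cy, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference image.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(ref, cur, rx, ry, cx, cy, size)
                if cost < best_cost:
                    best_cost = cost
                    best = (dx, dy)
    return best, best_cost
```

A cost of zero means the block is predicted perfectly from the reference, so nothing but the offset has to be kept for it.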
K3SimSearch is a simple Python script that works as a dictionary: you can look up a GRE word and find words that look similar to it (visually similar, not synonyms). It is a small tool to help students prepare for the GRE.
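As an illustration of what "visually similar" lookup can mean, the sketch below ranks words by Levenshtein edit distance to the query. This is an assumption about one plausible similarity measure, not necessarily what K3SimSearch uses; `edit_distance`, `similar_words`, and `max_distance` are hypothetical names.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similar_words(query, vocabulary, max_distance=2):
    """Return vocabulary words within max_distance edits of the query,
    closest first (ties broken alphabetically)."""
    scored = [(edit_distance(query, w), w) for w in vocabulary if w != query]
    return [w for d, w in sorted(scored) if d <= max_distance]


print(similar_words("adverse", ["averse", "adverb", "diverse", "banana"]))
# ['averse', 'adverb', 'diverse']
```

Words that are easy to confuse at a glance tend to be only one or two edits apart, which is why a small distance threshold works as a first approximation.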
This showcase displays the graph of visually similar GRE words.
Parocam (有劲) is an iOS app that uses GPUImage and SenseTime face detection and alignment technology. The name of the app comes from "Parody Camera". It supports real-time recording with various creation modes, such as facial feature morphing, background changing, and smart masks. With the app, users can create many impressive short videos featuring their faces.
The app is currently available on the App Store. This work was mainly done during my early semesters at Tsinghua University. It is also one of the complete, commercial projects that I've done.
PlanarSight is a course project for Computational Geometry (CS 7024-0183). The course, taught by Dr Deng, really impressed me and made me want to dive into the world of computational geometry. The project is a small game built on advanced algorithms such as constrained Delaunay triangulation and visibility polygon construction.
Here is a snapshot of the game:
This project is open source; you can fork it here.
This is a course project for Computer Graphics and Computer Aided Design. It triangulates a planar polygon. To implement it with high performance, I studied the paper and the open-source project poly2tri.
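To show what polygon triangulation produces, here is a minimal ear-clipping sketch for simple polygons without holes. Ear clipping is swapped in here only because it fits in a few lines; it is not the algorithm used in this project or in poly2tri, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def signed_area(poly):
    """Signed area of a polygon; positive if vertices are counter-clockwise."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))


def point_in_triangle(p, a, b, c):
    """True if p lies inside or on counter-clockwise triangle (a, b, c)."""
    return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0


def triangulate(polygon):
    """Split a simple polygon (list of (x, y) tuples) into triangles
    by repeatedly cutting off 'ears'."""
    pts = list(polygon)
    if signed_area(pts) < 0:
        pts.reverse()                      # work in counter-clockwise order
    idx = list(range(len(pts)))
    triangles = []
    while len(idx) > 3:
        for k in range(len(idx)):
            i, j, l = idx[k - 1], idx[k], idx[(k + 1) % len(idx)]
            a, b, c = pts[i], pts[j], pts[l]
            if cross(a, b, c) <= 0:
                continue                   # reflex or degenerate corner
            if any(point_in_triangle(pts[m], a, b, c)
                   for m in idx if m not in (i, j, l)):
                continue                   # another vertex blocks this ear
            triangles.append((a, b, c))
            del idx[k]                     # clip the ear
            break
        else:
            raise ValueError("no ear found; polygon may not be simple")
    triangles.append(tuple(pts[m] for m in idx))
    return triangles
```

A polygon with n vertices always yields n - 2 triangles. This version runs in roughly cubic time in the worst case, so it is meant for illustration rather than performance.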
Here is a snapshot of the application:
This project is open source; you can fork it here.
WebGLBrush is the final project for my bachelor's degree. It is a pure front-end project based on SculptGL. Inspired by ZBrush, a commercial digital sculpting tool for 3D modeling, I set out to implement a web-based 3D sculpting and modeling system as my final project.
It presents a large model as the scene in a canvas and lets users sculpt it interactively. The underlying algorithm subdivides the mesh surface and moves points on the mesh precisely along their normal directions. This gives users a WYSIWYG effect and makes it a lightweight solution for small modifications to 3D models.
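The idea of moving mesh points along their normals can be sketched as a simple brush. This is an illustration only, not the WebGLBrush or SculptGL implementation: the function name `sculpt`, the spherical brush, and the smooth falloff are assumptions, and mesh subdivision is not shown.

```python
import math


def sculpt(vertices, normals, center, radius, strength):
    """Displace vertices near `center` along their unit normals.

    vertices, normals : lists of (x, y, z) tuples of equal length.
    center, radius    : spherical brush position and size (assumed model).
    strength          : displacement at the brush center; negative carves.
    """
    result = []
    for v, n in zip(vertices, normals):
        d = math.dist(v, center)
        if d >= radius:
            result.append(tuple(v))        # outside the brush: unchanged
            continue
        t = d / radius
        weight = (1 - t * t) ** 2          # 1 at the center, 0 at the edge
        result.append(tuple(vi + strength * weight * ni
                            for vi, ni in zip(v, n)))
    return result
```

The falloff keeps the edit smooth at the brush boundary, so a stroke blends into the surrounding surface instead of leaving a sharp step.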
You can try it here. However, loading a model will be time-consuming.
Imagilar is a real-time image similarity search system for mobile platforms. With the rapid development of smart mobile devices and wireless communications, users are gradually moving from traditional personal computers to smartphones for consuming content. Hence, I implemented a new content-based image similarity search system that runs on mobile platforms in real time.
This project was completed during my internship at the University of Queensland. I also published a short paper based on the project.
This is a project I submitted to the Innovation Cup competition at the Software Institute, Nanjing University. My team created an augmented-reality 3D game for iOS in which the whole virtual game scene is built and placed into the real world.
The game is made in Unity3D and integrated with the Vuforia Unity Extension. The logic of the game is quite simple: the only task players need to accomplish is to shoot the moles to clear them.
However, the difficulty is that players have to keep the other monsters on the plane. The secret to success is therefore finding a good position to shoot from, which is also what makes AR games particularly fun.
IEEE VR 2018, the 25th IEEE Conference on Virtual Reality and 3D User Interfaces, was held from March 18th through March 22nd, 2018 at the Stadthalle Reutlingen in Reutlingen, Germany. I presented a TVCG journal paper and an IEEE VR Conference paper with my supervisor Feng Xu and co-author Christian Richardt.
Vision and Graphics Track: