We took on the challenge of reading one paper a day for two weeks, and here is the list of the papers we read. To be exact, it came to about one paper per day on average, and we didn't have to read the same papers: a few were shared between people, but in general everyone was free to choose.

After the two weeks, we held short meetings to share a brief summary of each paper.

Here is the list of all the papers discussed:

  1. DROID-SLAM

  2. CamVox: A low-cost SLAM system based on camera and Livox lidar (ICRA 2021; GitHub: ISEE-Technology/CamVox)

  3. A Review of Visual-LiDAR Fusion based Simultaneous Localization and Mapping

  4. Masked Autoencoders Are Scalable Vision Learners

  5. Blind Geometric Distortion Correction on Images Through Deep Learning

  6. Self-Supervised Monocular Depth Estimation with Internal Feature Fusion

  7. Recurrent Multi-view Alignment Network for Unsupervised Surface Registration

  8. Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

  9. NeRF

  10. In-Place Scene Labelling and Understanding with Implicit Scene Representation

  11. NeRF in the Wild

  12. Recurrent Multi-frame Single Shot Detector for Video Object Detection

  13. LoFTR: Detector-Free Local Feature Matching with Transformers

  14. Skip-Convolutions for Efficient Video Processing

  15. One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing

  16. Probabilistic Future Prediction for Video Scene Understanding

  17. Urban Driving with Conditional Imitation Learning

  18. FIERY

  19. Spatial Transformer Networks

  20. ADOP

  21. Variational End-to-End Navigation and Localization

  22. Neural Point-Based Graphics

  23. ORB-SLAM2

  24. ORB-SLAM: a Versatile and Accurate Monocular SLAM System

  25. PlenOctrees

  26. Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D

  27. MonoLayout: Amodal scene layout from a single image

  28. Orthographic Feature Transform for Monocular 3D Object Detection

  29. The Transformer Model in Equations

  30. Every Model Learned by Gradient Descent Is Approximately a Kernel Machine

  31. MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps

  32. Why Having 10,000 Parameters in Your Camera Model is Better Than Twelve

  33. ViViT: A Video Vision Transformer

  34. Semantic-assisted 3D Normal Distributions Transform for scan registration in environments with limited structure

  35. Discrete Kalman Filter Tutorial

  36. Relational inductive biases, deep learning, and graph networks (in progress)

  37. Inductive Biases for Deep Learning of Higher-Level Cognition

  38. On the Measure of Intelligence

  39. Unsupervised Learning of Visual 3D Keypoints for Control

  40. Transfer learning based few-shot classification using optimal transport mapping from preprocessed latent space of backbone neural network

  41. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

  42. Prototypical Networks for Few-shot Learning