Motion tracking with commercial RGB-D cameras typically requires a clear, unoccluded view of the user's posture. This severely limits the usability of such devices in serious applications that involve manipulating external objects. In this paper, we propose an integrated framework that tracks both body motion and objects during human-object interactions. We implement a data-driven posture reconstruction algorithm that corrects wrongly tracked body parts during occlusions, as well as a computer-vision-based object tracking algorithm that operates on the depth image. We present preliminary results in which the system tracks a user playing with a basketball.