The user end of a medical imaging system is a display, on which the data captured by imaging techniques are presented to physicians for the diagnosis and treatment of disease. Many types of displays have been developed for various clinical applications, ranging from simple monitors to immersive head-mounted displays. Still, little research has systematically studied the perceptual and cognitive factors involved in using these display technologies. In this talk, I will present my research on how imaging enables users to form a spatial representation and use it to direct actions. I will begin by describing the perceptual and cognitive issues involved in image-mediated perception and action. Next, I will describe experiments that used ultrasound imaging to examine the localization of a target in 3D space and the formation of an object representation from a series of cross-sectional images. The experiments compared two displays: the Sonic Flashlight and a conventional display. We use the comparison to highlight the principle that an effective display of imaging data should capitalize as fully as possible on the powerful abilities of human vision.