

An effective location recommendation function in smart building management systems can optimize space utilization and enhance the user service experience. Despite recent advances in applying Large Language Models (LLMs) to NLP-based recommender systems, smart building systems often lack communication and coordination with other devices, resulting in limited interactivity and serviceability. To address these challenges, this paper proposes a multi-modal recommendation system for utilizing and sharing open spaces in smart buildings. The system includes a “vision-based recommendation module” that uses vision-language models and real-time surveillance images to identify candidate locations matching user-requested keywords, and a “knowledge-based recommendation module” that uses knowledge graph technology to match user requirements against historical feedback data, improving semantic matching and the overall user experience. The outputs of the two modules are combined through decision fusion to produce the final location recommendations. Simulation results demonstrate that the proposed system effectively understands user intentions and provides satisfactory location recommendations, with the multi-modal approach outperforming either recommendation method alone.
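For illustration only, the sketch below shows one plausible form of the decision-fusion step: a weighted late fusion of per-location confidence scores from the two modules. The weighted-sum rule, the weights, and the location names are assumptions made for this example, not the paper's actual fusion method.

```python
# Minimal sketch of late decision fusion, assuming each module returns
# per-location confidence scores in [0, 1]. The fusion rule, weights, and
# location names are hypothetical, not taken from the paper.

def fuse_recommendations(vision_scores, knowledge_scores,
                         w_vision=0.5, w_knowledge=0.5, top_k=3):
    """Combine two modules' scores with a weighted sum and rank locations."""
    fused = {}
    # A location missing from one module contributes a score of 0.0 there.
    for loc in set(vision_scores) | set(knowledge_scores):
        fused[loc] = (w_vision * vision_scores.get(loc, 0.0)
                      + w_knowledge * knowledge_scores.get(loc, 0.0))
    # Sort by fused score, highest first, and keep the top-k locations.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Example scores from the vision-based and knowledge-based modules.
vision = {"lobby": 0.8, "meeting_room_2": 0.6, "lounge": 0.3}
knowledge = {"meeting_room_2": 0.9, "lounge": 0.7, "rooftop": 0.4}
print(fuse_recommendations(vision, knowledge))
# [('meeting_room_2', 0.75), ('lounge', 0.5), ('lobby', 0.4)]
```

A weighted sum is only one of several common fusion rules; rank-based or learned combiners would fit the same interface.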