From AI foundations to Large Language Models: A survey on challenges and opportunities in collaborative robotics

Franci Rrapi, Beatrice Portelli, Giuseppe Serra, Lorenzo Scalera
2026-01-01

Abstract

In recent years, advances in artificial intelligence have significantly enhanced the capabilities of robots in collaborative settings, improving perception, reasoning, planning, and interaction with human partners. This survey provides a comprehensive analysis of the different approaches used in Human–Robot Collaboration (HRC) for manufacturing, assembly, and industrial production. Particular attention is given to the emerging role of Large Language Models (LLMs) and Vision–Language Models (VLMs), which have recently been introduced to support high-level reasoning, task planning, and natural language interaction in collaborative robotics. Furthermore, we explore key aspects of HRC, including perception techniques that allow robots to interpret the environment, reasoning and task planning strategies that define the sequence of actions needed to reach task objectives, and interaction modalities that enable natural Human–Robot Interaction (HRI). We also examine the hardware components used in these systems, such as sensors, cameras, and robotic platforms, highlighting their role in improving perception and task execution. Through a quantitative analysis of the literature, we show that while LLM- and VLM-based approaches represent a growing research direction, the majority of existing HRC systems still rely on more classical methods that are not based on Foundation Models. Finally, we discuss current technological limitations, compare the reported performance of LLM/VLM-based and traditional approaches using industrial case studies, and outline future research directions for advancing AI-driven HRC in industrial environments.
Files in this item:

rrapi2026_RCIM.pdf

Description: final paper
Type: Editorial Version (PDF)
License: Creative Commons
Size: 3.58 MB
Format: Adobe PDF
Access: open access

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1325205