Automated Membership Inference via Prompt-Based Attacks in Generative Models

Ritacco, Ettore (Conceptualization)
2026-01-01

Abstract

The growing adoption of prompt-based generative models has raised concerns over the unauthorized use of proprietary data, as such models may memorize and replicate training content. To address this issue, we introduce ProCAP, a novel Membership Inference Attack approach based on a prompt-driven auditing framework. Given a proprietary dataset and a target generative model, ProCAP trains an auxiliary model to craft prompts that trigger the target model to produce outputs revealing potential violations of the proprietary data. Unlike existing approaches in the literature, ProCAP is automatic, fully black-box, model-agnostic, and designed to operate in settings with limited or no knowledge of the training process. To reduce the computational cost of training the prompt generator, we adopt an optimization strategy that filters out high-loss samples, i.e., those less likely to have been memorized. Our approach can then “specialize” the learning phase on the most informative data regions. We validate ProCAP across different scenarios, using both real and synthetic data. Results demonstrate its effectiveness in recognizing unauthorized data usage with strong accuracy-efficiency trade-offs.
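
The loss-based filtering mentioned in the abstract can be illustrated with a minimal sketch. This is not the ProCAP implementation: the function name `keep_low_loss`, the `keep_fraction` parameter, and the assumption that a per-sample loss has already been computed by some means are all hypothetical and chosen only for illustration. The sketch drops the highest-loss samples and keeps the rest in their original order.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def keep_low_loss(
    samples: Sequence[T],
    losses: Sequence[float],
    keep_fraction: float = 0.5,  # hypothetical parameter, not taken from the paper
) -> List[T]:
    """Keep the lowest-loss share of the samples, preserving original order."""
    if len(samples) != len(losses):
        raise ValueError("samples and losses must have the same length")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not samples:
        return []

    # Number of samples to retain, rounded to the nearest integer, at least one.
    n_keep = max(1, round(len(samples) * keep_fraction))

    # Rank indices by ascending loss; sorted() is stable, so ties keep input order.
    ranked = sorted(range(len(samples)), key=lambda i: losses[i])

    # Restore the original ordering of the retained indices.
    kept = sorted(ranked[:n_keep])
    return [samples[i] for i in kept]


if __name__ == "__main__":
    data = ["a", "b", "c", "d"]
    per_sample_loss = [0.9, 0.1, 0.5, 0.2]  # assumed to be precomputed
    print(keep_low_loss(data, per_sample_loss, keep_fraction=0.5))  # ['b', 'd']
```

A fixed fraction is only one possible criterion; an absolute loss threshold would serve the same purpose, and the abstract does not say which one is used.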

Files in this item:
File: s10994-026-07010-4.pdf
Access: open access
License: Creative Commons
Size: 2.99 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1331370
Warning: the data displayed have not been validated by the university.
