A Parallel Between Words and Graphics: The Process of Urban Representation Through Verbal Descriptions, from Historical Painters to the Automatically Generated Images by Artificial Intelligence

This paper explores the relationship between verbal descriptions and visual representations of urban environments, comparing historical artistic practices with contemporary AI-based image generation. The study traces how medieval and Renaissance painters constructed cityscapes from symbolic, oral, or textual descriptions, often translating partial knowledge into coherent visual narratives. Through the analysis of selected artworks, such as Sassetta’s City on the Sea, Spinello Aretino’s frescoes, and Pisanello’s St. George and the Princess, the authors identify recurring compositional structures and symbolic elements in premodern urban imagery (e.g., walled cities, gates, towers, and spatial hierarchies).

The research then develops an experimental framework comparing human and artificial processes: architecture students and experts were asked to draw a city based solely on a textual description, while the same description was submitted as a prompt to AI image generators such as Midjourney and DALL·E. The results show that both human and AI outputs depend heavily on prior visual knowledge, stylistic conventions, and selective interpretation of the textual input. However, AI systems tend to favor mainstream visual patterns, ignore parts of the prompt, and generate images driven by probabilistic associations rather than structured architectural reasoning. The study concludes that AI image generation is not a replacement for human creativity but a tool that exposes the mechanisms of visual translation, offering new insights into both historical representation practices and contemporary computational creativity.