The Language of Photography in the Age of AI

Text-to-image algorithms based on Deep Learning are central to content creation in multiple application domains. In the last few years, the capacity of Neural Networks to quickly generate increasingly realistic images has blurred the boundary between authentic and realistic content, making genuine and false data less and less distinguishable. This condition prompts a deeper reflection on the use of photographic images as a tool for communication and storytelling, and raises two simple questions. Can today’s Neural Networks generate content comparable to, and indistinguishable from, a photograph in both formal and compositional terms? Can artificial intelligence algorithms replace the photographer’s ability to design and obtain images that preserve the story and the place’s intangible culture? Starting from a set of photographic rules framed in specific workflows, the research analyses some results obtained using text-to-image algorithms within the Midjourney program. The experiment aims to determine the pros and cons of using text-to-image algorithms to automatically generate photographic images, highlighting the potential and current limitations in constructing content subject to specific formal rules.

GPT for Treatise Image Creation: A Critical Overview

Text-to-image systems based on generative pre-trained transformers have become pervasive in recent years. This spread is mainly due to the generation tools’ ease of use, wide availability, and increasingly refined graphical and algorithmic capabilities. On the other hand, the limited controllability inherent in a random process and the dependency on existing information databases highlight some limitations of these automatic image-generation methods. The rapid construction of increasingly realistic digital images draws new boundaries between real and unreal, highlighting a close relation between text and image in terms of semantic and logical description. Therefore, a critical look into the use of these applications as tools for a reliable representation of architecture becomes necessary. We started from representations established by the treatises, as in the case of architectural orders. Most of these drawings, proportions, and rules are derived from descriptive parts in Vitruvius’ text. However, the graphic interpretations result from the architect’s experience and culture, which has become the basic grammar of architecture. This research stems precisely from the connection between Vitruvius’ text, the new text-to-image contents, and the established representations of the treatise writers. The comparison considers both image reliability and geometric rules, testing the current potential of GPT systems for image creation and reliability.

The Recognizability of a Place Through Generative Representation of Intangible Qualities

The paper explores the use of AI-based generative image models to represent the intangible qualities of architectural spaces, such as atmosphere, perception, and emotional experience. Through a comparative experiment involving multiple platforms (Midjourney, Stable Diffusion with ControlNet, Leonardo.Ai, and Veras), the study investigates how text-to-image and image-to-image processes can translate descriptive prompts and visual inputs into evocative representations. The methodology combines architectural survey outputs (point clouds, photographs, sketches, and watercolors) with textual prompts to guide image generation. Results highlight the varying capabilities of different models in balancing formal coherence and expressive interpretation, demonstrating the potential of AI as a complementary tool for communicating non-measurable aspects of space, while also identifying current limitations in geometric accuracy and semantic control.

Between Image and Text: Automatic Image Processing for Character Recognition in Historical Inscriptions

The research addresses the challenges that Optical Character Recognition (OCR) systems face when applied to ancient inscriptions and graffiti. These artifacts, serving celebratory or commemorative purposes, often present legibility issues due to erosion and gaps in the text. Our study proposes an automated image processing pipeline supported by 3D data from photogrammetric surveys. The processing phase involves manipulating image parameters and utilizing spatial coordinates and writing system information. The goal is to enhance legibility by extracting images with neutral backgrounds and highlighted characters, resembling printed texts. This processed data aims to improve the performance of pre-trained Artificial Intelligence (AI) models dedicated to OCR. Ultimately, the research seeks to provide a comparative study between unprocessed and processed images, validating the significance of the pre-processing phase in enhancing text recognition systems. The proposed automated workflow aims to contribute to the field of computer vision, specifically in the context of preserving and interpreting historical inscriptions.