My research examines the design of semiotic infrastructures – or infrastructures that encode data’s syntax, semantics, and relationships to other data. I do so by ethnographically analyzing the language ideologies that knowledge representation experts (or semiotic technologists) bring to their work. Through interviews, archival research, participant observation, and experimental design projects, I examine how language ideologies shape data practices and the building of data infrastructure. I also explore how language ideologies shift as experts encounter the limits of formalist (“neat”) approaches to knowledge representation, in turn coming to experiment with alternative, “scruffier” design modalities.

I consider myself an anthropologist of data infrastructure and practice. My work is informed by scholarship in the history of computing and of technology, software studies, digital humanities, Science and Technology Studies (STS, in which I am completing a PhD), and the technical field of information technology and web science (in which I have a BS). My research has resulted in scholarly publications, archival contributions (of oral histories with key actors in the development of the Web, for example), and a portfolio of digital humanities projects. I have also represented the “empirical humanities” (including the qualitative social sciences) in international efforts to support research data sharing.

Ethnography of Digital Semiotic Expertise. In my current book project, I examine semiotic infrastructure design in three diverse communities: the knowledge representation community within the field of artificial intelligence, the Semantic Web community, and the information and referral community within the field of human services. During my research, I analyzed archives of public email forums where the design of semiotic infrastructures in these communities was debated. I also interviewed semiotic technologists about how they conceptualized their expertise and how their theoretical and technical commitments evolved over time. I found that semiotic technologists in these communities often expected, early on, to build infrastructure that would digitally encode data’s syntax and semantics with “neat,” logic-based formalisms; however, I argue that, in all cases, actual work with “real world” data showed the limits of these approaches and the need to tolerate “scruffy” language dynamics – dynamics such as polysemy (when words have more than one meaning) and double-bind (the way conflicting messages at different conceptual scales often produce dissonance and paradox). My research thus examines how the semiotic technologists building these data infrastructures have come to recognize the limits of formalist approaches. I detail how they’ve retooled their language ideologies and developed tricks and hacks to adequately represent the structure and import of data in their domains.

By examining information infrastructure and knowledge representation through the lens of semiotics, the research demonstrates how infrastructures become interlaced with diverse language ideologies, and how such ideologies impact how knowledge is organized in information systems. The research also advances conceptualization of how expertise operates and shapes socio-cultural order – demonstrating how and when experts are provoked to deconstruct the binaries that structure their work and thinking. In demonstrating how data infrastructure designers learn to bring different assumptions, logics, and politics to their work, this research furthers understanding of social practice and politics in big data contexts.

This research has resulted in a book chapter for the Sage Handbook of Web History (in press), which documents how ideological divisions in the history of knowledge representation have impacted the design of Semantic Web technologies. I also have a paper published in the WebSci2017 conference proceedings, which calls for Web scientists to study Web architecture historically and ethnographically. A co-authored paper in Big Data & Society examines the skills and sensibilities that characterize critical data design. In addition to these publications, I have presented my findings at the annual meetings of the American Anthropological Association and the Society for Social Studies of Science, at the Data Power Conference, and at several other conferences. I have a paper in progress for Science, Technology, & Human Values that describes how semiotic technologists learn to operate in experimental ways as “real world” dynamics make it impossible to represent knowledge neatly. This research will also result in a book, tentatively titled Knowledge Representation in Scruffy Worlds: An Ethnography of Semiotic Infrastructure Design and Expertise. More broadly, the research helps data infrastructure designers develop strategies for pluralizing the logics, assumptions, and politics they bring to their work.

Design and Theorization of Data Infrastructure for Experimental and Collaborative Ethnography. Since 2013, I have worked on a collaborative research project aiming to theorize, design, and build a digital platform to support experimental and collaborative ethnography. The Platform for Experimental Collaborative Ethnography (PECE) is an open source digital system that supports the archiving of ethnographic materials, collaborative hermeneutic analysis of this data, and new experimental forms of publication. In my role as an information architect for PECE, I have translated a series of “design logics” – or design directives informed by critical theoretical commitments and post-structural assumptions about language – into digital terms. However, many of the digital infrastructures that I have relied on in designing PECE’s information architecture (such as content management systems, metadata and provenance standards, and methods for linking data) have themselves been designed according to logics that oppose such commitments. For instance, such technologies often seek to stabilize ontologies and research workflows, while experimental ethnographers often seek to unsettle ontologies and workflows. Thus, in designing the information architecture of the platform, I have had to develop strategies for leveraging digital infrastructures in ways that undercut their underlying language ideologies; I have referred to this design practice as “devious design” and have published on it in the journal Design Issues.

The research that I’ve conducted with PECE demonstrates how digital humanists can design against information architectures interlaced with language ideologies that are inconsistent with the assumptions they bring to their research. It also contributes to the theorization of experimental ethnographic methodologies, demonstrating how experimental systems can be designed to challenge hegemonic modes of “knowledge representation.” PECE now supports several international ethnographic research projects, including The Asthma Files, the Disaster-STS Network, and a new initiative to archive historical documents for the Society for Social Studies of Science (4S). In coming years, I will continue as PECE’s lead information architect and maintain my collaboration with the PECE Research Group (recently moved to the University of California, Irvine, Department of Anthropology).

Finally, through both PECE and a fellowship with the Research Data Alliance (RDA), I have examined, characterized, and prioritized the infrastructural developments needed in the empirical humanities, such as institutional repositories for storing data and services, protocols for credentialing infrastructural and collaborative work, and experimental semiotic infrastructure. I have presented on this research to digital humanists, to directors of organizations that support humanities data sharing (such as DARIAH), and to members of RDA’s technical advisory board. I plan to continue pushing this agenda forward. I am currently working with collaborators to propose a research infrastructure initiative that will offer digital humanists technical services and consulting, organizational networking, and critical data education.

Future Projects. In my next research project, I plan to expand the ethnographic research done for my dissertation within the human services informatics community, further engaging with people (“semiotic technologists”) involved in the design, use, and critique of federal and state-level information systems that manage client data and determine eligibility for social services. Information systems such as the Homeless Management Information System and various state-level information systems for Medicaid, WIC benefits, and child welfare track and report client data in order to audit the distribution of benefits and provide data on the effectiveness of programs. On the one hand, these systems produce the numbers needed to sustain government programs designed to help communities in need; on the other, they reduce diverse and specific needs to a set of algorithmically calculated numbers. There is a need for critical analysis of the language ideologies and politics designed into such systems and of how they impact understanding of social needs and the effectiveness of government programs. This research will bring critical insight on data infrastructure into efforts to better understand and design social services, in the process addressing basic social theoretical questions about bureaucracy, inequality, and poverty.