Discovery of usage patterns in digital library web logs using Markov modeling

Abstract : This paper proposes a family of tools based on Markov modeling to quantitatively analyze how people access the digital collections of the Bibliothèque nationale de France (BnF, the national library of France) through its web platform, Gallica. The aim is to give the BnF relevant information about usage patterns so that it can better understand its users and improve both its mediation efforts and the design of the website, with the goal of increasing use of the 4M-document collection by the general public. To that end, the study focuses on the access logs retrieved from Gallica's Apache HTTP servers, which are converted into sequences of actions. To study user navigation behavior, we model the access log data with Markov models: Markov chains when considering sequences of actions without duration, and Markov processes when durations are taken into account. The models are used either to capture an average behavior through meaningful statistics or to cluster the data and exhibit various interpretable types of usage. The numerical results bring new insights into the way users interact with the platform, highlighting the mean duration of some actions such as interaction with the search engine or consultation of documents. Although our approach requires additional information to properly interpret the models and the correlations they highlight, it is able to discover all types of behaviors, including the stealthiest ones and those most difficult to capture in traditional surveys, giving them their fair weight in terms of audience. We also show how this approach fits into a broader body of work combining data mining and ethnography.
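To make the Markov chain idea in the abstract concrete, the following is a minimal sketch of how transition probabilities could be estimated from sequences of actions by simple counting (the maximum-likelihood estimate for a first-order chain). It is not the authors' implementation: the function name `estimate_transitions`, the action labels and the sample sessions are all hypothetical, and the sketch ignores durations and clustering.

```python
from collections import Counter, defaultdict


def estimate_transitions(sessions):
    """Estimate first-order Markov chain transition probabilities.

    `sessions` is a list of action sequences (one list of labels per session).
    Returns a nested dict: P[current_action][next_action] = probability.
    """
    counts = defaultdict(Counter)
    for actions in sessions:
        # Count every observed (current action, next action) pair.
        for current, nxt in zip(actions, actions[1:]):
            counts[current][nxt] += 1
    # Normalize each row of counts so that it sums to 1.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical sessions; the action names are invented for illustration only.
sessions = [
    ["home", "search", "document", "document"],
    ["search", "document", "search", "document"],
    ["home", "search", "search"],
]

P = estimate_transitions(sessions)
print(P["search"])
```

In this toy data, three of the four transitions leaving `search` lead to `document`, so the estimated probability of that transition is 0.75. Real access logs would first need to be parsed and grouped into sessions, a step this sketch leaves out.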

Cited literature [28 references]

https://hal.archives-ouvertes.fr/hal-02182244
Contributor : Christophe Prieur
Submitted on : Friday, July 12, 2019 - 4:07:55 PM
Last modification on : Friday, October 18, 2019 - 1:32:36 AM

File

nouvellet-etal-2018.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02182244, version 1

Citation

Adrien Nouvellet, Valérie Beaudouin, Florence d'Alché-Buc, Christophe Prieur, François Roueff. Discovery of usage patterns in digital library web logs using Markov modeling. 2019. ⟨hal-02182244⟩

Metrics

Record views : 128
File downloads : 42