Institut National des Sciences Appliquées de Lyon

Postdoc Researcher • 2023 — present

Topic: Optimal solution of partially observable multi-agent systems
Advisor: Prof. Jilles Steeve Dibangoye

University of East Anglia - Norwich Business School

Senior Research Associate • 2022 — 2023

Topic: Recommender systems and suppliers competition in digital markets
Advisors: Prof. Peter Ormosi, Prof. Amelia Fletcher, Prof. Rahul Savani

University of Liverpool - Department of Computer Science

Doctor of Philosophy - Ph.D. • 2017 — 2022

Topic: Deep learning for multi-agent reinforcement learning and decision making
Supervisors: Dr. Frans Oliehoek, Prof. Rahul Savani
Passed with minor revisions

University of Perugia - Department of Mathematics and Computer Science

Master's Degree, Computer Science • 2014 — 2017

Thesis: Learning numeracy - binary arithmetic with Neural Turing Machines
Supervisors: Dr. Valentina Poggioni, Dr. Marco Baioletti
Final mark: 110/110 with honors

University of Perugia - Department of Mathematics and Computer Science

Bachelor's Degree, Computer Science • 2011 — 2014

Thesis: Krylov iterative methods for the geometric mean of two matrices times a vector
Supervisor: Dr. Bruno Iannazzo
Final mark: 110/110 with honors

On Convex Optimal Value Functions For POSGs

Co-author with Rafael F. Cunha, Johan Peralez and Jilles S. Dibangoye

• arXiv, 15 November 2023 [pdf] [.bib]

Multi-agent planning and reinforcement learning can be challenging when agents cannot see the state of the world or communicate with each other due to communication costs, latency, or noise. Partially Observable Stochastic Games (POSGs) provide a mathematical framework for modelling such scenarios. This paper aims to improve the efficiency of planning and reinforcement learning algorithms for POSGs by identifying the underlying structure of optimal state-value functions. The approach involves reformulating the original game from the perspective of a trusted third party who plans on behalf of the agents simultaneously. From this viewpoint, the original POSGs can be viewed as Markov games where states are occupancy states, i.e., posterior probability distributions over the hidden states of the world and the stream of actions and observations that agents have experienced so far. This study mainly proves that the optimal state-value function is a convex function of occupancy states expressed on an appropriate basis in all zero-sum, common-payoff, and Stackelberg POSGs.

Recommender Systems and Competition on Subscription-Based Platforms

Co-author with Amelia Fletcher, Peter L. Ormosi and Rahul Savani

• SSRN Working Paper No. 4428125, 47 pages, 27 April 2023 [pdf] [.bib]

Subscription-based platforms offer consumers access to a large selection of content at a fixed subscription fee. Recommender systems can help consumers by reducing the size of this choice set by predicting consumers' preferences. However, their prediction is based on limited information on the consumers and sometimes even on the content, which means that the recommendations are often biased. In this paper we introduce a simple theoretical framework for platforms selling to consumers with a quasi-linear utility function via a recommender system. We simulate a set of different recommender systems and use them in this framework to test our hypothesis that RS biases lead to more concentrated markets, increased entry barriers, and increased homogeneity in the recommendations even where the platform is inherently customer-centric and not self-preferencing. Although encouraging more exploration can reduce these market-consolidating effects, it can reduce recommendation relevance in the short run.

Biased Recommender Systems And Supplier Competition

Co-author with Amelia Fletcher, Peter L. Ormosi and Rahul Savani

• SSRN Working Paper No. 4319311, 35 pages, 06 January 2023 [pdf] [.bib]

Recommender systems are prevalent across digital platforms. They use machine learning techniques to help consumers make choices by predicting their preferred items. If RS had perfect information about consumer preferences and item attributes, they could recommend the most suitable item for each consumer. However, in practice, recommender systems have incomplete information, and their prediction models can exhibit systemic biases. Our stylised model shows such biases can dampen competition between the suppliers selling through a digital platform, arising from the fact that biased recommendations are less closely linked to true preferences. Three specific types of bias are examined and are shown to have subtly different effects. Competition remains stronger where suppliers can compete to gain the benefit of the bias, a form of competition for the market. The worst market outcomes can be avoided if consumers can reject unsuitable recommendations, since this helps to restore the competitive constraint on suppliers. However, a model extension shows that these results no longer necessarily hold with endogenous vertical quality. Importantly, in choosing its recommender system, the platform’s preferences are not typically aligned with those of consumers.

Difference Rewards Policy Gradients

Main author with Sam Devlin, Frans A. Oliehoek and Rahul Savani

• Neural Computing and Applications (S.I. on Adaptive and Learning Agents 2021), 24 pages, Springer Nature, 11 November 2022 [pdf] [.bib]
• Extended Abstract in Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems AAMAS'21, 1475-1477, IFAAMAS, 2021 [pdf] [.bib]
• Best Paper Award at ALA'21, 03-04 May 2021 [pdf] [video] [slides]

Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by counterfactual multi-agent policy gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns an additional reward network that is used to estimate the difference rewards.
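As a rough illustration of the difference-rewards idea mentioned in the abstract above, here is a minimal sketch for the case where the reward function is known. It assumes a baseline that averages the reward over the agent's own alternative actions under its policy; the names `difference_reward`, `toy_reward` and `uniform` are hypothetical and chosen only for this example, and the code is not the paper's implementation.

```python
def difference_reward(reward_fn, policy_i, agent, state, joint_action):
    """Difference reward for one agent with a marginalised baseline.

    reward_fn(state, joint_action) -> float is assumed to be known.
    policy_i maps each action of `agent` to its probability in `state`.
    """
    actual = reward_fn(state, joint_action)
    baseline = 0.0
    for alt_action, prob in policy_i.items():
        # Swap in an alternative action for this agent only,
        # keeping the other agents' actions fixed.
        counterfactual = list(joint_action)
        counterfactual[agent] = alt_action
        baseline += prob * reward_fn(state, tuple(counterfactual))
    return actual - baseline


# Hypothetical toy game: two agents, reward 1 only if both pick action 1.
def toy_reward(state, joint_action):
    return 1.0 if joint_action == (1, 1) else 0.0


uniform = {0: 0.5, 1: 0.5}

# Agent 0 contributed to the success, so its difference reward is positive.
d_good = difference_reward(toy_reward, uniform, 0, None, (1, 1))
# Agent 0 caused the failure, so its difference reward is negative.
d_bad = difference_reward(toy_reward, uniform, 0, None, (0, 1))
# Agent 0 could not have changed the outcome, so the signal is zero.
d_neutral = difference_reward(toy_reward, uniform, 0, None, (1, 0))
```

In a policy-gradient setting, a signal of this kind would replace the shared team reward when weighting each agent's log-probability gradient, so that every agent is credited only for its own contribution.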

Analysing factorizations of action-value networks for cooperative multi-agent reinforcement learning

Main author with Frans A. Oliehoek, Rahul Savani and Shimon Whiteson

• Autonomous Agents and Multi-Agent Systems 35(25), 53 pages, Springer Nature, 07 June 2021 [pdf] [.bib]
• Extended Abstract in Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems AAMAS'19, 1862-1864, IFAAMAS, 2019 [pdf] [.bib]

Recent years have seen the application of deep reinforcement learning techniques to cooperative multi-agent systems, with great empirical success. However, given the lack of theoretical insight, it remains unclear what the employed neural networks are learning, or how we should enhance their learning power to address the problems on which they fail. In this work, we empirically investigate the learning power of various network architectures on a series of one-shot games. Despite their simplicity, these games capture many of the crucial problems that arise in the multi-agent setting, such as an exponential number of joint actions or the lack of an explicit coordination mechanism. Our results extend those in Castellini et al. (Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS’19. International Foundation for Autonomous Agents and Multiagent Systems, pp 1862–1864, 2019) and quantify how well various approaches can represent the requisite value functions, and help us identify the reasons that can impede good performance, like sparsity of the values or too tight coordination requirements.

Learning Numeracy: Binary Arithmetic with Neural Turing Machines

Main author

• arXiv, 04 April 2019 [pdf] [.bib]

One of the main problems encountered so far with recurrent neural networks is that they struggle to retain long-term information dependencies in their recurrent connections. Neural Turing Machines (NTMs) attempt to mitigate this issue by providing the neural network with an external portion of memory, in which information can be stored and manipulated later on. The whole mechanism is differentiable end-to-end, allowing the network to learn how to utilise this long-term memory via SGD. This allows NTMs to infer simple algorithms directly from data sequences. Nonetheless, the model can be hard to train due to a large number of parameters and interacting components, and little related work is present. In this work we use an NTM to learn and generalise two arithmetical tasks: binary addition and multiplication. These tasks are two fundamental algorithmic examples in computer science, and are a lot more challenging than the previously explored ones, with which we aim to shed some light on the capabilities of this neural model.

Fake Twitter followers detection by denoising autoencoder

Main co-author with Valentina Poggioni and Giulia Sorbi

• Proceedings of the International Conference on Web Intelligence WI’17, 195-202, ACM, 2017 [pdf] [.bib]

Gaining followers on the Twitter platform has become a rapid way to increase one’s credibility on this social network, which in the last few years has become a launch pad for new trends and a means to influence people's opinions. As a result, many people have begun to buy fake followers on underground markets created specifically to sell them. Identifying fake follower profiles is therefore useful to maintain the balance between real influential people on the network and people who simply exploited this mechanism. This work presents a model based on artificial neural networks able to detect fake Twitter profiles. In particular, a denoising autoencoder has been implemented as an anomaly detector trained with a semi-supervised learning approach. The model has been tested on a benchmark already used in the literature, and results are presented.

Krylov iterative methods for the geometric mean of two matrices times a vector

Main author

• Numerical Algorithms 74(2), 561-571, Springer US, 26 January 2017 [buy] [pdf] [.bib]

In this work, we present an efficient way to compute the geometric mean of two positive definite matrices times a vector. For this purpose, we inspect the application of methods based on Krylov spaces to compute the square root of a matrix. These methods, using only matrix-vector products, are capable of producing a good approximation of the result with a small computational cost.
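For reference, the quantity in the abstract above is usually written as follows. This is the standard textbook definition rather than a formula copied from the paper; the symbols A, B, v, M and m are chosen here for illustration only.

```latex
% Geometric mean of two symmetric positive definite matrices A and B
A \,\#\, B \;=\; A^{1/2}\bigl(A^{-1/2} B A^{-1/2}\bigr)^{1/2} A^{1/2}
        \;=\; A\,\bigl(A^{-1} B\bigr)^{1/2}

% Target of the computation: the product with a vector v,
% obtained without forming A # B explicitly
(A \,\#\, B)\,v \;=\; A\,\bigl(A^{-1} B\bigr)^{1/2} v

% A Krylov method seeks an approximation in the m-dimensional subspace
% generated by a matrix M and the vector v
\mathcal{K}_m(M, v) \;=\; \operatorname{span}\{\, v,\; M v,\; \dots,\; M^{m-1} v \,\}
```

The second form shows why a matrix square root applied to a vector is the core operation: once that product is available, only one further multiplication by A is needed.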

Academic Year 2020/21

  • COMP532 Module Demonstrator (Machine Learning and BioInspired Optimization)

    • University of Liverpool - Department of Computer Science • January 2021 — May 2021 • Lecturer: Dr. Shan Luo
  • COMP211 Module Demonstrator (Computer Networks)

    • University of Liverpool - Department of Computer Science • October 2020 — December 2020 • Lecturer: Dr. Martin Gairing

Academic Year 2019/20

  • COMP532 Module Demonstrator (Machine Learning and BioInspired Optimization)

    • University of Liverpool - Department of Computer Science • January 2020 — May 2020 • Lecturer: Dr. Shan Luo
  • COMP202 Module Demonstrator (Complexity of Algorithms)

    • University of Liverpool - Department of Computer Science • January 2020 — May 2020 • Lecturer: Prof. Piotr Krysta
  • COMP211 Module Demonstrator (Computer Networks)

    • University of Liverpool - Department of Computer Science • October 2019 — December 2019 • Lecturer: Dr. Martin Gairing

Academic Year 2018/19

  • COMP532 Module Demonstrator (Machine Learning and BioInspired Optimization)

    • University of Liverpool - Department of Computer Science • January 2019 — May 2019 • Lecturer: Dr. Shan Luo
  • COMP202 Module Demonstrator (Complexity of Algorithms)

    • University of Liverpool - Department of Computer Science • January 2019 — May 2019 • Lecturer: Prof. Piotr Krysta
  • COMP305 Module Demonstrator (BioComputation)

    • University of Liverpool - Department of Computer Science • October 2018 — December 2018 • Lecturer: Prof. Irina V. Biktasheva
  • COMP219 Module Demonstrator (Advanced Artificial Intelligence)

    • University of Liverpool - Department of Computer Science • October 2018 — December 2018 • Lecturer: Dr. Xiaowei Huang

Academic Year 2017/18

  • COMP532 Module Demonstrator (Machine Learning and BioInspired Optimization)

    • University of Liverpool - Department of Computer Science • January 2018 — May 2018 • Lecturer: Dr. Shan Luo
  • COMP202 Module Demonstrator (Complexity of Algorithms)

    • University of Liverpool - Department of Computer Science • January 2018 — May 2018 • Lecturer: Prof. Piotr Krysta

Mail

jacopo [dot] castellini [at] insa-lyon [dot] fr
jacopo [dot] castellini [at] inria [dot] fr

Office

I am currently a member of the CITI Laboratory

Floor 1, Télécommunications INSA Lyon, 6 Avenue des Arts, 69100 Villeurbanne, France

Phone

I do not have a French phone number yet, sorry.