PhD Thesis
PhD thesis as advisor and jury.
Current
2025
- Advisor
Analyzing binary programs and obfuscation with graph-based representations and machine learningRoxane Cohen2025Thèse de doctorat dirigée par Rossi, Fabrice Mathematiques appliquees Paris DauphineProtecting sensitive program content is a critical concern in both legitimate and malicious contexts, with obfuscation being one of the most widely used techniques. By transforming the binary code of compiled programs, obfuscation hinders and delays reverse engineering, forcing attackers to follow specific analytical steps to bypass such protections. This thesispresents several contributions, all grounded in advanced graph representation learning, covering different stages of the obfuscation analysis process. First, we address obfuscation detection and characterization as binary and multi-class classification problems, comparing graph formats, baseline and advanced deep learning models, and enriched feature sets, while assessing the impact of data distribution and compiler optimizations. Our results show that advanced Graph Neural Networks models outperform baselines when provided with features capturing fine-grained code semantics, though performance remains sensitive to optimization levels and dataset partitioning. Second, we conduct an extensive study of binary diffing and similarity, from the standard unobfuscated case, through an ablation study of QBinDiff and a comparison of existing tools, to obfuscated scenarios where attackers aim to extract knowledge from an obfuscated program, without full deobfuscation. Results highlight QBinDiff’s advantages for fine-grained diffing and reveal that most intra-procedural and data-flow obfuscations offer limited protection compared to the stronger, yet rarely used, inter-procedural obfuscations. Finally, we explore direct deobfuscation by integrating reinforcement learning and graph representation learning. We model the task as a white-box symbolic compression problem, representing obfuscated functions in a novelgraph format, combining data and control flow, and simplifying them iteratively through graph edit operations guided by a Graph Neural Networks-based agent. Preliminary results on Mixed Boolean Arithmetic obfuscation demonstrate partial but promising success, opening avenues for further research.
@phdtesis{cohenthesis, title = {Analyzing binary programs and obfuscation with graph-based representations and machine learning}, url = {https://theses.hal.science/tel-05449270}, author = {Cohen, Roxane}, year = {2025}, note = {Thèse de doctorat dirigée par Rossi, Fabrice Mathematiques appliquees Paris Dauphine}, dimensions = {true}, }
Past
2022
- Advisor
Towards 1-day Vulnerability Detection using Semantic Patch SignaturesAlexis Challande2022Thèse de doctorat dirigée par Renault, Guénaël Informatique Institut polytechnique de Paris 2022To maintain the security of information systems, deploying the proposed updates as soon as they are available is a good practice encouraged by all the computer security actors. Indeed, the exploitation of 1-day vulnerabilities (so called because a patch has been available for at least 1 day) can be devastating as EternalBlue or Shellshock have illustrated.The objective of this thesis is to propose methods and their practical application to detect if these patches are well applied at the lowest level, i.e. in the binary code. This is essential to have a reliable view of a system protection.To achieve this goal, we have established several milestones. The first one consists in an in-depth study of a typical patch, before formalizing a framework for searching for them at the scale of a complete system.We then propose the implementation of a software solution that automatically builds semantic signatures of vulnerability patches and searches for these signatures in filesystems.Finally, we test this solution in real conditions (i.e. detection of patches in images of the Android operating system) and show the relevance of our approach.
@phdtesis{challandethesis, title = {Towards 1-day Vulnerability Detection using Semantic Patch Signatures}, url = {http://www.theses.fr/2022IPPAX096}, author = {Challande, Alexis}, year = {2022}, note = {Thèse de doctorat dirigée par Renault, Guénaël Informatique Institut polytechnique de Paris 2022}, dimensions = {true}, }
2021
- Advisor
Binary Diffing as a Network Alignment ProblemElie Mengin2021Thèse de doctorat dirigée par Rossi, Fabrice Mathematiques appliquees Paris 1 2021In this thesis, we address the problem of binary diffing, i.e. the problem of finding the best possible one-to-one correspondence between the functions of two programs in binary form. This problem is a major challenge in several fields of computer security since it automatically designates to an analyst the pieces of code that might have been previously analyzed among other programs. We propose a quite natural formulation of the binary diffing problem as a particular instance of a graph edit problem over the call graphs of the programs. Through this formulation, the quality of the function mapping is evaluated simultaneously with respect to both the function content similarity and the function calls consistency. We prove that this versatile formulation is in fact equivalent to the well studied network alignment problem, which enables us to leverage common optimization techniques. Following previous works, we propose a solving strategy based on max-product belief propagation, and introduce QBinDiff, a network alignment solver that outperforms other state-of-the-art methods in almost all instances. We finally show that our approach outperforms existing diffing tools, and that the matching strategy has more influence on the quality the solution than the measure of function similarity.
@phdthesis{menginthesis, title = {Binary Diffing as a Network Alignment Problem}, author = {Mengin, Elie}, year = {2021}, note = {Thèse de doctorat dirigée par Rossi, Fabrice Mathematiques appliquees Paris 1 2021}, dimensions = {true}, }
2020
- Jury
L’usage de l’exécution symbolique pour la déobfuscation binaire en milieu industrielJonathan SalwanUniversité Grenoble Alpes , Feb 2020Thèse de doctorat dirigée par Potet, Marie-Laure et Bardin, Sébastien Mathématiques et informatique Université Grenoble Alpes 2020This doctoral work has been done in an industrial environment where the main activities were reverse engineering for vulnerability research and security properties verification on binary programs. The first part of this doctoral work focuses on the collection and sharing of the industrial problems when analyzing binary programs. Based on these issues, a binary dynamic analysis framework has been developed and formalized. Real examples of use are then presented, such as the detection of opaque predicates in branch conditions. Finally, a new automatic approach for deobfuscation of binary code protected by virtualization is presented combining features of the framework as well as those of other tools.
@phdthesis{salwanthesis, url = {http://www.theses.fr/2020GRALM005}, title = {L’usage de l’exécution symbolique pour la déobfuscation binaire en milieu industriel}, author = {Salwan, Jonathan}, year = {2020}, month = feb, school = {Université Grenoble Alpes}, note = {Thèse de doctorat dirigée par Potet, Marie-Laure et Bardin, Sébastien Mathématiques et informatique Université Grenoble Alpes 2020}, dimensions = {true}, }