Computational Analysis of Historical Documents – Thursday, August 24, 2023 – Morning

Deep Learning – Thursday, August 24, 2023 – Afternoon

Document Image Binarization – Friday, August 25, 2023 – Morning

Unlocking the Potential of Unstructured Data in Business Documents Through Document Intelligence – Friday, August 25, 2023 – Afternoon


Computational Analysis of Historical Documents

Thursday, August 24, 2023

09:00 – 10:30 – Tutorial Part I
10:30 – 11:00 – Break
11:00 – 12:00 – Tutorial Part II

Isabelle Marthot-Santaniello (University of Basel, Switzerland)
i.marthot-santaniello@unibas.ch

Hussein Adnan Mohammed (University of Hamburg, Germany)
hussein.adnan.mohammed@uni-hamburg.de

This tutorial will explore the specialised area of Historical Handwritten Manuscripts and Ancient Handwritten Artefacts within the broader discipline of Document Analysis and Recognition. Taught jointly by a manuscript specialist and a computer scientist, it will focus on the unique challenges, difficulties, and opportunities inherent in working with these types of documents. Research questions are often project-specific and require understanding the peculiarities of the data, carefully tuning existing solutions, and, in many cases, developing novel approaches. Poor image quality and degradation can present significant obstacles for image processing and analysis. Moreover, obtaining accurate ground-truth annotations for these documents can be difficult because of the expertise required and the subjectivity of interpretation. Furthermore, the use of different terminologies by scholars in the Humanities and computer scientists can lead to misunderstandings and miscommunication. Collaborating across disciplines can also present practical challenges, such as coordinating project budgets and working styles.

The field nevertheless offers a wealth of research opportunities for computer scientists, along with challenging problems that can inspire innovative and creative solutions. These solutions can benefit the broader field of Document Analysis and Recognition and can also contribute greatly to the Humanities, providing new insights, helping to answer important research questions, and raising the interest of the general audience. The unique challenges presented by historical handwritten manuscripts and ancient artefacts call for a cross-disciplinary approach and the application of cutting-edge technologies, leading to exciting new developments in the field.

By the end of this tutorial, participants will have gained a better understanding of the challenges and opportunities presented by Historical Handwritten Manuscripts and Ancient Handwritten Artefacts. They will have acquired practical knowledge about possible research directions and the competencies needed to collaborate effectively with scholars from other disciplines. Overall, the tutorial will give participants a solid foundation for conducting cutting-edge research in this exciting and rapidly evolving field.

Dr. Hussein Mohammed received his master's degree in informatics engineering from the University of the Algarve and continued his work on shape detection and recognition as a research associate at the computer vision lab of the same university in Portugal. In October 2015, he moved to the University of Hamburg to continue his research in the field of computational document analysis. In March 2019, he received his doctoral degree in computer science from the University of Hamburg for his work on the computational analysis of handwriting styles. Since July 2019, he has been a principal investigator at the Cluster of Excellence “Understanding Written Artefacts” at the University of Hamburg. His main research interests are pattern recognition, machine learning, and computer vision.

Over the past seven years, he has published several peer-reviewed articles on various research topics related to the field of historical document analysis, including writer identification, handwriting style analysis, and pattern detection in historical documents. Additionally, he has played an active role in organising local and international scientific events focused on the computational analysis of historical written artefacts. This dedication to the practical application of his work is further reflected in the development of several of his methods as software tools, which he has made available for free and which have been used by many scholars in the Humanities for their research.

Dr. Isabelle Marthot-Santaniello received her master's degree in Ancient History and Classics from the Ecole Pratique des Hautes Etudes in Paris, where she later specialised in Greek Papyrology for her PhD. After a post-doctoral position at the University of Minnesota on a crowd-sourcing project funded by the National Endowment for the Humanities, she moved to Basel, Switzerland, where she was involved in several projects funded by the Swiss National Science Foundation (SNSF).

Between September 2018 and May 2023, she was the Principal Investigator of the SNSF Ambizione project “Reuniting fragments, identifying scribes and characterizing scripts: the Digital paleography of Greek and Coptic papyri (d-scribes)” and will continue as Principal Investigator of the SNSF Starting Grant project “EGRAPSA: Retracing the evolutions of handwritings in Graeco-Roman Egypt thanks to digital palaeography” (June 2023–May 2028). In the scope of this research, and thanks to collaborations with numerous teams of scholars in France, Germany, Italy, Greece and Pakistan, she has published several articles on writer identification and papyrus image enhancement and released a software tool on the latter topic. Besides organising several local and international scientific events on Computational Paleography (including two workshops at ICDAR 2021 and 2023), she took part in the organisation of the ICDAR 2019 DIBCO Competition and the ICDAR 2023 Competition on Detection and Recognition of Greek Letters on Papyri. In recognition of the interest generated by her research, she was invited to give a keynote lecture at the latest International Congress of Papyrologists (Paris, July 2022).


Deep Learning

Thursday, August 24, 2023

14:00 – 15:00 – Tutorial Part I
15:00 – 15:30 – Break
15:30 – 16:30 – Tutorial Part II

Thomas Breuel (Nvidia, USA)
tbreuel@nvidia.com

Labeled training data has been the basis for many successful applications of deep learning, but such data is limited or unavailable in many applications. In this lecture, we examine the statistical foundations of unsupervised learning and identify the techniques and principles through which these foundations are implemented in deep learning systems. We then apply these techniques to deep learning problems in OCR, including text recognition, layout analysis, and language modeling.

  • Concepts and tasks: self-supervised learning, weakly supervised learning, active learning, zero-shot learning, one-shot learning.
  • Statistical theory and approaches to self-supervised learning (priors, clustering, latent variables, metric learning, subspaces, cross-domain learning, EM training).
  • Information theoretic analysis of self-supervised learning (information sources, MDL, compression).
  • Deep learning techniques: representation learning, pseudolabels, masking, prediction, contrastive learning, generative models, transformations, latent variables.
  • Applications in OCR and document analysis.
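
None of these topics requires code to follow the tutorial, but a small example can make the “masking and prediction” technique listed above concrete. The PyTorch sketch below is an illustration under stated assumptions, not material from the tutorial: it trains a tiny masked language model on unlabeled token sequences, the kind of self-supervised objective used for language modelling in OCR pipelines. The vocabulary size, model dimensions, and the random stand-in "corpus" are all placeholders.

    import torch
    import torch.nn as nn

    VOCAB, MASK_ID, EMBED = 128, 0, 64        # toy byte-level vocabulary and sizes

    class TinyMaskedLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, EMBED)
            layer = nn.TransformerEncoderLayer(EMBED, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(EMBED, VOCAB)   # predict the original token

        def forward(self, tokens):
            return self.head(self.encoder(self.embed(tokens)))

    model = TinyMaskedLM()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Unlabeled "text": random token ids stand in for lines of an OCR corpus.
    batch = torch.randint(1, VOCAB, (8, 32))

    # Mask 15% of the positions; the model must reconstruct the original tokens.
    mask = torch.rand(batch.shape) < 0.15
    corrupted = batch.masked_fill(mask, MASK_ID)

    logits = model(corrupted)                  # (batch, sequence, vocabulary)
    loss = loss_fn(logits[mask], batch[mask])  # loss computed only on masked positions
    loss.backward()
    optimizer.step()

A model pretrained this way on unlabeled transcriptions can afterwards be fine-tuned on a much smaller labeled set, which is the usual way such self-supervised pretraining pays off.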

Thomas Breuel works on deep learning and computer vision at NVIDIA Research. Prior to NVIDIA, he was a full professor of computer science at the University of Kaiserslautern (Germany), worked as a researcher at Google, Xerox PARC, the IBM Almaden Research Center, and IDIAP (Switzerland), and served as a consultant to the US Bureau of the Census. He is an alumnus of the Massachusetts Institute of Technology and Harvard University.


Document Image Binarization

Friday, August 25, 2023

09:00 – 10:30 – Tutorial Part I
10:30 – 11:00 – Break
11:00 – 12:30 – Tutorial Part II

Tutorial Slides can be found in the ICDAR 2023 Shared Media Repository.

Rafael Dueire Lins (Universidade Federal Rural de Pernambuco & Universidade Federal de Pernambuco, Brazil)

Ricardo Barboza (Universidade do Estado do Amazonas, Brazil)

Document image binarization is a key step in many document processing pipelines, including image enhancement, compression, automatic transcription and indexing, and skew and orientation detection and correction. It is therefore of interest to virtually everyone in document engineering, whether in industry or academia. No single algorithm is suitable for binarizing all kinds of document images. Algorithms vary widely in the quality of the images they produce, which must be matched to the intended application, in the time the binarization process takes, and in the size of the final file. This half-day tutorial will provide guidance on how to choose a suitable binarization algorithm for different applications. It will examine the “nature” of the 68 most widely used binarization algorithms for document images and assess their quality, processing time, and resulting file size on several kinds of document images.
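
The central trade-off between global and local (adaptive) thresholding can be illustrated in a few lines of Python. The sketch below is not part of the tutorial materials; it merely contrasts Otsu's global threshold with Sauvola's local threshold using scikit-image, and the file names and window size are placeholders chosen for the example.

    # Illustrative comparison of a global (Otsu) and a local (Sauvola)
    # binarization algorithm with scikit-image; file names are placeholders.
    import numpy as np
    from skimage import io
    from skimage.filters import threshold_otsu, threshold_sauvola

    gray = io.imread("document.png", as_gray=True)    # grayscale document image

    # Global thresholding: one threshold for the whole page (fast, compact
    # output, but sensitive to uneven illumination and degradation).
    otsu_binary = gray > threshold_otsu(gray)

    # Local thresholding: a threshold per pixel computed over a window
    # (more robust to stains and bleed-through, but slower).
    sauvola_binary = gray > threshold_sauvola(gray, window_size=25)

    io.imsave("otsu.png", (otsu_binary * 255).astype(np.uint8))
    io.imsave("sauvola.png", (sauvola_binary * 255).astype(np.uint8))

As the abstract stresses, visual quality is only one criterion: processing time and the size of the resulting file also matter when choosing between such algorithms.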

Rafael Dueire Lins is possibly the pioneering researcher in document processing in Latin America: in 1992 he started the Nabuco Project [1], which aimed at digitising and making publicly available the archive of about 6,500 letters of Joaquim Nabuco (1849–1910), a Brazilian statesman, writer, and diplomat, one of the key figures in the campaign to abolish slavery in Brazil, and the first Brazilian ambassador to the U.S.A. Lins binarized the document images of the Nabuco archive in order to compress them onto a single CD, and in doing so came across documents with back-to-front interference, much later called bleeding; he was the first researcher to describe this “noise” and to propose an algorithm to remove it. Since then, Lins has proposed several binarization schemes for different kinds of documents and has also developed several ways of assessing binarization algorithms for scanned and photographed documents. He led an R&D project for Hewlett-Packard Labs (USA and India) for over a decade covering several areas of document engineering.

[1] Lins, Rafael Dueire. Nabuco – Two Decades of Processing Historical Documents in Latin America. Journal of Universal Computer Science, March 2011. DOI: http://dx.doi.org/10.3217/jucs-017-01-0151

Ricardo Barboza graduated in Industrial Electrical Engineering from the Instituto de Tecnologia da Amazônia (1996) and received his Ph.D. in Electrical Engineering from the Federal University of Pernambuco (2013). He has over 20 years of experience in education, research, and team management. He has conducted research in Computer Science with an emphasis on digital image processing, and has published papers on image denoising and filtering, image binarization, forensic analysis, systems development, databases, education, digital TV and digital games, and error-correcting codes. Ricardo Barboza has led several R&D projects for Samsung and TecToy in Brazil.


Unlocking the Potential of Unstructured Data in Business Documents Through Document Intelligence

Friday, August 25, 2023

13:30 – 15:00 – Tutorial Part I
15:00 – 15:30 – Break
15:30 – 17:00 – Tutorial Part II

Anand Mishra (IIT Jodhpur) – in person

Vijay Mahadevan (AWS) – in person

Himanshu Sharad Bhatt (American Express) – online
Himanshu.s.bhatt@aexp.com

Sriranjani Ramakrishnan (American Express) – online
Sriranjani.Ramakrishnan@aexp.com

Prof. C. V. Jawahar (IIIT Hyderabad) – online
jawahar@iiit.ac.in

Over the past few decades, document analysis and recognition systems have gone through major changes, ranging from simple heuristic-based approaches to deep neural networks, and from processing historical handwritten documents to scene text reading and visual question answering. Traditional manual approaches process fixed-layout information using rules, which is often labor-intensive, non-scalable, and prone to errors. The next milestone, statistical approaches, uses annotated data and engineered features to train machine learning models; although this brings a certain degree of performance improvement, it cannot be widely adopted because of the scarcity of training samples, the reliance on handcrafted features, and deployment issues. In the deep learning era, intelligent document processing is changing rapidly by leveraging unlabeled data with modern architectures that handle text, image, layout, style, and other information in a unified way for multiple downstream applications in industry. This tutorial will present the relevant systems in the evolution of the document intelligence space.

Anand Mishra is an Assistant Professor at the Department of Computer Science and Engineering and an affiliate faculty member at the School of AI and Data Science at IIT Jodhpur. He leads the Vision, Language, and Learning Group (VL2G) at IIT Jodhpur, which focuses on problems at the intersection of vision and language. Previously, Anand worked with Dr. Partha Pratim Talukdar and Dr. Anirban Chakraborty at the Indian Institute of Science for nearly two years on knowledge-aware computer vision. Anand did his Ph.D. under the supervision of Prof. C. V. Jawahar and Dr. Karteek Alahari on understanding text in scene images at IIIT Hyderabad. At IIIT, Anand was a recipient of the Microsoft Research India Ph.D. fellowship in 2012 and was first runner-up for the XRCI best doctoral dissertation award in 2015. Anand's research group has been generously supported through various industry and Government of India grants, including MAPG 2021 and 2023, a MeitY grant under the National Language Translation Mission, a gift grant from Accenture, and a Start-up Research Grant from SERB, Govt. of India. Anand has several recent high-impact publications, including at CVPR 2023, IJCAI 2023, WACV 2023, ICDAR 2023, EMNLP 2022, ICCV 2021, ICDAR 2021, ECCV 2020, AAAI 2019, and ICCV 2019. He has also been recognized as an outstanding reviewer at ICCV 2021 and CVPR 2023.

Vijay Mahadevan is a Senior Manager of Applied Science at AWS AI Labs, where he leads a team of scientists building the industry-leading document understanding service Amazon Textract. Vijay and his team develop approaches based on deep learning and computer vision for OCR, table recognition, document question answering, and other document understanding tasks. His research interests lie in document rectification, semi-supervised and self-supervised training of document understanding models, and large multimodal transformer-based models for information extraction from documents. Vijay received his PhD from UC San Diego and has held research positions at Qualcomm, Yahoo Labs, and Uber.

Himanshu Sharad Bhatt is a Research Director at AI Labs, American Express, where he is actively involved in developing Document AI capabilities for unstructured documents. He is responsible for building novel AI capabilities to read, understand, and interpret documents and for providing bespoke AI solutions to a myriad of business problems. Prior to joining Amex in 2017, Himanshu worked with Xerox Research India on building unstructured-data analytics capabilities for the contact centers and services division. Himanshu holds a PhD in Computer Science & Engineering; his thesis received the best thesis award from INAE and IUPRAI in 2014. Over the years, his work has led to 30+ publications in reputed conferences and journals and 7 US patents. He has co-organized and presented multiple tutorials, including at Document Analysis Systems (DAS 2022), the Toronto Machine Learning Summit (TMLS 2021), the Grace Hopper Celebration of Women in Computing (India) tutorial on Data Science and Machine Learning, ACM India Compute 2015, and the Xerox Innovation Group Conference at Palo Alto Research Centre East (PARC-East) in Webster, US, and he co-organized the International Workshop on Domain Adaptation for Dialog Agents (DADA) at ECML-PKDD 2016. He was also an invited speaker at the faculty training program on “Data Science & Analytics” (2020) at IIT Indore, sponsored by MHRD, Govt. of India, and at Continuum 2019, the rolling seminar series held at the Shailesh J. Mehta School of Management, IIT Bombay.

Sriranjani Ramakrishnan is a Manager, Data Science at American Express AI Labs India. She works with her team to develop Document AI capabilities to read, understand, and extract information from semi-structured and unstructured documents, and to provide customized AI solutions that power business automation and analytics use cases. She is a machine learning and deep learning researcher with half a decade of experience in research labs. She has worked with cross-functional teams to build novel AI prototypes across multiple domains using text, image, and speech inputs, and has filed several patents and published research papers in top international conferences, including CVPR and ACL. Sriranjani has worked on problems in transfer learning, domain adaptation, and explainable AI (interpretable models) applied to NLP and computer vision in the transportation and customer experience domains. Prior to joining Amex AI Labs, she worked as a senior research engineer at Conduent Labs (previously known as Xerox Research Labs India). She holds a master's degree from the Indian Institute of Technology Madras. She delivered a tutorial talk at ICON 2019 (International Conference on Natural Language Processing) and has been an invited speaker at multiple venues, including Johns Hopkins University, VIT Vellore, and various community meetups. She is recognized by Google as a Machine Learning Expert and is a community organizer of the TensorFlow user community in Hyderabad.

Prof. C. V. Jawahar is the Dean of Research & Development and Head of the Centre for Visual Information Technology (CVIT) and the Machine Learning Lab at the International Institute of Information Technology, Hyderabad (IIITH), India. He leads a research group focusing on computer vision, machine learning, and multimedia systems. He is also the CEO of the Centre for Innovation and Entrepreneurship (CIE-IIIT) at IIIT Hyderabad. Prof. Jawahar was awarded a doctorate from IIT Kharagpur in 1997 and has been associated with IIIT Hyderabad since December 2000. He plays a pivotal role in guiding a large group of Ph.D. and MS students in computer vision, machine learning, and multimedia systems. In recent years, he has been actively involved in research questions at the convergence of mobility, health, vision, language, and text. He is also interested in large-scale multimedia systems, with a special focus on retrieval.

An Amazon Chair Professor, Prof. Jawahar is also an elected Fellow of the Indian National Academy of Engineering (INAE) and the International Association for Pattern Recognition (IAPR). His prolific research is globally recognized in the artificial intelligence and computer vision community, with more than 100 publications in top-tier conferences and journals in computer vision, robotics, and document image processing, and over 12,000 citations. He was awarded the ACM India Outstanding Contribution to Computing Education (OCCE) Award 2021 for fundamental, selfless service in the teaching of computing and for nurturing a generation of students who now serve the larger society and have made an impact on multiple dimensions of computing education. He has served as co-organizer and program chair of top-tier conferences including ACCV, ECCV, ICCV, ICDAR, ICVGIP, ICFHR, and WACV. Presently, he is an area editor of CVIU and an associate editor of IEEE TPAMI and IJDAR.

Extremely conscious of the social and practical relevance and application of research, Prof. Jawahar is actively engaged with several government agencies, ministries, and leading companies on innovating at scale through research.