An Algorithm for identifying, extracting and converting a table structure from a document image into LaTeX format

San Sethasopon

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/12207

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Chidchanok Lursinsap	-
dc.contributor.author	San Sethasopon	-
dc.contributor.other	Chulalongkorn University. Faculty of Science	-
dc.date.accessioned	2010-03-15T04:12:39Z	-
dc.date.available	2010-03-15T04:12:39Z	-
dc.date.issued	2002	-
dc.identifier.isbn	9741733186	-
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/12207	-
dc.description	Thesis (M.Sc.)--Chulalongkorn University, 2002	en
dc.description.abstract	Table analysis is an attractive and challenging problem in document image analysis that encompasses table identification and table recognition. Table identification builds on page segmentation and classification techniques, and the extracted results are analyzed and stored in predefined structures. This study proposes an algorithm for table analysis that begins by separating a document image into individual blocks. Non-table blocks are identified from the arrangement of data inside each block and the positions of lines. The recognized table blocks are then converted into LaTeX-formatted tables suitable for subsequent modification, storage, retrieval, and transmission. The algorithm was tested on image blocks extracted from actual document images and on synthetic samples. Table blocks with various styles of lines and data arrangement were correctly identified and analyzed. The algorithm gave good results for input samples with small skew angles and little noise.	en
dc.description.abstractalternative	Table analysis is one of the interesting problems in document image analysis. It comprises table identification, which is based on techniques for segmenting an image and classifying the resulting parts, and table recognition. This thesis proposes a new algorithm for table analysis. It begins by dividing the document image into blocks. Non-table blocks are determined by the arrangement of the data blocks and the positions of lines, and the table blocks are then converted into LaTeX format, which is suitable for editing, storage, reuse, and transmission. The algorithm was tested on samples extracted from real document images and on constructed samples. Tables with many styles of data arrangement and lines were correctly identified and analyzed. The algorithm gives good results for samples with little skew and little noise.	en
dc.format.extent	1498707 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	en	es
dc.publisher	Chulalongkorn University	en
dc.rights	Chulalongkorn University	en
dc.subject	Text processing (Computer science)	en
dc.subject	Document imaging systems	en
dc.subject	LaTeX (Computer file)	en
dc.title	An Algorithm for identifying, extracting and converting a table structure from a document image into LaTeX format	en
dc.title.alternative	An algorithm for identifying, extracting, and converting table structure from document images into LaTeX format	en
dc.type	Thesis	es
dc.degree.name	Master of Science	es
dc.degree.level	Master's Degree	es
dc.degree.discipline	Computational Science	es
dc.degree.grantor	Chulalongkorn University	en
dc.email.advisor	Chidchanok.L@Chula.ac.th	-
Appears in Collections:	Sci - Theses

Files in This Item:

File	Description	Size	Format
SanSet.pdf		1.46 MB	Adobe PDF
