Extracting Precise Data on the Mathematical Content of PDF Documents

Baker, Josef B.; Sexton, Alan P.; Sorge, Volker


Baker, Josef B.; Sexton, Alan P.; Sorge, Volker

Extracting Precise Data on the Mathematical Content of PDF Documents. (English). In: Sojka, Petr (ed.): Towards Digital Mathematics Library. Birmingham, United Kingdom, July 27th, 2008. Masaryk University, Brno, 2008. pp. 75-79

MSC: 68P99, 68U10, 68U15 | Zbl 1170.68481


Keywords:
document analysis

Summary:
As more and more scientific documents become available in PDF format, their automatic analysis becomes increasingly important. We present a procedure that extracts mathematical symbols from PDF documents by examining both the original PDF file and a rasterized version. This provides more precise information than is available either directly from the PDF file or by traditional character recognition techniques. The data can then be used to improve mathematical parsing methods that transform the mathematics into richer formats such as MathML.
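The idea of combining the two sources can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the PDF side yields a symbol name with an approximate position already converted to pixel coordinates, and that the raster side yields one connected component per glyph (multi-part glyphs such as "i" or "=" are ignored). The names PdfChar, Box, connected_boxes and match_symbols are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfChar:
    """Hypothetical record for a character read from the PDF file."""
    name: str   # symbol name as reported by the PDF
    x: float    # approximate position, assumed to be in pixel coordinates
    y: float


@dataclass(frozen=True)
class Box:
    """Inclusive pixel bounds of one glyph in the rasterized page."""
    left: int
    top: int
    right: int
    bottom: int


def connected_boxes(image):
    """Return bounding boxes of 4-connected foreground regions (row-major scan)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Flood-fill this component while tracking its extent.
            stack = [(r, c)]
            seen[r][c] = True
            left = right = c
            top = bottom = r
            while stack:
                y, x = stack.pop()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append(Box(left, top, right, bottom))
    return boxes


def distance_to_box(ch, box):
    """Euclidean distance from a character position to a box (0 if inside)."""
    dx = max(box.left - ch.x, 0, ch.x - box.right)
    dy = max(box.top - ch.y, 0, ch.y - box.bottom)
    return (dx * dx + dy * dy) ** 0.5


def match_symbols(chars, boxes):
    """Pair each PDF character name with its nearest raster bounding box."""
    return [(ch.name, min(boxes, key=lambda b: distance_to_box(ch, b)))
            for ch in chars]


# Toy page: a 2x2 blob and a 1x3 vertical stroke.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
]
chars = [PdfChar("x", 1.0, 2.0), PdfChar("one", 5.0, 3.0)]
matched = match_symbols(chars, connected_boxes(image))
```

In this sketch the PDF contributes the identity of each symbol and the raster contributes its exact extent, which is the kind of combined record the summary describes as more precise than either source alone.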

References:

1. Probets, S., Brailsford, D.: Substituting Outline Fonts for Bitmap Fonts in Archived PDF Files. In Softw. Pract. Exper., 33(9), pp. 885–899, 2003.

2. Phelps, T.: Multivalent. http://multivalent.sourceforge.net/

3. Rahman, F., Alam, H.: Conversion of PDF documents into HTML: A case study of document image analysis. In Conf. on Signal, Systems, Computers pp. 87–91, 2003.

4. Shao, M., Futrelle, R.: Graphics recognition in PDF documents. In Proc. of GREC 2005, LNCS 3926. Springer, 2006.

5. Raja, A., Rayner, M., Sexton, A., Sorge, V.: Towards a parser for mathematical formula recognition. In Proc. of MKM 2006, LNCS 4108. pp. 139–151, Springer, 2006. Zbl 1188.68284

6. Grbavec, A., Blostein, D.: Mathematics recognition using graph rewriting. In Proc. of ICDAR ’95, pp. 417–421, 1995.
