On the Logic and Learning of Language

by Sean A. Fulop


Formats

E-Book
$9.99
Softcover
$34.99

Book Details

Language : English
Publication Date : 10/14/2004

Format : E-Book
Dimensions : N/A
Page Count : 1
ISBN : 9781412222181

Format : Softcover
Dimensions : 6.5x9.5
Page Count : 242
ISBN : 9781412023818

About the Book

This book presents the author's research on automatic learning procedures for categorial grammars of natural languages. The research program spans a number of intertwined disciplines, including syntax, semantics, learnability theory, logic, and computer science. The theoretical framework employed is an extension of categorial grammar that has come to be called multimodal or type-logical grammar. The first part of the book presents an expository summary of how grammatical sentences of any language can be deduced with a specially designed logical calculus that treats syntactic categories as its formulae. Some such Universal Type Logic is posited to underlie the human language faculty, and all linguistic variation is captured by the different systems of semantic and syntactic categories which are assigned in the lexicons of different languages. The remainder of the book is devoted to the explicit formal development of computer algorithms which can learn the lexicons of type-logical grammars from learning samples of annotated sentences. The annotations consist of semantic terms expressed in the lambda calculus, and may also include an unlabeled tree-structuring over the sentence.
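As a rough illustration of what it means to deduce a sentence from lexical category assignments, the following Python sketch uses the simplest classical categorial grammar, with function application only. It is not the multimodal type-logical calculus that the book develops, and the toy lexicon, category encoding, and function names are all assumptions made for this example.

```python
from itertools import product

# Category encoding (an assumption of this sketch):
#   atoms are strings such as "s" or "np";
#   ("/", A, B)  stands for A/B: needs a B on its right, yields A;
#   ("\\", A, B) stands for A\B: needs an A on its left, yields B.
def fwd(result, arg):
    return ("/", result, arg)

def bwd(arg, result):
    return ("\\", arg, result)

# Hypothetical toy lexicon; all linguistic variation lives here.
LEXICON = {
    "Mary": {"np"},
    "John": {"np"},
    "sleeps": {bwd("np", "s")},
    "likes": {fwd(bwd("np", "s"), "np")},
}

def combine(left, right):
    """Categories derivable from two adjacent categories by application."""
    out = set()
    # Forward application: A/B, B => A
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        out.add(left[1])
    # Backward application: A, A\B => B
    if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
        out.add(right[2])
    return out

def derivable(words, goal="s"):
    """Chart-based check that the word sequence reduces to the goal category."""
    n = len(words)
    if n == 0:
        return False
    # chart[(i, j)] holds every category derivable for words[i:j]
    chart = {(i, i + 1): set(LEXICON.get(w, ())) for i, w in enumerate(words)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = set()
            for k in range(i + 1, j):
                for l, r in product(chart[(i, k)], chart[(k, j)]):
                    cell |= combine(l, r)
            chart[(i, j)] = cell
    return goal in chart[(0, n)]

print(derivable(["John", "likes", "Mary"]))
print(derivable(["sleeps", "Mary"]))
```

In this sketch the deduction rules are fixed and only the lexicon changes from one language to another, which mirrors the division of labor described above. The learning problem the book addresses is the reverse direction: recovering such a lexicon from annotated sentences.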

The major features of the research include the following:

We show how the assumption of a universal linguistic component (the logic of language) is not incompatible with the conviction that every language needs a different system of syntactic and semantic categories for its proper description.

The supposedly universal linguistic categories descending from antiquity (noun, verb, etc.) are summarily discarded.

Languages are here modeled as consisting primarily of sentence trees labeled with semantic structures; a new mathematical class of such term-labeled tree languages is developed which cross-cuts the well-known Chomsky hierarchy and provides a formal restrictive condition on the nature of human languages.

The human language acquisition mechanism is postulated to be biased, such that it assumes all input language samples are drawn from the above "syntactically homogeneous" class; in this way, the universal features of human languages arise not just from the innate logic of language, but also from the innate biases which govern language learning.

This project represents the first complete explicit attempt to model the acquisition of human language since Steve Pinker's groundbreaking 1984 publication, "Language Learnability and Language Development."




About the Author

Sean Fulop received his B.Sc. in Physics from the University of Calgary in 1991 with a Linguistics minor, and abandoned Physics in favour of Linguistics. He went on to complete his Ph.D. in Linguistics at UCLA in 1999, gaining expertise in the two specialties of phonetics and mathematical linguistics. His doctoral dissertation bore the same title as the present book, and was an incomplete precursor to the research that is now being reported. In the intervening years he has held temporary lectureships and professorships, most recently appointed as Visiting Assistant Professor of Linguistics at the University of Chicago, with an affiliation to the Computer Science Department. Aside from the mathematics of language and speech, the author's greatest preoccupations are his family (including wife Jacquie and daughters Sandra and Brenna), sports cars, and progressive rock music.



Preface

This book is my Ph.D. dissertation all grown up. Though this volume and my 1999 UCLA Linguistics dissertation share the same title and core ideas, the earlier work was woefully inadequate in many ways in which the present book is not. This is not to say, of course, that the present book isn't woefully inadequate, but it is fair to say that many of the former inadequacies have been eliminated.

Although certain publishers wouldn't believe it, this book is in part a foundational project in computational linguistics. The term "computational linguistics" is nowadays taken to refer to a kind of engineering discipline whose primary goal is to get computers to deal with information presented by means of ordinary language. This project exemplifies my view of computational linguistics, which is not as above. Consider for a moment what the various "computational sciences" amount to. Computational biology means using computer models to simulate biological systems and extract answers to questions of biology. Computational fluid dynamics (CFD) involves using computer models to simulate fluid dynamical systems and extract relevant answers—you get the idea. CFD is a favorite example because it is a relatively simple theory that has been plagued by a long history of computational obstacles.

The theory of CFD pretty much amounts to systems of equations that were first derived in the nineteenth century, now called the Navier-Stokes equations. For decades it has been thought, correctly, that if you want an answer to a fluid dynamical question, simply solve the Navier-Stokes equations. This last step proved to be very sticky since these equations can only be solved numerically (save a few special cases), and decades of work have been required to figure out decent methods for doing it; we are in some cases still waiting for sufficient computational power to get the answers we really want. In my view, computational linguistics can be like that—a real computational science in which the primary activities are the construction of mathematical and computational models of human language, and then undertaking efforts to solve the "equations." I have found in my work, some of which is presented here, that the formulation of a good theory for modeling language is just the first important step in a long series of sticky problems, like how to compute the model in a reasonable time.

The work herein is largely limited to the aforementioned first step, and it is thus a contribution to that unsung subfield known as "the mathematics of language." A mathematical model is presented for certain aspects of language and its acquisition that is fitted with a computational model for solving the linguistic equations, as it were. Unfortunately, an adequate computational methodology for extracting answers in practical cases has not yet been developed, so the results here are strictly theoretical. This is not a downfall at this stage; after all, the Navier-Stokes equations were once "strictly theoretical," too.

I have undertaken to report my work and my research in this book without much regard for who the "intended audience" might be. I suppose it can be somewhat circularly defined as "those people who find it interesting," and then I will have pleased my intended audience. But I suggest that such people are likely to be graduate students and researchers in mathematical linguistics, language learnability, and formal aspects of computational linguistics.

At times, the material presented here is quite formalized and tedious. The main reason for doing this is the desire to provide other researchers with exact details of the algorithms developed, and of the functions which the algorithms compute. I found, in my own reading when I was first learning the background for this project, that literature that was not tediously formalized read breezily but left me without a complete understanding.

I hope that there are many readers who can appreciate this project, although I realize that it will not be very accessible to most linguists in spite of my efforts to present most ideas from first principles. I thus also hope that the presentation of so many things from first principles does not too much bore those readers who are already well-versed in the background areas. Perhaps they will find in these presentations some valuable expository material for the classroom.

The production of this book has been carried out by the author, including typing, typesetting, indexing, and cover design. It has been typed and edited with GNU Emacs, and typeset using the LaTeX system, with the Baskerville font family from Micropress, Inc. The final assembly was performed by the technical staff at Trafford.

This is version 1.0 (beta).

Chicago, Illinois, April 2004