Ideas from Dave Inman

Welcome to Dave Inman's Project ideas for 2011/12


There are a few ideas below, and a bigger information retrieval project on MUSE at the end. If you are interested, please send me an e-mail (from your LSBU e-mail account only; I filter mail fiercely to remove spam!) giving:

  1. The project you are interested in, and why!
  2. What ideas you have for the project
  3. Your results in units taken so far (so I can match your strengths to the project)

Text entry for language learners

Imagine you are writing an e-mail to a friend who has a basic grasp of English. How can you know if what you type is too difficult for them to understand? You might use vocabulary that is too hard for them, or maybe the style is too hard to follow. Perhaps the sentences are too long.

It would be great to have some advice as you enter text. You would specify the target reader (Beginner, Lower Intermediate, Intermediate, etc.) and just type. The tool would warn you whenever a potential problem arose and offer some suggestions: an easier word of similar meaning, perhaps, or splitting a sentence into smaller parts, or writing in a simpler style.
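
The kind of check described above can be sketched in a few lines of Python. This is only an illustration, not a design for the project: the level settings, the tiny word list, the sentence-length limit and the table of simpler alternatives are all invented here, and a real tool would draw on published word lists and a thesaurus instead.

```python
import re

# Hypothetical per-level settings: a tiny known-word list and a maximum
# sentence length. A real tool would load a published word list instead.
LEVELS = {
    "Beginner": {
        "known": {"i", "you", "we", "the", "a", "to", "is", "are", "will",
                  "meet", "at", "station", "on", "friday", "please", "come"},
        "max_words": 8,
    },
}

# Hypothetical simpler alternatives; a thesaurus could supply these.
SIMPLER = {"rendezvous": "meet", "commence": "start"}


def check_text(text, level):
    """Return a list of warnings for words or sentences likely to be too hard."""
    settings = LEVELS[level]
    warnings = []
    # Split on sentence-ending punctuation; crude, but enough for a sketch.
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[A-Za-z']+", sentence)
        if not words:
            continue
        if len(words) > settings["max_words"]:
            warnings.append(
                f"Long sentence ({len(words)} words): consider splitting it."
            )
        for word in words:
            lower = word.lower()
            if lower not in settings["known"]:
                hint = SIMPLER.get(lower)
                suffix = f" Try '{hint}'." if hint else ""
                warnings.append(f"'{word}' may be too hard.{suffix}")
    return warnings


if __name__ == "__main__":
    for warning in check_text("We will rendezvous at the station.", "Beginner"):
        print(warning)
```

Judging writing style is much harder than checking words and sentence length, and this sketch does not attempt it.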

You can see a free Thesaurus (to see words of similar meanings) at:
http://wordnet.princeton.edu/

and lists of words that English speakers might be expected to know at:
http://jbauman.com/aboutgsl.html

or
http://www.vuw.ac.nz/lals/research/awl/awlinfo.html

or
http://www.english-zone.com/reading/dolch.html

or
http://esl.about.com/library/vocabulary/bl1000_list1.htm

or look at an article such as
http://www1.harenet.ne.jp/~waring/papers/cup.html


Semantic Web

When you search the web using a search engine, you usually search for words, not meanings. We humans can see which 'hits' are good because we understand the meaning of a web page. One future direction of web search is to try to make search engines more like us by incorporating meaning into searches (the semantic web).

Two methods have been proposed.

  1. Get computers to 'understand' a web page. Natural language processing (NLP) would be used, but its track record is not good. Some (including myself) would argue that the problems here are so fundamental that it may be decades, if not more, before real progress is made.

  2. Ask real humans (e.g. the author of the web page) to add tags (e.g. XML) that explicitly state what the web page means. An author ought to know the meaning of their web pages, but do they know XML? This route requires a lot of effort from an author, who may take weeks to learn how to do this. Many will not have that sort of time available, so it is less likely to be done.

As you can see, both have flaws. I propose here a radical alternative that, if successful, would revolutionise the way we search for information. It uses greed as the main driver, and offers some help to satisfy that greed!

The basic idea is that motivation would come from higher web search listings for 'semantically tagged' web pages. Search engine optimisation (SEO), the attempt to get your web page listed higher up, is big business. For a business this is crucial: unless your web page appears within the first few hits, most users won't click on your link. You can pay, of course, but why not appeal to human greed? Tag your page with semantic tags (e.g. XML) and you will get a better ranking for free.

So much for motivation, but what if you don't know how to do this? This project would look at building a toolkit to help such users. The toolkit would be friendly and easy to use. It would take an untagged web page and, working together with the author through plain, simple questions, would endeavour to produce really useful semantic tags. The toolkit would 'know' about XML, and know how to spot ambiguity and other potential problems. It would not attempt to produce the tags alone (except perhaps for the simplest of web pages), as this has not been possible yet and may take decades. Instead it relies on collaboration between the party who knows the meaning of the web page (the author) and the party who knows XML (the toolkit) to do the job.
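
As a rough illustration of that collaboration, the Python sketch below asks the author a few plain questions and turns the answers into XML. It is a sketch under stated assumptions only: the questions and the tag names (page, topic, audience, location) are made up for this example and do not come from any real semantic web vocabulary, and spotting ambiguity is not attempted.

```python
import xml.etree.ElementTree as ET

# Hypothetical plain-language questions, each mapped to a made-up tag name.
QUESTIONS = [
    ("topic", "In a few words, what is this page about?"),
    ("audience", "Who is the page written for?"),
    ("location", "Does the page relate to a particular place? (leave blank if not)"),
]


def build_tags(page_url, ask=input):
    """Ask the author each question and return the answers as an XML string."""
    root = ET.Element("page", {"url": page_url})
    for tag, question in QUESTIONS:
        answer = ask(question + " ").strip()
        if answer:  # skip questions the author left blank
            ET.SubElement(root, tag).text = answer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Interactive use: the author types the answers at the prompt.
    print(build_tags("http://example.com/"))
```

The author never sees any XML while answering; the toolkit supplies that knowledge. Passing the question-asking function in as a parameter is just a convenience so the sketch can be tried without typing.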

 


Multi-lingual User-centred Search Engine (MUSE)

MUSE is an exciting new research project within the Natural Language Processing (NLP) group.

In essence, we aim to build a better search engine to find information in any language on the Web. It will take account of the interests of the user and their language ability, and so, we hope, return more hits that are relevant to them. There is so much information on the Web now that this is badly needed.

This page describes the MUSE project. The NLP group will supervise you if you demonstrate an interest and commitment to this project. To do that you must:

  1. Read through the project description, look at the tasks and find a task or tasks of interest to you.
  2. Look at the references given against each task as a starting point.
  3. Read around the subject.
  4. Produce an annotation of about half a page for each reference that you feel is really important.
  5. Write about a page explaining why the task(s) interest you, what you want to contribute to them, and how they are important for the MUSE project.
  6. Draw up a rough contents page for an MSc project based on these tasks.
  7. Print the above and hand it in to the office, marked "For Dave Inman, NLP Group: MSc Project proposal for Task(s)" followed by the task number(s).

The NLP group will review all proposals, and decide how best we can supervise projects. Please be patient - we will get back to you as soon as we can.

Working on this project will be hard, but we hope stimulating and rewarding.


Muse: a. (in Greek and Roman mythology) each of nine goddesses, the daughters of Zeus and Mnemosyne, who inspire poetry, music, drama, etc. b. a source of inspiration for creativity. Source: Oxford English Dictionary.


©David Inman