Some of the highlights from Google blogs
- Spell corrections: We recently launched spell corrections in Estonian. If your Estonian is rusty, and you don't remember how to spell "smoke detector," we can suggest a spell correction for [suitsuantur], leading to better search results.
- Diacritical marks: Many languages have diacritical marks, which alter pronunciation. Our algorithms are built to support them, and even help users who mis-type or completely ignore them. For example, if you're a resident of Quebec, Canada and would like to know the weather forecast in Quebec City, we'll serve good results whether you type with diacritical signs [Météo à Québec] or without [meteo quebec]. Czech users can read the same excellent results for a popular kids' cartoon by searching for [krtecek] and [krteček]. On the other hand, sometimes diacriticals change the meaning of the word and we have to use them correctly. For example, in Thai, [ข้าว] is "rice," with completely different results than [ข่าว], which is "news"; or in Slovakia, results for "child" [dieťa] are different than results for "diet" [diéta].
- Synonyms: A general case of diacritical support is the handling of synonyms in different languages. Korean searches showed that "samsung" can be viewed as a synonym of "삼성", so that when users search for [samsung], they find results which have the company's name in Korean.
- Compounding: Some languages allow compounding, which is the formation of new words by combining together existing words. You can see a nice example in Swedish, where we return documents about a Swedish credit card for both compounded [Visakort] and non-compounded [visa kort] queries.
- Stemming: Google has developed morphological models that can receive compound words as queries, and return pages which contain their stem, possibly as part of a different compound. For example, when searching for cars in Saudi Arabia, you can search for [سيارة] and [سيارات] because both are variants of the same stem, and both return many common results. A Polish user can search for "movie" [film], and get back results that contain other variants of the stem, such as "filmów," "filmu," "filmie," "filmy." A user from Belarus will find results for all word forms of the capital, Minsk [Мінск]: "Мінску," "Мінска," "Мінскага."
No comments:
Post a Comment