.NET Transliteration API Project

I’m thinking of starting a .NET API project with the goal of providing transliteration services within the .NET Framework. The class libraries are basically done already (see details below), but there’s still a lot of work to do.

Who wants to help with this project? I need people with background knowledge of languages that can be transliterated from and into the Latin alphabet, and .NET developers to help build such a framework for .NET.

This is my idea for implementing transliteration services in .NET:

1. First, we need a transliteration character mapping table that we can later use as our source. For this, create a UTF-8 encoded XML file with the following structure:


<?xml version="1.0" encoding="utf-8" ?>
<Transliteration Name="ISO9-1995">
    <Chars IsUpper="true">
        <Char>
            <Source>Ә</Source>
            <Destination>A̋</Destination>
        </Char>
        <!-- more Char entries go here -->
    </Chars>
    <Chars IsUpper="false">
        <Char>
            <Source>а</Source>
            <Destination>a</Destination>
        </Char>
        <!-- more Char entries go here -->
    </Chars>
</Transliteration>

Please note that in this example there are two Chars elements: one with IsUpper="true" and the other with IsUpper="false". I’m not sure whether those attributes will be relevant later in our project, but we will see.
The Name attribute contains the name of the transliteration; in this case the character set comes from ISO 9:1995.
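
To make the structure more concrete, here is a minimal sketch of how such a mapping document could be read into a lookup table. It is not the class used later in this post: MappingFileSketch and ReadPairs are hypothetical names, and it uses LINQ to XML (System.Xml.Linq) purely for brevity. It also assumes that a Source value appearing twice should simply keep the last Destination.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class MappingFileSketch
{
    // Reads every <Char> pair below the <Chars> elements of the root element.
    // A Source that appears more than once keeps its last Destination.
    public static Dictionary<string, string> ReadPairs(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        var pairs = new Dictionary<string, string>(StringComparer.Ordinal);

        foreach (XElement chars in doc.Root.Elements("Chars"))
        {
            foreach (XElement ch in chars.Elements("Char"))
            {
                string source = (string)ch.Element("Source");
                string destination = (string)ch.Element("Destination");

                // Skip entries where one of the two values is missing.
                if (string.IsNullOrEmpty(source) || destination == null)
                    continue;

                pairs[source] = destination;
            }
        }

        return pairs;
    }
}
```

The IsUpper attribute is ignored here, because both sections end up in the same table anyway.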

2. Create a new Visual Studio C# class library project and name it “Transliterations”.
3. Create a transliteration XML file (with the structure above) and fill in the characters of the language you want to transliterate. The Source value contains the character to search for; the Destination value contains the character to replace it with.
4. Save the XML file in your Visual Studio project in the subfolder “Transliteration”, with a file name that follows a “standard”, e.g. Hindi.xml. The class below loads the file as an embedded resource named <namespace>.Transliteration.<name>.xml, so set its build action to Embedded Resource.
5. Create a new class “Transliterator” and copy the following code into it:


using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Text;
using System.Xml;

namespace urmanet.ch.Transliteration
{
    public class Transliterator
    {
        /// <summary>
        /// Cache holding one Transliterator instance per transliteration type.
        /// </summary>
        private static readonly SortedList<TransliterationType, Transliterator> _instances =
            new SortedList<TransliterationType, Transliterator>();
        private Hashtable _charValuePairs;
        private string _name;

        /// <summary>
        /// Private constructor; instances are created through GetTransliterator.
        /// </summary>
        private Transliterator(){}

        /// <summary>
        /// Loads the character pairs from the embedded XML file of the given type.
        /// </summary>
        /// <param name="parTransliterationType">Type of transliteration</param>
        private Transliterator(TransliterationType parTransliterationType)
        {
            CharValuePairs = new Hashtable();

            // Resource name: <namespace>.Transliteration.<enum member>.xml
            string myEmbeddedResourceName = string.Format(
                "{0}.Transliteration.{1}.xml", GetType().Namespace, parTransliterationType);

            Stream myResourceStream = Assembly.GetExecutingAssembly().GetManifestResourceStream(
                myEmbeddedResourceName);

            if (myResourceStream == null)
                throw new FileNotFoundException(
                    "The given transliteration source file could not be found.",
                    myEmbeddedResourceName);

            XmlDocument myXmlDocument = new XmlDocument();
            using (StreamReader myStreamReader = new StreamReader(myResourceStream))
            {
                myXmlDocument.LoadXml(myStreamReader.ReadToEnd());
                XmlNodeList myCharNodes = myXmlDocument.SelectNodes("//Chars/Char");

                if (myCharNodes == null || myCharNodes.Count == 0)
                    throw new InvalidDataException(
                        "No char pairs were found in the given transliteration file.");

                foreach (XmlNode myCharNode in myCharNodes)
                {
                    // Add throws if the same Source appears twice in the file.
                    CharValuePairs.Add(
                        myCharNode["Source"].InnerText,
                        myCharNode["Destination"].InnerText);
                }

                _name = myXmlDocument.SelectSingleNode("//Transliteration").Attributes["Name"].InnerText;
            }
        }

        /// <summary>
        /// Returns the cached Transliterator for the given type and creates it on first use.
        /// </summary>
        /// <param name="parTransliterationType">Type of transliteration</param>
        /// <returns>The Transliterator for the requested type</returns>
        public static Transliterator GetTransliterator(TransliterationType parTransliterationType)
        {
            // Lock so that concurrent callers do not add the same key twice.
            lock (_instances)
            {
                if (!_instances.ContainsKey(parTransliterationType))
                {
                    _instances.Add(
                        parTransliterationType,
                        new Transliterator(parTransliterationType));
                }

                return _instances[parTransliterationType];
            }
        }

        /// <summary>
        /// Source characters (keys) and their replacements (values).
        /// </summary>
        public Hashtable CharValuePairs
        {
            private set { _charValuePairs = value; }
            get { return _charValuePairs; }
        }

        /// <summary>
        /// Replaces every character that has a mapping; all other characters are kept.
        /// </summary>
        /// <param name="parString">Text to transliterate</param>
        /// <returns>The transliterated text</returns>
        public string Transliterate(string parString)
        {
            if (string.IsNullOrEmpty(parString))
                return parString;

            // Build the result in a single pass so that already replaced
            // characters are never looked up a second time.
            StringBuilder myReturnValue = new StringBuilder(parString.Length);
            foreach (char myChar in parString)
            {
                string myKey = myChar.ToString();
                if (CharValuePairs.ContainsKey(myKey))
                    myReturnValue.Append(CharValuePairs[myKey]);
                else
                    myReturnValue.Append(myChar);
            }

            return myReturnValue.ToString();
        }

        /// <summary>
        /// Name of the transliteration, taken from the Name attribute of the XML file.
        /// </summary>
        public string Name
        {
            private set { _name = value; }
            get { return _name; }
        }
    }
}

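Now you need to extend the enumeration TransliterationType with the name of the transliteration XML file you created (without the .xml extension), because the constructor builds the resource name from the enum member: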


namespace urmanet.ch.Transliteration
{
    public enum TransliterationType
    {
        ISO91995,
        Hindi
    }
}
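
If the constructor reports that the transliteration source file could not be found, it can help to compare the resource names the code expects with the names that are actually embedded in the assembly. The following is only a sketch and not part of the library: ResourceCheckSketch and FindMissing are hypothetical names, and the naming pattern is the one used in the constructor above. Embedded resource names normally start with the project's default namespace, so that has to be urmanet.ch.Transliteration as well.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class ResourceCheckSketch
{
    // Returns the expected resource names that are not embedded in the assembly.
    public static List<string> FindMissing(Assembly assembly, string rootNamespace, Type enumType)
    {
        var available = new HashSet<string>(
            assembly.GetManifestResourceNames(), StringComparer.Ordinal);
        var missing = new List<string>();

        foreach (string name in Enum.GetNames(enumType))
        {
            // Same pattern as in the Transliterator constructor.
            string expected = string.Format(
                "{0}.Transliteration.{1}.xml", rootNamespace, name);

            if (!available.Contains(expected))
                missing.Add(expected);
        }

        return missing;
    }
}
```

Inside the class library it could be called as ResourceCheckSketch.FindMissing(typeof(Transliterator).Assembly, "urmanet.ch.Transliteration", typeof(TransliterationType)); an empty list means every enum member has its XML file.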

And this is how the transliteration classes are used within .NET:

Transliterator myTransliterator = Transliterator.GetTransliterator(TransliterationType.ISO91995);
string mySourceText = "Абыйская Низменность";
string myLatinText1 = myTransliterator.Transliterate(mySourceText);
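
One thing to keep in mind: Transliterate looks at one char at a time, but a letter with a diacritic can consist of a base letter followed by a combining mark, i.e. more than one char (the Destination A̋ in the XML example is such a case). A Source value of that kind would never be found. The sketch below shows one possible way around this by walking over text elements with StringInfo instead of single chars. TextElementSketch is a hypothetical name, and the mapping table is passed in as a plain dictionary to keep the example self-contained.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class TextElementSketch
{
    // Maps each text element (base character plus its combining marks) in one pass.
    // Elements without a mapping are copied unchanged.
    public static string Transliterate(string text, IDictionary<string, string> pairs)
    {
        var result = new StringBuilder(text.Length);
        TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(text);

        while (elements.MoveNext())
        {
            string element = elements.GetTextElement();
            string replacement;
            result.Append(pairs.TryGetValue(element, out replacement) ? replacement : element);
        }

        return result.ToString();
    }
}
```

Whether this is needed depends on the mapping tables: as long as every Source value is a single char, the simpler loop gives the same result.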

One Comment

  1. Serge Axenov

    Hello,

    the project seems to be very interesting, and I think I could participate. I am acquainted with the phonetic system of a number of languages which use non-latin scripts, and I do some programming, also in C#.

    Best wishes,
    Serge
