Myanmar Invitation Letter for U Myint SHWE

Date: 2024.4.13

August 01, 2013


U MYINT SHWE (Passport: M 369475),

It is our great honor to invite you to visit ANHUI LONGPING HIGH-TECH SEEDS CO., LTD., registered at 533#, WEST WANGJIANG ROAD, HEFEI, ANHUI, CHINA, in August 2013. This visit will give you an opportunity to gain a better understanding of our company and to discuss our future business cooperation in detail.

ANHUI LONGPING HIGH-TECH SEEDS CO., LTD., as one of the backbone seed companies in China, has made a great contribution to the development of hybrid rice in Myanmar. We believe this visit will be of great benefit to our future business cooperation.

Please use this invitation letter to apply for your VISA to China.

We are all looking forward to seeing you soon. Should you have any questions, please feel free to contact me.

Yours truly,

ZHIMIN GENG

Vice General Manager

ANHUI LONGPING HIGH-TECH SEEDS CO., LTD.

Number: 20130801001

Subject: Letter of Invitation

Part 2: Selection of XML tag set for Myanmar National Corpus

The 6th Workshop on Asian Language Resources, 2008

Selection of XML tag set for Myanmar National Corpus

Wunna Ko Ko
AWZAR Co.
Mayangone Township, Yangon, Myanmar
wunnakoko@gmail.com

Thin Zar Phyo
Myanmar Unicode and NLP Research Center
Myanmar Info-Tech, Hlaing Campus, Yangon, Myanmar
myanmar.nlp5@gmail.com

Abstract

In this paper, the authors describe the selection of an XML tag set for the Myanmar National Corpus (MNC). MNC will be a sentence level annotated corpus. The validity of the XML tag set has been tested by manually tagging sample data.

Keywords: Corpus, XML, Myanmar, Myanmar Languages

1 Introduction

Myanmar (formerly known as Burma) is one of the South-East Asian countries. There are 135 ethnic groups living in Myanmar. These ethnic groups speak more than one language and use different scripts to represent their respective languages. There are a total of 109 languages spoken by the people living in Myanmar [Ethnologue, 2005].

There are seven major languages in Myanmar according to speaking population. They are Kachin, Kayin/Karen, Chin, Mon, Burmese, Rakhine and Shan [Ko Ko & Mikami, 2005]. Among them, Burmese is the official language and is spoken by about 69% of the population as their mother tongue [Ministry of Immigration and Population, 1995].

A corpus is a large and structured set of texts. Corpora are used for statistical analysis, for checking occurrences, or for validating linguistic rules within a specific universe.1

In Myanmar, there is plenty of text for most of the languages, especially Burmese and the other major languages, dating back to the stone inscriptions.

The Myanmar Language Commission and a number of scholars have collected corpora for their specific uses [Htay et al., 2006]. However, there has been no national corpus collection, in either digital or non-digital format, until now.

Since a number of languages are used in Myanmar, the national level corpus to be built will include all languages and scripts used in Myanmar. It has been named the Myanmar National Corpus, or MNC for short. During the discussion on the format of the corpus, XML (eXtensible Markup Language), a subset of SGML (Standard Generalized Markup Language), was chosen because the XML format can remain usable for a long time and can keep the original format of the text [Burnard, 1996]. The range of software available for XML is increasing day by day, and more and more NLP related tools and resources are being produced in it. This in turn makes it necessary to select an XML tag set before starting to build MNC.

MNC will include not only written text but also spoken text. The written part will include regional and national newspapers and periodicals, journals and interests, academic books, fiction, memoranda, essays, etc. The spoken part will include scripted formal and informal conversations, movies, etc.

During the selection of the XML tag set, samples of all the data to be included in MNC were studied.

2 Myanmar National Corpus

Myanmar is a country that uses 109 different languages and a number of different scripts [Ethnologue, 2005]. In order to do language processing for these languages and scripts, it becomes necessary to build a corpus with the languages and scripts used in Myanmar, or at least with the major languages and scripts, which will include almost all areas of documents.

Among the different scripts used in Myanmar, the popular ones include the Burmese script (a Brahmi based script) and Latin scripts. Building MNC will be helpful for the development of Natural Language Processing (NLP) tools (such as grammar rules, spelling checking, etc.) and also for linguistic research on these languages and scripts. Moreover, since the Burmese script is written without necessarily separating words with spaces, the corpus to be built is hoped to be useful for developing tools for automatic word segmentation.

2.1 XML based corpus

XML is a universal format for structured documents and data, and can provide highly standardized representation frameworks for NLP (Jin-Dong KIM et al. 2001), especially those with annotated corpus based approaches, by providing them with knowledge representation frameworks for morphological, syntactic, semantic and/or pragmatic information structure. Important features are:

• XML is extensible and does not consist of a fixed set of tags.
• XML documents must be well-formed according to a defined syntax.
• An XML document can be formally validated against a schema of some kind.
• XML is more interested in the meaning of data than in its presentation.

An XML document must have exactly one top-level element, or root element. All other elements must be nested within it. Elements must be properly nested [Young, 2001]. That is, if an element starts within another element, it must also end within that same element.

Each element must have both a start-tag and an end-tag. The element type name in a start-tag must exactly match the name in the corresponding end-tag, and element names are case sensitive.

Moreover, the advantages of XML for NLP include ontology extraction into XML based structured languages using XML Schema. The great benefit of XML is that the document itself describes the structure of the data.2

Three characteristics of XML distinguish it from other markup languages:3

• its emphasis on descriptive rather than procedural markup;
• its notion of documents as instances of a document type; and
• its independence of any hardware or software system.

Since MNC is to be built in an XML based format, the selection of the XML tag set becomes an important process. The XML tagged corpus data should also keep the original format of the data. In order to select the XML tag set for MNC, sample data for the corpus had to be collected. The format of the sample corpus data was studied so that the selected XML tag set would be appropriate to the data format.

2.2 Structure of a data file at MNC

The structure of a data file at MNC will include two main parts: the information of the corpus file and the corpus data.

The first part, the header part of a corpus file, describes the information of the corpus file. This information includes the header, which will provide for sensible use of the corpus information in machine readable form. Information such as the language usage and the description of the corpus file will be included in this part.

The second part, the document part of a corpus file, will include the source description of the corpus data and the corpus data itself, that is, the written or spoken part of the text. Information about the corpus data, such as bibliographic information, authorship and publisher information, will be included in this section, together with the corpus data itself.

The hierarchical structure of a corpus file at MNC is shown in Figure 1.


Figure 1. Hierarchical structure of a data file at MNC
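The well-formedness rules listed in Section 2.1 (exactly one root element, proper nesting, and matching case sensitive start-tags and end-tags) can be illustrated with a short Python sketch. It is not part of the MNC tooling; the helper name `is_well_formed` is hypothetical, and the sketch only assumes Python's standard `xml.etree.ElementTree` parser, which raises `ParseError` on input that is not well-formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One root element, properly nested, matching start-tags and end-tags.
good = "<mnc><teiHeader></teiHeader><myaDoc></myaDoc></mnc>"
# Improper nesting: <teiHeader> is closed after its parent <mnc>.
bad_nesting = "<mnc><teiHeader></mnc></teiHeader>"
# Element names are case sensitive, so <myaDoc> and </myadoc> do not match.
bad_case = "<mnc><myaDoc></myadoc></mnc>"
# Two top-level elements instead of exactly one root element.
two_roots = "<mnc></mnc><mnc></mnc>"

for sample in (good, bad_nesting, bad_case, two_roots):
    print(sample, "->", is_well_formed(sample))
```

Only the first string passes. Note that this checks well-formedness only; validating against a schema is a separate step that the standard library parser does not perform.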

3 Selection of necessary XML tag set

After the original formats and features of the texts to be used in the corpus had been studied and the structure of the corpus file had been determined, the selection procedure for the XML tag set was started. The British National Corpus (BNC)4 and the American National Corpus (ANC)5 were referenced for the selection of the XML tag set.

The selection of the XML tag set is based on the nature of the structure of a data file. The main tag for the data file is named <mnc>, which is the abbreviation of Myanmar National Corpus.

A data file contains two main parts, the header part and the document part.

-<mnc>
  +<teiHeader></teiHeader>
  +<myaDoc></myaDoc>
</mnc>

Figure 2. Root and element tags of MNC

3.1 Header Part

The XML tag for the header part of the corpus data file is named <teiHeader>. The Text Encoding Initiative (TEI) published guidelines for text encoding and interchange.6 The TEI encoding scheme consists of a number of rules to which a document has to adhere in order to be accepted as a TEI document.

The header part contains the language usage of the data file <langUsage> and the file description <fileDesc>, which includes machine readable information about the data file.

-<mnc>
  -<teiHeader>
    +<langUsage></langUsage>
    +<fileDesc></fileDesc>
  </teiHeader>
  +<myaDoc></myaDoc>
</mnc>

Figure 3. Element and Child tags of MNC

The language usage part contains such information as language name <langName>, script information <script>, International Organization for Standardization (ISO) code number <ISO>, encoding information <encodingDesc> and version of encoding <version>.

-<mnc>
  -<teiHeader>
    <langUsage>
    [...]
    +<fileDesc></fileDesc>
  </teiHeader>
  +<myaDoc></myaDoc>
</mnc>

Figure 4. 2 level Child tags in language Usage part of MNC

4 [...] Computing Services, Oxford.
5 Nancy Ide and Keith Suderman. 2003. The American National Corpus, first Release. Vassar College, Poughkeepsie, USA.
6 TEI Consortium. 2001, 2002 and 2004. Text Encoding Initiative. In The XML Version of the TEI Guidelines.
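As a minimal sketch of the language usage part described in Section 3.1, the following Python code builds a <teiHeader> whose <langUsage> element carries the five child tags named in the text. The function name `build_lang_usage` and all the values passed to it are illustrative assumptions, not values prescribed by the paper; only the tag names come from the text.

```python
import xml.etree.ElementTree as ET


def build_lang_usage(lang_name, script, iso, encoding, version):
    """Build a <langUsage> element with the five child tags named in the text."""
    lang_usage = ET.Element("langUsage")
    for tag, value in (
        ("langName", lang_name),
        ("script", script),
        ("ISO", iso),
        ("encodingDesc", encoding),
        ("version", version),
    ):
        ET.SubElement(lang_usage, tag).text = value
    return lang_usage


# All argument values below are placeholders chosen for illustration.
header = ET.Element("teiHeader")
header.append(build_lang_usage("Myanmar", "Myanmar", "mya", "Unicode", "Unicode 5.0"))
ET.SubElement(header, "fileDesc")  # left empty here; see the file description part

print(ET.tostring(header, encoding="unicode"))
```

Building the header through an XML library rather than by string concatenation keeps the output well-formed by construction.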

The file description part contains such information as the title information of the corpus file <titleStmt>, the edition information <editionStmt> and the publication information about the corpus file <publicationStmt>. Detailed information will be tagged using more specific lower level child tags under these tags.

-<mnc>
  -<teiHeader>
    +<langUsage></langUsage>
    -<fileDesc>
      +<titleStmt></titleStmt>
      +<editionStmt></editionStmt>
      +<publicationStmt></publicationStmt>
    </fileDesc>
  </teiHeader>
  +<myaDoc></myaDoc>
</mnc>

Figure 5. 2 level Child tags in file description part of MNC

3.2 Document Part

The XML tag for the document part of the corpus data file is named <myaDoc>, which is the short form of Myanmar Document. It contains two sub parts: the source description of the data <sourceDesc> and the original data itself, which in turn can be one of two types, written text <wtext> or spoken text <stext>.

<mnc>
  +<teiHeader></teiHeader>
  -<myaDoc>
    +<sourceDesc></sourceDesc>
    +<wtext></wtext>
  </myaDoc>
</mnc>

Figure 6. Element and Child tags of MNC
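The two-part layout described so far (a <teiHeader> plus a <myaDoc> holding <sourceDesc> and either <wtext> or <stext>) can be checked mechanically. The sketch below is a hand-written structural check, not a schema validator, and the function name `check_mnc_structure` is hypothetical; it assumes the file has already been parsed with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def check_mnc_structure(root):
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    if root.tag != "mnc":
        problems.append("root element is not <mnc>")
    if root.find("teiHeader") is None:
        problems.append("missing <teiHeader>")
    doc = root.find("myaDoc")
    if doc is None:
        problems.append("missing <myaDoc>")
    else:
        if doc.find("sourceDesc") is None:
            problems.append("missing <sourceDesc>")
        # The original data is either written text or spoken text.
        if doc.find("wtext") is None and doc.find("stext") is None:
            problems.append("missing <wtext> or <stext>")
    return problems


complete = ET.fromstring(
    "<mnc><teiHeader/><myaDoc><sourceDesc/><wtext/></myaDoc></mnc>"
)
incomplete = ET.fromstring("<mnc><teiHeader/><myaDoc><wtext/></myaDoc></mnc>")

print(check_mnc_structure(complete))    # []
print(check_mnc_structure(incomplete))  # ['missing <sourceDesc>']
```

A check of this kind could be run on each manually tagged sample file before it is added to the corpus.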

The first part, the source description part of the data <sourceDesc>, will contain the bibliographic information of the original data, such as the title, the name of the author, the publisher, etc.

<mnc>
  +<teiHeader></teiHeader>
  -<myaDoc>
    -<sourceDesc>
      -<bibl>
      [...]

Figure 7. 2 level Child tags for source description part of MNC

The second part, the original data part <wtext> or <stext>, will contain the whole original data. The original format information, such as heading <head type="MAIN">, sub-heading <head type="SUB">, paragraph number <paragraph n="1"> and sentence number <s n="1">, will be saved in this part.

<mnc>
  +<teiHeader></teiHeader>
  -<myaDoc>
    +<sourceDesc></sourceDesc>
    -<wtext>
      -<head>
        <s></s>
        +<paragraph></paragraph>
        +<head></head>
      </head>
    </wtext>
  </myaDoc>
</mnc>

Figure 8. 2 level Child tags for original data part of MNC
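The numbering of paragraphs and sentences in the original data part can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_head` is hypothetical, the input is assumed to be already split into paragraphs and sentences, sentence numbers are assumed to restart at 1 in each paragraph, and the English strings are placeholders for corpus text.

```python
import xml.etree.ElementTree as ET


def build_head(head_type, title, paragraphs):
    """Build a <head> element with numbered <paragraph> and <s> children.

    `paragraphs` is a list of paragraphs, each given as a list of sentences.
    """
    head = ET.Element("head", {"type": head_type})
    # The heading text is stored as a sentence; its number is not inferred here.
    ET.SubElement(head, "s").text = title
    for p_num, sentences in enumerate(paragraphs, start=1):
        paragraph = ET.SubElement(head, "paragraph", {"n": str(p_num)})
        for s_num, sentence in enumerate(sentences, start=1):
            ET.SubElement(paragraph, "s", {"n": str(s_num)}).text = sentence
    return head


# Placeholder text standing in for corpus data.
head = build_head(
    "SUB",
    "Preamble",
    [["First sentence.", "Second sentence."], ["Third sentence."]],
)
print(ET.tostring(head, encoding="unicode"))
```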

Since MNC is going to be annotated at the sentence level, each sentence will be annotated and numbered.

<mnc>
  +<teiHeader></teiHeader>
  -<myaDoc>
    +<sourceDesc></sourceDesc>
    -<wtext>
      -<head>
        <s></s>
        -<paragraph>
          -<s></s>
        </paragraph>
      </head>
      +<head></head>
    </wtext>
  </myaDoc>
</mnc>

Figure 9. Down to the sentence level Child tags of MNC

3.3 Sample MNC data file

The Myanmar National Corpus is a major resource for linguistic research, as well as for computational linguistics research, lexicography and corpus linguistic research, and a resource for the development of Myanmar language teaching material, because we expect the corpus to be continually expanded in the future.

The Universal Declaration of Human Rights (UDHR) texts in Burmese and Karen, which is one of the major languages in Myanmar, have been used for sample tagging with the selected XML tag set.

The following figure shows the sample MNC data file.

<?xml version="1.0"?>

<mnc>

-<teiHeader>

-<langUsage>

<langName> Myanmar </langName>

<version>Unicode 5.0</version> </langUsage>

-<fileDesc>

-<titleStmt>

<title>Myanmar National Corpus</title>

-<respStmt>

<resp>Corpus built by</resp> <name>Myanmar NLP Team</name> </respStmt>

</titleStmt>

-<editionStmt>

<edition> First TEI-conformant version </edition> <extent/>

</editionStmt>

-<publicationStmt> <address>Myanmar Info-Tech, Yangon, Myanmar</address> <availability status="restricted">

Availability limited to Myanmar NLP Team

</availability>

-<creation>

</creation>

<distributor>Myanmar NLP Team </distributor>


</publicationStmt>

</fileDesc>

</teiHeader>

-<myaDoc xml:id="TEXTS">

-<sourceDesc>

-<bibl>

<title>

?????????????????????????????????????

(meaning: Universal Declaration of Human Rights)

</title>

-<imprint vol="64" n="46">

</imprint>

</bibl>

</sourceDesc>

-<wtext type="OTHERPUB">

-<head type="MAIN">

?????????????????????????????????????

(meaning: Universal Declaration of Human Rights)

</s>

+<paragraph n="1"></paragraph>

-<head type="SUB">

<s n="1"> ???????? (meaning: Preamble) </s>

-<paragraph n="1">

-<s n="1">

?????????????????????????????????????????????????????????????????????

???????????????????????????????????????????????????????????????????????????? ???????????????????????????? …….

(meaning: Whereas recognition of the inherent dignity and of the equal and

inalienable rights of all members of the human family is the foundation of

freedom, justice and peace in the world,…….)

</s>

</paragraph>

+<paragraph n="2"></paragraph>

</head>

-<head type="SUB">

<s n="2"> ???? ? (meaning: paragraph 1) </s>

-<paragraph n="1">

??????????????????????????????????????????????????????????????? ?????????????????????????????????????????????

(meaning: All human beings are born free and equal in dignity and rights.)


</s>

????????????????????????????????????????????????????????????????

???????????????????????????????????

(meaning: They are endowed with reason and conscience and should act towards

one another in a spirit of brotherhood.)

</s>

</paragraph>

-<head type="SUB">

<s n="3"> ???? ? (meaning: paragraph 2) </s>

+<paragraph n="1"></paragraph>

+<paragraph n="2"></paragraph>

</head>

-<head type="SUB">

<s n="4"> ???? ? (meaning: paragraph 3) </s>

-<paragraph n="1">

<s n="1"> ?????????????????????????????????????????? (meaning: Everyone has the right to life, liberty and security of person.) </s>

</paragraph>

</head>

+<head type="SUB"></head>

+<head type="SUB"></head>

</head>

</wtext>

</myaDoc>

</mnc>

Figure 10. Sample MNC Corpus file (Burmese UDHR text in MNC XML format)
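To show how a file in this format might be consumed, here is a minimal Python sketch that walks a small MNC-style document and lists its numbered sentences. The embedded `SAMPLE` is a simplified, well-formed stand-in with placeholder English text, not the corpus file in Figure 10, and it assumes the "-" and "+" markers shown in the figures are viewer artifacts that are absent from the actual XML. The function name `iter_sentences` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for an MNC data file (placeholder text).
SAMPLE = """<mnc>
  <teiHeader/>
  <myaDoc>
    <sourceDesc/>
    <wtext type="OTHERPUB">
      <head type="SUB">
        <s n="1">Preamble</s>
        <paragraph n="1">
          <s n="1">First sentence.</s>
          <s n="2">Second sentence.</s>
        </paragraph>
      </head>
    </wtext>
  </myaDoc>
</mnc>"""


def iter_sentences(root):
    """Yield (paragraph number, sentence number, text) for sentences in paragraphs."""
    for paragraph in root.iter("paragraph"):
        for s in paragraph.findall("s"):
            yield paragraph.get("n"), s.get("n"), (s.text or "").strip()


root = ET.fromstring(SAMPLE)
for row in iter_sentences(root):
    print(row)
```

Heading sentences sit directly under <head> rather than under <paragraph>, so this sketch skips them; a fuller reader would handle both cases.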

4 Conclusion and Future work

In this paper, the authors have described the selection of the XML tag set for building MNC. Since word level segmentation for the Burmese script is not yet available, the corpus data will be annotated only up to the sentence level, so that all Myanmar languages and scripts share the same format.

In order to check whether the selected XML tag set is sufficient and useful for tagging the corpus data, sample corpus data were collected and manually tagged. The sample includes newspapers and periodicals, the Universal Declaration of Human Rights (UDHR), novels and essays.

Since the manual tagging of the sample corpus data shows that the selected XML tag set is enough to cover a variety of data sources, the next step is to develop an algorithm for automatically tagging the data.

Acknowledgement

This study was performed with the support of the Government of the Union of Myanmar through the Myanmar Natural Language Implementation Committee. The authors thank the members of the Myanmar Language Commission for providing the information necessary to write this paper.

References

Ethnologue. 2005. Languages of the World, 15th Edition. Dallas, Tex.: SIL International. Online version: /. Edited by Raymond G. Gordon, Jr.

Hla Hla Htay, G. Bharadwaja Kumar and Kavi N. Murthy. 2006. Constructing English-Myanmar Parallel Corpora. The Fourth International Conference on Computer Application 2006 (ICCA 2006) Conference Program.

Jin-Dong KIM, Tomoko OHTA, Yuka TATEISI, Hideki MIMA and Jun’ichi TSUJII. 2001. XML-based Linguistic Annotation of Corpus. In the Proceedings of the first NLP and XML Workshop held at NLPRS 2001. pp. 47-53.

Lou Burnard. 1996. Using SGML for Linguistic Analysis: the case of the BNC. ACM Vol 1 Issue 2 (Spring 1999) MIT Press ISSN: 1099-6621. pp. 31-51.

Michael J. Young. 2001. Step by Step XML. Prentice Hall of India Private Limited Press. ISBN-81-203-1804-B

Ministry of Immigration and Population. 1995. Myanmar Population Changes and Fertility Survey 1991. Immigration and Population Department.

Wunna Ko Ko, Yoshiki Mikami. 2005. Languages of Myanmar in Cyberspace. In Proceedings of TALN & RECITAL 2005 (NLP for Under-Resourced Languages Workshop), Dourdan, France, June 2005, pp. 269-278.
