Using fastText to implement language identification
Introduction
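fastText is an efficient, lightweight library open-sourced by Facebook AI Research (FAIR) in 2016, built for text classification and word representation learning (word vector generation). Besides pre-trained word vectors for 157 languages, the project officially publishes a pre-trained language identification model, lid.176, which recognizes 176 languages, so we can use it directly for language identification.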
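As a quick orientation, here is a minimal sketch of how a prediction from the language identification model can be turned into plain language codes. It assumes `model` is an object returned by `fasttext.load_model("lid.176.bin")`; the helper name `identify` and the default `k=3` are illustrative choices, not part of fastText.

```python
from typing import List, Tuple

LABEL_PREFIX = "__label__"  # prefix fastText puts in front of every class label


def identify(model, text: str, k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k (language_code, probability) pairs for one text.

    `model` is assumed to expose predict(text, k=...) like a loaded
    fastText model; `identify` itself is a hypothetical helper.
    """
    # fastText predicts one line at a time, so collapse line breaks first.
    single_line = " ".join(text.split())
    labels, probs = model.predict(single_line, k=k)
    results = []
    for label, prob in zip(labels, probs):
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        results.append((code, float(prob)))
    return results


# Usage, once the model file is available locally:
#   import fasttext
#   model = fasttext.load_model("lid.176.bin")
#   print(identify(model, "Je veux aller à la gare."))
```

Asking for more than one candidate can help with very short or mixed-language inputs, where the top prediction alone may be misleading.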
Implementation steps
Install the dependencies in order:
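
conda create -n fasttext python=3.10
conda activate fasttext
pip install numpy==1.26.4 fasttext fastapi uvicorn

Because fasttext depends on an older numpy version (the 1.x series, pinned here to 1.26.4), isolating the installation in a conda environment is recommended. uvicorn is included because it is used to run the service later.
Download the official model: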
wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin
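If `wget` is not available, the same download can be scripted in Python. The following is a minimal sketch using only the standard library; the function name `ensure_model` and the `.part` temporary suffix are illustrative assumptions. It skips the download when a non-empty file already exists at the target path.

```python
import os
import urllib.request

MODEL_URL = "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin"


def ensure_model(path: str = "lid.176.bin", url: str = MODEL_URL) -> str:
    """Download the model to `path` unless a non-empty file is already there."""
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        return path
    # Download to a temporary name first so that an interrupted transfer
    # never leaves a truncated file at the final path.
    tmp_path = path + ".part"
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)
    return path


# Usage: call ensure_model() once before fasttext.load_model("lid.176.bin").
```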
Write the web service:

import fasttext
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

MODEL_PATH = "lid.176.bin"

app = FastAPI(
    title="Language Detection Service",
    description="Language detection based on fastText lid.176",
    version="1.0.0",
)

# -------- Load the model globally (only once) --------
try:
    model = fasttext.load_model(MODEL_PATH)
except Exception as e:
    raise RuntimeError(f"Failed to load fastText model: {e}") from e


# -------- Request / response models --------
class DetectRequest(BaseModel):
    text: str


class DetectResponse(BaseModel):
    language: str
    confidence: float


# -------- Core logic --------
def detect_language(text: str) -> tuple[str, float]:
    # fastText predicts one line at a time and rejects "\n",
    # so collapse all whitespace runs into single spaces first.
    labels, probs = model.predict(" ".join(text.split()), k=1)
    lang = labels[0].replace("__label__", "")
    confidence = float(probs[0])
    return lang, confidence


# -------- API endpoint --------
@app.post("/detect", response_model=DetectResponse)
def detect(req: DetectRequest):
    if not req.text or not req.text.strip():
        raise HTTPException(status_code=400, detail="text must not be empty")
    lang, conf = detect_language(req.text)
    return DetectResponse(
        language=lang,
        confidence=conf,
    )


# -------- Health check --------
# "/health" is an assumed path; adjust it to your own convention.
@app.get("/health")
def health():
    return {"status": "ok"}

Run the web service:
uvicorn app:app --host 0.0.0.0 --port 8000
Note: the service code must be saved in a file named app.py.
Test the web service:

curl -X POST "http://localhost:8000/detect" \
  -H "Content-Type: application/json" \
  -d '{"text": "我想去广州南站。"}'

Output:
{"language":"zh","confidence":0.9967644214630127}
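The same request can be sent from Python. Below is a minimal client sketch using only the standard library; it assumes the service is listening on `http://localhost:8000` as in the curl test, and the helper names `build_request` and `detect_remote` are illustrative.

```python
import json
import urllib.request


def build_request(text: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for the /detect endpoint."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/detect",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect_remote(text: str, base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    """Send the text to the running service and return the parsed JSON reply."""
    with urllib.request.urlopen(build_request(text, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the service running:
#   result = detect_remote("我想去广州南站。")
#   print(result["language"], result["confidence"])
```

Keeping request construction separate from the network call makes the client easy to check without a running server.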
Recognized languages
The model currently identifies the following languages with high confidence:

Chinese -> zh, confidence=0.997, text=我想去广州南站。
English -> en, confidence=0.962, text=I want to go to Guangzhou South Railway Station.
German -> de, confidence=1.000, text=Ich möchte zum Südbahnhof Guangzhou fahren.
Italian -> it, confidence=0.996, text=Voglio andare alla stazione ferroviaria di Guangzhou Sud.
Portuguese -> pt, confidence=0.966, text=Quero ir para a Estação Ferroviária Sul de Guangzhou.
Spanish -> es, confidence=0.981, text=Quiero ir a la estación de tren sur de Guangzhou.
Japanese -> ja, confidence=1.000, text=広州南駅に行きたいです。
Korean -> ko, confidence=1.000, text=저는 광저우 남역에 가고 싶습니다.
French -> fr, confidence=0.997, text=Je veux aller à la gare de Guangzhou Sud.
Russian -> ru, confidence=0.998, text=Я хочу поехать на Южный железнодорожный вокзал Гуанчжоу.
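The script that produced this listing is not shown. One way to generate output in the same format is sketched below; it assumes `model` is a loaded fastText model, and the names `format_row` and `evaluate` are hypothetical.

```python
from typing import Iterable, List, Tuple


def format_row(name: str, lang: str, confidence: float, text: str) -> str:
    """Format one result line in the same layout as the listing above."""
    return f"{name} -> {lang}, confidence={confidence:.3f}, text={text}"


def evaluate(model, samples: Iterable[Tuple[str, str]]) -> List[str]:
    """Run the model on (language_name, sentence) pairs and format each result.

    `model` is assumed to expose predict(text, k=1) like a fastText model.
    """
    rows = []
    for name, text in samples:
        labels, probs = model.predict(text, k=1)
        lang = labels[0].replace("__label__", "")
        rows.append(format_row(name, lang, float(probs[0]), text))
    return rows


# Usage, once the model file is available locally:
#   import fasttext
#   model = fasttext.load_model("lid.176.bin")
#   samples = [("Chinese", "我想去广州南站。"), ("French", "Je veux aller à la gare de Guangzhou Sud.")]
#   print("\n".join(evaluate(model, samples)))
```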
Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit the source, 后端学习手记, when reposting.