word2vec with the gensim package, and retraining pretrained models

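This is a placeholder post for now. Training word2vec with the Python gensim module is relatively easy. Building on that, picking a suitable pretrained word2vec model and retraining it on your own data fits real-world applications better.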

I'll write it up over the May Day holiday.

And doc2vec as well.

For now, a cat picture holds the spot~

First, the problems I have run into so far:

1. Deploying the model to the server raised the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/python/word2vec/eval_model.py", line 150, in
    model = Word2Vec.load(model_path)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 1141, in load
    model = super(Word2Vec, cls).load(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 1230, in load
    model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 602, in load
    return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gensim/utils.py", line 435, in load
    obj = unpickle(fname)
  File "/usr/local/lib/python3.6/dist-packages/gensim/utils.py", line 1398, in unpickle
    return _pickle.load(f, encoding='latin1')
ModuleNotFoundError: No module named 'numpy.random._pickle'

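Cause: the numpy version on the server differs from the one used when the model was saved. The server's numpy was older and lacks the numpy.random._pickle module that the pickled model refers to. See https://github.com/RaRe-Technologies/gensim/issues/2602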

Fix: on the machine where the model was saved, check which numpy version was used:

import numpy
numpy.__version__

Then upgrade numpy on the server to that version (1.17.0 in my case): pip3 install --upgrade numpy==1.17.0
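To catch this mismatch before `Word2Vec.load` fails, one option is to record the numpy version next to the model when saving it and compare it on load. The sketch below assumes a hypothetical `<model path>.versions.json` sidecar file and hypothetical helper names `save_env_versions` / `check_env_versions`; this is my own convention, not something gensim provides. The gensim version could be recorded the same way.

```python
import json
import os
import tempfile

import numpy


def save_env_versions(model_path):
    """Write the current numpy version to a sidecar JSON next to the model (hypothetical convention)."""
    with open(model_path + ".versions.json", "w", encoding="utf-8") as f:
        json.dump({"numpy": numpy.__version__}, f)


def check_env_versions(model_path):
    """Return human-readable mismatches between the saved and the running versions."""
    with open(model_path + ".versions.json", encoding="utf-8") as f:
        saved = json.load(f)
    current = {"numpy": numpy.__version__}
    return [
        f"{name}: saved with {saved[name]}, running {current.get(name)}"
        for name in saved
        if saved[name] != current.get(name)
    ]


# Placeholder path; in practice this is the path passed to model.save / Word2Vec.load.
model_path = os.path.join(tempfile.mkdtemp(), "w2v.model")
save_env_versions(model_path)  # call right after model.save(model_path)
problems = check_env_versions(model_path)  # call right before Word2Vec.load(model_path)
print(problems or "numpy versions match")
```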

2. Missing bz2:

  File "/usr/local/python3/lib/python3.7/bz2.py", line 19, in
    from _bz2 import BZ2Compressor, BZ2Decompressor
ModuleNotFoundError: No module named '_bz2'

Fix (see https://stackoverflow.com/questions/12806122/missing-python-bz2-module):

sudo yum install bzip2-devel
sudo ln -s `find /usr/lib64/ -type f -name "libbz2.so.1*"` /usr/lib64/libbz2.so.1.0
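A quick way to confirm the fix is to run the check below with the same python3 that raised the error (the one under /usr/local/python3 in the traceback). The helper name `bz2_available` is hypothetical. If it still reports False and that Python was compiled from source, note that the `_bz2` C extension is typically only built when the bzip2 headers are present at build time, so rebuilding Python after installing bzip2-devel may be needed.

```python
import importlib


def bz2_available():
    """Return True if the bz2 module (and its C extension _bz2) imports cleanly."""
    try:
        importlib.import_module("bz2")
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False
    return True


print("bz2 available:", bz2_available())
```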

My blog will soon be synced to the Tencent Cloud+ Community, and you are welcome to join as well: https://cloud.tencent.com/developer/support-plan?invite_code=3gponstn3dk4w
