Deep Learning Chinese Word Segment

Overview

References

  The BiLSTM+CRF model in this project follows the paper http://www.aclweb.org/anthology/N16-1030 , and the IDCNN+CRF model follows the paper https://arxiv.org/abs/1702.02098

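Build

  1. Install the bazel build tool and tensorflow (this project currently requires tf 1.0.0alpha or later)

  2. Change to the project source directory and run ./configure

  3. Build the backend service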

    bazel build //kcws/cc:seg_backend_api

Training

  1. Follow the WeChat official account 待字闺中 and reply "kcws" to get the download address of the corpus

  2. Extract the corpus into a directory

  3. Change to the source directory and run:

    python kcws/train/process_anno_file.py <corpus directory> pre_chars_for_w2v.txt

    bazel build third_party/word2vec:word2vec

    First, build a preliminary vocabulary:

    ./bazel-bin/third_party/word2vec/word2vec -train pre_chars_for_w2v.txt -save-vocab pre_vocab.txt -min-count 3

    Handle low-frequency words:

    python kcws/train/replace_unk.py pre_vocab.txt pre_chars_for_w2v.txt chars_for_w2v.txt

    Train word2vec:

    ./bazel-bin/third_party/word2vec/word2vec -train chars_for_w2v.txt -output vec.txt -size 50 -sample 1e-4 -negative 5 -hs 1 -binary 0 -iter 5

    Build the tool that generates the training corpus:

    bazel build kcws/train:generate_training

    Generate the corpus:

    ./bazel-bin/kcws/train/generate_training vec.txt <corpus directory> all.txt

    Produce the train.txt and test.txt files:

    python kcws/train/filter_sentence.py all.txt

  4. With tensorflow installed, change to the kcws source directory and run:

    python kcws/train/train_cws.py --word2vec_path vec.txt --train_data_path <absolute path to train.txt> --test_data_path test.txt --max_sentence_len 80 --learning_rate 0.001

    (The IDCNN model is used by default; pass "--use_idcnn False" to switch to the BiLSTM model.)

  5. Generate the vocab

    bazel build kcws/cc:dump_vocab

    ./bazel-bin/kcws/cc/dump_vocab vec.txt kcws/models/basic_vocab.txt

  6. Export the trained model

    python tools/freeze_graph.py --input_graph logs/graph.pbtxt --input_checkpoint logs/model.ckpt --output_node_names "transitions,Reshape_7" --output_graph kcws/models/seg_model.pbtxt

  7. Download the POS tagging model (a temporary workaround; a later document will cover training and exporting the POS tagging model)

    Download pos_model.pbtxt from https://pan.baidu.com/s/1bYmABk into the kcws/models/ directory

  8. Run the web service

    ./bazel-bin/kcws/cc/seg_backend_api --model_path=kcws/models/seg_model.pbtxt --vocab_path=kcws/models/basic_vocab.txt --max_sentence_len=80

    (For --model_path, use the absolute path to seg_model.pbtxt.)
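
The word2vec step above writes vec.txt in text format (-binary 0) with -size 50. In that format the first line holds the vocabulary size and the vector dimension, and an AssertionError reported in the comments shows a training script asserting that this dimension equals its embedding_size flag. A quick pre-flight check could be sketched in Python as below. The helper names read_w2v_header and check_embedding_dim, and the in-memory sample, are illustrative assumptions and not part of kcws.

```python
import io


def read_w2v_header(stream):
    """Return (vocab_size, dim) from the first line of a word2vec text file."""
    header = stream.readline().split()
    if len(header) != 2:
        raise ValueError("expected '<vocab_size> <dim>' on the first line")
    return int(header[0]), int(header[1])


def check_embedding_dim(stream, expected_dim):
    """Raise ValueError if the file's vector dimension differs from expected_dim."""
    vocab_size, dim = read_w2v_header(stream)
    if dim != expected_dim:
        raise ValueError("vec file has dim %d, expected %d" % (dim, expected_dim))
    return vocab_size


# Hypothetical in-memory stand-in for the first line of vec.txt.
sample = io.StringIO(u"3 50\n")
print(check_embedding_dim(sample, 50))
```

To check a real file, open it with io.open("vec.txt", encoding="utf-8") and pass the handle in place of the sample stream.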

POS tagging training instructions:

https://github.com/koth/kcws/blob/master/pos_train.md

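Custom dictionary

Custom dictionaries are currently supported at the decoding stage; see kcws/cc/test_seg.cc for how to use them. The dictionary is a text file in which each line has the following format:

<custom entry>\t<weight>

For example:

蓝瘦香菇 4

The weight is a positive integer, typically 4 or higher; the larger the weight, the more important the entry.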
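
A dictionary file in this format can be validated before it is handed to the decoder. The Python sketch below parses tab-separated lines and checks that each weight is a positive integer. The function name parse_dict_lines is hypothetical, and this is not how kcws itself loads the file (for that, see kcws/cc/test_seg.cc).

```python
# -*- coding: utf-8 -*-


def parse_dict_lines(lines):
    """Parse '<entry>\\t<weight>' lines into a list of (entry, weight) tuples.

    Blank lines are skipped; malformed lines raise ValueError.
    """
    entries = []
    for lineno, raw in enumerate(lines, 1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) != 2 or not parts[0]:
            raise ValueError("line %d: expected '<entry>\\t<weight>'" % lineno)
        weight = int(parts[1])  # raises ValueError if the weight is not an integer
        if weight <= 0:
            raise ValueError("line %d: weight must be a positive integer" % lineno)
        entries.append((parts[0], weight))
    return entries


# Example entry from the text above, written with an explicit tab separator.
print(parse_dict_lines([u"蓝瘦香菇\t4\n"]))
```

Rejecting malformed lines up front keeps formatting mistakes, such as a space used instead of a tab, from silently dropping entries.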

demo

http://45.32.100.248:9090/

Appendix: a company-name recognition demo trained with the same model:

http://45.32.100.248:18080

Comments
  • bazel build //kcws/cc:seg_backend_api reports an error

    ERROR: /root/kcws/third_party/gflags/BUILD:12:1: Executing genrule //third_party/gflags:gflags-srcs failed: bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 77.

    opened by maczhao 15
  • ERROR: Analysis of target '//kcws/cc:seg_backend_api' failed; build aborted.

    Hi, I ran into some issues when building kcws. How can I fix them?

    The issues are as follows:

    [[email protected] cc]# /opt/BioDir/dl/bazel-0.4.3/output/bazel build //kcws/cc:seg_backend_api
    WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.build/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
    WARNING: /root/.cache/bazel/_bazel_root/067d099fd5fd2abf4236febace697e72/external/org_tensorflow/tensorflow/workspace.bzl:13:5: path_prefix was specified to tf_workspace but is no longer used and will be removed in the future.
    WARNING: /root/.cache/bazel/_bazel_root/067d099fd5fd2abf4236febace697e72/external/org_tensorflow/tensorflow/workspace.bzl:15:5: tf_repo_name was specified to tf_workspace but is no longer used and will be removed in the future.
    ERROR: /root/.cache/bazel/_bazel_root/067d099fd5fd2abf4236febace697e72/external/org_tensorflow/tensorflow/core/platform/default/build_config/BUILD:108:1: error loading package '@jpeg//': Extension file not found. Unable to load package for '//third_party:common.bzl': BUILD file not found on package path and referenced by '@org_tensorflow//tensorflow/core/platform/default/build_config:jpeg'.
    ERROR: Analysis of target '//kcws/cc:seg_backend_api' failed; build aborted.
    INFO: Elapsed time: 2.612s

    I built the bazel tool as follows:

    [[email protected] bazel-0.4.3]# bash ./compile.sh INFO: You can skip this first step by providing a path to the bazel binary as second argument: INFO: ./compile.sh compile /path/to/bazel  Building Bazel from scratch.......  Building Bazel with Bazel. .WARNING: /tmp/bazel_lAI1U4my/out/external/bazel_tools/WORKSPACE:1: Workspace name in /tmp/bazel_lAI1U4my/out/external/bazel_tools/WORKSPACE (@io_bazel) does not match the name given in the repository's definition (@bazel_tools); this will cause a build error in future versions. INFO: Found 1 target... INFO: From Compiling third_party/ijar/platform_utils.cc [for host]: third_party/ijar/platform_utils.cc: In function 'bool devtools_ijar::write_file(const char*, mode_t, const void*, size_t)': third_party/ijar/platform_utils.cc:67:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (write(fd, data, size) != size) { ^ INFO: From Compiling third_party/ijar/platform_utils.cc: third_party/ijar/platform_utils.cc: In function 'bool devtools_ijar::write_file(const char*, mode_t, const void*, size_t)': third_party/ijar/platform_utils.cc:67:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (write(fd, data, size) != size) { ^ INFO: From Compiling third_party/ijar/ijar.cc: third_party/ijar/ijar.cc: In member function 'virtual bool devtools_ijar::JarStripperProcessor::Accept(const char*, devtools_ijar::u4)': third_party/ijar/ijar.cc:66:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (filename_len >= CLASS_EXTENSION_LENGTH) { ^ INFO: From Compiling third_party/ijar/ijar.cc [for host]: third_party/ijar/ijar.cc: In member function 'virtual bool devtools_ijar::JarStripperProcessor::Accept(const char*, devtools_ijar::u4)': third_party/ijar/ijar.cc:66:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if (filename_len >= CLASS_EXTENSION_LENGTH) { ^ INFO: From Compiling 
src/main/cpp/blaze_util_posix.cc: src/main/cpp/blaze_util_posix.cc: In function 'void blaze::Daemonize(const string&)': src/main/cpp/blaze_util_posix.cc:190:28: warning: ignoring return value of 'int dup(int)', declared with attribute warn_unused_result [-Wunused-result] (void) dup(STDOUT_FILENO); // stderr (2>&1) ^ src/main/cpp/blaze_util_posix.cc: In function 'uint64_t blaze::AcquireLock(const string&, bool, bool, blaze::BlazeLock*)': src/main/cpp/blaze_util_posix.cc:578:30: warning: ignoring return value of 'int ftruncate(int, __off_t)', declared with attribute warn_unused_result [-Wunused-result] (void) ftruncate(lockfd, 0); ^ src/main/cpp/blaze_util_posix.cc:583:47: warning: ignoring return value of 'ssize_t write(int, const void*, size_t)', declared with attribute warn_unused_result [-Wunused-result] (void) write(lockfd, msg.data(), msg.size()); ^ INFO: From JavacBootstrap src/java_tools/buildjar/java/com/google/devtools/build/buildjar/libbootstrap_JarOwner.jar [for host]: warning: Implicitly compiled files were not subject to annotation processing. Use -proc:none to disable annotation processing or -implicit to specify a policy for implicit compilation. 1 warning INFO: From Building src/main/protobuf/libextra_actions_base_java_proto.jar (1 source jar): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/java_tools/junitrunner/java/com/google/testing/coverage/JacocoCoverage.jar (9 source files): Note: src/java_tools/junitrunner/java/com/google/testing/coverage/MethodProbesMapper.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/tools/android/java/com/google/devtools/build/android/ziputils/libziputils_lib.jar (12 source files): Note: Some input files use unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. 
INFO: From Building src/main/java/com/google/devtools/build/lib/libconcurrent.jar (18 source files): Note: src/main/java/com/google/devtools/build/lib/concurrent/AbstractQueueVisitor.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building third_party/java/apkbuilder/apkbuilder.jar (15 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libutil.jar (45 source files): Note: src/main/java/com/google/devtools/build/lib/util/OrderedSetMultimap.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/cmdline/libcmdline.jar (10 source files): Note: src/main/java/com/google/devtools/build/lib/cmdline/RepositoryName.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/skyframe/libskyframe.jar (67 source files): Note: src/main/java/com/google/devtools/build/skyframe/ReverseDepsUtilImpl.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libsyntax.jar (86 source files): Note: src/main/java/com/google/devtools/build/lib/syntax/BuiltinFunction.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. Note: Some input files use unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libpackages-internal.jar (98 source files): Note: Some input files use unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. 
INFO: From Building src/main/java/com/google/devtools/build/lib/actions/libactions.jar (91 source files): Note: src/main/java/com/google/devtools/build/lib/actions/Actions.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libbuild-base.jar (381 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. Note: Some input files use unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libproto-rules.jar (13 source files): Note: src/main/java/com/google/devtools/build/lib/rules/proto/ProtoCommon.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/query2/libquery2.jar (12 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/query2/libquery-output.jar (10 source files): Note: src/main/java/com/google/devtools/build/lib/query2/output/QueryOutputUtils.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/rules/genquery/libgenquery.jar (2 source files): Note: src/main/java/com/google/devtools/build/lib/rules/genquery/GenQuery.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/rules/cpp/libcpp.jar (80 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. 
INFO: From Building src/main/java/com/google/devtools/build/lib/libpython-rules.jar (15 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libjava-compilation.jar (37 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. Note: src/main/java/com/google/devtools/build/lib/rules/java/JavaCompileAction.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libjava-rules.jar (32 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libandroid-rules.jar (59 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libideinfo.jar (4 source files): Note: src/main/java/com/google/devtools/build/lib/ideinfo/AndroidStudioInfoAspect.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/rules/objc/libobjc.jar (114 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. Note: src/main/java/com/google/devtools/build/lib/rules/objc/IterableWrapper.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libruntime.jar (94 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. 
INFO: From Building src/main/java/com/google/devtools/build/lib/sandbox/libsandbox.jar (16 source files): Note: Some input files use or override a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/worker/libworker.jar (11 source files): Note: src/main/java/com/google/devtools/build/lib/worker/WorkerSpawnStrategy.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. INFO: From Building src/main/java/com/google/devtools/build/lib/libbazel-rules.jar (87 source files, 14 resources): Note: src/main/java/com/google/devtools/build/lib/bazel/rules/java/BazelJavaSemantics.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. Target //src:bazel up-to-date: bazel-bin/src/bazel INFO: Elapsed time: 178.725s, Critical Path: 170.17s WARNING: /tmp/bazel_lAI1U4my/out/external/bazel_tools/WORKSPACE:1: Workspace name in /tmp/bazel_lAI1U4my/out/external/bazel_tools/WORKSPACE (@io_bazel) does not match the name given in the repository's definition (@bazel_tools); this will cause a build error in future versions.

    Build successful! Binary is here: /opt/BioDir/dl/bazel-0.4.3/output/bazel

    opened by Sun-shan 12
  • error when run bazel build //kcws/cc:seg_backend_api

    ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Encountered error while reading extension file 'tensorflow/workspace.bzl': no such package '@org_tensorflow//tensorflow': local_repository rule //external:org_tensorflow must specify an existing directory. INFO: Elapsed time: 0.049s

    build on: centos6.8 x64 no gpu support Build label: 0.4.1- (@non-git) tensorflow-0.11.0

    opened by busyfree 11
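  • A question about the tagging part

    Hello, yesterday I carefully studied the POS tagging module you recently added and found that a few steps seem to have problems. I tried changing them myself, and it now runs end to end with 99.57% accuracy. Could you take a look? The problems are:
    1. In step 5, the parameter "lines_withpos.txt" is passed in, but the code never writes anything to it. I think the code should write out each tag together with its index.
    2. In step 6, the third argument should be the dictionary generated in the previous step, "lines_withpos.txt", rather than "pos_vocab.txt".

    Does this look correct to you?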

    opened by oneapmlj 7
  • gflags link failed

    Linking against the third-party gflags failed.

    Fixed by using a self-compiled gflags; this may be a gflags version issue. The following modification was made to the BUILD files:

    
    --- a/third_party/glog/BUILD
    +++ b/third_party/glog/BUILD
    @@ -45,10 +45,7 @@ cc_library(
             "include/glog/stl_logging.h",
             "include/glog/vlog_is_on.h",
         ],
    -    deps = [
    -      "//third_party/gflags:gflags-cxx",
    -
    -    ],
    +    linkopts = ["-lgflags"],
         hdrs = [
             "include/glog/logging.h",
         ],
    
    opened by Vimos 7
  • Runtime error after changing the maximum value of max_word_num

    koth, a question: in seg_backend_api.cc I modified DEFINE_int32(max_word_num, 300, "max num of word per sentence "); and set the value to 300, because the sentences I test with contain many characters. At runtime I get the following error: E0918 11:23:35.434610 26934 tfmodel.cc:88] Error during inference: Invalid argument: Input to reshape is a tensor with 640 values, but the requested shape requires a multiple of 1200 [[Node: Reshape_7 = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _output_shapes=[[?,300,4]], _device="/job:localhost/replica:0/task:0/cpu: 0"](idcnn_1/scores, Reshape_7/shape)]] 2017-09-18 11:23:35.434675: E kcws/cc/tf_seg_model.cc:321] Error during inference:

    Does this mean I need to retrain the word_vocab.txt file under models, or is it something else? If word_vocab.txt is the problem, how is this text file trained? Thanks for any help.

    opened by younger911 6
  • F tensorflow/core/platform/cpu_feature_guard.cc:35] The TensorFlow library was compiled to use AVX2 instructions, but these aren't available on your machine.

    [email protected]:/mnt/kcws# export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.1-cp27-none-linux_x86_64.whl
    [email protected]:/mnt/kcws# pip install --upgrade $TF_BINARY_URL

    TensorFlow installed this way runs fine, but this project throws the error in the title on startup.

    opened by weisong82 6
  • About the default segmentation quality

    After following the instructions, I get the segmentation result below. It does not look very accurate. Is this normal? { "msg": "OK", "segments": [ "赵雅", "淇", "洒泪", "道", "歉", " ", "和林", "丹", "没", "有", "任", "何", "经济", "关", "系" ], "status": 0 }

    duplicate 
    opened by dengzz 5
  • embedding_size AssertionError

    At the final training step, i.e. when running: python kcws/train/train_cws_lstm.py --word2vec_path vec.txt --train_data_path <absolute path to train.txt> --test_data_path test.txt --max_sentence_len 80 --learning_rate 0.001

    the following error is raised: Traceback (most recent call last): File "kcws/train/train_cws_lstm.py", line 262, in tf.app.run() File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run sys.exit(main(sys.argv[:1] + flags_passthrough)) File "kcws/train/train_cws_lstm.py", line 228, in main FLAGS.word2vec_path, FLAGS.num_hidden) File "kcws/train/train_cws_lstm.py", line 62, in init self.c2v = self.load_w2v(c2vPath) File "kcws/train/train_cws_lstm.py", line 132, in load_w2v assert (dim == (FLAGS.embedding_size)) AssertionError

    Modifying the embedding_size flag definition in train_cws_lstm.py fixed it: tf.app.flags.DEFINE_integer("embedding_size", 50, "embedding size") tf.app.flags.DEFINE_integer("embedding_size", 200, "embedding size")

    opened by rockyzhengwu 5
  • MemoryError at the last step of the POS tagging model

    $ python tools/freeze_graph.py --input_graph pos_logs/graph.pbtxt --input_checkpoint pos_logs/model.ckpt --output_node_names "transitions,Reshape_9" --output_graph kcws/models/pos_model.pbtxt Traceback (most recent call last): File "tools/freeze_graph.py", line 202, in app.run(main=main, argv=[sys.argv[0]] + unparsed) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run _sys.exit(main(_sys.argv[:1] + flags_passthrough)) File "tools/freeze_graph.py", line 134, in main FLAGS.variable_names_blacklist) File "tools/freeze_graph.py", line 93, in freeze_graph text_format.Merge(f.read().decode("utf-8"), input_graph_def) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 525, in Merge descriptor_pool=descriptor_pool) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 579, in MergeLines return parser.MergeLines(lines, message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 612, in MergeLines self._ParseOrMerge(lines, message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 627, in _ParseOrMerge self._MergeField(tokenizer, message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 727, in _MergeField merger(tokenizer, message, field) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 815, in _MergeMessageField self._MergeField(tokenizer, sub_message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 727, in _MergeField merger(tokenizer, message, field) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 815, in _MergeMessageField self._MergeField(tokenizer, sub_message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 727, in _MergeField merger(tokenizer, message, field) File 
"/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 815, in _MergeMessageField self._MergeField(tokenizer, sub_message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 727, in _MergeField merger(tokenizer, message, field) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 815, in _MergeMessageField self._MergeField(tokenizer, sub_message) File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 714, in _MergeField tokenizer.Consume(':') File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1078, in Consume if not self.TryConsume(token): File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1065, in TryConsume self.NextToken() File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1314, in NextToken match = self._TOKEN.match(self._current_line, self._column) MemoryError

    opened by kinghuangdd 4
  • New error when building the backend service

    Hello, running the command bazel build //kcws/cc:seg_backend_api fails with the following errors:

    ERROR: /home/di/pycharmProjects/segment/kcws/third_party/gflags/BUILD:5:1: Reassignment of builtin build function 'package_name' not permitted. ERROR: /home/di/pycharmProjects/segment/kcws/third_party/glog/BUILD:5:1: Reassignment of builtin build function 'package_name' not permitted. ERROR: /home/di/pycharmProjects/segment/kcws/third_party/gflags/BUILD:41:1: Target '//third_party/gflags:empty.cc' contains an error and its package is in error and referenced by '//third_party/gflags:gflags-cxx'. ERROR: /home/di/pycharmProjects/segment/kcws/third_party/gflags/BUILD:41:1: Target '//third_party/gflags:include/gflags/gflags_declare.h' contains an error and its package is in error and referenced by '//third_party/gflags:gflags-cxx'. ERROR: /home/di/pycharmProjects/segment/kcws/third_party/gflags/BUILD:41:1: Target '//third_party/gflags:lib/libgflags.a' contains an error and its package is in error and referenced by '//third_party/gflags:gflags-cxx'. ERROR: /home/di/pycharmProjects/segment/kcws/third_party/gflags/BUILD:41:1: Target '//third_party/gflags:include/gflags/gflags.h' contains an error and its package is in error and referenced by '//third_party/gflags:gflags-cxx'. ERROR: /home/di/pycharmProjects/segment/kcws/base/BUILD:3:1: Target '//third_party/gflags:gflags-cxx' contains an error and its package is in error and referenced by '//base:base'. ERROR: /home/di/pycharmProjects/segment/kcws/base/BUILD:3:1: Target '//third_party/glog:glog-cxx' contains an error and its package is in error and referenced by '//base:base'. ERROR: Analysis of target '//kcws/cc:seg_backend_api' failed; build aborted. INFO: Elapsed time: 0.167s

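    Running bazel build third_party/word2vec:word2vec succeeds, but other commands such as bazel build kcws/train:generate_training and bazel build kcws/cc:dump_vocab fail with errors similar to the above. Adding licenses(["notice"]) to the BUILD files did not help either. What could be the cause? I would be very grateful if you could take a look when you have time.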

    opened by yufengzhixing 4
  • Error when building the backend service

    WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files: /home/cly/github/kcws/tools/bazel.rc INFO: Writing tracer profile to '/home/cly/.cache/bazel/_bazel_cly/271de499a4ab5fb7350261a41335ecd2/command.profile.gz' ERROR: /home/cly/github/kcws/WORKSPACE:5:1: name 'new_http_archive' is not defined ERROR: /home/cly/github/kcws/WORKSPACE:18:1: name 'new_http_archive' is not defined ERROR: /home/cly/github/kcws/WORKSPACE:34:1: name 'http_archive' is not defined ERROR: error loading package '': Encountered error while reading extension file 'tools/build_defs/repo/http.bzl': no such package '@bazel_tools//tools/build_defs/repo': error loading package 'external': Could not load //external package ERROR: error loading package '': Encountered error while reading extension file 'tools/build_defs/repo/http.bzl': no such package '@bazel_tools//tools/build_defs/repo': error loading package 'external': Could not load //external package INFO: Elapsed time: 0.032s INFO: 0 processes. FAILED: Build did NOT complete successfully (0 packages loaded)

    opened by lingyiliu016 2
  • error C++ compilation of rule '@protobuf//:protobuf' failed (Exit 2). cl: command line error D8021: invalid numeric argument "/Wwrite-strings"

    ERROR: C:/users/thomas/appdata/local/temp/_bazel_thomas/infhcau0/external/protob uf/BUILD:113:1: C++ compilation of rule '@protobuf//:protobuf' failed (Exit 2): cl.exe failed: error executing command cd C:/users/thomas/appdata/local/temp/_bazel_thomas/infhcau0/execroot/main

    SET INCLUDE=F:\Tools\Microsoft Visual Studio 14.0\VC\INCLUDE;F:\Tools\Microsof t Visual Studio 14.0\VC\ATLMFC\INCLUDE;C:\Program Files (x86)\Windows Kits\10\in clude\10.0.14393.0\ucrt;C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\inclu de\um;C:\Program Files (x86)\Windows Kits\10\include\10.0.14393.0\shared;C:\Prog ram Files (x86)\Windows Kits\10\include\10.0.14393.0\um;C:\Program Files (x86)\W indows Kits\10\include\10.0.14393.0\winrt; SET LIB=F:\Tools\Microsoft Visual Studio 14.0\VC\LIB\amd64;F:\Tools\Microsof t Visual Studio 14.0\VC\ATLMFC\LIB\amd64;C:\Program Files (x86)\Windows Kits\10
    lib\10.0.14393.0\ucrt\x64;C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib \um\x64;C:\Program Files (x86)\Windows Kits\10\lib\10.0.14393.0\um\x64; SET PATH=F:\Tools\Microsoft Visual Studio 14.0\Common7\IDE\CommonExtensions
    Microsoft\TestWindow;F:\Tools\Microsoft Visual Studio 14.0\VC\BIN\amd64;C:\WINDO WS\Microsoft.NET\Framework64\v4.0.30319;F:\Tools\Microsoft Visual Studio 14.0\VC \VCPackages;F:\Tools\Microsoft Visual Studio 14.0\Common7\IDE;F:\Tools\Microsoft Visual Studio 14.0\Common7\Tools;F:\Tools\Microsoft Visual Studio 14.0\Team Too ls\Performance Tools\x64;F:\Tools\Microsoft Visual Studio 14.0\Team Tools\Perfor mance Tools;C:\Program Files (x86)\Windows Kits\10\bin\x64;C:\Program Files (x86 )\Windows Kits\10\bin\x86;C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\b in\NETFX 4.6.1 Tools\x64;;C:\WINDOWS\system32 SET PWD=/proc/self/cwd SET TEMP=C:\Users\Thomas\AppData\Local\Temp SET TMP=C:\Users\Thomas\AppData\Local\Temp F:/Tools/Microsoft Visual Studio 14.0/VC/bin/amd64/cl.exe /c external/protobuf /src/google/protobuf/struct.pb.cc /Fobazel-out/msvc_x64-fastbuild/bin/external/p rotobuf/objs/protobuf/external/protobuf/src/google/protobuf/struct.pb.o /nologo /DCOMPILER_MSVC /DNOMINMAX /D_WIN32_WINNT=0x0600 /D_CRT_SECURE_NO_DEPRECATE /D CRT_SECURE_NO_WARNINGS /D_SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS /bigobj /Zm50 0 /J /Gy /GF /EHsc /wd4351 /wd4291 /wd4250 /wd4996 /Iexternal/protobuf /Ibazel-o ut/msvc_x64-fastbuild/genfiles/external/protobuf /Iexternal/bazel_tools /Ibazel- out/msvc_x64-fastbuild/genfiles/external/bazel_tools /Iexternal/protobuf/src /Ib azel-out/msvc_x64-fastbuild/genfiles/external/protobuf/src /Iexternal/bazel_tool s/tools/cpp/gcc3 /showIncludes /MT /Od /Z7 -DHAVE_PTHREAD -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare -Wno-unused-function. cl: 命令行 error D8021 :无效的数值参数“/Wwrite-strings” Target //kcws/cc:seg_backend_api failed to build ____Elapsed time: 2.704s, Critical Path: 0.13s

    opened by thomas1984 2
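  • About --output_node_names when exporting the model

    What do --output_node_names "transitions,Reshape_9" and "transitions,Reshape_7" mean?

    The output nodes specified at export time serve as the model's outputs during decoding, so shouldn't these two names be specified at training time? In bilstm.py I found the definition of the Reshape_7 output, but I could not find the definition of the Reshape_9 output for POS training, nor the definition of transitions. Are these two default TensorFlow output nodes, or something else? Could you explain? Thanks.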

    opened by forever1dream 3