Jamers Posted November 22, 2017

Python has a word segmentation tool called jieba that looks pretty good. Can we try using it from PHP?

First, the segmentation script in Python:

    # -*- coding: UTF-8 -*-
    import json
    import sys

    import jieba

    # Join all command-line arguments into one input string.
    text = " ".join(sys.argv[1:])
    if text == "":
        text = "他来到了网易杭研大厦"  # sample sentence used when no input is given

    out = {"TITLE": "jieba V" + jieba.__version__}
    # Full mode
    out["FULL"] = "||".join(jieba.cut(text, cut_all=True))
    # Precise mode (the default when cut_all is omitted)
    out["DEF"] = "||".join(jieba.cut(text, cut_all=False))
    # Search-engine mode
    out["SEARCH"] = "||".join(jieba.cut_for_search(text))
    print(json.dumps(out))

Then the PHP script:

    <?php
    // Adjust this path to wherever the script lives on your server.
    $script = '/usr/local/www/apache24/data/py/tst.py';
    $str = "我家住在黄土高坡";
    if (isset($_REQUEST['str'])) $str = $_REQUEST['str'];
    header('Content-Type: application/json; charset=UTF-8');
    // escapeshellarg() can drop multibyte characters without a UTF-8 locale;
    // this locale name is an assumption and must be installed on the server.
    setlocale(LC_CTYPE, 'en_US.UTF-8');
    // Escape the request value so it cannot inject shell commands.
    system('python ' . escapeshellarg($script) . ' ' . escapeshellarg($str));

Only one problem remains: it is a bit slow. For the same text, running the Python script directly takes only 0.2 seconds, but the PHP request, which launches a fresh Python process each time, takes 2-3 seconds. It looks like the next step is to have Python answer the requests directly.
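A likely contributor to the gap is that every PHP request starts a new Python process, which has to boot the interpreter and initialize jieba before it segments anything. Below is a minimal sketch for measuring that per-launch cost, using only the standard library. The helper name `time_command` is hypothetical, and the default command is a placeholder that starts a bare interpreter and does no work; it does not measure jieba itself.

```python
import subprocess
import sys
import time


def time_command(command, runs=3):
    """Return the average wall-clock seconds needed to run `command`."""
    start = time.perf_counter()
    for _ in range(runs):
        # Each iteration spawns a brand-new process, like PHP's system().
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Placeholder: an interpreter start-up with nothing to do.
    baseline = [sys.executable, "-c", "pass"]
    print("average seconds per launch: %.3f" % time_command(baseline))
```

To time the real script, pass something like `[sys.executable, "/path/to/tst.py", "some text"]` instead (that path is illustrative). Comparing the two numbers gives a rough idea of how much of each request goes to start-up rather than segmentation.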
Jamers (Author) Posted November 22, 2017

I switched to an HTTP service written in Python (Tornado), and it is noticeably faster: the same request now completes in about 200 ms.

    # -*- coding: UTF-8 -*-
    import json

    import jieba
    import tornado.httpserver
    import tornado.ioloop
    import tornado.options
    import tornado.web
    from tornado.options import define, options

    define("port", default=8000, help="run on the given port", type=int)


    class IndexHandler(tornado.web.RequestHandler):
        def get(self):
            self._respond()

        def post(self):
            self._respond()

        def _respond(self):
            # get_argument reads 'str' from the query string or the form body.
            sstr = self.get_argument('str', '')
            self.set_header("Content-Type", "application/json; charset=UTF-8")
            self.write(self.jjcut(sstr))

        def jjcut(self, sstr=""):
            if sstr == '':
                sstr = '他来到了天安门广场'  # sample sentence used when no input is given
            out = {"TITLE": "jieba V" + jieba.__version__}
            # Full mode
            out['FULL'] = "||".join(jieba.cut(sstr, cut_all=True))
            # Precise mode (the default when cut_all is omitted)
            out['DEF'] = "||".join(jieba.cut(sstr, cut_all=False))
            # Search-engine mode
            out['SEARCH'] = "||".join(jieba.cut_for_search(sstr))
            return json.dumps(out)


    if __name__ == "__main__":
        tornado.options.parse_command_line()
        app = tornado.web.Application(handlers=[(r"/", IndexHandler)])
        http_server = tornado.httpserver.HTTPServer(app)
        http_server.listen(options.port)
        tornado.ioloop.IOLoop.current().start()
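For anyone who wants to call the service from another program, here is a rough client sketch using only the Python standard library. It assumes the server above is running on `localhost` with the default port 8000 and that the reply keeps the `||`-joined fields shown in the post. The function names `build_url`, `split_fields`, and `segment` are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(text, host="localhost", port=8000):
    """Build the GET URL for the service (host and port are assumptions)."""
    return "http://%s:%d/?%s" % (host, port, urlencode({"str": text}))


def split_fields(payload):
    """Parse the JSON reply and split every '||'-joined field into a list."""
    data = json.loads(payload)
    # TITLE is a plain label, so it is left as a string.
    return {k: (v if k == "TITLE" else v.split("||")) for k, v in data.items()}


def segment(text, **kwargs):
    """Query the running service and return the parsed result."""
    with urlopen(build_url(text, **kwargs)) as resp:
        return split_fields(resp.read().decode("utf-8"))
```

With the server running, `segment("我家住在黄土高坡")["DEF"]` should give the precise-mode tokens as a Python list. `urlencode` takes care of percent-encoding the Chinese text, so the caller does not have to.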