禹行思潮 · Deploying ChatGLM3-6B: Steps and Code Modifications

Downloading the model and code
Code download:
https://github.com/THUDM/ChatGLM3

For the model, I use a Python script that downloads from a mirror inside China: ModelScope (魔搭社区).

https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k
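Downloading the model with a Python script is much faster.
Install the dependencies:
pip install modelscope ipython
Start an interactive Python shell:
ipython
Enter the following commands: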
from modelscope import snapshot_download
model_dir = snapshot_download("ZhipuAI/chatglm3-6b-32k", revision="v1.0.0")  # returns the local model directory
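The same download can also be run as a non-interactive script. This is a minimal sketch, assuming `modelscope` is installed; `download_model` and `looks_like_model_dir` are hypothetical helper names (not part of ModelScope or ChatGLM3), and `cache_dir` is an optional `snapshot_download` argument that controls where the files are stored.

```python
import os
from typing import Optional

MODEL_ID = "ZhipuAI/chatglm3-6b-32k"


def download_model(model_id: str = MODEL_ID, revision: str = "v1.0.0",
                   cache_dir: Optional[str] = None) -> str:
    """Download a model from ModelScope and return its local directory."""
    # Imported lazily so this file can be imported without modelscope installed.
    from modelscope import snapshot_download

    return snapshot_download(model_id, revision=revision, cache_dir=cache_dir)


def looks_like_model_dir(path: str) -> bool:
    """Rough sanity check (hypothetical helper): a Hugging Face-style
    model directory contains a config.json file."""
    return os.path.isfile(os.path.join(path, "config.json"))
```

Usage: call `model_dir = download_model()`, then check `looks_like_model_dir(model_dir)` before loading, so a half-finished download fails early instead of inside `from_pretrained`.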

Once both the code and the model are downloaded, move on to the modifications.
Modification 1: quantization

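from transformers import AutoModelForCausalLM

# trust_remote_code runs ChatGLM3's own modeling code, which provides quantize()
model = AutoModelForCausalLM.from_pretrained(
    model_dir, trust_remote_code=True, device_map="cuda"
)
model.quantize(8).cuda()  # quantize the weights to 8 bit (modifies the model in place)
model.eval()  # inference mode: disables dropout
tokenizer_dir = model_dir  # the tokenizer ships in the same directory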

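Quantization is done after the model has finished loading; quantize(8) converts the weights to 8-bit integers to reduce GPU memory usage.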
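The saving from quantization can be estimated with simple arithmetic: weight memory ≈ number of parameters × bits per parameter / 8. This is a rough sketch, assuming about 6.2 billion parameters for ChatGLM3-6B (an approximate figure used only for illustration) and counting weights only; activations, the KV cache and quantization scales are ignored, and some layers (such as embeddings) may stay at 16 bit, so real usage is higher.

```python
def weight_bytes(num_params: int, bits: int) -> int:
    """Approximate memory taken by the weights alone, in bytes."""
    return num_params * bits // 8


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024 ** 3


# Assumption: ~6.2 billion parameters (approximate, for illustration only).
NUM_PARAMS = 6_200_000_000

for label, bits in (("FP16", 16), ("INT8 (quantize(8))", 8), ("INT4 (quantize(4))", 4)):
    print(f"{label:>20}: {to_gib(weight_bytes(NUM_PARAMS, bits)):.1f} GiB")
```

Under these assumptions, 8-bit weights need half the memory of FP16 (about 5.8 GiB instead of 11.5 GiB), which is what makes a single consumer GPU practical.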

Modification 2: proactive GPU memory cleanup, implemented mainly with a try/except/finally block.

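import gc
import torch
# Role and robot come from the author's setup (Role presumably from ChatGLM3's conversation.py)

def chat(robot, system_prom, tools, history):  # hypothetical wrapper name
    try:
        for response in robot.gen_stream_chat(
                system=system_prom, tools=tools, history=history, do_sample=True,
                max_new_tokens=256, temperature=0.95, top_p=1.0,
                stop_sequences=[str(Role.USER)], repetition_penalty=1.1):
            ...  # business logic: handle each streamed response
    except Exception as e:
        print("Error: {}".format(e))
        return "The AI is busy, please try again later"
    finally:
        # Free unreachable Python objects first, then release cached GPU memory
        gc.collect()
        if torch.cuda.is_available():
            with torch.cuda.device(torch.device("cuda")):
                torch.cuda.empty_cache()
                torch.cuda.ipc_collect()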
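If the chat function streams its output (for example by yielding each response), the cleanup can be packaged once and reused. This is a minimal sketch, assuming PyTorch is optionally installed; `free_gpu_memory` and `stream_with_cleanup` are hypothetical helper names. Because the wrapper is a generator, its `finally` runs when the stream is exhausted, raises, or is closed early (e.g. `.close()` after a client disconnects).

```python
import gc
from typing import Callable, Iterable, Iterator, TypeVar

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None

T = TypeVar("T")


def free_gpu_memory() -> None:
    """Release unreferenced objects, then return cached CUDA memory."""
    gc.collect()  # drop unreachable Python objects (and the tensors they hold)
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # give cached, unused blocks back to the driver
        torch.cuda.ipc_collect()  # reclaim memory released via CUDA IPC


def stream_with_cleanup(
    stream: Iterable[T], cleanup: Callable[[], None] = free_gpu_memory
) -> Iterator[T]:
    """Yield items from `stream`, running `cleanup` however the stream ends."""
    try:
        yield from stream
    finally:
        cleanup()
```

Usage: `for response in stream_with_cleanup(robot.gen_stream_chat(...)):`. The cleanup callback is a parameter so the wrapper can be tested without a GPU.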