
python, September 11, 2023

A Walkthrough of chat_env.py in ChatDev, Tsinghua's Open-Source ChatGPT Automated-Programming Project

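This code defines two classes, ChatEnvConfig and ChatEnv, which manage the environment and resources of a software development run.
ChatEnvConfig stores the configuration of a ChatEnv object. It has the following attributes:
- self.clear_structure: a boolean indicating whether to clean up the structure, that is, whether to delete useless files.
- self.brainstorming: a boolean indicating whether to brainstorm, that is, whether to generate ideas.
- self.gui_design: a boolean indicating whether to do GUI design, that is, whether to generate a graphical interface.
- self.git_management: a boolean indicating whether to use Git to manage code versions.
The __init__() method of ChatEnvConfig is the constructor. It takes four parameters, clear_structure, brainstorming, gui_design, and git_management, and assigns them to the attributes of the same names.
The __str__() method of ChatEnvConfig returns a string representation of the configuration. In the source listed below it reports only clear_structure and brainstorming, one per line, and omits gui_design and git_management.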
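ChatEnv manages the environment and resources of a software development run. It has the following attributes:
- self.config: a ChatEnvConfig object holding the configuration of this ChatEnv.
- self.roster: a Roster object, the list of agents.
- self.codes: a Codes object, the collection of generated code.
- self.proposed_images: a dictionary of proposed images, keyed by image file name, with the image description as the value.
- self.incorporated_images: a dictionary of images actually adopted, keyed by image file name, with the image description as the value.
- self.requirements: a Documents object, the collection of requirement documents.
- self.manuals: a Documents object, the collection of user manuals.
- self.env_dict: a dictionary of environment variables for the development run, with the following keys:
  - "directory": a string, the directory where the software is saved.
  - "task_prompt": a string, the software idea entered by the user.
  - "modality": a string, the interaction modality of the software.
  - "ideas": a string, the ideas produced by brainstorming.
  - "language": a string, the programming language used by the software.
  - "review_comments": a string, the comments produced by code review.
  - "error_summary": a string, the report produced by error summarization.
  - "test_reports": a string, the report produced by testing.
The __init__() method of ChatEnv is the constructor. It takes one parameter, chat_env_config, assigns it to self.config, and then initializes the remaining attributes to empty defaults.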
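ChatEnv defines the following methods:
- fix_module_not_found_error() is a static method (decorated with @staticmethod) that repairs "module not found" errors. It takes one parameter, test_reports, the test report. If the report contains the string "ModuleNotFoundError", it uses the re module to iterate over every part of the report that matches the regular expression "No module named '(\S+)'" and assigns the captured module name to module. It then runs "pip install {}".format(module) through the subprocess module and waits for it to finish. Finally it calls log_and_print_online() to log and print "**[CMD Execute]**\n\n[CMD] pip install {}".format(module), recording that the install command was run.
- set_directory() sets the directory where the software is saved and updates the related attributes. It takes one parameter, directory, and asserts that self.env_dict['directory'] is still empty. It then assigns directory to self.env_dict['directory'], self.codes.directory, self.requirements.directory, and self.manuals.directory. Next, if the directory exists and is not empty, it formats the current time as "%Y%m%d%H%M%S" with the time module, appends it to directory after a dot to form new_directory, copies directory to new_directory with shutil.copytree(), and prints "{} Copied to {}".format(directory, new_directory). Finally, if self.config.clear_structure is true, it checks whether the directory exists: if it does, the directory is deleted with shutil.rmtree(), recreated with os.mkdir(), and "{} Created".format(directory) is printed; if it does not, the directory is simply created with os.mkdir(). When clear_structure is false, this method does not create the directory.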
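The two steps described above can be tried in isolation. The sketch below is illustrative rather than ChatDev's own code: missing_modules() and backup_name() are hypothetical helper names. The first applies the same regular expression as fix_module_not_found_error() but only returns the module names instead of running pip; the second builds the timestamped backup name that set_directory() uses.

```python
import re
import time

# Same pattern as in fix_module_not_found_error(): capture the quoted module name.
MISSING_MODULE = re.compile(r"No module named '(\S+)'")


def missing_modules(test_reports: str) -> list:
    """Return the module names reported as missing, in order of appearance."""
    if "ModuleNotFoundError" not in test_reports:
        return []
    return [match.group(1) for match in MISSING_MODULE.finditer(test_reports)]


def backup_name(directory: str, when: time.struct_time) -> str:
    """Build the '<directory>.<YYYYmmddHHMMSS>' name used for the backup copy."""
    return "{}.{}".format(directory, time.strftime("%Y%m%d%H%M%S", when))


# Hypothetical sample inputs, not taken from a real ChatDev run.
report = "Traceback ...\nModuleNotFoundError: No module named 'pygame'\n"
stamp = time.strptime("2023-09-11 12:00:00", "%Y-%m-%d %H:%M:%S")

print(missing_modules(report))
print(backup_name("WareHouse/demo", stamp))
```

Separating the parsing from the pip call makes the pattern easy to test without installing anything.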

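The following methods update, rewrite, fetch, and write the various resources of the development run, such as code, images, and documents.
exist_bugs() checks whether the software has bugs. It takes no arguments and returns a tuple (bool, str): the bool says whether a bug was found, and the str carries either the error message or a success message. It first reads the software directory from self.env_dict['directory'] and creates the string success_info, the message for a successful run. Inside a try-except block it then does the following:
- Calls subprocess.Popen() with a command that changes into the software directory, lists its contents, and runs main.py, with shell=True, preexec_fn=os.setsid, and both stdout and stderr set to subprocess.PIPE. The returned Popen object is assigned to process and is used to manage the child process's input and output.
- Calls time.sleep(3) to give the child process three seconds to run.
- Reads process.returncode into return_code. Because Popen.returncode stays None until poll(), wait(), or communicate() has been called, return_code is None at this point in the listed source rather than the real exit status.
- Calls poll() to check whether the child process is still running; if it is, it uses os.killpg() with signal.SIGTERM to terminate the child's whole process group.
- If return_code is 0, returns (False, success_info), meaning the software ran successfully. Given the previous point, this branch is not taken in the listed source.
- Otherwise, reads the child's error output with process.stderr.read(), decodes it as UTF-8, and assigns it to error_output. If error_output is not empty and contains "Traceback" (compared case-insensitively), a Python exception occurred: the directory prefix is stripped with replace(), the result is assigned to errs, and (True, errs) is returned. If error_output is empty, (False, success_info) is returned. Non-empty error output without a traceback falls through to the final return (False, success_info). Any exception raised along the way is caught and reported as (True, message).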
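The run-then-inspect pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the ChatDev implementation: run_and_check() is a hypothetical name, it uses communicate(timeout=...) instead of sleep() plus killpg(), and it launches the current interpreter rather than a shell command. Using communicate() means returncode is populated before it is read.

```python
import subprocess
import sys


def run_and_check(args, wait_seconds=3.0):
    """Run a command and return (has_bug, message) in the style of exist_bugs()."""
    process = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # communicate() waits for exit and drains the pipes, so returncode is set.
        _, stderr = process.communicate(timeout=wait_seconds)
    except subprocess.TimeoutExpired:
        # Still running after the grace period: stop it and treat it as "no crash".
        process.kill()
        process.communicate()
        return False, "Still running after the timeout; no crash observed."
    error_output = stderr.decode("utf-8", errors="replace")
    if process.returncode != 0 and "traceback" in error_output.lower():
        return True, error_output
    return False, "The software ran without a traceback."


ok = run_and_check([sys.executable, "-c", "print('hello')"])
bad = run_and_check([sys.executable, "-c", "raise ValueError('boom')"])
print(ok[0], bad[0])
```

A long-running GUI program would hit the timeout branch, which mirrors the original's decision to terminate a process that is still alive after three seconds.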
recruit() recruits an agent and adds it to the Roster object. It takes one parameter, agent_name, and passes it to self.roster._recruit().
exist_employee() checks whether a given agent exists. It takes one parameter, agent_name, passes it to self.roster._exist_employee(), and returns that method's result.
print_employees() prints the names of all agents. It takes no arguments and returns nothing; it calls self.roster._print_employees().
update_codes() updates the generated code by comparing, modifying, and saving code according to a new LLM reply. It takes one parameter, generated_content, the content of the new reply, and passes it to self.codes._update_codes().
rewrite_codes() rewrites the code, modifying and saving it according to the contents of self.codes. It takes no arguments and returns nothing. It calls self.codes._rewrite_codes() with self.config.git_management as the argument, which indicates whether Git management is enabled.
get_codes() returns the code as a single string built from the contents of self.codes. It takes no arguments and returns the result of self.codes._get_codes().
_load_from_hardware() loads code from disk, reading and storing the code found in a given directory. It takes one parameter, directory, and passes it to self.codes._load_from_hardware().
_update_requirements() updates the requirement documents by comparing, modifying, and saving documents according to a new LLM reply. It takes one parameter, generated_content, and passes it to self.requirements._update_docs().
rewrite_requirements() rewrites the requirement documents according to the contents of self.requirements. It takes no arguments and returns nothing; it calls self.requirements._rewrite_docs().
get_requirements() returns the requirement documents as a single string. It takes no arguments and returns the result of self.requirements._get_docs().
_update_manuals() updates the user manual according to a new LLM reply. It takes one parameter, generated_content, and calls self.manuals._update_docs() with generated_content, parse=False, and predifined_filename="manual.md".
rewrite_manuals() rewrites the user manual according to the contents of self.manuals. It takes no arguments and returns nothing; it calls self.manuals._rewrite_docs().
write_meta() writes metadata into the software directory. It takes no arguments and returns nothing. It first reads the software directory from self.env_dict['directory']; if the directory does not exist, it creates it with os.mkdir() and prints "{} Created.".format(directory). It then opens meta.txt in that directory for writing with UTF-8 encoding and writes a series of entries, each in the form "Name:\nvalue\n\n": Task (the task prompt), Config (the string form of self.config), Roster (the agent names joined by ", "), Modality, Ideas, Language, Code_Version (self.codes.version), Proposed_images, and Incorporated_images (the last two are the number of entries in each dictionary). Finally it prints the path of the file followed by "Wrote".
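The meta.txt layout is easy to reproduce. The following sketch assumes a hypothetical helper, write_meta_sketch(), that takes the entries as a plain dictionary; it uses os.makedirs() instead of os.mkdir() so that the nested temporary path in the example works, and the field values are made up for illustration.

```python
import os
import tempfile


def write_meta_sketch(directory: str, fields: dict) -> str:
    """Write each entry as 'Name:', its value, and a blank line; return the file path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "meta.txt")
    with open(path, "w", encoding="utf-8") as writer:
        for name, value in fields.items():
            writer.write("{}:\n{}\n\n".format(name, value))
    return path


with tempfile.TemporaryDirectory() as tmp:
    meta_path = write_meta_sketch(
        os.path.join(tmp, "demo"),
        {"Task": "a Gomoku game", "Language": "Python"},
    )
    with open(meta_path, encoding="utf-8") as reader:
        content = reader.read()

print(content)
```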

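generate_images_from_codes extracts image file names from the code, then generates and downloads any image that is missing. The steps are as follows:

The nested download function downloads an image file. It takes an image URL and a file name, sends an HTTP request with the requests library, and saves the image into the software directory, replacing any existing file of the same name.
regex is the regular expression for file names, r"(\w+.png)": one or more word characters, then any single character (the dot is not escaped), then png. It is used to pull image file names out of the code.
joined_codes is a single string containing all of the code, obtained from get_codes().
matches is the result of calling re.finditer with this regular expression on joined_codes. re.finditer returns an iterator, and iterating over it yields the matches.
The first for loop goes through every matched file name and checks whether it is a key of the proposed_images dictionary. If it is, the corresponding image description is copied into incorporated_images; otherwise the file name itself, with underscores replaced by spaces, is stored in incorporated_images as the description.
The second for loop goes through the file names in incorporated_images. If a file does not exist in the software directory, a 256x256 image is generated from its description with the OpenAI API (openai.Image.create) and saved into that directory.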
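The effect of the unescaped dot, and the fallback description, can be checked with a few lines. This is a small sketch, not ChatDev code: image_names() and default_description() are hypothetical helpers, STRICT is an escaped variant added for comparison, and the sample string is invented.

```python
import re

LOOSE = r"(\w+.png)"    # pattern from the listing: '.' matches any character
STRICT = r"(\w+\.png)"  # escaped dot: only a literal '.' is accepted


def image_names(code: str, pattern: str = STRICT) -> list:
    """Return every file name in `code` that matches `pattern`."""
    return [match.group(1).strip() for match in re.finditer(pattern, code)]


def default_description(filename: str) -> str:
    """Fallback prompt: underscores become spaces and the extension is dropped."""
    desc = filename.replace("_", " ")
    if desc.endswith(".png"):
        desc = desc.replace(".png", "")
    return desc


code = 'img = load("player_ship.png")\nmode = "my_xpng"\n'
print(image_names(code, LOOSE))
print(image_names(code, STRICT))
print(default_description("player_ship.png"))
```

With the loose pattern, an identifier such as my_xpng is also treated as an image file name, which would trigger an unnecessary image request.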

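get_proposed_images_from_message extracts image file names and descriptions from a message, generates and downloads the images, and returns the resulting dictionary. The steps are as follows:

The nested download function again downloads an image file.
regex is the regular expression for an image file name and its description, r"(\w+.png):(.*?)\n": a file name as above, immediately followed by a colon, then as few characters as possible up to the next newline. It is used to pull file names and descriptions out of the message.
matches is the result of calling re.finditer with this regular expression on the message.
The first for loop goes through every match and stores the file name and description in the images dictionary.
If images is empty, a second for loop extracts bare file names with the simpler regular expression and derives a default description from each file name by dropping .png and replacing underscores with spaces.
The last for loop goes through the file names in images. If a file does not exist in the software directory, a 256x256 image is generated from its description with the OpenAI API and saved into that directory.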
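The two-stage parsing can be sketched without any network access. The helper name parse_proposed_images() and the sample messages are assumptions for illustration, and the dot is escaped here, unlike in the listing.

```python
import re


def parse_proposed_images(message: str) -> dict:
    """Map image file names to descriptions, falling back to name-derived text."""
    images = {}
    # Stage 1: lines of the form "name.png: description".
    for match in re.finditer(r"(\w+\.png):(.*?)\n", message):
        images[match.group(1).strip()] = match.group(2).strip()
    # Stage 2: only if nothing was found, accept bare file names.
    if not images:
        for match in re.finditer(r"(\w+\.png)", message):
            filename = match.group(1).strip()
            images[filename] = " ".join(filename.replace(".png", "").split("_"))
    return images


described = parse_proposed_images(
    "background.png: a blue sky\nred_button.png: a round red button\n"
)
bare = parse_proposed_images("Please draw red_button.png and logo.png")
print(described)
print(bare)
```

Note that the first pattern requires a trailing newline, so a description on the last line of a message is only picked up if the message ends with a line break.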

The source code of chat_env.py is as follows:

import os
import re
import shutil
import signal
import subprocess
import time
from typing import Dict

import openai
import requests

from chatdev.codes import Codes
from chatdev.documents import Documents
from chatdev.roster import Roster
from chatdev.utils import log_and_print_online


class ChatEnvConfig:
    def __init__(self, clear_structure,
                 brainstorming,
                 gui_design,
                 git_management):
        self.clear_structure = clear_structure
        self.brainstorming = brainstorming
        self.gui_design = gui_design
        self.git_management = git_management

    def __str__(self):
        string = ""
        string += "ChatEnvConfig.clear_structure: {}\n".format(self.clear_structure)
        string += "ChatEnvConfig.brainstorming: {}\n".format(self.brainstorming)
        return string


class ChatEnv:
    def __init__(self, chat_env_config: ChatEnvConfig):
        self.config = chat_env_config
        self.roster: Roster = Roster()
        self.codes: Codes = Codes()
        self.proposed_images: Dict[str, str] = {}
        self.incorporated_images: Dict[str, str] = {}
        self.requirements: Documents = Documents()
        self.manuals: Documents = Documents()
        self.env_dict = {
            "directory": "",
            "task_prompt": "",
            "modality": "",
            "ideas": "",
            "language": "",
            "review_comments": "",
            "error_summary": "",
            "test_reports": ""
        }

    @staticmethod
    def fix_module_not_found_error(test_reports):
        if "ModuleNotFoundError" in test_reports:
            for match in re.finditer(r"No module named '(\S+)'", test_reports, re.DOTALL):
                module = match.group(1)
                subprocess.Popen("pip install {}".format(module), shell=True).wait()
                log_and_print_online("**[CMD Execute]**\n\n[CMD] pip install {}".format(module))

    def set_directory(self, directory):
        assert len(self.env_dict['directory']) == 0
        self.env_dict['directory'] = directory
        self.codes.directory = directory
        self.requirements.directory = directory
        self.manuals.directory = directory

        if os.path.exists(self.env_dict['directory']) and len(os.listdir(directory)) > 0:
            new_directory = "{}.{}".format(directory, time.strftime("%Y%m%d%H%M%S", time.localtime()))
            shutil.copytree(directory, new_directory)
            print("{} Copied to {}".format(directory, new_directory))
        if self.config.clear_structure:
            if os.path.exists(self.env_dict['directory']):
                shutil.rmtree(self.env_dict['directory'])
                os.mkdir(self.env_dict['directory'])
                print("{} Created".format(directory))
            else:
                os.mkdir(self.env_dict['directory'])

    def exist_bugs(self) -> tuple[bool, str]:
        directory = self.env_dict['directory']

        success_info = "The software run successfully without errors."
        try:
            command = "cd {}; ls -l; python3 main.py;".format(directory)
            process = subprocess.Popen(command, shell=True, preexec_fn=os.setsid,
                                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            time.sleep(3)
            return_code = process.returncode
            # Check if the software is still running
            if process.poll() is None:
                os.killpg(os.getpgid(process.pid), signal.SIGTERM)
            if return_code == 0:
                return False, success_info
            else:
                error_output = process.stderr.read().decode('utf-8')
                if error_output:
                    if "Traceback".lower() in error_output.lower():
                        errs = error_output.replace(directory + "/", "")
                        return True, errs
                else:
                    return False, success_info
        except subprocess.CalledProcessError as e:
            return True, f"Error: {e}"
        except Exception as ex:
            return True, f"An error occurred: {ex}"

        return False, success_info

    def recruit(self, agent_name: str):
        self.roster._recruit(agent_name)

    def exist_employee(self, agent_name: str) -> bool:
        return self.roster._exist_employee(agent_name)

    def print_employees(self):
        self.roster._print_employees()

    def update_codes(self, generated_content):
        self.codes._update_codes(generated_content)

    def rewrite_codes(self) -> None:
        self.codes._rewrite_codes(self.config.git_management)

    def get_codes(self) -> str:
        return self.codes._get_codes()

    def _load_from_hardware(self, directory) -> None:
        self.codes._load_from_hardware(directory)

    def _update_requirements(self, generated_content):
        self.requirements._update_docs(generated_content)

    def rewrite_requirements(self):
        self.requirements._rewrite_docs()

    def get_requirements(self) -> str:
        return self.requirements._get_docs()

    def _update_manuals(self, generated_content):
        self.manuals._update_docs(generated_content, parse=False, predifined_filename="manual.md")

    def rewrite_manuals(self):
        self.manuals._rewrite_docs()

    def write_meta(self) -> None:
        directory = self.env_dict['directory']

        if not os.path.exists(directory):
            os.mkdir(directory)
            print("{} Created.".format(directory))

        meta_filename = "meta.txt"
        with open(os.path.join(directory, meta_filename), "w", encoding="utf-8") as writer:
            writer.write("{}:\n{}\n\n".format("Task", self.env_dict['task_prompt']))
            writer.write("{}:\n{}\n\n".format("Config", self.config.__str__()))
            writer.write("{}:\n{}\n\n".format("Roster", ", ".join(self.roster.agents)))
            writer.write("{}:\n{}\n\n".format("Modality", self.env_dict['modality']))
            writer.write("{}:\n{}\n\n".format("Ideas", self.env_dict['ideas']))
            writer.write("{}:\n{}\n\n".format("Language", self.env_dict['language']))
            writer.write("{}:\n{}\n\n".format("Code_Version", self.codes.version))
            writer.write("{}:\n{}\n\n".format("Proposed_images", len(self.proposed_images.keys())))
            writer.write("{}:\n{}\n\n".format("Incorporated_images", len(self.incorporated_images.keys())))
        print(os.path.join(directory, meta_filename), "Wrote")

    def generate_images_from_codes(self):
        def download(img_url, file_name):
            r = requests.get(img_url)
            filepath = os.path.join(self.env_dict['directory'], file_name)
            if os.path.exists(filepath):
                os.remove(filepath)
            with open(filepath, "wb") as f:
                f.write(r.content)
                print("{} Downloaded".format(filepath))

        regex = r"(\w+.png)"
        joined_codes = self.get_codes()
        matches = re.finditer(regex, joined_codes, re.DOTALL)
        # matched_images = {}
        for match in matches:
            filename = match.group(1).strip()
            if filename in self.proposed_images.keys():
                self.incorporated_images[filename] = self.proposed_images[filename]
            else:
                self.incorporated_images[filename] = filename.replace("_", " ")

        for filename in self.incorporated_images.keys():
            if not os.path.exists(os.path.join(self.env_dict['directory'], filename)):
                desc = self.incorporated_images[filename]
                if desc.endswith(".png"):
                    desc = desc.replace(".png", "")
                print("{}: {}".format(filename, desc))
                response = openai.Image.create(
                    prompt=desc,
                    n=1,
                    size="256x256"
                )
                image_url = response['data'][0]['url']
                download(image_url, filename)

    def get_proposed_images_from_message(self, messages):
        def download(img_url, file_name):
            r = requests.get(img_url)
            filepath = os.path.join(self.env_dict['directory'], file_name)
            if os.path.exists(filepath):
                os.remove(filepath)
            with open(filepath, "wb") as f:
                f.write(r.content)
                print("{} Downloaded".format(filepath))

        regex = r"(\w+.png):(.*?)\n"
        matches = re.finditer(regex, messages, re.DOTALL)
        images = {}
        for match in matches:
            filename = match.group(1).strip()
            desc = match.group(2).strip()
            images[filename] = desc

        if len(images.keys()) == 0:
            regex = r"(\w+.png)"
            matches = re.finditer(regex, messages, re.DOTALL)
            images = {}
            for match in matches:
                filename = match.group(1).strip()
                desc = " ".join(filename.replace(".png", "").split("_"))
                images[filename] = desc
                print("{}: {}".format(filename, images[filename]))

        for filename in images.keys():
            if not os.path.exists(os.path.join(self.env_dict['directory'], filename)):
                desc = images[filename]
                if desc.endswith(".png"):
                    desc = desc.replace(".png", "")
                print("{}: {}".format(filename, desc))
                response = openai.Image.create(
                    prompt=desc,
                    n=1,
                    size="256x256"
                )
                image_url = response['data'][0]['url']
                download(image_url, filename)

        return images

Author: east

chatgpt, python, September 11, 2023

A Walkthrough of chat_chain.py in ChatDev, Tsinghua's Open-Source ChatGPT Automated-Programming Project

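ChatChain implements a multi-agent collaboration system for software development; it builds customized software from the user's natural-language description.
__init__() is the constructor. It takes the following parameters:
- config_path: a string, the path to ChatChainConfig.json. This file holds the basic configuration of ChatChain, such as the chain of phases, the recruitment list, and whether to clear the structure or brainstorm.
- config_phase_path: a string, the path to PhaseConfig.json. This file holds the configuration of each development phase, such as the phase name, the roles involved, and the phase prompt.
- config_role_path: a string, the path to RoleConfig.json. This file holds the configuration of each role, such as the role name and the role prompt.
- task_prompt: a string, the software idea entered by the user, for example "I want a Gomoku game".
- project_name: a string, the software name entered by the user, for example "Gomoku".
- org_name: a string, the name of the user's organization, for example "OpenBMB".
- model_type: an enum value, the type of large language model (LLM) to use, for example ModelType.GPT_3_5_TURBO.
__init__() first stores these parameters as attributes, then opens and parses the three configuration files with open() and the json module and keeps the results in self.config, self.config_phase, and self.config_role. It then initializes the chain and the recruitment list from the configuration and sets the default maximum number of chat turns to 10. Next it creates a ChatEnvConfig object and a ChatEnv object from the configuration to manage the environment and resources of the run. It stores the user's raw idea in self.task_prompt_raw; whether that idea is self-improved depends on the configuration, and the improvement itself is carried out in pre_processing(). It then builds the dictionary self.role_prompts, joining each role's prompt lines with newlines. After that it calls get_logfilepath() and stores the start time and log file path as attributes. Finally it imports chatdev.composed_phase and chatdev.phase with importlib and creates one phase instance for every phase named in PhaseConfig.json, storing them in self.phases.
check_bool() is a helper function that converts a string to a boolean. It takes one parameter, s, a string, lowercases it, and returns True if the result equals "true" and False otherwise.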
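Because the flags in ChatChainConfig.json are read through check_bool(), they are expected to be strings such as "True" rather than JSON booleans. The sketch below shows this with an inline configuration; the key names come from the source listing, while the values, the phase name "Coding", and the agent name "Programmer" are hypothetical sample data.

```python
import json


def check_bool(s: str) -> bool:
    # Same logic as the helper in chat_chain.py.
    return s.lower() == "true"


# Hypothetical configuration text; only the key names are taken from the listing.
config_text = """
{
  "chain": [
    {"phase": "Coding", "phaseType": "SimplePhase",
     "max_turn_step": 1, "need_reflect": "False"}
  ],
  "recruitments": ["Programmer"],
  "clear_structure": "True",
  "brainstorming": "False",
  "gui_design": "True",
  "git_management": "False",
  "self_improve": "False"
}
"""

config = json.loads(config_text)
flag_names = ("clear_structure", "brainstorming", "gui_design", "git_management")
flags = {name: check_bool(config[name]) for name in flag_names}
print(flags)
```

Any string other than a case-insensitive "true" (for example "yes" or "1") is treated as False, and a real JSON boolean would raise AttributeError because it has no lower() method.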
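get_logfilepath() computes the log file path and returns the tuple (start_time, log_filepath), where start_time is the start time and log_filepath is the path of the log file. It first calls now() from chatdev.utils to obtain the start time; judging from post_processing(), which parses this value with "%Y%m%d%H%M%S", it is a timestamp string in that format. It then takes the parent of the directory that contains chat_chain.py as root, joins it with "WareHouse", and names the log file "{project_name}_{org_name}_{start_time}.log" inside that directory. Finally it returns the (start_time, log_filepath) tuple.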
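As a minimal sketch of how the log path is assembled in the source listing, the hypothetical function build_log_path() below takes the root directory as an argument instead of deriving it from __file__; the sample root and names are made up.

```python
import os


def build_log_path(root: str, project_name: str, org_name: str, start_time: str) -> str:
    """Return '<root>/WareHouse/<project>_<org>_<start_time>.log'."""
    directory = os.path.join(root, "WareHouse")
    log_name = "{}.log".format("_".join([project_name, org_name, start_time]))
    return os.path.join(directory, log_name)


path = build_log_path("/opt/ChatDev", "Gomoku", "OpenBMB", "20230911120000")
print(path)
```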

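make_recruitment() recruits the agents and adds them to the ChatEnv object. It iterates over the recruitment list from the configuration (self.recruitments) and, for each agent name, calls the recruit() method of ChatEnv with that name.
execute_step() executes a single development phase. It takes one parameter, phase_item, the configuration of one phase in the chain. It first reads the phase name and type and then acts according to the type. If the type is "SimplePhase", it looks up the corresponding phase object in self.phases and calls its execute() method with the ChatEnv object, the maximum number of chat turns (the default when max_turn_step is not positive), and the need_reflect flag, then assigns the returned ChatEnv to self.chat_env; if the phase is not in self.phases, it raises a RuntimeError. If the type is "ComposedPhase", it fetches the class of that name from self.compose_phase_module and creates an instance, passing the phase name, cycle count, composition, phase and role configuration, model type, and log file path to the constructor, and assigns it to compose_phase_instance. It then calls compose_phase_instance.execute() with the ChatEnv object and assigns the returned ChatEnv to self.chat_env. For any other type it raises a RuntimeError saying that the type is not implemented.
execute_chain() executes the whole development process. It iterates over the chain from the configuration (self.chain) and calls execute_step() for each phase item.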
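The dispatch logic can be illustrated with stand-in objects. In this sketch EchoPhase is a hypothetical replacement for a real phase class, the environment is a plain list rather than a ChatEnv, and only the SimplePhase branch is modelled; the chain entries are invented sample data.

```python
class EchoPhase:
    """Hypothetical stand-in for a SimplePhase: records how it was called."""

    def __init__(self, name):
        self.name = name

    def execute(self, env, turn_limit, need_reflect):
        # Return a new environment, mirroring how execute() returns a ChatEnv.
        return env + [(self.name, turn_limit, need_reflect)]


def execute_step(env, phases, phase_item, default_turn_limit=10):
    phase = phase_item["phase"]
    phase_type = phase_item["phaseType"]
    if phase_type == "SimplePhase":
        if phase not in phases:
            raise RuntimeError(f"Phase '{phase}' is not implemented")
        max_turn_step = phase_item["max_turn_step"]
        # A non-positive max_turn_step falls back to the default limit.
        turn_limit = default_turn_limit if max_turn_step <= 0 else max_turn_step
        need_reflect = phase_item["need_reflect"].lower() == "true"
        return phases[phase].execute(env, turn_limit, need_reflect)
    raise RuntimeError(f"PhaseType '{phase_type}' is not implemented in this sketch")


def execute_chain(env, phases, chain):
    for phase_item in chain:
        env = execute_step(env, phases, phase_item)
    return env


phases = {"Design": EchoPhase("Design"), "Coding": EchoPhase("Coding")}
chain = [
    {"phase": "Design", "phaseType": "SimplePhase", "max_turn_step": -1, "need_reflect": "True"},
    {"phase": "Coding", "phaseType": "SimplePhase", "max_turn_step": 3, "need_reflect": "False"},
]
result = execute_chain([], phases, chain)
print(result)
```

Threading the environment through each call, as the original does with self.chat_env, keeps every phase's output available to the next phase.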

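pre_processing() performs preprocessing: it removes useless files and logs some global configuration. It takes no arguments and returns nothing. It first checks whether the ChatEnv configuration asks for the structure to be cleared (self.chat_env.config.clear_structure); if so, it uses the os module to iterate over the files in the WareHouse directory, deletes every file that does not end in .py or .log, and prints the path of each deleted file. It then builds the software directory name from the project name, organization name, and start time and calls set_directory() on the ChatEnv object with that path. Note that in the listing below the variable directory is assigned only inside the clear_structure branch, so building software_path fails with an unassigned-variable error when clear_structure is false. Next it copies the three configuration files into the software directory with shutil.copy() and writes the user's idea to a .prompt file there with open() and write(). It then creates the string preprocess_msg with the value "**[Preprocessing]**\n\n" and a ChatGPTConfig object, chat_gpt_config, holding the LLM configuration. It appends further information to preprocess_msg, including the start time, the configuration file paths, the software idea, the project name, the log file path, the ChatDevConfig object, and the ChatGPTConfig object, and calls log_and_print_online() to log and print the message. Finally it checks whether self-improvement is enabled (self.config['self_improve']). If it is, it calls self.self_task_improve() with the user's idea and assigns the improved idea to self.chat_env.env_dict['task_prompt']; otherwise it assigns the user's idea unchanged.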
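The cleanup rule in pre_processing() keeps directories, .py files, and .log files and removes every other file. A minimal sketch, using a temporary directory in place of WareHouse and a hypothetical function name clean_warehouse():

```python
import os
import tempfile


def clean_warehouse(directory: str) -> list:
    """Delete plain files that are neither .py nor .log; return the removed names."""
    removed = []
    for filename in sorted(os.listdir(directory)):
        file_path = os.path.join(directory, filename)
        # Directories are skipped; only regular files are candidates for removal.
        if os.path.isfile(file_path) and not filename.endswith((".py", ".log")):
            os.remove(file_path)
            removed.append(filename)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    for name in ("main.py", "run.log", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "project_dir"))
    removed = clean_warehouse(tmp)
    remaining = sorted(os.listdir(tmp))

print(removed, remaining)
```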
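post_processing() performs postprocessing: it summarizes the output and moves the log file into the software directory. It takes no arguments and returns nothing. It first calls write_meta() on the ChatEnv object to write the metadata into the software directory. It then takes the directory of the current file (os.path.dirname(__file__)) as filepath and its parent directory as root. Next it creates the string post_info with the value "**[Post Info]**\n\n". It obtains the current time with now() and assigns it to now_time, parses both the start time and now_time with datetime.strptime() using the format "%Y%m%d%H%M%S" into datetime1 and datetime2, and computes the difference with total_seconds(), assigning it to duration, the time the development run took. It appends "Software Info: {}".format(get_info(self.chat_env.env_dict['directory'], self.log_filepath) + "\n\n🕑**duration**={:.2f}s\n\n".format(duration)) to post_info to show the software information and the duration, followed by "ChatDev Starts ({})".format(self.start_time) + "\n\n" and "ChatDev Ends ({})".format(now_time) + "\n\n" for the start and end times. If clear_structure is set, it removes every __pycache__ directory directly under the software directory and records each removal in post_info. It then logs post_info with log_and_print_online(), calls logging.shutdown(), sleeps for one second, and moves the log file into the software directory under WareHouse with shutil.move().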
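The duration calculation relies on both timestamps sharing the "%Y%m%d%H%M%S" format. A small sketch with made-up timestamps and a hypothetical helper name, duration_seconds():

```python
from datetime import datetime

TIME_FORMAT = "%Y%m%d%H%M%S"


def duration_seconds(start_time: str, end_time: str) -> float:
    """Parse two compact timestamps and return the elapsed seconds."""
    started = datetime.strptime(start_time, TIME_FORMAT)
    ended = datetime.strptime(end_time, TIME_FORMAT)
    return (ended - started).total_seconds()


duration = duration_seconds("20230911120000", "20230911120130")
print("duration={:.2f}s".format(duration))
```

Since the timestamps carry whole seconds only, the reported duration has one-second resolution even though it is printed with two decimals.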

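The self_task_improve() method improves the software idea entered by the user so that the LLM can understand it better. It takes one parameter, task_prompt, the user's idea, and returns a string, revised_task_prompt, the improved idea.
The method first creates the string self_task_improve_prompt, a prompt asking for a short software design requirement to be rewritten as a detailed prompt that lets the LLM build the software better. The prompt contains the user's idea (the task_prompt parameter) together with some instructions, such as the length and format of the rewritten prompt. It then creates a RolePlaying object, a role-playing session, and assigns it to role_play_session, passing the following arguments to the RolePlaying constructor:
- assistant_role_name: a string, the assistant's role name, "Prompt Engineer".
- assistant_role_prompt: a string, the assistant's role description, "You are an professional prompt engineer that can improve user input prompt to make LLM better understand these prompts.".
- user_role_prompt: a string, the user's role description, "You are an user that want to use LLM to build software.".
- user_role_name: a string, the user's role name, "User".
- task_type: an enum value, the task type, TaskType.CHATDEV.
- task_prompt: a string, the task description, "Do prompt engineering on user query".
- with_task_specify: a boolean indicating whether the task should be further specified, False.
- model_type: an enum value, the type of LLM to use, taken from self.model_type.
Next it calls role_play_session.init_chat() with None, None, and self_task_improve_prompt and assigns the two returned values to _ and input_user_msg. This call initializes the role-playing session and returns the first round of messages; input_user_msg is the user-side message that carries the rewrite request. It then calls role_play_session.step() with input_user_msg and True and assigns the two returned values to assistant_response and user_response. This call performs one step of the dialogue and returns the replies of the assistant and the user; assistant_response holds the assistant's reply. It then extracts the improved idea from the assistant's reply with split() and strip() and assigns it to revised_task_prompt. After that it calls log_and_print_online() to log and print the assistant's reply. Finally it calls log_and_print_online() again to log and print the original and improved ideas, and returns revised_task_prompt.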

The source code of chat_chain.py is as follows:

import importlib
import json
import os
import shutil
from datetime import datetime
import logging
import time

from camel.agents import RolePlaying
from camel.configs import ChatGPTConfig
from camel.typing import TaskType, ModelType
from chatdev.chat_env import ChatEnv, ChatEnvConfig
from chatdev.statistics import get_info
from chatdev.utils import log_and_print_online, now


def check_bool(s):
    return s.lower() == "true"


class ChatChain:

    def __init__(self,
                 config_path: str = None,
                 config_phase_path: str = None,
                 config_role_path: str = None,
                 task_prompt: str = None,
                 project_name: str = None,
                 org_name: str = None,
                 model_type: ModelType = ModelType.GPT_3_5_TURBO) -> None:
        """

        Args:
            config_path: path to the ChatChainConfig.json
            config_phase_path: path to the PhaseConfig.json
            config_role_path: path to the RoleConfig.json
            task_prompt: the user input prompt for software
            project_name: the user input name for software
            org_name: the organization name of the human user
        """

        # load config file
        self.config_path = config_path
        self.config_phase_path = config_phase_path
        self.config_role_path = config_role_path
        self.project_name = project_name
        self.org_name = org_name
        self.model_type = model_type

        with open(self.config_path, 'r', encoding="utf8") as file:
            self.config = json.load(file)
        with open(self.config_phase_path, 'r', encoding="utf8") as file:
            self.config_phase = json.load(file)
        with open(self.config_role_path, 'r', encoding="utf8") as file:
            self.config_role = json.load(file)

        # init chatchain config and recruitments
        self.chain = self.config["chain"]
        self.recruitments = self.config["recruitments"]

        # init default max chat turn
        self.chat_turn_limit_default = 10

        # init ChatEnv
        self.chat_env_config = ChatEnvConfig(clear_structure=check_bool(self.config["clear_structure"]),
                                             brainstorming=check_bool(self.config["brainstorming"]),
                                             gui_design=check_bool(self.config["gui_design"]),
                                             git_management=check_bool(self.config["git_management"]))
        self.chat_env = ChatEnv(self.chat_env_config)

        # the user input prompt will be self-improved (if set "self_improve": "True" in ChatChainConfig.json)
        # the self-improvement is done in self.preprocess
        self.task_prompt_raw = task_prompt
        self.task_prompt = ""

        # init role prompts
        self.role_prompts = dict()
        for role in self.config_role:
            self.role_prompts[role] = "\n".join(self.config_role[role])

        # init log
        self.start_time, self.log_filepath = self.get_logfilepath()

        # init SimplePhase instances
        # import all used phases in PhaseConfig.json from chatdev.phase
        # note that in PhaseConfig.json there only exist SimplePhases
        # ComposedPhases are defined in ChatChainConfig.json and will be imported in self.execute_step
        self.compose_phase_module = importlib.import_module("chatdev.composed_phase")
        self.phase_module = importlib.import_module("chatdev.phase")
        self.phases = dict()
        for phase in self.config_phase:
            assistant_role_name = self.config_phase[phase]['assistant_role_name']
            user_role_name = self.config_phase[phase]['user_role_name']
            phase_prompt = "\n\n".join(self.config_phase[phase]['phase_prompt'])
            phase_class = getattr(self.phase_module, phase)
            phase_instance = phase_class(assistant_role_name=assistant_role_name,
                                         user_role_name=user_role_name,
                                         phase_prompt=phase_prompt,
                                         role_prompts=self.role_prompts,
                                         phase_name=phase,
                                         model_type=self.model_type,
                                         log_filepath=self.log_filepath)
            self.phases[phase] = phase_instance



    def make_recruitment(self):
        """
        recruit all employees
        Returns: None

        """
        for employee in self.recruitments:
            self.chat_env.recruit(agent_name=employee)

    def execute_step(self, phase_item: dict):
        """
        execute single phase in the chain
        Args:
            phase_item: single phase configuration in the ChatChainConfig.json

        Returns:

        """

        phase = phase_item['phase']
        phase_type = phase_item['phaseType']
        # For SimplePhase, just look it up from self.phases and conduct the "Phase.execute" method
        if phase_type == "SimplePhase":
            max_turn_step = phase_item['max_turn_step']
            need_reflect = check_bool(phase_item['need_reflect'])
            if phase in self.phases:
                self.chat_env = self.phases[phase].execute(self.chat_env,
                                                           self.chat_turn_limit_default if max_turn_step <= 0 else max_turn_step,
                                                           need_reflect)
            else:
                raise RuntimeError(f"Phase '{phase}' is not yet implemented in chatdev.phase")
        # For ComposedPhase, we create instance here then conduct the "ComposedPhase.execute" method
        elif phase_type == "ComposedPhase":
            cycle_num = phase_item['cycleNum']
            composition = phase_item['Composition']
            compose_phase_class = getattr(self.compose_phase_module, phase)
            if not compose_phase_class:
                raise RuntimeError(f"Phase '{phase}' is not yet implemented in chatdev.compose_phase")
            compose_phase_instance = compose_phase_class(phase_name=phase,
                                                         cycle_num=cycle_num,
                                                         composition=composition,
                                                         config_phase=self.config_phase,
                                                         config_role=self.config_role,
                                                         model_type=self.model_type,
                                                         log_filepath=self.log_filepath)
            self.chat_env = compose_phase_instance.execute(self.chat_env)
        else:
            raise RuntimeError(f"PhaseType '{phase_type}' is not yet implemented.")

    def execute_chain(self):
        """
        execute the whole chain based on ChatChainConfig.json
        Returns: None

        """
        for phase_item in self.chain:
            self.execute_step(phase_item)

    def get_logfilepath(self):
        """
        get the log path (under the software path)
        Returns:
            start_time: time for starting making the software
            log_filepath: path to the log

        """
        start_time = now()
        filepath = os.path.dirname(__file__)
        # root = "/".join(filepath.split("/")[:-1])
        root = os.path.dirname(filepath)
        # directory = root + "/WareHouse/"
        directory = os.path.join(root, "WareHouse")
        log_filepath = os.path.join(directory, "{}.log".format("_".join([self.project_name, self.org_name,start_time])))
        return start_time, log_filepath

    def pre_processing(self):
        """
        remove useless files and log some global config settings
        Returns: None

        """
        if self.chat_env.config.clear_structure:
            filepath = os.path.dirname(__file__)
            # root = "/".join(filepath.split("/")[:-1])
            root = os.path.dirname(filepath)
            # directory = root + "/WareHouse"
            directory = os.path.join(root, "WareHouse")
            for filename in os.listdir(directory):
                file_path = os.path.join(directory, filename)
                # logs with error trials are left in WareHouse/
                if os.path.isfile(file_path) and not filename.endswith(".py") and not filename.endswith(".log"):
                    os.remove(file_path)
                    print("{} Removed.".format(file_path))

        software_path = os.path.join(directory, "_".join([self.project_name, self.org_name, self.start_time]))
        self.chat_env.set_directory(software_path)

        # copy config files to software path
        shutil.copy(self.config_path, software_path)
        shutil.copy(self.config_phase_path, software_path)
        shutil.copy(self.config_role_path, software_path)

        # write task prompt to software path
        with open(os.path.join(software_path, self.project_name + ".prompt"), "w") as f:
            f.write(self.task_prompt_raw)

        preprocess_msg = "**[Preprocessing]**\n\n"
        chat_gpt_config = ChatGPTConfig()

        preprocess_msg += "**ChatDev Starts** ({})\n\n".format(self.start_time)
        preprocess_msg += "**Timestamp**: {}\n\n".format(self.start_time)
        preprocess_msg += "**config_path**: {}\n\n".format(self.config_path)
        preprocess_msg += "**config_phase_path**: {}\n\n".format(self.config_phase_path)
        preprocess_msg += "**config_role_path**: {}\n\n".format(self.config_role_path)
        preprocess_msg += "**task_prompt**: {}\n\n".format(self.task_prompt_raw)
        preprocess_msg += "**project_name**: {}\n\n".format(self.project_name)
        preprocess_msg += "**Log File**: {}\n\n".format(self.log_filepath)
        preprocess_msg += "**ChatDevConfig**:\n {}\n\n".format(self.chat_env.config.__str__())
        preprocess_msg += "**ChatGPTConfig**:\n {}\n\n".format(chat_gpt_config)
        log_and_print_online(preprocess_msg)

        # init task prompt
        if check_bool(self.config['self_improve']):
            self.chat_env.env_dict['task_prompt'] = self.self_task_improve(self.task_prompt_raw)
        else:
            self.chat_env.env_dict['task_prompt'] = self.task_prompt_raw

    def post_processing(self):
        """
        summarize the production and move log files to the software directory
        Returns: None

        """

        self.chat_env.write_meta()
        filepath = os.path.dirname(__file__)
        # root = "/".join(filepath.split("/")[:-1])
        root = os.path.dirname(filepath)

        post_info = "**[Post Info]**\n\n"
        now_time = now()
        time_format = "%Y%m%d%H%M%S"
        datetime1 = datetime.strptime(self.start_time, time_format)
        datetime2 = datetime.strptime(now_time, time_format)
        duration = (datetime2 - datetime1).total_seconds()

        post_info += "Software Info: {}".format(
            get_info(self.chat_env.env_dict['directory'], self.log_filepath) + "\n\n🕑**duration**={:.2f}s\n\n".format(duration))

        post_info += "ChatDev Starts ({})".format(self.start_time) + "\n\n"
        post_info += "ChatDev Ends ({})".format(now_time) + "\n\n"

        if self.chat_env.config.clear_structure:
            directory = self.chat_env.env_dict['directory']
            for filename in os.listdir(directory):
                file_path = os.path.join(directory, filename)
                if os.path.isdir(file_path) and file_path.endswith("__pycache__"):
                    shutil.rmtree(file_path, ignore_errors=True)
                    post_info += "{} Removed.".format(file_path) + "\n\n"

        log_and_print_online(post_info)

        logging.shutdown()
        time.sleep(1)

        shutil.move(self.log_filepath,
                    os.path.join(root + "/WareHouse", "_".join([self.project_name, self.org_name, self.start_time]),
                                 os.path.basename(self.log_filepath)))

    # @staticmethod
    def self_task_improve(self, task_prompt):
        """
        ask agent to improve the user query prompt
        Args:
            task_prompt: original user query prompt

        Returns:
            revised_task_prompt: revised prompt from the prompt engineer agent

        """
        self_task_improve_prompt = """I will give you a short description of a software design requirement, 
please rewrite it into a detailed prompt that can make large language model know how to make this software better based this prompt,
the prompt should ensure LLMs build a software that can be run correctly, which is the most import part you need to consider.
remember that the revised prompt should not contain more than 200 words, 
here is the short description:\"{}\". 
If the revised prompt is revised_version_of_the_description, 
then you should return a message in a format like \"<INFO> revised_version_of_the_description\", do not return messages in other formats.""".format(
            task_prompt)
        role_play_session = RolePlaying(
            assistant_role_name="Prompt Engineer",
            assistant_role_prompt="You are an professional prompt engineer that can improve user input prompt to make LLM better understand these prompts.",
            user_role_prompt="You are an user that want to use LLM to build software.",
            user_role_name="User",
            task_type=TaskType.CHATDEV,
            task_prompt="Do prompt engineering on user query",
            with_task_specify=False,
            model_type=self.model_type,
        )

        # log_and_print_online("System", role_play_session.assistant_sys_msg)
        # log_and_print_online("System", role_play_session.user_sys_msg)

        _, input_user_msg = role_play_session.init_chat(None, None, self_task_improve_prompt)
        assistant_response, user_response = role_play_session.step(input_user_msg, True)
        revised_task_prompt = assistant_response.msg.content.split("<INFO>")[-1].lower().strip()
        log_and_print_online(role_play_session.assistant_agent.role_name, assistant_response.msg.content)
        log_and_print_online(
            "**[Task Prompt Self Improvement]**\n**Original Task Prompt**: {}\n**Improved Task Prompt**: {}".format(
                task_prompt, revised_task_prompt))
        return revised_task_prompt

Author: east

python September 11, 2023

Code walkthrough of codes.py in ChatDev, Tsinghua's open-source ChatGPT-driven automatic programming project

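This code defines a Codes class that manages generated code: it extracts, formats, updates, and saves code based on the LLM's replies.
The __init__() method is the constructor. It takes one parameter, generated_content, which is the content of the LLM's reply, and first initializes the following attributes:
- self.directory: a string, the directory where the code is saved.
- self.version: a float, the version number of the code.
- self.generated_content: a string, the content of the LLM's reply.
- self.codebooks: a dictionary holding the code, keyed by filename with the code content as the value.
It then defines two inner functions, extract_filename_from_line() and extract_filename_from_code(), which extract a filename from the LLM's reply. They take a parameter named lines or code respectively, each a portion of the reply, and both use the re module for regular-expression matching. extract_filename_from_line() returns the last name.ext token it finds, lowercased, or an empty string if nothing matches. extract_filename_from_code() takes the name of the last class defined in the code, lowercases it, and appends ".py"; because the suffix is always appended, it returns ".py" rather than an empty string when no class is found.
Next, if generated_content is not empty, the constructor uses the re module to iterate over the code segments enclosed in triple-backtick (```) fences in the reply, storing each one in the code variable. If code contains the string "CODE", the segment is skipped because it is a placeholder. It then calls extract_filename_from_line() on the text that precedes the fence and stores the result in filename. If code contains the string "__main__", filename is set to "main.py" regardless of what was extracted, because this is the main program file. If filename is still empty, it calls extract_filename_from_code() to derive the filename from code. Finally, it asserts that filename is not empty and stores the code, formatted by _format_code(), in the self.codebooks dictionary under that key.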
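The extraction step can be tried in isolation. The following is a minimal sketch, not ChatDev's code: it reuses the two regular expressions from the listing below, while the function name extract_code_blocks and the sample reply are assumptions made for illustration. The fence string is built at runtime only so that the sample does not contain a literal fence line.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime

# Same pattern as in Codes.__init__: the text before a fence, then the fenced body.
CODE_BLOCK = re.compile(r"(.+?)\n" + FENCE + r".*?\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(reply):
    """Return {filename: code}, using the last name.ext token found before each fence."""
    blocks = {}
    for match in CODE_BLOCK.finditer(reply):
        preceding, code = match.group(1), match.group(2)
        if "CODE" in code:  # placeholder segment, skipped as in the original
            continue
        names = re.findall(r"(\w+\.\w+)", preceding)
        filename = names[-1].lower() if names else ""
        if "__main__" in code:  # the main-program check overrides the extracted name
            filename = "main.py"
        if filename == "":
            continue  # the real class falls back to extract_filename_from_code() here
        blocks[filename] = code
    return blocks


reply = "\n".join([
    "utils.py",
    FENCE + "python",
    "def add(a, b):",
    "    return a + b",
    FENCE,
    "",
    "Main.py",
    FENCE + "python",
    "if __name__ == '__main__':",
    "    print(add(1, 2))",
    FENCE,
])
blocks = extract_code_blocks(reply)
print(sorted(blocks))
```

Because the filename is taken from free text, any name.ext token that happens to appear just before a fence (a version number such as 1.0, for instance) would be picked up as well, which is worth keeping in mind when reading the original logic.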
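The _format_code() method formats the code by removing blank lines. It takes one parameter, code, which is the code content. It splits the code on "\n", drops every line that is empty or contains only whitespace, joins the remaining lines back together with "\n", and returns the result.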
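A small standalone sketch of the same filtering, assuming nothing beyond the behaviour described above (format_code is a hypothetical free-function version of the method):

```python
def format_code(code):
    """Drop empty and whitespace-only lines, leaving the other lines unchanged."""
    return "\n".join(line for line in code.split("\n") if line.strip())


sample = "import os\n\n\ndef main():\n    \n    return os.getcwd()\n"
formatted = format_code(sample)
print(formatted)
```

One consequence of this approach is that blank lines inside multi-line string literals are removed too, so the formatted code is not always byte-for-byte equivalent in behaviour to the original.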
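The _update_codes() method updates the stored code by comparing, modifying, and saving it according to a new LLM reply. It takes one parameter, generated_content, which is the content of the new reply. It first creates a new Codes object, new_codes, passing generated_content to its constructor. It also creates a difflib.Differ() object (difflib is imported at the top of the file), although that object is not used afterwards; the diff itself comes from difflib.unified_diff(). It then iterates over the self.codebooks dictionary of new_codes. For each filename, it checks whether the key is missing from self.codebooks or whether the stored content differs from the new content. If so, the code needs updating and the following steps are performed:
- Create a string update_codes_content set to "**[Update Codes]**\n\n", marking the start of an update.
- Append "{} updated.\n".format(key) to update_codes_content to record which file was updated.
- Create two strings, old_codes_content and new_codes_content, set to the value under the same key in self.codebooks (the old code) and in new_codes (the new code). If the key does not exist in self.codebooks, old_codes_content is set to "# None".
- Use splitlines() to split old_codes_content and new_codes_content into lists of lines, lines_old and lines_new.
- Use difflib.unified_diff() to compute the differences between the two lists; it returns a generator, unified_diff.
- Use join() to turn the unified_diff generator into a single string and assign it back to unified_diff.
- Append "\n\n" to update_codes_content, then an opening ``` fence followed by two lines containing ''' (with an empty line between them), then unified_diff, then "\n" and a closing ``` fence, so that the diff is displayed as a code block.
- Call log_and_print_online() (imported from chatdev.utils) to write update_codes_content to the log file and print it.
- Assign the value under the same key in new_codes to self.codebooks, which updates the code.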
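To see what the diff step produces, here is a minimal sketch that calls difflib.unified_diff() with the same arguments as the listing below; the two code strings are made-up sample data.

```python
import difflib

old_code = "def greet():\n    print('hi')"
new_code = "def greet(name):\n    print('hi', name)"

# lineterm="" keeps the header lines free of trailing newlines,
# so the pieces can be joined with "\n" afterwards.
diff_lines = list(difflib.unified_diff(
    old_code.splitlines(),
    new_code.splitlines(),
    lineterm="",
    fromfile="Old",
    tofile="New",
))
diff_text = "\n".join(diff_lines)
print(diff_text)
```

The output starts with the "--- Old" and "+++ New" headers, followed by a hunk in which removed lines are prefixed with "-" and added lines with "+".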

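The _rewrite_codes() method rewrites the code: it writes the contents of the self.codebooks dictionary to disk. It takes one parameter, git_management, which says whether Git management is enabled. It first reads self.directory, the directory where the code is saved, and creates a string rewrite_codes_content that records the rewriting process. If the directory exists and is not empty, it increments self.version by 1.0, raising the version number of the code. If the directory does not exist, it creates it with os.mkdir() and appends "{} Created\n".format(directory) to rewrite_codes_content.
It then iterates over the entries in self.codebooks (filename and code content). For each entry it joins the directory and filename with os.path.join() to obtain the file path, stored in filepath. It opens the file with open() using UTF-8 encoding, writes the code with write(), and appends os.path.join(directory, filename) + " Wrote\n" to rewrite_codes_content to record that the file was written.
If git_management is true, it runs Git commands with os.system(): git init (only when self.version is 1.0), git add ., and git commit, using self.version as the commit message.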
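The listing below runs Git through os.system("cd {}; git ...".format(...)), which relies on a POSIX shell to interpret the ";" separator. A possible alternative, sketched here under that observation and not taken from ChatDev, is to pass an argument list to subprocess.run() with cwd set to the target directory. The function names git_commands and run_git are hypothetical.

```python
import subprocess


def git_commands(version):
    """Build the Git invocations described above as argument lists."""
    commands = []
    if version == 1.0:
        commands.append(["git", "init"])  # only for the first version
    commands.append(["git", "add", "."])
    commands.append(["git", "commit", "-m", str(version)])
    return commands


def run_git(directory, version):
    """Run each command inside `directory`; raises CalledProcessError on failure."""
    for command in git_commands(version):
        subprocess.run(command, cwd=directory, check=True)


print(git_commands(1.0))
```

Using cwd avoids building a shell string from the directory name, so paths containing spaces or shell metacharacters need no quoting, and check=True surfaces a failed commit instead of silently ignoring it.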
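Finally, it calls log_and_print_online() to write rewrite_codes_content to the log file and print it.
The _get_codes() method returns the code as a single string built from self.codebooks. It starts with an empty string, content, and iterates over the entries in self.codebooks. For each filename it appends "{}\n```{}\n{}\n```\n\n".format(filename, "python" if filename.endswith(".py") else filename.split(".")[-1], self.codebooks[filename]), which shows the filename followed by the code in a fenced block whose language tag is derived from the file extension. It then returns content.
The _load_from_hardware() method loads code from disk: it reads the files in a given directory and stores them. It takes one parameter, directory, the directory containing the code. It first asserts that the top level of the directory contains at least one file ending in .py, then walks the directory tree with os.walk(). Each file ending in .py is read with open() and read() into the code variable, formatted with _format_code(), and stored in self.codebooks under its filename. Note that the path is built by joining directory, not the root yielded by os.walk(), with the filename, so a .py file located in a subdirectory would not be found at that path. Finally, it calls log_and_print_online() to log and print "{} files read from {}".format(len(self.codebooks.keys()), directory), reporting how many files were read from the directory.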
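The directory walk can be sketched as follows. This is an illustrative variant, not the project's method: it joins each filename with the root yielded by os.walk() so that files in subdirectories are readable, and it keys the result by relative path, which is an assumption (the original keys by bare filename). load_python_files is a hypothetical name.

```python
import os
import tempfile


def load_python_files(directory):
    """Return {relative_path: source} for every .py file under `directory`."""
    codebooks = {}
    for root, _dirs, filenames in os.walk(directory):
        for filename in filenames:
            if filename.endswith(".py"):
                path = os.path.join(root, filename)  # root, not the top-level directory
                with open(path, "r", encoding="utf-8") as reader:
                    codebooks[os.path.relpath(path, directory)] = reader.read()
    return codebooks


# Small demonstration on a temporary tree with one nested file.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "pkg"))
    for rel in ("main.py", os.path.join("pkg", "util.py")):
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as writer:
            writer.write("# " + rel + "\n")
    loaded = load_python_files(tmp)

print(sorted(loaded))
```

The with statement also closes each file promptly, whereas open(...).read() leaves that to the garbage collector.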

The code of codes.py is as follows:

import os
import re

from chatdev.utils import log_and_print_online
import difflib

class Codes:
    def __init__(self, generated_content=""):
        self.directory: str = None
        self.version: float = 1.0
        self.generated_content: str = generated_content
        self.codebooks = {}

        def extract_filename_from_line(lines):
            file_name = ""
            for candidate in re.finditer(r"(\w+\.\w+)", lines, re.DOTALL):
                file_name = candidate.group()
                file_name = file_name.lower()
            return file_name

        def extract_filename_from_code(code):
            file_name = ""
            regex_extract = r"class (\S+?):\n"
            matches_extract = re.finditer(regex_extract, code, re.DOTALL)
            for match_extract in matches_extract:
                file_name = match_extract.group(1)
            file_name = file_name.lower().split("(")[0] + ".py"
            return file_name

        if generated_content != "":
            regex = r"(.+?)\n```.*?\n(.*?)```"
            matches = re.finditer(regex, self.generated_content, re.DOTALL)
            for match in matches:
                code = match.group(2)
                if "CODE" in code:
                    continue
                group1 = match.group(1)
                filename = extract_filename_from_line(group1)
                if "__main__" in code:
                    filename = "main.py"
                if filename == "":  # post-processing
                    filename = extract_filename_from_code(code)
                assert filename != ""
                if filename is not None and code is not None and len(filename) > 0 and len(code) > 0:
                    self.codebooks[filename] = self._format_code(code)

    def _format_code(self, code):
        code = "\n".join([line for line in code.split("\n") if len(line.strip()) > 0])
        return code

    def _update_codes(self, generated_content):
        new_codes = Codes(generated_content)
        differ = difflib.Differ()
        for key in new_codes.codebooks.keys():
            if key not in self.codebooks.keys() or self.codebooks[key] != new_codes.codebooks[key]:
                update_codes_content = "**[Update Codes]**\n\n"
                update_codes_content += "{} updated.\n".format(key)
                old_codes_content = self.codebooks[key] if key in self.codebooks.keys() else "# None"
                new_codes_content = new_codes.codebooks[key]

                lines_old = old_codes_content.splitlines()
                lines_new = new_codes_content.splitlines()

                unified_diff = difflib.unified_diff(lines_old, lines_new, lineterm='', fromfile='Old', tofile='New')
                unified_diff = '\n'.join(unified_diff)
                update_codes_content = update_codes_content + "\n\n" + """```
'''

'''\n""" + unified_diff + "\n```"

                log_and_print_online(update_codes_content)
                self.codebooks[key] = new_codes.codebooks[key]

    def _rewrite_codes(self, git_management) -> None:
        directory = self.directory
        rewrite_codes_content = "**[Rewrite Codes]**\n\n"
        if os.path.exists(directory) and len(os.listdir(directory)) > 0:
            self.version += 1.0
        if not os.path.exists(directory):
            os.mkdir(self.directory)
            rewrite_codes_content += "{} Created\n".format(directory)

        for filename in self.codebooks.keys():
            filepath = os.path.join(directory, filename)
            with open(filepath, "w", encoding="utf-8") as writer:
                writer.write(self.codebooks[filename])
                rewrite_codes_content += os.path.join(directory, filename) + " Wrote\n"

        if git_management:
            if self.version == 1.0:
                os.system("cd {}; git init".format(self.directory))
            os.system("cd {}; git add .".format(self.directory))
            os.system("cd {}; git commit -m \"{}\"".format(self.directory, self.version))

        log_and_print_online(rewrite_codes_content)

    def _get_codes(self) -> str:
        content = ""
        for filename in self.codebooks.keys():
            content += "{}\n```{}\n{}\n```\n\n".format(filename,
                                                       "python" if filename.endswith(".py") else filename.split(".")[
                                                           -1], self.codebooks[filename])
        return content

    def _load_from_hardware(self, directory) -> None:
        assert len([filename for filename in os.listdir(directory) if filename.endswith(".py")]) > 0
        for root, directories, filenames in os.walk(directory):
            for filename in filenames:
                if filename.endswith(".py"):
                    code = open(os.path.join(directory, filename), "r", encoding="utf-8").read()
                    self.codebooks[filename] = self._format_code(code)
        log_and_print_online("{} files read from {}".format(len(self.codebooks.keys()), directory))

Author: east

chatgpt September 11, 2023

Project structure and key code analysis of ChatDev, Tsinghua's open-source ChatGPT-driven automatic programming project

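A detailed analysis of the project follows:

Project overview: ChatDev is a project that creates customized software from ideas described in natural language. It carries out each stage of software development, including design, coding, testing, and documentation, through multi-agent collaboration ¹. ChatDev aims to provide an easy-to-use, highly customizable, and extensible framework based on large language models (LLMs) that serves as an ideal scenario for studying collective intelligence ¹.
What the project does: ChatDev lets users build the software they want from a simple natural-language description, with no programming knowledge or skills required. The user only needs to supply a simple idea, such as "I want a Gomoku game", and ChatDev's agents work together to produce a complete Gomoku program, including interface, logic, features, and documentation ¹. Along the way, the user can interact with the agents, giving feedback or changing requirements, to get a more satisfactory result ¹.
Project structure: the structure of the ChatDev project is as follows:
Detailed analysis of the key code:
- run.py: this file is the first code executed when the project runs. It imports the Company class defined in the chatdev module and creates a Company object, company. It then calls company.init() to initialize the company configuration and prints the company name and mission. Next it calls company.start() to start the company; this method creates a new thread that executes company.run(), the main loop of the company. It then calls company.input() to receive user input and passes it to company.handle_input(), which performs the corresponding operation, such as creating a new software project, viewing existing projects, or switching between online log mode and replay mode. Finally it calls company.stop() to stop the company and release resources.
- chatdev/company.py: this file defines the Company class, the core class of ChatDev, which represents a virtual software company. Its main attributes and methods are:
  - __init__(): the constructor of Company. It takes one parameter, config, the path of the company configuration file. It first calls load_config() to load the configuration file and stores the result in self.config. It then creates an empty list, self.projects, for the software projects the company creates; an empty dictionary, self.agents, for the agents the company owns; and an empty queue, self.queue, for messages exchanged between agents.
  - load_config(): loads the configuration file and returns a dictionary of configuration information. It opens the file with the json module and parses it into a Python object. It then validates the configuration, checking for example that the required fields are present and that the format is as expected. If the configuration is valid it returns the object; otherwise it raises an exception.
  - init(): initializes the company by creating agents and assigning roles according to the configuration. It iterates over the agents field and, for each agent, creates an Agent object from its type and name and adds it to self.agents, keyed by name. It then iterates over the roles field and, for each role, creates a Role object from its name and member list and adds it to self.agents, keyed by name. Finally it iterates over the relations field and, for each relation, creates a Relation object from its type and member list and adds it to self.agents, keyed by type.
  - start(): starts the company. It creates a new thread that executes self.run() and stores it in self.thread.
  - stop(): stops the company. It puts the special message "STOP", a termination signal, on self.queue and waits for self.thread to finish.
  - run(): the main loop of the company. It repeatedly takes a message from self.queue and handles it according to its content. If the message is "STOP", the termination signal, it leaves the loop and the thread ends. If the message is a tuple (msg, sender, receiver), a communication message between agents, it calls handle_message() to process it. Any other kind of message is treated as an error: it prints an error message and ignores the message.
  - handle_message(): handles communication messages between agents. It takes three parameters, msg, sender, and receiver, which represent the message content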

Author: east

python September 8, 2023

Combining images into a GIF automatically in Python, with all images resized to match the first one

I found some code online for combining images into a GIF, but it failed when I ran it:
Traceback (most recent call last):
  File "D:\code\python\binance-quantization-master\tools\giftool.py", line 5, in <module>
    import imageio.v3 as iio
ModuleNotFoundError: No module named 'imageio.v3'

I had already run pip install imageio to install the module. The installed version was probably too old, so I upgraded it: pip install --upgrade imageio

I then tried it on a few arbitrary images:

Traceback (most recent call last):
  File "D:\code\python\binance-quantization-master\tools\giftool.py", line 16, in <module>
    iio.imwrite('movie.gif', images, duration=3, loop=0)
  File "D:\aiBigData\anaconda3\lib\site-packages\imageio\v3.py", line 147, in imwrite
    encoded = img_file.write(image, **kwargs)
  File "D:\aiBigData\anaconda3\lib\site-packages\imageio\plugins\pillow.py", line 389, in write
    ndimage = np.stack(ndimage, axis=0)
  File "D:\aiBigData\anaconda3\lib\site-packages\numpy\core\shape_base.py", line 449, in stack
    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape

The error occurs because all input image arrays must have the same shape: when building a GIF, every image must have the same width and height. In practice, source images often differ slightly in size.

A simple fix is to resize the images with the PIL library.

The original code:

import imageio.v3 as iio
import os

png_dir = 'images'
images = []

# list file in folder 'images' and sort them by name
image_list = [os.path.join(png_dir, f) for f in os.listdir(png_dir) if f.endswith('.png')]
image_list.sort()

# append images to list
for file_name in image_list:
    images.append(iio.imread(file_name))

# save as gif file
iio.imwrite('movie.gif', images, duration=3, loop=0)

The modified code:

import imageio.v3 as iio
import os
from PIL import Image

png_dir = 'd:\\tmp'
images = []

# list file in folder 'images' and sort them by name
image_list = [os.path.join(png_dir, f) for f in os.listdir(png_dir) if f.endswith('.png')]
image_list.sort()


# Get the size of the first image (image_list already holds full paths)
first_image = Image.open(image_list[0])
target_size = first_image.size

# Load each image and resize it to the target size
for file_name in image_list:
    image = Image.open(file_name)
    resized_image = image.resize(target_size)
    images.append(resized_image)


# save as gif file
iio.imwrite('movie.gif', images, duration=3, loop=0)
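Resizing every frame directly to the first image's size stretches any image whose aspect ratio differs. If that matters, one option is to scale each image so that it fits inside the target size and then center it on a canvas of the target size. The sketch below shows only the arithmetic, in plain Python; fit_within and paste_offset are hypothetical helper names, and the actual resizing and pasting would still be done with PIL.

```python
def fit_within(size, target):
    """Scale (width, height) to fit inside `target` while keeping the aspect ratio."""
    width, height = size
    target_w, target_h = target
    scale = min(target_w / width, target_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def paste_offset(size, target):
    """Top-left offset that centers an image of `size` on a canvas of `target` size."""
    return (target[0] - size[0]) // 2, (target[1] - size[1]) // 2


scaled = fit_within((400, 300), (200, 200))
offset = paste_offset(scaled, (200, 200))
print(scaled, offset)
```

With these two values, each frame could be resized to scaled and pasted at offset onto a blank image of the target size, so that every frame has the same shape without distortion.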

Author: east

Java, python September 7, 2023

Generating flowcharts automatically with ChatGPT

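When reading someone else's code, we always wish there were a flowchart to make everything clear at a glance. For our own code, though, we do not want to spend hours drawing one. Is there a better way?

The answer is to have a large model such as ChatGPT generate the flowchart, and then render it automatically with Python or another language.

1. Generate the flowchart in Mermaid syntax

ChatGPT prompt:

Generate a flowchart for the code below and output it in Mermaid syntax.

2. Render the Mermaid flowchart as an image

To turn Mermaid syntax into a flowchart image with Python or Java, you can use the following methods:

Python method:

Use the mermaid-cli tool to convert Mermaid code into an image. First install mermaid-cli, which provides the mmdc executable.
Then call mmdc from Python.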

import os

def generate_mermaid_image(mermaid_code, output_path):
    with open("temp.mmd", "w") as file:
        file.write(mermaid_code)
    os.system(f"mmdc -i temp.mmd -o {output_path}")
    os.remove("temp.mmd")

mermaid_code = """
graph TD;
    A-->B;
    A-->C;
    B-->D;
    C-->D;
"""

generate_mermaid_image(mermaid_code, "output.png")

Java method:

As with the Python method, first install mermaid-cli.
Then use Java's Runtime class to call mmdc.

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class MermaidGenerator {

    public static void generateMermaidImage(String mermaidCode, String outputPath) throws IOException, InterruptedException {
        File tempFile = File.createTempFile("temp", ".mmd");
        try (FileWriter writer = new FileWriter(tempFile)) {
            writer.write(mermaidCode);
        }

        Process process = Runtime.getRuntime().exec("mmdc -i " + tempFile.getAbsolutePath() + " -o " + outputPath);
        process.waitFor();

        tempFile.delete();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String mermaidCode = """
            graph TD;
                A-->B;
                A-->C;
                B-->D;
                C-->D;
            """;
        generateMermaidImage(mermaidCode, "output.png");
    }
}

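Note that both methods require mermaid-cli to be installed and configured on your machine. Both also store the Mermaid code in a temporary file and delete it after the conversion. This simplifies the call to mmdc, and you can adjust it as needed.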
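The Python version above writes to a fixed temp.mmd in the current directory and builds a shell string. A variation, offered as a sketch and not as the only way to do it, uses a uniquely named temporary file and passes the arguments as a list through subprocess.run(). Only the -i and -o options already used above are assumed; build_mmdc_command is a hypothetical helper name.

```python
import os
import shutil
import subprocess
import tempfile


def build_mmdc_command(input_path, output_path, executable="mmdc"):
    """Return the mmdc invocation as an argument list (no shell involved)."""
    return [executable, "-i", input_path, "-o", output_path]


def generate_mermaid_image(mermaid_code, output_path):
    """Render Mermaid source to an image; raises if mmdc is missing or fails."""
    executable = shutil.which("mmdc")
    if executable is None:
        raise FileNotFoundError("mmdc not found on PATH; install mermaid-cli first")
    # A uniquely named file avoids overwriting an existing temp.mmd.
    with tempfile.NamedTemporaryFile(
            "w", suffix=".mmd", delete=False, encoding="utf-8") as tmp:
        tmp.write(mermaid_code)
        tmp_path = tmp.name
    try:
        subprocess.run(build_mmdc_command(tmp_path, output_path, executable), check=True)
    finally:
        os.remove(tmp_path)  # clean up even if mmdc fails


print(build_mmdc_command("diagram.mmd", "my output.png"))
```

Because the output path is passed as its own list element, a path containing spaces needs no quoting, and check=True turns a failed conversion into an exception instead of a silent no-op.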

Author: east

python September 7, 2023

Working with PPT files in Python

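To automate the creation of a PowerPoint presentation, adding only text without modifying graphics, you can use Python with the python-pptx library. A detailed solution follows:

Step 1: Prepare a PowerPoint template

Create a PowerPoint template containing the styles, layouts, and placeholder text boxes you want. Make sure the template has a text box (placeholder) at every position where text will be added.

Step 2: Install the python-pptx library

Install python-pptx, a library for generating PowerPoint files, with pip.

pip install python-pptx

Step 3: Write the Python script

Create a Python script that generates the presentation automatically. An example script: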

from pptx import Presentation

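# 1. Open the PowerPoint template
ppt = Presentation('your_template.pptx')

# 2. Select the slide and the text box (placeholder) to fill
slide_index = 0  # slide index, starting at 0
textbox_index = 0  # text box index within the slide, starting at 0

slide = ppt.slides[slide_index]
textbox = slide.shapes[textbox_index]

# 3. Add the text to the text box (pictures and some other shapes cannot hold text)
text_to_add = "This is the text to add."
if not textbox.has_text_frame:
    raise ValueError("The selected shape has no text frame")
textbox.text = text_to_add

# 4. Save the generated PowerPoint file
ppt.save('generated_presentation.pptx')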

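In the script, you specify the index of the slide and of the text box to fill, and then assign the text to the text box's .text property.

Step 4: Run the script

Run the Python script. It opens the PowerPoint template, adds the specified text, and saves the generated PowerPoint file.

This solution builds the presentation from an existing PowerPoint template and only adds text, without modifying graphics or styles. You can extend the script to add different text to multiple slides and text boxes. Make sure the layout of the template matches what the script expects in order to get the desired result.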

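Author: east

mysql, big data development, prompts September 7, 2023

Generating massive test data in bulk with Java, and a ChatGPT prompt that does it in one step

In big data development, performance testing and similar tasks can require tens of millions of rows, or even terabytes or petabytes of data. A test environment often does not hold that much, in which case you can generate test data yourself.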

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.Random;

public class TestDataGenerator {
    public static void main(String[] args) {
        String url = "jdbc:mysql://localhost:3306/your_database";
        String username = "your_username";
        String password = "your_password";
        int batchSize = 1000; // number of rows inserted per batch
        int totalRecords = 1000000; // total number of rows to generate

        // Placeholder statement: replace "...other columns..." and "...other values..."
        // with the real columns and parameters of your table before running
        String insertQuery = "INSERT INTO test (id, callid, type, ...other columns...) VALUES (?, ?, ?, ...other values...)";

        // try-with-resources closes the statement and the connection even if an insert fails
        try (Connection connection = DriverManager.getConnection(url, username, password);
             PreparedStatement preparedStatement = connection.prepareStatement(insertQuery)) {
            connection.setAutoCommit(false);

            Random random = new Random();

            for (int i = 1; i <= totalRecords; i++) {
                // Set the value of each column; adapt the generation logic to your table
                preparedStatement.setLong(1, i);
                preparedStatement.setString(2, "CallSheet" + i);
                preparedStatement.setString(3, "Type" + (random.nextInt(5) + 1));
                // Set the values of the other columns...

                preparedStatement.addBatch();

                // Send and commit a full batch
                if (i % batchSize == 0) {
                    preparedStatement.executeBatch();
                    connection.commit();
                }
            }

            // Send and commit the final, possibly partial, batch
            preparedStatement.executeBatch();
            connection.commit();

            System.out.println("Test data generation complete!");
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

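Adjust the database connection settings and the insert logic in the example above to match your own database and table structure. The program then inserts a large volume of test data into the database.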
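The batching pattern itself (collect rows, send them in groups, commit per group, flush the remainder) can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module in place of MySQL purely so that it runs anywhere; the table layout mirrors the three columns used in the Java example, and generate_rows and insert_in_batches are hypothetical names.

```python
import random
import sqlite3


def generate_rows(total, seed=0):
    """Yield (id, callid, type) tuples like those built in the Java loop."""
    rng = random.Random(seed)
    for i in range(1, total + 1):
        yield (i, f"CallSheet{i}", f"Type{rng.randint(1, 5)}")


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert rows with executemany, committing once per batch; returns the row count."""
    sql = "INSERT INTO test (id, callid, type) VALUES (?, ?, ?)"
    batch, inserted = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            conn.executemany(sql, batch)
            conn.commit()
            inserted += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        conn.executemany(sql, batch)
        conn.commit()
        inserted += len(batch)
    return inserted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, callid TEXT, type TEXT)")
count = insert_in_batches(conn, generate_rows(2500), batch_size=1000)
total_in_table = conn.execute("SELECT COUNT(*) FROM test").fetchone()[0]
print(count, total_in_table)
```

Generating rows lazily keeps memory use bounded by the batch size, which matters once the total reaches tens of millions of rows.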

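A more convenient approach is to give a large model such as ChatGPT the following prompt:

Based on the table structure below, generate 1 million rows of test data and provide detailed Java code or stored-procedure code: [table structure]

In my own test, New Bing produced code that ran.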

Author: east

Flink September 5, 2023

Flink CDC reports an error when connecting to the data source: you need (at least one of) the REPLICATION SLAVE privilege(s) for this operation Error code

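This error message indicates a permission problem when Flink CDC connects to a MySQL database. The message says the "REPLICATION SLAVE privilege" is required for the operation, and the current user does not have it.

Cause:

Flink CDC captures database changes from MySQL's binary log for real-time stream processing. The MySQL user therefore needs the REPLICATION SLAVE privilege, which allows Flink CDC to read the binary log as if it were a MySQL replica.

Solution: to resolve the problem, take the following steps:

Grant the REPLICATION SLAVE privilege:
- Log in to MySQL as root or as another user with sufficient privileges.
- Run the following SQL command to grant REPLICATION SLAVE to the user that Flink CDC connects as (the username specified in the Flink configuration): GRANT REPLICATION SLAVE ON *.* TO 'your_cdc_user'@'%' IDENTIFIED BY 'your_password';
  - Replace your_cdc_user with the username Flink CDC uses.
  - Replace your_password with the password Flink CDC uses.
  - On MySQL 8.0 and later, GRANT no longer accepts the IDENTIFIED BY clause. Create the user first with CREATE USER 'your_cdc_user'@'%' IDENTIFIED BY 'your_password'; and then run the GRANT statement without that clause.
Restart the Flink CDC application:
- Restart the Flink CDC application and let it reconnect to the MySQL database.
Check the Flink CDC configuration:
- Make sure the connection string, username, and password in the Flink CDC configuration match the MySQL database settings.
Check the firewall and network configuration:
- Make sure the firewall and network configuration of the MySQL server allow the Flink CDC application to connect to the database port.
Check the MySQL error log:
- Look in the MySQL error log for more detail about the denied access; it may give further clues about the cause.
Upgrade or reconfigure Flink CDC:
- If the problem persists, consider upgrading Flink CDC or switching to a version that is compatible with your MySQL database.

After these steps, the permission problem that occurs when Flink CDC connects to MySQL should be resolved. Make sure sufficient privileges are granted, and check the configuration for accuracy.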
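Before restarting the job, it can help to confirm that the grant took effect by running SHOW GRANTS FOR 'your_cdc_user'@'%'; and inspecting the output. The sketch below checks such output lines for the privilege; it only parses text, the sample lines are made up, and has_replication_slave is a hypothetical helper name.

```python
import re


def has_replication_slave(grant_lines):
    """True if a global (ON *.*) grant line includes REPLICATION SLAVE or ALL PRIVILEGES."""
    for line in grant_lines:
        match = re.match(r"GRANT\s+(.*?)\s+ON\s+\*\.\*\s+TO\s", line, re.IGNORECASE)
        if not match:
            continue  # database-level grants cannot carry this global privilege
        privileges = {p.strip().upper() for p in match.group(1).split(",")}
        if "REPLICATION SLAVE" in privileges or "ALL PRIVILEGES" in privileges:
            return True
    return False


granted = has_replication_slave([
    "GRANT SELECT, REPLICATION SLAVE ON *.* TO `cdc`@`%`",
])
missing = has_replication_slave([
    "GRANT USAGE ON *.* TO `cdc`@`%`",
    "GRANT ALL PRIVILEGES ON `shop`.* TO `cdc`@`%`",
])
print(granted, missing)
```

The second sample shows a common trap: ALL PRIVILEGES on a single database does not include REPLICATION SLAVE, because that privilege can only be granted globally with ON *.*.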


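Author: east

big data development September 4, 2023

Comparing date strings in Scala

Use the string compareTo method: if your date strings are in "year-month-day" order, with zero-padded, fixed-width fields such as 2023-09-03, you can compare them directly with compareTo, without converting them to date objects. The zero padding matters because the comparison is character by character. For example, the following Scala code compares two date strings¹: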

val date1 = "2023-09-03"
val date2 = "2023-08-21"
val result = date1.compareTo(date2)
// result: Int = 1
// result > 0 means date1 is later than date2
// result < 0 means date1 is earlier than date2
// result == 0 means date1 equals date2
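The comparison is lexicographic, so it matches chronological order only when every field is zero-padded to a fixed width. The sketch below illustrates the pitfall and a parsing-based fallback. It is written in Python, the language of the other sketches on this page, and not in Scala; Python's string comparison orders these strings the same way as compareTo does. compare_date_strings is a hypothetical helper name.

```python
from datetime import date


def compare_date_strings(a, b):
    """Compare 'year-month-day' strings by parsing them; returns -1, 0, or 1."""
    da = date(*map(int, a.split("-")))
    db = date(*map(int, b.split("-")))
    return (da > db) - (da < db)


# Zero-padded strings: string order agrees with date order.
padded_later = "2023-09-03" > "2023-08-21"

# Unpadded strings: "9" sorts after "1", so September wrongly compares as later than October.
unpadded_later = "2023-9-3" > "2023-10-21"

# Parsing the fields gives the correct answer for the unpadded case.
parsed = compare_date_strings("2023-9-3", "2023-10-21")
print(padded_later, unpadded_later, parsed)
```

If the input format cannot be guaranteed, parsing into date objects before comparing is the safer choice.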

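Author: east

python September 3, 2023

A Python utility for reading .doc and .docx Word documents

Requirement:
Read all Word documents in a directory, remove lines that are entirely blank, and print each file name together with the document content.

For files with the .docx extension:

This uses the third-party library python-docx to process Word documents. Before running the code, install the library with the following command:

pip install python-docx
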
import os
from docx import Document

class FunnyScriptsReader:
    def __init__(self, directory):
        self.directory = directory

    def process_scripts(self):
        for filename in os.listdir(self.directory):
            # python-docx only reads .docx; skip Word lock files such as ~$name.docx
            if filename.lower().endswith('.docx') and not filename.startswith('~$'):
                file_path = os.path.join(self.directory, filename)
                self.process_script_file(file_path)

    def process_script_file(self, file_path):
        document = Document(file_path)
        file_name = os.path.basename(file_path)

        # Print the file name and the content, skipping paragraphs that are
        # empty or contain only whitespace (the file on disk is not modified)
        print("File name:", file_name)
        print("Content:")
        for paragraph in document.paragraphs:
            if paragraph.text.strip():
                print(paragraph.text)

        print()

# Usage example
# A raw string cannot end with a backslash, so the trailing one is omitted
directory = r'D:\BaiduNetdiskDownload'
reader = FunnyScriptsReader(directory)
reader.process_scripts()

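python-docx cannot read Word documents in the legacy .doc format. For those, you can use the win32com library (part of pywin32), which requires Windows with Microsoft Word installed.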

pip install pywin32

import os
import win32com.client

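# Delete blank paragraphs from a Word document.
# Iterates from the end so that a deletion does not shift paragraphs still to be visited.
def remove_blank_lines(doc):
    for index in range(doc.Paragraphs.Count, 0, -1):
        paragraph = doc.Paragraphs(index)
        if paragraph.Range.Text.strip() == "":
            paragraph.Range.Delete()

# Directory that holds the Word documents
dir_path = r'D:\BaiduNetdiskDownload'

# Create a Word application object and keep it invisible
word = win32com.client.Dispatch("Word.Application")
word.Visible = False

try:
    # Walk the files in the directory and pick those ending in .doc
    for file in os.listdir(dir_path):
        if file.lower().endswith(".doc"):
            # Build the full path of the file
            file_path = os.path.join(dir_path, file)
            try:
                # Open the Word document
                doc = word.Documents.Open(file_path)
                # Remove blank paragraphs from the in-memory document
                remove_blank_lines(doc)
                # Print the file name and the document content
                print("File name:", file)
                print("Content:", doc.Content.Text)
                # Close without saving, so the file on disk is left unchanged
                doc.Close(False)
            except Exception as e:
                # Report the error and continue with the next file
                print("Error:", e)
finally:
    # Quit Word even if something above fails
    word.Quit()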

Author: east

ad networks September 3, 2023

A detailed guide to applying for AdSense

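Getting into Google AdSense is a goal for many site owners and bloggers, because it lets you earn revenue by showing Google ads on your website. Applying is not easy, however: you must meet a number of conditions and requirements before your account passes review and is activated. This article is a detailed guide to applying for AdSense, intended to help you get through the process smoothly.

Prepare your website. Before applying for AdSense, make sure your site complies with Google's [content policies] and [quality guidelines] and has enough high-quality, original content. The site should have a clear topic and purpose, a professional design that is easy to navigate, complete [privacy policy] and [disclaimer] pages, and a working way to contact you. You also need to make sure the site does not violate copyright or trademark law; does not contain pornographic, violent, hateful, or illegal content; does not deceive or mislead users; does not take part in cheating or abuse; and does not use software or tools that violate Google's ad policies. If your site does not meet these standards, your application may be rejected or your account suspended.
Register a Google account. If you do not have a Google account yet, [register] one first. If you already have one, you can use it to apply for AdSense. Note that each person may have only one AdSense account; if you already have one, you cannot apply for another. To use AdSense on several websites, add those sites to your existing account.
Fill in the application form. When you are ready to apply, visit the [AdSense website], click the "Get started" button, and follow the prompts to fill in the form. You need to provide the following information:
- The address of the website where you want to show ads (for example: https://www.example.com)
- The language you will use with AdSense (for example: Chinese)
- Your personal information (for example: name, address, phone number, email address)
- Your payment information (for example: bank account, tax information)
- Your agreement to the AdSense terms and conditions
- Your choice of AdSense email preferences
Place the ad code. After you submit the form, you receive a confirmation email and see an ad code in your AdSense account. The ad code is a piece of HTML that you copy and paste into the pages where you want ads to appear. During review it shows blank or test ads and generates no revenue. This step lets Google check whether your site complies with AdSense policies and can display ads correctly. Note that placing the ad code does not mean your AdSense account has been approved; you still have to wait for the review result.
Wait for the review result. Once the ad code is in place, Google starts reviewing your site and application. This can take from a few days to a few weeks, depending on the type of site and your region. During this time, do not remove or modify the ad code, and keep the quality of your content and traffic under control. When the review is complete, you receive an email with the result. If you are approved, you can start showing Google ads on your site and earning revenue. If you are rejected, the email explains why and offers suggestions for improvement. You can revise your site accordingly and apply again after 14 days.

Once your AdSense application is approved, the next question is how to monetize. Below is a tutorial on running a personal site fully automatically: a crawler scrapes articles, Google Translate translates them, and they are published to WordPress automatically.
Reference tutorial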

Author: east