
[Special Lecture] (Pusan National University AI Colloquium 2024) Prof. Jee-Hyong Lee, Sungkyunkwan University, Code Intelligence and Language Models

by Llogy 2025. 2. 4.

Finally getting around to posting this writeup.

 

This was a special lecture held on 11/22.

I heard Prof. Jee-Hyong Lee, who teaches the deep learning course, was giving a special lecture, so I left work at 2 and tuned in over Zoom. haha

https://his.pusan.ac.kr/maic/68731/subview.do?enc=Zm5jdDF8QEB8JTJGYmJzJTJGbWFpYyUyRjE3Mjg1JTJGMTY5NzM1MSUyRmFydGNsVmlldy5kbyUzRg%3D%3D

 

Medical AI Convergence Talent Training Program, Pusan National University (his.pusan.ac.kr)

 

Unfortunately, no slides were shared with us.

 

I only listened during the lecture and didn't take proper notes, so I'm posting just a memo of the topics covered.

One clear impression: a lot of the papers are from China. (Some of them are from the professor's own group.)

 

TMI: this was actually the day I first heard of DeepSeek, but the catch is I still haven't tried it because it's Chinese.


Lecture title: Large Language Models for Code Intelligence

 

* Various Structures in Code

cf. Paper: A Survey of Deep Learning Models for Structural Code Understanding (arXiv 2022)

 

* Understanding Tasks

- Code Clone Detection

cf. Paper: Towards an Understanding of Large Language Models in Software Engineering Tasks (arXiv 2023)
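
Not from the slides, just my own note on what the task looks like in practice: clone detection boils down to scoring whether two snippets do the same thing. A minimal sketch using CodeBERT embeddings and cosine similarity; the checkpoint name and the 0.9 threshold are my assumptions, not something the lecture specified.

```python
# Minimal sketch (not from the lecture): scoring two snippets for clone-ness
# with CodeBERT embeddings. The checkpoint and the 0.9 threshold are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(code: str) -> torch.Tensor:
    # Mean-pool the last hidden states into a single vector per snippet.
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)

a = "def add(x, y):\n    return x + y"
b = "def sum_two(a, b):\n    result = a + b\n    return result"

similarity = torch.cosine_similarity(embed(a), embed(b), dim=0).item()
print(f"cosine similarity = {similarity:.3f}, clone = {similarity > 0.9}")
```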

 

* Generation Tasks

- Code Summarization

- Code Evaluation and Test-case Generation

...

=> Most of these tasks require interaction between natural language and programming languages
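
As a reminder to myself of what that NL-to-PL interaction looks like, here is a tiny code-summarization call. The client usage and the model name are my own choices for illustration, not something shown in the lecture.

```python
# Minimal sketch (mine, not from the lecture) of a generation task that maps
# PL -> NL: summarizing a function with an instruction-tuned LLM.
# The client usage and model name ("gpt-4o-mini") are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

code = """
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Summarize the given code in one sentence."},
        {"role": "user", "content": code},
    ],
)
print(response.choices[0].message.content)
```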

 

* Pretrained LMs for Code Intelligence

- Open Source Dataset: CodeSearchNet Challenge

cf. Paper: Large Language Models for Software Engineering: A Systematic Literature Review (arXiv 2023)

- Small Pretrained LMs

>> CodeBERT (Findings of EMNLP 2020), GraphCodeBERT (ICLR 2021), CodeT5 (EMNLP 2021), UniXcoder (ACL 2022), CodeT5+ (EMNLP 2023), Code Llama (arXiv 2023.08), DeepSeek-Coder-V2 (arXiv 2024.06)

 

* Challenges in Code Intelligence

- Semantic Gap between Pretraining & Fine-Tuning

cf. Paper: CodePrompt: Task-Agnostic Prefix Tuning for Program and Language Generation (Findings of ACL 2023)

- Robustness for Code Summarization

cf. Paper: Adversarial Robustness of Deep Code Comment Generation (TOSEM 2022)

- Robustness for Code Search

cf. Paper: DIP: Dead code Insertion based Black-box Attack for Programming Language Model (ACL 2023)
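
My own simplified note on what a dead-code-insertion attack perturbs (this is not the DIP algorithm itself, just the general idea): add statements that can never execute, so the program's behavior is unchanged but the model sees a different token sequence.

```python
# Simplified sketch of dead-code insertion (my own, not the DIP algorithm):
# the inserted block never runs, so semantics are preserved while the input
# to the victim model is perturbed.
def insert_dead_code(func_source: str,
                     dead_stmt: str = "if False:\n        unused = 0") -> str:
    lines = func_source.splitlines()
    # Naively place the dead block right after the function signature.
    return "\n".join([lines[0], "    " + dead_stmt] + lines[1:])

original = "def add(x, y):\n    return x + y"
perturbed = insert_dead_code(original)
print(perturbed)
# A black-box attack would then query the victim model (e.g., a code search
# ranker) with many such variants and keep the one that degrades its score most.
```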

- Robustness for Code Generation

cf. Paper: ReCode: Robustness Evaluation of Code Generation Models (ACL 2023)

- Hallucination in Code Generation

cf. Paper: Exploring and Evaluating Hallucinations in LLM-Powered Code Generation (arXiv 2024.04)

- Hallucination -> Retrieval-Augmented Code Generation (RACG)

cf. Paper: A Survey on Large Language Models for Code Generation (arXiv 2024.06)
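
Rough sketch of the RACG idea as I understood it: retrieve similar snippets first, then condition generation on them so the model has real APIs to copy from instead of hallucinating them. The retriever, corpus, and embed function here are hypothetical placeholders.

```python
# RACG sketch (assumption-level illustration, not a pipeline from the survey).
import numpy as np

def retrieve(query: str, corpus: list[str], embed, k: int = 3) -> list[str]:
    # Rank corpus snippets by cosine similarity to the query embedding.
    q = embed(query)
    docs = [embed(doc) for doc in corpus]
    scores = [float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d) + 1e-8))
              for d in docs]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

def build_prompt(task: str, examples: list[str]) -> str:
    # Ground the generator on retrieved snippets.
    context = "\n\n".join(f"# Reference snippet:\n{ex}" for ex in examples)
    return f"{context}\n\n# Task: {task}\n# Write the code:\n"

# Usage (hypothetical): prompt = build_prompt(task, retrieve(task, corpus, embed))
#                       code   = llm.generate(prompt)
```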

- Evaluation of Generated Code -> CodeBERTScore

cf. Paper: CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code (EMNLP 2023)
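
As I remember it, CodeBERTScore follows the BERTScore recipe on code: embed the tokens of the generated code and the reference, then greedily match them by cosine similarity. A rough sketch of that idea, not the official implementation; the checkpoint name is my assumption.

```python
# BERTScore-style soft F1 over code tokens (rough sketch, not the official code).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
enc = AutoModel.from_pretrained("microsoft/codebert-base")

def token_embeddings(code: str) -> torch.Tensor:
    ids = tok(code, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = enc(**ids).last_hidden_state.squeeze(0)   # (tokens, dim)
    return torch.nn.functional.normalize(out, dim=-1)

def soft_f1(candidate: str, reference: str) -> float:
    sim = token_embeddings(candidate) @ token_embeddings(reference).T  # cosine matrix
    precision = sim.max(dim=1).values.mean()   # best reference match per candidate token
    recall = sim.max(dim=0).values.mean()      # best candidate match per reference token
    return (2 * precision * recall / (precision + recall)).item()

print(soft_f1("return x + y", "return a + b"))
```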

- Evaluation of Generated Code -> HumanEval Dataset and pass@k metric

cf. Paper: Evaluating Large Language Models Trained on Code (arXiv 2021.07)
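
The pass@k metric itself is easy to pin down, since the HumanEval paper gives an unbiased estimator: generate n samples per problem, count the c that pass the tests, and estimate the chance that at least one of k draws passes. The numerically stable form below follows that paper.

```python
# pass@k: unbiased estimate of P(at least one of k samples passes),
# given n >= k samples per problem of which c passed.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product.
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g., 200 samples per problem, 37 passing -> report pass@1 / pass@10 / pass@100
print(pass_at_k(200, 37, 1), pass_at_k(200, 37, 10), pass_at_k(200, 37, 100))
```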

- Evaluation of Generated Code -> how to generate correct code

cf. Paper: CodeT: Code Generation with Generated Tests (ICLR 2023)
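
My shorthand for the CodeT idea (simplified from the paper): sample both candidate solutions and test cases from the model, execute everything, and trust the candidates whose set of passed tests is large and shared by many other candidates.

```python
# Simplified "dual execution agreement" sketch in the spirit of CodeT.
from collections import defaultdict

def run_candidate(candidate_src: str, test_srcs: list[str]) -> frozenset[int]:
    passed = set()
    for i, test in enumerate(test_srcs):
        env: dict = {}
        try:
            exec(candidate_src, env)   # define the function under test
            exec(test, env)            # assert-style test; raises on failure
            passed.add(i)
        except Exception:
            pass
    return frozenset(passed)

def rank_candidates(candidates: list[str], tests: list[str]) -> list[str]:
    groups: dict[frozenset[int], list[str]] = defaultdict(list)
    for cand in candidates:
        groups[run_candidate(cand, tests)].append(cand)
    # Score each consensus group by (#candidates in group) * (#tests passed).
    best = max(groups.items(), key=lambda kv: len(kv[1]) * len(kv[0]))
    return best[1]  # candidates from the highest-scoring group
```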

- Generating Test Cases -> Generating New Test Cases Using LLMs

cf. Paper: Measuring Coding Challenge Competence With APPS (NeurIPS 2021)

- Generating Test Cases -> Executability and Coverage in Test Case Generation

cf. developers.google.com: The Chromium Chronicle: Code Coverage in Gerrit

- Generating Test Cases -> Automatic Test Case Generation

cf. Paper: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models (ICSE 2023)
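
Again just my own note, not from the slides: the two checks that keep coming up for generated tests are "does it execute" and "what does it cover". A minimal version with coverage.py; the target module and the test string are placeholders, and the module has to actually exist for the report step to work.

```python
# Minimal executability + coverage check for an LLM-generated test (sketch).
# "target_module.py" and the test string are placeholders.
import coverage

generated_test = """
from target_module import fib
assert fib(0) == 0
assert fib(5) == 5
"""

cov = coverage.Coverage(include=["target_module.py"])
cov.start()
try:
    exec(generated_test, {})      # executability check: failures raise here
    executable = True
except Exception:
    executable = False
cov.stop()

percent = cov.report()            # line-coverage percentage for the target
print(f"executable={executable}, coverage={percent:.1f}%")
```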

- Multi-modal : Text, Table, and Code (Text-to-SQL)

cf. Paper: SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (arXiv 2023.06)

cf. Paper: How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings (arXiv 2023.05)

cf. Paper: Re-examining the Role of Schema Linking in Text-to-SQL (EMNLP 2020)
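
For the Text-to-SQL part, what stuck with me is how much rides on how the schema is serialized into the prompt. A zero-shot template in the spirit of the prompting study above; this exact wording is my assumption, not a slide.

```python
# Zero-shot Text-to-SQL prompt sketch: serialize the schema, then ask for SQL.
schema = """
Table students(id INT, name TEXT, dept_id INT)
Table departments(dept_id INT, dept_name TEXT)
"""

question = "How many students are enrolled in the Computer Science department?"

prompt = f"""Given the database schema:
{schema}
Translate the question into a single SQL query.
Question: {question}
SQL:"""

# The prompt is then sent to an LLM; schema linking (matching "Computer Science
# department" to departments.dept_name) is the part the third paper re-examines.
print(prompt)
```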

- Reasoning -> Vision-Language Models (VLMs)

cf. Paper: Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models (CVPR 2024)

- Reasoning -> Visual Programming using Large Language Models

cf. Paper: Visual Programming: Compositional visual reasoning without training (CVPR 2023)

cf. Paper: ViperGPT: Visual Inference via Python Execution for Reasoning (CVPR 2023)
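
And the visual-programming bit, as I noted it down: the LLM doesn't answer the visual question directly, it writes a small Python program that calls perception modules, and executing that program produces the answer. Everything named below (detect_objects, the hard-coded "generated" program, the fake image) is a hypothetical stand-in, not an API from the papers.

```python
# Visual-programming sketch (VisProg / ViperGPT style): execute an LLM-written
# program that calls perception modules. All names here are hypothetical stubs.
def detect_objects(image, label: str) -> list:
    """Stub perception module; a real system would run an object detector."""
    return [obj for obj in image["objects"] if obj == label]

# What the LLM might emit for "How many mugs are on the table?":
generated_program = """
def solve(image):
    boxes = detect_objects(image, "mug")
    return str(len(boxes))
"""

env = {"detect_objects": detect_objects}
exec(generated_program, env)               # load the generated solve()
fake_image = {"objects": ["mug", "mug", "plate"]}
print(env["solve"](fake_image))            # -> "2"
```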
