Log Detective

HOWTO

How to upload a log

Log Detective can download the build log directly from Koji, Copr, or Packit. If you have the log somewhere else (a different Koji instance, a local Mock build), you can point Log Detective to a URL where it can be downloaded. The URL can be short-lived, e.g., a Pastebin link. If you have several logs for one build, you can concatenate them, e.g., `cat build.log root.log | fpaste`. Open the paste in a web browser to find its raw URL, and provide it to Log Detective.
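When several logs are concatenated, it can help to mark where each file begins. The following is a minimal sketch of that idea, not part of Log Detective itself; the function name `concat_logs` and the `=====` header format are assumptions made for illustration.

```python
from pathlib import Path


def concat_logs(paths):
    """Join several log files into one text, each preceded by a header line.

    Hypothetical helper: the "===== name =====" header is an arbitrary choice.
    """
    parts = []
    for path in map(Path, paths):
        # errors="replace" keeps going if a log contains stray non-UTF-8 bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f"===== {path.name} =====\n{text.rstrip()}\n")
    # A blank line separates the individual logs.
    return "\n".join(parts)
```

Writing the returned string to a file, or printing it and piping it to `fpaste` as in the command above, gives a single paste in which the origin of each line is still recognizable.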

How to annotate a log

After uploading a log, find the relevant part, highlight it with the mouse, and click the "Add" button in the right column.
This creates a new form field, "Snippet 1", in the right column. State there why this part of the log is interesting, e.g., "Here the build fails because the /usr/bin/make command is not available." You can add more snippets the same way.
Then navigate to "Why did the build fail?" and describe the reason, e.g., "The build failed because the `make` command was not available in the buildroot and the %install section requires it."
Finally, go to "How to fix the issue?" and provide a hint on how to fix it, e.g., "You have to tell Mock to install the `make` command in the buildroot. You can do that by adding `BuildRequires: make` to the spec file. If you are not sure which package provides this command, you can check it using the `dnf whatprovides /usr/bin/make` command."
Then submit your annotation. Remember: the more details you provide, the more details the AI will provide to you later.
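The parts of an annotation described above (snippets, the failure reason, and the fix hint) can be pictured as a small data structure. This is only an illustrative sketch: the class and field names are hypothetical and do not reflect Log Detective's actual storage format, and the line numbers in the example are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Snippet:
    # 1-based, inclusive line range of the highlighted part (hypothetical fields).
    start_line: int
    end_line: int
    comment: str  # why this part of the log is interesting


@dataclass
class Annotation:
    fail_reason: str  # answer to "Why did the build fail?"
    how_to_fix: str   # answer to "How to fix the issue?"
    snippets: list = field(default_factory=list)

    def problems(self):
        """Return human-readable problems; an empty list means nothing is missing."""
        issues = []
        if not self.snippets:
            issues.append("no snippet selected")
        for i, s in enumerate(self.snippets, start=1):
            if s.start_line < 1 or s.end_line < s.start_line:
                issues.append(f"Snippet {i}: invalid line range")
            if not s.comment.strip():
                issues.append(f"Snippet {i}: missing comment")
        if not self.fail_reason.strip():
            issues.append("missing failure reason")
        if not self.how_to_fix.strip():
            issues.append("missing fix hint")
        return issues


# Example values taken from the walkthrough; lines 120-121 are invented.
annotation = Annotation(
    fail_reason="The build failed because the `make` command was not available in the buildroot.",
    how_to_fix="Add `BuildRequires: make` to the spec file.",
    snippets=[Snippet(120, 121, "Here the build fails because /usr/bin/make is not available.")],
)
```

A check like `problems()` mirrors the advice above: an annotation is most useful when every snippet has a comment and both questions are answered.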

Demo with comments

Documentation

Why?

Build logs can have thousands of lines. When a build fails, it can be very hard to spot the problem, even for seasoned engineers. The string “ERROR” does not always indicate an error, and the actual error is not always at the end of the log. You also have to be fluent in packaging to recognize how to fix the problem. Data we gathered from Copr shows that it can take developers days or even weeks to fix a failed build.

How?

We decided to build an AI tool that will read the log and use an LLM to give you a human-like answer. We know that the goal is achievable, but we do not have enough data. We have millions of failed jobs from Koji and Copr, but they are not paired with a description of what needs to be fixed and how. Therefore, we started with Phase 1 and created this website, where you can upload a failed build log and annotate it. During our experiments, we learned that we will need around one thousand annotated logs to be able to build the AI tool.

What?

The final tool will be an AI model, and our artifact will be a binary blob of trained data. This model, together with a command-line tool, will be able to read a build log and give you guidance on why the build failed. Later, we plan to integrate this tool into build systems (Copr, Koji, ...) to give you a nice UI.

We need your help

Please upload a build log from your recent failed build. We can import it from Copr, Packit, Koji, and even from an arbitrary URL, provided that it points to a parsable text file. Mark which lines of the log are relevant to the failure, and describe how it can be fixed. The more detail you provide, the better the final tool will be.

Future Plans

We plan to add a review tool to Log Detective, similar to the one Common Voice has. This will help us identify bad samples and remove them from the data set.

FAQ

What is our first goal?

Our first step is to collect 1000 annotated logs. This is the estimated amount we need to start producing useful results when training the AI. After we reach this goal, we will begin working on a CLI tool that will use the trained data. We will then continue to collect data, as 2-3 thousand samples will probably be needed to get satisfactory results.

What AI model are you using?

We do not know yet. Tomáš experimented with several models, and literally every week he came up with a new model that had just become available and was better than our previous choice. We assume that when we start training the AI model, we will be using a model that does not exist yet.

What about privacy?

We do not track any sensitive data. When you upload a build log, make sure you have the rights to do so. All data is available under the CDLA-Permissive-2.0 license.

Did you consider using AI to generate the data set?

Yes, we considered it. However, we cannot use ChatGPT and similar tools because their problematic licensing could taint the whole project. Additionally, using AI-generated data to teach AI is prone to hallucinations. We are considering using data from Bugzilla and other tools (if Legal allows it), but we want to have a strong core dataset with a clear license.

I have lots of failed logs, can I bulk upload them?

Uploading a log takes five seconds, but you then have to annotate it: describe which part of the log is relevant and why, why the build failed, and how it can be fixed. This takes several minutes. Therefore, bulk uploading will not help us.

I have not seen a failed log for ages - do you have some?

Pick anything from Koji's Failed Builds.