IMO: This might be a contrarian opinion, but I don't think so. Its much the same problem as asking, for example, if every single line you write, or every function, becomes a commit. The answer to this granularity is, much like anything, you have to think of the audience: Who is served by persisting these sessions? I would suspect that there is little reason why future engineers, or future LLMs, would need access to them; they likely contain a significant amount of noise, incorrect implementations, and red herrings. The product of the session is what matters.
I do think there's more value in ensuring that the initial spec, or the "first prompt" (which IME is usually much bigger and tries to get 80% of the way there) is stored. And, maybe part of the product is an LLM summary of that spec, the changes we made to the spec within the session, and a summary of what is built. But... that could be the commit message? Or just in a markdown file. Or in Notion or whatever.
While it's noisy and complicated for humans to read through, this session info is primarily for future AI to read and use as additional input for their tasks.
We could have LLMs ingest all these historical sessions, and use them as context for the current session. Basically treat the current session as an extension of a much, much longer previous session.
Plus, future models might be able to "understand" the limitations of current models, and use the historical session info to identity where the generated code could have deviated from user intention. That might be useful for generating code, or just more efficient analysis by spending more tokens on possible "hotspots", etc.
Basically, it's high time we start capturing any and all human input for future models, especially open source model development, because I'm sure the companies already have a bunch of this kind of data.
There is some potential value for the audit if you work in a special place where you are sworn in and where transparency is important, but who gonna read all of that and how do you even know that the transcript corresponds to the code if the committer is up to something
This is a central problem that weve already seen proliferate wildly in Scientific research , and currently if the same is allowed to be embedded in foundational code. The future outlook would be grim.
Replication crisis[1].
Given initial conditions and even accounting for 'noise' would a LLm arrive at the same output.It should , for the same reason math problems require one to show their working. Scientific papers require the methods and pseudocode while also requireing limitations to be stated.
Without similar guardrails , maintainance and extension of future code becomes a choose your own adventure.Where you have to guess at the intent and conditions of the LLM used.
You can leave commit messages or comments without spamming your history with every "now I'm inspecting this file..." or "oops, that actually works differently than I expected" transcript.
In fact, I'd wager that all that excess noise would make it harder to discern meaningful things in the future than simply distilling the meaningful parts of the session into comments and commit messages.
IMO, you should do both. The cost of intellectual effort is dropping to zero, and getting an AI to scan through a transcript for relevant details is not going to cost much at all.
Agentic engineering is fundamentally different, not just because of the inherent unpredictability of LLMs, but also because there's a wildly good chance that two years from now Opus 4.6 will no longer even be a model anyone can use to write code with.
with normal practice , say if im reading through the linux source for a particular module.Id be able to refernce mailing lists and patchsets which by convention have to be human parsable/reviewable.Wit the history/comments/git blame etc putting in ones headspace the frame of reference that produced it.
> Its much the same problem as asking, for example, if every single line you write, or every function, becomes a commit.
Hmm, I think that's the wrong comparison? The more useful comparison might be: should all your notes you made and dead ends you tried become part of the commit?
I agree that probably not everything should be stored - it’s too noisy. But the reason the session is so interesting is precisely the later part of the conversation - all the corrections in the details, where the actual, more precise requirements crystallize.
If I can run resume {session_id} within 30 days of a file’s latest change, there’s a strong chance I’ll continue evolving that story thread—or at least I’ve removed the friction if I choose to.
It seems unlikely that a file that hasn't changed in 30 days in an environment with a lot of "agents" cranking away on things is going to be particularly meaningful to revisit with the context from 30 days ago, vs using new context with everything that's been changed and learned since then.
You ignore the reality of vibe coding. If someone just prompts and never reads the code and tests the result barely, then the prompts can be a valuable insight.
If A vibes, and B is overwhelmed with noise, how does B reliably go through it? If using AI, this necessarily faces the same problems that recording all A's actions was trying to solve in the first place, and we'd be stuck in a never-ending cycle.
We could also distribute the task to B, C, D, ... N actors, and assume that each of them would "cover" (i.e. understand) some part of A's output. But this suddenly becomes very labor intensive for other reasons, such as coordination and trust that all the reviewers cover adequately within the given time...
Or we could tell A that this is not a vibe playground and fire them.
LLM session transcripts as part of the commit is a neat idea to consider, to be sure, but I know that I damn well don't want to read eight pages of "You're absolutely right! It's not a foo. It's a bar" slop (for each commit no less!) when I'm trying to find someone to git blame.
The solution is as it always has been: the commit message is where you convey to your fellow humans, succinctly and clearly, why you made the commit.
I like the idea of committing the initial transcript somewhere in the docs/ directory or something. I'll very likely start doing this in my side projects.
I floated that idea a week ago: https://news.ycombinator.com/item?id=47096202, although I used the word "prompts" which users pointed out was obsolete. "Session" seems better for now.
The objections I heard, which seemed solid, are (1) there's no single input to the AI (i.e. no single session or prompt) from which such a project is generated,
(2) the back-and-forth between human and AI isn't exactly like working with a compiler (the loop of source code -> object code) - it's also like a conversation between two engineers [1]. In the former case, you can make the source code into an artifact and treat that as "the project", but you can't really do that in the latter case, and
(3) even if you could, the resulting artifact would be so noisy and complicated that saving it as part of the project wouldn't add much value.
At the same time, people have been submitting so many Show HNs of generated projects, often with nothing more than a generated repo with a generated readme. We need a better way of processing these because treating them like old-fashioned Show HNs is overwhelming the system with noise right now [2].
I don't want to exclude these projects, because (1) some of them are good, (2) there's nothing wrong with more people being able to create and share things, (3) it's foolish to fight the future, and (4) there's no obvious way to exclude them anyhow.
But the status quo isn't great because these projects, at the moment, are mostly not that interesting. What's needed is some kind of support to make them more interesting.
YoumuChan makes a similar point at https://news.ycombinator.com/item?id=47213296, comparing it to Google search history. The analogy is different but the issue (signal/noise ratio) is the same.
My current thinking is based on boris tanes[1] formalised method of coding with Claude code. I commit the research and plan.md files as they are when I finally tell Claude to implement changes in code. This becomes a living lexicon of the architecture and every feature added. A very slight variation I do from Boris's method is that I prefix all my research and plan .md filenames with the name of the feature. I can very quickly load relevant architecture into context by having Claude read a previous design document instead of analysing the whole code base. I'll take pieces I think are relevant and tell Claude to base research from those design documents.
> But the status quo isn't great because these projects, at the moment, are mostly not that interesting. What's needed is some kind of support to make them more interesting.
IMO it's not the lack of context that makes them uninteresting. It's the fact that the bar for "this took effort and thought to make" has moved, so it's just a lot easier to make things that we would've considered interesting two years ago.
If you're asking HN readers to sift through additional commit history or "session transcripts" in order to decide if it's interesting, because there's a lot of noise, you've already failed. There's gonna be too much noise to make it worth that sifting. The elevator pitch is just gonna need to be that much different from "vibe coded thing X" in order for a project to be worth much.
Unfortunately Codex doesn’t seem to be able to export the entire session as markdown, otherwise I’d suggest encouraging people to include that in their Show HNs. It’s kind of nuts that it’s so difficult to export what’s now a part of the engineering process.
I don’t have anything against vibe coded apps, but what makes them interesting is to see the vibe coding session and all the false starts along the way. You learn with them as they explore the problem space.
I don't think it's hard to export, on the contrary its all already saved it your ~/.claude which so you could write up a tool to convert the data there to markdown.
Why does the regular voting system fail here? Are there just too many Show HNs for people to process the new ones, so the good ones get lost in the noise?
Regarding the noise you mention, I wonder if memento's use of the git 'notes' feature is an acceptable way to contain or quarantine that noise. It might still not add much value, but at least it would live in a separate place that is easily filtered out when the user judges it irrelevant. Per the README of the linked repo,
> It runs a commit and then stores a cleaned markdown conversation as a git note on the new commit.
So it doesn't seem that normal commit history is affected - git stores notes specially, outside of the commit (https://git-scm.com/docs/git-notes).
In fact github doesn't even display them, according to some (two-year-old) blog posts I'm seeing. Not sure about other interfaces to git (magit, other forges), but git log is definitely able to ignore them (https://git-scm.com/docs/git-log#Documentation/git-log.txt--...).
This doesn't mean the saved artifacts would necessarily be valuable - just that, unlike a more naive solution (saving in commit messages or in some directory of tracked files) they may not get in the way of ordinary workflows aside from maybe bloating the repo to some degree.
Is spam on topic? and are AI codegen bots part of the community?
To me, the value of Show HN was rarely the thing, it was the work and attention that someone put into it. AI bot's don't do work. (What they do is worth it's own word, but it's not the same as work).
> I don't want to exclude these projects, because (1) some of them are good,
Most of them are barely passable at best, but I say that as a very biased person. But I'll reiterate my previous point. I'm willing to share my attention with people who've invested significant amounts of their own time. SIGNIFICANT amounts, of their time, not their tokens.
> (2) there's nothing wrong with more people being able to create and share things
This is true, only in isolation. Here, the topic is more, what to do about all this new noise, (not; should people share things they think are cool). If the noise drowns out the signal, you're allowed that noise to ruin something that was useful.
> (3) it's foolish to fight the future
coward!
I do hope you take that as the tongue-in-cheek way I meant it, because I say it as a friend would; but I refuse to resign myself completely to fatalism. Fighting the future is different from letting people doing something different ruin the good thing you currently have. Sure electric cars are the future, but that's no reason to welcome them in a group that loves rebuilding classic hot rods.
> (4) there's no obvious way to exclude them anyhow.
You got me there. But then, I just have to take your word for it, because it's not a problem I've spent a lot of time figuring out. But even then, I'd say it's a cultural problem. If people ahem, in a leadership position, comment ShowHN is reserved for projects that took a lot of time investment, and not just ideas with code... eventually the problem would solve itself, no? The inertia may take some time, but then this whole comment is about time...
I know it's not anymore, but to me, HN still somehow, feels a niche community. Given that, I'd like to encourage you to optimize for the people who want to invest time into getting good at something. A very small number of these projects could become those, but trying to optimize for best fairness to everyone, time spent be damned... I believe will turn the people who lift the quality of HN away.
> the resulting artifact would be so noisy and complicated that saving it as part of the project wouldn't really add that much value.
This is the major blocker for me. However, there might be value in saving a summary - basically the same as what you would get from taking meeting notes and then summarizing the important points.
> people have been submitting so many Show HNs of generated projects
In this case, it was more of write the X language compiler using X. I had to prove to myself if keeping the session made sense, and what better way to do it than to vibe code the tool to audit vibe code.
That is by no means all of these projects. I'm not interested in a circle-the-wagons crackdown because it won't work (see "it's foolish to fight the future" above), and because we should be welcoming and educating new users in how to contribute substantively to HN.
The future you're concerned with defending includes bots being a large part of this community, potentially the majority. Those bots will not only submit comments autonomously, but create these projects, and Show HN threads. I.e. there will be no human in the loop.
This is not something unique to this forum, but to the internet at large. We're drowning in bot-generated content, and now it is fully automated.
So the fundamental question is: do you want to treat bots as human users?
Ignoring the existential issue, whatever answer you choose, it will inevitably alienate a portion of existing (human) users. It's silly I have to say this, but bots don't think, nor "care", but will keep coming regardless.
To me the obvious answer is "no". All web sites that wish to preserve their humanity will have to do a complete block of machine-generated content. It's a tough nut to crack, but I reckon YC would know some people capable of tackling this.
It's important to note that this state of a human driving the machine directly is only temporary. The people who think these are tools as any other are sorely mistaken. This tool can do their minimal effort job much more efficiently, cheaper, and with better results, and it's only a matter of time until the human is completely displaced. This will take longer for more complex work, of course, but creating regurgitated projects on GitHub and posting content on discussion forums is a very low bar activity.
The more fundamental question is: Is there information in the AI-coding session that should be preserved? Only if the answer is "yes", the next question becomes: Where do we store that data?
git is only one possible location.
I think there is very valuable information in session logs, like the prompts, or the usage statistics at the end of the session, which model was used etc. But git history or the commit messages should focus on the outcome of the work, not on the process itself. This is why the whole issue discussion before work in git starts is also typically kept separately in tickets. Not in git itself, but close to it.
There're platforms like tulpal.com which move the whole local agent-supported process to the server and therefore have much better after-the-fact observability in what happened.
Why should it be? The agent session is a messy intermediate output, not an artifact that should be part of the final product. If the "why" of a code change is important, have your agent write a commit message or a documentation file that is polished and intended for consumption.
It should be a distillation of the session and/or the prompts, at bare minimum. No, it should not include e.g. research-type questions, but it should include prompts that the user wrote after reading the answers to those research-type questions, and perhaps some distillation of the links / references surfaced during the research.
Prompts probably should be distilled / summarized, especially if they are research-based prompts, but code-gen prompts should probably be saved verbatim.
Reproducibility is a thing, and though perfect reproducibility isn't desirable, something needs to make up for the fact that vibe-coding is highly inscrutable and hard to review. Making the summary of the session too vague / distilled makes it hard to iterate and improve when / if some bad prompts / assumptions are not documented in any way.
You have the source code though. That is the "reproducibility" bit you need. What extra reproducibility does having the prompts give you? Especially given that AI agents are non-deterministic in the first place. To me the idea that the prompts and sessions should be part of the commit history is akin to saying that the keystroke logs and commands issued to the IDE should be part of the commit history. Is it important to know that when the foo file was refactored the developer chose to do it by hand vs letting the IDE do it with an auto-refactor command vs just doing a simple find and replace? Maybe it is for code review purposes, but for "reproducibility" I don't think it is. You have the code that made build X and you have the code that made build X+1. As long as you can reliably recreate X and X+1 from what you have in the code, you have reproducibility.
> You have the source code though. That is the "reproducibility" bit you need.
I am talking about reproducing the (perhaps erroneous) logic or thinking or motivations in cases of bugs, not reproducing outputs perfectly. As you said, current LLM models are non-deterministic, so we can't have perfect reproducibility based on the prompts, but, when trying to fix a bug, having the basic prompts we can see if we run into similar issues given a bad prompt. This gives us information about whether the bad / bugged code was just a random spasm, or something reflecting bad / missing logic in the prompt.
> Is it important to know that when the foo file was refactored the developer chose to do it by hand vs letting the IDE do it with an auto-refactor command vs just doing a simple find and replace? Maybe it is for code review purposes, but for "reproducibility" I don't think it is.
I am really using "reproducibility" more abstractly here, and don't mean perfect reproducibility of the same code. I.e. consider this situation: "A developer said AI wrote this code according to these specs and prompt, which, according to all reviewers, shouldn't produce the errors and bad code we are seeing. Let's see if we can indeed reproduce similar code given their specs and prompt". The less evidence we have of the specifics of a session, the less reproducible their generated code is, in this sense.
I mean, sure, a good, detailed commit message is perfectly fine to me in place of the prompts / a session distillation. But I am not holding my breath for vibe-coders to properly review their code and make such a commit message. But, if they, do, great! No need for prompt / session details.
Completely agree. Until recently I only let LLMs write my commit messages, but I've found that versioning the plan files is the better artifact, it preserves agentic decisions and my own reasoning without the noise.
My current workflow: write a detailed plan first, then run a standard implement -> review loop where the agent updates the plan as errors surface. The final plan doc becomes something genuinely useful for future iterations, not just a transcript of how we got there.
In my case I have set up the agent is the repo. The repo texts compose the agent’s memory. Changes to the repo require the agent to approve.
Repos also message each other and coordinate plans and changes with each other and make feature requests which the repo agent then manages.
So I keep the agents’ semantically compressed memories as part of the repo as well as the original transcripts because often they lose coherence and reviewing every user submitted prompt realigns the specs and stories and requirements.
I think the parent comment is saying “why did the agent produce this big, and why wants it caught”, which is a separate problem from what granular commits solve, of finding the bug in the first place.
but that takes more tokens and time. if you just save the raw log, you can always do that later if you want to consume it. plus, having the full log allows asking many different questions later.
If you read the history of both and assuming that there’s good comments and documentation, it shows you the reasoning that went into the decision-making
An important consideration somewhat missing in discussion in this thread: if we don't carefully document AI-assisted coding sessions, how can we ever hope to improve our use of AI coding tools?
This applies both to future AI tools and also experts, and experts instructing novices.
To some degree, the lack of documenting AI sessions is also at the core of much of the skepticism toward the value of AI coding in general: there are so many claims of successes / failures, but only a vanishingly small amount of actual detailed receipts.
Automating the documentation of some aspects of the sessions (skills + prompts, at least) is something both AI skeptics and proponents ought to be able to agree on.
EDIT: Heck, if you also automate documenting the time spent prompting and waiting for answers and/or code-gen, this would also go a long way to providing really concrete evidence for / against the various claims of productivity gains.
Obviously yes, at least if not the prompts in the session, some simple / automated distillation of those prompts. Code generated by AI is already clearly not going to be reviewed as carefully as code produced by humans, and intentions / assumptions will only be documented in AI-generated comments to some limited degree, completely contingent on the prompt(s).
Otherwise, when fixing a bug, you just risk starting from scratch and wasting time using the same prompts and/or assumptions that led to the issue in the first place.
Much of the reason code review was/is worth the time is because it can teach people to improve, and prevent future mistakes. Code review is not really about "correctness", beyond basic issues, because subtle logic errors are in general very hard to spot; that is covered by testing (or, unfortunately, deployment surprises).
With AI, at least as it is currently implemented, there is no learning, as such, so this removes much of the value of code review. But, if the goal is to prevent future mistakes, having some info about the prompts that led to the code at least brings some value back to the review process.
EDIT: Also, from a business standpoint, you still need to select for competent/incompetent prompters/AI users. It is hard to do so when you have no evidence of what the session looked like. Also, how can you teach juniors to improve their vibe-coding if you can't see anything about their sessions?
I don't think this is obvious at all. We don't make the keystroke logs part of the commit history. We don't make the menu item selections part of the commit history. We don't make the 20 iterations you do while trying to debug an issue part of the commit history (well, maybe some people do but most people I know re-write the same file multiple times before committing, or rebase/squash intermediate commits into more useful logical commits. We don't make the search history part of the commit history. We don't make the discussion that two devs have about the project part of the commit history either.
Some of these things might be useful to preserve some of the time either in the commit history or along side it. For example, having some documentation for the intent behind a given series of commits and any assumptions made can be quite valuable in the future, but every single discussion between any two devs on a project as part of the commit history would be so much noise for very little gain. AI prompts and sessions seem to me to fall into that same bucket.
> well, maybe some people do but most people I know re-write the same file multiple times before committing, or rebase/squash intermediate commits into more useful logical commits
Right, agreed on this, we want a distillation, not documentation of every step.
> For example, having some documentation for the intent behind a given series of commits and any assumptions made can be quite valuable in the future, but every single discussion between any two devs on a project as part of the commit history would be so much noise for very little gain. AI prompts and sessions seem to me to fall into that same bucket.
Yes, documenting every single discussion is a waste / too much to process, but I do think prompts at least are pretty crucial relative to sessions. Prompts basically are the core intentions / motivations (skills aside). It is hard to say whether we really want earlier / later prompts, given how much context changes based on the early prompts, but having no info about prompts or sessions is a definite negative in vibe-coding, where review is weak and good documentation, comments, and commit messages are only weakly incentivized.
> Some of these things might be useful to preserve some of the time either in the commit history or along side it
Right, along side is fine to me as well. Just something has to make up for the fact that vibe-coding only appears faster (currently) if you ignore the fact it is weakly-reviewed and almost certainly incurring technical debt. Documenting some basic aspects of the vibe-coding process is the most basic and easy way to reduce these long-term costs.
EDIT: Also, as I said, information about the prompts quickly reveals competence / incompetence, and is crucial for management / business in hiring, promotions, managing token budgets, etc. Oh, and of course, one of the main purposes of code review was to teach. Now, that teaching has to shift toward teaching better prompting and AI use. That gets a lot harder with no documentation of the session!
I considered this and even built a claude code extension to bring history/chats into the project folder.
Not once have I found it useful: if the intention isn't clear from the code and/or concise docs, the code is bad and needs to be polished.
Well written code written with intention is instantly interpretable with an LLM. Sending the developer or LLM down a rabbit hole of drafts is a waste of cognition and context.
Conceptually this is very similar to the question of whether or not you should squash your commits. To the point that it's really the same question.
If you think you should squash commits, then you're only really interested in the final code change. The history of how the dev got there can go in the bin.
If you don't think you should squash commits then you're interested in being able to look back at the journey that got the dev to the final code change.
Both approaches are valid for different reasons but they're a source of long and furious debate on every team I've been on. Whether or not you should be keeping a history of your AI sessions alongside the code could be useful for debugging (less code debugging, more thought process debugging) but the 'prefer squash' developers usually prefer to look the existing code rather than the history of changes to steer it back on course, so why would they start looking at AI sessions if they don't look at commits?
All that said, your AI's memory could easily be stored and managed somewhere separately to the repo history, and in a way that makes it more easily accessible to the LLM you choose, so probably not.
I've generally been in the squash camp but it's more out of a sense of wanting a "clean" and bisectable repo history. In a word where git (and git forges) could show me atomic merge commits but also let me seamlessly fan those out to show the internal history and iteration and maybe stuff like llm sessions, I'd be into that.
And yes, it's my understanding that mercurial and fossil do actually do more of this than git does, but I haven't actually worked on any projects using those so I can't comment.
I think this is the right analogy, contrary to some other very poor ones in this thread. Yes, it is rare to really look at commit messages, but it can be invaluable in some cases.
With vibe-coding, you risk having no documentation at all for the reasoning (AI comments and tests can be degenerate / useless), but the prompts, at bare minimum, reveal something about the reasoning / motivation.
Whether this needs to be in git or not is a side issue, but there is benefit to having this available.
This only works if the software is still crafted by a human and merely using AI as a tool. In that case the use of AI is similar to using editor macros or test-driven development. I don't need to see that process playing out in real time.
It's less clear to me if the software isn't crafted by a human at all, though. In that case I would prefer to see the prompt.
I agree that fully agentic development will change things, but I don't know how. I'm still very much in the human-in-the-loop phase of AI where I want to understand and verify that it's not done anything silly. I care far more about the code that I'm deploying than the prompt that got me there and probably will for a long time. So will my prodsec team.
I think the decisions it made along the way are worth tracking. And it’s got some useful side effects with regard to actually going through the programming and architecture process. I made a tool that really helps with this and finds a pretty portable middle ground that can be used by one person or a team too, it’s flexible.
https://deciduous.dev/
It's already bad enough that people are saying there's too much code to read and review. You want to add session to it? Running it again, might not yield the same output. These models are non deterministic and models are often changed and upgraded.
A few things really leveled up both my software quality and my productivity in the last few months. It wasn’t session history, memory files, context management or any of that.
1. Writing a spec with clear acceptance criteria.
2. Assigning IDs to my acceptance criteria. Sounds tedious, but actually the idea wasn’t mine, at some point an agent went and did it without me asking. The references proved so useful for guiding my review that I formalized the process (and switched from .md to .yaml to make it easier).
3. Giving my agents a source of truth to share implementation progress so they can plan their own tasks and more effectively review.
Of course, I can’t help myself, I had to formalize it into a spec standard and a toolkit. Gonna open source it all soon, but I really want feedback before I go too far down the rabbit hole:
Yup and they do, but then I figured out that I can just write loosely structured yaml and the ids come for free. I then encourage the agents to tag and reference them everywhere, especially tests.
I was looking for an analogy and this is a good one.
The noise to signal ratio seems so bad. You’d have to sift through every little “thought”. If I could record my thought stream would I add it to the commit? Hell no.
Now, a summary of the reasoning, assumptions made and what alternatives were considered? Sure, that makes for a great message.
And not all google searches you do while working on that commit may even be related to that commit. It may be entirely unrelated, or sensitive information that should not be made public.
I don't think it should be. I think a distilled summary of what the agent did should be committed. This requires some dev discipline. But for example:
Make a button that does X when clicked.
Agent makes the button.
I tell it to make the button red.
Agent makes it red.
I test it, it is missing an edge case. I tell it to fix it.
It fixes it.
I don't like where the button is. I tell it to put it in the sidebar.
It does that.
I can go on and on. But we don't need to know all those intermediaries. We just need to know Red button that does X by Y mechanism is in the sidebar. Tests that include edge cases here. All tests passing. 2026-03-01
And that document is persisted.
If later, the button gets deleted or moved again or something, we can instruct the agent to say why. Button deleted because not used and was noisy. 2026-03-02
This can be made trivial via skills, but I find it a good way to understand a bit more deeply than commit messages would allow me to do.
Of course, we can also just write (or instruct agents to write) better PRs but AFAICT there's no easy way to know that the button came about or was deleted by which PR unless you spelunk in git blame.
If you do proper software development (planing, spec, task breakdown, test case spec, implementation, unit test, acceptance test, ...) implementation is just a single step and the generated artifact is the source code. And that's what needs to be checked in. All the other artifacts are usually stored elsewhere.
If you do spec and planing with AI, you should also commit the outcome and maybe also the prompt and session (like a meeting note on a spec meeting). But it's a different artifact then.
But if you skip all the steps and put your idea directly to an coding agent in the hope that the result is a final, tested and production ready software, you should absolutely commit the whole chat session (or at least make the AI create a summary of it).
LLMs frequently hallucinate and go off on wild goose chases. It's admittedly gotten a lot better, but it still happens.
From that perspective alone the session would be important meta information that could be used to determine the rationale of a commit - right from the intent (prompt) to what the harness (Claude code etc) made of it. So there is more value in keeping it even in your second scenario
I hope people start doing that. Not that it has any practical usage for the repo itself, but if everyone does that, it'd probably make it much easier for open weight models to catch up the proprietary ones. It'd be like a huge crowdsourced project to collect proprietary models' output for future training.
For my own projects in private repos I would benefit from exporting the session. For example if I need to return to the task, it could be great to give it as a context
For my work as one of developers in team, no. The way I prompt is my asset and advantage over others in a team who always complain about AI not being able to provide correct solutions and secures my career
Have AI explain the reasoning behind the PR. I don't think people really care about your step by step process but reviewers might care about your approach, design choices, caveats, and trade offs.
That context could clarify the problem, why the solution was chosen, key assumptions, potential risks, and future work.
Increasingly, I'd like the code to live alongside a journal and research log. My workflow right now is spending most of my time in Obsidian writing design docs for features, and then manually managing claude sessions that I paste them back and forth into. I have a page in obsidian for each ongoing session, and I record my prompts, forked paths, thoughts on future directions, etc. It seems natural that at some point this (code, journal, LLM context) will all be unified.
I created a system which I call 'devlog'. Agent summarizes what it did & how it did in a concise file, and its gets committed along with first prompt and the plan file if any. Later due to noise & volume, I started saving those in a database and adding only devlog id to commit nowadays.
Now whenever I need to reason with what agent did & why, info is linked & ready on demand. If needed, session is also saved.
The session capture problem is harder than it looks because you need to capture intent, not steps.
A coding session has a lot of 'left turn, dead end, backtrack' noise that buries the decision that actually mattered. Committing the full session is like committing compiler output — technically complete, practically unreadable.
We've been experimenting with structured post-task reflections instead: after completing significant work, capture what you tried, what failed, what you'd do differently, and the actual decision reasoning. A few hundred tokens instead of tens of thousands. Commits with a reflection pointer rather than an embedded session.
The result is more useful than raw logs. Future engineers (or future AI sessions) can understand intent without replaying the whole conversation. It's closer to how good commit messages work — not 'here's what changed' but 'here's why'.
Dang's point about there being no single session is also real. Our biggest tasks span multiple sessions and multiple contributors. 'Capture the session' doesn't compose. 'Capture the decision' does.
Something like "it is important to document core / crucial prompts somewhere" covers it. Whether this should be in git or elsewhere is trickier, but doing vibe-coding without documenting any aspect of the process is a recipe for disaster.
Also, how can we (or future AI models) hope to improve if there is only limited and summary documentation of AI usage?
My complete reasoning, notes, errors have never been part of the commit. I don't see a valid reason on why the raw conversation must be included. Rather I have hooks (or just "manually" invoked) to process all of it and update the relevant documentation that I've been putting under docs/.
If you also ensure the AI writes relevant (and correct) docs, and also code comments and commit message, then I agree there is not much need for extra info, e.g. prompts / session distillation. I am not sure that that is the case currently (though we might be getting there soon at least in some cases).
Agree with this, I've been testing how AGENTS.md and similar can do to automatically have these behaviours and I feel (it's just feeling) it's been improving over time. Clearly depends a lot on the agent, the model, the codebase size and so on.
Yup. Realistically there will always be simple changes that AI can handle completely (docs, comments, and commit message), and other changes where some human input will be hugely valuable.
Until then, it makes sense to automatically include some distillation of the AI generation process, by default, IMO.
Yes, it should remain part of the commit, and the work plan too, including judgements/reviews done with other agents. The chat log encodes user intent in raw form, which justifies tasks which in turn justify the code and its tests. Bottom up we say the tests satisfy the code, which satisfies the plan and finally the user intent. You can do the "satisfied/justified" game across the stack.
I only log my own user messages not AI responses in a chat_log.md file, which is created by user message hook in the repo.
IMO this is solving the wrong problem. the session log is just noise - its like attaching your google search history to a stackoverflow answer to "prove" you did the research. nobody wants to read 500 lines of an agent going back and forth debugging a race condition.
the actual problem is that AI produces MORE code not better code, and most people using it aren't reviewing what comes out. if you understood the code well enough to review it properly you wouldn't need the session log. and if you didn't understand it, the session log won't help you either because you'll just see the agent confidently explaining its own mistakes.
> have your agent write a commit message or a documentation file that is polished and intended for consumption
this is the right take. code review and commit messages matter more now than they ever did BECAUSE there's so much more code being generated. adding another artifact nobody reads doesn't fix the underlying issue which is that people skip the "understand what was built" step entirely.
I think this is a lot of "kicking can down the road" of not understanding what code the ai is writing. Once you give up understanding the code that is written there is no going back. You can add all the helper commit messages, architecture designs, plans, but then you introduce the problem of having to read all of those once you run into an issue. We've left readability on the wayside to the alter of "writeability".
The paradigm shift, which is a shift back, is to embrace the fact that you have to slow down, and understand all the code the ai is writing.
In the ideal world a specification file should be committed to the repository and then linked to the PR/commit. But it slows you down and is no longer a vibe coding?
Soon only implementation details will matter. Code can be generated based on those specifications again and again.
I agree, and am so captivated with the idea that I decided to build a whole toolkit around it. Would be very keen to get feedback if anyone wants to try it when it’s ready.
So far this workflow is the only way I’ve been able to have any real success running parallel agents or assigning longer running tasks that don’t get thrown out.
I’ve been thinking about a simple problem:
We’re increasingly merging AI-assisted code into production, but we rarely preserve the thing that actually produced it — the session.
Six months later, when debugging or reviewing history, the only artifact left is the diff.
So I built git-memento.
It attaches AI session transcripts to commits using Git notes.
You also have code comments, docs in the repo, the commit message, the description and comments on the PR, the description and comments on your Issue tracker.
Providing context for a change is a solved problem, and there is relatively mature MCP for all common tooling.
People won’t do that, unfortunately. We are a dying breed (I hate it). I went against my own instincts and vibe code this, works as a proof of concept.
You can see the session (including my typos) and compare what was asked for and what you got.
I already invented this in my head, thanks for not making me code it.
Excellent idea, I just wish GitHub would show notes. You also risk losing those notes if you rebase the commit they are attached to, so make sure you only attach the notes to a commit on main.
There is so much undefined in how agentic coding is going to mature. Something like what you're doing will need to be a part of it. Hopefully this makes some impressions and pushes things forward.
Goodness no! Sometimes I literally SHOUT at these agents/chats and often stoop down to using cuss words, which I am not proud of, but surprisingly it has shown to work here and there. As real as that is, I'd not want that on record in a commit.
This seems wrong, like committing debug logs to the repo. There's also lots of research showing that models regularly produce incorrect trace tokens even with a correct solution, so there's questionable value even from a debugging perspective.
I did this in the beginning and realized I never went back to it. I think we have to learn to embrace the chaos. We can try to place a couple of anchors in the search space by having Claude summarize the code base every once in a while, but I am not sure if even that is necessary. The code it writes is git versioned and is probably enough to go on.
I think so. If nothing else, when you deploy and see a bug, you can have a script that revives the LLMs of the last N commits and ask "would your change have caused this?" Probably wouldn't work or be any more efficient than a new debugging agent most of the time, but it might sometimes and you'd have a fix PR ready before you even answered the pager, and a postmortem that includes WHY it did so, and a prompt to prevent that behavior in the future. And it's cheap, so why not.
Maybe not a permanent part of the commit, but something stored on the side for a few weeks at a time. Or even permanently, it could be useful to go back and ask, "why did you do it that way?", and realize that the reason is no longer relevant and you can simplify the design without worrying you're breaking something.
The barrier for entry is just including the complete sessions. It gets a little nuanced because of the sheer size and workflows around squash merging and what not, and deciding where you actually want to store the sessions. For instance, get notes is intuitive; however, there are complexities around it. Less elegant approach is just to take all sessions in separate branches.
Beyond this, you could have agents summarize an intuitive data structure as to why certain commits exist and how the code arrived there. I think this would be a general utility for human and AI code reviewers alike. That is what we built. Cost /utility need to make sense. Research needs to determine if this is all actually better than proper comments in code
I've gotten into the habit of having the LLM produce a description of their process and summarize the change, Than I add that along with the model I used after my own commit message. It lets me know where I use AI and what I thought it did as well as what I thought it did.
The entire prompt and process would be fine if my git history was subject to research but really it is a tool for me or anyone else who wants to know what happened at a given time.
If the model in use is managed by a 3rd party, can be updated at will, and also gives different output each time it is interacted with, what is the main benefit?
If I chat with an agent and give an initial prompt, and it gets "aspect A" (some arbitrary aspect of the expected code) wrong, I'll iterate to get "aspect A" corrected. Other aspects of the output may have exactly matched my (potentially unstated) expectation.
If I feed the initial prompt into the agent at some later date, should I expect exactly "aspect A" to be incorrect again? It seems more likely the result will be different, maybe with some other aspects being "unexpected". Maybe these new problems weren't even discussed in the initial archived chat log, since at that time they happened to be generated in a way in alignment with the original engineers expectation.
reproducibility isn't really the goal imo. more like a decision audit trail -- same reason code comments have value even though you can't regenerate the code from them. six months later when you're debugging you want to know 'why did we choose this approach' not 'replay the exact conversation.'
Because intent matters and 6 months or 3 years down the line and it's time to refactor, and the original human author is long gone, there's a difference if the prompt was "I need a login screen" vs "I need a login screen, it should support magic link login and nothing else".
Back in the dark ages, you'd "cc -s hello.c" to check the assembler source. With time we stopped doing that and hello.c became the originating artefact. On the same basis the session becomes the originating artefact.
I'm not sure this analogy holds, for two reasons. First, even in the best case, chain-of-thought transcripts don't reliably tell you what the agent is doing and why it's doing it. Second, if you're dealing with a malicious actor, the transcript may have no relation to the code they're submitting.
The reason you don't have to look at assembly is that the .c file is essentially a 100% reliable and unambiguous spec of how the assembly will look like, and you will be generating the assembly from that .c file as a part of the build process anyway. I don't see how this works here. It adds a lengthy artifact without lessening the need for a code review. It may be useful for investigations in enterprise settings, but in the OSS ecosystem?...
Also, people using AI coding tools to submit patches to open-source projects are weirdly hesitant to disclose that.
This is only true if a llm session would produce a deterministic output which is not the case. This whole “LLMs are the new compiler” argument doesn’t hold water.
"Deterministic" is not the issue either, it's that small changes of the input will cause unknown changes in the output. You might theoretically achieve determinism and reproducibility for the exact same input (seeding the random number generators etc.), but the issue is that even if you formulate your request just a little differently, by changing punctuation for example, you'll get an entirely different output.
With compilers, the rules are clear, e.g. if you replace variable names with different ones, the program will still do the same thing. If you add spaces in places where whitespace doesn't matter, like around operators, the resulting behavior will still be the same. You change one function's definition, it doesn't impact another function's definition. (I'm sure you can nitpick this with some edge case, but that's not the point, it overwhelmingly can be relied upon in this way in day to day work.)
LLMs are non-deterministic, you would end up with a different output even if you paste the same conversation in. Even if the model was identical at the time you tried to reproduce it. Which gets less likely as time passes.
Also, why would you need to reproduce it? You have the code. Almost any modification to said code would benefit from a fresh context and refined prompt.
An actual full context of a thinking agent is asinine, full of busy work, at best if you want to preserve the "reason" for the commits contents maybe you could summarise the context.
Other than that I see no reason to store the whole context per commit.
If you can, run several agents. They document their process. Trade offs considered, reasoning. Etc. it’s not a full log of the session but a reasonable history of how the code came to be. Commit it with the code. Namespace it however you want.
In our (small) team, we’ve taken to documenting/disclosing what part(s) of the process an LLM tool played in the proposed changes. We’ve all agreed that we like this better, both as submitters and reviewers. And though we’ve discussed why, none of us has coined exactly WHY we like this model better.
hell to the no, in between coding sessions, I go out on plenty of sidebars about random topics that help me, the prompter understand the problem more. Prompts in this way are entirely related to context (pre-knowledge) that is not available to the LLMs.
One of the use cases i see for this tool is helping companies to understand the output coming from the llm blackbox and the process which the employee took to complete a certain task
Except it doesn't capture the majority of uses of AI, in my experience. In my current practice, the the vast majority of AI use is autocompletions, or small inline prompts. ("Fix this error."; "Open an ALSA midi connection" (things that avoid a to trip into awful documentation); "if (one of the query parameters is "gear='ir') ..." (things that break flow by forcing a trip into excellent but overly verbose Javascript URL API documentation)). Only very occasionally will I prompt for a big chunk of code.
I’ve had the same thought, but after playing around with it, it just seems like adding noise. I never find myself looking at generated code and wondering “what prompt lead to that?” There’s no point, I won’t get any kind of useful response - I’m better off talking to the developer who committed it, that’s how code review works.
Maybe Git isn't the right tool to track the sessions. Some kind of new Semi-Human Intelligence Tracking tool. It will need a clever and shorter name though.
I've thought about this, and I do save the sessions for educational purposes. But what I ended up doing is exactly what I ask developers to do: update the bug report with the analysis, plan, notes etc. In the case there's a single PR fixing one bug, GitHub and Claude tend to prefer this information go in the PR description. That's ok for me since it's one click from the bug.
I do think there's more value in ensuring that the initial spec, or the "first prompt" (which IME is usually much bigger and tries to get 80% of the way there) is stored. And, maybe part of the product is an LLM summary of that spec, the changes we made to the spec within the session, and a summary of what is built. But... that could be the commit message? Or just in a markdown file. Or in Notion or whatever.
We could have LLMs ingest all these historical sessions, and use them as context for the current session. Basically treat the current session as an extension of a much, much longer previous session.
Plus, future models might be able to "understand" the limitations of current models, and use the historical session info to identity where the generated code could have deviated from user intention. That might be useful for generating code, or just more efficient analysis by spending more tokens on possible "hotspots", etc.
Basically, it's high time we start capturing any and all human input for future models, especially open source model development, because I'm sure the companies already have a bunch of this kind of data.
Replication crisis[1].
Given initial conditions and even accounting for 'noise' would a LLm arrive at the same output.It should , for the same reason math problems require one to show their working. Scientific papers require the methods and pseudocode while also requireing limitations to be stated.
Without similar guardrails , maintainance and extension of future code becomes a choose your own adventure.Where you have to guess at the intent and conditions of the LLM used.
[1] https://www.ipr.northwestern.edu/news/2024/an-existential-cr...
In fact, I'd wager that all that excess noise would make it harder to discern meaningful things in the future than simply distilling the meaningful parts of the session into comments and commit messages.
Hmm, I think that's the wrong comparison? The more useful comparison might be: should all your notes you made and dead ends you tried become part of the commit?
If I can run resume {session_id} within 30 days of a file’s latest change, there’s a strong chance I’ll continue evolving that story thread—or at least I’ve removed the friction if I choose to.
But I am not rooting for either, just saying.
We could also distribute the task to B, C, D, ... N actors, and assume that each of them would "cover" (i.e. understand) some part of A's output. But this suddenly becomes very labor intensive for other reasons, such as coordination and trust that all the reviewers cover adequately within the given time...
Or we could tell A that this is not a vibe playground and fire them.
The solution is as it always has been: the commit message is where you convey to your fellow humans, succinctly and clearly, why you made the commit.
I like the idea of committing the initial transcript somewhere in the docs/ directory or something. I'll very likely start doing this in my side projects.
The objections I heard, which seemed solid, are (1) there's no single input to the AI (i.e. no single session or prompt) from which such a project is generated,
(2) the back-and-forth between human and AI isn't exactly like working with a compiler (the loop of source code -> object code) - it's also like a conversation between two engineers [1]. In the former case, you can make the source code into an artifact and treat that as "the project", but you can't really do that in the latter case, and
(3) even if you could, the resulting artifact would be so noisy and complicated that saving it as part of the project wouldn't add much value.
At the same time, people have been submitting so many Show HNs of generated projects, often with nothing more than a generated repo with a generated readme. We need a better way of processing these because treating them like old-fashioned Show HNs is overwhelming the system with noise right now [2].
I don't want to exclude these projects, because (1) some of them are good, (2) there's nothing wrong with more people being able to create and share things, (3) it's foolish to fight the future, and (4) there's no obvious way to exclude them anyhow.
But the status quo isn't great because these projects, at the moment, are mostly not that interesting. What's needed is some kind of support to make them more interesting.
So, community: what should we do?
[1] this point came from seldrige at https://news.ycombinator.com/item?id=47096903 and https://news.ycombinator.com/item?id=47108653.
YoumuChan makes a similar point at https://news.ycombinator.com/item?id=47213296, comparing it to Google search history. The analogy is different but the issue (signal/noise ratio) is the same.
[2] Is Show HN dead? No, but it's drowning - https://news.ycombinator.com/item?id=47045804 - Feb 2026 (422 comments)
[1] https://boristane.com/blog/how-i-use-claude-code/
IMO it's not the lack of context that makes them uninteresting. It's the fact that the bar for "this took effort and thought to make" has moved, so it's just a lot easier to make things that we would've considered interesting two years ago.
If you're asking HN readers to sift through additional commit history or "session transcripts" in order to decide if it's interesting, because there's a lot of noise, you've already failed. There's gonna be too much noise to make it worth that sifting. The elevator pitch is just gonna need to be that much different from "vibe coded thing X" in order for a project to be worth much.
I don’t have anything against vibe coded apps, but what makes them interesting is to see the vibe coding session and all the false starts along the way. You learn with them as they explore the problem space.
> It runs a commit and then stores a cleaned markdown conversation as a git note on the new commit.
So it doesn't seem that normal commit history is affected - git stores notes specially, outside of the commit (https://git-scm.com/docs/git-notes).
In fact github doesn't even display them, according to some (two-year-old) blog posts I'm seeing. Not sure about other interfaces to git (magit, other forges), but git log is definitely able to ignore them (https://git-scm.com/docs/git-log#Documentation/git-log.txt--...).
This doesn't mean the saved artifacts would necessarily be valuable - just that, unlike a more naive solution (saving in commit messages or in some directory of tracked files) they may not get in the way of ordinary workflows aside from maybe bloating the repo to some degree.
> Is Show HN dead? No, but it's drowning
Is spam on topic? and are AI codegen bots part of the community?
To me, the value of Show HN was rarely the thing, it was the work and attention that someone put into it. AI bot's don't do work. (What they do is worth it's own word, but it's not the same as work).
> I don't want to exclude these projects, because (1) some of them are good,
Most of them are barely passable at best, but I say that as a very biased person. But I'll reiterate my previous point. I'm willing to share my attention with people who've invested significant amounts of their own time. SIGNIFICANT amounts, of their time, not their tokens.
> (2) there's nothing wrong with more people being able to create and share things
This is true, only in isolation. Here, the topic is more, what to do about all this new noise, (not; should people share things they think are cool). If the noise drowns out the signal, you're allowed that noise to ruin something that was useful.
> (3) it's foolish to fight the future
coward!
I do hope you take that as the tongue-in-cheek way I meant it, because I say it as a friend would; but I refuse to resign myself completely to fatalism. Fighting the future is different from letting people doing something different ruin the good thing you currently have. Sure electric cars are the future, but that's no reason to welcome them in a group that loves rebuilding classic hot rods.
> (4) there's no obvious way to exclude them anyhow.
You got me there. But then, I just have to take your word for it, because it's not a problem I've spent a lot of time figuring out. But even then, I'd say it's a cultural problem. If people ahem, in a leadership position, comment ShowHN is reserved for projects that took a lot of time investment, and not just ideas with code... eventually the problem would solve itself, no? The inertia may take some time, but then this whole comment is about time...
I know it's not anymore, but to me, HN still somehow, feels a niche community. Given that, I'd like to encourage you to optimize for the people who want to invest time into getting good at something. A very small number of these projects could become those, but trying to optimize for best fairness to everyone, time spent be damned... I believe will turn the people who lift the quality of HN away.
This is the major blocker for me. However, there might be value in saving a summary - basically the same as what you would get from taking meeting notes and then summarizing the important points.
In this case, it was more of write the X language compiler using X. I had to prove to myself if keeping the session made sense, and what better way to do it than to vibe code the tool to audit vibe code.
I do get your point though
There is very clearly many things wrong with this when the things being shown require very little skill or effort.
The future you're concerned with defending includes bots being a large part of this community, potentially the majority. Those bots will not only submit comments autonomously, but create these projects, and Show HN threads. I.e. there will be no human in the loop.
This is not something unique to this forum, but to the internet at large. We're drowning in bot-generated content, and now it is fully automated.
So the fundamental question is: do you want to treat bots as human users?
Ignoring the existential issue, whatever answer you choose, it will inevitably alienate a portion of existing (human) users. It's silly I have to say this, but bots don't think, nor "care", but will keep coming regardless.
To me the obvious answer is "no". All web sites that wish to preserve their humanity will have to do a complete block of machine-generated content. It's a tough nut to crack, but I reckon YC would know some people capable of tackling this.
It's important to note that this state of a human driving the machine directly is only temporary. The people who think these are tools as any other are sorely mistaken. This tool can do their minimal effort job much more efficiently, cheaper, and with better results, and it's only a matter of time until the human is completely displaced. This will take longer for more complex work, of course, but creating regurgitated projects on GitHub and posting content on discussion forums is a very low bar activity.
git is only one possible location.
I think there is very valuable information in session logs, like the prompts, or the usage statistics at the end of the session, which model was used etc. But git history or the commit messages should focus on the outcome of the work, not on the process itself. This is why the whole issue discussion before work in git starts is also typically kept separately in tickets. Not in git itself, but close to it.
There're platforms like tulpal.com which move the whole local agent-supported process to the server and therefore have much better after-the-fact observability in what happened.
Prompts probably should be distilled / summarized, especially if they are research-based prompts, but code-gen prompts should probably be saved verbatim.
Reproducibility is a thing, and though perfect reproducibility isn't desirable, something needs to make up for the fact that vibe-coding is highly inscrutable and hard to review. Making the summary of the session too vague / distilled makes it hard to iterate and improve when / if some bad prompts / assumptions are not documented in any way.
I am talking about reproducing the (perhaps erroneous) logic or thinking or motivations in cases of bugs, not reproducing outputs perfectly. As you said, current LLM models are non-deterministic, so we can't have perfect reproducibility based on the prompts, but, when trying to fix a bug, having the basic prompts we can see if we run into similar issues given a bad prompt. This gives us information about whether the bad / bugged code was just a random spasm, or something reflecting bad / missing logic in the prompt.
> Is it important to know that when the foo file was refactored the developer chose to do it by hand vs letting the IDE do it with an auto-refactor command vs just doing a simple find and replace? Maybe it is for code review purposes, but for "reproducibility" I don't think it is.
I am really using "reproducibility" more abstractly here, and don't mean perfect reproducibility of the same code. I.e. consider this situation: "A developer said AI wrote this code according to these specs and prompt, which, according to all reviewers, shouldn't produce the errors and bad code we are seeing. Let's see if we can indeed reproduce similar code given their specs and prompt". The less evidence we have of the specifics of a session, the less reproducible their generated code is, in this sense.
Huh, I thought that's what commit message is for.
My current workflow: write a detailed plan first, then run a standard implement -> review loop where the agent updates the plan as errors surface. The final plan doc becomes something genuinely useful for future iterations, not just a transcript of how we got there.
Repos also message each other and coordinate plans and changes with each other and make feature requests which the repo agent then manages.
So I keep the agents’ semantically compressed memories as part of the repo as well as the original transcripts because often they lose coherence and reviewing every user submitted prompt realigns the specs and stories and requirements.
This applies both to future AI tools and also experts, and experts instructing novices.
To some degree, the lack of documenting AI sessions is also at the core of much of the skepticism toward the value of AI coding in general: there are so many claims of successes / failures, but only a vanishingly small amount of actual detailed receipts.
Automating the documentation of some aspects of the sessions (skills + prompts, at least) is something both AI skeptics and proponents ought to be able to agree on.
EDIT: Heck, if you also automate documenting the time spent prompting and waiting for answers and/or code-gen, this would also go a long way to providing really concrete evidence for / against the various claims of productivity gains.
Otherwise, when fixing a bug, you just risk starting from scratch and wasting time using the same prompts and/or assumptions that led to the issue in the first place.
Much of the reason code review was/is worth the time is because it can teach people to improve, and prevent future mistakes. Code review is not really about "correctness", beyond basic issues, because subtle logic errors are in general very hard to spot; that is covered by testing (or, unfortunately, deployment surprises).
With AI, at least as it is currently implemented, there is no learning, as such, so this removes much of the value of code review. But, if the goal is to prevent future mistakes, having some info about the prompts that led to the code at least brings some value back to the review process.
EDIT: Also, from a business standpoint, you still need to select for competent/incompetent prompters/AI users. It is hard to do so when you have no evidence of what the session looked like. Also, how can you teach juniors to improve their vibe-coding if you can't see anything about their sessions?
I don't think this is obvious at all. We don't make the keystroke logs part of the commit history. We don't make the menu item selections part of the commit history. We don't make the 20 iterations you do while trying to debug an issue part of the commit history (well, maybe some people do but most people I know re-write the same file multiple times before committing, or rebase/squash intermediate commits into more useful logical commits. We don't make the search history part of the commit history. We don't make the discussion that two devs have about the project part of the commit history either.
Some of these things might be useful to preserve some of the time either in the commit history or along side it. For example, having some documentation for the intent behind a given series of commits and any assumptions made can be quite valuable in the future, but every single discussion between any two devs on a project as part of the commit history would be so much noise for very little gain. AI prompts and sessions seem to me to fall into that same bucket.
Right, agreed on this, we want a distillation, not documentation of every step.
> For example, having some documentation for the intent behind a given series of commits and any assumptions made can be quite valuable in the future, but every single discussion between any two devs on a project as part of the commit history would be so much noise for very little gain. AI prompts and sessions seem to me to fall into that same bucket.
Yes, documenting every single discussion is a waste / too much to process, but I do think prompts at least are pretty crucial relative to sessions. Prompts basically are the core intentions / motivations (skills aside). It is hard to say whether we really want earlier / later prompts, given how much context changes based on the early prompts, but having no info about prompts or sessions is a definite negative in vibe-coding, where review is weak and good documentation, comments, and commit messages are only weakly incentivized.
> Some of these things might be useful to preserve some of the time either in the commit history or along side it
Right, along side is fine to me as well. Just something has to make up for the fact that vibe-coding only appears faster (currently) if you ignore the fact it is weakly-reviewed and almost certainly incurring technical debt. Documenting some basic aspects of the vibe-coding process is the most basic and easy way to reduce these long-term costs.
EDIT: Also, as I said, information about the prompts quickly reveals competence / incompetence, and is crucial for management / business in hiring, promotions, managing token budgets, etc. Oh, and of course, one of the main purposes of code review was to teach. Now, that teaching has to shift toward teaching better prompting and AI use. That gets a lot harder with no documentation of the session!
Not once have I found it useful: if the intention isn't clear from the code and/or concise docs, the code is bad and needs to be polished.
Well written code written with intention is instantly interpretable with an LLM. Sending the developer or LLM down a rabbit hole of drafts is a waste of cognition and context.
If you think you should squash commits, then you're only really interested in the final code change. The history of how the dev got there can go in the bin.
If you don't think you should squash commits then you're interested in being able to look back at the journey that got the dev to the final code change.
Both approaches are valid for different reasons but they're a source of long and furious debate on every team I've been on. Whether or not you should be keeping a history of your AI sessions alongside the code could be useful for debugging (less code debugging, more thought process debugging) but the 'prefer squash' developers usually prefer to look the existing code rather than the history of changes to steer it back on course, so why would they start looking at AI sessions if they don't look at commits?
All that said, your AI's memory could easily be stored and managed somewhere separately to the repo history, and in a way that makes it more easily accessible to the LLM you choose, so probably not.
And yes, it's my understanding that mercurial and fossil do actually do more of this than git does, but I haven't actually worked on any projects using those so I can't comment.
With vibe-coding, you risk having no documentation at all for the reasoning (AI comments and tests can be degenerate / useless), but the prompts, at bare minimum, reveal something about the reasoning / motivation.
Whether this needs to be in git or not is a side issue, but there is benefit to having this available.
Chat-Session-Ref: claude://gjhgdvbnjuteshjoiyew
Perhaps that could also link out to other kinds of meeting transcripts or something too.
It's less clear to me if the software isn't crafted by a human at all, though. In that case I would prefer to see the prompt.
1. Writing a spec with clear acceptance criteria.
2. Assigning IDs to my acceptance criteria. Sounds tedious, but actually the idea wasn’t mine, at some point an agent went and did it without me asking. The references proved so useful for guiding my review that I formalized the process (and switched from .md to .yaml to make it easier).
3. Giving my agents a source of truth to share implementation progress so they can plan their own tasks and more effectively review.
Of course, I can’t help myself, I had to formalize it into a spec standard and a toolkit. Gonna open source it all soon, but I really want feedback before I go too far down the rabbit hole:
https://acai.sh
Might be tedious for a human, but agents should do that just fine?
The noise to signal ratio seems so bad. You’d have to sift through every little “thought”. If I could record my thought stream would I add it to the commit? Hell no.
Now, a summary of the reasoning, assumptions made and what alternatives were considered? Sure, that makes for a great message.
Make a button that does X when clicked.
Agent makes the button.
I tell it to make the button red.
Agent makes it red.
I test it, it is missing an edge case. I tell it to fix it.
It fixes it.
I don't like where the button is. I tell it to put it in the sidebar.
It does that.
I can go on and on. But we don't need to know all those intermediaries. We just need to know Red button that does X by Y mechanism is in the sidebar. Tests that include edge cases here. All tests passing. 2026-03-01
And that document is persisted.
If later, the button gets deleted or moved again or something, we can instruct the agent to say why. Button deleted because not used and was noisy. 2026-03-02
This can be made trivial via skills, but I find it a good way to understand a bit more deeply than commit messages would allow me to do.
Of course, we can also just write (or instruct agents to write) better PRs but AFAICT there's no easy way to know that the button came about or was deleted by which PR unless you spelunk in git blame.
If you do proper software development (planing, spec, task breakdown, test case spec, implementation, unit test, acceptance test, ...) implementation is just a single step and the generated artifact is the source code. And that's what needs to be checked in. All the other artifacts are usually stored elsewhere.
If you do spec and planing with AI, you should also commit the outcome and maybe also the prompt and session (like a meeting note on a spec meeting). But it's a different artifact then.
But if you skip all the steps and put your idea directly to an coding agent in the hope that the result is a final, tested and production ready software, you should absolutely commit the whole chat session (or at least make the AI create a summary of it).
From that perspective alone the session would be important meta information that could be used to determine the rationale of a commit - right from the intent (prompt) to what the harness (Claude code etc) made of it. So there is more value in keeping it even in your second scenario
For my work as one of developers in team, no. The way I prompt is my asset and advantage over others in a team who always complain about AI not being able to provide correct solutions and secures my career
That context could clarify the problem, why the solution was chosen, key assumptions, potential risks, and future work.
Now whenever I need to reason with what agent did & why, info is linked & ready on demand. If needed, session is also saved.
It helps a lot.
A coding session has a lot of 'left turn, dead end, backtrack' noise that buries the decision that actually mattered. Committing the full session is like committing compiler output — technically complete, practically unreadable.
We've been experimenting with structured post-task reflections instead: after completing significant work, capture what you tried, what failed, what you'd do differently, and the actual decision reasoning. A few hundred tokens instead of tens of thousands. Commits with a reflection pointer rather than an embedded session.
The result is more useful than raw logs. Future engineers (or future AI sessions) can understand intent without replaying the whole conversation. It's closer to how good commit messages work — not 'here's what changed' but 'here's why'.
Dang's point about there being no single session is also real. Our biggest tasks span multiple sessions and multiple contributors. 'Capture the session' doesn't compose. 'Capture the decision' does.
Also, how can we (or future AI models) hope to improve if there is only limited and summary documentation of AI usage?
Until then, it makes sense to automatically include some distillation of the AI generation process, by default, IMO.
I only log my own user messages not AI responses in a chat_log.md file, which is created by user message hook in the repo.
the actual problem is that AI produces MORE code not better code, and most people using it aren't reviewing what comes out. if you understood the code well enough to review it properly you wouldn't need the session log. and if you didn't understand it, the session log won't help you either because you'll just see the agent confidently explaining its own mistakes.
> have your agent write a commit message or a documentation file that is polished and intended for consumption
this is the right take. code review and commit messages matter more now than they ever did BECAUSE there's so much more code being generated. adding another artifact nobody reads doesn't fix the underlying issue which is that people skip the "understand what was built" step entirely.
The paradigm shift, which is a shift back, is to embrace the fact that you have to slow down, and understand all the code the ai is writing.
Soon only implementation details will matter. Code can be generated based on those specifications again and again.
https://acai.sh
So far this workflow is the only way I’ve been able to have any real success running parallel agents or assigning longer running tasks that don’t get thrown out.
You also have code comments, docs in the repo, the commit message, the description and comments on the PR, the description and comments on your Issue tracker.
Providing context for a change is a solved problem, and there is relatively mature MCP for all common tooling.
I copied it for my own tooling to make it work a bit better for my workflows.
You can see the session (including my typos) and compare what was asked for and what you got.
https://rsaksida.com/blog/ape-coding/
Ape Coding [fiction] - https://news.ycombinator.com/item?id=47206798 - March 2026 (93 comments)
Excellent idea, I just wish GitHub would show notes. You also risk losing those notes if you rebase the commit they are attached to, so make sure you only attach the notes to a commit on main.
I did work around squash to collect all sessions and concatenate them as a single one
There is so much undefined in how agentic coding is going to mature. Something like what you're doing will need to be a part of it. Hopefully this makes some impressions and pushes things forward.
Saving sessions is even more pointless without the full context the LLM uses that is hidden from the user. That's too noisy.
1. Using LLMs as a tool but still very much crafting the software "by hand",
2. Just prompting LLMs, not reading or understanding the source code and just running the software to verify the output.
A lot of comments here seem to be thinking of 1. But I'm pretty sure the OP is thinking of 2.
Maybe not a permanent part of the commit, but something stored on the side for a few weeks at a time. Or even permanently, it could be useful to go back and ask, "why did you do it that way?", and realize that the reason is no longer relevant and you can simplify the design without worrying you're breaking something.
https://github.com/eqtylab/y just a prototype, built at codex hackathon
The barrier for entry is just including the complete sessions. It gets a little nuanced because of the sheer size and workflows around squash merging and what not, and deciding where you actually want to store the sessions. For instance, get notes is intuitive; however, there are complexities around it. Less elegant approach is just to take all sessions in separate branches.
Beyond this, you could have agents summarize an intuitive data structure as to why certain commits exist and how the code arrived there. I think this would be a general utility for human and AI code reviewers alike. That is what we built. Cost /utility need to make sense. Research needs to determine if this is all actually better than proper comments in code
The entire prompt and process would be fine if my git history was subject to research but really it is a tool for me or anyone else who wants to know what happened at a given time.
If I chat with an agent and give an initial prompt, and it gets "aspect A" (some arbitrary aspect of the expected code) wrong, I'll iterate to get "aspect A" corrected. Other aspects of the output may have exactly matched my (potentially unstated) expectation.
If I feed the initial prompt into the agent at some later date, should I expect exactly "aspect A" to be incorrect again? It seems more likely the result will be different, maybe with some other aspects being "unexpected". Maybe these new problems weren't even discussed in the initial archived chat log, since at that time they happened to be generated in a way in alignment with the original engineers expectation.
EOM
Back in the dark ages, you'd "cc -s hello.c" to check the assembler source. With time we stopped doing that and hello.c became the originating artefact. On the same basis the session becomes the originating artefact.
The reason you don't have to look at assembly is that the .c file is essentially a 100% reliable and unambiguous spec of how the assembly will look like, and you will be generating the assembly from that .c file as a part of the build process anyway. I don't see how this works here. It adds a lengthy artifact without lessening the need for a code review. It may be useful for investigations in enterprise settings, but in the OSS ecosystem?...
Also, people using AI coding tools to submit patches to open-source projects are weirdly hesitant to disclose that.
With compilers, the rules are clear, e.g. if you replace variable names with different ones, the program will still do the same thing. If you add spaces in places where whitespace doesn't matter, like around operators, the resulting behavior will still be the same. You change one function's definition, it doesn't impact another function's definition. (I'm sure you can nitpick this with some edge case, but that's not the point, it overwhelmingly can be relied upon in this way in day to day work.)
That is very much not the case with LLMs
Also, why would you need to reproduce it? You have the code. Almost any modification to said code would benefit from a fresh context and refined prompt.
An actual full context of a thinking agent is asinine, full of busy work, at best if you want to preserve the "reason" for the commits contents maybe you could summarise the context.
Other than that I see no reason to store the whole context per commit.
If that was important, why are we not already doing things like this. Should I have always been putting my browser history in commits?
You can avoid the noise with git notes. Add the session as a note on the commit. No one has to read them if they’re not interested.
Germans are much more diligent about staging before they commit.
https://techcrunch.com/2026/02/10/former-github-ceo-raises-r...
https://news.ycombinator.com/item?id=46961345
Consider:
"I got a bug report from this user:
... bunch of user PII ..."
The LLM will do the right thing with the code, the developer reviewed the code and didn't see any mention of the original user or bug report data.
Now the notes thing they forgot about goes and makes this all public.