I’ve been thinking about how to do this well, and about how my own memory actually works. I think what happens is that I’ve either got the facts now (that part is easy to repro with a system like this), or I’ve got the sense that I could have the facts after doing some retrieval work. It’s like a feeling that somewhere in cold storage is the info I need, so I kick off a background process to go get it. Sometimes it works.
That second system, the “I know this…” system, is, I think, what’s missing from these LLMs. They have the first one (they KNOW things they’ve seen during training), but what I think is missing is the ability to build up a working set as they are doing things, then get the “feeling” that they could know something if they did a little retrieval work. I’ve been thinking about how to repro that in a computer, where knowledge is 0|1 but can be slow to fetch.
You've identified a fundamental gap - that meta-cognitive "I could retrieve this" intuition that humans have but LLMs lack.
Our graph approach addresses this:
- Structure knowledge with visible relationship patterns before loading details
- Retrieval system "senses" related information without fetching everything
- Temporal tracking prioritizes recent/relevant information
- Recall-frequency tracking (planned) so that frequently accessed facts get higher weight
In SOL (our personal assistant), we guide LLMs to use memory more effectively by providing structured knowledge boundaries. This creates that "I could know this if I looked" capability.
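To make that "sense without fetching" idea concrete, here's roughly the kind of probe we have in mind: check how much stored material sits near the entities in the current working set, without loading any statements yet. This is an illustrative sketch (the index and the recall() call are stand-ins, not CORE's actual API):

    from collections import defaultdict

    # Illustrative in-memory index: entity -> ids of statements that mention it.
    # In practice this would be the graph store, not a Python dict.
    statement_index: dict[str, set[int]] = defaultdict(set)

    def could_know(entities_in_context: set[str], min_hits: int = 3) -> bool:
        """Cheap 'I could know this if I looked' check.

        Counts how many stored statements touch the entities currently in the
        working set, without fetching their contents. If enough exist, a real
        retrieval call is probably worth the context it will cost.
        """
        candidate_ids: set[int] = set()
        for entity in entities_in_context:
            candidate_ids |= statement_index.get(entity, set())
        return len(candidate_ids) >= min_hits

    # Only invoke the (expensive) memory tool when the probe fires:
    # if could_know({"attic bathroom", "repairs"}):
    #     facts = recall("attic bathroom repairs")  # hypothetical retrieval call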
However, keeping a tight, constrained context turns out to be pretty important for correct LLM results (https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-ho...).
Do you have a take on how to reconcile the tension between these objectives? How do you make sure the model has access to relevant info, while explicitly excluding irrelevant or confounding factors from the context?
that's the exact problem we've been solving! Context bloat vs. memory depth is the core challenge.
our approach tackles this by being selective, not comprehensive. We don't dump everything into context - instead, we:
- use graph structure to identify truly relevant facts (not just keyword matches)
- leverage temporal tracking to prioritize current information and filter out outdated beliefs
- structure memories as discrete statements that can be included/excluded individually
the big advantage? Instead of retrieving entire conversations or documents, we can pull just the specific facts and relevant episodes needed for a given query.
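To be concrete about "selective, not comprehensive": the assembly step is basically scoring discrete statements and stopping at a context budget. A simplified sketch (field names are illustrative, not CORE's actual schema):

    from dataclasses import dataclass

    @dataclass
    class CandidateFact:
        text: str
        relevance: float                 # graph/similarity score for the current query
        last_seen: float                 # unix timestamp of the latest supporting episode
        invalid_at: float | None = None  # set once a newer statement supersedes it

    def build_context(candidates: list[CandidateFact], budget_chars: int = 2000) -> list[str]:
        """Pick the few statements worth putting in front of the model."""
        # 1. drop superseded beliefs entirely
        live = [c for c in candidates if c.invalid_at is None]
        # 2. rank by relevance, breaking ties toward more recent information
        live.sort(key=lambda c: (c.relevance, c.last_seen), reverse=True)
        # 3. include statements until the context budget is spent
        picked, used = [], 0
        for c in live:
            if used + len(c.text) > budget_chars:
                break
            picked.append(c.text)
            used += len(c.text)
        return picked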
it's like having a good assistant who knows when to remind you about something relevant without overwhelming you with every tangentially related memory.
the graph structure also gives users more transparency - they can see exactly which memories are influencing responses and why, rather than a black-box retrieval system.
ps: one of the authors of CORE
One of the challenges I was facing with other memory MCP servers is getting the LLM clients to actually use them to recall relevant information when they need it. Implementing the MCP tools is one thing; getting LLM clients to invoke them at the right time is another.
We faced the same challenge while building SOL (https://github.com/RedPlanetHQ/sol), a personal assistant that relies heavily on memory for context and continuity.
How do you solve that problem?
Getting LLMs to invoke memory tools at the right time is definitely trickier than just wiring up MCP correctly. We're still refining it, but we've made good progress by explicitly guiding the assistant within the system prompt on when and how to use memory.
You can see an example of how we structure this in SOL here: Prompt instructions for memory usage (https://github.com/RedPlanetHQ/sol/blob/964ed23c885910e040bd...)
Using something along similar lines to the rules in Claude/Cursor etc. has been working better. It's not perfect yet, but this combination of prompt engineering and structured tool exposure has been moving us in the right direction.
ps - one of the authors of CORE
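For a flavour of what that guidance looks like, here's a paraphrased, illustrative excerpt (not the exact SOL prompt; memory_search / memory_add are placeholder tool names):

    Memory usage rules (illustrative):
    - Before answering anything that depends on the user's history, preferences,
      or past decisions, call memory_search first.
    - After the user states a new fact, preference, or correction, call memory_add
      with a single self-contained statement.
    - Do not call memory tools for general-knowledge questions.
    - If memory_search returns nothing relevant, say so instead of guessing.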
For those asking how this is different from a simple text based memory archive, I think that is answered here:
---
Unlike most memory systems, which act like basic sticky notes and only show what's true right now, C.O.R.E is built as a dynamic, living temporal knowledge graph:
Every fact is a first-class “Statement” with full history, not just a static edge between entities.
Each statement includes what was said, who said it, when it happened, and why it matters.
You get full transparency: you can always trace the source, see what changed, and explore why the system “believes” something.
---
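Concretely, you can picture a Statement as something like this (a minimal sketch of the idea; field names are illustrative rather than C.O.R.E's actual schema):

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class Statement:
        """A fact as a first-class object, not just an edge between entities."""
        subject: str                  # e.g. "user"
        predicate: str                # e.g. "prefers_cuisine"
        obj: str                      # e.g. "thai"
        source: str                   # the document / conversation it came from
        speaker: str                  # who said it
        asserted_at: datetime         # when it was said
        valid_from: datetime          # when it became true, as far as we know
        invalid_at: datetime | None = None                    # set when superseded
        supersedes: list[str] = field(default_factory=list)   # ids of statements it replaced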
I'm not sure the graph offers any clear advantage in the demonstrated use case.
It's overhead in coding.
The source is the doc. Raw text is as much of a fact as an abstracted data structure derived from that text (which is produced by an external LLM - provenance seems to break here btw: what other context is used to support that transcription, and why is it more reliable than a doc within the actual codebase?).
Hey - I agree that the demonstrated use case can be solved with a simple plan.md file in the codebase itself.
With this use case we wanted to showcase the shareable aspect of CORE more. The main problem we wanted to address was "take your memory to every AI" so you don't have to repeat yourself again and again.
The relational graph aspect of CORE's architecture is overkill for simple fact recall. But if you want an intelligent memory layer about you that can answer What, When, and Why, and that is accessible in all the major AI tools you use, then CORE makes more sense.
This does not seem to be local, and additionally it appears to be tied to one SaaS LLM provider?
Hey, we are actively working on improving support for Llama models. At the moment CORE does not provide optimal results with Llama-based models, but we are making progress to ensure better compatibility and output in the near future.
Also, we built CORE first internally for our main project SOL, an AI personal assistant. Along the journey of building a better memory for our assistant we realised its importance, and we are of the opinion that memory should not be vendor-locked. It should be pluggable and belong to the user. Hence we built it as a separate service.
I definitely would not recommend Llama models; they were mostly outdated by the time they were released. The likes of Qwen, DeepSeek, etc. are much more useful.
We will evaluate Qwen and DeepSeek going forward, thanks for mentioning them.
We designed CORE for complex, evolving memory where text files break down.
Example: health conversations across ChatGPT, Claude, etc., where your parameters change over time.
A text file can't give you "What medications have I tried, why did I stop each one, and when?" or "Show me how my symptoms evolved over 6 months."
For timeline and relational memory CORE wins; for static facts, text files are enough, I guess.
Hey, plan.md will mostly be a static file that you have to maintain manually. It won't be relational and won't be able to form connections between pieces of info, and you can't recall or query it intelligently ("When did my preference change?").
CORE lets you:
- Automatically extract and store facts from conversations
- Build intelligent connections between related information
- Answer complex queries ("What did I say about something and when?")
- Detect contradictions and explain changes with full context
For simple fact recall plan.md should work, but for complex systems a relational memory should help more.
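As a sketch of what "query intelligently" buys you over a flat file: once facts are discrete statements with validity intervals, "when did my preference change?" becomes a filter plus a sort. Illustrative code with made-up data, not CORE's API:

    from datetime import datetime

    # each stored fact: (subject, predicate, obj, valid_from, invalid_at)
    facts = [
        ("user", "prefers_cuisine", "thai",    datetime(2024, 1, 10), datetime(2025, 3, 2)),
        ("user", "prefers_cuisine", "italian", datetime(2025, 3, 2),  None),
    ]

    def preference_history(subject: str, predicate: str):
        """Return the full timeline for one preference, oldest first."""
        timeline = [f for f in facts if f[0] == subject and f[1] == predicate]
        return sorted(timeline, key=lambda f: f[3])

    for subj, pred, obj, start, end in preference_history("user", "prefers_cuisine"):
        status = "current" if end is None else f"until {end.date()}"
        print(f"{subj} {pred}: {obj} (from {start.date()}, {status})")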
I've been building a memory system myself, so I have some thoughts...
Why use a knowledge graph/triples? I have not been able to come up with any use for the predicate or reason to make these associations. Simple flat statements seem entirely sufficient and more accurate to the source material.
... OK, looking a little more, I'm guessing it is a way to see when a memory should be updated; you can match on the first two items of the triple. In a sense you are normalizing the input and hoping that shows an update or duplicate memory.
I would be curious how well this works in practice. I've spent a fair amount of effort trying to merge and deduplicate memories in a more ad hoc way, generally using the LLM for this process (giving it a new memory and a list of old memories). It would feel much more deterministic and understandable to do this in a structured way. On the other hand I'm not sure how stable these triples would be. Would they all end up attached to the user? And will the predicate be helpful to establish meaningful relationships, or could the memories simply be attached to an entity?
For instance I could list a bunch of facts related to my house: the address, which room I sleep in, upcoming and past repairs, observations on the yard, etc. Many (but not all) of these could be best represented as one "about my house" memory, with all the details embedded in one string of natural language text. It would be great to structure repairs... but how will that work? (my house, needs repair, attic bath)? Or (my house, has room, attic bathroom) and (attic bathroom, needs repair, bath)? Will the system pick one somewhat arbitrarily and then, seeing that past memory, replicate its structure?
Another representation that occurs to me for detecting duplicates and updates is simply "is related to entities". This creates a flatter database where there's less ambiguity in how memories are represented.
Anyway, that's one area that stuck out to me. It wasn't clear to me where the schema for memories is in the codebase; I think that would be very useful for understanding the system.
I built a graph memory MCP tool as well. I don't use triplets; instead I generate nodes. A node is composed of (id, title, text), and the text can contain inline links, like @45, referencing past nodes. So the LLM can create both a node and its relations in one tool call.
My MCP has two tools - a search tool and a node adding tool. The search tool uses embedding similarity to retrieve K nodes, then expands on links and fetches another P nodes. By controlling K and P the LLM can choose to use the graph as a simple RAG or as a pure linked graph, or anywhere in-between. In practice I use Claude which is able to do deep searches. What it does not find in one call it locates in 4-5 calls.
The LLM will only add new ideas not already in the KB. It does the searching, filtering and writing. I am just directing this process. The KB can grow unbounded because when I need to add new nodes I first search the KB and find relevant nodes to link to without loading every node.
But one problem I see with these memory systems is that they can reduce interest in a topic once we put it in the KB.
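To make the K/P mechanics above concrete, the search tool is roughly this shape (a simplified sketch; embed() and the in-memory store are stand-ins for whatever embedding model and persistence you actually use):

    import re

    # node store: id -> {"title": ..., "text": ...}; text may contain inline @id links
    nodes: dict[int, dict] = {}

    def embed(text: str) -> list[float]:
        raise NotImplementedError  # stand-in for a real embedding model

    def cosine(a: list[float], b: list[float]) -> float:
        num = sum(x * y for x, y in zip(a, b))
        den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return num / den if den else 0.0

    def search(query: str, k: int = 5, p: int = 10) -> list[int]:
        """Top-k nodes by embedding similarity, then expand up to p linked nodes.

        p=0 behaves like plain RAG; a small k with a large p leans on the graph.
        """
        q = embed(query)
        # (node embeddings would be precomputed in practice, not re-embedded here)
        ranked = sorted(nodes, key=lambda i: cosine(q, embed(nodes[i]["text"])), reverse=True)
        hits = ranked[:k]
        expanded: list[int] = []
        for node_id in hits:
            for link in re.findall(r"@(\d+)", nodes[node_id]["text"]):
                linked = int(link)
                if linked not in hits and linked not in expanded:
                    expanded.append(linked)
        return hits + expanded[:p]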
Hey, another co-founder of CORE. Great question about triples vs. fact statements! Your house example actually highlights why we went with a reified graph:
With fact statements, you'd need to decide upfront: is this one "about my house" memory or separate facts? Our approach lets you do both:
Representation flexibility: For your house example, we can model (house, needs repair, attic bath) AND connect it to (attic bathroom, has fixture, bath). The LLM extraction helps maintain consistency, but the graph structure allows both high-level and detailed representations simultaneously.
Updating and deduplication:
- We identify potential duplicates/updates by matching subject-predicate patterns
- When new information contradicts old (e.g., repair completed), we don't delete - we mark the old statement invalid at timestamp X and create a new valid statement
- This maintains a complete history while still showing the current state
- The structured format makes conflicts explicit rather than buried in text
The schema isn't rigid - we have predefined types (Person, Place, etc.), but relationships form dynamically. This gives structure where helpful, but flexibility where needed.
In practice, we've found this approach more deterministic for tracking knowledge evolution while still preserving the context and nuance of natural language through provenance links.
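In code terms, the update path is roughly: match on (subject, predicate), and if the object disagrees, close out the old statement instead of overwriting it. A simplified sketch of that flow, not the actual CORE implementation:

    from datetime import datetime, timezone

    # toy store: list of statements as dicts
    store: list[dict] = []

    def upsert_statement(subject: str, predicate: str, obj: str) -> dict:
        """Insert a new fact; invalidate (never delete) any conflicting older fact."""
        now = datetime.now(timezone.utc)
        for stmt in store:
            if (stmt["subject"] == subject and stmt["predicate"] == predicate
                    and stmt["invalid_at"] is None):
                if stmt["obj"] == obj:
                    return stmt               # duplicate: nothing to do
                stmt["invalid_at"] = now      # contradiction: close the old belief
        new = {"subject": subject, "predicate": predicate, "obj": obj,
               "valid_from": now, "invalid_at": None}
        store.append(new)
        return new

    # upsert_statement("attic bathroom", "needs repair", "bath")
    # upsert_statement("attic bathroom", "needs repair", "nothing outstanding")
    # The first statement is now marked invalid at the second call's timestamp,
    # but the full history remains queryable.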
I love how we have come full circle. Anybody remembers the "semantic web" (RDF-based knowledge graph)? It didn't take off because building and maintaining such a graph requires extensive knowledge engineering work and tools. Fast forward a couple of decades and we have LLMs, which is basically auto-complete on steroids based on general knowledge, with the downside that it doesn't "remember" any facts unless you spoon-feed it with the right context. We're now back to: "let's encode context knowledge as a graph and plug it into LLMs". Fun times :)
The problem with the semantic web was deeper: people had to agree on the semantics that would be formalized as triples, and getting people to agree on an ongoing basis is not an easy task.
My question is, what's the value of explicitly storing semantics as triples when the LLM can infer the semantics at runtime?
Not much tbh. I'm using markdown files as a memory bank[1] for my projects and it works well without the need to structure them in a schema/graph. But I guess one benefit of this particular memory graph implementation is its temporal aspect: searchable facts can evolve over time, i.e. what is true now and how it got here.
[1] https://docs.cline.bot/prompting/cline-memory-bank
I guess the "semantic web" folks were right about the destination, just a few years early :P
There are 3 major differences between Zep and CORE
1. Market: Zep is B2B focused; CORE targets individual users
2. Portability: Zep is locked to their platform; CORE works across Claude, Cursor, Windsurf
3. Architecture: Zep uses a temporal graph; CORE uses a reified + temporal graph
What this means:
Zep remembers what happened when
CORE remembers what happened when + why we should believe it + how facts relate
Example:
You say "I love Thai food" → Later: "Actually, I hate Thai food"
Zep: "You hate Thai food" (old preference vanishes)
CORE: "You currently hate Thai food. This contradicts your earlier statement from [date/source]. The change came from your correction today."
Bottom line: CORE provides full explainability and audit trails that Zep cannot.
Graphiti is free and open source. Its MCP server works with any MCP client, from Cursor to Claude, too...
Graphiti MCP has tens of thousands of users. They deploy it to their desktops, servers, you name it. And for many different use cases: B2B, B2C, and personal use.
More here: https://github.com/getzep/graphiti
Source: me, one of the authors of Graphiti :-)