implement deep search graph #260
I'll help you with this as we discussed yesterday. Let's go 🤝
@VinciGit00 the main question is whether each cascading deep search graph should have its own specific config, or whether the parent's config is propagated down.
@VinciGit00 can you please add some details about what you are planning to do on top of what I have contributed? Based on my experience implementing and testing the deep search, these are the areas I feel need the most attention:
@mayurdb coping with both your points might require having two cascading conditional nodes in each block: one deciding whether the information collected so far is enough to answer the initial query; if the answer is negative and the most relevant links have been fetched, the other conditional node should weigh them and decide whether any additional link might aid in answering the query with undiscovered information, or simply return with incomplete information.
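A minimal sketch of those two cascading checks, assuming plain functions in place of the real conditional nodes (all names, and the dict-based state shape, are hypothetical, not ScrapeGraphAI's actual API):

```python
# Hypothetical sketch: two cascading conditional checks. These are
# stand-ins for conditional nodes, not actual ScrapeGraphAI classes.

def enough_information(state: dict) -> bool:
    """First conditional: is the information collected so far enough
    to answer the initial query? (Stubbed; a real node would likely
    ask an LLM to judge this.)"""
    return bool(state.get("answer_complete"))

def worthwhile_links(state: dict) -> list[str]:
    """Second conditional: weigh the freshly fetched links and keep
    only those that might add undiscovered information."""
    visited = set(state.get("visited", []))
    return [link for link in state.get("candidate_links", [])
            if link not in visited]

def deep_search_step(state: dict) -> dict:
    """One block of the cascade: answer, give up, or keep searching."""
    if enough_information(state):
        return {**state, "status": "answered"}
    links = worthwhile_links(state)
    if not links:
        # No promising links left: return with incomplete information.
        return {**state, "status": "incomplete"}
    return {**state, "status": "continue", "next_links": links}
```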
I had a talk with @VinciGit00 a few days ago. Some of the nodes are likely to receive an overhaul to their inner workings, but without any breaking changes to their outer interface. He didn't like the idea of having to add signals right into the base node, at the core of the node system; that's why he rolled back one of @mayurdb's commits. We then tried to figure out whether there could be a way to reimplement deep scraping using the original nodes, without signals or runtime loops. My initial idea was to use recursion, but the graph engine doesn't seem to handle it well: it would cause stack overflows even with stop conditions in place. I then considered readapting Mayur's original idea of iterating on a [...]. The last thing we need to figure out before implementing this is what conditions to put in place for early return. Ideally, we could implement two modes for the graph:
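The loop-based idea above could be sketched roughly as follows, assuming the two modes are the "cheap" and "accurate" modes mentioned later in this thread (a cheap mode that returns early once the answer looks sufficient, and an accurate mode that spends the full round budget). All names here are illustrative, not the library's API:

```python
# Hypothetical sketch of the loop-based (non-recursive) deep scrape:
# a plain bounded loop over rounds, so the graph engine never grows
# the call stack, plus two early-return modes.

from enum import Enum

class Mode(Enum):
    CHEAP = "cheap"        # stop as soon as the answer looks sufficient
    ACCURATE = "accurate"  # keep scraping until the round budget is spent

def deep_scrape(query, fetch_round, is_sufficient, *,
                mode: Mode = Mode.CHEAP, max_rounds: int = 3):
    """`fetch_round` runs one layer of scrapers over the given links
    and returns (partial_answer, new_links); `is_sufficient` judges
    whether the answers collected so far cover the query."""
    answers, links = [], [query]
    for _ in range(max_rounds):     # hard stop condition: round budget
        if not links:
            break                   # nothing left to explore
        answer, links = fetch_round(links)
        answers.append(answer)
        if mode is Mode.CHEAP and is_sufficient(answers):
            break                   # early return: cheap mode settles
    return answers
```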
This is not just about deep scraping: if this configuration works, however inelegant it may be in its current form, it would further prove that @PeriniM's modular system is expressive enough for complex scraping operations. As for the actual implementation, Vinci asked me to have another talk before writing any code, which will take another couple of days. I'll be glad to have Mayur as a co-author and/or reviewer. Closing note: the right-side diagram is wrong. It's not a chain of deep scrapers, but of graph iterators running deep scrapers. Between each layer of graph iterators, there are merge nodes for both links and answers, and a conditional node to provide an early return. I feel like there's something off with the left diagram too, but I'd need to look at the original drawings we made on paper.
@f-aguzzi, I exchanged some ideas with @VinciGit00 as well; we should collect all thoughts and devise the most sensible solution without requiring a major refactoring effort on the current modular graph execution engine. You could get the best of both the cheap and the accurate mode if you decouple responsibilities. A very rough pseudo-implementation might look like the following, but we have to simplify it to match our nodes:

```mermaid
stateDiagram-v2
    note left of deep_search_graph
        - base_endpoint
        - max_rounds
        - max_subgraphs
        - user_query
    end note
    state deep_search_graph {
        [*] --> plan_node
        note right of plan_node
            - available_information
            - available_endpoints
            - visited_endpoints
            - rounds
            - user_query
            - early_exit := rounds EQ max_rounds
            - missing_information_query
        end note
        state rounds_expired <<choice>>
        state enough_information <<choice>>
        plan_node --> conditional_node_rounds
        conditional_node_rounds --> rounds_expired
        rounds_expired --> conditional_node : if early_exit false
        rounds_expired --> merge_answers_node_final : if early_exit true
        conditional_node --> enough_information
        enough_information --> parallel_search_graph : not enough information
        enough_information --> merge_answers_node_final : enough information
        state parallel_search_graph {
            state fork_state <<fork>>
            state join_state <<join>>
            [*] --> rerank_link_node
            note right of rerank_link_node
                - missing_information_query
                - endpoints
            end note
            rerank_link_node --> graph_iterator_node
            note left of graph_iterator_node
                - k EQ max_subgraphs
                - top_k_endpoints
            end note
            graph_iterator_node --> fork_state
            fork_state --> search_graph_1
            fork_state --> search_graph_2
            fork_state --> search_graph_k
            state search_graph_1 {
                [*] --> fetch_node
                fetch_node --> parse_node
                parse_node --> RAG_node
                parse_node --> search_link_node
                RAG_node --> generate_answers_node
                search_link_node --> [*]
                generate_answers_node --> [*]
            }
            search_graph_1 --> join_state
            search_graph_2 --> join_state
            search_graph_k --> join_state
            join_state --> merge_answers_node
            note right of merge_answers_node
                - information
                - endpoints
                FIXME: how'd we handle concat of link candidates?
            end note
            merge_answers_node --> [*]
        }
        parallel_search_graph --> plan_node
        merge_answers_node_final --> [*]
    }
```
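A rough Python transcription of that state diagram, with plain callables standing in for the real nodes and the fork/join fan-out run serially for simplicity; every function name and the state shape are assumptions for illustration, not the library's API:

```python
# Sketch of the deep_search_graph loop from the state diagram above.
# Callables stand in for nodes; comments name the diagram states.

def deep_search_graph(user_query, plan_node, rerank_link_node,
                      run_search_graph, merge_answers_node, *,
                      max_rounds=3, max_subgraphs=4):
    state = {"user_query": user_query, "information": [], "endpoints": []}
    for _ in range(max_rounds):                   # conditional_node_rounds
        plan = plan_node(state)                   # plan_node
        if plan["enough_information"]:            # conditional_node
            break
        # parallel_search_graph: rerank, then fan out to k sub-graphs
        top_k = rerank_link_node(plan["missing_information_query"],
                                 plan["available_endpoints"])[:max_subgraphs]
        results = [run_search_graph(ep) for ep in top_k]  # graph_iterator_node
        merged = merge_answers_node(results)      # merge_answers_node
        state["information"] += merged["information"]
        state["endpoints"] = merged["endpoints"]  # FIXME from the diagram:
                                                  # concat policy for candidates
    return state["information"]                   # merge_answers_node_final
```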
Is your feature request related to a problem? Please describe.
Implement the deep search graph in a good way.
Describe the solution you'd like