# Week 4
## Python
### implementation
* Refactor code.
- Fix bug with separators. ✔
- Add an option to the CLI to select which types of inequalities are included. ✔
* Implement different types of constraints.
- _Solving the Maximum-Weight Connected Subgraph Problem to Optimality_ ❌ (moved to backlog)
- _Solving the Connected Dominating Set Problem and Power Dominating Set Problem by Integer Programming_ ✔
- Implemented only the one type of constraint that performed best in this paper; it performs badly on our graphs.
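A minimal sketch of what the CLI option could look like, using the standard `argparse` module. The flag name `--inequalities` and its choice values are assumptions for illustration, not the actual interface:

```python
import argparse

def build_parser():
    # Hypothetical CLI: lets the user pick which family of
    # connectivity inequalities the solver should include.
    parser = argparse.ArgumentParser(description="MCDS ILP solver")
    parser.add_argument(
        "--inequalities",
        choices=["separator", "cut", "both"],
        default="separator",
        help="which type of connectivity inequalities to include",
    )
    return parser

args = build_parser().parse_args(["--inequalities", "cut"])
print(args.inequalities)  # → cut
```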
### runtime
* Run tests according to the test concept and store the data in a structured format (CSV/XML/JSON) that can be imported into Python or R. ✔
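A sketch of the CSV variant using only the standard library; the column names and result rows are illustrative, not the real measurements:

```python
import csv

# Illustrative result rows; in practice these come from the test runs.
results = [
    {"graph": "bigger-leaf", "nodes": 70, "edges": 120, "k": 2, "runtime_s": 13.4},
    {"graph": "random-01", "nodes": 200, "edges": 410, "k": 1, "runtime_s": 87.1},
]

with open("runtimes.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["graph", "nodes", "edges", "k", "runtime_s"])
    writer.writeheader()
    writer.writerows(results)
```

Such a file loads directly via `pandas.read_csv("runtimes.csv")` in Python or `read.csv("runtimes.csv")` in R.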
## thesis
* Begin the paragraph about results and add preliminary results and observations. ❌ (moved to week 5)
- Create a concept for how to test and which test results to include. (✔) (a preliminary concept)
## literature
* Read _Imposing Connectivity Constraints in Forest Planning Models_.
- Check for different constraints which could strengthen the formulation. ❌ (moved to week 5, but with a focus on symmetry breaking)
* Read _Solving the Connected Dominating Set Problem and Power Dominating Set Problem by Integer Programming_ ✔
- Different constraints to induce connectivity are compared. Try to implement some of them to compare them and see which one performs best. Check whether this is related to a characteristic of your graphs, as the literature suggests that for some types of graphs some approaches are better.
* Check _On imposing connectivity constraints in integer programs_ again to see if there is other literature that you are missing. ❌ (moved to week 5)
# Week 5
## Python
### implementation
* Try to implement a symmetry breaker and check whether this especially affects the runtime needed to close the gap and exclude all unconnected solutions with at most as many nodes as an optimal connected one.
- Maybe implement My Ky's, even if it is not optimal/100% correct, just to observe whether there is a connection between symmetrical unconnected solutions and the runtime, and how strong this connection is. (Possible indicators: runtime, number of lazily added constraints.)
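Independent of My Ky's approach (whose details I don't have at hand), one simple symmetry breaker would order *twin* nodes, i.e. nodes with identical closed neighborhoods, since they are interchangeable in any dominating set. This is only a sketch of that idea, not necessarily the variant meant above:

```python
from itertools import combinations

def twin_ordering_pairs(adj):
    """Find pairs (u, v) of true twins (identical closed neighborhoods).

    Twins are interchangeable in any (connected) dominating set, so
    adding x_u >= x_v for each returned pair to the ILP excludes the
    symmetric solution that uses v in place of u.
    """
    closed = {v: frozenset(adj[v]) | {v} for v in adj}
    return [(u, v) for u, v in combinations(sorted(adj), 2)
            if closed[u] == closed[v]]

# Example: nodes 1 and 2 are twins (both adjacent to 0, 3 and each other).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
print(twin_ordering_pairs(adj))  # → [(1, 2)]
```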
### runtime
* Create random graphs and try to compare runtime between ILP and ASP.
- Measure some characteristics of the graphs such as density, |V|, |E|, maximum degree, minimum degree, average degree (maybe also standard deviation of the degree or median degree?).
- Measure and note the gap between |D| of a minimal connected and a minimal unconnected solution. For each |D|, measure the number of unconnected solutions that were found (ILP only) and how many constraints were added in total.
- Measure the time needed to find the first (nearly) optimal solution (a strong upper bound) and the time needed to close the gap.
* Run some more detailed tests and check the following connections:
- To what degree does the number of unconnected k-hop solutions that have at most as many nodes as an optimal connected solution affect the runtime? (Calculate how many of them exist and add this data as a column to the table of test results.)
- Is there a clear correlation between the number of constraints that were added for (possibly unconnected) solutions and the runtime?
- For graphs where the gap between an optimal connected solution and the smallest unconnected solution is rather big, check whether this approach (our implementation) becomes inapplicable, i.e. too much time is wasted excluding all possible unconnected solutions and raising the lower bound.
* Implement a heuristic to start with a stronger lower bound. ✔
- Given a symmetrical leaf whose root is centered (at least on the x-axis), at least $width - 2k + height - k - \frac{width}{2} - k$ nodes are needed to provide a connected dominating set.
- This accounts for reaching both rims along the width and the rim at the top.
- Unfortunately, this did not improve the runtime, at least for the bigger-leaf example with k=2; it drastically increased it.
- So for now I would not follow this idea and would instead try to find another technique to generate a better lower bound.
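The graph characteristics from the measurement plan above can be computed without extra dependencies; a sketch over an adjacency-list representation (`{node: [neighbours]}` is an assumed format):

```python
from statistics import mean, median, stdev

def graph_stats(adj):
    """Degree and density statistics for an undirected graph
    given as {node: [neighbours]}."""
    n = len(adj)
    degrees = [len(neigh) for neigh in adj.values()]
    m = sum(degrees) // 2  # each edge is counted from both endpoints
    return {
        "|V|": n,
        "|E|": m,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "max_degree": max(degrees),
        "min_degree": min(degrees),
        "avg_degree": mean(degrees),
        "median_degree": median(degrees),
        "stdev_degree": stdev(degrees) if n > 1 else 0.0,
    }

# Small example: a path on 4 nodes.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(graph_stats(adj))
```

Each row of such a dictionary can go straight into the CSV of test results alongside the runtimes.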
## thesis
* Begin the paragraph about results and add preliminary results and observations.
- Refine the concept of how to test and which test results to include.
## literature
* Read _Imposing Connectivity Constraints in Forest Planning Models_. ✔
- Check for different constraints which could strengthen the formulation. (with a focus on symmetry breaking)
- They did not mention a symmetry breaker *but* they mentioned some other types of inequalities that could strengthen the formulation with respect to the connectivity specification.
- Interestingly, for their specific problems (which were not really close to MCDS and had only connectivity in common) they achieved a much stronger LP bound.
- They added cuts before an ILP solution was found and also added cuts for LP solutions.
* Read _An Efficient Branch and Cut Algorithm to Find Frequently Mutated Subnetworks in Cancer_ again with focus on symmetry breaking.
* Read through _An Integer Programming Approach for Fault-Tolerant Connected Dominating Sets_ again and check for symmetry breaking or other constraints to tighten the space of solutions. ✔
- I could not find anything about symmetry breaking or additional inequalities for the case k=d=1 (which is standard MCDS). But the table of results was interesting because they also tested their implementation for the case k=d=1, which is then equal to our ILP formulation. Their results were not bad, but unfortunately I could not find a more detailed description of their test graphs. Only the number of nodes and the density are shown, and those two properties are not sufficient, as my own tests on random graphs revealed.
- An interesting observation from their result tables: their largest test graph (with the lowest density), which has 200 nodes, has an optimal solution consisting of "only" 26 nodes, whereas an optimal solution for our bigger-leaf test graph in the case k=1 consists of 24 nodes while the graph has only 70 nodes. Both graphs have approximately the same density. So their graph must have some nodes with a higher value/degree, such that adding those nodes generates a higher profit than adding others, and there is no equal alternative to adding these nodes. This reduces the number of iterations and (as one can see in the table of results) the number of lazily added constraints. As a consequence I assume that this is an important factor in reducing the runtime.
# Backlog
## Python
### Implementation
* Remove lp_to_graph from package and instead use some other format to store graphs and use them as input. ✔
* Implement different types of constraints.
- _Solving the Maximum-Weight Connected Subgraph Problem to Optimality_
* Carefully check the literature, i.e. _An Integer Programming Approach for Fault-Tolerant Connected Dominating Sets_, _An Efficient Branch and Cut Algorithm to Find Frequently Mutated Subnetworks in Cancer_, and _Thinning out Steiner trees: a node-based model for uniform edge costs_, to see if they did anything differently with respect to the separating constraints.
* Check if something from the algorithmic framework of _Thinning out Steiner trees: a node-based model for uniform edge costs_ can be applied to our case. (Local Branching, Heuristic, etc.)
* Implement a heuristic to provide a better upper bound.
- The intermediate root constraint could be useful to do so.
* Maybe *change the cost function for the nodes*.
- At the moment every node has a cost of 1.
- It could work to give each node a cost equal to the length of the shortest path between the node and the root.
- With this technique we would encourage the solver to prefer nodes close to the root, avoiding that far-distant nodes are added and that connected subcomponents far away from the root are generated.
- At least this could lead to a heuristic.
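A sketch of the distance-based cost function idea using plain BFS (all edges have unit length, so BFS suffices); the adjacency dictionary and root node are illustrative assumptions:

```python
from collections import deque

def distance_costs(adj, root):
    """Assign each node a cost equal to its BFS distance from the root,
    biasing the solver towards nodes close to the root."""
    cost = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in cost:
                cost[v] = cost[u] + 1
                queue.append(v)
    return cost

adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(distance_costs(adj, 0))  # → {0: 0, 1: 1, 2: 1, 3: 2}
```

Note that the root gets cost 0 here; one might use `1 + distance` instead so every node still contributes at least the current unit cost.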
### packaging
* Add unit tests to the package.
* Add dependencies to the recipe.
* Find a place to store the code if the package will be published (maybe a standard GitHub repo).