15-418 Project Website

Parallel Graph Coloring

Team Members: Zhanming (Jerry) Liang, Harry Hu

Summary

We plan to implement and evaluate parallel graph coloring algorithms on multicore CPU systems using OpenMP. Our project will focus on speculative parallel coloring and conflict resolution, and we will study how graph structure, synchronization overhead, and load imbalance affect scalability and coloring quality. As a stretch goal, we may explore an additional implementation variant or a lightweight GPU comparison if time permits.

Background

Graph coloring is the problem of assigning a color to each vertex in a graph such that no two adjacent vertices share the same color. While the optimization version that minimizes the total number of colors is NP-hard, many practical systems rely on heuristic coloring algorithms that trade optimality for speed. Graph coloring appears in applications such as register allocation in compilers, scheduling, sparse matrix computations, and resource assignment problems, where conflicting tasks or objects cannot share the same resource.

Our project will focus on parallelizing greedy-style graph coloring algorithms. A standard sequential greedy algorithm processes vertices one at a time and assigns each vertex the smallest color not currently used by one of its already-colored neighbors. This is simple and often effective, but it is inherently sequential and becomes slow on large graphs. Parallel graph coloring algorithms instead allow multiple vertices to be colored concurrently, often using speculative execution where vertices are tentatively colored in parallel and conflicts are detected afterward, and conflicting vertices are recolored in later rounds.
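The sequential baseline described above can be sketched over a CSR-style graph as follows. This is an illustrative sketch, not our final interface: the `rowPtr`/`colIdx` names and layout are assumptions about the representation we expect to use.

```cpp
#include <vector>

// Sequential greedy coloring over a CSR graph: each vertex gets the
// smallest color not used by an already-colored neighbor.
// rowPtr[v]..rowPtr[v+1] delimits v's slice of colIdx (assumed layout).
std::vector<int> greedyColor(const std::vector<int>& rowPtr,
                             const std::vector<int>& colIdx) {
    int n = (int)rowPtr.size() - 1;
    std::vector<int> color(n, -1);
    std::vector<char> forbidden(n, 0);   // scratch: colors seen at neighbors
    for (int v = 0; v < n; ++v) {
        for (int e = rowPtr[v]; e < rowPtr[v + 1]; ++e) {
            int c = color[colIdx[e]];
            if (c >= 0) forbidden[c] = 1;
        }
        int c = 0;
        while (forbidden[c]) ++c;        // smallest free color
        color[v] = c;
        for (int e = rowPtr[v]; e < rowPtr[v + 1]; ++e) {  // reset scratch
            int cc = color[colIdx[e]];
            if (cc >= 0) forbidden[cc] = 0;
        }
    }
    return color;
}
```

Reusing one `forbidden` scratch array (and clearing only the entries just touched) keeps each vertex's work proportional to its degree, which matters on skewed-degree graphs.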

This workload can benefit from parallelism because large graphs contain many vertices whose coloring work can be attempted simultaneously, especially when those vertices are far apart in the graph or when conflicts are rare. At the same time, graph coloring is not embarrassingly parallel. Dependencies between adjacent vertices mean that correctness must be maintained through conflict detection, priorities, or recoloring. This makes graph coloring a good fit for the course, since it sits between trivially parallel and fully sequential computation and raises meaningful questions about synchronization, load balance, and work efficiency.

At a high level, our implementation plan is:

  1. Build a sequential greedy coloring baseline.
  2. Implement one or more parallel coloring strategies in OpenMP.
  3. Detect conflicts between adjacent same-colored vertices.
  4. Resolve conflicts and repeat until the coloring is valid.
  5. Measure speedup, conflict rate, recoloring work, and color count across different graph classes.
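Steps 2 through 4 above can be sketched as a speculative OpenMP loop. This is a sketch under stated assumptions, not our final implementation: names are placeholders, the "lower vertex index keeps its color" tiebreak is just one possible conflict-resolution policy, and phase 1 deliberately reads neighbor colors without synchronization, since the conflicts that speculation allows are exactly what phase 2 detects and repairs.

```cpp
#include <omp.h>
#include <vector>

// Speculative parallel coloring sketch: tentatively color the worklist in
// parallel, detect adjacent same-colored pairs, recolor the losers, repeat.
std::vector<int> speculativeColor(const std::vector<int>& rowPtr,
                                  const std::vector<int>& colIdx) {
    int n = (int)rowPtr.size() - 1;
    std::vector<int> color(n, -1);
    std::vector<int> work(n);
    for (int i = 0; i < n; ++i) work[i] = i;

    while (!work.empty()) {
        // Phase 1: tentative coloring; neighbor colors are read without locks.
        #pragma omp parallel for schedule(dynamic, 64)
        for (long i = 0; i < (long)work.size(); ++i) {
            int v = work[i];
            int deg = rowPtr[v + 1] - rowPtr[v];
            std::vector<char> used(deg + 2, 0);      // smallest free color <= deg
            for (int e = rowPtr[v]; e < rowPtr[v + 1]; ++e) {
                int c = color[colIdx[e]];
                if (c >= 0 && c < (int)used.size()) used[c] = 1;
            }
            int c = 0;
            while (used[c]) ++c;
            color[v] = c;
        }
        // Phase 2: conflict detection; the lower-numbered endpoint keeps
        // its color, the higher-numbered one is queued for recoloring.
        std::vector<int> redo;
        #pragma omp parallel
        {
            std::vector<int> local;
            #pragma omp for nowait
            for (long i = 0; i < (long)work.size(); ++i) {
                int v = work[i];
                for (int e = rowPtr[v]; e < rowPtr[v + 1]; ++e) {
                    int u = colIdx[e];
                    if (u < v && color[u] == color[v]) { local.push_back(v); break; }
                }
            }
            #pragma omp critical
            redo.insert(redo.end(), local.begin(), local.end());
        }
        work.swap(redo);
    }
    return color;
}
```

The loop terminates because the lowest-numbered vertex in each round's worklist can no longer lose a conflict, so the worklist shrinks every round; how fast it shrinks in practice is one of the things we plan to measure.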

The Challenge

The main challenge in parallel graph coloring is that the color chosen for a vertex depends on the colors of its neighbors, so adjacent vertices cannot always be colored independently. If two neighboring vertices are processed in parallel, they may both choose the same color, creating a conflict that must be fixed later. This creates a fundamental tradeoff: more optimistic parallelism increases concurrency, but it can also increase the amount of wasted work from recoloring.

The workload is irregular in several ways. Graphs often have highly uneven degree distributions, so some vertices require much more work than others. This can lead to load imbalance across threads, especially on real-world graphs such as social or web networks. The memory access pattern is also irregular since coloring a vertex requires traversing its adjacency list, which may point to data scattered throughout memory. As a result, locality can be poor, and performance may be limited more by memory access and cache behavior than by raw arithmetic throughput. Since conflict detection often requires scanning neighbors or edges after a coloring round, there is also repeated graph traversal overhead.

Another challenge is synchronization and correctness. We need to decide how threads coordinate when they tentatively color vertices and how conflicts are resolved. Too much synchronization can destroy performance, while too little can create a large number of recoloring rounds. We also want to understand the quality-performance tradeoff: more aggressive parallelization may reduce runtime but increase the total number of colors used or the amount of redundant work.

The main questions we hope to learn from this project are:

  • How well does speculative graph coloring scale on multicore CPUs?
  • How much does graph structure affect conflict rate and speedup?
  • What are the main bottlenecks: load imbalance, memory locality, synchronization, or recoloring overhead?
  • How large is the tradeoff between runtime and coloring quality?

Resources

We plan to implement our code in C++ from scratch and use OpenMP for multicore parallelism. We will develop primarily on the GHC machines and, if available, use PSC machines for larger-scale evaluation with more cores. Our graph representation will likely use adjacency lists in CSR or a similar compact format to support efficient traversal.
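As a concrete example of the compact format we have in mind, a minimal CSR builder for an undirected edge list might look like the following; the struct and field names are placeholders, and our final layout may differ.

```cpp
#include <utility>
#include <vector>

// Minimal CSR representation: rowPtr[v]..rowPtr[v+1] delimits vertex v's
// slice of colIdx, which stores the concatenated adjacency lists.
struct CSRGraph {
    std::vector<int> rowPtr;
    std::vector<int> colIdx;
};

// Build CSR from an undirected edge list; each edge is stored in both
// directions so every vertex sees its full neighborhood.
CSRGraph buildCSR(int n, const std::vector<std::pair<int, int>>& edges) {
    CSRGraph g;
    g.rowPtr.assign(n + 1, 0);
    for (auto [u, v] : edges) { ++g.rowPtr[u + 1]; ++g.rowPtr[v + 1]; }  // degrees
    for (int v = 0; v < n; ++v) g.rowPtr[v + 1] += g.rowPtr[v];         // prefix sum
    g.colIdx.resize(g.rowPtr[n]);
    std::vector<int> cursor(g.rowPtr.begin(), g.rowPtr.end() - 1);
    for (auto [u, v] : edges) {
        g.colIdx[cursor[u]++] = v;
        g.colIdx[cursor[v]++] = u;
    }
    return g;
}
```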

We will use real-world and synthetic graph datasets for evaluation. Candidate graph types include social networks, web graphs, road networks, and randomly generated graphs, since these graph families have very different degree distributions and structural properties. This should allow us to study how algorithm behavior changes across different workloads.

We expect to start from scratch rather than using an existing codebase, since implementing the baseline and parallel variants ourselves will make it easier to reason about correctness and measure the effects of different design choices. For references, we plan to use prior work on scalable parallel graph coloring, especially:

Gebremedhin, A. H., and Manne, F. “Scalable Parallel Graph Coloring Algorithms.” Concurrency: Practice and Experience, 12(12), 2000.

We may also consult more recent graph coloring and graph processing papers depending on which specific parallel strategy we settle on.

Goals and Deliverables

Plan to Achieve

Our main goal is to implement and evaluate a working parallel graph coloring system on multicore CPUs. At minimum, we plan to:

  • Implement a correct sequential greedy graph coloring baseline in C++.
  • Implement at least one OpenMP-based parallel graph coloring algorithm using speculative coloring and conflict resolution.
  • Evaluate the implementation on multiple graph families and measure runtime, speedup, conflict count, recoloring work, and number of colors used.
  • Analyze how graph structure affects scalability and identify the main sources of performance loss.

We expect a successful project to show meaningful speedup over the sequential baseline on sufficiently large graphs, while also demonstrating a thoughtful analysis of when parallel graph coloring works well and when it struggles. We do not want to promise a specific speedup number too early, since the outcome depends heavily on graph structure and implementation choices, but we expect the parallel version to outperform the sequential baseline on large enough inputs.

Hope to Achieve

If the project goes well, we would like to:

  • Implement and compare more than one parallel coloring strategy, such as different conflict resolution policies or ordering heuristics.
  • Explore whether graph preprocessing or vertex ordering can reduce conflict rates and improve scalability.
  • Compare the behavior of the algorithm on very different graph classes and characterize which classes are more parallel-friendly.
  • Possibly explore a lightweight GPU version or additional platform comparison as a stretch goal, if time permits.

If implementation takes longer than expected, we will narrow the project to a strong sequential baseline, one carefully implemented OpenMP parallel version, and a solid evaluation on several graph types with speedup and bottleneck analysis. That would still be a successful project because the main value lies not only in building multiple implementations, but in understanding the performance tradeoffs of a nontrivial parallel graph workload.

Deliverables and Demo

Our final deliverables will include:

  • A sequential greedy graph coloring implementation.
  • One or more parallel OpenMP graph coloring implementations.
  • Experimental evaluation across several graph families and thread counts.
  • A final report analyzing speedup, work efficiency, conflict behavior, and coloring quality.
  • A project website and poster summarizing our implementation and findings.

We may also include a live demo at the poster session showing the coloring process or benchmark results on representative graphs, though our main focus will be on experimental evaluation rather than a demo.

Platform Choice

We chose a shared-memory multicore CPU platform with OpenMP because it is a natural fit for the graph coloring workload and aligns well with techniques covered in class. Graph coloring requires frequent access to neighboring vertices’ colors, so a shared-memory model avoids the complexity and duplication that would come with distributed-memory implementations. OpenMP also provides a practical framework for parallelizing irregular workloads while letting us experiment with scheduling strategies and synchronization choices.

C++ is a good choice because it gives us control over graph representation, memory layout, and performance-critical implementation details. Since graph coloring performance is likely to be influenced by cache behavior, contention, and adjacency traversal overhead, using a low-level language makes it easier to optimize and analyze these effects. Overall, a multicore CPU implementation is the right balance of ambition and feasibility for this project: the problem is irregular enough to be interesting, but still tractable with the tools and systems we have used in class.

Schedule

We will complete all implementation, experiments, analysis, report writing, and presentation materials before the April 24 presentation.

Note: schedule for April 14 onward was updated following the milestone report to reflect revised plans and owner assignments.

Week 1 (March 24 – March 30): Finalize project scope and choose the specific parallel graph coloring approach to implement; set up repository, build system, and testing framework; implement and validate a sequential greedy graph coloring baseline; gather initial graph datasets and build input parsing utilities.
Week 2 (March 31 – April 6): Implement the first OpenMP-based parallel graph coloring version; add correctness checks for valid coloring and color count; run initial benchmarks on small and medium graphs; identify early bottlenecks in synchronization, conflict handling, and load balance.
Week 3 (April 7 – April 13): Improve conflict detection and conflict resolution logic; add instrumentation for conflict counts, recoloring rounds, runtime breakdowns, and color count; benchmark across different graph families and thread counts; start analyzing how graph structure affects performance.
April 15 – April 16: Implement hybrid speculative + Jones-Plassmann algorithm (Jerry). Optimize existing speculative implementation based on bottleneck analysis (Harry).
April 17 – April 18: Run large-scale experiments across all graphs and thread counts, finalize plots (Harry). Begin drafting the final report (Jerry).
April 19 – April 21: Keep working on the report (Both). Implement GPU version if time allows (Harry). Optimize GPU version if time allows (Jerry).
April 22: Finalize report (Both). Begin working on poster (Both).
April 23: Finish poster (Both). Finalize website (Harry). Submit report (Jerry).