DSPy and outlines preparation
DSPy and outlines preparation on macOS 10.13.6

DSPy, a framework for algorithmically optimizing LM prompts and weights, and outlines, an open-source decoding framework for structured generation similar to OpenAI's Structured Outputs API, both rely on Apache Arrow, which provides fast data access without serialization overhead. However, on some legacy systems, such as my macOS 10.13.6 machine kept around for eGPU support, there is no pre-built Arrow library, which makes setting up DSPy and outlines infeasible out of the box. This tutorial describes step by step how to build the Apache Arrow library on macOS 10.13.6 and how to verify outlines and DSPy after the setup.
Section 1: Apache Arrow setup
References:
1. Preparation of required libraries: don't directly use brew update && brew bundle --file=arrow/cpp/Brewfile, because brew on macOS 10.13.6 is now outdated and many libraries, such as llvm 14 and abseil-cpp, can no longer be upgraded to their latest versions.
Conda installation for the required libraries:
conda install utf8proc lz4-c libthrift
Apache ORC installation:
git clone https://github.com/apache/orc
cd orc
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=DEBUG
make package
make test-out
tar -xvf ORC-2.0.2-Darwin.tar.gz
cp -rf ORC-2.0.2-Darwin /usr/local/Cellar/ORC-2.0.2
Apache Arrow 17.0 installation:
mkdir apache-arrow
cd apache-arrow
git clone https://github.com/llv22/arrow_forward arrow
mkdir dist
- Set environment variables to let Arrow’s build system know about our build toolchain
export ARROW_HOME=$(pwd)/dist
export LD_LIBRARY_PATH=$(pwd)/dist/lib:$LD_LIBRARY_PATH
export CMAKE_PREFIX_PATH=$ARROW_HOME:$CMAKE_PREFIX_PATH
- Build the Arrow C++ library, cleaning the build folder and specifying ORC_ROOT
rm -rf ./arrow/cpp/build
cmake -S arrow/cpp -B arrow/cpp/build -DCMAKE_INSTALL_PREFIX=$ARROW_HOME -DORC_ROOT=/usr/local/Cellar/ORC-2.0.2 --preset ninja-release-python
cmake --build arrow/cpp/build --target install
- Build the Arrow Python wheel (a verification sketch follows below)
pushd arrow/python
export PYARROW_PARALLEL=4
export ARROW_BUILD_TYPE=debug
python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --bundle-arrow-cpp bdist_wheel
popd
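- Optional sanity check for the wheel. Assuming the wheel produced under arrow/python/dist/ has been installed with pip (the exact filename depends on the Python version and platform tag), the short Python snippet below imports pyarrow, loads the ORC extension built against /usr/local/Cellar/ORC-2.0.2, and materializes a tiny table:
import pyarrow as pa
import pyarrow.orc  # only importable if the ORC support above was picked up

print(pa.__version__)           # expect a 17.0.x version string
print(pa.table({"a": [1, 2]}))  # tiny in-memory table to exercise the C++ core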
- Installation of outlines and dspy (a quick import check follows below)
git clone https://github.com/outlines-dev/outlines.git
cd outlines
python -m build # wheel is located in dist/
cd ..
git clone https://github.com/stanfordnlp/dspy.git
cd dspy
python setup.py clean bdist_wheel
python -m build # wheel is located in dist/
cd ..
pip install dspy-ai[chromadb] # or [groq] or [marqo] or [milvus] or [mongodb] or [myscale] or [pinecone] or [qdrant] or [snowflake] or [weaviate]
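- Optional sanity check for the two packages. Assuming the wheels built above have been installed with pip, the snippet below confirms that outlines and dspy import cleanly on top of the locally built pyarrow (distribution names assumed to be pyarrow, outlines and dspy-ai):
from importlib.metadata import version

import pyarrow
import outlines
import dspy

print("pyarrow :", version("pyarrow"))
print("outlines:", version("outlines"))
print("dspy-ai :", version("dspy-ai"))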
Section 2: Outlines
Reference:
- https://outlines-dev.github.io/outlines/reference/models/openai/
- JSON is not supported by (Azure) OpenAI API
Outlines supports JSON and structured-data output only with open-source transformers models; it does not provide JSON output through the (Azure) OpenAI API.
from enum import Enum

from pydantic import BaseModel, constr

import outlines
import torch


class Weapon(str, Enum):
    sword = "sword"
    axe = "axe"
    mace = "mace"
    spear = "spear"
    bow = "bow"
    crossbow = "crossbow"


class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"


class Character(BaseModel):
    name: constr(max_length=10)
    age: int
    armor: Armor
    weapon: Weapon
    strength: int


model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Construct structured sequence generator
generator = outlines.generate.json(model, Character)

# Draw a sample
seed = 789001

character = generator("Give me a character description", seed=seed)
print(repr(character))
# Character(name='Anderson', age=28, armor=<Armor.chainmail: 'chainmail'>, weapon=<Weapon.sword: 'sword'>, strength=8)

character = generator("Give me an interesting character description", seed=seed)
print(repr(character))
# Character(name='Vivian Thr', age=44, armor=<Armor.plate: 'plate'>, weapon=<Weapon.crossbow: 'crossbow'>, strength=125)
OpenAI constrained structured output
from enum import Enum
from typing import Union

from pydantic import BaseModel

import openai
from openai import OpenAI


class Table(str, Enum):
    orders = "orders"
    customers = "customers"
    products = "products"


class Column(str, Enum):
    id = "id"
    status = "status"
    expected_delivery_date = "expected_delivery_date"
    delivered_at = "delivered_at"
    shipped_at = "shipped_at"
    ordered_at = "ordered_at"
    canceled_at = "canceled_at"


class Operator(str, Enum):
    eq = "="
    gt = ">"
    lt = "<"
    le = "<="
    ge = ">="
    ne = "!="


class OrderBy(str, Enum):
    asc = "asc"
    desc = "desc"


class DynamicValue(BaseModel):
    column_name: str


class Condition(BaseModel):
    column: str
    operator: Operator
    value: Union[str, int, DynamicValue]


class Query(BaseModel):
    table_name: Table
    columns: list[Column]
    conditions: list[Condition]
    order_by: OrderBy


client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function.",
        },
        {
            "role": "user",
            "content": "look up all my orders in may of last year that were fulfilled but not delivered on time",
        },
    ],
    tools=[
        openai.pydantic_function_tool(Query),
    ],
)

print(completion.choices[0].message.tool_calls[0].function.parsed_arguments)
Section 3: DSPy
Reference:
import dspy


class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        answer = self.generate_answer(context=context, question=question)
        return answer


rag = RAG()  # zero-shot, uncompiled version of RAG
rag("what is the capital of France?").answer  # -> "Paris"