
Fantasticka Stvorka Takes on the Sustainability Hackathon

Team Fantasticka Stvorka being awarded the second-place certificate at AIM Hackathon 3

When Team Fantasticka Stvorka entered a sustainability-focused hackathon, we knew we were in for a challenge. The task? Build an AI agent capable of accurately answering complex sustainability questions while providing reliable sources. Our approach earned us second place!

The Challenge

The hackathon presented us with a unique problem: create an AI agent that could answer sustainability-related questions with both accuracy and proper source attribution. The training dataset contained questions in a structured format, each with specific answer types, units, difficulty levels, and source information.

Sample Training Data

Here's an example of what we were working with:

{
  "5": {
    "question": "What were the annual total greenhouse gas emissions including land use in tonnes for Austria in 2000?",
    "answer": 76664000.0,
    "answer_type": "float",
    "unit": "tonnes",
    "difficulty": "easy",
    "comment": "DB entry of 'total_ghg' for Austria with 'year' 2000.",
    "sources": [
      {
        "source_name": "owid_co2_data",
        "source_type": "database",
        "page_number": null
      }
    ]
  }
}
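For readers who want to poke at the data themselves, here is a minimal sketch of how records like this can be loaded in Python. The file layout matches the sample above, but the TrainingQuestion dataclass and the function name are our own illustration, not part of the official starter code.

import json
from dataclasses import dataclass

@dataclass
class TrainingQuestion:
    question: str
    answer: float | str
    answer_type: str
    unit: str | None
    difficulty: str | None   # present in the training data, absent in the final test set
    sources: list[dict]

def load_training_set(path: str) -> dict[str, TrainingQuestion]:
    """Parse the hackathon-style training JSON into typed records."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {
        qid: TrainingQuestion(
            question=entry["question"],
            answer=entry["answer"],
            answer_type=entry["answer_type"],
            unit=entry.get("unit"),
            difficulty=entry.get("difficulty"),
            sources=entry.get("sources", []),
        )
        for qid, entry in raw.items()
    }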

The real twist? The final test set wouldn't include helpful metadata like difficulty tags—our agent needed to work in the wild, so to speak.

Our Three-Pronged Approach

We designed our AI agent to pull information from three distinct sources, each requiring different technical approaches:

1. Building a RAG System for PDF Documents

The first challenge was extracting information from three provided PDF documents. We built a Retrieval-Augmented Generation (RAG) system from scratch that would:

  • Chunk the PDFs into manageable segments for processing
  • Encode chunks into embeddings using modern embedding models
  • Retrieve relevant passages based on semantic similarity to the question
  • Extract precise information from the most relevant chunks
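To make that concrete, here is a minimal sketch of such a pipeline, assuming pypdf for text extraction and sentence-transformers for embeddings. The model name, chunk sizes, and helper names are illustrative choices rather than our exact hackathon configuration; keeping the page number next to each chunk is one way to attack the citation problem described below.

import numpy as np
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk_pdf(path: str, chunk_size: int = 800, overlap: int = 100) -> list[dict]:
    """Split each page into overlapping character chunks, keeping the page number
    so answers can later be attributed to a source location."""
    chunks = []
    for page_no, page in enumerate(PdfReader(path).pages, start=1):
        text = page.extract_text() or ""
        for start in range(0, len(text), chunk_size - overlap):
            chunks.append({"text": text[start:start + chunk_size], "page": page_no})
    return chunks

def retrieve(question: str, chunks: list[dict], top_k: int = 3) -> list[dict]:
    """Return the chunks most semantically similar to the question."""
    embeddings = model.encode([c["text"] for c in chunks], normalize_embeddings=True)
    query = model.encode([question], normalize_embeddings=True)[0]
    scores = embeddings @ query  # cosine similarity on normalized vectors
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]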

To our surprise, this worked remarkably well! The system could accurately find and extract information from the documents. However, we encountered one persistent challenge: correctly citing the specific page number from which the information came. While the content was accurate, pinpointing the exact source location proved tricky.

2. Database Integration

The second data source was a structured database with multiple schemas containing sustainability metrics. We equipped our agent with the ability to:

  • Generate appropriate SQL SELECT queries based on natural language questions
  • Query the correct schemas for specific types of data
  • Parse and format database results into human-readable answers
  • Properly attribute data to the database source

This approach proved highly effective for questions requiring precise numerical data with temporal context, such as emission statistics for specific countries and years.
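As a rough illustration of this step, the sketch below assumes a SQLite copy of the data and a generic llm callable wrapping whatever model you prefer. The abridged schema is taken from the owid_co2_data source named in the sample question, and the SELECT-only guard is our own addition.

import sqlite3

SQL_PROMPT = """You write a single SQLite SELECT query that answers the question.
Schema (abridged):
  owid_co2_data(country TEXT, year INTEGER, total_ghg REAL)
Question: {question}
Return only the SQL, nothing else."""

def answer_from_database(question: str, db_path: str, llm) -> tuple[str, str]:
    """Ask the model for a SELECT query, run it read-only, and return (answer, source_name)."""
    sql = llm(SQL_PROMPT.format(question=question)).strip().rstrip(";")
    if not sql.lower().startswith("select"):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    with sqlite3.connect(f"file:{db_path}?mode=ro", uri=True) as conn:
        row = conn.execute(sql).fetchone()
    return (str(row[0]) if row else "not found", "owid_co2_data")

For the Austria example above, a well-prompted model typically produces something like SELECT total_ghg FROM owid_co2_data WHERE country = 'Austria' AND year = 2000.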

3. Wikipedia API Integration

For broader contextual questions and information not available in our structured sources, we integrated the Wikipedia API. This required careful prompt engineering to:

  • Format queries in a way that would return relevant Wikipedia articles
  • Parse Wikipedia's response format correctly
  • Extract the most relevant information from lengthy articles
  • Ensure answers matched the required output format
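One way to implement the lookup is with the MediaWiki search and extracts endpoints via requests; the function name and the decision to keep only the top search hit are our own simplifications.

import requests

API = "https://en.wikipedia.org/w/api.php"

def wikipedia_lookup(query: str, sentences: int = 5) -> tuple[str, str]:
    """Search Wikipedia and return (extract, article_title) for the top hit."""
    search = requests.get(API, params={
        "action": "query", "list": "search", "srsearch": query,
        "format": "json", "srlimit": 1,
    }, timeout=10).json()
    hits = search["query"]["search"]
    if not hits:
        return "", ""
    title = hits[0]["title"]
    page = requests.get(API, params={
        "action": "query", "prop": "extracts", "explaintext": 1,
        "exsentences": sentences, "titles": title, "format": "json",
    }, timeout=10).json()
    extract = next(iter(page["query"]["pages"].values())).get("extract", "")
    return extract, title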

"The key was teaching our agent not just to find information, but to know where to look and how to ask."

Focused coding session by a member of Team Fantasticka Stvorka

The Importance of Prompt Engineering

Perhaps the most critical aspect of our solution was prompt engineering. Our agent needed to:

  1. Understand the question format — Recognize what type of answer was expected (numerical, categorical, temporal, etc.)
  2. Choose the right source — Determine whether to check the database, PDFs, or Wikipedia first
  3. Format answers consistently — Provide responses in the exact format required by the evaluation system
  4. Include proper attribution — Always cite the source of information correctly
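A condensed, illustrative version of that kind of routing-and-formatting prompt is shown below. The exact wording we used differed, but the structure is the point: name the available tools, force a single tool choice, and pin the output to the JSON shape the evaluator expects.

ROUTER_PROMPT = """You are a sustainability question-answering agent.
Decide which single tool to use and answer in strict JSON.

Tools:
  database  - precise numeric facts per country and year (emissions, energy, ...)
  pdf_rag   - statements from the three provided sustainability reports
  wikipedia - general background not covered by the other two sources

Question: {question}

Respond exactly as:
{{"tool": "<database|pdf_rag|wikipedia>",
  "answer": <number or string in the requested unit>,
  "source_name": "<source>", "page_number": <int or null>}}"""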

Key Takeaways

  1. RAG systems work surprisingly well — Even a custom-built RAG system can effectively retrieve information from documents, though citation precision remains a challenge worth addressing.
  2. Multi-source agents need smart routing — Teaching an AI when to use which data source is just as important as the retrieval mechanism itself.
  3. Prompt engineering is critical — The difference between a working and excellent agent often comes down to how well you've crafted your prompts and output formats.

Final Thoughts

We had an intense and rewarding weekend, and we're proud to say Team Fantasticka Stvorka finished in second place in the competition. This result reflects a lot of hard work and a healthy dose of creativity.

Our system ran as a suite of components: the RAG pipeline for the PDFs, the relational database back-end, and the Wikipedia lookups were orchestrated like MCP server-style microservices (separate "mini" servers that spoke to the agent). We also had to handle a fair amount of numerical gymnastics: converting units, combining and dividing values, and rounding the results only at the end. Getting the agent to handle those arithmetic steps correctly was surprisingly difficult and a major engineering challenge.
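A tiny sketch of the kind of helpers this involved; the unit factors and function names are illustrative, and the key design choice is that conversions stay explicit and rounding happens exactly once, at the end.

# Illustrative unit-normalization helpers; factors cover only the cases mentioned here.
UNIT_FACTORS = {
    ("megatonnes", "tonnes"): 1e6,
    ("kilotonnes", "tonnes"): 1e3,
    ("tonnes", "tonnes"): 1.0,
}

def convert(value: float, from_unit: str, to_unit: str) -> float:
    return value * UNIT_FACTORS[(from_unit, to_unit)]

def finalize(value: float, decimals: int = 1) -> float:
    """Round only once, at the very end, so intermediate sums and ratios stay exact."""
    return round(value, decimals)

# e.g. a source reporting 76.664 megatonnes matches the expected 76664000.0 tonnes
assert finalize(convert(76.664, "megatonnes", "tonnes")) == 76664000.0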

You'll find the full codebase, example data, and everything we used on our GitHub: https://github.com/lukatko/Hackathon-AIM-2025.

We'd also like to congratulate all the other teams for their creativity and strong solutions — and extend special thanks to the organizers for running a fantastic event.

Team Fantasticka Stvorka collaborating and coding during the hackathon

Find the Team

You can find all four of us on LinkedIn: