 
    
  Performance analysis of recorded music material has become increasingly important in musicological research and music psychology. In this paper, we present various techniques for extracting performance aspects from field recordings of folk songs. The main challenges arise from the fact that the recorded songs are performed by non-professional singers, who deviate significantly from the expected pitches and timings even within a single recording of a song. Based on a multimodal approach, we exploit the existence of a symbolic transcription of an idealized stanza in order to analyze a given audio recording of the song that comprises a large number of stanzas. As the main contribution of this paper, we introduce the concept of chroma templates, by which consistent and inconsistent aspects across the various stanzas of a recorded song are captured in the form of an explicit and semantically interpretable matrix representation. Altogether, our framework allows for capturing differences in various musical dimensions such as tempo, key, tuning, and melody.