From fragments to segments: How can we work with meaningful units of text when the original structure must be preserved?
Proper segmentation is crucial not only for increasing vendor efficiency, but also for avoiding the generation of useless TM entries. However, certain file formats come with an internal structure in which the source text is broken down into non-arbitrary segment-like units that need to be preserved and reproduced at the end of the workflow. The question is whether and how it is possible to reconcile this inherent structure with the kind of segmentation that is more conducive to linguistic purposes and that PMs and vendors want to work with.
Subtitles are one format that poses such a challenge: time-coding enforces a structure that should be followed during translation, yet the chunks of text that time-coding produces often do not coincide with sentences or other relevant linguistic units. How, then, can vendors be presented with well-formed segments in which they can freely vary the order of clauses and other elements when necessary? And how can the restrictions imposed by format-specific requirements (e.g. character limits) still be observed?
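To make the problem concrete, here is a minimal sketch of one possible approach, not a description of any particular tool. It merges consecutive time-coded cues into sentence-level segments, remembers which cues each segment came from so the structure can be restored later, and flags translated fragments that exceed a character limit. The names `Cue`, `Segment`, `merge_cues` and `over_limit` are hypothetical, the sentence detection is deliberately naive, and the 42-character default is only an illustrative assumption; real limits depend on the client's specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """One time-coded subtitle fragment (hypothetical structure)."""
    start: str
    end: str
    text: str


@dataclass
class Segment:
    """A sentence-level unit plus the indices of the cues it was built from."""
    cue_indices: List[int]
    text: str


def merge_cues(cues: List[Cue]) -> List[Segment]:
    """Join consecutive cues until one ends in sentence-final punctuation.

    Naive assumption: a cue ending in '.', '!' or '?' closes a sentence.
    """
    segments: List[Segment] = []
    indices: List[int] = []
    parts: List[str] = []
    for i, cue in enumerate(cues):
        indices.append(i)
        parts.append(cue.text.strip())
        if parts[-1].endswith((".", "!", "?")):
            segments.append(Segment(indices, " ".join(parts)))
            indices, parts = [], []
    if parts:  # trailing fragments that never reached a sentence end
        segments.append(Segment(indices, " ".join(parts)))
    return segments


def over_limit(fragments: List[str], max_chars: int = 42) -> List[int]:
    """Return positions of translated fragments longer than max_chars.

    The default of 42 is an assumed example value, not a standard.
    """
    return [i for i, f in enumerate(fragments) if len(f) > max_chars]
```

In this sketch the vendor would translate `Segment.text` as a whole, and the recorded `cue_indices` indicate how many time-coded fragments the translation has to be split back into before `over_limit` is applied to the result.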