This page was exported from phaq [ http://phaq.phunsites.net ]
Export date: Thu Apr 25 21:13:35 2024 / +0000 GMT
___________________________________________________
Title: Recognize invalid/unexpected characters with Perl
---------------------------------------------------

Today a colleague of mine faced a very weird problem. While parsing XML output from an HP ILO in Perl, his code kept failing with the message:

    FILE.XML:123 parser error : PCDATA invalid Char value 1

The message clearly states that there is an unexpected character value (Char value 1, ASCII SOH), but it doesn't give the position of that character. Looking at the input string on the console, the problem wasn't obvious either:

    <EVENT SEVERITY="Caution" LAST_UPDATE="08/03/2009 22:20" INITIAL_UPDATE="08/03/2009 22:20" COUNT="1" DESCRIPTION="POST Error: " />

So I proposed adding a few lines to help identify the character position in the given input string, which was basically this:

    @array = unpack("C*", $_my_input_var);
    foreach (@array) {
        printf("char \"%s\" is ord %s\n", chr($_), $_);
    }

This printed one line per character, ending with:

    char ">" is ord 62

Looking at this output, we saw that ASCII char 1 (an unprintable character, shown as ^A in some editors such as vi) was the fifth character from the end of the string.

The basic solution is to apply an additional input filter that removes ASCII char 1, like this:

    $_my_input_var =~ s/\x01//g;

While this solves just _this_ problem, a more solid solution is to remove all other non-printable characters as well, based on the list of ASCII characters at http://www.asciitable.com/. A filter like the following may apply. It removes the non-printable characters and leaves only the printable ASCII characters plus Tab (\x09) and Line Feed (\x0A). Note that, as written, Carriage Return (\x0D) falls inside the removed range \x0B-\x1F, and the range \x7F-\xFF also strips every non-ASCII byte:

    $_my_input_var =~ s/[\x00-\x08\x0B-\x1F\x7F-\xFF]//g;

---------------------------------------------------
Post date: 2009-10-20 10:16:54
Post date GMT: 2009-10-20 09:16:54
Post modified date: 2010-12-18 17:29:30
Post modified date GMT: 2010-12-18 16:29:30
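
The position check and the filter from this post could be combined roughly as follows. This is only a minimal sketch: the helper names find_unwanted_bytes and strip_unwanted_bytes are hypothetical, the sample string is a shortened stand-in for the ILO event, and the placement of the SOH byte in it is an assumption. It also assumes the input is a plain byte string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return a list of
# [offset, byte value] pairs for every byte the filter would remove.
sub find_unwanted_bytes {
    my ($input) = @_;
    my @hits;
    while ($input =~ /([\x00-\x08\x0B-\x1F\x7F-\xFF])/g) {
        # pos() points just past the match, so subtract 1 to get
        # the 0-based offset of the matched byte.
        push @hits, [ pos($input) - 1, ord($1) ];
    }
    return @hits;
}

# Hypothetical helper: return a copy of the input with those bytes removed.
sub strip_unwanted_bytes {
    my ($input) = @_;
    (my $clean = $input) =~ s/[\x00-\x08\x0B-\x1F\x7F-\xFF]//g;
    return $clean;
}

# Shortened sample modelled on the event in the post; the SOH byte
# before the closing quote is an assumed position.
my $sample = qq{<EVENT DESCRIPTION="POST Error: \x01" />};

# Report where each unwanted byte sits, then print the cleaned string.
for my $hit (find_unwanted_bytes($sample)) {
    printf "offset %d: byte 0x%02X\n", @$hit;
}
print strip_unwanted_bytes($sample), "\n";
```

Reporting the offset before stripping keeps a record of what was removed, which may help when the upstream source of the stray bytes needs to be tracked down later.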