Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Chris Saxon

Thanks for the question, Simas.

Asked: November 28, 2016 - 8:34 am UTC

Last updated: November 29, 2016 - 11:10 am UTC

Version: 11.2.0.3

Viewed 10K+ times! This question is

You Asked

Hi,

I'm in need to encode and decode big xml, however when i'm decoding from base64 after some iterations it fails to decode and i get symbols like this, "S��FS���S��FS���S�". Here's the script to reproduce the problem i'm facing:

declare

  l_clob clob := empty_clob();

  function gen_rand_xml return clob is
    
    l_xml xmltype := xmltype('<envelope><nullnode></nullnode></envelope>');
  begin
    for i in 1..100 loop
        SELECT
          insertXMLafter(
            l_xml,
            '/envelope/nullnode',
            XMLType('<node>' || i || '</node>'))
        INTO
          l_xml
        FROM dual;
    end loop;
    
    return l_xml.getClobVal();
  end;

  function to_base64(
    p_clob clob) return clob is
    
    l_length integer := dbms_lob.getLength(p_clob);
    l_offset integer := 1;
    l_amt binary_integer := 600;
    l_buffer varchar2(1800);
    
    l_result clob := empty_clob();
  begin
    while l_offset <= l_length loop
      l_result := l_result || utl_raw.cast_to_varchar2(utl_encode.base64_encode(utl_raw.cast_to_raw(dbms_lob.substr(p_clob, l_amt, l_offset))));
      l_offset := l_offset + l_amt;
    end loop;
    
    return l_result;
  end;
  
  function from_base64(
    p_clob clob) return clob is
    
    l_length integer := dbms_lob.getLength(p_clob);
    l_offset integer := 1;
    l_amt binary_integer := 800;
    l_buffer varchar2(3200);
    
    l_result clob := empty_clob();
  begin
    while l_offset <= l_length loop
      l_result := l_result || utl_raw.cast_to_varchar2(utl_encode.base64_decode(utl_raw.cast_to_raw(dbms_lob.substr(p_clob, l_amt, l_offset))));
      l_offset := l_offset + l_amt;
    end loop;
    
    return l_result;
  end;
begin
  l_clob := gen_rand_xml();
  dbms_output.put_line(from_base64(to_base64(l_clob)));
end;
/


Input:

<envelope>
  <nullnode/>
  <node>100</node>
  <node>99</node>
  <node>...</node>
  ................
  <node>2</node>
  <node>1</node>
</envelope>


Here's the output i get:

<envelope>
  <nullnode/>
  <node>100</node>
  <node>99</node>
  <node>...</node>
  ................
  <node>25</node><noFS���S��FS...



and Chris said...

The problem is your substring routine is chopping the input at the wrong point!

Encoding converts every three input 8-bit groups into a 24-bit group. So if you start chopping up the encoded string anywhere they're not going to map back to the original values correctly. The encoded binary data is ~1.37 times larger than the original.

To fix this, change the amounts you process in your to/from routines to be roughly in this ratio. For example, 48 and 66:

declare

  l_clob clob := empty_clob();

  function gen_rand_xml return clob is
    
    l_xml xmltype := xmltype('<envelope><nullnode></nullnode></envelope>');
  begin
    for i in 1..100 loop
        SELECT
          insertXMLafter(
            l_xml,
            '/envelope/nullnode',
            XMLType('<node>' || i || '</node>'))
        INTO
          l_xml
        FROM dual;
    end loop;
    
    return l_xml.getClobVal();
  end;

  function to_base64(
    p_clob clob) return clob is
    
    l_length integer := dbms_lob.getLength(p_clob);
    l_offset integer := 1;
    l_amt binary_integer := 48;
    l_buffer varchar2(1800);
    
    l_result clob := empty_clob();
  begin
    while l_offset <= l_length loop
      l_result := l_result || 
utl_raw.cast_to_varchar2(utl_encode.base64_encode(utl_raw.cast_to_raw(dbms_lob.substr(p_clob, l_amt, l_offset))));
      l_offset := l_offset + l_amt;
    end loop;
    
    return l_result;
  end;
  
  function from_base64(
    p_clob clob) return clob is
    
    l_length integer := dbms_lob.getLength(p_clob);
    l_offset integer := 1;
    l_amt binary_integer := 66;
    l_buffer varchar2(3200);
    
    l_result clob := empty_clob();
  begin
    while l_offset <= l_length loop
      l_result := l_result || 
utl_raw.cast_to_varchar2(utl_encode.base64_decode(utl_raw.cast_to_raw(dbms_lob.substr(p_clob, l_amt, l_offset))));
      l_offset := l_offset + l_amt;
    end loop;
    
    return l_result;
  end;
begin
  l_clob := gen_rand_xml();
  dbms_output.put_line(from_base64(to_base64(l_clob)));
end;
/

<envelope>
  <nullnode />
  <node>100</node>
  <node>99</node>
  <node>98</node>
  <node>97</node>
  <node>96</node>
  <node>95</node>
  <node>94</node>
  <node>93</node>
  <node>92</node>
  <node>91</node>
  <node>90</node>
  <node>89</node>
  <node>88</node>
  <node>87</node>
  <node>86</node>
  <node>85</node>
  <node>84</node>
  <node>83</node>
  <node>82</node>
  <node>81</node>
  <node>80</node>
  <node>79</node>
  <node>78</node>
  <node>77</node>
  <node>76</node>
  <node>75</node>
  <node>74</node>
  <node>73</node>
  <node>72</node>
  <node>71</node>
  <node>70</node>
  <node>69</node>
  <node>68</node>
  <node>67</node>
  <node>66</node>
  <node>65</node>
  <node>64</node>
  <node>63</node>
  <node>62</node>
  <node>61</node>
  <node>60</node>
  <node>59</node>
  <node>58</node>
  <node>57</node>
  <node>56</node>
  <node>55</node>
  <node>54</node>
  <node>53</node>
  <node>52</node>
  <node>51</node>
  <node>50</node>
  <node>49</node>
  <node>48</node>
  <node>47</node>
  <node>46</node>
  <node>45</node>
  <node>44</node>
  <node>43</node>
  <node>42</node>
  <node>41</node>
  <node>40</node>
  <node>39</node>
  <node>38</node>
  <node>37</node>
  <node>36</node>
  <node>35</node>
  <node>34</node>
  <node>33</node>
  <node>32</node>
  <node>31</node>
  <node>30</node>
  <node>29</node>
  <node>28</node>
  <node>27</node>
  <node>26</node>
  <node>25</node>
  <node>24</node>
  <node>23</node>
  <node>22</node>
  <node>21</node>
  <node>20</node>
  <node>19</node>
  <node>18</node>
  <node>17</node>
  <node>16</node>
  <node>15</node>
  <node>14</node>
  <node>13</node>
  <node>12</node>
  <node>11</node>
  <node>10</node>
  <node>9</node>
  <node>8</node>
  <node>7</node>
  <node>6</node>
  <node>5</node>
  <node>4</node>
  <node>3</node>
  <node>2</node>
  <node>1</node>
</envelope>

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Some more issues

Simas Braziunas, November 29, 2016 - 11:03 am UTC

It works for this example, but for bigger xml it still fails. I've pinpointed a few issues which solved this for me.
* First of all when encoding, i've converted clob to blob, read blob, and wrote encoded substrings to clob;
* When decoding i've striped down clob of of CR and LF, they were added during encoding, symbols. Because of CR and LF symbols it failed to decode properly;
* Read striped clob and wrote decoded substrings to blob, which then converted back to clob. This keeps the encoding. Without this f.e. if you have string = 'ąč€ė' this would be encoded, and decoded back to 'ąč�ė'.
Chris Saxon
November 29, 2016 - 11:10 am UTC

Glad you hear you got this sorted.

More to Explore

CLOB/BLOB/etc

Complete documentation on Securefiles and Large Objects here