Forum Discussion

SuperTester
Contributor
5 years ago

Extracting Image File Path from HTML Tag - Regular Expressions

Hello,

 

I am trying to come up with a regular expression that will match an image file path contained within an HTML tag.

 

<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>

 

Now, I have a regular expression that will match the file path, but not when the file path is contained in the HTML Tag string.

RegEx:  /^(?:[\w]\:|\\)(.*png$)/gim

 

File Path: C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png

 

I think there are two problems with my regular expression. 1) It requires the string to start with "C:"; if other characters come before "C:", then the "^" anchor no longer matches at "C:". 2) The file path is enclosed in quotes. I'm not taking this into account, and I'm not sure whether it throws off the regular expression.

 

Any insight would be appreciated!

 

Notes

- I'm using regex101 to develop my regular expressions. https://regex101.com/

- The characters "07165ac8_2b0c_44e2_a4d8_1deabe5fb73e" within the string are randomly generated and change on every test execution, while the substring "ABC_Image_DE" stays the same.

- Scripting in JavaScript

  • BenoitB
    Community Hero

    I like regex but sometimes simple text parsing is good too.

     

    function getImagePath(HtmlTag, Extension = ".png") {
      let posStart = 8 + aqString.Find(HtmlTag, 'file:///', 0, false);                      // +8 to exclude 'file:///'
      let posEnd   = Extension.length + aqString.Find(HtmlTag, Extension, posStart, false); // To include the extension
      return HtmlTag.substr(posStart, posEnd-posStart);
    }  
    
    function testIt() {
      Log.Message(getImagePath('<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>'));
    }
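
The same parsing idea can be tried outside TestComplete, where `aqString` and `Log` are not available. The following is a minimal sketch in plain JavaScript, assuming `String.prototype.indexOf` as a stand-in for `aqString.Find`; the name `getImagePathPlain` is hypothetical, and the lowercasing only imitates the case-insensitive search of the original.

```javascript
// Plain-JavaScript variant of the parsing approach (hypothetical name).
function getImagePathPlain(htmlTag, extension = '.png') {
  const prefix = 'file:///';
  // Search a lowercased copy to imitate a case-insensitive find.
  // Assumes lowercasing does not change the string length (true for ASCII).
  const lowerTag = htmlTag.toLowerCase();
  const prefixAt = lowerTag.indexOf(prefix);
  if (prefixAt === -1) return null;           // no 'file:///' in the tag
  const start = prefixAt + prefix.length;     // skip 'file:///'
  const extAt = lowerTag.indexOf(extension.toLowerCase(), start);
  if (extAt === -1) return null;              // extension not found
  return htmlTag.substring(start, extAt + extension.length); // include the extension
}

console.log(getImagePathPlain('<img src="file:///C:/Temp/picture.png">')); // C:/Temp/picture.png
```

Returning `null` when either marker is missing is a design choice of this sketch, so a failed lookup cannot produce a misleading substring.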
    

     

    • SuperTester
      Contributor

      Hey!

       

      Thank you very much for the solution! I appreciate the insight of changing from regex to parsing.

       

      One thing that's not quite right is that the result string is missing its backslashes:

      C:UsersUserNameAppDataLocalTempABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png

       

      I noticed during debugging that the HTML tag string is also missing its backslashes during test execution (see attached screenshot). Is this caused by the backslash being an escape character?

       

      Thanks again!

      • BenoitB
        Community Hero

        Yep, I made a big mistake. As soon as the string is assigned, it becomes a processed literal, and the \, being an escape character, no longer exists in it.

        No easy solution yet.

        You can try String.raw, but the problem remains how the value is assigned...

        https://developer.mozilla.org/fr/docs/Web/JavaScript/Reference/Objets_globaux/String/raw
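
To make the escape processing concrete, here is a small sketch in plain JavaScript. It uses `console.log` instead of TestComplete's `Log.Message`, and the shortened path is an assumption for illustration. It compares an ordinary literal, a literal with doubled backslashes, and a `String.raw` template.

```javascript
// Ordinary literal: \U and \T are not recognised escapes, so the backslash is dropped.
const processed = 'C:\Users\Temp';
// Doubling each backslash keeps it in the resulting string.
const escaped = 'C:\\Users\\Temp';
// String.raw keeps the template text exactly as typed.
const raw = String.raw`C:\Users\Temp`;

console.log(processed);       // C:UsersTemp
console.log(escaped);         // C:\Users\Temp
console.log(raw);             // C:\Users\Temp
console.log(escaped === raw); // true
```

This only affects literals written in the script source. A string obtained at run time, for example read from a page or a file, already contains its backslashes.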

         

         

         

        function getImagePath(HtmlTag, Extension = ".png") {
          const prefix = 'file:///';
          // aqString.Find returns -1 when the substring is not found
          const prefixPos = aqString.Find(HtmlTag, prefix, 0, false);
          if (prefixPos < 0) return "";
          const posStart = prefixPos + prefix.length;                       // Exclude 'file:///'
          const extPos = aqString.Find(HtmlTag, Extension, posStart, false);
          if (extPos < 0) return "";
          return HtmlTag.substring(posStart, extPos + Extension.length);    // Include the extension
        }

        function testIt() {
          // String.raw keeps the backslashes of the Windows path intact
          const htmlTag = String.raw`<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>`;
          Log.Message(getImagePath(htmlTag));
        }
        

         

         

         

        In regex or string manipulation, the problem exists as long as \ is an escape character.
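
For the regular expression originally asked about, one possible shape is sketched below. It is not the pattern from the thread: the helper name `extractImagePath` is hypothetical, and the sketch assumes the path always follows `file:///` inside a double-quoted `src` attribute. The `^` anchor is dropped so the match can start mid-string, and the closing quote bounds the capture.

```javascript
// Hypothetical helper (name and signature are illustrative, not from the thread).
function extractImagePath(htmlTag, extension = '.png') {
  // Escape regex metacharacters in the extension, e.g. the leading dot.
  const ext = extension.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // No ^ anchor: 'file:///' may appear anywhere in the tag.
  // [^"]+? never crosses the closing quote of the src attribute.
  const pattern = new RegExp('file:///([^"]+?' + ext + ')', 'i');
  const match = pattern.exec(htmlTag);
  return match ? match[1] : null;
}

// String.raw keeps the backslashes of the sample path intact.
const tag = String.raw`<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>`;
console.log(extractImagePath(tag));
// C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png
```

The capture group returns only the path, so the `file:///` prefix and the surrounding quotes are excluded without any extra string slicing.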