java - Extracting some nodes from XML Files -
I need to extract nodes from an XML file formatted in this way:
<collection sentiment="negativo">
  <comment>
    <sentiment> ...</sentiment>
    <chars>...</chars>
    <words>...</words>
    <text>blabla</text>
    <lang>english</lang>
  </comment>
Now assume there are other <comment> elements in the same XML file that have <lang>spanish</lang>. I need to create two separate XML files: the first one with the nodes that have the child <lang>english</lang> (let's call it eng.xml), and the second one with the nodes that have <lang>spanish</lang> (let's call it spa.xml).
Here is my Java code:
public void getEnglishRows() throws IOException {
    OutputStreamWriter f = new OutputStreamWriter(new FileOutputStream("c:/eclipse/neg_eng.xml"));
    BufferedWriter buff;
    // puts all the row nodes (which in turn contain elements) into a list
    NodeList current_row = doc.getElementsByTagName("comment");
    NodeList tmp;
    Node nodo = null;
    buff = new BufferedWriter(f);
    for (int i = 0; i < current_row.getLength(); i++) {
        tmp = current_row.item(i).getChildNodes();
        for (int k = 0; k < tmp.getLength(); k++) {
            nodo = tmp.item(k);
            if ("english".equals(nodo.getTextContent()))
                System.out.println("if english");
            buff.write(current_row.item(i).getNodeValue());
        }
    }
    buff.close();
}
I don't know if this is clear, I hope so.
So I have one XML file with lots of <comment></comment> elements. I have to extract every <comment></comment> that has <lang>english</lang> and write the node (with its children) to an XML file. The same goes for <lang>spanish</lang>.
The output in eng.xml should be:
<comment>
  <sentiment> ...</sentiment>
  <chars>...</chars>
  <words>...</words>
  <text>blabla</text>
  <lang>english</lang>
</comment>
The output in spa.xml should be:
<comment>
  <sentiment> ...</sentiment>
  <chars>...</chars>
  <words>...</words>
  <text>blabla</text>
  <lang>spanish</lang>
</comment>
I hope I'm clear. The problem is that I can extract the text of the nodes, but I cannot keep the XML tags!
Please help me!
Why not simply delete the comments that are not in English? My suggestion is to search for the <lang> tags and look for the non-English ones. Then go to the parent element that contains the node (the <comment> element) and delete it. This preserves the original file structure.
Try this code. It worked for me :)
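import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public void getEnglishRows() throws IOException, SAXException, ParserConfigurationException, TransformerException {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(new FileInputStream("c:/eclipse/neg_eng.xml"));
    NodeList current_row = doc.getElementsByTagName("lang"); // search the lang elements
    // The NodeList is live: iterate backwards so that removing a comment
    // does not make the loop skip the element that follows it.
    for (int i = current_row.getLength() - 1; i >= 0; i--) {
        String lang = current_row.item(i).getTextContent();
        if (!lang.equalsIgnoreCase("english")) {
            // delete the comment element that is not in English
            Element comment = (Element) current_row.item(i).getParentNode();
            doc.getDocumentElement().removeChild(comment);
        }
    }
    doc.normalize();
    // write the content to the XML file and close the stream afterwards
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    try (OutputStreamWriter f = new OutputStreamWriter(new FileOutputStream("./eng_sent.xml"))) {
        transformer.transform(new DOMSource(doc), new StreamResult(f));
    }
}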
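Since the question asks for both eng.xml and spa.xml, the same delete-the-others idea can be parameterized by language. The following is only a rough sketch, not the answer's exact code: the class name LanguageSplitter and the method keepLanguage are made up for illustration, it works on strings instead of files to stay self-contained, and it assumes every <comment> holds exactly one <lang> child.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original answer.
class LanguageSplitter {

    /** Returns the input XML with only the comments whose lang matches (case-insensitive). */
    static String keepLanguage(String xml, String language) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList langs = doc.getElementsByTagName("lang");
        // The list is live, so walk it backwards: a removal then never
        // shifts the entries that are still waiting to be visited.
        // Assumption: one <lang> per <comment>.
        for (int i = langs.getLength() - 1; i >= 0; i--) {
            Node lang = langs.item(i);
            if (!language.equalsIgnoreCase(lang.getTextContent().trim())) {
                Node comment = lang.getParentNode();
                comment.getParentNode().removeChild(comment);
            }
        }

        // An identity Transformer serializes the remaining tree, tags included.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<collection sentiment=\"negativo\">"
                + "<comment><text>eng1</text><lang>english</lang></comment>"
                + "<comment><text>spa1</text><lang>spanish</lang></comment>"
                + "<comment><text>spa2</text><lang>spanish</lang></comment>"
                + "</collection>";
        System.out.println(keepLanguage(xml, "english")); // content for eng.xml
        System.out.println(keepLanguage(xml, "spanish")); // content for spa.xml
    }
}
```

Each call parses the input again, so the two results are independent of each other. To write real files, the StringWriter could be swapped for a FileOutputStream, as in the code above.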
The output file then looks like this:
<collection sentiment="negativo">
  <comment> <sentiment> ...</sentiment> <chars>...</chars> <words>...</words> <text>eng3</text> <lang>english</lang> </comment>
  <comment> <sentiment> ...</sentiment> <chars>...</chars> <words>...</words> <text>eng1</text> <lang>english</lang> </comment>
  <comment> <sentiment> ...</sentiment> <chars>...</chars> <words>...</words> <text>eng2</text> <lang>english</lang> </comment>
where the original XML file was:
<collection sentiment="negativo">
  <comment> <sentiment> ...</sentiment> <chars>...</chars> <words>...</words> <text>eng3</text> <lang>english</lang> </comment>
  <comment> <sentiment> ...</sentiment> <chars>...</chars> <words>...</words> <text>spa2</text> <lang>spanish</lang> </comment>
  <comment> <sentiment> ...</sentiment> <chars>...</chars> <words>...</words> <text>eng1</text> <lang>english</lang> </comment>
  <comment> <sentiment> ...</sentiment> <chars>...</chars> <words>...</words> <text>eng2</text> <lang>english</lang> </comment>
  <comment> <sentiment> ...</sentiment> <chars>...</chars> <words>...</words> <text>spa1</text> <lang>spanish</lang> </comment>
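As for the symptom in the question (the text comes out but the tags are lost): for an element node, getNodeValue() is null and getTextContent() returns only the concatenated text, so neither gives back the markup. If a single <comment> has to be written with its tags, one possible sketch is to hand that node to an identity Transformer. The class name NodeSerializer and its methods are hypothetical, chosen just for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original answer.
class NodeSerializer {

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Serializes a node together with its whole subtree, tags included. */
    static String toXml(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml ...?> header so fragments can be concatenated.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<collection><comment><text>blabla</text>"
                + "<lang>english</lang></comment></collection>");
        Node comment = doc.getElementsByTagName("comment").item(0);
        System.out.println(comment.getNodeValue());   // null for element nodes
        System.out.println(comment.getTextContent()); // text only, no tags
        System.out.println(toXml(comment));           // markup preserved
    }
}
```

Several fragments written this way still need a single enclosing root element (such as <collection>) to form a well-formed XML file, which is one reason the delete-in-place approach is simpler here.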
Hope this helps you! Happy hacking ;-)
java xml