Name: jsoup	Distribution: openSUSE Tumbleweed
Version: 1.15.3	Vendor: openSUSE
Release: 3.2	Build date: Wed Oct 2 17:54:24 2024
Group: Development/Libraries/Java	Build host: reproducible
Size: 533120	Source RPM: jsoup-1.15.3-3.2.src.rpm
Packager: http://bugs.opensuse.org
Url: https://jsoup.org/
Summary: Java library for working with HTML

jsoup is a Java library for working with HTML.
It provides an API for extracting and manipulating data,
using DOM, CSS, and jQuery-like methods.

jsoup implements the WHATWG HTML5 specification.

 - scrapes and parses HTML from a URL, file, or string
 - finds and extracts data, using DOM traversal or CSS selectors
 - manipulates the HTML elements, attributes, and text
 - cleans user-submitted content against a safelist,
   to prevent XSS attacks
 - outputs tidied HTML

jsoup can deal with invalid HTML tag soup.
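
As a rough illustration of the features listed above, the following sketch
parses a string of tag soup, selects an element with a CSS selector, edits
an attribute, and cleans untrusted input. The class name JsoupTour, the HTML
snippets, and the example.com base URL are made up for this example; it
assumes the jsoup jar from this package is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;
import org.jsoup.select.Elements;

public class JsoupTour {
    // Hypothetical input: unquoted attribute and unclosed <p> tags (tag soup).
    static final String HTML =
        "<div id=news><p>First<p><a href='/a.html'>Link</a></div>";

    // Parse from a string; the base URL is used to resolve relative links.
    static Document parse() {
        return Jsoup.parse(HTML, "https://example.com/");
    }

    // Find data with a CSS selector.
    static String firstLinkText(Document doc) {
        Elements links = doc.select("div#news a[href]");
        return links.isEmpty() ? "" : links.first().text();
    }

    // Clean untrusted markup against one of the built-in safelists.
    static String clean(String untrusted) {
        return Jsoup.clean(untrusted, Safelist.basic());
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println(firstLinkText(doc));

        // Manipulate an attribute, then resolve the link against the base URL.
        Element link = doc.selectFirst("a[href]");
        link.attr("rel", "nofollow");
        System.out.println(link.absUrl("href"));

        System.out.println(clean("<p>Hello<script>alert(1)</script></p>"));

        // Output the tidied HTML of the body.
        System.out.println(doc.body().html());
    }
}
```

With the file list of this package, compiling would look something like
javac -cp /usr/share/java/jsoup/jsoup.jar JsoupTour.java.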

Provides

Requires

License

MIT

Changelog

* Wed Oct 02 2024 Fridrich Strba <[email protected]>
  - Spec file cleanup
* Thu Oct 20 2022 Fridrich Strba <[email protected]>
  - Fix typo in the ant *-build.xml file that caused errors while
    building eclipse
* Mon Oct 17 2022 Fridrich Strba <[email protected]>
  - Upgrade to upstream version 1.15.3
  - Changes of 1.15.3
    * Security
      + Fixed bsc#1203459 (CVE-2022-36033), an issue where the jsoup
      cleaner may incorrectly sanitize crafted XSS attempts if
      Safelist.preserveRelativeLinks is enabled. See the security
      advisory for more details.
    * Improvements
      + The Cleaner will preserve the source position of cleaned
      elements, if source tracking is enabled in the original parse.
      + The error messages output from Validate are more descriptive.
      Exceptions are now ValidationExceptions
      (extending IllegalArgumentException). Stack traces do not
      include the Validate class, to make it simpler to see where
      the exception originated. Common validation errors including
      malformed URLs and empty selector results have more explicit
      error messages.
      + Build Improvement: added implementation version and related
      fields to the jar manifest.
    * Bug Fixes
      + The DataUtil would incorrectly read from InputStreams that
      emitted reads less than the requested size. This led to
      incorrect results when parsing from chunked server responses,
      for example.
  - Changes of 1.15.2
    * Improvements
      + Added the ability to track the position (line, column, index)
      in the original input source from where a given node was
      parsed. Accessible via Node.sourceRange() and
      Element.endSourceRange().
      + Added Element.firstElementChild(), Element.lastElementChild(),
      Node.firstChild(), Node.lastChild(), as convenient accessors
      to those child nodes and elements.
      + Added Element.expectFirst(), which is just like
      Element.selectFirst(), but instead of returning a null if
      there is no match, will throw an IllegalArgumentException.
      This is useful if you want to simply abort processing if an
      expected match is not found, such as in test cases.
      + When pretty-printing HTML, doctypes are emitted on a newline
      if there is a preceding comment.
      + When pretty-printing, trim the leading and trailing spaces of
      textnodes in block tags when possible, so that they are
      indented correctly.
      + In Element.selectXpath(), disable namespace awareness. This
      makes it possible to always select elements by their simple
      local name, regardless of whether an xmlns attribute was set.
    * Bug Fixes
      + When using the DataUtil.readToByteBuffer() method, such as in
      Connection.Response.body(), if the document has not already
      been parsed and must be read fully, and there is any maximum
      buffer size being applied, only the default internal buffer
      size was read.
      + When serializing HTML, newlines in elements descending from a
      pre tag were incorrectly skipped. That caused what should have
      been preformatted output to instead be a run of text.
      + When pretty-print serializing HTML, newlines separating
      phrasing content (e.g. a <span> tag within a <p> tag) would be
      incorrectly skipped, instead of normalized to a space.
      Additionally, improved space normalization between other end
      of line occurrences, and whitespace handling after a closing
      </body>.
  - Changes of 1.15.1
    * Changes
      + Removed previously deprecated methods and classes (including
      org.jsoup.safety.Whitelist; use org.jsoup.safety.Safelist
      instead).
    * Improvements
      + When converting jsoup Documents to W3C Documents in W3CDom,
      preserve HTML valid attribute names if the input document is
      using the HTML syntax. (Previously, would always coerce using
      the more restrictive XML syntax.)
      + Added the :containsWholeText(text) selector, to match against
      non-normalized Element text. That can be useful when elements
      can only be distinguished by e.g. specific case, or leading
      whitespace, etc.
      + Added Element#wholeOwnText() to retrieve the original
      (non-normalized) ownText of an Element. Also added the
      :containsWholeOwnText(text) selector, to match against that.
      BR elements are now treated as newlines in the wholeText
      methods.
      + Added the :matchesWholeText(regex) and
      :matchesWholeOwnText(regex) selectors, to match against whole
      (non-normalized, case sensitive) element text and own text,
      respectively.
      + When evaluating an XPath query against a context element, the
      complete document is now visible to the query, vs only the
      context element's sub-tree. This enables support for queries
      outside (parent or sibling) the element, e.g.
      ancestor-or-self::*.
      + Allow a maxPaddingWidth on the indent level in OutputSettings
      when pretty printing. This defaults to 30 to limit the indent
      level for very deeply nested elements, and may be disabled by
      setting to -1.
      + When cloning a Node or an Element, the clone gets a cloned
      OwnerDocument containing only that clone, so as to preserve
      applicable settings, such as the Pretty Print settings.
      + Added a convenience method Jsoup.parse(File).
      + In the NodeTraversor, added default implementations for
      NodeVisitor.tail() and NodeFilter.tail(), so that code using
      only head() methods can be written as lambdas.
      + In NodeTraversor, added support for removing nodes via
      Node.remove() during NodeVisitor.head().
      + Added Node.forEachNode(Consumer<Node>) and
      Element.forEach(Consumer<Element>) methods, to efficiently
      traverse the DOM with a functional interface.
    * Bug Fixes
      + Boolean attribute names should be case-insensitive, but were
      not when the parser was configured to preserve case.
      + When reading from SequenceInputStreams across the buffer, the
      input stream was closed too early, resulting in missed
      content.
      + A comment with all dashes (<!----->) should not emit a parse
      error.
      + When throwing a SelectorParseException for an invalid
      selector, don't try to String.format the input, as that could
      throw an IllegalFormatException.
      + When serializing HTML with Pretty Print enabled, extraneous
      whitespace may be added on closing tags, or extra newlines may
      be added at the end of script blocks.
      + When copy-creating a Safelist from another, perform a
      deep-copy of the original's settings, so that changes to the
      original after creation do not affect the copy.
      + Speed improvement when parsing constructed HTML containing
      very deeply incorrectly stacked formatting elements with many
      attributes.
      + During parsing, a StackOverflowError was possible given
      crafted HTML with hundreds of nested table elements followed
      by invalid formatting elements.
  - Changes of 1.14.3
    * Improvements
      + Added native XPath support with Element.selectXpath(String)
      + Added full support for the <template> tag, up to the HTML5
      parser spec.
      + Added support in CharacterReader to track newlines, so that
      parse errors can be reported more intuitively.
      + Tracked parse errors now have more details, including the
      erroneous token, to help clarify the errors.
      + Speed and memory optimizations for the :has(subquery)
      selector.
      + The :contains(text) and :containsOwn(text) selectors are now
      whitespace normalized, aligning to the document text that they
      are matching against.
      + In Element, speed optimized adopting all of an element's child
      nodes into a currently empty element. Improves the HTML
      adoption agency algorithm when adopting elements with many
      children.
      + Increased the parse speed when in RCData (e.g. <title>) and
      unescaped <tag> tokens are found, by memoizing the </title>
      scan and reducing GC.
      + When parsing custom tags (in HTML or XML), added a flyweight
      cache on Tag.valueOf(String) to reduce memory overhead when
      many tags are repeated. Also tuned other areas of the parser
      when many very deeply stacked custom elements were present.
    * Bug Fixes
      + The OSGi bundle meta-data incorrectly set a version on the
      import of javax.annotation (used as a build-time dependency
      for nullability assertions).
      + When tracking errors or checking for validity in the Cleaner,
      errors were incorrectly raised for missing optional closing tags.
      + The Attributes.equals() method was sensitive to the order of
      its contents, but it should not be.
      + When the HTML parser was configured to preserve case, Element
      text methods would miss adding whitespace for BR tags.
      + Attribute names are now normalized & validated correctly for
      the specific output syntax (HTML or XML). Previously,
      syntactically invalid attribute names could be output by the
      html() methods. Such attributes are still available in the
      DOM, and will be normalized if possible on output.
      + Fixed an IOOB when an empty select tag was followed by a body
      tag that needed reparenting.
    * Build Improvements
      + Fixed nullability annotations for Node.equals(Object) and
      other equals methods.
      + Added JDK 17 to the CI builds.
* Fri Aug 27 2021 Fridrich Strba <[email protected]>
  - Upgrade to upstream version 1.14.2
    * fixes bsc#1189749, CVE-2021-37714
  - Generate tarball using source service instead of a script
* Fri Feb 22 2019 Fridrich Strba <[email protected]>
  - Remove from the tarball the non-free test data
* Sat Feb 02 2019 Jan Engelhardt <[email protected]>
  - Ensure neutrality of descriptions.
* Fri Feb 01 2019 Fridrich Strba <[email protected]>
  - Initial packaging of jsoup version 1.11.3
  - Added jsoup-build.xml file to build with ant
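
Several of the accessors and selectors named in the changelog above can be
tried together. The sketch below is only an illustration: the class name
ChangelogApis and the HTML snippet are hypothetical, and it assumes jsoup
1.15.2 or later on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ChangelogApis {
    // Hypothetical input; note the padded text in the second item.
    static final String HTML = "<ul id=menu><li>One</li><li> Two </li></ul>";

    static Document doc() {
        return Jsoup.parse(HTML);
    }

    // expectFirst (1.15.2) throws instead of returning null on no match.
    static boolean expectFirstThrows(Document doc, String css) {
        try {
            doc.expectFirst(css);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // forEachNode (1.15.1) walks every node below (and including) the body.
    static int countNodes(Document doc) {
        AtomicInteger count = new AtomicInteger();
        doc.body().forEachNode(node -> count.incrementAndGet());
        return count.get();
    }

    public static void main(String[] args) {
        Document doc = doc();
        Element menu = doc.expectFirst("#menu");

        // firstElementChild / lastElementChild (1.15.2)
        System.out.println(menu.firstElementChild().text());

        // wholeOwnText (1.15.1) keeps the original, non-normalized whitespace.
        System.out.println("[" + menu.lastElementChild().wholeOwnText() + "]");

        // selectXpath (1.14.3) runs an XPath query instead of a CSS selector.
        System.out.println(doc.selectXpath("//li").size());

        System.out.println(expectFirstThrows(doc, "#missing"));
        System.out.println(countNodes(doc));
    }
}
```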

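The DataUtil fix listed under 1.15.3 touches a general Java pitfall:
InputStream.read(byte[], int, int) may return fewer bytes than requested
without the stream being at its end. The sketch below is not jsoup's code.
It is a JDK-only illustration, with hypothetical names (ShortReads,
TrickleStream, readFully, demo), of why a reader has to loop until the
buffer is full or the stream reports end of input.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ShortReads {
    /** Hypothetical stream that hands out at most 3 bytes per read call. */
    static class TrickleStream extends ByteArrayInputStream {
        TrickleStream(byte[] data) {
            super(data);
        }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 3));
        }
    }

    /** Loops until buf is full or the stream ends; returns bytes read. */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream before the buffer was filled
            }
            total += n;
        }
        return total;
    }

    /** Returns {bytes from a single read call, bytes from the read loop}. */
    static int[] demo() {
        byte[] data = "<p>chunked response body</p>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] buf = new byte[data.length];
        try {
            int single = new TrickleStream(data).read(buf, 0, buf.length);
            int looped = readFully(new TrickleStream(data), buf);
            return new int[] {single, looped};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("single read: " + result[0]); // 3 of 28 bytes
        System.out.println("read loop:   " + result[1]); // all 28 bytes
    }
}
```

A single read call that is treated as complete would silently drop the
rest of the input here, which is the kind of incorrect result the changelog
entry describes for chunked server responses.
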
Files

/usr/share/doc/packages/jsoup
/usr/share/doc/packages/jsoup/CHANGES
/usr/share/doc/packages/jsoup/README.md
/usr/share/java/jsoup
/usr/share/java/jsoup/jsoup.jar
/usr/share/licenses/jsoup
/usr/share/licenses/jsoup/LICENSE
/usr/share/maven-metadata/jsoup.xml
/usr/share/maven-poms/jsoup
/usr/share/maven-poms/jsoup/jsoup.pom

Generated by rpm2html 1.8.1

Fabrice Bellet, Mon Dec 9 23:39:48 2024

jsoup-1.15.3-3.2 RPM for noarch

From openSUSE Ports Tumbleweed for noarch
