Pythonic way to get HTML string inside a BeautifulSoup4 Tag? - Python

Pax

Given `<p>Bananas <em>in</em> pyjamas are <em>going</em> down the <em>stairs</em>.</p>`

How can I get the HTML string inside the `<p>` tag?

That is, `Bananas <em>in</em> pyjamas are <em>going</em> down the <em>stairs</em>.`

```
from bs4 import BeautifulSoup

paragraph = '<p>Bananas <em>in</em> pyjamas are <em>going</em> down the <em>stairs</em>.</p>'
paragraphTag = BeautifulSoup(paragraph, "html.parser")
# paragraphTag.???
```

Top Answer

hkotsubo

The `BeautifulSoup` constructor creates an object that represents the whole HTML document, not a single tag. In this case the document contains only one `p` tag, but your solution still outputs the `<p>` and `</p>` tags themselves, which is not what you wanted (if I understood correctly).

To get only the tag contents (without the opening and closing tags), use `find` to select the tag itself, then call `decode_contents()` to get everything inside it as a string:

```python
from bs4 import BeautifulSoup

paragraph = '''<p>Bananas <em>in</em> pyjamas are <em>going</em> down the <em>stairs</em>.</p>'''

# get a "soup" object that represents the whole HTML document
soup = BeautifulSoup(paragraph, "html.parser")

# get only the p tag
paragraphTag = soup.find('p')

# get tag contents as a string
print(paragraphTag.decode_contents()) # Bananas <em>in</em> pyjamas are <em>going</em> down the <em>stairs</em>.
```
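For comparison, here is a minimal sketch of three ways to turn the same tag into a string. The variable names (`outer_html`, `inner_html`, `text_only`) are only illustrative and do not come from the question.

```python
from bs4 import BeautifulSoup

html = "<p>Bananas <em>in</em> pyjamas are <em>going</em> down the <em>stairs</em>.</p>"
tag = BeautifulSoup(html, "html.parser").find("p")

outer_html = str(tag)               # markup including the <p> wrapper
inner_html = tag.decode_contents()  # markup between <p> and </p>
text_only = tag.get_text()          # text only, all tags stripped

print(outer_html)
print(inner_html)
print(text_only)
```

Only `decode_contents()` keeps the nested `<em>` tags while dropping the enclosing `<p>`.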

Answer #2

Pax

It seems like there is no direct way to retrieve the string inside a tag.

`.text` and `.strings` will strip the tags.

`.contents` is our best bet, but it is a list of child nodes, so each child has to be converted to a string before joining.

```
inner_html = "".join(map(str, paragraphTag.contents))
```
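The result of this approach depends on which node it is applied to. The sketch below wraps the join in a helper, `join_contents` (a hypothetical name used only for illustration), and runs it on both the whole soup object and the `<p>` tag returned by `find`.

```python
from bs4 import BeautifulSoup

html = "<p>Bananas <em>in</em> pyjamas are <em>going</em> down the <em>stairs</em>.</p>"
soup = BeautifulSoup(html, "html.parser")


def join_contents(node):
    """Concatenate the string form of each direct child of node (illustrative helper)."""
    return "".join(map(str, node.contents))


# The soup's only child is the <p> tag, so the wrapper is included.
from_soup = join_contents(soup)

# The children of <p> are its text nodes and <em> tags, so the wrapper is dropped.
from_tag = join_contents(soup.find("p"))

print(from_soup)
print(from_tag)
```

For this input, joining the contents of the `<p>` tag gives the same string as calling `decode_contents()` on it.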
