Spanish Stock Market Company Profiles Retrieval using investpy

Álvaro Bartolomé
6 min read · Apr 8, 2019

Introduction

A company profile, as defined in this Udemy Blog, is a professional introduction to a business that aims to inform the audience about its products and services. This makes it highly relevant when classifying companies for further analysis.

investpy 0.8, recently released, adds a feature that retrieves the profiles of the companies of the Spanish stock market as indexed on Investing.com and on “La Bolsa de Madrid”. In this post I will explain how to use investpy to retrieve these company profiles and walk through how the retrieval works.

Requirements

To test the code presented in this post, you need the following dependency installed in its latest version; it can be easily installed from PyPI (Python Package Index):

  • investpy: a Python library, based on web scraping, for extracting historical data and information about the companies of the Spanish stock market. investpy is the first Python library that retrieves information specifically from the Spanish stock market, which makes the task easy for users who want real-time data for further analysis.

As of April 2019 the latest version is 0.8, which has been successfully tested on Python 2.7, 3.6 and 3.7. You can install it as follows:

python -m pip install investpy==0.8
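Since the function used later in this post belongs to version 0.8, it can help to confirm which version ended up installed. The following is only a minimal sketch: it assumes Python 3.8 or newer (for `importlib.metadata`), and `installed_version` is a hypothetical helper, not part of investpy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Hypothetical check: compare the installed investpy against the pinned 0.8
investpy_version = installed_version("investpy")
if investpy_version is None:
    print("investpy is not installed")
elif investpy_version != "0.8":
    print("investpy " + investpy_version + " is installed; this post was written for 0.8")
else:
    print("investpy 0.8 is installed")
```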

Data Extraction Implementation

Data extraction in this case consists of retrieving company profiles from two different sources, “Investing.com” and “Bolsa de Madrid”, via web scraping. Both sources contain the profile of every company that operates in the Spanish continuous stock market, in English and Spanish, respectively.

This function has been developed following these guidelines, but in this case it retrieves information from two different sources: Natural Language Processing (NLP) of company profiles can be done in many languages, and each source offers the profiles in a different one.

As some web scraping tools have already been tested, we are going to base this web scraper on those results, which showed that the best combination for both HTML DOM tree retrieval and data extraction from it was requests together with lxml, each in its latest version.

requests

requests allows you to send organic, grass-fed HTTP/1.1 requests, without the need for manual labour. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic, thanks to urllib3.

import requests

# Sample equity: BBVA (tag used by Investing.com, ISIN used by Bolsa de Madrid)
tag_ = "bbva"
isin_code = "ES0113211835"

# Browser-like headers sent with both requests
head = {
    "User-Agent": "Mozilla/5.0 (X11; U; Linux amd64; rv:5.0) Gecko/20100101 Firefox/5.0 (Debian)",
    "X-Requested-With": "XMLHttpRequest"
}

investing_url = "https://www.investing.com/equities/" + tag_ + "-company-profile"
bolsa_url = "http://www.bolsamadrid.es/esp/aspx/Empresas/FichaValor.aspx?ISIN=" + isin_code

# Download the profile page from each source
investing_req = requests.get(investing_url, headers=head)
bolsa_req = requests.get(bolsa_url, headers=head)

There is no need to implement this function yourself, as it is already implemented in investpy, as explained later in this post. In this sample, both the tag and the ISIN code of the chosen equity are hard-coded; in practice investpy retrieves those parameters internally for every equity of the Spanish stock market.

The sources have been selected so that the information extracted from them is reliable and useful. As mentioned before, we use the company information of the Spanish stock market equities indexed on “Investing.com” and “Bolsa de Madrid”.

Investing is the main source of the Python package investpy, as previously explained, and the company information of every equity can be found at https://www.investing.com/ in English (the Spanish version of the Investing website does not have that information indexed).
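To experiment with other equities outside investpy, the request step could be wrapped in small helpers. This is only a sketch under stated assumptions: `build_profile_urls` and `fetch_html` are hypothetical names, the URL patterns are the ones from the sample above, and the timeout and status check are additions that the original sample does not have.

```python
from urllib.parse import quote

INVESTING_BASE = "https://www.investing.com/equities/"
BOLSA_BASE = "http://www.bolsamadrid.es/esp/aspx/Empresas/FichaValor.aspx?ISIN="


def build_profile_urls(tag, isin_code):
    """Return (investing_url, bolsa_url) for one equity."""
    investing_url = INVESTING_BASE + quote(tag) + "-company-profile"
    bolsa_url = BOLSA_BASE + quote(isin_code)
    return investing_url, bolsa_url


def fetch_html(url, headers, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    import requests  # imported here so the URL helper works without requests

    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text


investing_url, bolsa_url = build_profile_urls("bbva", "ES0113211835")
```

Calling `fetch_html(investing_url, headers=head)` with the headers from the sample would then return the page HTML, or raise if the site rejects the request.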

Investing Company Profile HTML DOM Tree

La Bolsa de Madrid is a Spanish website hosted by Bolsas y Mercados Españoles (BME) that contains information about the Spanish market and about the equities of the Spanish continuous stock market, which is the information we need. On this website the company profiles are in Spanish, which covers the gap found when retrieving information from Investing.

Bolsa de Madrid Company Profile HTML DOM Tree

Finding reliable sources of information can sometimes be quite a hard task because many factors are involved, so we need to make sure that the information is up to date and that it fits our needs. “A reliable source is one that provides a thorough, well-reasoned theory, argument, discussion, etc. based on strong evidence.” (Finding Reliable Sources: What is a Reliable Source?, University of Georgia.)

lxml

lxml is a Pythonic, mature binding for the libxml2 and libxslt libraries. It provides safe and convenient access to these libraries using the ElementTree API. It extends the ElementTree API significantly to offer support for XPath, RelaxNG, XML Schema, XSLT, C14N and much more.

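from lxml.html import fromstring

# Investing.com: the profile text lives in the element with this id
investing_root = fromstring(investing_req.text)
investing_path = investing_root.xpath(".//*[@id='profile-fullStory-showhide']")
investing_profile = investing_path[0].text_content() if investing_path else None

# Bolsa de Madrid: the profile text lives in a <td> whose class contains "Perfil"
bolsa_root = fromstring(bolsa_req.text)
bolsa_path = bolsa_root.xpath(".//td[contains(@class, 'Perfil')]")
bolsa_profile = bolsa_path[0].text_content() if bolsa_path else None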

The purpose of using lxml is to extract the company profiles from the HTML DOM tree of both sources, as shown previously. Once the HTML is retrieved and the container of the information is located, the extraction consists of retrieving the text contained between the selected HTML tags for each source.
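The extraction step can be tried offline on a small HTML fragment. In the sketch below, the two sample documents are made up for illustration (including the `TxtPerfil` class name, which is an assumption and not taken from the real page). Only the XPath queries come from the code above, and `extract_profile` is a hypothetical helper.

```python
from lxml.html import fromstring

# Made-up fragments that mimic the containers targeted by the XPath queries
SAMPLE_INVESTING_HTML = """
<html><body>
  <div id="profile-fullStory-showhide">  Sample bank profile text.  </div>
</body></html>
"""

SAMPLE_BOLSA_HTML = """
<html><body><table><tr>
  <td class="TxtPerfil">Texto de ejemplo del perfil.</td>
</tr></table></body></html>
"""


def extract_profile(html, xpath_query):
    """Return the stripped text of the first node matching the query, or None."""
    root = fromstring(html)
    matches = root.xpath(xpath_query)
    if not matches:
        return None
    return matches[0].text_content().strip()


investing_text = extract_profile(
    SAMPLE_INVESTING_HTML, ".//*[@id='profile-fullStory-showhide']"
)
bolsa_text = extract_profile(
    SAMPLE_BOLSA_HTML, ".//td[contains(@class, 'Perfil')]"
)
# A page without the expected container yields None instead of an IndexError
missing = extract_profile(
    "<html><body><p>nothing</p></body></html>", ".//td[contains(@class, 'Perfil')]"
)
```

Returning None when nothing matches keeps a changed page layout from raising an error in the middle of a scraping run.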

Use of investpy

Now that the process has been described, this section explains how to use it through investpy so that anyone can do so. Once you have investpy installed, you can call the company profile retrieval function:

import investpy

bbva_investing = investpy.get_equity_company_profile(equity='bbva', source='Investing')
bbva_bolsa = investpy.get_equity_company_profile(equity='bbva', source='Bolsa de Madrid')

As you can see, we call the function twice to retrieve the BBVA company profile from both available sources, “Investing.com” and “Bolsa de Madrid”, in English and Spanish, respectively. The output of each call is stored in a variable:

Output of investpy Company Profile Retrieval Function for BBVA
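When profiles are needed for several equities and both sources, the two calls can be looped. The following is a sketch and not part of investpy: `collect_profiles` is a hypothetical helper that receives the retrieval function as an argument. It catches any exception because the exceptions raised by investpy 0.8 are not described in this post.

```python
SOURCES = ("Investing", "Bolsa de Madrid")


def collect_profiles(equities, fetch_profile):
    """Build {equity: {source: profile_or_None}} using the given fetch function."""
    profiles = {}
    for equity in equities:
        profiles[equity] = {}
        for source in SOURCES:
            try:
                profiles[equity][source] = fetch_profile(equity=equity, source=source)
            except Exception:
                # Assumption: any failure for one source is recorded as None
                # so the remaining equities and sources are still processed
                profiles[equity][source] = None
    return profiles
```

With investpy 0.8 installed, the call would presumably look like `collect_profiles(['bbva'], investpy.get_equity_company_profile)`, using the same keyword arguments as the calls above.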

Conclusion

As data extraction has already been explained in the previous post, the conclusion of this one focuses on the relevance of company profiles and how we can use them in a later analysis of the different companies of the Spanish stock market.

“Company profiles should go beyond stating facts about a company, well-written company profiles deftly convey the predominant values and corporate culture that lends the organization its distinct character.” — as explained by Terry Ottow in his post in AM:PM MARKETING.

Hence, we conclude that a company profile adds differential value when it comes to classifying companies, so retrieving it can give us important information for further analysis.

Additional Information

If you want an investpy Jupyter Notebook with all the functions and some explanations of their use, or if you need further information or have any questions, feel free to contact me via email at alvarob96@usal.es or via LinkedIn at Alvaro Bartolome del Canto.

Thank you for your support! Stay tuned for more Data Science content!

Álvaro Bartolomé

Machine Learning Engineer. Passionate about CV and NLP, and open-source developer. Also, baller and MMA enjoyer.