investpy: a Python package for historical data extraction from the Spanish stock market

Álvaro Bartolomé
Mar 14, 2019

Introduction

I decided to create investpy because of the needs of my Final Degree Project (TFG, in Spanish) in Computer Engineering at the University of Salamanca (USAL), titled “Machine Learning for stock investment recommendation systems”. I also found that there were no Python packages for historical data extraction from the Spanish stock market, so I thought it would be useful to publish my work for everyone to use.

es.Investing.com is the source where the data is extracted from

investpy is a Python package for extracting historical data on equities, funds and ETFs from the Spanish continuous market. It relies on web scraping and HTML parsing to retrieve the information from a consistent and reliable source (in this case, es.Investing.com) and then turns that data into something useful: a pandas DataFrame or a JSON object.
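To give a rough idea of what "HTML parsing, then a JSON object" means in practice, here is a minimal sketch. It is not investpy's actual implementation: it uses the standard library's html.parser instead of requests and lxml so that it runs offline, and the SAMPLE_HTML markup, the column names and the values are all made up for illustration.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup with made-up values: a stand-in for a
# historical-data table, not the real es.Investing.com page.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Close</th></tr>
  <tr><td>02/01/2019</td><td>4.70</td></tr>
  <tr><td>03/01/2019</td><td>4.65</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])       # a new row starts
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")   # a new, still empty, cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def parse_table(html):
    """Return one dict per body row, keyed by the header cells."""
    parser = TableParser()
    parser.feed(html)
    header, body = parser.rows[0], parser.rows[1:]
    return [dict(zip(header, row)) for row in body]


records = parse_table(SAMPLE_HTML)
as_json = json.dumps(records)  # the same rows serialized as a JSON string
print(as_json)
```

In a real scraper the HTML would come from an HTTP response rather than a string constant, and the cell values would still need converting to dates and numbers.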

Requirements

investpy currently supports Python 2.7, 3.6 and 3.7, tested using Travis CI. So you just have to check which version of Python you are using and whether you have the latest versions of the following dependencies installed (if not, just install the package from PyPI as explained below):

  • pandas: pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.
  • requests: requests allows you to send organic, grass-fed HTTP/1.1 requests, without the need for manual labor. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic, thanks to urllib3.
  • lxml: lxml is a Pythonic, mature binding for the libxml2 and libxslt libraries. It provides safe and convenient access to these libraries using the ElementTree API. It extends the ElementTree API significantly to offer support for XPath, RelaxNG, XML Schema, XSLT, C14N and much more.
  • beautifulsoup4: Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
  • pytest: the pytest framework makes it easy to write small tests, yet scales to support complex functional testing for applications and libraries.
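If you want to check your Python version and which of these dependencies are already importable, a small sketch like the following can help. It assumes Python 3.4 or later (importlib.util.find_spec is not available on 2.7), and the helper name missing_dependencies is my own, not part of investpy.

```python
import importlib.util
import sys

# PyPI name -> import name. Note that the beautifulsoup4
# distribution is imported as "bs4".
DEPENDENCIES = {
    "pandas": "pandas",
    "requests": "requests",
    "lxml": "lxml",
    "beautifulsoup4": "bs4",
    "pytest": "pytest",
}


def missing_dependencies(dependencies=DEPENDENCIES):
    """Return the PyPI names whose import name cannot be found."""
    return [
        pypi_name
        for pypi_name, import_name in dependencies.items()
        if importlib.util.find_spec(import_name) is None
    ]


print("Python version:", ".".join(map(str, sys.version_info[:3])))
print("Missing packages:", missing_dependencies() or "none")
```

This only tells you whether a package can be imported, not whether it is the latest version.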

Installation

As investpy is a Python package, you can install it from PyPI (Python Package Index) with pip as follows:

pip install investpy --upgrade

If you don’t have pip installed for your Python version, you can check this post to install and configure it; after that you can install investpy.

Use

Currently there are just two ways to retrieve data with this scraper:

Retrieve the recent data of an equity/fund/ETF: it retrieves the historical data of an equity/fund/ETF from the last month. The function first checks that the given equity/fund/ETF name is valid and then retrieves the data. It has some optional parameters:

  • as_json is False by default; if True, the output of the function is a JSON object instead of a pandas.DataFrame.
  • order is ‘ascending’ by default, which sorts the historical data in the pandas.DataFrame from oldest to newest; use ‘descending’ for the opposite order.
import investpy

equities_ = investpy.get_recent_data(equity='bbva', as_json=False, order='ascending')
funds_ = investpy.get_fund_recent_data(fund='bbva multiactivo conservador pp', as_json=False, order='ascending')
etfs_ = investpy.get_etf_recent_data(etf='bbva accion dj eurostoxx 50', as_json=False, order='ascending')
Sample recent historical data retrieval for an equity, a fund and an ETF from the Spanish continuous market
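To make the meaning of the order parameter concrete, here is a small standalone sketch that sorts rows the same way. It does not call investpy: the rows, the column names and the sort_rows helper are made up for illustration, and I am assuming dates written as dd/mm/YYYY.

```python
from datetime import datetime

# Made-up rows standing in for retrieved data (dates as dd/mm/YYYY).
rows = [
    {"Date": "03/01/2019", "Close": 4.65},
    {"Date": "02/01/2019", "Close": 4.70},
    {"Date": "04/01/2019", "Close": 4.80},
]


def sort_rows(rows, order="ascending"):
    """Sort rows by date: oldest first for 'ascending', newest first otherwise."""
    if order not in ("ascending", "descending"):
        raise ValueError("order must be 'ascending' or 'descending'")
    return sorted(
        rows,
        key=lambda row: datetime.strptime(row["Date"], "%d/%m/%Y"),
        reverse=(order == "descending"),
    )


print([row["Date"] for row in sort_rows(rows, "ascending")])
print([row["Date"] for row in sort_rows(rows, "descending")])
```

Sorting on the parsed date rather than on the raw string matters here, because dd/mm/YYYY strings do not sort chronologically as plain text.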

Retrieve the historical data of an equity/fund/ETF for a specific range of time: it retrieves the historical data of an equity/fund/ETF between a start and an end date, both specified in dd/mm/YYYY format. This function also checks that the given equity/fund/ETF name is valid and then retrieves the data. It has some optional parameters:

  • as_json is False by default; if True, the output of the function is a JSON object instead of a pandas.DataFrame.
  • order is ‘ascending’ by default, which sorts the historical data in the pandas.DataFrame from oldest to newest; use ‘descending’ for the opposite order.
import investpy

equities_ = investpy.get_historical_data(equity='bbva', start='01/01/2015', end='01/01/2019', as_json=False, order='ascending')
funds_ = investpy.get_fund_historical_data(fund='bbva multiactivo conservador pp', start='01/01/2015', end='01/01/2019', as_json=False, order='ascending')
etfs_ = investpy.get_etf_historical_data(etf='bbva accion dj eurostoxx 50', start='01/01/2015', end='01/01/2019', as_json=False, order='ascending')
Sample historical data retrieval between two dates for an equity, a fund and an ETF from the Spanish continuous market
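Since the dates must be passed as dd/mm/YYYY strings, it can be handy to validate them before hitting the network. The sketch below is one possible way to do it. Both helper names are hypothetical and not part of investpy, and the investpy call inside fetch_equity_history simply mirrors the one shown above, so it may need adjusting for other versions of the package.

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # dd/mm/YYYY, as expected for start and end


def validate_date_range(start, end):
    """Parse both dates and check that start is not later than end."""
    try:
        start_date = datetime.strptime(start, DATE_FORMAT).date()
        end_date = datetime.strptime(end, DATE_FORMAT).date()
    except ValueError:
        raise ValueError("dates must use the dd/mm/YYYY format")
    if start_date > end_date:
        raise ValueError("start must not be later than end")
    return start_date, end_date


def fetch_equity_history(equity, start, end):
    """Hypothetical wrapper: validate the range, then call investpy."""
    validate_date_range(start, end)
    # Imported here so the validation above works without investpy installed;
    # the call itself needs investpy and an active Internet connection.
    import investpy
    return investpy.get_historical_data(
        equity=equity, start=start, end=end, as_json=False, order="ascending"
    )


print(validate_date_range("01/01/2015", "01/01/2019"))
```

With this in place, a call such as fetch_equity_history('bbva', '2015-01-01', '2019-01-01') fails immediately with a clear message instead of sending a malformed request.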

You can check all the equities, funds and ETFs you can retrieve data from on Investing.com.

(NOTE: you will need an active Internet connection in order to get the scraper working.)

Additional Information

The package is currently in a development version, so if you run into any problems, please open an issue so I can fix them as fast as possible. Also, any new ideas or proposals are welcome, and I will gladly implement them if they are useful for the package.

For further information or any questions, feel free to contact me via email at alvarob96@usal.es or via LinkedIn at https://www.linkedin.com/in/abartt/

Disclaimer

This Python package has been made for research purposes in order to cover a need that Investing.com does not cover, so it works like an API for Investing.com, developed in an altruistic way. I am not related in any way to Investing.com or any similar company, so I contacted Investing.com via email and they gave me permission to develop this scraper on the condition that I mention the source the data is retrieved from.

To clear up any doubt about whether this is legal, here is literally what Enrique from Investing.com Support replied when I asked for permission to develop this scraper: “[…] thank you for contacting and choosing us (as the reliable source to get the data from) […] you can use and retrieve all the data that Investing.com offers to the users as far as you specify which is the source you get the data from […]”.

Stay tuned for more Data Science content and some tutorials on how to use investpy data in order to make predictions or analysis of stocks!

Álvaro Bartolomé

Machine Learning Engineer. Passionate about CV and NLP, and open-source developer. Also, baller and MMA enjoyer.