• Marek Vavrovic

Power Query - Extracting web data using URL parts - Building functions to fetch web data.

Updated: Oct 22, 2021

Web source: Database index (planecrashinfo.com)

Goal is to extract the accident details for each year and date. That means get a access to level 3 data.

Step 1

Bring the Accident Details page in.

Step 2


Go to: Transform > Pivot > Advanced options > Don't Aggregate

When data has been pivoted you can decide which columns (15) you want to keep or remove. You can also do data cleaning transformations. I will not demonstrate that, my goal is just focus on extracting the data.

Step 3

Get years.

URL : http://www.planecrashinfo.com/database.htm

Unpivot Columns

Keep just column with years, remove bottom rows (8 empty rows)

By now I have the access to level 1 data. Next task is to get access to level 2. Through level 2 I will be able to reach on data in level 3.

URL : http://www.planecrashinfo.com/1920/1920.htm

Step 4

Get Level 2 data. Remove all the columns and keep just one, dates. I will use these data to build a function to unlock data in level 3.

Step 5

After I have removed all the other columns I transform this dataset into a function.

Step 6



Source = Web.Page(Web.Contents("http://www.planecrashinfo.com/"&Year&"/"&Year&".htm")),

As the URL is a text, Year variable must be a text too.

Step 7

In step 4 I extracted LEVEL 2 data. I will return them back here using tblHelp function.

Step 8

For year 1920 there are 51 date records in that table. I will add an index to each date.

Step 9

We add an Index to the Date column.

=>Add custom column


Expand IndexingCustomColumn column.

Remove unnecessary columns.

Now we have this key to get to access to Level 3 data, AccidentDetails.

Step 10

I convert this dataset into a function.

(year, index)=>


Source = Web.Page(Web.Contents("http://www.planecrashinfo.com/"&year&"/"&year&"-"&index&".htm"))

This function needs 2 arguments, which will be supplied form AirAcciddents table.

Step 11

Add Custom column, use AccidentDetails function.

Expand the column names from the FinalTable and bring them in.

Now I have all the accident details in one table.

Conclusion: It takes 5 hours to Power BI Desktop to load all the rows. MS Excel can load these data in 1 hour.

41 views0 comments

Recent Posts

See All