Getting Started with Extractor API

The Extractor API extracts clean text, title, author, and other relevant metadata from articles, blogs, press releases, and other long-form pages. To get started, you need only an API key and a target URL. If you don't have an API key, sign up for one of our plans. See the full documentation for a complete overview of the API.

Base URL

Every request uses the following base URL:

https://extractorapi.com/api/v1/extractor/

Parameters

The query string requires only two parameters: apikey and url.

curl "https://extractorapi.com/api/v1/extractor/?apikey=YOUR_API_KEY&url=TARGET_URL"
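Because the target URL is itself passed as a query-string value, it is usually safer to percent-encode it, especially when it carries its own query string. The following is a minimal Python sketch using only the standard library. The helper name build_request_url and the example.com address are hypothetical, and the sketch assumes the API decodes query parameters in the standard way.

```python
from urllib.parse import urlencode

BASE_URL = "https://extractorapi.com/api/v1/extractor/"


def build_request_url(api_key: str, target_url: str) -> str:
    """Return the full request URL with both required parameters encoded.

    Hypothetical helper for illustration; not part of the Extractor API.
    """
    # urlencode percent-encodes reserved characters such as "/", "?" and "&",
    # so a target URL with its own query string stays a single parameter value.
    query = urlencode({"apikey": api_key, "url": target_url})
    return f"{BASE_URL}?{query}"


print(build_request_url("YOUR_API_KEY", "https://example.com/post?id=1&lang=en"))
```

Without encoding, the "&lang=en" part of the target URL would be read as a separate parameter of the API request rather than as part of the url value.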

Example

For example, if your API key were 123456789 and your target URL were nytimes.com/investigative-article, you would structure your request like this:

curl "https://extractorapi.com/api/v1/extractor/?apikey=123456789&url=nytimes.com/investigative-article"

Output

If you entered your API key and URL correctly, you should see a JSON output like this:

			
// OUTPUT

{
   "url": "https://nytimes.com/investigative-article", // Your target URL
   "status": "COMPLETE", // Status of request - will display ERROR if there was an issue crawling the URL
   "domain": "nytimes.com", // The domain associated with the target URL
   "date_published": "2020-03-13T00:00:00Z",
   "images": [
      "nytimes.com/image1.png",
      "nytimes.com/image2.png"
   ],
   "videos": [],
   "title": "Engrossing NY Times Article", // The title of the content in the target URL
   "author": [ // Author candidates
      "S. King",
      "S. King on Twitter"
   ],
   "text": "Gluten-free locavore kale chips." // The relevant article, blog, etc. text, minus boilerplate
}
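A response like the one above can be checked and reduced to the fields you need. The sketch below is a hedged illustration in Python: it assumes the real response body is plain JSON (without the explanatory // comments shown here), and the summarize helper and its output keys are hypothetical names chosen for this example.

```python
import json

# Sample body modeled on the documented output, with the comments removed.
SAMPLE_RESPONSE = """
{
  "url": "https://nytimes.com/investigative-article",
  "status": "COMPLETE",
  "domain": "nytimes.com",
  "date_published": "2020-03-13T00:00:00Z",
  "images": ["nytimes.com/image1.png", "nytimes.com/image2.png"],
  "videos": [],
  "title": "Engrossing NY Times Article",
  "author": ["S. King", "S. King on Twitter"],
  "text": "Gluten-free locavore kale chips."
}
"""


def summarize(raw_json: str) -> dict:
    """Parse a response body and keep a few commonly used fields."""
    data = json.loads(raw_json)
    # The status field reads ERROR when the URL could not be crawled.
    if data.get("status") != "COMPLETE":
        raise ValueError(
            f"Extraction failed for {data.get('url')}: status={data.get('status')}"
        )
    # "author" is a list of candidates, so take the first one if any exist.
    authors = data.get("author") or []
    return {
        "title": data.get("title"),
        "first_author": authors[0] if authors else None,
        "word_count": len(data.get("text", "").split()),
    }


print(summarize(SAMPLE_RESPONSE))
```

Checking status before reading other fields avoids treating a failed crawl as an empty article.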
			

HTML & Raw Text

You can add the fields parameter to specify the fields you'd like to see in your response. These include html and raw_text, which aren't included in responses by default. Note that if you specify the fields parameter, the response contains only the url, status, and text fields plus the fields you chose.

curl "https://extractorapi.com/api/v1/extractor/?apikey=YOUR_API_KEY&url=TARGET_URL&fields=html,raw_text"
			
// HTML AND RAW TEXT

{
   "url": "https://nytimes.com/investigative-article", // Your target URL
   "status": "COMPLETE", // Status of request - will display ERROR if there was an issue crawling the URL
   "text": "Gluten-free locavore kale chips.", // The relevant article, blog, etc. text, minus boilerplate
   "html": "<!DOCTYPE html><html lang='en'><head><title>Title</title></head><body>Text</body></html>",
   "raw_text": "ADVERTISEMENT. Gluten-free locavore kale chips." // Text including boilerplate
}
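The fields parameter is a comma-separated list, so it can be built from a Python list. This is a small sketch under stated assumptions: build_fields_url and expected_keys are hypothetical helper names, and expected_keys simply restates the rule described in this section (url, status, and text are always returned alongside the chosen fields).

```python
from urllib.parse import urlencode

BASE_URL = "https://extractorapi.com/api/v1/extractor/"


def build_fields_url(api_key: str, target_url: str, fields: list[str]) -> str:
    """Return a request URL that asks for specific response fields."""
    params = {"apikey": api_key, "url": target_url, "fields": ",".join(fields)}
    # safe="," keeps the commas literal, matching the curl example above.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


def expected_keys(fields: list[str]) -> set[str]:
    """Keys the response should contain when the fields parameter is set."""
    return {"url", "status", "text"} | set(fields)


print(build_fields_url("YOUR_API_KEY", "https://nytimes.com/investigative-article",
                       ["html", "raw_text"]))
print(sorted(expected_keys(["html", "raw_text"])))
```

Comparing the keys of a parsed response against expected_keys is one way to notice early that a field name was misspelled in the request.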
			

Visual Online Extractor

All plans include the option to extract article text using our visual online tool. Check out the demo below to see how to get started.

Plan Features

Depending on your Extractor API plan, you'll have different request limits per second and per month:

Feature                          Free     Hobby     Professional   Business
Price/month                      $0       $29       $59            $99
Requests/month                   1,000    30,000    100,000        250,000
Requests/second                  1        5         10             15
Visual Online Extractor          Yes      Yes       Yes            Yes
Full HTML and Raw Text Output    Yes      Yes       Yes            Yes
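
To stay within the per-second limit of a plan, a client can space out its requests. The sketch below is one simple approach in Python, not an official client: the Throttle class is a hypothetical helper, and it spaces calls evenly, which is somewhat stricter than the limit requires. It does not track the monthly quota.

```python
import time

# Requests/second per plan, taken from the table above.
REQUESTS_PER_SECOND = {"Free": 1, "Hobby": 5, "Professional": 10, "Business": 15}


class Throttle:
    """Space calls so that at most `rate` of them start per second."""

    def __init__(self, rate: int, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may start

    def wait(self) -> None:
        """Block until the next request is allowed to start."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
        # Guard against a sleep that returns slightly early.
        now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval


throttle = Throttle(REQUESTS_PER_SECOND["Hobby"])
for target in ["example.com/a", "example.com/b", "example.com/c"]:
    throttle.wait()
    print("would request", target)  # replace with the actual HTTP call
```

The clock and sleep arguments are injectable so the pacing logic can be exercised without real delays.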