Version: 2.3.0-beta

Neo4j

Neo4j source connector

Description

Read data from Neo4j.

Neo4j Connector for Apache Spark allows you to read data from Neo4j in 3 different ways: by node labels, by relationship name, and by direct Cypher query.

The Options required of yes* means that you must specify one way of (query labels relationship)

for detail neo4j config message please visit neo4j doc

tip

Engine Supported and plugin name

Spark: Neo4j
Flink

Options

name	type	required	default value
result_table_name	string	yes	-
authentication.type	string	no	-
authentication.basic.username	string	no	-
authentication.basic.password	string	no	-
url	string	yes	-
query	string	yes*	-
labels	string	yes*	-
relationship	string	yes*	-
schema.flatten.limit	string	no	-
schema.strategy	string	no	-
pushdown.filters.enabled	string	no	-
pushdown.columns.enabled	string	no	-
partitions	string	no	-
query.count	string	no	-
relationship.nodes.map	string	no	-
relationship.source.labels	string	Yes	-
relationship.target.labels	string	Yes	-

result.table.name [string]

result table name

authentication.type [string]

authentication type

authentication.basic.username [string]

username

authentication.basic.password [string]

password

url [string]

url

query [string]

Cypher query to read the data.You must specify one option from [query, labels OR relationship]

labels [string]

List of node labels separated by : The first label will be the primary label. You must specify one option from [query, labels OR relationship]

relationship [string]

Name of a relationship. You must specify one option from [query, labels OR relationship]

schema.flatten.limit [string]

Number of records to be used to create the Schema (only if APOC are not installed)

schema.strategy [string]

Strategy used by the connector in order to compute the Schema definition for the Dataset. Possibile values are string, sample. When string it coerces all the properties to String otherwise it will try to sample the Neo4j’s dataset.

pushdown.filters.enabled [string]

Enable or disable the Push Down Filters support

pushdown.columns.enabled [string]

Enable or disable the Push Down Column support

partitions [string]

This defines the parallelization level while pulling data from Neo4j.

Note: as more parallelization does not mean more performances so please tune wisely in according to your Neo4j installation.

query.count [string]

Query count, used only in combination with query option, it’s a query that returns a count field like the following:

MATCH (p:Person)-[r:BOUGHT]->(pr:Product) WHERE pr.name = 'An Awesome Product' RETURN count(p) AS count or a simple number that represents the amount of records returned by query. Consider that the number passed by this value represent the volume of the data pulled of Neo4j, so please use it carefully.

relationship.nodes.map [string]

If true return source and target nodes as Map<String, String>, otherwise we flatten the properties by returning every single node property as column prefixed by source or target

relationship.source.labels [string]

List of source node Labels separated by :

relationship.target.labels [string]

List of target node Labels separated by :

Example

   Neo4j {
      result_table_name = "test"
      authentication.type = "basic"
      authentication.basic.username = "test"
      authentication.basic.password = "test"
      url = "bolt://localhost:7687"
      labels = "Person"
      #query = "MATCH (n1)-[r]->(n2) RETURN r, n1, n2 "
   }

The returned table is a data table in which both fields are strings

`<id>`	`<labels>`	name	born
1	[Person]	Keanu Reeves	1964
2	[Person]	Carrie-Anne Moss	1967
3	[Person]	Laurence Fishburne	1961
4	[Person]	Hugo Weaving	1960
5	[Person]	Andy Wachowski	1967
6	[Person]	Lana Wachowski	1965
7	[Person]	Joel Silver	1952
8	[Person]	Emil Eifrem	1978

Neo4j

Description​

Options​

result.table.name [string]​

authentication.type [string]​

authentication.basic.username [string]​

authentication.basic.password [string]​

url [string]​

query [string]​

labels [string]​

relationship [string]​

schema.flatten.limit [string]​

schema.strategy [string]​

pushdown.filters.enabled [string]​

pushdown.columns.enabled [string]​

partitions [string]​

query.count [string]​

relationship.nodes.map [string]​

relationship.source.labels [string]​

relationship.target.labels [string]​

Example​