Version: 2.3.0-beta

Neo4j

Neo4j source connector

Description

Read data from Neo4j.

The Neo4j Connector for Apache Spark lets you read data from Neo4j in three ways: by node labels, by relationship name, or by a direct Cypher query.

Options marked yes* in the table below are alternatives: you must specify one of query, labels, or relationship.

For more details on the Neo4j configuration options, see the Neo4j documentation.

Tip

Supported engines and plugin name:

  • Spark: Neo4j
  • Flink: not supported

Options

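| name | type | required | default value |
|---|---|---|---|
| result_table_name | string | yes | - |
| authentication.type | string | no | - |
| authentication.basic.username | string | no | - |
| authentication.basic.password | string | no | - |
| url | string | yes | - |
| query | string | yes* | - |
| labels | string | yes* | - |
| relationship | string | yes* | - |
| schema.flatten.limit | string | no | - |
| schema.strategy | string | no | - |
| pushdown.filters.enabled | string | no | - |
| pushdown.columns.enabled | string | no | - |
| partitions | string | no | - |
| query.count | string | no | - |
| relationship.nodes.map | string | no | - |
| relationship.source.labels | string | yes (with relationship) | - |
| relationship.target.labels | string | yes (with relationship) | - |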

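result_table_name [string]

Name of the result table that the data read from Neo4j is registered as.

authentication.type [string]

Authentication type, for example basic.

authentication.basic.username [string]

Username used for basic authentication.

authentication.basic.password [string]

Password used for basic authentication.

url [string]

URL of the Neo4j server, for example bolt://localhost:7687.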

query [string]

Cypher query used to read the data. You must specify one of query, labels, or relationship.
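A minimal sketch of a query-mode read, assuming the same connection settings as the Example section at the end of this page. The Cypher statement is adapted from the query.count description; the result table name and the returned column aliases are illustrative.

```hocon
Neo4j {
    result_table_name = "test"
    authentication.type = "basic"
    authentication.basic.username = "test"
    authentication.basic.password = "test"
    url = "bolt://localhost:7687"
    # Read with a Cypher query instead of labels or relationship
    query = "MATCH (p:Person)-[r:BOUGHT]->(pr:Product) RETURN p.name AS name, pr.name AS product"
}
```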

labels [string]

List of node labels separated by ":". The first label is the primary label. You must specify one of query, labels, or relationship.
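For instance, to read nodes that carry both the Person label and a hypothetical Customer label, with Person as the primary label, the option might look like this. The result table name is illustrative, and the authentication options from the Example section are omitted for brevity.

```hocon
Neo4j {
    result_table_name = "customers"
    url = "bolt://localhost:7687"
    # authentication.* options omitted; see the Example section
    # Labels are separated by ":"; Person is the primary label
    labels = "Person:Customer"
}
```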

relationship [string]

Name of a relationship type. You must specify one of query, labels, or relationship.
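A relationship-mode read also needs relationship.source.labels and relationship.target.labels (see the Options table). The sketch below reuses the BOUGHT, Person, and Product names from the query.count description. It keeps the unquoted dotted keys used in the Example section, which assumes SeaTunnel's config parser treats a name such as relationship.source.labels as a literal option key rather than nesting it under relationship.

```hocon
Neo4j {
    result_table_name = "purchases"
    url = "bolt://localhost:7687"
    # authentication.* options omitted; see the Example section
    # Read BOUGHT relationships from Person nodes to Product nodes
    relationship = "BOUGHT"
    relationship.source.labels = "Person"
    relationship.target.labels = "Product"
}
```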

schema.flatten.limit [string]

Number of records used to build the schema (only applies if APOC is not installed).

schema.strategy [string]

Strategy the connector uses to compute the schema of the Dataset. Possible values are string and sample. With string, all properties are coerced to String; with sample, the connector samples the Neo4j dataset to infer the schema.
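A sketch that sets both schema options for a labels read, with the authentication options from the Example section omitted. The limit of 100 records is an arbitrary illustrative value, and per its description schema.flatten.limit only matters when APOC is not installed.

```hocon
Neo4j {
    result_table_name = "test"
    url = "bolt://localhost:7687"
    # authentication.* options omitted; see the Example section
    labels = "Person"
    # Infer property types by sampling instead of coercing everything to String
    schema.strategy = "sample"
    # Number of records used to build the schema (only relevant without APOC)
    schema.flatten.limit = "100"
}
```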

pushdown.filters.enabled [string]

Enables or disables push-down filter support.

pushdown.columns.enabled [string]

Enables or disables push-down column support.
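To turn both push-down optimizations off, for example while troubleshooting, the options might be set as below. The "true"/"false" values are an assumption based on the Neo4j Connector for Apache Spark, where these are boolean flags; this page only lists them as strings.

```hocon
Neo4j {
    result_table_name = "test"
    url = "bolt://localhost:7687"
    # authentication.* options omitted; see the Example section
    labels = "Person"
    # Assumed boolean-style values: do not push filters or column selection down to Neo4j
    pushdown.filters.enabled = "false"
    pushdown.columns.enabled = "false"
}
```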

partitions [string]

Parallelism level used when pulling data from Neo4j.

Note: more parallelism does not necessarily mean better performance, so tune this value for your Neo4j installation.
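A sketch that splits a labels read into four partitions. The value 4 is illustrative, not a recommendation; as noted above, the right setting depends on your Neo4j installation.

```hocon
Neo4j {
    result_table_name = "test"
    url = "bolt://localhost:7687"
    # authentication.* options omitted; see the Example section
    labels = "Person"
    # Pull data from Neo4j in 4 parallel partitions (illustrative value)
    partitions = "4"
}
```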

query.count [string]

Query count, used only in combination with the query option. It is either a query that returns a count field, like the following:

    MATCH (p:Person)-[r:BOUGHT]->(pr:Product) WHERE pr.name = 'An Awesome Product' RETURN count(p) AS count

or a plain number that represents the number of records returned by query. Note that this value determines the volume of data pulled from Neo4j, so set it carefully.
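A sketch pairing query with query.count, where the count query is the one above and the main query returns the same matched pattern. The unquoted query / query.count pair assumes SeaTunnel treats dotted option names as literal keys rather than nesting query.count under query.

```hocon
Neo4j {
    result_table_name = "buyers"
    url = "bolt://localhost:7687"
    # authentication.* options omitted; see the Example section
    query = "MATCH (p:Person)-[r:BOUGHT]->(pr:Product) WHERE pr.name = 'An Awesome Product' RETURN p.name AS name"
    # Count query over the same pattern; a plain number is also accepted
    query.count = "MATCH (p:Person)-[r:BOUGHT]->(pr:Product) WHERE pr.name = 'An Awesome Product' RETURN count(p) AS count"
}
```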

relationship.nodes.map [string]

If true, the source and target nodes are returned as Map<String, String>; otherwise their properties are flattened, with each node property returned as a separate column prefixed by source or target.
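To get the source and target nodes back as maps instead of flattened, prefixed columns, relationship.nodes.map can be enabled in a relationship read. The "true" value format follows the option description; the relationship and label names are the illustrative ones from the query.count description, and the result table name is hypothetical.

```hocon
Neo4j {
    result_table_name = "purchases"
    url = "bolt://localhost:7687"
    # authentication.* options omitted; see the Example section
    relationship = "BOUGHT"
    relationship.source.labels = "Person"
    relationship.target.labels = "Product"
    # Return source and target nodes as Map<String, String> columns
    relationship.nodes.map = "true"
}
```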

relationship.source.labels [string]

List of source node labels separated by ":". Required when reading by relationship.

relationship.target.labels [string]

List of target node labels separated by ":". Required when reading by relationship.

Example

    Neo4j {
        result_table_name = "test"
        authentication.type = "basic"
        authentication.basic.username = "test"
        authentication.basic.password = "test"
        url = "bolt://localhost:7687"
        labels = "Person"
        # Alternative: read with a Cypher query instead of labels (specify only one of them)
        # query = "MATCH (n1)-[r]->(n2) RETURN r, n1, n2"
    }

The returned table is shown below; both the name and born fields are strings.

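| <id> | <labels> | name | born |
|---|---|---|---|
| 1 | [Person] | Keanu Reeves | 1964 |
| 2 | [Person] | Carrie-Anne Moss | 1967 |
| 3 | [Person] | Laurence Fishburne | 1961 |
| 4 | [Person] | Hugo Weaving | 1960 |
| 5 | [Person] | Andy Wachowski | 1967 |
| 6 | [Person] | Lana Wachowski | 1965 |
| 7 | [Person] | Joel Silver | 1952 |
| 8 | [Person] | Emil Eifrem | 1978 |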