According to the presenter, graph databases are:
A type of No-SQL database that supports connections, graphs.
More change resilient than traditional databases.
Performance is better than relational as database size increases. Relational slows down around 100 million rows.
Easily handles complexity as compared to complex joins in relational databases.
Graph database work well in the following contexts:
Social – twitter
Interest – user interests
Consumption – user consumption of products
Intent – users, consumptions, interests
Graphs are best for complex associations. Not good for data that is not related. Getting better at aggregation of data but relational databases are better at aggregations. Relational databases have a common SQL. NeoJ has Open Cypher as a common query language.
MATCH (c:CONTENT)-[R:HAS]->(T:TAG:NAME {username: “greg”}))
MATCH (u:User)
return u
Labels are like tags. Properties are like fields. You can remove a property for one specific record.
Batch import tool
LOAD CSV, loads thousands of records.
Install directory:
/Users/alanbeal/CodeMash2019/GraphDatabases/up-and-running-with-graph-databases-software-MacOS/neo4j/
If windows, stop neo4j and delete the old graph.db, then cd into bin
.\neo4j-admin import –nodes csvs/nodes/Category.csv –nodes csvs/nodes/Product.csv –nodes csvs/nodes/Review.csv –nodes csvs/nodes/User.csv –relationships csvs/rels/CC.csv –relationships csvs/rels/PC.csv –relationships csvs/rels/PP.csv –relationships csvs/rels/PR.csv –relationships csvs/rels/RC.csv –ignore-missing-nodes=true
***************
IF Mac
Stop Neo4j. bin/neo4j stop
Delete the graph.db file under database
./import-builder-neo4j.sh “/Users/gregjordan/dev/neo4j/code-mash-precompiler/neo4j-enterprise-3.4.9/bin/csvs”
Start Neo4j, bin/neo4j start’
***************
// run these one at a time
CREATE INDEX ON :Category(categoryId)
CREATE INDEX ON :Category(categoryName)
CREATE INDEX ON :Location(locationId)
CREATE INDEX ON :Location(zip)
CREATE INDEX ON :Purchase(purchaseId)
CREATE INDEX ON :Product(productId)
CREATE INDEX ON :Status(statusId)
CREATE INDEX ON :User(userId)
CREATE INDEX ON :User(username)
:schema
***************
then….
***************
// run these one at a time
return row
LIMIT 5
***************
CREATE (:Location { locationId: row.locationId,business:row.business, city: row.city, county: row.county, usState: row.usState, zip: row.zip, latitude: toFloat(row.lat), longitude: toFloat(row.lon)})
***************
CREATE (u:User { userId:row.userId, username:row.username,firstname:row.firstname,lastname:row.lastname })
***************
CREATE (p:Purchase { purchaseId:row.purchaseId })
***************
MATCH (u:User { userId: row.userId } ),(p:Purchase { purchaseId: row.purchaseId } )
MERGE (u)-[:MADE_PURCHASE]->(p)
***************
MATCH (p:Purchase { purchaseId: row.purchaseId } ), (product:Product { productId: row.productId } )
MERGE (p)-[:PURCHASE_CONTAINS]->(product)
***************
MATCH p=(u:User {userId: “1”})-[:MADE_PURCHASE]->(pro)-[r:PURCHASE_CONTAINS]->() RETURN p LIMIT 2
***************
/// OH NO! forgot to add the timestamp field for when the purchase occured
MATCH (p:Purchase { purchaseId: row.purchaseId } )
SET p.timestamp=toInt(row.purchaseDateAsLong)
***************
CREATE (s:Status { statusId:row.statusId, userIdStatusId:row.userIdStatusId, statusText:row.statusText, timestamp:toInt(row.timestamp) })
***************
CREATE (s:Status { statusId:row.statusId, userIdStatusId:row.userIdStatusId, statusText:row.statusText, timestamp:toInt(row.timestamp) })
***************
CREATE (s:Status { statusId:row.statusId, userIdStatusId:row.userIdStatusId, statusText:row.statusText, timestamp:toInt(row.timestamp) })
***************
MATCH (u:User { userId: row.userId } ), (s:Status { statusId:row.statusId} )
MERGE (u)-[:CURRENT_STATUS]->(s)
***************
MATCH (s:Status { statusId:row.lastStatusId} ), (ns:Status { statusId:row.statusId} )
MERGE (s)-[:NEXT_STATUS]->(ns)
***************
MATCH (s:Status { statusId:row.nextToLastStatusId} ), (ns:Status { statusId:row.statusId} )
MERGE (s)-[:NEXT_STATUS]->(ns)
***************
USING PERIODIC COMMIT 10000
MATCH (user:User { userId: row.userId } ), (userToFollow:User { userId: row.followingid } )
MERGE (user)-[:FOLLOWS]->(userToFollow)
***************
MATCH (u:User {userId: “1” })
WITH u
MATCH (u)-[:FOLLOWS*0..1]->(f)
RETURN f
***************
MATCH (u:User {userId: “1” })
WITH u
MATCH (u)-[:FOLLOWS*0..1]->(f)
WITH DISTINCT f,u
MATCH (f)-[:CURRENT_STATUS]->(ls)-[:NEXT_STATUS*0..3]->(s)
RETURN s.statusId as statusId, s.statusText as statusText, s.timestamp as timestamp, f.username as username, f=u as owner
ORDER BY s.timestamp DESC
***************
// run the match to location->products
match (u:Location),(p:Product)
with u,p
limit 5000000
where rand() < 0.1
create (u)-[:LOCATION_HAS_PRODUCT]->(p);
***************
MATCH (l:Location)
WHERE (l)-[:LOCATION_HAS_PRODUCT]->()
RETURN l.zip
***************
// add a point layer
CALL spatial.addPointLayer(‘geom’)
***************
// index the locations
MATCH (l:Location) WITH collect(l) as locedafterdark
CALL spatial.addNodes(‘geom’, locedafterdark)
YIELD count
RETURN count
***************
///checked that it works
CALL spatial.withinDistance(“geom”,{latitude:35.1531,longitude:-90.0555},5.0)
YIELD node as location
RETURN location
***************