What are some recommended tools for data workflow visualization?

  • I have a database with hundreds of tables, some of which contain in excess of 100 million rows. I can draw an entity relationship diagram (ERD), but the end product is almost unusable because of the number of foreign key relationships and the size of the image.

    I have a second challenge: documenting the workflow of data in the system. Most of the records in each of the tables are versioned and audited, and have a lifecycle/status. I can take a developer and, for a single record, trace on the ERD where all of that record's data is stored and how it changes over time. What I would like is an automated tool that can manage both the ERD and the workflows, give me a report of potential data vulnerabilities, and suggest tests. I realize that this is probably not a realistic request. Instead, I would like a tool that can generate diagrams resembling a London Tube map: something that can take slices of both the ERD and the data workflow and present them visually in a pleasing way. The workflows layered on the database are subject to continual change and enhancement, and there are numerous decision/branch points, although all of them are made up of the same set of states. Being able to depict chain of custody (and all of the actors along the way, including splitting actions) is a big plus.

    I've attempted a number of ways to use tools such as Visio, and they "just don't cut it" once I get beyond a few simple steps. When I see recommendations like http://flowingdata.com/2008/10/20/40-essential-tools-and-resources-to-visualize-data/ I just want to scream, because if I had the time I would code a tool myself. "Try Python" or "Try R" is not a helpful suggestion; I'm looking for off-the-shelf tools for design engineering, with a minimum of programming needed.

    Update: here's the sort of thing I want to do. Say you were a Tube engineer, and 10,000 people just went through the turnstiles at Mornington Crescent. Given the existing ridership data (from millions of on/off e-ticket transactions), how could you graphically represent the top ten historical outcomes as a use case "on the fly", and give recommendations for additional trains or cars to cover particular lines? Could you separate the people going to a football match from the people going to work in Docklands? Could you trend new use cases on the fly?

  • Answer:

    In terms of "Big Data", when you define a workflow for MapReduce applications, it takes only one additional line of code to generate a DOT file (see http://en.wikipedia.org/wiki/DOT_language for the language definition) which can be read in OmniGraffle, Visio, etc. That is quite literally a visualization of the job DAG. In other words, it shows the "physical plan" of your queries: the set operations and other transforms applied to your tuple streams.
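    To make that concrete, here is a minimal Python sketch of emitting such a DOT file by hand; the stage names and output file name are hypothetical, invented only for illustration:

        # Hypothetical job DAG: each stage maps to the stages it feeds.
        dag = {
            "load_orders":         ["join_customers"],
            "load_customers":      ["join_customers"],
            "join_customers":      ["aggregate_by_region"],
            "aggregate_by_region": ["write_report"],
            "write_report":        [],
        }

        # Emit the DAG in the DOT language so Graphviz, OmniGraffle,
        # or Visio can render it.
        with open("job_dag.dot", "w") as f:
            f.write("digraph job_dag {\n")
            f.write("  rankdir=LR;\n")        # lay the pipeline out left to right
            f.write("  node [shape=box];\n")
            for stage, downstream in dag.items():
                for target in downstream:
                    f.write(f'  "{stage}" -> "{target}";\n')
            f.write("}\n")

    With Graphviz installed, dot -Tpng job_dag.dot -o job_dag.png renders the diagram, or the .dot file can be opened directly in OmniGraffle.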

Paco Nathan at Quora

Other answers

I'm not quite sure I understand what you are asking, but as far as data visualization tools that include workflow go, two come to mind. SAS Enterprise Miner (http://www.sas.com/technologies/analytics/datamining/miner/): if you already have SAS in your shop, this is a good way to go. Alpine Miner (http://www.alpinedatalabs.com/product/index.html): I'm not sure which databases this works with, but there is a free trial version.

Robert Eckhardt

Step 1 is to know what questions you need answered. Step 2 is to lay the data into an information model, as opposed to a traditional data model; in data modeling one collects data, but not necessarily in a form tuned for information. Step 3 is to establish the value of the information, as there are tools that can deliver 200 to 300 million records in under 10 seconds and others that can deliver 40 to 50 billion records in under 10 seconds. "No wind is a good wind if the captain does not know their destination." So, back to step one.

"Big data" itself is a big misnomer. If you are the NSA, then let's talk yottabytes. If you are Facebook or Google, then let's talk exabytes. If you are a global enterprise, then let's talk 20 to 50 billion records, or tens of terabytes. 100 million records would not qualify as big data in most cases.

Hari Guleria

This data is structured relational data, so it is comparatively easy. You can use the Google Fusion Tables API or Google Charts to visualize it without coding. If you need more complicated reports, both products have APIs; a sketch of the Google Charts route follows.
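As an illustration of that route, here is a small Python sketch that writes a self-contained HTML page using the Google Charts loader; the lifecycle statuses, record counts, and file name are made up for the example, not taken from the question:

    # Hypothetical record counts by lifecycle status.
    rows = [
        ["Status", "Record count"],   # column headers
        ["Draft", 1200],
        ["Under review", 340],
        ["Approved", 5100],
        ["Archived", 860],
    ]

    # Python's repr of a list of lists is valid JavaScript, so it can be
    # interpolated straight into the page.
    html = f"""<!DOCTYPE html>
    <html>
    <head>
      <script src="https://www.gstatic.com/charts/loader.js"></script>
      <script>
        google.charts.load('current', {{packages: ['corechart']}});
        google.charts.setOnLoadCallback(drawChart);
        function drawChart() {{
          var data = google.visualization.arrayToDataTable({rows!r});
          var chart = new google.visualization.PieChart(
              document.getElementById('chart_div'));
          chart.draw(data, {{title: 'Records by lifecycle status'}});
        }}
      </script>
    </head>
    <body><div id="chart_div" style="width: 640px; height: 400px;"></div></body>
    </html>
    """

    with open("chart.html", "w") as f:
        f.write(html)

Opening chart.html in a browser draws the chart; no server or build step is needed.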

Pradeep Pujari
