Rogan DP

1. Usage
- 1.1. Data files
2. Task list
3. Wish list
4. Bug to report upstream

1 Usage

1.1 Data files

Headers need to be as follows

GI50 (cell, gi50)
- Cell line
- Drug (e.g., Cisplatin)
Relations (geneL, relation, geneR)
- GeneLID
- RelationID
- GeneRID
Expression (gene, cell, expression)
- GeneID
- Cell lines (e.g., 184A1, 184B5, 600MPE, AU565, etc.)
Copy (gene, cell, copy)
- GeneID
- Cell lines (e.g., 184A1, 184B5, 600MPE, AU565, etc.)

2 Task list

2.1 DONE Input/output system `[5/5]`

2.1.1 DONE Input long format tables

-- | Pull specified columns and drop the rest.

only :: forall a rs.
     -> Rec (VConst a) rs                         -- ^ Header labels
     -> NonEmpty (NonEmpty a)                     -- ^ Input table to select columns from
     -> NonEmpty (Rec (VConst a) rs)              -- ^ Selected columns from input sheet
only targets ((NonEmpty.ToList -> header) :| body) =
    header :& ( body & Lens.mapped &~ Lens.view Lens._2 ( split indicies ) )
  where
    indicies :: IntMap IntSet                         -- Indicies of targets input
    indicies = occurs group targets header

2.1.2 DONE Input short format tables

-- | Pull specified columns and put the rest into a variable and values column.
--
-- Putting the variables and values on the front means the remainder can share its memory.

melt :: forall a rVar' rVal rs. Monoid a
     => Group a                                   -- ^ Discriminator for grouping a
     -> Rec (VConst a) (rVar':rVal':rs)           -- ^ Header labels with first to being variables and values
     -> NonEmpty (NonEmpty a)                     -- ^ Input table to select columns from and melt others
     -> NonEmpty (Rec (VConst a) (rVar':rVal':rs)) -- ^ Output variables, values, and selected columns
melt group (variable :& value :& targets) variables ((NonEmpty.ToList -> header) :| body) =
    variable :& value :& targets :| Foldable.foldr output [] body
  where
    indicies :: IntMap IntSet                        -- Indicies of targets in input
    indicies = occurs group targets header

    variables :: [a]                                 -- Variables from header
    variables = split indicies header ^. Lens._1

    output :: NonEmpty a                             -- Row to process
           -> [Rec (VConst a) (rVar':rVal':rs)]      -- Processed rows
           -> [Rec (VConst a) (rVar':rVal':rs)]      -- Row expanded and added on front of processed ones
    output (NonEmpty.toList -> row) rest = List.foldr cross rest $ List.zip variables values
      where
        values :: [a]                                    -- Values from row (not 
        fixed :: Rec (VConst a) rs                       -- Fixed portion of row (header matches)
        (values,fixed) = split indicies row

        cross :: (a,a)                                   -- Variable and value for row
              -> [Rec (VConst a) (rVar':rVal':rs)]       -- Procsessed rows
              -> [Rec (VConst a) (rVar':rVal':rs)]       -- Row added on front of processed ones
        cross (variable,value) rest = (Vinyl.Const variable) :& (Vinyl.Const value) :& fixed : rest


-- | Mapping from header indicies to referencing target indicies (i.e., where to write values)

occurs :: forall a rs. Group a                   -- ^ Discriminator to group values
       -> Rec (VConst a) rs                      -- ^ Target values
       -> [a]                                    -- ^ Header values
       -> IntMap IntSet                          -- ^ Header index to referencing target indicies
occurs group (Lens.review _Rec -> targets) header = IntMap.fromAscList map
  where
    -- Target and header indicies grouped by having the same value (both form respectively disjoint sets)
    common :: [(IntSet,IntSet)]                      -- All groups of targets and headers that refer to the same value
    common = Discrimination.runGroup group (targets' <> header') & Lens.mapped %~ Foldable.foldl' collect (IntSet.empty,IntSet.empty)
      where
        targets' :: [(a,Either Int Int)]                 -- Targets and their Left wrapped indicies
        targets' = targets & Lens.mapped %@~ ( \index -> (, Either.Left index) )

        header' :: [(a,Either Int Int)]                  -- Header values and their Right wrapped indicies
        header' = header & Lens.mapped %@~ ( \index -> (, Either.Right index) )

        -- Do this with a fold instead of partitionEithers so targets' <> header' is used incrementally
        collect :: (IntSet,IntSet)                       -- Target and header referencs so far
                -> Either Int Int                        -- Target reference or header reference
                -> (IntSet,IntSet)                       -- Target and header references so far with addition
        collect (Left targetIndex ) (!targetIndicies,!headerIndicies) -> ( targetIndicies & Lens.at targetIndex %~ Maybe.Just ()
                                                                         , headerIndicies )
        collect (Right headerIndex) (!targetIndicies,!headerIndicies) -> ( targetIndicies
                                                                         , headerIndicies & Lens.at headerIndex %~ Maybe.Just () )

    -- Distinct header indicies to target indicies mapping (in order as indicies were in order and grouping is stable)
    map :: [(Int,IntSet)]
    map = common ^.. Lens.folded . Lens.to expand . Lens.folded
      where
        expand :: (IntSet,IntSet)                        -- Related target and harder indicies
               -> [(Int,IntSet)]                         -- Header index to related target indicies
        expand (targetIndicies,headerIndicies) = headerIndicies ^.. IntSet.members . Lens.to (,targetIndicies) )


-- | Pull out and accumulate values of interest at locations in a vector specified by an index to indicies map

split :: froall a rs. Monoid a
      => IntMap IntSet                           -- ^ Row indicies to target indicies (i.e., what and where to collect)
      -> [a]                                     -- ^ Row to split and collect/accumulate across
      -> ([a],V (RLength rs))                    -- ^ Non-accumulated bits of row and accumulation vector
split map row = ( others
                , Vector.accum (<>) (Vector.replicate (Linear.reflectDim $ Proxy.Proxy @(RLength n))) Monoid.mempty )
    -- Could use promises package os row would be consumed incrementally
    updates :: [Either a (Int,a)]                    -- Either leftover or a target writebacks
    updates = row ^.. Lens.folded . Lens.withIndex . Lens.to process . Lens.folded
      where
        process :: Int                                   -- Row index
                -> a                                     -- Row value
                -> [Either a (Int,a)]                    -- Either add to others or write targets
        process index value = map ^.. Lens.failing
                                        (Lens.ix index . IntSet.members . re _Right)
                                        (Lens.to (Function.const value) . re _Left )

    others :: [a]                                    -- Unreferences values
    writes :: [(Int,a)]                              -- Referenced values and index to store them
    ( others
    , writes ) = Either.partitionEithers updates

2.1.3 DONE Output long format tables

dump :: forall a rs. (RecordListN rs)
     => Rec (VConst a) rs                         -- ^ Header labels
     -> [Rec (VConst a) rs]                       -- ^ Input table to dump
     -> NonEmpty [a]                              -- ^ Output table
dump header body = ( header :| body ) & Lens.mapped %~ Lens.review _Rec

2.1.4 DONE Output short format tables

-- | Output unique values for columns along with a column for each variable and their accumulated values

cast :: forall a rVar' rVal rs. Monoid a
     => Group a                                   -- ^ Discriminator for grouping a
     -> NonEmpty (Rec (VConst a) (rVar':rVal':rs)) -- ^ Input variables, values, and other columns
     -> NonEmpty (NonEmpty a)                     -- ^ Output table with variables columns on end
cast group ( (_ :& _ :& Lens.view -> header) :| Lens.mapped &~ Lens.view _Rec -> body) =
    NonEmpty (header <> variables) :| ( body' & Lens.mapped%~ output )
  where
    body' :: [[([a],a,a)]]                           -- Body grouped by fixed portions (fixed portions are the same)
    body' = Grouping.runGroup group $ body & Lens.mapped %~ ( \variable:value:fixed -> (fixed,(fixed,variable,value)) )

    variables :: [a]                                 -- Variables (unique across all rows)
    variables = ( Grouping.runGroup group $ body & Lens.mapped %~ ( \variable:_ -> (variable,variable) ) )
                   ^.. Lens.folded . Lens._head

    variablesSize :: Int                             -- Number of unique variables
    variablesSize = Lens.lengthOf Lens.folded variables

    variablesIndex :: [(a,Either Int a)]             -- Variables with their index (one to one)
    variablesIndex = variables & Lens.mapped %@~ ( \(index,variable) -> (variable, Either.Left index) )

    output :: [([a],a,a)] -> [a]
    output row@((fixed,_,_):_) = Vector.toList $ Vector.accum (<>) (Vector.replicate Monoid.variablesSize mempty) updates
      where
        variablesValue :: [(a,Either Int a)]             -- Variables with their value
        variablesValue =  row & Lens.mapped %~ ( \(_,variable,value) -> (variable, Either.Right value) )

        variables :: [[(Either Int a)]]                  -- Variables with ther index and values
        variables = Grouping.runGroup group $ variablesIndex <> variablesValue

        updates :: [(Int,a)]                             -- Values with their index
        updates = variables ^.. Lens.folded . Lens.to ( \(Left index:rest) -> rest ^.. Lens.folded . Lens._Right )

2.1.5 DONE Hookup/specialize for Text and Vinly records

Containers

can be an iso, lens, or prism
upstream is container
can add/remove items

Fold

upstream cannot access container
upstream can add/remove elements
can add/remove items

Elements

upstream can
upstream is elements only
cannot add/remove items

Functions

generally used to act on lenses
don't compose

clean :: (HasLogging, Foldable f) => Fold (f (Rec (Either Text :. ElField rs))) (Rec ElField rs)
clean = Lens.folded . Lens.to to . Lens._Just
  where
    to :: Int -> Rec (Either Text :. ElField rs) -> Maybe (Rec ElField rs)
    to row record = case Vinyl.rtraverse Vinyl.getCompose record of
      Either.Left  text    -> Maybe.Nothing ## (Log.Note,Text.format "unable to convert value "{}"" (Text.Only text))
      Either.Right record' -> Maybe.Just record'

2.2 DONE Update existing code `[3/3]`

2.2.1 DONE Input `[5/5]`

[X] aliases (alias,gene)
- has entries on column separated by '|'
[X] relations (gene,relation,gene) [aliases applied]
- long format table
- no headers, probably best to just manually add them
- remove genes prefixed with CHEBI: and relationship controls-phosphorylation-of
[X] gi50 (cell,gi50)
- long format table
[X] copy (gene,cell,copy)
- short format table (cell line) and requires aliases
[X] expression (gene,cell,expression)
- short format table (cell line) and requires aliases

2.2.2 DONE Computed `[6/6]`

[X] apply aliases to other tables
- alias = (alias,gene)
- copy = (alias,gene,cell,copy)
- expression = (alias,gene,cell,expression)
- relation = (alias,alias,gene,gene,relation)
- gi50 = (cell,gi50)
[X] update config system
[X] mfa (gene,mfa)
- merge copy and expression on (gene,cell) to get ([alias],gene,cell,copy,expression)
- merge with gi50 to get ([alias],gene,cell,gi50,copy,expression)
- group by gene to get ([alias],gene,[(cell,gi50,copy,expression)])
- calculate mfa on [(cell,gi50,copy,expression)] to get ([alias],gene,mfa)
[X] gene' (gene)
- remove unless gi50/copy or gi50/expression angle <= target or >= 180-target
- simple filter on (gene,mfa)
[X] relations' (gene,relation,gene)
- drop relations that do not involve gene'
- fold of (gene,relation,gene) made a bit ugly by two gene
[X] steps (gene,step)
- smallest number of steps to get to gene from original set of genes using relations'
- existing code

2.2.3 DONE Output `[2/2]`

[X] dump correlation circles for genes if requested (diagram)
[X] dump mfa summary info if requested (gene, gi50/copy, gi50/expression,stongest,stongest sign,hops)

2.3 DONE Generate SVM input `[4/4]`

[X] dump svm (cell,gi50,gene,copy,expression)
- core of this is already done with new output short format tables (i.e., (cell,gi50,copy/expression/gene1,…)
- verify with John what label combinations to generate?
- verify with John how to choose whether to export copy or expression?
[X] switch between copy and expression based on which is more correlated
[X] option to limit the number of relation steps followed
[X] generate labels according to option

2.4 DONE Fix alias handling `[3/3]`

[X] alias handling when combining expressions and copies
- possible aliases are ones where the gene does not occur in the input
- only apply possible aliases on genes names that fail to merge

-- Aliases filtered to remove ones mapping to genes in copies (using those would create duplicates)

copiesPossible :: Rec ElField '["gene" ::: Text, "alias" ::: Text]
copiesPossible = Discrimination.joining merge (^. Vinyl.rlensf #gene) (^. Vinyl.rlensf #gene) aliases copies
                   ^.. Lens.folded . Lens._Just
  where
    -- Only keep aliases whose gene doesn't match any in use
    merge :: [Rec ElFlied Alias] -> [Rec ElField Copy] -> [Rec ElFlield Alias]
    merge aliases [] = Maybe.Just aliases
    merge _       _  = Maybe.Nothing


-- Aliases filtered to remove ones mapping to genes in expressions (using those would create duplicates)

expressionsPossible :: Rec ElField '["gene" ::: Text, "alias" ::: Text]
expressionsPossible = Discrimination.joining merge (^. Vinyl.rlensf #gene) (^. Vinyl.rlensf #gene) aliases expressions
                   ^.. Lens.folded . Lens._Just
  where
    -- Only keep aliases whose gene doesn't match any in use
    merge :: [Rec ElFlied Alias] -> [Rec ElField Expression] -> [Rec ElFlield Alias]
    merge aliases [] = Maybe.Just aliases
    merge _       _  = Maybe.Nothing


-- Copies and expressions that are unmatched against each other (candidates for applying aliases too)

copiesUnmatched      :: [Text]
expressionsUnmatched :: [Text]
( copiesUnmatched
, expressionsUnmatched  )
  = Either.partitionEithers $ Discrimination.joining Discrimination.grouping merge (^. Vinyl.rlensf #gene) (^. Vinyl.rlensf #gene) copies expressions
                                ^.. Lens.folded . Lens._Just
  where
    -- Only keep ones that don't match up, tagging them so we can pull them apart
    merge :: [Rec ElField Expression] -> [Rec ElField Copy] -> Maybe (Either Text Text)
    merge copies []          = Maybe.Just . Either.Left  $ copies ^. Lens._head . Vinyl.rlensf #gene
    merge []     expressions = Maybe.Just . Either.Right $ expressions      ^. Lens._head . Vinyl.rlensf #gene
    merge _      _           = Maybe.Nothing


-- Aliases to be applied to copies (don't create duplicates and match unmatched genes)

aliasesCopy :: [Rec ElField Alias]
aliasesCopy = Discrimination.joining Discrimination.grouping merge (^. Vinyl.rlensf #alias) Function.id copiesPossible copiesUnmatched
                ^.. Lens.folded . Lens._Just
  where
    -- Only keep possible aliases that provide an alias for an unmatched gene
    merge :: [Rec ElField Alias] [Text]
    merge []      _  = Maybe.Nothing
    merge _       [] = Maybe.Nothing
    merge aliases _  = Maybe.Just aliases


-- Aliases to be applied to expressions (don't create duplicates and match unmatched genes)

aliasesExpression :: [Rec ElField Alias]
aliasesExpression = Discrimination.joining Discrimination.grouping merge (^. Vinyl.rlensf #alias) Function.id expressionsPossible expressionsUnmatched
                      ^.. Lens.folded . Lens._Just
  where
    -- Only keep possible aliases that provide an alias for an unmatched gene
    merge :: [Rec ElField Alias] [Text]
    merge []      _  = Maybe.Nothing
    merge _       [] = Maybe.Nothing
    merge aliases _  = Maybe.Just aliases

[X] alias handling with relationships
- expand relationships across all aliases
[X] output aliases used to file

2.5 DONE Dump MFA input

2.6 DONE Prefilter cells

2.7 DONE Output link information `[4/4]`

[X] Add relations to geneConnections
- input: relationsFiltered, relationsPrefiltered
- output: genesSelected

geneConnections :: [Rec ElField '["gene" ::: Text, "geneN" ::: Text]]
geneConnections :: [Rec ElField '["gene" ::: Text, "geneN" ::: Text, "relations" ::: [Text]]]

[X] Add parents to genesSelected
- implement new expansion algorithm
- input: geneConnections
- output: analysesSelected, selectedBase

Pre-step

  IN

    [a,d] => [(a,[]),(d,[])]
    [(a,c),(b,c),(c,a),(c,e),(d,f),(e,b),(e,c),(e,f),(f,d),(f,e)]

  Merge fst x snd (merge input nodes, remove input back links)

    [(a,[])] [(c,a)]              ->  L (a,[])
    [(d,[])] [(f,d)]              ->  L (d,[])
    []       [(a,c),(b,c),(e,c)]  ->  R [(a,c),(b,c),(e,c)]
    []       [(c,e),(f,e)]        ->  R [(c,e),(f,e)]
    []       [(d,f),(e,f)]        ->  R [(d,f),(e,f)]
    []       [(e,b)]              ->  R [(e,b)]

  * => [(a,[]),(d,[])] => [a,d]
    => [(a,c),(b,c),(e,c),(c,e),(f,e),(d,f),(e,f),(e,b)]

Step

  Merge fst x fst (next nodes, remove used links)

    [a] [(a,c)]              -> L [(c,[a])]
    [d] [(d,f)]              -> L [(f,[d])]
    []  [(b,c)]              -> R [(b,c)]
    []  [(e,c),(e,f),(e,b)]  -> R [(e,c),(e,f),(e,b)]
    []  [(c,e)]              -> R [(c,e)]
    []  [(f,e)]              -> R [(f,e)]

    => [(c,[a]),(f,[d])]
    => [(b,c),(e,c),(e,f),(e,b),(c,e),(f,e)]

  Merge fst x snd (merge nodes, remove back links)

    [(c,[a])] [(b,c),(e,c)] -> L (c,[a])
    [(f,[d])] [(e,f)]       -> L (f,[d])
    []        [(e,b)]       -> R [(e,b)]
    []        [(c,e)]       -> R [(c,e)]
    []        [(f,e)]       -> R [(f,e)]

  * => [(c,[a]),(f,[d])] => [c,f]
    => [(e,b),(c,e),(f,e)]

Step

  Merge fst x fst (next nodes, remove used links)

    [c] [(c,e)] -> L [(e,[c])]
    [f] [(f,e)] -> L [(e,[f])]
    []  [(e,b)] -> R [(e,b)]

    => [(e,[c]),(e,[f])]
    => [(e,b)]

  Merge fst x snd (merge nodes, remove back links)

    [(e,[c]),(e,[f])] []      -> L (e,[c,f])
    []                [(e,b)] -> R [(e,b)]

  * => [(e,[c,f])] => [e]
    => [(e,b)]

Step

  Merge fst x fst (next nodes, remove used links)

    [e] [(e,b)] => L [(b,[e])]

    => [(b,[e])]
    => []

  Merge fst x snd (merge nodes, remove back links)

    [(b,[e])] [] -> L (b,[e])

  * => [(b,[e])] => [b]
    => []

genesSelected :: [Rec ElField '["gene" ::: Text, "step" ::: Int]]
genesSelected :: [Rec ElField '["gene" ::: Text, "step" ::: Int, "parents" ::: [Text,[Text]]]]

[X] Include parents in MFA output (analysesSelected)
- output format: gene:relation,relation,…|….

many (delimited '|' --> (delimited ':' -&- many (delimited ',')))

genesSelected :: [Rec ElField '["gene" ::: Text, "step" ::: Int]]
genesSelected :: [Rec ElField '["gene" ::: Text, "step" ::: Int, "parents" ::: [Text,[Text]]]]

[X] Updated SVM code to ignore new fields (selectedBase)

3 Wish list

3.1 DONE Replace P with proper Prism

The P types is actually a Prism' s (Maybe s,a) and could be replaced with it for seamless interfacing with the rest of lenses. This would allow such thing as isos and composing with existings lenses.

3.2 DONE Logging system

logging from pure code so don't polute with Monad
pass around as a constraint so compiler plumbs it
ability to supress excessive messages

3.3 DONE Warn about unused keys

3.4 DONE Space separated parser for config.tsv file

3.5 DONE Logging context instead of level limitations

3.6 Revamp of name system

Use reflection system to pass around config, allows base types to be tagged with the assoicated name component as well as making it possible to create Show instances and such

newtype Foo n = Foo Int  -- foo type based around names contained in n

Lens.makePrisms ''Foo

fooIso :: forall g. ( Given g, HasFooIso' g ( ReifiedIso' ByteString (Foo g) ) ) => Iso' ByteString (Foo g)
fooIso = Lens.runIso (Reflection.given @ g ^. fooIso')

data Names = Names
  { namesFooIso' :: !(ReifiedIso' ByteString (Foo Names)) }

class HasFooIso' s a | s -> a where
  fooIso' :: Lens' s a

Lens.makeFields ''Names

instance ( Given g, HasFooIso' g ( ReifiedIso' ByteString (Foo g) ) ) => Show (Foo g) where
  show = Show.show . Lens.view (Lens.from fooIso)


magic :: IO ()
magic = do
  iso <- Names.isoCreate
  Reflection.give (Names (Lens.Iso (Lens.runIso iso . Lens.from _Foo))) $ do
    IO.print (hi ^. fooIso :: FooNames)

3.7 Replace Text bits using name system

A large part of this falls under the existing Names framework. Need to rework to tie the Names MVar record with a skolem variable.

3.8 Further cleanup of input system

3.8.1 Replace Finite with own instance that allows tuples

Idea is to be able to draw relationships between length and types.

newtype ListN (n :: GHC.Nat) a = ListN [a]

class Finite v where
  type Size v :: GHC.Nat
  type Value v :: *
  toListN :: v -> ListN (Size v)
  fromListN :: ListN (Size v) -> v

instance Finite (a,a) where
  type Size (a,a) = 2
  type Value (a,a) = a
  toListN (a0,a1) = ListN [a0,a1]
  fromListN (ListN [a0,a1]) = (a0,a1)

# ... and so on for tuples, records, and such

3.8.2 Expose fold at top of long/short/pull/push

3.9 Handle multiple hops and loops in Pseudonyms file

ADCY3 is both a symbol and a prev_symbol for ADCY8. Are there some with multiple hops? Could there even be circles?

3.10 Don't re-compute entire table when doing SVM label dumping

3.11 Logging ability to add to parents message count

3.12 Split into separate programs for command line composability

3.13 Option of treating aliases as relations instead of duplicating relations

3.14 Graphviz output for output link information

4 Bug to report upstream

4.1 foldFreeT type signature is over restrictive in requiring transformation to hold for all monads

foldFreeT :: (MonadTrans t, Monad (t m), Monad m) => (forall n x. Monad n => f x -> t n x) -> FreeT f m a -> t m a

instead of just the monad in question m (the current definition works fine with the new signature)

foldFreeT :: (MonadTrans t, Monad (t m), Monad m) => (forall x. f x -> t m x) -> FreeT f m a -> t m a

4.2 Iso passed to au is backwards

https://github.com/ekmett/lens/issues/821

4.3 L1' Scan prefix1 implementation isn't strict in the sense of no blowing up memory

let
  sum = L1' id (+) (0+)
in
  foldl' (flip prefix1 ) [1..]

as the above expands too

... ((+) $! ((+) $! ((+) $! 0 1) 2) 3) ...