Rogan DP
Table of Contents
- 1. Usage
- 2. Task list
- 3. Wish list
- 3.1. DONE Replace P with proper Prism
- 3.2. DONE Logging system
- 3.3. DONE Warn about unused keys
- 3.4. DONE Space separated parser for config.tsv file
- 3.5. DONE Logging context instead of level limitations
- 3.6. Revamp of name system
- 3.7. Replace Text bits using name system
- 3.8. Further cleanup of input system
- 3.9. Handle multiple hops and loops in Pseudonyms file
- 3.10. Don't re-compute entire table when doing SVM label dumping
- 3.11. Logging ability to add to parents message count
- 3.12. Split into separate programs for command line composability
- 3.13. Option of treating aliases as relations instead of duplicating relations
- 3.14. Graphviz output for output link information
- 4. Bug to report upstream
1 Usage
1.1 Data files
Headers need to be as follows
- GI50
(cell, gi50)
Cell line
- Drug (e.g.,
Cisplatin
)
- Relations
(geneL, relation, geneR)
GeneLID
RelationID
GeneRID
- Expression
(gene, cell, expression)
GeneID
- Cell lines (e.g.,
184A1
,184B5
,600MPE
,AU565
, etc.)
- Copy
(gene, cell, copy)
GeneID
- Cell lines (e.g.,
184A1
,184B5
,600MPE
,AU565
, etc.)
2 Task list
2.1 DONE Input/output system [5/5]
2.1.1 DONE Input long format tables
-- | Pull specified columns and drop the rest. only :: forall a rs. -> Rec (VConst a) rs -- ^ Header labels -> NonEmpty (NonEmpty a) -- ^ Input table to select columns from -> NonEmpty (Rec (VConst a) rs) -- ^ Selected columns from input sheet only targets ((NonEmpty.ToList -> header) :| body) = header :& ( body & Lens.mapped &~ Lens.view Lens._2 ( split indicies ) ) where indicies :: IntMap IntSet -- Indicies of targets input indicies = occurs group targets header
2.1.2 DONE Input short format tables
-- | Pull specified columns and put the rest into a variable and values column. -- -- Putting the variables and values on the front means the remainder can share its memory. melt :: forall a rVar' rVal rs. Monoid a => Group a -- ^ Discriminator for grouping a -> Rec (VConst a) (rVar':rVal':rs) -- ^ Header labels with first to being variables and values -> NonEmpty (NonEmpty a) -- ^ Input table to select columns from and melt others -> NonEmpty (Rec (VConst a) (rVar':rVal':rs)) -- ^ Output variables, values, and selected columns melt group (variable :& value :& targets) variables ((NonEmpty.ToList -> header) :| body) = variable :& value :& targets :| Foldable.foldr output [] body where indicies :: IntMap IntSet -- Indicies of targets in input indicies = occurs group targets header variables :: [a] -- Variables from header variables = split indicies header ^. Lens._1 output :: NonEmpty a -- Row to process -> [Rec (VConst a) (rVar':rVal':rs)] -- Processed rows -> [Rec (VConst a) (rVar':rVal':rs)] -- Row expanded and added on front of processed ones output (NonEmpty.toList -> row) rest = List.foldr cross rest $ List.zip variables values where values :: [a] -- Values from row (not fixed :: Rec (VConst a) rs -- Fixed portion of row (header matches) (values,fixed) = split indicies row cross :: (a,a) -- Variable and value for row -> [Rec (VConst a) (rVar':rVal':rs)] -- Procsessed rows -> [Rec (VConst a) (rVar':rVal':rs)] -- Row added on front of processed ones cross (variable,value) rest = (Vinyl.Const variable) :& (Vinyl.Const value) :& fixed : rest -- | Mapping from header indicies to referencing target indicies (i.e., where to write values) occurs :: forall a rs. Group a -- ^ Discriminator to group values -> Rec (VConst a) rs -- ^ Target values -> [a] -- ^ Header values -> IntMap IntSet -- ^ Header index to referencing target indicies occurs group (Lens.review _Rec -> targets) header = IntMap.fromAscList map where -- Target and header indicies grouped by having the same value (both form respectively disjoint sets) common :: [(IntSet,IntSet)] -- All groups of targets and headers that refer to the same value common = Discrimination.runGroup group (targets' <> header') & Lens.mapped %~ Foldable.foldl' collect (IntSet.empty,IntSet.empty) where targets' :: [(a,Either Int Int)] -- Targets and their Left wrapped indicies targets' = targets & Lens.mapped %@~ ( \index -> (, Either.Left index) ) header' :: [(a,Either Int Int)] -- Header values and their Right wrapped indicies header' = header & Lens.mapped %@~ ( \index -> (, Either.Right index) ) -- Do this with a fold instead of partitionEithers so targets' <> header' is used incrementally collect :: (IntSet,IntSet) -- Target and header referencs so far -> Either Int Int -- Target reference or header reference -> (IntSet,IntSet) -- Target and header references so far with addition collect (Left targetIndex ) (!targetIndicies,!headerIndicies) -> ( targetIndicies & Lens.at targetIndex %~ Maybe.Just () , headerIndicies ) collect (Right headerIndex) (!targetIndicies,!headerIndicies) -> ( targetIndicies , headerIndicies & Lens.at headerIndex %~ Maybe.Just () ) -- Distinct header indicies to target indicies mapping (in order as indicies were in order and grouping is stable) map :: [(Int,IntSet)] map = common ^.. Lens.folded . Lens.to expand . Lens.folded where expand :: (IntSet,IntSet) -- Related target and harder indicies -> [(Int,IntSet)] -- Header index to related target indicies expand (targetIndicies,headerIndicies) = headerIndicies ^.. IntSet.members . Lens.to (,targetIndicies) ) -- | Pull out and accumulate values of interest at locations in a vector specified by an index to indicies map split :: froall a rs. Monoid a => IntMap IntSet -- ^ Row indicies to target indicies (i.e., what and where to collect) -> [a] -- ^ Row to split and collect/accumulate across -> ([a],V (RLength rs)) -- ^ Non-accumulated bits of row and accumulation vector split map row = ( others , Vector.accum (<>) (Vector.replicate (Linear.reflectDim $ Proxy.Proxy @(RLength n))) Monoid.mempty ) -- Could use promises package os row would be consumed incrementally updates :: [Either a (Int,a)] -- Either leftover or a target writebacks updates = row ^.. Lens.folded . Lens.withIndex . Lens.to process . Lens.folded where process :: Int -- Row index -> a -- Row value -> [Either a (Int,a)] -- Either add to others or write targets process index value = map ^.. Lens.failing (Lens.ix index . IntSet.members . re _Right) (Lens.to (Function.const value) . re _Left ) others :: [a] -- Unreferences values writes :: [(Int,a)] -- Referenced values and index to store them ( others , writes ) = Either.partitionEithers updates
2.1.3 DONE Output long format tables
dump :: forall a rs. (RecordListN rs) => Rec (VConst a) rs -- ^ Header labels -> [Rec (VConst a) rs] -- ^ Input table to dump -> NonEmpty [a] -- ^ Output table dump header body = ( header :| body ) & Lens.mapped %~ Lens.review _Rec
2.1.4 DONE Output short format tables
-- | Output unique values for columns along with a column for each variable and their accumulated values cast :: forall a rVar' rVal rs. Monoid a => Group a -- ^ Discriminator for grouping a -> NonEmpty (Rec (VConst a) (rVar':rVal':rs)) -- ^ Input variables, values, and other columns -> NonEmpty (NonEmpty a) -- ^ Output table with variables columns on end cast group ( (_ :& _ :& Lens.view -> header) :| Lens.mapped &~ Lens.view _Rec -> body) = NonEmpty (header <> variables) :| ( body' & Lens.mapped%~ output ) where body' :: [[([a],a,a)]] -- Body grouped by fixed portions (fixed portions are the same) body' = Grouping.runGroup group $ body & Lens.mapped %~ ( \variable:value:fixed -> (fixed,(fixed,variable,value)) ) variables :: [a] -- Variables (unique across all rows) variables = ( Grouping.runGroup group $ body & Lens.mapped %~ ( \variable:_ -> (variable,variable) ) ) ^.. Lens.folded . Lens._head variablesSize :: Int -- Number of unique variables variablesSize = Lens.lengthOf Lens.folded variables variablesIndex :: [(a,Either Int a)] -- Variables with their index (one to one) variablesIndex = variables & Lens.mapped %@~ ( \(index,variable) -> (variable, Either.Left index) ) output :: [([a],a,a)] -> [a] output row@((fixed,_,_):_) = Vector.toList $ Vector.accum (<>) (Vector.replicate Monoid.variablesSize mempty) updates where variablesValue :: [(a,Either Int a)] -- Variables with their value variablesValue = row & Lens.mapped %~ ( \(_,variable,value) -> (variable, Either.Right value) ) variables :: [[(Either Int a)]] -- Variables with ther index and values variables = Grouping.runGroup group $ variablesIndex <> variablesValue updates :: [(Int,a)] -- Values with their index updates = variables ^.. Lens.folded . Lens.to ( \(Left index:rest) -> rest ^.. Lens.folded . Lens._Right )
2.1.5 DONE Hookup/specialize for Text and Vinly records
Containers
- can be an iso, lens, or prism
- upstream is container
- can add/remove items
Fold
- upstream cannot access container
- upstream can add/remove elements
- can add/remove items
Elements
- upstream can
- upstream is elements only
- cannot add/remove items
Functions
- generally used to act on lenses
- don't compose
clean :: (HasLogging, Foldable f) => Fold (f (Rec (Either Text :. ElField rs))) (Rec ElField rs) clean = Lens.folded . Lens.to to . Lens._Just where to :: Int -> Rec (Either Text :. ElField rs) -> Maybe (Rec ElField rs) to row record = case Vinyl.rtraverse Vinyl.getCompose record of Either.Left text -> Maybe.Nothing ## (Log.Note,Text.format "unable to convert value "{}"" (Text.Only text)) Either.Right record' -> Maybe.Just record'
2.2 DONE Update existing code [3/3]
2.2.1 DONE Input [5/5]
[X]
aliases (alias,gene)- has entries on column separated by '|'
[X]
relations (gene,relation,gene) [aliases applied]- long format table
- no headers, probably best to just manually add them
- remove genes prefixed with CHEBI: and relationship controls-phosphorylation-of
[X]
gi50 (cell,gi50)- long format table
[X]
copy (gene,cell,copy)- short format table (cell line) and requires aliases
[X]
expression (gene,cell,expression)- short format table (cell line) and requires aliases
2.2.2 DONE Computed [6/6]
[X]
apply aliases to other tables- alias = (alias,gene)
- copy = (alias,gene,cell,copy)
- expression = (alias,gene,cell,expression)
- relation = (alias,alias,gene,gene,relation)
- gi50 = (cell,gi50)
[X]
update config system[X]
mfa (gene,mfa)- merge copy and expression on (gene,cell) to get ([alias],gene,cell,copy,expression)
- merge with gi50 to get ([alias],gene,cell,gi50,copy,expression)
- group by gene to get ([alias],gene,[(cell,gi50,copy,expression)])
- calculate mfa on [(cell,gi50,copy,expression)] to get ([alias],gene,mfa)
[X]
gene' (gene)- remove unless gi50/copy or gi50/expression angle <= target or >= 180-target
- simple filter on (gene,mfa)
[X]
relations' (gene,relation,gene)- drop relations that do not involve gene'
- fold of (gene,relation,gene) made a bit ugly by two gene
[X]
steps (gene,step)- smallest number of steps to get to gene from original set of genes using relations'
- existing code
2.2.3 DONE Output [2/2]
[X]
dump correlation circles for genes if requested (diagram)[X]
dump mfa summary info if requested (gene, gi50/copy, gi50/expression,stongest,stongest sign,hops)
2.3 DONE Generate SVM input [4/4]
[X]
dump svm (cell,gi50,gene,copy,expression)- core of this is already done with new output short format tables (i.e., (cell,gi50,copy/expression/gene1,…)
- verify with John what label combinations to generate?
- verify with John how to choose whether to export copy or expression?
[X]
switch between copy and expression based on which is more correlated[X]
option to limit the number of relation steps followed[X]
generate labels according to option
2.4 DONE Fix alias handling [3/3]
[X]
alias handling when combining expressions and copies- possible aliases are ones where the gene does not occur in the input
- only apply possible aliases on genes names that fail to merge
-- Aliases filtered to remove ones mapping to genes in copies (using those would create duplicates) copiesPossible :: Rec ElField '["gene" ::: Text, "alias" ::: Text] copiesPossible = Discrimination.joining merge (^. Vinyl.rlensf #gene) (^. Vinyl.rlensf #gene) aliases copies ^.. Lens.folded . Lens._Just where -- Only keep aliases whose gene doesn't match any in use merge :: [Rec ElFlied Alias] -> [Rec ElField Copy] -> [Rec ElFlield Alias] merge aliases [] = Maybe.Just aliases merge _ _ = Maybe.Nothing -- Aliases filtered to remove ones mapping to genes in expressions (using those would create duplicates) expressionsPossible :: Rec ElField '["gene" ::: Text, "alias" ::: Text] expressionsPossible = Discrimination.joining merge (^. Vinyl.rlensf #gene) (^. Vinyl.rlensf #gene) aliases expressions ^.. Lens.folded . Lens._Just where -- Only keep aliases whose gene doesn't match any in use merge :: [Rec ElFlied Alias] -> [Rec ElField Expression] -> [Rec ElFlield Alias] merge aliases [] = Maybe.Just aliases merge _ _ = Maybe.Nothing -- Copies and expressions that are unmatched against each other (candidates for applying aliases too) copiesUnmatched :: [Text] expressionsUnmatched :: [Text] ( copiesUnmatched , expressionsUnmatched ) = Either.partitionEithers $ Discrimination.joining Discrimination.grouping merge (^. Vinyl.rlensf #gene) (^. Vinyl.rlensf #gene) copies expressions ^.. Lens.folded . Lens._Just where -- Only keep ones that don't match up, tagging them so we can pull them apart merge :: [Rec ElField Expression] -> [Rec ElField Copy] -> Maybe (Either Text Text) merge copies [] = Maybe.Just . Either.Left $ copies ^. Lens._head . Vinyl.rlensf #gene merge [] expressions = Maybe.Just . Either.Right $ expressions ^. Lens._head . Vinyl.rlensf #gene merge _ _ = Maybe.Nothing -- Aliases to be applied to copies (don't create duplicates and match unmatched genes) aliasesCopy :: [Rec ElField Alias] aliasesCopy = Discrimination.joining Discrimination.grouping merge (^. Vinyl.rlensf #alias) Function.id copiesPossible copiesUnmatched ^.. Lens.folded . Lens._Just where -- Only keep possible aliases that provide an alias for an unmatched gene merge :: [Rec ElField Alias] [Text] merge [] _ = Maybe.Nothing merge _ [] = Maybe.Nothing merge aliases _ = Maybe.Just aliases -- Aliases to be applied to expressions (don't create duplicates and match unmatched genes) aliasesExpression :: [Rec ElField Alias] aliasesExpression = Discrimination.joining Discrimination.grouping merge (^. Vinyl.rlensf #alias) Function.id expressionsPossible expressionsUnmatched ^.. Lens.folded . Lens._Just where -- Only keep possible aliases that provide an alias for an unmatched gene merge :: [Rec ElField Alias] [Text] merge [] _ = Maybe.Nothing merge _ [] = Maybe.Nothing merge aliases _ = Maybe.Just aliases
[X]
alias handling with relationships- expand relationships across all aliases
[X]
output aliases used to file
2.5 DONE Dump MFA input
2.6 DONE Prefilter cells
2.7 DONE Output link information [4/4]
[X]
Add relations to geneConnections- input: relationsFiltered, relationsPrefiltered
- output: genesSelected
geneConnections :: [Rec ElField '["gene" ::: Text, "geneN" ::: Text]] geneConnections :: [Rec ElField '["gene" ::: Text, "geneN" ::: Text, "relations" ::: [Text]]]
[X]
Add parents to genesSelected- implement new expansion algorithm
- input: geneConnections
- output: analysesSelected, selectedBase
Pre-step IN [a,d] => [(a,[]),(d,[])] [(a,c),(b,c),(c,a),(c,e),(d,f),(e,b),(e,c),(e,f),(f,d),(f,e)] Merge fst x snd (merge input nodes, remove input back links) [(a,[])] [(c,a)] -> L (a,[]) [(d,[])] [(f,d)] -> L (d,[]) [] [(a,c),(b,c),(e,c)] -> R [(a,c),(b,c),(e,c)] [] [(c,e),(f,e)] -> R [(c,e),(f,e)] [] [(d,f),(e,f)] -> R [(d,f),(e,f)] [] [(e,b)] -> R [(e,b)] * => [(a,[]),(d,[])] => [a,d] => [(a,c),(b,c),(e,c),(c,e),(f,e),(d,f),(e,f),(e,b)] Step Merge fst x fst (next nodes, remove used links) [a] [(a,c)] -> L [(c,[a])] [d] [(d,f)] -> L [(f,[d])] [] [(b,c)] -> R [(b,c)] [] [(e,c),(e,f),(e,b)] -> R [(e,c),(e,f),(e,b)] [] [(c,e)] -> R [(c,e)] [] [(f,e)] -> R [(f,e)] => [(c,[a]),(f,[d])] => [(b,c),(e,c),(e,f),(e,b),(c,e),(f,e)] Merge fst x snd (merge nodes, remove back links) [(c,[a])] [(b,c),(e,c)] -> L (c,[a]) [(f,[d])] [(e,f)] -> L (f,[d]) [] [(e,b)] -> R [(e,b)] [] [(c,e)] -> R [(c,e)] [] [(f,e)] -> R [(f,e)] * => [(c,[a]),(f,[d])] => [c,f] => [(e,b),(c,e),(f,e)] Step Merge fst x fst (next nodes, remove used links) [c] [(c,e)] -> L [(e,[c])] [f] [(f,e)] -> L [(e,[f])] [] [(e,b)] -> R [(e,b)] => [(e,[c]),(e,[f])] => [(e,b)] Merge fst x snd (merge nodes, remove back links) [(e,[c]),(e,[f])] [] -> L (e,[c,f]) [] [(e,b)] -> R [(e,b)] * => [(e,[c,f])] => [e] => [(e,b)] Step Merge fst x fst (next nodes, remove used links) [e] [(e,b)] => L [(b,[e])] => [(b,[e])] => [] Merge fst x snd (merge nodes, remove back links) [(b,[e])] [] -> L (b,[e]) * => [(b,[e])] => [b] => []
genesSelected :: [Rec ElField '["gene" ::: Text, "step" ::: Int]] genesSelected :: [Rec ElField '["gene" ::: Text, "step" ::: Int, "parents" ::: [Text,[Text]]]]
[X]
Include parents in MFA output (analysesSelected)- output format: gene:relation,relation,…|….
many (delimited '|' --> (delimited ':' -&- many (delimited ','))) genesSelected :: [Rec ElField '["gene" ::: Text, "step" ::: Int]] genesSelected :: [Rec ElField '["gene" ::: Text, "step" ::: Int, "parents" ::: [Text,[Text]]]]
[X]
Updated SVM code to ignore new fields (selectedBase)
3 Wish list
3.1 DONE Replace P with proper Prism
The P
types is actually a Prism' s (Maybe s,a)
and could be replaced with it for seamless interfacing with the
rest of lenses. This would allow such thing as isos and composing with existings lenses.
3.2 DONE Logging system
- logging from pure code so don't polute with Monad
- pass around as a constraint so compiler plumbs it
- ability to supress excessive messages
3.3 DONE Warn about unused keys
3.4 DONE Space separated parser for config.tsv file
3.5 DONE Logging context instead of level limitations
3.6 Revamp of name system
Use reflection system to pass around config, allows base types to be tagged with the assoicated name component as well as making it possible to create Show instances and such
newtype Foo n = Foo Int -- foo type based around names contained in n Lens.makePrisms ''Foo fooIso :: forall g. ( Given g, HasFooIso' g ( ReifiedIso' ByteString (Foo g) ) ) => Iso' ByteString (Foo g) fooIso = Lens.runIso (Reflection.given @ g ^. fooIso') data Names = Names { namesFooIso' :: !(ReifiedIso' ByteString (Foo Names)) } class HasFooIso' s a | s -> a where fooIso' :: Lens' s a Lens.makeFields ''Names instance ( Given g, HasFooIso' g ( ReifiedIso' ByteString (Foo g) ) ) => Show (Foo g) where show = Show.show . Lens.view (Lens.from fooIso) magic :: IO () magic = do iso <- Names.isoCreate Reflection.give (Names (Lens.Iso (Lens.runIso iso . Lens.from _Foo))) $ do IO.print (hi ^. fooIso :: FooNames)
3.7 Replace Text bits using name system
A large part of this falls under the existing Names framework. Need to rework to tie the Names MVar
record with a
skolem variable.
3.8 Further cleanup of input system
3.8.1 Replace Finite with own instance that allows tuples
Idea is to be able to draw relationships between length and types.
newtype ListN (n :: GHC.Nat) a = ListN [a] class Finite v where type Size v :: GHC.Nat type Value v :: * toListN :: v -> ListN (Size v) fromListN :: ListN (Size v) -> v instance Finite (a,a) where type Size (a,a) = 2 type Value (a,a) = a toListN (a0,a1) = ListN [a0,a1] fromListN (ListN [a0,a1]) = (a0,a1) # ... and so on for tuples, records, and such
3.8.2 Expose fold at top of long/short/pull/push
3.9 Handle multiple hops and loops in Pseudonyms file
ADCY3 is both a symbol and a prevsymbol for ADCY8. Are there some with multiple hops? Could there even be circles?
3.10 Don't re-compute entire table when doing SVM label dumping
3.11 Logging ability to add to parents message count
3.12 Split into separate programs for command line composability
3.13 Option of treating aliases as relations instead of duplicating relations
3.14 Graphviz output for output link information
4 Bug to report upstream
4.1 foldFreeT type signature is over restrictive in requiring transformation to hold for all monads
foldFreeT :: (MonadTrans t, Monad (t m), Monad m) => (forall n x. Monad n => f x -> t n x) -> FreeT f m a -> t m a
instead of just the monad in question m (the current definition works fine with the new signature)
foldFreeT :: (MonadTrans t, Monad (t m), Monad m) => (forall x. f x -> t m x) -> FreeT f m a -> t m a
4.2 Iso passed to au is backwards
4.3 L1' Scan prefix1 implementation isn't strict in the sense of no blowing up memory
let sum = L1' id (+) (0+) in foldl' (flip prefix1 ) [1..]
as the above expands too
... ((+) $! ((+) $! ((+) $! 0 1) 2) 3) ...