` # Development mode worker
worker {
    # Status report period in seconds
    report_period_sec: 2
    # ping to the server - check connectivity
    ping_period_sec: 30
    # Log all stdout & stderr
    log_stdout: true
    # Carriage return (\r) support. If zero (0) \r treated as \n and flushed to backend
    # Carriage return flush support in seconds, flush consecutive line feeds (\r) every X (default: 10) s...
That works fine:
` 1631895370729 vm-aimbrain-01 info ClearML Task: created new task id=cfed3ea8512d4d9f858d085bd79e62e8
2021-09-17 16:16:10,744 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page:
1631895370892 vm-aimbrain-01 info start
1631895370896 vm-aimbrain-01 error 0%| | 0/100 [00:00<?, ?it/s]
1631895471026 vm-aimbrain-01 error 100%|████...
I actually put all the commands in a script. The failure mode is exactly the same. I have no idea what to do next.
` #!/bin/bash
clearml_root=$1
if [[ $# -gt 0 ]]; then
    echo "Using $1 as root"
else
    echo "No root argument was provided, using /datadrive1"
    clearml_root=/datadrive1
fi
clearml="$clearml_root/clearml"
rm -R "$clearml"
mkdir -p "$clearml"/data/elastic_7
mkdir -p "$clearml"/data/mongo_4/db
mkdir -p "$clearml"/data/mongo_4/configdb
mkdir -p "$clearml"/data/redis
mkdir -p "$cl...
I found out that the lightning trainer has a progress_bar_refresh_rate argument (default 1), which produces the log spam. If I set it to 10, I get 1/10th of the spam (but a janky progress bar in the console). I could set it to 0 to disable it, but that's not really a fix. What I'd really want is the same behaviour in the console (one smooth progress bar) and one line per epoch in the logs; high hopes, right? 😊
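For later reference, the knob in question looks like this (a sketch against the pytorch-lightning 1.x Trainer API, where progress_bar_refresh_rate still exists; the value is arbitrary):
` from pytorch_lightning import Trainer

# refresh the bar only every 10 batches -> 1/10th of the logged lines
trainer = Trainer(progress_bar_refresh_rate=10)

# trainer = Trainer(progress_bar_refresh_rate=0)  # disables the bar entirely `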
ah nice, I'll try auto_connect_frameworks (probably with {'joblib': False} ? - we don't use scikit-learn)
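For the record, the shape of that call (a sketch; project/task names are placeholders, and per my later message 'scikit' turned out to be the key that did the trick):
` from clearml import Task

task = Task.init(
    project_name="my-project",   # placeholder
    task_name="my-experiment",   # placeholder
    # disable a single framework binding, keep everything else automatic
    auto_connect_frameworks={"scikit": False},
) `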
Thanks AppetizingMouse58 . I managed to fix it by removing docker completely and reinstalling it.
Sorry to ping you SuccessfulKoala55 , can you offer any ideas on the two questions from my reply (about the correct web app cloud access and the correct way to specify a blob storage in the clearml.conf file)? Thanks 🙏
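For anyone finding this later, the azure section in clearml.conf is shaped roughly like this (a sketch from memory; all values are placeholders and the exact keys should be checked against the clearml.conf template shipped with the SDK):
` sdk {
    azure.storage {
        containers: [
            {
                account_name: "myaccount"      # placeholder
                account_key: "mykey"           # placeholder
                container_name: "mycontainer"  # placeholder
            }
        ]
    }
} `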
The UI shows the log as is (and as pasted above). In the console I'm getting correct output (a single tqdm progress line):
` [2021-09-17 13:29:51,860][pytorch_lightning.utilities.distributed][INFO] - GPU available: True, used: True
[2021-09-17 13:29:51,862][pytorch_lightning.utilities.distributed][INFO] - TPU available: False, using: 0 TPU cores
[2021-09-17 13:29:51,862][pytorch_lightning.utilities.distributed][INFO] - IPU available: False, using: 0 IPUs
[2021-09-17 13:29:51,866][pytorch_ligh...
In case anyone is interested, the minimum effort workaround I found is to edit pytorch_lightning/callbacks/progress.py and change all occurrences of dynamic_ncols=True to dynamic_ncols=False in the calls to tqdm . One could of course implement a custom callback inheriting from their ProgressBar class.
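For the callback route, something along these lines should work (a sketch against the pytorch-lightning 1.x ProgressBar API; only the training bar is shown, the other init_*_tqdm methods would need the same treatment):
` from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ProgressBar


class StaticWidthProgressBar(ProgressBar):
    """Progress bar whose tqdm instances keep a fixed width."""

    def init_train_tqdm(self):
        bar = super().init_train_tqdm()
        # approximates passing dynamic_ncols=False to the tqdm constructor:
        # stop re-measuring the terminal width on every refresh
        bar.dynamic_ncols = False
        return bar


trainer = Trainer(callbacks=[StaticWidthProgressBar()]) `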
The template appears to be `<alias> <url> <fetch|push>`.
The .git/config file has sections for each remote too. Example:
` [remote "github"]
    url = git@github.com:biocatchltd/volt.git
    fetch = +refs/heads/*:refs/remotes/github/* `
Would be nice to report which remote the checked-out branch actually tracks.
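Until then, the tracked remote can be read off git directly (plain git, nothing ClearML-specific; the branch in the output is made up):
` ❯ git rev-parse --abbrev-ref @{upstream}
github/master `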
This is what a configuration item looks like:
` <tasks.ConfigurationItem: { "name": "filter", "value": "inference = [{\n type = \"StreamFilter\"\n params {\n context = \"full\"\n op = \"or\"\n lower_bounds {\n key = 16\n mouse = 32\n }\n }\n }]\ntrain {\n users {\n op = \"and\"\n lower_bounds {\n min_sessions = 32\n }\n }\n}", "type": "dictionary" }> `
The value is a string that prints pretty but I'm not sure how to p...
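The value looks like HOCON, so if parsing (rather than just printing) is the goal, something like pyhocon might do it (an assumption on my part; pyhocon is a separate package, and config_item stands for the object above):
` from pyhocon import ConfigFactory  # pip install pyhocon

config = ConfigFactory.parse_string(config_item.value)  # config_item: hypothetical handle
print(config["train"]["users"]["lower_bounds"]["min_sessions"])  # -> 32 `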
This is what the links to the artifacts look like (the part I blurred out is the last part of the secret, which is working fine since the task was able to upload those correctly to storage, I can check that):
Unfortunately there doesn't seem to be any out-of-the-box functionality for ridgeline plots (joyplots) in plotly. They are certainly doable ( https://www.python-graph-gallery.com/ridgeline-graph-plotly , or https://chart-studio.plotly.com/~empet/14632/plotly-joyplotridgelines/#/ ) but I'd guess this won't happen any time soon 🤭 . We'd also be happy with functionality similar to the one in the Scalars tab: first isolating one iteration (the latest by default) and grouping togeth...
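In the meantime, the DIY approach from those links is a stack of horizontal half-violins (a minimal sketch with made-up data):
` import numpy as np
import plotly.graph_objects as go

fig = go.Figure()
for i in range(5):
    samples = np.random.normal(loc=i, scale=1.0, size=200)  # stand-in data
    fig.add_trace(go.Violin(
        x=samples,
        name=f"iteration {i}",  # one ridge per iteration
        side="positive",        # draw only the upper half
        orientation="h",
        width=3,                # let the ridges overlap a little
        points=False,
    ))
fig.update_layout(xaxis_zeroline=False, showlegend=False)
fig.show() `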
I apologize 😳 . Turns out I have all those things. Again, my apologies 😞 .
Also I just tried the pytorch-lightning RichProgressBar (not yet released) instead of the default (which is unfortunately based on tqdm) and it works great.
If we decide to go forward with clearml we'll probably do just that 🙂
'scikit' worked nicely, thanks again
I'll let you know asap
` ✦ ❯ git remote -v
github  git@github.com:biocatchltd/volt.git (fetch)
github  git@github.com:biocatchltd/volt.git (push) `
Hi AgitatedDove14 , I deleted everything in /opt/clearml as per the docs. Should I delete anything else?
Hi Jake, thanks for the reply. I've tried the account key method, works fine - but unfortunately clearml expects an old version of azure-storage-blob (<2.1), which is incompatible with the recent versions (^12.). Any clues on how we could work around this one? Thanks again.
Hi SweetBadger76 , thanks, I think I've made it work. The main point of confusion was the different types of Task objects (i.e. clearml.backend_api.services.v2_13.tasks.Task returned by get_all, which don't have any of those methods).
Interestingly, set_parameters didn't just work as expected; I had to flatten the dicts myself (which clearml apparently does on its own when I call set_parameters on a new task).
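For reference, the flattening amounts to something like this (a sketch; the "/" separator is my assumption about how clearml joins nested keys, and task/params are stand-ins):
` def flatten(d, prefix=""):
    """Turn {'a': {'b': 1}} into {'a/b': 1} (illustrative helper, not clearml API)."""
    flat = {}
    for key, value in d.items():
        full_key = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


task.set_parameters(flatten(params)) `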
Thank you all. 🙏
I don't control tqdm, (otherwise I would have already gone for Stef's suggestion) - pytorch-lightning does in this particular script 😞 .
Yeah, I experienced the same issue. Training stops/freezes at the end of the 10th or 15th epoch. Using pytorch_lightning as well.
New to lightning too, but I'm suspecting that since your args don't mention a specific logger, the pl trainer will instantiate the default one. Excerpt from the trainer docstring:
` logger: Logger (or iterable collection of loggers) for experiment tracking. A ``True`` value uses the default ``TensorBoardLogger``. ``False`` will disable logging. If multiple loggers are provided and the `save_dir` property of that logger is not set, local files (check...
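So passing one explicitly should make it deterministic (a sketch against the pl 1.x API; the save_dir value is a placeholder):
` from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger

trainer = Trainer(logger=TensorBoardLogger(save_dir="lightning_logs"))  # explicit default
# trainer = Trainer(logger=False)  # or disable logging entirely `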
Must be something else foul at play here...
