I actually put all the commands in a script. The failure mode is exactly the same. I have no idea what to do next.
```bash
#!/bin/bash
# Use the first argument as the ClearML root directory, defaulting to /datadrive1
clearml_root=$1
if [[ $# -gt 0 ]]; then
  echo "Using $1 as root"
else
  echo "No root argument was provided, using /datadrive1"
  clearml_root=/datadrive1
fi
clearml="$clearml_root/clearml"
# Start from a clean slate, then recreate the data directories
rm -R "$clearml"
mkdir -p "$clearml"/data/elastic_7
mkdir -p "$clearml"/data/mongo_4/db
mkdir -p "$clearml"/data/mongo_4/configdb
mkdir -p "$clearml"/data/redis
mkdir -p "$cl...
```
Thanks AppetizingMouse58. I managed to fix it by removing Docker completely and reinstalling it.
I didn't add that to the script since the effect is persistent (i.e. it only needs to be done once, right?). In any case, I checked it multiple times and it was as expected.
Hi AgitatedDove14, I deleted everything in /opt/clearml as per the docs. Should I delete anything else?
Try updating to 1.1.0?
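Hi AgitatedDove14,
this is what our calls look like:
```python
from pytorch_lightning.loggers import TensorBoardLogger

# `data` is a pandas DataFrame with a `by` column (defined elsewhere).
# `version=1` is an assumption: the original passed a bare positional `1`
# after keyword arguments, which is a SyntaxError in Python.
logger = TensorBoardLogger(save_dir=".", name="debug plotting", version=1)
logger.experiment.add_histogram("A", data[data.by == 0])
logger.experiment.add_histogram("B", data[data.by == 1])
```
The result is shown in my post above.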
This is some test data, and how we'd like things to look:
```python
def make_data(size: int = 10000, n: int = 5) -> pd.DataFrame:
    x = np.abs(np.random.normal(siz...
```
Hi Martin, to expand on my previous comments: the template for `_Driver` already exists; I'm suggesting making it public. Consequently, `StorageHelper` should accept a `driver` parameter in `__init__`, defaulting to `None`. Only when its value is not provided by the user should the library go out of its way to do the right thing and check all the known storage providers, fetch credentials, and so on: stuff that will not work for most users, most of the time (even if you ...
Hi Jake, thanks for the reply. I've tried the account key method and it works fine, but unfortunately clearml expects an old version of `azure-storage-blob` (<2.1), which is incompatible with recent versions (^12). Any clues on how we could work around this? Thanks again.
Yeah, I experienced the same issue. Training stops/freezes at the end of the 10th or 15th epoch. I'm using `pytorch_lightning` as well.
True, the Hyperparameters tab does show the environment (sorry, my bad). The repo information and the uncommitted changes don't show up for me 😞. What version did you use?
OK, I can confirm that those display fine for me too. My problem is that they don't for my experiments, which is what I care about.
I apologize 😳. It turns out I have all those things. Again, my apologies 😞.
Hi again. After looking into the matter a little, I realise I'd have liked the option of using a `StoreManager` ABC, which I would implement myself using whatever storage provider I happen to use and whatever package versions happen to support it. To put it differently, instead of you implementing managers for gcs, azure, aws, etc., it would be a much nicer alternative (for me, and I suspect eventually for you too) for clearml's store manager to wrap whatever object the user pr...
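A sketch of what such a user-implementable ABC could look like. The `StoreManager` class here is hypothetical (not part of clearml); its method names loosely mirror clearml's `StorageManager.upload_file` and `get_local_copy`, but the signatures are assumptions. `LocalDirStoreManager` is an illustrative implementation that "uploads" into a local directory, standing in for whatever provider SDK a user would wrap.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StoreManager(ABC):
    """Hypothetical user-implementable storage interface (not part of clearml)."""

    @abstractmethod
    def upload_file(self, local_file: str, remote_url: str) -> str:
        """Store local_file under remote_url and return the URL it was stored at."""

    @abstractmethod
    def get_local_copy(self, remote_url: str) -> str:
        """Return a local path containing the object at remote_url."""


class LocalDirStoreManager(StoreManager):
    """Example implementation backed by a plain directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def _path_for(self, remote_url: str) -> Path:
        # "mystore://models/a.pt" -> <root>/models/a.pt (the scheme is ignored)
        _, _, key = remote_url.partition("://")
        return self.root / key

    def upload_file(self, local_file: str, remote_url: str) -> str:
        target = self._path_for(remote_url)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_file, target)
        return remote_url

    def get_local_copy(self, remote_url: str) -> str:
        path = self._path_for(remote_url)
        if not path.is_file():
            raise FileNotFoundError(remote_url)
        return str(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "weights.bin"
        src.write_bytes(b"\x00\x01")
        store = LocalDirStoreManager(str(Path(tmp) / "store"))
        url = store.upload_file(str(src), "mystore://models/weights.bin")
        print(store.get_local_copy(url))
```

The library would then only depend on the abstract interface, leaving provider SDKs and their version pins to the user.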
If we decide to go forward with clearml, we'll probably do just that 🙂
Interesting, I don't get newlines in any of my consoles:
```
ClearML Task: overwriting (reusing) task id=38cc10401fcc43cfa432b7ceed7df0cc
2021-10-08 14:57:53,704 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page:
```
...
Hi Martin, it is a tqdm parameter (the default `ProgressBar` in pytorch lightning unfortunately relies on tqdm). This is from the tqdm docs:
```
dynamic_ncols  : bool, optional
    If set, constantly alters `ncols` and `nrows` to the
    environment (allowing for window resizes) [default: False].
nrows  : int, optional
    The screen height. If specified, hides nested bars outside this
    bound. If unspecified, attempts to use environment...
```
```
# Development mode worker
worker {
    # Status report period in seconds
    report_period_sec: 2
    # ping to the server - check connectivity
    ping_period_sec: 30
    # Log all stdout & stderr
    log_stdout: true
    # Carriage return (\r) support. If zero (0) \r treated as \n and flushed to backend
    # Carriage return flush support in seconds, flush consecutive line feeds (\r) every X (default: 10) s...
```
```
❯ grep flush ~/clearml.conf
# Carriage return (\r) support. If zero (0) \r treated as \n and flushed to backend
# Carriage return flush support in seconds, flush consecutive line feeds (\r) every X (default: 10) seconds
console_cr_flush_period: 600
```
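The carriage-return settings above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not clearml's console-capture code, and all function names are hypothetical: a terminal lets each tqdm redraw (prefixed by `\r`) overwrite the previous one, treating `\r` as `\n` turns every redraw into its own line, and a non-zero flush period is modelled as keeping only the latest redraw per time window.

```python
from typing import Dict, List, Sequence, Tuple


def render_like_terminal(raw: str) -> List[str]:
    """Simplified terminal: each carriage return rewinds to the start of the line,
    so only the last segment of every line stays visible (assumes padded redraws)."""
    visible = (line.split("\r")[-1] for line in raw.split("\n"))
    return [line for line in visible if line]


def render_cr_as_newline(raw: str) -> List[str]:
    """Treat every carriage return as a newline, so every redraw becomes its own line."""
    return [seg for seg in raw.replace("\r", "\n").split("\n") if seg]


def coalesce_cr_updates(updates: Sequence[Tuple[float, str]], period: float) -> List[str]:
    """Keep only the latest redraw in each `period`-second window.

    `updates` are (timestamp_seconds, text) pairs in time order; period <= 0
    disables coalescing, mirroring the "treated as newline" case."""
    if period <= 0:
        return [text for _, text in updates]
    latest: Dict[int, str] = {}
    for t, text in updates:
        latest[int(t // period)] = text  # later redraws in the same window win
    return [latest[w] for w in sorted(latest)]


if __name__ == "__main__":
    raw = "\r  0%|\r 50%|\r100%|\nDone\n"
    print(render_like_terminal(raw))  # ['100%|', 'Done']
    print(render_cr_as_newline(raw))  # ['  0%|', ' 50%|', '100%|', 'Done']
```

Under this model, `console_cr_flush_period: 600` would keep at most one redraw per ten-minute window.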
I don't control tqdm (otherwise I would have already gone for Stef's suggestion); pytorch-lightning does in this particular script 😞.
The UI shows the log as is (and as pasted above). In the console I'm getting correct output (a single tqdm progress line):
```
[2021-09-17 13:29:51,860][pytorch_lightning.utilities.distributed][INFO] - GPU available: True, used: True
[2021-09-17 13:29:51,862][pytorch_lightning.utilities.distributed][INFO] - TPU available: False, using: 0 TPU cores
[2021-09-17 13:29:51,862][pytorch_lightning.utilities.distributed][INFO] - IPU available: False, using: 0 IPUs
[2021-09-17 13:29:51,866][pytorch_ligh...
```
Hi SweetBadger76, thanks, I think I've made it work. The main point of confusion was the different types of `Task` objects (i.e. the `clearml.backend_api.services.v2_13.tasks.Task` objects returned by `get_all`, which don't have any of those methods).
Interestingly, `set_parameters` didn't just work as expected: I had to flatten the dicts myself (which clearml apparently does on its own when I call `set_parameters` on a new task).
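The manual flattening mentioned above can be done with a small recursive helper. This is a sketch assuming the `Section/name` key convention ClearML uses for hyperparameters (e.g. `Args/epochs`); `flatten_params` is a hypothetical name, and whether deeper nesting should also be joined with `/` is an assumption worth checking against the UI.

```python
from typing import Any, Dict, Mapping


def flatten_params(params: Mapping[str, Any], prefix: str = "", sep: str = "/") -> Dict[str, Any]:
    """Flatten nested dicts into 'a/b/c'-style keys (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in params.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            # Recurse into sub-sections, extending the key path.
            flat.update(flatten_params(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


if __name__ == "__main__":
    nested = {"General": {"lr": 0.001, "optim": {"name": "adam", "beta1": 0.9}}}
    print(flatten_params(nested))
    # {'General/lr': 0.001, 'General/optim/name': 'adam', 'General/optim/beta1': 0.9}
```

The flat dict is what would then be passed to `set_parameters` on an existing task.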
Thank you all. 🙏
Also, I just tried the pytorch-lightning `RichProgressBar` (not yet released) instead of the default (which is unfortunately based on tqdm), and it works great.
Not sure how to check that, tbh. Does this help?
```
root@aea5d96a8ed3:/usr/agent# clearml-agent --version
CLEARML-AGENT version 1.0.0
```
It would be nice to display this info somewhere in here:
Sorry, I meant the "origin" part. The warning is gone now.
```
❯ git remote show
github
```