LovelyHamster1 NICE! 👍
Hi LivelyLion31 I missed your S3 question, apologies. What did you guys end up doing?
BTW you could always upload the entire TB log folder as an artifact, it's as simple as:
task.upload_artifact('tensorboard', './tblogsfolder')
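For instance, a minimal sketch (project/task names are placeholders, the folder is the one from the line above):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="tb-logs-demo")

# ... training code writing TensorBoard event files into ./tblogsfolder ...

# upload the whole folder as a single artifact attached to this Task
task.upload_artifact("tensorboard", artifact_object="./tblogsfolder")
```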
gm folks, really liking ClearML so far as my top choice (after looking at DVC and MLflow), and thank you for your help here!
Thanks HurtWoodpecker30 !
Is there a recommended workflow to be able to "drop into" the exact env (code, venv, data) of a previous experiment (which may have been several commits ago), to reproduce that experiment?
You can use clearml-agent on your local machine to build the env of any Task:
clearml-agent build --id <ta...
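Spelled out, it would look something like this (the --target folder is just an example):
```
clearml-agent build --id <task-id> --target ~/task_env
```
This rebuilds the Task's code and Python environment into that local folder, so you can drop straight into it.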
oh...so is this a bug?
It was always a bug, only an elusive one 😉
Anyhow, I'll make sure we push a fix to GitHub; an RC is planned for later this week and it will contain the fix.
Right! I just noticed that! This is odd... and yes, it definitely has something to do with the multi-pipeline executed on the agent. I think I know what to look for...
(just making sure (again), running_locally produced exactly what we were expecting, is that correct?)
Hi @<1549202366266347520:profile|GorgeousMonkey78>
how do I integrate SageMaker with ClearML,
you mean to launch an experiment, or just to log it?
- try with the latest RC, 1.8.1rc2
it feels like after git clone, it spends minutes without outputting anything
yeah that is odd, can you run the agent with --debug (add it before the daemon command), and then at the end of the command add --foreground
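For example (the queue name is a placeholder):
```
clearml-agent --debug daemon --queue default --foreground
```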
Now launch the same task on that queue, you will have a verbose log in the console.
Let us know what you see
SmarmySeaurchin8 I might be missing something in your description. The way the pipeline works, the Tasks in the DAG are pre-executed (either with execute_remotely or actually fully executed once).
The DAG nodes themselves are executed by the trains-agent, which means it reproduces the code/env for every cloned Task in the DAG (the original Tasks are never re-run).
WDYT?
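For reference, a rough sketch of that pattern with the current PipelineController API (project, task and queue names are placeholders):
```
from clearml.automation import PipelineController

# each step points at a pre-executed "template" Task; at runtime the agent
# clones that Task and runs the clone, leaving the original untouched
pipe = PipelineController(name="example-pipeline", project="examples", version="1.0.0")
pipe.add_step(name="prepare", base_task_project="examples", base_task_name="prepare data")
pipe.add_step(name="train", parents=["prepare"],
              base_task_project="examples", base_task_name="train model")
pipe.start(queue="default")
```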
I keep getting a "failed getting token" error
MiniatureCrocodile39 what's the server you are using?
Hi FunnyTurkey96
Any chance you can try to run with the latest from GitHub? (I just tested your code and it seemed to work on my machine.)
pip install git+
Hi CrookedWalrus33
docker_setup_bash_script=["export PATH=/workspace/miniconda/bin:$PATH"]
Oh I think you are correct, this should do the trick:
docker_setup_bash_script=["export PATH=/workspace/miniconda/bin:$PATH", "export LOCAL_PYTHON=/workspace/miniconda/bin/python3"]
This will make sure both the agent and the script execute with the same Python; see the sketch below.
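Putting it together, a sketch of where that list plugs in (the image name and queue are hypothetical):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="conda-in-docker")
# run inside a container that already ships miniconda under /workspace/miniconda
task.set_base_docker(
    docker_image="my-registry/conda-image:latest",  # hypothetical image
    docker_setup_bash_script=[
        "export PATH=/workspace/miniconda/bin:$PATH",
        "export LOCAL_PYTHON=/workspace/miniconda/bin/python3",
    ],
)
task.execute_remotely(queue_name="default")
```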
but to run a script inside a docker which already has the environment built in.
If this is already activated, the latest agent w...
TroubledHedgehog16
but doesn't run when I deploy it using clearml. Here's the log of the error:
...
My guess is that clearml is re-importing keras somewhere, leading to circular dependencies.
It might not be circular, but I would guess it does have something to do with the order of imports. I'm trying to figure out what the difference would be between a local run and running with an agent.
Is it the exact same TF version?
this is not the case as all the scalars report the same iterations
MassiveHippopotamus56 could it be the machine statistics? (i.e. cpu/gpu etc.; these are considered scalars as well...)
Could it be that something else is missing and hence the import fails?
Ohh, I see now: the force-SSH did not replace the user in the SSH link (only if the original was http), right?
looks like service-writing-time for me!
Nice!
persist/restore state so that tasks are restartable?
You mean if you write preemption-ready training code?
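i.e. roughly this shape, where the process can be killed and relaunched at any point (a generic sketch, not ClearML-specific; names are placeholders):
```
import json
import os

CKPT = "state.json"  # placeholder checkpoint file

# restore: pick up from the last persisted step, if any
start = 0
if os.path.exists(CKPT):
    with open(CKPT) as f:
        start = json.load(f)["step"] + 1

for step in range(start, 100):
    # ... one resumable unit of work (e.g. a training epoch) ...
    # persist: record progress so a restart resumes here
    with open(CKPT, "w") as f:
        json.dump({"step": step}, f)
```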
# two series on the same "loss" graph, sharing one iteration axis
logger.report_scalar(title="loss", series="train", iteration=0, value=100)
logger.report_scalar(title="loss", series="test", iteration=0, value=200)
Hi JollyChimpanzee19
I found this one:
https://clearml.slack.com/archives/CTK20V944/p1622134271306500
BTW: if you make the right column the baseline (i.e. move it to the left), you will get what you probably expected
I would do something like:
```
from clearml import Logger

def forward(self, ...):
    self.iteration += 1
    weights = self.compute_weights(...)
    m = (weights * (target - preds)).mean()
    # report the mean weight as a scalar on the "debug" graph
    Logger.current_logger().report_scalar(
        title="debug", series="mean_weight", value=m, iteration=self.iteration)
    return m
```
Ohh, then yes, you can use the Monitor class ( https://github.com/allegroai/clearml/blob/bd110aed5e902efbc03fd4f0e576e40c860e0fb2/clearml/automation/monitor.py#L10 ) to monitor changes in the dataset/project
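A rough sketch of subclassing it (I'm going from memory on the hook names; double-check the linked monitor.py for the exact API):
```
from clearml.automation.monitor import Monitor

class ProjectMonitor(Monitor):
    # assumed per-Task hook: called for each newly finished Task matching the filters
    def process_task(self, task):
        print(f"Change detected: {task.name} ({task.id})")

ProjectMonitor().monitor()  # polls the server periodically
```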
Is this still an issue? (If you provide a queue name, the default tag is not used, so no error should be printed.)
Hi @<1523702932069945344:profile|CheerfulGorilla72>
Please tell me what RAM metric is tracked by ClearML?
"Free RAM" is the entire machine's free RAM.
Yeah, htop shows odd numbers as it doesn't "count" allocated buffers.
specifically you can see the code here:
None
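For intuition, an illustration with psutil (not necessarily the exact code behind the lost link above):
```
import psutil

vm = psutil.virtual_memory()
# "free" excludes buffers/cache, while "available" adds back reclaimable memory;
# the gap between the two is why htop-style numbers can look odd
print(f"free:      {vm.free / 1e9:.2f} GB")
print(f"available: {vm.available / 1e9:.2f} GB")
```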
I get a popup saying that the actual files weren’t deleted from S3 (so presumably only the metadata on the server gets deleted).
Hi QuaintPelican38
The browser client actually issues the delete "command" (the idea is separation of the meta-data and the data, e.g. artifacts). That means you have to provide the key/secret to the UI (see the profile page).
Great, you can test directly from the master 🙂
pip3 install -U git+
The release was supposed to be out this week but got delayed by a py2 support issue. Anyhow, the release will be almost exactly like the latest we now have on the GitHub repo (and I'm assuming it will be out just after the weekend).
Hmmm maybe
I thought that was expected behavior from poetry side actually
I think this is the expected behavior, hence not a bug?!