Guys FYI: params = task.get_parameters_as_dict()
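A minimal sketch of how that could be used, assuming get_parameters_as_dict() returns a nested dict keyed by section (that is my understanding, please verify against your clearml version). The helper names flatten_parameters and fetch_task_parameters are made up for illustration and are not part of clearml:

```python
from typing import Any, Dict


def flatten_parameters(nested: Dict[str, Any], sep: str = "/") -> Dict[str, Any]:
    """Flatten {"Args": {"lr": "0.1"}} into {"Args/lr": "0.1"}."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            # Recurse into sub-sections and prefix their keys with the section name.
            for sub_key, sub_value in flatten_parameters(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def fetch_task_parameters(task_id: str) -> Dict[str, Any]:
    """Hypothetical wrapper: fetch a task by ID and return its parameters flattened."""
    # Imported lazily so flatten_parameters stays usable without clearml installed.
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    return flatten_parameters(task.get_parameters_as_dict())
```

Calling fetch_task_parameters("task UID here") needs a configured clearml setup; flatten_parameters works on any nested dict.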
FYI: these days TB has become the standard even for PyTorch (it is a standalone package), and you can actually import it from torch.
There is an example here:
https://github.com/allegroai/trains/blob/master/examples/frameworks/pytorch/pytorch_tensorboard.py
HealthyStarfish45 did you manage to solve the report_image issue ?
BTW: you also have
https://github.com/allegroai/trains/blob/master/examples/reporting/html_reporting.py
https://github.com/allegroai/trains/blob/master/examples/reporting/...
DepressedChimpanzee34 I cannot find cfg.py here
https://github.com/allegroai/clearml/tree/master/examples/frameworks/hydra/config_files
(or anywhere else)
This works.
great!
So it is still in master and should be included in 1.0.5?
correct, RC will be released soon with this fix included
Is this a logging issue, or a clearml issue?
DrainedOctopus19 if your code is a single file (which was stored on the clearml server), then it is stored on the Task:
from clearml import Task
task = Task.get_task(task_id="task UID here")
# this should be your entire code
print(task.data.script.diff)
PompousBeetle71 a few questions:
Is this like using PyTorch distributed, only manually? Why don't you call trains.init in all the sub processes?
We had a few threads on that, it seems like a recurring question, I'll make sure we have an example on GitHub. Basically trains will take care of passing the arg-parser commands to the sub processes, and also of the torch node settings. It will also make sure they all report to the same experiment. What do you think?
Also, can the image not be pulled from dockerhub but used from the local build instead?
If you have your docker configured to pull from a local artifactory, then the agent will do the same :) (it is calling the docker command just like you do)
agent.default_docker.arguments: "--mount type=bind,source=$DATA_DIR,target=/data"
Notice that you are using the default docker arguments in the example
If you want the mount to always be there use extra_docker_arguments :
https://github.com/...
Hi JitteryCoyote63
Is this close ?
https://github.com/allegroai/clearml/issues/283
Hi StaleHippopotamus38
I imagine I could make the changes specified in the warning to
/etc/security/limits.conf
Yep seems like elastic memory issue, but I think the helm chart takes care of it,
You can see a reference in the docker compose:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L41
We should probably change it so it is more human readable :)
Does the clearml module parse the python packages?
Yes, it analyzes the installed packages based on the actual imports you have in the code.
If I'm using a private pypi artifact server, would I set the PIP_INDEX_URL on the workers so they could retrieve those packages when that experiment is cloned and re-ran?
Correct :) the agent basically calls pip install on those packages, so if you configure it with PIP_INDEX_URL it should just work like any other pip install
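To make the PIP_INDEX_URL point concrete, here is a rough sketch of what "just another pip install" looks like from Python. This is not the agent's actual code; build_pip_install and the example index URL are made up for illustration. The only thing it relies on is that pip reads the PIP_INDEX_URL environment variable:

```python
import os
import sys
from typing import Dict, List, Mapping, Optional, Tuple


def build_pip_install(
    packages: List[str],
    index_url: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return the pip command and the environment pointing pip at a private index."""
    # Copy the environment so the caller's (or the process') env is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["PIP_INDEX_URL"] = index_url
    cmd = [sys.executable, "-m", "pip", "install", *packages]
    return cmd, env


# Usage (hypothetical index URL), actually running the install:
#   import subprocess
#   cmd, env = build_pip_install(["mypkg==1.0"], "https://pypi.example.com/simple")
#   subprocess.run(cmd, env=env, check=True)
```

Setting the variable in the worker's environment before starting the agent should have the same effect, since child processes inherit it.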
and the agent default runtime mode is docker correct?
Actually the default is venv mode, to run in docker mode add --docker to the command line
So I could install all my system dependencies in my own docker image?
Correct, inside the docker it will inherit all the preinstalled packages, but it will also install any missing ones (based on the Task requirements, i.e. the "installed packages" section)
Also what is the purpose of the aws block in the clearml.c...
How does a task specify which docker image it needs?
Either in the code itself 'task.set_base_docker' or with the CLI, or set it in the UI when you clone an experiment (everything becomes editable)
Hi VirtuousFish83 ,
Is it throwing an exception? Are you seeing the plot in the UI but the title is incorrect?
Then the type hints are not removed from helper and the code immediately crashes when being run
Oh yes I see your point, that does make sense (btw removing the type hints will solve the issue)
regardless let me make sure this is solved
That is a good question ... let me check :)
LOL totally :)
Let me try to add some color to this package analysis process.
Basically clearml will try to statically analyze the code (i.e. look for import/from packages)
Then it will list them in a pip requirements.txt format under installed packages.
When running inside conda environment, it will check which packages were installed via "conda install" (instead of pip install) and mark them internally. This process ensures that when the clearml-agent is running with conda package manager, it "knows" whic...
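For intuition, the "look for import/from" step can be sketched with the standard ast module. This is only an illustration of the idea, not clearml's actual implementation; top_level_imports is a made-up name, and a real tool would also drop standard-library modules and map module names to pip package names and versions:

```python
import ast
from typing import Set


def top_level_imports(source: str) -> Set[str]:
    """Collect the top-level module names imported by a piece of Python source."""
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" -> "a"
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import ("from . import x"), i.e. local code.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules
```

Because the analysis is static, packages that are installed but never imported do not show up, and imports hidden behind dynamic loading would be missed by a sketch like this.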
No worries :) glad it worked
Hi SparklingElephant70
Anyone know how to solve?
I tried git push before,
Can you send the entire log? Could it be that the requested commit ID does not exist on the remote git (for example force push deleted it) ?
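One way to check the commit question locally is to ask git which remote-tracking branches contain the commit. A small sketch, assuming git is on the PATH and you ran git fetch first; both function names are made up for illustration:

```python
import subprocess
from typing import List


def parse_branch_list(output: str) -> List[str]:
    """Parse `git branch -r` style output, skipping symbolic refs like 'origin/HEAD -> ...'."""
    branches = []
    for line in output.splitlines():
        name = line.strip()
        if name and "->" not in name:
            branches.append(name)
    return branches


def remote_branches_containing(commit: str, repo_dir: str = ".") -> List[str]:
    """Return remote-tracking branches that contain `commit` (empty if none or unknown)."""
    result = subprocess.run(
        ["git", "branch", "-r", "--contains", commit],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    # git exits non-zero when the commit is not known to the local clone at all.
    if result.returncode != 0:
        return []
    return parse_branch_list(result.stdout)
```

An empty list suggests the commit never reached the remote, or was dropped by a force push.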
CleanPigeon16 , just making sure, docker is installed and configured on the host machine (i.e. Azure machine)?
A few examples here:
Grafana model performance example:
browse to
login with: admin/admin
create a new dashboard
select Prometheus as data source
Add a query: 100 * increase(test_model_sklearn:_latency_bucket[1m]) / increase(test_model_sklearn:_latency_sum[1m])
Change type to heatmap, and select on the right hand-side under "Data Format" s...
BTW: there is still the bug with the env merging, correct ?
Hi CleanPigeon16
I think now the issue is missing git credentials, did you pass git_user / git_pass to the AWS autoscaler ?
Can you please tell me how to return the folder where the script should run?
add it to the python path
PYTHONPATH="/src/project"
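If you are launching the script from Python rather than a shell, the same thing can be sketched like this. env_with_pythonpath is a made-up helper and "/src/project" is just the path from the line above:

```python
import os
from typing import Dict, Mapping, Optional


def env_with_pythonpath(
    extra_path: str, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with `extra_path` prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Keep any existing entries; os.pathsep is ":" on Linux/macOS and ";" on Windows.
    env["PYTHONPATH"] = extra_path if not existing else os.pathsep.join([extra_path, existing])
    return env


# Usage:
#   import subprocess, sys
#   subprocess.run([sys.executable, "script.py"], env=env_with_pythonpath("/src/project"))
```

Prepending rather than overwriting keeps whatever was already on PYTHONPATH importable.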
Hi PerplexedRaccoon19
On debugging, it looks like indices are corrupt.
ishhhhh, any chance you have a backup?
Hi PanickyMoth78
I had several pipeline components getting it and uploading files to it concurrently.
Should not be a problem
I've attached its log file which only mentions skipping one file (a warning)
So what exactly is the error you are getting?
Hi WickedElephant66
Setting the pipeline controller with pipeline_execution_queue as None
is actually launching the pipeline controller on your "dev" machine, not sure why you have two of them?