Check the log to see exactly where torch was downloaded from. Just making sure it used the right repository and did not default to pip, where it might have gotten a CPU-only version...
JitteryCoyote63 any chance you have a log of the failed torch 1.7.0 ?
okay that's good, that means the agent could run it.
Now it is a matter of matching the TF version with CUDA (and there is no easy solution for that). Basically I think what you need is "nvidia/cuda:10.2-cudnn7-runtime-ubuntu16.04"
GrievingTurkey78 short answer no 😞
Long answer: the files are stored as differential sets (think change sets relative to the previous version(s)). The collection of files is then compressed and stored as a single zip. The zip itself can be stored on Google, but on their object storage (GCS), not Google Drive. Notice that the default storage for clearml-data is the clearml-server, that said you can always mix and match (even between versions).
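For example, a minimal sketch (project, dataset and bucket names are placeholders) of creating a new version and pointing its storage at GCS instead of the clearml-server:
```python
from clearml import Dataset

# create a new version on top of an existing one (parent id is a placeholder)
ds = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="datasets",
    parent_datasets=["<parent_version_id>"],
)
ds.add_files("./local_data_folder")
# upload the compressed change-set to GCS instead of the default fileserver
ds.upload(output_url="gs://my-bucket/clearml-datasets")
ds.finalize()
```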
The second problem that I am running into now, is that one of the dependencies in the package is actually hosted in a private repo.
Add your private repo to the extra index section in the clearml.conf:
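Something along these lines in clearml.conf (a sketch; the index URL is a placeholder for your private index):
```
agent {
    package_manager {
        # extra PyPI index url(s) the agent will pass to pip
        extra_index_url: ["https://my.private.pypi/simple"]
    }
}
```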
Hi @<1695969549783928832:profile|ObedientTurkey46>
Why do tags only show on a version level, but not on the dataset-level? (see images)
Tags of datasets are tags on "all the dataset versions" i.e. to help someone locate datasets (think locating projects as an analogy). Dataset Version tags are tags on a specific version of the dataset, helping users to locate a specific version of the dataset. Does that make sense ?
Hi SourSwallow36
What do you mean by "Log each experiment separately"? How would you differentiate between them?
OutrageousGrasshopper93 could you send an example of the two links from the artifacts (one local one remote) ?
Hi JealousParrot68
clearml tracking of experiments run through kedro (similar to tracking with mlflow)
That's definitely very easy. I'm still not sure how Kedro scales on clusters, though. From what I saw, and I might have missed it, it seems more like a single instance with sub-processes, with no real ability to set up a different environment for the different steps in the pipeline, is this correct ?
I think the challenge here is to pick the right abstraction matching. E.g. should a node in kedro (w...
Different question. How can I pass PYTHONPATH env variable to a task, run by agent (so python can find classes inside m subdirectories)?
Hi HelpfulHare30
By default the working directory will be added to the python path. This means that if, under Execution, you have Working Dir: "." and Script: "src/script.py",
the root of the git repo will be added to the python path.
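As an illustration only (a sketch; "libs" and "my_subpackage" are placeholder names), since the repo root ends up on the python path you can also extend sys.path at the top of the entry script instead of relying on a PYTHONPATH env var:
```python
# src/script.py -- the repo root is already on the python path when Working Dir is "."
import os
import sys

# also make <repo_root>/libs importable (placeholder folder name)
repo_root = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
sys.path.insert(0, os.path.join(repo_root, "libs"))

from my_subpackage.utils import helper  # placeholder import, resolvable when run by the agent
```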
BTW: in the next RC you will be able to pass a flag to the agent to always add the git repo root to the python path
yey 🙂 notice that when executed by the agent, the call to execute_remotely is skipped, and so is the if statement I added (since running_locally will return False when the process is executed by the agent)
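Roughly the pattern in code (a sketch; project/queue names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")

if Task.running_locally():
    # enqueue the task and exit the local process;
    # when the agent executes the task this call is skipped
    task.execute_remotely(queue_name="default", exit_process=True)

# ... actual training / processing code runs on the agent ...
```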
(Venv mode makes sense if running inside a container; if you need docker support you will need to mount the docker socket inside)
What exactly is the error you are getting from clearml? And what do you have in the configuration file?
Hi CheerfulGorilla72
is it ideological...
Lol, no 😀
Since some of the comparisons are done client side (in the browser, mostly the text comparisons) it is a bit heavy, so we added a limit. We want to change it so some of it runs on the backend, but in the meantime we can actually expand the limit, and maybe only lazily compare the text areas. Hopefully in the next version 🤞
Hi John. sort of. It seems that archiving pipelines does not also archive the tasks that they contain so
This is correct, the rationale is that the components (i.e. Tasks) might be used (or already used) as cached steps ...
what do you say that I will manually kill the services agent and launch one myself?
Makes sense 🙂
Hi QuaintJellyfish58
You can always set it inside the function, with Task.current_task().output_uri = "s3://"
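i.e. something like this inside the function (a sketch; the bucket path is a placeholder):
```python
from clearml import Task

def my_component():
    # redirect model / artifact uploads for this running task to S3
    Task.current_task().output_uri = "s3://my-bucket/experiments"
    # anything saved from here on is uploaded to that destination
    ...
```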
I have to ask, I would assume the agents are pre-configured with "default_output_uri" in the clearml.conf, why would you need to set it manually?
I think you are correct, this is odd, let me check ...
LudicrousParrot69 I would advise the following:
- Put all the experiments in a new project
- Filter based on the HPO tag, and sort the experiments based on the metric we are optimizing (see adding custom columns to the experiment table)
- Select + archive the experiments that are not used

BTW: I think someone already suggested we do the auto archiving inside the HPO process itself. Thoughts ?
Is this from the pipeline logic? Or a component?
Thanks @<1523704157695905792:profile|VivaciousBadger56> ! great work on the docstring, I also really like the extended example. Let me make sure someone merges it
Hi JitteryCoyote63
Or even better: would it be possible to have a support for HTML files as artifacts?
If you report html files as debug media they will be previewed, as long as the link is accessible.
You can check this example:
https://github.com/allegroai/trains/blob/master/examples/reporting/html_reporting.py
In the artifacts, I think html files are also supported (maybe not previewed as nicely, but clickable).
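A minimal sketch of both options (project and file names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="html report")

# as debug media -- previewed in the UI as long as the link is accessible
task.get_logger().report_media(
    title="report", series="summary", iteration=0, local_path="report.html"
)

# as an artifact -- stored and clickable from the Artifacts tab
task.upload_artifact(name="report_html", artifact_object="report.html")
```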
Regarding the s3 link, I think you are supposed to get a popup window as...
RoundMosquito25 good news, no no need to open any ports 🙂
Basically the agents are always polling the server for "jobs", i.e. they create http/s requests from the agent to the server, so all connections are outgoing connections. The firewall stays intact 🙂
models been trained stored ...
mongodb will store the url links; the upload itself is controlled via the "output_uri" argument to the Task.
If None is provided, Trains logs the locally stored model (i.e. a link to where you stored your model); if you provide one, Trains will automatically upload the model (into a new subfolder) and store the link to that subfolder.
- how can I enable the tensorboard and have the graphs been stored in trains?
Basically if you call Task.init all your...
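The gist, as a sketch (project / bucket names are placeholders): once Task.init is called the TensorBoard reports are picked up automatically, and output_uri controls where the models are uploaded:
```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

# output_uri makes saved models upload to S3 instead of only storing a local link
task = Task.init(
    project_name="examples",
    task_name="tensorboard example",
    output_uri="s3://my-bucket/models",
)

writer = SummaryWriter(log_dir="./runs")
for step in range(100):
    # scalars reported to TensorBoard are automatically logged to the server
    writer.add_scalar("loss", 1.0 / (step + 1), step)
writer.close()
```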
One option is definitely having a base image that has the things needed. Anything else? Thanks!
This is a bit complicated; to get the cache to kick in you have to mount an NFS volume into the pod as the cache (to create a persistent cache)
Basically, spin up an NFS pod to store the cache, and change the glue job template yaml to mount it into the pod (see the default cache folders:
/root/.cache/pip and /root/.clearml/pip-download-cache)
Make sense ?
We should probably change it so it is more human readable 🙂
Thanks CharmingShrimp37 !
Could you PR the fix ?
It will be just in time for the 0.16 release 🙂