No, at least not yet, someone definitely needs to do that though haha
Currently all the unit tests are internal (the hardest part is providing a server they can run against and verifying the results, hence the challenge)
For example, if ClearML would offer a
TestSession
that is local and does not communicate with any backend
Offline mode? It stores everything into a folder, then zips it; you can access the target folder or the zip file and verify all the data/states
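A minimal sketch of how a test could use it (assuming the Task.set_offline / Task.import_offline_session API):
```python
from clearml import Task

# Nothing is sent to any backend; everything is written to a local
# session folder and zipped when the task closes
Task.set_offline(offline_mode=True)

task = Task.init(project_name="tests", task_name="offline-run")
task.get_logger().report_scalar("metric", "series", value=0.5, iteration=0)
task.close()

# The test can now unpack and inspect the zip, or import it into a
# real server later (illustrative path):
# Task.import_offline_session("/path/to/offline_session.zip")
```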
DeliciousBluewhale87 my apologies, you are correct 😞
We should probably add support for that; do you feel like opening a GitHub issue, so we do not forget?
Not at all, we love ideas on improving ClearML.
I do not think there is a need to replace Feast, it seems to do a lot; I'm just thinking about integrating it into the ClearML workflow. Do you have a specific use case we can start to work on? Or maybe a workflow that would make sense to implement?
Hmm, I think the approach in general would be to create two pipeline tasks, then launch them from a third pipeline or trigger them externally. If, on the other hand, it makes sense to see both pipelines on the same execution graph, then the nested components make a lot of sense. Wdyt?
SmugOx94 Yes, we just introduced it 🙂 with 0.16.3
Discussion was here (I'll make sure to update the issue that the version is out)
https://github.com/allegroai/trains/issues/222
In your trains.conf add the following line:
sdk.development.store_code_diff_from_remote = true
It will store the diff from the remote HEAD instead of the local one.
Hmm there was this one:
https://github.com/allegroai/clearml/commit/f3d42d0a531db13b1bacbf0977de6480fedce7f6
Basically it is always caching steps (hence the skip); you can install from the main branch to verify this is the issue. An RC is due in a few days (it was already supposed to be out but got a bit delayed)
CLEARML_AGENT_GIT_USER
is your git user on whatever git host/server you are using (GitHub/GitLab/Bitbucket, etc.); the matching password or personal access token goes into CLEARML_AGENT_GIT_PASS
Click on the "k8s_schedule" queue, then on the right-hand side you should see your Task; click on it and it will open the Task page. There, click on the "Info" tab and look for "STATUS MESSAGE" and "STATUS REASON". What do you have there?
but when the dependencies are installed, the git creds are not taken into account
I have to admit, we missed that use case 😞
A quick fix would be to use git over SSH, which is system-wide.
but I want now to switch to git auth using a Personal Access Token for security reasons)
Smart move 😉
As for the git repo credentials, we can always add them when you are using user/pass. I guess that would be the behavior you are expecting, unless the domain is different...
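For reference, a sketch of what that looks like in clearml.conf (the values here are placeholders):
agent.git_user = "myuser"
agent.git_pass = "my-personal-access-token"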
PompousParrot44
It should still create a new venv, but inherit the packages from the system-wide (or specific venv) installed packages. Meaning it will not reinstall packages you already installed, but it will give you the option of replacing a specific package (or installing a new one) without reinstalling the entire venv
Hi TightElk12
it would raise an error if the env where execution happens is not configured to track things on our custom server, to prevent logging to the public demo server?
What do you mean by that? Catching the default server instead of the configured one?
1st: is it possible to make a pipeline component call another pipeline component (as a substep)
Should work as long as they are in the same file; you can however launch and wait on any Task (see pipelines from tasks)
2nd: I am trying to call a function defined in the same script, but I am unable to import it. I am passing the repo parameter to the component decorator, but no change; it always comes back with "No module named <module>" after my
from module import function
c...
Hi SubstantialElk6
Generally speaking, the idea is that actual code creates a Dataset (i.e. a Dataset class created from code), plus you can add some metric reporting (like table reporting) to create a preview of the stored data for better visibility, or maybe create some statistics as part of the data ingest script. Then this ingest code can be relaunched / automated. The created Dataset itself can be tagged, renamed, or have key/values added for better cataloging. Wdyt?
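A minimal sketch of such an ingest script (names and paths are illustrative):
```python
from clearml import Dataset

# Create the dataset from code and attach the raw files
ds = Dataset.create(dataset_name="my-dataset", dataset_project="data")
ds.add_files("/path/to/raw_data")  # illustrative path

# Report a small preview table for better visibility in the UI
ds.get_logger().report_table(
    title="preview", series="head", iteration=0,
    csv="/path/to/raw_data/sample.csv",
)

ds.upload()
ds.finalize()
ds.add_tags(["ingested", "v1"])  # cataloging
```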
GrotesqueDog77 one issue with this design: in order to run a sub-component, the call must be done from the parent component. Does that make sense?
```
def step_one(data):
    return data

def step_two(path):
    return model  # pseudocode: "model" stands for this step's output

def both_steps():
    path = step_one("stuff")
    return step_two(path)

def pipeline():
    both_steps()
```
Which would make `both_steps` a component and `step_one` and `step_two` sub-components
wdyt?
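A hedged sketch of how this could look with the decorator syntax (assuming PipelineDecorator; names and parameters are illustrative):
```python
from clearml.automation.controller import PipelineDecorator

def step_one(data):
    return data

def step_two(path):
    return path

# step_one / step_two are passed as helper functions so they are
# available inside the standalone component when it runs remotely
@PipelineDecorator.component(return_values=["result"],
                             helper_functions=[step_one, step_two])
def both_steps(data):
    path = step_one(data)
    return step_two(path)

@PipelineDecorator.pipeline(name="example", project="examples", version="0.1")
def pipeline():
    both_steps("stuff")

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug: run everything on this machine
    pipeline()
```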
Where exactly are the model files stored on the pod?
In the clearml cache folder, usually under ~/.clearml
Currently I encounter the problem that I always get a 404 HTTP error when I try to access the model via the...
How are you deploying it? I would start by debugging: run everything in docker-compose (single machine), make sure you have everything running, and then deploy to the cluster
(because on a cluster level, it could be a general routing issue, way before getting t...
RobustGoldfish9
I think you need to set the trains-agent docker to be aware of the host, so it knows how to mount data/cache/configurations into the sibling docker.
It should look something like:
TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains"
So if running a docker:
docker run -e TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains" ...
ShaggyHare67 are you saying the problem is that trains fails to discover the packages in the manual execution?
Yes, let's assume we have a task with id aabbcc.
On two different machines you can do the following:
trains-agent execute --docker --id aabbcc
This means you manually spin up two simultaneous copies of the same experiment; once they are up and running, will your code be able to make the connection between them? (i.e. OpenMPI / torch.distributed etc.)
what does it mean to run the steps locally?
start_locally: means the pipeline code itself (the logic that runs / controls the DAG) runs on the local machine (i.e. no agent), but this control logic creates/clones Tasks and enqueues them; for those Tasks you need an agent to execute them
run_pipeline_steps_locally=True: means that instead of enqueuing the Tasks the pipeline creates and having an agent run them, they will be launched on the same local machine (think debugging, other...
WackyRabbit7
we did execute locally
Sure, instead of pipe.start() use pipe.start_locally(run_pipeline_steps_locally=False), this is it 🙂
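Put together, a minimal sketch (the step template task is illustrative):
```python
from clearml import PipelineController

pipe = PipelineController(name="demo", project="examples", version="0.1")
pipe.add_step(
    name="stage_one",
    base_task_project="examples",
    base_task_name="step template",  # illustrative existing Task
)

# Control logic runs here; the steps are enqueued for agents:
pipe.start_locally(run_pipeline_steps_locally=False)

# Or, for debugging, run the steps on this machine as well:
# pipe.start_locally(run_pipeline_steps_locally=True)
```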
is there a way for me to get a link to the task execution? I want to write a message to slack, containing the URL so collaborators can click and see the progress
WackyRabbit7 Nice!
basically you can use this one:
task.get_output_log_web_page()
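For example (the webhook URL is a placeholder; any Slack client would do):
```python
import json
import urllib.request

from clearml import Task

task = Task.init(project_name="examples", task_name="train")
url = task.get_output_log_web_page()  # direct link to the Task in the UI

# Post the link to a Slack incoming webhook (placeholder URL)
req = urllib.request.Request(
    "https://hooks.slack.com/services/XXX/YYY/ZZZ",
    data=json.dumps({"text": f"Follow the run here: {url}"}).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```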
actually the issue is that the packages are not being detected 😞
what happens if you do the following?
Task.add_requirements("tensorflow")
task = Task.init(...)
WickedGoat98 nice!!
Can you also pass the login screen (i.e. can you access the api server)?
CooperativeFox72 of course, anything trains related, this is the place 🙂
Fire away
So how can I temporarily fix it?
Try:
task.output_uri = task.get_output_destination()
I have to leave, I'll be back online in a couple of hours.
Meanwhile, see if the ports are correct (just curl each port: 8080 web, 8008 api, 8081 files, and see if you get an answer); if everything is okay, try running the text example again
You need to mount it to ~/clearml.conf
(i.e. /root/clearml.conf)
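For example, when spinning the container (host path is illustrative):
docker run -v $HOME/clearml.conf:/root/clearml.conf ...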
"regular" worker will run one job at a time, services worker will spin multiple tasks at the same time But their setup (i.e. before running the actual task) is one at a time..
So why is it trying to upload to "//:8081/files_server:" ?
What do you have in the trains.conf on the machine running the experiment ?